Fugu-MT: arXiv Paper Translations

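This site provides Japanese translations of arXiv papers that are 30 pages or shorter and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body is not under a CC license, and for papers that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translated text is licensed under CC BY-SA 4.0. Translations are produced with Fugu-Machine Translator.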

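The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from the use of this site (including all information and translations).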

Papers with a publication date of 2022-06-13.

Title	Authors	Abstract	Publication date / translation date
# 具体から抽象へのAI:人工知能を一般大衆にわかりやすく解き明かす AI from concrete to abstract: demystifying artificial intelligence to the general public ( http://arxiv.org/abs/2006.04013v6 ) ライセンス: Link先を確認	Rubens Lacerda Queiroz, Fábio Ferrentini Sampaio, Cabral Lima and Priscila Machado Vieira Lima	(参考訳) 人工知能(AI)は幅広い領域で採用されている。これは、AIが何を意味するかについて最低限の理解を一般の人々に与える手段を開発することの必要性を示している。本稿では、ビジュアルプログラミングとWiSARD重みのない人工ニューラルネットワークを組み合わせることで、一般の人々(子供を含む)がこの目標を達成できるようにする、具体から抽象へのAI(AIcon2abs)という新しい方法論を提案する。AIcon2absの主な戦略は、学習機械の開発に関連する実践的活動と、その学習過程の観察を通じて、人工知能の脱神秘化を促進することである。したがって、人工知能のメカニズムの採用に関わる議論や決定において、対象者が洞察力のある当事者となることに寄与するスキルを提供することができる。現在、プログラミングを通じて基本的なAI概念を教える既存のアプローチは、マシンインテリジェンスを外部要素/モジュールとして扱う。トレーニングを受けた後、その外部モジュールは学習者が開発するメインアプリケーションに結合される。この方法論では、トレーニングタスクと分類タスクの両方が、他のプログラミング構成と同様に、メインプログラムを構成するブロックである。 AIcon2absの有益な副作用として、データから学習可能なプログラムと従来のコンピュータプログラムとの差がより顕著になる。さらに、WiSARDの重みのない人工ニューラルネットワークモデルの単純さにより、トレーニングや分類タスクの内部実現の可視化と理解が容易になる。 Artificial Intelligence (AI) has been adopted in a wide range of domains. This shows the imperative need to develop means to endow common people with a minimum understanding of what AI means. Combining visual programming and WiSARD weightless artificial neural networks, this article presents a new methodology, AI from concrete to abstract (AIcon2abs), to enable general people (including children) to achieve this goal. The main strategy adopted by AIcon2abs is to promote a demystification of artificial intelligence via practical activities related to the development of learning machines, as well as through the observation of their learning process. Thus, it is possible to provide subjects with skills that contribute to making them insightful actors in debates and decisions involving the adoption of artificial intelligence mechanisms. Currently, existing approaches to the teaching of basic AI concepts through programming treat machine intelligence as an external element/module. 
After being trained, that external module is coupled to the main application being developed by the learners. In the methodology herein presented, both training and classification tasks are blocks that compose the main program, just as the other programming constructs. As a beneficial side effect of AIcon2abs, the difference between a program capable of learning from data and a conventional computer program becomes more evident. In addition, the simplicity of the WiSARD weightless artificial neural network model enables easy visualization and understanding of training and classification tasks internal realization.	翻訳日:2023-05-16 09:12:36 公開日:2022-06-13
# 多体局在遷移における情報理論記憶スケーリング Information-Theoretic Memory Scaling in the Many-Body Localization Transition ( http://arxiv.org/abs/2009.04470v3 ) ライセンス: Link先を確認	Alexander Nico-Katz, Abolfazl Bayat, Sougato Bose	(参考訳) 多体局所化相の重要な特徴は、エルゴディシティの破壊と、結果として局所記憶の出現であり、時間の経過とともに情報の局所保存として明らかにされる。メモリは必ずしも時間に依存する概念であるため、動的量に関するいくつかの研究によって部分的に捉えられている。しかし、これらの量は入力状態に関して最適でも民主的でもなく、多体ローカライゼーションの文脈における局所記憶の基本的な情報理論的理解はいまだ解明されていない。局所記憶の真の定量化として動的ホールボ量を導入し,不均衡や絡み合いエントロピーといった他の量に対する利点を概説する。多体局在遷移にまたがる定常状態における明確なスケーリング挙動を見いだし、この挙動を捉えた2パラメータスケーリングansätzeの族を決定する。遷移点とスケーリング指数を抽出したこの力学量の包括的有限サイズスケーリング解析を行う。 A key feature of the many-body localized phase is the breaking of ergodicity and consequently the emergence of local memory; revealed as the local preservation of information over time. As memory is necessarily a time dependent concept, it has been partially captured by a few extant studies of dynamical quantities. However, these quantities are neither optimal, nor democratic with respect to input state; and as such a fundamental and complete information theoretic understanding of local memory in the context of many-body localization remains elusive. We introduce the dynamical Holevo quantity as the true quantifier of local memory, outlining its advantages over other quantities such as the imbalance or entanglement entropy. We find clear scaling behavior in its steady-state across the many-body localization transition, and determine a family of two-parameter scaling ansätze which captures this behavior. We perform a comprehensive finite size scaling analysis of this dynamical quantity extracting the transition point and scaling exponents.	翻訳日:2023-05-03 02:53:50 公開日:2022-06-13
# 確率と構造的埋め込み不可能性に基づく文脈性の多様性 Varieties of contextuality based on probability and structural nonembeddability ( http://arxiv.org/abs/2103.06110v5 ) ライセンス: Link先を確認	Karl Svozil	(参考訳) 文脈性の異なる分析的概念は、確率論的文脈性と強い文脈性の2つのグループに大別される。 Kochen and Specker の Theorem 0 はこれらの群を区別するための区切り基準である。確率的文脈性は、非古典的確率を伴うものの古典的モデルを依然として許すのに対し、文脈性の論理代数的な「強い」形式は、(拡張された)ブール代数に忠実に埋め込むことができない量子可観測量の集合を特徴づける。どちらの形式も古典的な非決定性あるいは決定不全を示しており、これは「値不定」と呼ぶことができ、理論計算機科学の部分関数によって形式化される。 Different analytic notions of contextuality fall into two major groups: probabilistic and strong notions of contextuality. Kochen and Specker's Theorem 0 is a demarcation criterion for differentiating between those groups. Whereas probabilistic contextuality still allows classical models, albeit with nonclassical probabilities, the logico-algebraic "strong" form of contextuality characterizes collections of quantum observables that have no faithful embedding into (extended) Boolean algebras. Both forms indicate a classical in- or under-determination that can be termed "value indefinite" and formalized by partial functions of theoretical computer sciences.	翻訳日:2023-04-08 13:32:24 公開日:2022-06-13
# 量子領域融解の観察と量子コンピュータによるシミュレーション Observation of quantum domain melting and its simulation with a quantum computer ( http://arxiv.org/abs/2103.07343v2 ) ライセンス: Link先を確認	Jaka Vodeb, Michele Diego, Yevhenii Vaskivskyi, Yaroslav Gerasimenko, Viktor Kabanov and Dragan Mihailovic	(参考訳) 領域は非平衡相転移で生成される離散対称性の均一領域である。それらはドメインの壁と、それらが融合することを防ぐトポロジカルオブジェクトによって分離される。ドメインは熱駆動型顕微鏡プロセス、および量子システムにおいて、マクロ的な量子トンネルによって再構成される。トンネル形成のためのシステムのエネルギー環境を定義する微視的物理は、宇宙論や他の量子領域システム、より一般的に核物理学、物質波、磁気学、生物学など、多くの異なるシステムで興味深い。量子領域のダイナミクスのような創発的振る舞いにつながる微視的相関のダイナミクスを研究するユニークな機会は、量子材料によって提供される。ここでは、量子コンピュータを用いて量子系をシミュレートするというファインマンのアイデアの直接的な実現として、量子電子再構成ダイナミクスと領域融解の2つの実施形態(原型的2次元電子秩序固体量子材料と最新の量子シミュレータのシミュレーション)における研究を報告する。走査型トンネル顕微鏡を用いて電子領域再構成ダイナミクスの時間変化を計測し、量子領域融解シミュレーションにおける絡み合った相関電子のアンサンブルにおける領域の時間発展と比較する。ドメイン再構成は、マクロ的に観察される特徴的なステップ様の時間発展と温度依存性を持つ、創発的な自己構成エネルギーランドスケープでトンネル化することによって進行する。量子材料と量子シミュレーションの力学における顕著な対応は、顕微鏡レベルで相互作用する多体量子系の創発的挙動を理解するための道を開く。 Domains are homogeneous areas of discrete symmetry, created in nonequilibrium phase transitions. They are separated by domain walls, topological objects which prevent them from fusing together. Domains may reconfigure by thermally-driven microscopic processes, and in quantum systems, by macroscopic quantum tunnelling. The underlying microscopic physics that defines the system's energy landscape for tunnelling is of interest in many different systems, from cosmology and other quantum domain systems, and more generally to nuclear physics, matter waves, magnetism, and biology. A unique opportunity to investigate the dynamics of microscopic correlations leading to emergent behaviour, such as quantum domain dynamics is offered by quantum materials. 
Here, as a direct realization of Feynman's idea of using a quantum computer to simulate a quantum system, we report an investigation of quantum electron reconfiguration dynamics and domain melting in two matching embodiments: a prototypical two-dimensionally electronically ordered solid-state quantum material and a simulation on a latest-generation quantum simulator. We use scanning tunnelling microscopy to measure the time-evolution of electronic domain reconfiguration dynamics and compare this with the time evolution of domains in an ensemble of entangled correlated electrons in simulated quantum domain melting. The domain reconfiguration is found to proceed by tunnelling in an emergent, self-configuring energy landscape, with characteristic step-like time evolution and temperature-dependences observed macroscopically. The remarkable correspondence in the dynamics of a quantum material and a quantum simulation opens the way to an understanding of emergent behaviour in diverse interacting many-body quantum systems at the microscopic level.	翻訳日:2023-04-08 08:41:50 公開日:2022-06-13
# X状態の局所利用可能量子相関:対称ケースと反対称ケース Local available quantum correlations of X states: The symmetric and anti-symmetric cases ( http://arxiv.org/abs/2107.00158v3 ) ライセンス: Link先を確認	Hermann Albrecht and David Bellorin and Douglas F. Mundarain	(参考訳) Mundarainらによって定義された局所利用可能量子相関 (LAQC) を、等しい大きさの局所ブロッホベクトルを持つ2量子ビットのX状態に対して解析する。対称X状態は部分系の交換の下で不変であり、したがって同じ局所ブロッホベクトルを持つ。一方、反対称X状態は、大きさが等しく向きが反対(反平行)の局所ブロッホベクトルを持つ。いずれの場合も、LAQCの定量化指標の正確な解析式を得る。いくつかの例を示し、この量子相関をコンカレンスおよび量子不協和と比較する。また、振幅減衰デコヒーレンス下のヴェルナー状態を用いて、マルコフ的デコヒーレンスも扱う。脱分極と位相減衰の場合と同様に、この量子チャネルの下ではこれらの状態のLAQCに突然死の挙動は発生しない。 Local available quantum correlations (LAQC), as defined by Mundarain et al., are analyzed for 2-qubit X states with local Bloch vectors of equal magnitude. Symmetric X states are invariant under the exchange of subsystems, hence having the same local Bloch vector. On the other hand, anti-symmetric X states have local Bloch vectors with an equal magnitude but opposite direction (anti-parallel). In both cases, we obtain exact analytical expressions for their LAQC quantifier. We present some examples and compare this quantum correlation to concurrence and quantum discord. We have also included Markovian decoherence, with Werner states under amplitude damping decoherence. As is the case for depolarization and phase damping, no sudden death behavior occurs for the LAQC of these states with this quantum channel.	翻訳日:2023-03-23 20:56:40 公開日:2022-06-13
# 量子ホイートストンブリッジ Quantum Wheatstone Bridge ( http://arxiv.org/abs/2108.11397v2 ) ライセンス: Link先を確認	Kasper Poulsen, Alan C. Santos, Nikolaj T. Zinner	(参考訳) 古典版の完全な量子類似物として,量子ホイートストンブリッジを提案する。ブリッジは、未知のカップリングに対する感度を高めるために量子効果を利用する、少数体の境界駆動スピンチェーンである。この感度は、制御可能なカップリングが未知のカップリングに近づくと、破壊的干渉によって絡み合ったベル状態の占有数が減少することによって説明される。破壊的干渉の簡単な基準を見出し、この減少の幅の近似式を導出する。未知結合に対する感度は量子フィッシャー情報を用いて定量化され, スピン電流を介して間接的にブリッジの状態を測定できることを示す。我々の結果はキャリブレーションエラーに対して堅牢であり、現在の最先端量子プラットフォームのいくつかが実現の手段として使用できるという意味で一般的である。したがって、量子ホイートストンブリッジは、近い将来の量子デバイスを用いたセンシングやメトロロジーといった分野で使われる可能性がある。 We propose a quantum Wheatstone bridge as a fully quantum analogue to the classical version. The bridge is a few-body boundary-driven spin chain exploiting quantum effects to gain an enhanced sensitivity to an unknown coupling. The sensitivity is explained by a drop in population of an entangled Bell state due to destructive interference as the controllable coupling approaches the unknown coupling. A simple criterion for the destructive interference is found, and an approximate expression for the width of the drop is derived. The sensitivity to the unknown coupling is quantified using the quantum Fisher information, and we show that the state of the bridge can be measured indirectly through the spin current. Our results are robust towards calibration errors and generic in the sense that several of the current state-of-the-art quantum platforms could be used as a means of realization. The quantum Wheatstone bridge may thus find use in fields such as sensing and metrology using near-term quantum devices.	翻訳日:2023-03-17 05:13:24 公開日:2022-06-13
# 三対角行列表現を持つ非エルミートハミルトニアンのクラスについて On a class of non-Hermitian Hamiltonians with tridiagonal matrix representation ( http://arxiv.org/abs/2109.14540v5 ) ライセンス: Link先を確認	Francisco M. Fernández	(参考訳) 三対角行列表現を持つ一部の非エルミート・ハミルトニアン作用素が、準エルミートである、あるいはエルミート作用素と相似でありうることを示す。ここで議論するハミルトニアン作用素のクラスでは、変換はエルミートかつ正定値の対角作用素によって与えられる。開境界条件と周期境界条件との間には重要な違いがあることを示す。 2つの単純で広く使われているモデルを用いて理論的結果を説明する。 We show that some non-Hermitian Hamiltonian operators with tridiagonal matrix representation may be quasi-Hermitian or similar to Hermitian operators. In the class of Hamiltonian operators discussed here the transformation is given by a Hermitian, positive-definite, diagonal operator. We show that there is an important difference between open boundary conditions and periodic ones. We illustrate the theoretical results by means of two simple, widely used models.	翻訳日:2023-03-13 07:08:39 公開日:2022-06-13
# 永遠非マルコフ性を持つ3量子系における真の多部絡み検出 Detecting genuine multipartite entanglement in three-qubit systems with eternal non-Markovianity ( http://arxiv.org/abs/2110.05211v2 ) ライセンス: Link先を確認	Ankit Vaishy, Subhadip Mitra and Samyadeb Bhattacharya	(参考訳) 量子非マルコフ演算を用いて, 真に多元的絡み合い状態を検出する新しいプロトコルを考案する。我々は、永遠の非マルコフ性として知られる特定の種類の非マルコフ性を利用して、非完全正の写像を構築し、二分割状態のフィルタリングを行い、真の多部交絡を検出する。さらに,本理論に基づき,真に多元的絡み合い状態を検出する証人演算子を提案する。我々の研究は、絡み合い理論と量子非マルコビアン性の間の未解明の接続に光を当てている。 We devise a novel protocol to detect genuinely multipartite entangled states by harnessing quantum non-Markovian operations. We utilize a particular type of non-Markovianity known as the eternal non-Markovianity to construct a non-complete positive map to filter out the bi-separable states and detect genuine multipartite entanglement. We further propose a witness operator to detect genuinely multipartite entangled states experimentally based on this theory. Our study sheds light on a hitherto unexplored connection between entanglement theory and quantum non-Markovianity.	翻訳日:2023-03-11 19:17:47 公開日:2022-06-13
# 産業応用のための量子アニーリング:序論とレビュー Quantum Annealing for Industry Applications: Introduction and Review ( http://arxiv.org/abs/2112.07491v3 ) ライセンス: Link先を確認	Sheir Yarkoni, Elena Raponi, Thomas Bäck, and Sebastian Schmitt	(参考訳) 量子アニーリング(quantum annealing)は、組合せ最適化問題を解くために使用できるヒューリスティックな量子最適化アルゴリズムである。近年、量子技術の進歩により、プログラマブルな使用のために量子アニールアルゴリズムを実装する小型および中規模量子プロセッサの開発が可能となった。具体的には、D-Wave Systemsによって製造された量子アニールプロセッサが研究され、様々な分野の研究と産業の両方で広くテストされている。本稿では、ヒューリスティックな量子最適化アルゴリズムとしての量子アニーリングの理論的動機、そのような量子プロセッサを使用するために必要なソフトウェアとハードウェア、そしてそれらを用いて実証された最先端の応用と概念実証に関する文献的考察を行う。我々のレビューの目的は、量子アニール技術の応用に関する集中的かつ凝縮した情報源を提供することである。我々は、様々な分野の研究者と実践者の両方にとって量子アニーリングの利点、限界、可能性を明らかにする。 Quantum annealing is a heuristic quantum optimization algorithm that can be used to solve combinatorial optimization problems. In recent years, advances in quantum technologies have enabled the development of small- and intermediate-scale quantum processors that implement the quantum annealing algorithm for programmable use. Specifically, quantum annealing processors produced by D-Wave Systems have been studied and tested extensively in both research and industrial settings across different disciplines. In this paper we provide a literature review of the theoretical motivations for quantum annealing as a heuristic quantum optimization algorithm, the software and hardware that is required to use such quantum processors, and the state-of-the-art applications and proofs-of-concepts that have been demonstrated using them. The goal of our review is to provide a centralized and condensed source regarding applications of quantum annealing technology. We identify the advantages, limitations, and potential of quantum annealing for both researchers and practitioners from various fields.	翻訳日:2023-03-04 14:12:09 公開日:2022-06-13
# 拡張Bose-Hubbard二量体モデルにおける粒子トンネルによるモード絡み合いのダイナミクス Dynamics of mode entanglement induced by particle-tunneling in the extended Bose-Hubbard dimer model ( http://arxiv.org/abs/2112.12382v2 ) ライセンス: Link先を確認	Alan J. Barrios, Andrea Valdés-Hernández and Francisco J. Sevilla	(参考訳) モードエンタングルメントの進化は、2つのアクセシブルモードを持つ2つの区別できないボソンの系に対して解析される。エンタングルメントは、各モードにおけるボソンの数が不変であるときには常に定常のままであるが、単粒子トンネルと二粒子トンネルの影響下では豊かなダイナミクスを示す。これらの効果をパラダイム的状態の族で解析することにより, トンネル遷移率と初期状態の準備を変化させることで, モード絡み合いの特定のダイナミクスを設計・制御するための指針を提供する。 The evolution of mode entanglement is analysed for a system of two indistinguishable bosons with two accessible modes. Whereas entanglement remains stationary whenever the number of bosons in each mode is left invariant, it exhibits a rich dynamics under the effects of single- and two-particle tunneling. By analysing such effects in paradigmatic families of states, our results provide guidance for the design and control of specific dynamics of mode entanglement, by varying the tunneling transition rates and the preparation of the initial state.	翻訳日:2023-03-03 18:07:46 公開日:2022-06-13
# 古典軌道上のリドバーグ電子のトポロジカル分子とトポロジカル局在 Topological Molecules and Topological Localization of a Rydberg Electron on a Classical Orbit ( http://arxiv.org/abs/2201.10246v2 ) ライセンス: Link先を確認	Ali Emami Kopaei, Xuedong Tian, Krzysztof Giergiel, and Krzysztof Sacha	(参考訳) 原子が互いに惹きつけると分子を形成できるという一般的な知識である。ここでは、原子の境界状態が魅力的な相互作用の結果ではなく、トポロジカルな起源を持つ分子を作ることができることを示す。すなわち、原子の有界状態は、トポロジカルモデルの位相的に保護されたエッジ状態に対応する。このようなトポロジカル分子は、超低温原子間の相互作用強度が時間的に適切に変調されたときに実現できる。同様の機構により、ライドバーグ原子が適切に変調されたマイクロ波磁場によって摂動した場合、古典軌道上の電子の位相的に保護された局在化を実現することができる。 It is common knowledge that atoms can form molecules if they attract each other. Here, we show that it is possible to create molecules where bound states of the atoms are not the result of attractive interactions but have the topological origin. That is, the bound states of the atoms correspond to the topologically protected edge states of a topological model. Such topological molecules can be realized if the interaction strength between ultra-cold atoms is properly modulated in time. A similar mechanism allows one to realize topologically protected localization of an electron on a classical orbit if a Rydberg atom is perturbed by a properly modulated microwave field.	翻訳日:2023-02-27 22:39:49 公開日:2022-06-13
# デュアルユニタリ回路ダイナミクスにおける創発的量子状態設計と双ユニタリ性 Emergent quantum state designs and biunitarity in dual-unitary circuit dynamics ( http://arxiv.org/abs/2202.12306v2 ) ライセンス: Link先を確認	Pieter W. Claeys, Austen Lamacraft	(参考訳) 最近の研究は、量子クエンチに続くユニタリ力学における新しい種類のランダム行列的挙動の出現について研究している。時間発展した状態から始めて、システムの残りの部分で射影測定を行うことで、小さなサブシステム上でサポートされた純粋状態のアンサンブルを生成でき、これを射影アンサンブルと呼ぶ。カオス量子系において、このような射影アンサンブルは一様なハールランダムアンサンブルと区別不能になり、量子状態設計につながると推測された。正確な結果が最近、HoとChoi [Phys. Rev. Lett. 128, 060601 (2022)] によって、自己双対点におけるキックド・イジング模型に対して提示された。我々は、可解な初期状態と測定を持つ一般のカオス的デュアルユニタリ回路に拡張できる代替構成を提供し、基礎となるデュアルユニタリ性の役割を強調するとともに、デュアルユニタリ回路モデルが厳密な可解性とランダム行列的挙動の両方を示す仕組みをさらに示す。双ユニタリ接続から得られる結果に基づいて、複素Hadamard行列とユニタリ誤差基底がともに可解な測定方式につながることを示す。 Recent works have investigated the emergence of a new kind of random matrix behaviour in unitary dynamics following a quantum quench. Starting from a time-evolved state, an ensemble of pure states supported on a small subsystem can be generated by performing projective measurements on the remainder of the system, leading to a projected ensemble. In chaotic quantum systems it was conjectured that such projected ensembles become indistinguishable from the uniform Haar-random ensemble and lead to a quantum state design. Exact results were recently presented by Ho and Choi [Phys. Rev. Lett. 128, 060601 (2022)] for the kicked Ising model at the self-dual point. We provide an alternative construction that can be extended to general chaotic dual-unitary circuits with solvable initial states and measurements, highlighting the role of the underlying dual-unitarity and further showing how dual-unitary circuit models exhibit both exact solvability and random matrix behaviour. Building on results from biunitary connections, we show how complex Hadamard matrices and unitary error bases both lead to solvable measurement schemes.	翻訳日:2023-02-24 01:25:39 公開日:2022-06-13
# 単体的量子重力における光線ゆらぎ Light ray fluctuations in simplicial quantum gravity ( http://arxiv.org/abs/2203.07854v2 ) ライセンス: Link先を確認	Ding Jia	(参考訳) 時空の量子領域を通る光線伝播の量子ゆらぎに関する非摂動的研究は、長らく待たれていた。ローレンツ型単体的量子重力の理論の枠内で、2、3、4次元時空の対称性が縮小されたボックス領域を通過した後、試験光線が異なる場所に到達する確率を計算する。固定境界条件では、全ての結合定数の絶対値が比較的小さい場合、光線ゆらぎは一般に大きいことが判明した。固定結合定数の場合、宇宙定数項、アインシュタイン・ヒルベルト項およびR二乗項を持つ2次元理論では、境界サイズが減少するにつれて光線ゆらぎはまず増加し、その後減少する。一方、宇宙定数項とアインシュタイン・ヒルベルト項を持つ3次元および4次元理論では、境界サイズが小さくなるにつれて光線ゆらぎは単に増大する。なお、2次元量子重力の研究において、宇宙定数項とアインシュタイン・ヒルベルト項に対して以前に指摘された大域的な時間・空間双対性が、リッチスカラーの任意の偶数冪を加えた場合にも成り立つことを示す。最後に、非摂動的ローレンツ型量子重力の連続極限を得るために光線ゆらぎをどのように利用できるかを論じる。 A non-perturbative study on the quantum fluctuations of light ray propagation through a quantum region of spacetime is long overdue. Within the theory of Lorentzian simplicial quantum gravity, we compute the probabilities for a test light ray to land at different locations after travelling through a symmetry-reduced box region in 2, 3 and 4 spacetime dimensions. It is found that for fixed boundary conditions, light ray fluctuations are generically large when all coupling constants are relatively small in absolute value. For fixed coupling constants, as the boundary size is decreased light ray fluctuations first increase and then decrease in a 2D theory with the cosmological constant, Einstein-Hilbert and R-squared terms. While in 3D and 4D theories with the cosmological constant and Einstein-Hilbert terms, as the boundary size is decreased light ray fluctuations just increase. Incidentally, when studying 2D quantum gravity we show that the global time-space duality with the cosmological constant and Einstein-Hilbert terms noted previously also holds when arbitrary even powers of the Ricci scalar are added. We close by discussing how light ray fluctuations can be used in obtaining the continuum limit of non-perturbative Lorentzian quantum gravity.	翻訳日:2023-02-22 09:14:52 公開日:2022-06-13
# 非相反系における皮膚効果の実スペクトルと相転移 Real spectra and phase transition of skin effect in nonreciprocal systems ( http://arxiv.org/abs/2203.08618v3 ) ライセンス: Link先を確認	Qi-Bo Zeng and Rong Lü	(参考訳) 実数の最近接ホッピングを持つ一次元非相反格子について検討し、開境界条件下でのエネルギースペクトルが完全に実または完全に虚になりうることを示す。さらに,等間隔のサイトに実数の非相反ホッピングを導入した1次元モザイク格子のスペクトル特性と非エルミート皮膚効果についても検討する。そのような格子の固有エネルギーは、非相反性が変化するにつれて、実・複素・虚の遷移あるいは実・複素の遷移を起こす。さらに、皮膚効果はモザイク非相反性の周期に依存する相転移を示す。バルク状態は、臨界点を横切ると格子の一方の端から反対の端に突然移り、これは周期的境界条件下でのスペクトルの点ギャップの閉鎖と再開を伴う。遷移の相図を示し、臨界境界を解析的に決定する。本研究は、非エルミート系におけるエネルギースペクトルと皮膚効果の興味深い性質を明らかにする。 We study the one-dimensional nonreciprocal lattices with real nearest neighboring hopping and find that the energy spectra under open boundary conditions can be entirely real or imaginary. We further investigate the spectral properties and the non-Hermitian skin effect in the one-dimensional mosaic lattices with real nonreciprocal hopping introduced at equally spaced sites. The eigenenergies of such lattices undergo a real-complex-imaginary or real-complex transition as the nonreciprocity varies. Moreover, the skin effect exhibits phase transitions depending on the period of the mosaic nonreciprocity. The bulk states are abruptly shifted from one end of the lattice to the opposite one by crossing the critical points, accompanied by the closing and reopening of point gaps in the spectra under periodic boundary conditions. The phase diagrams of the transition are presented and the critical boundaries are analytically determined. Our work unveils the intriguing properties of the energy spectrum and skin effect in non-Hermitian systems.	翻訳日:2023-02-21 23:07:28 公開日:2022-06-13
# 雇用プロセスにおけるアルゴリズム的障害識別の取り組み--倫理的、法的、技術的分析 Tackling Algorithmic Disability Discrimination in the Hiring Process: An Ethical, Legal and Technical Analysis ( http://arxiv.org/abs/2206.06149v1 ) ライセンス: Link先を確認	Maarten Buyl, Christina Cociancig, Cristina Frattone, Nele Roekens	(参考訳) 障害のある人に対するアルゴリズム的差別(PWD)に取り組むには、特に倫理的、法的、技術的課題のために、他の保護された特徴に適用されるものと根本的に異なるアプローチを要求される。これらの課題は、雇用プロセス(または自動雇用システム、AHS)で使用される人工知能(AI)システムにおいて特に解決され、自動化された評価手順は、独自の倫理的および法的考慮の対象となり、PWDに不確実な悪影響を及ぼす。本稿では,障害の識別に関して,aiを活用した雇用が生み出す懸念と機会について述べる。最終的には、このトピックに関するさらなる研究を奨励するつもりです。したがって、私たちはいくつかの出発点を確立し、倫理主義者、議員、支持者、そしてAI実践者のためのロードマップを設計します。 Tackling algorithmic discrimination against persons with disabilities (PWDs) demands a distinctive approach that is fundamentally different to that applied to other protected characteristics, due to particular ethical, legal, and technical challenges. We address these challenges specifically in the context of artificial intelligence (AI) systems used in hiring processes (or automated hiring systems, AHSs), in which automated assessment procedures are subject to unique ethical and legal considerations and have an undeniable adverse impact on PWDs. In this paper, we discuss concerns and opportunities raised by AI-driven hiring in relation to disability discrimination. Ultimately, we aim to encourage further research into this topic. Hence, we establish some starting points and design a roadmap for ethicists, lawmakers, advocates as well as AI practitioners alike.	翻訳日:2023-02-19 17:44:00 公開日:2022-06-13
# 「リーキーパイプライン」への取り組み--コンピューティング教育における女性の募集・定着のための行動のレビューと分類 Addressing the "Leaky Pipeline": A Review and Categorisation of Actions to Recruit and Retain Women in Computing Education ( http://arxiv.org/abs/2206.06113v1 ) ライセンス: Link先を確認	Alina Berry, Susan McKeever, Brenda Murphy, Sarah Jane Delany	(参考訳) コンピューティング教育におけるジェンダーの不均衡は、世界中でよく知られた問題である。「リーキーパイプライン」という言葉は、上級職に進む前に女性が定着しないことを示すためにしばしば使われる。ここ数十年、多くのイニシアチブがリーキーパイプラインの問題をターゲットにしている。本論文は,高等教育における学部レベルのコンピューティングおよび関連コースへの女性の募集と定着を促進するために用いられる手法に関するイニシアチブを包括的にレビューする。主な目的は、ある程度の有効性を示した介入やイニシアチブ(私たちが「アクション」と呼ぶもの)を特定することであった。第2の目的は、今後の行動の議論、比較、計画を可能にするために、発見を分類として構成することであった。作業のかなりの部分で直面した課題は、評価の欠如、すなわち、イニシアチブと定着や募集の成果との直接的な関係の評価の欠如であった。行動は、政策、教育法、影響と支援、プロモーションとエンゲージメントの4つのグループに分類された。政策行動には、機関レベルでの支援と、場合によっては構造変化が必要である。教育法の行動は、コンピューティングコースの教育に関連するイニシアチブである。影響と支援のカテゴリには、女性がコンピューティングを選択するよう影響を与え、入学後は支援して留まるよう促す方法に関する行動が含まれる。最後に、プロモーションとエンゲージメントの行動は、コンピューティング系のコースを宣伝するイニシアチブであり、エンゲージメントとアウトリーチの活動を伴う。我々は,各カテゴリーと下位カテゴリにおける行動に関する文献を特定しながら,その分類を提示する。我々は,行動の直接的影響を評価する上での課題を議論し,この作業が次の段階、すなわち学部コンピューティング課程における女性の定着と募集を促進するための行動のツールキットへどのように繋がるかについて概説する。 Gender imbalance in computing education is a well-known issue around the world. The term "leaky pipeline" is often used to describe the lack of retention of women before they progress to senior roles. Numerous initiatives have targeted the problem of the leaky pipeline in recent decades. This paper provides a comprehensive review of initiatives related to techniques used to boost recruitment and retention of women in undergraduate computing and related courses in higher education. The primary aim was to identify interventions or initiatives (which we called "actions") that have shown some effectiveness. A secondary objective was to structure our findings as a categorisation, in order to enable future action discussion, comparison and planning. A particular challenge faced in a significant portion of the work was the lack of evaluation: i.e. the assessment of the direct relationship between the initiatives and the outcomes on retention or recruitment. 
The actions were categorised into four groups: Policy, Pedagogy, Influence and Support and Promotion and Engagement. Policy actions need support and potentially structural change at institution level. Pedagogy actions are initiatives related to the teaching of computing courses. The Influence and Support category includes actions associated with ways to influence women to choose computing and once enrolled to support and encourage them to stay. Finally, Promotion and Engagement actions are initiatives to promote computing based courses and involve engagement and outreach activities. We present our categorisation, identifying the literature related to actions under each category and subcategory. We discuss the challenges with evaluating the direct impact of actions and outline how this work leads towards the next phase of our work - a toolkit of actions to promote retention and recruitment of women in computing undergraduate courses.	翻訳日:2023-02-19 17:43:45 公開日:2022-06-13
# 希望のための情報ソースの理論:信念、欲求、イマジネーション、メタ認知 Theorizing Information Sources for Hope: Belief, Desire, Imagination, and Metacognition ( http://arxiv.org/abs/2206.03311v2 ) ライセンス: Link先を確認	Tim Gorichanaz	(参考訳) はじめに。希望は可能な(まだ不明な)望ましい結果に向けられたポジティブな態度である。希望は美徳であるが、絶望は広く、現在の出来事だけでなく、現在の出来事に関する情報にも関係しているように見える。本稿では,情報を通して希望がどのように引き起こされるかを検討する。方法。本研究は、理論的議論を進めるために概念分析と設計の哲学的手法を用いる。分析。まず、希望の概念化が提供され、主に徳の倫理に関する仕事を描く。次に、希望のための4種類の情報ソースが理論化され、哲学と心理学から仕事を構築、合成する。結果だ希望に満ちた情報ソースの4つのカテゴリは、過去または未来についての信念を形成する情報、未来の可能性に関する道徳的想像を巻き込む情報、特定の道徳的成果に対する欲求を喚起する情報、メタ認知のための情報、または希望に関してどのように情報を得るかである。結論だ多くの場合、情報に反応することが望まれます。これは、情報専門家や学者が人々と希望、特に困難な時期をつなぐためのモラルの機会であることを示唆している。さらなる研究、特に情報行動や実践への道筋が提案されている。 Introduction. Hope is a positive attitude oriented toward a possible (yet uncertain), desired outcome. Though hope is a virtue, hopelessness is widespread and seems related not only to current events but also to information about current events. This paper examines how hope can be sparked through information. Method. This study uses the philosophical methods of conceptual analysis and design to advance a theoretical argument. Analysis. First, a conceptualization of hope is offered, drawing on work primarily in virtue ethics. Then, four types of information sources for hope are theorized, building on and synthesizing work from philosophy and psychology. Results. Four categories of information source conducive to hopefulness are identified: information for forming beliefs about the past or future; information for engaging the moral imagination regarding possibilities for the future; information for sparking desire for particular moral outcomes; and information for metacognition, or about how we become informed with respect to hope. Conclusions. Hope is, in many cases, responsive to information. This suggests a moral opportunity for information professionals and scholars to work toward connecting people with information for hope, particularly in difficult times. 
Avenues for further research, particularly in information behavior and practices, are suggested.	翻訳日:2023-02-19 17:38:00 公開日:2022-06-13
# キャリブレーションサブセット選択によるスクリーニングプロセスの改善 Improving Screening Processes via Calibrated Subset Selection ( http://arxiv.org/abs/2202.01147v3 ) ライセンス: Link先を確認	Lequn Wang, Thorsten Joachims, Manuel Gomez Rodriguez	(参考訳) 治験に合格した患者の検索や検索エンジンの検索パイプラインなど、多くの選択プロセスは複数の段階で構成されており、初期スクリーニング段階は最も有望な候補の短縮にリソースを集中させる。本稿では,手動で構築するか,訓練するかに関わらず,スクリーニング分類器がどのような保証を提供できるかを検討する。我々は、現在の解が分布のない理論的な保証を享受していないことを発見した -- 一般に、完全に校正された分類器でさえ、そのショートリストが最適でない候補のプールが常に存在することを示す。次に,任意の分類器とある程度のキャリブレーションデータが与えられた場合,希望する候補数を含む候補の候補の至近短リストを探索する,分散非分布スクリーニングアルゴリズム -- calibrated subset selection (css) -- を開発した。さらに、特定のグループ間で複数の分類器を校正するCSSの変種が、証明可能な多様性を保証するショートリストを作成することができることを示す。米国国勢調査調査データを用いた実験は,我々の理論的結果を検証し,本アルゴリズムが提供したショートリストが,いくつかの競合ベースラインが提供したショートリストよりも優れていることを示す。 Many selection processes such as finding patients qualifying for a medical trial or retrieval pipelines in search engines consist of multiple stages, where an initial screening stage focuses the resources on shortlisting the most promising candidates. In this paper, we investigate what guarantees a screening classifier can provide, independently of whether it is constructed manually or trained. We find that current solutions do not enjoy distribution-free theoretical guarantees -- we show that, in general, even for a perfectly calibrated classifier, there always exist specific pools of candidates for which its shortlist is suboptimal. Then, we develop a distribution-free screening algorithm -- called Calibrated Subset Selection (CSS) -- that, given any classifier and some amount of calibration data, finds near-optimal shortlists of candidates that contain a desired number of qualified candidates in expectation. Moreover, we show that a variant of CSS that calibrates a given classifier multiple times across specific groups can create shortlists with provable diversity guarantees. 
Experiments on US Census survey data validate our theoretical results and show that the shortlists provided by our algorithm are superior to those provided by several competitive baselines.	翻訳日:2023-02-19 14:38:21 公開日:2022-06-13
# 非対称X状態の局所利用可能量子相関 Local available quantum correlations of non-symmetric X states ( http://arxiv.org/abs/2204.07552v2 ) ライセンス: Link先を確認	David Bellorin and Hermann Albrecht and Douglas F. Mundarain	(参考訳) Mundarainらによって定義された局所利用可能量子相関(LAQC)を、非対称な2量子ビットのX状態、すなわち、サブシステムの交換の下で不変でなく、したがってノルムの異なる局所ブロッホベクトルを持つX状態に対して解析する。 LAQCの定量化指標の簡単な解析式を得る。例として、ヴェルナー状態と一般のX状態に対する振幅減衰チャネルの局所的適用を解析する。この局所的な量子チャネルは、いくつかのケースでは量子不協和を生成することができるが、LAQCではそのような結果は起こりえず、これはLOCC操作の下でのLAQCの単調性を示唆している。この研究は、いわゆる対称および反対称X状態に対する我々の以前の結果と合わせて、2量子ビットX状態に対するLAQCの定量化指標の正確な解析式の探求を完結させる。 Local available quantum correlations (LAQC), as defined by Mundarain et al., are analyzed for non-symmetric 2-qubit X states, that is, X-states that are not invariant under the exchange of subsystems and therefore have local Bloch vectors whose norms are different. A simple analytic expression for their LAQC quantifier is obtained. As an example, we analyze the local application of the amplitude damping channel for Werner states and general X states. Although this local quantum channel can create quantum discord in some cases, no such outcome is possible for LAQC, which hints toward their monotonicity under LOCC operations. This work, along with our previous result for so-called symmetric and anti-symmetric X states, completes the pursuit of exact analytical expressions for the LAQC quantifier for 2-qubit X states.	翻訳日:2023-02-16 21:30:06 公開日:2022-06-13
# 一般化Maxwell-BlochフレームワークにおけるPML吸収境界条件の反射誤差の低減 Reducing the reflection error of PML absorbing boundary conditions within a generalized Maxwell-Bloch framework ( http://arxiv.org/abs/2206.04597v2 ) ライセンス: Link先を確認	Johannes Popp, Lukas Seitner, Michael Haider, and Christian Jirauschek (Department of Electrical and Computer Engineering, Technical University of Munich, Arcisstr. 21, 80333 Munich, Germany)	(参考訳) 境界条件を吸収する完全整合層(PML)を含む全波数値Maxwell-Blochシミュレーションツールを実演する。シミュレーション領域の境界における劣化反射誤差を回避するために、内部量子系から生じるインピーダンスミスマッチ効果を考慮した適応型PMLモデルを導入する。修正PMLモデルの数値検証には、テラヘルツ量子カスケードレーザー(QCL)構造のアクティブゲイン媒体にシミュレーションツールを適用する。 Maxwell-Bloch シミュレーション手法を用いて, 能動ゲイン媒体のトランの吸収特性を改良した。 We demonstrate a full-wave numerical Maxwell-Bloch simulation tool including perfectly matched layer (PML) absorbing boundary conditions. To avoid detrimental reflection errors at the boundary of the simulation domain, an adapted PML model is introduced, which takes into account impedance mismatch effects arising from the internal quantum system. For the numerical validation of the modified PML model the simulation tool is applied to the active gain medium of a terahertz quantum cascade laser (QCL) structure. Improved absorbing characteristics for the truncation of active gain media in our Maxwell-Bloch simulation approach are obtained.	翻訳日:2023-02-10 01:24:02 公開日:2022-06-13
# Ancilla-assisted process tomography の意義と感度 Faithfulness and sensitivity for ancilla-assisted process tomography ( http://arxiv.org/abs/2206.05899v1 ) ライセンス: Link先を確認	Seok Hyung Lie, Hyunseok Jeong	(参考訳) 系に作用する未知の量子チャネルの完全な情報を包含できる系アンシラ二成分状態は、忠実状態と呼ばれる。 d'ariano と presti によって証明された、状態の忠実さと対応するジャミョルコフスキー写像の可逆性の間の同値性は、量子チャネルではなくトレース非開化量子演算を仮定した証明が不完全であるにもかかわらず、ancilla-assisted process tomography に有用である。等価性の証明を完了し、量子チャネルの様々なクラスに忠実性の一般化を導入する。また、感度と呼ばれるより一般的な概念を探求し、量子チャネルの非自明な作用によって量子状態の性質が変化する。両特性を、ユニタリチャネル、ランダムユニタリ演算、ユニタリ演算などの量子チャネルの重要なクラスに特徴付けることにより、それらの関係を考察する。予期せぬ(非等価な)結果が量子チャネルの構造に光を当て、量子チャネルの様々なサブクラスに忠実または敏感な量子状態を特徴づけるためには2つの量子状態のクラスのみが必要であることを示した。例えば、量子過程のトモグラフィーと量子相関の関係は、局所的に観測可能な観測不能な二成分状態のみが単位チャネルの効果を感知するために使用できることが分かるため、明らかにされる。 A system-ancilla bipartite state capable of containing the complete information of an unknown quantum channel acting on the system is called faithful. The equivalence between faithfulness of state and invertibility of the corresponding Jamiolkowski map proved by D'Ariano and Presti has been a useful characterization for ancilla-assisted process tomography albeit the proof was incomplete as they assumed trace nonincreasing quantum operations, not quantum channels. We complete the proof of the equivalence and introduce the generalization of faithfulness to various classes of quantum channels. We also explore a more general notion we call sensitivity, the property of quantum state being altered by any nontrivial action of quantum channel. We study their relationship by characterizing both properties for important classes of quantum channels such as unital channels, random unitary operations and unitary operations. Unexpected (non-)equivalence results among them shed light on the structure of quantum channels by showing that we need only two classes of quantum states for characterizing quantum states faithful or sensitive to various subclasses of quantum channels. 
For example, it reveals the relation between quantum process tomography and quantum correlation, as it turns out that only bipartite states that have no local classical observable at all can be used to sense the effect of unital channels.	翻訳日:2023-02-09 12:54:33 公開日:2022-06-13
# 合成量子チャネルによる任意の量子相関の検出 Detection of arbitrary quantum correlations via synthesized quantum channels ( http://arxiv.org/abs/2206.05883v1 ) ライセンス: Link先を確認	Ze Wu, Ping Wang, Tianyun Wang, Yuchen Li, Ran Liu, Yuquan Chen, Xinhua Peng, Ren-Bao Liu, Jiangfeng Du	(参考訳) 量子相関は、量子多体系の構造とダイナミクスに関する重要な情報である。時間順序の異なる高次量子相関には多くの種類があるが、既存の検出方法にアクセスできるものはごくわずかである。近年,任意の種類の相関を選択的に抽出するために,逐次弱測定に基づく量子センシング手法が提案されている。しかし、その実験的な実装はまだ解明されていない。ここでは任意のタイプの量子相関の抽出を示す。我々は,従来の弱測定方式を合成量子チャネルを用いたプロトコルに一般化し,単一およびアンサンブル量子システムを含むより普遍的なシナリオに適用した。この量子チャネル法では、センサの様々な制御が重ね合わされ、所望の量子相関を測定するための特定の経路に沿ってセンサターゲットの進化が選択される。核磁気共鳴法の汎用性を用いて、核スピンターゲットの2次および4次相関を別の核スピンセンサで抽出することに成功した。量子相関の完全な特徴付けは、量子多体系を理解し、基本量子物理学を探求し、量子技術を開発するための新しいツールを提供する。 Quantum correlations are key information about the structures and dynamics of quantum many-body systems. There are many types of high-order quantum correlations with different time orderings, but only a few of them are accessible to the existing detection methods. Recently, a quantum-sensing approach based on sequential weak measurement was proposed to selectively extract arbitrary types of correlations. However, its experimental implementation is still elusive. Here we demonstrate the extraction of arbitrary types of quantum correlations. We generalized the original weak measurement scheme to a protocol using synthesized quantum channels, which can be applied to more universal scenarios including both single and ensemble quantum systems. In this quantum channel method, various controls on the sensors are superimposed to select the sensor-target evolution along a specific path for measuring a desired quantum correlation. Using the versatility of nuclear magnetic resonance techniques, we successfully extract the second- and fourth-order correlations of a nuclear-spin target by another nuclear-spin sensor. 
The full characterization of quantum correlations provides a new tool for understanding quantum many-body systems, exploring fundamental quantum physics, and developing quantum technologies.	翻訳日:2023-02-09 12:53:51 公開日:2022-06-13
# 高次元グラフ上の計測に基づく量子ウォーク Measurement-Based Quantum Walks on High-Dimensional Graphs ( http://arxiv.org/abs/2206.06059v1 ) ライセンス: Link先を確認	Syamsundar De, Vahid Ansari, Jan Sperling, Sonja Barkhofen, Benjamin Brecht and Christine Silberhorn	(参考訳) 高次元および再構成可能なグラフ上の量子ウォーク(QWs)は、量子シミュレーションと情報処理タスクの完全な可能性を呼び出すことができる。しかしながら、このような大規模でプログラマブルな量子ウォークの実験的な実現は、既存のスキームの複雑さが大幅に増大するため、非常に困難である。この限界を克服して、グローバーが95%以上の類似度を持つ4次元超キューブの上を歩き、98%の類似度を持つ円と有限直線上の400ステップの量子ウォークを示す。これは、量子ウォークに対する新しい測定に基づくアプローチによって実現され、適切に回転されたベース上の測定がターゲットとなる進化ユニタリを実装する。提案手法は,gaussian boson sampling のような複雑なタスクに応用できるスケーラブルでプログラマブルな量子ネットワークの実装に向けた新たな道を開くものである。 Quantum walks (QWs) on high-dimensional and reconfigurable graphs can invoke the full potential of quantum simulation and information processing tasks. However, experimental realization of such large-scale and programmable quantum walks is quite challenging, owing to the significantly increased complexity of the existing schemes. Overcoming this limitation, we here demonstrate Grover walks on four-dimensional hypercubes with high similarities above 95%, and 400-step quantum walks on circles and finite lines with similarities of 98%. This is rendered possible by a novel measurement-based approach to quantum walks where measurements on appropriately rotated bases implement the targeted evolution unitaries. Our results open a new path towards the implementation of scalable and programmable quantum networks that can find application in complex tasks, such as Gaussian Boson Sampling.	翻訳日:2023-02-09 12:47:50 公開日:2022-06-13
# 超伝導ダッフィング発振器の散逸相転移における量子挙動 Quantum behavior of a superconducting Duffing oscillator at the dissipative phase transition ( http://arxiv.org/abs/2206.06338v1 ) ライセンス: Link先を確認	Qi-Ming Chen, Michael Fischer, Yuki Nojiri, Michael Renger, Edwar Xie, Matti Partanen, Stefan Pogorzalek, Kirill G. Fedorov, Achim Marx, Frank Deppe, Rudolf Gross	(参考訳) 決定論的非線形システムの非決定論的挙動を理解することは、ローレンツがそれを「バタフライ効果」と命名して以来、暗黙の夢であった。有名な例はダフィング発振器のヒステリシスとビスタビリティであり、古典的な記述では二重ウェルポテンシャルにおける2つの定常状態の共存に起因する。しかし、この解釈は、パラメータ空間全体において単一の一意的な定常状態が許容される量子力学的観点では失敗する。ここでは、超伝導ダッフィング発振器の非平衡ダイナミクスを測定し、古典的および量子的記述を量子メタスタビリティの統一的な図形で再現する。 2つの古典的な定常状態が実際には準安定状態であることを示す。古典的なヒステリシス体制では極めて長い寿命を持つが、最終的には量子力学によって許される唯一の定常状態に緩和されなければならない。準安定状態の寿命を十分に大きくすることで,11サイトBose-Hubbard格子における平均場の急激な変化を模した1次散逸相転移を観測する。また、量子状態トモグラフィーによる遷移の2つの相、すなわちコヒーレント状態相と臨界点によって分離された圧縮状態相を明らかにする。以上の結果から, 非平衡系のヒステリシスと不安定性を理解する上では, 突然の散逸相転移の背後にあるスムーズな量子状態の進化が明らかとなった。 Understanding the non-deterministic behavior of deterministic nonlinear systems has been an implicit dream since Lorenz named it the "butterfly effect". A prominent example is the hysteresis and bistability of the Duffing oscillator, which in the classical description is attributed to the coexistence of two steady states in a double-well potential. However, this interpretation fails in the quantum-mechanical perspective, where a single unique steady state is allowed in the whole parameter space. Here, we measure the non-equilibrium dynamics of a superconducting Duffing oscillator and reconcile the classical and quantum descriptions in a unified picture of quantum metastability. We demonstrate that the two classically regarded steady states are in fact metastable states. They have a remarkably long lifetime in the classical hysteresis regime but must eventually relax into a single unique steady state allowed by quantum mechanics. 
By engineering the lifetime of the metastable states to be sufficiently large, we observe a first-order dissipative phase transition, which mimics a sudden change of the mean field in an 11-site Bose-Hubbard lattice. We also reveal the two distinct phases of the transition by quantum state tomography, namely a coherent-state phase and a squeezed-state phase separated by a critical point. Our results reveal a smooth quantum state evolution behind a sudden dissipative phase transition, and they form an essential step towards understanding hysteresis and instability in non-equilibrium systems.	翻訳日:2023-02-09 12:39:55 公開日:2022-06-13
# 量子制御のための物理インフォームドニューラルネットワーク Physics-informed neural networks for quantum control ( http://arxiv.org/abs/2206.06287v1 ) ライセンス: Link先を確認	Ariel Norambuena, Marios Mattheakis, Francisco J. González and Raúl Coto	(参考訳) 量子制御はユビキタスな研究分野であり、物理学者は量子システムのダイナミクスと特徴を掘り下げることができる。システムのステアリングに加えて、量子制御は様々な原子、光学、機械、固体システムに強力な応用をもたらした。近年,最適化プロセスに基づく従来の制御技術が,効率的な人工知能アルゴリズムに変換されている。本稿では,物理インフォームドニューラルネットワーク(PINN)を用いた最適量子制御問題の計算手法を提案する。提案手法を開放量子系に適用し,高い確率,短時間の時間発展,および制御パワーの最小化のもとで状態間移動問題を効率的に解く。さらに、パラメータや初期条件の変化の下で同じ問題を解決するPINNの柔軟性を示し、標準制御技術と比較した利点を示す。 Quantum control is a ubiquitous research field that has enabled physicists to delve into the dynamics and features of quantum systems. In addition to steering the system, quantum control has delivered powerful applications for various atomic, optical, mechanical, and solid-state systems. In recent years, traditional control techniques based on optimization processes have been translated into efficient artificial intelligence algorithms. Here, we introduce a computational method for optimal quantum control problems via physics-informed neural networks (PINNs). We apply our methodology to open quantum systems by efficiently solving the state-to-state transfer problem with high probabilities, short-time evolution, and minimizing the power of the control. Furthermore, we illustrate the flexibility of PINNs to solve the same problem under changes in parameters and initial conditions, showing advantages in comparison with standard control techniques.	翻訳日:2023-02-09 12:38:39 公開日:2022-06-13
# シリコン中の量子ドット結合スズ量子ビットの目覚ましい展望 The remarkable prospect for quantum-dot-coupled tin qubits in silicon ( http://arxiv.org/abs/2206.06285v1 ) ライセンス: Link先を確認	Wayne M. Witzel and Jesse J. Lutz and Dwight R. Luhman	(参考訳) シリコン半導体中のスピン-$\frac{1}{2}$ $^{119}$Sn 原子核は優れた量子ビットになり得る。シリコン中の核スピンは長いコヒーレンス時間を持つことが知られている。スズはシリコンと等電子的であるため、電子はあるSn原子から別のSn原子へ容易に移動し、超微細相互作用を通じて量子情報を伝播できると期待される。この超微細相互作用は、全電子線形化拡張平面波密度汎関数理論計算から、天然の $^{29}$Si の約10倍大きいと予測される。超微細誘導型の電子・核制御位相(e-n-CPhase)ゲート操作は、電子を超微細強度が最大となるスイートスポットに一定時間保持するだけで(局所回転を除いて)生成され、電荷/電圧ノイズに対して非常に耐性があると予測される。ダイアバティックなスピンフリップは控えめな磁場($<10^{-6}$ のフリップ確率に対して $>15$ mT)で抑制され、核スピンバスノイズは同位体濃縮によって回避するか、動的デカップリングまたは監視と補償によって緩和できる。磁気共鳴制御と組み合わせることで、この操作は普遍的な量子計算を可能にする。 Spin-$\frac{1}{2}$ $^{119}$Sn nuclei in a silicon semiconductor could make excellent qubits. Nuclear spins in silicon are known to have long coherence times. Tin is isoelectronic with silicon, so we expect electrons can easily shuttle from one Sn atom to another to propagate quantum information via a hyperfine interaction that we predict, from all-electron linearized augmented plane wave density functional theory calculations, to be roughly ten times larger than that of intrinsic $^{29}$Si. A hyperfine-induced electro-nuclear controlled-phase (e-n-CPhase) gate operation, generated (up to local rotations) by merely holding an electron at a sweet-spot of maximum hyperfine strength for a specific duration of time, is predicted to be exceptionally resilient to charge/voltage noise. Diabatic spin flips are suppressed with a modest magnetic field ($>15$ mT for $<10^{-6}$ flip probabilities) and nuclear spin bath noise may be avoided via isotopic enrichment or mitigated using dynamical decoupling or through monitoring and compensation. Combined with magnetic resonance control, this operation enables universal quantum computation.	翻訳日:2023-02-09 12:38:27 公開日:2022-06-13
# フロッケ回路の位相欠陥 Topological Defects in Floquet Circuits ( http://arxiv.org/abs/2206.06272v1 ) ライセンス: Link先を確認	Mao Tian Tan, Yifan Wang and Aditi Mitra	(参考訳) トポロジカルな欠陥を持つ駆動Ising鎖を記述するFloquet回路を導入する。対応するゲートはスピンを反転する欠陥と、クラマース・ワニエ双対変換を明示的に実装する双対性欠陥を含む。フロッケユニタリ進化作用素はそのような欠陥で可換であるが、双対性欠陥は状態の半分を射出するためユニタリではない。これらの欠陥の応用は2つある。 1つは、システムの周りに広がる「空間的」欠陥の存在下での戻り振幅を分析することである。我々は、戻り振幅が欠陥の融合規則と一致していることを明確に検証する。第二の応用は、反周期的・双対的境界条件を実装する「時間的」欠陥の存在下でのユニタリ進化を研究することである。後者の場合、単一の未ペアローカライズされたMajorana 0 モードが現れることを示す。我々は、このFloquet回路の対称性として機能する演算子を明示的に構成する。また, 複数箇所のシステムに対して, 一つの時間ステップで絡み合いエントロピーの解析式を, 上記のすべての欠陥構成に対して提示する。 We introduce a Floquet circuit describing the driven Ising chain with topological defects. The corresponding gates include a defect that flips spins as well as the duality defect that explicitly implements the Kramers-Wannier duality transformation. The Floquet unitary evolution operator commutes with such defects, but the duality defect is not unitary, as it projects out half the states. We give two applications of these defects. One is to analyze the return amplitudes in the presence of "space-like" defects stretching around the system. We verify explicitly that the return amplitudes are in agreement with the fusion rules of the defects. The second application is to study unitary evolution in the presence of "time-like" defects that implement anti-periodic and duality-twisted boundary conditions. We show that a single unpaired localized Majorana zero mode appears in the latter case. We explicitly construct this operator, which acts as a symmetry of this Floquet circuit. We also present analytic expressions for the entanglement entropy after a single time step for a system of a few sites, for all of the above defect configurations.	翻訳日:2023-02-09 12:38:05 公開日:2022-06-13
# 炭化ケイ素中のバナジウム:長い緩和寿命と超微細分解光遷移を有するテレコム可読スピン中心 Vanadium in Silicon Carbide: Telecom-ready spin centres with long relaxation lifetimes and hyperfine-resolved optical transitions ( http://arxiv.org/abs/2206.06240v1 ) ライセンス: Link先を確認	T. Astner, P. Koller, C. M. Gilardoni, J. Hendriks, N. T. Son, I. G. Ivanov, J. U. Hassan, C. H. van der Wal, and M. Trupke	(参考訳) 炭化ケイ素(SiC)のバナジウムは、テレコム波長域における光学遷移のため、量子技術の重要な候補系として浮上している。しかし、スピン緩和寿命(T1)、電荷状態のダイナミクス、レベル構造など、この欠陥ファミリーの重要な特徴は、完全には理解されていない。本研究では,バナジウム欠陥のアンサンブルのT1を定量し,低温で大幅に増強できることを実証した。我々は,100mKで最大25s,1.3Kで1s,90%を超える大きなスピンコントラストを観察した。これらの測定は、アンサンブル電荷状態ダイナミクスの特性によって補完される。安定な電子スピンはさらに、2光子磁気分光による超微細構造の高分解能評価を可能にする。得られた知見は、SiCのバナジウムに基づく高性能スピン-光子界面を指している。 Vanadium in silicon carbide (SiC) is emerging as an important candidate system for quantum technology due to its optical transitions in the telecom wavelength range. However, several key characteristics of this defect family including their spin relaxation lifetime (T1), charge state dynamics, and level structure are not fully understood. In this work, we determine the T1 of an ensemble of vanadium defects, demonstrating that it can be greatly enhanced at low temperature. We observe a large spin contrast exceeding 90% and long spin-relaxation times of up to 25s at 100mK, and of order 1s at 1.3K. These measurements are complemented by a characterization of the ensemble charge state dynamics. The stable electron spin furthermore enables high-resolution characterization of the systems' hyperfine level structure via two-photon magneto-spectroscopy. The acquired insights point towards high-performance spin-photon interfaces based on vanadium in SiC.	翻訳日:2023-02-09 12:37:29 公開日:2022-06-13
# 一般量子ネットワーク上の量子回路の分布 Distribution of Quantum Circuits Over General Quantum Networks ( http://arxiv.org/abs/2206.06437v1 ) ライセンス: Link先を確認	Ranjani G Sundaram, Himanshu Gupta, C. R. Ramakrishnan	(参考訳) 短期量子コンピュータは少数の量子ビットしか保持できない。大規模量子計算を容易にする方法の1つは、量子コンピュータの分散ネットワークである。本研究では,量子回路として表現される量子プログラムを,異種量子コンピュータの量子ネットワークに分散する問題を,分散回路の実行に必要な通信コストを最小化する手法で検討する。我々は2つのコミュニケーション方法を検討する: コンピュータのペア間でクビットのリンクコピーを生成する猫の絡み合い、テレポーテーション。不均一なコンピュータは、アルゴリズムによって選択できる猫の絡み合いとテレポーテーション操作に制約を課す。まず,コミュニケーションのためのテレポーテーションではなく,猫の絡み合いのみを許容する特殊なケースに注目した。この特殊な設定を解くための2段階のヒューリスティックを提供する。 (i)タブサーチによるコンピュータへの量子ビットの割り当てを見つけること、 2) ゲートを局所的に実行するために必要な猫絡み操作を決定するために, セットカバー問題の制約バージョン用に設計された反復的欲求アルゴリズムを用いる。両形態の通信を可能にする一般的な場合に対し、量子回路をいくつかの部分に分け、各部分の特殊設定にヒューリスティックを適用する2つのアルゴリズムを提案する。テレポーテーションは、各部分のソリューションを縫い合わせるために使用される。最後に,ランダムに生成する量子ネットワークと回路の広い範囲でアルゴリズムをシミュレートし,その特性について様々なパラメータについて検討する。 Near-term quantum computers can hold only a small number of qubits. One way to facilitate large-scale quantum computations is through a distributed network of quantum computers. In this work, we consider the problem of distributing quantum programs represented as quantum circuits across a quantum network of heterogeneous quantum computers, in a way that minimizes the overall communication cost required to execute the distributed circuit. We consider two ways of communicating: cat-entanglement that creates linked copies of qubits across pairs of computers, and teleportation. The heterogeneous computers impose constraints on cat-entanglement and teleportation operations that can be chosen by an algorithm. We first focus on a special case that only allows cat-entanglements and not teleportations for communication. 
We provide a two-step heuristic for solving this specialized setting: (i) finding an assignment of qubits to computers using Tabu search, and (ii) using an iterative greedy algorithm designed for a constrained version of the set cover problem to determine cat-entanglement operations required to execute gates locally. For the general case, which allows both forms of communication, we propose two algorithms that subdivide the quantum circuit into several portions and apply the heuristic for the specialized setting on each portion. Teleportations are then used to stitch together the solutions for each portion. Finally, we simulate our algorithms on a wide range of randomly generated quantum networks and circuits, and study the properties of their results with respect to several varying parameters.	翻訳日:2023-02-09 12:32:13 公開日:2022-06-13
# 複合量子シミュレーション Composite Quantum Simulations ( http://arxiv.org/abs/2206.06409v1 ) ライセンス: Link先を確認	Matthew Hagan, Nathan Wiebe	(参考訳) 本稿では、トロッター・スズキ公式やqdriftのような複数の量子シミュレーション手法を単一の合成チャネルに結合し、ゲート数を減らすための古い結合アイデアに基づく枠組みを提案する。このアプローチの背後にある中心的な考え方は、シミュレーション内のチャネルのトロッターまたはQDrift部分にハミルトン項を割り当てるパーティショニングスキームを使用することである。これにより、高次トロッタースズキ式を用いてより大きい項をシミュレートしながら、QDriftを用いて、小さくて多数の項をシミュレートできる。合成チャネルと理想シミュレーションチャネルとの間のダイヤモンド距離の厳密な境界を証明し、合成チャネルの実装コストが漸近的に上界となる条件下では、項の確率的分割と決定論的分割の両方でそれを構成する方法を示す。最後に、分割スキームを決定するための戦略と、同一フレームワーク内で異なるシミュレーション手法を組み込む手法について論じる。 In this paper we provide a framework for combining multiple quantum simulation methods, such as Trotter-Suzuki formulas and QDrift into a single composite channel that builds upon older coalescing ideas for reducing gate counts. The central idea behind our approach is to use a partitioning scheme that allocates a Hamiltonian term to the Trotter or QDrift part of a channel within the simulation. This allows us to simulate small but numerous terms using QDrift while simulating the larger terms using a high-order Trotter-Suzuki formula. We prove rigorous bounds on the diamond distance between the composite channel and the ideal simulation channel and show under what conditions the cost of implementing the composite channel is asymptotically upper bounded by the methods that comprise it for both probabilistic partitioning of terms and deterministic partitioning. Finally, we discuss strategies for determining partitioning schemes as well as methods for incorporating different simulation methods within the same framework.	翻訳日:2023-02-09 12:31:51 公開日:2022-06-13
# (Al$_{x}$Ga$_{1-x}$)$_{2}$O$_{3}$/Ga$_{2}$O$_{3}$ヘテロ構造における2次元電子ガスのフルバンドモンテカルロシミュレーション Full-band Monte Carlo simulation of two-dimensional electron gas in (Al$_{x}$Ga$_{1-x}$)$_{2}$O$_{3}$/Ga$_{2}$O$_{3}$ heterostructures ( http://arxiv.org/abs/2206.06405v1 ) ライセンス: Link先を確認	Avinash Kumar and Uttam Singisetti	(参考訳) $\beta$-酸化ガリウム (Ga$_{2}$O$_{3}$) は、パワーエレクトロニクスやRFスイッチングへの応用が期待され、広く研究されている超ワイドバンドギャップ半導体である。室温でのバルク電子移動度($\sim$200 cm$^{2}$V$^{-1}$s$^{-1}$)は比較的低く、10原子の基本単位胞に由来する30のフォノンモードによって制限されている。理論的に計算された飽和速度は1-2$\times$10$^{7}$ cm s$^{-1}$であり、これはGaNに匹敵する。本研究では、第一原理計算で得たパラメータに基づいて、2DEGにおける高電界電子輸送を調べる。与えられたヘテロ構造設計に対する自己無撞着計算により、閉じ込められた固有関数と固有エネルギーが得られる。LOフォノン・プラズモン遮蔽を考慮し、フェルミの黄金則に基づいてサブバンド内およびサブバンド間の散乱率を計算する。高電界特性は、300 Kにおけるヘテロ構造のフルバンドモンテカルロシミュレーションから抽出する。2DEGおよびバルク中の電子の運動は統合されたモンテカルロプログラムで扱い、いくつかのヘテロ構造設計について定常状態のゾーン占有、過渡ダイナミクス、速度-電界曲線を出力する。飽和の臨界電界はバルク値から大きく変化しないが、2DEG密度が高いほどピーク速度は向上する。低い2DEG密度での速度はLOフォノンの反遮蔽の影響を受け、これはゾーン占有の形成に重要な役割を果たす。また、実験的測定との比較を行い、実験との相違の原因について考察する。 $\beta$-Gallium Oxide (Ga$_{2}$O$_{3}$) is an extensively investigated ultrawide-bandgap semiconductor for potential applications in power electronics and RF switching. The room temperature bulk electron mobility ($\sim$200 cm$^{2}$V$^{-1}$s$^{-1}$) is comparatively low and is limited by the 30 phonon modes originating from its 10-atom primitive cell. The theoretically calculated saturation velocity is 1-2$\times$10$^{7}$ cm s$^{-1}$, which is comparable to GaN. The high field electron transport in the 2DEG is explored in this work based on parameters calculated from first principles. A self-consistent calculation on a given heterostructure design gives the confined eigenfunctions and eigenenergies. The intrasubband and the intersubband scattering rates are calculated based on Fermi's golden rule considering LO phonon-plasmon screening. The high field characteristics are extracted from the full-band Monte Carlo simulation of heterostructures at 300 K. 
The motion of electrons in the 2DEG and the bulk is treated through an integrated Monte Carlo program which outputs the steady state zone population, transient dynamics and the velocity-field curves for a few heterostructure designs. The critical field for saturation does not change significantly from bulk values; however, an improved peak velocity is calculated at a higher 2DEG density. The velocity at low 2DEG densities is impacted by the antiscreening of LO phonons which plays an important role in shaping the zone population. A comparison with the experimental measurements is also carried out and possible origins of the discrepancies with experiments are discussed.	翻訳日:2023-02-09 12:31:34 公開日:2022-06-13
# アシュテカー変数を用いた重力誘起デコヒーレンスモデル A gravitationally induced decoherence model using Ashtekar variables ( http://arxiv.org/abs/2206.06397v1 ) ライセンス: Link先を確認	Max Joseph Fahn, Kristina Giesel and Michael Kobler	(参考訳) 線形化重力へのスカラー場の結合を考え、アシュテカー変数を用いた相対論的重力誘起デコヒーレンスモデルを導出する。このモデルは、リレーショナルフォーマリズムにおいて適切な幾何学的時計を用いてゲージ不変度で定式化され、デコヒーレンスモデルの既存のゲージ不変式を広くする。ディラック可観測空間を構成するためには、既知の可観測写像を、時計と制約の役割が交換されるような双対写像の一種によって拡張する。また、ADM文献に存在する幾何学時計の選択についても論じる。次に、Fock空間上の位相空間の量子化を減らし、重力環境におけるギブス状態を選択し、射影演算子技術を用いて最終マスター方程式を導出する。結果として得られたマスター方程式はリンドブラッド型ではなく、現象学モデルでしばしば仮定される出発点であるが、熱ワイトマン関数で表現する相関関数の形式のためにマスター方程式の有効作用素のレベルでの残留時間依存性も含んでいる。さらに、ここで解析されたモデルにおいて、時間独立な実効系作用素の集合を得るための2番目のマルコフ近似の適用は、いくつかの量子力学モデルよりも単純ではないことを議論する。 We consider the coupling of a scalar field to linearised gravity and derive a relativistic gravitationally induced decoherence model using Ashtekar variables. The model is formulated at the gauge invariant level using suitable geometrical clocks in the relational formalism, broadening existing gauge invariant formulations of decoherence models. For the construction of the Dirac observables we extend the known observable map by a kind of dual map where the role of clocks and constraints is interchanged. We also discuss a second choice of geometrical clocks existing in the ADM literature. Then we apply a reduced phase space quantisation on Fock space and derive the final master equation choosing a Gibbs state for the gravitational environment and using the projection operator technique. The resulting master equation is not automatically of Lindblad type, a starting point sometimes assumed for phenomenological models, but still involves a residual time dependence at the level of the effective operators in the master equation due to the form of the correlation functions that we express in terms of thermal Wightman functions. 
Furthermore, we discuss why in the model analysed here the application of a second Markov approximation in order to obtain a set of time independent effective system operators is less straightforward than in some of the quantum mechanical models.	翻訳日:2023-02-09 12:30:46 公開日:2022-06-13
# 短期量子ハードウェアのための反復量子位相推定プロトコル An iterative quantum-phase-estimation protocol for near-term quantum hardware ( http://arxiv.org/abs/2206.06392v1 ) ライセンス: Link先を確認	Joseph G. Smith, Crispin H. W. Barnes and David R. M. Arvidsson-Shukur	(参考訳) 未知の位相 $\theta$ を持つユニタリ演算を $N_{\textrm{tot}}$ 回適用できる場合、大規模フォールトトレラント量子系は推定誤差のスケーリングを $\mathcal{O} \left[ 1 / \sqrt{N_{\textrm{tot}}} \right]$ から $\mathcal{O} \left[ 1 / N_{\textrm{tot}} \right]$ へ低減できる。近未来の量子デバイスで利用可能なリソースは限られているため、絡み合いを用いないプロトコルが開発され、$\mathcal{O} \left[ \log(N_{\textrm{tot}}) / N_{\textrm{tot}} \right]$ の平均絶対誤差スケーリングを実現している。本稿では、誤差スケーリングを改良した、短期的位相推定のための新しい2段階プロトコルを提案する。我々のプロトコルの最初のステップは、$\theta$ のパラメータ範囲内で、$\theta$ の低標準偏差の推定値をいくつか生成する。第2のステップは、これらの推定値の1つに反復的に絞り込む。本プロトコルの平均絶対誤差は $\mathcal{O} \left[ \sqrt{\log (\log N_{\textrm{tot}})} / N_{\textrm{tot}} \right]$ でスケールする。さらに、定数スケーリング係数と必要な回路深さの低減を示す。本プロトコルは、$N_{\textrm{tot}}$ の現実的な値に対して、漸近的に最適な量子位相推定アルゴリズムを上回ることができる。 Given $N_{\textrm{tot}}$ applications of a unitary operation with an unknown phase $\theta$, a large-scale fault-tolerant quantum system can reduce an estimate's error scaling from $\mathcal{O} \left[ 1 / \sqrt{N_{\textrm{tot}}} \right]$ to $\mathcal{O} \left[ 1 / N_{\textrm{tot}} \right]$. Owing to the limited resources available to near-term quantum devices, entanglement-free protocols have been developed, which achieve an $\mathcal{O} \left[ \log(N_{\textrm{tot}}) / N_{\textrm{tot}} \right]$ mean-absolute-error scaling. Here, we propose a new two-step protocol for near-term phase estimation, with an improved error scaling. Our protocol's first step produces several low-standard-deviation estimates of $\theta$, within $\theta$'s parameter range. The second step iteratively homes in on one of these estimates. Our protocol's mean absolute error scales as $\mathcal{O} \left[ \sqrt{\log (\log N_{\textrm{tot}})} / N_{\textrm{tot}} \right]$. 
Furthermore, we demonstrate a reduction in the constant scaling factor and the required circuit depths: our protocol can outperform the asymptotically optimal quantum-phase estimation algorithm for realistic values of $N_{\textrm{tot}}$.	翻訳日:2023-02-09 12:30:24 公開日:2022-06-13
# 時間最適化マルチ量子ビットゲートの合成とコンパイル Synthesis of and compilation with time-optimal multi-qubit gates ( http://arxiv.org/abs/2206.06387v1 ) ライセンス: Link先を確認	Pascal Baßler, Matthias Zipper, Christopher Cedzich, Markus Heinrich, Patrick Huber, Michael Johanning, Martin Kliesch	(参考訳) 我々は、全結合の固定されたIsing型相互作用を持つ量子コンピューティングプラットフォームに対して、エンタングルを生成するマルチキュービットゲートのクラスを合成する方法を開発した。相互作用の柔軟性に関する唯一の要件は、個々の量子ビットに対してスイッチオンおよびオフが可能であることである。提案手法は,マルチキュービットゲートの時間最適実装を実現する。本研究では,全マルチキュービットゲートタイムが量子ビット数でほぼ線形であることを数値的に示す。このゲート合成をサブルーチンとして、重要なユースケースに対するコンパイル戦略を提供する。 (i) $n$ 量子ビット上の任意のClifford回路は、補助量子ビットを必要とせずに、最大 $n$ 個のマルチキュービットゲートを使用して実装できることを示す。 (ii) 同様の方法で量子フーリエ変換を分解する。 (iii) 分子動力学のシミュレーションをコンパイルし、 (iv) 一般ユニタリに向けてのステップとして,時間最適化マルチキュービットゲートを用いた対角ユニタリのコンパイル法を提案する。モチベーションとして、Ising型相互作用生成のための磁気勾配誘導結合(MAGIC)を用いたマイクロ波制御イオントラップアーキテクチャについて、詳細な議論を行う。 We develop a method to synthesize a class of entangling multi-qubit gates for a quantum computing platform with fixed Ising-type interaction with all-to-all connectivity. The only requirement on the flexibility of the interaction is that it can be switched on and off for individual qubits. Our method yields a time-optimal implementation of the multi-qubit gates. We numerically demonstrate that the total multi-qubit gate time scales approximately linearly in the number of qubits. Using this gate synthesis as a subroutine, we provide compilation strategies for important use cases: (i) we show that any Clifford circuit on $n$ qubits can be implemented using at most $n$ multi-qubit gates without requiring ancilla qubits, (ii) we decompose the quantum Fourier transform in a similar fashion, (iii) we compile a simulation of molecular dynamics, and (iv) we propose a method for the compilation of diagonal unitaries with time-optimal multi-qubit gates, as a step towards general unitaries. 
As motivation, we provide a detailed discussion on a microwave controlled ion trap architecture with magnetic gradient induced coupling (MAGIC) for the generation of the Ising-type interactions.	翻訳日:2023-02-09 12:29:38 公開日:2022-06-13
# 水中乱流チャネル上のパッシブリレーを用いたマルチホップ量子鍵分布 Multi-Hop Quantum Key Distribution with Passive Relays over Underwater Turbulence Channels ( http://arxiv.org/abs/2206.06514v1 ) ライセンス: Link先を確認	Amir Hossein Fahim Raouf, Majid Safari, Murat Uysal	(参考訳) 水中チャネルで経験した吸収、散乱、乱流は、量子通信の範囲を著しく制限する。本稿では,帯域制限を克服するために,中間ノードがソースノードと宛先ノード間の鍵分布を助けるマルチホップ水中量子鍵分布(qkd)について検討する。我々は、測定なしで次の中継ノードや受信機にキュービットをリダイレクトするパッシブリレーの配置を検討する。近距離場解析に基づいて, 大気条件の異なる澄んだ海におけるリレー支援QKDスキームの性能を示す。さらに,システムパラメータ(開口サイズと検出器視野)が到達可能な距離に与える影響について検討する。 Absorption, scattering, and turbulence experienced in underwater channels severely limit the range of quantum communications. In this paper, to overcome range limitations, we investigate a multi-hop underwater quantum key distribution (QKD) where intermediate nodes help the key distribution between the source and destination nodes. We consider deployment of passive-relays which simply redirect the qubits to the next relay node or receiver without any measurement. Based on near-field analysis, we present the performance of relay-assisted QKD scheme in clear ocean under different atmospheric conditions. We further investigate the effect of system parameters (aperture size and detector field-of-view) on the achievable distance.	翻訳日:2023-02-09 12:19:36 公開日:2022-06-13
# 有限クロージャ系における最大閉集合と半空間分離 Maximal Closed Set and Half-Space Separations in Finite Closure Systems ( http://arxiv.org/abs/2001.04417v3 ) ライセンス: Link先を確認	Florian Seiffarth, Tamas Horvath and Stefan Wrobel	(参考訳) いくつかの概念学習問題は、有限基底集合上の抽象閉包系における半空間分離の特別な場合と見なすことができる。閉包系が閉包演算子を介して暗黙的に与えられる典型的なシナリオについて、半空間分離問題はNP完全であることを示す。この負の結果を克服する最初のアプローチとして、この問題を最大閉集合分離に緩和し、この問題を線形数のクロージャ演算子呼び出しで解く一般の欲求アルゴリズムを与え、この境界が鋭いことを示す。第二に,角谷閉包系を考察し,アルゴリズムによって特徴付けられることを証明した。一般問題設定の第一の特別な場合として、グラフ上の角谷閉包系を考察し、禁止グラフマイナーーの観点からこの種の閉包系に十分な条件を与える。第二の特別な場合として、有限格子上の閉包系に着目し、ジェネリック・グリーディアルゴリズムの適応性を高め、仮定格子に関する応用を提案する。 Several concept learning problems can be regarded as special cases of half-space separation in abstract closure systems over finite ground sets. For the typical scenario that the closure system is implicitly given via a closure operator, we show that the half-space separation problem is NP-complete. As a first approach to overcome this negative result, we relax the problem to maximal closed set separation, give a generic greedy algorithm solving this problem with a linear number of closure operator calls, and show that this bound is sharp. For a second direction, we consider Kakutani closure systems and prove that they are algorithmically characterized by the greedy algorithm. As a first special case of the general problem setting, we consider Kakutani closure systems over graphs and give a sufficient condition for this kind of closure systems in terms of forbidden graph minors. For a second special case, we then focus on closure systems over finite lattices, give an improved adaptation of the generic greedy algorithm, and present an application concerning subsumption lattices.	翻訳日:2023-01-11 22:29:46 公開日:2022-06-13
# ピーク制約付きMDPのための証明可能に効率的なモデルフリーアルゴリズム Provably Efficient Model-Free Algorithm for MDPs with Peak Constraints ( http://arxiv.org/abs/2003.05555v6 ) ライセンス: Link先を確認	Qinbo Bai and Vaneet Aggarwal and Ather Gattami	(参考訳) 動的システムの最適化では、変数は一般に制約を持つ。このような問題はCMDP(Constrained Markov Decision Process)としてモデル化することができる。本稿では,有限ホライズンにおける総報酬を最大化し,かつ各エポックにおける制約を確率1で満たす方策をエージェントが選択する,ピーク制約付きマルコフ決定過程(PCMDP)について考察する。我々は,PCMDP問題を制約のない問題に変換するモデルフリーアルゴリズムを提案し,Q学習に基づくアプローチを適用する。提案した PCMDP 問題に対して,確率的に近似的に正しい (PAC) という概念を定義する。提案するアルゴリズムは、エピソード数が $K\geq\Omega(\frac{I^2H^6SA\ell}{\epsilon^2})$ のとき $(\epsilon,p)$-PAC 方策を達成することが証明されている。ここで $S$ と $A$ はそれぞれ状態数と行動数、$H$ はエピソードごとのエポック数、$I$ は制約関数の数であり、$\ell=\log(\frac{SAT}{p})$ である。これは、遷移ダイナミクスが事前に知られていない設定における、ピーク制約付きPCMDPのPAC型解析に関する最初の結果である。提案アルゴリズムをエネルギーハーベスティング問題と単一機械スケジューリング問題で実証し, 検討した最適化問題の理論的上限に近い性能を示す。 In the optimization of dynamic systems, the variables typically have constraints. Such problems can be modeled as a Constrained Markov Decision Process (CMDP). This paper considers the peak Constrained Markov Decision Process (PCMDP), where the agent chooses the policy to maximize total reward in the finite horizon as well as satisfy constraints at each epoch with probability 1. We propose a model-free algorithm that converts PCMDP problem to an unconstrained problem and a Q-learning based approach is applied. We define the concept of probably approximately correct (PAC) to the proposed PCMDP problem. The proposed algorithm is proved to achieve an $(\epsilon,p)$-PAC policy when the episode $K\geq\Omega(\frac{I^2H^6SA\ell}{\epsilon^2})$, where $S$ and $A$ are the number of states and actions, respectively. $H$ is the number of epochs per episode. $I$ is the number of constraint functions, and $\ell=\log(\frac{SAT}{p})$. We note that this is the first result on PAC kind of analysis for PCMDP with peak constraints, where the transition dynamics are not known a priori. 
We demonstrate the proposed algorithm on an energy harvesting problem and a single machine scheduling problem, where it performs close to the theoretical upper bound of the studied optimization problem.	翻訳日:2022-12-24 14:31:43 公開日:2022-06-13
# 重み付きQ-Learningによる深層強化学習 Deep Reinforcement Learning with Weighted Q-Learning ( http://arxiv.org/abs/2003.09280v3 ) ライセンス: Link先を確認	Andrea Cini, Carlo D'Eramo, Jan Peters, Cesare Alippi	(参考訳) Q-learningに基づく強化学習アルゴリズムは、複雑な問題の解決と超人的パフォーマンスの実現に向けて、Deep Reinforcement Learning (DRL)研究を推進している。にもかかわらず、Q-Learningは期待値のノイズを含む推定値の最大値を用いて学習するため、正のバイアスを受けることが知られている。行動価値の体系的な過大評価とDRL法に本質的な高い分散が組み合わさると、誤差が漸進的に蓄積し、学習アルゴリズムの発散を引き起こす可能性がある。理想的には、DRLエージェントがそれぞれの行動の最適性に関する自身の不確実性を考慮し、それを利用して期待リターンをより適切に推定できるようにしたい。この点において、Weighted Q-Learning(WQL)はバイアスを効果的に低減し、確率的環境において顕著な結果を示す。 WQLは推定された行動価値の重み付き和を使用し、その重みは各行動価値が最大である確率に対応するが、これらの確率の計算が実用的なのはテーブル形式(tabular)の設定に限られる。本研究では,ディープガウス過程の効果的な近似として,ドロップアウトで訓練されたニューラルネットワークを用いることで,DRLにおいてWQLの特性の恩恵を受けるための方法論的進歩を提案する。特に, DRLにおける認識論的不確かさ(epistemic uncertainty)の較正された推定値を得るために, Concrete Dropoutの変種を採用する。推定器は、行動価値ネットワークで確率的な順伝播を複数回行い、モンテカルロ方式で重みを計算することによって得られる。この重みは、ドロップアウトによって推定された事後確率分布のもとで各行動価値が最大となる確率のベイズ推定である。我々は, 新たなDeep Weighted Q-Learningアルゴリズムが関連するベースラインと比べてバイアスを低減することを示し, 代表的なベンチマークでその利点を示す実証的証拠を提供する。 Reinforcement learning algorithms based on Q-learning are driving Deep Reinforcement Learning (DRL) research towards solving complex problems and achieving super-human performance on many of them. Nevertheless, Q-Learning is known to be positively biased since it learns by using the maximum over noisy estimates of expected values. Systematic overestimation of the action values coupled with the inherently high variance of DRL methods can lead to incrementally accumulating errors, causing learning algorithms to diverge. Ideally, we would like DRL agents to take into account their own uncertainty about the optimality of each action, and be able to exploit it to make more informed estimations of the expected return. In this regard, Weighted Q-Learning (WQL) effectively reduces bias and shows remarkable results in stochastic environments. 
WQL uses a weighted sum of the estimated action values, where the weights correspond to the probability of each action value being the maximum; however, the computation of these probabilities is only practical in the tabular setting. In this work, we provide methodological advances to benefit from the WQL properties in DRL, by using neural networks trained with Dropout as an effective approximation of deep Gaussian processes. In particular, we adopt the Concrete Dropout variant to obtain calibrated estimates of epistemic uncertainty in DRL. The estimator, then, is obtained by taking several stochastic forward passes through the action-value network and computing the weights in a Monte Carlo fashion. Such weights are Bayesian estimates of the probability of each action value corresponding to the maximum w.r.t. a posterior probability distribution estimated by Dropout. We show how our novel Deep Weighted Q-Learning algorithm reduces the bias w.r.t. relevant baselines and provides empirical evidence of its advantages on representative benchmarks.	翻訳日:2022-12-21 21:50:00 公開日:2022-06-13
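The Monte Carlo weighting described in the abstract above can be illustrated with a minimal sketch. It assumes the stochastic forward passes have already been collected into a plain list of sampled action values; the function names `max_probability_weights` and `weighted_estimate` are hypothetical, and pairing the weights with the per-action sample mean is an assumption made for illustration, not a statement of the authors' exact estimator.

```python
def max_probability_weights(q_samples):
    """Estimate, for each action, the probability that it has the maximum value.

    q_samples: list of sampled action-value vectors, one per stochastic
    forward pass (assumed here to come from a dropout-enabled network).
    """
    n_actions = len(q_samples[0])
    counts = [0] * n_actions
    for sample in q_samples:
        # Index of the action with the largest sampled value in this pass.
        best = max(range(n_actions), key=lambda a: sample[a])
        counts[best] += 1
    return [c / len(q_samples) for c in counts]


def weighted_estimate(q_samples):
    """Weighted sum of mean action values, using the max-probability weights."""
    weights = max_probability_weights(q_samples)
    n = len(q_samples)
    means = [sum(s[a] for s in q_samples) / n for a in range(len(weights))]
    return sum(w * m for w, m in zip(weights, means))


# Four hypothetical forward passes over two actions.
samples = [[1.0, 2.0], [3.0, 0.0], [0.0, 5.0], [1.0, 4.0]]
weights = max_probability_weights(samples)
value = weighted_estimate(samples)
```

Compared with taking the maximum of the mean values, the weighted sum pulls the estimate toward actions that are only sometimes the best, which is the bias-reduction idea the abstract refers to.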
# ランダムスペーシングを用いた分散SGDの分離誤差フィードバック Detached Error Feedback for Distributed SGD with Random Sparsification ( http://arxiv.org/abs/2004.05298v3 ) ライセンス: Link先を確認	An Xu, Heng Huang	(参考訳) 大規模分散ディープラーニングでは,通信ボトルネックが重要な問題となっている。本研究では,不規則なブロック幅の分散SGDを,リングアレーダ互換かつ高い計算効率を持つ勾配圧縮機として検討するが,性能は低下する。この重要な問題に対処するために、我々は通信効率のよい分散SGD、すなわち勾配のばらつきと第二モーメントの間のトレードオフを改善した。このモチベーションにより,非凸問題に対する誤差フィードバックよりも高い収束率を示す新しい分離誤差フィードバック(def)アルゴリズムを提案する。また、Def-Aは、トレーニングの初期段階におけるDefの一般化を加速し、Defよりも優れた一般化境界を示す。さらに,通信効率の高い分散SGDとSGDとの接続を,SGD-IA (Iterate Averaging) と初めて確立した。深層学習実験では,様々な条件下で提案手法の有意な経験的改善が示された。 The communication bottleneck has been a critical problem in large-scale distributed deep learning. In this work, we study distributed SGD with random block-wise sparsification as the gradient compressor, which is ring-allreduce compatible and highly computation-efficient but leads to inferior performance. To tackle this important issue, we improve the communication-efficient distributed SGD from a novel aspect, that is, the trade-off between the variance and second moment of the gradient. With this motivation, we propose a new detached error feedback (DEF) algorithm, which shows better convergence bound than error feedback for non-convex problems. We also propose DEF-A to accelerate the generalization of DEF at the early stages of the training, which shows better generalization bounds than DEF. Furthermore, we establish the connection between communication-efficient distributed SGD and SGD with iterate averaging (SGD-IA) for the first time. Extensive deep learning experiments show significant empirical improvement of the proposed methods under various settings.	翻訳日:2022-12-14 09:59:33 公開日:2022-06-13
# 機能の浄化: 対人訓練が頑健な深層学習を実現する方法 Feature Purification: How Adversarial Training Performs Robust Deep Learning ( http://arxiv.org/abs/2005.10190v4 ) ライセンス: Link先を確認	Zeyuan Allen-Zhu and Yuanzhi Li	(参考訳) 相反する摂動に対して深層学習モデルを守るために相反する訓練を用いた経験的な成功にもかかわらず、相反する摂動の存在の背後にある原理と、相反するトレーニングがそれらを取り除くためにニューラルネットワークにどのような影響を与えるのかは、今のところまだ不明である。本稿では,ニューラルネットワークのトレーニング過程において,特定の低密度混合物が隠れ重みに蓄積されていること,さらに,そのような混合物を除去して隠蔽重みを浄化することが敵のトレーニングの目的である,という特徴浄化(Feature Purification)の原則を提案する。この原理を説明するために,CIFAR-10データセットを用いて実験を行った。また,特定の自然分類タスクに対して,ランダムに初期化勾配勾配勾配を用いた2層ニューラルネットワークをトレーニングすることで,この原理を満足できることを示す理論的結果を示す。技術的には、我々の知る限りでは、次の2つがreluアクティベーションでニューラルネットワークをトレーニングするために同時に保持できることを証明する最初の結果です。 1) 原データのトレーニングは, 半径の小さな対向摂動に対して, 実際に非破壊的である。 2) fgmのような経験的摂動アルゴリズムであっても、逆行訓練は、実際には同じ半径の摂動に対して確実に頑健である。最後に,線形分類器や低次多項式,あるいはニューラルネットワークの神経接核といった複雑性の低いモデルでは,アルゴリズムが何であっても,同じ半径の摂動に対して防御できないことを示した。 Despite the empirical success of using Adversarial Training to defend deep learning models against adversarial perturbations, so far, it still remains rather unclear what the principles are behind the existence of adversarial perturbations, and what adversarial training does to the neural network to remove them. In this paper, we present a principle that we call Feature Purification, where we show one of the causes of the existence of adversarial examples is the accumulation of certain small dense mixtures in the hidden weights during the training process of a neural network; and more importantly, one of the goals of adversarial training is to remove such mixtures to purify hidden weights. We present both experiments on the CIFAR-10 dataset to illustrate this principle, and a theoretical result proving that for certain natural classification tasks, training a two-layer neural network with ReLU activation using randomly initialized gradient descent indeed satisfies this principle. 
Technically, we give, to the best of our knowledge, the first result proving that the following two can hold simultaneously for training a neural network with ReLU activation. (1) Training over the original data is indeed non-robust to small adversarial perturbations of some radius. (2) Adversarial training, even with an empirical perturbation algorithm such as FGM, can in fact be provably robust against ANY perturbations of the same radius. Finally, we also prove a complexity lower bound, showing that low complexity models such as linear classifiers, low-degree polynomials, or even the neural tangent kernel for this network, CANNOT defend against perturbations of this same radius, no matter what algorithms are used to train them.	翻訳日:2022-12-01 04:31:09 公開日:2022-06-13
# ランダム射影の精密表現:低ランク近似とランダムニュートン Precise expressions for random projections: Low-rank approximation and randomized Newton ( http://arxiv.org/abs/2006.10653v3 ) ライセンス: Link先を確認	Micha{\l} Derezi\'nski, Feynman Liang, Zhenyu Liao and Michael W. Mahoney	(参考訳) 低次元の部分空間に投影することで、大きなデータセットの次元性を減らすことがしばしば望ましい。マトリックススケッチは、そのような次元削減を非常に効率的に行うための強力な技術として登場した。スケッチの最悪の性能に関する広範な文献があるが、既存の保証は実際には観察されているものとは大きく異なる。本研究では,ランダム行列のスペクトル解析における最近の進歩を活かし,スケッチによって得られるランダム射影行列の期待値に対して,確実に正確な表現を提供する新しい手法を開発した。これらの式は、低ランク近似から反復確率最適化まで、様々な機械学習タスクにおける次元削減のパフォーマンスを特徴付けることができる。本手法はガウシアンスケッチやラデマッハスケッチなど,いくつかの一般的なスケッチ手法に適用でき,データのスペクトル特性の観点から,これらの手法を高精度に解析できる。実験結果から,これらのスケッチ手法の実践的性能を,低次効果や定数要因まで反映した表現が得られた。 It is often desirable to reduce the dimensionality of a large dataset by projecting it onto a low-dimensional subspace. Matrix sketching has emerged as a powerful technique for performing such dimensionality reduction very efficiently. Even though there is an extensive literature on the worst-case performance of sketching, existing guarantees are typically very different from what is observed in practice. We exploit recent developments in the spectral analysis of random matrices to develop novel techniques that provide provably accurate expressions for the expected value of random projection matrices obtained via sketching. These expressions can be used to characterize the performance of dimensionality reduction in a variety of common machine learning tasks, ranging from low-rank approximation to iterative stochastic optimization. Our results apply to several popular sketching methods, including Gaussian and Rademacher sketches, and they enable precise analysis of these methods in terms of spectral properties of the data. Empirical results show that the expressions we derive reflect the practical performance of these sketching methods, down to lower-order effects and even constant factors.	翻訳日:2022-11-19 12:57:28 公開日:2022-06-13
# DeepVOX:非理想的音声信号における話者認識のための生音声の特徴発見 DeepVOX: Discovering Features from Raw Audio for Speaker Recognition in Non-ideal Audio Signals ( http://arxiv.org/abs/2008.11668v2 ) ライセンス: Link先を確認	Anurag Chowdhury, Arun Ross	(参考訳) 自動音声認識アルゴリズムは通常、メル周波数やガンマタンフィルタバンクなどの予め定義されたフィルタバンクを使用して音声音声を特徴付ける。しかし、これらのフィルタバンクを用いて抽出した特徴は、多様なオーディオ劣化に対する耐性がないことが観察されている。本研究では,大量の音声からフィルタバンク設計を推定する深層学習に基づく手法を提案する。このようなフィルタバンクの目的は、劣化、短命、多言語音声など、理想的でない音声条件にロバストな特徴を抽出することである。この効果のために、1D畳み込みニューラルネットワークは生音声から直接DeepVOXと呼ばれる時間領域のフィルタバンクを学習するように設計されている。次に,フィルタバンクの訓練に適したデータサンプルを効率的にマイニングするために,適応三重項マイニング手法を開発した。第3に,deepvoxフィルタバンクの詳細なアブレーション研究により,抽出された特徴における声道特性と声道特性の両方の存在が明らかになった。 VOXCeleb2、NIST SRE 2008、2010、2018、およびFisher音声データセットの実験結果は、様々な劣化、短い期間、多言語音声におけるDeepVOX特徴の有効性を示す。 DeepVOX機能はまた、xVector-PLDAやiVector-PLDAといった既存の話者認識アルゴリズムの性能向上を示す。 Automatic speaker recognition algorithms typically use pre-defined filterbanks, such as Mel-Frequency and Gammatone filterbanks, for characterizing speech audio. However, it has been observed that the features extracted using these filterbanks are not resilient to diverse audio degradations. In this work, we propose a deep learning-based technique to deduce the filterbank design from vast amounts of speech audio. The purpose of such a filterbank is to extract features robust to non-ideal audio conditions, such as degraded, short duration, and multi-lingual speech. To this effect, a 1D convolutional neural network is designed to learn a time-domain filterbank called DeepVOX directly from raw speech audio. Secondly, an adaptive triplet mining technique is developed to efficiently mine the data samples best suited to train the filterbank. Thirdly, a detailed ablation study of the DeepVOX filterbanks reveals the presence of both vocal source and vocal tract characteristics in the extracted features. 
Experimental results on VoxCeleb2, NIST SRE 2008, 2010 and 2018, and Fisher speech datasets demonstrate the efficacy of the DeepVOX features across a variety of degraded, short duration, and multi-lingual speech. The DeepVOX features are also shown to improve the performance of existing speaker recognition algorithms, such as the xVector-PLDA and the iVector-PLDA.	翻訳日:2022-10-24 22:22:17 公開日:2022-06-13
# CLAS12用ドリフトチャンバーにおけるトラック再構成のためのオートエンコーダ Auto-encoders for Track Reconstruction in Drift Chambers for CLAS12 ( http://arxiv.org/abs/2009.05144v2 ) ライセンス: Link先を確認	Gagik Gavalian	(参考訳) 本稿では,ドリフトチャンバーで欠落したセグメントを推定してトラックを識別することにより,CLAS12の追跡アルゴリズムを支援する機械学習モデルの開発について述べる。オートエンコーダは、トラック軌道から欠落したセグメントを再構成するために使用される。実装されたニューラルネットワークは、欠落したセグメントの位置を約0.35ワイヤ($\approx 0.35$ wires)の精度で確実に再構成でき、欠落したトラックを$>99.8\%$の精度で回復できた。 In this article we describe the development of machine learning models to assist the CLAS12 tracking algorithm by identifying tracks through inferring missing segments in the drift chambers. Auto-encoders are used to reconstruct missing segments from the track trajectory. The implemented neural network was able to reliably reconstruct missing segment positions with an accuracy of $\approx 0.35$ wires, and led to recovery of missing tracks with an accuracy of $>99.8\%$.	翻訳日:2022-10-20 04:03:20 公開日:2022-06-13
# 画像ベースのソルガムヘッド計数(sorghum head counting) Image-Based Sorghum Head Counting When You Only Look Once ( http://arxiv.org/abs/2009.11929v3 ) ライセンス: Link先を確認	Lawrence Mosley and Hieu Pham and Yogesh Bansal and Eric Hare	(参考訳) デジタル農業の最近のトレンドは、作物の品質評価と収量推定のために人工知能にシフトしている。本研究では,パラメータ調整された単発物体検出アルゴリズムを用いて,空中ドローン画像からソルガム頭部を識別・カウントする方法について述べる。提案手法は,ソルガム画像の重要な構造要素を同定し,性能に大きく寄与するパラメータ調整アンカーボックスの選択を動機付ける,新たな探索分析を含む。これらの知見は、ベースラインモデルより優れ、サンプル外平均精度0.95を達成したディープラーニングモデルの開発につながった。 Modern trends in digital agriculture have seen a shift towards artificial intelligence for crop quality assessment and yield estimation. In this work, we document how a parameter tuned single-shot object detection algorithm can be used to identify and count sorghum head from aerial drone images. Our approach involves a novel exploratory analysis that identified key structural elements of the sorghum images and motivated the selection of parameter-tuned anchor boxes that contributed significantly to performance. These insights led to the development of a deep learning model that outperformed the baseline model and achieved an out-of-sample mean average precision of 0.95.	翻訳日:2022-10-15 04:21:26 公開日:2022-06-13
# 深層学習による外惑星の同定 IV。ニューラルネットワークを用いた放射速度測定からの恒星活動信号の除去 Identifying Exoplanets with Deep Learning. IV. Removing Stellar Activity Signals from Radial Velocity Measurements Using Neural Networks ( http://arxiv.org/abs/2011.00003v3 ) ライセンス: Link先を確認	Zoe L. de Beurs, Andrew Vanderburg, Christopher J. Shallue, Xavier Dumusque, Andrew Collier Cameron, Christopher Leet, Lars A. Buchhave, Rosario Cosentino, Adriano Ghedina, Rapha\"elle D. Haywood, Nicholas Langellier, David W. Latham, Mercedes L\'opez-Morales, Michel Mayor, Giusi Micela, Timothy W. Milbourne, Annelies Mortier, Emilio Molinari, Francesco Pepe, David F. Phillips, Matteo Pinamonti, Giampaolo Piotto, Ken Rice, Dimitar Sasselov, Alessandro Sozzetti, St\'ephane Udry, Christopher A. Watson	(参考訳) 正確な放射速度(RV)を観測する太陽系外惑星検出は、星活動によって引き起こされる刺激的なRV信号によって現在制限されている。線形回帰やニューラルネットワークのような機械学習技術は、RV観測から活動信号(スタースポット/ファキュラによる)を効果的に除去できることを示す。以前の取り組みは、ガウス過程回帰(haywood et al. 2014)のようなモデリング技術を使って、時間内にアクティビティ信号を注意深くフィルタリングすることに焦点を当てていた。代わりに、スペクトル線の平均的な形状の変化のみを用いて、系統的に活動信号を取り除き、いつ観測されたかに関する情報は得られない。私たちは、シミュレーションデータ(SOAP 2.0ソフトウェアで生成されたDumusqueなど)と、HARPS-N太陽望遠鏡(Dumusque et al. 2015; Phillips et al. 2016; Collier Cameron et al. 2019)からの太陽の観測の両方に基づいて、機械学習モデルをトレーニングしました。これらの技術は、シミュレーションデータ(82 cm/sから3 cm/s)と、HARPS-N太陽望遠鏡(約1.753 m/sから1.039 m/s)で3年間に約600回観測された実測値(約1.7の改善率)から恒星活動を予測することができる。将来的には、太陽系外にある恒星の観測から活動シグナルを取り除き、太陽のような恒星の周りに居住可能な地球外惑星を検出するのに役立つだろう。 Exoplanet detection with precise radial velocity (RV) observations is currently limited by spurious RV signals introduced by stellar activity. We show that machine learning techniques such as linear regression and neural networks can effectively remove the activity signals (due to starspots/faculae) from RV observations. Previous efforts focused on carefully filtering out activity signals in time using modeling techniques like Gaussian Process regression (e.g. Haywood et al. 2014). 
Instead, we systematically remove activity signals using only changes to the average shape of spectral lines, and no information about when the observations were collected. We trained our machine learning models on both simulated data (generated with the SOAP 2.0 software; Dumusque et al. 2014) and observations of the Sun from the HARPS-N Solar Telescope (Dumusque et al. 2015; Phillips et al. 2016; Collier Cameron et al. 2019). We find that these techniques can predict and remove stellar activity from both simulated data (improving RV scatter from 82 cm/s to 3 cm/s) and from more than 600 real observations taken nearly daily over three years with the HARPS-N Solar Telescope (improving the RV scatter from 1.753 m/s to 1.039 m/s, a factor of ~ 1.7 improvement). In the future, these or similar techniques could remove activity signals from observations of stars outside our solar system and eventually help detect habitable-zone Earth-mass exoplanets around Sun-like stars.	翻訳日:2022-10-01 17:37:48 公開日:2022-06-13
# 偽ニュース検出のためのハイブリッドアンサンブルの試み Hybrid Ensemble for Fake News Detection: An attempt ( http://arxiv.org/abs/2206.13981v1 ) ライセンス: Link先を確認	Lovedeep Singh	(参考訳) フェイクニュース検出は、機械学習の分野で難しい問題となっている。研究者は、古い統計分類モデルと現代のディープラーニングを用いて、いくつかの手法でアプローチしている。現在、データ量の増加、NLPとMLの分野の発展、利用可能な計算能力の増加により、この問題に異なる視点からアプローチするための無限の置換と組み合わせが存在する。本稿では,フェイクニュースに取り組むために異なる手法を試し,古典的機械学習手法と現代的ディープラーニング手法を組み合わせたハイブリッドアンサンブルの構築を試みるとともに,その可能性を提案する。 Fake News Detection has been a challenging problem in the field of Machine Learning. Researchers have approached it via several techniques using old Statistical Classification models and modern Deep Learning. Today, with the growing amount of data, developments in the field of NLP and ML, and an increase in the computation power at disposal, there are infinite permutations and combinations to approach this problem from a different perspective. In this paper, we try different methods to tackle Fake News, and try to build, and propose the possibilities of, a Hybrid Ensemble combining the classical Machine Learning techniques with the modern Deep Learning Approaches.	翻訳日:2022-07-04 01:14:25 公開日:2022-06-13
# (参考訳) 畳み込みニューラルネットワークの構造化プルーニングの活用 Leveraging Structured Pruning of Convolutional Neural Networks ( http://arxiv.org/abs/2206.06247v1 ) ライセンス: CC BY 4.0	Hugo Tessier, Vincent Gripon, Mathieu L\'eonardon, Matthieu Arzel, David Bertrand, Thomas Hannagan	(参考訳) 構造化プルーニングは、多くのコンピュータビジョンタスクにおける最先端技術である畳み込みニューラルネットワークのコストを削減する一般的な方法である。しかし、アーキテクチャによっては、プルーニングは、実際のプルーニングネットワークの減少を防ぐ次元的な相違をもたらす。この問題に対処するため,我々は,構造化されたプルーニングマスクを取り込んで,これらの問題に遭遇せず,効率的に活用できるネットワークを生成する手法を提案する。筆者らは,提案手法の正確な説明を行い, 畳み込み畳み込みニューラルネットワークの, 組込みハードウェア上でのエネルギー消費と推論時間におけるゲイン結果を示す。 Structured pruning is a popular method to reduce the cost of convolutional neural networks, that are the state of the art in many computer vision tasks. However, depending on the architecture, pruning introduces dimensional discrepancies which prevent the actual reduction of pruned networks. To tackle this problem, we propose a method that is able to take any structured pruning mask and generate a network that does not encounter any of these problems and can be leveraged efficiently. We provide an accurate description of our solution and show results of gains, in energy consumption and inference time on embedded hardware, of pruned convolutional neural networks.	翻訳日:2022-06-27 00:35:08 公開日:2022-06-13
# (参考訳) 組込みGPUを用いたプルーニングセマンティックセマンティックセグメンテーションネットワークのエネルギー消費解析 Energy Consumption Analysis of pruned Semantic Segmentation Networks on an Embedded GPU ( http://arxiv.org/abs/2206.06255v1 ) ライセンス: CC BY 4.0	Hugo Tessier, Vincent Gripon, Mathieu L\'eonardon, Matthieu Arzel, David Bertrand, Thomas Hannagan	(参考訳) ディープニューラルネットワークは多くのコンピュータビジョンタスクにおける最先端技術である。エネルギー消費の面での制限は、通常最高の性能に達する非常に大きなネットワークの使用を禁止しているため、自動運転車のコンテキストにおける彼らの展開は特に興味深い。これらのアーキテクチャの複雑さを減らす一般的な方法は、精度を犠牲にすることなく、最も重要でない部分を取り除くプラニングに依存することである。このテーマには多くの文献があるが、興味深いことに、刈り取りがエネルギーに与える影響を計測した作品はほとんどない。本研究では、Cityscapesデータセットを用いて、自動運転のためのセマンティックセグメンテーションのコンテキストで測定することに興味がある。そこで本稿では,Jetson Xavier組込みGPU上にトレーニングアーキテクチャをデプロイした場合に,最近提案した構造化プルーニング手法の影響を解析する。 Deep neural networks are the state of the art in many computer vision tasks. Their deployment in the context of autonomous vehicles is of particular interest, since their limitations in terms of energy consumption prohibit the use of very large networks, that typically reach the best performance. A common method to reduce the complexity of these architectures, without sacrificing accuracy, is to rely on pruning, in which the least important portions are eliminated. There is a large literature on the subject, but interestingly few works have measured the actual impact of pruning on energy. In this work, we are interested in measuring it in the specific context of semantic segmentation for autonomous driving, using the Cityscapes dataset. To this end, we analyze the impact of recently proposed structured pruning methods when trained architectures are deployed on a Jetson Xavier embedded GPU.	翻訳日:2022-06-27 00:24:16 公開日:2022-06-13
# (参考訳) 正確な2次元対応から3次元点雲へ From a few Accurate 2D Correspondences to 3D Point Clouds ( http://arxiv.org/abs/2206.08749v1 ) ライセンス: CC BY 4.0	Trung-Kien Le and Ping Li	(参考訳) キーポイント、対応、投影行列、点雲、高密度雲は画像ベースの3次元再構成における骨格であり、そこでは3次元再構成対象の現実的で自然なモデルを生成する上で、点雲が重要な役割を果たす。良好な3D再構成を実現するためには、点雲は物体の表面のほぼ至るところにある必要がある。本稿では,物体の表面全体を覆う点雲の構築を主目的とし,測地的特徴(geodesic feature, geo-feature)と呼ばれる新機能を提案する。新しい測地関数に基づいて、対象の表面に、正確に推定されたすべての射影行列とともにいくつかの(与えられた)初期世界点が存在する場合、これらの二つの世界点を接続する測地線上の新しい世界点が再構成される。すると、これらの初期世界点に接する表面上の領域は、点雲に覆われる。したがって、初期世界点が表面の周囲にある場合、点雲は表面全体を覆うことになる。本稿では,その対応から世界点と投影行列を推定する新しい手法を提案する。本手法は,世界点と射影行列の閉形式および反復解を導出し,世界点数が7未満で画像数が5以上である場合,提案した解が大域的最適であることを示す。本稿では,それらの対応から世界点と射影行列を推定するために world points from their correspondences (wpfc) というアルゴリズムと,第1のアルゴリズムによって与えられた世界点と射影行列から点雲を生成する creating point clouds (crpc) という別のアルゴリズムを提案する。 Key points, correspondences, projection matrices, point clouds and dense clouds are the skeletons in image-based 3D reconstruction, of which point clouds have the important role in generating a realistic and natural model for a 3D reconstructed object. To achieve a good 3D reconstruction, the point clouds must be almost everywhere in the surface of the object. In this article, with a main purpose to build the point clouds covering the entire surface of the object, we propose a new feature named a geodesic feature or geo-feature. Based on the new geo-feature, if there are several (given) initial world points on the object's surface along with all accurately estimated projection matrices, some new world points on the geodesics connecting any two of these given world points will be reconstructed. Then the regions on the surface bordering by these initial world points will be covered by the point clouds. Thus, if the initial world points are around the surface, the point clouds will cover the entire surface. This article proposes a new method to estimate the world points and projection matrices from their correspondences. 
This method derives the closed-form and iterative solutions for the world points and projection matrices and proves that when the number of world points is less than seven and the number of images is at least five, the proposed solutions are global optimal. We propose an algorithm named World points from their Correspondences (WPfC) to estimate the world points and projection matrices from their correspondences, and another algorithm named Creating Point Clouds (CrPC) to create the point clouds from the world points and projection matrices given by the first algorithm.	翻訳日:2022-06-27 00:14:55 公開日:2022-06-13
# (参考訳) 非ゲートCTスキャンを用いた半教師あり学習によるU-Netモデルによる冠動脈カルシウムスコアリングの自動化 Automated Coronary Calcium Scoring using U-Net Models through Semi-supervised Learning on Non-Gated CT Scans ( http://arxiv.org/abs/2206.10455v1 ) ライセンス: CC BY-SA 4.0	Sanskriti Singh	(参考訳) 毎年、何千人もの罪のない人々が心臓発作で亡くなっている。現在の医療保険の多くは、スキャン画像で石灰化を探す費用をカバーしていないため、診断されていない心臓発作は不意に人々を襲うことが多い。ゲートCTスキャンが撮影されるのは心臓疾患が疑われる場合のみであり、そうでなければ、患者が心臓発作や心疾患の可能性を認識する方法はない。非ゲートCTスキャンはより定期的に撮影されるが、石灰化の検出は困難であり、通常は動脈内の石灰化の特定以外の目的で撮影される。実際、現状では冠動脈石灰化スコアは非ゲートCTスキャンではなく、ゲートCTスキャンでのみ計算されている。Coronary Calcium and chest CT'sデータセットのゲートスキャンでU-Netモデルを訓練したところ、未使用のテストセットでDICE係数0.95を得た。このモデルを非ゲートCTスキャンの予測に用いたところ、平均絶対誤差(MAE)は674.19、バケット分類精度は41%(5クラス)であった。画像と画像に格納された情報の解析を通じて数式を導出し、それを用いて心臓の位置の周囲で画像を自動的に切り出した。半教師あり学習を行うことで、新たに切り出した非ゲートスキャンはゲートCTスキャンに近いものとなり、MAEで91%(62.38)、精度で23%の性能向上が得られた。 Every year, thousands of innocent people die due to heart attacks. Often undiagnosed heart attacks can hit people by surprise since many current medical plans don't cover the costs to require the searching of calcification on these scans. Only if someone is suspected to have a heart problem, a gated CT scan is taken, otherwise, there's no way for the patient to be aware of a possible heart attack/disease. While nongated CT scans are more periodically taken, it is harder to detect calcification and is usually taken for a purpose other than locating calcification in arteries. In fact, in real time coronary artery calcification scores are only calculated on gated CT scans, not nongated CT scans. After training a unet model on the Coronary Calcium and chest CT's gated scans, it received a DICE coefficient of 0.95 on its untouched test set. This model was used to predict on nongated CT scans, performing with a mean absolute error (MAE) of 674.19 and bucket classification accuracy of 41% (5 classes). Through the analysis of the images and the information stored in the images, mathematical equations were derived and used to automatically crop the images around the location of the heart. 
By performing semi-supervised learning the new cropped nongated scans were able to closely resemble gated CT scans, improving the performance by 91% in MAE (62.38) and 23% in accuracy.	翻訳日:2022-06-26 23:52:57 公開日:2022-06-13
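The two metrics quoted in the abstract above can be made concrete with a small sketch. The Dice coefficient is shown on flat binary masks, and the relative MAE improvement is recomputed from the two MAE values the abstract reports (674.19 and 62.38). The function names `dice_coefficient` and `relative_improvement` are hypothetical, and the toy masks are invented for illustration; nothing here reproduces the author's pipeline.

```python
def dice_coefficient(pred, target):
    """Dice overlap between two flat binary masks of equal length."""
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Both masks empty: treat as perfect agreement (a common convention).
        return 1.0
    return 2.0 * intersection / total


def relative_improvement(before, after):
    """Fractional reduction of an error metric, e.g. MAE."""
    return (before - after) / before


# Toy masks: one overlapping pixel, three foreground pixels in total.
toy_dice = dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])

# MAE values taken from the abstract: 674.19 before, 62.38 after.
mae_gain = relative_improvement(674.19, 62.38)
```

With the reported values, `mae_gain` comes out at roughly 0.907, which is consistent with the 91% MAE improvement stated in the abstract.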
# 多目的遺伝的プログラミングにおける意味論のハイライト Highlights of Semantics in Multi-objective Genetic Programming ( http://arxiv.org/abs/2206.05010v2 ) ライセンス: Link先を確認	Edgar Galv\'an, Leonardo Trujillo, Fergal Stapleton	(参考訳) セマンティックス(Semantics)は、遺伝的プログラミング(GP)の研究の領域であり、実行時に遺伝的プログラミングの個体の行動出力を指す。 sdo (semantic-based distance as a additional criterion) という手法が提案されており、これまでのところ、多目的gp (multi-objective gp, mogp) における意味論の研究領域は限られている。 SCC(Semantic similarity-based Crossover)とSCD(Semantic-based Crowding Distance)という,2つのセマンティックなセマンティックなアプローチを使用して,パフォーマンスと多様性の指標の観点からGPの拡張分析を行った。それぞれのアプローチは2つの進化的多目的 (EMO) フレームワークに統合される: 非支配的ソーティング遺伝アルゴリズムII (NSGA-II) と強度パレート進化アルゴリズム2 (SPEA2) の3つのセマンティックアプローチと共に、NSGA-IIとSPEA2の正準形式を厳密に比較する。高度にバランスの取れないバイナリ分類データセットを用いて,新たに提案するsdoのアプローチが,多様性の向上と高ボリューム化とともに,非優位なソリューションを一貫して生成することを示した。 Semantics is a growing area of research in Genetic programming (GP) and refers to the behavioural output of a Genetic Programming individual when executed. This research expands upon the current understanding of semantics by proposing a new approach: Semantic-based Distance as an additional criteriOn (SDO), in the thus far, somewhat limited researched area of semantics in Multi-objective GP (MOGP). Our work included an expansive analysis of the GP in terms of performance and diversity metrics, using two additional semantic-based approaches, namely Semantic Similarity-based Crossover (SCC) and Semantic-based Crowding Distance (SCD). Each approach is integrated into two evolutionary multi-objective (EMO) frameworks: Non-dominated Sorting Genetic Algorithm II (NSGA-II) and the Strength Pareto Evolutionary Algorithm 2 (SPEA2), and along with the three semantic approaches, the canonical form of NSGA-II and SPEA2 are rigorously compared. Using highly-unbalanced binary classification datasets, we demonstrated that the newly proposed approach of SDO consistently generated more non-dominated solutions, with better diversity and improved hypervolume results.	翻訳日:2022-06-26 14:49:32 公開日:2022-06-13
# ユーザ生成VRビデオの品質評価のためのデータベース A Database for Perceived Quality Assessment of User-Generated VR Videos ( http://arxiv.org/abs/2206.08751v1 ) ライセンス: Link先を確認	Yuming Fang, Yiru Yao, Xiangjie Sui, and Kede Ma	(参考訳) 仮想現実(VR)ビデオ(通常は360$^\circ$ビデオ)は、VR技術の急速な発展と、消費者向けの360$^\circ$カメラやディスプレイの普及により、注目を集めている。したがって、ユーザーが生成したVRビデオを人々がどのように知覚するかを理解することが重要である。こうしたビデオは、空間的・時間的に局在することの多い、混在した実際の歪みを伴うことがある。本稿では,コンテンツと歪みの多様性に富む502本のユーザ生成ビデオを含む,最大級の360$^\circ$ビデオデータベースの一つを構築する。 139人のユーザの視聴行動(すなわちスキャンパス)を記録し、4つの異なる視聴条件(2つの開始点 $\times$ 2つの探索時間)のもとで知覚品質の意見スコアを収集する。本研究では,記録データに対する詳細な統計分析を行い,視聴条件が視聴行動や知覚品質に大きく影響することなど,いくつかの興味深い観察結果を得た。また,360$^\circ$ビデオの品質評価や顕著性検出のための計算モデルの評価など,データと分析の他の用途についても検討した。データセットとコードは https://github.com/Yao-Yiru/VR-Video-Database で公開している。 Virtual reality (VR) videos (typically in the form of 360$^\circ$ videos) have gained increasing attention due to the fast development of VR technologies and the remarkable popularization of consumer-grade 360$^\circ$ cameras and displays. Thus it is pivotal to understand how people perceive user-generated VR videos, which may suffer from commingled authentic distortions, often localized in space and time. In this paper, we establish one of the largest 360$^\circ$ video databases, containing 502 user-generated videos with rich content and distortion diversities. We capture viewing behaviors (i.e., scanpaths) of 139 users, and collect their opinion scores of perceived quality under four different viewing conditions (two starting points $\times$ two exploration times). We provide a thorough statistical analysis of recorded data, resulting in several interesting observations, such as the significant impact of viewing conditions on viewing behaviors and perceived quality. Besides, we explore other usage of our data and analysis, including evaluation of computational models for quality assessment and saliency detection of 360$^\circ$ videos. We have made the dataset and code available at https://github.com/Yao-Yiru/VR-Video-Database.	翻訳日:2022-06-26 07:35:49 公開日:2022-06-13
# アルミナセラミックスレーザ加工の機械学習によるプロセス Machine Learning-Driven Process of Alumina Ceramics Laser Machining ( http://arxiv.org/abs/2206.08747v1 ) ライセンス: Link先を確認	Razyeh Behbahani, Hamidreza Yazdani Sarvestani, Erfan Fatehi, Elham Kiyani, Behnam Ashrafi, Mikko Karttunen and Meysam Rahmat	(参考訳) レーザー加工は高度に柔軟な非接触製造技術であり、学界や産業で広く使われている。光と物質間の非線形相互作用のため、レーザー加工パラメータ間の相互関係の理解を提供することにより、加工品質を向上させるため、シミュレーション手法は非常に重要である。一方、実験的な処理パラメータ最適化では、利用可能な処理パラメータ空間上での系統的、結果として時間を要する調査を推奨している。インテリジェントな戦略は、機械学習(ML)技術を用いて、ピコ秒レーザー加工パラメータ間の関係を捕捉し、適切なパラメータの組み合わせを見つけることで、深い、滑らかで欠陥のないパターンを持つ工業級アルミナセラミックスの所望のカットを生成することである。 MLモデルを用いて、ビーム振幅や周波数、スキャナ通過速度、表面の通過数、試料表面からのスキャナの垂直距離などのレーザパラメータを用いて、刻印チャネルの深さ、頂幅、底幅を予測する。レーザーパラメータ間の複雑な相関関係から,ニューラルネットワーク (nn) が出力予測において最も効率的であることが示されている。レーザパラメータと刻まれたチャネル次元の相互接続をキャプチャするMLモデルにより、ターゲットチャネル形状を達成するために必要な入力パラメータを予測することができる。この戦略は、精度や性能を損なうことなく、開発段階での実験レーザー加工のコストと労力を大幅に削減する。開発された技術は、幅広いセラミックレーザ加工プロセスに適用することができる。 Laser machining is a highly flexible non-contact manufacturing technique that has been employed widely across academia and industry. Due to nonlinear interactions between light and matter, simulation methods are extremely crucial, as they help enhance the machining quality by offering comprehension of the inter-relationships between the laser processing parameters. On the other hand, experimental processing parameter optimization recommends a systematic, and consequently time-consuming, investigation over the available processing parameter space. An intelligent strategy is to employ machine learning (ML) techniques to capture the relationship between picosecond laser machining parameters for finding proper parameter combinations to create the desired cuts on industrial-grade alumina ceramic with deep, smooth and defect-free patterns. 
Laser parameters such as beam amplitude and frequency, scanner passing speed and the number of passes over the surface, as well as the vertical distance of the scanner from the sample surface, are used for predicting the depth, top width, and bottom width of the engraved channels using ML models. Owing to the complex correlation between laser parameters, it is shown that Neural Networks (NN) are the most efficient in predicting the outputs. Equipped with an ML model that captures the interconnection between laser parameters and the engraved channel dimensions, one can predict the required input parameters to achieve a target channel geometry. This strategy significantly reduces the cost and effort of experimental laser machining during the development phase, without compromising accuracy or performance. The developed techniques can be applied to a wide range of ceramic laser machining processes.	翻訳日:2022-06-26 07:34:54 公開日:2022-06-13
# ReViSe:スマートフォンカメラを用いたリモートバイタルサイン計測 ReViSe: Remote Vital Signs Measurement Using Smartphone Camera ( http://arxiv.org/abs/2206.08748v1 ) ライセンス: Link先を確認	Donghao Qiao, Amtul Haq Ayesha, Farhana Zulkernine, Raihan Masroor, Nauman Jaffar	(参考訳) 遠隔photoplethysmography(rppg)は、顔ビデオを用いたバイタルサイン推定を可能にするバイオメトリックデータ収集のための高速で効果的で安価で便利な方法である。新型コロナウイルス(COVID-19)のパンデミックでは、遠隔医療サービス提供が不可欠であることが証明されている。本稿では,スマートフォンカメラで撮影したユーザの顔の映像から,心拍数(HR),心拍変動(HRV),酸素飽和度(SpO2),血圧(BP)など,人々のバイタルサインを測定するためのエンドツーエンドのフレームワークを提案する。ディープラーニングに基づくニューラルネットワークモデルを用いて,顔のランドマークをリアルタイムで抽出する。予測された顔ランドマークを用いて、関心領域(roi)とも呼ばれる複数の顔パッチを抽出する。血液量パルス(BVP)信号と呼ばれる抽出された心臓信号のRoIからのノイズを低減するために、いくつかのフィルタが適用される。我々は,東京工科大学rPPGとPURE(Pulse Rate Detection)という2つの公開rPPGデータセットを用いて,機械学習モデルを訓練し,検証した。 a) HR それぞれ 1.73 と 3.95 のビーツパーミニット (bpm) について b) HRVは18.55ms、25.03ms、 c) SpO2 は PURE データセット上の 1.64 の MAE である。実生活環境において、エンドツーエンドのrPPGフレームワークReViSeを検証し、Video-HRデータセットを作成しました。我々のHR推定モデルは、このデータセット上で2.49bpmのMAEを達成した。顔ビデオによるBP測定のために公開されているrPPGデータセットは存在しないため、指先センサーからの信号によるデータセットを使用してモデルをトレーニングし、独自のビデオデータセットであるVideo-BPを作成しました。ビデオ-BPデータセットでは,SBPでは6.7mmHg,DBPでは9.6mmHg,DBPでは9.6mmHgであった。 Remote Photoplethysmography (rPPG) is a fast, effective, inexpensive and convenient method for collecting biometric data as it enables vital signs estimation using face videos. Remote contactless medical service provisioning has proven to be a dire necessity during the COVID-19 pandemic. We propose an end-to-end framework to measure people's vital signs including Heart Rate (HR), Heart Rate Variability (HRV), Oxygen Saturation (SpO2) and Blood Pressure (BP) based on the rPPG methodology from the video of a user's face captured with a smartphone camera. We extract face landmarks with a deep learning-based neural network model in real-time. Multiple face patches also called Region-of-Interests (RoIs) are extracted by using the predicted face landmarks. 
Several filters are applied to reduce the noise from the RoIs in the extracted cardiac signal, called the Blood Volume Pulse (BVP) signal. We trained and validated machine learning models using two public rPPG datasets, namely the TokyoTech rPPG and the Pulse Rate Detection (PURE) datasets, on which our models achieved the following Mean Absolute Errors (MAE): a) for HR, 1.73 and 3.95 Beats-Per-Minute (bpm) respectively, b) for HRV, 18.55 and 25.03 ms respectively, and c) for SpO2, an MAE of 1.64 on the PURE dataset. We validated our end-to-end rPPG framework, ReViSe, in a real-life environment, and thereby created the Video-HR dataset. Our HR estimation model achieved an MAE of 2.49 bpm on this dataset. Since no publicly available rPPG datasets existed for BP measurement with face videos, we used a dataset with signals from a fingertip sensor to train our model and also created our own video dataset, Video-BP. On our Video-BP dataset, our BP estimation model achieved an MAE of 6.7 mmHg for Systolic Blood Pressure (SBP), and an MAE of 9.6 mmHg for Diastolic Blood Pressure (DBP).	翻訳日:2022-06-26 07:11:39 公開日:2022-06-13
# 機械学習支援の物理的誘導ラマン散乱モデルに基づくフレキシブルなラマン増幅器最適化 Flexible Raman Amplifier Optimization Based on Machine Learning-aided Physical Stimulated Raman Scattering Model ( http://arxiv.org/abs/2206.07650v1 ) ライセンス: Link先を確認	Metodi Plamenov Yankov, Francesco Da Ros, Uiara Celine de Moura, Andrea Carena and Darko Zibar	(参考訳) ラマン増幅器の最適化問題について検討した。機械学習(ML)を用いてラマン利得係数の微分可能な補間関数を求め、前方伝搬ラマンポンプの勾配降下最適化を可能にする。これにより、前方励起構成における任意の数のポンプの周波数とパワーを、任意のデータチャネル負荷とスパン長に対して最適化する。前方伝搬モデルは、実験的に訓練された後方励起ラマン増幅器のMLモデルと組み合わせられ、前方増幅器のポンプの周波数とパワー、および後方増幅器のポンプのパワーを共同で最適化する。前方および後方増幅器の共同最適化を、250kmの無中継伝送に対して実証する。 4THzにわたって1dB未満の利得平坦度が達成される。最適化された増幅器は数値シミュレータを用いて検証される。 The problem of Raman amplifier optimization is studied. A differentiable interpolation function is obtained for the Raman gain coefficient using machine learning (ML), which allows for the gradient descent optimization of forward-propagating Raman pumps. Both the frequency and power of an arbitrary number of pumps in a forward pumping configuration are then optimized for an arbitrary data channel load and span length. The forward propagation model is combined with an experimentally-trained ML model of a backward-pumping Raman amplifier to jointly optimize the frequency and power of the forward amplifier's pumps and the powers of the backward amplifier's pumps. The joint forward and backward amplifier optimization is demonstrated for an unrepeatered transmission of 250 km. A gain flatness of $<1$ dB over 4 THz is achieved. The optimized amplifiers are validated using a numerical simulator.	翻訳日:2022-06-16 15:11:08 公開日:2022-06-13
# (参考訳) 行動経路を有するレコメンダ変換器 Recommender Transformers with Behavior Pathways ( http://arxiv.org/abs/2206.06804v1 ) ライセンス: CC BY 4.0	Zhiyu Yao, Xinyang Chen, Sinan Wang, Qinyan Dai, Yumeng Li, Tanchao Zhu, Mingsheng Long	(参考訳) シーケンシャルレコメンデーションでは、正確なレコメンデーションのために、ログ化されたユーザ行動データから進化する振る舞い特性をキャプチャする必要がある。しかし、ユーザーの振る舞いシーケンスは複数のスレッドが絡み合っているスクリプトと見なされる。重要な振る舞いの小さなセットだけが、ユーザの将来のアクションに進化できることが分かっています。その結果,ユーザの将来の行動を予測することは困難である。この特性は,各ユーザの逐次動作を行動経路として決定する。異なるユーザーは独自の行動経路を持っている。既存のシーケンシャルモデルの中で、トランスフォーマーは世界依存の特徴を捉えている。しかし、これらのモデルは、主に自己注意機構を用いて、以前の全ての行動に密分布を提供し、各ユーザーに調整されていない自明な行動によって最終的な予測が圧倒される。本稿では,新しいパスウェイアテンション機構を備えたRecommender Transformer(RETR)を構築する。 RETRは、ユーザ毎に指定された行動経路を動的に計画し、この行動経路を介してネットワークをスペアリングして、推奨に有用な進化パターンを効果的に捕捉することができる。重要な設計は、単純な振る舞いによって振舞いパスが圧倒されるのを防ぐために学習されたバイナリルートである。実世界の7つのデータセットに対するRETRの有効性を実証的に検証し、RETRは最先端の性能を得る。 Sequential recommendation requires the recommender to capture the evolving behavior characteristics from logged user behavior data for accurate recommendations. However, user behavior sequences are viewed as a script with multiple ongoing threads intertwined. We find that only a small set of pivotal behaviors can be evolved into the user's future action. As a result, the future behavior of the user is hard to predict. We conclude this characteristic for sequential behaviors of each user as the Behavior Pathway. Different users have their unique behavior pathways. Among existing sequential models, transformers have shown great capacity in capturing global-dependent characteristics. However, these models mainly provide a dense distribution over all previous behaviors using the self-attention mechanism, making the final predictions overwhelmed by the trivial behaviors not adjusted to each user. In this paper, we build the Recommender Transformer (RETR) with a novel Pathway Attention mechanism. 
RETR can dynamically plan the behavior pathway specified for each user, and sparingly activate the network through this behavior pathway to effectively capture evolving patterns useful for recommendation. The key design is a learned binary route to prevent the behavior pathway from being overwhelmed by trivial behaviors. We empirically verify the effectiveness of RETR on seven real-world datasets and RETR yields state-of-the-art performance.	翻訳日:2022-06-16 11:23:22 公開日:2022-06-13
# (参考訳) 深層学習による複数ラベル後遅延動脈スピンラベルMRIからの脳血流・動脈通過時間マップ推定の高速化 Acceleration of cerebral blood flow and arterial transit time maps estimation from multiple post-labeling delay arterial spin-labeled MRI via deep learning ( http://arxiv.org/abs/2206.06372v1 ) ライセンス: CC BY 4.0	Yiran Li and Ze Wang	(参考訳) 目的: 動脈スピンラベリング(ASL)灌流画像は、脳血流(CBF)を直接かつ絶対的に測定できる。動脈通過時間(ATT)は、ラベルされたスピンが関心のある脳領域に到達するまでの時間を反映する関連する生理学的パラメータである。複数のラベル後遅延(PLD)は、CBFとATTの双方に対して頑健な尺度を提供し、ATTに基づく局所CBFモデリングの最適化を可能にする。しかし、取得時間が長くなると、CBFとATTの推定の品質と精度が低下する可能性がある。我々は、より高い信号対雑音比(SNR)でPLDの数を大幅に削減する新しいネットワークを提案した。方法: CBFとATTの推定は、1つのPLDと2つのPLDについて別々に行った。各モデルは、灌流強調画像(PWI)からCBFおよびATT画像への非線形変換を学習するよう独立に訓練した。結果: 1-PLDモデルと2-PLDモデルはともにCBFにおいて視覚的に従来法を上回り、2-PLDモデルはATT推定においてより正確な構造を示した。提案手法は、SNRを犠牲にすることなく、PLDの数をATTでは6から2に、CBFでは単一のPLDにまで大幅に削減する。結論: 深層学習を用いて、削減したPLDから高品質なCBFおよびATTマップを生成することは可能である。 Purpose: Arterial spin labeling (ASL) perfusion imaging provides direct and absolute measurement of cerebral blood flow (CBF). Arterial transit time (ATT) is a related physiological parameter reflecting the duration for the labeled spins to reach the brain region of interest. Multiple post-labeling delays (PLDs) can provide robust measures of both CBF and ATT, allowing for optimization of regional CBF modeling based on ATT. The prolonged acquisition time can potentially reduce the quality and accuracy of the CBF and ATT estimation. We proposed a novel network to significantly reduce the number of PLDs with higher signal-to-noise ratio (SNR). Method: CBF and ATT estimations were performed for one PLD and two PLDs separately. Each model was trained independently to learn the nonlinear transformation from perfusion weighted image (PWI) to CBF and ATT images. Results: Both one-PLD and two-PLD models outperformed the conventional method visually on CBF and two-PLD model showed more accurate structure on ATT estimation. The proposed method significantly reduces the number of PLDs from 6 to 2 on ATT and even to single PLD on CBF without sacrificing the SNR.
Conclusion: It is feasible to generate high-quality CBF and ATT maps from a reduced number of PLDs using deep learning.	翻訳日:2022-06-16 11:04:07 公開日:2022-06-13
# (参考訳) 材料科学における記号回帰:データからの原子間ポテンシャルの発見 Symbolic Regression in Materials Science: Discovering Interatomic Potentials from Data ( http://arxiv.org/abs/2206.06422v1 ) ライセンス: CC BY 4.0	Bogdan Burlacu, Michael Kommenda, Gabriel Kronberger, Stephan Winkler, Michael Affenzeller	(参考訳) 原子スケールの物質の粒子モデルは、新しい材料の開発とそれらの性質の理解において重要な役割を果たす。粒子シミュレーションの精度は原子間ポテンシャルによって決定され、原子座標や他の性質の関数として原子系のポテンシャルエネルギーを計算することができる。第一原理に基づくab initioポテンシャルは任意のレベルの精度に達するが、高い計算コストによってその適用性は制限される。機械学習(ML)は最近、高価なモデルを電子構造データに基づいて訓練された高効率なサロゲートに置き換えることで、ab initio原子ポテンシャルの高い計算コストを相殺する有効な方法として登場した。現在の多くの手法の中で、記号回帰(SR)は、原子間ポテンシャルの関数形式を発見するための強力な「ホワイトボックス」アプローチとして勢いを増している。本研究は材料科学(MS)における記号回帰の役割を論じ、現在の方法論的課題と最先端の成果を概観する。 生データ(原子位置と関連するポテンシャルエネルギーのスナップショット)から原子ポテンシャルをモデル化する遺伝的プログラミングに基づくアプローチを提示し、ab initio電子構造データを用いて実証的に検証する。 Particle-based modeling of materials at atomic scale plays an important role in the development of new materials and understanding of their properties. The accuracy of particle simulations is determined by interatomic potentials, which allow the potential energy of an atomic system to be calculated as a function of atomic coordinates and potentially other properties. First-principles-based ab initio potentials can reach arbitrary levels of accuracy; however, their applicability is limited by their high computational cost. Machine learning (ML) has recently emerged as an effective way to offset the high computational costs of ab initio atomic potentials by replacing expensive models with highly efficient surrogates trained on electronic structure data. Among a plethora of current methods, symbolic regression (SR) is gaining traction as a powerful "white-box" approach for discovering functional forms of interatomic potentials. This contribution discusses the role of symbolic regression in Materials Science (MS) and offers a comprehensive overview of current methodological challenges and state-of-the-art results.
A genetic programming-based approach for modeling atomic potentials from raw data (consisting of snapshots of atomic positions and associated potential energy) is presented and empirically validated on ab initio electronic structure data.	翻訳日:2022-06-16 10:57:39 公開日:2022-06-13
# (参考訳) Trajectory-Wise Reward を用いたオフライン強化学習 Provably Efficient Offline Reinforcement Learning with Trajectory-Wise Reward ( http://arxiv.org/abs/2206.06426v1 ) ライセンス: CC BY 4.0	Tengyu Xu, Yingbin Liang	(参考訳) 強化学習(RL)の顕著な成功は、訪問した全ての状態-行動ペアの報酬の観察に大きく依存している。しかし、現実世界の多くの応用において、エージェントは軌道全体の質を表すスコアのみを観察することができ、これは「軌道回り報酬」と呼ばれる。このような状況下では、標準のRL法では軌道的報酬をうまく活用することは困難であり、政策評価において大きなバイアスと分散誤差が生じる可能性がある。本稿では、最小二乗法に基づく報酬再分配によるステップごとの代用報酬への軌道戻りを分解し、学習した代用報酬に基づいて悲観的価値反復を行う、Pessimistic vAlue iteRaTion with rEward Decomposition (PARTED)と呼ばれる新しいオフラインRLアルゴリズムを提案する。 PartEDで構築された値関数が常に最適値に対して悲観的であることを保証するため、我々はプロキシ報酬の不確実性を相殺する新しいペナルティ項を設計する。大きな状態空間を持つ一般的なエピソードMDPに対して、オーバーパラメータ化されたニューラルネットワーク関数近似で$\tilde{\mathcal{O}}(D_{\text{eff}}H^2/\sqrt{N})$ suboptimality, where $H$ is the length of episode, $N$ is the total number of sample, $D_{\text{eff}}$ is the effective dimension of the neural tangent kernel matrix。この結果をさらに説明するために、parted は線形 mdps に対して $\tilde{\mathcal{o}}(dh^3/\sqrt{n})$ 準最適性を達成し、ここで $d$ は特徴次元であり、$d_{\text{eff}}=dh$ のとき、ニューラルネットワーク関数近似と一致する。私たちの知る限りでは、PartEDは、トラジェクティブな報酬を持つ一般のMDPにおいて、確実に効率の良い最初のオフラインRLアルゴリズムである。 The remarkable success of reinforcement learning (RL) heavily relies on observing the reward of every visited state-action pair. In many real world applications, however, an agent can observe only a score that represents the quality of the whole trajectory, which is referred to as the {\em trajectory-wise reward}. In such a situation, it is difficult for standard RL methods to well utilize trajectory-wise reward, and large bias and variance errors can be incurred in policy evaluation. In this work, we propose a novel offline RL algorithm, called Pessimistic vAlue iteRaTion with rEward Decomposition (PARTED), which decomposes the trajectory return into per-step proxy rewards via least-squares-based reward redistribution, and then performs pessimistic value iteration based on the learned proxy reward. 
To ensure the value functions constructed by PARTED are always pessimistic with respect to the optimal ones, we design a new penalty term to offset the uncertainty of the proxy reward. For general episodic MDPs with a large state space, we show that PARTED with overparameterized neural network function approximation achieves an $\tilde{\mathcal{O}}(D_{\text{eff}}H^2/\sqrt{N})$ suboptimality, where $H$ is the length of an episode, $N$ is the total number of samples, and $D_{\text{eff}}$ is the effective dimension of the neural tangent kernel matrix. To further illustrate the result, we show that PARTED achieves an $\tilde{\mathcal{O}}(dH^3/\sqrt{N})$ suboptimality with linear MDPs, where $d$ is the feature dimension, which matches the bound for neural network function approximation when $D_{\text{eff}}=dH$. To the best of our knowledge, PARTED is the first offline RL algorithm that is provably efficient in general MDPs with trajectory-wise reward.	翻訳日:2022-06-16 10:36:58 公開日:2022-06-13
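The least-squares reward redistribution mentioned in this abstract can be illustrated with a small sketch. It assumes a tabular setting in which each trajectory is summarized by how often it visits each state-action pair, and it fits per-pair proxy rewards whose sum reproduces the observed trajectory returns. The function names, the visit counts, and the returns are illustrative assumptions; the paper's actual estimator, its neural function approximation, and its pessimism penalty are not reproduced here.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def redistribute_rewards(visit_counts, returns):
    """Fit per-pair proxy rewards via the normal equations (X^T X) theta = X^T y."""
    d = len(visit_counts[0])
    gram = [[sum(row[i] * row[j] for row in visit_counts) for j in range(d)]
            for i in range(d)]
    rhs = [sum(row[i] * g for row, g in zip(visit_counts, returns))
           for i in range(d)]
    return solve_linear_system(gram, rhs)


# Hypothetical data: 3 state-action pairs with hidden rewards [1.0, 0.0, -0.5].
# Only the total return of each trajectory is observed.
visit_counts = [[2, 1, 0], [0, 1, 2], [1, 1, 1], [1, 0, 2]]
returns = [2.0, -1.0, 0.5, 0.0]
proxy_rewards = redistribute_rewards(visit_counts, returns)
```

In this toy case the returns are noise-free and the visit counts have full column rank, so the fitted proxy rewards coincide with the hidden per-step rewards up to floating-point error.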
# (参考訳) 命題型フレームワークにおける最適化の要約 An Abstract View on Optimizations in Propositional Frameworks ( http://arxiv.org/abs/2206.06440v1 ) ライセンス: CC BY 4.0	Yuliya Lierler	(参考訳) 検索最適化問題は、科学や工学の分野では多い。人工知能は、検索最適化問題の解決とモデリングを目的とした検索アルゴリズムと宣言型プログラミング言語の開発に長い間貢献してきた。自動推論と知識表現はAIのサブフィールドであり、これらの開発に特に適している。一般的な自動推論パラダイムの多くは、最適化ステートメントをサポートする言語をユーザに提供しています。これらのパラダイムは言語や計算されたソリューションの品質条件を表現する方法によって大きく異なる。ここでは、パラダイム間の構文的な区別をなくし、パラダイムによって提供される最適化文間の本質的な類似性と相違を見極めるいわゆる重みシステムの統一フレームワークを提案する。この統一的な見通しは、自動推論と知識表現における最適化とモジュラリティの研究において重要な単純化と説明可能性を有しており、異なる形式を橋渡しし翻訳解決法を開発するための技術的手段を提供する。論理プログラミングの理論と実践(tplp)における考察。 Search-optimization problems are plentiful in scientific and engineering domains. Artificial intelligence has long contributed to the development of search algorithms and declarative programming languages geared towards solving and modeling search-optimization problems. Automated reasoning and knowledge representation are the subfields of AI that are particularly vested in these developments. Many popular automated reasoning paradigms provide users with languages supporting optimization statements: MaxSAT or answer set programming, to name a few. These paradigms vary significantly in their languages and in the ways they express quality conditions on computed solutions. Here we propose a unifying framework of so-called weight systems that eliminates syntactic distinctions between paradigms and allows us to see essential similarities and differences between optimization statements provided by paradigms. This unifying outlook has a significant simplifying and explanatory potential in the studies of optimization and modularity in automated reasoning and knowledge representation providing technical means for bridging distinct formalisms and developing translational solvers. Under consideration in Theory and Practice of Logic Programming (TPLP).	翻訳日:2022-06-16 10:34:47 公開日:2022-06-13
# (参考訳) Splatting を用いた画像分解能に対するセグメンテーションネットワークの適用 Fitting Segmentation Networks on Varying Image Resolutions using Splatting ( http://arxiv.org/abs/2206.06445v1 ) ライセンス: CC BY 4.0	Mikael Brudfors and Yael Balbastre and John Ashburner and Geraint Rees and Parashkev Nachev and Sebastien Ourselin and M. Jorge Cardoso	(参考訳) イメージセグメンテーションで使用されるデータは、必ずしも同じグリッド上で定義されない。これは特に医療画像に当てはまるもので、解像度、視野、方向がチャンネルや被験者によって異なる可能性がある。したがって、画像とラベルは、前処理ステップとして、通常同じグリッドに再サンプリングされる。しかし,再サンプリング操作では部分体積効果やぼやけが生じ,有効分解能が変化し,構造間のコントラストが低下する。本稿では,入力データの解像度ミスマッチを自動的に処理するsplat層を提案する。この層は、各画像をフォワードパスが行われる平均空間にプッシュする。スプレート演算子が再サンプリング演算子の随伴であるので、平均空間予測をネイティブラベル空間に引き戻すことができ、損失関数が計算される。これにより、補間による明示的な解決調整の必要性が排除される。シミュレーションおよび実マルチモーダル磁気共鳴画像を用いた2つの公開データセットにおいて,本モデルは,前処理ステップとして再サンプリングを行うよりもセグメンテーション結果を改善することを示す。 Data used in image segmentation are not always defined on the same grid. This is particularly true for medical images, where the resolution, field-of-view and orientation can differ across channels and subjects. Images and labels are therefore commonly resampled onto the same grid, as a pre-processing step. However, the resampling operation introduces partial volume effects and blurring, thereby changing the effective resolution and reducing the contrast between structures. In this paper we propose a splat layer, which automatically handles resolution mismatches in the input data. This layer pushes each image onto a mean space where the forward pass is performed. As the splat operator is the adjoint to the resampling operator, the mean-space prediction can be pulled back to the native label space, where the loss function is computed. Thus, the need for explicit resolution adjustment using interpolation is removed. We show on two publicly available datasets, with simulated and real multi-modal magnetic resonance images, that this model improves segmentation results compared to resampling as a pre-processing step.	翻訳日:2022-06-16 10:01:56 公開日:2022-06-13
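The statement that the splat operator is the adjoint of the resampling operator can be checked numerically in one dimension. The sketch below is an assumption-laden illustration with linear interpolation on a 1D grid, not the authors' layer: `pull` resamples an image at fractional coordinates, `push` scatters values back onto the grid with the same weights, and the two inner products at the end should agree.

```python
import math


def pull(image, coords):
    """Resample: read `image` at fractional positions (linear interpolation)."""
    out = []
    for x in coords:
        i = math.floor(x)
        w = x - i
        left = image[i] if 0 <= i < len(image) else 0.0
        right = image[i + 1] if 0 <= i + 1 < len(image) else 0.0
        out.append((1.0 - w) * left + w * right)
    return out


def push(values, coords, size):
    """Splat: scatter `values` located at `coords` onto a grid of `size` cells."""
    grid = [0.0] * size
    for v, x in zip(values, coords):
        i = math.floor(x)
        w = x - i
        if 0 <= i < size:
            grid[i] += (1.0 - w) * v
        if 0 <= i + 1 < size:
            grid[i + 1] += w * v
    return grid


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


# Hypothetical image, sampling coordinates, and values living in sample space.
image = [0.0, 1.0, 4.0, 9.0, 16.0]
coords = [0.5, 1.25, 3.0, 3.75]
values = [2.0, -1.0, 0.5, 3.0]

# Adjoint identity: <pull(image), values> == <image, push(values)>
lhs = dot(pull(image, coords), values)
rhs = dot(image, push(values, coords, len(image)))
```

Because both operators use the same interpolation weights, a gradient computed in one space can be carried to the other by applying the adjoint, which is what lets a prediction made in the mean space be compared with labels on their native grid.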
# (参考訳) ソロモノフ予測のジレンマ A Dilemma for Solomonoff Prediction ( http://arxiv.org/abs/2206.06473v1 ) ライセンス: CC BY 4.0	Sven Neth	(参考訳) ソロモノフ予測の枠組みは、仮説に対して、そのコルモゴロフ複雑性に反比例する事前確率を割り当てる。よく知られた問題が2つある。第一に、ソロモノフ事前分布は万能チューリングマシンの選択に依存する。第二に、ソロモノフ事前分布は計算不可能である。しかし、どちらの問題にも応答がある。異なるソロモノフ事前分布は、データが増えるにつれて収束する。さらに、ソロモノフ事前分布には計算可能な近似が存在する。本稿では、この2つの応答の間に緊張関係があると論じる。これは、ソロモノフ予測の計算可能な近似が必ずしも収束しないためである。 The framework of Solomonoff prediction assigns prior probability to hypotheses inversely proportional to their Kolmogorov complexity. There are two well-known problems. First, the Solomonoff prior is relative to a choice of universal Turing machine. Second, the Solomonoff prior is not computable. However, there are responses to both problems. Different Solomonoff priors converge with more and more data. Further, there are computable approximations to the Solomonoff prior. I argue that there is a tension between these two responses. This is because computable approximations to Solomonoff prediction do not always converge.	翻訳日:2022-06-16 09:47:40 公開日:2022-06-13
# (参考訳) 最悪クラス性能のためのロバスト蒸留 Robust Distillation for Worst-class Performance ( http://arxiv.org/abs/2206.06479v1 ) ライセンス: CC BY 4.0	Serena Wang and Harikrishna Narasimhan and Yichen Zhou and Sara Hooker and Michal Lukasik and Aditya Krishna Menon	(参考訳) 知識蒸留は教師モデルからの予測を用いた生徒モデルの性能向上に有効な手法であることが証明されている。しかし、最近の研究では、平均効率の利得はデータのサブグループ間で均一ではなく、特に稀なサブグループやクラスにおいて精度の犠牲となることが示されている。ロングテール分布に従う可能性のあるクラス間で高い性能を維持するため,生徒の最悪クラス性能を改善するように調整された蒸留技術を開発した。具体的には、教師と生徒の異なる組み合わせでロバスト最適化の目的関数を導入し、さらに、全体的な精度とロバストな最悪クラス目的関数との任意のトレードオフによるトレーニングを可能にする。実験結果から, 我々のロバスト蒸留技術は, より良い最悪クラス性能を達成するだけでなく, 他のベースライン手法と比較して, 全体性能と最悪クラス性能のトレードオフをパレート的に改善することを示した。理論的には、ロバストな生徒の訓練を目標とするときに、何が良い教師となるかについての洞察を提供する。 Knowledge distillation has proven to be an effective technique in improving the performance of a student model using predictions from a teacher model. However, recent work has shown that gains in average efficiency are not uniform across subgroups in the data, and in particular can often come at the cost of accuracy on rare subgroups and classes. To preserve strong performance across classes that may follow a long-tailed distribution, we develop distillation techniques that are tailored to improve the student's worst-class performance. Specifically, we introduce robust optimization objectives in different combinations for the teacher and student, and further allow for training with any tradeoff between the overall accuracy and the robust worst-class objective. We show empirically that our robust distillation techniques not only achieve better worst-class performance, but also lead to Pareto improvement in the tradeoff between overall performance and worst-class performance compared to other baseline methods. Theoretically, we provide insights into what makes a good teacher when the goal is to train a robust student.	翻訳日:2022-06-16 09:32:24 publicly
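The gap between overall accuracy and worst-class accuracy that motivates this abstract is easy to see on a tiny imbalanced example. The sketch below is an evaluation-side illustration only, with hypothetical labels and a hypothetical `alpha` weight; it is not the training objective used in the paper.

```python
from collections import defaultdict


def per_class_accuracy(labels, predictions):
    """Return a dict mapping each class to its accuracy."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, p in zip(labels, predictions):
        total[y] += 1
        correct[y] += int(y == p)
    return {c: correct[c] / total[c] for c in total}


def overall_accuracy(labels, predictions):
    return sum(int(y == p) for y, p in zip(labels, predictions)) / len(labels)


def tradeoff_score(labels, predictions, alpha):
    """Blend overall and worst-class accuracy; alpha=1 looks only at the worst class."""
    worst = min(per_class_accuracy(labels, predictions).values())
    return (1.0 - alpha) * overall_accuracy(labels, predictions) + alpha * worst


# Hypothetical long-tailed data: class 0 is common, class 1 is rare.
labels = [0] * 8 + [1] * 2
predictions = [0] * 8 + [1, 0]

overall = overall_accuracy(labels, predictions)
worst_class = min(per_class_accuracy(labels, predictions).values())
blended = tradeoff_score(labels, predictions, alpha=0.5)
```

Here a single mistake on the rare class barely moves the overall accuracy but halves the worst-class accuracy, which is why a model selected on average accuracy alone can hide poor performance on the tail.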
# (参考訳) RigNeRF:完全制御可能なニューラル3Dポートレイト RigNeRF: Fully Controllable Neural 3D Portraits ( http://arxiv.org/abs/2206.06481v1 ) ライセンス: CC BY 4.0	ShahRukh Athar, Zexiang Xu, Kalyan Sunkavalli, Eli Shechtman and Zhixin Shu	(参考訳) ニューラルレイディアンス場(NeRF)のような体積的ニューラルレンダリング法は、フォトリアリスティックな新規なビュー合成を可能にしている。しかし、標準的な形式では、NeRFはシーン内の人間の頭のようなオブジェクトの編集をサポートしない。本研究では,単に新しい視点合成ではなく,単一のポートレートビデオから学習した頭部のポーズと表情の完全な制御を可能にするシステムである rignerf を提案する。 3d morphable face model (3dmm) によって誘導される変形場を用いて, 頭部の姿勢と表情の変化をモデル化する。 3DMMは、3DMM変形の残差のみを予測することを学習し、入力シーケンスに存在しない新しい(厳密な)ポーズや(厳密でない)表現を描画するRigNeRFの先行として効果的に機能する。スマートフォンで撮影した訓練対象のショートビデオのみを用いて,明快な頭部ポーズと表情制御を備えたポートレートシーンのフリービュー合成における本手法の有効性を実証した。プロジェクトページは以下の通りである。 Volumetric neural rendering methods, such as neural radiance fields (NeRFs), have enabled photo-realistic novel view synthesis. However, in their standard form, NeRFs do not support the editing of objects, such as a human head, within a scene. In this work, we propose RigNeRF, a system that goes beyond just novel view synthesis and enables full control of head pose and facial expressions learned from a single portrait video. We model changes in head pose and facial expressions using a deformation field that is guided by a 3D morphable face model (3DMM). The 3DMM effectively acts as a prior for RigNeRF that learns to predict only residuals to the 3DMM deformations and allows us to render novel (rigid) poses and (non-rigid) expressions that were not present in the input sequence. Using only a smartphone-captured short video of a subject for training, we demonstrate the effectiveness of our method on free view synthesis of a portrait scene with explicit head pose and expression controls. The project page can be found here: http://shahrukhathar.github.io/2022/06/06/RigNeRF.html	翻訳日:2022-06-16 09:31:18 公開日:2022-06-13
# (参考訳) ニューラルデータ拡張と機械学習モデルによるfMRIへのfNIRSマッピング Mapping fNIRS to fMRI with Neural Data Augmentation and Machine Learning Models ( http://arxiv.org/abs/2206.06486v1 ) ライセンス: CC BY 4.0	Jihyun Hur, Jaeyeong Yang, Hoyoung Doh, Woo-Young Ahn	(参考訳) 神経画像技術の進歩は、人間の心がどのように機能するかを理解する新しい洞察を与えました。機能的磁気共鳴イメージング(fMRI)は最も広く用いられている神経イメージング技術であり、個人差のfMRIベースのマーカーへの関心が高まっている。しかし、その効用は高いコストと子供や幼児を含む特定の人口からの獲得が困難であるために制限されることが多い。 fMRIマーカーのサーロゲートマーカーまたはニューラル相関は、重要な実用的意味を持つが、fMRIマーカーに対するスタンドアローン予測器はほとんどない。そこで我々は、機械学習モデルとデータ拡張を用いて、能動近赤外分光法(fNIRS)の多変量パターンから、人間の認識のfMRIマーカーを精度良く予測した。全2回の訪問において,fNIRS,fMRI,fNIRS,fMRIの2つの認知タスク(ストップ信号タスクと確率的逆転学習タスク)を施行した50名の被験者を募集した。 MLモデルとデータ拡張を用いて、前頭前皮質の48チャンネルのfNIRS活性化による応答抑制や予測誤り信号のfMRIマーカーの確立を予測できる。これらの結果から、fNIRSはfMRI活性化の補助的マーカーとなり、幼児を含む様々な集団の理解を広げる可能性が示唆された。 Advances in neuroimaging techniques have provided us novel insights into understanding how the human mind works. Functional magnetic resonance imaging (fMRI) is the most popular and widely used neuroimaging technique, and there is growing interest in fMRI-based markers of individual differences. However, its utility is often limited due to its high cost and difficulty acquiring from specific populations, including children and infants. Surrogate markers, or neural correlates of fMRI markers, would have important practical implications, but we have few stand-alone predictors for the fMRI markers. Here, using machine learning (ML) models and data augmentation, we predicted well-validated fMRI markers of human cognition from multivariate patterns of functional near-infrared spectroscopy (fNIRS), a portable and relatively inexpensive optical neuroimaging technique. We recruited 50 human participants who performed two cognitive tasks (stop signal task and probabilistic reversal learning task), while neural activation was measured with either fNIRS or fMRI at each of the total two visits. 
Using ML models and data augmentation, we could predict the well-established fMRI markers of response inhibition or prediction error signals from 48-channel fNIRS activation in the prefrontal cortex. These results suggest that fNIRS might offer a surrogate marker of fMRI activation, which would broaden our understanding of various populations, including infants.	翻訳日:2022-06-16 09:30:18 公開日:2022-06-13
# (参考訳) 未ラベル画像からのタスク依存型ゲーム状態表現の学習 Learning Task-Independent Game State Representations from Unlabeled Images ( http://arxiv.org/abs/2206.06490v1 ) ライセンス: CC BY 4.0	Chintan Trivedi, Konstantinos Makantasis, Antonios Liapis, Georgios N. Yannakakis	(参考訳) 自己教師付き学習(SSL)技術は、高次元複素データからコンパクトで情報的な表現を学習するために広く用いられている。画像分類などの多くのコンピュータビジョンタスクにおいて、このような手法は教師付き学習手法を超える最先端の結果が得られる。本稿では,ゲームの状態表現を正確に学習するタスクにおいて,SSL手法をどの程度活用できるかを検討する。そこで本研究では,VizDoom,CARLAレーシングシミュレータ,Google Research Football Environmentという3つの異なる3Dゲームから,ゲーム映像フレームとそれに対応するゲームの内部状態を収集する。画像エンコーダを生フレームのみを使用して3つの広く使用されているsslアルゴリズムでトレーニングし,学習した表現から内部状態変数を復元する。その結果,ImageNetなどのトレーニング済みベースラインモデルと比較して,SSL表現とゲーム内部状態の相関が著しく高いことがわかった。このような発見は、SSLベースのビジュアルエンコーダは、特定のタスクに合わせたものではなく、一般的な -- ゲームピクセル情報のみから情報的なゲーム表現が得られることを示唆している。このような表現は、ゲームプレイング、コンテンツ生成、プレイヤーモデリングなど、ゲームにおける下流学習タスクのパフォーマンスを高める基盤を形成することができる。 Self-supervised learning (SSL) techniques have been widely used to learn compact and informative representations from high-dimensional complex data. In many computer vision tasks, such as image classification, such methods achieve state-of-the-art results that surpass supervised learning approaches. In this paper, we investigate whether SSL methods can be leveraged for the task of learning accurate state representations of games, and if so, to what extent. For this purpose, we collect game footage frames and corresponding sequences of games' internal state from three different 3D games: VizDoom, the CARLA racing simulator and the Google Research Football Environment. We train an image encoder with three widely used SSL algorithms using solely the raw frames, and then attempt to recover the internal state variables from the learned representations. Our results across all three games showcase significantly higher correlation between SSL representations and the game's internal state compared to pre-trained baseline models such as ImageNet. 
Such findings suggest that SSL-based visual encoders can yield general -- not tailored to a specific task -- yet informative game representations solely from game pixel information. Such representations can, in turn, form the basis for boosting the performance of downstream learning tasks in games, including gameplaying, content generation and player modeling.	翻訳日:2022-06-16 09:24:02 公開日:2022-06-13
# (参考訳) 粒子群最適化による高g最適精密設計の高速計算 Fast Computation of Highly G-optimal Exact Designs via Particle Swarm Optimization ( http://arxiv.org/abs/2206.06498v1 ) ライセンス: CC BY 4.0	Stephen J. Walsh and John J. Borkowski	(参考訳) 応答面モデルのための正確な$G$-optimal設計を提案する計算は、過去2年間にアルゴリズム開発によって漸進的に改善された難しい計算である。これらの最適設計は、計算の困難さとコストのために、アプリケーションでは広く考慮されていない。座標交換(CEXCH)、遺伝的アルゴリズム(GA)、および計算コストの大きい$I_\lambda$-optimalityアルゴリズム(G(I_\lambda)$-CEXCH)による比較的新しい$G$-optimalである。 particle swarm optimization (pso)は、多くのアプリケーションで広く使われているが、その広範囲な成功にもかかわらず、最適設計問題への応用は比較的少ない。本稿では,PSOを最適設計問題に適用するための拡張手法を提案する。次に、psoを用いて、工業実験で一般的な実験サイズである$k = 1, 2, 3, 4, 5$の設計因子を含むいくつかのシナリオの最適設計を作成する。これらの結果と過去20年間の文献で公表された$g$-optimalデザインを比較した。 GAが生成した$G$-optimal design for $K=1, 2, 3$ factors has unchallenged for 14 years。 psoはこれらのシナリオの最適設計が改善され、最先端のアルゴリズムである$g(i_\lambda)$-cexchに匹敵する計算コストがかかることを実証した。さらに、PSOは、現在知られているものよりも、$K=4, 5$因子に対して、同等以上の$G$-optimal設計を実現できることを示す。これらの結果から,PSOは既存の手法よりも高いG$-Optimal設計を効率的に生成できる可能性が示唆された。 Computing proposed exact $G$-optimal designs for response surface models is a difficult computation that has received incremental improvements via algorithm development in the last two-decades. These optimal designs have not been considered widely in applications in part due to the difficulty and cost involved with computing them. Three primary algorithms for constructing exact $G$-optimal designs are presented in the literature: the coordinate exchange (CEXCH), a genetic algorithm (GA), and the relatively new $G$-optimal via $I_\lambda$-optimality algorithm ($G(I_\lambda)$-CEXCH) which was developed in part to address large computational cost. Particle swarm optimization (PSO) has achieved widespread use in many applications, but to date, its broad-scale success notwithstanding, has seen relatively few applications in optimal design problems. In this paper we develop an extension of PSO to adapt it to the optimal design problem. 
We then employ PSO to generate optimal designs for several scenarios covering $K = 1, 2, 3, 4, 5$ design factors, which are common experimental sizes in industrial experiments. We compare these results to all $G$-optimal designs published in the last two decades of literature. Published $G$-optimal designs generated by GA for $K=1, 2, 3$ factors have stood unchallenged for 14 years. We demonstrate that PSO has found improved $G$-optimal designs for these scenarios, and it does this with comparable computational cost to the state-of-the-art algorithm $G(I_\lambda)$-CEXCH. Further, we show that PSO is able to produce equal or better $G$-optimal designs for $K= 4, 5$ factors than those currently known. These results suggest that PSO is superior to existing approaches for efficiently generating highly $G$-optimal designs.	翻訳日:2022-06-16 09:08:12 公開日:2022-06-13
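For readers unfamiliar with particle swarm optimization, a generic global-best PSO loop is sketched below. It minimizes a simple sphere function rather than a design criterion, and the inertia and acceleration coefficients are common textbook defaults chosen here as assumptions; the extension of PSO to exact design problems described in the abstract is not reproduced. In a design setting, `objective` would instead score a candidate design, for example by its maximum scaled prediction variance.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=200,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Minimize `objective` over a box given by `bounds` with global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]              # personal best positions
    best_val = [objective(p) for p in pos]      # personal best values
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]  # swarm-wide best

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Keep the particle inside the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


def sphere(x):
    """Toy objective with its minimum (zero) at the origin."""
    return sum(v * v for v in x)


best_point, best_value = pso_minimize(sphere, bounds=[(-5.0, 5.0), (-5.0, 5.0)])
```

Each particle is pulled toward both its own best position and the swarm's best position, so the method needs only objective evaluations and no gradients, which suits criteria such as $G$-optimality that are not smooth.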
# (参考訳) 量子化アウェアトレーニングにおける最適クリッピング法とマグニチュードアウェア微分法 Optimal Clipping and Magnitude-aware Differentiation for Improved Quantization-aware Training ( http://arxiv.org/abs/2206.06501v1 ) ライセンス: CC BY 4.0	Charbel Sakr, Steve Dai, Rangharajan Venkatesan, Brian Zimmer, William J. Dally, Brucek Khailany	(参考訳) データクリッピングは、量子化操作におけるノイズの低減と量子化対応トレーニング(QAT)の達成可能な精度の向上に不可欠である。現在のプラクティスは、クリッピング閾値スカラーを設定するためのヒューリスティックスに依存しており、最適であることを示すことはできない。我々は,MSE最適クリッピングスカラーを決定する再帰アルゴリズムであるOptimally Clipped Tensors And Vectors (OCTAV)を提案する。高速Newton-Raphson法から派生したOCTAVは、QATルーチンの各イテレーションにおいて、テンソル毎に、フライ時に最適なクリッピングスカラーを見つける。したがって、QATアルゴリズムは各ステップで証明可能な最小量子化ノイズで定式化される。さらに, qatにおける一般的な勾配推定手法の限界を明らかにし, 精度向上のための修正としてマグニチュードアウェア微分を提案する。実験的に、OCTAV対応QATは複数のタスクで最先端の精度を達成する。その中には、ImageNet上のResNetsとMobileNetsのトレーニングとリトレーニング、BERTモデルを使用したSquadの微調整が含まれる。本研究では,量子化操作を適宜挿入する場合を除いて,ベースラインのトレーニングレシピの変更は不要である。 Data clipping is crucial in reducing noise in quantization operations and improving the achievable accuracy of quantization-aware training (QAT). Current practices rely on heuristics to set clipping threshold scalars and cannot be shown to be optimal. We propose Optimally Clipped Tensors And Vectors (OCTAV), a recursive algorithm to determine MSE-optimal clipping scalars. Derived from the fast Newton-Raphson method, OCTAV finds optimal clipping scalars on the fly, for every tensor, at every iteration of the QAT routine. Thus, the QAT algorithm is formulated with provably minimum quantization noise at each step. In addition, we reveal limitations in common gradient estimation techniques in QAT and propose magnitude-aware differentiation as a remedy to further improve accuracy. Experimentally, OCTAV-enabled QAT achieves state-of-the-art accuracy on multiple tasks. These include training-from-scratch and retraining ResNets and MobileNets on ImageNet, and Squad fine-tuning using BERT models, where OCTAV-enabled QAT consistently preserves accuracy at low precision (4-to-6-bits). 
Our results require no modifications to the baseline training recipe, except for the insertion of quantization operations where appropriate.	翻訳日:2022-06-16 08:45:11 公開日:2022-06-13
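The OCTAV abstract above describes a Newton-Raphson recursion for an MSE-optimal clipping scalar. The following is a minimal sketch of what one such recursion can look like, not the authors' implementation. It assumes a noise model in which values inside [-s, s] incur uniform quantization noise of s^2 * 4^(-bits) / 3 and values outside incur the squared clipping error; the function names, the starting point (mean magnitude), and the iteration count are all hypothetical choices made for illustration.

```python
import random


def clipped_quantization_noise(values, s, bits):
    """Approximate MSE of clipping to [-s, s] followed by uniform
    quantization with 2**bits levels (assumed noise model)."""
    # Step size is 2s / 2**bits, so the uniform noise power is step**2 / 12.
    step_noise = (4.0 ** -bits) / 3.0 * s * s
    inside = sum(1 for v in values if abs(v) <= s)
    clip_err = sum((abs(v) - s) ** 2 for v in values if abs(v) > s)
    return (step_noise * inside + clip_err) / len(values)


def newton_clipping_scalar(values, bits, iterations=20):
    """Iterate the Newton step for the noise model above.

    Setting s_next = s - J'(s) / J''(s) for that model simplifies to
    sum(|v| for clipped v) / (a * count_inside + count_clipped),
    with a = 4**(-bits) / 3.
    """
    a = (4.0 ** -bits) / 3.0
    # Assumed starting point: the mean magnitude of the tensor.
    s = sum(abs(v) for v in values) / len(values)
    for _ in range(iterations):
        clipped_sum = sum(abs(v) for v in values if abs(v) > s)
        clipped = sum(1 for v in values if abs(v) > s)
        inside = len(values) - clipped
        s = clipped_sum / (a * inside + clipped)
    return s


if __name__ == "__main__":
    random.seed(0)
    tensor = [random.gauss(0.0, 1.0) for _ in range(2000)]
    s_star = newton_clipping_scalar(tensor, bits=4)
    print("clipping scalar:", s_star)
    print("noise at s*:", clipped_quantization_noise(tensor, s_star, 4))
```

For a roughly Gaussian tensor the recursion approaches its fixed point from below within a handful of iterations. Degenerate inputs, such as tensors whose entries all share one magnitude, would need extra handling that this sketch omits.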
# (参考訳) 深部画像に基づくポーズ推定器を用いたスマートベッドの圧力データからのポーズ推定 Estimating Pose from Pressure Data for Smart Beds with Deep Image-based Pose Estimators ( http://arxiv.org/abs/2206.06518v1 ) ライセンス: CC BY 4.0	Vandad Davoodnia, Saeed Ghorbani, Ali Etemad	(参考訳) ベッド内ポーズ推定は、病院の患者モニタリング、睡眠研究、スマートホームなどの分野での価値を示している。本稿では,既存のポーズ推定器を用いて,高度にあいまいな圧力データから身体のポーズを検出するための異なる戦略について検討する。プレトレーニングされたポーズ推定器の性能を, 2つの圧力データセット上で直接あるいは再トレーニングすることによって検証する。また,共通目的ポーズ推定モジュールの期待入力空間に近い表現にあいまいな圧力マップを変換する学習可能な前処理領域適応ステップを利用した他の戦略についても検討する。そこで我々は,複数スケールの完全畳み込みネットワークを用いて,プレトレーニングされたポーズ推定モジュールに圧力マップのポーズ特化特性を提供する。提案手法の完全解析により,学習可能な事前処理モジュールと既存の画像ベースのポーズ推定器を併用することにより,高度にあいまいな圧力点などの問題を克服し,ポーズ推定精度を極めて高めることができた。 In-bed pose estimation has shown value in fields such as hospital patient monitoring, sleep studies, and smart homes. In this paper, we explore different strategies for detecting body pose from highly ambiguous pressure data, with the aid of pre-existing pose estimators. We examine the performance of pre-trained pose estimators by using them either directly or by re-training them on two pressure datasets. We also explore other strategies utilizing a learnable pre-processing domain adaptation step, which transforms the vague pressure maps to a representation closer to the expected input space of common purpose pose estimation modules. Accordingly, we used a fully convolutional network with multiple scales to provide the pose-specific characteristics of the pressure maps to the pre-trained pose estimation module. Our complete analysis of different approaches shows that the combination of learnable pre-processing module along with re-training pre-existing image-based pose estimators on the pressure data is able to overcome issues such as highly vague pressure points to achieve very high pose estimation accuracy.	翻訳日:2022-06-16 08:15:20 公開日:2022-06-13
# (参考訳) 大規模メモリベースモデル編集 Memory-Based Model Editing at Scale ( http://arxiv.org/abs/2206.06520v1 ) ライセンス: CC BY 4.0	Eric Mitchell, Charles Lin, Antoine Bosselut, Christopher D. Manning, Chelsea Finn	(参考訳) 最大のニューラルネットワークでさえエラーを起こし、世界が変化すれば、かつて正しかった予測が無効になる可能性がある。モデルエディタはベース(事前学習済み)モデルの振る舞いを局所的に更新し、更新された知識を注入したり、望ましくない振る舞いを修正する。既存のモデルエディタは、将来性を示しているが、表現力に乏しい: 編集の意図したスコープ(編集によって影響を受ける例)を正確にモデル化するのに苦労し、編集にゆるやかに関係しているテストインプットの予測が不正確になり、多くの編集の後完全に失敗することが多い。高容量の代替手法として,編集を明示的なメモリに格納し,必要に応じてベースモデルの予測を調整するためにそれらに対する推論を学習するSERAC(Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model)を提案する。モデルエディタの厳密な評価を可能にするために,質問応答,ファクトチェック,対話生成に基づく3つの難解な言語モデル編集問題を提案する。 SERACだけが3つの問題に対して高い性能を達成し、モデル編集に対する既存のアプローチを著しく上回っていることがわかった。コード、データ、および追加のプロジェクト情報は https://sites.google.com/view/serac-editing で公開される予定である。 Even the largest neural networks make errors, and once-correct predictions can become invalid as the world changes. Model editors make local updates to the behavior of base (pre-trained) models to inject updated knowledge or correct undesirable behaviors. Existing model editors have shown promise, but also suffer from insufficient expressiveness: they struggle to accurately model an edit's intended scope (examples affected by the edit), leading to inaccurate predictions for test inputs loosely related to the edit, and they often fail altogether after many edits. As a higher-capacity alternative, we propose Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model (SERAC), which stores edits in an explicit memory and learns to reason over them to modulate the base model's predictions as needed. To enable more rigorous evaluation of model editors, we introduce three challenging language model editing problems based on question answering, fact-checking, and dialogue generation. We find that only SERAC achieves high performance on all three problems, consistently outperforming existing approaches to model editing by a significant margin. 
Code, data, and additional project information will be made available at https://sites.google.com/view/serac-editing.	翻訳日:2022-06-16 07:58:36 公開日:2022-06-13
# 特殊相対性理論と決定不能性からの相対的チャーチ・チューリング・ドイッチュのテーゼ A Relative Church-Turing-Deutsch Thesis from Special Relativity and Undecidability ( http://arxiv.org/abs/2206.06419v1 ) ライセンス: Link先を確認	Blake Wilson, Ethan Dickey, Vaishnavi Iyer and Sabre Kais	(参考訳) 1950年のチューリングの先駆的研究に始まり、人工知能は意識がチューリングマシンによってシミュレートできると提案してきた。これは、宇宙がコンピュータ上のシミュレーションであるという万物の理論の可能性を示唆し、シミュレーションの中に私たちが存在することを証明できるかどうかという疑問を提起する。本研究では,計算可能な \textit{local} マシンを,古典的チューリングマシンである \textit{global} でシミュレートした相対計算モデルを構築する。ローカルマシンがグローバルシミュレータの \textbf{simulation properties} を計算する問題は,停止問題と同じ意味で決定不能であることを示す。次に,グローバルシミュレータが蓄積した時間,空間,エラーはシミュレーション特性であり,その計算は決定不能であることを示す。これらのシミュレーション特性は、我々が宇宙で経験したのと同じ定数時間局所計算複雑性を持つローカルマシンの量子力学を大域的チューリングマシンが計算する相対的チャーチ・チューリング・ドイッチュのテーゼを構築するために使用する相対モデルにおいて、特殊相対論的効果をもたらす。 Beginning with Turing's seminal work in 1950, artificial intelligence proposes that consciousness can be simulated by a Turing machine. This implies a potential theory of everything where the universe is a simulation on a computer, which raises the question of whether we can prove we exist in a simulation. In this work, we construct a relative model of computation where a computable \textit{local} machine is simulated by a \textit{global}, classical Turing machine. We show that the problem of the local machine computing \textbf{simulation properties} of its global simulator is undecidable in the same sense as the Halting problem. Then, we show that the time, space, and error accumulated by the global simulator are simulation properties, so computing them is undecidable. These simulation properties give rise to special relativistic effects in the relative model, which we use to construct a relative Church-Turing-Deutsch thesis where a global, classical Turing machine computes quantum mechanics for a local machine with the same constant-time local computational complexity as experienced in our universe.	翻訳日:2022-06-15 15:37:58 公開日:2022-06-13
# 複数インプット法の比較評価のための方法論的枠組み--米国国立コロナウイルス共同研究における人種・民族・身体マス指数の多変量化- A Methodological Framework for the Comparative Evaluation of Multiple Imputation Methods: Multiple Imputation of Race, Ethnicity and Body Mass Index in the U.S. National COVID Cohort Collaborative ( http://arxiv.org/abs/2206.06444v1 ) ライセンス: Link先を確認	Elena Casiraghi, Rachel Wong, Margaret Hall, Ben Coleman, Marco Notaro, Michael D. Evans, Jena S. Tronieri, Hannah Blau, Bryan Laraway, Tiffany J. Callahan, Lauren E. Chan, Carolyn T. Bramante, John B. Buse, Richard A. Moffitt, Til Sturmer, Steven G. Johnson, Yu Raymond Shao, Justin Reese, Peter N. Robinson, Alberto Paccanaro, Giorgio Valentini, Jared D. Huling and Kenneth Wilkins (on behalf of the N3C Consortium): Tell Bennet, Christopher Chute, Peter DeWitt, Kenneth Gersing, Andrew Girvin, Melissa Haendel, Jeremy Harper, Janos Hajagos, Stephanie Hong, Emily Pfaff, Jane Reusch, Corneliu Antoniescu, Kimberly Robaski	(参考訳) 電子健康記録は、バイオメディカル研究のための豊富なデータソースであるが、これらのシステムは、医療設定全体にわたって均一に実装されておらず、医療の断片化とサイロ化された電子健康記録間の相互運用性の欠如により、重要なデータが欠落している可能性がある。欠失データによる症例の削除がその後の分析に深刻なバイアスをもたらす可能性があることを考えると、いくつかの著者は欠失情報を回復するために複数の計算方法を適用することを好む。残念なことに、いくつかの文献は、現在研究に自由に利用できる異なる複数のインプテーションアルゴリズムを使用して有望な結果を文書化しているが、どのmiアルゴリズムが最もうまく機能するかについてのコンセンサスはない。 MI戦略の選択以外にも、計算アルゴリズムとそのアプリケーション設定の選択は決定的かつ困難である。本稿では,Rubin と van Buuren の独創的な研究に触発されて,複数の計算手法の評価と比較に応用できる方法論的枠組みを提案する。本研究の枠組みはより大規模なコホート(コホート)の検証・拡張に応用され,本研究は,全国コホート協力機関が提供した2型糖尿病患者における重要患者の記述者の影響と重症度について検討した。 While electronic health records are a rich data source for biomedical research, these systems are not implemented uniformly across healthcare settings and significant data may be missing due to healthcare fragmentation and lack of interoperability between siloed electronic health records. Considering that the deletion of cases with missing data may introduce severe bias in the subsequent analysis, several authors prefer applying a multiple imputation strategy to recover the missing information. 
Unfortunately, although several studies have documented promising results with the various multiple imputation (MI) algorithms that are now freely available for research, there is no consensus on which MI algorithm works best. Besides the choice of the MI strategy, the choice of the imputation algorithm and its application settings is both crucial and challenging. In this paper, inspired by the seminal works of Rubin and van Buuren, we propose a methodological framework that may be applied to evaluate and compare several multiple imputation techniques, with the aim of choosing the most valid one for computing inferences in a clinical research study. Our framework has been applied to validate, and extend to a larger cohort, the results we presented in a previous study, where we evaluated the influence of crucial patients' descriptors and COVID-19 severity in patients with type 2 diabetes mellitus whose data is provided by the National COVID Cohort Collaborative Enclave.	翻訳日:2022-06-15 15:37:40 公開日:2022-06-13
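The multiple imputation abstract above cites Rubin's work. As background, the sketch below shows the standard pooling step usually called Rubin's rules, which combines the estimates obtained from m imputed datasets. It is not taken from the paper's framework; the function name `pool_rubin` and the example numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation point estimates and their variances.

    estimates: one point estimate per imputed dataset (needs at least two)
    variances: the squared standard error of each estimate
    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance, divisor m - 1
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total


if __name__ == "__main__":
    # Hypothetical estimates from three imputed datasets.
    print(pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))
```

The between-imputation term is what distinguishes this from naive averaging: it inflates the total variance to reflect uncertainty about the missing values themselves.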
# 視・放射・学習:ラジオ・視覚対応による自己教師あり局所化 Look, Radiate, and Learn: Self-supervised Localisation via Radio-Visual Correspondence ( http://arxiv.org/abs/2206.06424v1 ) ライセンス: Link先を確認	Mohammed Alloulah, Maximilian Arnold	(参考訳) 次世代の携帯電話ネットワークは、無線センシング機能と慣用通信を実装して、前例のない世界規模の無線センシングを屋外で実現する。ディープラーニングはコンピュータビジョンに革命をもたらしたが、電波センシングの性能と将来性を研究するための体系的なデータセットやベンチマークが欠如していることから、電波知覚タスクに限定された応用がなされている。このギャップに対処するために、我々は、無線の正確なターゲットローカライゼーションを容易にする合成無線視覚データセットとベンチマークであるMaxRayを提示する。さらに,無線と視覚の対応から自己コーディネートを抽出することで,ラジオにおける目標のローカライズを学ぶことを提案する。無線ローカライザネットワークのトレーニングには,このような自己監督座標を用いる。我々は、多くの最先端のベースラインに対して、パフォーマンスを特徴付ける。以上の結果から,ラベルのない無線視線データから,正確な無線目標位置推定を自動学習できることが示唆された。これにより、膨大なデータスケーラビリティの扉が開かれ、統一された認識通信セルインフラストラクチャ上で堅牢な無線センシングを実現するための鍵が証明される。 DatasetはIEEE DataPortでホストされる。 Next generation cellular networks will implement radio sensing functions alongside customary communications, thereby enabling unprecedented worldwide sensing coverage outdoors. Deep learning has revolutionised computer vision but has had limited application to radio perception tasks, in part due to lack of systematic datasets and benchmarks dedicated to the study of the performance and promise of radio sensing. To address this gap, we present MaxRay: a synthetic radio-visual dataset and benchmark that facilitate precise target localisation in radio. We further propose to learn to localise targets in radio without supervision by extracting self-coordinates from radio-visual correspondence. We use such self-supervised coordinates to train a radio localiser network. We characterise our performance against a number of state-of-the-art baselines. Our results indicate that accurate radio target localisation can be automatically learned from paired radio-visual data without labels, which is highly relevant to empirical data. This opens the door for vast data scalability and may prove key to realising the promise of robust radio sensing atop a unified perception-communication cellular infrastructure. 
Dataset will be hosted on IEEE DataPort.	翻訳日:2022-06-15 15:37:15 公開日:2022-06-13
# 画像共起バイアスによる因果効果の推定 : アフリカの貧困への適用 Estimating Causal Effects Under Image Confounding Bias with an Application to Poverty in Africa ( http://arxiv.org/abs/2206.06410v1 ) ライセンス: Link先を確認	Connor T. Jerzak, Fredrik Johansson, Adel Daoud	(参考訳) 因果効果の観察的研究は、結合因子の調整を必要とする。これらの因子が明確に定義され、個別の確率変数である表の設定では、共起の効果がよく理解されている。しかし、公共政策、生態学、医学では、画像で検出されたパターンや物体(地図、衛星、トモグラフィー画像など)に通知される非タブラルな設定で決定されることが多い。このようなイメージを因果推論に使用すると、画像内のオブジェクトが関心のある治療や結果に関連がある可能性があるため、機会が得られる。このような場合、コンバウンディングの調整には画像に依存するが、観測されたデータは、重要なオブジェクトの存在を直接ラベル付けしない。現実世界のアプリケーションによって動機づけられ、この課題、どのように処理できるか、因果効果を識別し見積もるのに十分な条件を定式化します。シミュレーション実験を用いて有限サンプル性能を解析し、機械学習モデルを用いて画像の共起を推定する確率調整アルゴリズムを用いて効果を推定する。また,画像パターン機構の誤特定に対する感度についても検討した。最後に,我々の手法を用いて,衛星画像からアフリカの貧困に対する政策介入の影響を推定する。 Observational studies of causal effects require adjustment for confounding factors. In the tabular setting, where these factors are well-defined, separate random variables, the effect of confounding is well understood. However, in public policy, ecology, and in medicine, decisions are often made in non-tabular settings, informed by patterns or objects detected in images (e.g., maps, satellite or tomography imagery). Using such imagery for causal inference presents an opportunity because objects in the image may be related to the treatment and outcome of interest. In these cases, we rely on the images to adjust for confounding but observed data do not directly label the existence of the important objects. Motivated by real-world applications, we formalize this challenge, how it can be handled, and what conditions are sufficient to identify and estimate causal effects. We analyze finite-sample performance using simulation experiments, estimating effects using a propensity adjustment algorithm that employs a machine learning model to estimate the image confounding. Our experiments also examine sensitivity to misspecification of the image pattern mechanism. 
Finally, we use our methodology to estimate the effects of policy interventions on poverty in African communities from satellite imagery.	翻訳日:2022-06-15 15:01:14 公開日:2022-06-13
# 画像に基づく治療効果の不均一性 Image-based Treatment Effect Heterogeneity ( http://arxiv.org/abs/2206.06417v1 ) ライセンス: Link先を確認	Connor T. Jerzak, Fredrik Johansson, Adel Daoud	(参考訳) ランダム化制御試験(RCTs)は介入の効果を推定するための金の基準と考えられている。最近の研究では、年齢や民族性などの表式変数に対する推定条件付けによって、rctにおける効果の多様性が研究されている。しかし、そのような変数は実験の前後でのみ観測されることが多く、歴史的または地理的な影響の理由を捉えられないことがある。実験単位が特定の場所と関連付けられると、衛星画像はそのような歴史的・地理的情報を提供することができるが、効果の不均一性を記述するためにそれを組み込む方法は存在しない。本稿では,治療効果に対して同一の分布を持つ画像群を,より確率的モデリングフレームワークを用いて推定する手法を開発した。提案手法をシミュレーションおよびウガンダにおける抗貧困介入の効果を推定するために,提案手法を代替案と比較した。平均治療効果(ATE)を回復する際のクラスタモデルの信頼性を確保するために、因果正則化ペナルティを導入する。最後に,画像情報の普及にともなう医学や気候科学など,これらの手法の適用可能性,限界,適用性について論じる。すべてのモデリング戦略のコードをオープンソースソフトウェアパッケージで公開しています。 Randomized controlled trials (RCTs) are considered the gold standard for estimating the effects of interventions. Recent work has studied effect heterogeneity in RCTs by conditioning estimates on tabular variables such as age and ethnicity. However, such variables are often only observed near the time of the experiment and may fail to capture historical or geographical reasons for effect variation. When experiment units are associated with a particular location, satellite imagery can provide such historical and geographical information, yet there is no method which incorporates it for describing effect heterogeneity. In this paper, we develop such a method which estimates, using a deep probabilistic modeling framework, the clusters of images having the same distribution over treatment effects. We compare the proposed methods against alternatives in simulation and in an application to estimating the effects of an anti-poverty intervention in Uganda. A causal regularization penalty is introduced to ensure reliability of the cluster model in recovering Average Treatment Effects (ATEs). Finally, we discuss feasibility, limitations, and the applicability of these methods to other domains, such as medicine and climate science, where image information is prevalent. 
We make code for all modeling strategies publicly available in an open-source software package.	翻訳日:2022-06-15 15:00:54 公開日:2022-06-13
# SmartGD:グラフ描画のための自己変化型ジェネレータネットワーク SmartGD: A Self-Challenging Generative Adversarial Network for Graph Drawing ( http://arxiv.org/abs/2206.06434v1 ) ライセンス: Link先を確認	Xiaoqi Wang, Kevin Yen, Yifan Hu and Han-Wei Shen	(参考訳) グラフ描画に関する研究は数多く行われているが、既存の手法の多くはグラフレイアウトの特定の美的側面を最適化することだけに焦点を当てている。グラフが与えられた場合、人間の美的嗜好を満たす良いレイアウトを生成することは、特にそのような嗜好が微分可能な目的関数として表現できない場合、難しい課題である。本稿では,学習者によるGANベースのグラフ描画フレームワークSmartGDを提案する。 SmartGDの学生ネットワークは、良いレイアウトの例を模倣してグラフ描画を学習し、SmartGDの教師ネットワークは、生成されたレイアウトの良さに関する評価を提供する。よいレイアウトを構成するものを特定するための具体的な美的基準がない場合、学生ネットワークは良いレイアウト例から学ぶことができる。一方、定量的基準(差別化不可能でも)でレイアウトの良さを評価できる場合、学生ネットワークは、ターゲットの美学を最適化するための具体的目標として利用することができる。目的を達成するために,GAN の新たな変種である自己整合性 GAN を提案し,審美的基準に対する最適レイアウト分布を,基準が微分可能か否かに関わらず学習する。提案するグラフ描画フレームワークは,優れたレイアウト例と同様のスタイルでグラフを描画できるだけでなく,任意の審美基準に従ってグラフレイアウトを最適化することができる。モデルがトレーニングされると、サンプルレイアウトのスタイルや選択された美的基準に従って任意のグラフを視覚化することができる。総合的な実験研究により、SmartGDは、一般的に合意されている指標に従って12のベンチマークメソッドを上回ります。 A multitude of studies have been conducted on graph drawing, but many existing methods only focus on optimizing particular aesthetic aspects of graph layout. Given a graph, generating a good layout that satisfies certain human aesthetic preference remains a challenging task, especially if such preference can not be expressed as a differentiable objective function. In this paper, we propose a student-teacher GAN-based graph drawing framework, SmartGD, which learns to draw graphs just like how humans learn to perform tasks. The student network in the SmartGD learns graph drawing by imitating good layout examples, while the teacher network in SmartGD is responsible for providing ratings regarding the goodness of the generated layouts. When there is a lack of concrete aesthetic criteria to specify what constitutes a good layout, the student network can learn from the good layout examples. 
On the other hand, when the goodness of a layout can be assessed by quantitative criteria (even if not differentiable), the student network can use it as a concrete goal to optimize the target aesthetics. To accomplish the goal, we propose a novel variant of GAN, self-challenging GAN, to learn the optimal layout distribution with respect to any aesthetic criterion, whether the criterion is differentiable or not. The proposed graph drawing framework can not only draw graphs in a similar style as the good layout examples but also optimize the graph layouts according to any given aesthetic criteria when available. Once the model is trained, it can be used to visualize arbitrary graphs according to the style of the example layouts or the chosen aesthetic criteria. The comprehensive experimental studies show that SmartGD outperforms 12 benchmark methods according to the commonly agreed metrics.	翻訳日:2022-06-15 15:00:35 公開日:2022-06-13
# 知識発見のための説明可能な混合データ表現とロスレス可視化ツールキット Explainable Mixed Data Representation and Lossless Visualization Toolkit for Knowledge Discovery ( http://arxiv.org/abs/2206.06476v1 ) ライセンス: Link先を確認	Boris Kovalerchuk, Elijah McCoy	(参考訳) 不均一/混合データのための機械学習(ML)アルゴリズムの開発は長年の課題である。多くのMLアルゴリズムは、数値データや非数値データ、テキスト、グラフなどを含む混合データに適用して解釈可能なモデルを生成することができない。もう1つの長期的な問題は、多次元混合データのロスレス可視化のためのアルゴリズムの開発である。 MLのさらなる進歩は、混合データに対する解釈可能なMLアルゴリズムの成功と多次元データのロスレス解釈可能な可視化に大きく依存している。後者により、訓練データにはない貴重なドメイン知識を持つエンドユーザが、視覚的知識発見を使用して解釈可能なMLモデルを開発できるようになる。混合データに対する課題は,(1) 数値MLアルゴリズムの非数値属性の数値符号化スキームを生成し,正確かつ解釈可能なMLモデルを提供すること,(2) n-D の非数値データのロスレス可視化のための方法,およびこれらの視覚化における視覚ルールの発見である。本稿では、混合データの種類を分類し、MLの重要性を分析し、混合データを扱うための実験ツールキットを提案する。データ型エディタ、VisCanvasデータ可視化、ルール発見システムを組み合わせたもので、GitHubで公開されている。 Developing Machine Learning (ML) algorithms for heterogeneous/mixed data is a longstanding problem. Many ML algorithms cannot be applied to mixed data, which include numeric and non-numeric data, text, graphs, and so on, to generate interpretable models. Another longstanding problem is developing algorithms for lossless visualization of multidimensional mixed data. Further progress in ML depends heavily on the success of interpretable ML algorithms for mixed data and of lossless, interpretable visualization of multidimensional data. The latter allows end-users, who can bring valuable domain knowledge that is absent from the training data, to develop interpretable ML models through visual knowledge discovery. The challenges for mixed data include: (1) generating numeric coding schemes for non-numeric attributes so that numeric ML algorithms provide accurate and interpretable ML models, and (2) generating methods for lossless visualization of n-D non-numeric data and visual rule discovery in these visualizations. This paper presents a classification of mixed data types, analyzes their importance for ML, and presents an experimental toolkit developed to deal with mixed data. 
It combines the Data Types Editor, VisCanvas data visualization and rule discovery system which is available on GitHub.	翻訳日:2022-06-15 15:00:05 公開日:2022-06-13
# 対人ロバスト性向上のための代替手法:摂動スペクトルによる対人訓練の分析 Towards Alternative Techniques for Improving Adversarial Robustness: Analysis of Adversarial Training at a Spectrum of Perturbations ( http://arxiv.org/abs/2206.06496v1 ) ライセンス: Link先を確認	Kaustubh Sridhar, Souradeep Dutta, Ramneet Kaur, James Weimer, Oleg Sokolsky, Insup Lee	(参考訳) 敵のトレーニング(AT)とその変種は、ここ数年で敵の摂動と一般的な腐敗に対するニューラルネットワークの堅牢性を改善する進歩を先導している。 ATのアルゴリズム設計とその変種は、特定の摂動強度$\epsilon$でトレーニングモデルに焦点を当てており、アルゴリズムを改善するために、その$\epsilon$-robustモデルのパフォーマンスからのフィードバックのみを使用する。本研究では、$\epsilon$値のスペクトルに基づいてトレーニングされたモデルに焦点を当てる。モデル性能,中間特徴量精度,畳み込みフィルタ感度の3つの視点を解析する。それぞれにおいて、atに対する別の改善は、1つの$\epsilon$で明らかではありませんでした。具体的には、PGD攻撃の強さが$\delta$の場合、ATモデルが$\epsilon$よりも少し大きくなるが、それ以上の場合、それを最大限に一般化する。そこで我々は,ロバスト性に対する過剰設計を提案し,トレーニングモデルを$\epsilon$で提案する。第二に、ロバスト性は中間的特徴の精度、特に第1層と第2層の後の精度に非常に敏感である(様々な$\epsilon$値全体で)。そこで本研究では,アダプティブアタックの視認精度を向上させるための簡単な量子化手法を提案する。第3に、モデルの各層の畳み込みフィルタを$\epsilon$で解析し、第1層と第2層の畳み込みが入力摂動の増幅にのみ責任があることに気づく。我々は,CIFAR-10およびCIFAR-10-Cデータセット上でResNetおよびWideResNetモデルを用いて実験を行い,本手法を実証した。 Adversarial training (AT) and its variants have spearheaded progress in improving neural network robustness to adversarial perturbations and common corruptions in the last few years. Algorithm design of AT and its variants are focused on training models at a specified perturbation strength $\epsilon$ and only using the feedback from the performance of that $\epsilon$-robust model to improve the algorithm. In this work, we focus on models, trained on a spectrum of $\epsilon$ values. We analyze three perspectives: model performance, intermediate feature precision and convolution filter sensitivity. In each, we identify alternative improvements to AT that otherwise wouldn't have been apparent at a single $\epsilon$. Specifically, we find that for a PGD attack at some strength $\delta$, there is an AT model at some slightly larger strength $\epsilon$, but no greater, that generalizes best to it. 
Hence, we propose overdesigning for robustness where we suggest training models at an $\epsilon$ just above $\delta$. Second, we observe (across various $\epsilon$ values) that robustness is highly sensitive to the precision of intermediate features and particularly those after the first and second layer. Thus, we propose adding a simple quantization to defenses that improves accuracy on seen and unseen adaptive attacks. Third, we analyze convolution filters of each layer of models at increasing $\epsilon$ and notice that those of the first and second layer may be solely responsible for amplifying input perturbations. We present our findings and demonstrate our techniques through experiments with ResNet and WideResNet models on the CIFAR-10 and CIFAR-10-C datasets.	翻訳日:2022-06-15 14:59:47 公開日:2022-06-13
# MetaTPTrans:多言語コード表現学習のためのメタ学習アプローチ MetaTPTrans: A Meta Learning Approach for Multilingual Code Representation Learning ( http://arxiv.org/abs/2206.06460v1 ) ライセンス: Link先を確認	Weiguo Pian, Hanyu Peng, Xunzhu Tang, Tiezhu Sun, Haoye Tian, Andrew Habib, Jacques Klein, Tegawendé F. Bissyandé	(参考訳) ソースコードの表現学習は、ソフトウェア工学のタスクに機械学習を適用するために不可欠である。異なるプログラミング言語間のコード表現の学習は、複数の言語データセットからのトレーニングデータが、ソースコードから言語に依存しない情報を抽出する能力を改善するため、単一言語データセットから学ぶことよりも効果的であることが示されている。しかし、既存のマルチ言語モデルは、複数の言語データセットでトレーニングする下流タスクにとって重要な言語固有の情報を見落とし、異なる言語間で共有パラメータを学習することだけに焦点を当てている。本稿では,多言語コード表現学習のためのメタ学習手法であるMetaTPTransを提案する。 MetaTPTransは、入力ソースコードスニペットの特定のプログラミング言語に従って、特徴抽出器の異なるパラメータを生成し、モデルが言語に依存しない情報と言語固有の情報の両方を学習できるようにする。実験結果から,MetaTPTransは,言語に依存しないタスクであるコード要約においてF1スコアを最大2.40ポイント,言語固有のタスクであるコード補完においてTop-1(Top-5)の予測精度を最大7.32(13.15)ポイント向上した。 Representation learning of source code is essential for applying machine learning to software engineering tasks. Learning code representation across different programming languages has been shown to be more effective than learning from single-language datasets, since more training data from multi-language datasets improves the model's ability to extract language-agnostic information from source code. However, existing multi-language models overlook language-specific information, which is crucial for downstream tasks when training on multi-language datasets, and focus only on learning parameters shared among the different languages. To address this problem, we propose MetaTPTrans, a meta learning approach for multilingual code representation learning. MetaTPTrans generates different parameters for the feature extractor according to the specific programming language of the input source code snippet, enabling the model to learn both language-agnostic and language-specific information. 
Experimental results show that MetaTPTrans improves the F1 score of state-of-the-art approaches significantly by up to 2.40 percentage points for code summarization, a language-agnostic task; and the prediction accuracy of Top-1 (Top-5) by up to 7.32 (13.15) percentage points for code completion, a language-specific task.	翻訳日:2022-06-15 14:56:32 公開日:2022-06-13
# 定量ヘイズレベルとグラウンドトゥルースを有する多目的実ハイズベンチマーク A Multi-purpose Real Haze Benchmark with Quantifiable Haze Levels and Ground Truth ( http://arxiv.org/abs/2206.06427v1 ) ライセンス: Link先を確認	Priya Narayanan, Xin Hu, Zhenyu Wu, Matthew D Thielke, John G Rogers, Andre V Harrison, John A D'Agostino, James D Brown, Long P Quang, James R Uplinger, Heesung Kwon, Zhangyang Wang	(参考訳) 屋外の視覚環境から収集された画像は、濃密な煙やヘイズの存在により劣化することが多い。これらの劣化した視覚環境(DVE)におけるシーン理解の研究における重要な課題は、代表的なベンチマークデータセットの欠如である。これらのデータセットは、最先端のオブジェクト認識や他のコンピュータビジョンアルゴリズムを劣化した設定で評価するために必要である。本稿では,ハズフリー画像を用いた最初のペア実画像ベンチマークと,その場でのハズ密度測定を導入することで,これらの制約に対処する。このデータセットは、現場全体を覆うプロの煙発生装置で制御された環境で作られ、無人航空機(UAV)と無人地上機(UGV)の両方の観点から撮影された画像で構成されている。また,データ集合上の物体検出装置と同様に,最先端のデハジング手法のセットを評価する。 ground truth object classification bounding boxとhaze density measurementを含む、本論文で提示された完全なデータセットは、コミュニティがアルゴリズムを次のように評価するために提供される。このデータセットのサブセットは、CVPR UG2 2022 チャレンジの Haze Track における Object Detection に使用されている。 Imagery collected from outdoor visual environments is often degraded due to the presence of dense smoke or haze. A key challenge for research in scene understanding in these degraded visual environments (DVE) is the lack of representative benchmark datasets. These datasets are required to evaluate state-of-the-art object recognition and other computer vision algorithms in degraded settings. In this paper, we address some of these limitations by introducing the first paired real image benchmark dataset with hazy and haze-free images, and in-situ haze density measurements. This dataset was produced in a controlled environment with professional smoke generating machines that covered the entire scene, and consists of images captured from the perspective of both an unmanned aerial vehicle (UAV) and an unmanned ground vehicle (UGV). We also evaluate a set of representative state-of-the-art dehazing approaches as well as object detectors on the dataset. 
The full dataset presented in this paper, including the ground truth object classification bounding boxes and haze density measurements, is provided for the community to evaluate their algorithms at: https://a2i2-archangel.vision. A subset of this dataset has been used for the Object Detection in Haze Track of CVPR UG2 2022 challenge.	翻訳日:2022-06-15 14:36:09 公開日:2022-06-13
# 行動認識のイデオロギーを用いたビデオPose3Dの訓練方法 A Training Method For VideoPose3D With Ideology of Action Recognition ( http://arxiv.org/abs/2206.06430v1 ) ライセンス: Link先を確認	Hao Bai	(参考訳) 映像からの行動認識とポーズ推定は人間の動きの理解と密接な関係があるが、多くの文献は、行動認識とは切り離してポーズ推定タスクを単独で解決する方法に焦点を当てている。本研究は,アクション認識に基づくVideoPose3Dのより高速で柔軟なトレーニング手法を示す。このモデルは、推定される型と同じタイプのアクションで供給され、異なるタイプのアクションを別々にトレーニングすることができます。エビデンスによれば、一般的なポーズ推定タスクでは、このモデルはオリジナルの研究と同じような結果を得るために比較的少量のデータしか必要とせず、アクション指向タスクでは、受容野のサイズと訓練エポック数を制限した条件で、MPJPEの速度誤差において元の研究を4.5%上回る。このモデルはアクション指向問題と一般的なポーズ推定問題の両方を扱うことができる。 Action recognition and pose estimation from videos are both closely tied to understanding human motion, yet most of the literature addresses pose estimation in isolation from action recognition. This research presents a faster and more flexible training method for VideoPose3D that is based on action recognition. The model is fed with the same type of action as the type that will be estimated, and different types of actions can be trained separately. Evidence shows that, for common pose-estimation tasks, this model requires a relatively small amount of data to achieve results similar to those of the original research, and that, for action-oriented tasks, it outperforms the original research by 4.5% on the Velocity Error of MPJPE with a limited receptive field size and a limited number of training epochs. This model can handle both action-oriented and common pose-estimation problems.	翻訳日:2022-06-15 14:35:50 公開日:2022-06-13
# icpアルゴリズム:理論、実践とスラム指向分類法 ICP Algorithm: Theory, Practice And Its SLAM-oriented Taxonomy ( http://arxiv.org/abs/2206.06435v1 ) ライセンス: Link先を確認	Hao Bai	(参考訳) 反復的最接近点法 (icp) アルゴリズムは三次元表面登録の幾何学的アライメントにおいて最も重要なアルゴリズムの1つであり、同時局在マッピング(slam)タスクを含むコンピュータビジョンタスクでよく用いられる。本稿では, icpアルゴリズムの理論的原理, 表面登録タスクでの利用方法, 従来型icpアルゴリズムの分類法について述べる。また, SLAMタスクがオンラインであるか否か, ランドマークがSLAMタスクの特徴として存在するか否かなど, SLAMタスクの特徴に基づいて, ICPアルゴリズムのSLAM指向の分類も導入している。我々は,最新の研究論文をいくつか比較し,実装の詳細を分析することにより,slamタスクの各タイプの合成を行う。 The Iterative Closest Point (ICP) algorithm is one of the most important algorithms for geometric alignment of three-dimensional surface registration, which is frequently used in computer vision tasks, including the Simultaneous Localization And Mapping (SLAM) tasks. In this paper, we illustrate the theoretical principles of the ICP algorithm, how it can be used in surface registration tasks, and the traditional taxonomy of the variants of the ICP algorithm. As SLAM is becoming a popular topic, we also introduce a SLAM-oriented taxonomy of the ICP algorithm, based on the characteristics of each type of SLAM task, including whether the SLAM task is online or not and whether the landmarks are present as features in the SLAM task. We make a synthesis of each type of SLAM task by comparing several up-to-date research papers and analyzing their implementation details.	翻訳日:2022-06-15 14:35:33 公開日:2022-06-13
# フレームベースおよびイベントベース単一物体定位のためのスパイクニューラルネットワーク Spiking Neural Networks for Frame-based and Event-based Single Object Localization ( http://arxiv.org/abs/2206.06506v1 ) ライセンス: Link先を確認	Sami Barchid, Jos\'e Mennesson, Jason Eshraghian, Chaabane Dj\'eraba, Mohammed Bennamoun	(参考訳) スパイクニューラルネットワークは、ニューラルネットワークのエネルギー効率のよい代替手段として大きな期待を寄せている。しかし、センサノイズや入力エンコーディングがネットワーク活動や性能に与える影響を理解することは、分類のような共通のニューロモルフィック視覚ベースラインでは困難である。そこで本研究では,サロゲート勾配勾配を用いた単一物体位置定位のためのスパイクニューラルネットワークアプローチを提案する。提案手法を類似したニューラルネットワークと比較し,本モデルが競合性能と効率性,各種汚職に対するロバスト性,エネルギー消費量の低減を両立させることを示した。さらに,静的画像に対するニューラルコーディング方式の精度,ロバスト性,エネルギー効率への影響について検討した。本研究は,従来の生物工学的学習規則とは大きく異なり,サロゲート勾配学習アーキテクチャの設計を支援し,ノイズ特性やデータ符号化手法の観点から,将来のニューロモルフィック技術における設計優先の洞察を提供する。 Spiking neural networks have shown much promise as an energy-efficient alternative to artificial neural networks. However, understanding the impacts of sensor noises and input encodings on the network activity and performance remains difficult with common neuromorphic vision baselines like classification. Therefore, we propose a spiking neural network approach for single object localization trained using surrogate gradient descent, for frame- and event-based sensors. We compare our method with similar artificial neural networks and show that our model has competitive/better performance in accuracy, robustness against various corruptions, and has lower energy consumption. Moreover, we study the impact of neural coding schemes for static images in accuracy, robustness, and energy efficiency. Our observations differ importantly from previous studies on bio-plausible learning rules, which helps in the design of surrogate gradient trained architectures, and offers insight to design priorities in future neuromorphic technologies in terms of noise characteristics and data encoding methods.	翻訳日:2022-06-15 14:35:18 公開日:2022-06-13
# 不変構造学習による一般化と因果説明可能性の向上 Invariant Structure Learning for Better Generalization and Causal Explainability ( http://arxiv.org/abs/2206.06469v1 ) ライセンス: Link先を確認	Yunhao Ge, Sercan \"O. Arik, Jinsung Yoon, Ao Xu, Laurent Itti, Tomas Pfister	(参考訳) データの背後にある因果構造を学ぶことは、一般化を改善し、高品質な説明を得るのに有用である。本稿では,一般化を指標として因果構造発見を改善するための新しい枠組みである不変構造学習(isl)を提案する。 ISLはデータを異なる環境に分割し、一貫性の制約を課すことで、異なる環境にわたってターゲットに不変な構造を学ぶ。集約機構は、個々の環境から学習した構造よりもデータの因果メカニズムをより正確に反映するグラフ構造に基づいて最適な分類器を選択する。さらに,正確な因果構造発見がラベルに依存しない自己教師型学習環境にISLを拡張した。この自己監督型ISLは、異なるノードをターゲットとして反復的に設定することで、不変因果提案を利用する。合成および実世界のデータセットにおいて、ISLは因果構造を正確に発見し、代替手法より優れ、大きな分布シフトを持つデータセットに対して優れた一般化をもたらすことを示す。 Learning the causal structure behind data is invaluable for improving generalization and obtaining high-quality explanations. We propose a novel framework, Invariant Structure Learning (ISL), that is designed to improve causal structure discovery by utilizing generalization as an indication. ISL splits the data into different environments, and learns a structure that is invariant to the target across different environments by imposing a consistency constraint. An aggregation mechanism then selects the optimal classifier based on a graph structure that reflects the causal mechanisms in the data more accurately compared to the structures learnt from individual environments. Furthermore, we extend ISL to a self-supervised learning setting where accurate causal structure discovery does not rely on any labels. This self-supervised ISL utilizes invariant causality proposals by iteratively setting different nodes as targets. On synthetic and real-world datasets, we demonstrate that ISL accurately discovers the causal structure, outperforms alternative methods, and yields superior generalization for datasets with significant distribution shifts.	翻訳日:2022-06-15 14:17:26 公開日:2022-06-13
# On Image Segmentation With Noisy Labels: Characterization and Volume Properties of the Optimal Solutions to Accuracy and Dice ( http://arxiv.org/abs/2206.06484v1 ) License: see link	Marcus Nordström, Henrik Hult, Jonas Söderberg, Fredrik Löfman	We study two of the most popular performance metrics in medical image segmentation, Accuracy and Dice, when the target labels are noisy. For both metrics, several statements related to the characterization and volume properties of the set of optimal segmentations are proved, and associated experiments are provided. Our main insights are: (i) the volume of the solutions to both metrics may deviate significantly from the expected volume of the target, (ii) the volume of a solution to Accuracy is always less than or equal to the volume of a solution to Dice, and (iii) the optimal solutions to both of these metrics coincide when the set of feasible segmentations is constrained to the set of segmentations with volume equal to the expected volume of the target.	Translated: 2022-06-15 14:13:22 Published: 2022-06-13
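Insight (ii) in the entry above can be illustrated with a toy calculation. The sketch below is not the paper's formal setup: it assumes each pixel has a known marginal probability of being foreground under the noisy labels, scores a binary segmentation against those marginals (a soft-label simplification), and finds the optimum by brute force. All function names and the example probabilities are hypothetical.

```python
from itertools import product

def expected_accuracy(seg, probs):
    # Probability-weighted agreement between a binary mask and the noisy labels.
    return sum(p if s else 1.0 - p for s, p in zip(seg, probs)) / len(probs)

def soft_dice(seg, probs):
    # Dice of a binary mask against the label marginals (soft-label simplification).
    overlap = sum(p for s, p in zip(seg, probs) if s)
    denom = sum(seg) + sum(probs)
    return 2.0 * overlap / denom if denom > 0 else 1.0

def best_segmentation(metric, probs):
    # Brute-force search over all 2^n binary masks; only viable for tiny n.
    return max(product((0, 1), repeat=len(probs)), key=lambda seg: metric(seg, probs))

probs = [0.9, 0.4, 0.4, 0.1]  # hypothetical per-pixel foreground marginals
acc_seg = best_segmentation(expected_accuracy, probs)
dice_seg = best_segmentation(soft_dice, probs)
acc_volume, dice_volume = sum(acc_seg), sum(dice_seg)
```

For these marginals the Accuracy-optimal mask keeps one pixel while the Dice-optimal mask keeps three, and the expected target volume is 1.8, so both optima deviate from it in opposite directions.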
# The Modality Focusing Hypothesis: On the Blink of Multimodal Knowledge Distillation ( http://arxiv.org/abs/2206.06487v1 ) License: see link	Zihui Xue, Zhengqi Gao, Sucheng Ren, Hang Zhao	Multimodal knowledge distillation (KD) extends traditional knowledge distillation to the area of multimodal learning. One common practice is to adopt a well-performing multimodal network as the teacher, in the hope that it can transfer its full knowledge to a unimodal student for performance improvement. In this paper, we investigate the efficacy of multimodal KD. We begin by providing two failure cases and demonstrate that KD is not a universal cure in multimodal knowledge transfer. We present the modality Venn diagram to understand modality relationships, and the modality focusing hypothesis, which reveals the decisive factor in the efficacy of multimodal KD. Experimental results on 6 multimodal datasets help justify our hypothesis, diagnose failure cases, and point to directions for improving distillation performance.	Translated: 2022-06-15 14:13:09 Published: 2022-06-13
# Multimodal Learning with Transformers: A Survey ( http://arxiv.org/abs/2206.06488v1 ) License: see link	Peng Xu, Xiatian Zhu, and David A. Clifton	Transformer is a promising neural network learner and has achieved great success in various machine learning tasks. Thanks to the recent prevalence of multimodal applications and big data, Transformer-based multimodal learning has become a hot topic in AI research. This paper presents a comprehensive survey of Transformer techniques oriented at multimodal data. The main contents of this survey include: (1) a background of multimodal learning, Transformer ecosystem, and the multimodal big data era, (2) a theoretical review of Vanilla Transformer, Vision Transformer, and multimodal Transformers, from a geometrically topological perspective, (3) a review of multimodal Transformer applications, via two important paradigms, i.e., for multimodal pretraining and for specific multimodal tasks, (4) a summary of the common challenges and designs shared by the multimodal Transformer models and applications, and (5) a discussion of open problems and potential research directions for the community.	Translated: 2022-06-15 14:12:55 Published: 2022-06-13
# BEHAVIOR in Habitat 2.0: Simulator-Independent Logical Task Description for Benchmarking Embodied AI Agents ( http://arxiv.org/abs/2206.06489v1 ) License: see link	Ziang Liu, Roberto Martín-Martín, Fei Xia, Jiajun Wu, Li Fei-Fei	Robots excel at performing repetitive and precision-sensitive tasks in controlled environments such as warehouses and factories, but have not yet been extended to embodied AI agents providing assistance in household tasks. Inspired by the catalyzing effect that benchmarks have had in AI fields such as computer vision and natural language processing, the community is looking for new benchmarks for embodied AI. Prior work on embodied AI benchmarks defines tasks using different formalisms, often specific to one environment, simulator, or domain, making it hard to develop general and comparable solutions. In this work, we bring a subset of BEHAVIOR activities into Habitat 2.0 to benefit from its fast simulation speed, as a first step towards demonstrating the ease of adapting activities defined in the logic space to different simulators.	Translated: 2022-06-15 14:11:18 Published: 2022-06-13
# Density Estimation with Autoregressive Bayesian Predictives ( http://arxiv.org/abs/2206.06462v1 ) License: see link	Sahra Ghalebikesabi, Chris Holmes, Edwin Fong, Brieuc Lehmann	Bayesian methods are a popular choice for statistical inference in small-data regimes due to the regularization effect induced by the prior, which serves to counteract overfitting. In the context of density estimation, the standard Bayesian approach is to target the posterior predictive. In general, direct estimation of the posterior predictive is intractable, so methods typically resort to approximating the posterior distribution as an intermediate step. The recent development of recursive predictive copula updates, however, has made it possible to perform tractable predictive density estimation without the need for posterior approximation. Although these estimators are computationally appealing, they tend to struggle on non-smooth data distributions. This is largely due to the comparatively restrictive form of the likelihood models from which the proposed copula updates were derived. To address this shortcoming, we consider a Bayesian nonparametric model with an autoregressive likelihood decomposition and Gaussian process prior, which yields a data-dependent bandwidth parameter in the copula update. Further, we formulate a novel parameterization of the bandwidth using an autoregressive neural network that maps the data into a latent space and is thus able to capture more complex dependencies in the data. Our extensions increase the modelling capacity of existing recursive Bayesian density estimators, achieving state-of-the-art results on tabular data sets.	Translated: 2022-06-15 14:07:58 Published: 2022-06-13
# Fiberwise dimensionality reduction of topologically complex data with vector bundles ( http://arxiv.org/abs/2206.06513v1 ) License: see link	Luis Scoccola and Jose A. Perea	Datasets with non-trivial large-scale topology can be hard to embed in low-dimensional Euclidean space with existing dimensionality reduction algorithms. We propose to model topologically complex datasets using vector bundles, in such a way that the base space accounts for the large-scale topology, while the fibers account for the local geometry. This allows one to reduce the dimensionality of the fibers while preserving the large-scale topology. We formalize this point of view and, as an application, describe an algorithm that takes as input a dataset together with an initial representation of it in Euclidean space, assumed to recover part of its large-scale topology, and outputs a new representation that integrates local representations, obtained through local linear dimensionality reduction, along the initial global representation. We demonstrate this algorithm on examples coming from dynamical systems and chemistry. In these examples, our algorithm is able to learn topologically faithful embeddings of the data in lower target dimension than various well-known metric-based dimensionality reduction algorithms.	Translated: 2022-06-15 14:07:36 Published: 2022-06-13
# Assessing Privacy Leakage in Synthetic 3-D PET Imaging using Transversal GAN ( http://arxiv.org/abs/2206.06448v1 ) License: see link	Robert V. Bergen, Jean-Francois Rajotte, Fereshteh Yousefirizi, Arman Rahmim, Raymond T. Ng	Training computer-vision algorithms on medical images for disease diagnosis or image segmentation is difficult in large part due to privacy concerns. For this reason, generative image models are highly sought after to facilitate data sharing. However, 3-D generative models are understudied, and investigation of their privacy leakage is needed. We introduce our 3-D generative model, Transversal GAN (TrGAN), using head and neck PET images conditioned on tumour masks as a case study. We define quantitative measures of image fidelity, utility, and privacy for our model. These metrics are evaluated over the course of training to identify ideal fidelity, utility, and privacy trade-offs and to establish the relationships between these parameters. We show that the discriminator of the TrGAN is vulnerable to attack, and that an attacker can identify which samples were used in training with almost perfect accuracy (AUC = 0.99). We also show that an attacker with access to only the generator cannot reliably classify whether a sample had been used for training (AUC = 0.51). This suggests that TrGAN generators, but not discriminators, may be used for sharing synthetic 3-D PET data with minimal privacy risk while maintaining good utility and fidelity.	Translated: 2022-06-15 14:06:31 Published: 2022-06-13
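The AUC figures quoted in the entry above summarise how well an attack score separates training samples from held-out samples. As a minimal sketch of that metric only (the scores below are invented, and the function name is hypothetical; nothing here reproduces the paper's attack), the AUC can be computed as the chance that a random training sample receives a higher score than a random held-out sample:

```python
def attack_auc(member_scores, non_member_scores):
    # Fraction of (member, non-member) pairs ranked correctly by the attack score;
    # ties count as half. This equals the area under the ROC curve.
    wins = 0.0
    for m in member_scores:
        for n in non_member_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(non_member_scores))

# Hypothetical scores: a leaky model separates the two groups cleanly,
# an uninformative one gives every sample the same score.
separable = attack_auc([0.9, 0.8, 0.95], [0.1, 0.3, 0.2])
uninformative = attack_auc([0.5, 0.5], [0.5, 0.5])
```

An AUC near 1.0 means membership is almost always recoverable, while an AUC near 0.5 means the attack does no better than guessing.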
# Hate Speech and Counter Speech Detection: Conversational Context Does Matter ( http://arxiv.org/abs/2206.06423v1 ) License: see link	Xinchen Yu, Eduardo Blanco, Lingzi Hong	Hate speech is plaguing cyberspace along with user-generated content. This paper investigates the role of conversational context in the annotation and detection of online hate and counter speech, where context is defined as the preceding comment in a conversation thread. We created a context-aware dataset for a 3-way classification task on Reddit comments: hate speech, counter speech, or neutral. Our analyses indicate that context is critical for identifying hate and counter speech: human judgments change for most comments depending on whether we show annotators the context. A linguistic analysis draws insights into the language people use to express hate and counter speech. Experimental results show that neural networks obtain significantly better results if context is taken into account. We also present qualitative error analyses shedding light on (a) when and why context is beneficial and (b) the remaining errors made by our best model when context is taken into account.	Translated: 2022-06-15 14:02:50 Published: 2022-06-13
# LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning ( http://arxiv.org/abs/2206.06522v1 ) License: see link	Yi-Lin Sung, Jaemin Cho, Mohit Bansal	Fine-tuning large pre-trained models on downstream tasks has recently been adopted in a variety of domains. However, it is costly to update the entire parameter set of large pre-trained models. Although recently proposed parameter-efficient transfer learning (PETL) techniques allow updating a small subset of parameters (e.g., only 2% of the parameters) inside a pre-trained backbone network for a new task, they only reduce the training memory requirement by up to 30%. This is because the gradient computation for the trainable parameters still requires backpropagation through the large pre-trained backbone model. To address this, we propose Ladder Side-Tuning (LST), a new PETL technique that reduces training memory requirements by more substantial amounts. Unlike existing parameter-efficient methods that insert additional parameters inside backbone networks, we train a ladder side network, a small and separate network that takes intermediate activations as input via shortcut connections (ladders) from backbone networks and makes predictions. LST has significantly lower memory requirements than previous methods because it does not require backpropagation through the backbone network, but only through the side network and ladder connections. We evaluate our method with various models (T5, CLIP-T5) on both NLP (GLUE) and vision-language (VQA, GQA, NLVR2, MSCOCO) tasks. LST saves 69% of the memory cost of fine-tuning the whole network, while other methods save only 26% of it at similar parameter usage (hence, 2.7x more memory savings). Moreover, LST achieves higher accuracy than Adapter and LoRA in a low-memory regime. To further show the advantage of this better memory efficiency, we also apply LST to larger T5 models (T5-large, T5-3B), attaining better GLUE performance than full fine-tuning and other PETL methods. The exact same trend also holds in our experiments on VL tasks.	Translated: 2022-06-15 13:29:52 Published: 2022-06-13
# Compositional Mixture Representations for Vision and Text ( http://arxiv.org/abs/2206.06404v1 ) License: see link	Stephan Alaniz, Marco Federici, Zeynep Akata	Learning a common representation space between vision and language allows deep networks to relate objects in an image to the corresponding semantic meaning. We present a model that learns a shared Gaussian mixture representation, imposing the compositionality of the text onto the visual domain without explicit location supervision. By combining the spatial transformer with a representation learning approach, we learn to split images into separately encoded patches to associate visual and textual representations in an interpretable manner. On variations of MNIST and CIFAR10, our model is able to perform weakly supervised object detection and demonstrates its ability to extrapolate to unseen combinations of objects.	Translated: 2022-06-15 13:28:42 Published: 2022-06-13
# GraphMLP: A Graph MLP-Like Architecture for 3D Human Pose Estimation ( http://arxiv.org/abs/2206.06420v1 ) License: see link	Wenhao Li, Hong Liu, Tianyu Guo, Hao Tang, Runwei Ding	Modern multi-layer perceptron (MLP) models have shown competitive results in learning visual representations without self-attention. However, existing MLP models are not good at capturing local details and lack prior knowledge of human configurations, which limits their modeling power for skeletal representation learning. To address these issues, we propose a simple yet effective graph-reinforced MLP-Like architecture, named GraphMLP, that combines MLPs and graph convolutional networks (GCNs) in a global-local-graphical unified architecture for 3D human pose estimation. GraphMLP incorporates the graph structure of human bodies into an MLP model to meet the domain-specific demand while also allowing for both local and global spatial interactions. Extensive experiments show that the proposed GraphMLP achieves state-of-the-art performance on two datasets, i.e., Human3.6M and MPI-INF-3DHP. Our source code and pretrained models will be publicly available.	Translated: 2022-06-15 13:28:30 Published: 2022-06-13
# Revisiting the Shape-Bias of Deep Learning for Dermoscopic Skin Lesion Classification ( http://arxiv.org/abs/2206.06466v1 ) License: see link	Adriano Lucieri and Fabian Schmeisser and Christoph Peter Balada and Shoaib Ahmed Siddiqui and Andreas Dengel and Sheraz Ahmed	It is generally believed that the human visual system is biased towards the recognition of shapes rather than textures. This assumption has led to a growing body of work aiming to align deep models' decision-making processes with the fundamental properties of human vision. The reliance on shape features is primarily expected to improve the robustness of these models under covariate shift. In this paper, we revisit the significance of shape-biases for the classification of skin lesion images. Our analysis shows that different skin lesion datasets exhibit varying biases towards individual image features. Interestingly, despite deep feature extractors being inclined towards learning entangled features for skin lesion classification, individual features can still be decoded from this entangled representation. This indicates that these features are still represented in the learnt embedding spaces of the models, but not used for classification. In addition, the spectral analysis of different datasets shows that, in contrast to common visual recognition, dermoscopic skin lesion classification is by nature reliant on complex feature combinations beyond shape-bias. As a natural consequence, shifting away from the prevalent desire for shape-biased models can even improve skin lesion classifiers in some cases.	Translated: 2022-06-15 13:28:13 Published: 2022-06-13
# Generalizable Method for Face Anti-Spoofing with Semi-Supervised Learning ( http://arxiv.org/abs/2206.06510v1 ) License: see link	Nikolay Sergievskiy, Roman Vlasov, Roman Trusov	Face anti-spoofing has drawn a lot of attention due to the high security requirements of biometric authentication systems. Bringing face biometrics to commercial hardware has become largely dependent on developing reliable methods for detecting fake login sessions without specialized sensors. Current CNN-based methods perform well on the domains they were trained for, but often show poor generalization on previously unseen datasets. In this paper, we describe a method for utilizing unsupervised pretraining to improve performance across multiple datasets without any adaptation, introduce the Entry Antispoofing Dataset for supervised fine-tuning, and propose a multi-class auxiliary classification layer for augmenting the binary classification task of detecting spoofing attempts with explicit interpretable signals. We demonstrate the efficiency of our model by achieving state-of-the-art results on cross-dataset testing on the MSU-MFSD, Replay-Attack, and OULU-NPU datasets.	Translated: 2022-06-15 13:27:53 Published: 2022-06-13
# Self-Supervised Representation Learning With MUlti-Segmental Informational Coding (MUSIC) ( http://arxiv.org/abs/2206.06461v1 ) License: see link	Chuang Niu and Ge Wang	Self-supervised representation learning maps high-dimensional data into a meaningful embedding space, where samples of similar semantic content are close to each other. Most recent representation learning methods maximize cosine similarity or minimize the distance between the embedding features of different views from the same sample, usually on the $\ell_2$-normalized unit hypersphere. To prevent trivial solutions in which all samples share the same embedding feature, various techniques have been developed, such as contrastive learning, stop gradient, and variance and covariance regularization. In this study, we propose MUlti-Segmental Informational Coding (MUSIC) for self-supervised representation learning. MUSIC divides the embedding feature into multiple segments that discriminatively partition samples into different semantic clusters, with different segments focusing on different partition principles. Information theory measurements are directly used to optimize MUSIC and theoretically guarantee that trivial solutions are avoided. MUSIC does not depend on commonly used techniques such as memory banks, large batches, asymmetric networks, gradient stopping, or momentum weight updating, making the training framework flexible. Our experiments demonstrate that MUSIC achieves better results than the most closely related methods, Barlow Twins and VICReg, on ImageNet classification with linear probing, and requires neither deep projectors nor large feature dimensions. Code will be made available.	Translated: 2022-06-15 13:24:57 Published: 2022-06-13
# An Exploration of Post-Editing Effectiveness in Text Summarization ( http://arxiv.org/abs/2206.06383v1 ) License: see link	Vivian Lai, Alison Smith-Renner, Ke Zhang, Ruijia Cheng, Wenjuan Zhang, Joel Tetreault, Alejandro Jaimes	Automatic summarization methods are efficient but can suffer from low quality. In comparison, manual summarization is expensive but produces higher quality. Can humans and AI collaborate to improve summarization performance? In similar text generation tasks (e.g., machine translation), human-AI collaboration in the form of "post-editing" AI-generated text reduces human workload and improves the quality of AI output. Therefore, we explored whether post-editing offers advantages in text summarization. Specifically, we conducted an experiment with 72 participants, comparing post-editing of provided summaries with manual summarization in terms of summary quality, human efficiency, and user experience on formal (XSum news) and informal (Reddit posts) text. This study offers valuable insights into when post-editing is useful for text summarization: it helped in some cases (e.g., when participants lacked domain knowledge) but not in others (e.g., when provided summaries included inaccurate information). Participants' different editing strategies and needs for assistance offer implications for future human-AI summarization systems.	Translated: 2022-06-15 13:21:56 Published: 2022-06-13
# What Should I Know? Using Meta-gradient Descent for Predictive Feature Discovery in a Single Stream of Experience ( http://arxiv.org/abs/2206.06485v1 ) License: see link	Alexandra Kearney, Anna Koop, Johannes Günther, Patrick M. Pilarski	In computational reinforcement learning, a growing body of work seeks to construct an agent's perception of the world through predictions of future sensations; predictions about environment observations are used as additional input features to enable better goal-directed decision-making. An open challenge in this line of work is determining, from the infinitely many predictions that the agent could possibly make, which predictions might best support decision-making. This challenge is especially apparent in continual learning problems where a single stream of experience is available to a singular agent. As a primary contribution, we introduce a meta-gradient descent process by which an agent learns 1) what predictions to make, 2) the estimates for its chosen predictions, and 3) how to use those estimates to generate policies that maximize future reward, all during a single ongoing process of continual learning. In this manuscript we consider predictions expressed as General Value Functions (GVFs): temporally extended estimates of the accumulation of a future signal. We demonstrate that through interaction with the environment an agent can independently select predictions that resolve partial observability, resulting in performance similar to expertly specified GVFs. By learning, rather than manually specifying these predictions, we enable the agent to identify useful predictions in a self-supervised manner, taking a step towards truly autonomous systems.	Translated: 2022-06-15 13:21:38 Published: 2022-06-13
# IGN : Implicit Generative Networks ( http://arxiv.org/abs/2206.05860v1 ) License: CC BY-SA 4.0	Haozheng Luo, Tianyi Wu, Feiyu Han, Zhijun Yan, Jianfen Zhang	In this work, we build on recent advances in distributional reinforcement learning to give a state-of-the-art distributional variant of the model based on the IQN. We achieve this by using the GAN model's generator and discriminator function with quantile regression to approximate the full quantile value for the state-action return distribution. We demonstrate improved performance on our baseline dataset: 57 Atari 2600 games in the ALE. We also use our algorithm to show state-of-the-art training performance of risk-sensitive policies in Atari games with policy optimization and evaluation.	Translated: 2022-06-15 02:46:43 Published: 2022-06-13
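The quantile regression mentioned in the entry above rests on the quantile (pinball) loss, whose minimiser over a set of samples is the requested quantile. The sketch below shows only that building block on a hand-made list of returns; it is not the IGN or IQN architecture, and the function name and sample values are hypothetical.

```python
def pinball_loss(prediction, targets, tau):
    # Quantile-regression loss: under-estimates are weighted by tau and
    # over-estimates by (1 - tau), so the minimiser is the tau-quantile.
    total = 0.0
    for t in targets:
        u = t - prediction
        total += tau * u if u >= 0 else (tau - 1.0) * u
    return total / len(targets)

returns = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical sampled returns
# Pick, among the observed values, the one that minimises the loss for each tau.
best_median = min(returns, key=lambda q: pinball_loss(q, returns, 0.5))
best_upper = min(returns, key=lambda q: pinball_loss(q, returns, 0.9))
```

With tau = 0.5 the minimiser is the median (3.0), and with tau = 0.9 it moves to the upper end of the samples (5.0), which is how a set of tau values traces out a whole return distribution.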
# TC-SfM: Robust Track-Community-Based Structure-from-Motion ( http://arxiv.org/abs/2206.05866v1 ) License: CC BY 4.0	Lei Wang, Linlin Ge, Shan Luo, Zihan Yan, Zhaopeng Cui and Jieqing Feng	Structure-from-Motion (SfM) aims to recover 3D scene structures and camera poses based on the correspondences between input images, and thus the ambiguity caused by duplicate structures (i.e., different structures with strong visual resemblance) always results in incorrect camera poses and 3D structures. To deal with the ambiguity, most existing studies resort to additional constraint information or implicit inference by analyzing two-view geometries or feature points. In this paper, we propose to exploit high-level information in the scene, i.e., the spatial contextual information of local regions, to guide the reconstruction. Specifically, a novel structure is proposed, namely the track-community, in which each community consists of a group of tracks and represents a local segment in the scene. A community detection algorithm is used to partition the scene into several segments. Then, the potentially ambiguous segments are detected by analyzing the neighborhood of tracks and corrected by checking the pose consistency. Finally, we perform partial reconstruction on each segment and align the segments with a novel bidirectional consistency cost function which considers both 3D-3D correspondences and pairwise relative camera poses. Experimental results demonstrate that our approach can robustly alleviate reconstruction failure resulting from visually indistinguishable structures and accurately merge the partial reconstructions.	Translated: 2022-06-15 02:30:10 Published: 2022-06-13
# (参考訳) Pseudo-Labeling の信頼性 Confident Sinkhorn Allocation for Pseudo-Labeling ( http://arxiv.org/abs/2206.05880v1 ) ライセンス: CC BY 4.0	Vu Nguyen and Sachin Farfade and Anton van den Hengel	(参考訳) 半教師付き学習は、ラベル付きデータへの機械学習の依存を減らす重要なツールである。しかし、その内在する空間的・意味的構造を利用して、主に画像や言語データに適用されている。これらのメソッドは、これらのドメイン構造が利用できないため、表データには適用されない。既存の擬似ラベル法(pl)は表データに有効であるが、ノイズサンプルや未知のしきい値が与えられたグリーディ代入に対して脆弱である。本稿では,信頼度の高い標本のみにラベルを割り当て,最適な輸送手段によって最適なラベル割り当てを学習するCSA(Confident Sinkhorn Allocation)を提案する。 CSAは、この事実上重要な領域における現在の最先端技術よりも優れています。 Semi-supervised learning is a critical tool in reducing machine learning's dependence on labeled data. It has, however, been applied primarily to image and language data, by exploiting the inherent spatial and semantic structure therein. These methods do not apply to tabular data because these domain structures are not available. Existing pseudo-labeling (PL) methods can be effective for tabular data but are vulnerable to noise samples and to greedy assignments given a predefined threshold which is unknown. This paper addresses this problem by proposing a Confident Sinkhorn Allocation (CSA), which assigns labels to only samples with high confidence scores and learns the best label allocation via optimal transport. CSA outperforms the current state-of-the-art in this practically important area.	翻訳日:2022-06-15 02:08:24 公開日:2022-06-13
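The CSA abstract above says that labels are allocated via optimal transport. As a rough illustration of that building block only, the sketch below runs plain entropic Sinkhorn iterations on a toy problem. The cost (one minus a hypothetical class probability), the uniform marginals, the regularization strength `reg`, and the function name `sinkhorn_plan` are all assumptions made for this example; the paper's confidence filtering and its exact constraints are not reproduced here.

```python
import math

def sinkhorn_plan(cost, row_mass, col_mass, reg=0.5, iters=200):
    # Gibbs kernel of the cost matrix.
    kernel = [[math.exp(-c / reg) for c in row] for row in cost]
    n, m = len(row_mass), len(col_mass)
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [row_mass[i] / sum(kernel[i][j] * v[j] for j in range(m))
             for i in range(n)]
        v = [col_mass[j] / sum(kernel[i][j] * u[i] for i in range(n))
             for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]

# Hypothetical class probabilities for three unlabeled samples, two classes.
probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
cost = [[1.0 - p for p in row] for row in probs]
plan = sinkhorn_plan(cost, row_mass=[1 / 3] * 3, col_mass=[0.5, 0.5])

# Pseudo-label = class receiving the most transported mass for each sample.
pseudo_labels = [max(range(2), key=row.__getitem__) for row in plan]
```

Unlike a greedy per-sample threshold, the column marginals couple the samples, so an ambiguous sample's label depends on how much mass the other samples already send to each class.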
# (参考訳) 大規模バッチによるアンカークライアントのサンプリングによるフェデレーション学習の高速化 Accelerating Federated Learning via Sampling Anchor Clients with Large Batches ( http://arxiv.org/abs/2206.05891v1 ) ライセンス: CC BY 4.0	Feijie Wu, Song Guo, Zhihao Qu, Shiqi He, Ziming Liu	(参考訳) 最近の連合学習研究で大規模なバッチを使用すると収束率が向上するが、小さなバッチを使うよりも計算のオーバーヘッドが増大する。この制限を克服するため,我々は,参加者を時間変動確率に基づいてアンカーグループとマイナーグループに分離する統一フレームワークfedamdを提案する。アンカーグループの各クライアントは、大きなバッチを使用して勾配を計算する。 minerグループのクライアントは、シリアルミニバッチを使用して複数のローカルアップデートを実行し、各ローカルアップデートは、クライアントのブルジー平均から派生したグローバルターゲットによって間接的に制御される。その結果、マイナーグループは、グローバルモデルを更新するのに適応した、大域的最小化への最適化された更新に従う。 FedAMDは$\epsilon$-approximationによって測定され、一定の確率でアンカーをサンプリングすることで、非凸目的の下で$O(1/\epsilon)$の収束率を達成する。理論的結果は最先端のアルゴリズムであるBVR-L-SGDを$O(1/\epsilon^{3/2})$でかなり上回り、FedAMDは少なくとも$O(1/\epsilon)$通信オーバーヘッドを減らす。実世界のデータセットに関する実証的研究は、FedAMDの有効性を検証し、提案アルゴリズムの優位性を実証する。 Using large batches in recent federated learning studies has improved convergence rates, but it requires additional computation overhead compared to using small batches. To overcome this limitation, we propose a unified framework FedAMD, which disjoints the participants into anchor and miner groups based on time-varying probabilities. Each client in the anchor group computes the gradient using a large batch, which is regarded as its bullseye. Clients in the miner group perform multiple local updates using serial mini-batches, and each local update is also indirectly regulated by the global target derived from the average of clients' bullseyes. As a result, the miner group follows a near-optimal update towards the global minimizer, adapted to update the global model. Measured by $\epsilon$-approximation, FedAMD achieves a convergence rate of $O(1/\epsilon)$ under non-convex objectives by sampling an anchor with a constant probability. The theoretical result considerably surpasses the state-of-the-art algorithm BVR-L-SGD at $O(1/\epsilon^{3/2})$, while FedAMD reduces at least $O(1/\epsilon)$ communication overhead. 
Empirical studies on real-world datasets validate the effectiveness of FedAMD and demonstrate the superiority of our proposed algorithm.	翻訳日:2022-06-15 01:41:06 公開日:2022-06-13
# (参考訳) 幾何学的ガイドによる統合勾配 Geometrically Guided Integrated Gradients ( http://arxiv.org/abs/2206.05903v1 ) ライセンス: CC BY 4.0	Md Mahfuzur Rahman, Noah Lewis, Sergey Plis	(参考訳) 深層ニューラルネットワークの解釈可能性の方法は、主に元の入力や摂動入力に対するクラススコアの感度に焦点が当てられ、通常は実際の勾配や修正された勾配を用いて測定される。予測の裏にある理性を理解するために、モデルに依存しないアプローチを使う方法もある。本稿では,入力に対するモデルパラメータ空間の局所的幾何が,ポストホックな説明を改善する上でも有用であることを論じ,実証する。この目的を達成するために,従来の統合勾配法のように,線形経路に沿った勾配計算の上に構築する「幾何学的誘導型統合勾配」と呼ばれる解釈可能性手法を提案する。しかし、勾配情報を統合する代わりに、入力の複数のスケールバージョンからモデルの動的挙動を探索し、各入力に対する最良の属性をキャプチャする。提案手法が,主観的および定量的評価においてバニラや統合的勾配よりも優れていることを示す。また,従来のモデルランダム化試験を補完する「モデル摂動」正当性チェックを提案する。 Interpretability methods for deep neural networks mainly focus on the sensitivity of the class score with respect to the original or perturbed input, usually measured using actual or modified gradients. Some methods also use a model-agnostic approach to understanding the rationale behind every prediction. In this paper, we argue and demonstrate that local geometry of the model parameter space relative to the input can also be beneficial for improved post-hoc explanations. To achieve this goal, we introduce an interpretability method called "geometrically-guided integrated gradients" that builds on top of the gradient calculation along a linear path as traditionally used in integrated gradient methods. However, instead of integrating gradient information, our method explores the model's dynamic behavior from multiple scaled versions of the input and captures the best possible attribution for each input. We demonstrate through extensive experiments that the proposed approach outperforms vanilla and integrated gradients in subjective and quantitative assessment. We also propose a "model perturbation" sanity check to complement the traditionally used "model randomization" test.	翻訳日:2022-06-15 01:39:18 公開日:2022-06-13
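For orientation, the standard integrated gradients baseline that the abstract above builds on (gradients accumulated along a linear path from a baseline to the input) can be sketched as follows. This is not the paper's geometrically guided variant; the toy model `f`, its analytic gradient `grad_f`, the zero baseline, and the step count are assumptions chosen for illustration.

```python
def f(x):
    # Toy differentiable "model" (assumption): sum of squares.
    return sum(v * v for v in x)

def grad_f(x):
    # Analytic gradient of the toy model.
    return [2.0 * v for v in x]

def integrated_gradients(x, baseline, grad_fn, steps=200):
    # Midpoint Riemann sum of the gradient along the straight line
    # from the baseline to the input.
    n = len(x)
    avg_grad = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (v - b) for v, b in zip(x, baseline)]
        g = grad_fn(point)
        for i in range(n):
            avg_grad[i] += g[i] / steps
    # Scale the averaged gradient by the input-baseline difference.
    return [(v - b) * g for v, b, g in zip(x, baseline, avg_grad)]

x = [1.0, -2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(x, baseline, grad_f)
```

For this toy model the attributions sum to `f(x) - f(baseline)`, which is the completeness property usually cited for integrated gradients.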
# (参考訳) 帯域制限関数の一般化におけるNN上のGNNの優位性 Superiority of GNN over NN in generalizing bandlimited functions ( http://arxiv.org/abs/2206.05904v1 ) ライセンス: CC BY 4.0	A. Martina Neuman, Rongrong Wang and Yuying Xie	(参考訳) 厳密な数学的議論を通じて、GNNアーキテクチャは、コンパクトな$d$次元ユークリッド格子上の帯域制限関数の近似において、NNのアーキテクチャよりも優れていることを示す。一様近似誤差である$O_{d}(2^{-\mathcal{M}^{1/d}})$を達成するために、前者は$\mathcal{M}$個の関数値しか必要とせず、この誤差率はNNがより悪くなる可能性があるという意味で最適であることを示した。 We constructively show, via rigorous mathematical arguments, that GNN architectures outperform those of NN in approximating bandlimited functions on compact $d$-dimensional Euclidean grids. We show that the former only need $\mathcal{M}$ sampled functional values in order to achieve a uniform approximation error of $O_{d}(2^{-\mathcal{M}^{1/d}})$ and that this error rate is optimal, in the sense that, NNs might achieve worse.	翻訳日:2022-06-15 01:27:02 公開日:2022-06-13
# (参考訳) INDIGO:ドメインの一般化のための固有のマルチモーダリティ INDIGO: Intrinsic Multimodality for Domain Generalization ( http://arxiv.org/abs/2206.05912v1 ) ライセンス: CC BY 4.0	Puneet Mangla and Shivam Chandhok and Milan Aggarwal and Vineeth N Balasubramanian and Balaji Krishnamurthy	(参考訳) unseen domain(ドメインの一般化)の下で一般化するモデルには、ドメインに依存しない特徴表現を学習し、オブジェクトカテゴリを構成する基礎となるセマンティクスを捉えることが不可欠である。安価な弱教師付きノイズテキストアノテーションから全体表現を学習する弱教師付き視覚言語モデルへの最近の進歩は、異なるドメインで一般化する対象特性を捉えることによって意味理解の能力を示している。しかし、複数のソースドメインが関与する場合、データセット内の画像毎にテキストアノテーションをキュレートするコストは、その数に応じて数回爆発する可能性がある。これにより、プロセスが退屈で実現不可能になり、教師付き視覚言語アプローチを直接使用して、目に見えないドメイン上で最高の一般化を実現するのを妨げます。このことから,既存の事前学習型マルチモーダルネットワークからのマルチモーダル情報を「本質的な」方法で活用して,未知の領域下でのシステム一般化を実現する方法について検討した。そこで本研究では,これらの事前学習されたマルチモーダルネットワークに存在する本質的モダリティを,視覚モダリティとともに簡易かつエレガントに活用し,テスト時に未知領域への一般化を促進するためのドメイン一般化(indigo)のための本質的マルチモーダリティを提案する。我々はいくつかの領域一般化設定(ClosedDG, OpenDG, Limitedソース)を実験し、未確認領域における最先端の一般化性能を示す。さらに、INDIGOの総合的な理解を深めるために、徹底的な分析を行う。 For models to generalize under unseen domains (a.k.a domain generalization), it is crucial to learn feature representations that are domain-agnostic and capture the underlying semantics that makes up an object category. Recent advances towards weakly supervised vision-language models that learn holistic representations from cheap weakly supervised noisy text annotations have shown their ability on semantic understanding by capturing object characteristics that generalize under different domains. However, when multiple source domains are involved, the cost of curating textual annotations for every image in the dataset can blow up several times, depending on their number. This makes the process tedious and infeasible, hindering us from directly using these supervised vision-language approaches to achieve the best generalization on an unseen domain. Motivated from this, we study how multimodal information from existing pre-trained multimodal networks can be leveraged in an "intrinsic" way to make systems generalize under unseen domains. 
To this end, we propose IntriNsic multimodality for DomaIn GeneralizatiOn (INDIGO), a simple and elegant way of leveraging the intrinsic modality present in these pre-trained multimodal networks along with the visual modality to enhance generalization to unseen domains at test-time. We experiment on several Domain Generalization settings (ClosedDG, OpenDG, and Limited sources) and show state-of-the-art generalization performance on unseen domains. Further, we provide a thorough analysis to develop a holistic understanding of INDIGO.	翻訳日:2022-06-15 01:25:49 公開日:2022-06-13
# (参考訳) 量子化が一般化を改善する理由:二元重みニューラルネットワークのNTK Why Quantization Improves Generalization: NTK of Binary Weight Neural Networks ( http://arxiv.org/abs/2206.05916v1 ) ライセンス: CC BY 4.0	Kaiqi Zhang, Ming Yin, Yu-Xiang Wang	(参考訳) 量子化されたニューラルネットワークは、推論中の空間と計算の複雑さを減らすため、多くの注目を集めている。さらに、量子化が暗黙の正則化として作用し、ニューラルネットワークの一般化性を向上させるという伝承もあるが、この興味深い民俗学を定式化する研究は存在しない。本稿では,ニューラルネットワークの2次重みを確率的ラウンドリングの下でのランダム変数とみなし,ニューラルネットワークの異なる層上の分布分布について検討する。本研究では,連続パラメータとスムーズなアクティベーション関数を持つニューラルネットワークである分布伝搬を近似する準ニューラルネットワークを提案する。この準ニューラルネットワークのニューラル・タンジェント・カーネル(NTK)を導出し、ランダム化スケールのガウス・カーネルに匹敵する約指数速度でNTKの固有値が崩壊することを示す。このことは、双対重みニューラルネットワークの再生カーネルヒルベルト空間(RKHS)が、実値重みを持つものと比較して関数の厳密な部分集合をカバーすることを示している。提案する擬似ニューラルネットワークがバイナリ重み付きニューラルネットワークを十分に近似できることを検証するために実験を行う。さらに、二元重みニューラルネットワークは、ガウスカーネルとラプラスカーネルの差に類似した実値重みニューラルネットワークと比較して、より低い一般化ギャップを与える。 Quantized neural networks have drawn a lot of attention as they reduce the space and computational complexity during the inference. Moreover, there has been folklore that quantization acts as an implicit regularizer and thus can improve the generalizability of neural networks, yet no existing work formalizes this interesting folklore. In this paper, we take the binary weights in a neural network as random variables under stochastic rounding, and study the distribution propagation over different layers in the neural network. We propose a quasi neural network to approximate the distribution propagation, which is a neural network with continuous parameters and smooth activation function. We derive the neural tangent kernel (NTK) for this quasi neural network, and show that the eigenvalue of NTK decays at approximately exponential rate, which is comparable to that of Gaussian kernel with randomized scale. This in turn indicates that the Reproducing Kernel Hilbert Space (RKHS) of a binary weight neural network covers a strict subset of functions compared with the one with real value weights. 
We use experiments to verify that the proposed quasi neural network approximates binary weight neural networks well. Furthermore, binary weight neural networks give a lower generalization gap than real-valued weight neural networks, which is similar to the difference between the Gaussian kernel and the Laplace kernel.	翻訳日:2022-06-15 01:10:30 公開日:2022-06-13
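The abstract above treats binary weights as random variables under stochastic rounding. A minimal sketch of one common form of stochastic rounding to {-1, +1} is given below; the specific rule (round to +1 with probability (1 + w) / 2 after clipping to [-1, 1]) and the name `stochastic_binarize` are assumptions for illustration, and the paper may use a different parameterization.

```python
import random

def stochastic_binarize(w, rng):
    # Clip to [-1, 1], then round to +1 with probability (1 + w) / 2,
    # so that the expected value of the binary weight equals w.
    w = max(-1.0, min(1.0, w))
    return 1.0 if rng.random() < (1.0 + w) / 2.0 else -1.0

rng = random.Random(0)
w = 0.3
samples = [stochastic_binarize(w, rng) for _ in range(100_000)]
empirical_mean = sum(samples) / len(samples)
```

Because the rounding is unbiased, the mean of many binarized draws approaches the underlying real-valued weight, which is what makes it natural to study how the induced distribution propagates through layers.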
# (参考訳) 大腸癌における蛍光アンギオグラフィーの分類-予備報告 Fluorescence angiography classification in colorectal surgery -- A preliminary report ( http://arxiv.org/abs/2206.05935v1 ) ライセンス: CC BY-SA 4.0	Antonio S Soares, Sophia Bano, Neil T Clancy, Laurence B Lovat, Danail Stoyanov and Manish Chand	(参考訳) 背景:fluorescence angiographyは, 外科医が最適な灌流組織を選択できることで, 腹水漏出の軽減に非常に有望な結果を示している。しかし、蛍光信号の主観的な解釈は、異なる外科医間で大きな違いが存在するため、この技法の幅広い応用を妨げる。本研究の目的は,術中蛍光アンギオグラフィーデータに基づいて,大腸組織を "perfused" あるいは "not perfused" と分類する人工知能アルゴリズムを開発することである。方法:resnetアーキテクチャを用いた分類モデルを,3次紹介センターにおける大腸切除の蛍光血管造影ビデオのデータセットを用いて検討した。コロンの蛍光および非蛍光セグメントに対応するフレームを用いて分類アルゴリズムを訓練した。トレーニングセットに使用されていない患者のフレームを用いた検証を行い、同じ機器を用いて収集したデータと異なるカメラを用いて収集したデータの両方を含む。パフォーマンス指標が計算され、サリエンシーマップが出力をさらに分析するために使用された。組織分類に基づいて決定境界が同定された。結果: 畳み込みニューラルネットワークを7例の1790フレームで訓練し, 14例の24フレームで検証した。トレーニングセットの精度は100%,検証セットの精度は80%であった。リコールと精度はそれぞれトレーニングセットで100%と100%、検証セットで68.8%と91.7%であった。結論: 術中蛍光式血管造影の精度の高い自動分類が可能であり, 自動判定境界同定が可能である。これにより外科医は蛍光血管造影の技術を標準化できる。アルゴリズムをデプロイするWebベースのアプリが利用可能になった。 Background: Fluorescence angiography has shown very promising results in reducing anastomotic leaks by allowing the surgeon to select optimally perfused tissue. However, subjective interpretation of the fluorescent signal still hinders broad application of the technique, as significant variation between different surgeons exists. Our aim is to develop an artificial intelligence algorithm to classify colonic tissue as 'perfused' or 'not perfused' based on intraoperative fluorescence angiography data. Methods: A classification model with a Resnet architecture was trained on a dataset of fluorescence angiography videos of colorectal resections at a tertiary referral centre. Frames corresponding to fluorescent and non-fluorescent segments of colon were used to train a classification algorithm. Validation using frames from patients not used in the training set was performed, including both data collected using the same equipment and data collected using a different camera. 
Performance metrics were calculated, and saliency maps were used to further analyse the output. A decision boundary was identified based on the tissue classification. Results: A convolutional neural network was successfully trained on 1790 frames from 7 patients and validated on 24 frames from 14 patients. The accuracy was 100% on the training set and 80% on the validation set. Recall and precision were, respectively, 100% and 100% on the training set and 68.8% and 91.7% on the validation set. Conclusion: Automated classification of intraoperative fluorescence angiography with a high degree of accuracy is possible and allows automated decision boundary identification. This will enable surgeons to standardise the technique of fluorescence angiography. A web-based app was made available to deploy the algorithm.	翻訳日:2022-06-15 01:09:11 公開日:2022-06-13
# (参考訳) リコメンダシステムのためのユニバーサルシーケンス表現学習に向けて Towards Universal Sequence Representation Learning for Recommender Systems ( http://arxiv.org/abs/2206.05941v1 ) ライセンス: CC BY 4.0	Yupeng Hou, Shanlei Mu, Wayne Xin Zhao, Yaliang Li, Bolin Ding, Ji-Rong Wen	(参考訳) 効率的なシーケンシャルレコメンデータを開発するために,歴史的ユーザ行動のモデル化を目的とした一連のシーケンス表現学習(SRL)手法を提案する。既存のSRLメソッドの多くは、ユーザの好みをよりよく捉えるためにシーケンスモデルを開発するために明示的なアイテムIDに依存している。有効性はあるものの、アイテムIDを明示的にモデル化する制限のため、これらの手法を新しいレコメンデーションシナリオに移すことは困難である。この問題に取り組むため,我々はunisrecと呼ばれる新しいユニバーサルシーケンス表現学習手法を提案する。提案手法では,アイテムの関連記述テキストを用いて,異なるレコメンデーションシナリオ間で転送可能な表現を学習する。ユニバーサルアイテム表現を学習するために、パラメトリックホワイトニングとmixed-of-experts enhanced adaptorに基づく軽量なアイテムエンコーディングアーキテクチャを設計する。ユニバーサルシーケンス表現の学習には,複数領域の負をサンプリングして2つのコントラストプリトレーニングタスクを導入する。事前訓練されたユニバーサルシーケンス表現モデルにより,提案手法は帰納的あるいは帰納的設定の下で,パラメータ効率の良い方法で,新しいレコメンデーションドメインやプラットフォームに効果的に移行することができる。実世界のデータセット上で行った広範囲な実験により,提案手法の有効性が示された。特に,提案手法はクロスプラットフォーム環境での性能向上にもつながり,ユニバーサルsrl方式の強い転送性を示す。コードと事前訓練されたモデルは、https://github.com/RUCAIBox/UniSRec.comで入手できる。 In order to develop effective sequential recommenders, a series of sequence representation learning (SRL) methods are proposed to model historical user behaviors. Most existing SRL methods rely on explicit item IDs for developing the sequence models to better capture user preference. Though effective to some extent, these methods are difficult to be transferred to new recommendation scenarios, due to the limitation by explicitly modeling item IDs. To tackle this issue, we present a novel universal sequence representation learning approach, named UniSRec. The proposed approach utilizes the associated description text of items to learn transferable representations across different recommendation scenarios. For learning universal item representations, we design a lightweight item encoding architecture based on parametric whitening and mixture-of-experts enhanced adaptor. 
For learning universal sequence representations, we introduce two contrastive pre-training tasks by sampling multi-domain negatives. With the pre-trained universal sequence representation model, our approach can be effectively transferred to new recommendation domains or platforms in a parameter-efficient way, under either inductive or transductive settings. Extensive experiments conducted on real-world datasets demonstrate the effectiveness of the proposed approach. Especially, our approach also leads to a performance improvement in a cross-platform setting, showing the strong transferability of the proposed universal SRL method. The code and pre-trained model are available at: https://github.com/RUCAIBox/UniSRec.	翻訳日:2022-06-15 00:56:48 公開日:2022-06-13
# (参考訳) Pro-TIP:TIP検出によるRObust自動超音波校正用ファントム PRO-TIP: Phantom for RObust automatic ultrasound calibration by TIP detection ( http://arxiv.org/abs/2206.05962v1 ) ライセンス: CC BY 4.0	Matteo Ronchetti, Julia Rackerseder, Maria Tirindelli, Mehrdad Salehi, Nassir Navab, Wolfgang Wein, Oliver Zettinig	(参考訳) 追跡超音波プローブの自動校正法を提案する。この目的のために、高さの異なる9つの円錐からなるカスタムファントムを設計する。チップは複数のスイープにマッチするキーポイントとして使用される。畳み込みニューラルネットワークを用いてこれらを抽出し、超音波フレーム毎にコーンを分割し、スイープ全体にわたって追跡する。キャリブレーションはRANSACを用いて頑健に推定され、後に画像ベース技術を用いて洗練される。 phantomは3dプリントでき、最先端の方法よりも多くのアドバンテージを提供します。 phantomの設計とアルゴリズムコードはオンラインで無料で入手できる。ファントム自体が追跡対象を必要としないため,現在使用されている技術よりも使いやすさが向上している。この完全自動メソッドは、実験で示したように、新しいプローブと異なるベンダーに一般化します。このアプローチは、ドメインエキスパートが取得したキャリブレーションに匹敵する結果を生み出す。 We propose a novel method to automatically calibrate tracked ultrasound probes. To this end we design a custom phantom consisting of nine cones with different heights. The tips are used as key points to be matched between multiple sweeps. We extract them using a convolutional neural network to segment the cones in every ultrasound frame and then track them across the sweep. The calibration is robustly estimated using RANSAC and later refined employing image based techniques. Our phantom can be 3D-printed and offers many advantages over state-of-the-art methods. The phantom design and algorithm code are freely available online. Since our phantom does not require a tracking target on itself, ease of use is improved over currently used techniques. The fully automatic method generalizes to new probes and different vendors, as shown in our experiments. Our approach produces results comparable to calibrations obtained by a domain expert.	翻訳日:2022-06-15 00:38:24 公開日:2022-06-13
# (参考訳) ATDN vSLAM: 視覚的同時局所化とマッピングのための全スルーディープラーニングベースのソリューション ATDN vSLAM: An all-through Deep Learning-Based Solution for Visual Simultaneous Localization and Mapping ( http://arxiv.org/abs/2206.05963v1 ) ライセンス: CC BY 4.0	M\'aty\'as Sz\'ant\'o, Gy\"orgy R. Bog\'ar, L\'aszl\'o Vajta	(参考訳) 本稿では,深層学習コンポーネントで構成された視覚同時局所化マッピング(vslam)のための新しい解法を提案する。提案されたアーキテクチャは高度にモジュール化されたフレームワークであり、各コンポーネントがビジョンベースのディープラーニングソリューションの各分野に最先端の成果を提供する。本論文は, これら個々のビルディングブロックの相乗的統合により, 機能的かつ効率的な全スルーディープニューラル(ATDN)vSLAMシステムを構築することができることを示す。 Embedding Distance Loss関数を導入し、それを使用してATDNアーキテクチャをトレーニングする。その結果、KITTIデータセットのサブセットで4.4%の変換と0.0176 deg/m回転誤差を達成した。提案アーキテクチャは、データベース作成を支援する効率的で低遅延の自律運転(AD)や、自律走行車(AV)制御の基礎として利用できる。 In this paper, a novel solution is introduced for visual Simultaneous Localization and Mapping (vSLAM) that is built up of Deep Learning components. The proposed architecture is a highly modular framework in which each component offers state of the art results in their respective fields of vision-based deep learning solutions. The paper shows that with the synergic integration of these individual building blocks, a functioning and efficient all-through deep neural (ATDN) vSLAM system can be created. The Embedding Distance Loss function is introduced and using it the ATDN architecture is trained. The resulting system managed to achieve 4.4% translation and 0.0176 deg/m rotational error on a subset of the KITTI dataset. The proposed architecture can be used for efficient and low-latency autonomous driving (AD) aiding database creation as well as a basis for autonomous vehicle (AV) control.	翻訳日:2022-06-15 00:29:58 公開日:2022-06-13
# (参考訳) emprox:ニューラルアーキテクチャ探索のためのニューラルネットワーク性能推定 EmProx: Neural Network Performance Estimation For Neural Architecture Search ( http://arxiv.org/abs/2206.05972v1 ) ライセンス: CC BY 4.0	G.G.H. Franken, P. Singh, J. Vanschoren	(参考訳) 一般的なニューラルアーキテクチャ探索手法は、パフォーマンスを評価し最適なアーキテクチャを見つけるためにトレーニングを必要とする大量の候補アーキテクチャを生成する。検索時間を最小化するために、異なるパフォーマンス推定戦略を使用する。このような戦略の有効性は、正確性、適合性、クエリ時間によって異なる。本研究では,EmProx Score (Embedding Proximity Score) という新しい手法を提案する。ニューラルネットワーク最適化(nao)と同様に、この手法は候補アーキテクチャをエンコーダ-デコーダフレームワークを使用して連続的な埋め込み空間にマッピングする。次に、その性能が知られているアーキテクチャの埋め込みベクトルに基づいて、重み付きkNNを用いて候補の性能を推定する。本手法の性能評価は,NAO と比較して約9倍高速であり,NAO で使用される MLP 性能予測器と同等である。現在使用されている他のパフォーマンス評価戦略に対するベンチマークは、より正確で、5倍から80倍高速であることを示している。 Common Neural Architecture Search methods generate large amounts of candidate architectures that need training in order to assess their performance and find an optimal architecture. To minimize the search time we use different performance estimation strategies. The effectiveness of such strategies varies in terms of accuracy and fit and query time. This study proposes a new method, EmProx Score (Embedding Proximity Score). Similar to Neural Architecture Optimization (NAO), this method maps candidate architectures to a continuous embedding space using an encoder-decoder framework. The performance of candidates is then estimated using weighted kNN based on the embedding vectors of architectures of which the performance is known. Performance estimations of this method are comparable to the MLP performance predictor used in NAO in terms of accuracy, while being nearly nine times faster to train compared to NAO. Benchmarking against other performance estimation strategies currently used shows similar to better accuracy, while being five up to eighty times faster.	翻訳日:2022-06-15 00:16:05 公開日:2022-06-13
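The EmProx abstract above describes estimating a candidate's performance with weighted kNN over embedding vectors of architectures whose performance is known. A minimal sketch of that step, assuming the embeddings already exist, is shown below. The inverse-distance weighting, the value of `k`, the toy two-dimensional embeddings, and the name `weighted_knn_estimate` are assumptions for this example rather than details taken from the paper.

```python
import math

def weighted_knn_estimate(query, known, k=3, eps=1e-8):
    # known: list of (embedding, accuracy) pairs for evaluated architectures.
    # Keep the k nearest neighbours by Euclidean distance.
    nearest = sorted((math.dist(query, emb), acc) for emb, acc in known)[:k]
    # Inverse-distance weights (an assumption for this sketch).
    weights = [1.0 / (d + eps) for d, _ in nearest]
    weighted_sum = sum(w * acc for w, (_, acc) in zip(weights, nearest))
    return weighted_sum / sum(weights)

# Hypothetical embeddings and accuracies of already-trained architectures.
known = [
    ([0.0, 0.0], 0.90),
    ([1.0, 0.0], 0.80),
    ([0.0, 1.0], 0.70),
    ([5.0, 5.0], 0.10),
]
estimate = weighted_knn_estimate([0.1, 0.1], known)
```

A lookup like this needs no training beyond the encoder itself, which is consistent with the abstract's claim that the estimator is cheap to fit and query.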
# (参考訳) 線形時間時相論理に対するSahlqvist型対応定理 A Sahlqvist-style Correspondence Theorem for Linear-time Temporal Logic ( http://arxiv.org/abs/2206.05973v1 ) ライセンス: CC BY 4.0	Rui Li, Francesco Belardinelli	(参考訳) モーダル論理の言語はクリプキフレーム上の一階条件を表現することができる。 Henrik Sahlqvist による古典的な結果は、一階条件 (あるいは Sahlqvist 対応式) が効果的でアルゴリズム的な方法で発見できる、重要なモーダル公式のクラスを特定する。最近の研究は、この古典的な結果をより複雑なモーダル言語に拡張することに成功している。本稿では,同様の方針を追求し,時相仕様のための最も広く使われている形式言語の一つである線形時間時相論理 (LTL) に対して,Sahlqvist 形式の対応定理を開発する。 LTLは、基本的なモーダル論理の構文を拡張し、専用のテンポラル演算子 Next X と Until U を持つ。その結果、一階の対応式を持つ公式のクラスの複雑さも、それに応じて増加する。本稿では, モーダル作用素 F, G, X, U を用いて構築した LTL Sahlqvist 公式の有意なクラスを同定する。本論文の主な結果は、一階言語で定義可能なフレーム条件に対する LTL Sahlqvist 公式の対応を証明することである。 The language of modal logic is capable of expressing first-order conditions on Kripke frames. The classic result by Henrik Sahlqvist identifies a significant class of modal formulas for which first-order conditions -- or Sahlqvist correspondents -- can be found in an effective, algorithmic way. Recent works have successfully extended this classic result to more complex modal languages. In this paper, we pursue a similar line and develop a Sahlqvist-style correspondence theorem for Linear-time Temporal Logic (LTL), which is one of the most widely used formal languages for temporal specification. LTL extends the syntax of basic modal logic with dedicated temporal operators Next X and Until U. As a result, the complexity of the class of formulas that have first-order correspondents also increases accordingly. In this paper, we identify a significant class of LTL Sahlqvist formulas built by using modal operators F, G, X, and U. The main result of this paper is to prove the correspondence of LTL Sahlqvist formulas to frame conditions that are definable in a first-order language.	翻訳日:2022-06-15 00:06:18 公開日:2022-06-13
# (参考訳) ランク損失を用いたディープニューラルネットワークによる高速化故障時間モデル Deep Neural Network Based Accelerated Failure Time Models using Rank Loss ( http://arxiv.org/abs/2206.05974v1 ) ライセンス: CC BY 4.0	Gwangsu Kim and Sangwook Kang	(参考訳) 加速故障時間(aft)モデルは、故障時間と一連の共変量との対数線形関係を仮定する。危険機能に取り組む他の一般的な生存モデルとは対照的に、共変量の影響は直感的に解釈される障害時間に直接影響する。誤差分布を規定しない半パラメトリックAFTモデルは、分布仮定から逸脱するために柔軟で堅牢である。望ましい特徴から、このタイプのモデルは、検閲された障害時間データの解析において、一般的なcoxモデルに代わる有望な選択肢と見なされている。しかしながら、これらの AFT モデルでは、平均に対する線形予測器が典型的に仮定される。平均をモデル化する際、予測子の非線形性についてはほとんど研究されていない。ディープニューラルネットワーク(DNN)は過去数十年にわたって注目され、様々な分野で大きな成功を収めてきた。 DNNにはいくつかの顕著な利点があり、非線形性に対処するのに特に有用であることが示されている。これを利用して,Gehan型損失モデルとサブサンプリング手法を組み合わせることで,AFTモデルにDNNを適用することを提案する。提案したDNNとランクベースAFTモデル(DeepR-AFT)の有限サンプル特性を広範囲にわたる刺激研究により検討した。 DeepR-AFTは、予測器が非線形である場合、パラメトリックまたはセミパラメトリックよりも優れた性能を示す。線形予測器の場合、共変量の大きさが大きい場合、DeepR-AFTはより良く動作する。提案するdeepr-aftは,その優位性を示す2つの実データセットを用いて示す。 An accelerated failure time (AFT) model assumes a log-linear relationship between failure times and a set of covariates. In contrast to other popular survival models that work on hazard functions, the effects of covariates are directly on failure times, whose interpretation is intuitive. The semiparametric AFT model that does not specify the error distribution is flexible and robust to departures from the distributional assumption. Owing to the desirable features, this class of models has been considered as a promising alternative to the popular Cox model in the analysis of censored failure time data. However, in these AFT models, a linear predictor for the mean is typically assumed. Little research has addressed the nonlinearity of predictors when modeling the mean. Deep neural networks (DNNs) have received a focal attention over the past decades and have achieved remarkable success in a variety of fields. DNNs have a number of notable advantages and have been shown to be particularly useful in addressing the nonlinearity. 
By taking advantage of this, we propose to apply DNNs in fitting AFT models using a Gehan-type loss, combined with a sub-sampling technique. Finite sample properties of the proposed DNN and rank-based AFT model (DeepR-AFT) are investigated via an extensive simulation study. DeepR-AFT shows superior performance over its parametric or semiparametric counterparts when the predictor is nonlinear. For linear predictors, DeepR-AFT performs better when the dimensions of covariates are large. The proposed DeepR-AFT is illustrated using two real datasets, which demonstrates its superiority.	翻訳日:2022-06-14 23:47:38 公開日:2022-06-13
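The Gehan-type loss mentioned in the abstract above is a rank-based objective on the residuals of the log failure times. The sketch below shows the usual textbook form, assuming residuals e_i = log T_i - f(x_i), where f would be the network output; the normalization by n squared, the absence of sub-sampling, and the name `gehan_loss` are assumptions for illustration and may differ from the paper's exact formulation.

```python
def gehan_loss(log_times, events, predictions):
    # Residuals e_i = log T_i - f(x_i).
    residuals = [t - p for t, p in zip(log_times, predictions)]
    n = len(residuals)
    total = 0.0
    for i in range(n):
        if not events[i]:
            continue  # censored observations do not index the outer sum
        for j in range(n):
            # Penalize pairs where residual j exceeds an uncensored residual i.
            total += max(0.0, residuals[j] - residuals[i])
    return total / (n * n)

# Hypothetical data: observed log times and event indicators (1 = failure seen).
log_times = [1.0, 2.0, 3.0]
events = [1, 0, 1]
loss_perfect = gehan_loss(log_times, events, predictions=log_times)
loss_constant = gehan_loss(log_times, events, predictions=[0.0, 0.0, 0.0])
```

The loss depends only on pairwise differences of residuals, so it does not require specifying the error distribution, which matches the semiparametric setting described in the abstract.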
# (参考訳) トップ2のアルゴリズムが再検討 Top Two Algorithms Revisited ( http://arxiv.org/abs/2206.05979v1 ) ライセンス: CC BY-SA 4.0	Marc Jourdan, R\'emy Degenne, Dorian Baudry, Rianne de Heide and Emilie Kaufmann	(参考訳) トップ2のアルゴリズムは、トンプソンサンプリングを多腕バンディットモデル(Russo, 2016)の最も優れた腕識別に適応させたことで生まれた。彼らは2つの候補の腕、リーダーと挑戦者のランダム化によって次の腕を選択します。その優れた経験的性能にもかかわらず、固定信頼の最良の腕の識別に関する理論的保証は、既知のばらつきを持つガウス的腕のときのみ得られる。本稿では, リーダー, 挑戦者, および(多分非パラメトリックな)アーム分布の望ましい特性を識別する, 上位2つの方法の一般解析を行う。その結果,有界分布を持つ最適アーム識別のための理論的に支持されたトップ2アルゴリズムが得られた。提案手法は,トンプソンサンプリングから受け継いだリーダの選択に使用されるサンプリングステップが,経験的ベストアームの選択など他の選択に置き換えられることを示す。 Top Two algorithms arose as an adaptation of Thompson sampling to best arm identification in multi-armed bandit models (Russo, 2016), for parametric families of arms. They select the next arm to sample from by randomizing among two candidate arms, a leader and a challenger. Despite their good empirical performance, theoretical guarantees for fixed-confidence best arm identification have only been obtained when the arms are Gaussian with known variances. In this paper, we provide a general analysis of Top Two methods, which identifies desirable properties of the leader, the challenger, and the (possibly non-parametric) distributions of the arms. As a result, we obtain theoretically supported Top Two algorithms for best arm identification with bounded distributions. Our proof method demonstrates in particular that the sampling step used to select the leader inherited from Thompson sampling can be replaced by other choices, like selecting the empirical best arm.	翻訳日:2022-06-14 23:28:24 公開日:2022-06-13
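The leader and challenger mechanism described in the abstract above can be sketched in the style of Top-Two Thompson sampling. The example below assumes Gaussian arms with unit variance and a flat prior, so each posterior is normal with variance 1 / count; the parameter `beta`, the helper names, and the `empirical_leader` switch (reflecting the abstract's remark that the leader may instead be the empirical best arm) are assumptions for illustration, not the paper's algorithms.

```python
import random

def sample_posterior(means, counts, rng):
    # One draw per arm from N(mean, 1 / count) (unit-variance assumption).
    return [rng.gauss(m, (1.0 / c) ** 0.5) for m, c in zip(means, counts)]

def argmax(values):
    return max(range(len(values)), key=values.__getitem__)

def top_two_choice(means, counts, rng, beta=0.5, empirical_leader=False):
    # Leader: empirical best arm, or the best arm under a posterior sample.
    if empirical_leader:
        leader = argmax(means)
    else:
        leader = argmax(sample_posterior(means, counts, rng))
    # Play the leader with probability beta.
    if rng.random() < beta:
        return leader
    # Challenger: re-sample until a different arm looks best.
    # (May take many draws when one arm dominates very strongly.)
    while True:
        challenger = argmax(sample_posterior(means, counts, rng))
        if challenger != leader:
            return challenger

rng = random.Random(0)
means, counts = [0.5, 0.4, 0.1], [10, 10, 10]
picks = [top_two_choice(means, counts, rng) for _ in range(2000)]
```

Randomizing between the two candidates keeps sampling effort on the closest competitor of the apparent best arm, which is the behaviour best arm identification needs.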
# (参考訳) dnnの注意を誘導する効率的なヒューマン・イン・ザ・ループシステム Efficient Human-in-the-loop System for Guiding DNNs Attention ( http://arxiv.org/abs/2206.05981v1 ) ライセンス: CC BY 4.0	Yi He, Xi Yang, Chia-Ming Chang, Haoran Xie, Takeo Igarashi	(参考訳) 注意指導は、ディープラーニングにおけるデータセットバイアスに対処するためのアプローチであり、モデルが決定を下すのに誤った機能に依存している。画像分類タスクに着目し,ユーザが指定した領域への分類器の注意を対話的に誘導し,共起バイアスの影響を低減し,DNNの伝達性と解釈性を向上させる。注意誘導のための従来のアプローチでは、ピクセルレベルのアノテーションの準備が必要であり、インタラクティブシステムとして設計されていない。本稿では,ユーザが簡単なクリックで画像に注釈を付けるための新しい対話的手法と,アノテーション数を大幅に減らすための新しいアクティブラーニング戦略を提案する。提案システムを複数のデータセット上で評価するために,数値評価とユーザ調査を行った。通常、大量のポリゴンベースのセグメンテーションマスクを使用して微調整やDNNの訓練を行う既存の非アクティブラーニングアプローチと比較して、我々のシステムは多くの労力とお金を節約し、データセットにバイアスがかかってもよりうまく機能する微調整ネットワークを得ることができる。実験結果から,提案システムの有効性,妥当性,信頼性が示唆された。 Attention guidance is an approach to addressing dataset bias in deep learning, where the model relies on incorrect features to make decisions. Focusing on image classification tasks, we propose an efficient human-in-the-loop system to interactively direct the attention of classifiers to the regions specified by users, thereby reducing the influence of co-occurrence bias and improving the transferability and interpretability of a DNN. Previous approaches for attention guidance require the preparation of pixel-level annotations and are not designed as interactive systems. We present a new interactive method to allow users to annotate images with simple clicks, and study a novel active learning strategy to significantly reduce the number of annotations. We conducted both a numerical evaluation and a user study to evaluate the proposed system on multiple datasets. Compared to the existing non-active-learning approach which usually relies on huge amounts of polygon-based segmentation masks to fine-tune or train the DNNs, our system can save lots of labor and money and obtain a fine-tuned network that works better even when the dataset is biased. The experiment results indicate that the proposed system is efficient, reasonable, and reliable.	
翻訳日:2022-06-14 23:26:59 公開日:2022-06-13
# (参考訳) 高品質GAN潜在サンプリングのためのハブネス事前分布の探索と活用 Exploring and Exploiting Hubness Priors for High-Quality GAN Latent Sampling ( http://arxiv.org/abs/2206.06014v1 ) ライセンス: CC BY 4.0	Yuanbang Liang, Jing Wu, Yu-Kun Lai, Yipeng Qin	(参考訳) GAN(Generative Adversarial Network)に関する広範な研究にもかかわらず、その潜在空間から高品質の画像を確実にサンプリングする方法は、未検討のトピックである。本稿では, GAN潜在分布のハブネス事前分布を探索し, 利用することにより, 新たなGAN潜在サンプリング手法を提案する。我々の重要な洞察は、GAN潜在空間の高次元性は必然的に、潜在空間の他の潜在変数よりもはるかに大きなサンプリング密度を持つハブ潜在変数の出現につながるということである。その結果、これらのハブ潜在変数はより訓練され、高品質な画像の合成に寄与する。事後的な「チェリーピッキング」と異なり,画像合成前に高品質な潜在変数を識別する事前的な手法であるため,この手法は高効率である。さらに, 広く知られているが純粋に経験的なトランケーショントリックは, ハブ潜在変数の中央クラスタリング効果に対するナイーブな近似であり, トランケーショントリックの理論的根拠を明らかにするだけでなく, 本手法の優越性と基礎性も示す。その結果,提案手法の有効性が示された。 Despite the extensive studies on Generative Adversarial Networks (GANs), how to reliably sample high-quality images from their latent spaces remains an under-explored topic. In this paper, we propose a novel GAN latent sampling method by exploring and exploiting the hubness priors of GAN latent distributions. Our key insight is that the high dimensionality of the GAN latent space will inevitably lead to the emergence of hub latents that usually have much larger sampling densities than other latents in the latent space. As a result, these hub latents are better trained and thus contribute more to the synthesis of high-quality images. Unlike a posteriori "cherry-picking", our method is highly efficient as it is an a priori method that identifies high-quality latents before the synthesis of images. Furthermore, we show that the well-known but purely empirical truncation trick is a naive approximation to the central clustering effect of hub latents, which not only uncovers the rationale of the truncation trick, but also indicates the superiority and fundamentality of our method. Extensive experimental results demonstrate the effectiveness of the proposed method.	翻訳日:2022-06-14 23:08:13 公開日:2022-06-13
# (参考訳) 実処理インメモリシステムにおける機械学習トレーニング Machine Learning Training on a Real Processing-in-Memory System ( http://arxiv.org/abs/2206.06022v1 ) ライセンス: CC BY 4.0	Juan G\'omez-Luna, Yuxin Guo, Sylvan Brocard, Julien Legriel, Remy Cimadomo, Geraldo F. Oliveira, Gagandeep Singh, Onur Mutlu	(参考訳) 機械学習アルゴリズムのトレーニングは計算集約的なプロセスであり、大規模なトレーニングデータセットに繰り返しアクセスするため、メモリバウンドが頻繁に発生する。その結果、プロセッサ中心のシステム(CPU、GPUなど)は、大量のエネルギーと実行サイクルを消費するメモリユニットと処理ユニットの間のコストのかかるデータ移動に悩まされる。メモリ中心のコンピューティングシステム、すなわち、PIM(Process-in-Memory)機能を備えたコンピューティングシステムは、このデータ移動のボトルネックを軽減することができる。我々の目標は、機械学習のトレーニングを加速するために、現代の汎用PIMアーキテクチャの可能性を理解することである。そのために,(1)実世界の汎用pimアーキテクチャ上で,いくつかの代表的な古典的機械学習アルゴリズム(線形回帰,ロジスティック回帰,決定木,k-平均クラスタリング)を実装し,(2)正確性,性能,スケーリングの観点から特徴付けし,(3)cpuとgpu上での実装と比較する。 2500以上のPIMコアを持つメモリ中心型コンピューティングシステムに対する実験的な評価は、PIMハードウェアで必要な操作やデータタイプをネイティブにサポートする場合、汎用PIMアーキテクチャがメモリバウンド機械学習ワークロードを大幅に高速化できることを示している。我々の知る限り、我々の研究は、現実世界の汎用PIMアーキテクチャにおける機械学習アルゴリズムのトレーニングを評価する最初のものである。 Training machine learning algorithms is a computationally intensive process, which is frequently memory-bound due to repeatedly accessing large training datasets. As a result, processor-centric systems (e.g., CPU, GPU) suffer from costly data movement between memory units and processing units, which consumes large amounts of energy and execution cycles. Memory-centric computing systems, i.e., computing systems with processing-in-memory (PIM) capabilities, can alleviate this data movement bottleneck. Our goal is to understand the potential of modern general-purpose PIM architectures to accelerate machine learning training. To do so, we (1) implement several representative classic machine learning algorithms (namely, linear regression, logistic regression, decision tree, K-means clustering) on a real-world general-purpose PIM architecture, (2) characterize them in terms of accuracy, performance and scaling, and (3) compare to their counterpart implementations on CPU and GPU. 
Our experimental evaluation on a memory-centric computing system with more than 2500 PIM cores shows that general-purpose PIM architectures can greatly accelerate memory-bound machine learning workloads, when the necessary operations and datatypes are natively supported by PIM hardware. To our knowledge, our work is the first one to evaluate training of machine learning algorithms on a real-world general-purpose PIM architecture.	翻訳日:2022-06-14 22:47:15 公開日:2022-06-13
# (参考訳) TriMix: 自己教師型学習のための仮想埋め込みと自己整合性 TriMix: Virtual embeddings and self-consistency for self-supervised learning ( http://arxiv.org/abs/2206.06023v1 ) ライセンス: CC BY 4.0	Tariq Bdair, Hossam Abdelhamid, Nassir Navab, and Shadi Albarqouni	(参考訳) 自己教師付き学習(SSL)は、教師付き学習モデルのトレーニングにおいて、高コストとデータ制限のために最近注目を集めている。 SSLの現在のパラダイムは、入力空間におけるデータ拡張を利用して、同じイメージの異なるビューを作成し、類似したイメージ間の表現を最大化し、異なるイメージに対して最小化するモデルをトレーニングすることだ。このアプローチは、様々な下流タスクで最先端(SOTA)の結果を実現するが、潜在空間での拡張を検討する機会を欠いている。本稿では,データの線形補間により仮想埋め込みを生成するSSLの新しい概念であるTriMixを提案する。我々の戦略は、仮想空間からオリジナルの埋め込みを抽出するためにモデルを訓練することに焦点を当てている。さらに,仮想と実際の埋め込みの整合性を改善する自己整合性項を提案する。我々はTriMixを、自然画像と医用画像からなる8つのベンチマークデータセットで検証し、両方のデータ型で2番目に良いモデルよりも2.71%と0.41%改善した。さらに,本手法は半教師付き学習,特に低データ体制において,現在の手法よりも優れていた。さらに、トレーニング済みのモデルは、他のデータセットへの転送性が向上しました。 Self-supervised Learning (SSL) has recently gained much attention due to the high cost and data limitation in the training of supervised learning models. The current paradigm in the SSL is to utilize data augmentation at the input space to create different views of the same images and train a model to maximize the representations between similar images and minimize them for different ones. While this approach achieves state-of-the-art (SOTA) results in various downstream tasks, it still lacks the opportunity to investigate the latent space augmentation. This paper proposes TriMix, a novel concept for SSL that generates virtual embeddings through linear interpolation of the data, thus providing the model with novel representations. Our strategy focuses on training the model to extract the original embeddings from virtual ones, hence, better representation learning. Additionally, we propose a self-consistency term that improves the consistency between the virtual and actual embeddings. We validate TriMix on eight benchmark datasets consisting of natural and medical images with an improvement of 2.71% and 0.41% better than the second-best models for both data types. 
Further, our approach outperformed the current methods in semi-supervised learning, particularly in low-data regimes. In addition, our pre-trained models showed better transfer to other datasets.	翻訳日:2022-06-14 22:34:38 公開日:2022-06-13
# (参考訳) 分光データに基づく機械学習のための普遍的合成データセット A universal synthetic dataset for machine learning on spectroscopic data ( http://arxiv.org/abs/2206.06031v1 ) ライセンス: CC BY 4.0	Jan Schuetzke, Nathan J. Szymanski, Markus Reischl	(参考訳) 分光データの自動分類のための機械学習手法の開発を支援するため,モデル検証に使用できる普遍的な合成データセットを作成した。このデータセットは、x線回折、核磁気共鳴、ラマン分光法などの手法による実験的な測定を表現するために設計された人工スペクトルを含んでいる。データセット生成プロセスは、スキャンの長さやピーク数などのカスタマイズ可能なパラメータを特徴としており、これは手元の問題に合わせて調整することができる。最初のベンチマークとして、500のユニークなクラスに基づいて、35,000のスペクトルを含むデータセットをシミュレートした。このデータの分類を自動化するために、8つの異なる機械学習アーキテクチャを評価した。結果から,分類タスクの最適性能を達成する上で,どの要因が最も重要かを明らかにした。合成スペクトルを生成するためのスクリプトとベンチマークデータセットと評価ルーチンは、分光分析のための改良された機械学習モデルの開発を支援するために公開されている。 To assist in the development of machine learning methods for automated classification of spectroscopic data, we have generated a universal synthetic dataset that can be used for model validation. This dataset contains artificial spectra designed to represent experimental measurements from techniques including X-ray diffraction, nuclear magnetic resonance, and Raman spectroscopy. The dataset generation process features customizable parameters, such as scan length and peak count, which can be adjusted to fit the problem at hand. As an initial benchmark, we simulated a dataset containing 35,000 spectra based on 500 unique classes. To automate the classification of this data, eight different machine learning architectures were evaluated. From the results, we shed light on which factors are most critical to achieve optimal performance for the classification task. The scripts used to generate synthetic spectra, as well as our benchmark dataset and evaluation routines, are made publicly available to aid in the development of improved machine learning models for spectroscopic analysis.	翻訳日:2022-06-14 22:13:35 公開日:2022-06-13
# (参考訳) Bluetooth低エネルギー信号とIMUセンサによる自動接触追跡 Automatic Contact Tracing using Bluetooth Low Energy Signals and IMU Sensor Readings ( http://arxiv.org/abs/2206.06033v1 ) ライセンス: CC BY 4.0	Suriyadeepan Ramamoorthy, Joyce Mahon, Michael O'Mahony, Jean Francois Itangayenda, Tendai Mukande, Tlamelo Makati	(参考訳) 本稿では,2台の携帯電話間の距離を推定する必要がある機械学習センター(ml-labs)の課題に対する解決策を提案する。 NIST Too Close For Too Long (TC4TL) Challengeの修正版であり、時間的側面は除外されている。本稿では,Bluetooth RSSI と IMU センサデータに基づく特徴に基づく手法を提案する。距離とBluetooth RSSI の読み方との関係について興味深い知見が得られたモデルに関するアブレーション研究を行った。 In this report, we present our solution to the challenge provided by the SFI Centre for Machine Learning (ML-Labs) in which the distance between two phones needs to be estimated. It is a modified version of the NIST Too Close For Too Long (TC4TL) Challenge, as the time aspect is excluded. We propose a feature-based approach based on Bluetooth RSSI and IMU sensory data, that outperforms the previous state of the art by a significant margin, reducing the error down to 0.071. We perform an ablation study of our model that reveals interesting insights about the relationship between the distance and the Bluetooth RSSI readings.	翻訳日:2022-06-14 22:05:07 公開日:2022-06-13
# (参考訳) 自動脳腫瘍の表現型を臨床画像へ変換する Translating automated brain tumour phenotyping to clinical neuroimaging ( http://arxiv.org/abs/2206.06120v1 ) ライセンス: CC BY 4.0	James K Ruffle, Samia Mohinta, Robert J Gray, Harpreet Hyare, Parashkev Nachev	(参考訳) 背景:脳腫瘍の複雑な異質性がますます認識されてきているため、日常的な臨床治療から引き出された本格的な大規模コレクションのみを要求できる。これは、現代の機械学習が、特にニューロイメージングにおいて促進できるタスクであるが、実際の臨床実践で一般的な不完全なデータを扱う能力は未だ不明である。本稿では, 大規模多地点MRIデータに最先端の手法を適用し, 臨床で観察される様々な完全性のレベルを再現する自動腫瘍分割モデルの比較忠実度を定量化する。方法: 深層学習(nnU-Net由来) 腫瘍分画モデルとT1, 造影T1, T2, FLAIR画像シーケンスの組合せを比較し, 2021BraTS-RSNAグリオーマ群1251例の5倍のクロスバリデーションを訓練し, 実世界の50例を対象に検討した。結果: 非完全データセグメント化病変をよく訓練したモデルは,完全データで訓練されたものと同等であり,全腫瘍のDice係数0.907から0.945(フルデータセット),成分組織型の0.701から0.891(フルデータセット)を示した。不完全なデータセグメンテーションモデルは、コントラストイメージングの欠如による腫瘍の増大を正確に検出し、その体積を0.95～0.97のr2で定量化した。結論: ディープラーニングセグメンテーションモデルは、データ不足時に腫瘍をうまく特徴づけ、コントラストを使わずに拡張組織を検出できる。これは、不完全なデータが一般的である臨床実践への翻訳が、hihertoが信じているよりも容易であり、コントラストの使用への依存を減らすのに有用であることを示唆している。 Background: The complex heterogeneity of brain tumours is increasingly recognized to demand data of magnitudes and richness only fully-inclusive, large-scale collections drawn from routine clinical care could plausibly offer. This is a task contemporary machine learning could facilitate, especially in neuroimaging, but its ability to deal with incomplete data common in real world clinical practice remains unknown. Here we apply state-of-the-art methods to large scale, multi-site MRI data to quantify the comparative fidelity of automated tumour segmentation models replicating the various levels of completeness observed in clinical reality. Methods: We compare deep learning (nnU-Net-derived) tumour segmentation models with all possible combinations of T1, contrast-enhanced T1, T2, and FLAIR imaging sequences, trained and validated with five-fold cross-validation on the 2021 BraTS-RSNA glioma population of 1251 patients, and tested on a diverse, real-world 50 patient sample. 
Results: Models trained on incomplete data segmented lesions well, often equivalently to those trained on complete data, exhibiting Dice coefficients of 0.907 (single sequence) to 0.945 (full datasets) for whole tumours, and 0.701 (single sequence) to 0.891 (full datasets) for component tissue types. Incomplete data segmentation models could accurately detect enhancing tumour in the absence of contrast imaging, quantifying its volume with an R² between 0.95 and 0.97. Conclusions: Deep learning segmentation models characterize tumours well when data are missing and can even detect enhancing tissue without the use of contrast. This suggests translation to clinical practice, where incomplete data is common, may be easier than hitherto believed, and may be of value in reducing dependence on contrast use.	翻訳日:2022-06-14 21:54:41 publicized:2022-06-13
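The Dice coefficients quoted in the entry above measure the overlap between a predicted and a reference segmentation mask. As a minimal sketch of the standard definition, Dice = 2|A ∩ B| / (|A| + |B|), the function below works on flat binary masks. The name `dice_coefficient` and the convention for two empty masks are assumptions for illustration, not details taken from the paper.

```python
def dice_coefficient(pred, target):
    """Dice overlap of two binary masks given as equal-length 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    # Voxels labelled positive in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total number of positive voxels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


pred = [0, 1, 1, 1, 0, 0]
target = [0, 1, 1, 0, 1, 0]
score = dice_coefficient(pred, target)  # 2 * 2 / (3 + 3)
```

A score of 1.0 means the masks coincide exactly and 0.0 means they do not overlap at all, so a whole-tumour value such as 0.945 indicates close agreement with the reference annotation.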
# (参考訳) 教師なし学習技術を用いた光学銀河形態の分類 The Classification of Optical Galaxy Morphology Using Unsupervised Learning Techniques ( http://arxiv.org/abs/2206.06165v1 ) ライセンス: CC BY 4.0	Ezra Fielding, Clement N. Nyirenda, Mattia Vaccari	(参考訳) 大規模なデータ集約型天文学調査の出現により、ヒトベースの銀河形態分類法の実現可能性が疑問視されるようになった。簡単に言えば、科学者が視覚的にラベルを付けるには、天文学的なデータが多すぎるということです。一般市民からボランティアを募集することで、この作業をクラウドソース化しようと試みられている。しかし、こうした取り組みでさえ、現在の調査で得られたデータにすぐに従わないだろう。教師なし学習技術では、既存のラベルでデータを分類する必要はなく、計画外の発見への道を開くことができる。そこで本研究では,人間の監督なしにGalaxy Zoo DECaLSデータセットを分類するための教師なし学習アルゴリズムを実装することを目的とする。まず、特徴抽出器として畳み込みオートエンコーダを実装した。抽出した特徴は, k-means, fuzzy c-means, agglomerative clusteringによって分類された。その結果,Galaxy Zoo DECaLSデータセットのボランティア分類と比較した。集約クラスタリングは一般的に最良の結果を得たが、k平均クラスタリングに対する性能向上は有意ではなかった。適切な最適化により、この手法はより良いパフォーマンスのGalaxy Zoo DECaLS決定木質問のための分類を提供することができる。最終的に、この教師なし学習アプローチは、科学者にとって有用な貴重な洞察と結果をもたらした。 The advent of large scale, data intensive astronomical surveys has caused the viability of human-based galaxy morphology classification methods to come into question. Put simply, too much astronomical data is being produced for scientists to visually label. Attempts have been made to crowd-source this work by recruiting volunteers from the general public. However, even these efforts will soon fail to keep up with data produced by modern surveys. Unsupervised learning techniques do not require existing labels to classify data and could pave the way to unplanned discoveries. Therefore, this paper aims to implement unsupervised learning algorithms to classify the Galaxy Zoo DECaLS dataset without human supervision. First, a convolutional autoencoder was implemented as a feature extractor. The extracted features were then clustered via k-means, fuzzy c-means and agglomerative clustering to provide classifications. The results were compared to the volunteer classifications of the Galaxy Zoo DECaLS dataset. Agglomerative clustering generally produced the best results; however, the performance gain over k-means clustering was not significant. 
With the appropriate optimizations, this approach could be used to provide classifications for the better performing Galaxy Zoo DECaLS decision tree questions. Ultimately, this unsupervised learning approach provided valuable insights and results that were useful to scientists.	翻訳日:2022-06-14 21:38:13 公開日:2022-06-13
# (参考訳) 宇宙応用のためのシンボリック回帰:多目的メメティックアルゴリズムによる微分カルテシアン遺伝的プログラミング Symbolic Regression for Space Applications: Differentiable Cartesian Genetic Programming Powered by Multi-objective Memetic Algorithms ( http://arxiv.org/abs/2206.06213v1 ) ライセンス: CC BY 4.0	Marcus M\"artens and Dario Izzo	(参考訳) 解釈可能な回帰モデルは、スパースデータから変数間の関係を専門家が理解できるため、多くのアプリケーションドメインにとって重要である。記号回帰は、基本代数関数から構築できるすべての可能な自由形式方程式の空間を探索することによってこの問題に対処する。明示的な数学的関数はこの方法で再発見できるが、探索中の未知の数値定数の決定はしばしば無視される問題である。進化ループ中の定数を学習するために、微分可能なカルテシアン遺伝的プログラミング符号化を利用する、新しい多目的メメティックアルゴリズムを提案する。この手法は、Mars Expressの熱パワー推定とジャイロクロノロジーによる恒星の年齢決定という2つの応用に対して、学習したブラックボックス回帰モデルやハンドエンジニアリングフィッティングと同等か、それよりも優れていることを示す。 Interpretable regression models are important for many application domains, as they allow experts to understand relations between variables from sparse data. Symbolic regression addresses this issue by searching the space of all possible free form equations that can be constructed from elementary algebraic functions. While explicit mathematical functions can be rediscovered this way, the determination of unknown numerical constants during search has been an often neglected issue. We propose a new multi-objective memetic algorithm that exploits a differentiable Cartesian Genetic Programming encoding to learn constants during evolutionary loops. We show that this approach is competitive with or outperforms machine learned black box regression models or hand-engineered fits for two applications from space: the Mars Express thermal power estimation and the determination of the age of stars by gyrochronology.	翻訳日:2022-06-14 21:26:20 公開日:2022-06-13
# (参考訳) 光場画像超解像のための劣化適応ネットワークの学習 Learning a Degradation-Adaptive Network for Light Field Image Super-Resolution ( http://arxiv.org/abs/2206.06214v1 ) ライセンス: CC BY 4.0	Yingqian Wang, Zhengyu Liang, Longguang Wang, Jungang Yang, Wei An, Yulan Guo	(参考訳) 近年、光電場(LF)画像超解像(SR)におけるディープニューラルネットワーク(DNN)の大きな進歩を目撃している。しかし、既存のDNNベースのLF画像SR法は、単一の固定劣化(例えば、バイコビックダウンサンプリング)に基づいて開発されており、様々な劣化を伴う実際のLF画像に対して適用できない。本稿では,複数の劣化を伴うlf画像srを扱う最初の方法を提案する。本研究では,実際のLF画像の劣化過程を近似するために,ぼかしと雑音を考慮した実用的なLF劣化モデルを開発した。次に、劣化適応ネットワーク(lf-danet)をsrプロセスに予め組み込むように設計する。複数の合成劣化を持つlf画像の訓練により,空間的および角度的情報を取り入れながら,異なる劣化に適応することを学ぶ。合成劣化LFと実世界のLFの併用実験により,本手法の有効性が示された。既存の最先端のシングルおよびlf画像sr法と比較して,本手法は幅広い劣化下で優れたsr性能を実現し,実画像への一般化を図る。コードとモデルはhttps://github.com/yingqianwang/lf-danetで入手できる。 Recent years have witnessed the great advances of deep neural networks (DNNs) in light field (LF) image super-resolution (SR). However, existing DNN-based LF image SR methods are developed on a single fixed degradation (e.g., bicubic downsampling), and thus cannot be applied to super-resolve real LF images with diverse degradations. In this paper, we propose the first method to handle LF image SR with multiple degradations. In our method, a practical LF degradation model that considers blur and noise is developed to approximate the degradation process of real LF images. Then, a degradation-adaptive network (LF-DAnet) is designed to incorporate the degradation prior into the SR process. By training on LF images with multiple synthetic degradations, our method can learn to adapt to different degradations while incorporating the spatial and angular information. Extensive experiments on both synthetically degraded and real-world LFs demonstrate the effectiveness of our method. Compared with existing state-of-the-art single and LF image SR methods, our method achieves superior SR performance under a wide range of degradations, and generalizes better to real LF images. Codes and models are available at https://github.com/YingqianWang/LF-DAnet.	
翻訳日:2022-06-14 21:16:57 公開日:2022-06-13
# (参考訳) 依存の感覚を作る: 依存度測定を用いた効率的なブラックボックス説明 Making Sense of Dependence: Efficient Black-box Explanations Using Dependence Measure ( http://arxiv.org/abs/2206.06219v1 ) ライセンス: CC BY 4.0	Paul Novello, Thomas Fel, David Vigouroux	(参考訳) 本稿では,カーネルヒルベルト空間(rkhs)の再現に基づく従属尺度であるヒルベルト・シュミット独立基準(hsic)に基づく,新しい効率的なブラックボックス帰属法を提案する。 hsicは、分布のカーネル埋め込みに基づく入力画像の領域とモデルの出力の間の依存性を測定する。したがって、RKHS表現能力に富んだ説明を提供する。 HSICは、他のブラックボックス属性法と比較して計算コストを大幅に削減することができる。実験の結果,HSICは従来最高のブラックボックス属性法よりも最大8倍高速であり,忠実であることがわかった。実際、画像ネット上の複数の忠実度指標に対して、ブラックボックスとホワイトボックスの両方の属性法を、最新のモデルアーキテクチャで改善または適合させる。ここでは, YOLOv4などの物体検出モデルに対して, 効率よく, 忠実に説明できることを示す。最後に,hsicに基づく重要度スコアの直交分解を可能にする新たなカーネルを提案することで,従来の帰属法を拡張することにより,各画像パッチの重要性だけでなく,その対関係の重要性も評価できる。 This paper presents a new efficient black-box attribution method based on Hilbert-Schmidt Independence Criterion (HSIC), a dependence measure based on Reproducing Kernel Hilbert Spaces (RKHS). HSIC measures the dependence between regions of an input image and the output of a model based on kernel embeddings of distributions. It thus provides explanations enriched by RKHS representation capabilities. HSIC can be estimated very efficiently, significantly reducing the computational cost compared to other black-box attribution methods. Our experiments show that HSIC is up to 8 times faster than the previous best black-box attribution methods while being as faithful. Indeed, we improve or match the state-of-the-art of both black-box and white-box attribution methods for several fidelity metrics on Imagenet with various recent model architectures. Importantly, we show that these advances can be transposed to efficiently and faithfully explain object detection models such as YOLOv4. Finally, we extend the traditional attribution methods by proposing a new kernel enabling an orthogonal decomposition of importance scores based on HSIC, allowing us to evaluate not only the importance of each image patch but also the importance of their pairwise interactions.	
翻訳日:2022-06-14 20:41:39 公開日:2022-06-13
# (参考訳) 説明可能性・設計:意思決定システムにおける説明を支援する手法 Explainability-by-Design: A Methodology to Support Explanations in Decision-Making Systems ( http://arxiv.org/abs/2206.06251v1 ) ライセンス: CC BY-SA 4.0	Trung Dong Huynh, Niko Tsakalakis, Ayah Helal, Sophie Stalla-Bourdillon, Luc Moreau	(参考訳) 近年、アルゴリズムは、生活の様々な側面を制御または影響する多くの技術システムにおいて重要な役割を担っている。その結果、ユーザーや組織のニーズに対応する説明の提供は、法律や規則、行動規範、大衆によってますます期待されている。しかし、法律や規則はそのような期待に応える方法を規定していないため、組織はしばしば説明可能性に対する独自のアプローチを考案し、必然的にコンプライアンスと優れたガバナンスのコストを増加させます。そこで我々は,意思決定システムの設計に説明能力を含める積極的措置によって特徴付けられる包括的方法論である"Explainability by Design"を提唱した。本稿では、特定のアプリケーション・コンテキストに対してドメイン・エキスパートが要求する要件から説明能力を実装するためのソフトウェア・エンジニアリング・ワークフローにおける説明可能性・設計手法の技術的なステップについて述べる。 Explainability-by-Design(説明可能性・バイ・デザイン)方法論の成果は、アプリケーションが提供するログを利用して、関連するデータポイントを抽出するためにクエリ可能な証明トレースを生成する、Explaination Assistantと呼ばれる再利用可能なサービスのセットである。これらのステップに従って、組織は、規定された要件を満たす説明を、法律、規制、ビジネスニーズから作成するための意思決定システムを設計することができる。この方法論を2つのアプリケーションに適用し,説明機能を示す説明アシスタントを配置した。最後に、関連する開発コストを測定し、説明文1文あたり2時間程度の開発時間で、説明書の構築アプローチが抽出可能であることを示す。 Algorithms play a key role nowadays in many technological systems that control or affect various aspects of our lives. As a result, providing explanations to address the needs of users and organisations is increasingly expected by the laws and regulations, codes of conduct, and the public. However, as laws and regulations do not prescribe how to meet such expectations, organisations are often left to devise their own approaches to explainability, inevitably increasing the cost of compliance and good governance. Hence, we put forth "Explainability by Design", a holistic methodology characterised by proactive measures to include explanation capability in the design of decision-making systems. This paper describes the technical steps of the Explainability-by-Design methodology in a software engineering workflow to implement explanation capability from requirements elicited by domain experts for a specific application context. 
Outputs of the Explainability-by-Design methodology are a set of configurations, allowing a reusable service, called the Explanation Assistant, to exploit logs provided by applications and create provenance traces that can be queried to extract relevant data points, which in turn can be used in explanation plans to construct explanations personalised to their consumers. Following those steps, organisations will be able to design their decision-making systems to produce explanations that meet the specified requirements, be it from laws, regulations, or business needs. We apply the methodology to two applications, resulting in a deployment of the Explanation Assistant demonstrating explanation capabilities. Finally, the associated development costs are measured, showing that the approach to construct explanations is tractable in terms of development time, which can be as low as two hours per explanation sentence.	翻訳日:2022-06-14 20:16:54 公開日:2022-06-13
# (参考訳) 大規模ニューラルネットワークの堅牢化のための分散逆行訓練 Distributed Adversarial Training to Robustify Deep Neural Networks at Scale ( http://arxiv.org/abs/2206.06257v1 ) ライセンス: CC BY 4.0	Gaoyuan Zhang, Songtao Lu, Yihua Zhang, Xiangyi Chen, Pin-Yu Chen, Quanfu Fan, Lee Martie, Lior Horesh, Mingyi Hong, Sijia Liu	(参考訳) 現在のディープニューラルネットワーク(DNN)は、入力に対する敵の摂動が分類を変更したり操作したりする敵攻撃に対して脆弱である。このような攻撃を防御するために、対戦訓練(AT)として知られる効果的で一般的なアプローチが、min-maxロバストな訓練方法により、敵攻撃の負の影響を軽減することが示されている。効果的ではあるが、それが分散学習コンテキストにうまく適応できるかは不明だ。複数のマシンに対する分散最適化のパワーにより、大規模なモデルやデータセットに対する堅牢なトレーニングをスケールアップできます。そこで本研究では,複数のマシンにまたがる大規模攻撃訓練フレームワークであるdistributed adversarial training (dat)を提案する。 DATは一般に,ラベル付きおよびラベルなしデータのトレーニング,複数種類の攻撃発生方法,分散最適化に適した勾配圧縮操作をサポートする。理論的には、最適化理論の標準的な条件下では、一般の非凸設定における一階定常点への DAT の収束率を提供する。経験的に、DATは最先端の堅牢なアキュラシーにマッチするか、より優れており、優雅なトレーニングスピードアップを実現している(例:ImageNetのResNet-50)。コードはhttps://github.com/dat-2022/datで入手できる。 Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification. To defend against such attacks, an effective and popular approach, known as adversarial training (AT), has been shown to mitigate the negative impact of adversarial attacks by virtue of a min-max robust training method. While effective, it remains unclear whether it can successfully be adapted to the distributed learning context. The power of distributed optimization over multiple machines enables us to scale up robust training over large models and datasets. Spurred by that, we propose distributed adversarial training (DAT), a large-batch adversarial training framework implemented over multiple machines. We show that DAT is general, which supports training over labeled and unlabeled data, multiple types of attack generation methods, and gradient compression operations favored for distributed optimization. 
Theoretically, we provide, under standard conditions in the optimization theory, the convergence rate of DAT to the first-order stationary points in general non-convex settings. Empirically, we demonstrate that DAT either matches or outperforms state-of-the-art robust accuracies and achieves a graceful training speedup (e.g., on ResNet-50 under ImageNet). Codes are available at https://github.com/dat-2022/dat.	翻訳日:2022-06-14 19:52:54 公開日:2022-06-13
# (参考訳) 多重カーネル拡張畳み込みネットワークによるポリプの自動分割 Automatic Polyp Segmentation with Multiple Kernel Dilated Convolution Network ( http://arxiv.org/abs/2206.06264v1 ) ライセンス: CC BY 4.0	Nikhil Kumar Tomar, Abhishek Srivastava, Ulas Bagci, Debesh Jha	(参考訳) 大腸内視鏡による前がん性ポリープの検出と除去は,世界中の大腸癌予防の第一の手法である。しかし,大腸ポリープのミス率は内科医によって大きく異なる。コンピュータ支援診断システム(CAD)は,大腸ポリープの検出や内科医の変動の最小化に役立てることが知られている。本研究では,ポリープデータ分布の大幅な変化に頑健な自動ポリープセグメンテーションを実現するための新しいディープラーニングアーキテクチャであるMKDCNetを提案する。 MKDCNetは単にエンコーダ-デコーダニューラルネットワークであり、事前に訓練されたResNet50をエンコーダとして使用し、新しいmultiple kernel dilated convolution (MKDC) ブロックを使用して視野を広げ、より堅牢で不均一な表現を学ぶ。公開された4つのポリプデータセットと細胞核データセットの大規模な実験により、提案されたMKDCNetは、同じデータセット上でトレーニングおよびテストを行った場合や、異なる分布から見えないポリプデータセットでテストした場合に、最先端のメソッドよりも優れていた。その結果,提案するアーキテクチャの堅牢性が実証された。効率の観点から、我々のアルゴリズムはRTX 3090 GPU上で毎秒約45フレームを処理できる。 mkdcnetは、臨床大腸のリアルタイムシステムを構築するための強力なベンチマークである。 MKDCNet のコードは https://github.com/nikhilroxtomar/MKDCNet で公開されている。 The detection and removal of precancerous polyps through colonoscopy is the primary technique for the prevention of colorectal cancer worldwide. However, the miss rate of colorectal polyps varies significantly among endoscopists. It is well known that a computer-aided diagnosis (CAD) system can assist endoscopists in detecting colon polyps and minimize the variation among endoscopists. In this study, we introduce a novel deep learning architecture, named MKDCNet, for automatic polyp segmentation robust to significant changes in polyp data distribution. MKDCNet is simply an encoder-decoder neural network that uses the pre-trained ResNet50 as the encoder and a novel multiple kernel dilated convolution (MKDC) block that expands the field of view to learn more robust and heterogeneous representation. 
Extensive experiments on four publicly available polyp datasets and a cell nuclei dataset show that the proposed MKDCNet outperforms the state-of-the-art methods when trained and tested on the same dataset as well as when tested on unseen polyp datasets from different distributions. With rich results, we demonstrated the robustness of the proposed architecture. From an efficiency perspective, our algorithm can process approximately 45 frames per second on an RTX 3090 GPU. MKDCNet can be a strong benchmark for building real-time systems for clinical colonoscopies. The code of the proposed MKDCNet is available at https://github.com/nikhilroxtomar/MKDCNet.	翻訳日:2022-06-14 19:20:50 公開日:2022-06-13
# (参考訳) 関節表面アトラスの学習 Learning Joint Surface Atlases ( http://arxiv.org/abs/2206.06273v1 ) ライセンス: CC BY 4.0	Theo Deprelle, Thibault Groueix, Noam Aigerman, Vladimir G. Kim and Mathieu Aubry	(参考訳) 本稿では,3次元表面のアトラス様表現,すなわち2次元領域から表面への準同型変換を学習するための新しい手法について述べる。先行研究と比較して,2つの主要な貢献を提案する。まず、正方形パッチなどの固定された2次元領域を表面にマッピングするのではなく、ガウスの混合として表される点サンプリング分布を最適化することにより任意の位相を持つ連続な2次元領域を学習する。次に、3次元表面から2次元領域へのチャートと、その逆のパラメトリゼーションという、両方向の一貫性のあるマッピングを学習する。これにより、学習した表面表現の品質が向上し、関連する形状の集合における一貫性が向上することを示す。これにより、対応推定、テクスチャ転送、一貫性のあるuvマッピングなどのアプリケーションの改善につながる。追加の技術的貢献として、通常の整合性の導入には明確なメリットがあるが、最適化の問題につながり、これらの問題は単純な反発正則化によって緩和できる、と概説する。我々の貢献は、既存のベースラインよりも優れた表面表現を提供することを実証する。 This paper describes new techniques for learning atlas-like representations of 3D surfaces, i.e. homeomorphic transformations from a 2D domain to surfaces. Compared to prior work, we propose two major contributions. First, instead of mapping a fixed 2D domain, such as a set of square patches, to the surface, we learn a continuous 2D domain with arbitrary topology by optimizing a point sampling distribution represented as a mixture of Gaussians. Second, we learn consistent mappings in both directions: charts, from the 3D surface to 2D domain, and parametrizations, their inverse. We demonstrate that this improves the quality of the learned surface representation, as well as its consistency in a collection of related shapes. It thus leads to improvements for applications such as correspondence estimation, texture transfer, and consistent UV mapping. As an additional technical contribution, we outline that, while incorporating normal consistency has clear benefits, it leads to issues in the optimization, and that these issues can be mitigated using a simple repulsive regularization. We demonstrate that our contributions provide better surface representation than existing baselines.	翻訳日:2022-06-14 19:07:10 公開日:2022-06-13
# (参考訳) アクティブラーニングにおけるサンプルの再利用性について On the reusability of samples in active learning ( http://arxiv.org/abs/2206.06276v1 ) ライセンス: CC BY-SA 4.0	Gijs van Tulder and Marco Loog	(参考訳) アクティブラーニングにおいて、興味深いが広く研究されていない質問は、サンプル再利用可能性である。本稿では,サンプル再利用性が実用的関心事である理由,再利用性が問題になり得る理由,重要度重み付けアクティブラーニングによる再利用性の向上,普遍的再利用性への障害について述べる。理論的議論と実演により、普遍的再利用は不可能であると主張する。アクティブな学習戦略はすべて、サンプルスペースのいくつかの領域を過小評価しなければならないため、これらの領域のサンプルに依存する学習者は、ランダムなサンプル選択からさらに学ぶことができる。本稿では,実践における再利用可能性問題の影響を示す,重要度の高いアクティブラーニング実験について述べる。実験では、普遍的な再利用性は存在しないことを確認したが、いくつかのデータセットといくつかの分類器では、サンプル再利用性がある。最後に,2つの分類器間の再利用性を保証する条件について考察する。 An interesting but not extensively studied question in active learning is that of sample reusability: to what extent can samples selected for one learner be reused by another? This paper explains why sample reusability is of practical interest, why reusability can be a problem, how reusability could be improved by importance-weighted active learning, and which obstacles to universal reusability remain. With theoretical arguments and practical demonstrations, this paper argues that universal reusability is impossible. Because every active learning strategy must undersample some areas of the sample space, learners that depend on the samples in those areas will learn more from a random sample selection. This paper describes several experiments with importance-weighted active learning that show the impact of the reusability problem in practice. The experiments confirmed that universal reusability does not exist, although in some cases -- on some datasets and with some pairs of classifiers -- there is sample reusability. Finally, this paper explores the conditions that could guarantee the reusability between two classifiers.	翻訳日:2022-06-14 18:54:03 公開日:2022-06-13
# (参考訳) 再入院における健康格差の緩和 Mitigating health disparities in hospital readmissions ( http://arxiv.org/abs/2206.06279v1 ) ライセンス: CC BY 4.0	Shaina Raza	(参考訳) 入院患者の高血糖管理は罹患率と死亡率の両方に大きな影響を及ぼす。本研究は,糖尿病患者を入院させる必要性を予測するために,大規模臨床データベースを用いた。しかし、これらの予測は人種、年齢、性別などの社会的決定要因によって引き起こされる健康格差に弱い可能性がある。これらのバイアスは、データ収集プロセスの初期にシステムに入る前に取り除かなければならず、モデル予測によって強化され、モデルの決定にバイアスが生じる。本稿では,バイアスの検出と軽減に加えて,予測が可能な機械学習パイプラインを提案する。このパイプラインは臨床データを分析し、バイアスの有無を決定し、それらを除去し、予測する。実験によるモデル予測における分類精度と公平性を示す。その結果、モデルの初期にバイアスを緩和すると、より公平な予測が得られます。また、フェアネスが向上するにつれて、ある程度の精度が犠牲になり、以前の研究でも検証されていることもわかりました。このパイプラインを通じて対処できる健康格差に寄与する追加の要因を特定するために、研究コミュニティを招待します。 The management of hyperglycemia in hospitalized patients has a significant impact on both morbidity and mortality. This study used a large clinical database to predict the need for diabetic patients to be hospitalized, which could lead to improvements in patient safety. These predictions, however, may be vulnerable to health disparities caused by social determinants such as race, age, and gender. These biases must be removed early in the data collection process, before they enter the system and are reinforced by model predictions, resulting in biases in the model's decisions. In this paper, we propose a machine learning pipeline capable of making predictions as well as detecting and mitigating biases. This pipeline analyses clinical data, determines whether biases exist, removes them, and then makes predictions. We demonstrate the classification accuracy and fairness in model predictions using experiments. The results show that when we mitigate biases early in a model, we get fairer predictions. We also find that as we get better fairness, we sacrifice a certain level of accuracy, which is also validated in the previous studies. We invite the research community to contribute to identifying additional factors that contribute to health disparities that can be addressed through this pipeline.	翻訳日:2022-06-14 18:33:19 公開日:2022-06-13
# コストの少ない車両経路用マルチエージェントニューラルリライト装置 Multi-Agent Neural Rewriter for Vehicle Routing with Limited Disclosure of Costs ( http://arxiv.org/abs/2206.05990v1 ) ライセンス: Link先を確認	Nathalie Paul, Tim Wirtz, Stefan Wrobel, Alexander Kister	(参考訳) 複数車両の経路問題を部分的に観測可能なコストでチームマルコフゲームとして解釈する。特定の顧客に対して、プレイするエージェント(車両)は、チーム最適エージェントルートを最小限のコストで決定するという共通の目標を持っています。これにより、各エージェントは自身のコストのみを観測する。我々のマルチエージェント強化学習アプローチである、いわゆるマルチエージェントニューラルリライタは、1エージェントニューラルリライタを利用して、反復的に書き換えるソリューションによって問題を解決する。並列エージェントアクションの実行と部分的可観測性は、ゲームに対する新しい書き換えルールを必要とする。本稿では,未アクセスノードの収集ポイントとして機能する,いわゆるプールの導入を提案する。エージェントは同時に動作し、ノードを競合のない方法で交換することができる。学習中にのみ共有することで,エージェント固有のコストの開示の制限を実現する。推論中、各エージェントは、そのコストのみに基づいて、分散的に行動する。小さな問題サイズに関する最初の実験結果から、完全なコスト情報設定で動作するOR-Toolsベンチマークに近い性能に達することが示される。 We interpret solving the multi-vehicle routing problem as a team Markov game with partially observable costs. For a given set of customers to serve, the playing agents (vehicles) have the common goal to determine the team-optimal agent routes with minimal total cost. Each agent thereby observes only its own cost. Our multi-agent reinforcement learning approach, the so-called multi-agent Neural Rewriter, builds on the single-agent Neural Rewriter to solve the problem by iteratively rewriting solutions. Parallel agent action execution and partial observability require new rewriting rules for the game. We propose the introduction of a so-called pool in the system which serves as a collection point for unvisited nodes. It enables agents to act simultaneously and exchange nodes in a conflict-free manner. We realize limited disclosure of agent-specific costs by only sharing them during learning. During inference, each agent acts in a decentralized manner, solely based on its own cost. First empirical results on small problem sizes demonstrate that we reach a performance close to the employed OR-Tools benchmark which operates in the perfect cost information setting.	翻訳日:2022-06-14 18:24:10 公開日:2022-06-13
# リアルタイム重力波検出のための新しい多層モジュラーアプローチ A Novel Multi-Layer Modular Approach for Real-Time Gravitational-Wave Detection ( http://arxiv.org/abs/2206.06004v1 ) ライセンス: Link先を確認	Francesco Pio Barone, Daniele Dell'Aquila, Marco Russo	(参考訳) 高度なLIGOと高度なVirgo地上ベースの干渉計は、前例のないほど大量の宇宙空間を探査し、重力波エミッターの新たな源に観測の発見能力を高めている。このシナリオでは、高度に最適化された重力波検出アルゴリズムの開発が重要である。本稿では,音声処理技術に触発された重力波のリアルタイム検出のための新しい階層化フレームワークを提案し,その実装において,遺伝的プログラミングとニューラルネットワークのハイブリッド化を含む最先端の機械学習アプローチに基づく。新しく提案されたフレームワークの重要な側面は、よく構造化された、階層化されたアプローチと低い計算複雑性である。本稿では,フレームワークの基本概念と,最初の3つのレイヤの導出について述べる。たとえこの実装において、機械学習アプローチで導出されたモデルに基づいていても、提案された階層構造は普遍的な性質を持つ。モデルの訓練および試験には, 高度なLIGO感度設計を示す合成ガウス雑音における二元ブラックホール重力波波形を用いた。畳み込みニューラルネットワークのようなより複雑なアプローチと比較すると、我々のフレームワークは、論文に記述された単純な基底モデルでさえも、同様の性能を持つが、計算の複雑さはずっと低く、モジュール性も高い。さらに、短期的な特徴の活用は、新しい枠組みの結果を重力波信号の時間配置と事実上独立にし、第2世代の干渉計による重力波検出のためのリアルタイム多層パイプラインの将来の利用を単純化する。 Advanced LIGO and Advanced Virgo ground-based interferometers are poised to probe an unprecedentedly large volume of space, enhancing the discovery power of the observations to even new sources of gravitational wave emitters. In this scenario, the development of highly optimized gravitational wave detection algorithms is crucial. We propose a novel layered framework for real-time detection of gravitational waves inspired by speech processing techniques and, in the present implementation, based on a state-of-the-art machine learning approach involving a hybridization of genetic programming and neural networks. The key aspects of the newly proposed framework are: the well structured, layered approach, and the low computational complexity. The paper describes the basic concepts of the framework and the derivation of the first three layers. Even if, in the present implementation, the layers are based on models derived using a machine learning approach, the proposed layered structure has a universal nature. 
To train and test the models, we used simulated binary black hole gravitational wave waveforms in synthetic Gaussian noise representative of Advanced LIGO sensitivity design. Compared to more complex approaches, such as convolutional neural networks, our framework, even using the simple ground model described in the paper, has similar performance but with a much lower computational complexity and a higher degree of modularity. Furthermore, the underlying exploitation of short-term features makes the results of the new framework virtually independent of the time position of gravitational wave signals, simplifying its future exploitation in real-time multi-layer pipelines for gravitational-wave detection with second generation interferometers.	翻訳日:2022-06-14 18:23:54 公開日:2022-06-13
# neuromorphic wireless cognition: 遠隔推論のためのイベント駆動意味コミュニケーション Neuromorphic Wireless Cognition: Event-Driven Semantic Communications for Remote Inference ( http://arxiv.org/abs/2206.06047v1 ) ライセンス: Link先を確認	Jiechen Chen, Nicolas Skatchkovsky, Osvaldo Simeone	(参考訳) ニューロモーフィックコンピューティングは、バッチ処理からストリーミングデータのオンライン、イベント駆動処理に移行する、新たなコンピューティングパラダイムである。スパイクベースのセンサーと組み合わせたニューロモルフィックチップは、スパイクのタイミングで関連する事象が記録されたときにのみエネルギーを消費し、環境の変化に対する低遅延応答を証明することによって、データ分布の「セマンティック」に本質的に適応することができる。本稿では,スパイクベースセンシング,処理,通信を統合したニューロモルフィック無線インターネット・オブ・シングスシステムのエンドツーエンド設計を提案する。提案するニューロコムシステムでは、各センシング装置は、神経形態センサ、スパイキングニューラルネットワーク(snn)、複数のアンテナを備えたインパルス無線送信機を備える。送信は、マルチアンテナインパルス無線受信機とSNNを備えた受信機に共有フェーディングチャネルを介して行われる。受信機のフェーディングチャネル条件への適応を可能にするため、パイロットを用いてデコードsnの重みを制御するハイパーネットワークを導入する。パイロット、SNNの符号化、SNNの復号化、ハイパーネットワークは、複数のチャネル実現を通じて共同で訓練される。提案システムは,従来のフレームベースデジタルソリューションよりも,時間-精度およびエネルギー消費の指標を用いて,代替の非適応的トレーニング手法よりも大幅に改善されている。 Neuromorphic computing is an emerging computing paradigm that moves away from batched processing towards the online, event-driven, processing of streaming data. Neuromorphic chips, when coupled with spike-based sensors, can inherently adapt to the "semantics" of the data distribution by consuming energy only when relevant events are recorded in the timing of spikes and by providing a low-latency response to changing conditions in the environment. This paper proposes an end-to-end design for a neuromorphic wireless Internet-of-Things system that integrates spike-based sensing, processing, and communication. In the proposed NeuroComm system, each sensing device is equipped with a neuromorphic sensor, a spiking neural network (SNN), and an impulse radio transmitter with multiple antennas. Transmission takes place over a shared fading channel to a receiver equipped with a multi-antenna impulse radio receiver and with an SNN. In order to enable adaptation of the receiver to the fading channel conditions, we introduce a hypernetwork to control the weights of the decoding SNN using pilots. 
Pilots, encoding SNNs, decoding SNN, and hypernetwork are jointly trained across multiple channel realizations. The proposed system is shown to significantly improve over conventional frame-based digital solutions, as well as over alternative non-adaptive training methods, in terms of time-to-accuracy and energy consumption metrics.	翻訳日:2022-06-14 18:23:28 公開日:2022-06-13
# 音響シーン分類のための低複雑深層学習フレームワーク Low-complexity deep learning frameworks for acoustic scene classification ( http://arxiv.org/abs/2206.06057v1 ) ライセンス: Link先を確認	Lam Pham, Dat Ngo, Anahid Jalali, Alexander Schindler	(参考訳) 本稿では,音響シーン分類(ASC)のための低複雑深層学習フレームワークを提案する。提案するフレームワークは、フロントエンドのスペクトログラム抽出、オンラインデータ拡張、バックエンド分類、予測確率の後期融合の4つの主要なステップに分けることができる。特に,まず音声録音をメル,ガンマタン,およびcqtスペクトログラムに変換する。次に、ランダムクロップ、スペクタグメント、ミックスアップのデータ拡張手法を適用し、深層学習に基づく分類器に入力する前に、拡張スペクトログラムを生成する。最後に, 3つの個別分類器から得られた確率を, 3種類のスペクトログラムで独立に学習し, 最適性能を得る。 DCASE 2022 Task 1 Development データセットで実施した実験は,低複雑さの要件を十分に満たし,60.1%の最高の分類精度を達成し,DCASE ベースラインを17.2%向上させた。 In this report, we present low-complexity deep learning frameworks for acoustic scene classification (ASC). The proposed frameworks can be separated into four main steps: front-end spectrogram extraction, online data augmentation, back-end classification, and late fusion of predicted probabilities. In particular, we initially transform audio recordings into Mel, Gammatone, and CQT spectrograms. Next, the data augmentation methods Random Cropping, SpecAugment, and Mixup are applied to generate augmented spectrograms before they are fed into deep learning based classifiers. Finally, to achieve the best performance, we fuse the probabilities obtained from three individual classifiers, which are independently trained on the three types of spectrograms. Our experiments on the DCASE 2022 Task 1 Development dataset fulfilled the low-complexity requirement and achieved the best classification accuracy of 60.1%, improving on the DCASE baseline by 17.2%.	翻訳日:2022-06-14 18:23:03 公開日:2022-06-13
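The late-fusion step in the abstract above can be illustrated with a minimal sketch. The abstract does not say which fusion rule the authors use, so plain averaging of the three probability vectors is an assumption here, and the function names and probability values are hypothetical.

```python
def late_fusion(prob_sets):
    """Average class-probability vectors from several classifiers.

    Assumption: mean fusion; the abstract only says probabilities are fused.
    """
    n_models = len(prob_sets)
    n_classes = len(prob_sets[0])
    return [sum(p[c] for p in prob_sets) / n_models for c in range(n_classes)]


def predict(prob_sets):
    """Return the index of the class with the highest fused probability."""
    fused = late_fusion(prob_sets)
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical outputs of three classifiers, one per spectrogram type.
mel_probs = [0.5, 0.3, 0.2]
gammatone_probs = [0.2, 0.5, 0.3]
cqt_probs = [0.1, 0.6, 0.3]

fused_probs = late_fusion([mel_probs, gammatone_probs, cqt_probs])
predicted_class = predict([mel_probs, gammatone_probs, cqt_probs])
```

In this toy input the Mel classifier alone would pick class 0, while the fused vector favours class 1, which is the kind of disagreement late fusion is meant to resolve.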
# 学習可能なウェーブレットパケット変換を用いたロバスト時系列デノーミング Robust Time Series Denoising with Learnable Wavelet Packet Transform ( http://arxiv.org/abs/2206.06126v1 ) ライセンス: Link先を確認	Gaetan Frusque, Olga Fink	(参考訳) 多くのアプリケーションでは、信号デノイジングは、後続の分析や学習タスクの前に最初の前処理ステップであることが多い。本稿では,ウェーブレットパケット変換の学習可能なバージョンである信号処理に触発された深層学習分母モデルを適用することを提案する。提案アルゴリズムは,解釈可能なパラメータが少なく,直感的な初期化が可能である。雑音レベルに適応するためのパラメータの学習後修正を提案する。提案手法の性能を2つのケーススタディで評価し,ウェーブレット収縮デノイング,畳み込みニューラルネットワーク,オートエンコーダ,U-netディープモデルなど,他の手法と比較した。最初のケーススタディは、アルゴリズムの認知特性を研究するのによく使われる設計関数に基づいている。第2のケーススタディは、オーディオバックグラウンド除去タスクです。本稿では,提案アルゴリズムが信号処理手法の普遍性と深層学習手法の学習能力にどのように関連しているかを示す。特に,訓練用クラス内外における構造化雑音信号の発声性能について評価した。トレーニングクラス内外における信号の復調性能に加えて, ノイズレベル, ノイズタイプ, アーティファクトが相違する場合には, 特にロバストであることを示す。 In many applications, signal denoising is often the first pre-processing step before any subsequent analysis or learning task. In this paper, we propose to apply a deep learning denoising model inspired by signal processing, a learnable version of the wavelet packet transform. The proposed algorithm has significant learning capabilities with few interpretable parameters and has an intuitive initialisation. We propose a post-learning modification of the parameters to adapt the denoising to different noise levels. We evaluate the performance of the proposed methodology on two case studies and compare it to other state of the art approaches, including wavelet shrinkage denoising, convolutional neural network, autoencoder and U-net deep models. The first case study is based on designed functions that have typically been used to study denoising properties of the algorithms. The second case study is an audio background removal task. We demonstrate how the proposed algorithm relates to the universality of signal processing methods and the learning capabilities of deep learning approaches. In particular, we evaluate the obtained denoising performances on structured noisy signals inside and outside the classes used for training. 
In addition to performing well when denoising signals inside and outside the training class, our method proves particularly robust when different noise levels, noise types and artifacts are added.	翻訳日:2022-06-14 18:22:35 公開日:2022-06-13
# 階層的相関再構成を用いた活動銀河核の赤方偏移の確率分布予測 Predicting conditional probability distributions of redshifts of Active Galactic Nuclei using Hierarchical Correlation Reconstruction ( http://arxiv.org/abs/2206.06194v1 ) ライセンス: Link先を確認	Jarek Duda	(参考訳) 一般に値の予測に焦点が当てられているが、実データは条件付き確率分布のみを予測でき、条件付きエントロピー$H(Y \mid X)$で制限される。さらに不確実性を推定すれば、予測値をラプラス分布のガウス中心として扱うことができ、これは実データの複雑な条件分布とはかけ離れた理想化である。本稿では,複数モーメント様パラメータの独立なMSE推定により,比較的複雑な条件分布(マルチモーダルなど)を安価に予測するために階層的相関再構成(HCR)手法を適用する。この目的のために線形回帰を用いて解釈可能なモデルを得る:条件付きモーメントに対する特徴の寄与を記述する係数を持つ。本稿では,第4のfermi-lat data release 2 (4lac) データセットに基づく活動銀河核の赤方偏移予測の実用的問題に着目し,特徴量最適化とl1"lasso"正則化にcanonical correlation analysis (cca) を用いた最初のアプローチを拡張した。 While there is a general focus on prediction of values, real data often only allows predicting conditional probability distributions, with capabilities bounded by conditional entropy $H(Y \mid X)$. If additionally estimating uncertainty, we can treat a predicted value as the center of a Gaussian or Laplace distribution - an idealization which can be far from the complex conditional distributions of real data. This article applies the Hierarchical Correlation Reconstruction (HCR) approach to inexpensively predict quite complex conditional probability distributions (e.g. multimodal): by independent MSE estimation of multiple moment-like parameters, which allow reconstructing the conditional distribution. Using linear regression for this purpose, we get interpretable models: with coefficients describing contributions of features to conditional moments. This article extends the original approach especially by using Canonical Correlation Analysis (CCA) for feature optimization and l1 "lasso" regularization, focusing on the practical problem of predicting the redshift of Active Galactic Nuclei (AGN) based on the Fourth Fermi-LAT Data Release 2 (4LAC) dataset.	翻訳日:2022-06-14 18:22:13 公開日:2022-06-13
# モデル不確かさ下におけるマルコフ決定過程 Markov Decision Processes under Model Uncertainty ( http://arxiv.org/abs/2206.06109v1 ) ライセンス: Link先を確認	Ariel Neufeld, Julian Sester, Mario Šikić	(参考訳) 離散時間無限地平線設定におけるモデル不確実性の下でのマルコフ決定問題の一般的な枠組みを紹介する。動的プログラミングの原理を提供することにより、局所的-グローバルなパラダイム、すなわち、一段階の堅牢な最適化問題を解くことで、大域的(無限の時間ステップ)頑健な確率的最適制御問題の最適化と、それに対応する最悪の尺度が得られる。さらに、このフレームワークをS&P500のデータを含むポートフォリオ最適化に適用する。 2つの異なる曖昧性集合を提示する。1つは経験的測度の周りのwasserstein-ballによって与えられたデータ駆動であり、もう1つはパラメータの不確かさ集合がデータから推定される多変量正規分布のパラメトリック集合によって記述される。市場が変動的あるいは不安定なシナリオでは、対応する堅牢な最適化問題からの最適ポートフォリオ戦略がモデル不確実性のないポートフォリオよりも優れており、モデル不確実性を考慮することの重要性が示される。 We introduce a general framework for Markov decision problems under model uncertainty in a discrete-time infinite horizon setting. By providing a dynamic programming principle we obtain a local-to-global paradigm, namely solving a local, i.e., a one time-step robust optimization problem leads to an optimizer of the global (i.e. infinite time-steps) robust stochastic optimal control problem, as well as to a corresponding worst-case measure. Moreover, we apply this framework to portfolio optimization involving data of the S&P 500. We present two different types of ambiguity sets; one is fully data-driven given by a Wasserstein-ball around the empirical measure, the second one is described by a parametric set of multivariate normal distributions, where the corresponding uncertainty sets of the parameters are estimated from the data. It turns out that in scenarios where the market is volatile or bearish, the optimal portfolio strategies from the corresponding robust optimization problem outperform the ones without model uncertainty, showcasing the importance of taking model uncertainty into account.	翻訳日:2022-06-14 18:21:47 公開日:2022-06-13
# (参考訳) 社会と記憶誘導を持つ人工航海者における累積的文化の自然発生 Cumulative culture spontaneously emerges in artificial navigators who are social and memory-guided ( http://arxiv.org/abs/2206.06281v1 ) ライセンス: CC BY 4.0	Edwin S. Dalmaijer	(参考訳) これまでは人間特有の存在と考えられてきたが、累積的な文化進化は非ヒト動物にも見られる。個人からの適応的な革新が社会学習を通じて連続的に受け継がれるときである。例えば、単独または安定なペアで飛行するハトは、比較的厳格な亜最適経路を示すが、経験豊富なメンバーがナイーブな経路に交換される世代によって、ルート効率が徐々に向上する。これは、累積的な文化進化のために必要最小限の認知アーキテクチャが生み出すかという疑問を提起する。ここでは,目標指向性,社会的近接性,経路記憶の3つの主機能を有するエージェントに対して,この問題に答えようと考えた。効率と世代効率の改善のためのオプティマでは、ハトで観察された累積培養を再現した。それぞれの最適な経路は、主に記憶によって決定され、社会的近接と目標指向によってより少ない範囲で決定された。社会的近接の必要性から、各エージェントは記憶された経路に沿って経験豊富なエージェントに近づいた。しかし、ルート記憶に妨げられず、単純エージェントの進路は目標に向かって進む傾向が強かった。これによりペアの経路が微妙に偏り、その結果の効率改善はゴールに回帰する。累積的文化的進化の現在の枠組みでは、世代ごとの漸進的な改善が全ての中核的な基準を満たしており、初歩的な累積的最適化は、社会的近接を好んで記憶能力を持つ単純なシステムでも現れる進化のメカニズムであることを示している。 While previously thought to be uniquely human, cumulative cultural evolution continues to be found in non-human animals. It occurs when an adaptive innovation from an individual is repeatedly passed onto consecutive generations through social learning. For example, pigeons who fly alone or in stable pairs show relatively rigid sub-optimal routes, but gradually improve route efficiency over generations of pairs in which experienced members are swapped for naive ones. This raises the question of what the minimally required cognitive architecture is for cumulative cultural evolution to emerge. Here, I aimed to answer this question in artificial agents who employ three main functions: goal-direction, social proximity, and route memory. At the optima for efficiency and generational efficiency improvement, agents replicated cumulative culture observed in pigeons. At each optimum, paths were determined primarily by memory, and to a lesser extent by social proximity and goal-direction. Because of their need for social proximity, each naive agent stayed close to their experienced counterpart as that followed its memorised path. 
However, unhindered by route memory, the naive agent's heading was more likely to err towards the goal. This subtly biased pairs' routes, and the resulting efficiency improvement is thus regression to the goal. The resulting incremental improvements over generations meet all core criteria in current frameworks of cumulative cultural evolution, suggesting that rudimentary cumulative optimisation is an evolutionary mechanism that emerges even in simple systems that prefer social proximity and have a memory capacity.	翻訳日:2022-06-14 18:20:01 公開日:2022-06-13
# GradICON: 勾配逆整合による近似微分同相 GradICON: Approximate Diffeomorphisms via Gradient Inverse Consistency ( http://arxiv.org/abs/2206.05897v1 ) ライセンス: Link先を確認	Lin Tian, Hastings Greer, François-Xavier Vialard, Roland Kwitt, Raúl San José Estépar, Marc Niethammer	(参考訳) 多くの登録手法があり、初期の研究は画像ペアの最適化に基づくアプローチに重点を置いている。最近の研究は、空間変換を予測するための深層登録ネットワークに焦点を当てている。どちらの場合でも、低次元変換パラメータの代わりに変換関数を推定する非パラメトリック登録モデルは、(滑らかな変換を促進するために)適切な正規化子とそのパラメータを選択する必要がある。これによりモデルはチューニングが難しくなり、選択された正規化器によって許容される変形空間に変形を制限できる。光学フローのためのディープラーニングモデルは変換を規則化せず、代わりにデータに完全に依存するが、医療画像登録に望ましい二相変換は生じないかもしれない。そこで本研究では,正規化に逆一貫性のみを使用する非教師付きアイコン型ディープラーニング登録手法に基づくgradiconビルディングを開発した。しかし、ICONとは対照的に、勾配の逆一貫性損失を用いることで収束が著しく向上するだけでなく、結果の変換写像の暗黙的な正則化がもたらされることを実証し実証的に検証する。磁気共鳴(MR)膝画像とCT(CT)肺画像の合成実験と実験により,GradICONの優れた性能が示された。我々は、簡単な登録定式化を維持しつつ、最先端(SOTA)の精度を達成する。 Many registration approaches exist with early work focusing on optimization-based approaches for image pairs. Recent work focuses on deep registration networks to predict spatial transformations. In both cases, commonly used non-parametric registration models, which estimate transformation functions instead of low-dimensional transformation parameters, require choosing a suitable regularizer (to encourage smooth transformations) and its parameters. This makes models difficult to tune and restricts deformations to the deformation space permissible by the chosen regularizer. While deep-learning models for optical flow exist that do not regularize transformations and instead entirely rely on the data, these might not yield diffeomorphic transformations, which are desirable for medical image registration. In this work, we therefore develop GradICON building upon the unsupervised ICON deep-learning registration approach, which only uses inverse-consistency for regularization. 
However, in contrast to ICON, we prove and empirically verify that using a gradient inverse-consistency loss not only significantly improves convergence, but also results in a similar implicit regularization of the resulting transformation map. Synthetic experiments and experiments on magnetic resonance (MR) knee images and computed tomography (CT) lung images show the excellent performance of GradICON. We achieve state-of-the-art (SOTA) accuracy while retaining a simple registration formulation, which is practically important.	翻訳日:2022-06-14 18:05:31 公開日:2022-06-13
# 可変画像復元のためのハイパーネットワーク One Size Fits All: Hypernetwork for Tunable Image Restoration ( http://arxiv.org/abs/2206.05970v1 ) ライセンス: Link先を確認	Shai Aharon and Gil Ben-Artzi	(参考訳) 本稿では,複数のモデルの精度を向上し,それぞれが異なる劣化レベルに最適化され,単一のモデルと全く同じパラメータを持つ,可変画像復元のための新しい手法を提案する。我々のモデルは、一定数のパラメータと様々な画像復元タスクで必要となる多くの劣化レベルを復元するために最適化できる。実世界のデータセットに対する実験により、我々の手法は既存のチューナブルモデルに対してデノナイズ、デJPEG、超高解像度化を実現し、より広範囲の劣化レベルに対するスムーズで正確なフィッティングを可能にした。 We introduce a novel approach for tunable image restoration that achieves the accuracy of multiple models, each optimized for a different level of degradation, with exactly the same number of parameters as a single model. Our model can be optimized to restore as many degradation levels as required with a constant number of parameters and for various image restoration tasks. Experiments on real-world datasets show that our approach achieves state-of-the-art results in denoising, DeJPEG and super-resolution with respect to existing tunable models, allowing smoother and more accurate fitting over a wider range of degradation levels.	翻訳日:2022-06-14 18:05:09 公開日:2022-06-13
# 胸部X線写真における結核分画の深層アンサンブル学習 Deep ensemble learning for segmenting tuberculosis-consistent manifestations in chest radiographs ( http://arxiv.org/abs/2206.06065v1 ) ライセンス: Link先を確認	Sivaramakrishnan Rajaraman, Feng Yang, Ghada Zamzmi, Peng Guo, Zhiyun Xue and Sameer K Antani	(参考訳) 深層学習(DL)法を用いた胸部X線(CXR)の結核性病変の自動分離は、放射線治療の労力を減らし、臨床的意思決定を補完し、患者治療の改善をもたらす可能性がある。論文の大半は、粗い境界ボックスアノテーションを用いた自動セグメンテーションモデルのトレーニングについて論じている。しかし、バウンディングボックスアノテーションの粒度は、ピクセルレベルでの偽陽性と負のかなりの部分を含めることによって、全体的なセマンティックセグメンテーション性能に悪影響を及ぼす可能性がある。この研究 (i)tb整合病変の細粒度アノテーションの利点と有用性の評価 (ii) オリジナルおよび骨抑制前頭cxrのtb一貫性病変を意味的に分節するu-netモデルの変種を訓練し構成する。ビットワイズ,ビットワイズ,ビットワイズ,ビットワイズマックス,スタックリングなどのアンサンブル手法を用いてセグメント化性能を評価した。重み付けアンサンブルは,個々の構成モデルおよび他のアンサンブル法と比較して,高いセグメンテーション性能(ディップスコア0.5743,95%信頼区間0.4055,0.7431)を示した。本研究は,細粒度tb一貫性病変分割性能を改善するためにアンサンブル学習を適用した最初の研究である。 Automated segmentation of tuberculosis (TB)-consistent lesions in chest X-rays (CXRs) using deep learning (DL) methods can help reduce radiologist effort, supplement clinical decision-making, and potentially result in improved patient treatment. The majority of works in the literature discuss training automatic segmentation models using coarse bounding box annotations. However, the granularity of the bounding box annotation could result in the inclusion of a considerable fraction of false positives and negatives at the pixel level that may adversely impact overall semantic segmentation performance. This study (i) evaluates the benefits of using fine-grained annotations of TB-consistent lesions and (ii) trains and constructs ensembles of the variants of U-Net models for semantically segmenting TB-consistent lesions in both original and bone-suppressed frontal CXRs. We evaluated segmentation performance using several ensemble methods such as bitwise AND, bitwise-OR, bitwise-MAX, and stacking. 
We observed that the stacking ensemble demonstrated superior segmentation performance (Dice score: 0.5743, 95% confidence interval: (0.4055,0.7431)) compared to the individual constituent models and other ensemble methods. To the best of our knowledge, this is the first study to apply ensemble learning to improve fine-grained TB-consistent lesion segmentation performance.	翻訳日:2022-06-14 18:04:57 公開日:2022-06-13
# 自己深達度学習によるmpMRI前立腺癌の悪性度検出と局在:臨床応用への一歩 Prostate Cancer Malignancy Detection and localization from mpMRI using auto-Deep Learning: One Step Closer to Clinical Utilization ( http://arxiv.org/abs/2206.06235v1 ) ライセンス: Link先を確認	Weiwei Zong and Eric Carver and Simeng Zhu and Eric Schaff and Daniel Chapman and Joon Lee and Hassan Bagher Ebadian and Indrin Chetty and Benjamin Movsas and Winston Wen and Tarik Alafif and Xiangyun Zong	(参考訳) mpMRIによる前立腺悪性腫瘍の診断は,ここ数年で大きく研究されている。モデル解釈とドメインドリフトが臨床利用の主要な道路ブロックとなっている。対象は,201名の患者とのコホートでカスタマイズされた畳み込みニューラルネットワークを訓練し,関心領域周辺の2dパッチをカットして入力として,前立腺の2.5dスライスを入力とし,モデル空間においてオートケラを用いて最適なモデル検索を行った。末梢領域(PZ)と中心腺(CG)を別々に訓練し,PZ検出器とCG検出器は, 医師の作業負荷を大幅に軽減するために, 配列から最も疑わしいスライスをハイライトするために効果的に実証された。 Automatic diagnosis of malignant prostate cancer patients from mpMRI has been studied heavily in the past years. Model interpretation and domain drift have been the main road blocks for clinical utilization. As an extension of our previous work, where we trained a customized convolutional neural network on a public cohort of 201 patients and used cropped 2D patches around the region of interest as the input, here the cropped 2.5D slices of the prostate glands were used as the input, and the optimal model was searched in the model space using autoKeras. A further difference is that the peripheral zone (PZ) and central gland (CG) were trained and tested separately; the PZ detector and CG detector were shown to be effective in highlighting the most suspicious slices in a sequence, which should greatly ease the workload of physicians.	翻訳日:2022-06-14 18:04:32 公開日:2022-06-13
# 脳腫瘍患者の生存時間予測のためのMMMNA-Net MMMNA-Net for Overall Survival Time Prediction of Brain Tumor Patients ( http://arxiv.org/abs/2206.06267v1 ) ライセンス: Link先を確認	Wen Tang, Haoyue Zhang, Pengxin Yu, Han Kang, Rongguo Zhang	(参考訳) 全身生存時間(OS)はグリオーマの病態に対する最も重要な評価指標の1つである。マルチモーダルMRI(Multimodal Magnetic Resonance Imaging)スキャンは、グリオーマ予後OSの研究において重要な役割を担っている。マルチモーダルMRIにおけるOS時間予測のために, 深層学習に基づくいくつかの手法を提案する。しかし、これらの手法は通常、深層学習ネットワークの開始時や終了時にマルチモーダル情報を融合し、異なるスケールの機能の融合を欠いている。さらに、ネットワークの終端での融合は常にグローバル(例えば、グローバル平均プーリングアウトプットの結合後に完全に接続された)やローカル(例えば、バイリニアプーリング)に適応し、ローカルとグローバルの情報を失う。本稿では,脳腫瘍患者に対するマルチモーダルos時間予測法を提案する。提案手法は,現在の最先端手法(0.6989対0.6426)に比べて8.76%向上した。広範囲な試験により,本手法はモダリティが欠如している状況に適応できることが示された。コードはhttps://github.com/tangwen920812/mmmna-netで入手できる。 Overall survival (OS) time is one of the most important evaluation indices for glioma. Multimodal Magnetic Resonance Imaging (MRI) scans play an important role in the study of glioma prognosis and OS time. Several deep learning-based methods have been proposed for OS time prediction on multi-modal MRI. However, these methods usually fuse multi-modal information at the beginning or at the end of the deep learning networks and lack the fusion of features from different scales. In addition, the fusion at the end of networks always pairs global with global (e.g., fully connected after concatenation of global average pooling output) or local with local (e.g., bilinear pooling), which loses the information of local with global. In this paper, we propose a novel method for multi-modal OS time prediction of brain tumor patients, which contains an improved nonlocal features fusion module introduced on different scales. Our method obtains a relative 8.76% improvement over the current state-of-the-art method (0.6989 vs. 0.6426 on accuracy). Extensive testing demonstrates that our method could adapt to situations with missing modalities. The code is available at https://github.com/TangWen920812/mmmna-net.	翻訳日:2022-06-14 18:04:15 公開日:2022-06-13
# グラフモチーフ度対策の絶対表現性 Absolute Expressiveness of Subgraph Motif Centrality Measures ( http://arxiv.org/abs/2206.06137v1 ) ライセンス: Link先を確認	Andreas Pieris and Jorge Salas	(参考訳) グラフベースのアプリケーションでは、(有向または無向の)グラフの最も重要なまたは `central'' 頂点をピンポイントするか、グラフの頂点を重要度に応じてランク付けするのが一般的なタスクである。この目的のために、グラフ内の頂点が最も重要なものであるかを評価する文献において、いわゆる中央集権的尺度が多数提案されている。 riveros と salas は icdt 2020 の論文で、グラフにおける頂点の重要性は、それを取り巻く部分グラフモチーフとして知られる'relevant' 接続部分グラフの数と相対する、次の直感的な原理に基づく集中度尺度の族を提案した。上述の原則から派生した措置を下記の指標として参照する。サブグラフモチーフ測度はグラフデータベースアプリケーションに適しているという説得力のある主張がある。 ICDT論文は, 部分グラフモチーフ測度が好むいくつかの特性について研究したが, その絶対表現性はほとんど探索されていない。本研究の目的は,部分グラフモチーフ測度のファミリの絶対表現性を正確に特徴付けることである。 In graph-based applications, a common task is to pinpoint the most important or ``central'' vertex in a (directed or undirected) graph, or rank the vertices of a graph according to their importance. To this end, a plethora of so-called centrality measures have been proposed in the literature that assess which vertices in a graph are the most important ones. Riveros and Salas, in an ICDT 2020 paper, proposed a family of centrality measures based on the following intuitive principle: the importance of a vertex in a graph is relative to the number of ``relevant'' connected subgraphs, known as subgraph motifs, surrounding it. We refer to the measures derived from the above principle as subgraph motif measures. It has been convincingly argued that subgraph motif measures are well-suited for graph database applications. Although the ICDT paper studied several favourable properties enjoyed by subgraph motif measures, their absolute expressiveness remains largely unexplored. The goal of this work is to precisely characterize the absolute expressiveness of the family of subgraph motif measures.	翻訳日:2022-06-14 18:01:41 公開日:2022-06-13
# シャッフル型勾配アルゴリズムの大域解への収束について On the Convergence to a Global Solution of Shuffling-Type Gradient Algorithms ( http://arxiv.org/abs/2206.05869v1 ) ライセンス: Link先を確認	Lam M. Nguyen, Trang H. Tran	(参考訳) 確率的勾配降下(sgd)アルゴリズムは、拡張性と大規模問題への対処効率により、多くの機械学習タスクで選択される方法である。本稿では,本研究の主流である実用的ヒューリスティックスと一致するSGDのシャッフルバージョンに着目した。過パラメータ設定下での非凸関数のクラスに対してSGDをシャッフルする大域的解の収束性を示す。我々の分析では、以前の文献よりも緩和された非凸仮定を採用している。それでも、一般凸設定においてシャッフルSGDが達成した計算複雑性は維持される。 Stochastic gradient descent (SGD) algorithm is the method of choice in many machine learning tasks thanks to its scalability and efficiency in dealing with large-scale problems. In this paper, we focus on the shuffling version of SGD which matches the mainstream practical heuristics. We show the convergence to a global solution of shuffling SGD for a class of non-convex functions under over-parameterized settings. Our analysis employs more relaxed non-convex assumptions than previous literature. Nevertheless, we maintain the desired computational complexity as shuffling SGD has achieved in the general convex setting.	翻訳日:2022-06-14 17:57:27 公開日:2022-06-13
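As a small illustration of the shuffling variant of SGD discussed in the abstract above, the sketch below draws a fresh permutation each epoch and visits every sample exactly once. It is a toy under stated assumptions: the objective is a one-dimensional convex least-squares fit, not the non-convex, over-parameterized setting the paper analyses, and the function name, step size, and data are hypothetical.

```python
import random


def shuffling_sgd(data, lr=0.01, epochs=50, seed=0):
    """Fit y = w * x by SGD with per-epoch reshuffling (toy example)."""
    rng = random.Random(seed)
    w = 0.0
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)      # new random permutation for this epoch
        for i in order:         # every sample is used exactly once per epoch
            x, y = data[i]
            grad = 2.0 * (w * x - y) * x   # gradient of (w*x - y)^2 w.r.t. w
            w -= lr * grad
    return w


# Noise-free data generated by y = 3x, so the minimiser is w = 3.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0)]
w_hat = shuffling_sgd(data)
```

The only difference from sampling with replacement is the inner loop over a permutation, which is the scheme most training pipelines use in practice.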
# Safe-FinRL:高速株式取引のための低バイアス・可変深層強化学習実装 Safe-FinRL: A Low Bias and Variance Deep Reinforcement Learning Implementation for High-Freq Stock Trading ( http://arxiv.org/abs/2206.05910v1 ) ライセンス: Link先を確認	Zitao Song, Xuyang Jin, Chenliang Li	(参考訳) 近年、量的金融の実践者の多くが、より優れた量的取引(QT)戦略を構築するためにDeep Reinforcement Learning(DRL)を使用しようと試みている。しかしながら、既存の多くの研究は、非定常的な金融環境や実際の金融市場にDRLを適用する際のバイアスや分散トレードオフなど、いくつかの深刻な課題に対処できていない。そこで本研究では,固定的金融環境と低バイアス・分散推定によって強化された,新規なdrlベースの株取引戦略であるsafe-finrlを提案する。第一に、長い金融時系列をほぼ定常的な短い環境に分離し、第二に、一般的なリトレースオペレータをソフトアクタ-クリティックに組み込むことにより、ほぼ定常な金融環境にトレースsacを実装する。暗号通貨市場における大規模な実験は、Safe-FinRLが安定的な価値推定と安定した政策改善を提供し、ほぼ定常的な金融環境においてバイアスと分散を著しく低減したことを示している。 In recent years, many practitioners in quantitative finance have attempted to use Deep Reinforcement Learning (DRL) to build better quantitative trading (QT) strategies. Nevertheless, many existing studies fail to address several serious challenges, such as the non-stationary financial environment and the bias and variance trade-off when applying DRL in the real financial market. In this work, we proposed Safe-FinRL, a novel DRL-based high-freq stock trading strategy enhanced by the near-stationary financial environment and low bias and variance estimation. Our main contributions are twofold: firstly, we separate the long financial time series into the near-stationary short environment; secondly, we implement Trace-SAC in the near-stationary financial environment by incorporating the general retrace operator into the Soft Actor-Critic. Extensive experiments on the cryptocurrency market have demonstrated that Safe-FinRL has provided a stable value estimation and a steady policy improvement and reduced bias and variance significantly in the near-stationary financial environment.	翻訳日:2022-06-14 17:57:18 公開日:2022-06-13
# 決定点過程に対する遅延および高速グリーディMAP推論 Lazy and Fast Greedy MAP Inference for Determinantal Point Process ( http://arxiv.org/abs/2206.05947v1 ) ライセンス: Link先を確認	Shinichi Hemmi, Taihei Oki, Shinsaku Sakaue, Kaito Fujii, Satoru Iwata	(参考訳) 決定点プロセス(DPP)に対するMAP推論は、多くの機械学習アプリケーションにおいて多様な項目を選択する上で重要である。 DPP MAP推論はNPハードであるが、グリードアルゴリズムはしばしば高品質な解を見つけ、多くの研究者がその効率的な実装を研究している。古典的かつ実用的な方法の一つが遅延グリーディアルゴリズムであり、これは一般的な部分モジュラー関数の最大化に適用できるが、コレスキー因子分解に基づく最近の高速グリーディアルゴリズムはdpp写像推論より効率的である。本稿では,文献において相容れないと考えられる「怠け者」と「速い者」の考え方を組み合わせる方法について述べる。私たちの怠け者で高速な欲望アルゴリズムは、現在の最良のアルゴリズムとほぼ同じ時間複雑性を達成し、実際より速く実行します。 Lazy + fast"というアイデアは他のグリーディ型アルゴリズムにも拡張可能である。また、制約のない DPP MAP 推論のための二重欲求アルゴリズムの高速版も提供する。実験は加速アイデアの有効性を検証する。 The maximum a posteriori (MAP) inference for determinantal point processes (DPPs) is crucial for selecting diverse items in many machine learning applications. Although DPP MAP inference is NP-hard, the greedy algorithm often finds high-quality solutions, and many researchers have studied its efficient implementation. One classical and practical method is the lazy greedy algorithm, which is applicable to general submodular function maximization, while a recent fast greedy algorithm based on the Cholesky factorization is more efficient for DPP MAP inference. This paper presents how to combine the ideas of "lazy" and "fast", which have been considered incompatible in the literature. Our lazy and fast greedy algorithm achieves almost the same time complexity as the current best one and runs faster in practice. The idea of "lazy + fast" is extendable to other greedy-type algorithms. We also give a fast version of the double greedy algorithm for unconstrained DPP MAP inference. Experiments validate the effectiveness of our acceleration ideas.	翻訳日:2022-06-14 17:56:56 公開日:2022-06-13
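The "lazy" half of the idea in the abstract above is the classical lazy greedy scheme for submodular maximization: stale marginal gains are valid upper bounds, so an item only needs re-evaluation when it reaches the top of a priority queue. The sketch below assumes a toy set-coverage objective in place of the DPP log-determinant and does not include the Cholesky-based "fast" updates; all names and data are hypothetical.

```python
import heapq


def lazy_greedy(candidates, gain, k):
    """Pick up to k items; gain(item, selected) is the marginal gain."""
    selected = []
    # Max-heap via negated gains; the third field records how many items
    # were selected when the gain was last computed.
    heap = [(-gain(c, selected), c, 0) for c in candidates]
    heapq.heapify(heap)
    while heap and len(selected) < k:
        neg_gain, item, stamp = heapq.heappop(heap)
        if stamp == len(selected):
            # Gain is current; by submodularity no stale bound can exceed it.
            if -neg_gain <= 0:
                break
            selected.append(item)
        else:
            # Gain is stale: recompute it and push the item back.
            heapq.heappush(heap, (-gain(item, selected), item, len(selected)))
    return selected


# Toy coverage objective: value of a selection = number of covered elements.
sets = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1, 2}}


def coverage_gain(item, selected):
    covered = set().union(*(sets[s] for s in selected))
    return len(sets[item] - covered)


chosen = lazy_greedy(sorted(sets), coverage_gain, k=2)
```

In this run the second round recomputes only the gain of "a"; items "b" and "d" are never re-evaluated, which is where the savings come from.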
# 値関数に基づく二値ハイパーパラメータ選択問題に対する差分凸アルゴリズム Value Function Based Difference-of-Convex Algorithm for Bilevel Hyperparameter Selection Problems ( http://arxiv.org/abs/2206.05976v1 ) ライセンス: Link先を確認	Lucy Gao, Jane J. Ye, Haian Yin, Shangzhi Zeng, Jin Zhang	(参考訳) 超パラメータチューニングのための勾配に基づく最適化手法は、固定上層変数値に対して、両レベルプログラムの下位レベルが強い凸(LLSC)と滑らか(LLS)であるときに、定常解に対する理論的収束を保証する。この条件は多くの機械学習アルゴリズムのハイパーパラメータのチューニングから生じるバイレベルプログラムでは満足できない。本研究では, 逐次収束型値関数に基づく差分凸アルゴリズム(VF-iDCA)を開発した。このアルゴリズムは,多種多様なハイパーパラメータチューニングアプリケーションから,LLSCやLSSの仮定を伴わない定常解を実現する。提案したVF-iDCAは,過度パラメータを調整した場合に優れた性能を示す。 Gradient-based optimization methods for hyperparameter tuning guarantee theoretical convergence to stationary solutions when for fixed upper-level variable values, the lower level of the bilevel program is strongly convex (LLSC) and smooth (LLS). This condition is not satisfied for bilevel programs arising from tuning hyperparameters in many machine learning algorithms. In this work, we develop a sequentially convergent Value Function based Difference-of-Convex Algorithm with inexactness (VF-iDCA). We show that this algorithm achieves stationary solutions without LLSC and LLS assumptions for bilevel programs from a broad class of hyperparameter tuning applications. Our extensive experiments confirm our theoretical findings and show that the proposed VF-iDCA yields superior performance when applied to tune hyperparameters.	翻訳日:2022-06-14 17:55:32 公開日:2022-06-13
# 機械学習マルチバースのモデリング Modeling the Machine Learning Multiverse ( http://arxiv.org/abs/2206.05985v1 ) ライセンス: Link先を確認	Samuel J. Bell, Onno P. Kampman, Jesse Dodge and Neil D. Lawrence	(参考訳) 機械学習研究の信頼性と信頼性に関する懸念が高まる中、我々は、堅牢で一般化可能なクレームを作るための原則的なフレームワークであるMultiverse Analysisを提案する。我々の枠組みは,心理学の再現性危機に対応するために導入された多元的分析(Steegen et al., 2016)に基づいている。高次元かつしばしば連続なml探索空間を効率的に探索するために、多元系をガウス過程でモデル化し、ベイズ実験設計を適用する。我々のフレームワークは、モデル性能に関する堅牢な科学的結論を導き出すために設計されており、従来の最適化よりも探索に焦点を当てている。最初の2つのケーススタディにおいて、適応最適化器の相対的メリットに関する議論を考察した。第二に,大規模バッチ訓練一般化ギャップに対する学習率の影響について,矛盾する研究を合成する。機械学習コミュニティにとって、Multiverse Analysisは、堅牢なクレームを識別し、透明性を高め、再現性を向上させるためのシンプルで効果的なテクニックである。 Amid mounting concern about the reliability and credibility of machine learning research, we present a principled framework for making robust and generalizable claims: the Multiverse Analysis. Our framework builds upon the Multiverse Analysis (Steegen et al., 2016) introduced in response to psychology's own reproducibility crisis. To efficiently explore high-dimensional and often continuous ML search spaces, we model the multiverse with a Gaussian Process surrogate and apply Bayesian experimental design. Our framework is designed to facilitate drawing robust scientific conclusions about model performance, and thus our approach focuses on exploration rather than conventional optimization. In the first of two case studies, we investigate disputed claims about the relative merit of adaptive optimizers. Second, we synthesize conflicting research on the effect of learning rate on the large batch training generalization gap. For the machine learning community, the Multiverse Analysis is a simple and effective technique for identifying robust claims, for increasing transparency, and a step toward improved reproducibility.	翻訳日:2022-06-14 17:55:20 公開日:2022-06-13
# 非直交多重アクセス(NOMA)におけるGPUアクセラレーション機械学習 GPU-Accelerated Machine Learning in Non-Orthogonal Multiple Access ( http://arxiv.org/abs/2206.05998v1 ) ライセンス: Link先を確認	Daniel Sch\"aufele, Guillermo Marcus, Nikolaus Binder, Matthias Mehlhose, Alexander Keller, S{\l}awomir Sta\'nczak	(参考訳) 非直交多重アクセス(NOMA)は、将来の5Gおよび6Gネットワークに必要な大規模な接続を可能にする興味深い技術である。純粋に線形処理は、すでにNOMAシステムでは優れた性能を達成しているが、あるシナリオでは、許容可能な性能を保証するためには非線形処理が必須である。本稿では,線形処理と非線形処理の両方の利点を組み合わせたニューラルネットワークアーキテクチャを提案する。リアルタイム検出性能はグラフィックス処理ユニット(GPU)の高効率実装によって実証される。実験環境における実測値を用いて,従来の手法よりも優れた手法を示す。 Non-orthogonal multiple access (NOMA) is an interesting technology that enables massive connectivity as required in future 5G and 6G networks. While purely linear processing already achieves good performance in NOMA systems, in certain scenarios, non-linear processing is mandatory to ensure acceptable performance. In this paper, we propose a neural network architecture that combines the advantages of both linear and non-linear processing. Its real-time detection performance is demonstrated by a highly efficient implementation on a graphics processing unit (GPU). Using real measurements in a laboratory environment, we show the superiority of our approach over conventional methods.	翻訳日:2022-06-14 17:55:04 公開日:2022-06-13
# 都市道路網における充電ステーションの強化学習による配置 Reinforcement Learning-based Placement of Charging Stations in Urban Road Networks ( http://arxiv.org/abs/2206.06011v1 ) ライセンス: Link先を確認	Leonie von Wahl (1), Nicolas Tempelmeier (1), Ashutosh Sao (2) and Elena Demidova (3) ((1) Volkswagen Group, (2) L3S Research Center, University of Hannover, (3) Data Science & Intelligent Systems Group (DSIS), University of Bonn)	(参考訳) 従来のモビリティからエレクトロモビリティへの移行は、充電インフラの可用性と最適配置に大きく依存しており、都市部における充電ステーションの最適配置について検討する。我々は、地域の充電インフラ供給を最大化し、予算制約を設定しながら、待ち時間、旅行時間、充電時間を最小化する。さらに, 都市全域の充電需要をより高精度に推定するために, 家庭での充電が可能となる可能性についても検討した。充電ステーションの最適位置と異なる充電タイプの充電器の数を求める非線形整数最適化問題として充電ステーションの配置を定式化する。我々は、充電ステーション配置問題(PCRL)を解決するために、新しいDeep Reinforcement Learningアプローチを設計する。実世界のデータセットに対する大規模な実験は、PCRLが5つの基準線と比較して充電計画の利点を高めながら、待ち時間と旅行時間を減らしていることを示している。既存のインフラストラクチャと比較して、待ち時間を最大97%削減し、メリットを最大497%向上することが可能です。 The transition from conventional mobility to electromobility largely depends on charging infrastructure availability and optimal placement. This paper examines the optimal placement of charging stations in urban areas. We maximise the charging infrastructure supply over the area and minimise waiting, travel, and charging times while setting budget constraints. Moreover, we include the possibility of charging vehicles at home to obtain a more refined estimation of the actual charging demand throughout the urban area. We formulate the Placement of Charging Stations problem as a non-linear integer optimisation problem that seeks the optimal positions for charging stations and the optimal number of charging piles of different charging types. We design a novel Deep Reinforcement Learning approach to solve the charging station placement problem (PCRL). Extensive experiments on real-world datasets show how the PCRL reduces the waiting and travel time while increasing the benefit of the charging plan compared to five baselines. 
Compared to the existing infrastructure, we can reduce the waiting time by up to 97% and increase the benefit up to 497%.	翻訳日:2022-06-14 17:54:53 公開日:2022-06-13
# 雑音フィードバックを持つゲームにおける非回帰学習:学習速度分離による高速率と適応性 No-Regret Learning in Games with Noisy Feedback: Faster Rates and Adaptivity via Learning Rate Separation ( http://arxiv.org/abs/2206.06015v1 ) ライセンス: Link先を確認	Yu-Guan Hsieh, Kimon Antonakopoulos, Volkan Cevher, Panayotis Mertikopoulos	(参考訳) 本稿では,学習者が他の最適化エージェントと連続ゲームに関わった場合の後悔の最小化の問題について考察する。変動安定ゲーム(全凸凹ゲームと単調ゲームを含む連続ゲーム)の文脈でこの問題を考察し、各プレイヤーが個々のペイオフ勾配のノイズ推定にのみアクセスできる場合について考察する。雑音が加法的であれば、ゲーム理論と純粋に敵対的な設定は同様の後悔の保証を享受するが、ノイズが乗算的であれば、学習者が常に後悔できることを示す。学習速度分離を伴う楽観的勾配スキーム(つまり、ノイズプロファイルに応じて、その方法の補間と更新ステップが異なるスケジュールに調整される)によって、この高速レートを達成する。その後、微妙なハイパーパラメータチューニングの必要性をなくすため、最悪と最良な後悔の保証をスムーズに補間する完全適応手法を提案する。 We examine the problem of regret minimization when the learner is involved in a continuous game with other optimizing agents: in this case, if all players follow a no-regret algorithm, it is possible to achieve significantly lower regret relative to fully adversarial environments. We study this problem in the context of variationally stable games (a class of continuous games which includes all convex-concave and monotone games), and when the players only have access to noisy estimates of their individual payoff gradients. If the noise is additive, the game-theoretic and purely adversarial settings enjoy similar regret guarantees; however, if the noise is multiplicative, we show that the learners can, in fact, achieve constant regret. We achieve this faster rate via an optimistic gradient scheme with learning rate separation -- that is, the method's extrapolation and update steps are tuned to different schedules, depending on the noise profile. Subsequently, to eliminate the need for delicate hyperparameter tuning, we propose a fully adaptive method that smoothly interpolates between worst- and best-case regret guarantees.	翻訳日:2022-06-14 17:54:36 公開日:2022-06-13
# 機械学習モデルの$k$-safetyプロパティの指定とテスト Specifying and Testing $k$-Safety Properties for Machine-Learning Models ( http://arxiv.org/abs/2206.06054v1 ) ライセンス: Link先を確認	Maria Christakis, Hasan Ferit Eniser, J\"org Hoffmann, Adish Singla, Valentin W\"ustholz	(参考訳) 機械学習モデルは、画像分類や意思決定タスクの支援など、私たちの生活でますます普及している。その結果、これらのモデルの信頼性は重要であり、その堅牢性と公平性を検証するための多くのアプローチの開発に繋がった。しかし、そのような特定の特性を超えて、モデルから一般的な機能的修正性期待を特定することは困難である。本稿では,形式的手法で使われる仕様からインスピレーションを得て,約$k$の異なる実行,いわゆる$k$-safetyプロパティを推論することで,機能的正当性を表現した。銀行のクレジット・スクリーニングモデルを考えると、「人がローンを否定され、その収入が減少しても、まだローンを否定すべきである」という期待は2つの安全資産である。ここでは、機械学習モデルに対する$k$-safetyプロパティの幅広い適用性を示し、それらを表現するための最初の仕様言語を示す。我々はまた、メタモルフィックテストを使用してそのようなプロパティを自動的に検証するフレームワークで言語を運用する。我々の実験は、我々のフレームワークがプロパティ違反を特定するのに効果的であり、検出されたバグがより良いモデルを訓練するのに使えることを示した。 Machine-learning models are becoming increasingly prevalent in our lives, for instance assisting in image-classification or decision-making tasks. Consequently, the reliability of these models is of critical importance and has resulted in the development of numerous approaches for validating and verifying their robustness and fairness. However, beyond such specific properties, it is challenging to specify, let alone check, general functional-correctness expectations from models. In this paper, we take inspiration from specifications used in formal methods, expressing functional-correctness properties by reasoning about $k$ different executions, so-called $k$-safety properties. Considering a credit-screening model of a bank, the expected property that "if a person is denied a loan and their income decreases, they should still be denied the loan" is a 2-safety property. Here, we show the wide applicability of $k$-safety properties for machine-learning models and present the first specification language for expressing them. We also operationalize the language in a framework for automatically validating such properties using metamorphic testing. 
Our experiments show that our framework is effective in identifying property violations, and that detected bugs could be used to train better models.	翻訳日:2022-06-14 17:54:16 公開日:2022-06-13
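The 2-safety example quoted in the abstract above (a denied applicant whose income decreases should still be denied) can be illustrated with a small metamorphic test. This is only a sketch under stated assumptions: it does not use the paper's specification language or framework, and `toy_model`, `buggy_model`, the applicant fields and the thresholds are all hypothetical.

```python
# Minimal sketch of a metamorphic check for one 2-safety property.
# All names and numbers here are hypothetical illustrations.

def toy_model(applicant):
    """Hypothetical credit model: True means the loan is approved."""
    return applicant["income"] - 0.5 * applicant["debt"] >= 30_000


def buggy_model(applicant):
    """Hypothetical faulty model that wrongly approves very low incomes."""
    return applicant["income"] >= 30_000 or applicant["income"] < 12_000


def violates_property(model, applicant, decrease):
    """Relate two executions of the model: the original applicant and a
    follow-up applicant whose income is lower by `decrease`."""
    follow_up = {**applicant, "income": applicant["income"] - decrease}
    denied_before = not model(applicant)
    approved_after = model(follow_up)
    # Violation: denied originally, yet approved after the income dropped.
    return denied_before and approved_after


def find_violations(model, applicants, decrease=5_000.0):
    """Return every applicant for which the property is violated."""
    return [a for a in applicants if violates_property(model, a, decrease)]
```

The property is called 2-safety because a single run of the model can never falsify it; a counterexample always consists of a pair of related runs, which is what `violates_property` constructs.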
# 交通を考慮した長期記憶予測に基づく機械型デバイスのエネルギー効率向上 Energy-Efficient Wake-Up Signalling for Machine-Type Devices Based on Traffic-Aware Long-Short Term Memory Prediction ( http://arxiv.org/abs/2206.06058v1 ) ライセンス: Link先を確認	David E. Ru\'iz-Guirola, Carlos A. Rodr\'iguez-L\'opez, Samuel Montejo-S\'anchez, Richard Demo Souza, Onel L. A. L\'opez and Hirley Alves	(参考訳) 低消費電力機械型通信(MTC)ネットワークでは省エネルギー化が課題となっている。この点において、機械型デバイス(MTD)の無線インタフェースが消費するエネルギーを最小化することを目的としたWuS(Wake-up Signal)技術は、有望な解決策である。しかし、最先端のWuSメカニズムは静的な操作パラメータを使用するため、システムダイナミクスに効率的に適応することはできない。そこで我々は,mtcのトラフィックパターンを予測し,それに応じてwusを構成するための,単純かつ効率的なニューラルネットワークを設計した。提案する予測wus (fwus) は,アイドル状態での頻繁なページ監視を回避し,mtdの睡眠時間を延ばすことのできる,精度の高いlong-short term memory (lstm) ベースのトラヒック予測を活用している。シミュレーションの結果,提案手法の有効性が示された。交通予測誤差は4%未満で、誤報と誤検出確率はそれぞれ8.8%と1.3%未満である。エネルギー消費削減の観点からは、fwusは最高ベンチマークメカニズムを最大32%で上回ることができる。最後に、FWuSが交通密度の変化に動的に適応し、低消費電力のMTCスケーラビリティを促進する能力を証明する。 Reducing energy consumption is a pressing issue in low-power machine-type communication (MTC) networks. In this regard, the Wake-up Signal (WuS) technology, which aims to minimize the energy consumed by the radio interface of the machine-type devices (MTDs), stands as a promising solution. However, state-of-the-art WuS mechanisms use static operational parameters, so they cannot efficiently adapt to the system dynamics. To overcome this, we design a simple but efficient neural network to predict MTC traffic patterns and configure WuS accordingly. Our proposed forecasting WuS (FWuS) leverages an accurate long-short term memory (LSTM)-based traffic prediction that allows extending the sleep time of MTDs by avoiding frequent page monitoring occasions in idle state. Simulation results show the effectiveness of our approach. The traffic prediction errors are shown to be below 4%, with false alarm and miss-detection probabilities below 8.8% and 1.3%, respectively. In terms of energy consumption reduction, FWuS can outperform the best benchmark mechanism by up to 32%. 
Finally, we certify the ability of FWuS to dynamically adapt to traffic density changes, promoting low-power MTC scalability.	翻訳日:2022-06-14 17:53:56 公開日:2022-06-13
# EGRU:アクティビティスパース推論と学習のためのイベントベースGRU EGRU: Event-based GRU for activity-sparse inference and learning ( http://arxiv.org/abs/2206.06178v1 ) ライセンス: Link先を確認	Anand Subramoney, Khaleelulla Khan Nazeer, Mark Sch\"one, Christian Mayr, David Kappel	(参考訳) 繰り返しニューラルネットワーク(RNN)のスケーラビリティは、前のタイムステップの出力に対する各タイムステップの計算の逐次的依存によって妨げられる。したがって、RNNの高速化とスケールアップの1つの方法は、モデルのサイズやタスクに依存しない各ステップで必要とされる計算を減らすことである。本稿では、イベントベースGRU(Event-based GRU)と呼ばれるイベントベースアクティビティスパースモデルとしてGRU(Gated Recurrent Units)を再構成し、他のユニットからの入力イベント(イベントベース)の受信時にのみ更新を演算するモデルを提案する。アクティブな単位のごく一部しか持たない(アクティビティスパース)と組み合わせると、このモデルは現在のRNNよりもはるかに効率的な計算能力を持つ。特に,本モデルでは,勾配降下時のスパースパラメータの更新も行い,この計算効率をトレーニングフェーズに拡張する。 EGRUは,言語モデリングを含む実世界のタスクにおける最先端の繰り返しネットワークモデルと比較して,高い活動空間を推論や訓練中に自然に維持し,競争力を発揮することを示す。これは、新しいニューロモルフィックなハードウェアに適した、スケーラブルでより適した次世代のリカレントネットワークの舞台となる。 The scalability of recurrent neural networks (RNNs) is hindered by the sequential dependence of each time step's computation on the previous time step's output. Therefore, one way to speed up and scale RNNs is to reduce the computation required at each time step independent of model size and task. In this paper, we propose a model that reformulates Gated Recurrent Units (GRU) as an event-based activity-sparse model that we call the Event-based GRU (EGRU), where units compute updates only on receipt of input events (event-based) from other units. When combined with having only a small fraction of the units active at a time (activity-sparse), this model has the potential to be vastly more compute efficient than current RNNs. Notably, activity-sparsity in our model also translates into sparse parameter updates during gradient descent, extending this compute efficiency to the training phase. We show that the EGRU demonstrates competitive performance compared to state-of-the-art recurrent network models in real-world tasks, including language modeling while maintaining high activity sparsity naturally during inference and training. 
This sets the stage for the next generation of recurrent networks that are scalable and more suitable for novel neuromorphic hardware.	翻訳日:2022-06-14 17:52:42 公開日:2022-06-13
# 医療におけるaiベースのデータ準備とデータ分析:糖尿病の事例 AI-based Data Preparation and Data Analytics in Healthcare: The Case of Diabetes ( http://arxiv.org/abs/2206.06182v1 ) ライセンス: Link先を確認	Marianna Maranghi, Aris Anagnostopoulos, Irene Cannistraci, Ioannis Chatzigiannakis, Federico Croce, Giulia Di Teodoro, Michele Gentile, Giorgio Grani, Maurizio Lenzerini, Stefano Leonardi, Andrea Mastropietro, Laura Palagi, Massimiliano Pappa, Riccardo Rosati, Riccardo Valentini, Paola Velardi	(参考訳) Associazione Medici Diabetologi (AMD)は、AMDデータベースとしても知られる、世界最大規模の糖尿病患者の記録を収集し管理している。本稿では,これらの重要かつ価値のあるデータセットを概念化し,クリーニングし,分析するための人工知能と機械学習技術の活用に焦点をあてた,現在進行中のプロジェクトの初期成果について述べる。 The Associazione Medici Diabetologi (AMD) collects and manages one of the largest worldwide-available collections of diabetic patient records, also known as the AMD database. This paper presents the initial results of an ongoing project whose focus is the application of Artificial Intelligence and Machine Learning techniques for conceptualizing, cleaning, and analyzing such an important and valuable dataset, with the goal of providing predictive insights to better support diabetologists in their diagnostic and therapeutic choices.	翻訳日:2022-06-14 17:52:19 公開日:2022-06-13
# 機械学習に基づくWindowsマルウェア検出手法の評価におけるデータセットサイズとクラス不均衡の影響について On the impact of dataset size and class imbalance in evaluating machine-learning-based windows malware detection techniques ( http://arxiv.org/abs/2206.06256v1 ) ライセンス: Link先を確認	David Illes	(参考訳) このプロジェクトの目的は、Microsoft Windowsのマルウェアに焦点を当てた結果の互換性と実際の適用性に関するデータを収集し分析することであり、具体的には、データセットのサイズとデータセットの不均衡が測定された検出性能に与える影響である。一部の研究者は、より小さなデータセットを使用しており、データセットのサイズがパフォーマンスに大きな影響を与える場合、公開結果の比較が困難になる。研究者はまた、バランスの取れたデータセットと精度をテストの指標として使う傾向がある。前者は現実の真の表現ではなく、良性サンプルはマルウェアの数を大幅に上回っており、後者は不均衡な問題に対する問題であることが知られている。このプロジェクトは、データセットのサイズが測定された検出器のパフォーマンスと相関しているかどうかを、公開結果の有意義な比較を妨げる範囲まで理解し、公開研究で報告された優れたパフォーマンスが実際のデプロイシナリオでうまく機能することを期待できるかどうかを理解するという、2つの重要な目標を特定した。本研究の結果は, データセットサイズが測定値と相関し, 結果の有意な比較を防止し, かつ, 結果の結論に対するトレーニングセットサイズ精度曲線の性質を理解せずに, 精度スコアにのみ基づくアプローチを行なわなければならないことを示唆した。結果は、高い精度スコアは必ずしも現実世界のパフォーマンスに変換されないことを示唆した。 The purpose of this project was to collect and analyse data about the comparability and real-life applicability of published results focusing on Microsoft Windows malware, more specifically the impact of dataset size and testing dataset imbalance on measured detector performance. Some researchers use smaller datasets, and if dataset size has a significant impact on performance, that makes comparison of the published results difficult. Researchers also tend to use balanced datasets and accuracy as a metric for testing. The former is not a true representation of reality, where benign samples significantly outnumber malware, and the latter approach is known to be problematic for imbalanced problems. The project identified two key objectives, to understand if dataset size correlates to measured detector performance to an extent that prevents meaningful comparison of published results, and to understand if good performance reported in published research can be expected to perform well in a real-world deployment scenario. 
The research's results suggested that dataset size does correlate with measured detector performance to an extent that prevents meaningful comparison of published results, and that, without understanding the nature of the training set size-accuracy curve for published results, conclusions about which approach is "better" shouldn't be made solely based on accuracy scores. Results also suggested that high accuracy scores don't necessarily translate to high real-world performance.	翻訳日:2022-06-14 17:52:10 公開日:2022-06-13
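The abstract's closing point, that high accuracy need not translate to real-world performance when benign samples far outnumber malware, follows from simple arithmetic. The sketch below is not taken from the paper; the detection rates and the 1% prevalence are hypothetical numbers chosen only to illustrate the effect.

```python
# Minimal sketch: accuracy stays flat while precision collapses as malware
# becomes rare. The rates used in the example are hypothetical.

def accuracy(tpr, fpr, prevalence):
    """Fraction of all samples classified correctly.

    tpr: true-positive rate on malware, fpr: false-positive rate on benign
    samples, prevalence: fraction of samples that are malware.
    """
    return tpr * prevalence + (1.0 - fpr) * (1.0 - prevalence)


def precision(tpr, fpr, prevalence):
    """Fraction of raised alerts that are actually malware."""
    true_alerts = tpr * prevalence
    false_alerts = fpr * (1.0 - prevalence)
    return true_alerts / (true_alerts + false_alerts)


# A hypothetical detector with 95% detection and 5% false-positive rate.
balanced = (accuracy(0.95, 0.05, 0.5), precision(0.95, 0.05, 0.5))
realistic = (accuracy(0.95, 0.05, 0.01), precision(0.95, 0.05, 0.01))
```

With these assumed rates, accuracy is 0.95 on both the balanced and the 1%-malware test set, while precision falls from 0.95 to roughly 0.16, so most alerts in the imbalanced setting would be false alarms.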
# Annular Computational Imaging:単純なレンズによるパノラマ画像の鮮明化 Annular Computational Imaging: Capture Clear Panoramic Images through Simple Lens ( http://arxiv.org/abs/2206.06070v1 ) ライセンス: Link先を確認	Qi Jiang, Hao Shi, Lei Sun, Shaohua Gao, Kailun Yang, Kaiwei Wang	(参考訳) panoramic annular lens (pal) はレンズ数が少ないが、小型で視野が大きいため、モバイルやウェアラブル端末のセンシングタスクをパノラマで囲む大きな可能性を秘めている。しかし,小容量PALの画質は収差補正用レンズの欠如により光学限界に制限される。本稿では,軽量PAL設計の光学的限界を破るAnnular Computational Imaging (ACI)フレームワークを提案する。学習に基づく画像復元を容易にするため,パノラマ画像のための波動シミュレーションパイプラインを導入し,複数のデータ分布を通して合成と現実のギャップに対処する。提案したパイプラインは設計パラメータを持つ任意のPALに容易に適応でき、耐ゆるい設計に適している。さらに,パノラマ画像と物理インフォームド学習の物理的先行性を考慮した物理インフォームド画像復元ネットワーク(PI2RNet)を設計する。データセットレベルでは、DIVPanoデータセットを作成し、その上で広範囲な実験を行い、提案したネットワークが空間的に変化する劣化下でのパノラマ画像復元における技術の新しい状態を設定することを示す。さらに,3球面レンズのみを用いた簡易PALによるACIの評価により,高画質パノラマ画像とコンパクトデザインとの微妙なバランスが明らかとなった。私たちの知る限りでは、計算イメージング(CI)をPALで最初に探求した人物です。コードとデータセットはhttps://github.com/zju-jiangqi/ACI-PI2RNetで公開されている。 Panoramic Annular Lens (PAL), composed of few lenses, has great potential in panoramic surrounding sensing tasks for mobile and wearable devices because of its tiny size and large Field of View (FoV). However, the image quality of tiny-volume PAL confines to optical limit due to the lack of lenses for aberration correction. In this paper, we propose an Annular Computational Imaging (ACI) framework to break the optical limit of light-weight PAL design. To facilitate learning-based image restoration, we introduce a wave-based simulation pipeline for panoramic imaging and tackle the synthetic-to-real gap through multiple data distributions. The proposed pipeline can be easily adapted to any PAL with design parameters and is suitable for loose-tolerance designs. Furthermore, we design the Physics Informed Image Restoration Network (PI2RNet), considering the physical priors of panoramic imaging and physics-informed learning. 
At the dataset level, we create the DIVPano dataset and the extensive experiments on it illustrate that our proposed network sets the new state of the art in the panoramic image restoration under spatially-variant degradation. In addition, the evaluation of the proposed ACI on a simple PAL with only 3 spherical lenses reveals the delicate balance between high-quality panoramic imaging and compact design. To the best of our knowledge, we are the first to explore Computational Imaging (CI) in PAL. Code and datasets will be made publicly available at https://github.com/zju-jiangqi/ACI-PI2RNet.	翻訳日:2022-06-14 17:51:26 公開日:2022-06-13
# (参考訳) ロボットマニピュレーションタスクの強化学習におけるSim2Real転送のランダム化効果の解析 Analysis of Randomization Effects on Sim2Real Transfer in Reinforcement Learning for Robotic Manipulation Tasks ( http://arxiv.org/abs/2206.06282v1 ) ライセンス: CC BY 4.0	Josip Josifovski, Mohammadhossein Malmir, Noah Klarmann, Bare Luka \v{Z}agar, Nicol\'as Navarro-Guerrero and Alois Knoll	(参考訳) ランダム化は現在、ロボット工学におけるデータ駆動学習アルゴリズムのSim2Real転送において広く使われているアプローチである。それでもほとんどのsim2real研究は、特定のランダム化手法と高度にカスタマイズされたロボットシステムの結果を報告しており、異なるランダム化アプローチを体系的に評価することは困難である。この問題に対処するために、ロボットリーチ・アンド・バランスマニピュレータタスクの再現容易な実験セットアップを定義し、比較のためのベンチマークとして機能する。 4つのランダム化戦略と3つのランダム化パラメータをシミュレーションと実ロボットで比較する。その結果,よりランダム化がsim2実数転送の助けとなるが,シミュレーションにおける適切なポリシーを見つけるアルゴリズムの能力を損なう可能性があることがわかった。完全にランダム化されたシミュレーションと微調整は、異なる結果を示し、テストされた他のアプローチよりも実際のロボットに翻訳する。 Randomization is currently a widely used approach in Sim2Real transfer for data-driven learning algorithms in robotics. Still, most Sim2Real studies report results for a specific randomization technique and often on a highly customized robotic system, making it difficult to evaluate different randomization approaches systematically. To address this problem, we define an easy-to-reproduce experimental setup for a robotic reach-and-balance manipulator task, which can serve as a benchmark for comparison. We compare four randomization strategies with three randomized parameters both in simulation and on a real robot. Our results show that more randomization helps in Sim2Real transfer, yet it can also harm the ability of the algorithm to find a good policy in simulation. Fully randomized simulations and fine-tuning show differentiated results and translate better to the real robot than the other approaches tested.	翻訳日:2022-06-14 17:48:50 公開日:2022-06-13
# AI研究のためのX-Risk解析 X-Risk Analysis for AI Research ( http://arxiv.org/abs/2206.05862v1 ) ライセンス: Link先を確認	Dan Hendrycks, Mantas Mazeika	(参考訳) 人工知能(AI)は、社会を大幅に改善する可能性があるが、強力なテクノロジーと同様に、リスクと責任が高められる。現在のAI研究は、投機的長期リスクを含むAIシステムから長期リスクを管理する方法に関する体系的な議論を欠いている。 AIが人類の長期的な可能性を改善するのに不可欠である可能性を念頭に置いて、よりインテリジェントで強力なAIシステムを構築することが、最終的には私たちよりも強力なシステムをもたらすのではないかという懸念がある。これらの議論の正確さと基礎化のため,我々は,大規模プロセスをより安全な方向に進めるように設計されたハザード分析とシステム安全性の観点から,時間的にテストされた概念の集合について検討する。次に、AI研究者がAIシステムの安全性に長期的な影響を現実的に与える方法について論じる。最後に、安全性と一般的な能力のバランスに影響を与えるプロセスを堅牢に形成する方法について論じる。 Artificial intelligence (AI) has the potential to greatly improve society, but as with any powerful technology, it comes with heightened risks and responsibilities. Current AI research lacks a systematic discussion of how to manage long-tail risks from AI systems, including speculative long-term risks. Keeping in mind that AI may be integral to improving humanity's long-term potential, there is some concern that building ever more intelligent and powerful AI systems could eventually result in systems that are more powerful than us; some say this is like playing with fire and speculate that this could create existential risks (x-risks). To add precision and ground these discussions, we review a collection of time-tested concepts from hazard analysis and systems safety, which have been designed to steer large processes in safer directions. We then discuss how AI researchers can realistically have long-term impacts on the safety of AI systems. Finally, we discuss how to robustly shape the processes that will affect the balance between safety and general capabilities.	翻訳日:2022-06-14 17:07:27 公開日:2022-06-13
# 干渉認識を伴うオンラインサービスシステムにおける因果推論に基づくルート原因解析 Causal Inference-Based Root Cause Analysis for Online Service Systems with Intervention Recognition ( http://arxiv.org/abs/2206.05871v1 ) ライセンス: Link先を確認	Mingjie Li, Zeyan Li, Kanglin Yin, Xiaohui Nie, Wenchi Zhang, Kaixin Sui, Dan Pei	(参考訳) 断層診断は多くの領域で重要であり、断層は安全上の脅威や経済的な損失につながる可能性がある。オンラインサービスシステムの分野では、オペレータは障害の検出と軽減のために巨大な監視データに依存している。根本原因指標の小さなセットを迅速に認識することで、障害の軽減に多くの時間を費やすことができる。本稿では,介入認識という新たな因果推論タスクとして根本原因分析問題を定式化する。そこで我々はCausal Inference-based Root Cause Analysis (CIRCA)という,教師なし因果推論に基づく新しい手法を提案した。核となる考え方は、監視変数が根本原因指標となるのに十分な条件、すなわち因果ベイズネットワーク(cbn)の親に条件づけられた確率分布の変化である。オンラインサービスシステムのアプリケーションに向けて、circaはシステムアーキテクチャの知識と因果的仮定に基づいてメトリクスを監視するグラフを構築します。シミュレーション研究は、CIRCAの理論的信頼性を示す。実世界のデータセットのパフォーマンスは、circaが最高のベースラインメソッドよりも、トップ1レコメンデーションのリコールを25%改善できることを示している。 Fault diagnosis is critical in many domains, as faults may lead to safety threats or economic losses. In the field of online service systems, operators rely on enormous monitoring data to detect and mitigate failures. Quickly recognizing a small set of root cause indicators for the underlying fault can save much time for failure mitigation. In this paper, we formulate the root cause analysis problem as a new causal inference task named intervention recognition. We proposed a novel unsupervised causal inference-based method named Causal Inference-based Root Cause Analysis (CIRCA). The core idea is a sufficient condition for a monitoring variable to be a root cause indicator, i.e., the change of probability distribution conditioned on the parents in the Causal Bayesian Network (CBN). Towards the application in online service systems, CIRCA constructs a graph among monitoring metrics based on the knowledge of system architecture and a set of causal assumptions. The simulation study illustrates the theoretical reliability of CIRCA. 
The performance on a real-world dataset further shows that CIRCA can improve the recall of the top-1 recommendation by 25% over the best baseline method.	翻訳日:2022-06-14 17:07:09 公開日:2022-06-13
# スパースグループブースティング --偏りのないグループと変数の選択 Sparse-group boosting -- Unbiased group and variable selection ( http://arxiv.org/abs/2206.06344v1 ) ライセンス: Link先を確認	Fabian Obster, Christian Heumann	(参考訳) グループ化共変量体の存在下では,グループ内およびグループ間の間隔を強制できる強化のためのフレームワークを提案する。自由度を調整した成分方向および群方向勾配ブーストを同時に使用することにより、スパース群lassoと同様の特性を有するモデルをブーストにより装着することができる。群内および群間間隔を混合パラメータで制御できることを示し, スパース群ラッソにおける混合パラメータとの類似性と相違について考察した。シミュレーション,遺伝子データおよび農業データを用いて,この推定装置の有効性と予測的競争性を示す。データとシミュレーションは、群化変数が存在する場合、スパースグループブースティングの使用は、偏りの少ない変数選択と、コンポーネントワイズブースティングよりも高い予測可能性に関連していることを示唆している。さらに,自由度を通じて成分的に促進するバイアスを低減する方法を提案する。 In the presence of grouped covariates, we propose a framework for boosting that allows to enforce sparsity within and between groups. By using component-wise and group-wise gradient boosting at the same time with adjusted degrees of freedom, a model with similar properties as the sparse group lasso can be fitted through boosting. We show that within-group and between-group sparsity can be controlled by a mixing parameter and discuss similarities and differences to the mixing parameter in the sparse group lasso. With simulations, gene data as well as agricultural data we show the effectiveness and predictive competitiveness of this estimator. The data and simulations suggest, that in the presence of grouped variables the use of sparse group boosting is associated with less biased variable selection and higher predictability compared to component-wise boosting. Additionally, we propose a way of reducing bias in component-wise boosting through the degrees of freedom.	翻訳日:2022-06-14 17:04:21 公開日:2022-06-13
# マルチコアCPUにおける高スループット・低レイテンシソフトウェア定義無線用DSEL A DSEL for High Throughput and Low Latency Software-Defined Radio on Multicore CPUs ( http://arxiv.org/abs/2206.06147v1 ) ライセンス: Link先を確認	Adrien Cassagne (ALSOC, SU), Romain Tajan (IMS, Bordeaux INP), Olivier Aumage (STORM), Camille Leroux (IMS, Bordeaux INP), Denis Barthou (STORM, Bordeaux INP), Christophe J\'ego (IMS, Bordeaux INP)	(参考訳) 本稿では、Software-Defined Radio(SDR)専用の新しいDomain Specific Embedded Language(DSEL)について述べる。注意深く設計されたコンポーネントセットから、効率的なソフトウェアデジタル通信システムを構築することができ、プログラマにとって簡単で安全な方法で、現代のプロセッサアーキテクチャの並列性を活用することができる。特に,提案するDSELは,パイプライニングとシーケンス複製を併用して,ディジタル通信システムから時間的および空間的並列性を抽出することができる。 DSELの機能は、ソフトウェアで完全に設計された広く使われているDVB-S2標準のための完全なデジタルトランスシーバーである。評価により,提案したソフトウェアDVB-S2トランシーバが,最新のハイエンドマルチコアCPUターゲットを最大限に活用できることを示す。 This article presents a new Domain Specific Embedded Language (DSEL) dedicated to Software-Defined Radio (SDR). From a set of carefully designed components, it enables to build efficient software digital communication systems, able to take advantage of the parallelism of modern processor architectures, in a straightforward and safe manner for the programmer. In particular, proposed DSEL enables the combination of pipelining and sequence duplication techniques to extract both temporal and spatial parallelism from digital communication systems. We leverage the DSEL capabilities on a real use case: a fully digital transceiver for the widely used DVB-S2 standard designed entirely in software. Through evaluation, we show how proposed software DVB-S2 transceiver is able to get the most from modern, high-end multicore CPU targets.	翻訳日:2022-06-14 17:04:03 公開日:2022-06-13
# SwitchboardベンチマークでOracleのワードエラー率をゼロに Toward Zero Oracle Word Error Rate on the Switchboard Benchmark ( http://arxiv.org/abs/2206.06192v1 ) ライセンス: Link先を確認	Arlo Faria, Adam Janin, Korbinian Riedhammer, Sidhi Adkoli	(参考訳) 「スイッチボードベンチマーク」は自動音声認識(ASR)研究において非常によく知られたテストセットであり、人間レベルの転写精度を主張するシステムの記録設定性能を確立する。この研究は、この評価のあまり知られていない実践的考察を強調し、参照文字の修正と公式評価手法からの逸脱による単語誤り率(WER)の大幅な改善を示す。このより詳細に再現可能なスキームでは、商用のASRシステムでさえ5% WER未満のスコアが得られ、研究システムの確立された記録は2.3%に低下する。書き起こし精度の別の指標が提案されており、削除を罰せず、人間と機械の性能をより区別しているように見える。商用のASRシステムは、まだこの閾値を下回っているが、商用の人間の音声認識の精度を明らかに上回っている。この研究は、oracle werの計算に標準化されたスコアリングツールを使うことも検討している。フレーズの代替表現は、発話レベルのN-bestリストや単語レベルのデータ構造と比較される。 The "Switchboard benchmark" is a very well-known test set in automatic speech recognition (ASR) research, establishing record-setting performance for systems that claim human-level transcription accuracy. This work highlights lesser-known practical considerations of this evaluation, demonstrating major improvements in word error rate (WER) by correcting the reference transcriptions and deviating from the official scoring methodology. In this more detailed and reproducible scheme, even commercial ASR systems can score below 5% WER and the established record for a research system is lowered to 2.3%. An alternative metric of transcript precision is proposed, which does not penalize deletions and appears to be more discriminating for human vs. machine performance. While commercial ASR systems are still below this threshold, a research system is shown to clearly surpass the accuracy of commercial human speech recognition. This work also explores using standardized scoring tools to compute oracle WER by selecting the best among a list of alternatives. A phrase alternatives representation is compared to utterance-level N-best lists and word-level data structures; using dense lattices and adding out-of-vocabulary words, this achieves an oracle WER of 0.18%.	翻訳日:2022-06-14 17:03:51 publication日:2022-06-13
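The notion of oracle WER mentioned in the abstract above (scoring the best hypothesis among a list of alternatives) can be sketched for the simplest case of utterance-level N-best lists. This is an illustrative sketch only: it does not reproduce the paper's phrase-alternatives representation, lattices, or the standardized scoring tools, and the function names and sample sentences are hypothetical.

```python
# Minimal sketch of oracle WER over utterance-level N-best lists.
# Function names and example data are hypothetical.

def word_errors(reference, hypothesis):
    """Word-level Levenshtein distance (substitutions + insertions + deletions)."""
    ref, hyp = reference.split(), hypothesis.split()
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            current.append(min(
                previous[j] + 1,                          # deletion
                current[j - 1] + 1,                       # insertion
                previous[j - 1] + (ref_word != hyp_word)  # match or substitution
            ))
        previous = current
    return previous[-1]


def oracle_wer(references, nbest_lists):
    """For each utterance, pick the alternative with the fewest errors,
    then divide total errors by total reference words."""
    errors = sum(
        min(word_errors(ref, hyp) for hyp in nbest)
        for ref, nbest in zip(references, nbest_lists)
    )
    total_words = sum(len(ref.split()) for ref in references)
    return errors / total_words


references = ["the cat sat", "hello world"]
nbest_lists = [["the cat sad", "the cat sat"], ["hello word", "yellow world"]]
one_best_wer = oracle_wer(references, [[nbest[0]] for nbest in nbest_lists])
best_case_wer = oracle_wer(references, nbest_lists)
```

In this toy data the first-ranked hypotheses contain two errors over five reference words, while the best available alternatives contain one, so the oracle figure is a lower bound on what rescoring the same alternatives could achieve.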
# 標準認知症スクリーニングテストの自動化評価 Automated Evaluation of Standardized Dementia Screening Tests ( http://arxiv.org/abs/2206.06208v1 ) ライセンス: Link先を確認	Franziska Braun, Markus F\"orstel, Bastian Oppermann, Andreas Erzigkeit, Thomas Hillemacher, Hartmut Lehfeld, Korbinian Riedhammer	(参考訳) 認知症スクリーニングとモニタリングのためには、様々な認知タスクのパフォーマンスを測定することで主観性を最小化することを目的としており、標準化されたテストが臨床ルーチンにおいて重要な役割を果たす。本稿では,SKTとCERAD-NBの2つの標準化された神経心理学的テストに続き,半標準化された歴史からなる研究について報告する。テストには、名前オブジェクトや単語リストの学習といった基本的なタスクだけでなく、MMSEのような広く使われているツールも含まれている。ほとんどのタスクは音声で実行されるので、書き起こしに基づく自動スコアリングに適している。第1回では,手作業による手作業による評価と手作業による自動評価の相関について検討した。 sktとcerad-nbの両方において,手書きの書き起こしを用いて,高い相関度から完全相関度を観測し,相関度の低いタスクでは,音声に制限されるため,自動スコアリングは人間の基準よりも厳格である。自動転写を用いると、相関は期待通りに低下し、認識精度に関係するが、高い相関は最大0.98(SKT)と0.85(CERAD-NB)である。単語の代替は認識誤りの軽減に役立ち、専門家のスコアとの相関性が向上することを示す。 For dementia screening and monitoring, standardized tests play a key role in clinical routine since they aim at minimizing subjectivity by measuring performance on a variety of cognitive tasks. In this paper, we report on a study that consists of a semi-standardized history taking followed by two standardized neuropsychological tests, namely the SKT and the CERAD-NB. The tests include basic tasks such as naming objects, learning word lists, but also widely used tools such as the MMSE. Most of the tasks are performed verbally and should thus be suitable for automated scoring based on transcripts. For the first batch of 30 patients, we analyze the correlation between expert manual evaluations and automatic evaluations based on manual and automatic transcriptions. For both SKT and CERAD-NB, we observe high to perfect correlations using manual transcripts; for certain tasks with lower correlation, the automatic scoring is stricter than the human reference since it is limited to the audio. Using automatic transcriptions, correlations drop as expected and are related to recognition accuracy; however, we still observe high correlations of up to 0.98 (SKT) and 0.85 (CERAD-NB). 
We show that using word alternatives helps to mitigate recognition errors and subsequently improves correlation with expert scores.	翻訳日:2022-06-14 17:03:29 公開日:2022-06-13
# (参考訳) Federated Bayesian Neural Regression: A Scalable Global Federated Gaussian Process Federated Bayesian Neural Regression: A Scalable Global Federated Gaussian Process ( http://arxiv.org/abs/2206.06357v1 ) ライセンス: CC BY 4.0	Haolin Yu, Kaiyang Guo, Mahdi Karami, Xi Chen, Guojun Zhang, Pascal Poupart	(参考訳) フェデレートラーニング(FL)フレームワークが適用される典型的なシナリオでは、クライアントが正確なモデルを作成するのに十分なトレーニングデータを持つことが一般的です。したがって、点推定だけでなく、信頼の概念も提供するモデルは有益である。ガウス過程(英: Gaussian Process, GP)は、自然に校正された分散推定を伴う強力なベイズモデルである。しかし、ローカルカーネルのマージがプライバシー漏洩につながるため、スタンドアローンのグローバルGPを学ぶのは難しい。プライバシーを守るために、フェデレーションgpsを検討する以前の研究は、ローカルモデルのパーソナライズされた設定や学習に集中することで、グローバルモデルを学ぶことを避ける。我々は,クライアントのプライバシを尊重するスケーラブルなスタンドアロングローバルフェデレーションGPを学習するアルゴリズムであるFederated Bayesian Neural Regression (FedBNR)を提案する。統一的なランダムカーネルを定義することで、拡張性のために深いカーネル学習とランダム機能を取り込んでいます。このランダムカーネルは、静止カーネルと多くの非定常カーネルを復元可能であることを示す。そして、すべてのクライアントデータが集中しているかのように、グローバルな予測モデルを学ぶ原則に基づくアプローチを導出します。また,グローバルカーネルを非同一かつ独立に分散したクライアントに対して,知識蒸留法を用いて学習する。実世界の回帰データセットを用いて実験を行い、他のGPモデルと比較して統計的に有意な改善を示した。 In typical scenarios where the Federated Learning (FL) framework applies, it is common for clients to have insufficient training data to produce an accurate model. Thus, models that provide not only point estimations, but also some notion of confidence are beneficial. Gaussian Process (GP) is a powerful Bayesian model that comes with naturally well-calibrated variance estimations. However, it is challenging to learn a stand-alone global GP since merging local kernels leads to privacy leakage. To preserve privacy, previous works that consider federated GPs avoid learning a global model by focusing on the personalized setting or learning an ensemble of local models. We present Federated Bayesian Neural Regression (FedBNR), an algorithm that learns a scalable stand-alone global federated GP that respects clients' privacy. We incorporate deep kernel learning and random features for scalability by defining a unifying random kernel. 
We show this random kernel can recover any stationary kernel and many non-stationary kernels. We then derive a principled approach of learning a global predictive model as if all client data is centralized. We also learn global kernels with knowledge distillation methods for non-identically and independently distributed (non-i.i.d.) clients. Experiments are conducted on real-world regression datasets and show statistically significant improvements compared to other federated GP models.	翻訳日:2022-06-14 16:57:08 公開日:2022-06-13
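The abstract above mentions random features as the route to scalability. As a rough illustration only, the sketch below shows classic random Fourier features approximating an RBF kernel. This is the textbook construction, not the unifying random kernel defined in the FedBNR paper, and every name in it (`make_random_features`, `feature_map`, `approx_kernel`, the `lengthscale` and `seed` parameters) is hypothetical.

```python
import math
import random

def rbf_kernel(x, y, lengthscale=1.0):
    """Exact RBF kernel, used here only as a reference value."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * lengthscale ** 2))

def make_random_features(dim, num_features, lengthscale=1.0, seed=0):
    """Draw Gaussian frequencies and uniform phases for the feature map."""
    rng = random.Random(seed)
    freqs = [[rng.gauss(0.0, 1.0 / lengthscale) for _ in range(dim)]
             for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    return freqs, phases

def feature_map(x, freqs, phases):
    """Map x to z(x) = sqrt(2/D) * cos(w . x + b) for each (w, b) pair."""
    scale = math.sqrt(2.0 / len(freqs))
    return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(freqs, phases)]

def approx_kernel(x, y, freqs, phases):
    """Inner product of feature maps; approximates rbf_kernel(x, y)."""
    zx = feature_map(x, freqs, phases)
    zy = feature_map(y, freqs, phases)
    return sum(a * b for a, b in zip(zx, zy))
```

With a fixed-size feature map, a GP-style regressor reduces to Bayesian linear regression on the features, which is the usual reason random features help with scalability; how FedBNR combines this with deep kernel learning and privacy is described in the paper itself.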
# 連続k-Nearest Neighboursグラフを用いた局所距離保存オートエンコーダ Local distance preserving auto-encoders using Continuous k-Nearest Neighbours graphs ( http://arxiv.org/abs/2206.05909v1 ) ライセンス: Link先を確認	Nutan Chen, Patrick van der Smagt, Botond Cseke	(参考訳) データの類似性を保存する自動エンコーダモデルは、表現学習において一般的なツールである。本稿では,データ空間から潜在空間へのマッピング時に局所距離を保持する自動エンコーダモデルをいくつか紹介する。我々は、任意のスケールで位相的特徴を同時に捉えることが知られている連続k-アネレスグラフに基づく局所的距離保存損失を用いる。学習性能を向上させるために,局所的距離保存を主目的とし,復元精度を制約として学習を制約最適化問題として定式化する。このアプローチを階層的変分オートエンコーダに一般化し,幾何学的一貫性のある潜在性とデータ空間を持つ生成モデルを学ぶ。提案手法は,複数の標準データセットと評価指標にまたがって最先端のパフォーマンスを提供する。 Auto-encoder models that preserve similarities in the data are a popular tool in representation learning. In this paper we introduce several auto-encoder models that preserve local distances when mapping from the data space to the latent space. We use a local distance preserving loss that is based on the continuous k-nearest neighbours graph which is known to capture topological features at all scales simultaneously. To improve training performance, we formulate learning as a constraint optimisation problem with local distance preservation as the main objective and reconstruction accuracy as a constraint. We generalise this approach to hierarchical variational auto-encoders thus learning generative models with geometrically consistent latent and data spaces. Our method provides state-of-the-art performance across several standard datasets and evaluation metrics.	翻訳日:2022-06-14 16:31:32 公開日:2022-06-13
# 階層構造を持つプライベート合成データ Private Synthetic Data with Hierarchical Structure ( http://arxiv.org/abs/2206.05942v1 ) ライセンス: Link先を確認	Terrance Liu, Zhiwei Steven Wu	(参考訳) 本研究では、個人データポイントがグループ化される階層的データセット(例えば、家庭内の人々)に対する差分プライベートな合成データ生成の問題について検討する。特に、合成データセットと基礎となるプライベートデータセットの類似性を測定するために、プライベートクエリリリースの問題の下で目的を定め、クエリの集合(平均集計数のような統計)の回答を保存する合成データセットを生成します。しかし、クエリリリース問題へのプライベートな合成データの適用はよく研究されているが、そのような研究は階層的でないデータドメインに限定されており、最初の疑問を提起している。さらに、これらの統計を捉えながら、グループレベルでも個人レベルでも合成データを生成する方法はまだ確立されていない。これらの課題を踏まえて、我々はまず階層的なクエリリリースの問題を定式化し、そこでは階層的なデータセットの統計収集を目標としています。具体的には、グループと個人レベルの属性間の関係をキャプチャする統計クエリの一般的なセットを提供する。次に,階層的クエリリリースのためのプライベート合成データアルゴリズムを導入し,american community surveyとalegheny family screening toolデータから得られた階層的データセット上で評価する。最後に、アメリカン・コミュニティ・サーベイ(American Community Survey)に注目します。その本質的に階層構造は、実験を行う別のドメイン固有のクエリのセットを生み出します。 We study the problem of differentially private synthetic data generation for hierarchical datasets in which individual data points are grouped together (e.g., people within households). In particular, to measure the similarity between the synthetic dataset and the underlying private one, we frame our objective under the problem of private query release, generating a synthetic dataset that preserves answers for some collection of queries (i.e., statistics like mean aggregate counts). However, while the application of private synthetic data to the problem of query release has been well studied, such research is restricted to non-hierarchical data domains, raising the initial question -- what queries are important when considering data of this form? Moreover, it has not yet been established how one can generate synthetic data at both the group and individual-level while capturing such statistics. In light of these challenges, we first formalize the problem of hierarchical query release, in which the goal is to release a collection of statistics for some hierarchical dataset. 
Specifically, we provide a general set of statistical queries that captures relationships between attributes at both the group and individual-level. Subsequently, we introduce private synthetic data algorithms for hierarchical query release and evaluate them on hierarchical datasets derived from the American Community Survey and Allegheny Family Screening Tool data. Finally, we look to the American Community Survey, whose inherent hierarchical structure gives rise to another set of domain-specific queries that we run experiments with.	翻訳日:2022-06-14 16:29:02 公開日:2022-06-13
# 制約付き高次元ベイズ最適化:粉体重み付けへの応用 High-Dimensional Bayesian Optimization with Constraints: Application to Powder Weighing ( http://arxiv.org/abs/2206.05988v1 ) ライセンス: Link先を確認	Shoki Miyagawa, Atsuyoshi Yano, Naoko Sawada and Isamu Ogawa	(参考訳) ベイズ最適化はブラックボックス問題のパラメータを効果的に最適化する。しかし,この手法は限定試行において高次元パラメータには有効ではなかった。パラメータは非線形に低次元空間に埋め込むことで効率的に探索することができるが、制約は考慮できない。高次元ベイズ最適化において既知の等式と未知不等式制約の両方を考慮するため,非線形埋め込みに不等角表現学習を導入することでパラメータ分解を組み合わせることを提案した。提案手法を使用シナリオとしてパウダー重み付け作業に適用した。提案手法は,実験結果に基づいて制約を考慮し,手動パラメータチューニングと比較して約66%の試行回数削減に寄与する。 Bayesian optimization works effectively optimizing parameters in black-box problems. However, this method did not work for high-dimensional parameters in limited trials. Parameters can be efficiently explored by nonlinearly embedding them into a low-dimensional space; however, the constraints cannot be considered. We proposed combining parameter decomposition by introducing disentangled representation learning into nonlinear embedding to consider both known equality and unknown inequality constraints in high-dimensional Bayesian optimization. We applied the proposed method to a powder weighing task as a usage scenario. Based on the experimental results, the proposed method considers the constraints and contributes to reducing the number of trials by approximately 66% compared to manual parameter tuning.	翻訳日:2022-06-14 16:28:39 公開日:2022-06-13
# 非修正解析を用いた有向非巡回グラフにおける一般DNNの関数近似と安定性の解析 Analysis of function approximation and stability of general DNNs in directed acyclic graphs using un-rectifying analysis ( http://arxiv.org/abs/2206.05997v1 ) ライセンス: Link先を確認	Wen-Liang Hwang and Shih-Shuo Tung	(参考訳) ディープフィードフォワードニューラルネットワーク(DNN)に関する一般的な理解の欠如は、非線形関数の構成を分析するツールの欠如と、DNNアーキテクチャの多様性に適用可能な数学的モデルの欠如によるものである。本稿では,不整合法を用いて有向非巡回グラフ(DAG)を用いてDNNを解析するために,アクティベーション関数,非線形変換,DNNアーキテクチャに関するいくつかの基本的な仮定を行った。これらの仮定を満たすDNNは一般的なDNNと呼ばれる。分析グラフの構築は,dagをボトムアップから,規制規則に従って基本要素への原子操作の適用によって構築する公理的手法に基づく。このアプローチにより、数学的帰納法により一般DNNの特性を導出することができる。提案手法を用いることで、一般的なDNNに対して真である性質を導出できることを示す。この分析はネットワーク機能の理解を深め、グラフの分析ツールのホストを活用できれば、さらなる理論的洞察を促進することができる。 A general lack of understanding pertaining to deep feedforward neural networks (DNNs) can be attributed partly to a lack of tools with which to analyze the composition of non-linear functions, and partly to a lack of mathematical models applicable to the diversity of DNN architectures. In this paper, we made a number of basic assumptions pertaining to activation functions, non-linear transformations, and DNN architectures in order to use the un-rectifying method to analyze DNNs via directed acyclic graphs (DAGs). DNNs that satisfy these assumptions are referred to as general DNNs. Our construction of an analytic graph was based on an axiomatic method in which DAGs are built from the bottom-up through the application of atomic operations to basic elements in accordance with regulatory rules. This approach allows us to derive the properties of general DNNs via mathematical induction. We show that using the proposed approach, some properties hold true for general DNNs can be derived. This analysis advances our understanding of network functions and could promote further theoretical insights if the host of analytical tools for graphs can be leveraged.	翻訳日:2022-06-14 16:28:28 公開日:2022-06-13
# シャープネス・アウェア・ミニミゼーションの理解に向けて Towards Understanding Sharpness-Aware Minimization ( http://arxiv.org/abs/2206.06232v1 ) ライセンス: Link先を確認	Maksym Andriushchenko, Nicolas Flammarion	(参考訳) Sharpness-Aware Minimization (SAM) は、様々な設定における一般化を著しく改善する最悪ケースの重み摂動に依存する最近の訓練手法である。我々は、PAC-ベイズ一般化境界に基づくSAMの成功に対する既存の正当化と平坦なミニマへの収束の考えが不完全であると主張する。さらに、一般化に不可欠であることが示されている$m$-sharpnessをSAMで用いることの成功については、説明がない。 SAMのこの側面をよりよく理解するために、対角線形ネットワークに対する暗黙バイアスを理論的に分析する。 SAMは常にある種の問題に対して標準勾配降下よりも優れた一般化特性を持つ解を選択することを証明し、この効果は$m$-シャープネスを用いて増幅される。さらに,非線形ネットワーク上での暗黙バイアスの特性を実証的に研究し,SAMを用いた標準モデルの微調整が一般化の大幅な改善につながることを示した。最後に,確率勾配を用いた非凸目的に対するSAMの収束結果を示す。本稿では,これらの結果を深層ネットワークに実証的に説明し,SAMの一般化挙動との関係について論じる。実験のコードは https://github.com/tml-epfl/understanding-sam で公開されている。 Sharpness-Aware Minimization (SAM) is a recent training method that relies on worst-case weight perturbations which significantly improves generalization in various settings. We argue that the existing justifications for the success of SAM which are based on a PAC-Bayes generalization bound and the idea of convergence to flat minima are incomplete. Moreover, there are no explanations for the success of using $m$-sharpness in SAM which has been shown as essential for generalization. To better understand this aspect of SAM, we theoretically analyze its implicit bias for diagonal linear networks. We prove that SAM always chooses a solution that enjoys better generalization properties than standard gradient descent for a certain class of problems, and this effect is amplified by using $m$-sharpness. We further study the properties of the implicit bias on non-linear networks empirically, where we show that fine-tuning a standard model with SAM can lead to significant generalization improvements. Finally, we provide convergence results of SAM for non-convex objectives when used with stochastic gradients. We illustrate these results empirically for deep networks and discuss their relation to the generalization behavior of SAM. 
The code of our experiments is available at https://github.com/tml-epfl/understanding-sam.	翻訳日:2022-06-14 16:27:51 公開日:2022-06-13
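For readers unfamiliar with the worst-case weight perturbation that SAM relies on, a minimal sketch of one SAM update on a toy quadratic loss follows. It uses the commonly described two-step form (ascend to a perturbed point along the normalized gradient, then descend using the gradient taken there). The toy loss, the function names, and the values of `lr` and `rho` are assumptions chosen for illustration; this is not the code from the linked repository, and $m$-sharpness (computing the perturbation per mini-batch shard) is not shown.

```python
import math

def loss(w):
    """Toy loss L(w) = 0.5 * (w0^2 + 10 * w1^2)."""
    return 0.5 * (w[0] ** 2 + 10.0 * w[1] ** 2)

def loss_grad(w):
    """Gradient of the toy loss above."""
    return [w[0], 10.0 * w[1]]

def sam_step(w, lr=0.05, rho=0.05):
    """One SAM update: perturb along the normalized gradient, then descend."""
    g = loss_grad(w)
    norm = math.sqrt(sum(g_i * g_i for g_i in g)) + 1e-12
    # Approximate worst-case point inside a ball of radius rho around w.
    w_adv = [w_i + rho * g_i / norm for w_i, g_i in zip(w, g)]
    # Descend from the original weights using the gradient at the perturbed point.
    g_adv = loss_grad(w_adv)
    return [w_i - lr * g_i for w_i, g_i in zip(w, g_adv)]
```

Plain gradient descent would use `g` in the last line instead of `g_adv`; that one substitution is the whole difference in this simplified setting.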
# (参考訳) Proof Tree AutomataとProof Tree Graphsの紹介 Introducing Proof Tree Automata and Proof Tree Graphs ( http://arxiv.org/abs/2206.06294v1 ) ライセンス: CC BY 4.0	Valentin D. Richard	(参考訳) 構造証明理論では、大規模計算の設計と作業は、システム全体の一部として個々の規則について直観を得るのを難しくする。グラフ理論とオートマトン理論のアプローチを用いて,計算作業を支援する2つの新しいツールを提案する。第一のツールはProof Tree Automaton (PTA) であり、その言語が電卓の派生言語である木オートマトンである。第2のツールは、Proof Tree Graph (PTG) と呼ばれる計算のグラフィカル表現である。この有向ハイパーグラフでは、頂点は項(例えば列)の集合であり、ハイパーアークは規則である。 PTA と PTG の性質とそれらの相互関係について検討する。計算値から従来の木オートマトンへの部分写像としてPTAを分解できることが示される。我々はその文を精錬システム理論で定式化する。最後に、フレームワークをネットと文字列ダイアグラムの証明と比較します。 In structural proof theory, designing and working on large calculi make it difficult to get intuitions about each rule individually and as part of a whole system. We introduce two novel tools to help working on calculi using the approach of graph theory and automata theory. The first tool is a Proof Tree Automaton (PTA): a tree automaton which language is the derivation language of a calculus. The second tool is a graphical representation of a calculus called Proof Tree Graph (PTG). In this directed hypergraph, vertices are sets of terms (e.g. sequents) and hyperarcs are rules. We explore properties of PTA and PTGs and how they relate to each other. We show that we can decompose a PTA as a partial map from a calculus to a traditional tree automaton. We formulate that statement in the theory of refinement systems. Finally, we compare our framework to proof nets and string diagrams.	翻訳日:2022-06-14 16:25:39 公開日:2022-06-13
# (参考訳) arf:芸術的照度分野 ARF: Artistic Radiance Fields ( http://arxiv.org/abs/2206.06360v1 ) ライセンス: CC BY 4.0	Kai Zhang and Nick Kolkin and Sai Bi and Fujun Luan and Zexiang Xu and Eli Shechtman and Noah Snavely	(参考訳) 本稿では,任意のスタイル画像の芸術的特徴を3Dシーンに転送する方法を提案する。点雲やメッシュ上で3次元スタイリングを行う従来の方法は、複雑な現実世界のシーンの幾何学的再構成誤差に敏感である。代わりに、よりロバストな放射場表現をスタイライズすることを提案する。一般的に使用されるグラム行列に基づく損失は,忠実なブラシストロークを使わずに曖昧な結果をもたらす傾向があり,多視点一貫性を維持しつつ,スタイルの詳細を捉えるのに極めて効果的である近傍の損失を導入する。また、フル解像度レンダリング画像上で定義されたスタイル損失を用いて、メモリ集約放射場を最適化する新しい遅延バックプロパゲーション手法を提案する。本手法は, スタイル画像に類似した芸術的外観を生成することにより, ベースラインよりも優れることを示す。ビデオ結果とオープンソース実装については、プロジェクトページを参照してください。 We present a method for transferring the artistic features of an arbitrary style image to a 3D scene. Previous methods that perform 3D stylization on point clouds or meshes are sensitive to geometric reconstruction errors for complex real-world scenes. Instead, we propose to stylize the more robust radiance field representation. We find that the commonly used Gram matrix-based loss tends to produce blurry results without faithful brushstrokes, and introduce a nearest neighbor-based loss that is highly effective at capturing style details while maintaining multi-view consistency. We also propose a novel deferred back-propagation method to enable optimization of memory-intensive radiance fields using style losses defined on full-resolution rendered images. Our extensive evaluation demonstrates that our method outperforms baselines by generating artistic appearance that more closely resembles the style image. Please check our project page for video results and open-source implementations: https://www.cs.cornell.edu/projects/arf/ .	翻訳日:2022-06-14 15:57:50 公開日:2022-06-13
# オブジェクトトークンのフレームクリップ一貫性による映像シーン構造の実現 Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens ( http://arxiv.org/abs/2206.06346v1 ) ライセンス: Link先を確認	Elad Ben-Avraham, Roei Herzig, Karttikeya Mangalam, Amir Bar, Anna Rohrbach, Leonid Karlinsky, Trevor Darrell, Amir Globerson	(参考訳) 最近の行動認識モデルは、オブジェクト、それらの位置、相互作用を統合することで印象的な結果を得た。しかし、各フレームに対して密な構造化アノテーションを取得するのは面倒で時間を要するため、これらのメソッドはトレーニングコストが高く、スケーラビリティも低い。同時に、関心領域内外を問わず、注釈付き画像の小さなセットが利用可能であれば、これをビデオ下流タスクに活用するにはどうすればよいのか? 学習フレームワークStructureViT(略してSViT)を提案し、トレーニング中にのみ利用できる少数の画像の構造を利用することで、ビデオモデルを改善する方法を示す。 SViTは2つの重要な洞察に依存している。まず、画像とビデオの両方に構造化情報が含まれているため、画像とビデオにまたがって使用できるオブジェクトトークン(object tokens)のセットでトランスフォーマーモデルを強化する。第二に、動画中の個々のフレームのシーン表現は静止画と「一致」すべきである。これは、画像とビデオ間の構造化情報の流れを保証するフレーム・クリップ一貫性(Frame-Clip Consistency)損失によって達成される。場面構造の特定のインスタンス化、すなわち、手と物体がノードとして位置し、接点/非接点がエッジとして物理的関係からなるハンド・オブジェクトグラフ(Hand-Object Graph)を探索する。 SViTは、複数のビデオ理解タスクとデータセットで強力なパフォーマンス向上を示しており、Ego4D CVPR'22 Object State Localizationチャレンジで優勝している。コードと事前訓練されたモデルについては、プロジェクトページの https://eladb3.github.io/SViT/ を参照してください。 Recent action recognition models have achieved impressive results by integrating objects, their locations and interactions. However, obtaining dense structured annotations for each frame is tedious and time-consuming, making these methods expensive to train and less scalable. At the same time, if a small set of annotated images is available, either within or outside the domain of interest, how could we leverage these for a video downstream task? We propose a learning framework StructureViT (SViT for short), which demonstrates how utilizing the structure of a small number of images only available during training can improve a video model. SViT relies on two key insights. First, as both images and videos contain structured information, we enrich a transformer model with a set of object tokens that can be used across images and videos. 
Second, the scene representations of individual frames in video should "align" with those of still images. This is achieved via a Frame-Clip Consistency loss, which ensures the flow of structured information between images and videos. We explore a particular instantiation of scene structure, namely a Hand-Object Graph, consisting of hands and objects with their locations as nodes, and physical relations of contact/no-contact as edges. SViT shows strong performance improvements on multiple video understanding tasks and datasets; and it wins first place in the Ego4D CVPR'22 Object State Localization challenge. For code and pretrained models, visit the project page at https://eladb3.github.io/SViT/	翻訳日:2022-06-14 15:56:49 公開日:2022-06-13
# 人手による小規模辞書に基づく具体性・抽象性評価付き大規模辞書の自動生成 Automatic generation of a large dictionary with concreteness/abstractness ratings based on a small human dictionary ( http://arxiv.org/abs/2206.06200v1 ) ライセンス: Link先を確認	Vladimir Ivanov, Valery Solovyev	(参考訳) 具体語/抽象語は、ますます多くの心理学的・神経生理学的研究で用いられている。いくつかの言語では、大きな辞書が手作業で作成されている。これは非常に時間がかかり、コストがかかるプロセスです。具体語/抽象語の大規模で高品質な辞書を自動生成するには、より小さなサンプルで得られた専門家評価を外挿する必要がある。研究上の疑問は、十分に良い外挿を行うために、このようなサンプルをどこまで小さくできるかである。本稿では,単語の具体性を自動的にランク付けする手法を提案するとともに,専門家評価の量を大幅に削減するためのアプローチを提案する。この手法は英語の大規模なテストセットで評価されている。構築された辞書の品質は専門家に匹敵する。予測された評価と専門家の評価の相関は、最先端の手法と比較して高い。 Concrete/abstract words are used in a growing number of psychological and neurophysiological studies. For a few languages, large dictionaries have been created manually. This is a very time-consuming and costly process. To generate large high-quality dictionaries of concrete/abstract words automatically, one needs to extrapolate the expert assessments obtained on smaller samples. The research question that arises is how small such samples can be while still allowing a good enough extrapolation. In this paper, we present a method for automatically ranking the concreteness of words and propose an approach to significantly decrease the amount of expert assessment. The method has been evaluated on a large test set for English. The quality of the constructed dictionaries is comparable to the expert ones. The correlation between predicted and expert ratings is higher than that of state-of-the-art methods.	翻訳日:2022-06-14 15:55:45 公開日:2022-06-13
# (参考訳) SNeS:不完全データからおそらく対称性のあるニューラルサーフェスを学習する SNeS: Learning Probably Symmetric Neural Surfaces from Incomplete Data ( http://arxiv.org/abs/2206.06340v1 ) ライセンス: CC BY 4.0	Eldar Insafutdinov, Dylan Campbell, Jo\~ao F. Henriques, Andrea Vedaldi	(参考訳) 部分対称物体の正確な3次元再構成法を提案する。我々は、ニューラルレイディアンスフィールド(NeRF)のようなニューラル再構成とレンダリングの最近の進歩の強みの上に構築する。このようなアプローチの大きな欠点は、トレーニングイメージではっきりと見えないオブジェクトの任意の部分を再構築できないことだ。証拠が欠落している場合、対称性のような構造的事前情報を使用して、欠落情報を完成させることができる。幾何学的および非反射的材料は対称的であるかもしれないが、周囲のシーンからの影や反射は一般に対称ではない。これに対処するために,3次元形状と材料特性にソフト対称性の制約を適用し,照明,アルベド色,反射率に寄与する。提案手法を最近導入したCO3Dデータセット上で評価し,高反射性材料を再構成する難しさから自動車カテゴリーに着目した。高い忠実度で観察されていない領域を再構成し、高品質のノベルビュー画像を作成することができることを示す。 We present a method for the accurate 3D reconstruction of partly-symmetric objects. We build on the strengths of recent advances in neural reconstruction and rendering such as Neural Radiance Fields (NeRF). A major shortcoming of such approaches is that they fail to reconstruct any part of the object which is not clearly visible in the training image, which is often the case for in-the-wild images and videos. When evidence is lacking, structural priors such as symmetry can be used to complete the missing information. However, exploiting such priors in neural rendering is highly non-trivial: while geometry and non-reflective materials may be symmetric, shadows and reflections from the ambient scene are not symmetric in general. To address this, we apply a soft symmetry constraint to the 3D geometry and material properties, having factored appearance into lighting, albedo colour and reflectivity. We evaluate our method on the recently introduced CO3D dataset, focusing on the car category due to the challenge of reconstructing highly-reflective materials. We show that it can reconstruct unobserved regions with high fidelity and render high-quality novel view images.	翻訳日:2022-06-14 15:55:00 公開日:2022-06-13
# link3d: 3dlidar point cloudの線形キーポイント表現 LinK3D: Linear Keypoints Representation for 3D LiDAR Point Cloud ( http://arxiv.org/abs/2206.05927v1 ) ライセンス: Link先を確認	Yunge Cui, Yinlong Zhang, Jiahua Dong, Haibo Sun and Feng Zhu	(参考訳) 特徴抽出とマッチングは、2Dや3Dオブジェクトの検出、認識、登録など、多くのコンピュータビジョンタスクの基本部分である。ご存知の通り、2Dの特徴抽出とマッチングはすでに大きな成功を収めています。残念なことに、3Dの分野では、現在の手法は視覚タスクにおける3D LiDARセンサーの広範囲な応用をサポートできない。この制限に対処するため,LinK3Dと呼ばれる3次元LiDAR点雲に対する線形キーポイント表現法を提案する。 LinK3D の斬新さは、LiDAR のポイントクラウドの特徴(空間性、シナリオの複雑さなど)を完全に考慮し、現在のキーポイントをその強い隣のキーポイントで表現し、現在のキーポイントの記述に強い制約を与える点にある。提案したLinK3Dは,2つの公開データセット(KITTI,Steven VLP16)で評価され,実験結果から,提案手法が適合性能の最先端性を大幅に向上することが示された。さらに重要なことに、LinK3Dは(LiDARの周波数10Hzに基づいて)優れたリアルタイムパフォーマンスを示している。 LinK3Dは、64光のレーザービームで収集された点雲から特徴を引き出すのに平均32ミリ秒しかかからず、Intel Core i7 @2.2 GHzプロセッサでノートブックで実行すると2つのLiDARスキャンと一致するのに、わずか8ミリ秒しかかからない。さらに、この手法は様々な3d視覚アプリケーションに広く拡張することができる。本稿では,LinK3Dを3次元登録,LiDARオドメトリー,位置認識タスクに適用し,最先端手法と比較して競争力のある結果を得た。 Feature extraction and matching are the basic parts of many computer vision tasks, such as 2D or 3D object detection, recognition, and registration. As we all know, 2D feature extraction and matching have already been achieved great success. Unfortunately, in the field of 3D, the current methods fail to support the extensive application of 3D LiDAR sensors in vision tasks, due to the poor descriptiveness and inefficiency. To address this limitation, we propose a novel 3D feature representation method: Linear Keypoints representation for 3D LiDAR point cloud, called LinK3D. The novelty of LinK3D lies in that it fully considers the characteristics (such as sparsity, complexity of scenarios) of LiDAR point cloud, and represents current keypoint with its robust neighbor keypoints, which provide strong constraint on the description of current keypoint. The proposed LinK3D has been evaluated on two public datasets (i.e., KITTI, Steven VLP16), and the experimental results show that our method greatly outperforms the state-of-the-arts in matching performance. 
More importantly, LinK3D shows excellent real-time performance (relative to the typical LiDAR frequency of 10 Hz). LinK3D takes an average of only 32 milliseconds to extract features from the point cloud collected by a 64-ray laser beam, and merely about 8 milliseconds to match two LiDAR scans when executed on a notebook with an Intel Core i7 @2.2 GHz processor. Moreover, our method can be widely extended to a variety of 3D vision applications. In this paper, we have applied LinK3D to 3D registration, LiDAR odometry and place recognition tasks, and achieved competitive results compared with the state-of-the-art methods.	翻訳日:2022-06-14 15:50:48 公開日:2022-06-13
# in-the-wildイメージから学ぶファッションの相性 Learning Fashion Compatibility from In-the-wild Images ( http://arxiv.org/abs/2206.05982v1 ) ライセンス: Link先を確認	Additya Popli, Vijay Kumar, Sujit Jos and Saraansh Tandon	(参考訳) 相補的なファッションレコメンデーションは、衣装として「うまく行く」異なるカテゴリー(シャツ、履物など)のアイテムを特定することを目的としている。既存のアプローチのほとんどは、手動で調整された互換アイテムの組み合わせを含むラベル付き衣装データセットを使用して、このタスクの表現を学習する。本研究では,コンパチブルな服装を身に着けることが多いという事実を活かし,自己教師付き学習を通じて,街頭ファッション画像からコンパチブル予測のための表現を学習することを提案する。本研究の前提課題は、同一人物が着用する異なる項目の表現が、他人が着用するものよりも近いように定式化されている。さらに,推測中の画像とカタログの領域間ギャップを低減するために,両領域間の特徴分布の差を最小限に抑える対角的損失を導入する。我々は、PolyvoreとPolyvore-Disjointの2つの一般的なファッション互換性ベンチマークで実験を行い、既存の自己教師型アプローチよりも優れています。 Complementary fashion recommendation aims at identifying items from different categories (e.g. shirt, footwear, etc.) that "go well together" as an outfit. Most existing approaches learn representation for this task using labeled outfit datasets containing manually curated compatible item combinations. In this work, we propose to learn representations for compatibility prediction from in-the-wild street fashion images through self-supervised learning by leveraging the fact that people often wear compatible outfits. Our pretext task is formulated such that the representations of different items worn by the same person are closer compared to those worn by other people. Additionally, to reduce the domain gap between in-the-wild and catalog images during inference, we introduce an adversarial loss that minimizes the difference in feature distribution between the two domains. We conduct our experiments on two popular fashion compatibility benchmarks - Polyvore and Polyvore-Disjoint outfits, and outperform existing self-supervised approaches, particularly significant in cross-dataset setting where training and testing images are from different sources.	翻訳日:2022-06-14 15:50:12 公開日:2022-06-13
# より良い教師: 知識蒸留のための動的事前知識 Better Teacher Better Student: Dynamic Prior Knowledge for Knowledge Distillation ( http://arxiv.org/abs/2206.06067v1 ) ライセンス: Link先を確認	Zengyu Qiu, Xinzhu Ma, Kunlin Yang, Chunya Liu, Jun Hou, Shuai Yi, Wanli Ouyang	(参考訳) 知識蒸留(kd)は、大きなモデル(教師)から小さなモデル(学生)への学習表現の転送に非常に有望な能力を示している。しかし,学生と教師の能力格差が大きくなるにつれて,既存のKD手法ではより良い結果が得られない。本研究は,特に大規模教員に適用する場合において,kdにとって「優先的知識」が不可欠であることを示す。特に,教師の特徴の一部を,特徴蒸留の前に先行知識として統合する動的事前知識(DPK)を提案する。これは、我々のメソッドが教師の特徴を単に「ターゲット」ではなく「インプット」として捉えることを意味します。また,学習段階における事前知識の比率を特徴ギャップに応じて動的に調整することにより,学生を適切な難易度で指導する。提案手法を評価するため、2つの画像分類ベンチマーク(CIFAR100とImageNet)とオブジェクト検出ベンチマーク(MS COCO)について広範な実験を行った。その結果,異なる条件下での性能において,本手法が優れていることを示す。さらに,dpkにより,生徒モデルの性能と教師モデルとの正の相関が得られ,より大きな教師を適用することで,学生の正確性をさらに高めることができる。私たちのコードは再現性のために公開されます。 Knowledge distillation (KD) has shown very promising capabilities in transferring learning representations from large models (teachers) to small models (students). However, as the capacity gap between students and teachers becomes larger, existing KD methods fail to achieve better results. Our work shows that the 'prior knowledge' is vital to KD, especially when applying large teachers. Particularly, we propose the dynamic prior knowledge (DPK), which integrates part of the teacher's features as the prior knowledge before the feature distillation. This means that our method also takes the teacher's feature as `input', not just `target'. Besides, we dynamically adjust the ratio of the prior knowledge during the training phase according to the feature gap, thus guiding the student in an appropriate difficulty. To evaluate the proposed method, we conduct extensive experiments on two image classification benchmarks (i.e. CIFAR100 and ImageNet) and an object detection benchmark (i.e. MS COCO). The results demonstrate the superiority of our method in performance under varying settings. 
More importantly, our DPK makes the performance of the student model positively correlated with that of the teacher model, which means that we can further boost the accuracy of students by applying larger teachers. Our code will be made publicly available for reproducibility.	翻訳日:2022-06-14 15:49:51 公開日:2022-06-13
# recaptured image forensic における学習特徴のばらつきと動的融合 Learning Feature Disentanglement and Dynamic Fusion for Recaptured Image Forensic ( http://arxiv.org/abs/2206.06103v1 ) ライセンス: Link先を確認	Shuyu Miao, Lin Zheng, Hong Jin	(参考訳) 画像再キャプチャーは、他人の画像を再キャプチャすることでシステムを欺く人工知能(ai)システムの公平性を損なう。既存の再キャプチャモデルのほとんどは、固定された電子デバイスを使用してシミュレーションされた再キャプチャされた画像を含むデータセットに基づいて、単一の再キャプチャパターン(moire、エッジ、アーティファクトなど)にのみ対処できる。本稿では,画像再キャプチャータスクを画像再キャプチャー認識の4つのパターン,すなわちモアレ再キャプチャー,エッジ再キャプチャー,アーティファクト再キャプチャー,その他の再キャプチャーとして明示的に再定義する。一方,異なる再キャプチャパターン認識をカバーするために,最も効果的な再キャプチャ特徴表現を適応的に学習する特徴分散と動的融合(FDDF)モデルを提案する。さらに,これまでに公表したデータセットの約5倍の多種多様なリキャプチャパターンを含む,大規模な実時間ユニバーサルリキャプチャ(rur)データセットを収集した。我々の知る限り、我々はまず、再適応画像法学のための一般的なモデルと一般的な実シーンの大規模データセットを提案する。大規模な実験により,提案したFDDFはRURデータセット上で最先端の性能を達成できることが示された。 Image recapture seriously breaks the fairness of artificial intelligent (AI) systems, which deceives the system by recapturing others' images. Most of the existing recapture models can only address a single pattern of recapture (e.g., moire, edge, artifact, and others) based on the datasets with simulated recaptured images using fixed electronic devices. In this paper, we explicitly redefine image recapture forensic task as four patterns of image recapture recognition, i.e., moire recapture, edge recapture, artifact recapture, and other recapture. Meanwhile, we propose a novel Feature Disentanglement and Dynamic Fusion (FDDF) model to adaptively learn the most effective recapture feature representation for covering different recapture pattern recognition. Furthermore, we collect a large-scale Real-scene Universal Recapture (RUR) dataset containing various recapture patterns, which is about five times the number of previously published datasets. To the best of our knowledge, we are the first to propose a general model and a general real-scene large-scale dataset for recaptured image forensic. Extensive experiments show that our proposed FDDF can achieve state-of-the-art performance on the RUR dataset.	
翻訳日:2022-06-14 15:49:30 公開日:2022-06-13
# 特異値の微調整:few-shotセグメンテーションには少数パラメータの微調整が必要 Singular Value Fine-tuning: Few-shot Segmentation requires Few-parameters Fine-tuning ( http://arxiv.org/abs/2206.06122v1 ) ライセンス: Link先を確認	Yanpeng Sun, Qiang Chen, Xiangyu He, Jian Wang, Haocheng Feng, Junyu Han, Errui Ding, Jian Cheng, Zechao Li, Jingdong Wang	(参考訳) 事前トレーニングされたバックボーンの凍結は、少数ショットのセグメンテーションでオーバーフィットを避けるための標準的なパラダイムになっています。本稿では、このパラダイムを再考し、バックボーンのパラメータのごく一部を微調整するという新しい方式を探求する。オーバーフィッティング問題を克服する解決策を提案し,新しいクラスを学習する際のモデル一般化を改良する。本手法では, バックボーンパラメータをSingular Value Decomposition (SVD) を介して3つの連続行列に分解し, 特異値のみを微調整し, 他のパラメータを凍結する。上記の設計により、トレーニング済みのバックボーン内でセマンティックなヒントを維持しながら、新しいクラスの特徴表現を調整できる。バックボーンの異なる様々なfew-shotセグメンテーション法でSVF(Singular Value Fine-tuning)アプローチの評価を行った。Pascal-5$^i$とCOCO-20$^i$の両方で、1ショットと5ショットの設定において最先端の結果を達成した。このシンプルなベースラインが研究者たちに、バックボーンの微調整の役割を再考させることを期待したい。ソースコードとモデルは https://github.com/syp2ysy/SVF で公開予定である。 Freezing the pre-trained backbone has become a standard paradigm to avoid overfitting in few-shot segmentation. In this paper, we rethink the paradigm and explore a new regime: fine-tuning a small part of the parameters in the backbone. We present a solution to overcome the overfitting problem, leading to better model generalization on learning novel classes. Our method decomposes backbone parameters into three successive matrices via the Singular Value Decomposition (SVD), then only fine-tunes the singular values and keeps the others frozen. The above design allows the model to adjust feature representations on novel classes while maintaining semantic clues within the pre-trained backbone. We evaluate our Singular Value Fine-tuning (SVF) approach on various few-shot segmentation methods with different backbones. We achieve state-of-the-art results on both Pascal-5$^i$ and COCO-20$^i$ across 1-shot and 5-shot settings. Hopefully, this simple baseline will encourage researchers to rethink the role of backbone fine-tuning in few-shot settings. 
The source code and models will be available at https://github.com/syp2ysy/SVF.	翻訳日:2022-06-14 15:49:08 公開日:2022-06-13
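The idea of fine-tuning only the singular values can be sketched on a tiny example. The code below assumes a 2x2 weight already factored as W = U diag(s) V^T, with U and V built from rotation matrices so that they are orthogonal by construction; a real backbone would obtain the factors from an SVD routine. The squared-error loss, the helper names, and the learning rate are hypothetical and are not taken from the SVF repository.

```python
import math

def rotation(theta):
    """2x2 rotation matrix; orthogonal, so usable as a stand-in for U or V."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def transpose(m):
    return [list(col) for col in zip(*m)]

def matvec(m, v):
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

def forward(u, s, v, x):
    """Compute W x with W = U diag(s) V^T, without forming W explicitly."""
    vtx = matvec(transpose(v), x)
    return matvec(u, [s_i * c_i for s_i, c_i in zip(s, vtx)])

def loss(u, s, v, x, y):
    """Squared error 0.5 * ||W x - y||^2."""
    return 0.5 * sum((p - t) ** 2 for p, t in zip(forward(u, s, v, x), y))

def svf_step(u, s, v, x, y, lr=0.1):
    """One gradient step on the singular values only; U and V stay frozen."""
    residual = [p - t for p, t in zip(forward(u, s, v, x), y)]
    utr = matvec(transpose(u), residual)
    vtx = matvec(transpose(v), x)
    # dL/ds_i = (U^T r)_i * (V^T x)_i for the squared-error loss above.
    return [s_i - lr * a * b for s_i, a, b in zip(s, utr, vtx)]
```

Only the vector `s` is ever updated, so the number of trainable values per layer equals the rank of the weight rather than the full number of entries, which is the sense in which few parameters are fine-tuned.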
# ICCV 2021 VIPriors画像分類チャレンジのための第2位ソリューション: 抽出と反発学習アプローチ 2nd Place Solution for ICCV 2021 VIPriors Image Classification Challenge: An Attract-and-Repulse Learning Approach ( http://arxiv.org/abs/2206.06168v1 ) ライセンス: Link先を確認	Yilu Guo, Shicai Yang, Weijie Chen, Liang Ma, Di Xie, Shiliang Pu	(参考訳) 畳み込みニューラルネットワーク(cnns)は,大規模データセットを利用することで,画像分類において有意な成功を収めている。しかし、小規模データセットをスクラッチから効率的に学習することは依然として大きな課題である。限られたトレーニングデータセットでは、過度にパラメータ化されたCNNが単にデータセットを記憶する傾向があるため、カテゴリの概念は曖昧になる。したがって,過度な適合を避けながら,より差別的な表現を学習する方法を研究することが重要である。カテゴリの概念はあいまいな傾向があるため、より個別の情報を取得することが重要である。そこで本稿では,特徴表現を豊かにするContrastive Regularization (CR) と,異なるクラスに対する適合性のバランスをとるSymmetric Cross Entropy (SCE) と,ラベル情報のキャリブレーションを行うMean Teacher という新たなフレームワークを提案する。具体的には、sce と cr は、クラス情報 (attract) とインスタンス (repulse) の間の適応的トレードオフによって過剰フィッティングを緩和しながら、識別表現を学習する。その後、より正確なソフト擬似ラベルを校正することで、パフォーマンスをさらに改善するために平均教師が使用される。十分な実験は、Attract-and-Repulseフレームワークの有効性を検証する。攻撃的データ拡張,tencrop推論,モデルセンシングなど他の戦略とともに,iccv 2021画像分類課題において,第2位を達成した。 Convolutional neural networks (CNNs) have achieved significant success in image classification by utilizing large-scale datasets. However, it is still of great challenge to learn from scratch on small-scale datasets efficiently and effectively. With limited training datasets, the concepts of categories will be ambiguous since the over-parameterized CNNs tend to simply memorize the dataset, leading to poor generalization capacity. Therefore, it is crucial to study how to learn more discriminative representations while avoiding over-fitting. Since the concepts of categories tend to be ambiguous, it is important to catch more individual-wise information. Thus, we propose a new framework, termed Attract-and-Repulse, which consists of Contrastive Regularization (CR) to enrich the feature representations, Symmetric Cross Entropy (SCE) to balance the fitting for different classes and Mean Teacher to calibrate label information. 
Specifically, SCE and CR learn discriminative representations while alleviating over-fitting through an adaptive trade-off between the information of classes (attract) and instances (repulse). After that, Mean Teacher is used to further improve the performance by calibrating more accurate soft pseudo labels. Extensive experiments validate the effectiveness of the Attract-and-Repulse framework. Together with other strategies, such as aggressive data augmentation, TenCrop inference, and model ensembling, we achieve second place in the ICCV 2021 VIPriors Image Classification Challenge.	翻訳日:2022-06-14 15:48:44 公開日:2022-06-13
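Two of the components named in this abstract have compact, widely used formulations, sketched below under stated assumptions. The symmetric cross entropy is written as a weighted sum of the usual cross entropy and a reverse cross entropy with clamped logarithms, and the Mean Teacher update is written as an exponential moving average of student weights. The weights `alpha` and `beta`, the clamp `eps`, and the `decay` value are illustrative defaults, not the settings used by the challenge entry.

```python
import math

def symmetric_cross_entropy(pred, label, alpha=1.0, beta=1.0, eps=1e-4):
    """alpha * CE(label, pred) + beta * reverse CE(pred, label)."""
    # Clamp before taking logs so that log(0) never occurs.
    p = [max(v, eps) for v in pred]
    q = [max(v, eps) for v in label]
    ce = -sum(l_k * math.log(p_k) for l_k, p_k in zip(label, p))
    rce = -sum(r_k * math.log(q_k) for r_k, q_k in zip(pred, q))
    return alpha * ce + beta * rce

def ema_update(teacher, student, decay=0.99):
    """Mean Teacher style update: teacher weights track an EMA of the student."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]
```

In a training loop, `ema_update` would typically be called once per optimizer step, and the teacher's predictions would then serve as soft pseudo labels for the student.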
# Transformer Lesion Tracker ( http://arxiv.org/abs/2206.06252v1 ) License: see link	Wen Tang, Han Kang, Haoyue Zhang, Pengxin Yu, Corey W. Arnold, Rongguo Zhang	Evaluating lesion progression and treatment response via longitudinal lesion tracking plays a critical role in clinical practice. Automated approaches for this task are motivated by the prohibitive labor cost and time consumption of manual lesion matching. Previous methods typically lack the integration of local and global information. In this work, we propose a transformer-based approach, termed Transformer Lesion Tracker (TLT). Specifically, we design a Cross Attention-based Transformer (CAT) to capture and combine both global and local information to enhance feature extraction. We also develop a Registration-based Anatomical Attention Module (RAAM) to introduce anatomical information to CAT so that it can focus on useful feature knowledge. A Sparse Selection Strategy (SSS) is presented for selecting features and reducing the memory footprint in Transformer training. In addition, we use a global regression to further improve model performance. We conduct experiments on a public dataset to show the superiority of our method and find that our model improves the average Euclidean center error by at least 14.3% (6mm vs. 7mm) compared with the state-of-the-art (SOTA). Code is available at https://github.com/TangWen920812/TLT.	Translation date: 2022-06-14 15:48:05 Publication date: 2022-06-13
# Featurized Query R-CNN ( http://arxiv.org/abs/2206.06258v1 ) License: see link	Wenqiang Zhang and Tianheng Cheng and Xinggang Wang and Qian Zhang and Wenyu Liu	The query mechanism introduced in the DETR method is changing the paradigm of object detection, and many query-based methods have recently obtained strong object detection performance. However, current query-based detection pipelines suffer from two issues. First, multi-stage decoders are required to optimize the randomly initialized object queries, incurring a large computation burden. Second, the queries are fixed after training, leading to unsatisfying generalization capability. To remedy these issues, we present featurized object queries predicted by a query generation network in the well-established Faster R-CNN framework and develop a Featurized Query R-CNN. Extensive experiments on the COCO dataset show that our Featurized Query R-CNN obtains the best speed-accuracy trade-off among all R-CNN detectors, including the recent state-of-the-art Sparse R-CNN detector. The code is available at https://github.com/hustvl/Featurized-QueryRCNN.	Translation date: 2022-06-14 15:47:44 Publication date: 2022-06-13
# (Reference translation) Indian Legal Text Summarization: A Text Normalisation-based Approach ( http://arxiv.org/abs/2206.06238v1 ) License: CC BY 4.0	Satyajit Ghosh, Mousumi Dutta, Tanaya Das	In the Indian court system, pending cases have long been a problem. There are more than 4 crore (40 million) cases outstanding. Manually summarising hundreds of documents is a time-consuming and tedious task for legal stakeholders. Many state-of-the-art models for text summarization have emerged as machine learning has progressed. Domain-independent models do not do well with legal texts, and fine-tuning those models for the Indian legal system is problematic due to a lack of publicly available datasets. To improve the performance of domain-independent models, the authors propose a methodology for normalising legal texts in the Indian context. The authors experimented with two state-of-the-art domain-independent models for legal text summarization, namely BART and PEGASUS. BART and PEGASUS are put through their paces in terms of extractive and abstractive summarization to understand the effectiveness of the text normalisation approach. Summarised texts are evaluated by domain experts on multiple parameters and using ROUGE metrics. The results show that the proposed text normalisation approach is effective for legal texts with domain-independent models.	Translation date: 2022-06-14 15:46:25 Publication date: 2022-06-13
# On the Learning of Non-Autoregressive Transformers ( http://arxiv.org/abs/2206.05975v1 ) License: see link	Fei Huang, Tianhua Tao, Hao Zhou, Lei Li, Minlie Huang	Non-autoregressive Transformer (NAT) is a family of text generation models that aims to reduce decoding latency by predicting whole sentences in parallel. However, this latency reduction sacrifices the ability to capture left-to-right dependencies, making NAT learning very challenging. In this paper, we present theoretical and empirical analyses to reveal the challenges of NAT learning and propose a unified perspective to understand existing successes. First, we show that simply training NAT by maximizing the likelihood can lead to an approximation of marginal distributions but drops all dependencies between tokens, where the dropped information can be measured by the dataset's conditional total correlation. Second, we formalize many previous objectives in a unified framework and show that their success can be summarized as maximizing the likelihood on a proxy distribution, leading to a reduced information loss. Empirical studies show that our perspective can explain the phenomena in NAT learning and guide the design of new training methods.	Translation date: 2022-06-14 15:37:21 Publication date: 2022-06-13
# Language Models are General-Purpose Interfaces ( http://arxiv.org/abs/2206.06336v1 ) License: see link	Yaru Hao, Haoyu Song, Li Dong, Shaohan Huang, Zewen Chi, Wenhui Wang, Shuming Ma, Furu Wei	Foundation models have received much attention due to their effectiveness across a broad range of downstream applications. Though there is a big convergence in terms of architecture, most pretrained models are still typically developed for specific tasks or modalities. In this work, we propose to use language models as a general-purpose interface to various foundation models. A collection of pretrained encoders perceive diverse modalities (such as vision and language), and they dock with a language model that plays the role of a universal task layer. We propose a semi-causal language modeling objective to jointly pretrain the interface and the modular encoders. We subsume the advantages and capabilities from both causal and non-causal modeling, thereby combining the best of two worlds. Specifically, the proposed method not only inherits the capabilities of in-context learning and open-ended generation from causal language modeling, but is also conducive to finetuning because of the bidirectional encoders. More importantly, our approach seamlessly unlocks the combinations of the above capabilities, e.g., enabling in-context learning or instruction following with finetuned encoders. Experimental results across various language-only and vision-language benchmarks show that our model outperforms or is competitive with specialized models on finetuning, zero-shot generalization, and few-shot learning.	Translation date: 2022-06-14 15:37:02 Publication date: 2022-06-13
# From Perception to Programs: Regularize, Overparameterize, and Amortize ( http://arxiv.org/abs/2206.05922v1 ) License: see link	Hao Tang and Kevin Ellis	Toward combining inductive reasoning with perception abilities, we develop techniques for neurosymbolic program synthesis where perceptual input is first parsed by neural nets into a low-dimensional interpretable representation, which is then processed by a synthesized program. We explore several techniques for relaxing the problem and jointly learning all modules end-to-end with gradient descent: multitask learning; amortized inference; overparameterization; and a differentiable strategy for penalizing lengthy programs. Collectively, this toolbox improves the stability of gradient-guided program search and suggests ways of learning both how to perceive input as discrete abstractions and how to symbolically process those abstractions as programs.	Translation date: 2022-06-14 15:34:15 Publication date: 2022-06-13
# Constraint Guided Gradient Descent: Guided Training with Inequality Constraints ( http://arxiv.org/abs/2206.06202v1 ) License: see link	Quinten Van Baelen, Peter Karsmakers	Deep learning is typically performed by learning a neural network solely from data in the form of input-output pairs, ignoring available domain knowledge. In this work, the Constraint Guided Gradient Descent (CGGD) framework is proposed, which enables the injection of domain knowledge into the training procedure. The domain knowledge is assumed to be described as a conjunction of hard inequality constraints, which appears to be a natural choice for several applications. Compared to other neuro-symbolic approaches, the proposed method converges to a model that satisfies any inequality constraint on the training data and does not require first transforming the constraints into some ad-hoc term that is added to the learning (optimisation) objective. Under certain conditions, it is shown that CGGD can converge to a model that satisfies the constraints on the training set, while prior work does not necessarily converge to such a model. It is empirically shown on two independent and small data sets that CGGD makes training less dependent on the initialisation of the network and improves constraint satisfaction on all data.	Translation date: 2022-06-14 15:33:46 Publication date: 2022-06-13
# Improve Ranking Correlation of Super-net through Training Scheme from One-shot NAS to Few-shot NAS ( http://arxiv.org/abs/2206.05896v1 ) License: see link	Jiawei Liu, Kaiyu Zhang, Weitai Hu and Qing Yang	One-shot neural architecture search (NAS) algorithms have been widely used to reduce computation. However, because of interference among the subnets whose weights are shared, the subnets inherited from a super-net trained by these algorithms have poor consistency in precision ranking. To address this problem, we propose a step-by-step super-net training scheme from one-shot NAS to few-shot NAS. In the training scheme, we first train the super-net in a one-shot way, and then disentangle the weights of the super-net by splitting it into multiple subnets and training them gradually. Our method ranks 4th in the CVPR2022 Lightweight NAS Challenge Track1. Our code is available at https://github.com/liujiawei2333/CVPR2022-NAScompetition-Track-1-4th-solution.	Translation date: 2022-06-14 15:27:50 Publication date: 2022-06-13
# (Reference translation) Anomaly Detection and Inter-Sensor Transfer Learning on Smart Manufacturing Datasets ( http://arxiv.org/abs/2206.06355v1 ) License: CC BY 4.0	Mustafa Abdallah, Byung-Gun Joung, Wo Jae Lee, Charilaos Mousoulis, John W. Sutherland, and Saurabh Bagchi	Smart manufacturing systems are being deployed at a growing rate because of their ability to interpret a wide variety of sensed information and act on the knowledge gleaned from system observations. In many cases, the principal goal of the smart manufacturing system is to rapidly detect (or anticipate) failures to reduce operational cost and eliminate downtime. This often boils down to detecting anomalies within the sensor data acquired from the system. The smart manufacturing application domain poses certain salient technical challenges. In particular, there are often multiple types of sensors with varying capabilities and costs. The sensor data characteristics change with the operating point of the environment or machines, such as the RPM of the motor. The anomaly detection process therefore has to be calibrated near an operating point. In this paper, we analyze four datasets from sensors deployed in manufacturing testbeds. We evaluate the performance of several traditional and ML-based forecasting models for predicting the time series of sensor data. Then, considering the sparse data from one kind of sensor, we perform transfer learning from a high data rate sensor to perform defect type classification. Taken together, we show that predictive failure classification can be achieved, thus paving the way for predictive maintenance.	Translation date: 2022-06-14 15:27:05 Publication date: 2022-06-13
# Computation Offloading and Resource Allocation in F-RANs: A Federated Deep Reinforcement Learning Approach ( http://arxiv.org/abs/2206.05881v1 ) License: see link	Lingling Zhang, Yanxiang Jiang, Fu-Chun Zheng, Mehdi Bennis, and Xiaohu You	The fog radio access network (F-RAN) is a promising technology in which user mobile devices (MDs) can offload computation tasks to nearby fog access points (F-APs). Due to the limited resources of F-APs, it is important to design an efficient task offloading scheme. In this paper, by considering a time-varying network environment, a dynamic computation offloading and resource allocation problem in F-RANs is formulated to minimize the task execution delay and energy consumption of MDs. To solve the problem, a federated deep reinforcement learning (DRL) based algorithm is proposed, where the deep deterministic policy gradient (DDPG) algorithm performs computation offloading and resource allocation in each F-AP. Federated learning is exploited to train the DDPG agents in order to decrease the computing complexity of the training process and protect user privacy. Simulation results show that the proposed federated DDPG algorithm can achieve lower task execution delay and energy consumption of MDs more quickly than other existing strategies.	Translation date: 2022-06-14 15:04:55 Publication date: 2022-06-13
# Content Popularity Prediction in Fog-RANs: A Clustered Federated Learning Based Approach ( http://arxiv.org/abs/2206.05894v1 ) License: see link	Zhiheng Wang, Yanxiang Jiang, Fu-Chun Zheng, Mehdi Bennis and Xiaohu You	In this paper, the content popularity prediction problem in fog radio access networks (F-RANs) is investigated. Based on clustered federated learning, we propose a novel mobility-aware popularity prediction policy, which integrates content popularities in terms of local users and mobile users. For local users, the content popularity is predicted by learning the hidden representations of local users and contents. Initial features of local users and contents are generated by incorporating neighbor information with self information. Then, a dual-channel neural network (DCNN) model is introduced to learn the hidden representations by producing deep latent features from the initial features. For mobile users, the content popularity is predicted via user preference learning. In order to distinguish regional variations of content popularity, clustered federated learning (CFL) is employed, which enables fog access points (F-APs) with similar regional types to benefit from one another and provides a more specialized DCNN model for each F-AP. Simulation results show that our proposed policy achieves significant performance improvement over traditional policies.	Translation date: 2022-06-14 15:04:39 Publication date: 2022-06-13
# Compressive Clustering with an Optical Processing Unit ( http://arxiv.org/abs/2206.05928v1 ) License: see link	Luc Giffon (DANTE), Rémi Gribonval (DANTE)	We explore the use of Optical Processing Units (OPU) to compute random Fourier features for sketching, and adapt the overall compressive clustering pipeline to this setting. We also propose some tools to help tune a critical hyper-parameter of compressive clustering.	Translation date: 2022-06-14 15:04:20 Publication date: 2022-06-13
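As a rough illustration of what "random Fourier features for sketching" means in the entry above, the following sketch computes a dataset sketch digitally as the average of complex exponentials of random projections. This is the standard software formulation only; it does not model an OPU, and the function names, the Gaussian frequency distribution, and the `scale` parameter are assumptions made for the example.

```python
import cmath
import random


def draw_frequencies(num_features, dim, scale=1.0, seed=0):
    """Draw random frequency vectors from an isotropic Gaussian (assumed choice)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(num_features)]


def dataset_sketch(points, frequencies):
    """Average exp(i * <omega, x>) over all points, one entry per frequency."""
    n = len(points)
    sketch = []
    for omega in frequencies:
        total = 0j
        for x in points:
            phase = sum(w * xi for w, xi in zip(omega, x))
            total += cmath.exp(1j * phase)
        sketch.append(total / n)
    return sketch


frequencies = draw_frequencies(num_features=8, dim=2)
data = [[0.0, 0.1], [0.2, -0.1], [3.0, 3.1], [2.9, 3.0]]
sketch = dataset_sketch(data, frequencies)
print(len(sketch), max(abs(z) for z in sketch))
```

The sketch has a fixed size (one complex number per frequency) regardless of how many points are summarized, which is what makes a later clustering step on the sketch alone attractive for large datasets.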
# Intrinsically motivated option learning: a comparative study of recent methods ( http://arxiv.org/abs/2206.06007v1 ) License: see link	Djordje Božić, Predrag Tadić, Mladen Nikolić	Options represent a framework for reasoning across multiple time scales in reinforcement learning (RL). With the recent active interest in the unsupervised learning paradigm in the RL research community, the option framework was adapted to utilize the concept of empowerment, which corresponds to the amount of influence the agent has on the environment and its ability to perceive this influence, and which can be optimized without any supervision provided by the environment's reward structure. Many recent papers modify this concept in various ways, achieving commendable results. Through these various modifications, however, the initial context of empowerment is often lost. In this work, we offer a comparative study of such papers through the lens of the original empowerment principle.	Translation date: 2022-06-14 15:03:25 Publication date: 2022-06-13
# Towards Autonomous Grading In The Real World ( http://arxiv.org/abs/2206.06091v1 ) License: see link	Yakov Miron, Chana Ross, Yuval Goldfracht, Chen Tessler and Dotan Di Castro	In this work, we aim to tackle the problem of autonomous grading, where a dozer is required to flatten an uneven area. In addition, we explore methods for bridging the gap between a simulated environment and real scenarios. We design both a realistic physical simulation and a scaled real prototype environment mimicking the real dozer dynamics and sensory information. We establish heuristics and learning strategies in order to solve the problem. Through extensive experimentation, we show that although heuristics are capable of tackling the problem in a clean and noise-free simulated environment, they fail catastrophically when facing real-world scenarios. As the heuristics are capable of successfully solving the task in the simulated environment, we show they can be leveraged to guide a learning agent that can generalize and solve the task both in simulation and in a scaled prototype environment.	Translation date: 2022-06-14 15:03:13 Publication date: 2022-06-13
# Description and Discussion on DCASE 2022 Challenge Task 2: Unsupervised Anomalous Sound Detection for Machine Condition Monitoring Applying Domain Generalization Techniques ( http://arxiv.org/abs/2206.05876v1 ) License: see link	Kota Dohi, Keisuke Imoto, Noboru Harada, Daisuke Niizumi, Yuma Koizumi, Tomoya Nishida, Harsh Purohit, Takashi Endo, Masaaki Yamamoto and Yohei Kawaguchi	We present the task description of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2022 Challenge Task 2: "Unsupervised anomalous sound detection (ASD) for machine condition monitoring applying domain generalization techniques". Domain shifts are a critical problem for the application of ASD systems. Because domain shifts can change the acoustic characteristics of data, a model trained in a source domain performs poorly for a target domain. In DCASE 2021 Challenge Task 2, we organized an ASD task for handling domain shifts. In that task, it was assumed that the occurrences of domain shifts are known. However, in practice, the domain of each sample may not be given, and domain shifts can occur implicitly. In 2022 Task 2, we focus on domain generalization techniques that detect anomalies regardless of the domain shifts. Specifically, the domain of each sample is not given in the test data and only one threshold is allowed for all domains. We will add challenge results and analysis of the submissions after the challenge submission deadline.	Translation date: 2022-06-14 15:01:58 Publication date: 2022-06-13
# SyntheX: Scaling Up Learning-based X-ray Image Analysis Through In Silico Experiments ( http://arxiv.org/abs/2206.06127v1 ) License: see link	Cong Gao, Benjamin D. Killeen, Yicheng Hu, Robert B. Grupp, Russell H. Taylor, Mehran Armand, Mathias Unberath	Artificial intelligence (AI) now enables automated interpretation of medical images for clinical use. However, AI's potential use for interventional images (versus those involved in triage or diagnosis), such as for guidance during surgery, remains largely untapped. This is because surgical AI systems are currently trained using post hoc analysis of data collected during live surgeries, which has fundamental and practical limitations, including ethical considerations, expense, scalability, data integrity, and a lack of ground truth. Here, we demonstrate that creating realistic simulated images from human models is a viable alternative and complement to large-scale in situ data collection. We show that training AI image analysis models on realistically synthesized data, combined with contemporary domain generalization or adaptation techniques, results in models that perform comparably on real data to models trained on a precisely matched real data training set. Because synthetic generation of training data from human-based models scales easily, we find that our model transfer paradigm for X-ray image analysis, which we refer to as SyntheX, can even outperform real data-trained models due to the effectiveness of training on a larger dataset. We demonstrate the potential of SyntheX on three clinical tasks: hip image analysis, surgical robotic tool detection, and COVID-19 lung lesion segmentation. SyntheX provides an opportunity to drastically accelerate the conception, design, and evaluation of intelligent systems for X-ray-based medicine. In addition, simulated image environments provide the opportunity to test novel instrumentation, design complementary surgical approaches, and envision novel techniques that improve outcomes, save time, or mitigate human error, freed from the ethical and practical considerations of live human data collection.	Translation date: 2022-06-14 14:56:23 Publication date: 2022-06-13
# RPLHR-CT Dataset and Transformer Baseline for Volumetric Super-Resolution from CT Scans ( http://arxiv.org/abs/2206.06253v1 ) License: see link	Pengxin Yu, Haoyue Zhang, Han Kang, Wen Tang, Corey W. Arnold, Rongguo Zhang	In clinical practice, anisotropic volumetric medical images with low through-plane resolution are commonly used due to short acquisition time and lower storage cost. Nevertheless, the coarse resolution may lead to difficulties in medical diagnosis by either physicians or computer-aided diagnosis algorithms. Deep learning-based volumetric super-resolution (SR) methods are feasible ways to improve resolution, with convolutional neural networks (CNN) at their core. Despite recent progress, these methods are limited by inherent properties of convolution operators, which ignore content relevance and cannot effectively model long-range dependencies. In addition, most existing methods use pseudo-paired volumes for training and evaluation, where pseudo low-resolution (LR) volumes are generated by a simple degradation of their high-resolution (HR) counterparts. However, the domain gap between pseudo- and real-LR volumes leads to poor performance of these methods in practice. In this paper, we build the first public real-paired dataset, RPLHR-CT, as a benchmark for volumetric SR, and provide baseline results by re-implementing four state-of-the-art CNN-based methods. Considering the inherent shortcomings of CNNs, we also propose a transformer volumetric super-resolution network (TVSRN) based on attention mechanisms, dispensing with convolutions entirely. This is the first research to use a pure transformer for CT volumetric SR. The experimental results show that TVSRN significantly outperforms all baselines on both PSNR and SSIM. Moreover, the TVSRN method achieves a better trade-off between image quality, the number of parameters, and running time. Data and code are available at https://github.com/smilenaxx/RPLHR-CT.	Translation date: 2022-06-14 14:55:49 Publication date: 2022-06-13
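The entry above reports results in PSNR, one of the two image-quality metrics it names. As a reminder of how that metric is defined, here is a minimal sketch of the standard formula, PSNR = 10 log10(R^2 / MSE). The flat-list input format and the `data_range` default of 255 are assumptions for illustration; CT volumes typically use a different intensity range, and the abstract does not state the evaluation settings.

```python
import math


def psnr(reference, estimate, data_range=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length intensity lists."""
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between the reference and the reconstruction.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical signals
    return 10.0 * math.log10(data_range ** 2 / mse)


high_res = [10.0, 20.0, 30.0, 40.0]
restored = [11.0, 19.0, 31.0, 39.0]
print(psnr(high_res, restored))
```

Higher values mean the reconstruction is closer to the reference; because the metric depends on `data_range`, scores are only comparable when the same range is used.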
# Silver-Bullet-3D at ManiSkill 2021: Learning-from-Demonstrations and Heuristic Rule-based Methods for Object Manipulation ( http://arxiv.org/abs/2206.06289v1 ) License: see link	Yingwei Pan and Yehao Li and Yiheng Zhang and Qi Cai and Fuchen Long and Zhaofan Qiu and Ting Yao and Tao Mei	This paper presents an overview and comparative analysis of our systems designed for the following two tracks in SAPIEN ManiSkill Challenge 2021. No Interaction Track: this track targets learning policies from pre-collected demonstration trajectories. We investigate both an imitation learning-based approach, i.e., imitating the observed behavior using classical supervised learning techniques, and offline reinforcement learning-based approaches for this track. Moreover, the geometry and texture structures of objects and robotic arms are exploited via Transformer-based networks to facilitate imitation learning. No Restriction Track: in this track, we design a Heuristic Rule-based Method (HRM) to trigger high-quality object manipulation by decomposing the task into a series of sub-tasks. For each sub-task, simple rule-based control strategies are adopted to predict actions that can be applied to robotic arms. To ease the implementation of our systems, all source code and pre-trained models are available at https://github.com/caiqi/Silver-Bullet-3D/.	Translation date: 2022-06-14 14:55:20 Publication date: 2022-06-13
# Convergence for score-based generative modeling with polynomial complexity ( http://arxiv.org/abs/2206.06227v1 ) License: see linked page	Holden Lee and Jianfeng Lu and Yixin Tan	Score-based generative modeling (SGM) is a highly successful approach for learning a probability distribution from data and generating further samples. We prove the first polynomial convergence guarantees for the core mechanic behind SGM: drawing samples from a probability density $p$ given a score estimate (an estimate of $\nabla \ln p$) that is accurate in $L^2(p)$. Compared to previous works, we do not incur error that grows exponentially in time or that suffers from a curse of dimensionality. Our guarantee works for any smooth distribution and depends polynomially on its log-Sobolev constant. Using our guarantee, we give a theoretical analysis of score-based generative modeling, which transforms white-noise input into samples from a learned data distribution given score estimates at different noise scales. Our analysis gives theoretical grounding to the observation that an annealed procedure is required in practice to generate good samples, as our proof depends essentially on using annealing to obtain a warm start at each step. Moreover, we show that a predictor-corrector algorithm gives better convergence than using either portion alone.	Translated: 2022-06-14 14:54:48 Published: 2022-06-13
# Unsupervised inter-frame motion correction for whole-body dynamic PET using convolutional long short-term memory in a convolutional neural network ( http://arxiv.org/abs/2206.06341v1 ) License: see linked page	Xueqi Guo, Bo Zhou, David Pigg, Bruce Spottiswoode, Michael E. Casey, Chi Liu, Nicha C. Dvornek	Subject motion in whole-body dynamic PET introduces inter-frame mismatch and seriously impacts parametric imaging. Traditional non-rigid registration methods are generally computationally intensive and time-consuming. Deep learning approaches are promising in achieving high accuracy with fast speed, but have not yet been investigated with consideration for tracer distribution changes or in the whole-body scope. In this work, we developed an unsupervised automatic deep learning-based framework to correct inter-frame body motion. The motion estimation network is a convolutional neural network with a combined convolutional long short-term memory layer, fully utilizing dynamic temporal features and spatial information. Our dataset contains 27 subjects, each under a 90-min FDG whole-body dynamic PET scan. With 9-fold cross-validation, compared with both traditional and deep learning baselines, we demonstrated that the proposed network obtained superior performance in enhanced qualitative and quantitative spatial alignment between parametric $K_{i}$ and $V_{b}$ images and in significantly reduced parametric fitting error. We also showed the potential of the proposed motion correction method for impacting downstream analysis of the estimated parametric images, improving the ability to distinguish malignant from benign hypermetabolic regions of interest. Once trained, the motion estimation inference time of our proposed network was around 460 times faster than the conventional registration baseline, showing its potential to be easily applied in clinical settings.	Translated: 2022-06-14 14:54:29 Published: 2022-06-13
# Differentiable and Transportable Structure Learning ( http://arxiv.org/abs/2206.06354v1 ) License: CC BY 4.0	Jeroen Berrevoets, Nabeel Seedat, Fergus Imrie, Mihaela van der Schaar	We are interested in unsupervised structure learning with a particular focus on directed acyclic graphical (DAG) models. The compute required to infer these structures is typically super-exponential in the number of variables, as inference requires a sweep of a combinatorially large space of potential structures. That is, until recent advances made it possible to search this space using a differentiable metric, drastically reducing search time. While this technique, named NOTEARS, is widely considered a seminal work in DAG discovery, it concedes an important property in favour of differentiability: transportability. In our paper we introduce D-Struct, which recovers transportability in the found structures through a novel architecture and loss function, while remaining completely differentiable. As D-Struct remains differentiable, one can easily adopt our method in differentiable architectures, as was previously done with NOTEARS. In our experiments we empirically validate D-Struct with respect to edge accuracy and the structural Hamming distance.	Translated: 2022-06-14 14:53:02 Published: 2022-06-13
# Provable Benefit of Multitask Representation Learning in Reinforcement Learning ( http://arxiv.org/abs/2206.05900v1 ) License: see linked page	Yuan Cheng, Songtao Feng, Jing Yang, Hong Zhang, Yingbin Liang	While representation learning has become a powerful technique to reduce sample complexity in reinforcement learning (RL) in practice, theoretical understanding of its advantage is still limited. In this paper, we theoretically characterize the benefit of representation learning under the low-rank Markov decision process (MDP) model. We first study multitask low-rank RL (as upstream training), where all tasks share a common representation, and propose a new multitask reward-free algorithm called REFUEL. REFUEL learns both the transition kernel and the near-optimal policy for each task, and outputs a well-learned representation for downstream tasks. Our result demonstrates that multitask representation learning is provably more sample-efficient than learning each task individually, as long as the total number of tasks is above a certain threshold. We then study the downstream RL in both online and offline settings, where the agent is assigned a new task sharing the same representation as the upstream tasks. For both online and offline settings, we develop a sample-efficient algorithm, and show that it finds a near-optimal policy with the suboptimality gap bounded by the sum of the estimation error of the learned representation in upstream and a vanishing term as the number of downstream samples becomes large. Our downstream results of online and offline RL further capture the benefit of employing the learned representation from upstream as opposed to learning the representation of the low-rank model directly. To the best of our knowledge, this is the first theoretical study that characterizes the benefit of representation learning in exploration-based reward-free multitask RL for both upstream and downstream tasks.	Translated: 2022-06-14 14:28:01 Published: 2022-06-13
# Evaluating Graph Generative Models with Contrastively Learned Features ( http://arxiv.org/abs/2206.06234v1 ) License: see linked page	Hamed Shirzad and Kaveh Hassani and Danica J. Sutherland	A wide range of graph generative models have been proposed, necessitating effective methods to evaluate their quality. So far, most techniques use either traditional metrics based on subgraph counting, or the representations of randomly initialized Graph Neural Networks (GNNs). We propose using representations from contrastively trained GNNs, rather than random GNNs, and show this gives more reliable evaluation metrics. Neither traditional approaches nor GNN-based approaches dominate the other, however: we give examples of graphs that each approach is unable to distinguish. We demonstrate that Graph Substructure Networks (GSNs), which in a way combine both approaches, are better at distinguishing the distances between graph datasets.	Translated: 2022-06-14 14:27:34 Published: 2022-06-13
# Near-Optimal Sample Complexity Bounds for Constrained MDPs ( http://arxiv.org/abs/2206.06270v1 ) License: see linked page	Sharan Vaswani, Lin F. Yang, Csaba Szepesvári	In contrast to the advances in characterizing the sample complexity for solving Markov decision processes (MDPs), the optimal statistical complexity for solving constrained MDPs (CMDPs) remains unknown. We resolve this question by providing minimax upper and lower bounds on the sample complexity for learning near-optimal policies in a discounted CMDP with access to a generative model (simulator). In particular, we design a model-based algorithm that addresses two settings: (i) relaxed feasibility, where small constraint violations are allowed, and (ii) strict feasibility, where the output policy is required to satisfy the constraint. For (i), we prove that our algorithm returns an $\epsilon$-optimal policy with probability $1 - \delta$, by making $\tilde{O}\left(\frac{S A \log(1/\delta)}{(1 - \gamma)^3 \epsilon^2}\right)$ queries to the generative model, thus matching the sample complexity for unconstrained MDPs. For (ii), we show that the algorithm's sample complexity is upper-bounded by $\tilde{O}\left(\frac{S A \log(1/\delta)}{(1 - \gamma)^5 \epsilon^2 \zeta^2}\right)$, where $\zeta$ is the problem-dependent Slater constant that characterizes the size of the feasible region. Finally, we prove a matching lower bound for the strict feasibility setting, thus obtaining the first near minimax optimal bounds for discounted CMDPs. Our results show that learning CMDPs is as easy as MDPs when small constraint violations are allowed, but inherently more difficult when we demand zero constraint violation.	Translated: 2022-06-14 14:27:22 Published: 2022-06-13
# Pixel to Binary Embedding Towards Robustness for CNNs ( http://arxiv.org/abs/2206.05898v1 ) License: see linked page	Ikki Kishida and Hideki Nakayama	There are several problems with the robustness of Convolutional Neural Networks (CNNs). For example, the prediction of a CNN can be changed by adding a small amount of noise to an input, and the performance of CNNs degrades when the distribution of the input is shifted by a transformation never seen during training (e.g., a blur effect). There are approaches that replace pixel values with binary embeddings to tackle the problem of adversarial perturbations, and they successfully improve robustness. In this work, we propose Pixel to Binary Embedding (P2BE) to improve the robustness of CNNs. P2BE is a learnable binary embedding method, as opposed to previous hand-coded binary embedding methods. P2BE outperforms other binary embedding methods in robustness against adversarial perturbations and visual corruptions that are not seen during training.	Translated: 2022-06-14 14:24:44 Published: 2022-06-13
# Faster Optimization-Based Meta-Learning Adaptation Phase ( http://arxiv.org/abs/2206.05930v1 ) License: see linked page	Kostiantyn Khabarlak	Neural networks require a large amount of annotated data to learn. Meta-learning algorithms propose a way to decrease the number of training samples to only a few. One of the most prominent optimization-based meta-learning algorithms is Model-Agnostic Meta-Learning (MAML). However, the key procedure of adaptation to new tasks in MAML is quite slow. In this work we propose an improvement to the MAML meta-learning algorithm. We introduce Lambda patterns, by which we restrict which weights are updated in the network during the adaptation phase. This makes it possible to skip certain gradient computations. The fastest pattern is selected given an allowed quality degradation threshold parameter. In certain cases, quality improvement is possible through careful pattern selection. The experiments conducted have shown that, via Lambda adaptation pattern selection, it is possible to significantly improve the MAML method in the following areas: adaptation time has been decreased by a factor of 3 with minimal accuracy loss; accuracy for one-step adaptation has been substantially improved.	Translated: 2022-06-14 14:24:30 Published: 2022-06-13
# GoToNet: Fast Monocular Scene Exposure and Exploration ( http://arxiv.org/abs/2206.05967v1 ) License: see linked page	Tom Avrech, Evgenii Zheltonozhskii, Chaim Baskin, Ehud Rivlin	Autonomous scene exposure and exploration, especially in localization- or communication-denied areas, is useful for finding targets in unknown scenes and remains a challenging problem in computer navigation. In this work, we present a novel method for real-time environment exploration, whose only requirements are a visually similar dataset for pre-training, enough lighting in the scene, and an on-board forward-looking RGB camera for environmental sensing. As opposed to existing methods, our method requires only one look (image) to make a good tactical decision, and therefore runs in constant, non-growing time. Two direction predictions, characterized by pixels dubbed the Goto and Lookat pixels, comprise the core of our method. These pixels encode the recommended flight instructions in the following way: the Goto pixel defines the direction in which the agent should move by one distance unit, and the Lookat pixel defines the direction in which the camera should be pointing in the next step. These flying-instruction pixels are optimized to expose the largest amount of currently unexplored areas. Our method presents a novel deep learning-based navigation approach that is able to solve this problem and demonstrate its ability in an even more complicated setup, i.e., when computational power is limited. In addition, we propose a way to generate a navigation-oriented dataset, enabling efficient training of our method using RGB and depth images. Tests conducted in a simulator, evaluating both the inference of the sparse pixels' coordinates and 2D and 3D test flights aimed at unveiling areas and decreasing distances to targets, achieve promising results. Comparison against a state-of-the-art algorithm shows our method is able to outperform it when measuring new voxels per camera pose, minimum distance to target, percentage of surface voxels seen, and compute time.	Translated: 2022-06-14 14:24:19 Published: 2022-06-13
# Discovering Object Masks with Transformers for Unsupervised Semantic Segmentation ( http://arxiv.org/abs/2206.06363v1 ) License: see linked page	Wouter Van Gansbeke, Simon Vandenhende, Luc Van Gool	The task of unsupervised semantic segmentation aims to cluster pixels into semantically meaningful groups. Specifically, pixels assigned to the same cluster should share high-level semantic properties like their object or part category. This paper presents MaskDistill: a novel framework for unsupervised semantic segmentation based on three key ideas. First, we advocate a data-driven strategy to generate object masks that serve as a pixel grouping prior for semantic segmentation. This approach omits handcrafted priors, which are often designed for specific scene compositions and limit the applicability of competing frameworks. Second, MaskDistill clusters the object masks to obtain pseudo-ground-truth for training an initial object segmentation model. Third, we leverage this model to filter out low-quality object masks. This strategy mitigates the noise in our pixel grouping prior and results in a clean collection of masks which we use to train a final segmentation model. By combining these components, we can considerably outperform previous works for unsupervised semantic segmentation on PASCAL (+11% mIoU) and COCO (+4% mask AP50). Interestingly, as opposed to existing approaches, our framework does not latch onto low-level image cues and is not limited to object-centric datasets. The code and models will be made available.	Translated: 2022-06-14 14:23:46 Published: 2022-06-13
# Exploring Structure-aware Transformer over Interaction Proposals for Human-Object Interaction Detection ( http://arxiv.org/abs/2206.06291v1 ) License: see linked page	Yong Zhang and Yingwei Pan and Ting Yao and Rui Huang and Tao Mei and Chang-Wen Chen	Recent high-performing Human-Object Interaction (HOI) detection techniques have been highly influenced by the Transformer-based object detector (i.e., DETR). Nevertheless, most of them directly map parametric interaction queries into a set of HOI predictions through a vanilla Transformer in a one-stage manner. This leaves rich inter- or intra-interaction structure under-exploited. In this work, we design a novel Transformer-style HOI detector, i.e., Structure-aware Transformer over Interaction Proposals (STIP), for HOI detection. This design decomposes the process of HOI set prediction into two subsequent phases: interaction proposal generation is performed first, followed by transforming the non-parametric interaction proposals into HOI predictions via a structure-aware Transformer. The structure-aware Transformer upgrades the vanilla Transformer by additionally encoding the holistically semantic structure among interaction proposals as well as the locally spatial structure of human/object within each interaction proposal, so as to strengthen HOI predictions. Extensive experiments conducted on the V-COCO and HICO-DET benchmarks have demonstrated the effectiveness of STIP, and superior results are reported when comparing with state-of-the-art HOI detectors. Source code is available at https://github.com/zyong812/STIP.	Translated: 2022-06-14 14:21:03 Published: 2022-06-13
# MLP-3D: A MLP-like 3D Architecture with Grouped Time Mixing ( http://arxiv.org/abs/2206.06292v1 ) License: see linked page	Zhaofan Qiu and Ting Yao and Chong-Wah Ngo and Tao Mei	Convolutional Neural Networks (CNNs) have been regarded as the go-to models for visual recognition. More recently, convolution-free networks, based on multi-head self-attention (MSA) or multi-layer perceptrons (MLPs), have become increasingly popular. Nevertheless, utilizing these newly-minted networks for video recognition is not trivial, due to the large variations and complexities in video data. In this paper, we present MLP-3D networks, a novel MLP-like 3D architecture for video recognition. Specifically, the architecture consists of MLP-3D blocks, where each block contains one MLP applied across tokens (i.e., token-mixing MLP) and one MLP applied independently to each token (i.e., channel MLP). By deriving the novel grouped time mixing (GTM) operations, we equip the basic token-mixing MLP with the ability of temporal modeling. GTM divides the input tokens into several temporal groups and linearly maps the tokens in each group with the shared projection matrix. Furthermore, we devise several variants of GTM with different grouping strategies, and compose each variant in different blocks of the MLP-3D network by greedy architecture search. Without depending on convolutions or attention mechanisms, our MLP-3D networks achieve 68.5%/81.4% top-1 accuracy on the Something-Something V2 and Kinetics-400 datasets, respectively. Despite requiring less computation, the results are comparable to state-of-the-art, widely used 3D CNNs and video transformers. Source code is available at https://github.com/ZhaofanQiu/MLP-3D.	Translated: 2022-06-14 14:20:41 Published: 2022-06-13
# Satellite-based high-resolution maps of cocoa planted area for Côte d'Ivoire and Ghana ( http://arxiv.org/abs/2206.06119v1 ) License: CC BY 4.0	Nikolai Kalischek, Nico Lang, Cécile Renier, Rodrigo Caye Daudt, Thomas Addoah, William Thompson, Wilma J. Blaser-Hart, Rachael Garrett, Konrad Schindler, Jan D. Wegner	Côte d'Ivoire and Ghana, the world's largest producers of cocoa, account for two thirds of global cocoa production. In both countries, cocoa is the primary perennial crop, providing income to almost two million farmers. Yet precise maps of cocoa planted area are missing, hindering accurate quantification of expansion in protected areas, production and yields, and limiting information available for improved sustainability governance. Here, we combine cocoa plantation data with publicly available satellite imagery in a deep learning framework and create high-resolution maps of cocoa plantations for both countries, validated in situ. Our results suggest that cocoa cultivation is an underlying driver of over 37% and 13% of forest loss in protected areas in Côte d'Ivoire and Ghana, respectively, and that official reports substantially underestimate the planted area, by up to 40% in Ghana. These maps serve as a crucial building block to advance understanding of conservation and economic development in cocoa-producing regions.	Translated: 2022-06-14 14:18:27 Published: 2022-06-13
# Biologically Inspired Neural Path Finding ( http://arxiv.org/abs/2206.05971v1 ) License: see linked page	Hang Li, Qadeer Khan, Volker Tresp, Daniel Cremers	The human brain can be considered to be a graphical structure comprising tens of billions of biological neurons connected by synapses. It has the remarkable ability to automatically re-route information flow through alternate paths in case some neurons are damaged. Moreover, the brain is capable of retaining information and applying it to similar but completely unseen scenarios. In this paper, we take inspiration from these attributes of the brain to develop a computational framework to find the optimal low-cost path between a source node and a destination node in a generalized graph. We show that our framework is capable of handling unseen graphs at test time. Moreover, it can find alternate optimal paths when nodes are arbitrarily added or removed during inference, while maintaining a fixed prediction time. Code is available here: https://github.com/hangligit/pathfinding	Translated: 2022-06-14 13:56:37 Published: 2022-06-13
# Relative Policy-Transition Optimization for Fast Policy Transfer ( http://arxiv.org/abs/2206.06009v1 ) License: see linked page	Lei Han, Jiawei Xu, Cheng Zhou, Yizheng Zhang, Zhengyou Zhang	We consider the problem of policy transfer between two Markov Decision Processes (MDPs). We introduce a lemma based on existing theoretical results in reinforcement learning (RL) to measure the relativity between two arbitrary MDPs, that is, the difference between any two cumulative expected returns defined on different policies and environment dynamics. Based on this lemma, we propose two new algorithms referred to as Relative Policy Optimization (RPO) and Relative Transition Optimization (RTO), which can offer fast policy transfer and dynamics modeling, respectively. RPO updates the policy using the relative policy gradient to transfer the policy evaluated in one environment to maximize the return in another, while RTO updates the parameterized dynamics model (if one exists) using the relative transition gradient to reduce the gap between the dynamics of the two environments. Then, integrating the two algorithms offers the complete algorithm Relative Policy-Transition Optimization (RPTO), in which the policy interacts with the two environments simultaneously, such that data collection from the two environments and the policy and transition updates are completed in one closed loop to form a principled learning framework for policy transfer. We demonstrate the effectiveness of RPTO in OpenAI gym's classic control tasks by creating policy transfer problems via variant dynamics.	Translated: 2022-06-14 13:56:25 Published: 2022-06-13
# ディープニューラルネットワークにおけるランク低下 Rank Diminishing in Deep Neural Networks ( http://arxiv.org/abs/2206.06072v1 ) ライセンス: Link先を確認	Ruili Feng, Kecheng Zheng, Yukun Huang, Deli Zhao, Michael Jordan, Zheng-Jun Zha	(参考訳) ニューラルネットワークのランクは、層をまたがる情報を測定する。これは、機械学習の幅広い領域にまたがる重要な構造的条件の例である。特に、低ランクな特徴表現の仮定は多くのアーキテクチャにおいてアルゴリズム的な発展をもたらす。しかし、ニューラルネットワークでは、低ランク構造を生み出す固有のメカニズムはあいまいで不明瞭である。このギャップを埋めるために,ネットワークランクの挙動に関する厳密な研究を行い,特にランク不足の概念に着目した。微分および代数的構成の基本規則からネットワークランクの普遍的単調減少特性を理論的に確立し,ネットワークブロックのランク不足と深い関数結合を明らかにする。この数値計算手法を用いて,imagenet上のネットワークランクの層毎挙動,すなわちresnet,deep mlp,transformerの実用場面における最初の経験的解析を行う。これらの実験結果は我々の理論と直接一致している。さらに,特定のカテゴリの分類信頼度を,他のカテゴリの信頼度によって線形に決定できるディープネットワークのランク不足によって生じる,新たな独立性の欠如現象を明らかにした。この研究の理論的結果は、経験的な発見とともに、ディープニューラルネットワークの本質的原理の理解を深める可能性がある。 The rank of neural networks measures information flowing across layers. It is an instance of a key structural condition that applies across broad domains of machine learning. In particular, the assumption of low-rank feature representations leads to algorithmic developments in many architectures. For neural networks, however, the intrinsic mechanism that yields low-rank structures remains vague and unclear. To fill this gap, we perform a rigorous study on the behavior of network rank, focusing particularly on the notion of rank deficiency. We theoretically establish a universal monotonic decreasing property of network rank from the basic rules of differential and algebraic composition, and uncover rank deficiency of network blocks and deep function coupling. By virtue of our numerical tools, we provide the first empirical analysis of the per-layer behavior of network rank in practical settings, i.e., ResNets, deep MLPs, and Transformers on ImageNet. These empirical results are in direct accord with our theory. 
Furthermore, we reveal a novel phenomenon of independence deficit caused by the rank deficiency of deep networks, where classification confidence of a given category can be linearly decided by the confidence of a handful of other categories. The theoretical results of this work, together with the empirical findings, may advance understanding of the inherent principles of deep neural networks.	翻訳日:2022-06-14 13:56:03 公開日:2022-06-13
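The monotonic rank decrease described in this abstract can be illustrated with the elementary fact that rank(AB) ≤ min(rank(A), rank(B)). The sketch below is only an illustration under that assumption: it composes three hand-picked linear maps (the `layers` matrices and the helper names are hypothetical, not taken from the paper) and tracks the rank of the composition with exact rational arithmetic.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank via exact Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a non-zero entry in this column.
        pivot = next((r for r in range(rank, n_rows) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

# Hypothetical "layers": each matrix is one linear map applied after the previous one.
layers = [
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],  # full rank
    [[1, 2, 3], [2, 4, 6], [0, 1, 1]],  # second row is twice the first
    [[1, 1, 0], [1, 1, 0], [2, 2, 0]],  # all rows proportional
]

composed = layers[0]
ranks = [matrix_rank(composed)]
for weight in layers[1:]:
    composed = matmul(weight, composed)
    ranks.append(matrix_rank(composed))
```

In a real network the relevant objects would be layer Jacobians rather than fixed matrices, so this only shows why composition can never restore rank once it has been lost.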
# 時系列の教師なし領域適応のためのコントラスト学習 Contrastive Learning for Unsupervised Domain Adaptation of Time Series ( http://arxiv.org/abs/2206.06243v1 ) ライセンス: Link先を確認	Yilmazcan Ozyurt, Stefan Feuerriegel, Ce Zhang	(参考訳) unsupervised domain adaptation (uda) は、ラベル付きソースドメインを使用して機械学習モデルを学習することを目的としている。 UDAは医療などの多くの分野で重要であり、様々な患者コホートにリスクスコアを適応させるのに用いられる。本稿では,CLUDAと呼ばれる時系列データのUDAのための新しいフレームワークを開発する。具体的には,多変量時系列におけるドメイン不変セマンティクスを学習するための対照的な学習フレームワークを提案する。また,本フレームワークでは,最寄りのコントラスト学習により,ソースドメインとターゲットドメインのセマンティックな変化を捉える。我々の知る限りでは、時系列データのUDAのドメイン不変セマンティック情報を学ぶための最初のフレームワークである。我々は,医療時系列を用いた大規模実世界のデータセット(MIMIC-IVとアムステルダムUMCdb)を用いて,その有効性を実証し,時系列UDAの最先端性能を実現することを示す。 Unsupervised domain adaptation (UDA) aims at learning a machine learning model using a labeled source domain that performs well on a similar yet different, unlabeled target domain. UDA is important in many applications such as medicine, where it is used to adapt risk scores across different patient cohorts. In this paper, we develop a novel framework for UDA of time series data, called CLUDA. Specifically, we propose a contrastive learning framework to learn domain-invariant semantics in multivariate time series, so that these preserve label information for the prediction task. In our framework, we further capture semantic variation between source and target domain via nearest-neighbor contrastive learning. To the best of our knowledge, ours is the first framework to learn domain-invariant semantic information for UDA of time series data. We evaluate our framework using large-scale, real-world datasets with medical time series (i.e., MIMIC-IV and AmsterdamUMCdb) to demonstrate its effectiveness and show that it achieves state-of-the-art performance for time series UDA.	翻訳日:2022-06-14 13:55:28 公開日:2022-06-13
# 予測プロセスモニタリング改善のためのニューラルネットワークによる不確かさの学習 Learning Uncertainty with Artificial Neural Networks for Improved Predictive Process Monitoring ( http://arxiv.org/abs/2206.06317v1 ) ライセンス: Link先を確認	Hans Weytjens and Jochen De Weerdt	(参考訳) 人工ニューラルネットワークが予測の不確実性を評価することができないことは、その普及に障害となる。学習データ不足によるモデル不確かさとノイズによる観測不確実性との2つのタイプを区別する。ベイズニューラルネットワークは、予測のモデルの不確実性を学ぶために、堅固な数学的基盤を使用する。観測の不確実性は、これらのネットワークに1つの層を追加し、損失関数を増強することで計算することができる。我々の貢献は、これらの不確実性概念を予測プロセス監視タスクに適用し、不確実性に基づくモデルを訓練し、残りの時間と結果を予測することである。実験の結果,不確実性推定により,より精度の低い予測が可能であり,信頼性区間は回帰と分類の両方で構築できることがわかった。これらの結論は、実行中のプロセスの初期段階でも当てはまります。さらに、デプロイされたテクニックは高速で、より正確な予測を生成する。学習された不確実性によって、プロセス予測システムに対するユーザの信頼が高まり、人とシステム間のコラボレーションが向上し、より小さなデータセットによる初期の実装が可能になる。 The inability of artificial neural networks to assess the uncertainty of their predictions is an impediment to their widespread use. We distinguish two types of learnable uncertainty: model uncertainty due to a lack of training data and noise-induced observational uncertainty. Bayesian neural networks use solid mathematical foundations to learn the model uncertainties of their predictions. The observational uncertainty can be calculated by adding one layer to these networks and augmenting their loss functions. Our contribution is to apply these uncertainty concepts to predictive process monitoring tasks to train uncertainty-based models to predict the remaining time and outcomes. Our experiments show that uncertainty estimates allow more and less accurate predictions to be differentiated and confidence intervals to be constructed in both regression and classification tasks. These conclusions remain true even in early stages of running processes. Moreover, the deployed techniques are fast and produce more accurate predictions. The learned uncertainty could increase users' confidence in their process prediction systems, promote better cooperation between humans and these systems, and enable earlier implementations with smaller datasets.	翻訳日:2022-06-14 13:53:33 公開日:2022-06-13
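The abstract says observational uncertainty can be learned by adding one output layer and augmenting the loss. One common way to do this, assumed here rather than taken from the paper, is to let the network predict a log-variance next to the mean and train with a Gaussian negative log-likelihood. The function name and arguments below are hypothetical.

```python
import math

def heteroscedastic_loss(y_true, mean, log_variance):
    """Gaussian negative log-likelihood (up to a constant) for one sample.

    The network is assumed to output both `mean` and `log_variance`;
    predicting the log keeps the variance positive without constraints.
    """
    precision = math.exp(-log_variance)
    return 0.5 * precision * (y_true - mean) ** 2 + 0.5 * log_variance

# A large error is penalised less when the model also reports high variance,
# but the 0.5 * log_variance term stops it from claiming high variance everywhere.
confident_wrong = heteroscedastic_loss(3.0, 1.0, 0.0)
uncertain_wrong = heteroscedastic_loss(3.0, 1.0, math.log(4.0))
```

Model uncertainty would be estimated separately, for example from the spread of predictions across several stochastic forward passes.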
# (参考訳) 知識グラフの構築と放射線科医による自動放射線学レポート作成への応用 Knowledge Graph Construction and Its Application in Automatic Radiology Report Generation from Radiologist's Dictation ( http://arxiv.org/abs/2206.06308v1 ) ライセンス: CC BY 4.0	Kaveri Kale, Pushpak Bhattacharyya, Aditya Shetty, Miling Gune, Kush Shrivastava, Rustom Lawyer and Spriha Biswas	(参考訳) 従来、放射線科医は診断ノートを作成し、それを転写学者と共有する。その後、書き起こし師はメモを参照して予備書式レポートを作成し、最後に、放射線学者はレポートをレビューし、エラーを修正し、サインオフする。このワークフローはレポートに重大な遅延とエラーを引き起こす。本研究は,情報抽出(IE)やドメイン固有知識グラフ(KG)といったNLP技術を用いて,放射線技師の指示から放射線学レポートを自動生成することに焦点を当てている。本稿は,既存の大量の自由テキストラジオグラフィーレポートから情報を抽出し,各臓器のKG構築に焦点を当てる。本研究では,ルールベース,パターンベース,辞書ベースの手法と語彙意味的特徴を組み合わせた情報抽出パイプラインを構築し,エンティティと関係を抽出する。短いディクテーションで欠落した情報は、kgsからアクセスでき、病理的な記述が生成される。生成した病理的記述は、金標準病理的記述と97%の類似性を示す意味的類似度メトリクスを用いて評価される。また,本分析の結果から,我々のIEモジュールは放射線学領域のOpenIEツールよりも優れた性能を示している。さらに, 放射線科医による手作業による定性解析を行い, 生成した報告の80～85%が正しく書かれ, 残りは部分的に正しいことを示した。 Conventionally, the radiologist prepares the diagnosis notes and shares them with the transcriptionist. Then the transcriptionist prepares a preliminary formatted report referring to the notes, and finally, the radiologist reviews the report, corrects the errors, and signs off. This workflow causes significant delays and errors in the report. In current research work, we focus on applications of NLP techniques like Information Extraction (IE) and domain-specific Knowledge Graph (KG) to automatically generate radiology reports from radiologist's dictation. This paper focuses on KG construction for each organ by extracting information from an existing large corpus of free-text radiology reports. We develop an information extraction pipeline that combines rule-based, pattern-based, and dictionary-based techniques with lexical-semantic features to extract entities and relations. Missing information in short dictation can be accessed from the KGs to generate pathological descriptions and hence the radiology report. 
The generated pathological descriptions are evaluated using semantic similarity metrics, which show 97% similarity with gold-standard pathological descriptions. Our analysis also shows that our IE module performs better than the OpenIE tool in the radiology domain. Furthermore, we include a manual qualitative analysis by radiologists, which shows that 80-85% of the generated reports are correctly written and the remainder are partially correct.	翻訳日:2022-06-14 13:50:08 公開日:2022-06-13
# EnergyMatch:セミスーパービジョンラーニングのためのエネルギーベース擬似ラベル EnergyMatch: Energy-based Pseudo-Labeling for Semi-Supervised Learning ( http://arxiv.org/abs/2206.06359v1 ) ライセンス: Link先を確認	Zhuoran Yu, Yin Li, Yong Jae Lee	(参考訳) 半教師付き学習(SSL)における最近の最先端手法は、整合性正規化と信頼に基づく疑似ラベルを組み合わせる。高品質な擬似ラベルを得るには、一般的に高い信頼しきい値を採用する。しかし,深層ネットワークにおけるソフトマックスに基づく信頼度スコアは,トレーニングデータから離れたサンプルでは任意に高い値となり,信頼性の低いサンプルであっても疑似ラベルは信頼できない可能性がある。本研究では,モデル信頼度に頼らずに,ラベルなしサンプルが"分布内"である可能性が高いか,すなわち現在のトレーニングデータに近いかを測定する。ラベルのないサンプルが「分布内」か「分布外」かを分類するために、分布外検出文献からのエネルギースコアを採用する。トレーニングが進み、ラベルのないサンプルが流通し、トレーニングに寄与するにつれて、ラベル付きデータと擬ラベル付きデータを組み合わせることで、真の分布を近似してモデルを改善することができる。提案手法は, 概念的には単純であるが, 不均衡sslベンチマークにおける信頼度ベース手法を著しく上回っており, クラスバランスデータにおける競合性能を実現した。例えば、不均衡比が50を超えると、cifar10-ltの絶対精度が4-6%向上する。最先端のロングテールSSLメソッドと組み合わせると、さらなる改善が達成される。 Recent state-of-the-art methods in semi-supervised learning (SSL) combine consistency regularization with confidence-based pseudo-labeling. To obtain high-quality pseudo-labels, a high confidence threshold is typically adopted. However, it has been shown that softmax-based confidence scores in deep networks can be arbitrarily high for samples far from the training data, and thus, the pseudo-labels for even high-confidence unlabeled samples may still be unreliable. In this work, we present a new perspective of pseudo-labeling: instead of relying on model confidence, we instead measure whether an unlabeled sample is likely to be "in-distribution"; i.e., close to the current training data. To classify whether an unlabeled sample is "in-distribution" or "out-of-distribution", we adopt the energy score from out-of-distribution detection literature. As training progresses and more unlabeled samples become in-distribution and contribute to training, the combined labeled and pseudo-labeled data can better approximate the true distribution to improve the model. 
Experiments demonstrate that our energy-based pseudo-labeling method, albeit conceptually simple, significantly outperforms confidence-based methods on imbalanced SSL benchmarks, and achieves competitive performance on class-balanced data. For example, it produces a 4-6% absolute accuracy improvement on CIFAR10-LT when the imbalance ratio is higher than 50. When combined with state-of-the-art long-tailed SSL methods, further improvements are attained.	翻訳日:2022-06-14 13:36:37 公開日:2022-06-13
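The energy score borrowed from the out-of-distribution detection literature is usually defined as E(x) = -T · log Σ_k exp(f_k(x) / T) over the logits f(x). The sketch below shows how such a score could gate pseudo-labels; the threshold value, the temperature default, and the function names are illustrative assumptions, not the paper's settings.

```python
import math

def energy_score(logits, temperature=1.0):
    """Free energy of a logit vector; lower values suggest in-distribution inputs."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    # Subtract the maximum before exponentiating for numerical stability.
    log_sum_exp = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return -temperature * log_sum_exp

def pseudo_label(logits, energy_threshold):
    """Return the argmax class if the sample looks in-distribution, else None."""
    if energy_score(logits) < energy_threshold:
        return max(range(len(logits)), key=lambda k: logits[k])
    return None

# Hypothetical logits: one peaked prediction, one nearly flat prediction.
accepted = pseudo_label([8.0, 0.5, 0.1], energy_threshold=-5.0)
rejected = pseudo_label([0.3, 0.2, 0.1], energy_threshold=-5.0)
```

Unlike a softmax confidence threshold, this test depends on the overall magnitude of the logits rather than only on their relative sizes.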
# 2次元ホログラフィック縮小表現を用いた信頼できないプラットフォームへの畳み込みネットワークの展開 Deploying Convolutional Networks on Untrusted Platforms Using 2D Holographic Reduced Representations ( http://arxiv.org/abs/2206.05893v1 ) ライセンス: Link先を確認	Mohammad Mahmudul Alam, Edward Raff, Tim Oates, James Holt	(参考訳) ニューラルネットワークの推論を実行する計算コストのため、サードパーティの計算環境やハードウェアに推論ステップをデプロイする必要が一般的である。第三者が完全に信頼されていない場合、入力と出力の性質を難読化することが望ましいので、第三者が実行中の特定のタスクを容易に決定できない。信頼できない相手を利用するための安全なプロトコルは存在するが、実際に実行するには計算要求が多すぎる。代わりに、Connectionist Symbolic Pseudo Secretsと呼ばれる、高速でヒューリスティックなセキュリティの異なる戦略を探求します。ホログラフィック縮小表現(HRR)を利用することで、非現実的に敵に有利な脅威モデルの下でも、攻撃に対する堅牢性を示す疑似暗号化スタイルの防御を備えたニューラルネットワークを構築する。 Due to the computational cost of running inference for a neural network, the need to deploy the inferential steps on a third party's compute environment or hardware is common. If the third party is not fully trusted, it is desirable to obfuscate the nature of the inputs and outputs, so that the third party cannot easily determine what specific task is being performed. Provably secure protocols for leveraging an untrusted party exist but are too computationally demanding to run in practice. We instead explore a different strategy of fast, heuristic security that we call Connectionist Symbolic Pseudo Secrets. By leveraging Holographic Reduced Representations (HRR), we create a neural network with a pseudo-encryption style defense that empirically shows robustness to attack, even under threat models that unrealistically favor the adversary.	翻訳日:2022-06-14 13:36:14 公開日:2022-06-13
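Holographic Reduced Representations bind two vectors by circular convolution and recover one of them approximately by convolving with the involution of the other. The sketch below shows the plain one-dimensional operation with a random "secret" vector acting as a key. It is a minimal illustration and not the paper's 2D scheme; the vector size, seed, and all names are assumptions.

```python
import math
import random

def bind(a, b):
    """Circular convolution: result[k] = sum_j a[j] * b[(k - j) mod n]."""
    n = len(a)
    return [sum(a[j] * b[(k - j) % n] for j in range(n)) for k in range(n)]

def involution(a):
    """Approximate inverse for unbinding: element j becomes a[(-j) mod n]."""
    n = len(a)
    return [a[(-j) % n] for j in range(n)]

def unbind(key, bound):
    """Approximately recover the vector that was bound with `key`."""
    return bind(involution(key), bound)

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

def random_vector(n, rng):
    # Elements drawn from N(0, 1/n), the usual HRR initialisation.
    return [rng.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(n)]

rng = random.Random(0)
n = 256
secret = random_vector(n, rng)    # known only to the data owner
payload = random_vector(n, rng)   # stands in for an input to be hidden
bound = bind(secret, payload)     # what the untrusted party would see

recovered = unbind(secret, bound)
wrong_guess = unbind(random_vector(n, rng), bound)
```

With the correct key, `recovered` should correlate clearly with `payload`, while unbinding with an unrelated key should give a vector that looks like noise. The recovery is approximate by construction, which is why this is a heuristic obfuscation rather than encryption.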
# ar-nerf: 開口レンダリングニューラルラミアンスフィールドを用いた自然画像からの奥行きとデフォーカス効果の教師なし学習 AR-NeRF: Unsupervised Learning of Depth and Defocus Effects from Natural Images with Aperture Rendering Neural Radiance Fields ( http://arxiv.org/abs/2206.06100v1 ) ライセンス: Link先を確認	Takuhiro Kaneko	(参考訳) データ収集の利点から、完全に教師なしの3D表現学習が注目を集めている。成功したアプローチは、生成モデル(例えば、gans)に基づく画像分布を学習し、3d認識モデル(例えば、神経放射野(nerfs))に基づいて様々なビュー画像を生成する視点認識アプローチである。しかし、トレーニングには様々なビューのイメージが必要であるため、少ない視点や限られた視点のデータセットへの適用は依然として課題である。相補的なアプローチとして,デフォーカスキューを用いた開口レンダリングGAN(AR-GAN)を提案する。しかし、ar-ganはcnnベースのモデルであり、相関度が高いにもかかわらず、視点の変化とは独立にデフォーカスを表現する。 AR-GANの代替として、共通のレイトレーシングフレームワークにおいて両因子を表現し、視点とデフォーカスの手がかりを統一的に活用できる開口レンダリングNeRF(AR-NeRF)を提案する。さらに,デフォーカス認識とデフォーカス非依存表現を不連続に学習するために,開口サイズと潜在符号を独立にランダム化しながら画像を生成するアパーチャランダム化トレーニングを提案する。実験では, 花, 鳥, 顔画像などの自然画像データセットにAR-NeRFを適用し, 深度とデフォーカス効果の教師なし学習におけるAR-NeRFの有用性を実証した。 Fully unsupervised 3D representation learning has gained attention owing to its advantages in data collection. A successful approach involves a viewpoint-aware approach that learns an image distribution based on generative models (e.g., generative adversarial networks (GANs)) while generating various view images based on 3D-aware models (e.g., neural radiance fields (NeRFs)). However, they require images with various views for training, and consequently, their application to datasets with few or limited viewpoints remains a challenge. As a complementary approach, an aperture rendering GAN (AR-GAN) that employs a defocus cue was proposed. However, an AR-GAN is a CNN-based model and represents a defocus independently from a viewpoint change despite its high correlation, which is one of the reasons for its performance. As an alternative to an AR-GAN, we propose an aperture rendering NeRF (AR-NeRF), which can utilize viewpoint and defocus cues in a unified manner by representing both factors in a common ray-tracing framework. 
Moreover, to learn defocus-aware and defocus-independent representations in a disentangled manner, we propose aperture randomized training, for which we learn to generate images while randomizing the aperture size and latent codes independently. During our experiments, we applied AR-NeRF to various natural image datasets, including flower, bird, and face images, the results of which demonstrate the utility of AR-NeRF for unsupervised learning of the depth and defocus effects.	翻訳日:2022-06-14 13:35:41 公開日:2022-06-13
# jiuzhang: 数学問題理解のための中国語事前学習言語モデル JiuZhang: A Chinese Pre-trained Language Model for Mathematical Problem Understanding ( http://arxiv.org/abs/2206.06315v1 ) ライセンス: Link先を確認	Wayne Xin Zhao, Kun Zhou, Zheng Gong, Beichen Zhang, Yuanhang Zhou, Jing Sha, Zhigang Chen, Shijin Wang, Cong Liu, Ji-Rong Wen	(参考訳) 本稿では,中国初の数学事前学習言語モデル(plm)を提示することで,機械の数学的知性を向上させることを目的とする。他の標準のNLPタスクとは異なり、数学的テキストは問題文に数学的用語、記号、公式を含むため理解が難しい。一般に、数学問題を解決するには複雑な数学的論理と背景知識が必要である。数学テキストの複雑な性質を考慮し,基礎科と高等科の両方からなる数学plmの学習を改善するための新しいカリキュラム事前学習手法を考案する。具体的には,まず位置バイアスマスキング戦略に基づいてトークンレベルの事前学習を行い,その後,シャッフル文と式をそれぞれ復元する論理に基づく事前学習タスクを設計する。最後に,plmが生成したソリューションのエラーの検出と修正を強制する,より難しい事前学習タスクを導入する。オフライン評価(9つの数学関連タスクを含む)とオンラインの$A/B$テストについて広範な実験を行った。実験により, 提案手法の有効性を, 競争力のあるベースラインと比較した。コードは \textcolor{blue}{\url{https://github.com/rucaibox/jiuzhang}} で利用可能です。 This paper aims to advance the mathematical intelligence of machines by presenting the first Chinese mathematical pre-trained language model~(PLM) for effectively understanding and representing mathematical problems. Unlike other standard NLP tasks, mathematical texts are difficult to understand, since they involve mathematical terminology, symbols and formulas in the problem statement. Typically, it requires complex mathematical logic and background knowledge for solving mathematical problems. Considering the complex nature of mathematical texts, we design a novel curriculum pre-training approach for improving the learning of mathematical PLMs, consisting of both basic and advanced courses. Specially, we first perform token-level pre-training based on a position-biased masking strategy, and then design logic-based pre-training tasks that aim to recover the shuffled sentences and formulas, respectively. Finally, we introduce a more difficult pre-training task that enforces the PLM to detect and correct the errors in its generated solutions. We conduct extensive experiments on offline evaluation (including nine math-related tasks) and online $A/B$ test. 
Experimental results demonstrate the effectiveness of our approach compared with a number of competitive baselines. Our code is available at https://github.com/RUCAIBox/JiuZhang.	翻訳日:2022-06-14 13:33:59 公開日:2022-06-13
# クラス条件コントラスト学習を用いたトランスダクティブクリップ Transductive CLIP with Class-Conditional Contrastive Learning ( http://arxiv.org/abs/2206.06177v1 ) ライセンス: Link先を確認	Junchu Huang, Weijie Chen, Shicai Yang, Di Xie, Shiliang Pu, Yueting Zhuang	(参考訳) 視覚言語事前学習モデルの目覚ましいゼロショット一般化能力に触発され、我々はCLIPモデルの監督を利用してデータラベリングの負担を軽減する。しかし、そのような監督は必然的にラベルノイズを含み、分類モデルの判別能力を大幅に低下させる。本研究では,雑音ラベル付き分類ネットワークをスクラッチから学習するための新しいフレームワークであるTransductive CLIPを提案する。まず, 擬似ラベルへの依存を緩和し, 雑音ラベルに対する耐性を高めるために, クラス条件型コントラスト学習機構を提案する。次に,疑似ラベル更新戦略としてアンサンブルラベルを採用し,ノイズラベルを用いたディープニューラルネットワークのトレーニングを安定化する。このフレームワークは、両方のテクニックを組み合わせることで、CLIPモデルからのノイズラベルの影響を効果的に低減することができる。複数のベンチマークデータセットの実験では、他の最先端メソッドよりも大幅に改善されている。 Inspired by the remarkable zero-shot generalization capacity of vision-language pre-trained model, we seek to leverage the supervision from CLIP model to alleviate the burden of data labeling. However, such supervision inevitably contains the label noise, which significantly degrades the discriminative power of the classification model. In this work, we propose Transductive CLIP, a novel framework for learning a classification network with noisy labels from scratch. Firstly, a class-conditional contrastive learning mechanism is proposed to mitigate the reliance on pseudo labels and boost the tolerance to noisy labels. Secondly, ensemble labels is adopted as a pseudo label updating strategy to stabilize the training of deep neural networks with noisy labels. This framework can reduce the impact of noisy labels from CLIP model effectively by combining both techniques. Experiments on multiple benchmark datasets demonstrate the substantial improvements over other state-of-the-art methods.	翻訳日:2022-06-14 13:30:46 公開日:2022-06-13
# 確率的教師による学習領域適応オブジェクト検出 Learning Domain Adaptive Object Detection with Probabilistic Teacher ( http://arxiv.org/abs/2206.06293v1 ) ライセンス: Link先を確認	Meilin Chen, Weijie Chen, Shicai Yang, Jie Song, Xinchao Wang, Lei Zhang, Yunfeng Yan, Donglian Qi, Yueting Zhuang, Di Xie, Shiliang Pu	(参考訳) 教師なしドメイン適応オブジェクト検出のための自己学習は難しい課題であり、その性能は擬似ボックスの品質に大きく依存する。有望な結果にもかかわらず、先行作品は、セルフトレーニング中の疑似ボックスの不確かさをほとんど見逃している。本稿では,段階的に発展する教師から未ラベルの目標データの不確実性を捉え,相互に有益な方法で生徒の学習を指導することを目的とした,簡易かつ効果的な枠組みである確率教師(PT)を提案する。具体的には,不確実性誘導型整合性トレーニングを活用して分類適応と局所化適応を促進することを提案する。また,アンカーを学習可能なパラメータと見なすことができるため,アンカー適応を局所化適応と並行して行う。この枠組みとともに,不確実性誘導型自己学習をさらに促進する新しいエントロピー焦点損失(EFL)を提案する。 EFLを装備したPTは、以前のベースライン全てを大きなマージンで上回り、新しい最先端を実現する。 Self-training for unsupervised domain adaptive object detection is a challenging task, whose performance depends heavily on the quality of pseudo boxes. Despite the promising results, prior works have largely overlooked the uncertainty of pseudo boxes during self-training. In this paper, we present a simple yet effective framework, termed Probabilistic Teacher (PT), which aims to capture the uncertainty of unlabeled target data from a gradually evolving teacher and guide the learning of a student in a mutually beneficial manner. Specifically, we propose to leverage uncertainty-guided consistency training to promote classification adaptation and localization adaptation, rather than filtering pseudo boxes via an elaborate confidence threshold. In addition, we conduct anchor adaptation in parallel with localization adaptation, since an anchor can be regarded as a learnable parameter. Together with this framework, we also present a novel Entropy Focal Loss (EFL) to further facilitate the uncertainty-guided self-training. Equipped with EFL, PT outperforms all previous baselines by a large margin and achieves a new state of the art.	翻訳日:2022-06-14 13:30:31 公開日:2022-06-13
# テキスト・モデリングのための潜時拡散エネルギーベースモデル Latent Diffusion Energy-Based Model for Interpretable Text Modeling ( http://arxiv.org/abs/2206.05895v1 ) ライセンス: Link先を確認	Peiyu Yu, Sirui Xie, Xiaojian Ma, Baoxiong Jia, Bo Pang, Ruigi Gao, Yixin Zhu, Song-Chun Zhu, and Ying Nian Wu	(参考訳) 潜在宇宙エネルギーベースモデル(EBMs)は、エネルギーベースモデルとしても知られ、生成モデルへの関心が高まっている。定式化の柔軟性と潜在空間の強力なモデリング力により、テキストモデリングの解釈可能性を目指して、近年の研究が進められている。しかし、遅延空間のEMMは、データ空間におけるEMMのいくつかの欠陥を継承し、縮退したMCMCサンプリングの品質は、特に複雑な遅延構造を持つデータにおいて、訓練における生成品質と不安定性を低下させる可能性がある。本研究では, 拡散回復可能性学習をサンプリング問題の解決策として活用する最近の取り組みに触発されて, 拡散モデルと潜時空間ebmsとの共生を, 潜時拡散エネルギーに基づくモデルとして創成した変分学習枠組みに導入する。本研究では,情報ボトルネックと協調して幾何クラスタリングに基づく正規化手法を開発し,学習した潜在空間の品質をさらに向上させる。いくつかの課題に対する実験は、強力なテキストモデリングにおける我々のモデルの優れた性能を示すものである。 Latent space Energy-Based Models (EBMs), also known as energy-based priors, have drawn growing interests in generative modeling. Fueled by its flexibility in the formulation and strong modeling power of the latent space, recent works built upon it have made interesting attempts aiming at the interpretability of text modeling. However, latent space EBMs also inherit some flaws from EBMs in data space; the degenerate MCMC sampling quality in practice can lead to poor generation quality and instability in training, especially on data with complex latent structures. Inspired by the recent efforts that leverage diffusion recovery likelihood learning as a cure for the sampling issue, we introduce a novel symbiosis between the diffusion models and latent space EBMs in a variational learning framework, coined as the latent diffusion energy-based model. We develop a geometric clustering-based regularization jointly with the information bottleneck to further improve the quality of the learned latent space. Experiments on several challenging tasks demonstrate the superior performance of our model on interpretable text modeling over strong counterparts.	翻訳日:2022-06-14 13:29:49 公開日:2022-06-13
# (参考訳) メディエーター:NLPモデル行動を説明する会話エージェント Mediators: Conversational Agents Explaining NLP Model Behavior ( http://arxiv.org/abs/2206.06029v1 ) ライセンス: CC BY 4.0	Nils Feldhus, Ajay Madhavan Ravichandran, Sebastian Möller	(参考訳) 人間中心の説明可能な人工知能(HCXAI)コミュニティは、人間と機械の会話として説明プロセスをフレーミングする必要性を高めた。本稿では,ニューラルモデルの振る舞いを自然言語を用いて対話的に説明できるテキストベースの会話エージェントである仲介者のためのデシデラタを構築した。自然言語処理(NLP)研究の観点からは,感情分析の課題に対するこのような仲介者の青写真を作成し,対話に基づく説明への道のりを現在研究がどこまで進んでいるかを評価する。 The human-centric explainable artificial intelligence (HCXAI) community has raised the need for framing the explanation process as a conversation between human and machine. In this position paper, we establish desiderata for Mediators, text-based conversational agents which are capable of explaining the behavior of neural models interactively using natural language. From the perspective of natural language processing (NLP) research, we engineer a blueprint of such a Mediator for the task of sentiment analysis and assess how far along current research is on the path towards dialogue-based explanations.	翻訳日:2022-06-14 13:28:14 公開日:2022-06-13
# 文脈埋め込みを用いた遷移型抽象的意味表現 Transition-based Abstract Meaning Representation Parsing with Contextual Embeddings ( http://arxiv.org/abs/2206.06229v1 ) ライセンス: Link先を確認	Yichao Liang	(参考訳) 言語を理解して生成する能力は、人間の認知を他の既知の生命体と区別する。統計的言語モデルと記号意味論的意味論の2つの意味への最も成功した経路を意味解析のタスクで融合する方法について検討した。遷移型抽象的意味表現(AMR)構文解析(AmrEager)を基盤として,AMR解析の課題に事前学習した文脈認識単語の埋め込み(BERTやRoBERTaなど)を組み込むことの有用性について検討し,AmrBergerと命名した新しい構文解析に寄与する。実験により、これらのリッチな語彙的特徴だけでは、非文脈的特徴と比較してSMATCHスコアによって測定されたパーザ全体のパフォーマンスを改善するのにはあまり役に立たないことがわかった。病変研究を通じて,コンテクスト埋め込みの使用は,明示的な構文特徴の除去に対して,より堅牢なシステムを実現するのに役立つことがわかった。これらの知見は文脈埋め込みと言語モデルの強みと弱みを現在の形で明らかにし、その深い理解を動機付けている。 The ability to understand and generate languages sets human cognition apart from other known life forms. We study a way of combining two of the most successful routes to meaning of language--statistical language models and symbolic semantics formalisms--in the task of semantic parsing. Building on a transition-based, Abstract Meaning Representation (AMR) parser, AmrEager, we explore the utility of incorporating pretrained context-aware word embeddings--such as BERT and RoBERTa--in the problem of AMR parsing, contributing a new parser we dub AmrBerger. Experiments find these rich lexical features alone are not particularly helpful in improving the parser's overall performance as measured by the SMATCH score when compared to the non-contextual counterpart, while additional concept information empowers the system to outperform the baselines. Through a lesion study, we found the use of contextual embeddings helps to make the system more robust against the removal of explicit syntactical features. These findings expose the strength and weakness of the contextual embeddings and the language models in the current form, and motivate deeper understanding thereof.	翻訳日:2022-06-14 13:03:39 公開日:2022-06-13
# SIXO: ツイストオブジェクトによるスムーズな推論 SIXO: Smoothing Inference with Twisted Objectives ( http://arxiv.org/abs/2206.05952v1 ) ライセンス: Link先を確認	Dieterich Lawson, Allan Raventós, Andrew Warrington, Scott Linderman	(参考訳) シークエンシャルモンテカルロ (Sequential Monte Carlo, SMC) は、状態空間モデルに対する推論アルゴリズムであり、中間ターゲット分布の列からサンプリングすることで後部を近似する。対象の分布はしばしばフィルタリング分布として選択されるが、これらは将来の観測からの情報を無視し、推論とモデル学習の実践的および理論的制限をもたらす。 SIXOは、平滑化分布を近似するターゲットを学習し、全ての観測結果から情報を取り入れる手法である。重要なアイデアは、フィルタ分布を平滑化分布に変形する関数を適合させるために密度比推定を使用することである。次に、これらの学習対象とSMCを用いて、モデルと提案学習の変動目標を定義する。 SIXO は対数境界の下限を確実に狭くし、様々な領域でより正確な後方推測とパラメータ推定を提供する。 Sequential Monte Carlo (SMC) is an inference algorithm for state space models that approximates the posterior by sampling from a sequence of intermediate target distributions. The target distributions are often chosen to be the filtering distributions, but these ignore information from future observations, leading to practical and theoretical limitations in inference and model learning. We introduce SIXO, a method that instead learns targets that approximate the smoothing distributions, incorporating information from all observations. The key idea is to use density ratio estimation to fit functions that warp the filtering distributions into the smoothing distributions. We then use SMC with these learned targets to define a variational objective for model and proposal learning. SIXO yields provably tighter log marginal lower bounds and offers significantly more accurate posterior inferences and parameter estimates in a variety of domains.	翻訳日:2022-06-14 13:03:14 公開日:2022-06-13
# iCITRIS:瞬時効果のための因果表現学習 iCITRIS: Causal Representation Learning for Instantaneous Temporal Effects ( http://arxiv.org/abs/2206.06169v1 ) ライセンス: Link先を確認	Phillip Lippe, Sara Magliacane, Sindy Löwe, Yuki M. Asano, Taco Cohen, Efstratios Gavves	(参考訳) 因果表現学習は、基礎となる因果変数とその関係を画像などの高次元観察から識別するタスクである。近年の研究では, 因果関係が存在しないという仮定の下で, 時間的な観測順序から因果変数を再構築できることが示されている。しかし,実際の応用では,我々の測定やフレームレートは多くの因果効果よりも遅い可能性がある。効果を効果的に生成し、以前の識別可能性の結果を無効にする。そこで本研究では,既知の介入目標を満たした完全な介入が与えられた場合,時間系列における瞬時効果を処理できる因果表現学習手法であるiCITRISを提案する。 iCITRISは、時間的観察から因果因子を特定し、同時に微分可能な因果発見法を用いて因果グラフを学習する。 3つのビデオデータセットの実験において、iCITRISは因果因子とその因果グラフを正確に識別する。 Causal representation learning is the task of identifying the underlying causal variables and their relations from high-dimensional observations, such as images. Recent work has shown that one can reconstruct the causal variables from temporal sequences of observations under the assumption that there are no instantaneous causal relations between them. In practical applications, however, our measurement or frame rate might be slower than many of the causal effects. This effectively creates "instantaneous" effects and invalidates previous identifiability results. To address this issue, we propose iCITRIS, a causal representation learning method that can handle instantaneous effects in temporal sequences when given perfect interventions with known intervention targets. iCITRIS identifies the causal factors from temporal observations, while simultaneously using a differentiable causal discovery method to learn their causal graph. In experiments on three video datasets, iCITRIS accurately identifies the causal factors and their causal graph.	翻訳日:2022-06-14 13:02:58 公開日:2022-06-13
# Markov Chain Score Ascent: Markovian Gradientsによる変分推論の統一フレームワーク Markov Chain Score Ascent: A Unifying Framework of Variational Inference with Markovian Gradients ( http://arxiv.org/abs/2206.06295v1 ) ライセンス: Link先を確認	Kyurae Kim, Jisu Oh, Jacob R. Gardner, Adji Bousso Dieng, Hongseok Kim	(参考訳) 確率勾配降下(sgd)を伴う包括的kullback-leibler(kl)分岐の最小化は、その勾配が後方の積分として定義されるため困難である。近年,マルコフ連鎖から得られた偏差勾配推定値を用いてSGDを実行する方法が提案されている。本稿では, この手法について, 混合速度と勾配分散の確立により, 初の非漸近収束解析を行う。そこで我々は,これらの手法をMarkov chain score Ascent (MCSA) と総称し,Markov chain gradient descent framework の特殊な場合として適用できることを実証した。さらに, この新たな理解を活かし, 勾配分散のより厳密な結合を実現する新しいmcsaスキームであるparallel mcsa (pmcsa) を開発した。この改良された理論結果が優れた経験的性能をもたらすことを実証する。 Minimizing the inclusive Kullback-Leibler (KL) divergence with stochastic gradient descent (SGD) is challenging since its gradient is defined as an integral over the posterior. Recently, multiple methods have been proposed to run SGD with biased gradient estimates obtained from a Markov chain. This paper provides the first non-asymptotic convergence analysis of these methods by establishing their mixing rate and gradient variance. To do this, we demonstrate that these methods-which we collectively refer to as Markov chain score ascent (MCSA) methods-can be cast as special cases of the Markov chain gradient descent framework. Furthermore, by leveraging this new understanding, we develop a novel MCSA scheme, parallel MCSA (pMCSA), that achieves a tighter bound on the gradient variance. We demonstrate that this improved theoretical result translates to superior empirical performance.	翻訳日:2022-06-14 13:02:44 公開日:2022-06-13
# (参考訳) 単純なキューが強力なマルチオブジェクトトラッカーに導く Simple Cues Lead to a Strong Multi-Object Tracker ( http://arxiv.org/abs/2206.04656v3 ) ライセンス: CC BY 4.0	Jenny Seidenschwarz, Guillem Brasó, Ismail Elezi, and Laura Leal-Taixé	(参考訳) 長い間、マルチオブジェクト追跡の最も一般的なパラダイムはtracking-by-detection(TbD)で、まずオブジェクトを検出してビデオフレーム上で関連付ける。関連して、ほとんどのモデルは動きと外観の手がかりに頼りになる。これらの方法に引き続き依存しているが、近年のアプローチでは、例えば、データトレーニングや全体的な複雑なフレームワークの必要性が高まっている。私たちは 1) 設計上の重要な選択が適用されれば,少量のトレーニングデータから強固な手がかりを得ることができる。 2) これらの強い手がかりから、ハンガリーの標準マッチングに基づく協会は、印象的な結果を得るのに十分である。私たちの主な洞察は、外見に基づくトラッキングにおいて、標準的な再識別ネットワークが優れている重要なコンポーネントを特定することです。その障害事例を広範囲に分析し,我々の外観特徴と単純な運動モデルの組み合わせが強い追跡結果をもたらすことを示した。 IDF1では5.4pp,HOTAでは4.4ppに向上し,MOT17およびMOT20データセットの最先端性能が向上した。論文が受け入れられた後、コードとモデルをリリースします。 For a long time, the most common paradigm in Multi-Object Tracking was tracking-by-detection (TbD), where objects are first detected and then associated over video frames. For association, most models resort to motion and appearance cues. While still relying on these cues, recent approaches based on, e.g., attention have shown an ever-increasing need for training data and overall complex frameworks. We claim that 1) strong cues can be obtained from small amounts of training data if some key design choices are applied, and 2) given these strong cues, standard Hungarian matching-based association is enough to obtain impressive results. Our main insight is to identify key components that allow a standard reidentification network to excel at appearance-based tracking. We extensively analyze its failure cases and show that a combination of our appearance features with a simple motion model leads to strong tracking results. Our model achieves state-of-the-art performance on the MOT17 and MOT20 datasets, outperforming previous state-of-the-art trackers by up to 5.4pp in IDF1 and 4.4pp in HOTA. We will release the code and models after the paper's acceptance.	翻訳日:2022-06-14 12:26:34 公開日:2022-06-13
# (参考訳) コアセットを擁護する: アクティブラーニングのための密度認識型コアセット選択 In Defense of Core-set: A Density-aware Core-set Selection for Active Learning ( http://arxiv.org/abs/2206.04838v2 ) ライセンス: CC BY 4.0	Yeachan Kim, Bonggun Shin	(参考訳) アクティブラーニングは、ラベルのないデータセットから情報サンプルをラベル付けすることで、ラベル付きデータセットの効率的な構築を可能にする。実世界のアクティブな学習シナリオでは、多くの冗長あるいは非常に類似したサンプルが存在するため、選択されたサンプルの多様性を考慮することが重要である。コアセットアプローチは、サンプル間の距離に基づいて多様なサンプルを選択する、有望な多様性に基づく手法である。しかし、このアプローチは、神経モデルが低い信頼性を示す最も難しいサンプルを選択する不確実性に基づくアプローチに比べて、パフォーマンスが劣る。本研究では, 密度のレンズを通して特徴空間を解析し, 興味深いことに, 局所スパース領域は密度の高い領域よりも情報的なサンプルを持つ傾向にある。本分析により,密度認識によるコアセットのアプローチが強化され,密度認識コアセット(DACS)が提案される。この戦略は,未ラベル標本の密度を推定し,主にスパース領域から多種多様な試料を抽出する。密度推定における計算ボトルネックを削減するため,局所性に敏感なハッシュに基づく新しい密度近似を提案する。実験により,DACSの分類・回帰作業における有効性が明らかに示され,実用シナリオにおいてDACSが最先端の性能を発揮できることを示す。 DACSはニューラルネットワークアーキテクチャに弱いため,既存の手法とDACSを効果的に組み合わせることができることを示すための,単純かつ効果的な組み合わせ法を提案する。 Active learning enables the efficient construction of a labeled dataset by labeling informative samples from an unlabeled dataset. In a real-world active learning scenario, considering the diversity of the selected samples is crucial because many redundant or highly similar samples exist. Core-set approach is the promising diversity-based method selecting diverse samples based on the distance between samples. However, the approach poorly performs compared to the uncertainty-based approaches that select the most difficult samples where neural models reveal low confidence. In this work, we analyze the feature space through the lens of the density and, interestingly, observe that locally sparse regions tend to have more informative samples than dense regions. Motivated by our analysis, we empower the core-set approach with the density-awareness and propose a density-aware core-set (DACS). The strategy is to estimate the density of the unlabeled samples and select diverse samples mainly from sparse regions. 
To reduce the computational bottlenecks in estimating the density, we also introduce a new density approximation based on locality-sensitive hashing. Experimental results clearly demonstrate the efficacy of DACS in both classification and regression tasks and specifically show that DACS can produce state-of-the-art performance in a practical scenario. Since DACS is weakly dependent on neural architectures, we present a simple yet effective combination method to show that the existing methods can be beneficially combined with DACS.	翻訳日:2022-06-14 12:25:23 公開日:2022-06-13
# (参考訳) フェデレーション学習のための高速深層オートエンコーダ Fast Deep Autoencoder for Federated learning ( http://arxiv.org/abs/2206.05136v2 ) ライセンス: CC BY 4.0	David Novoa-Paradela, Oscar Romero-Fontenla, Bertha Guijarro-Berdiñas	(参考訳) 本稿では,ディープオートエンコーダの新規かつ高速かつプライバシ保護実装を提案する。 DAEF(Deep Autoencoder for Federated Learning)は、従来のニューラルネットワークとは異なり、ディープオートエンコーダネットワークを非反復的にトレーニングすることで、トレーニング時間を劇的に短縮する。そのトレーニングは分散(データセットの分割を並行して行う)とインクリメンタル(部分モデルの集約)で行うことができ、数学的定式化のため、交換されるデータはユーザのプライバシを危険にさらすことはない。これにより、DAEFはエッジコンピューティングとフェデレーション学習シナリオの有効な方法となる。この手法は、7つの実際の異常検出データセットを用いた従来の(反復的な)ディープオートエンコーダと比較され、DAEFの高速トレーニングにもかかわらず、その性能が類似していることが示されている。 This paper presents a novel, fast and privacy-preserving implementation of deep autoencoders. DAEF (Deep Autoencoder for Federated learning), unlike traditional neural networks, trains a deep autoencoder network in a non-iterative way, which drastically reduces its training time. Its training can be carried out in a distributed way (several partitions of the dataset in parallel) and incrementally (aggregation of partial models), and due to its mathematical formulation, the data that is exchanged does not endanger the privacy of the users. This makes DAEF a valid method for edge computing and federated learning scenarios. The method has been evaluated and compared to traditional (iterative) deep autoencoders using seven real anomaly detection datasets, and their performance has been shown to be similar despite DAEF's faster training.	翻訳日:2022-06-14 12:09:48 公開日:2022-06-13
# MAREO: メモリと注意に基づく視覚的リズオン MAREO: Memory- and Attention- based visual REasOning ( http://arxiv.org/abs/2206.04928v2 ) ライセンス: Link先を確認	Mohit Vaishnav, Thomas Serre	(参考訳) 人間は、複雑な視覚シーンを柔軟に解析し理解する能力において、現代のAIシステムを大きく上回っている。注意と記憶は、行動に関連した視覚情報を選択的に保守し、操作し、最も困難な視覚的推論タスクを解決する能力において重要な役割を果たすことが知られている2つのシステムである。本稿では,視覚推論に関する認知科学文献,記憶と注意に基づく(視覚)推論(mareo)アーキテクチャに触発された視覚推論のための新しいアーキテクチャを提案する。 MAREOは、脳が複雑な視覚的推論問題を合成的に解決し、より複雑な視覚ルーチンを形成するための基本的な視覚操作を組み合わせることを学習することで、アクティブビジョン理論をインスタンス化する。 MAREOは、アテンションシフトのシーケンスを通じて視覚的推論タスクの解決を学び、マルチヘッドトランスフォーマーモジュールを介してタスク関連視覚情報をメモリバンクに保持する。視覚ルーチンは、シーン内のオブジェクト間のさまざまな関係を判断する専用の推論モジュールによってデプロイされる。 4種類の推論タスクの実験は、堅牢でサンプル効率のよい視覚ルーチンを学習するMAREOの能力を示している。 Humans continue to vastly outperform modern AI systems in their ability to parse and understand complex visual scenes flexibly. Attention and memory are two systems known to play a critical role in our ability to selectively maintain and manipulate behaviorally-relevant visual information to solve some of the most challenging visual reasoning tasks. Here, we present a novel architecture for visual reasoning inspired by the cognitive-science literature on visual reasoning, the Memory- and Attention-based (visual) REasOning (MAREO) architecture. MAREO instantiates an active-vision theory, which posits that the brain solves complex visual reasoning problems compositionally by learning to combine previously-learned elementary visual operations to form more complex visual routines. MAREO learns to solve visual reasoning tasks via sequences of attention shifts to route and maintain task-relevant visual information into a memory bank via a multi-head transformer module. Visual routines are then deployed by a dedicated reasoning module trained to judge various relations between objects in the scenes. Experiments on four types of reasoning tasks demonstrate MAREO's ability to learn visual routines in a robust and sample-efficient manner.	翻訳日:2022-06-14 11:48:33 公開日:2022-06-13
# 評価理論を用いたテキスト中の感情の次元モデリング:コーパス生成、注釈信頼性、予測 Dimensional Modeling of Emotions in Text with Appraisal Theories: Corpus Creation, Annotation Reliability, and Prediction ( http://arxiv.org/abs/2206.05238v2 ) ライセンス: Link先を確認	Enrica Troiano and Laura Oberl\"ander and Roman Klinger	(参考訳) 感情分析の最も顕著なタスクは、テキストに感情を割り当て、言語で感情がどのように現れるかを理解することである。自然言語処理における重要な観察は、感情はイベントのみを参照することで暗黙的にコミュニケーションでき、感情名に明示的に言及することなく、感情の共感的、客観的な理解に訴えることができることである。心理学において、評価理論として知られる感情理論のクラスは、出来事と感情の関係を説明することを目的としている。評価は、関連する出来事を経験する人々による認知評価を測定する変数として形式化することができる。それらは、イベントが新規である場合、人が自分自身を責任とみなす場合、それが自身の目標と一致している場合、その他多くの場合、評価を含む。このような評価は、例えば、新しい状況が驚きを引き起こすことや、不確実な結果をもたらすことが恐怖を引き起こすことを、イベントに基づいてどの感情が発達するかを説明する。テキストにおける感情分析における評価理論の適合性を分析し,評価概念が注釈者によって確実に再構築できるか,テキスト分類器によって予測可能か,評価概念が感情カテゴリーの識別に役立つかを理解することを目的としている。そこで我々は,特定の感情を誘発する出来事をテキストで記述し,評価を明らかにすることでコーパスをコンパイルする。そして,本文から感情や評価を再構築するよう読者に求めた。この設定により、感情や評価がテキストから純粋に回収できるかどうかを計測することができ、モデルのパフォーマンス測定を判断するための人間のベースラインを提供する。テキスト分類法を人間の注釈者と比較した結果,どちらも類似の性能で感情や評価を確実に検出できることがわかった。さらに、評価概念がテキスト中の感情の分類を改善することを示す。 The most prominent tasks in emotion analysis are to assign emotions to texts and to understand how emotions manifest in language. An important observation for natural language processing is that emotions can be communicated implicitly by referring to events alone, appealing to an empathetic, intersubjective understanding of events, even without explicitly mentioning an emotion name. In psychology, the class of emotion theories known as appraisal theories aims at explaining the link between events and emotions. Appraisals can be formalized as variables that measure a cognitive evaluation by people living through an event that they consider relevant. They include the assessment if an event is novel, if the person considers themselves to be responsible, if it is in line with the own goals, and many others. 
Such appraisals explain which emotions are developed based on an event, e.g., that a novel situation can induce surprise or one with uncertain consequences could evoke fear. We analyze the suitability of appraisal theories for emotion analysis in text with the goal of understanding if appraisal concepts can reliably be reconstructed by annotators, if they can be predicted by text classifiers, and if appraisal concepts help to identify emotion categories. To achieve that, we compile a corpus by asking people to textually describe events that triggered particular emotions and to disclose their appraisals. Then, we ask readers to reconstruct emotions and appraisals from the text. This setup allows us to measure if emotions and appraisals can be recovered purely from text and provides a human baseline to judge model's performance measures. Our comparison of text classification methods to human annotators shows that both can reliably detect emotions and appraisals with similar performance. We further show that appraisal concepts improve the categorization of emotions in text.	翻訳日:2022-06-14 11:48:12 公開日:2022-06-13
# (参考訳) ニューラルラプラス:ラプラス領域における微分方程式の多様なクラスを学ぶ Neural Laplace: Learning diverse classes of differential equations in the Laplace domain ( http://arxiv.org/abs/2206.04843v2 ) ライセンス: CC BY 4.0	Samuel Holt, Zhaozhi Qian, Mihaela van der Schaar	(参考訳) ニューラルネットワークで学習したODEを用いたニューラル正規微分方程式モデルしかし、ODEは工学や生物学的システムに共通する長距離依存や不連続性を持つシステムをモデル化するには基本的に不十分である。微分方程式の幅広いクラス (de) は、遅延微分方程式や積分微分方程式を含む修正として提案されている。さらに、剛体ODEとODEを一方向強制関数でモデル化する場合、Neural ODEは数値不安定性に悩まされる。本研究は,上記を含む多種多様なDESクラスを学習するための統一フレームワークであるNeural Laplaceを提案する。時間領域のダイナミクスをモデル化するのではなく、ラプラス領域でモデル化し、時間における履歴依存性や不連続を複素指数関数の和として表すことができる。学習をより効率的にするために、リーマン球面の幾何学的立体地図を用いてラプラス領域のより滑らかさを誘導する。実験では、Neural Laplaceは、複雑な履歴依存や急激な変化を含む様々なDESクラスの軌道をモデル化および外挿する上で、優れた性能を示す。 Neural Ordinary Differential Equations model dynamical systems with ODEs learned by neural networks. However, ODEs are fundamentally inadequate to model systems with long-range dependencies or discontinuities, which are common in engineering and biological systems. Broader classes of differential equations (DE) have been proposed as remedies, including delay differential equations and integro-differential equations. Furthermore, Neural ODE suffers from numerical instability when modelling stiff ODEs and ODEs with piecewise forcing functions. In this work, we propose Neural Laplace, a unified framework for learning diverse classes of DEs including all the aforementioned ones. Instead of modelling the dynamics in the time domain, we model it in the Laplace domain, where the history-dependencies and discontinuities in time can be represented as summations of complex exponentials. To make learning more efficient, we use the geometrical stereographic map of a Riemann sphere to induce more smoothness in the Laplace domain. In the experiments, Neural Laplace shows superior performance in modelling and extrapolating the trajectories of diverse classes of DEs, including the ones with complex history dependency and abrupt changes.	翻訳日:2022-06-14 11:46:59 公開日:2022-06-13
# COSTA: グラフコントラスト学習のための共分散保存機能強化 COSTA: Covariance-Preserving Feature Augmentation for Graph Contrastive Learning ( http://arxiv.org/abs/2206.04726v2 ) ライセンス: Link先を確認	Yifei Zhang and Hao Zhu and Zixing Song and Piotr Koniusz and Irwin King	(参考訳) グラフコントラスト学習 (gcl) はグラフ表現学習を改善し、様々な下流タスクで sota に繋がる。グラフ拡大ステップは、GCLの重要なステップであるが、ほとんど研究されていない。本稿では,グラフ拡張によって得られるノード埋め込みが偏りが強く,下流タスクの識別的特徴の学習から対照的なモデルを多少制限することを示す。したがって、入力空間におけるグラフの強化を調べる代わりに、隠れた特徴の強化(特徴の強化)を行うように提案する。いわゆる行列スケッチにインスパイアされたCOSTAは,従来の特徴の「よいスケッチ」を保ち,拡張された特徴を生成できる,新しいCOSTA(COvariance-preServing feaTure space Augmentation framework for GCL)を提案する。 COSTAによる機能拡張の優位性を強調するため、メモリと計算を保存するシングルビュー設定(マルチビュー設定に加えて)について検討する。 COSTAによる機能拡張は,グラフ拡張に基づくモデルに比べて,同等/ベターな結果が得られることを示す。 Graph contrastive learning (GCL) improves graph representation learning, leading to SOTA on various downstream tasks. The graph augmentation step is a vital but scarcely studied step of GCL. In this paper, we show that the node embedding obtained via the graph augmentations is highly biased, somewhat limiting contrastive models from learning discriminative features for downstream tasks. Thus, instead of investigating graph augmentation in the input space, we alternatively propose to perform augmentations on the hidden features (feature augmentation). Inspired by so-called matrix sketching, we propose COSTA, a novel COvariance-preServing feaTure space Augmentation framework for GCL, which generates augmented features by maintaining a "good sketch" of original features. To highlight the superiority of feature augmentation with COSTA, we investigate a single-view setting (in addition to multi-view one) which conserves memory and computations. We show that the feature augmentation with COSTA achieves comparable/better results than graph augmentation based models.	翻訳日:2022-06-14 11:20:20 公開日:2022-06-13
# AI-MIA:医療画像による新型コロナウイルス検出・重症度分析 AI-MIA: COVID-19 Detection & Severity Analysis through Medical Imaging ( http://arxiv.org/abs/2206.04732v2 ) ライセンス: Link先を確認	Dimitrios Kollias and Anastasios Arsenos and Stefanos Kollias	(参考訳) 本稿では,欧州コンピュータビジョン会議(ECCV 2022)におけるAIMIAワークショップの枠組みにおいて,第2回Covid-19コンペティションの基幹となるアプローチについて述べる。 COV19-CT-DBデータベースは、約7,700個の3DCTスキャンからなり、新型コロナウイルスの検出のために注釈付けされている。コビッド19の症例からなるデータベースの一部は、さらに4つのコビッド19の重症度条件で注釈付けされている。データベースとその後者の部分を、トレーニング、検証、テストデータセットに分割しました。前者2つのデータセットは機械学習モデルのトレーニングと検証に使用され、後者は開発したモデルの評価に使用される。ベースラインアプローチは、CNN-RNNネットワークに基づくディープラーニングアプローチで構成され、そのパフォーマンスをCOVID19-CT-DBデータベースに報告する。 This paper presents the baseline approach for the organized 2nd Covid-19 Competition, occurring in the framework of the AIMIA Workshop in the European Conference on Computer Vision (ECCV 2022). It presents the COV19-CT-DB database which is annotated for COVID-19 detection, consisting of about 7,700 3-D CT scans. Part of the database consisting of Covid-19 cases is further annotated in terms of four Covid-19 severity conditions. We have split the database and the latter part of it into training, validation and test datasets. The former two datasets are used for training and validation of machine learning models, while the latter will be used for evaluation of the developed models. The baseline approach consists of a deep learning approach based on a CNN-RNN network, and we report its performance on the COVID19-CT-DB database.	翻訳日:2022-06-14 11:20:02 公開日:2022-06-13
# トピック制御可能な要約のためのトピックアウェア評価とトランスフォーマー法 Topic-Aware Evaluation and Transformer Methods for Topic-Controllable Summarization ( http://arxiv.org/abs/2206.04317v2 ) ライセンス: Link先を確認	Tatiana Passali, Grigorios Tsoumakas	(参考訳) トピック制御可能な要約は、幅広い応用可能性を持つ新たな研究分野である。しかし、既存のアプローチには大きな制限がある。第一に、現在この課題に対する評価基準は確立されていない。さらに、既存のメソッドはrecurrentアーキテクチャ上に構築されており、最近のTransformerベースのアーキテクチャに比べてパフォーマンスが著しく制限されうるうえ、トピックを制御するためにモデルのアーキテクチャを変更する必要もある。本研究では,生成した要約と所望のトピックとの親和性に基づいて,生成した要約を自動的に評価する新たなトピック指向評価尺度を提案する。また,本尺度の信頼性を検証するユーザ調査を行った。最後に,モデルアーキテクチャにトピック埋め込みを組み込むか,あるいは要約生成を導くために制御トークンを使用するか,トピック制御可能な要約方法を提案する。実験結果から, 制御トークンは, より複雑な埋め込みベースのアプローチに比べ, はるかに高速かつ優れた性能が得られることがわかった。 Topic-controllable summarization is an emerging research area with a wide range of potential applications. However, existing approaches suffer from significant limitations. First, there is currently no established evaluation metric for this task. Furthermore, existing methods are built upon recurrent architectures, which can significantly limit their performance compared to more recent Transformer-based architectures, while they also require modifications to the model's architecture for controlling the topic. In this work, we propose a new topic-oriented evaluation measure to automatically evaluate the generated summaries based on the topic affinity between the generated summary and the desired topic. We also conducted a user study that validates the reliability of this measure. Finally, we propose simple, yet powerful methods for topic-controllable summarization, either incorporating topic embeddings into the model's architecture or employing control tokens to guide the summary generation. Experimental results show that control tokens can achieve better performance compared to more complicated embedding-based approaches while being at the same time significantly faster.	翻訳日:2022-06-14 11:19:48 公開日:2022-06-13
# 回復力のある分散ブースティングアルゴリズム A Resilient Distributed Boosting Algorithm ( http://arxiv.org/abs/2206.04713v2 ) ライセンス: Link先を確認	Yuval Filmus, Idan Mehalel and Shay Moran	(参考訳) データが複数のパーティに分散する学習タスクを考えると、コミュニケーションは、当事者が最小化したい基本的なリソースの1つです。限られた雑音に耐性を持つ分散ブースティングアルゴリズムを提案する。我々のアルゴリズムは古典的なブースティングアルゴリズムに似ているが、Impagliazzoのハードコア補題(Impagliazzo95)にインスパイアされた新しいコンポーネントを備えており、アルゴリズムにロバストな品質を加えている。また, 漸近的に大きい雑音に対するレジリエンスは通信効率のよいアルゴリズムでは達成できないことを示すことで, この結果を補完する。 Given a learning task where the data is distributed among several parties, communication is one of the fundamental resources which the parties would like to minimize. We present a distributed boosting algorithm which is resilient to a limited amount of noise. Our algorithm is similar to classical boosting algorithms, although it is equipped with a new component, inspired by Impagliazzo's hard-core lemma [Impagliazzo95], adding a robustness quality to the algorithm. We also complement this result by showing that resilience to any asymptotically larger noise is not achievable by a communication-efficient algorithm.	翻訳日:2022-06-14 11:18:54 公開日:2022-06-13
# スリングショット機構:適応的最適化とグロッキング現象の実証的研究 The Slingshot Mechanism: An Empirical Study of Adaptive Optimizers and the Grokking Phenomenon ( http://arxiv.org/abs/2206.04817v2 ) ライセンス: Link先を確認	Vimal Thilak, Etai Littwin, Shuangfei Zhai, Omid Saremi, Roni Paiss and Joshua Susskind	(参考訳) Power et al. (arXiv:2201.02177 ) によって報告されたグルーキング現象は、長期にわたるオーバーフィッティングが続き、突然に完全な一般化へと移行した状態を指す。本稿では,Grokkingの基盤を明らかにするために,一連の実証的研究を行った。具体的には,Slingshot Mechanismと呼ばれる,適応型最適化機構を極端に遅い段階から発見する。スリングショット機構の顕著なアーチファクトは、安定なトレーニング体制と不安定なトレーニング体制の間の循環相転移によって測定でき、最後の層重みのノルムの循環挙動によって容易に監視できる。我々は、明示的な正則化がなければ、(arXiv:2201.02177 )GrokkingはほとんどSlingshotsの開始時にのみ発生し、それなしでは存在しないことを実証的に観察した。より一般的な環境では一般的で容易に再現できるが、スリングショット機構は我々が認識しているいかなる既知の最適化理論にも従わず、奥行きを調べることなく容易に見過ごせる。私たちの研究は、トレーニングの後期における適応勾配最適化器の驚くほど有用な帰納的バイアスを示し、それらの起源の理論的解析の改訂を要求している。 The grokking phenomenon as reported by Power et al. ( arXiv:2201.02177 ) refers to a regime where a long period of overfitting is followed by a seemingly sudden transition to perfect generalization. In this paper, we attempt to reveal the underpinnings of Grokking via a series of empirical studies. Specifically, we uncover an optimization anomaly plaguing adaptive optimizers at extremely late stages of training, referred to as the Slingshot Mechanism. A prominent artifact of the Slingshot Mechanism can be measured by the cyclic phase transitions between stable and unstable training regimes, and can be easily monitored by the cyclic behavior of the norm of the last layers weights. We empirically observe that without explicit regularization, Grokking as reported in ( arXiv:2201.02177 ) almost exclusively happens at the onset of Slingshots, and is absent without it. While common and easily reproduced in more general settings, the Slingshot Mechanism does not follow from any known optimization theories that we are aware of, and can be easily overlooked without an in depth examination. 
Our work points to a surprising and useful inductive bias of adaptive gradient optimizers at late stages of training, calling for a revised theoretical analysis of their origin.	翻訳日:2022-06-14 11:18:42 公開日:2022-06-13

# 具体から抽象へのAI:人工知能を一般の人々にわかりやすく解き明かす

AI from concrete to abstract: demystifying artificial intelligence to the general public ( http://arxiv.org/abs/2006.04013v6 )

ライセンス: Link先を確認

Rubens Lacerda Queiroz, Fábio Ferrentini Sampaio, Cabral Lima and Priscila Machado Vieira Lima

(参考訳) 人工知能(AI)は幅広い領域で採用されている。これは、aiの意味を最小限の理解で一般の人々に与える手段を開発することの必要性を示しています。本稿では、ビジュアルプログラミングとWiSARD重みのない人工ニューラルネットワークを組み合わせることで、一般の人々(子供を含む)がこの目標を達成するために、コンクリートから抽象的(AIcon2abs)へのAIという新しい方法論を提案する。本研究の主な戦略は,学習機械の開発に関連する実践的活動を通じて,学習過程の観察を通じて,人工知能の脱ミステリゼーションを促進することである。したがって、人工知能のメカニズムの採用に関わる議論や決定において、被験者に洞察力のある俳優に寄与するスキルを提供することができる。現在、プログラミングを通じて基本的なai概念を教える既存のアプローチは、マシンインテリジェンスを外部要素/モジュールとして扱う。トレーニングを受けた後、その外部モジュールは学習者が開発するメインアプリケーションに結合される。この方法論では、トレーニングタスクと分類タスクの両方が、他のプログラミング構成と同様に、メインプログラムを構成するブロックである。 aicon2absの有益な副作用として、データから学習可能なプログラムと従来のコンピュータプログラムとの差がより顕著になる。さらに、WiSARDの重みのない人工知能ニューラルネットワークモデルの単純さにより、トレーニングや分類タスクの内部実現の可視化と理解が容易になる。

Artificial Intelligence (AI) has been adopted in a wide range of domains. This shows the imperative need to develop means to endow common people with a minimum understanding of what AI means. Combining visual programming and WiSARD weightless artificial neural networks, this article presents a new methodology, AI from concrete to abstract (AIcon2abs), to enable the general public (including children) to achieve this goal. The main strategy adopted is to promote a demystification of artificial intelligence via practical activities related to the development of learning machines, as well as through the observation of their learning process. Thus, it is possible to provide subjects with skills that contribute to making them insightful actors in debates and decisions involving the adoption of artificial intelligence mechanisms. Currently, existing approaches to the teaching of basic AI concepts through programming treat machine intelligence as an external element/module. After being trained, that external module is coupled to the main application being developed by the learners. In the methodology herein presented, both training and classification tasks are blocks that compose the main program, just as the other programming constructs. As a beneficial side effect of AIcon2abs, the difference between a program capable of learning from data and a conventional computer program becomes more evident. In addition, the simplicity of the WiSARD weightless artificial neural network model enables easy visualization and understanding of the internal realization of training and classification tasks.

翻訳日:2023-05-16 09:12:36 公開日:2022-06-13

# 多体局在遷移における情報理論記憶スケーリング

Information-Theoretic Memory Scaling in the Many-Body Localization Transition ( http://arxiv.org/abs/2009.04470v3 )

ライセンス: Link先を確認

Alexander Nico-Katz, Abolfazl Bayat, Sougato Bose

(参考訳) 多体局所化相の重要な特徴は、エルゴディシティの破壊と、結果として局所記憶の出現であり、時間の経過とともに情報の局所保存として明らかにされる。メモリは必ずしも時間に依存する概念であるため、動的量に関するいくつかの研究によって部分的に捉えられている。しかし、これらの量は入力状態に関して最適でも民主的でもなく、多体ローカライゼーションの文脈における局所記憶の基本的な情報理論的理解はいまだ解明されていない。局所記憶の真の定量化として動的ホールボ量を導入し,不均衡や絡み合いエントロピーといった他の量に対する利点を概説する。多体局在遷移にまたがる定常状態における明確なスケーリング挙動を見いだし、この挙動を捉えた2パラメータスケーリングansätzeの族を決定する。遷移点とスケーリング指数を抽出したこの力学量の包括的有限サイズスケーリング解析を行う。

A key feature of the many-body localized phase is the breaking of ergodicity and consequently the emergence of local memory, revealed as the local preservation of information over time. As memory is necessarily a time-dependent concept, it has been partially captured by a few extant studies of dynamical quantities. However, these quantities are neither optimal, nor democratic with respect to input state; and as such a fundamental and complete information-theoretic understanding of local memory in the context of many-body localization remains elusive. We introduce the dynamical Holevo quantity as the true quantifier of local memory, outlining its advantages over other quantities such as the imbalance or entanglement entropy. We find clear scaling behavior in its steady-state across the many-body localization transition, and determine a family of two-parameter scaling ansätze which captures this behavior. We perform a comprehensive finite-size scaling analysis of this dynamical quantity, extracting the transition point and scaling exponents.

翻訳日:2023-05-03 02:53:50 公開日:2022-06-13

# 確率と構造的埋め込み不可能性に基づく文脈性の多様性

Varieties of contextuality based on probability and structural nonembeddability ( http://arxiv.org/abs/2103.06110v5 )

ライセンス: Link先を確認

Karl Svozil

(参考訳) 文脈性の異なる分析的概念は、確率論的と強い文脈性の2つのグループに分けられる。 Kochen and Specker's Theorem 0 はこれらの群を区別するための区切り基準である。確率的文脈性は、非古典的確率を伴うとはいえ古典的モデルをなお許容するのに対し、文脈性の論理代数的な「強い」形式は、(拡張された)ブール代数に忠実に埋め込めない量子可観測量の集合を特徴づける。どちらの形式も古典的な不決定性または決定不全を示しており、これは「値不定」と呼ぶことができ、理論計算機科学の部分関数によって形式化される。

Different analytic notions of contextuality fall into two major groups: probabilistic and strong notions of contextuality. Kochen and Specker's Theorem 0 is a demarcation criterion for differentiating between those groups. Whereas probabilistic contextuality still allows classical models, albeit with nonclassical probabilities, the logico-algebraic "strong" form of contextuality characterizes collections of quantum observables that have no faithful embedding into (extended) Boolean algebras. Both forms indicate a classical in- or under-determination that can be termed "value indefinite" and formalized by partial functions of theoretical computer science.

翻訳日:2023-04-08 13:32:24 公開日:2022-06-13

# 量子領域融解の観察と量子コンピュータによるシミュレーション

Observation of quantum domain melting and its simulation with a quantum computer ( http://arxiv.org/abs/2103.07343v2 )

ライセンス: Link先を確認

Jaka Vodeb, Michele Diego, Yevhenii Vaskivskyi, Yaroslav Gerasimenko, Viktor Kabanov and Dragan Mihailovic

(参考訳) 領域は非平衡相転移で生成される離散対称性の均一領域である。それらはドメインの壁と、それらが融合することを防ぐトポロジカルオブジェクトによって分離される。ドメインは熱駆動型顕微鏡プロセス、および量子システムにおいて、マクロ的な量子トンネルによって再構成される。トンネル形成のためのシステムのエネルギー環境を定義する微視的物理は、宇宙論や他の量子領域システム、より一般的に核物理学、物質波、磁気学、生物学など、多くの異なるシステムで興味深い。量子領域のダイナミクスのような創発的振る舞いにつながる微視的相関のダイナミクスを研究するユニークな機会は、量子材料によって提供される。ここでは、量子コンピュータを用いて量子系をシミュレートするというファインマンのアイデアの直接的な実現として、量子電子再構成ダイナミクスと領域融解の2つの実施形態(原型的2次元電子秩序固体量子材料と最新の量子シミュレータのシミュレーション)における研究を報告する。走査型トンネル顕微鏡を用いて電子領域再構成ダイナミクスの時間変化を計測し、量子領域融解シミュレーションにおける絡み合った相関電子のアンサンブルにおける領域の時間発展と比較する。ドメイン再構成は、マクロ的に観察される特徴的なステップ様の時間発展と温度依存性を持つ、創発的な自己構成エネルギーランドスケープでトンネル化することによって進行する。量子材料と量子シミュレーションの力学における顕著な対応は、顕微鏡レベルで相互作用する多体量子系の創発的挙動を理解するための道を開く。

Domains are homogeneous areas of discrete symmetry, created in nonequilibrium phase transitions. They are separated by domain walls, topological objects which prevent them from fusing together. Domains may reconfigure by thermally-driven microscopic processes, and in quantum systems, by macroscopic quantum tunnelling. The underlying microscopic physics that defines the system's energy landscape for tunnelling is of interest in many different systems, from cosmology and other quantum domain systems, and more generally to nuclear physics, matter waves, magnetism, and biology. A unique opportunity to investigate the dynamics of microscopic correlations leading to emergent behaviour, such as quantum domain dynamics, is offered by quantum materials. Here, as a direct realization of Feynman's idea of using a quantum computer to simulate a quantum system, we report an investigation of quantum electron reconfiguration dynamics and domain melting in two matching embodiments: a prototypical two-dimensionally electronically ordered solid-state quantum material and a simulation on a latest-generation quantum simulator. We use scanning tunnelling microscopy to measure the time evolution of electronic domain reconfiguration dynamics and compare this with the time evolution of domains in an ensemble of entangled correlated electrons in simulated quantum domain melting. The domain reconfiguration is found to proceed by tunnelling in an emergent, self-configuring energy landscape, with characteristic step-like time evolution and temperature dependences observed macroscopically. The remarkable correspondence in the dynamics of a quantum material and a quantum simulation opens the way to an understanding of emergent behaviour in diverse interacting many-body quantum systems at the microscopic level.

翻訳日:2023-04-08 08:41:50 公開日:2022-06-13

# x状態の局所利用可能な量子相関:対称ケースと反対称ケース

Local available quantum correlations of X states: The symmetric and anti-symmetric cases ( http://arxiv.org/abs/2107.00158v3 )

ライセンス: Link先を確認

Hermann Albrecht and David Bellorin and Douglas F. Mundarain

(参考訳) Mundarainらによって定義された局所利用可能量子相関 (LAQC) を、大きさの等しい局所ブロッホベクトルを持つ2量子ビットのX状態について解析する。対称X状態は部分系の交換の下で不変であり、したがって同じ局所ブロッホベクトルを持つ。一方、反対称X状態は、大きさが等しく向きが反対(反平行)の局所ブロッホベクトルを持つ。いずれの場合も、LAQC量子化器の正確な解析式を得る。いくつかの例を示し、この量子相関をコンカレンスと量子不協和と比較する。我々はまた、振幅減衰デコヒーレンスの下でのヴェルナー状態を例として、マルコフ的デコヒーレンスも扱う。脱分極と位相減衰の場合と同様に、この量子チャネルを持つこれらの状態のLAQCに対して突然の死の挙動は発生しない。

Local available quantum correlations (LAQC), as defined by Mundarain et al., are analyzed for 2-qubit X states with local Bloch vectors of equal magnitude. Symmetric X states are invariant under the exchange of subsystems, hence having the same local Bloch vector. On the other hand, anti-symmetric X states have local Bloch vectors with an equal magnitude but opposite direction (anti-parallel). In both cases, we obtain exact analytical expressions for their LAQC quantifier. We present some examples and compare this quantum correlation to concurrence and quantum discord. We have also included Markovian decoherence, with Werner states under amplitude damping decoherence. As is the case for depolarization and phase damping, no sudden death behavior occurs for the LAQC of these states with this quantum channel.

翻訳日:2023-03-23 20:56:40 公開日:2022-06-13
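
The abstract above compares LAQC with concurrence for two-qubit X states. As a minimal sketch of that reference quantity only (not of the LAQC quantifier, which the abstract does not spell out), the snippet below evaluates the standard closed-form concurrence of an X state. The function names and the choice of a Werner state as test input are illustrative assumptions, and the density matrix is assumed to be in the computational basis |00>, |01>, |10>, |11> with real diagonal entries.

```python
import math


def x_state_concurrence(rho):
    """Concurrence of a two-qubit X state given as a 4x4 nested list.

    Uses the closed form
        C = 2 * max(0, |rho14| - sqrt(rho22 * rho33), |rho23| - sqrt(rho11 * rho44)),
    which is valid only when rho has the X shape (nonzero entries on the
    diagonal and anti-diagonal only).
    """
    outer = abs(rho[0][3]) - math.sqrt(rho[1][1] * rho[2][2])
    inner = abs(rho[1][2]) - math.sqrt(rho[0][0] * rho[3][3])
    return 2.0 * max(0.0, outer, inner)


def werner_state(p):
    """Werner state p |psi-><psi-| + (1 - p) I/4, an X state (illustrative input)."""
    a = (1.0 - p) / 4.0   # populations of |00> and |11>
    b = (1.0 + p) / 4.0   # populations of |01> and |10>
    c = -p / 2.0          # coherence between |01> and |10>
    return [
        [a, 0.0, 0.0, 0.0],
        [0.0, b, c, 0.0],
        [0.0, c, b, 0.0],
        [0.0, 0.0, 0.0, a],
    ]


if __name__ == "__main__":
    for p in (0.0, 1.0 / 3.0, 0.5, 1.0):
        print(p, x_state_concurrence(werner_state(p)))
```

For the Werner state the formula reduces to max(0, (3p - 1)/2), so the concurrence vanishes for p at or below 1/3.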

# 量子ホイートストンブリッジ

Quantum Wheatstone Bridge ( http://arxiv.org/abs/2108.11397v2 )

ライセンス: Link先を確認

Kasper Poulsen, Alan C. Santos, Nikolaj T. Zinner

(参考訳) 古典版の完全な量子類似物として,量子ホイートストンブリッジを提案する。ブリッジは、未知のカップリングに対する感度を高めるために量子効果を利用する数体の境界駆動スピンチェーンである。この感度は、制御可能なカップリングが未知のカップリングに近づくと、破壊干渉による絡み合ったベル状態の集団の減少によって説明される。破壊的干渉の簡単な基準を見出し、落下の幅の近似式を導出する。未知結合に対する感度は量子フィッシャー情報を用いて定量化され, スピン電流を介して間接的に橋の状態を測定できることを示した。我々の結果はキャリブレーションエラーに対して堅牢であり、現在の最先端量子プラットフォームが実現の手段として使用できるという意味では一般的である。したがって、量子ホイートストンブリッジは、近距離量子デバイスを用いたセンシングやメトロロジーといった分野で使われる可能性がある。

We propose a quantum Wheatstone bridge as a fully quantum analogue to the classical version. The bridge is a few-body boundary-driven spin chain exploiting quantum effects to gain an enhanced sensitivity to an unknown coupling. The sensitivity is explained by a drop in population of an entangled Bell state due to destructive interference as the controllable coupling approaches the unknown coupling. A simple criterion for the destructive interference is found, and an approximate expression for the width of the drop is derived. The sensitivity to the unknown coupling is quantified using the quantum Fisher information, and we show that the state of the bridge can be measured indirectly through the spin current. Our results are robust towards calibration errors and generic in the sense that several of the current state-of-the-art quantum platforms could be used as a means of realization. The quantum Wheatstone bridge may thus find use in fields such as sensing and metrology using near-term quantum devices.

翻訳日:2023-03-17 05:13:24 公開日:2022-06-13

# 三対角行列表現を持つ非エルミートハミルトンの類について

On a class of non-Hermitian Hamiltonians with tridiagonal matrix representation ( http://arxiv.org/abs/2109.14540v5 )

ライセンス: Link先を確認

Francisco M. Fernández

(参考訳) 三対角行列表現を持つ一部の非エルミート・ハミルトニアン作用素が、準エルミートであるか、あるいはエルミート作用素と相似でありうることを示す。ここで議論されたハミルトン作用素のクラスにおいて、変換はエルミート的で正定値な対角作用素によって与えられる。開境界条件と周期条件との間には重要な違いがあることを示す。 2つの単純で広く使われているモデルを用いて理論的結果を説明する。

We show that some non-Hermitian Hamiltonian operators with tridiagonal matrix representation may be quasi Hermitian or similar to Hermitian operators. In the class of Hamiltonian operators discussed here the transformation is given by a Hermitian, positive-definite, diagonal operator. We show that there is an important difference between open boundary conditions and periodic ones. We illustrate the theoretical results by means of two simple, widely used, models.

翻訳日:2023-03-13 07:08:39 公開日:2022-06-13
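
The role of a diagonal, positive-definite transformation can be illustrated with a small numerical sketch. This is not taken from the paper: it uses the textbook fact that a real tridiagonal matrix whose paired off-diagonal entries have the same sign is similar to a symmetric matrix via a positive diagonal scaling. The function names and the 4-site example with hopping amplitudes 2.0 and 0.5 are assumptions chosen for illustration.

```python
import math


def build_chain(diag, forward, backward, periodic=False):
    """Nearest-neighbour matrix with h[i][i+1] = forward[i], h[i+1][i] = backward[i].

    All three lists have length n. With open boundaries the last entries of
    `forward` and `backward` are ignored; with periodic boundaries they
    connect site n-1 to site 0 and fill the two corner elements.
    """
    n = len(diag)
    h = [[0.0] * n for _ in range(n)]
    for i in range(n):
        h[i][i] = diag[i]
    for i in range(n - 1):
        h[i][i + 1] = forward[i]
        h[i + 1][i] = backward[i]
    if periodic:
        h[n - 1][0] = forward[n - 1]
        h[0][n - 1] = backward[n - 1]
    return h


def diagonal_scaling(forward, backward, n):
    """Entries s_i of S = diag(s), with s_{i+1} = s_i * sqrt(forward_i / backward_i).

    Requires forward_i * backward_i > 0, so every s_i is positive.
    """
    s = [1.0]
    for i in range(n - 1):
        s.append(s[-1] * math.sqrt(forward[i] / backward[i]))
    return s


def similarity(h, s):
    """Return S h S^{-1} for the diagonal matrix S = diag(s)."""
    n = len(h)
    return [[s[i] * h[i][j] / s[j] for j in range(n)] for i in range(n)]


def asymmetry(m):
    """Largest |m[i][j] - m[j][i]|; zero means the matrix is symmetric."""
    n = len(m)
    return max(abs(m[i][j] - m[j][i]) for i in range(n) for j in range(n))


if __name__ == "__main__":
    n = 4
    diag, forward, backward = [0.0] * n, [2.0] * n, [0.5] * n
    s = diagonal_scaling(forward, backward, n)
    open_chain = similarity(build_chain(diag, forward, backward), s)
    ring = similarity(build_chain(diag, forward, backward, periodic=True), s)
    print("open boundaries:", asymmetry(open_chain))
    print("periodic boundaries:", asymmetry(ring))
```

With open boundaries the scaled matrix is symmetric, whereas the same scaling leaves the corner elements of the periodic chain unequal, which is one simple way to see why the two boundary conditions behave differently.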

# 永遠非マルコフ性を持つ3量子系における真の多部絡み検出

Detecting genuine multipartite entanglement in three-qubit systems with eternal non-Markovianity ( http://arxiv.org/abs/2110.05211v2 )

ライセンス: Link先を確認

Ankit Vaishy, Subhadip Mitra and Samyadeb Bhattacharya

(参考訳) 量子非マルコフ演算を用いて, 真に多元的絡み合い状態を検出する新しいプロトコルを考案する。我々は、永遠の非マルコフ性として知られる特定の種類の非マルコフ性を利用して、非完全正の写像を構築し、二分割状態のフィルタリングを行い、真の多部交絡を検出する。さらに,本理論に基づき,真に多元的絡み合い状態を検出する証人演算子を提案する。我々の研究は、絡み合い理論と量子非マルコビアン性の間の未解明の接続に光を当てている。

We devise a novel protocol to detect genuinely multipartite entangled states by harnessing quantum non-Markovian operations. We utilize a particular type of non-Markovianity known as eternal non-Markovianity to construct a non-completely-positive map to filter out the bi-separable states and detect genuine multipartite entanglement. We further propose a witness operator to detect genuinely multipartite entangled states experimentally based on this theory. Our study sheds light on a hitherto unexplored connection between entanglement theory and quantum non-Markovianity.

翻訳日:2023-03-11 19:17:47 公開日:2022-06-13

# 産業応用のための量子アニーリング:序論とレビュー

Quantum Annealing for Industry Applications: Introduction and Review ( http://arxiv.org/abs/2112.07491v3 )

ライセンス: Link先を確認

Sheir Yarkoni, Elena Raponi, Thomas Bäck, and Sebastian Schmitt

(参考訳) 量子アニーリング(quantum annealing)は、組合せ最適化問題を解くために使用できるヒューリスティックな量子最適化アルゴリズムである。近年、量子技術の進歩により、プログラマブルな使用のために量子アニールアルゴリズムを実装する小型および中規模量子プロセッサの開発が可能となった。具体的には、D-Wave Systemsによって製造された量子アニールプロセッサが研究され、様々な分野の研究と産業の両方で広くテストされている。本稿では、ヒューリスティックな量子最適化アルゴリズムとしての量子アニーリングの理論的動機、そのような量子プロセッサを使用するために必要なソフトウェアとハードウェア、そしてそれらを用いて実証された最先端の応用と概念実証に関する文献的考察を行う。我々のレビューの目的は、量子アニール技術の応用に関する集中的かつ凝縮した情報源を提供することである。我々は、様々な分野の研究者と実践者の両方にとって量子アニーリングの利点、限界、可能性を明らかにする。

Quantum annealing is a heuristic quantum optimization algorithm that can be used to solve combinatorial optimization problems. In recent years, advances in quantum technologies have enabled the development of small- and intermediate-scale quantum processors that implement the quantum annealing algorithm for programmable use. Specifically, quantum annealing processors produced by D-Wave Systems have been studied and tested extensively in both research and industrial settings across different disciplines. In this paper we provide a literature review of the theoretical motivations for quantum annealing as a heuristic quantum optimization algorithm, the software and hardware that is required to use such quantum processors, and the state-of-the-art applications and proofs-of-concepts that have been demonstrated using them. The goal of our review is to provide a centralized and condensed source regarding applications of quantum annealing technology. We identify the advantages, limitations, and potential of quantum annealing for both researchers and practitioners from various fields.

翻訳日:2023-03-04 14:12:09 公開日:2022-06-13

# bose-hubbard dimerモデルにおける粒子トンネルによるモード絡み合いのダイナミクス

Dynamics of mode entanglement induced by particle-tunneling in the extended Bose-Hubbard dimer model ( http://arxiv.org/abs/2112.12382v2 )

ライセンス: Link先を確認

Alan J. Barrios, Andrea Vald\'es-Hern\'andez and Francisco J. Sevilla

(参考訳) モードエンタングルメントの進化は、2つのアクセシブルモードを持つ2つの区別できないボソンの系に対して解析される。エンタングルメントは、各モードにおけるボソンの数が不変であるときに常に静止するが、単粒子トンネルと二粒子トンネルの影響下では豊かなダイナミクスを示す。これらの効果をパラダイム的状態の族で解析することにより, トンネル遷移速度と初期状態の調整を変化させることで, モード絡み合いの特定のダイナミクスの設計と制御のためのガイダンスを提供する。

The evolution of mode entanglement is analysed for a system of two indistinguishable bosons with two accessible modes. Whereas entanglement remains stationary whenever the number of bosons in each mode is left invariant, it exhibits rich dynamics under the effects of single- and two-particle tunneling. By analysing such effects in paradigmatic families of states, our results provide guidance for the design and control of specific dynamics of mode entanglement, by varying the tunneling transition rates and the preparation of the initial state.
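
A minimal sketch of how mode entanglement can be quantified for two bosons in two modes, assuming a pure state written in the Fock basis |2,0>, |1,1>, |0,2>. Because the total particle number is fixed, the reduced state of one mode is diagonal in its occupation number, so the entanglement entropy reduces to the Shannon entropy of the squared amplitudes. The function name and example states are illustrative and are not taken from the paper.

```python
import cmath
import math

def mode_entanglement_entropy(amplitudes):
    """Entanglement entropy (in bits) between the two modes of a pure state
    given by its amplitudes over the Fock basis |2,0>, |1,1>, |0,2>."""
    probs = [abs(c) ** 2 for c in amplitudes]
    norm = sum(probs)
    return -sum((p / norm) * math.log2(p / norm) for p in probs if p > 0)

fock_state = [0, 1, 0]                              # |1,1>: a product state of the modes
noon_state = [1 / math.sqrt(2), 0, 1 / math.sqrt(2)]  # (|2,0> + |0,2>)/sqrt(2)

# Evolution that leaves the occupation of each mode invariant only adds phases
# to the amplitudes, so the entropy does not change.
phased_state = [c * cmath.exp(-1j * 0.7 * k) for k, c in enumerate(noon_state)]

s_fock = mode_entanglement_entropy(fock_state)
s_noon = mode_entanglement_entropy(noon_state)
s_phased = mode_entanglement_entropy(phased_state)
```

Tunneling terms, by contrast, move weight between the three basis states and therefore change the entropy over time.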

翻訳日:2023-03-03 18:07:46 公開日:2022-06-13

# 古典軌道上のリドバーグ電子のトポロジカル分子とトポロジカル局在

Topological Molecules and Topological Localization of a Rydberg Electron on a Classical Orbit ( http://arxiv.org/abs/2201.10246v2 )

ライセンス: Link先を確認

Ali Emami Kopaei, Xuedong Tian, Krzysztof Giergiel, and Krzysztof Sacha

(参考訳) 原子が互いに惹きつけると分子を形成できるという一般的な知識である。ここでは、原子の境界状態が魅力的な相互作用の結果ではなく、トポロジカルな起源を持つ分子を作ることができることを示す。すなわち、原子の有界状態は、トポロジカルモデルの位相的に保護されたエッジ状態に対応する。このようなトポロジカル分子は、超低温原子間の相互作用強度が時間的に適切に変調されたときに実現できる。同様の機構により、ライドバーグ原子が適切に変調されたマイクロ波磁場によって摂動した場合、古典軌道上の電子の位相的に保護された局在化を実現することができる。

It is common knowledge that atoms can form molecules if they attract each other. Here, we show that it is possible to create molecules where bound states of the atoms are not the result of attractive interactions but have a topological origin. That is, the bound states of the atoms correspond to the topologically protected edge states of a topological model. Such topological molecules can be realized if the interaction strength between ultra-cold atoms is properly modulated in time. A similar mechanism allows one to realize topologically protected localization of an electron on a classical orbit if a Rydberg atom is perturbed by a properly modulated microwave field.

翻訳日:2023-02-27 22:39:49 公開日:2022-06-13

# デュアルユニタリ回路ダイナミクスにおける創発的量子状態設計と二元性

Emergent quantum state designs and biunitarity in dual-unitary circuit dynamics ( http://arxiv.org/abs/2202.12306v2 )

ライセンス: Link先を確認

Pieter W. Claeys, Austen Lamacraft

(参考訳) 最近の研究は、量子クエンチに続くユニタリ力学における新しい種類のランダム行列の挙動の出現について研究している。時間進化状態から始めて、小さなサブシステム上でサポートされた純粋な状態のアンサンブルは、システムの残りの部分で射影測定を行い、投影されたアンサンブルを生成することができる。カオス量子系において、このような投影されたアンサンブルは均一なハールランダムアンサンブルと区別不能になり、量子状態設計につながると推測された。正確な結果が最近HoとChoi [Phys. Rev. Lett. 128, 060601 (2022)]によって、自己双対点におけるキックド・イジング模型に対して提示された。我々は、可解な初期状態と測定を持つ一般のカオス的デュアルユニタリ回路に拡張できる代替構成を提供し、基礎となるデュアルユニタリ性の役割を強調するとともに、デュアルユニタリ回路モデルが厳密可解性とランダム行列の振る舞いの両方をどのように示すかを示す。双ユニタリ接続(biunitary connections)の結果に基づいて、複素Hadamard行列とユニタリ誤差基底がともに可解な測定方式につながることを示す。

Recent works have investigated the emergence of a new kind of random matrix behaviour in unitary dynamics following a quantum quench. Starting from a time-evolved state, an ensemble of pure states supported on a small subsystem can be generated by performing projective measurements on the remainder of the system, leading to a projected ensemble. In chaotic quantum systems it was conjectured that such projected ensembles become indistinguishable from the uniform Haar-random ensemble and lead to a quantum state design. Exact results were recently presented by Ho and Choi [Phys. Rev. Lett. 128, 060601 (2022)] for the kicked Ising model at the self-dual point. We provide an alternative construction that can be extended to general chaotic dual-unitary circuits with solvable initial states and measurements, highlighting the role of the underlying dual-unitarity and further showing how dual-unitary circuit models exhibit both exact solvability and random matrix behaviour. Building on results from biunitary connections, we show how complex Hadamard matrices and unitary error bases both lead to solvable measurement schemes.

翻訳日:2023-02-24 01:25:39 公開日:2022-06-13

# 単体量子重力における光線ゆらぎ

Light ray fluctuations in simplicial quantum gravity ( http://arxiv.org/abs/2203.07854v2 )

ライセンス: Link先を確認

Ding Jia

(参考訳) 時空の量子領域を通る光線伝播の量子ゆらぎに関する非摂動的研究は、長らく待ち望まれてきた。ローレンツ型単体量子重力の理論において、2、3、4次元時空の対称性が縮小されたボックス領域を通過した後、試験光線が異なる場所に到達する確率を計算する。固定境界条件では、全ての結合定数の絶対値が比較的小さい場合、光線ゆらぎは一般的に大きいことが判明した。固定結合定数の場合、宇宙定数、アインシュタイン・ヒルベルト項およびR二乗項を持つ2次元理論では、境界サイズが減少するにつれて光線ゆらぎは最初に増加し、その後減少する。一方、宇宙定数とアインシュタイン・ヒルベルト項を持つ3次元および4次元理論では、境界サイズが小さくなるにつれて光線ゆらぎは単に増大する。ちなみに、2次元量子重力の研究において、宇宙定数とアインシュタイン・ヒルベルト項について以前に指摘された大域的な時間・空間双対性は、リッチスカラーの任意の偶数冪が加わったときにも成り立つことを示す。最後に、非摂動的ローレンツ量子重力の連続極限を得るのに光線ゆらぎをどのように利用できるかを論じる。

A non-perturbative study on the quantum fluctuations of light ray propagation through a quantum region of spacetime is long overdue. Within the theory of Lorentzian simplicial quantum gravity, we compute the probabilities for a test light ray to land at different locations after travelling through a symmetry-reduced box region in 2, 3, and 4 spacetime dimensions. It is found that for fixed boundary conditions, light ray fluctuations are generically large when all coupling constants are relatively small in absolute value. For fixed coupling constants, as the boundary size is decreased, light ray fluctuations first increase and then decrease in a 2D theory with the cosmological constant, Einstein-Hilbert, and R-squared terms. In 3D and 4D theories with the cosmological constant and Einstein-Hilbert terms, by contrast, light ray fluctuations simply increase as the boundary size is decreased. Incidentally, when studying 2D quantum gravity we show that the global time-space duality with the cosmological constant and Einstein-Hilbert terms noted previously also holds when arbitrary even powers of the Ricci scalar are added. We close by discussing how light ray fluctuations can be used in obtaining the continuum limit of non-perturbative Lorentzian quantum gravity.

翻訳日:2023-02-22 09:14:52 公開日:2022-06-13

# 非相反系における皮膚効果の実スペクトルと位相遷移

Real spectra and phase transition of skin effect in nonreciprocal systems ( http://arxiv.org/abs/2203.08618v3 )

ライセンス: Link先を確認

Qi-Bo Zeng and Rong L\"u

(参考訳) 実数の最近接ホッピングを持つ一次元非相反格子について検討し、開境界条件下でのエネルギースペクトルが完全に実または完全に虚になりうることを示す。さらに、等間隔のサイトに実非相反ホッピングを導入した一次元モザイク格子のスペクトル特性と非エルミート皮膚効果についても検討する。そのような格子の固有エネルギーは、非相反性が変化するにつれて、実-複素-虚、あるいは実-複素の遷移を起こす。さらに、皮膚効果はモザイク非相反性の周期に依存する相転移を示す。バルク状態は、臨界点を横切ることで格子の一方の端から反対の端へ急激に移動し、これは周期境界条件下でのスペクトルの点ギャップの閉鎖と再開を伴う。遷移の相図を示し、臨界境界を解析的に決定する。本研究は、非エルミート系におけるエネルギースペクトルと皮膚効果の興味深い性質を明らかにする。

We study one-dimensional nonreciprocal lattices with real nearest-neighbor hopping and find that the energy spectra under open boundary conditions can be entirely real or imaginary. We further investigate the spectral properties and the non-Hermitian skin effect in one-dimensional mosaic lattices with real nonreciprocal hopping introduced at equally spaced sites. The eigenenergies of such lattices undergo a real-complex-imaginary or real-complex transition as the nonreciprocity varies. Moreover, the skin effect exhibits phase transitions depending on the period of the mosaic nonreciprocity. The bulk states are abruptly shifted from one end of the lattice to the opposite one by crossing the critical points, accompanied by the closing and reopening of point gaps in the spectra under periodic boundary conditions. The phase diagrams of the transition are presented and the critical boundaries are analytically determined. Our work unveils the intriguing properties of the energy spectrum and skin effect in non-Hermitian systems.
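
To make the "entirely real or imaginary" statement concrete, here is a small sketch for the simplest case only: a uniform open chain with real hoppings `t_right` and `t_left` on every bond. This is not the mosaic model of the paper. For a tridiagonal Toeplitz matrix the open-boundary eigenvalues are E_k = 2 sqrt(t_right t_left) cos(k pi / (N + 1)), which are all real when the hoppings have the same sign and all imaginary when the signs differ. The function names are hypothetical; the second function checks the formula against the characteristic polynomial.

```python
import cmath
import math

def obc_spectrum(t_right, t_left, n_sites):
    """Closed-form eigenvalues of an open chain with uniform real hoppings."""
    root = cmath.sqrt(t_right * t_left)
    return [2 * root * math.cos(k * math.pi / (n_sites + 1))
            for k in range(1, n_sites + 1)]

def char_poly(energy, t_right, t_left, n_sites):
    """det(E - H) from the recurrence D_n = E * D_{n-1} - t_right * t_left * D_{n-2}."""
    prev, curr = 1, energy
    for _ in range(n_sites - 1):
        prev, curr = curr, energy * curr - t_right * t_left * prev
    return curr

real_spectrum = obc_spectrum(1.0, 0.5, 6)    # same-sign hoppings
imag_spectrum = obc_spectrum(1.0, -0.5, 6)   # opposite-sign hoppings
```

The spectrum depends on the hoppings only through their product, which is why nonreciprocity alone does not make the open-boundary energies complex in this uniform case.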

翻訳日:2023-02-21 23:07:28 公開日:2022-06-13

# 雇用プロセスにおけるアルゴリズム的障害識別の取り組み--倫理的、法的、技術的分析

Tackling Algorithmic Disability Discrimination in the Hiring Process: An Ethical, Legal and Technical Analysis ( http://arxiv.org/abs/2206.06149v1 )

ライセンス: Link先を確認

Maarten Buyl, Christina Cociancig, Cristina Frattone, Nele Roekens

(参考訳) 障害のある人(PWD)に対するアルゴリズム的差別に取り組むには、特有の倫理的、法的、技術的課題のために、他の保護された特徴に適用されるものとは根本的に異なるアプローチが求められる。我々は、これらの課題を、雇用プロセスで使用される人工知能(AI)システム(自動雇用システム、AHS)の文脈で特に取り上げる。そこでは自動化された評価手順が独自の倫理的および法的考慮の対象となり、PWDに否定できない悪影響を及ぼす。本稿では、障害者差別に関して、AIを活用した雇用が生み出す懸念と機会について述べる。最終的には、このトピックに関するさらなる研究を奨励することを目指す。そのために、いくつかの出発点を確立し、倫理学者、立法者、支援者、そしてAI実践者のためのロードマップを設計する。

Tackling algorithmic discrimination against persons with disabilities (PWDs) demands a distinctive approach that is fundamentally different to that applied to other protected characteristics, due to particular ethical, legal, and technical challenges. We address these challenges specifically in the context of artificial intelligence (AI) systems used in hiring processes (or automated hiring systems, AHSs), in which automated assessment procedures are subject to unique ethical and legal considerations and have an undeniable adverse impact on PWDs. In this paper, we discuss concerns and opportunities raised by AI-driven hiring in relation to disability discrimination. Ultimately, we aim to encourage further research into this topic. Hence, we establish some starting points and design a roadmap for ethicists, lawmakers, advocates as well as AI practitioners alike.

翻訳日:2023-02-19 17:44:00 公開日:2022-06-13

# 「リーキーパイプライン」への取り組み--コンピュータ教育における女性を育成・維持するための行動のレビューと分類

Addressing the "Leaky Pipeline": A Review and Categorisation of Actions to Recruit and Retain Women in Computing Education ( http://arxiv.org/abs/2206.06113v1 )

ライセンス: Link先を確認

Alina Berry, Susan McKeever, Brenda Murphy, Sarah Jane Delany

(参考訳) コンピューティング教育におけるジェンダーの不均衡は、世界中でよく知られた問題である。「リーキーパイプライン」という言葉は、女性が上級職に進む前に定着しないことを表すためにしばしば使われる。ここ数十年、多くの取り組みがリーキーパイプラインの問題を対象としてきた。本論文は、高等教育における学部のコンピューティング課程および関連課程への女性の入学と定着を促進するために用いられる手法に関する取り組みを包括的にレビューする。主な目的は、ある程度の効果を示した介入や取り組み(これを「アクション」と呼ぶ)を特定することであった。第2の目的は、今後のアクションの議論、比較、計画を可能にするために、発見を分類として構成することであった。調査した研究のかなりの部分で直面した課題は、評価の欠如、すなわち、取り組みと定着や入学の成果との直接的な関係の評価の欠如であった。アクションは、政策、教育法、影響と支援、プロモーションとエンゲージメントの4つのグループに分類された。政策アクションには、機関レベルでの支援と、場合によっては構造的な変化が必要である。教育法アクションは、コンピューティング課程の教育に関連する取り組みである。影響と支援のカテゴリには、女性がコンピューティングを選択するよう働きかけ、入学後は在籍し続けられるよう支援し励ます方法に関するアクションが含まれる。最後に、プロモーションとエンゲージメントのアクションは、コンピューティング関連の課程を宣伝する取り組みであり、エンゲージメントやアウトリーチ活動を伴う。我々は、各カテゴリおよびサブカテゴリのアクションに関する文献を特定し、その分類を提示する。アクションの直接的な影響を評価する上での課題を議論し、この研究が次の段階、すなわち学部のコンピューティング課程における女性の定着と入学を促進するためのアクションのツールキットにどのようにつながるかを概説する。

Gender imbalance in computing education is a well-known issue around the world. The term "leaky pipeline" is often used to describe the lack of retention of women before they progress to senior roles. Numerous initiatives have targeted the problem of the leaky pipeline in recent decades. This paper provides a comprehensive review of initiatives related to techniques used to boost recruitment and retention of women in undergraduate computing and related courses in higher education. The primary aim was to identify interventions or initiatives (which we called "actions") that have shown some effectiveness. A secondary objective was to structure our findings as a categorisation, in order to enable future action discussion, comparison and planning. A particular challenge faced in a significant portion of the work was the lack of evaluation, i.e. the assessment of the direct relationship between the initiatives and the outcomes on retention or recruitment. The actions were categorised into four groups: Policy, Pedagogy, Influence and Support, and Promotion and Engagement. Policy actions need support and potentially structural change at institution level. Pedagogy actions are initiatives related to the teaching of computing courses. The Influence and Support category includes actions associated with ways to influence women to choose computing and, once enrolled, to support and encourage them to stay. Finally, Promotion and Engagement actions are initiatives to promote computing-based courses and involve engagement and outreach activities. We present our categorisation, identifying the literature related to actions under each category and subcategory. We discuss the challenges with evaluating the direct impact of actions and outline how this work leads towards the next phase of our work: a toolkit of actions to promote retention and recruitment of women in computing undergraduate courses.

翻訳日:2023-02-19 17:43:45 公開日:2022-06-13

# 希望のための情報ソースの理論:信念、欲求、イマジネーション、メタ認知

Theorizing Information Sources for Hope: Belief, Desire, Imagination, and Metacognition ( http://arxiv.org/abs/2206.03311v2 )

ライセンス: Link先を確認

Tim Gorichanaz

(参考訳) はじめに。希望は可能な(まだ不明な)望ましい結果に向けられたポジティブな態度である。希望は美徳であるが、絶望は広く、現在の出来事だけでなく、現在の出来事に関する情報にも関係しているように見える。本稿では,情報を通して希望がどのように引き起こされるかを検討する。方法。本研究は、理論的議論を進めるために概念分析と設計の哲学的手法を用いる。分析。まず、希望の概念化が提供され、主に徳の倫理に関する仕事を描く。次に、希望のための4種類の情報ソースが理論化され、哲学と心理学から仕事を構築、合成する。結果だ希望に満ちた情報ソースの4つのカテゴリは、過去または未来についての信念を形成する情報、未来の可能性に関する道徳的想像を巻き込む情報、特定の道徳的成果に対する欲求を喚起する情報、メタ認知のための情報、または希望に関してどのように情報を得るかである。結論だ多くの場合、情報に反応することが望まれます。これは、情報専門家や学者が人々と希望、特に困難な時期をつなぐためのモラルの機会であることを示唆している。さらなる研究、特に情報行動や実践への道筋が提案されている。

Introduction. Hope is a positive attitude oriented toward a possible (yet uncertain), desired outcome. Though hope is a virtue, hopelessness is widespread and seems related not only to current events but also to information about current events. This paper examines how hope can be sparked through information. Method. This study uses the philosophical methods of conceptual analysis and design to advance a theoretical argument. Analysis. First, a conceptualization of hope is offered, drawing on work primarily in virtue ethics. Then, four types of information sources for hope are theorized, building on and synthesizing work from philosophy and psychology. Results. Four categories of information source conducive to hopefulness are identified: information for forming beliefs about the past or future; information for engaging the moral imagination regarding possibilities for the future; information for sparking desire for particular moral outcomes; and information for metacognition, or about how we become informed with respect to hope. Conclusions. Hope is, in many cases, responsive to information. This suggests a moral opportunity for information professionals and scholars to work toward connecting people with information for hope, particularly in difficult times. Avenues for further research, particularly in information behavior and practices, are suggested.

翻訳日:2023-02-19 17:38:00 公開日:2022-06-13

# キャリブレーションサブセット選択によるスクリーニングプロセスの改善

Improving Screening Processes via Calibrated Subset Selection ( http://arxiv.org/abs/2202.01147v3 )

ライセンス: Link先を確認

Lequn Wang, Thorsten Joachims, Manuel Gomez Rodriguez

(参考訳) 治験に合格した患者の検索や検索エンジンの検索パイプラインなど、多くの選択プロセスは複数の段階で構成されており、初期スクリーニング段階は最も有望な候補の短縮にリソースを集中させる。本稿では,手動で構築するか,訓練するかに関わらず,スクリーニング分類器がどのような保証を提供できるかを検討する。我々は、現在の解が分布のない理論的な保証を享受していないことを発見した -- 一般に、完全に校正された分類器でさえ、そのショートリストが最適でない候補のプールが常に存在することを示す。次に,任意の分類器とある程度のキャリブレーションデータが与えられた場合,希望する候補数を含む候補の候補の至近短リストを探索する,分散非分布スクリーニングアルゴリズム -- calibrated subset selection (css) -- を開発した。さらに、特定のグループ間で複数の分類器を校正するCSSの変種が、証明可能な多様性を保証するショートリストを作成することができることを示す。米国国勢調査調査データを用いた実験は,我々の理論的結果を検証し,本アルゴリズムが提供したショートリストが,いくつかの競合ベースラインが提供したショートリストよりも優れていることを示す。

Many selection processes such as finding patients qualifying for a medical trial or retrieval pipelines in search engines consist of multiple stages, where an initial screening stage focuses the resources on shortlisting the most promising candidates. In this paper, we investigate what guarantees a screening classifier can provide, independently of whether it is constructed manually or trained. We find that current solutions do not enjoy distribution-free theoretical guarantees -- we show that, in general, even for a perfectly calibrated classifier, there always exist specific pools of candidates for which its shortlist is suboptimal. Then, we develop a distribution-free screening algorithm -- called Calibrated Subset Selection (CSS) -- that, given any classifier and some amount of calibration data, finds near-optimal shortlists of candidates that contain a desired number of qualified candidates in expectation. Moreover, we show that a variant of CSS that calibrates a given classifier multiple times across specific groups can create shortlists with provable diversity guarantees. Experiments on US Census survey data validate our theoretical results and show that the shortlists provided by our algorithm are superior to those provided by several competitive baselines.
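
The following is a rough sketch of threshold-based shortlisting with calibration data, loosely in the spirit of the screening setting described above. It is not the CSS algorithm from the paper: it uses a plain empirical estimate of how many qualified candidates a threshold would shortlist and omits the finite-sample correction on which a distribution-free guarantee would depend. All function names and data are hypothetical.

```python
def pick_threshold(cal_scores, cal_labels, pool_size, target_qualified):
    """Return the largest score threshold whose empirical estimate of the number
    of qualified candidates shortlisted from a pool of pool_size meets the target."""
    n = len(cal_scores)
    for t in sorted(set(cal_scores), reverse=True):
        # Calibration candidates that are qualified (label 1) and score at least t.
        hits = sum(1 for s, y in zip(cal_scores, cal_labels) if s >= t and y == 1)
        if pool_size * hits / n >= target_qualified:
            return t
    return min(cal_scores)  # target unreachable: fall back to the lowest calibration score

def shortlist(scores, threshold):
    """Indices of candidates whose classifier score reaches the threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]

# Hypothetical calibration set: classifier scores with ground-truth qualification labels.
cal_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
cal_labels = [1, 1, 0, 1, 0, 1, 0, 0]
threshold = pick_threshold(cal_scores, cal_labels, pool_size=16, target_qualified=5)
selected = shortlist([0.95, 0.5, 0.65, 0.1], threshold)
```

Choosing the largest feasible threshold keeps the shortlist as small as the estimate allows; a guarantee-bearing method would lower the threshold further to account for estimation error.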

翻訳日:2023-02-19 14:38:21 公開日:2022-06-13

# 非対称x状態の局所利用可能な量子相関

Local available quantum correlations of non-symmetric X states ( http://arxiv.org/abs/2204.07552v2 )

ライセンス: Link先を確認

David Bellorin and Hermann Albrecht and Douglas F. Mundarain

(参考訳) Mundarainらによって定義された局所可利用量子相関(LAQC)は、非対称な2量子ビットのX状態、すなわち、サブシステムの交換の下で不変でないX状態に対して解析され、したがってノルムが異なる局所ブロッホベクトルを持つ。 LAQC定量器の簡単な解析式を得る。例えば、ウェルナー状態と一般x状態に対する振幅減衰チャネルの局所的応用について解析する。この局所的な量子チャネルは、いくつかのケースでは量子不協和を生成することができるが、LAQCにはそのような結果はあり得ない。この研究は、いわゆる対称および反対称X状態に対する我々の以前の結果と共に、2-量子X状態に対するLAQC量子化器の正確な解析式を追求する。

Local available quantum correlations (LAQC), as defined by Mundarain et al., are analyzed for non-symmetric 2-qubit X states, that is, X-states that are not invariant under the exchange of subsystems and therefore have local Bloch vectors whose norms are different. A simple analytic expression for their LAQC quantifier is obtained. As an example, we analyze the local application of the amplitude damping channel for Werner states and general X states. Although this local quantum channel can create quantum discord in some cases, no such outcome is possible for LAQC, which hints toward their monotonicity under LOCC operations. This work, along with our previous result for so-called symmetric and anti-symmetric X states, completes the pursuit of exact analytical expressions for the LAQC quantifier for 2-qubit X states.

翻訳日:2023-02-16 21:30:06 公開日:2022-06-13

# 一般化Maxwell-BlochフレームワークにおけるPML吸収境界条件の反射誤差の低減

Reducing the reflection error of PML absorbing boundary conditions within a generalized Maxwell-Bloch framework ( http://arxiv.org/abs/2206.04597v2 )

ライセンス: Link先を確認

Johannes Popp, Lukas Seitner, Michael Haider, and Christian Jirauschek (Department of Electrical and Computer Engineering, Technical University of Munich, Arcisstr. 21, 80333 Munich, Germany)

(参考訳) 完全整合層(PML)吸収境界条件を含む全波数値Maxwell-Blochシミュレーションツールを実演する。シミュレーション領域の境界における有害な反射誤差を回避するために、内部量子系から生じるインピーダンスミスマッチ効果を考慮した適応型PMLモデルを導入する。修正PMLモデルの数値検証のため、テラヘルツ量子カスケードレーザー(QCL)構造のアクティブゲイン媒体にシミュレーションツールを適用する。我々のMaxwell-Blochシミュレーション手法において、アクティブゲイン媒体の打ち切り(truncation)に対する吸収特性の改善が得られた。

We demonstrate a full-wave numerical Maxwell-Bloch simulation tool including perfectly matched layer (PML) absorbing boundary conditions. To avoid detrimental reflection errors at the boundary of the simulation domain, an adapted PML model is introduced, which takes into account impedance mismatch effects arising from the internal quantum system. For the numerical validation of the modified PML model the simulation tool is applied to the active gain medium of a terahertz quantum cascade laser (QCL) structure. Improved absorbing characteristics for the truncation of active gain media in our Maxwell-Bloch simulation approach are obtained.

翻訳日:2023-02-10 01:24:02 公開日:2022-06-13

# Ancilla-assisted process tomography の意義と感度

Faithfulness and sensitivity for ancilla-assisted process tomography ( http://arxiv.org/abs/2206.05899v1 )

ライセンス: Link先を確認

Seok Hyung Lie, Hyunseok Jeong

(参考訳) 系に作用する未知の量子チャネルの完全な情報を包含できる系アンシラ二成分状態は、忠実状態と呼ばれる。 d'ariano と presti によって証明された、状態の忠実さと対応するジャミョルコフスキー写像の可逆性の間の同値性は、量子チャネルではなくトレース非開化量子演算を仮定した証明が不完全であるにもかかわらず、ancilla-assisted process tomography に有用である。等価性の証明を完了し、量子チャネルの様々なクラスに忠実性の一般化を導入する。また、感度と呼ばれるより一般的な概念を探求し、量子チャネルの非自明な作用によって量子状態の性質が変化する。両特性を、ユニタリチャネル、ランダムユニタリ演算、ユニタリ演算などの量子チャネルの重要なクラスに特徴付けることにより、それらの関係を考察する。予期せぬ(非等価な)結果が量子チャネルの構造に光を当て、量子チャネルの様々なサブクラスに忠実または敏感な量子状態を特徴づけるためには2つの量子状態のクラスのみが必要であることを示した。例えば、量子過程のトモグラフィーと量子相関の関係は、局所的に観測可能な観測不能な二成分状態のみが単位チャネルの効果を感知するために使用できることが分かるため、明らかにされる。

A system-ancilla bipartite state capable of containing the complete information of an unknown quantum channel acting on the system is called faithful. The equivalence between faithfulness of a state and invertibility of the corresponding Jamiolkowski map proved by D'Ariano and Presti has been a useful characterization for ancilla-assisted process tomography, although the proof was incomplete, as they assumed trace-nonincreasing quantum operations rather than quantum channels. We complete the proof of the equivalence and introduce the generalization of faithfulness to various classes of quantum channels. We also explore a more general notion we call sensitivity, the property of a quantum state being altered by any nontrivial action of a quantum channel. We study their relationship by characterizing both properties for important classes of quantum channels such as unital channels, random unitary operations and unitary operations. Unexpected (non-)equivalence results among them shed light on the structure of quantum channels by showing that we need only two classes of quantum states for characterizing quantum states faithful or sensitive to various subclasses of quantum channels. For example, it reveals the relation between quantum process tomography and quantum correlation, as it turns out that only bipartite states that have no local classical observable at all can be used to sense the effect of unital channels.

翻訳日:2023-02-09 12:54:33 公開日:2022-06-13

# 合成量子チャネルによる任意の量子相関の検出

Detection of arbitrary quantum correlations via synthesized quantum channels ( http://arxiv.org/abs/2206.05883v1 )

ライセンス: Link先を確認

Ze Wu, Ping Wang, Tianyun Wang, Yuchen Li, Ran Liu, Yuquan Chen, Xinhua Peng, Ren-Bao Liu, Jiangfeng Du

(参考訳) 量子相関は、量子多体系の構造とダイナミクスに関する重要な情報である。時間順序の異なる高次量子相関には多くの種類があるが、既存の検出方法にアクセスできるものはごくわずかである。近年,任意の種類の相関を選択的に抽出するために,逐次弱測定に基づく量子センシング手法が提案されている。しかし、その実験的な実装はまだ解明されていない。ここでは任意のタイプの量子相関の抽出を示す。我々は,従来の弱測定方式を合成量子チャネルを用いたプロトコルに一般化し,単一およびアンサンブル量子システムを含むより普遍的なシナリオに適用した。この量子チャネル法では、センサの様々な制御が重ね合わされ、所望の量子相関を測定するための特定の経路に沿ってセンサターゲットの進化が選択される。核磁気共鳴法の汎用性を用いて、核スピンターゲットの2次および4次相関を別の核スピンセンサで抽出することに成功した。量子相関の完全な特徴付けは、量子多体系を理解し、基本量子物理学を探求し、量子技術を開発するための新しいツールを提供する。

Quantum correlations are key information about the structures and dynamics of quantum many-body systems. There are many types of high-order quantum correlations with different time orderings, but only a few of them are accessible to the existing detection methods. Recently, a quantum-sensing approach based on sequential weak measurement was proposed to selectively extract arbitrary types of correlations. However, its experimental implementation is still elusive. Here we demonstrate the extraction of arbitrary types of quantum correlations. We generalized the original weak measurement scheme to a protocol using synthesized quantum channels, which can be applied to more universal scenarios including both single and ensemble quantum systems. In this quantum channel method, various controls on the sensors are superimposed to select the sensor-target evolution along a specific path for measuring a desired quantum correlation. Using the versatility of nuclear magnetic resonance techniques, we successfully extract the second- and fourth-order correlations of a nuclear-spin target by another nuclear-spin sensor. The full characterization of quantum correlations provides a new tool for understanding quantum many-body systems, exploring fundamental quantum physics, and developing quantum technologies.

翻訳日:2023-02-09 12:53:51 公開日:2022-06-13

# 高次元グラフ上の計測に基づく量子ウォーク

Measurement-Based Quantum Walks on High-Dimensional Graphs ( http://arxiv.org/abs/2206.06059v1 )

ライセンス: Link先を確認

Syamsundar De, Vahid Ansari, Jan Sperling, Sonja Barkhofen, Benjamin Brecht and Christine Silberhorn

(参考訳) 高次元および再構成可能なグラフ上の量子ウォーク(QWs)は、量子シミュレーションと情報処理タスクの完全な可能性を呼び出すことができる。しかしながら、このような大規模でプログラマブルな量子ウォークの実験的な実現は、既存のスキームの複雑さが大幅に増大するため、非常に困難である。この限界を克服して、グローバーが95%以上の類似度を持つ4次元超キューブの上を歩き、98%の類似度を持つ円と有限直線上の400ステップの量子ウォークを示す。これは、量子ウォークに対する新しい測定に基づくアプローチによって実現され、適切に回転されたベース上の測定がターゲットとなる進化ユニタリを実装する。提案手法は,gaussian boson sampling のような複雑なタスクに応用できるスケーラブルでプログラマブルな量子ネットワークの実装に向けた新たな道を開くものである。

Quantum walks (QWs) on high-dimensional and reconfigurable graphs can invoke the full potential of quantum simulation and information processing tasks. However, experimental realization of such large-scale and programmable quantum walks is quite challenging, owing to the significantly increased complexity of the existing schemes. Overcoming this limitation, we here demonstrate Grover walks on four-dimensional hypercubes with high similarities above 95%, and 400-step quantum walks on circles and finite lines with similarities of 98%. This is rendered possible by a novel measurement-based approach to quantum walks where measurements on appropriately rotated bases implement the targeted evolution unitaries. Our results open a new path towards the implementation of scalable and programmable quantum networks that can find application in complex tasks, such as Gaussian Boson Sampling.

翻訳日:2023-02-09 12:47:50 公開日:2022-06-13

# 超伝導ダッフィング発振器の散逸相転移における量子挙動

Quantum behavior of a superconducting Duffing oscillator at the dissipative phase transition ( http://arxiv.org/abs/2206.06338v1 )

ライセンス: Link先を確認

Qi-Ming Chen, Michael Fischer, Yuki Nojiri, Michael Renger, Edwar Xie, Matti Partanen, Stefan Pogorzalek, Kirill G. Fedorov, Achim Marx, Frank Deppe, Rudolf Gross

(参考訳) 決定論的非線形システムの非決定論的挙動を理解することは、ローレンツがそれを「バタフライ効果」と命名して以来、暗黙の夢であった。有名な例はダフィング発振器のヒステリシスとビスタビリティであり、古典的な記述では二重ウェルポテンシャルにおける2つの定常状態の共存に起因する。しかし、この解釈は、パラメータ空間全体において単一の一意的な定常状態が許容される量子力学的観点では失敗する。ここでは、超伝導ダッフィング発振器の非平衡ダイナミクスを測定し、古典的および量子的記述を量子メタスタビリティの統一的な図形で再現する。 2つの古典的な定常状態が実際には準安定状態であることを示す。古典的なヒステリシス体制では極めて長い寿命を持つが、最終的には量子力学によって許される唯一の定常状態に緩和されなければならない。準安定状態の寿命を十分に大きくすることで,11サイトBose-Hubbard格子における平均場の急激な変化を模した1次散逸相転移を観測する。また、量子状態トモグラフィーによる遷移の2つの相、すなわちコヒーレント状態相と臨界点によって分離された圧縮状態相を明らかにする。以上の結果から, 非平衡系のヒステリシスと不安定性を理解する上では, 突然の散逸相転移の背後にあるスムーズな量子状態の進化が明らかとなった。

Understanding the non-deterministic behavior of deterministic nonlinear systems has been an implicit dream since Lorenz named it the "butterfly effect". A prominent example is the hysteresis and bistability of the Duffing oscillator, which in the classical description is attributed to the coexistence of two steady states in a double-well potential. However, this interpretation fails in the quantum-mechanical perspective, where a single unique steady state is allowed in the whole parameter space. Here, we measure the non-equilibrium dynamics of a superconducting Duffing oscillator and reconcile the classical and quantum descriptions in a unified picture of quantum metastability. We demonstrate that the two classically regarded steady states are in fact metastable states. They have a remarkably long lifetime in the classical hysteresis regime but must eventually relax into a single unique steady state allowed by quantum mechanics. By engineering the lifetime of the metastable states sufficiently large, we observe a first-order dissipative phase transition, which mimics a sudden change of the mean field in an 11-site Bose-Hubbard lattice. We also reveal the two distinct phases of the transition by quantum state tomography, namely a coherent-state phase and a squeezed-state phase separated by a critical point. Our results reveal a smooth quantum state evolution behind a sudden dissipative phase transition, and they form an essential step towards understanding hysteresis and instability in non-equilibrium systems.

翻訳日:2023-02-09 12:39:55 公開日:2022-06-13

# 量子制御のための物理インフォームドニューラルネットワーク

Physics-informed neural networks for quantum control ( http://arxiv.org/abs/2206.06287v1 )

ライセンス: Link先を確認

Ariel Norambuena, Marios Mattheakis, Francisco J. Gonz\'alez and Ra\'ul Coto

(参考訳) 量子制御はユビキタスな研究分野であり、物理学者は量子システムのダイナミクスと特徴を掘り下げることができる。システムのステアリングに加えて、量子制御は様々な原子、光学、機械、固体システムに強力な応用をもたらした。近年,最適化プロセスに基づく従来の制御技術が,効率的な人工知能アルゴリズムに変換されている。本稿では,物理インフォームドニューラルネットワーク(PINN)を用いた最適量子制御問題の計算手法を提案する。提案手法は,高い確率で状態間移動問題を効率的に解き,短時間で進化させ,制御のパワーを最小化することにより,量子システムの開放に応用する。さらに、パラメータや初期条件の変化の下で同じ問題を解決するために、PINNの柔軟性を説明し、標準制御技術と比較して利点を示す。

Quantum control is a ubiquitous research field that has enabled physicists to delve into the dynamics and features of quantum systems. In addition to steering the system, quantum control has delivered powerful applications for various atomic, optical, mechanical, and solid-state systems. In recent years, traditional control techniques based on optimization processes have been translated into efficient artificial intelligence algorithms. Here, we introduce a computational method for optimal quantum control problems via physics-informed neural networks (PINNs). We apply our methodology to open quantum systems by efficiently solving the state-to-state transfer problem with high probabilities, short-time evolution, and minimizing the power of the control. Furthermore, we illustrate the flexibility of PINNs to solve the same problem under changes in parameters and initial conditions, showing advantages in comparison with standard control techniques.

翻訳日:2023-02-09 12:38:39 公開日:2022-06-13

# シリコン中の量子ドット結合スズ量子ビットの目覚ましい展望

The remarkable prospect for quantum-dot-coupled tin qubits in silicon ( http://arxiv.org/abs/2206.06285v1 )

ライセンス: Link先を確認

Wayne M. Witzel and Jesse J. Lutz and Dwight R. Luhman

(参考訳) シリコン半導体中のスピン-$\frac{1}{2}$の$^{119}$Sn原子核は優れた量子ビットになりうる。シリコン中の核スピンは長いコヒーレンス時間を持つことが知られている。スズはシリコンと等電子的であるため、電子はあるSn原子から別のSn原子へ容易に移動し、超微細相互作用を通じて量子情報を伝播できると期待される。この超微細相互作用は、全電子線形化補強平面波密度汎関数理論計算から、本来の$^{29}$Siの約10倍大きいと予測される。超微細誘導型の電子-核制御位相(e-n-CPhase)ゲート動作は、電子を超微細強度が最大となるスイートスポットに一定時間保持するだけで(局所回転を除いて)生成され、電荷/電圧ノイズに対して非常に耐性があると予測される。非断熱的スピンフリップは控えめな磁場($<10^{-6}$のフリップ確率に対して$>15~$mT)で抑制され、核スピン浴ノイズは同位体濃縮によって回避するか、動的デカップリングまたは監視と補償によって緩和できる。磁気共鳴制御と組み合わせることで、この動作は普遍的な量子計算を可能にする。

Spin-$\frac{1}{2}$ $^{119}$Sn nuclei in a silicon semiconductor could make excellent qubits. Nuclear spins in silicon are known to have long coherence times. Tin is isoelectronic with silicon, so we expect electrons can easily shuttle from one Sn atom to another to propagate quantum information via a hyperfine interaction that we predict, from all-electron linearized augmented plane wave density functional theory calculations, to be roughly ten times larger than intrinsic $^{29}$Si. A hyperfine-induced electro-nuclear controlled-phase (e-n-CPhase) gate operation, generated (up to local rotations) by merely holding an electron at a sweet-spot of maximum hyperfine strength for a specific duration of time, is predicted to be exceptionally resilient to charge/voltage noise. Diabatic spin flips are suppressed with a modest magnetic field ($>15~$mT for $<10^{-6}$ flip probabilities) and nuclear spin bath noise may be avoided via isotopic enrichment or mitigated using dynamical decoupling or through monitoring and compensation. Combined with magnetic resonance control, this operation enables universal quantum computation.

翻訳日:2023-02-09 12:38:27 公開日:2022-06-13

# フロッケ回路の位相欠陥

Topological Defects in Floquet Circuits ( http://arxiv.org/abs/2206.06272v1 )

ライセンス: Link先を確認

Mao Tian Tan, Yifan Wang and Aditi Mitra

(参考訳) トポロジカルな欠陥を持つ駆動Ising鎖を記述するFloquet回路を導入する。対応するゲートはスピンを反転する欠陥と、クラマース・ワニエ双対変換を明示的に実装する双対性欠陥を含む。フロッケユニタリ進化作用素はそのような欠陥で可換であるが、双対性欠陥は状態の半分を射出するためユニタリではない。これらの欠陥の応用は2つある。 1つは、システムの周りに広がる「空間的」欠陥の存在下での戻り振幅を分析することである。我々は、戻り振幅が欠陥の融合規則と一致していることを明確に検証する。第二の応用は、反周期的・双対的境界条件を実装する「時間的」欠陥の存在下でのユニタリ進化を研究することである。後者の場合、単一の未ペアローカライズされたMajorana 0 モードが現れることを示す。我々は、このFloquet回路の対称性として機能する演算子を明示的に構成する。また, 複数箇所のシステムに対して, 一つの時間ステップで絡み合いエントロピーの解析式を, 上記のすべての欠陥構成に対して提示する。

We introduce a Floquet circuit describing the driven Ising chain with topological defects. The corresponding gates include a defect that flips spins as well as the duality defect that explicitly implements the Kramers-Wannier duality transformation. The Floquet unitary evolution operator commutes with such defects, but the duality defect is not unitary, as it projects out half the states. We give two applications of these defects. One is to analyze the return amplitudes in the presence of "space-like" defects stretching around the system. We verify explicitly that the return amplitudes are in agreement with the fusion rules of the defects. The second application is to study unitary evolution in the presence of "time-like" defects that implement anti-periodic and duality-twisted boundary conditions. We show that a single unpaired localized Majorana zero mode appears in the latter case. We explicitly construct this operator, which acts as a symmetry of this Floquet circuit. We also present analytic expressions for the entanglement entropy after a single time step for a system of a few sites, for all of the above defect configurations.

Translated: 2023-02-09 12:38:05 Published: 2022-06-13

# 炭化ケイ素中のバナジウム:長い緩和寿命と超微細分解光遷移を有するテレコム可読スピン中心

Vanadium in Silicon Carbide: Telecom-ready spin centres with long relaxation lifetimes and hyperfine-resolved optical transitions ( http://arxiv.org/abs/2206.06240v1 )

License: see the linked page

T. Astner, P. Koller, C. M. Gilardoni, J. Hendriks, N. T. Son, I. G. Ivanov, J. U. Hassan, C. H. van der Wal, and M. Trupke

(参考訳) 炭化ケイ素(SiC)のバナジウムは、テレコム波長域における光学遷移のため、量子技術の重要な候補系として浮上している。しかし、スピン緩和寿命(T1)、電荷状態のダイナミクス、レベル構造など、この欠陥ファミリーの重要な特徴は、完全には理解されていない。本研究では,バナジウム欠陥のアンサンブルのT1を定量し,低温で大幅に増強できることを実証した。我々は,100mKで最大25s,1.3Kで1s,90%を超える大きなスピンコントラストを観察した。これらの測定は、アンサンブル電荷状態ダイナミクスの特性によって補完される。安定な電子スピンはさらに、2光子磁気分光による超微細構造の高分解能評価を可能にする。得られた知見は、SiCのバナジウムに基づく高性能スピン-光子界面を指している。

Vanadium in silicon carbide (SiC) is emerging as an important candidate system for quantum technology due to its optical transitions in the telecom wavelength range. However, several key characteristics of this defect family, including its spin relaxation lifetime (T1), charge state dynamics, and level structure, are not fully understood. In this work, we determine the T1 of an ensemble of vanadium defects, demonstrating that it can be greatly enhanced at low temperature. We observe a large spin contrast exceeding 90% and long spin-relaxation times of up to 25 s at 100 mK, and of order 1 s at 1.3 K. These measurements are complemented by a characterization of the ensemble charge state dynamics. The stable electron spin furthermore enables high-resolution characterization of the systems' hyperfine level structure via two-photon magneto-spectroscopy. The acquired insights point towards high-performance spin-photon interfaces based on vanadium in SiC.

Translated: 2023-02-09 12:37:29 Published: 2022-06-13

# 一般量子ネットワーク上の量子回路の分布

Distribution of Quantum Circuits Over General Quantum Networks ( http://arxiv.org/abs/2206.06437v1 )

License: see the linked page

Ranjani G Sundaram, Himanshu Gupta, C. R. Ramakrishnan

(参考訳) 短期量子コンピュータは少数の量子ビットしか保持できない。大規模量子計算を容易にする方法の1つは、量子コンピュータの分散ネットワークである。本研究では,量子回路として表現される量子プログラムを,異種量子コンピュータの量子ネットワークに分散する問題を,分散回路の実行に必要な通信コストを最小化する手法で検討する。我々は2つのコミュニケーション方法を検討する: コンピュータのペア間でクビットのリンクコピーを生成する猫の絡み合い、テレポーテーション。不均一なコンピュータは、アルゴリズムによって選択できる猫の絡み合いとテレポーテーション操作に制約を課す。まず,コミュニケーションのためのテレポーテーションではなく,猫の絡み合いのみを許容する特殊なケースに注目した。この特殊な設定を解くための2段階のヒューリスティックを提供する。 (i)タブサーチによるコンピュータへの量子ビットの割り当てを見つけること、 2) ゲートを局所的に実行するために必要な猫絡み操作を決定するために, セットカバー問題の制約バージョン用に設計された反復的欲求アルゴリズムを用いる。両形態の通信を可能にする一般的な場合に対し、量子回路をいくつかの部分に分け、各部分の特殊設定にヒューリスティックを適用する2つのアルゴリズムを提案する。テレポーテーションは、各部分のソリューションを縫い合わせるために使用される。最後に,ランダムに生成する量子ネットワークと回路の広い範囲でアルゴリズムをシミュレートし,その特性について様々なパラメータについて検討する。

Near-term quantum computers can hold only a small number of qubits. One way to facilitate large-scale quantum computations is through a distributed network of quantum computers. In this work, we consider the problem of distributing quantum programs represented as quantum circuits across a quantum network of heterogeneous quantum computers, in a way that minimizes the overall communication cost required to execute the distributed circuit. We consider two ways of communicating: cat-entanglement that creates linked copies of qubits across pairs of computers, and teleportation. The heterogeneous computers impose constraints on cat-entanglement and teleportation operations that can be chosen by an algorithm. We first focus on a special case that only allows cat-entanglements and not teleportations for communication. We provide a two-step heuristic for solving this specialized setting: (i) finding an assignment of qubits to computers using Tabu search, and (ii) using an iterative greedy algorithm designed for a constrained version of the set cover problem to determine cat-entanglement operations required to execute gates locally. For the general case, which allows both forms of communication, we propose two algorithms that subdivide the quantum circuit into several portions and apply the heuristic for the specialized setting on each portion. Teleportations are then used to stitch together the solutions for each portion. Finally, we simulate our algorithms on a wide range of randomly generated quantum networks and circuits, and study the properties of their results with respect to several varying parameters.
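
The second step of the heuristic relies on a greedy algorithm for a constrained set cover problem. As a rough illustration only, the sketch below shows plain, unconstrained greedy set cover; the constrained variant from the paper is not reproduced here. The names `greedy_set_cover`, `gates`, and `candidates` are hypothetical, and the toy data (which gates each candidate cat-entanglement would make locally executable) is invented for the example.

```python
def greedy_set_cover(universe, subsets):
    """Pick subsets greedily until every element of `universe` is covered.

    `subsets` maps a label to the set of elements that label covers.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Choose the subset that covers the most still-uncovered elements.
        best = max(subsets, key=lambda name: len(subsets[name] & uncovered))
        gained = subsets[best] & uncovered
        if not gained:
            raise ValueError("universe cannot be covered by the given subsets")
        chosen.append(best)
        uncovered -= gained
    return chosen


# Hypothetical toy instance: non-local gates to cover, and the gates that
# each candidate cat-entanglement would allow to run locally.
gates = {"g1", "g2", "g3", "g4", "g5"}
candidates = {
    "cat_a": {"g1", "g2", "g3"},
    "cat_b": {"g3", "g4"},
    "cat_c": {"g4", "g5"},
}
cover = greedy_set_cover(gates, candidates)
print(cover)
```

Greedy selection does not guarantee a minimum cover in general, which is consistent with the abstract describing the overall method as a heuristic.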

Translated: 2023-02-09 12:32:13 Published: 2022-06-13

# 複合量子シミュレーション

Composite Quantum Simulations ( http://arxiv.org/abs/2206.06409v1 )

License: see the linked page

Matthew Hagan, Nathan Wiebe

(参考訳) 本稿では、トロッター・スズキ公式やqdriftのような複数の量子シミュレーション手法を単一の合成チャネルに結合し、ゲート数を減らすための古い結合アイデアに基づく枠組みを提案する。このアプローチの背後にある中心的な考え方は、シミュレーション内のチャネルのトロッターまたはQDrift部分にハミルトン項を割り当てるパーティショニングスキームを使用することである。これにより、高次トロッタースズキ式を用いてより大きい項をシミュレートしながら、QDriftを用いて、小さくて多数の項をシミュレートできる。合成チャネルと理想シミュレーションチャネルとの間のダイヤモンド距離の厳密な境界を証明し、合成チャネルの実装コストが漸近的に上界となる条件下では、項の確率的分割と決定論的分割の両方でそれを構成する方法を示す。最後に、分割スキームを決定するための戦略と、同一フレームワーク内で異なるシミュレーション手法を組み込む手法について論じる。

In this paper we provide a framework for combining multiple quantum simulation methods, such as Trotter-Suzuki formulas and QDrift, into a single composite channel that builds upon older coalescing ideas for reducing gate counts. The central idea behind our approach is to use a partitioning scheme that allocates each Hamiltonian term to the Trotter or QDrift part of a channel within the simulation. This allows us to simulate small but numerous terms using QDrift while simulating the larger terms using a high-order Trotter-Suzuki formula. We prove rigorous bounds on the diamond distance between the composite channel and the ideal simulation channel and show under what conditions the cost of implementing the composite channel is asymptotically upper bounded by the cost of the methods that comprise it, for both probabilistic and deterministic partitioning of terms. Finally, we discuss strategies for determining partitioning schemes as well as methods for incorporating different simulation methods within the same framework.
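
The partitioning idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the scheme from the paper: it assumes a deterministic rule that sends every term with coefficient magnitude at or above a threshold to the Trotter part and the rest to the QDrift part, and it samples QDrift terms with probability proportional to their magnitude. The function names, the threshold rule, and the coefficients are all hypothetical.

```python
import random


def partition_terms(coeffs, threshold):
    """Split term indices into a Trotter set (large terms) and a QDrift set (small terms)."""
    trotter = [j for j, h in enumerate(coeffs) if abs(h) >= threshold]
    qdrift = [j for j, h in enumerate(coeffs) if abs(h) < threshold]
    return trotter, qdrift


def sample_qdrift_terms(coeffs, qdrift, n_samples, rng):
    """Draw term indices with probability proportional to |h_j|."""
    weights = [abs(coeffs[j]) for j in qdrift]
    return rng.choices(qdrift, weights=weights, k=n_samples)


# Hypothetical Hamiltonian coefficients: two large terms, four small ones.
coeffs = [1.0, -0.8, 0.05, 0.02, -0.01, 0.03]
trotter, qdrift = partition_terms(coeffs, threshold=0.1)
samples = sample_qdrift_terms(coeffs, qdrift, n_samples=8, rng=random.Random(0))
print(trotter, qdrift, samples)
```

The sketch only decides which term goes where; building the actual channels and choosing a good threshold are the parts the paper analyses.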

Translated: 2023-02-09 12:31:51 Published: 2022-06-13

# (Al$_{x}$Ga$_{1-x}$)$_{2}$O$_{3}$/Ga$_{2}$O$_{3}$ヘテロ構造における2次元電子気体のフルバンドモンテカルロシミュレーション

Full-band Monte Carlo simulation of two-dimensional electron gas in (Al$_{x}$Ga$_{1-x}$)$_{2}$O$_{3}$/Ga$_{2}$O$_{3}$ heterostructures ( http://arxiv.org/abs/2206.06405v1 )

License: see the linked page

Avinash Kumar, and Uttam Singisetti

(参考訳) $\beta$-Gallium oxide (Ga$_{2}$O$_{3}$) は、パワーエレクトロニクスやRFスイッチングへの応用が期待され、広く研究されている超広帯域半導体である。室温バルク電子移動度($\sim$200 cm$^{2}$V$^{-1}$s$^{-1}$)は比較的低く、10原子原始細胞に由来する30フォノンモードによって制限されている。理論的に計算された飽和速度は1-2$\times$10$^{7}$ cm s$^{-1}$であり、これはGaNに匹敵する。本研究では、2DEGにおける高電界電子輸送を、第一原理計算で得たパラメータに基づいて調べる。与えられたヘテロ構造設計の自己整合計算は、閉じ込められた固有関数と固有エネルギーを与える。 LOフォノン・プラズモンスクリーニングを考慮したフェルミの黄金則に基づいてサブバンド内およびサブバンド間散乱率を算出する。 300 Kにおけるヘテロ構造のフルバンドモンテカルロシミュレーションから高電界特性を抽出する。2DEGおよびバルク中の電子の運動は、いくつかのヘテロ構造設計について定常状態のゾーン占有、過渡ダイナミクス、速度-電界曲線を出力する統合モンテカルロプログラムにより扱う。飽和の臨界電界はバルク値から大きく変化しないが、2DEG密度が高い場合にはピーク速度の向上が計算される。低2DEG密度での速度はLOフォノンの反遮蔽の影響を受け、これはゾーン占有の形成に重要な役割を果たしている。また、実験結果との比較を行い、実験との相違の原因について考察する。

$\beta$-Gallium Oxide (Ga$_{2}$O$_{3}$) is an extensively investigated ultrawide-bandgap semiconductor for potential applications in power electronics and RF switching. The room temperature bulk electron mobility ($\sim$200 cm$^{2}$V$^{-1}$s$^{-1}$) is comparatively low and is limited by the 30 phonon modes originating from its 10-atom primitive cell. The theoretically calculated saturation velocity is 1-2$\times$10$^{7}$ cm s$^{-1}$, which is comparable to GaN. The high-field electron transport in the two-dimensional electron gas (2DEG) is explored in this work based on parameters calculated from first principles. A self-consistent calculation on a given heterostructure design gives the confined eigenfunctions and eigenenergies. The intrasubband and intersubband scattering rates are calculated from Fermi's golden rule, considering LO phonon-plasmon screening. The high-field characteristics are extracted from the full-band Monte Carlo simulation of heterostructures at 300 K. The motion of electrons in the 2DEG and the bulk is treated through an integrated Monte Carlo program which outputs the steady-state zone population, transient dynamics, and the velocity-field curves for a few heterostructure designs. The critical field for saturation does not change significantly from bulk values; however, an improved peak velocity is calculated at a higher 2DEG density. The velocity at low 2DEG densities is impacted by the antiscreening of LO phonons, which plays an important role in shaping the zone population. A comparison with experimental measurements is also carried out, and possible origins of the discrepancies with experiments are discussed.

Translated: 2023-02-09 12:31:34 Published: 2022-06-13

# アシュテカー変数を用いた重力誘起デコヒーレンスモデル

A gravitationally induced decoherence model using Ashtekar variables ( http://arxiv.org/abs/2206.06397v1 )

License: see the linked page

Max Joseph Fahn, Kristina Giesel and Michael Kobler

(参考訳) 線形化重力へのスカラー場の結合を考え、アシュテカー変数を用いた相対論的重力誘起デコヒーレンスモデルを導出する。このモデルは、リレーショナルフォーマリズムにおいて適切な幾何学的時計を用いてゲージ不変度で定式化され、デコヒーレンスモデルの既存のゲージ不変式を広くする。ディラック可観測空間を構成するためには、既知の可観測写像を、時計と制約の役割が交換されるような双対写像の一種によって拡張する。また、ADM文献に存在する幾何学時計の選択についても論じる。次に、Fock空間上の位相空間の量子化を減らし、重力環境におけるギブス状態を選択し、射影演算子技術を用いて最終マスター方程式を導出する。結果として得られたマスター方程式はリンドブラッド型ではなく、現象学モデルでしばしば仮定される出発点であるが、熱ワイトマン関数で表現する相関関数の形式のためにマスター方程式の有効作用素のレベルでの残留時間依存性も含んでいる。さらに、ここで解析されたモデルにおいて、時間独立な実効系作用素の集合を得るための2番目のマルコフ近似の適用は、いくつかの量子力学モデルよりも単純ではないことを議論する。

We consider the coupling of a scalar field to linearised gravity and derive a relativistic gravitationally induced decoherence model using Ashtekar variables. The model is formulated at the gauge invariant level using suitable geometrical clocks in the relational formalism, broadening existing gauge invariant formulations of decoherence models. For the construction of the Dirac observables we extend the known observable map by a kind of dual map where the role of clocks and constraints is interchanged. We also discuss a second choice of geometrical clocks existing in the ADM literature. Then we apply a reduced phase space quantisation on Fock space and derive the final master equation choosing a Gibbs state for the gravitational environment and using the projection operator technique. The resulting master equation is not automatically of Lindblad type, a starting point sometimes assumed for phenomenological models, but still involves a residual time dependence at the level of the effective operators in the master equation due to the form of the correlation functions that we express in terms of thermal Wightman functions. Furthermore, we discuss why in the model analysed here the application of a second Markov approximation in order to obtain a set of time independent effective system operators is less straightforward than in some of the quantum mechanical models.

Translated: 2023-02-09 12:30:46 Published: 2022-06-13

# 短期量子ハードウェアのための反復量子位相推定プロトコル

An iterative quantum-phase-estimation protocol for near-term quantum hardware ( http://arxiv.org/abs/2206.06392v1 )

License: see the linked page

Joseph G. Smith, Crispin H. W. Barnes and David R. M. Arvidsson-Shukur

(参考訳) 未知の位相 $\theta$ を持つユニタリ演算を $N_{\textrm{tot}}$ 回適用できる場合、大規模フォールトトレラント量子系は推定誤差のスケーリングを $\mathcal{O} \left[ 1 / \sqrt{N_{\textrm{tot}}} \right]$ から $\mathcal{O} \left[ 1 / N_{\textrm{tot}} \right]$ へ低減できる。近未来の量子デバイスで利用可能なリソースは限られているため、絡み合いのないプロトコルが開発され、$\mathcal{O} \left[ \log(N_{\textrm{tot}}) / N_{\textrm{tot}} \right]$ の平均絶対誤差スケーリングを実現した。本稿では、誤差スケーリングを改良した、短期的位相推定のための新しい2段階プロトコルを提案する。我々のプロトコルの最初のステップは、$\theta$ のパラメータ範囲内で、$\theta$ の低標準偏差の推定値をいくつか生成する。第2のステップは、これらの推定値の1つに反復的に絞り込む。我々のプロトコルの平均絶対誤差は $\mathcal{O} \left[ \sqrt{\log (\log N_{\textrm{tot}})} / N_{\textrm{tot}} \right]$ でスケールする。さらに、定数スケーリング係数と必要な回路深さの低減を示す。本プロトコルは、現実的な $N_{\textrm{tot}}$ の値に対して、漸近的に最適な量子位相推定アルゴリズムを上回ることができる。

Given $N_{\textrm{tot}}$ applications of a unitary operation with an unknown phase $\theta$, a large-scale fault-tolerant quantum system can reduce an estimate's error scaling from $\mathcal{O} \left[ 1 / \sqrt{N_{\textrm{tot}}} \right]$ to $\mathcal{O} \left[ 1 / N_{\textrm{tot}} \right]$. Owing to the limited resources available to near-term quantum devices, entanglement-free protocols have been developed, which achieve an $\mathcal{O} \left[ \log(N_{\textrm{tot}}) / N_{\textrm{tot}} \right]$ mean-absolute-error scaling. Here, we propose a new two-step protocol for near-term phase estimation, with an improved error scaling. Our protocol's first step produces several low-standard-deviation estimates of $\theta$, within $\theta$'s parameter range. The second step iteratively homes in on one of these estimates. Our protocol's mean absolute error scales as $\mathcal{O} \left[ \sqrt{\log (\log N_{\textrm{tot}})} / N_{\textrm{tot}} \right]$. Furthermore, we demonstrate a reduction in the constant scaling factor and the required circuit depths: our protocol can outperform the asymptotically optimal quantum-phase estimation algorithm for realistic values of $N_{\textrm{tot}}$.
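
To get a feel for how the four scalings quoted in the abstract compare, the sketch below evaluates them numerically. It is an illustration only: all constant prefactors are dropped (the abstract stresses that constants matter in practice), and the dictionary keys are labels chosen for this example, not terminology from the paper.

```python
import math


def error_scalings(n_tot):
    """Evaluate the four asymptotic error scalings with unit prefactors.

    Requires n_tot > e so that log(log(n_tot)) is positive.
    """
    return {
        "inverse_sqrt": 1 / math.sqrt(n_tot),                        # O[1/sqrt(N)]
        "inverse_linear": 1 / n_tot,                                 # O[1/N]
        "entanglement_free": math.log(n_tot) / n_tot,                # O[log(N)/N]
        "two_step": math.sqrt(math.log(math.log(n_tot))) / n_tot,    # O[sqrt(log log N)/N]
    }


scalings = error_scalings(10**4)
for name, value in scalings.items():
    print(f"{name:>18}: {value:.3e}")
```

With unit prefactors, the two-step scaling sits between the $1/N_{\textrm{tot}}$ limit and the $\log(N_{\textrm{tot}})/N_{\textrm{tot}}$ scaling at this value of $N_{\textrm{tot}}$.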

Translated: 2023-02-09 12:30:24 Published: 2022-06-13

# 時間最適化マルチ量子ビットゲートの合成とコンパイル

Synthesis of and compilation with time-optimal multi-qubit gates ( http://arxiv.org/abs/2206.06387v1 )

License: see the linked page

Pascal Baßler, Matthias Zipper, Christopher Cedzich, Markus Heinrich, Patrick Huber, Michael Johanning, Martin Kliesch

(参考訳) 我々は、Ising型とオールツーオール接続を固定した量子コンピューティングプラットフォームに対して、マルチキュービットゲートを絡み合わせるクラスを合成する方法を開発した。相互作用の柔軟性に関する唯一の要件は、個々の量子ビットに対してスイッチオンおよびオフが可能であることである。提案手法は,マルチキュービットゲートの時間最適実装を実現する。本研究では,全マルチキュービットゲートタイムが量子ビット数でほぼ線形であることを数値的に示す。このゲート合成をサブルーチンとして、重要なユースケースに対するコンパイル戦略を提供する。 (i)n$ qubits 上のclifford回路は、ancilla qubits を必要とせずに、最大$n$ マルチキュービットゲートを使用して実装できることを示す。 (ii)同様の方法で量子フーリエ変換を分解する。 (iii)分子動力学のシミュレーションをコンパイルし、 (iv)一般ユニタリに向けてのステップとして,時間最適化マルチキュービットゲートを用いた対角ユニタリのコンパイル法を提案する。モチベーションとして、Ising型相互作用生成のための磁気勾配誘導結合(MAGIC)を用いたマイクロ波制御イオントラップアーキテクチャについて、詳細な議論を行う。

We develop a method to synthesize a class of entangling multi-qubit gates for a quantum computing platform with fixed Ising-type interaction with all-to-all connectivity. The only requirement on the flexibility of the interaction is that it can be switched on and off for individual qubits. Our method yields a time-optimal implementation of the multi-qubit gates. We numerically demonstrate that the total multi-qubit gate time scales approximately linearly in the number of qubits. Using this gate synthesis as a subroutine, we provide compilation strategies for important use cases: (i) we show that any Clifford circuit on $n$ qubits can be implemented using at most $n$ multi-qubit gates without requiring ancilla qubits, (ii) we decompose the quantum Fourier transform in a similar fashion, (iii) we compile a simulation of molecular dynamics, and (iv) we propose a method for the compilation of diagonal unitaries with time-optimal multi-qubit gates, as a step towards general unitaries. As motivation, we provide a detailed discussion on a microwave-controlled ion trap architecture with magnetic gradient induced coupling (MAGIC) for the generation of the Ising-type interactions.

Translated: 2023-02-09 12:29:38 Published: 2022-06-13

# 水中乱流チャネル上のパッシブリレーを用いたマルチホップ量子鍵分布

Multi-Hop Quantum Key Distribution with Passive Relays over Underwater Turbulence Channels ( http://arxiv.org/abs/2206.06514v1 )

License: see the linked page

Amir Hossein Fahim Raouf, Majid Safari, Murat Uysal

(参考訳) 水中チャネルで経験した吸収、散乱、乱流は、量子通信の範囲を著しく制限する。本稿では,帯域制限を克服するために,中間ノードがソースノードと宛先ノード間の鍵分布を助けるマルチホップ水中量子鍵分布(qkd)について検討する。我々は、測定なしで次の中継ノードや受信機にキュービットをリダイレクトするパッシブリレーの配置を検討する。近距離場解析に基づいて, 大気条件の異なる澄んだ海におけるリレー支援QKDスキームの性能を示す。さらに,システムパラメータ(開口サイズと検出器視野)が到達可能な距離に与える影響について検討する。

Absorption, scattering, and turbulence experienced in underwater channels severely limit the range of quantum communications. In this paper, to overcome range limitations, we investigate a multi-hop underwater quantum key distribution (QKD) system in which intermediate nodes help the key distribution between the source and destination nodes. We consider the deployment of passive relays, which simply redirect the qubits to the next relay node or receiver without any measurement. Based on near-field analysis, we present the performance of the relay-assisted QKD scheme in clear ocean under different atmospheric conditions. We further investigate the effect of system parameters (aperture size and detector field-of-view) on the achievable distance.

Translated: 2023-02-09 12:19:36 Published: 2022-06-13

# 有限クロージャ系における最大閉集合と半空間分離

Maximal Closed Set and Half-Space Separations in Finite Closure Systems ( http://arxiv.org/abs/2001.04417v3 )

License: see the linked page

Florian Seiffarth, Tamas Horvath and Stefan Wrobel

(参考訳) いくつかの概念学習問題は、有限基底集合上の抽象閉包系における半空間分離の特別な場合と見なすことができる。閉包系が閉包演算子を介して暗黙的に与えられる典型的なシナリオについて、半空間分離問題はNP完全であることを示す。この負の結果を克服する最初のアプローチとして、この問題を最大閉集合分離に緩和し、この問題を線形数のクロージャ演算子呼び出しで解く一般の欲求アルゴリズムを与え、この境界が鋭いことを示す。第二に,角谷閉包系を考察し,アルゴリズムによって特徴付けられることを証明した。一般問題設定の第一の特別な場合として、グラフ上の角谷閉包系を考察し、禁止グラフマイナーーの観点からこの種の閉包系に十分な条件を与える。第二の特別な場合として、有限格子上の閉包系に着目し、ジェネリック・グリーディアルゴリズムの適応性を高め、仮定格子に関する応用を提案する。

Several concept learning problems can be regarded as special cases of half-space separation in abstract closure systems over finite ground sets. For the typical scenario that the closure system is implicitly given via a closure operator, we show that the half-space separation problem is NP-complete. As a first approach to overcome this negative result, we relax the problem to maximal closed set separation, give a generic greedy algorithm solving this problem with a linear number of closure operator calls, and show that this bound is sharp. For a second direction, we consider Kakutani closure systems and prove that they are algorithmically characterized by the greedy algorithm. As a first special case of the general problem setting, we consider Kakutani closure systems over graphs and give a sufficient condition for this kind of closure systems in terms of forbidden graph minors. For a second special case, we then focus on closure systems over finite lattices, give an improved adaptation of the generic greedy algorithm, and present an application concerning subsumption lattices.
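
The generic greedy scheme for maximal closed set separation can be sketched as follows. This is one reading of the idea in the abstract, not the authors' implementation: each remaining element is offered first to one side and then to the other, and is accepted only if the enlarged closure stays disjoint from the opposite set. Because closures only grow, an element rejected once can never be added later, so a single pass with at most two closure calls per element suffices. The function names and the toy closure operator (interval closure on integers) are assumptions made for the example.

```python
def greedy_maximal_separation(ground_set, closure, a, b):
    """Greedily extend two seed sets to maximal disjoint closed sets.

    `closure` is a closure operator on subsets of `ground_set`.
    Uses at most 2 * len(ground_set) + 2 closure calls.
    """
    a, b = closure(a), closure(b)
    if a & b:
        raise ValueError("the closures of the two seed sets already intersect")
    for e in ground_set:
        if e in a or e in b:
            continue
        grown_a = closure(a | {e})
        if not (grown_a & b):
            a = grown_a          # e (and its closure) joins the first side
            continue
        grown_b = closure(b | {e})
        if not (grown_b & a):
            b = grown_b          # otherwise try the second side
    return a, b


def interval_closure(s):
    """Toy closure operator: the smallest integer interval containing s."""
    s = set(s)
    return set(range(min(s), max(s) + 1)) if s else set()


side_a, side_b = greedy_maximal_separation(range(10), interval_closure, {1}, {8})
print(sorted(side_a), sorted(side_b))
```

In this toy instance the two closed sets happen to partition the ground set, so they also form a half-space separation; in general the greedy result is only guaranteed to be maximal.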

Translated: 2023-01-11 22:29:46 Published: 2022-06-13

# ピーク制約付きmdpsの効率的モデルフリーアルゴリズム

Provably Efficient Model-Free Algorithm for MDPs with Peak Constraints ( http://arxiv.org/abs/2003.05555v6 )

License: see the linked page

Qinbo Bai and Vaneet Aggarwal and Ather Gattami

(参考訳) 動的システムの最適化では、変数は一般に制約を持つ。このような問題をCMDP(Constrained Markov Decision Process)としてモデル化することができる。本稿では、有限地平線における全報酬を最大化し、かつ各エポックにおける制約を確率1で満たす政策をエージェントが選択する、ピーク制約付きマルコフ決定過程(PCMDP)について考察する。我々は、PCMDP問題を制約のない問題に変換するモデルフリーアルゴリズムを提案し、Q学習に基づくアプローチを適用する。提案した PCMDP 問題に対して、確率的近似的に正しい (PAC) という概念を定義する。提案するアルゴリズムは、エピソード数が $K\geq\Omega(\frac{I^2H^6SA\ell}{\epsilon^2})$ を満たすとき、$(\epsilon,p)$-PACポリシーを達成することが証明されている。ここで、$S$ と $A$ はそれぞれ状態と行動の数、$H$ はエピソードごとのエポックの数、$I$ は制約関数の数であり、$\ell=\log(\frac{SAT}{p})$ である。これは、遷移ダイナミクスが事前に分かっていない場合の、ピーク制約を持つPCMDPに対するPAC型解析の最初の結果である。エネルギー収穫問題と単一機械スケジューリング問題で提案アルゴリズムを実証し、検討した最適化問題の理論的上限に近い性能を示す。

In the optimization of dynamic systems, the variables typically have constraints. Such problems can be modeled as a Constrained Markov Decision Process (CMDP). This paper considers the peak Constrained Markov Decision Process (PCMDP), where the agent chooses the policy to maximize total reward in the finite horizon as well as satisfy constraints at each epoch with probability 1. We propose a model-free algorithm that converts the PCMDP problem to an unconstrained problem, to which a Q-learning based approach is applied. We define the concept of probably approximately correct (PAC) for the proposed PCMDP problem. The proposed algorithm is proved to achieve an $(\epsilon,p)$-PAC policy when the number of episodes satisfies $K\geq\Omega(\frac{I^2H^6SA\ell}{\epsilon^2})$, where $S$ and $A$ are the number of states and actions, respectively, $H$ is the number of epochs per episode, $I$ is the number of constraint functions, and $\ell=\log(\frac{SAT}{p})$. We note that this is the first PAC-type analysis for PCMDP with peak constraints, where the transition dynamics are not known a priori. We demonstrate the proposed algorithm on an energy harvesting problem and a single machine scheduling problem, where it performs close to the theoretical upper bound of the studied optimization problem.

Translated: 2022-12-24 14:31:43 Published: 2022-06-13

# 重み付きQ-Learningによる深層強化学習

Deep Reinforcement Learning with Weighted Q-Learning ( http://arxiv.org/abs/2003.09280v3 )

License: see the linked page

Andrea Cini, Carlo D'Eramo, Jan Peters, Cesare Alippi

(参考訳) Q-learningに基づく強化学習アルゴリズムは、複雑な問題の解決と超人的パフォーマンスの実現に向けて、Deep Reinforcement Learning (DRL)研究を推進している。にもかかわらず、Q-Learningは期待値の雑音の最大過度推定を用いて学習するため、正のバイアスを受けることが知られている。動作値の体系的過大評価とDRL法の本質的に高い分散は、漸進的にエラーを蓄積させ、学習アルゴリズムのばらつきを引き起こす。理想的には、DRLエージェントがそれぞれのアクションの最適性について不確実性を考慮し、それを利用して期待されるリターンのより詳細な推定を行えるようにしたい。この点において、Weighted Q-Learning(WQL)はバイアスを効果的に低減し、確率的環境において顕著な結果を示す。 WQLは推定された作用値の重み付け和を使用し、重み付けは各作用値の最大値の確率に対応するが、これらの確率の計算は表の設定でのみ実用的である。本研究では,ディープガウス過程の効果的な近似として,ドロップアウトで訓練されたニューラルネットワークを用いて,drlのwql特性の恩恵を受けるための方法論的進歩を提案する。特に, DRLにおける上皮性不確かさのキャリブレーション値を求めるために, コンクリートドロップアウト変種を採用する。推定器は、いくつかの確率的前方通過をアクション値ネットワークを通過し、モンテカルロ方式で重みを計算することによって得られる。そのような重みは、ドロップアウトによって推定される後方確率分布の最大 w.r.t. に対応する各アクション値の確率のベイズ推定である。そこで本研究では, 重み付きq-learningアルゴリズムを用いて, バイアスw.r.t.のベースラインを低減し, そのアドバンテージを代表ベンチマークで実証的に証明する。

Reinforcement learning algorithms based on Q-learning are driving Deep Reinforcement Learning (DRL) research towards solving complex problems and achieving super-human performance on many of them. Nevertheless, Q-Learning is known to be positively biased since it learns by using the maximum over noisy estimates of expected values. Systematic overestimation of the action values coupled with the inherently high variance of DRL methods can lead to incrementally accumulating errors, causing learning algorithms to diverge. Ideally, we would like DRL agents to take into account their own uncertainty about the optimality of each action, and be able to exploit it to make more informed estimations of the expected return. In this regard, Weighted Q-Learning (WQL) effectively reduces bias and shows remarkable results in stochastic environments. WQL uses a weighted sum of the estimated action values, where the weights correspond to the probability of each action value being the maximum; however, the computation of these probabilities is only practical in the tabular setting. In this work, we provide methodological advances to benefit from the WQL properties in DRL, by using neural networks trained with Dropout as an effective approximation of deep Gaussian processes. In particular, we adopt the Concrete Dropout variant to obtain calibrated estimates of epistemic uncertainty in DRL. The estimator, then, is obtained by taking several stochastic forward passes through the action-value network and computing the weights in a Monte Carlo fashion. Such weights are Bayesian estimates of the probability of each action value corresponding to the maximum w.r.t. a posterior probability distribution estimated by Dropout. We show how our novel Deep Weighted Q-Learning algorithm reduces the bias w.r.t. relevant baselines and provides empirical evidence of its advantages on representative benchmarks.
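
The Monte Carlo weighting step can be illustrated without a neural network. In the sketch below, each row of `q_samples` stands in for the action values returned by one stochastic forward pass; in the paper these would come from a Dropout-trained action-value network, which is not modelled here. The weight of an action is the fraction of passes in which it has the largest value, and the estimate is the weighted sum of the mean action values. The function name and the sample numbers are hypothetical.

```python
def weighted_q_estimate(q_samples):
    """Weighted estimate of the maximum expected action value.

    q_samples: list of rows, one row per stochastic pass, one value per action.
    Returns (estimate, weights).
    """
    n_passes = len(q_samples)
    n_actions = len(q_samples[0])
    # Monte Carlo weight: share of passes in which each action is the argmax.
    weights = [0.0] * n_actions
    for row in q_samples:
        best = max(range(n_actions), key=lambda a: row[a])
        weights[best] += 1.0 / n_passes
    # Mean action value over the passes.
    means = [sum(row[a] for row in q_samples) / n_passes for a in range(n_actions)]
    estimate = sum(w * m for w, m in zip(weights, means))
    return estimate, weights


# Four hypothetical passes over two actions.
q_samples = [[1.0, 2.0], [3.0, 0.0], [0.0, 5.0], [1.0, 4.0]]
estimate, weights = weighted_q_estimate(q_samples)
print(estimate, weights)
```

Because the weights form a probability distribution over actions, the estimate can never exceed the largest mean action value, which is how the weighting tempers the overestimation of a plain maximum.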

Translated: 2022-12-21 21:50:00 Published: 2022-06-13

# ランダムスペーシングを用いた分散SGDの分離誤差フィードバック

Detached Error Feedback for Distributed SGD with Random Sparsification ( http://arxiv.org/abs/2004.05298v3 )

License: see the linked page

An Xu, Heng Huang

(参考訳) 大規模分散ディープラーニングでは,通信ボトルネックが重要な問題となっている。本研究では,不規則なブロック幅の分散SGDを,リングアレーダ互換かつ高い計算効率を持つ勾配圧縮機として検討するが,性能は低下する。この重要な問題に対処するために、我々は通信効率のよい分散SGD、すなわち勾配のばらつきと第二モーメントの間のトレードオフを改善した。このモチベーションにより,非凸問題に対する誤差フィードバックよりも高い収束率を示す新しい分離誤差フィードバック(def)アルゴリズムを提案する。また、Def-Aは、トレーニングの初期段階におけるDefの一般化を加速し、Defよりも優れた一般化境界を示す。さらに,通信効率の高い分散SGDとSGDとの接続を,SGD-IA (Iterate Averaging) と初めて確立した。深層学習実験では,様々な条件下で提案手法の有意な経験的改善が示された。

The communication bottleneck has been a critical problem in large-scale distributed deep learning. In this work, we study distributed SGD with random block-wise sparsification as the gradient compressor, which is ring-allreduce compatible and highly computation-efficient but leads to inferior performance. To tackle this important issue, we improve the communication-efficient distributed SGD from a novel aspect, that is, the trade-off between the variance and second moment of the gradient. With this motivation, we propose a new detached error feedback (DEF) algorithm, which shows a better convergence bound than error feedback for non-convex problems. We also propose DEF-A to accelerate the generalization of DEF at the early stages of the training, which shows better generalization bounds than DEF. Furthermore, we establish the connection between communication-efficient distributed SGD and SGD with iterate averaging (SGD-IA) for the first time. Extensive deep learning experiments show significant empirical improvement of the proposed methods under various settings.
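
For orientation, the sketch below shows the baseline that DEF is compared against: classic error feedback combined with random block-wise sparsification. It is not the DEF or DEF-A algorithm, whose details are not given in the abstract. It assumes a single worker, a compressor that keeps one randomly chosen contiguous block, and plain Python lists in place of tensors; all names are hypothetical.

```python
import random


def random_block_sparsify(vec, block_size, rng):
    """Keep one randomly chosen contiguous block and zero out the rest."""
    n_blocks = (len(vec) + block_size - 1) // block_size
    keep = rng.randrange(n_blocks)
    lo, hi = keep * block_size, min((keep + 1) * block_size, len(vec))
    return [x if lo <= i < hi else 0.0 for i, x in enumerate(vec)]


def error_feedback_step(grad, residual, block_size, rng):
    """Compress (grad + residual) and carry the dropped part to the next step."""
    corrected = [g + r for g, r in zip(grad, residual)]
    sent = random_block_sparsify(corrected, block_size, rng)
    new_residual = [c - s for c, s in zip(corrected, sent)]
    return sent, new_residual


rng = random.Random(0)
grad = [1.0, -2.0, 3.0, -4.0]
sent, residual = error_feedback_step(grad, [0.0] * 4, block_size=2, rng=rng)
print(sent, residual)
```

Whatever the compressor drops is stored in the residual and added back before the next compression, so no gradient information is permanently lost; only its delivery is delayed.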

Translated: 2022-12-14 09:59:33 Published: 2022-06-13

# 機能の浄化: 対人訓練が頑健な深層学習を実現する方法

Feature Purification: How Adversarial Training Performs Robust Deep Learning ( http://arxiv.org/abs/2005.10190v4 )

License: see the linked page

Zeyuan Allen-Zhu and Yuanzhi Li

(参考訳) 相反する摂動に対して深層学習モデルを守るために相反する訓練を用いた経験的な成功にもかかわらず、相反する摂動の存在の背後にある原理と、相反するトレーニングがそれらを取り除くためにニューラルネットワークにどのような影響を与えるのかは、今のところまだ不明である。本稿では,ニューラルネットワークのトレーニング過程において,特定の低密度混合物が隠れ重みに蓄積されていること,さらに,そのような混合物を除去して隠蔽重みを浄化することが敵のトレーニングの目的である,という特徴浄化(Feature Purification)の原則を提案する。この原理を説明するために,CIFAR-10データセットを用いて実験を行った。また,特定の自然分類タスクに対して,ランダムに初期化勾配勾配勾配を用いた2層ニューラルネットワークをトレーニングすることで,この原理を満足できることを示す理論的結果を示す。技術的には、我々の知る限りでは、次の2つがreluアクティベーションでニューラルネットワークをトレーニングするために同時に保持できることを証明する最初の結果です。 1) 原データのトレーニングは, 半径の小さな対向摂動に対して, 実際に非破壊的である。 2) fgmのような経験的摂動アルゴリズムであっても、逆行訓練は、実際には同じ半径の摂動に対して確実に頑健である。最後に,線形分類器や低次多項式,あるいはニューラルネットワークの神経接核といった複雑性の低いモデルでは,アルゴリズムが何であっても,同じ半径の摂動に対して防御できないことを示した。

Despite the empirical success of using Adversarial Training to defend deep learning models against adversarial perturbations, so far, it still remains rather unclear what the principles are behind the existence of adversarial perturbations, and what adversarial training does to the neural network to remove them. In this paper, we present a principle that we call Feature Purification, where we show one of the causes of the existence of adversarial examples is the accumulation of certain small dense mixtures in the hidden weights during the training process of a neural network; and more importantly, one of the goals of adversarial training is to remove such mixtures to purify hidden weights. We present both experiments on the CIFAR-10 dataset to illustrate this principle, and a theoretical result proving that for certain natural classification tasks, training a two-layer neural network with ReLU activation using randomly initialized gradient descent indeed satisfies this principle. Technically, we give, to the best of our knowledge, the first result proving that the following two can hold simultaneously for training a neural network with ReLU activation. (1) Training over the original data is indeed non-robust to small adversarial perturbations of some radius. (2) Adversarial training, even with an empirical perturbation algorithm such as FGM, can in fact be provably robust against ANY perturbations of the same radius. Finally, we also prove a complexity lower bound, showing that low complexity models such as linear classifiers, low-degree polynomials, or even the neural tangent kernel for this network, CANNOT defend against perturbations of this same radius, no matter what algorithms are used to train them.

Translated: 2022-12-01 04:31:09 Published: 2022-06-13

# ランダム射影の精密表現:低ランク近似とランダムニュートン

Precise expressions for random projections: Low-rank approximation and randomized Newton ( http://arxiv.org/abs/2006.10653v3 )

License: see the linked page

Michał Dereziński, Feynman Liang, Zhenyu Liao and Michael W. Mahoney

(参考訳) 低次元の部分空間に投影することで、大きなデータセットの次元性を減らすことがしばしば望ましい。マトリックススケッチは、そのような次元削減を非常に効率的に行うための強力な技術として登場した。スケッチの最悪の性能に関する広範な文献があるが、既存の保証は実際には観察されているものとは大きく異なる。本研究では,ランダム行列のスペクトル解析における最近の進歩を活かし,スケッチによって得られるランダム射影行列の期待値に対して,確実に正確な表現を提供する新しい手法を開発した。これらの式は、低ランク近似から反復確率最適化まで、様々な機械学習タスクにおける次元削減のパフォーマンスを特徴付けることができる。本手法はガウシアンスケッチやラデマッハスケッチなど,いくつかの一般的なスケッチ手法に適用でき,データのスペクトル特性の観点から,これらの手法を高精度に解析できる。実験結果から,これらのスケッチ手法の実践的性能を,低次効果や定数要因まで反映した表現が得られた。

It is often desirable to reduce the dimensionality of a large dataset by projecting it onto a low-dimensional subspace. Matrix sketching has emerged as a powerful technique for performing such dimensionality reduction very efficiently. Even though there is an extensive literature on the worst-case performance of sketching, existing guarantees are typically very different from what is observed in practice. We exploit recent developments in the spectral analysis of random matrices to develop novel techniques that provide provably accurate expressions for the expected value of random projection matrices obtained via sketching. These expressions can be used to characterize the performance of dimensionality reduction in a variety of common machine learning tasks, ranging from low-rank approximation to iterative stochastic optimization. Our results apply to several popular sketching methods, including Gaussian and Rademacher sketches, and they enable precise analysis of these methods in terms of spectral properties of the data. Empirical results show that the expressions we derive reflect the practical performance of these sketching methods, down to lower-order effects and even constant factors.

Translated: 2022-11-19 12:57:28 Published: 2022-06-13

# DeepVOX:非理想的音声信号における話者認識のための生音声の特徴発見

DeepVOX: Discovering Features from Raw Audio for Speaker Recognition in Non-ideal Audio Signals ( http://arxiv.org/abs/2008.11668v2 )

License: see the linked page

Anurag Chowdhury, Arun Ross

(参考訳) 自動音声認識アルゴリズムは通常、メル周波数やガンマタンフィルタバンクなどの予め定義されたフィルタバンクを使用して音声音声を特徴付ける。しかし、これらのフィルタバンクを用いて抽出した特徴は、多様なオーディオ劣化に対する耐性がないことが観察されている。本研究では,大量の音声からフィルタバンク設計を推定する深層学習に基づく手法を提案する。このようなフィルタバンクの目的は、劣化、短命、多言語音声など、理想的でない音声条件にロバストな特徴を抽出することである。この効果のために、1D畳み込みニューラルネットワークは生音声から直接DeepVOXと呼ばれる時間領域のフィルタバンクを学習するように設計されている。次に,フィルタバンクの訓練に適したデータサンプルを効率的にマイニングするために,適応三重項マイニング手法を開発した。第3に,deepvoxフィルタバンクの詳細なアブレーション研究により,抽出された特徴における声道特性と声道特性の両方の存在が明らかになった。 VOXCeleb2、NIST SRE 2008、2010、2018、およびFisher音声データセットの実験結果は、様々な劣化、短い期間、多言語音声におけるDeepVOX特徴の有効性を示す。 DeepVOX機能はまた、xVector-PLDAやiVector-PLDAといった既存の話者認識アルゴリズムの性能向上を示す。

Automatic speaker recognition algorithms typically use pre-defined filterbanks, such as Mel-Frequency and Gammatone filterbanks, for characterizing speech audio. However, it has been observed that the features extracted using these filterbanks are not resilient to diverse audio degradations. In this work, we propose a deep learning-based technique to deduce the filterbank design from vast amounts of speech audio. The purpose of such a filterbank is to extract features robust to non-ideal audio conditions, such as degraded, short-duration, and multi-lingual speech. To this effect, a 1D convolutional neural network is designed to learn a time-domain filterbank called DeepVOX directly from raw speech audio. Secondly, an adaptive triplet mining technique is developed to efficiently mine the data samples best suited to train the filterbank. Thirdly, a detailed ablation study of the DeepVOX filterbanks reveals the presence of both vocal source and vocal tract characteristics in the extracted features. Experimental results on VOXCeleb2, NIST SRE 2008, 2010 and 2018, and Fisher speech datasets demonstrate the efficacy of the DeepVOX features across a variety of degraded, short-duration, and multi-lingual speech. The DeepVOX features are also shown to improve the performance of existing speaker recognition algorithms, such as the xVector-PLDA and the iVector-PLDA.

Translated: 2022-10-24 22:22:17 Published: 2022-06-13

# CLAS12用ドリフトチャンバーにおけるトラック再構成の自動エンコーダ

Auto-encoders for Track Reconstruction in Drift Chambers for CLAS12 ( http://arxiv.org/abs/2009.05144v2 )

License: see the linked page

Gagik Gavalian

(参考訳) 本稿では、ドリフトチャンバーで欠落したセグメントを推定することでトラックを識別し、CLAS12追跡アルゴリズムを支援する機械学習モデルの開発について述べる。オートエンコーダは、トラック軌道から欠落したセグメントを再構成するために使用される。実装されたニューラルネットワークは、欠落したセグメントの位置を $\approx 0.35$ ワイヤの精度で確実に再構成でき、$>99.8\%$ の精度で欠落したトラックの回復につながった。

In this article, we describe the development of machine learning models to assist the CLAS12 tracking algorithm by identifying tracks through inferring missing segments in the drift chambers. Auto-encoders are used to reconstruct missing segments from the track trajectory. The implemented neural network was able to reliably reconstruct missing segment positions with an accuracy of $\approx 0.35$ wires, and led to recovery of missing tracks with an accuracy of $>99.8\%$.

翻訳日:2022-10-20 04:03:20 公開日:2022-06-13

# 画像ベースのソルガムヘッド計数(sorghum head counting)

Image-Based Sorghum Head Counting When You Only Look Once ( http://arxiv.org/abs/2009.11929v3 )

ライセンス: Link先を確認

Lawrence Mosley and Hieu Pham and Yogesh Bansal and Eric Hare

(参考訳) デジタル農業の最近のトレンドは、作物の品質評価と収量推定のために人工知能にシフトしている。本研究では,パラメータ調整された単発物体検出アルゴリズムを用いて,空中ドローン画像からソルガム頭部を識別・カウントする方法について述べる。提案手法は,ソルガム画像の重要な構造要素を同定し,性能に大きく寄与するパラメータ調整アンカーボックスの選択を動機付ける,新たな探索分析を含む。これらの知見は、ベースラインモデルより優れ、サンプル外平均精度0.95を達成したディープラーニングモデルの開発につながった。

Modern trends in digital agriculture have seen a shift towards artificial intelligence for crop quality assessment and yield estimation. In this work, we document how a parameter-tuned single-shot object detection algorithm can be used to identify and count sorghum heads from aerial drone images. Our approach involves a novel exploratory analysis that identified key structural elements of the sorghum images and motivated the selection of parameter-tuned anchor boxes that contributed significantly to performance. These insights led to the development of a deep learning model that outperformed the baseline model and achieved an out-of-sample mean average precision of 0.95.

翻訳日:2022-10-15 04:21:26 公開日:2022-06-13

# 深層学習による外惑星の同定 IV。ニューラルネットワークを用いた放射速度測定からの恒星活動信号の除去

Identifying Exoplanets with Deep Learning. IV. Removing Stellar Activity Signals from Radial Velocity Measurements Using Neural Networks ( http://arxiv.org/abs/2011.00003v3 )

ライセンス: Link先を確認

Zoe L. de Beurs, Andrew Vanderburg, Christopher J. Shallue, Xavier Dumusque, Andrew Collier Cameron, Christopher Leet, Lars A. Buchhave, Rosario Cosentino, Adriano Ghedina, Rapha\"elle D. Haywood, Nicholas Langellier, David W. Latham, Mercedes L\'opez-Morales, Michel Mayor, Giusi Micela, Timothy W. Milbourne, Annelies Mortier, Emilio Molinari, Francesco Pepe, David F. Phillips, Matteo Pinamonti, Giampaolo Piotto, Ken Rice, Dimitar Sasselov, Alessandro Sozzetti, St\'ephane Udry, Christopher A. Watson

(参考訳) 正確な放射速度(RV)を観測する太陽系外惑星検出は、星活動によって引き起こされる刺激的なRV信号によって現在制限されている。線形回帰やニューラルネットワークのような機械学習技術は、RV観測から活動信号(スタースポット/ファキュラによる)を効果的に除去できることを示す。以前の取り組みは、ガウス過程回帰(haywood et al. 2014)のようなモデリング技術を使って、時間内にアクティビティ信号を注意深くフィルタリングすることに焦点を当てていた。代わりに、スペクトル線の平均的な形状の変化のみを用いて、系統的に活動信号を取り除き、いつ観測されたかに関する情報は得られない。私たちは、シミュレーションデータ(SOAP 2.0ソフトウェアで生成されたDumusqueなど)と、HARPS-N太陽望遠鏡(Dumusque et al. 2015; Phillips et al. 2016; Collier Cameron et al. 2019)からの太陽の観測の両方に基づいて、機械学習モデルをトレーニングしました。これらの技術は、シミュレーションデータ(82 cm/sから3 cm/s)と、HARPS-N太陽望遠鏡(約1.753 m/sから1.039 m/s)で3年間に約600回観測された実測値(約1.7の改善率)から恒星活動を予測することができる。将来的には、太陽系外にある恒星の観測から活動シグナルを取り除き、太陽のような恒星の周りに居住可能な地球外惑星を検出するのに役立つだろう。

Exoplanet detection with precise radial velocity (RV) observations is currently limited by spurious RV signals introduced by stellar activity. We show that machine learning techniques such as linear regression and neural networks can effectively remove the activity signals (due to starspots/faculae) from RV observations. Previous efforts focused on carefully filtering out activity signals in time using modeling techniques like Gaussian Process regression (e.g. Haywood et al. 2014). Instead, we systematically remove activity signals using only changes to the average shape of spectral lines, and no information about when the observations were collected. We trained our machine learning models on both simulated data (generated with the SOAP 2.0 software; Dumusque et al. 2014) and observations of the Sun from the HARPS-N Solar Telescope (Dumusque et al. 2015; Phillips et al. 2016; Collier Cameron et al. 2019). We find that these techniques can predict and remove stellar activity from both simulated data (improving RV scatter from 82 cm/s to 3 cm/s) and from more than 600 real observations taken nearly daily over three years with the HARPS-N Solar Telescope (improving the RV scatter from 1.753 m/s to 1.039 m/s, a factor of ~ 1.7 improvement). In the future, these or similar techniques could remove activity signals from observations of stars outside our solar system and eventually help detect habitable-zone Earth-mass exoplanets around Sun-like stars.

翻訳日:2022-10-01 17:37:48 公開日:2022-06-13

# 偽ニュース検出のためのハイブリッドアンサンブルの試み

Hybrid Ensemble for Fake News Detection: An attempt ( http://arxiv.org/abs/2206.13981v1 )

ライセンス: Link先を確認

Lovedeep Singh

(参考訳) フェイクニュース検出は、機械学習の分野で難しい問題となっている。研究者は、古い統計分類モデルと現代のディープラーニングを用いて、いくつかの手法でアプローチしている。現在、データ量の増加、NLPとMLの分野の発展、処理時の計算能力の増加により、この問題に異なる視点からアプローチするための無限の置換と組み合わせが存在する。本稿では,フェイクニュースに取り組むために異なる手法を試し,構築し,古典的機械学習手法と現代的ディープラーニング手法を組み合わせたハイブリッドアンサンブルの可能性を提案する。

Fake News Detection has been a challenging problem in the field of Machine Learning. Researchers have approached it via several techniques using older Statistical Classification models and modern Deep Learning. Today, with the growing amount of data, developments in the fields of NLP and ML, and an increase in the computation power at our disposal, there are countless permutations and combinations with which to approach this problem from a different perspective. In this paper, we try different methods to tackle Fake News, and attempt to build, and propose the possibility of, a Hybrid Ensemble combining classical Machine Learning techniques with modern Deep Learning approaches.

翻訳日:2022-07-04 01:14:25 公開日:2022-06-13

# (参考訳) 畳み込みニューラルネットワークの構造化プルーニングの活用

Leveraging Structured Pruning of Convolutional Neural Networks ( http://arxiv.org/abs/2206.06247v1 )

ライセンス: CC BY 4.0

Hugo Tessier, Vincent Gripon, Mathieu L\'eonardon, Matthieu Arzel, David Bertrand, Thomas Hannagan

(参考訳) 構造化プルーニングは、多くのコンピュータビジョンタスクにおける最先端技術である畳み込みニューラルネットワークのコストを削減する一般的な方法である。しかし、アーキテクチャによっては、プルーニングは、実際のプルーニングネットワークの減少を防ぐ次元的な相違をもたらす。この問題に対処するため,我々は,構造化されたプルーニングマスクを取り込んで,これらの問題に遭遇せず,効率的に活用できるネットワークを生成する手法を提案する。筆者らは,提案手法の正確な説明を行い, 畳み込み畳み込みニューラルネットワークの, 組込みハードウェア上でのエネルギー消費と推論時間におけるゲイン結果を示す。

Structured pruning is a popular method to reduce the cost of convolutional neural networks, which are the state of the art in many computer vision tasks. However, depending on the architecture, pruning introduces dimensional discrepancies which prevent the actual reduction of pruned networks. To tackle this problem, we propose a method that is able to take any structured pruning mask and generate a network that does not encounter any of these problems and can be leveraged efficiently. We provide an accurate description of our solution and show the gains, in energy consumption and inference time on embedded hardware, obtained with pruned convolutional neural networks.

翻訳日:2022-06-27 00:35:08 公開日:2022-06-13

# (参考訳) 組込みGPUを用いたプルーニングセマンティックセマンティックセグメンテーションネットワークのエネルギー消費解析

Energy Consumption Analysis of pruned Semantic Segmentation Networks on an Embedded GPU ( http://arxiv.org/abs/2206.06255v1 )

ライセンス: CC BY 4.0

Hugo Tessier, Vincent Gripon, Mathieu L\'eonardon, Matthieu Arzel, David Bertrand, Thomas Hannagan

(参考訳) ディープニューラルネットワークは多くのコンピュータビジョンタスクにおける最先端技術である。エネルギー消費の面での制限は、通常最高の性能に達する非常に大きなネットワークの使用を禁止しているため、自動運転車のコンテキストにおける彼らの展開は特に興味深い。これらのアーキテクチャの複雑さを減らす一般的な方法は、精度を犠牲にすることなく、最も重要でない部分を取り除くプラニングに依存することである。このテーマには多くの文献があるが、興味深いことに、刈り取りがエネルギーに与える影響を計測した作品はほとんどない。本研究では、Cityscapesデータセットを用いて、自動運転のためのセマンティックセグメンテーションのコンテキストで測定することに興味がある。そこで本稿では,Jetson Xavier組込みGPU上にトレーニングアーキテクチャをデプロイした場合に,最近提案した構造化プルーニング手法の影響を解析する。

Deep neural networks are the state of the art in many computer vision tasks. Their deployment in the context of autonomous vehicles is of particular interest, since their limitations in terms of energy consumption prohibit the use of very large networks, which typically reach the best performance. A common method to reduce the complexity of these architectures, without sacrificing accuracy, is to rely on pruning, in which the least important portions are eliminated. There is a large literature on the subject, but interestingly few works have measured the actual impact of pruning on energy. In this work, we are interested in measuring it in the specific context of semantic segmentation for autonomous driving, using the Cityscapes dataset. To this end, we analyze the impact of recently proposed structured pruning methods when trained architectures are deployed on a Jetson Xavier embedded GPU.

翻訳日:2022-06-27 00:24:16 公開日:2022-06-13

# (参考訳) 正確な2次元対応から3次元点雲へ

From a few Accurate 2D Correspondences to 3D Point Clouds ( http://arxiv.org/abs/2206.08749v1 )

ライセンス: CC BY 4.0

Trung-Kien Le and Ping Li

(参考訳) キーポイント、対応、投影行列、点雲、高密度雲は画像ベースの3次元再構成における骨格であり、そこでは3次元再構成対象の現実的で自然なモデルを生成する上で、点雲が重要な役割を果たす。良好な3D再構成を実現するためには、点雲は物体の表面のほぼ至るところにある必要がある。本稿では,物体の表面全体を覆う点雲の構築を主目的とし,測地的特徴(geodesic feature, geo-feature)と呼ばれる新機能を提案する。新しい測地関数に基づいて、対象の表面に、正確に推定されたすべての射影行列とともにいくつかの(与えられた)初期世界点が存在する場合、これらの二つの世界点を接続する測地線上の新しい世界点が再構成される。すると、これらの初期世界点に接する表面上の領域は、点雲に覆われる。したがって、初期世界点が表面の周囲にある場合、点雲は表面全体を覆うことになる。本稿では,その対応から世界点と投影行列を推定する新しい手法を提案する。本手法は,世界点と射影行列の閉形式および反復解を導出し,世界点数が7未満で画像数が5以上である場合,提案した解が大域的最適であることを示す。本稿では,それらの対応から世界点と射影行列を推定するために world points from their correspondences (wpfc) というアルゴリズムと,第1のアルゴリズムによって与えられた世界点と射影行列から点雲を生成する creating point clouds (crpc) という別のアルゴリズムを提案する。

Key points, correspondences, projection matrices, point clouds and dense clouds are the skeletons of image-based 3D reconstruction, of which point clouds play an important role in generating a realistic and natural model for a 3D reconstructed object. To achieve a good 3D reconstruction, the point clouds must cover almost all of the surface of the object. In this article, with the main purpose of building point clouds covering the entire surface of the object, we propose a new feature named a geodesic feature or geo-feature. Based on the new geo-feature, if there are several (given) initial world points on the object's surface along with all accurately estimated projection matrices, some new world points on the geodesics connecting any two of these given world points will be reconstructed. Then the regions on the surface bordered by these initial world points will be covered by the point clouds. Thus, if the initial world points are around the surface, the point clouds will cover the entire surface. This article proposes a new method to estimate the world points and projection matrices from their correspondences. This method derives closed-form and iterative solutions for the world points and projection matrices and proves that when the number of world points is less than seven and the number of images is at least five, the proposed solutions are globally optimal. We propose an algorithm named World points from their Correspondences (WPfC) to estimate the world points and projection matrices from their correspondences, and another algorithm named Creating Point Clouds (CrPC) to create the point clouds from the world points and projection matrices given by the first algorithm.

翻訳日:2022-06-27 00:14:55 公開日:2022-06-13

# (参考訳) 非ゲートCTスキャンを用いた半教師あり学習によるU-Netモデルによる冠動脈スコーシングの自動化

Automated Coronary Calcium Scoring using U-Net Models through Semi-supervised Learning on Non-Gated CT Scans ( http://arxiv.org/abs/2206.10455v1 )

ライセンス: CC BY-SA 4.0

Sanskriti Singh

(参考訳) 毎年、何千人もの無実の人々が心臓発作で死んでいる。多くの現在の医療計画では、これらのスキャンで石灰化を検索するコストをカバーしていないため、心臓発作は意外な結果に陥ることが多い。心臓疾患の疑いがある場合のみ、ゲートctスキャンを受けます。そうでなければ、患者が心臓発作/死の可能性を認識する方法はありません。非調節型ctスキャンはより定期的に行われるが、石灰化の検出は困難であり、通常は動脈内の石灰化の特定以外の目的で行われる。実際、リアルタイムの冠動脈石灰化スコアは、非ゲートCTスキャンではなく、ゲートCTスキャンでのみ計算される。冠状カルシウムと胸部CTのゲートスキャンでユニットモデルを訓練した後、非接触テストセットでDICE係数0.95を得た。このモデルは非ゲートCTスキャンの予測に用いられ、平均絶対誤差は674.19で、バケット分類精度は41%(5クラス)であった。画像の解析と画像に格納された情報を通じて、数学的方程式が導出され、心臓の位置の周りで自動的に画像が収穫される。半教師付き学習を行うことで、新たに採取した非ゲートスキャンは、ゲートCTスキャンと密接に類似し、MAE(62.38)で91%、精度で23%向上した。

Every year, thousands of innocent people die due to heart attacks. Heart attacks often strike undiagnosed people by surprise, since many current medical plans do not cover the cost of searching for calcification on CT scans. A gated CT scan is taken only if someone is suspected of having a heart problem; otherwise, there is no way for the patient to be aware of a possible heart attack or disease. While nongated CT scans are taken more routinely, calcification is harder to detect on them, and they are usually taken for a purpose other than locating calcification in arteries. In fact, in practice coronary artery calcification scores are calculated only on gated CT scans, not on nongated CT scans. A U-Net model trained on the gated scans of the Coronary Calcium and chest CT's dataset achieved a Dice coefficient of 0.95 on its untouched test set. This model was then used to predict on nongated CT scans, yielding a mean absolute error (MAE) of 674.19 and a bucket classification accuracy of 41% (5 classes). Through analysis of the images and the information stored in them, mathematical equations were derived and used to automatically crop the images around the location of the heart. With semi-supervised learning, the newly cropped nongated scans closely resembled gated CT scans, improving the performance by 91% in MAE (62.38) and 23% in accuracy.
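The two metrics quoted in this abstract can be illustrated with a minimal Python sketch. The masks below are toy data and the function names are hypothetical, not taken from the paper; only the two MAE values (674.19 and 62.38) come from the abstract.

```python
def dice_coefficient(pred, truth):
    """Dice = 2*|A and B| / (|A| + |B|) for two binary masks given as flat 0/1 lists."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    intersection = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


def relative_improvement(before, after):
    """Fractional reduction of an error metric such as MAE."""
    return (before - after) / before


# Toy masks (hypothetical data, not from the paper).
pred = [0, 1, 1, 1, 0, 0]
truth = [0, 1, 1, 0, 0, 0]
toy_dice = dice_coefficient(pred, truth)  # 2*2 / (3+2)

# MAE values quoted in the abstract: 674.19 before cropping, 62.38 after.
mae_gain = relative_improvement(674.19, 62.38)
print(f"toy Dice = {toy_dice:.2f}, MAE reduction = {mae_gain:.1%}")
```

Read this way, the "91% in MAE" figure is consistent with a relative reduction from 674.19 to 62.38.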

翻訳日:2022-06-26 23:52:57 公開日:2022-06-13

# 多目的遺伝的プログラミングにおける意味論のハイライト

Highlights of Semantics in Multi-objective Genetic Programming ( http://arxiv.org/abs/2206.05010v2 )

ライセンス: Link先を確認

Edgar Galv\'an, Leonardo Trujillo, Fergal Stapleton

(参考訳) セマンティックス(Semantics)は、遺伝的プログラミング(GP)の研究の領域であり、実行時に遺伝的プログラミングの個体の行動出力を指す。 sdo (semantic-based distance as a additional criterion) という手法が提案されており、これまでのところ、多目的gp (multi-objective gp, mogp) における意味論の研究領域は限られている。 SCC(Semantic similarity-based Crossover)とSCD(Semantic-based Crowding Distance)という,2つのセマンティックなセマンティックなアプローチを使用して,パフォーマンスと多様性の指標の観点からGPの拡張分析を行った。それぞれのアプローチは2つの進化的多目的 (EMO) フレームワークに統合される: 非支配的ソーティング遺伝アルゴリズムII (NSGA-II) と強度パレート進化アルゴリズム2 (SPEA2) の3つのセマンティックアプローチと共に、NSGA-IIとSPEA2の正準形式を厳密に比較する。高度にバランスの取れないバイナリ分類データセットを用いて,新たに提案するsdoのアプローチが,多様性の向上と高ボリューム化とともに,非優位なソリューションを一貫して生成することを示した。

Semantics is a growing area of research in Genetic Programming (GP) and refers to the behavioural output of a GP individual when executed. This research expands upon the current understanding of semantics by proposing a new approach, Semantic-based Distance as an additional criteriOn (SDO), in the so far relatively little-studied area of semantics in Multi-objective GP (MOGP). Our work includes an extensive analysis of GP in terms of performance and diversity metrics, using two additional semantic-based approaches, namely Semantic Similarity-based Crossover (SCC) and Semantic-based Crowding Distance (SCD). Each approach is integrated into two evolutionary multi-objective (EMO) frameworks, the Non-dominated Sorting Genetic Algorithm II (NSGA-II) and the Strength Pareto Evolutionary Algorithm 2 (SPEA2), and the three semantic approaches are rigorously compared with the canonical forms of NSGA-II and SPEA2. Using highly unbalanced binary classification datasets, we demonstrate that the newly proposed SDO approach consistently generates more non-dominated solutions, with better diversity and improved hypervolume results.
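The notion of a "non-dominated solution" used above can be sketched in a few lines of Python. This is a generic Pareto-dominance check for maximised objectives, not the SDO method itself; the two objectives and the candidate values are assumptions chosen for illustration (for example, the accuracy on each class of an unbalanced binary problem).

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one.

    All objectives are assumed to be maximised.
    """
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated(points):
    """Return the points that no other point dominates (the Pareto front)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (objective_1, objective_2) scores for five candidate programs.
candidates = [(0.9, 0.2), (0.7, 0.7), (0.6, 0.6), (0.2, 0.95), (0.7, 0.5)]
front = non_dominated(candidates)
print(front)
```

Frameworks such as NSGA-II and SPEA2 build their selection schemes on this dominance relation; a method that "generates more non-dominated solutions" ends a run with a larger front of this kind.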

翻訳日:2022-06-26 14:49:32 公開日:2022-06-13

# ユーザ生成VRビデオの品質評価のためのデータベース

A Database for Perceived Quality Assessment of User-Generated VR Videos ( http://arxiv.org/abs/2206.08751v1 )

ライセンス: Link先を確認

Yuming Fang, Yiru Yao, Xiangjie Sui, and Kede Ma

(参考訳) 仮想現実(vr)ビデオ(通常は360$^\circ$ビデオ)は、vr技術の急速な発展と、消費者向けの360$^\circ$カメラやディスプレイの普及により、注目を集めている。したがって、ユーザーが生み出すVRビデオをどのように知覚するかを理解することが重要であり、それは、しばしば空間と時間で局所化される真正な歪みに悩まされる可能性がある。本稿では,豊富なコンテンツと歪みのある502のユーザ生成ビデオを含む,最大規模の360$^\circ$ビデオデータベースを構築する。 139人の視聴行動(すなわちスキャンパス)を捉え、4つの異なる視聴条件下で評価された品質の意見スコアを収集する(2つの開始点$\times$2回の探索時間)。本研究では,記録データに対する詳細な統計分析を行い,視聴状況が視聴行動や知覚品質に与える影響など,いくつかの興味深い観察結果を得た。また,360$^\circ$ビデオの品質評価のための計算モデルの評価,サリエンシー検出など,データと分析の他の用途についても検討した。データセットとコードは https://github.com/Yao-Yiru/VR-Video-Database で公開しています。

Virtual reality (VR) videos (typically in the form of 360$^\circ$ videos) have gained increasing attention due to the fast development of VR technologies and the remarkable popularization of consumer-grade 360$^\circ$ cameras and displays. Thus it is pivotal to understand how people perceive user-generated VR videos, which may suffer from commingled authentic distortions, often localized in space and time. In this paper, we establish one of the largest 360$^\circ$ video databases, containing 502 user-generated videos with rich content and distortion diversities. We capture viewing behaviors (i.e., scanpaths) of 139 users, and collect their opinion scores of perceived quality under four different viewing conditions (two starting points $\times$ two exploration times). We provide a thorough statistical analysis of recorded data, resulting in several interesting observations, such as the significant impact of viewing conditions on viewing behaviors and perceived quality. Besides, we explore other usage of our data and analysis, including evaluation of computational models for quality assessment and saliency detection of 360$^\circ$ videos. We have made the dataset and code available at https://github.com/Yao-Yiru/VR-Video-Database.

翻訳日:2022-06-26 07:35:49 公開日:2022-06-13

# アルミナセラミックスレーザ加工の機械学習によるプロセス

Machine Learning-Driven Process of Alumina Ceramics Laser Machining ( http://arxiv.org/abs/2206.08747v1 )

ライセンス: Link先を確認

Razyeh Behbahani, Hamidreza Yazdani Sarvestani, Erfan Fatehi, Elham Kiyani, Behnam Ashrafi, Mikko Karttunen and Meysam Rahmat

(参考訳) レーザー加工は高度に柔軟な非接触製造技術であり、学界や産業で広く使われている。光と物質間の非線形相互作用のため、レーザー加工パラメータ間の相互関係の理解を提供することにより、加工品質を向上させるため、シミュレーション手法は非常に重要である。一方、実験的な処理パラメータ最適化では、利用可能な処理パラメータ空間上での系統的、結果として時間を要する調査を推奨している。インテリジェントな戦略は、機械学習(ML)技術を用いて、ピコ秒レーザー加工パラメータ間の関係を捕捉し、適切なパラメータの組み合わせを見つけることで、深い、滑らかで欠陥のないパターンを持つ工業級アルミナセラミックスの所望のカットを生成することである。 MLモデルを用いて、ビーム振幅や周波数、スキャナ通過速度、表面の通過数、試料表面からのスキャナの垂直距離などのレーザパラメータを用いて、刻印チャネルの深さ、頂幅、底幅を予測する。レーザーパラメータ間の複雑な相関関係から,ニューラルネットワーク (nn) が出力予測において最も効率的であることが示されている。レーザパラメータと刻まれたチャネル次元の相互接続をキャプチャするMLモデルにより、ターゲットチャネル形状を達成するために必要な入力パラメータを予測することができる。この戦略は、精度や性能を損なうことなく、開発段階での実験レーザー加工のコストと労力を大幅に削減する。開発された技術は、幅広いセラミックレーザ加工プロセスに適用することができる。

Laser machining is a highly flexible non-contact manufacturing technique that has been employed widely across academia and industry. Due to nonlinear interactions between light and matter, simulation methods are crucial, as they help enhance the machining quality by offering insight into the inter-relationships between the laser processing parameters. On the other hand, experimental processing parameter optimization requires a systematic, and consequently time-consuming, investigation over the available processing parameter space. An intelligent strategy is to employ machine learning (ML) techniques to capture the relationship between picosecond laser machining parameters in order to find proper parameter combinations that create the desired cuts on industrial-grade alumina ceramic with deep, smooth and defect-free patterns. Laser parameters such as beam amplitude and frequency, scanner passing speed and the number of passes over the surface, as well as the vertical distance of the scanner from the sample surface, are used for predicting the depth, top width, and bottom width of the engraved channels using ML models. Owing to the complex correlation between laser parameters, it is shown that Neural Networks (NN) are the most efficient in predicting the outputs. Equipped with an ML model that captures the interconnection between laser parameters and the engraved channel dimensions, one can predict the required input parameters to achieve a target channel geometry. This strategy significantly reduces the cost and effort of experimental laser machining during the development phase, without compromising accuracy or performance. The developed techniques can be applied to a wide range of ceramic laser machining processes.

翻訳日:2022-06-26 07:34:54 公開日:2022-06-13

# ReViSe:スマートフォンカメラを用いたリモートバイタルサイン計測

ReViSe: Remote Vital Signs Measurement Using Smartphone Camera ( http://arxiv.org/abs/2206.08748v1 )

ライセンス: Link先を確認

Donghao Qiao, Amtul Haq Ayesha, Farhana Zulkernine, Raihan Masroor, Nauman Jaffar

(参考訳) 遠隔photoplethysmography(rppg)は、顔ビデオを用いたバイタルサイン推定を可能にするバイオメトリックデータ収集のための高速で効果的で安価で便利な方法である。新型コロナウイルス(COVID-19)のパンデミックでは、遠隔医療サービス提供が不可欠であることが証明されている。本稿では,スマートフォンカメラで撮影したユーザの顔の映像から,心拍数(HR),心拍変動(HRV),酸素飽和度(SpO2),血圧(BP)など,人々のバイタルサインを測定するためのエンドツーエンドのフレームワークを提案する。ディープラーニングに基づくニューラルネットワークモデルを用いて,顔のランドマークをリアルタイムで抽出する。予測された顔ランドマークを用いて、関心領域(roi)とも呼ばれる複数の顔パッチを抽出する。血液量パルス(BVP)信号と呼ばれる抽出された心臓信号のRoIからのノイズを低減するために、いくつかのフィルタが適用される。我々は,TokyoTech rPPGとPURE(Pulse Rate Detection)という2つの公開rPPGデータセットを用いて機械学習モデルを訓練・検証し,次の平均絶対誤差(MAE)を得た。 a) HRではそれぞれ1.73 bpmと3.95 bpm, b) HRVではそれぞれ18.55 msと25.03 ms, c) SpO2ではPUREデータセット上で1.64。実生活環境において、エンドツーエンドのrPPGフレームワークReViSeを検証し、Video-HRデータセットを作成しました。我々のHR推定モデルは、このデータセット上で2.49bpmのMAEを達成した。顔ビデオによるBP測定のために公開されているrPPGデータセットは存在しないため、指先センサーからの信号によるデータセットを使用してモデルをトレーニングし、独自のビデオデータセットであるVideo-BPを作成しました。Video-BPデータセットでは,BP推定モデルのMAEは収縮期血圧(SBP)で6.7mmHg,拡張期血圧(DBP)で9.6mmHgであった。

Remote Photoplethysmography (rPPG) is a fast, effective, inexpensive and convenient method for collecting biometric data, as it enables vital signs estimation using face videos. Remote contactless medical service provisioning has proven to be a dire necessity during the COVID-19 pandemic. We propose an end-to-end framework to measure people's vital signs, including Heart Rate (HR), Heart Rate Variability (HRV), Oxygen Saturation (SpO2) and Blood Pressure (BP), based on the rPPG methodology from the video of a user's face captured with a smartphone camera. We extract face landmarks with a deep learning-based neural network model in real time. Multiple face patches, also called Regions of Interest (RoIs), are extracted by using the predicted face landmarks. Several filters are applied to reduce the noise from the RoIs in the extracted cardiac signal, called the Blood Volume Pulse (BVP) signal. We trained and validated machine learning models using two public rPPG datasets, namely the TokyoTech rPPG and the Pulse Rate Detection (PURE) datasets, on which our models achieved the following Mean Absolute Errors (MAE): a) for HR, 1.73 and 3.95 Beats-Per-Minute (bpm) respectively, b) for HRV, 18.55 and 25.03 ms respectively, and c) for SpO2, an MAE of 1.64 on the PURE dataset. We validated our end-to-end rPPG framework, ReViSe, in a real-life environment, and thereby created the Video-HR dataset. Our HR estimation model achieved an MAE of 2.49 bpm on this dataset. Since no publicly available rPPG datasets existed for BP measurement with face videos, we used a dataset with signals from a fingertip sensor to train our model and also created our own video dataset, Video-BP. On our Video-BP dataset, our BP estimation model achieved an MAE of 6.7 mmHg for Systolic Blood Pressure (SBP), and an MAE of 9.6 mmHg for Diastolic Blood Pressure (DBP).

翻訳日:2022-06-26 07:11:39 公開日:2022-06-13

# 機械学習支援物理刺激ラマン散乱モデルに基づくフレキシブルラマン増幅器最適化

Flexible Raman Amplifier Optimization Based on Machine Learning-aided Physical Stimulated Raman Scattering Model ( http://arxiv.org/abs/2206.07650v1 )

ライセンス: Link先を確認

Metodi Plamenov Yankov, Francesco Da Ros, Uiara Celine de Moura, Andrea Carena and Darko Zibar

(参考訳) ラマン増幅器最適化の問題点について検討した。機械学習(ML)を用いたラマンゲイン係数に対して,前方伝搬ラマンポンプの勾配降下最適化が可能な微分可能な補間関数を求める。フォワードポンプ構成における任意の数のポンプの周波数とパワーは、任意のデータチャネルロードとスパン長さに最適化される。前方伝搬モデルは、後方励起ラマン増幅器の実験的に訓練されたMLモデルと組み合わせられ、前方増幅器のポンプの周波数とパワーと後方増幅器のポンプのパワーを協調的に最適化する。前方および後方のアンプの最適化は250kmの無中継伝送に対して実証される。 4 THzにわたって$<$ 1~dBの利得平坦度を達成する。最適化増幅器は数値シミュレータを用いて検証される。

The problem of Raman amplifier optimization is studied. A differentiable interpolation function is obtained for the Raman gain coefficient using machine learning (ML), which allows for the gradient descent optimization of forward-propagating Raman pumps. Both the frequency and power of an arbitrary number of pumps in a forward pumping configuration are then optimized for an arbitrary data channel load and span length. The forward propagation model is combined with an experimentally-trained ML model of a backward-pumping Raman amplifier to jointly optimize the frequency and power of the forward amplifier's pumps and the powers of the backward amplifier's pumps. The joint forward and backward amplifier optimization is demonstrated for an unrepeatered transmission of 250 km. A gain flatness of $<$ 1~dB over 4 THz is achieved. The optimized amplifiers are validated using a numerical simulator.

翻訳日:2022-06-16 15:11:08 公開日:2022-06-13

# (参考訳) 行動経路を有するレコメンダ変換器

Recommender Transformers with Behavior Pathways ( http://arxiv.org/abs/2206.06804v1 )

ライセンス: CC BY 4.0

Zhiyu Yao, Xinyang Chen, Sinan Wang, Qinyan Dai, Yumeng Li, Tanchao Zhu, Mingsheng Long

(参考訳) シーケンシャルレコメンデーションでは、正確なレコメンデーションのために、ログ化されたユーザ行動データから進化する振る舞い特性をキャプチャする必要がある。しかし、ユーザーの振る舞いシーケンスは複数のスレッドが絡み合っているスクリプトと見なされる。重要な振る舞いの小さなセットだけが、ユーザの将来のアクションに進化できることが分かっています。その結果,ユーザの将来の行動を予測することは困難である。この特性は,各ユーザの逐次動作を行動経路として決定する。異なるユーザーは独自の行動経路を持っている。既存のシーケンシャルモデルの中で、トランスフォーマーは世界依存の特徴を捉えている。しかし、これらのモデルは、主に自己注意機構を用いて、以前の全ての行動に密分布を提供し、各ユーザーに調整されていない自明な行動によって最終的な予測が圧倒される。本稿では,新しいパスウェイアテンション機構を備えたRecommender Transformer(RETR)を構築する。 RETRは、ユーザ毎に指定された行動経路を動的に計画し、この行動経路を介してネットワークをスペアリングして、推奨に有用な進化パターンを効果的に捕捉することができる。重要な設計は、単純な振る舞いによって振舞いパスが圧倒されるのを防ぐために学習されたバイナリルートである。実世界の7つのデータセットに対するRETRの有効性を実証的に検証し、RETRは最先端の性能を得る。

Sequential recommendation requires the recommender to capture the evolving behavior characteristics from logged user behavior data for accurate recommendations. However, user behavior sequences are viewed as a script with multiple ongoing threads intertwined. We find that only a small set of pivotal behaviors can be evolved into the user's future action. As a result, the future behavior of the user is hard to predict. We conclude this characteristic for sequential behaviors of each user as the Behavior Pathway. Different users have their unique behavior pathways. Among existing sequential models, transformers have shown great capacity in capturing global-dependent characteristics. However, these models mainly provide a dense distribution over all previous behaviors using the self-attention mechanism, making the final predictions overwhelmed by the trivial behaviors not adjusted to each user. In this paper, we build the Recommender Transformer (RETR) with a novel Pathway Attention mechanism. RETR can dynamically plan the behavior pathway specified for each user, and sparingly activate the network through this behavior pathway to effectively capture evolving patterns useful for recommendation. The key design is a learned binary route to prevent the behavior pathway from being overwhelmed by trivial behaviors. We empirically verify the effectiveness of RETR on seven real-world datasets and RETR yields state-of-the-art performance.
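The contrast drawn above, between dense self-attention over all previous behaviors and attention restricted by a binary route, can be illustrated with a small Python sketch. This is only an illustration of masking a softmax with a 0/1 route vector, under the assumption that the route is already given; it is not the RETR Pathway Attention mechanism, and the scores and route below are made up.

```python
import math


def routed_attention(scores, route):
    """Softmax over attention scores, restricted to positions where route == 1.

    Positions with route == 0 receive exactly zero weight.
    """
    kept = [s for s, r in zip(scores, route) if r == 1]
    if not kept:
        raise ValueError("route must keep at least one behavior")
    peak = max(kept)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) if r == 1 else 0.0 for s, r in zip(scores, route)]
    norm = sum(exps)
    return [e / norm for e in exps]


# Hypothetical attention scores for four past behaviors and a binary route
# that keeps only the first and third.
scores = [2.0, 0.5, 1.0, 0.1]
route = [1, 0, 1, 0]
weights = routed_attention(scores, route)
print(weights)
```

With a dense softmax every behavior would receive some weight; with the route applied, the weight mass is shared only among the behaviors on the pathway.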

翻訳日:2022-06-16 11:23:22 公開日:2022-06-13

# (参考訳) 深達度学習による複数標識遅延動脈スピンラベルMRIによる脳血流の促進と動脈輸送時間マップの推定

Acceleration of cerebral blood flow and arterial transit time maps estimation from multiple post-labeling delay arterial spin-labeled MRI via deep learning ( http://arxiv.org/abs/2206.06372v1 )

ライセンス: CC BY 4.0

Yiran Li and Ze Wang

(参考訳) 目的: 動脈スピンラベリング (ASL) 灌流像は, 脳血流(CBF)の直接的および絶対的測定を示している。動脈輸送時間(英: Arterial transit time、ATT)は、脳の領域に到達するためのラベル付きスピンの持続期間を反映する生理学的パラメータである。複数ラベル後遅延(PLD)は、CBFとATTの双方に対して堅牢な尺度を提供し、ATTに基づく地域CBFモデリングの最適化を可能にする。長期取得時間はCBFとATT推定の品質と精度を低下させる可能性がある。信号対雑音比(SNR)の高いPLDの数を著しく削減する新しいネットワークを提案する。方法: CBFとATTの推定は, PLDが1つの場合と2つの場合について別々に行った。各モデルは、灌流重み付き画像(PWI)からCBFおよびATT画像への非線形変換を独立に学習した。結果: 1-PLDモデルと2-PLDモデルはともにCBFにおいて視覚的に従来法を上回り, 2-PLDモデルはATT推定でより正確な構造を示した。提案手法は,SNRを犠牲にすることなく,PLDの数をATTでは6から2に,CBFでは1つにまで減少させる。結論: 深層学習を用いて,削減したPLDから高品質なCBFおよびATTマップを生成することは可能である。

Purpose: Arterial spin labeling (ASL) perfusion imaging provides a direct and absolute measurement of cerebral blood flow (CBF). Arterial transit time (ATT) is a related physiological parameter reflecting the duration for the labeled spins to reach the brain region of interest. Multiple post-labeling delays (PLDs) can provide robust measures of both CBF and ATT, allowing for optimization of regional CBF modeling based on ATT. The prolonged acquisition time can potentially reduce the quality and accuracy of the CBF and ATT estimation. We proposed a novel network to significantly reduce the number of PLDs with higher signal-to-noise ratio (SNR). Method: CBF and ATT estimations were performed for one PLD and two PLDs separately. Each model was trained independently to learn the nonlinear transformation from perfusion weighted image (PWI) to CBF and ATT images. Results: Both one-PLD and two-PLD models outperformed the conventional method visually on CBF, and the two-PLD model showed more accurate structure on ATT estimation. The proposed method significantly reduces the number of PLDs from 6 to 2 on ATT and even to a single PLD on CBF without sacrificing the SNR. Conclusion: It is feasible to generate high-quality CBF and ATT maps with reduced PLDs using deep learning.

翻訳日:2022-06-16 11:04:07 公開日:2022-06-13

# (参考訳) 材料科学における記号回帰:データからの原子間ポテンシャルの発見

Symbolic Regression in Materials Science: Discovering Interatomic Potentials from Data ( http://arxiv.org/abs/2206.06422v1 )

ライセンス: CC BY 4.0

Bogdan Burlacu, Michael Kommenda, Gabriel Kronberger, Stephan Winkler, Michael Affenzeller

(参考訳) 原子スケールの物質の粒子モデルは、新しい材料の開発とそれらの性質の理解において重要な役割を果たす。粒子シミュレーションの精度は原子間ポテンシャルによって決定され、原子座標や他の性質の関数として原子系のポテンシャルエネルギーを計算することができる。第一原理に基づくab initioポテンシャルは任意のレベルの精度に達するが、高い計算コストによってその適用性は制限される。機械学習(ML)は最近、高価なモデルを電子構造データに基づいて訓練された高効率なサロゲートに置き換えることで、アブ初期原子ポテンシャルの計算コストを相殺する有効な方法として登場した。現在の多くの手法の中で、記号回帰(SR)は、原子間ポテンシャルの関数形式を発見するための強力な「ホワイトボックス」アプローチとして勢いを増している。本研究は材料科学(MS)における象徴的回帰の役割を論じ、現在の方法論的課題と最先端の成果を概観する。 ab initio電子構造データを用いて、生データ(原子位置と関連するポテンシャルエネルギーのスナップショット)から原子電位をモデル化する遺伝的プログラミングに基づくアプローチを提示し、実証的に検証する。

Particle-based modeling of materials at the atomic scale plays an important role in the development of new materials and the understanding of their properties. The accuracy of particle simulations is determined by interatomic potentials, which allow the potential energy of an atomic system to be calculated as a function of atomic coordinates and potentially other properties. First-principles-based ab initio potentials can reach arbitrary levels of accuracy; however, their applicability is limited by their high computational cost. Machine learning (ML) has recently emerged as an effective way to offset the high computational costs of ab initio atomic potentials by replacing expensive models with highly efficient surrogates trained on electronic structure data. Among a plethora of current methods, symbolic regression (SR) is gaining traction as a powerful "white-box" approach for discovering functional forms of interatomic potentials. This contribution discusses the role of symbolic regression in Materials Science (MS) and offers a comprehensive overview of current methodological challenges and state-of-the-art results. A genetic programming-based approach for modeling atomic potentials from raw data (consisting of snapshots of atomic positions and associated potential energy) is presented and empirically validated on ab initio electronic structure data.

翻訳日:2022-06-16 10:57:39 公開日:2022-06-13

# (参考訳) Trajectory-Wise Reward を用いたオフライン強化学習

Provably Efficient Offline Reinforcement Learning with Trajectory-Wise Reward ( http://arxiv.org/abs/2206.06426v1 )

ライセンス: CC BY 4.0

Tengyu Xu, Yingbin Liang

(参考訳) 強化学習(RL)の顕著な成功は、訪問した全ての状態-行動ペアの報酬の観察に大きく依存している。しかし、現実世界の多くの応用において、エージェントは軌道全体の質を表すスコアのみを観察することができ、これは「軌道単位の報酬(trajectory-wise reward)」と呼ばれる。このような状況下では、標準のRL法では軌道単位の報酬をうまく活用することは困難であり、政策評価において大きなバイアスと分散誤差が生じる可能性がある。本稿では、最小二乗法に基づく報酬再分配によって軌道リターンをステップごとの代理報酬に分解し、学習した代理報酬に基づいて悲観的価値反復を行う、Pessimistic vAlue iteRaTion with rEward Decomposition (PARTED)と呼ばれる新しいオフラインRLアルゴリズムを提案する。 PARTEDで構築された値関数が常に最適値に対して悲観的であることを保証するため、我々は代理報酬の不確実性を相殺する新しいペナルティ項を設計する。大きな状態空間を持つ一般的なエピソードMDPに対して、オーバーパラメータ化されたニューラルネットワーク関数近似を用いたPARTEDが$\tilde{\mathcal{O}}(D_{\text{eff}}H^2/\sqrt{N})$の準最適性を達成することを示す。ここで$H$はエピソードの長さ、$N$はサンプルの総数、$D_{\text{eff}}$はニューラルタンジェントカーネル行列の有効次元である。この結果をさらに説明するために、PARTEDは線形MDPに対して$\tilde{\mathcal{O}}(dH^3/\sqrt{N})$の準最適性を達成することを示す。ここで$d$は特徴次元であり、これは$D_{\text{eff}}=dH$のときニューラルネットワーク関数近似の場合と一致する。私たちの知る限りでは、PARTEDは、軌道単位の報酬を持つ一般のMDPにおいて、証明可能に効率的な最初のオフラインRLアルゴリズムである。

The remarkable success of reinforcement learning (RL) heavily relies on observing the reward of every visited state-action pair. In many real-world applications, however, an agent can observe only a score that represents the quality of the whole trajectory, which is referred to as the trajectory-wise reward. In such a situation, it is difficult for standard RL methods to make good use of the trajectory-wise reward, and large bias and variance errors can be incurred in policy evaluation. In this work, we propose a novel offline RL algorithm, called Pessimistic vAlue iteRaTion with rEward Decomposition (PARTED), which decomposes the trajectory return into per-step proxy rewards via least-squares-based reward redistribution, and then performs pessimistic value iteration based on the learned proxy reward. To ensure the value functions constructed by PARTED are always pessimistic with respect to the optimal ones, we design a new penalty term to offset the uncertainty of the proxy reward. For general episodic MDPs with large state space, we show that PARTED with overparameterized neural network function approximation achieves an $\tilde{\mathcal{O}}(D_{\text{eff}}H^2/\sqrt{N})$ suboptimality, where $H$ is the length of an episode, $N$ is the total number of samples, and $D_{\text{eff}}$ is the effective dimension of the neural tangent kernel matrix. To further illustrate the result, we show that PARTED achieves an $\tilde{\mathcal{O}}(dH^3/\sqrt{N})$ suboptimality with linear MDPs, where $d$ is the feature dimension, which matches the bound for neural network function approximation when $D_{\text{eff}}=dH$. To the best of our knowledge, PARTED is the first offline RL algorithm that is provably efficient in general MDPs with trajectory-wise reward.
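
The least-squares reward redistribution step can be illustrated with a small sketch. This is not the authors' implementation: it assumes a linear proxy reward over hypothetical per-step features, uses ridge-regularized least squares, and omits the pessimistic penalty and the value iteration entirely. The names `fit_proxy_reward`, `features`, and `ridge` are illustrative assumptions.

```python
import numpy as np

def fit_proxy_reward(features, returns, ridge=1e-3):
    """Fit a linear per-step proxy reward from trajectory-level returns.

    features: array of shape (N, H, d), one feature vector per step (assumed given).
    returns:  array of shape (N,), the only reward signal that is observed.
    """
    # A trajectory's predicted return is the sum of its per-step proxy rewards,
    # so regress the observed return on the summed features.
    summed = features.sum(axis=1)                      # shape (N, d)
    d = summed.shape[1]
    gram = summed.T @ summed + ridge * np.eye(d)       # ridge-regularized Gram matrix
    return np.linalg.solve(gram, summed.T @ returns)   # shape (d,)

# Synthetic data: the per-step rewards exist but are never shown to the learner.
rng = np.random.default_rng(0)
N, H, d = 500, 5, 3
features = rng.normal(size=(N, H, d))
true_theta = np.array([1.0, -2.0, 0.5])
returns = (features @ true_theta).sum(axis=1)

theta_hat = fit_proxy_reward(features, returns)
proxy_rewards = features @ theta_hat                   # shape (N, H), redistributed rewards
```

In this noise-free toy setting the fitted weights recover the hidden per-step reward, and the redistributed rewards sum back to the observed returns.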

翻訳日:2022-06-16 10:36:58 公開日:2022-06-13

# (参考訳) 命題型フレームワークにおける最適化の要約

An Abstract View on Optimizations in Propositional Frameworks ( http://arxiv.org/abs/2206.06440v1 )

ライセンス: CC BY 4.0

Yuliya Lierler

(参考訳) 検索最適化問題は、科学や工学の分野では多い。人工知能は、検索最適化問題の解決とモデリングを目的とした検索アルゴリズムと宣言型プログラミング言語の開発に長い間貢献してきた。自動推論と知識表現はAIのサブフィールドであり、これらの開発に特に適している。一般的な自動推論パラダイムの多くは、最適化ステートメントをサポートする言語をユーザに提供しています。これらのパラダイムは言語や計算されたソリューションの品質条件を表現する方法によって大きく異なる。ここでは、パラダイム間の構文的な区別をなくし、パラダイムによって提供される最適化文間の本質的な類似性と相違を見極めるいわゆる重みシステムの統一フレームワークを提案する。この統一的な見通しは、自動推論と知識表現における最適化とモジュラリティの研究において重要な単純化と説明可能性を有しており、異なる形式を橋渡しし翻訳解決法を開発するための技術的手段を提供する。論理プログラミングの理論と実践(tplp)における考察。

Search-optimization problems are plentiful in scientific and engineering domains. Artificial intelligence has long contributed to the development of search algorithms and declarative programming languages geared towards solving and modeling search-optimization problems. Automated reasoning and knowledge representation are the subfields of AI that are particularly vested in these developments. Many popular automated reasoning paradigms provide users with languages supporting optimization statements: MaxSAT or answer set programming, to name a few. These paradigms vary significantly in their languages and in the ways they express quality conditions on computed solutions. Here we propose a unifying framework of so-called weight systems that eliminates syntactic distinctions between paradigms and allows us to see essential similarities and differences between optimization statements provided by paradigms. This unifying outlook has a significant simplifying and explanatory potential in the studies of optimization and modularity in automated reasoning and knowledge representation providing technical means for bridging distinct formalisms and developing translational solvers. Under consideration in Theory and Practice of Logic Programming (TPLP).

翻訳日:2022-06-16 10:34:47 公開日:2022-06-13

# (参考訳) Splatting を用いた画像分解能に対するセグメンテーションネットワークの適用

Fitting Segmentation Networks on Varying Image Resolutions using Splatting ( http://arxiv.org/abs/2206.06445v1 )

ライセンス: CC BY 4.0

Mikael Brudfors and Yael Balbastre and John Ashburner and Geraint Rees and Parashkev Nachev and Sebastien Ourselin and M. Jorge Cardoso

(参考訳) イメージセグメンテーションで使用されるデータは、必ずしも同じグリッド上で定義されない。これは特に医療画像に当てはまるもので、解像度、視野、方向がチャンネルや被験者によって異なる可能性がある。したがって、画像とラベルは、前処理ステップとして、通常同じグリッドに再サンプリングされる。しかし,再サンプリング操作では部分体積効果やぼやけが生じ,有効分解能が変化し,構造間のコントラストが低下する。本稿では,入力データの解像度ミスマッチを自動的に処理するsplat層を提案する。この層は、各画像をフォワードパスが行われる平均空間にプッシュする。スプレート演算子が再サンプリング演算子の随伴であるので、平均空間予測をネイティブラベル空間に引き戻すことができ、損失関数が計算される。これにより、補間による明示的な解決調整の必要性が排除される。シミュレーションおよび実マルチモーダル磁気共鳴画像を用いた2つの公開データセットにおいて,本モデルは,前処理ステップとして再サンプリングを行うよりもセグメンテーション結果を改善することを示す。

Data used in image segmentation are not always defined on the same grid. This is particularly true for medical images, where the resolution, field-of-view and orientation can differ across channels and subjects. Images and labels are therefore commonly resampled onto the same grid, as a pre-processing step. However, the resampling operation introduces partial volume effects and blurring, thereby changing the effective resolution and reducing the contrast between structures. In this paper we propose a splat layer, which automatically handles resolution mismatches in the input data. This layer pushes each image onto a mean space where the forward pass is performed. As the splat operator is the adjoint to the resampling operator, the mean-space prediction can be pulled back to the native label space, where the loss function is computed. Thus, the need for explicit resolution adjustment using interpolation is removed. We show on two publicly available datasets, with simulated and real multi-modal magnetic resonance images, that this model improves segmentation results compared to resampling as a pre-processing step.
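
The adjoint relationship between resampling ("pull") and splatting ("push") can be checked numerically. The sketch below is a 1D illustration with linear interpolation, written under the assumption that sampling coordinates lie inside the grid; the function names `pull` and `push` are illustrative and do not come from the paper's code.

```python
import numpy as np

def pull(image, coords):
    """Resample: read linearly interpolated values of `image` at `coords`."""
    i0 = np.clip(np.floor(coords).astype(int), 0, len(image) - 2)
    w = coords - i0                       # weight of the right-hand neighbour
    return (1 - w) * image[i0] + w * image[i0 + 1]

def push(values, coords, size):
    """Splat: scatter `values` onto a grid of length `size` with the same weights."""
    i0 = np.clip(np.floor(coords).astype(int), 0, size - 2)
    w = coords - i0
    out = np.zeros(size)
    np.add.at(out, i0, (1 - w) * values)  # unbuffered add handles repeated indices
    np.add.at(out, i0 + 1, w * values)
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=8)                    # image on the mean-space grid
coords = rng.uniform(0, 7, size=20)       # native-space sample positions
y = rng.normal(size=20)                   # values living in native space

lhs = pull(x, coords) @ y                 # <pull(x), y>
rhs = x @ push(y, coords, 8)              # <x, push(y)>
```

The two inner products agree up to floating-point error, which is what it means for the splat operator to be the adjoint of the resampling operator.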

翻訳日:2022-06-16 10:01:56 公開日:2022-06-13

# (参考訳) ソロモンオフ予測のためのジレンマ

A Dilemma for Solomonoff Prediction ( http://arxiv.org/abs/2206.06473v1 )

ライセンス: CC BY 4.0

Sven Neth

(参考訳) ソロモンフ予測の枠組みは、コルモゴロフの複雑性に逆比例する仮説に事前の確率を割り当てる。有名な問題は2つある。第一に、Solomonoff はUniversal Turing マシンの選択と相対的である。第二に、以前のSolomonoffは計算不可能である。しかし、どちらの問題にも反応がある。異なるSolomonoffの優先順位は、ますます多くのデータに収束する。さらに、Solomonoffに対する計算可能な近似がある。私はこの2つの反応の間に緊張があると思う。これは、ソロモンオフ予測への計算可能近似が必ずしも収束しないためである。

The framework of Solomonoff prediction assigns prior probability to hypotheses inversely proportional to their Kolmogorov complexity. There are two well-known problems. First, the Solomonoff prior is relative to a choice of Universal Turing machine. Second, the Solomonoff prior is not computable. However, there are responses to both problems. Different Solomonoff priors converge with more and more data. Further, there are computable approximations to the Solomonoff prior. I argue that there is a tension between these two responses. This is because computable approximations to Solomonoff prediction do not always converge.

翻訳日:2022-06-16 09:47:40 公開日:2022-06-13

# (参考訳) 最悪クラス性能のためのロバスト蒸留

Robust Distillation for Worst-class Performance ( http://arxiv.org/abs/2206.06479v1 )

ライセンス: CC BY 4.0

Serena Wang and Harikrishna Narasimhan and Yichen Zhou and Sara Hooker and Michal Lukasik and Aditya Krishna Menon

(参考訳) 知識蒸留は教師モデルからの予測を用いた生徒モデルの性能向上に有効な手法であることが証明されている。しかし、最近の研究では、平均効率の利得はデータのサブグループ間で均一ではなく、特に稀なサブグループやクラスにおいて精度の犠牲となることが示されている。長期分布に追随する可能性のあるクラス間での強い性能を維持するため,学生の最悪のクラスパフォーマンスを改善するために調整された蒸留技術を開発した。具体的には、教師と生徒の異なる組み合わせで頑健な最適化目標を導入し、さらに、全体的な精度と頑健な最悪の目標とのトレードオフを伴うトレーニングを可能にする。実験結果から, 我々のロバスト蒸留技術は, より良い最低級性能を達成するだけでなく, 総合的性能と最低級性能のトレードオフを他の基準法と比較し, パレート的に改善することを示した。理論的には、ロバストな学生の教育を目標とするときに、良い教師になるものについての洞察を提供する。

Knowledge distillation has proven to be an effective technique in improving the performance of a student model using predictions from a teacher model. However, recent work has shown that gains in average efficiency are not uniform across subgroups in the data, and in particular can often come at the cost of accuracy on rare subgroups and classes. To preserve strong performance across classes that may follow a long-tailed distribution, we develop distillation techniques that are tailored to improve the student's worst-class performance. Specifically, we introduce robust optimization objectives in different combinations for the teacher and student, and further allow for training with any tradeoff between the overall accuracy and the robust worst-class objective. We show empirically that our robust distillation techniques not only achieve better worst-class performance, but also lead to Pareto improvement in the tradeoff between overall performance and worst-class performance compared to other baseline methods. Theoretically, we provide insights into what makes a good teacher when the goal is to train a robust student.
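
The gap between overall accuracy and worst-class accuracy is easy to see on a toy example. The sketch below only illustrates the evaluation metric; it does not implement the robust distillation objectives, and the labels are made up for illustration.

```python
import numpy as np

def per_class_accuracy(y_true, y_pred, num_classes):
    """Accuracy computed separately for each class (NaN if a class is absent)."""
    acc = np.full(num_classes, np.nan)
    for c in range(num_classes):
        mask = y_true == c
        if mask.any():
            acc[c] = (y_pred[mask] == c).mean()
    return acc

# Class 0 is frequent, classes 1 and 2 are rare (a small long-tailed example).
y_true = np.array([0, 0, 0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 0, 0, 1, 0, 2, 1])

acc = per_class_accuracy(y_true, y_pred, num_classes=3)
overall = (y_true == y_pred).mean()   # dominated by the frequent class
worst = np.nanmin(acc)                # the quantity a worst-class objective targets
```

Here the overall accuracy is 0.75 while the worst-class accuracy is only 0.5, because the frequent class masks the errors on the rare ones.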

翻訳日:2022-06-16 09:32:24 公開日:2022-06-13

# (参考訳) RigNeRF:完全制御可能なニューラル3Dポートレイト

RigNeRF: Fully Controllable Neural 3D Portraits ( http://arxiv.org/abs/2206.06481v1 )

ライセンス: CC BY 4.0

ShahRukh Athar, Zexiang Xu, Kalyan Sunkavalli, Eli Shechtman and Zhixin Shu

(参考訳) ニューラルレイディアンス場(NeRF)のような体積的ニューラルレンダリング法は、フォトリアリスティックな新規なビュー合成を可能にしている。しかし、標準的な形式では、NeRFはシーン内の人間の頭のようなオブジェクトの編集をサポートしない。本研究では,単に新しい視点合成ではなく,単一のポートレートビデオから学習した頭部のポーズと表情の完全な制御を可能にするシステムである rignerf を提案する。 3d morphable face model (3dmm) によって誘導される変形場を用いて, 頭部の姿勢と表情の変化をモデル化する。 3DMMは、3DMM変形の残差のみを予測することを学習し、入力シーケンスに存在しない新しい(厳密な)ポーズや(厳密でない)表現を描画するRigNeRFの先行として効果的に機能する。スマートフォンで撮影した訓練対象のショートビデオのみを用いて,明快な頭部ポーズと表情制御を備えたポートレートシーンのフリービュー合成における本手法の有効性を実証した。プロジェクトページは以下の通りである。

Volumetric neural rendering methods, such as neural radiance fields (NeRFs), have enabled photo-realistic novel view synthesis. However, in their standard form, NeRFs do not support the editing of objects, such as a human head, within a scene. In this work, we propose RigNeRF, a system that goes beyond just novel view synthesis and enables full control of head pose and facial expressions learned from a single portrait video. We model changes in head pose and facial expressions using a deformation field that is guided by a 3D morphable face model (3DMM). The 3DMM effectively acts as a prior for RigNeRF that learns to predict only residuals to the 3DMM deformations and allows us to render novel (rigid) poses and (non-rigid) expressions that were not present in the input sequence. Using only a smartphone-captured short video of a subject for training, we demonstrate the effectiveness of our method on free view synthesis of a portrait scene with explicit head pose and expression controls. The project page can be found here: http://shahrukhathar.github.io/2022/06/06/RigNeRF.html

翻訳日:2022-06-16 09:31:18 公開日:2022-06-13

# (参考訳) ニューラルデータ拡張と機械学習モデルによるfMRIへのfNIRSマッピング

Mapping fNIRS to fMRI with Neural Data Augmentation and Machine Learning Models ( http://arxiv.org/abs/2206.06486v1 )

ライセンス: CC BY 4.0

Jihyun Hur, Jaeyeong Yang, Hoyoung Doh, Woo-Young Ahn

(参考訳) 神経画像技術の進歩は、人間の心がどのように機能するかを理解する新しい洞察を与えました。機能的磁気共鳴イメージング(fMRI)は最も広く用いられている神経イメージング技術であり、個人差のfMRIベースのマーカーへの関心が高まっている。しかし、その効用は高いコストと子供や幼児を含む特定の人口からの獲得が困難であるために制限されることが多い。 fMRIマーカーのサーロゲートマーカーまたはニューラル相関は、重要な実用的意味を持つが、fMRIマーカーに対するスタンドアローン予測器はほとんどない。そこで我々は、機械学習モデルとデータ拡張を用いて、能動近赤外分光法(fNIRS)の多変量パターンから、人間の認識のfMRIマーカーを精度良く予測した。全2回の訪問において,fNIRS,fMRI,fNIRS,fMRIの2つの認知タスク(ストップ信号タスクと確率的逆転学習タスク)を施行した50名の被験者を募集した。 MLモデルとデータ拡張を用いて、前頭前皮質の48チャンネルのfNIRS活性化による応答抑制や予測誤り信号のfMRIマーカーの確立を予測できる。これらの結果から、fNIRSはfMRI活性化の補助的マーカーとなり、幼児を含む様々な集団の理解を広げる可能性が示唆された。

Advances in neuroimaging techniques have provided us with novel insights into understanding how the human mind works. Functional magnetic resonance imaging (fMRI) is the most popular and widely used neuroimaging technique, and there is growing interest in fMRI-based markers of individual differences. However, its utility is often limited due to its high cost and the difficulty of acquiring data from specific populations, including children and infants. Surrogate markers, or neural correlates of fMRI markers, would have important practical implications, but we have few stand-alone predictors for the fMRI markers. Here, using machine learning (ML) models and data augmentation, we predicted well-validated fMRI markers of human cognition from multivariate patterns of functional near-infrared spectroscopy (fNIRS), a portable and relatively inexpensive optical neuroimaging technique. We recruited 50 human participants who performed two cognitive tasks (stop signal task and probabilistic reversal learning task), while neural activation was measured with either fNIRS or fMRI at each of two visits. Using ML models and data augmentation, we could predict the well-established fMRI markers of response inhibition or prediction error signals from 48-channel fNIRS activation in the prefrontal cortex. These results suggest that fNIRS might offer a surrogate marker of fMRI activation, which would broaden our understanding of various populations, including infants.

翻訳日:2022-06-16 09:30:18 公開日:2022-06-13

# (参考訳) 未ラベル画像からのタスク非依存なゲーム状態表現の学習

Learning Task-Independent Game State Representations from Unlabeled Images ( http://arxiv.org/abs/2206.06490v1 )

ライセンス: CC BY 4.0

Chintan Trivedi, Konstantinos Makantasis, Antonios Liapis, Georgios N. Yannakakis

(参考訳) 自己教師付き学習(SSL)技術は、高次元複素データからコンパクトで情報的な表現を学習するために広く用いられている。画像分類などの多くのコンピュータビジョンタスクにおいて、このような手法は教師付き学習手法を超える最先端の結果が得られる。本稿では,ゲームの状態表現を正確に学習するタスクにおいて,SSL手法をどの程度活用できるかを検討する。そこで本研究では,VizDoom,CARLAレーシングシミュレータ,Google Research Football Environmentという3つの異なる3Dゲームから,ゲーム映像フレームとそれに対応するゲームの内部状態を収集する。画像エンコーダを生フレームのみを使用して3つの広く使用されているsslアルゴリズムでトレーニングし,学習した表現から内部状態変数を復元する。その結果,ImageNetなどのトレーニング済みベースラインモデルと比較して,SSL表現とゲーム内部状態の相関が著しく高いことがわかった。このような発見は、SSLベースのビジュアルエンコーダは、特定のタスクに合わせたものではなく、一般的な -- ゲームピクセル情報のみから情報的なゲーム表現が得られることを示唆している。このような表現は、ゲームプレイング、コンテンツ生成、プレイヤーモデリングなど、ゲームにおける下流学習タスクのパフォーマンスを高める基盤を形成することができる。

Self-supervised learning (SSL) techniques have been widely used to learn compact and informative representations from high-dimensional complex data. In many computer vision tasks, such as image classification, such methods achieve state-of-the-art results that surpass supervised learning approaches. In this paper, we investigate whether SSL methods can be leveraged for the task of learning accurate state representations of games, and if so, to what extent. For this purpose, we collect game footage frames and corresponding sequences of games' internal state from three different 3D games: VizDoom, the CARLA racing simulator and the Google Research Football Environment. We train an image encoder with three widely used SSL algorithms using solely the raw frames, and then attempt to recover the internal state variables from the learned representations. Our results across all three games showcase significantly higher correlation between SSL representations and the game's internal state compared to pre-trained baseline models such as ImageNet. Such findings suggest that SSL-based visual encoders can yield general -- not tailored to a specific task -- yet informative game representations solely from game pixel information. Such representations can, in turn, form the basis for boosting the performance of downstream learning tasks in games, including gameplaying, content generation and player modeling.

翻訳日:2022-06-16 09:24:02 公開日:2022-06-13

# (参考訳) 粒子群最適化による高g最適精密設計の高速計算

Fast Computation of Highly G-optimal Exact Designs via Particle Swarm Optimization ( http://arxiv.org/abs/2206.06498v1 )

ライセンス: CC BY 4.0

Stephen J. Walsh and John J. Borkowski

(参考訳) 応答曲面モデルに対する厳密な$G$-optimal設計の計算は難しい問題であり、過去20年間のアルゴリズム開発によって漸進的に改善されてきた。これらの最適設計は、計算の困難さとコストのために、応用では広く検討されてこなかった。厳密な$G$-optimal設計を構成する主要なアルゴリズムとして、文献では座標交換(CEXCH)、遺伝的アルゴリズム(GA)、および大きな計算コストに対処するために開発された比較的新しい$I_\lambda$-optimality経由の$G$-optimalアルゴリズム($G(I_\lambda)$-CEXCH)の3つが提示されている。粒子群最適化(PSO)は多くの応用で広く使われているが、その広範な成功にもかかわらず、最適設計問題への応用は比較的少ない。本稿では、PSOを最適設計問題に適用するための拡張を開発する。次に、PSOを用いて、工業実験で一般的な規模である$K = 1, 2, 3, 4, 5$個の設計因子を含むいくつかのシナリオの最適設計を生成する。これらの結果を、過去20年間の文献で公表されたすべての$G$-optimal設計と比較する。GAが生成した$K=1, 2, 3$因子の公表済み$G$-optimal設計は、14年間更新されていなかった。PSOはこれらのシナリオで改善された$G$-optimal設計を発見し、しかも最先端のアルゴリズムである$G(I_\lambda)$-CEXCHと同等の計算コストでこれを達成することを示す。さらに、PSOは$K=4, 5$因子に対して、現在知られているものと同等以上の$G$-optimal設計を生成できることを示す。これらの結果は、高度に$G$-optimalな設計を効率的に生成するうえで、PSOが既存の手法より優れていることを示唆している。

Computing proposed exact $G$-optimal designs for response surface models is a difficult computation that has received incremental improvements via algorithm development in the last two decades. These optimal designs have not been considered widely in applications in part due to the difficulty and cost involved with computing them. Three primary algorithms for constructing exact $G$-optimal designs are presented in the literature: the coordinate exchange (CEXCH), a genetic algorithm (GA), and the relatively new $G$-optimal via $I_\lambda$-optimality algorithm ($G(I_\lambda)$-CEXCH) which was developed in part to address large computational cost. Particle swarm optimization (PSO) has achieved widespread use in many applications, but to date, its broad-scale success notwithstanding, has seen relatively few applications in optimal design problems. In this paper we develop an extension of PSO to adapt it to the optimal design problem. We then employ PSO to generate optimal designs for several scenarios covering $K = 1, 2, 3, 4, 5$ design factors, which are common experimental sizes in industrial experiments. We compare these results to all $G$-optimal designs published in the last two decades of literature. Published $G$-optimal designs generated by GA for $K=1, 2, 3$ factors have stood unchallenged for 14 years. We demonstrate that PSO has found improved $G$-optimal designs for these scenarios, and it does this with comparable computational cost to the state-of-the-art algorithm $G(I_\lambda)$-CEXCH. Further, we show that PSO is able to produce equal or better $G$-optimal designs for $K= 4, 5$ factors than those currently known. These results suggest that PSO is superior to existing approaches for efficiently generating highly $G$-optimal designs.
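
For readers unfamiliar with PSO, a minimal global-best variant is sketched below. It is a generic textbook form, not the extension developed in the paper, and it minimizes a simple test function rather than a $G$-optimality criterion. The function name, the parameter values, and the box-constraint handling by clipping are all assumptions made for illustration.

```python
import numpy as np

def pso_minimize(objective, dim, bounds, n_particles=30, n_iters=200,
                 inertia=0.7, cognitive=1.5, social=1.5, seed=0):
    """Generic global-best particle swarm optimization over a box."""
    rng = np.random.default_rng(seed)
    low, high = bounds
    pos = rng.uniform(low, high, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    best_pos = pos.copy()                               # personal bests
    best_val = np.array([objective(p) for p in pos])
    g = best_pos[best_val.argmin()].copy()              # swarm best
    g_val = best_val.min()
    for _ in range(n_iters):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        # Velocity mixes momentum, pull to personal best, pull to swarm best.
        vel = inertia * vel + cognitive * r1 * (best_pos - pos) + social * r2 * (g - pos)
        pos = np.clip(pos + vel, low, high)             # keep particles inside the box
        vals = np.array([objective(p) for p in pos])
        improved = vals < best_val
        best_pos[improved] = pos[improved]
        best_val[improved] = vals[improved]
        if best_val.min() < g_val:
            g = best_pos[best_val.argmin()].copy()
            g_val = best_val.min()
    return g, g_val

# Toy objective: the sphere function, minimized at the origin.
best_x, best_f = pso_minimize(lambda x: float(np.sum(x ** 2)), dim=3, bounds=(-1.0, 1.0))
```

To apply this to design construction, each particle would encode a candidate design matrix and the objective would score that design; those details are specific to the paper and are not reproduced here.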

翻訳日:2022-06-16 09:08:12 公開日:2022-06-13

# (参考訳) 量子化アウェアトレーニングにおける最適クリッピング法とマグニチュードアウェア微分法

Optimal Clipping and Magnitude-aware Differentiation for Improved Quantization-aware Training ( http://arxiv.org/abs/2206.06501v1 )

ライセンス: CC BY 4.0

Charbel Sakr, Steve Dai, Rangharajan Venkatesan, Brian Zimmer, William J. Dally, Brucek Khailany

(参考訳) データクリッピングは、量子化操作におけるノイズの低減と量子化対応トレーニング(QAT)の達成可能な精度の向上に不可欠である。現在のプラクティスは、クリッピング閾値スカラーを設定するためのヒューリスティックスに依存しており、最適であることを示すことはできない。我々は,MSE最適クリッピングスカラーを決定する再帰アルゴリズムであるOptimally Clipped Tensors And Vectors (OCTAV)を提案する。高速Newton-Raphson法から派生したOCTAVは、QATルーチンの各イテレーションにおいて、テンソル毎に、フライ時に最適なクリッピングスカラーを見つける。したがって、QATアルゴリズムは各ステップで証明可能な最小量子化ノイズで定式化される。さらに, qatにおける一般的な勾配推定手法の限界を明らかにし, 精度向上のための修正としてマグニチュードアウェア微分を提案する。実験的に、OCTAV対応QATは複数のタスクで最先端の精度を達成する。その中には、ImageNet上のResNetsとMobileNetsのトレーニングとリトレーニング、BERTモデルを使用したSquadの微調整が含まれる。本研究では,量子化操作を適宜挿入する場合を除いて,ベースラインのトレーニングレシピの変更は不要である。

Data clipping is crucial in reducing noise in quantization operations and improving the achievable accuracy of quantization-aware training (QAT). Current practices rely on heuristics to set clipping threshold scalars and cannot be shown to be optimal. We propose Optimally Clipped Tensors And Vectors (OCTAV), a recursive algorithm to determine MSE-optimal clipping scalars. Derived from the fast Newton-Raphson method, OCTAV finds optimal clipping scalars on the fly, for every tensor, at every iteration of the QAT routine. Thus, the QAT algorithm is formulated with provably minimum quantization noise at each step. In addition, we reveal limitations in common gradient estimation techniques in QAT and propose magnitude-aware differentiation as a remedy to further improve accuracy. Experimentally, OCTAV-enabled QAT achieves state-of-the-art accuracy on multiple tasks. These include training-from-scratch and retraining ResNets and MobileNets on ImageNet, and SQuAD fine-tuning using BERT models, where OCTAV-enabled QAT consistently preserves accuracy at low precision (4-to-6-bits). Our results require no modifications to the baseline training recipe, except for the insertion of quantization operations where appropriate.
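
The idea of a recursively computed MSE-optimal clipping scalar can be sketched as follows. The noise model here is an assumption made for illustration: values inside the clipping range incur uniform quantization noise of variance $s^2 \cdot 4^{-B}/3$, and values outside incur a squared clipping error $(|x| - s)^2$. Setting the derivative of that objective to zero gives the fixed-point update used below; it should not be read as the exact recursion from the paper.

```python
import numpy as np

def quantization_mse(x, s, bits):
    """Assumed noise model: uniform step noise inside [-s, s], clipping error outside."""
    mag = np.abs(x)
    inside = mag <= s
    step_noise = (4.0 ** -bits) / 3.0 * s ** 2 * inside.sum()
    clip_noise = ((mag[~inside] - s) ** 2).sum()
    return (step_noise + clip_noise) / x.size

def optimal_clip(x, bits, n_iters=30):
    """Newton-style fixed-point iteration for the clipping scalar s."""
    mag = np.abs(x)
    s = mag.mean()                                   # simple starting guess
    for _ in range(n_iters):
        clipped = mag > s
        numerator = mag[clipped].sum()
        denominator = (4.0 ** -bits) / 3.0 * (~clipped).sum() + clipped.sum()
        s = numerator / denominator
    return s

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)                          # stand-in for a weight tensor
s_opt = optimal_clip(x, bits=4)
mse_opt = quantization_mse(x, s_opt, bits=4)
mse_no_clip = quantization_mse(x, np.abs(x).max(), bits=4)
```

On this Gaussian example the iteration settles on a threshold well below the largest magnitude, trading a small clipping error on the tails for a finer quantization step everywhere else.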

翻訳日:2022-06-16 08:45:11 公開日:2022-06-13

# (参考訳) 深部画像に基づくポーズ推定器を用いたスマートベッドの圧力データからのポーズ推定

Estimating Pose from Pressure Data for Smart Beds with Deep Image-based Pose Estimators ( http://arxiv.org/abs/2206.06518v1 )

ライセンス: CC BY 4.0

Vandad Davoodnia, Saeed Ghorbani, Ali Etemad

(参考訳) ベッド内ポーズ推定は、病院の患者モニタリング、睡眠研究、スマートホームなどの分野での価値を示している。本稿では,既存のポーズ推定器を用いて,高度にあいまいな圧力データから身体のポーズを検出するための異なる戦略について検討する。プレトレーニングされたポーズ推定器の性能を, 2つの圧力データセット上で直接あるいは再トレーニングすることによって検証する。また,共通目的ポーズ推定モジュールの期待入力空間に近い表現にあいまいな圧力マップを変換する学習可能な前処理領域適応ステップを利用した他の戦略についても検討する。そこで我々は,複数スケールの完全畳み込みネットワークを用いて,プレトレーニングされたポーズ推定モジュールに圧力マップのポーズ特化特性を提供する。提案手法の完全解析により,学習可能な事前処理モジュールと既存の画像ベースのポーズ推定器を併用することにより,高度にあいまいな圧力点などの問題を克服し,ポーズ推定精度を極めて高めることができた。

In-bed pose estimation has shown value in fields such as hospital patient monitoring, sleep studies, and smart homes. In this paper, we explore different strategies for detecting body pose from highly ambiguous pressure data, with the aid of pre-existing pose estimators. We examine the performance of pre-trained pose estimators by using them either directly or by re-training them on two pressure datasets. We also explore other strategies utilizing a learnable pre-processing domain adaptation step, which transforms the vague pressure maps to a representation closer to the expected input space of common purpose pose estimation modules. Accordingly, we used a fully convolutional network with multiple scales to provide the pose-specific characteristics of the pressure maps to the pre-trained pose estimation module. Our complete analysis of different approaches shows that combining a learnable pre-processing module with re-training of pre-existing image-based pose estimators on the pressure data overcomes issues such as highly vague pressure points and achieves very high pose estimation accuracy.

翻訳日:2022-06-16 08:15:20 公開日:2022-06-13

# (参考訳) 大規模メモリベースモデル編集

Memory-Based Model Editing at Scale ( http://arxiv.org/abs/2206.06520v1 )

ライセンス: CC BY 4.0

Eric Mitchell, Charles Lin, Antoine Bosselut, Christopher D. Manning, Chelsea Finn

(参考訳) 最大のニューラルネットワークでさえエラーを起こし、世界が変化すれば、かつて正しかった予測が無効になる可能性がある。モデルエディタはベース(事前学習済み)モデルの振る舞いを局所的に更新し、更新された知識を注入したり、望ましくない振る舞いを修正する。既存のモデルエディタは、将来性を示しているが、表現力に乏しい: 編集の意図したスコープ(編集によって影響を受ける例)を正確にモデル化するのに苦労し、編集にゆるやかに関係しているテスト入力の予測が不正確になり、多くの編集の後完全に失敗することが多い。高容量の代替手法として、編集を明示的なメモリに格納し、それらについて推論することを学習して必要に応じてベースモデルの予測を調整する、SERAC(Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model)を提案する。モデルエディタの厳密な評価を可能にするために、質問応答、ファクトチェック、対話生成に基づく3つの難解な言語モデル編集問題を導入する。SERACだけが3つの問題すべてに対して高い性能を達成し、モデル編集に対する既存のアプローチを一貫して大きく上回っていることがわかった。コード、データ、および追加のプロジェクト情報は https://sites.google.com/view/serac-editing で公開される予定である。

Even the largest neural networks make errors, and once-correct predictions can become invalid as the world changes. Model editors make local updates to the behavior of base (pre-trained) models to inject updated knowledge or correct undesirable behaviors. Existing model editors have shown promise, but also suffer from insufficient expressiveness: they struggle to accurately model an edit's intended scope (examples affected by the edit), leading to inaccurate predictions for test inputs loosely related to the edit, and they often fail altogether after many edits. As a higher-capacity alternative, we propose Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model (SERAC), which stores edits in an explicit memory and learns to reason over them to modulate the base model's predictions as needed. To enable more rigorous evaluation of model editors, we introduce three challenging language model editing problems based on question answering, fact-checking, and dialogue generation. We find that only SERAC achieves high performance on all three problems, consistently outperforming existing approaches to model editing by a significant margin. Code, data, and additional project information will be made available at https://sites.google.com/view/serac-editing.

翻訳日:2022-06-16 07:58:36 公開日:2022-06-13

# 特殊相対性理論と決定不能性からの相対的チャーチ・チューリング・ドイッチュのテーゼ

A Relative Church-Turing-Deutsch Thesis from Special Relativity and Undecidability ( http://arxiv.org/abs/2206.06419v1 )

ライセンス: Link先を確認

Blake Wilson, Ethan Dickey, Vaishnavi Iyer and Sabre Kais

(参考訳) 1950年のチューリングの研究から、人工知能はチューリングマシンによって意識をシミュレートできることを提案した。これは、宇宙がコンピュータ上のシミュレーションである全てのことの潜在的な理論であり、シミュレーションの中に私たちが存在することを証明できるかどうかという疑問を提起する。本研究では,計算可能な \textit{local} マシンを,古典的チューリングマシンである \textit{global} でシミュレートした相対計算モデルを構築する。本研究では,Halting問題と同じ意味でグローバルシミュレータのローカル・マシン・コンピューティング \textbf{simulation properties} の問題は決定不能であることを示す。次に,グローバルシミュレータが蓄積した時間,空間,エラーの計算はシミュレーション特性であり,決定不能であることを示す。これらのシミュレーション特性は、我々が宇宙で経験したのと同じ定数時間局所計算複雑性を持つローカルマシンの量子力学を大域的チューリングマシンが計算する相対的チャーチ・チューリング・ドイッチュ理論を構築するために使用する相対モデルにおいて、特別な相対論的効果をもたらす。

Beginning with Turing's seminal work in 1950, artificial intelligence proposes that consciousness can be simulated by a Turing machine. This implies a potential theory of everything where the universe is a simulation on a computer, which raises the question of whether we can prove we exist in a simulation. In this work, we construct a relative model of computation where a computable \textit{local} machine is simulated by a \textit{global}, classical Turing machine. We show that the problem of the local machine computing \textbf{simulation properties} of its global simulator is undecidable in the same sense as the Halting problem. Then, we show that the time, space, or error accumulated by the global simulator are simulation properties, and computing them is therefore undecidable. These simulation properties give rise to special relativistic effects in the relative model, which we use to construct a relative Church-Turing-Deutsch thesis where a global, classical Turing machine computes quantum mechanics for a local machine with the same constant-time local computational complexity as experienced in our universe.

翻訳日:2022-06-15 15:37:58 公開日:2022-06-13

# 多重代入法の比較評価のための方法論的枠組み: 米国National COVID Cohort Collaborativeにおける人種・民族・BMIの多重代入

A Methodological Framework for the Comparative Evaluation of Multiple Imputation Methods: Multiple Imputation of Race, Ethnicity and Body Mass Index in the U.S. National COVID Cohort Collaborative ( http://arxiv.org/abs/2206.06444v1 )

ライセンス: Link先を確認

Elena Casiraghi, Rachel Wong, Margaret Hall, Ben Coleman, Marco Notaro, Michael D. Evans, Jena S. Tronieri, Hannah Blau, Bryan Laraway, Tiffany J. Callahan, Lauren E. Chan, Carolyn T. Bramante, John B. Buse, Richard A. Moffitt, Til Sturmer, Steven G. Johnson, Yu Raymond Shao, Justin Reese, Peter N. Robinson, Alberto Paccanaro, Giorgio Valentini, Jared D. Huling and Kenneth Wilkins (on behalf of the N3C Consortium): Tell Bennet, Christopher Chute, Peter DeWitt, Kenneth Gersing, Andrew Girvin, Melissa Haendel, Jeremy Harper, Janos Hajagos, Stephanie Hong, Emily Pfaff, Jane Reusch, Corneliu Antoniescu, Kimberly Robaski

(参考訳) 電子健康記録は、バイオメディカル研究のための豊富なデータソースであるが、これらのシステムは、医療設定全体にわたって均一に実装されておらず、医療の断片化とサイロ化された電子健康記録間の相互運用性の欠如により、重要なデータが欠落している可能性がある。欠失データによる症例の削除がその後の分析に深刻なバイアスをもたらす可能性があることを考えると、いくつかの著者は欠失情報を回復するために複数の計算方法を適用することを好む。残念なことに、いくつかの文献は、現在研究に自由に利用できる異なる複数のインプテーションアルゴリズムを使用して有望な結果を文書化しているが、どのmiアルゴリズムが最もうまく機能するかについてのコンセンサスはない。 MI戦略の選択以外にも、計算アルゴリズムとそのアプリケーション設定の選択は決定的かつ困難である。本稿では,Rubin と van Buuren の独創的な研究に触発されて,複数の計算手法の評価と比較に応用できる方法論的枠組みを提案する。本研究の枠組みはより大規模なコホート(コホート)の検証・拡張に応用され,本研究は,全国コホート協力機関が提供した2型糖尿病患者における重要患者の記述者の影響と重症度について検討した。

While electronic health records are a rich data source for biomedical research, these systems are not implemented uniformly across healthcare settings and significant data may be missing due to healthcare fragmentation and lack of interoperability between siloed electronic health records. Considering that the deletion of cases with missing data may introduce severe bias in the subsequent analysis, several authors prefer applying a multiple imputation (MI) strategy to recover the missing information. Unfortunately, although several published studies have documented promising results using one or another of the multiple imputation algorithms that are now freely available for research, there is no consensus on which MI algorithm works best. Besides the choice of the MI strategy, the choice of the imputation algorithm and its application settings are also both crucial and challenging. In this paper, inspired by the seminal works of Rubin and van Buuren, we propose a methodological framework that may be applied to evaluate and compare several multiple imputation techniques, with the aim to choose the most valid for computing inferences in a clinical research work. Our framework has been applied to validate, and extend on a larger cohort, the results we presented in a previous study, where we evaluated the influence of crucial patients' descriptors and COVID-19 severity in patients with type 2 diabetes mellitus whose data is provided by the National COVID Cohort Collaborative Enclave.
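
Inference after multiple imputation usually pools the per-dataset results with Rubin's rules. The sketch below shows that pooling step for a single scalar estimate; it is a standard textbook formula, not the evaluation framework proposed in the paper, and the input numbers are made up for illustration.

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Pool m point estimates and their variances with Rubin's rules."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    pooled = estimates.mean()                 # pooled point estimate
    within = variances.mean()                 # average within-imputation variance
    between = estimates.var(ddof=1)           # between-imputation variance
    total = within + (1 + 1 / m) * between    # total variance of the pooled estimate
    return pooled, total

# One regression coefficient estimated on m = 3 imputed datasets (illustrative values).
pooled, total = pool_rubin([1.0, 1.2, 0.8], [0.04, 0.05, 0.03])
```

The between-imputation term is what carries the extra uncertainty due to the missing data; an imputation method that understates it will produce confidence intervals that are too narrow.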

翻訳日:2022-06-15 15:37:40 公開日:2022-06-13

# 視・放射・学習:ラジオ・視覚対応による自己教師あり局所化

Look, Radiate, and Learn: Self-supervised Localisation via Radio-Visual Correspondence ( http://arxiv.org/abs/2206.06424v1 )

ライセンス: Link先を確認

Mohammed Alloulah, Maximilian Arnold

(参考訳) 次世代の携帯電話ネットワークは、無線センシング機能と慣用通信を実装して、前例のない世界規模の無線センシングを屋外で実現する。ディープラーニングはコンピュータビジョンに革命をもたらしたが、電波センシングの性能と将来性を研究するための体系的なデータセットやベンチマークが欠如していることから、電波知覚タスクに限定された応用がなされている。このギャップに対処するために、我々は、無線の正確なターゲットローカライゼーションを容易にする合成無線視覚データセットとベンチマークであるMaxRayを提示する。さらに,無線と視覚の対応から自己コーディネートを抽出することで,ラジオにおける目標のローカライズを学ぶことを提案する。無線ローカライザネットワークのトレーニングには,このような自己監督座標を用いる。我々は、多くの最先端のベースラインに対して、パフォーマンスを特徴付ける。以上の結果から,ラベルのない無線視線データから,正確な無線目標位置推定を自動学習できることが示唆された。これにより、膨大なデータスケーラビリティの扉が開かれ、統一された認識通信セルインフラストラクチャ上で堅牢な無線センシングを実現するための鍵が証明される。 DatasetはIEEE DataPortでホストされる。

Next generation cellular networks will implement radio sensing functions alongside customary communications, thereby enabling unprecedented worldwide sensing coverage outdoors. Deep learning has revolutionised computer vision but has had limited application to radio perception tasks, in part due to lack of systematic datasets and benchmarks dedicated to the study of the performance and promise of radio sensing. To address this gap, we present MaxRay: a synthetic radio-visual dataset and benchmark that facilitate precise target localisation in radio. We further propose to learn to localise targets in radio without supervision by extracting self-coordinates from radio-visual correspondence. We use such self-supervised coordinates to train a radio localiser network. We characterise our performance against a number of state-of-the-art baselines. Our results indicate that accurate radio target localisation can be automatically learned from paired radio-visual data without labels, which is highly relevant to empirical data. This opens the door for vast data scalability and may prove key to realising the promise of robust radio sensing atop a unified perception-communication cellular infrastructure. Dataset will be hosted on IEEE DataPort.

翻訳日:2022-06-15 15:37:15 公開日:2022-06-13

# 画像共起バイアスによる因果効果の推定 : アフリカの貧困への適用

Estimating Causal Effects Under Image Confounding Bias with an Application to Poverty in Africa ( http://arxiv.org/abs/2206.06410v1 )

ライセンス: Link先を確認

Connor T. Jerzak, Fredrik Johansson, Adel Daoud

(参考訳) 因果効果の観察的研究は、結合因子の調整を必要とする。これらの因子が明確に定義され、個別の確率変数である表の設定では、共起の効果がよく理解されている。しかし、公共政策、生態学、医学では、画像で検出されたパターンや物体(地図、衛星、トモグラフィー画像など)に通知される非タブラルな設定で決定されることが多い。このようなイメージを因果推論に使用すると、画像内のオブジェクトが関心のある治療や結果に関連がある可能性があるため、機会が得られる。このような場合、コンバウンディングの調整には画像に依存するが、観測されたデータは、重要なオブジェクトの存在を直接ラベル付けしない。現実世界のアプリケーションによって動機づけられ、この課題、どのように処理できるか、因果効果を識別し見積もるのに十分な条件を定式化します。シミュレーション実験を用いて有限サンプル性能を解析し、機械学習モデルを用いて画像の共起を推定する確率調整アルゴリズムを用いて効果を推定する。また,画像パターン機構の誤特定に対する感度についても検討した。最後に,我々の手法を用いて,衛星画像からアフリカの貧困に対する政策介入の影響を推定する。

Observational studies of causal effects require adjustment for confounding factors. In the tabular setting, where these factors are well-defined, separate random variables, the effect of confounding is well understood. However, in public policy, ecology, and in medicine, decisions are often made in non-tabular settings, informed by patterns or objects detected in images (e.g., maps, satellite or tomography imagery). Using such imagery for causal inference presents an opportunity because objects in the image may be related to the treatment and outcome of interest. In these cases, we rely on the images to adjust for confounding but observed data do not directly label the existence of the important objects. Motivated by real-world applications, we formalize this challenge, how it can be handled, and what conditions are sufficient to identify and estimate causal effects. We analyze finite-sample performance using simulation experiments, estimating effects using a propensity adjustment algorithm that employs a machine learning model to estimate the image confounding. Our experiments also examine sensitivity to misspecification of the image pattern mechanism. Finally, we use our methodology to estimate the effects of policy interventions on poverty in African communities from satellite imagery.
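
The role of propensity adjustment can be illustrated on simulated tabular data. In the sketch below the propensity score is assumed to be known, whereas the paper estimates the confounding from images with a machine learning model; the simulation, the variable names, and the choice of a normalized inverse-propensity-weighted estimator are all illustrative assumptions.

```python
import numpy as np

def ipw_ate(outcome, treatment, propensity):
    """Normalized inverse-propensity-weighted estimate of the average treatment effect."""
    w_treated = treatment / propensity
    w_control = (1 - treatment) / (1 - propensity)
    mean_treated = np.sum(w_treated * outcome) / np.sum(w_treated)
    mean_control = np.sum(w_control * outcome) / np.sum(w_control)
    return mean_treated - mean_control

rng = np.random.default_rng(0)
n = 20_000
confounder = rng.integers(0, 2, size=n)               # stands in for an object in the image
propensity = np.where(confounder == 1, 0.8, 0.2)      # confounder drives treatment
treatment = (rng.random(n) < propensity).astype(float)
# True treatment effect is 2.0; the confounder also raises the outcome.
outcome = 2.0 * treatment + 3.0 * confounder + rng.normal(size=n)

naive = outcome[treatment == 1].mean() - outcome[treatment == 0].mean()
adjusted = ipw_ate(outcome, treatment, propensity)
```

The naive difference in means is badly inflated by the confounder, while the weighted estimate lands close to the true effect of 2.0.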

翻訳日:2022-06-15 15:01:14 公開日:2022-06-13

# 画像に基づく治療効果の不均一性

Image-based Treatment Effect Heterogeneity ( http://arxiv.org/abs/2206.06417v1 )

ライセンス: Link先を確認

Connor T. Jerzak, Fredrik Johansson, Adel Daoud

(参考訳) ランダム化制御試験(RCTs)は介入の効果を推定するための金の基準と考えられている。最近の研究では、年齢や民族性などの表式変数に対する推定条件付けによって、rctにおける効果の多様性が研究されている。しかし、そのような変数は実験の前後でのみ観測されることが多く、歴史的または地理的な影響の理由を捉えられないことがある。実験単位が特定の場所と関連付けられると、衛星画像はそのような歴史的・地理的情報を提供することができるが、効果の不均一性を記述するためにそれを組み込む方法は存在しない。本稿では,治療効果に対して同一の分布を持つ画像群を,より確率的モデリングフレームワークを用いて推定する手法を開発した。提案手法をシミュレーションおよびウガンダにおける抗貧困介入の効果を推定するために,提案手法を代替案と比較した。平均治療効果(ATE)を回復する際のクラスタモデルの信頼性を確保するために、因果正則化ペナルティを導入する。最後に,画像情報の普及にともなう医学や気候科学など,これらの手法の適用可能性,限界,適用性について論じる。すべてのモデリング戦略のコードをオープンソースソフトウェアパッケージで公開しています。

Randomized controlled trials (RCTs) are considered the gold standard for estimating the effects of interventions. Recent work has studied effect heterogeneity in RCTs by conditioning estimates on tabular variables such as age and ethnicity. However, such variables are often only observed near the time of the experiment and may fail to capture historical or geographical reasons for effect variation. When experiment units are associated with a particular location, satellite imagery can provide such historical and geographical information, yet there is no method which incorporates it for describing effect heterogeneity. In this paper, we develop such a method which estimates, using a deep probabilistic modeling framework, the clusters of images having the same distribution over treatment effects. We compare the proposed methods against alternatives in simulation and in an application to estimating the effects of an anti-poverty intervention in Uganda. A causal regularization penalty is introduced to ensure reliability of the cluster model in recovering Average Treatment Effects (ATEs). Finally, we discuss feasibility, limitations, and the applicability of these methods to other domains, such as medicine and climate science, where image information is prevalent. We make code for all modeling strategies publicly available in an open-source software package.

翻訳日:2022-06-15 15:00:54 公開日:2022-06-13

# SmartGD:グラフ描画のための自己変化型ジェネレータネットワーク

SmartGD: A Self-Challenging Generative Adversarial Network for Graph Drawing ( http://arxiv.org/abs/2206.06434v1 )

ライセンス: Link先を確認

Xiaoqi Wang, Kevin Yen, Yifan Hu and Han-Wei Shen

(参考訳) グラフ描画に関する研究は数多く行われているが、既存の手法の多くはグラフレイアウトの特定の美的側面を最適化することだけに焦点を当てている。グラフが与えられた場合、人間の美的嗜好を満たす良いレイアウトを生成することは、特にそのような嗜好が微分可能な目的関数として表現できない場合、難しい課題である。本稿では,学習者によるGANベースのグラフ描画フレームワークSmartGDを提案する。 SmartGDの学生ネットワークは、良いレイアウトの例を模倣してグラフ描画を学習し、SmartGDの教師ネットワークは、生成されたレイアウトの良さに関する評価を提供する。よいレイアウトを構成するものを特定するための具体的な美的基準がない場合、学生ネットワークは良いレイアウト例から学ぶことができる。一方、定量的基準(差別化不可能でも)でレイアウトの良さを評価できる場合、学生ネットワークは、ターゲットの美学を最適化するための具体的目標として利用することができる。目的を達成するために,GAN の新たな変種である自己整合性 GAN を提案し,審美的基準に対する最適レイアウト分布を,基準が微分可能か否かに関わらず学習する。提案するグラフ描画フレームワークは,優れたレイアウト例と同様のスタイルでグラフを描画できるだけでなく,任意の審美基準に従ってグラフレイアウトを最適化することができる。モデルがトレーニングされると、サンプルレイアウトのスタイルや選択された美的基準に従って任意のグラフを視覚化することができる。総合的な実験研究により、SmartGDは、一般的に合意されている指標に従って12のベンチマークメソッドを上回ります。

A multitude of studies have been conducted on graph drawing, but many existing methods focus only on optimizing particular aesthetic aspects of graph layout. Given a graph, generating a good layout that satisfies certain human aesthetic preferences remains a challenging task, especially if such preferences cannot be expressed as a differentiable objective function. In this paper, we propose a student-teacher GAN-based graph drawing framework, SmartGD, which learns to draw graphs the way humans learn to perform tasks. The student network in SmartGD learns graph drawing by imitating good layout examples, while the teacher network in SmartGD is responsible for providing ratings regarding the goodness of the generated layouts. When there is a lack of concrete aesthetic criteria to specify what constitutes a good layout, the student network can learn from the good layout examples. On the other hand, when the goodness of a layout can be assessed by quantitative criteria (even if not differentiable), the student network can use them as a concrete goal to optimize the target aesthetics. To accomplish the goal, we propose a novel variant of GAN, self-challenging GAN, to learn the optimal layout distribution with respect to any aesthetic criterion, whether the criterion is differentiable or not. The proposed graph drawing framework can not only draw graphs in a style similar to the good layout examples but also optimize the graph layouts according to any given aesthetic criteria when available. Once the model is trained, it can be used to visualize arbitrary graphs according to the style of the example layouts or the chosen aesthetic criteria. Comprehensive experimental studies show that SmartGD outperforms 12 benchmark methods according to commonly agreed metrics.

翻訳日:2022-06-15 15:00:35 公開日:2022-06-13

# 知識発見のための説明可能な混合データ表現とロスレス可視化ツールキット

Explainable Mixed Data Representation and Lossless Visualization Toolkit for Knowledge Discovery ( http://arxiv.org/abs/2206.06476v1 )

ライセンス: Link先を確認

Boris Kovalerchuk, Elijah McCoy

(参考訳) 不均一/混合データのための機械学習(ml)アルゴリズムの開発は長年の課題である。多くのMLアルゴリズムは、数値データや非数値データ、テキスト、グラフなどを含む混合データに適用できず、解釈可能なモデルを生成する。もう1つの長期的な問題は、多次元混合データのロスレス可視化のためのアルゴリズムの開発である。 MLのさらなる進歩は、混合データに対する解釈可能なMLアルゴリズムの成功と多次元データのロスレス解釈可能な可視化に大きく依存している。これにより、エンドユーザによる視覚的知識発見を使用して解釈可能なMLモデルの開発が可能になる。混合データに対する課題は,(1) 数値MLアルゴリズムの非数値属性の数値符号化スキームを生成し,正確かつ解釈可能なMLモデルを提供すること,(2) n-D の非数値データのロスレス可視化のための方法,およびこれらの視覚化における視覚ルールの発見である。本稿では、混合データの種類を分類し、MLの重要性を分析し、混合データを扱うための実験ツールキットを提案する。データ型エディタ、VisCanvasデータ可視化、ルール発見システムを組み合わせたもので、GitHubで公開されている。

Developing Machine Learning (ML) algorithms for heterogeneous/mixed data is a longstanding problem. Many ML algorithms cannot be applied to mixed data, which include numeric and non-numeric data, text, graphs and so on, to generate interpretable models. Another longstanding problem is developing algorithms for lossless visualization of multidimensional mixed data. Further progress in ML heavily depends on the success of interpretable ML algorithms for mixed data and of lossless interpretable visualization of multidimensional data. The latter allows developing interpretable ML models using visual knowledge discovery by end-users, who can bring valuable domain knowledge which is absent in the training data. The challenges for mixed data include: (1) generating numeric coding schemes for non-numeric attributes for numeric ML algorithms to provide accurate and interpretable ML models, (2) generating methods for lossless visualization of n-D non-numeric data and visual rule discovery in these visualizations. This paper presents a classification of mixed data types, analyzes their importance for ML and presents the developed experimental toolkit to deal with mixed data. It combines the Data Types Editor, VisCanvas data visualization and rule discovery system which is available on GitHub.

翻訳日:2022-06-15 15:00:05 公開日:2022-06-13

# 対人ロバスト性向上のための代替手法:摂動スペクトルによる対人訓練の分析

Towards Alternative Techniques for Improving Adversarial Robustness: Analysis of Adversarial Training at a Spectrum of Perturbations ( http://arxiv.org/abs/2206.06496v1 )

ライセンス: Link先を確認

Kaustubh Sridhar, Souradeep Dutta, Ramneet Kaur, James Weimer, Oleg Sokolsky, Insup Lee

(参考訳) 敵のトレーニング(AT)とその変種は、ここ数年で敵の摂動と一般的な腐敗に対するニューラルネットワークの堅牢性を改善する進歩を先導している。 ATのアルゴリズム設計とその変種は、特定の摂動強度$\epsilon$でトレーニングモデルに焦点を当てており、アルゴリズムを改善するために、その$\epsilon$-robustモデルのパフォーマンスからのフィードバックのみを使用する。本研究では、$\epsilon$値のスペクトルに基づいてトレーニングされたモデルに焦点を当てる。モデル性能,中間特徴量精度,畳み込みフィルタ感度の3つの視点を解析する。それぞれにおいて、atに対する別の改善は、1つの$\epsilon$で明らかではありませんでした。具体的には、PGD攻撃の強さが$\delta$の場合、ATモデルが$\epsilon$よりも少し大きくなるが、それ以上の場合、それを最大限に一般化する。そこで我々は,ロバスト性に対する過剰設計を提案し,トレーニングモデルを$\epsilon$で提案する。第二に、ロバスト性は中間的特徴の精度、特に第1層と第2層の後の精度に非常に敏感である(様々な$\epsilon$値全体で)。そこで本研究では,アダプティブアタックの視認精度を向上させるための簡単な量子化手法を提案する。第3に、モデルの各層の畳み込みフィルタを$\epsilon$で解析し、第1層と第2層の畳み込みが入力摂動の増幅にのみ責任があることに気づく。我々は,CIFAR-10およびCIFAR-10-Cデータセット上でResNetおよびWideResNetモデルを用いて実験を行い,本手法を実証した。

Adversarial training (AT) and its variants have spearheaded progress in improving neural network robustness to adversarial perturbations and common corruptions in the last few years. Algorithm design of AT and its variants is focused on training models at a specified perturbation strength $\epsilon$ and only using the feedback from the performance of that $\epsilon$-robust model to improve the algorithm. In this work, we focus on models trained on a spectrum of $\epsilon$ values. We analyze three perspectives: model performance, intermediate feature precision and convolution filter sensitivity. In each, we identify alternative improvements to AT that otherwise wouldn't have been apparent at a single $\epsilon$. Specifically, we find that for a PGD attack at some strength $\delta$, there is an AT model at some slightly larger strength $\epsilon$, but no greater, that generalizes best to it. Hence, we propose overdesigning for robustness where we suggest training models at an $\epsilon$ just above $\delta$. Second, we observe (across various $\epsilon$ values) that robustness is highly sensitive to the precision of intermediate features and particularly those after the first and second layer. Thus, we propose adding a simple quantization to defenses that improves accuracy on seen and unseen adaptive attacks. Third, we analyze convolution filters of each layer of models at increasing $\epsilon$ and notice that those of the first and second layer may be solely responsible for amplifying input perturbations. We present our findings and demonstrate our techniques through experiments with ResNet and WideResNet models on the CIFAR-10 and CIFAR-10-C datasets.

翻訳日:2022-06-15 14:59:47 公開日:2022-06-13

# MetaTPTrans:多言語コード表現学習のためのメタ学習アプローチ

MetaTPTrans: A Meta Learning Approach for Multilingual Code Representation Learning ( http://arxiv.org/abs/2206.06460v1 )

ライセンス: Link先を確認

Weiguo Pian, Hanyu Peng, Xunzhu Tang, Tiezhu Sun, Haoye Tian, Andrew Habib, Jacques Klein, Tegawendé F. Bissyandé

(参考訳) ソースコードの表現学習は、ソフトウェア工学のタスクに機械学習を適用するために不可欠である。異なるプログラミング言語間のコード表現の学習は、複数の言語データセットからのトレーニングデータが、ソースコードから言語に依存しない情報を抽出する能力を改善するため、単一言語データセットから学ぶことよりも効果的であることが示されている。しかし、既存のマルチ言語モデルは、複数の言語データセットでトレーニングする下流タスクにとって重要な言語固有の情報を見落とし、異なる言語間で共有パラメータを学習することだけに焦点を当てている。本稿では,多言語コード表現学習のためのメタ学習手法であるMetaTPTransを提案する。 MetaTPTransは、入力ソースコードスニペットの特定のプログラミング言語に従って、特徴抽出器の異なるパラメータを生成し、モデルが言語に依存しない情報と言語固有の情報の両方を学習できるようにする。実験結果から,MetaTPTransは,言語に依存しないタスクであるコード要約において最先端手法のF1スコアを最大2.40ポイント,言語固有のタスクであるコード補完においてTop-1(Top-5)の予測精度を最大7.32(13.15)ポイント向上させることが示された。

Representation learning of source code is essential for applying machine learning to software engineering tasks. Learning code representation across different programming languages has been shown to be more effective than learning from single-language datasets, since more training data from multi-language datasets improves the model's ability to extract language-agnostic information from source code. However, existing multi-language models overlook the language-specific information that is crucial for downstream tasks when training on multi-language datasets, and focus only on learning shared parameters among the different languages. To address this problem, we propose MetaTPTrans, a meta learning approach for multilingual code representation learning. MetaTPTrans generates different parameters for the feature extractor according to the specific programming language of the input source code snippet, enabling the model to learn both language-agnostic and language-specific information. Experimental results show that MetaTPTrans improves the F1 score of state-of-the-art approaches significantly by up to 2.40 percentage points for code summarization, a language-agnostic task; and the prediction accuracy of Top-1 (Top-5) by up to 7.32 (13.15) percentage points for code completion, a language-specific task.

翻訳日:2022-06-15 14:56:32 公開日:2022-06-13

# 定量ヘイズレベルとグラウンドトゥルースを有する多目的実ハイズベンチマーク

A Multi-purpose Real Haze Benchmark with Quantifiable Haze Levels and Ground Truth ( http://arxiv.org/abs/2206.06427v1 )

ライセンス: Link先を確認

Priya Narayanan, Xin Hu, Zhenyu Wu, Matthew D Thielke, John G Rogers, Andre V Harrison, John A D'Agostino, James D Brown, Long P Quang, James R Uplinger, Heesung Kwon, Zhangyang Wang

(参考訳) 屋外の視覚環境から収集された画像は、濃密な煙やヘイズの存在により劣化することが多い。これらの劣化した視覚環境(DVE)におけるシーン理解の研究における重要な課題は、代表的なベンチマークデータセットの欠如である。これらのデータセットは、最先端のオブジェクト認識や他のコンピュータビジョンアルゴリズムを劣化した設定で評価するために必要である。本稿では,ヘイズあり画像とヘイズなし画像のペア,およびその場でのヘイズ密度測定を備えた最初の実画像ベンチマークデータセットを導入することで,これらの制約の一部に対処する。このデータセットは、現場全体を覆うプロの煙発生装置で制御された環境で作られ、無人航空機(UAV)と無人地上機(UGV)の両方の観点から撮影された画像で構成されている。また,このデータセット上で,代表的な最先端のデヘイズ手法と物体検出器を評価する。 ground truth object classification bounding boxとhaze density measurementを含む、本論文で提示された完全なデータセットは、コミュニティがアルゴリズムを評価できるよう、https://a2i2-archangel.vision で提供されている。このデータセットのサブセットは、CVPR UG2 2022 チャレンジの Haze Track における Object Detection に使用されている。

Imagery collected from outdoor visual environments is often degraded due to the presence of dense smoke or haze. A key challenge for research in scene understanding in these degraded visual environments (DVE) is the lack of representative benchmark datasets. These datasets are required to evaluate state-of-the-art object recognition and other computer vision algorithms in degraded settings. In this paper, we address some of these limitations by introducing the first paired real image benchmark dataset with hazy and haze-free images, and in-situ haze density measurements. This dataset was produced in a controlled environment with professional smoke generating machines that covered the entire scene, and consists of images captured from the perspective of both an unmanned aerial vehicle (UAV) and an unmanned ground vehicle (UGV). We also evaluate a set of representative state-of-the-art dehazing approaches as well as object detectors on the dataset. The full dataset presented in this paper, including the ground truth object classification bounding boxes and haze density measurements, is provided for the community to evaluate their algorithms at: https://a2i2-archangel.vision. A subset of this dataset has been used for the Object Detection in Haze Track of CVPR UG2 2022 challenge.

翻訳日:2022-06-15 14:36:09 公開日:2022-06-13

# 行動認識のイデオロギーを用いたビデオPose3Dの訓練方法

A Training Method For VideoPose3D With Ideology of Action Recognition ( http://arxiv.org/abs/2206.06430v1 )

ライセンス: Link先を確認

Hao Bai

(参考訳) 映像からの行動認識とポーズ推定は人間の動きの理解と密接な関係があるが、より多くの文献では、行動認識から単独でポーズ推定タスクを解決する方法に焦点が当てられている。本研究は,アクション認識に基づくVideoPose3Dのより高速で柔軟なトレーニング手法を示す。このモデルは、推定される型と同じタイプのアクションで供給され、異なるタイプのアクションを別々にトレーニングすることができます。エビデンスによれば、一般的なポーズ推定タスクでは、このモデルはオリジナルの研究と同じような結果を得るために比較的少量のデータを必要としており、アクション指向タスクでは、受容野のサイズが限定され、MPJPEのベロシティエラーのトレーニングエポックが4.5%向上している。このモデルはアクション指向問題と一般的なポーズ推定問題の両方を扱うことができる。

Action recognition and pose estimation from videos are both closely related to understanding human motion, but most of the literature focuses on solving pose estimation tasks separately from action recognition. This research presents a faster and more flexible training method for VideoPose3D that is based on action recognition. The model is fed with the same type of action as the type that will be estimated, and different types of actions can be trained separately. Evidence shows that, for common pose-estimation tasks, this model requires a relatively small amount of data to achieve results similar to the original research, and for action-oriented tasks, it outperforms the original research by 4.5% on Velocity Error of MPJPE with a limited receptive field size and number of training epochs. This model can handle both action-oriented and common pose-estimation problems.

翻訳日:2022-06-15 14:35:50 公開日:2022-06-13

# icpアルゴリズム:理論、実践とスラム指向分類法

ICP Algorithm: Theory, Practice And Its SLAM-oriented Taxonomy ( http://arxiv.org/abs/2206.06435v1 )

ライセンス: Link先を確認

Hao Bai

(参考訳) 反復的最接近点法 (icp) アルゴリズムは三次元表面登録の幾何学的アライメントにおいて最も重要なアルゴリズムの1つであり、同時局在マッピング(slam)タスクを含むコンピュータビジョンタスクでよく用いられる。本稿では, icpアルゴリズムの理論的原理, 表面登録タスクでの利用方法, 従来型icpアルゴリズムの分類法について述べる。また, SLAMタスクがオンラインであるか否か, ランドマークがSLAMタスクの特徴として存在するか否かなど, SLAMタスクの特徴に基づいて, ICPアルゴリズムのSLAM指向の分類も導入している。我々は,最新の研究論文をいくつか比較し,実装の詳細を分析することにより,slamタスクの各タイプの合成を行う。

The Iterative Closest Point (ICP) algorithm is one of the most important algorithms for geometric alignment in three-dimensional surface registration, and it is frequently used in computer vision tasks, including Simultaneous Localization And Mapping (SLAM). In this paper, we illustrate the theoretical principles of the ICP algorithm, how it can be used in surface registration tasks, and the traditional taxonomy of the variants of the ICP algorithm. As SLAM is becoming a popular topic, we also introduce a SLAM-oriented taxonomy of the ICP algorithm, based on the characteristics of each type of SLAM task, including whether the SLAM task is online or not and whether landmarks are present as features in the SLAM task. We synthesize each type of SLAM task by comparing several up-to-date research papers and analyzing their implementation details.
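To make the basic ICP loop concrete, here is a minimal point-to-point sketch in two dimensions. It is a simplified teaching example, not one of the variants surveyed in the paper: it assumes brute-force nearest-neighbour matching, a closed-form 2-D rigid fit, and a small initial misalignment, and the function names (`nearest`, `best_rigid_2d`, `transform`, `icp`) are hypothetical.

```python
import math

def nearest(p, points):
    """Brute-force closest point to p."""
    return min(points, key=lambda q: (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

def best_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping paired src onto dst."""
    n = len(src)
    csx, csy = sum(x for x, _ in src) / n, sum(y for _, y in src) / n
    cdx, cdy = sum(x for x, _ in dst) / n, sum(y for _, y in dst) / n
    cross = dot = 0.0
    for (xs, ys), (xd, yd) in zip(src, dst):
        xs, ys, xd, yd = xs - csx, ys - csy, xd - cdx, yd - cdy
        cross += xs * yd - ys * xd
        dot += xs * xd + ys * yd
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation that maps the rotated source centroid onto the target centroid.
    return theta, cdx - (c * csx - s * csy), cdy - (s * csx + c * csy)

def transform(points, theta, tx, ty):
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

def icp(source, target, iterations=10):
    """Alternate between matching closest points and refitting the rigid motion."""
    current = list(source)
    for _ in range(iterations):
        matches = [nearest(p, target) for p in current]
        theta, tx, ty = best_rigid_2d(current, matches)
        current = transform(current, theta, tx, ty)
    return current

source = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 3.0), (-1.0, 1.0)]
target = transform(source, 0.05, 0.1, -0.05)   # slightly rotated and shifted copy
aligned = icp(source, target)
```

Because the misalignment here is small relative to the point spacing, the first round of matches is already correct and the loop converges immediately; larger offsets can trap plain ICP in a poor local minimum, which is part of what motivates the variants the survey classifies.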

翻訳日:2022-06-15 14:35:33 公開日:2022-06-13

# フレームベースおよびイベントベース単一物体定位のためのスパイクニューラルネットワーク

Spiking Neural Networks for Frame-based and Event-based Single Object Localization ( http://arxiv.org/abs/2206.06506v1 )

ライセンス: Link先を確認

Sami Barchid, José Mennesson, Jason Eshraghian, Chaabane Djéraba, Mohammed Bennamoun

(参考訳) スパイクニューラルネットワークは、ニューラルネットワークのエネルギー効率のよい代替手段として大きな期待を寄せている。しかし、センサノイズや入力エンコーディングがネットワーク活動や性能に与える影響を理解することは、分類のような共通のニューロモルフィック視覚ベースラインでは困難である。そこで本研究では,サロゲート勾配勾配を用いた単一物体位置定位のためのスパイクニューラルネットワークアプローチを提案する。提案手法を類似したニューラルネットワークと比較し,本モデルが競合性能と効率性,各種汚職に対するロバスト性,エネルギー消費量の低減を両立させることを示した。さらに,静的画像に対するニューラルコーディング方式の精度,ロバスト性,エネルギー効率への影響について検討した。本研究は,従来の生物工学的学習規則とは大きく異なり,サロゲート勾配学習アーキテクチャの設計を支援し,ノイズ特性やデータ符号化手法の観点から,将来のニューロモルフィック技術における設計優先の洞察を提供する。

Spiking neural networks have shown much promise as an energy-efficient alternative to artificial neural networks. However, understanding the impact of sensor noise and input encodings on network activity and performance remains difficult with common neuromorphic vision baselines like classification. Therefore, we propose a spiking neural network approach for single object localization, trained using surrogate gradient descent, for frame- and event-based sensors. We compare our method with similar artificial neural networks and show that our model has competitive or better accuracy and robustness against various corruptions, and lower energy consumption. Moreover, we study the impact of neural coding schemes for static images on accuracy, robustness, and energy efficiency. Our observations differ substantially from previous studies on bio-plausible learning rules, which helps in the design of surrogate gradient trained architectures, and offer insight into design priorities in future neuromorphic technologies in terms of noise characteristics and data encoding methods.
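The surrogate gradient idea mentioned above can be sketched in a few lines: the forward pass uses a hard threshold, and the backward pass substitutes a smooth stand-in for its derivative. The abstract does not say which surrogate the authors use, so the fast-sigmoid shape, the `slope` parameter, and the function names below are assumptions chosen for illustration.

```python
def spike(v, threshold=1.0):
    """Forward pass: emit a spike when the membrane potential reaches threshold."""
    return 1.0 if v >= threshold else 0.0

def surrogate_grad(v, threshold=1.0, slope=5.0):
    """Backward pass: fast-sigmoid surrogate for the step function's derivative.

    The true derivative is zero almost everywhere, so training uses this smooth
    bump that peaks at the threshold (an assumed, commonly used choice).
    """
    return 1.0 / (1.0 + slope * abs(v - threshold)) ** 2

for v in (0.0, 0.9, 1.0, 1.5):
    print(v, spike(v), round(surrogate_grad(v), 4))
```

Potentials close to the threshold receive the largest gradient signal, which is what lets gradient descent adjust weights even though the spike itself is discontinuous.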

翻訳日:2022-06-15 14:35:18 公開日:2022-06-13

# 不変構造学習による一般化と因果説明可能性の向上

Invariant Structure Learning for Better Generalization and Causal Explainability ( http://arxiv.org/abs/2206.06469v1 )

ライセンス: Link先を確認

Yunhao Ge, Sercan Ö. Arik, Jinsung Yoon, Ao Xu, Laurent Itti, Tomas Pfister

(参考訳) データの背後にある因果構造を学ぶことは、一般化を改善し、高品質な説明を得るのに有用である。本稿では,一般化を指標として因果構造発見を改善するための新しい枠組みである不変構造学習(isl)を提案する。 ISLはデータを異なる環境に分割し、一貫性の制約を課すことで、異なる環境にわたってターゲットに不変な構造を学ぶ。集約機構は、個々の環境から学習した構造よりもデータの因果メカニズムをより正確に反映するグラフ構造に基づいて最適な分類器を選択する。さらに,正確な因果構造発見がラベルに依存しない自己教師型学習環境にISLを拡張した。この自己監督型ISLは、異なるノードをターゲットとして反復的に設定することで、不変因果提案を利用する。合成および実世界のデータセットにおいて、ISLは因果構造を正確に発見し、代替手法より優れ、大きな分布シフトを持つデータセットに対して優れた一般化をもたらすことを示す。

Learning the causal structure behind data is invaluable for improving generalization and obtaining high-quality explanations. We propose a novel framework, Invariant Structure Learning (ISL), that is designed to improve causal structure discovery by utilizing generalization as an indication. ISL splits the data into different environments, and learns a structure that is invariant to the target across different environments by imposing a consistency constraint. An aggregation mechanism then selects the optimal classifier based on a graph structure that reflects the causal mechanisms in the data more accurately compared to the structures learnt from individual environments. Furthermore, we extend ISL to a self-supervised learning setting where accurate causal structure discovery does not rely on any labels. This self-supervised ISL utilizes invariant causality proposals by iteratively setting different nodes as targets. On synthetic and real-world datasets, we demonstrate that ISL accurately discovers the causal structure, outperforms alternative methods, and yields superior generalization for datasets with significant distribution shifts.

翻訳日:2022-06-15 14:17:26 公開日:2022-06-13

# 雑音ラベルを用いた画像セグメンテーションについて : 精度とダイスに対する最適解のキャラクタリゼーションとボリューム特性

On Image Segmentation With Noisy Labels: Characterization and Volume Properties of the Optimal Solutions to Accuracy and Dice ( http://arxiv.org/abs/2206.06484v1 )

ライセンス: Link先を確認

Marcus Nordström, Henrik Hult, Jonas Söderberg, Fredrik Löfman

(参考訳) 対象ラベルにノイズがある場合について,医用画像セグメンテーションで最も一般的な2つの性能指標である精度(Accuracy)とダイス(Dice)を検討した。どちらの指標も最適セグメンテーションの集合のキャラクタリゼーションと体積特性に関するいくつかのステートメントが証明され、関連する実験が提供されている。私たちの主な洞察は (i)両方の指標に対する解の体積は、目標の期待される体積から著しくずれる可能性がある。 (ii)精度に対する解の体積は、常にダイスに対する解の体積と同等以下である。 (iii)両メトリクスの最適解が一致するのは、実現可能なセグメンテーションの集合が、対象の期待される体積に等しい体積を持つセグメンテーションの集合に制限されるときである。

We study two of the most popular performance metrics in medical image segmentation, Accuracy and Dice, when the target labels are noisy. For both metrics, several statements related to characterization and volume properties of the set of optimal segmentations are proved, and associated experiments are provided. Our main insights are: (i) the volume of the solutions to both metrics may deviate significantly from the expected volume of the target, (ii) the volume of a solution to Accuracy is always less than or equal to the volume of a solution to Dice and (iii) the optimal solutions to both of these metrics coincide when the set of feasible segmentations is constrained to the set of segmentations with the volume equal to the expected volume of the target.
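Insight (ii) can be checked by brute force on a tiny example. The sketch below is only an illustration under strong assumptions that are not taken from the paper: five pixels, labels drawn independently with hypothetical foreground probabilities `probs`, and the convention that two empty masks have a Dice score of 1. It enumerates every segmentation and every label map and picks the segmentation with the highest expected score for each metric.

```python
from itertools import product

def dice(seg, lab):
    denom = sum(seg) + sum(lab)
    if denom == 0:
        return 1.0   # assumed convention: two empty masks agree perfectly
    return 2.0 * sum(s * l for s, l in zip(seg, lab)) / denom

def accuracy(seg, lab):
    return sum(1 for s, l in zip(seg, lab) if s == l) / len(seg)

def expected_score(metric, seg, probs):
    """Average a metric over all label maps, with pixels drawn independently."""
    total = 0.0
    for lab in product((0, 1), repeat=len(probs)):
        weight = 1.0
        for l, p in zip(lab, probs):
            weight *= p if l else 1.0 - p
        total += weight * metric(seg, lab)
    return total

def best_segmentation(metric, probs):
    """Exhaustive search over all binary segmentations."""
    candidates = product((0, 1), repeat=len(probs))
    return max(candidates, key=lambda seg: expected_score(metric, seg, probs))

probs = [0.9, 0.6, 0.4, 0.3, 0.1]   # hypothetical per-pixel foreground probabilities
acc_seg = best_segmentation(accuracy, probs)
dice_seg = best_segmentation(dice, probs)
print("Accuracy-optimal:", acc_seg, "volume", sum(acc_seg))
print("Dice-optimal:    ", dice_seg, "volume", sum(dice_seg))
```

Under these assumptions the Accuracy-optimal mask keeps exactly the pixels with probability above 0.5, and the Dice-optimal mask contains at least those pixels, so its volume is never smaller.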

翻訳日:2022-06-15 14:13:22 公開日:2022-06-13

# 仮説に焦点をあてるモダリティ:多モード知識蒸留のリンクについて

The Modality Focusing Hypothesis: On the Blink of Multimodal Knowledge Distillation ( http://arxiv.org/abs/2206.06487v1 )

ライセンス: Link先を確認

Zihui Xue, Zhengqi Gao, Sucheng Ren, Hang Zhao

(参考訳) マルチモーダル知識蒸留(英語版)(KD)は、伝統的な知識蒸留をマルチモーダル学習の領域にまで拡張する。 1つの一般的な実践は、パフォーマンス改善のために全知識を学生に伝達できることを期待して、優れたマルチモーダルネットワークを教師として採用することである。本稿では,マルチモーダルKDの有効性について検討する。まず2つの失敗事例を提供し、kdがマルチモーダル知識伝達における普遍的な治療法ではないことを示す。本稿では,モダリティ関係を理解するためのモダリティベン図と,マルチモーダルKDの有効性の決定的要因を明らかにするモダリティ集中仮説を示す。 6つのマルチモーダルデータセットの実験結果は, 蒸留性能を改善するために, 仮説の正当化, 故障症例の診断, ポイント方向の特定に有用である。

Multimodal knowledge distillation (KD) extends traditional knowledge distillation to the area of multimodal learning. One common practice is to adopt a well-performing multimodal network as the teacher in the hope that it can transfer its full knowledge to a unimodal student for performance improvement. In this paper, we investigate the efficacy of multimodal KD. We begin by providing two failure cases of it and demonstrate that KD is not a universal cure in multimodal knowledge transfer. We present the modality Venn diagram to understand modality relationships and the modality focusing hypothesis revealing the decisive factor in the efficacy of multimodal KD. Experimental results on 6 multimodal datasets help justify our hypothesis, diagnose failure cases, and point to directions for improving distillation performance.
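For readers unfamiliar with the "traditional knowledge distillation" that this work builds on, the following sketch shows the usual temperature-softened distillation loss between a teacher and a student. It is a generic illustration, not the paper's multimodal setup; the function names and the default temperature are assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened class probabilities."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * kl

teacher = [2.0, 0.5, -1.0]   # hypothetical logits from a multimodal teacher
student = [0.1, 0.2, 0.3]    # hypothetical logits from a unimodal student
print(distillation_loss(teacher, student))
```

The loss is zero only when the student reproduces the teacher's softened distribution, which is the sense in which the teacher's knowledge is "transferred".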

翻訳日:2022-06-15 14:13:09 公開日:2022-06-13

# トランスフォーマーを用いたマルチモーダルラーニング:サーベイ

Multimodal Learning with Transformers: A Survey ( http://arxiv.org/abs/2206.06488v1 )

ライセンス: Link先を確認

Peng Xu, Xiatian Zhu, and David A. Clifton

(参考訳) Transformerは有望なニューラルネットワーク学習者であり、さまざまな機械学習タスクで大きな成功を収めている。近年のマルチモーダルアプリケーションとビッグデータの普及により、トランスフォーマーベースのマルチモーダル学習はAI研究においてホットなトピックとなっている。本稿では,マルチモーダルデータ指向の変圧器技術に関する包括的調査を行う。本調査の主な内容は以下のとおりである。 (1)マルチモーダル学習,Transformerエコシステム,マルチモーダルビッグデータ時代の背景, (2)幾何学的トポロジーの観点からのVanilla Transformer,Vision Transformer,マルチモーダルTransformerの理論的レビュー, (3)マルチモーダル事前学習と特定のマルチモーダルタスクという2つの重要なパラダイムを通じたマルチモーダルTransformer応用のレビュー, (4)マルチモーダルTransformerのモデルと応用に共通する課題と設計の要約, (5)コミュニティにとっての未解決問題と今後の研究の方向性に関する議論。

Transformer is a promising neural network learner, and has achieved great success in various machine learning tasks. Thanks to the recent prevalence of multimodal applications and big data, Transformer-based multimodal learning has become a hot topic in AI research. This paper presents a comprehensive survey of Transformer techniques oriented at multimodal data. The main contents of this survey include: (1) a background of multimodal learning, Transformer ecosystem, and the multimodal big data era, (2) a theoretical review of Vanilla Transformer, Vision Transformer, and multimodal Transformers, from a geometrically topological perspective, (3) a review of multimodal Transformer applications, via two important paradigms, i.e., for multimodal pretraining and for specific multimodal tasks, (4) a summary of the common challenges and designs shared by the multimodal Transformer models and applications, and (5) a discussion of open problems and potential research directions for the community.

翻訳日:2022-06-15 14:12:55 公開日:2022-06-13

# Habitat 2.0におけるBEHAVIOR: ベンチマークのためのシミュレータ非依存の論理的タスク記述

BEHAVIOR in Habitat 2.0: Simulator-Independent Logical Task Description for Benchmarking Embodied AI Agents ( http://arxiv.org/abs/2206.06489v1 )

ライセンス: Link先を確認

Ziang Liu, Roberto Martín-Martín, Fei Xia, Jiajun Wu, Li Fei-Fei

(参考訳) ロボットは倉庫や工場などの管理された環境で反復的かつ精度の高いタスクを実行するのに優れているが、家庭用タスクの補助を提供するAIエージェントにはまだ拡張されていない。コンピュータビジョンや自然言語処理といったAI分野でベンチマークが果たした触媒効果に触発され、コミュニティはAIを具現化した新しいベンチマークを探している。実施済みAIベンチマークの以前の作業では、ひとつの環境やシミュレータやドメインに特有の、異なる形式を使ったタスクを定義していたため、一般的なソリューションや同等のソリューションの開発が困難だった。本研究では,論理空間で定義された動作を異なるシミュレータに適応し易いことを示す第一歩として,その高速シミュレーション速度の恩恵を受けるために,振る舞いアクティビティのサブセットをhabitat 2.0に導入する。

Robots excel in performing repetitive and precision-sensitive tasks in controlled environments such as warehouses and factories, but have not yet been extended to embodied AI agents providing assistance in household tasks. Inspired by the catalyzing effect that benchmarks have had in AI fields such as computer vision and natural language processing, the community is looking for new benchmarks for embodied AI. Prior work on embodied AI benchmarks defines tasks using different formalisms, often specific to one environment, simulator or domain, making it hard to develop general and comparable solutions. In this work, we bring a subset of BEHAVIOR activities into Habitat 2.0 to benefit from its fast simulation speed, as a first step towards demonstrating the ease of adapting activities defined in the logic space into different simulators.

翻訳日:2022-06-15 14:11:18 公開日:2022-06-13

# 自己回帰ベイズ予測を用いた密度推定

Density Estimation with Autoregressive Bayesian Predictives ( http://arxiv.org/abs/2206.06462v1 )

ライセンス: Link先を確認

Sahra Ghalebikesabi, Chris Holmes, Edwin Fong, Brieuc Lehmann

(参考訳) ベイズ法は、前者によって引き起こされた正規化効果により、小データ体制において統計的推測の一般的な選択である。密度推定の文脈では、標準的なベイズ的アプローチは後方予測を目標とする。一般に、後部予測の直接推定は難解であり、通常は後部分布を中間段階として近似する手法を用いる。しかし,近年の再帰的予測コプラ更新により,後部近似を必要とせずにトラクタブルな予測密度推定が可能となった。これらの推定値は計算上魅力的であるが、非スムースデータ分布に苦しむ傾向がある。これは主に、提案されたコプラ更新が導出された可能性モデルの比較的限定的な形式によるものである。この欠点に対処するために,自己回帰的確率分解とガウス過程が先行するベイズ非パラメトリックモデルを考える。さらに,データを潜在空間にマップする自己回帰ニューラルネットワークを用いて帯域幅の新たなパラメータ化を定式化し,データ内のより複雑な依存関係をキャプチャする。我々の拡張は、既存の再帰的ベイズ密度推定器のモデリング能力を高め、表付きデータセットの最先端結果を達成する。

Bayesian methods are a popular choice for statistical inference in small-data regimes due to the regularization effect induced by the prior, which serves to counteract overfitting. In the context of density estimation, the standard Bayesian approach is to target the posterior predictive. In general, direct estimation of the posterior predictive is intractable and so methods typically resort to approximating the posterior distribution as an intermediate step. The recent development of recursive predictive copula updates, however, has made it possible to perform tractable predictive density estimation without the need for posterior approximation. Although these estimators are computationally appealing, they tend to struggle on non-smooth data distributions. This is largely due to the comparatively restrictive form of the likelihood models from which the proposed copula updates were derived. To address this shortcoming, we consider a Bayesian nonparametric model with an autoregressive likelihood decomposition and Gaussian process prior, which yields a data-dependent bandwidth parameter in the copula update. Further, we formulate a novel parameterization of the bandwidth using an autoregressive neural network that maps the data into a latent space, and is thus able to capture more complex dependencies in the data. Our extensions increase the modelling capacity of existing recursive Bayesian density estimators, achieving state-of-the-art results on tabular data sets.

翻訳日:2022-06-15 14:07:58 公開日:2022-06-13

# ベクトル束をもつ位相複素データのファイバー次元還元

Fiberwise dimensionality reduction of topologically complex data with vector bundles ( http://arxiv.org/abs/2206.06513v1 )

ライセンス: Link先を確認

Luis Scoccola and Jose A. Perea

(参考訳) 非自明な大規模トポロジーを持つデータセットは、既存の次元還元アルゴリズムで低次元ユークリッド空間に埋め込むのは難しい。本稿では,基本空間が大規模トポロジーを,ファイバーが局所幾何学を考慮しながら,ベクトル束を用いて位相的に複雑なデータセットをモデル化することを提案する。これにより、大規模なトポロジーを保ちながら繊維の寸法を小さくすることができる。我々はこの視点を定式化し、応用としてユークリッド空間の初期表現とともにデータセットを入力として、その大規模トポロジの一部を復元すると仮定したアルゴリズムを記述し、初期大域表現に沿って、局所的な線形次元の減少を通じて得られる局所表現を統合する新しい表現を出力する。このアルゴリズムは、力学系と化学の例を示す。これらの例において、本アルゴリズムは、様々な既知のメトリックベース次元低減アルゴリズムよりも低い目標次元におけるデータの位相的に忠実な埋め込みを学習することができる。

Datasets with non-trivial large scale topology can be hard to embed in low-dimensional Euclidean space with existing dimensionality reduction algorithms. We propose to model topologically complex datasets using vector bundles, in such a way that the base space accounts for the large scale topology, while the fibers account for the local geometry. This allows one to reduce the dimensionality of the fibers, while preserving the large scale topology. We formalize this point of view, and, as an application, we describe an algorithm which takes as input a dataset together with an initial representation of it in Euclidean space, assumed to recover part of its large scale topology, and outputs a new representation that integrates local representations, obtained through local linear dimensionality reduction, along the initial global representation. We demonstrate this algorithm on examples coming from dynamical systems and chemistry. In these examples, our algorithm is able to learn topologically faithful embeddings of the data in lower target dimension than various well known metric-based dimensionality reduction algorithms.

翻訳日:2022-06-15 14:07:36 公開日:2022-06-13

# Transversal GANを用いた3次元PET画像のプライバシー漏洩評価

Assessing Privacy Leakage in Synthetic 3-D PET Imaging using Transversal GAN ( http://arxiv.org/abs/2206.06448v1 )

ライセンス: Link先を確認

Robert V. Bergen, Jean-Francois Rajotte, Fereshteh Yousefirizi, Arman Rahmim, Raymond T. Ng

(参考訳) 疾患診断や画像分割のための医用画像に対するコンピュータビジョン関連アルゴリズムの訓練は,プライバシ上の懸念から難しい。このため、データ共有を容易にするため、生成画像モデルは非常に求められている。しかし、3次元生成モデルは未検討であり、プライバシリークの調査が必要である。腫瘍マスクに装着した頭部・頸部PET画像を用いた3次元生成モデルTransversal GAN (TrGAN) について検討した。画像の忠実性、実用性、プライバシーの定量的尺度を定義します。これらの指標はトレーニングの過程で評価され、理想的な忠実さ、ユーティリティ、プライバシのトレードオフを特定し、これらのパラメータ間の関係を確立する。 trganの判別器は攻撃に対して脆弱であり、攻撃者は訓練に使用されたサンプルをほぼ完全な精度で識別できる(auc = 0.99)。また, 生成器のみにアクセスする攻撃者は, サンプルが訓練に使われたかどうか (auc = 0.51) を確実に分類できないことを示した。これは、TrGANジェネレータは、識別器ではなく、プライバシーのリスクを最小限に抑えつつ、優れたユーティリティと忠実さを維持しながら、合成3DPETデータを共有するために使われる可能性があることを示唆している。

Training computer-vision related algorithms on medical images for disease diagnosis or image segmentation is difficult in large part due to privacy concerns. For this reason, generative image models are highly sought after to facilitate data sharing. However, 3-D generative models are understudied, and investigation of their privacy leakage is needed. We introduce our 3-D generative model, Transversal GAN (TrGAN), using head & neck PET images which are conditioned on tumour masks as a case study. We define quantitative measures of image fidelity, utility and privacy for our model. These metrics are evaluated in the course of training to identify ideal fidelity, utility and privacy trade-offs and establish the relationships between these parameters. We show that the discriminator of the TrGAN is vulnerable to attack, and that an attacker can identify which samples were used in training with almost perfect accuracy (AUC = 0.99). We also show that an attacker with access to only the generator cannot reliably classify whether a sample had been used for training (AUC = 0.51). This suggests that TrGAN generators, but not discriminators, may be used for sharing synthetic 3-D PET data with minimal privacy risk while maintaining good utility and fidelity.
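The AUC figures quoted above (0.99 against the discriminator, 0.51 against the generator) can be read as the probability that an attack ranks a training sample above a non-training sample. The following is a minimal sketch of that computation, not the authors' evaluation code; the function name `attack_auc` and the score values are hypothetical and chosen only for illustration.

```python
import numpy as np

def attack_auc(member_scores, nonmember_scores):
    """AUC of a membership-inference attack via the rank (Mann-Whitney) formulation."""
    m = np.asarray(member_scores, dtype=float)[:, None]
    n = np.asarray(nonmember_scores, dtype=float)[None, :]
    # Fraction of (member, non-member) pairs ranked correctly; ties count as half.
    return float(np.mean((m > n) + 0.5 * (m == n)))

# Hypothetical attack scores, for illustration only.
separable = attack_auc([0.9, 0.8, 0.95], [0.1, 0.2, 0.3])  # attacker separates the groups
chance = attack_auc([0.2, 0.8], [0.2, 0.8])                # attacker cannot tell them apart
```

An AUC near 1.0 means the attacker almost always identifies training samples, while a value near 0.5 is indistinguishable from guessing.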

翻訳日:2022-06-15 14:06:31 公開日:2022-06-13

# ヘイトスピーチとカウンタースピーチ検出: 会話的コンテキストは重要だ

Hate Speech and Counter Speech Detection: Conversational Context Does Matter ( http://arxiv.org/abs/2206.06423v1 )

ライセンス: Link先を確認

Xinchen Yu, Eduardo Blanco, Lingzi Hong

(参考訳) ヘイトスピーチは、ユーザー生成コンテンツとともにサイバースペースを脅かしている。本稿では,オンラインヘイトとカウンタースピーチのアノテーションと検出における会話コンテキストの役割について検討する。 redditコメントの3方向分類タスクのためのコンテキスト対応データセット(ヘイトスピーチ、カウンタースピーチ、中立性)を作成しました。我々の分析は、文脈がヘイトとカウンタースピーチの識別に重要であることを示唆している: 人間の判断は、アノテータに文脈を示すかどうかによって、ほとんどのコメントに対して変化する。言語分析は、人々が憎しみや反論を表現するために使用する言語についての洞察を引き出す。実験の結果,文脈を考慮したニューラルネットワークの方が有意に優れた結果が得られることがわかった。また,光を入射する定性的誤差解析についても述べる。 (a)いつ、なぜ文脈が有益で、 (b) コンテキストを考慮した場合の最良のモデルによる残りのエラー。

Hate speech is plaguing cyberspace along with user-generated content. This paper investigates the role of conversational context in the annotation and detection of online hate and counter speech, where context is defined as the preceding comment in a conversation thread. We created a context-aware dataset for a 3-way classification task on Reddit comments: hate speech, counter speech, or neutral. Our analyses indicate that context is critical to identify hate and counter speech: human judgments change for most comments depending on whether we show annotators the context. A linguistic analysis draws insights into the language people use to express hate and counter speech. Experimental results show that neural networks obtain significantly better results if context is taken into account. We also present qualitative error analyses shedding light on (a) when and why context is beneficial and (b) the remaining errors made by our best model when context is taken into account.

翻訳日:2022-06-15 14:02:50 公開日:2022-06-13

# LST:パラメータとメモリ効率向上のためのラダーサイドチューニング

LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning ( http://arxiv.org/abs/2206.06522v1 )

ライセンス: Link先を確認

Yi-Lin Sung, Jaemin Cho, Mohit Bansal

(参考訳) 近年,下流タスクにおける大規模事前学習モデルが,様々な領域で採用されている。しかし、大きな事前訓練されたモデルのパラメータセット全体を更新するのはコストがかかる。最近提案されたパラメータ効率変換学習(PETL)技術では、トレーニング済みバックボーンネットワーク内のパラメータの小さなサブセット(パラメータの2%しか使用していない)を新しいタスクに更新することができるが、トレーニングメモリの要件を最大30%削減できる。これは、トレーニング可能なパラメータの勾配計算が、大きなトレーニング済みのバックボーンモデルによるバックプロパゲーションを必要とするためである。そこで本研究では,学習時のメモリ要求量を大幅に削減する新しいpetl手法であるlst(ladar side-tuning)を提案する。バックボーンネットワークに新たなパラメータを挿入する既存のパラメータ効率の手法とは異なり、バックボーンネットワークからのショートカット接続(ラダー)を介して中間的なアクティベーションを入力として取り出し、予測を行う、はしご側ネットワークを訓練する。 LSTは、バックボーンネットワークを通してのバックプロパゲーションを必要とせず、代わりにサイドネットワークとラグ接続によってのみメモリ要求が大幅に低下する。 NLP (GLUE) と視覚言語 (VQA, GQA, NLVR2, MSCOCO) の両方で, 様々なモデル (T5, CLIP-T5) を用いて評価を行った。 LSTはネットワーク全体を微調整するためにメモリコストの69%を節約するが、他の方法は同様のパラメータの使用で26%しか節約しない(従って2.7倍のメモリ節約)。さらに、LSTは低メモリ状態においてAdapterやLoRAよりも高い精度を達成する。この優れたメモリ効率の利点をさらに示すため、LSTをより大きなT5モデル(T5-large, T5-3B)に適用し、フルチューニングや他のPETL法よりもGLUE性能が向上した。全く同じ傾向が、VLタスクの実験にも見られる。

Fine-tuning large pre-trained models on downstream tasks has been adopted in a variety of domains recently. However, it is costly to update the entire parameter set of large pre-trained models. Although recently proposed parameter-efficient transfer learning (PETL) techniques allow updating a small subset of parameters (e.g. only using 2% of parameters) inside a pre-trained backbone network for a new task, they only reduce the training memory requirement by up to 30%. This is because the gradient computation for the trainable parameters still requires backpropagation through the large pre-trained backbone model. To address this, we propose Ladder Side-Tuning (LST), a new PETL technique that reduces training memory requirements by more substantial amounts. Unlike existing parameter-efficient methods that insert additional parameters inside backbone networks, we train a ladder side network, a small and separate network that takes intermediate activations as input via shortcut connections (ladders) from backbone networks and makes predictions. LST has significantly lower memory requirements than previous methods, because it does not require backpropagation through the backbone network, but instead only through the side network and ladder connections. We evaluate our method with various models (T5, CLIP-T5) on both NLP (GLUE) and vision-language (VQA, GQA, NLVR2, MSCOCO) tasks. LST saves 69% of the memory costs to fine-tune the whole network, while other methods only save 26% of that in similar parameter usages (hence, 2.7x more memory savings). Moreover, LST achieves higher accuracy than Adapter and LoRA in a low-memory regime. To further show the advantage of this better memory efficiency, we also apply LST to larger T5 models (T5-large, T5-3B), attaining better GLUE performance than full fine-tuning and other PETL methods. The exact same trend also holds in our experiments on VL tasks.
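The ladder structure described above can be sketched as a forward pass in which a frozen backbone only supplies activations and a narrow side network makes the prediction. This is an illustrative sketch under stated assumptions, not the authors' implementation: the layer widths, the `tanh` nonlinearity, the additive fusion rule, and every variable name are hypothetical, and no training step is shown.

```python
import numpy as np

rng = np.random.default_rng(0)
d_backbone, d_side, n_layers, n_classes = 16, 4, 3, 2

# Frozen backbone weights: never updated, so no gradients are needed for them.
backbone = [rng.normal(size=(d_backbone, d_backbone)) / np.sqrt(d_backbone)
            for _ in range(n_layers)]
# Trainable parts: ladder projections (backbone width -> side width), side layers, head.
ladders = [rng.normal(size=(d_backbone, d_side)) / np.sqrt(d_backbone)
           for _ in range(n_layers)]
side = [rng.normal(size=(d_side, d_side)) / np.sqrt(d_side) for _ in range(n_layers)]
head = rng.normal(size=(d_side, n_classes))

def forward(x):
    h = x
    s = np.zeros((x.shape[0], d_side))
    for w_b, w_l, w_s in zip(backbone, ladders, side):
        h = np.tanh(h @ w_b)            # backbone activation, used only as an input
        s = np.tanh(s @ w_s + h @ w_l)  # side layer fuses its state with the ladder input
    return s @ head                     # the prediction comes from the side network

logits = forward(rng.normal(size=(5, d_backbone)))
```

Because the loss depends on the trainable weights only through the side path, a training framework would need to backpropagate through `ladders`, `side`, and `head` but not through `backbone`, which is the source of the memory saving the abstract describes.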

翻訳日:2022-06-15 13:29:52 公開日:2022-06-13

# 視覚とテキストのための合成混合表現

Compositional Mixture Representations for Vision and Text ( http://arxiv.org/abs/2206.06404v1 )

ライセンス: Link先を確認

Stephan Alaniz, Marco Federici, Zeynep Akata

(参考訳) 視覚と言語の間の共通の表現空間を学ぶことで、ディープネットワークは画像内のオブジェクトと対応する意味意味を関連付けることができる。本稿では,テキストの合成性を視覚領域に含ませる共有ガウス混合表現を,明示的な位置監視なしに学習するモデルを提案する。空間変換器を表現学習アプローチと組み合わせることで、画像を別々に符号化したパッチに分割し、視覚的およびテキスト的表現を解釈可能な方法で関連付けることを学ぶ。 MNISTとCIFAR10のバリエーションについて、我々のモデルは弱い教師付きオブジェクト検出を実行でき、オブジェクトの未知の組み合わせに外挿する能力を示す。

Learning a common representation space between vision and language allows deep networks to relate objects in the image to the corresponding semantic meaning. We present a model that learns a shared Gaussian mixture representation imposing the compositionality of the text onto the visual domain without having explicit location supervision. By combining the spatial transformer with a representation learning approach, we learn to split images into separately encoded patches to associate visual and textual representations in an interpretable manner. On variations of MNIST and CIFAR10, our model is able to perform weakly supervised object detection and demonstrates its ability to extrapolate to unseen combinations of objects.

翻訳日:2022-06-15 13:28:42 公開日:2022-06-13

# GraphMLP: 3Dヒューマンポース推定のためのグラフMLPライクなアーキテクチャ

GraphMLP: A Graph MLP-Like Architecture for 3D Human Pose Estimation ( http://arxiv.org/abs/2206.06420v1 )

ライセンス: Link先を確認

Wenhao Li, Hong Liu, Tianyu Guo, Hao Tang, Runwei Ding

(参考訳) 現代の多層パーセプトロン(MLP)モデルは、自己注意なしで視覚表現を学習する際の競合的な結果を示している。しかしながら、既存のmlpモデルは、局所的な詳細を捉え、人間の構成に関する事前知識を欠いているため、骨格表現学習のモデリング能力が制限されている。これらの課題に対処するため,我々は,3次元ポーズ推定のためのグローバル・ローカル・グラフィック統一アーキテクチャにおいて,MPPとGCNを組み合わせたグラフ強化型MLPアーキテクチャーGraphMLPを提案する。 GraphMLPは、人体のグラフ構造をMDPモデルに組み込んで、ドメイン固有の要求を満たすと同時に、局所的およびグローバルな空間的相互作用を可能にする。大規模な実験により、提案したGraphMLPは、Human3.6MとMPI-INF-3DHPの2つのデータセットで最先端のパフォーマンスを達成することが示された。ソースコードと事前訓練されたモデルは公開されます。

Modern multi-layer perceptron (MLP) models have shown competitive results in learning visual representations without self-attention. However, existing MLP models are not good at capturing local details and lack prior knowledge of human configurations, which limits their modeling power for skeletal representation learning. To address these issues, we propose a simple yet effective graph-reinforced MLP-Like architecture, named GraphMLP, that combines MLPs and graph convolutional networks (GCNs) in a global-local-graphical unified architecture for 3D human pose estimation. GraphMLP incorporates the graph structure of human bodies into an MLP model to meet the domain-specific demand while also allowing for both local and global spatial interactions. Extensive experiments show that the proposed GraphMLP achieves state-of-the-art performance on two datasets, i.e., Human3.6M and MPI-INF-3DHP. Our source code and pretrained models will be publicly available.

翻訳日:2022-06-15 13:28:30 公開日:2022-06-13

# 皮膚内視鏡的病変分類における深層学習の形状バイアスの再検討

Revisiting the Shape-Bias of Deep Learning for Dermoscopic Skin Lesion Classification ( http://arxiv.org/abs/2206.06466v1 )

ライセンス: Link先を確認

Adriano Lucieri and Fabian Schmeisser and Christoph Peter Balada and Shoaib Ahmed Siddiqui and Andreas Dengel and Sheraz Ahmed

(参考訳) 一般に、人間の視覚システムはテクスチャではなく形状認識に偏っていると考えられている。この仮定は、深層モデルの意思決定プロセスと人間の視覚の基本特性を一致させようとする作業の体系を成長させてきた。形状特徴への依存は主に、共変量シフトの下でこれらのモデルの堅牢性を改善することが期待される。本稿では,皮膚病変画像の分類における形状ビアーゼの重要性を再考する。解析の結果,異なる皮膚病変データセットは個々の画像特徴に対して様々なバイアスを示すことがわかった。興味深いことに、深部特徴抽出装置は、皮膚病変分類の絡み合い特徴の学習に傾いているにもかかわらず、個々の特徴をこの絡み合い表現から復号することができる。このことは、これらの特徴がモデルの学習した埋め込み空間にまだ表現されていることを示しているが、分類には使われていない。加えて、異なるデータセットのスペクトル分析は、一般的な視覚認識とは対照的に、皮膚の皮膚病変の分類は、本質的に、形状バイアスを超えた複雑な特徴の組み合わせに依存していることを示している。自然な結果として、形状バイアスモデルの一般的な欲求から遠ざかることによって、皮膚病変の分類を改善できる場合もある。

It is generally believed that the human visual system is biased towards the recognition of shapes rather than textures. This assumption has led to a growing body of work aiming to align deep models' decision-making processes with the fundamental properties of human vision. The reliance on shape features is primarily expected to improve the robustness of these models under covariate shift. In this paper, we revisit the significance of shape-biases for the classification of skin lesion images. Our analysis shows that different skin lesion datasets exhibit varying biases towards individual image features. Interestingly, despite deep feature extractors being inclined towards learning entangled features for skin lesion classification, individual features can still be decoded from this entangled representation. This indicates that these features are still represented in the learnt embedding spaces of the models, but not used for classification. In addition, the spectral analysis of different datasets shows that in contrast to common visual recognition, dermoscopic skin lesion classification, by nature, is reliant on complex feature combinations beyond shape-bias. As a natural consequence, shifting away from the prevalent desire of shape-biasing models can even improve skin lesion classifiers in some cases.

翻訳日:2022-06-15 13:28:13 公開日:2022-06-13

# 半教師付き学習による顔アンチスプーフィングの一般化

Generalizable Method for Face Anti-Spoofing with Semi-Supervised Learning ( http://arxiv.org/abs/2206.06510v1 )

ライセンス: Link先を確認

Nikolay Sergievskiy, Roman Vlasov, Roman Trusov

(参考訳) 顔の偽造防止は生体認証システムにおける高いセキュリティ要件のために多くの注目を集めている。顔の生体認証を商用ハードウェアに持ち込むことは、専用のセンサーを使わずに偽のログインセッションを検出するための信頼性の高い方法の開発に大きく依存した。現在のCNNベースの手法は、トレーニング対象のドメインでよく機能するが、以前は見つからなかったデータセットでは一般化が不十分であることが多い。本稿では,複数データセット間の性能向上のための教師なし事前学習の手法について述べるとともに,教師付き微調整のためのエントリアンチスプーフィングデータセットを導入し,明示的な解釈可能な信号でスプーフィング試行を検出する二分分類タスクを増強する多クラス補助分類層を提案する。 MSU-MFSD, Replay-Attack, OULU-NPUデータセット上でのクロスデータセットテストの最先端結果を得ることで, モデルの有効性を実証する。

Face anti-spoofing has drawn a lot of attention due to the high security requirements in biometric authentication systems. Bringing face biometrics to commercial hardware has become largely dependent on developing reliable methods for detecting fake login sessions without specialized sensors. Current CNN-based methods perform well on the domains they were trained for, but often show poor generalization on previously unseen datasets. In this paper we describe a method for utilizing unsupervised pretraining for improving performance across multiple datasets without any adaptation, introduce the Entry Antispoofing Dataset for supervised fine-tuning, and propose a multi-class auxiliary classification layer for augmenting the binary classification task of detecting spoofing attempts with explicit interpretable signals. We demonstrate the efficiency of our model by achieving state-of-the-art results on cross-dataset testing on MSU-MFSD, Replay-Attack, and OULU-NPU datasets.

翻訳日:2022-06-15 13:27:53 公開日:2022-06-13

# ミューティセグメント情報符号化(MUSIC)を用いた自己教師付き表現学習

Self-Supervised Representation Learning With MUlti-Segmental Informational Coding (MUSIC) ( http://arxiv.org/abs/2206.06461v1 )

ライセンス: Link先を確認

Chuang Niu and Ge Wang

(参考訳) 自己教師あり表現学習(self-supervised representation learning)は、高次元データを意味のある埋め込み空間にマッピングする。最近の表現学習法のほとんどは、通常$l2$の正規化された単位超球面上の同じサンプルから異なるビューの埋め込み特徴の間の距離を最大化する。すべてのサンプルが同じ埋め込み特徴を持つ自明な解を避けるため, コントラスト学習, 停止勾配, 分散, 共分散正規化など, 様々な手法が開発されている。本研究では,自己指導型表現学習のためのMulti-Segmental Informational Coding (MUSIC)を提案する。 musicは埋め込み機能を複数のセグメントに分割し、サンプルを異なるセマンティッククラスタに識別的に分割する。情報理論の測定は音楽の最適化に直接用いられ、理論的には自明な解は避けられる。 MUSICは、メモリバンクや大規模なバッチ、非対称性ネットワーク、勾配停止、運動量更新など、一般的な技術に依存していないため、トレーニングフレームワークは柔軟である。実験の結果,MUSIC は画像ネット分類における多くのBarlow Twins 法や VICReg 法よりも精度が良く,深いプロジェクタも大きな特徴次元も必要としないことがわかった。コードは利用可能になる。

Self-supervised representation learning maps high-dimensional data into a meaningful embedding space, where samples of similar semantic contents are close to each other. Most of the recent representation learning methods maximize cosine similarity or minimize the distance between the embedding features of different views from the same sample, usually on the $\ell_2$-normalized unit hypersphere. To prevent the trivial solution in which all samples have the same embedding feature, various techniques have been developed, such as contrastive learning, stop gradient, variance and covariance regularization, etc. In this study, we propose MUlti-Segmental Informational Coding (MUSIC) for self-supervised representation learning. MUSIC divides the embedding feature into multiple segments that discriminatively partition samples into different semantic clusters, and different segments focus on different partition principles. Information theory measurements are directly used to optimize MUSIC and theoretically guarantee that trivial solutions are avoided. MUSIC does not depend on commonly used techniques, such as memory bank or large batches, asymmetry networks, gradient stopping, momentum weight updating, etc., making the training framework flexible. Our experiments demonstrate that MUSIC achieves better results than the most closely related Barlow Twins and VICReg methods on ImageNet classification with linear probing, and requires neither deep projectors nor large feature dimensions. Code will be made available.

翻訳日:2022-06-15 13:24:57 公開日:2022-06-13

# テキスト要約における後編集効果の検討

An Exploration of Post-Editing Effectiveness in Text Summarization ( http://arxiv.org/abs/2206.06383v1 )

ライセンス: Link先を確認

Vivian Lai, Alison Smith-Renner, Ke Zhang, Ruijia Cheng, Wenjuan Zhang, Joel Tetreault, Alejandro Jaimes

(参考訳) 自動要約法は効率的だが、品質が低い。比較して、手動の要約は高価だが、高い品質を生み出す。人間とAIは、要約のパフォーマンスを改善するために協力できるのか? 同様のテキスト生成タスク(例えば機械翻訳)では、「ポスト編集」という形で人間とAIのコラボレーションが人間の作業量を削減し、AI出力の品質を向上させる。そこで,テキスト要約におけるポスト編集の利点について検討した。具体的には,編集後提供された要約と,要約品質,人的効率,形式的(xsumニュース),非公式(reddit投稿)テキストにおけるユーザエクスペリエンスに関するマニュアル要約を比較し,72名を対象に実験を行った。例えば、参加者がドメイン知識を欠いている場合など)は役に立ちますが、他のケース(例えば、提供された要約が不正確な情報を含む場合)では役に立ちます。参加者の異なる編集戦略と支援の必要性は、将来のAI要約システムに影響を及ぼす。

Automatic summarization methods are efficient but can suffer from low quality. In comparison, manual summarization is expensive but produces higher quality. Can humans and AI collaborate to improve summarization performance? In similar text generation tasks (e.g., machine translation), human-AI collaboration in the form of "post-editing" AI-generated text reduces human workload and improves the quality of AI output. Therefore, we explored whether post-editing offers advantages in text summarization. Specifically, we conducted an experiment with 72 participants, comparing post-editing provided summaries with manual summarization for summary quality, human efficiency, and user experience on formal (XSum news) and informal (Reddit posts) text. This study offers valuable insights into when post-editing is useful for text summarization: it helped in some cases (e.g., when participants lacked domain knowledge) but not in others (e.g., when provided summaries include inaccurate information). Participants' different editing strategies and needs for assistance offer implications for future human-AI summarization systems.

翻訳日:2022-06-15 13:21:56 公開日:2022-06-13

# 何を知るべきか? 単一の経験ストリームにおける予測的特徴発見にメタ勾配降下を用いる

What Should I Know? Using Meta-gradient Descent for Predictive Feature Discovery in a Single Stream of Experience ( http://arxiv.org/abs/2206.06485v1 )

ライセンス: Link先を確認

Alexandra Kearney, Anna Koop, Johannes Günther, Patrick M. Pilarski

(参考訳) 計算強化学習において、増大する研究は、将来の感覚の予測を通じてエージェントの世界の知覚を構築することを目的としており、より良いゴール指向の意思決定を可能にするために、環境観察に関する予測が追加の入力機能として使用される。この一連の作業におけるオープンな課題は、エージェントがどの予測が意思決定に最も適するかを、無限に多くの予測から決定することである。この課題は、単一エージェントに単一の経験の流れが利用できる連続学習問題において特に顕著である。第一の貢献として,エージェントが学習するメタ勾配降下プロセスを紹介する。 1)何を予測すべきか 2) 選択された予測の見積り,及び 3)これらの見積もりを使って、将来の報酬を最大化するポリシを生成するには、どのように使うか。この原稿では、一般的な値関数として表現される予測について考察する: 将来の信号の蓄積の時間的拡張推定。本研究では, エージェントが環境とのインタラクションを通じて, 部分観測可能性を解決する予測を独立に選択できることを実証する。これらの予測を手動で指定するのではなく、エージェントが自己管理的な方法で有用な予測を特定できるようにし、真に自律的なシステムに向けた一歩を踏み出す。

In computational reinforcement learning, a growing body of work seeks to construct an agent's perception of the world through predictions of future sensations; predictions about environment observations are used as additional input features to enable better goal-directed decision-making. An open challenge in this line of work is determining from the infinitely many predictions that the agent could possibly make which predictions might best support decision-making. This challenge is especially apparent in continual learning problems where a single stream of experience is available to a singular agent. As a primary contribution, we introduce a meta-gradient descent process by which an agent learns 1) what predictions to make, 2) the estimates for its chosen predictions, and 3) how to use those estimates to generate policies that maximize future reward -- all during a single ongoing process of continual learning. In this manuscript we consider predictions expressed as General Value Functions: temporally extended estimates of the accumulation of a future signal. We demonstrate that through interaction with the environment an agent can independently select predictions that resolve partial-observability, resulting in performance similar to expertly specified GVFs. By learning, rather than manually specifying these predictions, we enable the agent to identify useful predictions in a self-supervised manner, taking a step towards truly autonomous systems.
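The abstract describes General Value Functions as temporally extended estimates of the accumulation of a future signal. As a rough sketch of the quantity such a prediction targets, the code below computes a discounted accumulation over a finite stream. It assumes a constant discount `gamma` and a fixed signal, which is a simplification; the function name and the example signal are hypothetical, and the meta-gradient process itself is not shown.

```python
import numpy as np

def discounted_accumulation(cumulant, gamma):
    """Target of a GVF-style prediction: G_t = c_t + gamma * G_{t+1}."""
    cumulant = np.asarray(cumulant, dtype=float)
    returns = np.zeros_like(cumulant)
    running = 0.0
    # Work backwards so each step reuses the accumulation from the following step.
    for t in range(len(cumulant) - 1, -1, -1):
        running = cumulant[t] + gamma * running
        returns[t] = running
    return returns

# A signal that fires only at the last step, discounted by one half per step.
targets = discounted_accumulation([0.0, 0.0, 1.0], gamma=0.5)
```

Here `targets` is `[0.25, 0.5, 1.0]`: the closer the agent is to the signal, the larger the predicted accumulation.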

翻訳日:2022-06-15 13:21:38 公開日:2022-06-13

# (参考訳) IGN : インシシブ生成ネットワーク

IGN : Implicit Generative Networks ( http://arxiv.org/abs/2206.05860v1 )

ライセンス: CC BY-SA 4.0

Haozheng Luo, Tianyi Wu, Feiyu Han, Zhijun Yan, Jianfen Zhang

(参考訳) 本研究では,分布強化学習の最近の進歩を生かして,iqnに基づくモデルに最先端の分布型を与える。我々は,ganモデル生成器と分位回帰を持つ判別器関数を用いて,状態-作用の戻り値分布に対する全分位値を近似する。ベースラインデータセット – 57 atari 2600 games in the ale – ではパフォーマンスが向上しています。また,このアルゴリズムを用いて,アタリゲームにおけるリスクに敏感なポリシーの訓練性能を,政策最適化と評価で示す。

In this work, we build on recent advances in distributional reinforcement learning to give a state-of-the-art distributional variant of the model based on the IQN. We achieve this by using the GAN model's generator and discriminator function with quantile regression to approximate the full quantile value for the state-action return distribution. We demonstrate improved performance on our baseline dataset, the 57 Atari 2600 games in the ALE. We also use our algorithm to show state-of-the-art training performance of risk-sensitive policies in Atari games with policy optimization and evaluation.
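Quantile regression, which the abstract relies on to approximate the return distribution, penalizes over- and under-estimates asymmetrically according to the quantile level. The sketch below uses the plain pinball loss as a simplified stand-in; IQN-style methods commonly use a Huber-smoothed variant, and the function name and example values here are hypothetical.

```python
import numpy as np

def pinball_loss(predicted_quantiles, taus, target):
    """Quantile-regression (pinball) loss averaged over quantile levels."""
    q = np.asarray(predicted_quantiles, dtype=float)
    taus = np.asarray(taus, dtype=float)
    u = target - q  # positive when the prediction is below the target
    # Under-estimates are weighted by tau, over-estimates by (1 - tau).
    return float(np.mean(np.where(u >= 0, taus * u, (taus - 1.0) * u)))

loss = pinball_loss([0.0, 1.0, 2.0], taus=[0.25, 0.5, 0.75], target=1.0)
```

Minimizing this loss over many sampled returns pushes each prediction towards the quantile of the return distribution at its level `tau`.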

翻訳日:2022-06-15 02:46:43 公開日:2022-06-13

# (参考訳) TC-SfM:ロバストトラックコミュニティに基づく構造移動

TC-SfM: Robust Track-Community-Based Structure-from-Motion ( http://arxiv.org/abs/2206.05866v1 )

ライセンス: CC BY 4.0

Lei Wang, Linlin Ge, Shan Luo, Zihan Yan, Zhaopeng Cui and Jieqing Feng

(参考訳) Structure-from-Motion (SfM) は、入力画像間の対応に基づいて3次元シーン構造とカメラポーズを復元することを目的としており、二重構造(すなわち、強い視覚的類似性を持つ異なる構造)によって生じる曖昧さは、常に正しくないカメラポーズと3次元構造をもたらす。曖昧さに対処するために、既存の研究のほとんどは、2視点のジオメトリや特徴点を分析して追加の制約情報や暗黙の推論に頼っている。本稿では,現場における高次情報,すなわち地域空間の文脈情報を活用することを提案する。具体的には、各コミュニティがトラックのグループで構成され、シーン内の局所的なセグメントを表す、新しい構造、すなわち {\textit{track-community}}を提案する。コミュニティ検出アルゴリズムを使用して、シーンを複数のセグメントに分割する。そして、トラックの近傍を分析して潜在的な曖昧なセグメントを検出し、ポーズ整合性をチェックすることで補正する。最後に,各セグメントに部分的再構成を行い,両面の相対カメラポーズと3D-3D対応を考慮した新しい双方向整合コスト関数と整合する。実験の結果,視覚的に区別できない構造から生じる復元失敗をロバストに軽減し,部分的再構成を正確にマージできることがわかった。

Structure-from-Motion (SfM) aims to recover 3D scene structures and camera poses based on the correspondences between input images, and thus the ambiguity caused by duplicate structures (i.e., different structures with strong visual resemblance) always results in incorrect camera poses and 3D structures. To deal with the ambiguity, most existing studies resort to additional constraint information or implicit inference by analyzing two-view geometries or feature points. In this paper, we propose to exploit high-level information in the scene, i.e., the spatial contextual information of local regions, to guide the reconstruction. Specifically, a novel structure is proposed, namely, track-community, in which each community consists of a group of tracks and represents a local segment in the scene. A community detection algorithm is used to partition the scene into several segments. Then, the potential ambiguous segments are detected by analyzing the neighborhood of tracks and corrected by checking the pose consistency. Finally, we perform partial reconstruction on each segment and align them with a novel bidirectional consistency cost function which considers both 3D-3D correspondences and pairwise relative camera poses. Experimental results demonstrate that our approach can robustly alleviate reconstruction failure resulting from visually indistinguishable structures and accurately merge the partial reconstructions.

翻訳日:2022-06-15 02:30:10 公開日:2022-06-13

# (参考訳) Pseudo-Labeling の信頼性

Confident Sinkhorn Allocation for Pseudo-Labeling ( http://arxiv.org/abs/2206.05880v1 )

ライセンス: CC BY 4.0

Vu Nguyen and Sachin Farfade and Anton van den Hengel

(参考訳) 半教師付き学習は、ラベル付きデータへの機械学習の依存を減らす重要なツールである。しかし、その内在する空間的・意味的構造を利用して、主に画像や言語データに適用されている。これらのメソッドは、これらのドメイン構造が利用できないため、表データには適用されない。既存の擬似ラベル法(pl)は表データに有効であるが、ノイズサンプルや未知のしきい値が与えられたグリーディ代入に対して脆弱である。本稿では,信頼度の高い標本のみにラベルを割り当て,最適な輸送手段によって最適なラベル割り当てを学習するCSA(Confident Sinkhorn Allocation)を提案する。 CSAは、この事実上重要な領域における現在の最先端技術よりも優れています。

Semi-supervised learning is a critical tool in reducing machine learning's dependence on labeled data. It has, however, been applied primarily to image and language data, by exploiting the inherent spatial and semantic structure therein. These methods do not apply to tabular data because these domain structures are not available. Existing pseudo-labeling (PL) methods can be effective for tabular data but are vulnerable to noise samples and to greedy assignments given a predefined threshold which is unknown. This paper addresses this problem by proposing a Confident Sinkhorn Allocation (CSA), which assigns labels to only samples with high confidence scores and learns the best label allocation via optimal transport. CSA outperforms the current state-of-the-art in this practically important area.
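Label allocation via optimal transport, as mentioned above, can be illustrated with the standard Sinkhorn iteration. The sketch below is a generic entropy-regularized transport between samples and labels, not the CSA algorithm itself: the confidence filtering step is omitted, and the function name, the regularization value `reg`, and the score matrix are hypothetical.

```python
import numpy as np

def sinkhorn_allocation(scores, row_marginals, col_marginals, reg=0.5, n_iters=200):
    """Entropy-regularized optimal transport between samples (rows) and labels (columns)."""
    kernel = np.exp(scores / reg)  # a higher score makes the assignment cheaper
    u = np.ones(scores.shape[0])
    v = np.ones(scores.shape[1])
    for _ in range(n_iters):
        u = row_marginals / (kernel @ v)    # rescale rows to match sample masses
        v = col_marginals / (kernel.T @ u)  # rescale columns to match label masses
    return u[:, None] * kernel * v[None, :]

# Hypothetical class scores for four unlabeled samples and two labels.
scores = np.array([[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]])
plan = sinkhorn_allocation(scores, np.full(4, 0.25), np.full(2, 0.5))
labels = plan.argmax(axis=1)
```

Unlike a greedy threshold on each sample's score, the transport plan respects the label proportions given in `col_marginals` when assigning pseudo-labels.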

翻訳日:2022-06-15 02:08:24 公開日:2022-06-13

# (参考訳) 大規模バッチによるアンカークライアントのサンプリングによるフェデレーション学習の高速化

Accelerating Federated Learning via Sampling Anchor Clients with Large Batches ( http://arxiv.org/abs/2206.05891v1 )

ライセンス: CC BY 4.0

Feijie Wu, Song Guo, Zhihao Qu, Shiqi He, Ziming Liu

(参考訳) 最近の連合学習研究で大規模なバッチを使用すると収束率が向上するが、小さなバッチを使うよりも計算のオーバーヘッドが増大する。この制限を克服するため,我々は,参加者を時間変動確率に基づいてアンカーグループとマイナーグループに分離する統一フレームワークfedamdを提案する。アンカーグループの各クライアントは、大きなバッチを使用して勾配を計算する。 minerグループのクライアントは、シリアルミニバッチを使用して複数のローカルアップデートを実行し、各ローカルアップデートは、クライアントのブルジー平均から派生したグローバルターゲットによって間接的に制御される。その結果、マイナーグループは、グローバルモデルを更新するのに適応した、大域的最小化への最適化された更新に従う。 FedAMDは$\epsilon$-approximationによって測定され、一定の確率でアンカーをサンプリングすることで、非凸目的の下で$O(1/\epsilon)$の収束率を達成する。理論的結果は最先端のアルゴリズムであるBVR-L-SGDを$O(1/\epsilon^{3/2})$でかなり上回り、FedAMDは少なくとも$O(1/\epsilon)$通信オーバーヘッドを減らす。実世界のデータセットに関する実証的研究は、FedAMDの有効性を検証し、提案アルゴリズムの優位性を実証する。

Using large batches in recent federated learning studies has improved convergence rates, but it requires additional computation overhead compared to using small batches. To overcome this limitation, we propose a unified framework FedAMD, which disjoints the participants into anchor and miner groups based on time-varying probabilities. Each client in the anchor group computes the gradient using a large batch, which is regarded as its bullseye. Clients in the miner group perform multiple local updates using serial mini-batches, and each local update is also indirectly regulated by the global target derived from the average of clients' bullseyes. As a result, the miner group follows a near-optimal update towards the global minimizer, adapted to update the global model. Measured by $\epsilon$-approximation, FedAMD achieves a convergence rate of $O(1/\epsilon)$ under non-convex objectives by sampling an anchor with a constant probability. The theoretical result considerably surpasses the state-of-the-art algorithm BVR-L-SGD at $O(1/\epsilon^{3/2})$, while FedAMD reduces at least $O(1/\epsilon)$ communication overhead. Empirical studies on real-world datasets validate the effectiveness of FedAMD and demonstrate the superiority of our proposed algorithm.

翻訳日:2022-06-15 01:41:06 公開日:2022-06-13

# (参考訳) 幾何学的ガイドによる統合勾配

Geometrically Guided Integrated Gradients ( http://arxiv.org/abs/2206.05903v1 )

ライセンス: CC BY 4.0

Md Mahfuzur Rahman, Noah Lewis, Sergey Plis

(参考訳) 深層ニューラルネットワークの解釈可能性の方法は、主に元の入力や摂動入力に対するクラススコアの感度に焦点が当てられ、通常は実際の勾配や修正された勾配を用いて測定される。予測の裏にある理性を理解するために、モデルに依存しないアプローチを使う方法もある。本稿では,入力に対するモデルパラメータ空間の局所的幾何が,ポストホックな説明を改善する上でも有用であることを論じ,実証する。この目的を達成するために,従来の統合勾配法のように,線形経路に沿った勾配計算の上に構築する「幾何学的誘導型統合勾配」と呼ばれる解釈可能性手法を提案する。しかし、勾配情報を統合する代わりに、入力の複数のスケールバージョンからモデルの動的挙動を探索し、各入力に対する最良の属性をキャプチャする。提案手法が,主観的および定量的評価においてバニラや統合的勾配よりも優れていることを示す。また,従来のモデルランダム化試験を補完する「モデル摂動」正当性チェックを提案する。

Interpretability methods for deep neural networks mainly focus on the sensitivity of the class score with respect to the original or perturbed input, usually measured using actual or modified gradients. Some methods also use a model-agnostic approach to understanding the rationale behind every prediction. In this paper, we argue and demonstrate that local geometry of the model parameter space relative to the input can also be beneficial for improved post-hoc explanations. To achieve this goal, we introduce an interpretability method called "geometrically-guided integrated gradients" that builds on top of the gradient calculation along a linear path as traditionally used in integrated gradient methods. However, instead of integrating gradient information, our method explores the model's dynamic behavior from multiple scaled versions of the input and captures the best possible attribution for each input. We demonstrate through extensive experiments that the proposed approach outperforms vanilla and integrated gradients in subjective and quantitative assessment. We also propose a "model perturbation" sanity check to complement the traditionally used "model randomization" test.
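For reference, the conventional integrated gradients baseline that the abstract builds on averages gradients along a straight path from a baseline to the input. The sketch below shows that baseline only, not the geometrically guided variant; the toy model, the midpoint Riemann sum, and all names are assumptions made for illustration.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, n_steps=50):
    """Midpoint Riemann approximation of integrated gradients along a straight path."""
    alphas = (np.arange(n_steps) + 0.5) / n_steps
    path = baseline + alphas[:, None] * (x - baseline)
    avg_grad = np.mean([grad_fn(p) for p in path], axis=0)
    return (x - baseline) * avg_grad

# Toy model f(x) = sum(x**2), whose gradient is known in closed form.
f = lambda x: float(np.sum(x ** 2))
grad_f = lambda x: 2.0 * x

x = np.array([1.0, -2.0, 3.0])
baseline = np.zeros_like(x)
attributions = integrated_gradients(grad_f, x, baseline)
```

For this toy model the attributions sum to `f(x) - f(baseline)`, the completeness property that makes integrated gradients a common reference point for newer attribution methods.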

翻訳日:2022-06-15 01:39:18 公開日:2022-06-13

# (参考訳) 帯域制限関数の一般化におけるNN上のGNNの優位性

Superiority of GNN over NN in generalizing bandlimited functions ( http://arxiv.org/abs/2206.05904v1 )

ライセンス: CC BY 4.0

A. Martina Neuman, Rongrong Wang and Yuying Xie

(参考訳) 厳密な数学的議論を通じて、GNNアーキテクチャは、コンパクトな$d$次元ユークリッド格子上の帯域制限関数の近似において、NNのアーキテクチャよりも優れていることを示す。一様近似誤差である$o_{d}(2^{-\mathcal{m}^{1/d}}) を達成するために、前者は$\mathcal{m}$ の関数値しか必要とせず、この誤差率はnnsがより悪くなる可能性があるという意味で最適であることを示した。

We constructively show, via rigorous mathematical arguments, that GNN architectures outperform those of NN in approximating bandlimited functions on compact $d$-dimensional Euclidean grids. We show that the former only need $\mathcal{M}$ sampled functional values in order to achieve a uniform approximation error of $O_{d}(2^{-\mathcal{M}^{1/d}})$ and that this error rate is optimal, in the sense that NNs might achieve worse.

翻訳日:2022-06-15 01:27:02 公開日:2022-06-13

# (参考訳) INDIGO:ドメインの一般化のための固有のマルチモーダリティ

INDIGO: Intrinsic Multimodality for Domain Generalization ( http://arxiv.org/abs/2206.05912v1 )

ライセンス: CC BY 4.0

Puneet Mangla and Shivam Chandhok and Milan Aggarwal and Vineeth N Balasubramanian and Balaji Krishnamurthy

(参考訳) unseen domain(ドメインの一般化)の下で一般化するモデルには、ドメインに依存しない特徴表現を学習し、オブジェクトカテゴリを構成する基礎となるセマンティクスを捉えることが不可欠である。安価な弱教師付きノイズテキストアノテーションから全体表現を学習する弱教師付き視覚言語モデルへの最近の進歩は、異なるドメインで一般化する対象特性を捉えることによって意味理解の能力を示している。しかし、複数のソースドメインが関与する場合、データセット内の画像毎にテキストアノテーションをキュレートするコストは、その数に応じて数回爆発する可能性がある。これにより、プロセスが退屈で実現不可能になり、教師付き視覚言語アプローチを直接使用して、目に見えないドメイン上で最高の一般化を実現するのを妨げます。このことから,既存の事前学習型マルチモーダルネットワークからのマルチモーダル情報を「本質的な」方法で活用して,未知の領域下でのシステム一般化を実現する方法について検討した。そこで本研究では,これらの事前学習されたマルチモーダルネットワークに存在する本質的モダリティを,視覚モダリティとともに簡易かつエレガントに活用し,テスト時に未知領域への一般化を促進するためのドメイン一般化(indigo)のための本質的マルチモーダリティを提案する。我々はいくつかの領域一般化設定(ClosedDG, OpenDG, Limitedソース)を実験し、未確認領域における最先端の一般化性能を示す。さらに、INDIGOの総合的な理解を深めるために、徹底的な分析を行う。

For models to generalize under unseen domains (a.k.a. domain generalization), it is crucial to learn feature representations that are domain-agnostic and capture the underlying semantics that make up an object category. Recent advances towards weakly supervised vision-language models that learn holistic representations from cheap weakly supervised noisy text annotations have shown their ability on semantic understanding by capturing object characteristics that generalize under different domains. However, when multiple source domains are involved, the cost of curating textual annotations for every image in the dataset can blow up several times, depending on their number. This makes the process tedious and infeasible, hindering us from directly using these supervised vision-language approaches to achieve the best generalization on an unseen domain. Motivated by this, we study how multimodal information from existing pre-trained multimodal networks can be leveraged in an "intrinsic" way to make systems generalize under unseen domains. To this end, we propose IntriNsic multimodality for DomaIn GeneralizatiOn (INDIGO), a simple and elegant way of leveraging the intrinsic modality present in these pre-trained multimodal networks along with the visual modality to enhance generalization to unseen domains at test-time. We experiment on several Domain Generalization settings (ClosedDG, OpenDG, and Limited sources) and show state-of-the-art generalization performance on unseen domains. Further, we provide a thorough analysis to develop a holistic understanding of INDIGO.

翻訳日:2022-06-15 01:25:49 公開日:2022-06-13

# (参考訳) 量子化が一般化を改善する理由:二元重みニューラルネットワークのNTK

Why Quantization Improves Generalization: NTK of Binary Weight Neural Networks ( http://arxiv.org/abs/2206.05916v1 )

ライセンス: CC BY 4.0

Kaiqi Zhang, Ming Yin, Yu-Xiang Wang

(参考訳) 量子化されたニューラルネットワークは、推論中の空間と計算の複雑さを減らすため、多くの注目を集めている。さらに、量子化が暗黙の正則化として作用し、ニューラルネットワークの一般化性を向上させるという伝承もあるが、この興味深い民俗学を定式化する研究は存在しない。本稿では,ニューラルネットワークの2次重みを確率的ラウンドリングの下でのランダム変数とみなし,ニューラルネットワークの異なる層上の分布分布について検討する。本研究では,連続パラメータとスムーズなアクティベーション関数を持つニューラルネットワークである分布伝搬を近似する準ニューラルネットワークを提案する。この準ニューラルネットワークのニューラル・タンジェント・カーネル(NTK)を導出し、ランダム化スケールのガウス・カーネルに匹敵する約指数速度でNTKの固有値が崩壊することを示す。このことは、双対重みニューラルネットワークの再生カーネルヒルベルト空間(RKHS)が、実値重みを持つものと比較して関数の厳密な部分集合をカバーすることを示している。提案する擬似ニューラルネットワークがバイナリ重み付きニューラルネットワークを十分に近似できることを検証するために実験を行う。さらに、二元重みニューラルネットワークは、ガウスカーネルとラプラスカーネルの差に類似した実値重みニューラルネットワークと比較して、より低い一般化ギャップを与える。

Quantized neural networks have drawn a lot of attention as they reduce the space and computational complexity during inference. Moreover, there has been folklore that quantization acts as an implicit regularizer and thus can improve the generalizability of neural networks, yet no existing work formalizes this interesting folklore. In this paper, we take the binary weights in a neural network as random variables under stochastic rounding, and study the distribution propagation over different layers in the neural network. We propose a quasi neural network to approximate the distribution propagation, which is a neural network with continuous parameters and a smooth activation function. We derive the neural tangent kernel (NTK) for this quasi neural network, and show that the eigenvalues of the NTK decay at an approximately exponential rate, which is comparable to that of a Gaussian kernel with randomized scale. This in turn indicates that the Reproducing Kernel Hilbert Space (RKHS) of a binary weight neural network covers a strict subset of functions compared with the one with real value weights. We use experiments to verify that the quasi neural network we propose approximates a binary weight neural network well. Furthermore, a binary weight neural network gives a lower generalization gap compared with a real value weight neural network, which is similar to the difference between the Gaussian kernel and the Laplace kernel.
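The abstract treats each binary weight as a random variable under stochastic rounding. A minimal sketch of that idea follows, assuming a real-valued weight clipped to [-1, 1] and rounded to -1 or +1 so that its expectation equals the real weight. The function names and the clipping range are assumptions made for illustration, not the authors' code.

```python
import random


def stochastic_binarize(w, rng):
    """Round a real weight to -1 or +1 so that the expected value equals w."""
    w = max(-1.0, min(1.0, w))      # clipping to [-1, 1] is an assumption of this sketch
    p_plus = (1.0 + w) / 2.0        # P(b = +1); then E[b] = 2 * p_plus - 1 = w
    return 1 if rng.random() < p_plus else -1


def binary_weight_moments(w):
    """Mean and variance of the random binary weight produced above."""
    w = max(-1.0, min(1.0, w))
    return w, 1.0 - w * w           # Var[b] = E[b^2] - E[b]^2 = 1 - w^2
```

The mean and variance are what a layer-by-layer distribution propagation would carry forward: the rounded weight is unbiased, and its variance shrinks to zero as the real weight approaches -1 or +1.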

翻訳日:2022-06-15 01:10:30 公開日:2022-06-13

# (参考訳) 大腸癌における蛍光アンギオグラフィーの分類-予備報告

Fluorescence angiography classification in colorectal surgery -- A preliminary report ( http://arxiv.org/abs/2206.05935v1 )

ライセンス: CC BY-SA 4.0

Antonio S Soares, Sophia Bano, Neil T Clancy, Laurence B Lovat, Danail Stoyanov and Manish Chand

(参考訳) 背景:fluorescence angiographyは, 外科医が最適な灌流組織を選択できることで, 腹水漏出の軽減に非常に有望な結果を示している。しかし、蛍光信号の主観的な解釈は、異なる外科医間で大きな違いが存在するため、この技法の幅広い応用を妨げる。本研究の目的は,術中蛍光アンギオグラフィーデータに基づいて,大腸組織を "perfused" あるいは "not perfused" と分類する人工知能アルゴリズムを開発することである。方法:resnetアーキテクチャを用いた分類モデルを,3次紹介センターにおける大腸切除の蛍光血管造影ビデオのデータセットを用いて検討した。コロンの蛍光および非蛍光セグメントに対応するフレームを用いて分類アルゴリズムを訓練した。トレーニングセットに使用されていない患者のフレームを用いた検証を行い、同じ機器を用いて収集したデータと異なるカメラを用いて収集したデータの両方を含む。パフォーマンス指標が計算され、サリエンシーマップが出力をさらに分析するために使用された。組織分類に基づいて決定境界が同定された。結果: 畳み込みニューラルネットワークを7例の1790フレームで訓練し, 14例の24フレームで検証した。トレーニングセットの精度は100%,検証セットの精度は80%であった。リコールと精度はそれぞれトレーニングセットで100%と100%、検証セットで68.8%と91.7%であった。結論: 術中蛍光式血管造影の精度の高い自動分類が可能であり, 自動判定境界同定が可能である。これにより外科医は蛍光血管造影の技術を標準化できる。アルゴリズムをデプロイするWebベースのアプリが利用可能になった。

Background: Fluorescence angiography has shown very promising results in reducing anastomotic leaks by allowing the surgeon to select optimally perfused tissue. However, subjective interpretation of the fluorescent signal still hinders broad application of the technique, as significant variation between different surgeons exists. Our aim is to develop an artificial intelligence algorithm to classify colonic tissue as 'perfused' or 'not perfused' based on intraoperative fluorescence angiography data. Methods: A classification model with a ResNet architecture was trained on a dataset of fluorescence angiography videos of colorectal resections at a tertiary referral centre. Frames corresponding to fluorescent and non-fluorescent segments of colon were used to train a classification algorithm. Validation using frames from patients not used in the training set was performed, including both data collected using the same equipment and data collected using a different camera. Performance metrics were calculated, and saliency maps were used to further analyse the output. A decision boundary was identified based on the tissue classification. Results: A convolutional neural network was successfully trained on 1790 frames from 7 patients and validated on 24 frames from 14 patients. The accuracy was 100% on the training set and 80% on the validation set. Recall and precision were respectively 100% and 100% on the training set and 68.8% and 91.7% on the validation set. Conclusion: Automated classification of intraoperative fluorescence angiography with a high degree of accuracy is possible and allows automated decision boundary identification. This will enable surgeons to standardise the technique of fluorescence angiography. A web-based app was made available to deploy the algorithm.
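The results above report accuracy, recall and precision separately. The sketch below shows how the three follow from a binary confusion matrix, assuming 'perfused' is the positive class. The counts in the usage note are hypothetical and are not the study's data.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,   # share of all frames classified correctly
        "precision": tp / (tp + fp),     # share of predicted positives that are truly positive
        "recall": tp / (tp + fn),        # share of true positives that were found
    }
```

With hypothetical counts `tp=6, fp=2, fn=4, tn=8`, precision is 0.75 while recall is only 0.6: a classifier can rarely raise a false alarm and still miss a sizeable share of positive frames, which is the pattern the validation figures suggest.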

翻訳日:2022-06-15 01:09:11 公開日:2022-06-13

# (参考訳) リコメンダシステムのためのユニバーサルシーケンス表現学習に向けて

Towards Universal Sequence Representation Learning for Recommender Systems ( http://arxiv.org/abs/2206.05941v1 )

ライセンス: CC BY 4.0

Yupeng Hou, Shanlei Mu, Wayne Xin Zhao, Yaliang Li, Bolin Ding, Ji-Rong Wen

(参考訳) 効率的なシーケンシャルレコメンデータを開発するために,歴史的ユーザ行動のモデル化を目的とした一連のシーケンス表現学習(SRL)手法を提案する。既存のSRLメソッドの多くは、ユーザの好みをよりよく捉えるためにシーケンスモデルを開発するために明示的なアイテムIDに依存している。有効性はあるものの、アイテムIDを明示的にモデル化する制限のため、これらの手法を新しいレコメンデーションシナリオに移すことは困難である。この問題に取り組むため,我々はunisrecと呼ばれる新しいユニバーサルシーケンス表現学習手法を提案する。提案手法では,アイテムの関連記述テキストを用いて,異なるレコメンデーションシナリオ間で転送可能な表現を学習する。ユニバーサルアイテム表現を学習するために、パラメトリックホワイトニングとmixture-of-experts enhanced adaptorに基づく軽量なアイテムエンコーディングアーキテクチャを設計する。ユニバーサルシーケンス表現の学習には,複数領域の負をサンプリングして2つのコントラストプリトレーニングタスクを導入する。事前訓練されたユニバーサルシーケンス表現モデルにより,提案手法は帰納的あるいはトランスダクティブな設定の下で,パラメータ効率の良い方法で,新しいレコメンデーションドメインやプラットフォームに効果的に移行することができる。実世界のデータセット上で行った広範囲な実験により,提案手法の有効性が示された。特に,提案手法はクロスプラットフォーム環境での性能向上にもつながり,ユニバーサルsrl方式の強い転送性を示す。コードと事前訓練されたモデルは、https://github.com/RUCAIBox/UniSRec で入手できる。

In order to develop effective sequential recommenders, a series of sequence representation learning (SRL) methods have been proposed to model historical user behaviors. Most existing SRL methods rely on explicit item IDs for developing the sequence models to better capture user preference. Though effective to some extent, these methods are difficult to transfer to new recommendation scenarios, due to the limitation of explicitly modeling item IDs. To tackle this issue, we present a novel universal sequence representation learning approach, named UniSRec. The proposed approach utilizes the associated description text of items to learn transferable representations across different recommendation scenarios. For learning universal item representations, we design a lightweight item encoding architecture based on parametric whitening and a mixture-of-experts enhanced adaptor. For learning universal sequence representations, we introduce two contrastive pre-training tasks by sampling multi-domain negatives. With the pre-trained universal sequence representation model, our approach can be effectively transferred to new recommendation domains or platforms in a parameter-efficient way, under either inductive or transductive settings. Extensive experiments conducted on real-world datasets demonstrate the effectiveness of the proposed approach. In particular, our approach also leads to a performance improvement in a cross-platform setting, showing the strong transferability of the proposed universal SRL method. The code and pre-trained model are available at: https://github.com/RUCAIBox/UniSRec.

翻訳日:2022-06-15 00:56:48 公開日:2022-06-13

# (参考訳) Pro-TIP:TIP検出によるRObust自動超音波校正用ファントム

PRO-TIP: Phantom for RObust automatic ultrasound calibration by TIP detection ( http://arxiv.org/abs/2206.05962v1 )

ライセンス: CC BY 4.0

Matteo Ronchetti, Julia Rackerseder, Maria Tirindelli, Mehrdad Salehi, Nassir Navab, Wolfgang Wein, Oliver Zettinig

(参考訳) 追跡超音波プローブの自動校正法を提案する。この目的のために、高さの異なる9つの円錐からなるカスタムファントムを設計する。チップは複数のスイープにマッチするキーポイントとして使用される。畳み込みニューラルネットワークを用いてこれらを抽出し、超音波フレーム毎にコーンを分割し、スイープ全体にわたって追跡する。キャリブレーションはRANSACを用いて頑健に推定され、後に画像ベース技術を用いて洗練される。 phantomは3dプリントでき、最先端の方法よりも多くのアドバンテージを提供します。 phantomの設計とアルゴリズムコードはオンラインで無料で入手できる。ファントム自体が追跡対象を必要としないため,現在使用されている技術よりも使いやすさが向上している。この完全自動メソッドは、実験で示したように、新しいプローブと異なるベンダーに一般化します。このアプローチは、ドメインエキスパートが取得したキャリブレーションに匹敵する結果を生み出す。

We propose a novel method to automatically calibrate tracked ultrasound probes. To this end we design a custom phantom consisting of nine cones with different heights. The tips are used as key points to be matched between multiple sweeps. We extract them using a convolutional neural network to segment the cones in every ultrasound frame and then track them across the sweep. The calibration is robustly estimated using RANSAC and later refined employing image based techniques. Our phantom can be 3D-printed and offers many advantages over state-of-the-art methods. The phantom design and algorithm code are freely available online. Since our phantom does not require a tracking target on itself, ease of use is improved over currently used techniques. The fully automatic method generalizes to new probes and different vendors, as shown in our experiments. Our approach produces results comparable to calibrations obtained by a domain expert.
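The abstract states that the calibration is estimated robustly with RANSAC from key points matched between sweeps. The sketch below illustrates only the RANSAC principle on a deliberately simplified problem: a 2D translation between matched points, some of which are wrong matches. The real calibration is a more general spatial transform, and the function name, threshold and iteration count here are assumptions.

```python
import random


def ransac_translation(src, dst, threshold=0.5, iterations=100, seed=0):
    """Robustly estimate a 2D translation mapping src[i] to dst[i] despite outliers."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        i = rng.randrange(len(src))                  # minimal sample: one matched pair
        tx, ty = dst[i][0] - src[i][0], dst[i][1] - src[i][1]
        inliers = [
            k for k, (p, q) in enumerate(zip(src, dst))
            if ((p[0] + tx - q[0]) ** 2 + (p[1] + ty - q[1]) ** 2) ** 0.5 <= threshold
        ]
        if len(inliers) > len(best_inliers):         # keep the hypothesis with most support
            best_inliers = inliers
    # Refit on all inliers of the best hypothesis (least squares = mean offset).
    n = len(best_inliers)
    tx = sum(dst[k][0] - src[k][0] for k in best_inliers) / n
    ty = sum(dst[k][1] - src[k][1] for k in best_inliers) / n
    return (tx, ty), best_inliers
```

Hypotheses come from minimal samples and are scored by how many matches agree with them, so a few badly detected tips cannot drag the estimate the way they would in a plain least-squares fit.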

翻訳日:2022-06-15 00:38:24 公開日:2022-06-13

# (参考訳) ATDN vSLAM: 視覚的同時局所化とマッピングのための全スルーディープラーニングベースのソリューション

ATDN vSLAM: An all-through Deep Learning-Based Solution for Visual Simultaneous Localization and Mapping ( http://arxiv.org/abs/2206.05963v1 )

ライセンス: CC BY 4.0

M\'aty\'as Sz\'ant\'o, Gy\"orgy R. Bog\'ar, L\'aszl\'o Vajta

(参考訳) 本稿では,深層学習コンポーネントで構成された視覚同時局所化マッピング(vslam)のための新しい解法を提案する。提案されたアーキテクチャは高度にモジュール化されたフレームワークであり、各コンポーネントがビジョンベースのディープラーニングソリューションの各分野に最先端の成果を提供する。本論文は, これら個々のビルディングブロックの相乗的統合により, 機能的かつ効率的な全スルーディープニューラル(ATDN)vSLAMシステムを構築することができることを示す。 Embedding Distance Loss関数を導入し、それを使用してATDNアーキテクチャをトレーニングする。その結果、KITTIデータセットのサブセットで4.4%の変換と0.0176 deg/m回転誤差を達成した。提案アーキテクチャは、データベース作成を支援する効率的で低遅延の自律運転(AD)や、自律走行車(AV)制御の基礎として利用できる。

In this paper, a novel solution is introduced for visual Simultaneous Localization and Mapping (vSLAM) that is built up of Deep Learning components. The proposed architecture is a highly modular framework in which each component offers state-of-the-art results in its respective field of vision-based deep learning solutions. The paper shows that with the synergistic integration of these individual building blocks, a functioning and efficient all-through deep neural (ATDN) vSLAM system can be created. The Embedding Distance Loss function is introduced, and the ATDN architecture is trained using it. The resulting system achieved 4.4% translation and 0.0176 deg/m rotational error on a subset of the KITTI dataset. The proposed architecture can be used for efficient and low-latency autonomous driving (AD) aiding database creation as well as a basis for autonomous vehicle (AV) control.

翻訳日:2022-06-15 00:29:58 公開日:2022-06-13

# (参考訳) emprox:ニューラルアーキテクチャ探索のためのニューラルネットワーク性能推定

EmProx: Neural Network Performance Estimation For Neural Architecture Search ( http://arxiv.org/abs/2206.05972v1 )

ライセンス: CC BY 4.0

G.G.H. Franken, P. Singh, J. Vanschoren

(参考訳) 一般的なニューラルアーキテクチャ探索手法は、パフォーマンスを評価し最適なアーキテクチャを見つけるためにトレーニングを必要とする大量の候補アーキテクチャを生成する。検索時間を最小化するために、異なるパフォーマンス推定戦略を使用する。このような戦略の有効性は、正確性、適合性、クエリ時間によって異なる。本研究では,EmProx Score (Embedding Proximity Score) という新しい手法を提案する。ニューラルネットワーク最適化(nao)と同様に、この手法は候補アーキテクチャをエンコーダ-デコーダフレームワークを使用して連続的な埋め込み空間にマッピングする。次に、その性能が知られているアーキテクチャの埋め込みベクトルに基づいて、重み付きkNNを用いて候補の性能を推定する。本手法の性能評価は,NAO と比較して約9倍高速であり,NAO で使用される MLP 性能予測器と同等である。現在使用されている他のパフォーマンス評価戦略に対するベンチマークは、より正確で、5倍から80倍高速であることを示している。

Common Neural Architecture Search methods generate large amounts of candidate architectures that need training in order to assess their performance and find an optimal architecture. To minimize the search time we use different performance estimation strategies. The effectiveness of such strategies varies in terms of accuracy and of fit and query time. This study proposes a new method, EmProx Score (Embedding Proximity Score). Similar to Neural Architecture Optimization (NAO), this method maps candidate architectures to a continuous embedding space using an encoder-decoder framework. The performance of candidates is then estimated using weighted kNN based on the embedding vectors of architectures whose performance is known. Performance estimations of this method are comparable to the MLP performance predictor used in NAO in terms of accuracy, while being nearly nine times faster to train compared to NAO. Benchmarking against other performance estimation strategies currently used shows similar to better accuracy, while being five to eighty times faster.
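The estimation step described above, weighted kNN over embedding vectors of architectures with known performance, can be sketched as follows. Inverse-distance weighting and the function name are assumptions; the abstract does not state which weighting is used, and the embeddings would come from a trained encoder rather than being given by hand.

```python
import math


def weighted_knn_estimate(query, known_embeddings, known_scores, k=3, eps=1e-8):
    """Estimate a candidate's score from its k nearest evaluated neighbours."""
    nearest = sorted(
        (math.dist(query, emb), score)
        for emb, score in zip(known_embeddings, known_scores)
    )[:k]
    weights = [1.0 / (d + eps) for d, _ in nearest]   # closer neighbours count more
    return sum(w * s for w, (_, s) in zip(weights, nearest)) / sum(weights)
```

A query of this kind needs no extra training per candidate, only distance computations against the architectures already evaluated.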

翻訳日:2022-06-15 00:16:05 公開日:2022-06-13

# (参考訳) 線形時間時相論理に対するsahlqvist型対応定理

A Sahlqvist-style Correspondence Theorem for Linear-time Temporal Logic ( http://arxiv.org/abs/2206.05973v1 )

ライセンス: CC BY 4.0

Rui Li, Francesco Belardinelli

(参考訳) モーダル論理の言語はクリプキフレーム上の一階条件を表現することができる。 Henrik Sahlqvist による古典的な結果は、一階条件 (あるいは Sahlqvist 対応式) が効果的でアルゴリズム的な方法で発見できる、重要なモーダル公式のクラスを特定できる。最近の作品は、この古典的な結果をより複雑なモーダル言語に拡張することに成功している。本稿では,線形時時時論理 (LTL) に対する類似の行を追求し,時相仕様のための最も広く使われている形式言語の一つである Sahlqvist 形式の対応定理を開発する。 LTLは、基本的なモーダル論理の構文を拡張し、専用のテンポラル演算子Next X と until U を持つ。その結果、一階の対応式を持つ公式のクラスの複雑さも、それに応じて増加する。本稿では, モーダル作用素 F , G, X, U を用いて構築した LTL Sahlqvist 公式の有意なクラスを同定する。本論文の主な結果は、一階言語で定義可能なフレーム条件に対するltl sahlqvist公式の対応を証明することである。

The language of modal logic is capable of expressing first-order conditions on Kripke frames. The classic result by Henrik Sahlqvist identifies a significant class of modal formulas for which first-order conditions -- or Sahlqvist correspondents -- can be found in an effective, algorithmic way. Recent works have successfully extended this classic result to more complex modal languages. In this paper, we pursue a similar line and develop a Sahlqvist-style correspondence theorem for Linear-time Temporal Logic (LTL), which is one of the most widely used formal languages for temporal specification. LTL extends the syntax of basic modal logic with dedicated temporal operators Next X and Until U. As a result, the complexity of the class of formulas that have first-order correspondents also increases accordingly. In this paper, we identify a significant class of LTL Sahlqvist formulas built by using modal operators F, G, X, and U. The main result of this paper is to prove the correspondence of LTL Sahlqvist formulas to frame conditions that are definable in first-order language.
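For readers unfamiliar with correspondence theory, the textbook examples below show what a first-order correspondent looks like in basic modal logic. They are standard background, not results of this paper, and the accessibility-relation symbol R is the usual notation rather than the authors'. Each modal formula on the left is valid on a Kripke frame exactly when the frame satisfies the first-order condition on the right.

```latex
\begin{align*}
\Box p \to p \quad &\Longleftrightarrow \quad \forall x\, R(x,x) && \text{(reflexivity)}\\
\Box p \to \Box\Box p \quad &\Longleftrightarrow \quad \forall x\,\forall y\,\forall z\,\bigl(R(x,y) \land R(y,z) \to R(x,z)\bigr) && \text{(transitivity)}\\
p \to \Box\Diamond p \quad &\Longleftrightarrow \quad \forall x\,\forall y\,\bigl(R(x,y) \to R(y,x)\bigr) && \text{(symmetry)}
\end{align*}
```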

翻訳日:2022-06-15 00:06:18 公開日:2022-06-13

# (参考訳) ランク損失を用いたディープニューラルネットワークによる高速化故障時間モデル

Deep Neural Network Based Accelerated Failure Time Models using Rank Loss ( http://arxiv.org/abs/2206.05974v1 )

ライセンス: CC BY 4.0

Gwangsu Kim and Sangwook Kang

(参考訳) 加速故障時間(aft)モデルは、故障時間と一連の共変量との対数線形関係を仮定する。危険機能に取り組む他の一般的な生存モデルとは対照的に、共変量の影響は直感的に解釈される障害時間に直接影響する。誤差分布を規定しない半パラメトリックAFTモデルは、分布仮定から逸脱するために柔軟で堅牢である。望ましい特徴から、このタイプのモデルは、検閲された障害時間データの解析において、一般的なcoxモデルに代わる有望な選択肢と見なされている。しかしながら、これらの AFT モデルでは、平均に対する線形予測器が典型的に仮定される。平均をモデル化する際、予測子の非線形性についてはほとんど研究されていない。ディープニューラルネットワーク(DNN)は過去数十年にわたって注目され、様々な分野で大きな成功を収めてきた。 DNNにはいくつかの顕著な利点があり、非線形性に対処するのに特に有用であることが示されている。これを利用して,Gehan型損失モデルとサブサンプリング手法を組み合わせることで,AFTモデルにDNNを適用することを提案する。提案したDNNとランクベースAFTモデル(DeepR-AFT)の有限サンプル特性を広範囲にわたる刺激研究により検討した。 DeepR-AFTは、予測器が非線形である場合、パラメトリックまたはセミパラメトリックよりも優れた性能を示す。線形予測器の場合、共変量の大きさが大きい場合、DeepR-AFTはより良く動作する。提案するdeepr-aftは,その優位性を示す2つの実データセットを用いて示す。

An accelerated failure time (AFT) model assumes a log-linear relationship between failure times and a set of covariates. In contrast to other popular survival models that work on hazard functions, the effects of covariates are directly on failure times, whose interpretation is intuitive. The semiparametric AFT model that does not specify the error distribution is flexible and robust to departures from the distributional assumption. Owing to these desirable features, this class of models has been considered a promising alternative to the popular Cox model in the analysis of censored failure time data. However, in these AFT models, a linear predictor for the mean is typically assumed. Little research has addressed the nonlinearity of predictors when modeling the mean. Deep neural networks (DNNs) have received focal attention over the past decades and have achieved remarkable success in a variety of fields. DNNs have a number of notable advantages and have been shown to be particularly useful in addressing nonlinearity. By taking advantage of this, we propose to apply DNNs in fitting AFT models using a Gehan-type loss, combined with a sub-sampling technique. Finite sample properties of the proposed DNN and rank-based AFT model (DeepR-AFT) are investigated via an extensive simulation study. DeepR-AFT shows a superior performance over its parametric or semiparametric counterparts when the predictor is nonlinear. For linear predictors, DeepR-AFT performs better when the dimensions of covariates are large. The proposed DeepR-AFT is illustrated using two real datasets, which demonstrates its superiority.
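The Gehan-type loss mentioned above is rank-based: it compares residuals between pairs of subjects and only counts pairs whose first member is an observed (uncensored) failure. The sketch below implements the classical Gehan loss on log failure times, assuming `predictions` are the outputs of any predictor (a DNN in the paper). The exact variant and the sub-sampling scheme used by the authors may differ.

```python
def gehan_loss(log_times, predictions, events):
    """Classical Gehan loss: (1/n^2) * sum_i sum_j events[i] * max(0, e_j - e_i),
    where e_i = log_times[i] - predictions[i] is the residual of subject i."""
    residuals = [t - f for t, f in zip(log_times, predictions)]
    n = len(residuals)
    total = 0.0
    for i in range(n):
        if not events[i]:
            continue                                  # censored subjects do not anchor a pair
        for j in range(n):
            total += max(0.0, residuals[j] - residuals[i])
    return total / (n * n)
```

Because only residual differences enter, shifting every prediction by the same constant leaves the loss unchanged. The double loop costs O(n²) per evaluation, which is one plausible reason to pair the loss with sub-sampling.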

翻訳日:2022-06-14 23:47:38 公開日:2022-06-13

# (参考訳) トップ2のアルゴリズムが再検討

Top Two Algorithms Revisited ( http://arxiv.org/abs/2206.05979v1 )

ライセンス: CC BY-SA 4.0

Marc Jourdan, R\'emy Degenne, Dorian Baudry, Rianne de Heide and Emilie Kaufmann

(参考訳) トップ2のアルゴリズムは、トンプソンサンプリングを多腕バンディットモデル(Russo, 2016)の最も優れた腕識別に適応させたことで生まれた。彼らは2つの候補の腕、リーダーと挑戦者のランダム化によって次の腕を選択します。その優れた経験的性能にもかかわらず、固定信頼の最良の腕の識別に関する理論的保証は、既知のばらつきを持つガウス的腕のときのみ得られる。本稿では, リーダー, 挑戦者, および(多分非パラメトリックな)アーム分布の望ましい特性を識別する, 上位2つの方法の一般解析を行う。その結果,有界分布を持つ最適アーム識別のための理論的に支持されたトップ2アルゴリズムが得られた。提案手法は,トンプソンサンプリングから受け継いだリーダの選択に使用されるサンプリングステップが,経験的ベストアームの選択など他の選択に置き換えられることを示す。

Top Two algorithms arose as an adaptation of Thompson sampling to best arm identification in multi-armed bandit models (Russo, 2016), for parametric families of arms. They select the next arm to sample from by randomizing among two candidate arms, a leader and a challenger. Despite their good empirical performance, theoretical guarantees for fixed-confidence best arm identification have only been obtained when the arms are Gaussian with known variances. In this paper, we provide a general analysis of Top Two methods, which identifies desirable properties of the leader, the challenger, and the (possibly non-parametric) distributions of the arms. As a result, we obtain theoretically supported Top Two algorithms for best arm identification with bounded distributions. Our proof method demonstrates in particular that the sampling step used to select the leader inherited from Thompson sampling can be replaced by other choices, like selecting the empirical best arm.
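One round of a Top Two rule can be sketched as below, assuming unit-variance Gaussian arms with a flat prior, so that the posterior of arm a is N(means[a], 1/counts[a]). The `leader_rule` switch mirrors the abstract's remark that the Thompson-sampled leader can be replaced by the empirical best arm. The resampling cap and its fallback are practical assumptions of this sketch, not part of the published algorithms.

```python
import random


def top_two_step(means, counts, beta, rng, leader_rule="thompson", max_resamples=1000):
    """Pick the next arm to sample: the leader with probability beta, else a challenger."""
    def posterior_sample():
        return [rng.gauss(m, (1.0 / n) ** 0.5) for m, n in zip(means, counts)]

    def argmax(values):
        return max(range(len(values)), key=values.__getitem__)

    # Leader: best arm of one posterior draw, or simply the empirical best arm.
    leader = argmax(posterior_sample()) if leader_rule == "thompson" else argmax(means)
    if rng.random() < beta:
        return leader
    # Challenger: redraw from the posterior until a different arm comes out on top.
    for _ in range(max_resamples):
        candidate = argmax(posterior_sample())
        if candidate != leader:
            return candidate
    # Fallback (assumption): best empirical mean among the remaining arms.
    others = [a for a in range(len(means)) if a != leader]
    return max(others, key=means.__getitem__)
```

Every arm must have been sampled at least once before the first call so that `counts` contains no zeros.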

翻訳日:2022-06-14 23:28:24 公開日:2022-06-13

# (参考訳) dnnの注意を誘導する効率的なヒューマン・イン・ザ・ループシステム

Efficient Human-in-the-loop System for Guiding DNNs Attention ( http://arxiv.org/abs/2206.05981v1 )

ライセンス: CC BY 4.0

Yi He, Xi Yang, Chia-Ming Chang, Haoran Xie, Takeo Igarashi

(参考訳) 注意指導は、ディープラーニングにおけるデータセットバイアスに対処するためのアプローチであり、モデルが決定を下すのに誤った機能に依存している。画像分類タスクに着目し,ユーザが指定した領域への分類器の注意を対話的に誘導し,共起バイアスの影響を低減し,DNNの伝達性と解釈性を向上させる。注意誘導のための従来のアプローチでは、ピクセルレベルのアノテーションの準備が必要であり、インタラクティブシステムとして設計されていない。本稿では,ユーザが簡単なクリックで画像に注釈を付けるための新しい対話的手法と,アノテーション数を大幅に減らすための新しいアクティブラーニング戦略を提案する。提案システムを複数のデータセット上で評価するために,数値評価とユーザ調査を行った。通常、大量のポリゴンベースのセグメンテーションマスクを使用して微調整やDNNの訓練を行う既存の非アクティブラーニングアプローチと比較して、我々のシステムは多くの労力とお金を節約し、データセットにバイアスがかかってもよりうまく機能する微調整ネットワークを得ることができる。実験結果から,提案システムの有効性,妥当性,信頼性が示唆された。

Attention guidance is an approach to addressing dataset bias in deep learning, where the model relies on incorrect features to make decisions. Focusing on image classification tasks, we propose an efficient human-in-the-loop system to interactively direct the attention of classifiers to the regions specified by users, thereby reducing the influence of co-occurrence bias and improving the transferability and interpretability of a DNN. Previous approaches for attention guidance require the preparation of pixel-level annotations and are not designed as interactive systems. We present a new interactive method to allow users to annotate images with simple clicks, and study a novel active learning strategy to significantly reduce the number of annotations. We conducted both a numerical evaluation and a user study to evaluate the proposed system on multiple datasets. Compared to the existing non-active-learning approach which usually relies on huge amounts of polygon-based segmentation masks to fine-tune or train the DNNs, our system can save lots of labor and money and obtain a fine-tuned network that works better even when the dataset is biased. The experiment results indicate that the proposed system is efficient, reasonable, and reliable.

翻訳日:2022-06-14 23:26:59 公開日:2022-06-13

# (参考訳) 高品質GAN潜時サンプリングのためのハッチネス先行探索と爆発

Exploring and Exploiting Hubness Priors for High-Quality GAN Latent Sampling ( http://arxiv.org/abs/2206.06014v1 )

ライセンス: CC BY 4.0

Yuanbang Liang, Jing Wu, Yu-Kun Lai, Yipeng Qin

(参考訳) gans(generative adversarial network)に関する広範な研究にもかかわらず、その潜在空間から高品質の画像を確実にサンプリングする方法は、未検討のトピックである。本稿では, GAN潜伏分布の偏りを探索し, 利用することにより, 新たなGAN潜伏サンプリング手法を提案する。我々の重要な洞察は、GAN潜伏空間の高次元性は必然的に、潜伏空間の他の潜伏空間よりもはるかに大きなサンプリング密度を持つハブ潜伏空間の出現につながるということである。その結果、これらのハブ潜伏剤はより訓練され、高品質な画像の合成に寄与する。後方の「チェリーピッキング」と異なり,画像合成前に高品質な潜伏剤を識別する前駆的手法であるため,この手法は高効率である。さらに, 広く知られているが純粋に経験的切断トリックは, ハブ潜伏体の中央クラスタリング効果に対するナイーブな近似であり, 切断トリックの理論的根拠を明らかにするだけでなく, 本手法の優越性と基礎性も示す。その結果,提案手法の有効性が示された。

Despite the extensive studies on Generative Adversarial Networks (GANs), how to reliably sample high-quality images from their latent spaces remains an under-explored topic. In this paper, we propose a novel GAN latent sampling method by exploring and exploiting the hubness priors of GAN latent distributions. Our key insight is that the high dimensionality of the GAN latent space will inevitably lead to the emergence of hub latents that usually have much larger sampling densities than other latents in the latent space. As a result, these hub latents are better trained and thus contribute more to the synthesis of high-quality images. Unlike a posteriori "cherry-picking", our method is highly efficient as it is an a priori method that identifies high-quality latents before the synthesis of images. Furthermore, we show that the well-known but purely empirical truncation trick is a naive approximation to the central clustering effect of hub latents, which not only uncovers the rationale of the truncation trick, but also indicates the superiority and fundamentality of our method. Extensive experimental results demonstrate the effectiveness of the proposed method.
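Hubness is commonly measured by the k-occurrence of a point: how often it appears among the k nearest neighbours of the other points. The sketch below computes that count for a small set of vectors, assuming Euclidean distance. How the authors score and select hub latents in detail is not stated in the abstract, so this is an illustration of the general notion only.

```python
import math


def k_occurrence(points, k):
    """For each point, count how many other points list it among their k nearest neighbours."""
    counts = [0] * len(points)
    for i, p in enumerate(points):
        neighbours = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )[:k]
        for j in neighbours:
            counts[j] += 1
    return counts
```

Under this measure, sampled latents with the largest counts would be the hub candidates, identified before any image is synthesized.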

翻訳日:2022-06-14 23:08:13 公開日:2022-06-13

# (参考訳) 実処理インメモリシステムにおける機械学習トレーニング

Machine Learning Training on a Real Processing-in-Memory System ( http://arxiv.org/abs/2206.06022v1 )

ライセンス: CC BY 4.0

Juan G\'omez-Luna, Yuxin Guo, Sylvan Brocard, Julien Legriel, Remy Cimadomo, Geraldo F. Oliveira, Gagandeep Singh, Onur Mutlu

(参考訳) 機械学習アルゴリズムのトレーニングは計算集約的なプロセスであり、大規模なトレーニングデータセットに繰り返しアクセスするため、メモリバウンドが頻繁に発生する。その結果、プロセッサ中心のシステム(CPU、GPUなど)は、大量のエネルギーと実行サイクルを消費するメモリユニットと処理ユニットの間のコストのかかるデータ移動に悩まされる。メモリ中心のコンピューティングシステム、すなわち、PIM(Process-in-Memory)機能を備えたコンピューティングシステムは、このデータ移動のボトルネックを軽減することができる。我々の目標は、機械学習のトレーニングを加速するために、現代の汎用PIMアーキテクチャの可能性を理解することである。そのために,(1)実世界の汎用pimアーキテクチャ上で,いくつかの代表的な古典的機械学習アルゴリズム(線形回帰,ロジスティック回帰,決定木,k-平均クラスタリング)を実装し,(2)正確性,性能,スケーリングの観点から特徴付けし,(3)cpuとgpu上での実装と比較する。 2500以上のPIMコアを持つメモリ中心型コンピューティングシステムに対する実験的な評価は、PIMハードウェアで必要な操作やデータタイプをネイティブにサポートする場合、汎用PIMアーキテクチャがメモリバウンド機械学習ワークロードを大幅に高速化できることを示している。我々の知る限り、我々の研究は、現実世界の汎用PIMアーキテクチャにおける機械学習アルゴリズムのトレーニングを評価する最初のものである。

Training machine learning algorithms is a computationally intensive process, which is frequently memory-bound due to repeatedly accessing large training datasets. As a result, processor-centric systems (e.g., CPU, GPU) suffer from costly data movement between memory units and processing units, which consumes large amounts of energy and execution cycles. Memory-centric computing systems, i.e., computing systems with processing-in-memory (PIM) capabilities, can alleviate this data movement bottleneck. Our goal is to understand the potential of modern general-purpose PIM architectures to accelerate machine learning training. To do so, we (1) implement several representative classic machine learning algorithms (namely, linear regression, logistic regression, decision tree, K-means clustering) on a real-world general-purpose PIM architecture, (2) characterize them in terms of accuracy, performance and scaling, and (3) compare to their counterpart implementations on CPU and GPU. Our experimental evaluation on a memory-centric computing system with more than 2500 PIM cores shows that general-purpose PIM architectures can greatly accelerate memory-bound machine learning workloads, when the necessary operations and datatypes are natively supported by PIM hardware. To our knowledge, our work is the first one to evaluate training of machine learning algorithms on a real-world general-purpose PIM architecture.

翻訳日:2022-06-14 22:47:15 公開日:2022-06-13

# (参考訳) TriMix: 自己教師型学習のための仮想埋め込みと自己整合性

TriMix: Virtual embeddings and self-consistency for self-supervised learning ( http://arxiv.org/abs/2206.06023v1 )

ライセンス: CC BY 4.0

Tariq Bdair, Hossam Abdelhamid, Nassir Navab, and Shadi Albarqouni

(参考訳) 自己教師付き学習(SSL)は、教師付き学習モデルのトレーニングにおいて、高コストとデータ制限のために最近注目を集めている。 SSLの現在のパラダイムは、入力空間におけるデータ拡張を利用して、同じイメージの異なるビューを作成し、類似したイメージ間の表現を最大化し、異なるイメージに対して最小化するモデルをトレーニングすることだ。このアプローチは、様々な下流タスクをもたらす最先端(SOTA)を実現するが、しかしながら、潜伏空間の増大を調査する機会を秘めている。本稿では,データの線形補間により仮想埋め込みを生成するSSLの新しい概念であるTriMixを提案する。我々の戦略は、仮想空間からオリジナルの埋め込みを抽出するためにモデルを訓練することに焦点を当てている。さらに,仮想と実際の埋め込みの整合性を改善する自己整合性項を提案する。我々はTriMixを、自然画像と医用画像からなる8つのベンチマークデータセットで検証し、両方のデータ型で2番目に良いモデルよりも2.71%と0.41%改善した。さらに,本手法は半教師付き学習,特に低データ体制において,現在の手法よりも優れていた。さらに、トレーニング済みのモデルは、他のデータセットへの転送性が向上しました。

Self-supervised Learning (SSL) has recently gained much attention due to the high cost and data limitation in the training of supervised learning models. The current paradigm in SSL is to utilize data augmentation at the input space to create different views of the same images and train a model to maximize the agreement between representations of similar images and minimize it for different ones. While this approach achieves state-of-the-art (SOTA) results in various downstream tasks, it still lacks the opportunity to investigate latent space augmentation. This paper proposes TriMix, a novel concept for SSL that generates virtual embeddings through linear interpolation of the data, thus providing the model with novel representations. Our strategy focuses on training the model to extract the original embeddings from virtual ones and hence learn better representations. Additionally, we propose a self-consistency term that improves the consistency between the virtual and actual embeddings. We validate TriMix on eight benchmark datasets consisting of natural and medical images, with improvements of 2.71% and 0.41% over the second-best models for the two data types. Further, our approach outperformed the current methods in semi-supervised learning, particularly in low data regimes. In addition, our pre-trained models showed better transfer to other datasets.
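The core operation named in the abstract, linear interpolation that produces a virtual embedding, can be sketched as follows. This is a two-way mix with a known coefficient; the function names are assumptions, and the closed-form inversion stands in for what the paper trains a model to do.

```python
def mix_embeddings(a, b, lam):
    """Virtual embedding: lam * a + (1 - lam) * b, element-wise."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def recover_first(virtual, b, lam):
    """Invert the mix for a when b and lam are known (requires lam != 0)."""
    return [(v - (1.0 - lam) * y) / lam for v, y in zip(virtual, b)]
```

The inversion only shows that the original embedding remains recoverable from the virtual one. In the described method a network has to learn this extraction without being handed the other embedding.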

翻訳日:2022-06-14 22:34:38 公開日:2022-06-13

# (参考訳) 分光データに基づく機械学習のための普遍的合成データセット

A universal synthetic dataset for machine learning on spectroscopic data ( http://arxiv.org/abs/2206.06031v1 )

ライセンス: CC BY 4.0

Jan Schuetzke, Nathan J. Szymanski, Markus Reischl

(参考訳) 分光データの自動分類のための機械学習手法の開発を支援するため,モデル検証に使用できる普遍的な合成データセットを作成した。このデータセットは、x線回折、核磁気共鳴、ラマン分光法などの手法による実験的な測定を表現するために設計された人工スペクトルを含んでいる。データセット生成プロセスは、スキャンの長さやピーク数などのカスタマイズ可能なパラメータを特徴としており、これは手元の問題に合わせて調整することができる。最初のベンチマークとして、500のユニークなクラスに基づいて、35,000のスペクトルを含むデータセットをシミュレートした。このデータの分類を自動化するために、8つの異なる機械学習アーキテクチャを評価した。結果から,分類タスクの最適性能を達成する上で,どの要因が最も重要かを明らかにした。合成スペクトルを生成するためのスクリプトとベンチマークデータセットと評価ルーチンは、分光分析のための改良された機械学習モデルの開発を支援するために公開されている。

To assist in the development of machine learning methods for automated classification of spectroscopic data, we have generated a universal synthetic dataset that can be used for model validation. This dataset contains artificial spectra designed to represent experimental measurements from techniques including X-ray diffraction, nuclear magnetic resonance, and Raman spectroscopy. The dataset generation process features customizable parameters, such as scan length and peak count, which can be adjusted to fit the problem at hand. As an initial benchmark, we simulated a dataset containing 35,000 spectra based on 500 unique classes. To automate the classification of this data, eight different machine learning architectures were evaluated. From the results, we shed light on which factors are most critical to achieve optimal performance for the classification task. The scripts used to generate synthetic spectra, as well as our benchmark dataset and evaluation routines, are made publicly available to aid in the development of improved machine learning models for spectroscopic analysis.
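A generator of the kind described, with scan length and peak count as adjustable parameters, might look like the sketch below. Gaussian peak shapes, the height range and all function names are assumptions for illustration; the authors' published scripts are the authoritative implementation.

```python
import math
import random


def render_spectrum(positions, heights, width, scan_length):
    """Sum of Gaussian peaks evaluated on an integer grid of scan_length points."""
    return [
        sum(h * math.exp(-0.5 * ((x - p) / width) ** 2) for p, h in zip(positions, heights))
        for x in range(scan_length)
    ]


def random_class_spectrum(scan_length, peak_count, width, rng):
    """Draw peak positions and heights at random to define one synthetic class."""
    positions = [rng.uniform(0, scan_length - 1) for _ in range(peak_count)]
    heights = [rng.uniform(0.2, 1.0) for _ in range(peak_count)]
    return render_spectrum(positions, heights, width, scan_length)
```

Training samples for a class could then be produced by jittering the positions and heights of its reference peaks, which is one way to mimic experimental variation.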

翻訳日:2022-06-14 22:13:35 公開日:2022-06-13

# (参考訳) Bluetooth低エネルギー信号とIMUセンサによる自動接触追跡

Automatic Contact Tracing using Bluetooth Low Energy Signals and IMU Sensor Readings ( http://arxiv.org/abs/2206.06033v1 )

ライセンス: CC BY 4.0

Suriyadeepan Ramamoorthy, Joyce Mahon, Michael O'Mahony, Jean Francois Itangayenda, Tendai Mukande, Tlamelo Makati

(参考訳) 本稿では,2台の携帯電話間の距離を推定する必要がある機械学習センター(ml-labs)の課題に対する解決策を提案する。 NIST Too Close For Too Long (TC4TL) Challengeの修正版であり、時間的側面は除外されている。本稿では,Bluetooth RSSI と IMU センサデータに基づく特徴に基づく手法を提案する。距離とBluetooth RSSI の読み方との関係について興味深い知見が得られたモデルに関するアブレーション研究を行った。

In this report, we present our solution to the challenge provided by the SFI Centre for Machine Learning (ML-Labs) in which the distance between two phones needs to be estimated. It is a modified version of the NIST Too Close For Too Long (TC4TL) Challenge, as the time aspect is excluded. We propose a feature-based approach based on Bluetooth RSSI and IMU sensory data, that outperforms the previous state of the art by a significant margin, reducing the error down to 0.071. We perform an ablation study of our model that reveals interesting insights about the relationship between the distance and the Bluetooth RSSI readings.
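For context on the relationship between distance and RSSI that the ablation study examines, the sketch below uses the standard log-distance path-loss model. It is not the authors' feature-based method. The reference RSSI at 1 m and the path-loss exponent are hypothetical calibration values that vary by device and environment.

```python
def rssi_to_distance(rssi_dbm, rssi_at_1m=-59.0, path_loss_exponent=2.0):
    """Log-distance path-loss model: distance in metres implied by an RSSI reading."""
    return 10 ** ((rssi_at_1m - rssi_dbm) / (10.0 * path_loss_exponent))
```

With these values, a reading 20 dB below the 1 m reference maps to about 10 m. Real readings are noisy enough that a single RSSI value is a weak distance estimate on its own.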

翻訳日:2022-06-14 22:05:07 公開日:2022-06-13

# (参考訳) 自動脳腫瘍の表現型を臨床画像へ変換する

Translating automated brain tumour phenotyping to clinical neuroimaging ( http://arxiv.org/abs/2206.06120v1 )

ライセンス: CC BY 4.0

James K Ruffle, Samia Mohinta, Robert J Gray, Harpreet Hyare, Parashkev Nachev

(参考訳) 背景:脳腫瘍の複雑な異質性がますます認識されてきているため、日常的な臨床治療から引き出された本格的な大規模コレクションのみを要求できる。これは、現代の機械学習が、特にニューロイメージングにおいて促進できるタスクであるが、実際の臨床実践で一般的な不完全なデータを扱う能力は未だ不明である。本稿では, 大規模多地点MRIデータに最先端の手法を適用し, 臨床で観察される様々な完全性のレベルを再現する自動腫瘍分割モデルの比較忠実度を定量化する。方法: 深層学習(nnU-Net由来) 腫瘍分画モデルとT1, 造影T1, T2, FLAIR画像シーケンスの組合せを比較し, 2021BraTS-RSNAグリオーマ群1251例の5倍のクロスバリデーションを訓練し, 実世界の50例を対象に検討した。結果: 非完全データセグメント化病変をよく訓練したモデルは,完全データで訓練されたものと同等であり,全腫瘍のDice係数0.907(単一シーケンス)から0.945(フルデータセット),成分組織型の0.701(単一シーケンス)から0.891(フルデータセット)を示した。不完全なデータセグメンテーションモデルは、コントラストイメージングの欠如による腫瘍の増大を正確に検出し、その体積を0.95～0.97のR2で定量化した。結論: ディープラーニングセグメンテーションモデルは、データ不足時に腫瘍をうまく特徴づけ、コントラストを使わずに拡張組織を検出できる。これは、不完全なデータが一般的である臨床実践への翻訳が、これまで考えられていたよりも容易であり、コントラストの使用への依存を減らすのに有用であることを示唆している。

Background: The complex heterogeneity of brain tumours is increasingly recognized to demand data of magnitudes and richness only fully-inclusive, large-scale collections drawn from routine clinical care could plausibly offer. This is a task contemporary machine learning could facilitate, especially in neuroimaging, but its ability to deal with incomplete data common in real world clinical practice remains unknown. Here we apply state-of-the-art methods to large scale, multi-site MRI data to quantify the comparative fidelity of automated tumour segmentation models replicating the various levels of completeness observed in clinical reality. Methods: We compare deep learning (nnU-Net-derived) tumour segmentation models with all possible combinations of T1, contrast-enhanced T1, T2, and FLAIR imaging sequences, trained and validated with five-fold cross-validation on the 2021 BraTS-RSNA glioma population of 1251 patients, and tested on a diverse, real-world 50 patient sample. Results: Models trained on incomplete data segmented lesions well, often equivalently to those trained on complete data, exhibiting Dice coefficients of 0.907 (single sequence) to 0.945 (full datasets) for whole tumours, and 0.701 (single sequence) to 0.891 (full datasets) for component tissue types. Incomplete data segmentation models could accurately detect enhancing tumour in the absence of contrast imaging, quantifying its volume with an R² between 0.95 and 0.97. Conclusions: Deep learning segmentation models characterize tumours well when data are missing and can even detect enhancing tissue without the use of contrast. This suggests translation to clinical practice, where incomplete data is common, may be easier than hitherto believed, and may be of value in reducing dependence on contrast use.
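Two quantities in the abstract are easy to make concrete: the number of input configurations implied by "all possible combinations" of four sequences, and the Dice coefficient used to score segmentations. The sketch below assumes flat binary masks and uses `T1CE` as a hypothetical short label for contrast-enhanced T1.

```python
from itertools import combinations

SEQUENCES = ("T1", "T1CE", "T2", "FLAIR")   # "T1CE" is a label chosen for this sketch


def sequence_subsets(sequences=SEQUENCES):
    """Every non-empty combination of the available imaging sequences."""
    return [c for r in range(1, len(sequences) + 1) for c in combinations(sequences, r)]


def dice(pred, target):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two flat binary masks."""
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    size = sum(1 for p in pred if p) + sum(1 for t in target if t)
    return 1.0 if size == 0 else 2.0 * intersection / size
```

Four sequences give 2^4 - 1 = 15 non-empty subsets, ranging from single-sequence inputs to the full set.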

Translated: 2022-06-14 21:54:41 Published: 2022-06-13

# The Classification of Optical Galaxy Morphology Using Unsupervised Learning Techniques ( http://arxiv.org/abs/2206.06165v1 )

License: CC BY 4.0

Ezra Fielding, Clement N. Nyirenda, Mattia Vaccari

The advent of large-scale, data-intensive astronomical surveys has caused the viability of human-based galaxy morphology classification methods to come into question. Put simply, too much astronomical data is being produced for scientists to visually label. Attempts have been made to crowd-source this work by recruiting volunteers from the general public. However, even these efforts will soon fail to keep up with data produced by modern surveys. Unsupervised learning techniques do not require existing labels to classify data and could pave the way to unplanned discoveries. Therefore, this paper aims to implement unsupervised learning algorithms to classify the Galaxy Zoo DECaLS dataset without human supervision. First, a convolutional autoencoder was implemented as a feature extractor. The extracted features were then clustered via k-means, fuzzy c-means and agglomerative clustering to provide classifications. The results were compared to the volunteer classifications of the Galaxy Zoo DECaLS dataset. Agglomerative clustering generally produced the best results; however, the performance gain over k-means clustering was not significant. With the appropriate optimizations, this approach could be used to provide classifications for the better-performing Galaxy Zoo DECaLS decision tree questions. Ultimately, this unsupervised learning approach provided valuable insights and results that were useful to scientists.
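
The clustering stage described above can be illustrated with plain k-means (Lloyd's algorithm). This is only a sketch: the two-dimensional toy vectors stand in for autoencoder features, and the fixed initial centroids and iteration count are assumptions chosen to keep the example deterministic, not settings from the paper.

```python
def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm on lists of feature vectors."""
    centroids = [list(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(
                range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])),
            )
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy "latent features" (made up): two well-separated groups.
features = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
labels, centroids = kmeans(features, centroids=[features[0], features[3]])
print(labels)
```

In practice the cluster labels would then be compared against the volunteer classifications, as the abstract describes.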

Translated: 2022-06-14 21:38:13 Published: 2022-06-13

# Symbolic Regression for Space Applications: Differentiable Cartesian Genetic Programming Powered by Multi-objective Memetic Algorithms ( http://arxiv.org/abs/2206.06213v1 )

License: CC BY 4.0

Marcus Märtens and Dario Izzo

Interpretable regression models are important for many application domains, as they allow experts to understand relations between variables from sparse data. Symbolic regression addresses this issue by searching the space of all possible free-form equations that can be constructed from elementary algebraic functions. While explicit mathematical functions can be rediscovered this way, the determination of unknown numerical constants during search has been an often neglected issue. We propose a new multi-objective memetic algorithm that exploits a differentiable Cartesian Genetic Programming encoding to learn constants during evolutionary loops. We show that this approach is competitive with or outperforms machine-learned black-box regression models or hand-engineered fits for two applications from space: the Mars Express thermal power estimation and the determination of the age of stars by gyrochronology.
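
To illustrate what learning constants inside a candidate expression can look like, the sketch below refines the single constant `c` of the fixed expression `c * x**2` by gradient descent on the mean squared error. It is not the authors' algorithm: the expression, the hand-written derivative (a differentiable encoding would supply it automatically), the learning rate, and the data are all assumptions made for illustration.

```python
def fit_constant(xs, ys, c=1.0, lr=0.01, steps=200):
    """Gradient descent on the MSE for the constant c in the expression c * x**2."""
    n = len(xs)
    for _ in range(steps):
        # d/dc of mean((c*x^2 - y)^2) = mean(2 * (c*x^2 - y) * x^2)
        grad = sum(2.0 * (c * x * x - y) * x * x for x, y in zip(xs, ys)) / n
        c -= lr * grad
    return c


# Toy data generated from y = 2.5 * x**2 (made-up target constant).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.5 * x * x for x in xs]
c_hat = fit_constant(xs, ys)
print(c_hat)
```

In a memetic scheme, a local refinement of this kind would be applied to candidate expressions produced by the evolutionary search.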

Translated: 2022-06-14 21:26:20 Published: 2022-06-13

# Learning a Degradation-Adaptive Network for Light Field Image Super-Resolution ( http://arxiv.org/abs/2206.06214v1 )

License: CC BY 4.0

Yingqian Wang, Zhengyu Liang, Longguang Wang, Jungang Yang, Wei An, Yulan Guo

Recent years have witnessed the great advances of deep neural networks (DNNs) in light field (LF) image super-resolution (SR). However, existing DNN-based LF image SR methods are developed on a single fixed degradation (e.g., bicubic downsampling), and thus cannot be applied to super-resolve real LF images with diverse degradations. In this paper, we propose the first method to handle LF image SR with multiple degradations. In our method, a practical LF degradation model that considers blur and noise is developed to approximate the degradation process of real LF images. Then, a degradation-adaptive network (LF-DAnet) is designed to incorporate the degradation prior into the SR process. By training on LF images with multiple synthetic degradations, our method can learn to adapt to different degradations while incorporating the spatial and angular information. Extensive experiments on both synthetically degraded and real-world LFs demonstrate the effectiveness of our method. Compared with existing state-of-the-art single and LF image SR methods, our method achieves superior SR performance under a wide range of degradations, and generalizes better to real LF images. Codes and models are available at https://github.com/YingqianWang/LF-DAnet.

Translated: 2022-06-14 21:16:57 Published: 2022-06-13

# Making Sense of Dependence: Efficient Black-box Explanations Using Dependence Measure ( http://arxiv.org/abs/2206.06219v1 )

License: CC BY 4.0

Paul Novello, Thomas Fel, David Vigouroux

This paper presents a new efficient black-box attribution method based on the Hilbert-Schmidt Independence Criterion (HSIC), a dependence measure based on Reproducing Kernel Hilbert Spaces (RKHS). HSIC measures the dependence between regions of an input image and the output of a model based on kernel embeddings of distributions. It thus provides explanations enriched by RKHS representation capabilities. HSIC can be estimated very efficiently, significantly reducing the computational cost compared to other black-box attribution methods. Our experiments show that HSIC is up to 8 times faster than the previous best black-box attribution methods while being as faithful. Indeed, we improve or match the state of the art of both black-box and white-box attribution methods for several fidelity metrics on ImageNet with various recent model architectures. Importantly, we show that these advances can be transposed to efficiently and faithfully explain object detection models such as YOLOv4. Finally, we extend the traditional attribution methods by proposing a new kernel enabling an orthogonal decomposition of importance scores based on HSIC, allowing us to evaluate not only the importance of each image patch but also the importance of their pairwise interactions.
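
For readers unfamiliar with HSIC, the sketch below computes the standard biased empirical estimator, trace(K H L H) / (n - 1)^2, for two scalar samples. The RBF kernel, its bandwidth, and the toy data are assumptions chosen for illustration; the paper applies the measure to image-patch perturbations and model outputs, which this example does not reproduce.

```python
import math


def rbf_gram(values, sigma=1.0):
    """Gram matrix of an RBF kernel on scalar samples."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for b in values]
            for a in values]


def center(gram):
    """Double-centre a Gram matrix, i.e. compute H K H with H = I - (1/n) 11^T."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    col_means = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_means) / n
    return [[gram[i][j] - row_means[i] - col_means[j] + grand for j in range(n)]
            for i in range(n)]


def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2."""
    n = len(x)
    k_centered = center(rbf_gram(x, sigma))
    l_gram = rbf_gram(y, sigma)
    # trace(K H L H) equals the elementwise sum of (H K H) * L for symmetric Gram matrices.
    trace = sum(k_centered[i][j] * l_gram[i][j] for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2


x = [0.0, 0.5, 1.0, 1.5, 2.0]
dependent = hsic(x, x)          # y is a copy of x
independent = hsic(x, [3.0] * 5)  # y is constant, so it carries no information about x
print(dependent, independent)
```

A constant second variable yields a value of zero up to rounding, while an exact copy yields a strictly positive value.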

Translated: 2022-06-14 20:41:39 Published: 2022-06-13

# Explainability-by-Design: A Methodology to Support Explanations in Decision-Making Systems ( http://arxiv.org/abs/2206.06251v1 )

License: CC BY-SA 4.0

Trung Dong Huynh, Niko Tsakalakis, Ayah Helal, Sophie Stalla-Bourdillon, Luc Moreau

Algorithms play a key role nowadays in many technological systems that control or affect various aspects of our lives. As a result, providing explanations to address the needs of users and organisations is increasingly expected by laws and regulations, codes of conduct, and the public. However, as laws and regulations do not prescribe how to meet such expectations, organisations are often left to devise their own approaches to explainability, inevitably increasing the cost of compliance and good governance. Hence, we put forth "Explainability by Design", a holistic methodology characterised by proactive measures to include explanation capability in the design of decision-making systems. This paper describes the technical steps of the Explainability-by-Design methodology in a software engineering workflow to implement explanation capability from requirements elicited by domain experts for a specific application context. Outputs of the Explainability-by-Design methodology are a set of configurations, allowing a reusable service, called the Explanation Assistant, to exploit logs provided by applications and create provenance traces that can be queried to extract relevant data points, which in turn can be used in explanation plans to construct explanations personalised to their consumers. Following those steps, organisations will be able to design their decision-making systems to produce explanations that meet the specified requirements, be it from laws, regulations, or business needs. We apply the methodology to two applications, resulting in a deployment of the Explanation Assistant demonstrating explanation capabilities. Finally, the associated development costs are measured, showing that the approach to construct explanations is tractable in terms of development time, which can be as low as two hours per explanation sentence.

Translated: 2022-06-14 20:16:54 Published: 2022-06-13

# Distributed Adversarial Training to Robustify Deep Neural Networks at Scale ( http://arxiv.org/abs/2206.06257v1 )

License: CC BY 4.0

Gaoyuan Zhang, Songtao Lu, Yihua Zhang, Xiangyi Chen, Pin-Yu Chen, Quanfu Fan, Lee Martie, Lior Horesh, Mingyi Hong, Sijia Liu

Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification. To defend against such attacks, an effective and popular approach, known as adversarial training (AT), has been shown to mitigate the negative impact of adversarial attacks by virtue of a min-max robust training method. While effective, it remains unclear whether it can successfully be adapted to the distributed learning context. The power of distributed optimization over multiple machines enables us to scale up robust training over large models and datasets. Spurred by that, we propose distributed adversarial training (DAT), a large-batch adversarial training framework implemented over multiple machines. We show that DAT is general: it supports training over labeled and unlabeled data, multiple types of attack generation methods, and gradient compression operations favored for distributed optimization. Theoretically, we provide, under standard conditions in optimization theory, the convergence rate of DAT to first-order stationary points in general non-convex settings. Empirically, we demonstrate that DAT either matches or outperforms state-of-the-art robust accuracies and achieves a graceful training speedup (e.g., on ResNet-50 under ImageNet). Codes are available at https://github.com/dat-2022/dat.
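
The min-max robust training objective mentioned above is commonly written in the following form. The notation is an assumption of this note rather than the paper's: $\theta$ denotes the model parameters, $\mathcal{D}$ the data distribution, $\ell$ the training loss, $\delta$ the input perturbation, and $\epsilon$ the perturbation budget under an $\ell_p$ norm.

```latex
\min_{\theta} \; \mathbb{E}_{(x, y) \sim \mathcal{D}}
\left[ \max_{\|\delta\|_{p} \le \epsilon} \ell(\theta;\, x + \delta,\, y) \right]
```

The inner maximization generates the attack and the outer minimization trains the model against it.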

Translated: 2022-06-14 19:52:54 Published: 2022-06-13

# Automatic Polyp Segmentation with Multiple Kernel Dilated Convolution Network ( http://arxiv.org/abs/2206.06264v1 )

License: CC BY 4.0

Nikhil Kumar Tomar, Abhishek Srivastava, Ulas Bagci, Debesh Jha

The detection and removal of precancerous polyps through colonoscopy is the primary technique for the prevention of colorectal cancer worldwide. However, the miss rate of colorectal polyps varies significantly among endoscopists. It is well known that a computer-aided diagnosis (CAD) system can assist endoscopists in detecting colon polyps and minimize the variation among endoscopists. In this study, we introduce a novel deep learning architecture, named MKDCNet, for automatic polyp segmentation robust to significant changes in polyp data distribution. MKDCNet is simply an encoder-decoder neural network that uses the pre-trained ResNet50 as the encoder and a novel multiple kernel dilated convolution (MKDC) block that expands the field of view to learn more robust and heterogeneous representations. Extensive experiments on four publicly available polyp datasets and a cell nuclei dataset show that the proposed MKDCNet outperforms the state-of-the-art methods when trained and tested on the same dataset as well as when tested on unseen polyp datasets from different distributions. With rich results, we demonstrated the robustness of the proposed architecture. From an efficiency perspective, our algorithm can process approximately 45 frames per second on an RTX 3090 GPU. MKDCNet can be a strong benchmark for building real-time systems for clinical colonoscopies. The code of the proposed MKDCNet is available at https://github.com/nikhilroxtomar/MKDCNet.
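
The way dilation expands the field of view can be seen in a one-dimensional toy example. The sketch below is not the MKDC block itself: the 1-D setting, the kernel, and the dilation rates are assumptions chosen only to show that the same three-tap kernel covers 3, 5, or 7 input positions as the dilation grows.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1-D cross-correlation with gaps of `dilation` between kernel taps."""
    span = (len(kernel) - 1) * dilation  # distance between the first and last tap
    return [
        sum(w * signal[start + k * dilation] for k, w in enumerate(kernel))
        for start in range(len(signal) - span)
    ]


signal = [1, 2, 3, 4, 5, 6, 7, 8, 9]
kernel = [1, 0, -1]  # simple difference filter (made up)
outputs = {d: dilated_conv1d(signal, kernel, d) for d in (1, 2, 3)}
receptive_fields = {d: (len(kernel) - 1) * d + 1 for d in (1, 2, 3)}
print(outputs)
print(receptive_fields)
```

With the same number of weights, a larger dilation compares samples that are further apart, which is the sense in which the field of view widens.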

Translated: 2022-06-14 19:20:50 Published: 2022-06-13

# Learning Joint Surface Atlases ( http://arxiv.org/abs/2206.06273v1 )

License: CC BY 4.0

Theo Deprelle, Thibault Groueix, Noam Aigerman, Vladimir G. Kim and Mathieu Aubry

This paper describes new techniques for learning atlas-like representations of 3D surfaces, i.e. homeomorphic transformations from a 2D domain to surfaces. Compared to prior work, we propose two major contributions. First, instead of mapping a fixed 2D domain, such as a set of square patches, to the surface, we learn a continuous 2D domain with arbitrary topology by optimizing a point sampling distribution represented as a mixture of Gaussians. Second, we learn consistent mappings in both directions: charts, from the 3D surface to the 2D domain, and parametrizations, their inverse. We demonstrate that this improves the quality of the learned surface representation, as well as its consistency in a collection of related shapes. It thus leads to improvements for applications such as correspondence estimation, texture transfer, and consistent UV mapping. As an additional technical contribution, we outline that, while incorporating normal consistency has clear benefits, it leads to issues in the optimization, and that these issues can be mitigated using a simple repulsive regularization. We demonstrate that our contributions provide better surface representation than existing baselines.

Translated: 2022-06-14 19:07:10 Published: 2022-06-13

# On the reusability of samples in active learning ( http://arxiv.org/abs/2206.06276v1 )

License: CC BY-SA 4.0

Gijs van Tulder and Marco Loog

An interesting but not extensively studied question in active learning is that of sample reusability: to what extent can samples selected for one learner be reused by another? This paper explains why sample reusability is of practical interest, why reusability can be a problem, how reusability could be improved by importance-weighted active learning, and which obstacles to universal reusability remain. With theoretical arguments and practical demonstrations, this paper argues that universal reusability is impossible. Because every active learning strategy must undersample some areas of the sample space, learners that depend on the samples in those areas will learn more from a random sample selection. This paper describes several experiments with importance-weighted active learning that show the impact of the reusability problem in practice. The experiments confirmed that universal reusability does not exist, although in some cases (on some datasets and with some pairs of classifiers) there is sample reusability. Finally, this paper explores the conditions that could guarantee the reusability between two classifiers.
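
The idea behind importance weighting can be sketched as follows: each candidate is queried with its own probability p, and a queried item receives weight 1/p so that weighted totals stay unbiased with respect to the full pool. The query rule, the pool, and the number of trials below are hypothetical choices for illustration and are not taken from the paper.

```python
import random


def query_probability(x):
    """Hypothetical rule: query points near the boundary x = 0.5 more often."""
    return max(0.1, 1.0 - 2.0 * abs(x - 0.5))


def importance_weighted_sample(pool, rng):
    """Query each item with probability p and weight the queried items by 1/p."""
    selected = []
    for x in pool:
        p = query_probability(x)
        if rng.random() < p:
            selected.append((x, 1.0 / p))
    return selected


pool = [i / 10 for i in range(11)]
rng = random.Random(0)
trials = 4000
# Averaged over many draws, the total weight approaches the pool size,
# which is the unbiasedness property that importance weighting provides.
mean_total_weight = sum(
    sum(w for _, w in importance_weighted_sample(pool, rng)) for _ in range(trials)
) / trials
print(len(pool), round(mean_total_weight, 2))
```

Items far from the assumed boundary are rarely queried, so although their weights are large, few of them are available, which is the undersampling effect the abstract refers to.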

Translated: 2022-06-14 18:54:03 Published: 2022-06-13

# Mitigating health disparities in hospital readmissions ( http://arxiv.org/abs/2206.06279v1 )

License: CC BY 4.0

Shaina Raza

The management of hyperglycemia in hospitalized patients has a significant impact on both morbidity and mortality. This study used a large clinical database to predict the need for diabetic patients to be hospitalized, which could lead to improvements in patient safety. These predictions, however, may be vulnerable to health disparities caused by social determinants such as race, age, and gender. These biases must be removed early in the data collection process, before they enter the system and are reinforced by model predictions, resulting in biases in the model's decisions. In this paper, we propose a machine learning pipeline capable of making predictions as well as detecting and mitigating biases. This pipeline analyses clinical data, determines whether biases exist, removes them, and then makes predictions. We demonstrate the classification accuracy and fairness in model predictions using experiments. The results show that when we mitigate biases early in a model, we get fairer predictions. We also find that as we get better fairness, we sacrifice a certain level of accuracy, which is also validated in previous studies. We invite the research community to contribute to identifying additional factors that contribute to health disparities that can be addressed through this pipeline.
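
One common way to check whether predictions differ across groups is the demographic parity difference, the gap between the groups' positive-prediction rates. The abstract does not say which fairness metric the pipeline uses, so the metric, the function names, and the toy data below are assumptions for illustration only.

```python
def positive_rate(predictions, groups, group):
    """Share of positive predictions (1s) within one group."""
    selected = [p for p, g in zip(predictions, groups) if g == group]
    return sum(selected) / len(selected)


def demographic_parity_difference(predictions, groups):
    """Largest gap in positive-prediction rate between any two groups."""
    rates = {g: positive_rate(predictions, groups, g) for g in set(groups)}
    return max(rates.values()) - min(rates.values()), rates


# Made-up binary predictions (1 = predicted readmission) for two groups.
predictions = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap, rates = demographic_parity_difference(predictions, groups)
print(gap, rates)
```

A gap of zero would mean both groups receive positive predictions at the same rate; a bias-mitigation step aims to shrink this gap, typically at some cost in accuracy.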

Translated: 2022-06-14 18:33:19 Published: 2022-06-13

# Multi-Agent Neural Rewriter for Vehicle Routing with Limited Disclosure of Costs ( http://arxiv.org/abs/2206.05990v1 )

License: see the linked page

Nathalie Paul, Tim Wirtz, Stefan Wrobel, Alexander Kister

We interpret solving the multi-vehicle routing problem as a team Markov game with partially observable costs. For a given set of customers to serve, the playing agents (vehicles) have the common goal of determining the team-optimal agent routes with minimal total cost. Each agent thereby observes only its own cost. Our multi-agent reinforcement learning approach, the so-called multi-agent Neural Rewriter, builds on the single-agent Neural Rewriter to solve the problem by iteratively rewriting solutions. Parallel agent action execution and partial observability require new rewriting rules for the game. We propose the introduction of a so-called pool in the system, which serves as a collection point for unvisited nodes. It enables agents to act simultaneously and exchange nodes in a conflict-free manner. We realize limited disclosure of agent-specific costs by only sharing them during learning. During inference, each agent acts in a decentralized manner, solely based on its own cost. First empirical results on small problem sizes demonstrate that we reach a performance close to the employed OR-Tools benchmark, which operates in the perfect cost information setting.

Translated: 2022-06-14 18:24:10 Published: 2022-06-13

# A Novel Multi-Layer Modular Approach for Real-Time Gravitational-Wave Detection ( http://arxiv.org/abs/2206.06004v1 )

License: see the linked page

Francesco Pio Barone, Daniele Dell'Aquila, Marco Russo

Advanced LIGO and Advanced Virgo ground-based interferometers are poised to probe an unprecedentedly large volume of space, enhancing the discovery power of the observations to even new sources of gravitational wave emitters. In this scenario, the development of highly optimized gravitational wave detection algorithms is crucial. We propose a novel layered framework for real-time detection of gravitational waves inspired by speech processing techniques and, in the present implementation, based on a state-of-the-art machine learning approach involving a hybridization of genetic programming and neural networks. The key aspects of the newly proposed framework are the well-structured, layered approach and the low computational complexity. The paper describes the basic concepts of the framework and the derivation of the first three layers. Even if, in the present implementation, the layers are based on models derived using a machine learning approach, the proposed layered structure has a universal nature. To train and test the models, we used simulated binary black hole gravitational wave waveforms in synthetic Gaussian noise representative of the Advanced LIGO design sensitivity. Compared to more complex approaches, such as convolutional neural networks, our framework, even using the simple ground model described in the paper, has similar performance but with a much lower computational complexity and a higher degree of modularity. Furthermore, the underlying exploitation of short-term features makes the results of the new framework virtually independent of the time position of gravitational wave signals, simplifying its future exploitation in real-time multi-layer pipelines for gravitational-wave detection with second-generation interferometers.

Translated: 2022-06-14 18:23:54 Published: 2022-06-13

# Neuromorphic Wireless Cognition: Event-Driven Semantic Communications for Remote Inference ( http://arxiv.org/abs/2206.06047v1 )

License: see the linked page

Jiechen Chen, Nicolas Skatchkovsky, Osvaldo Simeone

Neuromorphic computing is an emerging computing paradigm that moves away from batched processing towards the online, event-driven processing of streaming data. Neuromorphic chips, when coupled with spike-based sensors, can inherently adapt to the "semantics" of the data distribution by consuming energy only when relevant events are recorded in the timing of spikes and by providing a low-latency response to changing conditions in the environment. This paper proposes an end-to-end design for a neuromorphic wireless Internet-of-Things system that integrates spike-based sensing, processing, and communication. In the proposed NeuroComm system, each sensing device is equipped with a neuromorphic sensor, a spiking neural network (SNN), and an impulse radio transmitter with multiple antennas. Transmission takes place over a shared fading channel to a receiver equipped with a multi-antenna impulse radio receiver and with an SNN. In order to enable adaptation of the receiver to the fading channel conditions, we introduce a hypernetwork to control the weights of the decoding SNN using pilots. Pilots, encoding SNNs, decoding SNN, and hypernetwork are jointly trained across multiple channel realizations. The proposed system is shown to significantly improve over conventional frame-based digital solutions, as well as over alternative non-adaptive training methods, in terms of time-to-accuracy and energy consumption metrics.

Translated: 2022-06-14 18:23:28 Published: 2022-06-13

# Low-complexity deep learning frameworks for acoustic scene classification ( http://arxiv.org/abs/2206.06057v1 )

License: see the linked page

Lam Pham, Dat Ngo, Anahid Jalali, Alexander Schindler

In this report, we present low-complexity deep learning frameworks for acoustic scene classification (ASC). The proposed frameworks can be separated into four main steps: front-end spectrogram extraction, online data augmentation, back-end classification, and late fusion of predicted probabilities. In particular, we initially transform audio recordings into Mel, Gammatone, and CQT spectrograms. Next, the data augmentation methods of Random Cropping, SpecAugment, and Mixup are applied to generate augmented spectrograms before they are fed into deep-learning-based classifiers. Finally, to achieve the best performance, we fuse the probabilities obtained from three individual classifiers, which are independently trained with the three types of spectrograms. Our experiments conducted on the DCASE 2022 Task 1 Development dataset have fulfilled the requirement of low complexity and achieved the best classification accuracy of 60.1%, improving on the DCASE baseline by 17.2%.
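
The final step, late fusion of predicted probabilities, can be sketched as below. The abstract does not state the fusion rule, so plain averaging is an assumption here, and the three probability vectors are made-up outputs standing in for the Mel, Gammatone, and CQT classifiers.

```python
def late_fusion(prob_vectors):
    """Average the class probabilities predicted by several classifiers."""
    n_models = len(prob_vectors)
    return [sum(column) / n_models for column in zip(*prob_vectors)]


# Hypothetical per-class probabilities from three independently trained classifiers.
mel = [0.6, 0.3, 0.1]
gammatone = [0.2, 0.5, 0.3]
cqt = [0.1, 0.6, 0.3]

fused = late_fusion([mel, gammatone, cqt])
predicted_class = max(range(len(fused)), key=fused.__getitem__)
print(fused, predicted_class)
```

In this toy case the Mel classifier alone would pick class 0, while the fused prediction picks class 1, which shows how fusion can override a single model.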

Translated: 2022-06-14 18:23:03 Published: 2022-06-13

# Robust Time Series Denoising with Learnable Wavelet Packet Transform ( http://arxiv.org/abs/2206.06126v1 )

License: see the linked page

Gaetan Frusque, Olga Fink

In many applications, signal denoising is often the first pre-processing step before any subsequent analysis or learning task. In this paper, we propose to apply a deep learning denoising model inspired by signal processing: a learnable version of the wavelet packet transform. The proposed algorithm has significant learning capabilities with few interpretable parameters and has an intuitive initialisation. We propose a post-learning modification of the parameters to adapt the denoising to different noise levels. We evaluate the performance of the proposed methodology on two case studies and compare it to other state-of-the-art approaches, including wavelet shrinkage denoising, convolutional neural network, autoencoder and U-net deep models. The first case study is based on designed functions that have typically been used to study the denoising properties of algorithms. The second case study is an audio background removal task. We demonstrate how the proposed algorithm relates to the universality of signal processing methods and the learning capabilities of deep learning approaches. In particular, we evaluate the obtained denoising performances on structured noisy signals inside and outside the classes used for training. In addition to performing well in denoising signals inside and outside the training class, our method is shown to be particularly robust when different noise levels, noise types and artifacts are added.
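
As background for the wavelet shrinkage baseline named above, here is a sketch of classical one-level Haar denoising with soft thresholding. It is not the learnable wavelet packet model of the paper: the fixed Haar filter, the single decomposition level, the threshold value, and the toy signal are all assumptions for illustration.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(x):
    """One-level Haar transform of an even-length signal."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Exact inverse of haar_forward."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out


def soft_threshold(c, t):
    """Shrink a coefficient towards zero by t; small coefficients become zero."""
    return math.copysign(max(abs(c) - t, 0.0), c)


def denoise(x, threshold):
    """Keep the approximation, shrink the detail coefficients, and reconstruct."""
    approx, detail = haar_forward(x)
    detail = [soft_threshold(d, threshold) for d in detail]
    return haar_inverse(approx, detail)


noisy = [1.0, 1.2, 0.9, 1.1, 5.0, 5.1, 4.9, 5.2]  # made-up step signal with small noise
print(denoise(noisy, 1.0))
```

With a threshold of zero the signal is reconstructed unchanged; with a threshold above all detail magnitudes, each pair of samples is replaced by its mean.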

翻訳日:2022-06-14 18:22:35 公開日:2022-06-13
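The abstract above compares against wavelet shrinkage denoising. As a rough illustration of that classical baseline only (not the learnable wavelet packet model the authors propose), the sketch below applies one level of the orthonormal Haar transform and soft-thresholds the detail coefficients. The function names, the single decomposition level, and the fixed threshold are all assumptions made for this example.

```python
import math


def haar_forward(x):
    """One level of the orthonormal Haar transform; len(x) must be even."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward, interleaving the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    x = []
    for a, d in zip(approx, detail):
        x.extend([(a + d) / s, (a - d) / s])
    return x


def soft_threshold(c, t):
    """Shrink a coefficient towards zero by t, setting small ones to zero."""
    return math.copysign(max(abs(c) - t, 0.0), c)


def denoise(x, threshold):
    """Shrink only the detail coefficients, then reconstruct the signal."""
    approx, detail = haar_forward(x)
    detail = [soft_threshold(d, threshold) for d in detail]
    return haar_inverse(approx, detail)


# Piecewise-constant signal with a small alternating perturbation.
noisy = [1.1, 0.9, 1.1, 0.9, 5.1, 4.9, 5.1, 4.9]
clean = denoise(noisy, threshold=0.2)
```

With a threshold of zero the transform reconstructs the input exactly; with a threshold above the noise amplitude in the detail band, the alternating perturbation is removed while the step between the two plateaus is kept.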

# 階層的相関再構成を用いた活動銀河核の赤方偏移の確率分布予測

Predicting conditional probability distributions of redshifts of Active Galactic Nuclei using Hierarchical Correlation Reconstruction ( http://arxiv.org/abs/2206.06194v1 )

ライセンス: Link先を確認

Jarek Duda

(参考訳) 一般に値の予測に焦点が当てられているが、実データは条件付き確率分布のみを予測でき、条件付きエントロピー$H(Y|X)$で制限される。さらに不確実性を推定すれば、予測値をラプラス分布のガウス中心として扱うことができ、これは実データの複雑な条件分布とはかけ離れた理想化である。本稿では,複数モーメント様パラメータの独立なMSE推定により,比較的複雑な条件分布(マルチモーダルなど)を安価に予測するために階層的相関再構成(HCR)手法を適用する。この目的のために線形回帰を用いて解釈可能なモデルを得る:条件付きモーメントに対する特徴の寄与を記述する係数を持つ。本稿では,第4のfermi-lat data release 2 (4lac) データセットに基づく活動銀河核の赤方偏移予測の実用的問題に着目し,特徴量最適化とl1"lasso"正則化にcanonical correlation analysis (cca) を用いた最初のアプローチを拡張した。

While there is a general focus on prediction of values, real data often only allows predicting conditional probability distributions, with capabilities bounded by conditional entropy $H(Y|X)$. If additionally estimating uncertainty, we can treat a predicted value as the center of a Gaussian or Laplace distribution, an idealization which can be far from the complex conditional distributions of real data. This article applies the Hierarchical Correlation Reconstruction (HCR) approach to inexpensively predict quite complex conditional probability distributions (e.g. multimodal) by independent MSE estimation of multiple moment-like parameters, which allow reconstructing the conditional distribution. Using linear regression for this purpose, we get interpretable models, with coefficients describing contributions of features to conditional moments. This article extends the original approach, especially by using Canonical Correlation Analysis (CCA) for feature optimization and l1 "lasso" regularization, focusing on the practical problem of predicting the redshift of Active Galactic Nuclei (AGN) based on the Fourth Fermi-LAT Data Release 2 (4LAC) dataset.

翻訳日:2022-06-14 18:22:13 公開日:2022-06-13

# モデル不確かさ下におけるマルコフ決定過程

Markov Decision Processes under Model Uncertainty ( http://arxiv.org/abs/2206.06109v1 )

ライセンス: Link先を確認

Ariel Neufeld, Julian Sester, Mario \v{S}iki\'c

(参考訳) 離散時間無限地平線設定におけるモデル不確実性の下でのマルコフ決定問題の一般的な枠組みを紹介する。動的プログラミングの原理を提供することにより、局所的-グローバルなパラダイム、すなわち、一段階の堅牢な最適化問題を解くことで、大域的(無限の時間ステップ)頑健な確率的最適制御問題の最適化と、それに対応する最悪の尺度が得られる。さらに、このフレームワークをS&P500のデータを含むポートフォリオ最適化に適用する。 2つの異なる曖昧性集合を提示する。1つは経験的測度の周りのwasserstein-ballによって与えられたデータ駆動であり、もう1つはパラメータの不確かさ集合がデータから推定される多変量正規分布のパラメトリック集合によって記述される。市場が変動的あるいは不安定なシナリオでは、対応する堅牢な最適化問題からの最適ポートフォリオ戦略がモデル不確実性のないポートフォリオよりも優れており、モデル不確実性を考慮することの重要性が示される。

We introduce a general framework for Markov decision problems under model uncertainty in a discrete-time infinite horizon setting. By providing a dynamic programming principle we obtain a local-to-global paradigm, namely solving a local, i.e., a one time-step, robust optimization problem leads to an optimizer of the global (i.e. infinite time-steps) robust stochastic optimal control problem, as well as to a corresponding worst-case measure. Moreover, we apply this framework to portfolio optimization involving data of the S&P 500. We present two different types of ambiguity sets; one is fully data-driven, given by a Wasserstein-ball around the empirical measure, the second one is described by a parametric set of multivariate normal distributions, where the corresponding uncertainty sets of the parameters are estimated from the data. It turns out that in scenarios where the market is volatile or bearish, the optimal portfolio strategies from the corresponding robust optimization problem outperform the ones without model uncertainty, showcasing the importance of taking model uncertainty into account.

翻訳日:2022-06-14 18:21:47 公開日:2022-06-13
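The local, one time-step robust optimization problem mentioned in the abstract above can be pictured as a max-min choice: for each action, take the worst expected reward over an ambiguity set of models, then pick the action whose worst case is best. The sketch below is a toy version under strong assumptions that are not in the paper: a finite action set, a finite ambiguity set of two probability vectors (instead of a Wasserstein ball or a parametric family), and made-up reward numbers.

```python
def expected(reward, probs):
    """Expected reward of one action under one candidate model."""
    return sum(r * p for r, p in zip(reward, probs))


def robust_one_step(rewards, ambiguity_set):
    """Return the action maximising the worst-case expected reward.

    rewards: dict mapping action name -> list of rewards, one per outcome.
    ambiguity_set: list of probability vectors over the outcomes.
    """
    best_action, best_value = None, float("-inf")
    for action, reward in rewards.items():
        worst = min(expected(reward, p) for p in ambiguity_set)
        if worst > best_value:
            best_action, best_value = action, worst
    return best_action, best_value


# Hypothetical two-outcome market: "up" and "down".
rewards = {"stocks": [0.10, -0.08], "cash": [0.01, 0.01]}
nominal = [[0.6, 0.4]]                    # single model, no uncertainty
ambiguous = [[0.6, 0.4], [0.4, 0.6]]      # the model itself is uncertain

nominal_choice = robust_one_step(rewards, nominal)
robust_choice = robust_one_step(rewards, ambiguous)
```

In this toy setting the nominal model favours the risky asset, while the robust criterion switches to the safe one once a pessimistic model is admitted into the ambiguity set.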

# (参考訳) 社会と記憶誘導を持つ人工航海者における累積的文化の自然発生

Cumulative culture spontaneously emerges in artificial navigators who are social and memory-guided ( http://arxiv.org/abs/2206.06281v1 )

ライセンス: CC BY 4.0

Edwin S. Dalmaijer

(参考訳) これまでは人間特有の存在と考えられてきたが、累積的な文化進化は非ヒト動物にも見られる。個人からの適応的な革新が社会学習を通じて連続的に受け継がれるときである。例えば、単独または安定なペアで飛行するハトは、比較的厳格な亜最適経路を示すが、経験豊富なメンバーがナイーブな経路に交換される世代によって、ルート効率が徐々に向上する。これは、累積的な文化進化のために必要最小限の認知アーキテクチャが生み出すかという疑問を提起する。ここでは,目標指向性,社会的近接性,経路記憶の3つの主機能を有するエージェントに対して,この問題に答えようと考えた。効率と世代効率の改善のためのオプティマでは、ハトで観察された累積培養を再現した。それぞれの最適な経路は、主に記憶によって決定され、社会的近接と目標指向によってより少ない範囲で決定された。社会的近接の必要性から、各エージェントは記憶された経路に沿って経験豊富なエージェントに近づいた。しかし、ルート記憶に妨げられず、単純エージェントの進路は目標に向かって進む傾向が強かった。これによりペアの経路が微妙に偏り、その結果の効率改善はゴールに回帰する。累積的文化的進化の現在の枠組みでは、世代ごとの漸進的な改善が全ての中核的な基準を満たしており、初歩的な累積的最適化は、社会的近接を好んで記憶能力を持つ単純なシステムでも現れる進化のメカニズムであることを示している。

While previously thought to be uniquely human, cumulative cultural evolution continues to be found in non-human animals. It occurs when an adaptive innovation from an individual is repeatedly passed onto consecutive generations through social learning. For example, pigeons who fly alone or in stable pairs show relatively rigid sub-optimal routes, but gradually improve route efficiency over generations of pairs in which experienced members are swapped for naive ones. This raises the question of what the minimally required cognitive architecture is for cumulative cultural evolution to emerge. Here, I aimed to answer this question in artificial agents who employ three main functions: goal-direction, social proximity, and route memory. At the optima for efficiency and generational efficiency improvement, agents replicated cumulative culture observed in pigeons. At each optimum, paths were determined primarily by memory, and to a lesser extent by social proximity and goal-direction. Because of their need for social proximity, each naive agent stayed close to their experienced counterpart as that followed its memorised path. However, unhindered by route memory, the naive agent's heading was more likely to err towards the goal. This subtly biased pairs' routes, and the resulting efficiency improvement is thus regression to the goal. The resulting incremental improvements over generations meet all core criteria in current frameworks of cumulative cultural evolution, suggesting that rudimentary cumulative optimisation is an evolutionary mechanism that emerges even in simple systems that prefer social proximity and have a memory capacity.

翻訳日:2022-06-14 18:20:01 公開日:2022-06-13

# GradICON: 勾配逆整合による近似微分同相

GradICON: Approximate Diffeomorphisms via Gradient Inverse Consistency ( http://arxiv.org/abs/2206.05897v1 )

ライセンス: Link先を確認

Lin Tian, Hastings Greer, Fran\c{c}ois-Xavier Vialard, Roland Kwitt, Ra\'ul San Jos\'e Est\'epar, Marc Niethammer

(参考訳) 多くの登録手法があり、初期の研究は画像ペアの最適化に基づくアプローチに重点を置いている。最近の研究は、空間変換を予測するための深層登録ネットワークに焦点を当てている。どちらの場合でも、低次元変換パラメータの代わりに変換関数を推定する非パラメトリック登録モデルは、(滑らかな変換を促進するために)適切な正規化子とそのパラメータを選択する必要がある。これによりモデルはチューニングが難しくなり、選択された正規化器によって許容される変形空間に変形を制限できる。光学フローのためのディープラーニングモデルは変換を規則化せず、代わりにデータに完全に依存するが、医療画像登録に望ましい二相変換は生じないかもしれない。そこで本研究では,正規化に逆一貫性のみを使用する非教師付きアイコン型ディープラーニング登録手法に基づくgradiconビルディングを開発した。しかし、ICONとは対照的に、勾配の逆一貫性損失を用いることで収束が著しく向上するだけでなく、結果の変換写像の暗黙的な正則化がもたらされることを実証し実証的に検証する。磁気共鳴(MR)膝画像とCT(CT)肺画像の合成実験と実験により,GradICONの優れた性能が示された。我々は、簡単な登録定式化を維持しつつ、最先端(SOTA)の精度を達成する。

Many registration approaches exist, with early work focusing on optimization-based approaches for image pairs. Recent work focuses on deep registration networks to predict spatial transformations. In both cases, commonly used non-parametric registration models, which estimate transformation functions instead of low-dimensional transformation parameters, require choosing a suitable regularizer (to encourage smooth transformations) and its parameters. This makes models difficult to tune and restricts deformations to the deformation space permissible by the chosen regularizer. While deep-learning models for optical flow exist that do not regularize transformations and instead entirely rely on the data, these might not yield diffeomorphic transformations, which are desirable for medical image registration. In this work, we therefore develop GradICON, building upon the unsupervised ICON deep-learning registration approach, which only uses inverse-consistency for regularization. However, in contrast to ICON, we prove and empirically verify that using a gradient inverse-consistency loss not only significantly improves convergence, but also results in a similar implicit regularization of the resulting transformation map. Synthetic experiments and experiments on magnetic resonance (MR) knee images and computed tomography (CT) lung images show the excellent performance of GradICON. We achieve state-of-the-art (SOTA) accuracy while retaining a simple registration formulation, which is practically important.

翻訳日:2022-06-14 18:05:31 公開日:2022-06-13

# 可変画像復元のためのハイパーネットワーク

One Size Fits All: Hypernetwork for Tunable Image Restoration ( http://arxiv.org/abs/2206.05970v1 )

ライセンス: Link先を確認

Shai Aharon and Gil Ben-Artzi

(参考訳) 本稿では,複数のモデルの精度を向上し,それぞれが異なる劣化レベルに最適化され,単一のモデルと全く同じパラメータを持つ,可変画像復元のための新しい手法を提案する。我々のモデルは、一定数のパラメータと様々な画像復元タスクで必要となる多くの劣化レベルを復元するために最適化できる。実世界のデータセットに対する実験により、我々の手法は既存のチューナブルモデルに対してデノナイズ、デJPEG、超高解像度化を実現し、より広範囲の劣化レベルに対するスムーズで正確なフィッティングを可能にした。

We introduce a novel approach for tunable image restoration that achieves the accuracy of multiple models, each optimized for a different level of degradation, with exactly the same number of parameters as a single model. Our model can be optimized to restore as many degradation levels as required with a constant number of parameters and for various image restoration tasks. Experiments on real-world datasets show that our approach achieves state-of-the-art results in denoising, DeJPEG and super-resolution with respect to existing tunable models, allowing smoother and more accurate fitting over a wider range of degradation levels.

翻訳日:2022-06-14 18:05:09 公開日:2022-06-13

# 胸部X線写真における結核分画の深層アンサンブル学習

Deep ensemble learning for segmenting tuberculosis-consistent manifestations in chest radiographs ( http://arxiv.org/abs/2206.06065v1 )

ライセンス: Link先を確認

Sivaramakrishnan Rajaraman, Feng Yang, Ghada Zamzmi, Peng Guo, Zhiyun Xue and Sameer K Antani

(参考訳) 深層学習(DL)法を用いた胸部X線(CXR)の結核性病変の自動分離は、放射線治療の労力を減らし、臨床的意思決定を補完し、患者治療の改善をもたらす可能性がある。論文の大半は、粗い境界ボックスアノテーションを用いた自動セグメンテーションモデルのトレーニングについて論じている。しかし、バウンディングボックスアノテーションの粒度は、ピクセルレベルでの偽陽性と負のかなりの部分を含めることによって、全体的なセマンティックセグメンテーション性能に悪影響を及ぼす可能性がある。この研究 (i)tb整合病変の細粒度アノテーションの利点と有用性の評価 (ii) オリジナルおよび骨抑制前頭cxrのtb一貫性病変を意味的に分節するu-netモデルの変種を訓練し構成する。ビットワイズ,ビットワイズ,ビットワイズ,ビットワイズマックス,スタックリングなどのアンサンブル手法を用いてセグメント化性能を評価した。重み付けアンサンブルは,個々の構成モデルおよび他のアンサンブル法と比較して,高いセグメンテーション性能(ディップスコア0.5743,95%信頼区間0.4055,0.7431)を示した。本研究は,細粒度tb一貫性病変分割性能を改善するためにアンサンブル学習を適用した最初の研究である。

Automated segmentation of tuberculosis (TB)-consistent lesions in chest X-rays (CXRs) using deep learning (DL) methods can help reduce radiologist effort, supplement clinical decision-making, and potentially result in improved patient treatment. The majority of works in the literature discuss training automatic segmentation models using coarse bounding box annotations. However, the granularity of the bounding box annotation could result in the inclusion of a considerable fraction of false positives and negatives at the pixel level that may adversely impact overall semantic segmentation performance. This study (i) evaluates the benefits of using fine-grained annotations of TB-consistent lesions and (ii) trains and constructs ensembles of the variants of U-Net models for semantically segmenting TB-consistent lesions in both original and bone-suppressed frontal CXRs. We evaluated segmentation performance using several ensemble methods such as bitwise-AND, bitwise-OR, bitwise-MAX, and stacking. We observed that the stacking ensemble demonstrated superior segmentation performance (Dice score: 0.5743, 95% confidence interval: (0.4055, 0.7431)) compared to the individual constituent models and other ensemble methods. To the best of our knowledge, this is the first study to apply ensemble learning to improve fine-grained TB-consistent lesion segmentation performance.

翻訳日:2022-06-14 18:04:57 公開日:2022-06-13
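To make the simpler ensemble rules and the Dice score named in the abstract above concrete, here is a small sketch on flattened binary masks. It covers only bitwise-AND and bitwise-OR; the stacking ensemble that the authors report as best is not reproduced here. The masks are invented toy data and the function names are assumptions for this example.

```python
def bitwise_and(masks):
    """A pixel is foreground only if every model marks it as foreground."""
    return [int(all(px)) for px in zip(*masks)]


def bitwise_or(masks):
    """A pixel is foreground if at least one model marks it as foreground."""
    return [int(any(px)) for px in zip(*masks)]


def dice(pred, truth):
    """Dice score 2|A∩B| / (|A| + |B|) for binary masks of equal length."""
    intersection = sum(p & t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy 1-D "images": ground truth and two model predictions.
truth = [0, 1, 1, 1, 0, 0]
model_a = [0, 1, 1, 0, 0, 0]
model_b = [0, 0, 1, 1, 1, 0]

and_mask = bitwise_and([model_a, model_b])
or_mask = bitwise_or([model_a, model_b])
```

The AND rule tends to trade recall for precision and the OR rule does the opposite, which is one reason a learned combination such as stacking can do better than either.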

# 自己深達度学習によるmpMRI前立腺癌の悪性度検出と局在:臨床応用への一歩

Prostate Cancer Malignancy Detection and localization from mpMRI using auto-Deep Learning: One Step Closer to Clinical Utilization ( http://arxiv.org/abs/2206.06235v1 )

ライセンス: Link先を確認

Weiwei Zong and Eric Carver and Simeng Zhu and Eric Schaff and Daniel Chapman and Joon Lee and Hassan Bagher Ebadian and Indrin Chetty and Benjamin Movsas and Winston Wen and Tarik Alafif and Xiangyun Zong

(参考訳) mpMRIによる前立腺悪性腫瘍の診断は,ここ数年で大きく研究されている。モデル解釈とドメインドリフトが臨床利用の主要な道路ブロックとなっている。対象は,201名の患者とのコホートでカスタマイズされた畳み込みニューラルネットワークを訓練し,関心領域周辺の2dパッチをカットして入力として,前立腺の2.5dスライスを入力とし,モデル空間においてオートケラを用いて最適なモデル検索を行った。末梢領域(PZ)と中心腺(CG)を別々に訓練し,PZ検出器とCG検出器は, 医師の作業負荷を大幅に軽減するために, 配列から最も疑わしいスライスをハイライトするために効果的に実証された。

Automatic diagnosis of malignant prostate cancer from mpMRI has been studied heavily in the past years. Model interpretation and domain drift have been the main roadblocks to clinical utilization. This work extends our previous work, in which we trained a customized convolutional neural network on a public cohort of 201 patients using cropped 2D patches around the region of interest as input. Here, cropped 2.5D slices of the prostate gland were used as input, and the optimal model was searched for in the model space using autoKeras. In addition, the peripheral zone (PZ) and central gland (CG) were trained and tested separately; the PZ detector and CG detector were shown to be effective at highlighting the most suspicious slices in a sequence, which we hope will greatly ease the workload of physicians.

翻訳日:2022-06-14 18:04:32 公開日:2022-06-13

# 脳腫瘍患者の生存時間予測のためのMMMNA-Net

MMMNA-Net for Overall Survival Time Prediction of Brain Tumor Patients ( http://arxiv.org/abs/2206.06267v1 )

ライセンス: Link先を確認

Wen Tang, Haoyue Zhang, Pengxin Yu, Han Kang, Rongguo Zhang

(参考訳) 全身生存時間(OS)はグリオーマの病態に対する最も重要な評価指標の1つである。マルチモーダルMRI(Multimodal Magnetic Resonance Imaging)スキャンは、グリオーマ予後OSの研究において重要な役割を担っている。マルチモーダルMRIにおけるOS時間予測のために, 深層学習に基づくいくつかの手法を提案する。しかし、これらの手法は通常、深層学習ネットワークの開始時や終了時にマルチモーダル情報を融合し、異なるスケールの機能の融合を欠いている。さらに、ネットワークの終端での融合は常にグローバル(例えば、グローバル平均プーリングアウトプットの結合後に完全に接続された)やローカル(例えば、バイリニアプーリング)に適応し、ローカルとグローバルの情報を失う。本稿では,脳腫瘍患者に対するマルチモーダルos時間予測法を提案する。提案手法は,現在の最先端手法(0.6989対0.6426)に比べて8.76%向上した。広範囲な試験により,本手法はモダリティが欠如している状況に適応できることが示された。コードはhttps://github.com/tangwen920812/mmmna-netで入手できる。

Overall survival (OS) time is one of the most important evaluation indices for glioma cases. Multimodal Magnetic Resonance Imaging (MRI) scans play an important role in the study of glioma prognosis OS time. Several deep learning-based methods have been proposed for OS time prediction on multi-modal MRI. However, these methods usually fuse multi-modal information at the beginning or at the end of the deep learning networks and lack the fusion of features from different scales. In addition, the fusion at the end of networks always combines global with global (e.g. fully connected after concatenation of global average pooling output) or local with local (e.g. bilinear pooling), which loses the information of local with global. In this paper, we propose a novel method for multi-modal OS time prediction of brain tumor patients, which contains an improved nonlocal features fusion module introduced on different scales. Our method obtains a relative 8.76% improvement over the current state-of-the-art method (0.6989 vs. 0.6426 on accuracy). Extensive testing demonstrates that our method could adapt to situations with missing modalities. The code is available at https://github.com/TangWen920812/mmmna-net.

翻訳日:2022-06-14 18:04:15 公開日:2022-06-13

# グラフモチーフ度対策の絶対表現性

Absolute Expressiveness of Subgraph Motif Centrality Measures ( http://arxiv.org/abs/2206.06137v1 )

ライセンス: Link先を確認

Andreas Pieris and Jorge Salas

(参考訳) グラフベースのアプリケーションでは、(有向または無向の)グラフの最も重要なまたは `central'' 頂点をピンポイントするか、グラフの頂点を重要度に応じてランク付けするのが一般的なタスクである。この目的のために、グラフ内の頂点が最も重要なものであるかを評価する文献において、いわゆる中央集権的尺度が多数提案されている。 riveros と salas は icdt 2020 の論文で、グラフにおける頂点の重要性は、それを取り巻く部分グラフモチーフとして知られる'relevant' 接続部分グラフの数と相対する、次の直感的な原理に基づく集中度尺度の族を提案した。上述の原則から派生した措置を下記の指標として参照する。サブグラフモチーフ測度はグラフデータベースアプリケーションに適しているという説得力のある主張がある。 ICDT論文は, 部分グラフモチーフ測度が好むいくつかの特性について研究したが, その絶対表現性はほとんど探索されていない。本研究の目的は,部分グラフモチーフ測度のファミリの絶対表現性を正確に特徴付けることである。

In graph-based applications, a common task is to pinpoint the most important or "central" vertex in a (directed or undirected) graph, or rank the vertices of a graph according to their importance. To this end, a plethora of so-called centrality measures have been proposed in the literature that assess which vertices in a graph are the most important ones. Riveros and Salas, in an ICDT 2020 paper, proposed a family of centrality measures based on the following intuitive principle: the importance of a vertex in a graph is relative to the number of "relevant" connected subgraphs, known as subgraph motifs, surrounding it. We refer to the measures derived from the above principle as subgraph motif measures. It has been convincingly argued that subgraph motif measures are well-suited for graph database applications. Although the ICDT paper studied several favourable properties enjoyed by subgraph motif measures, their absolute expressiveness remains largely unexplored. The goal of this work is to precisely characterize the absolute expressiveness of the family of subgraph motif measures.

翻訳日:2022-06-14 18:01:41 公開日:2022-06-13

# シャッフル型勾配アルゴリズムの大域解への収束について

On the Convergence to a Global Solution of Shuffling-Type Gradient Algorithms ( http://arxiv.org/abs/2206.05869v1 )

ライセンス: Link先を確認

Lam M. Nguyen, Trang H. Tran

(参考訳) 確率的勾配降下(sgd)アルゴリズムは、拡張性と大規模問題への対処効率により、多くの機械学習タスクで選択される方法である。本稿では,本研究の主流である実用的ヒューリスティックスと一致するSGDのシャッフルバージョンに着目した。過パラメータ設定下での非凸関数のクラスに対してSGDをシャッフルする大域的解の収束性を示す。我々の分析では、以前の文献よりも緩和された非凸仮定を採用している。それでも、一般凸設定においてシャッフルSGDが達成した計算複雑性は維持される。

The stochastic gradient descent (SGD) algorithm is the method of choice in many machine learning tasks thanks to its scalability and efficiency in dealing with large-scale problems. In this paper, we focus on the shuffling version of SGD, which matches the mainstream practical heuristics. We show the convergence to a global solution of shuffling SGD for a class of non-convex functions under over-parameterized settings. Our analysis employs more relaxed non-convex assumptions than previous literature. Nevertheless, we maintain the desired computational complexity that shuffling SGD has achieved in the general convex setting.

翻訳日:2022-06-14 17:57:27 公開日:2022-06-13
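The "shuffling version of SGD" in the abstract above differs from textbook SGD in how samples are drawn: each epoch visits every sample exactly once in a freshly permuted order, rather than sampling with replacement. The sketch below shows that loop on a one-parameter least-squares problem whose data can be fitted exactly. This toy objective is convex and is an assumption chosen for brevity; the paper itself analyses a class of non-convex functions, and the learning rate and epoch count here are arbitrary.

```python
import random


def shuffling_sgd(xs, ys, lr=0.05, epochs=200, seed=0):
    """Fit y ~ w * x by SGD with a fresh random permutation every epoch."""
    rng = random.Random(seed)
    w = 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)          # reshuffle once per epoch
        for i in order:             # each sample is used exactly once
            grad = (w * xs[i] - ys[i]) * xs[i]   # d/dw of 0.5*(w*x - y)^2
            w -= lr * grad
    return w


# Data generated by y = 3x, so a global solution with zero loss exists.
xs = [1.0, 2.0, 3.0]
ys = [3.0, 6.0, 9.0]
w_hat = shuffling_sgd(xs, ys)
```

Because every per-sample loss is minimised by the same parameter, the iterates contract towards it regardless of the permutation, which loosely mirrors the over-parameterized setting the abstract refers to.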

# Safe-FinRL:高速株式取引のための低バイアス・可変深層強化学習実装

Safe-FinRL: A Low Bias and Variance Deep Reinforcement Learning Implementation for High-Freq Stock Trading ( http://arxiv.org/abs/2206.05910v1 )

ライセンス: Link先を確認

Zitao Song, Xuyang Jin, Chenliang Li

(参考訳) 近年、量的金融の実践者の多くが、より優れた量的取引(QT)戦略を構築するためにDeep Reinforcement Learning(DRL)を使用しようと試みている。しかしながら、既存の多くの研究は、非定常的な金融環境や実際の金融市場にDRLを適用する際のバイアスや分散トレードオフなど、いくつかの深刻な課題に対処できていない。そこで本研究では,固定的金融環境と低バイアス・分散推定によって強化された,新規なdrlベースの株取引戦略であるsafe-finrlを提案する。第一に、長い金融時系列をほぼ定常的な短い環境に分離し、第二に、一般的なリトレースオペレータをソフトアクタ-クリティックに組み込むことにより、ほぼ定常な金融環境にトレースsacを実装する。暗号通貨市場における大規模な実験は、Safe-FinRLが安定的な価値推定と安定した政策改善を提供し、ほぼ定常的な金融環境においてバイアスと分散を著しく低減したことを示している。

In recent years, many practitioners in quantitative finance have attempted to use Deep Reinforcement Learning (DRL) to build better quantitative trading (QT) strategies. Nevertheless, many existing studies fail to address several serious challenges, such as the non-stationary financial environment and the bias and variance trade-off when applying DRL in the real financial market. In this work, we propose Safe-FinRL, a novel DRL-based high-freq stock trading strategy enhanced by the near-stationary financial environment and low bias and variance estimation. Our main contributions are twofold: first, we separate the long financial time series into near-stationary short environments; second, we implement Trace-SAC in the near-stationary financial environment by incorporating the general retrace operator into the Soft Actor-Critic. Extensive experiments on the cryptocurrency market have demonstrated that Safe-FinRL provides a stable value estimation and a steady policy improvement, and reduces bias and variance significantly in the near-stationary financial environment.

翻訳日:2022-06-14 17:57:18 公開日:2022-06-13

# 決定点過程に対する遅延および高速グリーディMAP推論

Lazy and Fast Greedy MAP Inference for Determinantal Point Process ( http://arxiv.org/abs/2206.05947v1 )

ライセンス: Link先を確認

Shinichi Hemmi, Taihei Oki, Shinsaku Sakaue, Kaito Fujii, Satoru Iwata

(参考訳) 決定点プロセス(DPP)に対するMAP推論は、多くの機械学習アプリケーションにおいて多様な項目を選択する上で重要である。 DPP MAP推論はNPハードであるが、グリードアルゴリズムはしばしば高品質な解を見つけ、多くの研究者がその効率的な実装を研究している。古典的かつ実用的な方法の一つが遅延グリーディアルゴリズムであり、これは一般的な部分モジュラー関数の最大化に適用できるが、コレスキー因子分解に基づく最近の高速グリーディアルゴリズムはdpp写像推論より効率的である。本稿では,文献において相容れないと考えられる「怠け者」と「速い者」の考え方を組み合わせる方法について述べる。私たちの怠け者で高速な欲望アルゴリズムは、現在の最良のアルゴリズムとほぼ同じ時間複雑性を達成し、実際より速く実行します。 Lazy + fast"というアイデアは他のグリーディ型アルゴリズムにも拡張可能である。また、制約のない DPP MAP 推論のための二重欲求アルゴリズムの高速版も提供する。実験は加速アイデアの有効性を検証する。

The maximum a posteriori (MAP) inference for determinantal point processes (DPPs) is crucial for selecting diverse items in many machine learning applications. Although DPP MAP inference is NP-hard, the greedy algorithm often finds high-quality solutions, and many researchers have studied its efficient implementation. One classical and practical method is the lazy greedy algorithm, which is applicable to general submodular function maximization, while a recent fast greedy algorithm based on the Cholesky factorization is more efficient for DPP MAP inference. This paper presents how to combine the ideas of "lazy" and "fast", which have been considered incompatible in the literature. Our lazy and fast greedy algorithm achieves almost the same time complexity as the current best one and runs faster in practice. The idea of "lazy + fast" is extendable to other greedy-type algorithms. We also give a fast version of the double greedy algorithm for unconstrained DPP MAP inference. Experiments validate the effectiveness of our acceleration ideas.

翻訳日:2022-06-14 17:56:56 公開日:2022-06-13
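The "lazy" half of the idea in the abstract above is the classical lazy greedy algorithm for submodular maximization: marginal gains can only shrink as the solution grows, so stale gains stored in a priority queue are valid upper bounds and most re-evaluations can be skipped. The sketch below demonstrates this on a set-cover objective, which is an assumption chosen for simplicity; it does not implement the DPP log-determinant objective or the Cholesky-based "fast" updates that the paper combines with laziness.

```python
import heapq


def lazy_greedy(sets, k):
    """Pick up to k sets maximising coverage, re-evaluating gains lazily."""
    covered = set()
    chosen = []
    # Heap entries: (negated gain, set index, number of picks when evaluated).
    heap = [(-len(s), i, 0) for i, s in enumerate(sets)]
    heapq.heapify(heap)
    while heap and len(chosen) < k:
        neg_gain, i, stamp = heapq.heappop(heap)
        if stamp == len(chosen):
            # Gain is up to date and tops all (upper-bound) gains in the heap.
            if neg_gain == 0:
                break               # nothing left to gain
            chosen.append(i)
            covered |= sets[i]
        else:
            # Stale entry: recompute its marginal gain and push it back.
            gain = len(sets[i] - covered)
            heapq.heappush(heap, (-gain, i, len(chosen)))
    return chosen, covered


sets = [{1, 2, 3, 4}, {3, 4, 5}, {5, 6}, {1, 2}]
chosen, covered = lazy_greedy(sets, k=2)
```

In this run the last candidate is never re-evaluated after the first pick, because a fresh gain already matches its stale upper bound; that skipped work is the saving that laziness provides.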

# 値関数に基づく二値ハイパーパラメータ選択問題に対する差分凸アルゴリズム

Value Function Based Difference-of-Convex Algorithm for Bilevel Hyperparameter Selection Problems ( http://arxiv.org/abs/2206.05976v1 )

ライセンス: Link先を確認

Lucy Gao, Jane J. Ye, Haian Yin, Shangzhi Zeng, Jin Zhang

(参考訳) 超パラメータチューニングのための勾配に基づく最適化手法は、固定上層変数値に対して、両レベルプログラムの下位レベルが強い凸(LLSC)と滑らか(LLS)であるときに、定常解に対する理論的収束を保証する。この条件は多くの機械学習アルゴリズムのハイパーパラメータのチューニングから生じるバイレベルプログラムでは満足できない。本研究では, 逐次収束型値関数に基づく差分凸アルゴリズム(VF-iDCA)を開発した。このアルゴリズムは,多種多様なハイパーパラメータチューニングアプリケーションから,LLSCやLSSの仮定を伴わない定常解を実現する。提案したVF-iDCAは,過度パラメータを調整した場合に優れた性能を示す。

Gradient-based optimization methods for hyperparameter tuning guarantee theoretical convergence to stationary solutions when for fixed upper-level variable values, the lower level of the bilevel program is strongly convex (LLSC) and smooth (LLS). This condition is not satisfied for bilevel programs arising from tuning hyperparameters in many machine learning algorithms. In this work, we develop a sequentially convergent Value Function based Difference-of-Convex Algorithm with inexactness (VF-iDCA). We show that this algorithm achieves stationary solutions without LLSC and LLS assumptions for bilevel programs from a broad class of hyperparameter tuning applications. Our extensive experiments confirm our theoretical findings and show that the proposed VF-iDCA yields superior performance when applied to tune hyperparameters.

翻訳日:2022-06-14 17:55:32 公開日:2022-06-13

# 機械学習マルチバースのモデリング

Modeling the Machine Learning Multiverse ( http://arxiv.org/abs/2206.05985v1 )

ライセンス: Link先を確認

Samuel J. Bell, Onno P. Kampman, Jesse Dodge and Neil D. Lawrence

(参考訳) 機械学習研究の信頼性と信頼性に関する懸念が高まる中、我々は、堅牢で一般化可能なクレームを作るための原則的なフレームワークであるMultiverse Analysisを提案する。我々の枠組みは,心理学の再現性危機に対応するために導入された多元的分析(Steegen et al., 2016)に基づいている。高次元かつしばしば連続なml探索空間を効率的に探索するために、多元系をガウス過程でモデル化し、ベイズ実験設計を適用する。我々のフレームワークは、モデル性能に関する堅牢な科学的結論を導き出すために設計されており、従来の最適化よりも探索に焦点を当てている。最初の2つのケーススタディにおいて、適応最適化器の相対的メリットに関する議論を考察した。第二に,大規模バッチ訓練一般化ギャップに対する学習率の影響について,矛盾する研究を合成する。機械学習コミュニティにとって、Multiverse Analysisは、堅牢なクレームを識別し、透明性を高め、再現性を向上させるためのシンプルで効果的なテクニックである。

Amid mounting concern about the reliability and credibility of machine learning research, we present a principled framework for making robust and generalizable claims: the Multiverse Analysis. Our framework builds upon the Multiverse Analysis (Steegen et al., 2016) introduced in response to psychology's own reproducibility crisis. To efficiently explore high-dimensional and often continuous ML search spaces, we model the multiverse with a Gaussian Process surrogate and apply Bayesian experimental design. Our framework is designed to facilitate drawing robust scientific conclusions about model performance, and thus our approach focuses on exploration rather than conventional optimization. In the first of two case studies, we investigate disputed claims about the relative merit of adaptive optimizers. Second, we synthesize conflicting research on the effect of learning rate on the large batch training generalization gap. For the machine learning community, the Multiverse Analysis is a simple and effective technique for identifying robust claims, for increasing transparency, and a step toward improved reproducibility.

翻訳日:2022-06-14 17:55:20 公開日:2022-06-13

# 非orthogonal multi accessにおけるgpuアクセラレーション機械学習

GPU-Accelerated Machine Learning in Non-Orthogonal Multiple Access ( http://arxiv.org/abs/2206.05998v1 )

ライセンス: Link先を確認

Daniel Sch\"aufele, Guillermo Marcus, Nikolaus Binder, Matthias Mehlhose, Alexander Keller, S{\l}awomir Sta\'nczak

(参考訳) 非直交多重アクセス(Noma)は、将来の5Gおよび6Gネットワークに必要な大規模な接続を可能にする興味深い技術である。純粋に線形処理は、すでにNOMAシステムでは優れた性能を達成しているが、あるシナリオでは、許容可能な性能を保証するためには非線形処理が必須である。本稿では,線形処理と非線形処理の両方の利点を組み合わせたニューラルネットワークアーキテクチャを提案する。リアルタイム検出性能はグラフィックス処理ユニット(GPU)の高効率実装によって実証される。実験環境における実測値を用いて,従来の手法よりも優れた手法を示す。

Non-orthogonal multiple access (NOMA) is an interesting technology that enables massive connectivity as required in future 5G and 6G networks. While purely linear processing already achieves good performance in NOMA systems, in certain scenarios, non-linear processing is mandatory to ensure acceptable performance. In this paper, we propose a neural network architecture that combines the advantages of both linear and non-linear processing. Its real-time detection performance is demonstrated by a highly efficient implementation on a graphics processing unit (GPU). Using real measurements in a laboratory environment, we show the superiority of our approach over conventional methods.

翻訳日:2022-06-14 17:55:04 公開日:2022-06-13

# 都市道路網における充電ステーションの強化学習による配置

Reinforcement Learning-based Placement of Charging Stations in Urban Road Networks ( http://arxiv.org/abs/2206.06011v1 )

ライセンス: Link先を確認

Leonie von Wahl (1), Nicolas Tempelmeier (1), Ashutosh Sao (2) and Elena Demidova (3) ((1) Volkswagen Group, (2) L3S Research Center, University of Hannover, (3) Data Science & Intelligent Systems Group (DSIS), University of Bonn)

(参考訳) 従来のモビリティからエレクトロモビリティへの移行は、帯電インフラの可用性と最適配置に大きく依存しており、都市部における帯電ステーションの最適配置について検討する。我々は、地域の充電インフラ供給を最大化し、予算制約を設定しながら、待ち時間、旅行時間、充電時間を最小化する。さらに, 都市全域の充電需要をより高精度に推定するために, 家庭での充電が可能となる可能性についても検討した。充電ステーションの最適位置と異なる充電タイプの充電池数を求める非線形整数最適化問題として充電ステーションの配置を定式化する。我々は、充電ステーション配置問題(PCRL)を解決するために、新しいDeep Reinforcement Learningアプローチを設計する。実世界のデータセットに対する大規模な実験は、PCRLが5つの基準線と比較して充電計画の利点を高めながら、待ち時間と旅行時間を減らしていることを示している。既存のインフラストラクチャと比較して、待ち時間を最大97%削減し、メリットを最大497%向上することが可能です。

The transition from conventional mobility to electromobility largely depends on charging infrastructure availability and optimal placement. This paper examines the optimal placement of charging stations in urban areas. We maximise the charging infrastructure supply over the area and minimise waiting, travel, and charging times while setting budget constraints. Moreover, we include the possibility of charging vehicles at home to obtain a more refined estimation of the actual charging demand throughout the urban area. We formulate the Placement of Charging Stations problem as a non-linear integer optimisation problem that seeks the optimal positions for charging stations and the optimal number of charging piles of different charging types. We design a novel Deep Reinforcement Learning approach to solve the charging station placement problem (PCRL). Extensive experiments on real-world datasets show how the PCRL reduces the waiting and travel time while increasing the benefit of the charging plan compared to five baselines. Compared to the existing infrastructure, we can reduce the waiting time by up to 97% and increase the benefit by up to 497%.

翻訳日:2022-06-14 17:54:53 公開日:2022-06-13

# 雑音フィードバックを持つゲームにおける非回帰学習:学習速度分離による高速率と適応性

No-Regret Learning in Games with Noisy Feedback: Faster Rates and Adaptivity via Learning Rate Separation ( http://arxiv.org/abs/2206.06015v1 )

ライセンス: Link先を確認

Yu-Guan Hsieh, Kimon Antonakopoulos, Volkan Cevher, Panayotis Mertikopoulos

(参考訳) 本稿では,学習者が他の最適化エージェントと連続ゲームに関わった場合の後悔の最小化の問題について考察する。変動安定ゲーム(全凸凹ゲームと単調ゲームを含む連続ゲーム)の文脈でこの問題を考察し、各プレイヤーが個々のペイオフ勾配のノイズ推定にのみアクセスできる場合について考察する。雑音が加法的であれば、ゲーム理論と純粋に敵対的な設定は同様の後悔の保証を享受するが、ノイズが乗算的であれば、学習者が常に後悔できることを示す。学習速度分離を伴う楽観的勾配スキーム(つまり、ノイズプロファイルに応じて、その方法の補間と更新ステップが異なるスケジュールに調整される)によって、この高速レートを達成する。その後、微妙なハイパーパラメータチューニングの必要性をなくすため、最悪と最良な後悔の保証をスムーズに補間する完全適応手法を提案する。

We examine the problem of regret minimization when the learner is involved in a continuous game with other optimizing agents: in this case, if all players follow a no-regret algorithm, it is possible to achieve significantly lower regret relative to fully adversarial environments. We study this problem in the context of variationally stable games (a class of continuous games which includes all convex-concave and monotone games), and when the players only have access to noisy estimates of their individual payoff gradients. If the noise is additive, the game-theoretic and purely adversarial settings enjoy similar regret guarantees; however, if the noise is multiplicative, we show that the learners can, in fact, achieve constant regret. We achieve this faster rate via an optimistic gradient scheme with learning rate separation -- that is, the method's extrapolation and update steps are tuned to different schedules, depending on the noise profile. Subsequently, to eliminate the need for delicate hyperparameter tuning, we propose a fully adaptive method that smoothly interpolates between worst- and best-case regret guarantees.

翻訳日:2022-06-14 17:54:36 公開日:2022-06-13

# 機械学習モデルの$k$-safetyプロパティの指定とテスト

Specifying and Testing $k$-Safety Properties for Machine-Learning Models ( http://arxiv.org/abs/2206.06054v1 )

ライセンス: Link先を確認

Maria Christakis, Hasan Ferit Eniser, J\"org Hoffmann, Adish Singla, Valentin W\"ustholz

(参考訳) 機械学習モデルは、画像分類や意思決定タスクの支援など、私たちの生活でますます普及している。その結果、これらのモデルの信頼性は重要であり、その堅牢性と公平性を検証するための多くのアプローチの開発に繋がった。しかし、そのような特定の特性を超えて、モデルから一般的な機能的修正性期待を特定することは困難である。本稿では,形式的手法で使われる仕様からインスピレーションを得て,約$k$の異なる実行,いわゆる$k$-safetyプロパティを推論することで,機能的正当性を表現した。銀行のクレジット・スクリーニングモデルを考えると、「人がローンを否定され、その収入が減少しても、まだローンを否定すべきである」という期待は2つの安全資産である。ここでは、機械学習モデルに対する$k$-safetyプロパティの幅広い適用性を示し、それらを表現するための最初の仕様言語を示す。我々はまた、メタモルフィックテストを使用してそのようなプロパティを自動的に検証するフレームワークで言語を運用する。我々の実験は、我々のフレームワークがプロパティ違反を特定するのに効果的であり、検出されたバグがより良いモデルを訓練するのに使えることを示した。

Machine-learning models are becoming increasingly prevalent in our lives, for instance assisting in image-classification or decision-making tasks. Consequently, the reliability of these models is of critical importance and has resulted in the development of numerous approaches for validating and verifying their robustness and fairness. However, beyond such specific properties, it is challenging to specify, let alone check, general functional-correctness expectations from models. In this paper, we take inspiration from specifications used in formal methods, expressing functional-correctness properties by reasoning about $k$ different executions, so-called $k$-safety properties. Considering a credit-screening model of a bank, the expected property that "if a person is denied a loan and their income decreases, they should still be denied the loan" is a 2-safety property. Here, we show the wide applicability of $k$-safety properties for machine-learning models and present the first specification language for expressing them. We also operationalize the language in a framework for automatically validating such properties using metamorphic testing. Our experiments show that our framework is effective in identifying property violations, and that detected bugs could be used to train better models.

翻訳日:2022-06-14 17:54:16 公開日:2022-06-13

# 交通を考慮した長期記憶予測に基づく機械型デバイスのエネルギー効率向上

Energy-Efficient Wake-Up Signalling for Machine-Type Devices Based on Traffic-Aware Long-Short Term Memory Prediction ( http://arxiv.org/abs/2206.06058v1 )

ライセンス: Link先を確認

David E. Ruíz-Guirola, Carlos A. Rodríguez-López, Samuel Montejo-Sánchez, Richard Demo Souza, Onel L. A. López and Hirley Alves

(参考訳) 低消費電力機械型通信(MTC)ネットワークでは省エネルギー化が課題となっている。この点において、機械型デバイス(MTD)の無線インタフェースが消費するエネルギーを最小化することを目的としたWuS(Wake-up Signal)技術は、有望な解決策である。しかし、最先端のWuSメカニズムは静的な操作パラメータを使用するため、システムダイナミクスに効率的に適応することはできない。そこで我々は,mtcのトラフィックパターンを予測し,それに応じてwusを構成するための,単純かつ効率的なニューラルネットワークを設計した。提案する予測wus (fwus) は,遅延状態のページ監視を回避し,mtdの睡眠時間を延ばすことのできる,精度の高いlong-short term memory (lstm) ベースのトラヒック予測を活用している。シミュレーションの結果,提案手法の有効性が示された。交通予測誤差は4%未満で、誤報と誤検出確率はそれぞれ8.8%と1.3%である。エネルギー消費削減の観点からは、fwusは最高ベンチマークメカニズムを最大32%で上回ることができる。最後に、FWuSが交通密度の変化に動的に適応し、低消費電力のMCCスケーラビリティを促進する能力を証明する。

Reducing energy consumption is a pressing issue in low-power machine-type communication (MTC) networks. In this regard, the Wake-up Signal (WuS) technology, which aims to minimize the energy consumed by the radio interface of the machine-type devices (MTDs), stands as a promising solution. However, state-of-the-art WuS mechanisms use static operational parameters, so they cannot efficiently adapt to the system dynamics. To overcome this, we design a simple but efficient neural network to predict MTC traffic patterns and configure WuS accordingly. Our proposed forecasting WuS (FWuS) leverages an accurate long-short term memory (LSTM)-based traffic prediction that allows extending the sleep time of MTDs by avoiding frequent page monitoring occasions in idle state. Simulation results show the effectiveness of our approach. The traffic prediction errors are shown to be below 4%, with false alarm and miss-detection probabilities below 8.8% and 1.3%, respectively. In terms of energy consumption reduction, FWuS can outperform the best benchmark mechanism by up to 32%. Finally, we certify the ability of FWuS to dynamically adapt to traffic density changes, promoting low-power MTC scalability.

翻訳日:2022-06-14 17:53:56 公開日:2022-06-13

# EGRU:アクティビティスパース推論と学習のためのイベントベースGRU

EGRU: Event-based GRU for activity-sparse inference and learning ( http://arxiv.org/abs/2206.06178v1 )

ライセンス: Link先を確認

Anand Subramoney, Khaleelulla Khan Nazeer, Mark Schöne, Christian Mayr, David Kappel

(参考訳) 繰り返しニューラルネットワーク(RNN)のスケーラビリティは、前のタイムステップの出力に対する各タイムステップの計算の逐次的依存によって妨げられる。したがって、RNNの高速化とスケールアップの1つの方法は、モデルのサイズやタスクに依存しない各ステップで必要とされる計算を減らすことである。本稿では、イベントベースGRU(Event-based GRU)と呼ばれるイベントベースアクティビティスパースモデルとしてGRU(Gated Recurrent Units)を再構成し、他のユニットからの入力イベント(イベントベース)の受信時にのみ更新を演算するモデルを提案する。アクティブな単位のごく一部しか持たない(アクティビティスパース)と組み合わせると、このモデルは現在のRNNよりもはるかに効率的な計算能力を持つ。特に,本モデルでは,勾配降下時のスパースパラメータの更新も行い,この計算効率をトレーニングフェーズに拡張する。 EGRUは,言語モデリングを含む実世界のタスクにおける最先端の繰り返しネットワークモデルと比較して,高い活動空間を推論や訓練中に自然に維持し,競争力を発揮することを示す。これは、新しいニューロモルフィックなハードウェアに適した、スケーラブルでより適した次世代のリカレントネットワークの舞台となる。

The scalability of recurrent neural networks (RNNs) is hindered by the sequential dependence of each time step's computation on the previous time step's output. Therefore, one way to speed up and scale RNNs is to reduce the computation required at each time step independent of model size and task. In this paper, we propose a model that reformulates Gated Recurrent Units (GRU) as an event-based activity-sparse model that we call the Event-based GRU (EGRU), where units compute updates only on receipt of input events (event-based) from other units. When combined with having only a small fraction of the units active at a time (activity-sparse), this model has the potential to be vastly more compute-efficient than current RNNs. Notably, activity-sparsity in our model also translates into sparse parameter updates during gradient descent, extending this compute efficiency to the training phase. We show that the EGRU demonstrates competitive performance compared to state-of-the-art recurrent network models in real-world tasks, including language modeling, while maintaining high activity sparsity naturally during inference and training. This sets the stage for the next generation of recurrent networks that are scalable and more suitable for novel neuromorphic hardware.

翻訳日:2022-06-14 17:52:42 公開日:2022-06-13

# 医療におけるaiベースのデータ準備とデータ分析:糖尿病の事例

AI-based Data Preparation and Data Analytics in Healthcare: The Case of Diabetes ( http://arxiv.org/abs/2206.06182v1 )

ライセンス: Link先を確認

Marianna Maranghi, Aris Anagnostopoulos, Irene Cannistraci, Ioannis Chatzigiannakis, Federico Croce, Giulia Di Teodoro, Michele Gentile, Giorgio Grani, Maurizio Lenzerini, Stefano Leonardi, Andrea Mastropietro, Laura Palagi, Massimiliano Pappa, Riccardo Rosati, Riccardo Valentini, Paola Velardi

(参考訳) Associazione Medici Diabetologi (AMD)は、AMDデータベースとしても知られる、世界最大規模の糖尿病患者の記録を収集し管理している。本稿では,これらの重要かつ価値のあるデータセットを概念化し,クリーニングし,分析するための人工知能と機械学習技術の活用に焦点をあてた,現在進行中のプロジェクトの初期成果について述べる。

The Associazione Medici Diabetologi (AMD) collects and manages one of the largest worldwide-available collections of diabetic patient records, also known as the AMD database. This paper presents the initial results of an ongoing project whose focus is the application of Artificial Intelligence and Machine Learning techniques for conceptualizing, cleaning, and analyzing such an important and valuable dataset, with the goal of providing predictive insights to better support diabetologists in their diagnostic and therapeutic choices.

翻訳日:2022-06-14 17:52:19 公開日:2022-06-13

# 機械学習に基づくウィンドウマルウェア検出手法の評価におけるデータセットサイズとクラス不均衡の影響について

On the impact of dataset size and class imbalance in evaluating machine-learning-based windows malware detection techniques ( http://arxiv.org/abs/2206.06256v1 )

ライセンス: Link先を確認

David Illes

(参考訳) このプロジェクトの目的は、Microsoft Windowsのマルウェアに焦点を当てた結果の互換性と実際の適用性に関するデータを収集し分析することであり、具体的には、データセットのサイズとデータセットの不均衡が測定された検出性能に与える影響である。一部の研究者は、より小さなデータセットを使用しており、データセットのサイズがパフォーマンスに大きな影響を与える場合、公開結果の比較が困難になる。研究者はまた、バランスの取れたデータセットと精度をテストの指標として使う傾向がある。前者は現実の真の表現ではなく、良性サンプルはマルウェアの数を大幅に上回っており、後者は不均衡な問題に対する問題であることが知られている。このプロジェクトは、データセットのサイズが測定された検出器のパフォーマンスと相関しているかどうかを、公開結果の有意義な比較を妨げる範囲まで理解し、公開研究で報告された優れたパフォーマンスが実際のデプロイシナリオでうまく機能することを期待できるかどうかを理解するという、2つの重要な目標を特定した。本研究の結果は, データセットサイズが測定値と相関し, 結果の有意な比較を防止し, かつ, 結果の結論に対するトレーニングセットサイズ精度曲線の性質を理解せずに, 精度スコアにのみ基づくアプローチを行なわなければならないことを示唆した。結果は、高い精度スコアは必ずしも現実世界のパフォーマンスに変換されないことを示唆した。

The purpose of this project was to collect and analyse data about the comparability and real-life applicability of published results focusing on Microsoft Windows malware, more specifically the impact of dataset size and testing dataset imbalance on measured detector performance. Some researchers use smaller datasets, and if dataset size has a significant impact on performance, that makes comparison of the published results difficult. Researchers also tend to use balanced datasets and accuracy as a metric for testing. The former is not a true representation of reality, where benign samples significantly outnumber malware, and the latter is known to be problematic for imbalanced problems. The project identified two key objectives: to understand whether dataset size correlates with measured detector performance to an extent that prevents meaningful comparison of published results, and to understand whether good performance reported in published research can be expected to carry over to a real-world deployment scenario. The results suggested that dataset size does correlate with measured detector performance to an extent that prevents meaningful comparison of published results, and that, without understanding the nature of the training set size-accuracy curve for published results, conclusions about which approach is "better" should not be drawn solely from accuracy scores. Results also suggested that high accuracy scores don't necessarily translate to high real-world performance.
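The point about accuracy on balanced test sets can be made concrete with a small calculation. The sketch below is not taken from the project; the detector's true-positive and false-positive rates (0.99 and 0.01) and the 1% malware fraction are hypothetical numbers chosen to show how the same detector looks under different class ratios.

```python
def accuracy(tpr: float, fpr: float, malware_fraction: float) -> float:
    """Fraction of all samples classified correctly."""
    benign_fraction = 1.0 - malware_fraction
    return tpr * malware_fraction + (1.0 - fpr) * benign_fraction


def precision(tpr: float, fpr: float, malware_fraction: float) -> float:
    """Fraction of raised alerts that are actually malware."""
    true_positives = tpr * malware_fraction
    false_positives = fpr * (1.0 - malware_fraction)
    return true_positives / (true_positives + false_positives)


# Hypothetical detector: catches 99% of malware, flags 1% of benign files.
TPR, FPR = 0.99, 0.01

for malware_fraction in (0.5, 0.01):
    print(
        f"malware fraction {malware_fraction:.2f}: "
        f"accuracy {accuracy(TPR, FPR, malware_fraction):.2f}, "
        f"precision {precision(TPR, FPR, malware_fraction):.2f}"
    )
```

With these assumed rates, accuracy stays at about 0.99 in both settings, while precision falls from about 0.99 on the balanced set to about 0.5 when only 1% of samples are malware, so half of all alerts would be false alarms despite the unchanged accuracy score.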

翻訳日:2022-06-14 17:52:10 公開日:2022-06-13

# Annular Computational Imaging:単純なレンズによるパノラマ画像の鮮明化

Annular Computational Imaging: Capture Clear Panoramic Images through Simple Lens ( http://arxiv.org/abs/2206.06070v1 )

ライセンス: Link先を確認

Qi Jiang, Hao Shi, Lei Sun, Shaohua Gao, Kailun Yang, Kaiwei Wang

(参考訳) panoramic annular lens (pal) はレンズ数が少ないが、小型で視野が大きいため、モバイルやウェアラブル端末のセンシングタスクをパノラマで囲む大きな可能性を秘めている。しかし,小容量PALの画質は収差補正用レンズの欠如により光学限界に制限される。本稿では,軽量PAL設計の光学的限界を破るAnnular Computational Imaging (ACI)フレームワークを提案する。学習に基づく画像復元を容易にするため,パノラマ画像のための波動シミュレーションパイプラインを導入し,複数のデータ分布を通して合成と現実のギャップに対処する。提案したパイプラインは設計パラメータを持つ任意のPALに容易に適応でき、耐ゆるい設計に適している。さらに,パノラマ画像と物理インフォームド学習の物理的先行性を考慮した物理インフォームド画像復元ネットワーク(PI2RNet)を設計する。データセットレベルでは、DIVPanoデータセットを作成し、その上で広範囲な実験を行い、提案したネットワークが空間的に変化する劣化下でのパノラマ画像復元における技術の新しい状態を設定することを示す。さらに,3球面レンズのみを用いた簡易PALによるACIの評価により,高画質パノラマ画像とコンパクトデザインとの微妙なバランスが明らかとなった。私たちの知る限りでは、計算イメージング(CI)をPALで最初に探求した人物です。コードとデータセットはhttps://github.com/zju-jiangqi/ACI-PI2RNetで公開されている。

The Panoramic Annular Lens (PAL), composed of few lenses, has great potential in panoramic surrounding sensing tasks for mobile and wearable devices because of its tiny size and large Field of View (FoV). However, the image quality of a tiny-volume PAL is confined by its optical limit due to the lack of lenses for aberration correction. In this paper, we propose an Annular Computational Imaging (ACI) framework to break the optical limit of light-weight PAL design. To facilitate learning-based image restoration, we introduce a wave-based simulation pipeline for panoramic imaging and tackle the synthetic-to-real gap through multiple data distributions. The proposed pipeline can be easily adapted to any PAL with design parameters and is suitable for loose-tolerance designs. Furthermore, we design the Physics Informed Image Restoration Network (PI2RNet), considering the physical priors of panoramic imaging and physics-informed learning. At the dataset level, we create the DIVPano dataset, and extensive experiments on it illustrate that our proposed network sets the new state of the art in panoramic image restoration under spatially-variant degradation. In addition, the evaluation of the proposed ACI on a simple PAL with only 3 spherical lenses reveals the delicate balance between high-quality panoramic imaging and compact design. To the best of our knowledge, we are the first to explore Computational Imaging (CI) in PAL. Code and datasets will be made publicly available at https://github.com/zju-jiangqi/ACI-PI2RNet.

翻訳日:2022-06-14 17:51:26 公開日:2022-06-13

# (参考訳) ロボットマニピュレーションタスクの強化学習におけるSim2Real転送のランダム化効果の解析

Analysis of Randomization Effects on Sim2Real Transfer in Reinforcement Learning for Robotic Manipulation Tasks ( http://arxiv.org/abs/2206.06282v1 )

ライセンス: CC BY 4.0

Josip Josifovski, Mohammadhossein Malmir, Noah Klarmann, Bare Luka Žagar, Nicolás Navarro-Guerrero and Alois Knoll

(参考訳) ランダム化は現在、ロボット工学におけるデータ駆動学習アルゴリズムのSim2Real転送において広く使われているアプローチである。それでもほとんどのsim2real研究は、特定のランダム化手法と高度にカスタマイズされたロボットシステムの結果を報告しており、異なるランダム化アプローチを体系的に評価することは困難である。この問題に対処するために、ロボットリーチ・アンド・バランスマニピュレータタスクの再現容易な実験セットアップを定義し、比較のためのベンチマークとして機能する。 4つのランダム化戦略と3つのランダム化パラメータをシミュレーションと実ロボットで比較する。その結果,よりランダム化がsim2実数転送の助けとなるが,シミュレーションにおける適切なポリシーを見つけるアルゴリズムの能力を損なう可能性があることがわかった。完全にランダム化されたシミュレーションと微調整は、異なる結果を示し、テストされた他のアプローチよりも実際のロボットに翻訳する。

Randomization is currently a widely used approach in Sim2Real transfer for data-driven learning algorithms in robotics. Still, most Sim2Real studies report results for a specific randomization technique and often on a highly customized robotic system, making it difficult to evaluate different randomization approaches systematically. To address this problem, we define an easy-to-reproduce experimental setup for a robotic reach-and-balance manipulator task, which can serve as a benchmark for comparison. We compare four randomization strategies with three randomized parameters both in simulation and on a real robot. Our results show that more randomization helps in Sim2Real transfer, yet it can also harm the ability of the algorithm to find a good policy in simulation. Fully randomized simulations and fine-tuning show differentiated results and translate better to the real robot than the other approaches tested.

翻訳日:2022-06-14 17:48:50 公開日:2022-06-13

# AI研究のためのX-Risk解析

X-Risk Analysis for AI Research ( http://arxiv.org/abs/2206.05862v1 )

ライセンス: Link先を確認

Dan Hendrycks, Mantas Mazeika

(参考訳) 人工知能(AI)は、社会を大幅に改善する可能性があるが、強力なテクノロジーと同様に、リスクと責任が高められる。現在のAI研究は、投機的長期リスクを含むAIシステムから長期リスクを管理する方法に関する体系的な議論を欠いている。 AIが人類の長期的な可能性を改善するのに不可欠である可能性を念頭に置いて、よりインテリジェントで強力なAIシステムを構築することが、最終的には私たちよりも強力なシステムをもたらすのではないかという懸念がある。これらの議論の正確さと基礎化のため,我々は,大規模プロセスをより安全な方向に進めるように設計されたハザード分析とシステム安全性の観点から,時間的にテストされた概念の集合について検討する。次に、AI研究者がAIシステムの安全性に長期的な影響を現実的に与える方法について論じる。最後に、安全性と一般的な能力のバランスに影響を与えるプロセスを堅牢に形成する方法について論じる。

Artificial intelligence (AI) has the potential to greatly improve society, but as with any powerful technology, it comes with heightened risks and responsibilities. Current AI research lacks a systematic discussion of how to manage long-tail risks from AI systems, including speculative long-term risks. Keeping in mind that AI may be integral to improving humanity's long-term potential, there is some concern that building ever more intelligent and powerful AI systems could eventually result in systems that are more powerful than us; some say this is like playing with fire and speculate that this could create existential risks (x-risks). To add precision and ground these discussions, we review a collection of time-tested concepts from hazard analysis and systems safety, which have been designed to steer large processes in safer directions. We then discuss how AI researchers can realistically have long-term impacts on the safety of AI systems. Finally, we discuss how to robustly shape the processes that will affect the balance between safety and general capabilities.

翻訳日:2022-06-14 17:07:27 公開日:2022-06-13

# 干渉認識を伴うオンラインサービスシステムにおける因果推論に基づくルート原因解析

Causal Inference-Based Root Cause Analysis for Online Service Systems with Intervention Recognition ( http://arxiv.org/abs/2206.05871v1 )

ライセンス: Link先を確認

Mingjie Li, Zeyan Li, Kanglin Yin, Xiaohui Nie, Wenchi Zhang, Kaixin Sui, Dan Pei

(参考訳) 断層診断は多くの領域で重要であり、断層は安全上の脅威や経済的な損失につながる可能性がある。オンラインサービスシステムの分野では、オペレータは障害の検出と軽減のために巨大な監視データに依存している。根本原因指標の小さなセットを迅速に認識することで、障害の軽減に多くの時間を費やすことができる。本稿では,介入認識という新たな因果推論タスクとして根本原因分析問題を定式化する。そこで我々はCausal Inference-based Root Cause Analysis (CIRCA)という,教師なし因果推論に基づく新しい手法を提案した。核となる考え方は、監視変数が根本原因指標となるのに十分な条件、すなわち因果ベイズネットワーク(cbn)の親に条件づけられた確率分布の変化である。オンラインサービスシステムのアプリケーションに向けて、circaはシステムアーキテクチャの知識と因果的仮定に基づいてメトリクスを監視するグラフを構築します。シミュレーション研究は、CIRCAの理論的信頼性を示す。実世界のデータセットのパフォーマンスは、circaが最高のベースラインメソッドよりも、トップ1レコメンデーションのリコールを25%改善できることを示している。

Fault diagnosis is critical in many domains, as faults may lead to safety threats or economic losses. In the field of online service systems, operators rely on enormous amounts of monitoring data to detect and mitigate failures. Quickly recognizing a small set of root cause indicators for the underlying fault can save much time for failure mitigation. In this paper, we formulate the root cause analysis problem as a new causal inference task named intervention recognition. We propose a novel unsupervised causal inference-based method named Causal Inference-based Root Cause Analysis (CIRCA). The core idea is a sufficient condition for a monitoring variable to be a root cause indicator, i.e., the change of probability distribution conditioned on the parents in the Causal Bayesian Network (CBN). Towards the application in online service systems, CIRCA constructs a graph among monitoring metrics based on the knowledge of system architecture and a set of causal assumptions. The simulation study illustrates the theoretical reliability of CIRCA. The performance on a real-world dataset further shows that CIRCA can improve the recall of the top-1 recommendation by 25% over the best baseline method.

翻訳日:2022-06-14 17:07:09 公開日:2022-06-13

# スパースグループブースティング --偏りのないグループと変数の選択

Sparse-group boosting -- Unbiased group and variable selection ( http://arxiv.org/abs/2206.06344v1 )

ライセンス: Link先を確認

Fabian Obster, Christian Heumann

(参考訳) グループ化共変量体の存在下では,グループ内およびグループ間の間隔を強制できる強化のためのフレームワークを提案する。自由度を調整した成分方向および群方向勾配ブーストを同時に使用することにより、スパース群lassoと同様の特性を有するモデルをブーストにより装着することができる。群内および群間間隔を混合パラメータで制御できることを示し, スパース群ラッソにおける混合パラメータとの類似性と相違について考察した。シミュレーション,遺伝子データおよび農業データを用いて,この推定装置の有効性と予測的競争性を示す。データとシミュレーションは、群化変数が存在する場合、スパースグループブースティングの使用は、偏りの少ない変数選択と、コンポーネントワイズブースティングよりも高い予測可能性に関連していることを示唆している。さらに,自由度を通じて成分的に促進するバイアスを低減する方法を提案する。

In the presence of grouped covariates, we propose a framework for boosting that allows sparsity to be enforced within and between groups. By using component-wise and group-wise gradient boosting at the same time with adjusted degrees of freedom, a model with similar properties to the sparse group lasso can be fitted through boosting. We show that within-group and between-group sparsity can be controlled by a mixing parameter and discuss similarities and differences to the mixing parameter in the sparse group lasso. With simulations, gene data, and agricultural data, we show the effectiveness and predictive competitiveness of this estimator. The data and simulations suggest that, in the presence of grouped variables, the use of sparse group boosting is associated with less biased variable selection and higher predictability compared to component-wise boosting. Additionally, we propose a way of reducing bias in component-wise boosting through the degrees of freedom.

翻訳日:2022-06-14 17:04:21 公開日:2022-06-13

# マルチコアCPUにおける高スループット・低レイテンシソフトウェア定義無線用DSEL

A DSEL for High Throughput and Low Latency Software-Defined Radio on Multicore CPUs ( http://arxiv.org/abs/2206.06147v1 )

ライセンス: Link先を確認

Adrien Cassagne (ALSOC, SU), Romain Tajan (IMS, Bordeaux INP), Olivier Aumage (STORM), Camille Leroux (IMS, Bordeaux INP), Denis Barthou (STORM, Bordeaux INP), Christophe Jégo (IMS, Bordeaux INP)

(参考訳) 本稿では、Software-Defined Radio(SDR)専用の新しいDomain Specific Embedded Language(DSEL)について述べる。注意深く設計されたコンポーネントセットから、効率的なソフトウェアデジタル通信システムを構築することができ、プログラマにとって簡単で安全な方法で、現代のプロセッサアーキテクチャの並列性を活用することができる。特に,提案するDSELは,パイプライニングとシーケンス複製を併用して,ディジタル通信システムから時間的および空間的並列性を抽出することができる。 DSELの機能は、ソフトウェアで完全に設計された広く使われているDVB-S2標準のための完全なデジタルトランスシーバーである。評価により,提案したソフトウェアDVB-S2トランシーバが,最新のハイエンドマルチコアCPUターゲットを最大限に活用できることを示す。

This article presents a new Domain Specific Embedded Language (DSEL) dedicated to Software-Defined Radio (SDR). From a set of carefully designed components, it enables building efficient software digital communication systems that take advantage of the parallelism of modern processor architectures in a straightforward and safe manner for the programmer. In particular, the proposed DSEL enables the combination of pipelining and sequence duplication techniques to extract both temporal and spatial parallelism from digital communication systems. We leverage the DSEL capabilities on a real use case: a fully digital transceiver for the widely used DVB-S2 standard designed entirely in software. Through evaluation, we show how the proposed software DVB-S2 transceiver is able to get the most from modern, high-end multicore CPU targets.

翻訳日:2022-06-14 17:04:03 公開日:2022-06-13

# SwitchboardベンチマークでOracleのワードエラー率をゼロに

Toward Zero Oracle Word Error Rate on the Switchboard Benchmark ( http://arxiv.org/abs/2206.06192v1 )

ライセンス: Link先を確認

Arlo Faria, Adam Janin, Korbinian Riedhammer, Sidhi Adkoli

(参考訳) 「スイッチボードベンチマーク」は自動音声認識(ASR)研究において非常によく知られたテストセットであり、人間レベルの転写精度を主張するシステムの記録設定性能を確立する。この研究は、この評価のあまり知られていない実践的考察を強調し、参照文字の修正と公式評価手法からの逸脱による単語誤り率(WER)の大幅な改善を示す。このより詳細に再現可能なスキームでは、商用のASRシステムでさえ5% WER未満のスコアが得られ、研究システムの確立された記録は2.3%に低下する。書き起こし精度の別の指標が提案されており、削除を罰せず、人間と機械の性能をより区別しているように見える。商用のASRシステムはまだこの閾値を下回っているが、研究システムは商用の人間の音声認識の精度を明らかに上回ることが示されている。この研究は、選択肢のリストから最良のものを選ぶことでoracle WERを計算するために、標準化されたスコアリングツールを使うことも検討している。フレーズの代替表現は、発話レベルのN-bestリストや単語レベルのデータ構造と比較される。密なラティスを用い、語彙外単語を追加することで、0.18%のoracle WERを達成する。

The "Switchboard benchmark" is a very well-known test set in automatic speech recognition (ASR) research, establishing record-setting performance for systems that claim human-level transcription accuracy. This work highlights lesser-known practical considerations of this evaluation, demonstrating major improvements in word error rate (WER) by correcting the reference transcriptions and deviating from the official scoring methodology. In this more detailed and reproducible scheme, even commercial ASR systems can score below 5% WER and the established record for a research system is lowered to 2.3%. An alternative metric of transcript precision is proposed, which does not penalize deletions and appears to be more discriminating for human vs. machine performance. While commercial ASR systems are still below this threshold, a research system is shown to clearly surpass the accuracy of commercial human speech recognition. This work also explores using standardized scoring tools to compute oracle WER by selecting the best among a list of alternatives. A phrase alternatives representation is compared to utterance-level N-best lists and word-level data structures; using dense lattices and adding out-of-vocabulary words, this achieves an oracle WER of 0.18%.
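The notion of oracle WER over an utterance-level N-best list can be sketched as follows. This is a simplified illustration, not the standardized scoring tools or the phrase-alternatives representation the abstract refers to: it applies no text normalization, scores a single utterance rather than a corpus, and the example sentences are made up.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cur.append(min(
                prev[j] + 1,                          # deletion of a reference word
                cur[j - 1] + 1,                       # insertion of a hypothesis word
                prev[j - 1] + (ref_word != hyp_word)  # substitution or match
            ))
        prev = cur
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate of one hypothesis against one reference."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def oracle_wer(reference: str, nbest: list) -> float:
    """WER of the best hypothesis in the list, chosen with knowledge of the reference."""
    return min(wer(reference, hypothesis) for hypothesis in nbest)


reference = "the cat sat on the mat"
nbest = [
    "a cat sat on a mat",    # top-ranked hypothesis, two substitutions
    "the cat sat on a mat",  # lower-ranked hypothesis, one substitution
]

print(f"1-best WER: {wer(reference, nbest[0]):.3f}")
print(f"oracle WER: {oracle_wer(reference, nbest):.3f}")
```

Oracle WER is a lower bound on what any rescoring of the same alternatives could achieve, which is why richer representations of alternatives can push it much lower than the 1-best WER.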

翻訳日:2022-06-14 17:03:51 公開日:2022-06-13

# 標準認知症スクリーニングテストの自動化評価

Automated Evaluation of Standardized Dementia Screening Tests ( http://arxiv.org/abs/2206.06208v1 )

ライセンス: Link先を確認

Franziska Braun, Markus Förstel, Bastian Oppermann, Andreas Erzigkeit, Thomas Hillemacher, Hartmut Lehfeld, Korbinian Riedhammer

(参考訳) 認知症スクリーニングとモニタリングのためには、様々な認知タスクのパフォーマンスを測定することで主観性を最小化することを目的としており、標準化されたテストが臨床ルーチンにおいて重要な役割を果たす。本稿では,SKTとCERAD-NBの2つの標準化された神経心理学的テストに続き,半標準化された歴史からなる研究について報告する。テストには、名前オブジェクトや単語リストの学習といった基本的なタスクだけでなく、MMSEのような広く使われているツールも含まれている。ほとんどのタスクは音声で実行されるので、書き起こしに基づく自動スコアリングに適している。第1回では,手作業による手作業による評価と手作業による自動評価の相関について検討した。 sktとcerad-nbの両方において,手書きの書き起こしを用いて,高い相関度から完全相関度を観測し,相関度の低いタスクでは,音声に制限されるため,自動スコアリングは人間の基準よりも厳格である。自動転写を用いると、相関は期待通りに低下し、認識精度に関係するが、高い相関は最大0.98(SKT)と0.85(CERAD-NB)である。単語の代替は認識誤りの軽減に役立ち、専門家のスコアとの相関性が向上することを示す。

For dementia screening and monitoring, standardized tests play a key role in clinical routine since they aim at minimizing subjectivity by measuring performance on a variety of cognitive tasks. In this paper, we report on a study that consists of a semi-standardized history taking followed by two standardized neuropsychological tests, namely the SKT and the CERAD-NB. The tests include basic tasks such as naming objects, learning word lists, but also widely used tools such as the MMSE. Most of the tasks are performed verbally and should thus be suitable for automated scoring based on transcripts. For the first batch of 30 patients, we analyze the correlation between expert manual evaluations and automatic evaluations based on manual and automatic transcriptions. For both SKT and CERAD-NB, we observe high to perfect correlations using manual transcripts; for certain tasks with lower correlation, the automatic scoring is stricter than the human reference since it is limited to the audio. Using automatic transcriptions, correlations drop as expected and are related to recognition accuracy; however, we still observe high correlations of up to 0.98 (SKT) and 0.85 (CERAD-NB). We show that using word alternatives helps to mitigate recognition errors and subsequently improves correlation with expert scores.

翻訳日:2022-06-14 17:03:29 公開日:2022-06-13

# (参考訳) Federated Bayesian Neural Regression: A Scalable Global Federated Gaussian Process

Federated Bayesian Neural Regression: A Scalable Global Federated Gaussian Process ( http://arxiv.org/abs/2206.06357v1 )

ライセンス: CC BY 4.0

Haolin Yu, Kaiyang Guo, Mahdi Karami, Xi Chen, Guojun Zhang, Pascal Poupart

(参考訳) フェデレートラーニング(FL)フレームワークが適用される典型的なシナリオでは、クライアントが正確なモデルを作成するのに十分なトレーニングデータを持つことが一般的です。したがって、点推定だけでなく、信頼の概念も提供するモデルは有益である。ガウス過程(英: Gaussian Process, GP)は、自然に校正された分散推定を伴う強力なベイズモデルである。しかし、ローカルカーネルのマージがプライバシー漏洩につながるため、スタンドアローンのグローバルGPを学ぶのは難しい。プライバシーを守るために、フェデレーションgpsを検討する以前の研究は、ローカルモデルのパーソナライズされた設定や学習に集中することで、グローバルモデルを学ぶことを避ける。我々は,クライアントのプライバシを尊重するスケーラブルなスタンドアロングローバルフェデレーションGPを学習するアルゴリズムであるFederated Bayesian Neural Regression (FedBNR)を提案する。統一的なランダムカーネルを定義することで、拡張性のために深いカーネル学習とランダム機能を取り込んでいます。このランダムカーネルは、静止カーネルと多くの非定常カーネルを復元可能であることを示す。そして、すべてのクライアントデータが集中しているかのように、グローバルな予測モデルを学ぶ原則に基づくアプローチを導出します。また,グローバルカーネルを非同一かつ独立に分散したクライアントに対して,知識蒸留法を用いて学習する。実世界の回帰データセットを用いて実験を行い、他のGPモデルと比較して統計的に有意な改善を示した。

In typical scenarios where the Federated Learning (FL) framework applies, it is common for clients to have insufficient training data to produce an accurate model. Thus, models that provide not only point estimations, but also some notion of confidence are beneficial. Gaussian Process (GP) is a powerful Bayesian model that comes with naturally well-calibrated variance estimations. However, it is challenging to learn a stand-alone global GP since merging local kernels leads to privacy leakage. To preserve privacy, previous works that consider federated GPs avoid learning a global model by focusing on the personalized setting or learning an ensemble of local models. We present Federated Bayesian Neural Regression (FedBNR), an algorithm that learns a scalable stand-alone global federated GP that respects clients' privacy. We incorporate deep kernel learning and random features for scalability by defining a unifying random kernel. We show this random kernel can recover any stationary kernel and many non-stationary kernels. We then derive a principled approach of learning a global predictive model as if all client data is centralized. We also learn global kernels with knowledge distillation methods for non-identically and independently distributed (non-i.i.d.) clients. Experiments are conducted on real-world regression datasets and show statistically significant improvements compared to other federated GP models.

翻訳日:2022-06-14 16:57:08 公開日:2022-06-13

# 連続k-Nearest Neighboursグラフを用いた局所距離保存オートエンコーダ

Local distance preserving auto-encoders using Continuous k-Nearest Neighbours graphs ( http://arxiv.org/abs/2206.05909v1 )

ライセンス: Link先を確認

Nutan Chen, Patrick van der Smagt, Botond Cseke

(参考訳) データの類似性を保存する自動エンコーダモデルは、表現学習において一般的なツールである。本稿では,データ空間から潜在空間へのマッピング時に局所距離を保持する自動エンコーダモデルをいくつか紹介する。我々は、任意のスケールで位相的特徴を同時に捉えることが知られている連続k-Nearest Neighboursグラフに基づく局所的距離保存損失を用いる。学習性能を向上させるために,局所的距離保存を主目的とし,復元精度を制約として学習を制約最適化問題として定式化する。このアプローチを階層的変分オートエンコーダに一般化し,幾何学的一貫性のある潜在性とデータ空間を持つ生成モデルを学ぶ。提案手法は,複数の標準データセットと評価指標にまたがって最先端のパフォーマンスを提供する。

Auto-encoder models that preserve similarities in the data are a popular tool in representation learning. In this paper we introduce several auto-encoder models that preserve local distances when mapping from the data space to the latent space. We use a local distance preserving loss that is based on the continuous k-nearest neighbours graph, which is known to capture topological features at all scales simultaneously. To improve training performance, we formulate learning as a constrained optimisation problem with local distance preservation as the main objective and reconstruction accuracy as a constraint. We generalise this approach to hierarchical variational auto-encoders, thus learning generative models with geometrically consistent latent and data spaces. Our method provides state-of-the-art performance across several standard datasets and evaluation metrics.
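As background, the continuous k-nearest neighbours (CkNN) graph itself can be sketched in a few lines of NumPy. This follows the usual definition from the CkNN literature as I understand it (connect two points when their distance is below `delta` times the geometric mean of their k-th neighbour distances); it is an assumption that the paper uses exactly this form, and the paper's loss function is not reproduced here. The parameter names `k` and `delta` and the toy points are illustrative.

```python
import numpy as np


def cknn_adjacency(points: np.ndarray, k: int = 2, delta: float = 1.2) -> np.ndarray:
    """Boolean adjacency matrix of a continuous k-nearest neighbours graph.

    Points i and j are connected when
        dist(i, j) < delta * sqrt(d_k(i) * d_k(j)),
    where d_k(i) is the distance from point i to its k-th nearest neighbour.
    """
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the point itself (distance 0),
    # so column k holds the distance to the k-th nearest neighbour.
    d_k = np.sort(dist, axis=1)[:, k]
    adjacency = dist < delta * np.sqrt(np.outer(d_k, d_k))
    np.fill_diagonal(adjacency, False)  # no self-loops
    return adjacency


# Two well-separated toy clusters of three points each.
points = np.array([
    [0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
    [10.0, 10.0], [11.0, 10.0], [10.0, 11.0],
])
adjacency = cknn_adjacency(points, k=2, delta=1.2)
print(adjacency.astype(int))
```

Because the connection threshold scales with the local neighbour distances, the graph adapts to varying sampling density instead of relying on one global radius.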

翻訳日:2022-06-14 16:31:32 公開日:2022-06-13

# 階層構造を持つプライベート合成データ

Private Synthetic Data with Hierarchical Structure ( http://arxiv.org/abs/2206.05942v1 )

ライセンス: Link先を確認

Terrance Liu, Zhiwei Steven Wu

(参考訳) 本研究では、個人データポイントがグループ化される階層的データセット(例えば、家庭内の人々)に対する差分プライベートな合成データ生成の問題について検討する。特に、合成データセットと基礎となるプライベートデータセットの類似性を測定するために、プライベートクエリリリースの問題の下で目的を定め、クエリの集合(平均集計数のような統計)の回答を保存する合成データセットを生成します。しかし、クエリリリース問題へのプライベートな合成データの適用はよく研究されているが、そのような研究は階層的でないデータドメインに限定されており、最初の疑問を提起している。さらに、これらの統計を捉えながら、グループレベルでも個人レベルでも合成データを生成する方法はまだ確立されていない。これらの課題を踏まえて、我々はまず階層的なクエリリリースの問題を定式化し、そこでは階層的なデータセットの統計収集を目標としています。具体的には、グループと個人レベルの属性間の関係をキャプチャする統計クエリの一般的なセットを提供する。次に,階層的クエリリリースのためのプライベート合成データアルゴリズムを導入し,american community surveyとalegheny family screening toolデータから得られた階層的データセット上で評価する。最後に、アメリカン・コミュニティ・サーベイ(American Community Survey)に注目します。その本質的に階層構造は、実験を行う別のドメイン固有のクエリのセットを生み出します。

We study the problem of differentially private synthetic data generation for hierarchical datasets in which individual data points are grouped together (e.g., people within households). In particular, to measure the similarity between the synthetic dataset and the underlying private one, we frame our objective under the problem of private query release, generating a synthetic dataset that preserves answers for some collection of queries (i.e., statistics like mean aggregate counts). However, while the application of private synthetic data to the problem of query release has been well studied, such research is restricted to non-hierarchical data domains, raising the initial question -- what queries are important when considering data of this form? Moreover, it has not yet been established how one can generate synthetic data at both the group and individual-level while capturing such statistics. In light of these challenges, we first formalize the problem of hierarchical query release, in which the goal is to release a collection of statistics for some hierarchical dataset. Specifically, we provide a general set of statistical queries that captures relationships between attributes at both the group and individual-level. Subsequently, we introduce private synthetic data algorithms for hierarchical query release and evaluate them on hierarchical datasets derived from the American Community Survey and Allegheny Family Screening Tool data. Finally, we look to the American Community Survey, whose inherent hierarchical structure gives rise to another set of domain-specific queries that we run experiments with.

翻訳日:2022-06-14 16:29:02 公開日:2022-06-13

# 制約付き高次元ベイズ最適化:粉体重み付けへの応用

High-Dimensional Bayesian Optimization with Constraints: Application to Powder Weighing ( http://arxiv.org/abs/2206.05988v1 )

ライセンス: Link先を確認

Shoki Miyagawa, Atsuyoshi Yano, Naoko Sawada and Isamu Ogawa

(参考訳) ベイズ最適化はブラックボックス問題のパラメータを効果的に最適化する。しかし,この手法は限定試行において高次元パラメータには有効ではなかった。パラメータは非線形に低次元空間に埋め込むことで効率的に探索することができるが、制約は考慮できない。高次元ベイズ最適化において既知の等式と未知不等式制約の両方を考慮するため,非線形埋め込みに不等角表現学習を導入することでパラメータ分解を組み合わせることを提案した。提案手法を使用シナリオとしてパウダー重み付け作業に適用した。提案手法は,実験結果に基づいて制約を考慮し,手動パラメータチューニングと比較して約66%の試行回数削減に寄与する。

Bayesian optimization works effectively for optimizing parameters in black-box problems. However, this method does not work well for high-dimensional parameters in a limited number of trials. Parameters can be efficiently explored by nonlinearly embedding them into a low-dimensional space; however, the constraints cannot be considered. We propose combining parameter decomposition by introducing disentangled representation learning into nonlinear embedding to consider both known equality and unknown inequality constraints in high-dimensional Bayesian optimization. We applied the proposed method to a powder weighing task as a usage scenario. Based on the experimental results, the proposed method considers the constraints and contributes to reducing the number of trials by approximately 66% compared to manual parameter tuning.

翻訳日:2022-06-14 16:28:39 公開日:2022-06-13

# 非修正解析を用いた有向非巡回グラフにおける一般DNNの関数近似と安定性の解析

Analysis of function approximation and stability of general DNNs in directed acyclic graphs using un-rectifying analysis ( http://arxiv.org/abs/2206.05997v1 )

ライセンス: Link先を確認

Wen-Liang Hwang and Shih-Shuo Tung

(参考訳) ディープフィードフォワードニューラルネットワーク(DNN)に関する一般的な理解の欠如は、非線形関数の構成を分析するツールの欠如と、DNNアーキテクチャの多様性に適用可能な数学的モデルの欠如によるものである。本稿では,不整合法を用いて有向非巡回グラフ(DAG)を用いてDNNを解析するために,アクティベーション関数,非線形変換,DNNアーキテクチャに関するいくつかの基本的な仮定を行った。これらの仮定を満たすDNNは一般的なDNNと呼ばれる。分析グラフの構築は,dagをボトムアップから,規制規則に従って基本要素への原子操作の適用によって構築する公理的手法に基づく。このアプローチにより、数学的帰納法により一般DNNの特性を導出することができる。提案手法を用いることで、一般的なDNNに対して真である性質を導出できることを示す。この分析はネットワーク機能の理解を深め、グラフの分析ツールのホストを活用できれば、さらなる理論的洞察を促進することができる。

A general lack of understanding pertaining to deep feedforward neural networks (DNNs) can be attributed partly to a lack of tools with which to analyze the composition of non-linear functions, and partly to a lack of mathematical models applicable to the diversity of DNN architectures. In this paper, we made a number of basic assumptions pertaining to activation functions, non-linear transformations, and DNN architectures in order to use the un-rectifying method to analyze DNNs via directed acyclic graphs (DAGs). DNNs that satisfy these assumptions are referred to as general DNNs. Our construction of an analytic graph was based on an axiomatic method in which DAGs are built from the bottom up through the application of atomic operations to basic elements in accordance with regulatory rules. This approach allows us to derive the properties of general DNNs via mathematical induction. We show that, using the proposed approach, some properties that hold true for general DNNs can be derived. This analysis advances our understanding of network functions and could promote further theoretical insights if the host of analytical tools for graphs can be leveraged.

翻訳日:2022-06-14 16:28:28 公開日:2022-06-13

# シャープネス・アウェア・ミニミゼーションの理解に向けて

Towards Understanding Sharpness-Aware Minimization ( http://arxiv.org/abs/2206.06232v1 )

ライセンス: Link先を確認

Maksym Andriushchenko, Nicolas Flammarion

(参考訳) Sharpness-Aware Minimization (SAM) は、様々な設定における一般化を著しく改善する最悪の重み摂動に依存する最近の訓練手法である。我々は、PAC-ベイズ一般化境界に基づくSAMの成功に対する既存の正当化と平坦なミニマへの収束の考えが不完全であると主張する。さらに、SAM で$m$-sharpness を使うことの成功については、一般化に必須であることが示されている説明がない。 SAMのこの側面をよりよく理解するために、対角線ネットワークの暗黙バイアスを理論的に分析する。 SAMは常にある種の問題に対して標準勾配降下よりも優れた一般化特性を持つ解を選択しており、この効果は$m$-シャープネスを用いて増幅される。さらに,非線形ネットワーク上での暗黙バイアスの特性を実証的に研究し,SAMを用いた標準モデルの微調整が一般化の改善につながることを示した。最後に,確率勾配を用いた非凸目的に対するsamの収束結果を示す。本稿では,これらの結果を深層ネットワークに実証的に説明し,SAMの一般化挙動との関係について論じる。実験のコードは https://github.com/tml-epfl/understanding-sam で公開されている。

Sharpness-Aware Minimization (SAM) is a recent training method that relies on worst-case weight perturbations which significantly improves generalization in various settings. We argue that the existing justifications for the success of SAM which are based on a PAC-Bayes generalization bound and the idea of convergence to flat minima are incomplete. Moreover, there are no explanations for the success of using $m$-sharpness in SAM which has been shown as essential for generalization. To better understand this aspect of SAM, we theoretically analyze its implicit bias for diagonal linear networks. We prove that SAM always chooses a solution that enjoys better generalization properties than standard gradient descent for a certain class of problems, and this effect is amplified by using $m$-sharpness. We further study the properties of the implicit bias on non-linear networks empirically, where we show that fine-tuning a standard model with SAM can lead to significant generalization improvements. Finally, we provide convergence results of SAM for non-convex objectives when used with stochastic gradients. We illustrate these results empirically for deep networks and discuss their relation to the generalization behavior of SAM. The code of our experiments is available at https://github.com/tml-epfl/understanding-sam.
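
As a rough illustration of the worst-case weight perturbation mentioned in the abstract, the sketch below applies one commonly used form of the SAM update (a normalized gradient ascent step of radius `rho`, followed by a descent step using the gradient at the perturbed point) to a toy two-parameter loss. The loss, `rho`, and the learning rate are assumptions chosen for illustration and are not taken from the paper; $m$-sharpness, which applies the perturbation per mini-batch shard, is not modelled here.

```python
import math

def loss(w):
    # Hypothetical toy loss: one sharp direction (w[0]) and one flat direction (w[1]).
    return 0.5 * (10.0 * w[0] ** 2 + 0.1 * w[1] ** 2)

def grad(w):
    # Analytic gradient of the toy loss above.
    return [10.0 * w[0], 0.1 * w[1]]

def sam_step(w, lr=0.05, rho=0.1):
    g = grad(w)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return list(w)  # at a stationary point there is no ascent direction
    # Ascent step: move to the approximate worst-case point within radius rho.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Descent step: update the ORIGINAL weights with the gradient taken at w_adv.
    g_adv = grad(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]

w = [1.0, 1.0]
for _ in range(50):
    w = sam_step(w)
```

Each step costs two gradient evaluations instead of one, which is the usual price of this kind of update.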

翻訳日:2022-06-14 16:27:51 公開日:2022-06-13

# (参考訳) Proof Tree AutomataとProof Tree Graphsの紹介

Introducing Proof Tree Automata and Proof Tree Graphs ( http://arxiv.org/abs/2206.06294v1 )

ライセンス: CC BY 4.0

Valentin D. Richard

(参考訳) 構造証明理論では、大規模計算の設計と作業は、システム全体の一部として個々の規則について直観を得るのを難しくする。グラフ理論とオートマトン理論のアプローチを用いて,計算作業を支援する2つの新しいツールを提案する。第一のツールはProof Tree Automaton (PTA) であり、その言語が電卓の派生言語である木オートマトンである。第2のツールは、Proof Tree Graph (PTG) と呼ばれる計算のグラフィカル表現である。この有向ハイパーグラフでは、頂点は項(例えば列)の集合であり、ハイパーアークは規則である。 PTA と PTG の性質とそれらの相互関係について検討する。計算値から従来の木オートマトンへの部分写像としてPTAを分解できることが示される。我々はその文を精錬システム理論で定式化する。最後に、フレームワークをネットと文字列ダイアグラムの証明と比較します。

In structural proof theory, designing and working on large calculi make it difficult to get intuitions about each rule individually and as part of a whole system. We introduce two novel tools to help with working on calculi, using approaches from graph theory and automata theory. The first tool is a Proof Tree Automaton (PTA): a tree automaton whose language is the derivation language of a calculus. The second tool is a graphical representation of a calculus called a Proof Tree Graph (PTG). In this directed hypergraph, vertices are sets of terms (e.g. sequents) and hyperarcs are rules. We explore properties of PTAs and PTGs and how they relate to each other. We show that we can decompose a PTA as a partial map from a calculus to a traditional tree automaton. We formulate that statement in the theory of refinement systems. Finally, we compare our framework to proof nets and string diagrams.

翻訳日:2022-06-14 16:25:39 公開日:2022-06-13

# (参考訳) arf:芸術的照度分野

ARF: Artistic Radiance Fields ( http://arxiv.org/abs/2206.06360v1 )

ライセンス: CC BY 4.0

Kai Zhang and Nick Kolkin and Sai Bi and Fujun Luan and Zexiang Xu and Eli Shechtman and Noah Snavely

(参考訳) 本稿では,任意のスタイル画像の芸術的特徴を3Dシーンに転送する方法を提案する。点雲やメッシュ上で3次元スタイリングを行う従来の方法は、複雑な現実世界のシーンの幾何学的再構成誤差に敏感である。代わりに、よりロバストな放射場表現をスタイライズすることを提案する。一般的に使用されるグラム行列に基づく損失は,忠実なブラシストロークを使わずに曖昧な結果をもたらす傾向があり,多視点一貫性を維持しつつ,スタイルの詳細を捉えるのに極めて効果的である近傍の損失を導入する。また、フル解像度レンダリング画像上で定義されたスタイル損失を用いて、メモリ集約放射場を最適化する新しい遅延バックプロパゲーション手法を提案する。本手法は, スタイル画像に類似した芸術的外観を生成することにより, ベースラインよりも優れることを示す。ビデオ結果とオープンソース実装については、プロジェクトページを参照してください。

We present a method for transferring the artistic features of an arbitrary style image to a 3D scene. Previous methods that perform 3D stylization on point clouds or meshes are sensitive to geometric reconstruction errors for complex real-world scenes. Instead, we propose to stylize the more robust radiance field representation. We find that the commonly used Gram matrix-based loss tends to produce blurry results without faithful brushstrokes, and introduce a nearest neighbor-based loss that is highly effective at capturing style details while maintaining multi-view consistency. We also propose a novel deferred back-propagation method to enable optimization of memory-intensive radiance fields using style losses defined on full-resolution rendered images. Our extensive evaluation demonstrates that our method outperforms baselines by generating artistic appearance that more closely resembles the style image. Please check our project page for video results and open-source implementations: https://www.cs.cornell.edu/projects/arf/ .
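
To make the idea of a nearest neighbor-based style loss more concrete, here is a minimal sketch: each feature vector from the rendered image is matched to its closest style-image feature, and the mean distance is the loss. The use of cosine distance, the function names, and the plain-list feature representation are assumptions made for illustration; the abstract does not specify these details.

```python
import math

def cosine_distance(a, b):
    # Assumes both vectors are non-zero.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def nearest_neighbor_loss(rendered_feats, style_feats):
    # For every rendered feature, keep only the distance to its nearest style feature.
    total = 0.0
    for f in rendered_feats:
        total += min(cosine_distance(f, s) for s in style_feats)
    return total / len(rendered_feats)

style = [[2.0, 0.0], [0.0, 1.0]]
aligned = nearest_neighbor_loss([[1.0, 0.0], [0.0, 3.0]], style)
misaligned = nearest_neighbor_loss([[1.0, 1.0]], style)
```

Unlike a Gram-matrix loss, which compares global feature statistics, this kind of loss compares individual local features, which is one plausible reason it preserves fine style detail.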

翻訳日:2022-06-14 15:57:50 公開日:2022-06-13

# オブジェクトトークンのフレームクリップ一貫性による映像シーン構造の実現

Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens ( http://arxiv.org/abs/2206.06346v1 )

ライセンス: Link先を確認

Elad Ben-Avraham, Roei Herzig, Karttikeya Mangalam, Amir Bar, Anna Rohrbach, Leonid Karlinsky, Trevor Darrell, Amir Globerson

(参考訳) 最近の行動認識モデルは、オブジェクト、それらの位置、相互作用を統合することで印象的な結果を得た。しかし、各フレームに対して厳密な構造化アノテーションを取得するのは面倒で時間を要するため、これらのメソッドはトレーニングコストが高く、スケーラビリティも低い。同時に、関心領域内外を問わず、注釈付き画像の小さなセットが利用可能であれば、これをビデオ下流タスクに活用するにはどうすればよいのか? 学習フレームワークStructureViT(略してSViT)を提案し、トレーニング中にのみ利用できる少数の画像の構造を利用することで、ビデオモデルを改善する方法を示す。 SViTは2つの重要な洞察に依存している。まず、画像とビデオの両方に構造化情報が含まれているため、画像とビデオにまたがって使用できる「emph{object tokens}」セットのトランスフォーマーモデルを統合する。第二に、動画中の個々のフレームのシーン表現は静止画と「一致」すべきである。これは、画像とビデオ間の構造化情報の流れを保証する \emph{frame-clip consistency} 損失によって達成される。場面構造の特定のインスタンス化、すなわち、手と物体がノードとして位置し、接点/非接点がエッジとして物理的関係からなる、\emph{hand-object graph} を探索する。 SViTは、複数のビデオ理解タスクとデータセットで強力なパフォーマンス向上を示しており、Ego4D CVPR'22 Object State Localizationチャレンジで優勝している。コードと事前訓練されたモデルについては、プロジェクトページの \url{https://eladb3.github.io/SViT/} を参照してください。

Recent action recognition models have achieved impressive results by integrating objects, their locations and interactions. However, obtaining dense structured annotations for each frame is tedious and time-consuming, making these methods expensive to train and less scalable. At the same time, if a small set of annotated images is available, either within or outside the domain of interest, how could we leverage these for a video downstream task? We propose a learning framework StructureViT (SViT for short), which demonstrates how utilizing the structure of a small number of images only available during training can improve a video model. SViT relies on two key insights. First, as both images and videos contain structured information, we enrich a transformer model with a set of \emph{object tokens} that can be used across images and videos. Second, the scene representations of individual frames in video should "align" with those of still images. This is achieved via a \emph{Frame-Clip Consistency} loss, which ensures the flow of structured information between images and videos. We explore a particular instantiation of scene structure, namely a \emph{Hand-Object Graph}, consisting of hands and objects with their locations as nodes, and physical relations of contact/no-contact as edges. SViT shows strong performance improvements on multiple video understanding tasks and datasets; and it wins first place in the Ego4D CVPR'22 Object State Localization challenge. For code and pretrained models, visit the project page at \url{https://eladb3.github.io/SViT/}

翻訳日:2022-06-14 15:56:49 公開日:2022-06-13

# 小型人辞書に基づく具体性・難易度評価付き大辞書の自動生成

Automatic generation of a large dictionary with concreteness/abstractness ratings based on a small human dictionary ( http://arxiv.org/abs/2206.06200v1 )

ライセンス: Link先を確認

Vladimir Ivanov, Valery Solovyev

(参考訳) 具体的/抽象的な言葉は、心理学的・神経生理学的研究の増加に使われている。いくつかの言語では、大きな辞書が手作業で作成されている。これは非常に時間がかかり、コストがかかるプロセスです。より小さなサンプルで得られた専門家評価を外挿する必要があるコンクリート/コンクリートの単語の高品質辞書を自動で生成する。研究上の疑問は、このようなサンプルがどの程度小さくして十分な外挿を行うべきかである。本稿では,単語の自動格付け手法を提案するとともに,専門家評価の量を大幅に削減するためのアプローチを提案する。この手法は英語の大規模なテストセットで評価されている。構築された辞書の品質は専門家に匹敵する。予測された評価と専門家の評価の相関は、最先端の手法と比較して高い。

Concrete/abstract words are used in a growing number of psychological and neurophysiological studies. For a few languages, large dictionaries have been created manually. This is a very time-consuming and costly process. To generate large high-quality dictionaries of concrete/abstract words automatically, one needs to extrapolate the expert assessments obtained on smaller samples. The research question that arises is how small such samples can be while still allowing a good enough extrapolation. In this paper, we present a method for automatically rating the concreteness of words and propose an approach to significantly decrease the amount of expert assessment. The method has been evaluated on a large test set for English. The quality of the constructed dictionaries is comparable to that of the expert ones. The correlation between predicted and expert ratings is higher than that of the state-of-the-art methods.

翻訳日:2022-06-14 15:55:45 公開日:2022-06-13

# (参考訳) SNeS:不完全データからおそらく対称性のあるニューラルサーフェスを学習する

SNeS: Learning Probably Symmetric Neural Surfaces from Incomplete Data ( http://arxiv.org/abs/2206.06340v1 )

ライセンス: CC BY 4.0

Eldar Insafutdinov, Dylan Campbell, Jo\~ao F. Henriques, Andrea Vedaldi

(参考訳) 部分対称物体の正確な3次元再構成法を提案する。我々は、ニューラルレイディアンスフィールド(NeRF)のようなニューラル再構成とレンダリングの最近の進歩の強みの上に構築する。このようなアプローチの大きな欠点は、トレーニングイメージではっきりと見えないオブジェクトの任意の部分を再構築できないことだ。証拠が欠落している場合、対称性のような構造的事前情報を使用して、欠落情報を完成させることができる。幾何学的および非反射的材料は対称的であるかもしれないが、周囲のシーンからの影や反射は一般に対称ではない。これに対処するために,3次元形状と材料特性にソフト対称性の制約を適用し,照明,アルベド色,反射率に寄与する。提案手法を最近導入したCO3Dデータセット上で評価し,高反射性材料を再構成する難しさから自動車カテゴリーに着目した。高い忠実度で観察されていない領域を再構成し、高品質のノベルビュー画像を作成することができることを示す。

We present a method for the accurate 3D reconstruction of partly-symmetric objects. We build on the strengths of recent advances in neural reconstruction and rendering such as Neural Radiance Fields (NeRF). A major shortcoming of such approaches is that they fail to reconstruct any part of the object which is not clearly visible in the training image, which is often the case for in-the-wild images and videos. When evidence is lacking, structural priors such as symmetry can be used to complete the missing information. However, exploiting such priors in neural rendering is highly non-trivial: while geometry and non-reflective materials may be symmetric, shadows and reflections from the ambient scene are not symmetric in general. To address this, we apply a soft symmetry constraint to the 3D geometry and material properties, having factored appearance into lighting, albedo colour and reflectivity. We evaluate our method on the recently introduced CO3D dataset, focusing on the car category due to the challenge of reconstructing highly-reflective materials. We show that it can reconstruct unobserved regions with high fidelity and render high-quality novel view images.

翻訳日:2022-06-14 15:55:00 公開日:2022-06-13

# link3d: 3dlidar point cloudの線形キーポイント表現

LinK3D: Linear Keypoints Representation for 3D LiDAR Point Cloud ( http://arxiv.org/abs/2206.05927v1 )

ライセンス: Link先を確認

Yunge Cui, Yinlong Zhang, Jiahua Dong, Haibo Sun and Feng Zhu

(参考訳) 特徴抽出とマッチングは、2Dや3Dオブジェクトの検出、認識、登録など、多くのコンピュータビジョンタスクの基本部分である。ご存知の通り、2Dの特徴抽出とマッチングはすでに大きな成功を収めています。残念なことに、3Dの分野では、現在の手法は視覚タスクにおける3D LiDARセンサーの広範囲な応用をサポートできない。この制限に対処するため,LinK3Dと呼ばれる3次元LiDAR点雲に対する線形キーポイント表現法を提案する。 LinK3D の斬新さは、LiDAR のポイントクラウドの特徴(空間性、シナリオの複雑さなど)を完全に考慮し、現在のキーポイントをその強い隣のキーポイントで表現し、現在のキーポイントの記述に強い制約を与える点にある。提案したLinK3Dは,2つの公開データセット(KITTI,Steven VLP16)で評価され,実験結果から,提案手法が適合性能の最先端性を大幅に向上することが示された。さらに重要なことに、LinK3Dは(LiDARの周波数10Hzに基づいて)優れたリアルタイムパフォーマンスを示している。 LinK3Dは、64光のレーザービームで収集された点雲から特徴を引き出すのに平均32ミリ秒しかかからず、Intel Core i7 @2.2 GHzプロセッサでノートブックで実行すると2つのLiDARスキャンと一致するのに、わずか8ミリ秒しかかからない。さらに、この手法は様々な3d視覚アプリケーションに広く拡張することができる。本稿では,LinK3Dを3次元登録,LiDARオドメトリー,位置認識タスクに適用し,最先端手法と比較して競争力のある結果を得た。

Feature extraction and matching are the basic parts of many computer vision tasks, such as 2D or 3D object detection, recognition, and registration. As is well known, 2D feature extraction and matching have already achieved great success. Unfortunately, in the field of 3D, the current methods fail to support the extensive application of 3D LiDAR sensors in vision tasks, due to their poor descriptiveness and inefficiency. To address this limitation, we propose a novel 3D feature representation method: Linear Keypoints representation for 3D LiDAR point cloud, called LinK3D. The novelty of LinK3D lies in that it fully considers the characteristics (such as sparsity and complexity of scenarios) of LiDAR point clouds, and represents the current keypoint with its robust neighbor keypoints, which provide a strong constraint on the description of the current keypoint. The proposed LinK3D has been evaluated on two public datasets (i.e., KITTI, Steven VLP16), and the experimental results show that our method greatly outperforms the state of the art in matching performance. More importantly, LinK3D shows excellent real-time performance (based on the LiDAR frequency of 10 Hz). LinK3D takes an average of only 32 milliseconds to extract features from the point cloud collected by a 64-ray laser beam, and takes merely about 8 milliseconds to match two LiDAR scans when executed on a notebook with an Intel Core i7 @2.2 GHz processor. Moreover, our method can be widely extended to a variety of 3D vision applications. In this paper, we have applied LinK3D to 3D registration, LiDAR odometry and place recognition tasks, and achieved competitive results compared with the state-of-the-art methods.
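
The real-time claim can be checked with simple arithmetic from the figures quoted in the abstract. The sketch below assumes, as a simplification not stated in the abstract, that each incoming scan needs one feature extraction and one matching pass against the previous scan, whose features are already cached; the variable names are illustrative.

```python
lidar_hz = 10      # LiDAR frequency quoted in the abstract
extract_ms = 32    # average feature extraction time per scan
match_ms = 8       # approximate time to match two scans

budget_ms = 1000 / lidar_hz            # time available per scan at 10 Hz
per_scan_ms = extract_ms + match_ms    # assumed cost per incoming scan
headroom_ms = budget_ms - per_scan_ms  # time left for downstream processing
```

Under that assumption the pipeline uses 40 ms of a 100 ms budget, leaving 60 ms per scan.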

翻訳日:2022-06-14 15:50:48 公開日:2022-06-13

# in-the-wildイメージから学ぶファッションの相性

Learning Fashion Compatibility from In-the-wild Images ( http://arxiv.org/abs/2206.05982v1 )

ライセンス: Link先を確認

Additya Popli, Vijay Kumar, Sujit Jos and Saraansh Tandon

(参考訳) 相補的なファッションレコメンデーションは、衣装として「うまく行く」異なるカテゴリー(シャツ、履物など)のアイテムを特定することを目的としている。既存のアプローチのほとんどは、手動で調整された互換アイテムの組み合わせを含むラベル付き衣装データセットを使用して、このタスクの表現を学習する。本研究では,コンパチブルな服装を身に着けることが多いという事実を活かし,自己教師付き学習を通じて,街頭ファッション画像からコンパチブル予測のための表現を学習することを提案する。本研究の前提課題は、同一人物が着用する異なる項目の表現が、他人が着用するものよりも近いように定式化されている。さらに,推測中の画像とカタログの領域間ギャップを低減するために,両領域間の特徴分布の差を最小限に抑える対角的損失を導入する。我々は、PolyvoreとPolyvore-Disjointの2つの一般的なファッション互換性ベンチマークで実験を行い、既存の自己教師型アプローチよりも優れています。

Complementary fashion recommendation aims at identifying items from different categories (e.g. shirt, footwear, etc.) that "go well together" as an outfit. Most existing approaches learn representations for this task using labeled outfit datasets containing manually curated compatible item combinations. In this work, we propose to learn representations for compatibility prediction from in-the-wild street fashion images through self-supervised learning by leveraging the fact that people often wear compatible outfits. Our pretext task is formulated such that the representations of different items worn by the same person are closer than those of items worn by other people. Additionally, to reduce the domain gap between in-the-wild and catalog images during inference, we introduce an adversarial loss that minimizes the difference in feature distribution between the two domains. We conduct our experiments on two popular fashion compatibility benchmarks, Polyvore and Polyvore-Disjoint outfits, and outperform existing self-supervised approaches; the improvement is particularly significant in the cross-dataset setting, where training and testing images come from different sources.

翻訳日:2022-06-14 15:50:12 公開日:2022-06-13

# より良い教師: 知識蒸留のための動的事前知識

Better Teacher Better Student: Dynamic Prior Knowledge for Knowledge Distillation ( http://arxiv.org/abs/2206.06067v1 )

ライセンス: Link先を確認

Zengyu Qiu, Xinzhu Ma, Kunlin Yang, Chunya Liu, Jun Hou, Shuai Yi, Wanli Ouyang

(参考訳) 知識蒸留(kd)は、大きなモデル(教師)から小さなモデル(学生)への学習表現の転送に非常に有望な能力を示している。しかし,学生と教師の能力格差が大きくなるにつれて,既存のKD手法ではより良い結果が得られない。本研究は,特に大規模教員に適用する場合において,kdにとって「優先的知識」が不可欠であることを示す。特に,教師の特徴の一部を,特徴蒸留の前に先行知識として統合する動的事前知識(DPK)を提案する。これは、我々のメソッドが教師の特徴を単に「ターゲット」ではなく「インプット」として捉えることを意味します。また,学習段階における事前知識の比率を特徴ギャップに応じて動的に調整することにより,学生を適切な難易度で指導する。提案手法を評価するため、2つの画像分類ベンチマーク(CIFAR100とImageNet)とオブジェクト検出ベンチマーク(MS COCO)について広範な実験を行った。その結果,異なる条件下での性能において,本手法が優れていることを示す。さらに,dpkにより,生徒モデルの性能と教師モデルとの正の相関が得られ,より大きな教師を適用することで,学生の正確性をさらに高めることができる。私たちのコードは再現性のために公開されます。

Knowledge distillation (KD) has shown very promising capabilities in transferring learning representations from large models (teachers) to small models (students). However, as the capacity gap between students and teachers becomes larger, existing KD methods fail to achieve better results. Our work shows that 'prior knowledge' is vital to KD, especially when applying large teachers. In particular, we propose dynamic prior knowledge (DPK), which integrates part of the teacher's features as prior knowledge before the feature distillation. This means that our method also takes the teacher's features as 'input', not just as 'target'. In addition, we dynamically adjust the ratio of the prior knowledge during the training phase according to the feature gap, thus guiding the student at an appropriate difficulty. To evaluate the proposed method, we conduct extensive experiments on two image classification benchmarks (i.e. CIFAR100 and ImageNet) and an object detection benchmark (i.e. MS COCO). The results demonstrate the superiority of our method in performance under varying settings. More importantly, our DPK makes the performance of the student model positively correlated with that of the teacher model, which means that we can further boost the accuracy of students by applying larger teachers. Our code will be made publicly available for reproducibility.
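
The idea of feeding part of the teacher's features to the student, with a ratio that follows the feature gap, can be sketched as below. This is a sketch under stated assumptions rather than the paper's implementation: the gap measure (mean squared difference), the clipping rule in `prior_ratio`, the `scale` parameter, and all function names are hypothetical choices made for illustration.

```python
import random

def feature_gap(student, teacher):
    # Hypothetical gap measure: mean squared difference between feature vectors.
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)

def prior_ratio(gap, scale=1.0):
    # Hypothetical rule: a larger gap means more teacher features are mixed in,
    # clipped to the range [0, 1].
    return max(0.0, min(1.0, gap / scale))

def mix_features(student, teacher, ratio, rng):
    # Replace a randomly chosen fraction of student positions with teacher features.
    k = round(ratio * len(student))
    chosen = set(rng.sample(range(len(student)), k))
    return [teacher[i] if i in chosen else student[i] for i in range(len(student))]

rng = random.Random(0)
student = [0.0] * 8
teacher = [1.0] * 8
ratio = prior_ratio(feature_gap(student, teacher))  # gap is 1.0, so ratio is 1.0
mixed_full = mix_features(student, teacher, ratio, rng)
mixed_half = mix_features(student, teacher, 0.5, rng)
```

As the student's features approach the teacher's, the gap shrinks and so does the share of teacher features, which matches the "appropriate difficulty" intuition in the abstract.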

翻訳日:2022-06-14 15:49:51 公開日:2022-06-13

# recaptured image forensic における学習特徴のばらつきと動的融合

Learning Feature Disentanglement and Dynamic Fusion for Recaptured Image Forensic ( http://arxiv.org/abs/2206.06103v1 )

ライセンス: Link先を確認

Shuyu Miao, Lin Zheng, Hong Jin

(参考訳) 画像再キャプチャーは、他人の画像を再キャプチャすることでシステムを欺く人工知能(ai)システムの公平性を損なう。既存の再キャプチャモデルのほとんどは、固定された電子デバイスを使用してシミュレーションされた再キャプチャされた画像を含むデータセットに基づいて、単一の再キャプチャパターン(moire、エッジ、アーティファクトなど)にのみ対処できる。本稿では,画像再キャプチャータスクを画像再キャプチャー認識の4つのパターン,すなわちモアレ再キャプチャー,エッジ再キャプチャー,アーティファクト再キャプチャー,その他の再キャプチャーとして明示的に再定義する。一方,異なる再キャプチャパターン認識をカバーするために,最も効果的な再キャプチャ特徴表現を適応的に学習する特徴分散と動的融合(FDDF)モデルを提案する。さらに,これまでに公表したデータセットの約5倍の多種多様なリキャプチャパターンを含む,大規模な実時間ユニバーサルリキャプチャ(rur)データセットを収集した。我々の知る限り、我々はまず、再適応画像法学のための一般的なモデルと一般的な実シーンの大規模データセットを提案する。大規模な実験により,提案したFDDFはRURデータセット上で最先端の性能を達成できることが示された。

Image recapture seriously breaks the fairness of artificial intelligence (AI) systems, since such systems can be deceived by recapturing others' images. Most of the existing recapture models can only address a single pattern of recapture (e.g., moire, edge, artifact, and others), and are based on datasets of simulated recaptured images produced with fixed electronic devices. In this paper, we explicitly redefine the image recapture forensic task as the recognition of four patterns of image recapture, i.e., moire recapture, edge recapture, artifact recapture, and other recapture. Meanwhile, we propose a novel Feature Disentanglement and Dynamic Fusion (FDDF) model to adaptively learn the most effective recapture feature representation for covering the recognition of different recapture patterns. Furthermore, we collect a large-scale Real-scene Universal Recapture (RUR) dataset containing various recapture patterns, which is about five times the size of previously published datasets. To the best of our knowledge, we are the first to propose a general model and a general real-scene large-scale dataset for recaptured image forensics. Extensive experiments show that our proposed FDDF can achieve state-of-the-art performance on the RUR dataset.

翻訳日:2022-06-14 15:49:30 公開日:2022-06-13

# 特異値の微調整:最小ショットのセグメンテーションは、最小パラメータの微調整を必要とする

Singular Value Fine-tuning: Few-shot Segmentation requires Few-parameters Fine-tuning ( http://arxiv.org/abs/2206.06122v1 )

ライセンス: Link先を確認

Yanpeng Sun, Qiang Chen, Xiangyu He, Jian Wang, Haocheng Feng, Junyu Han, Errui Ding, Jian Cheng, Zechao Li, Jingdong Wang

(参考訳) 事前トレーニングされたバックボーンの凍結は、少数ショットのセグメンテーションでオーバーフィットを避けるための標準的なパラダイムになっています。本稿では、このパラダイムを再考し、新しい体制を探求する。オーバーフィッティング問題を克服する解決策を提案し,新しいクラスを学習する際のモデル一般化を改良する。本手法では, バックボーンパラメータをSingular Value Decomposition (SVD) を介して3つの連続行列に分解し, 特異値のみを微調整し, 他のパラメータを凍結する。上記の設計により、トレーニング済みのバックボーン内でセマンティックなヒントを維持しながら、新しいクラスの特徴表現を調整できる。バックボーンの異なる複数ショットセグメンテーション法におけるSVF(Singular Value Fine-tuning)アプローチの評価を行った。本研究では,Pascal-5$^i$とCOCO-20$^i$を1ショット5ショット設定で比較した。このシンプルなベースラインが研究者たちに、バックボーンの微調整の役割を再考させることを期待したい。ソースコードとモデルは \url{https://github.com/syp2ysy/SVF} で入手できる。

Freezing the pre-trained backbone has become a standard paradigm to avoid overfitting in few-shot segmentation. In this paper, we rethink the paradigm and explore a new regime: {\em fine-tuning a small part of parameters in the backbone}. We present a solution to overcome the overfitting problem, leading to better model generalization on learning novel classes. Our method decomposes backbone parameters into three successive matrices via the Singular Value Decomposition (SVD), then {\em only fine-tunes the singular values} and keeps others frozen. The above design allows the model to adjust feature representations on novel classes while maintaining semantic clues within the pre-trained backbone. We evaluate our {\em Singular Value Fine-tuning (SVF)} approach on various few-shot segmentation methods with different backbones. We achieve state-of-the-art results on both Pascal-5$^i$ and COCO-20$^i$ across 1-shot and 5-shot settings. Hopefully, this simple baseline will encourage researchers to rethink the role of backbone fine-tuning in few-shot settings. The source code and models will be available at \url{https://github.com/syp2ysy/SVF}.
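
The core mechanism, updating only the singular values while the other two factors stay frozen, can be sketched on a tiny example. To stay dependency-free, the sketch assumes the factors `U`, `s`, and `Vt` of a 2x2 weight matrix are already available (in practice they would come from an SVD routine); the input, target, learning rate, and function names are hypothetical.

```python
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def forward(U, s, Vt, x):
    # Computes U @ diag(s) @ Vt @ x without forming the full weight matrix.
    z = matvec(Vt, x)
    return matvec(U, [sk * zk for sk, zk in zip(s, z)])

def finetune_singular_values(U, s, Vt, x, target, lr=0.1, steps=100):
    # Gradient descent on the squared error; only s is updated, U and Vt stay frozen.
    s = list(s)
    z = matvec(Vt, x)
    for _ in range(steps):
        y = forward(U, s, Vt, x)
        err = [yi - ti for yi, ti in zip(y, target)]
        for k in range(len(s)):
            g = 2.0 * sum(err[i] * U[i][k] for i in range(len(err))) * z[k]
            s[k] -= lr * g
    return s

U = [[0.0, -1.0], [1.0, 0.0]]   # frozen left factor (a 90-degree rotation)
Vt = [[1.0, 0.0], [0.0, 1.0]]   # frozen right factor (identity)
s0 = [2.0, 1.0]                 # pretrained singular values
s_new = finetune_singular_values(U, s0, Vt, x=[1.0, 1.0], target=[-3.0, 4.0])
```

Only two numbers are trained here instead of the four entries of the weight matrix; for a real layer the trainable count is the rank of the matrix rather than its full size, which is the "few parameters" in the title.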

翻訳日:2022-06-14 15:49:08 公開日:2022-06-13

# ICCV 2021 VIPriors画像分類チャレンジのための第2位ソリューション: 抽出と反発学習アプローチ

2nd Place Solution for ICCV 2021 VIPriors Image Classification Challenge: An Attract-and-Repulse Learning Approach ( http://arxiv.org/abs/2206.06168v1 )

ライセンス: Link先を確認

Yilu Guo, Shicai Yang, Weijie Chen, Liang Ma, Di Xie, Shiliang Pu

(参考訳) 畳み込みニューラルネットワーク(cnns)は,大規模データセットを利用することで,画像分類において有意な成功を収めている。しかし、小規模データセットをスクラッチから効率的に学習することは依然として大きな課題である。限られたトレーニングデータセットでは、過度にパラメータ化されたCNNが単にデータセットを記憶する傾向があるため、カテゴリの概念は曖昧になる。したがって,過度な適合を避けながら,より差別的な表現を学習する方法を研究することが重要である。カテゴリの概念はあいまいな傾向があるため、より個別の情報を取得することが重要である。そこで本稿では,特徴表現を豊かにするContrastive Regularization (CR) と,異なるクラスに対する適合性のバランスをとるSymmetric Cross Entropy (SCE) と,ラベル情報のキャリブレーションを行うMean Teacher という新たなフレームワークを提案する。具体的には、sce と cr は、クラス情報 (attract) とインスタンス (repulse) の間の適応的トレードオフによって過剰フィッティングを緩和しながら、識別表現を学習する。その後、より正確なソフト擬似ラベルを校正することで、パフォーマンスをさらに改善するために平均教師が使用される。十分な実験は、Attract-and-Repulseフレームワークの有効性を検証する。攻撃的データ拡張,tencrop推論,モデルセンシングなど他の戦略とともに,iccv 2021画像分類課題において,第2位を達成した。

Convolutional neural networks (CNNs) have achieved significant success in image classification by utilizing large-scale datasets. However, it remains a great challenge to learn from scratch on small-scale datasets efficiently and effectively. With limited training data, the concepts of categories become ambiguous, since over-parameterized CNNs tend to simply memorize the dataset, leading to poor generalization capacity. Therefore, it is crucial to study how to learn more discriminative representations while avoiding over-fitting. Since the concepts of categories tend to be ambiguous, it is important to capture more individual-wise information. Thus, we propose a new framework, termed Attract-and-Repulse, which consists of Contrastive Regularization (CR) to enrich the feature representations, Symmetric Cross Entropy (SCE) to balance the fitting for different classes, and Mean Teacher to calibrate label information. Specifically, SCE and CR learn discriminative representations while alleviating over-fitting through an adaptive trade-off between the information of classes (attract) and instances (repulse). After that, Mean Teacher is used to further improve the performance by calibrating more accurate soft pseudo labels. Extensive experiments validate the effectiveness of the Attract-and-Repulse framework. Together with other strategies, such as aggressive data augmentation, TenCrop inference, and model ensembling, we achieved second place in the ICCV 2021 VIPriors Image Classification Challenge.
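
For readers unfamiliar with Symmetric Cross Entropy, the sketch below shows its commonly used formulation: the usual cross entropy plus a reverse cross entropy in which the roles of prediction and label are swapped and log(0) is replaced by a negative constant. The weights `alpha` and `beta` and the constant `log_zero` are assumed values for illustration; the abstract does not state the settings used in the challenge entry.

```python
import math

def symmetric_cross_entropy(pred, label, alpha=1.0, beta=1.0, log_zero=-4.0):
    eps = 1e-12  # avoids log(0) on the prediction side
    # Standard cross entropy: label distribution against log of the prediction.
    ce = -sum(q * math.log(max(p, eps)) for p, q in zip(pred, label))
    # Reverse cross entropy: prediction against log of the label,
    # with log(0) replaced by the constant log_zero.
    rce = -sum(p * (math.log(q) if q > 0 else log_zero) for p, q in zip(pred, label))
    return alpha * ce + beta * rce

uncertain = symmetric_cross_entropy([0.5, 0.5], [1.0, 0.0])
perfect = symmetric_cross_entropy([1.0, 0.0], [1.0, 0.0])
```

For a one-hot label the reverse term is bounded, so a single badly fitted example cannot dominate the loss the way it can with plain cross entropy.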

翻訳日:2022-06-14 15:48:44 公開日:2022-06-13

# 変圧器病変追跡装置

Transformer Lesion Tracker ( http://arxiv.org/abs/2206.06252v1 )

ライセンス: Link先を確認

Wen Tang, Han Kang, Haoyue Zhang, Pengxin Yu, Corey W. Arnold, Rongguo Zhang

(参考訳) 長期病変追跡による病変進展と治療反応の評価は臨床実践において重要な役割を担っている。このタスクの自動化されたアプローチは、手動で病変マッチングを行う場合の労働コストと時間消費によって動機付けられる。従来の手法は、通常、ローカルとグローバルの情報の統合を欠いている。本研究では,Transformer Lesion Tracker (TLT) と呼ばれるトランスフォーマーベースの手法を提案する。具体的には,CAT(Cross Attention-based Transformer)を設計し,グローバル情報とローカル情報を組み合わせて特徴抽出を強化する。我々はまた,CATに解剖情報を導入し,有用な特徴知識に集中できるように,登録ベースの解剖アテンションモジュール(RAAM)を開発した。トランスフォーマートレーニングでは、機能の選択とメモリフットプリントの削減のためにスパース選択戦略(SSS)が提示される。さらに、グローバル回帰を使用して、モデルパフォーマンスをさらに向上します。我々は,我々の手法の優位性を示すために,公開データセット上で実験を行い,我々のモデルの性能が最先端(SOTA)と比較して,平均ユークリッド中心誤差を14.3%(6mm vs. 7mm)以上改善したことを確認した。コードはhttps://github.com/TangWen920812/TLTで入手できる。

Evaluating lesion progression and treatment response via longitudinal lesion tracking plays a critical role in clinical practice. Automated approaches for this task are motivated by prohibitive labor costs and time consumption when lesion matching is done manually. Previous methods typically lack the integration of local and global information. In this work, we propose a transformer-based approach, termed Transformer Lesion Tracker (TLT). Specifically, we design a Cross Attention-based Transformer (CAT) to capture and combine both global and local information to enhance feature extraction. We also develop a Registration-based Anatomical Attention Module (RAAM) to introduce anatomical information to CAT so that it can focus on useful feature knowledge. A Sparse Selection Strategy (SSS) is presented for selecting features and reducing memory footprint in Transformer training. In addition, we use a global regression to further improve model performance. We conduct experiments on a public dataset to show the superiority of our method and find that our model performance has improved the average Euclidean center error by at least 14.3% (6mm vs. 7mm) compared with the state-of-the-art (SOTA). Code is available at https://github.com/TangWen920812/TLT.

翻訳日:2022-06-14 15:48:05 公開日:2022-06-13

# Featurized Query R-CNN

Featurized Query R-CNN ( http://arxiv.org/abs/2206.06258v1 )

ライセンス: Link先を確認

Wenqiang Zhang and Tianheng Cheng and Xinggang Wang and Qian Zhang and Wenyu Liu

(参考訳) detr法で導入されたクエリメカニズムはオブジェクト検出のパラダイムを変えており、最近では多くのクエリベースのメソッドが強いオブジェクト検出性能を得ている。しかし、現在のクエリベースの検出パイプラインは以下の2つの問題に悩まされている。まず、ランダムに初期化されたオブジェクトクエリを最適化するためには、マルチステージデコーダが必要である。第二に、クエリはトレーニング後に修正され、満足のいく一般化能力に繋がる。そこで本稿では,r-cnnフレームワークにおいて,クエリ生成ネットワークが予測するオブジェクトクエリの実現と,r-cnnの高速化について述べる。 COCOデータセットの大規模な実験により、我々のFeaturized Query R-CNNは、最新の最先端のスパースR-CNN検出器を含むすべてのR-CNN検出器の中で、最高の速度精度のトレードオフが得られることが示された。コードは \url{https://github.com/hustvl/featurized-queryrcnn} で入手できる。

The query mechanism introduced in the DETR method is changing the paradigm of object detection, and recently many query-based methods have obtained strong object detection performance. However, the current query-based detection pipelines suffer from the following two issues. Firstly, multi-stage decoders are required to optimize the randomly initialized object queries, incurring a large computation burden. Secondly, the queries are fixed after training, leading to unsatisfactory generalization capability. To remedy the above issues, we present featurized object queries predicted by a query generation network in the well-established Faster R-CNN framework and develop a Featurized Query R-CNN. Extensive experiments on the COCO dataset show that our Featurized Query R-CNN obtains the best speed-accuracy trade-off among all R-CNN detectors, including the recent state-of-the-art Sparse R-CNN detector. The code is available at \url{https://github.com/hustvl/Featurized-QueryRCNN}.

翻訳日:2022-06-14 15:47:44 公開日:2022-06-13

# (参考訳) インド法典の要約: テキスト正規化に基づくアプローチ

Indian Legal Text Summarization: A Text Normalisation-based Approach ( http://arxiv.org/abs/2206.06238v1 )

ライセンス: CC BY 4.0

Satyajit Ghosh, Mousumi Dutta, Tanaya Das

(参考訳) インドの裁判所制度では、保留中の事件は長い間問題となっていた。特筆すべき症例は4件以上ある。何百もの文書を手作業で要約することは、法的利害関係者にとって時間と手間のかかる作業である。テキスト要約のための最先端モデルの多くは、機械学習が進むにつれて登場してきた。ドメインに依存しないモデルは法的テキストではうまく機能せず、インドの法律システムのためにこれらのモデルを微調整することは、一般公開されたデータセットの欠如によって問題となる。ドメインに依存しないモデルの性能を向上させるため,インドの文脈における法文の正規化手法を提案した。著者らは、法的テキスト要約のための2つの最先端のドメイン非依存モデル、すなわちBARTとPEGASUSを実験した。 BARTとPEGASUSは、テキスト正規化アプローチの有効性を理解するために、抽出的および抽象的要約の観点から、そのペースを経る。要約されたテキストは、複数のパラメーターとROUGEメトリクスを使用してドメインの専門家によって評価される。提案手法は,ドメインに依存しないモデルを用いた法的なテキストに有効であることを示す。

In the Indian court system, pending cases have long been a problem. There are more than 4 crore cases outstanding. Manually summarising hundreds of documents is a time-consuming and tedious task for legal stakeholders. Many state-of-the-art models for text summarization have emerged as machine learning has progressed. Domain-independent models don't do well with legal texts, and fine-tuning those models for the Indian Legal System is problematic due to a lack of publicly available datasets. To improve the performance of domain-independent models, the authors have proposed a methodology for normalising legal texts in the Indian context. The authors experimented with two state-of-the-art domain-independent models for legal text summarization, namely BART and PEGASUS. BART and PEGASUS are put through their paces in terms of extractive and abstractive summarization to understand the effectiveness of the text normalisation approach. Summarised texts are evaluated by domain experts on multiple parameters and using ROUGE metrics. It shows the proposed text normalisation approach is effective in legal texts with domain-independent models.

翻訳日:2022-06-14 15:46:25 公開日:2022-06-13

# 非自己回帰変圧器の学習について

On the Learning of Non-Autoregressive Transformers ( http://arxiv.org/abs/2206.05975v1 )

ライセンス: Link先を確認

Fei Huang, Tianhua Tao, Hao Zhou, Lei Li, Minlie Huang

(参考訳) 非自己回帰トランスフォーマー(non-autoregressive transformer, nat)は、文全体を並列に予測することで復号遅延を削減することを目的としたテキスト生成モデルである。しかし、そのようなレイテンシ低減は、左から右への依存関係をキャプチャする能力を犠牲にして、NAT学習を非常に困難にする。本稿では,NAT学習の課題を明らかにするための理論的,実証的な分析を行い,既存の成功を理解するための統一的な視点を提案する。まず, NAT を最大化することで, NAT のトレーニングを行うだけで限界分布の近似を導出できるが, トークン間の依存度はすべて減少し, ドロップした情報がデータセットの条件付き総相関によって測定可能であることを示す。第2に,従来の目標の多くを統一フレームワークで定式化し,その成功をプロキシ分布の可能性を最大化することで,情報損失を低減できることを示す。実証的研究により,NAT学習における現象を考察し,新たな学習手法の設計を指導できることが示唆された。

Non-autoregressive Transformer (NAT) is a family of text generation models, which aims to reduce the decoding latency by predicting the whole sentences in parallel. However, such latency reduction sacrifices the ability to capture left-to-right dependencies, thereby making NAT learning very challenging. In this paper, we present theoretical and empirical analyses to reveal the challenges of NAT learning and propose a unified perspective to understand existing successes. First, we show that simply training NAT by maximizing the likelihood can lead to an approximation of marginal distributions but drops all dependencies between tokens, where the dropped information can be measured by the dataset's conditional total correlation. Second, we formalize many previous objectives in a unified framework and show that their success can be concluded as maximizing the likelihood on a proxy distribution, leading to a reduced information loss. Empirical studies show that our perspective can explain the phenomena in NAT learning and guide the design of new training methods.
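
The claim that likelihood training of a position-wise independent model recovers the marginals but drops token dependencies can be illustrated with a toy computation. The sketch below uses a hypothetical two-sentence target set for a single source sentence, so it computes the plain (unconditional) total correlation rather than the conditional quantity in the paper; the data and function names are assumptions.

```python
import math
from collections import Counter

def entropy(counter, total):
    # Shannon entropy in bits of the empirical distribution given by counts.
    return -sum((c / total) * math.log2(c / total) for c in counter.values())

def total_correlation(sentences):
    # Sum of per-position entropies minus the joint entropy.
    n = len(sentences)
    length = len(sentences[0])
    joint = Counter(sentences)
    marginals = [Counter(s[i] for s in sentences) for i in range(length)]
    return sum(entropy(m, n) for m in marginals) - entropy(joint, n)

def independent_prob(sentence, sentences):
    # Probability assigned by a model that fits each position's marginal independently.
    n = len(sentences)
    prob = 1.0
    for i, tok in enumerate(sentence):
        prob *= sum(1 for s in sentences if s[i] == tok) / n
    return prob

data = [("thank", "you"), ("many", "thanks")]
tc_bits = total_correlation(data)
bad_mix = independent_prob(("thank", "thanks"), data)
```

Here one bit of dependency information is lost, and the independent model gives the ill-formed mixture "thank thanks" the same probability (0.25) as each valid sentence.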

翻訳日:2022-06-14 15:37:21 公開日:2022-06-13

# 言語モデルは汎用インターフェースである

Language Models are General-Purpose Interfaces ( http://arxiv.org/abs/2206.06336v1 )

ライセンス: Link先を確認

Yaru Hao, Haoyu Song, Li Dong, Shaohan Huang, Zewen Chi, Wenhui Wang, Shuming Ma, Furu Wei

(参考訳) 基盤モデルは、幅広い下流アプリケーションで有効であるため、多くの注目を集めています。アーキテクチャには大きな収束があるが、ほとんどの事前訓練されたモデルは、通常、特定のタスクやモダリティのために開発されている。本稿では,様々な基礎モデルに対する汎用インタフェースとして言語モデルを使うことを提案する。プリトレーニングされたエンコーダのコレクションは、さまざまなモダリティ(視覚や言語など)を知覚し、普遍的なタスク層の役割を担う言語モデルと連携します。インタフェースとモジュールエンコーダを共同で事前学習する半コーサル言語モデリングの目的を提案する。因果モデリングと非因果モデリングの両方から利点と能力を仮定し、2つの世界のベストを組み合わせる。特に, 提案手法は, 因果的言語モデルから文脈内学習と開放型生成の能力を継承するだけでなく, 双方向エンコーダによる微調整にも寄与する。さらに重要なことは、私たちのアプローチは上記の機能の組み合わせをシームレスに解き放ち、例えば、微調整エンコーダでテキスト内学習や命令の追従を可能にします。様々な言語のみのベンチマークおよび視覚言語ベンチマークにおける実験の結果は、我々のモデルは微調整、ゼロショット一般化、少数ショット学習といった特殊なモデルよりも優れ、または競合していることを示している。

Foundation models have received much attention due to their effectiveness across a broad range of downstream applications. Though there is a big convergence in terms of architecture, most pretrained models are typically still developed for specific tasks or modalities. In this work, we propose to use language models as a general-purpose interface to various foundation models. A collection of pretrained encoders perceive diverse modalities (such as vision and language), and they dock with a language model that plays the role of a universal task layer. We propose a semi-causal language modeling objective to jointly pretrain the interface and the modular encoders. We subsume the advantages and capabilities from both causal and non-causal modeling, thereby combining the best of two worlds. Specifically, the proposed method not only inherits the capabilities of in-context learning and open-ended generation from causal language modeling, but also is conducive to finetuning because of the bidirectional encoders. More importantly, our approach seamlessly unlocks the combinations of the above capabilities, e.g., enabling in-context learning or instruction following with finetuned encoders. Experimental results across various language-only and vision-language benchmarks show that our model outperforms or is competitive with specialized models on finetuning, zero-shot generalization, and few-shot learning.

翻訳日:2022-06-14 15:37:02 公開日:2022-06-13

# 知覚からプログラムへ:規則化、過剰パラメータ化、償却

From Perception to Programs: Regularize, Overparameterize, and Amortize ( http://arxiv.org/abs/2206.05922v1 )

ライセンス: Link先を確認

Hao Tang and Kevin Ellis

(参考訳) 帰納的推論と知覚能力を組み合わせることを目的として,まず知覚入力をニューラルネットワークで解析して低次元の解釈可能な表現とし,次に合成プログラムで処理するニューロシンボリックプログラム合成技術を開発した。本稿では,問題を緩和し,全モジュールを勾配勾配で学習する手法について検討する。マルチタスク学習,償却推論,過度パラメータ化,長大プログラムのペナルティ化のための異なる戦略である。このツールボックスは、勾配誘導型プログラム探索の安定性を改善し、入力を離散抽象として知覚する方法と、それらの抽象をプログラムとして象徴的に処理する方法の両方を学ぶ方法を提案する。

Toward combining inductive reasoning with perception abilities, we develop techniques for neurosymbolic program synthesis where perceptual input is first parsed by neural nets into a low-dimensional interpretable representation, which is then processed by a synthesized program. We explore several techniques for relaxing the problem and jointly learning all modules end-to-end with gradient descent: multitask learning; amortized inference; overparameterization; and a differentiable strategy for penalizing lengthy programs. Collectively, this toolbox improves the stability of gradient-guided program search, and suggests ways of learning both how to perceive input as discrete abstractions, and how to symbolically process those abstractions as programs.

翻訳日:2022-06-14 15:34:15 公開日:2022-06-13

# 拘束ガイドグラディエントドライズ:不平等制約による指導訓練

Constraint Guided Gradient Descent: Guided Training with Inequality Constraints ( http://arxiv.org/abs/2206.06202v1 )

ライセンス: Link先を確認

Quinten Van Baelen, Peter Karsmakers

(参考訳) ディープラーニングは通常、利用可能なドメイン知識を無視した入出力ペアという形式で、データのみからニューラルネットワークを学習することによって行われる。本研究では,訓練手順にドメイン知識を注入できるCGGD(Constraint Guided Gradient Descent)フレームワークを提案する。ドメイン知識は、いくつかのアプリケーションにとって自然な選択であるように見えるハード不等式制約の結合として記述される。他のニューロシンボリックアプローチと比較すると、提案手法はトレーニングデータに対する不等式制約を満たすモデルに収束し、学習(最適化)目標に追加されるアドホックな用語にまず制約を変換する必要がなくなる。ある条件下では、CGGDはトレーニングセット上の制約を満たすモデルに収束するが、事前の作業は必ずしもそのようなモデルに収束するとは限らない。これは、CGGDがトレーニングをネットワークの初期化に依存しにくくし、全てのデータに対する制約を満たすことを実証的に示している。

Deep learning is typically performed by learning a neural network solely from data in the form of input-output pairs, ignoring available domain knowledge. In this work, the Constraint Guided Gradient Descent (CGGD) framework is proposed, which enables the injection of domain knowledge into the training procedure. The domain knowledge is assumed to be described as a conjunction of hard inequality constraints, which appears to be a natural choice for several applications. Compared to other neuro-symbolic approaches, the proposed method converges to a model that satisfies any inequality constraint on the training data and does not require first transforming the constraints into some ad-hoc term that is added to the learning (optimisation) objective. Under certain conditions, it is shown that CGGD can converge to a model that satisfies the constraints on the training set, while prior work does not necessarily converge to such a model. It is empirically shown on two independent and small data sets that CGGD makes training less dependent on the initialisation of the network and improves the constraint satisfiability on all data.

翻訳日:2022-06-14 15:33:46 公開日:2022-06-13

# ワンショットNASからFew-shot NASへのトレーニングスキームによるスーパーネットのランク付け相関の改善

Improve Ranking Correlation of Super-net through Training Scheme from One-shot NAS to Few-shot NAS ( http://arxiv.org/abs/2206.05896v1 )

ライセンス: Link先を確認

Jiawei Liu, Kaiyu Zhang, Weitai Hu and Qing Yang

(参考訳) one-shot neural architecture search (nas) のアルゴリズムは計算量を減らすために広く使われている。しかし、重みが共有されるサブネット間の干渉のため、これらのアルゴリズムによって訓練されたスーパーネットから継承されたサブネットは、精度ランキングの一貫性に乏しい。この問題に対処するために,ワンショットNASから少数ショットNASへのステップバイステップトレーニングスーパーネットスキームを提案する。トレーニングスキームでは,まずワンショット方式でスーパーネットをトレーニングし,それをマルチサブネットに分割して徐々にトレーニングすることで,スーパーネットの重みを解消する。最後に,本手法はcvpr2022軽量nasチャレンジトラック1で4位である。私たちのコードはhttps://github.com/liujiawei2333/cvpr2022-nascompetition-track-1-4th-solutionで利用可能です。

One-shot neural architecture search (NAS) algorithms have been widely used to reduce computation. However, because of the interference among subnets whose weights are shared, the subnets inherited from a super-net trained by those algorithms have poor consistency in precision ranking. To address this problem, we propose a step-by-step super-net training scheme from one-shot NAS to few-shot NAS. In the training scheme, we first train the super-net in a one-shot manner, and then we disentangle the weights of the super-net by splitting it into multiple subnets and training them gradually. Finally, our method ranks 4th in the CVPR2022 Lightweight NAS Challenge Track1. Our code is available at https://github.com/liujiawei2333/CVPR2022-NAScompetition-Track-1-4th-solution.
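
The abstract does not say which statistic is used to measure ranking consistency. As an illustration under that caveat, the sketch below uses Kendall's tau (the tau-a variant, an assumption here) to compare accuracies of subnets evaluated with inherited super-net weights against their accuracies when trained stand-alone. All accuracy values are made up for the example.

```python
from itertools import combinations


def kendall_tau(xs, ys) -> float:
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical accuracies for four subnets.
standalone_acc = [0.70, 0.72, 0.75, 0.78]  # each subnet trained from scratch
supernet_acc = [0.60, 0.65, 0.63, 0.70]    # same subnets with inherited weights

tau = kendall_tau(standalone_acc, supernet_acc)
print(tau)
```

In this toy case five of the six subnet pairs are ordered consistently and one is swapped, so tau is (5 - 1) / 6. A value of 1 would mean the super-net ranks subnets exactly as stand-alone training does.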

翻訳日:2022-06-14 15:27:50 公開日:2022-06-13

# (参考訳) スマートマニュファクチャリングデータセットにおける異常検出とセンサ間転送学習

Anomaly Detection and Inter-Sensor Transfer Learning on Smart Manufacturing Datasets ( http://arxiv.org/abs/2206.06355v1 )

ライセンス: CC BY 4.0

Mustafa Abdallah, Byung-Gun Joung, Wo Jae Lee, Charilaos Mousoulis, John W. Sutherland, and Saurabh Bagchi

(参考訳) スマートマニュファクチャリングシステムは、さまざまなセンシングされた情報を解釈し、システムの観察から得られた知識に作用する能力があるため、成長速度で展開されている。多くの場合、スマートマニュファクチャリングシステムの主な目標は、迅速な障害の検出(あるいは予測)と運用コストの削減、ダウンタイムの削減である。これはしばしば、システムから取得したセンサデータ内における異常を検出するためである。スマートマニュファクチャリングアプリケーションドメインは、ある種の技術的課題を提起する。特に、能力やコストの異なる複数のタイプのセンサーがあることが多い。センサデータ特性は、モータのRPMなどの環境や機械の動作点によって変化する。したがって、異常検出プロセスは動作点付近で校正する必要がある。本稿では,製造試験場から展開されたセンサから4つのデータセットを解析する。センサデータの時系列を予測するために,従来のMLおよびMLに基づく予測モデルの性能を評価する。そして、一種類のセンサからのスパースデータを考慮して、高データレートセンサからの転送学習を行い、欠陥タイプ分類を行う。その結果,予測的障害分類が可能となり,予測的メンテナンスの道筋が整った。

Smart manufacturing systems are being deployed at a growing rate because of their ability to interpret a wide variety of sensed information and act on the knowledge gleaned from system observations. In many cases, the principal goal of the smart manufacturing system is to rapidly detect (or anticipate) failures to reduce operational cost and eliminate downtime. This often boils down to detecting anomalies within the sensor data acquired from the system. The smart manufacturing application domain poses certain salient technical challenges. In particular, there are often multiple types of sensors with varying capabilities and costs. The sensor data characteristics change with the operating point of the environment or machines, such as the RPM of the motor. The anomaly detection process therefore has to be calibrated near an operating point. In this paper, we analyze four datasets from sensors deployed in manufacturing testbeds. We evaluate the performance of several traditional and ML-based forecasting models for predicting the time series of sensor data. Then, considering the sparse data from one kind of sensor, we perform transfer learning from a high data rate sensor to perform defect type classification. Taken together, we show that predictive failure classification can be achieved, thus paving the way for predictive maintenance.

翻訳日:2022-06-14 15:27:05 公開日:2022-06-13

# F-RANの計算オフロードと資源配分: 深層強化学習アプローチ

Computation Offloading and Resource Allocation in F-RANs: A Federated Deep Reinforcement Learning Approach ( http://arxiv.org/abs/2206.05881v1 )

ライセンス: Link先を確認

Lingling Zhang, Yanxiang Jiang, Fu-Chun Zheng, Mehdi Bennis, and Xiaohu You

(参考訳) フォグ無線アクセスネットワーク(F-RAN)は、ユーザのモバイルデバイス(MD)が計算タスクを近くのフォグアクセスポイント(F-AP)にオフロードできる有望な技術である。 F-APの限られた資源のため、効率的なタスクオフロード方式を設計することが重要である。本稿では,時間変化を考慮したネットワーク環境を考慮し,F-RANの動的計算オフロードと資源配分問題を定式化し,MDのタスク実行遅延とエネルギー消費を最小化する。この問題を解決するために、各F-APにおける計算オフロードとリソース割り当てを行うディープ決定性ポリシー勾配(DDPG)アルゴリズムを、DRLに基づくアルゴリズムを提案する。 DDPGエージェントをトレーニングすることで、トレーニングプロセスの計算複雑性を低減し、ユーザのプライバシを保護する。シミュレーションの結果,提案したDDPGアルゴリズムは,他の既存手法と比較して,作業実行の遅れやMDのエネルギー消費を低減できることがわかった。

The fog radio access network (F-RAN) is a promising technology in which the user mobile devices (MDs) can offload computation tasks to the nearby fog access points (F-APs). Due to the limited resources of F-APs, it is important to design an efficient task offloading scheme. In this paper, by considering a time-varying network environment, a dynamic computation offloading and resource allocation problem in F-RANs is formulated to minimize the task execution delay and energy consumption of MDs. To solve the problem, a federated deep reinforcement learning (DRL) based algorithm is proposed, where the deep deterministic policy gradient (DDPG) algorithm performs computation offloading and resource allocation in each F-AP. Federated learning is exploited to train the DDPG agents in order to decrease the computing complexity of the training process and protect user privacy. Simulation results show that the proposed federated DDPG algorithm can achieve lower task execution delay and energy consumption of MDs more quickly compared with the other existing strategies.
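
The abstract does not specify how the local DDPG agents are aggregated. A common choice in federated learning is sample-weighted parameter averaging (FedAvg-style), which is assumed in the sketch below purely for illustration. Parameters are represented as flat lists of floats, and the function name and the numbers are hypothetical.

```python
def federated_average(local_weights, sample_counts):
    """Sample-weighted average of flat parameter vectors, one per F-AP."""
    if len(local_weights) != len(sample_counts):
        raise ValueError("need one sample count per set of local weights")
    total = sum(sample_counts)
    n_params = len(local_weights[0])
    return [
        sum(w[k] * n for w, n in zip(local_weights, sample_counts)) / total
        for k in range(n_params)
    ]


# Two hypothetical F-APs: the second holds three times as much local data.
local_weights = [[1.0, 2.0], [3.0, 4.0]]
sample_counts = [1, 3]

global_weights = federated_average(local_weights, sample_counts)
print(global_weights)
```

Only parameters leave each F-AP in such a scheme, never the raw user data, which is the usual argument for the privacy benefit the abstract mentions.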

翻訳日:2022-06-14 15:04:55 公開日:2022-06-13

# フォグランズにおけるコンテンツ人気予測:クラスタ型フェデレーション学習に基づくアプローチ

Content Popularity Prediction in Fog-RANs: A Clustered Federated Learning Based Approach ( http://arxiv.org/abs/2206.05894v1 )

ライセンス: Link先を確認

Zhiheng Wang, Yanxiang Jiang, Fu-Chun Zheng, Mehdi Bennis and Xiaohu You

(参考訳) 本稿では,フォグラジオアクセスネットワーク(F-RAN)におけるコンテンツ人気予測問題について検討する。クラスタ化されたフェデレーション学習に基づいて,ローカルユーザとモバイルユーザの観点からコンテンツの人気度を統合する,モビリティを考慮した新しい人気予測ポリシーを提案する。ローカルユーザに対しては,ローカルユーザとコンテンツの隠れた表現を学習することで,コンテンツの人気を予測できる。近隣情報を自己情報に組み込んだローカルユーザとコンテンツの初期特徴を生成する。次に、二重チャネルニューラルネットワーク(DCNN)モデルを導入し、初期特徴から深い潜伏特徴を生成して隠れ表現を学習する。モバイルユーザーにとって、コンテンツの人気はユーザー好みの学習によって予測される。コンテンツ人気の地域差を識別するために、クラスタ化フェデレーションラーニング(CFL)が採用され、類似の地域型を持つフォグアクセスポイント(F-AP)が互いに恩恵を受け、各F-APに対してより専門的なDCNNモデルを提供する。シミュレーションの結果,提案手法は従来の政策よりも大幅な性能向上を実現していることがわかった。

In this paper, the content popularity prediction problem in fog radio access networks (F-RANs) is investigated. Based on clustered federated learning, we propose a novel mobility-aware popularity prediction policy, which integrates content popularities in terms of local users and mobile users. For local users, the content popularity is predicted by learning the hidden representations of local users and contents. Initial features of local users and contents are generated by incorporating neighbor information with self information. Then, a dual-channel neural network (DCNN) model is introduced to learn the hidden representations by producing deep latent features from initial features. For mobile users, the content popularity is predicted via user preference learning. In order to distinguish regional variations of content popularity, clustered federated learning (CFL) is employed, which enables fog access points (F-APs) with similar regional types to benefit from one another and provides a more specialized DCNN model for each F-AP. Simulation results show that our proposed policy achieves significant performance improvement over the traditional policies.

翻訳日:2022-06-14 15:04:39 公開日:2022-06-13

# 光処理ユニットを用いた圧縮クラスタリング

Compressive Clustering with an Optical Processing Unit ( http://arxiv.org/abs/2206.05928v1 )

ライセンス: Link先を確認

Luc Giffon (DANTE), Rémi Gribonval (DANTE)

(参考訳) 光処理ユニット(opu)を使用して、スケッチのためのランダムなフーリエ特徴を計算し、この設定に全体的な圧縮クラスタリングパイプラインを適用する。また,圧縮クラスタリングの臨界ハイパーパラメータのチューニングを支援するツールを提案する。

We explore the use of Optical Processing Units (OPUs) to compute random Fourier features for sketching, and adapt the overall compressive clustering pipeline to this setting. We also propose some tools to help tune a critical hyper-parameter of compressive clustering.
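
To make "random Fourier features for sketching" concrete, here is a minimal software sketch. It assumes the sketch is the dataset average of complex exponentials exp(i <w, x>) over randomly drawn frequency vectors w, and it draws Gaussian frequencies in software instead of using an optical device. The function names, the frequency scale, and the toy points are all assumptions made for illustration.

```python
import cmath
import random


def draw_frequencies(num_features, dim, scale=1.0, seed=0):
    """Draw Gaussian frequency vectors (a software stand-in for the OPU)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(num_features)]


def sketch(points, frequencies):
    """Average of exp(i * <w, x>) over the dataset, one entry per frequency w."""
    n = len(points)
    out = []
    for w in frequencies:
        total = sum(
            cmath.exp(1j * sum(wk * xk for wk, xk in zip(w, x))) for x in points
        )
        out.append(total / n)
    return out


# Four hypothetical 2-D points forming two loose clusters.
points = [(0.0, 0.1), (0.2, -0.1), (5.0, 5.1), (4.9, 5.2)]
freqs = draw_frequencies(num_features=8, dim=2)
z = sketch(points, freqs)
print(len(z))
```

The sketch has a fixed size (one complex number per frequency) regardless of how many points are summarized, which is what lets the later clustering step work from the sketch alone. The scale of the frequency distribution is the kind of hyper-parameter that needs tuning in practice.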

翻訳日:2022-06-14 15:04:20 公開日:2022-06-13

# 本質的動機づけによるオプション学習:最近の方法の比較研究

Intrinsically motivated option learning: a comparative study of recent methods ( http://arxiv.org/abs/2206.06007v1 )

ライセンス: Link先を確認

Djordje Božić, Predrag Tadić, Mladen Nikolić

(参考訳) オプションは強化学習(RL)における複数の時間スケールでの推論のためのフレームワークである。 rl研究コミュニティにおける教師なし学習パラダイムに対する近年の活発な関心により、オプションフレームワークは、エージェントが環境に与える影響の量と、この影響を知覚する能力に対応し、環境の報酬構造によって提供される監督なしで最適化できるエンパワーメントの概念を利用するように適応された。近年、多くの論文がこの概念を様々な方法で修正し、賞賛できる結果を得た。しかし、これらの様々な変更を通じて、エンパワーメントの初期の文脈はしばしば失われる。本研究では、元のエンパワーメント原理のレンズを通して、そのような論文の比較研究を行う。

Options represent a framework for reasoning across multiple time scales in reinforcement learning (RL). With the recent active interest in the unsupervised learning paradigm in the RL research community, the option framework was adapted to utilize the concept of empowerment, which corresponds to the amount of influence the agent has on the environment and its ability to perceive this influence, and which can be optimized without any supervision provided by the environment's reward structure. Many recent papers modify this concept in various ways achieving commendable results. Through these various modifications, however, the initial context of empowerment is often lost. In this work we offer a comparative study of such papers through the lens of the original empowerment principle.

翻訳日:2022-06-14 15:03:25 公開日:2022-06-13

# 現実世界における自律的段階付けに向けて

Towards Autonomous Grading In The Real World ( http://arxiv.org/abs/2206.06091v1 )

ライセンス: Link先を確認

Yakov Miron, Chana Ross, Yuval Goldfracht, Chen Tessler and Dotan Di Castro

(参考訳) 本研究では,不均一な領域を平滑化するためにドーザーが必要となる自律的採点の問題に取り組むことを目的としている。さらに,シミュレーション環境と実シナリオとのギャップを埋める手法についても検討する。実際のドーザーダイナミクスと感覚情報を模倣した実物シミュレーションと実物プロトタイプ環境の両方を設計した。我々はその問題を解決するためにヒューリスティックスと学習戦略を確立する。大規模な実験を通じて, ヒューリスティックはクリーンでノイズのないシミュレーション環境で問題に取り組むことができるが, 現実のシナリオに直面すると壊滅的に失敗することを示した。ヒューリスティックスはシミュレーション環境でタスクをうまく解くことができるので、シミュレーションとスケールしたプロトタイプ環境の両方においてタスクを一般化し解決できる学習エージェントの誘導に活用できることを示す。

In this work, we aim to tackle the problem of autonomous grading, where a dozer is required to flatten an uneven area. In addition, we explore methods for bridging the gap between a simulated environment and real scenarios. We design both a realistic physical simulation and a scaled real prototype environment mimicking the real dozer dynamics and sensory information. We establish heuristics and learning strategies in order to solve the problem. Through extensive experimentation, we show that although heuristics are capable of tackling the problem in a clean and noise-free simulated environment, they fail catastrophically when facing real world scenarios. As the heuristics are capable of successfully solving the task in the simulated environment, we show they can be leveraged to guide a learning agent which can generalize and solve the task both in simulation and in a scaled prototype environment.

翻訳日:2022-06-14 15:03:13 公開日:2022-06-13

# dcase 2022チャレンジタスク2 : ドメイン一般化手法を適用した機械状態監視のための教師なし異常音検出

Description and Discussion on DCASE 2022 Challenge Task 2: Unsupervised Anomalous Sound Detection for Machine Condition Monitoring Applying Domain Generalization Techniques ( http://arxiv.org/abs/2206.05876v1 )

ライセンス: Link先を確認

Kota Dohi, Keisuke Imoto, Noboru Harada, Daisuke Niizumi, Yuma Koizumi, Tomoya Nishida, Harsh Purohit, Takashi Endo, Masaaki Yamamoto and Yohei Kawaguchi

(参考訳) 本稿では,音響シーンとイベントの検出と分類に関するタスク記述(dcase)2022 challenge task 2: "unsupervised anomalous sound detection (asd) for machine condition monitoring using domain generalization techniques"について述べる。ドメインシフトは、ASDシステムの適用にとって重要な問題である。ドメインシフトはデータの音響特性を変化させる可能性があるため、ソースドメインでトレーニングされたモデルは、ターゲットドメインに対して性能が悪い。 DCASE 2021 Challenge Task 2では、ドメインシフトを処理するためのASDタスクを編成しました。この課題では、領域シフトの発生が知られていると仮定された。しかし、実際には、各サンプルのドメインは与えられず、ドメインシフトは暗黙的に発生する可能性がある。 2022タスク2では,ドメインシフトによらず異常を検出する領域一般化技術に注目した。具体的には、各サンプルのドメインがテストデータに与えられず、すべてのドメインに対して1つのしきい値のみが許可される。課題提出期限後に,課題結果と提案内容の分析を加えます。

We present the task description of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2022 Challenge Task 2: "Unsupervised anomalous sound detection (ASD) for machine condition monitoring applying domain generalization techniques". Domain shifts are a critical problem for the application of ASD systems. Because domain shifts can change the acoustic characteristics of data, a model trained in a source domain performs poorly for a target domain. In DCASE 2021 Challenge Task 2, we organized an ASD task for handling domain shifts. In this task, it was assumed that the occurrences of domain shifts are known. However, in practice, the domain of each sample may not be given, and the domain shifts can occur implicitly. In 2022 Task 2, we focus on domain generalization techniques that detect anomalies regardless of the domain shifts. Specifically, the domain of each sample is not given in the test data and only one threshold is allowed for all domains. We will add challenge results and analysis of the submissions after the challenge submission deadline.

翻訳日:2022-06-14 15:01:58 公開日:2022-06-13

# SyntheX: サイコ実験による学習ベースのX線画像解析のスケールアップ

SyntheX: Scaling Up Learning-based X-ray Image Analysis Through In Silico Experiments ( http://arxiv.org/abs/2206.06127v1 )

ライセンス: Link先を確認

Cong Gao, Benjamin D. Killeen, Yicheng Hu, Robert B. Grupp, Russell H. Taylor, Mehran Armand, Mathias Unberath

(参考訳) 人工知能(AI)は、医療用画像の自動解釈を可能にする。しかし、手術中のガイダンスなどの介入画像(トリアージや診断に関わるもの)に対するAIの潜在的な使用は、ほとんど未解決のままである。これは現在、外科的AIシステムは、倫理的考慮、費用、スケーラビリティ、データの完全性、基礎的真実の欠如など、基本的なおよび実践的な制限があるライブ手術中に収集されたデータのポストホック分析を使用して訓練されているためである。本稿では,人間のモデルから現実的なシミュレーション画像を作成することは,大規模な実地データ収集を補完する有効な代替手段であることを示す。本研究は,ai画像解析モデルと現代ドメイン一般化や適応手法を組み合わせることで,実データ上でのモデルと正確に一致する実データ学習セットで学習されたモデルとを両立させることができることを示す。人間ベースのモデルからのトレーニングデータの合成生成は、容易にスケールできるため、synthexと呼ばれるx線画像解析のモデル転送パラダイムは、より大きなデータセットでのトレーニングの有効性により、実際のデータトレーニングモデルよりも優れています。われわれはSyntheXの3つの臨床課題について, ヒップ画像解析, 手術用ロボットツール検出, および COVID-19 肺病変のセグメンテーションの3つの可能性を示した。 SyntheXは、X線治療のためのインテリジェントシステムの概念、設計、評価を劇的に加速する機会を提供する。加えて、シミュレーションされた画像環境は、新しい計測方法のテスト、補完的な手術アプローチの設計、そして人間のデータ収集の倫理的かつ実践的な考察から解放された、成果を改善し、時間を節約し、ヒューマンエラーを緩和する新しい手法を想定する機会を提供する。

Artificial intelligence (AI) now enables automated interpretation of medical images for clinical use. However, AI's potential use for interventional images (versus those involved in triage or diagnosis), such as for guidance during surgery, remains largely untapped. This is because surgical AI systems are currently trained using post hoc analysis of data collected during live surgeries, which has fundamental and practical limitations, including ethical considerations, expense, scalability, data integrity, and a lack of ground truth. Here, we demonstrate that creating realistic simulated images from human models is a viable alternative and complement to large-scale in situ data collection. We show that training AI image analysis models on realistically synthesized data, combined with contemporary domain generalization or adaptation techniques, results in models that on real data perform comparably to models trained on a precisely matched real data training set. Because synthetic generation of training data from human-based models scales easily, we find that our model transfer paradigm for X-ray image analysis, which we refer to as SyntheX, can even outperform real data-trained models due to the effectiveness of training on a larger dataset. We demonstrate the potential of SyntheX on three clinical tasks: Hip image analysis, surgical robotic tool detection, and COVID-19 lung lesion segmentation. SyntheX provides an opportunity to drastically accelerate the conception, design, and evaluation of intelligent systems for X-ray-based medicine. In addition, simulated image environments provide the opportunity to test novel instrumentation, design complementary surgical approaches, and envision novel techniques that improve outcomes, save time, or mitigate human error, freed from the ethical and practical considerations of live human data collection.

翻訳日:2022-06-14 14:56:23 公開日:2022-06-13

# ctスキャンによる体積超解像のためのrplhr-ctデータセットと変圧器ベースライン

RPLHR-CT Dataset and Transformer Baseline for Volumetric Super-Resolution from CT Scans ( http://arxiv.org/abs/2206.06253v1 )

ライセンス: Link先を確認

Pengxin Yu, Haoyue Zhang, Han Kang, Wen Tang, Corey W. Arnold, Rongguo Zhang

(参考訳) 臨床では, 取得時間の短縮や保存コストの低減などにより, 平面分解能の低い異方性容積医用画像が一般的に用いられる。しかしながら、この粗い解決は、医師またはコンピュータ支援の診断アルゴリズムによる医療診断の困難につながる可能性がある。深層学習に基づくボリューム超解像(SR)法は、畳み込みニューラルネットワーク(CNN)を中心に、解像度を改善するための実現可能な方法である。近年の進歩にもかかわらず、これらの手法はコンボリューション演算子の性質によって制限されており、コンボリューションの関連性を無視し、長距離依存を効果的にモデル化できない。さらに、既存の手法の多くは擬似ペアドボリュームをトレーニングと評価に使用しており、擬似低分解能(LR)ボリュームは高分解能(HR)ボリュームの単純な劣化によって生成される。しかし、擬似LRボリュームと実LRボリュームのドメインギャップは、実際にはこれらの手法の貧弱な性能をもたらす。本稿では,量的SRのベンチマークとして,最初の公開実対データセット RPLHR-CT を構築し,最先端の CNN ベースの4つの手法を再実装することによって,ベースライン結果を提供する。また,CNNの固有の欠点を考慮し,コンボリューションを完全に排除したアテンション機構に基づくトランスフォーマーボリューム超解像ネットワーク(TVSRN)を提案する。これはCTボリュームSRに純粋なトランスフォーマーを使用した最初の研究である。実験の結果,TVSRNはPSNRとSSIMの両方のベースラインを著しく上回ることがわかった。さらに,TVSRN法では,画像品質,パラメータ数,実行時間とのトレードオフが向上する。データとコードはhttps://github.com/smilenaxx/RPLHR-CTで入手できる。

In clinical practice, anisotropic volumetric medical images with low through-plane resolution are commonly used due to short acquisition time and lower storage cost. Nevertheless, the coarse resolution may lead to difficulties in medical diagnosis by either physicians or computer-aided diagnosis algorithms. Deep learning-based volumetric super-resolution (SR) methods are feasible ways to improve resolution, with convolutional neural networks (CNN) at their core. Despite recent progress, these methods are limited by inherent properties of convolution operators, which ignore content relevance and cannot effectively model long-range dependencies. In addition, most of the existing methods use pseudo-paired volumes for training and evaluation, where pseudo low-resolution (LR) volumes are generated by a simple degradation of their high-resolution (HR) counterparts. However, the domain gap between pseudo- and real-LR volumes leads to the poor performance of these methods in practice. In this paper, we build the first public real-paired dataset RPLHR-CT as a benchmark for volumetric SR, and provide baseline results by re-implementing four state-of-the-art CNN-based methods. Considering the inherent shortcoming of CNN, we also propose a transformer volumetric super-resolution network (TVSRN) based on attention mechanisms, dispensing with convolutions entirely. This is the first research to use a pure transformer for CT volumetric SR. The experimental results show that TVSRN significantly outperforms all baselines on both PSNR and SSIM. Moreover, the TVSRN method achieves a better trade-off between the image quality, the number of parameters, and the running time. Data and code are available at https://github.com/smilenaxx/RPLHR-CT.
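
PSNR, one of the two metrics reported above, follows the standard definition 10 * log10(MAX^2 / MSE). The sketch below applies it to flat lists of intensities. The `max_value` argument is an assumption the caller must supply, since the abstract does not state the intensity range used for the CT volumes.

```python
import math


def psnr(reference, estimate, max_value):
    """Peak signal-to-noise ratio in dB between two equally sized signals."""
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same number of samples")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical normalized intensities in [0, 1].
reference = [0.0, 1.0]
estimate = [0.0, 0.9]
value = psnr(reference, estimate, max_value=1.0)
print(value)
```

A real volume would be flattened to one list of voxel values (or handled with an array library) before applying the same formula.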

翻訳日:2022-06-14 14:55:49 公開日:2022-06-13

# maniskill 2021: learning-from-demonstrations and heuristic rule-based method for object manipulation

Silver-Bullet-3D at ManiSkill 2021: Learning-from-Demonstrations and Heuristic Rule-based Methods for Object Manipulation ( http://arxiv.org/abs/2206.06289v1 )

ライセンス: Link先を確認

Yingwei Pan and Yehao Li and Yiheng Zhang and Qi Cai and Fuchen Long and Zhaofan Qiu and Ting Yao and Tao Mei

(参考訳) 本稿では,sapien maniskill challenge 2021において,以下の2つのトラック用に設計されたシステムの概要と比較分析を行った。模倣学習に基づくアプローチ,すなわち,古典的教師付き学習手法を用いた観察行動の模倣と,オフライン強化学習に基づくアプローチの両方について検討した。さらに,物体やロボットアームの形状やテクスチャ構造をトランスフォーマーネットワークで活用し,模倣学習を容易にする。 No Restriction Track: このトラックでは、タスクを一連のサブタスクに分解することで高品質なオブジェクト操作をトリガーするHeuristic Rule-based Method(HRM)を設計します。各サブタスクに対して、ロボットアームに適用可能な動作を予測するための単純なルールベースの制御戦略が採用されている。システムの実装を容易にするため、すべてのソースコードと事前訓練済みモデルは、 https://github.com/caiqi/Silver-Bullet-3D/ で利用可能である。

This paper presents an overview and comparative analysis of our systems designed for the following two tracks in SAPIEN ManiSkill Challenge 2021: No Interaction Track: The No Interaction track targets learning policies from pre-collected demonstration trajectories. We investigate both imitation learning-based approaches, i.e., imitating the observed behavior using classical supervised learning techniques, and offline reinforcement learning-based approaches, for this track. Moreover, the geometry and texture structures of objects and robotic arms are exploited via Transformer-based networks to facilitate imitation learning. No Restriction Track: In this track, we design a Heuristic Rule-based Method (HRM) to trigger high-quality object manipulation by decomposing the task into a series of sub-tasks. For each sub-task, simple rule-based controlling strategies are adopted to predict actions that can be applied to robotic arms. To ease the implementation of our systems, all the source code and pre-trained models are available at https://github.com/caiqi/Silver-Bullet-3D/.

翻訳日:2022-06-14 14:55:20 公開日:2022-06-13

# 多項式複雑性をもつスコアベース生成モデルの収束性

Convergence for score-based generative modeling with polynomial complexity ( http://arxiv.org/abs/2206.06227v1 )

ライセンス: Link先を確認

Holden Lee and Jianfeng Lu and Yixin Tan

(参考訳) スコアベース生成モデリング(SGM)は、データから確率分布を学習し、さらなるサンプルを生成するために非常に成功した手法である。 sgm の背後にあるコアメカニックに対する最初の多項式収束性を保証する: 確率密度 $p$ が与えられたスコア推定値 ($\nabla \ln p$ の見積もり) からサンプルを抽出し、$l^2(p)$ で正確であることを証明する。以前の作品と比較して、私たちは指数関数的に増加するエラーや、次元の呪いに苦しむエラーを犯さない。この保証は任意の滑らかな分布に対して有効であり、その対数ソボレフ定数に依存する。保証条件を用いて,音階の異なるスコア推定値から,ホワイトノイズ入力をサンプルに変換するスコアベース生成モデルの理論解析を行った。提案手法は, 熱処理による各工程の温かいスタート点の獲得に要するので, 有効試料の生成には熱処理が必要であるという理論的な根拠を与える。さらに,予測子補正アルゴリズムは,どちらの部分のみを使用するよりも収束性が良いことを示す。

Score-based generative modeling (SGM) is a highly successful approach for learning a probability distribution from data and generating further samples. We prove the first polynomial convergence guarantees for the core mechanic behind SGM: drawing samples from a probability density $p$ given a score estimate (an estimate of $\nabla \ln p$) that is accurate in $L^2(p)$. Compared to previous works, we do not incur error that grows exponentially in time or that suffers from a curse of dimensionality. Our guarantee works for any smooth distribution and depends polynomially on its log-Sobolev constant. Using our guarantee, we give a theoretical analysis of score-based generative modeling, which transforms white-noise input into samples from a learned data distribution given score estimates at different noise scales. Our analysis gives theoretical grounding to the observation that an annealed procedure is required in practice to generate good samples, as our proof depends essentially on using annealing to obtain a warm start at each step. Moreover, we show that a predictor-corrector algorithm gives better convergence than using either portion alone.
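
The idea of turning white-noise input into samples by following score estimates at decreasing noise scales can be shown on a one-dimensional toy problem. The sketch below is not the algorithm analyzed in the paper. It assumes a Gaussian data distribution whose noise-smoothed score is known in closed form, uses plain Langevin updates at each noise scale, and picks the step sizes and noise schedule by hand.

```python
import math
import random

MU, SIGMA = 3.0, 1.0  # toy data distribution N(MU, SIGMA^2), chosen for illustration


def score(x, noise_scale):
    """Exact score of the data distribution smoothed with Gaussian noise."""
    var = SIGMA ** 2 + noise_scale ** 2
    return -(x - MU) / var


def annealed_langevin(num_samples=2000, noise_scales=(10.0, 3.0, 1.0, 0.3, 0.01),
                      steps_per_scale=50, seed=0):
    """Run Langevin dynamics at each noise scale, from largest to smallest."""
    rng = random.Random(seed)
    # Start from white noise at the largest scale.
    xs = [rng.gauss(0.0, noise_scales[0]) for _ in range(num_samples)]
    for s in noise_scales:
        # Hand-picked step size proportional to the smoothed variance
        # (this uses knowledge of SIGMA, which a real sampler would not have).
        step = 0.1 * (SIGMA ** 2 + s ** 2)
        for _ in range(steps_per_scale):
            xs = [
                x + step * score(x, s) + math.sqrt(2 * step) * rng.gauss(0.0, 1.0)
                for x in xs
            ]
    return xs


samples = annealed_langevin()
mean = sum(samples) / len(samples)
std = math.sqrt(sum((x - mean) ** 2 for x in samples) / len(samples))
print(mean, std)
```

Each noise level starts from the samples produced at the previous, larger level, which is the "warm start" role of annealing that the abstract highlights. In a learned model the closed-form `score` would be replaced by a trained score network, and its estimation error is what the paper's guarantee accounts for.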

翻訳日:2022-06-14 14:54:48 公開日:2022-06-13

# 畳み込み型長期記憶を用いた畳み込み型ニューラルネットワークを用いた全身動的PETのフレーム間運動補正

Unsupervised inter-frame motion correction for whole-body dynamic PET using convolutional long short-term memory in a convolutional neural network ( http://arxiv.org/abs/2206.06341v1 )

ライセンス: Link先を確認

Xueqi Guo, Bo Zhou, David Pigg, Bruce Spottiswoode, Michael E. Casey, Chi Liu, Nicha C. Dvornek

(参考訳) 全身動的PETにおける被写体運動は、フレーム間ミスマッチを導入し、パラメトリックイメージングに深刻な影響を及ぼす。従来の非厳密な登録手法は一般に計算量が多く、時間がかかる。ディープラーニングアプローチは,高速で高精度な学習を実現する上で有望だが,トレーサ分布の変化や全身範囲についてはまだ検討されていない。本研究では,フレーム間体の動きを補正するための教師なし自動ディープラーニングフレームワークを開発した。運動推定ネットワークは、動的時間的特徴と空間的情報を完全に活用した畳み込み長短期記憶層を組み合わせた畳み込みニューラルネットワークである。本データセットは90分FDGフルボディPETスキャンで27名の被験者を抽出した。従来の学習ベースラインと深層学習ベースラインの両方と比較して,9倍のクロスバリデーションでは,パラメトリック$K_{i}$と$V_{b}$の画像の質的,定量的な空間的アライメントが向上し,パラメトリックフィッティング誤差が大幅に低減された。また,提案手法が推定したパラメトリック画像の下流解析に影響を及ぼす可能性を示し,良質な代謝異常領域と悪性度を区別する能力を向上した。一旦トレーニングすると,提案ネットワークの動作推定時間は,従来の登録ベースラインの約460倍高速となり,臨床応用が容易になる可能性が示唆された。

Subject motion in whole-body dynamic PET introduces inter-frame mismatch and seriously impacts parametric imaging. Traditional non-rigid registration methods are generally computationally intense and time-consuming. Deep learning approaches are promising in achieving high accuracy with fast speed, but have yet to be investigated with consideration for tracer distribution changes or in the whole-body scope. In this work, we developed an unsupervised automatic deep learning-based framework to correct inter-frame body motion. The motion estimation network is a convolutional neural network with a combined convolutional long short-term memory layer, fully utilizing dynamic temporal features and spatial information. Our dataset contains 27 subjects each under a 90-min FDG whole-body dynamic PET scan. With 9-fold cross-validation, compared with both traditional and deep learning baselines, we demonstrated that the proposed network obtained superior performance in enhanced qualitative and quantitative spatial alignment between parametric $K_{i}$ and $V_{b}$ images and in significantly reduced parametric fitting error. We also showed the potential of the proposed motion correction method for impacting downstream analysis of the estimated parametric images, improving the ability to distinguish malignant from benign hypermetabolic regions of interest. Once trained, the motion estimation inference time of our proposed network was around 460 times faster than the conventional registration baseline, showing its potential to be easily applied in clinical settings.

翻訳日:2022-06-14 14:54:29 公開日:2022-06-13

# (参考訳) 微分可能かつ伝達可能な構造学習

Differentiable and Transportable Structure Learning ( http://arxiv.org/abs/2206.06354v1 )

ライセンス: CC BY 4.0

Jeroen Berrevoets, Nabeel Seedat, Fergus Imrie, Mihaela van der Schaar

(参考訳) 我々は,有向非巡回グラフィカルモデル(DAG)に着目した教師なし構造学習に興味を持っている。これらの構造を推論するために必要となる計算は、一般的に変数の量において超指数的である。つまり、最近の進歩によってこの空間を微分可能な計量を用いて探索できるまで、検索時間は劇的に削減される。この手法は notears と名付けられ、dag-discovery の独創的な作品と見なされているが、微分可能性(英語版)(transportability)を支持する重要な特性である。本稿では,新しいアーキテクチャと損失関数により,検出された構造物の輸送性を復元するD-Structを提案する。 D-Structは相変わらず差別化可能であるため、従来NOTEARSで行われていたように、我々の手法を差別化可能なアーキテクチャで容易に適用することができる。実験では, エッジ精度とハミング距離に関するD構造を実験的に検証した。

We are interested in unsupervised structure learning with a particular focus on directed acyclic graphical (DAG) models. The compute required to infer these structures is typically super-exponential in the number of variables, as inference requires a sweep of a combinatorially large space of potential structures. That is, until recent advances made it possible to search this space using a differentiable metric, drastically reducing search time. While this technique -- named NOTEARS -- is widely considered a seminal work in DAG-discovery, it concedes an important property in favour of differentiability: transportability. In our paper we introduce D-Struct which recovers transportability in the found structures through a novel architecture and loss function, while remaining completely differentiable. As D-Struct remains differentiable, one can easily adopt our method in differentiable architectures as was previously done with NOTEARS. In our experiments we empirically validate D-Struct with respect to edge accuracy and the structural Hamming distance.
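
The "differentiable metric" attributed to NOTEARS is its acyclicity function h(W) = tr(exp(W ∘ W)) - d, which is zero exactly when the weighted adjacency matrix W describes a DAG. The dependency-free sketch below evaluates it with a truncated power series for the matrix exponential. The helper names and the truncation length are choices made here, and D-Struct's own architecture and loss are not shown.

```python
import math


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def notears_h(weights, terms=30):
    """h(W) = tr(exp(W * W elementwise)) - d, via a truncated power series."""
    d = len(weights)
    squared = [[w * w for w in row] for row in weights]
    # power holds S^k / k! for S = elementwise square of W, starting at k = 0.
    power = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    trace = float(d)
    for k in range(1, terms):
        power = matmul(power, squared)
        power = [[v / k for v in row] for row in power]
        trace += sum(power[i][i] for i in range(d))
    return trace - d


# A chain 0 -> 1 -> 2 (acyclic) and a two-node cycle 0 <-> 1.
chain = [[0.0, 2.0, 0.0], [0.0, 0.0, -1.5], [0.0, 0.0, 0.0]]
cycle = [[0.0, 1.0], [1.0, 0.0]]

h_chain = notears_h(chain)
h_cycle = notears_h(cycle)
print(h_chain, h_cycle)
```

Because h is smooth in W, it can be added as a constraint or penalty to a gradient-based objective, which is what replaces the combinatorial search over graph structures.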

翻訳日:2022-06-14 14:53:02 公開日:2022-06-13

# 強化学習におけるマルチタスク表現学習の有益性

Provable Benefit of Multitask Representation Learning in Reinforcement Learning ( http://arxiv.org/abs/2206.05900v1 )

ライセンス: Link先を確認

Yuan Cheng, Songtao Feng, Jing Yang, Hong Zhang, Yingbin Liang

(参考訳) 表現学習は、実際には強化学習(RL)におけるサンプルの複雑さを低減する強力な手法となり、その利点に関する理論的理解は限定的である。本稿では,低ランクマルコフ決定過程(MDP)モデルに基づく表現学習の利点を理論的に特徴づける。まず,全てのタスクが共通表現を持つマルチタスク低ランクRL(上流トレーニング)について検討し,REFUELと呼ばれる新しいマルチタスク報酬のないアルゴリズムを提案する。 REFUELは、各タスクの遷移カーネルとほぼ最適ポリシーの両方を学び、下流タスクのよく学習された表現を出力する。その結果、タスクの総数が一定のしきい値を超えている限り、マルチタスク表現学習は各タスクを個別に学習するよりもサンプル効率が高いことが示された。次に、ダウンストリームRLをオンラインとオフラインの両方の設定で研究し、エージェントにアップストリームタスクと同じ表現を共有する新しいタスクを割り当てる。オンラインとオフラインの両方の設定で、サンプル効率のよいアルゴリズムを開発し、上流での学習表現の推定誤差と下流のサンプル数が大きくなるにつれて消滅する項の合計によって、サブオプティリティギャップを境界とする最適に近いポリシーを見出す。オンラインおよびオフラインRLのダウンストリーム結果はさらに、ローランクモデルの表現を直接学習するのではなく、上流から学習した表現を採用するメリットを捉えています。我々の知る限りでは、上流と下流の両方のタスクに対して探索に基づく報酬なしマルチタスクRLにおける表現学習の利点を特徴づける最初の理論的研究である。

Although representation learning has become a powerful technique to reduce sample complexity in reinforcement learning (RL) in practice, theoretical understanding of its advantage is still limited. In this paper, we theoretically characterize the benefit of representation learning under the low-rank Markov decision process (MDP) model. We first study multitask low-rank RL (as upstream training), where all tasks share a common representation, and propose a new multitask reward-free algorithm called REFUEL. REFUEL learns both the transition kernel and the near-optimal policy for each task, and outputs a well-learned representation for downstream tasks. Our result demonstrates that multitask representation learning is provably more sample-efficient than learning each task individually, as long as the total number of tasks is above a certain threshold. We then study the downstream RL in both online and offline settings, where the agent is assigned a new task sharing the same representation as the upstream tasks. For both online and offline settings, we develop a sample-efficient algorithm, and show that it finds a near-optimal policy with the suboptimality gap bounded by the sum of the estimation error of the learned representation in upstream and a vanishing term as the number of downstream samples becomes large. Our downstream results of online and offline RL further capture the benefit of employing the learned representation from upstream as opposed to learning the representation of the low-rank model directly. To the best of our knowledge, this is the first theoretical study that characterizes the benefit of representation learning in exploration-based reward-free multitask RL for both upstream and downstream tasks.

翻訳日:2022-06-14 14:28:01 公開日:2022-06-13

# 比較学習特徴を用いたグラフ生成モデルの評価

Evaluating Graph Generative Models with Contrastively Learned Features ( http://arxiv.org/abs/2206.06234v1 )

ライセンス: Link先を確認

Hamed Shirzad and Kaveh Hassani and Danica J. Sutherland

(参考訳) グラフ生成モデルには様々なモデルが提案されており、その品質を評価するのに効果的な方法が必要となる。今のところ、ほとんどのテクニックは、サブグラフカウントに基づく伝統的なメトリクスまたはランダムに初期化されたグラフニューラルネットワーク(GNN)の表現を使用する。我々は、ランダムなGNNではなく、対照的に訓練されたGNNの表現を使うことを提案する。しかし、従来のアプローチもGNNベースのアプローチもどちらにも支配的ではなく、それぞれのアプローチが区別できないグラフの例を挙げる。グラフサブストラクチャーネットワーク(GSN)は、両方のアプローチを組み合わせることで、グラフデータセット間の距離を区別するのがより優れていることを実証する。

A wide range of graph generative models have been proposed, necessitating effective methods to evaluate their quality. So far, most techniques use either traditional metrics based on subgraph counting, or the representations of randomly initialized Graph Neural Networks (GNNs). We propose using representations from contrastively trained GNNs, rather than random GNNs, and show this gives more reliable evaluation metrics. Neither traditional approaches nor GNN-based approaches dominate the other, however: we give examples of graphs that each approach is unable to distinguish. We demonstrate that Graph Substructure Networks (GSNs), which in a way combine both approaches, are better at distinguishing the distances between graph datasets.

翻訳日:2022-06-14 14:27:34 公開日:2022-06-13

# 制約付きmdpの最適近傍サンプル複雑性境界

Near-Optimal Sample Complexity Bounds for Constrained MDPs ( http://arxiv.org/abs/2206.06270v1 )

ライセンス: Link先を確認

Sharan Vaswani, Lin F. Yang, Csaba Szepesvári

(参考訳) マルコフ決定過程(MDPs)を解くためのサンプル複雑性の特徴付けの進歩とは対照的に、制約付きMDP(CMDPs)を解くための最適な統計複雑性はいまだ不明である。生成モデル(シミュレータ)にアクセスして割引CMDPで最適に近いポリシーを学ぶために、サンプル複雑性の最小上限と下限を提供することで、この問題を解決する。特に、2つの設定に対処するモデルベースアルゴリズムを設計する。 (i) 小さな制約違反が許容されるような緩和実現可能性 (ii) 厳格な実現可能性(制約を満たすために出力ポリシが必要)。 (i) については、提案アルゴリズムは,$\tilde{O}\left(\frac{S A \log(1/\delta)}{(1 - \gamma)^3 \epsilon^2}\right)$ クエリを生成モデルに適用することにより,確率 $1 - \delta$ で $\epsilon$-optimal Policy を返すことを証明した。 (ii) については、アルゴリズムのサンプルの複雑さは、$\tilde{O} \left(\frac{S A \, \log(1/\delta)}{(1 - \gamma)^5 \, \epsilon^2 \zeta^2} \right)$ であり、$\zeta$ は問題依存のスレーター定数であり、実現可能な領域のサイズを特徴付ける。最後に, 厳密な実現可能性設定に対して一致した下界を証明し, 割引CMDPに対する第1の極小最適境界を求める。以上の結果から,CMDPの学習は制約違反を許す場合と同等に容易であるが,制約違反を要求しない場合には本質的に困難であることがわかった。

In contrast to the advances in characterizing the sample complexity for solving Markov decision processes (MDPs), the optimal statistical complexity for solving constrained MDPs (CMDPs) remains unknown. We resolve this question by providing minimax upper and lower bounds on the sample complexity for learning near-optimal policies in a discounted CMDP with access to a generative model (simulator). In particular, we design a model-based algorithm that addresses two settings: (i) relaxed feasibility, where small constraint violations are allowed, and (ii) strict feasibility, where the output policy is required to satisfy the constraint. For (i), we prove that our algorithm returns an $\epsilon$-optimal policy with probability $1 - \delta$, by making $\tilde{O}\left(\frac{S A \log(1/\delta)}{(1 - \gamma)^3 \epsilon^2}\right)$ queries to the generative model, thus matching the sample-complexity for unconstrained MDPs. For (ii), we show that the algorithm's sample complexity is upper-bounded by $\tilde{O} \left(\frac{S A \, \log(1/\delta)}{(1 - \gamma)^5 \, \epsilon^2 \zeta^2} \right)$ where $\zeta$ is the problem-dependent Slater constant that characterizes the size of the feasible region. Finally, we prove a matching lower-bound for the strict feasibility setting, thus obtaining the first near minimax optimal bounds for discounted CMDPs. Our results show that learning CMDPs is as easy as MDPs when small constraint violations are allowed, but inherently more difficult when we demand zero constraint violation.
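To make the gap between the two settings concrete, the sketch below evaluates the leading terms of the two upper bounds stated above, ignoring constants and the logarithmic factors hidden by $\tilde{O}$. The function names and the sample values of $\gamma$ and $\zeta$ are illustrative assumptions and do not come from the paper.

```python
import math


def relaxed_bound(S, A, gamma, eps, delta):
    """Leading term of the relaxed-feasibility bound: S A log(1/delta) / ((1-gamma)^3 eps^2)."""
    return S * A * math.log(1 / delta) / ((1 - gamma) ** 3 * eps ** 2)


def strict_bound(S, A, gamma, eps, delta, zeta):
    """Leading term of the strict-feasibility bound, with Slater constant zeta."""
    return S * A * math.log(1 / delta) / ((1 - gamma) ** 5 * eps ** 2 * zeta ** 2)


def strict_over_relaxed(gamma, zeta):
    """Ratio of the two leading terms: 1 / ((1-gamma)^2 zeta^2)."""
    return 1 / ((1 - gamma) ** 2 * zeta ** 2)


ratio = strict_over_relaxed(gamma=0.9, zeta=0.5)
```

With $\gamma = 0.9$ and $\zeta = 0.5$ the ratio is about 400. The extra cost of demanding zero constraint violation therefore grows with both the effective horizon $1/(1-\gamma)$ and the inverse Slater constant.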

翻訳日:2022-06-14 14:27:22 公開日:2022-06-13

# CNNのロバスト性に向けたPixelからバイナリへの埋め込み

Pixel to Binary Embedding Towards Robustness for CNNs ( http://arxiv.org/abs/2206.05898v1 )

ライセンス: Link先を確認

Ikki Kishida and Hideki Nakayama

(参考訳) 畳み込みニューラルネットワーク(CNN)の堅牢性にはいくつかの問題がある。例えば、入力に少量のノイズを加えることでCNNの予測を変更でき、トレーニング中に見られない変換(例えば、ぼやけた効果)によって入力の分布がシフトされたときにCNNのパフォーマンスが劣化する。対向摂動問題に対処するため、画素値をバイナリ埋め込みで置き換えるアプローチがあり、堅牢性の向上に成功している。本研究では,cnnのロバスト性を改善するために,p2beを提案する。 p2beは、以前の手書きバイナリ埋め込みメソッドとは対照的に学習可能なバイナリ埋め込みメソッドである。 P2BEは、訓練中に表示されない対向的摂動や視覚的腐敗に対する堅牢性において、他のバイナリ埋め込み方法よりも優れる。

There are several problems with the robustness of Convolutional Neural Networks (CNNs). For example, the prediction of a CNN can be changed by adding a small amount of noise to an input, and the performance of CNNs degrades when the distribution of the input is shifted by a transformation never seen during training (e.g., the blur effect). There are approaches that replace pixel values with binary embeddings to tackle the problem of adversarial perturbations, which successfully improve robustness. In this work, we propose Pixel to Binary Embedding (P2BE) to improve the robustness of CNNs. P2BE is a learnable binary embedding method as opposed to previous hand-coded binary embedding methods. P2BE outperforms other binary embedding methods in robustness against adversarial perturbations and visual corruptions that are not seen during training.
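For context, one well-known hand-coded binary embedding of pixel intensities from the adversarial-robustness literature is thermometer encoding, in which a pixel is mapped to a vector of threshold comparisons. The sketch below illustrates that baseline idea only. P2BE instead learns its embedding, which is not shown here, and the number of levels and the function name are assumptions made for illustration.

```python
def thermometer_encode(pixel, levels=8):
    """Map an 8-bit intensity (0..255) to `levels` binary threshold indicators."""
    if not 0 <= pixel <= 255:
        raise ValueError("pixel must be in [0, 255]")
    # Thresholds are evenly spaced strictly inside the intensity range.
    step = 256 / (levels + 1)
    # Bit k is 1 once the intensity reaches the k-th threshold.
    return [1 if pixel >= k * step else 0 for k in range(1, levels + 1)]


dark = thermometer_encode(0)
mid = thermometer_encode(128)
bright = thermometer_encode(255)
```

Small perturbations that do not cross a threshold leave the code unchanged, which is the intuition behind using discretized inputs against small-magnitude noise.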

翻訳日:2022-06-14 14:24:44 公開日:2022-06-13

# 最適化を高速化するメタラーニング適応フェーズ

Faster Optimization-Based Meta-Learning Adaptation Phase ( http://arxiv.org/abs/2206.05930v1 )

ライセンス: Link先を確認

Kostiantyn Khabarlak

(参考訳) ニューラルネットワークは学習するために大量の注釈付きデータを必要とする。メタラーニングアルゴリズムは、トレーニングサンプルの数をほんの数人に減らす方法を提案する。最も著名な最適化に基づくメタ学習アルゴリズムの1つは、モデル非依存メタ学習(maml)である。しかし、MAMLにおける新しいタスクへの適応の重要な手順は非常に遅い。本研究では,MAMLメタ学習アルゴリズムの改良を提案する。適応フェーズ中にネットワーク内で更新される重量を制限するLambdaパターンを導入する。これにより、特定の勾配計算をスキップすることができる。許容品質劣化閾値パラメータにより、最速パターンが選択される。特定の場合には、注意深いパターン選択によって品質改善が可能となる。実験により, ラムダ適応パターンの選択により, 適応時間は最小精度損失の3倍に減少し, 1段階適応の精度は大幅に向上した。

Neural networks require a large amount of annotated data to learn. Meta-learning algorithms propose a way to decrease the number of training samples to only a few. One of the most prominent optimization-based meta-learning algorithms is Model-Agnostic Meta-Learning (MAML). However, the key procedure of adaptation to new tasks in MAML is quite slow. In this work we propose an improvement to the MAML meta-learning algorithm. We introduce Lambda patterns by which we restrict which weights are updated in the network during the adaptation phase. This makes it possible to skip certain gradient computations. The fastest pattern is selected given an allowed quality degradation threshold parameter. In certain cases, quality improvement is possible by a careful pattern selection. The experiments conducted have shown that via Lambda adaptation pattern selection, it is possible to significantly improve the MAML method in the following areas: adaptation time has been decreased by a factor of 3 with minimal accuracy loss; accuracy for one-step adaptation has been substantially improved.
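The idea of restricting which weights are updated during adaptation can be sketched as a binary mask over layers. The abstract does not define Lambda patterns precisely, so the mask representation, the function name `adapt_step`, and the toy gradient below are assumptions made for illustration, not the paper's implementation.

```python
def adapt_step(weights, grad_fns, pattern, lr=0.1):
    """One inner-loop step that updates only layers whose pattern bit is 1.

    weights:  list of per-layer weight lists
    grad_fns: list of callables, one per layer, returning that layer's gradient
    pattern:  list of 0/1 flags; a 0 skips both the gradient call and the update
    """
    new_weights, grads_computed = [], 0
    for layer, grad_fn, keep in zip(weights, grad_fns, pattern):
        if keep:
            grad = grad_fn(layer)
            grads_computed += 1
            new_weights.append([w - lr * g for w, g in zip(layer, grad)])
        else:
            # Frozen layer: copied unchanged, no gradient evaluated.
            new_weights.append(list(layer))
    return new_weights, grads_computed


def toy_grad(layer):
    """Gradient of the toy loss sum(w^2), i.e. 2 * w for each weight."""
    return [2 * w for w in layer]


weights = [[1.0, 2.0], [3.0], [4.0, 5.0]]
updated, n_grads = adapt_step(weights, [toy_grad] * 3, pattern=[0, 0, 1])
```

Here only the last layer is adapted, so one gradient is evaluated instead of three. Selecting among such patterns under a quality-degradation threshold, as the abstract describes, would sit on top of a step like this.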

翻訳日:2022-06-14 14:24:30 公開日:2022-06-13

# GoToNet: 高速なモノクロシーン露光と探索

GoToNet: Fast Monocular Scene Exposure and Exploration ( http://arxiv.org/abs/2206.05967v1 )

ライセンス: Link先を確認

Tom Avrech, Evgenii Zheltonozhskii, Chaim Baskin, Ehud Rivlin

(参考訳) 自律的なシーンの露出と探索、特に、未知のシーンのターゲットを見つけるのに有用なローカライズや通信の密度の高い領域は、コンピュータナビゲーションにおいて難しい問題である。そこで本研究では,事前学習のための視覚的に類似したデータセット,シーンに十分な照明,環境検知のための前方向けRGBカメラなど,リアルタイム環境探索のための新しい手法を提案する。既存の手法とは対照的に,良好な戦術決定を行うには1つのルック(画像)しか必要とせず,非成長的な一定時間で動作する。 GotoとLookatと呼ばれる画素が特徴である2つの方向予測が,本手法のコアを構成する。これらの画素は推奨飛行指示を次のように符号化する: Gotoピクセルはエージェントが1つの距離単位で動く方向を定義し、Lookatピクセルは次のステップでカメラが指している方向を定義している。これらのフライングインストラクションピクセルは、現在未調査領域で最も多く露出するように最適化されている。本手法は,この問題を解決するための新しい深層学習に基づくナビゲーション手法を提案し,計算能力が制限された場合に,さらに複雑なセットアップでその能力を示す。また,rgbと深度画像を用いた効率的な学習を実現するため,ナビゲーション指向データセットを生成する手法を提案する。スパースピクセルのコーディネーション推論プロセスと、領域を公開し、期待できる結果を達成するための距離を減らすことを目的とした2Dおよび3Dテスト飛行の両方を評価するシミュレータで実施されたテスト。現状のアルゴリズムと比較すると、カメラのポーズごとに新しいボクセルを計測し、ターゲットまでの距離を最小化し、表面ボクセルのパーセンテージを計測し、計算時間を計測する。

Autonomous scene exposure and exploration, especially in localization or communication-denied areas, useful for finding targets in unknown scenes, remains a challenging problem in computer navigation. In this work, we present a novel method for real-time environment exploration, whose only requirements are a visually similar dataset for pre-training, enough lighting in the scene, and an on-board forward-looking RGB camera for environmental sensing. As opposed to existing methods, our method requires only one look (image) to make a good tactical decision, and therefore runs in constant, non-growing time. Two direction predictions, characterized by pixels dubbed the Goto and Lookat pixels, comprise the core of our method. These pixels encode the recommended flight instructions in the following way: the Goto pixel defines the direction in which the agent should move by one distance unit, and the Lookat pixel defines the direction in which the camera should be pointing in the next step. These flying-instruction pixels are optimized to expose the largest amount of currently unexplored areas. Our method presents a novel deep learning-based navigation approach that is able to solve this problem and demonstrate its ability in an even more complicated setup, i.e., when computational power is limited. In addition, we propose a way to generate a navigation-oriented dataset, enabling efficient training of our method using RGB and depth images. Tests conducted in a simulator, evaluating both the inference of the sparse pixel coordinates and 2D and 3D test flights aimed at unveiling areas and decreasing distances to targets, achieve promising results. Comparison against a state-of-the-art algorithm shows that our method outperforms it on the following metrics: new voxels per camera pose, minimum distance to target, percentage of surface voxels seen, and compute time.

翻訳日:2022-06-14 14:24:19 公開日:2022-06-13

# 教師なし意味セグメンテーションのためのトランスフォーマーを用いた物体マスクの発見

Discovering Object Masks with Transformers for Unsupervised Semantic Segmentation ( http://arxiv.org/abs/2206.06363v1 )

ライセンス: Link先を確認

Wouter Van Gansbeke, Simon Vandenhende, Luc Van Gool

(参考訳) 教師なしセマンティックセグメンテーションの課題は、ピクセルを意味のあるグループにクラスタ化することである。具体的には、同じクラスタに割り当てられたピクセルは、オブジェクトや部分カテゴリのようなハイレベルなセマンティクス特性を共有する必要がある。本稿では,3つのキーアイデアに基づいた教師なしセマンティックセグメンテーションのための新しいフレームワークMaskDistillを提案する。まず、セマンティックセグメンテーションの前にピクセルグループとして機能するオブジェクトマスクを生成するためのデータ駆動戦略を提案する。このアプローチは、特定のシーン構成のためにしばしば設計され、競合するフレームワークの適用性を制限する手作りの先行を省略する。第2に、MaskDistillはオブジェクトマスクをクラスタ化して、初期オブジェクトセグメンテーションモデルをトレーニングするための擬似グラウンドトルースを得る。第3に、このモデルを利用して低品質のオブジェクトマスクをフィルタします。この戦略は,画素グループ化前のノイズを軽減し,最終的なセグメンテーションモデルをトレーニングするために使用するマスクのクリーンコレクションを実現する。これらのコンポーネントを組み合わせることで、PASCAL(+11% mIoU)とCOCO(+4% mask AP50)の教師なしセマンティックセマンティックセグメンテーションにおいて、従来よりも大幅に優れています。興味深いことに、既存のアプローチとは対照的に、我々のフレームワークは低レベルの画像キューにラッチせず、オブジェクト中心のデータセットに限定されない。コードとモデルは利用可能になる。

The task of unsupervised semantic segmentation aims to cluster pixels into semantically meaningful groups. Specifically, pixels assigned to the same cluster should share high-level semantic properties like their object or part category. This paper presents MaskDistill: a novel framework for unsupervised semantic segmentation based on three key ideas. First, we advocate a data-driven strategy to generate object masks that serve as a pixel grouping prior for semantic segmentation. This approach omits handcrafted priors, which are often designed for specific scene compositions and limit the applicability of competing frameworks. Second, MaskDistill clusters the object masks to obtain pseudo-ground-truth for training an initial object segmentation model. Third, we leverage this model to filter out low-quality object masks. This strategy mitigates the noise in our pixel grouping prior and results in a clean collection of masks which we use to train a final segmentation model. By combining these components, we can considerably outperform previous works for unsupervised semantic segmentation on PASCAL (+11% mIoU) and COCO (+4% mask AP50). Interestingly, as opposed to existing approaches, our framework does not latch onto low-level image cues and is not limited to object-centric datasets. The code and models will be made available.

翻訳日:2022-06-14 14:23:46 公開日:2022-06-13

# ヒューマンオブジェクトインタラクション検出のためのインタラクション提案に基づく構造認識変換器の探索

Exploring Structure-aware Transformer over Interaction Proposals for Human-Object Interaction Detection ( http://arxiv.org/abs/2206.06291v1 )

ライセンス: Link先を確認

Yong Zhang and Yingwei Pan and Ting Yao and Rui Huang and Tao Mei and Chang-Wen Chen

(参考訳) 近年のHuman-Object Interaction(HOI)検出技術はTransformerベースのオブジェクト検出器(DETR)の影響を強く受けている。それでも、ほとんどの場合、パラメトリックなインタラクションクエリを直接、バニラトランスフォーマーを通じて一段階的にHOI予測にマッピングする。これにより、リッチな相互作用間構造や相互作用内構造が過小評価される。本稿では,hoi検出のための新しいトランスフォーマティブ型hoi検出器,すなわちstip(structure-aware transformer over interaction proposals)を設計した。このような設計は、HOIセット予測の過程を、まず相互作用の提案生成を行い、次に構造認識変換器を介して非パラメトリック相互作用提案をHOI予測に変換する2つのフェーズに分解する。構造対応トランスフォーマーは、相互作用提案の全体的意味構造と、各相互作用提案内の人間・物体の局所的空間構造を付加してバニラ変換器をアップグレードし、HOI予測を強化する。 V-COCOとHICO-DETのベンチマークで行った大規模な実験はSTIPの有効性を示し、最先端のHOI検出器と比較すると優れた結果が報告されている。ソースコードは \url{https://github.com/zyong812/STIP} で入手できる。

Recent high-performing Human-Object Interaction (HOI) detection techniques have been highly influenced by Transformer-based object detectors (i.e., DETR). Nevertheless, most of them directly map parametric interaction queries into a set of HOI predictions through a vanilla Transformer in a one-stage manner. This leaves rich inter- or intra-interaction structure under-exploited. In this work, we design a novel Transformer-style HOI detector, i.e., Structure-aware Transformer over Interaction Proposals (STIP), for HOI detection. This design decomposes the process of HOI set prediction into two successive phases: interaction proposal generation is performed first, and the non-parametric interaction proposals are then transformed into HOI predictions via a structure-aware Transformer. The structure-aware Transformer upgrades the vanilla Transformer by additionally encoding the holistic semantic structure among interaction proposals as well as the local spatial structure of the human/object within each interaction proposal, so as to strengthen HOI predictions. Extensive experiments conducted on the V-COCO and HICO-DET benchmarks have demonstrated the effectiveness of STIP, and superior results are reported when comparing with state-of-the-art HOI detectors. Source code is available at https://github.com/zyong812/STIP.

翻訳日:2022-06-14 14:21:03 公開日:2022-06-13

# MLP-3D:グループ時間混合型MLPライクな3Dアーキテクチャ

MLP-3D: A MLP-like 3D Architecture with Grouped Time Mixing ( http://arxiv.org/abs/2206.06292v1 )

ライセンス: Link先を確認

Zhaofan Qiu and Ting Yao and Chong-Wah Ngo and Tao Mei

(参考訳) 畳み込みニューラルネットワーク(CNN)は、視覚認識のためのゴートモデルとみなされてきた。近年,MSA(Multi-head self-attention)やMLP(Multi-layer perceptrons)に基づく畳み込みのないネットワークが普及している。それにもかかわらず、ビデオデータの大きなバリエーションや複雑さのために、これらの新たなネットワークをビデオ認識に活用するのは簡単ではない。本稿では,ビデオ認識のための新しい3DアーキテクチャであるMLP-3Dネットワークを提案する。具体的には、MLP-3Dブロックで構成され、各ブロックはトークン間で適用される1つのMLP(トークン混合MLP)と、各トークンに対して独立して適用される1つのMLP(チャネルMLP)を含む。新規なグループ化時間混合(GTM)演算の導出により,時間的モデリングの能力を備えた基本トークン混合MLPを開発した。 GTMは入力トークンを複数の時間群に分割し、各グループのトークンを共有射影行列で線形にマッピングする。さらに,GTM の様々な変種をグループ化戦略で考案し,各変種を Greedy アーキテクチャサーチにより MLP-3D ネットワークの異なるブロックに構成する。コンボリューションやアテンション機構に依存せずに、我々のMLP-3Dネットワークは、Somes-Something V2 と Kinetics-400 のデータセット上で、それぞれ68.5\%/81.4\%のトップ-1精度を達成する。計算量が少ないにもかかわらず、結果は最先端の3D CNNやビデオトランスフォーマーに匹敵する。ソースコードはhttps://github.com/ZhaofanQiu/MLP-3Dで入手できる。

Convolutional Neural Networks (CNNs) have been regarded as the go-to models for visual recognition. More recently, convolution-free networks, based on multi-head self-attention (MSA) or multi-layer perceptrons (MLPs), have become increasingly popular. Nevertheless, utilizing these newly minted networks for video recognition is not trivial due to the large variations and complexities in video data. In this paper, we present MLP-3D networks, a novel MLP-like 3D architecture for video recognition. Specifically, the architecture consists of MLP-3D blocks, where each block contains one MLP applied across tokens (i.e., token-mixing MLP) and one MLP applied independently to each token (i.e., channel MLP). By deriving the novel grouped time mixing (GTM) operations, we equip the basic token-mixing MLP with the ability of temporal modeling. GTM divides the input tokens into several temporal groups and linearly maps the tokens in each group with the shared projection matrix. Furthermore, we devise several variants of GTM with different grouping strategies, and compose each variant in different blocks of the MLP-3D network by greedy architecture search. Without depending on convolutions or attention mechanisms, our MLP-3D networks achieve 68.5%/81.4% top-1 accuracy on the Something-Something V2 and Kinetics-400 datasets, respectively. Despite requiring fewer computations, the results are comparable to state-of-the-art, widely used 3D CNNs and video transformers. Source code is available at https://github.com/ZhaofanQiu/MLP-3D.
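The grouped time mixing described above (split the tokens into temporal groups, then map each group with one shared projection matrix) can be sketched as follows. This is a simplified pure-Python illustration under stated assumptions: one scalar feature per frame, contiguous groups, and a hypothetical function name `grouped_time_mixing`. The paper's GTM variants and tensor layout are not reproduced here.

```python
def grouped_time_mixing(tokens, proj):
    """Mix tokens along time within contiguous groups using a shared matrix.

    tokens: list of T per-frame feature values (a single channel, for simplicity)
    proj:   g x g projection matrix shared by all groups; T must be a multiple of g
    """
    g = len(proj)
    if len(tokens) % g != 0:
        raise ValueError("number of tokens must be divisible by the group size")
    mixed = []
    for start in range(0, len(tokens), g):
        group = tokens[start:start + g]
        # Each output token is a linear combination of the tokens in its group.
        mixed.extend(sum(proj[i][j] * group[j] for j in range(g))
                     for i in range(g))
    return mixed


# A 2 x 2 permutation matrix swaps the two frames inside every group.
swap = [[0.0, 1.0],
        [1.0, 0.0]]
out = grouped_time_mixing([1.0, 2.0, 3.0, 4.0], swap)
```

Because the same matrix is reused for every group, the parameter count depends on the group size rather than on the clip length.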

翻訳日:2022-06-14 14:20:41 公開日:2022-06-13

# (参考訳) 衛星によるCôte d'Ivoireとガーナのココア植林地域の高分解能地図

Satellite-based high-resolution maps of cocoa planted area for Côte d'Ivoire and Ghana ( http://arxiv.org/abs/2206.06119v1 )

ライセンス: CC BY 4.0

Nikolai Kalischek, Nico Lang, Cécile Renier, Rodrigo Caye Daudt, Thomas Addoah, William Thompson, Wilma J. Blaser-Hart, Rachael Garrett, Konrad Schindler, Jan D. Wegner

(参考訳) 世界最大のcocoa生産国であるCôte d'Ivoireとガーナは世界のcocoa生産の3分の2を占めている。どちらの国でもココアが主要な多年生作物であり、約200万人の農家に収入を提供している。ココア栽培地域の正確な地図は欠落しており、保護地域の拡大の正確な定量化、生産と収量、サステナビリティガバナンスの改善に利用可能な情報制限を妨げている。本稿では,ココアプランテーションデータと公開衛星画像とを深層学習の枠組みで組み合わせ,両国のココアプランテーションの高解像度マップを作成する。以上の結果から,ココア栽培は,Côte d'Ivoire と Ghana の保護地域における森林被害の37%以上と13%の基盤要因であり,Ghana の植林面積を最大40%まで大幅に過小評価していることが明らかとなった。これらの地図は、ココア生産地域の保全と経済発展を理解する上で重要な構成要素となっている。

Côte d'Ivoire and Ghana, the world's largest producers of cocoa, account for two thirds of the global cocoa production. In both countries, cocoa is the primary perennial crop, providing income to almost two million farmers. Yet precise maps of cocoa planted area are missing, hindering accurate quantification of expansion in protected areas, production and yields, and limiting information available for improved sustainability governance. Here, we combine cocoa plantation data with publicly available satellite imagery in a deep learning framework and create high-resolution maps of cocoa plantations for both countries, validated in situ. Our results suggest that cocoa cultivation is an underlying driver of over 37% and 13% of forest loss in protected areas in Côte d'Ivoire and Ghana, respectively, and that official reports substantially underestimate the planted area, up to 40% in Ghana. These maps serve as a crucial building block to advance understanding of conservation and economic development in cocoa producing regions.

翻訳日:2022-06-14 14:18:27 公開日:2022-06-13

# 生物学的にインスパイアされた神経経路探索

Biologically Inspired Neural Path Finding ( http://arxiv.org/abs/2206.05971v1 )

ライセンス: Link先を確認

Hang Li, Qadeer Khan, Volker Tresp, Daniel Cremers

(参考訳) ヒトの脳は、シナプスによって接続された数千億の生物学的ニューロンからなるグラフィカルな構造と見なすことができる。神経細胞が損傷した場合に、別の経路を流れる情報を自動的にルートする能力がある。さらに、脳は情報を保持し、類似するが完全に見えないシナリオに適用することができる。本稿では,脳のこれらの属性からインスピレーションを得て,一般化グラフにおけるソースノードと宛先ノードの間の最適な低コスト経路を見つけるための計算フレームワークを開発する。私たちのフレームワークは、テスト時に見当たらないグラフを処理できることを示します。さらに、任意の予測時間を維持しながら、推論中にノードを任意に追加または削除する場合に、代替の最適経路を見つけることができる。コードはここにある。 https://github.com/hangligit/pathfinding

The human brain can be considered to be a graphical structure comprising tens of billions of biological neurons connected by synapses. It has the remarkable ability to automatically re-route information flow through alternate paths in case some neurons are damaged. Moreover, the brain is capable of retaining information and applying it to similar but completely unseen scenarios. In this paper, we take inspiration from these attributes of the brain to develop a computational framework to find the optimal low-cost path between a source node and a destination node in a generalized graph. We show that our framework is capable of handling unseen graphs at test time. Moreover, it can find alternate optimal paths when nodes are arbitrarily added or removed during inference, while maintaining a fixed prediction time. Code is available here: https://github.com/hangligit/pathfinding

翻訳日:2022-06-14 13:56:37 公開日:2022-06-13

# 高速政策伝達のための相対的政策移行最適化

Relative Policy-Transition Optimization for Fast Policy Transfer ( http://arxiv.org/abs/2206.06009v1 )

ライセンス: Link先を確認

Lei Han, Jiawei Xu, Cheng Zhou, Yizheng Zhang, Zhengyou Zhang

(参考訳) 我々は,2つのマルコフ決定過程(mdps)間の政策伝達の問題を考える。本研究では,既存の理論結果に基づく補題法である強化学習(rl)を導入し,任意の2つのmdp間の相対性を測定する。この補題に基づいて、我々は、それぞれ高速なポリシー伝達と動的モデリングを提供する相対ポリシー最適化(RPO)と相対遷移最適化(RTO)と呼ばれる2つの新しいアルゴリズムを提案する。 RPOは相対的な方針勾配を用いてポリシーを更新し、ある環境で評価されたポリシーを転送し、別の環境でのリターンを最大化する一方、RTOは相対的な遷移勾配を用いてパラメータ化された力学モデルを更新し、2つの環境のダイナミクス間のギャップを減らす。次に、2つのアルゴリズムを統合することで、ポリシーが2つの環境と同時に相互作用し、ポリシーと遷移の更新が1つのクローズドループで完了し、ポリシー転送のための原則学習フレームワークを形成する、完全なアルゴリズムであるRelative Policy-Transition Optimization (RPTO)が提供される。本研究では,OpenAI体育館の古典的制御課題におけるRPTOの有効性を示す。

We consider the problem of policy transfer between two Markov Decision Processes (MDPs). We introduce a lemma based on existing theoretical results in reinforcement learning (RL) to measure the relativity between two arbitrary MDPs, that is, the difference between any two cumulative expected returns defined on different policies and environment dynamics. Based on this lemma, we propose two new algorithms referred to as Relative Policy Optimization (RPO) and Relative Transition Optimization (RTO), which can offer fast policy transfer and dynamics modeling, respectively. RPO updates the policy using the relative policy gradient to transfer the policy evaluated in one environment to maximize the return in another, while RTO updates the parameterized dynamics model (if one exists) using the relative transition gradient to reduce the gap between the dynamics of the two environments. Then, integrating the two algorithms offers the complete algorithm Relative Policy-Transition Optimization (RPTO), in which the policy interacts with the two environments simultaneously, such that data collection from the two environments and the policy and transition updates are completed in one closed loop to form a principled learning framework for policy transfer. We demonstrate the effectiveness of RPTO in OpenAI gym's classic control tasks by creating policy transfer problems via variant dynamics.

翻訳日:2022-06-14 13:56:25 公開日:2022-06-13

# ディープニューラルネットワークにおけるランク低下

Rank Diminishing in Deep Neural Networks ( http://arxiv.org/abs/2206.06072v1 )

ライセンス: Link先を確認

Ruili Feng, Kecheng Zheng, Yukun Huang, Deli Zhao, Michael Jordan, Zheng-Jun Zha

(参考訳) ニューラルネットワークのランクは、層をまたがる情報を測定する。これは、機械学習の幅広い領域にまたがる重要な構造的条件の例である。特に、低ランクな特徴表現の仮定は多くのアーキテクチャにおいてアルゴリズム的な発展をもたらす。しかし、ニューラルネットワークでは、低ランク構造を生み出す固有のメカニズムはあいまいで不明瞭である。このギャップを埋めるために,ネットワークランクの挙動に関する厳密な研究を行い,特にランク不足の概念に着目した。微分および代数的構成の基本規則からネットワークランクの普遍的単調減少特性を理論的に確立し,ネットワークブロックのランク不足と深い関数結合を明らかにする。この数値計算手法を用いて,imagenet上のネットワークランクの層毎挙動,すなわちresnet,deep mlp,transformerの実用場面における最初の経験的解析を行う。これらの実験結果は我々の理論と直接一致している。さらに,特定のカテゴリの分類信頼度を,他のカテゴリの信頼度によって線形に決定できるディープネットワークのランク不足によって生じる,新たな独立性の欠如現象を明らかにした。この研究の理論的結果は、経験的な発見とともに、ディープニューラルネットワークの本質的原理の理解を深める可能性がある。

The rank of neural networks measures information flowing across layers. It is an instance of a key structural condition that applies across broad domains of machine learning. In particular, the assumption of low-rank feature representations leads to algorithmic developments in many architectures. For neural networks, however, the intrinsic mechanism that yields low-rank structures remains vague and unclear. To fill this gap, we perform a rigorous study on the behavior of network rank, focusing particularly on the notion of rank deficiency. We theoretically establish a universal monotonic decreasing property of network rank from the basic rules of differential and algebraic composition, and uncover rank deficiency of network blocks and deep function coupling. By virtue of our numerical tools, we provide the first empirical analysis of the per-layer behavior of network rank in practical settings, i.e., ResNets, deep MLPs, and Transformers on ImageNet. These empirical results are in direct accord with our theory. Furthermore, we reveal a novel phenomenon of independence deficit caused by the rank deficiency of deep networks, where classification confidence of a given category can be linearly decided by the confidence of a handful of other categories. The theoretical results of this work, together with the empirical findings, may advance understanding of the inherent principles of deep neural networks.
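A standard linear-algebra fact that is consistent with the monotone decrease described above is $\mathrm{rank}(AB) \le \min(\mathrm{rank}\,A, \mathrm{rank}\,B)$: composing maps can never raise the rank. The sketch below checks this for small matrices using exact rational arithmetic. It covers the linear case only, the helper names are hypothetical, and how the paper extends the argument to nonlinear networks is not shown here.

```python
from fractions import Fraction


def matrix_rank(m):
    """Rank via Gauss-Jordan elimination with exact fractions."""
    rows = [[Fraction(x) for x in row] for row in m]
    rank, n_cols = 0, len(rows[0])
    for col in range(n_cols):
        # Find a row at or below `rank` with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(len(rows)):
            if r != rank and rows[r][col] != 0:
                factor = rows[r][col] / rows[rank][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def matmul(a, b):
    """Multiply matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


layer1 = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]  # rank 2
layer2 = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]  # rank 2
composed = matmul(layer2, layer1)           # layer1 applied first, then layer2
```

In this example both layers have rank 2, yet their composition has rank 1, so stacking layers can lose rank even when no single layer looks severely deficient.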

翻訳日:2022-06-14 13:56:03 公開日:2022-06-13

# 時系列の教師なし領域適応のためのコントラスト学習

Contrastive Learning for Unsupervised Domain Adaptation of Time Series ( http://arxiv.org/abs/2206.06243v1 )

ライセンス: Link先を確認

Yilmazcan Ozyurt, Stefan Feuerriegel, Ce Zhang

(参考訳) unsupervised domain adaptation (uda) は、ラベル付きソースドメインを使用して機械学習モデルを学習することを目的としている。 UDAは医療などの多くの分野で重要であり、様々な患者コホートにリスクスコアを適応させるのに用いられる。本稿では,CLUDAと呼ばれる時系列データのUDAのための新しいフレームワークを開発する。具体的には,多変量時系列におけるドメイン不変セマンティクスを学習するための対照的な学習フレームワークを提案する。また,本フレームワークでは,最寄りのコントラスト学習により,ソースドメインとターゲットドメインのセマンティックな変化を捉える。我々の知る限りでは、時系列データのUDAのドメイン不変セマンティック情報を学ぶための最初のフレームワークである。我々は,医療時系列を用いた大規模実世界のデータセット(MIMIC-IVとアムステルダムUMCdb)を用いて,その有効性を実証し,時系列UDAの最先端性能を実現することを示す。

Unsupervised domain adaptation (UDA) aims at learning a machine learning model using a labeled source domain that performs well on a similar yet different, unlabeled target domain. UDA is important in many applications such as medicine, where it is used to adapt risk scores across different patient cohorts. In this paper, we develop a novel framework for UDA of time series data, called CLUDA. Specifically, we propose a contrastive learning framework to learn domain-invariant semantics in multivariate time series, so that these preserve label information for the prediction task. In our framework, we further capture semantic variation between source and target domain via nearest-neighbor contrastive learning. To the best of our knowledge, ours is the first framework to learn domain-invariant semantic information for UDA of time series data. We evaluate our framework using large-scale, real-world datasets with medical time series (i.e., MIMIC-IV and AmsterdamUMCdb) to demonstrate its effectiveness and show that it achieves state-of-the-art performance for time series UDA.

翻訳日:2022-06-14 13:55:28 公開日:2022-06-13

# 予測プロセスモニタリング改善のためのニューラルネットワークによる不確かさの学習

Learning Uncertainty with Artificial Neural Networks for Improved Predictive Process Monitoring ( http://arxiv.org/abs/2206.06317v1 )

ライセンス: Link先を確認

Hans Weytjens and Jochen De Weerdt

(参考訳) 人工ニューラルネットワークが予測の不確実性を評価することができないことは、その普及に障害となる。学習データ不足によるモデル不確かさとノイズによる観測不確実性との2つのタイプを区別する。ベイズニューラルネットワークは、予測のモデルの不確実性を学ぶために、堅固な数学的基盤を使用する。観測の不確実性は、これらのネットワークに1つの層を追加し、損失関数を増強することで計算することができる。我々の貢献は、これらの不確実性概念を予測プロセス監視タスクに適用し、不確実性に基づくモデルを訓練し、残りの時間と結果を予測することである。実験の結果,不確実性推定により,より精度の低い予測が可能であり,信頼性区間は回帰と分類の両方で構築できることがわかった。これらの結論は、実行中のプロセスの初期段階でも当てはまります。さらに、デプロイされたテクニックは高速で、より正確な予測を生成する。学習された不確実性によって、プロセス予測システムに対するユーザの信頼が高まり、人とシステム間のコラボレーションが向上し、より小さなデータセットによる初期の実装が可能になる。

The inability of artificial neural networks to assess the uncertainty of their predictions is an impediment to their widespread use. We distinguish two types of learnable uncertainty: model uncertainty due to a lack of training data and noise-induced observational uncertainty. Bayesian neural networks use solid mathematical foundations to learn the model uncertainties of their predictions. The observational uncertainty can be calculated by adding one layer to these networks and augmenting their loss functions. Our contribution is to apply these uncertainty concepts to predictive process monitoring tasks to train uncertainty-based models to predict the remaining time and outcomes. Our experiments show that uncertainty estimates allow more and less accurate predictions to be differentiated and confidence intervals to be constructed in both regression and classification tasks. These conclusions remain true even in early stages of running processes. Moreover, the deployed techniques are fast and produce more accurate predictions. The learned uncertainty could increase users' confidence in their process prediction systems, promote better cooperation between humans and these systems, and enable earlier implementations with smaller datasets.
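A common way to obtain observational uncertainty for regression, in the spirit of the "extra layer plus augmented loss" description above, is to let the network predict a log-variance alongside the mean and to train with a Gaussian negative log-likelihood. The sketch below shows that loss in isolation. Whether the paper uses exactly this form is an assumption, and the function name is hypothetical.

```python
import math


def gaussian_nll(y_true, mean, log_var):
    """Per-sample Gaussian negative log-likelihood, up to an additive constant.

    The first term down-weights the squared error when the predicted variance
    is large; the second term penalizes predicting a large variance everywhere.
    """
    return 0.5 * math.exp(-log_var) * (y_true - mean) ** 2 + 0.5 * log_var


# Same prediction error of 10, with a confident and an uncertain variance estimate.
confident = gaussian_nll(10.0, 0.0, 0.0)
uncertain = gaussian_nll(10.0, 0.0, math.log(100.0))
```

For a fixed error, the loss is minimized when the predicted variance equals the squared error, which is what allows the learned variance to separate more accurate predictions from less accurate ones.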

翻訳日:2022-06-14 13:53:33 公開日:2022-06-13

# Knowledge Graph Construction and Its Application in Automatic Radiology Report Generation from Radiologist's Dictation

http://arxiv.org/abs/2206.06308v1

License: CC BY 4.0

Kaveri Kale, Pushpak Bhattacharyya, Aditya Shetty, Miling Gune, Kush Shrivastava, Rustom Lawyer and Spriha Biswas

Conventionally, the radiologist prepares the diagnosis notes and shares them with the transcriptionist. The transcriptionist then prepares a preliminary formatted report from the notes, and finally the radiologist reviews the report, corrects the errors, and signs off. This workflow causes significant delays and errors in the report. In this work, we focus on applying NLP techniques such as Information Extraction (IE) and domain-specific Knowledge Graphs (KGs) to automatically generate radiology reports from a radiologist's dictation. This paper focuses on KG construction for each organ by extracting information from an existing large corpus of free-text radiology reports. We develop an information extraction pipeline that combines rule-based, pattern-based, and dictionary-based techniques with lexical-semantic features to extract entities and relations. Information missing from a short dictation can be retrieved from the KGs to generate pathological descriptions and hence the radiology report. The generated pathological descriptions are evaluated using semantic similarity metrics and show 97% similarity with gold-standard pathological descriptions. Our analysis also shows that our IE module performs better than the OpenIE tool in the radiology domain. Furthermore, we include a manual qualitative analysis by radiologists, which shows that 80-85% of the generated reports are correctly written and the remainder are partially correct.

Translated: 2022-06-14 13:50:08 Published: 2022-06-13

# EnergyMatch: Energy-based Pseudo-Labeling for Semi-Supervised Learning

http://arxiv.org/abs/2206.06359v1

License: see the linked page

Zhuoran Yu, Yin Li, Yong Jae Lee

Recent state-of-the-art methods in semi-supervised learning (SSL) combine consistency regularization with confidence-based pseudo-labeling. To obtain high-quality pseudo-labels, a high confidence threshold is typically adopted. However, it has been shown that softmax-based confidence scores in deep networks can be arbitrarily high for samples far from the training data, and thus the pseudo-labels for even high-confidence unlabeled samples may still be unreliable. In this work, we present a new perspective on pseudo-labeling: instead of relying on model confidence, we measure whether an unlabeled sample is likely to be "in-distribution", i.e., close to the current training data. To classify whether an unlabeled sample is "in-distribution" or "out-of-distribution", we adopt the energy score from the out-of-distribution detection literature. As training progresses and more unlabeled samples become in-distribution and contribute to training, the combined labeled and pseudo-labeled data can better approximate the true distribution and improve the model. Experiments demonstrate that our energy-based pseudo-labeling method, albeit conceptually simple, significantly outperforms confidence-based methods on imbalanced SSL benchmarks and achieves competitive performance on class-balanced data. For example, it produces a 4-6% absolute accuracy improvement on CIFAR10-LT when the imbalance ratio is higher than 50. When combined with state-of-the-art long-tailed SSL methods, further improvements are attained.
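The energy score mentioned above is commonly defined in the out-of-distribution detection literature as the negative temperature-scaled log-sum-exp of the logits. The sketch below shows how such a score could gate pseudo-labels; the threshold value, the temperature default, and the function names are assumptions for illustration, and the abstract does not specify how EnergyMatch sets them.

```python
import math


def energy_score(logits, temperature=1.0):
    """Free energy E(x) = -T * log(sum_i exp(logit_i / T)).

    Lower energy is read as "more in-distribution".
    """
    scaled = [v / temperature for v in logits]
    m = max(scaled)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in scaled))
    return -temperature * log_sum_exp


def energy_pseudo_label(logits, threshold, temperature=1.0):
    """Return the argmax class if the energy is below the threshold, else None."""
    if energy_score(logits, temperature) < threshold:
        return max(range(len(logits)), key=lambda i: logits[i])
    return None


if __name__ == "__main__":
    # The threshold of -5.0 is a hypothetical value chosen for this toy example.
    print(energy_pseudo_label([10.0, 0.0, 0.0], threshold=-5.0))
    print(energy_pseudo_label([0.1, 0.0, 0.0], threshold=-5.0))
```

Unlike a softmax confidence, the energy depends on the overall magnitude of the logits, so two samples with the same softmax output can receive different scores.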

Translated: 2022-06-14 13:36:37 Published: 2022-06-13

# Deploying Convolutional Networks on Untrusted Platforms Using 2D Holographic Reduced Representations

http://arxiv.org/abs/2206.05893v1

License: see the linked page

Mohammad Mahmudul Alam, Edward Raff, Tim Oates, James Holt

Due to the computational cost of running inference for a neural network, it is common to deploy the inference steps on a third party's compute environment or hardware. If the third party is not fully trusted, it is desirable to obfuscate the nature of the inputs and outputs so that the third party cannot easily determine what specific task is being performed. Provably secure protocols for leveraging an untrusted party exist but are too computationally demanding to run in practice. We instead explore a different strategy of fast, heuristic security that we call Connectionist Symbolic Pseudo Secrets. By leveraging Holographic Reduced Representations (HRR), we create a neural network with a pseudo-encryption style defense that empirically shows robustness to attack, even under threat models that unrealistically favor the adversary.
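To make the HRR idea concrete, here is a minimal sketch of the classic one-dimensional HRR operations: binding by circular convolution and approximate unbinding by circular correlation. It is not the paper's 2D construction or its network; the vector size, the variable names, and the use of a random "secret" vector as a key are assumptions for illustration.

```python
import math
import random


def bind(a, b):
    """Circular convolution: the HRR binding operator."""
    d = len(a)
    return [sum(a[j] * b[(k - j) % d] for j in range(d)) for k in range(d)]


def unbind(c, a):
    """Circular correlation with a: an approximate inverse of bind(a, .)."""
    d = len(a)
    return [sum(a[k] * c[(k + m) % d] for k in range(d)) for m in range(d)]


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def random_vector(d, rng):
    # Entries drawn from N(0, 1/d), the usual HRR initialisation.
    return [rng.gauss(0.0, 1.0 / math.sqrt(d)) for _ in range(d)]


rng = random.Random(0)
d = 512
secret = random_vector(d, rng)   # hypothetical key kept by the data owner
x = random_vector(d, rng)        # stand-in for an input to be hidden
bound = bind(secret, x)          # what an untrusted party would see
recovered = unbind(bound, secret)
wrong = unbind(bound, random_vector(d, rng))
```

With the right key, `recovered` is a noisy copy of `x`, while `bound` and `wrong` look unrelated to `x`. This is heuristic obfuscation, not encryption, which matches the abstract's "pseudo-encryption" framing.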

Translated: 2022-06-14 13:36:14 Published: 2022-06-13

# AR-NeRF: Unsupervised Learning of Depth and Defocus Effects from Natural Images with Aperture Rendering Neural Radiance Fields

http://arxiv.org/abs/2206.06100v1

License: see the linked page

Takuhiro Kaneko

Fully unsupervised 3D representation learning has gained attention owing to its advantages in data collection. One successful approach is viewpoint-aware: it learns an image distribution based on generative models (e.g., generative adversarial networks (GANs)) while generating images from various views based on 3D-aware models (e.g., neural radiance fields (NeRFs)). However, such methods require images with various views for training, and consequently their application to datasets with few or limited viewpoints remains a challenge. As a complementary approach, an aperture rendering GAN (AR-GAN) that employs a defocus cue was proposed. However, an AR-GAN is a CNN-based model and represents defocus independently of viewpoint change despite the high correlation between the two, which is one of the factors that affect its performance. As an alternative to an AR-GAN, we propose an aperture rendering NeRF (AR-NeRF), which can utilize viewpoint and defocus cues in a unified manner by representing both factors in a common ray-tracing framework. Moreover, to learn defocus-aware and defocus-independent representations in a disentangled manner, we propose aperture randomized training, in which we learn to generate images while randomizing the aperture size and latent codes independently. In our experiments, we applied AR-NeRF to various natural image datasets, including flower, bird, and face images, and the results demonstrate the utility of AR-NeRF for unsupervised learning of depth and defocus effects.

Translated: 2022-06-14 13:35:41 Published: 2022-06-13

# JiuZhang: A Chinese Pre-trained Language Model for Mathematical Problem Understanding

http://arxiv.org/abs/2206.06315v1

License: see the linked page

Wayne Xin Zhao, Kun Zhou, Zheng Gong, Beichen Zhang, Yuanhang Zhou, Jing Sha, Zhigang Chen, Shijin Wang, Cong Liu, Ji-Rong Wen

This paper aims to advance the mathematical intelligence of machines by presenting the first Chinese mathematical pre-trained language model (PLM) for effectively understanding and representing mathematical problems. Unlike other standard NLP tasks, mathematical texts are difficult to understand, since they involve mathematical terminology, symbols, and formulas in the problem statement. Solving mathematical problems typically requires complex mathematical logic and background knowledge. Considering the complex nature of mathematical texts, we design a novel curriculum pre-training approach for improving the learning of mathematical PLMs, consisting of both basic and advanced courses. Specifically, we first perform token-level pre-training based on a position-biased masking strategy, and then design logic-based pre-training tasks that aim to recover shuffled sentences and shuffled formulas, respectively. Finally, we introduce a more difficult pre-training task that requires the PLM to detect and correct the errors in its generated solutions. We conduct extensive experiments on offline evaluation (including nine math-related tasks) and an online A/B test. Experimental results demonstrate the effectiveness of our approach compared with a number of competitive baselines. Our code is available at https://github.com/RUCAIBox/JiuZhang.

Translated: 2022-06-14 13:33:59 Published: 2022-06-13

# Transductive CLIP with Class-Conditional Contrastive Learning

http://arxiv.org/abs/2206.06177v1

License: see the linked page

Junchu Huang, Weijie Chen, Shicai Yang, Di Xie, Shiliang Pu, Yueting Zhuang

Inspired by the remarkable zero-shot generalization capacity of vision-language pre-trained models, we seek to leverage supervision from the CLIP model to alleviate the burden of data labeling. However, such supervision inevitably contains label noise, which significantly degrades the discriminative power of the classification model. In this work, we propose Transductive CLIP, a novel framework for learning a classification network with noisy labels from scratch. First, a class-conditional contrastive learning mechanism is proposed to mitigate the reliance on pseudo labels and boost the tolerance to noisy labels. Second, ensemble labels are adopted as a pseudo-label updating strategy to stabilize the training of deep neural networks with noisy labels. By combining both techniques, the framework can effectively reduce the impact of noisy labels from the CLIP model. Experiments on multiple benchmark datasets demonstrate substantial improvements over other state-of-the-art methods.

Translated: 2022-06-14 13:30:46 Published: 2022-06-13

# Learning Domain Adaptive Object Detection with Probabilistic Teacher

http://arxiv.org/abs/2206.06293v1

License: see the linked page

Meilin Chen, Weijie Chen, Shicai Yang, Jie Song, Xinchao Wang, Lei Zhang, Yunfeng Yan, Donglian Qi, Yueting Zhuang, Di Xie, Shiliang Pu

Self-training for unsupervised domain adaptive object detection is a challenging task whose performance depends heavily on the quality of pseudo boxes. Despite promising results, prior works have largely overlooked the uncertainty of pseudo boxes during self-training. In this paper, we present a simple yet effective framework, termed Probabilistic Teacher (PT), which aims to capture the uncertainty of unlabeled target data from a gradually evolving teacher and guide the learning of a student in a mutually beneficial manner. Specifically, we propose to leverage uncertainty-guided consistency training to promote classification adaptation and localization adaptation, rather than filtering pseudo boxes via an elaborate confidence threshold. In addition, we conduct anchor adaptation in parallel with localization adaptation, since anchors can be regarded as learnable parameters. Together with this framework, we also present a novel Entropy Focal Loss (EFL) to further facilitate uncertainty-guided self-training. Equipped with EFL, PT outperforms all previous baselines by a large margin and achieves new state-of-the-art results.

Translated: 2022-06-14 13:30:31 Published: 2022-06-13

# Latent Diffusion Energy-Based Model for Interpretable Text Modeling

http://arxiv.org/abs/2206.05895v1

License: see the linked page

Peiyu Yu, Sirui Xie, Xiaojian Ma, Baoxiong Jia, Bo Pang, Ruigi Gao, Yixin Zhu, Song-Chun Zhu, and Ying Nian Wu

Latent space Energy-Based Models (EBMs), also known as energy-based priors, have drawn growing interest in generative modeling. Fueled by their flexibility in formulation and the strong modeling power of the latent space, recent works built upon them have made interesting attempts at interpretable text modeling. However, latent space EBMs also inherit some flaws from EBMs in data space: the degenerate MCMC sampling quality in practice can lead to poor generation quality and instability in training, especially on data with complex latent structures. Inspired by recent efforts that leverage diffusion recovery likelihood learning as a cure for the sampling issue, we introduce a novel symbiosis between diffusion models and latent space EBMs in a variational learning framework, which we call the latent diffusion energy-based model. We develop a geometric clustering-based regularization jointly with the information bottleneck to further improve the quality of the learned latent space. Experiments on several challenging tasks demonstrate the superior performance of our model on interpretable text modeling over strong counterparts.

Translated: 2022-06-14 13:29:49 Published: 2022-06-13

# Mediators: Conversational Agents Explaining NLP Model Behavior

http://arxiv.org/abs/2206.06029v1

License: CC BY 4.0

Nils Feldhus, Ajay Madhavan Ravichandran, Sebastian Möller

The human-centric explainable artificial intelligence (HCXAI) community has raised the need for framing the explanation process as a conversation between human and machine. In this position paper, we establish desiderata for Mediators, text-based conversational agents that are capable of explaining the behavior of neural models interactively using natural language. From the perspective of natural language processing (NLP) research, we engineer a blueprint of such a Mediator for the task of sentiment analysis and assess how far along current research is on the path towards dialogue-based explanations.

Translated: 2022-06-14 13:28:14 Published: 2022-06-13

# Transition-based Abstract Meaning Representation Parsing with Contextual Embeddings

http://arxiv.org/abs/2206.06229v1

License: see the linked page

Yichao Liang

The ability to understand and generate languages sets human cognition apart from other known life forms. We study a way of combining two of the most successful routes to the meaning of language, statistical language models and symbolic semantic formalisms, in the task of semantic parsing. Building on a transition-based Abstract Meaning Representation (AMR) parser, AmrEager, we explore the utility of incorporating pretrained context-aware word embeddings, such as BERT and RoBERTa, in the problem of AMR parsing, contributing a new parser we dub AmrBerger. Experiments find that these rich lexical features alone are not particularly helpful in improving the parser's overall performance as measured by the SMATCH score when compared to the non-contextual counterpart, while additional concept information empowers the system to outperform the baselines. Through a lesion study, we find that the use of contextual embeddings helps to make the system more robust against the removal of explicit syntactic features. These findings expose the strengths and weaknesses of contextual embeddings and language models in their current form, and motivate a deeper understanding thereof.

Translated: 2022-06-14 13:03:39 Published: 2022-06-13

# SIXO: Smoothing Inference with Twisted Objectives

http://arxiv.org/abs/2206.05952v1

License: see the linked page

Dieterich Lawson, Allan Raventós, Andrew Warrington, Scott Linderman

Sequential Monte Carlo (SMC) is an inference algorithm for state space models that approximates the posterior by sampling from a sequence of intermediate target distributions. The target distributions are often chosen to be the filtering distributions, but these ignore information from future observations, leading to practical and theoretical limitations in inference and model learning. We introduce SIXO, a method that instead learns targets that approximate the smoothing distributions, incorporating information from all observations. The key idea is to use density ratio estimation to fit functions that warp the filtering distributions into the smoothing distributions. We then use SMC with these learned targets to define a variational objective for model and proposal learning. SIXO yields provably tighter log marginal lower bounds and offers significantly more accurate posterior inferences and parameter estimates in a variety of domains.
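The relation between filtering and smoothing targets that the abstract relies on can be written out for a generic state space model. The notation below (latent states x, observations y, horizon T) is assumed for illustration and is not taken from the paper. The first line holds because future observations depend on the past only through the current state; the second shows why the missing factor can be treated, up to a constant in the state, as a density ratio.

```latex
\begin{aligned}
p(x_{1:t} \mid y_{1:T}) &\propto p(x_{1:t} \mid y_{1:t})\; p(y_{t+1:T} \mid x_t), \\
\frac{p(y_{t+1:T} \mid x_t)}{p(y_{t+1:T})} &= \frac{p(x_t,\, y_{t+1:T})}{p(x_t)\, p(y_{t+1:T})}.
\end{aligned}
```

A learned function approximating the factor p(y_{t+1:T} | x_t) therefore plays the role of the "twist" that turns a filtering target into a smoothing target.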

Translated: 2022-06-14 13:03:14 Published: 2022-06-13

# iCITRIS: Causal Representation Learning for Instantaneous Temporal Effects

http://arxiv.org/abs/2206.06169v1

License: see the linked page

Phillip Lippe, Sara Magliacane, Sindy Löwe, Yuki M. Asano, Taco Cohen, Efstratios Gavves

Causal representation learning is the task of identifying the underlying causal variables and their relations from high-dimensional observations, such as images. Recent work has shown that one can reconstruct the causal variables from temporal sequences of observations under the assumption that there are no instantaneous causal relations between them. In practical applications, however, our measurement or frame rate might be slower than many of the causal effects. This effectively creates "instantaneous" effects and invalidates previous identifiability results. To address this issue, we propose iCITRIS, a causal representation learning method that can handle instantaneous effects in temporal sequences when given perfect interventions with known intervention targets. iCITRIS identifies the causal factors from temporal observations, while simultaneously using a differentiable causal discovery method to learn their causal graph. In experiments on three video datasets, iCITRIS accurately identifies the causal factors and their causal graph.

Translated: 2022-06-14 13:02:58 Published: 2022-06-13

# Markov Chain Score Ascent: A Unifying Framework of Variational Inference with Markovian Gradients

http://arxiv.org/abs/2206.06295v1

License: see the linked page

Kyurae Kim, Jisu Oh, Jacob R. Gardner, Adji Bousso Dieng, Hongseok Kim

Minimizing the inclusive Kullback-Leibler (KL) divergence with stochastic gradient descent (SGD) is challenging since its gradient is defined as an integral over the posterior. Recently, multiple methods have been proposed to run SGD with biased gradient estimates obtained from a Markov chain. This paper provides the first non-asymptotic convergence analysis of these methods by establishing their mixing rate and gradient variance. To do this, we demonstrate that these methods, which we collectively refer to as Markov chain score ascent (MCSA) methods, can be cast as special cases of the Markov chain gradient descent framework. Furthermore, by leveraging this new understanding, we develop a novel MCSA scheme, parallel MCSA (pMCSA), that achieves a tighter bound on the gradient variance. We demonstrate that this improved theoretical result translates to superior empirical performance.
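The statement that the gradient is "an integral over the posterior" can be made explicit. The derivation below uses assumed notation: a variational family q with parameters λ, latent variable z, data x, and a Markov kernel K that leaves the posterior invariant. The last line is a generic Markov-chain estimator with N parallel chains, given as an illustration rather than as the paper's exact scheme.

```latex
\begin{aligned}
\mathrm{KL}\big(p(\cdot \mid x)\,\|\,q_\lambda\big)
  &= \mathbb{E}_{p(z \mid x)}\big[\log p(z \mid x) - \log q_\lambda(z)\big], \\
\nabla_\lambda \mathrm{KL}\big(p(\cdot \mid x)\,\|\,q_\lambda\big)
  &= -\,\mathbb{E}_{p(z \mid x)}\big[\nabla_\lambda \log q_\lambda(z)\big], \\
\hat{g}_t &= -\frac{1}{N}\sum_{n=1}^{N} \nabla_\lambda \log q_{\lambda_t}\big(z_t^{(n)}\big),
  \qquad z_t^{(n)} \sim K_{\lambda_t}\big(z_{t-1}^{(n)}, \cdot\big).
\end{aligned}
```

The gradient is the negative expected score of q under the posterior, which is why following it amounts to "score ascent". The estimator is biased because the chain states are not exact posterior draws.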

Translated: 2022-06-14 13:02:44 Published: 2022-06-13

# Simple Cues Lead to a Strong Multi-Object Tracker

http://arxiv.org/abs/2206.04656v3

License: CC BY 4.0

Jenny Seidenschwarz, Guillem Brasó, Ismail Elezi, and Laura Leal-Taixé

For a long time, the most common paradigm in multi-object tracking was tracking-by-detection (TbD), where objects are first detected and then associated over video frames. For association, most models resort to motion and appearance cues. While still relying on these cues, recent approaches based on, e.g., attention have shown an ever-increasing need for training data and overall complex frameworks. We claim that 1) strong cues can be obtained from small amounts of training data if some key design choices are applied, and 2) given these strong cues, standard Hungarian matching-based association is enough to obtain impressive results. Our main insight is to identify key components that allow a standard re-identification network to excel at appearance-based tracking. We extensively analyze its failure cases and show that a combination of our appearance features with a simple motion model leads to strong tracking results. Our model achieves state-of-the-art performance on the MOT17 and MOT20 datasets, outperforming previous state-of-the-art trackers by up to 5.4pp in IDF1 and 4.4pp in HOTA. We will release the code and models after the paper's acceptance.
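The association step can be sketched as a minimum-cost one-to-one assignment over a cost that mixes an appearance cue and a motion cue. In the sketch below, the cost definition, the 0.5 weight, and the dictionary fields are assumptions for illustration. The optimum is found by brute force over permutations to keep the example dependency-free; a real tracker would use a Hungarian solver such as `scipy.optimize.linear_sum_assignment`, which returns the same optimum.

```python
import itertools
import math


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def associate(tracks, detections, appearance_weight=0.5):
    """Return, for each track, the index of its matched detection.

    Brute-force minimum-cost assignment; assumes equally many tracks and
    detections and small inputs.
    """
    def cost(track, det):
        appearance = cosine_distance(track["feature"], det["feature"])
        motion = 1.0 - iou(track["predicted_box"], det["box"])
        return appearance_weight * appearance + (1.0 - appearance_weight) * motion

    n = len(tracks)
    best = min(
        itertools.permutations(range(n)),
        key=lambda perm: sum(cost(tracks[i], detections[perm[i]]) for i in range(n)),
    )
    return list(best)


# Toy data: the detections arrive in the opposite order to the tracks.
tracks = [
    {"feature": [1.0, 0.0], "predicted_box": (0.0, 0.0, 10.0, 10.0)},
    {"feature": [0.0, 1.0], "predicted_box": (50.0, 50.0, 60.0, 60.0)},
]
detections = [
    {"feature": [0.1, 0.9], "box": (51.0, 50.0, 61.0, 60.0)},
    {"feature": [0.9, 0.1], "box": (1.0, 0.0, 11.0, 10.0)},
]
matches = associate(tracks, detections)
```

The brute-force search scales factorially with the number of objects and is only suitable for a toy illustration of the objective being minimized.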

Translated: 2022-06-14 12:26:34 Published: 2022-06-13

# In Defense of Core-set: A Density-aware Core-set Selection for Active Learning

http://arxiv.org/abs/2206.04838v2

License: CC BY 4.0

Yeachan Kim, Bonggun Shin

Active learning enables the efficient construction of a labeled dataset by labeling informative samples from an unlabeled dataset. In a real-world active learning scenario, considering the diversity of the selected samples is crucial because many redundant or highly similar samples exist. The core-set approach is a promising diversity-based method that selects diverse samples based on the distance between samples. However, the approach performs poorly compared to uncertainty-based approaches, which select the most difficult samples, i.e., those on which neural models show low confidence. In this work, we analyze the feature space through the lens of density and, interestingly, observe that locally sparse regions tend to have more informative samples than dense regions. Motivated by our analysis, we empower the core-set approach with density-awareness and propose a density-aware core-set (DACS). The strategy is to estimate the density of the unlabeled samples and select diverse samples mainly from sparse regions. To reduce the computational bottleneck in estimating the density, we also introduce a new density approximation based on locality-sensitive hashing. Experimental results clearly demonstrate the efficacy of DACS in both classification and regression tasks and specifically show that DACS can produce state-of-the-art performance in a practical scenario. Since DACS is only weakly dependent on neural architectures, we present a simple yet effective combination method to show that existing methods can be beneficially combined with DACS.
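For context, the plain core-set baseline that DACS builds on is usually implemented as greedy k-center (farthest-first) selection in feature space. The sketch below shows that baseline only; the density estimation and locality-sensitive hashing steps of DACS are not reproduced, and the function and argument names are illustrative assumptions.

```python
import math


def k_center_greedy(points, labeled_indices, budget):
    """Greedy core-set selection: repeatedly pick the point farthest from
    everything selected so far (labeled points plus earlier picks)."""
    # Distance from every point to its nearest already-labeled point.
    min_dist = [
        min(math.dist(p, points[j]) for j in labeled_indices) for p in points
    ]
    selected = []
    for _ in range(budget):
        idx = max(range(len(points)), key=lambda i: min_dist[i])
        selected.append(idx)
        # The new pick may now be the nearest selected point for some samples.
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[idx]))
    return selected


if __name__ == "__main__":
    features = [(0.0,), (1.0,), (2.0,), (10.0,)]
    print(k_center_greedy(features, labeled_indices=[0], budget=2))
```

Because the rule depends only on distances, it spreads picks evenly over the feature space regardless of how informative a region is, which is the behavior the abstract proposes to correct with density-awareness.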

Translated: 2022-06-14 12:25:23 Published: 2022-06-13

# Fast Deep Autoencoder for Federated learning

http://arxiv.org/abs/2206.05136v2

License: CC BY 4.0

David Novoa-Paradela, Oscar Romero-Fontenla, Bertha Guijarro-Berdiñas

This paper presents a novel, fast, and privacy-preserving implementation of deep autoencoders. DAEF (Deep Autoencoder for Federated learning), unlike traditional neural networks, trains a deep autoencoder network in a non-iterative way, which drastically reduces its training time. Its training can be carried out in a distributed way (several partitions of the dataset in parallel) and incrementally (aggregation of partial models), and due to its mathematical formulation, the data that is exchanged does not endanger the privacy of the users. This makes DAEF a valid method for edge computing and federated learning scenarios. The method has been evaluated and compared to traditional (iterative) deep autoencoders using seven real anomaly detection datasets, and its performance has been shown to be similar despite DAEF's faster training.

Translated: 2022-06-14 12:09:48 Published: 2022-06-13

# MAREO: Memory- and Attention- based visual REasOning

http://arxiv.org/abs/2206.04928v2

License: see the linked page

Mohit Vaishnav, Thomas Serre

Humans continue to vastly outperform modern AI systems in their ability to parse and understand complex visual scenes flexibly. Attention and memory are two systems known to play a critical role in our ability to selectively maintain and manipulate behaviorally relevant visual information to solve some of the most challenging visual reasoning tasks. Here, we present a novel architecture for visual reasoning inspired by the cognitive-science literature on visual reasoning, the Memory- and Attention-based (visual) REasOning (MAREO) architecture. MAREO instantiates an active-vision theory, which posits that the brain solves complex visual reasoning problems compositionally by learning to combine previously learned elementary visual operations to form more complex visual routines. MAREO learns to solve visual reasoning tasks via sequences of attention shifts to route and maintain task-relevant visual information into a memory bank via a multi-head transformer module. Visual routines are then deployed by a dedicated reasoning module trained to judge various relations between objects in the scenes. Experiments on four types of reasoning tasks demonstrate MAREO's ability to learn visual routines in a robust and sample-efficient manner.

Translated: 2022-06-14 11:48:33 Published: 2022-06-13

# Dimensional Modeling of Emotions in Text with Appraisal Theories: Corpus Creation, Annotation Reliability, and Prediction

http://arxiv.org/abs/2206.05238v2

License: see the linked page

Enrica Troiano and Laura Oberländer and Roman Klinger

The most prominent tasks in emotion analysis are to assign emotions to texts and to understand how emotions manifest in language. An important observation for natural language processing is that emotions can be communicated implicitly by referring to events alone, appealing to an empathetic, intersubjective understanding of events, even without explicitly mentioning an emotion name. In psychology, the class of emotion theories known as appraisal theories aims at explaining the link between events and emotions. Appraisals can be formalized as variables that measure a cognitive evaluation by people living through an event that they consider relevant. They include assessments of whether an event is novel, whether the person considers themselves responsible, whether it is in line with their own goals, and many others. Such appraisals explain which emotions develop from an event, e.g., that a novel situation can induce surprise or that one with uncertain consequences could evoke fear. We analyze the suitability of appraisal theories for emotion analysis in text, with the goal of understanding whether appraisal concepts can reliably be reconstructed by annotators, whether they can be predicted by text classifiers, and whether appraisal concepts help to identify emotion categories. To achieve that, we compile a corpus by asking people to textually describe events that triggered particular emotions and to disclose their appraisals. Then, we ask readers to reconstruct emotions and appraisals from the text. This setup allows us to measure whether emotions and appraisals can be recovered purely from text and provides a human baseline against which to judge models' performance. Our comparison of text classification methods with human annotators shows that both can reliably detect emotions and appraisals with similar performance. We further show that appraisal concepts improve the categorization of emotions in text.

Translation date: 2022-06-14 11:48:12 Publication date: 2022-06-13

# Neural Laplace: Learning diverse classes of differential equations in the Laplace domain

http://arxiv.org/abs/2206.04843v2

License: CC BY 4.0

Samuel Holt, Zhaozhi Qian, Mihaela van der Schaar

Neural Ordinary Differential Equations model dynamical systems with ODEs learned by neural networks. However, ODEs are fundamentally inadequate to model systems with long-range dependencies or discontinuities, which are common in engineering and biological systems. Broader classes of differential equations (DEs) have been proposed as remedies, including delay differential equations and integro-differential equations. Furthermore, Neural ODE suffers from numerical instability when modelling stiff ODEs and ODEs with piecewise forcing functions. In this work, we propose Neural Laplace, a unified framework for learning diverse classes of DEs including all the aforementioned ones. Instead of modelling the dynamics in the time domain, we model them in the Laplace domain, where the history dependencies and discontinuities in time can be represented as summations of complex exponentials. To make learning more efficient, we use the geometrical stereographic map of a Riemann sphere to induce more smoothness in the Laplace domain. In the experiments, Neural Laplace shows superior performance in modelling and extrapolating the trajectories of diverse classes of DEs, including the ones with complex history dependency and abrupt changes.
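
The stereographic map mentioned in the abstract can be illustrated with a minimal sketch. The code below uses the textbook inverse stereographic projection from the complex plane onto the unit sphere. The function names are hypothetical, and the abstract does not state the exact parameterization Neural Laplace uses, so this is an assumption rather than the authors' implementation.

```python
import math
from typing import Tuple


def stereographic_to_sphere(s: complex) -> Tuple[float, float, float]:
    """Map a point s of the complex plane onto the unit (Riemann) sphere.

    Standard inverse stereographic projection from the north pole (0, 0, 1):
    the origin s = 0 lands on the south pole (0, 0, -1).
    """
    r2 = s.real ** 2 + s.imag ** 2
    denom = r2 + 1.0
    return (2.0 * s.real / denom, 2.0 * s.imag / denom, (r2 - 1.0) / denom)


def sphere_angles(s: complex) -> Tuple[float, float]:
    """Return (theta, phi): longitude and latitude of the projected point."""
    x, y, z = stereographic_to_sphere(s)
    theta = math.atan2(y, x)                 # longitude in [-pi, pi]
    phi = math.asin(max(-1.0, min(1.0, z)))  # latitude in [-pi/2, pi/2]
    return theta, phi
```

Points far from the origin are squeezed toward the north pole, so unbounded values in the plane become bounded coordinates on the sphere.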

Translation date: 2022-06-14 11:46:59 Publication date: 2022-06-13

# COSTA: Covariance-Preserving Feature Augmentation for Graph Contrastive Learning

http://arxiv.org/abs/2206.04726v2

License: check the linked page

Yifei Zhang and Hao Zhu and Zixing Song and Piotr Koniusz and Irwin King

Graph contrastive learning (GCL) improves graph representation learning, leading to SOTA on various downstream tasks. The graph augmentation step is a vital but scarcely studied step of GCL. In this paper, we show that the node embedding obtained via graph augmentations is highly biased, somewhat limiting contrastive models from learning discriminative features for downstream tasks. Thus, instead of investigating graph augmentation in the input space, we propose to perform augmentations on the hidden features (feature augmentation). Inspired by so-called matrix sketching, we propose COSTA, a novel COvariance-preServing feaTure space Augmentation framework for GCL, which generates augmented features by maintaining a "good sketch" of the original features. To highlight the superiority of feature augmentation with COSTA, we investigate a single-view setting (in addition to the multi-view one), which conserves memory and computation. We show that feature augmentation with COSTA achieves results comparable to or better than those of graph-augmentation-based models.

Translation date: 2022-06-14 11:20:20 Publication date: 2022-06-13

# AI-MIA: COVID-19 Detection & Severity Analysis through Medical Imaging

http://arxiv.org/abs/2206.04732v2

License: check the linked page

Dimitrios Kollias and Anastasios Arsenos and Stefanos Kollias

This paper presents the baseline approach for the 2nd Covid-19 Competition, organized in the framework of the AIMIA Workshop at the European Conference on Computer Vision (ECCV 2022). It presents the COV19-CT-DB database, which is annotated for COVID-19 detection and consists of about 7,700 3-D CT scans. The part of the database consisting of Covid-19 cases is further annotated in terms of four Covid-19 severity conditions. We have split the database, and the latter part of it, into training, validation and test datasets. The former two datasets are used for training and validation of machine learning models, while the latter will be used for evaluation of the developed models. The baseline approach is a deep learning approach based on a CNN-RNN network, and we report its performance on the COV19-CT-DB database.

Translation date: 2022-06-14 11:20:02 Publication date: 2022-06-13

# Topic-Aware Evaluation and Transformer Methods for Topic-Controllable Summarization

http://arxiv.org/abs/2206.04317v2

License: check the linked page

Tatiana Passali, Grigorios Tsoumakas

Topic-controllable summarization is an emerging research area with a wide range of potential applications. However, existing approaches suffer from significant limitations. First, there is currently no established evaluation metric for this task. Furthermore, existing methods are built upon recurrent architectures, which can significantly limit their performance compared to more recent Transformer-based architectures, and they also require modifications to the model's architecture for controlling the topic. In this work, we propose a new topic-oriented evaluation measure to automatically evaluate the generated summaries based on the topic affinity between the generated summary and the desired topic. We also conducted a user study that validates the reliability of this measure. Finally, we propose simple yet powerful methods for topic-controllable summarization that either incorporate topic embeddings into the model's architecture or employ control tokens to guide the summary generation. Experimental results show that control tokens can achieve better performance than more complicated embedding-based approaches while also being significantly faster.
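
The control-token idea in the abstract can be sketched in a few lines: the desired topic is encoded as a special marker placed in front of the source document, so the summarizer's architecture stays unchanged. The token format `<topic:...>` and the function name are hypothetical; the abstract does not specify how the authors encode their tokens.

```python
def add_topic_control_token(document: str, topic: str) -> str:
    """Prepend a topic marker so a sequence-to-sequence model can condition on it.

    The "<topic:...>" format is an illustrative assumption, not the paper's scheme.
    """
    token = f"<topic:{topic.strip().lower().replace(' ', '_')}>"
    return f"{token} {document}"


example = add_topic_control_token("Stocks fell sharply today.", "Business News")
```

In a real pipeline such a marker would typically also be registered as a special token in the tokenizer vocabulary so that it is not split into sub-words.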

Translation date: 2022-06-14 11:19:48 Publication date: 2022-06-13

# A Resilient Distributed Boosting Algorithm

http://arxiv.org/abs/2206.04713v2

License: check the linked page

Yuval Filmus, Idan Mehalel and Shay Moran

Given a learning task where the data is distributed among several parties, communication is one of the fundamental resources which the parties would like to minimize. We present a distributed boosting algorithm which is resilient to a limited amount of noise. Our algorithm is similar to classical boosting algorithms, although it is equipped with a new component, inspired by Impagliazzo's hard-core lemma [Impagliazzo95], which adds robustness to the algorithm. We also complement this result by showing that resilience to any asymptotically larger noise is not achievable by a communication-efficient algorithm.

Translation date: 2022-06-14 11:18:54 Publication date: 2022-06-13

# The Slingshot Mechanism: An Empirical Study of Adaptive Optimizers and the Grokking Phenomenon

http://arxiv.org/abs/2206.04817v2

License: check the linked page

Vimal Thilak, Etai Littwin, Shuangfei Zhai, Omid Saremi, Roni Paiss and Joshua Susskind

The grokking phenomenon as reported by Power et al. (arXiv:2201.02177) refers to a regime where a long period of overfitting is followed by a seemingly sudden transition to perfect generalization. In this paper, we attempt to reveal the underpinnings of Grokking via a series of empirical studies. Specifically, we uncover an optimization anomaly plaguing adaptive optimizers at extremely late stages of training, referred to as the Slingshot Mechanism. A prominent artifact of the Slingshot Mechanism can be measured by the cyclic phase transitions between stable and unstable training regimes, and can be easily monitored by the cyclic behavior of the norm of the last layer's weights. We empirically observe that without explicit regularization, Grokking as reported in (arXiv:2201.02177) almost exclusively happens at the onset of Slingshots, and is absent without them. While common and easily reproduced in more general settings, the Slingshot Mechanism does not follow from any known optimization theories that we are aware of, and can be easily overlooked without an in-depth examination. Our work points to a surprising and useful inductive bias of adaptive gradient optimizers at late stages of training, calling for a revised theoretical analysis of their origin.
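
The abstract notes that Slingshots can be monitored through the norm of the last layer's weights. A minimal sketch of such monitoring follows; the helper names and the relative-jump heuristic with its 10% threshold are illustrative assumptions and are not taken from the paper.

```python
import math
from typing import List, Sequence


def frobenius_norm(weights: Sequence[Sequence[float]]) -> float:
    """Frobenius (L2) norm of a weight matrix given as nested sequences."""
    return math.sqrt(sum(w * w for row in weights for w in row))


def norm_jumps(norms: Sequence[float], rel_threshold: float = 0.1) -> List[int]:
    """Indices i where norms[i] exceeds norms[i - 1] by more than rel_threshold.

    The threshold is an arbitrary illustrative choice, not a value from the paper.
    """
    return [
        i
        for i in range(1, len(norms))
        if norms[i - 1] > 0
        and (norms[i] - norms[i - 1]) / norms[i - 1] > rel_threshold
    ]
```

Recording `frobenius_norm` of the final layer at regular checkpoints and passing the history to `norm_jumps` gives a rough list of steps where the norm grew abruptly, which can then be compared against the training and validation curves.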

Translation date: 2022-06-14 11:18:42 Publication date: 2022-06-13

PDF registration status (publication date: 20220613)