Fugu-MT: arXiv Paper Translations

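This site provides Japanese translations of arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body text is not CC-licensed, or that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is performed with Fugu-Machine Translator.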

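The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from the use of this site (including all information and translations).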

Papers with a publication date of 2021-03-04.

Title	Authors	Abstract	Publication date / Translation date
# w-posenet: 密度対応正規化画素対ポーズ回帰 W-PoseNet: Dense Correspondence Regularized Pixel Pair Pose Regression ( http://arxiv.org/abs/1912.11888v2 ) ライセンス: Link先を確認	Zelin Xu, Ke Chen and Kui Jia	(参考訳) 6dポーズ推定の解決は,本質的外観や形状のばらつき,物体間咬合の重篤さに対処し,非制御環境下で取得したデータの大幅な照明変化や低品質化に照らしてより困難になる。本稿では,入力データから6次元ポーズ,モデル空間における3次元座標に強く回帰する新しいポーズ推定アルゴリズムW-PoseNetを提案する。言い換えれば、深層ネットワークにおけるポーズ回帰を学習した局所的特徴は、3次元ポーズ感応座標への画素ワイド対応マッピングを補助タスクとして明示的に学習することで正規化される。さらに,不整合とスパースな局所的特徴に対するロバスト性を改善するため,画素単位の特徴のスパース対と画素対ポーズ予測におけるソフト投票の組み合わせを考案した。人気の高いYCB-VideoとLineMODベンチマークの実験結果から、提案したW-PoseNetは最先端のアルゴリズムよりも一貫して優れた性能を発揮することが示された。 Solving 6D pose estimation is non-trivial to cope with intrinsic appearance and shape variation and severe inter-object occlusion, and is made more challenging in light of extrinsic large illumination changes and low quality of the acquired data under an uncontrolled environment. This paper introduces a novel pose estimation algorithm W-PoseNet, which densely regresses from input data to 6D pose and also 3D coordinates in model space. In other words, local features learned for pose regression in our deep network are regularized by explicitly learning pixel-wise correspondence mapping onto 3D pose-sensitive coordinates as an auxiliary task. Moreover, a sparse pair combination of pixel-wise features and soft voting on pixel-pair pose predictions are designed to improve robustness to inconsistent and sparse local features. Experiment results on the popular YCB-Video and LineMOD benchmarks show that the proposed W-PoseNet consistently achieves superior performance to the state-of-the-art algorithms.	翻訳日:2023-06-10 08:10:22 公開日:2021-03-04
# 周期量子因果モデル Cyclic Quantum Causal Models ( http://arxiv.org/abs/2002.12157v3 ) ライセンス: Link先を確認	Jonathan Barrett, Robin Lorenz, Ognyan Oreshkov	(参考訳) 因果推論は科学に不可欠であるが、量子論はそれに挑戦する。ベルの不等式に違反する量子相関は古典的因果モデルの枠組みの中で満足のいく因果的説明を否定する。さらに、量子システムと重力を包含する理論は、因果順序が不定な操作を特徴付ける因果的非分離過程を許容し、事象が全く因果順序づけられることを証明している。最初の課題は、量子過程の因果的説明を可能にする、近年の量子因果的モデルの発展によって解決された。この研究は因果的に分離不能なプロセスに対処し、量子因果モデルを循環因果構造に拡張することでそれらの因果的視点を提供する。このアプローチの他の応用として、ユニタリ拡張可能な2成分プロセスはすべて因果分離可能であり、ユニタリプロセスでは因果非分離性と因果構造の循環性が等価であることが示されている。 Causal reasoning is essential to science, yet quantum theory challenges it. Quantum correlations violating Bell inequalities defy satisfactory causal explanations within the framework of classical causal models. What is more, a theory encompassing quantum systems and gravity is expected to allow causally nonseparable processes featuring operations in indefinite causal order, defying that events be causally ordered at all. The first challenge has been addressed through the recent development of intrinsically quantum causal models, allowing causal explanations of quantum processes -- provided they admit a definite causal order, i.e. have an acyclic causal structure. This work addresses causally nonseparable processes and offers a causal perspective on them through extending quantum causal models to cyclic causal structures. Among other applications of the approach, it is shown that all unitarily extendible bipartite processes are causally separable and that for unitary processes, causal nonseparability and cyclicity of their causal structure are equivalent.	翻訳日:2023-06-01 12:37:18 公開日:2021-03-04
# 最も短いベクトル問題に対する2つの量子イジングアルゴリズム Two quantum Ising algorithms for the Shortest Vector Problem: one for now and one for later ( http://arxiv.org/abs/2006.14057v7 ) ライセンス: Link先を確認	David Joseph, Adam Callison, Cong Ling, Florian Mintert	(参考訳) 量子コンピュータは、今日の公開鍵暗号を数十年以内に破ると予想されている。新しい暗号系は後量子時代のために設計・標準化されており、これらの大部分は、最短ベクトル問題のような問題の難しさを量子敵に頼っている。本稿では,この問題を解決するための量子イジングアルゴリズムの2つの変種について述べる。 1つの変種は空間的に効率的であり、N が格子次元であるような O(NlogN) 量子ビットのみを必要とするが、もう1つの変種はノイズに対してより堅牢である。量子アニール器および数値シミュレーションにおけるアルゴリズムの性能の解析は、より量子ビット効率のよい変種は長期的には優れ、他の変種は短期的な実装に適していることを示している。 Quantum computers are expected to break today's public key cryptography within a few decades. New cryptosystems are being designed and standardised for the post-quantum era, and a significant proportion of these rely on the hardness of problems like the Shortest Vector Problem to a quantum adversary. In this paper we describe two variants of a quantum Ising algorithm to solve this problem. One variant is spatially efficient, requiring only O(NlogN) qubits where N is the lattice dimension, while the other variant is more robust to noise. Analysis of the algorithms' performance on a quantum annealer and in numerical simulations show that the more qubit-efficient variant will outperform in the long run, while the other variant is more suitable for near-term implementation.	翻訳日:2023-05-12 22:07:34 公開日:2021-03-04
# NISQ応用のための多重指数誤差外挿と複合誤差除去技術 Multi-exponential Error Extrapolation and Combining Error Mitigation Techniques for NISQ Applications ( http://arxiv.org/abs/2007.01265v2 ) ライセンス: Link先を確認	Zhenyu Cai	(参考訳) 量子ハードウェアにおけるノイズは、量子コンピュータの実装における最大の障害である。短期量子コンピュータの実用的応用におけるノイズと戦うために、大きな量子ビットオーバヘッドを必要とする量子誤差補正に頼る代わりに、余分な測定値を利用する量子エラー緩和に目を向ける。 error extrapolationは、実験的に実装されたエラー緩和技術である。数値シミュレーションとヒューリスティックな議論により、指数曲線は、予測された回路誤差がユニティの周りの大きな回路限界における外挿に有効であることが示されている。本稿では、これをマルチ指数誤差外挿に拡張し、パウリ雑音下での有効性のより厳密な証明を提供する。これは数値シミュレーションによりさらに検証され,単指数外挿よりも推定精度が桁違いに向上した。さらに,これらの手法の特徴を生かして,誤り補間と他の2つの誤り緩和手法,準確率と対称性の検証を組み合わせる手法を開発した。シミュレーションで示されるように,本手法は,標準誤差推定において必要となるハードウェア誤差率を調整することなく,サンプリングコストを準確率よりも数倍小さくして,推定バイアスを低減できる。 Noise in quantum hardware remains the biggest roadblock for the implementation of quantum computers. To fight the noise in the practical application of near-term quantum computers, instead of relying on quantum error correction which requires large qubit overhead, we turn to quantum error mitigation, in which we make use of extra measurements. Error extrapolation is an error mitigation technique that has been successfully implemented experimentally. Numerical simulation and heuristic arguments have indicated that exponential curves are effective for extrapolation in the large circuit limit with an expected circuit error count around unity. In this article, we extend this to multi-exponential error extrapolation and provide more rigorous proof for its effectiveness under Pauli noise. This is further validated via our numerical simulations, showing orders of magnitude improvements in the estimation accuracy over single-exponential extrapolation. Moreover, we develop methods to combine error extrapolation with two other error mitigation techniques: quasi-probability and symmetry verification, through exploiting features of these individual techniques. 
As shown in our simulation, our combined method can achieve low estimation bias with a sampling cost multiple times smaller than quasi-probability while without needing to be able to adjust the hardware error rate as required in canonical error extrapolation.	翻訳日:2023-05-11 20:37:22 公開日:2021-03-04
# 符号なし集合の端について On the edge of the set of no-signaling assemblages ( http://arxiv.org/abs/2008.12325v2 ) ライセンス: Link先を確認	Michał Banacki, Ricard Ravell Rodríguez and Paweł Horodecki	(参考訳) 近年の進展に伴い, マルチパーティント・ポストクエンタム・ステアリングと一般無署名アセンブリのシナリオを考察した。我々は,無符号集合の集合の辺の概念を導入し,その特徴付けについて述べる。次に、この概念を用いて、LHSモデルなしで無署名の集合体を目撃する。最後に、2つの信頼できないサブシステムで操る最も単純な非自明な場合、エッジ上の集合の量子化の可能性について議論する。特に、3量子状態の場合、与えられた状態のランクが 3 以上である限り、POVM によって記述された測定値を用いて、エッジ上にアセンブリを生成できないことを述べる no-go 型結果を得る。 Following recent advancements, we consider a scenario of multipartite postquantum steering and general no-signaling assemblages. We introduce the notion of the edge of the set of no-signaling assemblages and we present its characterization. Next, we use this concept to construct witnesses for no-signaling assemblages without an LHS model. Finally, in the simplest nontrivial case of steering with two untrusted subsystems, we discuss the possibility of quantum realization of assemblages on the edge. In particular, for three-qubit states, we obtain a no-go type result, which states that it is impossible to produce assemblage on the edge using measurements described by POVMs as long as the rank of a given state is greater than or equal to 3.	翻訳日:2023-05-04 19:28:06 公開日:2021-03-04
# 確率的定式化によるオープン量子システムシミュレーションのための自己回帰ニューラルネットワーク Autoregressive Neural Network for Simulating Open Quantum Systems via a Probabilistic Formulation ( http://arxiv.org/abs/2009.05580v3 ) ライセンス: Link先を確認	Di Luo, Zhuo Chen, Juan Carrasquilla, and Bryan K. Clark	(参考訳) オープン量子システムの理論は、量子科学と工学における現代の研究のかなりの部分の基礎を成している。拡張ヒルベルト空間の次元性に根ざし、開量子系をシミュレートする高い計算複雑性は、それらの力学を近似する戦略の開発を要求する。本稿では,オープン量子システムダイナミクスに取り組むためのアプローチを提案する。我々は,リウビリアン超作用素の運動をフォワード・バックワード・トラペズイド法を用いてシミュレートし,変分定式化によって定常状態を求める。本研究では,正の演算子値測度(povm)と自己回帰ニューラルネットワークを組み合わせた量子物理学の確率論的定式化を行い,効率的なサンプリングと扱いやすい密度によるアルゴリズムの柔軟性を生かした。自己回帰型ニューラルネットワークの対称性を部分的に復元し,局所相関の記述を改善する,改良されたアンサッツ,文字列状態を導入する。我々は,本手法を原型的な1次元と2次元のシステムでベンチマークし,厳密な解を追跡し,最近提案された制限ボルツマンマシンに基づくアプローチと比較して精度の高い結果を求める。このアプローチは、様々な文脈における密度行列の進化に広く適用できると期待する。 The theory of open quantum systems lays the foundations of a substantial part of modern research in quantum science and engineering. Rooted in the dimensionality of their extended Hilbert spaces, the high computational complexity of simulating open quantum systems calls for the development of strategies to approximate their dynamics. In this paper, we present an approach for tackling open quantum system dynamics. We simulate the dynamics of the Liouvillian superoperator using a forward-backward trapezoid method and find the steady-state via a variational formulation. We make use of a probabilistic formulation of quantum physics based on a positive operator-valued measure (POVM) in combination with autoregressive neural networks, which bring significant algorithmic flexibility due to their efficient sampling and tractable density. We introduce improved ansatzs, String States, which partially restore the symmetry of the autoregressive neural network and improve the description of local correlations. 
We benchmark our approaches on prototypical one and two-dimensional systems, finding results which closely track the exact solution and achieve higher accuracy in comparison to the recently proposed approach based on restricted Boltzmann machines. We anticipate this approach will be widely applicable to evolving density matrices in various contexts.	翻訳日:2023-05-02 22:20:00 公開日:2021-03-04
# 非対称量子ラビモデルの隠れ対称性 The hidden symmetry of the asymmetric quantum Rabi model ( http://arxiv.org/abs/2010.02496v2 ) ライセンス: Link先を確認	Vladimir V. Mangazeev, Murray T. Batchelor and Vladimir V. Bazhanov	(参考訳) 非対称量子ラビモデル (AQRM) は、バイアスパラメータ $\epsilon$ の値 $\epsilon\in\frac{1}{2}\mathbb{Z}$ に対して固有スペクトルのレベル交差を示す。このようなレベルの交差は、モデルの隠れ対称性と結びつくことが期待されている。この隠れ対称性の起源は、これらの特別な値でAQRMハミルトニアンと可換な作用素を見つけることによって確立される。この構成は、最初のいくつかのケースで明示的に与えられ、同様のレベルの交差がバイアス項の存在下で観測された他の関連する光-物質相互作用モデルに適用することができる。 The asymmetric quantum Rabi model (AQRM) exhibits level crossings in the eigenspectrum for the values $\epsilon\in\frac{1}{2}\mathbb{Z}$ of the bias parameter $\epsilon$. Such level crossings are expected to be associated with some hidden symmetry of the model. The origin of this hidden symmetry is established by finding the operators which commute with the AQRM Hamiltonian at these special values. The construction is given explicitly for the first several cases and can be applied to other related light-matter interaction models for which similar level crossings have been observed in the presence of a bias term.	翻訳日:2023-04-29 20:29:39 公開日:2021-03-04
# リッチトーリック符号における対称性分数化のための文字列順序パラメータ String order parameters for symmetry fractionalization in an enriched toric code ( http://arxiv.org/abs/2011.02981v2 ) ライセンス: Link先を確認	José Garre-Rubio, Mohsin Iqbal and David T. Stephen	(参考訳) 低次元対称性保護位相状態を持つトーリック符号モデルをデコレートした対称性エンリッチ位相秩序の簡単なモデルについて検討した。このモデルにおける対称性の分数化は弦次数パラメータによって特徴づけられ、これらのシグネチャは相転移点まで外部場と相互作用の影響下で頑健であることを示す。これは[New Journal of Physics 21, 113016 (2019)]の最近の提案を固定点テンソルネットワーク状態の設定を超えて拡張し、対称性分数化を特徴付け、検出するための有用なツールとして文字列順序パラメータを固化する。これに加えて、対称性を分数化するエニオンの凝縮がその対称性の自発的な破れを強いることを観察し、射影された絡み合ったペア状態の枠組みでこれを証明する。この現象は並列磁場中のトーリック符号の位相図に顕著な変化をもたらす。 We study a simple model of symmetry-enriched topological order obtained by decorating a toric code model with lower-dimensional symmetry-protected topological states. We show that the symmetry fractionalization in this model can be characterized by string order parameters, and that these signatures are robust under the effects of external fields and interactions, up to the phase transition point. This extends the recent proposal of [New Journal of Physics 21, 113016 (2019)] beyond the setting of fixed-point tensor network states, and solidifies string order parameters as a useful tool to characterize and detect symmetry fractionalization. In addition to this, we observe how the condensation of an anyon that fractionalizes a symmetry forces that symmetry to spontaneously break, and we give a proof of this in the framework of projected entangled pair states. This phenomenon leads to a notable change in the phase diagram of the toric code in parallel magnetic fields.	翻訳日:2023-04-25 05:16:45 公開日:2021-03-04
# ブール関数に対する量子ランダムアクセス符号 Quantum Random Access Codes for Boolean Functions ( http://arxiv.org/abs/2011.06535v4 ) ライセンス: Link先を確認	João F. Doriguello, Ashley Montanaro	(参考訳) $n\overset{p}{\mapsto}m$ random access code (RAC) は$n$ビットを$m$ビットに符号化し、任意の初期ビットは少なくとも確率$p$で回収できるが、量子RAC(QRAC)では$n$ビットは$m$ qubitsに符号化される。提案以来、RACの考え方は様々な方法で一般化され、例えば共有絡み(絡み付きランダムアクセスコードまたは単にEARACと呼ばれる)や複数のビットの復元が可能になった。本稿では,初期ビットの固定サイズの任意の部分集合上で,与えられたブール関数$f$の値を返すためのRACの考え方を一般化し,これを$f$-randomアクセスコードと呼ぶ。我々は、古典的な($f$-RAC)および量子($f$-QRAC)エンコーディングを持つ$f$-ランダムアクセスコードのためのプロトコルを、プライベートまたは共有ランダムネス、共有エンタングルメント($f$-EARAC)、Popescu-Rohrlich box($f$-PRRAC)など、さまざまなリソースとともに研究し、提供する。我々のプロトコルの成功確率は、ブール関数 $f$ の \emph{noise stability} によって特徴づけられる。さらに、共有ランダム性を持つ任意の$f$-QRACの成功確率について \emph{upper bound} を与え、その成功確率を乗法定数(および拡張による$f$-RAC)まで一致させる。 An $n\overset{p}{\mapsto}m$ random access code (RAC) is an encoding of $n$ bits into $m$ bits such that any initial bit can be recovered with probability at least $p$, while in a quantum RAC (QRAC), the $n$ bits are encoded into $m$ qubits. Since its proposal, the idea of RACs was generalized in many different ways, e.g. allowing the use of shared entanglement (called entanglement-assisted random access code, or simply EARAC) or recovering multiple bits instead of one. In this paper we generalize the idea of RACs to recovering the value of a given Boolean function $f$ on any subset of fixed size of the initial bits, which we call $f$-random access codes. We study and give protocols for $f$-random access codes with classical ($f$-RAC) and quantum ($f$-QRAC) encoding, together with many different resources, e.g. private or shared randomness, shared entanglement ($f$-EARAC) and Popescu-Rohrlich boxes ($f$-PRRAC). The success probability of our protocols is characterized by the \emph{noise stability} of the Boolean function $f$. 
Moreover, we give an \emph{upper bound} on the success probability of any $f$-QRAC with shared randomness that matches its success probability up to a multiplicative constant (and $f$-RACs by extension), meaning that quantum protocols can only achieve a limited advantage over their classical counterparts.	翻訳日:2023-04-24 07:39:30 公開日:2021-03-04
# タブ駆動型量子近傍サンプラー Tabu-driven Quantum Neighborhood Samplers ( http://arxiv.org/abs/2011.09508v2 ) ライセンス: Link先を確認	Charles Moussa, Hao Wang, Henri Calandra, Thomas Bäck, Vedran Dunjko	(参考訳) 組合せ最適化は量子コンピューティングを対象とする重要な応用である。しかし、短期的なハードウェアの制約により、大きな実用的問題に対する高性能な古典的ヒューリスティックと比較すると、量子アルゴリズムは競争力に欠ける。短期的なデバイスで利点を得る一つの選択肢は、それらを古典的ヒューリスティックと組み合わせて使うことである。特に,量子法を用いて古典的に難解な分布からサンプルを抽出し,最適化問題を高速に解くための真の証明可能な量子分離を実現するための最も可能性の高い手法を提案する。量子近似最適化アルゴリズム (QAOA) を近傍サンプルとして, タブサーチの適用により, この拡張を数値的に検討した。このようなハイブリッド環境では,QAOAは探索・探索の柔軟なツールであり,タブイテレーションを多く節約し,より良いソリューションを実現することで,問題の迅速な解決に有効であることを示す。 Combinatorial optimization is an important application targeted by quantum computing. However, near-term hardware constraints make quantum algorithms unlikely to be competitive when compared to high-performing classical heuristics on large practical problems. One option to achieve advantages with near-term devices is to use them in combination with classical heuristics. In particular, we propose using quantum methods to sample from classically intractable distributions -- which is the most probable approach to attain a true provable quantum separation in the near-term -- which are used to solve optimization problems faster. We numerically study this enhancement by an adaptation of Tabu Search using the Quantum Approximate Optimization Algorithm (QAOA) as a neighborhood sampler. We show that QAOA provides a flexible tool for exploration-exploitation in such hybrid settings and can provide evidence that it can help in solving problems faster by saving many tabu iterations and achieving better solutions.	翻訳日:2023-04-23 19:08:48 公開日:2021-03-04
# 非対称 $S_1FS_2$ ジョセフソン接合における超ギャップおよびサブギャップ増強電流 Supergap and subgap enhanced currents in asymmetric $S_1FS_2$ Josephson junctions ( http://arxiv.org/abs/2011.12967v2 ) ライセンス: Link先を確認	Mohammad Alidoust, Klaus Halterman	(参考訳) 超伝導リードの超伝導ギャップの大きさが等しくない、すなわち$\Delta_1\neq \Delta_2$であり、非対称な$S_1NS_2$および$S_1FS_2$系となる3次元常伝導金属および強磁性ジョセフソン構成における超電流プロファイルを理論的に研究した。その結果, 超伝導ギャップの比$\Delta_2/\Delta_1$を増大させることで, 弾道的$S_1NS_2$系における臨界超電流を100\%以上高め, 飽和点に達したり, 接合厚さ, 磁化強度, 化学ポテンシャルに応じて崩壊させたりすることができることがわかった。拡散性$S_1NS_2$系の総臨界電流は、放物線的に50\%以上増大し、超伝導ギャップの1つを増大させることで飽和に達することがわかった。均一な強磁性接合では、超電流は$\Delta_2/\Delta_1>1$の増加によって反転する。超電流をスーパーギャップ成分とサブギャップ成分に分解することにより、ジョセフソン電流の流れに対するそれらの重要な相対的貢献を示す。その結果,$S_1FS_2$接合におけるサブギャップ電流とスーパーギャップ電流の競合は,電流-位相関係における第2高調波の出現をもたらすことがわかった。拡散非対称ジョセフソン構成とは対照的に、$\Delta_2/\Delta_1=1$の弾道系における超電流の挙動は、フェルミ準位ミスマッチ、磁化強度、接合厚を含む幅広いパラメータセットにおいて、サブギャップ電流成分のみによって適切に記述できる。興味深いことに、$\Delta_2/\Delta_1>1$の場合、全超電流がスーパーギャップ成分によって駆動される複数のパラメータセットを見つけた。そこで本研究では,弾道系および拡散系におけるサブギャップおよびスーパーギャップ超電流成分の重要性を概説した。 We have theoretically studied the supercurrent profiles in three-dimensional normal metal and ferromagnetic Josephson configurations, where the magnitude of the superconducting gaps in the superconducting leads are unequal, i.e., $\Delta_1\neq \Delta_2$, creating asymmetric $S_1NS_2$ and $S_1FS_2$ systems. Our results reveal that by increasing the ratio of the superconducting gaps $\Delta_2/\Delta_1$, the critical supercurrent in a ballistic $S_1NS_2$ system can be enhanced by more than $100\%$, and reaches a saturation point, or decays away, depending on the junction thickness, magnetization strength, and chemical potential. The total critical current in a diffusive $S_1NS_2$ system was found to be enhanced by more than $50\%$ parabolically, and reaches saturation by increasing one of the superconducting gaps. In a uniform ferromagnetic junction, the supercurrent undergoes reversal by increasing $\Delta_2/\Delta_1>1$. 
Through decomposing the total supercurrent into its supergap and subgap components, our results illustrate their crucial relative contributions to the Josephson current flow. It was found that the competition of subgap and supergap currents in a $S_1FS_2$ junction results in the emergence of second harmonics in the current-phase relation. In contrast to a diffusive asymmetric Josephson configuration, the behavior of the supercurrent in a ballistic system with $\Delta_2/\Delta_1=1$ can be properly described by the subgap current component only, in a wide range of parameter sets, including Fermi level mismatch, magnetization strength, and junction thickness. Interestingly, when $\Delta_2/\Delta_1>1$, our results have found multiple parameter sets where the total supercurrent is driven by the supergap component. Therefore, our comprehensive study highlights the importance of subgap and supergap supercurrent components in both the ballistic and diffusive regimes.	翻訳日:2023-04-23 00:48:40 公開日:2021-03-04
# 新型コロナウイルス(covid-19)の拡散を制御する自動暴露通知(aen)技術の約束:スマートフォンアプリの展開、使用、反復評価の推奨 Realizing the Promise of Automated Exposure Notification (AEN) Technology to Control the Spread of COVID-19: Recommendations for Smartphone App Deployment, Use, and Iterative Assessment ( http://arxiv.org/abs/2012.09232v2 ) ライセンス: Link先を確認	Jesslyn Alekseyev (1), Erica Dixon (2), Vilhelm L Andersen Woltz (3), Danny Weitzner (3) ((1) Massachusetts Institute of Technology Lincoln Laboratory, (2) University of Pennsylvania, (3) Massachusetts Institute of Technology)	(参考訳) 現代の暗号技術を用いることで、プライバシ保存自動露光通知(aen)技術は、インキュベーション期間中に個人のデータのプライバシーを維持しながら、人々間の接触を自動的に記録することで、拡散する病気の軽減を約束する。今日では、米国や世界中の公共衛生部門が、AENシステムを急速に展開している。多くの組織がアプリをデプロイする前に調査を行ったが、世界中の経験から、接触追跡アプリが比較的低いレベルにインストールされ、使用されていることが分かる。このホワイトペーパーは、AENシステムの展開を検討している州に有用な情報を提供し、既に配備されている州の改善をガイドすることを目的としている。 GAENコンソーシアム Exposure Notifications (EN) Express ツールを含む,新型コロナウイルスの感染拡大を抑えるという究極の目標を掲げて,AEN システムの採用に関連する人的要因について概説する。また、AENシステムを設計、デプロイする国家や、連絡先追跡アプリのデプロイを評価するための一連の推奨事項や、初期展開時の有効性を改善するための関心領域のターゲティングのための実用的な設計および実装ガイドも提供します。ケーススタディでは,ペンシルバニア州(PA)が展開する商用アプリと,ユーザの採用促進に向けた継続的な取り組みについて検討する。 By using modern cryptographic techniques, privacy-preserving Automated Exposure Notification (AEN) technologies offer the promise of mitigating disease spread by automatically recording contacts between people over the incubation period while maintaining individual data privacy. Today, public health departments in States and other countries around the world are deploying AEN systems at a rapid pace. Though many organizations conducted research prior to deploying apps, experience around the world shows that contact-tracing apps are installed and used at relatively low levels. This whitepaper is intended to provide usable information for States who are considering the deployment of an AEN system, as well as to guide ongoing improvements for States that have already deployed. 
We outline the human factors considerations related to employing AEN systems with the ultimate goal of controlling the spread of COVID-19, including the GAEN consortium Exposure Notifications (EN) Express tool. We will also provide a practical design and implementation guide for States and others designing and deploying AEN systems, as well as a set of recommendations for assessing deployment of contact tracing apps and targeting areas of concern to improve efficacy of use during and after initial deployment. As a case study, we consider the commercial app deployed by the state of Pennsylvania (PA) and the ongoing efforts to drive user adoption there.	翻訳日:2023-04-20 10:52:22 公開日:2021-03-04
# 数サイクルパルスによるCEP依存性コヒーレンスの解析理論 An analytical theory of CEP-dependent coherence driven by few-cycle pulses ( http://arxiv.org/abs/2101.04881v3 ) ライセンス: Link先を確認	Bing Zeng and Lingze Duan	(参考訳) 原子系と数サイクルの超高速パルスの相互作用は、リッチ物理と量子コヒーレンス制御におけるかなりの応用可能性をもたらす。しかしながら、その一般的な行動に関する理論的理解は、特にキャリア-エンベロープ相(CEP)の影響に関して、この体制における解析的な記述が欠如していることによって妨げられている。ここでは、遠方共鳴、少数サイクル2乗パルスによって駆動される2レベル原子を記述した分析理論を示す。シュロディンガー方程式の単純閉形式解は、回転波近似やゆっくりと変化するエンベロープ近似を招かなくても、一階摂動の下で得られる。さらなる研究により、原子の最終反転とパルスのCEPの間の算術的関係が明らかになる。その数学的単純さにもかかわらず、この関係は相互作用の重要な特徴を捉えることができ、パルス形状の一般化に対して頑健であることが証明され、数値解との良好な一致を示す。この理論は、将来のCEP感受性量子コヒーレンスの研究における一般的なガイダンスを提供する可能性がある。 The interaction between an atomic system and a few-cycle ultrafast pulse carries rich physics and a considerable application prospect in quantum-coherence control. However, theoretical understanding of its general behaviors has been hindered by the lack of an analytical description in this regime, especially with regard to the impact of the carrier-envelope phase (CEP). Here, we present an analytical theory that describes a two-level atom driven by a far-off-resonance, few-cycle square pulse. A simple, closed-form solution of the Schrodinger equation is obtained under the first-order perturbation without invoking the rotating-wave approximation or the slowly varying envelope approximation. Further investigation reveals an arithmetic relation between the final inversion of the atom and the CEP of the pulse. Despite its mathematical simplicity, the relation is able to capture some of the key features of the interaction, which prove to be robust against generalization of pulse shapes and show good agreements with numerical solutions. The theory can potentially offer a general guidance in future studies of CEP-sensitive quantum coherence.	翻訳日:2023-04-15 17:48:02 公開日:2021-03-04
# ダーク原子からの連続量子光 Continuous quantum light from a dark atom ( http://arxiv.org/abs/2103.01138v2 ) ライセンス: Link先を確認	Karl Nicolas Tolazzi, Bo Wang, Christopher Ianzano, Jonas Neumeier, Celso Jorge Villas-Boas, Gerhard Rempe	(参考訳) サイクリング過程はレーザーからトポロジカル絶縁体まで多くの物理学領域において重要であり、しばしば各系の動的および構造的側面に関する驚くべき洞察を与える。本稿では、共振レーザと光共振器が単一原子のいくつかの基底状態と励起状態の間の閉周期を定義する量子非線形波動混合実験について報告する。強い原子空洞結合と定常運転では、原子状態とキャビティ内光子数の絡み合いが量子干渉によって励起状態の集団を抑圧し、原子基底状態へのサイクルを効果的に減少させることを示した。システムダイナミクスは、各空洞光子数に対する1つの暗黒状態と、空洞から放出される光子にアンチバンチングを発生させる量子ゼノ遮断のハーモニックラグ内での遷移によって生じる。還元サイクルはサイクル外の原子状態への不要な光ポンピングを抑制し、発光光子の数を増大させる。 Cycling processes are important in many areas of physics ranging from lasers to topological insulators, often offering surprising insights into dynamical and structural aspects of the respective system. Here we report on a quantum-nonlinear wave-mixing experiment where resonant lasers and an optical cavity define a closed cycle between several ground and excited states of a single atom. We show that, for strong atom-cavity coupling and steady-state driving, the entanglement between the atomic states and intracavity photon number suppresses the excited-state population via quantum interference, effectively reducing the cycle to the atomic ground states. The system dynamics then result from transitions within a harmonic ladder of entangled dark states, one for each cavity photon number, and a quantum Zeno blockade that generates antibunching in the photons emitted from the cavity. The reduced cycle suppresses unwanted optical pumping into atomic states outside the cycle, thereby enhancing the number of emitted photons.	翻訳日:2023-04-09 14:34:33 公開日:2021-03-04
# 量子絡み合い生成のためのマクロランダムネス Macroscopic randomness for quantum entanglement generation ( http://arxiv.org/abs/2103.02879v1 ) ライセンス: Link先を確認	Byoung S. Ham	(参考訳) 2つ以上の2成分間の量子絡み合いは、量子重ね合わせによってハイゼンベルクの不確定性原理によって直接制御される微視的レジームに制限された量子情報領域の中核概念であり、非決定論的かつ確率的量子的特徴をもたらす。このような量子機能は古典的手法では生成できない。ここでは、オンデマンドの絡み合った光ペア生成の古典的手法を、基底ランダム性を用いてマクロな状態に提示する。従来の量子力学のこの矛盾する考えは、古典性と量子性の両方に関する根本的な疑問を提起する。 Quantum entanglement between two or more bipartite entities is a core concept in quantum information areas limited to microscopic regimes directly governed by Heisenberg uncertainty principle via quantum superposition, resulting in nondeterministic and probabilistic quantum features. Such quantum features cannot be generated by classical means. Here, a pure classical method of on-demand entangled light-pair generation is presented in a macroscopic regime via basis randomness. This conflicting idea of conventional quantum mechanics invokes a fundamental question about both classicality and quantumness, where superposition is key to its resolution.	翻訳日:2023-04-09 02:46:02 公開日:2021-03-04
# BLOCKEYE:ブロックチェーンでDeFi攻撃を狙う BLOCKEYE: Hunting For DeFi Attacks on Blockchain ( http://arxiv.org/abs/2103.02873v1 ) ライセンス: Link先を確認	Bin Wang, Han Liu, Chao Liu, Zhiqiang Yang, Qian Ren, Huixuan Zheng, Hong Lei	(参考訳) 分散金融、すなわちDeFiは、近年、多くのパブリックブロックチェーン(Ethereumなど)上で最も人気のあるタイプのアプリケーションとなっている。従来の金融と比較して、DeFiは顧客が比較的低い信頼コストでスマートコントラクトを通じて、多様なブロックチェーン金融サービス(融資、借り入れ、担保付け、交換など)に柔軟に参加することを可能にする。しかし、DeFiのオープンな性質は必然的に大きな攻撃面をもたらし、これは参加者の資金の安全に対する深刻な脅威である。本稿では,Ethereumブロックチェーン上でのDeFiプロジェクトのリアルタイム攻撃検出システムであるBLOCKEYEを提案する。 BLOCKEYEが提供する重要な機能は次の2つだ。 1) 潜在的に脆弱なDeFiプロジェクトは、重要なサービス状態(例えば資産価格)のデータフローを象徴的に推論し、外部操作が可能かどうかをチェックする自動セキュリティ分析プロセスに基づいて識別される。 2) その後、脆弱なDeFiプロジェクトのためにトランザクションモニタがオフチェーンにインストールされる。プロジェクトだけでなく関連するプロジェクトにも送信されたトランザクションは、さらなるセキュリティ分析のために収集される。BLOCKEYEに設定された臨界不変量に対して違反が検出されると、潜在的な攻撃をフラグ付けする。例えば、利益は、非常に短時間で達成され、コストよりもはるかに大きい。いくつかの人気のあるDeFiプロジェクトにBLOCKEYEを適用し、報告されていない潜在的なセキュリティ攻撃を発見しました。 BLOCKEYEのビデオはhttps://youtu.be/7DjsWBLdlQUで公開されている。 Decentralized finance, i.e., DeFi, has become the most popular type of application on many public blockchains (e.g., Ethereum) in recent years. Compared to traditional finance, DeFi allows customers to flexibly participate in diverse blockchain financial services (e.g., lending, borrowing, collateralizing, exchanging, etc.) via smart contracts at a relatively low cost of trust. However, the open nature of DeFi inevitably introduces a large attack surface, which is a severe threat to the security of participants' funds. In this paper, we propose BLOCKEYE, a real-time attack detection system for DeFi projects on the Ethereum blockchain. Key capabilities provided by BLOCKEYE are twofold: (1) Potentially vulnerable DeFi projects are identified based on an automatic security analysis process, which performs symbolic reasoning on the data flow of important service states, e.g., asset price, and checks whether they can be externally manipulated. (2) Then, a transaction monitor is installed offchain for a vulnerable DeFi project. 
Transactions sent not only to that project but also to other associated projects are collected for further security analysis. A potential attack is flagged if a violation of a critical invariant configured in BLOCKEYE is detected, e.g., a benefit that is obtained within a very short time and is far larger than the cost. We applied BLOCKEYE to several popular DeFi projects and discovered potential security attacks that were previously unreported. A video of BLOCKEYE is available at https://youtu.be/7DjsWBLdlQU.	翻訳日:2023-04-09 02:45:50 公開日:2021-03-04
# 状態集合とユニタリチャネルの量子認証 Quantum certification of state set and unitary channel ( http://arxiv.org/abs/2103.02837v1 ) ライセンス: Link先を確認	Wei Xie	(参考訳) 量子状態集合とユニタリ量子チャネルの効率的な量子認証アルゴリズムについて検討する。未知の状態の $O(\varepsilon^{-4}\ln \|\mathcal{P}\|)$ 個のコピーを使って、未知の状態が既知の状態の有限集合 $\mathcal{P}$ に含まれているか、トレース距離に関して $\varepsilon$ だけ離れているかを区別するアルゴリズムを提案する。このアルゴリズムは、いくつかの設定でよりサンプル効率が良い。以前の研究では、固定次元において、未知のユニタリ$U$が既知または未知のユニタリ$V$と等しいか $\varepsilon$ だけ離れているかを、ユニタリの $O(\varepsilon^{-2})$ 回の使用で区別できることが示されたが、そこではChoi状態が使われるためアンシラシステムが必要であった。 2つのケースをユニタリの $O(\varepsilon^{-1})$ 回の使用で区別し、以前の結果と比べてアンシラをはるかに少なく、あるいは全く用いないアルゴリズムを与える。 We study efficient quantum certification algorithms for quantum state set and unitary quantum channel. We present an algorithm that uses $O(\varepsilon^{-4}\ln \|\mathcal{P}\|)$ copies of an unknown state to distinguish whether the unknown state is contained in or $\varepsilon$-far from a finite set $\mathcal{P}$ of known states with respect to the trace distance. This algorithm is more sample-efficient in some settings. Previous study showed that one can distinguish whether an unknown unitary $U$ is equal to or $\varepsilon$-far from a known or unknown unitary $V$ in fixed dimension with $O(\varepsilon^{-2})$ uses of the unitary, in which the Choi state is used and thus an ancilla system is needed. We give an algorithm that distinguishes the two cases with $O(\varepsilon^{-1})$ uses of the unitary, using much fewer or no ancilla compared with previous results.	翻訳日:2023-04-09 02:45:26 公開日:2021-03-04
# ベル試験の1パラメータファミリーに及ぼす測定依存性の影響 Effects of measurement dependence on 1-parameter family of Bell tests ( http://arxiv.org/abs/2103.02819v1 ) ライセンス: Link先を確認	Fen-Zhuo Guo and Ze-Tian Lv and Shi-Hui Wei and Qiao-Yan Wen	(参考訳) ベルテストに基づくほとんどの量子情報タスクは、測定独立性の仮定に依存する。しかし, 測定独立性の仮定が常に実験で満たされていることを保証することは困難であり, ベル試験におけるこの仮定の緩和効果を検討することが重要である。本稿では,ベル (1-PFB) 試験の1パラメータファミリーに対する測定独立性の仮定を緩和する効果について検討する。一般的な入力分布と分解可能な入力分布の両方に対して、Eveが偽造できる1-PFB相関関数の計測依存性、推定確率、最大値の関係を確立する。 Eveが最大値を偽装する決定論的戦略も与えられる。Chain不等式と1-PFB不等式の未知の情報レートを比較し、Eveが最大量子違反を偽装することが、Chain不等式よりも1-PFB不等式において難しくなるパラメータの範囲を求める。 Most quantum information tasks based on Bell tests rely on the assumption of measurement independence. However, it is difficult to ensure that the assumption of measurement independence is always met in experimental operations, so it is crucial to explore the effects of relaxing this assumption on Bell tests. In this paper, we discuss the effects of relaxing the assumption of measurement independence on 1-parameter family of Bell (1-PFB) tests. For both general and factorizable input distributions, we establish the relationship among measurement dependence, guessing probability, and the maximum value of 1-PFB correlation function that Eve can fake. The deterministic strategy when Eve fakes the maximum value is also given. We compare the unknown information rate of Chain inequality and 1-PFB inequality, and find the range of the parameter in which it is more difficult for Eve to fake the maximum quantum violation in 1-PFB inequality than in Chain inequality.	翻訳日:2023-04-09 02:45:08 公開日:2021-03-04
# 対称多量子状態:星、絡み合い、ロトセンサー Symmetric Multiqudit States: Stars, Entanglement, Rotosensors ( http://arxiv.org/abs/2103.02786v1 ) ライセンス: Link先を確認	Chryssomalis Chryssomalakos, Louis Hanotel, Edgar Guzmán-González, Daniel Braun, Eduardo Serrano-Ensástiga and Karol Życzkowski	(参考訳) $N=d-1$ 個のマヨラナ星からなる星座は、次元 $d$ の任意の純粋量子状態、または $n$ 個の量子ビットからなる系の置換対称状態を表す。後者の構成を、それぞれ $d$ 準位を持つ $k$ 個のサブシステムの任意の対称な純粋状態を同様に表すように一般化する。 $d\geq 3$ に対して、そのような状態は、回転に関する限り、一定の相対複素重みを持つ様々なスピン状態の集まりと等価である。マヨラナの先導に従って、上記のスピン状態のマヨラナ星座からなり、複素重みをエンコードする補助的な「スペクテータ」星座によって拡張された多重星座を導入する。 4つのキュートリットの対称状態、および2つのスピン-$3/2$系の対称状態の恒星表現の例を示す。我々は、様々なスピン、パーティの数、さらには対称性を持つ多体状態を関連付けるエルミートとムルナガンの同型を再検討する。導入したツールを用いて、多体エンタングルメントを解析し、最適な量子ロトセンサー、すなわち特定の軸まわりの回転、あるいは全ての軸について平均した回転に対して最大限に敏感な純粋状態を特定する方法を示す。 A constellation of $N=d-1$ Majorana stars represents an arbitrary pure quantum state of dimension $d$ or a permutation-symmetric state of a system consisting of $n$ qubits. We generalize the latter construction to represent in a similar way an arbitrary symmetric pure state of $k$ subsystems with $d$ levels each. For $d\geq 3$, such states are equivalent, as far as rotations are concerned, to a collection of various spin states, with definite relative complex weights. Following Majorana's lead, we introduce a multiconstellation, consisting of the Majorana constellations of the above spin states, augmented by an auxiliary, "spectator" constellation, encoding the complex weights. Examples of stellar representations of symmetric states of four qutrits, and two spin-$3/2$ systems, are presented. We revisit the Hermite and Murnaghan isomorphisms, which relate multipartite states of various spins, number of parties, and even symmetries. We show how the tools introduced can be used to analyze multipartite entanglement and to identify optimal quantum rotosensors, i.e., pure states which are maximally sensitive to rotations around a specified axis, or averaged over all axes.	翻訳日:2023-04-09 02:44:51 公開日:2021-03-04
# 光合成錯体は量子コヒーレンスを用いて効率を向上させるか? おそらくそうではない。 Do photosynthetic complexes use quantum coherence to increase their efficiency? Probably not ( http://arxiv.org/abs/2103.02604v1 ) ライセンス: Link先を確認	Elinor Zerah-Harush and Yonatan Dubi	(参考訳) この疑問に答えることが量子生物学の分野で中心的なモチベーションとなり、光合成錯体の波状挙動を実証する一連の実験の後、このアイデアが生まれて以来である。本稿では,3つの天然錯体の効率に及ぼす量子コヒーレンスの影響を直接評価する。オープン量子システムアプローチにより、自然の生理的条件下で、それらの「量子性」と効率のレベルを同時に特定できる。これらのシステムは、デファス化支援輸送を特徴とする混合量子古典的状態にあることを示す。しかし、この体制における効率の変化は最短であり、量子コヒーレンスの存在が効率の向上に重要な役割を果たさないことを示唆している。しかしながら、このレジームの効率性はいかなる構造パラメータにも依存せず、進化によって自然コンプレックスをパラメータレジームに誘導し、他の用途のためにその構造を「設計する」ことを示唆している。 Answering the titular question has become a central motivation in the field of quantum biology, ever since the idea was raised following a series of experiments demonstrating wave-like behavior in photosynthetic complexes. Here, we report a direct evaluation of the effect of quantum coherence on the efficiency of three natural complexes. An open quantum systems approach allows us to simultaneously identify their level of "quantumness" and efficiency, under natural physiological conditions. We show that these systems reside in a mixed quantum-classical regime, characterized by dephasing-assisted transport. Yet, we find that the change in efficiency at this regime is minute at best, implying that the presence of quantum coherence does not play a significant role in enhancing efficiency. However, in this regime efficiency is independent of any structural parameters, suggesting that evolution may have driven natural complexes to their parameter regime in order to "design" their structure for other uses.	翻訳日:2023-04-09 02:44:28 公開日:2021-03-04
# $3 \rightarrow 1$ シーケンシャル量子ランダムアクセスコードを用いたアンシャープ測定による3つのブラックボックスの認証 Certification of three black boxes with unsharp measurements using $3 \rightarrow 1$ sequential quantum random access codes ( http://arxiv.org/abs/2103.03075v1 ) ライセンス: Link先を確認	Shihui Wei, Fenzhuo Guo, Fei Gao, and Qiao-Yan Wen	(参考訳) アンシャープ測定は量子情報理論においてますます重要な役割を果たす。本稿では、$3 \rightarrow 1$ シーケンシャルランダムアクセスコード (RAC) に基づく、アンシャープ測定を用いた3者間の準備・変換・測定実験について検討する。 $3 \rightarrow 1$ シーケンシャル量子ランダムアクセスコード (QRAC) における2つの相関証人間の最適なトレードオフを導出し、その結果を用いて3つのシーケンシャルパーティのための量子準備、機器、測定の自己テストを完了させる。また、シャープネスパラメータの上限と下限を与え、自己テストスキームのロバスト性解析を完了させる。さらに、$3 \rightarrow 1$ シーケンシャルRACに基づく古典的相関証人の破れは、両方の相関証人で同時には得られないことがわかった。これは、第2のパーティが古典的な上界を超えるために強いアンシャープ測定を使用する場合、第3のパーティはシャープな測定を用いてもそれができないことを意味する。最後に、行列式の値、$2 \rightarrow 1$ QRAC、$3 \rightarrow 1$ QRAC のそれぞれに基づき、異なるシャープネスパラメータ下での乱数生成効率の解析と比較を行う。このレターは、セミデバイス独立フレームワークにおけるマルチパーティ間の乱数生成に新たな光を当てている。 Unsharp measurements play an increasingly important role in quantum information theory. In this paper, we study a three-party prepare-transform-measure experiment with unsharp measurements based on $3 \rightarrow 1$ sequential random access codes (RACs). We derive optimal trade-off between the two correlation witnesses in $3 \rightarrow 1$ sequential quantum random access codes (QRACs), and use the result to complete the self-testing of quantum preparations, instruments and measurements for three sequential parties. We also give the upper and lower bounds of the sharpness parameter to complete the robustness analysis of the self-testing scheme. In addition, we find that classical correlation witness violation based on $3 \rightarrow 1$ sequential RACs cannot be obtained by both correlation witnesses simultaneously. This means that if the second party uses strong unsharp measurements to overcome the classical upper bound, the third party cannot do so even with sharp measurements. Finally, we give the analysis and comparison of the random number generation efficiency under different sharpness parameters based on the determinant value, $2 \rightarrow 1$ and $3 \rightarrow 1$ QRACs separately. This letter sheds new light on generating random numbers among multi-party in semi-device independent framework.	翻訳日:2023-04-09 02:39:07 公開日:2021-03-04
# シカモア量子超越回路のシミュレーション Simulating the Sycamore quantum supremacy circuits ( http://arxiv.org/abs/2103.03074v1 ) ライセンス: Link先を確認	Feng Pan and Pan Zhang	(参考訳) 量子回路をシミュレートする一般的なテンソルネットワーク法を提案する。この方法は既存の方法よりも多くの相関ビットストリング振幅と確率を計算するのに非常に効率的である。本研究では、従来のスーパーコンピュータの限界を超え、量子超越性を示すために用いられてきた、googleのsycamore回路のサンプリング問題を研究する。我々は,60のグラフィカル処理ユニット(GPU)を含む小さな計算クラスタを用いて,53キュービットと20サイクルのSycamore回路から100万個の相関ビットストリングを生成し,線形クロスエントロピーベンチマーク(XEB)フィデリティは0.739であり,これはGoogleの量子超越実験よりもはるかに高い。 We propose a general tensor network method for simulating quantum circuits. The method is massively more efficient in computing a large number of correlated bitstring amplitudes and probabilities than existing methods. As an application, we study the sampling problem of Google's Sycamore circuits, which are believed to be beyond the reach of classical supercomputers and have been used to demonstrate quantum supremacy. Using our method, employing a small computational cluster containing 60 graphical processing units (GPUs), we have generated one million correlated bitstrings with some entries fixed, from the Sycamore circuit with 53 qubits and 20 cycles, with linear cross-entropy benchmark (XEB) fidelity equals 0.739, which is much higher than those in Google's quantum supremacy experiments.	翻訳日:2023-04-09 02:38:44 公開日:2021-03-04
# 核スピンフリーNi(II)に基づく分子単量体におけるスピン時計遷移の化学チューニング Chemical tuning of spin clock transitions in molecular monomers based on nuclear spin-free Ni(II) ( http://arxiv.org/abs/2103.03021v1 ) ライセンス: Link先を確認	Marcos Rubín-Osanz, François Lambert, Feng Shao, Eric Rivière, Régis Guillot, Nicolas Suaud, Nathalie Guihéry, David Zueco, Anne-Laure Barra, Talal Mallah and Fernando Luis	(参考訳) 我々は、単核Ni錯体の2つの最低電子スピン準位の間に、かなり大きな量子トンネル分裂が存在することを報告する。このギャップに関連する準位反交差(磁気時計遷移)は、熱容量実験によって直接観測された。これらの結果と、対称性によってトンネルが禁止されるCo誘導体で得られた結果との比較は、時計遷移が分子間スピン-スピン相互作用を効果的に抑制することを示している。さらに、量子トンネル分裂は、結晶場と磁気異方性を決定するリガンドシェルの修飾によって化学的に調整できることを示す。これらの性質は、デコヒーレンスに対する必要な耐性、他の量子ビットおよび制御回路との適切なインターフェース、冷却による初期化能力を兼ね備えたモデルスピン量子ビットの実現に不可欠である。 We report the existence of a sizeable quantum tunnelling splitting between the two lowest electronic spin levels of mononuclear Ni complexes. The level anti-crossing, or magnetic clock transition, associated with this gap has been directly monitored by heat capacity experiments. The comparison of these results with those obtained for a Co derivative, for which tunnelling is forbidden by symmetry, shows that the clock transition leads to an effective suppression of intermolecular spin-spin interactions. In addition, we show that the quantum tunnelling splitting admits a chemical tuning via the modification of the ligand shell that determines the crystal field and the magnetic anisotropy. These properties are crucial to realize model spin qubits that combine the necessary resilience against decoherence, a proper interfacing with other qubits and with the control circuitry and the ability to initialize them by cooling.	翻訳日:2023-04-09 02:37:47 公開日:2021-03-04
# ゲートセットトモグラフィーによる超伝導量子ビットの中間回路測定の特性評価 Characterizing mid-circuit measurements on a superconducting qubit using gate set tomography ( http://arxiv.org/abs/2103.03008v1 ) ライセンス: Link先を確認	Kenneth Rudinger, Guilhem J. Ribeill, Luke C. G. Govia, Matthew Ware, Erik Nielsen, Kevin Young, Thomas A. Ohki, Robin Blume-Kohout, and Timothy Proctor	(参考訳) 量子回路の内部層で行われる測定(中間回路測定)は、重要な量子コンピューティングプリミティブであり、とりわけ量子誤り訂正において重要である。中間回路測定は古典的出力と量子的出力の両方を持つため、量子回路を終端する測定には存在しない誤差モードの影響を受けうる。本稿では、量子インストゥルメントによってモデル化される中間回路測定を、量子インストゥルメント線形ゲートセットトモグラフィ (QILGST) と呼ぶ手法を用いて特徴付ける方法を示す。次に、この手法を適用し、マルチキュービットシステム内の超伝導トランスモン量子ビットの分散測定を特徴付ける。測定パルスとその後のゲートとの間の遅延時間を変化させることで、共振器内の残留光子数が測定誤差に与える影響を調べる。 QILGSTは異なる誤差モードを分解し、測定による総誤差を定量化できる。我々の実験では、1000 ns を超える遅延時間に対して、総誤差率(すなわち半ダイヤモンド距離) $\epsilon_{\diamond} = 8.1 \pm 1.4 \%$、読み出し忠実度 $97.0 \pm 0.3\%$、$0$ と $1$ を測定したときの出力量子状態の忠実度それぞれ $96.7 \pm 0.6\%$ と $93.7 \pm 0.7\%$ を得た。 Measurements that occur within the internal layers of a quantum circuit -- mid-circuit measurements -- are an important quantum computing primitive, most notably for quantum error correction. Mid-circuit measurements have both classical and quantum outputs, so they can be subject to error modes that do not exist for measurements that terminate quantum circuits. Here we show how to characterize mid-circuit measurements, modelled by quantum instruments, using a technique that we call quantum instrument linear gate set tomography (QILGST). We then apply this technique to characterize a dispersive measurement on a superconducting transmon qubit within a multiqubit system. By varying the delay time between the measurement pulse and subsequent gates, we explore the impact of residual cavity photon population on measurement error. QILGST can resolve different error modes and quantify the total error from a measurement; in our experiment, for delay times above 1000 ns we measured a total error rate (i.e., half diamond distance) of $\epsilon_{\diamond} = 8.1 \pm 1.4 \%$, a readout fidelity of $97.0 \pm 0.3\%$, and output quantum state fidelities of $96.7 \pm 0.6\%$ and $93.7 \pm 0.7\%$ when measuring $0$ and $1$, respectively.	翻訳日:2023-04-09 02:37:18 公開日:2021-03-04
# H3の電子共鳴状態における擬似Jahn-Teller相互作用 Pseudo-Jahn-Teller interaction among electronic resonant states of H3 ( http://arxiv.org/abs/2103.02935v1 ) ライセンス: Link先を確認	Patrik Hedvall, Åsa Larson	(参考訳) 我々は、H3+基底状態のポテンシャルエネルギー面より高いエネルギーを持つH3の電子共鳴状態を研究する。これらの共鳴状態は、より高い衝突エネルギーにおけるH3+の解離的再結合に重要であり、先行研究はこれらの共鳴状態が三重交差を示すことを示唆している。これらの共鳴状態を記述するために擬似ヤーン・テラーモデルの複素一般化を導入する。共鳴状態のポテンシャルエネルギーと自動イオン化幅は、複素コーン変分法を用いた電子散乱計算により計算され、複素モデルパラメータは結果への最小二乗フィットによって抽出される。この処理により、この系を記述する非エルミート擬ヤーン・テラー・ハミルトニアンが得られる。さらに非断熱結合と幾何位相を計算し、複素断熱ポテンシャルエネルギー面のより豊かなトポロジーを特徴付けるために用いる。 We study the electronic resonant states of H3 with energies above the potential energy surface of the H3+ ground state. These resonant states are important for the dissociative recombination of H3+ at higher collision energies, and previous studies have indicated that these resonant states exhibit a triple intersection. We introduce a complex generalization of the pseudo-Jahn-Teller model to describe these resonant states. The potential energies and the autoionization widths of the resonant states are computed with electron scattering calculations using the complex Kohn variational method, and the complex model parameters are extracted by a least-square fit to the results. This treatment results in a non-Hermitian pseudo-Jahn-Teller Hamiltonian describing the system. The non-adiabatic coupling and geometric phase are further calculated and used to characterize the enriched topology of the complex adiabatic potential energy surfaces.	翻訳日:2023-04-09 02:36:41 公開日:2021-03-04
# 結晶面原子核による相対論的粒子非コヒーレント散乱 Relativistic particle incoherent scattering by the nuclei of crystal plane atoms ( http://arxiv.org/abs/2103.03141v1 ) ライセンス: Link先を確認	Victor V. Tikhomirov	(参考訳) 現象パラメータを持たない結晶面の核による古典的移動相対論的粒子の不整合散乱を記述する一貫した理論を示す。量子力学の基本概念を適用し、粒子軌道の単位長さあたりの平均二乗非コヒーレント散乱角の基本的なコンパクトな公式を導入する。後者は、クーロン散乱シミュレーションにおける結晶原子分布の不均一性の影響を、シミュレーション時間の顕著な延長なしに実装するために用いられる。この理論は、正に荷電された粒子が低核密度領域から脱流路する性質を本質的に再検討し、結晶の振動子と短粒子の特定の電磁モーメントの測定の両方に必須である。 A consistent theory, which describes the incoherent scattering of classically moving relativistic particles by the nuclei of crystal planes without any phenomenological parameter is presented. The basic notions of quantum mechanics are applied to introduce a fundamental compact formula for the mean square incoherent scattering angle per unit length of particle trajectory. The latter is used to implement the effects of the crystal atom distribution inhomogeneity into the Coulomb scattering simulations without noticeable elongation of the simulation time. The theory essentially reconsiders the nature of positively charged particle dechanneling from the low nuclear density regions, being essential in both the crystal undulators and envisaged measurements of the specific electromagnetic momenta of short living particles.	翻訳日:2023-04-09 02:30:06 公開日:2021-03-04
# 雑音量子回路のラデマッハ複雑性 Rademacher complexity of noisy quantum circuits ( http://arxiv.org/abs/2103.03139v1 ) ライセンス: Link先を確認	Kaifeng Bu, Dax Enshan Koh, Lu Li, Qingxian Luo, Yaobo Zhang	(参考訳) 量子系におけるノイズは、大きな量子回路上で多くの量子アルゴリズムを実装する上で大きな障害となる。本研究では,量子回路のラデマッハ複雑性に対する雑音の影響について検討する。これは,これらの回路によって生成される関数のクラスのリッチさを定量化する,統計的複雑性の尺度である。我々は、ユニタリチャネルの凸結合で表されるノイズモデルを検討し、これらのノイズモデルによって特徴づけられる量子回路のラデマッハ複雑性に対して上界と下界の両方を与える。特に、対応するノイズのない量子回路のラデマッハ複雑性と回路の自由ロバスト性に依存する、雑音量子回路のラデマッハ複雑性に対する下界を求める。以上の結果から,量子回路のラデマッハ複雑性はノイズの増加とともに減少することが示された。 Noise in quantum systems is a major obstacle to implementing many quantum algorithms on large quantum circuits. In this work, we study the effects of noise on the Rademacher complexity of quantum circuits, which is a measure of statistical complexity that quantifies the richness of classes of functions generated by these circuits. We consider noise models that are represented by convex combinations of unitary channels and provide both upper and lower bounds for the Rademacher complexities of quantum circuits characterized by these noise models. In particular, we find a lower bound for the Rademacher complexity of noisy quantum circuits that depends on the Rademacher complexity of the corresponding noiseless quantum circuit as well as the free robustness of the circuit. Our results show that the Rademacher complexity of quantum circuits decreases with the increase in noise.	翻訳日:2023-04-09 02:29:54 公開日:2021-03-04
# 線形判別分析による量子次元の低減 Quantum Dimensionality Reduction by Linear Discriminant Analysis ( http://arxiv.org/abs/2103.03131v1 ) ライセンス: Link先を確認	Kai Yu, Gong-De Guo, and Song Lin	(参考訳) データの次元削減(DR)は、パターン認識やデータ分類など、多くの機械学習タスクにおいて重要な問題である。本稿では,次元削減のための線形判別分析(LDA)を効率的に行う量子アルゴリズムと量子回路を提案する。まず,提案アルゴリズムは既存の量子LDAアルゴリズムを改善し,元のアルゴリズムにおけるクラス間散乱行列 $S_B$ の非可逆性による誤差を回避する。次に,低次元データに対応する対象状態を得るための量子アルゴリズムと量子回路を提案する。最もよく知られた古典的アルゴリズムと比較すると、量子線形判別分析次元削減 (QLDADR) アルゴリズムは、元のデータセットが多対数的な低次元空間に投影されるとき、ベクトル数 $M$ について指数的な高速化を、元のデータ空間の次元 $D$ について二次の高速化を達成する。さらに、本アルゴリズムにより得られた対象状態は、他の量子機械学習タスクのサブモジュールとして使用できる。これは、それらのタスクを次元の呪いから解放するという実用的な価値を持つ。 Dimensionality reduction (DR) of data is a crucial issue for many machine learning tasks, such as pattern recognition and data classification. In this paper, we present a quantum algorithm and a quantum circuit to efficiently perform linear discriminant analysis (LDA) for dimensionality reduction. Firstly, the presented algorithm improves the existing quantum LDA algorithm to avoid the error caused by the irreversibility of the between-class scatter matrix $S_B$ in the original algorithm. Secondly, a quantum algorithm and quantum circuits are proposed to obtain the target state corresponding to the low-dimensional data. Compared with the best-known classical algorithm, the quantum linear discriminant analysis dimensionality reduction (QLDADR) algorithm has exponential acceleration on the number $M$ of vectors and a quadratic speedup on the dimensionality $D$ of the original data space, when the original dataset is projected onto a polylogarithmic low-dimensional space. Moreover, the target state obtained by our algorithm can be used as a submodule of other quantum machine learning tasks. It has practical value in freeing such tasks from the curse of dimensionality.	翻訳日:2023-04-09 02:29:42 公開日:2021-03-04
# 量子テレポーテーションに基づく量子情報マスキング Quantum information masking basing on quantum teleportation ( http://arxiv.org/abs/2103.03126v1 ) ライセンス: Link先を確認	Wei-Min Shang and Fu-Lin Zhang and Jing-Ling Chen	(参考訳) ノーマスキング定理は、二部系のシナリオでは量子情報のマスキングは不可能であると述べている。しかし、多部系では量子状態をマスクするスキームが存在する。本研究では,テレポーテーションにおける結合測定は,装置がシステム全体の量子参加者と見なされる場合,実際にはマスキングの過程であることを示す。この見方に基づき、2つの4部マスカーと1つの3部マスカーを提示する。前者のうちの1つは、Li と Wang [Phys. Rev. A 98, 062306 (2018)] によって与えられる4量子ビットスキームの任意次元への一般化を提供し、後者はまさに彼らの3部スキームである。量子状態の占有確率とコヒーレンスは、我々のスキームの2つのステップで隠蔽される。そしてその情報は、その逆のプロセスで自然に抽出できる。 The no-masking theorem says that masking quantum information is impossible in a bipartite scenario. However, there exist schemes to mask quantum states in multipartite systems. In this work, we show that, the joint measurement in the teleportation is really a masking process, when the apparatus is regarded as a quantum participant in the whole system. Based on the view, we present two four-partite maskers and a tripartite masker. One of the former provides a generalization in arbitrary dimension of the four-qubit scheme given by Li and Wang [Phys. Rev. A 98, 062306 (2018)], and the latter is precisely their tripartite scheme. The occupation probabilities and coherence of quantum states are masked in two steps of our schemes. And the information can be extracted naturally in their reverse processes.	翻訳日:2023-04-09 02:29:24 公開日:2021-03-04
# 経済異常検出のためのイベントベース動的バンキングネットワーク探索 Event-Based Dynamic Banking Network Exploration for Economic Anomaly Detection ( http://arxiv.org/abs/2103.03120v1 ) ライセンス: Link先を確認	Andry Alamsyah, Dian Puteri Ramadhani, Farida Titik Kristanti	(参考訳) 金融システムの不安定さは、銀行の破綻を引き起こし、流出を誘発し、金融システムに悪影響を及ぼす感染効果を生じさせ、最終的には経済に影響を及ぼす可能性がある。この現象は、高度に相互接続された銀行取引の結果である。銀行取引ネットワークは金融アーキテクチャのバックボーンと見なされている。銀行間の強い相互接続性は、銀行網全体に広がる伝染破壊をエスカレートし、システム全体の崩壊を引き起こす。これまでのところ、金融の不安定性は、主に制御されていない取引赤字量と未払いの対外債務をマクロアプローチで検出されている。本研究は、グローバルに銀行網構造を探索するマクロビューとモチーフと呼ばれる詳細なネットワークパターンに焦点を当てたマイクロビューを通して、別の視点で金融不安定検出を提案する。ネットワーク三進モチーフパターンは、金融不安定を検出するのに用いられる。不安定期間に関連する最も関連するネットワーク三進モチーフ変化を検出器として決定する。インドネシアの主要な宗教行事であるeid al-fitrとともに、金融不安定現象下の銀行ネットワークの挙動を考察する。我々は、金融不安定な基盤検出器として一つのモチーフパターンを発見する。この研究は金融システムの安定管理を支援するのに役立つ。 The instability of financial system issues might trigger a bank failure, evoke spillovers, and generate contagion effects which negatively impacted the financial system, ultimately on the economy. This phenomenon is the result of the highly interconnected banking transaction. The banking transactions network is considered as a financial architecture backbone. The strong interconnectedness between banks escalates contagion disruption spreading over the banking network and trigger the entire system collapse. This far, the financial instability is generally detected using macro approach mainly the uncontrolled transaction deficits amount and unpaid foreign debt. This research proposes financial instability detection in another point of view, through the macro view where the banking network structure are explored globally and micro view where focuses on the detailed network patterns called motif. Network triadic motif patterns used as a denomination to detect financial instability. The most related network triadic motif changes related to the instability period are determined as a detector. 
We explore the banking network behavior under financial instability phenomenon along with the major religious event in Indonesia, Eid al-Fitr. We discover one motif pattern as the financial instability underlying detector. This research helps to support the financial system stability supervision.	翻訳日:2023-04-09 02:29:01 公開日:2021-03-04
# インドネシアのEコマースデータに基づく中小企業の分類決定木アプローチによる販売予測モデル Sales Prediction Model Using Classification Decision Tree Approach For Small Medium Enterprise Based on Indonesian E-Commerce Data ( http://arxiv.org/abs/2103.03117v1 ) ライセンス: Link先を確認	Raden Johannes, Andry Alamsyah	(参考訳) インドネシアにおけるインターネット利用者の増加は、商業を含む日常生活の多くの側面に影響を及ぼす。インドネシアの中小企業は、新しいメディアの利点を生かして、オンラインコマースの意義を生かした。これまでのところ、過去の取引で売上と収益を予測できる実用的な実装は知られていない。本稿では,インドネシア最大のeコマースプロバイダであるTokopediaで収集した実生活データを用いて,インドネシアの靴産業における販売予測モデルを構築した。データマイニングは、データを処理することによって情報を集めるために使用できる分野である。本研究は,データマイニングにおける分類手法を用いて,市場のパターンを記述し,市場商品における地域の可能性を予測する。我々のアプローチは分類決定木に基づいている。われわれは、視聴者が販売する商品数、価格、靴の種類を予測することができた。 The growth of internet users in Indonesia gives an impact on many aspects of daily life, including commerce. Indonesian small-medium enterprises took this advantage of new media to derive their activity by the meaning of online commerce. Until now, there is no known practical implementation of how to predict their sales and revenue using their historical transaction. In this paper, we build a sales prediction model on the Indonesian footwear industry using real-life data crawled on Tokopedia, one of the biggest e-commerce providers in Indonesia. Data mining is a discipline that can be used to gather information by processing the data. By using the method of classification in data mining, this research will describe patterns of the market and predict the potential of the region in the national market commodities. Our approach is based on the classification decision tree. We managed to determine predicted the number of items sold by the viewers, price, and type of shoes.	翻訳日:2023-04-09 02:28:41 公開日:2021-03-04
# ノイズ量子ネットワークのロバスト性 Robustness of Noisy Quantum Networks ( http://arxiv.org/abs/2103.03266v1 ) ライセンス: Link先を確認	Bruno C. Coutinho, William J. Munro, Kae Nemoto and Yasser Omar	(参考訳) 量子ネットワークは複雑なネットワークの新しいパラダイムであり、ネットワーク化された量子技術を利用して量子インターネットを開発することができる。しかし、リンクとノードが故障し始めると、量子ネットワークはどのくらい堅牢か? 典型的雑音の量子リピータノードに基づく量子ネットワークは、動作リンクやノードのランダムな損失に対して不連続な相転移を起こしやすいことを示し、ネットワークの接続性を急激に妥協させ、その動作範囲を著しく制限することを示した。さらに,ネットワークトポロジー,ネットワークサイズ,ネットワーク内の絡み合い分布の関数として,この破滅的な接続損失を回避するために必要な臨界量子リピータ効率を決定する。特に,大規模量子インターネットの確立には,スケールフリートポロジーが重要な設計原理であることを示す。 Quantum networks are a new paradigm of complex networks, allowing us to harness networked quantum technologies and to develop a quantum internet. But how robust is a quantum network when its links and nodes start failing? We show that quantum networks based on typical noisy quantum-repeater nodes are prone to discontinuous phase transitions with respect to the random loss of operating links and nodes, abruptly compromising the connectivity of the network, and thus significantly limiting the reach of its operation. Furthermore, we determine the critical quantum-repeater efficiency necessary to avoid this catastrophic loss of connectivity as a function of the network topology, the network size, and the distribution of entanglement in the network. In particular, our results indicate that a scale-free topology is a crucial design principle to establish a robust large-scale quantum internet.	翻訳日:2023-04-09 02:20:42 公開日:2021-03-04
# ブロックチェーンプラットフォームの利用可能性に関する要件分析と評価 Requirement Analyses and Evaluations of Blockchain Platforms per Possible Use Cases ( http://arxiv.org/abs/2103.03209v1 ) ライセンス: Link先を確認	Kenji Saito, Akimitsu Shiseki, Mitsuyasu Takada, Hiroki Yamamoto, Masaaki Saitoh, Hiroaki Ohkawa, Hirofumi Andou, Naotake Miyamoto, Kazuaki Yamakawa, Kiyoshi Kurakawa, Tomohiro Yabushita, Yuji Yamada, Go Masuda, Kazuyuki Masuda	(参考訳) ブロックチェーンは、公開文書や私文書の管理から、さまざまな産業におけるトレーサビリティ、デジタル通貨に至るまで、幅広い方法で社会のデジタルトランスフォーメーションに寄与すると言われている。いわゆるブロックチェーンプラットフォームが数多く開発され、実験や応用が行われている。しかし、これらのプラットフォームは本当にブロックチェーンのコンセプトを実践するのに有効だろうか? 質問に答えるためには、ブロックチェーンと呼ばれる技術が何であるかをよりよく理解する必要がある。ブロックチェーンが何のために発明されたのか、それが何を意味するのかを理解する上で、我々が見る混乱を整理する必要がある。また,その応用構造を明らかにする必要がある。このドキュメントは、ブロックチェーンとそのアプリケーションを理解するための一般的なモデルを提供する。プラットフォームを分類するためにデザインパターンを導入する。アプリケーション間の構造を識別し,各ケースの機能的,性能的,運用的,法的要件を整理することにより,考えられるユースケースを分類する。分類と基準に基づいて、Hyperledger Fabric、Hyperledger Iroha、Hyperledger Indy、Ethereum、Quorum/Hyperledger Besu、Ethereum 2.0、Polkadot、Corda、BBc-1といったプラットフォームを評価し、比較した。評価と比較では公平を期したが、議論を誘発することも期待している。このドキュメントが想定する読者は、非エンジニアや非技術者を含め、ブロックチェーンとそのプラットフォームを理解したい、アプリケーションシステムの開発に関わるすべての人である。この文書のアセスメントにより、読者はブロックチェーンプラットフォームの技術的要件を理解し、既存の技術に疑問を呈し、想定するアプリケーションに適したプラットフォームを選択することができる。この比較は、新しいテクノロジーを設計するためのガイドとしても役立つだろう。 It is said that blockchain will contribute to the digital transformation of society in a wide range of ways, from the management of public and private documents to the traceability in various industries, as well as digital currencies. A number of so-called blockchain platforms have been developed, and experiments and applications have been carried out on them. But are these platforms really conducive to practical use of the blockchain concept? To answer the question, we need to better understand what the technology called blockchain really is. We need to sort out the confusion we see in understanding what blockchain was invented for and what it means. We also need to clarify the structure of its applications. 
This document provides a generic model of understanding blockchain and its applications. We introduce design patterns to classify the platforms. We categorize possible use cases by identifying the structure among applications, and organize the functional, performance, operational and legal requirements for each such case. Based on the categorization and criteria, we evaluated and compared the following platforms: Hyperledger Fabric, Hyperledger Iroha, Hyperledger Indy, Ethereum, Quorum/Hyperledger Besu, Ethereum 2.0, Polkadot, Corda and BBc-1. We have tried to be fair in our evaluations and comparisons, but we also expect to provoke discussion. The intended readership of this document is anyone involved in the development of application systems who wants to understand blockchain and its platforms, including non-engineers and non-technologists. The assessments in this document will allow readers to understand the technological requirements for the blockchain platforms, to question existing technologies, and to choose the appropriate platforms for the applications they envision. The comparisons hopefully will also be useful as a guide for designing new technologies.	翻訳日:2023-04-09 02:19:59 公開日:2021-03-04
# 量子コンピュータにおけるデジタル散逸ダイナミクスを用いた量子マルコフ連鎖モンテカルロ Quantum Markov Chain Monte Carlo with Digital Dissipative Dynamics on Quantum Computers ( http://arxiv.org/abs/2103.03207v1 ) ライセンス: Link先を確認	Mekena Metcalf, Emma Stone, Katherine Klymko, Alexander F. Kemper, Mohan Sarovar, and Wibe A. de Jong	(参考訳) 環境に接続された量子システムのダイナミクスのモデリングは、自然界のほとんどの量子プロセスが環境に影響されるため、複雑な量子プロセスの理解を進める上で非常に重要である。量子シミュレータ上のマクロ環境のモデリングは、システムと適切な方法でエネルギー交換を促進し、環境を模倣する独立したアンシラ量子ビットを結合することで達成できる。このアプローチには、非現実的な大規模な、おそらく指数関数的な自由度を必要とする。対照的に,少数のアンシラ量子ビットを用いて環境とのインタラクションをシミュレートするディジタル量子アルゴリズムを開発した。周期的なアンシラエネルギの変調(またはスペクトルコンピング)と周期的なリセット操作を組み合わせることで、大きな環境との相互作用を模倣し、相互作用する多体系の熱状態を生成することができる。逆イジングモデルの熱状態のシミュレーションによるアルゴリズムの評価を行った。このアルゴリズムは、多変量モデルのギブス分布のサンプリングを可能にする量子マルコフ連鎖モンテカルロ(qmcmc)プロセスとしても見ることができる。そこで本研究では,単純な確率的グラフィカルモデルのギブス分布のサンプリング精度を評価する。 Modeling the dynamics of a quantum system connected to the environment is critical for advancing our understanding of complex quantum processes, as most quantum processes in nature are affected by an environment. Modeling a macroscopic environment on a quantum simulator may be achieved by coupling independent ancilla qubits that facilitate energy exchange in an appropriate manner with the system and mimic an environment. This approach requires a large, and possibly exponential number of ancillary degrees of freedom which is impractical. In contrast, we develop a digital quantum algorithm that simulates interaction with an environment using a small number of ancilla qubits. By combining periodic modulation of the ancilla energies, or spectral combing, with periodic reset operations, we are able to mimic interaction with a large environment and generate thermal states of interacting many-body systems. We evaluate the algorithm by simulating preparation of thermal states of the transverse Ising model. Our algorithm can also be viewed as a quantum Markov chain Monte Carlo (QMCMC) process that allows sampling of the Gibbs distribution of a multivariate model. 
To demonstrate this we evaluate the accuracy of sampling Gibbs distributions of simple probabilistic graphical models using the algorithm.	翻訳日:2023-04-09 02:19:33 公開日:2021-03-04
# ポート割り当てとコンパイルによる線形光学不完全性の緩和 Mitigating linear optics imperfections via port allocation and compilation ( http://arxiv.org/abs/2103.03183v1 ) ライセンス: Link先を確認	Shreya P. Kumar, Leonhard Neuhaus, Lukas G. Helt, Haoyu Qi, Blair Morrison, Dylan H. Mahler, Ish Dhand	(参考訳) 線形光学は、室温で動作し、集積フォトニックプラットフォーム上でスケーラブルに製造できる量子技術を構築する有望な経路である。しかし、線形光学をスケールアップするには、避けられない製造上の不完全性のもとでも高性能に動作することが求められる。そこで本研究では, ポート割り当てとコンパイルをオンチップの不完全性に合わせて調整することで線形光干渉計の性能を向上させる手法を提案する。この不完全性は、我々が導入する適切なキャリブレーション手順によって事前に決定できる。代表例として、所定の干渉計の平均消費電力や、その上に実装されうるすべてのユニタリ変換にわたる消費電力値の範囲の劇的な削減を示す。さらに, 製造欠陥の存在下での所望の変換の忠実度向上にこれらの技術が有効であることを示す。関連する指標における線形光干渉計の性能を数桁改善することにより、これらのツールは光学技術を真の量子優位性の実証に近づける。 Linear optics is a promising route to building quantum technologies that operate at room temperature and can be manufactured scalably on integrated photonic platforms. However, scaling up linear optics requires high-performance operation amid inevitable manufacturing imperfections. We present techniques for enhancing the performance of linear optical interferometers by tailoring their port allocation and compilation to the on-chip imperfections, which can be determined beforehand by suitable calibration procedures that we introduce. As representative examples, we demonstrate dramatic reductions in the average power consumption of a given interferometer or in the range of its power consumption values across all possible unitary transformations implemented on it. Furthermore, we demonstrate the efficacy of these techniques at improving the fidelities of the desired transformations in the presence of fabrication defects. By improving the performance of linear optical interferometers in relevant metrics by several orders of magnitude, these tools bring optical technologies closer to demonstrating true quantum advantage.	翻訳日:2023-04-09 02:19:11 公開日:2021-03-04
# 観測不能因果ループの量子力学的概念とアントロピック原理 The quantum mechanical notion of unobservable causal loop and the anthropic principle ( http://arxiv.org/abs/2103.03173v1 ) ライセンス: Link先を確認	Giuseppe Castagnoli	(参考訳) 2つの1対1の相関した測定結果の間の可逆的量子過程の通常の記述は、因果関係の方向を規定しないことで、可逆的過程に必要な時間対称性に反する因果構造が許されるため不完全である。これはまた、単に時間対称性化することで完了できることを意味する。すなわち、最初の測定値と最後の測定値が、それらの相関した結果の選択に均等に寄与することを要求することである。これは説明を変更せずに残すが、因果構造が完全に定義される観測不能な時間対称性のインスタンスの量子重ね合わせであることを示している。それぞれのインスタンスは因果ループで構成されている:最後の測定は、単位変換の入力状態がその直前の状態につながるときに後方に変化する。前者の研究では、そのようなループが量子計算のスピードアップと量子非局所性を正確に説明できることが示されている。この研究で、量子スピードアップを伴う宇宙の進化を可能にする人類の原理の完成につながることを示した。 It can be argued that the ordinary description of the reversible quantum process between two one-to-one correlated measurement outcomes is incomplete because, by not specifying the direction of causality, it allows causal structures that violate the time symmetry that is required of a reversible process. This also means that it can be completed simply by time-symmetrizing it, namely by requiring that the initial and final measurements evenly contribute to the selection of their correlated pair of outcomes. This leaves the description unaltered but shows that it is the quantum superposition of unobservable time-symmetrized instances whose causal structure is completely defined. Each instance consists of a causal loop: the final measurement that changes backwards in time the input state of the unitary transformation that leads to the state immediately before it. In former works, we have shown that such loops exactly explain the quantum computational speedup and quantum nonlocality. In this work we show that they lead to a completion of the anthropic principle that allows a universe evolution with quantum speedup.	翻訳日:2023-04-09 02:18:55 公開日:2021-03-04
# 農場における現場作業の遠隔観察 Remote Observation of Field Work on the Farm ( http://arxiv.org/abs/2103.03163v1 ) ライセンス: Link先を確認	Wendy Ju, Ilan Mandel, Kevin Weatherwax, Leila Takayama, Nikolas Martelaro, Denis Willett	(参考訳) 旅行制限やソーシャルディスタンシング対策は、物理的フィールドワークの監視、監視、管理を困難にしている。本研究は,農場における現場作業の観察のために,道路内車両におけるリアルタイム遠隔観察・会話技術を適用した研究である。私たちは、ニューヨーク北部のKreher Eggsで、このプロジェクトのパイロット展開に協力しました。車両関連作業を行う農作業員の遠隔観察と面接を行うための機器を備えたトラクタを製作した。本研究は, 遠隔地から長期にわたる現場作業の継続観察を可能にするため, 本研究は, 本研究の現状を踏まえ, 地理的・身体的距離が標準となった場合の観察研究の実施方法についてのケーススタディを提供する。我々は, 現場で遠隔観察研究を行おうとする人々に対して, 経験を議論し, 予備的な知見を提供する。 Travel restrictions and social distancing measures make it difficult to observe, monitor or manage physical fieldwork. We describe research in progress that applies technologies for real-time remote observation and conversation in on-road vehicles to observe field work on a farm. We collaborated on a pilot deployment of this project at Kreher Eggs in upstate New York. We instrumented a tractor with equipment to remotely observe and interview farm workers performing vehicle-related work. This work was initially undertaken to allow sustained observation of field work over longer periods of time from geographically distant locales; given our current situation, this work provides a case study in how to perform observational research when geographic and bodily distance have become the norm. We discuss our experiences and provide some preliminary insights for others looking to conduct remote observational research in the field.	翻訳日:2023-04-09 02:18:34 公開日:2021-03-04
# ボームの量子ポテンシャルの非局所位相場モデル A non local phase field model of Bohm's quantum potential ( http://arxiv.org/abs/2103.03162v1 ) ライセンス: Link先を確認	Roberto Mauri	(参考訳) 気体の自由エネルギーがその質量密度の対数に非局所的に依存すると仮定すると、結果として生じる運動方程式の体力は密度勾配項の和からなる。 2項目の後にこの級数、ボームの量子ポテンシャルとマドルング方程式は同一に得られ、量子力学の定式化に繋がった仮説のいくつかが非局所性に基づく古典的な解釈を受け入れていることを示している。 Assuming that the free energy of a gas depends non-locally on the logarithm of its mass density, the body force in the resulting equation of motion consists of the sum of density gradient terms. Truncating this series after the second term, Bohm's quantum potential and the Madelung equation are identically obtained, showing explicitly that some of the hypotheses that led to the formulation of quantum mechanics admit a classical interpretation based on non-locality.	翻訳日:2023-04-09 02:18:21 公開日:2021-03-04
# 新規サチンボワーバード最適化装置によるコンクリートの一軸圧縮強度の解析 Analyzing Uniaxial Compressive Strength of Concrete Using a Novel Satin Bowerbird Optimizer ( http://arxiv.org/abs/2103.15547v1 ) ライセンス: Link先を確認	Hossein Moayedi, Amir Mosavi	(参考訳) コンクリートの力学パラメータを解析する複雑さを克服するには, 適切な方法を選択する必要がある。本研究では, コンクリートの一軸圧縮強度(ucs)を予測するために, 人工ニューラルネットワーク (ann) と, satin bowerbird optimizer (sbo) という新しいメタヒューリスティックな手法を統合した。この目的のために作成されたハイブリッドは、公開された文献から収集された比較的大きなデータセットを使用してトレーニングされ、テストされる。その他の3つの新しいアルゴリズム、Henry Gas Solubility Optimization (HGSO)、Sunflower Optimization (SFO)、VSA (Vortex search algorithm) もベンチマークとして使用されている。様々な精度指標を用いて,全アルゴリズムの適切な集団サイズを達成した後,提案手法はUCSの挙動を良好に解析するだけでなく,3つのベンチマークハイブリッド(ANN-HGSO,ANN-SFO,ANN-VSA)全てに優れていた。予測フェーズでは, 0.87394, 0.87936, 0.95329, 0.95663の相関指標と, ANN-HGSO, ANN-SFO, ANN-VSA, ANN-SBOで算出された15.9719, 15.3845, 9.4970, 8.0629%の平均絶対誤差は, それぞれ最高の予測性能を示した。また、ANN-VSAも信頼できる結果を得た。要するにann-sboは、技術者がコンクリートのucsを予測するための効率的な非破壊的手法として使用できる。 Surmounting the complexities in analyzing the mechanical parameters of concrete entails selecting an appropriate methodology. This study integrates an artificial neural network (ANN) with a novel metaheuristic technique, namely satin bowerbird optimizer (SBO) for predicting uniaxial compressive strength (UCS) of concrete. For this purpose, the created hybrid is trained and tested using a relatively large dataset collected from the published literature. Three other new algorithms, namely Henry gas solubility optimization (HGSO), sunflower optimization (SFO), and vortex search algorithm (VSA) are also used as benchmarks. After attaining a proper population size for all algorithms, Utilizing various accuracy indicators, it was shown that the proposed ANN-SBO not only can excellently analyze the UCS behavior, but also outperforms all three benchmark hybrids (i.e., ANN-HGSO, ANN-SFO, and ANN-VSA). 
In the prediction phase, correlation indices of 0.87394, 0.87936, 0.95329, and 0.95663 and mean absolute percentage errors of 15.9719%, 15.3845%, 9.4970%, and 8.0629%, calculated for the ANN-HGSO, ANN-SFO, ANN-VSA, and ANN-SBO, respectively, showed that the proposed model gave the best prediction performance. The ANN-VSA also achieved reliable results. In short, the ANN-SBO can be used by engineers as an efficient non-destructive method for predicting the UCS of concrete.	翻訳日:2023-04-09 02:11:54 公開日:2021-03-04
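The two accuracy indicators quoted in the record above can be computed as in the following minimal sketch. It assumes that "correlation index" means the Pearson correlation coefficient, which the abstract does not define, and the sample values are made up for illustration rather than taken from the paper. The function names `mape` and `pearson` are hypothetical.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Illustrative (made-up) strengths in MPa, not data from the paper.
actual = [20.0, 40.0, 50.0]
predicted = [22.0, 36.0, 50.0]
error_percent = mape(actual, predicted)
correlation = pearson(actual, predicted)
```

A lower `error_percent` and a `correlation` closer to 1 both indicate a better fit, which is the sense in which the abstract ranks the four hybrids.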
# 誰が何をする? コンピュータサイエンス学生チームにおける作業分割と配置戦略 Who does what? Work division and allocation strategies of computer science student teams ( http://arxiv.org/abs/2103.09048v1 ) ライセンス: Link先を確認	Anna van der Meulen, Efthimia Aivaloglou	(参考訳) コラボレーションスキルは将来のソフトウェアエンジニアにとって重要だ。コンピュータサイエンス教育では、これらのスキルは、学生が共同でソフトウェアを開発するグループ課題を通じて実践されることが多い。これらの課題に学生が取り組むアプローチは様々であるが、しばしば分業を伴う。そして、コラボレーションが現在も行われているかどうかを議論することができる。コンピューティング教育の分野はこの文脈で特に興味深いのは、特定の特徴(例えば、エントリスキルのレベルの変化やコラボレーションプラットフォームとしてのソースコードリポジトリの使用など)がグループワークで取られたアプローチに影響を与える可能性があるからである。本研究の目的は,集団課題におけるコンピュータサイエンスの学生の作業分割とアロケーション戦略の洞察を得ることである。この結果、4つの大学の20人の学生にインタビューを行った。テーマ分析は、ペアプログラミングとコードレビューが頻繁に採用され、学生は独立して働くためにワークロードを分割する傾向があることを示している。学生は主に成績と効率の要素に動機付けられ、主に専門知識と選好に基づいてタスクを選択して割り当てる。本研究の結果から,グループ課題の設定は,新しいソフトウェア工学のスキルを実践する学生のモチベーションを制限し,実験と学習を促進するためには介入が必要であると論じている。 Collaboration skills are important for future software engineers. In computer science education, these skills are often practiced through group assignments, where students develop software collaboratively. The approach that students take in these assignments varies widely, but often involves a division of labour. It can then be argued whether collaboration still takes place. The discipline of computing education is especially interesting in this context, because some of its specific features (such as the variation in entry skill level and the use of source code repositories as collaboration platforms) are likely to influence the approach taken within groupwork. The aim of this research is to gain insight into the work division and allocation strategies applied by computer science students during group assignments. To this end, we interviewed twenty students of four universities. The thematic analysis shows that students tend to divide up the workload to enable working independently, with pair programming and code reviews being often employed. 
Motivated primarily by grade and efficiency factors, students choose and allocate tasks primarily based on their prior expertise and preferences. Based on our findings, we argue that the setup of group assignments can limit student motivation for practicing new software engineering skills, and that interventions are needed towards encouraging experimentation and learning.	翻訳日:2023-04-09 02:11:23 公開日:2021-03-04
# ニューラルネットワークにおけるクラスタ性 Clusterability in Neural Networks ( http://arxiv.org/abs/2103.03386v1 ) ライセンス: Link先を確認	Daniel Filan, Stephen Casper, Shlomi Hod, Cody Wild, Andrew Critch, Stuart Russell	(参考訳) ニューラルネットワークの学習重量は、しばしば精査可能な内部構造を持たないと考えられている。本稿では,ネットワークを内部接続性が高く,外部接続性が弱いニューロン群に分割可能かという,クラスタ性という形での構造を考察する。トレーニングされたニューラルネットワークは、通常ランダムに初期化されたネットワークよりもクラスタリング可能であり、しばしば同じ重みの分布を持つランダムネットワークに対してクラスタリング可能である。また,ニューラルネットワークの学習においてクラスタ性を促進する新しい手法を示し,多層パーセプトロンでは,精度を低下させることなく,よりクラスタ性の高いネットワークを実現する。ニューラルネットワークのクラスタビリティを理解して制御することで、意味のあるクラスタへのパーティショニングを容易にすることで、内部処理をよりエンジニアに解釈可能にすることが期待できる。 The learned weights of a neural network have often been considered devoid of scrutable internal structure. In this paper, however, we look for structure in the form of clusterability: how well a network can be divided into groups of neurons with strong internal connectivity but weak external connectivity. We find that a trained neural network is typically more clusterable than randomly initialized networks, and often clusterable relative to random networks with the same distribution of weights. We also exhibit novel methods to promote clusterability in neural network training, and find that in multi-layer perceptrons they lead to more clusterable networks with little reduction in accuracy. Understanding and controlling the clusterability of neural networks will hopefully render their inner workings more interpretable to engineers by facilitating partitioning into meaningful clusters.	翻訳日:2023-04-09 02:10:47 公開日:2021-03-04
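The notion of clusterability in the record above (strong connectivity inside groups, weak connectivity between them) can be illustrated with the normalized cut of a weighted graph. This is a minimal sketch: the abstract does not say which measure the authors use, so the choice of normalized cut is an assumption here, and the function name `normalized_cut` and the toy weights are hypothetical.

```python
def normalized_cut(weights, labels):
    """Normalized cut of a partition of a weighted undirected graph.

    weights: symmetric matrix (list of lists) of non-negative edge weights.
    labels:  cluster id for each node.
    Lower values mean the partition separates the graph more cleanly.
    """
    n = len(weights)
    total = 0.0
    for cluster in set(labels):
        inside = [i for i in range(n) if labels[i] == cluster]
        # Volume: total weight of all edges touching nodes in the cluster.
        volume = sum(weights[i][j] for i in inside for j in range(n))
        # Cut: weight of edges that leave the cluster.
        cut = sum(
            weights[i][j]
            for i in inside
            for j in range(n)
            if labels[j] != cluster
        )
        if volume > 0:
            total += cut / volume
    return total


# Toy graph: nodes 0-1 and 2-3 are strongly linked, cross links are weak.
toy_weights = [
    [0.0, 1.0, 0.1, 0.0],
    [1.0, 0.0, 0.0, 0.1],
    [0.1, 0.0, 0.0, 1.0],
    [0.0, 0.1, 1.0, 0.0],
]
good_partition = normalized_cut(toy_weights, [0, 0, 1, 1])
bad_partition = normalized_cut(toy_weights, [0, 1, 0, 1])
```

For a neural network one would, under the same assumption, build `weights` from the absolute values of the learned connection weights between neurons.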
# チップ上の圧縮量子マイクロコンブ A squeezed quantum microcomb on a chip ( http://arxiv.org/abs/2103.03380v1 ) ライセンス: Link先を確認	Zijiao Yang, Mandana Jahanbozorgi, Dongin Jeong, Shuman Sun, Olivier Pfister, Hansuek Lee, Xu Yi	(参考訳) 光マイクロ共鳴器ベースの周波数コム(microcomb)は、非線形物理学研究のための汎用プラットフォームを提供し、メトロロジーから分光まで幅広い応用がある。決定論的量子状態(Deterministic quantum regime)は、何百もの等価周波数モードの間の無条件の絡み合いが、スケーラブルな普遍量子コンピューティングや量子ネットワークにとって重要な要素となる、マイクロコムの未発見の側面である。ここでは、シリコンチップ上のシリカマイクロ共振器において決定論的量子マイクロコンブを示す。連続変動可能な40の量子モードは、通信波長で1thzの光スパン内で20個の2モードのスクイーズドコム対の形で観測される。 1.6dBの最大原料スクイーズが得られる。量子マイクロコムの周波数等距離を特徴付けるために高分解能分光計測法を開発した。実験では, 周波数多重量子状態と集積光学を活用し, 分光, 量子計測, スケーラブルな量子情報処理の分野における新たな道を開拓する可能性を示す。 The optical microresonator-based frequency comb (microcomb) provides a versatile platform for nonlinear physics studies and has wide applications ranging from metrology to spectroscopy. Deterministic quantum regime is an unexplored aspect of microcombs, in which unconditional entanglements among hundreds of equidistant frequency modes can serve as critical ingredients to scalable universal quantum computing and quantum networking. Here, we demonstrate a deterministic quantum microcomb in a silica microresonator on a silicon chip. 40 continuous-variable quantum modes, in the form of 20 simultaneously two-mode squeezed comb pairs, are observed within 1 THz optical span at telecommunication wavelengths. A maximum raw squeezing of 1.6 dB is attained. A high-resolution spectroscopy measurement is developed to characterize the frequency equidistance of quantum microcombs. Our demonstration offers the possibility to leverage deterministically generated, frequency multiplexed quantum states and integrated photonics to open up new avenues in fields of spectroscopy, quantum metrology, and scalable quantum information processing.	翻訳日:2023-04-09 02:10:33 公開日:2021-03-04
# 時空は全体として Spacetime Paths as a Whole ( http://arxiv.org/abs/2103.03364v1 ) ライセンス: Link先を確認	Sky Nelson-Isaacs	(参考訳) 量子力学における非相対論的波動関数伝播とスカラー回折理論における画像伝播の数学的類似性は、時空全体を通しての時間と経路の新たな理解を促進するために用いられる。ファインマンによる非相対論的量子力学の経路積分公式の元々の導出は、時空を通るすべての可能な経路の和として振幅を計算するのに時間スライシングを用いることで知られている。ここでは、外部時間パラメータを持たないため、通常の意味での変化や進化ができない3+1D時空波分布とその4次元双対を公式に開発する。時間」は「外から」である。与えられた3+1D運動量表現は、システム全体の時空挙動を記述した完全な動的情報を符号化する。ホログラムの数学との比較を行い、単純な系に対する運動の性質を導出する。 The mathematical similarities between non-relativistic wavefunction propagation in quantum mechanics and image propagation in scalar diffraction theory are used to develop a novel understanding of time and paths through spacetime as a whole. It is well known that Feynman's original derivation of the path integral formulation of non-relativistic quantum mechanics uses time-slicing to calculate amplitudes as sums over all possible paths through space, but along a definite curve through time. Here, a 3+1D spacetime wave distribution and its 4-momentum dual are formally developed which have no external time parameter and therefore cannot change or evolve in the usual sense. Time is thus seen "from the outside". A given 3+1D momentum representation of a system encodes complete dynamical information, describing the system's spacetime behavior as a whole. A comparison is made to the mathematics of holograms, and properties of motion for simple systems are derived.	翻訳日:2023-04-09 02:10:13 公開日:2021-03-04
# オンラインデートプラットフォームにおける介入評価のためのエージェントベースモデル An Agent-based Model to Evaluate Interventions on Online Dating Platforms to Decrease Racial Homogamy ( http://arxiv.org/abs/2103.03332v1 ) ライセンス: Link先を確認	Stefania Ionescu, Aniko Hannak, Kenneth Joseph	(参考訳) おそらく今日のオンラインプラットフォームの研究で最も議論を呼んでいる疑問は、プラットフォーム上で生じる社会的な害を軽減するために、プラットフォームがどこまで介入できるかである。議論の余地は、オンラインいじめなど、プラットフォームが対処できる効果的な、永続的な介入があるかどうか、あるいは、そのような問題に対処するためにより広範囲にわたる変更が必要であるかどうかである。このような疑問に対処するには実証的な作業が不可欠だ。しかし、それはまた、時間がかかり、高価であり、時には企業が問うべき質問に制限されるため、難しい。本稿では,エージェント・ベース・モデリング(ABM)アプローチを提案する。応用として、シミュレーションされたオンラインデートプラットフォームへの介入が、人工社会における長期の人種間関係の欠如に与える影響を分析する。現実の世界では、異人種間関係の欠如は不平等を維持する重要な手段である。我々の研究は、オンラインデートプラットフォームが、ウェブサイトからの人種間関係の数を増やすために、これまで想定されていた多くの介入が限定的な効果を示し、いかなる介入の有効性も社会文化的構造に関する仮定の対象となっていることを示している。さらに、長期的な関係における多様性の増大に有効な介入は、プラットフォームの利益志向の目標に反する。一般的なレベルでは、ABMアプローチを用いてプラットフォームが持つ可能性のあるさまざまな介入の潜在的な影響と副作用を理解することの価値を示す。 Perhaps the most controversial questions in the study of online platforms today surround the extent to which platforms can intervene to reduce the societal ills perpetrated on them. Up for debate is whether there exist any effective and lasting interventions a platform can adopt to address, e.g., online bullying, or if other, more far-reaching change is necessary to address such problems. Empirical work is critical to addressing such questions. But it is also challenging, because it is time-consuming, expensive, and sometimes limited to the questions companies are willing to ask. To help focus and inform this empirical work, we here propose an agent-based modeling (ABM) approach. As an application, we analyze the impact of a set of interventions on a simulated online dating platform on the lack of long-term interracial relationships in an artificial society. In the real world, a lack of interracial relationships is a critical vehicle through which inequality is maintained. 
Our work shows that many previously hypothesized interventions online dating platforms could take to increase the number of interracial relationships from their website have limited effects, and that the effectiveness of any intervention is subject to assumptions about sociocultural structure. Further, interventions that are effective in increasing diversity in long-term relationships are at odds with platforms' profit-oriented goals. At a general level, the present work shows the value of using an ABM approach to help understand the potential effects and side effects of different interventions that a platform could take.	翻訳日:2023-04-09 02:09:57 公開日:2021-03-04
# 量子力学における摂動温度 Temperature as perturbation in quantum mechanics ( http://arxiv.org/abs/2103.03306v1 ) ライセンス: Link先を確認	Ashkan Shekaari and Mahmoud Jafari	(参考訳) 摂動的アプローチは、非相対論的量子力学の温度依存バージョンを低温度限界で開発するために採用された。したがって、一般化された自己整合ハミルトニアンは任意の量子力学系に対して、基底状態ハミルトニアンが絶対零点での制限ケースであるように構成された。利害関係と直近の環境を結び付ける弱結合項は摂動として扱われた。得られた一般化ハミルトニアンを、箱の中の自由粒子、真空中の自由粒子、高調波振動子を含む完全な零温度解を持ついくつかの典型的な量子系に適用すると、関連するハミルトニアン、エネルギースペクトル、波動関数は低温限界と一致するように修正された。さらに、箱の中の自由粒子の残留確率によるある種の量子トンネル効果が、貯水池への熱的結合の主な結果として明らかになった。また, 熱環境が波動関数の主特性に及ぼす影響についても詳しく検討し, 考察した。 The perturbative approach was adopted to develop a temperature-dependent version of non-relativistic quantum mechanics in the limit of low-enough temperatures. A generalized, self-consistent Hamiltonian was therefore constructed for an arbitrary quantum-mechanical system in a way that the ground-state Hamiltonian turned out to be just a limiting case at absolute zero. The weak-coupling term connecting the system of interest and its immediate environment was accordingly treated as the perturbation. Applying the obtained generalized Hamiltonian to some typical quantum systems with exact zero-temperature solutions, including the free particle in a box, the free particle in vacuum, and the harmonic oscillator, up to the first order of self-consistency, therefore corrected their associated Hamiltonians, energy spectrums, and wavefunctions to be consistent with the low-temperature limit. Further investigation revealed some kind of quantum tunneling effect by a residual probability for the free particle in a box, as a chief consequence of thermally coupling to the reservoir. The possible effects of thermal environment on the main properties of the wavefunctions were also thoroughly examined and discussed.	翻訳日:2023-04-09 02:09:33 公開日:2021-03-04
# 反Jaynes-Cummingsモデルにおける2量子ビット量子制御NOT論理ゲートと1量子ビットアダマール論理ゲートの理論的実現 Theoretical realization of a two qubit quantum controlled-not logic gate and a single qubit Hadamard logic gate in the anti-Jaynes-Cummings model ( http://arxiv.org/abs/2103.03297v1 ) ライセンス: Link先を確認	Christopher Mayero and Joseph Akeyo Omolo and Stephen Onyango Okeyo	(参考訳) 反Jaynes-Cummings相互作用過程におけるアダマール論理ゲートおよび量子制御NOT論理ゲートの演算を実現するための理論的スキームを提供する。特定の初期原子状態に対する標準アダマール演算は、反Jaynes-Cummings量子ビット状態遷移演算において特定の和周波数と光子数を設定することにより達成され、状態遷移は反Jaynes-Cummingsハミルトニアンの相互作用成分によって生成される。量子制御NOT論理ゲートは、2次元ヒルベルト空間で定義される1つの原子量子ビットが制御量子ビットであり、2次元ヒルベルト空間で定義される2つの非退化および直交偏極キャビティがターゲット量子ビットを作るときに実現される。正規化されているが直交しない基本量子ビット状態ベクトルで張られる反Jaynes-Cummings部分空間で定義される反Jaynes-Cummings量子ビット状態遷移演算における相互作用時間の正確な選択により、量子制御NOT演算における成功の理想的な単位確率を得る。 We provide a theoretical scheme for realizing Hadamard and quantum controlled-NOT logic gate operations in the anti-Jaynes-Cummings interaction process. The standard Hadamard operation for a specified initial atomic state is achieved by setting a specific sum frequency and photon number in the anti-Jaynes-Cummings qubit state transition operation, with the interaction component of the anti-Jaynes-Cummings Hamiltonian generating the state transitions. The quantum controlled-NOT logic gate is realized when a single atomic qubit defined in a two-dimensional Hilbert space is the control qubit and two non-degenerate and orthogonal polarized cavities defined in a two-dimensional Hilbert space make the target qubit. With a precise choice of interaction time in the anti-Jaynes-Cummings qubit state transition operations defined in the anti-Jaynes-Cummings sub-space spanned by normalized but non-orthogonal basic qubit state vectors, we obtain ideal unit probabilities of success in the quantum controlled-NOT operations.	翻訳日:2023-04-09 02:09:12 公開日:2021-03-04
# 温度:量子力学における無視因子 Temperature: The ignored factor in quantum mechanics ( http://arxiv.org/abs/2001.05212v2 ) ライセンス: Link先を確認	Ashkan Shekaari and Mahmoud Jafari	(参考訳) 古典的熱力学の法則と統計力学の標準アンサンブルスキームを用いて、非相対論的量子力学の枠組みに温度をパラメータとして導入する理論的形式論を開発した。自己整合ハミルトニアンが与えられた量子多体系のために構築され、この系は対応する零温度ハミルトニアンに付加される補正項の形で温度の影響を含む。粒子・イン・ア・ボックスモデル、自由粒子、および我々の有限温度内の調和振動子を含む、厳密なゼロ温度解を持つ量子力学系の研究は、その波関数とエネルギースペクトルに対する物理的に受け入れられない振る舞いに遭遇することなく、これらの系を絶対零点以上で記述する温度依存ハミルトニアンにつながった。結果は、有限温度の量子力学系がゼロ温度励起状態にあるかのように振る舞うという見解を強く支持する。 We have developed a theoretical formalism to introduce temperature as a parameter into the framework of non-relativistic quantum mechanics using the laws of classical thermodynamics and the canonical ensemble scheme of statistical mechanics. A self-consistent Hamiltonian has then been constructed for a given quantum many-body system which includes the effect of temperature in the form of correction terms added to the corresponding zero-temperature Hamiltonian of the system. Investigating some quantum mechanical systems with exact zero-temperature solutions including the particle-in-a-box model, the free particle, and the harmonic oscillator within our finite-temperature approach up to the first order of self-consistency has led to temperature-dependent Hamiltonians describing these systems above absolute zero without encountering any physically unacceptable brand of behavior for their wave functions and energy spectra. Results firmly support the view that a quantum mechanical system at a finite temperature behaves as if it is in a zero-temperature excited state.	翻訳日:2023-01-11 06:51:54 公開日:2021-03-04
# 一般ランクを有するスパイクウィグナーモデルにおける弱検出 Weak Detection in the Spiked Wigner Model with General Rank ( http://arxiv.org/abs/2001.05676v3 ) ライセンス: Link先を確認	Ji Hyung Jung, Hye Won Chung, and Ji Oon Lee	(参考訳) 加算ウィグナー雑音を伴う'signal+noise'型行列モデルから信号を検出する統計的決定過程について検討した。本稿では,信号の分布や雑音に依存しないデータ行列の線形スペクトル統計に基づく仮説テストを提案する。信号対雑音比が小さい場合、このテストはガウス雑音の下で最適であり、タイプIとタイプIIの誤差の総和を最小化する。非ガウス雑音下では、データ行列へのエントリワイズ変換によりテストを改善することができる。また,優先度が分かっていない場合の信号のランクを推定するアルゴリズムも導入する。 We study the statistical decision process of detecting the signal from a `signal+noise' type matrix model with an additive Wigner noise. We propose a hypothesis test based on the linear spectral statistics of the data matrix, which does not depend on the distribution of the signal or the noise. The test is optimal under the Gaussian noise if the signal-to-noise ratio is small, as it minimizes the sum of the Type-I and Type-II errors. Under the non-Gaussian noise, the test can be improved with an entrywise transformation to the data matrix. We also introduce an algorithm that estimates the rank of the signal when it is not known a priori.	翻訳日:2023-01-11 00:14:17 公開日:2021-03-04
# 最大プールネットワークの最適化と一般化解析 An Optimization and Generalization Analysis for Max-Pooling Networks ( http://arxiv.org/abs/2002.09781v4 ) ライセンス: Link先を確認	Alon Brutzkus, Amir Globerson	(参考訳) 最大プール操作は、ディープラーニングアーキテクチャのコアコンポーネントである。特に、プーリングはパターン検出問題に対する自然なアプローチであるため、マシンビジョンで使用されるほとんどの畳み込みアーキテクチャの一部である。しかし、これらのアーキテクチャは理論的観点からはあまり理解されていない。例えば、それらがいつグローバルに最適化できるか、そして、超パラメータ化が一般化に与える影響は理解できない。ここでは、畳み込み最大プールアーキテクチャの理論解析を行い、グローバルに最適化でき、高度にパラメータ化されたモデルでもうまく一般化できることを示した。本研究では,パターン検出問題に触発されたデータ生成分布に着目した。我々は,理論結果から予測されるように,CNNが完全に接続されたネットワークよりも優れていることを実証的に検証した。 Max-Pooling operations are a core component of deep learning architectures. In particular, they are part of most convolutional architectures used in machine vision, since pooling is a natural approach to pattern detection problems. However, these architectures are not well understood from a theoretical perspective. For example, we do not understand when they can be globally optimized, and what is the effect of over-parameterization on generalization. Here we perform a theoretical analysis of a convolutional max-pooling architecture, proving that it can be globally optimized, and can generalize well even for highly over-parameterized models. Our analysis focuses on a data generating distribution inspired by pattern detection problem, where a "discriminative" pattern needs to be detected among "spurious" patterns. We empirically validate that CNNs significantly outperform fully connected networks in our setting, as predicted by our theoretical results.	翻訳日:2022-12-29 19:03:07 公開日:2021-03-04
# 形状先行記憶からの単視点3次元物体再構成 Single-View 3D Object Reconstruction from Shape Priors in Memory ( http://arxiv.org/abs/2003.03711v3 ) ライセンス: Link先を確認	Shuo Yang, Min Xu, Haozhe Xie, Stuart Perry, Jiahao Xia	(参考訳) 画像特徴を3次元表現に変換するために,既存の3次元オブジェクト再構成手法を直接学習する。しかし,これらの手法は,高品質な3次元形状を再構成するのに十分な情報を含んでいないため,ノイズの多い背景と重閉塞を含む画像に対して脆弱である。人間は通常、画像から不完全または騒がしい視覚手がかりを使用して、記憶から類似した3d形状を取得し、物体の3d形状を再構築する。そこで我々はMem3Dという新しい手法を提案し,画像の欠落した情報を補うために,形状の先行を明示的に構築する。具体的には、形状優先は、トレーニング中によく設計された書き方によって格納されるメモリネットワーク内の「イメージボクセル」ペアの形式である。また,入力画像に強く関連した正確な3次元形状を形状先行から検索するためのボクセル三重項損失関数を提案する。 lstmベースの形状エンコーダは、取得した3d形状から情報を抽出するために導入され、非常に閉塞された、あるいは複雑な環境での物体の3d形状の復元に有用である。実験により,Mem3Dは再構成品質を著しく向上し,ShapeNetおよびPix3Dデータセットの最先端手法に対して良好な性能を発揮することが示された。 Existing methods for single-view 3D object reconstruction directly learn to transform image features into 3D representations. However, these methods are vulnerable to images containing noisy backgrounds and heavy occlusions because the extracted image features do not contain enough information to reconstruct high-quality 3D shapes. Humans routinely use incomplete or noisy visual cues from an image to retrieve similar 3D shapes from their memory and reconstruct the 3D shape of an object. Inspired by this, we propose a novel method, named Mem3D, that explicitly constructs shape priors to supplement the missing information in the image. Specifically, the shape priors are in the forms of "image-voxel" pairs in the memory network, which is stored by a well-designed writing strategy during training. We also propose a voxel triplet loss function that helps to retrieve the precise 3D shapes that are highly related to the input image from shape priors. The LSTM-based shape encoder is introduced to extract information from the retrieved 3D shapes, which are useful in recovering the 3D shape of an object that is heavily occluded or in complex environments. 
Experimental results demonstrate that Mem3D significantly improves reconstruction quality and performs favorably against state-of-the-art methods on the ShapeNet and Pix3D datasets.	翻訳日:2022-12-25 14:25:25 公開日:2021-03-04
# 行動・テキストデータに基づく分類器のメタ特徴に基づく規則抽出 Metafeatures-based Rule-Extraction for Classifiers on Behavioral and Textual Data ( http://arxiv.org/abs/2003.04792v3 ) ライセンス: Link先を確認	Yanou Ramon, David Martens, Theodoros Evgeniou, Stiene Praet	(参考訳) 行動データとテキストデータの機械学習モデルは、非常に正確な予測モデルをもたらすが、しばしば解釈するのが非常に困難である。複雑な「ブラックボックス」モデルの予測精度とグローバルな説明可能性を組み合わせたルール抽出手法が提案されている。しかし,多くの特徴が予測に関係している高次元スパースデータのコンテキストにおけるルール抽出は,ブラックボックスモデルを多くのルールで置き換えることによって,ユーザを理解不能な説明に戻すため,困難である。この問題に対処するため,我々は,高レベルで低スパースなメタ機能に基づく規則抽出手法を開発し,テストする。分析の鍵となる発見は、説明の忠実性によって測定されるように、メタ特徴に基づく説明がブラックボックス予測モデルの振る舞いを模倣するのに役立つことである。 Machine learning models on behavioral and textual data can result in highly accurate prediction models, but are often very difficult to interpret. Rule-extraction techniques have been proposed to combine the desired predictive accuracy of complex "black-box" models with global explainability. However, rule-extraction in the context of high-dimensional, sparse data, where many features are relevant to the predictions, can be challenging, as replacing the black-box model by many rules leaves the user again with an incomprehensible explanation. To address this problem, we develop and test a rule-extraction methodology based on higher-level, less-sparse metafeatures. A key finding of our analysis is that metafeatures-based explanations are better at mimicking the behavior of the black-box prediction model, as measured by the fidelity of explanations.	翻訳日:2022-12-24 20:37:23 公開日:2021-03-04
# 実世界の強化学習の課題に関する実証的研究 An empirical investigation of the challenges of real-world reinforcement learning ( http://arxiv.org/abs/2003.11881v2 ) ライセンス: Link先を確認	Gabriel Dulac-Arnold and Nir Levine and Daniel J. Mankowitz and Jerry Li and Cosmin Paduraru and Sven Gowal and Todd Hester	(参考訳) 強化学習(RL)は、一連の人工ドメインでその価値を証明し、現実のシナリオでいくつかの成功を示し始めている。しかしながら、RLにおける研究の進歩の多くは、実際に満たされることがほとんどない一連の仮定のため、現実世界のシステムで活用することが難しい。本研究は,RLが現実のシステムに一般的に展開されるために必要な困難を具現化した,一連の独立した課題を特定し,定式化する。それぞれの課題について,マルコフ決定過程の文脈で形式的に定義し,その課題が最先端学習アルゴリズムに与える影響を分析し,それに取り組むための既存の試みを提示する。私たちの提案する課題に対処するアプローチは、現実世界の多くの問題に対して容易にデプロイできると信じています。提案する課題は,オープンソースベンチマークとして提案するrealworldrl-suiteと呼ばれる,一連の継続的制御環境に実装されている。 Reinforcement learning (RL) has proven its worth in a series of artificial domains, and is beginning to show some successes in real-world scenarios. However, much of the research advances in RL are hard to leverage in real-world systems due to a series of assumptions that are rarely satisfied in practice. In this work, we identify and formalize a series of independent challenges that embody the difficulties that must be addressed for RL to be commonly deployed in real-world systems. For each challenge, we define it formally in the context of a Markov Decision Process, analyze the effects of the challenge on state-of-the-art learning algorithms, and present some existing attempts at tackling it. We believe that an approach that addresses our set of proposed challenges would be readily deployable in a large number of real world problems. Our proposed challenges are implemented in a suite of continuous control environments called the realworldrl-suite, which we propose as an open-source benchmark.	翻訳日:2022-12-20 08:13:14 公開日:2021-03-04
# 補助分類器を用いた生成逆数ネットワークに基づく完全自動心電図分類システム Fully Automatic Electrocardiogram Classification System based on Generative Adversarial Network with Auxiliary Classifier ( http://arxiv.org/abs/2004.04894v3 ) ライセンス: Link先を確認	Zhanhong Zhou, Xiaolong Zhai, Chung Tin	(参考訳) 本稿では,完全自動心電図(ecg)不整脈分類システムを用いたgan(generative adversarial network)について述べる。 GANのジェネレータ(G)は、データ拡張のために異なる不整脈クラスで条件付けられた様々な結合行列入力を生成するように設計されている。我々の設計した判別器(D)は、実および生成したECG結合行列の入力に基づいて訓練され、GANのトレーニングが完了すると不整脈分類器として抽出される。 MIT-BIH不整脈データベースでは,非教師付きアルゴリズムを用いて患者固有の正常拍を推定し,Gによる異常拍を発生させることでDを微調整した後,全自動で上室異所性拍(SVEB,Sビート)と心室異所性拍(VEB,Vビート)の総合分類性能が良好であった。最先端の自動分類器を数種類超え、専門家支援手法と同様のレベルで実行することができる。特に、SVEBのF1スコアは、パフォーマンスの高い自動システムよりも最大13%向上している。また, SVEB (87%) とVOB (93%) の感度も高く, 診断に有用である。そこで我々は,ACE-GAN (Generative Adversarial Network with Auxiliary Classifier for Electrocardiogram) ベースの自動システムは,手動介在物や専門家によるラベリングを必要とせず,高スループットな臨床検診を行う上で有望かつ信頼性の高いツールであることを示す。 A generative adversarial network (GAN) based fully automatic electrocardiogram (ECG) arrhythmia classification system with high performance is presented in this paper. The generator (G) in our GAN is designed to generate various coupling matrix inputs conditioned on different arrhythmia classes for data augmentation. Our designed discriminator (D) is trained on both real and generated ECG coupling matrix inputs, and is extracted as an arrhythmia classifier upon completion of training for our GAN. After fine-tuning the D by including patient-specific normal beats estimated using an unsupervised algorithm, and generated abnormal beats by G that are usually rare to obtain, our fully automatic system showed superior overall classification performance for both supraventricular ectopic beats (SVEB or S beats) and ventricular ectopic beats (VEB or V beats) on the MIT-BIH arrhythmia database. It surpassed several state-of-art automatic classifiers and can perform on similar levels as some expert-assisted methods. 
In particular, the F1 score of SVEB has been improved by up to 13% over the top-performing automatic systems. Moreover, high sensitivity for both SVEB (87%) and VEB (93%) detection has been achieved, which is of great value for practical diagnosis. We therefore suggest that our ACE-GAN (Generative Adversarial Network with Auxiliary Classifier for Electrocardiogram) based automatic system can be a promising and reliable tool for high-throughput clinical screening, without any need for manual intervention or expert-assisted labeling.	翻訳日:2022-12-14 20:38:03 公開日:2021-03-04
# テセル化ワッサースタインオートエンコーダ Tessellated Wasserstein Auto-Encoders ( http://arxiv.org/abs/2005.09923v2 ) ライセンス: Link先を確認	Kuo Gai and Shihua Zhang	(参考訳) variational auto-encoder (VAE)、Wasserstein auto-encoder with maximum mean discrepancy (WAE-MMD)、sliced-Wasserstein auto-encoder (SWAE)といった非敵対的生成モデルは比較的訓練しやすく、Wasserstein auto-encoder with generative adversarial network (WAE-GAN)に比べてモード崩壊が少ない。しかし、本物と偽物の微妙な違いを検出する判別器が存在しないため、潜在空間におけるターゲット分布の近似にはあまり正確ではない。そこで本研究では,Tessellated Wasserstein Auto-Encoders (TWAE) と呼ばれる新たな非敵対的フレームワークを開発した。TWAEは,centroidal Voronoi tessellation (CVT) 技術により目標分布の台を所定の数の領域に分割し,不一致を正確に計算するために,ランダムシャッフルではなくこの分割に従ってデータバッチを設計する。理論的には、サンプル数 $n$ と分割の領域数 $m$ が大きくなると、不一致の推定誤差がそれぞれ $\mathcal{O}(\frac{1}{\sqrt{n}})$ と $\mathcal{O}(\frac{1}{\sqrt{m}})$ のレートで減少することを示した。固定された $n$ と $m$ が与えられた場合、測定誤差の上限を最小化するために必要な条件は、分割がCVTによって決定されるものであることである。 TWAEは、異なる非敵対的指標に対して非常に柔軟であり、VAE、WAE-MMD、SWAEと比較してFréchet inception distance (FID) において、その生成性能を大幅に向上させることができる。さらに,TWAEは敵対的モデルWAE-GANと競合し,その強力な生成能力を示した。 Non-adversarial generative models such as the variational auto-encoder (VAE), the Wasserstein auto-encoder with maximum mean discrepancy (WAE-MMD), and the sliced-Wasserstein auto-encoder (SWAE) are relatively easy to train and have less mode collapse compared to the Wasserstein auto-encoder with generative adversarial network (WAE-GAN). However, they are not very accurate in approximating the target distribution in the latent space because they do not have a discriminator to detect the minor difference between real and fake. To this end, we develop a novel non-adversarial framework called Tessellated Wasserstein Auto-encoders (TWAE) to tessellate the support of the target distribution into a given number of regions by the centroidal Voronoi tessellation (CVT) technique and design batches of data according to the tessellation instead of random shuffling for accurate computation of discrepancy. 
Theoretically, we demonstrate that the estimation error of the discrepancy decreases as the number of samples $n$ and the number of regions $m$ of the tessellation become larger, with rates of $\mathcal{O}(\frac{1}{\sqrt{n}})$ and $\mathcal{O}(\frac{1}{\sqrt{m}})$, respectively. Given fixed $n$ and $m$, a necessary condition for the upper bound of measurement error to be minimized is that the tessellation is the one determined by CVT. TWAE is very flexible to different non-adversarial metrics and can substantially enhance their generative performance in terms of Fréchet inception distance (FID) compared to VAE, WAE-MMD, and SWAE. Moreover, numerical results demonstrate that TWAE is competitive with the adversarial model WAE-GAN, showing its powerful generative ability.	翻訳日:2022-12-01 05:15:38 公開日:2021-03-04
# 平均場理論による非負行列の高速階数削減 Fast Rank Reduction for Non-negative Matrices via Mean Field Theory ( http://arxiv.org/abs/2006.05321v2 ) ライセンス: Link先を確認	Kazu Ghalamkari, Mahito Sugiyama	(参考訳) 行列の行数や列数において時間複雑性が2次である非負行列に対する効率的な行列ランク低減法を提案する。我々の重要な洞察は、構造化サンプル空間上の対数線形モデルを用いて行列をモデル化することにより、平均場近似としてランク還元を定式化することである。この定式化のハイライトは、与えられた行列からklの発散を最小化する最適解を閉形式で解析的に計算できることである。提案手法は,NMFとNMFの変種であるlraNMFよりも高速であり,合成および実世界のデータセット上での競合的低ランク近似誤差を実現する。 We propose an efficient matrix rank reduction method for non-negative matrices, whose time complexity is quadratic in the number of rows or columns of a matrix. Our key insight is to formulate rank reduction as a mean-field approximation by modeling matrices via a log-linear model on structured sample space, which allows us to solve the rank reduction as convex optimization. The highlight of this formulation is that the optimal solution that minimizes the KL divergence from a given matrix can be analytically computed in a closed form. We empirically show that our rank reduction method is faster than NMF and its popular variant, lraNMF, while achieving competitive low rank approximation error on synthetic and real-world datasets.	翻訳日:2022-11-23 14:01:16 公開日:2021-03-04
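The closed-form claim in the rank-reduction abstract above can be illustrated for the simplest case. For rank 1, the non-negative matrix that minimizes the (generalized) KL divergence from a given matrix is the outer product of its row sums and column sums divided by its total sum. The sketch below covers only this special case and is not the authors' general-rank algorithm; the function names `rank1_kl_approximation` and `kl_divergence` are hypothetical.

```python
import math


def rank1_kl_approximation(matrix):
    """Return the rank-1 matrix closest to `matrix` in KL divergence.

    Closed form: outer product of row sums and column sums over the total.
    """
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(col) for col in zip(*matrix)]
    total = sum(row_sums)
    return [[r * c / total for c in col_sums] for r in row_sums]


def kl_divergence(p, q):
    """Generalized KL divergence D(p || q) between non-negative matrices."""
    d = 0.0
    for p_row, q_row in zip(p, q):
        for a, b in zip(p_row, q_row):
            if a > 0:
                d += a * math.log(a / b)
            d += b - a
    return d


# Toy input (an assumption for illustration, not data from the paper).
P = [[4.0, 1.0, 1.0],
     [2.0, 3.0, 1.0]]
Q = rank1_kl_approximation(P)
```

Because the result is a single outer product, it needs only one pass over the matrix, which is consistent with the quadratic cost mentioned in the abstract.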
# 補間領域には良い分類器が豊富に存在する Good Classifiers are Abundant in the Interpolating Regime ( http://arxiv.org/abs/2006.12625v2 ) ライセンス: Link先を確認	Ryan Theisen, Jason M. Klusowski, Michael W. Mahoney	(参考訳) 機械学習コミュニティでは、複雑で過パラメータ化されたモデルが新しいデータにどのようにうまく一般化できるかという疑問に答えるために、広く使われている一様収束フレームワークが用いられてきた。このアプローチは、データに適合しうる最悪ケースのモデルのテストエラーに上界を与えるが、根本的な限界がある。学習への統計力学的アプローチに着想を得て,複数のモデルクラスの補間分類器におけるテストエラーの完全な分布を正確に計算する手法を定式化し,開発する。本稿では,この分布を線形およびランダム特徴分類モデルを用いて,複数の実データおよび合成データに対して計算する。テストエラーは小さな典型値 $\varepsilon^*$ の周りに集中する傾向にあり、これは同じデータセット上の最悪ケースの補間モデルのテストエラーから大きく離れており、"悪い"分類器は極めて稀であることを示している。我々は、テストエラーの完全な漸近分布を特徴付ける簡単な設定で理論的結果を提供し、これらが実際に値 $\varepsilon^*$ の周りに集中することを示し、その値も正確に特定する。次に、経験的発見によって支持されるより一般的な予想を定式化する。以上の結果から,統計的学習理論の一般的な解析手法は,実際に観測される優れた一般化性能を捉えるには粒度が十分に細かくない可能性があり,学習の統計力学に基づくアプローチが有望な代替手段となりうることが示唆された。 Within the machine learning community, the widely-used uniform convergence framework has been used to answer the question of how complex, over-parameterized models can generalize well to new data. This approach bounds the test error of the worst-case model one could have fit to the data, but it has fundamental limitations. Inspired by the statistical mechanics approach to learning, we formally define and develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers from several model classes. We apply our method to compute this distribution for several real and synthetic datasets, with both linear and random feature classification models. We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model on the same datasets, indicating that "bad" classifiers are extremely rare. We provide theoretical results in a simple setting in which we characterize the full asymptotic distribution of test errors, and we show that these indeed concentrate around a value $\varepsilon^*$, which we also identify exactly. 
We then formalize a more general conjecture supported by our empirical findings. Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice, and that approaches based on the statistical mechanics of learning may offer a promising alternative.	翻訳日:2022-11-18 05:11:08 公開日:2021-03-04
# ランダム林としてのマクロ経済 The Macroeconomy as a Random Forest ( http://arxiv.org/abs/2006.12724v3 ) ライセンス: Link先を確認	Philippe Goulet Coulombe	(参考訳) マクロ経済ランダムフォレスト(MRF, Macroeconomic Random Forest)は,線形マクロ方程式の進化パラメータを柔軟にモデル化するための機械学習ツールである。主な出力である一般化時変パラメータ(gtvps)は、多くの一般的な非線形性(threshold/switching, smooth transition, structural breaks/change)をネストし、洗練された新しいものを可能にする多用途デバイスである。このアプローチは多くの代替案よりも明確な予測ゲインをもたらし、2008年の失業の劇的な増加を予測し、インフレのためにうまく機能する。ほとんどのMLベースの方法とは異なり、MRFはGTVPを介して直接解釈可能である。例えば、失業率予測の成功は、不況のたびにほぼ倍増する前向きの変数(例えば、用語の拡散、住宅の開始)の影響によるものである。興味深いことに、フィリップス曲線は確かに平坦であり、そのポテンシャルは非常に巡回的である。 I develop Macroeconomic Random Forest (MRF), an algorithm adapting the canonical Machine Learning (ML) tool to flexibly model evolving parameters in a linear macro equation. Its main output, Generalized Time-Varying Parameters (GTVPs), is a versatile device nesting many popular nonlinearities (threshold/switching, smooth transition, structural breaks/change) and allowing for sophisticated new ones. The approach delivers clear forecasting gains over numerous alternatives, predicts the 2008 drastic rise in unemployment, and performs well for inflation. Unlike most ML-based methods, MRF is directly interpretable -- via its GTVPs. For instance, the successful unemployment forecast is due to the influence of forward-looking variables (e.g., term spreads, housing starts) nearly doubling before every recession. Interestingly, the Phillips curve has indeed flattened, and its might is highly cyclical.	翻訳日:2022-11-17 23:26:18 公開日:2021-03-04
# PanRep:異種グラフにおける普遍ノード埋め込み抽出のためのグラフニューラルネットワーク PanRep: Graph neural networks for extracting universal node embeddings in heterogeneous graphs ( http://arxiv.org/abs/2007.10445v2 ) ライセンス: Link先を確認	Vassilis N. Ioannidis, Da Zheng, George Karypis	(参考訳) 教師なしノード埋め込みの学習は、ノード分類やリンク予測などの下流タスクを容易にする。ノードの埋め込みは、様々な下流タスクで使われ、それらに役立つように設計されている場合、普遍的である。この研究は、異種グラフに対する普遍ノード表現の教師なし学習のためのグラフニューラルネットワーク(GNN)モデルであるPanRepを紹介する。 PanRepは、ノード埋め込みを得るGNNエンコーダと4つのデコーダで構成され、各デコーダは異なるトポロジ特性およびノード特徴特性を捉える。これらの特性に従うことで、新しい教師なしフレームワークは、異なる下流タスクに適用可能な普遍的な埋め込みを学習する。 PanRepは、限られたラベルを考慮してさらに微調整することが可能である。この運用環境では、PanRepは異種グラフデータのノード埋め込みを抽出するための事前訓練されたモデルとみなされる。 PanRepは、ノード分類とリンク予測において、特に教師あり手法のラベル付きデータが少ない場合に、すべての教師なし手法と一部の教師あり手法を上回る。 PanRep-FT(微調整あり)は他のすべての教師ありアプローチよりも優れており、事前学習モデルの利点を裏付けている。最後に、Covid-19の新規薬物発見にPanRep-FTを適用した。ドラッグリパーパシングにおける普遍的な埋め込みの利点を示し,臨床試験で用いられているいくつかの薬物を薬物候補として同定する。 Learning unsupervised node embeddings facilitates several downstream tasks such as node classification and link prediction. A node embedding is universal if it is designed to be used by and benefit various downstream tasks. This work introduces PanRep, a graph neural network (GNN) model, for unsupervised learning of universal node representations for heterogeneous graphs. PanRep consists of a GNN encoder that obtains node embeddings and four decoders, each capturing different topological and node feature properties. Abiding by these properties, the novel unsupervised framework learns universal embeddings applicable to different downstream tasks. PanRep can be further fine-tuned to account for possible limited labels. In this operational setting, PanRep is considered as a pretrained model for extracting node embeddings of heterogeneous graph data. PanRep outperforms all unsupervised and certain supervised methods in node classification and link prediction, especially when the labeled data for the supervised methods is small. 
PanRep-FT (with fine-tuning) outperforms all other supervised approaches, which corroborates the merits of pretraining models. Finally, we apply PanRep-FT for discovering novel drugs for Covid-19. We showcase the advantage of universal embeddings in drug repurposing and identify several drugs used in clinical trials as possible drug candidates.	翻訳日:2022-11-08 12:55:40 公開日:2021-03-04
# Clarinet: 予算に優しいドメイン適応に向けての一段階 Clarinet: A One-step Approach Towards Budget-friendly Unsupervised Domain Adaptation ( http://arxiv.org/abs/2007.14612v2 ) ライセンス: Link先を確認	Yiyang Zhang, Feng Liu, Zhen Fang, Bo Yuan, Guangquan Zhang, Jie Lu	(参考訳) unsupervised domain adaptation(uda)では、ターゲットドメインの分類器は、ソースドメインからの巨大なtrue-labelデータと、ターゲットドメインからのunlabelデータで訓練される。しかし、予算が限られているため、ソースドメインで完全なラベルデータを集めるのは難しいかもしれません。この問題を軽減するために,対象ドメインの分類器を,ソースドメインからの補完ラベルデータと,対象ドメインのラベルなしデータとで訓練する必要がある,予算フレンドリーuda(bfuda)という新たな問題を考える。主な利点は、(BFUDAが要求する)補完ラベルのソースデータを収集するコストが、(通常のUDAが要求する)真のラベルのソースデータを収集するコストよりもはるかに少ないことである。この目的のために、BFUDA問題を解決するためにCLARINET(Compleorary label adversarial Network)を提案する。 clarinetは2つのディープネットワークを同時に維持しており、1つは補完ラベルのソースデータを分類し、もう1つはソースからターゲットへの分散適応を扱う。 CLARINETは、一連の有能なベースラインを著しく上回っている。 In unsupervised domain adaptation (UDA), classifiers for the target domain are trained with massive true-label data from the source domain and unlabeled data from the target domain. However, it may be difficult to collect fully-true-label data in a source domain given a limited budget. To mitigate this problem, we consider a novel problem setting where the classifier for the target domain has to be trained with complementary-label data from the source domain and unlabeled data from the target domain named budget-friendly UDA (BFUDA). The key benefit is that it is much less costly to collect complementary-label source data (required by BFUDA) than collecting the true-label source data (required by ordinary UDA). To this end, the complementary label adversarial network (CLARINET) is proposed to solve the BFUDA problem. CLARINET maintains two deep networks simultaneously, where one focuses on classifying complementary-label source data and the other takes care of the source-to-target distributional adaptation. Experiments show that CLARINET significantly outperforms a series of competent baselines.	翻訳日:2022-11-05 19:24:32 公開日:2021-03-04
# リカレントニューラルネットワークの再検討と画像分類の改善 Rethinking Recurrent Neural Networks and Other Improvements for Image Classification ( http://arxiv.org/abs/2007.15161v3 ) ライセンス: Link先を確認	Nguyen Huu Phong, Bernardete Ribeiro	(参考訳) 数十年前にさかのぼる機械学習の長い歴史の中で、リカレントニューラルネットワーク(RNN)は主にシーケンシャルなデータや時系列、一般的に1D情報に使われてきた。 2次元画像の稀な研究においても、これらのネットワークは画像認識タスクではなく、データのシーケンシャルな学習と生成にのみ使用される。本研究では,画像認識モデルの設計において,RNNを付加層として統合することを提案する。また,複数のモデルを用いてエキスパート予測を行うエンド・ツー・エンドのマルチモデルアンサンブルを開発した。さらに、トレーニング戦略を拡張して、主要なモデルと互換性があり、いくつかの挑戦的なデータセット(SVHN (0.99)、Cifar-100 (0.9027)、Cifar-10 (0.9852) など)で最先端のモデルにマッチさせることができる。さらに,本モデルでは,サリーデータセット (0.949) に新しいレコードを設定する。この記事では、メソッドのソースコードをhttps://github.com/leonlha/e2e-3mとhttp://nguyenhuuphong.meで公開します。 Over the long history of machine learning, which dates back several decades, recurrent neural networks (RNNs) have been used mainly for sequential data and time series and generally with 1D information. Even in some rare studies on 2D images, these networks are used merely to learn and generate data sequentially rather than for image recognition tasks. In this study, we propose integrating an RNN as an additional layer when designing image recognition models. We also develop end-to-end multimodel ensembles that produce expert predictions using several models. In addition, we extend the training strategy so that our model performs comparably to leading models and can even match the state-of-the-art models on several challenging datasets (e.g., SVHN (0.99), Cifar-100 (0.9027) and Cifar-10 (0.9852)). Moreover, our model sets a new record on the Surrey dataset (0.949). The source code of the methods provided in this article is available at https://github.com/leonlha/e2e-3m and http://nguyenhuuphong.me.	翻訳日:2022-11-05 14:06:40 公開日:2021-03-04
# 人物再同定のための弾性損失をもつ不完全記述子マイニング Incomplete Descriptor Mining with Elastic Loss for Person Re-Identification ( http://arxiv.org/abs/2008.04010v4 ) ライセンス: Link先を確認	Hongchen Tan, Yuhao Bian, Huasheng Wang, Xiuping Liu, and Baocai Yin	(参考訳) 本稿では,人物リidタスクに注意し頑健な人物記述子をキャプチャする新しい人物リidモデルである連続バッチドロップブロックネットワーク(cbdb-net)を提案する。 CBDB-Net には Consecutive Batch DropBlock Module (CBDBM) と Elastic Loss (EL) という2つの新しい設計が含まれている。連続したバッチドロップブロックモジュール(cbdbm)では、最初に機能マップで一様分割を行います。そして、各パッチを独立して継続的にフィーチャーマップの上から下へ落とし、複数の不完全なフィーチャーマップを出力します。トレーニング段階では、これらの複数の不完全な機能はRe-IDモデルをより促進し、Re-IDタスクの堅牢な人物記述子をキャプチャする。 EL(Elastic Loss)では、Re-IDモデルがハードサンプルペアと簡単なサンプルペアを適応的にバランスさせるのに役立つ新しい重量制御アイテムを設計する。広範囲にわたるアブレーション研究を通じて,CBDBM(Consecutive Batch DropBlock Module)とEL(Elastic Loss)がCBDB-Netの性能向上に貢献していることを確認した。我々のCBDB-Netは、3つの標準人物Re-IDデータセット(Market-1501、DukeMTMC-Re-ID、CUHK03データセット)、3つの隠蔽人物Re-IDデータセット(Occluded DukeMTMC、Partial-REID、Partial iLIDSデータセット)、一般的な画像検索データセット(In-Shop Clothes Retrievalデータセット)で競合性能を達成できることを示した。 In this paper, we propose a novel person Re-ID model, Consecutive Batch DropBlock Network (CBDB-Net), to capture the attentive and robust person descriptor for the person Re-ID task. The CBDB-Net contains two novel designs: the Consecutive Batch DropBlock Module (CBDBM) and the Elastic Loss (EL). In the Consecutive Batch DropBlock Module (CBDBM), we firstly conduct uniform partition on the feature maps. And then, we independently and continuously drop each patch from top to bottom on the feature maps, which can output multiple incomplete feature maps. In the training stage, these multiple incomplete features can better encourage the Re-ID model to capture the robust person descriptor for the Re-ID task. In the Elastic Loss (EL), we design a novel weight control item to help the Re-ID model adaptively balance hard sample pairs and easy sample pairs in the whole training process. 
Through an extensive set of ablation studies, we verify that the Consecutive Batch DropBlock Module (CBDBM) and the Elastic Loss (EL) each contribute to the performance gains of CBDB-Net. We demonstrate that CBDB-Net achieves competitive performance on three standard person Re-ID datasets (Market-1501, DukeMTMC-Re-ID, and CUHK03), three occluded person Re-ID datasets (Occluded DukeMTMC, Partial-REID, and Partial iLIDS), and a general image retrieval dataset (In-Shop Clothes Retrieval).	翻訳日:2022-10-31 23:06:46 公開日:2021-03-04
# 軌道フィードバックによる強化学習 Reinforcement Learning with Trajectory Feedback ( http://arxiv.org/abs/2008.06036v2 ) ライセンス: Link先を確認	Yonathan Efroni, Nadav Merlis, Shie Mannor	(参考訳) 強化学習の標準的なフィードバックモデルは、訪問したすべての状態-アクションペアの報酬を明らかにすることを要求する。しかし、実際には、そのような頻繁なフィードバックが利用できないことが多い。この研究では、この仮定を緩和する第一歩を踏み出し、より弱い形式のフィードバックのみを要求する。これを軌道フィードバック (trajectory feedback) と呼ぶ。各アクションの後に得られる報酬を観察する代わりに、エージェントが観察する軌道全体の質を表すスコア、すなわち、この軌道上で得られるすべての報酬の合計だけを受け取ると仮定する。我々は,未知報酬の最小二乗推定に基づいて,遷移モデルが既知の場合と未知の場合の両方について強化学習アルゴリズムをこの設定に拡張し,そのリグレットを解析することでアルゴリズムの性能を検討する。遷移モデルが未知の場合には、計算上扱いやすいアルゴリズムをもたらす、楽観的手法とトンプソンサンプリングのハイブリッド手法を提供する。 The standard feedback model of reinforcement learning requires revealing the reward of every visited state-action pair. However, in practice, it is often the case that such frequent feedback is not available. In this work, we take a first step towards relaxing this assumption and require a weaker form of feedback, which we refer to as trajectory feedback. Instead of observing the reward obtained after every action, we assume we only receive a score that represents the quality of the whole trajectory observed by the agent, namely, the sum of all rewards obtained over this trajectory. We extend reinforcement learning algorithms to this setting, based on least-squares estimation of the unknown reward, for both the known and unknown transition model cases, and study the performance of these algorithms by analyzing their regret. For cases where the transition model is unknown, we offer a hybrid optimistic-Thompson Sampling approach that results in a tractable algorithm.	翻訳日:2022-10-30 22:36:51 公開日:2021-03-04
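The least-squares idea in the trajectory-feedback abstract above can be sketched as follows. If each trajectory reveals only the sum of its rewards, the per-pair rewards can be estimated by regressing trajectory returns on visit counts. This is a minimal sketch under stated assumptions, not the authors' algorithm: the helper names, the toy data, the noise-free returns, and the small ridge term are all hypothetical choices made for illustration.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def estimate_rewards(visit_counts, returns, ridge=1e-9):
    """Least-squares reward estimate from trajectory-level returns.

    visit_counts[k][i]: how often trajectory k visited state-action pair i.
    returns[k]: the single score observed for trajectory k (sum of rewards).
    """
    dim = len(visit_counts[0])
    # Normal equations: (X^T X + ridge * I) r = X^T y
    gram = [[sum(x[i] * x[j] for x in visit_counts) + (ridge if i == j else 0.0)
             for j in range(dim)] for i in range(dim)]
    rhs = [sum(x[i] * y for x, y in zip(visit_counts, returns))
           for i in range(dim)]
    return solve_linear_system(gram, rhs)


# Toy problem with three state-action pairs (hypothetical values).
true_rewards = [1.0, 0.0, 0.5]
visit_counts = [[2, 1, 0], [0, 2, 1], [1, 0, 2], [1, 1, 1]]
returns = [sum(c * r for c, r in zip(x, true_rewards)) for x in visit_counts]
estimated = estimate_rewards(visit_counts, returns)
```

With noise-free returns and visit counts of full column rank, the estimate matches the true rewards up to the tiny ridge bias. With noisy returns it would only approximate them.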
# ベクトル値画像正規化のためのカラー弾性モデル A Color Elastica Model for Vector-Valued Image Regularization ( http://arxiv.org/abs/2008.08255v2 ) ライセンス: Link先を確認	Hao Liu, Xue-Cheng Tai, Ron Kimmel, Roland Glowinski	(参考訳) オイラーの弾性エネルギーに関連するモデルは、画像処理を含む多くのアプリケーションで有用であることが証明されている。カラー画像やマルチチャネルデータに弾性モデルを拡張することは難しい課題であり、これらの幾何学モデルの安定かつ一貫した数値解法は高階微分を伴うことが多い。単一チャネルのオイラーの弾性モデルや全変動(TV)モデルと同様に、高次微分を含む幾何学的測度は、弾性特性を最小化する画像形成モデルを考える際に役立つ。過去には、高エネルギー物理学からのポリアコフ作用がカラー画像処理にうまく応用されている。ここでは、色多様体曲率を最小化する色画像に対するポリアコフ作用の追加を紹介する。カラー画像チャネルにラプラス・ベルトラミ演算子を適用してカラー画像曲率を算出する。グレースケールに縮小した場合、空間と色の間の適切なスケーリングを選択しながら、画像レベルセットで操作するオイラーの弾性を最小化する。提案する非線形幾何モデルの最小値を求めることは,本論文で提示する課題である。具体的には,提案する関数を最小化する演算子分割法を提案する。非線形性は、3つのベクトル値変数と行列値変数を導入することで分離される。その後、問題は関連する初期値問題の定常状態の解に変換される。初期値問題は、各サブプロブレムが閉形式解を持つか、高速アルゴリズムで解けるように、3つの分数ステップに時間分割される。提案手法の効率性とロバスト性は系統的な数値実験により実証された。 Models related to the Euler's elastica energy have proven to be useful for many applications including image processing. Extending elastica models to color images and multi-channel data is a challenging task, as stable and consistent numerical solvers for these geometric models often involve high order derivatives. Like the single channel Euler's elastica model and the total variation (TV) models, geometric measures that involve high order derivatives could help when considering image formation models that minimize elastic properties. In the past, the Polyakov action from high energy physics has been successfully applied to color image processing. Here, we introduce an addition to the Polyakov action for color images that minimizes the color manifold curvature. The color image curvature is computed by applying of the Laplace-Beltrami operator to the color image channels. When reduced to gray-scale images, while selecting appropriate scaling between space and color, the proposed model minimizes the Euler's elastica operating on the image level sets. Finding a minimizer for the proposed nonlinear geometric model is a challenge we address in this paper. 
Specifically, we present an operator-splitting method to minimize the proposed functional. The non-linearity is decoupled by introducing three vector-valued and matrix-valued variables. The problem is then converted into solving for the steady state of an associated initial-value problem. The initial-value problem is time-split into three fractional steps, such that each sub-problem has a closed form solution, or can be solved by fast algorithms. The efficiency and robustness of the proposed method are demonstrated by systematic numerical experiments.	翻訳日:2022-10-27 12:01:21 公開日:2021-03-04
# VarifocalNet: IoU対応の高密度物体検出器 VarifocalNet: An IoU-aware Dense Object Detector ( http://arxiv.org/abs/2008.13367v2 ) ライセンス: Link先を確認	Haoyang Zhang, Ying Wang, Feras Dayoub and Niko Sünderhauf	(参考訳) 高密度物体検出器が高性能を達成するためには、膨大な数の候補検出を正確にランク付けすることが不可欠である。先行研究では、分類スコア、または分類スコアと予測された位置推定スコアの組み合わせを使用して候補をランク付けする。しかし、どちらの選択肢も信頼性の高いランキングにはならず、検出性能が低下する。本稿では,物体の存在の信頼度と位置推定精度の合同表現として,IACS(IoU-aware Classification Score)を学習することを提案する。高密度物体検出器は、IACSに基づいて、より正確な候補検出ランキングを実現できることを示す。我々は、IACSを予測するように高密度物体検出器を訓練するために、Varifocal Lossという新しい損失関数を設計し、IACS予測とバウンディングボックス改善のための新しい星型バウンディングボックス特徴表現を提案する。これら2つの新しいコンポーネントとバウンディングボックス改善ブランチを組み合わせることで、FCOS+ATSSアーキテクチャに基づいたIoU対応の高密度物体検出器を構築し、これをVarifocalNet、略してVFNetと呼ぶ。 MS COCOでの大規模な実験により、VFNetは、異なるバックボーンで強力なベースラインを一貫して $\sim$2.0 AP 上回ることが示された。 Res2Net-101-DCNを用いた我々のベストモデルVFNet-X-1200は、COCO test-dev上で単一モデル・単一スケールのAP 55.1を達成する。 Accurately ranking the vast number of candidate detections is crucial for dense object detectors to achieve high performance. Prior work uses the classification score or a combination of classification and predicted localization scores to rank candidates. However, neither option results in a reliable ranking, thus degrading detection performance. In this paper, we propose to learn an IoU-aware Classification Score (IACS) as a joint representation of object presence confidence and localization accuracy. We show that dense object detectors can achieve a more accurate ranking of candidate detections based on the IACS. We design a new loss function, named Varifocal Loss, to train a dense object detector to predict the IACS, and propose a new star-shaped bounding box feature representation for IACS prediction and bounding box refinement. Combining these two new components and a bounding box refinement branch, we build an IoU-aware dense object detector based on the FCOS+ATSS architecture, that we call VarifocalNet or VFNet for short. 
Extensive experiments on MS COCO show that our VFNet consistently surpasses the strong baseline by $\sim$2.0 AP with different backbones. Our best model VFNet-X-1200 with Res2Net-101-DCN achieves a single-model single-scale AP of 55.1 on COCO test-dev, which is state-of-the-art among various object detectors. Code is available at https://github.com/hyz-xmaster/VarifocalNet.	翻訳日:2022-10-23 07:09:35 公開日:2021-03-04
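To make the IoU-aware classification score in the VarifocalNet abstract above more concrete, the sketch below computes the intersection-over-union of two boxes and builds a per-class target vector that carries the IoU at the ground-truth class and zero elsewhere. This target layout is one plausible reading of "a joint representation of object presence confidence and localization accuracy" and is an assumption here, as are the function names and the `(x1, y1, x2, y2)` box format. It does not reproduce the Varifocal Loss itself.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iacs_target(pred_box, gt_box, gt_class, num_classes):
    """Score vector: IoU at the ground-truth class index, zero elsewhere."""
    target = [0.0] * num_classes
    target[gt_class] = box_iou(pred_box, gt_box)
    return target


# Hypothetical predicted and ground-truth boxes that overlap by half.
target = iacs_target((0, 0, 10, 10), (5, 0, 15, 10), gt_class=2, num_classes=4)
```

A detector trained towards such targets would rank a well-localized box above a poorly localized one of the same class, which is the ranking behaviour the abstract describes.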
# 活動特化特徴と活動相関を用いたマルチラベル活動認識 Multi-Label Activity Recognition using Activity-specific Features and Activity Correlations ( http://arxiv.org/abs/2009.07420v2 ) ライセンス: Link先を確認	Yanyi Zhang, Xinyu Li, Ivan Marsic	(参考訳) マルチラベルアクティビティ認識は、各ビデオで同時または順次に実行される複数のアクティビティを認識するように設計されている。最近のアクティビティ認識ネットワークは、各ビデオ内の1つのアクティビティのみを前提とする単一のアクティビティに焦点を当てている。これらのネットワークは、マルチラベルアクティビティ用に設計されていないすべてのアクティビティの共有機能を抽出する。本稿では,各アクティビティの独立な特徴記述子を抽出し,アクティビティ相関を学習するマルチラベルアクティビティ認識手法を提案する。この構造はエンドツーエンドでトレーニングでき、ビデオ分類のために既存のネットワーク構造にプラグインすることができる。提案手法は,4つのマルチラベルアクティビティ認識データセットにおける最先端手法よりも優れている。システムが生成するアクティビティ特有の特徴をよりよく理解するために、これらのアクティビティ特有の機能をcharadesデータセットで視覚化しました。 Multi-label activity recognition is designed for recognizing multiple activities that are performed simultaneously or sequentially in each video. Most recent activity recognition networks focus on single-activities, that assume only one activity in each video. These networks extract shared features for all the activities, which are not designed for multi-label activities. We introduce an approach to multi-label activity recognition that extracts independent feature descriptors for each activity and learns activity correlations. This structure can be trained end-to-end and plugged into any existing network structures for video classification. Our method outperformed state-of-the-art approaches on four multi-label activity recognition datasets. To better understand the activity-specific features that the system generated, we visualized these activity-specific features in the Charades dataset.	翻訳日:2022-10-18 00:12:30 公開日:2021-03-04
# ゼロショット学習のための情報ボトルネック制約付き潜在双方向埋め込み Information Bottleneck Constrained Latent Bidirectional Embedding for Zero-Shot Learning ( http://arxiv.org/abs/2009.07451v3 ) ライセンス: Link先を確認	Yang Liu, Lei Zhou, Xiao Bai, Lin Gu, Tatsuya Harada, Jun Zhou	(参考訳) ゼロショット学習(ZSL)は、既知(seen)クラスから未知(unseen)クラスに意味的な知識を移すことによって、新しいクラスを認識することを目的としている。多くのZSL法は視覚空間と意味空間の直接マッピングに依存しているが、キャリブレーション偏差とハブネス問題が未知クラスへの一般化能力を制限する。最近登場した生成型ZSL法は、未知クラスの画像特徴を生成することで、ZSLを教師付き分類問題に変換する。しかし、ほとんどの生成モデルは、既知クラスのデータのみが訓練に使用されるため、依然としてseen-unseenバイアス問題に苦しんでいる。そこで本研究では, 密接な視覚-意味結合制約を持つ、双方向埋め込みに基づく新しい生成モデルを提案する。視覚空間と意味空間の両方の埋め込みパラメトリック分布を校正する統合潜在空間を学習する。高次元視覚特徴からの埋め込みは多くの非意味情報を含むので、潜在空間における視覚と意味のアライメントは必然的にずれる。そこで本研究では,ZSLに初めて情報ボトルネック(IB)制約を導入し,マッピング中に本質的な属性情報を保持する。具体的には,不確実性推定とwake-sleep手順を利用して特徴雑音を緩和し,モデルの抽象化能力を向上させる。また, 未知クラスの画像のラベルを生成することで, トランスダクティブZSL設定に容易に拡張することができる。そして、このラベルノイズ問題を解決するためにロバストな損失を導入する。広範な実験結果から,本手法は,ほとんどのベンチマークデータセットにおいて,異なるZSL設定で最先端の手法よりも優れていた。コードはhttps://github.com/osierboy/IBZSLで入手できる。 Zero-shot learning (ZSL) aims to recognize novel classes by transferring semantic knowledge from seen classes to unseen classes. Though many ZSL methods rely on a direct mapping between the visual and the semantic space, the calibration deviation and hubness problem limit the generalization capability to unseen classes. Recently emerged generative ZSL methods generate unseen image features to transform ZSL into a supervised classification problem. However, most generative models still suffer from the seen-unseen bias problem as only seen data is used for training. To address these issues, we propose a novel bidirectional embedding based generative model with a tight visual-semantic coupling constraint. We learn a unified latent space that calibrates the embedded parametric distributions of both visual and semantic spaces. Since the embedding of high-dimensional visual features contains much non-semantic information, the alignment of visual and semantic representations in the latent space inevitably deviates. 
Therefore, we introduce the information bottleneck (IB) constraint to ZSL for the first time to preserve essential attribute information during the mapping. Specifically, we utilize uncertainty estimation and the wake-sleep procedure to alleviate feature noise and improve the model's abstraction capability. In addition, our method can be easily extended to the transductive ZSL setting by generating labels for unseen images. We then introduce a robust loss to solve this label noise problem. Extensive experimental results show that our method outperforms the state-of-the-art methods in different ZSL settings on most benchmark datasets. The code will be available at https://github.com/osierboy/IBZSL.	翻訳日:2022-10-17 23:29:50 公開日:2021-03-04
# 不確実なPOMDPに対するロバスト有限状態制御器 Robust Finite-State Controllers for Uncertain POMDPs ( http://arxiv.org/abs/2009.11459v2 ) ライセンス: Link先を確認	Murat Cubuktepe, Nils Jansen, Sebastian Junges, Ahmadreza Marandi, Marnix Suilen, Ufuk Topcu	(参考訳) 不確実な部分観測マルコフ決定過程 (uPOMDPs) では、標準的なPOMDPの確率的な遷移関数と観測関数が、いわゆる不確実性集合に属することが許される。このような不確実性は認識論的不確実性と呼ばれ、例えば利用可能なデータの不足によって生じる確率分布の非可算集合を捉える。我々は,任意の許容分布に対して仕様をロバストに満たす uPOMDP の有限メモリポリシを計算するアルゴリズムを開発した。一般に、そのようなポリシの計算は理論的にも実用的にも困難である。この問題に対する効率的な解決策を4ステップで提供する。 (1) 基礎となる問題を無限に多くの制約を持つ非凸最適化問題として定式化する。 (2) 専用の双対化スキームにより、依然として非凸ではあるが有限個の制約を持つ双対問題を得る。 (3) この双対問題を線形化し, (4) 結果として得られる有限線形計画問題を解き, 元の問題に対する局所最適解を得る。その結果得られる問題の定式化は、既存の方法によるものよりも指数関数的に小さい。航空機衝突回避シナリオの大規模事例と,新しい宇宙機モーションプランニングのケーススタディを用いて,本アルゴリズムの適用性を示す。 Uncertain partially observable Markov decision processes (uPOMDPs) allow the probabilistic transition and observation functions of standard POMDPs to belong to a so-called uncertainty set. Such uncertainty, referred to as epistemic uncertainty, captures uncountable sets of probability distributions caused by, for instance, a lack of data available. We develop an algorithm to compute finite-memory policies for uPOMDPs that robustly satisfy specifications against any admissible distribution. In general, computing such policies is theoretically and practically intractable. We provide an efficient solution to this problem in four steps. (1) We state the underlying problem as a nonconvex optimization problem with infinitely many constraints. (2) A dedicated dualization scheme yields a dual problem that is still nonconvex but has finitely many constraints. (3) We linearize this dual problem and (4) solve the resulting finite linear program to obtain locally optimal solutions to the original problem. The resulting problem formulation is exponentially smaller than those resulting from existing methods. We demonstrate the applicability of our algorithm using large instances of an aircraft collision-avoidance scenario and a novel spacecraft motion planning case study.	
翻訳日:2022-10-15 04:39:16 公開日:2021-03-04
# VIVO: 新規オブジェクトキャプションのための視覚語彙事前学習 VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning ( http://arxiv.org/abs/2009.13682v2 ) ライセンス: Link先を確認	Xiaowei Hu, Xi Yin, Kevin Lin, Lijuan Wang, Lei Zhang, Jianfeng Gao, Zicheng Liu	(参考訳) キャプションラベル付き訓練データに現れない新規なオブジェクトを記述できる画像キャプションを生成することは、非常に望ましいが困難であり、この能力はnovel object captioning challenge (nocaps) で評価される。この課題では、COCO Captions以外の画像キャプション訓練データは、モデルの訓練に使用できない。したがって、従来のVision-Language Pre-training (VLP) 法は適用できない。本稿では、キャプションアノテーションがない状態で事前学習を行うVIVO(VIsual VOcabulary pretraining)を提案する。 VLPにおける画像とキャプションのペア訓練データへの依存を断ち切ることで、VIVOは大量の画像とタグのペアデータを利用して視覚語彙を学習することができる。これは、画像レベルのタグを対応する画像領域の特徴に対応付けることを学ぶ多層Transformerモデルを事前学習することで実現される。画像タグに順序がないという性質に対処するため、VIVOはハンガリアンマッチング損失とマスク付きタグ予測を使用して事前学習を行う。事前学習済みモデルを画像キャプション向けに微調整し,VIVOの有効性を検証する。さらに,モデルによって推定される視覚とテキストのアライメントの分析を行う。その結果,本モデルは,新規なオブジェクトを記述した流暢な画像キャプションを生成するだけでなく,それらのオブジェクトの位置も特定できることがわかった。我々の単一モデルは、nocapsで新しい最先端の結果を達成し、人間のCIDErスコアを上回った。 It is highly desirable yet challenging to generate image captions that can describe novel objects which are unseen in caption-labeled training data, a capability that is evaluated in the novel object captioning challenge (nocaps). In this challenge, no additional image-caption training data, other than COCO Captions, is allowed for model training. Thus, conventional Vision-Language Pre-training (VLP) methods cannot be applied. This paper presents VIsual VOcabulary pretraining (VIVO) that performs pre-training in the absence of caption annotations. By breaking the dependency on paired image-caption training data in VLP, VIVO can leverage large amounts of paired image-tag data to learn a visual vocabulary. This is done by pre-training a multi-layer Transformer model that learns to align image-level tags with their corresponding image region features. To address the unordered nature of image tags, VIVO uses a Hungarian matching loss with masked tag prediction to conduct pre-training. 
We validate the effectiveness of VIVO by fine-tuning the pre-trained model for image captioning. In addition, we perform an analysis of the visual-text alignment inferred by our model. The results show that our model can not only generate fluent image captions that describe novel objects, but also identify the locations of these objects. Our single model has achieved new state-of-the-art results on nocaps and surpassed the human CIDEr score.	翻訳日:2022-10-13 20:38:37 公開日:2021-03-04
# ByzShield: 分散トレーニングのための効率的でロバストなシステム ByzShield: An Efficient and Robust System for Distributed Training ( http://arxiv.org/abs/2010.04902v2 ) ライセンス: Link先を確認	Konstantinos Konstantinidis, Aditya Ramamoorthy	(参考訳) 分散クラスタ上での大規模モデルのトレーニングは、マシンラーニングパイプラインの重要なコンポーネントである。しかし、一部の労働者が逆(ビザンチン)のやり方で振る舞うと、パラメータサーバ(ps)に任意の結果を返すので、このトレーニングは簡単に失敗する。多くの既存論文が様々な攻撃モデルを検討し、これらの攻撃の影響を軽減するために堅牢な集約と/または計算冗長性を提案する。本研究では, 作業者の勾配計算課題について, 敵が十分な知識を持っていて, 最大被害を誘発するために, K の作業ノードから q を攻撃(最大)できるような, 奇抜な攻撃モデルを考える。冗長性に基づく手法である byzshield は作業者へのタスク割り当てに二部展開グラフの特性を利用する。具体的には、直交するラテン正方形とラマヌジャングラフに基づく構成の固有値に基づいて、崩壊した勾配の最悪のケース分数について上界を示す。数値実験により, 崩壊した勾配の分数の平均値が36%以上低下していることが判明した。同様に、CIFAR-10データセット上の画像分類によるトレーニング実験では、ByzShieldは最も高度な攻撃下で平均20%の精度で精度を向上している。 ByzShieldはまた、以前の作業よりもはるかに多くの対向ノードを許容する。 Training of large scale models on distributed clusters is a critical component of the machine learning pipeline. However, this training can easily be made to fail if some workers behave in an adversarial (Byzantine) fashion whereby they return arbitrary results to the parameter server (PS). A plethora of existing papers consider a variety of attack models and propose robust aggregation and/or computational redundancy to alleviate the effects of these attacks. In this work we consider an omniscient attack model where the adversary has full knowledge about the gradient computation assignments of the workers and can choose to attack (up to) any q out of K worker nodes to induce maximal damage. Our redundancy-based method ByzShield leverages the properties of bipartite expander graphs for the assignment of tasks to workers; this helps to effectively mitigate the effect of the Byzantine behavior. Specifically, we demonstrate an upper bound on the worst case fraction of corrupted gradients based on the eigenvalues of our constructions which are based on mutually orthogonal Latin squares and Ramanujan graphs. 
Our numerical experiments indicate over a 36% reduction on average in the fraction of corrupted gradients compared to the state of the art. Likewise, our experiments on training followed by image classification on the CIFAR-10 dataset show that ByzShield has on average a 20% advantage in accuracy under the most sophisticated attacks. ByzShield also tolerates a much larger fraction of adversarial nodes compared to prior work.	翻訳日:2022-10-08 23:36:47 公開日:2021-03-04
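The quantity bounded in the ByzShield abstract above, the worst-case fraction of corrupted gradients, can be illustrated with a toy redundancy scheme. In the sketch below each gradient task is replicated on several workers and the parameter server takes a majority vote, so a task is corrupted when Byzantine workers hold a strict majority of its replicas. The omniscient adversary is modelled by brute force over every set of q workers. The assignment shown is a hypothetical hand-made example, not the paper's construction from mutually orthogonal Latin squares or Ramanujan graphs, and the function names are assumptions.

```python
from itertools import combinations


def corrupted_fraction(assignment, byzantine):
    """Fraction of tasks whose majority vote is controlled by Byzantine workers."""
    bad = 0
    for workers in assignment:
        votes = sum(1 for w in workers if w in byzantine)
        if 2 * votes > len(workers):
            bad += 1
    return bad / len(assignment)


def worst_case_fraction(assignment, num_workers, q):
    """Brute-force the omniscient adversary: try every set of q workers."""
    return max(corrupted_fraction(assignment, set(subset))
               for subset in combinations(range(num_workers), q))


# Hypothetical toy assignment: 6 workers, 4 tasks, each replicated on 3 workers.
assignment = [(0, 1, 2), (0, 3, 4), (1, 3, 5), (2, 4, 5)]
worst = worst_case_fraction(assignment, num_workers=6, q=2)
```

In this toy assignment no pair of workers shares more than one task, so two Byzantine workers can corrupt at most one task in four. Spreading replicas so that small worker sets overlap on few tasks is the intuition behind using expander-like assignments.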
# アルゴリズムフェアネスに向けたブリッジ機械学習とメカニズム設計 Bridging Machine Learning and Mechanism Design towards Algorithmic Fairness ( http://arxiv.org/abs/2010.05434v2 ) ライセンス: Link先を確認	Jessie Finocchiaro, Roland Maio, Faidra Monachou, Gourab K Patro, Manish Raghavan, Ana-Andreea Stoica, Stratis Tsirtsis	(参考訳) したがって、公平で平等なシステムを構築するためにアルゴリズム的なコンポーネントにどのように介入するかは、最も重要な問題です。現代の意思決定システムでは、リソースや情報を人(例えば学校選択、広告)に割り当てることによって、パイプラインに機械が学習した予測を取り入れ、潜在的戦略行動や制約された割り当てに関する懸念を提起する。機械学習とメカニズム設計の両方が公平性と公平性の問題に対処するフレームワークを開発したが、複雑な意思決定システムではどちらのフレームワークも個々に十分ではない。本稿では,公平な意思決定システムを構築するには,各分野に固有の制約を克服する必要があるという立場を開発する。私たちの究極の目標は、メカニズム設計と機械学習の個々のフレームワークを結合的にブリッジする包括的フレームワークを構築することです。我々は、各分野が公正な意思決定を行う視点を比較し、各分野が教えた教訓をティーズアウトし、相互に教えることができ、これらの分野の強力な協力を必要とするアプリケーションドメインを強調して、この目標に向けて基礎的な作業を始めている。 Decision-making systems increasingly orchestrate our world: how to intervene on the algorithmic components to build fair and equitable systems is therefore a question of utmost importance; one that is substantially complicated by the context-dependent nature of fairness and discrimination. Modern decision-making systems that involve allocating resources or information to people (e.g., school choice, advertising) incorporate machine-learned predictions in their pipelines, raising concerns about potential strategic behavior or constrained allocation, concerns usually tackled in the context of mechanism design. Although both machine learning and mechanism design have developed frameworks for addressing issues of fairness and equity, in some complex decision-making systems, neither framework is individually sufficient. In this paper, we develop the position that building fair decision-making systems requires overcoming these limitations which, we argue, are inherent to each field. Our ultimate objective is to build an encompassing framework that cohesively bridges the individual frameworks of mechanism design and machine learning. 
We begin to lay the groundwork towards this goal by comparing the perspective each discipline takes on fair decision-making, teasing out the lessons each field has taught and can teach the other, and highlighting application domains that require a strong collaboration between these disciplines.	翻訳日:2022-10-08 08:02:51 公開日:2021-03-04
# 選択的コントラスト学習による完全教師なし人物再同定 Fully Unsupervised Person Re-identification viaSelective Contrastive Learning ( http://arxiv.org/abs/2010.07608v2 ) ライセンス: Link先を確認	Bo Pang, Deming Zhai, Junjun Jiang, Xianming Liu	(参考訳) 人物再識別(ReID)は、様々なカメラが捉えた画像の中から同一人物を検索することを目的としている。教師なしのReIDは,手動のアノテーションを多用せず,新たな条件に適応する可能性が大きいため,近年多くの注目を集めている。表現学習は、教師なしのReIDにおいて重要な役割を果たす。本研究では,教師なし特徴学習のための新しいコントラスト学習フレームワークを提案する。具体的には、従来のコントラスト学習戦略と異なり、コントラスト損失を定義するために複数の正と適応的にサンプリングされた負を用いて、より強い識別表現を持つ特徴埋め込みモデルを学ぶことを提案する。さらに,グローバルメモリバンクとローカルメモリバンクを対の類似性計算に,混合メモリバンクをコントラスト損失定義に使用する3つの動的辞書を構成するために,グローバルとローカルの機能を共同で活用することを提案する。調査の結果,教師なしのReIDにおける手法の優位性について,最先端技術と比較した。 Person re-identification (ReID) aims at searching the same identity person among images captured by various cameras. Unsupervised person ReID attracts a lot of attention recently, due to it works without intensive manual annotation and thus shows great potential of adapting to new conditions. Representation learning plays a critical role in unsupervised person ReID. In this work, we propose a novel selective contrastive learning framework for unsupervised feature learning. Specifically, different from traditional contrastive learning strategies, we propose to use multiple positives and adaptively sampled negatives for defining the contrastive loss, enabling to learn a feature embedding model with stronger identity discriminative representation. Moreover, we propose to jointly leverage global and local features to construct three dynamic dictionaries, among which the global and local memory banks are used for pairwise similarity computation and the mixture memory bank are used for contrastive loss definition. Experimental results demonstrate the superiority of our method in unsupervised person ReID compared with the state-of-the-arts.	翻訳日:2022-10-07 04:08:23 公開日:2021-03-04
# 非凸非凸ミニマックス最適化の連続時間系による限界挙動 Limiting Behaviors of Nonconvex-Nonconcave Minimax Optimization via Continuous-Time Systems ( http://arxiv.org/abs/2010.10628v2 ) ライセンス: Link先を確認	Benjamin Grimmer, Haihao Lu, Pratik Worah, Vahab Mirrokni	(参考訳) 非凸最適化とは異なり、勾配降下は局所最適化に収束することが保証されるが、非凸非凹極小最適化のアルゴリズムは位相的に異なる解経路を持つことがある。本稿では,勾配降下上昇法(gda),交流勾配降下上昇法(agda),超勾配法(egm)の3つの古典的なminimaxアルゴリズムの限界挙動について検討する。本稿では,これらの制限行動がGAN(Generative Adversarial Networks)トレーニングで発生し,多様なGAN問題に対して容易に実証できることを観察する。これらの異なる挙動を説明するために、各アルゴリズムに対応する高次分解能連続時間ダイナミクスについて検討し、各手法による局所収束に十分な(そしてほぼ必要)条件を導出する。さらに、この ode の視点により、ホップ分岐として正規化を導入することによって引き起こされるこれらの異なる制限挙動間の位相遷移を特徴付けることができる。 Unlike nonconvex optimization, where gradient descent is guaranteed to converge to a local optimizer, algorithms for nonconvex-nonconcave minimax optimization can have topologically different solution paths: sometimes converging to a solution, sometimes never converging and instead following a limit cycle, and sometimes diverging. In this paper, we study the limiting behaviors of three classic minimax algorithms: gradient descent ascent (GDA), alternating gradient descent ascent (AGDA), and the extragradient method (EGM). Numerically, we observe that all of these limiting behaviors can arise in Generative Adversarial Networks (GAN) training and are easily demonstrated for a range of GAN problems. To explain these different behaviors, we study the high-order resolution continuous-time dynamics that correspond to each algorithm, which results in the sufficient (and almost necessary) conditions for the local convergence by each method. Moreover, this ODE perspective allows us to characterize the phase transition between these different limiting behaviors caused by introducing regularization as Hopf Bifurcations.	翻訳日:2022-10-05 07:57:38 公開日:2021-03-04
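The three limiting behaviors named in the abstract above can be seen on a toy problem. The following is a minimal sketch, not taken from the paper: it assumes the bilinear objective f(x, y) = x * y (minimize over x, maximize over y), a fixed step size of 0.1, and the hypothetical helper names `gda_step`, `agda_step`, `egm_step`, and `run`.

```python
import math

# Toy objective (an assumption for illustration): f(x, y) = x * y,
# minimized over x and maximized over y. Its gradients are
# df/dx = y and df/dy = x, and the unique saddle point is (0, 0).

def gda_step(x, y, lr):
    """Simultaneous gradient descent ascent: both players use the old iterate."""
    return x - lr * y, y + lr * x

def agda_step(x, y, lr):
    """Alternating GDA: y is updated using the freshly updated x."""
    x_new = x - lr * y
    return x_new, y + lr * x_new

def egm_step(x, y, lr):
    """Extragradient: take a look-ahead step, then update with its gradients."""
    x_half, y_half = x - lr * y, y + lr * x
    return x - lr * y_half, y + lr * x_half

def run(step, steps=1000, lr=0.1, start=(1.0, 1.0)):
    """Iterate `step` and return the final distance to the saddle point (0, 0)."""
    x, y = start
    for _ in range(steps):
        x, y = step(x, y, lr)
    return math.hypot(x, y)

if __name__ == "__main__":
    for name, step in [("GDA", gda_step), ("AGDA", agda_step), ("EGM", egm_step)]:
        print(f"{name:5s} distance to saddle after 1000 steps: {run(step):.4f}")
```

On this toy problem GDA spirals away from the saddle point, AGDA stays on a bounded orbit around it, and EGM contracts towards it. The paper's analysis covers general nonconvex-nonconcave objectives, which this sketch does not attempt to reproduce.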
# BiTe-GCN: テキストリッチネットワーク上のトポロジと特徴の双方向変換による新しいGCNアーキテクチャ BiTe-GCN: A New GCN Architecture via BidirectionalConvolution of Topology and Features on Text-Rich Networks ( http://arxiv.org/abs/2010.12157v2 ) ライセンス: Link先を確認	Di Jin, Xiangchen Song, Zhizhi Yu, Ziyang Liu, Heling Zhang, Zhaomeng Cheng, Jiawei Han	(参考訳) グラフ畳み込み層を通して高次近傍情報を統合することを目的としたグラフ畳み込みネットワーク(gcns)は、多くのネットワーク分析タスクにおいて顕著な能力を示している。しかし、オーバースムーシングや局所位相ホモフィイを含むトポロジー上の制限は、ネットワークを表現する能力を制限する。既存の研究はネットワークトポロジにおける特徴畳み込みしか行わず、必然的にトポロジと特徴の不均衡をもたらす。実世界では、情報ネットワークは、ノードレベルの引用情報だけでなく、ローカルなテキストシーケンス情報も含む。テキストリッチネットワーク上でのトポロジと特徴の双方向畳み込みによる新しいGCNアーキテクチャであるBiTe-GCNを提案する。まず、元のテキストリッチネットワークを拡張二型ヘテロジニアスネットワークに変換し、グローバルノードレベル情報とローカルテキストシーケンス情報の両方をテキストからキャプチャする。次に、トポロジーと特徴の両方の畳み込みを同時に実行する識別畳み込み機構を導入する。テキストリッチネットワークに関する広範囲な実験によって、新しいアーキテクチャがブレークアウトの改善によって最先端を上回っていることが証明された。さらに、このアーキテクチャはjd検索のようないくつかのeコマース検索シーンにも適用できる。 jdデータセットにおける実験は、提案するアーキテクチャが関連する手法よりも優れていることを検証している。 Graph convolutional networks (GCNs), aiming to integrate high-order neighborhood information through stacked graph convolution layers, have demonstrated remarkable power in many network analysis tasks. However, topological limitations, including over-smoothing and local topology homophily, limit its capability to represent networks. Existing studies only perform feature convolution on network topology, which inevitably introduces unbalance between topology and features. Considering that in real world, the information network consists of not only the node-level citation information but also the local text-sequence information. We propose BiTe-GCN, a novel GCN architecture with bidirectional convolution of both topology and features on text-rich networks to solve these limitations. We first transform the original text-rich network into an augmented bi-typed heterogeneous network, capturing both the global node-level information and the local text-sequence information from texts. 
We then introduce discriminative convolution mechanisms to perform convolutions of both topology and features simultaneously. Extensive experiments on text-rich networks demonstrate that our new architecture outperforms the state of the art by a breakout improvement. Moreover, this architecture can also be applied to several e-commerce searching scenes such as JD searching. The experiments on the JD dataset validate the superiority of the proposed architecture over the related methods.	翻訳日:2022-10-03 23:37:21 公開日:2021-03-04
# EDNet:コストボリュームとアテンションに基づく空間残差を考慮した効率的な分散推定 EDNet: Efficient Disparity Estimation with Cost Volume Combination and Attention-based Spatial Residual ( http://arxiv.org/abs/2010.13338v4 ) ライセンス: Link先を確認	Songyan Zhang, Zhicheng Wang, Qiang Wang, Jinshuo Zhang, Gang Wei, Xiaowen Chu	(参考訳) 既存の最先端の分散度推定作業は、主に4D結合ボリュームを活用し、高いメモリ消費と遅い推論速度のために非効率な分散回帰のために非常に深い3D畳み込みニューラルネットワーク(CNN)を構築する。本稿では,EDNetというネットワークを効率よく分散推定する手法を提案する。まず,圧縮された連結音量からの文脈情報と相関音量からの特徴類似性測定を組み合わせた複合音量を構築する。結合ボリュームは次に2D畳み込みによって集約され、3D畳み込みよりも高速で少ないメモリを必要とする。次に,注意認識残差特性を生成するための注意に基づく空間残差モジュールを提案する。注意機構を適用し、複数スケールの誤差マップを用いて、不正確な領域に関する直感的な空間的証拠を提供し、残差学習効率を向上する。 Scene FlowとKITTIデータセットの大規模な実験は、EDNetが以前の3D CNNベースの作業より優れており、非常に高速でメモリ消費の少ない最先端のパフォーマンスを実現していることを示している。 Existing state-of-the-art disparity estimation works mostly leverage the 4D concatenation volume and construct a very deep 3D convolution neural network (CNN) for disparity regression, which is inefficient due to the high memory consumption and slow inference speed. In this paper, we propose a network named EDNet for efficient disparity estimation. Firstly, we construct a combined volume which incorporates contextual information from the squeezed concatenation volume and feature similarity measurement from the correlation volume. The combined volume can be next aggregated by 2D convolutions which are faster and require less memory than 3D convolutions. Secondly, we propose an attention-based spatial residual module to generate attention-aware residual features. The attention mechanism is applied to provide intuitive spatial evidence about inaccurate regions with the help of error maps at multiple scales and thus improve the residual learning efficiency. Extensive experiments on the Scene Flow and KITTI datasets show that EDNet outperforms the previous 3D CNN based works and achieves state-of-the-art performance with significantly faster speed and less memory consumption.	翻訳日:2022-10-02 19:16:29 公開日:2021-03-04
# 遠心損失に基づく弱教師付き意味セグメンテーションアプローチ:品質管理と検査への応用 A Weakly-Supervised Semantic Segmentation Approach based on the Centroid Loss: Application to Quality Control and Inspection ( http://arxiv.org/abs/2010.13433v3 ) ライセンス: Link先を確認	Kai Yao, Alberto Ortiz, Francisco Bonnin-Pascual	(参考訳) 一般に、ディープラーニングと畳み込みニューラルネットワークに基づく現在のビジョンアルゴリズムの重要な部分の1つは、競合性能を達成するのに十分な数の画像のアノテーションであると考えられている。アノテーションはピクセルレベルで理想的に生成する必要があるため、セマンティックセグメンテーションタスクでは特に難しい。弱い教師付きセマンティックセグメンテーション(weakly supervised semantic segmentation)は、よりシンプルなアノテーションを使用することで、このコストを削減することを目的としている。本稿では,弱いアノテーションの効果を相殺することを目的とした新しい損失関数を用いて,新しい弱教師付き意味セグメンテーション手法を提案し,評価する。この目的のために、この損失関数は部分的エントロピー損失に基づくいくつかの項を含み、その1つはセントロイド損失である。この用語は、最適化を導くことによってセグメンテーションネットワークのトレーニングを改善することを目的として、対象クラスにおける画像画素のクラスタリングを誘導する。手法の性能は,品質管理アプリケーションにおいて,複数の異なるオブジェクトクラスのインスタンスを検出した場合と,視覚検査領域に起源を持つ場合と,特定の欠陥によって影響を受けるシーン表面の点に対応する画像領域の局所化を扱う場合の2つの異なるケーススタディから評価される。両方のケースで報告された検出結果は、両者の違いと特定の課題にもかかわらず、弱いアノテーションの使用が双方の競争性能レベルを達成するのを妨げないことを示している。 It is generally accepted that one of the critical parts of current vision algorithms based on deep learning and convolutional neural networks is the annotation of a sufficient number of images to achieve competitive performance. This is particularly difficult for semantic segmentation tasks since the annotation must be ideally generated at the pixel level. Weakly-supervised semantic segmentation aims at reducing this cost by employing simpler annotations that, hence, are easier, cheaper and quicker to produce. In this paper, we propose and assess a new weakly-supervised semantic segmentation approach making use of a novel loss function whose goal is to counteract the effects of weak annotations. To this end, this loss function comprises several terms based on partial cross-entropy losses, being one of them the Centroid Loss. This term induces a clustering of the image pixels in the object classes under consideration, whose aim is to improve the training of the segmentation network by guiding the optimization. 
The performance of the approach is evaluated against datasets from two different industry-related case studies: while one involves the detection of instances of a number of different object classes in the context of a quality control application, the other stems from the visual inspection domain and deals with the localization of image areas whose pixels correspond to scene surface points affected by a specific sort of defect. The detection results reported for both cases show that, despite the differences between them and their particular challenges, the use of weak annotations does not prevent achieving a competitive performance level in either.	翻訳日:2022-10-02 19:13:42 公開日:2021-03-04
# パブリックポリシーのための説明可能な機械学習: ユースケース、ギャップ、研究方向 Explainable Machine Learning for Public Policy: Use Cases, Gaps, and Research Directions ( http://arxiv.org/abs/2010.14374v2 ) ライセンス: Link先を確認	Kasun Amarasinghe, Kit Rodolfa, Hemank Lamba, Rayid Ghani	(参考訳) 説明可能性は、健康、刑事司法、教育、雇用といった公共政策分野における意思決定を支援する機械学習(ml)モデルの採用と同様に、有効性にとって重要な要件であると同時に、近年説明可能性の分野は拡大しているが、この研究の多くは現実世界のニーズを考慮していない。提案手法の大部分は、明確なユースケースや意図したエンドユーザを使わずに、汎用的な説明可能性目標を持つベンチマークデータセットを使用する。その結果, 実世界の応用における理論的, 方法論的研究の適用性や有効性は明らかでない。本稿では、この空白を公共政策の領域に充足することに焦点を当てる。我々は,公共政策問題における説明可能性ユースケースの分類を開発し,各ユースケースにおいて,説明のエンドユーザーを定義し,その具体的目標を達成しなければならない。第3に,既存の作業をこれらのユースケースにマッピングし,ギャップを特定し,それらのギャップを埋めて,MLを通じた実践的な社会的影響を得るための研究の方向性を提案する。 Explainability is a crucial requirement for effectiveness as well as the adoption of Machine Learning (ML) models supporting decisions in high-stakes public policy areas such as health, criminal justice, education, and employment, While the field of explainable has expanded in recent years, much of this work has not taken real-world needs into account. A majority of proposed methods use benchmark datasets with generic explainability goals without clear use-cases or intended end-users. As a result, the applicability and effectiveness of this large body of theoretical and methodological work on real-world applications is unclear. This paper focuses on filling this void for the domain of public policy. We develop a taxonomy of explainability use-cases within public policy problems; for each use-case, we define the end-users of explanations and the specific goals explainability has to fulfill; third, we map existing work to these use-cases, identify gaps, and propose research directions to fill those gaps in order to have a practical societal impact through ML.	翻訳日:2022-10-02 11:22:23 公開日:2021-03-04
# 投射による一般線形帯域の自己一致解析 Self-Concordant Analysis of Generalized Linear Bandits with Forgetting ( http://arxiv.org/abs/2011.00819v2 ) ライセンス: Link先を確認	Yoan Russac (DI-ENS, CNRS, PSL, VALDA), Louis Faury, Olivier Cappé (DI-ENS, VALDA), Aurélien Garivier (UMPA-ENSL)	(参考訳) カテゴリー的あるいは数値的観察を伴う文脈的逐次決定問題はユビキタスであり、一般化線形バンド(glb)はそれらに対処するための固い理論的枠組みを提供する。線形帯域の場合とは対照的に、GLBの既存のアルゴリズムは適用性を損なう2つの欠点がある。まず、モデルの非線形の性質のため、過度に悲観的な濃度境界に依存する。第二に、推定器の有界性を強制するためには、非凸射影ステップかバーンインフェーズのいずれかが必要である。これらの問題は、GLBパラメータが時間によって変化する可能性のある非定常モデルを考えると、どちらも悪化する。本研究では,スライディングウインドウと指数重みのどちらかを用いて達成した自己調和型GLB(ロジスティックおよびポアソン回帰を含む)に着目した。そこで本研究では,急速変化する環境において,最大類似推定器に対する信頼度に基づく新しいアルゴリズムを提案する。これらの結果とそれに伴う数値シミュレーションは,GLBの非定常性に対処する提案手法の可能性を強調している。 Contextual sequential decision problems with categorical or numerical observations are ubiquitous and Generalized Linear Bandits (GLB) offer a solid theoretical framework to address them. In contrast to the case of linear bandits, existing algorithms for GLB have two drawbacks undermining their applicability. First, they rely on excessively pessimistic concentration bounds due to the non-linear nature of the model. Second, they require either non-convex projection steps or burn-in phases to enforce boundedness of the estimators. Both of these issues are worsened when considering non-stationary models, in which the GLB parameter may vary with time. In this work, we focus on self-concordant GLB (which include logistic and Poisson regression) with forgetting achieved either by the use of a sliding window or exponential weights. We propose a novel confidence-based algorithm for the maximum-likelihood estimator with forgetting and analyze its performance in abruptly changing environments. These results as well as the accompanying numerical simulations highlight the potential of the proposed approach to address non-stationarity in GLB.	翻訳日:2022-09-30 10:46:22 公開日:2021-03-04
# ライフサイクルアウェアカプセルネットワークを用いた顔行動の時空間的解析 Spatio-Temporal Analysis of Facial Actions using Lifecycle-Aware Capsule Networks ( http://arxiv.org/abs/2011.08819v2 ) ライセンス: Link先を確認	Nikhil Churamani, Sinan Kalkan and Hatice Gunes	(参考訳) 顔活動単位(au)検出のための最先端のアプローチのほとんどは、静的フレームからの表情の評価に依存しており、顔活動の高度化のスナップショットをエンコードしている。しかし、現実世界の相互作用では、表情はより微妙で、時間的方法で進化し、時間的情報と同様に空間的および時間的情報を学ぶ必要がある。本稿では,顔AUアクティベーションの時間的変化を符号化する空間的特徴と時空間的特徴の両方に焦点をあてる。そこで本研究では,フレームとシーケンスレベルの両方の機能を用いてau検出を行うアクションユニットライフサイクルアウェアカプセルネットワーク(aula-caps)を提案する。フレームレベルでは、AULA-Capsのカプセル層が空間的特徴プリミティブを学習し、AUのアクティベーションを決定する一方で、シーケンスレベルでは、シーケンス内の関連する時空間セグメントに焦点を当てることで、連続フレーム間の時間的依存関係を学習する。学習された特徴カプセルは、AUライフサイクルに応じて、空間的あるいは時空間的な情報に選択的に集中するようにルーティングされる。提案手法はBP4D と GFT のベンチマークデータセットで評価され,両データセットの最先端結果が得られた。 Most state-of-the-art approaches for Facial Action Unit (AU) detection rely upon evaluating facial expressions from static frames, encoding a snapshot of heightened facial activity. In real-world interactions, however, facial expressions are usually more subtle and evolve in a temporal manner requiring AU detection models to learn spatial as well as temporal information. In this paper, we focus on both spatial and spatio-temporal features encoding the temporal evolution of facial AU activation. For this purpose, we propose the Action Unit Lifecycle-Aware Capsule Network (AULA-Caps) that performs AU detection using both frame and sequence-level features. While at the frame-level the capsule layers of AULA-Caps learn spatial feature primitives to determine AU activations, at the sequence-level, it learns temporal dependencies between contiguous frames by focusing on relevant spatio-temporal segments in the sequence. The learnt feature capsules are routed together such that the model learns to selectively focus more on spatial or spatio-temporal information depending upon the AU lifecycle. The proposed model is evaluated on the commonly used BP4D and GFT benchmark datasets obtaining state-of-the-art results on both the datasets.	
翻訳日:2022-09-24 16:11:17 公開日:2021-03-04
# Universal MelGAN:複数領域における高密度波形生成のためのロバストニューラルネットワーク Universal MelGAN: A Robust Neural Vocoder for High-Fidelity Waveform Generation in Multiple Domains ( http://arxiv.org/abs/2011.09631v2 ) ライセンス: Link先を確認	Won Jang, Dan Lim, Jaesam Yoon	(参考訳) 複数のドメインで高忠実度音声を合成するボコーダであるUniversal MelGANを提案する。数百人の話者のデータセットを用いてMelGANに基づく構造を訓練した場合の音質を維持するため,生成波形のスペクトル分解能を高めるため,マルチレゾリューション・スペクトログラム判別器を追加した。これにより、大型フットプリントモデルの高周波帯域における過平滑化問題を緩和し、マルチスピーカの現実的な波形を生成することができる。学習中に波形とスペクトログラムを識別することにより、推定速度を低下させることなく、地中データに近い信号を生成する。このモデルでは,入力としてグラウンドトルース・メル・スペクトログラムを用いて,ほとんどのシナリオで最高の平均世論スコア(MOS)を得た。特に, 話者, 感情, 言語に関して, 未認識領域において優れた性能を示した。さらに,変換器モデルで生成したメルスペクトルを用いたマルチスピーカ音声合成では,4.22MOSの高忠実度音声を合成した。これらの結果は、外部のドメイン情報なしで達成され、普遍的なボコーダとして提案モデルの可能性を強調している。 We propose Universal MelGAN, a vocoder that synthesizes high-fidelity speech in multiple domains. To preserve sound quality when the MelGAN-based structure is trained with a dataset of hundreds of speakers, we added multi-resolution spectrogram discriminators to sharpen the spectral resolution of the generated waveforms. This enables the model to generate realistic waveforms of multi-speakers, by alleviating the over-smoothing problem in the high frequency band of the large footprint model. Our structure generates signals close to ground-truth data without reducing the inference speed, by discriminating the waveform and spectrogram during training. The model achieved the best mean opinion score (MOS) in most scenarios using ground-truth mel-spectrogram as an input. Especially, it showed superior performance in unseen domains with regard of speaker, emotion, and language. Moreover, in a multi-speaker text-to-speech scenario using mel-spectrogram generated by a transformer model, it synthesized high-fidelity speech of 4.22 MOS. These results, achieved without external domain information, highlight the potential of the proposed model as a universal vocoder.	翻訳日:2022-09-23 20:43:05 公開日:2021-03-04
# (参考訳) SAFFIRE:自律的特徴フィルタリングとインテリジェントROI推定システム SAFFIRE: System for Autonomous Feature Filtering and Intelligent ROI Estimation ( http://arxiv.org/abs/2012.02502v2 ) ライセンス: CC BY 4.0	Marco Boschi, Luigi Di Stefano, Martino Alessandrini	(参考訳) この研究は、一連の画像サンプルから支配的な再帰的イメージパターンを自動的に抽出する、SAFFIREという新しいフレームワークを導入する。このようなパターンは、多くのコンピュータビジョンや機械学習タスクにおいて共通の要件であるサンプル間のポーズの変動を排除するために使用される。このフレームワークは、自動製品検査のためのマシンビジョンシステムという文脈で特化している。ここでは、ユーザがアンカーパターンの識別を尋ね、さらに処理する前に、自動化システムがデータを正規化するために使用するのが慣例である。しかし、これは本質的に主観的で高度な専門知識を必要とする非常に敏感な操作である。これにより、SAFFIREは、ユーザに完全に透過的な方法で最適なアンカーパターンを教師なしで識別するための、ユニークで破壊的なフレームワークを提供する。 saffireは、マシンビジョン検査パイプラインの現実的なケーススタディで完全に検証されている。 This work introduces a new framework, named SAFFIRE, to automatically extract a dominant recurrent image pattern from a set of image samples. Such a pattern shall be used to eliminate pose variations between samples, which is a common requirement in many computer vision and machine learning tasks. The framework is specialized here in the context of a machine vision system for automated product inspection. Here, it is customary to ask the user for the identification of an anchor pattern, to be used by the automated system to normalize data before further processing. Yet, this is a very sensitive operation which is intrinsically subjective and requires high expertise. Hereto, SAFFIRE provides a unique and disruptive framework for unsupervised identification of an optimal anchor pattern in a way which is fully transparent to the user. SAFFIRE is thoroughly validated on several realistic case studies for a machine vision inspection pipeline.	翻訳日:2021-05-23 07:35:23 公開日:2021-03-04
# YieldNet:リモートセンシングデータに基づくコーンと大豆の同時収量予測のための畳み込みニューラルネットワーク YieldNet: A Convolutional Neural Network for Simultaneous Corn and Soybean Yield Prediction Based on Remote Sensing Data ( http://arxiv.org/abs/2012.03129v2 ) ライセンス: Link先を確認	Saeed Khaki, Hieu Pham and Lizhi Wang	(参考訳) 大規模作物収量の推定は、その成長状態を通じて作物の連続的な監視を可能にするリモートセンシングデータの提供によって可能になった。この情報を持つことで、利害関係者は利回りポテンシャルを最大化するためにリアルタイムの意思決定ができる。リモートセンシングデータから収量を予測する様々なモデルが存在するが、現在では複数の作物の収量を同時に推定できるアプローチは存在しないため、より正確な予測につながる。複数の作物の収量を予測し、複数の作物の収量間の相互作用を同時に考慮するモデル。本稿では,背骨特徴抽出器の重みを共用することにより,トウモロコシと大豆の収量予測の伝達学習を利用する新しいディープラーニングフレームワークである yieldnet を用いた新しいモデルを提案する。さらに,マルチターゲット応答変数を検討するために,新しい損失関数を提案する。その結果,提案手法は収穫の1～4か月前の収量を正確に予測でき,他の最先端手法と競合することがわかった。 Large scale crop yield estimation is, in part, made possible due to the availability of remote sensing data allowing for the continuous monitoring of crops throughout its growth state. Having this information allows stakeholders the ability to make real-time decisions to maximize yield potential. Although various models exist that predict yield from remote sensing data, there currently does not exist an approach that can estimate yield for multiple crops simultaneously, and thus leads to more accurate predictions. A model that predicts yield of multiple crops and concurrently considers the interaction between multiple crop's yield. We propose a new model called YieldNet which utilizes a novel deep learning framework that uses transfer learning between corn and soybean yield predictions by sharing the weights of the backbone feature extractor. Additionally, to consider the multi-target response variable, we propose a new loss function. Numerical results demonstrate that our proposed method accurately predicts yield from one to four months before the harvest, and is competitive to other state-of-the-art approaches.	翻訳日:2021-05-22 12:09:23 公開日:2021-03-04
# (参考訳) スパース凸ウェーブレットクラスタリングによる同時グループ化とデノーミング Simultaneous Grouping and Denoising via Sparse Convex Wavelet Clustering ( http://arxiv.org/abs/2012.04762v2 ) ライセンス: CC BY-SA 4.0	Michael Weylandt and T. Mitchell Roddenberry and Genevera I. Allen	(参考訳) クラスタリングは、データサイエンスと信号処理におけるユビキタスな問題である。ノイズの多い信号を観測する多くのアプリケーションでは、まず最初にウェーブレットをデノイズ化し、次にクラスタリングアルゴリズムを適用することが一般的である。本稿では,グループを分離し発見する疎凸ウェーブレットクラスタリング手法を開発した。本手法では,コンベックス核融合ペナルティを用いて凝集とグループスパースペナルティを実現し,ウェーブレット領域のスパーシティを緩和する。クラスタを識別する一般的な手法とは対照的に,我々の手法は同時に実行する統一凸アプローチである。本手法は,解釈性とデータ圧縮性を両立させるデノタイズ(ウェーブレットスパース)クラスタセントロイドを生成する。本手法の合成例とNMR分光への応用について述べる。 Clustering is a ubiquitous problem in data science and signal processing. In many applications where we observe noisy signals, it is common practice to first denoise the data, perhaps using wavelet denoising, and then to apply a clustering algorithm. In this paper, we develop a sparse convex wavelet clustering approach that simultaneously denoises and discovers groups. Our approach utilizes convex fusion penalties to achieve agglomeration and group-sparse penalties to denoise through sparsity in the wavelet domain. In contrast to common practice which denoises then clusters, our method is a unified, convex approach that performs both simultaneously. Our method yields denoised (wavelet-sparse) cluster centroids that both improve interpretability and data compression. We demonstrate our method on synthetic examples and in an application to NMR spectroscopy.	翻訳日:2021-05-16 23:09:52 公開日:2021-03-04
# 不確実性整定ピラミッド一貫性による鼻咽頭癌分節の高効率化 Efficient Semi-Supervised Gross Target Volume of Nasopharyngeal Carcinoma Segmentation via Uncertainty Rectified Pyramid Consistency ( http://arxiv.org/abs/2012.07042v3 ) ライセンス: Link先を確認	Xiangde Luo, Wenjun Liao, Jieneng Chen, Tao Song, Yinan Chen, Shichuan Zhang, Nianyong Chen, Guotai Wang, Shaoting Zhang	(参考訳) 鼻咽喉頭癌(NPC)に対する放射線治療計画においてGross Target Volume(GTV)セグメンテーションは相容れない役割を担っている。畳み込みニューラルネットワーク(CNN)はこのタスクで優れたパフォーマンスを達成したが、トレーニングには大量のラベル付きイメージを頼りにしており、これは高価で取得に時間がかかる。本稿では,半教師付きNPC GTVセグメンテーションのための不確かさ確認ピラミッド整合性(URPC)正則化手法を提案する。具体的には、バックボーンセグメンテーションネットワークを拡張して、異なるスケールでピラミッド予測を生成する。ピラミッド予測ネットワーク(PPNet)は、ラベル付き画像の基底真実とラベル付き画像のマルチスケール一貫性損失によって管理されており、同じ入力に対する異なるスケールでの予測は類似し一貫性があるべきであるという事実を動機としている。しかし、これらの予測の解像度が異なるため、各ピクセルに一貫性を持たせるように促すことは、ロバスト性が低く、詳細を失う可能性がある。この問題に対処するために,新たな不確実性整流モジュールをデザインし,各規模の有意義で信頼性の高いコンセンサス領域から徐々に学習できるようにする。 258 npcのmr画像を用いたデータセット実験の結果,ラベル付き画像が10%から20%しか表示されず,ラベル付き画像を活用することによりセグメント化性能が大幅に向上し,最先端の半教師付きセグメンテーション手法を5つ上回った。さらに、50%のイメージがラベル付けされただけで、URPCは、完全に教師付き学習に近い平均82.74%のDiceスコアを達成した。 Gross Target Volume (GTV) segmentation plays an irreplaceable role in radiotherapy planning for Nasopharyngeal Carcinoma (NPC). Despite that Convolutional Neural Networks (CNN) have achieved good performance for this task, they rely on a large set of labeled images for training, which is expensive and time-consuming to acquire. In this paper, we propose a novel framework with Uncertainty Rectified Pyramid Consistency (URPC) regularization for semi-supervised NPC GTV segmentation. Concretely, we extend a backbone segmentation network to produce pyramid predictions at different scales. The pyramid predictions network (PPNet) is supervised by the ground truth of labeled images and a multi-scale consistency loss for unlabeled images, motivated by the fact that prediction at different scales for the same input should be similar and consistent. 
However, due to the different resolutions of these predictions, directly encouraging them to be consistent at each pixel has low robustness and may lose some fine details. To address this problem, we further design a novel uncertainty rectifying module to enable the framework to gradually learn from meaningful and reliable consensual regions at different scales. Experimental results on a dataset with 258 NPC MR images showed that with only 10% or 20% of the images labeled, our method largely improved the segmentation performance by leveraging the unlabeled images, and it also outperformed five state-of-the-art semi-supervised segmentation methods. Moreover, when only 50% of the images were labeled, URPC achieved an average Dice score of 82.74%, which was close to fully supervised learning.	翻訳日:2021-05-09 12:44:33 公開日:2021-03-04
# c-watcher:covid-19流行に先立ってリスクの高い地域を早期発見するためのフレームワーク C-Watcher: A Framework for Early Detection of High-Risk Neighborhoods Ahead of COVID-19 Outbreak ( http://arxiv.org/abs/2012.12169v3 ) ライセンス: Link先を確認	Congxi Xiao, Jingbo Zhou, Jizhou Huang, An Zhuo, Ji Liu, Haoyi Xiong, Dejing Dou	(参考訳) 新型コロナウイルス(COVID-19)は日常的に流行し、いまだに世界中に波及している。非薬剤的介入のための既存の解決策は、通常、住宅地の一部分を適時かつ正確に選択して封じ込めたり隔離したりすることが必要であり、そこでは、特定された症例の空間分布が、部分集合選択の重要な基準とされてきた。このような封じ込め措置は、一部の国では新型コロナウイルスの感染拡大を食い止めたり減速させたりしているものの、確認された症例の統計はたいてい時間的に遅延し、粗粒化しているため、非効率あるいは非効率であると批判されている。この課題に対処するため,C-Watcherという新たなデータ駆動型フレームワークを提案する。C-Watcherは,新型コロナウイルスの流行に先立ち,対象都市の各地区をスクリーニングし,感染リスクを予測する。デザイン面では、C-WatcherはBaidu Mapsから大規模な人間の移動データを収集し、都市の移動パターンに基づいた一連の特徴を用いて市内のすべての住宅地区を特徴付ける。さらに, 地域発生前に, 自発的な知識を対象都市に移すため, 対象都市において特定された事例が判明する以前にも, モビリティ関連特徴から「都市不変」表現を学習し, 高リスク地域を正確に早期に検出するための新しい敵対的エンコーダフレームワークを採用する。新型コロナウイルス(covid-19)流行の初期段階において,実データ記録を用いたc-watcherの広範な実験を行い,多数の都市から高リスク地区を早期に検出するためのc-watcherの効率性と有効性を示した。 The novel coronavirus disease (COVID-19) has crushed daily routines and is still rampaging through the world. Existing solution for nonpharmaceutical interventions usually needs to timely and precisely select a subset of residential urban areas for containment or even quarantine, where the spatial distribution of confirmed cases has been considered as a key criterion for the subset selection. While such containment measure has successfully stopped or slowed down the spread of COVID-19 in some countries, it is criticized for being inefficient or ineffective, as the statistics of confirmed cases are usually time-delayed and coarse-grained. To tackle the issues, we propose C-Watcher, a novel data-driven framework that aims at screening every neighborhood in a target city and predicting infection risks, prior to the spread of COVID-19 from epicenters to the city. 
In terms of design, C-Watcher collects large-scale long-term human mobility data from Baidu Maps, then characterizes every residential neighborhood in the city using a set of features based on urban mobility patterns. Furthermore, to transfer the firsthand knowledge (witted in epicenters) to the target city before local outbreaks, we adopt a novel adversarial encoder framework to learn "city-invariant" representations from the mobility-related features for precise early detection of high-risk neighborhoods, even before any confirmed cases are known, in the target city. We carried out extensive experiments on C-Watcher using real data records from the early stage of COVID-19 outbreaks, where the results demonstrate the efficiency and effectiveness of C-Watcher for early detection of high-risk neighborhoods from a large number of cities.	翻訳日:2021-04-26 07:43:19 公開日:2021-03-04
# 緊急対応システムにおける資源配分の階層的計画 Hierarchical Planning for Resource Allocation in Emergency Response Systems ( http://arxiv.org/abs/2012.13300v2 ) ライセンス: Link先を確認	Geoffrey Pettet and Ayan Mukhopadhyay and Mykel Kochenderfer and Abhishek Dubey	(参考訳) 都市規模のサイバー物理システム(CPS)における古典的な問題は、不確実性の下で資源割り当てである。通常、そのような問題はマルコフ決定過程(あるいは半マルコフ決定過程)としてモデル化される。このような問題に対して、オンライン、オフライン、分散のアプローチが適用されてきたが、大きな意思決定問題へのスケールアップは困難である。本稿では,都市レベルのCPS問題の構造を不確実性を考慮した資源配分に活用する階層的計画手法を提案する。緊急対応を事例研究として,大規模資源割当問題をどのようにより小さな問題に分割できるかを示す。次に、より小さな問題を解決し、それらの相互作用に取り組むための原則化されたフレームワークを作成します。最後に、アメリカの主要都市圏であるテネシー州ナッシュビルからの実世界データを使用して、我々のアプローチを検証する。提案手法は,緊急対応分野における最先端のアプローチよりも優れていることを示す。 A classical problem in city-scale cyber-physical systems (CPS) is resource allocation under uncertainty. Typically, such problems are modeled as Markov (or semi-Markov) decision processes. While online, offline, and decentralized approaches have been applied to such problems, they have difficulty scaling to large decision problems. We present a general approach to hierarchical planning that leverages structure in city-level CPS problems for resource allocation under uncertainty. We use the emergency response as a case study and show how a large resource allocation problem can be split into smaller problems. We then create a principled framework for solving the smaller problems and tackling the interaction between them. Finally, we use real-world data from Nashville, Tennessee, a major metropolitan area in the United States, to validate our approach. Our experiments show that the proposed approach outperforms state-of-the-art approaches used in the field of emergency response.	翻訳日:2021-04-25 08:19:26 公開日:2021-03-04
# (参考訳) FPCC:インスタンスセグメンテーションのための高速ポイントクラウドクラスタリング FPCC: Fast Point Cloud Clustering for Instance Segmentation ( http://arxiv.org/abs/2012.14618v3 ) ライセンス: CC BY 4.0	Yajun Xu, Shogo Arai, Diyi Liu, Fangzhou Lin, Kazuhiro Kosuge	(参考訳) インスタンスセグメンテーションは、ロボット工学、自動運転車、人間とコンピュータの相互作用など、多くの現実世界のアプリケーションにおいて重要な前処理タスクである。しかし、同一クラスの複数のオブジェクトを積み重ねたビンピッキングシーンの3Dポイントクラウドインスタンスセグメンテーションについてはほとんど研究されていない。 2次元画像タスクのためのディープラーニングの急速な開発と比較すると、ディープラーニングベースの3Dポイントクラウドセグメンテーションは、まだ開発の余地がたくさんある。このような状況下では、同じクラスの多数の隠蔽対象を区別することが非常に難しい問題である。通常のビンピッキングシーンでは、オブジェクトモデルが知られ、オブジェクトの型数は1である。したがって、セマンティック情報は無視できる。代わりに、インスタンスのセグメンテーションに焦点が当てられる。このタスク要求に基づき、各インスタンスの特徴中心を推論し、残りのポイントを特徴埋め込み空間において最も近い特徴中心にクラスタリングするネットワーク(FPCC-Net)を提案する。 FPCC-Netには2つのサブネットがあり、1つはクラスタリングのための特徴中心を推測し、もう1つは各点の特徴を記述する。提案手法は,既存の3dポイントクラウドおよび2dセグメンテーション手法と比較した。 FPCC-Net は SGPN よりも平均精度 (AP) が 40 % 向上し,約 0.8 [s] で約 6 万点処理可能である。 Instance segmentation is an important pre-processing task in numerous real-world applications, such as robotics, autonomous vehicles, and human-computer interaction. However, there has been little research on 3D point cloud instance segmentation of bin-picking scenes in which multiple objects of the same class are stacked together. Compared with the rapid development of deep learning for two-dimensional (2D) image tasks, deep learning-based 3D point cloud segmentation still has a lot of room for development. In such a situation, distinguishing a large number of occluded objects of the same class is a highly challenging problem. In a usual bin-picking scene, an object model is known and the number of object type is one. Thus, the semantic information can be ignored; instead, the focus is put on the segmentation of instances. Based on this task requirement, we propose a network (FPCC-Net) that infers feature centers of each instance and then clusters the remaining points to the closest feature center in feature embedding space. 
FPCC-Net includes two subnets, one for inferring the feature centers for clustering and the other for describing features of each point. The proposed method is compared with existing 3D point cloud and 2D segmentation methods in some bin-picking scenes. It is shown that FPCC-Net improves average precision (AP) by about 40% over SGPN and can process about 60,000 points in about 0.8 s.	翻訳日:2021-04-19 05:18:34 公開日:2021-03-04
# トリプルトマッチングネットワークによる分類学の完成 Taxonomy Completion via Triplet Matching Network ( http://arxiv.org/abs/2101.01896v3 ) ライセンス: Link先を確認	Jieyu Zhang, Xiangchen Song, Ying Zeng, Jiaze Chen, Jiaming Shen, Yuning Mao, Lei Li	(参考訳) 自動的に分類を構築することは、eコマースやWeb検索に多くの応用を見出す。重要な課題の1つは、データとビジネスのスコープが実際のアプリケーションで増大するにつれて、新しい概念が出現し、既存の分類体系に追加する必要があることである。従来のアプローチは分類学の拡張、すなわち新しいクエリ概念のための分類法から適切なハイパーニム概念を見つける。本稿では,クエリのハイパーネムとハイポネムの概念の両方を発見することで,新しいタスクである「分類完了」を定式化する。本稿では,与えられたクエリ概念に対して適切な<hypernym, hyponym>ペアを見つけるために,Triplet Matching Network (TMN)を提案する。 TMNは1つの予備スコアと複数の補助スコアからなる。これらの補助スコアラは、様々なきめ細かい信号(例えば、hypernymへのクエリやhypnymセマンティクスへのクエリ)をキャプチャし、予備スコアラは、すべての補助スコアラの内部特徴表現に基づいて、<query, hypernym, hyponym>トリプレットの全体的予測を行う。また、概念表現におけるタスク固有情報を保持する革新的なチャネルワイズゲーティング機構を導入し、さらなるモデル性能の向上を図る。実世界の4つの大規模データセットにおける実験は、tmnが既存の手法を上回って、分類完了タスクと以前の分類拡張タスクの両方において最高の性能を達成していることを示している。 Automatically constructing taxonomy finds many applications in e-commerce and web search. One critical challenge is as data and business scope grow in real applications, new concepts are emerging and needed to be added to the existing taxonomy. Previous approaches focus on the taxonomy expansion, i.e. finding an appropriate hypernym concept from the taxonomy for a new query concept. In this paper, we formulate a new task, "taxonomy completion", by discovering both the hypernym and hyponym concepts for a query. We propose Triplet Matching Network (TMN), to find the appropriate <hypernym, hyponym> pairs for a given query concept. TMN consists of one primal scorer and multiple auxiliary scorers. These auxiliary scorers capture various fine-grained signals (e.g., query to hypernym or query to hyponym semantics), and the primal scorer makes a holistic prediction on <query, hypernym, hyponym> triplet based on the internal feature representations of all auxiliary scorers. Also, an innovative channel-wise gating mechanism that retains task-specific information in concept representations is introduced to further boost model performance. 
Experiments on four real-world large-scale datasets show that TMN achieves the best performance on both taxonomy completion task and the previous taxonomy expansion task, outperforming existing methods.	翻訳日:2021-04-11 00:15:54 公開日:2021-03-04
# (参考訳) 高リソース言語から低リソース言語への転移学習による音声感情認識の分類精度の向上 Transfer learning from High-Resource to Low-Resource Language Improves Speech Affect Recognition Classification Accuracy ( http://arxiv.org/abs/2103.11764v1 ) ライセンス: CC BY-SA 4.0	Sara Durrani and Umair Arshad	(参考訳) 音声感情認識は、音声データから感情を抽出する問題である。低リソース言語のコーパスは希少であり、クロスコーパス設定での感情認識は難しいタスクである。本稿では,高リソース言語でモデルを訓練し,低リソース言語の感情を認識するよう微調整するアプローチを提案する。 SAVEE, EMOVO, Urdu, IEMOCAPの各コーパスについて、同一コーパス設定でモデルを訓練し、それぞれ60.45, 68.05, 80.34, 56.58パーセントのベースライン精度を得た。言語間の感情の多様性を捉えるため、クロスコーパス評価を詳細に論じる。トレーニングデータにターゲットドメインのデータを追加することで、精度が向上することがわかった。最後に,ウルドゥー語とイタリア語の音声感情についてそれぞれ69.32および68.2のUARを達成し,低リソース言語の音声感情認識の性能が向上することを示す。 Speech Affect Recognition is the problem of extracting emotional affects from audio data. Low-resource language corpora are rare, and affect recognition is a difficult task in cross-corpus settings. We present an approach in which the model is trained on a high-resource language and fine-tuned to recognize affects in a low-resource language. We train the model in the same-corpus setting on SAVEE, EMOVO, Urdu, and IEMOCAP, achieving baseline accuracies of 60.45, 68.05, 80.34, and 56.58 percent, respectively. To capture the diversity of affects across languages, cross-corpus evaluations are discussed in detail. We find that accuracy improves by adding the domain target data into the training data. Finally, we show that performance is improved for low-resource language speech affect recognition by achieving UARs of 69.32 and 68.2 for Urdu and Italian speech affects.	翻訳日:2021-04-05 06:44:04 公開日:2021-03-04
# (参考訳) MICCAIハッカソン : MICCAI会議における論文の再現性・多様性・選定 The MICCAI Hackathon on reproducibility, diversity, and selection of papers at the MICCAI conference ( http://arxiv.org/abs/2103.05437v1 ) ライセンス: CC BY 4.0	Fabian Balsiger, Alain Jungo, Naren Akash R J, Jianan Chen, Ivan Ezhov, Shengnan Liu, Jun Ma, Johannes C. Paetzold, Vishva Saravanan R, Anjany Sekuboyina, Suprosanna Shit, Yannick Suter, Moshood Yekini, Guodong Zeng, Markus Rempfler	(参考訳) MICCAIカンファレンスは、コミュニティの規模、コントリビューションの数、技術的成功の観点から、ここ数年で大きな成長を遂げています。しかし、この成長はコミュニティに新たな課題をもたらします。再現は困難であり,MICCAI会議への論文提出件数の増加は,選択プロセスとトピックの多様性に関する新たな疑問を提起する。これらの課題の交換、議論、創造的な解決策を見つけるために、ハッカソンの新しいフォーマットは、miccai 2020 conference: the miccai hackathonのサテライトイベントとして開始された。 MICCAIハッカソンの第1版では、MICCAI論文の再現性、多様性、選択について論じられている。小さなシンクタンクの方法で、参加者は協力してこれらの課題に対する解決策を見つけました。本報告では,MICCAIハッカソンから得られた知見を,これらの課題に対処するための即時的・長期的対策について要約する。提案手法は, 論文の再現性, 多様性, 選択性に関して, MICCAI会議を改善するための議論・行動の出発点および指針とみなすことができる。 The MICCAI conference has encountered tremendous growth over the last years in terms of the size of the community, as well as the number of contributions and their technical success. With this growth, however, come new challenges for the community. Methods are more difficult to reproduce and the ever-increasing number of paper submissions to the MICCAI conference poses new questions regarding the selection process and the diversity of topics. To exchange, discuss, and find novel and creative solutions to these challenges, a new format of a hackathon was initiated as a satellite event at the MICCAI 2020 conference: The MICCAI Hackathon. The first edition of the MICCAI Hackathon covered the topics reproducibility, diversity, and selection of MICCAI papers. In the manner of a small think-tank, participants collaborated to find solutions to these challenges. In this report, we summarize the insights from the MICCAI Hackathon into immediate and long-term measures to address these challenges. 
The proposed measures can be seen as starting points and guidelines for discussions and actions to possibly improve the MICCAI conference with regards to reproducibility, diversity, and selection of papers.	翻訳日:2021-04-05 06:36:12 公開日:2021-03-04
# NADI 2021:第2回Nuanced Arabic Dialect Identification Shared Task NADI 2021: The Second Nuanced Arabic Dialect Identification Shared Task ( http://arxiv.org/abs/2103.08466v1 ) ライセンス: Link先を確認	Muhammad Abdul-Mageed, Chiyu Zhang, AbdelRahim Elmadany, Houda Bouamor, Nizar Habash	(参考訳) 本研究は,第2回Nuanced Arabic Dialect Identification Shared Task (NADI 2021)の知見と結果を報告する。この共有タスクには、国レベルの現代標準アラビア語(MSA)識別(Subtask 1.1)、国レベルの方言識別(Subtask 1.2)、州レベルのMSA識別(Subtask 2.1)、州レベルのサブ方言識別(Subtask 2.2)の4つのサブタスクが含まれる。共有タスクデータセットは、Twitterドメインから収集された21のアラブ諸国から合計100の州をカバーする。 23か国53チームが参加登録しており、この分野に対するコミュニティの関心を反映している。 Subtask 1.1には5チームから16件,Subtask 1.2には8チームから27件,Subtask 2.1には4チームから12件,Subtask 2.2には4チームから13件の提出があった。 We present the findings and results of the Second Nuanced Arabic Dialect Identification Shared Task (NADI 2021). This Shared Task includes four subtasks: country-level Modern Standard Arabic (MSA) identification (Subtask 1.1), country-level dialect identification (Subtask 1.2), province-level MSA identification (Subtask 2.1), and province-level sub-dialect identification (Subtask 2.2). The shared task dataset covers a total of 100 provinces from 21 Arab countries, collected from the Twitter domain. A total of 53 teams from 23 countries registered to participate in the tasks, thus reflecting the interest of the community in this area. We received 16 submissions for Subtask 1.1 from five teams, 27 submissions for Subtask 1.2 from eight teams, 12 submissions for Subtask 2.1 from four teams, and 13 submissions for Subtask 2.2 from four teams.	翻訳日:2021-04-05 00:55:09 公開日:2021-03-04
# 高度運転支援システムにおける信頼に値するAIの評価リストの探索 Exploring the Assessment List for Trustworthy AI in the Context of Advanced Driver-Assistance Systems ( http://arxiv.org/abs/2103.09051v1 ) ライセンス: Link先を確認	Markus Borg, Joshua Bronson, Linus Christensson, Fredrik Olsson, Olof Lennartsson, Elias Sonnsj\"o, Hamid Ebabi, Martin Karsberg	(参考訳) 人工知能(AI)はますます重要な応用に使われている。したがって、信頼性の高いAIシステムの必要性は急速に高まっている。 2018年、欧州委員会は専門家をAI-HLEG(High-Level Expert Group on AI)に任命した。 AI-HLEGは、信頼できるAIを、1)合法、2)倫理的、3)堅牢で、対応する7つの重要な要件として定義した。開発組織を支援するため、AI-HLEGは先頃、信頼できるAI評価リスト(ALTAI)を公開した。本稿では,機械学習(ML)を活用した高度運転支援システム(ADAS)の開発プロジェクトへのALTAIの適用例を紹介する。われわれの経験から、ALTAIはADAS開発に大半が当てはまるが、人事機関や透明性に関連する特定の部分は無視できる。さらに、社会的・環境的な影響に関する大きな疑問は、ADASサプライヤーが単独で取り組むことはできない。我々は,altai準拠性を確保するためにadasの開発計画を述べる。最後に,altaiの次回の改訂,すなわちライフサイクル変種,ドメイン固有適応,冗長性除去のための3つの推奨事項を提示する。 Artificial Intelligence (AI) is increasingly used in critical applications. Thus, the need for dependable AI systems is rapidly growing. In 2018, the European Commission appointed experts to a High-Level Expert Group on AI (AI-HLEG). AI-HLEG defined Trustworthy AI as 1) lawful, 2) ethical, and 3) robust and specified seven corresponding key requirements. To help development organizations, AI-HLEG recently published the Assessment List for Trustworthy AI (ALTAI). We present an illustrative case study from applying ALTAI to an ongoing development project of an Advanced Driver-Assistance System (ADAS) that relies on Machine Learning (ML). Our experience shows that ALTAI is largely applicable to ADAS development, but specific parts related to human agency and transparency can be disregarded. Moreover, bigger questions related to societal and environmental impact cannot be tackled by an ADAS supplier in isolation. We present how we plan to develop the ADAS to ensure ALTAI-compliance. Finally, we provide three recommendations for the next revision of ALTAI, i.e., life-cycle variants, domain-specific adaptations, and removed redundancy.	翻訳日:2021-04-05 00:54:27 公開日:2021-03-04
# トポロジカル深層学習 Topological Deep Learning ( http://arxiv.org/abs/2101.05778v2 ) ライセンス: Link先を確認	Ephy R. Love, Benjamin Filippenko, Vasileios Maroulas, Gunnar Carlsson	(参考訳) この研究は、いくつかのトポロジカルに定義された畳み込み手法を含むトポロジカルCNN(TCNN)を紹介する。自然な画像空間と重要な関係を持つ多様体は、TCNNの畳み込み重みとして使用される画像フィルタのパラメータ化に使用される。これらの多様体はまた、重みが局所化されるTCNNの層におけるスライスをパラメータ化する。従来のCNNに比べて,TCNNがより速く,少ないデータで学習し,学習パラメータが少なく,一般化性と解釈性が高いことを示す。我々は、画像データとビデオデータの両方にTCNN層を導入し、探索する。 3D画像と3Dビデオの拡張を提案する。 This work introduces the Topological CNN (TCNN), which encompasses several topologically defined convolutional methods. Manifolds with important relationships to the natural image space are used to parameterize image filters which are used as convolutional weights in a TCNN. These manifolds also parameterize slices in layers of a TCNN across which the weights are localized. We show evidence that TCNNs learn faster, on less data, with fewer learned parameters, and with greater generalizability and interpretability than conventional CNNs. We introduce and explore TCNN layers for both image and video data. We propose extensions to 3D images and 3D video.	翻訳日:2021-03-29 00:56:35 公開日:2021-03-04
# DeepDT: Delaunay Triangulation による表面再構成の学習 DeepDT: Learning Geometry From Delaunay Triangulation for Surface Reconstruction ( http://arxiv.org/abs/2101.10353v2 ) ライセンス: Link先を確認	Yiming Luo, Zhenxing Mi, Wenbing Tao	(参考訳) 本稿では,点群のDelaunay三角形分割から表面を再構成する,DeepDTと呼ばれる新しい学習ベースのネットワークを提案する。 DeepDTは、点群と対応するDelaunay三角形分割から直接、Delaunay四面体の内外ラベルを予測することを学ぶ。局所幾何学的特徴はまず入力点群から抽出され、Delaunay三角形分割から得られるグラフに集約される。次に、四面体のラベル予測に構造的な正則化を加えるために、集約された特徴にグラフフィルタリングを適用する。四面体と三角形の間の複雑な空間関係のため、正解の表面から四面体の正解ラベルを直接生成することは不可能である。そこで我々は,四面体の内部にあるサンプリング位置のラベルを用いてその四面体のラベルを投票で決めるマルチラベル監視戦略を提案する。提案したDeepDTは、特にオープンシーンの内面に対して、過度に複雑な表面を生成することなく、豊富な幾何学的詳細を維持できる。一方,提案手法の一般化能力と計算時間は,最先端手法と比較して許容範囲にあり,競争力がある。実験は提案されたDeepDTの優れた性能を示す。 In this paper, a novel learning-based network, named DeepDT, is proposed to reconstruct the surface from the Delaunay triangulation of a point cloud. DeepDT learns to predict inside/outside labels of Delaunay tetrahedrons directly from a point cloud and corresponding Delaunay triangulation. The local geometry features are first extracted from the input point cloud and aggregated into a graph deriving from the Delaunay triangulation. Then a graph filtering is applied on the aggregated features in order to add structural regularization to the label prediction of tetrahedrons. Due to the complicated spatial relations between tetrahedrons and the triangles, it is impossible to directly generate ground truth labels of tetrahedrons from ground truth surface. Therefore, we propose a multilabel supervision strategy which votes for the label of a tetrahedron with labels of sampling locations inside it. The proposed DeepDT can maintain abundant geometry details without generating overly complex surfaces, especially for inner surfaces of open scenes. Meanwhile, the generalization ability and time consumption of the proposed method are acceptable and competitive compared with the state-of-the-art methods. Experiments demonstrate the superior performance of the proposed DeepDT.	
翻訳日:2021-03-14 19:04:33 公開日:2021-03-04
# (参考訳) 低高度における非相関空域エンカウンターモデルの適用性とサロガビリティ Applicability and Surrogacy of Uncorrelated Airspace Encounter Models at Low Altitudes ( http://arxiv.org/abs/2103.04753v1 ) ライセンス: CC BY 4.0	Ngaire Underhill, Andrew Weinert	(参考訳) 国立航空システム(NAS)は、安全で効率的な航空を可能にする複雑で進化したシステムです。高度なエアモビリティの概念と無人航空機のような新しい空域参入者は、全体的な安全性や効率を低下させることなくNASに統合する必要があります。例えば、航空機間の空中衝突のリスクを軽減するために、規制、基準、システムが必要となる。モンテカルロシミュレーションは航空機の衝突回避システムを開発し、評価し、認定するための基礎的な能力である。これらはしばしば人力による実験や飛行試験によって検証される。多くの航空安全研究において、有人航空機の挙動は動的ベイズネットワークを用いて表される。オリジナルの統計モデルは2008年から2013年にかけて開発され、500フィート以上の地上レベル(AGL)の安全シミュレーションをサポートした。しかし、これらのモデルは500フィートAGL以下の小さなUAS操作の安全性を評価するには十分ではなかった。その結果、高度500フィート AGL 以下の新しいモデルが2018年から開発されている。多くのモデルは、航空機の挙動は相関性がなく、航空交通サービスや近隣の航空機に依存しないと考えている。本研究の目的は、従来の航空機の様々な非相関モデルを比較し、モデルの違いを特定することである。特にロータークラフトのモデルが固定翼機のモデルと十分に異なる場合、タイプ固有のモデルが必要となる。主な貢献は、低高度運転用に設計された衝突回避システムの性能を評価する際に、非相関モデルが活用すべきガイダンスです。また、トランスポンダを使わずに非協調航空機のサロゲートモデルに対処する。 The National Airspace System (NAS) is a complex and evolving system that enables safe and efficient aviation. Advanced air mobility concepts and new airspace entrants, such as unmanned aircraft, must integrate into the NAS without degrading overall safety or efficiency. For instance, regulations, standards, and systems are required to mitigate the risk of a midair collision between aircraft. Monte Carlo simulations have been a foundational capability for decades to develop, assess, and certify aircraft conflict avoidance systems. These are often validated through human-in-the-loop experiments and flight testing. For many aviation safety studies, manned aircraft behavior is represented using dynamic Bayesian networks. The original statistical models were developed from 2008-2013 to support safety simulations for altitudes above 500 feet Above Ground Level (AGL). However, these models were not sufficient to assess the safety of smaller UAS operations below 500 feet AGL. In response, newer models with altitude floors below 500 feet AGL have been in development since 2018. 
Many of the models assume that aircraft behavior is uncorrelated and not dependent on air traffic services or nearby aircraft. Our research objective was to compare the various uncorrelated models of conventional aircraft and identify how the models differ, in particular whether models of rotorcraft were sufficiently different from models of fixed-wing aircraft to require type-specific models. The primary contribution is guidance on which uncorrelated models to leverage when evaluating the performance of a collision avoidance system designed for low-altitude operations. We also address which models can be surrogates for noncooperative aircraft without transponders.	翻訳日:2021-03-10 19:01:54 公開日:2021-03-04
# アンダーサンプリングスペクトルデータを用いた光コヒーレンストモグラフィにおけるニューラルネットワークによる画像再構成 Neural network-based image reconstruction in swept-source optical coherence tomography using undersampled spectral data ( http://arxiv.org/abs/2103.03877v1 ) ライセンス: Link先を確認	Yijie Zhang, Tairan Liu, Manmohan Singh, Yilin Luo, Yair Rivenson, Kirill V. Larin, and Aydogan Ozcan	(参考訳) 光学コヒーレンストモグラフィ(OCT)は、サンプルの体積画像を迅速に提供できる広く使用されている非侵襲的バイオメディカルイメージングモダリティです。本稿では、アンダーサンプリングされたスペクトルデータを用いて、空間エイリアシングアーティファクトのないスイープソースOCT(SS-OCT)画像を生成するディープラーニングベースの画像再構築フレームワークを提案する。このニューラルネットワークベースの画像再構成は、光学的設定にハードウェアの変更を必要とせず、既存のスイープソースまたはスペクトル領域OCTシステムと容易に統合でき、取得する生のスペクトルデータ量を減らすことができる。本フレームワークの有効性を示すため,SS-OCTシステムにより画像化されたマウス胚を用いた深層ニューラルネットワークの訓練と盲目的試験を行った。トレーニングされたニューラルネットワークは、2倍アンダーサンプリングされたスペクトルデータ(Aラインあたり640のスペクトル点)を使用して、デスクトップコンピュータで512のAラインを約6.73msで盲目的に再構築し、スペクトルのアンダーサンプリングによる空間エイリアシングアーティファクトを除去する。その結果は、全スペクトルOCTデータ(Aライン当たり1280のスペクトル点)を使用して再構成した同じサンプルの画像と非常によく一致する。また,Aライン当たり3倍アンダーサンプリングされたスペクトルデータの処理にもこのフレームワークを拡張できることを実証した。ただし、2倍アンダーサンプリングと比べて再構成画像の品質はいくらか低下する。この深層学習による画像再構成手法は、様々なスペクトル領域OCTシステムで広く使用することができ、画像解像度と信号対雑音比を犠牲にすることなく撮像速度を向上させることができる。 Optical Coherence Tomography (OCT) is a widely used non-invasive biomedical imaging modality that can rapidly provide volumetric images of samples. Here, we present a deep learning-based image reconstruction framework that can generate swept-source OCT (SS-OCT) images using undersampled spectral data, without any spatial aliasing artifacts. This neural network-based image reconstruction does not require any hardware changes to the optical set-up and can be easily integrated with existing swept-source or spectral domain OCT systems to reduce the amount of raw spectral data to be acquired. To show the efficacy of this framework, we trained and blindly tested a deep neural network using mouse embryo samples imaged by an SS-OCT system. 
Using 2-fold undersampled spectral data (i.e., 640 spectral points per A-line), the trained neural network can blindly reconstruct 512 A-lines in ~6.73 ms using a desktop computer, removing spatial aliasing artifacts due to spectral undersampling, also presenting a very good match to the images of the same samples, reconstructed using the full spectral OCT data (i.e., 1280 spectral points per A-line). We also successfully demonstrate that this framework can be further extended to process 3x undersampled spectral data per A-line, with some performance degradation in the reconstructed image quality compared to 2x spectral undersampling. This deep learning-enabled image reconstruction approach can be broadly used in various forms of spectral domain OCT systems, helping to increase their imaging speed without sacrificing image resolution and signal-to-noise ratio.	翻訳日:2021-03-09 15:50:03 公開日:2021-03-04
# ニューラルネットワークに基づくネットワーク自動化のための量子化 Neural Network-based Quantization for Network Automation ( http://arxiv.org/abs/2103.04764v1 ) ライセンス: Link先を確認	Marton Kajo, Stephen S. Mwanje, Benedek Schultz, Georg Carle	(参考訳) ディープラーニング手法はモバイルネットワーク、特に高度なマシン認知のための手段を提供するネットワーク管理自動化に採用されている。ディープラーニング手法は最先端のハードウェアとソフトウェアツールを使用し、複雑な認知アルゴリズムを開発できる。最近の論文では,k-Meansアルゴリズムの修正であるBounding Sphere Quantization (BSQ)アルゴリズムを導入し,異常検出などの特定のネットワーク管理ユースケースに対して,より優れた量子化を実現することを示した。しかし、BSQはk-Meansよりもトレーニングにかなり長い時間を要し、これはニューラルネットワークベースの実装で克服できる課題である。本稿では,最先端のディープラーニングツールを用いて,競争力のあるトレーニング速度を実現するBSQの実装を提案する。 Deep Learning methods have been adopted in mobile networks, especially for network management automation where they provide means for advanced machine cognition. Deep learning methods utilize cutting-edge hardware and software tools, allowing complex cognitive algorithms to be developed. In a recent paper, we introduced the Bounding Sphere Quantization (BSQ) algorithm, a modification of the k-Means algorithm, that was shown to create better quantizations for certain network management use-cases, such as anomaly detection. However, BSQ required a significantly longer time to train than k-Means, a challenge which can be overcome with a neural network-based implementation. In this paper, we present such an implementation of BSQ that utilizes state-of-the-art deep learning tools to achieve a competitive training speed.	翻訳日:2021-03-09 15:26:42 公開日:2021-03-04
# (参考訳) 多重重要度サンプリングによる保守的最適政策最適化 Conservative Optimistic Policy Optimization via Multiple Importance Sampling ( http://arxiv.org/abs/2103.03307v1 ) ライセンス: CC BY 4.0	Achraf Azize and Othman Gaizi	(参考訳) 強化学習(rl)は,アタリゲームのプレイやgoのゲーム解決といった難しい問題を,統一的なアプローチで解決することができる。しかし、現代のディープRLアプローチは、まだ現実世界のアプリケーションでは広く使われていない。理由の1つは、既存の(すでに稼働している)ベースラインポリシーと比較して、中間実行ポリシーのパフォーマンスに対する保証がないことである。本論文では,政策最適化問題における保守的な探索を解くオンラインモデルフリーアルゴリズムを提案する。提案されたアプローチの後悔は、離散パラメータ空間と連続パラメータ空間の両方に対して $\tilde{\mathcal{O}}(\sqrt{T})$ で有界であることを示した。 Reinforcement Learning (RL) has been able to solve hard problems such as playing Atari games or solving the game of Go, with a unified approach. Yet modern deep RL approaches are still not widely used in real-world applications. One reason could be the lack of guarantees on the performance of the intermediate executed policies, compared to an existing (already working) baseline policy. In this paper, we propose an online model-free algorithm that solves conservative exploration in the policy optimization problem. We show that the regret of the proposed approach is bounded by $\tilde{\mathcal{O}}(\sqrt{T})$ for both discrete and continuous parameter spaces.	翻訳日:2021-03-09 05:37:27 公開日:2021-03-04
# (参考訳) nutrition5k:ジェネリック食品の自動栄養理解に向けて Nutrition5k: Towards Automatic Nutritional Understanding of Generic Food ( http://arxiv.org/abs/2103.03375v1 ) ライセンス: CC BY 4.0	Quin Thames, Arjun Karpur, Wade Norris, Fangting Xia, Liviu Panait, Tobias Weyand, Jack Sim	(参考訳) 視覚データから食品の栄養分を理解することはコンピュータビジョンの問題であり、公衆衛生にポジティブで広範な影響を与える可能性がある。この領域の研究は、栄養学的理解能力を持つモデルの訓練に必要な十分な多様性やラベルが欠けている分野の既存のデータセットに限られている。本研究では,映像ストリーム,奥行き画像,成分重み,高精細な栄養コンテンツアノテーションを備えた,5kの多様な実世界の食品料理のデータセットである nutrition5kを紹介する。本稿では, 複雑で現実的な料理のカロリーおよびマクロ栄養価を, プロの栄養士を上回る精度で予測できるコンピュータビジョンアルゴリズムを訓練することにより, このデータセットの可能性を実証する。さらに,栄養予測を改善するため,深度センサデータを組み込んだベースラインを提案する。栄養理解の領域でイノベーションを加速することを願って、Nutrition5kを公にリリースします。 Understanding the nutritional content of food from visual data is a challenging computer vision problem, with the potential to have a positive and widespread impact on public health. Studies in this area are limited to existing datasets in the field that lack sufficient diversity or labels required for training models with nutritional understanding capability. We introduce Nutrition5k, a novel dataset of 5k diverse, real world food dishes with corresponding video streams, depth images, component weights, and high accuracy nutritional content annotation. We demonstrate the potential of this dataset by training a computer vision algorithm capable of predicting the caloric and macronutrient values of a complex, real world dish at an accuracy that outperforms professional nutritionists. Further we present a baseline for incorporating depth sensor data to improve nutrition predictions. We will publicly release Nutrition5k in the hope that it will accelerate innovation in the space of nutritional understanding.	翻訳日:2021-03-09 04:57:40 公開日:2021-03-04
# (参考訳) ハードラベルマニホールド: オンマニホールドアドバンサリの例を見つけるためのクエリ効率の予期せぬ利点 Hard-label Manifolds: Unexpected Advantages of Query Efficiency for Finding On-manifold Adversarial Examples ( http://arxiv.org/abs/2103.03325v1 ) ライセンス: CC BY 4.0	Washington Garcia, Pin-Yu Chen, Somesh Jha, Scott Clouse, Kevin R. B. Butler	(参考訳) 相手の例に強いディープネットワークを設計することは、依然としてオープンな問題です。同様に、画像分類モデルに対する最近のゼロオーダーのハードラベル攻撃は、ファーストオーダーのグラデーションレベルの代替品に匹敵するパフォーマンスを示している。最近、グラデーションレベルの設定では、通常の敵対的な例がデータ多様体から離れ、オンマニホールドの例が実際には一般化エラーであることが示されている。本論文では,0次設定におけるクエリ効率が,データマニホールドを介して相手のトラバーサルと結びついていることを論じる。この振る舞いを説明するために,雑音の多い多様体距離オラクルに基づく情報理論の議論を提案し,敵の勾配推定を通じて多様体情報を漏らす。多様体勾配相互情報の数値実験により,この挙動が有効な問題次元と訓練点の数の関数として作用することを示す。実世界のデータセットと次元還元を用いた複数のゼロ次攻撃では、同じ普遍的な挙動を観察して、データ多様体に近いサンプルを生成する。この結果、モデルロバスト性にかかわらず、多様体距離測度は最大で2倍減少する。以上の結果から,多様体段階の相互情報を考慮に入れることで,将来より頑健なモデル設計に寄与し,感度の高いデータ多様体の漏洩を回避できることが示唆された。 Designing deep networks robust to adversarial examples remains an open problem. Likewise, recent zeroth order hard-label attacks on image classification models have shown comparable performance to their first-order, gradient-level alternatives. It was recently shown in the gradient-level setting that regular adversarial examples leave the data manifold, while their on-manifold counterparts are in fact generalization errors. In this paper, we argue that query efficiency in the zeroth-order setting is connected to an adversary's traversal through the data manifold. To explain this behavior, we propose an information-theoretic argument based on a noisy manifold distance oracle, which leaks manifold information through the adversary's gradient estimate. Through numerical experiments of manifold-gradient mutual information, we show this behavior acts as a function of the effective problem dimensionality and number of training points. On real-world datasets and multiple zeroth-order attacks using dimension-reduction, we observe the same universal behavior to produce samples closer to the data manifold. 
This results in up to two-fold decrease in the manifold distance measure, regardless of the model robustness. Our results suggest that taking the manifold-gradient mutual information into account can thus inform better robust model design in the future, and avoid leakage of the sensitive data manifold.	翻訳日:2021-03-08 18:51:34 公開日:2021-03-04
# 大規模対話型AIシステムにおけるスキルルーティングのためのニューラルモデルロバストネス--設計選択探索 Neural model robustness for skill routing in large-scale conversational AI systems: A design choice exploration ( http://arxiv.org/abs/2103.03373v1 ) ライセンス: Link先を確認	Han Li, Sunghyun Park, Aswarth Dara, Jinseok Nam, Sungjin Lee, Young-Bum Kim, Spyros Matsoukas, Ruhi Sarikaya	(参考訳) 産業における最新の大規模対話型AIまたはインテリジェントデジタルアシスタントシステムは、自動音声認識(ASR)や自然言語理解(NLU)などの一連のコンポーネントで構成されています。共有nluオントロジー(例えば集中型インテント/スロットスキーマ)を利用するシステムでは、リクエストを適切なスキルに正しくルーティングする独立したスキルルーティングコンポーネントが存在します。スキルルーティングコンポーネントは、同じインテントをサブスクライブしたり、特定のコンテキスト条件下でインテントをサブスクライブする(例えば、デバイスにはスクリーンがある)ことができる何千ものスキルがあるため、必要である。スキルルーティングモデルが本番環境にデプロイされた後、オントロジーのサブスクリプションを動的に変更する可能性があるため、スキルルーティングコンポーネントにおけるモデルの堅牢性やレジリエンスを保証することが重要な問題である。本稿では,最新の商用会話型aiシステムにおけるスキルルーティングの文脈におけるモデルロバスト性,特にデータ拡張,モデルアーキテクチャ,最適化方法に関する選択に,異なるモデリング設計選択がどう影響するかを示す。データ拡張を適用することは、モデルロバスト性を大幅に改善する非常に効果的で実用的な方法であることを示す。 Current state-of-the-art large-scale conversational AI or intelligent digital assistant systems in industry comprises a set of components such as Automatic Speech Recognition (ASR) and Natural Language Understanding (NLU). For some of these systems that leverage a shared NLU ontology (e.g., a centralized intent/slot schema), there exists a separate skill routing component to correctly route a request to an appropriate skill, which is either a first-party or third-party application that actually executes on a user request. The skill routing component is needed as there are thousands of skills that can either subscribe to the same intent and/or subscribe to an intent under specific contextual conditions (e.g., device has a screen). Ensuring model robustness or resilience in the skill routing component is an important problem since skills may dynamically change their subscription in the ontology after the skill routing model has been deployed to production. 
We show how different modeling design choices impact the model robustness in the context of skill routing on a state-of-the-art commercial conversational AI system, specifically on the choices around data augmentation, model architecture, and optimization method. We show that applying data augmentation can be a very effective and practical way to drastically improve model robustness.	翻訳日:2021-03-08 15:07:43 公開日:2021-03-04
# 忘れたいことを思い出す: 機械アンラーニングのためのアルゴリズム Remember What You Want to Forget: Algorithms for Machine Unlearning ( http://arxiv.org/abs/2103.03279v1 ) ライセンス: Link先を確認	Ayush Sekhari, Jayadev Acharya, Gautam Kamath, Ananda Theertha Suresh	(参考訳) 学習済みモデルからデータポイントを忘れさせる問題について検討する。この設定では、学習者はまず未知の分布からi.i.d.で抽出されたデータセット$S$を受け取り、その分布からの未知のサンプルでうまく機能する予測器$w$を出力する。しかし、将来のある時点で、任意のトレーニングデータポイント$z \in S$がアンラーニング(忘却)を要求でき、学習者は同じ精度保証を維持しつつ出力予測器を修正する必要がある。本研究では,未知のテスト損失に対する性能の維持を目的とする母集団設定における機械アンラーニングの厳密な研究を開始する。次に,凸損失関数に対するアンラーニングアルゴリズムを提供する。凸損失の設定では、最大$O(n/d^{1/4})$個のサンプルを削除できるアンラーニングアルゴリズムを提供する。ここで$d$は問題の次元である。これに対し、一般に差分プライベート学習(アンラーニングを含意する)は$O(n/d^{1/2})$個のサンプルの削除しか保証しない。これは、削除容量の$d$への依存性に関して、アンラーニングがプライベート学習よりも少なくとも多項式的に効率的であることを示している。 We study the problem of forgetting datapoints from a learnt model. In this case, the learner first receives a dataset $S$ drawn i.i.d. from an unknown distribution, and outputs a predictor $w$ that performs well on unseen samples from that distribution. However, at some point in the future, any training data point $z \in S$ can request to be unlearned, thus prompting the learner to modify its output predictor while still ensuring the same accuracy guarantees. In our work, we initiate a rigorous study of machine unlearning in the population setting, where the goal is to maintain performance on the unseen test loss. We then provide unlearning algorithms for convex loss functions. For the setting of convex losses, we provide an unlearning algorithm that can delete up to $O(n/d^{1/4})$ samples, where $d$ is the problem dimension. In comparison, in general, differentially private learning (which implies unlearning) only guarantees deletion of $O(n/d^{1/2})$ samples. This shows that unlearning is at least polynomially more efficient than learning privately in terms of dependence on $d$ in the deletion capacity.	翻訳日:2021-03-08 15:04:09 公開日:2021-03-04
# 二階情報によるモーメンタムの補正 Correcting Momentum with Second-order Information ( http://arxiv.org/abs/2103.03265v1 ) ライセンス: Link先を確認	Hoang Tran, Ashok Cutkosky	(参考訳) 非凸確率最適化のための新しいアルゴリズムを開発し、最適な$O(\epsilon^{-3})$回の確率勾配およびヘッセ・ベクトル積の計算で$\epsilon$臨界点を求める。我々のアルゴリズムは、モーメンタム付きSGDのモーメンタムにおけるバイアス項を「修正」するためにヘッセ・ベクトル積を用いる。これにより、分散還元法に類似した方法で勾配推定が改善される。従来の作業とは対照的に、過大なバッチサイズ(あるいは、バッチサイズに関するいかなる制限も)は必要とせず、我々のアルゴリズムと解析はよりシンプルです。私たちは、SGDとAdamよりも改善が見られる、さまざまな大規模ディープラーニングベンチマークとアーキテクチャで結果を検証しています。 We develop a new algorithm for non-convex stochastic optimization that finds an $\epsilon$-critical point in the optimal $O(\epsilon^{-3})$ stochastic gradient and Hessian-vector product computations. Our algorithm uses Hessian-vector products to "correct" a bias term in the momentum of SGD with momentum. This leads to better gradient estimates in a manner analogous to variance reduction methods. In contrast to prior work, we do not require excessively large batch sizes (or indeed any restrictions at all on the batch size), and both our algorithm and its analysis are much simpler. We validate our results on a variety of large-scale deep learning benchmarks and architectures, where we see improvements over SGD and Adam.	翻訳日:2021-03-08 15:02:14 公開日:2021-03-04
# ランダムSHAPのアンサンブル Ensembles of Random SHAPs ( http://arxiv.org/abs/2103.03302v1 ) ライセンス: Link先を確認	Lev V. Utkin and Andrei V. Konstantinov	(参考訳) ブラックボックスモデルの局所説明のためのよく知られたSHapley Additive exPlanations (SHAP) 法のアンサンブルに基づく修正が提案されている。修正は、多くの特徴がある場合に計算コストがかかるSHAPを単純化することを目的としている。提案された修正の背景にある主な考え方は、少数の特徴を持つSHAPのアンサンブルによってSHAPを近似することである。 ER-SHAPと呼ばれる最初の修正では、いくつかの特徴が特徴集合から何度もランダムに選択され、その特徴のShapley値は「小さい」SHAPによって計算される。説明結果は、最終的なShapley値を得るために平均されます。 ERW-SHAPと呼ばれる2番目の修正では、説明されたインスタンスの周りに多様性のためにいくつかのポイントが生成され、その説明の結果はポイントと説明されたインスタンスの距離に応じた重みで結合される。 ER-SHAP-RFと呼ばれる第3の修正は、インスタンスの予備的説明と、ER-SHAPのアンサンブルベースの手順における特徴の選択に適用される特徴確率分布の決定にランダムフォレストを用いている。提案された修正を例示する多くの数値実験は、その効率と局所的な説明のための特性を示す。 Ensemble-based modifications of the well-known SHapley Additive exPlanations (SHAP) method for the local explanation of a black-box model are proposed. The modifications aim to simplify SHAP which is computationally expensive when there is a large number of features. The main idea behind the proposed modifications is to approximate SHAP by an ensemble of SHAPs with a smaller number of features. According to the first modification, called ER-SHAP, several features are randomly selected many times from the feature set, and Shapley values for the features are computed by means of "small" SHAPs. The explanation results are averaged to get the final Shapley values. According to the second modification, called ERW-SHAP, several points are generated around the explained instance for diversity purposes, and results of their explanation are combined with weights depending on distances between points and the explained instance. The third modification, called ER-SHAP-RF, uses the random forest for preliminary explanation of instances and determining a feature probability distribution which is applied to selection of features in the ensemble-based procedure of ER-SHAP. Many numerical experiments illustrating the proposed modifications demonstrate their efficiency and properties for local explanation.	翻訳日:2021-03-08 15:02:00 公開日:2021-03-04
# ラベルシフト下の分類に対する分布によらない不確実性定量化 Distribution-free uncertainty quantification for classification under label shift ( http://arxiv.org/abs/2103.03323v1 ) ライセンス: Link先を確認	Aleksandr Podkopaev, Aaditya Ramdas	(参考訳) MLモデルの信頼できる展開には、特に安全クリティカルなアプリケーションにおいて、不確実性の適切な測定が必要です。我々は、分類問題に対する不確実性定量化(UQ)について、共形予測を用いた予測セットと、ポストホックなビニングによる確率的予測器のキャリブレーションという2つの経路に焦点を当てる。これらはi.i.d.データに対して分布によらない保証を持つためである。i.i.d.設定を超えて一般化する2つの一般的な方法には、共変量シフトとラベルシフトの扱いが含まれる。分布によらないUQの文脈では、前者は既に注目を集めていますが、後者はそうではありません。ラベルシフトは予測を損なうことが知られており、我々はまず、カバレッジとキャリブレーションの劣化を示すことで、UQも損なうことを論じる。ラベルシフトへの対応(より良い予測のため)に関する最近の進歩を利用して、ターゲット分布からのラベルなしデータが利用可能な場合に、上記の共形予測およびキャリブレーションの手順を再重み付けすることでUQを達成する正しい方法を検討します。これらの手法を分布によらない枠組みで理論的に検討し、その優れた実用性能を示す。 Trustworthy deployment of ML models requires a proper measure of uncertainty, especially in safety-critical applications. We focus on uncertainty quantification (UQ) for classification problems via two avenues -- prediction sets using conformal prediction and calibration of probabilistic predictors by post-hoc binning -- since these possess distribution-free guarantees for i.i.d. data. Two common ways of generalizing beyond the i.i.d. setting include handling covariate and label shift. Within the context of distribution-free UQ, the former has already received attention, but not the latter. It is known that label shift hurts prediction, and we first argue that it also hurts UQ, by showing degradation in coverage and calibration. Piggybacking on recent progress in addressing label shift (for better prediction), we examine the right way to achieve UQ by reweighting the aforementioned conformal and calibration procedures whenever some unlabeled data from the target distribution is available. We examine these techniques theoretically in a distribution-free framework and demonstrate their excellent practical performance.	翻訳日:2021-03-08 15:01:41 公開日:2021-03-04
# 医用画像解析における深層学習一般化の複雑性評価 Evaluation of Complexity Measures for Deep Learning Generalization in Medical Image Analysis ( http://arxiv.org/abs/2103.03328v1 ) ライセンス: Link先を確認	Aleksandar Vakanski, Min Xian	(参考訳) 医用画像解析のためのディープラーニングモデルの一般化誤差は、データ取得、デバイス設定、患者集団のために異なるデバイスで収集された画像に対してしばしば減少する。新しい画像に対する一般化能力の理解が深層学習における臨床医の信頼性に不可欠である。近年,一般化限界と複雑性尺度の確立に向けた研究が盛んに行われているが,予測と実際の一般化性能との間には大きな差があることが多い。同様に、関連する大規模な実証研究は、主に汎用画像データセットによる検証に基づいている。本稿では,乳房超音波画像における25種類の複雑性尺度と教師付き深層学習分類器の一般化能力の相関について検討する。結果は,PAC-Bayes平坦度とパスノルムに基づく尺度が,モデルとデータの組み合わせについて最も一貫した説明をもたらすことを示唆している。また,乳房画像に対するマルチタスク分類とセグメンテーション手法の利用について検討し,これらの学習手法が暗黙の正規化として機能し,一般化の促進に寄与することを示す。 The generalization error of deep learning models for medical image analysis often decreases on images collected with different devices for data acquisition, device settings, or patient population. A better understanding of the generalization capacity on new images is crucial for clinicians' trustworthiness in deep learning. Although significant research efforts have been recently directed toward establishing generalization bounds and complexity measures, still, there is often a significant discrepancy between the predicted and actual generalization performance. As well, related large empirical studies have been primarily based on validation with general-purpose image datasets. This paper presents an empirical study that investigates the correlation between 25 complexity measures and the generalization abilities of supervised deep learning classifiers for breast ultrasound images. The results indicate that PAC-Bayes flatness-based and path norm-based measures produce the most consistent explanation for the combination of models and data. We also investigate the use of multi-task classification and segmentation approach for breast images, and report that such learning approach acts as an implicit regularizer and is conducive toward improved generalization.	翻訳日:2021-03-08 15:00:01 公開日:2021-03-04
# PVG at WASSA 2021: 共感と苦痛の予測のためのマルチ入力、マルチタスク、トランスフォーマーベースのアーキテクチャ PVG at WASSA 2021: A Multi-Input, Multi-Task, Transformer-Based Architecture for Empathy and Distress Prediction ( http://arxiv.org/abs/2103.03296v1 ) ライセンス: Link先を確認	Atharva Kulkarni, Sunanda Somwase, Shivam Rajput, and Manisha Marathe	(参考訳) 共感と苦痛の感情現象に関する活発な研究は、人間と機械の相互作用を改善するために非常に貴重です。これらの構成概念は心理学理論に深く根ざしているため、テキストデータからそのような複雑な感情の強度を予測することは困難です。したがって、より良い予測のためには、心理テストスコア、人口統計学的特徴、基礎となる潜在的な原始的感情、テキストのアンダートーンとその心理的複雑さなどの補助要因を考慮することが不可欠です。本稿では、WASSA 2021のニュース記事に対する共感と感情の予測に関する共有タスクに対するチームPVGのソリューションについて述べる。テキストデータ、人口統計学的特徴、心理テストスコア、原始的感情と共感の本質的な相互依存を利用して、共感スコア予測タスクのためのマルチ入力マルチタスクフレームワークを提案する。ここで、共感スコア予測は主タスクと見なされ、感情と共感の分類は二次的な補助タスクと見なされます。苦痛スコア予測タスクでは、語彙的特徴の追加により、システムはさらに強化される。私たちの提出システムは、平均相関(0.545)および苦痛相関(0.574)で1$^{st}$、共感のピアソン相関(0.517)で2$^{nd}$にランクされました。 Active research pertaining to the affective phenomenon of empathy and distress is invaluable for improving human-machine interaction. Predicting intensities of such complex emotions from textual data is difficult, as these constructs are deeply rooted in the psychological theory. Consequently, for better prediction, it becomes imperative to take into account ancillary factors such as the psychological test scores, demographic features, underlying latent primitive emotions, along with the text's undertone and its psychological complexity. This paper proffers team PVG's solution to the WASSA 2021 Shared Task on Predicting Empathy and Emotion in Reaction to News Stories. Leveraging the textual data, demographic features, psychological test score, and the intrinsic interdependencies of primitive emotions and empathy, we propose a multi-input, multi-task framework for the task of empathy score prediction. Here, the empathy score prediction is considered the primary task, while emotion and empathy classification are considered secondary auxiliary tasks. For the distress score prediction task, the system is further boosted by the addition of lexical features. Our submission ranked 1$^{st}$ based on the average correlation (0.545) as well as the distress correlation (0.574), and 2$^{nd}$ for the empathy Pearson correlation (0.517).	翻訳日:2021-03-08 14:59:27 公開日:2021-03-04
# HLAの多重特徴表現を用いた腎臓移植生存予測 Predicting Kidney Transplant Survival using Multiple Feature Representations for HLAs ( http://arxiv.org/abs/2103.03305v1 ) ライセンス: Link先を確認	Mohammadreza Nemati, Haonan Zhang, Michael Sloma, Dulat Bekbolsynov, Hong Wang, Stanislaw Stepkowski, and Kevin S. Xu	(参考訳) 腎移植は末期腎疾患患者の生活水準を大幅に向上させることができる。腎移植の移植生存時間(移植が失敗し、患者が別の移植を受けるまでの時間)に影響を与える重要な要因は、ドナーと受取人のヒト白血球抗原(HLA)の適合性である。本稿では,HLA情報を機械学習による生存分析アルゴリズムに組み込む生体関連特徴表現を提案する。提案したHLA特徴表現は10万以上の移植のデータベース上で評価し, 約1%の精度で予測精度が向上し, 患者レベルでは緩やかだが, 社会的レベルでは有意義であることが確認された。生存時間の正確な予測は、移植後の生存率を改善でき、受け手へのドナーの割り当てが向上し、移植片の不全による再移植の回数が減少する。 Kidney transplantation can significantly enhance living standards for people suffering from end-stage renal disease. A significant factor that affects graft survival time (the time until the transplant fails and the patient requires another transplant) for kidney transplantation is the compatibility of the Human Leukocyte Antigens (HLAs) between the donor and recipient. In this paper, we propose new biologically-relevant feature representations for incorporating HLA information into machine learning-based survival analysis algorithms. We evaluate our proposed HLA feature representations on a database of over 100,000 transplants and find that they improve prediction accuracy by about 1%, modest at the patient level but potentially significant at a societal level. Accurate prediction of survival times can improve transplant survival outcomes, enabling better allocation of donors to recipients and reducing the number of re-transplants due to graft failure with poorly matched donors.	翻訳日:2021-03-08 14:58:46 公開日:2021-03-04
# ガウス過程がニューラルオードを満たす:希少データと雑音データから部分的に観測されたシステムのダイナミクスを学ぶベイズ的枠組み Gaussian processes meet NeuralODEs: A Bayesian framework for learning the dynamics of partially observed systems from scarce and noisy data ( http://arxiv.org/abs/2103.03385v1 ) ライセンス: Link先を確認	Mohamed Aziz Bhouri and Paris Perdikaris	(参考訳) 本稿では,非線形力学系の部分的,雑音的,不規則な観測からベイズ系を同定する機械学習フレームワーク(GP-NODE)を提案する。提案手法は微分可能計画における最近の発展を利用して、通常の微分方程式解法を通じて勾配情報を伝播し、ハミルトニアンモンテカルロサンプリングとガウス過程を用いた未知のモデルパラメータに対してベイズ推論を行う。これにより,観測データの時間的相関を活用し,不確かさを定量化したモデル上での後方分布を効率的に推定することができる。さらに、自由モデルパラメータにフィンランドのホースシューのような疎開促進優先度を使用することにより、基礎となる潜在ダイナミクスに対する解釈可能で同義表現の発見が可能になる。捕食者予備システム,システム生物学,50次元ヒューマンモーションダイナミクスシステムを含む提案GP-NODE法の有効性を示すために,一連の数値的研究を行った。総合すると、不確実性の下でデータ駆動モデル発見のための新しい、柔軟で堅牢なワークフローが生まれました。この原稿に付随するすべてのコードとデータは、 \url{https://github.com/PredictiveIntelligenceLab/GP-NODEs} でオンラインで入手できます。 This paper presents a machine learning framework (GP-NODE) for Bayesian systems identification from partial, noisy and irregular observations of nonlinear dynamical systems. The proposed method takes advantage of recent developments in differentiable programming to propagate gradient information through ordinary differential equation solvers and perform Bayesian inference with respect to unknown model parameters using Hamiltonian Monte Carlo sampling and Gaussian Process priors over the observed system states. This allows us to exploit temporal correlations in the observed data, and efficiently infer posterior distributions over plausible models with quantified uncertainty. Moreover, the use of sparsity-promoting priors such as the Finnish Horseshoe for free model parameters enables the discovery of interpretable and parsimonious representations for the underlying latent dynamics. A series of numerical studies is presented to demonstrate the effectiveness of the proposed GP-NODE method including predator-prey systems, systems biology, and a 50-dimensional human motion dynamical system. 
Taken together, our findings put forth a novel, flexible and robust workflow for data-driven model discovery under uncertainty. All code and data accompanying this manuscript are available online at \url{https://github.com/PredictiveIntelligenceLab/GP-NODEs}.	翻訳日:2021-03-08 14:56:53 公開日:2021-03-04
# ソーシャルメディアのダンス映像から身近な人物の忠実度を学習する Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos ( http://arxiv.org/abs/2103.03319v1 ) ライセンス: Link先を確認	Yasamin Jafarian, Hyun Soo Park	(参考訳) 服を着る人間の幾何学を学ぶための重要な課題は、地上の真実データ(例えば、3Dスキャンされたモデル)の限られた可用性にある。さまざまな外観、衣料品スタイル、パフォーマンス、アイデンティティにまたがるソーシャルメディアダンスビデオの数:我々は、新しいデータリソースを利用して、この課題に取り組みます。それぞれのビデオは、1人の身体と衣服のダイナミックな動きを描いているが、3D地上の真実の幾何学は欠如している。これらの映像を利用するために,予測された人物の局所的幾何を異なるタイミングで他の人物の局所的形状にワープする,局所的変換を用いた新しい手法を提案する。これにより、予測に対する時間的コヒーレンスを強制する自己超越が可能となる。さらに, 局所的なテクスチャ, しわ, 日陰に応答する表面の正常値とともに, 幾何的一貫性を最大化することにより, 深度を共に学習する。本手法はエンドツーエンドで訓練可能であり,入力実画像に忠実な微細形状を予測できる高忠実度深さ推定を行う。本手法は,実画像とレンダリング画像の両方において,最先端の人間の深度推定と人間の形状復元アプローチに勝ることを示す。 A key challenge of learning the geometry of dressed humans lies in the limited availability of the ground truth data (e.g., 3D scanned models), which results in the performance degradation of 3D human reconstruction when applying to real-world imagery. We address this challenge by leveraging a new data resource: a number of social media dance videos that span diverse appearance, clothing styles, performances, and identities. Each video depicts dynamic movements of the body and clothes of a single person while lacking the 3D ground truth geometry. To utilize these videos, we present a new method to use the local transformation that warps the predicted local geometry of the person from an image to that of another image at a different time instant. This allows self-supervision as enforcing a temporal coherence over the predictions. In addition, we jointly learn the depth along with the surface normals that are highly responsive to local texture, wrinkle, and shade by maximizing their geometric consistency. Our method is end-to-end trainable, resulting in high fidelity depth estimation that predicts fine geometry faithful to the input real image. 
We demonstrate that our method outperforms the state-of-the-art human depth estimation and human shape recovery approaches on both real and rendered images.	翻訳日:2021-03-08 14:52:59 公開日:2021-03-04
# PolarNet: 極座標領域における車載レーダを用いた高速な深層オープンスペースセグメンテーション PolarNet: Accelerated Deep Open Space Segmentation Using Automotive Radar in Polar Domain ( http://arxiv.org/abs/2103.03387v1 ) ライセンス: Link先を確認	Farzan Erlik Nowruzi, Dhanvin Kolhatkar, Prince Kapoor, Elnaz Jahani Heravi, Fahed Al Hassanat, Robert Laganiere, Julien Rebut, Waqas Malik	(参考訳) カメラとLidarの処理は、ディープラーニングモデルアーキテクチャの急速な発展によって革命的に変化した。車載レーダーは、運転支援システムと自動運転システムの重要な要素の1つです。レーダーは、カメラやLidarベースの方法とは異なり、いまだに従来の信号処理技術に依存している。これが最も堅牢な知覚システムを実現するための欠落したリンクだと考えています。運転可能なスペースと占有スペースを特定することは、あらゆる自律的な意思決定タスクの最初のステップです。この目的のためにしばしば、環境の占有グリッドマップ表現が使用される。本稿では、オープンスペースのセグメンテーションのために極座標領域でレーダ情報を処理するディープニューラルモデルであるPolarNetを提案する。さまざまな入出力表現を探ります。実験により、PolarNetがレーダデータを処理する有効な方法であり、コンパクトなサイズを維持しながら最先端の性能と処理速度を達成できることを示した。 Camera and Lidar processing have been revolutionized with the rapid development of deep learning model architectures. Automotive radar is one of the crucial elements of automated driver assistance and autonomous driving systems. Radar still relies on traditional signal processing techniques, unlike camera and Lidar based methods. We believe this is the missing link to achieve the most robust perception system. Identifying drivable space and occupied space is the first step in any autonomous decision making task. Occupancy grid map representation of the environment is often used for this purpose. In this paper, we propose PolarNet, a deep neural model to process radar information in polar domain for open space segmentation. We explore various input-output representations. Our experiments show that PolarNet is an effective way to process radar data that achieves state-of-the-art performance and processing speeds while maintaining a compact size.	翻訳日:2021-03-08 14:52:37 公開日:2021-03-04
# BERTに基づくランキングモデルを用いた転帰学習と擬似ラベルの体系的評価 A Systematic Evaluation of Transfer Learning and Pseudo-labeling with BERT-based Ranking Models ( http://arxiv.org/abs/2103.03335v1 ) ライセンス: Link先を確認	Iurii Mokrii, Leonid Boytsov, Pavel Braslavski	(参考訳) アノテーションコストが高いため、既存の人為的トレーニングデータを最大限に活用することは重要な研究方向です。そこで本研究では,5つの英語データセットにまたがるBERTに基づくニューラルランキングモデルの伝達性に関する体系的評価を行った。これまでの研究では、大きなデータセットから少数のクエリを持つデータセットへのゼロショットと少数ショットの転送に重点を置いていた。対照的に、各コレクションには膨大な数のクエリがあり、フルショット評価モードを可能にし、結果の信頼性を向上させます。さらに、ソースデータセットのライセンスはしばしば商用利用を禁止しているため、転送学習とBM25スコアラーが生成した擬似ラベルのトレーニングを比較する。擬似ラベルのトレーニング -- おそらくは、わずかな数の注釈付きクエリを使った微調整 -- は、トランスファーラーニングと比較して、競争力や優れたモデルを生み出すことができる。しかし、数発訓練の安定性と/または有効性を改善する必要があるため、事前訓練されたモデルの性能を低下させることができる場合もある。 Due to high annotation costs, making the best use of existing human-created training data is an important research direction. We, therefore, carry out a systematic evaluation of transferability of BERT-based neural ranking models across five English datasets. Previous studies focused primarily on zero-shot and few-shot transfer from a large dataset to a dataset with a small number of queries. In contrast, each of our collections has a substantial number of queries, which enables a full-shot evaluation mode and improves reliability of our results. Furthermore, since source datasets licences often prohibit commercial use, we compare transfer learning to training on pseudo-labels generated by a BM25 scorer. We find that training on pseudo-labels -- possibly with subsequent fine-tuning using a modest number of annotated queries -- can produce a competitive or better model compared to transfer learning. However, there is a need to improve the stability and/or effectiveness of the few-shot training, which, in some cases, can degrade performance of a pretrained model.	翻訳日:2021-03-08 14:48:34 公開日:2021-03-04
# GPU指向データ通信アーキテクチャによる大規模グラフ畳み込みネットワークトレーニング Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture ( http://arxiv.org/abs/2103.03330v1 ) ライセンス: Link先を確認	Seung Won Min, Kun Wu, Sitao Huang, Mert Hidayetoğlu, Jinjun Xiong, Eiman Ebrahimi, Deming Chen, Wen-mei Hwu	(参考訳) グラフ畳み込みネットワーク(GCN)は大規模グラフベースのレコメンデーションシステムでますます採用されている。 GCNのトレーニングには、ミニバッチジェネレーターがグラフを横断し、疎に配置された隣接ノードをサンプリングしてその特徴を得る必要があります。現実のグラフはGPUメモリの容量を超えることが多いため、現在のGCNトレーニングシステムは、特徴テーブルをホストメモリに保持し、GPUに送信する前にスパースな特徴を集めるためにCPUに依存している。しかしこのアプローチは、ホストメモリの帯域幅とCPUに大きなプレッシャーを与えます。これは、CPUが(1)メモリからスパースな特徴を読み込み、(2)高密度フォーマットとしてメモリに特徴を書き込み、(3)メモリからGPUに特徴を転送する必要があるためである。本研究では、GPUスレッドがCPUの助けをほとんど借りずにゼロコピーアクセスを介してホストメモリ上のスパースな特徴に直接アクセスする、GCNトレーニングのための新しいGPU指向データ通信アプローチを提案する。 CPUによる収集段階を除去することにより、ホストリソースの消費とデータアクセス遅延を大幅に低減する。さらに、GPUによる高いホストメモリアクセス効率を実現するために、(1)PCIeパケット効率を最大化する自動データアクセスアドレスアライメント、(2)データ転送とトレーニングを完全に重ね合わせるための非同期ゼロコピーアクセスとカーネル実行、の2つの重要な技術を提案する。提案手法をPyTorchに組み込んで、最大1億1100万ノードと16億エッジのグラフを用いて、その有効性を評価する。マルチGPUトレーニングのセットアップでは、従来のデータ転送方法よりも65〜92%高速で、GPUメモリに収まる一部のグラフについてはオールインGPUメモリトレーニングの性能に匹敵することもできます。 Graph Convolutional Networks (GCNs) are increasingly adopted in large-scale graph-based recommender systems. Training GCN requires the minibatch generator traversing graphs and sampling the sparsely located neighboring nodes to obtain their features. Since real-world graphs often exceed the capacity of GPU memory, current GCN training systems keep the feature table in host memory and rely on the CPU to collect sparse features before sending them to the GPUs. This approach, however, puts tremendous pressure on host memory bandwidth and the CPU. This is because the CPU needs to (1) read sparse features from memory, (2) write features into memory as a dense format, and (3) transfer the features from memory to the GPUs. In this work, we propose a novel GPU-oriented data communication approach for GCN training, where GPU threads directly access sparse features in host memory through zero-copy accesses without much CPU help. By removing the CPU gathering stage, our method significantly reduces the consumption of the host resources and data access latency. We further present two important techniques to achieve high host memory access efficiency by the GPU: (1) automatic data access address alignment to maximize PCIe packet efficiency, and (2) asynchronous zero-copy access and kernel execution to fully overlap data transfer with training. We incorporate our method into PyTorch and evaluate its effectiveness using several graphs with sizes up to 111 million nodes and 1.6 billion edges. In a multi-GPU training setup, our method is 65-92% faster than the conventional data transfer method, and can even match the performance of all-in-GPU-memory training for some graphs that fit in GPU memory.	翻訳日:2021-03-08 14:48:19 公開日:2021-03-04
# 変分スパースガウス過程に対するMCMCについて: 擬似周辺アプローチ On MCMC for variationally sparse Gaussian processes: A pseudo-marginal approach ( http://arxiv.org/abs/2103.03321v1 ) ライセンス: Link先を確認	Karla Monterrubio-Gómez and Sara Wade	(参考訳) ガウス過程(GP)は、機械学習や統計学において強力なモデルを構築するために頻繁に用いられる。しかし、GPを実際に使用する場合には、高い計算負担、事後分布の近似、共分散関数の選択、そのハイパーパラメータの推論などについて、重要な考慮が必要である。これらの問題に対処するため、Hensmanら (2015) は変分スパースGPとマルコフ連鎖モンテカルロ(MCMC)を組み合わせて、GPモデルのためのスケーラブルで柔軟な一般的フレームワークを導出した。それでも、結果として得られるアプローチは、多くの観測モデルに対して計算困難な尤度評価を必要とする。そこで本研究では、この問題を回避するために、漸近的に厳密な推論を提供するとともに、計算困難な尤度と大規模データセットに対する二重確率的推定器による計算上の利得をもたらす擬似周辺(PM)スキームを提案する。複雑なモデルではPMスキームの利点が特に顕著であり、非定常性を捉えるための非パラメトリック共分散関数を持つ2レベルGP回帰モデルでこれを実証する。 Gaussian processes (GPs) are frequently used in machine learning and statistics to construct powerful models. However, when employing GPs in practice, important considerations must be made, regarding the high computational burden, approximation of the posterior, choice of the covariance function and inference of its hyperparameters. To address these issues, Hensman et al. (2015) combine variationally sparse GPs with Markov chain Monte Carlo (MCMC) to derive a scalable, flexible and general framework for GP models. Nevertheless, the resulting approach requires intractable likelihood evaluations for many observation models. To bypass this problem, we propose a pseudo-marginal (PM) scheme that offers asymptotically exact inference as well as computational gains through doubly stochastic estimators for the intractable likelihood and large datasets. In complex models, the advantages of the PM scheme are particularly evident, and we demonstrate this on a two-level GP regression model with a nonparametric covariance function to capture non-stationarity.	翻訳日:2021-03-08 14:44:14 公開日:2021-03-04
# IrrMapper-U-Net のマッピングへの深層学習アプローチ A Deep Learning Approach to Mapping Irrigation: IrrMapper-U-Net ( http://arxiv.org/abs/2103.03278v1 ) ライセンス: Link先を確認	Thomas Colligan, David Ketchum, Douglas Brinkerhoff, Marco Maneta	(参考訳) 水資源の理解と管理には正確な灌水マップが不可欠である。本研究では,2000年から2019年までのモンタナ州における新しい灌水マッピング法を提案し,その精度を実証する。この手法は、ランドサット画像からの反射情報を用いて、IrrMapper-U-Netと呼ぶ灌水画素を分類する畳み込みニューラルネットワークのアンサンブルに基づいている。この方法論は広範な機能工学に依存しておらず、既存の地理空間データセットからの土地利用情報と分類を条件としない。アンサンブルは網羅的なハイパーパラメータチューニングを必要とせず、分析パイプラインはパーソナルコンピュータに実装できるほど軽量である。さらに,提案手法は分類に関連する不確実性を推定する。本研究は,高度に高精度な空間拡張型地中真理データセットを用いて,郡規模のusda調査とキャダストラムサーベイを用いて,方法論と得られた灌水地図を評価した。その結果,本手法はモンタナ州において,総合的精度と精度の点で他の方法よりも優れていることがわかった。農林水産省の農林省全国農業統計調査では,他の方法に比べて灌水地域の推定値と州全体で一致しており,降水地域における委託の誤差は極めて少ないことがわかった。このメソッドは、クラウドをマスクし、監視なしでLandsat 7スキャンライン障害を無視することを学び、データを前処理する必要性を減らします。この手法は、アメリカ合衆国全土とランドサットの完全な記録に適用される可能性がある。 Accurate maps of irrigation are essential for understanding and managing water resources. We present a new method of mapping irrigation and demonstrate its accuracy for the state of Montana from years 2000-2019. The method is based off of an ensemble of convolutional neural networks that use reflectance information from Landsat imagery to classify irrigated pixels, that we call IrrMapper-U-Net. The methodology does not rely on extensive feature engineering and does not condition the classification with land use information from existing geospatial datasets. The ensemble does not need exhaustive hyperparameter tuning and the analysis pipeline is lightweight enough to be implemented on a personal computer. Furthermore, the proposed methodology provides an estimate of the uncertainty associated with classification. We evaluated our methodology and the resulting irrigation maps using a highly accurate novel spatially-explicit ground truth data set, using county-scale USDA surveys of irrigation extent, and using cadastral surveys. 
We found that our method outperforms other methods of mapping irrigation in Montana in terms of overall accuracy and precision. We found that our method agrees better statewide with the USDA National Agricultural Statistics Survey estimates of irrigated area compared to other methods, and has far fewer errors of commission in rainfed agriculture areas. The method learns to mask clouds and ignore Landsat 7 scan-line failures without supervision, reducing the need for preprocessing data. This methodology has the potential to be applied across the entire United States and for the complete Landsat record.	翻訳日:2021-03-08 14:43:55 公開日:2021-03-04
# CLAIMED - Trusted AIのためのビジュアルでスケーラブルなコンポーネントライブラリ CLAIMED, a visual and scalable component library for Trusted AI ( http://arxiv.org/abs/2103.03281v1 ) ライセンス: Link先を確認	Romeo Kienzler and Ivan Nesic	(参考訳) ディープラーニングモデルの人気はますます高まっているが、説明可能性、敵対的堅牢性、公平性に関する制約は、プロダクションデプロイメントの大きな懸念であることが多い。オープンソースエコシステムはこれらの懸念に対処するために豊富ですが、完全に統合されたエンドツーエンドのシステムはオープンソースに欠けています。したがって、IBMとUniversity Hospital Baselの共同作業であるKubernetes上に、完全にオープンソースで再利用可能なコンポーネントフレームワーク、プロダクショングレードの機械学習用のビジュアルエディタと実行エンジンを提供します。 Kubeflow Pipelines、AI Explainability360ツールキット、AI Fairness360ツールキット、ElyraAI、Kubeflow、Kubernetes、JupyterLab上にAdversarial Robustness Toolkitを使用しています。 Elyraパイプラインエディタを使用すると、一連のJupyterノートブックでAIパイプラインを視覚的に開発できます。 Deep Learning models are getting more and more popular but constraints on explainability, adversarial robustness and fairness are often major concerns for production deployment. Although the open source ecosystem is abundant on addressing those concerns, fully integrated, end to end systems are lacking in open source. Therefore we provide an entirely open source, reusable component framework, visual editor and execution engine for production grade machine learning on top of Kubernetes, a joint effort between IBM and the University Hospital Basel. It uses Kubeflow Pipelines, the AI Explainability360 toolkit, the AI Fairness360 toolkit and the Adversarial Robustness Toolkit on top of ElyraAI, Kubeflow, Kubernetes and JupyterLab. Using the Elyra pipeline editor, AI pipelines can be developed visually with a set of jupyter notebooks.	翻訳日:2021-03-08 14:40:51 公開日:2021-03-04
# 優先度 $k$-Center 再考: 公平性と外れ値 Revisiting Priority $k$-Center: Fairness and Outliers ( http://arxiv.org/abs/2103.03337v1 ) ライセンス: Link先を確認	Tanvi Bajpai, Deeparnab Chakrabarty, Chandra Chekuri, Maryam Negahbani	(参考訳) 優先度 $k$-Center 問題では、入力は計量空間 $(X,d)$ と整数 $k$ と、各点 $v \in X$ の優先度半径 $r(v)$ からなる。目標は $k$ 個の中心 $S \subseteq X$ を選び、$\max_{v \in X} \frac{1}{r(v)} d(v,S)$ を最小化することである。すべての $r(v)$ が一様であれば、古典的な $k$-center 問題が得られる。 Plesník [Plesník, Disc. Appl. Math. 1987] はこの問題を導入し、通常の $k$-center に対する最良のアルゴリズムと一致する $2$ 近似アルゴリズムを与えた。我々は、この問題がフェアクラスタリングの2つの異なる概念 [Harris et al., NeurIPS 2018; Jung et al., FORC 2020] とどのように関連しているかを示す。これらの進展に動機づけられて問題を再検討し、主な技術的貢献として、外れ値付きの優先度 $k$-Center に対する定数倍近似アルゴリズムを与えるフレームワークを開発する。我々のフレームワークは、優先度 $k$-Center のマトロイド制約やナップサック制約への一般化にも拡張され、系として、Harrisらの宝くじモデルにおいて公平性を保証するアルゴリズムも得られる。 In the Priority $k$-Center problem, the input consists of a metric space $(X,d)$, an integer $k$ and for each point $v \in X$ a priority radius $r(v)$. The goal is to choose $k$-centers $S \subseteq X$ to minimize $\max_{v \in X} \frac{1}{r(v)} d(v,S)$. If all $r(v)$'s were uniform, one obtains the classical $k$-center problem. Plesník [Plesník, Disc. Appl. Math. 1987] introduced this problem and gave a $2$-approximation algorithm matching the best possible algorithm for vanilla $k$-center. We show how the problem is related to two different notions of fair clustering [Harris et al., NeurIPS 2018; Jung et al., FORC 2020]. Motivated by these developments we revisit the problem and, in our main technical contribution, develop a framework that yields constant factor approximation algorithms for Priority $k$-Center with outliers. Our framework extends to generalizations of Priority $k$-Center to matroid and knapsack constraints, and as a corollary, also yields algorithms with fairness guarantees in the lottery model of Harris et al.	翻訳日:2021-03-08 14:40:34 公開日:2021-03-04
# WaveGuard: オーディオアドバイザリの例を理解して修正する WaveGuard: Understanding and Mitigating Audio Adversarial Examples ( http://arxiv.org/abs/2103.03344v1 ) ライセンス: Link先を確認	Shehzeen Hussain, Paarth Neekhara, Shlomo Dubnov, Julian McAuley, Farinaz Koushanfar	(参考訳) 近年,ディープラーニングに基づく自動音声認識(ASR)システムに対する敵対的攻撃が急増している。これらの攻撃はディープラーニングのセキュリティに新たな課題をもたらし、安全クリティカルなアプリケーションにASRシステムをデプロイすることに大きな懸念を引き起こしました。本稿では,asrシステムを攻撃するために開発された逆入力を検出するフレームワークwaveguardを紹介する。本フレームワークは,音声変換機能を組み込んで原音声と変換音声のasr転写を解析し,逆入力を検出する。我々は,近年の4つの音声対向攻撃によって構築された対向的事例を,様々な音声変換関数を用いて確実に検出できることを実証した。防衛評価におけるベストプラクティスを慎重に検討し,音声領域における適応的かつ強固な攻撃に耐える防衛力とその強みを分析した。我々は,音声を知覚的情報から復元する音声変換が,完全なホワイトボックス設定であっても,適応的相手に対して堅牢な強い防御につながることを実証的に実証した。さらに、WaveGuardはすぐに使用でき、任意のASRモデルと直接統合され、モデルの再トレーニングを必要とせずに、オーディオの逆転例を効率的に検出できます。 There has been a recent surge in adversarial attacks on deep learning based automatic speech recognition (ASR) systems. These attacks pose new challenges to deep learning security and have raised significant concerns in deploying ASR systems in safety-critical applications. In this work, we introduce WaveGuard: a framework for detecting adversarial inputs that are crafted to attack ASR systems. Our framework incorporates audio transformation functions and analyses the ASR transcriptions of the original and transformed audio to detect adversarial inputs. We demonstrate that our defense framework is able to reliably detect adversarial examples constructed by four recent audio adversarial attacks, with a variety of audio transformation functions. With careful regard for best practices in defense evaluations, we analyze our proposed defense and its strength to withstand adaptive and robust attacks in the audio domain. We empirically demonstrate that audio transformations that recover audio from perceptually informed representations can lead to a strong defense that is robust against an adaptive adversary even in a complete white-box setting. 
Furthermore, WaveGuard can be used out of the box and integrated directly with any ASR model to efficiently detect audio adversarial examples, without the need for model retraining.	翻訳日:2021-03-08 14:35:37 公開日:2021-03-04
# (参考訳) コヒーレントリスクに対する方策勾配の収束と最適性について On the Convergence and Optimality of Policy Gradient for Coherent Risk ( http://arxiv.org/abs/2103.02827v1 ) ライセンス: CC BY 4.0	Audrey Huang, Liu Leqi, Zachary C. Lipton, Kamyar Azizzadenesheli	(参考訳) 強化学習におけるリスク回避をモデル化するために、新たな研究の流れでは、よく知られたアルゴリズムを応用してコヒーレントリスク汎関数(条件付きバリュー・アット・リスク(CVaR)を含むクラス)を最適化する。マルコフ決定過程ではコヒーレントリスクの最適化は困難であるため、最近の研究では、時間整合的な代理であるマルコフコヒーレントリスク(MCR)に焦点を当てる傾向にある。この目的関数に対する方策勾配(PG)の更新式は導出されているが、(i) PGがMCRの大域的最適解を見つけるかどうか、(ii) 扱いやすい方法で勾配を推定する方法は不明である。本稿では、MCRの目的関数が(期待リターンとは異なり)一般に勾配支配的ではなく、停留点が大域的に最適であるとは一般に保証されないことを示す。さらに、学習された方策の準最適性に対するタイトな上界を示し、目的関数の非線形性とリスク回避の程度への依存性を特徴づける。(ii)に対処するため、従来の制限を克服するために状態分布の再重み付けを用いるPGの実践的な実装を提案する。実験を通じて、最適性ギャップが小さい場合、PGはリスクに敏感な方策を学習できることを実証する。しかし、準最適性ギャップが大きいインスタンスは豊富に存在し構築も容易であることがわかり、これは将来の研究における重要な課題を示している。 In order to model risk aversion in reinforcement learning, an emerging line of research adapts familiar algorithms to optimize coherent risk functionals, a class that includes conditional value-at-risk (CVaR). Because optimizing the coherent risk is difficult in Markov decision processes, recent work tends to focus on the Markov coherent risk (MCR), a time-consistent surrogate. While policy gradient (PG) updates have been derived for this objective, it remains unclear (i) whether PG finds a global optimum for MCR; (ii) how to estimate the gradient in a tractable manner. In this paper, we demonstrate that, in general, MCR objectives (unlike the expected return) are not gradient dominated and that stationary points are not, in general, guaranteed to be globally optimal. Moreover, we present a tight upper bound on the suboptimality of the learned policy, characterizing its dependence on the nonlinearity of the objective and the degree of risk aversion. Addressing (ii), we propose a practical implementation of PG that uses state distribution reweighting to overcome previous limitations. Through experiments, we demonstrate that when the optimality gap is small, PG can learn risk-sensitive policies. However, we find that instances with large suboptimality gaps are abundant and easy to construct, outlining an important challenge for future research.	翻訳日:2021-03-08 00:21:40 公開日:2021-03-04
# (参考訳) 動的語彙を用いた感情制御対話応答生成モデル An Emotion-controlled Dialog Response Generation Model with Dynamic Vocabulary ( http://arxiv.org/abs/2103.02878v1 ) ライセンス: CC BY 4.0	Shuangyong Song, Kexin Wang, Chao Wang, Haiqing Chen, Huan Chen	(参考訳) 応答生成タスクでは、適切な感情表現は、応答の人間的様レベルを明らかに改善することができる。しかし,オンラインシステムにおける実際の応用には,高QPS(オンラインシステムのフローキャパシティの指標)が必要であり,動的語彙機構が生成モデルの高速化に有効であることが証明されている。本稿では,動的語彙機構に基づく感情制御型対話応答生成モデルを提案し,実験結果から本モデルの有用性が示された。 In response generation task, proper sentimental expressions can obviously improve the human-like level of the responses. However, for real application in online systems, high QPS (queries per second, an indicator of the flow capacity of on-line systems) is required, and a dynamic vocabulary mechanism has been proved available in improving speed of generative models. In this paper, we proposed an emotion-controlled dialog response generation model based on the dynamic vocabulary mechanism, and the experimental results show the benefit of this model.	翻訳日:2021-03-07 23:08:23 公開日:2021-03-04
# (参考訳) スペクトルDefense:フーリエ領域におけるCNNの敵攻撃の検出 SpectralDefense: Detecting Adversarial Attacks on CNNs in the Fourier Domain ( http://arxiv.org/abs/2103.03000v1 ) ライセンス: CC BY 4.0	Paula Harder, Franz-Josef Pfreundt, Margret Keuper, Janis Keuper	(参考訳) 多くのコンピュータビジョンや画像解析タスクにおける畳み込みニューラルネットワーク(CNN)の成功にもかかわらず、それらはいわゆる敵対的な攻撃に対して脆弱のままです。防御は敵の例を検出することである。本稿では,入力画像と特徴マップのフーリエ領域における解析を用いて,良質なテストサンプルと敵画像の区別を行う方法を示す。第1報では,入力画像の大きさスペクトルを用いて敵の攻撃を検出する手法を提案する。このシンプルで堅牢な分類器は、3つの一般的な攻撃方法の敵対的摂動をうまく検出できます。第2の方法は、第1に構築され、さらにネットワークの異なる層における特徴マップのフーリエ係数の位相を抽出する。この拡張により、5つの異なる攻撃方法における最先端検出器と比較して、敵検出率を向上させることができる。 Despite the success of convolutional neural networks (CNNs) in many computer vision and image analysis tasks, they remain vulnerable against so-called adversarial attacks: Small, crafted perturbations in the input images can lead to false predictions. A possible defense is to detect adversarial examples. In this work, we show how analysis in the Fourier domain of input images and feature maps can be used to distinguish benign test samples from adversarial images. We propose two novel detection methods: Our first method employs the magnitude spectrum of the input images to detect an adversarial attack. This simple and robust classifier can successfully detect adversarial perturbations of three commonly used attack methods. The second method builds upon the first and additionally extracts the phase of Fourier coefficients of feature-maps at different layers of the network. With this extension, we are able to improve adversarial detection rates compared to state-of-the-art detectors on five different attack methods.	翻訳日:2021-03-07 23:04:14 公開日:2021-03-04
# (参考訳) There and back again: 変動要因を分離するための集合間サイクル一貫性 There and back again: Cycle consistency across sets for isolating factors of variation ( http://arxiv.org/abs/2103.03240v1 ) ライセンス: CC BY 4.0	Kieran A. Murphy, Varun Jampani, Srikumar Ramalingam, Ameesh Makadia	(参考訳) 表現学習は、データの変動の背後にある説明的要因の集合を解きほぐすタスクにかかっている。本研究では,データに関する限られた情報がグループ化(集合メンバシップ)の形で与えられ,基礎となる変動要因が部分集合に制限されている,という設定を扱う。私たちの目標は、グループ間で共通する変動要因を分離する表現を学ぶことです。我々の重要な洞察は、異なる集合に属する画像の学習された埋め込みの間で、集合間サイクル一貫性(CCS)を利用することである。集合による教師信号を利用する他の手法とは対照的に、CCSは変動要因に対する制約が著しく少なく、非常に広い範囲の設定で適用でき、トレーニングデータの一部に対してのみ集合メンバシップを利用すればよい。Shapes3Dからデータセットをキュレートすることで,学習表現と既知の生成因子との相互情報量を通してCCSの有効性を定量化する。さらに,数字のスタイル分離と合成から実物へのオブジェクトポーズ転送のタスクに対するCCSの適用性を実証し,同じ教師信号を利用する生成的アプローチと比較する。 Representational learning hinges on the task of unraveling the set of underlying explanatory factors of variation in data. In this work, we operate in the setting where limited information is known about the data in the form of groupings, or set membership, where the underlying factors of variation are restricted to a subset. Our goal is to learn representations which isolate the factors of variation that are common across the groupings. Our key insight is the use of cycle consistency across sets (CCS) between the learned embeddings of images belonging to different sets. In contrast to other methods utilizing set supervision, CCS can be applied with significantly fewer constraints on the factors of variation, across a remarkably broad range of settings, and only utilizing set membership for some fraction of the training data. By curating datasets from Shapes3D, we quantify the effectiveness of CCS through mutual information between the learned representations and the known generative factors. In addition, we demonstrate the applicability of CCS to the tasks of digit style isolation and synthetic-to-real object pose transfer and compare to generative approaches utilizing the same supervision.	翻訳日:2021-03-07 21:51:22 公開日:2021-03-04
# (参考訳) コードの普遍表現 Universal Representation for Code ( http://arxiv.org/abs/2103.03116v1 ) ライセンス: CC BY 4.0	Linfeng Liu, Hoan Nguyen, George Karypis, Srinivasan Sengamedu	(参考訳) ソースコードからの学習には、通常大量のラベル付きデータが必要です。ラベル付きデータの不足の可能性にもかかわらず、トレーニングされたモデルはタスク固有性が高く、異なるタスクへの転移性に欠ける。本稿では,新しいグラフベースのコード表現の上に,コードの普遍表現を生成するための効果的な事前学習戦略を提案する。特に、私たちのグラフベースの表現は、コード要素間の重要なセマンティクス(例えば、制御フローとデータフロー)をキャプチャします。我々は、この表現の上でグラフニューラルネットワークを事前学習し、普遍的なコード特性を抽出する。事前学習されたモデルは、様々な下流アプリケーションをサポートするための微調整を可能にする。3000万を超えるJavaメソッドと77万のPythonメソッドにまたがる、実世界の2つのデータセットでモデルを評価する。可視化により、普遍的なコード表現における識別特性を明らかにする。複数のベンチマークを比較することで,提案フレームワークがメソッド名予測とコードグラフリンク予測の最先端結果を実現することを示す。 Learning from source code usually requires a large amount of labeled data. Despite the possible scarcity of labeled data, the trained model is highly task-specific and lacks transferability to different tasks. In this work, we present effective pre-training strategies on top of a novel graph-based code representation, to produce universal representations for code. Specifically, our graph-based representation captures important semantics between code elements (e.g., control flow and data flow). We pre-train graph neural networks on the representation to extract universal code properties. The pre-trained model then enables the possibility of fine-tuning to support various downstream applications. We evaluate our model on two real-world datasets -- spanning over 30M Java methods and 770K Python methods. Through visualization, we reveal discriminative properties in our universal code representation. By comparing multiple benchmarks, we demonstrate that the proposed framework achieves state-of-the-art results on method name prediction and code graph link prediction.	翻訳日:2021-03-07 20:43:47 公開日:2021-03-04
# (参考訳) 生涯学習の現実的なシナリオとしての継続的協調 Continuous Coordination As a Realistic Scenario for Lifelong Learning ( http://arxiv.org/abs/2103.03216v1 ) ライセンス: CC BY-SA 4.0	Hadi Nekoei, Akilesh Badrinaaraayanan, Aaron Courville, Sarath Chandar	(参考訳) 現在の深層強化学習(RL)アルゴリズムは依然としてタスク固有であり、新しい環境に一般化する能力がない。しかし、LLL(Lifelong Learning)は、タスク間の知識を効率的に転送し、使用することにより、複数のタスクを順次解決することを目指しています。近年の生涯RLへの関心の高まりにもかかわらず、現実的なテストベッドの欠如はLLLアルゴリズムの堅牢な評価を困難にします。一方、マルチエージェントRL(MARL)は、エージェントのポリシーが時間とともに変化するため、その固有の非定常性のため、寿命の長いRLの自然なシナリオと見なすことができる。本研究では,ゼロショット設定と少数ショット設定の両方をサポートするマルチエージェント生涯学習テストベッドを提案する。私たちのセットアップは、部分的に観察可能で完全に協力的なマルチエージェントゲームであるhanabiをベースにしています。その大きな戦略空間は、生涯RLタスクにとって望ましい環境である。最近のMARL法、および制限メモリおよび計算システムにおける最新のLLLアルゴリズムのベンチマークを評価し、それらの長所と短所を明らかにします。この継続的な学習パラダイムは、MARLで最も一般的に使用されるトレーニングプロトコルである集中型トレーニングを超えて実用的な方法を提供します。我々は経験的に、我々の設定で訓練されたエージェントは、以前の作業による追加の仮定なしに、未発見のエージェントとうまく協調できることを示します。 Current deep reinforcement learning (RL) algorithms are still highly task-specific and lack the ability to generalize to new environments. Lifelong learning (LLL), however, aims at solving multiple tasks sequentially by efficiently transferring and using knowledge between tasks. Despite a surge of interest in lifelong RL in recent years, the lack of a realistic testbed makes robust evaluation of LLL algorithms difficult. Multi-agent RL (MARL), on the other hand, can be seen as a natural scenario for lifelong RL due to its inherent non-stationarity, since the agents' policies change over time. In this work, we introduce a multi-agent lifelong learning testbed that supports both zero-shot and few-shot settings. Our setup is based on Hanabi -- a partially-observable, fully cooperative multi-agent game that has been shown to be challenging for zero-shot coordination. Its large strategy space makes it a desirable environment for lifelong RL tasks. 
We evaluate several recent MARL methods, and benchmark state-of-the-art LLL algorithms in limited memory and computation regimes to shed light on their strengths and weaknesses. This continual learning paradigm also provides us with a pragmatic way of going beyond centralized training which is the most commonly used training protocol in MARL. We empirically show that the agents trained in our setup are able to coordinate well with unseen agents, without any additional assumptions made by previous works.	翻訳日:2021-03-07 20:31:37 公開日:2021-03-04
# (参考訳) 定量化法の比較評価 A Comparative Evaluation of Quantification Methods ( http://arxiv.org/abs/2103.03223v1 ) ライセンス: CC BY 4.0	Tobias Schumacher, Markus Strohmaier, Florian Lemmerich	(参考訳) 量子化は、与えられたターゲットセットのクラス分布を予測する問題を表す。また、近年、さまざまなアルゴリズムが提案されている教師付き機械学習の研究分野も拡大しています。しかし,アルゴリズム選択をサポートする定量化手法の包括的比較は,まだ行われていない。本研究では,24種類の数値化手法の実証的性能比較を徹底することで,この研究ギャップを埋める。バイナリのさまざまなシナリオとマルチクラス量子化設定を検討するため、40のデータセット上で約300万回の実験実行を実施しました。一つのアルゴリズムが一般に競合に勝ることはないが、Median SweepやDySフレームワークなど、バイナリ設定で大幅にパフォーマンスが向上するメソッド群を識別する。多クラス構成の場合、一般化された確率的調整数、readme法、エネルギー距離最小化法、数値化のためのemアルゴリズム、フリードマン法など、異なる幅広いアルゴリズム群が優れた性能をもたらすことが観察される。より一般的には、多クラス定量化の性能はバイナリ設定の結果よりも劣っていることが分かる。本研究は,定量化アルゴリズムを適用しようとする実践者の指導と,今後の研究の機会の特定を支援する。 Quantification represents the problem of predicting class distributions in a given target set. It also represents a growing research field in supervised machine learning, for which a large variety of different algorithms has been proposed in recent years. However, a comprehensive empirical comparison of quantification methods that supports algorithm selection is not available yet. In this work, we close this research gap by conducting a thorough empirical performance comparison of 24 different quantification methods. To consider a broad range of different scenarios for binary as well as multiclass quantification settings, we carried out almost 3 million experimental runs on 40 data sets. We observe that no single algorithm generally outperforms all competitors, but identify a group of methods including the Median Sweep and the DyS framework that perform significantly better in binary settings. For the multiclass setting, we observe that a different, broad group of algorithms yields good performance, including the Generalized Probabilistic Adjusted Count, the readme method, the energy distance minimization method, the EM algorithm for quantification, and Friedman's method. 
More generally, we find that the performance on multiclass quantification is inferior to the results obtained in the binary setting. Our results can guide practitioners who intend to apply quantification algorithms and help researchers to identify opportunities for future research.	翻訳日:2021-03-07 20:08:42 公開日:2021-03-04
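As a rough illustration of what a quantification method computes, the sketch below contrasts naive classify-and-count with the classic binary Adjusted Count correction, a simpler relative of the Generalized Probabilistic Adjusted Count named in the abstract. It is not code from the paper; the function names, the example predictions, and the `tpr`/`fpr` values are assumptions made for illustration only.

```python
def classify_and_count(predictions):
    """Fraction of items predicted positive (labels are 0 or 1)."""
    return sum(predictions) / len(predictions)


def adjusted_count(predictions, tpr, fpr):
    """Correct the raw positive rate using the classifier's error rates.

    Solves observed = tpr * p + fpr * (1 - p) for the true prevalence p,
    then clips the estimate to the valid range [0, 1].
    """
    if tpr == fpr:
        raise ValueError("tpr and fpr must differ")
    observed = classify_and_count(predictions)
    estimate = (observed - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, estimate))


# Hypothetical target set: 40 of 100 items are predicted positive.
predictions = [1] * 40 + [0] * 60
# Hypothetical error rates, e.g. estimated on held-out labeled data.
estimate = adjusted_count(predictions, tpr=0.8, fpr=0.2)
```

With these assumed rates, the raw positive rate of 0.4 is corrected downward, because a fifth of the negatives are expected to be flagged as positive.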
# (参考訳) ビザンチン-ロバスト分散推論のための分散減調平均推定器 Variance Reduced Median-of-Means Estimator for Byzantine-Robust Distributed Inference ( http://arxiv.org/abs/2103.02860v1 ) ライセンス: CC BY 4.0	Jiyuan Tu, Weidong Liu, Xiaojun Mao, and Xi Chen	(参考訳) 本論文では,Byzantineノードの適度な分数に対して堅牢な分散推論アルゴリズム,すなわち分散学習システムにおける任意かつ潜在的に逆転するマシンを開発する。堅牢な統計では、中央平均(MOM)は、実装の容易さと計算効率のためにビザンチンの失敗に対してヘッジする一般的なアプローチでした。しかし、MOM推定器は統計効率の面で欠点があります。この論文の最初の主な貢献は、バニラMOM推定器の統計効率を向上し、MOMと同等の計算効率を持つ分散還元平均(VRMOM)推定器を提案することである。提案したVRMOM推定器に基づいて,ビザンチンの故障に対して頑健な一般分散推論アルゴリズムを開発した。理論上,分散アルゴリズムは一定数の通信ラウンド数しか持たない高速収束率を達成している。また,統計的推測を目的とした漸近正規化結果も提供する。私たちの知る限りでは、これはByzantine-robust分散学習の設定における最初の正常性の結果です。また,本手法の有効性を示すためにシミュレーション結果も提示した。 This paper develops an efficient distributed inference algorithm, which is robust against a moderate fraction of Byzantine nodes, namely arbitrary and possibly adversarial machines in a distributed learning system. In robust statistics, the median-of-means (MOM) has been a popular approach to hedge against Byzantine failures due to its ease of implementation and computational efficiency. However, the MOM estimator has the shortcoming in terms of statistical efficiency. The first main contribution of the paper is to propose a variance reduced median-of-means (VRMOM) estimator, which improves the statistical efficiency over the vanilla MOM estimator and is computationally as efficient as the MOM. Based on the proposed VRMOM estimator, we develop a general distributed inference algorithm that is robust against Byzantine failures. Theoretically, our distributed algorithm achieves a fast convergence rate with only a constant number of rounds of communications. We also provide the asymptotic normality result for the purpose of statistical inference. To the best of our knowledge, this is the first normality result in the setting of Byzantine-robust distributed learning. The simulation results are also presented to illustrate the effectiveness of our method.	翻訳日:2021-03-07 19:30:26 公開日:2021-03-04
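For readers unfamiliar with the baseline this abstract builds on, here is a minimal sketch of the vanilla median-of-means (MOM) estimator in a distributed setting. It does not implement the paper's variance-reduced VRMOM estimator, and the worker data and function name are hypothetical.

```python
import statistics


def median_of_means(worker_samples):
    """Vanilla MOM: average within each worker, then take the median across workers."""
    local_means = [statistics.fmean(samples) for samples in worker_samples]
    return statistics.median(local_means)


# Four honest workers with samples near 1.0, plus one Byzantine worker
# that reports arbitrary values (all data here is made up for illustration).
honest = [[0.9, 1.1, 1.0], [1.2, 0.8, 1.0], [1.0, 1.0, 1.3], [0.7, 1.0, 1.0]]
byzantine = [[1e6, 1e6, 1e6]]
workers = honest + byzantine

robust = median_of_means(workers)
# Pooling every sample into one plain average lets the corrupted worker dominate.
naive = statistics.fmean([x for samples in workers for x in samples])
```

The median ignores the single corrupted local mean, while the pooled average is dragged far away from 1.0. The price is the loss of statistical efficiency that the abstract describes as the motivation for VRMOM.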
# (参考訳) 正常および耳炎中耳炎における広帯域吸音率の機械学習による解析 Analysing Wideband Absorbance Immittance in Normal and Ears with Otitis Media with Effusion Using Machine Learning ( http://arxiv.org/abs/2103.02982v1 ) ライセンス: CC BY 4.0	Emad M. Grais, Xiaoya Wang, Jie Wang, Fei Zhao, Wen Jiang, Yuexin Cai, Lifang Zhang, Qingwen Lin, Haidi Yang	(参考訳) 広帯域吸収率(WAI)は10年以上前から利用されていますが、その臨床使用は、限られた理解とWAI結果の悪い解釈の課題にまだ直面しています。本研究は、正常中耳および耳の異なる周波数圧領域におけるWAI吸収特性を、輸液(OME)による耳炎媒体で同定し、中耳の状態を自動的に診断する機械学習(ML)ツールの開発を目的とした。本研究では, waiデータの前処理, 統計解析, 分類モデル開発を含むデータ解析と, 2次元周波数圧wai画像からのキー領域抽出を行った。実験結果から, MLツールがWAIデータから中耳疾患を自動診断する大きな可能性を秘めていることがわかった。 WAIの特定された重要な領域は、WAIデータをよりよく理解し、解釈し、迅速かつ正確な診断決定の見通しを提供する実践者にガイダンスを提供します。 Wideband Absorbance Immittance (WAI) has been available for more than a decade, however its clinical use still faces the challenges of limited understanding and poor interpretation of WAI results. This study aimed to develop Machine Learning (ML) tools to identify the WAI absorbance characteristics across different frequency-pressure regions in the normal middle ear and ears with otitis media with effusion (OME) to enable diagnosis of middle ear conditions automatically. Data analysis including pre-processing of the WAI data, statistical analysis and classification model development, together with key regions extraction from the 2D frequency-pressure WAI images are conducted in this study. Our experimental results show that ML tools appear to hold great potential for the automated diagnosis of middle ear diseases from WAI data. The identified key regions in the WAI provide guidance to practitioners to better understand and interpret WAI data and offer the prospect of quick and accurate diagnostic decisions.	翻訳日:2021-03-07 19:29:23 公開日:2021-03-04
# (参考訳) 自己教師あり幾何学的知覚 Self-supervised Geometric Perception ( http://arxiv.org/abs/2103.03114v1 ) ライセンス: CC BY 4.0	Heng Yang, Wei Dong, Luca Carlone, Vladlen Koltun	(参考訳) 自己教師あり幾何学的知覚(SGP: Self-supervised Geometric Perception)は、正解の幾何モデルラベル(例えば、カメラポーズ、剛体変換)なしで対応点マッチングのための特徴記述子を学習する、最初の汎用フレームワークである。私たちの最初の貢献は、大量の視覚計測(例えば画像、点群)が与えられたもとで特徴記述子と幾何モデルを共同で最適化する最適化問題として、幾何学的知覚を定式化することです。この最適化定式化の下では、視覚における2つの重要な研究の流れ、すなわち頑健なモデルフィッティングと深層特徴学習が、未知変数の一方のブロックを固定しながら他方のブロックを最適化することに対応することを示す。この分析は自然に、共同最適化を解くために交互最小化を実行するSGPアルゴリズムという2番目の貢献につながります。SGPは2つのメタアルゴリズムを反復的に実行する: 学習済みの特徴を用いて頑健なモデルフィッティングを行い幾何学的擬似ラベルを生成する教師と、擬似ラベルによるノイズを含む教師信号の下で深層特徴学習を行う生徒である。第3の貢献として,MegaDepthにおける相対カメラポーズ推定と3DMatchにおける点群レジストレーションという,大規模実データ上の2つの知覚問題にSGPを適用する。SGPが、正解ラベルを用いて訓練した教師ありオラクルと同等またはそれを上回る最先端の性能を達成することを実証する。 We present self-supervised geometric perception (SGP), the first general framework to learn a feature descriptor for correspondence matching without any ground-truth geometric model labels (e.g., camera poses, rigid transformations). Our first contribution is to formulate geometric perception as an optimization problem that jointly optimizes the feature descriptor and the geometric models given a large corpus of visual measurements (e.g., images, point clouds). Under this optimization formulation, we show that two important streams of research in vision, namely robust model fitting and deep feature learning, correspond to optimizing one block of the unknown variables while fixing the other block. This analysis naturally leads to our second contribution -- the SGP algorithm that performs alternating minimization to solve the joint optimization. SGP iteratively executes two meta-algorithms: a teacher that performs robust model fitting given learned features to generate geometric pseudo-labels, and a student that performs deep feature learning under noisy supervision of the pseudo-labels. 
As a third contribution, we apply SGP to two perception problems on large-scale real datasets, namely relative camera pose estimation on MegaDepth and point cloud registration on 3DMatch. We demonstrate that SGP achieves state-of-the-art performance that is on-par or superior to the supervised oracles trained using ground-truth labels.	翻訳日:2021-03-07 19:13:44 公開日:2021-03-04
# (参考訳) 多発性硬化症のMR画像の構造的因果モデル A Structural Causal Model for MR Images of Multiple Sclerosis ( http://arxiv.org/abs/2103.03158v1 ) ライセンス: CC BY 4.0	Jacob C. Reinhold, Aaron Carass, Jerry L. Prince	(参考訳) 精密医療では、「この患者は治療Aと治療Bのどちらによりよく反応するだろうか?」といった反事実的な問いに答える必要がある。これらのタイプの質問は本質的に因果的であり、答えるには構造的因果モデル(SCM)などの因果推論のツールが必要である。本研究では,多発性硬化症(MS)患者の人口統計情報,疾患共変量,および脳の磁気共鳴(MR)画像の間の相互作用をモデル化するSCMを開発した。SCMにおける推論は、人口統計や疾患の共変量を変更した場合に脳のMR画像がどのように見えるかを示す反事実画像を生成する。これらの画像は、疾患の進行のモデル化や、交絡因子の制御が必要な下流の画像処理タスクに使用できる。 Precision medicine involves answering counterfactual questions such as "Would this patient respond better to treatment A or treatment B?" These types of questions are causal in nature and require the tools of causal inference to be answered, e.g., with a structural causal model (SCM). In this work, we develop an SCM that models the interaction between demographic information, disease covariates, and magnetic resonance (MR) images of the brain for people with multiple sclerosis (MS). Inference in the SCM generates counterfactual images that show what an MR image of the brain would look like when demographic or disease covariates are changed. These images can be used for modeling disease progression or used for downstream image processing tasks where controlling for confounders is necessary.	翻訳日:2021-03-07 18:39:22 公開日:2021-03-04
# (参考訳) Dota 2ボットコンペティション The Dota 2 Bot Competition ( http://arxiv.org/abs/2103.02943v1 ) ライセンス: CC BY 4.0	Jose M. Font and Tobias Mahlmann	(参考訳) マルチプレイヤーオンラインバトルアリーナ(MOBA)ゲームは、ビデオゲーム業界と国際的なeスポーツシーンの両方で近年大きな成功を収めています。これらのゲームは、リアルタイムのアクションと戦略を組み合わせたゲームプレイの中で、チームの協調と協力、短期および長期の計画を促す。ゲームにおける人工知能(AI)および計算知能(CI)の研究コンペティションは、さまざまなゲームジャンルへのAI技術の研究と応用に関する幅広い課題を提供している。これらのイベントは、この分野の他の多くの研究領域に強く影響を及ぼす一種のAIベンチマークとして、AI/CIコミュニティに広く受け入れられている。本稿では、Dota 2 BotコンペティションとそれをサポートするDota 2 AIフレームワークについて詳しく説明します。このチャレンジは、MOBAとAI/CIゲームコンペティションの両方を結びつけることを目的としており、人気のMOBAであるDefense of the Ancients 2 (Dota 2) の1v1マッチでプレイするAIコントローラの提出を参加者に呼びかけ、リアルタイムゲームのためのAI技術の研究を促進することを目指している。 Dota 2 AIフレームワークは、Dota 2本体のMOD機能を利用して、オリジナルのFree-to-Playゲームを使用しながら、外部のAIコントローラを実際のDota 2のゲームマッチに接続できるようにします。 Multiplayer Online Battle Arena (MOBA) games are a recent huge success both in the video game industry and the international eSports scene. These games encourage team coordination and cooperation, short and long-term planning, within a real-time combined action and strategy gameplay. Artificial Intelligence and Computational Intelligence in Games research competitions offer a wide variety of challenges regarding the study and application of AI techniques to different game genres. These events are widely accepted by the AI/CI community as a sort of AI benchmarking that strongly influences many other research areas in the field. This paper presents and describes in detail the Dota 2 Bot competition and the Dota 2 AI framework that supports it. This challenge aims to join both, MOBAs and AI/CI game competitions, inviting participants to submit AI controllers for the successful MOBA Defense of the Ancients 2 (Dota 2) to play in 1v1 matches, which aims for fostering research on AI techniques for real-time games. 
The Dota 2 AI framework makes use of the actual Dota 2 game modding capabilities to enable to connect external AI controllers to actual Dota 2 game matches using the original Free-to-Play game.	翻訳日:2021-03-07 18:05:37 公開日:2021-03-04
# (参考訳) DeepTag: 心臓タギング磁気共鳴画像における動き追跡のための教師なし深層学習法 DeepTag: An Unsupervised Deep Learning Method for Motion Tracking on Cardiac Tagging Magnetic Resonance Images ( http://arxiv.org/abs/2103.02772v1 ) ライセンス: CC BY 4.0	Meng Ye, Mikael Kanski, Dong Yang, Qi Chang, Zhennan Yan, Qiaoying Huang, Leon Axel, Dimitris Metaxas	(参考訳) 心臓タギング磁気共鳴イメージング(t-MRI)は、局所心筋変形および心筋ストレインの評価におけるゴールドスタンダードです。しかし, t-MRI画像では動き追跡が困難なため, この技術は臨床診断ではあまり使われていない。本論文では,t-MRI画像のin vivo動き追跡のための,深層学習に基づく完全教師なし手法を提案する。まず,連続する任意の2つのt-MRIフレーム間の運動場(INF)を,双方向の生成的微分同相レジストレーションニューラルネットワークで推定する。次にこの結果を用いて,参照フレームと他の任意のフレームとの間のラグランジュ運動場を微分可能な合成層で推定する。時間情報を利用して時空間運動場を合理的に推定することにより、本手法は動的医用画像における動き追跡や画像レジストレーションに有用なソリューションを提供する。本手法は代表的な臨床用t-MRIデータセット上で検証され, ランドマーク追跡精度と推論効率の点で, 従来の動き追跡法よりも優れていることが実験結果から示された。 Cardiac tagging magnetic resonance imaging (t-MRI) is the gold standard for regional myocardium deformation and cardiac strain estimation. However, this technique has not been widely used in clinical diagnosis, as a result of the difficulty of motion tracking encountered with t-MRI images. In this paper, we propose a novel deep learning-based fully unsupervised method for in vivo motion tracking on t-MRI images. We first estimate the motion field (INF) between any two consecutive t-MRI frames by a bi-directional generative diffeomorphic registration neural network. Using this result, we then estimate the Lagrangian motion field between the reference frame and any other frame through a differentiable composition layer. By utilizing temporal information to perform reasonable estimations on spatio-temporal motion fields, this novel method provides a useful solution for motion tracking and image registration in dynamic medical imaging. Our method has been validated on a representative clinical t-MRI dataset; the experimental results show that our method is superior to conventional motion tracking methods in terms of landmark tracking accuracy and inference efficiency.	翻訳日:2021-03-07 17:52:12 公開日:2021-03-04
# (参考訳) きめ細かい視覚分類のための特徴増強, 抑圧, 多様化 Feature Boosting, Suppression, and Diversification for Fine-Grained Visual Classification ( http://arxiv.org/abs/2103.02782v1 ) ライセンス: CC BY 4.0	Jianwei Song, Ruoyu Yang	(参考訳) 識別的局所領域からの特徴表現の学習は、きめ細かい視覚的分類において重要な役割を担っている。部分的特徴抽出のための注意機構の活用がトレンドとなっている。しかし、これらの方法には2つの大きな制限がある: まず、他の目立たないが区別可能な部分を無視しながら、最も健全な部分に焦点を当てることがしばしばである。第2に、関係を無視しながら、異なる部分の特徴を分離して扱う。これらの制約に対処するために,複数の異なる識別可能な部分を見つけ,それらの関係を明示的な方法で探究することを提案する。本稿では,既存の畳み込みニューラルネットワークに簡単に接続可能な2つの軽量モジュールを提案する。本稿では,特徴マップの最も顕著な部分を強化し,部分固有の表現を取得し,次のネットワークに他の潜在的な部品をマイニングさせるよう抑制する機能強化・抑制モジュールを提案する。一方,相関した部分固有表現から意味的に補完的な情報を学習する特徴多様化モジュールを提案する。私たちのメソッドはバウンディングボックス/パーツアノテーションを必要とせず、エンドツーエンドでトレーニングできます。広範な実験結果から,本手法は複数のベンチマークきめ細かなデータセットにおいて最先端の性能を得ることができた。 Learning feature representation from discriminative local regions plays a key role in fine-grained visual classification. Employing attention mechanisms to extract part features has become a trend. However, there are two major limitations in these methods: First, they often focus on the most salient part while neglecting other inconspicuous but distinguishable parts. Second, they treat different part features in isolation while neglecting their relationships. To handle these limitations, we propose to locate multiple different distinguishable parts and explore their relationships in an explicit way. In this pursuit, we introduce two lightweight modules that can be easily plugged into existing convolutional neural networks. On one hand, we introduce a feature boosting and suppression module that boosts the most salient part of feature maps to obtain a part-specific representation and suppresses it to force the following network to mine other potential parts. On the other hand, we introduce a feature diversification module that learns semantically complementary information from the correlated part-specific representations. Our method does not need bounding boxes/part annotations and can be trained end-to-end. 
Extensive experimental results show that our method achieves state-of-the-art performances on several benchmark fine-grained datasets.	翻訳日:2021-03-07 17:19:04 公開日:2021-03-04
# (参考訳) 粒度認識畳み込みニューラルネットワークによる粒度分類の学習 Learning Granularity-Aware Convolutional Neural Network for Fine-Grained Visual Classification ( http://arxiv.org/abs/2103.02788v1 ) ライセンス: CC BY 4.0	Jianwei Song, Ruoyu Yang	(参考訳) 識別的部分の配置は、異なるオブジェクト間の高い類似性のため、きめ細かい視覚的分類において重要な役割を果たす。畳み込みニューラルネットワークに基づく最近の研究は、最終畳み込み層から抽出した特徴写像を利用して識別領域をマイニングしている。しかしながら、最後の畳み込み層は、大きな受容野のためにオブジェクト全体に集中する傾向にあり、それによって違いを見つける能力が低下する。そこで本研究では,Granularity-Aware Convolutional Neural Network (GA-CNN) を提案する。具体的には, GA-CNNは, 異なる層における受容場の違いを利用して多粒度特徴を学習し, 前段のより小さな粒度情報に基づいて, より大きな粒度情報を利用する。性能をさらに向上するため,原画像が与えられたオブジェクトを効果的にローカライズできるオブジェクト検出モジュールを導入する。 GA-CNNはバウンディングボックス/パーツアノテーションを必要とせず、エンドツーエンドでトレーニングできます。広範な実験結果から,3つのベンチマークデータセットで最新のパフォーマンスを達成した。 Locating discriminative parts plays a key role in fine-grained visual classification due to the high similarities between different objects. Recent works based on convolutional neural networks utilize the feature maps taken from the last convolutional layer to mine discriminative regions. However, the last convolutional layer tends to focus on the whole object due to the large receptive field, which leads to a reduced ability to spot the differences. To address this issue, we propose a novel Granularity-Aware Convolutional Neural Network (GA-CNN) that progressively explores discriminative features. Specifically, GA-CNN utilizes the differences of the receptive fields at different layers to learn multi-granularity features, and it exploits larger granularity information based on the smaller granularity information found at the previous stages. To further boost the performance, we introduce an object-attentive module that can effectively localize the object given a raw image. GA-CNN does not need bounding boxes/part annotations and can be trained end-to-end. Extensive experimental results show that our approach achieves state-of-the-art performances on three benchmark datasets.	翻訳日:2021-03-07 17:04:23 公開日:2021-03-04
# (参考訳) 効率的なモバイルネットワーク設計のためのコーディネート注意 Coordinate Attention for Efficient Mobile Network Design ( http://arxiv.org/abs/2103.02907v1 ) ライセンス: CC BY 4.0	Qibin Hou, Daquan Zhou, Jiashi Feng	(参考訳) 移動ネットワーク設計に関する最近の研究は, モデル性能向上のためのチャネル注意(例えば, 押し出し注意)の顕著な効果を実証してきたが, 一般に位置情報は無視され, 空間的に選択的に注意マップを生成するのに重要である。本稿では,位置情報をチャネルの注意に埋め込むことにより,モバイルネットワークにおける新たな注意メカニズムを提案する。 2次元グローバルプーリングにより特徴テンソルを単一特徴ベクトルに変換するチャネルアテンションとは異なり、座標アテンションはチャネルアテンションを2つの空間方向に沿って特徴を集約する2つの1次元特徴符号化プロセスに分解する。このようにして、一方の空間方向に沿って長距離依存を捕捉でき、他方の空間方向に沿って正確な位置情報を保存することができる。結果として得られた特徴マップは、入力された特徴マップに相補的に適用でき、関心のある対象の表現を増強できる方向認識および位置知覚のアテンションマップに別々にエンコードされる。座標の注意は単純で、MobileNetV2、MobileNeXt、EfficientNetなどの古典的なモバイルネットワークに柔軟に接続でき、計算オーバーヘッドはほとんどない。広範な実験は、私たちの座標の注意がImageNet分類に有益であるだけでなく、より興味深いことに、オブジェクト検出やセマンティックセグメンテーションなどの下流タスクでより良い振る舞いを示す。コードはhttps://github.com/Andrew-Qibin/CoordAttentionで入手できる。 Recent studies on mobile network design have demonstrated the remarkable effectiveness of channel attention (e.g., the Squeeze-and-Excitation attention) for lifting model performance, but they generally neglect the positional information, which is important for generating spatially selective attention maps. In this paper, we propose a novel attention mechanism for mobile networks by embedding positional information into channel attention, which we call "coordinate attention". Unlike channel attention that transforms a feature tensor to a single feature vector via 2D global pooling, the coordinate attention factorizes channel attention into two 1D feature encoding processes that aggregate features along the two spatial directions, respectively. In this way, long-range dependencies can be captured along one spatial direction and meanwhile precise positional information can be preserved along the other spatial direction. 
The resulting feature maps are then encoded separately into a pair of direction-aware and position-sensitive attention maps that can be complementarily applied to the input feature map to augment the representations of the objects of interest. Our coordinate attention is simple and can be flexibly plugged into classic mobile networks, such as MobileNetV2, MobileNeXt, and EfficientNet with nearly no computational overhead. Extensive experiments demonstrate that our coordinate attention is not only beneficial to ImageNet classification but more interestingly, behaves better in down-stream tasks, such as object detection and semantic segmentation. Code is available at https://github.com/Andrew-Qibin/CoordAttention.	翻訳日:2021-03-07 16:55:39 公開日:2021-03-04
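The directional pooling idea in this abstract can be sketched for a single channel in plain Python. This is a deliberately simplified illustration, not the authors' implementation (their code is at the repository linked above): the learned convolutions and normalization are omitted, so the gates here are just sigmoids of the row and column averages. The function names and the example feature map are assumptions.

```python
import math


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def coordinate_attention(feature_map):
    """Re-weight one channel (H x W nested list) with direction-aware gates.

    Simplification: the learned transforms of the real block are omitted,
    so each gate is only a sigmoid of a directional average.
    """
    height, width = len(feature_map), len(feature_map[0])
    # Pool along the width: one value per row, which keeps the vertical position.
    row_pool = [sum(row) / width for row in feature_map]
    # Pool along the height: one value per column, which keeps the horizontal position.
    col_pool = [
        sum(feature_map[i][j] for i in range(height)) / height
        for j in range(width)
    ]
    row_gate = [sigmoid(v) for v in row_pool]
    col_gate = [sigmoid(v) for v in col_pool]
    # Apply both gates so each position is scaled by its row and its column.
    return [
        [feature_map[i][j] * row_gate[i] * col_gate[j] for j in range(width)]
        for i in range(height)
    ]


# Hypothetical 2 x 3 channel with a single strong activation.
channel = [[0.0, 0.0, 0.0],
           [0.0, 4.0, 0.0]]
out = coordinate_attention(channel)
```

In contrast to 2D global pooling, which would collapse this channel to one number, the two 1D pools retain which row and which column the activation sits in.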
# (参考訳) モーションブルービデオ補間と外挿 Motion-blurred Video Interpolation and Extrapolation ( http://arxiv.org/abs/2103.02984v1 ) ライセンス: CC BY 4.0	Dawit Mureja Argaw, Junsik Kim, Francois Rameau, In So Kweon	(参考訳) シーン内のカメラやオブジェクトの突然の動作はぼやけたビデオになるため、高品質なビデオの復元には2つのタイプの強化が必要である。広い範囲の研究により、ぼやけた画像列や時間的にアップサンプルフレームからクリーンフレームを補間する方法が試みられたが、両者を共同で扱う研究は非常に限られている。そこで本研究では,映像から鮮明なフレームをエンド・ツー・エンドで切り離し,補間し,外挿する新しいフレームワークを提案する。まず,入力のぼやけを引き起こした画素レベルの動きを光学的流れ推定によって学習し,デコードされた特徴を推定フローで反動させることで,複数のクリーンフレームを予測した。予測フレーム間の時間的コヒーレンスを確保し,潜在的な時間的あいまいさに対処するために,単純で効果的なフローベースルールを提案する。提案手法の有効性と好適性は,高速ビデオからの動色データセットの質的,定量的な評価を通じて強調される。 Abrupt motion of camera or objects in a scene result in a blurry video, and therefore recovering high quality video requires two types of enhancements: visual enhancement and temporal upsampling. A broad range of research attempted to recover clean frames from blurred image sequences or temporally upsample frames by interpolation, yet there are very limited studies handling both problems jointly. In this work, we present a novel framework for deblurring, interpolating and extrapolating sharp frames from a motion-blurred video in an end-to-end manner. We design our framework by first learning the pixel-level motion that caused the blur from the given inputs via optical flow estimation and then predict multiple clean frames by warping the decoded features with the estimated flows. To ensure temporal coherence across predicted frames and address potential temporal ambiguity, we propose a simple, yet effective flow-based rule. The effectiveness and favorability of our approach are highlighted through extensive qualitative and quantitative evaluations on motion-blurred datasets from high speed videos.	翻訳日:2021-03-07 16:39:56 公開日:2021-03-04
# (参考訳) 単一運動破砕画像からの光流量推定 Optical Flow Estimation from a Single Motion-blurred Image ( http://arxiv.org/abs/2103.02996v1 ) ライセンス: CC BY 4.0	Dawit Mureja Argaw, Junsik Kim, Francois Rameau, Jae Won Cho, In So Kweon	(参考訳) ほとんどのコンピュータビジョンアプリケーションでは、動きのぼやけは望ましくない人工物と見なされる。しかし、画像内の動きのぼやけは、基本的なコンピュータビジョン問題に実際的な関心を持つ可能性があることが示されている。そこで本研究では,単一動画像からの光流れをエンドツーエンドで推定する新しい枠組みを提案する。ネットワークをトランスフォーマーネットワークで設計し,動きブレート入力の符号化特徴からグローバルおよび局所的な動きを学習し,明示的なフレーム監督を伴わずに左右のフレーム特徴をデコードする。次に、フロー推定ネットワークを用いて、デコードされた特徴から光学的流れを粗い方法で推定する。合成および実動ブルールデータセットに関する大規模な実験を通じて、モデルを定性的かつ定量的に評価します。また、関連するアプローチに関連するモデルの詳細な分析を行い、アプローチの有効性と有利性を強調します。さらに,本手法で推定したオブジェクト分割タスクの解読と移動におけるフローの適用性について述べる。 In most of computer vision applications, motion blur is regarded as an undesirable artifact. However, it has been shown that motion blur in an image may have practical interests in fundamental computer vision problems. In this work, we propose a novel framework to estimate optical flow from a single motion-blurred image in an end-to-end manner. We design our network with transformer networks to learn globally and locally varying motions from encoded features of a motion-blurred input, and decode left and right frame features without explicit frame supervision. A flow estimator network is then used to estimate optical flow from the decoded features in a coarse-to-fine manner. We qualitatively and quantitatively evaluate our model through a large set of experiments on synthetic and real motion-blur datasets. We also provide in-depth analysis of our model in connection with related approaches to highlight the effectiveness and favorability of our approach. Furthermore, we showcase the applicability of the flow estimated by our method on deblurring and moving object segmentation tasks.	翻訳日:2021-03-07 16:22:02 公開日:2021-03-04
# (参考訳) 時間的行動定位のためのマルチラベル行動依存のモデル化 Modeling Multi-Label Action Dependencies for Temporal Action Localization ( http://arxiv.org/abs/2103.03027v1 ) ライセンス: CC BY 4.0	Praveen Tirupattur, Kevin Duarte, Yogesh Rawat, Mubarak Shah	(参考訳) 実世界のビデオには、アクションクラス間に固有の関係を持つ多くの複雑なアクションが含まれている。本研究では,未編集(untrimmed)ビデオにおける時間的行動ローカライゼーションの課題に対して,これらの行動関係をモデル化するアテンションベースのアーキテクチャを提案する。アクションのビデオレベルの共起を利用する従来の研究とは対照的に、我々は同じタイムステップで発生するアクション間の関係と、異なるタイムステップで発生するアクション(すなわち、互いに先行または後続するアクション)間の関係とを区別する。これらの異なる関係をアクション依存性と定義する。本稿では,これらのアクション依存性を,新しいアテンションベースのマルチラベルアクション依存性(MLAD)層でモデル化することで,行動ローカライゼーションの性能を向上させることを提案する。MLAD層は、共起アクション依存性をモデル化する共起依存性ブランチと、時間的アクション依存性をモデル化する時間依存性ブランチの2つのブランチで構成されている。我々は,マルチラベル分類に使用される既存のメトリクスはアクション依存性がどの程度うまくモデル化されているかを明示的に測定しないことに着目し,アクションクラス間の共起依存性と時間依存性の両方を考慮した新しいメトリクスを提案する。実験的な評価と広範な分析により,f-mAPと提案した指標の両方において,マルチラベル行動ローカライゼーションベンチマーク(MultiTHUMOSとCharades)で最先端手法を上回る性能を示す。 Real-world videos contain many complex actions with inherent relationships between action classes. In this work, we propose an attention-based architecture that models these action relationships for the task of temporal action localization in untrimmed videos. As opposed to previous works that leverage video-level co-occurrence of actions, we distinguish the relationships between actions that occur at the same time-step and actions that occur at different time-steps (i.e. those which precede or follow each other). We define these distinct relationships as action dependencies. We propose to improve action localization performance by modeling these action dependencies in a novel attention-based Multi-Label Action Dependency (MLAD) layer. The MLAD layer consists of two branches: a Co-occurrence Dependency Branch and a Temporal Dependency Branch to model co-occurrence action dependencies and temporal action dependencies, respectively. 
We observe that existing metrics used for multi-label classification do not explicitly measure how well action dependencies are modeled; therefore, we propose novel metrics that consider both co-occurrence and temporal dependencies between action classes. Through empirical evaluation and extensive analysis, we show improved performance over state-of-the-art methods on multi-label action localization benchmarks (MultiTHUMOS and Charades) in terms of f-mAP and our proposed metric.	翻訳日:2021-03-07 16:06:43 公開日:2021-03-04
# (参考訳) SSTN:自律運転のための自己監督型ドメイン適応熱物体検出 SSTN: Self-Supervised Domain Adaptation Thermal Object Detection for Autonomous Driving ( http://arxiv.org/abs/2103.03150v1 ) ライセンス: CC BY 4.0	Farzeen Munir, Shoaib Azam and Moongu Jeon	(参考訳) 環境の感受性と感度は、自動運転車の安全かつ確実な運用において決定的な役割を果たす。周囲のこの知覚は、人間の視覚表現に似ています。人間の脳は、異なる感覚チャネルを利用して環境を知覚し、ビュー不変表現モデルを開発する。この文脈において、異なる外部受容センサーが環境を知覚するために自動運転車に展開される。最も一般的な感知センサは、自動運転車の知覚のためのカメラ、ライダー、レーダーです。これらのセンサーは可視スペクトル領域では有用性を示してきたが、夜間などの悪条件下では運用能力が限られており、致命的な事故を引き起こす可能性がある。本研究では, 自己監督型コントラスト学習手法を用いて, ビュー不変モデル表現をモデル化する熱物体検出手法を検討する。そこで本研究では,可視領域と赤外領域の情報をコントラスト学習により最大化するための特徴埋め込みを学習する深層ニューラルネットワーク Self Supervised Thermal Network (SSTN) を提案し,その特徴表現をマルチスケールエンコーダ・デコーダトランスフォーマネットワークを用いた熱物体検出に適用した。提案手法は、FLIR-ADASデータセットとKAISTマルチスペクトラルデータセットの2つの公開データセットで広く評価されている。実験結果は,提案手法の有効性を示す。 The sensibility and sensitivity of the environment play a decisive role in the safe and secure operation of autonomous vehicles. This perception of the surroundings is similar to human visual representation. The human brain perceives the environment by utilizing different sensory channels and develops a view-invariant representation model. In this context, different exteroceptive sensors are deployed on the autonomous vehicle for perceiving the environment. The most common exteroceptive sensors for autonomous vehicle perception are camera, lidar, and radar. Although these sensors have demonstrated their benefit in the visible spectrum domain, in adverse conditions, for instance at night, they have limited operational capability, which may lead to fatal accidents. In this work, we explore thermal object detection to model a view-invariant model representation by employing the self-supervised contrastive learning approach. 
For this purpose, we propose a deep neural network, the Self Supervised Thermal Network (SSTN), that learns a feature embedding to maximize the information between the visible and infrared spectrum domains by contrastive learning, and later employs these learned feature representations for thermal object detection using a multi-scale encoder-decoder transformer network. The proposed method is extensively evaluated on two publicly available datasets: the FLIR-ADAS dataset and the KAIST Multi-Spectral dataset. The experimental results illustrate the efficacy of the proposed method.	翻訳日:2021-03-07 15:44:55 公開日:2021-03-04
# (参考訳) 拡張畳み込みを用いた注意型ニューラルネットワークによる映像からの3次元人物姿勢推定 Enhanced 3D Human Pose Estimation from Videos by using Attention-Based Neural Network with Dilated Convolutions ( http://arxiv.org/abs/2103.03170v1 ) ライセンス: CC BY 4.0	Ruixu Liu, Ju Shen, He Wang, Chen Chen, Sen-ching Cheung, Vijayan K. Asari	(参考訳) 注意メカニズムは、暗黙的な時間整合性を高めた空間モデル学習のための連続予測フレームワークを提供する。本研究では,従来のネットワークなどの制約をアテンションフレームワークに組み込む手法として,ポーズ推定タスクの長距離依存性を学習するための体系的設計(2次元から3次元まで)を提案する。本論文は,任意の映像シーケンスの柔軟性とスケーラビリティを入力として,エンド・ツー・エンドのポーズ推定のためのアテンションベースモデルの設計と訓練のための体系的なアプローチを提案する。拡張畳み込みのマルチスケール構造により,時間受容場を適応させることにより,これを実現する。さらに,提案アーキテクチャは,リアルタイム性能を実現する因果モデルに容易に適応できる。既製の2Dポーズ推定システム(例えばMocapライブラリ)は、アドホックな方法で容易に統合できます。提案手法は,Human3.6Mデータセット上での関節位置誤差の平均を33.4mmに減らし,最先端性能を達成し,既存の手法よりも優れる。 The attention mechanism provides a sequential prediction framework for learning spatial models with enhanced implicit temporal consistency. In this work, we show a systematic design (from 2D to 3D) for how conventional networks and other forms of constraints can be incorporated into the attention framework for learning long-range dependencies for the task of pose estimation. The contribution of this paper is to provide a systematic approach for designing and training attention-based models for end-to-end pose estimation, with the flexibility and scalability of arbitrary video sequences as input. We achieve this by adapting the temporal receptive field via a multi-scale structure of dilated convolutions. Besides, the proposed architecture can be easily adapted to a causal model enabling real-time performance. Any off-the-shelf 2D pose estimation system, e.g., Mocap libraries, can be easily integrated in an ad-hoc fashion. Our method achieves state-of-the-art performance and outperforms existing methods by reducing the mean per joint position error to 33.4 mm on the Human3.6M dataset.	翻訳日:2021-03-07 15:28:48 公開日:2021-03-04
# (参考訳) 深層Quantum Measurement Ordinal Regressionを用いた前立腺組織グレーディング Prostate Tissue Grading with Deep Quantum Measurement Ordinal Regression ( http://arxiv.org/abs/2103.03188v1 ) ライセンス: CC BY 4.0	Santiago Toledo-Cortés, Diego H. Useche, and Fabio A. González	(参考訳) 前立腺癌(PCa)は世界中で最も一般的で攻撃的ながんの1つである。 Gleasonスコア(GS)システムは、前立腺癌を分類する標準的な方法であり、続く重症度と治療を決定する最も信頼性の高い方法です。病理学者は前立腺の癌細胞の配列を調べ、6から10の範囲のスケールでスコアを割り当てます。前立腺全スライド画像(WSI)の自動分析は、通常、GSによって与えられた段階間の細かい区別を欠くバイナリ分類問題として扱われます。本稿では,前立腺WSIからGSを推定できる確率論的深層学習順序分類法を提案する。微分可能な確率モデルを用いた順序回帰タスクとしてこの問題にアプローチすることで、結果の解釈性が向上するだけでなく、従来の深層分類や回帰アーキテクチャと比較してモデルの精度が向上する。 Prostate cancer (PCa) is one of the most common and aggressive cancers worldwide. The Gleason score (GS) system is the standard way of classifying prostate cancer and the most reliable method to determine the severity and treatment to follow. The pathologist looks at the arrangement of cancer cells in the prostate and assigns a score on a scale that ranges from 6 to 10. Automatic analysis of prostate whole-slide images (WSIs) is usually addressed as a binary classification problem, which misses the finer distinction between stages given by the GS. This paper presents a probabilistic deep learning ordinal classification method that can estimate the GS from a prostate WSI. Approaching the problem as an ordinal regression task using a differentiable probabilistic model not only improves the interpretability of the results, but also improves the accuracy of the model when compared to conventional deep classification and regression architectures.	翻訳日:2021-03-07 15:01:12 公開日:2021-03-04
# (参考訳) 自然言語処理のための完全量子化BERTのハードウェア高速化 Hardware Acceleration of Fully Quantized BERT for Efficient Natural Language Processing ( http://arxiv.org/abs/2103.02800v1 ) ライセンス: CC BY 4.0	Zejian Liu, Gang Li and Jian Cheng	(参考訳) BERTは、さまざまなNLPタスクで最先端のパフォーマンスを実現する最新のトランスフォーマーベースのモデルです。本稿では,エッジコンピューティングのためのFPGA上でのBERTのハードウェアアクセラレーションについて検討する。計算量とメモリフットプリントの問題に対処するために、重み、アクティベーション、ソフトマックス、層正規化、および全ての中間結果を含むBERT(FQ-BERT)の完全定量化を提案する。実験の結果、FQ-BERTは7.94倍の圧縮を達成でき、性能損失は無視できることがわかった。次に、FQ-BERTに適したアクセラレータを提案し、Xilinx ZCU102 と ZCU111 FPGA上で評価する。それぞれIntel(R) Core(TM) i7-8700 CPUとNVIDIA K80 GPUより28.91xと12.72xの3.18fps/Wの性能を実現することができる。 BERT is the most recent Transformer-based model that achieves state-of-the-art performance in various NLP tasks. In this paper, we investigate the hardware acceleration of BERT on FPGA for edge computing. To tackle the issue of huge computational complexity and memory footprint, we propose to fully quantize the BERT (FQ-BERT), including weights, activations, softmax, layer normalization, and all the intermediate results. Experiments demonstrate that the FQ-BERT can achieve 7.94x compression for weights with negligible performance loss. We then propose an accelerator tailored for the FQ-BERT and evaluate on Xilinx ZCU102 and ZCU111 FPGA. It can achieve a performance-per-watt of 3.18 fps/W, which is 28.91x and 12.72x over Intel(R) Core(TM) i7-8700 CPU and NVIDIA K80 GPU, respectively.	翻訳日:2021-03-07 14:51:56 公開日:2021-03-04
# (参考訳) 英国における最初のCOVID-19ロックダウン後の国会議員のTwitterエンゲージメントと虐待：ホワイトペーパー MP Twitter Engagement and Abuse Post-first COVID-19 Lockdown in the UK: White Paper ( http://arxiv.org/abs/2103.02917v1 ) ライセンス: CC BY 4.0	Tracie Farrell, Mehmet Bakir, Kalina Bontcheva	(参考訳) 英国は数年前から不安定な政治環境をとっており、ブレグジットとリーダーシップの危機は過去5年間をマークしています。この研究では、世界の保健緊急事態であるCOVID-19が、英国政治家が公衆と関わるときに受ける虐待の量、種類、またはトピックにどのように影響するかについてもっと理解したいと考えました。この研究は、2020年6月から12月までの期間をカバーし、英国の議員への返信におけるTwitter上の虐待を分析します。この研究は、英国の新型コロナウイルスパンデミックの最初の4ヶ月間のオンライン虐待の分析によるフォローアップです。この論文は、この新しい7ヶ月間の全体的な虐待レベルを調べ、さまざまな政党や英国政府のメンバーへの反応を分析し、オンラインの虐待とブレグジット、政府のCOVID-19対応と政策、社会問題などのトピックとの関係を分析します。また,同時期の国会議員への虐待的回答に載った陰謀論の存在についても検討した。英国議会議員に対する虐待レベルは、2020年12月に過去最高(全返信ツイートの5.4%)に達したことが判明した。これは、総選挙前の2ヶ月間よりもほぼ1%高いです。パンデミックの最初の4ヶ月間に見られた傾向とは異なり、COVID-19の危機が深刻化し、欧州連合(EU)とのブレグジット交渉が完了に近づく中、2020年7月以降はトーリー党の国会議員が最も高い割合の虐待的な回答を受け取り、その割合は2020年9月以降5%を超えたままである。 The UK has had a volatile political environment for some years now, with Brexit and leadership crises marking the past five years. With this work, we wanted to understand more about how the global health emergency, COVID-19, influences the amount, type or topics of abuse that UK politicians receive when engaging with the public. This work covers the period of June to December 2020 and analyses Twitter abuse in replies to UK MPs. This work is a follow-up from our analysis of online abuse during the first four months of the COVID-19 pandemic in the UK. 
The paper examines overall abuse levels during this new seven-month period, analyses reactions to members of different political parties and the UK government, and the relationship between online abuse and topics such as Brexit, the government's COVID-19 response and policies, and social issues. In addition, we have also examined the presence of conspiracy theories posted in abusive replies to MPs during the period. We have found that abuse levels toward UK MPs were at an all-time high in December 2020 (5.4% of all reply tweets sent to MPs). This is almost 1% higher than in the two months preceding the General Election. In a departure from the trend seen in the first four months of the pandemic, MPs from the Tory party received the highest percentage of abusive replies from July 2020 onward; this percentage stayed above 5% from September 2020 onward, as the COVID-19 crisis deepened and the Brexit negotiations with the EU neared completion.	翻訳日:2021-03-07 14:42:12 公開日:2021-03-04
# (参考訳) エビデンス支援による予測の学習:臨床リスク予測への応用 Learning to Predict with Supporting Evidence: Applications to Clinical Risk Prediction ( http://arxiv.org/abs/2103.02768v1 ) ライセンス: CC BY 4.0	Aniruddh Raghu, John Guttag, Katherine Young, Eugene Pomerantsev, Adrian V. Dalca, Collin M. Stultz	(参考訳) 機械学習モデルがヘルスケアに与える影響は、医療専門家がこれらのモデルによって予測される信頼度に依存する。本論文では,予測が信頼されるべき理由に関するドメイン関連証拠を臨床専門家に提示する手法を提案する。まず,有意な潜在概念を予測対象や観測データに関連付ける確率モデルを設計する。このモデルにおける潜在変数の推論は、予測の作成と、その予測の裏付けとなる証拠の提供の両方に対応する。 i) 変動学習を用いたモデルパラメータの推定, (ii) 確率モデルから派生した目的を訓練したニューラルネットワークを用いたモデルにおける潜時変数の推定を最大に近似する。本研究は,循環器疾患患者の死亡リスクを予測するための課題である。特に,心電図と表データ入力を用いて,本手法が正確な予測のための適切な領域関連証拠を提供することを示す。 The impact of machine learning models on healthcare will depend on the degree of trust that healthcare professionals place in the predictions made by these models. In this paper, we present a method to provide people with clinical expertise with domain-relevant evidence about why a prediction should be trusted. We first design a probabilistic model that relates meaningful latent concepts to prediction targets and observed data. Inference of latent variables in this model corresponds to both making a prediction and providing supporting evidence for that prediction. We present a two-step process to efficiently approximate inference: (i) estimating model parameters using variational learning, and (ii) approximating maximum a posteriori estimation of latent variables in the model using a neural network, trained with an objective derived from the probabilistic model. We demonstrate the method on the task of predicting mortality risk for patients with cardiovascular disease. Specifically, using electrocardiogram and tabular data as input, we show that our approach provides appropriate domain-relevant supporting evidence for accurate predictions.	翻訳日:2021-03-07 12:33:37 公開日:2021-03-04
# (参考訳) トポロジカル機能による3Dポイントクラウドの次のベストビューの学習 Learning the Next Best View for 3D Point Clouds via Topological Features ( http://arxiv.org/abs/2103.02789v1 ) ライセンス: CC BY 4.0	Christopher Collander, William J. Beksi, Manfred Huber	(参考訳) 本稿では,新しいトポロジーに基づく情報ゲインメトリックを用いて,ノイズの多い3dセンサの次なる最善の視点を指示する強化学習手法を提案する。この指標は観察された表面の不連続な部分を結合し、穴や凹面部分のような高精細な特徴に焦点を合わせます。実験の結果,本手法は,ストリーミングポイントクラウドデータが提供する情報を最適化するためにロボットセンサの配置を確立するのに役立つことがわかった。さらに、3Dオブジェクトのラベル付きデータセット、カスタムロボットマニピュレータ用のCAD設計、およびポイントクラウドの変換、結合、および登録のためのソフトウェアが研究コミュニティに公開されました。 In this paper, we introduce a reinforcement learning approach utilizing a novel topology-based information gain metric for directing the next best view of a noisy 3D sensor. The metric combines the disjoint sections of an observed surface to focus on high-detail features such as holes and concave sections. Experimental results show that our approach can aid in establishing the placement of a robotic sensor to optimize the information provided by its streaming point cloud data. Furthermore, a labeled dataset of 3D objects, a CAD design for a custom robotic manipulator, and software for the transformation, union, and registration of point clouds have been publicly released to the research community.	翻訳日:2021-03-07 12:06:54 公開日:2021-03-04
# (参考訳) IACN:Recommendationのための影響認識と注意ベースの共進化ネットワーク IACN: Influence-aware and Attention-based Co-evolutionary Network for Recommendation ( http://arxiv.org/abs/2103.02866v1 ) ライセンス: CC0 1.0	Shalini Pandey, George Karypis and Jaideep Srivasatava	(参考訳) RedditやTwitterなどのオンラインコミュニティでは,関連する項目をユーザに推奨することが重要な課題だ。レコメンデーションシステムのために,表現学習は,ユーザの振る舞いを表現するために埋め込みを学習し,アイテムのプロパティをキャプチャする強力なテクニックを提供する。しかし、オンラインコミュニティへの埋め込みの学習は、ユーザーの興味が進化し続けるため、難しい課題である。この進化は,1) ユーザと項目間のインタラクション,2) コミュニティ内の他のユーザの影響から捉えることができる。既存の動的埋め込みモデルは、ユーザーの埋め込みを更新する要因のいずれかのみを考慮します。しかし、ある時点では、2つの要素の組み合わせによってユーザーの興味が進化する。そこで我々は,影響認識と注意に基づく共進化ネットワーク (IACN) を提案する。本質的にIACNは、相互作用モデリングと影響モデリングの2つの重要なコンポーネントから構成される。インタラクションモデリングレイヤは、ユーザがアイテムと対話する際に、ユーザとアイテムの埋め込みを更新する責務を負う。影響モデリング層は、他のユーザの相互作用によって引き起こされる時間的興奮をキャプチャする。これら2つの層から得られる信号を統合するために,インタラクションベースと影響ベースの埋め込みを効果的に組み合わせ,最終的なユーザ埋め込みを予測する新しい融合層を設計する。私たちのモデルは、さまざまなドメインの既存の最新モデルよりも優れています。 Recommending relevant items to users is a crucial task in online communities such as Reddit and Twitter. For recommendation systems, representation learning presents a powerful technique that learns embeddings to represent user behaviors and capture item properties. However, learning embeddings on online communities is a challenging task because user interests keep evolving. This evolution can be captured from 1) interactions between users and items and 2) influence from other users in the community. The existing dynamic embedding models consider only one of these factors to update user embeddings. However, at a given time, user interest evolves due to a combination of the two factors. To this end, we propose Influence-aware and Attention-based Co-evolutionary Network (IACN). Essentially, IACN consists of two key components: an interaction modeling layer and an influence modeling layer. The interaction modeling layer is responsible for updating the embedding of a user and an item when the user interacts with the item. 
The influence modeling layer captures the temporal excitation caused by interactions of other users. To integrate the signals obtained from the two layers, we design a novel fusion layer that effectively combines interaction-based and influence-based embeddings to predict the final user embedding. Our model outperforms the existing state-of-the-art models from various domains.	翻訳日:2021-03-07 11:52:58 公開日:2021-03-04
# (参考訳) パンシャープニング成功のためのモデルベースの画像調整 Model-based image adjustment for a successful pansharpening ( http://arxiv.org/abs/2103.03062v1 ) ライセンス: CC BY 4.0	Gintautas Palubinskas	(参考訳) マルチレゾリューション画像融合やパンシャープニングの強化のための新しいモデルベース画像調整法を提案する。このような画像調整は、パンクロマティックバンドおよび/または強度画像(マルチスペクトルバンドの重み付き合計として計算される)を入力として使用するほとんどのパンシャープニング方法に必要です。様々な理由(例えばキャリブレーションの不正確さや異なるセンサーの使用)により、パンシャープニングの入力画像、すなわち低分解能マルチスペクトル画像(より正確には計算された強度画像)と高分解能パンクロマティック画像は、物理量の値(処理レベルに応じて放射輝度または反射率)が異なる可能性がある。しかし、両方の画像の同じオブジェクト/クラスは、類似の値またはより一般的に類似の統計を示すべきである。類似性の定義は特定のアプリケーションに依存する。 2つのセンサーからのデータをうまく融合させるには、両方のセンサーの放射輝度/反射率のエネルギーバランスを保つ必要がある。異なるセンサにおける全エネルギー不均衡を補償するために仮想バンドが導入された。まず、個々のスペクトル帯域の重みを低分解能スケールで推定し、マルチスペクトル画像とパンクロマティック画像の両方(低パスフィルタバージョン)が利用可能であり、推定仮想バンドを高スケールにアップサンプリングし、最終的に、高分解能パンクロマティックバンドを仮想バンドを減算して補正する。後続のパンシャープニングでは、元のパンクロマティック画像の代わりにこの補正されたパンクロマティックバンドが使用されます。例えば、コンポーネント置換ベースのメソッドのパフォーマンス品質が大幅に向上できることが示されている。 A new model-based image adjustment for the enhancement of multi-resolution image fusion or pansharpening is proposed. Such image adjustment is needed for most pansharpening methods using the panchromatic band and/or intensity image (calculated as a weighted sum of multispectral bands) as an input. Due to various reasons, e.g., calibration inaccuracies or the use of different sensors, the input images for pansharpening (the low-resolution multispectral image or, more precisely, the calculated intensity image, and the high-resolution panchromatic image) may differ in the values of their physical properties, e.g., radiances or reflectances, depending on the processing level. But the same objects/classes in both images should exhibit similar values or more generally similar statistics. The definition of similarity depends on the particular application. For a successful fusion of data from two sensors the energy balance between radiances/reflectances of both sensors should hold. A virtual band is introduced to compensate for the total energy imbalance between different sensors. 
Its estimation consists of several steps: first, weights for individual spectral bands are estimated at the low-resolution scale, where both the multispectral and panchromatic (low-pass filtered) images are available; then the estimated virtual band is up-sampled to the high-resolution scale; finally, the high-resolution panchromatic band is corrected by subtracting the virtual band. This corrected panchromatic band is used instead of the original panchromatic image in the subsequent pansharpening. It is shown, for example, that the performance of component-substitution-based methods can be increased significantly.	翻訳日:2021-03-07 11:27:09 公開日:2021-03-04
# (参考訳) DONeRF:depth Oracle Networksを用いたニューラルネットワークのリアルタイムレンダリングに向けて DONeRF: Towards Real-Time Rendering of Neural Radiance Fields using Depth Oracle Networks ( http://arxiv.org/abs/2103.03231v1 ) ライセンス: CC BY 4.0	Thomas Neff, Pascal Stadlbauer, Mathias Parger, Andreas Kurz, Chakravarty R. Alla Chaitanya, Anton Kaplanyan, Markus Steinberger	(参考訳) 最近のNeural Radiance Fields(NeRF)に関する研究爆発は、新しいビュー生成のためにニューラルネットワークにシーンや照明情報を暗黙的に保存する巨大な可能性を示している。しかし、NeRFの普及を妨げる大きな制限の1つは、各ビューレイに沿って過度のネットワーク評価を禁止する計算コストであり、現在のデバイスでリアルタイムレンダリングを目指すには数十のPetaFLOPSを必要とします。現地のサンプルをシーンの表面に配置すると、各ビューレイに必要なサンプル数が大幅に削減できることを示します。この目的のために,1つのネットワーク評価で各ビューレイのレイサンプル位置を予測できる深度oracleネットワークを提案する。対数的に離散化され, 球面に歪んだ深度値を含む分類網を用いることは, 深度を直接推定するのではなく, 表面位置を符号化する上で重要であることを示す。これらの手法を組み合わせることで、深度オラクルネットワークを第1ステップとする二重ネットワーク設計のDONeRFと、局所的な試料シェーディングネットワークによる光線蓄積が実現される。当社の設計により、NeRFと比較して最大48倍の推論コストを削減します。市販の推論apiと単純な計算カーネルを組み合わせることで、レイマーチングベースのニューラルネットワーク表現を1つのgpu上でインタラクティブなフレームレート(毎秒15フレーム、800x800)でレンダリングした最初の例です。同時に、我々は表面周辺のシーンの重要部分に焦点を当てるため、NeRFと同等または良質な品質が得られる。 The recent research explosion around Neural Radiance Fields (NeRFs) shows that there is immense potential for implicitly storing scene and lighting information in neural networks, e.g., for novel view generation. However, one major limitation preventing the widespread use of NeRFs is the prohibitive computational cost of excessive network evaluations along each view ray, requiring dozens of petaFLOPS when aiming for real-time rendering on current devices. We show that the number of samples required for each view ray can be significantly reduced when local samples are placed around surfaces in the scene. To this end, we propose a depth oracle network, which predicts ray sample locations for each view ray with a single network evaluation. We show that using a classification network around logarithmically discretized and spherically warped depth values is essential to encode surface locations rather than directly estimating depth. 
The combination of these techniques leads to DONeRF, a dual network design with a depth oracle network as a first step and a locally sampled shading network for ray accumulation. With our design, we reduce the inference costs by up to 48x compared to NeRF. Using an off-the-shelf inference API in combination with simple compute kernels, we are the first to render raymarching-based neural representations at interactive frame rates (15 frames per second at 800x800) on a single GPU. At the same time, since we focus on the important parts of the scene around surfaces, we achieve equal or better quality compared to NeRF.	翻訳日:2021-03-07 11:02:37 公開日:2021-03-04
# (参考訳) グラフニューラルネットワークの知識を抽出し、その先へ進む:効果的な知識蒸留フレームワーク Extract the Knowledge of Graph Neural Networks and Go Beyond it: An Effective Knowledge Distillation Framework ( http://arxiv.org/abs/2103.02885v1 ) ライセンス: CC BY 4.0	Cheng Yang, Jiawei Liu and Chuan Shi	(参考訳) グラフ上の半教師付き学習は、機械学習の分野で重要な問題です。近年,グラフニューラルネットワーク(GNN)に基づく最先端の分類手法が,ラベル伝搬などの従来の手法よりも優れていることが示されている。しかし、これらのニューラルモデルの洗練されたアーキテクチャは複雑な予測メカニズムにつながり、例えば構造的に関連付けられたノードは同じクラスを持つ傾向があるため、データにある貴重な事前知識をフルに利用できない。本稿では,上記の課題を解決するための知識蒸留に基づく枠組みを提案する。本フレームワークは、任意の学習gnnモデル(教師モデル)の知識を抽出し、よく設計された学生モデルに注入する。学生モデルは2つの単純な予測機構、すなわちラベル伝搬と特徴変換で構築され、それぞれ構造に基づく事前知識と特徴に基づく事前知識を自然に保存する。具体的には,パラメータ化ラベル伝搬と特徴変換モジュールの訓練可能な組み合わせとして,学生モデルを設計する。その結果、学習した学生は、gnn教師の事前知識と知識の両方から、より効果的な予測を得ることができる。さらに,学習者モデルはgnnよりも解釈可能な予測プロセスを有する。我々は5つの公開ベンチマークデータセットの実験を行い、教師モデルとしてGCN, GAT, APPNP, SAGE, SGC, GCNII, GLPを含む7つのGNNモデルを用いる。実験結果から,学習者モデルは平均1.4%～4.7%の教師モデルより一貫して優れていた。コードとデータはhttps://github.com/BUPT-GAMMA/CPFで入手できます。 Semi-supervised learning on graphs is an important problem in the machine learning area. In recent years, state-of-the-art classification methods based on graph neural networks (GNNs) have shown their superiority over traditional ones such as label propagation. However, the sophisticated architectures of these neural models will lead to a complex prediction mechanism, which could not make full use of valuable prior knowledge lying in the data, e.g., structurally correlated nodes tend to have the same class. In this paper, we propose a framework based on knowledge distillation to address the above issues. Our framework extracts the knowledge of an arbitrary learned GNN model (teacher model), and injects it into a well-designed student model. The student model is built with two simple prediction mechanisms, i.e., label propagation and feature transformation, which naturally preserves structure-based and feature-based prior knowledge, respectively. 
Specifically, we design the student model as a trainable combination of parameterized label propagation and feature transformation modules. As a result, the learned student can benefit from both prior knowledge and the knowledge in GNN teachers for more effective predictions. Moreover, the learned student model has a more interpretable prediction process than GNNs. We conduct experiments on five public benchmark datasets and employ seven GNN models including GCN, GAT, APPNP, SAGE, SGC, GCNII and GLP as the teacher models. Experimental results show that the learned student model can consistently outperform its corresponding teacher model by 1.4% - 4.7% on average. Code and data are available at https://github.com/BUPT-GAMMA/CPF	翻訳日:2021-03-07 10:23:20 公開日:2021-03-04
# (参考訳) 胸部X線分類のための自己教師あり深部畳み込みニューラルネットワーク Self-supervised deep convolutional neural network for chest X-ray classification ( http://arxiv.org/abs/2103.03055v1 ) ライセンス: CC BY 4.0	Matej Gazda, Jakub Gazda, Jan Plavka, Peter Drotar	(参考訳) 胸部X線撮影は、診断決定を行うための重要な情報を伝える比較的安価で広く利用可能な医療手順です。胸部x線は肺炎や最近のcovid-19などの呼吸器疾患の診断によく用いられる。本論文では,ラベルのない胸部X線データセット上に予め訓練された自己監視型ディープニューラルネットワークを提案する。学習された表現は、呼吸器疾患の分類である下流タスクに転送される。 4つの公開データセットで得られた結果は、私たちのアプローチが大量のラベル付きトレーニングデータを必要とせずに競争結果をもたらすことを示しています。 Chest radiography is a relatively cheap, widely available medical procedure that conveys key information for making diagnostic decisions. Chest X-rays are almost always used in the diagnosis of respiratory diseases such as pneumonia or the recent COVID-19. In this paper, we propose a self-supervised deep neural network that is pretrained on an unlabeled chest X-ray dataset. The learned representations are transferred to a downstream task: the classification of respiratory diseases. The results obtained on four public datasets show that our approach yields competitive results without requiring large amounts of labeled training data.	翻訳日:2021-03-07 10:07:13 公開日:2021-03-04
# (参考訳) 信号対称フィードバックアライメントを有するエッジデバイス上での効率的な学習畳み込みニューラルネットワーク Efficient Training Convolutional Neural Networks on Edge Devices with Gradient-pruned Sign-symmetric Feedback Alignment ( http://arxiv.org/abs/2103.02889v1 ) ライセンス: CC BY 4.0	Ziyang Hong and C. Patrick Yue	(参考訳) モバイルデバイスの繁栄に伴い、分散データによるモデルトレーニングを可能にする分散学習アプローチが広く研究されている。しかし,エッジデバイスにおける学習能力の欠如は,実生活における分散学習のエネルギー効率を著しく低下させる。本稿では,従来のバックプロパゲーションの冗長性と重み非対称性ポテンシャルを利用したDNNのトレーニング手法について述べる。提案手法は, 分類精度の損失が無視できるほど, エネルギー効率の面では, 先行技術よりも5倍高いことを実証する。 With the prosperity of mobile devices, the distributed learning approach enabling model training with decentralized data has attracted wide research. However, the lack of training capability for edge devices significantly limits the energy efficiency of distributed learning in real life. This paper describes a novel approach of training DNNs exploiting the redundancy and the weight asymmetry potential of conventional backpropagation. We demonstrate that with negligible classification accuracy loss, the proposed approach outperforms the prior arts by 5x in terms of energy efficiency.	翻訳日:2021-03-07 09:46:56 公開日:2021-03-04
# (参考訳) パンデミック・ドラッグのパンデミック・スピード--高性能コンピュータ上でのハイブリッド機械学習と物理シミュレーションによるcovid-19創薬の加速 Pandemic Drugs at Pandemic Speed: Accelerating COVID-19 Drug Discovery with Hybrid Machine Learning- and Physics-based Simulations on High Performance Computers ( http://arxiv.org/abs/2103.02843v1 ) ライセンス: CC BY 4.0	Agastya P. Bhati, Shunzhou Wan, Dario Alfè, Austin R. Clyde, Mathis Bode, Li Tan, Mikhail Titov, Andre Merzky, Matteo Turilli, Shantenu Jha, Roger R. Highfield, Walter Rocchia, Nicola Scafuri, Sauro Succi, Dieter Kranzlmüller, Gerald Mathias, David Wifling, Yann Donon, Alberto Di Meglio, Sofia Vallecorsa, Heng Ma, Anda Trifan, Arvind Ramanathan, Tom Brettin, Alexander Partin, Fangfang Xia, Xiaotan Duan, Rick Stevens, Peter V. Coveney	(参考訳) 世界的パンデミックの課題を満たすための競争は、既存の医薬品発見プロセスが高価で非効率で遅いことを思い出させるのに役立った。抗ウイルス薬開発のためのリード化合物を絞り込むために、膨大な数の候補低分子をスクリーニングする工程に大きなボトルネックがあります。薬物発見を加速する新たな機会は、線形加速器用に開発された機械学習手法と物理に基づく手法のインターフェースにある。 2つのin silico手法は、それぞれ独自の利点と制限があり、興味深いことに互いに補完する。本稿では、薬物発見を加速する両アプローチを組み合わせた革新的な方法を提案する。結果として生じるワークフローのスケールは、ハイパフォーマンスコンピューティングに依存している。我々は、このワークフローを4つのcovid-19ターゲットタンパク質に適用できることと、様々なスーパーコンピュータ上でリード化合物を識別するために必要な大規模計算を行う能力を示した。 The race to meet the challenges of the global pandemic has served as a reminder that the existing drug discovery process is expensive, inefficient and slow. There is a major bottleneck in screening the vast number of potential small molecules to shortlist lead compounds for antiviral drug development. New opportunities to accelerate drug discovery lie at the interface between machine learning methods, in this case developed for linear accelerators, and physics-based methods. The two in silico methods each have their own advantages and limitations, which, interestingly, complement each other. Here, we present an innovative method that combines both approaches to accelerate drug discovery. The scale of the resulting workflow is such that it is dependent on high performance computing. 
We have demonstrated the applicability of this workflow on four COVID-19 target proteins and our ability to perform the required large-scale calculations to identify lead compounds on a variety of supercomputers.	翻訳日:2021-03-07 08:25:43 公開日:2021-03-04
# (参考訳) 雑音ラベル学習のための強化戦略 Augmentation Strategies for Learning with Noisy Labels ( http://arxiv.org/abs/2103.02130v2 ) ライセンス: CC BY 4.0	Kento Nishi, Yi Ding, Alex Rich, Tobias Höllerer	(参考訳) 不完全なラベルは、実世界のデータセットに普遍的です。ラベルノイズに強いディープニューラルネットワーク(DNN)を訓練するいくつかの成功した方法は、ウォームアップフェーズ中の損失に基づいてサンプルをフィルタリングして、クリーンなラベル付きサンプルの最初のセットをキュレートし、その後の損失計算のための擬似ラベルとしてネットワークの出力を使用することである。本稿では,「ノイズラベルを用いた学習」問題に取り組むアルゴリズムの強化戦略について検討する。 CIFAR-10 と CIFAR-100 に基づく合成データセットと実世界データセット Clothing1M を用いて,複数の拡張戦略を提案し,検討する。これらのアルゴリズムにいくつかの共通性があるため、損失モデリングタスクに1組の拡張を、学習に別の1組を用いることが最も効果的であり、最先端や他の以前の方法の結果を改善することが判明した。さらに, ウォームアップ期間中に拡張を適用することで, 正しくラベル付けされた試料と不正確にラベル付けされた試料の損失収束挙動に負の影響がみられた。我々は,この拡張戦略を最先端技術に導入し,評価されたすべての騒音レベルにおける性能向上を実証する。特に、CIFAR-10ベンチマークの精度を90%の対称雑音で絶対精度で15%以上向上し、実世界のデータセットであるClothing1Mの性能も向上する。 (※同等の貢献) Imperfect labels are ubiquitous in real-world datasets. Several recent successful methods for training deep neural networks (DNNs) robust to label noise have used two primary techniques: filtering samples based on loss during a warm-up phase to curate an initial set of cleanly labeled samples, and using the output of a network as a pseudo-label for subsequent loss calculations. In this paper, we evaluate different augmentation strategies for algorithms tackling the "learning with noisy labels" problem. We propose and examine multiple augmentation strategies and evaluate them using synthetic datasets based on CIFAR-10 and CIFAR-100, as well as on the real-world dataset Clothing1M. Due to several commonalities in these algorithms, we find that using one set of augmentations for loss modeling tasks and another set for learning is the most effective, improving results on the state-of-the-art and other previous methods. Furthermore, we find that applying augmentation during the warm-up period can negatively impact the loss convergence behavior of correctly versus incorrectly labeled samples. 
We introduce this augmentation strategy to the state-of-the-art technique and demonstrate that we can improve performance across all evaluated noise levels. In particular, we improve accuracy on the CIFAR-10 benchmark at 90% symmetric noise by more than 15% in absolute accuracy and we also improve performance on the real-world dataset Clothing1M. (* equal contribution)	翻訳日:2021-03-07 08:08:26 公開日:2021-03-04
# (参考訳) MotionRNN:時空変動運動を用いたフレキシブルな映像予測モデル MotionRNN: A Flexible Model for Video Prediction with Spacetime-Varying Motions ( http://arxiv.org/abs/2103.02243v2 ) ライセンス: CC BY 4.0	Haixu Wu, Zhiyu Yao, Mingsheng Long, Jianmin Wang	(参考訳) 本稿では,空間と時間の両方で絶え間なく変化する時空変動運動を予測する新たな次元から映像予測に取り組む。以前の方法は、主に時間状態遷移を捕捉するが、運動自体の複雑な時空間変動を見落とし、絶えず変化する動きに適応することは困難である。物理世界の動きは過渡的な変動と動きの傾向に分解できるが、後者は過去の動きの蓄積と見なすことができる。したがって、時空変動運動をより予測可能にする鍵は、過渡変動と運動トレンドを同時に捉えることである。これらの観察に基づいて,モーション内の複雑な変動を捉え,時空変動のシナリオに適応できる motionrnn フレームワークを提案する。 MotionRNNには2つの主な貢献がある。 1つ目は、過渡変動と動きの傾向を統一的にモデル化できるモーションGRUユニットを設計することである。 2つ目は、rnnベースの予測モデルにmotiongruを適用し、変化可能な動きの予測能力を大幅に向上し、積み重ねられた多層予測モデルにおける動き消失を回避する新しいフレキシブルビデオ予測アーキテクチャを示すことである。高い柔軟性により、このフレームワークは決定論的時空間予測のための一連のモデルに適応することができる。当社の MotionRNN は、時空変動運動によるビデオ予測の3つの困難なベンチマークで大幅な改善をもたらすことができます。 This paper tackles video prediction from a new dimension of predicting spacetime-varying motions that are incessantly changing across both space and time. Prior methods mainly capture the temporal state transitions but overlook the complex spatiotemporal variations of the motion itself, making them difficult to adapt to ever-changing motions. We observe that physical world motions can be decomposed into transient variation and motion trend, while the latter can be regarded as the accumulation of previous motions. Thus, simultaneously capturing the transient variation and the motion trend is the key to make spacetime-varying motions more predictable. Based on these observations, we propose the MotionRNN framework, which can capture the complex variations within motions and adapt to spacetime-varying scenarios. MotionRNN has two main contributions. The first is that we design the MotionGRU unit, which can model the transient variation and motion trend in a unified way. 
The second is that we apply the MotionGRU to RNN-based predictive models and present a new flexible video prediction architecture with a Motion Highway that can significantly improve the ability to predict changeable motions and avoid motion vanishing for stacked multiple-layer predictive models. With high flexibility, this framework can adapt to a series of models for deterministic spatiotemporal prediction. Our MotionRNN can yield significant improvements on three challenging benchmarks for video prediction with spacetime-varying motions.	翻訳日:2021-03-07 07:35:04 公開日:2021-03-04
# (参考訳) ID-Unet: ビュー合成のための反復ソフトとハード変形 ID-Unet: Iterative Soft and Hard Deformation for View Synthesis ( http://arxiv.org/abs/2103.02264v2 ) ライセンス: CC BY 4.0	Mingyu Yin, Li Sun, Qingli Li	(参考訳) ビュー合成は通常、オートエンコーダによって行われ、エンコーダはソースビュー画像を潜在コンテンツコードにマッピングし、デコーダはその条件に従ってターゲットビューイメージに変換する。しかし、ソースの内容はよくこの設定に保持されていないため、ビュー翻訳中に不要な変更が発生します。 Unetのようなスキップ接続の追加は問題を緩和するが、ビューの適合性に障害を引き起こすことが多い。本稿では, ソースからターゲットへの変形を反復的に行う新しいアーキテクチャを提案する。エンコーダの複数の層からの機能を単に組み込むのではなく、ソフトおよびハード変形モジュールを設計し、それによってエンコーダの機能を異なる解像度でターゲットビューにワープし、詳細を補うためにデコーダに結果を与える。特に、現在の反り流は、同じ解像度の特徴を調整するだけでなく、高解像度の特徴を粗く変形させる近似としても使用されます。そして、残留流を高分解能で推定して印加することにより、粗粒度から細粒度までの変形が構築される。モデルをよりよく制約するために,中間フローとその歪んだ特徴に基づいて,粗い目標視像を合成する。 2つの異なるデータセットにおける広範なアブレーション研究と最終結果は,提案モデルの有効性を示している。 View synthesis is usually done by an autoencoder, in which the encoder maps a source view image into a latent content code, and the decoder transforms it into a target view image according to the condition. However, the source contents are often not well kept in this setting, which leads to unnecessary changes during the view translation. Although adding skip connections, as in Unet, alleviates the problem, it often causes failures in view conformity. This paper proposes a new architecture that performs the source-to-target deformation in an iterative way. Instead of simply incorporating the features from multiple layers of the encoder, we design soft and hard deformation modules, which warp the encoder features to the target view at different resolutions, and give results to the decoder to complement the details. Particularly, the current warping flow is not only used to align the feature of the same resolution, but also as an approximation to coarsely deform the high resolution feature. Then the residual flow is estimated and applied in the high resolution, so that the deformation is built up in a coarse-to-fine fashion.
To better constrain the model, we synthesize a rough target view image based on the intermediate flows and their warped features. The extensive ablation studies and the final results on two different data sets show the effectiveness of the proposed model.	翻訳日:2021-03-07 06:41:47 公開日:2021-03-04
# (参考訳) H\"older クラスにおけるReLU-Sine-Exponential Activations Break Curse of Dimensionalityを用いたディープニューラルネットワーク Deep Neural Networks with ReLU-Sine-Exponential Activations Break Curse of Dimensionality on H\"older Class ( http://arxiv.org/abs/2103.00542v2 ) ライセンス: CC BY 4.0	Yuling Jiao, Yanming Lai, Xiliang Lu, Zhijian Yang	(参考訳) 本論文では,ReLU,sine,および$2^x$をアクティベーション関数とするニューラルネットワークを構築する。 For general continuous $f$ defined on $[0,1]^d$ with continuity modulus $\omega_f(\cdot)$, we construct ReLU-sine-$2^x$ networks that enjoy an approximation rate $\mathcal{O}(\omega_f(\sqrt{d})\cdot2^{-M}+\omega_{f}\left(\frac{\sqrt{d}}{N}\right))$, where $M,N\in \mathbb{N}^{+}$ denote the hyperparameters related to widths of the networks. As a consequence, we can construct ReLU-sine-$2^x$ network with the depth $5$ and width $\max\left\{\left\lceil2d^{3/2}\left(\frac{3\mu}{\epsilon}\right)^{1/{\alpha}}\right\rceil,2\left\lceil\log_2\frac{3\mu d^{\alpha/2}}{2\epsilon}\right\rceil+2\right\}$ that approximates $f\in \mathcal{H}_{\mu}^{\alpha}([0,1]^d)$ within a given tolerance $\epsilon >0$ measured in $L^p$ norm $p\in[1,\infty)$, where $\mathcal{H}_{\mu}^{\alpha}([0,1]^d)$ denotes the H\"older continuous function class defined on $[0,1]^d$ with order $\alpha \in (0,1]$ and constant $\mu > 0$. したがって、ReLU-sine-$2^x$ネットワークは、$\mathcal{H}_{\mu}^{\alpha}([0,1]^d)$上の次元の呪いを克服する。スーパー表現力に加えて、ReLU-sine-$2^x$ネットワークで実装された関数は(一般化)微分可能であり、SGDを訓練に適用することができる。 In this paper, we construct neural networks with ReLU, sine and $2^x$ as activation functions. For general continuous $f$ defined on $[0,1]^d$ with continuity modulus $\omega_f(\cdot)$, we construct ReLU-sine-$2^x$ networks that enjoy an approximation rate $\mathcal{O}(\omega_f(\sqrt{d})\cdot2^{-M}+\omega_{f}\left(\frac{\sqrt{d}}{N}\right))$, where $M,N\in \mathbb{N}^{+}$ denote the hyperparameters related to widths of the networks.
As a consequence, we can construct a ReLU-sine-$2^x$ network with depth $5$ and width $\max\left\{\left\lceil2d^{3/2}\left(\frac{3\mu}{\epsilon}\right)^{1/{\alpha}}\right\rceil,2\left\lceil\log_2\frac{3\mu d^{\alpha/2}}{2\epsilon}\right\rceil+2\right\}$ that approximates $f\in \mathcal{H}_{\mu}^{\alpha}([0,1]^d)$ within a given tolerance $\epsilon >0$ measured in the $L^p$ norm, $p\in[1,\infty)$, where $\mathcal{H}_{\mu}^{\alpha}([0,1]^d)$ denotes the H\"older continuous function class defined on $[0,1]^d$ with order $\alpha \in (0,1]$ and constant $\mu > 0$. Therefore, the ReLU-sine-$2^x$ networks overcome the curse of dimensionality on $\mathcal{H}_{\mu}^{\alpha}([0,1]^d)$. In addition to their super expressive power, functions implemented by ReLU-sine-$2^x$ networks are (generalized) differentiable, enabling us to apply SGD for training.	翻訳日:2021-03-07 06:12:46 公開日:2021-03-04
# フレーズベースおよびニューラルマシン翻訳の実証的分析 An empirical analysis of phrase-based and neural machine translation ( http://arxiv.org/abs/2103.03108v1 ) ライセンス: Link先を確認	Hamidreza Ghader	(参考訳) 機械翻訳(MT)の2つの一般的なタイプは、フレーズベースとニューラルマシン翻訳システムです。どちらのシステムも複数の複雑なモデルや層で構成されている。これらのモデルとレイヤはそれぞれ、ソース言語の異なる言語的側面を学ぶ。しかし,これらのモデルや層について,どの言語現象が学習されるのか,どのように学習されるのかは明らかになっていない。フレーズベースのMTシステムでは、各モデルでどのような情報が学習されるのかが明確であり、むしろこの情報がどのように学習されるか、特に句の並べ替えモデルについてである。ニューラルマシン翻訳システムでは、その状況はさらに複雑であり、多くの場合、どのような情報が学習され、どのように学習されるかは正確には分かっていない。 MTシステムでは,言語現象がどのように捉えられているかを明らかにするために,フレーズベースとニューラルMTシステムの両方において重要なモデルの挙動を解析する。本研究では, フレーズリオーダリングモデルを用いて, フレーズリオーダリングの動作を定義する上で, フレーズ内のどの単語がもっとも影響が大きいかを検討する。さらに、ニューラルMTシステムの解釈可能性に寄与するために、ニューラルMTシステムにおける重要なコンポーネントである注意モデルと、フレーズベースのシステムにおけるフレーズリオーダーモデルに最も近いモデルの振る舞いを研究します。注意モデルとエンコーダ隠された状態表現は、神経MTのソース側言語情報をエンコードする主要なコンポーネントを形成する。この目的のために、我々はまた、神経MTシステムのエンコーダ隠蔽状態表現でキャプチャされた情報を分析します。異なるニューラルMTアーキテクチャの隠れた状態表現によって、ソース側からの構文的および語彙的セマンティック情報が捕捉される範囲を調査する。 Two popular types of machine translation (MT) are phrase-based and neural machine translation systems. Both of these types of systems are composed of multiple complex models or layers. Each of these models and layers learns different linguistic aspects of the source language. However, for some of these models and layers, it is not clear which linguistic phenomena are learned or how this information is learned. For phrase-based MT systems, it is often clear what information is learned by each model, and the question is rather how this information is learned, especially for its phrase reordering model. For neural machine translation systems, the situation is even more complex, since for many cases it is not exactly clear what information is learned and how it is learned. To shed light on what linguistic phenomena are captured by MT systems, we analyze the behavior of important models in both phrase-based and neural MT systems. 
We consider phrase reordering models from phrase-based MT systems to investigate which words from inside of a phrase have the biggest impact on defining the phrase reordering behavior. Additionally, to contribute to the interpretability of neural MT systems we study the behavior of the attention model, which is a key component in neural MT systems and the closest model in functionality to phrase reordering models in phrase-based systems. The attention model together with the encoder hidden state representations form the main components to encode source side linguistic information in neural MT. To this end, we also analyze the information captured in the encoder hidden state representations of a neural MT system. We investigate the extent to which syntactic and lexical-semantic information from the source side is captured by hidden state representations of different neural MT architectures.	翻訳日:2021-03-05 15:13:04 公開日:2021-03-04
# 敵対的攻撃を防御する構造保存型低ランク画像補完 Structure-Preserving Progressive Low-rank Image Completion for Defending Adversarial Attacks ( http://arxiv.org/abs/2103.02781v1 ) ライセンス: Link先を確認	Zhiqun Zhao, Hengyou Wang, Hao Sun and Zhihai He	(参考訳) ディープニューラルネットワークは、局所的な画像の詳細を分析し、推論層に沿って情報を要約することでオブジェクトを認識し、最終的な決定を導出する。このため、敵対的な攻撃の傾向が強い。入力画像の小さな洗練されたノイズは、ネットワーク推測経路に沿って蓄積し、ネットワーク出力で間違った決定を下すことができる。一方、人間の目は局所的なイメージテクスチャではなく、そのグローバルな構造と意味的な手がかりに基づいて物体を認識する。このため、人間の目は敵の攻撃によって大きな損傷を受けた画像から対象をはっきりと認識することができる。これは、ディープニューラルネットワークを敵の攻撃から守るための非常に興味深いアプローチにつながります。本研究では,入力画像から不要なテクスチャの詳細を取り除き,ディープニューラルネットワークのバイアスをグローバルオブジェクト構造や意味的手がかりにシフトさせる構造保存型プログレッシブ低ランク画像補完(splic)手法を提案する。最適化過程における局所最小化を回避するため、段階的に滑らかな階数関数を持つ低ランク行列補完問題に問題を定式化する。実験の結果,提案手法は重要なグローバルなオブジェクト構造を保ちながら,重要でないローカル画像の細部を除去できることがわかった。ブラックボックス,グレイボックス,ホワイトボックス攻撃では,既存の防御手法(最大12.6%)を上回り,ネットワークの敵対的堅牢性を大幅に向上させる。 Deep neural networks recognize objects by analyzing local image details and summarizing their information along the inference layers to derive the final decision. Because of this, they are prone to adversarial attacks. Small sophisticated noise in the input images can accumulate along the network inference path and produce wrong decisions at the network output. On the other hand, human eyes recognize objects based on their global structure and semantic cues, instead of local image textures. Because of this, human eyes can still clearly recognize objects from images which have been heavily damaged by adversarial attacks. This leads to a very interesting approach for defending deep neural networks against adversarial attacks. In this work, we propose to develop a structure-preserving progressive low-rank image completion (SPLIC) method to remove unneeded texture details from the input images and shift the bias of deep neural networks towards global object structures and semantic cues. 
We formulate the problem into a low-rank matrix completion problem with progressively smoothed rank functions to avoid local minimums during the optimization process. Our experimental results demonstrate that the proposed method is able to successfully remove the insignificant local image details while preserving important global object structures. On black-box, gray-box, and white-box attacks, our method outperforms existing defense methods (by up to 12.6%) and significantly improves the adversarial robustness of the network.	翻訳日:2021-03-05 15:12:13 公開日:2021-03-04
# Pruning in Pruning: テスト精度を超えたPruning Neural Networkの効果 Lost in Pruning: The Effects of Pruning Neural Networks beyond Test Accuracy ( http://arxiv.org/abs/2103.03014v1 ) ライセンス: Link先を確認	Lucas Liebenwein, Cenk Baykal, Brandon Carter, David Gifford, Daniela Rus	(参考訳) ニューラルネットワークプルーニングは、現代的、潜在的に過パラメータ化された、ネットワークの推論コストを削減するために使用される一般的な技術です。事前訓練されたネットワークから始まるプロセスは、冗長なパラメータを削除し、再トレーニングし、同じテスト精度を維持しながら繰り返します。結果は、予測性能(テスト精度)に匹敵するオリジナルのサイズのごく一部であるモデルである。ここでは,テスト精度のみを終了条件で使用することが十分であるかどうかを再評価し,結果のモデルが,分布外データへの一般化やノイズに対するレジリエンスといった,幅広い"ハード"指標で良好に動作することを保証する。さまざまなアーキテクチャやデータセットの評価を横断すると、prunedネットワークは非prunedモデルを効果的に近似するが、prunedネットワークが同等のパフォーマンスを達成するプルーン比率はタスク間で大きく異なる。これらの結果は、深層学習における \emph{genuine} オーバーパラメータ化の程度を疑問視し、特に安全性クリティカルなシステムの文脈において、pruned ネットワークをデプロイすることの実践可能性について懸念を喚起する。私たちのコードはhttps://github.com/lucaslie/torchpruneで利用可能です。 Neural network pruning is a popular technique used to reduce the inference costs of modern, potentially overparameterized, networks. Starting from a pre-trained network, the process is as follows: remove redundant parameters, retrain, and repeat while maintaining the same test accuracy. The result is a model that is a fraction of the size of the original with comparable predictive performance (test accuracy). Here, we reassess and evaluate whether the use of test accuracy alone in the terminating condition is sufficient to ensure that the resulting model performs well across a wide spectrum of "harder" metrics such as generalization to out-of-distribution data and resilience to noise. Across evaluations on varying architectures and data sets, we find that pruned networks effectively approximate the unpruned model, however, the prune ratio at which pruned networks achieve commensurate performance varies significantly across tasks. 
These results call into question the extent of \emph{genuine} overparameterization in deep learning and raise concerns about the practicability of deploying pruned networks, specifically in the context of safety-critical systems, unless they are widely evaluated beyond test accuracy to reliably predict their performance. Our code is available at https://github.com/lucaslie/torchprune.	翻訳日:2021-03-05 15:11:49 公開日:2021-03-04
# ヒートマップとトリックバッグを用いたサブピクセル顔のランドマーク Sub-pixel face landmarks using heatmaps and a bag of tricks ( http://arxiv.org/abs/2103.03059v1 ) ライセンス: Link先を確認	Samuel W. F. Earp and Aubin Samacoits and Sanjana Jain and Pavit Noinongyao and Siwa Boonpunmongkol	(参考訳) 正確な顔のランドマークのローカリゼーションは、顔認識、再構築、モーフィングの不可欠な部分です。顔のランドマークを正確にローカライズするために,熱マップ回帰手法を提案する。各モデルはmobilenetv2バックボーンからなり、続いていくつかのスケールアップ層があり、パフォーマンスと推論コストの両方を最適化するさまざまなトリックがある。従来の手法のように境界ボックスを使うのではなく、顔の位置とアライメントに5つのna\"ive face landmarkを使用します。さらに,アライメント後にランダムな回転,変位,スケーリングを加えることで,モデルが向きよりも顔位置に敏感であることが分かる。また, デコンボリューション層とピクセルシャッフル層を混合することで, 局所化性能を損なうことなく, アップスケーリングの複雑さを低減できることを示した。我々は,最先端の顔ランドマークローカライズモデルを提案する(第2位は106ポイント顔ランドマークローカライズ検証セットの2位)。最後に,公開モデルとベンチマークを用いて,これらのランドマークを用いた顔認識の効果をテストする。 Accurate face landmark localization is an essential part of face recognition, reconstruction and morphing. To accurately localize face landmarks, we present our heatmap regression approach. Each model consists of a MobileNetV2 backbone followed by several upscaling layers, with different tricks to optimize both performance and inference cost. We use five na\"ive face landmarks from a publicly available face detector to position and align the face instead of using the bounding box like traditional methods. Moreover, we show by adding random rotation, displacement and scaling -- after alignment -- that the model is more sensitive to the face position than orientation. We also show that it is possible to reduce the upscaling complexity by using a mixture of deconvolution and pixel-shuffle layers without impeding localization performance. We present our state-of-the-art face landmark localization model (ranking second on The 2nd Grand Challenge of 106-Point Facial Landmark Localization validation set). Finally, we test the effect on face recognition using these landmarks, using a publicly available model and benchmarks.	翻訳日:2021-03-05 15:11:22 公開日:2021-03-04
# BM3D vs 2-Layer ONN BM3D vs 2-Layer ONN ( http://arxiv.org/abs/2103.03060v1 ) ライセンス: Link先を確認	Junaid Malik, Serkan Kiranyaz, Mehmet Yamac, Moncef Gabbouj	(参考訳) 最近の画像のノイズ除去の成功にもかかわらず、深く複雑なアーキテクチャの必要性はCNNの実用的な使用を妨げています。特にリソース制約のあるシナリオでは、bm3dのような古いが計算効率のよい手法が一般的である。本研究では,AWGN画像デノイジングにおけるBM3Dと比較し,小型ニューラルネットワークが競争結果を得ることができるかどうかを検討する。この目的のために,隠れレイヤを2つしか持たないネットワークを設定し,異なるニューロンモデルと層幅を用いて,異なるawgnノイズレベルにおけるbm3dの性能を比較する。この結果から, 生成ニューロンモデル(Self-ONNs)に基づくニューラルネットワークの自己組織型は, CNNよりも優れた選択であるだけでなく, BM3Dに比べて競合性があり, 高い雑音レベルにおいてさらに優れていることが示唆された。 Despite their recent success on image denoising, the need for deep and complex architectures still hinders the practical usage of CNNs. Older but computationally more efficient methods such as BM3D remain a popular choice, especially in resource-constrained scenarios. In this study, we aim to find out whether compact neural networks can learn to produce competitive results as compared to BM3D for AWGN image denoising. To this end, we configure networks with only two hidden layers and employ different neuron models and layer widths for comparing the performance with BM3D across different AWGN noise levels. Our results conclusively show that the recently proposed self-organized variant of operational neural networks based on a generative neuron model (Self-ONNs) is not only a better choice as compared to CNNs, but also provide competitive results as compared to BM3D and even significantly surpass it for high noise levels.	翻訳日:2021-03-05 15:11:01 公開日:2021-03-04
# Barlow Twins: 冗長化による自己監督型学習 Barlow Twins: Self-Supervised Learning via Redundancy Reduction ( http://arxiv.org/abs/2103.03230v1 ) ライセンス: Link先を確認	Jure Zbontar, Li Jing, Ishan Misra, Yann LeCun, St\'ephane Deny	(参考訳) SSL(Self-supervised Learning)は、大規模なコンピュータビジョンベンチマークの監督メソッドによるギャップを急速に閉じています。 SSLの成功したアプローチは、入力サンプルの歪みに不変な表現を学ぶことである。しかし、このアプローチの繰り返しの問題は、自明な定数表現の存在である。現在のほとんどのメソッドは、注意深く実装することで、そのような崩壊したソリューションを避ける。サンプルの歪んだバージョンで供給される2つの同一ネットワークの出力間の相互相関行列を計測し、可能な限り同一行列に近づけることで、そのような崩壊を自然に避ける目的関数を提案する。これにより、歪んだサンプルの表現ベクトルは類似し、これらのベクトルの成分間の冗長性が最小化される。この方法は、神経科学者H. Barlowの冗長還元原理が同一のネットワークに適用されるため、Barlow Twinsと呼ばれる。 Barlow Twinsは、予測器ネットワーク、勾配停止、重量更新における移動平均などのネットワーク双対間の大きなバッチや非対称性を必要としない。これは非常に高次元の出力ベクトルを使うことができる。 Barlow Twins は、低データ状態における半教師付き分類のための ImageNet の以前の手法よりも優れており、線形分類器ヘッドによる ImageNet の分類技術の現状と分類とオブジェクト検出の伝達タスクに匹敵する。 Self-supervised learning (SSL) is rapidly closing the gap with supervised methods on large computer vision benchmarks. A successful approach to SSL is to learn representations which are invariant to distortions of the input sample. However, a recurring issue with this approach is the existence of trivial constant representations. Most current methods avoid such collapsed solutions by careful implementation details. We propose an objective function that naturally avoids such collapse by measuring the cross-correlation matrix between the outputs of two identical networks fed with distorted versions of a sample, and making it as close to the identity matrix as possible. This causes the representation vectors of distorted versions of a sample to be similar, while minimizing the redundancy between the components of these vectors. The method is called Barlow Twins, owing to neuroscientist H. Barlow's redundancy-reduction principle applied to a pair of identical networks. Barlow Twins does not require large batches nor asymmetry between the network twins such as a predictor network, gradient stopping, or a moving average on the weight updates. 
It allows the use of very high-dimensional output vectors. Barlow Twins outperforms previous methods on ImageNet for semi-supervised classification in the low-data regime, and is on par with current state of the art for ImageNet classification with a linear classifier head, and for transfer tasks of classification and object detection.	翻訳日:2021-03-05 15:10:45 公開日:2021-03-04
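A minimal sketch of the objective described in the Barlow Twins abstract above: the cross-correlation matrix between two batches of embeddings (one per distorted view) is pushed toward the identity matrix. It is written in plain Python for illustration only. The function names, the trade-off weight `lam`, and its default value are assumptions of this sketch rather than details given in the abstract, and this is not the authors' implementation.

```python
import math


def standardize(batch):
    """Normalize each embedding dimension to zero mean and unit variance over the batch."""
    n, d = len(batch), len(batch[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        std = std if std > 0 else 1.0  # guard against constant features
        for i in range(n):
            out[i][j] = (batch[i][j] - mean) / std
    return out


def cross_correlation(za, zb):
    """d x d cross-correlation matrix between two batches of embeddings."""
    za, zb = standardize(za), standardize(zb)
    n, d = len(za), len(za[0])
    return [[sum(za[k][i] * zb[k][j] for k in range(n)) / n
             for j in range(d)] for i in range(d)]


def barlow_twins_loss(za, zb, lam=0.005):
    """Distance of the cross-correlation matrix from the identity.

    The diagonal term makes the two views agree (invariance); the
    off-diagonal term decorrelates components (redundancy reduction).
    `lam` is a hypothetical trade-off weight chosen for illustration.
    """
    c = cross_correlation(za, zb)
    d = len(c)
    on_diag = sum((1.0 - c[i][i]) ** 2 for i in range(d))
    off_diag = sum(c[i][j] ** 2 for i in range(d) for j in range(d) if i != j)
    return on_diag + lam * off_diag
```

In this sketch, two identical batches with uncorrelated components give a loss of zero, while duplicated (fully redundant) components are penalized only through the off-diagonal term.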
# 人工知能ガイド放射線学システムにおけるサニティテストによる汚い相関の検出 Detecting Spurious Correlations with Sanity Tests for Artificial Intelligence Guided Radiology Systems ( http://arxiv.org/abs/2103.03048v1 ) ライセンス: Link先を確認	Usman Mahmood, Robik Shrestha, David D.B. Bates, Lorenzo Mannelli, Giuseppe Corrias, Yusuf Erdi, Christopher Kanan	(参考訳) 人工知能(ai)は、機械知覚の多くの問題を解決することに成功した。放射線学において、AIシステムは急速に進化し、治療決定の導出、診断、医療画像上の疾患の局所化、放射線医の効率の向上の進展を示している。放射線学におけるAIの展開における重要な要素は、開発システムの有効性と安全性への信頼を得ることである。現在のゴールドスタンダードのアプローチは、1つ以上の機関からの一般化データセットでパフォーマンスの分析検証を行い、次にデプロイ中のシステムの有効性に関する臨床検証を行う。臨床検証研究は時間がかかり、ベストプラクティスは分析検証データの限られた再利用を指示するので、システムが分析または臨床検証に失敗する可能性があるかどうかを事前に知るのが理想的です。本稿では,開発データに不正な理由から,システムがいつ良好に動作するかを特定するための一連の健全性テストについて述べる。本研究は,ctで見る膵癌分類のための深層学習システムを設計することで,健康検査の価値を示す。 Artificial intelligence (AI) has been successful at solving numerous problems in machine perception. In radiology, AI systems are rapidly evolving and show progress in guiding treatment decisions, diagnosing, localizing disease on medical images, and improving radiologists' efficiency. A critical component to deploying AI in radiology is to gain confidence in a developed system's efficacy and safety. The current gold standard approach is to conduct an analytical validation of performance on a generalization dataset from one or more institutions, followed by a clinical validation study of the system's efficacy during deployment. Clinical validation studies are time-consuming, and best practices dictate limited re-use of analytical validation data, so it is ideal to know ahead of time if a system is likely to fail analytical or clinical validation. In this paper, we describe a series of sanity tests to identify when a system performs well on development data for the wrong reasons. We illustrate the sanity tests' value by designing a deep learning system to classify pancreatic cancer seen in computed tomography scans.	翻訳日:2021-03-05 15:10:21 公開日:2021-03-04
# Perceiver: 反復的注意を伴った一般認識 Perceiver: General Perception with Iterative Attention ( http://arxiv.org/abs/2103.03206v1 ) ライセンス: Link先を確認	Andrew Jaegle and Felix Gimeno and Andrew Brock and Andrew Zisserman and Oriol Vinyals and Joao Carreira	(参考訳) 生体システムは視覚、オーディション、タッチ、プロピオセプションなど様々な形態の高次元入力を同時に処理することで世界を理解する。一方、ディープラーニングで使用される知覚モデルは個々のモダリティのために設計されており、多くの場合、ほとんどすべての既存の視覚モデルによって活用される局所格子構造のようなドメイン固有の仮定に依存している。これらの優先事項は、有益な誘導バイアスを導入するだけでなく、個々のモダリティにモデルをロックする。本稿では,トランスフォーマーを基盤とするモデルであるperceiverについて紹介する。このモデルでは,入力間の関係についてアーキテクチャ上の仮定をほとんど行わないが,convnetsのような数十万の入力にもスケールする。このモデルは非対称な注意機構を利用して、反復的に入力をタイトな潜在ボトルネックに蒸留し、非常に大きな入力を処理するためにスケールすることができる。このアーキテクチャは,画像,ポイントクラウド,オーディオ,ビデオ,ビデオ+オーディオなど,さまざまなモードの分類タスクに対して,競争的に,あるいはそれ以上に,強力な特殊なモデルを実行していることを示す。イメージネット上のresnet-50に匹敵する性能は畳み込みなく、5万画素まで直接参加することで得られる。また、AudioSetのすべてのモダリティの最先端の結果を超えています。 Biological systems understand the world by simultaneously processing high-dimensional inputs from modalities as diverse as vision, audition, touch, proprioception, etc. The perception models used in deep learning on the other hand are designed for individual modalities, often relying on domain-specific assumptions such as the local grid structures exploited by virtually all existing vision models. These priors introduce helpful inductive biases, but also lock models to individual modalities. In this paper we introduce the Perceiver - a model that builds upon Transformers and hence makes few architectural assumptions about the relationship between its inputs, but that also scales to hundreds of thousands of inputs, like ConvNets. The model leverages an asymmetric attention mechanism to iteratively distill inputs into a tight latent bottleneck, allowing it to scale to handle very large inputs. We show that this architecture performs competitively or beyond strong, specialized models on classification tasks across various modalities: images, point clouds, audio, video and video+audio. 
The Perceiver obtains performance comparable to ResNet-50 on ImageNet without convolutions and by directly attending to 50,000 pixels. It also surpasses state-of-the-art results for all modalities in AudioSet.	翻訳日:2021-03-05 15:10:03 公開日:2021-03-04
# 画像間翻訳の新しい応用:単一画像からの学習による染色体ストレート化フレームワーク A Novel Application of Image-to-Image Translation: Chromosome Straightening Framework by Learning from a Single Image ( http://arxiv.org/abs/2103.02835v1 ) ライセンス: Link先を確認	Sifan Song, Daiyun Huang, Yalun Hu, Chunxiao Yang, Jia Meng, Fei Ma, Jiaming Zhang, Jionglong Su	(参考訳) 医療用イメージングでは、染色体矯正は染色体の病理学的研究と細胞遺伝図の作成に重要な役割を果たします。ストレート化タスクには異なるアプローチが存在するが、主に幾何学的アルゴリズムであり、出力はジャッジエッジまたはバンドリングパターンを廃止したフラグメントによって特徴づけられる。幾何学的アルゴリズムの欠陥に対処するため,画像から画像への変換に基づく新しいフレームワークを提案し,不断なバンドリングパターンと保存された詳細を持つストレート化染色体を合成するための関係マッピング依存性を学習する。また、入力染色体の不足の落とし穴を避けるため、トレーニングモデルに単一の湾曲した染色体画像のみを用いた拡張データセットを構築した。この枠組みに基づき,u字型ネットワークと条件付き生成型逆ネットワークという2つの一般的な画像から画像への翻訳アーキテクチャを適用し,その有効性を評価する。 642個の実世界の染色体からなるデータセットに関する実験は、現実的で連続的な染色体詳細を表現し、直線化性能における幾何学的手法と比較して、我々の枠組みの優越性を示している。さらに, 染色体分類の精度を0.98%-1.39%向上させた。 In medical imaging, chromosome straightening plays a significant role in the pathological study of chromosomes and in the development of cytogenetic maps. Whereas different approaches exist for the straightening task, they are mostly geometric algorithms whose outputs are characterized by jagged edges or fragments with discontinued banding patterns. To address the flaws in the geometric algorithms, we propose a novel framework based on image-to-image translation to learn a pertinent mapping dependence for synthesizing straightened chromosomes with uninterrupted banding patterns and preserved details. In addition, to avoid the pitfall of deficient input chromosomes, we construct an augmented dataset using only one single curved chromosome image for training models. Based on this framework, we apply two popular image-to-image translation architectures, U-shape networks and conditional generative adversarial networks, to assess its efficacy. 
Experiments on a dataset comprising 642 real-world chromosomes demonstrate the superiority of our framework over the geometric method in straightening performance, rendering realistic and continuous chromosome details. Furthermore, our straightened results improve chromosome classification, raising mean accuracy by 0.98%-1.39%.	翻訳日:2021-03-05 15:09:42 公開日:2021-03-04
# Morphset:フェースモーフィングを用いたディメンショナル・インパクト・ラベルを用いたカテゴリー別感情データセットの拡張 Morphset:Augmenting categorical emotion datasets with dimensional affect labels using face morphing ( http://arxiv.org/abs/2103.02854v1 ) ライセンス: Link先を確認	Vassilios Vonikakis, Dexter Neo, Stefan Winkler	(参考訳) 感情認識と理解は人間と機械の相互作用において重要な要素である。原子価と覚醒を用いた影響の次元モデルは、人間の感情状態の複雑さのために伝統的なカテゴリーよりも有利である。しかし、次元的な感情アノテーションは収集が困難でコストがかかるため、それでも感情的なコンピューティングコミュニティでは限られている。そこで本論文では,これらの課題を補うために,得られたサンプルの分布と円周空間の次元ラベルを完全に制御し,少なくとも20倍以上の増分係数を達成したまま,既存のカテゴリー的感情データセットから合成画像を生成する手法を提案する。 Emotion recognition and understanding is a vital component in human-machine interaction. Dimensional models of affect such as those using valence and arousal have advantages over traditional categorical ones due to the complexity of emotional states in humans. However, dimensional emotion annotations are difficult and expensive to collect, therefore they are still limited in the affective computing community. To address these issues, we propose a method to generate synthetic images from existing categorical emotion datasets using face morphing, with full control over the resulting sample distribution as well as dimensional labels in the circumplex space, while achieving augmentation factors of at least 20x or more.	翻訳日:2021-03-05 15:09:23 公開日:2021-03-04
# ロバスト長期政策移行に向けて Toward Robust Long Range Policy Transfer ( http://arxiv.org/abs/2103.02957v1 ) ライセンス: Link先を確認	Wei-Cheng Tseng, Jin-Siang Lin, Yao-Min Feng, Min Sun	(参考訳) 人間は、経験を積んで得たスキルを活かして、数回の試行錯誤で新しいタスクをマスターできます。この能力を模倣するために、事前タスクから学習した原始的ポリシーを組み合わせた階層モデルが提案されている。しかし、これらの方法は人間の移動可能性の範囲と比較して短い。そこで本稿では,階層構造を活用し,複合機能を訓練し,多種多様な原始的ポリシーを交互に適応させ,新しい課題に挑戦する上で,様々な複雑な行動を効率的に生み出す手法を提案する。また,プリミティブの多様性と利用率を向上させるために,プリトレーニングフェーズにおける2つの正規化項を設計した。提案手法は,タスク内のこれら再利用可能なプリミティブを連続的なアクション空間と組み合わせることで,他の最近のポリシー転送手法よりも優れることを示す。実験の結果,提案手法がより広い転送範囲を提供することが示された。アブレーション研究は、規則化条件が長期政策移行に重要であることも示している。最後に,本手法は,プリミティブの品質が変化する場合,他の手法よりも常に優れることを示す。 Humans can master a new task within a few trials by drawing upon skills acquired through prior experience. To mimic this capability, hierarchical models combining primitive policies learned from prior tasks have been proposed. However, these methods fall short compared with the human range of transferability. We propose a method that leverages the hierarchical structure to alternately train the combination function and adapt the set of diverse primitive policies, so as to efficiently produce a range of complex behaviors on challenging new tasks. We also design two regularization terms to improve the diversity and utilization rate of the primitives in the pre-training phase. We demonstrate that our method outperforms other recent policy transfer methods by combining and adapting these reusable primitives in tasks with continuous action space. The experiment results further show that our approach provides a broader transferring range. The ablation study also shows the regularization terms are critical for long range policy transfer. Finally, we show that our method consistently outperforms other methods when the quality of the primitives varies.	翻訳日:2021-03-05 15:09:11 公開日:2021-03-04
# マルチターン対話理解の進歩:調査 Advances in Multi-turn Dialogue Comprehension: A Survey ( http://arxiv.org/abs/2103.03125v1 ) ライセンス: Link先を確認	Zhuosheng Zhang and Hai Zhao	(参考訳) 自然言語を理解し、人間と対話する機械の訓練は、人工知能の分野では難解で不可欠な作業です。近年,深層学習研究,特に最近の事前学習言語モデルの急速な発展にともなって,対話システムの多様化が図られている。これらの研究の中で、基本的な課題は対話理解であり、その役割は機械に応答する前に対話の文脈を読み、理解させることである。本稿では,対話モデリングの観点から,これまでの手法を検討する。平文読解とは対照的に,対話理解の特徴と課題を要約する。次に,対話シナリオにおけるprlm向上のための対話関連言語モデリング手法とともに,対話理解タスクにおいて広く用いられている対話モデリングの3つの典型的なパターンについて考察する。最後に,近年の技術的進歩を浮き彫りにして,経験的分析から学べる教訓と新たな研究のフロンティアへの展望を指摘する。 Training machines to understand natural language and interact with humans is an elusive and essential task in the field of artificial intelligence. In recent years, a diversity of dialogue systems has been designed with the rapid development of deep learning researches, especially the recent pre-trained language models. Among these studies, the fundamental yet challenging part is dialogue comprehension whose role is to teach the machines to read and comprehend the dialogue context before responding. In this paper, we review the previous methods from the perspective of dialogue modeling. We summarize the characteristics and challenges of dialogue comprehension in contrast to plain-text reading comprehension. Then, we discuss three typical patterns of dialogue modeling that are widely-used in dialogue comprehension tasks such as response selection and conversation question-answering, as well as dialogue-related language modeling techniques to enhance PrLMs in dialogue scenarios. Finally, we highlight the technical advances in recent years and point out the lessons we can learn from the empirical analysis and the prospects towards a new frontier of researches.	翻訳日:2021-03-05 15:08:17 公開日:2021-03-04
# 明示的政策推定による逆強化学習 Inverse Reinforcement Learning with Explicit Policy Estimates ( http://arxiv.org/abs/2103.02863v1 ) ライセンス: Link先を確認	Navyata Sanghvi, Shinnosuke Usami, Mohit Sharma, Joachim Groeger, Kris Kitani	(参考訳) 逆強化学習(IRL)問題を解くための様々な手法が、機械学習と経済学において独立に開発されている。特に、最大因果エントロピーIRL法はエントロピー最大化の観点に基づいており、経済分野における関連する進歩は、専門家の振る舞いを説明するために観測されていない作用ショックの存在を前提としている(Nested Fixed Point Algorithm, Conditional Choice Probability method, Nested Pseudo-Likelihood Algorithm)。本研究では,これらの関連手法について,両分野から未知の接続を行う。目的の共通形式、関連する方針、客観的勾配を特徴とする最適化問題のクラスに属することを示すことにより、これを達成する。最適ソフト値関数の近似による手法間の鍵となる計算量とアルゴリズムの差異を実証し,より効率的なアルゴリズムを導出する方法について述べる。この最適化問題の研究から得られた知見を用いて,様々な問題シナリオを特定し,それらの問題に対する各手法の適合性について検討する。 Various methods for solving the inverse reinforcement learning (IRL) problem have been developed independently in machine learning and economics. In particular, the method of Maximum Causal Entropy IRL is based on the perspective of entropy maximization, while related advances in the field of economics instead assume the existence of unobserved action shocks to explain expert behavior (Nested Fixed Point Algorithm, Conditional Choice Probability method, Nested Pseudo-Likelihood Algorithm). In this work, we make previously unknown connections between these related methods from both fields. We achieve this by showing that they all belong to a class of optimization problems, characterized by a common form of the objective, the associated policy and the objective gradient. We demonstrate key computational and algorithmic differences which arise between the methods due to an approximation of the optimal soft value function, and describe how this leads to more efficient algorithms. Using insights which emerge from our study of this class of optimization problems, we identify various problem scenarios and investigate each method's suitability for these problems.	翻訳日:2021-03-05 15:07:25 公開日:2021-03-04
# 弱監督分類における低有界適正損失 Lower-bounded proper losses for weakly supervised classification ( http://arxiv.org/abs/2103.02893v1 ) ライセンス: Link先を確認	Shuhei M. Yoshida, Takashi Takenouchi, Masashi Sugiyama	(参考訳) 本稿では,あるラベル破損プロセスによって生成される弱いラベルをインスタンスに付与する分類の弱い教師付き学習の問題について論じる。目標は、弱ラベル学習における損失関数が適切かつ低境界である条件を導出することであり、クラス確率推定に使用される損失の2つの必須条件である。そのために、教師付き学習における適切な損失を表す表現定理を導出し、サベージ表現を双対化する。この定理を用いて, 固有な弱ラベル損失を特徴付け, 低バウンドとなる条件を見いだす。これらの理論的知見に基づき, 正則化法則化法を導出し, 正則性を失うことなく, 下から任意の弱ラベル損失を境界とする一般化ロジット絞込み法を導出する。さらに,提案手法の有効性を,不適切な損失や非バウンド損失と比較して実験的に実証した。これらの結果は、適切性と低い有界性の重要性を強調します。コードはhttps://github.com/yoshum/lower-bounded-proper-lossesで公開されている。 This paper discusses the problem of weakly supervised learning of classification, in which instances are given weak labels that are produced by some label-corruption process. The goal is to derive conditions under which loss functions for weak-label learning are proper and lower-bounded -- two essential requirements for the losses used in class-probability estimation. To this end, we derive a representation theorem for proper losses in supervised learning, which dualizes the Savage representation. We use this theorem to characterize proper weak-label losses and find a condition for them to be lower-bounded. Based on these theoretical findings, we derive a novel regularization scheme called generalized logit squeezing, which makes any proper weak-label loss bounded from below, without losing properness. Furthermore, we experimentally demonstrate the effectiveness of our proposed approach, as compared to improper or unbounded losses. Those results highlight the importance of properness and lower-boundedness. The code is publicly available at https://github.com/yoshum/lower-bounded-proper-losses.	翻訳日:2021-03-05 15:07:06 公開日:2021-03-04
# KL発散最小化によるベストランク-1テンソル近似の閉形式解 A Closed Form Solution to Best Rank-1 Tensor Approximation via KL divergence Minimization ( http://arxiv.org/abs/2103.02898v1 ) ライセンス: Link先を確認	Kazu Ghalamkari, Mahito Sugiyama	(参考訳) テンソル分解は根本的に難しい問題です。テンソル分解の最も単純な場合でさえも、最小二乗 (ls) 誤差の項におけるランク1近似はnpハードであることが知られている。ここでは、LS誤差の代わりにKLの発散を考えると、与えられた正のテンソルからKLの発散を最小限に抑えるランク1テンソルに対する閉形式解を解析的に導出できることが示される。我々の重要な洞察は、正のテンソルを確率分布として扱い、ランク1のテンソルの集合への射影としてランク1近似の過程を定式化することである。これにより,階数1近似を凸最適化により解くことができる。実験により,我々のアルゴリズムは既存のランク1近似法よりも桁違いに高速であることを示すとともに,理論的な発見を支援するテンソルの近似性を向上する。 Tensor decomposition is a fundamentally challenging problem. Even the simplest case of tensor decomposition, the rank-1 approximation in terms of the Least Squares (LS) error, is known to be NP-hard. Here, we show that, if we consider the KL divergence instead of the LS error, we can analytically derive a closed form solution for the rank-1 tensor that minimizes the KL divergence from a given positive tensor. Our key insight is to treat a positive tensor as a probability distribution and formulate the process of rank-1 approximation as a projection onto the set of rank-1 tensors. This enables us to solve rank-1 approximation by convex optimization. We empirically demonstrate that our algorithm is an order of magnitude faster than the existing rank-1 approximation methods and gives better approximation of given tensors, which supports our theoretical finding.	翻訳日:2021-03-05 15:06:48 公開日:2021-03-04
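The closed-form result described in the row above can be illustrated with a short sketch. It assumes the generalized KL divergence for unnormalized positive tensors and uses the standard fact that the KL-optimal product approximation is built from the marginals; the function names `rank1_kl_approx` and `generalized_kl` are hypothetical and are not taken from the paper.

```python
import numpy as np
from functools import reduce


def rank1_kl_approx(tensor):
    """Rank-1 tensor built from mode-wise marginal sums (illustrative sketch).

    Assumption: the input is strictly positive and the objective is the
    generalized KL divergence D(P; Q) = sum(P*log(P/Q) - P + Q).
    """
    tensor = np.asarray(tensor, dtype=float)
    d = tensor.ndim
    total = tensor.sum()
    # Marginal sum along mode k: sum over every axis except k.
    marginals = [
        tensor.sum(axis=tuple(a for a in range(d) if a != k)) for k in range(d)
    ]
    # Outer product of all marginals, rescaled so the total mass is preserved.
    outer = reduce(np.multiply.outer, marginals)
    return outer / total ** (d - 1)


def generalized_kl(p, q):
    """Generalized KL divergence between two positive tensors of equal shape."""
    return float(np.sum(p * np.log(p / q) - p + q))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p = rng.uniform(0.1, 1.0, size=(3, 4, 5))
    q = rank1_kl_approx(p)
    print("divergence to closed-form rank-1 tensor:", generalized_kl(p, q))
```

No iterative optimization is involved: the approximation is a handful of sums and one outer product, which is consistent with the speed advantage the abstract reports.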
# Calibrated Simplex Mapping Classification Calibrated Simplex Mapping Classification ( http://arxiv.org/abs/2103.02926v1 ) ライセンス: Link先を確認	Raoul Heese, Michał Walczak, Michael Bortz, Jochen Schmid	(参考訳) 本研究では,学習データを単体(simplex)状の幾何構造を持つ線形分離可能な潜在空間に写像する,新しい教師あり多クラス/単一ラベル分類器を提案する。このアプローチにより、分類問題をよく定義された回帰問題に変換することができる。その解法では、特徴空間における適切な距離メトリックと、潜在空間座標を予測する回帰モデルを選択できます。様々な人工および実世界のデータセットのベンチマークを用いて,分類器のキャリブレーション品質と予測性能を示す。 We propose a novel supervised multi-class/single-label classifier that maps training data onto a linearly separable latent space with a simplex-like geometry. This approach allows us to transform the classification problem into a well-defined regression problem. For its solution we can choose suitable distance metrics in feature space and regression models predicting latent space coordinates. A benchmark on various artificial and real-world data sets is used to demonstrate the calibration qualities and prediction performance of our classifier.	翻訳日:2021-03-05 15:06:32 公開日:2021-03-04
# svmax: 機能埋め込み正規化子 SVMax: A Feature Embedding Regularizer ( http://arxiv.org/abs/2103.02770v1 ) ライセンス: Link先を確認	Ahmed Taha, Alex Hanson, Abhinav Shrivastava, Larry Davis	(参考訳) ニューラルネットワーク正規化器(例えば重み減衰)は、ネットワークの複雑さを明示的に罰することにより性能を高める。本稿では,下位のネットワークアクティベーション -- 機能埋め込み -- をペナルティし,ネットワークの重みを暗黙的に規則化する。より均一な特徴埋め込みを学習するための特異値最大化(SVMax)を提案する。 SVMax正規化器は教師あり学習と教師なし学習の両方をサポートする。モデル崩壊を緩和し、より大きな学習率を可能にします。 SVMax正則化器を検索ネットワークと敵対的生成ネットワーク(GAN)の両方を用いて評価する。合成した混合ガウスデータセットを用いてSVMaxを教師なし環境で評価する。検索ネットワークの場合、SVMaxは様々なランキング損失で大幅な改善マージンを達成します。コードは https://bit.ly/3jNkgDt で入手できる。 A neural network regularizer (e.g., weight decay) boosts performance by explicitly penalizing the complexity of a network. In this paper, we penalize inferior network activations -- feature embeddings -- which in turn regularize the network's weights implicitly. We propose singular value maximization (SVMax) to learn a more uniform feature embedding. The SVMax regularizer supports both supervised and unsupervised learning. Our formulation mitigates model collapse and enables larger learning rates. We evaluate the SVMax regularizer using both retrieval and generative adversarial networks. We leverage a synthetic mixture of Gaussians dataset to evaluate SVMax in an unsupervised setting. For retrieval networks, SVMax achieves significant improvement margins across various ranking losses. Code available at https://bit.ly/3jNkgDt	翻訳日:2021-03-05 15:06:25 公開日:2021-03-04
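One plausible reading of "singular value maximization" can be sketched as follows. The sketch assumes the regularized quantity is the mean singular value of a batch of L2-normalized embeddings; the abstract does not state the exact loss weighting or any bounds, so the function name `mean_singular_value` and this formulation are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def mean_singular_value(embeddings):
    """Mean singular value of a row-normalized embedding matrix (sketch).

    Assumption: rows are feature embeddings of one mini-batch. A larger mean
    singular value indicates embeddings spread more evenly over directions.
    """
    e = np.asarray(embeddings, dtype=float)
    # Normalize each embedding to unit length so only direction matters.
    e = e / np.linalg.norm(e, axis=1, keepdims=True)
    singular_values = np.linalg.svd(e, compute_uv=False)
    return float(singular_values.mean())


if __name__ == "__main__":
    # Collapsed batch: every embedding points the same way.
    collapsed = np.tile([1.0, 0.0, 0.0, 0.0], (8, 1))
    # Spread batch: embeddings cover all four axes evenly.
    spread = np.tile(np.eye(4), (2, 1))
    print("collapsed:", mean_singular_value(collapsed))
    print("spread:   ", mean_singular_value(spread))
```

In training, a term of this kind would be subtracted from the task loss with some weight (a hypothetical hyperparameter here), so that collapsed embeddings are penalized.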
# ストアド埋め込みによる視覚強化学習における計算効率の向上 Improving Computational Efficiency in Visual Reinforcement Learning via Stored Embeddings ( http://arxiv.org/abs/2103.02886v1 ) ライセンス: Link先を確認	Lili Chen, Kimin Lee, Aravind Srinivas, Pieter Abbeel	(参考訳) オフポリシー深層強化学習(RL)の最近の進歩は、視覚観察からの複雑なタスクで印象的な成功をもたらしました。 experience replayは過去の経験を再利用することでサンプル効率を改善し、畳み込みニューラルネットワーク(cnns)は高次元入力を効果的に処理する。しかし、そのような技術は高いメモリと計算帯域を必要とする。本稿では,既存の非政治RLメソッドの単純な修正であるストアド・エンベディング for Efficient Reinforcement Learning (SEER) について,これらの計算とメモリの要件に対処するために提示する。 CNNの勾配更新の計算オーバーヘッドを減らすために、パラメータの早期収束によるトレーニングの早い段階でCNNエンコーダの下層を凍結します。さらに、高次元画像の代わりに経験再生のための低次元潜時ベクトルを格納することにより、メモリ要求を低減し、リプレイバッファ容量の適応的増加を可能にする。実験の結果,SEERはRLエージェントの性能を劣化させることなく,様々なDeepMindコントロール環境とAtariゲーム間で計算とメモリを著しく節約できることがわかった。 CNNの下位層は、異なるタスクやドメインに使用できる一般化可能な特徴を抽出するため、SEERはRLの計算効率の高い転送学習に有用であることを示す。 Recent advances in off-policy deep reinforcement learning (RL) have led to impressive success in complex tasks from visual observations. Experience replay improves sample-efficiency by reusing experiences from the past, and convolutional neural networks (CNNs) process high-dimensional inputs effectively. However, such techniques demand high memory and computational bandwidth. In this paper, we present Stored Embeddings for Efficient Reinforcement Learning (SEER), a simple modification of existing off-policy RL methods, to address these computational and memory requirements. To reduce the computational overhead of gradient updates in CNNs, we freeze the lower layers of CNN encoders early in training due to early convergence of their parameters. Additionally, we reduce memory requirements by storing the low-dimensional latent vectors for experience replay instead of high-dimensional images, enabling an adaptive increase in the replay buffer capacity, a useful technique in constrained-memory settings. 
In our experiments, we show that SEER does not degrade the performance of RL agents while significantly saving computation and memory across a diverse set of DeepMind Control environments and Atari games. Finally, we show that SEER is useful for computation-efficient transfer learning in RL because lower layers of CNNs extract generalizable features, which can be used for different tasks and domains.	翻訳日:2021-03-05 15:06:15 公開日:2021-03-04
# 差分プライベートな階層的テキスト分類におけるプライバシーと有用性のトレードオフについて On the privacy-utility trade-off in differentially private hierarchical text classification ( http://arxiv.org/abs/2103.02895v1 ) ライセンス: Link先を確認	Dominik Wunderlich, Daniel Bernau, Francesco Aldà, Javier Parra-Arnau, Thorsten Strufe	(参考訳) テキスト分類のための階層モデルは、トレーニングデータ記憶のために機密または機密のトレーニングデータ情報を敵に漏らす可能性があります。モデルトレーニング中に差分プライバシーを使用することで、トレーニングオプティマイザを摂動させることで、トレーニングモデルに対する漏洩攻撃を軽減できます。しかし、階層的なテキスト分類では、モデルアーキテクチャの多重性が利用可能であり、モデル精度とモデルリークとのトレードオフが、他のアーキテクチャよりも優れているかどうかは不明である。我々は,ホワイトボックスのメンバシップ推論攻撃を用いて,階層的テキスト分類のための3つの広範に使用されているニューラルネットワークアーキテクチャの情報漏洩を評価する。我々は,メンバシップ推論攻撃を完全に軽減するために,比較的弱い差分プライバシ保証がすでに十分であり,その結果,有用性の低下は中程度にとどまることを示す。より具体的には、長いテキストを持つ大規模なデータセットでは、Transformerベースのモデルが全体的に有利なプライバシユーティリティトレードオフを達成し,一方,短いテキストを持つ小規模なデータセットではCNNが望ましいことを観察した。 Hierarchical models for text classification can leak sensitive or confidential training data information to adversaries due to training data memorization. Using differential privacy during model training can mitigate leakage attacks against trained models by perturbing the training optimizer. However, for hierarchical text classification a multiplicity of model architectures is available and it is unclear whether some architectures yield a better trade-off between remaining model accuracy and model leakage under differentially private training perturbation than others. We use a white-box membership inference attack to assess the information leakage of three widely used neural network architectures for hierarchical text classification under differential privacy. We show that relatively weak differential privacy guarantees already suffice to completely mitigate the membership inference attack, thus resulting only in a moderate decrease in utility. More specifically, for large datasets with long texts we observed transformer-based models to achieve an overall favorable privacy-utility trade-off, while for smaller datasets with shorter texts CNNs are preferable.	翻訳日:2021-03-05 15:05:54 公開日:2021-03-04
# bad and good error: ディープアンサンブル学習における価値重み付けスキルスコア Bad and good errors: value-weighted skill scores in deep ensemble learning ( http://arxiv.org/abs/2103.02881v1 ) ライセンス: Link先を確認	Sabrina Guastavino, Michele Piana, Federico Benvenuto	(参考訳) 本稿では,予測検証を実現する新しい手法を提案する。具体的には、予測誤差の重大度を評価するための戦略を、ある事象の発生を予測しただけの誤報が、連続した非発生事象の途中の1つよりも優れており、一方、孤立した事象のミスは、複数の連続事象の一部である1つの事象の欠落よりも悪い影響を有するという証拠に基づいて紹介する。本稿では,この概念に基づいて,その品質よりも予測の価値に重きを置くような,混乱行列とスキルスコアの新たな定義を導入する。次に,これらの値重み付けスキルスコアの最適化により,ニューラルネットワークの確率的結果がクラスタ化される,バイナリ分類のための深層アンサンブル学習手順を提案する。我々は, 公害, 宇宙天気, 株価の予測に関する3つの応用事例において, 最終的にこのアプローチの性能を示す。 In this paper we propose a novel approach to realize forecast verification. Specifically, we introduce a strategy for assessing the severity of forecast errors based on the evidence that, on the one hand, a false alarm just anticipating an occurring event is better than one in the middle of consecutive non-occurring events, and that, on the other hand, a miss of an isolated event has a worse impact than a miss of a single event, which is part of several consecutive occurrences. Relying on this idea, we introduce a novel definition of confusion matrix and skill scores giving greater importance to the value of the prediction rather than to its quality. Then, we introduce a deep ensemble learning procedure for binary classification, in which the probabilistic outcomes of a neural network are clustered via optimization of these value-weighted skill scores. We finally show the performances of this approach in the case of three applications concerned with pollution, space weather and stock price forecasting.	翻訳日:2021-03-05 15:05:35 公開日:2021-03-04
# データサイエンスのためのサーバレスモデル Serverless Model Serving for Data Science ( http://arxiv.org/abs/2103.02958v1 ) ライセンス: Link先を確認	Yuncheng Wu, Tien Tuan Anh Dinh, Guoyu Hu, Meihui Zhang, Yeow Meng Chee, Beng Chin Ooi	(参考訳) 機械学習(ML)は、現代のデータサイエンスアプリケーションの重要な部分です。データサイエンティストは現在、モデルトレーニングとモデルサービスの両方を含むエンドツーエンドのMLライフサイクルを管理しなければなりません。モデルサービスのためのシステムは、高いパフォーマンス、低コスト、管理の容易さを必要とする。クラウドプロバイダは、マネージドサービスやセルフレンタルサーバなど、モデルサービスオプションをすでに提供している。最近では、高い弾力性ときめ細かいコストモデルを含むサーバレスコンピューティングが、モデル提供の新たな可能性をもたらしている。本稿では、データサイエンスアプリケーションのためのプラットフォームを提供する主流モデルとしてのサーバーレスの実現可能性について検討する。 Amazon Web Service(AWS)とGoogle Cloud Platform(GCP)の2つのクラウド上の他のモデルサービスシステムに対して、サーバレスのパフォーマンスとコストを総合的に評価します。サーバーレスは、コストとパフォーマンスに関して多くのクラウドベースの代替手段を上回っています。さらに興味深いのは、いくつかの状況下では、平均レイテンシとコストの両方でGPUベースのシステムより優れていることだ。これらの結果は、サーバーレスはモデルサービスには適さないという以前のワークスの主張と異なり、GPUベースのシステムはCPUベースのシステムよりもMLワークロードに適しているという従来の認識に反している。他の発見としては、AWSとGCPのサーバレス関数間のコールドスタート時間の大きなギャップ、ワークロードやモデルの変更に対するサーバレスの低感度などが挙げられる。評価結果は、サーバレスがモデルサービスにとって実行可能な選択肢であることを示している。最後に,スケーラブルでコスト効率のよいモデル提供にサーバレスを使用する方法について,データサイエンティストに対していくつかの実践的な推奨を行う。 Machine learning (ML) is an important part of modern data science applications. Data scientists today have to manage the end-to-end ML life cycle that includes both model training and model serving, the latter of which is essential, as it makes their works available to end-users. Systems for model serving require high performance, low cost, and ease of management. Cloud providers are already offering model serving options, including managed services and self-rented servers. Recently, serverless computing, whose advantages include high elasticity and fine-grained cost model, brings another possibility for model serving. In this paper, we study the viability of serverless as a mainstream model serving platform for data science applications. We conduct a comprehensive evaluation of the performance and cost of serverless against other model serving systems on two clouds: Amazon Web Service (AWS) and Google Cloud Platform (GCP). 
We find that serverless outperforms many cloud-based alternatives with respect to cost and performance. More interestingly, under some circumstances, it can even outperform GPU-based systems for both average latency and cost. These results differ from claims in previous work that serverless is not suitable for model serving, and run contrary to the conventional wisdom that GPU-based systems are better for ML workloads than CPU-based systems. Other findings include a large gap in cold start time between AWS and GCP serverless functions, and the low sensitivity of serverless to changes in workloads or models. Our evaluation results indicate that serverless is a viable option for model serving. Finally, we present several practical recommendations for data scientists on how to use serverless for scalable and cost-effective model serving.	翻訳日:2021-03-05 15:05:18 公開日:2021-03-04
# 胸部CT画像におけるEigenlungs-based classifierとCOVID-19診断の確率的組み合わせ Probabilistic combination of eigenlungs-based classifiers for COVID-19 diagnosis in chest CT images ( http://arxiv.org/abs/2103.02961v1 ) ライセンス: Link先を確認	Juan E. Arco, Andrés Ortiz, Javier Ramírez, Francisco J. Martínez-Murcia, Yu-Dong Zhang, Jordi Broncano, M. Álvaro Berbís, Javier Royuela-del-Val, Antonio Luna, Juan M. Górriz	(参考訳) 新型コロナウイルス(covid-19)パンデミック(coronavirus disease 2019)の流行は世界を変えた。世界保健機関(WHO)によると、新型コロナウイルスの感染者数は1億人以上で、240万人以上が死亡している。この疾患の早期発見は非常に重要であり、胸部X線(CXR)や胸部CT(CCT)などの医療画像の使用は優れたソリューションであることが証明されています。しかし、このプロセスでは、医師が手作業や時間を要する作業で行う必要があり、診断のスピードアップには適していない。本研究では,肺炎のパターンを識別するために,確率的支援ベクトルマシン(SVM)に基づくアンサンブル分類器を提案する。具体的には、各CCTスキャンを立方パッチに分割し、それぞれに含まれる特徴をカーネルPCAを適用して抽出する。アンサンブル内での塩基型分類器の使用により,サイズや位置に関わらず,本システムは肺炎のパターンを識別できる。個々のパッチの決定は、個々の分類の信頼性に応じてグローバルに結合されます:不確実性が低いほど、貢献度が高くなります。実際のシナリオで性能を評価し、精度は97.86%である。得られた大きな性能とシステムのシンプルさ(CCT画像におけるディープラーニングの使用は膨大な計算コストをもたらす)は、現実世界での提案の適用可能性を示しています。 The outbreak of the COVID-19 (Coronavirus disease 2019) pandemic has changed the world. According to the World Health Organization (WHO), there have been more than 100 million confirmed cases of COVID-19, including more than 2.4 million deaths. Early detection of the disease is extremely important, and the use of medical imaging such as chest X-ray (CXR) and chest Computed Tomography (CCT) has proved to be an excellent solution. However, this process requires clinicians to do it within a manual and time-consuming task, which is not ideal when trying to speed up the diagnosis. In this work, we propose an ensemble classifier based on probabilistic Support Vector Machine (SVM) in order to identify pneumonia patterns while providing information about the reliability of the classification. Specifically, each CCT scan is divided into cubic patches and features contained in each one of them are extracted by applying kernel PCA. 
The use of base classifiers within an ensemble allows our system to identify the pneumonia patterns regardless of their size or location. Decisions of each individual patch are then combined into a global one according to the reliability of each individual classification: the lower the uncertainty, the higher the contribution. Performance is evaluated in a real scenario, yielding an accuracy of 97.86%. The high performance obtained and the simplicity of the system (use of deep learning in CCT images would result in a huge computational cost) demonstrate the applicability of our proposal in a real-world environment.	翻訳日:2021-03-05 15:04:46 公開日:2021-03-04
# ノード欠落した多層グラフのクラスタリング Clustering multilayer graphs with missing nodes ( http://arxiv.org/abs/2103.03235v1 ) ライセンス: Link先を確認	Guillaume Braun, Hemant Tyagi, Christophe Biernacki	(参考訳) エージェント間の関係はグラフによって便利に表現できる。これらの関係が異なるモダリティを持つ場合、各層が1つのモダリティに関連付けられる多層グラフによりモデル化される。このようなグラフは、生物的および社会的ネットワークを含む多くの文脈で自然に生じる。クラスタリングはネットワーク分析における基本的な問題であり、同じ接続プロファイルを持つノードを再グループ化するのが目標である。過去10年間で、各レイヤが提供する情報を統合するために、一層設定から多層グラフへ様々なクラスタリング手法が拡張されてきた。既存のほとんどの作業では、すべてのレイヤが同じノードセットを共有していると仮定していますが、異なるノードセットでレイヤを定義することができる新しいフレームワークを提案します。特に、層に記録されていないノードは欠落として扱われる。このパラダイム内では,不完全なクラスタへの完全設定において,よく知られたクラスタリング手法のいくつかの一般化を調べ,多層確率ブロックモデルの仮定の下で一貫性の証明を行う。当社の理論結果は、合成データに関するアルゴリズムと実際のデータセットの数値的比較によって補完され、様々な設定における我々の手法の有望な振る舞いを強調しています。 Relationship between agents can be conveniently represented by graphs. When these relationships have different modalities, they are better modelled by multilayer graphs where each layer is associated with one modality. Such graphs arise naturally in many contexts including biological and social networks. Clustering is a fundamental problem in network analysis where the goal is to regroup nodes with similar connectivity profiles. In the past decade, various clustering methods have been extended from the unilayer setting to multilayer graphs in order to incorporate the information provided by each layer. While most existing works assume - rather restrictively - that all layers share the same set of nodes, we propose a new framework that allows for layers to be defined on different sets of nodes. In particular, the nodes not recorded in a layer are treated as missing. Within this paradigm, we investigate several generalizations of well-known clustering methods in the complete setting to the incomplete one and prove some consistency results under the Multi-Layer Stochastic Block Model assumption. 
Our theoretical results are complemented by thorough numerical comparisons between our proposed algorithms on synthetic data, and also on real datasets, thus highlighting the promising behaviour of our methods in various settings.	翻訳日:2021-03-05 15:04:23 公開日:2021-03-04
# モーメントとマッチング:模倣学習におけるトレードオフと治療 Of Moments and Matching: Trade-offs and Treatments in Imitation Learning ( http://arxiv.org/abs/2103.03236v1 ) ライセンス: Link先を確認	Gokul Swamy, Sanjiban Choudhury, Zhiwei Steven Wu, J. Andrew Bagnell	(参考訳) 我々は、モーメントマッチングのレンズを通して、過去の模倣学習アルゴリズムの大規模なファミリの統一ビューを提供する。その中心となる分類法は,(1)報奨と(2)専門家の行動の行動価値モーメントを一致させようとする学習者かに基づいており,それぞれの選択肢によって異なるアルゴリズム的アプローチが導かれる。学習者と専門家の行動の反対に選択された分岐を考慮することによって、私たちはこれらのクラスのすべてのアルゴリズムに適用する政策パフォーマンスの境界を導き出すことができます。また,従来の模擬学習において暗黙的な復元可能性の概念を導入し,各アルゴリズムファミリーが複合的誤りを軽減できるかを明確化することができる。 AdVILとAdRILという2つの新しいアルゴリズムテンプレートを、強力な保証、シンプルな実装、競争力のある実証的パフォーマンスで導出します。 We provide a unifying view of a large family of previous imitation learning algorithms through the lens of moment matching. At its core, our classification scheme is based on whether the learner attempts to match (1) reward or (2) action-value moments of the expert's behavior, with each option leading to differing algorithmic approaches. By considering adversarially chosen divergences between learner and expert behavior, we are able to derive bounds on policy performance that apply for all algorithms in each of these classes, the first to our knowledge. We also introduce the notion of recoverability, implicit in many previous analyses of imitation learning, which allows us to cleanly delineate how well each algorithmic family is able to mitigate compounding errors. We derive two novel algorithm templates, AdVIL and AdRIL, with strong guarantees, simple implementation, and competitive empirical performance.	翻訳日:2021-03-05 15:04:06 公開日:2021-03-04
# ロバストな医用画像分割のためのコンテキストフィードバックループによる学習 Learning With Context Feedback Loop for Robust Medical Image Segmentation ( http://arxiv.org/abs/2103.02844v1 ) ライセンス: Link先を確認	Kibrom Berihu Girum, Gilles Créhange, Alain Lalande	(参考訳) 深層学習は、医用画像セグメンテーションにうまく活用されている。コンボリューショナルニューラルネットワーク(CNN)を使用して、定義された画素ワイドの目的関数から特有の画像特徴を学習する。しかし、このアプローチは不完全で非現実的なセグメンテーション結果を生成する出力画素相互依存を減少させる可能性がある。本稿では,2つのシステムを用いた再帰的枠組みとしてセグメンテーション問題を定式化し,ロバストな医用画像セグメンテーションのための完全自動深層学習手法を提案する。 1つ目は、入力画像からセグメンテーション結果を予測するエンコーダデコーダcnnのフォワードシステムである。フォワードシステムの予測確率出力は、完全な畳み込みネットワーク(FCN)ベースのコンテキストフィードバックシステムによって符号化される。 FCNの符号化された特徴空間は、フォワードシステムのフィードフォワード学習プロセスに統合される。 FCNベースのコンテキストフィードバックループを使用することで、フォワードシステムはより高レベルな画像の特徴を学習し、抽出し、以前の誤りを修正し、時間とともに予測精度を向上させることができる。 4つの異なる臨床データセットで実施した実験結果から,本手法の医療画像の単一・多構造セグメント化への応用の可能性を示した。フィードバックループにより、ディープラーニングメソッドは解剖学的に実行可能で、コントラストの低い画像に対して堅牢な結果を生み出すことができる。したがって、コンテキストフィードバックループを介して2つの相互接続ネットワークの繰り返しフレームワークとして画像セグメンテーションを形成することは、堅牢で効率的な医療画像分析の潜在的な方法である。 Deep learning has successfully been leveraged for medical image segmentation. It employs convolutional neural networks (CNN) to learn distinctive image features from a defined pixel-wise objective function. However, this approach can lead to less output pixel interdependence producing incomplete and unrealistic segmentation results. In this paper, we present a fully automatic deep learning method for robust medical image segmentation by formulating the segmentation problem as a recurrent framework using two systems. The first one is a forward system of an encoder-decoder CNN that predicts the segmentation result from the input image. The predicted probabilistic output of the forward system is then encoded by a fully convolutional network (FCN)-based context feedback system. The encoded feature space of the FCN is then integrated back into the forward system's feed-forward learning process. 
Using the FCN-based context feedback loop allows the forward system to learn and extract more high-level image features and fix previous mistakes, thereby improving prediction accuracy over time. Experimental results, performed on four different clinical datasets, demonstrate our method's potential application for single and multi-structure medical image segmentation by outperforming the state-of-the-art methods. With the feedback loop, deep learning methods can now produce results that are both anatomically plausible and robust to low contrast images. Therefore, formulating image segmentation as a recurrent framework of two interconnected networks via a context feedback loop can be a potential method for robust and efficient medical image analysis.	翻訳日:2021-03-05 15:03:51 公開日:2021-03-04
# ディープニューラルネットワークを用いたx線血管造影における冠動脈狭窄の自動検出 Automated Detection of Coronary Artery Stenosis in X-ray Angiography using Deep Neural Networks ( http://arxiv.org/abs/2103.02969v1 ) ライセンス: Link先を確認	Dinis L. Rodrigues, Miguel Nobre Menezes, Fausto J. Pinto, Arlindo L. Oliveira	(参考訳) 冠動脈の一部または全部の閉塞である狭窄につながる冠動脈疾患は、毎年数百万の患者に影響を与える重篤な疾患である。最小限の侵襲的処置による狭窄度の自動同定と分類は臨床的価値が高いが、作業の複雑さのため、既存の方法は経験豊富な心科医の正確さに合致しない。狭窄を定量的に評価するための多くの計算手法が提案されているが、これらの手法の性能は臨床応用に必要なレベルには程遠い。本稿では,X線冠動脈造影画像からの狭窄検出を部分的に自動化する2段階のディープラーニングフレームワークを提案する。 2つのステップにおいて、我々は2つの異なる畳み込みニューラルネットワークアーキテクチャを使用し、1つはビューの角度を自動的に識別し分類し、もう1つは、狭窄が見えるフレームにおける関心領域の境界ボックスを決定する。転送学習とデータ拡張技術は、両方のタスクでシステムの性能を高めるために使用された。左/右冠動脈(LCA/RCA)角度ビューの分類作業において0.97の精度とLCAとRCAの関心領域の決定に関する0.68/0.73リコールを達成した。これらの結果は関連するアプローチで得られたこれまでの結果と比較し、x線血管造影から狭窄度を完全自動化する方法への道を開く。 Coronary artery disease leading up to stenosis, the partial or total blocking of coronary arteries, is a severe condition that affects millions of patients each year. Automated identification and classification of stenosis severity from minimally invasive procedures would be of great clinical value, but existing methods do not match the accuracy of experienced cardiologists, due to the complexity of the task. Although a number of computational approaches for quantitative assessment of stenosis have been proposed to date, the performance of these methods is still far from the required levels for clinical applications. In this paper, we propose a two-step deep-learning framework to partially automate the detection of stenosis from X-ray coronary angiography images. In the two steps, we used two distinct convolutional neural network architectures, one to automatically identify and classify the angle of view, and another to determine the bounding boxes of the regions of interest in frames where stenosis is visible. Transfer learning and data augmentation techniques were used to boost the performance of the system in both tasks. 
We achieved a 0.97 accuracy on the task of classifying the Left/Right Coronary Artery (LCA/RCA) angle view and 0.68/0.73 recall on the determination of the regions of interest, for LCA and RCA, respectively. These results compare favorably with previous results obtained using related approaches, and open the way to a fully automated method for the identification of stenosis severity from X-ray angiographies.	翻訳日:2021-03-05 15:03:29 公開日:2021-03-04
# PointGuard: 証明可能なロバスト性を持つ3Dポイントクラウド分類 PointGuard: Provably Robust 3D Point Cloud Classification ( http://arxiv.org/abs/2103.03046v1 ) ライセンス: Link先を確認	Hongbin Liu, Jinyuan Jia, Neil Zhenqiang Gong	(参考訳) 3Dポイントクラウド分類には、自律運転やロボットグリップなど、多くの安全クリティカルな応用がある。しかし、いくつかの研究で敵の攻撃に弱いことが示されている。特に、攻撃者は、少数のポイントを慎重に修正、追加、削除することで、分類器に3Dポイントクラウドの誤ったラベルを予測させることができる。ランダム化スムージングは、証明可能に堅牢な2D画像分類器を構築するための最先端の技術です。しかし、3Dポイントクラウド分類に適用すると、ランダム化されたスムージングは、敵対的に修正されたポイントに対する堅牢性のみを証明できます。本研究では,敵対的に修正,追加,削除された点に対する堅牢性を保証する最初の防御であるPointGuardを提案する。具体的には、3Dポイントクラウドと任意のポイントクラウド分類器を与えられた場合、PointGuardは最初に、元のポイントクラウド内のポイントのランダムサブセットを含む複数のサブサンプルポイントクラウドを作成し、PointGuardは、ポイントクラウド分類器によって予測されるサブサンプルポイントクラウドのラベルの過半数として、元のポイントクラウドのラベルを予測します。最初の大きな理論的貢献は、敵対的に修正、追加、および/または削除されたポイントの数に境界がある場合、PointGuardが3Dポイントクラウドの同じラベルを予測できることを示しています。 2つ目の大きな理論的貢献は、点クラウド分類器に仮定がない場合、導出境界の厳密性を証明することです。さらに、認証された堅牢性保証を計算する効率的なアルゴリズムを設計します。また、ModelNet40およびScanNetベンチマークデータセット上でPointGuardを実証的に評価する。 3D point cloud classification has many safety-critical applications such as autonomous driving and robotic grasping. However, several studies showed that it is vulnerable to adversarial attacks. In particular, an attacker can make a classifier predict an incorrect label for a 3D point cloud via carefully modifying, adding, and/or deleting a small number of its points. Randomized smoothing is a state-of-the-art technique to build certifiably robust 2D image classifiers. However, when applied to 3D point cloud classification, randomized smoothing can only certify robustness against adversarially modified points. In this work, we propose PointGuard, the first defense that has provable robustness guarantees against adversarially modified, added, and/or deleted points. 
Specifically, given a 3D point cloud and an arbitrary point cloud classifier, our PointGuard first creates multiple subsampled point clouds, each of which contains a random subset of the points in the original point cloud; then our PointGuard predicts the label of the original point cloud as the majority vote among the labels of the subsampled point clouds predicted by the point cloud classifier. Our first major theoretical contribution is that we show PointGuard provably predicts the same label for a 3D point cloud when the number of adversarially modified, added, and/or deleted points is bounded. Our second major theoretical contribution is that we prove the tightness of our derived bound when no assumptions on the point cloud classifier are made. Moreover, we design an efficient algorithm to compute our certified robustness guarantees. We also empirically evaluate PointGuard on ModelNet40 and ScanNet benchmark datasets.	翻訳日:2021-03-05 15:03:02 公開日:2021-03-04
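The prediction step described in the row above (random subsampling followed by a majority vote) can be sketched as below. This is only an illustration under stated assumptions: subsets are drawn uniformly without replacement, and `pointguard_predict`, `toy_classifier`, `k`, and `n_subsamples` are hypothetical names. The certified-robustness computation from the paper is not covered.

```python
import numpy as np
from collections import Counter


def pointguard_predict(points, classifier, k, n_subsamples=100, seed=0):
    """Majority vote over labels predicted on random k-point subsets (sketch)."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    votes = Counter()
    for _ in range(n_subsamples):
        # Draw k distinct point indices uniformly at random.
        idx = rng.choice(len(points), size=k, replace=False)
        votes[classifier(points[idx])] += 1
    # Return the most frequently predicted label.
    return votes.most_common(1)[0][0]


def toy_classifier(cloud):
    """Hypothetical base classifier: label 1 if the mean height is positive."""
    return int(cloud[:, 2].mean() > 0)


if __name__ == "__main__":
    clean = np.column_stack([np.zeros(200), np.zeros(200), np.ones(200)])
    # Three adversarially added outliers far below the object.
    added = np.array([[0.0, 0.0, -100.0]] * 3)
    attacked = np.vstack([clean, added])
    print("base classifier on attacked cloud:", toy_classifier(attacked))
    print("subsample vote on attacked cloud: ",
          pointguard_predict(attacked, toy_classifier, k=16))
```

The intuition is that a small subset rarely contains any of the few adversarial points, so most votes come from clean subsets even when the base classifier is fooled on the full cloud.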
# 実世界ブラインド画像復調のための畳み込み対自己組織型オペレーショナルニューラルネットワーク Convolutional versus Self-Organized Operational Neural Networks for Real-World Blind Image Denoising ( http://arxiv.org/abs/2103.03070v1 ) ライセンス: Link先を確認	Junaid Malik, Serkan Kiranyaz, Mehmet Yamac, Esin Guldogan, Moncef Gabbouj	(参考訳) 実世界のブラインドデノージングは、基礎となるノイズ分布の非決定論的性質のため、ユニークな画像復元に挑戦する。合成雑音モデルで訓練された一般的な識別ネットワークは、実世界のノイズ画像に悪影響を与えることが示されている。実世界のノイズ画像のキュレーションと地上の真理推定手順の改善は依然として重要なポイントであるが、潜在的研究の方向性は、より深い畳み込みニューラルネットワーク(CNN)を使うのとは対照的に、より少ないデータと低いネットワークの複雑さでより良い一般化を可能にするために広く使用される畳み込みニューロンモデルの拡張を探索することである。オペレーショナルニューラルネットワーク(ONNs)とその最近の変種である自己組織化ONN(Self-ONNs)は、強化された非線形性をニューロンモデルに組み込むことを提案し、様々な回帰タスクでCNNを上回ることが示されています。しかし、これらの比較はすべてコンパクトなネットワークで行われており、現代のディープアーキテクチャにおける畳み込みレイヤの代替として運用層を配置する効果は、まだ確認されていない。そこで本研究では,実世界のブラインド画像のデノジング問題に初めて,深い自己オンを用いて対処する。最先端の深層CNNネットワークであるDnCNNに対して、複数のメトリクスにまたがる広範囲な定量的および定性的評価と、高解像度の4つの実世界のノイズ画像データセットは、PSNRにおいて最大1.76dBの性能向上を確実に達成していることが明らかとなった。さらに、DnCNNの計算リソースのほんの一部だけを必要とするレイヤーの数を半分から4分の1まで持つSelf-ONNは、最先端のものと同じまたはより良い結果を達成できます。 Real-world blind denoising poses a unique image restoration challenge due to the non-deterministic nature of the underlying noise distribution. Prevalent discriminative networks trained on synthetic noise models have been shown to generalize poorly to real-world noisy images. While curating real-world noisy images and improving ground truth estimation procedures remain key points of interest, a potential research direction is to explore extensions to the widely used convolutional neuron model to enable better generalization with fewer data and lower network complexity, as opposed to simply using deeper Convolutional Neural Networks (CNNs). Operational Neural Networks (ONNs) and their recent variant, Self-organized ONNs (Self-ONNs), propose to embed enhanced non-linearity into the neuron model and have been shown to outperform CNNs across a variety of regression tasks. 
However, all such comparisons have been made for compact networks and the efficacy of deploying operational layers as a drop-in replacement for convolutional layers in contemporary deep architectures remains to be seen. In this work, we tackle the real-world blind image denoising problem by employing, for the first time, a deep Self-ONN. Extensive quantitative and qualitative evaluations spanning multiple metrics and four high-resolution real-world noisy image datasets against the state-of-the-art deep CNN network, DnCNN, reveal that deep Self-ONNs consistently achieve superior results with performance gains of up to 1.76dB in PSNR. Furthermore, Self-ONNs with half or even a quarter of the layers, requiring only a fraction of the computational resources of DnCNN, can still achieve similar or better results than the state of the art.	翻訳日:2021-03-05 15:02:39 公開日:2021-03-04
# ジェネラティブ手法を用いたプライバシ攻撃に対する医療画像診断の防御 Defending Medical Image Diagnostics against Privacy Attacks using Generative Methods ( http://arxiv.org/abs/2103.03078v1 ) ライセンス: Link先を確認	William Paul, Yinzhi Cao, Miaomiao Zhang, and Phil Burlina	(参考訳) 医療画像診断に使用される機械学習(ML)モデルは、メンバーシップ推論攻撃を含むさまざまなプライバシー攻撃に脆弱になり、医療データの使用を規制する規制に違反し、診療所での効果的な展開を妨害する恐れがあります。本稿では,モデル変更と後処理ステップに着目したプライバシアウェアmlの最近の研究とは対照的に,データ共有プロセスを制御することで医療データのセキュリティを高める新しい補完スキームを提案する。本稿では,医療データ発信者に対してGAN(Generative Adversarial Network)を用いたプライバシ保護プロトコルの開発と評価を行う。病院) 原画像から合成されたプロキシデータセットを外部エージェント(モデラー)に提供することで、モデル消費者が利用可能な診断システムは、プライバシ攻撃者に対してレジリエントになる。本研究では, 糖尿病性網膜症に用いる網膜診断AIについて, 個人情報が漏洩するリスクがあることを示す。プライバシー擁護者とモデラーの両方の懸念を組み込むために、プライバシーとユーティリティのパフォーマンスを組み合わせ評価するメトリクスを導入し、これらの新旧のメトリクスを使用して、私たちのアプローチは、それ自体または他の防御と組み合わせて、プライバシー攻撃から守るための最先端の(SOTA)パフォーマンスを提供します。 Machine learning (ML) models used in medical imaging diagnostics can be vulnerable to a variety of privacy attacks, including membership inference attacks, that lead to violations of regulations governing the use of medical data and threaten to compromise their effective deployment in the clinic. In contrast to most recent work in privacy-aware ML that has been focused on model alteration and post-processing steps, we propose here a novel and complementary scheme that enhances the security of medical data by controlling the data sharing process. We develop and evaluate a privacy defense protocol based on using a generative adversarial network (GAN) that allows a medical data sourcer (e.g. a hospital) to provide an external agent (a modeler) a proxy dataset synthesized from the original images, so that the resulting diagnostic systems made available to model consumers is rendered resilient to privacy attackers. We validate the proposed method on retinal diagnostics AI used for diabetic retinopathy that bears the risk of possibly leaking private information. 
To incorporate concerns of both privacy advocates and modelers, we introduce a metric to evaluate privacy and utility performance in combination, and demonstrate, using these novel and classical metrics, that our approach, by itself or in conjunction with other defenses, provides state-of-the-art (SOTA) performance for defending against privacy attacks.	翻訳日:2021-03-05 15:01:52 公開日:2021-03-04
# スパースランダム特徴による関数近似 Function Approximation via Sparse Random Features ( http://arxiv.org/abs/2103.03191v1 ) ライセンス: Link先を確認	Abolfazl Hashemi, Hayden Schaeffer, Robert Shi, Ufuk Topcu, Giang Tran, Rachel Ward	(参考訳) ランダム特徴法は様々な機械学習タスクで成功し、計算が容易で、理論的に精度の限界がある。コストのかかるトレーニングフェーズなしで同様の関数空間を表現できるため、標準的なニューラルネットワークに代わるアプローチとして機能します。しかしながら、正確性のため、ランダム特徴法はトレーニング可能なパラメータよりも多くの測定を必要とするため、データ収集アプリケーションや科学的な機械学習における問題に対する使用が制限される。本稿では,圧縮センシングの手法を用いて無作為特徴モデルを学習する分散ランダム特徴量法を提案する。再生カーネルヒルベルト空間における関数の近似誤差について、サンプル数と特徴量の分布に依存する一様境界を与える。誤差境界は、座標の間隔、スペクトルのコンパクトなクラスター、または急速なスペクトル崩壊などの追加の構造条件で改善される。分散ランダム特徴法は,十分に構造化された機能や科学的機械学習タスクへの応用において,浅層ネットワークよりも優れていることを示す。 Random feature methods have been successful in various machine learning tasks, are easy to compute, and come with theoretical accuracy bounds. They serve as an alternative approach to standard neural networks since they can represent similar function spaces without a costly training phase. However, for accuracy, random feature methods require more measurements than trainable parameters, limiting their use for data-scarce applications or problems in scientific machine learning. This paper introduces the sparse random feature method that learns parsimonious random feature models utilizing techniques from compressive sensing. We provide uniform bounds on the approximation error for functions in a reproducing kernel Hilbert space depending on the number of samples and the distribution of features. The error bounds improve with additional structural conditions, such as coordinate sparsity, compact clusters of the spectrum, or rapid spectral decay. We show that the sparse random feature method outperforms shallow networks for well-structured functions and applications to scientific machine learning tasks.	翻訳日:2021-03-05 15:01:27 公開日:2021-03-04
# 音声言語理解に関する調査 : 最近の進歩と新たなフロンティア A Survey on Spoken Language Understanding: Recent Advances and New Frontiers ( http://arxiv.org/abs/2103.03095v1 ) ライセンス: Link先を確認	Libo Qin, Tianbao Xie, Wanxiang Che, Ting Liu	(参考訳) SLU(Spoken Language Understanding)は、タスク指向ダイアログシステムの中核コンポーネントであるユーザクエリのセマンティクスフレームを抽出することを目的としている。深層ニューラルネットワークの破裂と事前訓練された言語モデルの進化により、SLUの研究は大きなブレークスルーを得た。しかし、既存のアプローチと最近のトレンドを要約した包括的な調査がいまだに欠落しており、この記事で提示された研究の動機となっている。本稿では、SLUの最近の進歩と新しいフロンティアを調査します。 Specifically, we give a thorough review of this research field, covering different aspects including (1) new taxonomy: we provide a new perspective for SLU filed, including single model vs. joint model, implicit joint modeling vs. explicit joint modeling in joint model, non pre-trained paradigm vs. pre-trained paradigm;(2) new frontiers: some emerging areas in complex SLU as well as the corresponding challenges; (3) abundant open-source resources: to help the community, we have collected, organized the related papers, baseline projects and leaderboard on a public website where SLU researchers could directly access to the recent progress. この調査が今後のSLU分野の研究に光を当てることを願っている。 Spoken Language Understanding (SLU) aims to extract the semantics frame of user queries, which is a core component in a task-oriented dialog system. With the burst of deep neural networks and the evolution of pre-trained language models, the research of SLU has obtained significant breakthroughs. However, there remains a lack of a comprehensive survey summarizing existing approaches and recent trends, which motivated the work presented in this article. In this paper, we survey recent advances and new frontiers in SLU. Specifically, we give a thorough review of this research field, covering different aspects including (1) new taxonomy: we provide a new perspective for SLU filed, including single model vs. joint model, implicit joint modeling vs. explicit joint modeling in joint model, non pre-trained paradigm vs. 
pre-trained paradigm; (2) new frontiers: some emerging areas in complex SLU as well as the corresponding challenges; (3) abundant open-source resources: to help the community, we have collected and organized the related papers, baseline projects, and leaderboards on a public website where SLU researchers can directly access the recent progress. We hope that this survey can shed light on future research in the SLU field.	翻訳日:2021-03-05 15:01:11 公開日:2021-03-04
# エンドツーエンド同時音声翻訳復号戦略の実証的研究 An Empirical Study of End-to-end Simultaneous Speech Translation Decoding Strategies ( http://arxiv.org/abs/2103.03233v1 ) ライセンス: Link先を確認	Ha Nguyen, Yannick Estève, Laurent Besacier	(参考訳) 本稿では,エンドツーエンドの同時音声翻訳のためのデコード戦略を提案する。オフラインモードで訓練されたエンドツーエンドモデルを活用し、2つの言語ペア(英語-ドイツ語と英語-ポルトガル語)の実証的研究を行います。また,文字やByte Pair Encoding (BPE)ユニットなど,さまざまな出力トークンの粒度についても検討する。その結果, BLEU/Average Laggingのトレードオフを, 異なる遅延方式で制御できることが示された。最適な復号化設定は,IWSLT 2020共有タスクの同時翻訳トラックで評価された強力なカスケードモデルと同等の結果を達成する。 This paper proposes a decoding strategy for end-to-end simultaneous speech translation. We leverage end-to-end models trained in offline mode and conduct an empirical study for two language pairs (English-to-German and English-to-Portuguese). We also investigate different output token granularities including characters and Byte Pair Encoding (BPE) units. The results show that the proposed decoding approach allows controlling the BLEU/Average Lagging trade-off across different latency regimes. Our best decoding settings achieve results comparable to a strong cascade model evaluated on the simultaneous translation track of the IWSLT 2020 shared task.	翻訳日:2021-03-05 15:00:54 公開日:2021-03-04
# PC2WF: 原点雲からの3Dワイヤフレーム再構築 PC2WF: 3D Wireframe Reconstruction from Raw Point Clouds ( http://arxiv.org/abs/2103.02766v1 ) ライセンス: Link先を確認	Yujia Liu, Stefano D'Aronco, Konrad Schindler, Jan Dirk Wegner	(参考訳) PC2WFは,3Dポイントクラウドをワイヤフレームモデルに変換するための,最初のエンドツーエンドトレーニング可能なディープネットワークアーキテクチャである。ネットワークは、あるオブジェクトの表面からサンプリングされた無秩序な3dポイントのセットを入力とし、そのオブジェクトのワイヤーフレーム、すなわち線分でリンクされたコーナーポイントのスパースセットを出力する。ワイヤフレームの復元は難しい作業であり、頂点とエッジの数が各インスタンスで異なるため、a-prioriは未知である。私たちのアーキテクチャは徐々にモデルを構築し、ポイントを特徴ベクトルにエンコードすることから始まります。これらの特徴に基づいて、候補者頂点のプールを特定し、候補者をコーナー頂点の最終セットにプルーンし、位置を洗練します。次に、コーナーは、最終的なワイヤフレームを得るために再びプルーニングされる候補エッジの総括セットにリンクされる。すべてのステップはトレーニング可能で、エラーはシーケンス全体をバックプロパゲーションすることができる。提案したモデルを,地上の真理線フレームにアクセス可能な公開合成データセットと,新たな実世界のデータセットで検証する。我々のモデルは、優れた品質のワイヤフレーム抽象化を生成し、いくつかのベースラインを上回ります。 We introduce PC2WF, the first end-to-end trainable deep network architecture to convert a 3D point cloud into a wireframe model. The network takes as input an unordered set of 3D points sampled from the surface of some object, and outputs a wireframe of that object, i.e., a sparse set of corner points linked by line segments. Recovering the wireframe is a challenging task, where the numbers of both vertices and edges are different for every instance, and a-priori unknown. Our architecture gradually builds up the model: It starts by encoding the points into feature vectors. Based on those features, it identifies a pool of candidate vertices, then prunes those candidates to a final set of corner vertices and refines their locations. Next, the corners are linked with an exhaustive set of candidate edges, which is again pruned to obtain the final wireframe. All steps are trainable, and errors can be backpropagated through the entire sequence. We validate the proposed model on a publicly available synthetic dataset, for which the ground truth wireframes are accessible, as well as on a new real-world dataset. Our model produces wireframe abstractions of good quality and outperforms several baselines.	
翻訳日:2021-03-05 14:59:33 公開日:2021-03-04
# 顔認識がオクルージョンを満たすとき: 新しいベンチマーク When Face Recognition Meets Occlusion: A New Benchmark ( http://arxiv.org/abs/2103.02805v1 ) ライセンス: Link先を確認	Baojin Huang, Zhongyuan Wang, Guangcheng Wang, Kui Jiang, Kangli Zeng, Zhen Han, Xin Tian, Yuhong Yang	(参考訳) 既存の顔認識データセットは、通常、顔認識の開発を妨げる閉塞サンプルを欠いています。特に新型コロナウイルス(COVID-19)の流行により、マスクの着用はウイルスの拡散を防ぐ効果的な手段となっている。既存のデータセットでトレーニングされた従来のCNNベースの顔認識モデルは、重閉塞に対してほとんど効果がない。この目的のために,シミュレーションによるオクルージョン顔認識データセットを考案する。特に,まず様々な眼鏡やマスクを隠蔽として収集し,隠蔽属性(隠蔽物,テクスチャ,色)をランダムに組み合わせて,より現実的な隠蔽タイプを多数達成する。それから私達は正常な閉塞の習慣の顔のイメージの適切な位置でそれらを覆います。さらに,オリジナル正規顔画像とオクルード顔画像を組み合わせて,webface-occと呼ばれる最終データセットを形成する。その多様性と安定性を確保するために、10,575人の被験者の804,704枚の顔画像をカバーしています。公開データセットに関する広範な実験は、データセットで再トレーニングされたarcfaceが最先端を著しく上回っていることを示している。 Webface-OCCはhttps://github.com/Baojin-Huang/Webface-OCCで入手できる。 The existing face recognition datasets usually lack occlusion samples, which hinders the development of face recognition. Especially during the COVID-19 coronavirus epidemic, wearing a mask has become an effective means of preventing the virus spread. Traditional CNN-based face recognition models trained on existing datasets are almost ineffective for heavy occlusion. To this end, we pioneer a simulated occlusion face recognition dataset. In particular, we first collect a variety of glasses and masks as occlusion, and randomly combine the occlusion attributes (occlusion objects, textures,and colors) to achieve a large number of more realistic occlusion types. We then cover them in the proper position of the face image with the normal occlusion habit. Furthermore, we reasonably combine original normal face images and occluded face images to form our final dataset, termed as Webface-OCC. It covers 804,704 face images of 10,575 subjects, with diverse occlusion types to ensure its diversity and stability. Extensive experiments on public datasets show that the ArcFace retrained by our dataset significantly outperforms the state-of-the-arts. 
Webface-OCC is available at https://github.com/Baojin-Huang/Webface-OCC.	翻訳日:2021-03-05 14:59:12 公開日:2021-03-04
# 構造条件付き逆学習による画像分類のための教師なし領域適応 Unsupervised Domain Adaptation for Image Classification via Structure-Conditioned Adversarial Learning ( http://arxiv.org/abs/2103.02808v1 ) ライセンス: Link先を確認	Hui Wang, Jian Tian, Songyuan Li, Hanbin Zhao, Qi Tian, Fei Wu, and Xi Li	(参考訳) Unsupervised Domain Adapt (UDA) は、典型的には、ラベルリッチなソースドメインから非ラベル付きターゲットドメインへの知識転送を、逆学習によって行う。原則として、既存のUDAアプローチは主にドメイン間のグローバルな分布アライメントに焦点を当て、固有の局所分布特性を無視している。そこで本研究では,ドメイン分散アライメント中のクラス内コンパクト性を維持可能な,エンドツーエンド構造条件付き対向学習スキーム(SCAL)を提案する。局所構造を構造認識条件として用いることで,提案手法を構造条件付き逆学習パイプラインに実装する。上記学習手順は、局所構造確立と構造条件付き逆学習とを交互に交互に行う。 UDAシナリオにおける提案手法の有効性を実験的に実証した。 Unsupervised domain adaptation (UDA) typically carries out knowledge transfer from a label-rich source domain to an unlabeled target domain by adversarial learning. In principle, existing UDA approaches mainly focus on the global distribution alignment between domains while ignoring the intrinsic local distribution properties. Motivated by this observation, we propose an end-to-end structure-conditioned adversarial learning scheme (SCAL) that is able to preserve the intra-class compactness during domain distribution alignment. By using local structures as structure-aware conditions, the proposed scheme is implemented in a structure-conditioned adversarial learning pipeline. The above learning procedure is iteratively performed by alternating between local structures establishment and structure-conditioned adversarial learning. Experimental results demonstrate the effectiveness of the proposed scheme in UDA scenarios.	翻訳日:2021-03-05 14:58:55 公開日:2021-03-04
# セマンティックアグリゲーションとアダプティブ2D-1Dレジストレーションによるカメラ空間ハンドメッシュの回復 Camera-Space Hand Mesh Recovery via Semantic Aggregation and Adaptive 2D-1D Registration ( http://arxiv.org/abs/2103.02845v1 ) ライセンス: Link先を確認	Xingyu Chen, Yufeng Liu, Chongyang Ma, Jianlong Chang, Huayan Wang, Tian Chen, Xiaoyan Guo, Pengfei Wan, Wen Zheng	(参考訳) 近年,3dハンドメッシュの回復が著しい進展を遂げている。しかし、本質的な2Dから3Dの曖昧さのために、単一のRGB画像からカメラ空間の3D情報を回復することは困難のままです。この問題に対処するため、カメラ空間メッシュ回復を2つのサブタスク、すなわちルート相対メッシュ回復とルート回復に分割する。まず、単一の入力画像からジョイントランドマークとシルエットを抽出し、3Dタスクに2Dキューを提供します。ルート関係メッシュ回収タスクでは,ジョイント間の意味関係を利用して抽出した2次元キューから3次元メッシュを生成する。このような生成された3Dメッシュ座標は、手首というルート位置に対して表現される。ルート回収タスクでは、生成した3Dメッシュを2Dキューに戻すことにより、カメラ空間にルート位置を登録し、カメラ空間の3Dメッシュ回復を完了させる。このパイプラインは,(1)関節間の既知の意味関係を明示的に利用し,(2)シルエットとメッシュの1次元投影を利用してロバストな登録を実現している。 FreiHAND、RHD、Human3.6Mなどの一般的なデータセットに関する広範な実験は、私たちのアプローチがルート相対メッシュリカバリとルートリカバリの両方で最先端のパフォーマンスを達成することを実証しています。私たちのコードはhttps://github.com/SeanChenxy/HandMeshで公開されています。 Recent years have witnessed significant progress in 3D hand mesh recovery. Nevertheless, because of the intrinsic 2D-to-3D ambiguity, recovering camera-space 3D information from a single RGB image remains challenging. To tackle this problem, we divide camera-space mesh recovery into two sub-tasks, i.e., root-relative mesh recovery and root recovery. First, joint landmarks and silhouette are extracted from a single input image to provide 2D cues for the 3D tasks. In the root-relative mesh recovery task, we exploit semantic relations among joints to generate a 3D mesh from the extracted 2D cues. Such generated 3D mesh coordinates are expressed relative to a root position, i.e., wrist of the hand. In the root recovery task, the root position is registered to the camera space by aligning the generated 3D mesh back to 2D cues, thereby completing camera-space 3D mesh recovery. 
Our pipeline is novel in that (1) it explicitly makes use of known semantic relations among joints and (2) it exploits 1D projections of the silhouette and mesh to achieve robust registration. Extensive experiments on popular datasets such as FreiHAND, RHD, and Human3.6M demonstrate that our approach achieves state-of-the-art performance on both root-relative mesh recovery and root recovery. Our code is publicly available at https://github.com/SeanChenxy/HandMesh.	翻訳日:2021-03-05 14:58:43 公開日:2021-03-04
# 微分可能なニューラルレンダリングによる物体検出のためのデータ拡張 Data Augmentation for Object Detection via Differentiable Neural Rendering ( http://arxiv.org/abs/2103.02852v1 ) ライセンス: Link先を確認	Guanghan Ning, Guang Chen, Chaowei Tan, Si Luo, Liefeng Bo, Heng Huang	(参考訳) 注釈付きデータが乏しい場合、堅牢なオブジェクト検出器を訓練することは困難です。この問題に対処する既存のアプローチには、ラベルなしデータからラベル付きデータを補間する半教師付き学習、プリテキストタスクを介してラベルなしデータ内の信号を利用する自己教師付き学習などがある。教師付き学習パラダイムを変えることなく,学習データを新しいビューで意味的に補間する,オブジェクト検出のためのオフラインデータ拡張手法を導入する。具体的には,人間の介入を伴わない境界ボックスアノテーションとともに,識別可能なニューラルレンダリングに基づくトレーニング画像の制御可能なビューを生成する。まず,深度マップを推定しながら,画素整列画像の特徴を点雲に抽出・投影する。次に、ターゲットカメラのポーズでそれらを再投影し、新しいビュー2d画像を描画する。キーポイント形式のオブジェクトはポイントクラウドにマークされ、新しいビューでアノテーションを復元します。アフィン変換やイメージミックスアップなどのオンラインデータ拡張手法と完全に互換性がある。広範な実験により,画像やラベルを強調するコストのないツールとして,訓練データが少ない物体検出システムの性能を著しく向上させることができることが示された。コードは \url{https://github.com/Guanghan/DANR} で入手できる。 It is challenging to train a robust object detector when annotated data is scarce. Existing approaches to tackle this problem include semi-supervised learning that interpolates labeled data from unlabeled data, self-supervised learning that exploit signals within unlabeled data via pretext tasks. Without changing the supervised learning paradigm, we introduce an offline data augmentation method for object detection, which semantically interpolates the training data with novel views. Specifically, our proposed system generates controllable views of training images based on differentiable neural rendering, together with corresponding bounding box annotations which involve no human intervention. Firstly, we extract and project pixel-aligned image features into point clouds while estimating depth maps. We then re-project them with a target camera pose and render a novel-view 2d image. Objects in the form of keypoints are marked in point clouds to recover annotations in new views. It is fully compatible with online data augmentation methods, such as affine transform, image mixup, etc. 
Extensive experiments show that our method, as a cost-free tool to enrich images and labels, can significantly boost the performance of object detection systems with scarce training data. Code is available at https://github.com/Guanghan/DANR.	翻訳日:2021-03-05 14:58:20 公開日:2021-03-04
# 相互一貫性トレーニングによる半教師付き左心房セグメンテーション Semi-supervised Left Atrium Segmentation with Mutual Consistency Training ( http://arxiv.org/abs/2103.02911v1 ) ライセンス: Link先を確認	Yicheng Wu, Minfeng Xu, Zongyuan Ge, Jianfei Cai and Lei Zhang	(参考訳) 半教師付き学習は、トレーニングのために大量の注釈データを集めることの重荷を軽減し、特に医用画像分割タスクにおいて機械学習の分野で大きな注目を集めている。しかし、既存の方法のほとんどは挑戦的な地域(例えば)の重要性を過小評価している。訓練中の小さな枝またはぼやけた縁) これらの未ラベル領域には、モデルの不確実性予測を最小限に抑えるためにより重要な情報が含まれており、トレーニングプロセスにおいて強調されるべきであると考えている。そこで本稿では,3次元MR画像からの半教師付き左房分割のための新しいMultual Consistency Network(MC-Net)を提案する。特に、MC-Netは1つのエンコーダと2つのわずかに異なるデコーダから構成されており、2つのデコーダの予測誤差は、相互整合性を促進するために設計された疑似ラベルスキームによって教師なしの損失として変換される。このような相互整合性は、2つのデコーダの一貫性と低エントロピー予測を奨励し、モデルがこれらのラベルのない挑戦領域から徐々に一般化された特徴を捉えることを可能にする。我々は,公開左心房(la)データベース上でmc-netを評価し,ラベルなしデータを効果的に活用することで印象的な性能向上を実現する。我々のMC-Netは、最近6つの半教師付き左房セグメンテーション法より優れており、LAデータベース上で新しい最先端性能を設定できる。 Semi-supervised learning has attracted great attention in the field of machine learning, especially for medical image segmentation tasks, since it alleviates the heavy burden of collecting abundant densely annotated data for training. However, most of existing methods underestimate the importance of challenging regions (e.g. small branches or blurred edges) during training. We believe that these unlabeled regions may contain more crucial information to minimize the uncertainty prediction for the model and should be emphasized in the training process. Therefore, in this paper, we propose a novel Mutual Consistency Network (MC-Net) for semi-supervised left atrium segmentation from 3D MR images. Particularly, our MC-Net consists of one encoder and two slightly different decoders, and the prediction discrepancies of two decoders are transformed as an unsupervised loss by our designed cycled pseudo label scheme to encourage mutual consistency. Such mutual consistency encourages the two decoders to have consistent and low-entropy predictions and enables the model to gradually capture generalized features from these unlabeled challenging regions. 
We evaluate our MC-Net on the public Left Atrium (LA) database and it obtains impressive performance gains by exploiting the unlabeled data effectively. Our MC-Net outperforms six recent semi-supervised methods for left atrium segmentation, and sets the new state-of-the-art performance on the LA database.	翻訳日:2021-03-05 14:58:00 公開日:2021-03-04
# QAIR:画像検索のためのクエリ効率の高いブラックボックス攻撃 QAIR: Practical Query-efficient Black-Box Attacks for Image Retrieval ( http://arxiv.org/abs/2103.02927v1 ) ライセンス: Link先を確認	Xiaodan Li, Jinfeng Li, Yuefeng Chen, Shaokai Ye, Yuan He, Shuhui Wang, Hang Su, Hui Xue	(参考訳) 画像検索に対するクエリーベースの攻撃について検討し、ブラックボックス設定下では、データベースから上位kランクのラベルなし画像へのクエリーアクセスしか持たない、敵対的な例に対する堅牢性を評価する。返却されたラベルや信頼スコアに応じて敵を生成する画像分類におけるクエリーアタックと比較すると、部分的検索リストにおける攻撃効果の定量化が困難であるため、課題はさらに顕著になる。本稿では,QAIR(Query-based Attack against Image Retrieval)を初めて試行し,トップk検索結果を完全に変換する。具体的には、攻撃前後の上位K検索結果に対するセット類似度を測定し、勾配最適化を導くことにより、攻撃効果の定量化を図る。攻撃効率をさらに高めるため、ターゲットモデル上で転送可能なプリエントを取得し、事前誘導勾配を生成する再帰的モデル盗み法を提案する。総合的な実験により,ブラックボックス設定による画像検索システムに対するクエリ数が少なく,高い攻撃成功率を達成した。現実世界のビジュアル検索エンジンの攻撃評価は、Bing Visual Searchのような商用システムを、平均33のクエリで98%の攻撃成功率で欺くことに成功したことを示している。 We study the query-based attack against image retrieval to evaluate its robustness against adversarial examples under the black-box setting, where the adversary only has query access to the top-k ranked unlabeled images from the database. Compared with query attacks in image classification, which produce adversaries according to the returned labels or confidence score, the challenge becomes even more prominent due to the difficulty in quantifying the attack effectiveness on the partial retrieved list. In this paper, we make the first attempt in Query-based Attack against Image Retrieval (QAIR), to completely subvert the top-k retrieval results. Specifically, a new relevance-based loss is designed to quantify the attack effects by measuring the set similarity on the top-k retrieval results before and after attacks and guide the gradient optimization. To further boost the attack efficiency, a recursive model stealing method is proposed to acquire transferable priors on the target model and generate the prior-guided gradients. 
Comprehensive experiments show that the proposed attack achieves a high attack success rate with few queries against image retrieval systems under the black-box setting. Attack evaluations on a real-world visual search engine show that it successfully deceives a commercial system such as Bing Visual Search with a 98% attack success rate using only 33 queries on average.	翻訳日:2021-03-05 14:57:37 公開日:2021-03-04
# Visual Question Answering: どのアプリケーションを調査したか? Visual Question Answering: which investigated applications? ( http://arxiv.org/abs/2103.02937v1 ) ライセンス: Link先を確認	Silvio Barra, Carmen Bisogni, Maria De Marsico, Stefano Ricciardi	(参考訳) VQA(Visual Question Answering)は、コンピュータビジョン(CV)と自然言語処理(NLP)が最近出会った非常に刺激的で挑戦的な研究分野である。画像キャプションとビデオ要約では、セマンティック情報は静止画またはビデオダイナミクスに完全に含まれており、人間の一貫性のある方法でマイニングおよび表現されるだけです。これとは違って、同じメディア内のVQAセマンティック情報は、自然言語で表現された質問によって暗示されるセマンティックスと比較されなければならない。 VQAアプローチに関する最近の調査では、画像関連処理や言語関連処理の基礎となる手法や、伝達された情報を一貫して融合させる方法に焦点が当てられている。実際、引用されたほとんどの作品は、VQAシステムのビルディングブロックを評価するために使用される汎用データセットに依存しています。本稿では、実際のアプリケーションにフォーカスした提案を検討し、アプリケーションドメインにバインドされた適切なデータをベンチマークとして使用する可能性について考察する。また、VQA研究における最近の課題についても報告する。 Visual Question Answering (VQA) is an extremely stimulating and challenging research area where Computer Vision (CV) and Natural Language Processing (NLP) have recently met. In image captioning and video summarization, the semantic information is completely contained in still images or video dynamics, and it has only to be mined and expressed in a human-consistent way. Differently from this, in VQA semantic information in the same media must be compared with the semantics implied by a question expressed in natural language, doubling the artificial intelligence-related effort. Some recent surveys about VQA approaches have focused on methods underlying either the image-related processing or the verbal-related one, or on the way to consistently fuse the conveyed information. Possible applications are only suggested, and, in fact, most cited works rely on general-purpose datasets that are used to assess the building blocks of a VQA system. This paper rather considers the proposals that focus on real-world applications, possibly using as benchmarks suitable data bound to the application domain. The paper also reports on some recent challenges in VQA research.	翻訳日:2021-03-05 14:57:16 公開日:2021-03-04
# MOGAN:単一画像からの形態学的構造認識ジェネラティブラーニング MOGAN: Morphologic-structure-aware Generative Learning from a Single Image ( http://arxiv.org/abs/2103.02997v1 ) ライセンス: Link先を確認	Jinshu Chen, Qihui Xu, Qi Kang and MengChu Zhou	(参考訳) ユーザの関心領域(ROI)が与えられたほとんどのインタラクティブな画像生成タスクにおいて、生成した結果は、元の画像の正確かつ合理的な構造を維持しつつ、外観に適切な多様性が期待できる。このようなタスクは、限られたデータしか利用できない場合、より困難になる。近年,1つの画像のみに基づく生成モデルによる完全学習が提案されている。彼らはサンプル内の異なるオブジェクトの実際の意味情報を無視しながら、サンプルのモノリシックな特徴に多くの注意を払います。その結果、ROIベースの生成タスクでは、関連するオブジェクトの正しい構造を維持することなく、過度のランダム性を持つ不適切なサンプルを生成する可能性があります。この問題に対処するために,MOGAN と呼ばれるMOrphological-aware Generative Adversarial Networkを導入し,単一の画像のみに基づいて,多様な外観と信頼性を有するランダムなサンプルを生成する。 roiのトレーニングのために,原画像からのデータを拡張し,これらの拡張データを構造と外観の両方を含む知識に変換する新しいモジュールを導入することで,モデルのサンプル理解を高めることを提案する。 ROI以外の残りの領域を学ぶために、ROIから分離された生成を保証するためにバイナリマスクを使用します。最後に、上記の学習プロセスの並列および階層的な分岐を設定した。他の単一画像GAN方式と比較して,本手法は合理的な構造維持や外観の変化など,内部的な特徴に重点を置いている。実験では、ROIベースの画像生成タスクにおける私たちのモデルの能力は、競合相手よりも優れています。 In most interactive image generation tasks, given regions of interest (ROI) by users, the generated results are expected to have adequate diversities in appearance while maintaining correct and reasonable structures in original images. Such tasks become more challenging if only limited data is available. Recently proposed generative models complete training based on only one image. They pay much attention to the monolithic feature of the sample while ignoring the actual semantic information of different objects inside the sample. As a result, for ROI-based generation tasks, they may produce inappropriate samples with excessive randomicity and without maintaining the related objects' correct structures. To address this issue, this work introduces a MOrphologic-structure-aware Generative Adversarial Network named MOGAN that produces random samples with diverse appearances and reliable structures based on only one image. 
To train on the ROI, we propose to augment data from the original image and introduce a novel module that transforms the augmented data into knowledge containing both structures and appearances, thus enhancing the model's comprehension of the sample. To learn the remaining areas outside the ROI, we employ binary masks to ensure that their generation is isolated from the ROI. Finally, we set up parallel and hierarchical branches of this learning process. Compared with other single-image GAN schemes, our approach focuses on internal features, including the maintenance of rational structures and variation in appearance. Experiments confirm that our model performs better on ROI-based image generation tasks than its competitors.	翻訳日:2021-03-05 14:56:57 公開日:2021-03-04
# CoTr:3D医療画像セグメンテーションのための効率の良いCNNとトランスフォーマー CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image Segmentation ( http://arxiv.org/abs/2103.03024v1 ) ライセンス: Link先を確認	Yutong Xie, Jianpeng Zhang, Chunhua Shen, Yong Xia	(参考訳) 畳み込みニューラルネットワーク(CNN)は、現代の3D医療画像セグメンテーションのデファクトスタンダードとなっている。しかし、これらのネットワークで使用される畳み込み操作は、局所性および重量共有の誘導バイアスのために長距離依存性のモデリングに必然的に制限がある。 Transformerはこの問題に対処するために生まれたが、高解像度の3D特徴マップを処理する際の計算量と空間的複雑さに悩まされている。本稿では, 正確な3次元医用画像分割のために, Convolutional Neural Network と Transformer を効率的に橋渡しする新しいフレームワーク (CoTr) を提案する。このフレームワークの下で、CNNは特徴表現を抽出するために構築され、抽出された特徴マップ上の長距離依存性をモデル化する効率的な変形可能なトランスフォーマー(DeTrans)が構築される。画像位置を均等に扱うバニラ変換器とは異なり、DeTransは変形可能な自己認識機構を導入することで、キー位置の小さなセットにのみ注意を払う。したがって、DeTransの計算と空間の複雑さは大幅に減少し、画像分割において最も重要となるマルチスケールで高解像度な特徴写像を処理できるようになった。 11の主要なヒト臓器をカバーするBCV(Multi-Atlas Labeling Beyond the Cranial Vault)データセットについて広範な評価を行っています。その結果, CoTrは他のCNNベース, トランスフォーマーベース, ハイブリッド法に比べて, 3次元マルチオーガンセグメンテーションタスクの性能が大幅に向上した。コードは https://github.com/YtongXie/CoTr で入手できる。 Convolutional neural networks (CNNs) have been the de facto standard for modern 3D medical image segmentation. The convolutional operations used in these networks, however, inevitably have limitations in modeling the long-range dependency due to their inductive bias of locality and weight sharing. Although Transformer was born to address this issue, it suffers from extreme computational and spatial complexities in processing high-resolution 3D feature maps. In this paper, we propose a novel framework that efficiently bridges a Convolutional neural network and a Transformer (CoTr) for accurate 3D medical image segmentation. Under this framework, the CNN is constructed to extract feature representations and an efficient deformable Transformer (DeTrans) is built to model the long-range dependency on the extracted feature maps. 
Different from the vanilla Transformer which treats all image positions equally, our DeTrans pays attention only to a small set of key positions by introducing the deformable self-attention mechanism. Thus, the computational and spatial complexities of DeTrans have been greatly reduced, making it possible to process the multi-scale and high-resolution feature maps, which are usually of paramount importance for image segmentation. We conduct an extensive evaluation on the Multi-Atlas Labeling Beyond the Cranial Vault (BCV) dataset that covers 11 major human organs. The results indicate that our CoTr leads to a substantial performance improvement over other CNN-based, transformer-based, and hybrid methods on the 3D multi-organ segmentation task. Code is available at https://github.com/YtongXie/CoTr.	翻訳日:2021-03-05 14:56:34 公開日:2021-03-04
# モバイルタッチレス指紋認識:実装、パフォーマンス、ユーザビリティの観点から Mobile Touchless Fingerprint Recognition: Implementation, Performance and Usability Aspects ( http://arxiv.org/abs/2103.03038v1 ) ライセンス: Link先を確認	Jannis Priesnitz, Rolf Huesmann, Christian Rathgeb, Nicolas Buchmann, Christoph Busch	(参考訳) 本研究は,スマートフォン用タッチレス指紋自動認識システムを提案する。認識パイプライン全体を包括的に記述し,完全自動化したキャプチャシステムにおける重要な要件について考察する。また,本実装は研究目的で公開されている。データベースの取得中に、29の被験者の合計1,360のタッチレスおよびタッチベースのサンプルが2つの異なる環境状況でキャプチャされます。取得したデータベース上での実験では,環境制約下でのタッチレススキームとタッチベースベースラインスキームに匹敵する性能を示した。両方のキャプチャデバイスタイプの比較ユーザビリティ調査は、被験者の大半がタッチレスキャプチャ方法を好むことを示しています。実験結果に基づいて、現在のCOVID-19パンデミックが指紋認識システムに与える影響を分析した。最後に,タッチレス指紋認証の実装について概説する。 This work presents an automated touchless fingerprint recognition system for smartphones. We provide a comprehensive description of the entire recognition pipeline and discuss important requirements for a fully automated capturing system. Also, our implementation is made publicly available for research purposes. During a database acquisition, a total number of 1,360 touchless and touch-based samples of 29 subjects are captured in two different environmental situations. Experiments on the acquired database show a comparable performance of our touchless scheme and the touch-based baseline scheme under constrained environmental influences. A comparative usability study on both capturing device types indicates that the majority of subjects prefer the touchless capturing method. Based on our experimental results we analyze the impact of the current COVID-19 pandemic on fingerprint recognition systems. Finally, implementation aspects of touchless fingerprint recognition are summarized.	翻訳日:2021-03-05 14:56:07 公開日:2021-03-04
# 組織グラフを用いた不正確・不完全ラベルからの全スライドセグメンテーションの学習 Learning Whole-Slide Segmentation from Inexact and Incomplete Labels using Tissue Graphs ( http://arxiv.org/abs/2103.03129v1 ) ライセンス: Link先を確認	Valentin Anklin, Pushpak Pati, Guillaume Jaume, Behzad Bozorgtabar, Antonio Foncubierta-Rodríguez, Jean-Philippe Thiran, Mathilde Sibony, Maria Gabrani, Orcun Goksel	(参考訳) 病理組織像を診断的に関連のある領域に分割することは,病理医の迅速かつ信頼性の高い判断を支援する上で不可欠である。この目的のためにコンピュータ支援技術が提案されており、スキャンされた組織学スライドの関連領域を記述している。しかし、この技術は、退屈で時間がかかり、費用がかかり、多くのヒストロジータスクのために取得できない、注釈付きピクセルのタスク固有の大きなデータセットを必要とします。よって, より安価で, より早く取得できるような, 弱い監督的セマンティックセグメンテーション手法を提案する。本稿では,グラフを用いた弱教師付きセグメンテーション手法であるSegGiniを提案する。SegGiniは弱い多重アノテーション,すなわち不正確かつ不完全なアノテーションを利用して,組織マイクロアレイ(TMA)からスライド画像全体(WSI)まで,任意の大きさの画像を分割できる。正式には、SegGiniは入力ヒストロジー画像のための組織グラフ表現を構築し、グラフノードは組織領域を描写する。そして、不正確な画像レベルラベル、不完全なスクリブル、またはその両方を用いて、ノード分類による弱い教師付きセグメンテーションを実行する。 TMAとWSIを含む2つの前立腺癌データセットを用いてSegGiniの評価を行った。本手法は,病理学者のベースラインに匹敵しながら,様々なアノテーション設定において,両方のデータセットにおいて最先端のセグメンテーション性能を達成した。 Segmenting histology images into diagnostically relevant regions is imperative to support timely and reliable decisions by pathologists. To this end, computer-aided techniques have been proposed to delineate relevant regions in scanned histology slides. However, the techniques necessitate task-specific large datasets of annotated pixels, which is tedious, time-consuming, expensive, and infeasible to acquire for many histology tasks. Thus, weakly-supervised semantic segmentation techniques are proposed to utilize weak supervision that is cheaper and quicker to acquire. In this paper, we propose SegGini, a weakly supervised segmentation method using graphs, that can utilize weak multiplex annotations, i.e. inexact and incomplete annotations, to segment arbitrary and large images, scaling from tissue microarray (TMA) to whole slide image (WSI). Formally, SegGini constructs a tissue-graph representation for an input histology image, where the graph nodes depict tissue regions. 
Then, it performs weakly-supervised segmentation via node classification by using inexact image-level labels, incomplete scribbles, or both. We evaluated SegGini on two public prostate cancer datasets containing TMAs and WSIs. Our method achieved state-of-the-art segmentation performance on both datasets for various annotation settings while being comparable to a pathologist baseline.	翻訳日:2021-03-05 14:55:58 公開日:2021-03-04
# コントラスト学習とトランスファー学習--医用画像解析を事例として Contrastive Learning Meets Transfer Learning: A Case Study In Medical Image Analysis ( http://arxiv.org/abs/2103.03166v1 ) ライセンス: Link先を確認	Yuzhe Lu, Aadarsh Jha, and Yuankai Huo	(参考訳) 注釈付き医療画像は、ドメインの知識とプライバシーの制約によって制限されるため、ラベル付き自然画像よりも稀である。転移学習とコントラスト学習の最近の進歩は、異なる視点からこれらの問題に取り組む効果的な解決策を提供してきた。最先端の伝達学習(Big Transfer(BiT)など)とコントラスト学習(Simple Siamese Contrastive Learning(SimSiam)など)のアプローチは、その補完的な性質を考慮せずに、独立して検討されている。遅い収束速度が現代のコントラスト学習アプローチの重要な制限であることを考えると、トランスファー学習によるコントラスト学習を加速させるのは魅力的です。本稿では, BiT と SimSiam の整合性について検討する。経験的分析から、BiTをSimSiamに適応させる上では、異なる正規化技術(BiTのGroup Norm対SimSiamのBatch Norm)が鍵となる障害である。 CIFAR-10およびHAM10000データセット上でBiT,SimSiam,BiT+SimSiamを用いてBiTとSimSiamを組み合わせた場合の性能評価を行った。その結果, BiTモデルがSimSiamの収束速度を加速することが示唆された。一緒に使用すると、どちらのモデルよりも優れた性能を発揮します。この研究は、画像解析のためのコントラスト学習モデルで、大きな学習済みモデルを集約するタスクを再検討する研究者を動機付けることを願っています。 Annotated medical images are typically rarer than labeled natural images since they are limited by domain knowledge and privacy constraints. Recent advances in transfer and contrastive learning have provided effective solutions to tackle such issues from different perspectives. The state-of-the-art transfer learning (e.g., Big Transfer (BiT)) and contrastive learning (e.g., Simple Siamese Contrastive Learning (SimSiam)) approaches have been investigated independently, without considering the complementary nature of such techniques. It would be appealing to accelerate contrastive learning with transfer learning, given that slow convergence speed is a critical limitation of modern contrastive learning approaches. In this paper, we investigate the feasibility of aligning BiT with SimSiam. From empirical analyses, different normalization techniques (Group Norm in BiT vs. Batch Norm in SimSiam) are the key hurdle of adapting BiT to SimSiam. When combining BiT with SimSiam, we evaluated the performance of using BiT, SimSiam, and BiT+SimSiam on CIFAR-10 and HAM10000 datasets. 
The results suggest that the BiT models accelerate the convergence speed of SimSiam. When used together, the model gives superior performance over both of its counterparts. We hope this study will motivate researchers to revisit the task of aggregating big pre-trained models with contrastive learning models for image analysis.	翻訳日:2021-03-05 14:55:36 公開日:2021-03-04
# インタラクティブ画像合成と編集のためのAnycost GAN Anycost GANs for Interactive Image Synthesis and Editing ( http://arxiv.org/abs/2103.03243v1 ) ライセンス: Link先を確認	Ji Lin, Richard Zhang, Frieder Ganz, Song Han, Jun-Yan Zhu	(参考訳) generative adversarial networks (gans) はフォトリアリスティックな画像合成と編集を可能にした。しかし、大規模なジェネレータ(例:StyleGAN2)の計算コストが高いため、エッジデバイス上の単一の編集結果を見るのには通常数秒かかり、インタラクティブなユーザーエクスペリエンスを禁止します。本稿では,現代的なレンダリングソフトウェアからインスピレーションを得て,インタラクティブな自然画像編集のためのAnycost GANを提案する。 Anycost GANをトレーニングし、弾力性のある解像度とチャンネルをサポートし、汎用性の高い速度で画像生成を高速化します。フルジェネレーターのサブセットを実行すると、フルジェネレーターと知覚的に類似した出力が生成されるため、プレビューに適したプロキシになります。サンプリングベースのマルチリゾリューショントレーニング、アダプティブチャネルトレーニング、および発電機コンディショニング識別器を使用することで、任意のジェネレータをさまざまな構成で評価し、別々に訓練されたモデルよりも優れた画質を実現できます。さらに,画像投影中に異なるサブジェネレータ間の一貫性を促進するために,新しいエンコーダトレーニングと潜在コード最適化手法を開発した。 Anycost GANは、さまざまなコスト予算(最大10倍の計算削減)で実行でき、幅広いハードウェアおよびレイテンシ要件に適応できます。デスクトップCPUとエッジデバイスにデプロイすると、6-12倍のスピードアップで知覚的に同様のプレビューを提供し、インタラクティブな画像編集を可能にします。コードとデモは公開されている。 https://github.com/mit-han-lab/anycost-gan。 Generative adversarial networks (GANs) have enabled photorealistic image synthesis and editing. However, due to the high computational cost of large-scale generators (e.g., StyleGAN2), it usually takes seconds to see the results of a single edit on edge devices, prohibiting interactive user experience. In this paper, we take inspirations from modern rendering software and propose Anycost GAN for interactive natural image editing. We train the Anycost GAN to support elastic resolutions and channels for faster image generation at versatile speeds. Running subsets of the full generator produce outputs that are perceptually similar to the full generator, making them a good proxy for preview. By using sampling-based multi-resolution training, adaptive-channel training, and a generator-conditioned discriminator, the anycost generator can be evaluated at various configurations while achieving better image quality compared to separately trained models. 
Furthermore, we develop new encoder training and latent code optimization techniques to encourage consistency between the different sub-generators during image projection. Anycost GAN can be executed at various cost budgets (up to 10x computation reduction) and adapt to a wide range of hardware and latency requirements. When deployed on desktop CPUs and edge devices, our model can provide perceptually similar previews at 6-12x speedup, enabling interactive image editing. The code and demo are publicly available: https://github.com/mit-han-lab/anycost-gan.	翻訳日:2021-03-05 14:55:11 公開日:2021-03-04
# ニューラルネットワークによる正確かつ解釈可能な決定規則セットの学習 Learning Accurate and Interpretable Decision Rule Sets from Neural Networks ( http://arxiv.org/abs/2103.02826v1 ) ライセンス: Link先を確認	Litao Qiao, Weijia Wang, Bill Lin	(参考訳) 本論文では,分類の解釈可能なモデルとして独立論理則の集合を解読正規形式で学習する新しいパラダイムを提案する。我々は、ニューラルネットワークを特定の、しかし非常に単純な2層アーキテクチャでトレーニングすることとして、解釈可能な決定ルールセットを学ぶ問題を考える。第1層の各ニューロンはトレーニング後の解釈可能なif-thenルールに直接マップし、第2層の出力ニューロンは第1層ルールの切断に直接マップして決定ルールセットを形成する。この第1のルール層におけるニューロンの表現は、決定ルールにおける特徴の正の結合と負の結合の両方をエンコードできる。最先端のニューラルネットワークトレーニングアプローチは、高精度な分類モデルの学習に利用できる。さらに,分類精度とルールの単純さのバランスをとるために,スパース性に基づく正規化手法を提案する。実験の結果,本手法は他の最先端ルール学習アルゴリズムよりも精度の高いルールセットを生成できることがわかった。さらに、ランダムフォレストやフル精度ディープニューラルネットワークなどの解釈不能なブラックボックス機械学習アプローチと比較すると、予測性能に匹敵する解釈可能な決定ルールセットを簡単に見つけることができます。 This paper proposes a new paradigm for learning a set of independent logical rules in disjunctive normal form as an interpretable model for classification. We consider the problem of learning an interpretable decision rule set as training a neural network in a specific, yet very simple two-layer architecture. Each neuron in the first layer directly maps to an interpretable if-then rule after training, and the output neuron in the second layer directly maps to a disjunction of the first-layer rules to form the decision rule set. Our representation of neurons in this first rules layer enables us to encode both the positive and the negative association of features in a decision rule. State-of-the-art neural net training approaches can be leveraged for learning highly accurate classification models. Moreover, we propose a sparsity-based regularization approach to balance between classification accuracy and the simplicity of the derived rules. Our experimental results show that our method can generate more accurate decision rule sets than other state-of-the-art rule-learning algorithms with better accuracy-simplicity trade-offs. 
Further, when compared with uninterpretable black-box machine learning approaches such as random forests and full-precision deep neural networks, our approach can easily find interpretable decision rule sets that have comparable predictive performance.	翻訳日:2021-03-05 14:53:15 公開日:2021-03-04
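The rule-set abstract above describes a classifier in disjunctive normal form: each rule is a conjunction of feature conditions, and the rule set fires if any rule fires. A minimal sketch of how such a rule set is evaluated is given below. The feature names and the rules themselves are hypothetical illustrations, not taken from the paper, and the sketch covers only inference, not the neural-network training procedure the authors propose.

```python
from typing import Dict, List, Tuple

# A literal is (feature_name, required_value). A required value of False
# encodes a negative association of the feature with the rule.
Literal = Tuple[str, bool]
Rule = List[Literal]  # conjunction (AND) of literals


def rule_fires(rule: Rule, x: Dict[str, bool]) -> bool:
    """Return True when every literal of the rule holds for sample x."""
    return all(x[name] == value for name, value in rule)


def predict(rule_set: List[Rule], x: Dict[str, bool]) -> bool:
    """Disjunction (OR) over rules: positive if any single rule fires."""
    return any(rule_fires(rule, x) for rule in rule_set)


# Hypothetical rule set with two if-then rules.
RULES: List[Rule] = [
    [("fever", True), ("vaccinated", False)],  # IF fever AND NOT vaccinated
    [("cough", True), ("fever", True)],        # IF cough AND fever
]
```

Because every prediction can be traced to the specific rule that fired, a model of this shape stays interpretable regardless of how the rules were learned.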
# 旅行セールスマン問題のためのトランスフォーマーネットワーク The Transformer Network for the Traveling Salesman Problem ( http://arxiv.org/abs/2103.03012v1 ) ライセンス: Link先を確認	Xavier Bresson and Thomas Laurent	(参考訳) トラベリングセールスマン問題(TSP)は1951年にフォン・ノイマンから始まった最も人気があり、最も研究されている組合せ問題である。切断面、分岐およびバウンド、ローカル検索、ラグランジアンリラクゼーション、シミュレートアニールなど、いくつかの最適化技術が発見されました。過去5年間、(グラフ)ニューラルネットワークが新しい組み合わせアルゴリズムを学習できる有望な技術の出現を見てきました。主な疑問は、ディープラーニングがデータからより良いヒューリスティックを学習できるかどうかである。人間工学のヒューリスティックを置き換える? NP-hard問題に効率的に対処するアルゴリズムを開発するには、長年の研究が必要であり、多くの業界問題が自然と組み合わせているため、これは魅力的である。本研究では,自然言語処理用に開発されたトランスフォーマーアーキテクチャを組込みTSPに適応させることを提案する。トレーニングは強化学習によって行われ、tspトレーニングソリューションがないため、デコーディングはビームサーチを使用する。 TSP50では0.004%、TSP100では0.39%の最適なギャップを持つ最近の学習ヒューリスティックに対するパフォーマンスの改善を報告する。 The Traveling Salesman Problem (TSP) is the most popular and most studied combinatorial problem, starting with von Neumann in 1951. It has driven the discovery of several optimization techniques such as cutting planes, branch-and-bound, local search, Lagrangian relaxation, and simulated annealing. The last five years have seen the emergence of promising techniques where (graph) neural networks have been capable to learn new combinatorial algorithms. The main question is whether deep learning can learn better heuristics from data, i.e. replacing human-engineered heuristics? This is appealing because developing algorithms to tackle efficiently NP-hard problems may require years of research, and many industry problems are combinatorial by nature. In this work, we propose to adapt the recent successful Transformer architecture originally developed for natural language processing to the combinatorial TSP. Training is done by reinforcement learning, hence without TSP training solutions, and decoding uses beam search. We report improved performances over recent learned heuristics with an optimal gap of 0.004% for TSP50 and 0.39% for TSP100.	翻訳日:2021-03-05 14:52:55 公開日:2021-03-04
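The TSP abstract above reports results as an optimality gap, which is usually understood as the relative excess length of a tour over the optimal tour, expressed in percent. The sketch below shows that computation under this assumed definition, with Euclidean distances and a hypothetical four-city instance. It does not reproduce the Transformer model or the beam-search decoding.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def tour_length(points: Sequence[Point], tour: List[int]) -> float:
    """Length of the closed tour that visits points in the given order."""
    n = len(tour)
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % n]])
        for i in range(n)
    )


def optimality_gap(length: float, optimal_length: float) -> float:
    """Relative excess over the optimal tour length, in percent."""
    return 100.0 * (length - optimal_length) / optimal_length


# Hypothetical instance: the four corners of the unit square.
SQUARE: List[Point] = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
```

For the unit square, visiting the corners in order gives the shortest tour, while a tour that crosses the diagonals is longer and therefore has a positive gap.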
# FESサイクリングにおける神経刺激制御の神経力学に基づく深部強化学習 Neuromechanics-based Deep Reinforcement Learning of Neurostimulation Control in FES cycling ( http://arxiv.org/abs/2103.03057v1 ) ライセンス: Link先を確認	Nat Wannawas, Mahendran Subramanian, A. Aldo Faisal	(参考訳) 機能的電気刺激(FES)は麻痺した人の筋肉に動きを回復できます。しかし、四肢全体の実用的な機能を取り戻すために多くの筋肉を刺激する制御は未解決の問題である。現在の神経刺激工学はまだ20世紀の制御アプローチに依存しており、動作させるだけでも日々の調整を要する控えめな結果しか示していません。本稿では,FESサイクリングのための麻痺した脚の適応的神経刺激をリアルタイムに行うために開発した最先端のDeep Reinforcement Learning(RL)を紹介する。アプローチの核心は、強化学習フレームワークにパーソナライズされた神経力学コンポーネントを組み込むことで、患者との長時間のトレーニングセッションを必要とせず、効率的にモデルをトレーニングし、そのまま動作させることが可能になる点です。神経力学コンポーネントは筋・腱機能の筋骨格モデルと筋疲労の多状態モデルとを統合し、対麻痺のサイクリストの瞬時の筋肉容量に応答する神経刺激を実現する。我々のRLアプローチは精度と性能においてPIDおよびファジィロジック制御器を上回る。また,本システムは,スタート時の加速から,筋肉が疲労していく定常走行での高いケイデンスの維持まで,サイクリストの脚を刺激することを学習した。 RLの神経刺激システムの一部は、Cybathlon 2020のFES競技に導入され、対麻痺のサイクリストが9つの競技チームの中で銀メダルを獲得しました。 Functional Electrical Stimulation (FES) can restore motion to a paralysed person's muscles. Yet, control stimulating many muscles to restore the practical function of entire limbs is an unsolved problem. Current neurostimulation engineering still relies on 20th Century control approaches and correspondingly shows only modest results that require daily tinkering to operate at all. Here, we present our state-of-the-art Deep Reinforcement Learning (RL) developed for real-time adaptive neurostimulation of paralysed legs for FES cycling. Core to our approach is the integration of a personalised neuromechanical component into our reinforcement learning framework that allows us to train the model efficiently, without demanding extended training sessions with the patient, and to work out of the box. Our neuromechanical component merges musculoskeletal models of muscle and/or tendon function with a multistate model of muscle fatigue, to render the neurostimulation responsive to a paraplegic cyclist's instantaneous muscle capacity. Our RL approach outperforms PID and Fuzzy Logic controllers in accuracy and performance. 
Crucially, our system learned to stimulate a cyclist's legs from ramping up speed at the start to maintaining a high cadence in steady state racing as the muscles fatigue. A part of our RL neurostimulation system has been successfully deployed at the Cybathlon 2020 bionic Olympics in the FES discipline with our paraplegic cyclist winning the Silver medal among 9 competing teams.	翻訳日:2021-03-05 14:52:35 公開日:2021-03-04
# アクセント話者に対する誤り駆動型固定予算ASRパーソナライズ Error-driven Fixed-Budget ASR Personalization for Accented Speakers ( http://arxiv.org/abs/2103.03142v1 ) ライセンス: Link先を確認	Abhijeet Awasthi, Aman Kansal, Sunita Sarawagi, Preethi Jyothi	(参考訳) 話者特有の発話を記録するための固定予算に縛られながら、ASRモデルをパーソナライズするタスクを検討します。話者とASRモデルが与えられた場合,話者の発話を認識しにくくする文を識別する手法を提案する。このような文を選択するのに役立つ音素レベルの誤りモデルを学習するために、少数の話者固有データを仮定する。その結果,誤りモデルを用いて選択した文に対する話者の発話は,ランダムに選択された文に対する話者の発話よりも誤り率が高いことがわかった。誤りモデルの助けを借りて選択した文発話でASRモデルを微調整すると、同数のランダムに選択された文発話で微調整した場合と比較して高いWER改善をもたらすことが判明した。そこで本手法は,ASRモデルのパーソナライズのための予算制約下で話者発話を効率よく収集する方法を提供する。 We consider the task of personalizing ASR models while being constrained by a fixed budget on recording speaker-specific utterances. Given a speaker and an ASR model, we propose a method of identifying sentences for which the speaker's utterances are likely to be harder for the given ASR model to recognize. We assume a tiny amount of speaker-specific data to learn phoneme-level error models which help us select such sentences. We show that the speaker's utterances on the sentences selected using our error model indeed have larger error rates when compared to the speaker's utterances on randomly selected sentences. We find that fine-tuning the ASR model on the sentence utterances selected with the help of error models yields higher WER improvements in comparison to fine-tuning on an equal number of randomly selected sentence utterances. Thus, our method provides an efficient way of collecting speaker utterances under budget constraints for personalizing ASR models.	翻訳日:2021-03-05 14:51:39 公開日:2021-03-04
# 交互セグメンテーションと合成によるコントラスト適応型組織分類 Contrast Adaptive Tissue Classification by Alternating Segmentation and Synthesis ( http://arxiv.org/abs/2103.02767v1 ) ライセンス: Link先を確認	Dzung L. Pham, Yi-Yu Chou, Blake E. Dewey, Daniel S. Reich, John A. Butman, and Snehashis Roy	(参考訳) 磁気共鳴画像のセグメンテーションに対する深層学習のアプローチは、脳画像の定量的解析の自動化に重要な可能性を示している。しかし、継続する課題は、取得プロトコルの可変性に対する感度である。トレーニングデータ内のコントラスト特性が異なるイメージをセグメント化しようとすると、一般的にパフォーマンスが大幅に低下する。さらに、取得の違いによる量的変動はしばしば測定しようとする生物学的差異による変動を上回るため、不均質なデータセットは簡単には評価できない。本稿では,トレーニングデータのコントラスト特性を入力画像に適応させる,交互セグメンテーションと合成ステップを用いたアプローチについて述べる。これにより、トレーニングデータに似ていない入力イメージをより一貫してセグメント化できる。このアプローチの顕著な利点は、そのコントラスト特性に適応するために取得プロトコルの1つの例だけが必要であることである。 2つの異なるT1重み付きボリュームプロトコルでスキャンした被験者の脳画像を用いたアプローチの有効性を実証した。 Deep learning approaches to the segmentation of magnetic resonance images have shown significant promise in automating the quantitative analysis of brain images. However, a continuing challenge has been its sensitivity to the variability of acquisition protocols. Attempting to segment images that have different contrast properties from those within the training data generally leads to significantly reduced performance. Furthermore, heterogeneous data sets cannot be easily evaluated because the quantitative variation due to acquisition differences often dwarfs the variation due to the biological differences that one seeks to measure. In this work, we describe an approach using alternating segmentation and synthesis steps that adapts the contrast properties of the training data to the input image. This allows input images that do not resemble the training data to be more consistently segmented. A notable advantage of this approach is that only a single example of the acquisition protocol is required to adapt to its contrast properties. We demonstrate the efficacy of our approach using brain images from a set of human subjects scanned with two different T1-weighted volumetric protocols.	翻訳日:2021-03-05 14:51:22 公開日:2021-03-04
# 複数カーネルと複数カーネル空間正規化器を用いたPET画像再構成 PET Image Reconstruction with Multiple Kernels and Multiple Kernel Space Regularizers ( http://arxiv.org/abs/2103.02813v1 ) ライセンス: Link先を確認	Shiyao Guo, Yuxia Sheng, Shenpeng Li, Li Chai, Jingxin Zhang	(参考訳) Kernelized maximum-likelihood (ML) 期待最大化 (EM) 法は最近、PET 画像の再構築において注目され、多くの従来の最先端手法を上回っています。しかし,これらの手法も,再構成誤差が大きくなりうることや反復数に対する感度が高いことといった非カーネル化MLEM法の問題を免れない。本稿では,理論的な推論と実験結果を用いてこれらの問題を実証し,その解法を提案する。このソリューションは、複数のカーネル行列と異なるアプリケーションに合わせて調整できる複数のカーネルスペース正規化を備えた正規化されたカーネル化されたMLEMです。再構成誤差と繰り返し数に対する感度を低減するため,マルチカーネル行列の一般クラスと,カーネル画像辞書およびカーネル画像ラプラシアン二次形式からなる2つの正規化器を提示し,それらを用いて,PET画像再構成のための単一カーネル正規化EMとマルチカーネル正規化EMアルゴリズムを導出する。これらの新しいアルゴリズムは、機械学習におけるマルチカーネルコンビネーション、スパース符号化における画像辞書学習、グラフ信号処理におけるグラフラプラシアン二次形式といった技術ツールを用いて導かれる。シミュレーションデータと生体内データの比較実験を行い、新しいアルゴリズムの検証と評価を行い、カーネル化されたMLEMや他の従来の手法よりも優れた性能と利点を示す。 Kernelized maximum-likelihood (ML) expectation maximization (EM) methods have recently gained prominence in PET image reconstruction, outperforming many previous state-of-the-art methods. But they are not immune to the problems of non-kernelized MLEM methods in potentially large reconstruction error and high sensitivity to iteration number. This paper demonstrates these problems by theoretical reasoning and experiment results, and provides a novel solution to solve these problems. The solution is a regularized kernelized MLEM with multiple kernel matrices and multiple kernel space regularizers that can be tailored for different applications. To reduce the reconstruction error and the sensitivity to iteration number, we present a general class of multi-kernel matrices and two regularizers consisting of kernel image dictionary and kernel image Laplacian quadratic, and use them to derive the single-kernel regularized EM and multi-kernel regularized EM algorithms for PET image reconstruction. 
These new algorithms are derived using the technical tools of multi-kernel combination in machine learning, image dictionary learning in sparse coding, and graph Laplacian quadratic in graph signal processing. Extensive tests and comparisons on the simulated and in vivo data are presented to validate and evaluate the new algorithms, and demonstrate their superior performance and advantages over the kernelized MLEM and other conventional methods.	翻訳日:2021-03-05 14:51:07 公開日:2021-03-04
# 人間と機械の知覚に対する電位エネルギーに基づく点雲歪みの定量化 Point Cloud Distortion Quantification based on Potential Energy for Human and Machine Perception ( http://arxiv.org/abs/2103.02850v1 ) ライセンス: Link先を確認	Qi Yang, Siheng Chen, Yiling Xu, Jun Sun, M. Salman Asif, Zhan Ma	(参考訳) 点雲の歪み定量化は、幅広い人間や機械の知覚タスクにおいて、ステルスだが重要な役割を果たす。人間の知覚タスクでは、歪み量子化は主観的な実験に代えて3次元可視化を導くことができ、機械知覚タスクでは歪み量子化は教師なし学習タスクのためのディープニューラルネットワークのトレーニングを導くための損失関数として機能する。多くのアプリケーションで様々な要求を処理するためには、歪み定量化は、歪み識別可能で、微分可能で、計算の複雑さが低い必要がある。しかし、現在では3つの条件をすべて満たす一般的な歪量化が欠如している。このギャップを埋めるために、この研究は、点雲の幾何学と色差を測定する歪み定量化であるMPED(multiscale potential energy discrepancy)を提案する。様々な地域規模で評価することにより,MPEDはグローバルな局所的トレードオフを実現し,マルチスケールで歪みを捉えている。広範な実験研究は、人間と機械の両方の知覚タスクに対するMPEDの優位性を検証します。 Distortion quantification of point clouds plays a stealth, yet vital role in a wide range of human and machine perception tasks. For human perception tasks, a distortion quantification can substitute subjective experiments to guide 3D visualization; while for machine perception tasks, a distortion quantification can work as a loss function to guide the training of deep neural networks for unsupervised learning tasks. To handle a variety of demands in many applications, a distortion quantification needs to be distortion discriminable, differentiable, and have a low computational complexity. Currently, however, there is a lack of a general distortion quantification that can satisfy all three conditions. To fill this gap, this work proposes multiscale potential energy discrepancy (MPED), a distortion quantification to measure point cloud geometry and color difference. By evaluating at various neighborhood sizes, the proposed MPED achieves global-local tradeoffs, capturing distortion in a multiscale fashion. Extensive experimental studies validate MPED's superiority for both human and machine perception tasks.	翻訳日:2021-03-05 14:50:42 公開日:2021-03-04
# mask dngan: 逆損失と勾配マスクを備えたマルチステージ生ビデオ Mask DnGAN: Multi-Stage Raw Video Denoising with Adversarial Loss and Gradient Mask ( http://arxiv.org/abs/2103.02861v1 ) ライセンス: Link先を確認	Avinash Paliwal, Libing Zeng and Nima Khademi Kalantari	(参考訳) 本論文では,低照度下で撮影された生の映像を消音する学習手法を提案する。まず、畳み込みニューラルネットワーク(cnn)を用いて、隣接するフレームを現在のフレームに明示的に調整することを提案する。次に、登録されたフレームを別のCNNを使って融合し、最終識別フレームを得る。時間的に離れたフレームを直接アライメントしないように、複数の段階でアライメントと融合の2つのプロセスを実行します。具体的には、各段階で3つの連続入力フレームで消音処理を行い、中間消音フレームを生成し、次のステージに入力として渡します。複数の段階で処理を行うことで、時間的に離れたフレームを直接調整することなく、隣接するフレームの情報を有効に活用することができる。我々は,条件付き判別器を用いた対向損失を用いた多段階システムの訓練を行う。具体的には,スムーズな領域に高周波アーティファクトを導入するのを防ぐために,ソフトグラデーションマスクに識別器を装着する。本システムでは,時間的にコヒーレントな映像をリアルに生成できることを示す。さらに,本手法が最先端の映像や映像を数値的および視覚的に表現する手法よりも優れていることを示す実験を行った。 In this paper, we propose a learning-based approach for denoising raw videos captured under low lighting conditions. We propose to do this by first explicitly aligning the neighboring frames to the current frame using a convolutional neural network (CNN). We then fuse the registered frames using another CNN to obtain the final denoised frame. To avoid directly aligning the temporally distant frames, we perform the two processes of alignment and fusion in multiple stages. Specifically, at each stage, we perform the denoising process on three consecutive input frames to generate the intermediate denoised frames which are then passed as the input to the next stage. By performing the process in multiple stages, we can effectively utilize the information of neighboring frames without directly aligning the temporally distant frames. We train our multi-stage system using an adversarial loss with a conditional discriminator. Specifically, we condition the discriminator on a soft gradient mask to prevent introducing high-frequency artifacts in smooth regions. We show that our system is able to produce temporally coherent videos with realistic details. 
Furthermore, we demonstrate through extensive experiments that our approach outperforms state-of-the-art image and video denoising methods both numerically and visually.	翻訳日:2021-03-05 14:50:24 公開日:2021-03-04
# メタモルフィックテストによる重畳生成逆数ネットワークのロバスト性評価 Robustness Evaluation of Stacked Generative Adversarial Networks using Metamorphic Testing ( http://arxiv.org/abs/2103.02870v1 ) ライセンス: Link先を確認	Hyejin Park, Taaha Waseem, Wen Qi Teo, Ying Hwei Low, Mei Kuan Lim and Chun Yong Chong	(参考訳) 自然言語からのフォトリアリスティック画像の合成は、コンピュータビジョンにおいて難しい問題の一つである。過去10年間で、多くのアプローチが提案され、改良されたStackGAN-v2は、入力されたテキスト記述に指定された詳細を反映した高解像度の画像を生成することができることが証明されている。本論文では, StackGAN-v2モデルの堅牢性と耐障害性を評価するために, トレーニングデータの変動を導入した。しかし、GAN(Generative Adversarial Network)の動作原理により、トレーニングデータの修正時にモデルの出力を予測することは困難である。そこで本研究では,様々な予期せぬトレーニングデータセットを用いてモデルのロバスト性を評価するために,メタモルフィックテスト手法を採用する。そこで我々はまずstackgan-v2アルゴリズムを実装し,原著者による事前学習モデルの検証を行い,実験の基礎的真理を明らかにした。次に、テストケースが生成される変成関係を特定します。さらに, 先行試験結果の観察に基づいて, 変成関係を連続的に導出した。その結果,StackGAN-v2モデルの著者やユーザからは報告されていないメインオブジェクトとの重複が最小限に抑えられたとしても,StackGAN-v2アルゴリズムは強迫性オブジェクトによる画像入力に感受性があることが判明した。提案したメタモルフィック関係は、ロバスト性を検証するだけでなく、機械学習モデルによる結果の理解と解釈を支援するために、他のテキスト・画像合成モデルにも適用することができる。 Synthesising photo-realistic images from natural language is one of the challenging problems in computer vision. Over the past decade, a number of approaches have been proposed, of which the improved Stacked Generative Adversarial Network (StackGAN-v2) has proven capable of generating high resolution images that reflect the details specified in the input text descriptions. In this paper, we aim to assess the robustness and fault-tolerance capability of the StackGAN-v2 model by introducing variations in the training data. However, due to the working principle of Generative Adversarial Network (GAN), it is difficult to predict the output of the model when the training data are modified. Hence, in this work, we adopt Metamorphic Testing technique to evaluate the robustness of the model with a variety of unexpected training dataset. As such, we first implement StackGAN-v2 algorithm and test the pre-trained model provided by the original authors to establish a ground truth for our experiments. 
We then identify a metamorphic relation, from which test cases are generated. Further, metamorphic relations were derived successively based on the observations of prior test results. Finally, we synthesised the results from our experiments on all the metamorphic relations and found that the StackGAN-v2 algorithm is susceptible to input images with obtrusive objects, even if they overlap with the main object minimally, which was not reported by the authors and users of the StackGAN-v2 model. The proposed metamorphic relations can be applied to other text-to-image synthesis models to not only verify the robustness but also to help researchers understand and interpret the results made by the machine learning models.	翻訳日:2021-03-05 14:50:06 公開日:2021-03-04
# 深部画像圧縮におけるレイテンシのクロスチャネルコンテキストモデル A Cross Channel Context Model for Latents in Deep Image Compression ( http://arxiv.org/abs/2103.02884v1 ) ライセンス: Link先を確認	Changyue Ma, Zhao Wang, Ruling Liao, Yan Ye	(参考訳) 本稿では,深部画像圧縮における潜伏者に対するクロスチャネルコンテキストモデルを提案する。一般的に、深い画像圧縮は自動エンコーダフレームワークに基づいており、元の画像はエンコーダで潜入し、復号器で量子化された潜伏から復元された画像を回復する。変換は通常エントロピーモデルと組み合わされ、算術符号化のための量子化された潜みの確率分布を推定する。現在、共同自己回帰的および階層的先行エントロピーモデルが広く採用され、ハイパーレイトからのグローバルコンテキストと量子化されたレイト要素からのローカルコンテキストの両方をキャプチャする。ローカルコンテキストでは、広く採用されている2Dマスク畳み込みは、空間コンテキストのみをキャプチャできる。しかし, 異なるチャネル間に強い相関関係があることが観察された。クロスチャネル相関を利用するため,本手法では,チャネルインデックスに従って潜伏者を複数のグループに分割し,そのグループを1つずつコード化する。提案するクロスチャネルコンテキストモデルは自己回帰モデルと階層的事前エントロピーモデルを組み合わせたものである。実験結果は、PSNRを歪みメトリックとして使用することにより、ベースラインエントロピーモデルよりも6.30%と6.31%のBDレート削減を達成し、KodakおよびCVPR CLIC2020プロフェッショナルデータセット用の最新のビデオコーディング標準のVVC(Versatile Video Coding)に対して2.50%と2.20%を達成した。また,MS-SSIMに最適化した場合,より快適な再構成画像を生成する。 This paper presents a cross channel context model for latents in deep image compression. Generally, deep image compression is based on an autoencoder framework, which transforms the original image to latents at the encoder and recovers the reconstructed image from the quantized latents at the decoder. The transform is usually combined with an entropy model, which estimates the probability distribution of the quantized latents for arithmetic coding. Currently, joint autoregressive and hierarchical prior entropy models are widely adopted to capture both the global contexts from the hyper latents and the local contexts from the quantized latent elements. For the local contexts, the widely adopted 2D mask convolution can only capture the spatial context. However, we observe that there are strong correlations between different channels in the latents. 
To utilize the cross channel correlations, we propose to divide the latents into several groups according to channel index and code the groups one by one, where previously coded groups are utilized to provide cross channel context for the current group. The proposed cross channel context model is combined with the joint autoregressive and hierarchical prior entropy model. Experimental results show that, using PSNR as the distortion metric, the combined model achieves BD-rate reductions of 6.30% and 6.31% over the baseline entropy model, and 2.50% and 2.20% over the latest video coding standard Versatile Video Coding (VVC) for the Kodak and CVPR CLIC2020 professional dataset, respectively. In addition, when optimized for the MS-SSIM metric, our approach generates visually more pleasant reconstructed images.	翻訳日:2021-03-05 14:49:38 公開日:2021-03-04
# 効率と高速化:混合精度量子化のための新しいシーケンシャル・シングルパス探索 Effective and Fast: A Novel Sequential Single Path Search for Mixed-Precision Quantization ( http://arxiv.org/abs/2103.02904v1 ) ライセンス: Link先を確認	Qigong Sun, Licheng Jiao, Yan Ren, Xiufang Li, Fanhua Shang, Fang Liu	(参考訳) モデル量子化はモデルサイズと計算遅延を低減するのに役立つため、携帯電話、組み込みデバイス、スマートチップの多くのアプリケーションでうまく適用されている。混合精度量子化モデルは、異なる層の感度に応じて異なる量子化ビット精度に適合し、優れた性能を達成することができる。しかし、いくつかの制約(ハードウェアリソース、エネルギー消費、モデルサイズ、計算遅延など)に従って、ディープニューラルネットワーク内の各層の量子化ビット精度を迅速に決定することは困難である。この問題に対処するために,提案した制約を損失関数に導入し,探索プロセスを導出する,混合精度量子化のための新しいシーケンシャルシングルパス探索法(SSPS)を提案する。単一の経路探索セルは、勾配に基づくアルゴリズムによって最適化できる完全微分可能なスーパーネットを結合するために使用される。さらに, 探索空間を指数関数的に削減し, 探索過程の収束を高速化するために, 選択条件に従って候補精度を逐次決定する。実験では,異なるアーキテクチャ(例:ResNet-20, 18, 34, 50, MobileNet-V2)とデータセット(例:CIFAR-10, ImageNet, COCO)の混合精度モデルを,特定の制約下で効率的に探索できることを示した。 Since model quantization helps to reduce the model size and computation latency, it has been successfully applied in many applications of mobile phones, embedded devices and smart chips. The mixed-precision quantization model can match different quantization bit-precisions according to the sensitivity of different layers to achieve great performance. However, it is a difficult problem to quickly determine the quantization bit-precision of each layer in deep neural networks according to some constraints (e.g., hardware resources, energy consumption, model size and computation latency). To address this issue, we propose a novel sequential single path search (SSPS) method for mixed-precision quantization,in which the given constraints are introduced into its loss function to guide searching process. A single path search cell is used to combine a fully differentiable supernet, which can be optimized by gradient-based algorithms. Moreover, we sequentially determine the candidate precisions according to the selection certainties to exponentially reduce the search space and speed up the convergence of searching process. 
Experiments show that our method can efficiently search the mixed-precision models for different architectures (e.g., ResNet-20, 18, 34, 50 and MobileNet-V2) and datasets (e.g., CIFAR-10, ImageNet and COCO) under given constraints, and our experimental results verify that SSPS significantly outperforms its uniform counterparts.	翻訳日:2021-03-05 14:49:08 公開日:2021-03-04
# 極端k-Spaceアンダーサンプリングと超解像による超高速MRI Towards Ultrafast MRI via Extreme k-Space Undersampling and Superresolution ( http://arxiv.org/abs/2103.02940v1 ) ライセンス: Link先を確認	Aleksandr Belov and Joel Stadelmann and Sergey Kastryulin and Dmitry V. Dylov	(参考訳) 我々は、オリジナルのfastMRIチャレンジを参照するすべての論文によって報告されたMRI加速因子(k-space undersampling)を下回った後、未解決の画像を補う強力なディープラーニングベースの画像強化方法を検討した。我々は、サンプリングパターン、アンダーサンプリングおよびダウンスケーリング要因、ならびに脳と膝の高速MRIベンチマークの最終的な画像品質に対する回復モデルの影響を徹底的に検討します。復元された画像の品質は他の方法よりも高く、MSEは0.00114、PSNRは29.6 dB、SSIMは0.956 x16加速係数である。 x32およびx64のより極端なアンダーサンプリング因子も検討され、コンピュータ支援手術や放射線計画などの特定の臨床応用を約束する。専門家5名の放射線技師が100対の画像を評価し、回収したサンプル画像が統計的に診断価値を保っていることを示す。 We went below the MRI acceleration factors (a.k.a., k-space undersampling) reported by all published papers that reference the original fastMRI challenge, and then considered powerful deep learning based image enhancement methods to compensate for the underresolved images. We thoroughly study the influence of the sampling patterns, the undersampling and the downscaling factors, as well as the recovery models on the final image quality for both the brain and the knee fastMRI benchmarks. The quality of the reconstructed images surpasses that of the other methods, yielding an MSE of 0.00114, a PSNR of 29.6 dB, and an SSIM of 0.956 at x16 acceleration factor. More extreme undersampling factors of x32 and x64 are also investigated, holding promise for certain clinical applications such as computer-assisted surgery or radiation planning. We survey 5 expert radiologists to assess 100 pairs of images and show that the recovered undersampled images statistically preserve their diagnostic value.	翻訳日:2021-03-05 14:48:40 公開日:2021-03-04
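The MRI abstract above reports both MSE and PSNR. For a single image the two are linked by the standard relation PSNR = 10 log10(MAX^2 / MSE), which the following sketch implements. The default peak value of 1.0 assumes intensities scaled to [0, 1], which the abstract does not state. Benchmark figures are typically averaged per image, so a reported mean PSNR need not equal the PSNR computed from a reported mean MSE.

```python
import math


def psnr_from_mse(mse: float, max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for one image.

    max_value is the largest possible intensity; 1.0 is an assumption
    that the images are normalised to the range [0, 1].
    """
    if mse <= 0.0:
        raise ValueError("mse must be positive")
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A tenfold reduction in MSE raises the PSNR by exactly 10 dB, which is a quick sanity check when comparing acceleration factors.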
# 高品質優先学習と劣化学習による知覚的画像復元 Perceptual Image Restoration with High-Quality Priori and Degradation Learning ( http://arxiv.org/abs/2103.03010v1 ) ライセンス: Link先を確認	Chaoyi Han, Yiping Duan, Xiaoming Tao, Jianhua Lu	(参考訳) 知覚画像復元は、特定の画像に劣化する可能性が高い高忠実度画像を求めます。より良い視覚品質のために、以前の研究は生成モデルの潜在空間を利用して自然画像多様体内の解を探すことを提案した。しかし, 潜在埋め込みが事前分布に近い場合にのみ, 生成画像の品質が保証される。本研究では,事前多様体内に実現可能な領域を制限することを提案する。これは、2つの分布のための非パラメトリック計量で達成されます:最大平均差(MMD)。さらに, 劣化過程を直接条件分布としてモデル化する。本モデルは,復元画像と劣化画像の類似度を測定するのに有効であることを示す。長く批判されてきた劣化画像上の画素単位の距離を最適化する代わりに、高い確率で視覚的に好ましい画像を見つけるためにそのようなモデルを頼りにしている。同時修復・拡張フレームワークは,実世界の複雑な劣化タイプによく一般化する。知覚品質とno-reference画像品質評価(NR-IQA)の実験結果から,本手法の優れた性能を示す。 Perceptual image restoration seeks high-fidelity images that most likely degrade to given images. For better visual quality, previous work proposed to search for solutions within the natural image manifold, by exploiting the latent space of a generative model. However, the quality of generated images is only guaranteed when latent embedding lies close to the prior distribution. In this work, we propose to restrict the feasible region within the prior manifold. This is accomplished with a non-parametric metric for two distributions: the Maximum Mean Discrepancy (MMD). Moreover, we model the degradation process directly as a conditional distribution. We show that our model performs well in measuring the similarity between restored and degraded images. Instead of optimizing the long-criticized pixel-wise distance over degraded images, we rely on such a model to find visually pleasing images with high probability. Our simultaneous restoration and enhancement framework generalizes well to real-world complicated degradation types. The experimental results on perceptual quality and no-reference image quality assessment (NR-IQA) demonstrate the superior performance of our method.	翻訳日:2021-03-05 14:48:23 公開日:2021-03-04
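The restoration abstract above relies on the Maximum Mean Discrepancy (MMD), a non-parametric distance between two distributions estimated from samples. A minimal sketch of the biased sample estimate of squared MMD follows. The Gaussian (RBF) kernel and the bandwidth of 1.0 are assumptions made for illustration, since the abstract does not say which kernel the authors use or how they apply the metric to latent embeddings.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def rbf_kernel(x: Vector, y: Vector, bandwidth: float = 1.0) -> float:
    """Gaussian kernel exp(-||x - y||^2 / (2 * bandwidth^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * bandwidth ** 2))


def mmd_squared(xs: Sequence[Vector], ys: Sequence[Vector],
                bandwidth: float = 1.0) -> float:
    """Biased (V-statistic) estimate of squared MMD between two samples."""

    def mean_kernel(a: Sequence[Vector], b: Sequence[Vector]) -> float:
        total = sum(rbf_kernel(p, q, bandwidth) for p in a for q in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)
```

The estimate is zero when both samples are identical and grows as the two point sets move apart, which is the property that makes it usable as a penalty keeping latent codes close to a prior.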
# TPCN: モーション予測のための一時的ポイントクラウドネットワーク TPCN: Temporal Point Cloud Networks for Motion Forecasting ( http://arxiv.org/abs/2103.03067v1 ) ライセンス: Link先を確認	Maosheng Ye, Tongyi Cao, Qifeng Chen	(参考訳) 軌道予測のための空間的および時間的学習を併用した新しい柔軟な枠組みであるtemporal point cloud networks (tpcn)を提案する。エージェントをラスタライズしたり,情報を2次元イメージにマッピングしたり,グラフ表現で操作したりする既存のアプローチとは異なり,このアプローチでは,動的時間学習を伴うポイントクラウド学習から,軌道予測を空間的および時間的次元に分割することで,空間的および時間的情報をキャプチャするアイデアを拡張している。空間的次元ではエージェントは無秩序な点集合と見なすことができ、したがってエージェントの位置をモデル化するためにポイントクラウド学習技術を適用することは容易である。空間次元は運動的・運動的情報を考慮しないが,エージェントの時間的動きをモデル化するための動的時間学習も提案する。 Argoverse運動予測ベンチマークの実験は、私たちのアプローチが最先端の結果を達成することを示しています。 We propose the Temporal Point Cloud Networks (TPCN), a novel and flexible framework with joint spatial and temporal learning for trajectory prediction. Unlike existing approaches that rasterize agents and map information as 2D images or operate in a graph representation, our approach extends ideas from point cloud learning with dynamic temporal learning to capture both spatial and temporal information by splitting trajectory prediction into both spatial and temporal dimensions. In the spatial dimension, agents can be viewed as an unordered point set, and thus it is straightforward to apply point cloud learning techniques to model agents' locations. While the spatial dimension does not take kinematic and motion information into account, we further propose dynamic temporal learning to model agents' motion over time. Experiments on the Argoverse motion forecasting benchmark show that our approach achieves the state-of-the-art results.	翻訳日:2021-03-05 14:48:09 公開日:2021-03-04
# 大規模ビデオ圧縮センシングのためのメモリ効率ネットワーク Memory-Efficient Network for Large-scale Video Compressive Sensing ( http://arxiv.org/abs/2103.03089v1 ) ライセンス: Link先を確認	Ziheng Cheng, Bo Chen, Guanliang Liu, Hao Zhang, Ruiying Lu, Zhengjue Wang, Xin Yuan	(参考訳) video snapshot compressive imaging (sci) は、2d検出器を使って1つのショットで一連のビデオフレームをキャプチャする。基本原理は、1つの露光時間の間に異なるマスクを高速シーンに課して圧縮測定を行うというものである。マスクの知識により、このスナップショット測定から所望の高速映像フレームを再構成するために最適化アルゴリズムやディープラーニング手法が用いられる。残念ながら、これらの手法は良好な結果が得られるが、最適化アルゴリズムの長い実行時間やディープネットワークの巨大なトレーニングメモリ占有は、実用上のアプリケーションではそれらを妨げている。本稿では,マルチグループ可逆3次元畳み込みニューラルネットワークに基づく大規模映像SCIのためのメモリ効率の良いネットワークを開発する。グレースケールSCIシステムの基本モデルに加えて、我々はバイエル測定からカラービデオを直接回復するために、復号化とSCI再構築を組み合わせるためにさらに一歩進んでいます。 SCIカメラが捉えたシミュレーションと実データの両方の大規模な結果から,提案したモデルは,メモリの少ない従来モデルよりも優れており,大規模な問題に利用できることを示す。コードはhttps://github.com/BoChenGroup/RevSCI-netにある。 Video snapshot compressive imaging (SCI) captures a sequence of video frames in a single shot using a 2D detector. The underlying principle is that during one exposure time, different masks are imposed on the high-speed scene to form a compressed measurement. With the knowledge of masks, optimization algorithms or deep learning methods are employed to reconstruct the desired high-speed video frames from this snapshot measurement. Unfortunately, though these methods can achieve decent results, the long running time of optimization algorithms or huge training memory occupation of deep networks still preclude them in practical applications. In this paper, we develop a memory-efficient network for large-scale video SCI based on multi-group reversible 3D convolutional neural networks. In addition to the basic model for the grayscale SCI system, we take one step further to combine demosaicing and SCI reconstruction to directly recover color video from Bayer measurements. 
Extensive results on both simulation and real data captured by SCI cameras demonstrate that our proposed model outperforms previous state-of-the-art with less memory and thus can be used in large-scale problems. The code is at https://github.com/BoChenGroup/RevSCI-net.	翻訳日:2021-03-05 14:47:53 公開日:2021-03-04
# STEP:安全なオフロードナビゲーションのための確率的トラバーサビリティ評価と計画 STEP: Stochastic Traversability Evaluation and Planning for Safe Off-road Navigation ( http://arxiv.org/abs/2103.02828v1 ) ライセンス: Link先を確認	David D. Fan, Kyohei Otsu, Yuki Kubo, Anushri Dixit, Joel Burdick, and Ali-Akbar Agha-Mohammadi	(参考訳) 地上ロボットの自律性は、構造化および制御された環境で広く使用されているが、未知およびオフロード地形での自律性は依然として難しい問題である。未発達の荒野、洞窟、瓦瓦など、極端、オフロード、非構造的な環境は、自律的なナビゲーションに独特で困難な問題を引き起こす。これらの課題に対処するために, トラバーサビリティの評価と, 安全で実現可能な高速な軌道をリアルタイムに計画する手法を提案する。我々はSTEP (Stochastic Traversability Evaluation and Planning) と名づけたアプローチを, 1) 急激な不確実性認識マッピングとトラバースビリティ評価, 2) 条件付き値アットリスクを用いたテールリスク評価, 3) シーケンシャル2次プログラミングベース(SQP)モデル予測制御(MPC)を用いた効率的なリスクと制約対応キノダイナミックな運動計画に頼っている。本手法をシミュレーションで解析し,地下溶岩管を含む極端な地形を探索する車輪型および脚型ロボットプラットフォーム上での有効性を検証する。 Although ground robotic autonomy has gained widespread usage in structured and controlled environments, autonomy in unknown and off-road terrain remains a difficult problem. Extreme, off-road, and unstructured environments such as undeveloped wilderness, caves, and rubble pose unique and challenging problems for autonomous navigation. To tackle these problems we propose an approach for assessing traversability and planning a safe, feasible, and fast trajectory in real-time. Our approach, which we name STEP (Stochastic Traversability Evaluation and Planning), relies on: 1) rapid uncertainty-aware mapping and traversability evaluation, 2) tail risk assessment using the Conditional Value-at-Risk (CVaR), and 3) efficient risk and constraint-aware kinodynamic motion planning using sequential quadratic programming-based (SQP) model predictive control (MPC). We analyze our method in simulation and validate its efficacy on wheeled and legged robotic platforms exploring extreme terrains including an underground lava tube.	翻訳日:2021-03-05 14:47:34 公開日:2021-03-04
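Note on the tail-risk term in the STEP abstract above: Conditional Value-at-Risk at level alpha is commonly estimated from samples as the mean of the worst (1 - alpha) fraction of outcomes. The following is a minimal sketch of that generic estimator, not the STEP authors' implementation; the function name `empirical_cvar` and the convention that a higher cost is worse are assumptions made for illustration.

```python
import math

def empirical_cvar(costs, alpha):
    """Mean of the worst (1 - alpha) fraction of sampled costs.

    Assumption for this sketch: a higher cost means a worse outcome.
    """
    if not costs:
        raise ValueError("costs must be non-empty")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    ordered = sorted(costs, reverse=True)  # worst outcomes first
    # Round before ceil so float noise (e.g. 3.0000000000000004) does not add a sample.
    tail = max(1, math.ceil(round((1.0 - alpha) * len(ordered), 9)))
    return sum(ordered[:tail]) / tail
```

With alpha = 0 the estimate reduces to the plain mean, and as alpha approaches 1 it approaches the single worst sample, which is why a planner that minimizes it becomes more conservative as alpha grows.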
# サイバーフィジカルシステムのためのrlに基づく適応検出戦略 An RL-Based Adaptive Detection Strategy to Secure Cyber-Physical Systems ( http://arxiv.org/abs/2103.02872v1 ) ライセンス: Link先を確認	Ipsita Koley, Sunandan Adhikary and Soumyajit Dey	(参考訳) ネットワークへの依存が高まり、ソフトウェアベースの制御はサイバーフィジカルシステム(CPS)の脆弱性を増大させました。動的システム理論を利用して開発された検出・監視コンポーネントは、安全上の重要なCPSを偽データ注入攻撃から保護するための軽量なセキュリティ対策としてしばしば用いられる。しかし、既存のアプローチは攻撃シナリオと検出システムのパラメータを関連付けていない。本研究では,攻撃シナリオから学んだ経験に基づいて,これらの検出器のパラメータを適応的に設定し,検出率を最大化し,制御動作を保ちながらプロセス中の誤報を最小化する強化学習(rl)フレームワークを提案する。 Increased dependence on networked, software based control has escalated the vulnerabilities of Cyber Physical Systems (CPSs). Detection and monitoring components developed leveraging dynamical systems theory are often employed as lightweight security measures for protecting such safety critical CPSs against false data injection attacks. However, existing approaches do not correlate attack scenarios with parameters of detection systems. In the present work, we propose a Reinforcement Learning (RL) based framework which adaptively sets the parameters of such detectors based on experience learned from attack scenarios, maximizing detection rate and minimizing false alarms in the process while attempting performance preserving control actions.	翻訳日:2021-03-05 14:46:56 公開日:2021-03-04
# 微分プライベートディープラーニングにおける$\epsilon$の選択と監査の定量化 Quantifying identifiability to choose and audit $\epsilon$ in differentially private deep learning ( http://arxiv.org/abs/2103.02913v1 ) ライセンス: Link先を確認	Daniel Bernau, Günther Eibl, Philip W. Grassal, Hannah Keller, Florian Kerschbaum	(参考訳) 差分プライバシーにより、トレーニングデータレコードが機械学習モデルに与える影響を制限できます。機械学習で差分プライバシーを使用するには、データサイエンティストがプライバシパラメータ$(\epsilon,\delta)$を選択する必要がある。弱いプライバシパラメータでトレーニングされたモデルが過剰なプライバシリークを引き起こす可能性があり、強力なプライバシパラメータがモデルユーティリティを過度に低下させる可能性があるため、有意義なプライバシパラメータを選択することが重要だ。しかし,プライバシパラメータの値は2つの主な理由から選択が難しい。まず、選択された感度と実用的なデータセットのデータ分布に応じて、プライバシー損失$(\epsilon,\delta)$の上限は緩いかもしれません。第二に、匿名化の法的要件と社会的規範は個々の識別可能性を指し、$(\epsilon,\delta)$は間接的にのみ関係している。 先行研究は$(\epsilon,\delta)$の選択を導くためにメンバーシップ推論の敵対者を提案した。しかし、これらの敵はディファレンシャル・プライバシが想定する敵よりも弱く、$(\epsilon,\delta)$で定義されるプライバシ損失の上限を経験上は到達できない。したがって、メンバーシップ推論攻撃の定量化は、$(\epsilon,\delta)$が行う正確な意味を保持しません。我々は$(\epsilon,\delta)$を、トレーニングデータセットにおけるレコードの存在に関する差分プライバシーによって仮定される敵のベイズ的後方信念の束縛に変換する。構成下における多次元クエリのバウンダリは保持され、実際はタイトであることを示す。さらに, 識別可能性境界を導出し, 差動プライバシで想定される敵と, メンバシップ推論敵に対する先行研究との関連性を示す。我々は、データサイエンティストがモデルトレーニングを監査し、経験的識別可能性スコアと経験的$(\epsilon,\delta)$を計算することを可能にするこの差分プライバシーの逆数の実装を策定します。 Differential privacy allows bounding the influence that training data records have on a machine learning model. To use differential privacy in machine learning, data scientists must choose privacy parameters $(\epsilon,\delta)$. Choosing meaningful privacy parameters is key since models trained with weak privacy parameters might result in excessive privacy leakage, while strong privacy parameters might overly degrade model utility. However, privacy parameter values are difficult to choose for two main reasons. First, the upper bound on privacy loss $(\epsilon,\delta)$ might be loose, depending on the chosen sensitivity and data distribution of practical datasets. Second, legal requirements and societal norms for anonymization often refer to individual identifiability, to which $(\epsilon,\delta)$ are only indirectly related. Prior work has proposed membership inference adversaries to guide the choice of $(\epsilon,\delta)$. However, these adversaries are weaker than the adversary assumed by differential privacy and cannot empirically reach the upper bounds on privacy loss defined by $(\epsilon,\delta)$. Therefore, no quantification of a membership inference attack holds the exact meaning that $(\epsilon,\delta)$ does. We transform $(\epsilon,\delta)$ to a bound on the Bayesian posterior belief of the adversary assumed by differential privacy concerning the presence of any record in the training dataset. The bound holds for multidimensional queries under composition, and we show that it can be tight in practice. Furthermore, we derive an identifiability bound, which relates the adversary assumed in differential privacy to previous work on membership inference adversaries. We formulate an implementation of this differential privacy adversary that allows data scientists to audit model training and compute empirical identifiability scores and empirical $(\epsilon,\delta)$.	翻訳日:2021-03-05 14:46:45 公開日:2021-03-04
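Note on the posterior-belief bound mentioned in the abstract above: for the simpler case of pure epsilon-differential privacy (delta = 0), Bayes' rule gives a closed-form cap on how confident an adversary can become that a record was in the training set, because the likelihood ratio between the two neighbouring datasets is at most e^epsilon. The sketch below covers only that simplified case with a known prior; it does not reproduce the paper's (epsilon, delta) bound or its treatment of composition, and the function name `posterior_belief_bound` is hypothetical.

```python
import math

def posterior_belief_bound(epsilon, prior=0.5):
    """Upper bound on the adversary's posterior belief under pure epsilon-DP.

    Posterior odds = prior odds * likelihood ratio, and the likelihood ratio
    is at most exp(epsilon). Written with exp(-epsilon) so that large epsilon
    values do not overflow.
    """
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    inverse_prior_odds = (1.0 - prior) / prior
    return 1.0 / (1.0 + inverse_prior_odds * math.exp(-epsilon))
```

With a uniform prior this reduces to 1 / (1 + e^(-epsilon)): epsilon = 0 leaves the adversary at 50%, and the bound approaches 1 as epsilon grows, which is one way to read a numeric epsilon as an identifiability statement.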
# M5競合データの代表性を探る Exploring the representativeness of the M5 competition data ( http://arxiv.org/abs/2103.02941v1 ) ライセンス: Link先を確認	Evangelos Theodorou, Shengjie Wang, Yanfei Kang, Evangelos Spiliotis, Spyros Makridakis, Vassilios Assimakopoulos	(参考訳) m5コンペティションの主な目的は、walmartの階層的なユニットセールスの予測に焦点をあて、現場における予測方法の正確性と不確実性を評価し、ベストプラクティスを特定し、その実践的意義を強調することであった。しかし、m5コンペティションの成果が小売業者の意思決定と運用をより良く支援するために一般化され、活用されるかどうかは、m5データが現実を表わす程度、すなわち、異なる地域で活動する小売業者の単位販売データを十分に表現し、異なる種類の製品を販売し、異なるマーケティング戦略を検討するかによって異なる。この問いに答えるために、我々はM5時系列の特徴を分析し、特徴空間を用いて2つの食料品店、すなわちCorporación Favoritaと主要なギリシャのスーパーマーケットチェーンの特性を比較した。以上の結果から,m5データの代表性を支持するデータセット間の差異は少ないことが示唆された。 The main objective of the M5 competition, which focused on forecasting the hierarchical unit sales of Walmart, was to evaluate the accuracy and uncertainty of forecasting methods in the field in order to identify best practices and highlight their practical implications. However, whether the findings of the M5 competition can be generalized and exploited by retail firms to better support their decisions and operation depends on the extent to which the M5 data is representative of the reality, i.e., sufficiently represent the unit sales data of retailers that operate in different regions, sell different types of products, and consider different marketing strategies. To answer this question, we analyze the characteristics of the M5 time series and compare them with those of two grocery retailers, namely Corporación Favorita and a major Greek supermarket chain, using feature spaces. Our results suggest that there are only small discrepancies between the examined data sets, supporting the representativeness of the M5 data.	翻訳日:2021-03-05 14:46:09 公開日:2021-03-04
# ロバスト表現のための深層グラフ構造学習:調査 Deep Graph Structure Learning for Robust Representations: A Survey ( http://arxiv.org/abs/2103.03036v1 ) ライセンス: Link先を確認	Yanqiao Zhu, Weizhi Xu, Jinghao Zhang, Qiang Liu, Shu Wu, Liang Wang	(参考訳) グラフニューラルネットワーク(GNN)は、グラフ構造化データの解析に広く利用されている。ほとんどのGNN法はグラフ構造の品質に非常に敏感であり、通常は情報埋め込みを学ぶのに完璧なグラフ構造を必要とする。しかし、グラフにおける雑音の広がりは実世界の問題に対するロバスト表現の学習を必要とする。 GNNモデルの堅牢性を改善するために、最適化されたグラフ構造と対応する表現を共同学習することを目的としたグラフ構造学習(GSL)の中心概念を中心に多くの研究が提案されている。本研究の目的は, 頑健な表現を学習するための GSL 手法の最近の進歩を概観することである。具体的には、まずGSLの一般的なパラダイムを定式化し、次に、GSLの考え方を他のグラフタスクに組み込んだアプリケーションで、グラフ構造のモデル化方法によって分類された最先端の手法をレビューする。最後に,本研究の問題点を指摘し,今後の方向性について論じる。 Graph Neural Networks (GNNs) are widely used for analyzing graph-structured data. Most GNN methods are highly sensitive to the quality of graph structures and usually require a perfect graph structure for learning informative embeddings. However, the pervasiveness of noise in graphs necessitates learning robust representations for real-world problems. To improve the robustness of GNN models, many studies have been proposed around the central concept of Graph Structure Learning (GSL), which aims to jointly learn an optimized graph structure and corresponding representations. Towards this end, in the presented survey, we broadly review recent progress of GSL methods for learning robust representations. Specifically, we first formulate a general paradigm of GSL, and then review state-of-the-art methods classified by how they model graph structures, followed by applications that incorporate the idea of GSL in other graph tasks. Finally, we point out some issues in current studies and discuss future directions.	翻訳日:2021-03-05 14:45:51 公開日:2021-03-04
# 自律車いすプラットフォームにおける人間のナビゲーション意図の視線一致復号 Gaze-contingent decoding of human navigation intention on an autonomous wheelchair platform ( http://arxiv.org/abs/2103.03072v1 ) ライセンス: Link先を確認	Mahendran Subramanian, Suhyung Park, Pavel Orlov, Ali Shafti, A. Aldo Faisal	(参考訳) 我々は,モビリティ・デバイスのナビゲートの目的を理解するために,ユーザが環境をどのように見ているかをデコードすることで,モビリティ・プラットフォームを制御するためのwhere-you-look-is where-you-goアプローチの先駆者となった。しかし、多くの自然眼球運動は行動意図のデコードとは無関係であり、midas touch問題と呼ばれるデコードに挑戦する者もいる。本稿では,1. 深部コンピュータビジョンを用いて,ユーザが自分の視野で何を見ているのかを理解し,2. ユーザが見ている対象のバウンディングボックスのどこにあるのかを分析し,3. 単純な機械学習分類器を用いて,対象に対する視覚上の注意がその対象へのナビゲーション意図の予測であるかどうかを判断する。私たちのデコードシステムは最終的に、ユーザーがドアなどへ運転したいかどうかを判断するか、単にそれを見るかを決定します。重要なのは、ユーザーがオブジェクトを見て、それに向かって動いていることを想像すると、このモーターイメージ(神経インターフェイスと同様)から得られる目の動きはデコダラブルのままであることだ。運転意図と位置を検知すると、自動車椅子プラットフォームであるA.Eye-Driveに、静的で移動中の障害物を避けながら、所望の物体への移動を指示する。したがって,ナビゲーションのためには,車いすを目標(低レベルヒューマンインタフェース)に継続的に操るのではなく,目的と認知的にインタラクションすることのみを必要とする認知レベルのヒューマンインタフェースを実現する。 We have pioneered the Where-You-Look-Is Where-You-Go approach to controlling mobility platforms by decoding how the user looks at the environment to understand where they want to navigate their mobility device. However, many natural eye-movements are not relevant for action intention decoding, only some are, which places a challenge on decoding, the so-called Midas Touch Problem. Here, we present a new solution, consisting of 1. deep computer vision to understand what object a user is looking at in their field of view, with 2. an analysis of where on the object's bounding box the user is looking, to 3. use a simple machine learning classifier to determine whether the overt visual attention on the object is predictive of a navigation intention to that object. Our decoding system ultimately determines whether the user wants to drive to e.g., a door or just looks at it. 
Crucially, we find that when users look at an object and imagine they were moving towards it, the resulting eye-movements from this motor imagery (akin to neural interfaces) remain decodable. Once a driving intention and thus also the location is detected our system instructs our autonomous wheelchair platform, the A.Eye-Drive, to navigate to the desired object while avoiding static and moving obstacles. Thus, for navigation purposes, we have realised a cognitive-level human interface, as it requires the user only to cognitively interact with the desired goal, not to continuously steer their wheelchair to the target (low-level human interfacing).	翻訳日:2021-03-05 14:45:37 公開日:2021-03-04
# Gradient-Guided Dynamic Efficient Adversarial Training Gradient-Guided Dynamic Efficient Adversarial Training ( http://arxiv.org/abs/2103.03076v1 ) ライセンス: Link先を確認	Fu Wang, Yanghao Zhang, Yanbin Zheng, Wenjie Ruan	(参考訳) 敵意の強い攻撃に耐えられる強固なディープニューラルネットワークを訓練する上で、敵意のトレーニングは効果的だが時間がかかります。そこで本研究では,非効率性に対する反応として動的効率のよい対向訓練(deat)を提案し,対向的反復を徐々に増加させる。さらに、与えられたネットワークのLipschitz定数の下界の接続と、逆例に対する部分微分の大きさが理論的に明らかになる。この理論的な発見を裏付けるものとして, 勾配の大きさを利用して, 逆訓練の有効性を定量化し, 訓練手順の調整タイミングを決定する。このマグニチュードベースの戦略は計算に優しく、実装が簡単です。それはDEATのために特に適し、また反対の訓練方法の広い範囲に移植することができます。今後の研究に光を当てる可能性のある,効果的な対人訓練を実現するためには,一定のレベルのトレーニング対人事例の品質維持が不可欠であると考えられた。 Adversarial training is arguably an effective but time-consuming way to train robust deep neural networks that can withstand strong adversarial attacks. As a response to the inefficiency, we propose the Dynamic Efficient Adversarial Training (DEAT), which gradually increases the adversarial iteration during training. Moreover, we theoretically reveal the connection between the lower bound of the Lipschitz constant of a given network and the magnitude of its partial derivative towards adversarial examples. Supported by this theoretical finding, we utilize the gradient's magnitude to quantify the effectiveness of adversarial training and determine the timing to adjust the training procedure. This magnitude-based strategy is computationally friendly and easy to implement. It is especially suited for DEAT and can also be transplanted into a wide range of adversarial training methods. Our post-investigation suggests that maintaining the quality of the training adversarial examples at a certain level is essential to achieve efficient adversarial training, which may shed some light on future studies.	翻訳日:2021-03-05 14:45:08 公開日:2021-03-04
# ILoSA: スティフネスとアトラクションのインタラクティブな学習 ILoSA: Interactive Learning of Stiffness and Attractors ( http://arxiv.org/abs/2103.03099v1 ) ライセンス: Link先を確認	Giovanni Franzese, Anna Mészáros, Luka Peternel, and Jens Kober	(参考訳) ロボットに私たちの好みに応じて力を適用する方法を教えることは、複数のエンジニアリングの観点から取り組まなければならないオープンチャレンジです。本稿では,ユーザフレンドリーなインタフェースを用いて,人間の実演や修正からデカルト剛性とアトラクタの両方を学習できる可変インピーダンスポリシーの学習法について検討する。 ILoSAと呼ばれるこのフレームワークは、政策学習、不確実性領域の特定、インタラクティブな修正、剛性変調、アクティブ障害拒否を可能にするためにガウスプロセスを使用している。フランカ・エミカ・パンダにおいて,(1)プラグの取り外し成功時に突然の力の不連続が発生するプラグを引っ張る,(2)ロボットの動作を維持するために持続的な力を必要とする箱を押す,(3)力が移動方向に垂直に作用するホワイトボードを拭く,という3つの異なる力相互作用特性を有する実験的な評価を行った。 Teaching robots how to apply forces according to our preferences is still an open challenge that has to be tackled from multiple engineering perspectives. This paper studies how to learn variable impedance policies where both the Cartesian stiffness and the attractor can be learned from human demonstrations and corrections with a user-friendly interface. The presented framework, named ILoSA, uses Gaussian Processes for policy learning, identifying regions of uncertainty and allowing interactive corrections, stiffness modulation and active disturbance rejection. The experimental evaluation of the framework is carried out on a Franka-Emika Panda in three separate cases with unique force interaction properties: 1) pulling a plug wherein a sudden force discontinuity occurs upon successful removal of the plug, 2) pushing a box where a sustained force is required to keep the robot in motion, and 3) wiping a whiteboard in which the force is applied perpendicular to the direction of movement.	翻訳日:2021-03-05 14:44:50 公開日:2021-03-04
# Weisfeiler and Lehman Go Topological: Message Passing Simplicial Networks Weisfeiler and Lehman Go Topological: Message Passing Simplicial Networks ( http://arxiv.org/abs/2103.03212v1 ) ライセンス: Link先を確認	Cristian Bodnar, Fabrizio Frasca, Yu Guang Wang, Nina Otter, Guido Montúfar, Pietro Liò, Michael Bronstein	(参考訳) グラフ機械学習のペアワイズ相互作用パラダイムは、リレーショナルシステムのモデリングを主に支配している。しかし、グラフだけでは多くの複雑なシステムに存在するマルチレベル相互作用を捉えることができず、そのようなスキームの表現力は限定的であることが証明された。これらの制限を克服するために、グラフを高次元に一般化する位相的オブジェクトである単純複素体(SC)上のメッセージパッシングを実行するモデルのクラスであるMessage Passing Simplicial Networks (MPSNs)を提案する。理論的にモデルの表現性を解析するために,非同型SCを識別するための単純ワイスフェイラー・リーマンカラー化法を導入する。我々は、SWLのパワーと非同型グラフの識別の問題とを関連づけ、SWLとMPSNがWLテストよりも厳密に強力であり、3WLテストよりも強力でないことを示す。我々は,従来のグラフニューラルネットワークとReLUアクティベーションを比較して,表現可能な関数の線形領域の数の観点から分析を深めている。我々は,MPSNがGNNが失敗する難易度の高い正則グラフを識別し,配向同変層を備えると,GNNベースラインと比較して指向性SCの分類精度を向上させることができることを示すことによって,我々の理論的主張を実証的に支持する。さらに、我々は今後リリースする予定の単純複雑体上でメッセージパッシングのためのライブラリを実装します。 The pairwise interaction paradigm of graph machine learning has predominantly governed the modelling of relational systems. However, graphs alone cannot capture the multi-level interactions present in many complex systems and the expressive power of such schemes was proven to be limited. To overcome these limitations, we propose Message Passing Simplicial Networks (MPSNs), a class of models that perform message passing on simplicial complexes (SCs) - topological objects generalising graphs to higher dimensions. To theoretically analyse the expressivity of our model we introduce a Simplicial Weisfeiler-Lehman (SWL) colouring procedure for distinguishing non-isomorphic SCs. We relate the power of SWL to the problem of distinguishing non-isomorphic graphs and show that SWL and MPSNs are strictly more powerful than the WL test and not less powerful than the 3-WL test. We deepen the analysis by comparing our model with traditional graph neural networks with ReLU activations in terms of the number of linear regions of the functions they can represent. We empirically support our theoretical claims by showing that MPSNs can distinguish challenging strongly regular graphs for which GNNs fail and, when equipped with orientation equivariant layers, they can improve classification accuracy in oriented SCs compared to a GNN baseline. Additionally, we implement a library for message passing on simplicial complexes that we envision to release in due course.	翻訳日:2021-03-05 14:44:31 公開日:2021-03-04
# GenoML: ゲノムのための自動機械学習 GenoML: Automated Machine Learning for Genomics ( http://arxiv.org/abs/2103.03221v1 ) ライセンス: Link先を確認	Mary B. Makarious, Hampton L. Leonard, Dan Vitale, Hirotaka Iwaki, David Saffo, Lana Sargent, Anant Dadu, Eduardo Salmerón Castaño, John F. Carter, Melina Maleknia, Juan A. Botia, Cornelis Blauwendraat, Roy H. Campbell, Sayed Hadi Hashemi, Andrew B. Singleton, Mike A. Nalls, Faraz Faghri	(参考訳) GenoMLは、ゲノミクス(遺伝学とマルチオミクス)のための機械学習ワークフローを自動化するPythonパッケージである。ゲノムデータには、データのクリーン化、前処理、調和、および品質管理を行うための重要なドメイン専門知識が必要です。さらに、チューニング、検証、および解釈には、基礎となるデータ収集、プロトコル、および技術の生物学および潜在的に制限を考慮する必要があります。 GenoMLの使命は、完全な開発、評価、および展開プロセスを自動化する使いやすいツールを開発し、ゲノム学と臨床データの機械学習を非専門家にもたらすことです。オープンサイエンスを重視して、ワークフローを科学コミュニティ内で簡単にアクセス、複製、転送できるようにします。ソースコードとドキュメントはhttps://genoml.comで入手できる。 GenoML is a Python package automating machine learning workflows for genomics (genetics and multi-omics) with an open science philosophy. Genomics data require significant domain expertise to clean, pre-process, harmonize and perform quality control of the data. Furthermore, tuning, validation, and interpretation involve taking into account the biology and possibly the limitations of the underlying data collection, protocols, and technology. GenoML's mission is to bring machine learning for genomics and clinical data to non-experts by developing an easy-to-use tool that automates the full development, evaluation, and deployment process. Emphasis is put on open science to make workflows easily accessible, replicable, and transferable within the scientific community. Source code and documentation are available at https://genoml.com.	翻訳日:2021-03-05 14:44:06 公開日:2021-03-04
# ガウス過程のための小さなサンプル空間 Small Sample Spaces for Gaussian Processes ( http://arxiv.org/abs/2103.03169v1 ) ライセンス: Link先を確認	Toni Karvonen	(参考訳) ガウスプロセス$X$のサンプルの与えられた再生核ヒルベルト空間(RKHS)のメンバシップは、特定の核支配条件によって制御されることが知られている。しかし、サンプルを含む「小さな」関数の集合(必ずしもベクトル空間ではない)をどうやって識別するかは明確ではない。本稿では、そのような集合を識別するための一般的なアプローチを示す。私たちは、ヒルベルトスケールの一般化と見なすことができるスケールされたRKHSを使用して、スケールされたRKHSのコレクションによって誘導される$\sigma$-algebraで$X$の法則の下で完全な測定のすべての要素に含まれる最大のセットとしてサンプルサポートセットを定義します。この潜在的に測定不可能な集合は、$X$ の共分散核 RKHS の直交基底の点から拡大できる函数から成り、その平方基底係数が 0 と無限から遠ざかっていることが示され、Karhunen-Loève の定理によって示唆される。 It is known that the membership in a given reproducing kernel Hilbert space (RKHS) of the samples of a Gaussian process $X$ is controlled by a certain nuclear dominance condition. However, it is less clear how to identify a "small" set of functions (not necessarily a vector space) that contains the samples. This article presents a general approach for identifying such sets. We use scaled RKHSs, which can be viewed as a generalisation of Hilbert scales, to define the sample support set as the largest set which is contained in every element of full measure under the law of $X$ in the $\sigma$-algebra induced by the collection of scaled RKHSs. This potentially non-measurable set is then shown to consist of those functions that can be expanded in terms of an orthonormal basis of the RKHS of the covariance kernel of $X$ and have their squared basis coefficients bounded away from zero and infinity, a result suggested by the Karhunen-Loève theorem.	翻訳日:2021-03-05 14:43:23 公開日:2021-03-04
# 背景音楽と混合した放送データを用いたニューラルテキスト音声モデル A Neural Text-to-Speech Model Utilizing Broadcast Data Mixed with Background Music ( http://arxiv.org/abs/2103.03049v1 ) ライセンス: Link先を確認	Hanbin Bae, Jae-Sung Bae, Young-Sun Joo, Young-Ik Kim, Hoon-Young Cho	(参考訳) 近年,インターネットやyoutubeなどのメディアから音声データを得るのが容易になっているが,tts(neural text-to-speech)モデルを直接利用することは困難である。クリーンスピーチの割合は不十分であり、残りはバックグラウンドミュージックを含む。 global style token (gst)でさえも。そこで本研究では,放送データに制限のあるエンドツーエンドのTSモデルを学習する手法を提案する。まず、音楽フィルタを導入することにより、背景音楽が音声から削除される。第二に、補助品質分類器を備えたGST-TTSモデルは、フィルタリングされた音声と少量のクリーンな音声で訓練される。特に、品質分類器は、GST層の埋め込みベクトルを入力音声の音声品質(フィルタまたはクリーン)を表すことに重点を置いている。実験により,提案手法は従来手法よりもはるかに高品質な音声を合成することを確認した。 Recently, it has become easier to obtain speech data from various media such as the internet or YouTube, but directly utilizing them to train a neural text-to-speech (TTS) model is difficult. The proportion of clean speech is insufficient and the remainder includes background music. Even with the global style token (GST). Therefore, we propose the following method to successfully train an end-to-end TTS model with limited broadcast data. First, the background music is removed from the speech by introducing a music filter. Second, the GST-TTS model with an auxiliary quality classifier is trained with the filtered speech and a small amount of clean speech. In particular, the quality classifier makes the embedding vector of the GST layer focus on representing the speech quality (filtered or clean) of the input speech. The experimental results verified that the proposed method synthesized much more high-quality speech than conventional methods.	翻訳日:2021-03-05 14:42:32 公開日:2021-03-04
# One for One, or All for All: フェデレーション学習におけるコラボレーションの平衡と最適性 One for One, or All for All: Equilibria and Optimality of Collaboration in Federated Learning ( http://arxiv.org/abs/2103.03228v1 ) ライセンス: Link先を確認	Avrim Blum, Nika Haghtalab, Richard Lanas Phillips, Han Shao	(参考訳) 近年、連合学習は、多数の学習エージェントにまたがるコラボレーションを実現するためのアプローチとして受け入れられている。しかし、これらのコラボレーションを維持するために個別のリソースを共同学習に割り当てる際にエージェントのインセンティブをどのように考慮すべきかについては、ほとんど知られていない。本論文では,ゲーム理論の概念に触発されて,フェデレーション学習におけるインセンティブ認識学習とデータ共有のためのフレームワークを提案する。本研究は, 学習目標達成に関心のあるエージェントの存在下で, サンプル収集の負担を低く抑えながら, 協調の考え方を捉えたものである。例えば、うらやましのない平衡では、いかなるエージェントもサンプリング負荷を他のエージェントと交換することを望んでおらず、安定した平衡では、サンプリング負荷を一方的に低減したいエージェントはいない。この枠組みの形式化に加えて、我々の貢献には、そのような平衡の構造的性質を特徴づけ、その存在を証明し、どのように計算できるかを示すことが含まれる。さらに、エージェントのインセンティブを無視した場合のインセンティブ認識コラボレーションのサンプル複雑さと最適なコラボレーションのサンプルを比較します。 In recent years, federated learning has been embraced as an approach for bringing about collaboration across large populations of learning agents. However, little is known about how collaboration protocols should take agents' incentives into account when allocating individual resources for communal learning in order to maintain such collaborations. Inspired by game theoretic notions, this paper introduces a framework for incentive-aware learning and data sharing in federated learning. Our stable and envy-free equilibria capture notions of collaboration in the presence of agents interested in meeting their learning objectives while keeping their own sample collection burden low. For example, in an envy-free equilibrium, no agent would wish to swap their sampling burden with any other agent and in a stable equilibrium, no agent would wish to unilaterally reduce their sampling burden. In addition to formalizing this framework, our contributions include characterizing the structural properties of such equilibria, proving when they exist, and showing how they can be computed. 
Furthermore, we compare the sample complexity of incentive-aware collaboration with that of optimal collaboration when one ignores agents' incentives.	翻訳日:2021-03-05 14:42:06 公開日:2021-03-04
# Moshpit SGD:不均一不信頼デバイスにおけるコミュニケーション効率の良い分散トレーニング Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices ( http://arxiv.org/abs/2103.03239v1 ) ライセンス: Link先を確認	Max Ryabinin, Eduard Gorbunov, Vsevolod Plokhotnyuk, Gennady Pekhimenko	(参考訳) 大規模データセットでのディープニューラルネットワークのトレーニングは、複数の計算ノードを使用することで、しばしば加速される。分散トレーニングとして知られるこのアプローチは、リングオールリデューサのような特殊なメッセージパッシングプロトコルを使って数百のコンピュータを利用することができる。しかし、これらのプロトコルを大規模に実行するには、専用のクラスタでしか利用できない信頼性の高い高速ネットワークが必要である。対照的に、フェデレーション学習やクラウドベースの分散トレーニングといった現実世界のアプリケーションの多くは、不安定なネットワーク帯域を持つ信頼性の低いデバイス上で動作します。その結果、これらのアプリケーションは、パラメータサーバまたはgossipベースの平均プロトコルの使用に制限される。この研究では、指数関数的に世界平均に収束する反復平均プロトコルであるMoshpit All-Reduceを提案した。我々は,分散最適化のためのプロトコルの効率を,強い理論的保証で実証する。実験では、ImageNet上のResNet-50トレーニングの1.3倍のスピードアップと、プリエンプティブルな計算ノードを使用してALBERTをスクラッチからトレーニングする際の1.5倍のスピードアップが示されている。 Training deep neural networks on large datasets can often be accelerated by using multiple compute nodes. This approach, known as distributed training, can utilize hundreds of computers via specialized message-passing protocols such as Ring All-Reduce. However, running these protocols at scale requires reliable high-speed networking that is only available in dedicated clusters. In contrast, many real-world applications, such as federated learning and cloud-based distributed training, operate on unreliable devices with unstable network bandwidth. As a result, these applications are restricted to using parameter servers or gossip-based averaging protocols. In this work, we lift that restriction by proposing Moshpit All-Reduce -- an iterative averaging protocol that exponentially converges to the global average. We demonstrate the efficiency of our protocol for distributed optimization with strong theoretical guarantees. The experiments show 1.3x speedup for ResNet-50 training on ImageNet compared to competitive gossip-based strategies and 1.5x speedup when training ALBERT-large from scratch using preemptible compute nodes.	翻訳日:2021-03-05 14:41:48 公開日:2021-03-04
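Note on the iterative averaging idea in the Moshpit SGD abstract above: one way to see why averaging within small groups can reach the global average quickly is to place peers on a grid and average along one axis per round. The sketch below is an idealized illustration under strong assumptions (exactly side**dims peers, no failures, fixed groups); it does not model unreliable peers or the paper's actual group-formation mechanism, and the function name `grid_average` is hypothetical.

```python
from itertools import product

def grid_average(values, side, dims):
    """Average scalar values held by side**dims peers, one grid axis per round.

    Idealized assumption: every peer participates in every round.
    Returns the value each peer holds after `dims` rounds.
    """
    coords = list(product(range(side), repeat=dims))
    if len(values) != len(coords):
        raise ValueError("expected exactly side**dims values")
    state = dict(zip(coords, values))
    for axis in range(dims):
        # Peers whose coordinates differ only along `axis` form one group.
        groups = {}
        for coord in coords:
            key = coord[:axis] + coord[axis + 1:]
            groups.setdefault(key, []).append(coord)
        new_state = {}
        for members in groups.values():
            mean = sum(state[c] for c in members) / len(members)
            for c in members:
                new_state[c] = mean
        state = new_state
    return [state[c] for c in coords]
```

In this idealized setting each peer only ever talks to `side` peers at a time, yet after `dims` rounds every peer holds the mean of all side**dims starting values.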
# 収縮理論による学習型適応制御 Learning-based Adaptive Control via Contraction Theory ( http://arxiv.org/abs/2103.02987v1 ) ライセンス: Link先を確認	Hiroyasu Tsukamoto and Soon-Jo Chung and Jean-Jacques Slotine	(参考訳) 本稿では,適応神経収縮計量(adaptive Neural Contraction Metric, aNCM)と呼ばれる,乗法的に分離可能なパラメトリック不確かさを持つ非線形システムのための,新しいディープラーニングに基づく適応制御フレームワークを提案する。 aNCMは、パラメトリック不確実性の下での系軌道の漸近安定性と指数的有界性を保証する最適適応収縮計量のニューラルネットワークモデルを使用する。特に,神経収縮計量(NCM)の概念を活用し,有界な外乱を有する非線形システムに対して名目上十分安定なロバスト制御方針を得るとともに,この方針を新しい適応則と組み合わせて安定性保証を実現する。また,基底関数近似を用いてモデル化した力学系の適応制御に適用できることを示した。さらに、ニューラルネットワークをaNCMで使用することで、リアルタイムの実装が可能となり、様々なシステムに適用可能である。最先端技術に対するその優位性は、単純なカートポールバランスタスクで示される。 We present a new deep learning-based adaptive control framework for nonlinear systems with multiplicatively-separable parametric uncertainty, called an adaptive Neural Contraction Metric (aNCM). The aNCM uses a neural network model of an optimal adaptive contraction metric, the existence of which guarantees asymptotic stability and exponential boundedness of system trajectories under the parametric uncertainty. In particular, we exploit the concept of a Neural Contraction Metric (NCM) to obtain a nominal provably stable robust control policy for nonlinear systems with bounded disturbances, and combine this policy with a novel adaptation law to achieve stability guarantees. We also show that the framework is applicable to adaptive control of dynamical systems modeled via basis function approximation. Furthermore, the use of neural networks in the aNCM permits its real-time implementation, resulting in broad applicability to a variety of systems. Its superiority to the state-of-the-art is illustrated with a simple cart-pole balancing task.	翻訳日:2021-03-05 14:41:26 公開日:2021-03-04
# BERTをベースとした特許ノベルティ検索のトレーニング BERT based patent novelty search by training claims to their own description ( http://arxiv.org/abs/2103.01126v3 ) ライセンス: Link先を確認	Michael Freunek and André Bodmer	(参考訳) 本稿では,特許クレームを自己記述に結合する手法を提案する。この方法を適用することで、BERTはクレームの適切な記述を訓練する。このようなトレーニングされたBERT (claim-to-description-BERT) は、特許の新規性に関する記述を識別することができる。さらに,BERTの出力を有意に処理するために,新たなスコアリング方式,関連スコア,あるいは新規スコアを導入する。特許に関する最初の主張とそれに対応する記述に基づいてBERTを訓練することにより,特許出願の手法を検証した。 BERTの出力は、検索レポートの引用X文書と比較して、関連スコアと結果に基づいて処理されている。テストの結果、BERTは引用されたX文書のいくつかを非常に関連性が高いと評価した。 In this paper we present a method to concatenate patent claims to their own description. By applying this method, BERT trains suitable descriptions for claims. Such a trained BERT (claim-to-description-BERT) could be able to identify novelty relevant descriptions for patents. In addition, we introduce a new scoring scheme, relevance scoring or novelty scoring, to process the output of BERT in a meaningful way. We tested the method on patent applications by training BERT on the first claims of patents and corresponding descriptions. BERT's output has been processed according to the relevance score and the results compared with the cited X documents in the search reports. The test showed that BERT has scored some of the cited X documents as highly relevant.	翻訳日:2021-03-05 13:05:03 公開日:2021-03-04
# ハイブリッドネットワークを用いたイベントベース合成開口イメージング Event-based Synthetic Aperture Imaging with a Hybrid Network ( http://arxiv.org/abs/2103.02376v2 ) ライセンス: Link先を確認	Xiang Zhang, Liao Wei, Lei Yu, Wen Yang and Gui-Song Xia	(参考訳) 合成開口画像(SAI)は、オフフォーカス前景の閉塞をぼかし、マルチビュー画像からフォーカス内隠蔽対象を再構成することにより、その効果を生かすことができる。しかし、非常に密集した閉塞と極端な照明条件は、従来のフレームベースのカメラに基づくSAIに大きな障害をもたらし、性能劣化を引き起こす可能性がある。そこで本研究では,低レイテンシかつ高ダイナミックレンジの非同期イベントを生成可能なイベントカメラに基づく新しいSAIシステムを提案する。これにより、ほぼ連続的な視点で測定することで密閉体の干渉を排除でき、同時に露光問題に対処することができる。閉鎖対象を再構築するために、スパイキングニューラルネットワーク(SNN)と畳み込みニューラルネットワーク(CNN)からなるハイブリッドエンコーダデコーダネットワークを提案する。ハイブリッドネットワークでは、収集されたイベントの時空間情報が最初にsnn層によってエンコードされ、その後、スタイル転送cnnデコーダによってオクルードされたターゲットの視覚画像に変換される。実験により,非常に密集したオクルージョンと極端な照明条件に対処し,純イベントデータを用いて高品質な視覚画像を再構成できることを示す。 Synthetic aperture imaging (SAI) is able to achieve the see through effect by blurring out the off-focus foreground occlusions and reconstructing the in-focus occluded targets from multi-view images. However, very dense occlusions and extreme lighting conditions may bring significant disturbances to SAI based on conventional frame-based cameras, leading to performance degeneration. To address these problems, we propose a novel SAI system based on the event camera which can produce asynchronous events with extremely low latency and high dynamic range. Thus, it can eliminate the interference of dense occlusions by measuring with almost continuous views, and simultaneously tackle the over/under exposure problems. To reconstruct the occluded targets, we propose a hybrid encoder-decoder network composed of spiking neural networks (SNNs) and convolutional neural networks (CNNs). In the hybrid network, the spatio-temporal information of the collected events is first encoded by SNN layers, and then transformed to the visual image of the occluded targets by a style-transfer CNN decoder. 
Through experiments, the proposed method shows remarkable performance in dealing with very dense occlusions and extreme lighting conditions, and high quality visual images can be reconstructed using pure event data.	翻訳日:2021-03-05 13:04:27 公開日:2021-03-04
# ルールセットの視覚化:設計空間の探索と検証 Visualizing Rule Sets: Exploration and Validation of a Design Space ( http://arxiv.org/abs/2103.01022v2 ) ライセンス: Link先を確認	Jun Yuan, Oded Nov, Enrico Bertini	(参考訳) ルールセットは、透明性と知性が必要な設定でモデルロジックを伝える手段として、機械学習(ML)でよく使用される。ルールセットは通常、論理文(ルール)のテキストベースのリストとして表示される。驚いたことに、これまでルールを提示するための視覚的な代替方法を探求する作業は限られていた。本論文では,ルールの可読性や理解にポジティブな影響を与えると思われる視覚的要因に焦点をあてて,ルールの代替表現を設計するアイデアを検討する。本稿では,ルールセットを視覚化するための初期設計空間と,その影響を探索するユーザスタディを提案する。その結果, 設計要因のいくつかは, 精度への影響を最小限に抑えつつ, 読者がいかに効率的にルールを処理できるかに強い影響を与えていることがわかった。この作業は、ルールをコミュニケーション戦略として使用してMLモデルを理解する際に、実践者がより効果的なソリューションを採用するのに役立ちます。 Rule sets are often used in Machine Learning (ML) as a way to communicate the model logic in settings where transparency and intelligibility are necessary. Rule sets are typically presented as a text-based list of logical statements (rules). Surprisingly, to date there has been limited work on exploring visual alternatives for presenting rules. In this paper, we explore the idea of designing alternative representations of rules, focusing on a number of visual factors we believe have a positive impact on rule readability and understanding. The paper presents an initial design space for visualizing rule sets and a user study exploring their impact. The results show that some design factors have a strong impact on how efficiently readers can process the rules while having minimal impact on accuracy. This work can help practitioners employ more effective solutions when using rules as a communication strategy to understand ML models.	翻訳日:2021-03-05 13:04:06 公開日:2021-03-04
# ダグは好きか? 構造学習と因果的発見に関する調査 D'ya like DAGs? A Survey on Structure Learning and Causal Discovery ( http://arxiv.org/abs/2103.02582v2 ) ライセンス: Link先を確認	Matthew J. Vowels, Necati Cihan Camgoz, and Richard Bowden	(参考訳) 因果推論は科学と人間の知性の重要な部分です。データから因果関係を発見するためには構造探索法が必要である。本稿では、背景理論のレビューと構造発見手法の調査を行う。私たちは主にモダンで継続的な最適化手法にフォーカスし、ベンチマークデータセットやソフトウェアパッケージといったさらなるリソースへの参照を提供します。最後に,構造から因果関係へ導くために必要な跳躍について論じる。 Causal reasoning is a crucial part of science and human intelligence. In order to discover causal relationships from data, we need structure discovery methods. We provide a review of background theory and a survey of methods for structure discovery. We primarily focus on modern, continuous optimization methods, and provide reference to further resources such as benchmark datasets and software packages. Finally, we discuss the assumptive leap required to take us from structure to causality.	翻訳日:2021-03-05 13:03:51 公開日:2021-03-04
# ジェネレーティブ・アドバーサリー・ネットワークを1つの段階でトレーニングする Training Generative Adversarial Networks in One Stage ( http://arxiv.org/abs/2103.00430v2 ) ライセンス: Link先を確認	Chengchao Shen, Youtan Yin, Xinchao Wang, Xubin LI, Jie Song, Mingli Song	(参考訳) GAN(Generative Adversarial Networks)は、様々な画像生成タスクで前例のない成功を収めています。しかし、奨励的な結果は、発電機と識別器が2つの段階に交互に更新される面倒なトレーニングプロセスの価格で提供されます。本稿では,1段階のみに効率よくGANを訓練できる総合的な訓練手法について検討する。生成器と識別器の対角的損失に基づいて、GANを対称的GANと非対称的GANの2つのクラスに分類し、2つのクラスを統一する新たな勾配分解法を導入し、両方のクラスを1段階にトレーニングし、トレーニング作業を緩和する。いくつかのデータセットと様々なネットワークアーキテクチャの計算解析と実験結果から,提案した1段階トレーニングスキームは,ジェネレータと判別器のネットワークアーキテクチャによらず,従来のトレーニングスキームよりも1.5$\times$加速度が得られた。さらに,提案手法は,データフリーナレッジ蒸留など,他の対比訓練シナリオにも容易に適用できることを示した。ソースコードはもうすぐ公開します。 Generative Adversarial Networks (GANs) have demonstrated unprecedented success in various image generation tasks. The encouraging results, however, come at the price of a cumbersome training process, during which the generator and discriminator are alternately updated in two stages. In this paper, we investigate a general training scheme that enables training GANs efficiently in only one stage. Based on the adversarial losses of the generator and discriminator, we categorize GANs into two classes, Symmetric GANs and Asymmetric GANs, and introduce a novel gradient decomposition method to unify the two, allowing us to train both classes in one stage and hence alleviate the training effort. Computational analysis and experimental results on several datasets and various network architectures demonstrate that, the proposed one-stage training scheme yields a solid 1.5$\times$ acceleration over conventional training schemes, regardless of the network architectures of the generator and discriminator. Furthermore, we show that the proposed method is readily applicable to other adversarial-training scenarios, such as data-free knowledge distillation. Our source code will be published soon.	翻訳日:2021-03-05 13:03:47 公開日:2021-03-04
# 授業増分学習におけるデータの蒸留因果効果 Distilling Causal Effect of Data in Class-Incremental Learning ( http://arxiv.org/abs/2103.01737v2 ) ライセンス: Link先を確認	Xinting Hu, Kaihua Tang, Chunyan Miao, Xian-Sheng Hua, Hanwang Zhang	(参考訳) 本研究では,CIL(Class-Incremental Learning)における破滅的忘れについて説明し,データリプレイや特徴/ラベル蒸留といった既存のアンチフォーガーティング手法に直交する新しい蒸留法を導出するための因果的枠組みを提案する。まず最初に、CILをフレームワークに配置し、2) 忘れる理由に答える: 古いデータの因果効果が新しいトレーニングで失われ、3) 既存のテクニックがそれを緩和する方法について説明する: 因果効果を取り戻せる。この枠組みから, 特徴・ラベル蒸留は貯蔵効率が高いが, その因果効果は, データ再生によって保存されるエンドツーエンドの特徴学習の長所と一致しないことがわかった。そこで本研究では,データ再生の因果効果と基本的に等価であるが,再生ストレージのコストを伴わずに,古いデータと新しいデータとの衝突効果を蒸留することを提案する。因果効果分析のおかげで、データストリームのIncremental Momentum Effectをさらにキャプチャし、新しいデータ効果によって圧倒された古い効果を保持するのに役立つものを削除し、テストにおける古いクラスの忘れを軽減することができます。 CIFAR-100、ImageNet-Sub&Fullの3つのCILベンチマークに関する広範な実験は、提案された因果効果蒸留が、様々な最先端のCIL法を大きなマージン(0.72%--9.06%)で改善できることを示した。 We propose a causal framework to explain the catastrophic forgetting in Class-Incremental Learning (CIL) and then derive a novel distillation method that is orthogonal to the existing anti-forgetting techniques, such as data replay and feature/label distillation. We first 1) place CIL into the framework, 2) answer why the forgetting happens: the causal effect of the old data is lost in new training, and then 3) explain how the existing techniques mitigate it: they bring the causal effect back. Based on the framework, we find that although the feature/label distillation is storage-efficient, its causal effect is not coherent with the end-to-end feature learning merit, which is however preserved by data replay. To this end, we propose to distill the Colliding Effect between the old and the new data, which is fundamentally equivalent to the causal effect of data replay, but without any cost of replay storage. 
Thanks to the causal effect analysis, we can further capture the Incremental Momentum Effect of the data stream, removing which can help to retain the old effect overwhelmed by the new data effect, and thus alleviate the forgetting of the old class in testing. Extensive experiments on three CIL benchmarks: CIFAR-100, ImageNet-Sub&Full, show that the proposed causal effect distillation can improve various state-of-the-art CIL methods by a large margin (0.72%--9.06%).	翻訳日:2021-03-05 13:03:27 公開日:2021-03-04
# PENet: 精密かつ効率的な画像ガイド深度補完を目指して PENet: Towards Precise and Efficient Image Guided Depth Completion ( http://arxiv.org/abs/2103.00783v2 ) ライセンス: Link先を確認	Mu Hu, Shuling Wang, Bin Li, Shiyu Ning, Li Fan, and Xiaojin Gong	(参考訳) 画像案内深度完成は、スパース深度マップと高品質な画像から濃密深度マップを生成するタスクである。このタスクでは、色と深さのモダリティを融合する方法が、優れたパフォーマンスを達成する上で重要な役割を果たす。本論文では, 色優性分枝と深度優性分枝からなる2枝バックボーンを提案し, 2つのモダリティを徹底的に活用・融合する。具体的には、一方の分岐は色画像とスパース深度マップを入力し、高密度な深度マップを予測する。他方の分岐は、スパース深度マップと予め予測された深さマップを入力とし、高密度深さマップも出力する。 2つの枝から予測される深度マップは互いに補完的であり、適応的に融合する。さらに,3次元幾何学的手がかりを符号化する簡単な幾何学的畳み込み層も提案する。幾何エンコードされたバックボーンは、複数の段階で異なるモダリティの融合を行い、良好な深さ完成結果をもたらします。さらに、融合された深度マップを効率的に洗練するために、拡張および加速CSPN++を実装します。提案する完全モデルは、提出時点でKITTI depth completion online leaderboardで1位にランクインしている。また、トップクラスのほとんどのメソッドよりもはるかに高速に推論する。この作業のコードはhttps://github.com/JUGGHM/PENet_ICRA2021で公開予定です。 Image guided depth completion is the task of generating a dense depth map from a sparse depth map and a high quality image. In this task, how to fuse the color and depth modalities plays an important role in achieving good performance. This paper proposes a two-branch backbone that consists of a color-dominant branch and a depth-dominant branch to exploit and fuse two modalities thoroughly. More specifically, one branch inputs a color image and a sparse depth map to predict a dense depth map. The other branch takes as inputs the sparse depth map and the previously predicted depth map, and outputs a dense depth map as well. The depth maps predicted from two branches are complementary to each other and therefore they are adaptively fused. In addition, we also propose a simple geometric convolutional layer to encode 3D geometric cues. The geometric encoded backbone conducts the fusion of different modalities at multiple stages, leading to good depth completion results. We further implement a dilated and accelerated CSPN++ to refine the fused depth map efficiently. The proposed full model ranks 1st in the KITTI depth completion online leaderboard at the time of submission. It also infers much faster than most of the top ranked methods. The code of this work will be available at https://github.com/JUGGHM/PENet_ICRA2021.	翻訳日:2021-03-05 13:03:01 公開日:2021-03-04
# StyleGAN用円形人工物の系統解析と除去 Systematic Analysis and Removal of Circular Artifacts for StyleGAN ( http://arxiv.org/abs/2103.01090v2 ) ライセンス: Link先を確認	Way Tan, Bihan Wen, Xulei Yang	(参考訳) StyleGANは、高解像度と超リアルな顔画像の合成で有名な最先端の画像ジェネレーターの1つです。バニラスタイルGANモデルによって生成された画像は視覚的に魅力的であるが、しばしば、生成された画像の品質を著しく低下させる顕著な円形のアーティファクトを含んでいる。本研究では、バニラ様式GANアーキテクチャの異なる段階の機能を検討し、メカニズム解析と広範な実験の両方を用いて、これらの円形アーティファクトがどのように形成されるのかを体系的に調査する。このような望ましくないアーティファクトを促進するバニラスタイルガンのキーモジュールが強調される。私たちの調査では、アーティファクトが通常、円形であり、比較的小さく、まれに2つ以上の部分に分割される理由も説明しています。さらに,バニラ型GANの顕著な円形アーティファクトを,新しいピクセルインスタンス正規化(PIN)層を適用して,簡易かつ効果的に除去する手法を提案する。 StyleGAN is one of the state-of-the-art image generators which is well-known for synthesizing high-resolution and hyper-realistic face images. Though images generated by vanilla StyleGAN model are visually appealing, they sometimes contain prominent circular artifacts which severely degrade the quality of generated images. In this work, we provide a systematic investigation on how those circular artifacts are formed by studying the functionalities of different stages of vanilla StyleGAN architecture, with both mechanism analysis and extensive experiments. The key modules of vanilla StyleGAN that promote such undesired artifacts are highlighted. Our investigation also explains why the artifacts are usually circular, relatively small and rarely split into 2 or more parts. Besides, we propose a simple yet effective solution to remove the prominent circular artifacts for vanilla StyleGAN, by applying a novel pixel-instance normalization (PIN) layer.	翻訳日:2021-03-05 13:02:40 公開日:2021-03-04
# 自己分散バイナリニューラルネットワーク Self-Distribution Binary Neural Networks ( http://arxiv.org/abs/2103.02394v2 ) ライセンス: Link先を確認	Ping Xue, Yang Lu, Jingfei Chang, Xing Wei, Zhen Wei	(参考訳) 本研究では、重みとアクティベーションの両方がバイナリ(すなわち1ビット表現)である2進ニューラルネットワーク(BNN)について検討する。特徴表現はディープニューラルネットワークにとって重要ですが、BNNでは特徴は符号でしか区別されません。先行研究では、量子化誤差を低減し、BNNの分類精度を効果的に向上するために、二元重みとアクティベーションにスケーリング係数を導入する。しかしながら、スケーリング要因はネットワークの計算複雑性を増加させるだけでなく、バイナリ機能の符号に対しては意味をなさない。そこで,SD-BNN(Self-Distribution Binary Neural Network)を提案する。まず、アクティベーション自己分布(ASD)を用いて、アクティベーションの符号分布を適応的に調整し、畳み込みの出力の符号差を改善する。第二に、重量自己分布(WSD)を通じて重みの符号分布を調整し、畳み込みの出力の符号分布を微調整します。さまざまなネットワーク構造を持つCIFAR-10およびImageNetデータセットの広範な実験は、提案されたSD-BNNが、より少ない計算コストで常に最先端の(SOTA)BNNを上回ることを示している(例えば、ResNet-18を用いてCIFAR-10で92.5%、ImageNetで66.5%を達成)。コードはhttps://github.com/pingxue-hfut/SD-BNNで入手できる。 In this work, we study the binary neural networks (BNNs) of which both the weights and activations are binary (i.e., 1-bit representation). Feature representation is critical for deep neural networks, while in BNNs, the features only differ in signs. Prior work introduces scaling factors into binary weights and activations to reduce the quantization error and effectively improves the classification accuracy of BNNs. However, the scaling factors not only increase the computational complexity of networks, but also make no sense to the signs of binary features. To this end, Self-Distribution Binary Neural Network (SD-BNN) is proposed. Firstly, we utilize Activation Self Distribution (ASD) to adaptively adjust the sign distribution of activations, thereby improving the sign differences of the outputs of the convolution. Secondly, we adjust the sign distribution of weights through Weight Self Distribution (WSD) and then fine-tune the sign distribution of the outputs of the convolution. Extensive experiments on CIFAR-10 and ImageNet datasets with various network structures show that the proposed SD-BNN consistently outperforms the state-of-the-art (SOTA) BNNs (e.g., achieves 92.5% on CIFAR-10 and 66.5% on ImageNet with ResNet-18) with less computation cost. Code is available at https://github.com/pingxue-hfut/SD-BNN.	翻訳日:2021-03-05 13:02:23 公開日:2021-03-04
# $S^3$: ガイド深度推定のための学習可能なスパース信号超密度 $S^3$: Learnable Sparse Signal Superdensity for Guided Depth Estimation ( http://arxiv.org/abs/2103.02396v2 ) ライセンス: Link先を確認	Yu-Kai Huang, Yueh-Cheng Liu, Tsung-Han Wu, Hung-Ting Su, Yu-Cheng Chang, Tsung-Lin Tsou, Yu-An Wang, and Winston H. Hsu	(参考訳) 密な深度推定は、ロボット工学、3D再構成、拡張現実といった複数のアプリケーションにおいて重要な役割を果たす。 LiDAR や Radar などのスパース信号は高密度深度推定のガイダンスとして利用されているが、密度が低く、分布が不均衡なため改善が制限されている。スパースソースから有効性を最大化するために,拡張領域の信頼性を推定しながらスパースキューから深さ値を拡張する,$S^3$手法を提案する。提案した$S^3$は、様々なガイド付き深度推定手法に適用でき、入力、コストボリューム、出力といった異なる段階でエンドツーエンドに訓練できる。広範な実験はLiDARおよびレーダー信号の$S^3$の技術の有効性、堅牢性および柔軟性を示す。 Dense depth estimation plays a key role in multiple applications such as robotics, 3D reconstruction, and augmented reality. While sparse signal, e.g., LiDAR and Radar, has been leveraged as guidance for enhancing dense depth estimation, the improvement is limited due to its low density and imbalanced distribution. To maximize the utility from the sparse source, we propose the $S^3$ technique, which expands the depth value from sparse cues while estimating the confidence of the expanded region. The proposed $S^3$ can be applied to various guided depth estimation approaches and trained end-to-end at different stages, including input, cost volume and output. Extensive experiments demonstrate the effectiveness, robustness, and flexibility of the $S^3$ technique on LiDAR and Radar signal.	翻訳日:2021-03-05 13:01:53 公開日:2021-03-04
# 超解像圧縮画像の並列化とアーティファクト低減と分解能向上のシリーズ統合 Super-resolving Compressed Images via Parallel and Series Integration of Artifact Reduction and Resolution Enhancement ( http://arxiv.org/abs/2103.01698v2 ) ライセンス: Link先を確認	Hongming Luo, Fei Zhou, Guangsen Liao, and Guoping Qiu	(参考訳) 本論文では,アーティファクト除去と解像度向上の並列および直列統合に基づく新しい圧縮画像超解像(CISR)フレームワークを提案する。クリーンな低分解能(LR)入力画像と、ダウンサンプリングおよび圧縮観察からのクリーンな高分解能(HR)出力イメージを推定するための最大事後確率推定に基づいて、アーティファクトリダクションモジュール(ARM)とリゾリューションエンハンスモジュール(REM)の2つのディープニューラルネットワークモジュールからなるCISRアーキテクチャを設計しました。 ARMとREMは、圧縮LRイメージを入力として取得することと並行して動作し、REMはARMの出力を入力の1つとして取得し、ARMはREMの出力を他の入力として取得する。 CISRシステムのユニークな特徴は、異なる方法で圧縮されたLR画像を様々な品質に超解ける1つの訓練されたモデルである。これは、画像劣化を処理するためのディープニューラルネットワーク容量と、ARMとREM間の並列および直列接続を利用して、特定の劣化への依存を減らすことで実現される。 ARMとREMは、深層展開技術によって同時に訓練される。 JPEGとWebP圧縮画像の混合に対して,圧縮型と圧縮係数の事前知識のない実験を行った。視覚的および定量的比較は、最先端の超解像手法よりも優れていることを示している。 In this paper, we propose a novel compressed image super resolution (CISR) framework based on parallel and series integration of artifact removal and resolution enhancement. Based on maximum a posteriori inference for estimating a clean low-resolution (LR) input image and a clean high resolution (HR) output image from down-sampled and compressed observations, we have designed a CISR architecture consisting of two deep neural network modules: the artifact reduction module (ARM) and resolution enhancement module (REM). ARM and REM work in parallel with both taking the compressed LR image as their inputs, while they also work in series with REM taking the output of ARM as one of its inputs and ARM taking the output of REM as its other input. A unique property of our CISR system is that a single trained model is able to super-resolve LR images compressed by different methods to various qualities. This is achieved by exploiting the capacity of deep neural networks for handling image degradations, and the parallel and series connections between ARM and REM to reduce the dependency on specific degradations. ARM and REM are trained simultaneously by the deep unfolding technique. Experiments are conducted on a mixture of JPEG and WebP compressed images without a priori knowledge of the compression type and compression factor. Visual and quantitative comparisons demonstrate the superiority of our method over state-of-the-art super resolution methods. Code link: https://github.com/luohongming/CISR_PSI	翻訳日:2021-03-05 13:01:39 公開日:2021-03-04
# 動的融合モジュールによる走行可能領域と道路異常検出の進化: ベンチマークとアルゴリズム Dynamic Fusion Module Evolves Drivable Area and Road Anomaly Detection: A Benchmark and Algorithms ( http://arxiv.org/abs/2103.02433v2 ) ライセンス: Link先を確認	Hengli Wang, Rui Fan, Yuxiang Sun, Ming Liu	(参考訳) 移動ロボットにとって,走行可能領域と道路異常の同時検出は非常に重要である。近年,畳み込みニューラルネットワーク(CNN)に基づく多くのセマンティックセグメンテーション手法が,画素単位の走行可能領域と道路異常検出のために提案されている。さらに、KITTIやCityscapesなどのベンチマークデータセットが広く使用されている。しかし、既存のベンチマークは主に自動運転車向けに設計されている。ロボット車椅子などの地上移動ロボットのベンチマークが欠けています。そこで本論文では,まず地上移動ロボットの走行可能領域と道路異常検出ベンチマークを構築し,視覚的特徴の6つのモダリティを用いて,既存の最先端の単一モーダルおよびデータ融合セマンティックセグメンテーションCNNを評価する。さらに,動的融合モジュール(DFM)と呼ばれる新しいモジュールを提案し,既存のデータ融合ネットワークに容易に展開し,異なるタイプの視覚的特徴を効果的かつ効率的に融合させることができる。実験の結果,変換された視差画像が最も有益な視覚的特徴であり,提案したDFM-RTFNetは最先端技術よりも優れていた。さらに,我々のDFM-RTFNetは,KITTIロードベンチマーク上での競合性能を実現している。私たちのベンチマークはhttps://sites.google.com/view/gmrbで公開されています。 Joint detection of drivable areas and road anomalies is very important for mobile robots. Recently, many semantic segmentation approaches based on convolutional neural networks (CNNs) have been proposed for pixel-wise drivable area and road anomaly detection. In addition, some benchmark datasets, such as KITTI and Cityscapes, have been widely used. However, the existing benchmarks are mostly designed for self-driving cars. A benchmark for ground mobile robots, such as robotic wheelchairs, is lacking. Therefore, in this paper, we first build a drivable area and road anomaly detection benchmark for ground mobile robots, evaluating the existing state-of-the-art single-modal and data-fusion semantic segmentation CNNs using six modalities of visual features. Furthermore, we propose a novel module, referred to as the dynamic fusion module (DFM), which can be easily deployed in existing data-fusion networks to fuse different types of visual features effectively and efficiently. The experimental results show that the transformed disparity image is the most informative visual feature and the proposed DFM-RTFNet outperforms the state of the art. Additionally, our DFM-RTFNet achieves competitive performance on the KITTI road benchmark. Our benchmark is publicly available at https://sites.google.com/view/gmrb.	翻訳日:2021-03-05 13:01:12 公開日:2021-03-04
# 飛び方を学ぶ -- マルチエージェント・クアッドコプター制御の強化学習のためのPyBullet物理を用いたGym環境 Learning to Fly -- a Gym Environment with PyBullet Physics for Reinforcement Learning of Multi-agent Quadcopter Control ( http://arxiv.org/abs/2103.02142v2 ) ライセンス: Link先を確認	Jacopo Panerati (1 and 2), Hehui Zheng (3), SiQi Zhou (1 and 2), James Xu (1), Amanda Prorok (3), Angela P. Schoellig (1 and 2) ((1) University of Toronto Institute for Aerospace Studies, (2) Vector Institute for Artificial Intelligence, (3) University of Cambridge)	(参考訳) ロボットシミュレータは、学術研究と教育、および安全クリティカルなアプリケーションの開発に不可欠です。強化学習環境 -- 報酬関数の形で問題仕様と結合した単純なシミュレーション -- もまた、学習アルゴリズムの開発(およびベンチマーク)を標準化する上で重要である。しかし、フルスケールのシミュレータは移植性と並列性に欠ける。逆に、多くの強化学習環境は、おもちゃのような問題における高いサンプルスループットと引き換えに現実性を犠牲にしている。パブリックデータセットはディープラーニングとコンピュータビジョンに大きく貢献していますが、制御理論と強化学習アプローチを同時に開発し、公平に比較するソフトウェアツールはまだありません。本稿では,Bullet物理エンジンに基づく複数クワッドコプターのためのオープンソースのOpenAI Gymライクな環境を提案する。マルチエージェントおよびビジョンベースの強化学習インターフェース、および現実的な衝突と空力効果のサポートにより、私たちの知る限り、この種のものとしては初めての環境です。我々は、制御(PID制御による軌道追跡、ダウンウォッシュを伴うマルチロボット飛行など)および強化学習(単一および複数エージェントの安定化タスク)のいくつかの例を通してその使用法を実証し、制御理論と機械学習を組み合わせた将来の研究を刺激することを望んでいます。 Robotic simulators are crucial for academic research and education as well as the development of safety-critical applications. Reinforcement learning environments -- simple simulations coupled with a problem specification in the form of a reward function -- are also important to standardize the development (and benchmarking) of learning algorithms. Yet, full-scale simulators typically lack portability and parallelizability. Vice versa, many reinforcement learning environments trade off realism for high sample throughputs in toy-like problems. While public data sets have greatly benefited deep learning and computer vision, we still lack the software tools to simultaneously develop -- and fairly compare -- control theory and reinforcement learning approaches. In this paper, we propose an open-source OpenAI Gym-like environment for multiple quadcopters based on the Bullet physics engine. Its multi-agent and vision based reinforcement learning interfaces, as well as the support of realistic collisions and aerodynamic effects, make it, to the best of our knowledge, a first of its kind. We demonstrate its use through several examples, either for control (trajectory tracking with PID control, multi-robot flight with downwash, etc.) or reinforcement learning (single and multi-agent stabilization tasks), hoping to inspire future research that combines control theory and machine learning.	翻訳日:2021-03-05 13:00:36 公開日:2021-03-04


# 周期量子因果モデル

Cyclic Quantum Causal Models ( http://arxiv.org/abs/2002.12157v3 )

ライセンス: Link先を確認

Jonathan Barrett, Robin Lorenz, Ognyan Oreshkov

(参考訳) 因果推論は科学に不可欠であるが、量子論はそれに挑戦する。ベルの不等式に違反する量子相関は古典的因果モデルの枠組みの中で満足のいく因果的説明を否定する。さらに、量子システムと重力を包含する理論は、因果順序が不定な操作を含む因果的非分離過程を許容すると予想され、事象がそもそも因果的に順序づけられるという前提に反する。最初の課題は、量子過程が明確な因果順序を持つ(すなわち非巡回的な因果構造を持つ)場合に限り、その因果的説明を可能にする、本質的に量子的な因果モデルの近年の発展によって取り組まれてきた。この研究は因果的に分離不能なプロセスに対処し、量子因果モデルを循環因果構造に拡張することでそれらの因果的視点を提供する。このアプローチの他の応用として、ユニタリ拡張可能な2成分プロセスはすべて因果分離可能であり、ユニタリプロセスでは因果非分離性と因果構造の循環性が等価であることが示されている。

Causal reasoning is essential to science, yet quantum theory challenges it. Quantum correlations violating Bell inequalities defy satisfactory causal explanations within the framework of classical causal models. What is more, a theory encompassing quantum systems and gravity is expected to allow causally nonseparable processes featuring operations in indefinite causal order, defying that events be causally ordered at all. The first challenge has been addressed through the recent development of intrinsically quantum causal models, allowing causal explanations of quantum processes -- provided they admit a definite causal order, i.e. have an acyclic causal structure. This work addresses causally nonseparable processes and offers a causal perspective on them through extending quantum causal models to cyclic causal structures. Among other applications of the approach, it is shown that all unitarily extendible bipartite processes are causally separable and that for unitary processes, causal nonseparability and cyclicity of their causal structure are equivalent.

翻訳日:2023-06-01 12:37:18 公開日:2021-03-04

# 最も短いベクトル問題に対する2つの量子イジングアルゴリズム

Two quantum Ising algorithms for the Shortest Vector Problem: one for now and one for later ( http://arxiv.org/abs/2006.14057v7 )

ライセンス: Link先を確認

David Joseph, Adam Callison, Cong Ling, Florian Mintert

(参考訳) 量子コンピュータは、今日の公開鍵暗号を数十年以内に破ると予想されている。新しい暗号系は後量子時代のために設計・標準化されており、これらのかなりの割合は、最短ベクトル問題のような問題が量子的な攻撃者にとっても困難であることに依拠している。本稿では,この問題を解決するための量子イジングアルゴリズムの2つの変種について述べる。 1つの変種は空間的に効率的であり、N が格子次元であるような O(N log N) 量子ビットのみを必要とするが、もう1つの変種はノイズに対してより堅牢である。量子アニール器および数値シミュレーションにおけるアルゴリズムの性能の解析は、より量子ビット効率のよい変種は長期的には優れ、他の変種は短期的な実装に適していることを示している。

Quantum computers are expected to break today's public key cryptography within a few decades. New cryptosystems are being designed and standardised for the post-quantum era, and a significant proportion of these rely on the hardness of problems like the Shortest Vector Problem to a quantum adversary. In this paper we describe two variants of a quantum Ising algorithm to solve this problem. One variant is spatially efficient, requiring only O(N log N) qubits where N is the lattice dimension, while the other variant is more robust to noise. Analysis of the algorithms' performance on a quantum annealer and in numerical simulations shows that the more qubit-efficient variant will outperform in the long run, while the other variant is more suitable for near-term implementation.

翻訳日:2023-05-12 22:07:34 公開日:2021-03-04
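
As a rough illustration of the kind of cost function such an Ising formulation minimises, the sketch below encodes each integer coefficient of a lattice vector in a few bits and scans all bit strings classically for the lowest non-zero squared length $x^T G x$, where $G$ is the Gram matrix of the basis. This is an assumption-laden toy, not the authors' algorithm: the offset-binary encoding, the function names and the example basis are all hypothetical, and the brute-force scan merely stands in for a quantum annealer.

```python
from itertools import product


def gram_matrix(basis):
    """Gram matrix G[i][j] = <b_i, b_j> of the lattice basis rows."""
    return [[sum(a * b for a, b in zip(u, v)) for v in basis] for u in basis]


def decode(bits, n, k):
    """Read n coefficients of k bits each, shifted into [-2**(k-1), 2**(k-1) - 1]."""
    offset = 2 ** (k - 1)
    return [sum(bits[i * k + j] << j for j in range(k)) - offset for i in range(n)]


def energy(x, G):
    """Squared length of the lattice vector with coefficients x, i.e. x^T G x."""
    n = len(x)
    return sum(x[i] * G[i][j] * x[j] for i in range(n) for j in range(n))


def shortest_vector_bruteforce(basis, k=3):
    """Scan all n*k-bit strings; return (squared length, coefficients) of the minimum."""
    n = len(basis)
    G = gram_matrix(basis)
    best = None
    for bits in product((0, 1), repeat=n * k):
        x = decode(bits, n, k)
        if not any(x):  # the zero vector is excluded from SVP
            continue
        e = energy(x, G)
        if best is None or e < best[0]:
            best = (e, x)
    return best


# Hypothetical skewed basis of the integer lattice Z^2 (determinant 1).
basis = [[5, 2], [7, 3]]
sq_len, coeffs = shortest_vector_bruteforce(basis)
vector = [sum(c * b[d] for c, b in zip(coeffs, basis)) for d in range(2)]
print(sq_len, coeffs, vector)
```

With `k` bits per coefficient the search space has `n * k` binary variables, which is the sense in which a binary encoding keeps the variable count small; how the paper's two variants actually encode coefficients is not stated in the abstract.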

# NISQ応用のための多重指数誤差外挿と複合誤差除去技術

Multi-exponential Error Extrapolation and Combining Error Mitigation Techniques for NISQ Applications ( http://arxiv.org/abs/2007.01265v2 )

ライセンス: Link先を確認

Zhenyu Cai

(参考訳) 量子ハードウェアにおけるノイズは、量子コンピュータの実装における最大の障害である。短期量子コンピュータの実用的応用におけるノイズと戦うために、大きな量子ビットオーバヘッドを必要とする量子誤差補正に頼る代わりに、余分な測定値を利用する量子エラー緩和に目を向ける。誤り外挿は、実験的に実装に成功しているエラー緩和技術である。数値シミュレーションとヒューリスティックな議論により、期待される回路エラー数が1程度となる大規模回路の極限では、指数曲線が外挿に有効であることが示されている。本稿では、これをマルチ指数誤差外挿に拡張し、パウリ雑音下での有効性のより厳密な証明を提供する。これは数値シミュレーションによりさらに検証され,単指数外挿よりも推定精度が桁違いに向上した。さらに,これらの手法の特徴を生かして,誤り外挿と他の2つの誤り緩和手法,準確率と対称性の検証を組み合わせる手法を開発した。シミュレーションで示されるように,本手法は,標準的な誤り外挿で必要となるハードウェア誤差率の調整を行うことなく,サンプリングコストを準確率よりも数倍小さくして,推定バイアスを低減できる。

Noise in quantum hardware remains the biggest roadblock for the implementation of quantum computers. To fight the noise in the practical application of near-term quantum computers, instead of relying on quantum error correction which requires large qubit overhead, we turn to quantum error mitigation, in which we make use of extra measurements. Error extrapolation is an error mitigation technique that has been successfully implemented experimentally. Numerical simulation and heuristic arguments have indicated that exponential curves are effective for extrapolation in the large circuit limit with an expected circuit error count around unity. In this article, we extend this to multi-exponential error extrapolation and provide a more rigorous proof of its effectiveness under Pauli noise. This is further validated via our numerical simulations, showing orders of magnitude improvements in the estimation accuracy over single-exponential extrapolation. Moreover, we develop methods to combine error extrapolation with two other error mitigation techniques: quasi-probability and symmetry verification, through exploiting features of these individual techniques. As shown in our simulation, our combined method can achieve low estimation bias with a sampling cost multiple times smaller than quasi-probability, without needing to adjust the hardware error rate as required in canonical error extrapolation.

翻訳日:2023-05-11 20:37:22 公開日:2021-03-04
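
To make the difference between single- and multi-exponential extrapolation concrete, here is a small noise-free sketch. It assumes the expectation value decays as a sum of two exponentials in the noise scale and recovers the zero-noise value from four equally spaced noise levels with a Prony-style linear recurrence. Both the decay model (amplitudes, rates) and the Prony-style solver are illustrative assumptions chosen so the example stays in the standard library; the abstract does not say how the paper performs its fits.

```python
import math


def noisy_expectation(scale, amps=(0.7, 0.3), rates=(0.4, 1.5)):
    """Hypothetical dual-exponential decay of an expectation value with noise scale."""
    return sum(a * math.exp(-g * scale) for a, g in zip(amps, rates))


def single_exp_extrapolate(y1, y2):
    """Zero-noise estimate from scales 1 and 2, assuming y(k) = A * z**k."""
    return y1 * y1 / y2


def dual_exp_extrapolate(y1, y2, y3, y4):
    """Zero-noise estimate from scales 1..4, assuming y(k) = A1*z1**k + A2*z2**k.

    Such a sequence obeys y[k+2] = s*y[k+1] - p*y[k] with s = z1 + z2 and
    p = z1 * z2, so s and p follow from a 2x2 linear solve and y[0] from
    running the recurrence one step backwards.
    """
    det = y2 * y2 - y1 * y3
    s = (y2 * y3 - y1 * y4) / det
    p = (y3 * y3 - y2 * y4) / det
    return (s * y1 - y2) / p


# "Measurements" at noise scales 1, 2, 3, 4 (exact values, no shot noise).
y1, y2, y3, y4 = (noisy_expectation(k) for k in (1, 2, 3, 4))
single_estimate = single_exp_extrapolate(y1, y2)
dual_estimate = dual_exp_extrapolate(y1, y2, y3, y4)
true_value = noisy_expectation(0)
print(true_value, single_estimate, dual_estimate)
```

In this toy setting the single-exponential fit is visibly biased while the two-exponential estimate matches the zero-noise value up to rounding. With real, shot-noise-limited data a least-squares fit would be the more robust choice than an exact solve.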

# ノーシグナリング・アセンブリッジの集合のエッジについて

On the edge of the set of no-signaling assemblages ( http://arxiv.org/abs/2008.12325v2 )

ライセンス: Link先を確認

Michał Banacki, Ricard Ravell Rodríguez and Paweł Horodecki

(参考訳) 近年の進展を受けて, 多者間のポスト量子ステアリングと一般のノーシグナリング・アセンブリッジのシナリオを考察する。我々は,ノーシグナリング・アセンブリッジの集合のエッジという概念を導入し,その特徴付けを与える。次に、この概念を用いて、LHSモデルを持たないノーシグナリング・アセンブリッジに対するウィットネスを構成する。最後に、2つの信頼できないサブシステムによるステアリングという最も単純な非自明な場合について、エッジ上のアセンブリッジの量子的実現の可能性を議論する。特に、3量子ビット状態の場合、与えられた状態のランクが3以上である限り、POVMで記述される測定を用いてエッジ上のアセンブリッジを生成することは不可能であるという no-go 型の結果を得る。

Following recent advancements, we consider a scenario of multipartite postquantum steering and general no-signaling assemblages. We introduce the notion of the edge of the set of no-signaling assemblages and we present its characterization. Next, we use this concept to construct witnesses for no-signaling assemblages without an LHS model. Finally, in the simplest nontrivial case of steering with two untrusted subsystems, we discuss the possibility of quantum realization of assemblages on the edge. In particular, for three-qubit states, we obtain a no-go type result, which states that it is impossible to produce an assemblage on the edge using measurements described by POVMs as long as the rank of a given state is greater than or equal to 3.

翻訳日:2023-05-04 19:28:06 公開日:2021-03-04

# 確率的定式化によるオープン量子システムシミュレーションのための自己回帰ニューラルネットワーク

Autoregressive Neural Network for Simulating Open Quantum Systems via a Probabilistic Formulation ( http://arxiv.org/abs/2009.05580v3 )

ライセンス: Link先を確認

Di Luo, Zhuo Chen, Juan Carrasquilla, and Bryan K. Clark

(参考訳) オープン量子システムの理論は、量子科学と工学における現代の研究のかなりの部分の基礎を成している。拡張ヒルベルト空間の次元性に根ざし、開量子系をシミュレートする高い計算複雑性は、それらの力学を近似する戦略の開発を要求する。本稿では,オープン量子システムダイナミクスに取り組むためのアプローチを提案する。我々は,リウビリアン超作用素の運動をフォワード・バックワード・トラペズイド法を用いてシミュレートし,変分定式化によって定常状態を求める。本研究では,正の演算子値測度(povm)と自己回帰ニューラルネットワークを組み合わせた量子物理学の確率論的定式化を行い,効率的なサンプリングと扱いやすい密度によるアルゴリズムの柔軟性を生かした。自己回帰型ニューラルネットワークの対称性を部分的に復元し,局所相関の記述を改善する,改良されたアンサッツ,文字列状態を導入する。我々は,本手法を原型的な1次元と2次元のシステムでベンチマークし,厳密な解を追跡し,最近提案された制限ボルツマンマシンに基づくアプローチと比較して精度の高い結果を求める。このアプローチは、様々な文脈における密度行列の進化に広く適用できると期待する。

The theory of open quantum systems lays the foundations of a substantial part of modern research in quantum science and engineering. Rooted in the dimensionality of their extended Hilbert spaces, the high computational complexity of simulating open quantum systems calls for the development of strategies to approximate their dynamics. In this paper, we present an approach for tackling open quantum system dynamics. We simulate the dynamics of the Liouvillian superoperator using a forward-backward trapezoid method and find the steady state via a variational formulation. We make use of a probabilistic formulation of quantum physics based on a positive operator-valued measure (POVM) in combination with autoregressive neural networks, which bring significant algorithmic flexibility due to their efficient sampling and tractable density. We introduce improved ansatzes, String States, which partially restore the symmetry of the autoregressive neural network and improve the description of local correlations. We benchmark our approaches on prototypical one- and two-dimensional systems, finding results which closely track the exact solution and achieve higher accuracy in comparison to the recently proposed approach based on restricted Boltzmann machines. We anticipate this approach will be widely applicable to evolving density matrices in various contexts.

翻訳日:2023-05-02 22:20:00 公開日:2021-03-04
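
The POVM-based probabilistic formulation mentioned above can be illustrated on a single qubit. The sketch below assumes a tetrahedral POVM (a common informationally complete choice; the abstract does not specify which POVM the authors use) and shows that a density matrix, written as a Bloch vector $r$, maps to an ordinary probability distribution $p(a) = (1 + s_a \cdot r)/4$ that can be inverted exactly. All names are hypothetical.

```python
import math

# Tetrahedral directions on the Bloch sphere (an assumed, commonly used choice).
S = [
    (0.0, 0.0, 1.0),
    (2 * math.sqrt(2) / 3, 0.0, -1 / 3),
    (-math.sqrt(2) / 3, math.sqrt(2 / 3), -1 / 3),
    (-math.sqrt(2) / 3, -math.sqrt(2 / 3), -1 / 3),
]


def povm_probabilities(bloch):
    """p(a) = Tr(rho M_a) with M_a = (I + s_a . sigma) / 4 and rho = (I + r . sigma) / 2."""
    return [(1 + sum(s_i * r_i for s_i, r_i in zip(s, bloch))) / 4 for s in S]


def bloch_from_probabilities(probs):
    """Invert the map: r = 3 * sum_a p(a) s_a, which holds for the tetrahedral POVM."""
    return tuple(3 * sum(p * s[d] for p, s in zip(probs, S)) for d in range(3))


# A pure qubit state given by a unit Bloch vector (illustrative values).
r = (0.6, 0.0, 0.8)
probs = povm_probabilities(r)
r_back = bloch_from_probabilities(probs)
print(probs, r_back)
```

For many qubits the same construction gives a joint distribution over outcome strings, which is the kind of object an autoregressive network can model and sample from; that multi-qubit step is not shown here.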

# 非対称量子ラビモデルの隠れ対称性

The hidden symmetry of the asymmetric quantum Rabi model ( http://arxiv.org/abs/2010.02496v2 )

ライセンス: Link先を確認

Vladimir V. Mangazeev, Murray T. Batchelor and Vladimir V. Bazhanov

(参考訳) 非対称量子ラビモデル (AQRM) は、バイアスパラメータ $\epsilon$ の値 $\epsilon\in\frac{1}{2}\mathbb{Z}$ に対して固有スペクトルのレベル交差を示す。このようなレベルの交差は、モデルの隠れ対称性と結びつくことが期待されている。この隠れ対称性の起源は、これらの特別な値でAQRMハミルトニアンと可換な作用素を見つけることによって確立される。この構成は、最初のいくつかのケースで明示的に与えられ、同様のレベルの交差がバイアス項の存在下で観測された他の関連する光-物質相互作用モデルに適用することができる。

The asymmetric quantum Rabi model (AQRM) exhibits level crossings in the eigenspectrum for the values $\epsilon\in\frac{1}{2}\mathbb{Z}$ of the bias parameter $\epsilon$. Such level crossings are expected to be associated with some hidden symmetry of the model. The origin of this hidden symmetry is established by finding the operators which commute with the AQRM hamiltonian at these special values. The construction is given explicitly for the first several cases and can be applied to other related light-matter interaction models for which similar level crossings have been observed in the presence of a bias term.

翻訳日:2023-04-29 20:29:39 公開日:2021-03-04

# リッチトーリック符号における対称性分数化のためのストリング秩序パラメータ

String order parameters for symmetry fractionalization in an enriched toric code ( http://arxiv.org/abs/2011.02981v2 )

ライセンス: Link先を確認

José Garre-Rubio, Mohsin Iqbal and David T. Stephen

(参考訳) 低次元対称性保護位相状態を持つトーリック符号モデルをデコレートした対称性エンリッチ位相秩序の簡単なモデルについて検討した。このモデルにおける対称性の分数化はストリング秩序パラメータによって特徴づけられ、これらのシグネチャは相転移点まで外部場と相互作用の影響下で頑健であることを示す。これは[New Journal of Physics 21, 113016 (2019)]の最近の提案を固定点テンソルネットワーク状態の設定を超えて拡張し、対称性分数化を特徴付け、検出するための有用なツールとしてストリング秩序パラメータを確立する。これに加えて、対称性を分数化するエニオンの凝縮がその対称性を自発的に破れさせることを観察し、射影エンタングルドペア状態の枠組みでこれを証明する。この現象は平行磁場中のトーリック符号の相図に顕著な変化をもたらす。

We study a simple model of symmetry-enriched topological order obtained by decorating a toric code model with lower-dimensional symmetry-protected topological states. We show that the symmetry fractionalization in this model can be characterized by string order parameters, and that these signatures are robust under the effects of external fields and interactions, up to the phase transition point. This extends the recent proposal of [New Journal of Physics 21, 113016 (2019)] beyond the setting of fixed-point tensor network states, and solidifies string order parameters as a useful tool to characterize and detect symmetry fractionalization. In addition to this, we observe how the condensation of an anyon that fractionalizes a symmetry forces that symmetry to spontaneously break, and we give a proof of this in the framework of projected entangled pair states. This phenomenon leads to a notable change in the phase diagram of the toric code in parallel magnetic fields.

翻訳日:2023-04-25 05:16:45 公開日:2021-03-04

# ブール関数に対する量子ランダムアクセス符号

Quantum Random Access Codes for Boolean Functions ( http://arxiv.org/abs/2011.06535v4 )

ライセンス: Link先を確認

João F. Doriguello, Ashley Montanaro

(参考訳) $n\overset{p}{\mapsto}m$ ランダムアクセスコード (RAC) は、$n$ビットを$m$ビットに符号化し、任意の初期ビットを少なくとも確率$p$で復元できるようにする符号化であり、量子RAC(QRAC)では$n$ビットが$m$量子ビットに符号化される。提案以来、RACの考え方は様々な方法で一般化されてきた。例えば、共有エンタングルメントの利用(エンタングルメント支援ランダムアクセスコード、あるいは単にEARACと呼ばれる)や、1ビットではなく複数ビットの復元などである。本稿では,初期ビットの固定サイズの任意の部分集合上で,与えられたブール関数$f$の値を復元するようにRACの考え方を一般化し,これを$f$-ランダムアクセスコードと呼ぶ。我々は、古典的($f$-RAC)および量子的($f$-QRAC)符号化を持つ$f$-ランダムアクセスコードのためのプロトコルを、プライベートまたは共有ランダムネス、共有エンタングルメント($f$-EARAC)、Popescu-Rohrlich box($f$-PRRAC)など、さまざまなリソースとともに研究し、提供する。我々のプロトコルの成功確率は、ブール関数 $f$ の \emph{noise stability} によって特徴づけられる。さらに、共有ランダム性を持つ任意の$f$-QRAC(ひいては$f$-RAC)の成功確率に対する \emph{upper bound} を与える。この上界は我々のプロトコルの成功確率と乗法定数の範囲で一致し、量子プロトコルが古典的プロトコルに対して限られた優位性しか達成できないことを意味する。

An $n\overset{p}{\mapsto}m$ random access code (RAC) is an encoding of $n$ bits into $m$ bits such that any initial bit can be recovered with probability at least $p$, while in a quantum RAC (QRAC), the $n$ bits are encoded into $m$ qubits. Since its proposal, the idea of RACs has been generalized in many different ways, e.g. allowing the use of shared entanglement (called entanglement-assisted random access code, or simply EARAC) or recovering multiple bits instead of one. In this paper we generalize the idea of RACs to recovering the value of a given Boolean function $f$ on any subset of fixed size of the initial bits, which we call $f$-random access codes. We study and give protocols for $f$-random access codes with classical ($f$-RAC) and quantum ($f$-QRAC) encoding, together with many different resources, e.g. private or shared randomness, shared entanglement ($f$-EARAC) and Popescu-Rohrlich boxes ($f$-PRRAC). The success probability of our protocols is characterized by the \emph{noise stability} of the Boolean function $f$. Moreover, we give an \emph{upper bound} on the success probability of any $f$-QRAC with shared randomness (and, by extension, of any $f$-RAC) that matches the success probability of our protocols up to a multiplicative constant, meaning that quantum protocols can only achieve a limited advantage over their classical counterparts.
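
To make the definition concrete, the sketch below evaluates the worst-case success probability of two textbook $2\mapsto 1$ codes: a classical code that simply sends the first bit, and the standard single-qubit code whose Bloch vector lies midway between the Z and X axes. This is the plain RAC setting, not the $f$-RAC generalization introduced in the paper, and the function names are illustrative assumptions.

```python
import itertools
import math

def classical_send_first_bit(x, i):
    """Success probability of guessing bit i when only x[0] is sent."""
    # Bit 0 is recovered exactly; bit 1 is a uniformly random guess.
    return 1.0 if i == 0 else 0.5

def qrac_2_to_1(x, i):
    """Success probability of the standard 2->1 qubit RAC.

    The pair (x0, x1) is encoded in a pure qubit state with Bloch vector
    ((-1)**x1, 0, (-1)**x0) / sqrt(2); bit 0 is read out by a Z measurement
    and bit 1 by an X measurement.
    """
    bloch = ((-1) ** x[1] / math.sqrt(2), 0.0, (-1) ** x[0] / math.sqrt(2))
    axis = (0.0, 0.0, 1.0) if i == 0 else (1.0, 0.0, 0.0)
    overlap = sum(a * b for a, b in zip(axis, bloch))
    # Outcome "+1" along the measurement axis is decoded as bit value 0.
    p_outcome_0 = 0.5 * (1.0 + overlap)
    return p_outcome_0 if x[i] == 0 else 1.0 - p_outcome_0

def worst_case(code):
    """Minimum success probability over all inputs and requested bits."""
    return min(code(x, i)
               for x in itertools.product((0, 1), repeat=2)
               for i in (0, 1))

p_classical = worst_case(classical_send_first_bit)
p_quantum = worst_case(qrac_2_to_1)
print(p_classical, p_quantum)
```

The qubit code recovers either bit with probability $\cos^2(\pi/8)\approx 0.854$ for every input, whereas this particular classical code drops to $1/2$ on the bit it did not send.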

翻訳日:2023-04-24 07:39:30 公開日:2021-03-04

# タブ駆動型量子近傍サンプラー

Tabu-driven Quantum Neighborhood Samplers ( http://arxiv.org/abs/2011.09508v2 )

ライセンス: Link先を確認

Charles Moussa, Hao Wang, Henri Calandra, Thomas Bäck, Vedran Dunjko

(参考訳) 組合せ最適化は量子コンピューティングを対象とする重要な応用である。しかし、短期的なハードウェアの制約により、大きな実用的問題に対する高性能な古典的ヒューリスティックと比較すると、量子アルゴリズムは競争力に欠ける。短期的なデバイスで利点を得る一つの選択肢は、それらを古典的ヒューリスティックと組み合わせて使うことである。特に,量子法を用いて古典的に難解な分布からサンプルを抽出し,最適化問題を高速に解くための真の証明可能な量子分離を実現するための最も可能性の高い手法を提案する。量子近似最適化アルゴリズム (qaoa) を近傍サンプルとして, タブサーチの適用により, この拡張を数値的に検討した。このようなハイブリッド環境では,QAOAは探索・探索の柔軟なツールであり,タブイテレーションを多く節約し,より良いソリューションを実現することで,問題の迅速な解決に有効であることを示す。

Combinatorial optimization is an important application targeted by quantum computing. However, near-term hardware constraints make quantum algorithms unlikely to be competitive when compared to high-performing classical heuristics on large practical problems. One option to achieve advantages with near-term devices is to use them in combination with classical heuristics. In particular, we propose using quantum methods to sample from classically intractable distributions, which is the most probable approach to attaining a true provable quantum separation in the near term, and using these samples to solve optimization problems faster. We numerically study this enhancement by an adaptation of Tabu Search using the Quantum Approximate Optimization Algorithm (QAOA) as a neighborhood sampler. We show that QAOA provides a flexible tool for exploration-exploitation in such hybrid settings, and we provide evidence that it can help solve problems faster by saving many tabu iterations and achieving better solutions.
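
The hybrid loop described above can be pictured with a minimal tabu search in which the neighborhood sampler is a pluggable function. In this sketch the sampler is a purely classical random stand-in, not QAOA, and the cost function, tabu tenure, and all names are illustrative assumptions rather than the authors' implementation.

```python
import random

def energy(bits, couplings):
    """Ising-like cost: sum of J_ij * s_i * s_j with spins s = +1 or -1."""
    spins = [1 - 2 * b for b in bits]
    return sum(j * spins[a] * spins[b] for (a, b), j in couplings.items())

def sample_neighbors(bits, tabu, k, rng):
    """Stand-in for the quantum sampler: up to k random single-bit flips
    whose flipped index is not currently tabu."""
    allowed = [i for i in range(len(bits)) if i not in tabu]
    chosen = rng.sample(allowed, min(k, len(allowed)))
    neighbors = []
    for i in chosen:
        flipped = list(bits)
        flipped[i] ^= 1
        neighbors.append((i, flipped))
    return neighbors

def tabu_search(couplings, n, iterations=50, tenure=3, k=4, seed=0):
    rng = random.Random(seed)
    current = [rng.randint(0, 1) for _ in range(n)]
    best, best_e = list(current), energy(current, couplings)
    start_e = best_e
    tabu = []  # most recently flipped indices, oldest first
    for _ in range(iterations):
        neighbors = sample_neighbors(current, tabu, k, rng)
        if not neighbors:
            break
        # Move to the best sampled neighbor, even if it is worse.
        index, current = min(neighbors, key=lambda t: energy(t[1], couplings))
        tabu.append(index)
        if len(tabu) > tenure:
            tabu.pop(0)
        e = energy(current, couplings)
        if e < best_e:
            best, best_e = list(current), e
    return best, best_e, start_e

# Hypothetical 6-spin ring with antiferromagnetic couplings.
ring = {(i, (i + 1) % 6): 1.0 for i in range(6)}
best_bits, best_energy, start_energy = tabu_search(ring, n=6)
print(best_bits, best_energy)
```

Swapping `sample_neighbors` for a routine that draws candidate moves from a quantum device is the point at which a QAOA-based sampler would enter such a loop.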

翻訳日:2023-04-23 19:08:48 公開日:2021-03-04

# 非対称 $S_1FS_2$ ジョセフソン接合における超ギャップおよびサブギャップ増強電流

Supergap and subgap enhanced currents in asymmetric $S_1FS_2$ Josephson junctions ( http://arxiv.org/abs/2011.12967v2 )

ライセンス: Link先を確認

Mohammad Alidoust, Klaus Halterman

(参考訳) 超伝導リードの超伝導ギャップの大きさが等しくない、すなわち$\Delta_1\neq \Delta_2$であり、非対称な$S_1NS_2$および$S_1FS_2$系となる3次元の常伝導金属および強磁性ジョセフソン接合における超電流プロファイルを理論的に研究した。その結果, 超伝導ギャップの比$\Delta_2/\Delta_1$を大きくすることで, 弾道的な$S_1NS_2$系における臨界超電流を$100\%$以上高めることができ, 接合厚さ, 磁化強度, 化学ポテンシャルに応じて飽和点に達するか減衰することがわかった。拡散的な$S_1NS_2$系の総臨界電流は、超伝導ギャップの一方を増大させることで放物線的に$50\%$以上増大し、飽和に達することがわかった。均一な強磁性接合では、$\Delta_2/\Delta_1>1$を増加させると超電流が反転する。全超電流をスーパーギャップ成分とサブギャップ成分に分解することにより、ジョセフソン電流の流れに対するそれらの重要な相対的寄与を示す。$S_1FS_2$接合におけるサブギャップ電流とスーパーギャップ電流の競合は、電流位相関係における第2高調波の出現をもたらすことがわかった。拡散的な非対称ジョセフソン構成とは対照的に、$\Delta_2/\Delta_1=1$の弾道系における超電流の挙動は、フェルミ準位ミスマッチ、磁化強度、接合厚を含む幅広いパラメータセットにおいて、サブギャップ電流成分のみによって適切に記述できる。興味深いことに、$\Delta_2/\Delta_1>1$の場合、全超電流がスーパーギャップ成分によって駆動される複数のパラメータセットが見つかった。このように本研究は、弾道系および拡散系の両方におけるサブギャップおよびスーパーギャップ超電流成分の重要性を明らかにしている。

We have theoretically studied the supercurrent profiles in three-dimensional normal metal and ferromagnetic Josephson configurations, where the magnitude of the superconducting gaps in the superconducting leads are unequal, i.e., $\Delta_1\neq \Delta_2$, creating asymmetric $S_1NS_2$ and $S_1FS_2$ systems. Our results reveal that by increasing the ratio of the superconducting gaps $\Delta_2/\Delta_1$, the critical supercurrent in a ballistic $S_1NS_2$ system can be enhanced by more than $100\%$, and reaches a saturation point, or decays away, depending on the junction thickness, magnetization strength, and chemical potential. The total critical current in a diffusive $S_1NS_2$ system was found to be enhanced by more than $50\%$ parabolically, and reaches saturation by increasing one of the superconducting gaps. In a uniform ferromagnetic junction, the supercurrent undergoes reversal by increasing $\Delta_2/\Delta_1>1$. Through decomposing the total supercurrent into its supergap and subgap components, our results illustrate their crucial relative contributions to the Josephson current flow. It was found that the competition of subgap and supergap currents in a $S_1FS_2$ junction results in the emergence of second harmonics in the current-phase relation. In contrast to a diffusive asymmetric Josephson configuration, the behavior of the supercurrent in a ballistic system with $\Delta_2/\Delta_1=1$ can be properly described by the subgap current component only, in a wide range of parameter sets, including Fermi level mismatch, magnetization strength, and junction thickness. Interestingly, when $\Delta_2/\Delta_1>1$, our results have found multiple parameter sets where the total supercurrent is driven by the supergap component. Therefore, our comprehensive study highlights the importance of subgap and supergap supercurrent components in both the ballistic and diffusive regimes.

翻訳日:2023-04-23 00:48:40 公開日:2021-03-04

# 新型コロナウイルス(COVID-19)の拡散を制御する自動暴露通知(AEN)技術の約束:スマートフォンアプリの展開、使用、反復評価の推奨

Realizing the Promise of Automated Exposure Notification (AEN) Technology to Control the Spread of COVID-19: Recommendations for Smartphone App Deployment, Use, and Iterative Assessment ( http://arxiv.org/abs/2012.09232v2 )

ライセンス: Link先を確認

Jesslyn Alekseyev (1), Erica Dixon (2), Vilhelm L Andersen Woltz (3), Danny Weitzner (3) ((1) Massachusetts Institute of Technology Lincoln Laboratory, (2) University of Pennsylvania, (3) Massachusetts Institute of Technology)

(参考訳) 現代の暗号技術を用いることで、プライバシ保存自動露光通知(aen)技術は、インキュベーション期間中に個人のデータのプライバシーを維持しながら、人々間の接触を自動的に記録することで、拡散する病気の軽減を約束する。今日では、米国や世界中の公共衛生部門が、AENシステムを急速に展開している。多くの組織がアプリをデプロイする前に調査を行ったが、世界中の経験から、接触追跡アプリが比較的低いレベルにインストールされ、使用されていることが分かる。このホワイトペーパーは、AENシステムの展開を検討している州に有用な情報を提供し、既に配備されている州の改善をガイドすることを目的としている。 GAENコンソーシアム Exposure Notifications (EN) Express ツールを含む,新型コロナウイルスの感染拡大を抑えるという究極の目標を掲げて,AEN システムの採用に関連する人的要因について概説する。また、AENシステムを設計、デプロイする国家や、連絡先追跡アプリのデプロイを評価するための一連の推奨事項や、初期展開時の有効性を改善するための関心領域のターゲティングのための実用的な設計および実装ガイドも提供します。ケーススタディでは,ペンシルバニア州(PA)が展開する商用アプリと,ユーザの採用促進に向けた継続的な取り組みについて検討する。

By using modern cryptographic techniques, privacy-preserving Automated Exposure Notification (AEN) technologies offer the promise of mitigating disease spread by automatically recording contacts between people over the incubation period while maintaining individual data privacy. Today, public health departments in States and other countries around the world are deploying AEN systems at a rapid pace. Though many organizations conducted research prior to deploying apps, experience around the world shows that contact-tracing apps are installed and used at relatively low levels. This whitepaper is intended to provide usable information for States who are considering the deployment of an AEN system, as well as to guide ongoing improvements for States that have already deployed. We outline the human factors considerations related to employing AEN systems with the ultimate goal of controlling the spread of COVID-19, including the GAEN consortium Exposure Notifications (EN) Express tool. We will also provide a practical design and implementation guide for States and others designing and deploying AEN systems, as well as a set of recommendations for assessing deployment of contact tracing apps and targeting areas of concern to improve efficacy of use during and after initial deployment. As a case study, we consider the commercial app deployed by the state of Pennsylvania (PA) and the ongoing efforts to drive user adoption there.

翻訳日:2023-04-20 10:52:22 公開日:2021-03-04

# 数サイクルパルスによるCEP依存性コヒーレンスの解析理論

An analytical theory of CEP-dependent coherence driven by few-cycle pulses ( http://arxiv.org/abs/2101.04881v3 )

ライセンス: Link先を確認

Bing Zeng and Lingze Duan

(参考訳) 原子系と数サイクルの超高速パルスの相互作用は、リッチ物理と量子コヒーレンス制御におけるかなりの応用可能性をもたらす。しかしながら、その一般的な行動に関する理論的理解は、特にキャリア-エンベロープ相(CEP)の影響に関して、この体制における解析的な記述が欠如していることによって妨げられている。ここでは、遠方共鳴、少数サイクル2乗パルスによって駆動される2レベル原子を記述した分析理論を示す。シュロディンガー方程式の単純閉形式解は、回転波近似やゆっくりと変化するエンベロープ近似を招かなくても、一階摂動の下で得られる。さらなる研究により、原子の最終反転とパルスのCEPの間の算術的関係が明らかになる。その数学的単純さにもかかわらず、この関係は相互作用の重要な特徴を捉えることができ、パルス形状の一般化に対して頑健であることが証明され、数値解との良好な一致を示す。この理論は、将来のCEP感受性量子コヒーレンスの研究における一般的なガイダンスを提供する可能性がある。

The interaction between an atomic system and a few-cycle ultrafast pulse carries rich physics and considerable application prospects in quantum-coherence control. However, theoretical understanding of its general behavior has been hindered by the lack of an analytical description in this regime, especially with regard to the impact of the carrier-envelope phase (CEP). Here, we present an analytical theory that describes a two-level atom driven by a far-off-resonance, few-cycle square pulse. A simple, closed-form solution of the Schrödinger equation is obtained under first-order perturbation without invoking the rotating-wave approximation or the slowly varying envelope approximation. Further investigation reveals an arithmetic relation between the final inversion of the atom and the CEP of the pulse. Despite its mathematical simplicity, the relation is able to capture some of the key features of the interaction, which prove to be robust against generalization of pulse shapes and show good agreement with numerical solutions. The theory can potentially offer general guidance for future studies of CEP-sensitive quantum coherence.

翻訳日:2023-04-15 17:48:02 公開日:2021-03-04

# ダーク原子からの連続量子光

Continuous quantum light from a dark atom ( http://arxiv.org/abs/2103.01138v2 )

ライセンス: Link先を確認

Karl Nicolas Tolazzi, Bo Wang, Christopher Ianzano, Jonas Neumeier, Celso Jorge Villas-Boas, Gerhard Rempe

(参考訳) サイクリング過程はレーザーからトポロジカル絶縁体まで多くの物理学領域において重要であり、しばしば各系の動的および構造的側面に関する驚くべき洞察を与える。本稿では、共振レーザと光共振器が単一原子のいくつかの基底状態と励起状態の間の閉周期を定義する量子非線形波動混合実験について報告する。強い原子空洞結合と定常運転では、原子状態とキャビティ内光子数の絡み合いが量子干渉によって励起状態の集団を抑圧し、原子基底状態へのサイクルを効果的に減少させることを示した。システムダイナミクスは、各空洞光子数に対する1つの暗黒状態と、空洞から放出される光子にアンチバンチングを発生させる量子ゼノ遮断のハーモニックラグ内での遷移によって生じる。還元サイクルはサイクル外の原子状態への不要な光ポンピングを抑制し、発光光子の数を増大させる。

Cycling processes are important in many areas of physics ranging from lasers to topological insulators, often offering surprising insights into dynamical and structural aspects of the respective system. Here we report on a quantum-nonlinear wave-mixing experiment where resonant lasers and an optical cavity define a closed cycle between several ground and excited states of a single atom. We show that, for strong atom-cavity coupling and steady-state driving, the entanglement between the atomic states and intracavity photon number suppresses the excited-state population via quantum interference, effectively reducing the cycle to the atomic ground states. The system dynamics then result from transitions within a harmonic ladder of entangled dark states, one for each cavity photon number, and a quantum Zeno blockade that generates antibunching in the photons emitted from the cavity. The reduced cycle suppresses unwanted optical pumping into atomic states outside the cycle, thereby enhancing the number of emitted photons.

翻訳日:2023-04-09 14:34:33 公開日:2021-03-04

# 量子絡み合い生成のためのマクロランダムネス

Macroscopic randomness for quantum entanglement generation ( http://arxiv.org/abs/2103.02879v1 )

ライセンス: Link先を確認

Byoung S. Ham

(参考訳) 2つ以上の2成分間の量子絡み合いは、量子重ね合わせによってハイゼンベルクの不確定性原理によって直接制御される微視的レジームに制限された量子情報領域の中核概念であり、非決定論的かつ確率的量子的特徴をもたらす。このような量子機能は古典的手法では生成できない。ここでは、オンデマンドの絡み合った光ペア生成の古典的手法を、基底ランダム性を用いてマクロな状態に提示する。従来の量子力学のこの矛盾する考えは、古典性と量子性の両方に関する根本的な疑問を提起する。

Quantum entanglement between two or more bipartite entities is a core concept in quantum information areas limited to microscopic regimes directly governed by Heisenberg uncertainty principle via quantum superposition, resulting in nondeterministic and probabilistic quantum features. Such quantum features cannot be generated by classical means. Here, a pure classical method of on-demand entangled light-pair generation is presented in a macroscopic regime via basis randomness. This conflicting idea of conventional quantum mechanics invokes a fundamental question about both classicality and quantumness, where superposition is key to its resolution.

翻訳日:2023-04-09 02:46:02 公開日:2021-03-04

# BLOCKEYE:ブロックチェーンでDeFi攻撃を狙う

BLOCKEYE: Hunting For DeFi Attacks on Blockchain ( http://arxiv.org/abs/2103.02873v1 )

ライセンス: Link先を確認

Bin Wang, Han Liu, Chao Liu, Zhiqiang Yang, Qian Ren, Huixuan Zheng, Hong Lei

(参考訳) 分散金融、すなわちDeFiは、近年、多くのパブリックブロックチェーン(Ethereumなど)上で最も人気のあるタイプのアプリケーションとなっている。従来の金融と比較して、DeFiは顧客が比較的低コストでスマートコントラクトを通じて、多様なブロックチェーン金融サービス(融資、借り入れ、担保付け、交換など)に柔軟に参加することを可能にする。しかし、DeFiのオープンな性質は必然的に大きな攻撃面をもたらし、これは参加者の資金の安全に対する深刻な脅威である。本稿では,Ethereumブロックチェーン上でのDeFiプロジェクトのリアルタイム攻撃検出システムであるBLOCKEYEを提案する。 blockeyeが提供する重要な機能は次の2つだ。 1) 潜在的に脆弱なdefiプロジェクトは、重要なサービス状態(例えば資産価格)のデータフローを象徴的に推論し、外部操作が可能なかどうかをチェックする自動セキュリティ分析プロセスに基づいて識別される。 2) その後、脆弱なDeFiプロジェクトのためにトランザクションモニタがオフチェーンにインストールされる。プロジェクトだけでなく関連するプロジェクトにも送信されたトランザクションは、さらなるセキュリティ分析のために収集される。ブロックアイに設定された臨界不変量に対して違反が検出されると、潜在的な攻撃をフラグ付けする。例えば、利益は、非常に短時間で達成され、コストよりもはるかに大きい。いくつかの人気のあるDeFiプロジェクトにBLOCKEYEを適用し、報告されていない潜在的なセキュリティ攻撃を発見しました。 BLOCKEYEのビデオはhttps://youtu.be/7DjsWBLdlQUで公開されている。

Decentralized finance, i.e., DeFi, has become the most popular type of application on many public blockchains (e.g., Ethereum) in recent years. Compared to traditional finance, DeFi allows customers to flexibly participate in diverse blockchain financial services (e.g., lending, borrowing, collateralizing, and exchanging) via smart contracts at a relatively low cost of trust. However, the open nature of DeFi inevitably introduces a large attack surface, which is a severe threat to the security of participants' funds. In this paper, we propose BLOCKEYE, a real-time attack detection system for DeFi projects on the Ethereum blockchain. The key capabilities provided by BLOCKEYE are twofold: (1) Potentially vulnerable DeFi projects are identified based on an automatic security analysis process, which performs symbolic reasoning on the data flow of important service states, e.g., asset price, and checks whether they can be externally manipulated. (2) A transaction monitor is then installed off-chain for each vulnerable DeFi project. Transactions sent not only to that project but also to other associated projects are collected for further security analysis. A potential attack is flagged if a violation is detected on a critical invariant configured in BLOCKEYE, e.g., a benefit is achieved within a very short time and is far larger than the cost. We applied BLOCKEYE to several popular DeFi projects and discovered potential security attacks that had not previously been reported. A video of BLOCKEYE is available at https://youtu.be/7DjsWBLdlQU.
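
The example invariant in the abstract (a large benefit obtained within a very short time at small cost) can be pictured as a simple predicate over a summary of monitored transactions. The sketch below is purely hypothetical: the `TxSummary` type, its field names, and the thresholds are illustrative assumptions and are not taken from BLOCKEYE, whose configuration format is not described here.

```python
from dataclasses import dataclass

@dataclass
class TxSummary:
    """Hypothetical summary of a window of related transactions."""
    profit: float          # net value gained, in some common unit
    cost: float            # fees and capital spent, in the same unit
    duration_blocks: int   # number of blocks spanned by the window

def violates_profit_invariant(tx, max_blocks=2, min_ratio=100.0):
    """Flag a window whose profit dwarfs its cost within very few blocks."""
    if tx.duration_blocks > max_blocks:
        return False
    if tx.cost <= 0:
        # No measurable cost: any positive profit is treated as suspicious.
        return tx.profit > 0
    return tx.profit / tx.cost >= min_ratio

suspicious = TxSummary(profit=350_000.0, cost=120.0, duration_blocks=1)
ordinary = TxSummary(profit=15.0, cost=10.0, duration_blocks=1)
print(violates_profit_invariant(suspicious), violates_profit_invariant(ordinary))
```

A real monitor would derive such summaries from on-chain traces; the thresholds here only illustrate the shape of the check.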

翻訳日:2023-04-09 02:45:50 公開日:2021-03-04

# 状態集合とユニタリチャネルの量子認証

Quantum certification of state set and unitary channel ( http://arxiv.org/abs/2103.02837v1 )

ライセンス: Link先を確認

Wei Xie

(参考訳) 量子状態集合とユニタリ量子チャネルに対する効率的な量子認証アルゴリズムについて検討する。未知の状態の $O(\varepsilon^{-4}\ln |\mathcal{P}|)$ 個のコピーを用いて、その状態が既知の状態の有限集合 $\mathcal{P}$ に含まれるか、トレース距離に関して $\mathcal{P}$ から $\varepsilon$ だけ離れているかを区別するアルゴリズムを提案する。このアルゴリズムは、いくつかの設定でよりサンプル効率が良い。以前の研究では、固定次元において、未知のユニタリ $U$ が既知または未知のユニタリ $V$ と等しいか $\varepsilon$ だけ離れているかを、ユニタリの $O(\varepsilon^{-2})$ 回の使用で区別できることが示されたが、そこではChoi状態が用いられるためアンシラ系が必要であった。我々は、従来の結果と比べてはるかに少ないアンシラ、あるいはアンシラなしで、ユニタリの $O(\varepsilon^{-1})$ 回の使用で2つの場合を区別するアルゴリズムを与える。

We study efficient quantum certification algorithms for quantum state sets and unitary quantum channels. We present an algorithm that uses $O(\varepsilon^{-4}\ln |\mathcal{P}|)$ copies of an unknown state to distinguish whether the unknown state is contained in or $\varepsilon$-far from a finite set $\mathcal{P}$ of known states with respect to the trace distance. This algorithm is more sample-efficient in some settings. A previous study showed that one can distinguish whether an unknown unitary $U$ is equal to or $\varepsilon$-far from a known or unknown unitary $V$ in fixed dimension with $O(\varepsilon^{-2})$ uses of the unitary, in which the Choi state is used and thus an ancilla system is needed. We give an algorithm that distinguishes the two cases with $O(\varepsilon^{-1})$ uses of the unitary, using far fewer ancillas, or none, compared with previous results.

翻訳日:2023-04-09 02:45:26 公開日:2021-03-04

# ベル試験の1パラメータファミリーに及ぼす測定依存性の影響

Effects of measurement dependence on 1-parameter family of Bell tests ( http://arxiv.org/abs/2103.02819v1 )

ライセンス: Link先を確認

Fen-Zhuo Guo and Ze-Tian Lv and Shi-Hui Wei and Qiao-Yan Wen

(参考訳) ベルテストに基づくほとんどの量子情報タスクは、測定独立性の仮定に依存する。しかし, 測定独立性の仮定が常に実験で満たされていることを保証することは困難であり, ベル試験におけるこの仮定の緩和効果を検討することが重要である。本稿では,ベル (1-pfb) 試験の1パラメータファミリーに対する測定独立性の仮定を緩和する効果について検討する。一般的な入力分布と分解可能な入力分布の両方に対して、Eveが偽造できる1-PFB相関関数の計測依存性、推定確率、最大値の関係を確立する。 Eveが最大値を偽装する決定論的戦略も与えられる。チェイン不平等と1-PFB不平等の未知の情報レートを比較し、イーブが1-PFB不平等の最大量子違反を1-PFB不等式より偽装することが難しいパラメータの範囲を求める。

Most quantum information tasks based on Bell tests rely on the assumption of measurement independence. However, it is difficult to ensure that this assumption is always met in experiments, so it is crucial to explore the effects of relaxing it on Bell tests. In this paper, we discuss the effects of relaxing the assumption of measurement independence on the 1-parameter family of Bell (1-PFB) tests. For both general and factorizable input distributions, we establish the relationship among measurement dependence, guessing probability, and the maximum value of the 1-PFB correlation function that Eve can fake. The deterministic strategy with which Eve fakes the maximum value is also given. We compare the unknown information rate of the chain inequality and the 1-PFB inequality, and find the range of the parameter in which it is more difficult for Eve to fake the maximum quantum violation of the 1-PFB inequality than that of the chain inequality.

翻訳日:2023-04-09 02:45:08 公開日:2021-03-04

# 対称多量子状態:星、絡み合い、ロトセンサー

Symmetric Multiqudit States: Stars, Entanglement, Rotosensors ( http://arxiv.org/abs/2103.02786v1 )

ライセンス: Link先を確認

Chryssomalis Chryssomalakos, Louis Hanotel, Edgar Guzmán-González, Daniel Braun, Eduardo Serrano-Ensástiga and Karol Życzkowski

(参考訳) $N=d-1$ 個のマヨラナ星からなる星座は、次元 $d$ の任意の純粋量子状態、または $n$ 量子ビットからなる系の置換対称状態を表す。後者の構成を、それぞれ$d$準位を持つ$k$個のサブシステムの任意の対称な純粋状態を同様に表すように一般化する。$d\geq 3$ に対して、そのような状態は、回転に関する限り、一定の相対複素重みを持つ様々なスピン状態の集まりと等価である。マヨラナの先導に従って、上記のスピン状態のマヨラナ星座からなり、複素重みをエンコードする補助的な「傍観者」星座によって拡張された多重星座を導入する。4つのqutritおよび2つのスピン$3/2$系の対称状態の恒星表現の例を示す。我々は、様々なスピン、パーティの数、さらには対称性を持つ多体状態を関連付けるエルミートとムルナガンの同型を再検討する。導入した道具を用いて、多体エンタングルメントを解析し、最適な量子ロトセンサ、すなわち特定の軸まわりの回転、あるいは全ての軸について平均した回転に対して最大に敏感な純粋状態を特定する方法を示す。

A constellation of $N=d-1$ Majorana stars represents an arbitrary pure quantum state of dimension $d$ or a permutation-symmetric state of a system consisting of $n$ qubits. We generalize the latter construction to represent in a similar way an arbitrary symmetric pure state of $k$ subsystems with $d$ levels each. For $d\geq 3$, such states are equivalent, as far as rotations are concerned, to a collection of various spin states, with definite relative complex weights. Following Majorana's lead, we introduce a multiconstellation, consisting of the Majorana constellations of the above spin states, augmented by an auxiliary, "spectator" constellation, encoding the complex weights. Examples of stellar representations of symmetric states of four qutrits, and two spin-$3/2$ systems, are presented. We revisit the Hermite and Murnaghan isomorphisms, which relate multipartite states of various spins, number of parties, and even symmetries. We show how the tools introduced can be used to analyze multipartite entanglement and to identify optimal quantum rotosensors, i.e., pure states which are maximally sensitive to rotations around a specified axis, or averaged over all axes.

翻訳日:2023-04-09 02:44:51 公開日:2021-03-04

# 光合成錯体は量子コヒーレンスを用いて効率を向上させるか? おそらくそうではない。

Do photosynthetic complexes use quantum coherence to increase their efficiency? Probably not ( http://arxiv.org/abs/2103.02604v1 )

ライセンス: Link先を確認

Elinor Zerah-Harush and Yonatan Dubi

(参考訳) この疑問に答えることが量子生物学の分野で中心的なモチベーションとなり、光合成錯体の波状挙動を実証する一連の実験の後、このアイデアが生まれて以来である。本稿では,3つの天然錯体の効率に及ぼす量子コヒーレンスの影響を直接評価する。オープン量子システムアプローチにより、自然の生理的条件下で、それらの「量子性」と効率のレベルを同時に特定できる。これらのシステムは、デファス化支援輸送を特徴とする混合量子古典的状態にあることを示す。しかし、この体制における効率の変化は最短であり、量子コヒーレンスの存在が効率の向上に重要な役割を果たさないことを示唆している。しかしながら、このレジームの効率性はいかなる構造パラメータにも依存せず、進化によって自然コンプレックスをパラメータレジームに誘導し、他の用途のためにその構造を「設計する」ことを示唆している。

Answering the titular question has become a central motivation in the field of quantum biology, ever since the idea was raised following a series of experiments demonstrating wave-like behavior in photosynthetic complexes. Here, we report a direct evaluation of the effect of quantum coherence on the efficiency of three natural complexes. An open quantum systems approach allows us to simultaneously identify their level of "quantumness" and efficiency, under natural physiological conditions. We show that these systems reside in a mixed quantum-classical regime, characterized by dephasing-assisted transport. Yet, we find that the change in efficiency at this regime is minute at best, implying that the presence of quantum coherence does not play a significant role in enhancing efficiency. However, in this regime efficiency is independent of any structural parameters, suggesting that evolution may have driven natural complexes to their parameter regime in order to "design" their structure for other uses.

翻訳日:2023-04-09 02:44:28 公開日:2021-03-04

# $3 \rightarrow 1 $ sequence quantum random access codes を用いた非シャープ測定による3つのブラックボックスの認証

Certification of three black boxes with unsharp measurements using $3 \rightarrow 1 $ sequential quantum random access codes ( http://arxiv.org/abs/2103.03075v1 )

ライセンス: Link先を確認

Shihui Wei, Fenzhuo Guo, Fei Gao, and Qiao-Yan Wen

(参考訳) アンシャープ測定は量子情報理論においてますます重要な役割を果たす。本稿では、$3 \rightarrow 1$ 逐次ランダムアクセスコード(RAC)に基づく、アンシャープ測定を用いた3者間の準備・変換・測定実験について検討する。$3 \rightarrow 1$ 逐次量子ランダムアクセスコード(QRAC)における2つの相関証人の間の最適なトレードオフを導出し、その結果を用いて3つの逐次パーティに対する量子準備、インストゥルメント、測定の自己テストを完成させる。また、シャープネスパラメータの上限と下限を与え、自己テストスキームのロバスト性解析を完成させる。さらに, $3 \rightarrow 1$ 逐次RACに基づく古典的相関証人の破れは, 両方の相関証人で同時には得られないことがわかった。これは、第2のパーティが古典的な上界を超えるために強いアンシャープ測定を使用する場合、第3のパーティはシャープな測定を用いてもそれができないことを意味する。最後に、行列式の値、$2 \rightarrow 1$ QRAC、$3 \rightarrow 1$ QRAC のそれぞれに基づいて、異なるシャープネスパラメータ下での乱数生成効率を解析し比較する。このレターは、セミデバイス独立フレームワークにおけるマルチパーティ間の乱数生成に新たな光を当てている。

Unsharp measurements play an increasingly important role in quantum information theory. In this paper, we study a three-party prepare-transform-measure experiment with unsharp measurements based on $ 3 \rightarrow 1 $ sequential random access codes (RACs). We derive the optimal trade-off between the two correlation witnesses in $ 3 \rightarrow 1 $ sequential quantum random access codes (QRACs), and use the result to complete the self-testing of quantum preparations, instruments and measurements for three sequential parties. We also give the upper and lower bounds of the sharpness parameter to complete the robustness analysis of the self-testing scheme. In addition, we find that classical correlation witness violation based on $3 \rightarrow 1 $ sequential RACs cannot be obtained by both correlation witnesses simultaneously. This means that if the second party uses strong unsharp measurements to overcome the classical upper bound, the third party cannot do so even with sharp measurements. Finally, we give the analysis and comparison of the random number generation efficiency under different sharpness parameters based on the determinant value, $2 \rightarrow 1 $ and $3 \rightarrow 1 $ QRACs separately. This letter sheds new light on generating random numbers among multiple parties in the semi-device-independent framework.

翻訳日:2023-04-09 02:39:07 公開日:2021-03-04

# シカモア量子超越回路のシミュレーション

Simulating the Sycamore quantum supremacy circuits ( http://arxiv.org/abs/2103.03074v1 )

ライセンス: Link先を確認

Feng Pan and Pan Zhang

(参考訳) 量子回路をシミュレートする一般的なテンソルネットワーク法を提案する。この方法は既存の方法よりも多くの相関ビットストリング振幅と確率を計算するのに非常に効率的である。本研究では、従来のスーパーコンピュータの限界を超え、量子超越性を示すために用いられてきた、googleのsycamore回路のサンプリング問題を研究する。我々は,60のグラフィカル処理ユニット(GPU)を含む小さな計算クラスタを用いて,53キュービットと20サイクルのSycamore回路から100万個の相関ビットストリングを生成し,線形クロスエントロピーベンチマーク(XEB)フィデリティは0.739であり,これはGoogleの量子超越実験よりもはるかに高い。

We propose a general tensor network method for simulating quantum circuits. The method is massively more efficient in computing a large number of correlated bitstring amplitudes and probabilities than existing methods. As an application, we study the sampling problem of Google's Sycamore circuits, which are believed to be beyond the reach of classical supercomputers and have been used to demonstrate quantum supremacy. Using our method on a small computational cluster containing 60 graphics processing units (GPUs), we have generated one million correlated bitstrings, with some entries fixed, from the Sycamore circuit with 53 qubits and 20 cycles, with a linear cross-entropy benchmark (XEB) fidelity of 0.739, which is much higher than those in Google's quantum supremacy experiments.
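
For readers unfamiliar with the figure of merit quoted above, the linear XEB fidelity of a set of sampled bitstrings is commonly defined as $2^n$ times the mean ideal probability of the samples, minus one. The sketch below evaluates that formula on a toy two-qubit distribution; the probabilities and sample lists are made up for illustration and have nothing to do with the Sycamore data.

```python
def linear_xeb_fidelity(samples, ideal_probs, n_qubits):
    """Linear cross-entropy benchmark: 2**n * mean(p_ideal(x_i)) - 1."""
    mean_prob = sum(ideal_probs[x] for x in samples) / len(samples)
    return (2 ** n_qubits) * mean_prob - 1.0

# Toy 2-qubit example with made-up ideal probabilities indexed by bitstring.
peaked = {"00": 0.5, "01": 0.5, "10": 0.0, "11": 0.0}
uniform = {"00": 0.25, "01": 0.25, "10": 0.25, "11": 0.25}

f_peaked = linear_xeb_fidelity(["00", "01", "00", "01"], peaked, 2)
f_uniform = linear_xeb_fidelity(["00", "11", "10", "01"], uniform, 2)
print(f_peaked, f_uniform)
```

Samples concentrated on high-probability bitstrings push the score up, while samples scored against a flat distribution give zero, which is why selecting correlated high-probability bitstrings can yield a large XEB value.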

翻訳日:2023-04-09 02:38:44 公開日:2021-03-04

# 核スピンフリーNi(II)に基づく分子単量体におけるスピン時計転移の化学チューニング

Chemical tuning of spin clock transitions in molecular monomers based on nuclear spin-free Ni(II) ( http://arxiv.org/abs/2103.03021v1 )

ライセンス: Link先を確認

Marcos Rubín-Osanz, François Lambert, Feng Shao, Eric Rivière, Régis Guillot, Nicolas Suaud, Nathalie Guihéry, David Zueco, Anne-Laure Barra, Talal Mallah and Fernando Luis

(参考訳) 我々は、一核ni錯体の2つの最低電子スピン準位の間に大きな量子トンネルが存在することを報告する。このギャップに関連するレベルの反交差(磁気時計遷移)は、熱容量実験によって直接監視されている。これらの結果と、対称性によってトンネルが禁止されるco誘導体との比較は、時計遷移が分子間スピンスピン-スピン相互作用を効果的に抑制することを示している。さらに, 量子トンネル分割法では, 結晶場と磁気異方性を決定するリガンドシェルの修飾による化学チューニングが認められることを示した。これらの性質は、デコヒーレンスに対して必要なレジリエンス、他のキュービットとの適切なインターフェース、制御回路、冷却による初期化能力を組み合わせたモデルスピン量子ビットの実現に不可欠である。

We report the existence of a sizeable quantum tunnelling splitting between the two lowest electronic spin levels of mononuclear Ni complexes. The level anti-crossing, or magnetic clock transition, associated with this gap has been directly monitored by heat capacity experiments. The comparison of these results with those obtained for a Co derivative, for which tunnelling is forbidden by symmetry, shows that the clock transition leads to an effective suppression of intermolecular spin-spin interactions. In addition, we show that the quantum tunnelling splitting admits a chemical tuning via the modification of the ligand shell that determines the crystal field and the magnetic anisotropy. These properties are crucial to realize model spin qubits that combine the necessary resilience against decoherence, a proper interfacing with other qubits and with the control circuitry and the ability to initialize them by cooling.

翻訳日:2023-04-09 02:37:47 公開日:2021-03-04

# ゲートセットトモグラフィーによる超伝導量子ビットの中間回路特性評価

Characterizing mid-circuit measurements on a superconducting qubit using gate set tomography ( http://arxiv.org/abs/2103.03008v1 )

ライセンス: Link先を確認

Kenneth Rudinger, Guilhem J. Ribeill, Luke C. G. Govia, Matthew Ware, Erik Nielsen, Kevin Young, Thomas A. Ohki, Robin Blume-Kohout, and Timothy Proctor

(参考訳) 量子回路の内部層で発生する測定(中間回路測定)は、重要な量子コンピューティングプリミティブであり、特に量子誤り訂正において重要である。中間回路測定は古典的出力と量子的出力の両方を持つため、量子回路を終端する測定には存在しない誤差モードの影響を受けうる。本稿では,量子インストゥルメントでモデル化される中間回路測定を,量子インストゥルメント線形ゲートセットトモグラフィ(QILGST)と呼ぶ手法を用いて特徴付ける方法を示す。次に、この手法を適用し、マルチキュービットシステム内の超伝導トランスモン量子ビットの分散測定を特徴付ける。測定パルスとその後のゲート間の遅延時間を変化させることで,残留共振器光子数が測定誤差に与える影響を探索する。QILGSTは異なる誤差モードを分離し、測定による全誤差を定量化できる。我々の実験では、1000 nsを超える遅延時間に対して、全誤差率(すなわちダイヤモンド距離の半分)$\epsilon_{\diamond} = 8.1 \pm 1.4 \%$、読み出し忠実度 $97.0 \pm 0.3\%$、および $0$ と $1$ を測定した場合の出力量子状態忠実度としてそれぞれ $96.7 \pm 0.6\%$ と $93.7 \pm 0.7\%$ を測定した。

Measurements that occur within the internal layers of a quantum circuit -- mid-circuit measurements -- are an important quantum computing primitive, most notably for quantum error correction. Mid-circuit measurements have both classical and quantum outputs, so they can be subject to error modes that do not exist for measurements that terminate quantum circuits. Here we show how to characterize mid-circuit measurements, modelled by quantum instruments, using a technique that we call quantum instrument linear gate set tomography (QILGST). We then apply this technique to characterize a dispersive measurement on a superconducting transmon qubit within a multiqubit system. By varying the delay time between the measurement pulse and subsequent gates, we explore the impact of residual cavity photon population on measurement error. QILGST can resolve different error modes and quantify the total error from a measurement; in our experiment, for delay times above 1000 ns we measured a total error rate (i.e., half diamond distance) of $\epsilon_{\diamond} = 8.1 \pm 1.4 \%$, a readout fidelity of $97.0 \pm 0.3\%$, and output quantum state fidelities of $96.7 \pm 0.6\%$ and $93.7 \pm 0.7\%$ when measuring $0$ and $1$, respectively.

翻訳日:2023-04-09 02:37:18 公開日:2021-03-04

# H3の電子共鳴状態における擬似Jahn-Teller相互作用

Pseudo-Jahn-Teller interaction among electronic resonant states of H3 ( http://arxiv.org/abs/2103.02935v1 )

ライセンス: Link先を確認

Patrik Hedvall, {\AA}sa Larson

(参考訳) 我々は、H3+基底状態のポテンシャルエネルギー面上にエネルギーを持つH3の電子共鳴状態を研究する。これらの共鳴状態は、より高い衝突エネルギーにおけるH3+の解離的再結合に重要である。これらの共鳴状態を記述するために擬似ヤーン・テラーモデルの複素一般化を導入する。共振状態のポテンシャルエネルギーと自己イオン化幅は、複雑なコーン変分法を用いて電子散乱計算により計算され、複素モデルパラメータは結果に適合する最小二乗で抽出される。この処理により、この系を記述する非エルミート擬ヤーン・テラー・ハミルトニアンが現れる。非断熱結合と幾何位相はさらに計算され、複素断熱ポテンシャルエネルギー面の強化トポロジーを特徴付けるために用いられる。

We study the electronic resonant states of H3 with energies above the potential energy surface of the H3+ ground state. These resonant states are important for the dissociative recombination of H3+ at higher collision energies, and previous studies have indicated that these resonant states exhibit a triple intersection. We introduce a complex generalization of the pseudo-Jahn-Teller model to describe these resonant states. The potential energies and the autoionization widths of the resonant states are computed with electron scattering calculations using the complex Kohn variational method, and the complex model parameters are extracted by a least-square fit to the results. This treatment results in a non-Hermitian pseudo-Jahn-Teller Hamiltonian describing the system. The non-adiabatic coupling and geometric phase are further calculated and used to characterize the enriched topology of the complex adiabatic potential energy surfaces.

翻訳日:2023-04-09 02:36:41 公開日:2021-03-04

# 結晶面原子核による相対論的粒子非コヒーレント散乱

Relativistic particle incoherent scattering by the nuclei of crystal plane atoms ( http://arxiv.org/abs/2103.03141v1 )

ライセンス: Link先を確認

Victor V. Tikhomirov

(参考訳) 現象パラメータを持たない結晶面の核による古典的移動相対論的粒子の不整合散乱を記述する一貫した理論を示す。量子力学の基本概念を適用し、粒子軌道の単位長さあたりの平均二乗非コヒーレント散乱角の基本的なコンパクトな公式を導入する。後者は、クーロン散乱シミュレーションにおける結晶原子分布の不均一性の影響を、シミュレーション時間の顕著な延長なしに実装するために用いられる。この理論は、正に荷電された粒子が低核密度領域から脱流路する性質を本質的に再検討し、結晶の振動子と短粒子の特定の電磁モーメントの測定の両方に必須である。

A consistent theory is presented that describes the incoherent scattering of classically moving relativistic particles by the nuclei of crystal planes without any phenomenological parameters. The basic notions of quantum mechanics are applied to introduce a fundamental compact formula for the mean square incoherent scattering angle per unit length of particle trajectory. This formula is used to incorporate the effects of the inhomogeneity of the crystal atom distribution into Coulomb scattering simulations without a noticeable increase in simulation time. The theory essentially reconsiders the nature of the dechanneling of positively charged particles from low nuclear density regions, which is essential both for crystal undulators and for envisaged measurements of the specific electromagnetic moments of short-lived particles.

翻訳日:2023-04-09 02:30:06 公開日:2021-03-04

# 雑音量子回路のラデマチャー複雑性

Rademacher complexity of noisy quantum circuits ( http://arxiv.org/abs/2103.03139v1 )

ライセンス: Link先を確認

Kaifeng Bu, Dax Enshan Koh, Lu Li, Qingxian Luo, Yaobo Zhang

(参考訳) 量子系におけるノイズは、大きな量子回路上で多くの量子アルゴリズムを実装する上で大きな障害となる。本研究では,量子回路のラデマッハ複雑性に対する雑音の影響について検討する。これは,これらの回路によって生成される関数のクラスのリッチさを定量化する,統計的複雑性の尺度である。我々は、一意チャネルの凸結合で表されるノイズモデルを検討し、これらのノイズモデルによって特徴づけられる量子回路のラデマッハ複素量に対して上下境界を提供する。特に、ノイズのない量子回路のラデマッハ複雑性と回路の自由ロバスト性に依存する雑音量子回路のラデマッハ複雑性に対する下界を求める。以上の結果から,量子回路のRademacher複雑性はノイズの増加とともに減少することが示された。

Noise in quantum systems is a major obstacle to implementing many quantum algorithms on large quantum circuits. In this work, we study the effects of noise on the Rademacher complexity of quantum circuits, which is a measure of statistical complexity that quantifies the richness of classes of functions generated by these circuits. We consider noise models that are represented by convex combinations of unitary channels and provide both upper and lower bounds for the Rademacher complexities of quantum circuits characterized by these noise models. In particular, we find a lower bound for the Rademacher complexity of noisy quantum circuits that depends on the Rademacher complexity of the corresponding noiseless quantum circuit as well as the free robustness of the circuit. Our results show that the Rademacher complexity of quantum circuits decreases with the increase in noise.

翻訳日:2023-04-09 02:29:54 公開日:2021-03-04
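
As a reading aid for the abstract above, the following is a minimal sketch of the empirical Rademacher complexity, $\hat{R}_S(F) = \mathbb{E}_\sigma[\sup_{f \in F} \frac{1}{m}\sum_i \sigma_i f(x_i)]$, computed exactly for a tiny finite function class. The function names, the four-function class, and the `shrink` noise model (every output damped by a factor $1-p$) are assumptions made for illustration; they are not the circuits or noise channels studied in the paper.

```python
from itertools import product


def empirical_rademacher(outputs):
    """Exact empirical Rademacher complexity of a finite function class.

    outputs[k][i] is f_k(x_i) on a fixed sample of size m. All 2**m sign
    vectors are enumerated, so this is only practical for small m.
    """
    m = len(outputs[0])
    total = 0.0
    for sigma in product((-1, 1), repeat=m):
        # Supremum over the class of the sign-weighted sample average.
        total += max(sum(s * v for s, v in zip(sigma, f)) / m for f in outputs)
    return total / 2 ** m


def shrink(outputs, p):
    """Toy noise model (an assumption): damp every output by (1 - p)."""
    return [[(1 - p) * v for v in f] for f in outputs]


# Hypothetical class of four functions evaluated on a two-point sample.
noiseless = [[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]]
print(empirical_rademacher(noiseless))
print(empirical_rademacher(shrink(noiseless, 0.25)))
```

In this toy setting the complexity scales linearly with the damping factor, which is consistent in spirit with the abstract's statement that complexity decreases as noise increases, but it is not a derivation of the paper's bounds.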

# 線形判別分析による量子次元の低減

Quantum Dimensionality Reduction by Linear Discriminant Analysis ( http://arxiv.org/abs/2103.03131v1 )

ライセンス: Link先を確認

Kai Yu, Gong-De Guo, and Song Lin

(参考訳) データの次元性低減(DR)は、パターン認識やデータ分類など、多くの機械学習タスクにおいて重要な問題である。本稿では,次元減少のための線形判別分析(LDA)を効率的に行う量子アルゴリズムと量子回路を提案する。まず,提案アルゴリズムは既存の量子ldaアルゴリズムを改善し,元のアルゴリズムにおけるクラス間散乱行列$s_b$の非可逆性による誤差を回避する。次に,低次元データに対応する対象状態を得るために量子アルゴリズムと量子回路を提案する。最もよく知られた古典的アルゴリズムと比較すると、量子線形判別分析次元減少 (qldadr) アルゴリズムは、元のデータ空間の次元が多対数低次元空間に投影されたとき、ベクトル数 $m$ の指数加速度と、元のデータ空間の次元 $d$ の二次速度を持つ。さらに、本アルゴリズムにより得られた対象状態は、他の量子機械学習タスクのサブモジュールとして使用できる。それは、それを次元の災難から解放する実用的な応用価値を持っている。

Dimensionality reduction (DR) of data is a crucial issue for many machine learning tasks, such as pattern recognition and data classification. In this paper, we present a quantum algorithm and a quantum circuit to efficiently perform linear discriminant analysis (LDA) for dimensionality reduction. Firstly, the presented algorithm improves the existing quantum LDA algorithm to avoid the error caused by the non-invertibility of the between-class scatter matrix $S_B$ in the original algorithm. Secondly, a quantum algorithm and quantum circuits are proposed to obtain the target state corresponding to the low-dimensional data. Compared with the best-known classical algorithm, the quantum linear discriminant analysis dimensionality reduction (QLDADR) algorithm achieves an exponential speedup in the number $M$ of vectors and a quadratic speedup in the dimensionality $D$ of the original data space when the original dataset is projected onto a polylogarithmic low-dimensional space. Moreover, the target state obtained by our algorithm can be used as a submodule of other quantum machine learning tasks, which is of practical value in freeing them from the curse of dimensionality.

翻訳日:2023-04-09 02:29:42 公開日:2021-03-04
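
To make the role of $S_B$ in the abstract above concrete, here is a small classical sketch of two-class LDA in two dimensions. It is not the quantum algorithm: the helper names and the toy data are assumptions, and the only point is that the between-class scatter matrix has rank at most (number of classes minus one), so it is singular here, while the within-class scatter $S_W$ can still be inverted to get the Fisher direction.

```python
def mean(points):
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(2)]


def outer(u, v):
    return [[u[i] * v[j] for j in range(2)] for i in range(2)]


def add(a, b, scale=1.0):
    return [[a[i][j] + scale * b[i][j] for j in range(2)] for i in range(2)]


def det(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


def scatter_matrices(classes):
    """Return (S_W, S_B) for a list of classes of 2-D points."""
    mu = mean([p for c in classes for p in c])
    s_w = [[0.0, 0.0], [0.0, 0.0]]
    s_b = [[0.0, 0.0], [0.0, 0.0]]
    for c in classes:
        mu_c = mean(c)
        for p in c:
            d = [p[0] - mu_c[0], p[1] - mu_c[1]]
            s_w = add(s_w, outer(d, d))            # within-class spread
        d = [mu_c[0] - mu[0], mu_c[1] - mu[1]]
        s_b = add(s_b, outer(d, d), scale=len(c))  # between-class spread
    return s_w, s_b


def fisher_direction(classes):
    """Two-class LDA direction w = S_W^{-1} (mu_1 - mu_0)."""
    s_w, _ = scatter_matrices(classes)
    mu0, mu1 = mean(classes[0]), mean(classes[1])
    diff = [mu1[0] - mu0[0], mu1[1] - mu0[1]]
    dt = det(s_w)
    inv = [[s_w[1][1] / dt, -s_w[0][1] / dt],
           [-s_w[1][0] / dt, s_w[0][0] / dt]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


# Made-up data: two square clusters separated along the x axis.
class_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
class_b = [(4.0, 0.0), (5.0, 0.0), (4.0, 1.0), (5.0, 1.0)]
s_w, s_b = scatter_matrices([class_a, class_b])
w = fisher_direction([class_a, class_b])
print("det(S_B) =", det(s_b))
print("projection direction =", w)
```

Projecting each point onto `w` gives the one-dimensional representation; any method that needs $S_B^{-1}$ directly would fail on this data because `det(s_b)` is zero.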

# 量子テレポーテーションに基づく量子情報マスキング

Quantum information masking basing on quantum teleportation ( http://arxiv.org/abs/2103.03126v1 )

ライセンス: Link先を確認

Wei-Min Shang and Fu-Lin Zhang and Jing-Ling Chen

(参考訳) マスキングの定理は、二部構成のシナリオでは量子情報のマスキングは不可能であると述べている。しかし、マルチパーティイト系では量子状態をマスクするスキームが存在する。本研究では,テレポーテーションにおける関節計測は,装置がシステム全体の量子参加者と見なされる場合,実際にはマスキングの過程であることを示す。前者のうちの1つは、Li と Wang [Phys. Rev. A 98, 062306 (2018)] によって与えられる四ビットスキームの任意の次元の一般化を提供する。量子状態の占有確率とコヒーレンスは、我々のスキームの2つのステップで隠蔽される。そしてその情報は、その逆のプロセスで自然に抽出できる。

The no-masking theorem says that masking quantum information is impossible in a bipartite scenario. However, there exist schemes to mask quantum states in multipartite systems. In this work, we show that the joint measurement in teleportation is really a masking process when the apparatus is regarded as a quantum participant in the whole system. Based on this view, we present two four-partite maskers and a tripartite masker. One of the former provides a generalization to arbitrary dimension of the four-qubit scheme given by Li and Wang [Phys. Rev. A 98, 062306 (2018)], and the latter is precisely their tripartite scheme. The occupation probabilities and coherence of quantum states are masked in two steps of our schemes, and the information can be extracted naturally in their reverse processes.

翻訳日:2023-04-09 02:29:24 公開日:2021-03-04

# 経済異常検出のためのイベントベース動的バンキングネットワーク探索

Event-Based Dynamic Banking Network Exploration for Economic Anomaly Detection ( http://arxiv.org/abs/2103.03120v1 )

ライセンス: Link先を確認

Andry Alamsyah, Dian Puteri Ramadhani, Farida Titik Kristanti

(参考訳) 金融システムの不安定さは、銀行の破綻を引き起こし、流出を誘発し、金融システムに悪影響を及ぼす感染効果を生じさせ、最終的には経済に影響を及ぼす可能性がある。この現象は、高度に相互接続された銀行取引の結果である。銀行取引ネットワークは金融アーキテクチャのバックボーンと見なされている。銀行間の強い相互接続性は、銀行網全体に広がる伝染破壊をエスカレートし、システム全体の崩壊を引き起こす。これまでのところ、金融の不安定性は、主に制御されていない取引赤字量と未払いの対外債務をマクロアプローチで検出されている。本研究は、グローバルに銀行網構造を探索するマクロビューとモチーフと呼ばれる詳細なネットワークパターンに焦点を当てたマイクロビューを通して、別の視点で金融不安定検出を提案する。ネットワーク三進モチーフパターンは、金融不安定を検出するのに用いられる。不安定期間に関連する最も関連するネットワーク三進モチーフ変化を検出器として決定する。インドネシアの主要な宗教行事であるeid al-fitrとともに、金融不安定現象下の銀行ネットワークの挙動を考察する。我々は、金融不安定な基盤検出器として一つのモチーフパターンを発見する。この研究は金融システムの安定管理を支援するのに役立つ。

Instability in the financial system might trigger a bank failure, evoke spillovers, and generate contagion effects that negatively impact the financial system and, ultimately, the economy. This phenomenon is the result of highly interconnected banking transactions. The banking transaction network is considered the backbone of the financial architecture. The strong interconnectedness between banks escalates the spread of contagion across the banking network and can trigger the collapse of the entire system. So far, financial instability has generally been detected using a macro approach, mainly through uncontrolled transaction deficits and unpaid foreign debt. This research proposes detecting financial instability from another point of view: through a macro view, in which the banking network structure is explored globally, and a micro view, which focuses on detailed network patterns called motifs. Network triadic motif patterns are used as the basis for detecting financial instability. The triadic motif changes most closely related to the instability period are selected as detectors. We explore the behavior of the banking network under financial instability around the major religious event in Indonesia, Eid al-Fitr. We discover one motif pattern that serves as the underlying detector of financial instability. This research helps to support the supervision of financial system stability.

翻訳日:2023-04-09 02:29:01 公開日:2021-03-04

# インドネシアのEコマースデータに基づく中小企業の分類決定木アプローチによる販売予測モデル

Sales Prediction Model Using Classification Decision Tree Approach For Small Medium Enterprise Based on Indonesian E-Commerce Data ( http://arxiv.org/abs/2103.03117v1 )

ライセンス: Link先を確認

Raden Johannes, Andry Alamsyah

(参考訳) インドネシアにおけるインターネット利用者の増加は、商業を含む日常生活の多くの側面に影響を及ぼす。インドネシアの中小企業は、新しいメディアの利点を生かして、オンラインコマースの意義を生かした。これまでのところ、過去の取引で売上と収益を予測できる実用的な実装は知られていない。本稿では,インドネシア最大のeコマースプロバイダであるTokopediaで収集した実生活データを用いて,インドネシアの靴産業における販売予測モデルを構築した。データマイニングは、データを処理することによって情報を集めるために使用できる分野である。本研究は,データマイニングにおける分類手法を用いて,市場のパターンを記述し,市場商品における地域の可能性を予測する。我々のアプローチは分類決定木に基づいている。われわれは、視聴者が販売する商品数、価格、靴の種類を予測することができた。

The growth in the number of internet users in Indonesia has an impact on many aspects of daily life, including commerce. Indonesian small and medium enterprises have taken advantage of this new medium to conduct their business through online commerce. Until now, there has been no known practical implementation for predicting their sales and revenue from their historical transactions. In this paper, we build a sales prediction model for the Indonesian footwear industry using real-life data crawled from Tokopedia, one of the biggest e-commerce providers in Indonesia. Data mining is a discipline that can be used to gather information by processing data. Using classification methods from data mining, this research describes market patterns and predicts the potential of regions in the national commodity market. Our approach is based on a classification decision tree. We were able to predict the number of items sold from the number of viewers, the price, and the type of shoes.

翻訳日:2023-04-09 02:28:41 公開日:2021-03-04

# ノイズ量子ネットワークのロバスト性

Robustness of Noisy Quantum Networks ( http://arxiv.org/abs/2103.03266v1 )

ライセンス: Link先を確認

Bruno C. Coutinho, William J. Munro, Kae Nemoto and Yasser Omar

(参考訳) 量子ネットワークは複雑なネットワークの新しいパラダイムであり、ネットワーク化された量子技術を利用して量子インターネットを開発することができる。しかし、リンクとノードが故障し始めると、量子ネットワークはどのくらい堅牢か? 典型的雑音の量子リピータノードに基づく量子ネットワークは、動作リンクやノードのランダムな損失に対して不連続な相転移を起こしやすいことを示し、ネットワークの接続性を急激に妥協させ、その動作範囲を著しく制限することを示した。さらに,ネットワークトポロジー,ネットワークサイズ,ネットワーク内の絡み合い分布の関数として,この破滅的な接続損失を回避するために必要な臨界量子リピータ効率を決定する。特に,大規模量子インターネットの確立には,スケールフリートポロジーが重要な設計原理であることを示す。

Quantum networks are a new paradigm of complex networks, allowing us to harness networked quantum technologies and to develop a quantum internet. But how robust is a quantum network when its links and nodes start failing? We show that quantum networks based on typical noisy quantum-repeater nodes are prone to discontinuous phase transitions with respect to the random loss of operating links and nodes, abruptly compromising the connectivity of the network, and thus significantly limiting the reach of its operation. Furthermore, we determine the critical quantum-repeater efficiency necessary to avoid this catastrophic loss of connectivity as a function of the network topology, the network size, and the distribution of entanglement in the network. In particular, our results indicate that a scale-free topology is a crucial design principle to establish a robust large-scale quantum internet.

翻訳日:2023-04-09 02:20:42 公開日:2021-03-04
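
The robustness question in the abstract above can be illustrated with a purely classical link-removal experiment. This sketch only measures how the largest connected component shrinks as links fail at random; it does not model noisy quantum repeaters or entanglement distribution, and it is not expected to reproduce the discontinuous transition the paper reports. The topology (a ring plus random shortcuts), the sizes, and the function names are all assumptions.

```python
import random


def largest_component(n, edges):
    """Size of the largest connected component, via union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    sizes = {}
    for node in range(n):
        root = find(node)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values())


def surviving_fraction(n, edges, p_keep, rng):
    """Keep each link independently with probability p_keep."""
    kept = [e for e in edges if rng.random() < p_keep]
    return largest_component(n, kept) / n


n = 200
rng = random.Random(0)
# Hypothetical topology: a ring plus n random shortcut links.
edges = [(i, (i + 1) % n) for i in range(n)]
edges += [(rng.randrange(n), rng.randrange(n)) for _ in range(n)]
for p_keep in (1.0, 0.75, 0.5, 0.25, 0.0):
    print(p_keep, surviving_fraction(n, edges, p_keep, rng))
```

Swapping in a different edge list (for example a scale-free graph) is the natural way to compare topologies with this kind of experiment.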

# ブロックチェーンプラットフォームの利用可能性に関する要件分析と評価

Requirement Analyses and Evaluations of Blockchain Platforms per Possible Use Cases ( http://arxiv.org/abs/2103.03209v1 )

ライセンス: Link先を確認

Kenji Saito, Akimitsu Shiseki, Mitsuyasu Takada, Hiroki Yamamoto, Masaaki Saitoh, Hiroaki Ohkawa, Hirofumi Andou, Naotake Miyamoto, Kazuaki Yamakawa, Kiyoshi Kurakawa, Tomohiro Yabushita, Yuji Yamada, Go Masuda, Kazuyuki Masuda

(参考訳) ブロックチェーンは、公開文書や私文書の管理から、さまざまな産業におけるトレーサビリティ、デジタル通貨に至るまで、幅広い方法で社会のデジタルトランスフォーメーションに寄与すると言われている。いわゆるブロックチェーンプラットフォームが数多く開発され、実験や応用が行われている。しかし、これらのプラットフォームは本当にブロックチェーンのコンセプトを実践するのに有効だろうか? 質問に答えるためには、ブロックチェーンと呼ばれる技術が何であるかをよりよく理解する必要があります。ブロックチェーンが何のために発明されたのか、それが何を意味するのかを理解する上で、我々が見る混乱を整理する必要がある。また,その応用構造を明らかにする必要がある。このドキュメントは、ブロックチェーンとそのアプリケーションを理解する一般的なモデルを提供します。プラットフォームを分類するためにデザインパターンを導入します。アプリケーション間の構造を識別し,各ケースの機能的,性能的,運用的,法的要件を整理することにより,考えられるユースケースを分類する。分類と基準に基づいて、Hyperledger Fabric、Hyperledger Iroha、Hyperledger Indy、Ethereum、Quorum/Hyperledger Besu、Ethereum 2.0、Polkadot、Corda、BBc-1といったプラットフォームを評価し、比較した。評価と比較で公平に取り組んできたが、議論を誘発することを期待している。このドキュメントの読者は、非エンジニアや非技術者を含む、ブロックチェーンとそのプラットフォームを理解したいアプリケーションシステムの開発に関わる人であれば誰でも参加できる。この文書のアセスメントにより、読者はブロックチェーンプラットフォームの技術的要件を理解し、既存の技術に疑問を呈し、想定するアプリケーションに適したプラットフォームを選択することができる。この比較は、新しいテクノロジーを設計するためのガイドとしても役立つだろう。

It is said that blockchain will contribute to the digital transformation of society in a wide range of ways, from the management of public and private documents to traceability in various industries, as well as digital currencies. A number of so-called blockchain platforms have been developed, and experiments and applications have been carried out on them. But are these platforms really conducive to practical use of the blockchain concept? To answer the question, we need to better understand what the technology called blockchain really is. We need to sort out the confusion we see in understanding what blockchain was invented for and what it means. We also need to clarify the structure of its applications. This document provides a generic model for understanding blockchain and its applications. We introduce design patterns to classify the platforms. We categorize possible use cases by identifying the structure among applications, and organize the functional, performance, operational and legal requirements for each such case. Based on the categorization and criteria, we evaluated and compared the following platforms: Hyperledger Fabric, Hyperledger Iroha, Hyperledger Indy, Ethereum, Quorum/Hyperledger Besu, Ethereum 2.0, Polkadot, Corda and BBc-1. We have tried to be fair in our evaluations and comparisons, but we also expect to provoke discussion. The intended readers of this document are all those involved in the development of application systems who want to understand blockchain and its platforms, including non-engineers and non-technologists. The assessments in this document will allow readers to understand the technological requirements for blockchain platforms, to question existing technologies, and to choose the appropriate platforms for the applications they envision. The comparisons will hopefully also be useful as a guide for designing new technologies.

翻訳日:2023-04-09 02:19:59 公開日:2021-03-04

# 量子コンピュータにおけるデジタル散逸ダイナミクスを用いた量子マルコフ連鎖モンテカルロ

Quantum Markov Chain Monte Carlo with Digital Dissipative Dynamics on Quantum Computers ( http://arxiv.org/abs/2103.03207v1 )

ライセンス: Link先を確認

Mekena Metcalf, Emma Stone, Katherine Klymko, Alexander F. Kemper, Mohan Sarovar, and Wibe A. de Jong

(参考訳) 環境に接続された量子システムのダイナミクスのモデリングは、自然界のほとんどの量子プロセスが環境に影響されるため、複雑な量子プロセスの理解を進める上で非常に重要である。量子シミュレータ上のマクロ環境のモデリングは、システムと適切な方法でエネルギー交換を促進し、環境を模倣する独立したアンシラ量子ビットを結合することで達成できる。このアプローチには、非現実的な大規模な、おそらく指数関数的な自由度を必要とする。対照的に,少数のアンシラ量子ビットを用いて環境とのインタラクションをシミュレートするディジタル量子アルゴリズムを開発した。周期的なアンシラエネルギの変調(またはスペクトルコンピング)と周期的なリセット操作を組み合わせることで、大きな環境との相互作用を模倣し、相互作用する多体系の熱状態を生成することができる。逆イジングモデルの熱状態のシミュレーションによるアルゴリズムの評価を行った。このアルゴリズムは、多変量モデルのギブス分布のサンプリングを可能にする量子マルコフ連鎖モンテカルロ(qmcmc)プロセスとしても見ることができる。そこで本研究では,単純な確率的グラフィカルモデルのギブス分布のサンプリング精度を評価する。

Modeling the dynamics of a quantum system connected to the environment is critical for advancing our understanding of complex quantum processes, as most quantum processes in nature are affected by an environment. Modeling a macroscopic environment on a quantum simulator may be achieved by coupling independent ancilla qubits that facilitate energy exchange in an appropriate manner with the system and mimic an environment. This approach requires a large, and possibly exponential, number of ancillary degrees of freedom, which is impractical. In contrast, we develop a digital quantum algorithm that simulates interaction with an environment using a small number of ancilla qubits. By combining periodic modulation of the ancilla energies, or spectral combing, with periodic reset operations, we are able to mimic interaction with a large environment and generate thermal states of interacting many-body systems. We evaluate the algorithm by simulating preparation of thermal states of the transverse Ising model. Our algorithm can also be viewed as a quantum Markov chain Monte Carlo (QMCMC) process that allows sampling of the Gibbs distribution of a multivariate model. To demonstrate this, we evaluate the accuracy of sampling Gibbs distributions of simple probabilistic graphical models using the algorithm.

翻訳日:2023-04-09 02:19:33 公開日:2021-03-04
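
For readers unfamiliar with Gibbs sampling, the classical baseline that the abstract above compares against conceptually can be sketched as follows. This is an ordinary single-spin-flip Metropolis sampler on a three-spin classical Ising chain, checked against brute-force enumeration; it is not the paper's ancilla-based quantum algorithm. The chain length, inverse temperature, coupling, and function names are assumptions chosen to keep the example tiny.

```python
import math
import random
from itertools import product


def energy(spins, coupling=1.0):
    """Open classical Ising chain: E = -J * sum_i s_i * s_{i+1}."""
    return -coupling * sum(a * b for a, b in zip(spins, spins[1:]))


def exact_gibbs(n, beta):
    """Exact Gibbs probabilities exp(-beta * E) / Z by enumeration."""
    weights = {s: math.exp(-beta * energy(s))
               for s in product((-1, 1), repeat=n)}
    z = sum(weights.values())
    return {s: w / z for s, w in weights.items()}


def metropolis(n, beta, steps, rng):
    """Single-spin-flip Metropolis chain; returns visit frequencies."""
    spins = [rng.choice((-1, 1)) for _ in range(n)]
    counts = {}
    for _ in range(steps):
        i = rng.randrange(n)
        proposal = spins[:i] + [-spins[i]] + spins[i + 1:]
        delta = energy(proposal) - energy(spins)
        # Accept downhill moves always, uphill moves with Boltzmann weight.
        if delta <= 0 or rng.random() < math.exp(-beta * delta):
            spins = proposal
        key = tuple(spins)
        counts[key] = counts.get(key, 0) + 1
    return {s: c / steps for s, c in counts.items()}


exact = exact_gibbs(3, beta=0.5)
sampled = metropolis(3, beta=0.5, steps=50000, rng=random.Random(0))
tv = 0.5 * sum(abs(exact[s] - sampled.get(s, 0.0)) for s in exact)
print(f"total variation distance: {tv:.3f}")
```

The total variation distance between sampled and exact distributions is one simple way to express the "accuracy of sampling" that the abstract refers to; the paper may use a different measure.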

# ポート割り当てとコンパイルによる線形光学不完全化の緩和

Mitigating linear optics imperfections via port allocation and compilation ( http://arxiv.org/abs/2103.03183v1 )

ライセンス: Link先を確認

Shreya P. Kumar, Leonhard Neuhaus, Lukas G. Helt, Haoyu Qi, Blair Morrison, Dylan H. Mahler, Ish Dhand

(参考訳) 線形光学は、室温で動作し、統合フォトニックプラットフォーム上でスカラーで製造できる量子技術を構築する有望な経路である。しかし、ライン光学のスケールアップには製造不備が避けられないため、高性能な運転が必要である。そこで本研究では, 適切なキャリブレーション手順によって事前に決定できる, ポート割り当てとオンチップ不完全度へのコンパイルを調整し, 線形光干渉計の性能を向上させる手法を提案する。代表例として、所定の干渉計の平均消費電力や、その上に実装されたすべての可能なユニタリ変換における消費電力値の範囲の劇的な削減を示す。さらに, 製造欠陥の存在下での所望の変換のフィダリティ向上にこれらの技術が有効であることを示す。関連する測定値における線形光干渉計の性能を数桁改善することにより、これらのツールは真の量子優位性を示すための光学技術をもたらす。

Linear optics is a promising route to building quantum technologies that operate at room temperature and can be manufactured scalably on integrated photonic platforms. However, scaling up linear optics requires high-performance operation amid inevitable manufacturing imperfections. We present techniques for enhancing the performance of linear optical interferometers by tailoring their port allocation and compilation to the on-chip imperfections, which can be determined beforehand by suitable calibration procedures that we introduce. As representative examples, we demonstrate dramatic reductions in the average power consumption of a given interferometer or in the range of its power consumption values across all possible unitary transformations implemented on it. Furthermore, we demonstrate the efficacy of these techniques at improving the fidelities of the desired transformations in the presence of fabrication defects. By improving the performance of linear optical interferometers in relevant metrics by several orders of magnitude, these tools bring optical technologies closer to demonstrating true quantum advantage.

翻訳日:2023-04-09 02:19:11 公開日:2021-03-04

# 観測不能因果ループの量子力学的概念とアントロピック原理

The quantum mechanical notion of unobservable causal loop and the anthropic principle ( http://arxiv.org/abs/2103.03173v1 )

ライセンス: Link先を確認

Giuseppe Castagnoli

(参考訳) 2つの1対1の相関した測定結果の間の可逆的量子過程の通常の記述は、因果関係の方向を規定しないことで、可逆的過程に必要な時間対称性に反する因果構造が許されるため不完全である。これはまた、単に時間対称性化することで完了できることを意味する。すなわち、最初の測定値と最後の測定値が、それらの相関した結果の選択に均等に寄与することを要求することである。これは説明を変更せずに残すが、因果構造が完全に定義される観測不能な時間対称性のインスタンスの量子重ね合わせであることを示している。それぞれのインスタンスは因果ループで構成されている:最後の測定は、単位変換の入力状態がその直前の状態につながるときに後方に変化する。前者の研究では、そのようなループが量子計算のスピードアップと量子非局所性を正確に説明できることが示されている。この研究で、量子スピードアップを伴う宇宙の進化を可能にする人類の原理の完成につながることを示した。

It can be argued that the ordinary description of the reversible quantum process between two one-to-one correlated measurement outcomes is incomplete because, by not specifying the direction of causality, it allows causal structures that violate the time symmetry that is required of a reversible process. This also means that it can be completed simply by time-symmetrizing it, namely by requiring that the initial and final measurements evenly contribute to the selection of their correlated pair of outcomes. This leaves the description unaltered but shows that it is the quantum superposition of unobservable time-symmetrized instances whose causal structure is completely defined. Each instance consists of a causal loop: the final measurement that changes backwards in time the input state of the unitary transformation that leads to the state immediately before it. In former works, we have shown that such loops exactly explain the quantum computational speedup and quantum nonlocality. In this work we show that they lead to a completion of the anthropic principle that allows a universe evolution with quantum speedup.

翻訳日:2023-04-09 02:18:55 公開日:2021-03-04

# 農場における現場作業の遠隔観察

Remote Observation of Field Work on the Farm ( http://arxiv.org/abs/2103.03163v1 )

ライセンス: Link先を確認

Wendy Ju, Ilan Mandel, Kevin Weatherwax, Leila Takayama, Nikolas Martelaro, Denis Willett

(参考訳) 旅行制限やソーシャルディスタンシング対策は、物理的フィールドワークの監視、監視、管理を困難にしている。本研究は,農場における現場作業の観察のために,道路内車両におけるリアルタイム遠隔観察・会話技術を適用した研究である。私たちは、ニューヨーク北部のKreher Eggsで、このプロジェクトのパイロット展開に協力しました。車両関連作業を行う農作業員の遠隔観察と面接を行うための機器を備えたトラクタを製作した。本研究は, 遠隔地から長期にわたる現場作業の継続観察を可能にするため, 本研究は, 本研究の現状を踏まえ, 地理的・身体的距離が標準となった場合の観察研究の実施方法についてのケーススタディを提供する。我々は, 現場で遠隔観察研究を行おうとする人々に対して, 経験を議論し, 予備的な知見を提供する。

Travel restrictions and social distancing measures make it difficult to observe, monitor or manage physical fieldwork. We describe research in progress that applies technologies for real-time remote observation and conversation in on-road vehicles to observe field work on a farm. We collaborated on a pilot deployment of this project at Kreher Eggs in upstate New York. We instrumented a tractor with equipment to remotely observe and interview farm workers performing vehicle-related work. This work was initially undertaken to allow sustained observation of field work over longer periods of time from geographically distant locales; given our current situation, this work provides a case study in how to perform observational research when geographic and bodily distance have become the norm. We discuss our experiences and provide some preliminary insights for others looking to conduct remote observational research in the field.

翻訳日:2023-04-09 02:18:34 公開日:2021-03-04

# ボームの量子ポテンシャルの非局所位相場モデル

A non local phase field model of Bohm's quantum potential ( http://arxiv.org/abs/2103.03162v1 )

ライセンス: Link先を確認

Roberto Mauri

(参考訳) 気体の自由エネルギーがその質量密度の対数に非局所的に依存すると仮定すると、結果として生じる運動方程式の体積力は密度勾配項の和からなる。この級数を第2項で打ち切ると、ボームの量子ポテンシャルとマドルング方程式がそのまま得られ、量子力学の定式化に繋がった仮説のいくつかが非局所性に基づく古典的な解釈を許すことが明示的に示される。

Assuming that the free energy of a gas depends non-locally on the logarithm of its mass density, the body force in the resulting equation of motion consists of the sum of density gradient terms. Truncating this series after the second term, Bohm's quantum potential and the Madelung equation are identically obtained, showing explicitly that some of the hypotheses that led to the formulation of quantum mechanics admit a classical interpretation based on non-locality.

翻訳日:2023-04-09 02:18:21 公開日:2021-03-04
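
For reference, the standard textbook form of the objects named in the abstract above is given below. It uses the usual quantum-mechanical starting point $\psi = \sqrt{\rho}\, e^{iS/\hbar}$ with external potential $V$ and particle mass $m$, not the paper's free-energy derivation, and the symbols are the conventional ones rather than notation taken from the paper.

```latex
% Madelung form of the Schroedinger equation for \psi = \sqrt{\rho}\, e^{iS/\hbar}
\begin{align}
  \frac{\partial \rho}{\partial t}
    + \nabla \cdot \left( \rho\, \frac{\nabla S}{m} \right) &= 0, \\
  \frac{\partial S}{\partial t} + \frac{|\nabla S|^{2}}{2m} + V + Q &= 0, \\
  Q = -\frac{\hbar^{2}}{2m}\, \frac{\nabla^{2} \sqrt{\rho}}{\sqrt{\rho}}
    &= -\frac{\hbar^{2}}{4m}
       \left[ \nabla^{2} \ln \rho + \frac{1}{2}\, |\nabla \ln \rho|^{2} \right].
\end{align}
```

The second expression for $Q$ follows from $\nabla^{2}\sqrt{\rho}/\sqrt{\rho} = \nabla^{2}\rho/(2\rho) - |\nabla\rho|^{2}/(4\rho^{2})$ and shows that Bohm's potential depends on the density only through gradients of $\ln \rho$, which is the quantity the abstract's free energy is assumed to depend on.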

# 新規サチンボワーバード最適化装置によるコンクリートの一軸圧縮強度の解析

Analyzing Uniaxial Compressive Strength of Concrete Using a Novel Satin Bowerbird Optimizer ( http://arxiv.org/abs/2103.15547v1 )

ライセンス: Link先を確認

Hossein Moayedi, Amir Mosavi

(参考訳) コンクリートの力学パラメータを解析する複雑さを克服するには, 適切な方法を選択する必要がある。本研究では, コンクリートの一軸圧縮強度(ucs)を予測するために, 人工ニューラルネットワーク (ann) と, satin bowerbird optimizer (sbo) という新しいメタヒューリスティックな手法を統合した。この目的のために作成されたハイブリッドは、公開された文献から収集された比較的大きなデータセットを使用してトレーニングされ、テストされる。その他の3つの新しいアルゴリズム、Henry Gas Solubility Optimization (HGSO)、Sunflower Optimization (SFO)、VSA (Vortex search algorithm) もベンチマークとして使用されている。様々な精度指標を用いて,全アルゴリズムの適切な集団サイズを達成した後,提案手法はUCSの挙動を良好に解析するだけでなく,3つのベンチマークハイブリッド(ANN-HGSO,ANN-SFO,ANN-VSA)全てに優れていた。予測フェーズでは, 0.87394, 0.87936, 0.95329, 0.95663の相関指標と, ANN-HGSO, ANN-SFO, ANN-VSA, ANN-SBOで算出された15.9719, 15.3845, 9.4970, 8.0629%の平均絶対誤差は, それぞれ最高の予測性能を示した。また、ANN-VSAも信頼できる結果を得た。要するにann-sboは、技術者がコンクリートのucsを予測するための効率的な非破壊的手法として使用できる。

Surmounting the complexities of analyzing the mechanical parameters of concrete entails selecting an appropriate methodology. This study integrates an artificial neural network (ANN) with a novel metaheuristic technique, the satin bowerbird optimizer (SBO), for predicting the uniaxial compressive strength (UCS) of concrete. For this purpose, the resulting hybrid is trained and tested using a relatively large dataset collected from the published literature. Three other new algorithms, namely Henry gas solubility optimization (HGSO), sunflower optimization (SFO), and the vortex search algorithm (VSA), are also used as benchmarks. After a proper population size was attained for all algorithms, various accuracy indicators showed that the proposed ANN-SBO not only analyzes the UCS behavior excellently but also outperforms all three benchmark hybrids (i.e., ANN-HGSO, ANN-SFO, and ANN-VSA). In the prediction phase, correlation indices of 0.87394, 0.87936, 0.95329, and 0.95663 and mean absolute percentage errors of 15.9719%, 15.3845%, 9.4970%, and 8.0629%, calculated for ANN-HGSO, ANN-SFO, ANN-VSA, and ANN-SBO, respectively, showed that the proposed model gave the best prediction performance. The ANN-VSA also achieved reliable results. In short, the ANN-SBO can be used by engineers as an efficient non-destructive method for predicting the UCS of concrete.

翻訳日:2023-04-09 02:11:54 公開日:2021-03-04
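
The two accuracy indicators quoted in the abstract above can be computed as in the sketch below. It assumes that "correlation index" means the Pearson correlation coefficient, which the abstract does not state, and the strength values are made up for illustration rather than taken from the paper's dataset.

```python
import math


def pearson(actual, predicted):
    """Pearson correlation between measured and predicted values."""
    n = len(actual)
    mean_a, mean_p = sum(actual) / n, sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values nonzero)."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Made-up compressive strengths in MPa, for illustration only.
measured = [20.0, 40.0, 50.0, 80.0]
predicted = [22.0, 36.0, 55.0, 72.0]
print(f"correlation = {pearson(measured, predicted):.4f}")
print(f"MAPE = {mape(measured, predicted):.2f}%")
```

A higher correlation and a lower MAPE both indicate a better fit, which is the sense in which the abstract ranks the four hybrids.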

# 誰が何をする? コンピュータサイエンス学生チームにおける作業分割と配置戦略

Who does what? Work division and allocation strategies of computer science student teams ( http://arxiv.org/abs/2103.09048v1 )

ライセンス: Link先を確認

Anna van der Meulen, Efthimia Aivaloglou

(参考訳) コラボレーションスキルは将来のソフトウェアエンジニアにとって重要だ。コンピュータサイエンス教育では、これらのスキルは、学生が共同でソフトウェアを開発するグループ課題を通じて実践されることが多い。これらの課題に学生が取り組むアプローチは様々であるが、しばしば分業を伴う。そして、コラボレーションが現在も行われているかどうかを議論することができる。コンピューティング教育の分野はこの文脈で特に興味深いのは、特定の特徴(例えば、エントリスキルのレベルの変化やコラボレーションプラットフォームとしてのソースコードリポジトリの使用など)がグループワークで取られたアプローチに影響を与える可能性があるからである。本研究の目的は,集団課題におけるコンピュータサイエンスの学生の作業分割とアロケーション戦略の洞察を得ることである。この結果、4つの大学の20人の学生にインタビューを行った。テーマ分析は、ペアプログラミングとコードレビューが頻繁に採用され、学生は独立して働くためにワークロードを分割する傾向があることを示している。学生は主に成績と効率の要素に動機付けられ、主に専門知識と選好に基づいてタスクを選択して割り当てる。本研究の結果から,グループ課題の設定は,新しいソフトウェア工学のスキルを実践する学生のモチベーションを制限し,実験と学習を促進するためには介入が必要であると論じている。

Collaboration skills are important for future software engineers. In computer science education, these skills are often practiced through group assignments, where students develop software collaboratively. The approach that students take in these assignments varies widely, but often involves a division of labour. It is then debatable whether collaboration still takes place. The discipline of computing education is especially interesting in this context, because some of its specific features (such as the variation in entry skill level and the use of source code repositories as collaboration platforms) are likely to influence the approach taken within group work. The aim of this research is to gain insight into the work division and allocation strategies applied by computer science students during group assignments. To this end, we interviewed twenty students from four universities. The thematic analysis shows that students tend to divide up the workload to enable working independently, with pair programming and code reviews often being employed. Motivated primarily by grade and efficiency factors, students choose and allocate tasks primarily based on their prior expertise and preferences. Based on our findings, we argue that the setup of group assignments can limit student motivation for practicing new software engineering skills, and that interventions are needed to encourage experimentation and learning.

翻訳日:2023-04-09 02:11:23 公開日:2021-03-04

# ニューラルネットワークにおけるクラスタ性

Clusterability in Neural Networks ( http://arxiv.org/abs/2103.03386v1 )

ライセンス: Link先を確認

Daniel Filan, Stephen Casper, Shlomi Hod, Cody Wild, Andrew Critch, Stuart Russell

(参考訳) ニューラルネットワークの学習重量は、しばしば精査可能な内部構造を持たないと考えられている。本稿では,ネットワークを内部接続性が高く,外部接続性が弱いニューロン群に分割可能かという,クラスタ性という形での構造を考察する。トレーニングされたニューラルネットワークは、通常ランダムに初期化されたネットワークよりもクラスタリング可能であり、しばしば同じ重みの分布を持つランダムネットワークに対してクラスタリング可能である。また,ニューラルネットワークの学習においてクラスタ性を促進する新しい手法を示し,多層パーセプトロンでは,精度を低下させることなく,よりクラスタ性の高いネットワークを実現する。ニューラルネットワークのクラスタビリティを理解して制御することで、意味のあるクラスタへのパーティショニングを容易にすることで、内部処理をよりエンジニアに解釈可能にすることが期待できる。

The learned weights of a neural network have often been considered devoid of scrutable internal structure. In this paper, however, we look for structure in the form of clusterability: how well a network can be divided into groups of neurons with strong internal connectivity but weak external connectivity. We find that a trained neural network is typically more clusterable than randomly initialized networks, and often clusterable relative to random networks with the same distribution of weights. We also exhibit novel methods to promote clusterability in neural network training, and find that in multi-layer perceptrons they lead to more clusterable networks with little reduction in accuracy. Understanding and controlling the clusterability of neural networks will hopefully render their inner workings more interpretable to engineers by facilitating partitioning into meaningful clusters.

翻訳日:2023-04-09 02:10:47 公開日:2021-03-04
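
One common way to quantify "strong internal connectivity but weak external connectivity", as described in the abstract above, is the normalized cut of a partition. The sketch below assumes that measure and assumes the network has already been turned into an undirected graph with non-negative edge weights (for instance, absolute values of connection weights); both are illustrative assumptions, and the four-node graph is made up.

```python
def ncut(weights, labels):
    """Normalized cut of a partition of an undirected weighted graph.

    weights: symmetric matrix (list of lists) of non-negative edge weights.
    labels:  labels[i] is the cluster assigned to node i.
    Assumes every cluster has at least one edge (positive volume).
    """
    n = len(weights)
    total = 0.0
    for cluster in set(labels):
        inside = [i for i in range(n) if labels[i] == cluster]
        outside = [i for i in range(n) if labels[i] != cluster]
        # Weight leaving the cluster, relative to all weight touching it.
        cut = sum(weights[i][j] for i in inside for j in outside)
        volume = sum(weights[i][j] for i in inside for j in range(n))
        total += cut / volume
    return total


# Made-up graph: two strongly linked pairs joined by one weak link.
graph = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
print("natural split:", ncut(graph, [0, 0, 1, 1]))
print("bad split:    ", ncut(graph, [0, 1, 0, 1]))
```

Lower values mean a cleaner partition, so comparing the value for a trained network against values for randomly initialized or weight-shuffled networks is one way to read the abstract's "more clusterable than" claims.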

# チップ上の圧縮量子マイクロコンブ

A squeezed quantum microcomb on a chip ( http://arxiv.org/abs/2103.03380v1 )

ライセンス: Link先を確認

Zijiao Yang, Mandana Jahanbozorgi, Dongin Jeong, Shuman Sun, Olivier Pfister, Hansuek Lee, Xu Yi

(参考訳) 光マイクロ共鳴器ベースの周波数コム(microcomb)は、非線形物理学研究のための汎用プラットフォームを提供し、メトロロジーから分光まで幅広い応用がある。決定論的量子状態(Deterministic quantum regime)は、何百もの等価周波数モードの間の無条件の絡み合いが、スケーラブルな普遍量子コンピューティングや量子ネットワークにとって重要な要素となる、マイクロコムの未発見の側面である。ここでは、シリコンチップ上のシリカマイクロ共振器において決定論的量子マイクロコンブを示す。連続変動可能な40の量子モードは、通信波長で1thzの光スパン内で20個の2モードのスクイーズドコム対の形で観測される。 1.6dBの最大原料スクイーズが得られる。量子マイクロコムの周波数等距離を特徴付けるために高分解能分光計測法を開発した。実験では, 周波数多重量子状態と集積光学を活用し, 分光, 量子計測, スケーラブルな量子情報処理の分野における新たな道を開拓する可能性を示す。

The optical microresonator-based frequency comb (microcomb) provides a versatile platform for nonlinear physics studies and has wide applications ranging from metrology to spectroscopy. The deterministic quantum regime is an unexplored aspect of microcombs, in which unconditional entanglement among hundreds of equidistant frequency modes can serve as a critical ingredient for scalable universal quantum computing and quantum networking. Here, we demonstrate a deterministic quantum microcomb in a silica microresonator on a silicon chip. Forty continuous-variable quantum modes, in the form of 20 simultaneously two-mode squeezed comb pairs, are observed within a 1 THz optical span at telecommunication wavelengths. A maximum raw squeezing of 1.6 dB is attained. A high-resolution spectroscopy measurement is developed to characterize the frequency equidistance of quantum microcombs. Our demonstration offers the possibility to leverage deterministically generated, frequency-multiplexed quantum states and integrated photonics to open up new avenues in the fields of spectroscopy, quantum metrology, and scalable quantum information processing.

翻訳日:2023-04-09 02:10:33 公開日:2021-03-04

# 全体としての時空経路

Spacetime Paths as a Whole ( http://arxiv.org/abs/2103.03364v1 )

ライセンス: Link先を確認

Sky Nelson-Isaacs

(参考訳) 量子力学における非相対論的波動関数の伝播とスカラー回折理論における像の伝播との数学的類似性を用いて、時間および時空全体を通る経路についての新たな理解を展開する。ファインマンによる非相対論的量子力学の経路積分定式化の元々の導出は、時間スライシングを用い、空間を通るすべての可能な経路についての和として振幅を計算するが、時間方向には定まった曲線に沿うことがよく知られている。ここでは、外部時間パラメータを持たず、したがって通常の意味で変化も発展もしない3+1D時空波分布とその4元運動量双対を形式的に構成する。こうして時間は「外から」眺められる。与えられた系の3+1D運動量表現は完全な動力学的情報を符号化し、系の時空的振る舞いを全体として記述する。ホログラムの数学との比較を行い、単純な系に対する運動の性質を導出する。

The mathematical similarities between non-relativistic wavefunction propagation in quantum mechanics and image propagation in scalar diffraction theory are used to develop a novel understanding of time and paths through spacetime as a whole. It is well known that Feynman's original derivation of the path integral formulation of non-relativistic quantum mechanics uses time-slicing to calculate amplitudes as sums over all possible paths through space, but along a definite curve through time. Here, a 3+1D spacetime wave distribution and its 4-momentum dual are formally developed which have no external time parameter and therefore cannot change or evolve in the usual sense. Time is thus seen "from the outside". A given 3+1D momentum representation of a system encodes complete dynamical information, describing the system's spacetime behavior as a whole. A comparison is made to the mathematics of holograms, and properties of motion for simple systems are derived.

翻訳日:2023-04-09 02:10:13 公開日:2021-03-04

# 人種的同類婚を減らすためのオンラインデートプラットフォームへの介入を評価するエージェントベースモデル

An Agent-based Model to Evaluate Interventions on Online Dating Platforms to Decrease Racial Homogamy ( http://arxiv.org/abs/2103.03332v1 )

ライセンス: Link先を確認

Stefania Ionescu, Aniko Hannak, Kenneth Joseph

(参考訳) おそらく今日のオンラインプラットフォームの研究で最も議論を呼んでいる疑問は、プラットフォームがどのプラットフォームに介入して社会的な病気を軽減できるかだ。議論の余地は、オンラインいじめなど、プラットフォームが対処できる効果的な、永続的な介入があるかどうか、あるいは、そのような問題に対処するためにより広範囲にわたる変更が必要であるかどうかである。このような疑問に対処するには実証的な作業が不可欠だ。しかし、それはまた、時間がかかり、高価であり、時には企業が問うべき質問に制限されるため、難しい。本稿では,エージェント・ベース・モデリング(ABM)アプローチを提案する。応用として、シミュレーションされたオンラインデートプラットフォームへの介入が、人工社会における長期の人種間関係の欠如に与える影響を分析する。現実の世界では、異人種間関係の欠如は不平等を維持する重要な手段である。我々の研究は、オンラインデートプラットフォームが、ウェブサイトからの人種間関係の数を増やすために、これまで想定されていた多くの介入が限定的な効果を示し、いかなる介入の有効性も社会文化的構造に関する仮定の対象となっていることを示している。さらに、長期的な関係における多様性の増大に有効な介入は、プラットフォームの利益志向の目標に反する。一般的なレベルでは、abmアプローチを用いてプラットフォームが持つ可能性のあるさまざまな介入の潜在的な影響と副作用を理解することの価値を示す。

Perhaps the most controversial questions in the study of online platforms today surround the extent to which platforms can intervene to reduce the societal ills perpetrated on them. Up for debate is whether there exist any effective and lasting interventions a platform can adopt to address, e.g., online bullying, or if other, more far-reaching change is necessary to address such problems. Empirical work is critical to addressing such questions. But it is also challenging, because it is time-consuming, expensive, and sometimes limited to the questions companies are willing to ask. To help focus and inform this empirical work, we here propose an agent-based modeling (ABM) approach. As an application, we analyze the impact of a set of interventions on a simulated online dating platform on the lack of long-term interracial relationships in an artificial society. In the real world, a lack of interracial relationships is a critical vehicle through which inequality is maintained. Our work shows that many previously hypothesized interventions online dating platforms could take to increase the number of interracial relationships from their website have limited effects, and that the effectiveness of any intervention is subject to assumptions about sociocultural structure. Further, interventions that are effective in increasing diversity in long-term relationships are at odds with platforms' profit-oriented goals. At a general level, the present work shows the value of using an ABM approach to help understand the potential effects and side effects of different interventions that a platform could take.

翻訳日:2023-04-09 02:09:57 公開日:2021-03-04

# 量子力学における摂動としての温度

Temperature as perturbation in quantum mechanics ( http://arxiv.org/abs/2103.03306v1 )

ライセンス: Link先を確認

Ashkan Shekaari and Mahmoud Jafari

(参考訳) 摂動的アプローチは、非相対論的量子力学の温度依存バージョンを低温度限界で開発するために採用された。したがって、一般化された自己整合ハミルトニアンは任意の量子力学系に対して、基底状態ハミルトニアンが絶対零点での制限ケースであるように構成された。利害関係と直近の環境を結び付ける弱結合項は摂動として扱われた。得られた一般化ハミルトニアンを、箱の中の自由粒子、真空中の自由粒子、高調波振動子を含む完全な零温度解を持ついくつかの典型的な量子系に適用すると、関連するハミルトニアン、エネルギースペクトル、波動関数は低温限界と一致するように修正された。さらに、箱の中の自由粒子の残留確率によるある種の量子トンネル効果が、貯水池への熱的結合の主な結果として明らかになった。また, 熱環境が波動関数の主特性に及ぼす影響についても詳しく検討し, 考察した。

The perturbative approach was adopted to develop a temperature-dependent version of non-relativistic quantum mechanics in the limit of low-enough temperatures. A generalized, self-consistent Hamiltonian was therefore constructed for an arbitrary quantum-mechanical system in a way that the ground-state Hamiltonian turned out to be just a limiting case at absolute zero. The weak-coupling term connecting the system of interest and its immediate environment was accordingly treated as the perturbation. Applying the obtained generalized Hamiltonian to some typical quantum systems with exact zero-temperature solutions, including the free particle in a box, the free particle in vacuum, and the harmonic oscillator, up to the first order of self-consistency, therefore corrected their associated Hamiltonians, energy spectrums, and wavefunctions to be consistent with the low-temperature limit. Further investigation revealed some kind of quantum tunneling effect by a residual probability for the free particle in a box, as a chief consequence of thermally coupling to the reservoir. The possible effects of thermal environment on the main properties of the wavefunctions were also thoroughly examined and discussed.

翻訳日:2023-04-09 02:09:33 公開日:2021-03-04

# 反Jaynes-Cummingsモデルにおける2量子ビット量子制御NOT論理ゲートと1量子ビットアダマール論理ゲートの理論的実現

Theoretical realization of a two qubit quantum controlled-not logic gate and a single qubit Hadamard logic gate in the anti-Jaynes-Cummings model ( http://arxiv.org/abs/2103.03297v1 )

ライセンス: Link先を確認

Christopher Mayero and Joseph Akeyo Omolo and Stephen Onyango Okeyo

(参考訳) 反Jaynes-Cummings相互作用過程においてアダマール論理ゲートおよび量子制御NOT論理ゲートの演算を実現するための理論的スキームを提供する。特定の初期原子状態に対する標準的なアダマール演算は、反Jaynes-Cummings量子ビット状態遷移演算において特定の和周波数と光子数を設定することで達成され、状態遷移は反Jaynes-Cummingsハミルトニアンの相互作用成分によって生成される。量子制御NOT論理ゲートは、2次元ヒルベルト空間で定義される1つの原子量子ビットを制御量子ビットとし、2次元ヒルベルト空間で定義される2つの非縮退かつ直交偏光のキャビティがターゲット量子ビットをなすときに実現される。正規化されているが直交しない基本量子ビット状態ベクトルが張る反Jaynes-Cummings部分空間で定義される反Jaynes-Cummings量子ビット状態遷移演算において相互作用時間を正確に選ぶことにより、量子制御NOT演算において理想的な成功確率1を得る。

We provide a theoretical scheme for realizing Hadamard and quantum controlled-NOT logic gate operations in the anti-Jaynes-Cummings interaction process. A standard Hadamard operation for a specified initial atomic state is achieved by setting a specific sum frequency and photon number in the anti-Jaynes-Cummings qubit state transition operation, with the interaction component of the anti-Jaynes-Cummings Hamiltonian generating the state transitions. The quantum controlled-NOT logic gate is realized when a single atomic qubit defined in a two-dimensional Hilbert space is the control qubit and two non-degenerate and orthogonally polarized cavities defined in a two-dimensional Hilbert space make up the target qubit. With a precise choice of interaction time in the anti-Jaynes-Cummings qubit state transition operations defined in the anti-Jaynes-Cummings sub-space spanned by normalized but non-orthogonal basic qubit state vectors, we obtain ideal unit probabilities of success in the quantum controlled-NOT operations.
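
For readers unfamiliar with the two target gates, the sketch below writes them as the textbook matrices and applies them to basis states. It illustrates only what the gates do; it is not the anti-Jaynes-Cummings construction described in the abstract, and the helper `apply` and the basis ordering are assumptions made for this example.

```python
import math


def apply(gate, state):
    """Multiply a square matrix (given as a list of rows) by a state vector."""
    return [sum(g * s for g, s in zip(row, state)) for row in gate]


S = 1.0 / math.sqrt(2.0)

# Single-qubit Hadamard gate in the basis |0>, |1>.
HADAMARD = [[S, S],
            [S, -S]]

# Two-qubit controlled-NOT in the basis |control, target>: |00>, |01>, |10>, |11>.
CNOT = [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 0]]

ket0 = [1.0, 0.0]
plus = apply(HADAMARD, ket0)     # equal superposition of |0> and |1>

ket10 = [0.0, 0.0, 1.0, 0.0]
flipped = apply(CNOT, ket10)     # control is 1, so the target flips: |10> -> |11>

ket00 = [1.0, 0.0, 0.0, 0.0]
unchanged = apply(CNOT, ket00)   # control is 0, so nothing changes
print(plus, flipped, unchanged)
```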

翻訳日:2023-04-09 02:09:12 公開日:2021-03-04

# 温度:量子力学における無視因子

Temperature: The ignored factor in quantum mechanics ( http://arxiv.org/abs/2001.05212v2 )

ライセンス: Link先を確認

Ashkan Shekaari and Mahmoud Jafari

(参考訳) 古典的熱力学の法則と統計力学の標準アンサンブルスキームを用いて、非相対論的量子力学の枠組みに温度をパラメータとして導入する理論的形式論を開発した。自己整合ハミルトニアンが与えられた量子多体系のために構築され、この系は対応する零温度ハミルトニアンに付加される補正項の形で温度の影響を含む。粒子・イン・ア・ボックスモデル、自由粒子、および我々の有限温度内の調和振動子を含む、厳密なゼロ温度解を持つ量子力学系の研究は、その波関数とエネルギースペクトルに対する物理的に受け入れられない振る舞いに遭遇することなく、これらの系を絶対零点以上で記述する温度依存ハミルトニアンにつながった。結果は、有限温度の量子力学系がゼロ温度励起状態にあるかのように振る舞うという見解を強く支持する。

We have developed a theoretical formalism to introduce temperature as a parameter into the framework of non-relativistic quantum mechanics using the laws of classical thermodynamics and the canonical ensemble scheme of statistical mechanics. A self-consistent Hamiltonian has then been constructed for a given quantum many-body system which includes the effect of temperature in the form of correction terms added to the corresponding zero-temperature Hamiltonian of the system. Investigating some quantum mechanical systems with exact zero-temperature solutions including the particle-in-a-box model, the free particle, and the harmonic oscillator within our finite-temperature approach up to the first order of self-consistency has led to temperature-dependent Hamiltonians describing these systems above absolute zero without encountering any physically unacceptable brand of behavior for their wave functions and energy spectra. Results firmly support the view that a quantum mechanical system at a finite temperature behaves as if it is in a zero-temperature excited state.

翻訳日:2023-01-11 06:51:54 公開日:2021-03-04

# 一般ランクを有するスパイクウィグナーモデルにおける弱検出

Weak Detection in the Spiked Wigner Model with General Rank ( http://arxiv.org/abs/2001.05676v3 )

ライセンス: Link先を確認

Ji Hyung Jung, Hye Won Chung, and Ji Oon Lee

(参考訳) 加算ウィグナー雑音を伴う'signal+noise'型行列モデルから信号を検出する統計的決定過程について検討した。本稿では,信号の分布や雑音に依存しないデータ行列の線形スペクトル統計に基づく仮説テストを提案する。信号対雑音比が小さい場合、このテストはガウス雑音の下で最適であり、タイプIとタイプIIの誤差の総和を最小化する。非ガウス雑音下では、データ行列へのエントリワイズ変換によりテストを改善することができる。また,優先度が分かっていない場合の信号のランクを推定するアルゴリズムも導入する。

We study the statistical decision process of detecting the signal from a `signal+noise' type matrix model with an additive Wigner noise. We propose a hypothesis test based on the linear spectral statistics of the data matrix, which does not depend on the distribution of the signal or the noise. The test is optimal under the Gaussian noise if the signal-to-noise ratio is small, as it minimizes the sum of the Type-I and Type-II errors. Under the non-Gaussian noise, the test can be improved with an entrywise transformation to the data matrix. We also introduce an algorithm that estimates the rank of the signal when it is not known a priori.
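
The criterion "sum of the Type-I and Type-II errors" can be illustrated with a threshold test on a scalar statistic. The sketch below assumes, purely for illustration, that the statistic is Gaussian with the same variance under both hypotheses and a larger mean under the alternative; the numerical values and function names are hypothetical and are not taken from the paper.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def error_sum(threshold, mean_h0, mean_h1, sigma):
    """Type-I plus Type-II error of the test 'reject H0 if statistic > threshold'."""
    type_1 = 1.0 - normal_cdf((threshold - mean_h0) / sigma)  # false alarm
    type_2 = normal_cdf((threshold - mean_h1) / sigma)        # missed detection
    return type_1 + type_2


# Hypothetical values, chosen only for illustration (requires M1 > M0).
M0, M1, SIGMA = 0.0, 1.0, 1.0

midpoint = 0.5 * (M0 + M1)
best = error_sum(midpoint, M0, M1, SIGMA)
closed_form = math.erfc((M1 - M0) / (2.0 * math.sqrt(2.0) * SIGMA))
print(f"error sum at midpoint = {best:.4f}, closed form = {closed_form:.4f}")
```

Under these assumptions the midpoint threshold minimizes the error sum, and moving the threshold in either direction increases it.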

翻訳日:2023-01-11 00:14:17 公開日:2021-03-04

# 最大プールネットワークの最適化と一般化解析

An Optimization and Generalization Analysis for Max-Pooling Networks ( http://arxiv.org/abs/2002.09781v4 )

ライセンス: Link先を確認

Alon Brutzkus, Amir Globerson

(参考訳) 最大プール操作は、ディープラーニングアーキテクチャのコアコンポーネントである。特に、プーリングはパターン検出問題に対する自然なアプローチであるため、マシンビジョンで使用されるほとんどの畳み込みアーキテクチャの一部である。しかし、これらのアーキテクチャは理論的観点からはあまり理解されていない。例えば、それらがいつグローバルに最適化できるか、そして、超パラメータ化が一般化に与える影響は理解できない。ここでは、畳み込み最大プールアーキテクチャの理論解析を行い、グローバルに最適化でき、高度にパラメータ化されたモデルでもうまく一般化できることを示した。本研究では,パターン検出問題に触発されたデータ生成分布に着目した。我々は,理論結果から予測されるように,CNNが完全に接続されたネットワークよりも優れていることを実証的に検証した。

Max-Pooling operations are a core component of deep learning architectures. In particular, they are part of most convolutional architectures used in machine vision, since pooling is a natural approach to pattern detection problems. However, these architectures are not well understood from a theoretical perspective. For example, we do not understand when they can be globally optimized, and what is the effect of over-parameterization on generalization. Here we perform a theoretical analysis of a convolutional max-pooling architecture, proving that it can be globally optimized, and can generalize well even for highly over-parameterized models. Our analysis focuses on a data generating distribution inspired by pattern detection problem, where a "discriminative" pattern needs to be detected among "spurious" patterns. We empirically validate that CNNs significantly outperform fully connected networks in our setting, as predicted by our theoretical results.
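
The pattern-detection setting in the abstract can be pictured with a one-filter network that scores each patch and keeps the maximum. This is a minimal sketch under assumed toy patterns and an assumed hand-set filter; it does not reproduce the paper's architecture, training procedure, or data distribution.

```python
def relu(x):
    return x if x > 0.0 else 0.0


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def max_pool_detector(patches, filt):
    """Score an input as the max over its patches of ReLU(<filter, patch>)."""
    return max(relu(dot(filt, patch)) for patch in patches)


# Hypothetical 2-D patterns: one "discriminative", two "spurious".
DISCRIMINATIVE = [1.0, 0.0]
SPURIOUS = [[0.0, 1.0], [0.0, -1.0]]
FILTER = [1.0, 0.0]  # a filter aligned with the discriminative pattern

positive = [SPURIOUS[0], DISCRIMINATIVE, SPURIOUS[1]]  # pattern present once
negative = [SPURIOUS[0], SPURIOUS[1], SPURIOUS[0]]     # spurious patterns only

pos_score = max_pool_detector(positive, FILTER)
neg_score = max_pool_detector(negative, FILTER)
print(pos_score, neg_score)
```

Because of the max, the position of the discriminative patch does not matter and the spurious patches do not change the score.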

翻訳日:2022-12-29 19:03:07 公開日:2021-03-04

# 形状先行記憶からの単視点3次元物体再構成

Single-View 3D Object Reconstruction from Shape Priors in Memory ( http://arxiv.org/abs/2003.03711v3 )

ライセンス: Link先を確認

Shuo Yang, Min Xu, Haozhe Xie, Stuart Perry, Jiahao Xia

(参考訳) 画像特徴を3次元表現に変換するために,既存の3次元オブジェクト再構成手法を直接学習する。しかし,これらの手法は,高品質な3次元形状を再構成するのに十分な情報を含んでいないため,ノイズの多い背景と重閉塞を含む画像に対して脆弱である。人間は通常、画像から不完全または騒がしい視覚手がかりを使用して、記憶から類似した3d形状を取得し、物体の3d形状を再構築する。そこで我々はMem3Dという新しい手法を提案し,画像の欠落した情報を補うために,形状の先行を明示的に構築する。具体的には、形状優先は、トレーニング中によく設計された書き方によって格納されるメモリネットワーク内の「イメージボクセル」ペアの形式である。また,入力画像に強く関連した正確な3次元形状を形状先行から検索するためのボクセル三重項損失関数を提案する。 lstmベースの形状エンコーダは、取得した3d形状から情報を抽出するために導入され、非常に閉塞された、あるいは複雑な環境での物体の3d形状の復元に有用である。実験により,Mem3Dは再構成品質を著しく向上し,ShapeNetおよびPix3Dデータセットの最先端手法に対して良好な性能を発揮することが示された。

Existing methods for single-view 3D object reconstruction directly learn to transform image features into 3D representations. However, these methods are vulnerable to images containing noisy backgrounds and heavy occlusions because the extracted image features do not contain enough information to reconstruct high-quality 3D shapes. Humans routinely use incomplete or noisy visual cues from an image to retrieve similar 3D shapes from their memory and reconstruct the 3D shape of an object. Inspired by this, we propose a novel method, named Mem3D, that explicitly constructs shape priors to supplement the missing information in the image. Specifically, the shape priors are in the forms of "image-voxel" pairs in the memory network, which is stored by a well-designed writing strategy during training. We also propose a voxel triplet loss function that helps to retrieve the precise 3D shapes that are highly related to the input image from shape priors. The LSTM-based shape encoder is introduced to extract information from the retrieved 3D shapes, which are useful in recovering the 3D shape of an object that is heavily occluded or in complex environments. Experimental results demonstrate that Mem3D significantly improves reconstruction quality and performs favorably against state-of-the-art methods on the ShapeNet and Pix3D datasets.
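
The abstract does not spell out the voxel triplet loss, so the sketch below shows only the standard triplet loss it is presumably related to, applied to flattened voxel occupancy vectors. The margin value, the squared Euclidean distance, and the toy shapes are assumptions for illustration and should not be read as the Mem3D definition.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: pull the positive closer than the negative by a margin."""
    return max(
        0.0,
        squared_distance(anchor, positive)
        - squared_distance(anchor, negative)
        + margin,
    )


# Hypothetical flattened voxel grids (1 = occupied, 0 = empty).
anchor = [1, 1, 0, 0]
similar_shape = [1, 1, 0, 1]    # differs from the anchor in one voxel
different_shape = [0, 0, 1, 1]  # differs from the anchor in all four voxels

satisfied = triplet_loss(anchor, similar_shape, different_shape)
violated = triplet_loss(anchor, different_shape, similar_shape)
print(satisfied, violated)
```

The loss is zero when the similar shape is already closer than the different one by at least the margin, and positive otherwise.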

翻訳日:2022-12-25 14:25:25 公開日:2021-03-04

# 行動・テキストデータに基づく分類器のメタ特徴に基づく規則抽出

Metafeatures-based Rule-Extraction for Classifiers on Behavioral and Textual Data ( http://arxiv.org/abs/2003.04792v3 )

ライセンス: Link先を確認

Yanou Ramon, David Martens, Theodoros Evgeniou, Stiene Praet

(参考訳) 行動データとテキストデータの機械学習モデルは、非常に正確な予測モデルをもたらすが、しばしば解釈するのが非常に困難である。複雑な「ブラックボックス」モデルの予測精度とグローバルな説明可能性を組み合わせたルール抽出手法が提案されている。しかし,多くの特徴が予測に関係している高次元スパースデータのコンテキストにおけるルール抽出は,ブラックボックスモデルを多くのルールで置き換えることによって,ユーザを理解不能な説明に戻すため,困難である。この問題に対処するため,我々は,高レベルで低スパースなメタ機能に基づく規則抽出手法を開発し,テストする。分析の鍵となる発見は、説明の忠実性によって測定されるように、メタ特徴に基づく説明がブラックボックス予測モデルの振る舞いを模倣するのに役立つことである。

Machine learning models on behavioral and textual data can result in highly accurate prediction models, but are often very difficult to interpret. Rule-extraction techniques have been proposed to combine the desired predictive accuracy of complex "black-box" models with global explainability. However, rule-extraction in the context of high-dimensional, sparse data, where many features are relevant to the predictions, can be challenging, as replacing the black-box model by many rules leaves the user again with an incomprehensible explanation. To address this problem, we develop and test a rule-extraction methodology based on higher-level, less-sparse metafeatures. A key finding of our analysis is that metafeatures-based explanations are better at mimicking the behavior of the black-box prediction model, as measured by the fidelity of explanations.
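
Fidelity is commonly computed as the share of instances on which the extracted rules give the same prediction as the black-box model. The sketch below uses that agreement-rate definition as an assumption; the paper may measure fidelity differently, and the prediction lists are made up for illustration.

```python
def fidelity(black_box_predictions, explanation_predictions):
    """Fraction of instances where the explanation agrees with the black box."""
    if len(black_box_predictions) != len(explanation_predictions):
        raise ValueError("prediction lists must have the same length")
    if not black_box_predictions:
        raise ValueError("at least one prediction is required")
    agreements = sum(
        b == e for b, e in zip(black_box_predictions, explanation_predictions)
    )
    return agreements / len(black_box_predictions)


# Hypothetical predictions on five instances.
black_box = [1, 0, 1, 1, 0]
rules = [1, 0, 0, 1, 0]

score = fidelity(black_box, rules)
print(f"fidelity = {score:.2f}")
```

Note that fidelity compares the rules with the black-box predictions, not with the true labels, so a rule set can have high fidelity while reproducing the model's mistakes.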

翻訳日:2022-12-24 20:37:23 公開日:2021-03-04

# 実世界の強化学習の課題に関する実証的研究

An empirical investigation of the challenges of real-world reinforcement learning ( http://arxiv.org/abs/2003.11881v2 )

ライセンス: Link先を確認

Gabriel Dulac-Arnold and Nir Levine and Daniel J. Mankowitz and Jerry Li and Cosmin Paduraru and Sven Gowal and Todd Hester

(参考訳) 強化学習(RL)は、一連の人工ドメインでその価値を証明し、現実のシナリオでいくつかの成功を示し始めている。しかしながら、RLにおける研究の進歩の多くは、実際に満たされることがほとんどない一連の仮定のため、現実世界のシステムで活用することが難しい。本研究は,RLが現実のシステムに一般的に展開されるために必要な困難を具現化した,一連の独立した課題を特定し,定式化する。それぞれの課題について,マルコフ決定過程の文脈で形式的に定義し,その課題が最先端学習アルゴリズムに与える影響を分析し,それに取り組むための既存の試みを提示する。私たちの提案する課題に対処するアプローチは、現実世界の多くの問題に対して容易にデプロイできると信じています。提案する課題は,オープンソースベンチマークとして提案するrealworldrl-suiteと呼ばれる,一連の継続的制御環境に実装されている。

Reinforcement learning (RL) has proven its worth in a series of artificial domains, and is beginning to show some successes in real-world scenarios. However, many of the research advances in RL are hard to leverage in real-world systems due to a series of assumptions that are rarely satisfied in practice. In this work, we identify and formalize a series of independent challenges that embody the difficulties that must be addressed for RL to be commonly deployed in real-world systems. For each challenge, we define it formally in the context of a Markov Decision Process, analyze the effects of the challenge on state-of-the-art learning algorithms, and present some existing attempts at tackling it. We believe that an approach that addresses our set of proposed challenges would be readily deployable in a large number of real-world problems. Our proposed challenges are implemented in a suite of continuous control environments called the realworldrl-suite, which we propose as an open-source benchmark.

翻訳日:2022-12-20 08:13:14 公開日:2021-03-04

# 補助分類器を用いた生成逆数ネットワークに基づく完全自動心電図分類システム

Fully Automatic Electrocardiogram Classification System based on Generative Adversarial Network with Auxiliary Classifier ( http://arxiv.org/abs/2004.04894v3 )

ライセンス: Link先を確認

Zhanhong Zhou, Xiaolong Zhai, Chung Tin

(参考訳) 本稿では,GAN(generative adversarial network)に基づく高性能な完全自動心電図(ECG)不整脈分類システムについて述べる。 GANのジェネレータ(G)は、データ拡張のために異なる不整脈クラスで条件付けられた様々な結合行列入力を生成するように設計されている。我々の設計した判別器(D)は、実および生成したECG結合行列の入力に基づいて訓練され、GANのトレーニングが完了すると不整脈分類器として抽出される。 MIT-BIH不整脈データベースでは,非教師付きアルゴリズムを用いて推定した患者固有の正常拍と,通常は入手が難しい異常拍をGで生成したものとを用いてDを微調整した後,全自動で上室異所性拍(SVEB,Sビート)と心室異所性拍(VEB,Vビート)の両方について優れた総合分類性能を示した。最先端の自動分類器を数種類超え、一部の専門家支援手法と同様のレベルで動作することができる。特に、SVEBのF1スコアは、最高性能の自動システムよりも最大13%向上している。また, SVEB (87%) とVEB (93%) の検出感度も高く, 実際の診断に有用である。そこで我々は,ACE-GAN (Generative Adversarial Network with Auxiliary Classifier for Electrocardiogram) ベースの自動システムは,手動介入や専門家によるラベリングを必要とせず,高スループットな臨床スクリーニングを行う上で有望かつ信頼性の高いツールとなりうることを示唆する。

A generative adversarial network (GAN) based fully automatic electrocardiogram (ECG) arrhythmia classification system with high performance is presented in this paper. The generator (G) in our GAN is designed to generate various coupling matrix inputs conditioned on different arrhythmia classes for data augmentation. Our designed discriminator (D) is trained on both real and generated ECG coupling matrix inputs, and is extracted as an arrhythmia classifier upon completion of training for our GAN. After fine-tuning the D by including patient-specific normal beats estimated using an unsupervised algorithm, and generated abnormal beats by G that are usually rare to obtain, our fully automatic system showed superior overall classification performance for both supraventricular ectopic beats (SVEB or S beats) and ventricular ectopic beats (VEB or V beats) on the MIT-BIH arrhythmia database. It surpassed several state-of-the-art automatic classifiers and can perform on similar levels as some expert-assisted methods. In particular, the F1 score of SVEB has been improved by up to 13% over the top-performing automatic systems. Moreover, high sensitivity for both SVEB (87%) and VEB (93%) detection has been achieved, which is of great value for practical diagnosis. We, therefore, suggest our ACE-GAN (Generative Adversarial Network with Auxiliary Classifier for Electrocardiogram) based automatic system can be a promising and reliable tool for high-throughput clinical screening practice, without any need for manual intervention or expert-assisted labeling.

翻訳日:2022-12-14 20:38:03 公開日:2021-03-04

# テセル化ワッサースタインオートエンコーダ

Tessellated Wasserstein Auto-Encoders ( http://arxiv.org/abs/2005.09923v2 )

ライセンス: Link先を確認

Kuo Gai and Shihua Zhang

(参考訳) variational auto-encoder (VAE)、Wasserstein auto-encoder with maximum mean discrepancy (WAE-MMD)、sliced-Wasserstein auto-encoder (SWAE)といった非敵対的生成モデルは比較的訓練しやすく、Wasserstein auto-encoder with generative adversarial network (WAE-GAN)に比べてモード崩壊が少ない。しかし、本物と偽物の微妙な違いを検出する判別器が存在しないため、潜在空間におけるターゲット分布の近似はあまり正確ではない。そこで本研究では,Tessellated Wasserstein Auto-Encoders (TWAE) と呼ばれる新たな非敵対的フレームワークを開発した。TWAEは,centroidal Voronoi tessellation (CVT) 技術によりターゲット分布の台(サポート)を所定の数の領域にテッセレーションし,不一致を正確に計算するために,ランダムシャッフルではなくテッセレーションに従ってデータバッチを設計する。理論的には、サンプル数 $n$ とテッセレーションの領域数 $m$ が大きくなると、不一致の推定誤差がそれぞれ $\mathcal{O}(\frac{1}{\sqrt{n}})$ と $\mathcal{O}(\frac{1}{\sqrt{m}})$ のレートで減少することを示した。固定された $n$ と $m$ が与えられた場合、測定誤差の上限が最小化されるための必要条件は、テッセレーションがCVTによって決定されるものであることである。 TWAEは、異なる非敵対的指標に対して非常に柔軟であり、VAE、WAE-MMD、SWAEと比較してFr\'{e}chet inception distance (FID) の観点で生成性能を大幅に向上させることができる。さらに,数値実験の結果,TWAEは敵対的モデルWAE-GANと競合し,その強力な生成能力が示された。

Non-adversarial generative models such as variational auto-encoder (VAE), Wasserstein auto-encoders with maximum mean discrepancy (WAE-MMD), sliced-Wasserstein auto-encoder (SWAE) are relatively easy to train and have less mode collapse compared to Wasserstein auto-encoder with generative adversarial network (WAE-GAN). However, they are not very accurate in approximating the target distribution in the latent space because they don't have a discriminator to detect the minor difference between real and fake. To this end, we develop a novel non-adversarial framework called Tessellated Wasserstein Auto-encoders (TWAE) to tessellate the support of the target distribution into a given number of regions by the centroidal Voronoi tessellation (CVT) technique and design batches of data according to the tessellation instead of random shuffling for accurate computation of discrepancy. Theoretically, we demonstrate that the error of estimate to the discrepancy decreases when the numbers of samples $n$ and regions $m$ of the tessellation become larger with rates of $\mathcal{O}(\frac{1}{\sqrt{n}})$ and $\mathcal{O}(\frac{1}{\sqrt{m}})$, respectively. Given fixed $n$ and $m$, a necessary condition for the upper bound of measurement error to be minimized is that the tessellation is the one determined by CVT. TWAE is very flexible to different non-adversarial metrics and can substantially enhance their generative performance in terms of Fr\'{e}chet inception distance (FID) compared to VAE, WAE-MMD, SWAE. Moreover, numerical results indeed demonstrate that TWAE is competitive to the adversarial model WAE-GAN, demonstrating its powerful generative ability.
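
A centroidal Voronoi tessellation can be approximated with Lloyd's algorithm, which alternates between assigning points to their nearest generator and moving each generator to the centroid of its cell. The sketch below does this in one dimension on a uniform grid over [0, 1]; it is a toy illustration under those assumptions, whereas the paper tessellates the support of the latent target distribution. The function name and starting generators are hypothetical.

```python
def lloyd_1d(points, generators, iterations=200):
    """Approximate a 1-D centroidal Voronoi tessellation with Lloyd's algorithm."""
    generators = sorted(generators)
    for _ in range(iterations):
        # Assignment step: each point joins the cell of its nearest generator.
        cells = [[] for _ in generators]
        for p in points:
            nearest = min(
                range(len(generators)), key=lambda i: abs(p - generators[i])
            )
            cells[nearest].append(p)
        # Update step: move each generator to the centroid of its cell
        # (an empty cell keeps its previous generator).
        generators = [
            sum(cell) / len(cell) if cell else g
            for cell, g in zip(cells, generators)
        ]
    return generators


# Evenly spaced stand-in for samples from a uniform distribution on [0, 1].
NUM_POINTS = 400
POINTS = [(i + 0.5) / NUM_POINTS for i in range(NUM_POINTS)]

cvt_generators = lloyd_1d(POINTS, [0.1, 0.2, 0.3, 0.4])
print([round(g, 3) for g in cvt_generators])
```

For a uniform density the generators spread out so that the cells have nearly equal size, which is what makes batches built from the cells cover the support evenly.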

翻訳日:2022-12-01 05:15:38 公開日:2021-03-04

# 平均場理論による非負行列の高速階数削減

Fast Rank Reduction for Non-negative Matrices via Mean Field Theory ( http://arxiv.org/abs/2006.05321v2 )

ライセンス: Link先を確認

Kazu Ghalamkari, Mahito Sugiyama

(参考訳) 行列の行数や列数において時間複雑性が2次である非負行列に対する効率的な行列ランク低減法を提案する。我々の重要な洞察は、構造化サンプル空間上の対数線形モデルを用いて行列をモデル化することにより、平均場近似としてランク還元を定式化することである。この定式化のハイライトは、与えられた行列からklの発散を最小化する最適解を閉形式で解析的に計算できることである。提案手法は,NMFとNMFの変種であるlraNMFよりも高速であり,合成および実世界のデータセット上での競合的低ランク近似誤差を実現する。

We propose an efficient matrix rank reduction method for non-negative matrices, whose time complexity is quadratic in the number of rows or columns of a matrix. Our key insight is to formulate rank reduction as a mean-field approximation by modeling matrices via a log-linear model on structured sample space, which allows us to solve the rank reduction as convex optimization. The highlight of this formulation is that the optimal solution that minimizes the KL divergence from a given matrix can be analytically computed in a closed form. We empirically show that our rank reduction method is faster than NMF and its popular variant, lraNMF, while achieving competitive low rank approximation error on synthetic and real-world datasets.
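
The closed-form flavor of the result is easiest to see in the rank-1 case, where the rank-1 matrix minimizing the KL divergence from a non-negative matrix is the outer product of its row sums and column sums divided by the total sum. The sketch below implements only this special case; the general-rank procedure of the paper is not reproduced, and the function name is hypothetical.

```python
def rank_one_kl_approximation(matrix):
    """Rank-1 approximation of a non-negative matrix that minimizes KL divergence.

    Entry (i, j) of the result is row_sum[i] * col_sum[j] / total.
    """
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(col) for col in zip(*matrix)]
    total = sum(row_sums)
    if total <= 0:
        raise ValueError("matrix must have a positive total sum")
    return [[r * c / total for c in col_sums] for r in row_sums]


P = [[1.0, 2.0],
     [3.0, 4.0]]
Q = rank_one_kl_approximation(P)
print(Q)
```

The approximation preserves every row sum and column sum of the input and needs a single pass over the entries, so its cost is linear in the number of entries.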

翻訳日:2022-11-23 14:01:16 公開日:2021-03-04

# 良い分類器は補間体制で豊富です

Good Classifiers are Abundant in the Interpolating Regime ( http://arxiv.org/abs/2006.12625v2 )

ライセンス: Link先を確認

Ryan Theisen, Jason M. Klusowski, Michael W. Mahoney

(参考訳) 機械学習コミュニティの中で、広く使われている統一収束フレームワークは、過パラメータ化されたモデルが新しいデータにどのように一般化できるかという疑問に答えるために使われてきた。このアプローチは、データに適合する最悪のケースモデルのテストエラーを境界とするが、基本的な制限がある。統計力学の学習アプローチに着想を得て,複数のモデルクラスからの補間分類器間のテストエラーの分布を精度良く計算する手法を定式化し,開発する。本稿では,この分布を線形およびランダムな特徴分類モデルを用いて,実データおよび合成データに対して計算する。テストエラーは小さな典型値$\varepsilon^*$に集中する傾向にあり、これは同じデータセット上の最悪のケース補間モデルのテストエラーから大きく逸脱しており、"悪い"分類器は極めて稀であることを示している。我々は、テストエラーの漸近分布を特徴付ける簡単な設定で理論的結果を提供し、これらが実際に$\varepsilon^*$の値に集中していることを示し、その値も正確に識別する。次に、経験的発見によって支持されるより一般的な予想を定式化する。以上の結果から,統計的学習理論の一般的な解析手法は,実際に観測された優れた一般化性能を捉えるには,十分にきめ細やかな粒度が得られず,統計的学習力学に基づくアプローチが有望な代替手段となる可能性が示唆された。

Within the machine learning community, the widely-used uniform convergence framework has been used to answer the question of how complex, over-parameterized models can generalize well to new data. This approach bounds the test error of the worst-case model one could have fit to the data, but it has fundamental limitations. Inspired by the statistical mechanics approach to learning, we formally define and develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers from several model classes. We apply our method to compute this distribution for several real and synthetic datasets, with both linear and random feature classification models. We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model on the same datasets, indicating that "bad" classifiers are extremely rare. We provide theoretical results in a simple setting in which we characterize the full asymptotic distribution of test errors, and we show that these indeed concentrate around a value $\varepsilon^*$, which we also identify exactly. We then formalize a more general conjecture supported by our empirical findings. Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice, and that approaches based on the statistical mechanics of learning may offer a promising alternative.

翻訳日:2022-11-18 05:11:08 公開日:2021-03-04

# ランダムフォレストとしてのマクロ経済

The Macroeconomy as a Random Forest ( http://arxiv.org/abs/2006.12724v3 )

ライセンス: Link先を確認

Philippe Goulet Coulombe

(参考訳) マクロ経済ランダムフォレスト(MRF, Macroeconomic Random Forest)は,線形マクロ方程式の進化パラメータを柔軟にモデル化するための機械学習ツールである。主な出力である一般化時変パラメータ(gtvps)は、多くの一般的な非線形性(threshold/switching, smooth transition, structural breaks/change)をネストし、洗練された新しいものを可能にする多用途デバイスである。このアプローチは多くの代替案よりも明確な予測ゲインをもたらし、2008年の失業の劇的な増加を予測し、インフレのためにうまく機能する。ほとんどのMLベースの方法とは異なり、MRFはGTVPを介して直接解釈可能である。例えば、失業率予測の成功は、不況のたびにほぼ倍増する前向きの変数(例えば、用語の拡散、住宅の開始)の影響によるものである。興味深いことに、フィリップス曲線は確かに平坦であり、そのポテンシャルは非常に巡回的である。

I develop Macroeconomic Random Forest (MRF), an algorithm adapting the canonical Machine Learning (ML) tool to flexibly model evolving parameters in a linear macro equation. Its main output, Generalized Time-Varying Parameters (GTVPs), is a versatile device nesting many popular nonlinearities (threshold/switching, smooth transition, structural breaks/change) and allowing for sophisticated new ones. The approach delivers clear forecasting gains over numerous alternatives, predicts the 2008 drastic rise in unemployment, and performs well for inflation. Unlike most ML-based methods, MRF is directly interpretable -- via its GTVPs. For instance, the successful unemployment forecast is due to the influence of forward-looking variables (e.g., term spreads, housing starts) nearly doubling before every recession. Interestingly, the Phillips curve has indeed flattened, and its might is highly cyclical.

翻訳日:2022-11-17 23:26:18 公開日:2021-03-04

# PanRep:異種グラフにおける普遍ノード埋め込み抽出のためのグラフニューラルネットワーク

PanRep: Graph neural networks for extracting universal node embeddings in heterogeneous graphs ( http://arxiv.org/abs/2007.10445v2 )

ライセンス: Link先を確認

Vassilis N. Ioannidis, Da Zheng, George Karypis

(参考訳) 教師なしノード埋め込みの学習は、ノード分類やリンク予測などの下流タスクを容易にする。ノードの埋め込みは、様々な下流タスクで使われるように設計されている場合、普遍的である。この研究は、異種グラフに対する普遍ノード表現の教師なし学習のためのグラフニューラルネットワーク(GNN)モデルであるPanRepを紹介する。 PanRepは、ノード埋め込みと4つのデコーダを取得するGNNエンコーダで構成され、それぞれが異なるトポロジとノードの特徴特性をキャプチャする。これらの特性に従えば、新しい教師なしフレームワークは、異なる下流タスクに適用可能な普遍的な埋め込みを学習する。 PanRepは、限定ラベルを考慮に入れた微調整が可能である。この運用環境では、PanRepは異種グラフデータのノード埋め込みを抽出するための事前訓練されたモデルとみなされる。 panrepは、ノード分類とリンク予測において、特に教師なしメソッドのラベル付きデータが小さい場合に、教師なしメソッドと教師なしメソッドを全て上回る。 PanRep-FT(微調整)は他の教師ありアプローチよりも優れており、事前学習モデルの利点を裏付けている。最後に、Covid-19の新規薬物発見にPanRep-FTを適用した。薬物再導入における普遍的な埋め込みの利点を示し,臨床試験で用いられる薬物を薬物候補として同定する。

Learning unsupervised node embeddings facilitates several downstream tasks such as node classification and link prediction. A node embedding is universal if it is designed to be used by and benefit various downstream tasks. This work introduces PanRep, a graph neural network (GNN) model, for unsupervised learning of universal node representations for heterogeneous graphs. PanRep consists of a GNN encoder that obtains node embeddings and four decoders, each capturing different topological and node feature properties. Abiding by these properties, the novel unsupervised framework learns universal embeddings applicable to different downstream tasks. PanRep can be further fine-tuned to account for possible limited labels. In this operational setting PanRep is considered as a pretrained model for extracting node embeddings of heterogeneous graph data. PanRep outperforms all unsupervised and certain supervised methods in node classification and link prediction, especially when the labeled data for the supervised methods is small. PanRep-FT (with fine-tuning) outperforms all other supervised approaches, which corroborates the merits of pretraining models. Finally, we apply PanRep-FT for discovering novel drugs for Covid-19. We showcase the advantage of universal embeddings in drug repurposing and identify several drugs used in clinical trials as possible drug candidates.

翻訳日:2022-11-08 12:55:40 公開日:2021-03-04

# Clarinet: 予算に優しいドメイン適応に向けての一段階

Clarinet: A One-step Approach Towards Budget-friendly Unsupervised Domain Adaptation ( http://arxiv.org/abs/2007.14612v2 )

ライセンス: Link先を確認

Yiyang Zhang, Feng Liu, Zhen Fang, Bo Yuan, Guangquan Zhang, Jie Lu

(参考訳) unsupervised domain adaptation (UDA) では、ターゲットドメインの分類器は、ソースドメインからの大量の真ラベル付きデータと、ターゲットドメインからのラベルなしデータで訓練される。しかし、予算が限られている場合、ソースドメインで完全な真ラベル付きデータを集めるのは難しいかもしれない。この問題を軽減するために,ターゲットドメインの分類器を,ソースドメインからの補ラベル(complementary label)付きデータと,ターゲットドメインのラベルなしデータとで訓練しなければならない,budget-friendly UDA (BFUDA) という新たな問題設定を考える。主な利点は、(BFUDAが要求する)補ラベル付きソースデータを収集するコストが、(通常のUDAが要求する)真ラベル付きソースデータを収集するコストよりもはるかに小さいことである。この目的のために、BFUDA問題を解決するためにCLARINET (complementary label adversarial network) を提案する。 CLARINETは2つのディープネットワークを同時に維持しており、1つは補ラベル付きソースデータの分類に注力し、もう1つはソースからターゲットへの分布適応を扱う。実験により、CLARINETは一連の有力なベースラインを著しく上回ることが示された。

In unsupervised domain adaptation (UDA), classifiers for the target domain are trained with massive true-label data from the source domain and unlabeled data from the target domain. However, it may be difficult to collect fully-true-label data in a source domain given a limited budget. To mitigate this problem, we consider a novel problem setting where the classifier for the target domain has to be trained with complementary-label data from the source domain and unlabeled data from the target domain named budget-friendly UDA (BFUDA). The key benefit is that it is much less costly to collect complementary-label source data (required by BFUDA) than collecting the true-label source data (required by ordinary UDA). To this end, the complementary label adversarial network (CLARINET) is proposed to solve the BFUDA problem. CLARINET maintains two deep networks simultaneously, where one focuses on classifying complementary-label source data and the other takes care of the source-to-target distributional adaptation. Experiments show that CLARINET significantly outperforms a series of competent baselines.
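
A complementary label names a class that a sample does not belong to, which is why it is cheaper to collect than the true label. The sketch below generates such labels by picking uniformly among the wrong classes; uniform sampling is an assumption made for this example and is not stated in the abstract, and the function name is hypothetical.

```python
import random


def complementary_label(true_label, num_classes, rng):
    """Return a class index, chosen uniformly, that is NOT the true label."""
    candidates = [k for k in range(num_classes) if k != true_label]
    return rng.choice(candidates)


NUM_CLASSES = 5
rng = random.Random(0)  # fixed seed so the example is reproducible

true_labels = [0, 1, 2, 3, 4, 0, 1, 2]
comp_labels = [complementary_label(y, NUM_CLASSES, rng) for y in true_labels]
print(list(zip(true_labels, comp_labels)))
```

Each complementary label rules out one class only, so a learner receives weaker supervision per sample than it would from a true label.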

翻訳日:2022-11-05 19:24:32 公開日:2021-03-04

# リカレントニューラルネットワークの再検討と画像分類の改善

Rethinking Recurrent Neural Networks and Other Improvements for Image Classification ( http://arxiv.org/abs/2007.15161v3 )

ライセンス: Link先を確認

Nguyen Huu Phong, Bernardete Ribeiro

(参考訳) 数十年前にさかのぼる機械学習の長い歴史の中で、リカレントニューラルネットワーク(RNN)は主にシーケンシャルなデータや時系列、一般的に1D情報に使われてきた。 2次元画像の稀な研究においても、これらのネットワークは画像認識タスクではなく、データのシーケンシャルな学習と生成にのみ使用される。本研究では,画像認識モデルの設計において,RNNを付加層として統合することを提案する。また,複数のモデルを用いてエキスパート予測を行うエンド・ツー・エンドのマルチモデルアンサンブルを開発した。さらに、トレーニング戦略を拡張して、主要なモデルと互換性があり、いくつかの挑戦的なデータセット(SVHN (0.99)、Cifar-100 (0.9027)、Cifar-10 (0.9852) など)で最先端のモデルにマッチさせることができる。さらに,本モデルでは,サリーデータセット (0.949) に新しいレコードを設定する。この記事では、メソッドのソースコードをhttps://github.com/leonlha/e2e-3mとhttp://nguyenhuuphong.meで公開します。

Over the long history of machine learning, which dates back several decades, recurrent neural networks (RNNs) have been used mainly for sequential data and time series and generally with 1D information. Even in some rare studies on 2D images, these networks are used merely to learn and generate data sequentially rather than for image recognition tasks. In this study, we propose integrating an RNN as an additional layer when designing image recognition models. We also develop end-to-end multimodel ensembles that produce expert predictions using several models. In addition, we extend the training strategy so that our model performs comparably to leading models and can even match the state-of-the-art models on several challenging datasets (e.g., SVHN (0.99), Cifar-100 (0.9027) and Cifar-10 (0.9852)). Moreover, our model sets a new record on the Surrey dataset (0.949). The source code of the methods provided in this article is available at https://github.com/leonlha/e2e-3m and http://nguyenhuuphong.me.

翻訳日:2022-11-05 14:06:40 公開日:2021-03-04

# 人物再同定のための弾性損失をもつ不完全記述子マイニング

Incomplete Descriptor Mining with Elastic Loss for Person Re-Identification ( http://arxiv.org/abs/2008.04010v4 )

ライセンス: Link先を確認

Hongchen Tan, Yuhao Bian, Huasheng Wang, Xiuping Liu, and Baocai Yin

(参考訳) 本稿では,人物リidタスクに注意し頑健な人物記述子をキャプチャする新しい人物リidモデルである連続バッチドロップブロックネットワーク(cbdb-net)を提案する。 CBDB-Net には Consecutive Batch DropBlock Module (CBDBM) と Elastic Loss (EL) という2つの新しい設計が含まれている。連続したバッチドロップブロックモジュール(cbdbm)では、最初に機能マップで一様分割を行います。そして、各パッチを独立して継続的にフィーチャーマップの上から下へ落とし、複数の不完全なフィーチャーマップを出力します。トレーニング段階では、これらの複数の不完全な機能はRe-IDモデルをより促進し、Re-IDタスクの堅牢な人物記述子をキャプチャする。 EL(Elastic Loss)では、Re-IDモデルがハードサンプルペアと簡単なサンプルペアを適応的にバランスさせるのに役立つ新しい重量制御アイテムを設計する。広範囲にわたるアブレーション研究を通じて,CBDBM(Consecutive Batch DropBlock Module)とEL(Elastic Loss)がCBDB-Netの性能向上に貢献していることを確認した。我々のCBDB-Netは、3つの標準人物Re-IDデータセット(Market-1501、DukeMTMC-Re-ID、CUHK03データセット)、3つの隠蔽人物Re-IDデータセット(Occluded DukeMTMC、Partial-REID、Partial iLIDSデータセット)、一般的な画像検索データセット(In-Shop Clothes Retrievalデータセット)で競合性能を達成できることを示した。

In this paper, we propose a novel person Re-ID model, Consecutive Batch DropBlock Network (CBDB-Net), to capture the attentive and robust person descriptor for the person Re-ID task. The CBDB-Net contains two novel designs: the Consecutive Batch DropBlock Module (CBDBM) and the Elastic Loss (EL). In the Consecutive Batch DropBlock Module (CBDBM), we first partition the feature maps uniformly. We then independently and consecutively drop each patch of the feature maps from top to bottom, which yields multiple incomplete feature maps. In the training stage, these multiple incomplete features can better encourage the Re-ID model to capture the robust person descriptor for the Re-ID task. In the Elastic Loss (EL), we design a novel weight control item to help the Re-ID model adaptively balance hard sample pairs and easy sample pairs in the whole training process. Through an extensive set of ablation studies, we verify that the Consecutive Batch DropBlock Module (CBDBM) and the Elastic Loss (EL) each contribute to the performance boosts of CBDB-Net. We demonstrate that our CBDB-Net can achieve competitive performance on three standard person Re-ID datasets (the Market-1501, the DukeMTMC-Re-ID, and the CUHK03 dataset), three occluded person Re-ID datasets (the Occluded DukeMTMC, the Partial-REID, and the Partial iLIDS dataset), and a general image retrieval dataset (In-Shop Clothes Retrieval dataset).
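The partition-and-drop step described in the abstract can be sketched as follows. This is a minimal illustration, assuming a single-channel feature map stored as nested lists and horizontal strips of equal height; the function name `consecutive_drop` and the zero-filling of a dropped strip are assumptions made for the example, not details confirmed by the abstract.

```python
from typing import List

FeatureMap = List[List[float]]  # rows x columns, one channel for brevity


def consecutive_drop(feature_map: FeatureMap, num_strips: int) -> List[FeatureMap]:
    """Return one copy of the map per horizontal strip, with that strip zeroed.

    Hypothetical helper: strips are visited from top to bottom, and each
    output map is missing exactly one strip.
    """
    height = len(feature_map)
    if height % num_strips != 0:
        raise ValueError("height must be divisible by num_strips")
    strip_height = height // num_strips
    incomplete_maps = []
    for strip in range(num_strips):  # walk from the top strip to the bottom one
        top, bottom = strip * strip_height, (strip + 1) * strip_height
        dropped = [
            [0.0] * len(row) if top <= r < bottom else list(row)
            for r, row in enumerate(feature_map)
        ]
        incomplete_maps.append(dropped)
    return incomplete_maps


fmap = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
maps = consecutive_drop(fmap, num_strips=2)
```

Each returned map would feed its own branch during training; how the branches are combined is not specified in the abstract and is left out here.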

翻訳日:2022-10-31 23:06:46 公開日:2021-03-04

# 軌道フィードバックによる強化学習

Reinforcement Learning with Trajectory Feedback ( http://arxiv.org/abs/2008.06036v2 )

ライセンス: Link先を確認

Yonathan Efroni, Nadav Merlis, Shie Mannor

(参考訳) 強化学習の標準的なフィードバックモデルは、訪問した状態-アクションペアの報酬を明らかにする必要がある。しかし、実際には、そのような頻繁なフィードバックが利用できないことが多い。この研究では、この仮定を緩和する第一歩を踏み出し、より弱い形式のフィードバックを必要とし、これを 'emph{trajectory feedback} と呼ぶ。各アクションの後に得られる報酬を観察する代わりに、エージェントが観察する軌道全体の質を表すスコア、すなわち、この軌道上で得られるすべての報酬の合計だけを受け取ると仮定する。我々は,未知報酬の最小二乗推定に基づく強化学習アルゴリズムを,既知のトランジションモデルと未知のトランジションモデルの両方のケースに対してこの設定に拡張し,その後悔を分析してアルゴリズムの性能について検討する。遷移モデルが未知の場合には、トラクタブルアルゴリズムをもたらすハイブリッドな楽観的なトンプソンサンプリング手法を提供する。

The standard feedback model of reinforcement learning requires revealing the reward of every visited state-action pair. However, in practice, it is often the case that such frequent feedback is not available. In this work, we take a first step towards relaxing this assumption and require a weaker form of feedback, which we refer to as trajectory feedback. Instead of observing the reward obtained after every action, we assume we only receive a score that represents the quality of the whole trajectory observed by the agent, namely, the sum of all rewards obtained over this trajectory. We extend reinforcement learning algorithms to this setting, based on least-squares estimation of the unknown reward, for both the known and unknown transition model cases, and study the performance of these algorithms by analyzing their regret. For cases where the transition model is unknown, we offer a hybrid optimistic-Thompson Sampling approach that results in a tractable algorithm.
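The least-squares idea mentioned in the abstract can be illustrated with a small sketch. It assumes that each trajectory is summarized by a vector of visit counts per state-action pair and that the observed score is the noiseless sum of rewards along the trajectory. The names `estimate_rewards` and `solve_linear_system`, the ridge term, and the toy data are all assumptions for illustration, not the authors' algorithm.

```python
from typing import List


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def estimate_rewards(visit_counts: List[List[float]], scores: List[float],
                     ridge: float = 1e-9) -> List[float]:
    """Regularized least squares: solve (X^T X + ridge * I) r = X^T y."""
    d = len(visit_counts[0])
    gram = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
    rhs = [0.0] * d
    for x, y in zip(visit_counts, scores):
        for i in range(d):
            rhs[i] += x[i] * y
            for j in range(d):
                gram[i][j] += x[i] * x[j]
    return solve_linear_system(gram, rhs)


# Toy data: three state-action pairs, four trajectories, only sums observed.
true_rewards = [1.0, 0.0, 0.5]
visit_counts = [[2, 1, 0], [0, 1, 2], [1, 1, 1], [3, 0, 0]]
scores = [sum(c * r for c, r in zip(x, true_rewards)) for x in visit_counts]
estimated = estimate_rewards(visit_counts, scores)
```

With enough linearly independent trajectories the per-pair rewards become identifiable from trajectory sums alone; the regret analysis and the exploration strategy in the paper are beyond this sketch.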

翻訳日:2022-10-30 22:36:51 公開日:2021-03-04

# ベクトル値画像正規化のためのカラー弾性モデル

A Color Elastica Model for Vector-Valued Image Regularization ( http://arxiv.org/abs/2008.08255v2 )

ライセンス: Link先を確認

Hao Liu, Xue-Cheng Tai, Ron Kimmel, Roland Glowinski

(参考訳) オイラーの弾性エネルギーに関連するモデルは、画像処理を含む多くのアプリケーションで有用であることが証明されている。カラー画像やマルチチャネルデータに弾性モデルを拡張することは難しい課題であり、これらの幾何学モデルの安定かつ一貫した数値解法は高階微分を伴うことが多い。単一チャネルのオイラーの弾性モデルや全変動(TV)モデルと同様に、高次微分を含む幾何学的測度は、弾性特性を最小化する画像形成モデルを考える際に役立つ。過去には、高エネルギー物理学からのポリアコフ作用がカラー画像処理にうまく応用されている。ここでは、色多様体曲率を最小化する色画像に対するポリアコフ作用の追加を紹介する。カラー画像チャネルにラプラス・ベルトラミ演算子を適用してカラー画像曲率を算出する。グレースケールに縮小した場合、空間と色の間の適切なスケーリングを選択しながら、画像レベルセットで操作するオイラーの弾性を最小化する。提案する非線形幾何モデルの最小値を求めることは,本論文で提示する課題である。具体的には,提案する関数を最小化する演算子分割法を提案する。非線形性は、3つのベクトル値変数と行列値変数を導入することで分離される。その後、問題は関連する初期値問題の定常状態の解に変換される。初期値問題は、各サブプロブレムが閉形式解を持つか、高速アルゴリズムで解けるように、3つの分数ステップに時間分割される。提案手法の効率性とロバスト性は系統的な数値実験により実証された。

Models related to Euler's elastica energy have proven to be useful for many applications including image processing. Extending elastica models to color images and multi-channel data is a challenging task, as stable and consistent numerical solvers for these geometric models often involve high order derivatives. Like the single channel Euler's elastica model and the total variation (TV) models, geometric measures that involve high order derivatives could help when considering image formation models that minimize elastic properties. In the past, the Polyakov action from high energy physics has been successfully applied to color image processing. Here, we introduce an addition to the Polyakov action for color images that minimizes the color manifold curvature. The color image curvature is computed by applying the Laplace-Beltrami operator to the color image channels. When reduced to gray-scale images, while selecting appropriate scaling between space and color, the proposed model minimizes Euler's elastica operating on the image level sets. Finding a minimizer for the proposed nonlinear geometric model is a challenge we address in this paper. Specifically, we present an operator-splitting method to minimize the proposed functional. The non-linearity is decoupled by introducing three vector-valued and matrix-valued variables. The problem is then converted into solving for the steady state of an associated initial-value problem. The initial-value problem is time-split into three fractional steps, such that each sub-problem has a closed form solution, or can be solved by fast algorithms. The efficiency and robustness of the proposed method are demonstrated by systematic numerical experiments.

翻訳日:2022-10-27 12:01:21 公開日:2021-03-04

# varifocalnet:iouを検知する高密度物体検出器

VarifocalNet: An IoU-aware Dense Object Detector ( http://arxiv.org/abs/2008.13367v2 )

ライセンス: Link先を確認

Haoyang Zhang, Ying Wang, Feras Dayoub and Niko Sünderhauf

(参考訳) 密度の高い物体検出器が高性能を達成するためには、膨大な数の候補検出を正確にランク付けすることが不可欠である。事前の作業では、分類スコアまたは分類と予測定位スコアの組み合わせを使用して候補をランク付けする。しかし、どちらのオプションも信頼性の高いランキングとなり、検出性能が低下する。本稿では,物体の存在感と位置推定精度の合同表現として,IACS(Iou-Aware Classification Score)を学習することを提案する。高密度物体検出器は、iacsに基づいて、より正確な候補検出ランキングを実現できることを示す。我々は、IACSを予測するために高密度物体検出器を訓練するために、Varifocal Lossという新しい損失関数を設計し、IACS予測とバウンディングボックス改善のための新しい星型バウンディングボックス特徴表現を提案する。これら2つの新しいコンポーネントとバウンディングボックスリファインメントブランチを組み合わせることで、FCOS+ATSSアーキテクチャに基づいたIoU対応の高密度オブジェクト検出器を構築し、VarifocalNetまたはVFNetを略して呼び出す。 MS COCOの大規模な実験により、VFNetは、異なるバックボーンを持つ$\sim$2.0 APの強いベースラインを一貫して超えています。我々のベストモデルであるVFNet-X-1200とRes2Net-101-DCNは、COCO test-dev上で55.1のシングルスケールAPを達成する。

Accurately ranking the vast number of candidate detections is crucial for dense object detectors to achieve high performance. Prior work uses the classification score or a combination of classification and predicted localization scores to rank candidates. However, neither option results in a reliable ranking, thus degrading detection performance. In this paper, we propose to learn an IoU-aware Classification Score (IACS) as a joint representation of object presence confidence and localization accuracy. We show that dense object detectors can achieve a more accurate ranking of candidate detections based on the IACS. We design a new loss function, named Varifocal Loss, to train a dense object detector to predict the IACS, and propose a new star-shaped bounding box feature representation for IACS prediction and bounding box refinement. Combining these two new components and a bounding box refinement branch, we build an IoU-aware dense object detector based on the FCOS+ATSS architecture, that we call VarifocalNet or VFNet for short. Extensive experiments on MS COCO show that our VFNet consistently surpasses the strong baseline by ~2.0 AP with different backbones. Our best model VFNet-X-1200 with Res2Net-101-DCN achieves a single-model single-scale AP of 55.1 on COCO test-dev, which is state-of-the-art among various object detectors. Code is available at https://github.com/hyz-xmaster/VarifocalNet.
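To make the notion of an IoU-aware score concrete, the sketch below computes the intersection over union of two axis-aligned boxes and builds a per-class target vector from it. The target construction (IoU at the ground-truth class, zero elsewhere) is an assumption for illustration; the abstract does not spell out how the IACS target is defined, and the Varifocal Loss itself is not reproduced here.

```python
from typing import List, Sequence


def box_iou(box_a: Sequence[float], box_b: Sequence[float]) -> float:
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_aware_target(num_classes: int, gt_class: int,
                     pred_box: Sequence[float], gt_box: Sequence[float]) -> List[float]:
    """Hypothetical target: IoU at the ground-truth class index, 0 elsewhere."""
    target = [0.0] * num_classes
    target[gt_class] = box_iou(pred_box, gt_box)
    return target


overlap = box_iou((0, 0, 2, 2), (1, 1, 3, 3))
target = iou_aware_target(3, 1, (0, 0, 2, 2), (0, 0, 2, 1))
```

A score trained against such a target would be high only when the class is right and the box is well localized, which is the ranking property the abstract argues for.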

翻訳日:2022-10-23 07:09:35 公開日:2021-03-04

# 活動特化特徴と活動相関を用いたマルチラベル活動認識

Multi-Label Activity Recognition using Activity-specific Features and Activity Correlations ( http://arxiv.org/abs/2009.07420v2 )

ライセンス: Link先を確認

Yanyi Zhang, Xinyu Li, Ivan Marsic

(参考訳) マルチラベルアクティビティ認識は、各ビデオで同時または順次に実行される複数のアクティビティを認識するように設計されている。最近のアクティビティ認識ネットワークは、各ビデオ内の1つのアクティビティのみを前提とする単一のアクティビティに焦点を当てている。これらのネットワークは、マルチラベルアクティビティ用に設計されていないすべてのアクティビティの共有機能を抽出する。本稿では,各アクティビティの独立な特徴記述子を抽出し,アクティビティ相関を学習するマルチラベルアクティビティ認識手法を提案する。この構造はエンドツーエンドでトレーニングでき、ビデオ分類のために既存のネットワーク構造にプラグインすることができる。提案手法は,4つのマルチラベルアクティビティ認識データセットにおける最先端手法よりも優れている。システムが生成するアクティビティ特有の特徴をよりよく理解するために、これらのアクティビティ特有の機能をcharadesデータセットで視覚化しました。

Multi-label activity recognition is designed for recognizing multiple activities that are performed simultaneously or sequentially in each video. Most recent activity recognition networks focus on single activities, assuming only one activity in each video. These networks extract shared features for all the activities and are not designed for multi-label activities. We introduce an approach to multi-label activity recognition that extracts independent feature descriptors for each activity and learns activity correlations. This structure can be trained end-to-end and plugged into any existing network structure for video classification. Our method outperformed state-of-the-art approaches on four multi-label activity recognition datasets. To better understand the activity-specific features that the system generated, we visualized these activity-specific features in the Charades dataset.

翻訳日:2022-10-18 00:12:30 公開日:2021-03-04

# ゼロショット学習のための情報ボトルネック制約付き潜在双方向埋め込み

Information Bottleneck Constrained Latent Bidirectional Embedding for Zero-Shot Learning ( http://arxiv.org/abs/2009.07451v3 )

ライセンス: Link先を確認

Yang Liu, Lei Zhou, Xiao Bai, Lin Gu, Tatsuya Harada, Jun Zhou

(参考訳) ゼロショット学習(ZSL)は、目に見えるクラスから目に見えないクラスに意味的な知識を移すことによって、新しいクラスを認識することを目的としている。多くのZSL法は視覚空間と意味空間の直接マッピングに依存しているが、キャリブレーション偏差とハブ性問題は一般化能力を目に見えないクラスに制限する。最近出現した生成型ZSL法は、ZSLを教師付き分類問題に変換するために見えない画像特徴を生成する。しかし、ほとんどの生成モデルは、トレーニングに使用されるデータのみであるため、まだ見受けられないバイアス問題に苦しんでいる。そこで本研究では, 密接な視覚-感覚結合制約を持つ双方向埋め込み型生成モデルを提案する。視覚空間と意味空間の両方の埋め込みパラメトリック分布を校正する統合潜在空間を学習する。高次元視覚特徴からの埋め込みは、多くの非意味情報を含むので、潜在空間における視覚と意味のアライメントは必然的に逸脱する。そこで本研究では,ZSLに初めて情報ボトルネック(IB)制約を導入し,マッピング中に本質的な属性情報を保持する。具体的には,不確実性推定と覚醒手順を利用して特徴雑音を緩和し,モデルの抽象化能力を向上させる。また, 画像のラベルを生成することで, トランスダクティブZSL設定に容易に拡張することができる。そして、このラベルノイズ問題を解決するためにロバストな損失を導入する。広範な実験結果から,本手法は,ほとんどのベンチマークデータセットのzsl設定において,最先端のメソッドよりも優れていた。コードはhttps://github.com/osierboy/IBZSLで入手できる。

Zero-shot learning (ZSL) aims to recognize novel classes by transferring semantic knowledge from seen classes to unseen classes. Though many ZSL methods rely on a direct mapping between the visual and the semantic space, the calibration deviation and hubness problem limit the generalization capability to unseen classes. Recently emerged generative ZSL methods generate unseen image features to transform ZSL into a supervised classification problem. However, most generative models still suffer from the seen-unseen bias problem as only seen data is used for training. To address these issues, we propose a novel bidirectional embedding based generative model with a tight visual-semantic coupling constraint. We learn a unified latent space that calibrates the embedded parametric distributions of both visual and semantic spaces. Since the embedding from high-dimensional visual features comprises much non-semantic information, the alignment of the visual and semantic spaces in the latent space would inevitably deviate. Therefore, we introduce an information bottleneck (IB) constraint to ZSL for the first time to preserve essential attribute information during the mapping. Specifically, we utilize the uncertainty estimation and the wake-sleep procedure to alleviate the feature noises and improve model abstraction capability. In addition, our method can be easily extended to the transductive ZSL setting by generating labels for unseen images. We then introduce a robust loss to solve this label noise problem. Extensive experimental results show that our method outperforms the state-of-the-art methods in different ZSL settings on most benchmark datasets. The code will be available at https://github.com/osierboy/IBZSL.

翻訳日:2022-10-17 23:29:50 公開日:2021-03-04

# 不確定な pomdp に対するロバスト有限状態制御器

Robust Finite-State Controllers for Uncertain POMDPs ( http://arxiv.org/abs/2009.11459v2 )

ライセンス: Link先を確認

Murat Cubuktepe, Nils Jansen, Sebastian Junges, Ahmadreza Marandi, Marnix Suilen, Ufuk Topcu

(参考訳) 観測不能なマルコフ決定過程 (uPOMDPs) により、標準ポドフの確率的遷移と観測関数はいわゆる不確実性集合に属する。このような不確実性は、疫学的な不確実性と呼ばれ、例えばデータ不足によって生じる確率分布の無数の集合をキャプチャする。我々は,任意の許容分布に対する仕様を確実に満たす uPOMDP の有限メモリポリシを計算するアルゴリズムを開発した。一般に、そのような政策の計算は理論上、実用上は難解である。この問題に対する効率的な解決策を4ステップで提供します。 1) 基礎となる問題を無限に多くの制約のある非凸最適化問題とする。 2) 専用双対化スキームは、まだ凸ではないが有限個の制約を持つ双対問題をもたらす。 3) この双対問題を線形化し, (4) 結果として得られる有限線形プログラムを解き, 原問題に対する局所最適解を得る。その結果生じる問題定式化は、既存の方法による問題よりも指数関数的に小さい。航空機衝突回避シナリオの大規模事例と,新しい宇宙機モーションプランニングケーススタディを用いて,本アルゴリズムの適用性を示す。

Uncertain partially observable Markov decision processes (uPOMDPs) allow the probabilistic transition and observation functions of standard POMDPs to belong to a so-called uncertainty set. Such uncertainty, referred to as epistemic uncertainty, captures uncountable sets of probability distributions caused by, for instance, a lack of data available. We develop an algorithm to compute finite-memory policies for uPOMDPs that robustly satisfy specifications against any admissible distribution. In general, computing such policies is theoretically and practically intractable. We provide an efficient solution to this problem in four steps. (1) We state the underlying problem as a nonconvex optimization problem with infinitely many constraints. (2) A dedicated dualization scheme yields a dual problem that is still nonconvex but has finitely many constraints. (3) We linearize this dual problem and (4) solve the resulting finite linear program to obtain locally optimal solutions to the original problem. The resulting problem formulation is exponentially smaller than those resulting from existing methods. We demonstrate the applicability of our algorithm using large instances of an aircraft collision-avoidance scenario and a novel spacecraft motion planning case study.

翻訳日:2022-10-15 04:39:16 公開日:2021-03-04

# VIVO: 新しいオブジェクトキャプションのためのビジュアル語彙事前トレーニング

VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning ( http://arxiv.org/abs/2009.13682v2 )

ライセンス: Link先を確認

Xiaowei Hu, Xi Yin, Kevin Lin, Lijuan Wang, Lei Zhang, Jianfeng Gao, Zicheng Liu

(参考訳) キャプションラベル付きトレーニングデータに見えない新規なオブジェクトを記述できる画像キャプションを生成することは、非常に望ましいが、新規なオブジェクトキャプションチャレンジ(ノーキャップ)で評価される能力である。この課題では、COCO Captions以外のイメージキャプチャトレーニングデータは、モデルトレーニングには使用できない。したがって、従来のビジョンランゲージ事前訓練(VLP)法は適用できない。本稿では、字幕アノテーションがない場合に事前学習を行うVIVO(Visual VOcabulary Pretraining)を提案する。 VLPにおけるペア画像キャプチャトレーニングデータの依存を断ち切ることで、VIVOは大量のペア画像タグデータを利用して視覚語彙を学習することができる。これは、画像レベルのタグを対応する画像領域の特徴に合わせることを学ぶマルチレイヤトランスフォーマーモデルを事前訓練することで実現される。画像タグの非順序性に対処するため、VIVOはハンガリーのマッチング損失とマスク付きタグ予測を使用して事前トレーニングを行う。画像キャプションのための訓練済みモデルを微調整し,VIVOの有効性を検証する。さらに,モデルによって推定される視覚的テキストアライメントの分析を行う。その結果,本モデルでは,新規なオブジェクトを記述した画像キャプションを生成するだけでなく,それらのオブジェクトの位置を識別できることがわかった。我々の1つのモデルは、nocapsで新しい最先端の結果を達成し、人間のCIDErスコアを上回りました。

It is highly desirable yet challenging to generate image captions that can describe novel objects which are unseen in caption-labeled training data, a capability that is evaluated in the novel object captioning challenge (nocaps). In this challenge, no additional image-caption training data, other than COCO Captions, is allowed for model training. Thus, conventional Vision-Language Pre-training (VLP) methods cannot be applied. This paper presents VIsual VOcabulary pretraining (VIVO) that performs pre-training in the absence of caption annotations. By breaking the dependency on paired image-caption training data in VLP, VIVO can leverage large amounts of paired image-tag data to learn a visual vocabulary. This is done by pre-training a multi-layer Transformer model that learns to align image-level tags with their corresponding image region features. To address the unordered nature of image tags, VIVO uses a Hungarian matching loss with masked tag prediction to conduct pre-training. We validate the effectiveness of VIVO by fine-tuning the pre-trained model for image captioning. In addition, we perform an analysis of the visual-text alignment inferred by our model. The results show that our model can not only generate fluent image captions that describe novel objects, but also identify the locations of these objects. Our single model has achieved new state-of-the-art results on nocaps and surpassed the human CIDEr score.
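Because image tags are unordered, a loss over several masked tags has to be invariant to the order in which the tags are assigned to the masked slots. The sketch below illustrates that idea by searching all assignments by brute force and keeping the cheapest one. It is a stand-in for the Hungarian algorithm, which finds the same minimum in polynomial time; the function name `matching_loss`, the negative log-likelihood cost, and the toy probabilities are assumptions, not the VIVO implementation.

```python
import math
from itertools import permutations
from typing import Dict, List, Tuple


def matching_loss(predicted_probs: List[Dict[str, float]],
                  target_tags: List[str]) -> Tuple[float, List[str]]:
    """Minimum total negative log-likelihood over all slot-to-tag assignments.

    predicted_probs[i] maps each tag to its predicted probability at masked
    slot i. Brute force is only viable for a handful of masked slots.
    """
    n = len(target_tags)
    best_loss = math.inf
    best_assignment: List[str] = []
    for perm in permutations(range(n)):
        assignment = [target_tags[perm[slot]] for slot in range(n)]
        loss = sum(-math.log(predicted_probs[slot][tag])
                   for slot, tag in enumerate(assignment))
        if loss < best_loss:
            best_loss, best_assignment = loss, assignment
    return best_loss, best_assignment


# Two masked slots; the model prefers "ball" in slot 0 and "dog" in slot 1.
predicted = [{"dog": 0.1, "ball": 0.9}, {"dog": 0.8, "ball": 0.2}]
loss, assignment = matching_loss(predicted, ["dog", "ball"])
```

The listed order of the target tags does not affect the result, which is the property that matters for unordered tag sets.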

翻訳日:2022-10-13 20:38:37 公開日:2021-03-04

# ByzShield: 分散トレーニングのための効率的でロバストなシステム

ByzShield: An Efficient and Robust System for Distributed Training ( http://arxiv.org/abs/2010.04902v2 )

ライセンス: Link先を確認

Konstantinos Konstantinidis, Aditya Ramamoorthy

(参考訳) 分散クラスタ上での大規模モデルのトレーニングは、マシンラーニングパイプラインの重要なコンポーネントである。しかし、一部の労働者が逆(ビザンチン)のやり方で振る舞うと、パラメータサーバ(ps)に任意の結果を返すので、このトレーニングは簡単に失敗する。多くの既存論文が様々な攻撃モデルを検討し、これらの攻撃の影響を軽減するために堅牢な集約と/または計算冗長性を提案する。本研究では, 作業者の勾配計算課題について, 敵が十分な知識を持っていて, 最大被害を誘発するために, K の作業ノードから q を攻撃(最大)できるような, 奇抜な攻撃モデルを考える。冗長性に基づく手法である byzshield は作業者へのタスク割り当てに二部展開グラフの特性を利用する。具体的には、直交するラテン正方形とラマヌジャングラフに基づく構成の固有値に基づいて、崩壊した勾配の最悪のケース分数について上界を示す。数値実験により, 崩壊した勾配の分数の平均値が36%以上低下していることが判明した。同様に、CIFAR-10データセット上の画像分類によるトレーニング実験では、ByzShieldは最も高度な攻撃下で平均20%の精度で精度を向上している。 ByzShieldはまた、以前の作業よりもはるかに多くの対向ノードを許容する。

Training of large scale models on distributed clusters is a critical component of the machine learning pipeline. However, this training can easily be made to fail if some workers behave in an adversarial (Byzantine) fashion whereby they return arbitrary results to the parameter server (PS). A plethora of existing papers consider a variety of attack models and propose robust aggregation and/or computational redundancy to alleviate the effects of these attacks. In this work we consider an omniscient attack model where the adversary has full knowledge about the gradient computation assignments of the workers and can choose to attack (up to) any q out of K worker nodes to induce maximal damage. Our redundancy-based method ByzShield leverages the properties of bipartite expander graphs for the assignment of tasks to workers; this helps to effectively mitigate the effect of the Byzantine behavior. Specifically, we demonstrate an upper bound on the worst case fraction of corrupted gradients based on the eigenvalues of our constructions which are based on mutually orthogonal Latin squares and Ramanujan graphs. Our numerical experiments indicate over a 36% reduction on average in the fraction of corrupted gradients compared to the state of the art. Likewise, our experiments on training followed by image classification on the CIFAR-10 dataset show that ByzShield has on average a 20% advantage in accuracy under the most sophisticated attacks. ByzShield also tolerates a much larger fraction of adversarial nodes compared to prior work.
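The abstract states that the task assignment builds on mutually orthogonal Latin squares (MOLS). As background, the sketch below generates the standard family of MOLS of prime order p, where square m has entry (m * row + col) mod p, and checks orthogonality. How ByzShield maps these squares to worker-task assignments is not described in the abstract and is not attempted here; the function names are illustrative.

```python
from typing import List

Square = List[List[int]]


def latin_square(order: int, multiplier: int) -> Square:
    """Latin square with entry (multiplier * row + col) mod order.

    For a prime order and multiplier in 1..order-1, every row and column
    is a permutation of 0..order-1.
    """
    return [[(multiplier * row + col) % order for col in range(order)]
            for row in range(order)]


def are_orthogonal(sq_a: Square, sq_b: Square) -> bool:
    """Two squares are orthogonal if superimposing them gives every ordered pair once."""
    order = len(sq_a)
    pairs = {(sq_a[r][c], sq_b[r][c]) for r in range(order) for c in range(order)}
    return len(pairs) == order * order


# For prime order 5 this yields 4 pairwise orthogonal squares.
squares = [latin_square(5, m) for m in range(1, 5)]
```

Orthogonality is what keeps overlaps between groups small, which is the kind of property a redundancy scheme can exploit to limit the damage any fixed set of adversarial workers can do.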

翻訳日:2022-10-08 23:36:47 公開日:2021-03-04

# アルゴリズムフェアネスに向けたブリッジ機械学習とメカニズム設計

Bridging Machine Learning and Mechanism Design towards Algorithmic Fairness ( http://arxiv.org/abs/2010.05434v2 )

ライセンス: Link先を確認

Jessie Finocchiaro, Roland Maio, Faidra Monachou, Gourab K Patro, Manish Raghavan, Ana-Andreea Stoica, Stratis Tsirtsis

(参考訳) したがって、公平で平等なシステムを構築するためにアルゴリズム的なコンポーネントにどのように介入するかは、最も重要な問題です。現代の意思決定システムでは、リソースや情報を人(例えば学校選択、広告)に割り当てることによって、パイプラインに機械が学習した予測を取り入れ、潜在的戦略行動や制約された割り当てに関する懸念を提起する。機械学習とメカニズム設計の両方が公平性と公平性の問題に対処するフレームワークを開発したが、複雑な意思決定システムではどちらのフレームワークも個々に十分ではない。本稿では,公平な意思決定システムを構築するには,各分野に固有の制約を克服する必要があるという立場を開発する。私たちの究極の目標は、メカニズム設計と機械学習の個々のフレームワークを結合的にブリッジする包括的フレームワークを構築することです。我々は、各分野が公正な意思決定を行う視点を比較し、各分野が教えた教訓をティーズアウトし、相互に教えることができ、これらの分野の強力な協力を必要とするアプリケーションドメインを強調して、この目標に向けて基礎的な作業を始めている。

Decision-making systems increasingly orchestrate our world: how to intervene on the algorithmic components to build fair and equitable systems is therefore a question of utmost importance; one that is substantially complicated by the context-dependent nature of fairness and discrimination. Modern decision-making systems that involve allocating resources or information to people (e.g., school choice, advertising) incorporate machine-learned predictions in their pipelines, raising concerns about potential strategic behavior or constrained allocation, concerns usually tackled in the context of mechanism design. Although both machine learning and mechanism design have developed frameworks for addressing issues of fairness and equity, in some complex decision-making systems, neither framework is individually sufficient. In this paper, we develop the position that building fair decision-making systems requires overcoming these limitations which, we argue, are inherent to each field. Our ultimate objective is to build an encompassing framework that cohesively bridges the individual frameworks of mechanism design and machine learning. We begin to lay the ground work towards this goal by comparing the perspective each discipline takes on fair decision-making, teasing out the lessons each field has taught and can teach the other, and highlighting application domains that require a strong collaboration between these disciplines.

翻訳日:2022-10-08 08:02:51 公開日:2021-03-04

# 選択的コントラスト学習による完全教師なし人物再同定

Fully Unsupervised Person Re-identification via Selective Contrastive Learning ( http://arxiv.org/abs/2010.07608v2 )

ライセンス: Link先を確認

Bo Pang, Deming Zhai, Junjun Jiang, Xianming Liu

(参考訳) 人物再識別(ReID)は、様々なカメラが捉えた画像の中から同一人物を検索することを目的としている。教師なしのReIDは,手動のアノテーションを多用せず,新たな条件に適応する可能性が大きいため,近年多くの注目を集めている。表現学習は、教師なしのReIDにおいて重要な役割を果たす。本研究では,教師なし特徴学習のための新しいコントラスト学習フレームワークを提案する。具体的には、従来のコントラスト学習戦略と異なり、コントラスト損失を定義するために複数の正と適応的にサンプリングされた負を用いて、より強い識別表現を持つ特徴埋め込みモデルを学ぶことを提案する。さらに,グローバルメモリバンクとローカルメモリバンクを対の類似性計算に,混合メモリバンクをコントラスト損失定義に使用する3つの動的辞書を構成するために,グローバルとローカルの機能を共同で活用することを提案する。調査の結果,教師なしのReIDにおける手法の優位性について,最先端技術と比較した。

Person re-identification (ReID) aims at finding the same person among images captured by various cameras. Unsupervised person ReID has attracted a lot of attention recently because it works without intensive manual annotation and thus shows great potential for adapting to new conditions. Representation learning plays a critical role in unsupervised person ReID. In this work, we propose a novel selective contrastive learning framework for unsupervised feature learning. Specifically, different from traditional contrastive learning strategies, we propose to use multiple positives and adaptively sampled negatives for defining the contrastive loss, which enables learning a feature embedding model with a stronger identity-discriminative representation. Moreover, we propose to jointly leverage global and local features to construct three dynamic dictionaries, among which the global and local memory banks are used for pairwise similarity computation and the mixture memory bank is used for contrastive loss definition. Experimental results demonstrate the superiority of our method in unsupervised person ReID compared with the state of the art.

翻訳日:2022-10-07 04:08:23 公開日:2021-03-04

# 非凸非凸ミニマックス最適化の連続時間系による限界挙動

Limiting Behaviors of Nonconvex-Nonconcave Minimax Optimization via Continuous-Time Systems ( http://arxiv.org/abs/2010.10628v2 )

ライセンス: Link先を確認

Benjamin Grimmer, Haihao Lu, Pratik Worah, Vahab Mirrokni

(参考訳) 非凸最適化とは異なり、勾配降下は局所最適化に収束することが保証されるが、非凸非凹極小最適化のアルゴリズムは位相的に異なる解経路を持つことがある。本稿では,勾配降下上昇法(gda),交流勾配降下上昇法(agda),超勾配法(egm)の3つの古典的なminimaxアルゴリズムの限界挙動について検討する。本稿では,これらの制限行動がGAN(Generative Adversarial Networks)トレーニングで発生し,多様なGAN問題に対して容易に実証できることを観察する。これらの異なる挙動を説明するために、各アルゴリズムに対応する高次分解能連続時間ダイナミクスについて検討し、各手法による局所収束に十分な(そしてほぼ必要)条件を導出する。さらに、この ode の視点により、ホップ分岐として正規化を導入することによって引き起こされるこれらの異なる制限挙動間の位相遷移を特徴付けることができる。

Unlike nonconvex optimization, where gradient descent is guaranteed to converge to a local optimizer, algorithms for nonconvex-nonconcave minimax optimization can have topologically different solution paths: sometimes converging to a solution, sometimes never converging and instead following a limit cycle, and sometimes diverging. In this paper, we study the limiting behaviors of three classic minimax algorithms: gradient descent ascent (GDA), alternating gradient descent ascent (AGDA), and the extragradient method (EGM). Numerically, we observe that all of these limiting behaviors can arise in Generative Adversarial Networks (GAN) training and are easily demonstrated for a range of GAN problems. To explain these different behaviors, we study the high-order resolution continuous-time dynamics that correspond to each algorithm, which results in the sufficient (and almost necessary) conditions for the local convergence by each method. Moreover, this ODE perspective allows us to characterize the phase transition between these different limiting behaviors caused by introducing regularization as Hopf Bifurcations.
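The three qualitatively different behaviors can already be seen on the textbook bilinear problem min over x, max over y of f(x, y) = x * y, whose only stationary point is the origin. The sketch below runs GDA, AGDA, and EGM with the same step size. The test problem, step size, and function names are choices made for illustration and do not come from the paper.

```python
import math
from typing import Callable, Tuple

Step = Callable[[float, float, float], Tuple[float, float]]


def gda_step(x: float, y: float, lr: float) -> Tuple[float, float]:
    """Simultaneous gradient descent ascent on f(x, y) = x * y."""
    return x - lr * y, y + lr * x


def agda_step(x: float, y: float, lr: float) -> Tuple[float, float]:
    """Alternating GDA: the ascent step sees the already-updated x."""
    x_new = x - lr * y
    return x_new, y + lr * x_new


def egm_step(x: float, y: float, lr: float) -> Tuple[float, float]:
    """Extragradient: take a look-ahead step, then update with its gradient."""
    x_mid, y_mid = x - lr * y, y + lr * x
    return x - lr * y_mid, y + lr * x_mid


def final_distance(step: Step, steps: int = 200, lr: float = 0.1) -> float:
    """Distance from the stationary point (0, 0) after running `step`."""
    x, y = 1.0, 1.0
    for _ in range(steps):
        x, y = step(x, y, lr)
    return math.hypot(x, y)


start = math.hypot(1.0, 1.0)
gda_dist = final_distance(gda_step)
agda_dist = final_distance(agda_step)
egm_dist = final_distance(egm_step)
```

On this problem GDA spirals outward, because each step multiplies the squared norm by 1 + lr^2. AGDA conserves x^2 + y^2 - lr * x * y and therefore orbits without converging, while EGM contracts toward the origin for lr < 1.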

翻訳日:2022-10-05 07:57:38 公開日:2021-03-04

# BiTe-GCN: テキストリッチネットワーク上のトポロジと特徴の双方向変換による新しいGCNアーキテクチャ

BiTe-GCN: A New GCN Architecture via Bidirectional Convolution of Topology and Features on Text-Rich Networks ( http://arxiv.org/abs/2010.12157v2 )

ライセンス: Link先を確認

Di Jin, Xiangchen Song, Zhizhi Yu, Ziyang Liu, Heling Zhang, Zhaomeng Cheng, Jiawei Han

(参考訳) グラフ畳み込み層を通して高次近傍情報を統合することを目的としたグラフ畳み込みネットワーク(gcns)は、多くのネットワーク分析タスクにおいて顕著な能力を示している。しかし、オーバースムーシングや局所位相ホモフィイを含むトポロジー上の制限は、ネットワークを表現する能力を制限する。既存の研究はネットワークトポロジにおける特徴畳み込みしか行わず、必然的にトポロジと特徴の不均衡をもたらす。実世界では、情報ネットワークは、ノードレベルの引用情報だけでなく、ローカルなテキストシーケンス情報も含む。テキストリッチネットワーク上でのトポロジと特徴の双方向畳み込みによる新しいGCNアーキテクチャであるBiTe-GCNを提案する。まず、元のテキストリッチネットワークを拡張二型ヘテロジニアスネットワークに変換し、グローバルノードレベル情報とローカルテキストシーケンス情報の両方をテキストからキャプチャする。次に、トポロジーと特徴の両方の畳み込みを同時に実行する識別畳み込み機構を導入する。テキストリッチネットワークに関する広範囲な実験によって、新しいアーキテクチャがブレークアウトの改善によって最先端を上回っていることが証明された。さらに、このアーキテクチャはjd検索のようないくつかのeコマース検索シーンにも適用できる。 jdデータセットにおける実験は、提案するアーキテクチャが関連する手法よりも優れていることを検証している。

Graph convolutional networks (GCNs), aiming to integrate high-order neighborhood information through stacked graph convolution layers, have demonstrated remarkable power in many network analysis tasks. However, topological limitations, including over-smoothing and local topology homophily, limit their capability to represent networks. Existing studies only perform feature convolution on network topology, which inevitably introduces an imbalance between topology and features. Considering that real-world information networks consist of not only node-level citation information but also local text-sequence information, we propose BiTe-GCN, a novel GCN architecture with bidirectional convolution of both topology and features on text-rich networks, to address these limitations. We first transform the original text-rich network into an augmented bi-typed heterogeneous network, capturing both the global node-level information and the local text-sequence information from texts. We then introduce discriminative convolution mechanisms to perform convolutions of both topology and features simultaneously. Extensive experiments on text-rich networks demonstrate that our new architecture outperforms the state of the art by a substantial margin. Moreover, this architecture can also be applied to several e-commerce searching scenes such as JD searching. The experiments on the JD dataset validate the superiority of the proposed architecture over the related methods.

翻訳日:2022-10-03 23:37:21 公開日:2021-03-04

# EDNet:コストボリュームとアテンションに基づく空間残差を考慮した効率的な分散推定

EDNet: Efficient Disparity Estimation with Cost Volume Combination and Attention-based Spatial Residual ( http://arxiv.org/abs/2010.13338v4 )

ライセンス: Link先を確認

Songyan Zhang, Zhicheng Wang, Qiang Wang, Jinshuo Zhang, Gang Wei, Xiaowen Chu

(参考訳) 既存の最先端の分散度推定作業は、主に4D結合ボリュームを活用し、高いメモリ消費と遅い推論速度のために非効率な分散回帰のために非常に深い3D畳み込みニューラルネットワーク(CNN)を構築する。本稿では,EDNetというネットワークを効率よく分散推定する手法を提案する。まず,圧縮された連結音量からの文脈情報と相関音量からの特徴類似性測定を組み合わせた複合音量を構築する。結合ボリュームは次に2D畳み込みによって集約され、3D畳み込みよりも高速で少ないメモリを必要とする。次に,注意認識残差特性を生成するための注意に基づく空間残差モジュールを提案する。注意機構を適用し、複数スケールの誤差マップを用いて、不正確な領域に関する直感的な空間的証拠を提供し、残差学習効率を向上する。 Scene FlowとKITTIデータセットの大規模な実験は、EDNetが以前の3D CNNベースの作業より優れており、非常に高速でメモリ消費の少ない最先端のパフォーマンスを実現していることを示している。

Existing state-of-the-art disparity estimation works mostly leverage the 4D concatenation volume and construct a very deep 3D convolution neural network (CNN) for disparity regression, which is inefficient due to the high memory consumption and slow inference speed. In this paper, we propose a network named EDNet for efficient disparity estimation. Firstly, we construct a combined volume which incorporates contextual information from the squeezed concatenation volume and feature similarity measurement from the correlation volume. The combined volume can be next aggregated by 2D convolutions which are faster and require less memory than 3D convolutions. Secondly, we propose an attention-based spatial residual module to generate attention-aware residual features. The attention mechanism is applied to provide intuitive spatial evidence about inaccurate regions with the help of error maps at multiple scales and thus improve the residual learning efficiency. Extensive experiments on the Scene Flow and KITTI datasets show that EDNet outperforms the previous 3D CNN based works and achieves state-of-the-art performance with significantly faster speed and less memory consumption.
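The correlation volume mentioned in the abstract stores, for every pixel and every candidate disparity, a similarity between a left-image feature and the right-image feature shifted by that disparity. The sketch below builds such a volume for a single scanline with unit-normalized features. It covers only the correlation part; the squeezed concatenation volume, the way the two are combined, and the 2D aggregation in EDNet are not reproduced. Function names and the toy features are assumptions.

```python
import math
from typing import List

Feature = List[float]


def normalize(vec: Feature) -> Feature:
    """Scale a feature vector to unit length (left unchanged if all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def correlation_volume(left_row: List[Feature], right_row: List[Feature],
                       max_disparity: int) -> List[List[float]]:
    """volume[d][x] = <left[x], right[x - d]>, or 0 where x - d is out of range."""
    left = [normalize(f) for f in left_row]
    right = [normalize(f) for f in right_row]
    width = len(left)
    volume = []
    for d in range(max_disparity + 1):
        row = []
        for x in range(width):
            if x - d < 0:
                row.append(0.0)  # no matching pixel inside the right image
            else:
                row.append(sum(a * b for a, b in zip(left[x], right[x - d])))
        volume.append(row)
    return volume


# Toy scanline: the right view is the left view shifted by a disparity of 2.
left_row = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 1, 1], [1, 0, 1]]
right_row = left_row[2:] + [[1, 1, 1], [1, 1, 1]]
volume = correlation_volume(left_row, right_row, max_disparity=3)
best = [max(range(4), key=lambda d: volume[d][x]) for x in range(len(left_row))]
```

The volume has one value per disparity and pixel rather than a full feature vector, which is why it can be processed by 2D convolutions with disparity treated as the channel axis.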

翻訳日:2022-10-02 19:16:29 公開日:2021-03-04

# 遠心損失に基づく弱教師付き意味セグメンテーションアプローチ:品質管理と検査への応用

A Weakly-Supervised Semantic Segmentation Approach based on the Centroid Loss: Application to Quality Control and Inspection ( http://arxiv.org/abs/2010.13433v3 )

ライセンス: Link先を確認

Kai Yao, Alberto Ortiz, Francisco Bonnin-Pascual

(参考訳) 一般に、ディープラーニングと畳み込みニューラルネットワークに基づく現在のビジョンアルゴリズムの重要な部分の1つは、競合性能を達成するのに十分な数の画像のアノテーションであると考えられている。アノテーションはピクセルレベルで理想的に生成する必要があるため、セマンティックセグメンテーションタスクでは特に難しい。弱い教師付きセマンティックセグメンテーション(weakly supervised semantic segmentation)は、よりシンプルなアノテーションを使用することで、このコストを削減することを目的としている。本稿では,弱いアノテーションの効果を相殺することを目的とした新しい損失関数を用いて,新しい弱教師付き意味セグメンテーション手法を提案し,評価する。この目的のために、この損失関数は部分的エントロピー損失に基づくいくつかの項を含み、その1つはセントロイド損失である。この用語は、最適化を導くことによってセグメンテーションネットワークのトレーニングを改善することを目的として、対象クラスにおける画像画素のクラスタリングを誘導する。手法の性能は,品質管理アプリケーションにおいて,複数の異なるオブジェクトクラスのインスタンスを検出した場合と,視覚検査領域に起源を持つ場合と,特定の欠陥によって影響を受けるシーン表面の点に対応する画像領域の局所化を扱う場合の2つの異なるケーススタディから評価される。両方のケースで報告された検出結果は、両者の違いと特定の課題にもかかわらず、弱いアノテーションの使用が双方の競争性能レベルを達成するのを妨げないことを示している。

It is generally accepted that one of the critical parts of current vision algorithms based on deep learning and convolutional neural networks is the annotation of a sufficient number of images to achieve competitive performance. This is particularly difficult for semantic segmentation tasks since the annotation must ideally be generated at the pixel level. Weakly-supervised semantic segmentation aims at reducing this cost by employing simpler annotations that are therefore easier, cheaper and quicker to produce. In this paper, we propose and assess a new weakly-supervised semantic segmentation approach making use of a novel loss function whose goal is to counteract the effects of weak annotations. To this end, this loss function comprises several terms based on partial cross-entropy losses, one of which is the Centroid Loss. This term induces a clustering of the image pixels in the object classes under consideration, whose aim is to improve the training of the segmentation network by guiding the optimization. The performance of the approach is evaluated against datasets from two different industry-related case studies: while one involves the detection of instances of a number of different object classes in the context of a quality control application, the other stems from the visual inspection domain and deals with the localization of image areas whose pixels correspond to scene surface points affected by a specific sort of defect. The detection results that are reported for both cases show that, despite the differences between them and their particular challenges, the use of weak annotations does not prevent achieving a competitive performance level for both.
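The abstract builds its loss from partial cross-entropy terms. As a rough sketch of that building block only, the following computes cross-entropy over the annotated pixels and skips unlabeled ones. The function name, the `ignore` marker value, and the list-based inputs are assumptions for illustration; the Centroid Loss term itself is not specified in the abstract and is not reproduced here.

```python
import math


def partial_cross_entropy(probs, labels, ignore=-1):
    """Mean cross-entropy over labeled pixels only.

    probs:  one list of class probabilities per pixel.
    labels: class index per pixel, or `ignore` for pixels without annotation.
    """
    losses = [-math.log(p[y]) for p, y in zip(probs, labels) if y != ignore]
    # With no annotated pixels there is nothing to supervise.
    return sum(losses) / len(losses) if losses else 0.0
```

With weak annotations such as scribbles, most pixels carry the `ignore` marker, so only a small subset contributes to the gradient; additional terms are then needed to guide the remaining pixels.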

翻訳日:2022-10-02 19:13:42 公開日:2021-03-04

# パブリックポリシーのための説明可能な機械学習: ユースケース、ギャップ、研究方向

Explainable Machine Learning for Public Policy: Use Cases, Gaps, and Research Directions ( http://arxiv.org/abs/2010.14374v2 )

ライセンス: Link先を確認

Kasun Amarasinghe, Kit Rodolfa, Hemank Lamba, Rayid Ghani

(参考訳) 説明可能性は、健康、刑事司法、教育、雇用といった公共政策分野における意思決定を支援する機械学習(ml)モデルの採用と同様に、有効性にとって重要な要件であると同時に、近年説明可能性の分野は拡大しているが、この研究の多くは現実世界のニーズを考慮していない。提案手法の大部分は、明確なユースケースや意図したエンドユーザを使わずに、汎用的な説明可能性目標を持つベンチマークデータセットを使用する。その結果, 実世界の応用における理論的, 方法論的研究の適用性や有効性は明らかでない。本稿では、この空白を公共政策の領域に充足することに焦点を当てる。我々は,公共政策問題における説明可能性ユースケースの分類を開発し,各ユースケースにおいて,説明のエンドユーザーを定義し,その具体的目標を達成しなければならない。第3に,既存の作業をこれらのユースケースにマッピングし,ギャップを特定し,それらのギャップを埋めて,MLを通じた実践的な社会的影響を得るための研究の方向性を提案する。

Explainability is a crucial requirement for the effectiveness as well as the adoption of Machine Learning (ML) models supporting decisions in high-stakes public policy areas such as health, criminal justice, education, and employment. While the field of explainable ML has expanded in recent years, much of this work has not taken real-world needs into account. A majority of proposed methods use benchmark datasets with generic explainability goals without clear use-cases or intended end-users. As a result, the applicability and effectiveness of this large body of theoretical and methodological work on real-world applications is unclear. This paper focuses on filling this void for the domain of public policy. First, we develop a taxonomy of explainability use-cases within public policy problems; second, for each use-case, we define the end-users of explanations and the specific goals explainability has to fulfill; third, we map existing work to these use-cases, identify gaps, and propose research directions to fill those gaps in order to have a practical societal impact through ML.

翻訳日:2022-10-02 11:22:23 公開日:2021-03-04

# 投射による一般線形帯域の自己一致解析

Self-Concordant Analysis of Generalized Linear Bandits with Forgetting ( http://arxiv.org/abs/2011.00819v2 )

ライセンス: Link先を確認

Yoan Russac (DI-ENS, CNRS, PSL, VALDA), Louis Faury, Olivier Cappé (DI-ENS, VALDA), Aurélien Garivier (UMPA-ENSL)

(参考訳) カテゴリー的あるいは数値的観察を伴う文脈的逐次決定問題はユビキタスであり、一般化線形バンド(glb)はそれらに対処するための固い理論的枠組みを提供する。線形帯域の場合とは対照的に、GLBの既存のアルゴリズムは適用性を損なう2つの欠点がある。まず、モデルの非線形の性質のため、過度に悲観的な濃度境界に依存する。第二に、推定器の有界性を強制するためには、非凸射影ステップかバーンインフェーズのいずれかが必要である。これらの問題は、GLBパラメータが時間によって変化する可能性のある非定常モデルを考えると、どちらも悪化する。本研究では,スライディングウインドウと指数重みのどちらかを用いて達成した自己調和型GLB(ロジスティックおよびポアソン回帰を含む)に着目した。そこで本研究では,急速変化する環境において,最大類似推定器に対する信頼度に基づく新しいアルゴリズムを提案する。これらの結果とそれに伴う数値シミュレーションは,GLBの非定常性に対処する提案手法の可能性を強調している。

Contextual sequential decision problems with categorical or numerical observations are ubiquitous and Generalized Linear Bandits (GLB) offer a solid theoretical framework to address them. In contrast to the case of linear bandits, existing algorithms for GLB have two drawbacks undermining their applicability. First, they rely on excessively pessimistic concentration bounds due to the non-linear nature of the model. Second, they require either non-convex projection steps or burn-in phases to enforce boundedness of the estimators. Both of these issues are worsened when considering non-stationary models, in which the GLB parameter may vary with time. In this work, we focus on self-concordant GLB (which include logistic and Poisson regression) with forgetting achieved either by the use of a sliding window or exponential weights. We propose a novel confidence-based algorithm for the maximum-likelihood estimator with forgetting and analyze its performance in abruptly changing environments. These results as well as the accompanying numerical simulations highlight the potential of the proposed approach to address non-stationarity in GLB.
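The two forgetting mechanisms named in the abstract, a sliding window and exponential weights, can be illustrated by the per-observation weights they assign. The sketch below is a simplified assumption-laden example: the function names and the parameters `window` and `gamma` are hypothetical, and a weighted mean stands in for the weighted maximum-likelihood estimator analyzed in the paper.

```python
def sliding_window_weights(t, window):
    """Weight 1 for the last `window` rounds up to round t, 0 for older rounds."""
    return [1.0 if s > t - window else 0.0 for s in range(1, t + 1)]


def exponential_weights(t, gamma):
    """Weight gamma**(t - s) for round s, so older rounds decay geometrically."""
    return [gamma ** (t - s) for s in range(1, t + 1)]


def weighted_mean(values, weights):
    """Simple stand-in for a weighted estimator: the weighted average of rewards."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)
```

For a reward sequence that changes abruptly, for example `[0, 0, 0, 1, 1]`, both weightings pull the estimate toward the recent regime, whereas an unweighted mean would stay anchored to the stale observations.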

翻訳日:2022-09-30 10:46:22 公開日:2021-03-04

# ライフサイクルアウェアカプセルネットワークを用いた顔行動の時空間的解析

Spatio-Temporal Analysis of Facial Actions using Lifecycle-Aware Capsule Networks ( http://arxiv.org/abs/2011.08819v2 )

ライセンス: Link先を確認

Nikhil Churamani, Sinan Kalkan and Hatice Gunes

(参考訳) 顔活動単位(au)検出のための最先端のアプローチのほとんどは、静的フレームからの表情の評価に依存しており、顔活動の高度化のスナップショットをエンコードしている。しかし、現実世界の相互作用では、表情はより微妙で、時間的方法で進化し、時間的情報と同様に空間的および時間的情報を学ぶ必要がある。本稿では,顔AUアクティベーションの時間的変化を符号化する空間的特徴と時空間的特徴の両方に焦点をあてる。そこで本研究では,フレームとシーケンスレベルの両方の機能を用いてau検出を行うアクションユニットライフサイクルアウェアカプセルネットワーク(aula-caps)を提案する。フレームレベルでは、AULA-Capsのカプセル層が空間的特徴プリミティブを学習し、AUのアクティベーションを決定する一方で、シーケンスレベルでは、シーケンス内の関連する時空間セグメントに焦点を当てることで、連続フレーム間の時間的依存関係を学習する。学習された特徴カプセルは、AUライフサイクルに応じて、空間的あるいは時空間的な情報に選択的に集中するようにルーティングされる。提案手法はBP4D と GFT のベンチマークデータセットで評価され,両データセットの最先端結果が得られた。

Most state-of-the-art approaches for Facial Action Unit (AU) detection rely upon evaluating facial expressions from static frames, encoding a snapshot of heightened facial activity. In real-world interactions, however, facial expressions are usually more subtle and evolve in a temporal manner, requiring AU detection models to learn spatial as well as temporal information. In this paper, we focus on both spatial and spatio-temporal features encoding the temporal evolution of facial AU activation. For this purpose, we propose the Action Unit Lifecycle-Aware Capsule Network (AULA-Caps) that performs AU detection using both frame and sequence-level features. While at the frame-level the capsule layers of AULA-Caps learn spatial feature primitives to determine AU activations, at the sequence-level, it learns temporal dependencies between contiguous frames by focusing on relevant spatio-temporal segments in the sequence. The learnt feature capsules are routed together such that the model learns to selectively focus more on spatial or spatio-temporal information depending upon the AU lifecycle. The proposed model is evaluated on the commonly used BP4D and GFT benchmark datasets, obtaining state-of-the-art results on both datasets.

翻訳日:2022-09-24 16:11:17 公開日:2021-03-04

# Universal MelGAN:複数領域における高密度波形生成のためのロバストニューラルネットワーク

Universal MelGAN: A Robust Neural Vocoder for High-Fidelity Waveform Generation in Multiple Domains ( http://arxiv.org/abs/2011.09631v2 )

ライセンス: Link先を確認

Won Jang, Dan Lim, Jaesam Yoon

(参考訳) 複数のドメインで高忠実度音声を合成するボコーダであるUniversal MelGANを提案する。数百人の話者のデータセットを用いてMelGANに基づく構造を訓練した場合の音質を維持するため,生成波形のスペクトル分解能を高めるため,マルチレゾリューション・スペクトログラム判別器を追加した。これにより、大型フットプリントモデルの高周波帯域における過平滑化問題を緩和し、マルチスピーカの現実的な波形を生成することができる。学習中に波形とスペクトログラムを識別することにより、推定速度を低下させることなく、地中データに近い信号を生成する。このモデルでは,入力としてグラウンドトルース・メル・スペクトログラムを用いて,ほとんどのシナリオで最高の平均世論スコア(MOS)を得た。特に, 話者, 感情, 言語に関して, 未認識領域において優れた性能を示した。さらに,変換器モデルで生成したメルスペクトルを用いたマルチスピーカ音声合成では,4.22MOSの高忠実度音声を合成した。これらの結果は、外部のドメイン情報なしで達成され、普遍的なボコーダとして提案モデルの可能性を強調している。

We propose Universal MelGAN, a vocoder that synthesizes high-fidelity speech in multiple domains. To preserve sound quality when the MelGAN-based structure is trained with a dataset of hundreds of speakers, we added multi-resolution spectrogram discriminators to sharpen the spectral resolution of the generated waveforms. This enables the model to generate realistic waveforms for multiple speakers, by alleviating the over-smoothing problem in the high frequency band of the large footprint model. Our structure generates signals close to ground-truth data without reducing the inference speed, by discriminating the waveform and spectrogram during training. The model achieved the best mean opinion score (MOS) in most scenarios using ground-truth mel-spectrogram as an input. In particular, it showed superior performance in unseen domains with regard to speaker, emotion, and language. Moreover, in a multi-speaker text-to-speech scenario using mel-spectrogram generated by a transformer model, it synthesized high-fidelity speech of 4.22 MOS. These results, achieved without external domain information, highlight the potential of the proposed model as a universal vocoder.

翻訳日:2022-09-23 20:43:05 公開日:2021-03-04

# (参考訳) SAFFIRE:自律的特徴フィルタリングとインテリジェントROI推定システム

SAFFIRE: System for Autonomous Feature Filtering and Intelligent ROI Estimation ( http://arxiv.org/abs/2012.02502v2 )

ライセンス: CC BY 4.0

Marco Boschi, Luigi Di Stefano, Martino Alessandrini

(参考訳) この研究は、一連の画像サンプルから支配的な再帰的イメージパターンを自動的に抽出する、SAFFIREという新しいフレームワークを導入する。このようなパターンは、多くのコンピュータビジョンや機械学習タスクにおいて共通の要件であるサンプル間のポーズの変動を排除するために使用される。このフレームワークは、自動製品検査のためのマシンビジョンシステムという文脈で特化している。ここでは、ユーザがアンカーパターンの識別を尋ね、さらに処理する前に、自動化システムがデータを正規化するために使用するのが慣例である。しかし、これは本質的に主観的で高度な専門知識を必要とする非常に敏感な操作である。これにより、SAFFIREは、ユーザに完全に透過的な方法で最適なアンカーパターンを教師なしで識別するための、ユニークで破壊的なフレームワークを提供する。 saffireは、マシンビジョン検査パイプラインの現実的なケーススタディで完全に検証されている。

This work introduces a new framework, named SAFFIRE, to automatically extract a dominant recurrent image pattern from a set of image samples. Such a pattern can be used to eliminate pose variations between samples, which is a common requirement in many computer vision and machine learning tasks. The framework is specialized here in the context of a machine vision system for automated product inspection. Here, it is customary to ask the user for the identification of an anchor pattern, to be used by the automated system to normalize data before further processing. Yet, this is a very sensitive operation which is intrinsically subjective and requires high expertise. To this end, SAFFIRE provides a unique and disruptive framework for unsupervised identification of an optimal anchor pattern in a way which is fully transparent to the user. SAFFIRE is thoroughly validated on several realistic case studies for a machine vision inspection pipeline.

翻訳日:2021-05-23 07:35:23 公開日:2021-03-04

# YieldNet:リモートセンシングデータに基づくコーンと大豆の同時収量予測のための畳み込みニューラルネットワーク

YieldNet: A Convolutional Neural Network for Simultaneous Corn and Soybean Yield Prediction Based on Remote Sensing Data ( http://arxiv.org/abs/2012.03129v2 )

ライセンス: Link先を確認

Saeed Khaki, Hieu Pham and Lizhi Wang

(参考訳) 大規模作物収量の推定は、その成長状態を通じて作物の連続的な監視を可能にするリモートセンシングデータの提供によって可能になった。この情報を持つことで、利害関係者は利回りポテンシャルを最大化するためにリアルタイムの意思決定ができる。リモートセンシングデータから収量を予測する様々なモデルが存在するが、現在では複数の作物の収量を同時に推定できるアプローチは存在しないため、より正確な予測につながる。複数の作物の収量を予測し、複数の作物の収量間の相互作用を同時に考慮するモデル。本稿では,背骨特徴抽出器の重みを共用することにより,トウモロコシと大豆の収量予測の伝達学習を利用する新しいディープラーニングフレームワークである yieldnet を用いた新しいモデルを提案する。さらに,マルチターゲット応答変数を検討するために,新しい損失関数を提案する。その結果,提案手法は収穫の1～4か月前の収量を正確に予測でき,他の最先端手法と競合することがわかった。

Large scale crop yield estimation is, in part, made possible due to the availability of remote sensing data allowing for the continuous monitoring of crops throughout their growth stages. Having this information allows stakeholders to make real-time decisions to maximize yield potential. Although various models exist that predict yield from remote sensing data, there currently does not exist an approach that can estimate yield for multiple crops simultaneously, which could lead to more accurate predictions. Such a model would predict the yield of multiple crops while concurrently considering the interaction between the crops' yields. We propose a new model called YieldNet which utilizes a novel deep learning framework that uses transfer learning between corn and soybean yield predictions by sharing the weights of the backbone feature extractor. Additionally, to consider the multi-target response variable, we propose a new loss function. Numerical results demonstrate that our proposed method accurately predicts yield from one to four months before the harvest, and is competitive with other state-of-the-art approaches.

翻訳日:2021-05-22 12:09:23 公開日:2021-03-04

# (参考訳) スパース凸ウェーブレットクラスタリングによる同時グループ化とデノーミング

Simultaneous Grouping and Denoising via Sparse Convex Wavelet Clustering ( http://arxiv.org/abs/2012.04762v2 )

ライセンス: CC BY-SA 4.0

Michael Weylandt and T. Mitchell Roddenberry and Genevera I. Allen

(参考訳) クラスタリングは、データサイエンスと信号処理におけるユビキタスな問題である。ノイズの多い信号を観測する多くのアプリケーションでは、まず最初にウェーブレットをデノイズ化し、次にクラスタリングアルゴリズムを適用することが一般的である。本稿では,グループを分離し発見する疎凸ウェーブレットクラスタリング手法を開発した。本手法では,コンベックス核融合ペナルティを用いて凝集とグループスパースペナルティを実現し,ウェーブレット領域のスパーシティを緩和する。クラスタを識別する一般的な手法とは対照的に,我々の手法は同時に実行する統一凸アプローチである。本手法は,解釈性とデータ圧縮性を両立させるデノタイズ(ウェーブレットスパース)クラスタセントロイドを生成する。本手法の合成例とNMR分光への応用について述べる。

Clustering is a ubiquitous problem in data science and signal processing. In many applications where we observe noisy signals, it is common practice to first denoise the data, perhaps using wavelet denoising, and then to apply a clustering algorithm. In this paper, we develop a sparse convex wavelet clustering approach that simultaneously denoises and discovers groups. Our approach utilizes convex fusion penalties to achieve agglomeration and group-sparse penalties to denoise through sparsity in the wavelet domain. In contrast to common practice, which denoises and then clusters, our method is a unified, convex approach that performs both simultaneously. Our method yields denoised (wavelet-sparse) cluster centroids that improve both interpretability and data compression. We demonstrate our method on synthetic examples and in an application to NMR spectroscopy.
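As a rough illustration of how a convex fusion penalty produces agglomeration, the sketch below evaluates a generic convex clustering objective: a data-fit term plus a penalty on pairwise distances between centroids. This is the textbook form with uniform pair weights, used here as an assumption; the paper's full objective also includes group-sparse penalties in the wavelet domain, which are not reproduced. The name `fusion_objective` and the parameter `lam` are hypothetical.

```python
import math


def fusion_objective(X, U, lam):
    """Generic convex clustering objective (uniform pair weights assumed).

    X:   observations, one list of coordinates per sample.
    U:   one centroid per sample, same shape as X.
    lam: strength of the fusion penalty.
    """
    # Data-fit term: half the squared distance between samples and centroids.
    fit = 0.5 * sum((x - u) ** 2 for xi, ui in zip(X, U) for x, u in zip(xi, ui))
    # Fusion term: sum of Euclidean distances between all centroid pairs.
    fuse = sum(
        math.dist(U[i], U[j]) for i in range(len(U)) for j in range(i + 1, len(U))
    )
    return fit + lam * fuse
```

When `lam` is small, keeping each centroid at its own sample is cheapest; as `lam` grows, merging centroids becomes preferable, and samples sharing a centroid form a cluster.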

翻訳日:2021-05-16 23:09:52 公開日:2021-03-04

# 不確実性整定ピラミッド一貫性による鼻咽頭癌分節の高効率化

Efficient Semi-Supervised Gross Target Volume of Nasopharyngeal Carcinoma Segmentation via Uncertainty Rectified Pyramid Consistency ( http://arxiv.org/abs/2012.07042v3 )

ライセンス: Link先を確認

Xiangde Luo, Wenjun Liao, Jieneng Chen, Tao Song, Yinan Chen, Shichuan Zhang, Nianyong Chen, Guotai Wang, Shaoting Zhang

(参考訳) 鼻咽喉頭癌(NPC)に対する放射線治療計画においてGross Target Volume(GTV)セグメンテーションは相容れない役割を担っている。畳み込みニューラルネットワーク(CNN)はこのタスクで優れたパフォーマンスを達成したが、トレーニングには大量のラベル付きイメージを頼りにしており、これは高価で取得に時間がかかる。本稿では,半教師付きNPC GTVセグメンテーションのための不確かさ確認ピラミッド整合性(URPC)正則化手法を提案する。具体的には、バックボーンセグメンテーションネットワークを拡張して、異なるスケールでピラミッド予測を生成する。ピラミッド予測ネットワーク(PPNet)は、ラベル付き画像の基底真実とラベル付き画像のマルチスケール一貫性損失によって管理されており、同じ入力に対する異なるスケールでの予測は類似し一貫性があるべきであるという事実を動機としている。しかし、これらの予測の解像度が異なるため、各ピクセルに一貫性を持たせるように促すことは、ロバスト性が低く、詳細を失う可能性がある。この問題に対処するために,新たな不確実性整流モジュールをデザインし,各規模の有意義で信頼性の高いコンセンサス領域から徐々に学習できるようにする。 258 npcのmr画像を用いたデータセット実験の結果,ラベル付き画像が10%から20%しか表示されず,ラベル付き画像を活用することによりセグメント化性能が大幅に向上し,最先端の半教師付きセグメンテーション手法を5つ上回った。さらに、50%のイメージがラベル付けされただけで、URPCは、完全に教師付き学習に近い平均82.74%のDiceスコアを達成した。

Gross Target Volume (GTV) segmentation plays an irreplaceable role in radiotherapy planning for Nasopharyngeal Carcinoma (NPC). Although Convolutional Neural Networks (CNN) have achieved good performance for this task, they rely on a large set of labeled images for training, which is expensive and time-consuming to acquire. In this paper, we propose a novel framework with Uncertainty Rectified Pyramid Consistency (URPC) regularization for semi-supervised NPC GTV segmentation. Concretely, we extend a backbone segmentation network to produce pyramid predictions at different scales. The pyramid predictions network (PPNet) is supervised by the ground truth of labeled images and a multi-scale consistency loss for unlabeled images, motivated by the fact that predictions at different scales for the same input should be similar and consistent. However, due to the different resolutions of these predictions, directly encouraging them to be consistent at each pixel has low robustness and may lose some fine details. To address this problem, we further design a novel uncertainty rectifying module to enable the framework to gradually learn from meaningful and reliable consensual regions at different scales. Experimental results on a dataset with 258 NPC MR images showed that with only 10% or 20% of the images labeled, our method largely improved the segmentation performance by leveraging the unlabeled images, and it also outperformed five state-of-the-art semi-supervised segmentation methods. Moreover, with only 50% of the images labeled, URPC achieved an average Dice score of 82.74%, which was close to that of fully supervised learning.
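The multi-scale consistency idea can be sketched as penalizing how far each scale's prediction deviates from the average over scales. The following is a simplified, unweighted version under stated assumptions: predictions are taken to be already resampled to a common size and flattened, the function name `pyramid_consistency` is hypothetical, and the uncertainty rectification described in the abstract is deliberately omitted.

```python
def pyramid_consistency(preds):
    """Mean squared deviation of each scale's prediction from the scale average.

    preds: list of probability maps, one per scale, each flattened to the
           same length (resampling to a common size is assumed done upstream).
    """
    n_scales, n_pix = len(preds), len(preds[0])
    mean = [sum(p[i] for p in preds) / n_scales for i in range(n_pix)]
    loss = sum((p[i] - mean[i]) ** 2 for p in preds for i in range(n_pix))
    return loss / (n_scales * n_pix)
```

This loss needs no labels, which is why it can be applied to the unlabeled images; a rectified variant would down-weight pixels where the scales disagree strongly instead of treating every pixel equally.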

翻訳日:2021-05-09 12:44:33 公開日:2021-03-04

# c-watcher:covid-19流行に先立ってリスクの高い地域を早期発見するためのフレームワーク

C-Watcher: A Framework for Early Detection of High-Risk Neighborhoods Ahead of COVID-19 Outbreak ( http://arxiv.org/abs/2012.12169v3 )

ライセンス: Link先を確認

Congxi Xiao, Jingbo Zhou, Jizhou Huang, An Zhuo, Ji Liu, Haoyi Xiong, Dejing Dou

(参考訳) 新型コロナウイルス(COVID-19)は日常的に流行し、いまだに世界中に波及している。非薬剤的介入のための既存の解決策は、通常、住宅地の一部分を適時かつ正確に選択して封じ込めたり隔離したりすることが必要であり、そこでは、特定された症例の空間分布が、部分集合選択の重要な基準とされてきた。このような封じ込め措置は、一部の国では新型コロナウイルスの感染拡大を食い止めたり減速させたりしているものの、確認された症例の統計はたいてい時間的に遅延し、粗粒化しているため、非効率あるいは非効率であると批判されている。この課題に対処するため,C-Watcherという新たなデータ駆動型フレームワークを提案する。C-Watcherは,新型コロナウイルスの流行に先立ち,対象都市の各地区をスクリーニングし,感染リスクを予測する。デザイン面では、C-WatcherはBaidu Mapsから大規模な人間の移動データを収集し、都市の移動パターンに基づいた一連の特徴を用いて市内のすべての住宅地区を特徴付ける。さらに, 地域発生前に, 自発的な知識を対象都市に移すため, 対象都市において特定された事例が判明する以前にも, モビリティ関連特徴から「都市不変」表現を学習し, 高リスク地域を正確に早期に検出するための新しい敵対的エンコーダフレームワークを採用する。新型コロナウイルス(covid-19)流行の初期段階において,実データ記録を用いたc-watcherの広範な実験を行い,多数の都市から高リスク地区を早期に検出するためのc-watcherの効率性と有効性を示した。

The novel coronavirus disease (COVID-19) has crushed daily routines and is still rampaging through the world. Existing solutions for nonpharmaceutical interventions usually need to select, in a timely and precise manner, a subset of residential urban areas for containment or even quarantine, where the spatial distribution of confirmed cases has been considered as a key criterion for the subset selection. While such containment measures have successfully stopped or slowed down the spread of COVID-19 in some countries, they are criticized for being inefficient or ineffective, as the statistics of confirmed cases are usually time-delayed and coarse-grained. To tackle these issues, we propose C-Watcher, a novel data-driven framework that aims at screening every neighborhood in a target city and predicting infection risks, prior to the spread of COVID-19 from epicenters to the city. In terms of design, C-Watcher collects large-scale long-term human mobility data from Baidu Maps, then characterizes every residential neighborhood in the city using a set of features based on urban mobility patterns. Furthermore, to transfer the firsthand knowledge (witnessed in epicenters) to the target city before local outbreaks, we adopt a novel adversarial encoder framework to learn "city-invariant" representations from the mobility-related features for precise early detection of high-risk neighborhoods, even before any confirmed cases are known, in the target city. We carried out extensive experiments on C-Watcher using the real-data records in the early stage of COVID-19 outbreaks, where the results demonstrate the efficiency and effectiveness of C-Watcher for early detection of high-risk neighborhoods from a large number of cities.

翻訳日:2021-04-26 07:43:19 公開日:2021-03-04

# 緊急対応システムにおける資源配分の階層的計画

Hierarchical Planning for Resource Allocation in Emergency Response Systems ( http://arxiv.org/abs/2012.13300v2 )

ライセンス: Link先を確認

Geoffrey Pettet and Ayan Mukhopadhyay and Mykel Kochenderfer and Abhishek Dubey

(参考訳) 都市規模のサイバー物理システム(CPS)における古典的な問題は、不確実性の下で資源割り当てである。通常、そのような問題はマルコフ決定過程(あるいは半マルコフ決定過程)としてモデル化される。このような問題に対して、オンライン、オフライン、分散のアプローチが適用されてきたが、大きな意思決定問題へのスケールアップは困難である。本稿では,都市レベルのCPS問題の構造を不確実性を考慮した資源配分に活用する階層的計画手法を提案する。緊急対応を事例研究として,大規模資源割当問題をどのようにより小さな問題に分割できるかを示す。次に、より小さな問題を解決し、それらの相互作用に取り組むための原則化されたフレームワークを作成します。最後に、アメリカの主要都市圏であるテネシー州ナッシュビルからの実世界データを使用して、我々のアプローチを検証する。提案手法は,緊急対応分野における最先端のアプローチよりも優れていることを示す。

A classical problem in city-scale cyber-physical systems (CPS) is resource allocation under uncertainty. Typically, such problems are modeled as Markov (or semi-Markov) decision processes. While online, offline, and decentralized approaches have been applied to such problems, they have difficulty scaling to large decision problems. We present a general approach to hierarchical planning that leverages structure in city-level CPS problems for resource allocation under uncertainty. We use the emergency response as a case study and show how a large resource allocation problem can be split into smaller problems. We then create a principled framework for solving the smaller problems and tackling the interaction between them. Finally, we use real-world data from Nashville, Tennessee, a major metropolitan area in the United States, to validate our approach. Our experiments show that the proposed approach outperforms state-of-the-art approaches used in the field of emergency response.

翻訳日:2021-04-25 08:19:26 公開日:2021-03-04

# (参考訳) FPCC:インスタンスセグメンテーションのための高速ポイントクラウドクラスタリング

FPCC: Fast Point Cloud Clustering for Instance Segmentation ( http://arxiv.org/abs/2012.14618v3 )

ライセンス: CC BY 4.0

Yajun Xu, Shogo Arai, Diyi Liu, Fangzhou Lin, Kazuhiro Kosuge

(参考訳) インスタンスセグメンテーションは、ロボット工学、自動運転車、人間とコンピュータの相互作用など、多くの現実世界のアプリケーションにおいて重要な前処理タスクである。しかし、同一クラスの複数のオブジェクトを積み重ねたビンピッキングシーンの3Dポイントクラウドインスタンスセグメンテーションについてはほとんど研究されていない。 2次元画像タスクのためのディープラーニングの急速な開発と比較すると、ディープラーニングベースの3Dポイントクラウドセグメンテーションは、まだ開発の余地がたくさんある。このような状況下では、同じクラスの多数の隠蔽対象を区別することが非常に難しい問題である。通常のビンピッキングシーンでは、オブジェクトモデルが知られ、オブジェクトの型数は1である。したがって、セマンティック情報は無視できる。代わりに、インスタンスのセグメンテーションに焦点が当てられる。このタスク要求に基づき、各インスタンスの特徴中心を推論し、残りのポイントを特徴埋め込み空間において最も近い特徴中心にクラスタリングするネットワーク(FPCC-Net)を提案する。 FPCC-Netには2つのサブネットがあり、1つはクラスタリングのための特徴中心を推測し、もう1つは各点の特徴を記述する。提案手法は,既存の3dポイントクラウドおよび2dセグメンテーション手法と比較した。 FPCC-Net は SGPN よりも平均精度 (AP) が 40 % 向上し,約 0.8 [s] で約 6 万点処理可能である。

Instance segmentation is an important pre-processing task in numerous real-world applications, such as robotics, autonomous vehicles, and human-computer interaction. However, there has been little research on 3D point cloud instance segmentation of bin-picking scenes in which multiple objects of the same class are stacked together. Compared with the rapid development of deep learning for two-dimensional (2D) image tasks, deep learning-based 3D point cloud segmentation still has a lot of room for development. In such a situation, distinguishing a large number of occluded objects of the same class is a highly challenging problem. In a usual bin-picking scene, the object model is known and the number of object types is one. Thus, the semantic information can be ignored; instead, the focus is put on the segmentation of instances. Based on this task requirement, we propose a network (FPCC-Net) that infers feature centers of each instance and then clusters the remaining points to the closest feature center in feature embedding space. FPCC-Net includes two subnets, one for inferring the feature centers for clustering and the other for describing features of each point. The proposed method is compared with existing 3D point cloud and 2D segmentation methods in some bin-picking scenes. It is shown that FPCC-Net improves average precision (AP) by about 40% over SGPN and can process about 60,000 points in about 0.8 s.
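The clustering step described in the abstract, assigning each point to the closest feature center in the embedding space, can be sketched as follows. This is an illustration only: the embeddings and the indices of the center points are assumed to come from the two subnets, and the function name `assign_to_centers` is hypothetical.

```python
import math


def assign_to_centers(embeddings, center_ids):
    """Label each point with the index of its nearest feature center.

    embeddings: one feature vector per point (assumed output of a network).
    center_ids: indices of the points chosen as instance feature centers.
    """
    centers = [embeddings[i] for i in center_ids]
    return [
        min(range(len(centers)), key=lambda k: math.dist(e, centers[k]))
        for e in embeddings
    ]
```

Each returned label is an instance id, so no semantic class prediction is involved, which matches the abstract's point that semantic information can be ignored when only one object type is present.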

翻訳日:2021-04-19 05:18:34 公開日:2021-03-04

# トリプルトマッチングネットワークによる分類学の完成

Taxonomy Completion via Triplet Matching Network ( http://arxiv.org/abs/2101.01896v3 )

ライセンス: Link先を確認

Jieyu Zhang, Xiangchen Song, Ying Zeng, Jiaze Chen, Jiaming Shen, Yuning Mao, Lei Li

(参考訳) 自動的に分類を構築することは、eコマースやWeb検索に多くの応用を見出す。重要な課題の1つは、データとビジネスのスコープが実際のアプリケーションで増大するにつれて、新しい概念が出現し、既存の分類体系に追加する必要があることである。従来のアプローチは分類学の拡張、すなわち新しいクエリ概念のための分類法から適切なハイパーニム概念を見つける。本稿では,クエリのハイパーネムとハイポネムの概念の両方を発見することで,新しいタスクである「分類完了」を定式化する。本稿では,与えられたクエリ概念に対して適切な<hypernym, hyponym>ペアを見つけるために,Triplet Matching Network (TMN)を提案する。 TMNは1つの予備スコアと複数の補助スコアからなる。これらの補助スコアラは、様々なきめ細かい信号(例えば、hypernymへのクエリやhypnymセマンティクスへのクエリ)をキャプチャし、予備スコアラは、すべての補助スコアラの内部特徴表現に基づいて、<query, hypernym, hyponym>トリプレットの全体的予測を行う。また、概念表現におけるタスク固有情報を保持する革新的なチャネルワイズゲーティング機構を導入し、さらなるモデル性能の向上を図る。実世界の4つの大規模データセットにおける実験は、tmnが既存の手法を上回って、分類完了タスクと以前の分類拡張タスクの両方において最高の性能を達成していることを示している。

Automatically constructing taxonomies finds many applications in e-commerce and web search. One critical challenge is that, as data and business scope grow in real applications, new concepts emerge and need to be added to the existing taxonomy. Previous approaches focus on taxonomy expansion, i.e., finding an appropriate hypernym concept from the taxonomy for a new query concept. In this paper, we formulate a new task, "taxonomy completion", by discovering both the hypernym and hyponym concepts for a query. We propose Triplet Matching Network (TMN) to find the appropriate <hypernym, hyponym> pairs for a given query concept. TMN consists of one primal scorer and multiple auxiliary scorers. These auxiliary scorers capture various fine-grained signals (e.g., query to hypernym or query to hyponym semantics), and the primal scorer makes a holistic prediction on the <query, hypernym, hyponym> triplet based on the internal feature representations of all auxiliary scorers. Also, an innovative channel-wise gating mechanism that retains task-specific information in concept representations is introduced to further boost model performance. Experiments on four real-world large-scale datasets show that TMN achieves the best performance on both the taxonomy completion task and the previous taxonomy expansion task, outperforming existing methods.

翻訳日:2021-04-11 00:15:54 公開日:2021-03-04

# (参考訳) 高音源から低音源言語への変換学習による音声認識精度の向上

Transfer learning from High-Resource to Low-Resource Language Improves Speech Affect Recognition Classification Accuracy ( http://arxiv.org/abs/2103.11764v1 )

ライセンス: CC BY-SA 4.0

Sara Durrani and Umair Arshad

(参考訳) 音声認識は、音声データから感情的影響を抽出する問題である。低リソース言語コーパスは後方にあり、クロスコーパス設定では影響認識が難しいタスクである。本稿では,低リソース言語における影響を認識するために,モデルが高リソース言語と微調整に基づいて訓練されるアプローチを提案する。 SAVEE, EMOVO, Urdu, IEMOCAPをベースライン精度60.45, 68.05, 80.34, 56.58パーセントで同一のコーパスでトレーニングする。言語における影響の多様性を捉えるため、クロスコーポレーション評価を詳細に論じる。トレーニングデータにドメインターゲットデータを追加することで、精度が向上することがわかった。最後に,ウルドゥー語とイタリア語の音声のuarを69.32および68.2で達成することで,低資源言語音声認識の性能が向上することを示す。

Speech Affect Recognition is the problem of extracting emotional affects from audio data. Corpora for low-resource languages are rare, and affect recognition is a difficult task in cross-corpus settings. We present an approach in which the model is trained on a high-resource language and fine-tuned to recognize affects in a low-resource language. We train the model in the same-corpus setting on SAVEE, EMOVO, Urdu, and IEMOCAP, achieving baseline accuracies of 60.45, 68.05, 80.34, and 56.58 percent respectively. To capture the diversity of affects across languages, cross-corpus evaluations are discussed in detail. We find that accuracy improves by adding the domain target data into the training data. Finally, we show that performance is improved for low-resource language speech affect recognition by achieving a UAR of 69.32 and 68.2 for Urdu and Italian speech affects.
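The abstract reports results as UAR, which is commonly read as unweighted average recall: the mean of per-class recalls, so that every affect class counts equally regardless of how many samples it has. The sketch below shows that computation; the function name and the list-based inputs are choices made for this example.

```python
def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)
```

On imbalanced data this differs from plain accuracy: a classifier that always predicts the majority class among four majority samples and one minority sample scores 0.8 accuracy but only 0.5 UAR.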

翻訳日:2021-04-05 06:44:04 公開日:2021-03-04

# (参考訳) MICCAIハッカソン : MICCAI会議における論文の再現性・多様性・選定

The MICCAI Hackathon on reproducibility, diversity, and selection of papers at the MICCAI conference ( http://arxiv.org/abs/2103.05437v1 )

ライセンス: CC BY 4.0

Fabian Balsiger, Alain Jungo, Naren Akash R J, Jianan Chen, Ivan Ezhov, Shengnan Liu, Jun Ma, Johannes C. Paetzold, Vishva Saravanan R, Anjany Sekuboyina, Suprosanna Shit, Yannick Suter, Moshood Yekini, Guodong Zeng, Markus Rempfler

(参考訳) MICCAIカンファレンスは、コミュニティの規模、コントリビューションの数、技術的成功の観点から、ここ数年で大きな成長を遂げています。しかし、この成長はコミュニティに新たな課題をもたらします。再現は困難であり,MICCAI会議への論文提出件数の増加は,選択プロセスとトピックの多様性に関する新たな疑問を提起する。これらの課題の交換、議論、創造的な解決策を見つけるために、ハッカソンの新しいフォーマットは、miccai 2020 conference: the miccai hackathonのサテライトイベントとして開始された。 MICCAIハッカソンの第1版では、MICCAI論文の再現性、多様性、選択について論じられている。小さなシンクタンクの方法で、参加者は協力してこれらの課題に対する解決策を見つけました。本報告では,MICCAIハッカソンから得られた知見を,これらの課題に対処するための即時的・長期的対策について要約する。提案手法は, 論文の再現性, 多様性, 選択性に関して, MICCAI会議を改善するための議論・行動の出発点および指針とみなすことができる。

The MICCAI conference has encountered tremendous growth over the last years in terms of the size of the community, as well as the number of contributions and their technical success. With this growth, however, come new challenges for the community. Methods are more difficult to reproduce and the ever-increasing number of paper submissions to the MICCAI conference poses new questions regarding the selection process and the diversity of topics. To exchange, discuss, and find novel and creative solutions to these challenges, a new format of a hackathon was initiated as a satellite event at the MICCAI 2020 conference: The MICCAI Hackathon. The first edition of the MICCAI Hackathon covered the topics reproducibility, diversity, and selection of MICCAI papers. In the manner of a small think-tank, participants collaborated to find solutions to these challenges. In this report, we summarize the insights from the MICCAI Hackathon into immediate and long-term measures to address these challenges. The proposed measures can be seen as starting points and guidelines for discussions and actions to possibly improve the MICCAI conference with regards to reproducibility, diversity, and selection of papers.

翻訳日:2021-04-05 06:36:12 公開日:2021-03-04

# NADI 2021:第2回Nuanced Arabic Dialect Identification Shared Task

NADI 2021: The Second Nuanced Arabic Dialect Identification Shared Task ( http://arxiv.org/abs/2103.08466v1 )

ライセンス: Link先を確認

Muhammad Abdul-Mageed, Chiyu Zhang, AbdelRahim Elmadany, Houda Bouamor, Nizar Habash

(参考訳) 本研究は,第2回Nuanced Arabic Dialect Identification Shared Task (NADI 2021)の結果と結果を報告する。この共有タスクには、国レベルの現代標準アラビア語(MSA)識別(Subtask 1.1)、国レベルの方言識別(Subtask 1.2)、州レベルのMSA識別(Subtask 2.1)、州レベルの方言識別(Subtask 2.2)の4つのサブタスクが含まれる。共有タスクデータセットは、Twitterドメインから収集された21のアラブ諸国から合計100の州をカバーする。 23か国53チームが参加登録しており、この地域のコミュニティの関心を反映している。 5チームからsubtask 1.1,8チームから27のsubtask 1.2,4チームから12のsubtask 2.1,4チームから13のsubtask 2.2の申し込みを受けました。

We present the findings and results of the Second Nuanced Arabic Dialect Identification Shared Task (NADI 2021). This Shared Task includes four subtasks: country-level Modern Standard Arabic (MSA) identification (Subtask 1.1), country-level dialect identification (Subtask 1.2), province-level MSA identification (Subtask 2.1), and province-level sub-dialect identification (Subtask 2.2). The shared task dataset covers a total of 100 provinces from 21 Arab countries, collected from the Twitter domain. A total of 53 teams from 23 countries registered to participate in the tasks, thus reflecting the interest of the community in this area. We received 16 submissions for Subtask 1.1 from five teams, 27 submissions for Subtask 1.2 from eight teams, 12 submissions for Subtask 2.1 from four teams, and 13 submissions for Subtask 2.2 from four teams.

翻訳日:2021-04-05 00:55:09 公開日:2021-03-04

# 高度運転支援システムにおける信頼に値するAIの評価リストの探索

Exploring the Assessment List for Trustworthy AI in the Context of Advanced Driver-Assistance Systems ( http://arxiv.org/abs/2103.09051v1 )

ライセンス: Link先を確認

Markus Borg, Joshua Bronson, Linus Christensson, Fredrik Olsson, Olof Lennartsson, Elias Sonnsjö, Hamid Ebabi, Martin Karsberg

(参考訳) 人工知能(AI)は、クリティカルな用途でますます使われるようになっている。したがって、信頼性の高いAIシステムの必要性は急速に高まっている。 2018年、欧州委員会は専門家をAI-HLEG(High-Level Expert Group on AI)に任命した。 AI-HLEGは、信頼できるAIを、1)合法、2)倫理的、3)堅牢であると定義し、対応する7つの重要要件を規定した。開発組織を支援するため、AI-HLEGは先頃、信頼できるAI評価リスト(ALTAI)を公開した。本稿では,機械学習(ML)を活用した高度運転支援システム(ADAS)の進行中の開発プロジェクトへのALTAIの適用例を紹介する。われわれの経験から、ALTAIはADAS開発に大部分が当てはまるが、人間の主体性や透明性に関連する特定の部分は無視できる。さらに、社会的・環境的な影響に関するより大きな問いは、ADASサプライヤーが単独で取り組むことはできない。我々は,ALTAIへの準拠を確保するためのADASの開発計画を述べる。最後に,ALTAIの次回の改訂に向けて,ライフサイクル別のバリエーション,ドメイン固有の適応,冗長性の除去という3つの推奨事項を提示する。

Artificial Intelligence (AI) is increasingly used in critical applications. Thus, the need for dependable AI systems is rapidly growing. In 2018, the European Commission appointed experts to a High-Level Expert Group on AI (AI-HLEG). AI-HLEG defined Trustworthy AI as 1) lawful, 2) ethical, and 3) robust and specified seven corresponding key requirements. To help development organizations, AI-HLEG recently published the Assessment List for Trustworthy AI (ALTAI). We present an illustrative case study from applying ALTAI to an ongoing development project of an Advanced Driver-Assistance System (ADAS) that relies on Machine Learning (ML). Our experience shows that ALTAI is largely applicable to ADAS development, but specific parts related to human agency and transparency can be disregarded. Moreover, bigger questions related to societal and environmental impact cannot be tackled by an ADAS supplier in isolation. We present how we plan to develop the ADAS to ensure ALTAI-compliance. Finally, we provide three recommendations for the next revision of ALTAI, i.e., life-cycle variants, domain-specific adaptations, and removed redundancy.

翻訳日:2021-04-05 00:54:27 公開日:2021-03-04

# トポロジカル深層学習

Topological Deep Learning ( http://arxiv.org/abs/2101.05778v2 )

ライセンス: Link先を確認

Ephy R. Love, Benjamin Filippenko, Vasileios Maroulas, Gunnar Carlsson

(参考訳) この研究は、いくつかのトポロジカルに定義された畳み込み手法を含むトポロジカルCNN(TCNN)を紹介する。自然な画像空間と重要な関係を持つ多様体は、TCNNの畳み込み重みとして使用される画像フィルタのパラメータ化に使用される。これらの多様体はまた、重みが局所化されるTCNNの層におけるスライスをパラメータ化する。従来のCNNに比べて,TCNNがより速く,少ないデータで学習し,学習パラメータが少なく,一般化性と解釈性が高いことを示す。我々は、画像データとビデオデータの両方にTCNN層を導入し、探索する。 3D画像と3Dビデオへの拡張を提案する。

This work introduces the Topological CNN (TCNN), which encompasses several topologically defined convolutional methods. Manifolds with important relationships to the natural image space are used to parameterize image filters which are used as convolutional weights in a TCNN. These manifolds also parameterize slices in layers of a TCNN across which the weights are localized. We show evidence that TCNNs learn faster, on less data, with fewer learned parameters, and with greater generalizability and interpretability than conventional CNNs. We introduce and explore TCNN layers for both image and video data. We propose extensions to 3D images and 3D video.

翻訳日:2021-03-29 00:56:35 公開日:2021-03-04

# DeepDT: Delaunay Triangulation による表面再構成の学習

DeepDT: Learning Geometry From Delaunay Triangulation for Surface Reconstruction ( http://arxiv.org/abs/2101.10353v2 )

ライセンス: Link先を確認

Yiming Luo, Zhenxing Mi, Wenbing Tao

(参考訳) 本稿では,点群のドロネー三角形分割から表面を再構成する,DeepDTと呼ばれる新しい学習ベースのネットワークを提案する。 DeepDTは、点群と対応するドロネー三角形分割から直接、ドロネー四面体の内外ラベルを予測することを学習する。局所的な幾何学的特徴はまず入力点群から抽出され、ドロネー三角形分割から得られるグラフに集約される。次に、四面体のラベル予測に構造的正則化を加えるために、集約された特徴にグラフフィルタリングを適用する。四面体と三角形の間の複雑な空間関係のため、正解表面から四面体の正解ラベルを直接生成することは不可能である。そこで我々は,四面体の内部にあるサンプリング位置のラベルを用いてその四面体のラベルを投票で決めるマルチラベル教師あり戦略を提案する。提案するDeepDTは、特にオープンシーンの内面に対して、過度に複雑な表面を生成することなく、豊富な幾何学的詳細を維持できる。一方,提案手法の汎化能力と所要時間は,最先端手法と比較して許容でき,競争力がある。実験は提案するDeepDTの優れた性能を示す。

In this paper, a novel learning-based network, named DeepDT, is proposed to reconstruct the surface from the Delaunay triangulation of a point cloud. DeepDT learns to predict inside/outside labels of Delaunay tetrahedrons directly from a point cloud and the corresponding Delaunay triangulation. The local geometry features are first extracted from the input point cloud and aggregated into a graph derived from the Delaunay triangulation. Then graph filtering is applied to the aggregated features in order to add structural regularization to the label prediction of tetrahedrons. Due to the complicated spatial relations between tetrahedrons and triangles, it is impossible to directly generate ground-truth labels of tetrahedrons from the ground-truth surface. Therefore, we propose a multilabel supervision strategy which votes for the label of a tetrahedron using the labels of sampling locations inside it. The proposed DeepDT can maintain abundant geometric details without generating overly complex surfaces, especially for inner surfaces of open scenes. Meanwhile, the generalization ability and time consumption of the proposed method are acceptable and competitive compared with state-of-the-art methods. Experiments demonstrate the superior performance of the proposed DeepDT.
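The voting step in the abstract above can be pictured with a small sketch. It assumes that each tetrahedron comes with a handful of sampled interior locations, each already labelled inside (1) or outside (0) with respect to the ground-truth surface. The function names `vote_tetra_label` and `soft_label` and the tie-breaking rule are illustrative assumptions, not taken from the paper, whose actual loss formulation may differ.

```python
from collections import Counter


def vote_tetra_label(sample_labels, tie_label=0):
    """Return the majority label among a tetrahedron's sampled locations.

    sample_labels: iterable of 0 (outside) / 1 (inside) labels.
    tie_label: label returned on a tied vote (an assumption of this sketch).
    """
    counts = Counter(sample_labels)
    inside, outside = counts.get(1, 0), counts.get(0, 0)
    if inside == outside:
        return tie_label
    return 1 if inside > outside else 0


def soft_label(sample_labels):
    """Fraction of samples labelled inside, usable as a soft target."""
    labels = list(sample_labels)
    return sum(labels) / len(labels) if labels else 0.0
```

A tetrahedron that straddles the surface gets a soft label strictly between 0 and 1, which is one way to see why a single hard ground-truth label is hard to define for it.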

翻訳日:2021-03-14 19:04:33 公開日:2021-03-04

# (参考訳) 低高度における非相関空域エンカウンターモデルの適用性とサロガビリティ

Applicability and Surrogacy of Uncorrelated Airspace Encounter Models at Low Altitudes ( http://arxiv.org/abs/2103.04753v1 )

ライセンス: CC BY 4.0

Ngaire Underhill, Andrew Weinert

(参考訳) 国立航空システム(NAS)は、安全で効率的な航空を可能にする複雑で進化したシステムです。高度なエアモビリティの概念と無人航空機のような新しい空域参入者は、全体的な安全性や効率を低下させることなくNASに統合する必要があります。例えば、航空機間の空中衝突のリスクを軽減するために、規制、基準、システムが必要となる。モンテカルロシミュレーションは航空機の衝突回避システムを開発し、評価し、認定するための基礎的な能力である。これらはしばしば人力による実験や飛行試験によって検証される。多くの航空安全研究において、有人航空機の挙動は動的ベイズネットワークを用いて表される。オリジナルの統計モデルは2008年から2013年にかけて開発され、500フィート以上の地上レベル(AGL)の安全シミュレーションをサポートした。しかし、これらのモデルは500フィートAGL以下の小さなUAS操作の安全性を評価するには十分ではなかった。その結果、高度500フィート AGL 以下の新しいモデルが2018年から開発されている。多くのモデルは、航空機の挙動は相関性がなく、航空交通サービスや近隣の航空機に依存しないと考えている。本研究の目的は、従来の航空機の様々な非相関モデルを比較し、モデルの違いを特定することである。特にロータークラフトのモデルが固定翼機のモデルと十分に異なる場合、タイプ固有のモデルが必要となる。主な貢献は、低高度運転用に設計された衝突回避システムの性能を評価する際に、非相関モデルが活用すべきガイダンスです。また、トランスポンダを使わずに非協調航空機のサロゲートモデルに対処する。

The National Airspace System (NAS) is a complex and evolving system that enables safe and efficient aviation. Advanced air mobility concepts and new airspace entrants, such as unmanned aircraft, must integrate into the NAS without degrading overall safety or efficiency. For instance, regulations, standards, and systems are required to mitigate the risk of a midair collision between aircraft. Monte Carlo simulations have been a foundational capability for decades to develop, assess, and certify aircraft conflict avoidance systems. These are often validated through human-in-the-loop experiments and flight testing. For many aviation safety studies, manned aircraft behavior is represented using dynamic Bayesian networks. The original statistical models were developed from 2008 to 2013 to support safety simulations for altitudes above 500 feet Above Ground Level (AGL). However, these models were not sufficient to assess the safety of smaller UAS operations below 500 feet AGL. In response, newer models with altitude floors below 500 feet AGL have been in development since 2018. Many of the models assume that aircraft behavior is uncorrelated and not dependent on air traffic services or nearby aircraft. Our research objective was to compare the various uncorrelated models of conventional aircraft and identify how the models differ, in particular whether models of rotorcraft were sufficiently different from models of fixed-wing aircraft to require type-specific models. The primary contribution is guidance on which uncorrelated models to leverage when evaluating the performance of a collision avoidance system designed for low altitude operations. We also address which models can be surrogates for noncooperative aircraft without transponders.

翻訳日:2021-03-10 19:01:54 公開日:2021-03-04

# アンダーサンプリングスペクトルデータを用いた光コヒーレンストモグラフィにおけるニューラルネットワークによる画像再構成

Neural network-based image reconstruction in swept-source optical coherence tomography using undersampled spectral data ( http://arxiv.org/abs/2103.03877v1 )

ライセンス: Link先を確認

Yijie Zhang, Tairan Liu, Manmohan Singh, Yilin Luo, Yair Rivenson, Kirill V. Larin, and Aydogan Ozcan

(参考訳) 光コヒーレンストモグラフィ(OCT)は、サンプルの体積画像を迅速に提供できる、広く使用されている非侵襲的なバイオメディカルイメージングモダリティである。本稿では、アンダーサンプリングされたスペクトルデータを用いて、空間エイリアシングアーティファクトのないスイープソースOCT(SS-OCT)画像を生成するディープラーニングベースの画像再構成フレームワークを提案する。このニューラルネットワークベースの画像再構成は、光学系のハードウェア変更を必要とせず、既存のスイープソースまたはスペクトル領域OCTシステムと容易に統合でき、取得する生のスペクトルデータ量を減らすことができる。本フレームワークの有効性を示すため,SS-OCTシステムにより撮像されたマウス胚サンプルを用いて深層ニューラルネットワークの訓練とブラインドテストを行った。訓練されたニューラルネットワークは、2倍にアンダーサンプリングされたスペクトルデータ(Aラインあたり640のスペクトル点)を使用して、デスクトップコンピュータ上で512本のAラインを約6.73msでブラインド再構成し、スペクトルのアンダーサンプリングによる空間エイリアシングアーティファクトを除去するとともに、全スペクトルOCTデータ(Aラインあたり1280のスペクトル点)を使用して再構成した同じサンプルの画像と非常によく一致する。また,2倍のスペクトルアンダーサンプリングと比べて再構成画像の品質はいくらか低下するものの,Aラインあたり3倍にアンダーサンプリングされたスペクトルデータを処理するようにこのフレームワークをさらに拡張できることを実証した。この深層学習による画像再構成手法は、様々な形態のスペクトル領域OCTシステムで広く使用でき、画像解像度と信号対雑音比を犠牲にすることなく撮像速度を向上させるのに役立つ。

Optical Coherence Tomography (OCT) is a widely used non-invasive biomedical imaging modality that can rapidly provide volumetric images of samples. Here, we present a deep learning-based image reconstruction framework that can generate swept-source OCT (SS-OCT) images using undersampled spectral data, without any spatial aliasing artifacts. This neural network-based image reconstruction does not require any hardware changes to the optical set-up and can be easily integrated with existing swept-source or spectral domain OCT systems to reduce the amount of raw spectral data to be acquired. To show the efficacy of this framework, we trained and blindly tested a deep neural network using mouse embryo samples imaged by an SS-OCT system. Using 2-fold undersampled spectral data (i.e., 640 spectral points per A-line), the trained neural network can blindly reconstruct 512 A-lines in ~6.73 ms using a desktop computer, removing spatial aliasing artifacts due to spectral undersampling, also presenting a very good match to the images of the same samples, reconstructed using the full spectral OCT data (i.e., 1280 spectral points per A-line). We also successfully demonstrate that this framework can be further extended to process 3x undersampled spectral data per A-line, with some performance degradation in the reconstructed image quality compared to 2x spectral undersampling. This deep learning-enabled image reconstruction approach can be broadly used in various forms of spectral domain OCT systems, helping to increase their imaging speed without sacrificing image resolution and signal-to-noise ratio.
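To make the aliasing problem in the abstract above concrete, here is a toy sketch, not the paper's network. It assumes the usual Fourier-domain picture in which a single reflector produces a cosine fringe across the spectrum and the A-line is the magnitude of its discrete Fourier transform. The depth bin `DEPTH_BIN` and the helper `dft_peak_bin` are hypothetical choices for illustration; only the 1280 and 640 spectral points per A-line come from the abstract.

```python
import cmath
import math


def dft_peak_bin(signal):
    """Index of the strongest non-DC bin in the first half of the DFT."""
    n = len(signal)
    # Precompute the n-th roots of unity once to keep the naive DFT fast.
    twiddle = [cmath.exp(-2j * math.pi * m / n) for m in range(n)]
    best_bin, best_mag = 0, -1.0
    for k in range(1, n // 2 + 1):
        acc = sum(s * twiddle[(k * i) % n] for i, s in enumerate(signal))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin


N_FULL = 1280      # spectral points per A-line (full sampling)
DEPTH_BIN = 400    # hypothetical reflector depth, in DFT bins

# Spectral fringe produced by a single reflector at DEPTH_BIN.
full = [math.cos(2 * math.pi * DEPTH_BIN * i / N_FULL) for i in range(N_FULL)]
under = full[::2]  # 2-fold undersampling: 640 spectral points

full_peak = dft_peak_bin(full)    # reflector appears at its true depth bin
under_peak = dft_peak_bin(under)  # reflector folds back to a shallower bin
```

With 640 samples the highest representable bin is 320, so the reflector at bin 400 folds back to bin 640 - 400 = 240. This folded copy is the kind of spatial aliasing artifact that the reconstruction network is trained to remove.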

翻訳日:2021-03-09 15:50:03 公開日:2021-03-04

# ニューラルネットワークに基づくネットワーク自動化のための量子化

Neural Network-based Quantization for Network Automation ( http://arxiv.org/abs/2103.04764v1 )

ライセンス: Link先を確認

Marton Kajo, Stephen S. Mwanje, Benedek Schultz, Georg Carle

(参考訳) ディープラーニング手法はモバイルネットワーク、特に高度なマシン認知のための手段を提供するネットワーク管理自動化に採用されている。ディープラーニング手法は最先端のハードウェアとソフトウェアツールを使用し、複雑な認知アルゴリズムを開発できる。最近の論文では,k-Meansアルゴリズムの修正であるBounding Sphere Quantization (BSQ)アルゴリズムを導入し,異常検出などの特定のネットワーク管理ユースケースに対して,より優れた量子化を実現することを示した。しかし、BSQはk-Meansよりもトレーニングにかなり長い時間を要し、これはニューラルネットワークベースの実装で克服できる課題である。本稿では,最先端のディープラーニングツールを用いて,競争力のあるトレーニング速度を実現するBSQの実装を提案する。

Deep Learning methods have been adopted in mobile networks, especially for network management automation where they provide means for advanced machine cognition. Deep learning methods utilize cutting-edge hardware and software tools, allowing complex cognitive algorithms to be developed. In a recent paper, we introduced the Bounding Sphere Quantization (BSQ) algorithm, a modification of the k-Means algorithm, that was shown to create better quantizations for certain network management use-cases, such as anomaly detection. However, BSQ required a significantly longer time to train than k-Means, a challenge which can be overcome with a neural network-based implementation. In this paper, we present such an implementation of BSQ that utilizes state-of-the-art deep learning tools to achieve a competitive training speed.

翻訳日:2021-03-09 15:26:42 公開日:2021-03-04

# (参考訳) 多重重要度サンプリングによる保守的最適政策最適化

Conservative Optimistic Policy Optimization via Multiple Importance Sampling ( http://arxiv.org/abs/2103.03307v1 )

ライセンス: CC BY 4.0

Achraf Azize and Othman Gaizi

(参考訳) 強化学習(rl)は,アタリゲームのプレイやgoのゲーム解決といった難しい問題を,統一的なアプローチで解決することができる。しかし、現代のディープRLアプローチは、まだ現実世界のアプリケーションでは広く使われていない。理由の1つは、既存の(すでに稼働している)ベースラインポリシーと比較して、中間実行ポリシーのパフォーマンスに対する保証がないことである。本論文では,政策最適化問題における保守的な探索を解くオンラインモデルフリーアルゴリズムを提案する。提案されたアプローチの後悔は、離散パラメータ空間と連続パラメータ空間の両方に対して $\tilde{\mathcal{O}}(\sqrt{T})$ で有界であることを示した。

Reinforcement Learning (RL) has been able to solve hard problems such as playing Atari games or solving the game of Go, with a unified approach. Yet modern deep RL approaches are still not widely used in real-world applications. One reason could be the lack of guarantees on the performance of the intermediate executed policies, compared to an existing (already working) baseline policy. In this paper, we propose an online model-free algorithm that solves conservative exploration in the policy optimization problem. We show that the regret of the proposed approach is bounded by $\tilde{\mathcal{O}}(\sqrt{T})$ for both discrete and continuous parameter spaces.

翻訳日:2021-03-09 05:37:27 公開日:2021-03-04

# (参考訳) nutrition5k:ジェネリック食品の自動栄養理解に向けて

Nutrition5k: Towards Automatic Nutritional Understanding of Generic Food ( http://arxiv.org/abs/2103.03375v1 )

ライセンス: CC BY 4.0

Quin Thames, Arjun Karpur, Wade Norris, Fangting Xia, Liviu Panait, Tobias Weyand, Jack Sim

(参考訳) 視覚データから食品の栄養分を理解することはコンピュータビジョンの問題であり、公衆衛生にポジティブで広範な影響を与える可能性がある。この領域の研究は、栄養学的理解能力を持つモデルの訓練に必要な十分な多様性やラベルが欠けている分野の既存のデータセットに限られている。本研究では,映像ストリーム,奥行き画像,成分重み,高精細な栄養コンテンツアノテーションを備えた,5kの多様な実世界の食品料理のデータセットである nutrition5kを紹介する。本稿では, 複雑で現実的な料理のカロリーおよびマクロ栄養価を, プロの栄養士を上回る精度で予測できるコンピュータビジョンアルゴリズムを訓練することにより, このデータセットの可能性を実証する。さらに,栄養予測を改善するため,深度センサデータを組み込んだベースラインを提案する。栄養理解の領域でイノベーションを加速することを願って、Nutrition5kを公にリリースします。

Understanding the nutritional content of food from visual data is a challenging computer vision problem, with the potential to have a positive and widespread impact on public health. Studies in this area are limited to existing datasets in the field that lack the diversity or labels required for training models with nutritional understanding capability. We introduce Nutrition5k, a novel dataset of 5k diverse, real-world food dishes with corresponding video streams, depth images, component weights, and high-accuracy nutritional content annotation. We demonstrate the potential of this dataset by training a computer vision algorithm capable of predicting the caloric and macronutrient values of a complex, real-world dish at an accuracy that outperforms professional nutritionists. Further, we present a baseline for incorporating depth sensor data to improve nutrition predictions. We will publicly release Nutrition5k in the hope that it will accelerate innovation in the space of nutritional understanding.

翻訳日:2021-03-09 04:57:40 公開日:2021-03-04

# (参考訳) ハードラベルマニホールド: オンマニホールドアドバンサリの例を見つけるためのクエリ効率の予期せぬ利点

Hard-label Manifolds: Unexpected Advantages of Query Efficiency for Finding On-manifold Adversarial Examples ( http://arxiv.org/abs/2103.03325v1 )

ライセンス: CC BY 4.0

Washington Garcia, Pin-Yu Chen, Somesh Jha, Scott Clouse, Kevin R. B. Butler

(参考訳) 相手の例に強いディープネットワークを設計することは、依然としてオープンな問題です。同様に、画像分類モデルに対する最近のゼロオーダーのハードラベル攻撃は、ファーストオーダーのグラデーションレベルの代替品に匹敵するパフォーマンスを示している。最近、グラデーションレベルの設定では、通常の敵対的な例がデータ多様体から離れ、オンマニホールドの例が実際には一般化エラーであることが示されている。本論文では,0次設定におけるクエリ効率が,データマニホールドを介して相手のトラバーサルと結びついていることを論じる。この振る舞いを説明するために,雑音の多い多様体距離オラクルに基づく情報理論の議論を提案し,敵の勾配推定を通じて多様体情報を漏らす。多様体勾配相互情報の数値実験により,この挙動が有効な問題次元と訓練点の数の関数として作用することを示す。実世界のデータセットと次元還元を用いた複数のゼロ次攻撃では、同じ普遍的な挙動を観察して、データ多様体に近いサンプルを生成する。この結果、モデルロバスト性にかかわらず、多様体距離測度は最大で2倍減少する。以上の結果から,多様体段階の相互情報を考慮に入れることで,将来より頑健なモデル設計に寄与し,感度の高いデータ多様体の漏洩を回避できることが示唆された。

Designing deep networks robust to adversarial examples remains an open problem. Likewise, recent zeroth order hard-label attacks on image classification models have shown comparable performance to their first-order, gradient-level alternatives. It was recently shown in the gradient-level setting that regular adversarial examples leave the data manifold, while their on-manifold counterparts are in fact generalization errors. In this paper, we argue that query efficiency in the zeroth-order setting is connected to an adversary's traversal through the data manifold. To explain this behavior, we propose an information-theoretic argument based on a noisy manifold distance oracle, which leaks manifold information through the adversary's gradient estimate. Through numerical experiments of manifold-gradient mutual information, we show this behavior acts as a function of the effective problem dimensionality and number of training points. On real-world datasets and multiple zeroth-order attacks using dimension-reduction, we observe the same universal behavior to produce samples closer to the data manifold. This results in up to two-fold decrease in the manifold distance measure, regardless of the model robustness. Our results suggest that taking the manifold-gradient mutual information into account can thus inform better robust model design in the future, and avoid leakage of the sensitive data manifold.

翻訳日:2021-03-08 18:51:34 公開日:2021-03-04

# 大規模対話型AIシステムにおけるスキルルーティングのためのニューラルモデルロバストネス--設計選択探索

Neural model robustness for skill routing in large-scale conversational AI systems: A design choice exploration ( http://arxiv.org/abs/2103.03373v1 )

ライセンス: Link先を確認

Han Li, Sunghyun Park, Aswarth Dara, Jinseok Nam, Sungjin Lee, Young-Bum Kim, Spyros Matsoukas, Ruhi Sarikaya

(参考訳) 産業における最新の大規模対話型AIまたはインテリジェントデジタルアシスタントシステムは、自動音声認識(ASR)や自然言語理解(NLU)などの一連のコンポーネントで構成されています。共有nluオントロジー(例えば集中型インテント/スロットスキーマ)を利用するシステムでは、リクエストを適切なスキルに正しくルーティングする独立したスキルルーティングコンポーネントが存在します。スキルルーティングコンポーネントは、同じインテントをサブスクライブしたり、特定のコンテキスト条件下でインテントをサブスクライブする(例えば、デバイスにはスクリーンがある)ことができる何千ものスキルがあるため、必要である。スキルルーティングモデルが本番環境にデプロイされた後、オントロジーのサブスクリプションを動的に変更する可能性があるため、スキルルーティングコンポーネントにおけるモデルの堅牢性やレジリエンスを保証することが重要な問題である。本稿では,最新の商用会話型aiシステムにおけるスキルルーティングの文脈におけるモデルロバスト性,特にデータ拡張,モデルアーキテクチャ,最適化方法に関する選択に,異なるモデリング設計選択がどう影響するかを示す。データ拡張を適用することは、モデルロバスト性を大幅に改善する非常に効果的で実用的な方法であることを示す。

Current state-of-the-art large-scale conversational AI or intelligent digital assistant systems in industry comprise a set of components such as Automatic Speech Recognition (ASR) and Natural Language Understanding (NLU). For some of these systems that leverage a shared NLU ontology (e.g., a centralized intent/slot schema), there exists a separate skill routing component to correctly route a request to an appropriate skill, which is either a first-party or third-party application that actually executes on a user request. The skill routing component is needed as there are thousands of skills that can either subscribe to the same intent and/or subscribe to an intent under specific contextual conditions (e.g., device has a screen). Ensuring model robustness or resilience in the skill routing component is an important problem since skills may dynamically change their subscription in the ontology after the skill routing model has been deployed to production. We show how different modeling design choices impact the model robustness in the context of skill routing on a state-of-the-art commercial conversational AI system, specifically on the choices around data augmentation, model architecture, and optimization method. We show that applying data augmentation can be a very effective and practical way to drastically improve model robustness.

翻訳日:2021-03-08 15:07:43 公開日:2021-03-04

# 忘れたいことを思い出す: 機械アンラーニングのためのアルゴリズム

Remember What You Want to Forget: Algorithms for Machine Unlearning ( http://arxiv.org/abs/2103.03279v1 )

ライセンス: Link先を確認

Ayush Sekhari, Jayadev Acharya, Gautam Kamath, Ananda Theertha Suresh

(参考訳) 学習済みモデルからデータポイントを忘却させる問題について検討する。この場合、学習者はまず未知の分布からi.i.d.で抽出されたデータセット$S$を受け取り、その分布からの未知のサンプルでうまく機能する予測器$w$を出力する。しかし、将来のある時点で、任意のトレーニングデータポイント$z \in S$が忘却(unlearning)を要求できるため、学習者は同じ精度保証を保ちながら出力予測器を修正する必要がある。本研究では,母集団設定における機械アンラーニングの厳密な研究を開始する。その目的は,未知のテスト損失に対する性能を維持することである。次に,凸損失関数に対するアンラーニングアルゴリズムを提供する。凸損失の設定については、最大$O(n/d^{1/4})$個のサンプルを削除できるアンラーニングアルゴリズムを提供する。ここで$d$は問題の次元である。対照的に、一般に、差分プライベート学習(アンラーニングを含意する)は$O(n/d^{1/2})$個のサンプルの削除しか保証しない。これは、削除容量の$d$への依存性の観点から、アンラーニングがプライベートな学習よりも少なくとも多項式的に効率的であることを示している。

We study the problem of forgetting datapoints from a learnt model. In this case, the learner first receives a dataset $S$ drawn i.i.d. from an unknown distribution, and outputs a predictor $w$ that performs well on unseen samples from that distribution. However, at some point in the future, any training data point $z \in S$ can request to be unlearned, thus prompting the learner to modify its output predictor while still ensuring the same accuracy guarantees. In our work, we initiate a rigorous study of machine unlearning in the population setting, where the goal is to maintain performance on the unseen test loss. We then provide unlearning algorithms for convex loss functions. For the setting of convex losses, we provide an unlearning algorithm that can delete up to $O(n/d^{1/4})$ samples, where $d$ is the problem dimension. In comparison, in general, differentially private learning (which implies unlearning) only guarantees deletion of $O(n/d^{1/2})$ samples. This shows that unlearning is at least polynomially more efficient than learning privately in terms of dependence on $d$ in the deletion capacity.
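The gap between the two deletion capacities in the abstract above is easier to see with numbers. The following is a rough sketch that ignores the constants and logarithmic factors hidden by the $O(\cdot)$ notation; the helper names `deletion_capacity` and `capacity_ratio` and the sample values of $n$ and $d$ are illustrative assumptions.

```python
def deletion_capacity(n, d, exponent):
    """Order-of-magnitude deletion capacity n / d**exponent (constants ignored)."""
    return n / d ** exponent


def capacity_ratio(d):
    """Ratio (n / d**0.25) / (n / d**0.5), which simplifies to d**0.25."""
    return d ** 0.25


n, d = 10_000, 10_000
unlearning = deletion_capacity(n, d, 0.25)  # rate stated for unlearning
private = deletion_capacity(n, d, 0.5)      # rate stated for private learning
```

The ratio between the two rates is $d^{1/4}$, so the advantage of dedicated unlearning grows polynomially with the problem dimension and does not depend on $n$.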

翻訳日:2021-03-08 15:04:09 公開日:2021-03-04

# 二階情報によるモーメントの補正

Correcting Momentum with Second-order Information ( http://arxiv.org/abs/2103.03265v1 )

ライセンス: Link先を確認

Hoang Tran, Ashok Cutkosky

(参考訳) 非凸確率最適化のための新しいアルゴリズムを開発し、最適な$O(\epsilon^{-3})$回の確率的勾配およびヘッセ行列ベクトル積の計算で$\epsilon$臨界点を求める。我々のアルゴリズムは、モーメンタム付きSGDのモーメンタムにおけるバイアス項を「修正」するためにヘッセ行列ベクトル積を用いる。これにより、分散低減法に類似した方法で勾配推定が改善される。従来の研究とは対照的に、過度に大きなバッチサイズ(あるいはバッチサイズに関するいかなる制限も)は必要とせず、我々のアルゴリズムとその解析ははるかに単純である。さまざまな大規模ディープラーニングベンチマークとアーキテクチャで結果を検証し、SGDやAdamに対する改善を確認した。

We develop a new algorithm for non-convex stochastic optimization that finds an $\epsilon$-critical point in the optimal $O(\epsilon^{-3})$ stochastic gradient and Hessian-vector product computations. Our algorithm uses Hessian-vector products to "correct" a bias term in the momentum of SGD with momentum. This leads to better gradient estimates in a manner analogous to variance reduction methods. In contrast to prior work, we do not require excessively large batch sizes (or indeed any restrictions at all on the batch size), and both our algorithm and its analysis are much simpler. We validate our results on a variety of large-scale deep learning benchmarks and architectures, where we see improvements over SGD and Adam.
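The idea of correcting momentum with a Hessian-vector product can be sketched on a toy problem. The update below, $g_t = (1-\alpha)\,(g_{t-1} + H_t (x_t - x_{t-1})) + \alpha \nabla f(x_t)$, is one plausible form of such a correction and is an assumption of this sketch. The paper's exact update, step sizes, and any normalization or clipping are not reproduced here. To keep the behaviour checkable, the sketch uses a noise-free quadratic $f(x) = \tfrac12 \sum_i a_i x_i^2$ instead of stochastic gradients.

```python
def grad(x, a):
    """Gradient of f(x) = 0.5 * sum(a_i * x_i**2)."""
    return [ai * xi for ai, xi in zip(a, x)]


def hvp(v, a):
    """Hessian-vector product; the Hessian of f is diag(a)."""
    return [ai * vi for ai, vi in zip(a, v)]


def corrected_momentum_descent(x0, a, lr=0.1, alpha=0.1, steps=100):
    """Gradient descent whose momentum is transported by a Hessian-vector term.

    Returns the final iterate and the largest gap observed between the
    momentum estimate and the true gradient at the same point.
    """
    x_prev = list(x0)
    g = grad(x_prev, a)
    x = [xi - lr * gi for xi, gi in zip(x_prev, g)]
    max_bias = 0.0
    for _ in range(steps):
        # Move the old estimate from x_prev to x using the Hessian.
        correction = hvp([xi - xp for xi, xp in zip(x, x_prev)], a)
        fresh = grad(x, a)
        g = [(1 - alpha) * (gi + ci) + alpha * fi
             for gi, ci, fi in zip(g, correction, fresh)]
        max_bias = max(max_bias, max(abs(gi - fi) for gi, fi in zip(g, fresh)))
        x_prev, x = x, [xi - lr * gi for xi, gi in zip(x, g)]
    return x, max_bias


x_final, max_bias = corrected_momentum_descent([1.0, 1.0], [1.0, 4.0])
```

On this quadratic the corrected estimate matches the true gradient up to rounding, so `max_bias` stays near zero. Plain momentum would instead average stale gradients from earlier iterates, which is the bias the correction term targets.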

翻訳日:2021-03-08 15:02:14 公開日:2021-03-04

# ランダムSHAPのアンサンブル

Ensembles of Random SHAPs ( http://arxiv.org/abs/2103.03302v1 )

ライセンス: Link先を確認

Lev V. Utkin and Andrei V. Konstantinov

(参考訳) ブラックボックスモデルの局所説明のためのよく知られたSHapley Additive exPlanations (SHAP) 法のアンサンブルに基づく修正が提案されている。修正は、多くの機能がある場合に計算コストがかかるshapを単純化することを目的としている。提案された修正の背景にある主な考え方は、少数の特徴を持つSHAPのアンサンブルによってSHAPを近似することである。 ER-SHAPと呼ばれる最初の修正では、いくつかの特徴が特徴集合からランダムに選択され、その特徴のShapley値は「小さい」SHAPによって計算される。説明結果は、最終的なShapley値を得るために平均されます。 ERW-SHAPと呼ばれる2番目の修正では、説明されたインスタンスの周りに多様性のためにいくつかのポイントが生成され、その説明の結果はポイントと説明されたインスタンスの距離に応じて重みと結合される。 ER-SHAP-RFと呼ばれる第3の修正は、ER-SHAPのアンサンブルベースの手順における特徴の選択に適用される特徴確率分布の予備的説明にランダムフォレストを用いている。提案された修正を例示する多くの数値実験は、その効率と局所的な説明のための特性を示す。

Ensemble-based modifications of the well-known SHapley Additive exPlanations (SHAP) method for the local explanation of a black-box model are proposed. The modifications aim to simplify SHAP which is computationally expensive when there is a large number of features. The main idea behind the proposed modifications is to approximate SHAP by an ensemble of SHAPs with a smaller number of features. According to the first modification, called ER-SHAP, several features are randomly selected many times from the feature set, and Shapley values for the features are computed by means of "small" SHAPs. The explanation results are averaged to get the final Shapley values. According to the second modification, called ERW-SHAP, several points are generated around the explained instance for diversity purposes, and results of their explanation are combined with weights depending on distances between points and the explained instance. The third modification, called ER-SHAP-RF, uses the random forest for preliminary explanation of instances and determining a feature probability distribution which is applied to selection of features in the ensemble-based procedure of ER-SHAP. Many numerical experiments illustrating the proposed modifications demonstrate their efficiency and properties for local explanation.
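The first modification described above, ER-SHAP, can be sketched as repeated exact Shapley computations on small random feature subsets. This is a minimal sketch under stated assumptions, not the authors' implementation: features outside a coalition are replaced by a fixed `baseline` vector, and each feature's result is averaged over the rounds in which it was drawn. The function names `shapley_on_subset` and `er_shap` are hypothetical.

```python
import itertools
import math
import random


def shapley_on_subset(f, x, baseline, subset):
    """Exact Shapley values of the features in `subset` for the instance x.

    Features not in the current coalition are set to their baseline value.
    """
    m = len(subset)

    def value(coalition):
        z = list(baseline)
        for j in coalition:
            z[j] = x[j]
        return f(z)

    phi = {}
    for i in subset:
        others = [j for j in subset if j != i]
        total = 0.0
        for r in range(len(others) + 1):
            weight = (math.factorial(r) * math.factorial(m - r - 1)
                      / math.factorial(m))
            for s in itertools.combinations(others, r):
                total += weight * (value(s + (i,)) - value(s))
        phi[i] = total
    return phi


def er_shap(f, x, baseline, subset_size, n_rounds, seed=0):
    """Average 'small' Shapley explanations over random feature subsets."""
    rng = random.Random(seed)
    sums = [0.0] * len(x)
    counts = [0] * len(x)
    for _ in range(n_rounds):
        subset = rng.sample(range(len(x)), subset_size)
        for i, v in shapley_on_subset(f, x, baseline, subset).items():
            sums[i] += v
            counts[i] += 1
    # Features that were never drawn get 0.0 in this sketch.
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]
```

Each round costs on the order of $2^k$ model evaluations for a subset of size $k$ instead of $2^m$ for all $m$ features, which is the source of the saving. For an additive model the subset-level values already equal the full Shapley values, so any approximation error comes from feature interactions.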

翻訳日:2021-03-08 15:02:00 公開日:2021-03-04

# ラベルシフトによる分類の分布自由不確実性定量化

Distribution-free uncertainty quantification for classification under label shift ( http://arxiv.org/abs/2103.03323v1 )

ライセンス: Link先を確認

Aleksandr Podkopaev, Aaditya Ramdas

(参考訳) MLモデルの信頼できる展開には、特に安全クリティカルなアプリケーションにおいて、不確実性の適切な測定が必要である。我々は,分類問題に対する不確実性定量化(UQ)について,共形予測を用いた予測集合と,事後的なビニングによる確率的予測器のキャリブレーションという2つの経路に焦点をあてる。これらはi.i.d.データに対して分布によらない保証を持つためである。i.i.d.設定を超えて一般化する2つの一般的な方法には、共変量シフトとラベルシフトの扱いが含まれる。分布によらないUQの文脈では、前者は既に注目を集めているが、後者はそうではない。ラベルシフトは予測を損なうことが知られており、我々はまず、カバレッジとキャリブレーションの劣化を示すことで、UQも損なうことを論じる。ラベルシフトへの対処(より良い予測のため)に関する最近の進歩を踏まえ、ターゲット分布からのラベルなしデータが利用可能な場合に、上記の共形予測およびキャリブレーションの手順を再重み付けすることでUQを達成する正しい方法を検討する。これらの手法を分布によらない枠組みで理論的に検討し,その優れた実用性能を示す。

Trustworthy deployment of ML models requires a proper measure of uncertainty, especially in safety-critical applications. We focus on uncertainty quantification (UQ) for classification problems via two avenues -- prediction sets using conformal prediction and calibration of probabilistic predictors by post-hoc binning -- since these possess distribution-free guarantees for i.i.d. data. Two common ways of generalizing beyond the i.i.d. setting include handling covariate and label shift. Within the context of distribution-free UQ, the former has already received attention, but not the latter. It is known that label shift hurts prediction, and we first argue that it also hurts UQ, by showing degradation in coverage and calibration. Piggybacking on recent progress in addressing label shift (for better prediction), we examine the right way to achieve UQ by reweighting the aforementioned conformal and calibration procedures whenever some unlabeled data from the target distribution is available. We examine these techniques theoretically in a distribution-free framework and demonstrate their excellent practical performance.
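The reweighting idea in the abstract above can be sketched for split conformal prediction. The sketch assumes the class-prior ratios $w(y) = q(y)/p(y)$ (target prior over source prior) are already known, whereas the abstract describes estimating them from unlabeled target data. It follows the general weighted-conformal recipe and is not claimed to match the paper's exact procedure. `weighted_threshold` and `prediction_set` are hypothetical names.

```python
import math


def weighted_threshold(cal_scores, cal_labels, class_weights,
                       candidate_label, alpha):
    """Weighted (1 - alpha) quantile of the calibration nonconformity scores.

    Each calibration point is weighted by the importance weight of its label;
    the test point contributes the candidate label's weight at +infinity.
    """
    cal_w = [class_weights[y] for y in cal_labels]
    total = sum(cal_w) + class_weights[candidate_label]
    # Small tolerance guards against floating-point round-off at the boundary.
    target = (1.0 - alpha) * total - 1e-12
    cumulative = 0.0
    for score, w in sorted(zip(cal_scores, cal_w)):
        cumulative += w
        if cumulative >= target:
            return score
    return math.inf


def prediction_set(test_scores, cal_scores, cal_labels, class_weights, alpha):
    """Labels whose nonconformity score falls below their weighted threshold.

    test_scores maps each candidate label to the test point's score for it.
    """
    return [y for y, s in test_scores.items()
            if s <= weighted_threshold(cal_scores, cal_labels,
                                       class_weights, y, alpha)]
```

With all weights equal, the threshold reduces to the usual split-conformal quantile, the $\lceil (n+1)(1-\alpha) \rceil$-th smallest calibration score. Under label shift, classes that are more frequent in the target distribution receive more weight, which shifts the threshold toward their score distribution.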

翻訳日:2021-03-08 15:01:41 公開日:2021-03-04

# 医用画像解析における深層学習一般化の複雑性評価

Evaluation of Complexity Measures for Deep Learning Generalization in Medical Image Analysis ( http://arxiv.org/abs/2103.03328v1 )

ライセンス: Link先を確認

Aleksandar Vakanski, Min Xian

(参考訳) 医用画像解析のためのディープラーニングモデルの一般化誤差は、データ取得、デバイス設定、患者集団のために異なるデバイスで収集された画像に対してしばしば減少する。新しい画像に対する一般化能力の理解が深層学習における臨床医の信頼性に不可欠である。近年,一般化限界と複雑性尺度の確立に向けた研究が盛んに行われているが,予測と実際の一般化性能との間には大きな差があることが多い。同様に、関連する大規模な実証研究は、主に汎用画像データセットによる検証に基づいている。本稿では,乳房超音波画像における25種類の複雑性尺度と教師付き深層学習分類器の一般化能力の相関について検討する。結果は,PAC-Bayes平坦度とパスノルムに基づく尺度が,モデルとデータの組み合わせについて最も一貫した説明をもたらすことを示唆している。また,乳房画像に対するマルチタスク分類とセグメンテーション手法の利用について検討し,これらの学習手法が暗黙の正規化として機能し,一般化の促進に寄与することを示す。

The generalization performance of deep learning models for medical image analysis often decreases on images collected with different data-acquisition devices, device settings, or patient populations. A better understanding of the generalization capacity on new images is crucial for clinicians' trust in deep learning. Although significant research efforts have recently been directed toward establishing generalization bounds and complexity measures, there is still often a significant discrepancy between the predicted and actual generalization performance. Likewise, related large empirical studies have been primarily based on validation with general-purpose image datasets. This paper presents an empirical study that investigates the correlation between 25 complexity measures and the generalization abilities of supervised deep learning classifiers for breast ultrasound images. The results indicate that PAC-Bayes flatness-based and path norm-based measures produce the most consistent explanation for the combination of models and data. We also investigate the use of a multi-task classification and segmentation approach for breast images, and report that such a learning approach acts as an implicit regularizer and is conducive to improved generalization.

翻訳日:2021-03-08 15:00:01 公開日:2021-03-04

# PVG at WASSA 2021: 共感と苦痛の予測のためのマルチ入力、マルチタスク、トランスフォーマーベースのアーキテクチャ

PVG at WASSA 2021: A Multi-Input, Multi-Task, Transformer-Based Architecture for Empathy and Distress Prediction ( http://arxiv.org/abs/2103.03296v1 )

ライセンス: Link先を確認

Atharva Kulkarni, Sunanda Somwase, Shivam Rajput, and Manisha Marathe

(参考訳) 共感と苦痛という感情現象に関する活発な研究は、人間と機械の相互作用を改善するうえで非常に貴重である。これらの構成概念は心理学理論に深く根ざしているため、テキストデータからそのような複雑な感情の強度を予測することは困難である。したがって、より良い予測のためには、心理テストのスコア、人口統計学的特徴、根底にある潜在的な原始感情、テキストの底流とその心理的複雑さなどの補助要因を考慮することが不可欠である。本稿では,WASSA 2021のニュース記事に対する共感と感情の予測に関する共有タスクに対するチームPVGのソリューションについて述べる。テキストデータ,人口統計学的特徴,心理テストのスコア,および原始感情と共感の本質的な相互依存性を利用して,共感スコア予測タスクのためのマルチ入力・マルチタスクフレームワークを提案する。ここでは、共感スコア予測を主タスクとし、感情と共感の分類を副次的な補助タスクとする。苦痛スコア予測タスクでは、語彙的特徴の追加により、システムをさらに強化する。我々の提出は、平均相関(0.545)および苦痛相関(0.574)で1位、共感のピアソン相関(0.517)で2位にランクされた。

Active research pertaining to the affective phenomenon of empathy and distress is invaluable for improving human-machine interaction. Predicting intensities of such complex emotions from textual data is difficult, as these constructs are deeply rooted in the psychological theory. Consequently, for better prediction, it becomes imperative to take into account ancillary factors such as the psychological test scores, demographic features, underlying latent primitive emotions, along with the text's undertone and its psychological complexity. This paper proffers team PVG's solution to the WASSA 2021 Shared Task on Predicting Empathy and Emotion in Reaction to News Stories. Leveraging the textual data, demographic features, psychological test score, and the intrinsic interdependencies of primitive emotions and empathy, we propose a multi-input, multi-task framework for the task of empathy score prediction. Here, the empathy score prediction is considered the primary task, while emotion and empathy classification are considered secondary auxiliary tasks. For the distress score prediction task, the system is further boosted by the addition of lexical features. Our submission ranked 1$^{st}$ based on the average correlation (0.545) as well as the distress correlation (0.574), and 2$^{nd}$ for the empathy Pearson correlation (0.517).

翻訳日:2021-03-08 14:59:27 公開日:2021-03-04

# HLAの多重特徴表現を用いた腎臓移植生存予測

Predicting Kidney Transplant Survival using Multiple Feature Representations for HLAs ( http://arxiv.org/abs/2103.03305v1 )

ライセンス: Link先を確認

Mohammadreza Nemati, Haonan Zhang, Michael Sloma, Dulat Bekbolsynov, Hong Wang, Stanislaw Stepkowski, and Kevin S. Xu

(参考訳) 腎移植は末期腎疾患患者の生活水準を大幅に向上させることができる。腎移植の移植生存時間(移植が失敗し、患者が別の移植を受けるまでの時間)に影響を与える重要な要因は、ドナーと受取人のヒト白血球抗原(HLA)の適合性である。本稿では,HLA情報を機械学習による生存分析アルゴリズムに組み込む生体関連特徴表現を提案する。提案したHLA特徴表現は10万以上の移植のデータベース上で評価し, 約1%の精度で予測精度が向上し, 患者レベルでは緩やかだが, 社会的レベルでは有意義であることが確認された。生存時間の正確な予測は、移植後の生存率を改善でき、受け手へのドナーの割り当てが向上し、移植片の不全による再移植の回数が減少する。

Kidney transplantation can significantly enhance living standards for people suffering from end-stage renal disease. A significant factor that affects graft survival time (the time until the transplant fails and the patient requires another transplant) for kidney transplantation is the compatibility of the Human Leukocyte Antigens (HLAs) between the donor and recipient. In this paper, we propose new biologically-relevant feature representations for incorporating HLA information into machine learning-based survival analysis algorithms. We evaluate our proposed HLA feature representations on a database of over 100,000 transplants and find that they improve prediction accuracy by about 1%, modest at the patient level but potentially significant at a societal level. Accurate prediction of survival times can improve transplant survival outcomes, enabling better allocation of donors to recipients and reducing the number of re-transplants due to graft failure with poorly matched donors.

翻訳日:2021-03-08 14:58:46 公開日:2021-03-04

# ガウス過程がニューラルオードを満たす:希少データと雑音データから部分的に観測されたシステムのダイナミクスを学ぶベイズ的枠組み

Gaussian processes meet NeuralODEs: A Bayesian framework for learning the dynamics of partially observed systems from scarce and noisy data ( http://arxiv.org/abs/2103.03385v1 )

ライセンス: Link先を確認

Mohamed Aziz Bhouri and Paris Perdikaris

(参考訳) 本稿では,非線形力学系の部分的,雑音的,不規則な観測からベイズ系を同定する機械学習フレームワーク(GP-NODE)を提案する。提案手法は微分可能計画における最近の発展を利用して、通常の微分方程式解法を通じて勾配情報を伝播し、ハミルトニアンモンテカルロサンプリングとガウス過程を用いた未知のモデルパラメータに対してベイズ推論を行う。これにより,観測データの時間的相関を活用し,不確かさを定量化したモデル上での後方分布を効率的に推定することができる。さらに、自由モデルパラメータにフィンランドのホースシューのような疎開促進優先度を使用することにより、基礎となる潜在ダイナミクスに対する解釈可能で同義表現の発見が可能になる。捕食者予備システム,システム生物学,50次元ヒューマンモーションダイナミクスシステムを含む提案GP-NODE法の有効性を示すために,一連の数値的研究を行った。総合すると、不確実性の下でデータ駆動モデル発見のための新しい、柔軟で堅牢なワークフローが生まれました。この原稿に付随するすべてのコードとデータは、 \url{https://github.com/PredictiveIntelligenceLab/GP-NODEs} でオンラインで入手できます。

This paper presents a machine learning framework (GP-NODE) for Bayesian systems identification from partial, noisy and irregular observations of nonlinear dynamical systems. The proposed method takes advantage of recent developments in differentiable programming to propagate gradient information through ordinary differential equation solvers and perform Bayesian inference with respect to unknown model parameters using Hamiltonian Monte Carlo sampling and Gaussian Process priors over the observed system states. This allows us to exploit temporal correlations in the observed data, and efficiently infer posterior distributions over plausible models with quantified uncertainty. Moreover, the use of sparsity-promoting priors such as the Finnish Horseshoe for free model parameters enables the discovery of interpretable and parsimonious representations for the underlying latent dynamics. A series of numerical studies is presented to demonstrate the effectiveness of the proposed GP-NODE method including predator-prey systems, systems biology, and a 50-dimensional human motion dynamical system. Taken together, our findings put forth a novel, flexible and robust workflow for data-driven model discovery under uncertainty. All code and data accompanying this manuscript are available online at \url{https://github.com/PredictiveIntelligenceLab/GP-NODEs}.

翻訳日:2021-03-08 14:56:53 公開日:2021-03-04

# ソーシャルメディアのダンス映像から身近な人物の忠実度を学習する

Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos ( http://arxiv.org/abs/2103.03319v1 )

ライセンス: Link先を確認

Yasamin Jafarian, Hyun Soo Park

(参考訳) 服を着る人間の幾何学を学ぶための重要な課題は、地上の真実データ(例えば、3Dスキャンされたモデル)の限られた可用性にある。さまざまな外観、衣料品スタイル、パフォーマンス、アイデンティティにまたがるソーシャルメディアダンスビデオの数:我々は、新しいデータリソースを利用して、この課題に取り組みます。それぞれのビデオは、1人の身体と衣服のダイナミックな動きを描いているが、3D地上の真実の幾何学は欠如している。これらの映像を利用するために,予測された人物の局所的幾何を異なるタイミングで他の人物の局所的形状にワープする,局所的変換を用いた新しい手法を提案する。これにより、予測に対する時間的コヒーレンスを強制する自己超越が可能となる。さらに, 局所的なテクスチャ, しわ, 日陰に応答する表面の正常値とともに, 幾何的一貫性を最大化することにより, 深度を共に学習する。本手法はエンドツーエンドで訓練可能であり,入力実画像に忠実な微細形状を予測できる高忠実度深さ推定を行う。本手法は,実画像とレンダリング画像の両方において,最先端の人間の深度推定と人間の形状復元アプローチに勝ることを示す。

A key challenge of learning the geometry of dressed humans lies in the limited availability of ground truth data (e.g., 3D scanned models), which degrades the performance of 3D human reconstruction when applied to real-world imagery. We address this challenge by leveraging a new data resource: a number of social media dance videos that span diverse appearance, clothing styles, performances, and identities. Each video depicts dynamic movements of the body and clothes of a single person while lacking the 3D ground truth geometry. To utilize these videos, we present a new method that uses a local transformation to warp the predicted local geometry of the person from one image to that of another image at a different time instant. This enables self-supervision by enforcing temporal coherence over the predictions. In addition, we jointly learn the depth along with the surface normals that are highly responsive to local texture, wrinkle, and shade by maximizing their geometric consistency. Our method is end-to-end trainable, resulting in high fidelity depth estimation that predicts fine geometry faithful to the input real image. We demonstrate that our method outperforms the state-of-the-art human depth estimation and human shape recovery approaches on both real and rendered images.

翻訳日:2021-03-08 14:52:59 公開日:2021-03-04

# PolarNet: 極域における自動車レーダを用いた深部オープンスペースのセグメンテーション

PolarNet: Accelerated Deep Open Space Segmentation Using Automotive Radar in Polar Domain ( http://arxiv.org/abs/2103.03387v1 )

ライセンス: Link先を確認

Farzan Erlik Nowruzi, Dhanvin Kolhatkar, Prince Kapoor, Elnaz Jahani Heravi, Fahed Al Hassanat, Robert Laganiere, Julien Rebut, Waqas Malik

(参考訳) カメラとライダー処理は、ディープラーニングモデルアーキテクチャの急速な開発に革命をもたらした。自動車レーダーは、自動運転支援と自動運転システムの重要な要素の1つです。 Radarはまだ、カメラやLidarベースの方法とは異なり、従来の信号処理技術に依存している。これが最も堅牢な知覚システムを実現するための欠落したリンクだと考えています。運転可能なスペースと占有スペースを特定することは、あらゆる自律的な意思決定タスクの最初のステップです。この目的のためにしばしば、環境の占有グリッドマップ表現が使用される。本稿では,空間分割のための極域におけるレーダ情報を処理するディープニューラルモデルであるPolarNetを提案する。さまざまな入出力表現を探ります。本実験では,PolarNetがレーダデータを処理する有効な方法であり,小型化を維持しながら最新の性能と処理速度を達成できることを示した。

Camera and Lidar processing have been revolutionized with the rapid development of deep learning model architectures. Automotive radar is one of the crucial elements of automated driver assistance and autonomous driving systems. Radar still relies on traditional signal processing techniques, unlike camera and Lidar based methods. We believe this is the missing link to achieve the most robust perception system. Identifying drivable space and occupied space is the first step in any autonomous decision making task. Occupancy grid map representation of the environment is often used for this purpose. In this paper, we propose PolarNet, a deep neural model to process radar information in polar domain for open space segmentation. We explore various input-output representations. Our experiments show that PolarNet is an effective way to process radar data that achieves state-of-the-art performance and processing speeds while maintaining a compact size.

翻訳日:2021-03-08 14:52:37 公開日:2021-03-04

# BERTに基づくランキングモデルを用いた転帰学習と擬似ラベルの体系的評価

A Systematic Evaluation of Transfer Learning and Pseudo-labeling with BERT-based Ranking Models ( http://arxiv.org/abs/2103.03335v1 )

ライセンス: Link先を確認

Iurii Mokrii, Leonid Boytsov, Pavel Braslavski

(参考訳) アノテーションコストが高いため、既存の人為的トレーニングデータを最大限に活用することは重要な研究方向です。そこで本研究では,5つの英語データセットにまたがるBERTに基づくニューラルランキングモデルの伝達性に関する体系的評価を行った。これまでの研究では、大きなデータセットから少数のクエリを持つデータセットへのゼロショットと少数ショットの転送に重点を置いていた。対照的に、各コレクションには膨大な数のクエリがあり、フルショット評価モードを可能にし、結果の信頼性を向上させます。さらに、ソースデータセットのライセンスはしばしば商用利用を禁止しているため、転送学習とBM25スコアラーが生成した擬似ラベルのトレーニングを比較する。擬似ラベルのトレーニング -- おそらくは、わずかな数の注釈付きクエリを使った微調整 -- は、トランスファーラーニングと比較して、競争力や優れたモデルを生み出すことができる。しかし、数発訓練の安定性と/または有効性を改善する必要があるため、事前訓練されたモデルの性能を低下させることができる場合もある。

Due to high annotation costs, making the best use of existing human-created training data is an important research direction. We, therefore, carry out a systematic evaluation of transferability of BERT-based neural ranking models across five English datasets. Previous studies focused primarily on zero-shot and few-shot transfer from a large dataset to a dataset with a small number of queries. In contrast, each of our collections has a substantial number of queries, which enables a full-shot evaluation mode and improves reliability of our results. Furthermore, since source dataset licences often prohibit commercial use, we compare transfer learning to training on pseudo-labels generated by a BM25 scorer. We find that training on pseudo-labels -- possibly with subsequent fine-tuning using a modest number of annotated queries -- can produce a competitive or better model compared to transfer learning. However, there is a need to improve the stability and/or effectiveness of the few-shot training, which, in some cases, can degrade performance of a pretrained model.
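
The abstract above mentions pseudo-labels generated by a BM25 scorer. As a rough illustration only, the following sketch scores tokenized documents with the standard Okapi BM25 formula and marks the top-ranked documents as pseudo-positives. The function names, the `top_k` rule, and the parameter defaults (`k1=1.2`, `b=0.75`) are assumptions made for this example and are not taken from the paper.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query with Okapi BM25.

    Assumes `docs` is a non-empty list of non-empty token lists.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1.0 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length-normalized term-frequency saturation.
            norm = tf[term] + k1 * (1.0 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1.0) / norm
        scores.append(score)
    return scores


def pseudo_labels(query, docs, top_k=1):
    """Hypothetical labeling rule: the top_k BM25-ranked documents get label 1."""
    scores = bm25_scores(query, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    positives = set(ranked[:top_k])
    return [1 if i in positives else 0 for i in range(len(docs))]
```

Labels produced this way need no human annotation, which is why they sidestep the licensing concern raised in the abstract; how the authors actually select positives and negatives is not stated there.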

翻訳日:2021-03-08 14:48:34 公開日:2021-03-04

# GPU指向データ通信アーキテクチャによる大規模グラフ畳み込みネットワークトレーニング

Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture ( http://arxiv.org/abs/2103.03330v1 )

ライセンス: Link先を確認

Seung Won Min, Kun Wu, Sitao Huang, Mert Hidayeto\u{g}lu, Jinjun Xiong, Eiman Ebrahimi, Deming Chen, Wen-mei Hwu

(参考訳) グラフ畳み込みネットワーク(GCNs)は大規模グラフベースのレコメンデーションシステムでますます採用されている。 GCNのトレーニングには、ミニバッチジェネレーターがグラフを横断し、隣接するノードをサンプリングして特徴を得る必要があります。現実のグラフはGPUメモリの容量を超えることが多いため、現在のGCNトレーニングシステムは、フィーチャーテーブルをホストメモリに保持し、GPUに送信する前にスパース機能を集めるためにCPUに依存している。しかしこのアプローチは、ホストメモリの帯域幅とCPUに大きなプレッシャーを与えます。これは、CPUが(1)メモリからスパース機能を読み込み、(2)高密度フォーマットとしてメモリに機能を書き込み、(3)メモリからGPUに機能を転送する必要があるためである。本研究では、GPUスレッドがCPUの助けなしにゼロコピーアクセスを介してホストメモリのスパースな機能に直接アクセスする、GCNトレーニングのための新しいGPU指向データ通信アプローチを提案する。 CPU収集段階を除去することにより、ホストリソースの消費とデータアクセス遅延を大幅に低減する。さらに,GPUによる高ホストメモリアクセス効率を実現するために,(1)PCIeパケット効率を最大化する自動データアクセスアドレスアライメント,(2)非同期ゼロコピーアクセスとカーネル実行の2つの重要な技術を提案する。提案手法をPyTorchに組み込んで,最大1億1100万ノードと16億エッジのグラフを用いて,その有効性を評価する。マルチGPUトレーニングのセットアップでは、従来のデータ転送方法よりも65〜92%高速で、GPUメモリに収まるグラフのオールインGPUメモリトレーニングのパフォーマンスも一致させることができます。

Graph Convolutional Networks (GCNs) are increasingly adopted in large-scale graph-based recommender systems. Training a GCN requires the minibatch generator to traverse graphs and sample the sparsely located neighboring nodes to obtain their features. Since real-world graphs often exceed the capacity of GPU memory, current GCN training systems keep the feature table in host memory and rely on the CPU to collect sparse features before sending them to the GPUs. This approach, however, puts tremendous pressure on host memory bandwidth and the CPU. This is because the CPU needs to (1) read sparse features from memory, (2) write features into memory as a dense format, and (3) transfer the features from memory to the GPUs. In this work, we propose a novel GPU-oriented data communication approach for GCN training, where GPU threads directly access sparse features in host memory through zero-copy accesses without much CPU help. By removing the CPU gathering stage, our method significantly reduces the consumption of the host resources and data access latency. We further present two important techniques to achieve high host memory access efficiency by the GPU: (1) automatic data access address alignment to maximize PCIe packet efficiency, and (2) asynchronous zero-copy access and kernel execution to fully overlap data transfer with training. We incorporate our method into PyTorch and evaluate its effectiveness using several graphs with sizes up to 111 million nodes and 1.6 billion edges. In a multi-GPU training setup, our method is 65-92% faster than the conventional data transfer method, and can even match the performance of all-in-GPU-memory training for some graphs that fit in GPU memory.

翻訳日:2021-03-08 14:48:19 公開日:2021-03-04

# 変分スパースガウス過程に対するMCMCについて:擬マリナルアプローチ

On MCMC for variationally sparse Gaussian processes: A pseudo-marginal approach ( http://arxiv.org/abs/2103.03321v1 )

ライセンス: Link先を確認

Karla Monterrubio-G\'omez and Sara Wade

(参考訳) ガウス過程(GP)は、機械学習や統計学において強力なモデルを構築するために頻繁に用いられる。しかし,GPを実際に使用する場合には,計算負担の増大,後部近似,共分散関数の選択,ハイパーパラメータの推測など,重要な考慮が必要である。これらの問題に対処するため、Hensman氏ら。 (2015) は変分スパースGPとマルコフ連鎖モンテカルロ(MCMC)を組み合わせて、GPモデルのスケーラブルで柔軟な一般的なフレームワークを導出する。それでも、結果として得られるアプローチは、多くの観測モデルに対して難解な可能性評価を必要とする。そこで本研究では,この問題を回避すべく,漸近的精密な推論と2倍確率推定器による計算利得を提供する疑似マージナル(pm)スキームを提案する。複素モデルでは、PMスキームの利点は特に顕著であり、非パラメトリック共分散関数を持つ2レベルGP回帰モデルで非定常性を捉えることを実証する。

Gaussian processes (GPs) are frequently used in machine learning and statistics to construct powerful models. However, when employing GPs in practice, important considerations must be made regarding the high computational burden, approximation of the posterior, choice of the covariance function and inference of its hyperparameters. To address these issues, Hensman et al. (2015) combine variationally sparse GPs with Markov chain Monte Carlo (MCMC) to derive a scalable, flexible and general framework for GP models. Nevertheless, the resulting approach requires intractable likelihood evaluations for many observation models. To bypass this problem, we propose a pseudo-marginal (PM) scheme that offers asymptotically exact inference as well as computational gains through doubly stochastic estimators for the intractable likelihood and large datasets. In complex models, the advantages of the PM scheme are particularly evident, and we demonstrate this on a two-level GP regression model with a nonparametric covariance function to capture non-stationarity.

翻訳日:2021-03-08 14:44:14 公開日:2021-03-04

# IrrMapper-U-Net のマッピングへの深層学習アプローチ

A Deep Learning Approach to Mapping Irrigation: IrrMapper-U-Net ( http://arxiv.org/abs/2103.03278v1 )

ライセンス: Link先を確認

Thomas Colligan, David Ketchum, Douglas Brinkerhoff, Marco Maneta

(参考訳) 水資源の理解と管理には正確な灌水マップが不可欠である。本研究では,2000年から2019年までのモンタナ州における新しい灌水マッピング法を提案し,その精度を実証する。この手法は、ランドサット画像からの反射情報を用いて、IrrMapper-U-Netと呼ぶ灌水画素を分類する畳み込みニューラルネットワークのアンサンブルに基づいている。この方法論は広範な機能工学に依存しておらず、既存の地理空間データセットからの土地利用情報と分類を条件としない。アンサンブルは網羅的なハイパーパラメータチューニングを必要とせず、分析パイプラインはパーソナルコンピュータに実装できるほど軽量である。さらに,提案手法は分類に関連する不確実性を推定する。本研究は,高度に高精度な空間拡張型地中真理データセットを用いて,郡規模のusda調査とキャダストラムサーベイを用いて,方法論と得られた灌水地図を評価した。その結果,本手法はモンタナ州において,総合的精度と精度の点で他の方法よりも優れていることがわかった。農林水産省の農林省全国農業統計調査では,他の方法に比べて灌水地域の推定値と州全体で一致しており,降水地域における委託の誤差は極めて少ないことがわかった。このメソッドは、クラウドをマスクし、監視なしでLandsat 7スキャンライン障害を無視することを学び、データを前処理する必要性を減らします。この手法は、アメリカ合衆国全土とランドサットの完全な記録に適用される可能性がある。

Accurate maps of irrigation are essential for understanding and managing water resources. We present a new method of mapping irrigation and demonstrate its accuracy for the state of Montana from years 2000-2019. The method is based on an ensemble of convolutional neural networks that use reflectance information from Landsat imagery to classify irrigated pixels, which we call IrrMapper-U-Net. The methodology does not rely on extensive feature engineering and does not condition the classification with land use information from existing geospatial datasets. The ensemble does not need exhaustive hyperparameter tuning and the analysis pipeline is lightweight enough to be implemented on a personal computer. Furthermore, the proposed methodology provides an estimate of the uncertainty associated with classification. We evaluated our methodology and the resulting irrigation maps using a highly accurate novel spatially-explicit ground truth data set, using county-scale USDA surveys of irrigation extent, and using cadastral surveys. We found that our method outperforms other methods of mapping irrigation in Montana in terms of overall accuracy and precision. We found that our method agrees better statewide with the USDA National Agricultural Statistics Survey estimates of irrigated area compared to other methods, and has far fewer errors of commission in rainfed agriculture areas. The method learns to mask clouds and ignore Landsat 7 scan-line failures without supervision, reducing the need for preprocessing data. This methodology has the potential to be applied across the entire United States and for the complete Landsat record.

翻訳日:2021-03-08 14:43:55 公開日:2021-03-04

# CLAIMED - Trusted AIのためのビジュアルでスケーラブルなコンポーネントライブラリ

CLAIMED, a visual and scalable component library for Trusted AI ( http://arxiv.org/abs/2103.03281v1 )

ライセンス: Link先を確認

Romeo Kienzler and Ivan Nesic

(参考訳) ディープラーニングモデルの人気はますます高まっているが、説明可能性、敵対的堅牢性、公平性に関する制約は、プロダクションデプロイメントの大きな懸念であることが多い。オープンソースエコシステムはこれらの懸念に対処するために豊富ですが、完全に統合されたエンドツーエンドのシステムはオープンソースに欠けています。したがって、IBMとUniversity Hospital Baselの共同作業であるKubernetes上に、完全にオープンソースで再利用可能なコンポーネントフレームワーク、プロダクショングレードの機械学習用のビジュアルエディタと実行エンジンを提供します。 Kubeflow Pipelines、AI Explainability360ツールキット、AI Fairness360ツールキット、ElyraAI、Kubeflow、Kubernetes、JupyterLab上にAdversarial Robustness Toolkitを使用しています。 Elyraパイプラインエディタを使用すると、一連のJupyterノートブックでAIパイプラインを視覚的に開発できます。

Deep Learning models are getting more and more popular, but constraints on explainability, adversarial robustness and fairness are often major concerns for production deployment. Although the open source ecosystem offers abundant tools addressing those concerns, fully integrated, end-to-end systems are lacking in open source. Therefore, we provide an entirely open source, reusable component framework, visual editor and execution engine for production grade machine learning on top of Kubernetes, a joint effort between IBM and the University Hospital Basel. It uses Kubeflow Pipelines, the AI Explainability360 toolkit, the AI Fairness360 toolkit and the Adversarial Robustness Toolkit on top of ElyraAI, Kubeflow, Kubernetes and JupyterLab. Using the Elyra pipeline editor, AI pipelines can be developed visually with a set of Jupyter notebooks.

翻訳日:2021-03-08 14:40:51 公開日:2021-03-04

# プライオリティの見直し$k$-Center:フェアネスとアウトリーチ

Revisiting Priority $k$-Center: Fairness and Outliers ( http://arxiv.org/abs/2103.03337v1 )

ライセンス: Link先を確認

Tanvi Bajpai, Deeparnab Chakrabarty, Chandra Chekuri, Maryam Negahbani

(参考訳) 優先度 $k$-Center 問題では、入力は計量空間 $(X,d)$ と整数 $k$ と、各点 $v \in X$ の優先度半径 $r(v)$ からなる。目標は $k$-centers $S \subseteq X$ を選び、$\max_{v \in X} \frac{1}{r(v)} d(v,S)$ を最小化することである。すべての$r(v)$ が一様であれば、古典的な $k$-center 問題が得られる。 Plesn\'ik [Plesn\'ik, Disc。アプリ。数学。 1987年]この問題を導入し、バニラ$k$-centerの最良のアルゴリズムと一致する2ドルの近似アルゴリズムを与えた。この問題は、フェアクラスタリングの2つの異なる概念 [Harris et al., NeurIPS 2018; Jung et al., FORC 2020] とどのように関連しているかを示します。これらの開発に動機づけられて、我々は問題を再検討し、私たちの主な技術的貢献では、優先度$ k$-Centerの定数因子近似アルゴリズムを生成するフレームワークを開発します。我々のフレームワークは、行列やknapsackの制約に$k$-Centerのプライオリティの一般化にまで拡張され、また、論理的にハリスらの宝くじモデルにおいて公平性を保証するアルゴリズムも得られる。

In the Priority $k$-Center problem, the input consists of a metric space $(X,d)$, an integer $k$ and for each point $v \in X$ a priority radius $r(v)$. The goal is to choose $k$-centers $S \subseteq X$ to minimize $\max_{v \in X} \frac{1}{r(v)} d(v,S)$. If all $r(v)$'s were uniform, one obtains the classical $k$-center problem. Plesn\'ik [Plesn\'ik, Disc. Appl. Math. 1987] introduced this problem and gave a $2$-approximation algorithm matching the best possible algorithm for vanilla $k$-center. We show how the problem is related to two different notions of fair clustering [Harris et al., NeurIPS 2018; Jung et al., FORC 2020]. Motivated by these developments we revisit the problem and, in our main technical contribution, develop a framework that yields constant factor approximation algorithms for Priority $k$-Center with outliers. Our framework extends to generalizations of Priority $k$-Center to matroid and knapsack constraints, and as a corollary, also yields algorithms with fairness guarantees in the lottery model of Harris et al.
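
To make the objective in the abstract above concrete, the sketch below evaluates $\max_{v \in X} \frac{1}{r(v)} d(v,S)$ for a candidate center set and finds an optimal set by exhaustive search on tiny inputs. It only illustrates the problem definition; it is not the approximation framework of the paper, and the function names are hypothetical.

```python
from itertools import combinations


def priority_k_center_cost(points, radii, centers, dist):
    """Return max over points v of d(v, S) / r(v), with d(v, S) the nearest-center distance."""
    worst = 0.0
    for v, r in zip(points, radii):
        nearest = min(dist(v, c) for c in centers)
        worst = max(worst, nearest / r)
    return worst


def brute_force_centers(points, radii, k, dist):
    """Try every k-subset of the points as centers; feasible only for very small inputs."""
    return min(
        combinations(points, k),
        key=lambda centers: priority_k_center_cost(points, radii, centers, dist),
    )
```

For example, with points `[0, 1, 10]` on a line, radii `[1, 1, 5]`, `k = 1` and absolute difference as the metric, the best single center is the point `1` with cost 1.8: the far point at `10` tolerates a larger distance because its priority radius is larger.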

翻訳日:2021-03-08 14:40:34 公開日:2021-03-04

# WaveGuard: オーディオアドバイザリの例を理解して修正する

WaveGuard: Understanding and Mitigating Audio Adversarial Examples ( http://arxiv.org/abs/2103.03344v1 )

ライセンス: Link先を確認

Shehzeen Hussain, Paarth Neekhara, Shlomo Dubnov, Julian McAuley, Farinaz Koushanfar

(参考訳) 近年,ディープラーニングに基づく自動音声認識(ASR)システムに対する敵対的攻撃が急増している。これらの攻撃はディープラーニングのセキュリティに新たな課題をもたらし、安全クリティカルなアプリケーションにASRシステムをデプロイすることに大きな懸念を引き起こしました。本稿では,asrシステムを攻撃するために開発された逆入力を検出するフレームワークwaveguardを紹介する。本フレームワークは,音声変換機能を組み込んで原音声と変換音声のasr転写を解析し,逆入力を検出する。我々は,近年の4つの音声対向攻撃によって構築された対向的事例を,様々な音声変換関数を用いて確実に検出できることを実証した。防衛評価におけるベストプラクティスを慎重に検討し,音声領域における適応的かつ強固な攻撃に耐える防衛力とその強みを分析した。我々は,音声を知覚的情報から復元する音声変換が,完全なホワイトボックス設定であっても,適応的相手に対して堅牢な強い防御につながることを実証的に実証した。さらに、WaveGuardはすぐに使用でき、任意のASRモデルと直接統合され、モデルの再トレーニングを必要とせずに、オーディオの逆転例を効率的に検出できます。

There has been a recent surge in adversarial attacks on deep-learning-based automatic speech recognition (ASR) systems. These attacks pose new challenges to deep learning security and have raised significant concerns in deploying ASR systems in safety-critical applications. In this work, we introduce WaveGuard: a framework for detecting adversarial inputs that are crafted to attack ASR systems. Our framework incorporates audio transformation functions and analyses the ASR transcriptions of the original and transformed audio to detect adversarial inputs. We demonstrate that our defense framework is able to reliably detect adversarial examples constructed by four recent audio adversarial attacks, with a variety of audio transformation functions. With careful regard for best practices in defense evaluations, we analyze our proposed defense and its strength to withstand adaptive and robust attacks in the audio domain. We empirically demonstrate that audio transformations that recover audio from perceptually informed representations can lead to a strong defense that is robust against an adaptive adversary even in a complete white-box setting. Furthermore, WaveGuard can be used out of the box and integrated directly with any ASR model to efficiently detect audio adversarial examples, without the need for model retraining.
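
The detection idea described above, comparing the ASR transcription of the original audio with that of a transformed copy, might be sketched as follows. This is an illustration under stated assumptions: the comparison metric (character error rate), the threshold value, and the `transcribe` and `transform` callables are all hypothetical stand-ins, not the framework's actual interface.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance normalized by the reference length."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return edit_distance(reference, hypothesis) / len(reference)


def is_adversarial(audio, transcribe, transform, threshold=0.5):
    """Flag the input when the two transcriptions disagree by more than `threshold`.

    `transcribe` (audio -> text) and `transform` (audio -> audio) are caller-supplied
    placeholders; the default threshold is an arbitrary illustrative value.
    """
    original_text = transcribe(audio)
    transformed_text = transcribe(transform(audio))
    return character_error_rate(original_text, transformed_text) > threshold
```

The intuition is that a benign recording should transcribe to roughly the same text before and after a mild transformation, while a carefully crafted perturbation is more likely to be disrupted by it. Because the check only wraps an existing model, it needs no retraining, which matches the claim in the abstract.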

翻訳日:2021-03-08 14:35:37 公開日:2021-03-04

# (参考訳) コヒーレントリスクに対する政策勾配の収束と最適性について

On the Convergence and Optimality of Policy Gradient for Coherent Risk ( http://arxiv.org/abs/2103.02827v1 )

ライセンス: CC BY 4.0

Audrey Huang, Liu Leqi, Zachary C. Lipton, Kamyar Azizzadenesheli

(参考訳) 強化学習におけるリスク回避をモデル化するために、新たな研究ラインでは、よく知られたアルゴリズムを使用してコヒーレントリスク関数(条件付きリスク(CVaR)を含むクラス)を最適化する。マルコフの決定プロセスではコヒーレントリスクの最適化は困難であるため、最近の研究では、時間の一貫性のある代理であるマルコフコヒーレントリスク(MCR)に焦点を当てる傾向にある。政策勾配 (PG) の更新はこの目的のために導出されているが、(i) PG が MCR にグローバルに最適であるかどうか、(ii) トラクタブルな方法で勾配を推定する方法は不明である。本稿では,mcrの目的が(期待値と異なり)勾配が支配的ではなく,定常点が一般にグローバルに最適であることを保証するものではないことを実証する。さらに,目的の非線形性とリスク回避の程度に依存することを特徴として,学習方針の最適性に対する厳密な上限を示す。対処法(ii)では, 従来の制限を克服するために, 状態分布の重み付けを用いたPGの実践的実装を提案する。実験を通じて,最適性ギャップが小さい場合,pgはリスクに敏感な方針を学習できることを実証する。しかし、大きな最適性ギャップを持つインスタンスは豊富で構築が容易であり、将来の研究における重要な課題を概説する。

In order to model risk aversion in reinforcement learning, an emerging line of research adapts familiar algorithms to optimize coherent risk functionals, a class that includes conditional value-at-risk (CVaR). Because optimizing the coherent risk is difficult in Markov decision processes, recent work tends to focus on the Markov coherent risk (MCR), a time-consistent surrogate. While policy gradient (PG) updates have been derived for this objective, it remains unclear (i) whether PG finds a global optimum for MCR; (ii) how to estimate the gradient in a tractable manner. In this paper, we demonstrate that, in general, MCR objectives (unlike the expected return) are not gradient dominated and that stationary points are not, in general, guaranteed to be globally optimal. Moreover, we present a tight upper bound on the suboptimality of the learned policy, characterizing its dependence on the nonlinearity of the objective and the degree of risk aversion. Addressing (ii), we propose a practical implementation of PG that uses state distribution reweighting to overcome previous limitations. Through experiments, we demonstrate that when the optimality gap is small, PG can learn risk-sensitive policies. However, we find that instances with large suboptimality gaps are abundant and easy to construct, outlining an important challenge for future research.
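
Conditional value-at-risk, named above as one example of a coherent risk functional, can be estimated from samples as the mean of the worst outcomes. The sketch below assumes one common convention (larger values are worse losses, and `alpha` is the confidence level); it illustrates the risk measure only, not the Markov coherent risk objective or the policy gradient estimator studied in the paper.

```python
import math


def empirical_cvar(losses, alpha=0.9):
    """Mean of the worst ceil((1 - alpha) * n) losses, where larger loss means worse.

    With alpha = 0 this reduces to the ordinary sample mean (the risk-neutral case).
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    ordered = sorted(losses, reverse=True)
    # Round before ceil so floating-point noise such as 3.0000000000000004 does not add a sample.
    tail = max(1, math.ceil(round((1.0 - alpha) * len(ordered), 9)))
    return sum(ordered[:tail]) / tail
```

For losses `1, 2, ..., 10`, `empirical_cvar(losses, alpha=0.8)` averages the two largest values and returns 9.5, whereas `alpha=0` returns the plain mean 5.5. Raising `alpha` therefore corresponds to a higher degree of risk aversion.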

翻訳日:2021-03-08 00:21:40 公開日:2021-03-04

# (参考訳) 動的語彙を用いた感情制御対話応答生成モデル

An Emotion-controlled Dialog Response Generation Model with Dynamic Vocabulary ( http://arxiv.org/abs/2103.02878v1 )

ライセンス: CC BY 4.0

Shuangyong Song, Kexin Wang, Chao Wang, Haiqing Chen, Huan Chen

(参考訳) 応答生成タスクでは、適切な感情表現は、応答の人間的様レベルを明らかに改善することができる。しかし,オンラインシステムにおける実際の応用には,高QPS(オンラインシステムのフローキャパシティの指標)が必要であり,動的語彙機構が生成モデルの高速化に有効であることが証明されている。本稿では,動的語彙機構に基づく感情制御型対話応答生成モデルを提案し,実験結果から本モデルの有用性が示された。

In response generation tasks, proper sentimental expressions can clearly improve how human-like the responses are. However, for real applications in online systems, high QPS (queries per second, an indicator of the flow capacity of online systems) is required, and a dynamic vocabulary mechanism has been proven effective in improving the speed of generative models. In this paper, we propose an emotion-controlled dialog response generation model based on the dynamic vocabulary mechanism, and the experimental results show the benefit of this model.

翻訳日:2021-03-07 23:08:23 公開日:2021-03-04

# (参考訳) スペクトルDefense:フーリエ領域におけるCNNの敵攻撃の検出

SpectralDefense: Detecting Adversarial Attacks on CNNs in the Fourier Domain ( http://arxiv.org/abs/2103.03000v1 )

ライセンス: CC BY 4.0

Paula Harder, Franz-Josef Pfreundt, Margret Keuper, Janis Keuper

(参考訳) 多くのコンピュータビジョンや画像解析タスクにおける畳み込みニューラルネットワーク(CNN)の成功にもかかわらず、それらはいわゆる敵対的な攻撃に対して脆弱のままです。防御は敵の例を検出することである。本稿では,入力画像と特徴マップのフーリエ領域における解析を用いて,良質なテストサンプルと敵画像の区別を行う方法を示す。第1報では,入力画像の大きさスペクトルを用いて敵の攻撃を検出する手法を提案する。このシンプルで堅牢な分類器は、3つの一般的な攻撃方法の敵対的摂動をうまく検出できます。第2の方法は、第1に構築され、さらにネットワークの異なる層における特徴マップのフーリエ係数の位相を抽出する。この拡張により、5つの異なる攻撃方法における最先端検出器と比較して、敵検出率を向上させることができる。

Despite the success of convolutional neural networks (CNNs) in many computer vision and image analysis tasks, they remain vulnerable to so-called adversarial attacks: small, crafted perturbations in the input images can lead to false predictions. A possible defense is to detect adversarial examples. In this work, we show how analysis in the Fourier domain of input images and feature maps can be used to distinguish benign test samples from adversarial images. We propose two novel detection methods. The first method employs the magnitude spectrum of the input images to detect an adversarial attack. This simple and robust classifier can successfully detect adversarial perturbations of three commonly used attack methods. The second method builds upon the first and additionally extracts the phase of Fourier coefficients of feature maps at different layers of the network. With this extension, we are able to improve adversarial detection rates compared to state-of-the-art detectors on five different attack methods.
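
The first detector described above takes the magnitude spectrum of the input image as its feature. As a minimal sketch of that feature extraction step alone, the code below computes the 2D discrete Fourier transform magnitude of a small grayscale image in pure Python. The function name is hypothetical, the classifier trained on these features is omitted, and the naive quadruple loop is for illustration only (a practical version would use an FFT routine such as `numpy.fft.fft2`).

```python
import cmath
import math


def dft2_magnitude(image):
    """Magnitude of the 2D DFT of a grayscale image given as a list of equal-length rows."""
    rows, cols = len(image), len(image[0])
    spectrum = []
    for u in range(rows):
        out_row = []
        for v in range(cols):
            total = 0j
            for x in range(rows):
                for y in range(cols):
                    angle = -2.0 * math.pi * (u * x / rows + v * y / cols)
                    total += image[x][y] * cmath.exp(1j * angle)
            # Keep only the magnitude; the phase is discarded in this sketch.
            out_row.append(abs(total))
        spectrum.append(out_row)
    return spectrum
```

A constant 4x4 image has all of its energy in the zero-frequency bin, while adding a faint checkerboard pattern of amplitude 0.1 produces a new peak of magnitude 1.6 at the highest frequency bin `(2, 2)`. This shows how a perturbation that is small in pixel space can stand out clearly in the spectrum.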

翻訳日:2021-03-07 23:04:14 公開日:2021-03-04

# (参考訳) there and back again: 変動要因の分離のための集合全体のサイクル一貫性

There and back again: Cycle consistency across sets for isolating factors of variation ( http://arxiv.org/abs/2103.03240v1 )

ライセンス: CC BY 4.0

Kieran A. Murphy, Varun Jampani, Srikumar Ramalingam, Ameesh Makadia

(参考訳) 表現学習は、データの変動の基盤となる説明的要因の集合を解き放つタスクにかかっている。本研究では,変動要因をサブセットに限定したグループ化(grouping)という形で,データに関する限られた情報や,集合メンバシップ(set membership)という設定で運用する。私たちの目標は、グループ間で共通する変化の要因を分離する表現を学ぶことです。我々の重要な洞察は、異なる集合に属する画像の学習された埋め込み間の集合(CCS)間のサイクル一貫性の利用である。セット管理を利用する他の手法とは対照的に、CCSは変化の要因に対する制約を著しく少なくし、非常に広い範囲の設定で適用でき、トレーニングデータの一部に対してのみセットメンバーシップを利用することができる。 shapes3dからデータセットをキュレートすることで,学習表現と既知の生成因子の相互情報を通してccsの有効性を定量化する。さらに,デジタルスタイル分離と合成オブジェクトポーズ転送のタスクに対するCSの適用性を実証し,これを用いた生成的アプローチとの比較を行った。

Representational learning hinges on the task of unraveling the set of underlying explanatory factors of variation in data. In this work, we operate in the setting where limited information is known about the data in the form of groupings, or set membership, where the underlying factors of variation are restricted to a subset. Our goal is to learn representations which isolate the factors of variation that are common across the groupings. Our key insight is the use of cycle consistency across sets (CCS) between the learned embeddings of images belonging to different sets. In contrast to other methods utilizing set supervision, CCS can be applied with significantly fewer constraints on the factors of variation, across a remarkably broad range of settings, and only utilizing set membership for some fraction of the training data. By curating datasets from Shapes3D, we quantify the effectiveness of CCS through mutual information between the learned representations and the known generative factors. In addition, we demonstrate the applicability of CCS to the tasks of digit style isolation and synthetic-to-real object pose transfer and compare to generative approaches utilizing the same supervision.

翻訳日:2021-03-07 21:51:22 公開日:2021-03-04

# (参考訳) コードの普遍表現

Universal Representation for Code ( http://arxiv.org/abs/2103.03116v1 )

ライセンス: CC BY 4.0

Linfeng Liu, Hoan Nguyen, George Karypis, Srinivasan Sengamedu

(参考訳) ソースコードから学ぶには、通常大量のラベル付きデータが必要です。ラベル付きデータの不足の可能性にもかかわらず、トレーニングされたモデルはタスク固有であり、異なるタスクへの転送性に欠ける。本稿では,新しいグラフベースのコード表現の上に,コードの普遍表現を生成するための効果的な事前学習戦略を提案する。特に、私たちのグラフベースの表現は、コード要素(例えば、制御フローとデータフロー)間の重要なセマンティクスをキャプチャします。我々は、グラフニューラルネットワークの表現を事前学習し、普遍的なコード特性を抽出する。事前トレーニングされたモデルは、様々な下流アプリケーションをサポートするための微調整を可能にする。3000万以上のJavaメソッドと77万のPythonメソッドにまたがる実世界の2つのデータセットでモデルを評価する。可視化により、普遍的なコード表現における識別特性を明らかにする。複数のベンチマークを比較することで,提案フレームワークがメソッド名予測とコードグラフリンク予測の最先端結果を実現することを示す。

Learning from source code usually requires a large amount of labeled data. Despite the possible scarcity of labeled data, the trained model is highly task-specific and lacks transferability to different tasks. In this work, we present effective pre-training strategies on top of a novel graph-based code representation, to produce universal representations for code. Specifically, our graph-based representation captures important semantics between code elements (e.g., control flow and data flow). We pre-train graph neural networks on the representation to extract universal code properties. The pre-trained model then enables the possibility of fine-tuning to support various downstream applications. We evaluate our model on two real-world datasets -- spanning over 30M Java methods and 770K Python methods. Through visualization, we reveal discriminative properties in our universal code representation. By comparing multiple benchmarks, we demonstrate that the proposed framework achieves state-of-the-art results on method name prediction and code graph link prediction.

翻訳日:2021-03-07 20:43:47 公開日:2021-03-04

# (参考訳) 生涯学習の現実的なシナリオとしての継続的協調

Continuous Coordination As a Realistic Scenario for Lifelong Learning ( http://arxiv.org/abs/2103.03216v1 )

ライセンス: CC BY-SA 4.0

Hadi Nekoei, Akilesh Badrinaaraayanan, Aaron Courville, Sarath Chandar

(参考訳) 現在の深層強化学習(RL)アルゴリズムは依然としてタスク固有であり、新しい環境に一般化する能力がない。しかし、LLL(Lifelong Learning)は、タスク間の知識を効率的に転送し、使用することにより、複数のタスクを順次解決することを目指しています。近年の生涯RLへの関心の高まりにもかかわらず、現実的なテストベッドの欠如はLLLアルゴリズムの堅牢な評価を困難にします。一方、マルチエージェントRL(MARL)は、エージェントのポリシーが時間とともに変化するため、その固有の非定常性のため、寿命の長いRLの自然なシナリオと見なすことができる。本研究では,ゼロショット設定と少数ショット設定の両方をサポートするマルチエージェント生涯学習テストベッドを提案する。私たちのセットアップは、部分的に観察可能で完全に協力的なマルチエージェントゲームであるhanabiをベースにしています。その大きな戦略空間は、生涯RLタスクにとって望ましい環境である。最近のMARL法、および制限メモリおよび計算システムにおける最新のLLLアルゴリズムのベンチマークを評価し、それらの長所と短所を明らかにします。この継続的な学習パラダイムは、MARLで最も一般的に使用されるトレーニングプロトコルである集中型トレーニングを超えて実用的な方法を提供します。我々は経験的に、我々の設定で訓練されたエージェントは、以前の作業による追加の仮定なしに、未発見のエージェントとうまく協調できることを示します。

Current deep reinforcement learning (RL) algorithms are still highly task-specific and lack the ability to generalize to new environments. Lifelong learning (LLL), however, aims at solving multiple tasks sequentially by efficiently transferring and using knowledge between tasks. Despite a surge of interest in lifelong RL in recent years, the lack of a realistic testbed makes robust evaluation of LLL algorithms difficult. Multi-agent RL (MARL), on the other hand, can be seen as a natural scenario for lifelong RL due to its inherent non-stationarity, since the agents' policies change over time. In this work, we introduce a multi-agent lifelong learning testbed that supports both zero-shot and few-shot settings. Our setup is based on Hanabi -- a partially-observable, fully cooperative multi-agent game that has been shown to be challenging for zero-shot coordination. Its large strategy space makes it a desirable environment for lifelong RL tasks. We evaluate several recent MARL methods, and benchmark state-of-the-art LLL algorithms in limited memory and computation regimes to shed light on their strengths and weaknesses. This continual learning paradigm also provides us with a pragmatic way of going beyond centralized training which is the most commonly used training protocol in MARL. We empirically show that the agents trained in our setup are able to coordinate well with unseen agents, without any additional assumptions made by previous works.

翻訳日:2021-03-07 20:31:37 公開日:2021-03-04

# (参考訳) 定量化法の比較評価

A Comparative Evaluation of Quantification Methods ( http://arxiv.org/abs/2103.03223v1 )

ライセンス: CC BY 4.0

Tobias Schumacher, Markus Strohmaier, Florian Lemmerich

(参考訳) 量子化は、与えられたターゲットセットのクラス分布を予測する問題を表す。また、近年、さまざまなアルゴリズムが提案されている教師付き機械学習の研究分野も拡大しています。しかし,アルゴリズム選択をサポートする定量化手法の包括的比較は,まだ行われていない。本研究では,24種類の数値化手法の実証的性能比較を徹底することで,この研究ギャップを埋める。バイナリのさまざまなシナリオとマルチクラス量子化設定を検討するため、40のデータセット上で約300万回の実験実行を実施しました。一つのアルゴリズムが一般に競合に勝ることはないが、Median SweepやDySフレームワークなど、バイナリ設定で大幅にパフォーマンスが向上するメソッド群を識別する。多クラス構成の場合、一般化された確率的調整数、readme法、エネルギー距離最小化法、数値化のためのemアルゴリズム、フリードマン法など、異なる幅広いアルゴリズム群が優れた性能をもたらすことが観察される。より一般的には、多クラス定量化の性能はバイナリ設定の結果よりも劣っていることが分かる。本研究は,定量化アルゴリズムを適用しようとする実践者の指導と,今後の研究の機会の特定を支援する。

Quantification represents the problem of predicting class distributions in a given target set. It also represents a growing research field in supervised machine learning, for which a large variety of different algorithms has been proposed in recent years. However, a comprehensive empirical comparison of quantification methods that supports algorithm selection is not available yet. In this work, we close this research gap by conducting a thorough empirical performance comparison of 24 different quantification methods. To consider a broad range of different scenarios for binary as well as multiclass quantification settings, we carried out almost 3 million experimental runs on 40 data sets. We observe that no single algorithm generally outperforms all competitors, but identify a group of methods including the Median Sweep and the DyS framework that perform significantly better in binary settings. For the multiclass setting, we observe that a different, broad group of algorithms yields good performance, including the Generalized Probabilistic Adjusted Count, the readme method, the energy distance minimization method, the EM algorithm for quantification, and Friedman's method. More generally, we find that the performance on multiclass quantification is inferior to the results obtained in the binary setting. Our results can guide practitioners who intend to apply quantification algorithms and help researchers to identify opportunities for future research.
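To make the task concrete, the following minimal sketch shows two textbook quantification baselines, classify-and-count and its adjusted variant. It is an illustration only, not code from the paper: the function names, the example predictions, and the `tpr`/`fpr` values are assumptions chosen for the example.

```python
def classify_and_count(predictions):
    """Estimate positive-class prevalence as the share of positive predictions.

    `predictions` is a list of 0/1 classifier outputs on the target set.
    """
    return sum(predictions) / len(predictions)


def adjusted_count(predictions, tpr, fpr):
    """Correct the classify-and-count estimate for classifier error.

    `tpr` and `fpr` are the classifier's true-positive and false-positive
    rates, assumed here to have been estimated on held-out labeled data.
    The result is clipped to the valid prevalence range [0, 1].
    """
    raw = classify_and_count(predictions)
    if tpr == fpr:
        return raw  # the correction is undefined, so fall back to the raw count
    corrected = (raw - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, corrected))


# Hypothetical target set: 3 of 10 items predicted positive.
preds = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
print(classify_and_count(preds))
print(adjusted_count(preds, tpr=0.8, fpr=0.1))
```

The point of the adjustment is that a quantifier is judged on the predicted class distribution, not on individual labels, so systematic classifier errors can be corrected at the aggregate level.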

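Translated: 2021-03-07 20:08:42 Published: 2021-03-04

# (Reference translation) Variance Reduced Median-of-Means Estimator for Byzantine-Robust Distributed Inference

Variance Reduced Median-of-Means Estimator for Byzantine-Robust Distributed Inference ( http://arxiv.org/abs/2103.02860v1 )

License: CC BY 4.0

Jiyuan Tu, Weidong Liu, Xiaojun Mao, and Xi Chen

(Reference translation) This paper develops an efficient distributed inference algorithm that is robust against a moderate fraction of Byzantine nodes, that is, arbitrary and possibly adversarial machines in a distributed learning system. In robust statistics, the median-of-means (MOM) has been a popular approach for hedging against Byzantine failures because of its ease of implementation and computational efficiency. However, the MOM estimator has a shortcoming in terms of statistical efficiency. The first main contribution of this paper is a variance reduced median-of-means (VRMOM) estimator, which improves on the statistical efficiency of the vanilla MOM estimator while being computationally as efficient as MOM. Based on the proposed VRMOM estimator, we develop a general distributed inference algorithm that is robust against Byzantine failures. Theoretically, our distributed algorithm achieves a fast convergence rate with only a constant number of communication rounds. We also provide an asymptotic normality result for the purpose of statistical inference. To the best of our knowledge, this is the first normality result in the setting of Byzantine-robust distributed learning. Simulation results are also presented to illustrate the effectiveness of our method.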

This paper develops an efficient distributed inference algorithm, which is robust against a moderate fraction of Byzantine nodes, namely arbitrary and possibly adversarial machines in a distributed learning system. In robust statistics, the median-of-means (MOM) has been a popular approach to hedge against Byzantine failures due to its ease of implementation and computational efficiency. However, the MOM estimator has a shortcoming in terms of statistical efficiency. The first main contribution of the paper is to propose a variance reduced median-of-means (VRMOM) estimator, which improves the statistical efficiency over the vanilla MOM estimator and is computationally as efficient as the MOM. Based on the proposed VRMOM estimator, we develop a general distributed inference algorithm that is robust against Byzantine failures. Theoretically, our distributed algorithm achieves a fast convergence rate with only a constant number of communication rounds. We also provide the asymptotic normality result for the purpose of statistical inference. To the best of our knowledge, this is the first normality result in the setting of Byzantine-robust distributed learning. The simulation results are also presented to illustrate the effectiveness of our method.
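For readers unfamiliar with the baseline, here is a minimal sketch of the vanilla median-of-means estimator that the abstract builds on. It does not implement the proposed VRMOM estimator, whose construction is not described in this abstract. The function name and the example numbers are assumptions made for illustration.

```python
import statistics


def median_of_means(values, num_blocks):
    """Vanilla median-of-means: split the data into blocks, average each
    block, and return the median of the block averages."""
    if not 1 <= num_blocks <= len(values):
        raise ValueError("num_blocks must be between 1 and len(values)")
    # Strided slicing gives num_blocks non-empty blocks of near-equal size.
    blocks = [values[i::num_blocks] for i in range(num_blocks)]
    block_means = [statistics.fmean(block) for block in blocks]
    return statistics.median(block_means)


# Hypothetical distributed setting: each entry is the local mean reported by
# one machine; the last two machines are Byzantine and report garbage.
machine_means = [1.02, 0.97, 1.01, 0.99, 1.00, 1.03, 0.98, 1e6, 1e6]

naive = statistics.fmean(machine_means)                         # ruined by the outliers
robust = median_of_means(machine_means, len(machine_means))     # median of the reports
print(naive, robust)
```

With one value per block, the estimator reduces to the median of the machines' reports, which stays close to 1 as long as fewer than half of the machines are corrupted. The price is that a median uses the data less efficiently than a mean, which is the statistical-efficiency gap the abstract refers to.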

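Translated: 2021-03-07 19:30:26 Published: 2021-03-04

# (Reference translation) Analysing Wideband Absorbance Immittance in Normal and Ears with Otitis Media with Effusion Using Machine Learning

Analysing Wideband Absorbance Immittance in Normal and Ears with Otitis Media with Effusion Using Machine Learning ( http://arxiv.org/abs/2103.02982v1 )

License: CC BY 4.0

Emad M. Grais, Xiaoya Wang, Jie Wang, Fei Zhao, Wen Jiang, Yuexin Cai, Lifang Zhang, Qingwen Lin, Haidi Yang

(Reference translation) Wideband Absorbance Immittance (WAI) has been available for more than a decade, but its clinical use still faces the challenges of limited understanding and poor interpretation of WAI results. This study aimed to develop machine learning (ML) tools that identify WAI absorbance characteristics across different frequency-pressure regions in normal middle ears and in ears with otitis media with effusion (OME), so that middle ear conditions can be diagnosed automatically. The study comprises data analysis, including pre-processing of the WAI data, statistical analysis, and classification model development, together with the extraction of key regions from 2D frequency-pressure WAI images. The experimental results show that ML tools appear to hold great potential for the automated diagnosis of middle ear diseases from WAI data. The key regions identified in the WAI give practitioners guidance for better understanding and interpreting WAI data and offer the prospect of quick and accurate diagnostic decisions.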

Wideband Absorbance Immittance (WAI) has been available for more than a decade; however, its clinical use still faces the challenges of limited understanding and poor interpretation of WAI results. This study aimed to develop Machine Learning (ML) tools to identify the WAI absorbance characteristics across different frequency-pressure regions in the normal middle ear and ears with otitis media with effusion (OME) to enable diagnosis of middle ear conditions automatically. Data analysis, including pre-processing of the WAI data, statistical analysis and classification model development, together with key region extraction from the 2D frequency-pressure WAI images, was conducted in this study. Our experimental results show that ML tools appear to hold great potential for the automated diagnosis of middle ear diseases from WAI data. The identified key regions in the WAI provide guidance to practitioners to better understand and interpret WAI data and offer the prospect of quick and accurate diagnostic decisions.

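Translated: 2021-03-07 19:29:23 Published: 2021-03-04

# (Reference translation) Self-supervised Geometric Perception

Self-supervised Geometric Perception ( http://arxiv.org/abs/2103.03114v1 )

License: CC BY 4.0

Heng Yang, Wei Dong, Luca Carlone, Vladlen Koltun

(Reference translation) We present self-supervised geometric perception (SGP), the first general framework for learning a feature descriptor for correspondence matching without any ground-truth geometric model labels (e.g., camera poses, rigid transformations). Our first contribution is to formulate geometric perception as an optimization problem that jointly optimizes the feature descriptor and the geometric models, given a large corpus of visual measurements (e.g., images, point clouds). Under this formulation, we show that two important streams of research in vision, namely robust model fitting and deep feature learning, correspond to optimizing one block of the unknown variables while fixing the other block. This analysis leads naturally to our second contribution, the SGP algorithm, which performs alternating minimization to solve the joint optimization. SGP iteratively executes two meta-algorithms: a teacher that performs robust model fitting on the learned features to generate geometric pseudo-labels, and a student that performs deep feature learning under the noisy supervision of those pseudo-labels. As a third contribution, we apply SGP to two perception problems on large-scale real datasets: relative camera pose estimation on MegaDepth and point cloud registration on 3DMatch. We demonstrate that SGP achieves state-of-the-art performance that is on par with or superior to supervised oracles trained with ground-truth labels.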

We present self-supervised geometric perception (SGP), the first general framework to learn a feature descriptor for correspondence matching without any ground-truth geometric model labels (e.g., camera poses, rigid transformations). Our first contribution is to formulate geometric perception as an optimization problem that jointly optimizes the feature descriptor and the geometric models given a large corpus of visual measurements (e.g., images, point clouds). Under this optimization formulation, we show that two important streams of research in vision, namely robust model fitting and deep feature learning, correspond to optimizing one block of the unknown variables while fixing the other block. This analysis naturally leads to our second contribution -- the SGP algorithm that performs alternating minimization to solve the joint optimization. SGP iteratively executes two meta-algorithms: a teacher that performs robust model fitting given learned features to generate geometric pseudo-labels, and a student that performs deep feature learning under noisy supervision of the pseudo-labels. As a third contribution, we apply SGP to two perception problems on large-scale real datasets, namely relative camera pose estimation on MegaDepth and point cloud registration on 3DMatch. We demonstrate that SGP achieves state-of-the-art performance that is on-par or superior to the supervised oracles trained using ground-truth labels.

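Translated: 2021-03-07 19:13:44 Published: 2021-03-04

# (Reference translation) A Structural Causal Model for MR Images of Multiple Sclerosis

A Structural Causal Model for MR Images of Multiple Sclerosis ( http://arxiv.org/abs/2103.03158v1 )

License: CC BY 4.0

Jacob C. Reinhold, Aaron Carass, Jerry L. Prince

(Reference translation) Precision medicine involves answering counterfactual questions such as "Would this patient respond better to treatment A or treatment B?" Questions of this type are causal in nature and must be answered with the tools of causal inference, for example a structural causal model (SCM). In this work, we develop an SCM that models the interaction between demographic information, disease covariates, and magnetic resonance (MR) images of the brain for people with multiple sclerosis (MS). Inference in the SCM generates counterfactual images that show what an MR image of the brain would look like if demographic or disease covariates were changed. These images can be used to model disease progression or in downstream image processing tasks that require controlling for confounders.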

Precision medicine involves answering counterfactual questions such as "Would this patient respond better to treatment A or treatment B?" These types of questions are causal in nature and require the tools of causal inference to be answered, e.g., with a structural causal model (SCM). In this work, we develop an SCM that models the interaction between demographic information, disease covariates, and magnetic resonance (MR) images of the brain for people with multiple sclerosis (MS). Inference in the SCM generates counterfactual images that show what an MR image of the brain would look like when demographic or disease covariates are changed. These images can be used for modeling disease progression or used for downstream image processing tasks where controlling for confounders is necessary.

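Translated: 2021-03-07 18:39:22 Published: 2021-03-04

# (Reference translation) The Dota 2 Bot Competition

The Dota 2 Bot Competition ( http://arxiv.org/abs/2103.02943v1 )

License: CC BY 4.0

Jose M. Font and Tobias Mahlmann

(Reference translation) Multiplayer Online Battle Arena (MOBA) games have recently been a huge success in both the video game industry and the international eSports scene. These games encourage team coordination and cooperation as well as short- and long-term planning, within real-time gameplay that combines action and strategy. Research competitions on Artificial Intelligence and Computational Intelligence in Games offer a wide variety of challenges concerning the study and application of AI techniques to different game genres. These events are widely accepted by the AI/CI community as a kind of AI benchmark that strongly influences many other research areas in the field. This paper presents and describes in detail the Dota 2 Bot competition and the Dota 2 AI framework that supports it. The challenge aims to bring together MOBAs and AI/CI game competitions by inviting participants to submit AI controllers for the MOBA Defense of the Ancients 2 (Dota 2) to play in 1v1 matches, with the goal of fostering research on AI techniques for real-time games. The Dota 2 AI framework uses the modding capabilities of the actual Dota 2 game to connect external AI controllers to real Dota 2 matches using the original Free-to-Play game.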

Multiplayer Online Battle Arena (MOBA) games are a recent huge success both in the video game industry and the international eSports scene. These games encourage team coordination and cooperation, as well as short- and long-term planning, within real-time gameplay that combines action and strategy. Artificial Intelligence and Computational Intelligence in Games research competitions offer a wide variety of challenges regarding the study and application of AI techniques to different game genres. These events are widely accepted by the AI/CI community as a sort of AI benchmarking that strongly influences many other research areas in the field. This paper presents and describes in detail the Dota 2 Bot competition and the Dota 2 AI framework that supports it. This challenge aims to bring together MOBAs and AI/CI game competitions, inviting participants to submit AI controllers for the successful MOBA Defense of the Ancients 2 (Dota 2) to play in 1v1 matches, with the goal of fostering research on AI techniques for real-time games. The Dota 2 AI framework makes use of the actual Dota 2 game modding capabilities to connect external AI controllers to actual Dota 2 game matches using the original Free-to-Play game.

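Translated: 2021-03-07 18:05:37 Published: 2021-03-04

# (Reference translation) DeepTag: An Unsupervised Deep Learning Method for Motion Tracking on Cardiac Tagging Magnetic Resonance Images

DeepTag: An Unsupervised Deep Learning Method for Motion Tracking on Cardiac Tagging Magnetic Resonance Images ( http://arxiv.org/abs/2103.02772v1 )

License: CC BY 4.0

Meng Ye, Mikael Kanski, Dong Yang, Qi Chang, Zhennan Yan, Qiaoying Huang, Leon Axel, Dimitris Metaxas

(Reference translation) Cardiac tagging magnetic resonance imaging (t-MRI) is the gold standard for regional myocardium deformation and cardiac strain estimation. However, the technique has not been widely used in clinical diagnosis because motion tracking on t-MRI images is difficult. In this paper, we propose a novel, fully unsupervised deep learning method for in vivo motion tracking on t-MRI images. We first estimate the motion field (INF) between any two consecutive t-MRI frames with a bi-directional generative diffeomorphic registration neural network. Using this result, we then estimate the Lagrangian motion field between the reference frame and any other frame through a differentiable composition layer. By using temporal information to make reasonable estimates of spatio-temporal motion fields, the method provides a useful solution for motion tracking and image registration in dynamic medical imaging. The method has been validated on a representative clinical t-MRI dataset, and the experimental results show that it is superior to conventional motion tracking methods in terms of landmark tracking accuracy and inference efficiency.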

Cardiac tagging magnetic resonance imaging (t-MRI) is the gold standard for regional myocardium deformation and cardiac strain estimation. However, this technique has not been widely used in clinical diagnosis, as a result of the difficulty of motion tracking encountered with t-MRI images. In this paper, we propose a novel deep learning-based fully unsupervised method for in vivo motion tracking on t-MRI images. We first estimate the motion field (INF) between any two consecutive t-MRI frames by a bi-directional generative diffeomorphic registration neural network. Using this result, we then estimate the Lagrangian motion field between the reference frame and any other frame through a differentiable composition layer. By utilizing temporal information to perform reasonable estimations on spatio-temporal motion fields, this novel method provides a useful solution for motion tracking and image registration in dynamic medical imaging. Our method has been validated on a representative clinical t-MRI dataset; the experimental results show that our method is superior to conventional motion tracking methods in terms of landmark tracking accuracy and inference efficiency.

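Translated: 2021-03-07 17:52:12 Published: 2021-03-04

# (Reference translation) Feature Boosting, Suppression, and Diversification for Fine-Grained Visual Classification

Feature Boosting, Suppression, and Diversification for Fine-Grained Visual Classification ( http://arxiv.org/abs/2103.02782v1 )

License: CC BY 4.0

Jianwei Song, Ruoyu Yang

(Reference translation) Learning feature representations from discriminative local regions plays a key role in fine-grained visual classification. Using attention mechanisms to extract part features has become a trend. However, these methods have two major limitations. First, they often focus on the most salient part while neglecting other inconspicuous but distinguishable parts. Second, they treat the features of different parts in isolation and neglect their relationships. To address these limitations, we propose to locate multiple different distinguishable parts and to explore their relationships explicitly. To this end, we introduce two lightweight modules that can easily be plugged into existing convolutional neural networks. On the one hand, we introduce a feature boosting and suppression module that boosts the most salient part of the feature maps to obtain a part-specific representation and then suppresses it, forcing the following network to mine other potential parts. On the other hand, we introduce a feature diversification module that learns semantically complementary information from the correlated part-specific representations. Our method needs no bounding box or part annotations and can be trained end-to-end. Extensive experimental results show that it achieves state-of-the-art performance on several fine-grained benchmark datasets.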

Learning feature representation from discriminative local regions plays a key role in fine-grained visual classification. Employing attention mechanisms to extract part features has become a trend. However, there are two major limitations in these methods: First, they often focus on the most salient part while neglecting other inconspicuous but distinguishable parts. Second, they treat different part features in isolation while neglecting their relationships. To handle these limitations, we propose to locate multiple different distinguishable parts and explore their relationships in an explicit way. In this pursuit, we introduce two lightweight modules that can be easily plugged into existing convolutional neural networks. On one hand, we introduce a feature boosting and suppression module that boosts the most salient part of feature maps to obtain a part-specific representation and suppresses it to force the following network to mine other potential parts. On the other hand, we introduce a feature diversification module that learns semantically complementary information from the correlated part-specific representations. Our method does not need bounding boxes/part annotations and can be trained end-to-end. Extensive experimental results show that our method achieves state-of-the-art performances on several benchmark fine-grained datasets.

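Translated: 2021-03-07 17:19:04 Published: 2021-03-04

# (Reference translation) Learning Granularity-Aware Convolutional Neural Network for Fine-Grained Visual Classification

Learning Granularity-Aware Convolutional Neural Network for Fine-Grained Visual Classification ( http://arxiv.org/abs/2103.02788v1 )

License: CC BY 4.0

Jianwei Song, Ruoyu Yang

(Reference translation) Locating discriminative parts plays a key role in fine-grained visual classification because different objects are highly similar. Recent work based on convolutional neural networks uses the feature maps taken from the last convolutional layer to mine discriminative regions. However, the last convolutional layer tends to focus on the whole object because of its large receptive field, which reduces its ability to spot differences. To address this issue, we propose a novel Granularity-Aware Convolutional Neural Network (GA-CNN) that progressively explores discriminative features. Specifically, GA-CNN uses the differences between the receptive fields of different layers to learn multi-granularity features, and it exploits larger-granularity information based on the smaller-granularity information found at earlier stages. To further improve performance, we introduce an object-attentive module that can effectively localize the object in a raw image. GA-CNN needs no bounding box or part annotations and can be trained end-to-end. Extensive experimental results show that our approach achieves state-of-the-art performance on three benchmark datasets.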

Locating discriminative parts plays a key role in fine-grained visual classification due to the high similarities between different objects. Recent works based on convolutional neural networks utilize the feature maps taken from the last convolutional layer to mine discriminative regions. However, the last convolutional layer tends to focus on the whole object due to the large receptive field, which leads to a reduced ability to spot the differences. To address this issue, we propose a novel Granularity-Aware Convolutional Neural Network (GA-CNN) that progressively explores discriminative features. Specifically, GA-CNN utilizes the differences of the receptive fields at different layers to learn multi-granularity features, and it exploits larger granularity information based on the smaller granularity information found at the previous stages. To further boost the performance, we introduce an object-attentive module that can effectively localize the object given a raw image. GA-CNN does not need bounding boxes/part annotations and can be trained end-to-end. Extensive experimental results show that our approach achieves state-of-the-art performances on three benchmark datasets.

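Translated: 2021-03-07 17:04:23 Published: 2021-03-04

# (Reference translation) Coordinate Attention for Efficient Mobile Network Design

Coordinate Attention for Efficient Mobile Network Design ( http://arxiv.org/abs/2103.02907v1 )

License: CC BY 4.0

Qibin Hou, Daquan Zhou, Jiashi Feng

(Reference translation) Recent studies on mobile network design have demonstrated that channel attention (e.g., Squeeze-and-Excitation attention) is remarkably effective at improving model performance, but they generally neglect positional information, which is important for generating spatially selective attention maps. In this paper, we propose a novel attention mechanism for mobile networks that embeds positional information into channel attention, which we call "coordinate attention". Unlike channel attention, which transforms a feature tensor into a single feature vector via 2D global pooling, coordinate attention factorizes channel attention into two 1D feature encoding processes that aggregate features along the two spatial directions. In this way, long-range dependencies can be captured along one spatial direction while precise positional information is preserved along the other. The resulting feature maps are then encoded separately into a pair of direction-aware and position-sensitive attention maps, which can be applied complementarily to the input feature map to augment the representations of the objects of interest. Coordinate attention is simple and can be flexibly plugged into classic mobile networks such as MobileNetV2, MobileNeXt, and EfficientNet with almost no computational overhead. Extensive experiments show that coordinate attention not only benefits ImageNet classification but, more interestingly, performs better on downstream tasks such as object detection and semantic segmentation. Code is available at https://github.com/Andrew-Qibin/CoordAttention.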

Recent studies on mobile network design have demonstrated the remarkable effectiveness of channel attention (e.g., the Squeeze-and-Excitation attention) for lifting model performance, but they generally neglect the positional information, which is important for generating spatially selective attention maps. In this paper, we propose a novel attention mechanism for mobile networks by embedding positional information into channel attention, which we call "coordinate attention". Unlike channel attention that transforms a feature tensor to a single feature vector via 2D global pooling, the coordinate attention factorizes channel attention into two 1D feature encoding processes that aggregate features along the two spatial directions, respectively. In this way, long-range dependencies can be captured along one spatial direction and meanwhile precise positional information can be preserved along the other spatial direction. The resulting feature maps are then encoded separately into a pair of direction-aware and position-sensitive attention maps that can be complementarily applied to the input feature map to augment the representations of the objects of interest. Our coordinate attention is simple and can be flexibly plugged into classic mobile networks, such as MobileNetV2, MobileNeXt, and EfficientNet with nearly no computational overhead. Extensive experiments demonstrate that our coordinate attention is not only beneficial to ImageNet classification but more interestingly, behaves better in down-stream tasks, such as object detection and semantic segmentation. Code is available at https://github.com/Andrew-Qibin/CoordAttention.
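The factorization into two 1D pooling steps can be illustrated with a heavily simplified, single-channel sketch. This is not the authors' implementation (see the repository linked above for that): the learned convolutions that the real module applies between pooling and the sigmoid are omitted here, and all function names are hypothetical.

```python
import math


def directional_pools(feature_map):
    """Average-pool a single-channel H x W map along each spatial direction.

    Returns one value per row (pooled along the width) and one value per
    column (pooled along the height), instead of a single global average.
    """
    height, width = len(feature_map), len(feature_map[0])
    per_row = [sum(row) / width for row in feature_map]
    per_col = [sum(feature_map[i][j] for i in range(height)) / height
               for j in range(width)]
    return per_row, per_col


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def coordinate_attention(feature_map):
    """Reweight each position by a row-wise and a column-wise attention value.

    Simplification (assumption): the pooled descriptors are passed straight
    through a sigmoid, with no learned transformation in between.
    """
    per_row, per_col = directional_pools(feature_map)
    row_attention = [sigmoid(v) for v in per_row]
    col_attention = [sigmoid(v) for v in per_col]
    return [[feature_map[i][j] * row_attention[i] * col_attention[j]
             for j in range(len(per_col))]
            for i in range(len(per_row))]


fm = [[1.0, 2.0, 3.0],
      [4.0, 5.0, 6.0]]
print(directional_pools(fm))
print(coordinate_attention(fm))
```

Because each attention value is indexed by a row or a column, the weighting keeps track of where along that axis a response occurred, which a single globally pooled vector cannot do.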

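Translated: 2021-03-07 16:55:39 Published: 2021-03-04

# (Reference translation) Motion-blurred Video Interpolation and Extrapolation

Motion-blurred Video Interpolation and Extrapolation ( http://arxiv.org/abs/2103.02984v1 )

License: CC BY 4.0

Dawit Mureja Argaw, Junsik Kim, Francois Rameau, In So Kweon

(Reference translation) Abrupt motion of the camera or of objects in a scene results in a blurry video, so recovering high-quality video requires two types of enhancement: visual enhancement and temporal upsampling. A broad range of research has attempted either to recover clean frames from blurred image sequences or to temporally upsample frames by interpolation, but very few studies handle both problems jointly. In this work, we present a novel framework for deblurring, interpolating, and extrapolating sharp frames from a motion-blurred video in an end-to-end manner. The framework first learns the pixel-level motion that caused the blur in the given inputs via optical flow estimation and then predicts multiple clean frames by warping the decoded features with the estimated flows. To ensure temporal coherence across predicted frames and to address potential temporal ambiguity, we propose a simple yet effective flow-based rule. The effectiveness and favorability of our approach are highlighted through extensive qualitative and quantitative evaluations on motion-blurred datasets derived from high-speed videos.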

Abrupt motion of the camera or objects in a scene results in a blurry video, and therefore recovering high quality video requires two types of enhancements: visual enhancement and temporal upsampling. A broad range of research has attempted to recover clean frames from blurred image sequences or temporally upsample frames by interpolation, yet there are very limited studies handling both problems jointly. In this work, we present a novel framework for deblurring, interpolating and extrapolating sharp frames from a motion-blurred video in an end-to-end manner. We design our framework by first learning the pixel-level motion that caused the blur from the given inputs via optical flow estimation and then predicting multiple clean frames by warping the decoded features with the estimated flows. To ensure temporal coherence across predicted frames and address potential temporal ambiguity, we propose a simple, yet effective flow-based rule. The effectiveness and favorability of our approach are highlighted through extensive qualitative and quantitative evaluations on motion-blurred datasets from high speed videos.

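Translated: 2021-03-07 16:39:56 Published: 2021-03-04

# (Reference translation) Optical Flow Estimation from a Single Motion-blurred Image

Optical Flow Estimation from a Single Motion-blurred Image ( http://arxiv.org/abs/2103.02996v1 )

License: CC BY 4.0

Dawit Mureja Argaw, Junsik Kim, Francois Rameau, Jae Won Cho, In So Kweon

(Reference translation) In most computer vision applications, motion blur is regarded as an undesirable artifact. However, it has been shown that motion blur in an image can be of practical interest for fundamental computer vision problems. In this work, we propose a novel framework that estimates optical flow from a single motion-blurred image in an end-to-end manner. We design our network with transformer networks to learn globally and locally varying motions from the encoded features of a motion-blurred input, and to decode left and right frame features without explicit frame supervision. A flow estimator network is then used to estimate optical flow from the decoded features in a coarse-to-fine manner. We evaluate the model qualitatively and quantitatively through a large set of experiments on synthetic and real motion-blur datasets. We also provide an in-depth analysis of our model in relation to related approaches to highlight its effectiveness and favorability. Furthermore, we demonstrate the applicability of the flow estimated by our method to deblurring and moving object segmentation tasks.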

In most computer vision applications, motion blur is regarded as an undesirable artifact. However, it has been shown that motion blur in an image may be of practical interest in fundamental computer vision problems. In this work, we propose a novel framework to estimate optical flow from a single motion-blurred image in an end-to-end manner. We design our network with transformer networks to learn globally and locally varying motions from encoded features of a motion-blurred input, and decode left and right frame features without explicit frame supervision. A flow estimator network is then used to estimate optical flow from the decoded features in a coarse-to-fine manner. We qualitatively and quantitatively evaluate our model through a large set of experiments on synthetic and real motion-blur datasets. We also provide in-depth analysis of our model in connection with related approaches to highlight the effectiveness and favorability of our approach. Furthermore, we showcase the applicability of the flow estimated by our method on deblurring and moving object segmentation tasks.

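Translated: 2021-03-07 16:22:02 Published: 2021-03-04

# (Reference translation) Modeling Multi-Label Action Dependencies for Temporal Action Localization

Modeling Multi-Label Action Dependencies for Temporal Action Localization ( http://arxiv.org/abs/2103.03027v1 )

License: CC BY 4.0

Praveen Tirupattur, Kevin Duarte, Yogesh Rawat, Mubarak Shah

(Reference translation) Real-world videos contain many complex actions with inherent relationships between action classes. In this work, we propose an attention-based architecture that models these action relationships for the task of temporal action localization in untrimmed videos. In contrast to previous work that exploits video-level co-occurrence of actions, we distinguish between relationships among actions that occur at the same time-step and relationships among actions that occur at different time-steps (i.e., those that precede or follow each other). We define these distinct relationships as action dependencies. We propose to improve action localization performance by modeling these action dependencies in a novel attention-based Multi-Label Action Dependency (MLAD) layer. The MLAD layer consists of two branches: a Co-occurrence Dependency Branch, which models co-occurrence action dependencies, and a Temporal Dependency Branch, which models temporal action dependencies. Because existing metrics for multi-label classification do not explicitly measure how well action dependencies are modeled, we propose novel metrics that consider both co-occurrence and temporal dependencies between action classes. Through empirical evaluation and extensive analysis, we show improved performance over state-of-the-art methods on multi-label action localization benchmarks (MultiTHUMOS and Charades) in terms of f-mAP and our proposed metrics.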

Real-world videos contain many complex actions with inherent relationships between action classes. In this work, we propose an attention-based architecture that models these action relationships for the task of temporal action localization in untrimmed videos. As opposed to previous works that leverage video-level co-occurrence of actions, we distinguish the relationships between actions that occur at the same time-step and actions that occur at different time-steps (i.e., those which precede or follow each other). We define these distinct relationships as action dependencies. We propose to improve action localization performance by modeling these action dependencies in a novel attention-based Multi-Label Action Dependency (MLAD) layer. The MLAD layer consists of two branches: a Co-occurrence Dependency Branch and a Temporal Dependency Branch to model co-occurrence action dependencies and temporal action dependencies, respectively. We observe that existing metrics used for multi-label classification do not explicitly measure how well action dependencies are modeled; therefore, we propose novel metrics that consider both co-occurrence and temporal dependencies between action classes. Through empirical evaluation and extensive analysis, we show improved performance over state-of-the-art methods on multi-label action localization benchmarks (MultiTHUMOS and Charades) in terms of f-mAP and our proposed metric.

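Translated: 2021-03-07 16:06:43 Published: 2021-03-04

# (Reference translation) SSTN: Self-Supervised Domain Adaptation Thermal Object Detection for Autonomous Driving

SSTN: Self-Supervised Domain Adaptation Thermal Object Detection for Autonomous Driving ( http://arxiv.org/abs/2103.03150v1 )

License: CC BY 4.0

Farzeen Munir, Shoaib Azam and Moongu Jeon

(Reference translation) Sensing the environment reliably plays a decisive role in the safe and secure operation of autonomous vehicles. This perception of the surroundings resembles human visual representation: the human brain perceives the environment through different sensory channels and develops a view-invariant representation model. In the same spirit, different exteroceptive sensors are deployed on autonomous vehicles to perceive the environment. The most common exteroceptive sensors for autonomous vehicle perception are cameras, lidar, and radar. Although these sensors have shown their benefit in the visible spectrum domain, their operating capability is limited in adverse conditions, for instance at night, which may lead to fatal accidents. In this work, we explore thermal object detection to model a view-invariant representation by means of self-supervised contrastive learning. To this end, we propose a deep neural network, the Self Supervised Thermal Network (SSTN), which learns a feature embedding that maximizes the information between the visible and infrared spectrum domains through contrastive learning, and then uses the learned feature representation for thermal object detection with a multi-scale encoder-decoder transformer network. The proposed method is evaluated extensively on two publicly available datasets, the FLIR-ADAS dataset and the KAIST Multi-Spectral dataset. The experimental results illustrate the efficacy of the proposed method.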

The ability to sense the environment plays a decisive role in the safe and secure operation of autonomous vehicles. This perception of the surroundings is similar to human visual representation. The human brain perceives the environment by utilizing different sensory channels and develops a view-invariant representation model. In this context, different exteroceptive sensors are deployed on the autonomous vehicle for perceiving the environment. The most common exteroceptive sensors for autonomous vehicle perception are camera, lidar, and radar. Although these sensors have demonstrated their benefit in the visible spectrum domain, in adverse conditions, for instance at night, they have limited operating capability, which may lead to fatal accidents. In this work, we explore thermal object detection to model a view-invariant representation by employing a self-supervised contrastive learning approach. For this purpose, we propose a deep neural network, the Self Supervised Thermal Network (SSTN), which learns a feature embedding to maximize the information between the visible and infrared spectrum domains by contrastive learning, and then employs the learned feature representation for thermal object detection using a multi-scale encoder-decoder transformer network. The proposed method is extensively evaluated on two publicly available datasets: the FLIR-ADAS dataset and the KAIST Multi-Spectral dataset. The experimental results illustrate the efficacy of the proposed method.

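Translated: 2021-03-07 15:44:55 Published: 2021-03-04

# (Reference translation) Enhanced 3D Human Pose Estimation from Videos by using Attention-Based Neural Network with Dilated Convolutions

Enhanced 3D Human Pose Estimation from Videos by using Attention-Based Neural Network with Dilated Convolutions ( http://arxiv.org/abs/2103.03170v1 )

License: CC BY 4.0

Ruixu Liu, Ju Shen, He Wang, Chen Chen, Sen-ching Cheung, Vijayan K. Asari

(Reference translation) The attention mechanism provides a sequential prediction framework for learning spatial models with enhanced implicit temporal consistency. In this work, we present a systematic design (from 2D to 3D) showing how conventional networks and other forms of constraints can be incorporated into the attention framework to learn long-range dependencies for the task of pose estimation. The contribution of this paper is a systematic approach to designing and training attention-based models for end-to-end pose estimation that accepts arbitrary video sequences as input with flexibility and scalability. We achieve this by adapting the temporal receptive field through a multi-scale structure of dilated convolutions. In addition, the proposed architecture can easily be adapted to a causal model that enables real-time performance. Any off-the-shelf 2D pose estimation system, e.g. Mocap libraries, can easily be integrated in an ad-hoc fashion. Our method achieves state-of-the-art performance and outperforms existing methods, reducing the mean per joint position error to 33.4 mm on the Human3.6M dataset.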

The attention mechanism provides a sequential prediction framework for learning spatial models with enhanced implicit temporal consistency. In this work, we show a systematic design (from 2D to 3D) for how conventional networks and other forms of constraints can be incorporated into the attention framework for learning long-range dependencies for the task of pose estimation. The contribution of this paper is to provide a systematic approach for designing and training attention-based models for end-to-end pose estimation, with the flexibility and scalability of arbitrary video sequences as input. We achieve this by adapting the temporal receptive field via a multi-scale structure of dilated convolutions. Besides, the proposed architecture can be easily adapted to a causal model enabling real-time performance. Any off-the-shelf 2D pose estimation system, e.g. Mocap libraries, can be easily integrated in an ad-hoc fashion. Our method achieves state-of-the-art performance and outperforms existing methods by reducing the mean per joint position error to 33.4 mm on the Human3.6M dataset.
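How dilation widens the temporal receptive field can be checked with a few lines of arithmetic. The sketch below assumes a plain stack of stride-1 1D convolutions; the kernel size and dilation rates are illustrative values, not the configuration used in the paper.

```python
def temporal_receptive_field(kernel_size, dilations):
    """Number of input frames seen by one output of a stack of stride-1
    1D convolutions, one layer per entry in `dilations`."""
    field = 1
    for dilation in dilations:
        # Each layer extends the window by (kernel_size - 1) * dilation frames.
        field += (kernel_size - 1) * dilation
    return field


# Hypothetical stack: kernel size 3 with exponentially growing dilation.
print(temporal_receptive_field(3, [1, 3, 9, 27]))  # 1 + 2 * (1 + 3 + 9 + 27) = 81
# The same depth without dilation covers far fewer frames.
print(temporal_receptive_field(3, [1, 1, 1, 1]))   # 1 + 2 * 4 = 9
```

Growing the dilation rate layer by layer lets a shallow network cover a long clip, and choosing different rates changes the temporal span without changing the number of parameters per layer.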

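Translated: 2021-03-07 15:28:48 Published: 2021-03-04

# (Reference translation) Prostate Tissue Grading with Deep Quantum Measurement Ordinal Regression

Prostate Tissue Grading with Deep Quantum Measurement Ordinal Regression ( http://arxiv.org/abs/2103.03188v1 )

License: CC BY 4.0

Santiago Toledo-Cortés, Diego H. Useche, and Fabio A. González

(Reference translation) Prostate cancer (PCa) is one of the most common and aggressive cancers worldwide. The Gleason score (GS) system is the standard way of classifying prostate cancer and the most reliable method for determining its severity and the treatment to follow. The pathologist examines the arrangement of cancer cells in the prostate and assigns a score on a scale that ranges from 6 to 10. Automatic analysis of prostate whole-slide images (WSIs) is usually treated as a binary classification problem, which misses the finer distinctions between the stages given by the GS. This paper presents a probabilistic deep learning ordinal classification method that can estimate the GS from a prostate WSI. Approaching the problem as an ordinal regression task with a differentiable probabilistic model not only improves the interpretability of the results but also improves the accuracy of the model compared with conventional deep classification and regression architectures.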

Prostate cancer (PCa) is one of the most common and aggressive cancers worldwide. The Gleason score (GS) system is the standard way of classifying prostate cancer and the most reliable method to determine the severity and treatment to follow. The pathologist looks at the arrangement of cancer cells in the prostate and assigns a score on a scale that ranges from 6 to 10. Automatic analysis of prostate whole-slide images (WSIs) is usually addressed as a binary classification problem, which misses the finer distinction between stages given by the GS. This paper presents a probabilistic deep learning ordinal classification method that can estimate the GS from a prostate WSI. Approaching the problem as an ordinal regression task using a differentiable probabilistic model not only improves the interpretability of the results, but also improves the accuracy of the model when compared to conventional deep classification and regression architectures.

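Translated: 2021-03-07 15:01:12 Published: 2021-03-04

# (Reference translation) Hardware Acceleration of Fully Quantized BERT for Efficient Natural Language Processing

Hardware Acceleration of Fully Quantized BERT for Efficient Natural Language Processing ( http://arxiv.org/abs/2103.02800v1 )

License: CC BY 4.0

Zejian Liu, Gang Li and Jian Cheng

(Reference translation) BERT is the most recent Transformer-based model that achieves state-of-the-art performance on various NLP tasks. In this paper, we investigate the hardware acceleration of BERT on FPGA for edge computing. To tackle its huge computational complexity and memory footprint, we propose to fully quantize BERT (FQ-BERT), including weights, activations, softmax, layer normalization, and all intermediate results. Experiments show that FQ-BERT can achieve 7.94x compression of the weights with negligible performance loss. We then propose an accelerator tailored to FQ-BERT and evaluate it on Xilinx ZCU102 and ZCU111 FPGAs. It achieves a performance-per-watt of 3.18 fps/W, which is 28.91x and 12.72x that of an Intel(R) Core(TM) i7-8700 CPU and an NVIDIA K80 GPU, respectively.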

BERT is the most recent Transformer-based model that achieves state-of-the-art performance in various NLP tasks. In this paper, we investigate the hardware acceleration of BERT on FPGA for edge computing. To tackle the issue of huge computational complexity and memory footprint, we propose to fully quantize BERT (FQ-BERT), including weights, activations, softmax, layer normalization, and all the intermediate results. Experiments demonstrate that FQ-BERT can achieve 7.94x compression for weights with negligible performance loss. We then propose an accelerator tailored for FQ-BERT and evaluate it on Xilinx ZCU102 and ZCU111 FPGAs. It can achieve a performance-per-watt of 3.18 fps/W, which is 28.91x and 12.72x over Intel(R) Core(TM) i7-8700 CPU and NVIDIA K80 GPU, respectively.
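
As a rough illustration of what weight quantization involves, the following is a minimal sketch of symmetric uniform quantization in plain Python. The 4-bit width, the per-tensor scale, and the function names are assumptions made for this example; the abstract does not state the bit widths or the exact scheme used in FQ-BERT.

```python
def quantize(values, num_bits=4):
    """Symmetric uniform quantization to signed integer codes (illustrative only)."""
    qmax = 2 ** (num_bits - 1) - 1            # e.g. 7 for 4 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest level and clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate real values."""
    return [c * scale for c in codes]


def compression_ratio(full_bits, quant_bits):
    """Ideal storage ratio, ignoring the small overhead of storing scales."""
    return full_bits / quant_bits


# Hypothetical weights, chosen only for the demonstration.
weights = [0.52, -0.31, 0.07, -0.70, 0.25]
codes, scale = quantize(weights, num_bits=4)
restored = dequantize(codes, scale)
```

Under these assumptions an ideal 32-bit to 4-bit conversion would give an 8x reduction in weight storage; the reported 7.94x figure comes from the paper and is not derived here.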

翻訳日:2021-03-07 14:51:56 公開日:2021-03-04

# (参考訳) 英国における最初のCOVID-19ロックダウン後の議員のTwitterエンゲージメントと虐待: ホワイトペーパー

MP Twitter Engagement and Abuse Post-first COVID-19 Lockdown in the UK: White Paper ( http://arxiv.org/abs/2103.02917v1 )

ライセンス: CC BY 4.0

Tracie Farrell, Mehmet Bakir, Kalina Bontcheva

(参考訳) 英国は数年前から不安定な政治環境にあり、ブレグジットとリーダーシップの危機が過去5年間を特徴づけています。この研究では、世界の保健緊急事態であるCOVID-19が、英国政治家が公衆と関わるときに受ける虐待の量、種類、またはトピックにどのように影響するかについてもっと理解したいと考えました。この研究は、2020年6月から12月までの期間をカバーし、英国の議員への返信におけるTwitter上の虐待を分析します。この研究は、英国の新型コロナウイルスパンデミックの最初の4ヶ月間のオンライン虐待の分析のフォローアップです。この論文は、この新しい7ヶ月間の全体的な虐待レベルを調べ、さまざまな政党や英国政府のメンバーへの反応を分析し、オンラインの虐待とブレグジット、政府のCOVID-19対応と政策、社会問題などのトピックとの関係を分析します。また,同時期の国会議員への虐待的回答に投稿された陰謀論の存在についても検討した。英国議会議員に対する虐待レベルは、2020年12月に過去最高(議員宛ての全返信ツイートの5.4%)に達したことが判明した。これは、総選挙前の2ヶ月間よりもほぼ1%高いです。パンデミックの最初の4ヶ月間に見られた傾向とは異なり、2020年7月以降は保守党(トーリー党)の議員が虐待的な返信を最も高い割合で受け取っており、COVID-19危機が深刻化しEUとのブレグジット交渉が完了に近づくなか、2020年9月以降は5%を超え続けている。

The UK has had a volatile political environment for some years now, with Brexit and leadership crises marking the past five years. With this work, we wanted to understand more about how the global health emergency, COVID-19, influences the amount, type or topics of abuse that UK politicians receive when engaging with the public. This work covers the period of June to December 2020 and analyses Twitter abuse in replies to UK MPs. This work is a follow-up from our analysis of online abuse during the first four months of the COVID-19 pandemic in the UK. The paper examines overall abuse levels during this new seven-month period, analyses reactions to members of different political parties and the UK government, and the relationship between online abuse and topics such as Brexit, the government's COVID-19 response and policies, and social issues. In addition, we have also examined the presence of conspiracy theories posted in abusive replies to MPs during the period. We have found that abuse levels toward UK MPs were at an all-time high in December 2020 (5.4% of all reply tweets sent to MPs). This is almost 1% higher than in the two months preceding the General Election. In a departure from the trend seen in the first four months of the pandemic, MPs from the Tory party received the highest percentage of abusive replies from July 2020 onward, which stays above 5% from September 2020 onward, as the COVID-19 crisis deepened and the Brexit negotiations with the EU started nearing completion.

翻訳日:2021-03-07 14:42:12 公開日:2021-03-04

# (参考訳) エビデンス支援による予測の学習:臨床リスク予測への応用

Learning to Predict with Supporting Evidence: Applications to Clinical Risk Prediction ( http://arxiv.org/abs/2103.02768v1 )

ライセンス: CC BY 4.0

Aniruddh Raghu, John Guttag, Katherine Young, Eugene Pomerantsev, Adrian V. Dalca, Collin M. Stultz

(参考訳) 機械学習モデルがヘルスケアに与える影響は、医療専門家がこれらのモデルによる予測に置く信頼の度合いに依存する。本論文では,予測が信頼されるべき理由に関するドメイン関連証拠を臨床専門家に提示する手法を提案する。まず,有意な潜在概念を予測対象や観測データに関連付ける確率モデルを設計する。このモデルにおける潜在変数の推論は、予測の作成と、その予測の裏付けとなる証拠の提供の両方に対応する。近似推論を効率的に行うための2段階のプロセスを提示する。 (i) 変分学習を用いたモデルパラメータの推定, (ii) 確率モデルから導出した目的関数で訓練したニューラルネットワークを用いた,モデル内の潜在変数の最大事後確率(MAP)推定の近似である。本手法を,循環器疾患患者の死亡リスクを予測する課題で実証する。特に,心電図と表データを入力として,本手法が正確な予測のための適切な領域関連証拠を提供することを示す。

The impact of machine learning models on healthcare will depend on the degree of trust that healthcare professionals place in the predictions made by these models. In this paper, we present a method to provide people with clinical expertise with domain-relevant evidence about why a prediction should be trusted. We first design a probabilistic model that relates meaningful latent concepts to prediction targets and observed data. Inference of latent variables in this model corresponds to both making a prediction and providing supporting evidence for that prediction. We present a two-step process to efficiently approximate inference: (i) estimating model parameters using variational learning, and (ii) approximating maximum a posteriori estimation of latent variables in the model using a neural network, trained with an objective derived from the probabilistic model. We demonstrate the method on the task of predicting mortality risk for patients with cardiovascular disease. Specifically, using electrocardiogram and tabular data as input, we show that our approach provides appropriate domain-relevant supporting evidence for accurate predictions.

翻訳日:2021-03-07 12:33:37 公開日:2021-03-04

# (参考訳) トポロジカル機能による3Dポイントクラウドの次のベストビューの学習

Learning the Next Best View for 3D Point Clouds via Topological Features ( http://arxiv.org/abs/2103.02789v1 )

ライセンス: CC BY 4.0

Christopher Collander, William J. Beksi, Manfred Huber

(参考訳) 本稿では,新しいトポロジーに基づく情報ゲインメトリックを用いて,ノイズの多い3dセンサの次なる最善の視点を指示する強化学習手法を提案する。測定器は観察された表面の不整合セクションを結合し、穴および凹面セクションのような高密度の特徴に焦点を合わせます。実験の結果,本手法は,ストリーミングポイントクラウドデータが提供する情報を最適化するためにロボットセンサの配置を確立するのに役立つことがわかった。さらに、3Dオブジェクトのラベル付きデータセット、カスタムロボットマニピュレータ用のCAD設計、およびポイントクラウドの変換、結合、および登録のためのソフトウェアが研究コミュニティに公開されました。

In this paper, we introduce a reinforcement learning approach utilizing a novel topology-based information gain metric for directing the next best view of a noisy 3D sensor. The metric combines the disjoint sections of an observed surface to focus on high-detail features such as holes and concave sections. Experimental results show that our approach can aid in establishing the placement of a robotic sensor to optimize the information provided by its streaming point cloud data. Furthermore, a labeled dataset of 3D objects, a CAD design for a custom robotic manipulator, and software for the transformation, union, and registration of point clouds have been publicly released to the research community.

翻訳日:2021-03-07 12:06:54 公開日:2021-03-04

# (参考訳) IACN:Recommendationのための影響認識と注意ベースの共進化ネットワーク

IACN: Influence-aware and Attention-based Co-evolutionary Network for Recommendation ( http://arxiv.org/abs/2103.02866v1 )

ライセンス: CC0 1.0

Shalini Pandey, George Karypis and Jaideep Srivastava

(参考訳) RedditやTwitterなどのオンラインコミュニティでは,関連する項目をユーザに推奨することが重要な課題だ。レコメンデーションシステムのために,表現学習は,ユーザの振る舞いを表現するために埋め込みを学習し,アイテムのプロパティをキャプチャする強力なテクニックを提供する。しかし、オンラインコミュニティへの埋め込みの学習は、ユーザーの興味が進化し続けるため、難しい課題である。この進化は,1) ユーザと項目間のインタラクション,2) コミュニティ内の他のユーザの影響から捉えることができる。既存の動的埋め込みモデルは、ユーザーの埋め込みを更新する要因のいずれかのみを考慮します。しかし、ある時点では、2つの要素の組み合わせによってユーザーの興味が進化する。そこで我々は,影響認識と注意に基づく共進化ネットワーク (IACN) を提案する。本質的にIACNは、相互作用モデリングと影響モデリングの2つの重要なコンポーネントから構成される。インタラクションモデリングレイヤは、ユーザがアイテムと対話する際に、ユーザとアイテムの埋め込みを更新する責務を負う。影響モデリング層は、他のユーザの相互作用によって引き起こされる時間的興奮をキャプチャする。これら2つの層から得られる信号を統合するために,インタラクションベースと影響ベースの埋め込みを効果的に組み合わせ,最終的なユーザ埋め込みを予測する新しい融合層を設計する。私たちのモデルは、さまざまなドメインの既存の最新モデルよりも優れています。

Recommending relevant items to users is a crucial task on online communities such as Reddit and Twitter. For recommendation systems, representation learning presents a powerful technique that learns embeddings to represent user behaviors and capture item properties. However, learning embeddings on online communities is a challenging task because user interests keep evolving. This evolution can be captured from 1) interaction between user and item, and 2) influence from other users in the community. The existing dynamic embedding models consider only one of these factors when updating user embeddings. However, at a given time, user interest evolves due to a combination of the two factors. To this end, we propose the Influence-aware and Attention-based Co-evolutionary Network (IACN). Essentially, IACN consists of two key components: an interaction modeling layer and an influence modeling layer. The interaction modeling layer is responsible for updating the embedding of a user and an item when the user interacts with the item. The influence modeling layer captures the temporal excitation caused by interactions of other users. To integrate the signals obtained from the two layers, we design a novel fusion layer that effectively combines interaction-based and influence-based embeddings to predict the final user embedding. Our model outperforms the existing state-of-the-art models from various domains.

翻訳日:2021-03-07 11:52:58 公開日:2021-03-04

# (参考訳) パンシャープニング成功のためのモデルベースの画像調整

Model-based image adjustment for a successful pansharpening ( http://arxiv.org/abs/2103.03062v1 )

ライセンス: CC BY 4.0

Gintautas Palubinskas

(参考訳) マルチレゾリューション画像融合(パンシャープニング)の強化のための新しいモデルベース画像調整法を提案する。このような画像調整は、パンクロマティックバンドおよび/または強度画像(マルチスペクトルバンドの重み付き和として計算される)を入力として使用するほとんどのパンシャープニング手法に必要です。キャリブレーションの不正確さや異なるセンサーの使用など様々な理由により、パンシャープニングの入力画像、すなわち低解像度マルチスペクトル画像(より正確には計算された強度画像)と高解像度パンクロマティック画像は、物理量の値(処理レベルに応じて放射輝度または反射率)が異なる可能性がある。しかし、両方の画像の同じオブジェクト/クラスは、類似の値またはより一般的に類似の統計を示すべきである。類似性の定義は特定のアプリケーションに依存する。 2つのセンサーからのデータをうまく融合させるには、両方のセンサーの放射輝度/反射率のエネルギーバランスを保つ必要がある。異なるセンサにおける全エネルギーの不均衡を補償するために仮想バンドを導入する。その推定はいくつかのステップからなる。まず、マルチスペクトル画像とパンクロマティック画像(低域通過フィルタ版)の両方が利用可能な低解像度スケールで個々のスペクトルバンドの重みを推定し、次に、推定した仮想バンドを高解像度スケールにアップサンプリングし、最後に、高解像度パンクロマティックバンドから仮想バンドを減算して補正する。以降のパンシャープニングでは、元のパンクロマティック画像の代わりにこの補正されたパンクロマティックバンドを使用する。例えば、コンポーネント置換ベースの手法の性能品質が大幅に向上できることを示す。

A new model-based image adjustment for the enhancement of multi-resolution image fusion, or pansharpening, is proposed. Such image adjustment is needed for most pansharpening methods that use a panchromatic band and/or an intensity image (calculated as a weighted sum of multispectral bands) as input. For various reasons, e.g. calibration inaccuracies or the use of different sensors, the input images for pansharpening (the low-resolution multispectral image, or more precisely the calculated intensity image, and the high-resolution panchromatic image) may differ in the values of their physical properties, e.g. radiances or reflectances, depending on the processing level. But the same objects/classes in both images should exhibit similar values or, more generally, similar statistics. The definition of similarity will depend on the particular application. For a successful fusion of data from two sensors, the energy balance between the radiances/reflectances of both sensors should hold. A virtual band is introduced to compensate for the total energy imbalance between the sensors. Its estimation consists of several steps: first, weights for the individual spectral bands are estimated at a low-resolution scale, where both the multispectral and panchromatic images (low-pass filtered version) are available; then, the estimated virtual band is up-sampled to the high-resolution scale; and, finally, the high-resolution panchromatic band is corrected by subtracting the virtual band. This corrected panchromatic band is used instead of the original panchromatic image in the subsequent pansharpening. It is shown, for example, that the performance of component-substitution-based methods can be increased significantly.
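
The correction steps described in this abstract can be sketched as follows, using 1-D pixel lists for brevity. This is a simplified reading under stated assumptions: the band weights are taken as already estimated, the virtual band is modeled as the low-resolution panchromatic value minus the weighted intensity, and nearest-neighbour up-sampling stands in for whatever interpolation the paper uses. All function names are hypothetical.

```python
def intensity(ms_bands, weights):
    """Weighted sum of multispectral bands, computed pixel by pixel."""
    num_pixels = len(ms_bands[0])
    return [sum(w * band[p] for w, band in zip(weights, ms_bands))
            for p in range(num_pixels)]


def upsample_nearest(signal, factor):
    """Repeat each low-resolution pixel `factor` times (assumed interpolation)."""
    return [v for v in signal for _ in range(factor)]


def correct_pan(pan_high, pan_low, ms_bands, weights, factor):
    """Subtract the up-sampled virtual band from the high-resolution pan band."""
    low_intensity = intensity(ms_bands, weights)
    # Assumed model: virtual band = low-pass pan minus weighted intensity.
    virtual_low = [p - i for p, i in zip(pan_low, low_intensity)]
    virtual_high = upsample_nearest(virtual_low, factor)
    return [p - v for p, v in zip(pan_high, virtual_high)]


# Toy data, invented for the demonstration.
ms_bands = [[10.0, 20.0], [30.0, 40.0]]
weights = [0.5, 0.5]
pan_low = [25.0, 28.0]
pan_high = [24.0, 26.0, 27.0, 29.0]
corrected = correct_pan(pan_high, pan_low, ms_bands, weights, factor=2)
```

In this toy case the virtual band absorbs the mismatch between the panchromatic values and the weighted intensity, so the corrected band is on the same scale as the intensity image.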

翻訳日:2021-03-07 11:27:09 公開日:2021-03-04

# (参考訳) DONeRF: 深度オラクルネットワークを用いたNeural Radiance Fieldsのリアルタイムレンダリングに向けて

DONeRF: Towards Real-Time Rendering of Neural Radiance Fields using Depth Oracle Networks ( http://arxiv.org/abs/2103.03231v1 )

ライセンス: CC BY 4.0

Thomas Neff, Pascal Stadlbauer, Mathias Parger, Andreas Kurz, Chakravarty R. Alla Chaitanya, Anton Kaplanyan, Markus Steinberger

(参考訳) 最近のNeural Radiance Fields(NeRF)に関する研究爆発は、新しいビュー生成のためにニューラルネットワークにシーンや照明情報を暗黙的に保存する巨大な可能性を示している。しかし、NeRFの普及を妨げる大きな制限の1つは、各ビューレイに沿って過度のネットワーク評価を禁止する計算コストであり、現在のデバイスでリアルタイムレンダリングを目指すには数十のPetaFLOPSを必要とします。現地のサンプルをシーンの表面に配置すると、各ビューレイに必要なサンプル数が大幅に削減できることを示します。この目的のために,1つのネットワーク評価で各ビューレイのレイサンプル位置を予測できる深度oracleネットワークを提案する。対数的に離散化され, 球面に歪んだ深度値を含む分類網を用いることは, 深度を直接推定するのではなく, 表面位置を符号化する上で重要であることを示す。これらの手法を組み合わせることで、深度オラクルネットワークを第1ステップとする二重ネットワーク設計のDONeRFと、局所的な試料シェーディングネットワークによる光線蓄積が実現される。当社の設計により、NeRFと比較して最大48倍の推論コストを削減します。市販の推論apiと単純な計算カーネルを組み合わせることで、レイマーチングベースのニューラルネットワーク表現を1つのgpu上でインタラクティブなフレームレート(毎秒15フレーム、800x800)でレンダリングした最初の例です。同時に、我々は表面周辺のシーンの重要部分に焦点を当てるため、NeRFと同等または良質な品質が得られる。

The recent research explosion around Neural Radiance Fields (NeRFs) shows that there is immense potential for implicitly storing scene and lighting information in neural networks, e.g., for novel view generation. However, one major limitation preventing the widespread use of NeRFs is the prohibitive computational cost of excessive network evaluations along each view ray, requiring dozens of petaFLOPS when aiming for real-time rendering on current devices. We show that the number of samples required for each view ray can be significantly reduced when local samples are placed around surfaces in the scene. To this end, we propose a depth oracle network, which predicts ray sample locations for each view ray with a single network evaluation. We show that using a classification network around logarithmically discretized and spherically warped depth values is essential to encode surface locations rather than directly estimating depth. The combination of these techniques leads to DONeRF, a dual network design with a depth oracle network as a first step and a locally sampled shading network for ray accumulation. With our design, we reduce the inference costs by up to 48x compared to NeRF. Using an off-the-shelf inference API in combination with simple compute kernels, we are the first to render raymarching-based neural representations at interactive frame rates (15 frames per second at 800x800) on a single GPU. At the same time, since we focus on the important parts of the scene around surfaces, we achieve equal or better quality compared to NeRF.
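
To make the idea of classifying over logarithmically discretized depth concrete, here is a minimal sketch that maps a depth value to a class index and back to a representative depth. The parameter names `near`, `far` and `num_bins` are assumptions for this example, and the spherical warping mentioned in the abstract is deliberately left out.

```python
import math


def log_depth_bin(depth, near, far, num_bins):
    """Map a depth in [near, far] to one of num_bins logarithmically spaced classes."""
    depth = min(max(depth, near), far)
    # Position of the depth on a log scale, normalised to [0, 1].
    t = math.log(depth / near) / math.log(far / near)
    return min(int(t * num_bins), num_bins - 1)


def bin_center(index, near, far, num_bins):
    """Depth at the log-space centre of a bin, where ray samples could be placed."""
    t = (index + 0.5) / num_bins
    return near * (far / near) ** t


# Hypothetical scene bounds, chosen only for illustration.
centers = [bin_center(i, 1.0, 100.0, 8) for i in range(8)]
```

Because the bins are spaced on a log scale, they are narrow close to the camera and wide far away, so a fixed number of classes gives finer depth resolution for nearby surfaces.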

翻訳日:2021-03-07 11:02:37 公開日:2021-03-04

# (参考訳) グラフニューラルネットワークの知識を抽出し、その先へ進む:効果的な知識蒸留フレームワーク

Extract the Knowledge of Graph Neural Networks and Go Beyond it: An Effective Knowledge Distillation Framework ( http://arxiv.org/abs/2103.02885v1 )

ライセンス: CC BY 4.0

Cheng Yang, Jiawei Liu and Chuan Shi

(参考訳) グラフ上の半教師付き学習は、機械学習の分野で重要な問題です。近年,グラフニューラルネットワーク(GNN)に基づく最先端の分類手法が,ラベル伝搬などの従来の手法よりも優れていることが示されている。しかし、これらのニューラルモデルの洗練されたアーキテクチャは複雑な予測メカニズムにつながり、例えば構造的に関連付けられたノードは同じクラスを持つ傾向があるため、データにある貴重な事前知識をフルに利用できない。本稿では,上記の課題を解決するための知識蒸留に基づく枠組みを提案する。本フレームワークは、任意の学習gnnモデル(教師モデル)の知識を抽出し、よく設計された学生モデルに注入する。学生モデルは2つの単純な予測機構、すなわちラベル伝搬と特徴変換で構築され、それぞれ構造に基づく事前知識と特徴に基づく事前知識を自然に保存する。具体的には,パラメータ化ラベル伝搬と特徴変換モジュールの訓練可能な組み合わせとして,学生モデルを設計する。その結果、学習した学生は、gnn教師の事前知識と知識の両方から、より効果的な予測を得ることができる。さらに,学習者モデルはgnnよりも解釈可能な予測プロセスを有する。我々は5つの公開ベンチマークデータセットの実験を行い、教師モデルとしてGCN, GAT, APPNP, SAGE, SGC, GCNII, GLPを含む7つのGNNモデルを用いる。実験結果から,学習者モデルは平均1.4%～4.7%の教師モデルより一貫して優れていた。コードとデータはhttps://github.com/BUPT-GAMMA/CPFで入手できます。

Semi-supervised learning on graphs is an important problem in the machine learning area. In recent years, state-of-the-art classification methods based on graph neural networks (GNNs) have shown their superiority over traditional ones such as label propagation. However, the sophisticated architectures of these neural models lead to a complex prediction mechanism, which cannot make full use of valuable prior knowledge lying in the data, e.g., structurally correlated nodes tend to have the same class. In this paper, we propose a framework based on knowledge distillation to address the above issues. Our framework extracts the knowledge of an arbitrary learned GNN model (teacher model), and injects it into a well-designed student model. The student model is built with two simple prediction mechanisms, i.e., label propagation and feature transformation, which naturally preserve structure-based and feature-based prior knowledge, respectively. Specifically, we design the student model as a trainable combination of parameterized label propagation and feature transformation modules. As a result, the learned student can benefit from both prior knowledge and the knowledge in GNN teachers for more effective predictions. Moreover, the learned student model has a more interpretable prediction process than GNNs. We conduct experiments on five public benchmark datasets and employ seven GNN models including GCN, GAT, APPNP, SAGE, SGC, GCNII and GLP as the teacher models. Experimental results show that the learned student model can consistently outperform its corresponding teacher model by 1.4% - 4.7% on average. Code and data are available at https://github.com/BUPT-GAMMA/CPF
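
The student described above blends two simple predictors. The sketch below shows one way such a blend could look for a single node: label propagation is reduced to averaging neighbours' soft labels, feature transformation to a softmax over precomputed logits, and the balance `alpha` is a fixed number rather than a trained parameter. These simplifications and all names are assumptions; this is not the code from the linked repository.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def student_predict(node, neighbors, soft_labels, feature_logits, alpha):
    """Blend a structure-based and a feature-based prediction for one node."""
    nbrs = neighbors[node]
    num_classes = len(feature_logits[node])
    # Label propagation: average the neighbours' current soft labels.
    propagated = [sum(soft_labels[n][c] for n in nbrs) / len(nbrs)
                  for c in range(num_classes)]
    # Feature transformation: here just a softmax over given logits.
    transformed = softmax(feature_logits[node])
    return [alpha * p + (1.0 - alpha) * t
            for p, t in zip(propagated, transformed)]


# Tiny hypothetical graph with three nodes and two classes.
neighbors = {0: [1, 2], 1: [0], 2: [0]}
soft_labels = {0: [0.5, 0.5], 1: [0.9, 0.1], 2: [0.7, 0.3]}
feature_logits = {0: [0.0, 0.0], 1: [2.0, 0.0], 2: [1.0, 0.0]}
prediction = student_predict(0, neighbors, soft_labels, feature_logits, alpha=0.5)
```

With `alpha` close to 1 the node follows its neighbours, and with `alpha` close to 0 it relies on its own features, which is what makes the blended prediction easy to inspect.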

翻訳日:2021-03-07 10:23:20 公開日:2021-03-04

# (参考訳) 胸部X線分類のための自己教師あり深層畳み込みニューラルネットワーク

Self-supervised deep convolutional neural network for chest X-ray classification ( http://arxiv.org/abs/2103.03055v1 )

ライセンス: CC BY 4.0

Matej Gazda, Jakub Gazda, Jan Plavka, Peter Drotar

(参考訳) 胸部X線撮影は、診断決定を行うための重要な情報を伝える比較的安価で広く利用可能な医療手順です。胸部x線は肺炎や最近のcovid-19などの呼吸器疾患の診断によく用いられる。本論文では,ラベルのない胸部X線データセット上に予め訓練された自己監視型ディープニューラルネットワークを提案する。学習された表現は、呼吸器疾患の分類である下流タスクに転送される。 4つの公開データセットで得られた結果は、私たちのアプローチが大量のラベル付きトレーニングデータを必要とせずに競争結果をもたらすことを示しています。

Chest radiography is a relatively cheap, widely available medical procedure that conveys key information for making diagnostic decisions. Chest X-rays are almost always used in the diagnosis of respiratory diseases such as pneumonia or the recent COVID-19. In this paper, we propose a self-supervised deep neural network that is pretrained on an unlabeled chest X-ray dataset. The learned representations are transferred to a downstream task, the classification of respiratory diseases. The results obtained on four public datasets show that our approach yields competitive results without requiring large amounts of labeled training data.

翻訳日:2021-03-07 10:07:13 公開日:2021-03-04

# (参考訳) 勾配プルーニング付き符号対称フィードバックアライメントによるエッジデバイス上での畳み込みニューラルネットワークの効率的な学習

Efficient Training Convolutional Neural Networks on Edge Devices with Gradient-pruned Sign-symmetric Feedback Alignment ( http://arxiv.org/abs/2103.02889v1 )

ライセンス: CC BY 4.0

Ziyang Hong and C. Patrick Yue

(参考訳) モバイルデバイスの繁栄に伴い、分散データによるモデルトレーニングを可能にする分散学習アプローチが広く研究されている。しかし,エッジデバイスにおける学習能力の欠如は,実生活における分散学習のエネルギー効率を著しく低下させる。本稿では,従来のバックプロパゲーションの冗長性と重み非対称性ポテンシャルを利用したDNNのトレーニング手法について述べる。提案手法は, 分類精度の損失が無視できるほど, エネルギー効率の面では, 先行技術よりも5倍高いことを実証する。

With the prosperity of mobile devices, the distributed learning approach enabling model training with decentralized data has attracted wide research. However, the lack of training capability for edge devices significantly limits the energy efficiency of distributed learning in real life. This paper describes a novel approach to training DNNs that exploits the redundancy and the weight asymmetry potential of conventional backpropagation. We demonstrate that with negligible classification accuracy loss, the proposed approach outperforms the prior art by 5x in terms of energy efficiency.
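
The two ingredients named in the paper title, sign-symmetric feedback and gradient pruning, can be illustrated for a single linear layer. In this sketch the backward pass uses only the signs of the forward weights, and pruning is a plain magnitude threshold. The abstract does not describe the actual pruning rule, so the threshold and both function names are assumptions.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def backward_sign_symmetric(weights, grad_output):
    """Propagate an error signal through a linear layer using sign(W) instead of W.

    weights: rows of shape [out_features][in_features]
    grad_output: list of length out_features
    """
    in_features = len(weights[0])
    return [
        sum(sign(weights[o][i]) * grad_output[o] for o in range(len(weights)))
        for i in range(in_features)
    ]


def prune_gradient(grad, threshold):
    """Zero out gradient entries whose magnitude is below the threshold (assumed rule)."""
    return [g if abs(g) >= threshold else 0.0 for g in grad]


# Hypothetical layer and error signal.
layer_weights = [[0.5, -2.0], [-0.1, 0.3]]
grad_in = backward_sign_symmetric(layer_weights, [1.0, 2.0])
sparse_grad = prune_gradient([0.01, -0.5, 0.2], threshold=0.1)
```

Replacing multiplications by sign flips and skipping zeroed entries are both plausible sources of energy savings on edge hardware, although the measured 5x figure is specific to the paper's design.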

翻訳日:2021-03-07 09:46:56 公開日:2021-03-04

# (参考訳) パンデミック・ドラッグのパンデミック・スピード--高性能コンピュータ上でのハイブリッド機械学習と物理シミュレーションによるcovid-19創薬の加速

Pandemic Drugs at Pandemic Speed: Accelerating COVID-19 Drug Discovery with Hybrid Machine Learning- and Physics-based Simulations on High Performance Computers ( http://arxiv.org/abs/2103.02843v1 )

ライセンス: CC BY 4.0

Agastya P. Bhati, Shunzhou Wan, Dario Alfè, Austin R. Clyde, Mathis Bode, Li Tan, Mikhail Titov, Andre Merzky, Matteo Turilli, Shantenu Jha, Roger R. Highfield, Walter Rocchia, Nicola Scafuri, Sauro Succi, Dieter Kranzlmüller, Gerald Mathias, David Wifling, Yann Donon, Alberto Di Meglio, Sofia Vallecorsa, Heng Ma, Anda Trifan, Arvind Ramanathan, Tom Brettin, Alexander Partin, Fangfang Xia, Xiaotan Duan, Rick Stevens, Peter V. Coveney

(参考訳) 世界的パンデミックの課題に対応するための競争は、既存の医薬品発見プロセスが高価で非効率で遅いことを改めて認識させた。抗ウイルス薬開発のためのリード化合物を絞り込むために、膨大な数の候補小分子をスクリーニングする段階に大きなボトルネックがあります。薬物発見を加速する新たな機会は、機械学習手法(本研究では線形加速器用に開発されたもの)と物理に基づく手法のインターフェースにある。この2つのin silico手法にはそれぞれ独自の利点と制限があり、興味深いことに互いに補完し合う。本稿では、薬物発見を加速するために両アプローチを組み合わせた革新的な方法を提案する。結果として生じるワークフローの規模は、ハイパフォーマンスコンピューティングを必要とするほど大きい。我々は、このワークフローを4つのCOVID-19標的タンパク質に適用できることと、様々なスーパーコンピュータ上でリード化合物を同定するために必要な大規模計算を実行できることを示した。

The race to meet the challenges of the global pandemic has served as a reminder that the existing drug discovery process is expensive, inefficient and slow. There is a major bottleneck in screening the vast number of potential small molecules to shortlist lead compounds for antiviral drug development. New opportunities to accelerate drug discovery lie at the interface between machine learning methods, in this case developed for linear accelerators, and physics-based methods. The two in silico methods each have their own advantages and limitations, which, interestingly, complement each other. Here, we present an innovative method that combines both approaches to accelerate drug discovery. The scale of the resulting workflow is such that it is dependent on high performance computing. We have demonstrated the applicability of this workflow on four COVID-19 target proteins and our ability to perform the required large-scale calculations to identify lead compounds on a variety of supercomputers.

翻訳日:2021-03-07 08:25:43 公開日:2021-03-04

# (参考訳) 雑音ラベル学習のための強化戦略

Augmentation Strategies for Learning with Noisy Labels ( http://arxiv.org/abs/2103.02130v2 )

ライセンス: CC BY 4.0

Kento Nishi, Yi Ding, Alex Rich, Tobias Höllerer

(参考訳) 不完全なラベルは、実世界のデータセットに遍在する。ラベルノイズに頑健なディープニューラルネットワーク(DNN)を訓練する最近の成功した手法のいくつかは、2つの主要な技術を用いている。ウォームアップフェーズ中の損失に基づいてサンプルをフィルタリングしてクリーンなラベル付きサンプルの初期セットを選別することと、その後の損失計算のための擬似ラベルとしてネットワークの出力を使用することである。本稿では,「ノイズラベルを用いた学習」問題に取り組むアルゴリズムのためのデータ拡張戦略を評価する。 CIFAR-10 と CIFAR-100 に基づく合成データセットと実世界データセット Clothing1M を用いて,複数の拡張戦略を提案し,検討する。これらのアルゴリズムにはいくつかの共通点があるため、損失モデリングタスクに1組の拡張を、学習に別の1組の拡張を用いることが最も効果的であり、最先端手法や他の従来手法の結果を改善することが判明した。さらに, ウォームアップ期間中に拡張を適用すると, 正しくラベル付けされたサンプルと誤ってラベル付けされたサンプルの損失収束挙動に悪影響を及ぼし得ることがわかった。我々は,この拡張戦略を最先端技術に導入し,評価されたすべてのノイズレベルにおける性能向上を実証する。特に、90%の対称ノイズを加えたCIFAR-10ベンチマークの精度を絶対値で15%以上向上し、実世界のデータセットであるClothing1Mの性能も向上する。 (※同等の貢献)

Imperfect labels are ubiquitous in real-world datasets. Several recent successful methods for training deep neural networks (DNNs) robust to label noise have used two primary techniques: filtering samples based on loss during a warm-up phase to curate an initial set of cleanly labeled samples, and using the output of a network as a pseudo-label for subsequent loss calculations. In this paper, we evaluate different augmentation strategies for algorithms tackling the "learning with noisy labels" problem. We propose and examine multiple augmentation strategies and evaluate them using synthetic datasets based on CIFAR-10 and CIFAR-100, as well as on the real-world dataset Clothing1M. Due to several commonalities in these algorithms, we find that using one set of augmentations for loss modeling tasks and another set for learning is the most effective, improving results on the state-of-the-art and other previous methods. Furthermore, we find that applying augmentation during the warm-up period can negatively impact the loss convergence behavior of correctly versus incorrectly labeled samples. We introduce this augmentation strategy to the state-of-the-art technique and demonstrate that we can improve performance across all evaluated noise levels. In particular, we improve accuracy on the CIFAR-10 benchmark at 90% symmetric noise by more than 15% in absolute accuracy and we also improve performance on the real-world dataset Clothing1M. (* equal contribution)
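
The two techniques this abstract builds on, loss-based filtering and pseudo-labeling, can be sketched in a few lines. The fixed loss threshold is a simplification assumed for this example (published methods typically model the loss distribution instead), and the function names are hypothetical.

```python
def split_by_loss(losses, threshold):
    """Treat low-loss samples as probably clean and the rest as probably noisy."""
    clean = [i for i, loss in enumerate(losses) if loss < threshold]
    noisy = [i for i, loss in enumerate(losses) if loss >= threshold]
    return clean, noisy


def pseudo_label(probabilities):
    """Use the network's most confident class as a replacement label."""
    return max(range(len(probabilities)), key=lambda c: probabilities[c])


# Hypothetical per-sample losses after a warm-up phase.
clean_ids, noisy_ids = split_by_loss([0.1, 2.3, 0.4, 1.7], threshold=1.0)
new_label = pseudo_label([0.1, 0.7, 0.2])
```

In the setting the abstract describes, the losses fed to a split like this would be computed under one set of augmentations, while the gradient update would use another set.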

翻訳日:2021-03-07 08:08:26 公開日:2021-03-04

# (参考訳) MotionRNN:時空変動運動を用いたフレキシブルな映像予測モデル

MotionRNN: A Flexible Model for Video Prediction with Spacetime-Varying Motions ( http://arxiv.org/abs/2103.02243v2 )

ライセンス: CC BY 4.0

Haixu Wu, Zhiyu Yao, Mingsheng Long, Jianmin Wang

(参考訳) 本稿では,空間と時間の両方で絶え間なく変化する時空変動運動を予測する新たな次元から映像予測に取り組む。以前の方法は、主に時間状態遷移を捕捉するが、運動自体の複雑な時空間変動を見落とし、絶えず変化する動きに適応することは困難である。物理世界の動きは過渡的な変動と動きの傾向に分解できるが、後者は過去の動きの蓄積と見なすことができる。したがって、時空変動運動をより予測可能にする鍵は、過渡変動と運動トレンドを同時に捉えることである。これらの観察に基づいて,モーション内の複雑な変動を捉え,時空変動のシナリオに適応できる motionrnn フレームワークを提案する。 MotionRNNには2つの主な貢献がある。 1つ目は、過渡変動と動きの傾向を統一的にモデル化できるモーションGRUユニットを設計することである。 2つ目は、rnnベースの予測モデルにmotiongruを適用し、変化可能な動きの予測能力を大幅に向上し、積み重ねられた多層予測モデルにおける動き消失を回避する新しいフレキシブルビデオ予測アーキテクチャを示すことである。高い柔軟性により、このフレームワークは決定論的時空間予測のための一連のモデルに適応することができる。当社の MotionRNN は、時空変動運動によるビデオ予測の3つの困難なベンチマークで大幅な改善をもたらすことができます。

This paper tackles video prediction from a new dimension of predicting spacetime-varying motions that are incessantly changing across both space and time. Prior methods mainly capture the temporal state transitions but overlook the complex spatiotemporal variations of the motion itself, making it difficult for them to adapt to ever-changing motions. We observe that physical world motions can be decomposed into transient variation and motion trend, while the latter can be regarded as the accumulation of previous motions. Thus, simultaneously capturing the transient variation and the motion trend is the key to making spacetime-varying motions more predictable. Based on these observations, we propose the MotionRNN framework, which can capture the complex variations within motions and adapt to spacetime-varying scenarios. MotionRNN has two main contributions. The first is that we design the MotionGRU unit, which can model the transient variation and motion trend in a unified way. The second is that we apply the MotionGRU to RNN-based predictive models and present a new flexible video prediction architecture with a Motion Highway that can significantly improve the ability to predict changeable motions and avoid motion vanishing for stacked multiple-layer predictive models. With high flexibility, this framework can adapt to a series of models for deterministic spatiotemporal prediction. Our MotionRNN can yield significant improvements on three challenging benchmarks for video prediction with spacetime-varying motions.

翻訳日:2021-03-07 07:35:04 公開日:2021-03-04

# (参考訳) ID-Unet: ビュー合成のための反復ソフトとハード変形

ID-Unet: Iterative Soft and Hard Deformation for View Synthesis ( http://arxiv.org/abs/2103.02264v2 )

ライセンス: CC BY 4.0

Mingyu Yin, Li Sun, Qingli Li

(参考訳) ビュー合成は通常、オートエンコーダによって行われ、エンコーダはソースビュー画像を潜在コンテンツコードにマッピングし、デコーダはその条件に従ってターゲットビューイメージに変換する。しかし、ソースの内容はよくこの設定に保持されていないため、ビュー翻訳中に不要な変更が発生します。 unetのようなスキップ接続の追加は問題を緩和するが、ビューの適合性に障害を引き起こすことが多い。本稿では, 音源から目標への変形を反復的に行う新しいアーキテクチャを提案する。エンコーダの複数の層からの機能を単に組み込むのではなく、ソフトで硬い変形モジュールを設計し、それによってエンコーダの機能を異なる解像度でターゲットビューにワープし、詳細を補うためにデコーダに結果を与える。特に、現在の反り流は、同じ解像度の特徴を調整するだけでなく、高解像度の特徴を粗く変形させる近似としても使用されます。そして、残留流を高分解能で推定して印加することにより、粗粒度から細粒度までの変形が構築される。モデルをよりよく制約するために,中間フローとその歪んだ特徴に基づいて,粗い目標視像を合成する。 2つの異なるデータセットにおける広範なアブレーション研究と最終結果は,提案モデルの有効性を示している。

View synthesis is usually done by an autoencoder, in which the encoder maps a source view image into a latent content code, and the decoder transforms it into a target view image according to the condition. However, the source contents are often not well kept in this setting, which leads to unnecessary changes during the view translation. Although adding skip connections, as in Unet, alleviates the problem, it often causes failures in view conformity. This paper proposes a new architecture that performs the source-to-target deformation in an iterative way. Instead of simply incorporating the features from multiple layers of the encoder, we design soft and hard deformation modules, which warp the encoder features to the target view at different resolutions, and give the results to the decoder to complement the details. In particular, the current warping flow is not only used to align the feature of the same resolution, but also as an approximation to coarsely deform the high-resolution feature. Then the residual flow is estimated and applied at the high resolution, so that the deformation is built up in a coarse-to-fine fashion. To better constrain the model, we synthesize a rough target view image based on the intermediate flows and their warped features. The extensive ablation studies and the final results on two different data sets show the effectiveness of the proposed model.

翻訳日:2021-03-07 06:41:47 公開日:2021-03-04

# (参考訳) ReLU・sine・指数関数を活性化関数とするディープニューラルネットワークはHölderクラス上で次元の呪いを破る

Deep Neural Networks with ReLU-Sine-Exponential Activations Break Curse of Dimensionality on Hölder Class ( http://arxiv.org/abs/2103.00542v2 )

ライセンス: CC BY 4.0

Yuling Jiao, Yanming Lai, Xiliang Lu, Zhijian Yang

(参考訳) 本論文では,ReLU,sine,および$2^x$を活性化関数とするニューラルネットワークを構築する。連続率 $\omega_f(\cdot)$ を持つ $[0,1]^d$ 上の一般の連続関数 $f$ に対して,近似率 $\mathcal{O}(\omega_f(\sqrt{d})\cdot2^{-M}+\omega_{f}\left(\frac{\sqrt{d}}{N}\right))$ を達成するReLU-sine-$2^x$ネットワークを構築する。ここで $M,N\in \mathbb{N}^{+}$ はネットワークの幅に関するハイパーパラメータである。その結果,深さ $5$,幅 $\max\left\{\left\lceil2d^{3/2}\left(\frac{3\mu}{\epsilon}\right)^{1/{\alpha}}\right\rceil,2\left\lceil\log_2\frac{3\mu d^{\alpha/2}}{2\epsilon}\right\rceil+2\right\}$ のReLU-sine-$2^x$ネットワークを構築でき,$L^p$ ノルム($p\in[1,\infty)$)で測った許容誤差 $\epsilon >0$ 以内で $f\in \mathcal{H}_{\mu}^{\alpha}([0,1]^d)$ を近似できる。ここで $\mathcal{H}_{\mu}^{\alpha}([0,1]^d)$ は,次数 $\alpha \in (0,1]$,定数 $\mu > 0$ の $[0,1]^d$ 上のHölder連続関数クラスを表す。したがって、ReLU-sine-$2^x$ネットワークは、$\mathcal{H}_{\mu}^{\alpha}([0,1]^d)$上の次元の呪いを克服する。高い表現力に加えて、ReLU-sine-$2^x$ネットワークで実装された関数は(一般化)微分可能であり、SGDを訓練に適用することができる。

In this paper, we construct neural networks with ReLU, sine and $2^x$ as activation functions. For a general continuous function $f$ defined on $[0,1]^d$ with modulus of continuity $\omega_f(\cdot)$, we construct ReLU-sine-$2^x$ networks that achieve an approximation rate of $\mathcal{O}(\omega_f(\sqrt{d})\cdot2^{-M}+\omega_{f}\left(\frac{\sqrt{d}}{N}\right))$, where $M,N\in \mathbb{N}^{+}$ denote the hyperparameters related to the widths of the networks. As a consequence, we can construct a ReLU-sine-$2^x$ network with depth $5$ and width $\max\left\{\left\lceil2d^{3/2}\left(\frac{3\mu}{\epsilon}\right)^{1/{\alpha}}\right\rceil,2\left\lceil\log_2\frac{3\mu d^{\alpha/2}}{2\epsilon}\right\rceil+2\right\}$ that approximates $f\in \mathcal{H}_{\mu}^{\alpha}([0,1]^d)$ within a given tolerance $\epsilon >0$ measured in the $L^p$ norm, $p\in[1,\infty)$, where $\mathcal{H}_{\mu}^{\alpha}([0,1]^d)$ denotes the Hölder continuous function class defined on $[0,1]^d$ with order $\alpha \in (0,1]$ and constant $\mu > 0$. Therefore, the ReLU-sine-$2^x$ networks overcome the curse of dimensionality on $\mathcal{H}_{\mu}^{\alpha}([0,1]^d)$. In addition to their super expressive power, functions implemented by ReLU-sine-$2^x$ networks are (generalized) differentiable, enabling us to apply SGD for training.
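
To see how the general rate specializes to the Hölder class, the short derivation below substitutes the Hölder bound on the modulus of continuity into the stated rate. It assumes the Hölder condition is taken with respect to the Euclidean norm, which the abstract does not state explicitly, and it does not track the constant hidden in the $\mathcal{O}(\cdot)$.

```latex
% Assumption: Hölder condition with respect to the Euclidean norm.
\[
|f(x)-f(y)| \le \mu\,\|x-y\|_2^{\alpha}
\quad\Longrightarrow\quad
\omega_f(r) \le \mu\, r^{\alpha}.
\]
% Substituting r = \sqrt{d} and r = \sqrt{d}/N into the stated rate:
\begin{align*}
\omega_f(\sqrt{d})\cdot 2^{-M} + \omega_f\!\left(\frac{\sqrt{d}}{N}\right)
&\le \mu\, d^{\alpha/2}\, 2^{-M} + \mu \left(\frac{\sqrt{d}}{N}\right)^{\alpha} \\
&= \mu\, d^{\alpha/2}\left(2^{-M} + N^{-\alpha}\right).
\end{align*}
```

In this bound the dimension $d$ enters only through the polynomial factor $d^{\alpha/2}$, which is consistent with the width in the abstract growing polynomially in $d$.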

翻訳日:2021-03-07 06:12:46 公開日:2021-03-04

# フレーズベースおよびニューラルマシン翻訳の実証的分析

An empirical analysis of phrase-based and neural machine translation ( http://arxiv.org/abs/2103.03108v1 )

ライセンス: Link先を確認

Hamidreza Ghader

(参考訳) 機械翻訳(MT)の2つの一般的なタイプは、フレーズベースとニューラルマシン翻訳システムです。どちらのシステムも複数の複雑なモデルや層で構成されている。これらのモデルとレイヤはそれぞれ、ソース言語の異なる言語的側面を学ぶ。しかし,これらのモデルや層について,どの言語現象が学習されるのか,どのように学習されるのかは明らかになっていない。フレーズベースのMTシステムでは、各モデルでどのような情報が学習されるのかが明確であり、むしろこの情報がどのように学習されるか、特に句の並べ替えモデルについてである。ニューラルマシン翻訳システムでは、その状況はさらに複雑であり、多くの場合、どのような情報が学習され、どのように学習されるかは正確には分かっていない。 MTシステムでは,言語現象がどのように捉えられているかを明らかにするために,フレーズベースとニューラルMTシステムの両方において重要なモデルの挙動を解析する。本研究では, フレーズリオーダリングモデルを用いて, フレーズリオーダリングの動作を定義する上で, フレーズ内のどの単語がもっとも影響が大きいかを検討する。さらに、ニューラルMTシステムの解釈可能性に寄与するために、ニューラルMTシステムにおける重要なコンポーネントである注意モデルと、フレーズベースのシステムにおけるフレーズリオーダーモデルに最も近いモデルの振る舞いを研究します。注意モデルとエンコーダ隠された状態表現は、神経MTのソース側言語情報をエンコードする主要なコンポーネントを形成する。この目的のために、我々はまた、神経MTシステムのエンコーダ隠蔽状態表現でキャプチャされた情報を分析します。異なるニューラルMTアーキテクチャの隠れた状態表現によって、ソース側からの構文的および語彙的セマンティック情報が捕捉される範囲を調査する。

Two popular types of machine translation (MT) are phrase-based and neural machine translation systems. Both of these types of systems are composed of multiple complex models or layers. Each of these models and layers learns different linguistic aspects of the source language. However, for some of these models and layers, it is not clear which linguistic phenomena are learned or how this information is learned. For phrase-based MT systems, it is often clear what information is learned by each model, and the question is rather how this information is learned, especially for its phrase reordering model. For neural machine translation systems, the situation is even more complex, since for many cases it is not exactly clear what information is learned and how it is learned. To shed light on what linguistic phenomena are captured by MT systems, we analyze the behavior of important models in both phrase-based and neural MT systems. We consider phrase reordering models from phrase-based MT systems to investigate which words from inside of a phrase have the biggest impact on defining the phrase reordering behavior. Additionally, to contribute to the interpretability of neural MT systems we study the behavior of the attention model, which is a key component in neural MT systems and the closest model in functionality to phrase reordering models in phrase-based systems. The attention model together with the encoder hidden state representations form the main components to encode source side linguistic information in neural MT. To this end, we also analyze the information captured in the encoder hidden state representations of a neural MT system. We investigate the extent to which syntactic and lexical-semantic information from the source side is captured by hidden state representations of different neural MT architectures.

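Translation date: 2021-03-05 15:13:04 Publication date: 2021-03-04

# Structure-Preserving Progressive Low-rank Image Completion for Defending Adversarial Attacks

Structure-Preserving Progressive Low-rank Image Completion for Defending Adversarial Attacks ( http://arxiv.org/abs/2103.02781v1 )

License: see the linked page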

Zhiqun Zhao, Hengyou Wang, Hao Sun and Zhihai He

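(Reference translation) Deep neural networks recognize objects by analyzing local image details and summarizing the information along the inference layers to derive a final decision. For this reason they are prone to adversarial attacks. Small, sophisticated noise in the input image can accumulate along the network inference path and produce a wrong decision at the network output. Human eyes, by contrast, recognize objects based on global structure and semantic cues rather than local image texture. For this reason human eyes can still clearly recognize objects in images that have been heavily damaged by adversarial attacks. This leads to a very interesting approach to defending deep neural networks against adversarial attacks. In this work, we propose a structure-preserving progressive low-rank image completion (SPLIC) method that removes unneeded texture details from the input image and shifts the bias of deep neural networks toward global object structure and semantic cues. To avoid local minima during optimization, we formulate the problem as a low-rank matrix completion problem with progressively smoothed rank functions. Experiments show that the proposed method can remove insignificant local image details while preserving important global object structure. Under black-box, gray-box, and white-box attacks, it outperforms existing defense methods (by up to 12.6%) and significantly improves the adversarial robustness of the network.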

Deep neural networks recognize objects by analyzing local image details and summarizing their information along the inference layers to derive the final decision. Because of this, they are prone to adversarial attacks. Small sophisticated noise in the input images can accumulate along the network inference path and produce wrong decisions at the network output. On the other hand, human eyes recognize objects based on their global structure and semantic cues, instead of local image textures. Because of this, human eyes can still clearly recognize objects from images which have been heavily damaged by adversarial attacks. This leads to a very interesting approach for defending deep neural networks against adversarial attacks. In this work, we propose a structure-preserving progressive low-rank image completion (SPLIC) method to remove unneeded texture details from the input images and shift the bias of deep neural networks towards global object structures and semantic cues. We formulate the problem as a low-rank matrix completion problem with progressively smoothed rank functions to avoid local minima during the optimization process. Our experimental results demonstrate that the proposed method is able to successfully remove the insignificant local image details while preserving important global object structures. On black-box, gray-box, and white-box attacks, our method outperforms existing defense methods (by up to 12.6%) and significantly improves the adversarial robustness of the network.

Translation date: 2021-03-05 15:12:13 Publication date: 2021-03-04

# Lost in Pruning: The Effects of Pruning Neural Networks beyond Test Accuracy

Lost in Pruning: The Effects of Pruning Neural Networks beyond Test Accuracy ( http://arxiv.org/abs/2103.03014v1 )

License: see the linked page

Lucas Liebenwein, Cenk Baykal, Brandon Carter, David Gifford, Daniela Rus

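(Reference translation) Neural network pruning is a popular technique for reducing the inference cost of modern, potentially overparameterized networks. Starting from a pre-trained network, the process removes redundant parameters, retrains, and repeats while maintaining the same test accuracy. The result is a model that is a fraction of the original size with comparable predictive performance (test accuracy). Here we reassess whether using test accuracy alone in the terminating condition is sufficient to ensure that the resulting model performs well across a wide range of "harder" metrics, such as generalization to out-of-distribution data and resilience to noise. Across evaluations on various architectures and datasets, we find that pruned networks effectively approximate the unpruned model, but the prune ratio at which pruned networks achieve comparable performance varies greatly across tasks. These results call into question the extent of genuine overparameterization in deep learning and raise concerns about the practicability of deploying pruned networks, particularly in the context of safety-critical systems. Our code is available at https://github.com/lucaslie/torchprune.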

Neural network pruning is a popular technique used to reduce the inference costs of modern, potentially overparameterized, networks. Starting from a pre-trained network, the process is as follows: remove redundant parameters, retrain, and repeat while maintaining the same test accuracy. The result is a model that is a fraction of the size of the original with comparable predictive performance (test accuracy). Here, we reassess and evaluate whether the use of test accuracy alone in the terminating condition is sufficient to ensure that the resulting model performs well across a wide spectrum of "harder" metrics such as generalization to out-of-distribution data and resilience to noise. Across evaluations on varying architectures and data sets, we find that pruned networks effectively approximate the unpruned model; however, the prune ratio at which pruned networks achieve commensurate performance varies significantly across tasks. These results call into question the extent of genuine overparameterization in deep learning and raise concerns about the practicability of deploying pruned networks, specifically in the context of safety-critical systems, unless they are widely evaluated beyond test accuracy to reliably predict their performance. Our code is available at https://github.com/lucaslie/torchprune.
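
The prune, retrain, repeat loop described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' torchprune code: magnitude-based pruning is swapped in as the removal criterion, and `retrain`, `test_accuracy`, `fraction`, and `tolerance` are hypothetical names supplied by the caller.

```python
def magnitude_prune(weights, fraction):
    """Zero out the given fraction of the still-nonzero weights with the smallest magnitude."""
    alive = sorted((abs(w), i) for i, w in enumerate(weights) if w != 0.0)
    n_drop = int(len(alive) * fraction)
    drop = {i for _, i in alive[:n_drop]}
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def prune_until_accuracy_drops(weights, retrain, test_accuracy,
                               fraction=0.2, tolerance=0.005, max_rounds=50):
    """Repeat prune + retrain while test accuracy stays within `tolerance` of the baseline.

    `retrain` and `test_accuracy` are hypothetical callables standing in for a real
    training loop and evaluation; test accuracy is the only stopping criterion here.
    """
    baseline = test_accuracy(weights)
    best = list(weights)
    for _ in range(max_rounds):
        candidate = retrain(magnitude_prune(best, fraction))
        # Stop when nothing more can be pruned or accuracy falls below the baseline.
        if candidate == best or test_accuracy(candidate) < baseline - tolerance:
            break
        best = candidate
    return best
```

Because the loop stops on test accuracy alone, it says nothing about robustness to noise or out-of-distribution inputs, which is the gap the abstract points at.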

Translation date: 2021-03-05 15:11:49 Publication date: 2021-03-04

# Sub-pixel face landmarks using heatmaps and a bag of tricks

Sub-pixel face landmarks using heatmaps and a bag of tricks ( http://arxiv.org/abs/2103.03059v1 )

License: see the linked page

Samuel W. F. Earp and Aubin Samacoits and Sanjana Jain and Pavit Noinongyao and Siwa Boonpunmongkol

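(Reference translation) Accurate face landmark localization is an essential part of face recognition, reconstruction, and morphing. To localize face landmarks accurately, we propose a heatmap regression approach. Each model consists of a MobileNetV2 backbone followed by several upscaling layers, with various tricks that optimize both performance and inference cost. Rather than using a bounding box as in traditional methods, we use five naïve face landmarks to position and align the face. Furthermore, by adding random rotation, displacement, and scaling after alignment, we find that the model is more sensitive to face position than to orientation. We also show that mixing deconvolution and pixel-shuffle layers reduces the upscaling complexity without harming localization performance. We present a state-of-the-art face landmark localization model (ranked second on the validation set of the 2nd Grand Challenge of 106-Point Facial Landmark Localization). Finally, we test the effect of using these landmarks on face recognition, with a publicly available model and benchmarks.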

Accurate face landmark localization is an essential part of face recognition, reconstruction and morphing. To accurately localize face landmarks, we present our heatmap regression approach. Each model consists of a MobileNetV2 backbone followed by several upscaling layers, with different tricks to optimize both performance and inference cost. We use five naïve face landmarks from a publicly available face detector to position and align the face instead of using the bounding box like traditional methods. Moreover, we show by adding random rotation, displacement and scaling -- after alignment -- that the model is more sensitive to the face position than orientation. We also show that it is possible to reduce the upscaling complexity by using a mixture of deconvolution and pixel-shuffle layers without impeding localization performance. We present our state-of-the-art face landmark localization model (ranking second on The 2nd Grand Challenge of 106-Point Facial Landmark Localization validation set). Finally, we test the effect on face recognition using these landmarks, using a publicly available model and benchmarks.

Translation date: 2021-03-05 15:11:22 Publication date: 2021-03-04

# BM3D vs 2-Layer ONN

BM3D vs 2-Layer ONN ( http://arxiv.org/abs/2103.03060v1 )

License: see the linked page

Junaid Malik, Serkan Kiranyaz, Mehmet Yamac, Moncef Gabbouj

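(Reference translation) Despite their recent success in image denoising, the need for deep and complex architectures hinders the practical use of CNNs. Older but computationally efficient methods such as BM3D remain common, especially in resource-constrained scenarios. In this study, we examine whether compact neural networks can achieve competitive results compared with BM3D in AWGN image denoising. To this end, we configure networks with only two hidden layers and compare their performance with BM3D at different AWGN noise levels, using different neuron models and layer widths. The results suggest that the self-organized variant of operational neural networks based on a generative neuron model (Self-ONNs) is not only a better choice than CNNs but is also competitive with BM3D, and even better at high noise levels.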

Despite their recent success on image denoising, the need for deep and complex architectures still hinders the practical usage of CNNs. Older but computationally more efficient methods such as BM3D remain a popular choice, especially in resource-constrained scenarios. In this study, we aim to find out whether compact neural networks can learn to produce competitive results as compared to BM3D for AWGN image denoising. To this end, we configure networks with only two hidden layers and employ different neuron models and layer widths for comparing the performance with BM3D across different AWGN noise levels. Our results conclusively show that the recently proposed self-organized variant of operational neural networks based on a generative neuron model (Self-ONNs) is not only a better choice as compared to CNNs, but also provides competitive results as compared to BM3D and even significantly surpasses it for high noise levels.

Translation date: 2021-03-05 15:11:01 Publication date: 2021-03-04

# Barlow Twins: Self-Supervised Learning via Redundancy Reduction

Barlow Twins: Self-Supervised Learning via Redundancy Reduction ( http://arxiv.org/abs/2103.03230v1 )

License: see the linked page

Jure Zbontar, Li Jing, Ishan Misra, Yann LeCun, Stéphane Deny

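(Reference translation) Self-supervised learning (SSL) is rapidly closing the gap with supervised methods on large computer vision benchmarks. A successful approach to SSL is to learn representations that are invariant to distortions of the input sample. A recurring problem with this approach, however, is the existence of trivial constant representations. Most current methods avoid such collapsed solutions through careful implementation. We propose an objective function that naturally avoids such collapse by measuring the cross-correlation matrix between the outputs of two identical networks fed with distorted versions of a sample and making it as close to the identity matrix as possible. This makes the representation vectors of distorted samples similar while minimizing the redundancy between the components of these vectors. The method is called Barlow Twins because it applies neuroscientist H. Barlow's redundancy-reduction principle to a pair of identical networks. Barlow Twins requires neither large batches nor asymmetry between the network twins, such as a predictor network, gradient stopping, or a moving average on the weight updates. It can use very high-dimensional output vectors. Barlow Twins outperforms previous methods on ImageNet for semi-supervised classification in the low-data regime, and is on par with the state of the art for ImageNet classification with a linear classifier head and for transfer tasks of classification and object detection.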

Self-supervised learning (SSL) is rapidly closing the gap with supervised methods on large computer vision benchmarks. A successful approach to SSL is to learn representations which are invariant to distortions of the input sample. However, a recurring issue with this approach is the existence of trivial constant representations. Most current methods avoid such collapsed solutions by careful implementation details. We propose an objective function that naturally avoids such collapse by measuring the cross-correlation matrix between the outputs of two identical networks fed with distorted versions of a sample, and making it as close to the identity matrix as possible. This causes the representation vectors of distorted versions of a sample to be similar, while minimizing the redundancy between the components of these vectors. The method is called Barlow Twins, owing to neuroscientist H. Barlow's redundancy-reduction principle applied to a pair of identical networks. Barlow Twins does not require large batches nor asymmetry between the network twins such as a predictor network, gradient stopping, or a moving average on the weight updates. It allows the use of very high-dimensional output vectors. Barlow Twins outperforms previous methods on ImageNet for semi-supervised classification in the low-data regime, and is on par with current state of the art for ImageNet classification with a linear classifier head, and for transfer tasks of classification and object detection.
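
As a rough illustration of the objective described in this abstract, the sketch below computes a cross-correlation matrix between two batches of embeddings and penalizes its distance from the identity matrix. It is a minimal pure-Python sketch, not the authors' implementation: the function names are made up for this example, and the split into diagonal and off-diagonal terms with an `off_diag_weight` trade-off of 0.005 is an assumption, since the abstract does not state the exact form of the loss.

```python
import math


def standardize(batch):
    """Give every feature (column) zero mean and unit variance across the batch (rows)."""
    n, d = len(batch), len(batch[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n) or 1.0  # guard constant columns
        for i in range(n):
            out[i][j] = (batch[i][j] - mean) / std
    return out


def cross_correlation(za, zb):
    """d x d matrix of correlations between feature i of za and feature j of zb."""
    a, b = standardize(za), standardize(zb)
    n, d = len(a), len(a[0])
    return [[sum(a[k][i] * b[k][j] for k in range(n)) / n for j in range(d)]
            for i in range(d)]


def redundancy_reduction_loss(za, zb, off_diag_weight=0.005):
    """Penalize the distance between the cross-correlation matrix and the identity."""
    c = cross_correlation(za, zb)
    d = len(c)
    on_diag = sum((1.0 - c[i][i]) ** 2 for i in range(d))  # invariance to distortions
    off_diag = sum(c[i][j] ** 2 for i in range(d) for j in range(d) if i != j)  # redundancy
    return on_diag + off_diag_weight * off_diag
```

Here `za` and `zb` stand for the embeddings of two distorted views of the same batch. A collapsed, constant embedding has zero variance, so its correlations are zero and the diagonal term stays large, which is one way to see why such an objective discourages collapse.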

Translation date: 2021-03-05 15:10:45 Publication date: 2021-03-04

# Detecting Spurious Correlations with Sanity Tests for Artificial Intelligence Guided Radiology Systems

Detecting Spurious Correlations with Sanity Tests for Artificial Intelligence Guided Radiology Systems ( http://arxiv.org/abs/2103.03048v1 )

License: see the linked page

Usman Mahmood, Robik Shrestha, David D.B. Bates, Lorenzo Mannelli, Giuseppe Corrias, Yusuf Erdi, Christopher Kanan

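(Reference translation) Artificial intelligence (AI) has succeeded in solving many problems in machine perception. In radiology, AI systems are evolving rapidly and show progress in guiding treatment decisions, diagnosing, localizing disease on medical images, and improving radiologists' efficiency. A key element in deploying AI in radiology is gaining confidence in the efficacy and safety of the developed system. The current gold-standard approach is to perform an analytical validation of performance on a generalization dataset from one or more institutions, followed by a clinical validation of the system's efficacy during deployment. Clinical validation studies are time-consuming, and best practice dictates limited re-use of analytical validation data, so it is ideal to know in advance whether a system is likely to fail analytical or clinical validation. In this paper, we describe a series of sanity tests for identifying when a system performs well on development data for the wrong reasons. We demonstrate the value of the sanity tests by designing a deep learning system for classifying pancreatic cancer seen in CT.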

Artificial intelligence (AI) has been successful at solving numerous problems in machine perception. In radiology, AI systems are rapidly evolving and show progress in guiding treatment decisions, diagnosing, localizing disease on medical images, and improving radiologists' efficiency. A critical component to deploying AI in radiology is to gain confidence in a developed system's efficacy and safety. The current gold standard approach is to conduct an analytical validation of performance on a generalization dataset from one or more institutions, followed by a clinical validation study of the system's efficacy during deployment. Clinical validation studies are time-consuming, and best practices dictate limited re-use of analytical validation data, so it is ideal to know ahead of time if a system is likely to fail analytical or clinical validation. In this paper, we describe a series of sanity tests to identify when a system performs well on development data for the wrong reasons. We illustrate the sanity tests' value by designing a deep learning system to classify pancreatic cancer seen in computed tomography scans.

Translation date: 2021-03-05 15:10:21 Publication date: 2021-03-04

# Perceiver: General Perception with Iterative Attention

Perceiver: General Perception with Iterative Attention ( http://arxiv.org/abs/2103.03206v1 )

License: see the linked page

Andrew Jaegle and Felix Gimeno and Andrew Brock and Andrew Zisserman and Oriol Vinyals and Joao Carreira

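(Reference translation) Biological systems understand the world by simultaneously processing high-dimensional inputs from various modalities such as vision, audition, touch, and proprioception. The perception models used in deep learning, on the other hand, are designed for individual modalities and often rely on domain-specific assumptions such as the local grid structure exploited by almost all existing vision models. These priors introduce helpful inductive biases, but they also lock models to individual modalities. In this paper, we introduce the Perceiver, a model built on Transformers. It makes few architectural assumptions about the relationships between its inputs, yet scales to hundreds of thousands of inputs, like ConvNets. The model uses an asymmetric attention mechanism to iteratively distill inputs into a tight latent bottleneck, which allows it to scale to very large inputs. We show that this architecture is competitive with, or outperforms, strong specialized models on classification tasks across various modalities: images, point clouds, audio, video, and video+audio. It obtains performance comparable to ResNet-50 on ImageNet without convolutions, by directly attending to 50,000 pixels. It also surpasses state-of-the-art results for all modalities in AudioSet.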

Biological systems understand the world by simultaneously processing high-dimensional inputs from modalities as diverse as vision, audition, touch, proprioception, etc. The perception models used in deep learning on the other hand are designed for individual modalities, often relying on domain-specific assumptions such as the local grid structures exploited by virtually all existing vision models. These priors introduce helpful inductive biases, but also lock models to individual modalities. In this paper we introduce the Perceiver - a model that builds upon Transformers and hence makes few architectural assumptions about the relationship between its inputs, but that also scales to hundreds of thousands of inputs, like ConvNets. The model leverages an asymmetric attention mechanism to iteratively distill inputs into a tight latent bottleneck, allowing it to scale to handle very large inputs. We show that this architecture performs competitively or beyond strong, specialized models on classification tasks across various modalities: images, point clouds, audio, video and video+audio. The Perceiver obtains performance comparable to ResNet-50 on ImageNet without convolutions and by directly attending to 50,000 pixels. It also surpasses state-of-the-art results for all modalities in AudioSet.
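
The asymmetric attention idea in this abstract, where a small set of latent vectors queries a much larger input array, can be sketched as below. This is a simplified illustration rather than the Perceiver architecture itself: it assumes latents and inputs share one dimension, uses the raw inputs as both keys and values, and omits the learned projections and the latent self-attention layers a real model would have. `cross_attend` is a name invented for this sketch.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(latents, inputs):
    """Each latent vector (query) attends over every input vector (key and value).

    Cost grows as len(latents) * len(inputs), not len(inputs) ** 2, and the
    output always has len(latents) rows regardless of the input size.
    """
    dim = len(inputs[0])
    scale = 1.0 / math.sqrt(dim)
    updated = []
    for q in latents:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in inputs]
        weights = softmax(scores)
        updated.append([sum(w * v[c] for w, v in zip(weights, inputs))
                        for c in range(dim)])
    return updated
```

Applying `cross_attend` repeatedly, feeding the returned latents back in as queries, gives a crude picture of iteratively distilling a large input into a fixed-size bottleneck.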

Translation date: 2021-03-05 15:10:03 Publication date: 2021-03-04

# A Novel Application of Image-to-Image Translation: Chromosome Straightening Framework by Learning from a Single Image

A Novel Application of Image-to-Image Translation: Chromosome Straightening Framework by Learning from a Single Image ( http://arxiv.org/abs/2103.02835v1 )

License: see the linked page

Sifan Song, Daiyun Huang, Yalun Hu, Chunxiao Yang, Jia Meng, Fei Ma, Jiaming Zhang, Jionglong Su

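(Reference translation) In medical imaging, chromosome straightening plays an important role in the pathological study of chromosomes and in the construction of cytogenetic maps. Different approaches exist for the straightening task, but they are mostly geometric algorithms whose outputs are characterized by jagged edges or fragments with discontinued banding patterns. To address the flaws of geometric algorithms, we propose a novel framework based on image-to-image translation that learns a pertinent mapping dependence for synthesizing straightened chromosomes with uninterrupted banding patterns and preserved details. In addition, to avoid the pitfall of insufficient input chromosomes, we construct an augmented dataset from only a single curved chromosome image for training models. Based on this framework, we apply two popular image-to-image translation architectures, U-shaped networks and conditional generative adversarial networks, and evaluate its efficacy. Experiments on a dataset of 642 real-world chromosomes show the superiority of our framework over the geometric method in straightening performance, rendering realistic and continuous chromosome details. Furthermore, the straightened results improved chromosome classification accuracy by 0.98%-1.39%.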

In medical imaging, chromosome straightening plays a significant role in the pathological study of chromosomes and in the development of cytogenetic maps. Whereas different approaches exist for the straightening task, they are mostly geometric algorithms whose outputs are characterized by jagged edges or fragments with discontinued banding patterns. To address the flaws in the geometric algorithms, we propose a novel framework based on image-to-image translation to learn a pertinent mapping dependence for synthesizing straightened chromosomes with uninterrupted banding patterns and preserved details. In addition, to avoid the pitfall of deficient input chromosomes, we construct an augmented dataset using only one single curved chromosome image for training models. Based on this framework, we apply two popular image-to-image translation architectures, U-shape networks and conditional generative adversarial networks, to assess its efficacy. Experiments on a dataset comprising 642 real-world chromosomes demonstrate the superiority of our framework as compared to the geometric method in straightening performance by rendering realistic and continued chromosome details. Furthermore, our straightened results improve chromosome classification, with a gain of 0.98%-1.39% in mean accuracy.

Translation date: 2021-03-05 15:09:42 Publication date: 2021-03-04

# Morphset: Augmenting categorical emotion datasets with dimensional affect labels using face morphing

Morphset: Augmenting categorical emotion datasets with dimensional affect labels using face morphing ( http://arxiv.org/abs/2103.02854v1 )

License: see the linked page

Vassilios Vonikakis, Dexter Neo, Stefan Winkler

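(Reference translation) Emotion recognition and understanding are important elements of human-machine interaction. Dimensional models of affect that use valence and arousal have advantages over traditional categorical ones because of the complexity of human emotional states. However, dimensional emotion annotations are difficult and costly to collect, so they remain limited in the affective computing community. To address these issues, this paper proposes a method for generating synthetic images from existing categorical emotion datasets, with full control over the distribution of the resulting samples and the dimensional labels in the circumplex space, while achieving augmentation factors of at least 20x.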

Emotion recognition and understanding is a vital component in human-machine interaction. Dimensional models of affect such as those using valence and arousal have advantages over traditional categorical ones due to the complexity of emotional states in humans. However, dimensional emotion annotations are difficult and expensive to collect, therefore they are still limited in the affective computing community. To address these issues, we propose a method to generate synthetic images from existing categorical emotion datasets using face morphing, with full control over the resulting sample distribution as well as dimensional labels in the circumplex space, while achieving augmentation factors of at least 20x or more.

Translation date: 2021-03-05 15:09:23 Publication date: 2021-03-04

# Toward Robust Long Range Policy Transfer

Toward Robust Long Range Policy Transfer ( http://arxiv.org/abs/2103.02957v1 )

License: see the linked page

Wei-Cheng Tseng, Jin-Siang Lin, Yao-Min Feng, Min Sun

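(Reference translation) Humans can master a new task within a few trials by drawing on skills acquired through experience. To imitate this ability, hierarchical models that combine primitive policies learned from prior tasks have been proposed. However, these methods fall short of the range of human transferability. This paper proposes a method that leverages the hierarchical structure to train the combination function and adapt the set of diverse primitive policies alternately, efficiently producing a variety of complex behaviors on challenging new tasks. We also design two regularization terms for the pre-training phase to improve the diversity and utilization rate of the primitives. We show that the proposed method outperforms other recent policy transfer methods by combining and adapting these reusable primitives in tasks with continuous action spaces. Experimental results show that the proposed method provides a broader transfer range. The ablation study also shows that the regularization terms are important for long-range policy transfer. Finally, we show that the method consistently outperforms other methods when the quality of the primitives varies.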

Humans can master a new task within a few trials by drawing upon skills acquired through prior experience. To mimic this capability, hierarchical models combining primitive policies learned from prior tasks have been proposed. However, these methods fall short compared to the human range of transferability. We propose a method, which leverages the hierarchical structure to train the combination function and adapt the set of diverse primitive policies alternately, to efficiently produce a range of complex behaviors on challenging new tasks. We also design two regularization terms to improve the diversity and utilization rate of the primitives in the pre-training phase. We demonstrate that our method outperforms other recent policy transfer methods by combining and adapting these reusable primitives in tasks with continuous action space. The experiment results further show that our approach provides a broader transferring range. The ablation study also shows the regularization terms are critical for long range policy transfer. Finally, we show that our method consistently outperforms other methods when the quality of the primitives varies.

Translation date: 2021-03-05 15:09:11 Publication date: 2021-03-04

# Advances in Multi-turn Dialogue Comprehension: A Survey

Advances in Multi-turn Dialogue Comprehension: A Survey ( http://arxiv.org/abs/2103.03125v1 )

License: see the linked page

Zhuosheng Zhang and Hai Zhao

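(Reference translation) Training machines to understand natural language and interact with humans is an elusive and essential task in the field of artificial intelligence. In recent years, a variety of dialogue systems have been designed alongside the rapid development of deep learning research, especially recent pre-trained language models. Among these studies, the fundamental challenge is dialogue comprehension, whose role is to make the machine read and comprehend the dialogue context before responding. This paper reviews previous methods from the perspective of dialogue modeling. We summarize the characteristics and challenges of dialogue comprehension in contrast to plain-text reading comprehension. We then discuss three typical patterns of dialogue modeling widely used in dialogue comprehension tasks, along with dialogue-related language modeling techniques for enhancing PrLMs in dialogue scenarios. Finally, we highlight recent technical advances and point out lessons learned from empirical analysis and prospects for new research frontiers.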

Training machines to understand natural language and interact with humans is an elusive and essential task in the field of artificial intelligence. In recent years, a diversity of dialogue systems has been designed with the rapid development of deep learning research, especially the recent pre-trained language models. Among these studies, the fundamental yet challenging part is dialogue comprehension whose role is to teach the machines to read and comprehend the dialogue context before responding. In this paper, we review the previous methods from the perspective of dialogue modeling. We summarize the characteristics and challenges of dialogue comprehension in contrast to plain-text reading comprehension. Then, we discuss three typical patterns of dialogue modeling that are widely used in dialogue comprehension tasks such as response selection and conversation question-answering, as well as dialogue-related language modeling techniques to enhance PrLMs in dialogue scenarios. Finally, we highlight the technical advances in recent years and point out the lessons we can learn from the empirical analysis and the prospects towards a new frontier of research.

Translation date: 2021-03-05 15:08:17 Publication date: 2021-03-04

# Inverse Reinforcement Learning with Explicit Policy Estimates

Inverse Reinforcement Learning with Explicit Policy Estimates ( http://arxiv.org/abs/2103.02863v1 )

License: see the linked page

Navyata Sanghvi, Shinnosuke Usami, Mohit Sharma, Joachim Groeger, Kris Kitani

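(Reference translation) Various methods for solving the inverse reinforcement learning (IRL) problem have been developed independently in machine learning and economics. In particular, the Maximum Causal Entropy IRL method is based on the perspective of entropy maximization, while related advances in economics assume the existence of unobserved action shocks to explain expert behavior (Nested Fixed Point Algorithm, Conditional Choice Probability method, Nested Pseudo-Likelihood Algorithm). In this work, we establish previously unknown connections between these related methods from both fields. We achieve this by showing that they all belong to a class of optimization problems characterized by a common form of the objective, the associated policy, and the objective gradient. We demonstrate key computational and algorithmic differences between the methods that arise from an approximation of the optimal soft value function, and describe how this leads to more efficient algorithms. Using insights from the study of this class of optimization problems, we identify various problem scenarios and examine each method's suitability for them.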

Various methods for solving the inverse reinforcement learning (IRL) problem have been developed independently in machine learning and economics. In particular, the method of Maximum Causal Entropy IRL is based on the perspective of entropy maximization, while related advances in the field of economics instead assume the existence of unobserved action shocks to explain expert behavior (Nested Fixed Point Algorithm, Conditional Choice Probability method, Nested Pseudo-Likelihood Algorithm). In this work, we make previously unknown connections between these related methods from both fields. We achieve this by showing that they all belong to a class of optimization problems, characterized by a common form of the objective, the associated policy and the objective gradient. We demonstrate key computational and algorithmic differences which arise between the methods due to an approximation of the optimal soft value function, and describe how this leads to more efficient algorithms. Using insights which emerge from our study of this class of optimization problems, we identify various problem scenarios and investigate each method's suitability for these problems.
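
The "optimal soft value function" mentioned in this abstract can be made concrete with a small tabular sketch. The code below runs soft value iteration, where the hard maximum of ordinary value iteration is replaced by a log-sum-exp, and reads off the associated stochastic policy. It is an illustration under simplifying assumptions (known rewards, deterministic transitions given as a `next_state` table, a fixed number of sweeps), and it is not any of the specific IRL algorithms the paper compares; all names are hypothetical.

```python
import math


def logsumexp(xs):
    """Stable log(sum(exp(x))) for a list of floats."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def soft_q(reward, next_state, v, discount):
    """Q(s, a) = r(s, a) + discount * V(s'), with s' = next_state[s][a]."""
    return [[reward[s][a] + discount * v[next_state[s][a]]
             for a in range(len(reward[s]))]
            for s in range(len(reward))]


def soft_value_iteration(reward, next_state, discount=0.9, sweeps=200):
    """Return the soft value function and the policy pi(a|s) = exp(Q(s,a) - V(s))."""
    v = [0.0] * len(reward)
    for _ in range(sweeps):
        v = [logsumexp(row) for row in soft_q(reward, next_state, v, discount)]
    q = soft_q(reward, next_state, v, discount)
    policy = [[math.exp(qa - logsumexp(row)) for qa in row] for row in q]
    return v, policy
```

In an IRL setting the reward table would itself be the unknown being fitted, and this inner computation (or an approximation of it) is where, per the abstract, the methods differ in computational cost.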

Translation date: 2021-03-05 15:07:25 Publication date: 2021-03-04

# Lower-bounded proper losses for weakly supervised classification

Lower-bounded proper losses for weakly supervised classification ( http://arxiv.org/abs/2103.02893v1 )

License: see the linked page

Shuhei M. Yoshida, Takashi Takenouchi, Masashi Sugiyama

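(Reference translation) This paper discusses weakly supervised learning of classification, in which instances are given weak labels produced by some label-corruption process. The goal is to derive conditions under which loss functions for weak-label learning are proper and lower-bounded, two essential requirements for losses used in class-probability estimation. To this end, we derive a representation theorem for proper losses in supervised learning, which dualizes the Savage representation. Using this theorem, we characterize proper weak-label losses and find the condition for them to be lower-bounded. Based on these theoretical findings, we derive a novel regularization scheme called generalized logit squeezing, which bounds any proper weak-label loss from below without losing properness. Furthermore, we experimentally demonstrate the effectiveness of the proposed approach compared with improper or unbounded losses. These results highlight the importance of properness and lower-boundedness. The code is publicly available at https://github.com/yoshum/lower-bounded-proper-losses.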

This paper discusses the problem of weakly supervised learning of classification, in which instances are given weak labels that are produced by some label-corruption process. The goal is to derive conditions under which loss functions for weak-label learning are proper and lower-bounded -- two essential requirements for the losses used in class-probability estimation. To this end, we derive a representation theorem for proper losses in supervised learning, which dualizes the Savage representation. We use this theorem to characterize proper weak-label losses and find a condition for them to be lower-bounded. Based on these theoretical findings, we derive a novel regularization scheme called generalized logit squeezing, which makes any proper weak-label loss bounded from below, without losing properness. Furthermore, we experimentally demonstrate the effectiveness of our proposed approach, as compared to improper or unbounded losses. Those results highlight the importance of properness and lower-boundedness. The code is publicly available at https://github.com/yoshum/lower-bounded-proper-losses.

Translation date: 2021-03-05 15:07:06 Publication date: 2021-03-04

# A Closed Form Solution to Best Rank-1 Tensor Approximation via KL divergence Minimization

A Closed Form Solution to Best Rank-1 Tensor Approximation via KL divergence Minimization ( http://arxiv.org/abs/2103.02898v1 )

License: see the linked page

Kazu Ghalamkari, Mahito Sugiyama

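(Reference translation) Tensor decomposition is a fundamentally difficult problem. Even the simplest case of tensor decomposition, the rank-1 approximation in terms of the least squares (LS) error, is known to be NP-hard. Here we show that, if the KL divergence is considered instead of the LS error, a closed-form solution for the rank-1 tensor that minimizes the KL divergence from a given positive tensor can be derived analytically. Our key insight is to treat a positive tensor as a probability distribution and to formulate rank-1 approximation as a projection onto the set of rank-1 tensors. This allows rank-1 approximation to be solved by convex optimization. Experiments show that our algorithm is an order of magnitude faster than existing rank-1 approximation methods and approximates given tensors better, which supports our theoretical finding.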

Tensor decomposition is a fundamentally challenging problem. Even the simplest case of tensor decomposition, the rank-1 approximation in terms of the Least Squares (LS) error, is known to be NP-hard. Here, we show that, if we consider the KL divergence instead of the LS error, we can analytically derive a closed form solution for the rank-1 tensor that minimizes the KL divergence from a given positive tensor. Our key insight is to treat a positive tensor as a probability distribution and formulate the process of rank-1 approximation as a projection onto the set of rank-1 tensors. This enables us to solve rank-1 approximation by convex optimization. We empirically demonstrate that our algorithm is an order of magnitude faster than the existing rank-1 approximation methods and gives better approximation of given tensors, which supports our theoretical finding.
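
One way to make the "positive tensor as a probability distribution" view concrete: among product (independent) distributions, the product of the marginals is the standard minimizer of the KL divergence from a joint distribution. The sketch below builds that rank-1 tensor for a third-order tensor stored as nested lists. Whether this matches the paper's closed form term by term is an assumption on our part, since the abstract does not give the formula, and the function names are invented for this example.

```python
import math


def best_rank1_kl(p):
    """Rank-1 tensor formed from the mode-wise sums of a positive I x J x K tensor."""
    I, J, K = len(p), len(p[0]), len(p[0][0])
    a = [sum(p[i][j][k] for j in range(J) for k in range(K)) for i in range(I)]
    b = [sum(p[i][j][k] for i in range(I) for k in range(K)) for j in range(J)]
    c = [sum(p[i][j][k] for i in range(I) for j in range(J)) for k in range(K)]
    total = sum(a)
    # Outer product of the three marginals, rescaled so the total mass matches p.
    return [[[a[i] * b[j] * c[k] / total ** 2 for k in range(K)]
             for j in range(J)]
            for i in range(I)]


def generalized_kl(p, q):
    """Sum of p*log(p/q) - p + q over all entries; non-negative for positive tensors."""
    return sum(pv * math.log(pv / qv) - pv + qv
               for p2, q2 in zip(p, q)
               for p1, q1 in zip(p2, q2)
               for pv, qv in zip(p1, q1))
```

No iteration is involved: the result comes from three sums and one outer product, which is consistent with the speed advantage the abstract reports over iterative rank-1 methods.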

Translation date: 2021-03-05 15:06:48 Publication date: 2021-03-04

# Calibrated Simplex Mapping Classification

Calibrated Simplex Mapping Classification ( http://arxiv.org/abs/2103.02926v1 )

License: see the linked page

Raoul Heese, Michał Walczak, Michael Bortz, Jochen Schmid

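(Reference translation) In this work, we propose a multi-class/single-label classifier that maps training data onto a linearly separable latent space. This approach allows the classification problem to be transformed into a well-defined regression problem. For its solution, a suitable distance metric in feature space and a regression model that predicts latent space coordinates can be chosen. We demonstrate the calibration quality and prediction performance of the classifier using a benchmark on various artificial and real-world data sets.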

We propose a novel supervised multi-class/single-label classifier that maps training data onto a linearly separable latent space with a simplex-like geometry. This approach allows us to transform the classification problem into a well-defined regression problem. For its solution we can choose suitable distance metrics in feature space and regression models predicting latent space coordinates. A benchmark on various artificial and real-world data sets is used to demonstrate the calibration qualities and prediction performance of our classifier.

Translation date: 2021-03-05 15:06:32 Publication date: 2021-03-04

# SVMax: A Feature Embedding Regularizer

SVMax: A Feature Embedding Regularizer ( http://arxiv.org/abs/2103.02770v1 )

License: see the linked page

Ahmed Taha, Alex Hanson, Abhinav Shrivastava, Larry Davis

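(Reference translation) Neural network regularizers (e.g., weight decay) improve performance by explicitly penalizing network complexity. In this paper, we penalize inferior network activations (feature embeddings), which implicitly regularizes the network's weights. We propose singular value maximization (SVMax) to learn a more uniform feature embedding. The SVMax regularizer supports both supervised and unsupervised learning. It mitigates model collapse and enables larger learning rates. We evaluate the SVMax regularizer using both retrieval and generative adversarial networks. We evaluate SVMax in an unsupervised setting using a synthetic mixture-of-Gaussians dataset. For retrieval networks, SVMax achieves significant improvement margins across various ranking losses. Code is available at https://bit.ly/3jNkgDt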

A neural network regularizer (e.g., weight decay) boosts performance by explicitly penalizing the complexity of a network. In this paper, we penalize inferior network activations -- feature embeddings -- which in turn regularize the network's weights implicitly. We propose singular value maximization (SVMax) to learn a more uniform feature embedding. The SVMax regularizer supports both supervised and unsupervised learning. Our formulation mitigates model collapse and enables larger learning rates. We evaluate the SVMax regularizer using both retrieval and generative adversarial networks. We leverage a synthetic mixture of Gaussians dataset to evaluate SVMax in an unsupervised setting. For retrieval networks, SVMax achieves significant improvement margins across various ranking losses. Code available at https://bit.ly/3jNkgDt

Translation date: 2021-03-05 15:06:25 Publication date: 2021-03-04

# Improving Computational Efficiency in Visual Reinforcement Learning via Stored Embeddings

Improving Computational Efficiency in Visual Reinforcement Learning via Stored Embeddings ( http://arxiv.org/abs/2103.02886v1 )

License: see the linked page

Lili Chen, Kimin Lee, Aravind Srinivas, Pieter Abbeel

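(Reference translation) Recent advances in off-policy deep reinforcement learning (RL) have led to impressive success in complex tasks from visual observations. Experience replay improves sample efficiency by reusing past experiences, and convolutional neural networks (CNNs) process high-dimensional inputs effectively. However, such techniques demand high memory and computational bandwidth. In this paper, we present Stored Embeddings for Efficient Reinforcement Learning (SEER), a simple modification of existing off-policy RL methods, to address these computational and memory requirements. To reduce the computational overhead of gradient updates in CNNs, we freeze the lower layers of the CNN encoder early in training, since their parameters converge early. In addition, storing low-dimensional latent vectors for experience replay instead of high-dimensional images reduces memory requirements and enables an adaptive increase in replay buffer capacity. Experiments show that SEER saves significant computation and memory across various DeepMind Control environments and Atari games without degrading the performance of RL agents. Because the lower layers of CNNs extract generalizable features that can be used for different tasks and domains, we show that SEER is useful for computation-efficient transfer learning in RL.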

Recent advances in off-policy deep reinforcement learning (RL) have led to impressive success in complex tasks from visual observations. Experience replay improves sample-efficiency by reusing experiences from the past, and convolutional neural networks (CNNs) process high-dimensional inputs effectively. However, such techniques demand high memory and computational bandwidth. In this paper, we present Stored Embeddings for Efficient Reinforcement Learning (SEER), a simple modification of existing off-policy RL methods, to address these computational and memory requirements. To reduce the computational overhead of gradient updates in CNNs, we freeze the lower layers of CNN encoders early in training due to early convergence of their parameters. Additionally, we reduce memory requirements by storing the low-dimensional latent vectors for experience replay instead of high-dimensional images, enabling an adaptive increase in the replay buffer capacity, a useful technique in constrained-memory settings. In our experiments, we show that SEER does not degrade the performance of RL agents while significantly saving computation and memory across a diverse set of DeepMind Control environments and Atari games. Finally, we show that SEER is useful for computation-efficient transfer learning in RL because lower layers of CNNs extract generalizable features, which can be used for different tasks and domains.

Translation date: 2021-03-05 15:06:15 Publication date: 2021-03-04

# On the privacy-utility trade-off in differentially private hierarchical text classification

On the privacy-utility trade-off in differentially private hierarchical text classification ( http://arxiv.org/abs/2103.02895v1 )

License: see the linked page

Dominik Wunderlich, Daniel Bernau, Francesco Aldà, Javier Parra-Arnau, Thorsten Strufe

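(Reference translation) Hierarchical models for text classification can leak sensitive or confidential training data information to adversaries because of training data memorization. Using differential privacy during model training can mitigate leakage attacks against trained models by perturbing the training optimizer. However, for hierarchical text classification, a multiplicity of model architectures is available, and it is unclear whether some architectures yield a better trade-off between model accuracy and model leakage than others. We use a white-box membership inference attack to evaluate the information leakage of three widely used neural network architectures for hierarchical text classification. We show that relatively weak differential privacy guarantees already suffice to completely mitigate the membership inference attack. More specifically, for large datasets with long texts, we observed that transformer-based models achieve an overall favorable privacy-utility trade-off.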

Hierarchical models for text classification can leak sensitive or confidential training data information to adversaries due to training data memorization. Using differential privacy during model training can mitigate leakage attacks against trained models by perturbing the training optimizer. However, for hierarchical text classification a multiplicity of model architectures is available and it is unclear whether some architectures yield a better trade-off between remaining model accuracy and model leakage under differentially private training perturbation than others. We use a white-box membership inference attack to assess the information leakage of three widely used neural network architectures for hierarchical text classification under differential privacy. We show that relatively weak differential privacy guarantees already suffice to completely mitigate the membership inference attack, thus resulting only in a moderate decrease in utility. More specifically, for large datasets with long texts we observed transformer-based models to achieve an overall favorable privacy-utility trade-off, while for smaller datasets with shorter texts CNNs are preferable.
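
"Perturbing the training optimizer" is commonly realized as DP-SGD: clip each example's gradient, add Gaussian noise to the sum, then average. The abstract does not name the exact optimizer, so the sketch below is an assumption about the general technique, and `dp_sgd_step` and its parameters are hypothetical names.

```python
import math
import random


def dp_sgd_step(weights, per_example_grads, clip_norm, noise_multiplier, lr, rng):
    """One DP-SGD-style update: clip each example's gradient, add noise, average."""
    dim = len(weights)
    summed = [0.0] * dim
    for grad in per_example_grads:
        norm = math.sqrt(sum(g * g for g in grad))
        # Scale the gradient down so its L2 norm is at most clip_norm.
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i in range(dim):
            summed[i] += grad[i] * scale
    batch_size = len(per_example_grads)
    # Noise standard deviation is proportional to the clipping bound.
    sigma = noise_multiplier * clip_norm
    noisy_mean = [(s + rng.gauss(0.0, sigma)) / batch_size for s in summed]
    return [w - lr * g for w, g in zip(weights, noisy_mean)]


# Toy usage with two per-example gradients; a larger noise_multiplier gives a
# stronger privacy guarantee at the cost of noisier updates.
new_weights = dp_sgd_step(
    weights=[0.0, 0.0],
    per_example_grads=[[3.0, 4.0], [0.3, 0.4]],
    clip_norm=1.0,
    noise_multiplier=1.1,
    lr=0.1,
    rng=random.Random(0),
)
```

Computing the resulting privacy budget requires a separate privacy accountant, which this sketch leaves out.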

翻訳日:2021-03-05 15:05:54 公開日:2021-03-04

# bad and good error: ディープアンサンブル学習における価値重み付けスキルスコア

Bad and good errors: value-weighted skill scores in deep ensemble learning ( http://arxiv.org/abs/2103.02881v1 )

ライセンス: Link先を確認

Sabrina Guastavino, Michele Piana, Federico Benvenuto

(参考訳) 本稿では,予測検証を実現する新しい手法を提案する。具体的には、予測誤差の重大度を評価するための戦略を、ある事象の発生を予測しただけの誤報が、連続した非発生事象の途中の1つよりも優れており、一方、孤立した事象のミスは、複数の連続事象の一部である1つの事象の欠落よりも悪い影響を有するという証拠に基づいて紹介する。本稿では,この概念に基づいて,その品質よりも予測の価値に重きを置くような,混乱行列とスキルスコアの新たな定義を導入する。次に,これらの値重み付けスキルスコアの最適化により,ニューラルネットワークの確率的結果がクラスタ化される,バイナリ分類のための深層アンサンブル学習手順を提案する。我々は, 公害, 宇宙天気, 株価の予測に関する3つの応用事例において, 最終的にこのアプローチの性能を示す。

In this paper we propose a novel approach to forecast verification. Specifically, we introduce a strategy for assessing the severity of forecast errors based on the evidence that, on the one hand, a false alarm just anticipating an occurring event is better than one in the middle of consecutive non-occurring events, and that, on the other hand, a miss of an isolated event has a worse impact than a miss of a single event that is part of several consecutive occurrences. Relying on this idea, we introduce a novel definition of confusion matrix and skill scores giving greater importance to the value of the prediction rather than to its quality. Then, we introduce a deep ensemble learning procedure for binary classification, in which the probabilistic outcomes of a neural network are clustered via optimization of these value-weighted skill scores. We finally show the performance of this approach in three applications concerned with pollution, space weather and stock price forecasting.

翻訳日:2021-03-05 15:05:35 公開日:2021-03-04

# データサイエンスのためのサーバレスモデルサービング

Serverless Model Serving for Data Science ( http://arxiv.org/abs/2103.02958v1 )

ライセンス: Link先を確認

Yuncheng Wu, Tien Tuan Anh Dinh, Guoyu Hu, Meihui Zhang, Yeow Meng Chee, Beng Chin Ooi

(参考訳) 機械学習(ML)は、現代のデータサイエンスアプリケーションの重要な部分です。データサイエンティストは現在、モデルトレーニングとモデルサービスの両方を含むエンドツーエンドのMLライフサイクルを管理しなければなりません。モデルサービスのためのシステムは、高いパフォーマンス、低コスト、管理の容易さを必要とする。クラウドプロバイダは、マネージドサービスやセルフレンタルサーバなど、モデルサービスオプションをすでに提供している。最近では、高い弾力性ときめ細かいコストモデルを含むサーバレスコンピューティングが、モデル提供の新たな可能性をもたらしている。本稿では、データサイエンスアプリケーションのためのプラットフォームを提供する主流モデルとしてのサーバーレスの実現可能性について検討する。 Amazon Web Service(AWS)とGoogle Cloud Platform(GCP)の2つのクラウド上の他のモデルサービスシステムに対して、サーバレスのパフォーマンスとコストを総合的に評価します。サーバーレスは、コストとパフォーマンスに関して多くのクラウドベースの代替手段を上回っています。さらに興味深いのは、いくつかの状況下では、平均レイテンシとコストの両方でGPUベースのシステムより優れていることだ。これらの結果は、サーバーレスはモデルサービスには適さないという以前のワークスの主張と異なり、GPUベースのシステムはCPUベースのシステムよりもMLワークロードに適しているという従来の認識に反している。他の発見としては、AWSとGCPのサーバレス関数間のコールドスタート時間の大きなギャップ、ワークロードやモデルの変更に対するサーバレスの低感度などが挙げられる。評価結果は、サーバレスがモデルサービスにとって実行可能な選択肢であることを示している。最後に,スケーラブルでコスト効率のよいモデル提供にサーバレスを使用する方法について,データサイエンティストに対していくつかの実践的な推奨を行う。

Machine learning (ML) is an important part of modern data science applications. Data scientists today have to manage the end-to-end ML life cycle that includes both model training and model serving, the latter of which is essential, as it makes their work available to end-users. Systems for model serving require high performance, low cost, and ease of management. Cloud providers are already offering model serving options, including managed services and self-rented servers. Recently, serverless computing, whose advantages include high elasticity and a fine-grained cost model, brings another possibility for model serving. In this paper, we study the viability of serverless as a mainstream model serving platform for data science applications. We conduct a comprehensive evaluation of the performance and cost of serverless against other model serving systems on two clouds: Amazon Web Services (AWS) and Google Cloud Platform (GCP). We find that serverless outperforms many cloud-based alternatives with respect to cost and performance. More interestingly, under some circumstances, it can even outperform GPU-based systems for both average latency and cost. These results differ from the claim in previous work that serverless is not suitable for model serving, and are contrary to the conventional wisdom that GPU-based systems are better for ML workloads than CPU-based systems. Other findings include a large gap in cold start time between AWS and GCP serverless functions, and the low sensitivity of serverless to changes in workloads or models. Our evaluation results indicate that serverless is a viable option for model serving. Finally, we present several practical recommendations for data scientists on how to use serverless for scalable and cost-effective model serving.

翻訳日:2021-03-05 15:05:18 公開日:2021-03-04

# 胸部CT画像におけるEigenlungs-based classifierとCOVID-19診断の確率的組み合わせ

Probabilistic combination of eigenlungs-based classifiers for COVID-19 diagnosis in chest CT images ( http://arxiv.org/abs/2103.02961v1 )

ライセンス: Link先を確認

Juan E. Arco, Andrés Ortiz, Javier Ramírez, Francisco J. Martínez-Murcia, Yu-Dong Zhang, Jordi Broncano, M. Álvaro Berbís, Javier Royuela-del-Val, Antonio Luna, Juan M. Górriz

(参考訳) 新型コロナウイルス(COVID-19)パンデミック(coronavirus disease 2019)の流行は世界を変えた。世界保健機関(WHO)によると、新型コロナウイルスの感染者数は1億人以上で、240万人以上が死亡している。この疾患の早期発見は非常に重要であり、胸部X線(CXR)や胸部CT(CCT)などの医療画像の使用は優れたソリューションであることが証明されています。しかし、このプロセスでは、医師が手作業や時間を要する作業で行う必要があり、診断のスピードアップには適していない。本研究では,肺炎のパターンを識別するために,確率的支援ベクトルマシン(SVM)に基づくアンサンブル分類器を提案する。具体的には、各CCTスキャンを立方パッチに分割し、それぞれに含まれる特徴をカーネルPCAを適用して抽出する。アンサンブル内での塩基型分類器の使用により,サイズや位置に関わらず,本システムは肺炎のパターンを識別できる。個々のパッチの決定は、個々の分類の信頼性に応じてグローバルに結合されます:不確実性が低いほど、貢献度が高くなります。実際のシナリオで性能を評価し、精度は97.86%である。得られた大きな性能とシステムのシンプルさ(CCT画像におけるディープラーニングの使用は膨大な計算コストをもたらす)は、現実世界での提案の適用可能性を示しています。

The outbreak of the COVID-19 (Coronavirus disease 2019) pandemic has changed the world. According to the World Health Organization (WHO), there have been more than 100 million confirmed cases of COVID-19, including more than 2.4 million deaths. Early detection of the disease is extremely important, and the use of medical imaging such as chest X-ray (CXR) and chest Computed Tomography (CCT) has proved to be an excellent solution. However, this process requires clinicians to perform a manual and time-consuming task, which is not ideal when trying to speed up the diagnosis. In this work, we propose an ensemble classifier based on probabilistic Support Vector Machine (SVM) in order to identify pneumonia patterns while providing information about the reliability of the classification. Specifically, each CCT scan is divided into cubic patches and features contained in each one of them are extracted by applying kernel PCA. The use of base classifiers within an ensemble allows our system to identify the pneumonia patterns regardless of their size or location. Decisions of each individual patch are then combined into a global one according to the reliability of each individual classification: the lower the uncertainty, the higher the contribution. Performance is evaluated in a real scenario, yielding an accuracy of 97.86%. The high performance obtained and the simplicity of the system (use of deep learning in CCT images would result in a huge computational cost) demonstrate the applicability of our proposal in a real-world environment.
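
The combination rule "the lower the uncertainty, the higher the contribution" can be sketched as an uncertainty-weighted average of per-patch probabilities. The abstract does not give the exact weighting, so using one minus the binary entropy as the weight is an assumption, and `combine_patches` is a hypothetical name.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) prediction: 0 is certain, 1 is maximally uncertain."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def combine_patches(patch_probs):
    """Weight each patch probability by its certainty (1 - entropy) and average."""
    weights = [1.0 - binary_entropy(p) for p in patch_probs]
    total = sum(weights)
    if total == 0.0:
        # Every patch is maximally uncertain, so no patch is informative.
        return 0.5
    return sum(w * p for w, p in zip(weights, patch_probs)) / total


# One confident patch dominates two near-chance patches.
global_prob = combine_patches([0.99, 0.6, 0.4])
```

A plain average of these three values would sit much closer to 0.5, which is the behaviour the reliability weighting is meant to avoid.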

翻訳日:2021-03-05 15:04:46 公開日:2021-03-04

# ノード欠落した多層グラフのクラスタリング

Clustering multilayer graphs with missing nodes ( http://arxiv.org/abs/2103.03235v1 )

ライセンス: Link先を確認

Guillaume Braun, Hemant Tyagi, Christophe Biernacki

(参考訳) エージェント間の関係はグラフによって便利に表現できる。これらの関係が異なるモダリティを持つ場合、各層が1つのモダリティに関連付けられる多層グラフによりモデル化される。このようなグラフは、生物的および社会的ネットワークを含む多くの文脈で自然に生じる。クラスタリングはネットワーク分析における基本的な問題であり、同じ接続プロファイルを持つノードを再グループ化するのが目標である。過去10年間で、各レイヤが提供する情報を統合するために、一層設定から多層グラフへ様々なクラスタリング手法が拡張されてきた。既存のほとんどの作業では、すべてのレイヤが同じノードセットを共有していると仮定していますが、異なるノードセットでレイヤを定義することができる新しいフレームワークを提案します。特に、層に記録されていないノードは欠落として扱われる。このパラダイム内では,不完全なクラスタへの完全設定において,よく知られたクラスタリング手法のいくつかの一般化を調べ,多層確率ブロックモデルの仮定の下で一貫性の証明を行う。当社の理論結果は、合成データに関するアルゴリズムと実際のデータセットの数値的比較によって補完され、様々な設定における我々の手法の有望な振る舞いを強調しています。

Relationships between agents can be conveniently represented by graphs. When these relationships have different modalities, they are better modelled by multilayer graphs where each layer is associated with one modality. Such graphs arise naturally in many contexts including biological and social networks. Clustering is a fundamental problem in network analysis where the goal is to regroup nodes with similar connectivity profiles. In the past decade, various clustering methods have been extended from the unilayer setting to multilayer graphs in order to incorporate the information provided by each layer. While most existing works assume - rather restrictively - that all layers share the same set of nodes, we propose a new framework that allows for layers to be defined on different sets of nodes. In particular, the nodes not recorded in a layer are treated as missing. Within this paradigm, we investigate several generalizations of well-known clustering methods in the complete setting to the incomplete one and prove some consistency results under the Multi-Layer Stochastic Block Model assumption. Our theoretical results are complemented by thorough numerical comparisons between our proposed algorithms on synthetic data, and also on real datasets, thus highlighting the promising behaviour of our methods in various settings.

翻訳日:2021-03-05 15:04:23 公開日:2021-03-04

# モーメントとマッチング:模倣学習におけるトレードオフと治療

Of Moments and Matching: Trade-offs and Treatments in Imitation Learning ( http://arxiv.org/abs/2103.03236v1 )

ライセンス: Link先を確認

Gokul Swamy, Sanjiban Choudhury, Zhiwei Steven Wu, J. Andrew Bagnell

(参考訳) 我々は、モーメントマッチングのレンズを通して、過去の模倣学習アルゴリズムの大規模なファミリの統一ビューを提供する。その中心となる分類法は,(1)報奨と(2)専門家の行動の行動価値モーメントを一致させようとする学習者かに基づいており,それぞれの選択肢によって異なるアルゴリズム的アプローチが導かれる。学習者と専門家の行動の反対に選択された分岐を考慮することによって、私たちはこれらのクラスのすべてのアルゴリズムに適用する政策パフォーマンスの境界を導き出すことができます。また,従来の模擬学習において暗黙的な復元可能性の概念を導入し,各アルゴリズムファミリーが複合的誤りを軽減できるかを明確化することができる。 AdVILとAdRILという2つの新しいアルゴリズムテンプレートを、強力な保証、シンプルな実装、競争力のある実証的パフォーマンスで導出します。

We provide a unifying view of a large family of previous imitation learning algorithms through the lens of moment matching. At its core, our classification scheme is based on whether the learner attempts to match (1) reward or (2) action-value moments of the expert's behavior, with each option leading to differing algorithmic approaches. By considering adversarially chosen divergences between learner and expert behavior, we are able to derive bounds on policy performance that apply for all algorithms in each of these classes, the first to our knowledge. We also introduce the notion of recoverability, implicit in many previous analyses of imitation learning, which allows us to cleanly delineate how well each algorithmic family is able to mitigate compounding errors. We derive two novel algorithm templates, AdVIL and AdRIL, with strong guarantees, simple implementation, and competitive empirical performance.

翻訳日:2021-03-05 15:04:06 公開日:2021-03-04

# ロバストな医用画像分割のためのコンテキストフィードバックループによる学習

Learning With Context Feedback Loop for Robust Medical Image Segmentation ( http://arxiv.org/abs/2103.02844v1 )

ライセンス: Link先を確認

Kibrom Berihu Girum, Gilles Créhange, Alain Lalande

(参考訳) 深層学習は、医用画像セグメンテーションにうまく活用されている。コンボリューショナルニューラルネットワーク(CNN)を使用して、定義された画素ワイドの目的関数から特有の画像特徴を学習する。しかし、このアプローチは不完全で非現実的なセグメンテーション結果を生成する出力画素相互依存を減少させる可能性がある。本稿では,2つのシステムを用いた再帰的枠組みとしてセグメンテーション問題を定式化し,ロバストな医用画像セグメンテーションのための完全自動深層学習手法を提案する。 1つ目は、入力画像からセグメンテーション結果を予測するエンコーダデコーダcnnのフォワードシステムである。フォワードシステムの予測確率出力は、完全な畳み込みネットワーク(FCN)ベースのコンテキストフィードバックシステムによって符号化される。 FCNの符号化された特徴空間は、フォワードシステムのフィードフォワード学習プロセスに統合される。 FCNベースのコンテキストフィードバックループを使用することで、フォワードシステムはより高レベルな画像の特徴を学習し、抽出し、以前の誤りを修正し、時間とともに予測精度を向上させることができる。 4つの異なる臨床データセットで実施した実験結果から,本手法の医療画像の単一・多構造セグメント化への応用の可能性を示した。フィードバックループにより、ディープラーニングメソッドは解剖学的に実行可能で、コントラストの低い画像に対して堅牢な結果を生み出すことができる。したがって、コンテキストフィードバックループを介して2つの相互接続ネットワークの繰り返しフレームワークとして画像セグメンテーションを形成することは、堅牢で効率的な医療画像分析の潜在的な方法である。

Deep learning has successfully been leveraged for medical image segmentation. It employs convolutional neural networks (CNN) to learn distinctive image features from a defined pixel-wise objective function. However, this approach can lead to less output pixel interdependence, producing incomplete and unrealistic segmentation results. In this paper, we present a fully automatic deep learning method for robust medical image segmentation by formulating the segmentation problem as a recurrent framework using two systems. The first one is a forward system of an encoder-decoder CNN that predicts the segmentation result from the input image. The predicted probabilistic output of the forward system is then encoded by a fully convolutional network (FCN)-based context feedback system. The encoded feature space of the FCN is then integrated back into the forward system's feed-forward learning process. Using the FCN-based context feedback loop allows the forward system to learn and extract more high-level image features and fix previous mistakes, thereby improving prediction accuracy over time. Experimental results, performed on four different clinical datasets, demonstrate our method's potential application for single and multi-structure medical image segmentation by outperforming the state-of-the-art methods. With the feedback loop, deep learning methods can now produce results that are both anatomically plausible and robust to low contrast images. Therefore, formulating image segmentation as a recurrent framework of two interconnected networks via a context feedback loop can be a potential method for robust and efficient medical image analysis.

翻訳日:2021-03-05 15:03:51 公開日:2021-03-04

# ディープニューラルネットワークを用いたx線血管造影における冠動脈狭窄の自動検出

Automated Detection of Coronary Artery Stenosis in X-ray Angiography using Deep Neural Networks ( http://arxiv.org/abs/2103.02969v1 )

ライセンス: Link先を確認

Dinis L. Rodrigues, Miguel Nobre Menezes, Fausto J. Pinto, Arlindo L. Oliveira

(参考訳) 冠動脈の一部または全部の閉塞である狭窄につながる冠動脈疾患は、毎年数百万の患者に影響を与える重篤な疾患である。最小限の侵襲的処置による狭窄度の自動同定と分類は臨床的価値が高いが、作業の複雑さのため、既存の方法は経験豊富な心科医の正確さに合致しない。狭窄を定量的に評価するための多くの計算手法が提案されているが、これらの手法の性能は臨床応用に必要なレベルには程遠い。本稿では,X線冠動脈造影画像からの狭窄検出を部分的に自動化する2段階のディープラーニングフレームワークを提案する。 2つのステップにおいて、我々は2つの異なる畳み込みニューラルネットワークアーキテクチャを使用し、1つはビューの角度を自動的に識別し分類し、もう1つは、狭窄が見えるフレームにおける関心領域の境界ボックスを決定する。転送学習とデータ拡張技術は、両方のタスクでシステムの性能を高めるために使用された。左/右冠動脈(LCA/RCA)角度ビューの分類作業において0.97の精度とLCAとRCAの関心領域の決定に関する0.68/0.73リコールを達成した。これらの結果は関連するアプローチで得られたこれまでの結果と比較し、x線血管造影から狭窄度を完全自動化する方法への道を開く。

Coronary artery disease leading up to stenosis, the partial or total blocking of coronary arteries, is a severe condition that affects millions of patients each year. Automated identification and classification of stenosis severity from minimally invasive procedures would be of great clinical value, but existing methods do not match the accuracy of experienced cardiologists, due to the complexity of the task. Although a number of computational approaches for quantitative assessment of stenosis have been proposed to date, the performance of these methods is still far from the required levels for clinical applications. In this paper, we propose a two-step deep-learning framework to partially automate the detection of stenosis from X-ray coronary angiography images. In the two steps, we used two distinct convolutional neural network architectures, one to automatically identify and classify the angle of view, and another to determine the bounding boxes of the regions of interest in frames where stenosis is visible. Transfer learning and data augmentation techniques were used to boost the performance of the system in both tasks. We achieved a 0.97 accuracy on the task of classifying the Left/Right Coronary Artery (LCA/RCA) angle view and 0.68/0.73 recall on the determination of the regions of interest, for LCA and RCA, respectively. These results compare favorably with previous results obtained using related approaches, and open the way to a fully automated method for the identification of stenosis severity from X-ray angiographies.

翻訳日:2021-03-05 15:03:29 公開日:2021-03-04

# PointGuard: 証明可能にロバストな3Dポイントクラウド分類

PointGuard: Provably Robust 3D Point Cloud Classification ( http://arxiv.org/abs/2103.03046v1 )

ライセンス: Link先を確認

Hongbin Liu, Jinyuan Jia, Neil Zhenqiang Gong

(参考訳) 3Dポイントクラウド分類には、自律運転やロボットグリップなど、多くの安全クリティカルな応用がある。しかし、いくつかの研究で敵の攻撃に弱いことが示されている。特に、攻撃者は、少数のポイントを慎重に修正、追加、削除することで、3Dポイントクラウドの誤ラベルを予測することができる。ランダム化スムージングは、確実な堅牢な2D画像分類器を構築するための最先端の技術です。しかし、3Dポイントクラウド分類を適用すると、ランダム化されたスムージングは、敵対的に修正された点に対する堅牢性のみを証明できます。本研究では,敵対的に修正,追加,削除された点に対する堅牢性を保証する最初の防御であるPointGuardを提案する。具体的には、3Dポイントクラウドと任意のポイントクラウド分類器を与えられた場合、PointGuardは最初に、元のポイントクラウド内のポイントのランダムサブセットを含む複数のサブサンプルポイントクラウドを作成し、PointGuardは、ポイントクラウド分類器によって予測されるサブサンプルポイントクラウドのラベルの過半数として、元のポイントクラウドのラベルを予測します。最初の大きな理論的貢献は、逆修正、追加、および/または削除されたポイントの数に境界がある場合、PointGuardが3Dポイントクラウドの同じラベルを予測できることを示しています。 2つ目の大きな理論的貢献は、点クラウド分類器に仮定がない場合、導出境界の厳密性を証明することです。さらに、認証された堅牢性保証を計算する効率的なアルゴリズムを設計します。また、ModelNet40およびScanNetベンチマークデータセット上でPointGuardを実証的に評価する。

3D point cloud classification has many safety-critical applications such as autonomous driving and robotic grasping. However, several studies showed that it is vulnerable to adversarial attacks. In particular, an attacker can make a classifier predict an incorrect label for a 3D point cloud via carefully modifying, adding, and/or deleting a small number of its points. Randomized smoothing is a state-of-the-art technique to build certifiably robust 2D image classifiers. However, when applied to 3D point cloud classification, randomized smoothing can only certify robustness against adversarially modified points. In this work, we propose PointGuard, the first defense that has provable robustness guarantees against adversarially modified, added, and/or deleted points. Specifically, given a 3D point cloud and an arbitrary point cloud classifier, our PointGuard first creates multiple subsampled point clouds, each of which contains a random subset of the points in the original point cloud; then our PointGuard predicts the label of the original point cloud as the majority vote among the labels of the subsampled point clouds predicted by the point cloud classifier. Our first major theoretical contribution is that we show PointGuard provably predicts the same label for a 3D point cloud when the number of adversarially modified, added, and/or deleted points is bounded. Our second major theoretical contribution is that we prove the tightness of our derived bound when no assumptions on the point cloud classifier are made. Moreover, we design an efficient algorithm to compute our certified robustness guarantees. We also empirically evaluate PointGuard on ModelNet40 and ScanNet benchmark datasets.
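
The prediction step described in this abstract (classify many random subsamples, then take the majority label) can be sketched in a few lines. This covers only the voting step, not the certified bound; `subsample_vote`, the toy classifier, and the parameter values are illustrative assumptions rather than the authors' implementation.

```python
import random
from collections import Counter


def subsample_vote(points, classifier, k, n_votes, seed=0):
    """Predict the majority label over n_votes random k-point subsamples."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_votes):
        subset = rng.sample(points, k)  # k distinct points, without replacement
        votes[classifier(subset)] += 1
    return votes.most_common(1)[0][0]


def mean_height_classifier(points):
    """Toy base classifier: label 1 if the mean z coordinate is positive."""
    return 1 if sum(p[2] for p in points) / len(points) > 0 else 0


# 100 clean points plus 3 adversarially added outliers that flip the
# base classifier when it is run on the full cloud.
cloud = [(i * 0.1, 0.0, 1.0) for i in range(100)] + [(0.0, 0.0, -1000.0)] * 3
label = subsample_vote(cloud, mean_height_classifier, k=5, n_votes=201)
```

With a small `k`, most subsamples contain none of the added points, which is the intuition behind the robustness guarantee; choosing `k` trades clean accuracy against the number of perturbed points that can be tolerated.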

翻訳日:2021-03-05 15:03:02 公開日:2021-03-04

# 実世界ブラインド画像復調のための畳み込み対自己組織型オペレーショナルニューラルネットワーク

Convolutional versus Self-Organized Operational Neural Networks for Real-World Blind Image Denoising ( http://arxiv.org/abs/2103.03070v1 )

ライセンス: Link先を確認

Junaid Malik, Serkan Kiranyaz, Mehmet Yamac, Esin Guldogan, Moncef Gabbouj

(参考訳) 実世界のブラインドデノージングは、基礎となるノイズ分布の非決定論的性質のため、ユニークな画像復元に挑戦する。合成雑音モデルで訓練された一般的な識別ネットワークは、実世界のノイズ画像に悪影響を与えることが示されている。実世界のノイズ画像のキュレーションと地上の真理推定手順の改善は依然として重要なポイントであるが、潜在的研究の方向性は、より深い畳み込みニューラルネットワーク(CNN)を使うのとは対照的に、より少ないデータと低いネットワークの複雑さでより良い一般化を可能にするために広く使用される畳み込みニューロンモデルの拡張を探索することである。オペレーショナルニューラルネットワーク(ONNs)とその最近の変種である自己組織化ONN(Self-ONNs)は、強化された非線形性をニューロンモデルに組み込むことを提案し、様々な回帰タスクでCNNを上回ることが示されています。しかし、これらの比較はすべてコンパクトなネットワークで行われており、現代のディープアーキテクチャにおける畳み込みレイヤの代替として運用層を配置する効果は、まだ確認されていない。そこで本研究では,実世界のブラインド画像のデノジング問題に初めて,深い自己オンを用いて対処する。最先端の深層CNNネットワークであるDnCNNに対して、複数のメトリクスにまたがる広範囲な定量的および定性的評価と、高解像度の4つの実世界のノイズ画像データセットは、PSNRにおいて最大1.76dBの性能向上を確実に達成していることが明らかとなった。さらに、DnCNNの計算リソースのほんの一部だけを必要とするレイヤーの数を半分から4分の1まで持つSelf-ONNは、最先端のものと同じまたはより良い結果を達成できます。

Real-world blind denoising poses a unique image restoration challenge due to the non-deterministic nature of the underlying noise distribution. Prevalent discriminative networks trained on synthetic noise models have been shown to generalize poorly to real-world noisy images. While curating real-world noisy images and improving ground truth estimation procedures remain key points of interest, a potential research direction is to explore extensions to the widely used convolutional neuron model to enable better generalization with less data and lower network complexity, as opposed to simply using deeper Convolutional Neural Networks (CNNs). Operational Neural Networks (ONNs) and their recent variant, Self-organized ONNs (Self-ONNs), propose to embed enhanced non-linearity into the neuron model and have been shown to outperform CNNs across a variety of regression tasks. However, all such comparisons have been made for compact networks and the efficacy of deploying operational layers as a drop-in replacement for convolutional layers in contemporary deep architectures remains to be seen. In this work, we tackle the real-world blind image denoising problem by employing, for the first time, a deep Self-ONN. Extensive quantitative and qualitative evaluations spanning multiple metrics and four high-resolution real-world noisy image datasets against the state-of-the-art deep CNN network, DnCNN, reveal that deep Self-ONNs consistently achieve superior results with performance gains of up to 1.76dB in PSNR. Furthermore, Self-ONNs with half or even a quarter of the layers, requiring only a fraction of DnCNN's computational resources, can still achieve similar or better results compared to the state-of-the-art.

翻訳日:2021-03-05 15:02:39 公開日:2021-03-04

# ジェネラティブ手法を用いたプライバシ攻撃に対する医療画像診断の防御

Defending Medical Image Diagnostics against Privacy Attacks using Generative Methods ( http://arxiv.org/abs/2103.03078v1 )

ライセンス: Link先を確認

William Paul, Yinzhi Cao, Miaomiao Zhang, and Phil Burlina

(参考訳) 医療画像診断に使用される機械学習(ML)モデルは、メンバーシップ推論攻撃を含むさまざまなプライバシー攻撃に脆弱になり、医療データの使用を規制する規制に違反し、診療所での効果的な展開を妨害する恐れがあります。本稿では,モデル変更と後処理ステップに着目したプライバシアウェアmlの最近の研究とは対照的に,データ共有プロセスを制御することで医療データのセキュリティを高める新しい補完スキームを提案する。本稿では,GAN(Generative Adversarial Network)を用いたプライバシ防御プロトコルの開発と評価を行う。このプロトコルでは,医療データ提供者(病院など)が原画像から合成されたプロキシデータセットを外部エージェント(モデラー)に提供することで、モデル消費者が利用可能な診断システムは、プライバシ攻撃者に対してレジリエントになる。本研究では, 個人情報が漏洩するリスクのある, 糖尿病性網膜症に用いる網膜診断AIで提案手法を検証する。プライバシー擁護者とモデラーの両方の懸念を組み込むために、プライバシーとユーティリティのパフォーマンスを組み合わせ評価するメトリクスを導入し、これらの新旧のメトリクスを使用して、私たちのアプローチは、それ自体または他の防御と組み合わせて、プライバシー攻撃から守るための最先端の(SOTA)パフォーマンスを提供します。

Machine learning (ML) models used in medical imaging diagnostics can be vulnerable to a variety of privacy attacks, including membership inference attacks, that lead to violations of regulations governing the use of medical data and threaten to compromise their effective deployment in the clinic. In contrast to most recent work in privacy-aware ML that has been focused on model alteration and post-processing steps, we propose here a novel and complementary scheme that enhances the security of medical data by controlling the data sharing process. We develop and evaluate a privacy defense protocol based on using a generative adversarial network (GAN) that allows a medical data sourcer (e.g. a hospital) to provide an external agent (a modeler) a proxy dataset synthesized from the original images, so that the resulting diagnostic systems made available to model consumers are rendered resilient to privacy attackers. We validate the proposed method on retinal diagnostics AI used for diabetic retinopathy that bears the risk of possibly leaking private information. To incorporate concerns of both privacy advocates and modelers, we introduce a metric to evaluate privacy and utility performance in combination, and demonstrate, using these novel and classical metrics, that our approach, by itself or in conjunction with other defenses, provides state-of-the-art (SOTA) performance for defending against privacy attacks.

翻訳日:2021-03-05 15:01:52 公開日:2021-03-04

# スパースランダム特徴による関数近似

Function Approximation via Sparse Random Features ( http://arxiv.org/abs/2103.03191v1 )

ライセンス: Link先を確認

Abolfazl Hashemi, Hayden Schaeffer, Robert Shi, Ufuk Topcu, Giang Tran, Rachel Ward

(参考訳) ランダム特徴法は様々な機械学習タスクで成功し、計算が容易で、理論的に精度の限界がある。コストのかかるトレーニングフェーズなしで同様の関数空間を表現できるため、標準的なニューラルネットワークに代わるアプローチとして機能します。しかしながら、正確性のため、ランダム特徴法はトレーニング可能なパラメータよりも多くの測定を必要とするため、データが乏しいアプリケーションや科学的な機械学習における問題に対する使用が制限される。本稿では,圧縮センシングの手法を用いて無作為特徴モデルを学習するスパースランダム特徴法を提案する。再生カーネルヒルベルト空間における関数の近似誤差について、サンプル数と特徴量の分布に依存する一様境界を与える。誤差境界は、座標のスパース性、スペクトルのコンパクトなクラスター、または急速なスペクトル崩壊などの追加の構造条件で改善される。スパースランダム特徴法は,十分に構造化された機能や科学的機械学習タスクへの応用において,浅層ネットワークよりも優れていることを示す。

Random feature methods have been successful in various machine learning tasks, are easy to compute, and come with theoretical accuracy bounds. They serve as an alternative approach to standard neural networks since they can represent similar function spaces without a costly training phase. However, for accuracy, random feature methods require more measurements than trainable parameters, limiting their use for data-scarce applications or problems in scientific machine learning. This paper introduces the sparse random feature method that learns parsimonious random feature models utilizing techniques from compressive sensing. We provide uniform bounds on the approximation error for functions in a reproducing kernel Hilbert space depending on the number of samples and the distribution of features. The error bounds improve with additional structural conditions, such as coordinate sparsity, compact clusters of the spectrum, or rapid spectral decay. We show that the sparse random feature method outperforms shallow networks for well-structured functions and applications to scientific machine learning tasks.
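
A rough illustration of the idea: draw more random cosine features than there are samples, then fit coefficients with an l1 penalty so that only a few features stay active. The abstract refers to compressive sensing techniques without naming a solver; the Lasso objective solved by iterative soft-thresholding (ISTA) below is a stand-in chosen for brevity, and all function names and parameter values are assumptions.

```python
import math
import random


def random_cosine_features(xs, freqs, phases):
    """Feature matrix A[i][j] = cos(freqs[j] * xs[i] + phases[j]) for scalar inputs."""
    return [[math.cos(w * x + b) for w, b in zip(freqs, phases)] for x in xs]


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal map of the l1 norm)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def ista_lasso(A, y, lam, n_iters=200):
    """Minimise 0.5*||A c - y||^2 + lam*||c||_1 by iterative soft-thresholding."""
    n_rows, n_cols = len(A), len(A[0])
    # The squared Frobenius norm upper-bounds the gradient's Lipschitz constant,
    # so this step size is conservative but safe.
    step = 1.0 / sum(a * a for row in A for a in row)
    c = [0.0] * n_cols
    for _ in range(n_iters):
        residual = [sum(A[i][j] * c[j] for j in range(n_cols)) - y[i] for i in range(n_rows)]
        grad = [sum(A[i][j] * residual[i] for i in range(n_rows)) for j in range(n_cols)]
        c = [soft_threshold(c[j] - step * grad[j], step * lam) for j in range(n_cols)]
    return c


# Toy usage: 20 samples of a sine wave, 50 random features (more features than data).
rng = random.Random(0)
xs = [i / 20 for i in range(20)]
ys = [math.sin(2 * math.pi * x) for x in xs]
freqs = [rng.gauss(0.0, 5.0) for _ in range(50)]
phases = [rng.uniform(0.0, 2 * math.pi) for _ in range(50)]
A = random_cosine_features(xs, freqs, phases)
coeffs = ista_lasso(A, ys, lam=0.05)
```

Raising `lam` drives more coefficients to exactly zero, giving a sparser model at the cost of a looser fit.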

翻訳日:2021-03-05 15:01:27 公開日:2021-03-04

# 音声言語理解に関する調査 : 最近の進歩と新たなフロンティア

A Survey on Spoken Language Understanding: Recent Advances and New Frontiers ( http://arxiv.org/abs/2103.03095v1 )

ライセンス: Link先を確認

Libo Qin, Tianbao Xie, Wanxiang Che, Ting Liu

(参考訳) SLU(Spoken Language Understanding)は、タスク指向ダイアログシステムの中核コンポーネントであるユーザクエリのセマンティクスフレームを抽出することを目的としている。深層ニューラルネットワークの破裂と事前訓練された言語モデルの進化により、SLUの研究は大きなブレークスルーを得た。しかし、既存のアプローチと最近のトレンドを要約した包括的な調査がいまだに欠落しており、この記事で提示された研究の動機となっている。本稿では、SLUの最近の進歩と新しいフロンティアを調査します。 Specifically, we give a thorough review of this research field, covering different aspects including (1) new taxonomy: we provide a new perspective for the SLU field, including single model vs. joint model, implicit joint modeling vs. explicit joint modeling in joint model, non pre-trained paradigm vs. pre-trained paradigm; (2) new frontiers: some emerging areas in complex SLU as well as the corresponding challenges; (3) abundant open-source resources: to help the community, we have collected and organized the related papers, baseline projects and leaderboard on a public website where SLU researchers can directly access the recent progress. この調査が今後のSLU分野の研究に光を当てることを願っている。

Spoken Language Understanding (SLU) aims to extract the semantic frame of user queries, which is a core component in a task-oriented dialog system. With the burst of deep neural networks and the evolution of pre-trained language models, the research of SLU has obtained significant breakthroughs. However, there remains a lack of a comprehensive survey summarizing existing approaches and recent trends, which motivated the work presented in this article. In this paper, we survey recent advances and new frontiers in SLU. Specifically, we give a thorough review of this research field, covering different aspects including (1) new taxonomy: we provide a new perspective for the SLU field, including single model vs. joint model, implicit joint modeling vs. explicit joint modeling in joint model, non pre-trained paradigm vs. pre-trained paradigm; (2) new frontiers: some emerging areas in complex SLU as well as the corresponding challenges; (3) abundant open-source resources: to help the community, we have collected and organized the related papers, baseline projects and leaderboard on a public website where SLU researchers can directly access the recent progress. We hope that this survey can shed light on future research in the SLU field.

翻訳日:2021-03-05 15:01:11 公開日:2021-03-04

# エンドツーエンド同時音声翻訳復号戦略の実証的研究

An Empirical Study of End-to-end Simultaneous Speech Translation Decoding Strategies ( http://arxiv.org/abs/2103.03233v1 )

ライセンス: Link先を確認

Ha Nguyen, Yannick Estève, Laurent Besacier

(参考訳) 本稿では,エンドツーエンドの同時音声翻訳のためのデコード戦略を提案する。オフラインモードで訓練されたエンドツーエンドモデルを活用し、2つの言語ペア(英語-ドイツ語と英語-ポルトガル語)の実証的研究を行います。また,文字やByte Pair Encoding (BPE)ユニットなど,さまざまな出力トークンの粒度についても検討する。その結果, BLEU/Average Laggingのトレードオフを, 異なる遅延方式で制御できることが示された。最適な復号化設定は,IWSLT 2020共有タスクの同時翻訳トラックで評価された強力なカスケードモデルにより,同等の結果が得られる。

This paper proposes a decoding strategy for end-to-end simultaneous speech translation. We leverage end-to-end models trained in offline mode and conduct an empirical study for two language pairs (English-to-German and English-to-Portuguese). We also investigate different output token granularities including characters and Byte Pair Encoding (BPE) units. The results show that the proposed decoding approach allows control of the BLEU/Average Lagging trade-off along different latency regimes. Our best decoding settings achieve results comparable to a strong cascade model evaluated on the simultaneous translation track of the IWSLT 2020 shared task.
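
For readers unfamiliar with the latency metric mentioned here, the sketch below computes the token-level form of Average Lagging for a wait-k reading schedule. The wait-k policy is used only as a familiar example and is not claimed to be the decoding strategy of the paper; for speech input the metric is usually measured in time rather than source tokens, and the function names are hypothetical.

```python
def wait_k_schedule(k, src_len, tgt_len):
    """g(t): number of source tokens read before emitting target token t (1-indexed)."""
    return [min(k + t - 1, src_len) for t in range(1, tgt_len + 1)]


def average_lagging(schedule, src_len, tgt_len):
    """Average Lagging: mean lag behind an ideal translator, up to the first
    target step at which the whole source has been read."""
    gamma = tgt_len / src_len  # target-to-source length ratio
    total = 0.0
    for t, read in enumerate(schedule, start=1):
        total += read - (t - 1) / gamma
        if read >= src_len:
            return total / t
    raise ValueError("schedule never reads the full source")


# Equal-length source and target with a wait-3 policy.
lag = average_lagging(wait_k_schedule(3, 10, 10), src_len=10, tgt_len=10)
```

Larger `k` means the decoder sees more context before each output (typically higher BLEU) but lags further behind the speaker, which is the trade-off the abstract refers to.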

翻訳日:2021-03-05 15:00:54 公開日:2021-03-04

# PC2WF: 原点雲からの3Dワイヤフレーム再構築

PC2WF: 3D Wireframe Reconstruction from Raw Point Clouds ( http://arxiv.org/abs/2103.02766v1 )

ライセンス: Link先を確認

Yujia Liu, Stefano D'Aronco, Konrad Schindler, Jan Dirk Wegner

(参考訳) PC2WFは,3Dポイントクラウドをワイヤフレームモデルに変換するための,最初のエンドツーエンドトレーニング可能なディープネットワークアーキテクチャである。ネットワークは、あるオブジェクトの表面からサンプリングされた無秩序な3dポイントのセットを入力とし、そのオブジェクトのワイヤーフレーム、すなわち線分でリンクされたコーナーポイントのスパースセットを出力する。ワイヤフレームの復元は難しい作業であり、頂点とエッジの数が各インスタンスで異なるため、a-prioriは未知である。私たちのアーキテクチャは徐々にモデルを構築し、ポイントを特徴ベクトルにエンコードすることから始まります。これらの特徴に基づいて、候補者頂点のプールを特定し、候補者をコーナー頂点の最終セットにプルーンし、位置を洗練します。次に、コーナーは、最終的なワイヤフレームを得るために再びプルーニングされる候補エッジの総括セットにリンクされる。すべてのステップはトレーニング可能で、エラーはシーケンス全体をバックプロパゲーションすることができる。提案したモデルを,地上の真理線フレームにアクセス可能な公開合成データセットと,新たな実世界のデータセットで検証する。我々のモデルは、優れた品質のワイヤフレーム抽象化を生成し、いくつかのベースラインを上回ります。

We introduce PC2WF, the first end-to-end trainable deep network architecture to convert a 3D point cloud into a wireframe model. The network takes as input an unordered set of 3D points sampled from the surface of some object, and outputs a wireframe of that object, i.e., a sparse set of corner points linked by line segments. Recovering the wireframe is a challenging task, where the numbers of both vertices and edges are different for every instance, and a-priori unknown. Our architecture gradually builds up the model: It starts by encoding the points into feature vectors. Based on those features, it identifies a pool of candidate vertices, then prunes those candidates to a final set of corner vertices and refines their locations. Next, the corners are linked with an exhaustive set of candidate edges, which is again pruned to obtain the final wireframe. All steps are trainable, and errors can be backpropagated through the entire sequence. We validate the proposed model on a publicly available synthetic dataset, for which the ground truth wireframes are accessible, as well as on a new real-world dataset. Our model produces wireframe abstractions of good quality and outperforms several baselines.

翻訳日:2021-03-05 14:59:33 公開日:2021-03-04

# 顔認識がオクルージョンに出会うとき: 新しいベンチマーク

When Face Recognition Meets Occlusion: A New Benchmark ( http://arxiv.org/abs/2103.02805v1 )

ライセンス: Link先を確認

Baojin Huang, Zhongyuan Wang, Guangcheng Wang, Kui Jiang, Kangli Zeng, Zhen Han, Xin Tian, Yuhong Yang

(参考訳) 既存の顔認識データセットは、通常、顔認識の開発を妨げる閉塞サンプルを欠いています。特に新型コロナウイルス(COVID-19)の流行により、マスクの着用はウイルスの拡散を防ぐ効果的な手段となっている。既存のデータセットでトレーニングされた従来のCNNベースの顔認識モデルは、重閉塞に対してほとんど効果がない。この目的のために,シミュレーションによるオクルージョン顔認識データセットを考案する。特に,まず様々な眼鏡やマスクを隠蔽として収集し,隠蔽属性(隠蔽物,テクスチャ,色)をランダムに組み合わせて,より現実的な隠蔽タイプを多数達成する。それから私達は正常な閉塞の習慣の顔のイメージの適切な位置でそれらを覆います。さらに,オリジナル正規顔画像とオクルード顔画像を組み合わせて,webface-occと呼ばれる最終データセットを形成する。その多様性と安定性を確保するために、10,575人の被験者の804,704枚の顔画像をカバーしています。公開データセットに関する広範な実験は、データセットで再トレーニングされたarcfaceが最先端を著しく上回っていることを示している。 Webface-OCCはhttps://github.com/Baojin-Huang/Webface-OCCで入手できる。

The existing face recognition datasets usually lack occlusion samples, which hinders the development of face recognition. Especially during the COVID-19 coronavirus epidemic, wearing a mask has become an effective means of preventing the spread of the virus. Traditional CNN-based face recognition models trained on existing datasets are almost ineffective under heavy occlusion. To this end, we pioneer a simulated occlusion face recognition dataset. In particular, we first collect a variety of glasses and masks as occlusion, and randomly combine the occlusion attributes (occlusion objects, textures, and colors) to achieve a large number of more realistic occlusion types. We then place them at the proper position on the face image, following normal occlusion habits. Furthermore, we reasonably combine original normal face images and occluded face images to form our final dataset, termed Webface-OCC. It covers 804,704 face images of 10,575 subjects, with diverse occlusion types to ensure its diversity and stability. Extensive experiments on public datasets show that ArcFace retrained on our dataset significantly outperforms the state of the art. Webface-OCC is available at https://github.com/Baojin-Huang/Webface-OCC.

翻訳日:2021-03-05 14:59:12 公開日:2021-03-04

# 構造条件付き逆学習による画像分類のための教師なし領域適応

Unsupervised Domain Adaptation for Image Classification via Structure-Conditioned Adversarial Learning ( http://arxiv.org/abs/2103.02808v1 )

ライセンス: Link先を確認

Hui Wang, Jian Tian, Songyuan Li, Hanbin Zhao, Qi Tian, Fei Wu, and Xi Li

(参考訳) Unsupervised Domain Adaptation (UDA) は、典型的には、ラベルリッチなソースドメインから非ラベル付きターゲットドメインへの知識転送を、逆学習によって行う。原則として、既存のUDAアプローチは主にドメイン間のグローバルな分布アライメントに焦点を当て、固有の局所分布特性を無視している。そこで本研究では,ドメイン分布アライメント中のクラス内コンパクト性を維持可能な,エンドツーエンド構造条件付き対向学習スキーム(SCAL)を提案する。局所構造を構造認識条件として用いることで,提案手法を構造条件付き逆学習パイプラインに実装する。上記学習手順は、局所構造確立と構造条件付き逆学習とを交互に繰り返すことで行う。UDAシナリオにおける提案手法の有効性を実験的に実証した。

Unsupervised domain adaptation (UDA) typically carries out knowledge transfer from a label-rich source domain to an unlabeled target domain by adversarial learning. In principle, existing UDA approaches mainly focus on the global distribution alignment between domains while ignoring the intrinsic local distribution properties. Motivated by this observation, we propose an end-to-end structure-conditioned adversarial learning scheme (SCAL) that is able to preserve the intra-class compactness during domain distribution alignment. By using local structures as structure-aware conditions, the proposed scheme is implemented in a structure-conditioned adversarial learning pipeline. The above learning procedure is iteratively performed by alternating between local structures establishment and structure-conditioned adversarial learning. Experimental results demonstrate the effectiveness of the proposed scheme in UDA scenarios.

翻訳日:2021-03-05 14:58:55 公開日:2021-03-04

# セマンティックアグリゲーションとアダプティブ2D-1Dレジストレーションによるカメラ空間ハンドメッシュの回復

Camera-Space Hand Mesh Recovery via Semantic Aggregation and Adaptive 2D-1D Registration ( http://arxiv.org/abs/2103.02845v1 )

ライセンス: Link先を確認

Xingyu Chen, Yufeng Liu, Chongyang Ma, Jianlong Chang, Huayan Wang, Tian Chen, Xiaoyan Guo, Pengfei Wan, Wen Zheng

(参考訳) 近年,3dハンドメッシュの回復が著しい進展を遂げている。しかし、本質的な2Dから3Dの曖昧さのために、単一のRGB画像からカメラ空間の3D情報を回復することは困難のままです。この問題に対処するため、カメラ空間メッシュ回復を2つのサブタスク、すなわちルート相対メッシュ回復とルート回復に分割する。まず、単一の入力画像からジョイントランドマークとシルエットを抽出し、3Dタスクに2Dキューを提供します。ルート関係メッシュ回収タスクでは,ジョイント間の意味関係を利用して抽出した2次元キューから3次元メッシュを生成する。このような生成された3Dメッシュ座標は、手首というルート位置に対して表現される。ルート回収タスクでは、生成した3Dメッシュを2Dキューに戻すことにより、カメラ空間にルート位置を登録し、カメラ空間の3Dメッシュ回復を完了させる。このパイプラインは,(1)関節間の既知の意味関係を明示的に利用し,(2)シルエットとメッシュの1次元投影を利用してロバストな登録を実現している。 FreiHAND、RHD、Human3.6Mなどの一般的なデータセットに関する広範な実験は、私たちのアプローチがルート相対メッシュリカバリとルートリカバリの両方で最先端のパフォーマンスを達成することを実証しています。私たちのコードはhttps://github.com/SeanChenxy/HandMeshで公開されています。

Recent years have witnessed significant progress in 3D hand mesh recovery. Nevertheless, because of the intrinsic 2D-to-3D ambiguity, recovering camera-space 3D information from a single RGB image remains challenging. To tackle this problem, we divide camera-space mesh recovery into two sub-tasks, i.e., root-relative mesh recovery and root recovery. First, joint landmarks and silhouette are extracted from a single input image to provide 2D cues for the 3D tasks. In the root-relative mesh recovery task, we exploit semantic relations among joints to generate a 3D mesh from the extracted 2D cues. Such generated 3D mesh coordinates are expressed relative to a root position, i.e., wrist of the hand. In the root recovery task, the root position is registered to the camera space by aligning the generated 3D mesh back to 2D cues, thereby completing camera-space 3D mesh recovery. Our pipeline is novel in that (1) it explicitly makes use of known semantic relations among joints and (2) it exploits 1D projections of the silhouette and mesh to achieve robust registration. Extensive experiments on popular datasets such as FreiHAND, RHD, and Human3.6M demonstrate that our approach achieves state-of-the-art performance on both root-relative mesh recovery and root recovery. Our code is publicly available at https://github.com/SeanChenxy/HandMesh.
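
The split into root-relative mesh recovery and root recovery rests on simple geometry: camera-space vertices are the root-relative vertices shifted by the root position, and a pinhole projection maps them back to the image. The sketch below shows only that geometry. The single focal length, the principal point, and the function names are assumptions for illustration; the paper's adaptive 2D-1D registration is not reproduced here.

```python
import numpy as np


def to_camera_space(root_relative_vertices, root_position):
    """Shift root-relative vertices (N, 3) by the root position (3,)."""
    return root_relative_vertices + root_position


def project(points_3d, focal, principal_point):
    """Pinhole projection of camera-space points (N, 3) to pixel coordinates (N, 2)."""
    x, y, z = points_3d[:, 0], points_3d[:, 1], points_3d[:, 2]
    u = focal * x / z + principal_point[0]
    v = focal * y / z + principal_point[1]
    return np.stack([u, v], axis=1)


# Two vertices relative to the wrist, and a wrist placed 0.5 units in front of the camera.
vertices = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0]])
root = np.array([0.0, 0.0, 0.5])
pixels = project(to_camera_space(vertices, root), focal=500.0, principal_point=(112.0, 112.0))
```

A registration step can then search for the root position whose projection best matches the extracted 2D cues.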

翻訳日:2021-03-05 14:58:43 公開日:2021-03-04

# 微分可能なニューラルレンダリングによる物体検出のためのデータ拡張

Data Augmentation for Object Detection via Differentiable Neural Rendering ( http://arxiv.org/abs/2103.02852v1 )

ライセンス: Link先を確認

Guanghan Ning, Guang Chen, Chaowei Tan, Si Luo, Liefeng Bo, Heng Huang

(参考訳) 注釈付きデータが乏しい場合、堅牢なオブジェクト検出器を訓練することは困難です。この問題に対処する既存のアプローチには、ラベルなしデータからラベル付きデータを補間する半教師付き学習、プリテキストタスクを介してラベルなしデータ内の信号を利用する自己教師付き学習などがある。教師付き学習パラダイムを変えることなく,学習データを新しいビューで意味的に補間する,オブジェクト検出のためのオフラインデータ拡張手法を導入する。具体的には,微分可能なニューラルレンダリングに基づくトレーニング画像の制御可能なビューを,人間の介入を伴わない対応する境界ボックスアノテーションとともに生成する。まず,深度マップを推定しながら,画素整列画像の特徴を点雲に抽出・投影する。次に、ターゲットカメラのポーズでそれらを再投影し、新しいビューの2D画像を描画する。キーポイント形式のオブジェクトはポイントクラウドにマークされ、新しいビューでアノテーションを復元します。アフィン変換やイメージミックスアップなどのオンラインデータ拡張手法と完全に互換性がある。広範な実験により,画像やラベルを充実させるコストのないツールとして,訓練データが少ない物体検出システムの性能を著しく向上させることができることが示された。コードは https://github.com/Guanghan/DANR で入手できる。

It is challenging to train a robust object detector when annotated data is scarce. Existing approaches to tackle this problem include semi-supervised learning, which interpolates labeled data from unlabeled data, and self-supervised learning, which exploits signals within unlabeled data via pretext tasks. Without changing the supervised learning paradigm, we introduce an offline data augmentation method for object detection, which semantically interpolates the training data with novel views. Specifically, our proposed system generates controllable views of training images based on differentiable neural rendering, together with corresponding bounding box annotations which involve no human intervention. Firstly, we extract and project pixel-aligned image features into point clouds while estimating depth maps. We then re-project them with a target camera pose and render a novel-view 2D image. Objects in the form of keypoints are marked in point clouds to recover annotations in new views. It is fully compatible with online data augmentation methods, such as affine transform, image mixup, etc. Extensive experiments show that our method, as a cost-free tool to enrich images and labels, can significantly boost the performance of object detection systems with scarce training data. Code is available at https://github.com/Guanghan/DANR.
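
Recovering a box annotation in a new view comes down to moving the marked keypoints with the target camera pose, projecting them, and taking the extent of the projected points. The sketch below covers that step only. The function name, the pose convention (`rotation` and `translation` map source-camera coordinates to target-camera coordinates), and the toy intrinsics are assumptions for illustration; the neural renderer itself is not shown.

```python
import numpy as np


def reproject_bbox(keypoints_cam, rotation, translation, intrinsics):
    """Project 3D keypoints (N, 3) into a target view and return (x_min, y_min, x_max, y_max)."""
    moved = keypoints_cam @ rotation.T + translation   # apply the target camera pose
    pixels_h = moved @ intrinsics.T                    # homogeneous pixel coordinates
    pixels = pixels_h[:, :2] / pixels_h[:, 2:3]        # perspective divide
    x_min, y_min = pixels.min(axis=0)
    x_max, y_max = pixels.max(axis=0)
    return float(x_min), float(y_min), float(x_max), float(y_max)


# Toy intrinsics and two keypoints at depth 2 marking opposite box corners.
K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
keypoints = np.array([[0.0, 0.0, 2.0], [1.0, 1.0, 2.0]])
box_same_view = reproject_bbox(keypoints, np.eye(3), np.zeros(3), K)
box_shifted = reproject_bbox(keypoints, np.eye(3), np.array([1.0, 0.0, 0.0]), K)
```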

翻訳日:2021-03-05 14:58:20 公開日:2021-03-04

# 相互一貫性トレーニングによる半教師付き左心房セグメンテーション

Semi-supervised Left Atrium Segmentation with Mutual Consistency Training ( http://arxiv.org/abs/2103.02911v1 )

ライセンス: Link先を確認

Yicheng Wu, Minfeng Xu, Zongyuan Ge, Jianfei Cai and Lei Zhang

(参考訳) 半教師付き学習は、トレーニングのために大量の注釈データを集めることの重荷を軽減し、特に医用画像分割タスクにおいて機械学習の分野で大きな注目を集めている。しかし、既存の方法のほとんどは、訓練中の挑戦的な領域(例えば小さな枝やぼやけた縁)の重要性を過小評価している。これらの未ラベル領域には、モデルの予測の不確実性を最小限に抑えるためのより重要な情報が含まれている可能性があり、トレーニングプロセスにおいて強調されるべきであると考えている。そこで本稿では,3次元MR画像からの半教師付き左房分割のための新しいMutual Consistency Network(MC-Net)を提案する。特に、MC-Netは1つのエンコーダと2つのわずかに異なるデコーダから構成されており、2つのデコーダの予測の不一致は、相互整合性を促進するために設計された循環疑似ラベルスキームによって教師なし損失に変換される。このような相互整合性は、2つのデコーダの一貫性と低エントロピー予測を奨励し、モデルがこれらのラベルのない挑戦領域から徐々に一般化された特徴を捉えることを可能にする。我々は,公開左心房(LA)データベース上でMC-Netを評価し,ラベルなしデータを効果的に活用することで印象的な性能向上を実現する。我々のMC-Netは、最近の6つの半教師付き左房セグメンテーション法より優れており、LAデータベース上で新しい最先端性能を達成している。

Semi-supervised learning has attracted great attention in the field of machine learning, especially for medical image segmentation tasks, since it alleviates the heavy burden of collecting abundant densely annotated data for training. However, most existing methods underestimate the importance of challenging regions (e.g., small branches or blurred edges) during training. We believe that these unlabeled regions may contain more crucial information for minimizing the model's prediction uncertainty and should be emphasized in the training process. Therefore, in this paper, we propose a novel Mutual Consistency Network (MC-Net) for semi-supervised left atrium segmentation from 3D MR images. In particular, our MC-Net consists of one encoder and two slightly different decoders, and the prediction discrepancies between the two decoders are transformed into an unsupervised loss by our cycled pseudo-label scheme to encourage mutual consistency. Such mutual consistency encourages the two decoders to have consistent and low-entropy predictions and enables the model to gradually capture generalized features from these unlabeled challenging regions. We evaluate our MC-Net on the public Left Atrium (LA) database and it obtains impressive performance gains by exploiting the unlabeled data effectively. Our MC-Net outperforms six recent semi-supervised methods for left atrium segmentation, and sets the new state-of-the-art performance on the LA database.
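
One way to read the cycled pseudo-label idea is that each decoder's probability map is compared with a sharpened (lower-entropy) version of the other decoder's map. The NumPy sketch below follows that reading. The temperature-based sharpening function, the mean-squared-error form of the loss, and all names are assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def sharpen(prob, temperature=0.1):
    """Push foreground probabilities toward 0 or 1 (assumed form of the pseudo-label step)."""
    p = prob ** (1.0 / temperature)
    q = (1.0 - prob) ** (1.0 / temperature)
    return p / (p + q)


def mutual_consistency_loss(prob_a, prob_b, temperature=0.1):
    """Cross-supervise two decoders: each is pulled toward the other's sharpened output."""
    pseudo_a = sharpen(prob_a, temperature)
    pseudo_b = sharpen(prob_b, temperature)
    return np.mean((prob_a - pseudo_b) ** 2) + np.mean((prob_b - pseudo_a) ** 2)


# Foreground probabilities from two decoders on the same unlabeled voxels (made-up values).
decoder_a = np.array([0.9, 0.1])
decoder_b = np.array([0.6, 0.4])
loss = mutual_consistency_loss(decoder_a, decoder_b)
```

The loss is zero only when both decoders agree and are already confident, which matches the stated goal of consistent, low-entropy predictions.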

翻訳日:2021-03-05 14:58:00 公開日:2021-03-04

# QAIR:画像検索のためのクエリ効率の高いブラックボックス攻撃

QAIR: Practical Query-efficient Black-Box Attacks for Image Retrieval ( http://arxiv.org/abs/2103.02927v1 )

ライセンス: Link先を確認

Xiaodan Li, Jinfeng Li, Yuefeng Chen, Shaokai Ye, Yuan He, Shuhui Wang, Hang Su, Hui Xue

(参考訳) 画像検索に対するクエリーベースの攻撃について検討し、ブラックボックス設定下では、データベースから上位kランクのラベルなし画像へのクエリーアクセスしか持たない、敵対的な例に対する堅牢性を評価する。返却されたラベルや信頼スコアに応じて敵を生成する画像分類におけるクエリーアタックと比較すると、部分的検索リストにおける攻撃効果の定量化が困難であるため、課題はさらに顕著になる。本稿では,QAIR(Query-based Attack against Image Retrieval)を初めて試行し,トップk検索結果を完全に変換する。具体的には、攻撃前後の上位K検索結果に対するセット類似度を測定し、勾配最適化を導くことにより、攻撃効果の定量化を図る。攻撃効率をさらに高めるため、ターゲットモデル上で転送可能なプリエントを取得し、事前誘導勾配を生成する再帰的モデル盗み法を提案する。総合的な実験により,ブラックボックス設定による画像検索システムに対するクエリ数が少なく,高い攻撃成功率を達成した。現実世界のビジュアル検索エンジンの攻撃評価は、Bing Visual Searchのような商用システムを、平均33のクエリで98%の攻撃成功率で欺くことに成功したことを示している。

We study the query-based attack against image retrieval to evaluate its robustness against adversarial examples under the black-box setting, where the adversary only has query access to the top-k ranked unlabeled images from the database. Compared with query attacks in image classification, which produce adversaries according to the returned labels or confidence score, the challenge becomes even more prominent due to the difficulty in quantifying the attack effectiveness on the partial retrieved list. In this paper, we make the first attempt in Query-based Attack against Image Retrieval (QAIR), to completely subvert the top-k retrieval results. Specifically, a new relevance-based loss is designed to quantify the attack effects by measuring the set similarity on the top-k retrieval results before and after attacks and guide the gradient optimization. To further boost the attack efficiency, a recursive model stealing method is proposed to acquire transferable priors on the target model and generate the prior-guided gradients. Comprehensive experiments show that the proposed attack achieves a high attack success rate with few queries against the image retrieval systems under the black-box setting. The attack evaluations on the real-world visual search engine show that it successfully deceives a commercial system such as Bing Visual Search with 98% attack success rate by only 33 queries on average.
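
The attack's progress is measured by how much the top-k result set changes after the perturbation. A minimal way to quantify that is the fraction of the original top-k items that survive. This is an illustrative stand-in, not the relevance-based loss from the paper, whose exact form the abstract does not give; the function name is hypothetical.

```python
def topk_set_similarity(before, after):
    """Fraction of the original top-k results still present after the attack."""
    before_set, after_set = set(before), set(after)
    if not before_set:
        return 0.0
    return len(before_set & after_set) / len(before_set)


# Image ids returned for the clean query and for the perturbed query (made-up values).
clean_topk = [1, 2, 3, 4]
attacked_topk = [3, 4, 5, 6]
similarity = topk_set_similarity(clean_topk, attacked_topk)
```

A similarity of 0.0 means the top-k list has been fully replaced, which is the outcome the attack aims for.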

翻訳日:2021-03-05 14:57:37 公開日:2021-03-04

# Visual Question Answering: どのアプリケーションを調査したか?

Visual Question Answering: which investigated applications? ( http://arxiv.org/abs/2103.02937v1 )

ライセンス: Link先を確認

Silvio Barra, Carmen Bisogni, Maria De Marsico, Stefano Ricciardi

(参考訳) VQA(Visual Question Answering)は、コンピュータビジョン(CV)と自然言語処理(NLP)が最近出会った非常に刺激的で挑戦的な研究分野である。画像キャプションとビデオ要約では、セマンティック情報は静止画またはビデオダイナミクスに完全に含まれており、人間の一貫性のある方法でマイニングおよび表現されるだけです。これとは違って、同じメディア内のVQAセマンティック情報は、自然言語で表現された質問によって暗示されるセマンティックスと比較されなければならない。 VQAアプローチに関する最近の調査では、画像関連処理や言語関連処理の基礎となる手法や、伝達された情報を一貫して融合させる方法に焦点が当てられている。実際、引用されたほとんどの作品は、VQAシステムのビルディングブロックを評価するために使用される汎用データセットに依存しています。本稿では、実際のアプリケーションにフォーカスした提案を検討し、アプリケーションドメインにバインドされた適切なデータをベンチマークとして使用する可能性について考察する。また、VQA研究における最近の課題についても報告する。

Visual Question Answering (VQA) is an extremely stimulating and challenging research area where Computer Vision (CV) and Natural Language Processing (NLP) have recently met. In image captioning and video summarization, the semantic information is completely contained in still images or video dynamics, and it only has to be mined and expressed in a human-consistent way. In contrast, in VQA the semantic information in the same media must be compared with the semantics implied by a question expressed in natural language, doubling the artificial intelligence-related effort. Some recent surveys of VQA approaches have focused on the methods underlying either the image-related processing or the language-related processing, or on how to consistently fuse the conveyed information. Possible applications are only suggested, and, in fact, most cited works rely on general-purpose datasets that are used to assess the building blocks of a VQA system. This paper instead considers proposals that focus on real-world applications, possibly using suitable data bound to the application domain as benchmarks. The paper also reports on some recent challenges in VQA research.

翻訳日:2021-03-05 14:57:16 公開日:2021-03-04

# MOGAN:単一画像からの形態学的構造認識ジェネラティブラーニング

MOGAN: Morphologic-structure-aware Generative Learning from a Single Image ( http://arxiv.org/abs/2103.02997v1 )

ライセンス: Link先を確認

Jinshu Chen, Qihui Xu, Qi Kang and MengChu Zhou

(参考訳) ユーザの関心領域(ROI)が与えられたほとんどのインタラクティブな画像生成タスクにおいて、生成した結果は、元の画像の正確かつ合理的な構造を維持しつつ、外観に適切な多様性を持つことが期待される。このようなタスクは、限られたデータしか利用できない場合、より困難になる。近年提案された生成モデルは,1つの画像のみに基づいて学習を完了する。これらはサンプル内の異なるオブジェクトの実際の意味情報を無視しながら、サンプルのモノリシックな特徴に多くの注意を払います。その結果、ROIベースの生成タスクでは、関連するオブジェクトの正しい構造を維持することなく、過度のランダム性を持つ不適切なサンプルを生成する可能性があります。この問題に対処するために,MOGAN と呼ばれるMOrphologic-structure-aware Generative Adversarial Networkを導入し,単一の画像のみに基づいて,多様な外観と信頼性の高い構造を有するランダムなサンプルを生成する。ROIのトレーニングのために,原画像からのデータを拡張し,これらの拡張データを構造と外観の両方を含む知識に変換する新しいモジュールを導入することで,モデルのサンプル理解を高めることを提案する。ROI以外の残りの領域を学ぶために、ROIから分離された生成を保証するためにバイナリマスクを使用します。最後に、上記の学習プロセスの並列および階層的な分岐を設定した。他の単一画像GAN方式と比較して,本手法は合理的な構造維持や外観の変化など,内部的な特徴に重点を置いている。実験では、ROIベースの画像生成タスクにおける私たちのモデルの能力は、競合相手よりも優れています。

In most interactive image generation tasks, given regions of interest (ROI) by users, the generated results are expected to have adequate diversities in appearance while maintaining correct and reasonable structures in original images. Such tasks become more challenging if only limited data is available. Recently proposed generative models complete training based on only one image. They pay much attention to the monolithic feature of the sample while ignoring the actual semantic information of different objects inside the sample. As a result, for ROI-based generation tasks, they may produce inappropriate samples with excessive randomicity and without maintaining the related objects' correct structures. To address this issue, this work introduces a MOrphologic-structure-aware Generative Adversarial Network named MOGAN that produces random samples with diverse appearances and reliable structures based on only one image. For training for ROI, we propose to utilize the data coming from the original image being augmented and bring in a novel module to transform such augmented data into knowledge containing both structures and appearances, thus enhancing the model's comprehension of the sample. To learn the rest areas other than ROI, we employ binary masks to ensure the generation isolated from ROI. Finally, we set parallel and hierarchical branches of the mentioned learning process. Compared with other single image GAN schemes, our approach focuses on internal features including the maintenance of rational structures and variation on appearance. Experiments confirm a better capacity of our model on ROI-based image generation tasks than its competitive peers.

翻訳日:2021-03-05 14:56:57 公開日:2021-03-04

# CoTr:3D医療画像セグメンテーションのための効率の良いCNNとトランスフォーマー

CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image Segmentation ( http://arxiv.org/abs/2103.03024v1 )

ライセンス: Link先を確認

Yutong Xie, Jianpeng Zhang, Chunhua Shen, Yong Xia

(参考訳) 畳み込みニューラルネットワーク(CNN)は、現代の3D医療画像セグメンテーションのデファクトスタンダードとなっている。しかし、これらのネットワークで使用される畳み込み操作は、局所性および重み共有の帰納バイアスのために長距離依存性のモデリングに必然的に制限がある。Transformerはこの問題に対処するために生まれたが、高解像度の3D特徴マップを処理する際の計算量と空間的複雑さに悩まされている。本稿では, 正確な3次元医用画像分割のために, Convolutional Neural Network と Transformer を効率的に橋渡しする新しいフレームワーク(CoTr)を提案する。このフレームワークの下で、CNNは特徴表現を抽出するために構築され、抽出された特徴マップ上の長距離依存性をモデル化する効率的な変形可能なTransformer(DeTrans)が構築される。すべての画像位置を均等に扱うバニラTransformerとは異なり、DeTransは変形可能な自己注意機構を導入することで、キー位置の小さなセットにのみ注意を払う。したがって、DeTransの計算と空間の複雑さは大幅に減少し、画像分割において最も重要となるマルチスケールで高解像度な特徴マップを処理できるようになった。11の主要なヒト臓器をカバーするBCV(Multi-Atlas Labeling Beyond the Cranial Vault)データセットについて広範な評価を行っています。その結果, CoTrは他のCNNベース, Transformerベース, ハイブリッド法に比べて, 3次元マルチオーガンセグメンテーションタスクの性能が大幅に向上した。コードは https://github.com/YtongXie/CoTr で入手できる。

Convolutional neural networks (CNNs) have become the de facto standard for 3D medical image segmentation. The convolutional operations used in these networks, however, inevitably have limitations in modeling long-range dependencies due to their inductive bias of locality and weight sharing. Although the Transformer was born to address this issue, it suffers from extreme computational and spatial complexities in processing high-resolution 3D feature maps. In this paper, we propose a novel framework that efficiently bridges a Convolutional neural network and a Transformer (CoTr) for accurate 3D medical image segmentation. Under this framework, the CNN is constructed to extract feature representations and an efficient deformable Transformer (DeTrans) is built to model the long-range dependency on the extracted feature maps. Different from the vanilla Transformer, which treats all image positions equally, our DeTrans pays attention only to a small set of key positions by introducing the deformable self-attention mechanism. Thus, the computational and spatial complexities of DeTrans have been greatly reduced, making it possible to process the multi-scale and high-resolution feature maps, which are usually of paramount importance for image segmentation. We conduct an extensive evaluation on the Multi-Atlas Labeling Beyond the Cranial Vault (BCV) dataset that covers 11 major human organs. The results indicate that our CoTr leads to a substantial performance improvement over other CNN-based, transformer-based, and hybrid methods on the 3D multi-organ segmentation task. Code is available at https://github.com/YtongXie/CoTr.

翻訳日:2021-03-05 14:56:34 公開日:2021-03-04

# モバイルタッチレス指紋認識:実装、パフォーマンス、ユーザビリティの観点から

Mobile Touchless Fingerprint Recognition: Implementation, Performance and Usability Aspects ( http://arxiv.org/abs/2103.03038v1 )

ライセンス: Link先を確認

Jannis Priesnitz, Rolf Huesmann, Christian Rathgeb, Nicolas Buchmann, Christoph Busch

(参考訳) 本研究は,スマートフォン用タッチレス指紋自動認識システムを提案する。認識パイプライン全体を包括的に記述し,完全自動化したキャプチャシステムにおける重要な要件について考察する。また,本実装は研究目的で公開されている。データベースの取得中に、29の被験者の合計1,360のタッチレスおよびタッチベースのサンプルが2つの異なる環境状況でキャプチャされます。取得したデータベース上での実験では,環境制約下でのタッチレススキームとタッチベースベースラインスキームに匹敵する性能を示した。両方のキャプチャデバイスタイプの比較ユーザビリティ調査は、被験者の大半がタッチレスキャプチャ方法を好むことを示しています。実験結果に基づいて、現在のCOVID-19パンデミックが指紋認識システムに与える影響を分析した。最後に,タッチレス指紋認証の実装について概説する。

This work presents an automated touchless fingerprint recognition system for smartphones. We provide a comprehensive description of the entire recognition pipeline and discuss important requirements for a fully automated capturing system. Also, our implementation is made publicly available for research purposes. During a database acquisition, a total number of 1,360 touchless and touch-based samples of 29 subjects are captured in two different environmental situations. Experiments on the acquired database show a comparable performance of our touchless scheme and the touch-based baseline scheme under constrained environmental influences. A comparative usability study on both capturing device types indicates that the majority of subjects prefer the touchless capturing method. Based on our experimental results we analyze the impact of the current COVID-19 pandemic on fingerprint recognition systems. Finally, implementation aspects of touchless fingerprint recognition are summarized.

翻訳日:2021-03-05 14:56:07 公開日:2021-03-04

# 組織グラフを用いた不正確・不完全ラベルからの全スライドセグメンテーションの学習

Learning Whole-Slide Segmentation from Inexact and Incomplete Labels using Tissue Graphs ( http://arxiv.org/abs/2103.03129v1 )

ライセンス: Link先を確認

Valentin Anklin, Pushpak Pati, Guillaume Jaume, Behzad Bozorgtabar, Antonio Foncubierta-Rodríguez, Jean-Philippe Thiran, Mathilde Sibony, Maria Gabrani, Orcun Goksel

(参考訳) 病理組織像を診断的に関連のある領域に分割することは,病理医の適時かつ信頼性の高い判断を支援する上で不可欠である。この目的のためにコンピュータ支援技術が提案されており、スキャンされた組織学スライドの関連領域を描出する。しかし、これらの技術は、注釈付きピクセルからなるタスク固有の大規模データセットを必要とし、その取得は退屈で時間と費用がかかり、多くの組織学タスクでは実現不可能である。そこで, より安価で迅速に取得できる弱い教師情報を利用する, 弱教師付きセマンティックセグメンテーション手法が提案されている。本稿では,グラフを用いた弱教師付きセグメンテーション手法であるSegGiniを提案する。SegGiniは,弱い多重アノテーション,すなわち不正確かつ不完全なアノテーションを利用して,組織マイクロアレイ(TMA)から全スライド画像(WSI)まで,任意の大きな画像を分割できる。正式には、SegGiniは入力組織学画像に対して組織グラフ表現を構築し、グラフノードは組織領域を表す。そして、不正確な画像レベルラベル、不完全なスクリブル、またはその両方を用いて、ノード分類による弱教師付きセグメンテーションを実行する。TMAとWSIを含む2つの公開前立腺癌データセットを用いてSegGiniの評価を行った。本手法は,病理医のベースラインに匹敵しながら,様々なアノテーション設定において,両方のデータセットで最先端のセグメンテーション性能を達成した。

Segmenting histology images into diagnostically relevant regions is imperative to support timely and reliable decisions by pathologists. To this end, computer-aided techniques have been proposed to delineate relevant regions in scanned histology slides. However, the techniques necessitate task-specific large datasets of annotated pixels, which is tedious, time-consuming, expensive, and infeasible to acquire for many histology tasks. Thus, weakly-supervised semantic segmentation techniques are proposed to utilize weak supervision that is cheaper and quicker to acquire. In this paper, we propose SegGini, a weakly supervised segmentation method using graphs, that can utilize weak multiplex annotations, i.e. inexact and incomplete annotations, to segment arbitrary and large images, scaling from tissue microarray (TMA) to whole slide image (WSI). Formally, SegGini constructs a tissue-graph representation for an input histology image, where the graph nodes depict tissue regions. Then, it performs weakly-supervised segmentation via node classification by using inexact image-level labels, incomplete scribbles, or both. We evaluated SegGini on two public prostate cancer datasets containing TMAs and WSIs. Our method achieved state-of-the-art segmentation performance on both datasets for various annotation settings while being comparable to a pathologist baseline.
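
Because each graph node stands for a tissue region, a per-node class prediction turns into a pixel-level segmentation by painting every pixel with the label of its region. The sketch below shows that final lookup only. The region map, the labels, and the function name are made-up illustrations; graph construction and the node classifier are outside its scope.

```python
import numpy as np


def nodes_to_segmentation(region_map, node_labels):
    """Map per-node class labels onto pixels.

    region_map: integer array (H, W) giving the tissue-region (node) index of each pixel.
    node_labels: sequence of predicted class ids, one per node.
    """
    return np.asarray(node_labels)[region_map]


# A 2x3 image split into three regions, with one predicted class per region.
region_map = np.array([[0, 0, 1], [2, 2, 1]])
node_labels = [3, 1, 0]
segmentation = nodes_to_segmentation(region_map, node_labels)
```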

翻訳日:2021-03-05 14:55:58 公開日:2021-03-04

# コントラスト学習とトランスファー学習--医用画像解析を事例として

Contrastive Learning Meets Transfer Learning: A Case Study In Medical Image Analysis ( http://arxiv.org/abs/2103.03166v1 )

ライセンス: Link先を確認

Yuzhe Lu, Aadarsh Jha, and Yuankai Huo

(参考訳) 注釈付き医療画像は、ドメインの知識とプライバシーの制約によって制限されるため、ラベル付き自然画像よりも稀である。転移学習とコントラスト学習の最近の進歩は、異なる視点からこれらの問題に取り組む効果的な解決策を提供してきた。最先端の転移学習(Big Transfer(BiT)など)とコントラスト学習(Simple Siamese Contrastive Learning(SimSiam)など)のアプローチは、その補完的な性質を考慮せずに、独立して検討されてきた。遅い収束速度が現代のコントラスト学習アプローチの重要な制限であることを考えると、転移学習によってコントラスト学習を加速させるのは魅力的です。本稿では, BiT と SimSiam を整合させることの実現可能性について検討する。経験的分析から、BiTをSimSiamに適応させる上では、異なる正規化技術(BiTのGroup Norm対SimSiamのBatch Norm)が主要な障壁となる。CIFAR-10およびHAM10000データセット上で、BiT, SimSiam, BiT+SimSiamの性能を評価した。その結果, BiTモデルがSimSiamの収束速度を加速することが示唆された。一緒に使用すると、どちらのモデル単独よりも優れた性能を発揮します。この研究が、画像解析のために大規模な事前学習済みモデルとコントラスト学習モデルを組み合わせるタスクを研究者が再検討する動機となることを願っています。

Annotated medical images are typically rarer than labeled natural images since they are limited by domain knowledge and privacy constraints. Recent advances in transfer and contrastive learning have provided effective solutions to tackle such issues from different perspectives. The state-of-the-art transfer learning (e.g., Big Transfer (BiT)) and contrastive learning (e.g., Simple Siamese Contrastive Learning (SimSiam)) approaches have been investigated independently, without considering the complementary nature of such techniques. It would be appealing to accelerate contrastive learning with transfer learning, given that slow convergence speed is a critical limitation of modern contrastive learning approaches. In this paper, we investigate the feasibility of aligning BiT with SimSiam. From empirical analyses, different normalization techniques (Group Norm in BiT vs. Batch Norm in SimSiam) are the key hurdle of adapting BiT to SimSiam. When combining BiT with SimSiam, we evaluated the performance of using BiT, SimSiam, and BiT+SimSiam on CIFAR-10 and HAM10000 datasets. The results suggest that the BiT models accelerate the convergence speed of SimSiam. When used together, the model gives superior performance over both of its counterparts. We hope this study will motivate researchers to revisit the task of aggregating big pre-trained models with contrastive learning models for image analysis.
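
The abstract names the mismatch between Group Norm (BiT) and Batch Norm (SimSiam) as the key hurdle. The NumPy sketch below only illustrates how the two differ in the axes over which statistics are computed. It omits the learnable scale and shift, and it is not the authors' adaptation procedure.

```python
import numpy as np


def batch_norm(x, eps=1e-5):
    """Normalize each channel using statistics over the batch and spatial axes (N, H, W)."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def group_norm(x, num_groups, eps=1e-5):
    """Normalize each sample separately, using statistics over channel groups and space."""
    n, c, h, w = x.shape
    grouped = x.reshape(n, num_groups, c // num_groups, h, w)
    mean = grouped.mean(axis=(2, 3, 4), keepdims=True)
    var = grouped.var(axis=(2, 3, 4), keepdims=True)
    return ((grouped - mean) / np.sqrt(var + eps)).reshape(n, c, h, w)


# Random feature maps with layout (batch, channels, height, width).
features = np.random.default_rng(0).normal(size=(4, 4, 3, 3))
gn_out = group_norm(features, num_groups=2)
bn_out = batch_norm(features)
```

Group Norm output for a sample does not depend on the other samples in the batch, whereas Batch Norm output does, so the two layers behave differently when batch composition changes.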

翻訳日:2021-03-05 14:55:36 公開日:2021-03-04

# インタラクティブ画像合成と編集のためのAnycost GAN

Anycost GANs for Interactive Image Synthesis and Editing ( http://arxiv.org/abs/2103.03243v1 )

ライセンス: Link先を確認

Ji Lin, Richard Zhang, Frieder Ganz, Song Han, Jun-Yan Zhu

(参考訳) Generative adversarial networks (GANs) はフォトリアリスティックな画像合成と編集を可能にした。しかし、大規模なジェネレータ(例:StyleGAN2)の計算コストが高いため、エッジデバイス上で1回の編集結果を見るのには通常数秒かかり、インタラクティブなユーザーエクスペリエンスを妨げます。本稿では,現代的なレンダリングソフトウェアからインスピレーションを得て,インタラクティブな自然画像編集のためのAnycost GANを提案する。Anycost GANをトレーニングし、可変の解像度とチャンネル数をサポートすることで、さまざまな速度での高速な画像生成を可能にします。フルジェネレータのサブセットを実行すると、フルジェネレータと知覚的に類似した出力が生成されるため、プレビューに適したプロキシになります。サンプリングベースのマルチ解像度トレーニング、アダプティブチャネルトレーニング、およびジェネレータ条件付き識別器を使用することで、Anycostジェネレータをさまざまな構成で評価でき、別々に訓練されたモデルよりも優れた画質を実現できます。さらに,画像投影中に異なるサブジェネレータ間の一貫性を促進するために,新しいエンコーダトレーニングと潜在コード最適化手法を開発した。Anycost GANは、さまざまなコスト予算(最大10倍の計算削減)で実行でき、幅広いハードウェアおよびレイテンシ要件に適応できます。デスクトップCPUとエッジデバイスにデプロイすると、6-12倍のスピードアップで知覚的に同様のプレビューを提供し、インタラクティブな画像編集を可能にします。コードとデモは https://github.com/mit-han-lab/anycost-gan で公開されている。

Generative adversarial networks (GANs) have enabled photorealistic image synthesis and editing. However, due to the high computational cost of large-scale generators (e.g., StyleGAN2), it usually takes seconds to see the results of a single edit on edge devices, prohibiting interactive user experience. In this paper, we take inspiration from modern rendering software and propose Anycost GAN for interactive natural image editing. We train the Anycost GAN to support elastic resolutions and channels for faster image generation at versatile speeds. Running a subset of the full generator produces outputs that are perceptually similar to those of the full generator, making it a good proxy for preview. By using sampling-based multi-resolution training, adaptive-channel training, and a generator-conditioned discriminator, the anycost generator can be evaluated at various configurations while achieving better image quality compared to separately trained models. Furthermore, we develop new encoder training and latent code optimization techniques to encourage consistency between the different sub-generators during image projection. Anycost GAN can be executed at various cost budgets (up to 10x computation reduction) and adapt to a wide range of hardware and latency requirements. When deployed on desktop CPUs and edge devices, our model can provide perceptually similar previews at 6-12x speedup, enabling interactive image editing. The code and demo are publicly available: https://github.com/mit-han-lab/anycost-gan.
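
Running a subset of the full generator can be pictured, at the level of a single layer, as using only the first few output channels of its weights. The sketch below shows that slicing on a toy linear layer. The function name and the `width_ratio` parameter are hypothetical, and the real model applies the idea to convolutional generator layers trained to tolerate it.

```python
import numpy as np


def sliced_linear(x, weight, bias, width_ratio):
    """Apply a linear layer using only the first fraction of its output channels.

    weight has shape (out_channels, in_channels); width_ratio lies in (0, 1].
    """
    out_channels = max(1, int(weight.shape[0] * width_ratio))
    return x @ weight[:out_channels].T + bias[:out_channels]


# A toy layer with 3 inputs and 4 output channels.
inputs = np.ones((2, 3))
weight = np.arange(12.0).reshape(4, 3)
bias = np.zeros(4)
full_output = sliced_linear(inputs, weight, bias, width_ratio=1.0)
half_output = sliced_linear(inputs, weight, bias, width_ratio=0.5)
```

Since the reduced layer shares weights with the full one, its output equals the leading channels of the full output, which is what makes a cheap preview consistent with the final render.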

翻訳日:2021-03-05 14:55:11 公開日:2021-03-04

# ニューラルネットワークによる正確かつ解釈可能な決定規則セットの学習

Learning Accurate and Interpretable Decision Rule Sets from Neural Networks ( http://arxiv.org/abs/2103.02826v1 )

ライセンス: Link先を確認

Litao Qiao, Weijia Wang, Bill Lin

(参考訳) 本論文では,分類の解釈可能なモデルとして,独立した論理規則の集合を選言標準形で学習する新しいパラダイムを提案する。我々は、解釈可能な決定ルールセットを学習する問題を、特定の、しかし非常に単純な2層アーキテクチャのニューラルネットワークをトレーニングする問題として考える。第1層の各ニューロンはトレーニング後に解釈可能なif-thenルールに直接対応し、第2層の出力ニューロンは第1層ルールの選言に直接対応して決定ルールセットを形成する。この第1のルール層におけるニューロンの表現は、決定ルールにおける特徴の正の関連と負の関連の両方をエンコードできる。最先端のニューラルネットワークトレーニングアプローチは、高精度な分類モデルの学習に利用できる。さらに,分類精度と導出されたルールの単純さのバランスをとるために,スパース性に基づく正規化手法を提案する。実験の結果,本手法は他の最先端ルール学習アルゴリズムよりも精度の高いルールセットを,より良い精度と単純さのトレードオフで生成できることがわかった。さらに、ランダムフォレストやフル精度ディープニューラルネットワークなどの解釈不能なブラックボックス機械学習アプローチと比較しても、同等の予測性能を持つ解釈可能な決定ルールセットを容易に見つけることができます。

This paper proposes a new paradigm for learning a set of independent logical rules in disjunctive normal form as an interpretable model for classification. We consider the problem of learning an interpretable decision rule set as training a neural network in a specific, yet very simple two-layer architecture. Each neuron in the first layer directly maps to an interpretable if-then rule after training, and the output neuron in the second layer directly maps to a disjunction of the first-layer rules to form the decision rule set. Our representation of neurons in this first rules layer enables us to encode both the positive and the negative association of features in a decision rule. State-of-the-art neural net training approaches can be leveraged for learning highly accurate classification models. Moreover, we propose a sparsity-based regularization approach to balance between classification accuracy and the simplicity of the derived rules. Our experimental results show that our method can generate more accurate decision rule sets than other state-of-the-art rule-learning algorithms with better accuracy-simplicity trade-offs. Further, when compared with uninterpretable black-box machine learning approaches such as random forests and full-precision deep neural networks, our approach can easily find interpretable decision rule sets that have comparable predictive performance.
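
A decision rule set in disjunctive normal form is an OR over rules, where each rule is an AND over features that must be present (positive association) or absent (negative association). The sketch below evaluates such a rule set on binary features. The dictionary encoding and the example rules are assumptions chosen for readability; they are not the paper's network representation.

```python
def rule_fires(rule, features):
    """A rule is a conjunction: every listed feature must match its required value."""
    return all(features[index] == required for index, required in rule.items())


def predict(rule_set, features):
    """The rule set is a disjunction: the prediction is positive if any rule fires."""
    return any(rule_fires(rule, features) for rule in rule_set)


# (x0 AND NOT x2) OR (x1), written as {feature_index: required_value}.
rule_set = [{0: True, 2: False}, {1: True}]
first = predict(rule_set, [True, False, False])
second = predict(rule_set, [False, False, True])
third = predict(rule_set, [False, True, True])
```

In the two-layer view from the abstract, each dictionary corresponds to one first-layer neuron and `predict` to the output neuron.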

翻訳日:2021-03-05 14:53:15 公開日:2021-03-04

# 巡回セールスマン問題のためのトランスフォーマーネットワーク

The Transformer Network for the Traveling Salesman Problem ( http://arxiv.org/abs/2103.03012v1 )

ライセンス: Link先を確認

Xavier Bresson and Thomas Laurent

(参考訳) 巡回セールスマン問題(TSP)は、1951年のフォン・ノイマンに始まる、最も有名で最も研究されている組合せ問題である。この問題は、切除平面法、分枝限定法、局所探索、ラグランジュ緩和、焼きなまし法など、いくつかの最適化技術の発見を促してきた。過去5年間で、(グラフ)ニューラルネットワークが新しい組合せアルゴリズムを学習できる有望な技術が登場した。主な疑問は、ディープラーニングがデータからより良いヒューリスティックを学習し、人間が設計したヒューリスティックを置き換えられるかどうかである。NP困難問題に効率的に対処するアルゴリズムを開発するには長年の研究が必要になることがあり、また多くの産業上の問題は本質的に組合せ的であるため、これは魅力的である。本研究では,もともと自然言語処理用に開発されたTransformerアーキテクチャを組合せ問題であるTSPに適応させることを提案する。トレーニングは強化学習によって行われるためTSPの教師解を必要とせず、デコーディングにはビームサーチを使用する。TSP50では0.004%、TSP100では0.39%の最適性ギャップを達成し、最近の学習ヒューリスティックに対する性能の改善を報告する。

The Traveling Salesman Problem (TSP) is the most popular and most studied combinatorial problem, starting with von Neumann in 1951. It has driven the discovery of several optimization techniques such as cutting planes, branch-and-bound, local search, Lagrangian relaxation, and simulated annealing. The last five years have seen the emergence of promising techniques in which (graph) neural networks are capable of learning new combinatorial algorithms. The main question is whether deep learning can learn better heuristics from data, i.e., replace human-engineered heuristics. This is appealing because developing algorithms that efficiently tackle NP-hard problems may require years of research, and many industry problems are combinatorial by nature. In this work, we propose to adapt the recent successful Transformer architecture, originally developed for natural language processing, to the combinatorial TSP. Training is done by reinforcement learning, hence without TSP training solutions, and decoding uses beam search. We report improved performance over recent learned heuristics, with an optimality gap of 0.004% for TSP50 and 0.39% for TSP100.
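
The gap figures quoted above are percentages by which a tour exceeds the optimal length. A minimal sketch of that metric follows, assuming Euclidean instances and the usual definition `100 * (L - L_opt) / L_opt`; the helper names are illustrative, and the brute-force optimum is only feasible for tiny instances (the paper's TSP50 and TSP100 baselines would need a proper solver).

```python
import math
from itertools import permutations


def tour_length(points, tour):
    """Length of the closed tour that visits `points` in the order given by `tour`."""
    n = len(tour)
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % n]]) for i in range(n)
    )


def optimal_length(points):
    """Exact optimum by brute force; city 0 is fixed as the start to skip rotations."""
    rest = range(1, len(points))
    return min(tour_length(points, (0,) + perm) for perm in permutations(rest))


def optimality_gap(length, best):
    """Relative excess over the optimal tour length, in percent."""
    return 100.0 * (length - best) / best


# Unit square: the perimeter tour is optimal, the self-crossing tour is not.
SQUARE = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
CROSSING_TOUR = (0, 2, 1, 3)
```

On the unit square the crossing tour has length 2 + 2√2, so its gap is roughly 20.7%.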

翻訳日:2021-03-05 14:52:55 公開日:2021-03-04

# FESサイクリングにおける神経刺激制御の神経力学に基づく深層強化学習

Neuromechanics-based Deep Reinforcement Learning of Neurostimulation Control in FES cycling ( http://arxiv.org/abs/2103.03057v1 )

ライセンス: Link先を確認

Nat Wannawas, Mahendran Subramanian, A. Aldo Faisal

(参考訳) 機能的電気刺激(FES)は、麻痺した人の筋肉に動きを回復させることができる。しかし、四肢全体の実用的な機能を回復するために多数の筋肉への刺激を制御することは未解決の問題である。現在の神経刺激工学は依然として20世紀の制御手法に依存しており、動作させるだけでも日々の調整を要する控えめな結果しか示していない。本稿では,FESサイクリングにおいて麻痺した脚をリアルタイムに適応的に神経刺激するために開発した、最先端の深層強化学習(RL)を提示する。アプローチの核心は、個人化された神経力学コンポーネントを強化学習フレームワークに統合することであり、これにより患者との長時間のトレーニングセッションを必要とせずに効率的にモデルを訓練でき、そのまますぐに動作する。神経力学コンポーネントは、筋・腱機能の筋骨格モデルと筋疲労の多状態モデルとを統合し、対麻痺のサイクリストの瞬時の筋能力に応じた神経刺激を可能にする。我々のRLアプローチは、精度と性能においてPID制御器およびファジィ論理制御器を上回る。重要な点として,本システムは、スタート時の加速から、筋肉が疲労していく定常走行での高いケイデンスの維持まで、サイクリストの脚を刺激することを学習した。我々のRL神経刺激システムの一部は、Cybathlon 2020(バイオニック・オリンピック)のFES部門に導入され、我々の対麻痺のサイクリストが9つの競技チームの中で銀メダルを獲得した。

Functional Electrical Stimulation (FES) can restore motion to a paralysed person's muscles. Yet controlling the stimulation of many muscles to restore the practical function of entire limbs is an unsolved problem. Current neurostimulation engineering still relies on 20th-century control approaches and correspondingly shows only modest results that require daily tinkering to operate at all. Here, we present our state-of-the-art Deep Reinforcement Learning (RL) approach, developed for real-time adaptive neurostimulation of paralysed legs for FES cycling. Core to our approach is the integration of a personalised neuromechanical component into our reinforcement learning framework, which allows us to train the model efficiently, without demanding extended training sessions with the patient, and to have it work out of the box. Our neuromechanical component merges musculoskeletal models of muscle and tendon function with a multistate model of muscle fatigue, to render the neurostimulation responsive to a paraplegic cyclist's instantaneous muscle capacity. Our RL approach outperforms PID and Fuzzy Logic controllers in accuracy and performance. Crucially, our system learned to stimulate a cyclist's legs from ramping up speed at the start to maintaining a high cadence in steady-state racing as the muscles fatigue. A part of our RL neurostimulation system has been successfully deployed at the Cybathlon 2020 bionic Olympics in the FES discipline, with our paraplegic cyclist winning the Silver medal among 9 competing teams.

翻訳日:2021-03-05 14:52:35 公開日:2021-03-04

# アクセント話者に対する誤り駆動型固定予算ASRパーソナライズ

Error-driven Fixed-Budget ASR Personalization for Accented Speakers ( http://arxiv.org/abs/2103.03142v1 )

ライセンス: Link先を確認

Abhijeet Awasthi, Aman Kansal, Sunita Sarawagi, Preethi Jyothi

(参考訳) 話者固有の発話を録音するための固定予算という制約の下で、ASRモデルをパーソナライズするタスクを検討する。話者とASRモデルが与えられた場合,その話者の発話がASRモデルにとって認識しにくくなりそうな文を識別する手法を提案する。このような文の選択に役立つ音素レベルの誤りモデルを学習するために、少量の話者固有データを仮定する。その結果,誤りモデルを用いて選択した文に対する話者の発話は,ランダムに選択された文に対する話者の発話よりも実際に誤り率が高いことがわかった。誤りモデルの助けを借りて選択した文の発話でASRモデルを微調整すると、同数のランダムに選択された文の発話で微調整する場合と比較して、より大きなWER改善が得られることが判明した。そこで本手法は,ASRモデルのパーソナライズのための予算制約下で話者発話を効率よく収集する方法を提供する。

We consider the task of personalizing ASR models while being constrained by a fixed budget on recording speaker-specific utterances. Given a speaker and an ASR model, we propose a method of identifying sentences for which the speaker's utterances are likely to be harder for the given ASR model to recognize. We assume a tiny amount of speaker-specific data to learn phoneme-level error models which help us select such sentences. We show that the speaker's utterances on the sentences selected using our error model indeed have larger error rates when compared to the speaker's utterances on randomly selected sentences. We find that fine-tuning the ASR model on the sentence utterances selected with the help of error models yields higher WER improvements in comparison to fine-tuning on an equal number of randomly selected sentence utterances. Thus, our method provides an efficient way of collecting speaker utterances under budget constraints for personalizing ASR models.
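
Two ingredients of the abstract can be sketched in a few lines: word error rate (WER) as word-level edit distance divided by reference length, and picking sentences under a fixed budget by a predicted difficulty score. The scoring dictionary below is a hypothetical stand-in for the output of an error model; how the paper's phoneme-level error model produces such scores is not shown here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (substitute, insert, delete)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = cur
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate; assumes a non-empty reference transcript."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)


def select_under_budget(predicted_error, budget):
    """Keep the `budget` sentences with the highest predicted error score.

    `predicted_error` maps sentence -> score and is a hypothetical input.
    """
    ranked = sorted(predicted_error, key=predicted_error.get, reverse=True)
    return ranked[:budget]
```

For instance, `wer("the cat sat on mat", "the cat sit mat")` counts one substitution and one deletion over five reference words.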

翻訳日:2021-03-05 14:51:39 公開日:2021-03-04

# 交互セグメンテーションと合成によるコントラスト適応型組織分類

Contrast Adaptive Tissue Classification by Alternating Segmentation and Synthesis ( http://arxiv.org/abs/2103.02767v1 )

ライセンス: Link先を確認

Dzung L. Pham, Yi-Yu Chou, Blake E. Dewey, Daniel S. Reich, John A. Butman, and Snehashis Roy

(参考訳) 磁気共鳴画像のセグメンテーションに対する深層学習のアプローチは、脳画像の定量的解析の自動化に重要な可能性を示している。しかし、継続する課題は、取得プロトコルの可変性に対する感度である。トレーニングデータとはコントラスト特性が異なる画像をセグメント化しようとすると、一般的にパフォーマンスが大幅に低下する。さらに、取得の違いによる量的変動はしばしば測定しようとする生物学的差異による変動を大きく上回るため、不均質なデータセットは簡単には評価できない。本稿では,トレーニングデータのコントラスト特性を入力画像に適応させる,交互セグメンテーションと合成ステップを用いたアプローチについて述べる。これにより、トレーニングデータに似ていない入力画像をより一貫してセグメント化できる。このアプローチの顕著な利点は、そのコントラスト特性に適応するために取得プロトコルの1つの例だけが必要であることである。2つの異なるT1重み付きボリュームプロトコルでスキャンした被験者の脳画像を用いて、本アプローチの有効性を実証した。

Deep learning approaches to the segmentation of magnetic resonance images have shown significant promise in automating the quantitative analysis of brain images. However, a continuing challenge has been their sensitivity to the variability of acquisition protocols. Attempting to segment images that have different contrast properties from those within the training data generally leads to significantly reduced performance. Furthermore, heterogeneous data sets cannot be easily evaluated because the quantitative variation due to acquisition differences often dwarfs the variation due to the biological differences that one seeks to measure. In this work, we describe an approach using alternating segmentation and synthesis steps that adapts the contrast properties of the training data to the input image. This allows input images that do not resemble the training data to be more consistently segmented. A notable advantage of this approach is that only a single example of the acquisition protocol is required to adapt to its contrast properties. We demonstrate the efficacy of our approach using brain images from a set of human subjects scanned with two different T1-weighted volumetric protocols.

翻訳日:2021-03-05 14:51:22 公開日:2021-03-04

# 複数カーネルと複数カーネル空間正規化器を用いたPET画像再構成

PET Image Reconstruction with Multiple Kernels and Multiple Kernel Space Regularizers ( http://arxiv.org/abs/2103.02813v1 )

ライセンス: Link先を確認

Shiyao Guo, Yuxia Sheng, Shenpeng Li, Li Chai, Jingxin Zhang

(参考訳) カーネル化された最尤(ML)期待値最大化(EM)法は、最近PET画像再構成において注目を集めており、従来の最先端手法の多くを上回っている。しかし,これらの手法も、再構成誤差が大きくなりうることや反復回数に対する感度が高いことといった、非カーネル化MLEM法の問題を免れない。本稿では,理論的な推論と実験結果を用いてこれらの問題を実証し,その新しい解法を提案する。この解法は、複数のカーネル行列と、異なる用途に合わせて調整できる複数のカーネル空間正則化器を備えた、正則化されたカーネル化MLEMである。再構成誤差と反復回数に対する感度を低減するため,マルチカーネル行列の一般的なクラスと、カーネル画像辞書およびカーネル画像ラプラシアン二次形式からなる2つの正則化器を提示し,それらを用いてPET画像再構成のための単一カーネル正則化EMアルゴリズムとマルチカーネル正則化EMアルゴリズムを導出する。これらの新しいアルゴリズムは、機械学習におけるマルチカーネル結合、スパース符号化における画像辞書学習、グラフ信号処理におけるグラフラプラシアン二次形式といった技術的手法を用いて導出される。シミュレーションデータと生体内データに対する広範な実験と比較を行って新しいアルゴリズムを検証・評価し、カーネル化MLEMや他の従来手法よりも優れた性能と利点を示す。

Kernelized maximum-likelihood (ML) expectation maximization (EM) methods have recently gained prominence in PET image reconstruction, outperforming many previous state-of-the-art methods. But they are not immune to the problems of non-kernelized MLEM methods, namely potentially large reconstruction error and high sensitivity to iteration number. This paper demonstrates these problems by theoretical reasoning and experimental results, and provides a novel solution to them. The solution is a regularized kernelized MLEM with multiple kernel matrices and multiple kernel space regularizers that can be tailored for different applications. To reduce the reconstruction error and the sensitivity to iteration number, we present a general class of multi-kernel matrices and two regularizers consisting of kernel image dictionary and kernel image Laplacian quadratic, and use them to derive the single-kernel regularized EM and multi-kernel regularized EM algorithms for PET image reconstruction. These new algorithms are derived using the technical tools of multi-kernel combination in machine learning, image dictionary learning in sparse coding, and graph Laplacian quadratic in graph signal processing. Extensive tests and comparisons on the simulated and in vivo data are presented to validate and evaluate the new algorithms, and demonstrate their superior performance and advantages over the kernelized MLEM and other conventional methods.

翻訳日:2021-03-05 14:51:07 公開日:2021-03-04

# 人間と機械の知覚に対する電位エネルギーに基づく点雲歪みの定量化

Point Cloud Distortion Quantification based on Potential Energy for Human and Machine Perception ( http://arxiv.org/abs/2103.02850v1 )

ライセンス: Link先を確認

Qi Yang, Siheng Chen, Yiling Xu, Jun Sun, M. Salman Asif, Zhan Ma

(参考訳) 点雲の歪み定量化は、幅広い人間や機械の知覚タスクにおいて、ステルスだが重要な役割を果たす。人間の知覚タスクでは、歪み量子化は主観的な実験に代えて3次元可視化を導くことができ、機械知覚タスクでは歪み量子化は教師なし学習タスクのためのディープニューラルネットワークのトレーニングを導くための損失関数として機能する。多くのアプリケーションで様々な要求を処理するためには、歪み定量化は、歪み識別可能で、微分可能で、計算の複雑さが低い必要がある。しかし、現在では3つの条件をすべて満たす一般的な歪量化が欠如している。このギャップを埋めるために、この研究は、点雲の幾何学と色差を測定する歪み定量化であるMPED(multiscale potential energy discrepancy)を提案する。様々な地域規模で評価することにより,MPEDはグローバルな局所的トレードオフを実現し,マルチスケールで歪みを捉えている。広範な実験研究は、人間と機械の両方の知覚タスクに対するMPEDの優位性を検証します。

Distortion quantification of point clouds plays a stealthy yet vital role in a wide range of human and machine perception tasks. For human perception tasks, a distortion quantification can substitute for subjective experiments to guide 3D visualization, while for machine perception tasks, a distortion quantification can work as a loss function to guide the training of deep neural networks for unsupervised learning tasks. To handle a variety of demands in many applications, a distortion quantification needs to be distortion discriminable, differentiable, and have a low computational complexity. Currently, however, there is a lack of a general distortion quantification that can satisfy all three conditions. To fill this gap, this work proposes multiscale potential energy discrepancy (MPED), a distortion quantification to measure point cloud geometry and color difference. By evaluating at various neighborhood sizes, the proposed MPED achieves global-local tradeoffs, capturing distortion in a multiscale fashion. Extensive experimental studies validate MPED's superiority for both human and machine perception tasks.

翻訳日:2021-03-05 14:50:42 公開日:2021-03-04

# Mask DnGAN: 敵対的損失と勾配マスクを用いたマルチステージRAWビデオノイズ除去

Mask DnGAN: Multi-Stage Raw Video Denoising with Adversarial Loss and Gradient Mask ( http://arxiv.org/abs/2103.02861v1 )

ライセンス: Link先を確認

Avinash Paliwal, Libing Zeng and Nima Khademi Kalantari

(参考訳) 本論文では,低照度下で撮影されたRAWビデオのノイズを除去する学習ベースの手法を提案する。まず、畳み込みニューラルネットワーク(CNN)を用いて、隣接するフレームを現在のフレームに明示的に位置合わせすることを提案する。次に、位置合わせされたフレームを別のCNNを使って融合し、最終的なノイズ除去フレームを得る。時間的に離れたフレームを直接位置合わせすることを避けるため、位置合わせと融合の2つの処理を複数の段階で実行する。具体的には、各段階で3つの連続する入力フレームに対してノイズ除去処理を行い、中間的なノイズ除去フレームを生成して、次の段階に入力として渡す。複数の段階で処理を行うことで、時間的に離れたフレームを直接位置合わせすることなく、隣接するフレームの情報を有効に活用することができる。我々は,条件付き識別器を用いた敵対的損失により多段階システムを訓練する。具体的には,滑らかな領域に高周波アーティファクトが生じるのを防ぐために,識別器をソフト勾配マスクで条件付けする。本システムが,リアルなディテールを備えた時間的に一貫性のある映像を生成できることを示す。さらに,広範な実験を通じて,本手法が最先端の画像・映像ノイズ除去手法を数値的にも視覚的にも上回ることを示す。

In this paper, we propose a learning-based approach for denoising raw videos captured under low lighting conditions. We propose to do this by first explicitly aligning the neighboring frames to the current frame using a convolutional neural network (CNN). We then fuse the registered frames using another CNN to obtain the final denoised frame. To avoid directly aligning the temporally distant frames, we perform the two processes of alignment and fusion in multiple stages. Specifically, at each stage, we perform the denoising process on three consecutive input frames to generate the intermediate denoised frames which are then passed as the input to the next stage. By performing the process in multiple stages, we can effectively utilize the information of neighboring frames without directly aligning the temporally distant frames. We train our multi-stage system using an adversarial loss with a conditional discriminator. Specifically, we condition the discriminator on a soft gradient mask to prevent introducing high-frequency artifacts in smooth regions. We show that our system is able to produce temporally coherent videos with realistic details. Furthermore, we demonstrate through extensive experiments that our approach outperforms state-of-the-art image and video denoising methods both numerically and visually.

翻訳日:2021-03-05 14:50:24 公開日:2021-03-04

# メタモルフィックテストによる重畳生成逆数ネットワークのロバスト性評価

Robustness Evaluation of Stacked Generative Adversarial Networks using Metamorphic Testing ( http://arxiv.org/abs/2103.02870v1 )

ライセンス: Link先を確認

Hyejin Park, Taaha Waseem, Wen Qi Teo, Ying Hwei Low, Mei Kuan Lim and Chun Yong Chong

(参考訳) 自然言語からのフォトリアリスティック画像の合成は、コンピュータビジョンにおいて難しい問題の一つである。過去10年間で、多くのアプローチが提案され、改良されたStackGAN-v2は、入力されたテキスト記述に指定された詳細を反映した高解像度の画像を生成することができることが証明されている。本論文では, StackGAN-v2モデルの堅牢性と耐障害性を評価するために, トレーニングデータの変動を導入した。しかし、GAN(Generative Adversarial Network)の動作原理により、トレーニングデータの修正時にモデルの出力を予測することは困難である。そこで本研究では,様々な予期せぬトレーニングデータセットを用いてモデルのロバスト性を評価するために,メタモルフィックテスト手法を採用する。そこで我々はまずstackgan-v2アルゴリズムを実装し,原著者による事前学習モデルの検証を行い,実験の基礎的真理を明らかにした。次に、テストケースが生成される変成関係を特定します。さらに, 先行試験結果の観察に基づいて, 変成関係を連続的に導出した。その結果,StackGAN-v2モデルの著者やユーザからは報告されていないメインオブジェクトとの重複が最小限に抑えられたとしても,StackGAN-v2アルゴリズムは強迫性オブジェクトによる画像入力に感受性があることが判明した。提案したメタモルフィック関係は、ロバスト性を検証するだけでなく、機械学習モデルによる結果の理解と解釈を支援するために、他のテキスト・画像合成モデルにも適用することができる。

Synthesising photo-realistic images from natural language is one of the challenging problems in computer vision. Over the past decade, a number of approaches have been proposed, of which the improved Stacked Generative Adversarial Network (StackGAN-v2) has proven capable of generating high-resolution images that reflect the details specified in the input text descriptions. In this paper, we aim to assess the robustness and fault-tolerance capability of the StackGAN-v2 model by introducing variations in the training data. However, due to the working principle of the Generative Adversarial Network (GAN), it is difficult to predict the output of the model when the training data are modified. Hence, in this work, we adopt the Metamorphic Testing technique to evaluate the robustness of the model with a variety of unexpected training datasets. As such, we first implement the StackGAN-v2 algorithm and test the pre-trained model provided by the original authors to establish a ground truth for our experiments. We then identify a metamorphic relation, from which test cases are generated. Further metamorphic relations are derived successively based on the observations of prior test results. Finally, we synthesise the results from our experiments on all the metamorphic relations and find that the StackGAN-v2 algorithm is susceptible to input images with obtrusive objects, even if they overlap with the main object minimally, which was not reported by the authors and users of the StackGAN-v2 model. The proposed metamorphic relations can be applied to other text-to-image synthesis models, not only to verify their robustness but also to help researchers understand and interpret the results produced by the machine learning models.

翻訳日:2021-03-05 14:50:06 公開日:2021-03-04

# 深層画像圧縮における潜在表現のためのクロスチャネルコンテキストモデル

A Cross Channel Context Model for Latents in Deep Image Compression ( http://arxiv.org/abs/2103.02884v1 )

ライセンス: Link先を確認

Changyue Ma, Zhao Wang, Ruling Liao, Yan Ye

(参考訳) 本稿では,深層画像圧縮における潜在表現に対するクロスチャネルコンテキストモデルを提案する。一般に、深層画像圧縮はオートエンコーダの枠組みに基づいており、エンコーダで元の画像を潜在表現に変換し、デコーダで量子化された潜在表現から再構成画像を復元する。この変換は通常エントロピーモデルと組み合わされ、算術符号化のために量子化された潜在表現の確率分布を推定する。現在、自己回帰と階層的事前分布を併用したエントロピーモデルが広く採用されており、ハイパー潜在表現からのグローバルコンテキストと、量子化された潜在要素からのローカルコンテキストの両方を捉える。ローカルコンテキストについては、広く採用されている2Dマスク畳み込みは空間コンテキストしか捉えられない。しかし, 潜在表現の異なるチャネル間には強い相関があることが観察される。クロスチャネル相関を利用するため,本手法では,チャネルインデックスに従って潜在表現を複数のグループに分割し,グループを1つずつ符号化する。その際、符号化済みのグループを用いて現在のグループにクロスチャネルコンテキストを与える。提案するクロスチャネルコンテキストモデルは、自己回帰と階層的事前分布を併用したエントロピーモデルと組み合わされる。実験結果によれば、PSNRを歪み指標とした場合、この組み合わせモデルは、KodakデータセットおよびCVPR CLIC2020 professionalデータセットにおいて、ベースラインのエントロピーモデルに対してそれぞれ6.30%と6.31%、最新のビデオ符号化標準であるVersatile Video Coding(VVC)に対してそれぞれ2.50%と2.20%のBDレート削減を達成する。また,MS-SSIMに最適化した場合,本手法は視覚的により好ましい再構成画像を生成する。

This paper presents a cross channel context model for latents in deep image compression. Generally, deep image compression is based on an autoencoder framework, which transforms the original image to latents at the encoder and recovers the reconstructed image from the quantized latents at the decoder. The transform is usually combined with an entropy model, which estimates the probability distribution of the quantized latents for arithmetic coding. Currently, joint autoregressive and hierarchical prior entropy models are widely adopted to capture both the global contexts from the hyper latents and the local contexts from the quantized latent elements. For the local contexts, the widely adopted 2D mask convolution can only capture the spatial context. However, we observe that there are strong correlations between different channels in the latents. To utilize the cross channel correlations, we propose to divide the latents into several groups according to channel index and code the groups one by one, where previously coded groups are utilized to provide cross channel context for the current group. The proposed cross channel context model is combined with the joint autoregressive and hierarchical prior entropy model. Experimental results show that, using PSNR as the distortion metric, the combined model achieves BD-rate reductions of 6.30% and 6.31% over the baseline entropy model, and 2.50% and 2.20% over the latest video coding standard Versatile Video Coding (VVC) for the Kodak and CVPR CLIC2020 professional dataset, respectively. In addition, when optimized for the MS-SSIM metric, our approach generates visually more pleasant reconstructed images.

翻訳日:2021-03-05 14:49:38 公開日:2021-03-04

# 効果的かつ高速: 混合精度量子化のための新しいシーケンシャル・シングルパス探索

Effective and Fast: A Novel Sequential Single Path Search for Mixed-Precision Quantization ( http://arxiv.org/abs/2103.02904v1 )

ライセンス: Link先を確認

Qigong Sun, Licheng Jiao, Yan Ren, Xiufang Li, Fanhua Shang, Fang Liu

(参考訳) モデル量子化はモデルサイズと計算遅延を低減するのに役立つため、携帯電話、組み込みデバイス、スマートチップの多くのアプリケーションでうまく適用されている。混合精度量子化モデルは、異なる層の感度に応じて異なる量子化ビット精度に適合し、優れた性能を達成することができる。しかし、いくつかの制約(ハードウェアリソース、エネルギー消費、モデルサイズ、計算遅延など)に従って、ディープニューラルネットワーク内の各層の量子化ビット精度を迅速に決定することは困難である。この問題に対処するために,提案した制約を損失関数に導入し,探索プロセスを導出する,混合精度量子化のための新しいシーケンシャルシングルパス探索法(SSPS)を提案する。単一の経路探索セルは、勾配に基づくアルゴリズムによって最適化できる完全微分可能なスーパーネットを結合するために使用される。さらに, 探索空間を指数関数的に削減し, 探索過程の収束を高速化するために, 選択条件に従って候補精度を逐次決定する。実験では,異なるアーキテクチャ(例:ResNet-20, 18, 34, 50, MobileNet-V2)とデータセット(例:CIFAR-10, ImageNet, COCO)の混合精度モデルを,特定の制約下で効率的に探索できることを示した。

Since model quantization helps to reduce the model size and computation latency, it has been successfully applied in many applications of mobile phones, embedded devices and smart chips. The mixed-precision quantization model can match different quantization bit-precisions according to the sensitivity of different layers to achieve great performance. However, it is a difficult problem to quickly determine the quantization bit-precision of each layer in deep neural networks according to some constraints (e.g., hardware resources, energy consumption, model size and computation latency). To address this issue, we propose a novel sequential single path search (SSPS) method for mixed-precision quantization, in which the given constraints are introduced into its loss function to guide the search process. A single path search cell is used to combine a fully differentiable supernet, which can be optimized by gradient-based algorithms. Moreover, we sequentially determine the candidate precisions according to the selection certainties to exponentially reduce the search space and speed up the convergence of the search process. Experiments show that our method can efficiently search the mixed-precision models for different architectures (e.g., ResNet-20, 18, 34, 50 and MobileNet-V2) and datasets (e.g., CIFAR-10, ImageNet and COCO) under given constraints, and our experimental results verify that SSPS significantly outperforms uniform-precision counterparts.
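
To make the model-size constraint concrete, here is a minimal sketch of a symmetric uniform quantizer and of the storage cost of a per-layer bit assignment. It does not implement the SSPS search itself; the quantizer form, the `max_abs` clipping range, and the layer sizes in the usage note are illustrative assumptions.

```python
def quantize(value, bits, max_abs=1.0):
    """Symmetric uniform quantization of one weight to `bits` bits (bits >= 2)."""
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits")
    levels = 2 ** (bits - 1) - 1          # positive levels on each side of zero
    step = max_abs / levels               # spacing between adjacent levels
    clipped = max(-max_abs, min(max_abs, value))
    return round(clipped / step) * step


def model_size_bytes(layer_params, layer_bits):
    """Weight storage for a per-layer bit-width assignment, in bytes."""
    return sum(n * b for n, b in zip(layer_params, layer_bits)) / 8
```

For example, two layers with 1000 and 2000 weights stored at 8 and 4 bits take `model_size_bytes([1000, 2000], [8, 4])` bytes, which is the kind of quantity a size constraint in the loss would penalize.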

翻訳日:2021-03-05 14:49:08 公開日:2021-03-04

# 極端k-Spaceアンダーサンプリングと超解像による超高速MRI

Towards Ultrafast MRI via Extreme k-Space Undersampling and Superresolution ( http://arxiv.org/abs/2103.02940v1 )

ライセンス: Link先を確認

Aleksandr Belov and Joel Stadelmann and Sergey Kastryulin and Dmitry V. Dylov

(参考訳) 我々は、オリジナルのfastMRIチャレンジを参照するすべての論文によって報告されたMRI加速因子(k-space undersampling)を下回った後、未解決の画像を補う強力なディープラーニングベースの画像強化方法を検討した。我々は、サンプリングパターン、アンダーサンプリングおよびダウンスケーリング要因、ならびに脳と膝の高速MRIベンチマークの最終的な画像品質に対する回復モデルの影響を徹底的に検討します。復元された画像の品質は他の方法よりも高く、MSEは0.00114、PSNRは29.6 dB、SSIMは0.956 x16加速係数である。 x32およびx64のより極端なアンダーサンプリング因子も検討され、コンピュータ支援手術や放射線計画などの特定の臨床応用を約束する。専門家5名の放射線技師が100対の画像を評価し、回収したサンプル画像が統計的に診断価値を保っていることを示す。

We went below the MRI acceleration factors (a.k.a., k-space undersampling) reported by all published papers that reference the original fastMRI challenge, and then considered powerful deep learning based image enhancement methods to compensate for the underresolved images. We thoroughly study the influence of the sampling patterns, the undersampling and the downscaling factors, as well as the recovery models on the final image quality for both the brain and the knee fastMRI benchmarks. The quality of the reconstructed images surpasses that of the other methods, yielding an MSE of 0.00114, a PSNR of 29.6 dB, and an SSIM of 0.956 at x16 acceleration factor. More extreme undersampling factors of x32 and x64 are also investigated, holding promise for certain clinical applications such as computer-assisted surgery or radiation planning. We survey 5 expert radiologists to assess 100 pairs of images and show that the recovered undersampled images statistically preserve their diagnostic value.
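
The MSE and PSNR figures above are linked by the standard definition of PSNR. The sketch below assumes image intensities normalized to [0, 1] (so the peak value is 1.0), which the abstract does not state. Because such metrics are typically averaged per image, the reported mean PSNR need not equal the PSNR of the mean MSE exactly.

```python
import math


def psnr(mse, max_value=1.0):
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Under that normalization assumption, `psnr(0.00114)` comes out at about 29.4 dB, in the same range as the 29.6 dB quoted in the abstract.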

翻訳日:2021-03-05 14:48:40 公開日:2021-03-04

# 高品質優先学習と劣化学習による知覚的画像復元

Perceptual Image Restoration with High-Quality Priori and Degradation Learning ( http://arxiv.org/abs/2103.03010v1 )

ライセンス: Link先を確認

Chaoyi Han, Yiping Duan, Xiaoming Tao, Jianhua Lu

(参考訳) 知覚画像復元は、特定の画像に劣化する可能性が高い高忠実度画像を求めます。より良い視覚品質のために、以前の研究は生成モデルの潜在空間を利用して自然画像多様体内の解を探すことを提案した。しかし, 遅延埋め込みが先行分布に近い場合にのみ, 生成画像の品質が保証される。本研究では,先行多様体内の実現可能な領域を制限することを提案する。これは、2つの分布のための非パラメトリック計量で達成されます:最大平均差(MMD)。さらに, 劣化過程を直接条件分布としてモデル化する。本モデルは,復元画像と劣化画像の類似度を測定するのに有効であることを示す。劣化画像よりも長く批判された画素距離を最適化する代わりに、高い確率で視覚的快楽画像を見つけるためにそのようなモデルを頼りにしている。同時修復・拡張フレームワークは,実世界の複雑な分解型によく一般化する。 nr-iqa(perceptual quality and no-reference image quality assessment)の実験結果から,本手法の優れた性能を示す。

Perceptual image restoration seeks high-fidelity images that most likely degrade to the given images. For better visual quality, previous work proposed to search for solutions within the natural image manifold, by exploiting the latent space of a generative model. However, the quality of generated images is only guaranteed when the latent embedding lies close to the prior distribution. In this work, we propose to restrict the feasible region within the prior manifold. This is accomplished with a non-parametric metric for two distributions: the Maximum Mean Discrepancy (MMD). Moreover, we model the degradation process directly as a conditional distribution. We show that our model performs well in measuring the similarity between restored and degraded images. Instead of optimizing the long-criticized pixel-wise distance over degraded images, we rely on such a model to find visually pleasing images with high probability. Our simultaneous restoration and enhancement framework generalizes well to real-world complicated degradation types. The experimental results on perceptual quality and no-reference image quality assessment (NR-IQA) demonstrate the superior performance of our method.
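
The Maximum Mean Discrepancy mentioned above compares two sample sets through kernel averages. A minimal sketch of the biased squared-MMD estimate with a Gaussian (RBF) kernel follows; the kernel choice and the `bandwidth` parameter are assumptions for illustration, and how the paper applies MMD to latent embeddings is not reproduced here.

```python
import math


def rbf(x, y, bandwidth=1.0):
    """Gaussian kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD between sample sets `xs` and `ys`."""

    def mean_kernel(a, b):
        return sum(rbf(p, q, bandwidth) for p in a for q in b) / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Two small illustrative sample sets: one near the origin, one near (3, 3).
NEAR_ORIGIN = [(0.0, 0.0), (0.1, -0.1), (-0.2, 0.1)]
FAR_AWAY = [(3.0, 3.0), (3.1, 2.9), (2.8, 3.2)]
```

The estimate is zero when both sets are identical and grows as the sets move apart, which is what makes it usable as a penalty that keeps embeddings close to a prior.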

翻訳日:2021-03-05 14:48:23 公開日:2021-03-04

# TPCN: モーション予測のための時間的ポイントクラウドネットワーク

TPCN: Temporal Point Cloud Networks for Motion Forecasting ( http://arxiv.org/abs/2103.03067v1 )

ライセンス: Link先を確認

Maosheng Ye, Tongyi Cao, Qifeng Chen

(参考訳) 軌道予測のための空間的および時間的学習を併用した新しい柔軟な枠組みであるtemporal point cloud networks (tpcn)を提案する。エージェントをラスタライズしたり,情報を2次元イメージにマッピングしたり,グラフ表現で操作したりする既存のアプローチとは異なり,このアプローチでは,動的時間学習を伴うポイントクラウド学習から,軌道予測を空間的および時間的次元に分割することで,空間的および時間的情報をキャプチャするアイデアを拡張している。空間的次元ではエージェントは無秩序な点集合と見なすことができ、したがってエージェントの位置をモデル化するためにポイントクラウド学習技術を適用することは容易である。空間次元は運動的・運動的情報を考慮しないが,エージェントの時間的動きをモデル化するための動的時間学習も提案する。 Argoverse運動予測ベンチマークの実験は、私たちのアプローチが最先端の結果を達成することを示しています。

We propose the Temporal Point Cloud Networks (TPCN), a novel and flexible framework with joint spatial and temporal learning for trajectory prediction. Unlike existing approaches that rasterize agents and map information as 2D images or operate in a graph representation, our approach extends ideas from point cloud learning with dynamic temporal learning to capture both spatial and temporal information by splitting trajectory prediction into both spatial and temporal dimensions. In the spatial dimension, agents can be viewed as an unordered point set, and thus it is straightforward to apply point cloud learning techniques to model agents' locations. While the spatial dimension does not take kinematic and motion information into account, we further propose dynamic temporal learning to model agents' motion over time. Experiments on the Argoverse motion forecasting benchmark show that our approach achieves state-of-the-art results.

翻訳日:2021-03-05 14:48:09 公開日:2021-03-04

# 大規模ビデオ圧縮センシングのためのメモリ効率ネットワーク

Memory-Efficient Network for Large-scale Video Compressive Sensing ( http://arxiv.org/abs/2103.03089v1 )

ライセンス: Link先を確認

Ziheng Cheng, Bo Chen, Guanliang Liu, Hao Zhang, Ruiying Lu, Zhengjue Wang, Xin Yuan

(参考訳) video snapshot compressive imaging (sci) は、2d検出器を使って1つのショットで一連のビデオフレームをキャプチャする。基本原理は、1つの露光時間の間に異なるマスクを高速シーンに課して圧縮測定を行うというものである。マスクの知識により、このスナップショット測定から所望の高速映像フレームを再構成するために最適化アルゴリズムやディープラーニング手法が用いられる。残念ながら、これらの手法は良好な結果が得られるが、最適化アルゴリズムの長い実行時間やディープネットワークの巨大なトレーニングメモリ占有は、実用上のアプリケーションではそれらを妨げている。本稿では,マルチグループ可逆3次元畳み込みニューラルネットワークに基づく大規模映像SCIのためのメモリ効率の良いネットワークを開発する。グレースケールSCIシステムの基本モデルに加えて、我々はバイエル測定からカラービデオを直接回復するために、復号化とSCI再構築を組み合わせるためにさらに一歩進んでいます。 SCIカメラが捉えたシミュレーションと実データの両方の大規模な結果から,提案したモデルは,メモリの少ない従来モデルよりも優れており,大規模な問題に利用できることを示す。コードはhttps://github.com/BoChenGroup/RevSCI-netにある。

Video snapshot compressive imaging (SCI) captures a sequence of video frames in a single shot using a 2D detector. The underlying principle is that during one exposure time, different masks are imposed on the high-speed scene to form a compressed measurement. With the knowledge of masks, optimization algorithms or deep learning methods are employed to reconstruct the desired high-speed video frames from this snapshot measurement. Unfortunately, though these methods can achieve decent results, the long running time of optimization algorithms or huge training memory occupation of deep networks still preclude their use in practical applications. In this paper, we develop a memory-efficient network for large-scale video SCI based on multi-group reversible 3D convolutional neural networks. In addition to the basic model for the grayscale SCI system, we take one step further to combine demosaicing and SCI reconstruction to directly recover color video from Bayer measurements. Extensive results on both simulation and real data captured by SCI cameras demonstrate that our proposed model outperforms the previous state of the art with less memory and thus can be used in large-scale problems. The code is at https://github.com/BoChenGroup/RevSCI-net.
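
The measurement principle described above (a different mask per frame, all frames summed into one exposure) can be sketched as a forward model. This is the commonly used noise-free formulation, measurement = sum over frames of mask times frame, written here with nested lists; it is an assumption that the paper uses exactly this form, and the reconstruction network is not shown.

```python
def sci_measurement(frames, masks):
    """Collapse T masked frames into one 2D snapshot.

    `frames` and `masks` are lists of T two-dimensional lists of equal shape.
    """
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [
            sum(mask[r][c] * frame[r][c] for frame, mask in zip(frames, masks))
            for c in range(width)
        ]
        for r in range(height)
    ]


# Two 2x2 frames with complementary binary masks (illustrative values).
FRAMES = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
MASKS = [[[1, 0], [0, 1]], [[0, 1], [1, 0]]]
```

Reconstruction then amounts to inverting this many-to-one mapping, which is why knowledge of the masks is required.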

翻訳日:2021-03-05 14:47:53 公開日:2021-03-04

# STEP:安全なオフロードナビゲーションのための確率的トラバーサビリティ評価と計画

STEP: Stochastic Traversability Evaluation and Planning for Safe Off-road Navigation ( http://arxiv.org/abs/2103.02828v1 )

ライセンス: Link先を確認

David D. Fan, Kyohei Otsu, Yuki Kubo, Anushri Dixit, Joel Burdick, and Ali-Akbar Agha-Mohammadi

(参考訳) 地上ロボットの自律性は、構造化および制御された環境で広く使用されているが、未知およびオフロード地形での自律性は依然として難しい問題である。未開発の荒野、洞窟、瓦礫など、極端、オフロード、非構造的な環境は、自律的なナビゲーションに独特で困難な問題を引き起こす。これらの課題に対処するために, トラバーサビリティの評価と, 安全で実現可能な高速な軌道をリアルタイムに計画する手法を提案する。STEP (Stochastic Traversability Evaluation and Planning) と名づけた我々のアプローチは, 1) 高速な不確実性考慮マッピングとトラバーサビリティ評価, 2) 条件付きバリュー・アット・リスク(CVaR)を用いたテールリスク評価, 3) 逐次二次計画法(SQP)に基づくモデル予測制御(MPC)を用いた、リスクと制約を考慮した効率的なキノダイナミック運動計画に基づいている。本手法をシミュレーションで解析し,地下溶岩管を含む極端な地形を探索する車輪型および脚型ロボットプラットフォーム上での有効性を検証する。

Although ground robotic autonomy has gained widespread usage in structured and controlled environments, autonomy in unknown and off-road terrain remains a difficult problem. Extreme, off-road, and unstructured environments such as undeveloped wilderness, caves, and rubble pose unique and challenging problems for autonomous navigation. To tackle these problems we propose an approach for assessing traversability and planning a safe, feasible, and fast trajectory in real-time. Our approach, which we name STEP (Stochastic Traversability Evaluation and Planning), relies on: 1) rapid uncertainty-aware mapping and traversability evaluation, 2) tail risk assessment using the Conditional Value-at-Risk (CVaR), and 3) efficient risk and constraint-aware kinodynamic motion planning using sequential quadratic programming-based (SQP) model predictive control (MPC). We analyze our method in simulation and validate its efficacy on wheeled and legged robotic platforms exploring extreme terrains including an underground lava tube.
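
The Conditional Value-at-Risk used for tail risk assessment is the expected cost in the worst `(1 - alpha)` fraction of outcomes. A minimal empirical estimate from cost samples is sketched below; treating traversability cost as a list of sampled scalars is an assumption for illustration, not a description of the STEP pipeline.

```python
import math


def cvar(costs, alpha):
    """Empirical CVaR: mean of the worst (1 - alpha) fraction of sampled costs."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    ordered = sorted(costs, reverse=True)
    # Rounding guards against floating-point noise such as 1.9999999999999996.
    tail = max(1, math.ceil(round((1.0 - alpha) * len(ordered), 9)))
    worst = ordered[:tail]
    return sum(worst) / len(worst)
```

With `alpha = 0` this reduces to the plain mean, and as `alpha` approaches 1 it approaches the single worst sample, so `alpha` acts as a dial between risk-neutral and worst-case planning.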

翻訳日:2021-03-05 14:47:34 公開日:2021-03-04

# サイバーフィジカルシステムを保護するためのRLに基づく適応検出戦略

An RL-Based Adaptive Detection Strategy to Secure Cyber-Physical Systems ( http://arxiv.org/abs/2103.02872v1 )

ライセンス: Link先を確認

Ipsita Koley, Sunandan Adhikary and Soumyajit Dey

(参考訳) ネットワークへの依存が高まり、ソフトウェアベースの制御はサイバーフィジカルシステム(CPS)の脆弱性を増大させました。動的システム理論を利用して開発された検出・監視コンポーネントは、安全上の重要なCPSを偽データ注入攻撃から保護するための軽量なセキュリティ対策としてしばしば用いられる。しかし、既存のアプローチは攻撃シナリオと検出システムのパラメータを関連付けていない。本研究では,攻撃シナリオから学んだ経験に基づいて,これらの検出器のパラメータを適応的に設定し,検出率を最大化し,制御動作を保ちながらプロセス中の誤報を最小化する強化学習(rl)フレームワークを提案する。

Increased dependence on networked, software-based control has escalated the vulnerabilities of Cyber-Physical Systems (CPSs). Detection and monitoring components developed leveraging dynamical systems theory are often employed as lightweight security measures for protecting such safety-critical CPSs against false data injection attacks. However, existing approaches do not correlate attack scenarios with parameters of detection systems. In the present work, we propose a Reinforcement Learning (RL) based framework which adaptively sets the parameters of such detectors based on experience learned from attack scenarios, maximizing detection rate and minimizing false alarms in the process while attempting performance-preserving control actions.

翻訳日:2021-03-05 14:46:56 公開日:2021-03-04

# 差分プライベート深層学習における$\epsilon$の選択と監査のための識別可能性の定量化

Quantifying identifiability to choose and audit $\epsilon$ in differentially private deep learning ( http://arxiv.org/abs/2103.02913v1 )

ライセンス: Link先を確認

Daniel Bernau, Günther Eibl, Philip W. Grassal, Hannah Keller, Florian Kerschbaum

(参考訳) 差分プライバシーにより、トレーニングデータレコードが機械学習モデルに与える影響を制限できる。機械学習で差分プライバシーを使用するには、データサイエンティストがプライバシパラメータ$(\epsilon,\delta)$を選択する必要がある。弱いプライバシパラメータでトレーニングされたモデルは過剰なプライバシ漏洩を引き起こす可能性があり、強いプライバシパラメータはモデルの有用性を過度に低下させる可能性があるため、有意義なプライバシパラメータを選択することが重要である。しかし,プライバシパラメータの値は2つの主な理由から選択が難しい。第一に、選択された感度と実際のデータセットのデータ分布によっては、プライバシ損失の上界$(\epsilon,\delta)$は緩い可能性がある。第二に、匿名化の法的要件と社会的規範はしばしば個人の識別可能性に言及するが、$(\epsilon,\delta)$はそれと間接的にしか関係しない。先行研究は$(\epsilon,\delta)$の選択を導くためにメンバーシップ推論の攻撃者を提案した。しかし、これらの攻撃者は差分プライバシーが想定する攻撃者よりも弱く、$(\epsilon,\delta)$で定義されるプライバシ損失の上界に経験的に到達できない。したがって、メンバーシップ推論攻撃のいかなる定量化も、$(\epsilon,\delta)$が持つ正確な意味を持たない。我々は$(\epsilon,\delta)$を、トレーニングデータセットにおける任意のレコードの存在に関して、差分プライバシーが想定する攻撃者のベイズ事後信念の上界に変換する。この上界は合成の下での多次元クエリに対して成り立ち、実際にタイトになりうることを示す。さらに, 識別可能性の上界を導出し, 差分プライバシーで想定される攻撃者と、メンバーシップ推論の攻撃者に関する先行研究とを関連付ける。我々は、データサイエンティストがモデルトレーニングを監査し、経験的識別可能性スコアと経験的$(\epsilon,\delta)$を計算することを可能にする、この差分プライバシーの攻撃者の実装を定式化する。

Differential privacy allows bounding the influence that training data records have on a machine learning model. To use differential privacy in machine learning, data scientists must choose privacy parameters $(\epsilon,\delta)$. Choosing meaningful privacy parameters is key since models trained with weak privacy parameters might result in excessive privacy leakage, while strong privacy parameters might overly degrade model utility. However, privacy parameter values are difficult to choose for two main reasons. First, the upper bound on privacy loss $(\epsilon,\delta)$ might be loose, depending on the chosen sensitivity and data distribution of practical datasets. Second, legal requirements and societal norms for anonymization often refer to individual identifiability, to which $(\epsilon,\delta)$ are only indirectly related. Prior work has proposed membership inference adversaries to guide the choice of $(\epsilon,\delta)$. However, these adversaries are weaker than the adversary assumed by differential privacy and cannot empirically reach the upper bounds on privacy loss defined by $(\epsilon,\delta)$. Therefore, no quantification of a membership inference attack holds the exact meaning that $(\epsilon,\delta)$ does. We transform $(\epsilon,\delta)$ to a bound on the Bayesian posterior belief of the adversary assumed by differential privacy concerning the presence of any record in the training dataset. The bound holds for multidimensional queries under composition, and we show that it can be tight in practice. Furthermore, we derive an identifiability bound, which relates the adversary assumed in differential privacy to previous work on membership inference adversaries. We formulate an implementation of this differential privacy adversary that allows data scientists to audit model training and compute empirical identifiability scores and empirical $(\epsilon,\delta)$.
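
The idea of turning a privacy parameter into a bound on an adversary's posterior belief can be illustrated with the textbook special case of pure $\epsilon$-differential privacy. The sketch below is not the paper's bound, which also covers $\delta$ and composition. It assumes $\delta = 0$, a single released output, and an adversary deciding between two neighbouring datasets; the function name `posterior_belief_bound` and the `prior` parameter are hypothetical names chosen for illustration.

```python
import math


def posterior_belief_bound(epsilon: float, prior: float = 0.5) -> float:
    """Upper bound on an adversary's posterior belief that a record is present.

    Assumption: pure epsilon-DP (delta = 0) and one released output.
    """
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    # Bayes' rule: posterior = 1 / (1 + ((1 - prior) / prior) * likelihood_ratio),
    # and epsilon-DP guarantees likelihood_ratio >= exp(-epsilon).
    odds_against = (1.0 - prior) / prior
    return 1.0 / (1.0 + odds_against * math.exp(-epsilon))


# Print the bound for a few illustrative epsilon values under a uniform prior.
for eps in (0.1, 1.0, 5.0):
    print(f"epsilon={eps}: belief <= {posterior_belief_bound(eps):.3f}")
```

Under these assumptions the bound equals the prior at $\epsilon = 0$ and approaches 1 as $\epsilon$ grows, which is one way to read a given $\epsilon$ as an identifiability level.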

翻訳日:2021-03-05 14:46:45 公開日:2021-03-04

# M5コンペティションデータの代表性を探る

Exploring the representativeness of the M5 competition data ( http://arxiv.org/abs/2103.02941v1 )

ライセンス: Link先を確認

Evangelos Theodorou, Shengjie Wang, Yanfei Kang, Evangelos Spiliotis, Spyros Makridakis, Vassilios Assimakopoulos

(参考訳) Walmartの階層的な単位販売数の予測に焦点を当てたM5コンペティションの主な目的は、この分野における予測手法の精度と不確実性を評価し、ベストプラクティスを特定し、その実践的意義を強調することであった。しかし、M5コンペティションの成果を一般化し、小売業者が意思決定と運用をより良く支援するために活用できるかどうかは、M5データが現実をどの程度代表しているか、すなわち、異なる地域で活動し、異なる種類の製品を販売し、異なるマーケティング戦略を検討する小売業者の単位販売データを十分に表現しているかによって決まる。この問いに答えるために、我々はM5時系列の特徴を分析し、特徴空間を用いて、2つの食料品小売業者、すなわちCorporación Favoritaとギリシャの大手スーパーマーケットチェーンの時系列の特徴と比較した。その結果、調査したデータセット間の差異は小さく、M5データの代表性が支持されることが示唆された。

The main objective of the M5 competition, which focused on forecasting the hierarchical unit sales of Walmart, was to evaluate the accuracy and uncertainty of forecasting methods in the field in order to identify best practices and highlight their practical implications. However, whether the findings of the M5 competition can be generalized and exploited by retail firms to better support their decisions and operations depends on the extent to which the M5 data are representative of reality, i.e., sufficiently represent the unit sales data of retailers that operate in different regions, sell different types of products, and consider different marketing strategies. To answer this question, we analyze the characteristics of the M5 time series and compare them with those of two grocery retailers, namely Corporación Favorita and a major Greek supermarket chain, using feature spaces. Our results suggest that there are only small discrepancies between the examined data sets, supporting the representativeness of the M5 data.

翻訳日:2021-03-05 14:46:09 公開日:2021-03-04

# ロバスト表現のための深層グラフ構造学習:調査

Deep Graph Structure Learning for Robust Representations: A Survey ( http://arxiv.org/abs/2103.03036v1 )

ライセンス: Link先を確認

Yanqiao Zhu, Weizhi Xu, Jinghao Zhang, Qiang Liu, Shu Wu, Liang Wang

(参考訳) グラフニューラルネットワーク(GNN)は、グラフ構造化データの解析に広く利用されている。ほとんどのGNN法はグラフ構造の品質に非常に敏感であり、通常は情報埋め込みを学ぶのに完璧なグラフ構造を必要とする。しかし、グラフにおける雑音の広がりは実世界の問題に対するロバスト表現の学習を必要とする。 GNNモデルの堅牢性を改善するために、最適化されたグラフ構造と対応する表現を共同学習することを目的としたグラフ構造学習(GSL)の中心概念を中心に多くの研究が提案されている。本研究の目的は, 頑健な表現を学習するための GSL 手法の最近の進歩を概観することである。具体的には、まずGSLの一般的なパラダイムを定式化し、次に、GSLの考え方を他のグラフタスクに組み込んだアプリケーションで、グラフ構造のモデル化方法によって分類された最先端の手法をレビューする。最後に,本研究の問題点を指摘し,今後の方向性について論じる。

Graph Neural Networks (GNNs) are widely used for analyzing graph-structured data. Most GNN methods are highly sensitive to the quality of graph structures and usually require a perfect graph structure for learning informative embeddings. However, the pervasiveness of noise in graphs necessitates learning robust representations for real-world problems. To improve the robustness of GNN models, many studies have been proposed around the central concept of Graph Structure Learning (GSL), which aims to jointly learn an optimized graph structure and corresponding representations. Towards this end, in the presented survey, we broadly review recent progress of GSL methods for learning robust representations. Specifically, we first formulate a general paradigm of GSL, and then review state-of-the-art methods classified by how they model graph structures, followed by applications that incorporate the idea of GSL in other graph tasks. Finally, we point out some issues in current studies and discuss future directions.

翻訳日:2021-03-05 14:45:51 公開日:2021-03-04

# 自律車いすプラットフォームにおける人間のナビゲーション意図の視線一致復号

Gaze-contingent decoding of human navigation intention on an autonomous wheelchair platform ( http://arxiv.org/abs/2103.03072v1 )

ライセンス: Link先を確認

Mahendran Subramanian, Suhyung Park, Pavel Orlov, Ali Shafti, A. Aldo Faisal

(参考訳) 我々は、ユーザが環境をどのように見ているかをデコードしてモビリティデバイスをどこへ移動させたいかを理解することで、モビリティプラットフォームを制御するWhere-You-Look-Is Where-You-Goアプローチを開拓してきた。しかし、多くの自然な眼球運動は行動意図のデコードとは無関係であり、関係するのは一部だけである。これがデコードを難しくしており、Midas Touch問題と呼ばれる。本稿では、1. ユーザが視野内のどの物体を見ているのかを理解する深層コンピュータビジョン、2. ユーザが物体のバウンディングボックス上のどこを見ているのかの分析、3. その物体への顕在的な視覚的注意がその物体へのナビゲーション意図を予測するかどうかを判定する単純な機械学習分類器、からなる新しい解決策を提示する。我々のデコードシステムは最終的に、ユーザが例えばドアへ移動したいのか、単にそれを見ているだけなのかを判定する。重要なのは、ユーザが物体を見て、それに向かって動いていると想像すると、この運動イメージ(神経インタフェースと同様)から生じる眼球運動がデコード可能なままであることだ。運転意図とその位置が検出されると、本システムは自律車いすプラットフォームであるA.Eye-Driveに、静的および移動する障害物を避けながら目的の物体へ移動するよう指示する。したがって、ナビゲーションのために、車いすを目標へ継続的に操舵すること(低レベルのヒューマンインタフェース)ではなく、目的の目標と認知的にインタラクションすることのみをユーザに求める認知レベルのヒューマンインタフェースを実現した。

We have pioneered the Where-You-Look-Is Where-You-Go approach to controlling mobility platforms by decoding how the user looks at the environment to understand where they want to navigate their mobility device. However, many natural eye movements are not relevant for decoding action intention; only some are, which makes decoding challenging, the so-called Midas Touch Problem. Here, we present a new solution consisting of 1. deep computer vision to understand what object a user is looking at in their field of view, 2. an analysis of where on the object's bounding box the user is looking, and 3. a simple machine learning classifier to determine whether the overt visual attention on the object is predictive of a navigation intention to that object. Our decoding system ultimately determines whether the user wants to drive to, e.g., a door or is just looking at it. Crucially, we find that when users look at an object and imagine they are moving towards it, the resulting eye movements from this motor imagery (akin to neural interfaces) remain decodable. Once a driving intention, and thus also the location, is detected, our system instructs our autonomous wheelchair platform, the A.Eye-Drive, to navigate to the desired object while avoiding static and moving obstacles. Thus, for navigation purposes, we have realised a cognitive-level human interface, as it requires the user only to cognitively interact with the desired goal, not to continuously steer their wheelchair to the target (low-level human interfacing).

翻訳日:2021-03-05 14:45:37 公開日:2021-03-04

# Gradient-Guided Dynamic Efficient Adversarial Training

Gradient-Guided Dynamic Efficient Adversarial Training ( http://arxiv.org/abs/2103.03076v1 )

ライセンス: Link先を確認

Fu Wang, Yanghao Zhang, Yanbin Zheng, Wenjie Ruan

(参考訳) 敵対的訓練は、強力な敵対的攻撃に耐えうる頑健なディープニューラルネットワークを訓練する上で、効果的だが時間のかかる方法である。この非効率性への対応として、訓練中に敵対的反復回数を徐々に増加させる動的効率的敵対的訓練(DEAT)を提案する。さらに、与えられたネットワークのLipschitz定数の下界と、敵対的サンプルに対する偏微分の大きさとの関係を理論的に明らかにする。この理論的知見に基づき、勾配の大きさを利用して敵対的訓練の有効性を定量化し、訓練手順を調整するタイミングを決定する。この大きさに基づく戦略は計算コストが低く、実装も容易である。DEATに特に適しており、幅広い敵対的訓練手法にも移植できる。事後調査によれば、効率的な敵対的訓練を実現するには、訓練用の敵対的サンプルの品質を一定の水準に維持することが不可欠であり、これは今後の研究に示唆を与える可能性がある。

Adversarial training is arguably an effective but time-consuming way to train robust deep neural networks that can withstand strong adversarial attacks. As a response to this inefficiency, we propose Dynamic Efficient Adversarial Training (DEAT), which gradually increases the number of adversarial iterations during training. Moreover, we theoretically reveal the connection between the lower bound of the Lipschitz constant of a given network and the magnitude of its partial derivative with respect to adversarial examples. Supported by this theoretical finding, we utilize the gradient's magnitude to quantify the effectiveness of adversarial training and determine the timing to adjust the training procedure. This magnitude-based strategy is computationally friendly and easy to implement. It is especially suited for DEAT and can also be transplanted into a wide range of adversarial training methods. Our post-investigation suggests that maintaining the quality of the training adversarial examples at a certain level is essential to achieve efficient adversarial training, which may shed some light on future studies.

翻訳日:2021-03-05 14:45:08 公開日:2021-03-04

# ILoSA: 剛性とアトラクタのインタラクティブな学習

ILoSA: Interactive Learning of Stiffness and Attractors ( http://arxiv.org/abs/2103.03099v1 )

ライセンス: Link先を確認

Giovanni Franzese, Anna Mészáros, Luka Peternel, and Jens Kober

(参考訳) ロボットに私たちの好みに応じて力を適用する方法を教えることは、複数のエンジニアリングの観点から取り組まなければならないオープンチャレンジです。本稿では,ユーザフレンドリーなインタフェースを用いて,人間の実演や修正からデカルト剛性とアトラクタの両方を学習できる可変インピーダンスポリシーの学習法について検討する。 ILoSAと呼ばれるこのフレームワークは、政策学習、不確実性領域の特定、インタラクティブな修正、剛性変調、アクティブ障害拒否を可能にするためにガウスプロセスを使用している。フランカ・エミカ・パンダにおいて,(1)プラグの取り外し成功時に突然の力の不連続が発生するプラグを引っ張る,(2)ロボットの動作を維持するために持続的な力を必要とする箱を押す,(3)力が移動方向に垂直に作用するホワイトボードを拭く,という3つの異なる力相互作用特性を有する実験的な評価を行った。

Teaching robots how to apply forces according to our preferences is still an open challenge that has to be tackled from multiple engineering perspectives. This paper studies how to learn variable impedance policies where both the Cartesian stiffness and the attractor can be learned from human demonstrations and corrections with a user-friendly interface. The presented framework, named ILoSA, uses Gaussian Processes for policy learning, identifying regions of uncertainty and allowing interactive corrections, stiffness modulation and active disturbance rejection. The experimental evaluation of the framework is carried out on a Franka-Emika Panda in three separate cases with unique force interaction properties: 1) pulling a plug wherein a sudden force discontinuity occurs upon successful removal of the plug, 2) pushing a box where a sustained force is required to keep the robot in motion, and 3) wiping a whiteboard in which the force is applied perpendicular to the direction of movement.

翻訳日:2021-03-05 14:44:50 公開日:2021-03-04

# Weisfeiler and Lehman Go Topological: Message Passing Simplicial Networks

Weisfeiler and Lehman Go Topological: Message Passing Simplicial Networks ( http://arxiv.org/abs/2103.03212v1 )

ライセンス: Link先を確認

Cristian Bodnar, Fabrizio Frasca, Yu Guang Wang, Nina Otter, Guido Montúfar, Pietro Liò, Michael Bronstein

(参考訳) グラフ機械学習のペアワイズ相互作用パラダイムは、関係システムのモデリングを主に支配してきた。しかし、グラフだけでは多くの複雑なシステムに存在するマルチレベルの相互作用を捉えることができず、そのような手法の表現力は限定的であることが証明されている。これらの制限を克服するために、グラフを高次元に一般化する位相的オブジェクトである単体的複体(SC)上でメッセージパッシングを行うモデルのクラスであるMessage Passing Simplicial Networks (MPSNs)を提案する。モデルの表現力を理論的に解析するために、非同型なSCを識別するためのSimplicial Weisfeiler-Lehman(SWL)彩色手続きを導入する。SWLの能力を非同型グラフの識別問題と関連づけ、SWLとMPSNがWLテストよりも厳密に強力であり、3-WLテストに劣らないことを示す。さらに、表現可能な関数の線形領域の数の観点から、ReLU活性化を用いた従来のグラフニューラルネットワークと我々のモデルを比較することで分析を深める。MPSNがGNNでは識別できない難しい強正則グラフを識別でき、向き同変層を備えた場合にはGNNベースラインと比較して有向SCの分類精度を向上できることを示すことで、理論的主張を実証的に裏付ける。加えて、単体的複体上のメッセージパッシングのためのライブラリを実装しており、いずれ公開する予定である。

The pairwise interaction paradigm of graph machine learning has predominantly governed the modelling of relational systems. However, graphs alone cannot capture the multi-level interactions present in many complex systems and the expressive power of such schemes was proven to be limited. To overcome these limitations, we propose Message Passing Simplicial Networks (MPSNs), a class of models that perform message passing on simplicial complexes (SCs) - topological objects generalising graphs to higher dimensions. To theoretically analyse the expressivity of our model we introduce a Simplicial Weisfeiler-Lehman (SWL) colouring procedure for distinguishing non-isomorphic SCs. We relate the power of SWL to the problem of distinguishing non-isomorphic graphs and show that SWL and MPSNs are strictly more powerful than the WL test and not less powerful than the 3-WL test. We deepen the analysis by comparing our model with traditional graph neural networks with ReLU activations in terms of the number of linear regions of the functions they can represent. We empirically support our theoretical claims by showing that MPSNs can distinguish challenging strongly regular graphs for which GNNs fail and, when equipped with orientation equivariant layers, they can improve classification accuracy in oriented SCs compared to a GNN baseline. Additionally, we implement a library for message passing on simplicial complexes that we envision to release in due course.
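
As background for the WL test that the abstract uses as its baseline, the following is a minimal sketch of standard 1-WL colour refinement on plain graphs. It is not the paper's SWL procedure and not its library. The function name `wl_colour_histogram`, the fixed number of rounds, and the example graphs are assumptions chosen for illustration.

```python
from collections import Counter


def wl_colour_histogram(adjacency, rounds=3):
    """Run 1-WL colour refinement and return the multiset of final colours."""
    colours = {node: 0 for node in adjacency}  # uniform initial colouring
    for _ in range(rounds):
        # New colour = own colour plus the sorted multiset of neighbour colours.
        colours = {
            node: (colours[node], tuple(sorted(colours[nb] for nb in adjacency[node])))
            for node in adjacency
        }
    return Counter(colours.values())


# Two non-isomorphic 2-regular graphs on six nodes (illustrative inputs).
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
# A pair that differs in degree sequence, for contrast.
path = {0: [1], 1: [0, 2], 2: [1]}
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

print(wl_colour_histogram(hexagon) == wl_colour_histogram(two_triangles))
print(wl_colour_histogram(path) == wl_colour_histogram(triangle))
```

The six-cycle and the two disjoint triangles receive identical colour histograms because every node sees the same neighbourhood at every round, so 1-WL cannot separate them. This is a simpler instance of the kind of failure on regular graphs that motivates looking beyond pairwise message passing.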

翻訳日:2021-03-05 14:44:31 公開日:2021-03-04

# GenoML: ゲノムのための自動機械学習

GenoML: Automated Machine Learning for Genomics ( http://arxiv.org/abs/2103.03221v1 )

ライセンス: Link先を確認

Mary B. Makarious, Hampton L. Leonard, Dan Vitale, Hirotaka Iwaki, David Saffo, Lana Sargent, Anant Dadu, Eduardo Salmerón Castaño, John F. Carter, Melina Maleknia, Juan A. Botia, Cornelis Blauwendraat, Roy H. Campbell, Sayed Hadi Hashemi, Andrew B. Singleton, Mike A. Nalls, Faraz Faghri

(参考訳) GenoMLは、ゲノミクス(遺伝学とマルチオミクス)のための機械学習ワークフローを自動化するPythonパッケージである。ゲノムデータには、データのクリーン化、前処理、調和、および品質管理を行うための重要なドメイン専門知識が必要です。さらに、チューニング、検証、および解釈には、基礎となるデータ収集、プロトコル、および技術の生物学および潜在的に制限を考慮する必要があります。 GenoMLの使命は、完全な開発、評価、および展開プロセスを自動化する使いやすいツールを開発し、ゲノム学と臨床データの機械学習を非専門家にもたらすことです。オープンサイエンスを重視して、ワークフローを科学コミュニティ内で簡単にアクセス、複製、転送できるようにします。ソースコードとドキュメントはhttps://genoml.comで入手できる。

GenoML is a Python package automating machine learning workflows for genomics (genetics and multi-omics) with an open science philosophy. Genomics data require significant domain expertise to clean, pre-process, harmonize and perform quality control of the data. Furthermore, tuning, validation, and interpretation involve taking into account the biology and possibly the limitations of the underlying data collection, protocols, and technology. GenoML's mission is to bring machine learning for genomics and clinical data to non-experts by developing an easy-to-use tool that automates the full development, evaluation, and deployment process. Emphasis is put on open science to make workflows easily accessible, replicable, and transferable within the scientific community. Source code and documentation are available at https://genoml.com.

翻訳日:2021-03-05 14:44:06 公開日:2021-03-04

# ガウス過程のための小さなサンプル空間

Small Sample Spaces for Gaussian Processes ( http://arxiv.org/abs/2103.03169v1 )

ライセンス: Link先を確認

Toni Karvonen

(参考訳) ガウス過程$X$のサンプルが与えられた再生核ヒルベルト空間(RKHS)に属するかどうかは、ある核支配(nuclear dominance)条件によって決まることが知られている。しかし、サンプルを含む「小さな」関数の集合(必ずしもベクトル空間ではない)をどのように特定するかはあまり明らかでない。本稿では、そのような集合を特定するための一般的なアプローチを示す。ヒルベルトスケールの一般化と見なせるスケールされたRKHSを用いて、スケールされたRKHSの集まりによって誘導される$\sigma$-代数において、$X$の法則の下で全測度を持つすべての要素に含まれる最大の集合として、サンプルサポート集合を定義する。この潜在的に非可測な集合は、$X$の共分散核のRKHSの正規直交基底で展開でき、かつ平方基底係数が0と無限大から離れて有界であるような関数から成ることが示される。これはKarhunen-Loèveの定理が示唆する結果である。

It is known that the membership in a given reproducing kernel Hilbert space (RKHS) of the samples of a Gaussian process $X$ is controlled by a certain nuclear dominance condition. However, it is less clear how to identify a "small" set of functions (not necessarily a vector space) that contains the samples. This article presents a general approach for identifying such sets. We use scaled RKHSs, which can be viewed as a generalisation of Hilbert scales, to define the sample support set as the largest set which is contained in every element of full measure under the law of $X$ in the $\sigma$-algebra induced by the collection of scaled RKHSs. This potentially non-measurable set is then shown to consist of those functions that can be expanded in terms of an orthonormal basis of the RKHS of the covariance kernel of $X$ and have their squared basis coefficients bounded away from zero and infinity, a result suggested by the Karhunen-Loève theorem.

翻訳日:2021-03-05 14:43:23 公開日:2021-03-04

# 背景音楽と混合した放送データを用いたニューラルテキスト音声モデル

A Neural Text-to-Speech Model Utilizing Broadcast Data Mixed with Background Music ( http://arxiv.org/abs/2103.03049v1 )

ライセンス: Link先を確認

Hanbin Bae, Jae-Sung Bae, Young-Sun Joo, Young-Ik Kim, Hoon-Young Cho

(参考訳) 近年、インターネットやYouTubeなどの様々なメディアから音声データを得ることは容易になっているが、それらをニューラルtext-to-speech(TTS)モデルの学習に直接利用することは難しい。クリーンな音声の割合は不十分であり、残りは背景音楽を含んでいる。global style token (GST)でさえも。そこで本研究では、限られた放送データでエンドツーエンドのTTSモデルをうまく学習する手法を提案する。まず、音楽フィルタを導入することで、音声から背景音楽を除去する。次に、補助的な品質分類器を備えたGST-TTSモデルを、フィルタ処理された音声と少量のクリーンな音声で学習する。特に、品質分類器は、GST層の埋め込みベクトルが入力音声の音声品質(フィルタ処理済みかクリーンか)の表現に集中するようにする。実験により、提案手法は従来手法よりもはるかに高品質な音声を合成することを確認した。

Recently, it has become easier to obtain speech data from various media such as the internet or YouTube, but directly utilizing them to train a neural text-to-speech (TTS) model is difficult. The proportion of clean speech is insufficient and the remainder includes background music. Even with the global style token (GST). Therefore, we propose the following method to successfully train an end-to-end TTS model with limited broadcast data. First, the background music is removed from the speech by introducing a music filter. Second, the GST-TTS model with an auxiliary quality classifier is trained with the filtered speech and a small amount of clean speech. In particular, the quality classifier makes the embedding vector of the GST layer focus on representing the speech quality (filtered or clean) of the input speech. The experimental results verified that the proposed method synthesized much higher-quality speech than conventional methods.

翻訳日:2021-03-05 14:42:32 公開日:2021-03-04

# One for One, or All for All: フェデレーション学習におけるコラボレーションの平衡と最適性

One for One, or All for All: Equilibria and Optimality of Collaboration in Federated Learning ( http://arxiv.org/abs/2103.03228v1 )

ライセンス: Link先を確認

Avrim Blum, Nika Haghtalab, Richard Lanas Phillips, Han Shao

(参考訳) 近年、フェデレーション学習は、多数の学習エージェントにまたがるコラボレーションを実現するアプローチとして受け入れられている。しかし、こうしたコラボレーションを維持するために、共同学習へ個々のリソースを割り当てる際に、コラボレーションプロトコルがエージェントのインセンティブをどのように考慮すべきかについては、ほとんど知られていない。本論文では、ゲーム理論の概念に着想を得て、フェデレーション学習におけるインセンティブを考慮した学習とデータ共有のためのフレームワークを導入する。我々の安定均衡と無羨望(envy-free)均衡は、自身のサンプル収集の負担を低く抑えながら学習目標を達成したいエージェントが存在する場合のコラボレーションの概念を捉える。例えば、無羨望均衡では、どのエージェントも自分のサンプリング負担を他のエージェントと交換したいとは思わず、安定均衡では、どのエージェントも自分のサンプリング負担を一方的に減らしたいとは思わない。このフレームワークの定式化に加えて、我々の貢献には、そのような均衡の構造的性質の特徴づけ、それらが存在する条件の証明、およびその計算方法の提示が含まれる。さらに、インセンティブを考慮したコラボレーションのサンプル複雑性を、エージェントのインセンティブを無視した場合の最適なコラボレーションのサンプル複雑性と比較する。

In recent years, federated learning has been embraced as an approach for bringing about collaboration across large populations of learning agents. However, little is known about how collaboration protocols should take agents' incentives into account when allocating individual resources for communal learning in order to maintain such collaborations. Inspired by game theoretic notions, this paper introduces a framework for incentive-aware learning and data sharing in federated learning. Our stable and envy-free equilibria capture notions of collaboration in the presence of agents interested in meeting their learning objectives while keeping their own sample collection burden low. For example, in an envy-free equilibrium, no agent would wish to swap their sampling burden with any other agent and in a stable equilibrium, no agent would wish to unilaterally reduce their sampling burden. In addition to formalizing this framework, our contributions include characterizing the structural properties of such equilibria, proving when they exist, and showing how they can be computed. Furthermore, we compare the sample complexity of incentive-aware collaboration with that of optimal collaboration when one ignores agents' incentives.

翻訳日:2021-03-05 14:42:06 公開日:2021-03-04

# Moshpit SGD:不均一不信頼デバイスにおけるコミュニケーション効率の良い分散トレーニング

Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices ( http://arxiv.org/abs/2103.03239v1 )

ライセンス: Link先を確認

Max Ryabinin, Eduard Gorbunov, Vsevolod Plokhotnyuk, Gennady Pekhimenko

(参考訳) 大規模データセットでのディープニューラルネットワークのトレーニングは、複数の計算ノードを使用することで高速化できることが多い。分散トレーニングとして知られるこのアプローチでは、Ring All-Reduceのような特殊なメッセージパッシングプロトコルを介して数百台のコンピュータを利用できる。しかし、これらのプロトコルを大規模に実行するには、専用のクラスタでしか利用できない信頼性の高い高速ネットワークが必要である。対照的に、フェデレーション学習やクラウドベースの分散トレーニングといった現実世界のアプリケーションの多くは、ネットワーク帯域が不安定で信頼性の低いデバイス上で動作する。その結果、これらのアプリケーションは、パラメータサーバまたはgossipベースの平均化プロトコルの使用に制限される。本研究では、大域平均に指数的に収束する反復平均化プロトコルであるMoshpit All-Reduceを提案することで、この制限を取り除く。我々は、強い理論的保証とともに、分散最適化における本プロトコルの効率を実証する。実験では、競争力のあるgossipベースの戦略と比較して、ImageNet上のResNet-50のトレーニングで1.3倍、プリエンプティブルな計算ノードを用いてALBERT-largeをスクラッチからトレーニングする場合に1.5倍の高速化が示された。

Training deep neural networks on large datasets can often be accelerated by using multiple compute nodes. This approach, known as distributed training, can utilize hundreds of computers via specialized message-passing protocols such as Ring All-Reduce. However, running these protocols at scale requires reliable high-speed networking that is only available in dedicated clusters. In contrast, many real-world applications, such as federated learning and cloud-based distributed training, operate on unreliable devices with unstable network bandwidth. As a result, these applications are restricted to using parameter servers or gossip-based averaging protocols. In this work, we lift that restriction by proposing Moshpit All-Reduce -- an iterative averaging protocol that exponentially converges to the global average. We demonstrate the efficiency of our protocol for distributed optimization with strong theoretical guarantees. The experiments show 1.3x speedup for ResNet-50 training on ImageNet compared to competitive gossip-based strategies and 1.5x speedup when training ALBERT-large from scratch using preemptible compute nodes.
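
The abstract describes an iterative averaging protocol but does not say how peers are grouped. As a rough illustration of how repeated averaging within small groups can reach a global average, the sketch below assumes a fault-free setting where peers sit on a full grid and, in each round, average with the peers that share all but one coordinate. The function name `grid_group_averaging`, the grid layout, and the scalar values are assumptions for illustration, not the protocol as specified in the paper.

```python
from itertools import product


def grid_group_averaging(values, dims):
    """Average peer values in groups, one grid axis per round.

    `values` maps a peer's grid coordinates (a tuple of length `dims`) to its
    local value. In round `axis`, peers that agree on every coordinate except
    `axis` form one group and replace their values with the group mean.
    """
    current = dict(values)
    for axis in range(dims):
        groups = {}
        for coords, value in current.items():
            key = coords[:axis] + coords[axis + 1:]
            groups.setdefault(key, []).append(value)
        means = {key: sum(vals) / len(vals) for key, vals in groups.items()}
        current = {
            coords: means[coords[:axis] + coords[axis + 1:]] for coords in current
        }
    return current


# Nine hypothetical peers on a 3x3 grid, each holding one scalar "parameter".
peers = {(i, j): float(3 * i + j) for i, j in product(range(3), repeat=2)}
result = grid_group_averaging(peers, dims=2)
global_mean = sum(peers.values()) / len(peers)
print(all(abs(v - global_mean) < 1e-9 for v in result.values()))
```

In this idealised setting every peer holds the global mean after one round per grid axis, while each peer only ever communicates within a group of three. Real deployments with dropped or late peers would only approximate this, which is where an iterative, convergence-based analysis becomes relevant.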

翻訳日:2021-03-05 14:41:48 公開日:2021-03-04

# 収縮理論による学習ベース適応制御

Learning-based Adaptive Control via Contraction Theory ( http://arxiv.org/abs/2103.02987v1 )

ライセンス: Link先を確認

Hiroyasu Tsukamoto and Soon-Jo Chung and Jean-Jacques Slotine

(参考訳) 本稿では、乗法的に分離可能なパラメトリック不確かさを持つ非線形システムのための、adaptive Neural Contraction Metric (aNCM)と呼ばれる新しいディープラーニングベースの適応制御フレームワークを提案する。aNCMは最適な適応収縮計量のニューラルネットワークモデルを使用し、その計量の存在は、パラメトリック不確かさの下での系軌道の漸近安定性と指数的有界性を保証する。特に、Neural Contraction Metric (NCM)の概念を活用して、有界な外乱を持つ非線形システムに対して安定性が証明可能な公称ロバスト制御方策を獲得し、この方策を新しい適応則と組み合わせて安定性保証を実現する。また、本フレームワークが基底関数近似によってモデル化された力学系の適応制御に適用できることを示す。さらに、aNCMでニューラルネットワークを使用することでリアルタイムの実装が可能となり、様々なシステムに幅広く適用できる。最先端手法に対する優位性は、単純なカートポールのバランスタスクで示される。

We present a new deep learning-based adaptive control framework for nonlinear systems with multiplicatively-separable parametric uncertainty, called an adaptive Neural Contraction Metric (aNCM). The aNCM uses a neural network model of an optimal adaptive contraction metric, the existence of which guarantees asymptotic stability and exponential boundedness of system trajectories under the parametric uncertainty. In particular, we exploit the concept of a Neural Contraction Metric (NCM) to obtain a nominal provably stable robust control policy for nonlinear systems with bounded disturbances, and combine this policy with a novel adaptation law to achieve stability guarantees. We also show that the framework is applicable to adaptive control of dynamical systems modeled via basis function approximation. Furthermore, the use of neural networks in the aNCM permits its real-time implementation, resulting in broad applicability to a variety of systems. Its superiority to the state-of-the-art is illustrated with a simple cart-pole balancing task.

翻訳日:2021-03-05 14:41:26 公開日:2021-03-04

# BERTをベースとした特許ノベルティ検索のトレーニング

BERT based patent novelty search by training claims to their own description ( http://arxiv.org/abs/2103.01126v3 )

ライセンス: Link先を確認

Michael Freunek and André Bodmer

(参考訳) 本稿では,特許クレームを自己記述に結合する手法を提案する。この方法を適用することで、BERTはクレームの適切な記述を訓練する。このようなトレーニングされたBERT (claim-to-description-BERT) は、特許の新規性に関する記述を識別することができる。さらに,BERTの出力を有意に処理するために,新たなスコアリング方式,関連スコア,あるいは新規スコアを導入する。特許に関する最初の主張とそれに対応する記述に基づいてBERTを訓練することにより,特許出願の手法を検証した。 BERTの出力は、検索レポートの引用X文書と比較して、関連スコアと結果に基づいて処理されている。テストの結果、BERTは引用されたX文書のいくつかを非常に関連性が高いと評価した。

In this paper we present a method to concatenate patent claims to their own description. By applying this method, BERT trains suitable descriptions for claims. Such a trained BERT (claim-to-description-BERT) could be able to identify novelty-relevant descriptions for patents. In addition, we introduce a new scoring scheme, relevance scoring or novelty scoring, to process the output of BERT in a meaningful way. We tested the method on patent applications by training BERT on the first claims of patents and corresponding descriptions. BERT's output has been processed according to the relevance score and the results compared with the cited X documents in the search reports. The test showed that BERT has scored some of the cited X documents as highly relevant.

翻訳日:2021-03-05 13:05:03 公開日:2021-03-04

# ハイブリッドネットワークを用いたイベントベース合成開口イメージング

Event-based Synthetic Aperture Imaging with a Hybrid Network ( http://arxiv.org/abs/2103.02376v2 )

ライセンス: Link先を確認

Xiang Zhang, Liao Wei, Lei Yu, Wen Yang and Gui-Song Xia

(参考訳) 合成開口イメージング(SAI)は、焦点の合っていない前景の遮蔽物をぼかし、マルチビュー画像から焦点の合った遮蔽対象を再構成することで、シースルー効果を実現できる。しかし、非常に密な遮蔽と極端な照明条件は、従来のフレームベースのカメラに基づくSAIに大きな外乱をもたらし、性能劣化を引き起こす可能性がある。これらの問題に対処するため、極めて低いレイテンシと高いダイナミックレンジで非同期イベントを生成できるイベントカメラに基づく新しいSAIシステムを提案する。これにより、ほぼ連続的な視点で計測することで密な遮蔽の干渉を排除でき、同時に露出過多・露出不足の問題にも対処できる。遮蔽された対象を再構成するために、スパイキングニューラルネットワーク(SNN)と畳み込みニューラルネットワーク(CNN)からなるハイブリッドエンコーダデコーダネットワークを提案する。ハイブリッドネットワークでは、収集されたイベントの時空間情報がまずSNN層によってエンコードされ、その後、スタイル転送CNNデコーダによって遮蔽対象の視覚画像に変換される。実験により、提案手法は非常に密な遮蔽と極端な照明条件への対処において顕著な性能を示し、純粋なイベントデータのみから高品質な視覚画像を再構成できることを示す。

Synthetic aperture imaging (SAI) is able to achieve the see-through effect by blurring out the off-focus foreground occlusions and reconstructing the in-focus occluded targets from multi-view images. However, very dense occlusions and extreme lighting conditions may bring significant disturbances to SAI based on conventional frame-based cameras, leading to performance degradation. To address these problems, we propose a novel SAI system based on the event camera which can produce asynchronous events with extremely low latency and high dynamic range. Thus, it can eliminate the interference of dense occlusions by measuring with almost continuous views, and simultaneously tackle the over/under exposure problems. To reconstruct the occluded targets, we propose a hybrid encoder-decoder network composed of spiking neural networks (SNNs) and convolutional neural networks (CNNs). In the hybrid network, the spatio-temporal information of the collected events is first encoded by SNN layers, and then transformed to the visual image of the occluded targets by a style-transfer CNN decoder. Through experiments, the proposed method shows remarkable performance in dealing with very dense occlusions and extreme lighting conditions, and high-quality visual images can be reconstructed using pure event data.

翻訳日:2021-03-05 13:04:27 公開日:2021-03-04

# ルールセットの視覚化:設計空間の探索と検証

Visualizing Rule Sets: Exploration and Validation of a Design Space ( http://arxiv.org/abs/2103.01022v2 )

ライセンス: Link先を確認

Jun Yuan, Oded Nov, Enrico Bertini

(参考訳) ルールセットは、透明性と知性が必要な設定でモデルロジックを伝える手段として、機械学習(ML)でよく使用される。ルールセットは通常、論理文(ルール)のテキストベースのリストとして表示される。驚いたことに、これまでルールを提示するための視覚的な代替方法を探求する作業は限られていた。本論文では,ルールの可読性や理解にポジティブな影響を与えると思われる視覚的要因に焦点をあてて,ルールの代替表現を設計するアイデアを検討する。本稿では,ルールセットを視覚化するための初期設計空間と,その影響を探索するユーザスタディを提案する。その結果, 設計要因のいくつかは, 精度への影響を最小限に抑えつつ, 読者がいかに効率的にルールを処理できるかに強い影響を与えていることがわかった。この作業は、ルールをコミュニケーション戦略として使用してMLモデルを理解する際に、実践者がより効果的なソリューションを採用するのに役立ちます。

Rule sets are often used in Machine Learning (ML) as a way to communicate the model logic in settings where transparency and intelligibility are necessary. Rule sets are typically presented as a text-based list of logical statements (rules). Surprisingly, to date there has been limited work on exploring visual alternatives for presenting rules. In this paper, we explore the idea of designing alternative representations of rules, focusing on a number of visual factors we believe have a positive impact on rule readability and understanding. The paper presents an initial design space for visualizing rule sets and a user study exploring their impact. The results show that some design factors have a strong impact on how efficiently readers can process the rules while having minimal impact on accuracy. This work can help practitioners employ more effective solutions when using rules as a communication strategy to understand ML models.

翻訳日:2021-03-05 13:04:06 公開日:2021-03-04

# ダグは好きか? 構造学習と因果的発見に関する調査

D'ya like DAGs? A Survey on Structure Learning and Causal Discovery ( http://arxiv.org/abs/2103.02582v2 )

ライセンス: Link先を確認

Matthew J. Vowels, Necati Cihan Camgoz, and Richard Bowden

(参考訳) 因果推論は科学と人間の知性の重要な部分です。データから因果関係を発見するためには構造探索法が必要である。本稿では、背景理論のレビューと構造発見手法の調査を行う。私たちは主にモダンで継続的な最適化手法にフォーカスし、ベンチマークデータセットやソフトウェアパッケージといったさらなるリソースへの参照を提供します。最後に,構造から因果関係へ導くために必要な跳躍について論じる。

Causal reasoning is a crucial part of science and human intelligence. In order to discover causal relationships from data, we need structure discovery methods. We provide a review of background theory and a survey of methods for structure discovery. We primarily focus on modern, continuous optimization methods, and provide reference to further resources such as benchmark datasets and software packages. Finally, we discuss the assumptive leap required to take us from structure to causality.

翻訳日:2021-03-05 13:03:51 公開日:2021-03-04

# ジェネレーティブ・アドバーサリー・ネットワークを1つの段階でトレーニングする

Training Generative Adversarial Networks in One Stage ( http://arxiv.org/abs/2103.00430v2 )

ライセンス: Link先を確認

Chengchao Shen, Youtan Yin, Xinchao Wang, Xubin LI, Jie Song, Mingli Song

(参考訳) GAN(Generative Adversarial Networks)は、様々な画像生成タスクで前例のない成功を収めている。しかし、その有望な結果は、生成器と識別器を2つの段階で交互に更新する煩雑なトレーニングプロセスという代償を伴う。本稿では、GANを1段階のみで効率的にトレーニングできる一般的なトレーニング手法を検討する。生成器と識別器の敵対的損失に基づいて、GANを対称GANと非対称GANの2つのクラスに分類し、両者を統一する新しい勾配分解法を導入することで、どちらのクラスも1段階でトレーニングできるようにし、トレーニングの労力を軽減する。計算量の解析と、複数のデータセットおよび様々なネットワークアーキテクチャでの実験結果から、提案する1段階トレーニング手法は、生成器と識別器のネットワークアーキテクチャによらず、従来のトレーニング手法に対して1.5$\times$の高速化を達成することが示された。さらに、提案手法がデータフリー知識蒸留など、他の敵対的訓練のシナリオにも容易に適用できることを示す。ソースコードは近日公開予定である。

Generative Adversarial Networks (GANs) have demonstrated unprecedented success in various image generation tasks. The encouraging results, however, come at the price of a cumbersome training process, during which the generator and discriminator are alternately updated in two stages. In this paper, we investigate a general training scheme that enables training GANs efficiently in only one stage. Based on the adversarial losses of the generator and discriminator, we categorize GANs into two classes, Symmetric GANs and Asymmetric GANs, and introduce a novel gradient decomposition method to unify the two, allowing us to train both classes in one stage and hence alleviate the training effort. Computational analysis and experimental results on several datasets and various network architectures demonstrate that the proposed one-stage training scheme yields a solid 1.5$\times$ acceleration over conventional training schemes, regardless of the network architectures of the generator and discriminator. Furthermore, we show that the proposed method is readily applicable to other adversarial-training scenarios, such as data-free knowledge distillation. Our source code will be published soon.

翻訳日:2021-03-05 13:03:47 公開日:2021-03-04

# 授業増分学習におけるデータの蒸留因果効果

Distilling Causal Effect of Data in Class-Incremental Learning ( http://arxiv.org/abs/2103.01737v2 )

ライセンス: Link先を確認

Xinting Hu, Kaihua Tang, Chunyan Miao, Xian-Sheng Hua, Hanwang Zhang

(参考訳) 本研究では,CIL(Class-Incremental Learning)における破滅的忘れについて説明し,データリプレイや特徴/ラベル蒸留といった既存のアンチフォーガーティング手法に直交する新しい蒸留法を導出するための因果的枠組みを提案する。まず最初に、CILをフレームワークに配置し、2) 忘れる理由に答える: 古いデータの因果効果が新しいトレーニングで失われ、3) 既存のテクニックがそれを緩和する方法について説明する: 因果効果を取り戻せる。この枠組みから, 特徴・ラベル蒸留は貯蔵効率が高いが, その因果効果は, データ再生によって保存されるエンドツーエンドの特徴学習の長所と一致しないことがわかった。そこで本研究では,データ再生の因果効果と基本的に等価であるが,再生ストレージのコストを伴わずに,古いデータと新しいデータとの衝突効果を蒸留することを提案する。因果効果分析のおかげで、データストリームのIncremental Momentum Effectをさらにキャプチャし、新しいデータ効果によって圧倒された古い効果を保持するのに役立つものを削除し、テストにおける古いクラスの忘れを軽減することができます。 CIFAR-100、ImageNet-Sub&Fullの3つのCILベンチマークに関する広範な実験は、提案された因果効果蒸留が、様々な最先端のCIL法を大きなマージン(0.72%--9.06%)で改善できることを示した。

We propose a causal framework to explain the catastrophic forgetting in Class-Incremental Learning (CIL) and then derive a novel distillation method that is orthogonal to the existing anti-forgetting techniques, such as data replay and feature/label distillation. We first 1) place CIL into the framework, 2) answer why the forgetting happens: the causal effect of the old data is lost in new training, and then 3) explain how the existing techniques mitigate it: they bring the causal effect back. Based on the framework, we find that although the feature/label distillation is storage-efficient, its causal effect is not coherent with the end-to-end feature learning merit, which is however preserved by data replay. To this end, we propose to distill the Colliding Effect between the old and the new data, which is fundamentally equivalent to the causal effect of data replay, but without any cost of replay storage. Thanks to the causal effect analysis, we can further capture the Incremental Momentum Effect of the data stream, removing which can help to retain the old effect overwhelmed by the new data effect, and thus alleviate the forgetting of the old class in testing. Extensive experiments on three CIL benchmarks: CIFAR-100, ImageNet-Sub&Full, show that the proposed causal effect distillation can improve various state-of-the-art CIL methods by a large margin (0.72%--9.06%).

翻訳日:2021-03-05 13:03:27 公開日:2021-03-04

# PENet: 精密かつ効率的な画像ガイド深度補完を目指して

PENet: Towards Precise and Efficient Image Guided Depth Completion ( http://arxiv.org/abs/2103.00783v2 )

ライセンス: Link先を確認

Mu Hu, Shuling Wang, Bin Li, Shiyu Ning, Li Fan, and Xiaojin Gong

(参考訳) 画像案内深度完成は、スパース深度マップと高品質な画像から濃密深度マップを生成するタスクである。このタスクでは、色と深さのモダリティを融合する方法が、優れたパフォーマンスを達成する上で重要な役割を果たす。本論文では, 色優性分枝と深度優性分枝からなる2枝バックボーンを提案し, 2つのモダリティを徹底的に活用・融合する。具体的には、色画像とスパース深度マップを入力し、密度の深い深度マップを予測する。他方の分岐は、スパース深度マップと予め予測された深さマップを入力とし、高密度深さマップも出力する。 2つの枝から予測される深度マップは互いに補完的であり、適応的に融合する。さらに,3次元幾何学的手がかりを符号化する簡単な幾何学的畳み込み層も提案する。幾何エンコードされたバックボーンは、複数の段階で異なるモダリティの融合を行い、良好な深さ完成結果をもたらします。さらに、融解深度マップを効率的に洗練するために、拡張および加速CSPN++を実装します。提案する完全モデルは、提出時点でkitti depth completion online leaderboardで1位にランクインしている。また、トップクラスのほとんどのメソッドよりもはるかに高速に推論する。この作業のコードはhttps://github.com/JUGGHM/PENet_ICRA2021で入手できます。

Image guided depth completion is the task of generating a dense depth map from a sparse depth map and a high quality image. In this task, how to fuse the color and depth modalities plays an important role in achieving good performance. This paper proposes a two-branch backbone that consists of a color-dominant branch and a depth-dominant branch to exploit and fuse two modalities thoroughly. More specifically, one branch inputs a color image and a sparse depth map to predict a dense depth map. The other branch takes as inputs the sparse depth map and the previously predicted depth map, and outputs a dense depth map as well. The depth maps predicted from two branches are complementary to each other and therefore they are adaptively fused. In addition, we also propose a simple geometric convolutional layer to encode 3D geometric cues. The geometric encoded backbone conducts the fusion of different modalities at multiple stages, leading to good depth completion results. We further implement a dilated and accelerated CSPN++ to refine the fused depth map efficiently. The proposed full model ranks 1st in the KITTI depth completion online leaderboard at the time of submission. It also infers much faster than most of the top ranked methods. The code of this work will be available at https://github.com/JUGGHM/PENet_ICRA2021.
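
As a rough illustration of what "adaptively fused" could mean for two dense depth predictions, the sketch below weights each pixel by softmax-normalized confidence scores. This is only one plausible reading of the abstract: the function name `fuse_depths`, the per-pixel confidence maps, and the use of a softmax are assumptions made for this example, not details stated in the abstract.

```python
import math


def fuse_depths(depth_a, conf_a, depth_b, conf_b):
    """Fuse two dense depth maps pixel by pixel.

    All arguments are 2-D lists of floats with the same shape.
    conf_a and conf_b are hypothetical per-pixel confidence scores;
    a softmax over the two scores gives the fusion weights.
    """
    fused = []
    for row_da, row_ca, row_db, row_cb in zip(depth_a, conf_a, depth_b, conf_b):
        fused_row = []
        for da, ca, db, cb in zip(row_da, row_ca, row_db, row_cb):
            # Subtract the larger score before exponentiating for numerical stability.
            m = max(ca, cb)
            wa = math.exp(ca - m)
            wb = math.exp(cb - m)
            fused_row.append((wa * da + wb * db) / (wa + wb))
        fused.append(fused_row)
    return fused
```

With equal confidences the result is the plain average of the two predictions; when one branch is far more confident, its depth value dominates at that pixel.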

翻訳日:2021-03-05 13:03:01 公開日:2021-03-04

# StyleGAN用円形人工物の系統解析と除去

Systematic Analysis and Removal of Circular Artifacts for StyleGAN ( http://arxiv.org/abs/2103.01090v2 )

ライセンス: Link先を確認

Way Tan, Bihan Wen, Xulei Yang

(参考訳) StyleGANは、高解像度と超リアルな顔画像の合成で有名な最先端の画像ジェネレーターの1つです。バニラスタイルGANモデルによって生成された画像は視覚的に魅力的であるが、しばしば、生成された画像の品質を著しく低下させる顕著な円形のアーティファクトを含んでいる。本研究では、バニラ様式GANアーキテクチャの異なる段階の機能を検討し、メカニズム解析と広範な実験の両方を用いて、これらの円形アーティファクトがどのように形成されるのかを体系的に調査する。このような望ましくないアーティファクトを促進するバニラスタイルガンのキーモジュールが強調される。私たちの調査では、アーティファクトが通常、円形であり、比較的小さく、まれに2つ以上の部分に分割される理由も説明しています。さらに,バニラ型GANの顕著な円形アーティファクトを,新しいピクセルインスタンス正規化(PIN)層を適用して,簡易かつ効果的に除去する手法を提案する。

StyleGAN is one of the state-of-the-art image generators which is well-known for synthesizing high-resolution and hyper-realistic face images. Though images generated by vanilla StyleGAN model are visually appealing, they sometimes contain prominent circular artifacts which severely degrade the quality of generated images. In this work, we provide a systematic investigation on how those circular artifacts are formed by studying the functionalities of different stages of vanilla StyleGAN architecture, with both mechanism analysis and extensive experiments. The key modules of vanilla StyleGAN that promote such undesired artifacts are highlighted. Our investigation also explains why the artifacts are usually circular, relatively small and rarely split into 2 or more parts. Besides, we propose a simple yet effective solution to remove the prominent circular artifacts for vanilla StyleGAN, by applying a novel pixel-instance normalization (PIN) layer.

翻訳日:2021-03-05 13:02:40 公開日:2021-03-04

# 自己分散バイナリニューラルネットワーク

Self-Distribution Binary Neural Networks ( http://arxiv.org/abs/2103.02394v2 )

ライセンス: Link先を確認

Ping Xue, Yang Lu, Jingfei Chang, Xing Wei, Zhen Wei

(参考訳) 本研究では、重みとアクティベーションの両方がバイナリ(すなわち1ビット表現)である2進ニューラルネットワーク(BNN)について検討する。特徴表現はディープニューラルネットワークにとって重要ですが、BNNでは特徴はサインでしか異なります。先行研究では、量子化誤差を低減し、bnnの分類精度を効果的に向上するために、二元重みとアクティベーションにスケーリング係数を導入する。しかしながら、スケーリング要因はネットワークの計算複雑性を増加させるだけでなく、バイナリ機能の兆候にも意味をなさない。そこで,SD-BNN(Self-Distribution Binary Neural Network)を提案する。まず、アクティベーション自己分布(ASD)を用いて、アクティベーションの符号分布を適応的に調整し、畳み込みの出力の符号差を改善する。第二に、重量自己分布(WSD)を通じて重みの符号分布を調整し、畳み込みの符号分布を微調整します。さまざまなネットワーク構造を持つCIFAR-10およびImageNetデータセットの広範な実験は、提案されたSD-BNNが常に最先端の(SOTA)BNN(例えば、CIFAR-10で92.5%、ResNet-18で66.5%)を計算コストで上回ることを示唆している。コードはhttps://github.com/pingxue-hfut/SD-BNNで入手できる。

In this work, we study the binary neural networks (BNNs) of which both the weights and activations are binary (i.e., 1-bit representation). Feature representation is critical for deep neural networks, while in BNNs, the features only differ in signs. Prior work introduces scaling factors into binary weights and activations to reduce the quantization error and effectively improves the classification accuracy of BNNs. However, the scaling factors not only increase the computational complexity of networks, but also make no sense to the signs of binary features. To this end, Self-Distribution Binary Neural Network (SD-BNN) is proposed. Firstly, we utilize Activation Self Distribution (ASD) to adaptively adjust the sign distribution of activations, thereby improving the sign differences of the outputs of the convolution. Secondly, we adjust the sign distribution of weights through Weight Self Distribution (WSD) and then fine-tune the sign distribution of the outputs of the convolution. Extensive experiments on CIFAR-10 and ImageNet datasets with various network structures show that the proposed SD-BNN consistently outperforms the state-of-the-art (SOTA) BNNs (e.g., achieves 92.5% on CIFAR-10 and 66.5% on ImageNet with ResNet-18) with less computation cost. Code is available at https://github.com/pingxue-hfut/SD-BNN.
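
To make the "scaling factor" idea from prior work concrete, here is a minimal sketch of sign binarization with a single per-tensor scale. It illustrates the baseline the abstract contrasts itself with, not SD-BNN itself; the function names are hypothetical, and taking the scale as the mean absolute weight is an assumption borrowed from common BNN practice rather than something the abstract specifies.

```python
def binarize(weights):
    """Map each real-valued weight to +1.0 or -1.0 by its sign (zero maps to +1.0)."""
    return [1.0 if w >= 0 else -1.0 for w in weights]


def binarize_with_scale(weights):
    """Return (alpha, signs) so that alpha * signs approximates the weights.

    alpha is the mean absolute value of the weights, which minimizes the
    squared error between the weights and alpha * sign(weights).
    """
    alpha = sum(abs(w) for w in weights) / len(weights)
    return alpha, binarize(weights)
```

Note that multiplying by a positive `alpha` never changes the sign of a feature, which is the limitation of scaling factors that the abstract points out.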

翻訳日:2021-03-05 13:02:23 公開日:2021-03-04

# $S^3$: ガイド深度推定のための学習可能なスパース信号超密度

$S^3$: Learnable Sparse Signal Superdensity for Guided Depth Estimation ( http://arxiv.org/abs/2103.02396v2 )

ライセンス: Link先を確認

Yu-Kai Huang, Yueh-Cheng Liu, Tsung-Han Wu, Hung-Ting Su, Yu-Cheng Chang, Tsung-Lin Tsou, Yu-An Wang, and Winston H. Hsu

(参考訳) Dense Depthの推定は、ロボット工学、3D再構成、拡張現実といった複数のアプリケーションにおいて重要な役割を果たす。 LiDAR や Radar などのスパース信号は高密度深度推定のガイダンスとして利用されているが、密度が低く、分布が不均衡なため改善が制限されている。スパースソースから有効性を最大化するために,拡張領域の信頼性を推定しながらスパースキューから深さ値を拡張する,$S^3$手法を提案する。提案した$S^3$は、様々な誘導深度推定手法や、入力、コストボリューム、出力を含む様々な段階で訓練されたエンドツーエンドに適用できる。広範な実験はLiDARおよびレーダー信号の$S^3$の技術の有効性、堅牢性および柔軟性を示す。

Dense depth estimation plays a key role in multiple applications such as robotics, 3D reconstruction, and augmented reality. While sparse signals, e.g., LiDAR and Radar, have been leveraged as guidance for enhancing dense depth estimation, the improvement is limited due to their low density and imbalanced distribution. To maximize the utility from the sparse source, we propose the $S^3$ technique, which expands the depth value from sparse cues while estimating the confidence of the expanded region. The proposed $S^3$ can be applied to various guided depth estimation approaches and trained end-to-end at different stages, including input, cost volume and output. Extensive experiments demonstrate the effectiveness, robustness, and flexibility of the $S^3$ technique on LiDAR and Radar signals.

翻訳日:2021-03-05 13:01:53 公開日:2021-03-04

# 超解像圧縮画像の並列化とアーティファクト低減と分解能向上のシリーズ統合

Super-resolving Compressed Images via Parallel and Series Integration of Artifact Reduction and Resolution Enhancement ( http://arxiv.org/abs/2103.01698v2 )

ライセンス: Link先を確認

Hongming Luo, Fei Zhou, Guangsen Liao, and Guoping Qiu

(参考訳) 本論文では,アーティファクト除去と解像度向上の並列および直列統合に基づく新しい圧縮画像超解像(CISR)フレームワークを提案する。クリーンな低分解能(LR)入力画像と、ダウンサンプリングおよび圧縮観察からのクリーンな高分解能(HR)出力イメージを推定するための最大後方推論に基づいて、アーティファクトリダクションモジュール(ARM)とリゾリューションエンハンスモジュール(REM)の2つのディープニューラルネットワークモジュールからなるCISRアーキテクチャを設計しました。 ARMとREMは、圧縮LRイメージを入力として取得することと並行して動作し、REMはARMの出力を入力の1つとして取得し、ARMはREMの出力を他の入力として取得する。 CISRシステムのユニークな特徴は、異なる方法で圧縮されたLR画像を様々な品質に超解ける1つの訓練されたモデルである。これは、画像劣化を処理するためのディープニューラルネットワーク容量と、ARMとREM間の並列および直列接続を利用して、特定の劣化への依存を減らすことで実現される。 ARMとREMは、深層展開技術によって同時に訓練される。 JPEGとWebP圧縮画像の混合に対して,圧縮型と圧縮係数の事前知識のない実験を行った。視覚的および定量的比較は、最先端のsuper resolutionメソッドよりも優れた方法を示している。

In this paper, we propose a novel compressed image super resolution (CISR) framework based on parallel and series integration of artifact removal and resolution enhancement. Based on maximum a posteriori inference for estimating a clean low-resolution (LR) input image and a clean high-resolution (HR) output image from down-sampled and compressed observations, we have designed a CISR architecture consisting of two deep neural network modules: the artifact reduction module (ARM) and resolution enhancement module (REM). ARM and REM work in parallel with both taking the compressed LR image as their inputs, while they also work in series with REM taking the output of ARM as one of its inputs and ARM taking the output of REM as its other input. A unique property of our CISR system is that a single trained model is able to super-resolve LR images compressed by different methods to various qualities. This is achieved by exploiting deep neural networks' capacity for handling image degradations, and the parallel and series connections between ARM and REM to reduce the dependency on specific degradations. ARM and REM are trained simultaneously by the deep unfolding technique. Experiments are conducted on a mixture of JPEG and WebP compressed images without a priori knowledge of the compression type and compression factor. Visual and quantitative comparisons demonstrate the superiority of our method over state-of-the-art super resolution methods. Code link: https://github.com/luohongming/CISR_PSI

翻訳日:2021-03-05 13:01:39 公開日:2021-03-04

# 動的核融合モジュールによる乾燥領域と道路異常検出:ベンチマークとアルゴリズム

Dynamic Fusion Module Evolves Drivable Area and Road Anomaly Detection: A Benchmark and Algorithms ( http://arxiv.org/abs/2103.02433v2 )

ライセンス: Link先を確認

Hengli Wang, Rui Fan, Yuxiang Sun, Ming Liu

(参考訳) 移動ロボットにとって,乾燥領域と道路異常の同時検出は非常に重要である。近年,畳み込みニューラルネットワーク(CNN)に基づく多くのセマンティックセグメンテーション手法が,画素ワイドな領域と道路異常検出のために提案されている。さらに、KITTIやCityscapesなどのベンチマークデータセットが広く使用されている。しかし、既存のベンチマークは主に自動運転車向けに設計されている。ロボット車椅子などの地上移動ロボットのベンチマークが欠けています。そこで本論文では,まず地上移動ロボットの走行可能領域と道路異常検出ベンチマークを構築し,視覚的特徴の6つのモダリティを用いて,既存の最先端の単一モーダルおよびデータ融合セマンティックセグメンテーションCNNを評価する。さらに,動的融合モジュール(DFM)と呼ばれる新しいモジュールを提案し,既存のデータ融合ネットワークに容易に展開し,異なるタイプの視覚的特徴を効果的かつ効率的に融合させることができる。実験の結果,変換された不均質画像が最も有意義な視覚的特徴であり,提案したDFM-RTFNetは最先端技術よりも優れていた。さらに,我々のDFM-RTFNetは,KITTIロードベンチマーク上での競合性能を実現している。私たちのベンチマークはhttps://sites.google.com/view/gmrbで公開されています。

Joint detection of drivable areas and road anomalies is very important for mobile robots. Recently, many semantic segmentation approaches based on convolutional neural networks (CNNs) have been proposed for pixel-wise drivable area and road anomaly detection. In addition, some benchmark datasets, such as KITTI and Cityscapes, have been widely used. However, the existing benchmarks are mostly designed for self-driving cars. A benchmark for ground mobile robots, such as robotic wheelchairs, is still lacking. Therefore, in this paper, we first build a drivable area and road anomaly detection benchmark for ground mobile robots, evaluating the existing state-of-the-art single-modal and data-fusion semantic segmentation CNNs using six modalities of visual features. Furthermore, we propose a novel module, referred to as the dynamic fusion module (DFM), which can be easily deployed in existing data-fusion networks to fuse different types of visual features effectively and efficiently. The experimental results show that the transformed disparity image is the most informative visual feature and the proposed DFM-RTFNet outperforms the state of the art. Additionally, our DFM-RTFNet achieves competitive performance on the KITTI road benchmark. Our benchmark is publicly available at https://sites.google.com/view/gmrb.

翻訳日:2021-03-05 13:01:12 公開日:2021-03-04

# 飛べる学習--多エージェントクワッドコプター制御の強化学習のためのパイブルレット物理を用いた体育環境

Learning to Fly -- a Gym Environment with PyBullet Physics for Reinforcement Learning of Multi-agent Quadcopter Control ( http://arxiv.org/abs/2103.02142v2 )

ライセンス: Link先を確認

Jacopo Panerati (1 and 2), Hehui Zheng (3), SiQi Zhou (1 and 2), James Xu (1), Amanda Prorok (3), Angela P. Schoellig (1 and 2) ((1) University of Toronto Institute for Aerospace Studies, (2) Vector Institute for Artificial Intelligence, (3) University of Cambridge)

(参考訳) ロボットシミュレータは、学術研究と教育、および安全クリティカルなアプリケーションの開発に不可欠です。強化学習環境 -- 報酬関数の形で問題仕様と結合した単純なシミュレーション -- もまた、学習アルゴリズムの開発(およびベンチマーク)を標準化する上で重要である。しかし、フルスケールのシミュレータは移植性と並列性に欠ける。逆に、多くの強化学習環境は、おもちゃのような問題における高いサンプルスループットのためにリアリズムを犠牲にしている。パブリックデータセットはディープラーニングとコンピュータビジョンに大きく貢献していますが、制御理論と強化学習アプローチを同時に開発し、公平に比較するソフトウェアツールはまだありません。本稿では,Bullet物理エンジンに基づく複数クワッドコプターのためのオープンソースのOpenAI Gymライクな環境を提案する。マルチエージェントおよびビジョンベースの強化学習インターフェース、および現実的な衝突と空力効果のサポートにより、我々の知る限り、この種のものとしては初めての環境となる。我々は、制御(PID制御による軌道追跡、ダウンウォッシュを伴うマルチロボット飛行など)または強化学習(単一および複数エージェントの安定化タスク)のいくつかの例を通してその使用法を実証し、制御理論と機械学習を組み合わせた将来の研究を刺激することを望んでいます。

Robotic simulators are crucial for academic research and education as well as the development of safety-critical applications. Reinforcement learning environments -- simple simulations coupled with a problem specification in the form of a reward function -- are also important to standardize the development (and benchmarking) of learning algorithms. Yet, full-scale simulators typically lack portability and parallelizability. Conversely, many reinforcement learning environments trade off realism for high sample throughputs in toy-like problems. While public data sets have greatly benefited deep learning and computer vision, we still lack the software tools to simultaneously develop -- and fairly compare -- control theory and reinforcement learning approaches. In this paper, we propose an open-source OpenAI Gym-like environment for multiple quadcopters based on the Bullet physics engine. Its multi-agent and vision-based reinforcement learning interfaces, as well as the support of realistic collisions and aerodynamic effects, make it, to the best of our knowledge, a first of its kind. We demonstrate its use through several examples, either for control (trajectory tracking with PID control, multi-robot flight with downwash, etc.) or reinforcement learning (single and multi-agent stabilization tasks), hoping to inspire future research that combines control theory and machine learning.
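
For readers unfamiliar with the PID control mentioned among the examples, the following is a minimal sketch of a PID loop holding a target altitude on a toy one-dimensional point mass. It does not use the paper's environment or the Bullet engine: the `PID` class, the `simulate_hover` function, the gains, the unit mass, and the assumption that gravity is already compensated are all hypothetical choices made for this example.

```python
class PID:
    """Textbook discrete PID controller on a scalar error signal."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt):
        self.integral += error * dt
        # No derivative term on the first call, to avoid a spike from an undefined history.
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate_hover(target=1.0, steps=2000, dt=0.01):
    """Drive a unit point mass from altitude 0 toward `target` and return the final altitude."""
    pid = PID(kp=12.0, ki=8.0, kd=6.0)  # illustrative gains, not tuned for any real vehicle
    z, vz = 0.0, 0.0
    for _ in range(steps):
        accel = pid.update(target - z, dt)  # commanded vertical acceleration
        vz += accel * dt                    # semi-implicit Euler integration
        z += vz * dt
    return z
```

Calling `simulate_hover()` runs 20 simulated seconds and should end close to the 1.0 target. A real quadcopter controller would run loops like this per axis and on attitude as well, inside a physics simulator rather than a point-mass model.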

翻訳日:2021-03-05 13:00:36 公開日:2021-03-04

PDF登録状況（公開日: 20210304）