Fugu-MT: Translations of arXiv papers

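This site translates into Japanese those arXiv papers that are 30 pages or shorter and carry a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body is not under a CC license, or that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is performed with Fugu-Machine Translator.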

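The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility whatsoever for any consequences arising from the use of this site (including all information and translations).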

Papers with a publication date of 20210814.

Title	Authors	Abstract	Translation date / publication date
# Approximate tensor decompositions: disappearance of many separations ( http://arxiv.org/abs/2004.10219v2 ) License: see link	Gemma De las Cuevas, Andreas Klingler, Tim Netzer	It is well known that tensor decompositions show separations, that is, that constraints on local terms (such as positivity) may entail an arbitrarily high cost in their representation. Here we show that many of these separations disappear in the approximate case. Specifically, for every approximation error $\varepsilon$ and norm, we define the approximate rank as the minimum rank of an element in the $\varepsilon$-ball with respect to that norm. For positive semidefinite matrices, we show that the separations between rank, purification rank, and separable rank disappear for a large class of Schatten $p$-norms. For nonnegative tensors, we show that the separations between rank, positive semidefinite rank, and nonnegative rank disappear for all $\ell_p$-norms with $p>1$. For the trace norm ($p = 1$), we obtain upper bounds that depend on the ambient dimension. We also provide a deterministic algorithm to obtain the approximate decomposition attaining our bounds. Our main tool is an approximate version of Carathéodory's Theorem. Our results imply that many separations are not robust under small perturbations of the tensor, with implications in quantum many-body systems and communication complexity.	Translated: 2023-05-22 20:29:49 Published: 2021-08-14
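The approximate Carathéodory theorem says that a point in the convex hull of many vectors can be approximated to accuracy $\varepsilon$ by a convex combination of only a few of them. The sketch below covers only the Euclidean norm and assumes the points are the rows of a NumPy array. It uses the classical Frank-Wolfe method, which makes the construction deterministic: each iteration adds at most one new vertex. It illustrates the general idea and is not the algorithm from the paper. The name `approx_caratheodory` is hypothetical.

```python
import numpy as np


def approx_caratheodory(V, x, steps):
    """Approximate x (a point in the convex hull of the rows of V) by a sparse
    convex combination of rows, using Frank-Wolfe with exact line search.

    After `steps` iterations the weight vector has at most steps + 1 nonzeros.
    """
    n = V.shape[0]
    w = np.zeros(n)
    # Start from the vertex closest to x.
    i0 = int(np.argmin(np.linalg.norm(V - x, axis=1)))
    w[i0] = 1.0
    y = V[i0].astype(float)
    for _ in range(steps):
        grad = y - x                  # gradient of 0.5 * ||y - x||^2
        j = int(np.argmin(V @ grad))  # linear minimization over conv(V)
        d = V[j] - y
        denom = float(d @ d)
        if denom == 0.0:              # y already equals V[j]: zero duality gap
            break
        # Exact line search on the segment [y, V[j]], clipped to [0, 1].
        gamma = float(np.clip(-(grad @ d) / denom, 0.0, 1.0))
        w *= 1.0 - gamma
        w[j] += gamma
        y = y + gamma * d
    return w, y
```

The squared error after $t$ steps obeys the standard Frank-Wolfe bound $\|y_t - x\|_2^2 \le 4D^2/(t+2)$, where $D$ is the diameter of the point set. So roughly $4D^2/\varepsilon^2$ vertices suffice, independent of the ambient dimension.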
# Wave and particle properties can be spatially separated in a quantum entity ( http://arxiv.org/abs/2009.00545v2 ) License: see link	Pratyusha Chowdhury, Arun Kumar Pati and Jing-Ling Chen	Wave and particle are two fundamental properties of Nature. The wave-particle duality has indicated that a quantum object may exhibit the behaviours of both wave and particle, depending upon the circumstances of the experiment. The major significance of wave-particle duality has led to a fundamental equation in quantum mechanics, the Schrödinger equation. At present, the principle of wave-particle duality is deeply rooted in people's minds. This gives rise to a common-sense perception that wave property and particle property coexist simultaneously in a quantum entity, and that these two physical attributes cannot be completely separated from each other. In classical physics, a similar common sense is that a physical system is inseparable from its physical properties. However, this has recently been challenged and overturned by a quantum phenomenon called the "quantum Cheshire cat", in which a cat and its grin can be separated spatially. In this work, we propose a thought experiment based on a technique similar to that of the quantum Cheshire cat. We find that the wave and particle attributes of a quantum entity can be completely separated, thus successfully dismantling the wave-particle duality for a quantum entity. Our result is still consistent with the complementarity principle and deepens the understanding of quantum foundations.	Translated: 2023-05-04 03:13:30 Published: 2021-08-14
# Nonlocality, steering and quantum state tomography in a single experiment ( http://arxiv.org/abs/2011.05666v3 ) License: see link	Chang-Jiang Huang, Guo-Yong Xiang, Yu Guo, Kang-Da Wu, Bi-Heng Liu, Chuan-Feng Li, Guang-Can Guo, Armin Tavakoli	We investigate whether paradigmatic measurements for quantum state tomography, namely mutually unbiased bases and symmetric informationally complete measurements, can be employed to certify quantum correlations. For this purpose, we identify a simple and noise-robust correlation witness for entanglement detection, steering and nonlocality that can be evaluated based on the outcome statistics obtained in the tomography experiment. This allows us to perform state tomography on entangled qutrits, a test of Einstein-Podolsky-Rosen steering and a Bell inequality test, all within a single experiment. We also investigate the trade-off between quantum correlations and subsets of tomographically complete measurements as well as the quantification of entanglement in the different scenarios. Finally, we perform a photonics experiment in which we demonstrate quantum correlations under these flexible assumptions, namely with both parties trusted, one party untrusted and both parties untrusted.	Translated: 2023-04-24 12:01:21 Published: 2021-08-14
# Detuning modulated universal composite pulses ( http://arxiv.org/abs/2012.04401v2 ) License: see link	Hadar Greener, Elica Kyoseva, Haim Suchowski	We present a general method to derive detuning-modulated composite pulses (DMCPs) as N rotations of a canonical two-state quantum system to create accurate and robust pulses that are independent of the initial state of the system. This scheme has minimal pulse overhead, and achieves pulses that are stable against amplitude errors well within the $10^{-4}$ threshold that may be suitable for quantum information processing (QIP), within the lifetime of the system. This family of pulses makes it possible to overcome inevitable fabrication errors in silicon photonics and relaxes the need for a precise initial state of light coupled into the system to achieve accurate light transfer. Furthermore, we extend universal DMCPs to n-level systems with irreducible SU(2) symmetry to create state transfer that is highly robust to errors in the pulse area from any initial state.	Translated: 2023-04-21 18:23:01 Published: 2021-08-14
# Enabling Dataflow Optimization for Quantum Programs ( http://arxiv.org/abs/2101.11030v2 ) License: see link	David Ittah, Thomas Häner, Vadym Kliuchnikov, Torsten Hoefler	We propose an IR for quantum computing that directly exposes quantum and classical data dependencies for the purpose of optimization. The Quantum Intermediate Representation for Optimization (QIRO) consists of two dialects, one input dialect and one that is specifically tailored to enable quantum-classical co-optimization. While the first employs a perhaps more intuitive memory-semantics (quantum operations act as side-effects), the latter uses value-semantics (operations consume and produce states). Crucially, this encodes the dataflow directly in the IR, allowing for a host of optimizations that leverage dataflow analysis. We discuss how to map existing quantum programming languages to the input dialect and how to lower the resulting IR to the optimization dialect. We present a prototype implementation based on MLIR that includes several quantum-specific optimization passes. Our benchmarks show that significant improvements in resource requirements are possible even through static optimization. In contrast to circuit optimization at run time, this is achieved while incurring only a small constant overhead in compilation time, making this a compelling approach for quantum program optimization at application scale.	Translated: 2023-04-13 22:20:21 Published: 2021-08-14
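With value semantics, every operation consumes a qubit value and produces a fresh one. The question "which operation feeds which" then becomes an explicit def-use edge, and a peephole rewrite reduces to a dictionary lookup. The toy Python pass below is a hypothetical illustration only. It is not QIRO, MLIR, or any real dialect API. It cancels back-to-back identical self-inverse single-qubit gates in a value-semantic list of operations.

```python
SELF_INVERSE = {"H", "X", "Y", "Z"}


def cancel_self_inverse_pairs(ops):
    """Remove adjacent pairs of identical self-inverse single-qubit gates.

    `ops` is a list of (gate, input_value, output_value) tuples in value
    semantics: each gate consumes one qubit value and produces a fresh one.
    Returns the optimized op list and a function mapping an old value name to
    the value that now carries the same state.
    """
    producer = {out: i for i, (_, _, out) in enumerate(ops)}
    removed = set()
    alias = {}

    def resolve(v):
        while v in alias:
            v = alias[v]
        return v

    for j, (gate, inp, out) in enumerate(ops):
        i = producer.get(inp)
        # Dataflow check: op j consumes exactly the value that op i produced.
        if (i is not None and i not in removed
                and gate in SELF_INVERSE and ops[i][0] == gate):
            removed.update((i, j))
            alias[out] = ops[i][1]  # G(G(q)) == q, so `out` aliases op i's input

    kept = [(g, resolve(a), b) for k, (g, a, b) in enumerate(ops)
            if k not in removed]
    return kept, resolve
```

In a memory-semantic IR the same rewrite first needs an analysis proving that no other operation touches the qubit between the two gates. Here the single-use, linear value makes that fact explicit.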
# Lieb-Robinson bound and almost-linear light-cone in interacting boson systems ( http://arxiv.org/abs/2103.11592v3 ) License: see link	Tomotaka Kuwahara, Keiji Saito	In this work, we investigate how quickly local perturbations propagate in interacting boson systems with Bose-Hubbard-type Hamiltonians. In general, these systems have unbounded local energies, and arbitrarily fast information propagation may occur. We focus on a specific but experimentally natural situation in which the number of bosons at any one site in the unperturbed initial state is approximately limited. We rigorously prove the existence of an almost-linear information-propagation light-cone, thus establishing a Lieb--Robinson bound: the wave-front grows at most as $t\log^2 (t)$. We prove the clustering theorem for gapped ground states and study the time complexity of classically simulating one-dimensional quench dynamics, a topic of great practical interest.	Translated: 2023-04-07 04:45:19 Published: 2021-08-14
# Decay of the vortex muon ( http://arxiv.org/abs/2106.00345v2 ) License: see link	Pengcheng Zhao, Igor P. Ivanov, Pengming Zhang	Muon decay is self-analyzing: the spectral-angular distribution of the emitted electron reveals the spin orientation of the polarized muon. Here, we show that the same feature applies to muons in non-plane-wave states and helps reveal the rich polarization opportunities available. We focus on the so-called vortex states, in which the muon carries a non-zero orbital angular momentum with respect to the average propagation direction and exhibits a cone structure in the momentum distribution. We compute the spectrum and the angular distribution of the electrons emitted in decays of vortex muons and show that the most revealing observable is not the angular distribution but the fixed-angle electron spectra. Even for very small cone opening angles of the vortex muons, it will be easy to observe significant modifications of the electron spectra, which would allow one to distinguish vortex muons from approximately plane-wave muons, as well as to differentiate among various polarization states. These features will be the key to tracking the evolution of vortex muons in external magnetic fields.	Translated: 2023-03-28 03:50:02 Published: 2021-08-14
# Ground state energy of the polarized diluted gas of interacting spin $1/2$ fermions ( http://arxiv.org/abs/2108.00793v2 ) License: see link	Piotr Chankowski and Jacek Wojtkiewicz	The effective field theory approach simplifies the perturbative computation of the ground state energy of the diluted gas of fermions. In the case of the unpolarized system, it allows one to easily re-derive the classic results up to order $(k_{\rm F}a_0)^2$ (where $k_{\rm F}$ is the system's Fermi momentum and $a_0$ the $s$-wave scattering length) and, with more labour, to extend them up to order $(k_{\rm F}a_0)^4$. The corresponding expansion of the ground state energy of the polarized gas of spin $1/2$ fermions is known analytically (to the best of our knowledge) only up to order $k_{\rm F}a_0$ (where $k_{\rm F}$ stands for $k_{{\rm F}\uparrow}$ or $k_{{\rm F}\downarrow}$). Here we show that the same effective field theory method also makes it easy to compute the order $(k_{\rm F}a_0)^2$ correction to this result.	Translated: 2023-03-20 03:20:39 Published: 2021-08-14
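For reference, here is the classic unpolarized result the abstract re-derives to order $(k_{\rm F}a_0)^2$, the Huang-Yang / Lee-Yang expansion. It is written in units with $\hbar = 1$, with $k_{\rm F}$ fixed by the density through $n = k_{\rm F}^3/(3\pi^2)$ for two spin states. The polarized generalization derived in the paper is not reproduced here.

$$
\frac{E}{N} = \frac{k_{\rm F}^2}{2m}\left[\frac{3}{5} + \frac{2}{3\pi}\,k_{\rm F}a_0 + \frac{4\,(11 - 2\ln 2)}{35\pi^2}\,(k_{\rm F}a_0)^2 + \dots\right]
$$

The first term is the free Fermi gas energy. The second is the first-order (Hartree-Fock) contribution. The third is the second-order correction.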
# BenchENAS: A Benchmarking Platform for Evolutionary Neural Architecture Search ( http://arxiv.org/abs/2108.03856v2 ) License: see link	Xiangning Xie, Yuqiao Liu, Yanan Sun, Gary G. Yen, Bing Xue and Mengjie Zhang	Neural architecture search (NAS), which automatically designs the architectures of deep neural networks, has achieved breakthrough success over many applications in the past few years. Among different classes of NAS methods, evolutionary computation based NAS (ENAS) methods have recently gained much attention. Unfortunately, the issues of fair comparisons and efficient evaluations have hindered the development of ENAS. The current benchmark architecture datasets designed for fair comparisons only provide the datasets, not the ENAS algorithms or the platform to run the algorithms. The existing efficient evaluation methods are either not suitable for the population-based ENAS algorithm or are too complex to use. This paper develops a platform named BenchENAS to address these issues. BenchENAS aims to achieve fair comparisons by running different algorithms in the same environment and with the same settings. To achieve efficient evaluation in a common lab environment, BenchENAS designs a parallel component and a cache component with high maintainability. Furthermore, BenchENAS is easy to install and highly configurable and modular, which brings benefits in good usability and easy extensibility. The paper conducts efficient comparison experiments on eight ENAS algorithms with high GPU utilization on this platform. The experiments validate that the fair comparison issue does exist, and BenchENAS can alleviate this issue. A website has been built to promote BenchENAS at https://benchenas.com, where interested researchers can obtain the source code and documentation of BenchENAS for free.	Translated: 2023-03-18 23:42:04 Published: 2021-08-14
# Deformed Explicitly Correlated Gaussians ( http://arxiv.org/abs/2108.04859v2 ) License: see link	Matthew Beutel, Alexander Ahrens, Chenhang Huang, Yasuyuki Suzuki and Kalman Varga	Deformed correlated Gaussian basis functions are introduced and their matrix elements are calculated. These basis functions can be used to solve problems with nonspherical potentials. One example of such a potential is the dipole self-interaction term in the Pauli-Fierz Hamiltonian. Examples are presented showing the accuracy and necessity of deformed Gaussian basis functions to accurately solve light-matter coupled systems in cavity QED.	Translated: 2023-03-18 21:09:02 Published: 2021-08-14
# LILLIPUT: A Lightweight Low-Latency Lookup-Table Based Decoder for Near-term Quantum Error Correction ( http://arxiv.org/abs/2108.06569v1 ) License: see link	Poulami Das, Aditya Locharla, Cody Jones	The error rates of quantum devices are orders of magnitude higher than what is needed to run most quantum applications. To close this gap, Quantum Error Correction (QEC) encodes logical qubits and distributes information using several physical qubits. By periodically executing a syndrome extraction circuit on the logical qubits, information about errors (called the syndrome) is extracted while running programs. A decoder uses these syndromes to identify and correct errors in real time, which is required to use feedback implemented in quantum algorithms. Unfortunately, software decoders are slow, and hardware decoders are fast but less accurate. Thus, almost all QEC studies so far have relied on offline decoding. To enable real-time decoding in near-term QEC, we propose LILLIPUT, a Lightweight Low Latency Look-Up Table decoder. LILLIPUT consists of two parts. First, it translates syndromes into error detection events that index into a Look-Up Table (LUT) whose entry provides the error information in real time. Second, it programs the LUTs with error assignments for all possible error events by running a software decoder offline. LILLIPUT tolerates an error on any operation in the quantum hardware, including gates and measurement, and the number of tolerated errors grows with the size of the code. It requires less than 7% of the logic on off-the-shelf FPGAs, which allows it to be easily integrated alongside the control and readout circuits in existing systems. LILLIPUT incurs a latency of a few nanoseconds and enables real-time decoding. We also propose Compressed LUTs (CLUTs) to reduce the memory needed by LILLIPUT. By exploiting the fact that not all error events are equally likely and only storing data for the most probable error events, CLUTs reduce the memory needed by up to 107x (from 148 MB to 1.38 MB) without degrading accuracy.	Translated: 2023-03-18 13:04:27 Published: 2021-08-14
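The two-part design above (offline table construction, then online lookup) can be sketched on a toy code. The sketch assumes a 3-qubit bit-flip repetition code with parity-check matrix `H`, data-qubit errors only, and a single round, so detection events taken against an all-zero reference round equal the syndrome. The actual LILLIPUT tables are indexed by detection events over several rounds and also handle measurement errors. That is not modeled here, and every function name is hypothetical. A brute-force minimum-weight search stands in for the offline software decoder.

```python
from itertools import combinations


def syndrome(H, error):
    """Parity of each check (row of H) over the flipped qubits in `error`."""
    return tuple(sum(h & e for h, e in zip(row, error)) % 2 for row in H)


def build_lut(H, n_qubits, max_weight):
    """Offline step: map every reachable syndrome to a lowest-weight error.

    Errors are enumerated in order of increasing weight, so the first error
    seen for a syndrome is a minimum-weight explanation of it.
    """
    lut = {}
    for w in range(max_weight + 1):
        for flipped in combinations(range(n_qubits), w):
            error = tuple(1 if q in flipped else 0 for q in range(n_qubits))
            lut.setdefault(syndrome(H, error), error)
    return lut


def detection_events(prev_round, curr_round):
    """A detection event fires where a check's outcome changed between rounds."""
    return tuple(a ^ b for a, b in zip(prev_round, curr_round))


def decode(lut, events):
    """Online step: a single table lookup (None if the entry was not stored)."""
    return lut.get(events)
```

The online step is one memory access, which is why a LUT can meet nanosecond latencies. The cost is table size, which grows with the number of checks and rounds. That growth is the motivation the abstract gives for storing only the most probable entries (CLUTs).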
# PaDGAN: A Generative Adversarial Network for Performance Augmented Diverse Designs ( http://arxiv.org/abs/2002.11304v5 ) License: see link	Wei Chen, Faez Ahmed	Deep generative models are proven to be a useful tool for automatic design synthesis and design space exploration. When applied in engineering design, existing generative models face three challenges: 1) generated designs lack diversity and do not cover all areas of the design space, 2) it is difficult to explicitly improve the overall performance or quality of generated designs, and 3) existing models generally do not generate novel designs, outside the domain of the training data. In this paper, we simultaneously address these challenges by proposing a new Determinantal Point Process-based loss function for probabilistic modeling of diversity and quality. With this new loss function, we develop a variant of the Generative Adversarial Network, named "Performance Augmented Diverse Generative Adversarial Network" or PaDGAN, which can generate novel high-quality designs with good coverage of the design space. Using three synthetic examples and one real-world airfoil design example, we demonstrate that PaDGAN can generate diverse and high-quality designs. In comparison to a vanilla Generative Adversarial Network, on average, it generates samples with a 28% higher mean quality score with larger diversity and without the mode collapse issue. Unlike typical generative models that usually generate new designs by interpolating within the boundary of training data, we show that PaDGAN expands the design space boundary outside the training data towards high-quality regions. The proposed method is broadly applicable to many tasks including design space exploration, design optimization, and creative solution recommendation.	Translated: 2022-12-28 14:06:20 Published: 2021-08-14
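The core idea of a Determinantal Point Process (DPP) loss is that the determinant of a quality-weighted similarity kernel is large only when a batch is both high-quality and diverse. A minimal NumPy sketch follows. It assumes an RBF similarity kernel with bandwidth `sigma`, positive quality scores `q`, and normalization by batch size. These are illustrative choices that may differ from the exact PaDGAN loss, and the function name is hypothetical.

```python
import numpy as np


def dpp_quality_diversity_loss(X, q, sigma=1.0):
    """-(1/B) * log det(L) with L = diag(q) S diag(q).

    X: (B, d) batch of generated designs; q: (B,) positive quality scores;
    S: RBF similarity kernel. Larger det(L) means the batch is both
    high-quality (large q) and diverse (S far from singular).
    """
    X = np.asarray(X, dtype=float)
    q = np.asarray(q, dtype=float)
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    S = np.exp(-sq_dists / (2.0 * sigma ** 2))
    L = q[:, None] * S * q[None, :]
    sign, logdet = np.linalg.slogdet(L)
    if sign <= 0:
        return np.inf  # degenerate batch, e.g. duplicate designs
    return -logdet / len(q)
```

Because $\det L = \left(\prod_i q_i\right)^2 \det S$, the loss splits into a quality term and a diversity term. In GAN training, a term like this would be added to the generator loss with a weight.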
# MultiMBNN: Matched and Balanced Causal Inference with Neural Networks ( http://arxiv.org/abs/2004.13446v4 ) License: see link	Ankit Sharma, Garima Gupta, Ranjitha Prasad, Arnab Chatterjee, Lovekesh Vig, Gautam Shroff	Causal inference (CI) in observational studies has received a lot of attention in healthcare, education, ad attribution, policy evaluation, etc. Confounding is a typical hazard, where the context affects both the treatment assignment and the response. In a multiple-treatment scenario, we propose the neural-network-based MultiMBNN, where we overcome confounding by employing generalized propensity score based matching and learning balanced representations. We benchmark the performance on synthetic and real-world datasets using PEHE and the mean absolute percentage error over ATE as metrics. MultiMBNN outperforms state-of-the-art CI algorithms such as TARNet and Perfect Match (PM).	Translated: 2022-12-08 23:52:18 Published: 2021-08-14
# Semantic-driven Colorization ( http://arxiv.org/abs/2006.07587v3 ) License: see link	Man M. Ho, Lu Zhang, Alexander Raake, Jinjia Zhou	Recent colorization works implicitly predict the semantic information while learning to colorize black-and-white images. Consequently, the generated colors are more prone to overflow, and the semantic faults are invisible. When humans colorize, our brains first detect and recognize the objects in the photo, then imagine their plausible colors based on many similar objects we have seen in real life, and finally colorize them, as described in the teaser. In this study, we simulate this human-like process so that our network first learns to understand the photo and then colorizes it. Thus, our work can provide plausible colors at a semantic level. In addition, the semantic information of the learned model becomes understandable and interactive. We also show that Instance Normalization is a missing ingredient for colorization, and we re-design the inference flow of U-Net to have two streams of data, providing an appropriate way of normalizing the feature maps from the black-and-white image and its semantic map. As a result, our network can provide plausible colors competitive with typical colorization works for specific objects.	Translated: 2022-11-21 21:09:09 Published: 2021-08-14
# ResRep: Lossless CNN Pruning via Decoupling Remembering and Forgetting ( http://arxiv.org/abs/2007.03260v4 ) License: see link	Xiaohan Ding, Tianxiang Hao, Jianchao Tan, Ji Liu, Jungong Han, Yuchen Guo, Guiguang Ding	We propose ResRep, a novel method for lossless channel pruning (a.k.a. filter pruning), which slims down a CNN by reducing the width (number of output channels) of convolutional layers. Inspired by the neurobiology research about the independence of remembering and forgetting, we propose to re-parameterize a CNN into the remembering parts and forgetting parts, where the former learn to maintain the performance and the latter learn to prune. Via training with regular SGD on the former but a novel update rule with penalty gradients on the latter, we realize structured sparsity. Then we equivalently merge the remembering and forgetting parts into the original architecture with narrower layers. In this sense, ResRep can be viewed as a successful application of Structural Re-parameterization. Such a methodology distinguishes ResRep from the traditional learning-based pruning paradigm that applies a penalty on parameters to produce sparsity, which may suppress the parameters essential for the remembering. ResRep slims down a standard ResNet-50 with 76.15% accuracy on ImageNet to a narrower one with only 45% FLOPs and no accuracy drop, which is the first to achieve lossless pruning with such a high compression ratio. The code and models are at https://github.com/DingXiaoH/ResRep.	Translated: 2022-11-12 18:31:11 Published: 2021-08-14
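The "equivalent merge" step works because a layer followed by a channel-mixing compactor is still a single linear map. The sketch below uses a dense-layer analog. It assumes the layer is `W @ x + b`, followed by a square compactor `Q` whose pruned rows have been driven to (near) zero, and it ignores batch normalization, which would first be folded into `W` and `b`. For a convolution followed by a 1x1 compactor, the same algebra applies to the kernel reshaped to `(out_channels, -1)`. The function name and tolerance are hypothetical.

```python
import numpy as np


def merge_compactor(W, b, Q, tol=1e-8):
    """Fold a channel-mixing compactor Q into the preceding linear layer.

    Original computation: y = Q @ (W @ x + b). Rows of Q whose norm has been
    driven to ~0 yield output channels that are always zero, so they are
    dropped. Returns the narrower (W', b') and the boolean mask of kept rows.
    """
    keep = np.linalg.norm(Q, axis=1) > tol
    Qk = Q[keep]
    return Qk @ W, Qk @ b, keep
```

The merged layer has as many output channels as there are surviving compactor rows, which is exactly the narrower architecture the abstract describes.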
# Identification of Abnormal States in Videos of Ants Undergoing Social Phase Change ( http://arxiv.org/abs/2009.08626v2 ) License: see link	Taeyeong Choi, Benjamin Pyenson, Juergen Liebig, Theodore P. Pavlic	Biology is both an important application area and a source of motivation for development of advanced machine learning techniques. Although much attention has been paid to large and complex data sets resulting from high-throughput sequencing, advances in high-quality video recording technology have begun to generate similarly rich data sets requiring sophisticated techniques from both computer vision and time-series analysis. Moreover, just as studying gene expression patterns in one organism can reveal general principles that apply to other organisms, the study of complex social interactions in an experimentally tractable model system, such as a laboratory ant colony, can provide general principles about the dynamics of other social groups. Here, we focus on one such example from the study of reproductive regulation in small laboratory colonies of more than 50 Harpegnathos ants. These ants can be artificially induced to begin a ~20 day process of hierarchy reformation. Although the conclusion of this process is conspicuous to a human observer, it remains unclear which behaviors during the transient period are contributing to the process. To address this issue, we explore the potential application of One-class Classification (OC) to the detection of abnormal states in ant colonies for which behavioral data is only available for the normal societal conditions during training. Specifically, we build upon the Deep Support Vector Data Description (DSVDD) and introduce the Inner-Outlier Generator (IO-GEN) that synthesizes fake "inner outlier" observations during training that are near the center of the DSVDD data description. We show that IO-GEN increases the reliability of the final OC classifier relative to other DSVDD baselines. This method can be used to screen video frames for which additional human observation is needed.	Translated: 2022-10-17 02:51:56 Published: 2021-08-14
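Deep SVDD trains an encoder so that embeddings of normal data cluster around a fixed center $c$. It then scores a new sample by its squared distance to $c$. The minimal sketch below shows only the scoring rule. It assumes embeddings have already been computed, and it omits encoder training and IO-GEN's synthetic inner outliers. The 95% quantile threshold is an arbitrary illustrative choice, and the function names are hypothetical.

```python
import numpy as np


def fit_center_and_threshold(train_emb, quantile=0.95):
    """Deep-SVDD-style one-class rule on precomputed embeddings.

    The center c is the mean embedding of normal training data; the anomaly
    score of a sample is its squared distance to c.
    """
    c = train_emb.mean(axis=0)
    scores = np.sum((train_emb - c) ** 2, axis=1)
    return c, np.quantile(scores, quantile)


def flag_abnormal(emb, c, threshold):
    """True for samples whose squared distance to c exceeds the threshold."""
    return np.sum((emb - c) ** 2, axis=1) > threshold
```

In a screening workflow like the one the abstract describes, frames flagged by such a rule would be the ones passed on for additional human observation.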
# より広いニューラルネットワークは、敵のロバスト性に役立つか? Do Wider Neural Networks Really Help Adversarial Robustness? ( http://arxiv.org/abs/2010.01279v3 ) ライセンス: Link先を確認	Boxi Wu and Jinghui Chen and Deng Cai and Xiaofei He and Quanquan Gu	(参考訳) 敵の訓練は、敵の例に対する強力な防御手段である。以前の実験結果から、敵のトレーニングはパフォーマンスを改善するためにより広いネットワークを必要とすることが示唆された。しかし、ニューラルネットワークの幅がモデルロバスト性にどのように影響するかは、いまだ解明されていない。本稿では,ネットワーク幅とモデルロバスト性との関係を慎重に検討する。具体的には、モデルロバスト性は、ロバスト正規化パラメータ $\lambda$ によって制御される自然精度と摂動安定性のトレードオフと密接に関連していることを示す。同じ$\lambda$で、より広いネットワークはより優れた自然な精度を実現することができるが、摂動安定性が悪くなり、モデル全体の堅牢性が悪化する可能性がある。この現象の起源を理解するため、摂動安定性はネットワークの局所リプシッツ性とも関係している。ニューラルネットワークカーネルの最近の結果を活用することで、より広いネットワークが摂動安定性を悪くする傾向があることを示す。私たちの分析は 1)小型ネットワーク上で最初に$\lambda$を微調整し,それをモデルトレーニングに直接使用するという一般的な戦略は,モデルの堅牢性を低下させる可能性がある。 2) より広いモデルの堅牢性の可能性を完全に解き放つためには、$\lambda$を適切に拡大する必要がある。最後に、ワイドモデル上で$\lambda$を適応的に拡大し、チューニング時間を著しく短縮する新しい Width Adjusted Regularization (WAR) 法を提案する。 Adversarial training is a powerful type of defense against adversarial examples. Previous empirical results suggest that adversarial training requires wider networks for better performances. However, it remains elusive how neural network width affects model robustness. In this paper, we carefully examine the relationship between network width and model robustness. Specifically, we show that the model robustness is closely related to the tradeoff between natural accuracy and perturbation stability, which is controlled by the robust regularization parameter $\lambda$. With the same $\lambda$, wider networks can achieve better natural accuracy but worse perturbation stability, leading to a potentially worse overall model robustness. To understand the origin of this phenomenon, we further relate the perturbation stability with the network's local Lipschitzness. By leveraging recent results on neural tangent kernels, we theoretically show that wider networks tend to have worse perturbation stability. 
Our analyses suggest that: 1) the common strategy of first fine-tuning $\lambda$ on small networks and then directly using it for wide model training could lead to deteriorated model robustness; 2) one needs to properly enlarge $\lambda$ to fully unleash the robustness potential of wider models. Finally, we propose a new Width Adjusted Regularization (WAR) method that adaptively enlarges $\lambda$ on wide models and significantly reduces tuning time.	翻訳日:2022-10-11 08:35:54 公開日:2021-08-14
# 効率的なグローバルローカル特徴表現と局所時間集約による歩行認識 Gait Recognition via Effective Global-Local Feature Representation and Local Temporal Aggregation ( http://arxiv.org/abs/2011.01461v2 ) ライセンス: Link先を確認	Beibei Lin, Shunli Zhang and Xin Yu	(参考訳) 歩行認識は最も重要な生体計測技術の1つであり、多くの分野で応用されている。最近の歩行認識フレームワークは、人間のグローバル外観または地域から抽出された記述子によって、それぞれの歩行フレームを表現する。しかし、グローバル情報に基づく表現はしばしば歩行フレームの詳細を無視するが、地域ベースの記述子は近隣地域の関係を捉えることができないため、識別性が低下する。本稿では,歩行認識のための識別的特徴表現を実現するための特徴抽出・融合フレームワークを提案する。この目標に向けて、グローバルビジュアル情報とローカル領域の詳細の両方を利用し、グローバル・ローカル機能抽出器(GLFE)を開発します。特に、当社のGLFEモジュールは、新たに設計された複数のグローバルおよびローカル畳み込み層(GLConv)で構成され、グローバルおよびローカル機能を原則的にアンサンブルします。さらに,時間分解能を低減し,より高い空間分解能を得るために,空間情報をさらに保存するための新しい操作である局所時間凝集(LTA)を提案する。 GLFEとLTAの助けを借りて,視覚特徴の判別性を大幅に改善し,歩行認識性能を向上した。大規模実験により,提案手法が2つの一般的なデータセットにおける最先端の歩行認識法を上回っていることを示す。 Gait recognition is one of the most important biometric technologies and has been applied in many fields. Recent gait recognition frameworks represent each gait frame by descriptors extracted from either global appearances or local regions of humans. However, the representations based on global information often neglect the details of the gait frame, while local-region-based descriptors cannot capture the relations among neighboring regions, thus reducing their discriminativeness. In this paper, we propose a novel feature extraction and fusion framework to achieve discriminative feature representations for gait recognition. Towards this goal, we take advantage of both global visual information and local region details and develop a Global and Local Feature Extractor (GLFE). Specifically, our GLFE module is composed of multiple newly designed global and local convolutional layers (GLConv) that ensemble global and local features in a principled manner. Furthermore, we present a novel operation, namely Local Temporal Aggregation (LTA), to further preserve the spatial information by reducing the temporal resolution to obtain higher spatial resolution. 
With the help of our GLFE and LTA, our method significantly improves the discriminativeness of our visual features, thus improving the gait recognition performance. Extensive experiments demonstrate that our proposed method outperforms state-of-the-art gait recognition methods on two popular datasets.	翻訳日:2022-09-30 05:10:03 公開日:2021-08-14
# 特徴の模倣による知識の蒸留 Distilling Knowledge by Mimicking Features ( http://arxiv.org/abs/2011.01424v2 ) ライセンス: Link先を確認	Guo-Hua Wang, Yifan Ge, Jianxin Wu	(参考訳) 知識蒸留(KD)は、高容量ネットワーク(教師)の助けを借りて効率的なネットワーク(学生)を訓練する一般的な方法である。伝統的な手法では、教師のソフトロジットを学生ネットワークを訓練するための余分な監督として使用する。本稿では,学生に最後から2番目の層における教師の特徴を模倣させることがより有利であると主張する。生徒は教師機能から直接より効果的な情報を学べるだけでなく、機能の模倣はソフトマックス層なしで訓練された教師にも応用できる。実験の結果、従来のKDよりも高い精度が得られることがわかった。さらに機能模倣を容易にするために,特徴ベクトルを大きさと方向に分解する。教師は生徒の特徴の大きさにより多くの自由を与え、生徒は特徴の方向性を模倣することにもっと注意を払うべきだと論じている。この要件を満たすために,LSH(Locality-sensitive hashing)に基づく損失項を提案する。この新たな損失の助けを借りて、本手法は、機能方向をより正確に模倣し、特徴量の制約を緩和し、最先端の蒸留精度を達成する。 LSHが特徴方向模倣をいかに促進するかの理論解析を行い、特徴模倣をマルチラベル認識と物体検出にさらに拡張する。 Knowledge distillation (KD) is a popular method to train efficient networks ("student") with the help of high-capacity networks ("teacher"). Traditional methods use the teacher's soft logits as extra supervision to train the student network. In this paper, we argue that it is more advantageous to make the student mimic the teacher's features in the penultimate layer. Not only can the student directly learn more effective information from the teacher's features, but feature mimicking can also be applied to teachers trained without a softmax layer. Experiments show that it can achieve higher accuracy than traditional KD. To further facilitate feature mimicking, we decompose a feature vector into its magnitude and direction. We argue that the teacher should give more freedom to the student feature's magnitude and let the student pay more attention to mimicking the feature direction. To meet this requirement, we propose a loss term based on locality-sensitive hashing (LSH). With the help of this new loss, our method indeed mimics feature directions more accurately, relaxes constraints on feature magnitudes, and achieves state-of-the-art distillation accuracy. 
We provide theoretical analyses of how LSH facilitates feature direction mimicking, and further extend feature mimicking to multi-label recognition and object detection.	翻訳日:2022-09-30 04:17:41 公開日:2021-08-14
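The magnitude/direction split and an LSH-style direction loss can be sketched as follows. This is a minimal illustration under stated assumptions and not the paper's exact loss. It assumes Gaussian random hyperplanes (sign-random-projection LSH) and uses the sign of each teacher projection as a binary target. Binary cross-entropy is then applied to the corresponding student projection. All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def magnitude_direction(f: Sequence[float]) -> Tuple[float, List[float]]:
    """Split a nonzero feature vector into its L2 magnitude and unit direction."""
    norm = math.sqrt(dot(f, f))
    return norm, [x / norm for x in f]


def random_hyperplanes(n_planes: int, dim: int, seed: int = 0) -> List[List[float]]:
    """Gaussian random hyperplanes, the classic sign-random-projection LSH family."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]


def bce_with_logits(logit: float, target: float) -> float:
    """Numerically stable binary cross-entropy on a raw logit."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def lsh_mimic_loss(student: Sequence[float], teacher: Sequence[float],
                   planes: Sequence[Sequence[float]]) -> float:
    """Average BCE between the teacher's hash bits and the student's projections.

    Each target is 1 if the teacher lies on the positive side of a hyperplane,
    else 0, so the targets depend only on the teacher's direction.
    """
    total = 0.0
    for w in planes:
        target = 1.0 if dot(w, teacher) > 0 else 0.0
        total += bce_with_logits(dot(w, student), target)
    return total / len(planes)


planes = random_hyperplanes(n_planes=16, dim=3, seed=1)
teacher = [1.0, 2.0, 3.0]
aligned = lsh_mimic_loss(teacher, teacher, planes)
opposite = lsh_mimic_loss([-x for x in teacher], teacher, planes)
print(aligned, opposite)  # the aligned student has the lower loss
```

Rescaling the teacher feature does not change the hash targets. In this sketch, however, a larger student magnitude along the correct direction still lowers the loss somewhat. How the paper handles that effect is not reproduced here.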
# Webコンテンツのブランド一貫性向上のための統合的アプローチ:モデリング,分析,勧告 An Integrated Approach for Improving Brand Consistency of Web Content: Modeling, Analysis and Recommendation ( http://arxiv.org/abs/2011.09754v3 ) ライセンス: Link先を確認	Soumyadeep Roy, Shamik Sural, Niyati Chhaya, Anandhavelu Natarajan, Niloy Ganguly	(参考訳) 消費者依存型(ビジネス・ツー・コンシューマー)組織は、会社のブランドパーソナリティ(ブランドパーソナリティ)と呼ばれる、人的品質のセットを保有する傾向にある。この知覚は、組織によって作成された広告、ブログ、雑誌のような形で、コンテンツを通じて消費者に印象を与えます。一貫性のあるブランドは、規則性と一般的なパターンに対する親和性が発達するにつれて、信頼を生み出し、顧客を維持するでしょう。しかし、ブランドの一貫したメッセージトーンを維持することは、デジタルマーケティング時代の最先端を維持するために、作成し、インターネットにプッシュする必要があるコンテンツの量が仮想的に爆発するにつれて、ますます難しくなっている。問題の深さを理解するために、約650社の約30万のWebページコンテンツを収集した。内容の言語的特徴を考慮した特徴特化分類モデルを開発した。分類器は、企業のミッションやビジョンと一致しないWeb記事を自動的に識別し、一貫性を維持することができない条件を見つけるのに役立ちます。ブランドの不整合問題に対処するために,web 記事のパーソナリティに一貫性を持たせるために,変更が必要な上位3つの文を出力可能な文ランキングシステムを開発した。 A consumer-dependent (business-to-consumer) organization tends to present itself as possessing a set of human qualities, which is termed the brand personality of the company. This perception is impressed upon the consumer through the content produced by the organization, be it advertisements, blogs, or magazines. A consistent brand will generate trust and retain customers over time as they develop an affinity towards regularity and common patterns. However, maintaining a consistent messaging tone for a brand has become more challenging with the virtual explosion in the amount of content that must be authored and pushed to the Internet to maintain an edge in the era of digital marketing. To understand the depth of the problem, we collect the content of around 300K web pages from around 650 companies. We develop trait-specific classification models by considering the linguistic features of the content. The classifier automatically identifies web articles that are not consistent with the mission and vision of a company and further helps us discover the conditions under which consistency cannot be maintained. 
To address the brand inconsistency issue, we then develop a sentence ranking system that outputs the top three sentences that need to be changed for making a web article more consistent with the company's brand personality.	翻訳日:2022-09-23 21:27:53 公開日:2021-08-14
# RIN: 単一画像によるテクスチャ付き人体モデルの復元と模倣 RIN: Textured Human Model Recovery and Imitation with a Single Image ( http://arxiv.org/abs/2011.12024v4 ) ライセンス: Link先を確認	Haoxi Ran, Guangfu Wang, Li Lu	(参考訳) 人間の模倣は、GANの人間のポーズと身体の内容を分離する能力によって、最近話題になっている。しかし,最新の手法では3D情報にはほとんど注目せず,自己遮蔽を避けるためには大量の入力画像が必要となる。本稿では,1枚の画像からテクスチャ付き3Dモデルを再構成し,生成したモデルを用いて被写体を模倣する,新しいボリュームベースフレームワークRINを提案する。具体的には、人間のテクスチャのほとんどを推定するために、U-Netのような前面から背面への変換ネットワークを提案する。前後の両方の画像を入力すると、テクスチャ化されたボリュームリカバリモジュールによって、ボリュームの人間を色づけすることができます。 3Dポーズのシーケンスは、ボリュームからボリュームへの変換タスクとして、フローブルディケンタングルネットワークを介して色付きボリュームをガイドする。トレーニング中に2次元平面にボリュームを投影するために, 微分可能な深度対応レンダラーを設計する。実験の結果,人間の模倣には容積モデルが適しており,バックビューはネットワークを用いて確実に推定できることがわかった。 2Dポーズやセマンティクスマップに基づく先行作業は、人間の不安定な外観では失敗することが多いが、我々のフレームワークは、マルチビュー入力から想像されるものと競合する具体的な結果を生み出すことができる。 Human imitation has become topical recently, driven by GANs' ability to disentangle human pose and body content. However, the latest methods pay little attention to 3D information, and they need a large number of input images to avoid self-occlusion. In this paper, we propose RIN, a novel volume-based framework for reconstructing a textured 3D model from a single picture and imitating a subject with the generated model. Specifically, to estimate most of the human texture, we propose a U-Net-like front-to-back translation network. With both front and back images as input, the textured volume recovery module allows us to color a volumetric human. A sequence of 3D poses then guides the colored volume via Flowable Disentangle Networks as a volume-to-volume translation task. To project volumes to a 2D plane during training, we design a differentiable depth-aware renderer. Our experiments demonstrate that our volume-based model is adequate for human imitation, and the back view can be estimated reliably using our network. 
While prior works based on either 2D poses or semantic maps often fail when a human's appearance is unstable, our framework can still produce concrete results that are competitive with those imagined from multi-view input.	翻訳日:2022-09-21 12:27:38 公開日:2021-08-14
# (参考訳) siam: ディープニューラルネットワークのためのチップレットベースのスケーラブルなインメモリアクセラレーション SIAM: Chiplet-based Scalable In-Memory Acceleration with Mesh for Deep Neural Networks ( http://arxiv.org/abs/2108.08903v1 ) ライセンス: CC BY 4.0	Gokul Krishnan, Sumit K. Mandal, Manvitha Pannala, Chaitali Chakrabarti, Jae-sun Seo, Umit Y. Ogras, Yu Cao	(参考訳) ディープラーニングのためのモノリシックなチップ上でのインメモリコンピューティング(IMC)は、モデルサイズの増加に伴い、領域、収量、オンチップの相互接続コストに劇的な課題に直面している。 2.5d統合あるいはchipletベースのアーキテクチャは、複数の小さなチップ(チップレットなど)を相互接続して大規模なコンピューティングシステムを形成し、モノリシックなiccアーキテクチャを超えて実現可能なソリューションを示し、大規模ディープラーニングモデルを加速する。本稿では,チップレットベースのIMCアーキテクチャの性能評価を行うベンチマークシミュレータSIAMを提案し,IMCアーキテクチャ設計におけるそのようなパラダイムシフトの可能性を探る。 SIAMはデバイス、回路、アーキテクチャ、ネットワークオンチップ(NoC)、ネットワークオンパッケージ(NoP)、DRAMアクセスモデルを統合し、エンドツーエンドシステムを実現する。 SIAMは広範囲のディープニューラルネットワーク(DNN)をサポートし、さまざまなネットワーク構造や構成に合わせてカスタマイズ可能で、効率的な設計空間探索が可能である。 CIFAR-10, CIFAR-100, ImageNetデータセットを用いて, 最先端DNNのベンチマークを行い, SIAMの柔軟性, スケーラビリティ, シミュレーション速度を示す。さらに,シリコーンによるシミュレーション結果をSIMBAでキャリブレーションする。 SIAMを通じて得られたチップレットベースのIMCアーキテクチャは、Nvidia V100やT4 GPUと比較して、ImageNetデータセット上でResNet-50のエネルギー効率が130$\times$と72$\times$改善されている。 In-memory computing (IMC) on a monolithic chip for deep learning faces dramatic challenges on area, yield, and on-chip interconnection cost due to the ever-increasing model sizes. 2.5D integration or chiplet-based architectures interconnect multiple small chips (i.e., chiplets) to form a large computing system, presenting a feasible solution beyond a monolithic IMC architecture to accelerate large deep learning models. This paper presents a new benchmarking simulator, SIAM, to evaluate the performance of chiplet-based IMC architectures and explore the potential of such a paradigm shift in IMC architecture design. SIAM integrates device, circuit, architecture, network-on-chip (NoC), network-on-package (NoP), and DRAM access models to realize an end-to-end system. 
SIAM is scalable in its support of a wide range of deep neural networks (DNNs), customizable to various network structures and configurations, and capable of efficient design space exploration. We demonstrate the flexibility, scalability, and simulation speed of SIAM by benchmarking different state-of-the-art DNNs with CIFAR-10, CIFAR-100, and ImageNet datasets. We further calibrate the simulation results with a published silicon result, SIMBA. The chiplet-based IMC architecture obtained through SIAM shows 130$\times$ and 72$\times$ improvements in energy efficiency for ResNet-50 on the ImageNet dataset compared to Nvidia V100 and T4 GPUs, respectively.	翻訳日:2021-08-29 15:10:28 公開日:2021-08-14
# (参考訳) TRAPDOOR: 機械学習に基づくゲノム解析におけるデータセットバイアス検出のためのバックドアの再利用 TRAPDOOR: Repurposing backdoors to detect dataset bias in machine learning-based genomic analysis ( http://arxiv.org/abs/2108.10132v1 ) ライセンス: CC BY 4.0	Esha Sarkar, Michail Maniatakos	(参考訳) 機械学習(ML)は、画像、音声、テキスト、データ分析など、いくつかのアプリケーションで前例のないパフォーマンスを達成した。遺伝子変異(ゲノミクス)の根底にあるパターンを理解するのにMLを使うことは、診断の落とし穴を克服するだけでなく、がんのような生命を脅かす疾患の治療を設計する上でも、はるかに大きな結果をもたらす。 MLアルゴリズムの成功と持続性は、収集およびトレーニングに使用されるデータの質と多様性に依存する。このようなデータセットにおけるグループ(民族グループ、性別グループなど)の過小表現は、特定のグループの不正確な予測につながる可能性がある。本研究では,ニューラルネットワークのバックドア(バックドア)という悪質な目的のために提案された手法を再提案し,バイアス付きデータセットの同定手法であるTRAPDOORを提案する。我々は、病院、共同プロジェクト、研究機関からセンシティブなグループに対するバイアスを意識せずに中央クラウドにデータがもたらされるゲノミクスサプライチェーンの典型的な協調学習設定を検討する。そこで本研究では,ゲノム応用のためのMLバックドアを用いた真の性能を損なうことなく,集団データの潜在的なバイアス情報を漏洩させる手法を開発した。実世界のがんデータセットを用いて、すでに白人に対して存在するバイアスを分析し、データセットにバイアスを人工的に導入し、実験結果により、TRAPDOORが100%精度でデータセットバイアスを検出できること、さらに小さな誤差で割合を回復することでバイアスの程度を抽出できることが示されている。 Machine Learning (ML) has achieved unprecedented performance in several applications including image, speech, text, and data analysis. The use of ML to understand underlying patterns in gene mutations (genomics) has far-reaching results, not only in overcoming diagnostic pitfalls, but also in designing treatments for life-threatening diseases like cancer. The success and sustainability of ML algorithms depend on the quality and diversity of data collected and used for training. Under-representation of groups (ethnic groups, gender groups, etc.) in such a dataset can lead to inaccurate predictions for certain groups, which can further exacerbate systemic discrimination issues. In this work, we propose TRAPDOOR, a methodology for identifying biased datasets by repurposing a technique that has mostly been proposed for nefarious purposes: neural network backdoors. 
We consider a typical collaborative learning setting of the genomics supply chain, where data may flow from hospitals, collaborative projects, or research institutes into a central cloud without awareness of bias against a sensitive group. In this context, we develop a methodology that uses ML backdooring tailored to genomic applications to leak potential bias information about the collective data without hampering genuine performance. Using a real-world cancer dataset, we analyze both the bias that already exists toward white individuals and biases we artificially introduce into the dataset. Our experimental results show that TRAPDOOR can detect the presence of dataset bias with 100% accuracy and can also extract the extent of bias by recovering the percentage with a small error.	翻訳日:2021-08-29 14:45:58 公開日:2021-08-14
# 近接正規化サブバンド適応アルゴリズムによる音響エコーキャンセラの検討 Study of Proximal Normalized Subband Adaptive Algorithm for Acoustic Echo Cancellation ( http://arxiv.org/abs/2108.10219v1 ) ライセンス: Link先を確認	Gang Guo, Yi Yu, Rodrigo C. de Lamare, Zongsheng Zheng, Lu Lu and Qiangming Cai	(参考訳) 本稿では,比例型(proportionate)とスパース性考慮型の機構を組み合わせた,スパースなシナリオに適した新しい正規化サブバンド適応フィルタアルゴリズムを提案する。提案アルゴリズムは, 近接前向き後ろ向き分割法とソフトスレッショルド法に基づいて導出する。シミュレーションによって支援されるアルゴリズムの平均および平均二乗挙動を解析する。さらに, 平均二乗偏差の最小化に基づいて, 近接ステップにおけるしきい値パラメータの選択に対する適応的アプローチも提案する。システム同定と音響エコーキャンセラの文脈におけるシミュレーションにより,提案アルゴリズムの優位性を検証した。 In this paper, we propose a novel normalized subband adaptive filter algorithm suited for sparse scenarios that combines the proportionate and sparsity-aware mechanisms. The proposed algorithm is derived from the proximal forward-backward splitting and soft-thresholding methods. We analyze the mean and mean-square behaviors of the algorithm, and the analysis is supported by simulations. In addition, we propose an adaptive approach for choosing the thresholding parameter in the proximal step, based on minimizing the mean square deviation. Simulations in the contexts of system identification and acoustic echo cancellation verify the superiority of the proposed algorithm over its counterparts.	翻訳日:2021-08-29 12:10:05 公開日:2021-08-14
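The soft-thresholding operator mentioned above is the proximal operator of $\tau\lVert w\rVert_1$. It is defined as $S_\tau(x) = \operatorname{sign}(x)\max(|x|-\tau, 0)$ and is applied elementwise. The sketch below shows only this proximal step, on a weight vector that some gradient-type update has already produced. It does not reproduce the subband/proportionate update or the paper's adaptive rule for $\tau$, and the function names are hypothetical.

```python
from typing import List, Sequence


def soft_threshold(x: float, tau: float) -> float:
    """Proximal operator of tau*|x|: shrink toward zero by tau, zeroing small values."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


def proximal_step(w_after_gradient: Sequence[float], tau: float) -> List[float]:
    """Apply soft-thresholding elementwise to promote a sparse filter estimate."""
    return [soft_threshold(v, tau) for v in w_after_gradient]


w = proximal_step([0.5, -0.05, 0.02, -0.3], tau=0.1)
print(w)  # taps with magnitude below tau become exactly zero
```

A larger $\tau$ zeroes more taps but also biases the surviving ones toward zero. This trade-off is why choosing $\tau$ adaptively, as the abstract describes, matters.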
# FOX-NAS: 高速でオンデバイスで説明可能なニューラルアーキテクチャ検索 FOX-NAS: Fast, On-device and Explainable Neural Architecture Search ( http://arxiv.org/abs/2108.08189v1 ) ライセンス: Link先を確認	Chia-Hsiang Liu, Yu-Shin Han, Yuan-Yao Sung, Yi Lee, Hung-Yueh Chiang, Kai-Chiang Wu	(参考訳) ニューラルアーキテクチャ検索は、優れたパフォーマンスを持つニューラルネットワークを見つけることができ、ワンショットアプローチが一般的です。ワンショットアプローチは通常、重み共有を伴うスーパーネットと、アーキテクチャのパフォーマンスを予測する予測器を必要とする。しかし、従来の手法は性能予測器を生成するのに多くの時間がかかるため、非効率である。そこで本研究では,シミュレーションアニーリングと多変量回帰に基づく高速かつ説明可能な予測器からなるFOX-NASを提案する。本手法は量子化にやさしく,効率的にエッジに展開できる。異なるハードウェア上での実験は、FOX-NASモデルが他の一般的なニューラルネットワークアーキテクチャよりも優れていることを示している。例えば、FOX-NASはMobileNetV2とEfficientNet-Lite0の精度を240%、エッジCPUの40%のレイテンシで一致させる。 FOX-NASは、2020年の低消費電力コンピュータビジョンチャレンジ(LPCVC)で3位を獲得した。すべての評価結果はhttps://lpcv.ai/competitions/2020を参照。検索コードと事前学習されたモデルはhttps://github.com/great8nctu/FOX-NASでリリースされる。 Neural architecture search can discover neural networks with good performance, and One-Shot approaches are prevalent. One-Shot approaches typically require a supernet with weight sharing and predictors that predict the performance of architectures. However, previous methods take a long time to generate performance predictors and are thus inefficient. To this end, we propose FOX-NAS, which consists of fast and explainable predictors based on simulated annealing and multivariate regression. Our method is quantization-friendly and can be efficiently deployed to the edge. The experiments on different hardware show that FOX-NAS models outperform some other popular neural network architectures. For example, FOX-NAS matches MobileNetV2 and EfficientNet-Lite0 accuracy with 240% and 40% less latency on the edge CPU. FOX-NAS is the 3rd place winner of the 2020 Low-Power Computer Vision Challenge (LPCVC), DSP classification track. See all evaluation results at https://lpcv.ai/competitions/2020. Search code and pre-trained models are released at https://github.com/great8nctu/FOX-NAS.	翻訳日:2021-08-19 14:49:52 公開日:2021-08-14
# (参考訳) MRI病変分割のための未知のベンダードメインへの適応 Adapting to Unseen Vendor Domains for MRI Lesion Segmentation ( http://arxiv.org/abs/2108.06434v1 ) ライセンス: CC BY 4.0	Brandon Mac, Alan R. Moody, April Khademi	(参考訳) 機械学習モデルにおける重要な制限の1つは、トレーニング分布の領域外にあるデータのパフォーマンスの低さである。これは磁気共鳴(MR)イメージングにおける画像解析において特に当てはまり、ハードウェアとソフトウェアのバリエーションはスキャナー間の非標準強度、コントラスト、ノイズ分布を生成する。近年,合成データポイントを作成するために,領域間のデータ拡張のための画像翻訳モデルが提案されている。本稿では,ソースデータセットからターゲットデータセットへのMR画像拡張のための教師なし画像変換モデルの適用について検討する。具体的には、画像翻訳により、これらのモデルがターゲットデータセットを表す合成データポイントをどれだけうまく作成できるかを評価し、これらの合成データポイントを訓練したセグメンテーションモデルが、ターゲットデータセット上で直接訓練されたモデルのパフォーマンスに近づくかどうかを確認する。画像間の変換、スキャナーベンダー間、ラベルから画像への変換からなるデータセット間の拡張の3つの構成を検討する。その結果、ラベルから画像構成までの合成データに基づいてトレーニングされたセグメンテーションモデルが、ターゲットデータセットに直接トレーニングされたセグメンテーションモデルに最も近い性能を示した。各ターゲットベンダー(GE、Siemens、Philips)のDice係数は、合成データでのトレーニングで0.63、0.64、0.58であり、ターゲットデータセットでの直接トレーニングでは0.65、0.72、0.61であった。 One of the key limitations in machine learning models is poor performance on data that is out of the domain of the training distribution. This is especially true for image analysis in magnetic resonance (MR) imaging, as variations in hardware and software create non-standard intensities, contrasts, and noise distributions across scanners. Recently, image translation models have been proposed to augment data across domains to create synthetic data points. In this paper, we investigate the application of an unsupervised image translation model to augment MR images from a source dataset to a target dataset. Specifically, we evaluate how well these models can create synthetic data points representative of the target dataset through image translation, and whether a segmentation model trained on these synthetic data points approaches the performance of a model trained directly on the target dataset. We consider three configurations of augmentation between datasets: translation between images, between scanner vendors, and from labels to images. 
We found that the segmentation models trained on synthetic data from the labels-to-images configuration yielded the performance closest to that of the segmentation model trained directly on the target dataset. For each target vendor (GE, Siemens, Philips), the Dice coefficient score was 0.63, 0.64, and 0.58 when training on synthetic data, compared with 0.65, 0.72, and 0.61 when training directly on the target dataset.	翻訳日:2021-08-18 10:59:01 公開日:2021-08-14
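For reference, the Dice coefficient reported above is $2|P\cap T|/(|P|+|T|)$ for a predicted mask $P$ and a ground-truth mask $T$. A minimal sketch on flattened binary masks follows. Returning 1.0 for two empty masks is a convention assumed here; the abstract does not state it.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice = 2|P & T| / (|P| + |T|) for flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total


# Toy 2x2 masks flattened row by row: one overlapping lesion voxel out of two each.
print(dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0]))
```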
# (参考訳) 機械読取理解に基づく新しいエンティティ抽出法 A New Entity Extraction Method Based on Machine Reading Comprehension ( http://arxiv.org/abs/2108.06444v1 ) ライセンス: CC BY 4.0	Xiaobo Jiang, Kun He, Jiajun He and Guangyu Yan	(参考訳) エンティティ抽出は、自然言語処理において大量のテキストから情報を取得するための重要な技術である。それらの間のさらなる相互作用は、人間の読み理解の基準を満たさないため、モデルの理解が制限されるとともに、推論問題による回答(つまり対象の実体)の欠落や誤判断も制限される。 An effective MRC-based entity extraction model-MRC-I2DP, which uses the proposed gated attention-attracting mechanism to adjust the restoration of each part of the text pair, creating problems and thinking for multi-level interactive attention calculations to increase the target entity It also uses the proposed 2D probability coding module, TALU function and mask mechanism to strengthen the detection of all possible targets of the target, thereby improving the probability and accuracy of prediction. 実験により、MRC-I2DPは科学領域と公共領域の7つの分野の総合的な最先端モデルであり、F1のモデルと比較して2.1%から10.4%の性能向上を達成した。 Entity extraction is a key technology for obtaining information from massive texts in natural language processing. The further interaction between them does not meet the standards of human reading comprehension, thus limiting the understanding of the model, and also the omission or misjudgment of the answer (i.e., the target entity) due to the reasoning question. We propose an effective MRC-based entity extraction model, MRC-I2DP, which uses the proposed gated attention-attracting mechanism to adjust the restoration of each part of the text pair, creating problems and thinking for multi-level interactive attention calculations to increase the target entity. It also uses the proposed 2D probability coding module, TALU function, and mask mechanism to strengthen the detection of all possible targets, thereby improving the probability and accuracy of prediction. 
Experiments show that MRC-I2DP is an overall state-of-the-art model in 7 from the scientific and public domains, achieving an F1 improvement of 2.1% to 10.4% compared to the model.	翻訳日:2021-08-18 10:35:27 公開日:2021-08-14
# (参考訳) PTT:ポイントクラウドにおける3次元物体追跡のためのポイントトラック変換モジュール PTT: Point-Track-Transformer Module for 3D Single Object Tracking in Point Clouds ( http://arxiv.org/abs/2108.06455v1 ) ライセンス: CC BY 4.0	Jiayao Shan, Sifan Zhou, Zheng Fang, Yubo Cui	(参考訳) 3D単一物体追跡はロボティクスにとって重要な問題だ。本稿では,点群ベースの3D単一物体追跡のための,PTT(Point-Track-Transformer)と呼ばれるTransformerモジュールを提案する。 PTTモジュールには、機能埋め込み、位置符号化、自己注意機能計算のための3つのブロックが含まれている。機能埋め込みは、類似のセマンティック情報がある場合、機能を埋め込み空間に近づけることを目的としている。位置符号化は点雲の座標を高次元の識別可能な特徴に符号化するために用いられる。自己注意は、注意重みの計算によって洗練された注意特徴を生成する。さらに,PTTモジュールをオープンソースの最先端手法であるP2Bに組み込んでPTT-Netを構築する。 KITTIデータセットの実験では、当社のPTT-Netが最先端手法を顕著な差(約10%)で上回っていることが明らかになった。さらに、PTT-NetはNVIDIA 1080Ti GPUでリアルタイムパフォーマンス(約40FPS)を達成できる。私たちのコードは、https://github.com/shanjiayao/PTT でロボットコミュニティのためにオープンソース化されています。 3D single object tracking is a key issue for robotics. In this paper, we propose a transformer module called Point-Track-Transformer (PTT) for point cloud-based 3D single object tracking. The PTT module contains three blocks for feature embedding, position encoding, and self-attention feature computation. Feature embedding aims to place features closer in the embedding space if they have similar semantic information. Position encoding is used to encode coordinates of point clouds into high-dimensional distinguishable features. Self-attention generates refined attention features by computing attention weights. In addition, we embed the PTT module into the open-source state-of-the-art method P2B to construct PTT-Net. Experiments on the KITTI dataset reveal that our PTT-Net surpasses the state-of-the-art by a noticeable margin (~10%). Additionally, PTT-Net achieves real-time performance (~40 FPS) on an NVIDIA 1080Ti GPU. Our code is open-sourced for the robotics community at https://github.com/shanjiayao/PTT.	翻訳日:2021-08-18 10:10:58 公開日:2021-08-14
# (参考訳) スパースニューラルネットワークによる最適近似とその応用 Optimal Approximation with Sparse Neural Networks and Applications ( http://arxiv.org/abs/2108.06467v1 ) ライセンス: CC BY 4.0	Khay Boon Hong	(参考訳) ニューラルネットワークを格納するための接続性やメモリ要件を制限することで,関数クラスの複雑性を$L^2(\mathbb R^d)$で測定する。また,表現系を持つ近似理論は数学においてよく開発されてきたため,ニューラルネットワークを導くための可算関数の集合であるrepresentation systemを導入する。次に、基本有界定理を証明し、関数クラス自体に固有の量を示すことによって、ニューラルネットワークと表現システムの近似能力に関する情報を与える。また、表現システムによる近似に関する既存の理論をニューラルネットワークに転送し、ニューラルネットワークの実践的価値を大幅に増幅する方法を提供する。最後に,ニューラルネットワークを用いてB-スプライン関数を近似し,B-スプライン曲線を生成する。次に,レート歪み理論とウェッジレット構成を用いて,$\beta$ マンガ様関数と呼ばれるクラスの複雑性を分析する。 We use deep sparsely connected neural networks to measure the complexity of a function class in $L^2(\mathbb R^d)$ by restricting connectivity and the memory required for storing the neural networks. We also introduce representation systems, countable collections of functions used to guide neural networks, since approximation theory with representation systems is well developed in mathematics. We then prove the fundamental bound theorem, which implies that a quantity intrinsic to the function class itself gives information about the approximation ability of neural networks and representation systems. We also provide a method for transferring existing theories about approximation by representation systems to neural networks, greatly amplifying the practical value of neural networks. Finally, we use neural networks to approximate B-spline functions, which are used to generate B-spline curves. Then, we analyse the complexity of a class called $\beta$ cartoon-like functions using rate-distortion theory and wedgelet constructions.	翻訳日:2021-08-18 09:52:59 公開日:2021-08-14
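As a small concrete instance of representing a B-spline with a neural network, consider the degree-1 (hat) cardinal B-spline with knots 0, 1, 2. It is exactly a one-hidden-layer ReLU network with three units: $N(x) = \mathrm{ReLU}(x) - 2\,\mathrm{ReLU}(x-1) + \mathrm{ReLU}(x-2)$. This is an illustrative example and not the paper's construction. Higher-degree B-splines are piecewise polynomial rather than piecewise linear, so ReLU networks can only approximate them.

```python
def relu(x: float) -> float:
    return max(0.0, x)


def linear_bspline_relu(x: float) -> float:
    """Hat B-spline on knots 0, 1, 2 as a 3-unit ReLU network.

    Hidden biases are 0, -1, -2; output weights are 1, -2, 1.
    """
    return relu(x) - 2.0 * relu(x - 1.0) + relu(x - 2.0)


def linear_bspline_reference(x: float) -> float:
    """Piecewise definition of the same hat function, for comparison."""
    if 0.0 <= x <= 1.0:
        return x
    if 1.0 < x <= 2.0:
        return 2.0 - x
    return 0.0


for x in (-0.5, 0.25, 1.0, 1.5, 2.5):
    print(x, linear_bspline_relu(x), linear_bspline_reference(x))
```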
# (参考訳) ロバスト増強によるコントラスト型自己教師型シーケンスレコメンデーション Contrastive Self-supervised Sequential Recommendation with Robust Augmentation ( http://arxiv.org/abs/2108.06479v1 ) ライセンス: CC BY 4.0	Zhiwei Liu, Yongjun Chen, Jia Li, Philip S. Yu, Julian McAuley, Caiming Xiong	(参考訳) 逐次推薦(Sequential Recommendation)は、逐次的なユーザデータにおける将来のインタラクションを予測するために、動的なユーザ行動をモデル化する一連の手法である。その核となるのは、マルコフ連鎖、リカレントネットワーク、あるいは最近ではトランスフォーマーを介して、シーケンス内のアイテム間の遷移確率をモデル化することである。しかし、データスパーシリティやノイズの多いデータなど、古い問題と新しい問題の両方が残っており、特に複雑なパラメータハングリーモデルでは、そのような問題はパフォーマンスを損なう可能性がある。本稿では、これらの問題を緩和する手段として、対照的自己教師あり学習(SSL)のシーケンシャルレコメンデーションへの適用について検討する。対照的SSLは、正のペア間の合意が最大化されるラベルなしインスタンスからの拡張を構築する。離散的な性質、項目間の相関、長さ分布の歪性から、逐次的な推奨のために対照的SSLフレームワークを開発するのは困難である。この目的のために,コントラスト型自己指導型シーケンシャルレコメンデーション(CoSeRec)のための新しいフレームワークを提案する。項目相関を利用した2つの情報拡張演算子を導入し、コントラスト学習のための高品質なビューを作成する。実世界の3つのデータセットに対する実験結果から,提案手法がモデル性能の向上に有効であることを示す。実装は \url{https://github.com/YChen1993/CoSeRec} で利用可能です。 Sequential recommendation describes a set of techniques to model dynamic user behavior in order to predict future interactions in sequential user data. At their core, such approaches model transition probabilities between items in a sequence, whether through Markov chains, recurrent networks, or more recently, Transformers. However, both old and new issues remain, including data sparsity and noisy data; such issues can impair performance, especially in complex, parameter-hungry models. In this paper, we investigate the application of contrastive Self-Supervised Learning (SSL) to sequential recommendation as a way to alleviate some of these issues. Contrastive SSL constructs augmentations from unlabelled instances, where agreements among positive pairs are maximized. It is challenging to devise a contrastive SSL framework for sequential recommendation due to its discrete nature, correlations among items, and skewness of length distributions. 
To this end, we propose a novel framework, Contrastive Self-supervised Learning for sequential Recommendation (CoSeRec). We introduce two informative augmentation operators leveraging item correlations to create high-quality views for contrastive learning. Experimental results on three real-world datasets demonstrate the effectiveness of the proposed method on improving model performance and the robustness against sparse and noisy data. Our implementation is available online at \url{https://github.com/YChen1993/CoSeRec}	翻訳日:2021-08-18 09:52:01 公開日:2021-08-14
# (参考訳) 深層畳み込みニューラルネットワークを用いた小児胸部X線写真における多発疾患の自動診断法 Learning to Automatically Diagnose Multiple Diseases in Pediatric Chest Radiographs Using Deep Convolutional Neural Networks ( http://arxiv.org/abs/2108.06486v1 ) ライセンス: CC BY 4.0	Thanh T. Tran, Hieu H. Pham, Thang V. Nguyen, Tung T. Le, Hieu T. Nguyen, Ha Q. Nguyen	(参考訳) 小児における胸部X線写真(CXR)の解釈は誤りが多く,放射線学の専門知識を高いレベルで理解する必要がある。近年、ディープ畳み込みニューラルネットワーク(D-CNN)は成人におけるCXRの解釈において顕著な性能を示した。しかし、D-CNNが小児CXRスキャンから正確に複数の肺病変を認識できるという証拠は乏しい。特に, 小児胸部疾患の診断モデルの開発は, (i) 医師注記データセットの欠如, (ii) クラス不均衡問題などの重大な課題に直面している。本稿では,小児のCXRスキャン5,017例の大規模データセットを回顧的に収集し,経験豊富な放射線科医が10種類の一般的な病理所見の有無を手作業でラベル付けした。 D-CNNモデルは、3,550個の注釈付きスキャンで訓練され、複数の小児肺病理を自動分類する。高いクラス不均衡の問題に対処するため,標準のバイナリクロスエントロピー損失(BCE)を再構成し,多数派クラスに割り当てられる損失の重みを下げることでより難しいサンプルを効率よく学習する「Distribution-Balanced loss」を修正してD-CNNの学習に適用することを提案する。 777の研究の独立したテストセットにおいて、提案手法は受信機動作特性曲線下面積(AUC) 0.709 (95% CI, 0.690-0.729) を達成する。カットオフ値の感度、特異性、F1スコアはそれぞれ0.722(0.694-0.750)、0.579(0.563-0.595)、0.389(0.373-0.405)である。これらの結果は, 対象疾患のほとんどにおいて, 従来の最先端手法よりも有意に優れていた。さらに,この学習課題において,BCEやFocal Lossなどの他の標準損失と比較して,提案した損失関数の有効性を検証した。全体として、小児CXRの解釈におけるD-CNNの可能性を示す。 Chest radiograph (CXR) interpretation in pediatric patients is error-prone and requires a high level of radiologic expertise. Recently, deep convolutional neural networks (D-CNNs) have shown remarkable performance in interpreting CXRs in adults. However, there is a lack of evidence that D-CNNs can accurately recognize multiple lung pathologies from pediatric CXR scans. In particular, the development of diagnostic models for the detection of pediatric chest diseases faces significant challenges such as (i) a lack of physician-annotated datasets and (ii) class imbalance. In this paper, we retrospectively collect a large dataset of 5,017 pediatric CXR scans, each of which is manually labeled by an experienced radiologist for the presence of 10 common pathologies. 
A D-CNN model is then trained on 3,550 annotated scans to classify multiple pediatric lung pathologies automatically. To address the high class imbalance, we propose to modify and apply the "Distribution-Balanced loss" for training D-CNNs, which reshapes the standard binary cross-entropy (BCE) loss to efficiently learn harder samples by down-weighting the loss assigned to the majority classes. On an independent test set of 777 studies, the proposed approach yields an area under the receiver operating characteristic curve (AUC) of 0.709 (95% CI, 0.690-0.729). The sensitivity, specificity, and F1-score at the cutoff value are 0.722 (0.694-0.750), 0.579 (0.563-0.595), and 0.389 (0.373-0.405), respectively. These results significantly outperform previous state-of-the-art methods on most of the target diseases. Moreover, our ablation studies validate the effectiveness of the proposed loss function compared to other standard losses, e.g., BCE and Focal Loss, for this learning task. Overall, we demonstrate the potential of D-CNNs in interpreting pediatric CXRs.	翻訳日:2021-08-18 09:33:29 公開日:2021-08-14
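The reported sensitivity, specificity, and F1-score relate to confusion-matrix counts as shown below. This sketch uses hypothetical counts for a single finding, not data from the paper. It shows why a relatively high sensitivity can coexist with a low F1-score when false positives are numerous, since $F_1 = 2TP/(2TP+FP+FN)$. The sketch does not reproduce how the paper aggregates per-disease metrics or picks the cutoff.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts for one binary finding (hypothetical numbers below)."""
    tp: int
    fp: int
    tn: int
    fn: int


def sensitivity(c: Confusion) -> float:
    return c.tp / (c.tp + c.fn)  # recall, true-positive rate


def specificity(c: Confusion) -> float:
    return c.tn / (c.tn + c.fp)  # true-negative rate


def precision(c: Confusion) -> float:
    return c.tp / (c.tp + c.fp)


def f1_score(c: Confusion) -> float:
    # Harmonic mean of precision and recall, written in count form.
    return 2 * c.tp / (2 * c.tp + c.fp + c.fn)


counts = Confusion(tp=8, fp=20, tn=70, fn=2)
print(sensitivity(counts), specificity(counts), precision(counts), f1_score(counts))
```

With these toy counts, sensitivity is 0.8 but precision is below 0.3, so F1 stays low.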
# (参考訳) 自動有害コメント検出におけるバイアスの調査--実証的研究 Investigating Bias In Automatic Toxic Comment Detection: An Empirical Study ( http://arxiv.org/abs/2108.06487v1 ) ライセンス: CC BY 4.0	Ayush Kumar, Pratik Kumar	(参考訳) オンラインプラットフォームの増加に伴い、コメントやリアクションを通じて、これらのプラットフォームでのユーザエンゲージメントが急増している。このような文章によるコメントの大部分は、聴衆に対して虐待的で無礼で侮辱的です。機械学習システムがプラットフォームに現れるコメントをチェックするために、トレーニングデータに存在するバイアスが分類器に渡され、クラス、宗教、性別のセットに対する差別につながる。本研究では,これらの分類器のバイアスを推定するために異なる分類器と特徴を評価し,毒性分類の下流タスクにおける性能を評価する。その結果, 自動有毒なコメント検出モデルの性能改善は, バイアス軽減と正の相関を示した。我々の研究で、注意機構を持つLSTMはCNNモデルよりも優れたモデリング戦略であることが判明した。さらなる分析により、fastText埋め込みは、有毒なコメント検出のためのトレーニングモデルへのGloVe埋め込みよりもわずかに好ましいことが示されている。より深い分析により、これらの自動モデルはAUCスコアが高いにもかかわらず、特に特定のアイデンティティグループに偏っていることが明らかになった。最後に、毒性検出モデルのバイアスを軽減するために、毒性サブタイプの補助的なタスクで訓練されたマルチタスク設定が有用であることが判明し、AUCスコアの0.26%(相対6%)まで上昇した。 With the surge in online platforms, user engagement on these platforms via comments and reactions has also surged. A large portion of such textual comments are abusive, rude, and offensive to the audience. When machine learning systems are in place to screen comments posted to a platform, biases present in the training data get passed on to the classifier, leading to discrimination against certain classes, religions, and genders. In this work, we evaluate different classifiers and features to estimate the bias in these classifiers, along with their performance on the downstream task of toxicity classification. Results show that improvement in the performance of automatic toxic comment detection models is positively correlated with mitigating biases in these models. In our work, LSTM with an attention mechanism proved to be a better modelling strategy than a CNN model. Further analysis shows that fastText embeddings are marginally preferable to GloVe embeddings for training toxic comment detection models. Deeper analysis reveals that such automatic models are particularly biased with respect to specific identity groups even when the model has a high AUC score. 
Finally, in an effort to mitigate bias in toxicity detection models, a multi-task setup trained with the auxiliary task of toxicity sub-types proved useful, leading to up to a 0.26% (6% relative) gain in AUC scores.	翻訳日:2021-08-18 09:20:05 公開日:2021-08-14
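Bias toward identity groups is often quantified by computing AUC only over comments that mention a given group. The sketch below assumes the subgroup AUC and BPSN ("background positive, subgroup negative") AUC metrics popularised by the Jigsaw unintended-bias benchmark. The abstract does not say which bias metrics the authors use, and the scores and group flags are invented for illustration.

```python
import numpy as np


def auc(scores: np.ndarray, labels: np.ndarray) -> float:
    """ROC AUC via the Mann-Whitney statistic (ties count as half)."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    if len(pos) == 0 or len(neg) == 0:
        return float("nan")
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())


def subgroup_auc(scores, labels, in_group):
    """AUC restricted to comments that mention the identity group."""
    return auc(scores[in_group], labels[in_group])


def bpsn_auc(scores, labels, in_group):
    """Non-toxic subgroup comments vs toxic background comments.

    A low value means non-toxic comments mentioning the group are
    scored as more toxic than genuinely toxic comments elsewhere.
    """
    mask = (in_group & (labels == 0)) | (~in_group & (labels == 1))
    return auc(scores[mask], labels[mask])


if __name__ == "__main__":
    scores = np.array([0.9, 0.8, 0.7, 0.2, 0.6, 0.1])  # hypothetical model outputs
    labels = np.array([1, 1, 0, 0, 1, 0])              # 1 = toxic
    group = np.array([False, False, True, True, True, False])
    print("overall AUC: ", auc(scores, labels))
    print("subgroup AUC:", subgroup_auc(scores, labels, group))
    print("BPSN AUC:    ", bpsn_auc(scores, labels, group))
```

In this toy data the overall AUC is high while the subgroup AUC is only 0.5. This is the pattern the abstract describes: a high AUC overall can coexist with poor behaviour on specific identity groups.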
# (参考訳) DICOMイメージングルータ:DICOM X線スキャンから身体部位を分類するためのオープンディープラーニングフレームワーク DICOM Imaging Router: An Open Deep Learning Framework for Classification of Body Parts from DICOM X-ray Scans ( http://arxiv.org/abs/2108.06490v1 ) ライセンス: CC BY 4.0	Hieu H. Pham, Dung V. Do, Ha Q. Nguyen	(参考訳) dicom形式のx線イメージングは、臨床でもっとも一般的に使用されるイメージングモダリティであり、膨大な非正規化データベースを生み出している。これにより、医療画像を分析するためのAIソリューションのデプロイに障害が生じ、しばしば、特定のAIモデルにイメージを投入する前に、適切な身体部分を特定する必要がある。この課題は、X線スキャンから身体部分を分類する自動化的で効率的なアプローチの必要性を高める。残念ながら、私たちの知る限りでは、このタスクにはオープンなツールやフレームワークはありません。この欠点を補うために,未知のDICOM X線像を腹部,成人胸,小児胸,脊椎などの5つの解剖学的グループに分類するためのDICOM Imaging Routerを導入する。この目的のために、16,093枚の画像からなる大規模なX線データセットが収集され、手動で分類された。 11,263枚の画像のトレーニングセットを使用して、最先端の深層CNNのセットをトレーニングした。これらのネットワークは、2,419枚の独立したテストセットで評価され、ボディパーツの分類において優れた性能を示した。具体的には, 0.982 (95% CI, 0.977-0.988), 0.985 (95% CI, 0.975-0.989), F1スコア 0.981 (95% CI, 0.976-0.987) のリコールを実現した。 1000枚のx線画像に対する外部的妥当性は,提案手法の病院間における堅牢性を示している。これらの顕著なパフォーマンスは、深部CNNが人体部分とX線スキャンを正確にかつ効果的に区別できることを示し、臨床現場での幅広い応用に潜在的に有益であることを示している。この研究から得られたデータセット、コード、トレーニングされたディープラーニングモデルは、プロジェクトのWebサイトで公開されます。 X-ray imaging in DICOM format is the most commonly used imaging modality in clinical practice, resulting in vast, non-normalized databases. This poses an obstacle to deploying AI solutions for analyzing medical images, which often require identifying the right body part before feeding the image into a specific AI model. This challenge raises the need for an automated and efficient approach to classifying body parts from X-ray scans. Unfortunately, to the best of our knowledge, there is no open tool or framework for this task to date. To fill this gap, we introduce a DICOM Imaging Router that deploys deep CNNs for categorizing unknown DICOM X-ray images into five anatomical groups: abdominal, adult chest, pediatric chest, spine, and others. To this end, a large-scale X-ray dataset consisting of 16,093 images has been collected and manually classified. 
We then trained a set of state-of-the-art deep CNNs using a training set of 11,263 images. These networks were then evaluated on an independent test set of 2,419 images and showed superior performance in classifying the body parts. Specifically, our best performing model achieved a recall of 0.982 (95% CI, 0.977-0.988), a precision of 0.985 (95% CI, 0.975-0.989) and an F1-score of 0.981 (95% CI, 0.976-0.987), while requiring less computation for inference (0.0295 seconds per image). External validation on 1,000 X-ray images shows the robustness of the proposed approach across hospitals. These results indicate that deep CNNs can accurately and effectively differentiate human body parts in X-ray scans, thereby providing potential benefits for a wide range of applications in clinical settings. The dataset, code, and trained deep learning models from this study will be made publicly available on our project website at https://vindr.ai/.	翻訳日:2021-08-18 09:10:20 公開日:2021-08-14
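The recall, precision and F1 figures above summarise performance over five classes, but the abstract does not state how they are averaged. The following sketch assumes macro-averaging over a confusion matrix. The class names come from the abstract, and the simulated predictions are invented.

```python
import numpy as np

CLASSES = ["abdominal", "adult chest", "pediatric chest", "spine", "others"]


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


def macro_prf(cm):
    """Macro-averaged precision, recall and F1 (per-class scores, then mean)."""
    tp = np.diag(cm).astype(float)
    pred_totals = cm.sum(axis=0)  # how often each class was predicted
    true_totals = cm.sum(axis=1)  # how often each class actually occurs
    precision = np.divide(tp, pred_totals, out=np.zeros_like(tp), where=pred_totals > 0)
    recall = np.divide(tp, true_totals, out=np.zeros_like(tp), where=true_totals > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return precision.mean(), recall.mean(), f1.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    y_true = rng.integers(0, len(CLASSES), size=200)
    # Simulated classifier: correct 95% of the time, otherwise a random class
    y_pred = np.where(rng.random(200) < 0.95, y_true,
                      rng.integers(0, len(CLASSES), size=200))
    p, r, f = macro_prf(confusion_matrix(y_true, y_pred, len(CLASSES)))
    print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Under macro-averaging, the reported F1 need not equal the harmonic mean of the reported precision and recall, because F1 is computed per class before averaging.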
# (参考訳) 人物に焦点をあてる:現代映画から学ぶ古いイメージの着色 Focusing on Persons: Colorizing Old Images Learning from Modern Historical Movies ( http://arxiv.org/abs/2108.06515v1 ) ライセンス: CC BY 4.0	Xin Jin, Zhonglan Li, Ke Liu, Dongqing Zou, Xiaodong Li, Xingfan Zhu, Ziyin Zhou, Qilong Sun, Qingyu Liu	(参考訳) 業界では、ビデオサイトやアーカイブなど、古いグレーの写真を自動的に色付けする必要があるシナリオがたくさんあります。本稿では,歴史的人物の多彩な多彩な高忠実度衣服の着色化に着目したヒストリーネットについて述べる。歴史人物の着色は現実的で実践的であるが、既存の方法ではうまく機能しない。本稿では,3つの部分,分類,微粒化セマンティックパーシング,カラー化を含むヒストリーネットを提案する。分類サブモジュールは、年代、国籍、衣服の種類に応じてイメージを分類し、パーシングサブネットワークは、画像中の人物の輪郭、衣服、背景のセマンティクスを提供し、服や人のより正確な着色を実現し、カラーオーバーフローを防ぐ。トレーニングプロセスでは,分類と意味解析機能をカラー化生成ネットワークに統合し,カラー化を改善する。分類および解析サブネットワークの設計により、画像のカラー化の精度が向上し、画像の各部分の境界をより明確にすることができる。また,現代に作られた147の歴史的映画やテレビシリーズから,1,353,166枚の画像と42個の年代,国籍,衣服を自動着色するためのラベルを含む,新しい現代映画データセット(MHMD)を提案する。様々な量的・質的な比較により,本手法は,歴史的文献により正色が正しい軍服において,最先端の着色法よりも優れることが示された。 In industry, there are many scenarios, such as video sites and archives, where old gray photos need to be colorized automatically. In this paper, we present HistoryNet, which focuses on diverse, high-fidelity clothing colorization of historical persons based on fine-grained semantic understanding and priors. Colorization of historical persons is a realistic and practical need; however, existing methods do not perform well in this regard. The proposed HistoryNet consists of three parts, namely classification, fine-grained semantic parsing and colorization. The classification sub-module classifies images according to era, nationality and garment type; the parsing sub-network provides semantics for person contours, clothing and background in the image to achieve more accurate colorization of clothes and persons and to prevent color overflow. In the training process, we integrate classification and semantic parsing features into the colorization generation network to improve colorization. 
Through the design of the classification and parsing subnetworks, the accuracy of image colorization can be improved and the boundaries between the parts of an image can be made clearer. Moreover, we also propose a novel Modern Historical Movies Dataset (MHMD) for automatic colorization, containing 1,353,166 images and 42 labels of eras, nationalities, and garment types, collected from 147 historical movies or TV series made in modern times. Various quantitative and qualitative comparisons demonstrate that our method outperforms state-of-the-art colorization methods, especially on military uniforms, whose correct colors are documented in historical literature.	翻訳日:2021-08-18 09:04:32 公開日:2021-08-14
# (参考訳) 3次元ニューロン再構築のためのVoxel-wise Cross-Volume Representation Learning Voxel-wise Cross-Volume Representation Learning for 3D Neuron Reconstruction ( http://arxiv.org/abs/2108.06522v1 ) ライセンス: CC BY 4.0	Heng Wang, Chaoyi Zhang, Jianhui Yu, Yang Song, Siqi Liu, Wojciech Chrzanowski, Weidong Cai	(参考訳) 脳回路活動におけるニューロンの形態と機能の解析には,3次元ニューロンの自動再構成が重要である。しかし、既存のトレースアルゴリズムの性能は画像品質の低さに支えられている。近年,低コントラスト背景からノイズを除去し,ニューロン構造を復元することにより,生の3次元光学画像スタックの品質を向上させるためのディープラーニングに基づくセグメンテーション手法が提案されている。ニューロン形態の多様性と大きなニューロンデータセットの欠如により、現在のニューロンセグメンテーションモデルは、より優れた特徴表現を符号化することを目的とした、複雑で特別に設計されたサブモジュールをベースアーキテクチャに導入することに依存している。成功したが、推論中に計算に余分な負担がかかる。したがって、ベースネットワークを変更するのではなく、データセット自体に焦点を移します。ほとんどのニューロンセグメンテーションモデルで使用されるエンコーダ-デコーダバックボーンは、ニューロンの構造的特徴を学ぶためにボリューム内ボクセルポイントのみに出席するが、異なるボリューム間で同じカテゴリに属するボクセルの共有固有の意味的特徴を無視する。そこで本研究では,エンコーダ・デコーダセグメンテーションモデルに基づいて,新たなvoxelレベルクロスボリューム表現学習パラダイムを用いて,voxelの固有特徴を明示的に活用することを提案する。我々の方法は推論中に余分なコストを伴わない。提案手法は,BigNeuronプロジェクトの42個の3次元ニューロン画像から評価し,元のセグメンテーションモデルの学習能力の向上と再構築性能の向上を目的としている。 Automatic 3D neuron reconstruction is critical for analysing the morphology and functionality of neurons in brain circuit activities. However, the performance of existing tracing algorithms is hindered by low image quality. Recently, a series of deep learning based segmentation methods have been proposed to improve the quality of raw 3D optical image stacks by removing noise and restoring neuronal structures from low-contrast backgrounds. Due to the variety of neuron morphology and the lack of large neuron datasets, most current neuron segmentation models rely on introducing complex and specially-designed submodules into a base architecture with the aim of encoding better feature representations. Though successful, these submodules add extra computational burden during inference. Therefore, rather than modifying the base network, we shift our focus to the dataset itself. 
The encoder-decoder backbone used in most neuron segmentation models attends only to intra-volume voxel points to learn structural features of neurons, but neglects the shared intrinsic semantic features of voxels belonging to the same category across different volumes, which are also important for expressive representation learning. Hence, to better utilise the scarce dataset, we propose to explicitly exploit such intrinsic features of voxels through a novel voxel-level cross-volume representation learning paradigm on the basis of an encoder-decoder segmentation model. Our method introduces no extra cost during inference. Evaluated on 42 3D neuron images from the BigNeuron project, our proposed method is demonstrated to improve the learning ability of the original segmentation model and further enhance the reconstruction performance.	翻訳日:2021-08-18 08:51:52 公開日:2021-08-14
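One way to read "cross-volume representation learning" is as a supervised contrastive objective. Such an objective pulls together embeddings of voxels with the same label even when they come from different volumes, and pushes apart voxels with different labels. The NumPy sketch below assumes that reading. The temperature `tau`, the toy embeddings and the binary neuron/background labels are hypothetical, and this is not the authors' loss. Because it is a training-only term, it would add nothing at inference, which is consistent with the abstract's claim.

```python
import numpy as np


def supervised_contrastive_loss(emb: np.ndarray, labels: np.ndarray,
                                tau: float = 0.1) -> float:
    """SupCon-style loss over voxel embeddings pooled from several volumes.

    emb: (N, D) voxel embeddings; labels: (N,) voxel classes
    (e.g. 0 = background, 1 = neuron). Voxels sharing a label are
    positives regardless of which volume they were sampled from.
    """
    z = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # cosine similarity
    sim = z @ z.T / tau
    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)
    sim = np.where(not_self, sim, -np.inf)  # exclude each voxel's self-similarity
    # Row-wise log-softmax over all other voxels
    log_prob = sim - np.logaddexp.reduce(sim, axis=1, keepdims=True)
    pos = (labels[:, None] == labels[None, :]) & not_self
    n_pos = pos.sum(axis=1)
    has_pos = n_pos > 0
    # Mean log-probability of the positives for each anchor that has any
    mean_pos = np.where(pos, log_prob, 0.0).sum(axis=1)[has_pos] / n_pos[has_pos]
    return float(-mean_pos.mean())


if __name__ == "__main__":
    # Rows 0-2 could come from one volume and rows 3-5 from another
    emb = np.array([[1.0, 0.0], [0.9, 0.1], [1.0, 0.05],
                    [0.0, 1.0], [0.1, 0.9], [0.05, 1.0]])
    print(supervised_contrastive_loss(emb, np.array([0, 0, 0, 1, 1, 1])))
    print(supervised_contrastive_loss(emb, np.array([0, 1, 0, 1, 0, 1])))
```

The first call, where embeddings agree with labels, gives a much lower loss than the second, where labels cut across the clusters.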
# (参考訳) 手指衛生的ポーズの特徴同定とマッチング Feature Identification and Matching for Hand Hygiene Pose ( http://arxiv.org/abs/2108.06537v1 ) ライセンス: CC BY 4.0	Rashmi Bakshi	(参考訳) SIFT, SURF, ORB などのコンピュータビジョンの3つの特徴記述子を比較し評価した。手のひら画像から手のひら画像, 回転画像まで, 手のひら写真と一致する特徴を抽出し, 一致させた。マッチの総数と生成されたマッチの正確な数に基づいて算出された精度スコア。この実験は、orbアルゴリズムが、少ない時間で正しいマッチングを多く与えることで、より優れることを示した。特徴抽出と手指衛生ポーズ分類のための手洗いビデオ記録に応用したORB特徴検出技術が今後の課題である。 OpenCVはpythonスクリプトにアルゴリズムを適用した。 Three popular computer vision feature descriptors, SIFT, SURF, and ORB, are compared and evaluated. Features are extracted and matched between the original image of the hand hygiene pose "rub hands palm to palm" and a rotated version of it, and the number of correct matches is recorded. An accuracy score is calculated from the total number of matches and the number of correct matches produced. The experiment demonstrated that the ORB algorithm performs best, giving a high number of correct matches in less time. As future work, the ORB feature detection technique will be applied to handwashing video recordings for feature extraction and hand hygiene pose classification. OpenCV was used to apply the algorithms within Python scripts.	翻訳日:2021-08-18 08:41:41 公開日:2021-08-14
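When the second image is a rotated copy of the first, a match can be checked geometrically. Map the keypoint from the original image through the known rotation and see whether it lands near its partner. The NumPy sketch below makes three assumptions. The matched coordinates have already been extracted, for example with OpenCV's `cv2.ORB_create`, `detectAndCompute` and a `cv2.BFMatcher` using `cv2.NORM_HAMMING`. The rotation matrix follows the formula documented for `cv2.getRotationMatrix2D` with scale 1. The 3-pixel tolerance is hypothetical, since the paper's exact criterion for a "correct" match is not stated.

```python
import numpy as np


def rotation_matrix(center, angle_deg):
    """2x3 affine rotation about `center`, same formula as cv2.getRotationMatrix2D(center, angle, 1.0)."""
    cx, cy = center
    a = np.cos(np.deg2rad(angle_deg))
    b = np.sin(np.deg2rad(angle_deg))
    return np.array([[a, b, (1 - a) * cx - b * cy],
                     [-b, a, b * cx + (1 - a) * cy]])


def match_accuracy(pts_orig, pts_rot, M, tol=3.0):
    """Fraction of matches whose original keypoint, mapped through M, lands within tol pixels."""
    pts_orig = np.asarray(pts_orig, dtype=float)
    pts_rot = np.asarray(pts_rot, dtype=float)
    if len(pts_orig) == 0:
        return 0.0
    homog = np.hstack([pts_orig, np.ones((len(pts_orig), 1))])  # (N, 3)
    projected = homog @ M.T                                    # (N, 2)
    errors = np.linalg.norm(projected - pts_rot, axis=1)
    n_correct = int((errors <= tol).sum())
    return n_correct / len(pts_orig)


if __name__ == "__main__":
    M = rotation_matrix((50, 50), 90)
    # Hypothetical matched keypoints: the third match is wrong
    orig = [(60, 50), (50, 50), (10, 10)]
    rot = [(50, 40), (50, 50), (80, 80)]
    print(match_accuracy(orig, rot, M))
```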
# (参考訳) PICCOLO: ポイントクラウド中心のOmnidirectional Localization PICCOLO: Point Cloud-Centric Omnidirectional Localization ( http://arxiv.org/abs/2108.06545v1 ) ライセンス: CC BY 4.0	Junho Kim, Changwoon Choi, Hojun Jang, and Young Min Kim	(参考訳) 一方向局所化のための単純かつ効率的なアルゴリズムであるPICCOLOを提案する。カラーの点雲とシーンの360パノラマ画像が与えられた場合、パノラマ画像が撮影されるカメラのポーズを復元することが目的である。私たちのパイプラインは、クエリとして与えられた単一のイメージで、オフザシェルフで動作し、ニューラルネットワークのトレーニングや、画像の地味なポーズの収集は必要ありません。代わりに、各点雲の色をパノラマ画像の全体像と一致させ、グラデーション・ディッセント最適化を行い、カメラのポーズを見つける。我々の損失関数はサンプリング損失と呼ばれ、点クラウド内の全ての点の投影された位置で評価される点クラウド中心である。対照的に、従来の測光損失は画像中心であり、各画素位置の色を比較する。比較対象の単純な変更により、サンプリング損失は全方位画像の激しい視覚歪みを効果的に克服し、360度ビューのグローバルなコンテキストを享受し、視覚的ローカライゼーションの困難なシナリオに対処する。 PICCOLOは、様々な環境で評価された場合、既存の全方位ローカライゼーションアルゴリズムよりも精度と安定性が優れている。 We present PICCOLO, a simple and efficient algorithm for omnidirectional localization. Given a colored point cloud and a 360 panorama image of a scene, our objective is to recover the camera pose at which the panorama image is taken. Our pipeline works in an off-the-shelf manner with a single image given as a query and does not require any training of neural networks or collecting ground-truth poses of images. Instead, we match each point cloud color to the holistic view of the panorama image with gradient-descent optimization to find the camera pose. Our loss function, called sampling loss, is point cloud-centric, evaluated at the projected location of every point in the point cloud. In contrast, conventional photometric loss is image-centric, comparing colors at each pixel location. With a simple change in the compared entities, sampling loss effectively overcomes the severe visual distortion of omnidirectional images, and enjoys the global context of the 360 view to handle challenging scenarios for visual localization. PICCOLO outperforms existing omnidirectional localization algorithms in both accuracy and stability when evaluated in various environments.	翻訳日:2021-08-18 08:38:20 公開日:2021-08-14
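The point cloud-centric sampling loss can be sketched as follows. Transform every 3D point into the camera frame, project it onto the equirectangular panorama, read the panorama colour there, and compare it with the point's own colour. The NumPy sketch below assumes a particular equirectangular convention (longitude from `atan2(x, z)`, latitude from the y component), nearest-pixel lookup and a mean squared error. PICCOLO optimises the pose by gradient descent, which would instead need differentiable (for example bilinear) sampling.

```python
import numpy as np


def project_equirect(points_cam, width, height):
    """Map camera-frame 3D points to (row, col) pixels of an equirectangular image."""
    x, y, z = points_cam[:, 0], points_cam[:, 1], points_cam[:, 2]
    r = np.linalg.norm(points_cam, axis=1)
    lon = np.arctan2(x, z)                       # in [-pi, pi]
    lat = np.arcsin(np.clip(y / r, -1.0, 1.0))   # in [-pi/2, pi/2]
    col = ((lon / (2 * np.pi) + 0.5) * width).astype(int) % width
    row = np.clip(((lat / np.pi + 0.5) * height).astype(int), 0, height - 1)
    return row, col


def sampling_loss(points, colors, panorama, R, t):
    """Mean squared colour error evaluated at the projection of every point."""
    cam = points @ R.T + t                       # world frame -> camera frame
    h, w = panorama.shape[:2]
    row, col = project_equirect(cam, w, h)
    sampled = panorama[row, col]                 # nearest-pixel lookup
    return float(np.mean((sampled - colors) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pano = rng.random((64, 128, 3))              # synthetic panorama
    pts = rng.normal(size=(500, 3)) * 5.0
    r, c = project_equirect(pts, 128, 64)
    cols = pano[r, c]                            # colours consistent with the identity pose
    th = 0.3                                     # a wrong pose: rotation about the y axis
    Ry = np.array([[np.cos(th), 0, np.sin(th)], [0, 1, 0], [-np.sin(th), 0, np.cos(th)]])
    print(sampling_loss(pts, cols, pano, np.eye(3), np.zeros(3)))
    print(sampling_loss(pts, cols, pano, Ry, np.zeros(3)))
```

The loss is a sum over points rather than pixels. The panorama's distortion therefore only affects where each point is looked up, not how many terms each region contributes, which is the contrast the abstract draws with image-centric photometric loss.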
# (参考訳) マルチレベルアテンション機構による砂時計網の積み重ね--椎間板ラベリングの探究 Stacked Hourglass Network with a Multi-level Attention Mechanism: Where to Look for Intervertebral Disc Labeling ( http://arxiv.org/abs/2108.06554v1 ) ライセンス: CC BY 4.0	Reza Azad, Lucas Rouhier, Julien Cohen-Adad	(参考訳) MRIによる脊椎椎間板形成は,多発性硬化症,筋萎縮性側索硬化症,変性頚髄症,癌などの脊椎疾患の適切な診断に重要である。 MRIデータにおける脊椎椎間板の自動ラベル付けは、椎骨と骨の面積の類似性、脊椎と周囲の組織の形状のばらつき、スキャン(製造者、パルスシーケンス、画像コントラスト、解像度、アーティファクト)のばらつきなど、難しい作業である。以前の研究では、脊椎椎間板のラベル付けは、しばしばディスク検出のステップ後に行われ、ローカライゼーションアルゴリズムがディスクを見逃したり、偽陽性の検知を行うと、ほとんど失敗する。本研究では,ポーズ推定手法を用いて椎間板ラベリングを再構成することにより,この問題を軽減することを目的としている。そこで本研究では,椎間板の位置と骨格構造を共同学習するためのマルチレベルアテンション機構を備えた重ね合わせ砂時計ネットワークを提案する。提案した深層学習モデルは意味的セグメンテーションの強さとポーズ推定手法を考慮し,欠落した領域と偽陽性検出を扱う。提案手法の性能をさらに高めるために,偽陽性検出を減らすためのスケルトンベース探索空間を提案する。提案手法はspiner general public multi-center dataset上で評価し,t1wとt2wのコントラストにおいて,従来よりも優れた性能を示した。この方法はivadomed (https://ivadomed.org)で実装されている。 Labeling vertebral discs from MRI scans is important for the proper diagnosis of spinal related diseases, including multiple sclerosis, amyotrophic lateral sclerosis, degenerative cervical myelopathy and cancer. Automatic labeling of the vertebral discs in MRI data is a difficult task because of the similarity between discs and bone area, the variability in the geometry of the spine and surrounding tissues across individuals, and the variability across scans (manufacturers, pulse sequence, image contrast, resolution and artefacts). In previous studies, vertebral disc labeling is often done after a disc detection step and mostly fails when the localization algorithm misses discs or has false positive detection. In this work, we aim to mitigate this problem by reformulating the semantic vertebral disc labeling using the pose estimation technique. To do so, we propose a stacked hourglass network with multi-level attention mechanism to jointly learn intervertebral disc position and their skeleton structure. 
The proposed deep learning model combines the strengths of semantic segmentation and pose estimation techniques to handle missing areas and false positive detections. To further improve the performance of the proposed method, we propose a skeleton-based search space to reduce false positive detections. The proposed method was evaluated on the Spine Generic public multi-center dataset and demonstrated better performance than previous work on both T1w and T2w contrasts. The method is implemented in ivadomed (https://ivadomed.org).	翻訳日:2021-08-18 08:22:35 公開日:2021-08-14
# (参考訳) ニューラルネットワークの遷移直交分解:双曲方程式の非線形還元のための機械学習アプローチ The Neural Network shifted-Proper Orthogonal Decomposition: a Machine Learning Approach for Non-linear Reduction of Hyperbolic Equations ( http://arxiv.org/abs/2108.06558v1 ) ライセンス: CC BY 4.0	Davide Papapicco, Nicola Demo, Michele Girfoglio, Giovanni Stabile, Gianluigi Rozza	(参考訳) 主対流を持つモデルは射影に基づく還元次数モデリングにおいて常に難しい課題を提起した。最近提案された多くの手法は、コルモゴロフのN-幅崩壊を加速する全階解の事前処理に基づいており、より精度のよいより小さな線型部分空間が得られる。しかし、これらの方法は解の位相空間における特性速度の知識に頼らざるを得ず、アドベクション場に対する明示的な機能形式を持つ問題への適用範囲を制限しなければならない。本研究では,ディープラーニングアーキテクチャを実装することで,統計的学習フレームワークにおける正しい前処理変換を自動的に検出する問題にアプローチする。純粋にデータ駆動方式により、線形部分空間操作の既存のアプローチを未知の対流場を持つ非線形双曲問題に一般化することができる。提案アルゴリズムは,その性能をベンチマークするために単純なテストケースに対して検証され,後に多相シミュレーションに応用された。 Models with dominant advection always posed a difficult challenge for projection-based reduced order modelling. Many methodologies that have recently been proposed are based on the pre-processing of the full-order solutions to accelerate the Kolmogorov N-width decay thereby obtaining smaller linear subspaces with improved accuracy. These methods however must rely on the knowledge of the characteristic speeds in phase space of the solution, limiting their range of applicability to problems with explicit functional form for the advection field. In this work we approach the problem of automatically detecting the correct pre-processing transformation in a statistical learning framework by implementing a deep-learning architecture. The purely data-driven method allowed us to generalise the existing approaches of linear subspace manipulation to non-linear hyperbolic problems with unknown advection fields. The proposed algorithm has been validated against simple test cases to benchmark its performances and later successfully applied to a multiphase simulation.	翻訳日:2021-08-18 08:13:31 公開日:2021-08-14
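A few lines of NumPy show why advection-dominated problems are hard for linear subspaces, and why a shifting pre-processing step helps. Snapshots of a travelling pulse need many POD modes, while the same snapshots shifted back into a common reference frame collapse onto a single mode. This sketch assumes a periodic 1D grid and a known, constant integer shift per snapshot. In the paper the transformation is learned by a neural network rather than given.

```python
import numpy as np


def modes_for_energy(snapshots: np.ndarray, energy: float = 0.999) -> int:
    """Number of POD (SVD) modes needed to capture `energy` of the snapshot energy."""
    s = np.linalg.svd(snapshots, compute_uv=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(cumulative, energy) + 1)


n_grid, n_snap, step = 200, 40, 3
x = np.arange(n_grid)
pulse = np.exp(-((x - 30.0) ** 2) / 20.0)  # initial Gaussian pulse

# Column k is the pulse advected by k * step grid cells (periodic boundary)
raw = np.stack([np.roll(pulse, k * step) for k in range(n_snap)], axis=1)

# Shifted snapshots: undo the known transport before applying POD
shifted = np.stack([np.roll(raw[:, k], -k * step) for k in range(n_snap)], axis=1)

print("modes needed, raw:    ", modes_for_energy(raw))
print("modes needed, shifted:", modes_for_energy(shifted))
```

The gap between the two counts is a small-scale picture of the slow Kolmogorov N-width decay mentioned in the abstract.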
# (参考訳) 水中ドームの屈折幾何学 Refractive Geometry for Underwater Domes ( http://arxiv.org/abs/2108.06575v1 ) ライセンス: CC BY 4.0	Mengkun She, David Nakath, Yifan Song, Kevin K\"oser	(参考訳) 水中カメラは通常、水から保護するためにガラス窓の後ろに置かれる。ドームポートである球状ガラスは、高度の水圧に非常に適しており、視野が大きく、ピンホールカメラが球の中心に正確に配置されている場合の屈折を避けることができる。実際のレンズをドームセンターに完全に合わせることは、実際に中心となるプロセスをガイドする方法(例えば)の両方において難しい作業である。視覚サーボ)とアライメントの品質を測定する方法に加えて、アライメントを機械的に実行する方法。したがって、このようなシステムはオフセットによって適切に調整されやすく、ピンホールカメラモデルを無効にする球面での屈折パターンに挑戦する。深海探査に使用する厚いドームにおいても、カメラシステム全体が軸方向のカメラとなり、正確な空気、ガラス、水の性質の知識を必要とせずに屈折中心を計算する非イテレーティブな方法を提供する。また,球面の屈折幾何学を解析し,前方と後方の正則化やiso屈折曲線などの効果を考察し,薄いドーム内の3次元点の前方投影のための6次多項式式を得る。次に,複数の画像から純水中キャリブレーションを推定する手法を提案する。この推定は、調整中にレンズの機械的位置を導くために使用できるか、フォトグラムの水中応用で考慮できる。 Underwater cameras are typically placed behind glass windows to protect them from the water. Spherical glass, a dome port, is well suited for high water pressures at great depth, allows for a large field of view, and avoids refraction if a pinhole camera is positioned exactly at the sphere's center. Adjusting a real lens perfectly to the dome center is a challenging task, both in terms of how to actually guide the centering process (e.g. visual servoing) and how to measure the alignment quality, but also, how to mechanically perform the alignment. Consequently, such systems are prone to being decentered by some offset, leading to challenging refraction patterns at the sphere that invalidate the pinhole camera model. We show that the overall camera system becomes an axial camera, even for thick domes as used for deep sea exploration and provide a non-iterative way to compute the center of refraction without requiring knowledge of exact air, glass or water properties. We also analyze the refractive geometry at the sphere, looking at effects such as forward- vs. backward decentering, iso-refraction curves and obtain a 6th-degree polynomial equation for forward projection of 3D points in thin domes. 
We then propose a pure underwater calibration procedure to estimate the decentering from multiple images. This estimate can either be used during adjustment to guide the mechanical position of the lens, or can be considered in photogrammetric underwater applications.	翻訳日:2021-08-18 07:46:03 公開日:2021-08-14
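The statement that a pinhole exactly at the dome centre sees no refraction follows from Snell's law. Every ray leaving the centre meets the spherical interface along its normal. The minimal NumPy sketch below traces vector-form refraction at a single spherical interface to illustrate this, and shows that a decentred ray is bent. It is a thin-dome simplification that ignores the glass layer, and it uses hypothetical refractive indices of 1.0 (air) and 1.33 (water).

```python
import numpy as np


def refract(d, n, eta):
    """Vector Snell's law: unit direction d, unit normal n with n . d < 0, eta = n1 / n2."""
    cos_i = -np.dot(n, d)
    k = 1.0 - eta**2 * (1.0 - cos_i**2)
    if k < 0:
        return None  # total internal reflection
    return eta * d + (eta * cos_i - np.sqrt(k)) * n


def exit_ray(origin, direction, radius=1.0, n_air=1.0, n_water=1.33):
    """Trace a ray from inside a sphere centred at the origin out into the water."""
    d = direction / np.linalg.norm(direction)
    b = np.dot(d, origin)
    c = np.dot(origin, origin) - radius**2   # negative because the origin is inside
    s = -b + np.sqrt(b * b - c)              # positive intersection distance
    p = origin + s * d
    normal = -p / radius                     # unit normal facing the incoming ray
    return p, refract(d, normal, n_air / n_water)


if __name__ == "__main__":
    direction = np.array([0.5, 0.0, 1.0])
    for cam in (np.zeros(3), np.array([0.05, 0.0, 0.0])):
        _, t = exit_ray(cam, direction)
        d = direction / np.linalg.norm(direction)
        bend = np.degrees(np.arccos(np.clip(np.dot(t, d), -1.0, 1.0)))
        print(f"camera at {cam}: ray bent by {bend:.3f} degrees")
```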
# (参考訳) スケーラブルな百万エージェント強化学習によるパンデミック予測のための微視的パンデミックシミュレータ A Microscopic Pandemic Simulator for Pandemic Prediction Using Scalable Million-Agent Reinforcement Learning ( http://arxiv.org/abs/2108.06589v1 ) ライセンス: CC BY-SA 4.0	Zhenggang Tang, Kai Yan, Liting Sun, Wei Zhan, Changliu Liu	(参考訳) 微視的流行モデルは、政府の政策立案者が疫病の発生を予測しシミュレーションするための強力なツールであり、個々の行動がマクロ現象に与える影響を捉えることができる。しかし、既存のモデルは単純なルールベースの個々の振る舞いのみを考慮し、適用性を制限する。本稿では, 深部強化学習型顕微鏡モデルであるMicroscopic Pandemic Simulator (MPS)を提案する。ルールベースのエージェントを報酬を最大化するために行動が駆動される合理的エージェントに置き換えることで、mpsは現実世界のダイナミクスをよりよく近似する。本稿では,MPSにおける大量のエージェントを効率的にシミュレートするため,SMADQN(Scalable Million-Agent DQN)を提案する。 MPSは、異なる政府の戦略の影響を効率的に評価することを可能にする。本稿ではまず,米国アレゲニーにおける実世界のデータに対してMPSを校正し,情報開示と隔離という2つの政府の戦略を実証的に評価する。その結果,提案手法の有効性が検証された。本稿では,経済・ソーシャルネットワークなどの大規模エージェントベースネットワークにおけるDRLの適用について,新たな知見を提供する。 Microscopic epidemic models are powerful tools for government policy makers to predict and simulate epidemic outbreaks, which can capture the impact of individual behaviors on the macroscopic phenomenon. However, existing models only consider simple rule-based individual behaviors, limiting their applicability. This paper proposes a deep-reinforcement-learning-powered microscopic model named Microscopic Pandemic Simulator (MPS). By replacing rule-based agents with rational agents whose behaviors are driven to maximize rewards, the MPS provides a better approximation of real world dynamics. To efficiently simulate with massive amounts of agents in MPS, we propose Scalable Million-Agent DQN (SMADQN). The MPS allows us to efficiently evaluate the impact of different government strategies. This paper first calibrates the MPS against real-world data in Allegheny, US, then demonstratively evaluates two government strategies: information disclosure and quarantine. The results validate the effectiveness of the proposed method. 
As a broad impact, this paper provides novel insights for the application of DRL in large scale agent-based networks such as economic and social networks.	翻訳日:2021-08-18 07:44:22 公開日:2021-08-14
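Simulating very large numbers of DQN agents typically relies on evaluating one shared Q-function for all agents in a single batched operation. The sketch below shows only the standard DQN ingredients, epsilon-greedy action selection and one-step TD targets, vectorised over agents. The linear Q-function, the three actions and all parameter values are hypothetical, and the specific SMADQN design is not reproduced.

```python
import numpy as np


def q_values(weights: np.ndarray, states: np.ndarray) -> np.ndarray:
    """Shared linear Q-function: (n_agents, n_features) -> (n_agents, n_actions)."""
    return states @ weights


def epsilon_greedy(q: np.ndarray, epsilon: float, rng) -> np.ndarray:
    """Greedy action per agent, replaced by a random action with probability epsilon."""
    greedy = q.argmax(axis=1)
    random_actions = rng.integers(0, q.shape[1], size=q.shape[0])
    explore = rng.random(q.shape[0]) < epsilon
    return np.where(explore, random_actions, greedy)


def td_targets(rewards, next_q_target, dones, gamma=0.99):
    """One-step DQN targets r + gamma * max_a Q_target(s', a), with no bootstrap after terminal steps."""
    return rewards + gamma * (1.0 - dones) * next_q_target.max(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical sizes; the three actions might be "stay home", "work", "go out"
    n_agents, n_features, n_actions = 100_000, 8, 3
    w = rng.normal(scale=0.1, size=(n_features, n_actions))
    states = rng.random((n_agents, n_features))
    actions = epsilon_greedy(q_values(w, states), epsilon=0.1, rng=rng)
    print(np.bincount(actions, minlength=n_actions))
```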
# (参考訳) 微調整事前学習言語モデルによるセキュリティ脆弱性レポートのエンティティ認識 Few-Sample Named Entity Recognition for Security Vulnerability Reports by Fine-Tuning Pre-Trained Language Models ( http://arxiv.org/abs/2108.06590v1 ) ライセンス: CC BY 4.0	Guanqun Yang, Shay Dineen, Zhipeng Lin, Xueqing Liu	(参考訳) 公開セキュリティ脆弱性レポート(cveレポートなど)は、コンピュータとネットワークシステムのメンテナンスにおいて重要な役割を果たす。セキュリティ企業や管理者は、これらのレポートの情報に頼って、顧客へのパッチの開発とデプロイのタスクを優先している。これらのレポートは構造化されていないテキストであるため、自動情報抽出(IE)は構造化されていないレポートを構造化された形式に変換することで処理のスケールアップに役立つ。セキュリティ脆弱性レポートの自動IEに関する既存の作業は、しばしば多数のラベル付きトレーニングサンプルに依存している。しかし、大量のラベル付きトレーニングセットを作成するのは、費用も時間もかかる。そこで本研究では,ラベル付きトレーニングサンプルを少数しか使用できない問題について,本研究で初めて検討する。特に,我々の小規模トレーニングデータセットにおける最先端の事前学習言語モデルの性能について検討した。その結果、事前訓練された言語モデルと注意深く調整されたハイパーパラメーターにより、このタスクにおける最先端システムに到達またはわずかに優れることがわかった。主カテゴリにおける最初の微調整と、[7]のように他のカテゴリへの転送学習の2段階のプロセスと一致し、もしそうでなければ両方の段階において必要なラベル付きサンプルの数は大幅に減少する: 微調整の90%が5758から576に減少し、88.8%が1カテゴリあたり64のラベル付きサンプルで転送学習を減少させる。本実験は,NERの脆弱性レポートに対する少数サンプル学習の有効性を示すものである。この結果から,セキュリティ脆弱性レポートの少数サンプル学習における複数の研究機会が開放され,論文で論じられている。コード:https://github.com/guanqun-yang/FewVulnerability。 Public security vulnerability reports (e.g., CVE reports) play an important role in the maintenance of computer and network systems. Security companies and administrators rely on information from these reports to prioritize tasks on developing and deploying patches to their customers. Since these reports are unstructured texts, automatic information extraction (IE) can help scale up the processing by converting the unstructured reports to structured forms, e.g., software names and versions and vulnerability types. Existing works on automated IE for security vulnerability reports often rely on a large number of labeled training samples. However, creating massive labeled training set is both expensive and time consuming. In this work, for the first time, we propose to investigate this problem where only a small number of labeled training samples are available. 
In particular, we investigate the performance of fine-tuning several state-of-the-art pre-trained language models on our small training dataset. The results show that with pre-trained language models and carefully tuned hyperparameters, we reach or slightly outperform the state-of-the-art system on this task. Following the previous two-step process of first fine-tuning on the main category and then transfer learning to the others as in [7], but with our proposed approach, the number of required labeled samples decreases substantially in both stages: a 90% reduction in fine-tuning (from 5758 to 576), and an 88.8% reduction in transfer learning with 64 labeled samples per category. Our experiments thus demonstrate the effectiveness of few-sample learning on NER for security vulnerability reports. This result opens up multiple research opportunities for few-sample learning on security vulnerability reports, which are discussed in the paper. Code: https://github.com/guanqun-yang/FewVulnerability.	翻訳日:2021-08-18 07:21:01 公開日:2021-08-14
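NER models for vulnerability reports typically emit one BIO tag per token, which must then be decoded into structured fields such as software names and versions. The following is a minimal sketch of that decoding step. The tag names (`SN` for software name, `SV` for software version) and the example sentence are hypothetical and not taken from the paper's label set.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (entity_type, text) spans from BIO tags; a stray I- tag starts a new span."""
    spans: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            if current_type is not None:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            current_tokens.append(token)
        else:  # "O" closes any open span
            if current_type is not None:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_type is not None:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


if __name__ == "__main__":
    tokens = ["Buffer", "overflow", "in", "Apache", "HTTP", "Server", "before", "2.4.39"]
    tags = ["O", "O", "O", "B-SN", "I-SN", "I-SN", "B-SV", "I-SV"]
    print(bio_to_spans(tokens, tags))
```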
# (参考訳) オフィス需要応答におけるエネルギー価格のオフライン強化学習:エネルギーとデータコストの削減 Offline-Online Reinforcement Learning for Energy Pricing in Office Demand Response: Lowering Energy and Data Costs ( http://arxiv.org/abs/2108.06594v1 ) ライセンス: CC BY-SA 4.0	Doseok Jang, Lucas Spangher, Manan Khattar, Utkarsha Agwan, Selvaprabuh Nadarajah, Costas Spanos	(参考訳) 私たちのチームは、オフィスビルで本格的なエネルギー需要対応実験を行うことを提案しています。これはコミュニティに価値を提供するエキサイティングな取り組みですが、強化学習エージェントのトレーニングデータの収集にはコストがかかり、制限されます。本研究では,データコスト(収束の加速)とプログラム実装コストを最小化するためにオフライントレーニングをどのように活用するかを検討する。シミュレーションタスクで実験を開始するようにモデルを事前トレーニングし、エージェントに対する実世界の報酬をシミュレートするためにトレーニングされた計画モデルを使用することです。エネルギー需要応答問題における効率的な価格設定のためのオフライン強化学習の有用性を示す。 Our team is proposing to run a full-scale energy demand response experiment in an office building. Although this is an exciting endeavor which will provide value to the community, collecting training data for the reinforcement learning agent is costly and will be limited. In this work, we examine how offline training can be leveraged to minimize data costs (accelerate convergence) and program implementation costs. We present two approaches to doing so: pretraining our model to warm start the experiment with simulated tasks, and using a planning model trained to simulate the real world's rewards to the agent. We present results that demonstrate the utility of offline reinforcement learning to efficient price-setting in the energy demand response problem.	翻訳日:2021-08-18 07:03:26 公開日:2021-08-14
# (参考訳) 効率的な頭部追跡のための機械学習手法による光追跡パラメータの予測解析 Prediction Analysis of Optical Tracker Parameters using Machine Learning Approaches for efficient Head Tracking ( http://arxiv.org/abs/2108.06606v1 ) ライセンス: CC BY 4.0	Aman Kataria, Smarajit Ghosh and Vinod Karar	(参考訳) ヘッドトラッカーは、航空機/コックピットシミュレーターのパイロットの頭部を追跡するため、ヘッドマウントディスプレイシステムにおいて重要な部分である。ヘッドトラッカーの操作上の欠陥は、異なる照明条件や規則的な光の干渉といった様々な環境条件に依存する。このレターでは、異なる環境条件下での頭部の動きの6-DoFデータ収集に光学トラッカーが用いられている。また,6-DoFデータに対する環境条件の違いと受信機と光送信機の距離の変化も分析した。 A head tracker is a crucial part of head-mounted display systems, as it tracks the pilot's head in the aircraft/cockpit simulator. The operational flaws of head trackers also depend on environmental conditions such as varying lighting and stray light interference. In this letter, an optical tracker has been employed to gather 6-DoF data of head movements under different environmental conditions. The effect of these environmental conditions, and of varying the distance between the receiver and the optical transmitter, on the 6-DoF data was also analyzed.	翻訳日:2021-08-18 06:48:30 公開日:2021-08-14
# (参考訳) 自動エンコーディングのない教師なしディスタングル:落とし穴と今後の方向性 Unsupervised Disentanglement without Autoencoding: Pitfalls and Future Directions ( http://arxiv.org/abs/2108.06613v1 ) ライセンス: CC BY 4.0	Andrea Burns, Aaron Sarna, Dilip Krishnan, Aaron Maschinot	(参考訳) 切り離された視覚表現は、変分オートエンコーダ(VAE)のような生成モデルで主に研究されている。先行研究は、異種表現学習のための生成法に焦点を当ててきたが、生成モデルの現在の制限のため、これらのアプローチは大きなデータセットにはスケールしない。代わりに、コントラスト学習を用いた正規化手法について検討し、大規模なデータセットや下流アプリケーションに十分強力なアンタングル表現をもたらす可能性がある。しかし,タスク性能のトレードオフにより,最適化や初期化感度のため,教師なしの絡み合いが困難であることが判明した。下流タスクとの絡み合いを評価し,使用する各規則化の利点と欠点を分析し,今後の方向性について考察する。 Disentangled visual representations have largely been studied with generative models such as Variational AutoEncoders (VAEs). While prior work has focused on generative methods for disentangled representation learning, these approaches do not scale to large datasets due to current limitations of generative models. Instead, we explore regularization methods with contrastive learning, which could result in disentangled representations that are powerful enough for large scale datasets and downstream applications. However, we find that unsupervised disentanglement is difficult to achieve due to optimization and initialization sensitivity, with trade-offs in task performance. We evaluate disentanglement with downstream tasks, analyze the benefits and disadvantages of each regularization used, and discuss future directions.	翻訳日:2021-08-18 06:41:59 公開日:2021-08-14
# (参考訳) SelectGen Challenge:Few-Shotニューラルテキスト生成のための最高のトレーニングサンプルを見つける The SelectGen Challenge: Finding the Best Training Samples for Few-Shot Neural Text Generation ( http://arxiv.org/abs/2108.06614v1 ) ライセンス: CC BY 4.0	Ernie Chang, Xiaoyu Shen, Alex Marin, Vera Demberg	(参考訳) 数ショットのニューラルテキスト生成のための学習事例選択のための共有タスクを提案する。大規模な事前学習された言語モデルは、わずかなテキスト生成において劇的な改善をもたらした。それでも、ほとんどすべての以前の作業は、ごく少数のトレーニングインスタンスを選択するためにランダムサンプリングを適用するだけだ。選択戦略とそれがモデルのパフォーマンスにどのように影響するかにほとんど注意が払われていない。選択戦略の研究は、(1)下流タスクでアノテーション予算を最大限に活用し、(2)より優れた数ショットテキスト生成モデルをベンチマークするのに役立ちます。我々は,選択戦略と世代品質への影響を示す提案を歓迎する。 We propose a shared task on training instance selection for few-shot neural text generation. Large-scale pretrained language models have led to dramatic improvements in few-shot text generation. Nonetheless, almost all previous work simply applies random sampling to select the few-shot training instances. Little to no attention has been paid to the selection strategies and how they would affect model performance. The study of the selection strategy can help us to (1) make the most use of our annotation budget in downstream tasks and (2) better benchmark few-shot text generative models. We welcome submissions that present their selection strategies and the effects on the generation quality.	翻訳日:2021-08-18 06:28:46 公開日:2021-08-14
# (参考訳) bスプライン B-Splines ( http://arxiv.org/abs/2108.06617v1 ) ライセンス: CC BY 4.0	Arindam Chaudhuri	(参考訳) BSplinesはコンピュータグラフィックスで最も有望な曲線の1つである。それらは優れた幾何学的性質を備えており、コンピュータ支援デザイン産業におけるいくつかの応用の理想的な候補となっている。本稿では,B-スプライン曲線の基本特性について述べる。 2つの重要なB-Spline特性viz凸船体特性と繰り返し点効果について議論した。計算装置におけるbsplines計算も示されている。 B-Spline曲線がCT画像データセットの3次元表面を再構成する画像処理に基づく産業応用は、これらの曲線の強さをさらに強調する。 B-Splines are among the most promising curves in computer graphics. They have superior geometric properties that make them ideal candidates for several applications in the computer-aided design industry. In this article, some basic properties of B-Spline curves are presented. Two significant B-Spline properties, viz. the convex hull property and the effect of repeated points, are discussed. The computation of B-Splines on computational devices is also illustrated. An industry application based on image processing, in which B-Spline curves reconstruct 3D surfaces from CT image datasets of inner organs, further highlights the strength of these curves.	翻訳日:2021-08-18 06:20:40 公開日:2021-08-14
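The convex hull property follows from two facts about the B-spline basis functions: they are non-negative and sum to one on the valid parameter range. Every curve point is therefore a convex combination of the control points. The Cox-de Boor recursion below is a minimal Python sketch that makes both facts checkable. It uses a hypothetical clamped cubic knot vector and control polygon. Repeating a control point simply gives that point more total weight, which pulls the curve towards it.

```python
import numpy as np


def basis(i: int, p: int, u: float, knots: np.ndarray) -> float:
    """Cox-de Boor recursion for the i-th B-spline basis function of degree p."""
    if p == 0:
        # Half-open intervals, except that u == knots[-1] belongs to the last non-empty interval
        if knots[i] <= u < knots[i + 1]:
            return 1.0
        if u == knots[-1] and knots[i] < knots[i + 1] == knots[-1]:
            return 1.0
        return 0.0
    left = right = 0.0
    if knots[i + p] != knots[i]:  # 0/0 terms are treated as 0
        left = (u - knots[i]) / (knots[i + p] - knots[i]) * basis(i, p - 1, u, knots)
    if knots[i + p + 1] != knots[i + 1]:
        right = ((knots[i + p + 1] - u) / (knots[i + p + 1] - knots[i + 1])
                 * basis(i + 1, p - 1, u, knots))
    return left + right


def curve_point(u, control, p, knots):
    """Evaluate C(u) = sum_i N_{i,p}(u) P_i; also return the basis weights."""
    weights = np.array([basis(i, p, u, knots) for i in range(len(control))])
    return weights @ control, weights


# Hypothetical clamped cubic: 6 control points need 6 + 3 + 1 = 10 knots
knots = np.array([0, 0, 0, 0, 1, 2, 3, 3, 3, 3], dtype=float)
control = np.array([[0, 0], [1, 2], [2, -1], [3, 3], [4, 0], [5, 1]], dtype=float)

if __name__ == "__main__":
    for u in np.linspace(0.0, 3.0, 7):
        point, w = curve_point(u, control, 3, knots)
        print(f"u={u:.1f} point={point.round(3)} sum of weights={w.sum():.6f}")
```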
# (参考訳) ニューラルネットワークのスパース符号化解釈と理論的意味 A Sparse Coding Interpretation of Neural Networks and Theoretical Implications ( http://arxiv.org/abs/2108.06622v1 ) ライセンス: CC BY 4.0	Joshua Bowren	(参考訳) ニューラルネットワーク、特に深層畳み込みニューラルネットワークは、様々なコンピュータビジョンタスクにおいて前例のないパフォーマンスを達成しているが、成功したニューラルネットワークの計算と構造に関する根拠は完全には理解されていない。画像分類のための畳み込みニューラルネットワークの適性の理論は多いが、なぜそのようなモデルが推論や異常識別のような複雑な視覚的タスクを実現できるのかについては理解されていない。本稿では、ReLUアクティベーションを持つニューラルネットワークのスパース符号化解釈と、特に畳み込みニューラルネットワークを提案する。スパース符号化では、モデルの基底関数が直交であると仮定すると、最適係数は入力画像に投影された基底関数のソフト閾値関数によって与えられる。スパース符号の非負の変種では、ソフトスレッショルド関数はReLUとなる。ここでは、直交推定基底関数によるスパース符号化を用いてこれらの解を導出し、各スパース符号化係数に対して指数的事前パラメータを持つ修正非負の直交スパース符号化モデルから畳み込みニューラルネットワーク前方変換を導出する。次に,階層的スパース符号化モデルにロジスティック回帰を追加することにより,正規化やプール化を伴わない完全畳み込みニューラルネットワークを導出する。最後に、畳み込みニューラルネットワークにおけるスパースプリアーを維持し、より強固な非線形変換を行うことで、より強固なフォワード変換を動機付ける。 Neural networks, specifically deep convolutional neural networks, have achieved unprecedented performance in various computer vision tasks, but the rationale for the computations and structures of successful neural networks is not fully understood. Theories abound for the aptitude of convolutional neural networks for image classification, but less is understood about why such models would be capable of complex visual tasks such as inference and anomaly identification. Here, we propose a sparse coding interpretation of neural networks that have ReLU activation and of convolutional neural networks in particular. In sparse coding, when the model's basis functions are assumed to be orthogonal, the optimal coefficients are given by the soft-threshold function of the basis functions projected onto the input image. In a non-negative variant of sparse coding, the soft-threshold function becomes a ReLU. 
Here, we derive these solutions via sparse coding with orthogonal-assumed basis functions, then we derive the convolutional neural network forward transformation from a modified non-negative orthogonal sparse coding model with an exponential prior parameter for each sparse coding coefficient. Next, we derive a complete convolutional neural network without normalization and pooling by adding logistic regression to a hierarchical sparse coding model. Finally, we motivate potentially more robust forward transformations by maintaining sparse priors in convolutional neural networks as well as performing a stronger nonlinear transformation.	翻訳日:2021-08-18 06:15:56 公開日:2021-08-14
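The central identity can be checked numerically. With an orthonormal dictionary D, the objective ½‖x − Da‖² + λ‖a‖₁ separates coordinate-wise, so it is solved by soft-thresholding Dᵀx. Adding the constraint a ≥ 0 turns the solution into ReLU(Dᵀx − λ), that is, a linear layer with bias −λ followed by a ReLU. The NumPy sketch below uses a random orthonormal D and a hypothetical λ. It illustrates the identity only, not the paper's full convolutional derivation.

```python
import numpy as np


def soft_threshold(z, lam):
    """Elementwise argmin_a 0.5 * (a - z)^2 + lam * |a|."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def nonneg_sparse_code(D, x, lam):
    """argmin_{a >= 0} 0.5 * ||x - D a||^2 + lam * ||a||_1 for orthonormal D: ReLU(D^T x - lam)."""
    return np.maximum(D.T @ x - lam, 0.0)


def objective(D, x, a, lam):
    return 0.5 * np.sum((x - D @ a) ** 2) + lam * np.sum(np.abs(a))


rng = np.random.default_rng(0)
D, _ = np.linalg.qr(rng.normal(size=(16, 16)))  # orthonormal basis functions as columns
x = rng.normal(size=16)
lam = 0.5

a_star = nonneg_sparse_code(D, x, lam)
# No random non-negative perturbation of the ReLU solution should lower the objective
perturbed = [objective(D, x, np.maximum(a_star + 0.1 * rng.normal(size=16), 0.0), lam)
             for _ in range(1000)]
print(objective(D, x, a_star, lam) <= min(perturbed))
```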
# (参考訳) Equity-Directed Bootstrapping:実例と分析 Equity-Directed Bootstrapping: Examples and Analysis ( http://arxiv.org/abs/2108.06624v1 ) ライセンス: CC BY 4.0	Harish S. Bhat and Majerle E. Reeves and Sidra Goldman-Mellor	(参考訳) 非常に不均衡なバイナリ分類問題に直面した場合、私たちはしばしば、各クラスのインスタンス数がより好ましい比率で発生するブートストラップデータ上でモデルを訓練する。グループ間の分類器のパフォーマンスのバランスをとるために、ラベルとグループアイデンティティの両方に関してバランスの取れたトレーニングセットを達成するためにブートストラップを行うことができる。重度クラス不均衡の例として, 行政患者記録から自殺死亡の予測を例に, エクイティ指向のブートストラップが, 同等のオッズ基準を満たすよりも, テストセットの感性や特異性を, どのように得るかを示す。 naïve Bayesとロジスティック回帰の文脈で、私たちは、株式指向のブートストラップを分析し、オッズ比を1に近づけ、インターセプト調整、しきい値調整、重み付けを含む手法にリンクすることで機能することを示した。 When faced with severely imbalanced binary classification problems, we often train models on bootstrapped data in which the number of instances of each class occurs in a more favorable ratio, e.g., one. We view algorithmic inequity through the lens of imbalanced classification: in order to balance the performance of a classifier across groups, we can bootstrap to achieve training sets that are balanced with respect to both labels and group identity. For an example problem with severe class imbalance (prediction of suicide death from administrative patient records), we illustrate how an equity-directed bootstrap can bring test set sensitivities and specificities much closer to satisfying the equal odds criterion. In the context of naïve Bayes and logistic regression, we analyze the equity-directed bootstrap, demonstrating that it works by bringing odds ratios close to one, and linking it to methods involving intercept adjustment, thresholding, and weighting.	翻訳日:2021-08-18 06:14:50 公開日:2021-08-14
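One plausible reading of the equity-directed bootstrap is sketched below in NumPy. It draws the same number of samples, with replacement, from every (label, group) cell, so the resulting training set is balanced with respect to both labels and group identity. `equity_directed_bootstrap` and `n_per_cell` are illustrative names, and the paper's exact sampling scheme may differ.

```python
import numpy as np

def equity_directed_bootstrap(y, group, n_per_cell, seed=0):
    """Return bootstrap indices with n_per_cell draws from every (label, group) cell."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    group = np.asarray(group)
    indices = []
    for label in np.unique(y):
        for g in np.unique(group):
            cell = np.flatnonzero((y == label) & (group == g))
            if cell.size == 0:
                raise ValueError(f"empty cell: label={label!r}, group={g!r}")
            # Sampling with replacement lets rare cells be upsampled to n_per_cell.
            indices.append(rng.choice(cell, size=n_per_cell, replace=True))
    return np.concatenate(indices)
```

Because sampling is done with replacement, a rare cell (for example, positive labels within a minority group) can be brought up to the same size as a common one.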
# (参考訳) 時間グラフ協調変圧器を用いた連続時間逐次推薦 Continuous-Time Sequential Recommendation with Temporal Graph Collaborative Transformer ( http://arxiv.org/abs/2108.06625v1 ) ライセンス: CC BY-SA 4.0	Ziwei Fan and Zhiwei Liu and Jiawei Zhang and Yun Xiong and Lei Zheng and Philip S. Yu	(参考訳) ユーザの嗜好の進化をモデル化するために,逐次レコメンデーション(SR)問題として定義される時間順アイテム購入シーケンスに基づいて,ユーザ/アイテム埋め込みを学習する必要がある。既存の手法はシーケンシャルなパターンを利用してアイテムの遷移をモデル化する。しかし、その多くは、ユーザとアイテムの相互作用の進化に遅れ、シーケンシャルなパターンと共存する重要な時間的協調シグナルを無視している。そこで本研究では,推薦の質を向上させるために,逐次的パターンと時間的協調信号の統合を提案する。まず,シーケンシャルパターンと協調信号を同時にエンコードすることは困難である。第二に、協調信号の時間的効果を表現することは自明ではない。そこで我々は,連続時間二成分グラフ上に新たなフレームワークであるTemporal Graph Sequential Recommender (TGSRec) を設計する。本稿では,TGSRecにおけるTCT(Temporal Collaborative Transformer)層を提案する。 TCT層は、シーケンシャルパターン内の時間的ダイナミクスを考慮しながら、ユーザとアイテムの両方からの協調的なシグナルを同時に捉えることができる。我々は,学習した情報を時間グラフ上で伝達し,逐次パターンと時間協調信号を統合する。 5つのデータセットの実証的な結果は、TGSRecがRecall@10とMRRのそれぞれ平均22.5%と22.1%の絶対改善で他のベースラインを大幅に上回っていることを示している。 In order to model the evolution of user preference, we should learn user/item embeddings based on time-ordered item purchasing sequences, which is defined as the Sequential Recommendation (SR) problem. Existing methods leverage sequential patterns to model item transitions. However, most of them ignore crucial temporal collaborative signals, which are latent in evolving user-item interactions and coexist with sequential patterns. Therefore, we propose to unify sequential patterns and temporal collaborative signals to improve the quality of recommendation, which is rather challenging. Firstly, it is hard to simultaneously encode sequential patterns and collaborative signals. Secondly, it is non-trivial to express the temporal effects of collaborative signals. Hence, we design a new framework, Temporal Graph Sequential Recommender (TGSRec), upon our defined continuous-time bipartite graph. We propose a novel Temporal Collaborative Transformer (TCT) layer in TGSRec, which advances the self-attention mechanism by adopting a novel collaborative attention. 
The TCT layer can simultaneously capture collaborative signals from both users and items while also considering temporal dynamics inside sequential patterns. We propagate the information learned from the TCT layer over the temporal graph to unify sequential patterns and temporal collaborative signals. Empirical results on five datasets show that TGSRec significantly outperforms other baselines, with average absolute improvements of up to 22.5% and 22.1% in Recall@10 and MRR, respectively.	翻訳日:2021-08-18 05:54:50 公開日:2021-08-14
# (参考訳) メモリ圧縮技術を用いたGAN加速に関する調査 A Survey on GAN Acceleration Using Memory Compression Technique ( http://arxiv.org/abs/2108.06626v1 ) ライセンス: CC BY 4.0	Dina Tantawy, Mohamed Zahran, Amr Wassal	(参考訳) その発明以来、GAN(Generative Adversarial Network)は多くのアプリケーションで顕著な結果を示している。 Generative Adversarial Networksは、リソース不足のディープラーニングモデルである。通常のディープラーニングモデルとの主な違いは、その出力の性質である。例えば、GAN出力は画像全体であり、他のモデルがオブジェクトを検出したり、画像を分類したりすることができる。このように、ネットワークのアーキテクチャと数値精度は、ソリューションの品質と速度に影響を与える。したがって、GANの加速は重要である。 GANの高速化は,(1)メモリ圧縮,(2)計算最適化,(3)データフロー最適化の3つの主要なトラックに分類される。データ転送がエネルギー消費の主な源であるため、メモリ圧縮は最大の節約につながる。そこで本稿では,CNN ベース GAN のメモリ圧縮技術について検討する。さらに, GANの加速の機会と課題を要約し, オープンな研究課題をさらに検討することを提案する。 Since their invention, generative adversarial networks (GANs) have shown outstanding results in many applications. GANs are powerful yet resource-hungry deep learning models. Their main difference from ordinary deep learning models is the nature of their output. For example, a GAN's output can be a whole image, whereas other models detect objects or classify images. Thus, the architecture and numeric precision of the network affect the quality and speed of the solution. Hence, accelerating GANs is pivotal. Approaches to accelerating GANs can be classified into three main tracks: (1) memory compression, (2) computation optimization, and (3) data-flow optimization. Because data transfer is the main source of energy usage, memory compression leads to the most savings. Thus, in this paper, we survey memory compression techniques for CNN-based GANs. Additionally, the paper summarizes opportunities and challenges in GAN acceleration and suggests open research problems for further investigation.	翻訳日:2021-08-18 05:33:56 公開日:2021-08-14
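As a concrete example from the memory-compression family, the sketch below applies symmetric per-tensor linear quantization to float32 weights, mapping them to int8 and so reducing weight storage by a factor of 4. It is a generic illustration built on our own assumptions (one scale per tensor, round-to-nearest) and is not a specific method taken from the survey.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor linear quantization of float weights to int8."""
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0  # one float scale for the whole tensor
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to approximate float32 weights."""
    return q.astype(np.float32) * scale
```

With round-to-nearest, the reconstruction error per weight is bounded by half the scale, which gives a simple way to check the memory/precision trade-off mentioned in the abstract.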
# (参考訳) ニューラルネットワークにおけるドロップアウト正規化とモデルの複雑さの関係 Investigating the Relationship Between Dropout Regularization and Model Complexity in Neural Networks ( http://arxiv.org/abs/2108.06628v1 ) ライセンス: CC BY 4.0	Christopher Sun, Jai Sharma, and Milind Maiti	(参考訳) 分散を減らすのに役立つDropout Regularizationは、ディープラーニングモデルではほぼどこでも利用できる。本研究では、3つのデータセットそれぞれについて、ドロップアウトレートと密集層内の隠れ単位数をランダムに組み合わせて構成した2000のニューラルネットワークをトレーニングすることにより、ドロップアウトレートとモデルの複雑さの関係を考察する。二つのクロスエントロピー損失とz軸上の二乗精度を持つ生成した数値は、降下率を高めながら密度層に深さを加えるという一般的な仮定に疑問を呈する。また,この2つのハイパーパラメータの複雑な相関関係を,各密層に隠れた単位が与えられた場合の最適脱落率を予測する機械学習モデルと深層学習モデルを構築し,定量化を進める。線形回帰と多項式ロジスティック回帰は、回帰に含まれるコストデータポイントをそれぞれ選択し、コストデータポイントをバイナリ分類に割り当てるために任意のしきい値を使用する必要がある。これらの機械学習モデルは、その素質が複雑な決定境界のモデリングを妨げたため、中間性能を有する。ディープラーニングモデルに目を向けると、各密層内の隠れ単位数、所望のコスト、モデルの所望の精度を考慮して、最適なドロップアウト率を予測するニューラルネットワークを構築する。しかし、この試みは垂直線試験の失敗に起因した数学的誤りに遭遇する。究極のディープラーニングモデルは、決定境界が2000の以前に生成されたデータポイントを表すニューラルネットワークである。この最終モデルは,計算コストを最小限に抑えつつ性能を最大化するために,ハイパーパラメータをチューニングするための有望な手法を考案する。この戦略は任意のモデルハイパーパラメータに適用でき、工業モデルのより効率的なチューニングが期待できる。 Dropout Regularization, serving to reduce variance, is nearly ubiquitous in Deep Learning models. We explore the relationship between the dropout rate and model complexity by training 2,000 neural networks configured with random combinations of the dropout rate and the number of hidden units in each dense layer, on each of the three data sets we selected. The generated figures, with binary cross entropy loss and binary accuracy on the z-axis, question the common assumption that adding depth to a dense layer while increasing the dropout rate will certainly enhance performance. We also discover a complex correlation between the two hyperparameters that we proceed to quantify by building additional machine learning and Deep Learning models which predict the optimal dropout rate given the number of hidden units in each dense layer. 
Linear regression and polynomial logistic regression require the use of arbitrary thresholds to select the cost data points included in the regression and to assign the cost data points a binary classification, respectively. These machine learning models have mediocre performance because their naive nature prevents them from modeling complex decision boundaries. Turning to Deep Learning models, we build neural networks that predict the optimal dropout rate given the number of hidden units in each dense layer, the desired cost, and the desired accuracy of the model. However, this attempt encounters a mathematical error that can be attributed to the failure of the vertical line test. The ultimate Deep Learning model is a neural network whose decision boundary represents the 2,000 previously generated data points. This final model leads us to devise a promising method for tuning hyperparameters to minimize computational expense while maximizing performance. The strategy can be applied to any model hyperparameters, with the prospect of more efficient tuning in industrial models.	翻訳日:2021-08-18 05:17:57 公開日:2021-08-14
# 教師なし再同定のためのエッジクラウド連続体における協調最適化 Joint Optimization in Edge-Cloud Continuum for Federated Unsupervised Person Re-identification ( http://arxiv.org/abs/2108.06493v1 ) ライセンス: Link先を確認	Weiming Zhuang, Yonggang Wen, Shuai Zhang	(参考訳) 人物再識別(ReID)は、重複しないカメラビューから人物を再識別することを目的としている。個人ReIDデータには機密情報が含まれているため、研究者は、プライバシー漏洩のリスクを軽減するために、分散トレーニング手法であるフェデレーション学習を採用した。しかし、既存の研究は、取得に手間と時間を要するデータラベルに依存している。プライバシを保ちながらラベルのない人物ReIDモデルを学習するための,非教師付き人物ReIDシステムであるFedUReIDを提案する。 FedUReIDは、ラベルのないデータを持つエッジ上で、その場でモデルのトレーニングを可能にする。クラウドサーバは、データプライバシを保存するために生データを集中するのではなく、エッジからモデルを集約する。さらに,エッジがデータ量や分布によって異なるという問題に取り組むため,クラウドとエッジを共同で最適化することでエッジでのトレーニングをパーソナライズする。具体的には、トレーニングを通して計算を再割り当てするパーソナライズ・エポック、ラベルのないデータに適したラベルを反復的に予測するパーソナライズ・クラスタリング、各エッジにサーバ集約モデルを適用するパーソナライズ・アップデートを提案する。 8人のReIDデータセットに対する大規模な実験は、FedUReIDがより高い精度を達成するだけでなく、計算コストを29%削減することを示した。統合最適化によるFedUReIDシステムは,データラベルのないマルチメディアタスクへのフェデレーション学習の実装に光を当てるでしょう。 Person re-identification (ReID) aims to re-identify a person from non-overlapping camera views. Since person ReID data contains sensitive personal information, researchers have adopted federated learning, an emerging distributed training method, to mitigate the privacy leakage risks. However, existing studies rely on data labels that are laborious and time-consuming to obtain. We present FedUReID, a federated unsupervised person ReID system to learn person ReID models without any labels while preserving privacy. FedUReID enables in-situ model training on edges with unlabeled data. A cloud server aggregates models from edges instead of centralizing raw data to preserve data privacy. Moreover, to tackle the problem that edges vary in data volumes and distributions, we personalize training in edges with joint optimization of cloud and edge. Specifically, we propose personalized epoch to reassign computation throughout training, personalized clustering to iteratively predict suitable labels for unlabeled data, and personalized update to adapt the server aggregated model to each edge. 
Extensive experiments on eight person ReID datasets demonstrate that FedUReID not only achieves higher accuracy but also reduces computation cost by 29%. Our FedUReID system with the joint optimization will shed light on applying federated learning to more multimedia tasks without data labels.	翻訳日:2021-08-17 15:30:52 公開日:2021-08-14
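The cloud-side step, in which the server aggregates models from edges, can be sketched as FedAvg-style averaging of per-edge parameters weighted by each edge's data volume. This is an assumption made for illustration. `aggregate` is a hypothetical helper, parameters are represented as plain NumPy arrays held in dicts, and FedUReID's personalized epoch, clustering, and update steps are not reproduced.

```python
import numpy as np

def aggregate(edge_models, edge_sizes):
    """Average per-edge parameter dicts, weighting each edge by its local data volume."""
    total = float(sum(edge_sizes))
    keys = edge_models[0].keys()
    return {
        k: sum(model[k] * (n / total) for model, n in zip(edge_models, edge_sizes))
        for k in keys
    }
```

Only parameters leave the edges; the raw images stay local, which is the privacy argument the abstract makes.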
# 分散データからの協調的教師なし視覚表現学習 Collaborative Unsupervised Visual Representation Learning from Decentralized Data ( http://arxiv.org/abs/2108.06492v1 ) ライセンス: Link先を確認	Weiming Zhuang, Xin Gan, Yonggang Wen, Shuai Zhang, Shuai Yi	(参考訳) 教師なし表現学習は、インターネットで利用可能な集中型データを使用して優れたパフォーマンスを達成している。しかし、プライバシー保護に対する意識の高まりは、複数の当事者(携帯電話やカメラなど)で爆発的に増加する非ラベル画像データの分散化を制限する。そのため、データプライバシを保ちながら、これらのデータを活用して下流タスクの視覚的表現を学習する方法が自然な問題である。この問題に対処するために,新しいフェデレーション付き教師なし学習フレームワークであるFedUを提案する。このフレームワークでは、オンラインネットワークとターゲットネットワークとの対比学習を用いて、各パーティはラベルのないデータからモデルを独立に訓練する。そして、中央サーバが訓練されたモデルを集約し、集約されたモデルでクライアントのモデルを更新する。データのプライバシは、各パーティが生のデータのみにアクセスできることから保護する。複数のパーティ間の分散データは、通常非独立で同一の分散(非IID)であり、性能劣化を引き起こす。この課題に対処するために,1) サーバ集約のためのオンラインネットワークのエンコーダのみをアップロードし,それを集約したエンコーダで更新するための通信プロトコルを設計し,2) 非IID によるばらつきに基づいた予測器の更新方法を動的に決定する新しいモジュールを提案する。予測器はオンラインネットワークの他のコンポーネントである。広範囲な実験とアブレーションがfeduの有効性と意義を示している。非IIDデータに対する線形および半教師付き評価において、一方の当事者のみによるトレーニングを5%以上、その他の手法で14%以上上回っている。 Unsupervised representation learning has achieved outstanding performances using centralized data available on the Internet. However, the increasing awareness of privacy protection limits sharing of decentralized unlabeled image data that grows explosively in multiple parties (e.g., mobile phones and cameras). As such, a natural problem is how to leverage these data to learn visual representations for downstream tasks while preserving data privacy. To address this problem, we propose a novel federated unsupervised learning framework, FedU. In this framework, each party trains models from unlabeled data independently using contrastive learning with an online network and a target network. Then, a central server aggregates trained models and updates clients' models with the aggregated model. It preserves data privacy as each party only has access to its raw data. Decentralized data among multiple parties are normally non-independent and identically distributed (non-IID), leading to performance degradation. 
To tackle this challenge, we propose two simple but effective methods: 1) We design the communication protocol to upload only the encoders of online networks for server aggregation and update them with the aggregated encoder; 2) We introduce a new module to dynamically decide how to update predictors based on the divergence caused by non-IID. The predictor is the other component of the online network. Extensive experiments and ablations demonstrate the effectiveness and significance of FedU. It outperforms training with only one party by over 5% and other methods by over 14% in linear and semi-supervised evaluation on non-IID data.	翻訳日:2021-08-17 15:28:58 公開日:2021-08-14
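The divergence-based predictor update can be sketched under the following assumptions. The server sends back an aggregated predictor as well as an aggregated encoder. Divergence is measured as the L2 distance between the local and aggregated encoder weights. `mu` is a hypothetical threshold. The names `client_update` and `l2_divergence` are illustrative only and do not describe FedU's actual interface.

```python
import numpy as np

def l2_divergence(local, global_):
    """Euclidean distance between two parameter dicts with matching keys."""
    return float(np.sqrt(sum(np.sum((local[k] - global_[k]) ** 2) for k in local)))

def client_update(local_encoder, local_predictor, global_encoder, global_predictor, mu):
    """Always adopt the aggregated encoder; adopt the aggregated predictor only when the
    local encoder has not drifted far (divergence < mu) from the aggregated one."""
    divergence = l2_divergence(local_encoder, global_encoder)
    new_encoder = {k: v.copy() for k, v in global_encoder.items()}
    if divergence < mu:
        new_predictor = {k: v.copy() for k, v in global_predictor.items()}
    else:
        new_predictor = local_predictor  # keep the locally adapted predictor
    return new_encoder, new_predictor, divergence
```

The intent is that a client whose data distribution is far from the others (large divergence) keeps its own predictor instead of being pulled toward the average.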
# 深層モデルに基づく強化学習のためのフラクショナルトランスファー学習 Fractional Transfer Learning for Deep Model-Based Reinforcement Learning ( http://arxiv.org/abs/2108.06526v1 ) ライセンス: Link先を確認	Remo Sasso, Matthia Sabatelli, Marco A. Wiering	(参考訳) 強化学習(RL)は、RLエージェントが複雑なタスクを実行することを学ぶために大量のデータを必要とすることで知られている。モデルベースRLの最近の進歩により、エージェントはよりデータ効率が良くなり、内部のワールドモデルを活用することで、視覚環境の振る舞いを想像で学べるようになった。サンプル効率の改善は、以前に学習したタスクから知識を再利用することでも達成できるが、転送学習はRLの課題である。パラメータベースの転送学習は一般的に、ネットワークのパラメータが完全に転送されるかランダムに初期化されるオール・オア・ナッシング・アプローチを用いて行われる。本研究では,簡単な代替手法である分数転送学習を提案する。アイデアは知識の分数を転送することであり、ランダム初期化で一般的に行われるような潜在的に有用な知識を破棄することとは対照的である。 World Model-based Dreamerアルゴリズムを用いて、このアプローチが適用可能なコンポーネントの種類を特定し、新しいマルチソース転送学習環境で実験を行う。その結果,スクラッチからの学習やランダムな初期化に比べて,分数変換学習が性能と学習の大幅な向上につながることが示唆された。 Reinforcement learning (RL) is well known for requiring large amounts of data in order for RL agents to learn to perform complex tasks. Recent progress in model-based RL allows agents to be much more data-efficient, as it enables them to learn behaviors of visual environments in imagination by leveraging an internal World Model of the environment. Improved sample efficiency can also be achieved by reusing knowledge from previously learned tasks, but transfer learning is still a challenging topic in RL. Parameter-based transfer learning is generally done using an all-or-nothing approach, where the network's parameters are either fully transferred or randomly initialized. In this work we present a simple alternative approach: fractional transfer learning. The idea is to transfer fractions of knowledge, as opposed to discarding potentially useful knowledge as is commonly done with random initialization. Using the World Model-based Dreamer algorithm, we identify which types of components this approach is applicable to, and perform experiments in a new multi-source transfer learning setting. 
The results show that fractional transfer learning often leads to substantially improved performance and faster learning compared to learning from scratch and random initialization.	翻訳日:2021-08-17 15:26:14 公開日:2021-08-14
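The fractional idea can be sketched for a single weight matrix. The sketch assumes the target layer starts as fresh random weights plus a fraction `omega` of the corresponding source weights, so that `omega = 0` reduces to plain random initialization. The helper name, the Gaussian initializer, and `init_std` are our assumptions and are not details given in the abstract.

```python
import numpy as np

def fractional_transfer(source_weights, omega, seed=0, init_std=0.05):
    """Initialise a target layer as random weights plus a fraction omega of the source."""
    if not 0.0 <= omega <= 1.0:
        raise ValueError("omega must lie in [0, 1]")
    rng = np.random.default_rng(seed)
    random_init = rng.normal(0.0, init_std, size=source_weights.shape)
    return random_init + omega * source_weights  # keep part of the source knowledge
```

Compared with the all-or-nothing choice, `omega` becomes a single knob that moves between discarding the source weights and reusing them.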
# 情報ボトルネック理論による初期化のためのニューロン運動 Neuron Campaign for Initialization Guided by Information Bottleneck Theory ( http://arxiv.org/abs/2108.06530v1 ) ライセンス: Link先を確認	Haitao Mao, Xu Chen, Qiang Fu, Lun Du, Shi Han and Dongmei Zhang	(参考訳) ディープニューラルネットワーク(DNN)のトレーニングには初期化が重要な役割を果たしている。既存の初期化戦略は主に、勾配の消失/爆発を緩和するためにトレーニングプロセスを安定化することに焦点を当てている。しかし, これらの初期化手法は, 一般化能力の向上を考慮に入れていない。 Information Bottleneck(IB)理論は、DNNの一般化を説明するためのよく知られた理解フレームワークである。 IB理論の知見に導かれ、DNNをより良く初期化するための2つの基準を設計した。さらに、与えられたデータセット上でニューラルネットワークの優れた初期化を選択するために、ニューロンキャンペーン初期化アルゴリズムを設計する。 MNISTデータセットを用いた実験により,より高速な収束による一般化性能の向上が得られた。 Initialization plays a critical role in the training of deep neural networks (DNNs). Existing initialization strategies mainly focus on stabilizing the training process to mitigate vanishing/exploding gradient problems. However, these initialization methods do not consider how to enhance generalization ability. The Information Bottleneck (IB) theory is a well-known framework for explaining the generalization of DNNs. Guided by the insights provided by IB theory, we design two criteria for better initializing DNNs. We further design a neuron campaign initialization algorithm to efficiently select a good initialization for a neural network on a given dataset. Experiments on the MNIST dataset show that our method can lead to better generalization performance with faster convergence.	翻訳日:2021-08-17 15:25:56 公開日:2021-08-14
# 弱い教師付き連続学習 Weakly Supervised Continual Learning ( http://arxiv.org/abs/2108.06552v1 ) ライセンス: Link先を確認	Matteo Boschini, Pietro Buzzega, Lorenzo Bonicelli, Angelo Porrello, Simone Calderara	(参考訳) 連続学習(CL)は、破滅的な忘れを伴わずに、タスクのストリーム上でディープネットワークのトレーニング方法を調査する。文献で提案されたCL設定は、すべての入力サンプルが接地真実アノテーションとペアリングされていると仮定する。しかし、これは多くの現実世界のアプリケーションと衝突する。ラベル付きデータの収集は、それ自体は面倒で高価であり、データの流れがストリームとして流れ、リアルタイムに消費されなければならないときに実際に実現不可能になる。この研究は、Weakly Supervised Continual Learning (WSCL): ここでは、ラベル付き入力例のごく一部を学習者に示す。 CLメソッドの現在の方法(例)を評価する。 EWC, LwF, iCaRL, ER, GDumb, DER) は, この斬新で難解なシナリオにおいて, 過剰な絡み合いを忘れてしまう。その後、メトリクス学習と整合性正規化を利用して学習中に教師なしデータを活用する2つの新しいWSCL手法を設計する。その結果,提案手法は,情報監督時に高い柔軟性を示すだけでなく,25%未満のラベルが完全な監視下で訓練されたSOTAメソッドに到達したり,あるいは上回ったりできることがわかった。 Continual Learning (CL) investigates how to train Deep Networks on a stream of tasks without incurring catastrophic forgetting. CL settings proposed in the literature assume that every incoming example is paired with ground-truth annotations. However, this clashes with many real-world applications: gathering labeled data, which is in itself tedious and expensive, becomes infeasible when data flow as a stream and must be consumed in real time. This work explores Weakly Supervised Continual Learning (WSCL), in which only a small fraction of the input examples shown to the learner are labeled. We assess how current CL methods (e.g., EWC, LwF, iCaRL, ER, GDumb, DER) perform in this novel and challenging scenario, in which overfitting entangles forgetting. Subsequently, we design two novel WSCL methods which exploit metric learning and consistency regularization to leverage unsupervised data while learning. In doing so, we show not only that our proposals exhibit higher flexibility when supervised information is scarce, but also that less than 25% of the labels can be enough to reach or even outperform SOTA methods trained under full supervision.	翻訳日:2021-08-17 15:24:14 公開日:2021-08-14
# 正に着目する:生物多様性モニタリングのための自己監督型学習 Focus on the Positives: Self-Supervised Learning for Biodiversity Monitoring ( http://arxiv.org/abs/2108.06435v1 ) ライセンス: Link先を確認	Omiros Pantazis, Gabriel Brostow, Kate Jones, Oisin Mac Aodha	(参考訳) ラベルのない画像コレクションから自己教師付き表現を学習する問題に対処する。各入力画像の拡張バージョン間の類似性を最大化したり、負のサンプルを投機的に選択することで、有用な機能を学習しようとする既存のアプローチとは異なり、代わりに静的監視カメラでキャプチャされる画像コレクションで発生する自然な変動も利用します。これを実現するために,入力画像間の空間的および時間的関係などの情報をエンコードする,容易に利用可能なコンテキストデータを利用する。まず、トレーニング時に高い確率の正のペアを識別することで、下流の教師付き分類に驚くほど有効である表現を学習することができる。同じ視覚概念を表現しそうな画像です地球生物多様性監視の重要課題として、人間の監督が限定された視覚的種分類タスクに適応可能な画像特徴があげられる。本研究では,4種類のカメラトラップ画像から,3種類の自己教師あり学習法を対象とし,従来の自己教師あり学習や転送学習に比べて,トレーニング時の注意深い画像選択が優れた性能をもたらすことを示す。 We address the problem of learning self-supervised representations from unlabeled image collections. Unlike existing approaches that attempt to learn useful features by maximizing similarity between augmented versions of each input image or by speculatively picking negative samples, we instead also make use of the natural variation that occurs in image collections that are captured using static monitoring cameras. To achieve this, we exploit readily available context data that encodes information such as the spatial and temporal relationships between the input images. We are able to learn representations that are surprisingly effective for downstream supervised classification, by first identifying high probability positive pairs at training time, i.e. those images that are likely to depict the same visual concept. For the critical task of global biodiversity monitoring, this results in image features that can be adapted to challenging visual species classification tasks with limited human supervision. 
We present results on four different camera trap image collections, across three different families of self-supervised learning methods, and show that careful image selection at training time results in superior performance compared to existing baselines such as conventional self-supervised training and transfer learning.	翻訳日:2021-08-17 15:22:56 公開日:2021-08-14
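Context data might be turned into positive pairs as sketched below. The sketch assumes that each image carries a camera ID and a capture timestamp, and that two images from the same static camera taken within `max_seconds` of each other probably show the same visual concept. The function name and the threshold are hypothetical, and the paper's actual selection rule may differ.

```python
from itertools import combinations

def context_positive_pairs(metadata, max_seconds=60):
    """Pair images from the same static camera taken within max_seconds of each other.

    metadata: list of (image_id, camera_id, unix_timestamp) tuples.
    """
    pairs = []
    for (id_a, cam_a, t_a), (id_b, cam_b, t_b) in combinations(metadata, 2):
        if cam_a == cam_b and abs(t_a - t_b) <= max_seconds:
            pairs.append((id_a, id_b))
    return pairs
```

The pairwise loop is quadratic in the number of images. For large camera-trap collections, sorting by camera and timestamp would allow a sliding-window version instead.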
# 強化学習による情報経路計画戦略の適応的選択 Adaptive Selection of Informative Path Planning Strategies via Reinforcement Learning ( http://arxiv.org/abs/2108.06618v1 ) ライセンス: Link先を確認	Taeyeong Choi, Grzegorz Cielniak	(参考訳) 従来の研究では,Gaussian Process Regression (GPR) の予測の不確かさを経路計画におけるロボットの"引き込み力"として用いることで,空間補間の大幅な精度向上を導くためにサンプリング位置を優先する体系的な方針を考案した。また, トラベリングセールスマン問題 (TSP) と統合することで, 比較的短い走行距離が得られたが, 最終的には準最適位置が経路に含まれるため, 全体の予測精度を低下させる要因がいくつか考えられる。そこで,本稿ではまず,次のサンプリング位置が優先される様々な空間範囲を取り入れた「ローカルプランニング」アプローチについて検討し,予測性能および帰路距離への影響について検討する。また、Reinforcement Learning (RL)ベースのハイレベルコントローラは、特定のローカルプランナーセットからブレンドプランを適応的に作成するように訓練され、最新の予測状態に応じてその選択から独自の強みを継承する。本研究は, 温度モニタリングロボットを用いた実験により, プランナーの動的混合により, 単一のプランナーが単独で作成できない高度な情報プランを生成するだけでなく, 最短経路計算のための追加モジュールを必要とせず, 予測信頼性の犠牲なしに, 大幅に短縮された走行距離を保証できることを示した。 In our previous work, we designed a systematic policy to prioritize sampling locations so as to achieve significant accuracy improvements in spatial interpolation, using the prediction uncertainty of Gaussian Process Regression (GPR) as an "attraction force" for deployed robots in path planning. Although the integration with Traveling Salesman Problem (TSP) solvers was also shown to produce relatively short travel distances, we here hypothesise several factors that could also decrease the overall prediction precision, because sub-optimal locations may eventually be included in their paths. To address this issue, in this paper, we first explore "local planning" approaches adopting various spatial ranges within which next sampling locations are prioritized, to investigate their effects on the prediction performance as well as the incurred travel distance. Also, Reinforcement Learning (RL)-based high-level controllers are trained to adaptively produce blended plans from a particular set of local planners, inheriting unique strengths from that selection depending on the latest prediction states. 
Our experiments on use cases of temperature monitoring robots demonstrate that the dynamic mixtures of planners can not only generate sophisticated, informative plans that a single planner could not create alone but also ensure significantly reduced travel distances at no cost to prediction reliability, without the assistance of any additional modules for shortest-path calculation.	翻訳日:2021-08-17 15:21:01 公開日:2021-08-14
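The uncertainty-driven "local planning" step can be illustrated with a small NumPy Gaussian process. The sketch computes the posterior variance at candidate locations and picks the most uncertain candidate within a given radius of the robot. The RBF kernel, the unit prior variance, `radius`, and all function names are assumptions made for illustration. The paper's planners and its RL controller are not reproduced.

```python
import numpy as np

def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel between point sets a (n, d) and b (m, d)."""
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)

def gp_variance(sampled, candidates, noise=1e-6, length_scale=1.0):
    """GP posterior variance at candidates given sampled locations (unit prior variance)."""
    K = rbf(sampled, sampled, length_scale) + noise * np.eye(len(sampled))
    k_star = rbf(sampled, candidates, length_scale)  # shape (n, m)
    solved = np.linalg.solve(K, k_star)              # K^{-1} k_*
    return 1.0 - np.sum(k_star * solved, axis=0)

def next_local_location(robot_pos, sampled, candidates, radius, length_scale=1.0):
    """Pick the most uncertain candidate within `radius` of the robot."""
    dist = np.linalg.norm(candidates - robot_pos, axis=1)
    in_range = np.flatnonzero(dist <= radius)
    if in_range.size == 0:
        in_range = np.arange(len(candidates))  # fall back to a global choice
    var = gp_variance(sampled, candidates[in_range], length_scale=length_scale)
    return candidates[in_range[np.argmax(var)]]
```

GP posterior variance depends only on where samples were taken, not on the measured values. That is why the sketch needs no temperature readings. Varying `radius` mimics the different spatial ranges the paper compares.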
# カラー画像復元のための高次元支援生成モデル High-dimensional Assisted Generative Model for Color Image Restoration ( http://arxiv.org/abs/2108.06460v1 ) ライセンス: Link先を確認	Kai Hong, Chunhua Wu, Cailian Yang, Minghui Zhang, Yancheng Lu, Yuhao Wang, and Qiegen Liu	(参考訳) 本研究では,高次元スコアベース生成モデルを用いたカラー画像復元のための教師なし深層学習手法を提案する。スコアベース生成モデルのサンプル数と内部次元がデータ分布の勾配推定に重要な影響を与えることを考慮し、チャネルコピー変換はサンプル数を増加させ、ピクセルスケール変換は実現可能な空間次元を減少させる。その後、これらの変換で表される高次元テンソルの集合を用いて、スコアマッチングを denoising score matching によってネットワークを訓練する。次に、ランジュバンダイナミクスと代替データ一貫性更新をアニーリングしてサンプリングを行う。さらに,高次元表現を学習することの難しさを軽減するために,性能を活用するためのプログレッシブ戦略を提案する。事前学習のための事前学習型生成ネットワークを含む教師なし学習と反復的復元アルゴリズムは,他のデータ駆動型アプローチと比較して透明で明確な解釈が可能である。解体・塗布実験の結果,提案手法の顕著な性能と多様性が得られた。 This work presents an unsupervised deep learning scheme that exploits a high-dimensional assisted score-based generative model for color image restoration tasks. Considering that the sample number and internal dimension in a score-based generative model strongly influence the estimation of the gradients of the data distribution, two different high-dimensional transformations are proposed: the channel-copy transformation increases the sample number, and the pixel-scale transformation decreases the feasible space dimension. Subsequently, a set of high-dimensional tensors represented by these transformations is used to train the network through denoising score matching. Then, sampling is performed by annealed Langevin dynamics with alternating data-consistency updates. Furthermore, to alleviate the difficulty of learning high-dimensional representations, a progressive strategy is proposed to improve performance. The proposed unsupervised learning and iterative restoration algorithm, which involves a pre-trained generative network to obtain a prior, has a transparent and clear interpretation compared to other data-driven approaches. Experimental results on demosaicking and inpainting demonstrate the remarkable performance and diversity of our proposed method.	翻訳日:2021-08-17 15:19:54 公開日:2021-08-14
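The sampling step relies on annealed Langevin dynamics. The sketch below runs it on a 1-D toy target whose noise-perturbed score is known in closed form (a Gaussian). It uses the step-size schedule `alpha = eps * sigma**2 / sigma_L**2` that is common in noise-conditional score models. In the paper, the score comes from a trained network and sampling is interleaved with data-consistency updates; both are omitted here. The noise levels and `eps` are illustrative choices.

```python
import numpy as np

def annealed_langevin(score, sigmas, n_samples, steps_per_level=100, eps=2e-3, seed=0):
    """Annealed Langevin sampling from a noise-conditional score function.

    score(x, sigma) must return the gradient of log p_sigma(x); sigmas must be decreasing.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, sigmas[0], size=n_samples)  # broad initial samples
    for sigma in sigmas:
        alpha = eps * (sigma / sigmas[-1]) ** 2     # larger steps at larger noise levels
        for _ in range(steps_per_level):
            noise = rng.standard_normal(n_samples)
            x = x + 0.5 * alpha * score(x, sigma) + np.sqrt(alpha) * noise
    return x
```

For a target N(mu, s0^2), the perturbed score is `-(x - mu) / (s0**2 + sigma**2)`. Passing that score together with `np.geomspace(3.0, 0.05, 10)` should therefore yield samples whose mean and standard deviation are close to `mu` and `s0`.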
# LoResMT 2021の低リソース言語における新型コロナウイルスと手話の共有課題の発見 Findings of the LoResMT 2021 Shared Task on COVID and Sign Language for Low-resource Languages ( http://arxiv.org/abs/2108.06598v1 ) ライセンス: Link先を確認	Atul Kr. Ojha, Chao-Hong Liu, Katharina Kann, John Ortega, Sheetal Shatam, Theodorus Fransen	(参考訳) 本稿では,低リソース音声と手話の双方を対象とした,COVID-19データの機械翻訳(MT)に焦点を当てたLoResMT 2021共有タスクについて述べる。この作業は低リソース言語(LoResMT)の機械翻訳技術に関する第4回ワークショップの一環として実施された。パラレルコーポラ(parallel corpora)は、英語$\leftrightarrow$Irish、英語$\leftrightarrow$Marathi、台湾語手話$\leftrightarrow$ Traditional Chineseの順に提示され、公開されている。訓練データはそれぞれ8112セグメント、20933セグメント、128608セグメントからなる。 Marathi と English には21901セグメントからなる追加の単言語データセットがある。ここで示される結果は、合計8チームからのエントリに基づいています。 3つのチームが英語$\leftrightarrow$Irishにシステムを提出し、5つのチームが英語$\leftrightarrow$Marathiにシステムを提出した。残念なことに、台湾の手話$\leftrightarrow$Traditional Chinese taskへのシステム提出は行われなかった。BLEUで計算した最高スコアは、英語→アイルランド語が36.0、アイルランド語→英語が34.6、英語→マラーティー語が24.2、マラーティー語→英語が31.3であった。 We present the findings of the LoResMT 2021 shared task, which focuses on machine translation (MT) of COVID-19 data for both low-resource spoken and sign languages. This task was organized as part of the fourth workshop on technologies for machine translation of low resource languages (LoResMT). Parallel corpora are presented and made publicly available for the following directions: English$\leftrightarrow$Irish, English$\leftrightarrow$Marathi, and Taiwanese Sign language$\leftrightarrow$Traditional Chinese. Training data consists of 8112, 20933 and 128608 segments, respectively. There are additional monolingual data sets for Marathi and English that consist of 21901 segments. The results presented here are based on entries from a total of eight teams. Three teams submitted systems for English$\leftrightarrow$Irish, while five teams submitted systems for English$\leftrightarrow$Marathi. Unfortunately, there were no system submissions for the Taiwanese Sign language$\leftrightarrow$Traditional Chinese task. 
The highest system scores, measured with BLEU, were 36.0 for English--Irish, 34.6 for Irish--English, 24.2 for English--Marathi, and 31.3 for Marathi--English.	翻訳日:2021-08-17 15:19:24 公開日:2021-08-14
# MatSat: 行列ベースの微分可能なSATソルバ MatSat: a matrix-based differentiable SAT solver ( http://arxiv.org/abs/2108.06481v1 ) ライセンス: Link先を確認	Taisuke Sato (1) and Ryosuke Kojima (2) ((1) National Institute of Informatics (NII), (2) Graduate School of Medicine, Kyoto University)	(参考訳) 本稿では,非負の微分可能コスト関数 J^sat のコスト最小化問題として,ベクトル空間におけるSAT問題の解法を提案する。このアプローチでは、n変数のSAT問題に対する代入を満足する解は、J^sat(u) をゼロにする {0,1}^n のバイナリベクトル u で表される。ベクトル空間 R^n においてそのような u をコスト最小化、すなわち初期 u_0 から J を 0 に最小化し、ニュートン法により u を反復的に更新する。行列型微分SATソルバMatSatとして提案手法を実装した。既存の主ストリームSATソルバは、コンフリクト駆動節学習(CDCL)型や確率的局所探索(SLS)型など、解割り当ての各ビットを一つずつ決定するが、MatSatはベクトル空間内の解に連続的に近づくという点で、それらと根本的に異なる。そこで我々は,MatSatがn=10^5変数まで解を見つけることのできるランダム3SAT問題を用いて,MatSatのスケーラビリティを測定する実験を行った。私たちはまた、SAT 2018コンペティションとSAT Race 2019の勝者を含む4つの最先端SATソルバと比較し、SAT 2018コンペティションと人工ランダム3SATインスタンスセットのランダムベンチマークセットを使用して、ソリューションを見つける時間の観点から比較した。その結果、MatSatは両方のテストセットで2位となり、CDCL型ソルバよりも優れています。 We propose a new approach to SAT solving which solves SAT problems in vector spaces as a cost minimization problem of a non-negative differentiable cost function J^sat. In our approach, a solution, i.e., satisfying assignment, for a SAT problem in n variables is represented by a binary vector u in {0,1}^n that makes J^sat(u) zero. We search for such u in a vector space R^n by cost minimization, i.e., starting from an initial u_0 and minimizing J to zero while iteratively updating u by Newton's method. We implemented our approach as a matrix-based differentiable SAT solver, MatSat. Although existing mainstream SAT solvers decide each bit of a solution assignment one by one, be they of conflict-driven clause learning (CDCL) type or of stochastic local search (SLS) type, MatSat fundamentally differs from them in that it continuously approaches a solution in a vector space. We conducted an experiment to measure the scalability of MatSat with random 3-SAT problems, in which MatSat could find a solution for up to n=10^5 variables. 
We also compared MatSat with four state-of-the-art SAT solvers including winners of SAT competition 2018 and SAT Race 2019 in terms of time for finding a solution, using a random benchmark set from SAT 2018 competition and an artificial random 3-SAT instance set. The result shows that MatSat comes in second in both test sets and outperforms all the CDCL type solvers.	翻訳日:2021-08-17 15:17:27 公開日:2021-08-14
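The general idea of treating SAT as continuous cost minimization can be illustrated with a simple product-form relaxation; note that this is not MatSat's matrix formulation. Each clause contributes the product of (1 - literal value), so J(u) is zero exactly on 0/1 vectors that satisfy every clause, and on other 0/1 vectors it counts the unsatisfied clauses. In place of the Newton's method the abstract describes, the sketch uses projected gradient descent over [0, 1]^n. All names and hyperparameters are hypothetical.

```python
import numpy as np

def cost_and_grad(u, clauses):
    """Relaxed SAT cost J(u) = sum_c prod_{l in c} (1 - value_l(u)) and its gradient.

    Literals are DIMACS-style signed ints: +i means x_i, -i means NOT x_i (1-based).
    For a 0/1 vector u, J(u) equals the number of unsatisfied clauses.
    """
    J = 0.0
    grad = np.zeros_like(u)
    for clause in clauses:
        idx = np.array([abs(lit) - 1 for lit in clause])
        sign = np.array([1.0 if lit > 0 else -1.0 for lit in clause])
        value = np.where(sign > 0, u[idx], 1.0 - u[idx])  # truth degree of each literal
        miss = 1.0 - value
        J += float(np.prod(miss))
        for j in range(len(clause)):
            # d(miss_j)/du = -sign_j; the other factors of the product are held fixed
            grad[idx[j]] -= sign[j] * np.prod(np.delete(miss, j))
    return J, grad

def relaxed_sat_solve(clauses, n_vars, steps=2000, lr=0.1, seed=0):
    """Projected gradient descent on J over [0, 1]^n; returns a bool assignment or None."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(0.0, 1.0, size=n_vars)
    for _ in range(steps):
        assignment = u > 0.5
        if cost_and_grad(assignment.astype(float), clauses)[0] == 0.0:
            return assignment  # rounded point satisfies every clause
        _, grad = cost_and_grad(u, clauses)
        u = np.clip(u - lr * grad, 0.0, 1.0)
    return None
```

The per-clause Python loop keeps the correspondence with the formula visible. A matrix-based solver like MatSat would instead evaluate all clauses at once with matrix operations, which is what makes very large instances feasible.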
# 確率的流域変換によるサッカーラインマークセグメンテーション Soccer line mark segmentation with stochastic watershed transform ( http://arxiv.org/abs/2108.06432v1 ) ライセンス: Link先を確認	Daniel Berjón, Carlos Cuevas, Narciso García	(参考訳) 拡張現実のアプリケーションは、スポーツの放送方法を変え始めており、より豊かな体験と貴重な洞察をファンに提供する。拡張現実システムの最初のステップはカメラキャリブレーションであり、おそらくはフィールドのラインマークを検出することに基づいている。ライン検出のための既存の提案のほとんどはエッジ検出とハフ変換に依存しているが、光学的歪みと外縁はラインマーキングの不正確または散発的な検出を引き起こす。本稿では,直線の直さを前提とせず,競技場における選手やボールの存在の影響を受けないため,光学歪みに頑健な確率的流域変換に基づいて,ラインマーキングを自動的かつ正確にセグメント化する方法を提案する。第一に、全体としての遊技場は、スタンド及び周板を完全に取り除く。そして、線マークを抽出する。この戦略は、5つのスタジアムで60枚のアノテートされた画像からなる新しい公開データベースでテストされている。得られた結果は,提案したセグメント化アルゴリズムにより,ほとんどのラインマーク画素を精度よく検出できることを証明した。 Augmented reality applications are beginning to change the way sports are broadcast, providing richer experiences and valuable insights to fans. The first step of augmented reality systems is camera calibration, possibly based on detecting the line markings of the field of play. Most existing proposals for line detection rely on edge detection and the Hough transform, but optical distortion and extraneous edges cause inaccurate or spurious detections of line markings. We propose a novel strategy to automatically and accurately segment line markings based on a stochastic watershed transform that is robust to optical distortions, since it makes no assumptions about line straightness, and is unaffected by the presence of players or the ball in the field of play. First, the playing field as a whole is segmented, completely eliminating the stands and perimeter boards. Then, the line markings are extracted. The strategy has been tested on a new public database composed of 60 annotated images from matches in five stadiums. The results show that the proposed segmentation algorithm successfully and precisely detects most line mark pixels.	翻訳日:2021-08-17 15:16:12 公開日:2021-08-14
# ビデオキャプションのためのメタ概念を用いたクロスモーダルグラフ Cross-Modal Graph with Meta Concepts for Video Captioning ( http://arxiv.org/abs/2108.06458v1 ) ライセンス: Link先を確認	Hao Wang, Guosheng Lin, Steven C. H. Hoi, Chunyan Miao	(参考訳) ビデオキャプションのターゲットは、複雑な視覚的内容をテキスト記述として解釈し、オブジェクトやそれらの相互作用を含むビデオシーンを完全に理解する必要がある。一般的な手法では、オフザシェルフオブジェクト検出ネットワークを用いてオブジェクトの提案を行い、オブジェクト間の関係をモデル化するためにアテンションメカニズムを使用する。彼らはしばしば事前訓練されたモデルの未定義の意味概念を見逃し、オブジェクト間の正確な述語関係を識別できない。本稿では,ビデオのテキスト記述を生成するオープンな研究課題について検討し,動画キャプションのメタ概念を用いたクロスモーダルグラフ(CMG)を提案する。具体的には、映像キャプションにおける有用な意味概念をカバーするために、対応するテキスト記述の視覚領域を弱く学習し、関連する視覚領域とテクストワードをクロスモーダルメタ概念と命名する。さらに、学習したクロスモーダルなメタ概念でメタ概念グラフを動的に構築する。また,ビデオシーケンス構造をモデル化するために,予測述語を用いた全体像と局所像のフレームレベルのビデオグラフを構築した。提案手法の有効性を広範な実験で検証し,2つの公開データセットで最新の結果を得た。 Video captioning aims to interpret complex visual content as text descriptions, which requires the model to fully understand video scenes, including objects and their interactions. Prevailing methods adopt off-the-shelf object detection networks to give object proposals and use the attention mechanism to model the relations between objects. They often miss semantic concepts that are undefined in the pretrained model and fail to identify exact predicate relationships between objects. In this paper, we investigate an open research task of generating text descriptions for the given videos, and propose Cross-Modal Graph (CMG) with meta concepts for video captioning. Specifically, to cover the useful semantic concepts in video captions, we weakly learn the corresponding visual regions for text descriptions, where the associated visual regions and textual words are named cross-modal meta concepts. We further build meta concept graphs dynamically with the learned cross-modal meta concepts. We also construct holistic video-level and local frame-level video graphs with the predicted predicates to model video sequence structures. 
We validate the efficacy of our proposed techniques with extensive experiments and achieve state-of-the-art results on two public datasets.	翻訳日:2021-08-17 15:15:53 公開日:2021-08-14
# 二次元蛍光X線画像への解剖学的スカルモデルの登録のための人工X線ランドマークデータセットからの学習 Transfer Learning from an Artificial Radiograph-landmark Dataset for Registration of the Anatomic Skull Model to Dual Fluoroscopic X-ray Images ( http://arxiv.org/abs/2108.06466v1 ) ライセンス: Link先を確認	Chaochao Zhou, Thomas Cha, Yun Peng, Guoan Li	(参考訳) 2次元蛍光X線画像への3次元解剖構造の登録は、広く使われているモーショントラッキング技術である。しかし、深層学習の実装は、しばしば医学的イメージと基礎的真実の暗黙性によって妨げられる。本研究では,人工データセットから学習した深層ニューラルネットワークを用いた3次元から2次元への登録のためのトランスファー学習手法を提案する。女性の頭蓋骨ctデータからデジタル再構成x線写真(drr)とx線頭蓋骨ランドマークが自動生成された。ランドマーク検出のための残留ネットワーク(ResNet)と、DRRと実際のX線とのスタイルの違いを排除するためのサイクル生成逆ネットワーク(GAN)の訓練に使用された。 GANスタイルの翻訳を経験するX線のランドマークはResNetによって検出され、実際のデュアルフルオロスコープ画像の3次元から2次元の頭蓋骨の登録(非直交的な設定、点X線源、画像歪み、部分的捕獲された頭蓋骨領域)の三角形最適化に使用された。登録精度は頭蓋骨運動の複数のシナリオで評価された。歩行中、頭蓋骨の学習に基づく登録は3.9 ± 2.1° / 4.6 ± 2.2 mmであった。しかし, 機能的頸部活動では, 終端位置の二重蛍光像に非常に小さな頭蓋領域がみられたため, 精度は低かった。人工的なトレーニングデータを戦略的に拡張する手法は、複雑な頭蓋骨登録シナリオに対処し、広範な登録シナリオに拡張する可能性を秘めている。 Registration of 3D anatomic structures to their 2D dual fluoroscopic X-ray images is a widely used motion tracking technique. However, deep learning implementation is often impeded by a paucity of medical images and ground truths. In this study, we propose a transfer learning strategy for 3D-to-2D registration using deep neural networks trained on an artificial dataset. Digitally reconstructed radiographs (DRRs) and radiographic skull landmarks were automatically created from craniocervical CT data of a female subject. They were used to train a residual network (ResNet) for landmark detection and a cycle generative adversarial network (GAN) to eliminate the style difference between DRRs and actual X-rays. Landmarks on the X-rays after GAN style translation were detected by the ResNet and used in triangulation optimization for 3D-to-2D registration of the skull in actual dual-fluoroscope images (with a non-orthogonal setup, point X-ray sources, image distortions, and partially captured skull regions). 
The registration accuracy was evaluated in multiple scenarios of craniocervical motion. During walking, learning-based registration of the skull had angular/position errors of 3.9 ± 2.1° / 4.6 ± 2.2 mm. However, the accuracy was lower during functional neck activity, due to overly small skull regions imaged on the dual fluoroscopic images at end-range positions. The methodology of strategically augmenting artificial training data can tackle this complicated skull registration scenario and has the potential to extend to a wide range of registration scenarios.	翻訳日:2021-08-17 15:15:34 公開日:2021-08-14
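The registration step above triangulates landmarks that are detected in two fluoroscopic views. As an illustration of the underlying geometry only, the standard linear (DLT) method recovers one 3D point from its pixel coordinates in two calibrated views. The authors' triangulation optimization also has to handle image distortion and partially visible skulls, which this sketch ignores. The intrinsics, the 60-degree non-orthogonal view pair and the test point are made-up values.

```python
import numpy as np


def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two calibrated views.

    P1, P2: 3x4 projection matrices; x1, x2: (u, v) pixel coordinates.
    """
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # Homogeneous solution: right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]


def project(P, X):
    """Pinhole projection of a 3D point to pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


# Hypothetical calibration: two views 60 degrees apart (non-orthogonal setup).
K = np.array([[1000.0, 0.0, 256.0], [0.0, 1000.0, 256.0], [0.0, 0.0, 1.0]])
theta = np.deg2rad(60.0)
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, np.array([[-400.0], [0.0], [300.0]])])

X_true = np.array([10.0, -5.0, 500.0])  # a landmark position, in mm
X_hat = triangulate_dlt(P1, P2, project(P1, X_true), project(P2, X_true))
print(X_hat)
```

With noise-free detections, the DLT solution recovers the point almost exactly. With real detections, it is usually only an initial guess that a nonlinear reprojection-error optimization then refines.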
# 疑似スキャナーによる3次元脳MRI画像検索のための疾患指向画像埋め込み Disease-oriented image embedding with pseudo-scanner standardization for content-based image retrieval on 3D brain MRI ( http://arxiv.org/abs/2108.06518v1 ) ライセンス: Link先を確認	Hayato Arai, Yuto Onga, Kumpei Ikuta, Yusuke Chayama, Hitoshi Iyatomi, Kenichi Oishi	(参考訳) 臨床脳MRIデータベースに適用可能な,堅牢で実用的なコンテンツベース画像検索(CBIR)システムを構築するために,2つのコア技術,データ調和と次元縮小アルゴリズムからなる,疾患指向の画像埋め込み(DI-PSS)を提案する。我々のDI-PSSは頭蓋骨のストリッピングとCycleGANベースの画像変換を使用して、標準脳にマップし、次に所定の参照スキャナーで撮影された脳画像に変換する。そして, 深度学習による3次元コンボリューショナルオートエンコーダ(3D-CAE)は, 疾患の特徴を反映した低次元埋め込みを得る。提案手法の有効性を,アルツハイマー病神経画像イニシアチブとパーキンソン病進行マーカーイニシアチブから選択したT1強調MRIを用いて検討した。我々はPSSがスキャナーとデータセットの違いによる低次元埋め込みのばらつきを大幅に低減したことを確認した。ベースライン条件と比較すると, アルツハイマー病 (AD) から臨床正常 (CN) , パーキンソン病 (PD) までの距離の変動は15.8-22.6%, 18.0-29.9%減少した。これらの性質により、DI-PSSは病気の分類に適した低次元表現を生成することができる。スペクトルクラスタリングに基づくADとCNの分類実験では、PSSはそれぞれ平均精度を6.2%、マクロF1を10.7%改善した。トレーニングデータのスキャンに使用されなかったMRIスキャナーによってスキャンされた画像の調和化のためのDI-PSSの可能性を考えると,DI-PSSは異種環境でスキャンされた多数のレガシーMRIに適用するのに適していると考えられる。 To build a robust and practical content-based image retrieval (CBIR) system that is applicable to a clinical brain MRI database, we propose a new framework, Disease-oriented image embedding with pseudo-scanner standardization (DI-PSS), that consists of two core techniques: data harmonization and a dimension reduction algorithm. Our DI-PSS uses skull stripping and CycleGAN-based image transformations that map images to a standard brain and then transform them into brain images as if taken with a given reference scanner. Then, our 3D convolutional autoencoders (3D-CAE) with deep metric learning acquire a low-dimensional embedding that better reflects the characteristics of the disease. The effectiveness of our proposed framework was tested on T1-weighted MRIs selected from the Alzheimer's Disease Neuroimaging Initiative and the Parkinson's Progression Markers Initiative. 
We confirmed that our PSS greatly reduced the variability of low-dimensional embeddings caused by different scanners and datasets. Compared with the baseline condition, our PSS reduced the variability in the distance from Alzheimer's disease (AD) to clinically normal (CN) and Parkinson's disease (PD) cases by 15.8-22.6% and 18.0-29.9%, respectively. These properties allow DI-PSS to generate lower-dimensional representations that are more amenable to disease classification. In AD and CN classification experiments based on spectral clustering, PSS improved the average accuracy and macro-F1 by 6.2% and 10.7%, respectively. Given the potential of DI-PSS to harmonize images from MRI scanners that were not used to acquire the training data, we expect DI-PSS to be suitable for application to the large number of legacy MRIs scanned in heterogeneous environments.	翻訳日:2021-08-17 15:15:01 公開日:2021-08-14
# 弱教師付き時間的行動定位のための前景的行動整合性ネットワーク Foreground-Action Consistency Network for Weakly Supervised Temporal Action Localization ( http://arxiv.org/abs/2108.06524v1 ) ライセンス: Link先を確認	Linjiang Huang, Liang Wang, Hongsheng Li	(参考訳) 高レベル映像理解の課題として,時間的行動の局所化の弱さが注目されている。ビデオアノテーションのみを使用して、既存のほとんどのメソッドはこのタスクをローカライズ・バイ・クラス化フレームワークで処理し、一般的に、アクションの確率の高いスニペット、すなわちフォアグラウンドを選択するセレクタを採用する。それにもかかわらず、既存の前景選択戦略は、前景からアクションへの一方的な関係のみを考慮するという大きな制限を持ち、前景とアクションの一貫性を保証できない。本稿では,I3Dバックボーンに基づくFAC-Netというフレームワークについて述べる。このフレームワークでは,3つのブランチが付加され,クラス別フォアグラウンド分類ブランチ,クラス非依存注意ブランチ,複数インスタンス学習ブランチと命名された。まず, クラスワイド前景分類部は, 前景分離を最大化するために, 行動と前景の関係を規則化する。さらに、前景-アクション一貫性を規則化し、有意義な前景分類器を学ぶのに役立つ、クラス非依存の注意ブランチと複数のインスタンス学習ブランチが採用されている。各ブランチでは,各スニペットに対する複数のアテンションスコアを計算するハイブリッドアテンション機構を導入し,識別スニペットと非識別スニペットの両方に着目し,アクション境界全体をキャプチャする。 THUMOS14とActivityNet1.3の実験結果から,本手法の最先端性能が示された。私たちのコードは https://github.com/LeonHLJ/FAC-Net で利用可能です。 As a challenging task of high-level video understanding, weakly supervised temporal action localization has been attracting increasing attention. With only video-level annotations, most existing methods seek to handle this task with a localization-by-classification framework, which generally uses a selector to pick snippets with high action probabilities, namely the foreground. Nevertheless, the existing foreground selection strategies have the major limitation of considering only the unilateral relation from foreground to actions, which cannot guarantee foreground-action consistency. In this paper, we present a framework named FAC-Net based on the I3D backbone, to which three branches are appended: a class-wise foreground classification branch, a class-agnostic attention branch, and a multiple instance learning branch. First, our class-wise foreground classification branch regularizes the relation between actions and foreground to maximize the foreground-background separation. 
Besides, the class-agnostic attention branch and multiple instance learning branch are adopted to regularize the foreground-action consistency and help to learn a meaningful foreground classifier. Within each branch, we introduce a hybrid attention mechanism, which calculates multiple attention scores for each snippet, to focus on both discriminative and less-discriminative snippets to capture the full action boundaries. Experimental results on THUMOS14 and ActivityNet1.3 demonstrate the state-of-the-art performance of our method. Our code is available at https://github.com/LeonHLJ/FAC-Net.	翻訳日:2021-08-17 15:14:32 公開日:2021-08-14
# 一般化ゼロショット意味セグメンテーションにおけるジョイント埋め込み空間の活用 Exploiting a Joint Embedding Space for Generalized Zero-Shot Semantic Segmentation ( http://arxiv.org/abs/2108.06536v1 ) ライセンス: Link先を確認	Donghyeon Baek, Youngmin Oh, Bumsub Ham	(参考訳) 本稿では,一般化ゼロショットセマンティックセグメンテーション(GZS3)の課題に対処する。ほとんどのGZS3メソッドは、見知らぬクラスの視覚的特徴を対応する意味的特徴(例えば word2vec)から合成し、見知らぬクラスと見えないクラスの両方に新しい分類器を訓練する。生成法は優れた性能を示すが,(1)視覚的特徴が目に見えるクラスに偏っていること,(2)未知のクラスが出現するたびに分類器を再訓練する必要があること,の2つの制限がある。我々は,これらの制約を統一したフレームワークで解決するための差別的アプローチを提案する。この目的のために、視覚的および意味的エンコーダを活用して、セマンティックエンコーダがセマンティック特徴を対応するクラスの視覚的特徴の中心として機能するセマンティックプロトタイプに変換する、共同埋め込み空間を学習する。具体的には,境界認識回帰(BAR)と意味整合性(SC)の損失を導入し,識別的特徴を学習する。我々は, BAR と SC の用語を併用した統合埋め込み空間を活用し, バイアス問題を緩和する手法を提案する。テスト時には,最近傍(NN)分類器としてセマンティックプロトタイプを活用することで,再訓練プロセスを回避する。さらにバイアス問題を緩和するために、NN分類器の判断境界をアポロニウス円に適応的に変調するApollonius calibration (AC)と呼ばれる推論手法を提案する。実験の結果,本フレームワークの有効性が実証され,標準ベンチマークにおける新しい技術が得られた。 We address the problem of generalized zero-shot semantic segmentation (GZS3), which predicts pixel-wise semantic labels for seen and unseen classes. Most GZS3 methods adopt a generative approach that synthesizes visual features of unseen classes from corresponding semantic ones (e.g., word2vec) to train novel classifiers for both seen and unseen classes. Although generative methods show decent performance, they have two limitations: (1) the visual features are biased towards seen classes; (2) the classifier must be retrained whenever novel unseen classes appear. We propose a discriminative approach to address these limitations in a unified framework. To this end, we leverage visual and semantic encoders to learn a joint embedding space, where the semantic encoder transforms semantic features into semantic prototypes that act as centers for visual features of corresponding classes. Specifically, we introduce boundary-aware regression (BAR) and semantic consistency (SC) losses to learn discriminative features. 
Our approach to exploiting the joint embedding space, together with BAR and SC terms, alleviates the seen bias problem. At test time, we avoid the retraining process by exploiting semantic prototypes as a nearest-neighbor (NN) classifier. To further alleviate the bias problem, we also propose an inference technique, dubbed Apollonius calibration (AC), that modulates the decision boundary of the NN classifier to the Apollonius circle adaptively. Experimental results demonstrate the effectiveness of our framework, achieving a new state of the art on standard benchmarks.	翻訳日:2021-08-17 15:14:08 公開日:2021-08-14
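Apollonius calibration is only named in the abstract. The sketch below illustrates the geometry it refers to, under an assumed decision rule: compare the distance to the nearest seen prototype with k times the distance to the nearest unseen prototype. For a fixed pair of prototypes, the set of points where this distance ratio equals k is an Apollonius circle (or the perpendicular bisector when k = 1). The fixed scalar `k` is a hypothetical stand-in for the paper's adaptive modulation, and the function name is illustrative.

```python
import numpy as np


def nn_predict_calibrated(x, seen_protos, unseen_protos, k=1.0):
    """Nearest-prototype classification with a distance-ratio calibration.

    Assigns x to its nearest seen class only if d_seen < k * d_unseen.
    For a fixed prototype pair, the boundary {x : d_seen / d_unseen = k} is an
    Apollonius circle around the seen prototype when k < 1, so lowering k
    shrinks the seen region and counteracts a bias toward seen classes.
    """
    d_seen = np.linalg.norm(seen_protos - x, axis=1)
    d_unseen = np.linalg.norm(unseen_protos - x, axis=1)
    i, j = int(d_seen.argmin()), int(d_unseen.argmin())
    if d_seen[i] < k * d_unseen[j]:
        return "seen", i
    return "unseen", j


# Toy prototypes in a 2-D embedding space.
seen = np.array([[0.0, 0.0]])
unseen = np.array([[4.0, 0.0]])
x = np.array([1.8, 0.0])  # slightly closer to the seen prototype
print(nn_predict_calibrated(x, seen, unseen, k=1.0))  # plain nearest neighbour
print(nn_predict_calibrated(x, seen, unseen, k=0.5))  # seen region shrunk
```

With k = 1, the rule reduces to an ordinary nearest-neighbour classifier. Making k smaller moves ambiguous embeddings toward unseen classes without retraining anything.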
# MMOCR: テキストの検出・認識・理解のための総合ツールボックス MMOCR: A Comprehensive Toolbox for Text Detection, Recognition and Understanding ( http://arxiv.org/abs/2108.06543v1 ) ライセンス: Link先を確認	Zhanghui Kuang, Hongbin Sun, Zhizhong Li, Xiaoyu Yue, Tsui Hin Lin, Jianyong Chen, Huaqiang Wei, Yiqin Zhu, Tong Gao, Wenwei Zhang, Kai Chen, Wayne Zhang, Dahua Lin	(参考訳) 本稿では,テキスト検出と認識のための包括的パイプラインと,名前付きエンティティ認識やキー情報抽出などの下流タスクを提供するオープンソースツールボックスMMOCRを提案する。 MMOCRは14の最先端のアルゴリズムを実装しています。これは、我々の知る限り、既存のどのオープンソースOCRプロジェクトよりも大幅に多い。テキスト認識に関する今後の研究と産業応用を容易にするために,大量のモデルと詳細なベンチマークを提供し,テキスト検出,認識,理解のパフォーマンスに関する洞察を与える。 MMOCRは https://github.com/open-mmlab/mmocr で公開されている。 We present MMOCR, an open-source toolbox that provides a comprehensive pipeline for text detection and recognition, as well as their downstream tasks such as named entity recognition and key information extraction. MMOCR implements 14 state-of-the-art algorithms, which is significantly more than any existing open-source OCR project we are aware of to date. To facilitate future research and industrial applications of text recognition-related problems, we also provide a large number of trained models and detailed benchmarks that give insights into the performance of text detection, recognition and understanding. MMOCR is publicly released at https://github.com/open-mmlab/mmocr.	翻訳日:2021-08-17 15:13:39 公開日:2021-08-14
# 事前学習された顔認識モデルの偏り予測に対する画像歪みの影響の解明 Unravelling the Effect of Image Distortions for Biased Prediction of Pre-trained Face Recognition Models ( http://arxiv.org/abs/2108.06581v1 ) ライセンス: Link先を確認	Puspita Majumdar, Surbhi Mittal, Richa Singh, Mayank Vatsa	(参考訳) ディープラーニングアルゴリズムにおけるバイアスの特定と軽減は、社会への影響によって、ここ数年で大きな人気を集めている。バランスのとれたデータセットでトレーニングされたモデルは、サブグループ間で同等で偏りのないパフォーマンスを提供する、と研究者は主張する。しかし、一見バイアスのない事前学習済みモデルが、入力データに特定の歪みが生じたときにバイアスを持つようになることはあるだろうか？ 私たちは初めて、顔認識という文脈でこの問題に答えようと試みました。性別（gender）と人種（race）の異なる部分群にまたがって、画像歪みの存在下での4つの最先端深層顔認識モデルの性能を評価するための系統的分析を行った。画像の歪みは,各サブグループ間のモデルの性能ギャップと関係していることがわかった。 Identifying and mitigating bias in deep learning algorithms has gained significant popularity in the past few years due to its impact on society. Researchers argue that models trained on balanced datasets with good representation provide equal and unbiased performance across subgroups. However, *can a seemingly unbiased pre-trained model become biased when input data undergo certain distortions?* For the first time, we attempt to answer this question in the context of face recognition. We provide a systematic analysis to evaluate the performance of four state-of-the-art deep face recognition models in the presence of image distortions across different *gender* and *race* subgroups. We observe that image distortions are related to the performance gap of the model across different subgroups.	翻訳日:2021-08-17 15:13:29 公開日:2021-08-14
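To make the notion of a subgroup performance gap concrete, the sketch below computes per-subgroup accuracy and the max-min gap, then compares a clean evaluation with a distorted one. The subgroup names "A"/"B" and the correct/incorrect counts are invented toy inputs, not results from the paper, and the paper's own bias metrics may differ.

```python
from collections import defaultdict


def subgroup_accuracy_gap(records):
    """records: iterable of (subgroup, is_correct) pairs from one evaluation run.

    Returns the per-subgroup accuracy and the max-min gap across subgroups.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for group, correct in records:
        totals[group] += 1
        hits[group] += int(correct)
    acc = {g: hits[g] / totals[g] for g in totals}
    return acc, max(acc.values()) - min(acc.values())


# Hypothetical outcomes for the same verification pairs, before and after a distortion.
clean = [("A", True)] * 9 + [("A", False)] + [("B", True)] * 9 + [("B", False)]
blurred = [("A", True)] * 8 + [("A", False)] * 2 + [("B", True)] * 5 + [("B", False)] * 5

acc_clean, gap_clean = subgroup_accuracy_gap(clean)
acc_blurred, gap_blurred = subgroup_accuracy_gap(blurred)
print(acc_clean, gap_clean)
print(acc_blurred, gap_blurred)
```

In this toy example, the model looks unbiased on clean inputs (gap 0) but develops a gap once the inputs are distorted. This is the kind of effect the abstract describes.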
# カテゴリーとドメインアライメントに向けて--逆領域適応のためのカテゴリ不変特徴拡張 Towards Category and Domain Alignment: Category-Invariant Feature Enhancement for Adversarial Domain Adaptation ( http://arxiv.org/abs/2108.06583v1 ) ライセンス: Link先を確認	Yuan Wu, Diana Inkpen and Ahmed El-Roby	(参考訳) 敵対的ドメイン適応は、両方のドメインの特徴分布を整列させることにより、ソースドメインからターゲットドメインへの知識伝達において顕著な進歩を遂げた。これらの手法は、領域の発散を最小限にし、これら2つの領域の理想的な合同仮説の期待誤差として測定される適応性を小さい定数として考慮することに焦点を当てている。しかし、これらのアプローチは依然として2つの問題に直面している: (1) 敵対的領域アライメントは元の特徴分布を歪め、適応性を低下させる; (2) 特徴表現をドメイン不変に変換してドメイン固有のバリエーションを犠牲にする必要がある。これらの問題を緩和するために,適応性を最適化して対向領域適応を向上する一般的なメカニズムであるカテゴリー不変機能拡張(CIFE)を提案する。特に、CIFEアプローチでは、転送可能性を維持することで、ドメイン不変機能の識別性を高めるために、カテゴリ不変機能を導入している。実験により、CIFEは代表的な敵対的領域適応法を改善し、5つのベンチマークで最先端の結果を得られることが示されている。 Adversarial domain adaptation has made impressive advances in transferring knowledge from the source domain to the target domain by aligning the feature distributions of both domains. These methods focus on minimizing domain divergence and regard the adaptability, which is measured as the expected error of the ideal joint hypothesis on these two domains, as a small constant. However, these approaches still face two issues: (1) adversarial domain alignment distorts the original feature distributions, deteriorating the adaptability; (2) transforming feature representations to be domain-invariant sacrifices domain-specific variations, resulting in weaker discriminability. To alleviate these issues, we propose category-invariant feature enhancement (CIFE), a general mechanism that enhances adversarial domain adaptation by optimizing the adaptability. Specifically, the CIFE approach introduces category-invariant features to boost the discriminability of domain-invariant features while preserving transferability. Experiments show that CIFE improves upon representative adversarial domain adaptation methods, yielding state-of-the-art results on five benchmarks.	翻訳日:2021-08-17 15:13:18 公開日:2021-08-14
# Few-Shotセグメンテーションのための自己蒸留埋設アフィニティ注意モデル A Self-Distillation Embedded Supervised Affinity Attention Model for Few-Shot Segmentation ( http://arxiv.org/abs/2108.06600v1 ) ライセンス: Link先を確認	Qi Zhao, Binghao Liu, Shuchang Lyu, Xu Wang and Yifan Yang	(参考訳) 少数ショットのセマンティクスセグメンテーションは、わずかな注釈付きサンプルでピクセル単位でオブジェクトのカテゴリを予測する難しいタスクである。しかし、既存のアプローチは依然として2つの大きな課題に直面している。第一に、サポートとクエリイメージの巨大な特徴区別は、知識伝達障壁を引き起こし、セグメンテーション性能を損なう。第2に,サポート機能を説明できないようなサポートサンプルは少なく,高品質なクエリセグメンテーションを導くことがほとんどない。上記の2つの課題に対処するため,数発のセグメンテーションタスクの性能向上のために,自己蒸留型組込み親和性アテンションモデル(SD-AANet)を提案する。具体的には、自己蒸留ガイド型プロトタイプモジュール(SDPM)は、サポートとクエリの自己蒸留により固有のプロトタイプを抽出し、代表的特徴を捉える。教師付きアフィニティアテンションモジュール(SAAM)は、高品質なクエリアテンションマップの作成をガイドするために、サポート基盤真理を採用し、アフィニティ情報を学習してクエリターゲットの全領域にフォーカスすることができる。 SD-AANetは既存の手法と比較して性能を著しく向上させる。包括的アブレーション実験と可視化実験も,数発のセグメンテーション作業においてSDPMとSAAMの有意な効果を示した。ベンチマークデータセットであるPASCAL-5iとCOCO-20iにおいて,提案したSD-AANetはいずれも最先端の結果を得た。私たちのコードはまもなく公開されます。 Few-shot semantic segmentation is the challenging task of predicting pixel-wise object categories with only a few annotated samples. However, existing approaches still face two main challenges. First, the large feature discrepancy between support and query images creates a barrier to knowledge transfer, which harms segmentation performance. Second, the small number of support samples yields unrepresentative support features, which can hardly guide high-quality query segmentation. To deal with these two issues, we propose a self-distillation embedded supervised affinity attention model (SD-AANet) to improve performance on the few-shot segmentation task. Specifically, the self-distillation guided prototype module (SDPM) extracts intrinsic prototypes by self-distillation between support and query to capture representative features. The supervised affinity attention module (SAAM) uses the support ground truth to guide the production of a high-quality query attention map, learning affinity information to focus on the whole area of the query target. 
Extensive experiments show that our SD-AANet significantly improves performance compared with existing methods. Comprehensive ablation experiments and visualization studies also show the significant effect of SDPM and SAAM on the few-shot segmentation task. On both benchmark datasets, PASCAL-5i and COCO-20i, our proposed SD-AANet achieves state-of-the-art results. Our code will be publicly available soon.	翻訳日:2021-08-17 15:12:58 公開日:2021-08-14
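Prototype-based few-shot segmentation methods commonly start from masked average pooling. The support features are averaged over the annotated foreground to form a class prototype, which is then compared with every query pixel. The NumPy sketch below shows only that generic baseline step and assumes (C, H, W) feature maps. SD-AANet's self-distillation (SDPM) and supervised affinity attention (SAAM) modules are not reproduced.

```python
import numpy as np


def masked_average_prototype(support_feat, support_mask, eps=1e-6):
    """Average support features over foreground pixels.

    support_feat: (C, H, W) feature map; support_mask: (H, W) binary mask.
    """
    m = support_mask.astype(float)
    return (support_feat * m).sum(axis=(1, 2)) / (m.sum() + eps)


def cosine_similarity_map(query_feat, prototype, eps=1e-6):
    """Per-pixel cosine similarity between query features (C, H, W) and a prototype (C,)."""
    q = query_feat / (np.linalg.norm(query_feat, axis=0, keepdims=True) + eps)
    p = prototype / (np.linalg.norm(prototype) + eps)
    return np.einsum("chw,c->hw", q, p)


# Toy 2-channel features: foreground points along channel 0, background along channel 1.
feat = np.zeros((2, 4, 4))
fg = np.zeros((4, 4), dtype=bool)
fg[:, :2] = True
feat[0][fg] = 1.0
feat[1][~fg] = 1.0

proto = masked_average_prototype(feat, fg)
sim = cosine_similarity_map(feat, proto)  # the support image reused as a query
print(proto)
print(np.round(sim, 3))
```

Thresholding or normalizing such a similarity map gives a coarse query mask. The abstract's point is that a prototype computed from only a few support samples can be unrepresentative, which is what the proposed modules are meant to address.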
# マルチアクセス無線ネットワーク上での効率的なフェデレーションメタラーニング Efficient Federated Meta-Learning over Multi-Access Wireless Networks ( http://arxiv.org/abs/2108.06453v1 ) ライセンス: Link先を確認	Sheng Yue, Ju Ren, Jiang Xin, Deyu Zhang, Yaoxue Zhang, Weihua Zhuang	(参考訳) フェデレーションメタラーニング(FML)は、今日のエッジラーニング分野におけるデータ制限と多様性の課題に対処するための有望なパラダイムとして登場した。しかし、その性能は遅い収束とそれに対応する低通信効率によって制限されることが多い。さらに、無線帯域とIoTデバイスのエネルギー容量は通常不十分であるため、現実的な無線ネットワークにFMLをデプロイする際には、リソース割り当てとエネルギー消費を制御することが不可欠である。これらの課題を克服するため,本論文ではまず,各ラウンドのグローバルロス低減に対する各デバイスの役割を厳密に解析し,収束を加速する非一様デバイス選択スキームを用いたFMLアルゴリズム(NUFM)を開発した。その後,マルチアクセス無線システムにおいてNUFMを統合する資源割当問題を定式化し,コンバージェンス率を向上し,壁時計時間の最小化とエネルギーコストの削減を図る。元の問題を段階的に分解することにより,デバイス選択とリソース割当戦略(URALと呼ばれる)を共同して解決し,理論的保証を提供する。さらに, 2 つの一階近似手法を組み合わせることで, NUFM の計算複雑性を $O(d^2)$ から $O(d)$ (モデル次元は $d$ で) に削減できることを示した。シミュレーションの結果,提案手法の有効性と優位性について,既存のベースラインと比較した。 Federated meta-learning (FML) has emerged as a promising paradigm to cope with the data limitation and heterogeneity challenges in today's edge learning arena. However, its performance is often limited by slow convergence and correspondingly low communication efficiency. In addition, since the wireless bandwidth and IoT devices' energy capacity are usually insufficient, it is crucial to control resource allocation and energy consumption when deploying FML in realistic wireless networks. To overcome these challenges, in this paper, we first rigorously analyze each device's contribution to the global loss reduction in each round and develop an FML algorithm (called NUFM) with a non-uniform device selection scheme to accelerate convergence. After that, we formulate a resource allocation problem integrating NUFM in multi-access wireless systems to jointly improve the convergence rate and minimize the wall-clock time along with energy cost. By deconstructing the original problem step by step, we devise a joint device selection and resource allocation strategy (called URAL) to solve the problem and provide theoretical guarantees. 
Further, we show that the computational complexity of NUFM can be reduced from $O(d^2)$ to $O(d)$ (with $d$ being the model dimension) by combining two first-order approximation techniques. Extensive simulation results demonstrate the effectiveness and superiority of the proposed methods compared with existing baselines.	翻訳日:2021-08-17 14:56:18 公開日:2021-08-14
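The abstract does not name the two first-order approximation techniques. One common way to cut a Hessian-based meta-gradient from $O(d^2)$ to $O(d)$, widely used in the meta-learning literature, is to replace the Hessian-vector product with a finite difference of two gradients. The sketch below assumes that technique and a one-step MAML-style objective. It is not necessarily what NUFM does, and the function names and toy quadratic losses are hypothetical.

```python
import numpy as np


def hvp_finite_difference(grad_fn, w, v, r=1e-4):
    """Approximate H(w) @ v from two gradient calls (central difference).

    Needs only O(d) extra memory and never forms the d x d Hessian.
    """
    return (grad_fn(w + r * v) - grad_fn(w - r * v)) / (2.0 * r)


def maml_meta_gradient(grad_train, grad_test, w, alpha):
    """One-step MAML meta-gradient (I - alpha * H_train(w)) @ grad_test(w'),
    with w' = w - alpha * grad_train(w), using the finite-difference HVP."""
    w_adapted = w - alpha * grad_train(w)
    g = grad_test(w_adapted)
    return g - alpha * hvp_finite_difference(grad_train, w, g)


# Toy quadratic losses, for which the exact meta-gradient is easy to write down.
A = np.array([[3.0, 1.0], [1.0, 2.0]])  # Hessian of the local training loss
B = np.array([[1.0, 0.0], [0.0, 4.0]])  # Hessian of the local test loss
b = np.array([1.0, -1.0])


def grad_train(w):
    return A @ w


def grad_test(w):
    return B @ w - b


w0 = np.array([0.5, -0.2])
meta = maml_meta_gradient(grad_train, grad_test, w0, alpha=0.1)
print(meta)
```

For quadratic losses, the central difference is exact up to rounding. For general losses, the approximation error shrinks with the step `r`, which trades accuracy for numerical stability.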
# KDD Cup 2021 都市脳チャレンジのためのDQN制御ソリューション DQN Control Solution for KDD Cup 2021 City Brain Challenge ( http://arxiv.org/abs/2108.06491v1 ) ライセンス: Link先を確認	Yitian Chen and Kunlong Chen and Kunjin Chen and Lin Wang	(参考訳) 私たちは、City Brain Challengeコンテストに参加し、第8位を獲得しました。このコンペティションでは、プレイヤーは実世界の都市規模の道路網と、その交通需要が実際の交通データから得られる。プレイヤーは自設計のエージェントと信号の調整を依頼され、許容できる遅延を維持しながら提供される車両の数を最大化する。本稿では,このコンペティションに対する総合分析と詳細な解法について述べる。提案手法は主に,リアルタイム信号制御のためのディープQネットワーク(DQN)の適応に基づいている。我々の見解では、この競争の大きな課題は、現実世界の複雑な道路網と交通流状況において、従来のDQNフレームワークを交通信号制御にどのように拡張するかである。いくつかの古典的な報酬関数を試行した後、私たちは最終的に、新しく設計された報酬をエージェントに適用することにしました。新たに提案した報酬関数を適用し、制御スキームを慎重にチューニングすることで、単一のDQNモデルに基づくエージェントがトップ15チームの中でランク付けできる。この論文は、現実世界の道路網の交通信号制御のベースラインソリューションとしてある程度機能し、さらなる試みや研究を刺激できることを願っている。 We took part in the City Brain Challenge competition and placed 8th. In this competition, the players are provided with a real-world city-scale road network and its traffic demand derived from real traffic data. The players are asked to coordinate the traffic signals with a self-designed agent to maximize the number of vehicles served while maintaining an acceptable delay. In this abstract paper, we present an overall analysis and our detailed solution to this competition. Our approach is mainly based on the adaptation of the deep Q-network (DQN) for real-time traffic signal control. From our perspective, the major challenge of this competition is how to extend the classical DQN framework to traffic signal control in complex real-world road networks and traffic flow situations. After implementing and trying several classical reward functions, we finally chose to apply our newly designed reward in our agent. By applying our newly proposed reward function and carefully tuning the control scheme, an agent based on a single DQN model can rank among the top 15 teams. 
We hope this paper can serve, to some extent, as a baseline solution for traffic signal control on real-world road networks and inspire further attempts and research.	翻訳日:2021-08-17 14:55:55 公開日:2021-08-14
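For readers who are new to DQN-based signal control: the core update regresses Q(s, a) toward a bootstrapped Bellman target, and actions (here, signal phases) are chosen epsilon-greedily. The NumPy sketch below shows only these two generic pieces. The abstract does not describe the state encoding, the phase set, or the team's newly designed reward, so all three are left out, and the toy numbers are illustrative.

```python
import numpy as np


def dqn_targets(rewards, next_q_target, dones, gamma=0.99):
    """Bellman targets y = r + gamma * max_a' Q_target(s', a').

    No bootstrapping on terminal transitions (dones == 1).
    next_q_target: (batch, n_actions) values from the target network.
    """
    return rewards + gamma * (1.0 - dones) * next_q_target.max(axis=1)


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a signal-phase index: uniformly at random with prob. epsilon, else argmax Q."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


# A toy batch of two transitions; the second one ends its episode.
y = dqn_targets(
    rewards=np.array([1.0, 0.0]),
    next_q_target=np.array([[0.5, 2.0], [3.0, 1.0]]),
    dones=np.array([0.0, 1.0]),
    gamma=0.9,
)
phase = epsilon_greedy(np.array([0.1, 0.7, 0.2]), epsilon=0.0, rng=np.random.default_rng(0))
print(y, phase)
```

In a full agent, `y` would be the regression target for the online network's Q-value of the action actually taken, and the reward design largely determines what behaviour the controller learns.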
# LinkTeller: 影響分析を通じてグラフニューラルネットワークからプライベートエッジを復元する LinkTeller: Recovering Private Edges from Graph Neural Networks via Influence Analysis ( http://arxiv.org/abs/2108.06504v1 ) ライセンス: Link先を確認	Fan Wu, Yunhui Long, Ce Zhang, Bo Li	(参考訳) グラフ構造化データにより、豊富なノードの特徴とエッジ情報を考慮して、レコメンデーションシステムやトラフィック予測など、いくつかの成功したアプリケーションを実現している。しかし、これらの高次元の特徴と高次隣接情報は、通常不均一であり、実際には異なるデータホルダーによって保持される。このような垂直データ分割(例えば、1つのデータホルダがノードの特徴またはエッジ情報のみを所有している)を考えると、異なるデータホルダは、プライバシの懸念からデータを互いに直接転送するのではなく、効率的な共同トレーニングプロトコルを開発する必要がある。本稿では,エッジプライバシに注目し,ノード機能を備えたBobがまず,隣接情報を所有するAliceにトレーニングノード機能を送信するという,トレーニングシナリオを検討する。 Aliceは、ジョイント情報でグラフニューラルネットワーク(GNN)をトレーニングし、推論APIをリリースする。推論中、Bob氏はテストノードの機能を提供し、APIに問い合わせてテストノードの予測を取得することができる。本稿ではまず,Aliceが保持するプライベートエッジ情報をBobの逆クエリによって推測するために,影響分析によるプライバシ攻撃LinkTellerを提案する。その後、LinkTellerが膨大な量のプライベートエッジを回復できることを実証的に示し、既存のベースラインを上回ります。プライバシリークを更に評価するために、差分プライベートグラフ畳み込みネットワーク(DP GCN)トレーニングのための既存のアルゴリズムを適用し、新しいDP GCNメカニズムであるLapGraphを提案する。これらのDP GCNメカニズムは、穏やかなプライバシー保証($\varepsilon>5$)の下で、LinkTellerに対して実証的に回復力があるとは限らない。当社の研究は、よりレジリエントなプライバシー保存型GCNモデルの設計に向けた今後の研究に光を当てると同時に、GCNモデルユーティリティと潜在的なプライバシー攻撃に対する堅牢性とのトレードオフに関する深い理解を提供します。 Graph-structured data have enabled several successful applications such as recommendation systems and traffic prediction, given their rich node features and edge information. However, these high-dimensional features and high-order adjacency information are usually heterogeneous and, in practice, held by different data holders. Given such a vertical data partition (e.g., one data holder owns only the node features or only the edge information), different data holders have to develop efficient joint training protocols rather than directly transferring data to each other, due to privacy concerns. In this paper, we focus on edge privacy and consider a training scenario where Bob, who holds the node features, first sends training node features to Alice, who owns the adjacency information. 
Alice then trains a graph neural network (GNN) with the joint information and releases an inference API. During inference, Bob can provide test node features and query the API to obtain the predictions for test nodes. Under this setting, we first propose LinkTeller, a privacy attack based on influence analysis that infers the private edge information held by Alice by designing adversarial queries for Bob. We then empirically show that LinkTeller is able to recover a significant amount of private edges, outperforming existing baselines. To further evaluate the privacy leakage, we adapt an existing algorithm for differentially private graph convolutional network (DP GCN) training and propose a new DP GCN mechanism, LapGraph. We show empirically that these DP GCN mechanisms are not always resilient against LinkTeller under mild privacy guarantees ($\varepsilon>5$). Our studies will shed light on future research towards designing more resilient privacy-preserving GCN models and, in the meantime, provide an in-depth understanding of the tradeoff between GCN model utility and robustness against potential privacy attacks.	翻訳日:2021-08-17 14:55:36 公開日:2021-08-14
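The influence-analysis idea can be illustrated on a toy model. Perturb one node's features, query the black-box predictor, and observe which other nodes' outputs change. In a one-layer GCN, only graph neighbours react, so mutual influence reveals edges. The sketch below assumes a toy one-layer GCN standing in for Alice's API and a multiplicative (1 + delta) feature perturbation. It omits LinkTeller's actual ranking and edge-density estimation, so it illustrates the principle rather than the attack itself.

```python
import numpy as np


def gcn_one_layer(adj, features, weight):
    """Toy one-layer GCN, D^-1/2 (A + I) D^-1/2 X W, standing in for Alice's API."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return (d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]) @ features @ weight


def influence_matrix(predict, features, delta=1e-3):
    """infl[v, u] = change in node v's output when node u's features are scaled by (1 + delta)."""
    n = features.shape[0]
    base = predict(features)
    infl = np.zeros((n, n))
    for u in range(n):
        perturbed = features.copy()
        perturbed[u] *= 1.0 + delta
        infl[:, u] = np.linalg.norm(predict(perturbed) - base, axis=1) / delta
    return infl


def infer_edges(infl, tol=1e-8):
    """Predict an edge (u, v) wherever u and v influence each other (self-influence ignored)."""
    n = infl.shape[0]
    return {(u, v) for u in range(n) for v in range(u + 1, n)
            if infl[u, v] > tol and infl[v, u] > tol}


# Alice's private graph: a path 0-1-2-3. Bob sees only predictions.
adj = np.zeros((4, 4))
for u, v in [(0, 1), (1, 2), (2, 3)]:
    adj[u, v] = adj[v, u] = 1.0
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 5))  # Bob's node features
W = rng.normal(size=(5, 3))  # Alice's trained weights (hidden from Bob)


def predict(features):
    return gcn_one_layer(adj, features, W)


edges = infer_edges(influence_matrix(predict, X))
print(sorted(edges))
```

Deeper GCNs spread influence to multi-hop neighbours, so a real attack has to rank influence scores instead of simply thresholding them near zero.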
# 属性の組み合わせに対する精度面としての予測サービスの能動的評価 Active Assessment of Prediction Services as Accuracy Surface Over Attribute Combinations ( http://arxiv.org/abs/2108.06514v1 ) ライセンス: Link先を確認	Vihari Piratla, Soumen Chakrabarty, Sunita Sarawagi	(参考訳) 本研究の目的は,ブラックボックス分類モデルの精度を評価することであり,テストデータ分布の1つの集合ではなく,複数のテストデータ分布を特徴付ける多数の属性の組み合わせの曲面として評価することである。トレーニングデータ分散がクライアントから隠蔽され、異なるクライアントがデータ分散のさまざまな領域に興味を持つようになると、マシンラーニングモデルがサービスとしてデプロイされるにつれて、このような帰結した正確性測定が重要になる。本稿では,AAA(Attributed Accuracy Assay) - ガウス過程(GP)に基づく,そのような精度表面の確率的推定器を提案する。各属性の組み合わせは'arm'と呼ばれ、サービスの精度をサンプリングしたベータ密度に関連付けられている。 GPが関連するアーム上でベータ密度のパラメータを滑らかにすることで、間隔を緩和することを期待している。 gpsの明らかな応用は,人口の少ない巨大な属性空間におけるヘテロシデスティックな不確実性の課題に対処できないことを示す。これに反応して,スパース観測をプールし,ベータ密度のスケールパラメータを定式化する2つの機能拡張を行った。これらのイノベーションを導入した後、広範囲な実験と分析を通じて、推定精度と探索効率の両方の観点からAAAの有効性を確立した。 Our goal is to evaluate the accuracy of a black-box classification model, not as a single aggregate on a given test data distribution, but as a surface over a large number of combinations of attributes characterizing multiple test data distributions. Such attributed accuracy measures become important as machine learning models get deployed as a service, where the training data distribution is hidden from clients, and different clients may be interested in diverse regions of the data distribution. We present Attributed Accuracy Assay (AAA)--a Gaussian Process (GP)--based probabilistic estimator for such an accuracy surface. Each attribute combination, called an 'arm', is associated with a Beta density from which the service's accuracy is sampled. We expect the GP to smooth the parameters of the Beta density over related arms to mitigate sparsity. We show that obvious application of GPs cannot address the challenge of heteroscedastic uncertainty over a huge attribute space that is sparsely and unevenly populated. In response, we present two enhancements: pooling sparse observations, and regularizing the scale parameter of the Beta densities. 
After introducing these innovations, we establish the effectiveness of AAA in terms of both its estimation accuracy and exploration efficiency, through extensive experiments and analysis.	翻訳日:2021-08-17 14:55:03 公開日:2021-08-14
# 適切な公正認識? 自動意思決定システムの公正性を評価するための説明文の有効性について Appropriate Fairness Perceptions? On the Effectiveness of Explanations in Enabling People to Assess the Fairness of Automated Decision Systems ( http://arxiv.org/abs/2108.06500v1 ) ライセンス: Link先を確認	Jakob Schoeffer and Niklas Kuehl	(参考訳) 自動決定システム(ADS)の説明の1つの目的は、ユーザの肯定的な認識(公正性や信頼性など)を促進することであるとしばしば主張されている。しかし、この視点は、与えられたADSがまずは公平で信頼に値するという暗黙の仮定を下している。もしADSが不公平な結果を出した場合、システムの動作に関する説明がその欠点を明らかにし、したがって公正感の低下につながると期待するかもしれない。その結果、関連するADSの品質(公平さ)を適切に評価する上で、その有効性に対する説明を評価できることが示唆された。効果的に説明するためには、基礎となるADSが公正である場合に限り、公平性に対する認識が増加するべきであると論じる。本研究は, 適切な公正感という要件（desideratum）を導入し, 評価のための新しい研究設計を提案し, 総合実験に向けた次のステップを概説する。 It is often argued that one goal of explaining automated decision systems (ADS) is to facilitate positive perceptions (e.g., fairness or trustworthiness) of users towards such systems. This viewpoint, however, makes the implicit assumption that a given ADS is fair and trustworthy, to begin with. If the ADS issues unfair outcomes, then one might expect that explanations regarding the system's workings will reveal its shortcomings and, hence, lead to a decrease in fairness perceptions. Consequently, we suggest that it is more meaningful to evaluate explanations by their effectiveness in enabling people to appropriately assess the quality (e.g., fairness) of an associated ADS. We argue that for an effective explanation, perceptions of fairness should increase if and only if the underlying ADS is fair. In this in-progress work, we introduce the desideratum of appropriate fairness perceptions, propose a novel study design for evaluating it, and outline next steps towards a comprehensive experiment.	翻訳日:2021-08-17 14:50:53 公開日:2021-08-14
# 無人航空機におけるリアルタイムマルチモーダルセマンティクス融合 Real-Time Multi-Modal Semantic Fusion on Unmanned Aerial Vehicles ( http://arxiv.org/abs/2108.06608v1 ) ライセンス: Link先を確認	Simon Bultmann, Jan Quenzel and Sven Behnke	(参考訳) 複数の補完センサーを装備した無人航空機(UAV)は、高速な自律的または遠隔操作型セマンティックシーン分析(例えば災害調査)に極めて有益である。本研究では,実時間意味推論と複数センサの融合のためのUAVシステムを提案する。 LiDARスキャンとRGBイメージのセマンティックセグメンテーション、およびRGBとサーマルイメージのオブジェクト検出は、軽量CNNアーキテクチャと組み込み推論アクセラレータを使用してUAVコンピュータ上でオンラインで実行される。我々は後期融合アプローチを採用し、複数のモダリティからのセマンティック情報で3次元点群と画像セグメンテーションマスクを拡張すると同時に、アロセントリックなセマンティックマップを生成する。我々のシステムは、拡張されたセマンティックイメージとポイントクラウドを約9Hzで提供する。都市環境における実環境実験における統合システムの評価を行う。 Unmanned aerial vehicles (UAVs) equipped with multiple complementary sensors have tremendous potential for fast autonomous or remote-controlled semantic scene analysis, e.g., for disaster examination. In this work, we propose a UAV system for real-time semantic inference and fusion of multiple sensor modalities. Semantic segmentation of LiDAR scans and RGB images, as well as object detection on RGB and thermal images, run online onboard the UAV computer using lightweight CNN architectures and embedded inference accelerators. We follow a late fusion approach where semantic information from multiple modalities augments 3D point clouds and image segmentation masks while also generating an allocentric semantic map. Our system provides augmented semantic images and point clouds at $\approx 9$ Hz. We evaluate the integrated system in real-world experiments in an urban environment.	翻訳日:2021-08-17 14:48:58 公開日:2021-08-14
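A typical building block of this kind of late fusion is to project LiDAR points into a segmented camera image and attach per-pixel class labels to them. The sketch below assumes a pinhole camera with hypothetical intrinsics `K` and LiDAR-to-camera extrinsics `T_cam_lidar`, uses nearest-pixel lookup, and ignores occlusion and lens distortion. The paper's probabilistic fusion and allocentric mapping are not reproduced.

```python
import numpy as np


def label_points_from_mask(points, T_cam_lidar, K, seg_mask):
    """Attach per-pixel semantic labels to LiDAR points; -1 where a point is not visible.

    points: (N, 3) in the LiDAR frame; T_cam_lidar: 4x4 extrinsics;
    K: 3x3 intrinsics; seg_mask: (H, W) integer class map.
    """
    n = points.shape[0]
    pts_cam = (T_cam_lidar @ np.hstack([points, np.ones((n, 1))]).T).T[:, :3]
    labels = np.full(n, -1, dtype=int)
    in_front = np.flatnonzero(pts_cam[:, 2] > 0)  # only points in front of the camera
    uvw = (K @ pts_cam[in_front].T).T
    uv = np.round(uvw[:, :2] / uvw[:, 2:3]).astype(int)  # nearest pixel (u = column, v = row)
    h, w = seg_mask.shape
    ok = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    labels[in_front[ok]] = seg_mask[uv[ok, 1], uv[ok, 0]]
    return labels


# Hypothetical calibration: LiDAR and camera frames coincide.
K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
mask = np.zeros((100, 100), dtype=int)
mask[:, :50] = 1  # e.g. class 1 on the left half of the image
mask[:, 50:] = 2  # class 2 on the right half
points = np.array([
    [-1.0, 0.0, 10.0],  # projects to the left half
    [1.0, 0.0, 10.0],   # projects to the right half
    [0.0, 0.0, -5.0],   # behind the camera
    [100.0, 0.0, 1.0],  # outside the image
])
labels = label_points_from_mask(points, np.eye(4), K, mask)
print(labels)
```

Applying the same projection to per-class probability maps instead of hard labels lets evidence from several cameras or detectors be accumulated per point before it is written into a map.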
# ループ内ソフトウェアを用いたクワッドコプタードローンの単眼視覚自律着陸システム Monocular visual autonomous landing system for quadcopter drones using software in the loop ( http://arxiv.org/abs/2108.06616v1 ) ライセンス: Link先を確認	Miguel Saavedra-Ruiz, Ana Mario Pinto-Vargas, Victor Romero-Cano	(参考訳) 自律着陸は、多くの社会的・産業的応用において、マルチロータードローンの潜在能力を最大限に発揮するために欠かせない能力である。物理プラットフォーム上でのこの機能の実装とテストはリスクが高く、リソース集約的であるため、健全な設計プロセスと安全な配置の両方を保証するためには、物理プロトタイプを実装する前にシミュレーションが必要である。本稿では,クワッドコプターを予め定義された着陸パッドに自律的かつ効率的に着陸させることにより,物理的試験段階のリスクを低減できる単眼視システムの開発について述べる。ガゼボをベースとしたシミュレーションにより,自律着陸システム全体が設計要件を満たすことを保証するとともに,本手法は,物理実装に先立って安全なパラメータチューニングと設計試験を行うためのツールを提供する。最後に、ランディングパッド追跡に対する単眼視覚のみのアプローチにより、オドロイドxu4組み込みプロセッサの標準的な計算能力を持つf450クアッドコプタードローンでシステムを効果的に実装することができた。 Autonomous landing is a capability that is essential to achieve the full potential of multi-rotor drones in many social and industrial applications. The implementation and testing of this capability on physical platforms is risky and resource-intensive; hence, in order to ensure both a sound design process and a safe deployment, simulations are required before implementing a physical prototype. This paper presents the development of a monocular visual system, using a software-in-the-loop methodology, that autonomously and efficiently lands a quadcopter drone on a predefined landing pad, thus reducing the risks of the physical testing stage. In addition to ensuring that the autonomous landing system as a whole fulfils the design requirements using a Gazebo-based simulation, our approach provides a tool for safe parameter tuning and design testing prior to physical implementation. Finally, the proposed monocular vision-only approach to landing pad tracking made it possible to effectively implement the system in an F450 quadcopter drone with the standard computational capabilities of an Odroid XU4 embedded processor.	翻訳日:2021-08-17 14:48:43 公開日:2021-08-14
# A fast asynchronous MCMC sampler for sparse Bayesian inference ( http://arxiv.org/abs/2108.06446v1 ) License: see link	Yves Atchad\'e and Liwei Wang	We propose a very fast approximate Markov Chain Monte Carlo (MCMC) sampling framework that is applicable to a large class of sparse Bayesian inference problems, for which the computational cost per iteration in several models is of order $O(ns)$, where $n$ is the sample size and $s$ the underlying sparsity of the model. This cost can be further reduced by data sub-sampling when stochastic gradient Langevin dynamics are employed. The algorithm is an extension of the asynchronous Gibbs sampler of Johnson et al. (2013), but can be viewed from a statistical perspective as a form of Bayesian iterated sure independence screening (Fan et al. (2009)). We show that in high-dimensional linear regression problems, the Markov chain generated by the proposed algorithm admits an invariant distribution that correctly recovers the main signal with high probability under some statistical assumptions. Furthermore, we show that its mixing time is at most linear in the number of regressors. We illustrate the algorithm with several models.	Translation date: 2021-08-17 14:46:51 Publication date: 2021-08-14
# AdaGNN: A multi-modal latent representation meta-learner for GNNs based on AdaBoosting ( http://arxiv.org/abs/2108.06452v1 ) License: see link	Qinyi Zhu, Yiou Xiao	As a special field in deep learning, Graph Neural Networks (GNNs) focus on extracting intrinsic network features and have drawn unprecedented popularity in both academia and industry. Most of the state-of-the-art GNN models offer expressive, robust, scalable and inductive solutions empowering social network recommender systems with rich network features that are computationally difficult to leverage with graph-traversal-based methods. Most recent GNNs follow an encoder-decoder paradigm to encode high-dimensional heterogeneous information from a subgraph onto one low-dimensional embedding space. However, a single embedding space usually fails to capture all aspects of graph signals. In this work, we propose a boosting-based meta-learner for GNNs, which automatically learns multiple projections and the corresponding embedding spaces that capture different aspects of the graph signals. As a result, similarities between sub-graphs are quantified by embedding proximity on multiple embedding spaces. AdaGNN performs exceptionally well for applications with rich and diverse node neighborhood information. Moreover, AdaGNN is compatible with any inductive GNN for both node-level and edge-level tasks.	Translation date: 2021-08-17 14:46:35 Publication date: 2021-08-14
# Hybrid Gaussian Process Modeling Applied to Economic Stochastic Model Predictive Control of Batch Processes ( http://arxiv.org/abs/2108.06430v1 ) License: see link	E. Bradford, L. Imsland, M. Reble, E.A. del Rio-Chanona	Nonlinear model predictive control (NMPC) is an efficient approach for the control of nonlinear multivariable dynamic systems with constraints, but it requires an accurate plant model. Plant models can often be determined from first principles; however, parts of the model are difficult to derive using physical laws alone. In this paper, a hybrid Gaussian process (GP) first-principles modeling scheme is proposed to overcome this issue, which exploits GPs to model the parts of the dynamic system that are difficult to describe using first principles. GPs not only give accurate predictions, but also quantify the residual uncertainty of this model. It is vital to account for this uncertainty in the control algorithm to prevent constraint violations and performance deterioration. Monte Carlo samples of the GPs are generated offline to tighten constraints of the NMPC to ensure joint probabilistic constraint satisfaction online. Advantages of our method include fast online evaluation times, the possibility to account for online learning to alleviate conservativeness, and exploiting the flexibility of GPs and the data efficiency of first-principles models. The algorithm is verified on a case study involving a challenging semi-batch bioreactor.	Translation date: 2021-08-17 14:44:10 Publication date: 2021-08-14
# LayerPipe: Accelerating Deep Neural Network Training by Intra-Layer and Inter-Layer Gradient Pipelining and Multiprocessor Scheduling ( http://arxiv.org/abs/2108.06629v1 ) License: see link	Nanda K. Unnikrishnan and Keshab K. Parhi	The time required for training neural networks increases with size, complexity, and depth. Training model parameters by backpropagation inherently creates feedback loops. These loops hinder efficient pipelining and scheduling of the tasks within the layer and between consecutive layers. Prior approaches, such as PipeDream, have exploited the use of delayed gradients to achieve inter-layer pipelining. However, these approaches treat the entire backpropagation as a single task; this leads to an increase in computation time and processor underutilization. This paper presents novel optimization approaches where the gradient computations with respect to the weights and the activation functions are considered independently; therefore, these can be computed in parallel. This is referred to as intra-layer optimization. Additionally, the gradient computation with respect to the activation function is further divided into two parts and distributed to two consecutive layers. This leads to balanced scheduling where the computation time of each layer is the same. This is referred to as inter-layer optimization. The proposed system, referred to as LayerPipe, reduces the number of clock cycles required for training while maximizing processor utilization with minimal inter-processor communication overhead. LayerPipe achieves an average speedup of 25% and upwards of 80% with 7 to 9 processors with less communication overhead when compared to PipeDream.	Translation date: 2021-08-17 14:43:53 Publication date: 2021-08-14
# Graph2MDA: a multi-modal variational graph embedding model for predicting microbe-drug associations ( http://arxiv.org/abs/2108.06338v1 ) License: see link	Lei Deng, Yibiao Huang, Xuejun Liu and Hui Liu	Accumulated clinical studies show that microbes living in humans interact closely with human hosts and are involved in modulating drug efficacy and drug toxicity. Microbes have become novel targets for the development of antibacterial agents. Therefore, screening of microbe-drug associations can greatly benefit drug research and development. With the increase of microbial genomic and pharmacological datasets, we are greatly motivated to develop an effective computational method to identify new microbe-drug associations. In this paper, we propose a novel method, Graph2MDA, to predict microbe-drug associations by using a variational graph autoencoder (VGAE). We constructed multi-modal attributed graphs based on multiple features of microbes and drugs, such as molecular structures, microbe genetic sequences, and function annotations. Taking as input the multi-modal attributed graphs, the VGAE was trained to learn informative and interpretable latent representations of each node and the whole graph, and then a deep neural network classifier was used to predict microbe-drug associations. The hyperparameter analysis and model ablation studies showed the sensitivity and robustness of our model. We evaluated our method on three independent datasets, and the experimental results showed that our proposed method outperformed six existing state-of-the-art methods. We also explored the meaningfulness of the learned latent representations of drugs and found that the drugs show obvious clustering patterns that are significantly consistent with drug ATC classification. Moreover, we conducted case studies on two microbes and two drugs and found that 75%-95% of the predicted associations have been reported in the PubMed literature. Our extensive performance evaluations validated the effectiveness of our proposed method.	Translation date: 2021-08-16 13:00:23 Publication date: 2021-08-14
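The LayerPipe entry above describes intra-layer optimization: the gradient with respect to the weights and the gradient with respect to the layer input are computed independently. For a single dense layer $y = xW$ this follows from $\partial L/\partial W = x^{\mathsf T}\,\partial L/\partial y$ and $\partial L/\partial x = (\partial L/\partial y)\,W^{\mathsf T}$. Both read the same inputs and neither depends on the other. The sketch below is a minimal illustration under that assumption. It uses plain Python lists, and a thread pool stands in for separate processors. The helper names are hypothetical. This is not the LayerPipe scheduler, and it does not show the inter-layer split.

```python
from concurrent.futures import ThreadPoolExecutor

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(m):
    return [list(r) for r in zip(*m)]

def dense_backward(x, w, grad_y):
    """Backward pass of y = x @ w.

    grad_w = x^T @ grad_y and grad_x = grad_y @ w^T read the same inputs
    but do not depend on each other, so they are submitted as two
    independent tasks.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        fut_w = pool.submit(lambda: matmul(transpose(x), grad_y))
        fut_x = pool.submit(lambda: matmul(grad_y, transpose(w)))
        return fut_w.result(), fut_x.result()

# Batch of 2 inputs with 2 features; the layer maps 2 features -> 1 output.
x = [[1.0, 2.0], [3.0, 4.0]]
w = [[0.5], [-1.0]]
grad_y = [[1.0], [2.0]]
grad_w, grad_x = dense_backward(x, w, grad_y)
print(grad_w)  # [[7.0], [10.0]]
print(grad_x)  # [[0.5, -1.0], [1.0, -2.0]]
```

In a real pipeline the two tasks would run on different devices. The point is only that nothing forces them to run one after the other.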

# Wave and particle properties can be spatially separated in a quantum entity ( http://arxiv.org/abs/2009.00545v2 )

License: see link

Pratyusha Chowdhury, Arun Kumar Pati and Jing-Ling Chen

Wave and particle are two fundamental properties of Nature. The wave-particle duality has indicated that a quantum object may exhibit the behaviours of both wave and particle, depending upon the circumstances of the experiment. The major significance of wave-particle duality has led to a fundamental equation in quantum mechanics, the Schrödinger equation. At present, the principle of wave-particle duality has been deeply rooted in people's hearts. This gives rise to a common sense perception that wave property and particle property coexist simultaneously in a quantum entity, and these two physical attributes cannot be completely separated from each other. In classical physics, a similar common sense is that a physical system is inseparable from its physical properties. However, this has been recently challenged and beaten by a quantum phenomenon called the "quantum Cheshire cat", for which a cat and its grin can be separated spatially. In this work, we propose a thought experiment based on the similar technology of quantum Cheshire cat. We find that wave and particle attributes of a quantum entity can be completely separated, thus successfully dismantling the wave-particle duality for a quantum entity. Our result is still consistent with the complementarity principle and deepens the understanding of quantum foundations.

Translation date: 2023-05-04 03:13:30 Publication date: 2021-08-14

# Nonlocality, steering and quantum state tomography in a single experiment ( http://arxiv.org/abs/2011.05666v3 )

License: see link

Chang-Jiang Huang, Guo-Yong Xiang, Yu Guo, Kang-Da Wu, Bi-Heng Liu, Chuan-Feng Li, Guang-Can Guo, Armin Tavakoli

We investigate whether paradigmatic measurements for quantum state tomography, namely mutually unbiased bases and symmetric informationally complete measurements, can be employed to certify quantum correlations. For this purpose, we identify a simple and noise-robust correlation witness for entanglement detection, steering and nonlocality that can be evaluated based on the outcome statistics obtained in the tomography experiment. This allows us to perform state tomography on entangled qutrits, a test of Einstein-Podolsky-Rosen steering and a Bell inequality test, all within a single experiment. We also investigate the trade-off between quantum correlations and subsets of tomographically complete measurements as well as the quantification of entanglement in the different scenarios. Finally, we perform a photonics experiment in which we demonstrate quantum correlations under these flexible assumptions, namely with both parties trusted, one party untrusted and both parties untrusted.
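One of the tomography measurements named above is a complete set of mutually unbiased bases (MUBs). For an odd prime dimension $d$ such as $d = 3$ (qutrits), the standard Wootters-Fields construction gives $d+1$ bases. One is the computational basis. The other $d$ bases have components $\omega^{k n^2 + j n}/\sqrt{d}$ with $\omega = e^{2\pi i/d}$. The sketch below builds these bases and checks numerically that vectors from different bases overlap with probability exactly $1/d$. It illustrates the measurement family only. It does not implement the correlation witness from the paper, and the function names are hypothetical.

```python
import cmath
import math

def mub_qudit(d):
    """Return d + 1 mutually unbiased bases for an odd prime dimension d.

    Basis 0 is the computational basis; basis k + 1 (k = 0..d-1) has vectors
    v_j[n] = omega**(k*n*n + j*n) / sqrt(d), with omega = exp(2*pi*i/d).
    """
    omega = cmath.exp(2j * math.pi / d)
    bases = [[[1.0 if n == j else 0.0 for n in range(d)] for j in range(d)]]
    for k in range(d):
        basis = [[omega ** ((k * n * n + j * n) % d) / math.sqrt(d) for n in range(d)]
                 for j in range(d)]
        bases.append(basis)
    return bases

def overlap_sq(u, v):
    """|<u|v>|^2 for complex vectors given as lists."""
    return abs(sum(a.conjugate() * b for a, b in zip(u, v))) ** 2

d = 3
bases = mub_qudit(d)
for a in range(len(bases)):
    for b in range(len(bases)):
        for i, u in enumerate(bases[a]):
            for j, v in enumerate(bases[b]):
                # Orthonormal within a basis, overlap 1/d across bases.
                expected = (1.0 if i == j else 0.0) if a == b else 1.0 / d
                assert abs(overlap_sq(u, v) - expected) < 1e-9
print(len(bases), "bases pass the unbiasedness check")  # 4 bases pass the unbiasedness check
```

For non-prime dimensions this quadratic-phase construction does not apply directly, which is why the sketch restricts itself to odd primes.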

Translation date: 2023-04-24 12:01:21 Publication date: 2021-08-14

# Detuning modulated universal composite pulses ( http://arxiv.org/abs/2012.04401v2 )

License: see link

Hadar Greener, Elica Kyoseva, Haim Suchowski

We present a general method to derive detuning-modulated composite pulses (DMCPs) as N rotations of a canonical two-state quantum system to create accurate and robust pulses that are independent of the initial state of the system. This scheme has minimal pulse overhead, and achieves pulses that are stable against amplitude errors well within the $10^{-4}$ threshold that may be suitable for quantum information processing (QIP), within the lifetime of the system. This family of pulses makes it possible to overcome inevitable fabrication errors in silicon photonics, and relaxes the need for a precise initial state of light coupled into the system to achieve accurate light transfer. Furthermore, we extend universal DMCPs to n-level systems with irreducible SU(2) symmetry to create state transfer that is highly robust to errors in the pulse area from any initial state.

Translation date: 2023-04-21 18:23:01 Publication date: 2021-08-14

# Enabling Dataflow Optimization for Quantum Programs ( http://arxiv.org/abs/2101.11030v2 )

License: see link

David Ittah, Thomas H\"aner, Vadym Kliuchnikov, Torsten Hoefler

We propose an IR for quantum computing that directly exposes quantum and classical data dependencies for the purpose of optimization. The Quantum Intermediate Representation for Optimization (QIRO) consists of two dialects, one input dialect and one that is specifically tailored to enable quantum-classical co-optimization. While the first employs a perhaps more intuitive memory-semantics (quantum operations act as side-effects), the latter uses value-semantics (operations consume and produce states). Crucially, this encodes the dataflow directly in the IR, allowing for a host of optimizations that leverage dataflow analysis. We discuss how to map existing quantum programming languages to the input dialect and how to lower the resulting IR to the optimization dialect. We present a prototype implementation based on MLIR that includes several quantum-specific optimization passes. Our benchmarks show that significant improvements in resource requirements are possible even through static optimization. In contrast to circuit optimization at run time, this is achieved while incurring only a small constant overhead in compilation time, making this a compelling approach for quantum program optimization at application scale.
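The advantage of the value-semantics dialect can be shown with a toy IR. In this IR, each gate consumes a qubit value and produces a new one. Every operand therefore points directly at the operation that produced it. A pass can follow that link to find two identical self-inverse gates applied back to back and remove both. The sketch below is a hypothetical Python toy. It assumes single-qubit self-inverse gates only and that each qubit value is used exactly once. It is not the MLIR dialect or the passes described in the paper.

```python
from dataclasses import dataclass, field
from itertools import count

_ids = count()  # source of fresh SSA-style value ids (toy assumption)

@dataclass
class Op:
    """One gate in value semantics: consumes `operand`, produces `result`."""
    name: str
    operand: int
    result: int = field(default_factory=lambda: next(_ids))

SELF_INVERSE = {"H", "X", "Z"}

def cancel_adjacent_inverses(ops):
    """Remove pairs g(g(v)) for self-inverse g by following the dataflow.

    Assumes each value is used at most once (linear qubit values), so the
    producer of an operand uniquely identifies the previous gate.
    """
    ops = list(ops)
    changed = True
    while changed:
        changed = False
        producer = {op.result: op for op in ops}
        for op in ops:
            prev = producer.get(op.operand)
            if prev is not None and prev.name == op.name and op.name in SELF_INVERSE:
                # Rewire users of op.result to consume prev's input instead.
                for user in ops:
                    if user.operand == op.result:
                        user.operand = prev.operand
                ops = [o for o in ops if o is not op and o is not prev]
                changed = True
                break
    return ops

q0 = next(_ids)           # input qubit value
h1 = Op("H", q0)
h2 = Op("H", h1.result)   # H applied twice is the identity
x1 = Op("X", h2.result)
optimized = cancel_adjacent_inverses([h1, h2, x1])
print([(op.name, op.operand == q0) for op in optimized])  # [('X', True)]
```

With memory semantics, the same pass would first have to reconstruct which gates touch the same qubit in sequence. Value semantics makes that dependency explicit.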

Translation date: 2023-04-13 22:20:21 Publication date: 2021-08-14

# Lieb-Robinson bound and almost-linear light-cone in interacting boson systems ( http://arxiv.org/abs/2103.11592v3 )

License: see link

Tomotaka Kuwahara, Keiji Saito

In this work, we investigate how quickly local perturbations propagate in interacting boson systems with Bose-Hubbard-type Hamiltonians. In general, these systems have unbounded local energies, and arbitrarily fast information propagation may occur. We focus on a specific but experimentally natural situation in which the number of bosons at any one site in the unperturbed initial state is approximately limited. We rigorously prove the existence of an almost-linear information-propagation light-cone, thus establishing a Lieb--Robinson bound: the wave-front grows at most as $t\log^2 (t)$. We prove the clustering theorem for gapped ground states and study the time complexity of classically simulating one-dimensional quench dynamics, a topic of great practical interest.

Translation date: 2023-04-07 04:45:19 Publication date: 2021-08-14

# Decay of the vortex muon ( http://arxiv.org/abs/2106.00345v2 )

License: see link

Pengcheng Zhao, Igor P. Ivanov, Pengming Zhang

Muon decay is self-analyzing: the spectral-angular distribution of the emitted electron reveals the spin orientation of the polarized muon. Here, we show that the same feature applies to muons in non-plane-wave states and helps reveal the rich polarization opportunities available. We focus on the so-called vortex states, in which the muon carries a non-zero orbital angular momentum with respect to the average propagation direction and exhibits a cone structure in the momentum distribution. We compute the spectrum and the angular distribution of the electrons emitted in decays of vortex muons and show that the most revealing observable is not the angular distribution but the fixed-angle electron spectra. Even for very small cone opening angles of the vortex muons, it will be easy to observe significant modifications of the electron spectra which would allow one to distinguish vortex muons from approximately plane wave muons, as well as to differentiate among various polarization states. These features will be the key to tracking the evolution of vortex muons in external magnetic fields.

Translation date: 2023-03-28 03:50:02 Publication date: 2021-08-14

# Ground state energy of the polarized diluted gas of interacting spin $1/2$ fermions ( http://arxiv.org/abs/2108.00793v2 )

License: see link

Piotr Chankowski and Jacek Wojtkiewicz

The effective field theory approach simplifies the perturbative computation of the ground state energy of the diluted gas of fermions, allowing one, in the case of the unpolarized system, to easily re-derive the classic results up to the $(k_{\rm F}a_0)^2$ order (where $k_{\rm F}$ is the system's Fermi momentum and $a_0$ the $s$-wave scattering length) and (with more labour) to extend them up to the order $(k_{\rm F}a_0)^4$. The corresponding expansion of the ground state energy of the polarized gas of spin $1/2$ fermions is known analytically (to the best of our knowledge) only up to the $k_{\rm F}a_0$ order (where $k_{\rm F}$ stands for $k_{{\rm F}\uparrow}$ or $k_{{\rm F}\downarrow}$). Here we show that the same effective field theory method also makes it easy to compute the order $(k_{\rm F}a_0)^2$ correction to this result.
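For orientation, the known result the abstract starts from can be written out. This is the energy density through first order in $a_0$, i.e. the $k_{\rm F}a_0$ order. It assumes $\hbar = 1$, fermion mass $m$, and spin densities $n_\sigma = k_{{\rm F}\sigma}^3/(6\pi^2)$. The first term is the free Fermi-gas kinetic energy of each spin species, and the second is the mean-field contact interaction between opposite spins. The $(k_{\rm F}a_0)^2$ correction computed in the paper is not reproduced here.

$$
\frac{E}{V} = \sum_{\sigma=\uparrow,\downarrow} \frac{k_{{\rm F}\sigma}^{5}}{20\pi^{2} m}
  + \frac{4\pi a_0}{m}\, n_{\uparrow} n_{\downarrow} + \dots,
\qquad n_\sigma = \frac{k_{{\rm F}\sigma}^{3}}{6\pi^{2}}.
$$

As a consistency check, set $k_{{\rm F}\uparrow} = k_{{\rm F}\downarrow} = k_{\rm F}$. This gives $n_\uparrow = n_\downarrow = n/2$ with $n = k_{\rm F}^3/(3\pi^2)$, and the expression reduces to the familiar unpolarized expansion $E/N = \tfrac{3}{5}\tfrac{k_{\rm F}^2}{2m}\left[1 + \tfrac{10}{9\pi}k_{\rm F}a_0 + \dots\right]$.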

Translation date: 2023-03-20 03:20:39 Publication date: 2021-08-14

# BenchENAS: A Benchmarking Platform for Evolutionary Neural Architecture Search ( http://arxiv.org/abs/2108.03856v2 )

License: see link

Xiangning Xie, Yuqiao Liu, Yanan Sun, Gary G. Yen, Bing Xue and Mengjie Zhang

Neural architecture search (NAS), which automatically designs the architectures of deep neural networks, has achieved breakthrough success over many applications in the past few years. Among different classes of NAS methods, evolutionary computation based NAS (ENAS) methods have recently gained much attention. Unfortunately, the issues of fair comparisons and efficient evaluations have hindered the development of ENAS. The current benchmark architecture datasets designed for fair comparisons only provide the datasets, not the ENAS algorithms or the platform to run the algorithms. The existing efficient evaluation methods are either not suitable for the population-based ENAS algorithm or are too complex to use. This paper develops a platform named BenchENAS to address these issues. BenchENAS aims to achieve fair comparisons by running different algorithms in the same environment and with the same settings. To achieve efficient evaluation in a common lab environment, BenchENAS designs a parallel component and a cache component with high maintainability. Furthermore, BenchENAS is easy to install, highly configurable, and modular, which makes it easy to use and to extend. The paper conducts efficient comparison experiments on eight ENAS algorithms with high GPU utilization on this platform. The experiments validate that the fair comparison issue does exist, and BenchENAS can alleviate this issue. A website has been built to promote BenchENAS at https://benchenas.com, where interested researchers can obtain the source code and documentation of BenchENAS for free.
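BenchENAS has a cache component whose job is to avoid re-training an architecture that has already been evaluated. A minimal sketch of that idea follows. It assumes that architectures are JSON-serializable dictionaries and that `train_and_evaluate` is a hypothetical user-supplied function. Both the class and the toy scoring function are illustrative only. BenchENAS's actual interfaces are not reproduced here.

```python
import hashlib
import json

class EvaluationCache:
    """Memoize fitness values keyed by a canonical hash of the architecture."""

    def __init__(self, evaluate):
        self._evaluate = evaluate   # hypothetical train-and-evaluate callable
        self._store = {}
        self.misses = 0             # number of real (expensive) evaluations

    @staticmethod
    def key(arch):
        # sort_keys makes logically equal dicts serialize (and hash) identically
        blob = json.dumps(arch, sort_keys=True).encode("utf-8")
        return hashlib.sha256(blob).hexdigest()

    def fitness(self, arch):
        k = self.key(arch)
        if k not in self._store:
            self.misses += 1
            self._store[k] = self._evaluate(arch)
        return self._store[k]

def train_and_evaluate(arch):
    # Stand-in for an expensive training run: a toy score.
    return sum(arch["channels"]) / (1 + len(arch["channels"]))

cache = EvaluationCache(train_and_evaluate)
a = {"channels": [32, 64], "kernel": 3}
b = {"kernel": 3, "channels": [32, 64]}   # same architecture, different key order
print(cache.fitness(a), cache.fitness(b), cache.misses)  # 32.0 32.0 1
```

Hashing a canonical serialization, rather than the Python object, lets the cache also be shared across worker processes or persisted to disk.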

Translation date: 2023-03-18 23:42:04 Publication date: 2021-08-14

# Deformed Explicitly Correlated Gaussians ( http://arxiv.org/abs/2108.04859v2 )

License: see link

Matthew Beutel, Alexander Ahrens, Chenhang Huang, Yasuyuki Suzuki and Kalman Varga

Deformed correlated Gaussian basis functions are introduced and their matrix elements are calculated. These basis functions can be used to solve problems with nonspherical potentials. One example of such potential is the dipole self-interaction term in the Pauli-Fierz Hamiltonian. Examples are presented showing the accuracy and necessity of deformed Gaussian basis functions to accurately solve light-matter coupled systems in cavity QED.
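Consider a single particle in three dimensions. A deformed Gaussian can be written $g_A(\mathbf r) = \exp(-\mathbf r^{\mathsf T} A \mathbf r)$, where $A$ is a general real symmetric positive-definite $3\times 3$ matrix; the spherical case is $A = aI$. The overlap matrix element then follows from the standard Gaussian integral, $\langle g_A | g_B\rangle = \pi^{3/2}/\sqrt{\det(A+B)}$. The sketch below evaluates this formula in plain Python. It covers only the overlap of one-particle functions. It does not cover the many-particle matrix elements or the Pauli-Fierz terms treated in the paper, and the function names are hypothetical.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def overlap(a, b):
    """<g_A|g_B> = pi^(3/2) / sqrt(det(A + B)) for g_A(r) = exp(-r^T A r).

    A and B must be real symmetric with A + B positive definite; only the
    necessary condition det(A + B) > 0 is checked here.
    """
    s = [[a[i][j] + b[i][j] for j in range(3)] for i in range(3)]
    d = det3(s)
    if d <= 0:
        raise ValueError("A + B must be positive definite")
    return math.pi ** 1.5 / math.sqrt(d)

# A deformed Gaussian (off-diagonal coupling between x and y) against a spherical one.
A = [[1.0, 0.2, 0.0], [0.2, 0.5, 0.0], [0.0, 0.0, 2.0]]
B = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.5]]
print(overlap(A, B))
```

For diagonal $A + B$ the result factorizes into the product of the one-dimensional integrals $\sqrt{\pi/(a_i + b_i)}$. An off-diagonal entry such as the $0.2$ above is exactly what a spherical Gaussian basis cannot represent.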

Translation date: 2023-03-18 21:09:02 Publication date: 2021-08-14

# LILLIPUT: A Lightweight Low-Latency Lookup-Table Based Decoder for Near-term Quantum Error Correction ( http://arxiv.org/abs/2108.06569v1 )

License: see link

Poulami Das, Aditya Locharla, Cody Jones

The error rates of quantum devices are orders of magnitude higher than what is needed to run most quantum applications. To close this gap, Quantum Error Correction (QEC) encodes logical qubits and distributes information using several physical qubits. By periodically executing a syndrome extraction circuit on the logical qubits, information about errors (called syndrome) is extracted while running programs. A decoder uses these syndromes to identify and correct errors in real time, which is required to use feedback implemented in quantum algorithms. Unfortunately, software decoders are slow and hardware decoders are fast but less accurate. Thus, almost all QEC studies so far have relied on offline decoding. To enable real-time decoding in near-term QEC, we propose LILLIPUT, a Lightweight Low Latency Look-Up Table decoder. LILLIPUT consists of two parts. First, it translates syndromes into error detection events that index into a Look-Up Table (LUT) whose entry provides the error information in real time. Second, it programs the LUTs with error assignments for all possible error events by running a software decoder offline. LILLIPUT tolerates an error on any operation in the quantum hardware, including gates and measurement, and the number of tolerated errors grows with the size of the code. It needs <7% logic on off-the-shelf FPGAs, which allows it to be easily integrated alongside the control and readout circuits in existing systems. LILLIPUT incurs a latency of a few nanoseconds and enables real-time decoding. We also propose Compressed LUTs (CLUTs) to reduce the memory needed by LILLIPUT. By exploiting the fact that not all error events are equally likely and only storing data for the most probable error events, CLUTs reduce the memory needed by up to 107x (from 148 MB to 1.38 MB) without degrading accuracy.
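LILLIPUT combines two steps: it turns syndromes into detection events, and it decodes those events with a table filled offline. Both can be sketched on a toy 3-qubit bit-flip repetition code. The sketch makes three assumptions. First, only data-qubit bit flips occur. Second, the LUT is indexed by the detection events of one round. Third, the "offline software decoder" is a brute-force minimum-weight search. The actual decoder indexes detection events across several rounds and also handles measurement errors, and all names below are hypothetical.

```python
from itertools import product

# Parity checks of the 3-qubit repetition code: Z0Z1 and Z1Z2.
CHECKS = [(0, 1), (1, 2)]
N_QUBITS = 3

def syndrome(errors):
    """Parity of each check for a tuple of bit-flip indicators."""
    return tuple(errors[a] ^ errors[b] for a, b in CHECKS)

def detection_events(prev_syndrome, curr_syndrome):
    """A detection event fires where the syndrome changed between rounds.

    Syndromes are linear in the errors, so (without measurement errors) the
    detection events equal the syndrome of the errors that occurred in between.
    """
    return tuple(p ^ c for p, c in zip(prev_syndrome, curr_syndrome))

def build_lut():
    """Offline step: map every syndrome to a minimum-weight error explaining it."""
    lut = {}
    # Enumerate errors in order of increasing weight so the first hit is minimal.
    for err in sorted(product((0, 1), repeat=N_QUBITS), key=sum):
        lut.setdefault(syndrome(err), err)
    return lut

LUT = build_lut()

def decode(prev_syndrome, curr_syndrome):
    """Online step: a single table lookup indexed by the detection events."""
    return LUT[detection_events(prev_syndrome, curr_syndrome)]

prev = (0, 0)                   # clean previous round
curr = syndrome((0, 1, 0))      # a flip on the middle qubit gives (1, 1)
print(decode(prev, curr))       # (0, 1, 0)
```

The table has one entry per possible detection-event pattern, so it grows exponentially with the number of checks and rounds. That growth is the pressure behind the compressed LUTs described in the abstract.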

Translation date: 2023-03-18 13:04:27 Publication date: 2021-08-14

# PaDGAN: A Generative Adversarial Network for Performance Augmented Diverse Designs ( http://arxiv.org/abs/2002.11304v5 )

License: see link

Wei Chen, Faez Ahmed

Deep generative models have proven to be a useful tool for automatic design synthesis and design space exploration. When applied in engineering design, existing generative models face three challenges: 1) generated designs lack diversity and do not cover all areas of the design space, 2) it is difficult to explicitly improve the overall performance or quality of generated designs, and 3) existing models generally do not generate novel designs outside the domain of the training data. In this paper, we simultaneously address these challenges by proposing a new loss function based on Determinantal Point Processes for probabilistic modeling of diversity and quality. With this new loss function, we develop a variant of the Generative Adversarial Network, named "Performance Augmented Diverse Generative Adversarial Network" or PaDGAN, which can generate novel high-quality designs with good coverage of the design space. Using three synthetic examples and one real-world airfoil design example, we demonstrate that PaDGAN can generate diverse and high-quality designs. Compared with a vanilla Generative Adversarial Network, on average, it generates samples with a 28% higher mean quality score, larger diversity, and no mode collapse. Unlike typical generative models, which usually generate new designs by interpolating within the boundary of the training data, we show that PaDGAN expands the design space boundary beyond the training data towards high-quality regions. The proposed method is broadly applicable to many tasks, including design space exploration, design optimization, and creative solution recommendation.
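The abstract does not state the loss formula. As a rough illustration only, the Python sketch below uses the common quality-diversity decomposition of a DPP kernel, L = diag(q) S diag(q), with an RBF similarity S (an assumption), and takes the loss as -log det(L) divided by the batch size. PaDGAN's exact kernel and weighting may differ. Spreading designs apart or raising their quality scores both increase the determinant and therefore lower the loss.

```python
import math


def rbf_similarity(a, b, gamma=1.0):
    """Similarity in (0, 1]; equals 1 for identical designs."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def log_det_spd(matrix):
    """log det of a symmetric positive-definite matrix via Cholesky factorization."""
    n = len(matrix)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                chol[i][i] = math.sqrt(s)
            else:
                chol[i][j] = s / chol[j][j]
    return 2.0 * sum(math.log(chol[i][i]) for i in range(n))


def dpp_loss(designs, quality, gamma=1.0, jitter=1e-6):
    """-log det(L) / batch size, with L = diag(q) S diag(q) plus a small diagonal jitter."""
    n = len(designs)
    kernel = [
        [quality[i] * rbf_similarity(designs[i], designs[j], gamma) * quality[j]
         + (jitter if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    return -log_det_spd(kernel) / n


# Usage: spread-out designs get a lower (better) loss than clustered ones.
print(dpp_loss([[0.0], [3.0]], [1.0, 1.0]) < dpp_loss([[0.0], [0.1]], [1.0, 1.0]))  # True
```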

Translation date: 2022-12-28 14:06:20, Publication date: 2021-08-14

# MultiMBNN: Matched and Balanced Causal Inference with Neural Networks ( http://arxiv.org/abs/2004.13446v4 )

License: see link

Ankit Sharma, Garima Gupta, Ranjitha Prasad, Arnab Chatterjee, Lovekesh Vig, Gautam Shroff

Causal inference (CI) in observational studies has received much attention in healthcare, education, ad attribution, policy evaluation, and other fields. Confounding is a typical hazard, where the context affects both the treatment assignment and the response. For the multiple-treatment scenario, we propose MultiMBNN, a neural network-based method that overcomes confounding by employing generalized propensity score-based matching and by learning balanced representations. We benchmark its performance on synthetic and real-world datasets, using PEHE and the mean absolute percentage error over ATE as metrics. MultiMBNN outperforms state-of-the-art CI algorithms such as TARNet and Perfect Match (PM).
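For readers unfamiliar with the two metrics, the following Python sketch computes PEHE, commonly reported as the root mean squared error between true and estimated per-unit treatment effects, and a mean absolute percentage error over ATEs. Treating each treatment pair's ATE as one list entry is our simplifying assumption; the paper may aggregate over multiple treatments differently.

```python
import math


def pehe(true_effects, estimated_effects):
    """Root mean squared error between true and estimated per-unit treatment effects."""
    n = len(true_effects)
    return math.sqrt(sum((e - t) ** 2 for t, e in zip(true_effects, estimated_effects)) / n)


def ate_mape(true_ates, estimated_ates):
    """Mean absolute percentage error over a list of ATEs (one per treatment pair, assumed)."""
    return 100.0 * sum(abs((e - t) / t) for t, e in zip(true_ates, estimated_ates)) / len(true_ates)


print(pehe([1.0, 2.0], [1.0, 4.0]))      # sqrt(2), about 1.414
print(ate_mape([2.0, 4.0], [2.2, 3.0]))  # about 17.5
```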

Translation date: 2022-12-08 23:52:18, Publication date: 2021-08-14

# Semantic-driven Colorization ( http://arxiv.org/abs/2006.07587v3 )

License: see link

Man M. Ho, Lu Zhang, Alexander Raake, Jinjia Zhou

Recent colorization works implicitly predict semantic information while learning to colorize black-and-white images. Consequently, the generated colors tend to overflow object boundaries, and semantic faults remain invisible. In human colorization, our brains first detect and recognize the objects in the photo, then imagine their plausible colors based on the many similar objects we have seen in real life, and finally colorize them, as illustrated in the teaser. In this study, we simulate this human-like process so that our network first learns to understand the photo and then colorizes it. Thus, our work can provide plausible colors at a semantic level. In addition, the semantic information of the learned model becomes understandable and supports interaction. We also demonstrate that Instance Normalization is a missing ingredient for colorization, and we redesign the inference flow of U-Net to have two streams of data, providing an appropriate way of normalizing the feature maps from the black-and-white image and its semantic map. As a result, our network can provide plausible colors for specific objects that are competitive with typical colorization works.
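Instance Normalization, which the abstract identifies as a missing ingredient, normalizes each channel of each individual image by that channel's own spatial mean and variance. The Python sketch below shows only that core operation for one image, with the learnable scale and shift omitted; how the paper routes the grayscale and semantic-map streams through it is not reproduced.

```python
import math


def instance_norm(channels, eps=1e-5):
    """Normalize each channel of ONE image by that channel's own spatial mean and variance.

    `channels` is a list of channels, each a flat list of spatial activations.
    The learnable per-channel scale and shift are omitted.
    """
    normalized = []
    for channel in channels:
        mean = sum(channel) / len(channel)
        var = sum((x - mean) ** 2 for x in channel) / len(channel)
        normalized.append([(x - mean) / math.sqrt(var + eps) for x in channel])
    return normalized


print(instance_norm([[1.0, 3.0], [10.0, 10.0]]))
```

Unlike batch normalization, the statistics never mix information across images in a batch, which is one plausible reason it suits per-image tasks such as colorization.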

Translation date: 2022-11-21 21:09:09, Publication date: 2021-08-14

# ResRep: Lossless CNN Pruning via Decoupling Remembering and Forgetting ( http://arxiv.org/abs/2007.03260v4 )

License: see link

Xiaohan Ding, Tianxiang Hao, Jianchao Tan, Ji Liu, Jungong Han, Yuchen Guo, Guiguang Ding

We propose ResRep, a novel method for lossless channel pruning (a.k.a. filter pruning), which slims down a CNN by reducing the width (number of output channels) of convolutional layers. Inspired by neurobiology research on the independence of remembering and forgetting, we propose to re-parameterize a CNN into remembering parts and forgetting parts, where the former learn to maintain the performance and the latter learn to prune. By training the former with regular SGD and the latter with a novel update rule using penalty gradients, we realize structured sparsity. Then we equivalently merge the remembering and forgetting parts into the original architecture with narrower layers. In this sense, ResRep can be viewed as a successful application of Structural Re-parameterization. This methodology distinguishes ResRep from the traditional learning-based pruning paradigm, which applies a penalty on parameters to produce sparsity and may thereby suppress parameters essential for remembering. ResRep slims down a standard ResNet-50 with 76.15% accuracy on ImageNet to a narrower one with only 45% of the FLOPs and no accuracy drop, the first to achieve lossless pruning with such a high compression ratio. The code and models are at https://github.com/DingXiaoH/ResRep.
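The "equivalent merge" step relies on linearity: a compactor Q applied after a layer with weights W and bias b can be folded into W' = QW and b' = Qb, and rows of Q driven to zero correspond to output channels that can be dropped. The Python sketch below illustrates this on a dense layer rather than a convolution, with batch normalization omitted; it is a simplification under those assumptions, not ResRep's code.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def merge_and_prune(weight, bias, compactor, tol=1e-8):
    """Fold compactor Q into (W, b) as (QW, Qb), dropping rows of Q that are ~zero."""
    kept = [row for row in compactor if any(abs(q) > tol for q in row)]
    return matmul(kept, weight), matvec(kept, bias)


W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]                  # 3 output channels, 2 inputs
b = [1.0, 1.0, 1.0]
Q = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 1.0, 1.0]]  # middle row pruned
W2, b2 = merge_and_prune(W, b, Q)

x = [1.0, 1.0]
original = matvec(Q, [h + bi for h, bi in zip(matvec(W, x), b)])
merged = [h + bi for h, bi in zip(matvec(W2, x), b2)]
print(original, merged)  # [4.0, 0.0, 20.0] [4.0, 20.0]
```

The merged layer reproduces every non-pruned output exactly, which is why the slimmer network needs no fine-tuning after merging.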

Translation date: 2022-11-12 18:31:11, Publication date: 2021-08-14

# Identification of Abnormal States in Videos of Ants Undergoing Social Phase Change ( http://arxiv.org/abs/2009.08626v2 )

License: see link

Taeyeong Choi, Benjamin Pyenson, Juergen Liebig, Theodore P. Pavlic

Biology is both an important application area and a source of motivation for the development of advanced machine learning techniques. Although much attention has been paid to large and complex data sets resulting from high-throughput sequencing, advances in high-quality video recording technology have begun to generate similarly rich data sets requiring sophisticated techniques from both computer vision and time-series analysis. Moreover, just as studying gene expression patterns in one organism can reveal general principles that apply to other organisms, the study of complex social interactions in an experimentally tractable model system, such as a laboratory ant colony, can provide general principles about the dynamics of other social groups. Here, we focus on one such example from the study of reproductive regulation in small laboratory colonies of more than 50 Harpegnathos ants. These ants can be artificially induced to begin a ~20-day process of hierarchy reformation. Although the conclusion of this process is conspicuous to a human observer, it remains unclear which behaviors during the transient period contribute to the process. To address this issue, we explore the potential application of One-class Classification (OC) to the detection of abnormal states in ant colonies, where behavioral data are available during training only for normal societal conditions. Specifically, we build upon Deep Support Vector Data Description (DSVDD) and introduce the Inner-Outlier Generator (IO-GEN), which synthesizes fake "inner outlier" observations near the center of the DSVDD data description during training. We show that IO-GEN increases the reliability of the final OC classifier relative to other DSVDD baselines. This method can be used to screen video frames for which additional human observation is needed.
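In Deep SVDD, a sample's anomaly score is its squared distance from a center c in the learned embedding space, so frames far from c can be flagged for human review. The Python sketch below assumes the embeddings are already computed (the trained network and IO-GEN's synthetic inner outliers are not shown), takes c as the mean of normal-frame embeddings, and uses a hypothetical fixed threshold.

```python
def center(embeddings):
    """Mean of the embeddings of normal training frames, used as the DSVDD center c."""
    n = len(embeddings)
    return [sum(e[i] for e in embeddings) / n for i in range(len(embeddings[0]))]


def anomaly_score(embedding, c):
    """Squared Euclidean distance to the center; larger means more abnormal."""
    return sum((a - b) ** 2 for a, b in zip(embedding, c))


def frames_to_review(embeddings, c, threshold):
    """Indices of frames whose score exceeds a (hypothetical) threshold."""
    return [i for i, e in enumerate(embeddings) if anomaly_score(e, c) > threshold]


c = center([[0.0, 0.0], [2.0, 0.0]])                    # [1.0, 0.0]
print(frames_to_review([[1.0, 0.0], [4.0, 4.0]], c, 1.0))  # [1]
```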

Translation date: 2022-10-17 02:51:56, Publication date: 2021-08-14

# Do Wider Neural Networks Really Help Adversarial Robustness? ( http://arxiv.org/abs/2010.01279v3 )

License: see link

Boxi Wu and Jinghui Chen and Deng Cai and Xiaofei He and Quanquan Gu

Adversarial training is a powerful type of defense against adversarial examples. Previous empirical results suggest that adversarial training requires wider networks for better performance. However, it remains elusive how neural network width affects model robustness. In this paper, we carefully examine the relationship between network width and model robustness. Specifically, we show that model robustness is closely related to the tradeoff between natural accuracy and perturbation stability, which is controlled by the robust regularization parameter $\lambda$. With the same $\lambda$, wider networks can achieve better natural accuracy but worse perturbation stability, leading to potentially worse overall model robustness. To understand the origin of this phenomenon, we further relate perturbation stability to the network's local Lipschitzness. By leveraging recent results on neural tangent kernels, we theoretically show that wider networks tend to have worse perturbation stability. Our analyses suggest that: 1) the common strategy of first fine-tuning $\lambda$ on small networks and then directly using it for wide model training could lead to deteriorated model robustness; 2) one needs to properly enlarge $\lambda$ to fully unleash the robustness potential of wider models. Finally, we propose a new Width Adjusted Regularization (WAR) method that adaptively enlarges $\lambda$ on wide models and significantly reduces tuning time.
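The trade-off controlled by $\lambda$ can be written in a TRADES-style form: a natural cross-entropy term plus $\lambda$ times a KL divergence between predictions on clean and perturbed inputs. The Python sketch below assumes that form (the abstract does not spell out the exact loss), takes the adversarial logits as given rather than running an attack, and does not implement WAR's rule for enlarging $\lambda$.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def robust_objective(clean_logits, adv_logits, label, lam):
    """Natural cross-entropy plus lam * KL(p_clean || p_adv) as the stability term."""
    p_clean = softmax(clean_logits)
    p_adv = softmax(adv_logits)
    natural = -math.log(p_clean[label])
    stability = sum(p * math.log(p / q) for p, q in zip(p_clean, p_adv))
    return natural + lam * stability


# A larger lambda puts more weight on stability whenever the predictions disagree.
clean, adv = [2.0, 0.0], [0.5, 0.5]
print(robust_objective(clean, adv, 0, 1.0) < robust_objective(clean, adv, 0, 6.0))  # True
```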

Translation date: 2022-10-11 08:35:54, Publication date: 2021-08-14

# Gait Recognition via Effective Global-Local Feature Representation and Local Temporal Aggregation ( http://arxiv.org/abs/2011.01461v2 )

License: see link

Beibei Lin, Shunli Zhang and Xin Yu

Gait recognition is one of the most important biometric technologies and has been applied in many fields. Recent gait recognition frameworks represent each gait frame by descriptors extracted from either global appearances or local regions of humans. However, representations based on global information often neglect the details of the gait frame, while local region-based descriptors cannot capture the relations among neighboring regions, thus reducing their discriminativeness. In this paper, we propose a novel feature extraction and fusion framework to achieve discriminative feature representations for gait recognition. Towards this goal, we take advantage of both global visual information and local region details and develop a Global and Local Feature Extractor (GLFE). Specifically, our GLFE module is composed of multiple newly designed global and local convolutional layers (GLConv) that combine global and local features in a principled manner. Furthermore, we present a novel operation, Local Temporal Aggregation (LTA), which further preserves spatial information by reducing the temporal resolution to obtain higher spatial resolution. With the help of GLFE and LTA, our method significantly improves the discriminativeness of our visual features, thus improving gait recognition performance. Extensive experiments demonstrate that our proposed method outperforms state-of-the-art gait recognition methods on two popular datasets.

Translation date: 2022-09-30 05:10:03, Publication date: 2021-08-14

# Distilling Knowledge by Mimicking Features ( http://arxiv.org/abs/2011.01424v2 )

License: see link

Guo-Hua Wang, Yifan Ge, Jianxin Wu

Knowledge distillation (KD) is a popular method to train efficient networks ("student") with the help of high-capacity networks ("teacher"). Traditional methods use the teacher's soft logits as extra supervision to train the student network. In this paper, we argue that it is more advantageous to make the student mimic the teacher's features in the penultimate layer. Not only can the student learn more effective information directly from the teacher's features, but feature mimicking can also be applied to teachers trained without a softmax layer. Experiments show that it can achieve higher accuracy than traditional KD. To further facilitate feature mimicking, we decompose a feature vector into its magnitude and its direction. We argue that the teacher should give the student more freedom in the feature magnitude and let the student pay more attention to mimicking the feature direction. To meet this requirement, we propose a loss term based on locality-sensitive hashing (LSH). With the help of this new loss, our method indeed mimics feature directions more accurately, relaxes constraints on feature magnitudes, and achieves state-of-the-art distillation accuracy. We provide theoretical analyses of how LSH facilitates feature direction mimicking, and further extend feature mimicking to multi-label recognition and object detection.
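A direction-focused LSH loss can be sketched with random hyperplanes through the origin: the teacher's hash bits are the signs of its feature's projections, so they depend only on the feature's direction, and the student is trained with binary cross-entropy to reproduce them. In this Python sketch the number of bits, the Gaussian hyperplanes, and the absence of bias terms are assumptions; the paper's exact loss may differ.

```python
import math
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Random Gaussian hyperplanes through the origin (hypothetical hashing setup)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def lsh_loss(student_feat, teacher_feat, planes, eps=1e-7):
    """Binary cross-entropy between the teacher's hash bits and the student's soft bits."""
    loss = 0.0
    for w in planes:
        teacher_bit = 1.0 if sum(a * b for a, b in zip(w, teacher_feat)) >= 0 else 0.0
        p = sigmoid(sum(a * b for a, b in zip(w, student_feat)))
        p = min(max(p, eps), 1.0 - eps)
        loss -= teacher_bit * math.log(p) + (1.0 - teacher_bit) * math.log(1.0 - p)
    return loss / len(planes)


planes = make_hyperplanes(num_bits=16, dim=3)
teacher = [1.0, 2.0, 3.0]
same_direction = [10.0, 20.0, 30.0]  # 10x the magnitude, same direction
opposite = [-1.0, -2.0, -3.0]
print(lsh_loss(same_direction, teacher, planes) < lsh_loss(opposite, teacher, planes))  # True
```

A student feature with a very different magnitude but the same direction still matches every teacher bit, while a flipped direction is penalized on every hyperplane.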

Translation date: 2022-09-30 04:17:41, Publication date: 2021-08-14

# An Integrated Approach for Improving Brand Consistency of Web Content: Modeling, Analysis and Recommendation ( http://arxiv.org/abs/2011.09754v3 )

License: see link

Soumyadeep Roy, Shamik Sural, Niyati Chhaya, Anandhavelu Natarajan, Niloy Ganguly

A consumer-dependent (business-to-consumer) organization tends to present itself as possessing a set of human qualities, which is termed the brand personality of the company. This perception is impressed upon consumers through the content the organization produces, be it advertisements, blogs, or magazines. A consistent brand will generate trust and retain customers over time as they develop an affinity towards regularity and common patterns. However, maintaining a consistent messaging tone for a brand has become more challenging with the explosion in the amount of content that must be authored and pushed to the Internet to maintain an edge in the era of digital marketing. To understand the depth of the problem, we collect around 300K web pages from around 650 companies. We develop trait-specific classification models by considering the linguistic features of the content. The classifier automatically identifies web articles that are not consistent with the mission and vision of a company and further helps us discover the conditions under which consistency cannot be maintained. To address the brand inconsistency issue, we then develop a sentence ranking system that outputs the top three sentences that need to be changed to make a web article more consistent with the company's brand personality.

Translation date: 2022-09-23 21:27:53, Publication date: 2021-08-14

# RIN: Textured Human Model Recovery and Imitation with a Single Image ( http://arxiv.org/abs/2011.12024v4 )

License: see link

Haoxi Ran, Guangfu Wang, Li Lu

Human imitation has recently become a popular topic, driven by GANs' ability to disentangle human pose and body content. However, the latest methods pay little attention to 3D information, and they need a large number of input images to avoid self-occlusion. In this paper, we propose RIN, a novel volume-based framework for reconstructing a textured 3D model from a single picture and imitating a subject with the generated model. Specifically, to estimate most of the human texture, we propose a U-Net-like front-to-back translation network. With both front and back images as input, the textured volume recovery module allows us to color a volumetric human. A sequence of 3D poses then guides the colored volume via Flowable Disentangle Networks, as a volume-to-volume translation task. To project volumes onto a 2D plane during training, we design a differentiable depth-aware renderer. Our experiments demonstrate that our volume-based model is adequate for human imitation and that the back view can be estimated reliably using our network. While prior works based on either 2D poses or semantic maps often fail when a human's appearance is unstable, our framework still produces concrete results that are competitive with those produced from multi-view input.

Translation date: 2022-09-21 12:27:38, Publication date: 2021-08-14

# SIAM: Chiplet-based Scalable In-Memory Acceleration with Mesh for Deep Neural Networks ( http://arxiv.org/abs/2108.08903v1 )

License: CC BY 4.0

Gokul Krishnan, Sumit K. Mandal, Manvitha Pannala, Chaitali Chakrabarti, Jae-sun Seo, Umit Y. Ogras, Yu Cao

In-memory computing (IMC) on a monolithic chip for deep learning faces dramatic challenges in area, yield, and on-chip interconnection cost due to ever-increasing model sizes. 2.5D integration or chiplet-based architectures interconnect multiple small chips (i.e., chiplets) to form a large computing system, presenting a feasible solution beyond a monolithic IMC architecture to accelerate large deep learning models. This paper presents a new benchmarking simulator, SIAM, to evaluate the performance of chiplet-based IMC architectures and explore the potential of such a paradigm shift in IMC architecture design. SIAM integrates device, circuit, architecture, network-on-chip (NoC), network-on-package (NoP), and DRAM access models to realize an end-to-end system. SIAM is scalable in its support of a wide range of deep neural networks (DNNs), customizable to various network structures and configurations, and capable of efficient design space exploration. We demonstrate the flexibility, scalability, and simulation speed of SIAM by benchmarking different state-of-the-art DNNs with CIFAR-10, CIFAR-100, and ImageNet datasets. We further calibrate the simulation results against a published silicon result, SIMBA. The chiplet-based IMC architecture obtained through SIAM shows 130$\times$ and 72$\times$ improvements in energy efficiency for ResNet-50 on the ImageNet dataset compared to Nvidia V100 and T4 GPUs, respectively.

Translation date: 2021-08-29 15:10:28, Publication date: 2021-08-14

# TRAPDOOR: Repurposing backdoors to detect dataset bias in machine learning-based genomic analysis ( http://arxiv.org/abs/2108.10132v1 )

License: CC BY 4.0

Esha Sarkar, Michail Maniatakos

Machine Learning (ML) has achieved unprecedented performance in several applications, including image, speech, text, and data analysis. Using ML to understand underlying patterns in gene mutations (genomics) has far-reaching implications, not only for overcoming diagnostic pitfalls but also for designing treatments for life-threatening diseases like cancer. The success and sustainability of ML algorithms depend on the quality and diversity of the data collected and used for training. Under-representation of groups (ethnic groups, gender groups, etc.) in such a dataset can lead to inaccurate predictions for certain groups, which can further exacerbate systemic discrimination issues. In this work, we propose TRAPDOOR, a methodology for identifying biased datasets by repurposing a technique that has mostly been proposed for nefarious purposes: neural network backdoors. We consider a typical collaborative learning setting in the genomics supply chain, where data may come from hospitals, collaborative projects, or research institutes to a central cloud without awareness of bias against a sensitive group. In this context, we develop a methodology that uses ML backdooring tailored to genomic applications to leak potential bias information about the collective data without hampering genuine performance. Using a real-world cancer dataset, we analyze both the bias that already existed towards white individuals and biases introduced into the dataset artificially; our experimental results show that TRAPDOOR can detect the presence of dataset bias with 100% accuracy and can also extract the extent of bias by recovering the percentage with a small error.

Translation date: 2021-08-29 14:45:58, Publication date: 2021-08-14

# Study of Proximal Normalized Subband Adaptive Algorithm for Acoustic Echo Cancellation ( http://arxiv.org/abs/2108.10219v1 )

License: see link

Gang Guo, Yi Yu, Rodrigo C. de Lamare, Zongsheng Zheng, Lu Lu and Qiangming Cai

In this paper, we propose a novel normalized subband adaptive filter algorithm suited for sparse scenarios, which combines the proportionate and sparsity-aware mechanisms. The proposed algorithm is derived based on the proximal forward-backward splitting and the soft-thresholding methods. We analyze the mean and mean square behaviors of the algorithm, which is supported by simulations. In addition, an adaptive approach for the choice of the thresholding parameter in the proximal step is also proposed based on the minimization of the mean square deviation. Simulations in the contexts of system identification and acoustic echo cancellation verify the superiority of the proposed algorithm over its counterparts.
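The soft-thresholding operator at the core of the proximal (backward) step shrinks every coefficient toward zero by a threshold τ and sets small ones exactly to zero, which is what makes the update sparsity-aware. The Python sketch below pairs it with a plain fullband NLMS update as a stand-in for the forward step; the proportionate subband update and the adaptive choice of τ from the paper are not reproduced.

```python
import math


def soft_threshold(weights, tau):
    """Proximal operator of tau * ||w||_1: shrink each weight toward zero by tau."""
    return [math.copysign(max(abs(w) - tau, 0.0), w) for w in weights]


def prox_nlms_step(w, x, d, mu, tau, delta=1e-6):
    """One fullband NLMS (forward) step followed by soft-thresholding (backward) step."""
    y = sum(wi * xi for wi, xi in zip(w, x))
    e = d - y
    norm = sum(xi * xi for xi in x) + delta
    w_half = [wi + mu * e * xi / norm for wi, xi in zip(w, x)]
    return soft_threshold(w_half, tau)


print(soft_threshold([0.5, -0.05, -2.0], 0.1))  # approximately [0.4, -0.0, -1.9]
print(prox_nlms_step([0.0, 0.0], [1.0, 0.0], 1.0, 1.0, 0.1))
```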

Translation date: 2021-08-29 12:10:05, Publication date: 2021-08-14

# FOX-NAS: Fast, On-device and Explainable Neural Architecture Search ( http://arxiv.org/abs/2108.08189v1 )

License: see link

Chia-Hsiang Liu, Yu-Shin Han, Yuan-Yao Sung, Yi Lee, Hung-Yueh Chiang, Kai-Chiang Wu

Neural architecture search can discover neural networks with good performance, and One-Shot approaches are prevalent. One-Shot approaches typically require a supernet with weight sharing and predictors that predict the performance of architectures. However, previous methods take much time to generate performance predictors and are thus inefficient. To this end, we propose FOX-NAS, which consists of fast and explainable predictors based on simulated annealing and multivariate regression. Our method is quantization-friendly and can be efficiently deployed to the edge. Experiments on different hardware show that FOX-NAS models outperform some other popular neural network architectures. For example, FOX-NAS matches MobileNetV2 and EfficientNet-Lite0 accuracy with 240% and 40% less latency on the edge CPU. FOX-NAS is the 3rd place winner of the 2020 Low-Power Computer Vision Challenge (LPCVC), DSP classification track. See all evaluation results at https://lpcv.ai/competitions/2020. Search code and pre-trained models are released at https://github.com/great8nctu/FOX-NAS.
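The search idea can be sketched as simulated annealing over discrete architecture choices, scored by a cheap linear predictor instead of training each candidate. In this Python sketch the coefficients, latency costs, and three-layer search space are hypothetical placeholders; FOX-NAS fits its predictors with multivariate regression on measured data, which is not shown.

```python
import math
import random

# Hypothetical per-layer contributions of a fitted linear predictor (3 layers x 3 choices)
# and a hypothetical latency cost per choice. FOX-NAS fits its own regression models.
ACC_COEF = [[0.0, 0.3, 0.5], [0.0, 0.2, 0.1], [0.0, 0.4, 0.6]]
LATENCY_COST = [0.0, 0.1, 0.4]


def predicted_score(arch):
    """Predicted accuracy gain minus latency cost for an architecture (tuple of choices)."""
    return sum(ACC_COEF[layer][c] - LATENCY_COST[c] for layer, c in enumerate(arch))


def neighbor(arch, rng):
    """Change the choice of one randomly picked layer."""
    new = list(arch)
    new[rng.randrange(len(new))] = rng.randrange(len(LATENCY_COST))
    return tuple(new)


def simulated_annealing(score, start, steps=300, t0=1.0, cooling=0.98, seed=0):
    rng = random.Random(seed)
    current, current_score = start, score(start)
    best, best_score = current, current_score
    temp = t0
    for _ in range(steps):
        cand = neighbor(current, rng)
        cand_score = score(cand)
        # Always accept improvements; accept worse moves with probability exp(delta / T).
        if cand_score >= current_score or rng.random() < math.exp((cand_score - current_score) / temp):
            current, current_score = cand, cand_score
            if current_score > best_score:
                best, best_score = current, current_score
        temp *= cooling
    return best, best_score


print(simulated_annealing(predicted_score, (0, 0, 0)))
```

Because each candidate costs only a few multiply-adds to score, the annealing loop can afford many steps; that cheapness is the point of pairing it with a regression predictor.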

Translation date: 2021-08-19 14:49:52, Publication date: 2021-08-14

# Adapting to Unseen Vendor Domains for MRI Lesion Segmentation ( http://arxiv.org/abs/2108.06434v1 )

License: CC BY 4.0

Brandon Mac, Alan R. Moody, April Khademi

One of the key limitations of machine learning models is poor performance on data that is out of the domain of the training distribution. This is especially true for image analysis in magnetic resonance (MR) imaging, as variations in hardware and software create non-standard intensities, contrasts, and noise distributions across scanners. Recently, image translation models have been proposed to augment data across domains by creating synthetic data points. In this paper, we investigate the application of an unsupervised image translation model to augment MR images from a source dataset to a target dataset. Specifically, we evaluate how well these models can create synthetic data points representative of the target dataset through image translation, and whether a segmentation model trained on these synthetic data points approaches the performance of a model trained directly on the target dataset. We consider three configurations of augmentation between datasets: translation between images, between scanner vendors, and from labels to images. The segmentation models trained on synthetic data from the labels-to-images configuration yielded the closest performance to the segmentation model trained directly on the target dataset. The Dice coefficient scores for each target vendor (GE, Siemens, Philips) were 0.63, 0.64, and 0.58 when training on synthetic data, compared with 0.65, 0.72, and 0.61 when training directly on the target dataset.
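The scores reported above are Dice coefficients, which measure the overlap between a predicted lesion mask and the reference mask (1.0 means perfect overlap). A minimal Python version for flattened binary masks follows; how the paper aggregates scores across volumes is not stated here, and scoring two empty masks as 1.0 is a convention we assume.

```python
def dice(pred, truth):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for binary masks given as flat 0/1 lists."""
    intersection = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Convention (assumption): two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


print(dice([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
```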

Translation date: 2021-08-18 10:59:01, Publication date: 2021-08-14

# A New Entity Extraction Method Based on Machine Reading Comprehension ( http://arxiv.org/abs/2108.06444v1 )

License: CC BY 4.0

Xiaobo Jiang, Kun He, Jiajun He and Guangyu Yan

Entity extraction is a key technology for obtaining information from massive texts in natural language processing. The further interaction between them does not meet the standards of human reading comprehension. This limits the understanding of the model and also leads to the omission or misjudgment of the answer (i.e., the target entity) due to the reasoning question. We propose MRC-I2DP, an effective MRC-based entity extraction model. It uses the proposed gated attention-attracting mechanism to adjust the restoration of each part of the text pair, creating problems and thinking for multi-level interactive attention calculations to increase the target entity. It also uses the proposed 2D probability coding module, TALU function, and mask mechanism to strengthen the detection of all possible targets of the target, thereby improving the probability and accuracy of prediction. Experiments show that MRC-I2DP is an overall state-of-the-art model in 7 fields from the scientific and public domains. It achieves an F1 improvement of 2.1% to 10.4% over the compared models.

Translated: 2021-08-18 10:35:27 Published: 2021-08-14

# PTT: Point-Track-Transformer Module for 3D Single Object Tracking in Point Clouds ( http://arxiv.org/abs/2108.06455v1 )

License: CC BY 4.0

Jiayao Shan, Sifan Zhou, Zheng Fang, Yubo Cui

3D single object tracking is a key issue for robotics. In this paper, we propose a transformer module called Point-Track-Transformer (PTT) for point cloud-based 3D single object tracking. The PTT module contains three blocks for feature embedding, position encoding, and self-attention feature computation. Feature embedding aims to place features closer together in the embedding space if they have similar semantic information. Position encoding is used to encode the coordinates of point clouds into high-dimensional distinguishable features. Self-attention generates refined attention features by computing attention weights. In addition, we embed the PTT module into the open-source state-of-the-art method P2B to construct PTT-Net. Experiments on the KITTI dataset reveal that our PTT-Net surpasses the state of the art by a noticeable margin (~10%). Additionally, PTT-Net can achieve real-time performance (~40 FPS) on an NVIDIA 1080Ti GPU. Our code is open-sourced for the robotics community at https://github.com/shanjiayao/PTT.
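
The self-attention block described above can be illustrated with generic scaled dot-product attention over point features. A position encoding derived from the xyz coordinates is added to each point's feature before attention. This is a pure-Python sketch under stated assumptions, not the PTT module itself. The linear position encoding `w_pos`, the projection matrices and all function names are hypothetical, and the real module uses learned layers.

```python
import math


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix, given as a list of rows, by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def point_self_attention(features, coords, w_q, w_k, w_v, w_pos):
    """Scaled dot-product self-attention over N points.

    features: N x C per-point features; coords: N x 3 xyz coordinates.
    w_pos (C x 3) is a hypothetical linear position encoding added to the features.
    Returns (refined features N x D, attention weights N x N).
    """
    x = [[f + p for f, p in zip(feat, matvec(w_pos, xyz))]
         for feat, xyz in zip(features, coords)]
    q = [matvec(w_q, xi) for xi in x]
    k = [matvec(w_k, xi) for xi in x]
    v = [matvec(w_v, xi) for xi in x]
    scale = 1.0 / math.sqrt(len(q[0]))
    # Each row of `weights` is a probability distribution over all points.
    weights = [softmax([scale * sum(a * b for a, b in zip(qi, kj)) for kj in k])
               for qi in q]
    # Refined feature of point i: attention-weighted average of the value vectors.
    refined = [[sum(w * vj[d] for w, vj in zip(row, v)) for d in range(len(v[0]))]
               for row in weights]
    return refined, weights


identity = [[1.0, 0.0], [0.0, 1.0]]
no_pos = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
refined, weights = point_self_attention(
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    identity, identity, identity, no_pos)
```

Full attention over N points costs O(N²). That is one reason point-cloud transformers often restrict attention to local neighborhoods or sampled points.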

Translated: 2021-08-18 10:10:58 Published: 2021-08-14

# Optimal Approximation with Sparse Neural Networks and Applications ( http://arxiv.org/abs/2108.06467v1 )

License: CC BY 4.0

Khay Boon Hong

We use deep, sparsely connected neural networks to measure the complexity of a function class in $L^2(\mathbb R^d)$ by restricting the connectivity and the memory required to store the networks. We also introduce representation systems, countable collections of functions that guide neural networks, since approximation theory with representation systems is well developed in mathematics. We then prove the fundamental bound theorem. It implies that a quantity intrinsic to the function class itself gives information about the approximation ability of both neural networks and representation systems. We also provide a method for transferring existing theories of approximation by representation systems to neural networks, greatly amplifying the practical value of neural networks. Finally, we use neural networks to approximate B-spline functions, which are used to generate B-spline curves. We then analyse the complexity of a class called $\beta$ cartoon-like functions using rate-distortion theory and a wedgelet construction.
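
Since the abstract relies on B-spline functions, here is the standard Cox-de Boor recursion for evaluating B-spline basis functions and a point on a B-spline curve. It is textbook B-spline evaluation for illustration only, not the paper's neural-network approximation of it. The function names are our own.

```python
def bspline_basis(i, k, t, knots):
    """Cox-de Boor recursion: value of the i-th degree-k B-spline basis function at t."""
    if k == 0:
        if knots[i] <= t < knots[i + 1]:
            return 1.0
        # Close the last non-empty knot span so that t == knots[-1] is covered.
        if t == knots[-1] and knots[i] < knots[i + 1] == knots[-1]:
            return 1.0
        return 0.0
    value = 0.0
    left = knots[i + k] - knots[i]
    if left > 0:  # terms with a zero denominator are defined as 0
        value += (t - knots[i]) / left * bspline_basis(i, k - 1, t, knots)
    right = knots[i + k + 1] - knots[i + 1]
    if right > 0:
        value += (knots[i + k + 1] - t) / right * bspline_basis(i + 1, k - 1, t, knots)
    return value


def bspline_point(t, control_points, degree, knots):
    """Point on a B-spline curve: C(t) = sum_i N_{i,degree}(t) * P_i."""
    point = [0.0] * len(control_points[0])
    for i, p in enumerate(control_points):
        b = bspline_basis(i, degree, t, knots)
        point = [acc + b * coord for acc, coord in zip(point, p)]
    return point


# Clamped cubic knot vector with 6 basis functions on [0, 3].
knots = [0, 0, 0, 0, 1, 2, 3, 3, 3, 3]
curve = [[0.0, 0.0], [1.0, 2.0], [2.0, -1.0], [3.0, 1.0], [4.0, 0.0], [5.0, 2.0]]
print(bspline_point(1.5, curve, 3, knots))
```

The basis functions are non-negative and sum to 1 on the knot range (partition of unity). With a clamped knot vector, the curve starts and ends at the first and last control points.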

Translated: 2021-08-18 09:52:59 Published: 2021-08-14

# Contrastive Self-supervised Sequential Recommendation with Robust Augmentation ( http://arxiv.org/abs/2108.06479v1 )

License: CC BY 4.0

Zhiwei Liu, Yongjun Chen, Jia Li, Philip S. Yu, Julian McAuley, Caiming Xiong

Sequential recommendation describes a set of techniques to model dynamic user behavior in order to predict future interactions in sequential user data. At their core, such approaches model transition probabilities between items in a sequence, whether through Markov chains, recurrent networks, or, more recently, Transformers. However, both old and new issues remain, including data sparsity and noisy data. Such issues can impair performance, especially in complex, parameter-hungry models. In this paper, we investigate the application of contrastive Self-Supervised Learning (SSL) to sequential recommendation as a way to alleviate some of these issues. Contrastive SSL constructs augmentations from unlabelled instances, where agreements among positive pairs are maximized. It is challenging to devise a contrastive SSL framework for sequential recommendation due to its discrete nature, correlations among items, and skewness of length distributions. To this end, we propose a novel framework, Contrastive Self-supervised Learning for sequential Recommendation (CoSeRec). We introduce two informative augmentation operators leveraging item correlations to create high-quality views for contrastive learning. Experimental results on three real-world datasets demonstrate the effectiveness of the proposed method in improving model performance and robustness against sparse and noisy data. Our implementation is available online at https://github.com/YChen1993/CoSeRec.
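
The sketch below makes correlation-based augmentation concrete. It builds a positive pair from two independently augmented views of one interaction sequence. It assumes the two informative operators are item substitution and item insertion, driven by a precomputed `correlated` lookup that maps an item id to a highly correlated item id. The lookup, the rates and the function names are hypothetical. The paper's exact operators and correlation measure may differ.

```python
import random


def substitute(seq, correlated, rate, rng):
    """Replace a random fraction of items with a correlated item (if one is known)."""
    out = list(seq)
    n = max(1, int(rate * len(out)))  # assumes a non-empty sequence
    for idx in rng.sample(range(len(out)), n):
        out[idx] = correlated.get(out[idx], out[idx])
    return out


def insert(seq, correlated, rate, rng):
    """Insert a correlated item in front of a random fraction of positions."""
    n = max(1, int(rate * len(seq)))
    chosen = set(rng.sample(range(len(seq)), n))
    out = []
    for idx, item in enumerate(seq):
        if idx in chosen:
            out.append(correlated.get(item, item))
        out.append(item)
    return out


def positive_pair(seq, correlated, rate=0.3, seed=None):
    """Two independently augmented views of one sequence form a positive pair."""
    rng = random.Random(seed)
    ops = (substitute, insert)
    return (rng.choice(ops)(seq, correlated, rate, rng),
            rng.choice(ops)(seq, correlated, rate, rng))


history = [101, 205, 330, 412, 518]
correlated = {101: 102, 205: 206, 330: 331, 412: 413, 518: 519}  # hypothetical lookup
view_a, view_b = positive_pair(history, correlated, seed=7)
```

A contrastive loss (for example InfoNCE) would then pull the encoder's representations of `view_a` and `view_b` together. It would push them away from views of other users' sequences in the same batch.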

Translated: 2021-08-18 09:52:01 Published: 2021-08-14

# Learning to Automatically Diagnose Multiple Diseases in Pediatric Chest Radiographs Using Deep Convolutional Neural Networks ( http://arxiv.org/abs/2108.06486v1 )

License: CC BY 4.0

Thanh T. Tran, Hieu H. Pham, Thang V. Nguyen, Tung T. Le, Hieu T. Nguyen, Ha Q. Nguyen

Chest radiograph (CXR) interpretation in pediatric patients is error-prone and requires a high level of radiologic expertise. Recently, deep convolutional neural networks (D-CNNs) have shown remarkable performance in interpreting CXRs in adults. However, there is little evidence that D-CNNs can accurately recognize multiple lung pathologies from pediatric CXR scans. In particular, the development of diagnostic models for detecting pediatric chest diseases faces significant challenges such as (i) a lack of physician-annotated datasets and (ii) class imbalance. In this paper, we retrospectively collect a large dataset of 5,017 pediatric CXR scans. Each scan is manually labeled by an experienced radiologist for the presence of 10 common pathologies. A D-CNN model is then trained on 3,550 annotated scans to classify multiple pediatric lung pathologies automatically. To address the severe class imbalance, we propose to modify and apply the "Distribution-Balanced loss" for training D-CNNs. It reshapes the standard binary cross-entropy (BCE) loss to learn harder samples efficiently by down-weighting the loss assigned to the majority classes. On an independent test set of 777 studies, the proposed approach yields an area under the receiver operating characteristic curve (AUC) of 0.709 (95% CI, 0.690-0.729). The sensitivity, specificity, and F1-score at the cutoff value are 0.722 (0.694-0.750), 0.579 (0.563-0.595), and 0.389 (0.373-0.405), respectively. These results significantly outperform previous state-of-the-art methods on most of the target diseases. Moreover, our ablation studies validate the effectiveness of the proposed loss function compared with other standard losses, e.g., BCE and Focal Loss, for this learning task. Overall, we demonstrate the potential of D-CNNs in interpreting pediatric CXRs.
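
The cutoff-based metrics above follow the usual confusion-matrix definitions:

* sensitivity = TP/(TP+FN)
* specificity = TN/(TN+FP)
* F1 = the harmonic mean of precision and sensitivity

Below is a minimal sketch for a single disease label. The abstract does not say how the cutoff is chosen or how results are averaged over the 10 pathologies, so both are left to the caller. The function name is our own.

```python
def threshold_metrics(scores, labels, cutoff):
    """Sensitivity, specificity and F1 for binary predictions at a score cutoff."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = score >= cutoff
        if predicted and label:
            tp += 1
        elif predicted:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return sensitivity, specificity, f1


# One disease: model probabilities and radiologist labels (1 = present).
sens, spec, f1 = threshold_metrics([0.9, 0.8, 0.4, 0.3, 0.7, 0.1],
                                   [1, 1, 1, 0, 0, 0], cutoff=0.5)
```

Unlike AUC, these three numbers depend on the chosen cutoff. Moving the cutoff trades sensitivity against specificity.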

Translated: 2021-08-18 09:33:29 Published: 2021-08-14

# Investigating Bias In Automatic Toxic Comment Detection: An Empirical Study ( http://arxiv.org/abs/2108.06487v1 )

License: CC BY 4.0

Ayush Kumar, Pratik Kumar

With the surge in online platforms, user engagement on these platforms via comments and reactions has risen sharply. A large portion of such textual comments are abusive, rude, and offensive to the audience. When machine learning systems are in place to screen the comments posted on a platform, biases present in the training data get passed on to the classifier. This leads to discrimination against a set of classes, religions, and genders. In this work, we evaluate different classifiers and features to estimate the bias in these classifiers, along with their performance on the downstream task of toxicity classification. Results show that improvement in the performance of automatic toxic comment detection models is positively correlated with mitigating biases in these models. In our work, an LSTM with an attention mechanism proved to be a better modelling strategy than a CNN model. Further analysis shows that fastText embeddings are marginally preferable to GloVe embeddings for training toxic comment detection models. Deeper analysis reveals that such automatic models are particularly biased with respect to specific identity groups, even when the model has a high AUC score. Finally, to mitigate bias in toxicity detection models, we trained a multi-task setup with toxicity sub-types as an auxiliary task. This setup proved useful, leading to a gain of up to 0.26% (6% relative) in AUC scores.
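
One common way to expose the identity-group bias described above is to compare two numbers. The first is the overall ROC AUC. The second is a subgroup AUC computed only on comments that mention a given identity. This may not match the paper's exact protocol. The sketch below computes AUC as the probability that a random toxic comment is scored above a random non-toxic one. `in_group` is a hypothetical per-comment flag, and the function names are our own.

```python
def auc(scores, labels):
    """ROC AUC as P(random positive outscores random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def subgroup_auc(scores, labels, in_group):
    """AUC restricted to comments that mention a given identity group."""
    picked = [(s, y) for s, y, g in zip(scores, labels, in_group) if g]
    return auc([s for s, _ in picked], [y for _, y in picked])


scores = [0.9, 0.2, 0.8, 0.7, 0.1, 0.6]
labels = [1, 0, 1, 0, 0, 1]          # 1 = toxic
mentions = [False, False, False, True, False, True]
overall, subgroup = auc(scores, labels), subgroup_auc(scores, labels, mentions)
```

In this toy data the overall AUC is high (8/9), but within the identity subgroup the non-toxic comment is scored above the toxic one. A high aggregate AUC can therefore hide group-specific errors. The pairwise loop is O(P·N), which is fine for illustration. Large evaluations would use a rank-based formula instead.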

Translated: 2021-08-18 09:20:05 Published: 2021-08-14

# DICOM Imaging Router: An Open Deep Learning Framework for Classification of Body Parts from DICOM X-ray Scans ( http://arxiv.org/abs/2108.06490v1 )

License: CC BY 4.0

Hieu H. Pham, Dung V. Do, Ha Q. Nguyen

X-ray imaging in DICOM format is the most commonly used imaging modality in clinical practice, resulting in vast, non-normalized databases. This creates an obstacle to deploying AI solutions for analyzing medical images, which often requires identifying the right body part before feeding the image into a specific AI model. This challenge raises the need for an automated and efficient approach to classifying body parts from X-ray scans. Unfortunately, to the best of our knowledge, there is no open tool or framework for this task to date. To fill this gap, we introduce a DICOM Imaging Router that deploys deep CNNs to categorize unknown DICOM X-ray images into five anatomical groups: abdominal, adult chest, pediatric chest, spine, and others. To this end, a large-scale X-ray dataset consisting of 16,093 images was collected and manually classified. We then trained a set of state-of-the-art deep CNNs using a training set of 11,263 images. These networks were then evaluated on an independent test set of 2,419 images and showed superior performance in classifying the body parts. Specifically, our best-performing model achieved a recall of 0.982 (95% CI, 0.977-0.988), a precision of 0.985 (95% CI, 0.975-0.989), and an F1-score of 0.981 (95% CI, 0.976-0.987), while requiring less computation for inference (0.0295 seconds per image). External validation on 1,000 X-ray images shows the robustness of the proposed approach across hospitals. These results indicate that deep CNNs can accurately and effectively differentiate human body parts in X-ray scans, which could benefit a wide range of applications in clinical settings. The dataset, code, and trained deep learning models from this study will be made publicly available on our project website at https://vindr.ai/.

Translated: 2021-08-18 09:10:20 Published: 2021-08-14

# Focusing on Persons: Colorizing Old Images Learning from Modern Historical Movies ( http://arxiv.org/abs/2108.06515v1 )

License: CC BY 4.0

Xin Jin, Zhonglan Li, Ke Liu, Dongqing Zou, Xiaodong Li, Xingfan Zhu, Ziyin Zhou, Qilong Sun, Qingyu Liu

In industry, there are plenty of scenarios, such as video sites and archives, where old gray photos need to be colorized automatically. In this paper, we present HistoryNet. It focuses on diverse, high-fidelity clothing colorization of historical persons based on fine-grained semantic understanding and priors. Colorization of historical persons is realistic and practical; however, existing methods do not perform well in this regard. HistoryNet consists of three parts: classification, fine-grained semantic parsing, and colorization. The classification sub-module classifies images according to era, nationality, and garment type. The parsing sub-network provides semantics for person contours, clothing, and background in the image. This achieves more accurate colorization of clothes and persons and prevents color overflow. In the training process, we integrate classification and semantic parsing features into the colorization generation network to improve colorization. The classification and parsing sub-networks improve the accuracy of image colorization and make the boundaries of each part of the image clearer. Moreover, we propose a novel Modern Historical Movies Dataset (MHMD) for automatic colorization. It contains 1,353,166 images and 42 labels of eras, nationalities, and garment types from 147 historical movies or TV series made in modern times. Various quantitative and qualitative comparisons demonstrate that our method outperforms state-of-the-art colorization methods. This holds especially on military uniforms, whose correct colors can be checked against historical literature.

Translated: 2021-08-18 09:04:32 Published: 2021-08-14

# Voxel-wise Cross-Volume Representation Learning for 3D Neuron Reconstruction ( http://arxiv.org/abs/2108.06522v1 )

License: CC BY 4.0

Heng Wang, Chaoyi Zhang, Jianhui Yu, Yang Song, Siqi Liu, Wojciech Chrzanowski, Weidong Cai

Automatic 3D neuron reconstruction is critical for analysing the morphology and functionality of neurons in brain circuit activities. However, the performance of existing tracing algorithms is limited by low image quality. Recently, a series of deep-learning-based segmentation methods have been proposed to improve the quality of raw 3D optical image stacks by removing noise and restoring neuronal structures from low-contrast backgrounds. Neuron morphology is highly varied and large neuron datasets are lacking. As a result, most current neuron segmentation models rely on adding complex, specially designed submodules to a base architecture to encode better feature representations. Though successful, these submodules put an extra computational burden on inference. Therefore, rather than modifying the base network, we shift our focus to the dataset itself. The encoder-decoder backbone used in most neuron segmentation models attends only to intra-volume voxel points to learn structural features of neurons. It neglects the shared intrinsic semantic features of voxels belonging to the same category across different volumes, which are also important for expressive representation learning. Hence, to better utilise the scarce dataset, we propose to explicitly exploit such intrinsic features of voxels through a novel voxel-level cross-volume representation learning paradigm built on an encoder-decoder segmentation model. Our method introduces no extra cost during inference. We evaluate the method on 42 3D neuron images from the BigNeuron project. It improves the learning ability of the original segmentation model and further enhances reconstruction performance.

Translated: 2021-08-18 08:51:52 Published: 2021-08-14

# Feature Identification and Matching for Hand Hygiene Pose ( http://arxiv.org/abs/2108.06537v1 )

License: CC BY 4.0

Rashmi Bakshi

Three popular computer vision feature descriptors, SIFT, SURF, and ORB, were compared and evaluated. We measured the number of correct features extracted and matched between an original image of the hand hygiene pose "rub hands palm to palm" and a rotated copy of that image. An accuracy score was calculated from the total number of matches and the number of correct matches produced. The experiment demonstrated that the ORB algorithm performs best, giving a high number of correct matches in less time. As future work, the ORB feature detection technique will be applied to handwashing video recordings for feature extraction and hand hygiene pose classification. OpenCV was used to apply the algorithms within Python scripts.
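
The matching experiment can be sketched with OpenCV in Python. The following choices are assumptions not stated in the abstract, and the function names are hypothetical:

* The rotated image is a 90-degree clockwise rotation.
* A match counts as correct if its keypoint, mapped through that rotation, lands within 3 pixels of its matched partner.

ORB produces binary descriptors, so the brute-force matcher uses Hamming distance. SIFT (`cv2.SIFT_create` in OpenCV 4.4 and later) would use `cv2.NORM_L2` instead.

```python
def rotate_point_90cw(x, y, height):
    """Where pixel (x, y) lands after cv2.rotate(img, cv2.ROTATE_90_CLOCKWISE)."""
    return height - 1 - y, x


def match_accuracy(pairs, height, tol=3.0):
    """Fraction of matches whose rotated keypoint lands within `tol` px of its partner.

    pairs: list of ((x1, y1), (x2, y2)) coordinates (original image, rotated image).
    """
    if not pairs:
        return 0.0
    correct = 0
    for (x1, y1), (x2, y2) in pairs:
        ex, ey = rotate_point_90cw(x1, y1, height)
        if (ex - x2) ** 2 + (ey - y2) ** 2 <= tol ** 2:
            correct += 1
    return correct / len(pairs)


def orb_match_accuracy(path, n_features=500):
    """Detect and match ORB features between an image and its 90-degree rotation."""
    import cv2  # imported here so the helpers above work without OpenCV installed

    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if img is None:
        raise FileNotFoundError(path)
    rotated = cv2.rotate(img, cv2.ROTATE_90_CLOCKWISE)
    orb = cv2.ORB_create(nfeatures=n_features)
    kp1, des1 = orb.detectAndCompute(img, None)
    kp2, des2 = orb.detectAndCompute(rotated, None)
    if des1 is None or des2 is None:
        return 0, 0.0
    # Binary descriptors -> Hamming distance; crossCheck keeps mutual best matches.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    pairs = [(kp1[m.queryIdx].pt, kp2[m.trainIdx].pt) for m in matches]
    return len(matches), match_accuracy(pairs, img.shape[0])
```

Calling `orb_match_accuracy("palm_to_palm.jpg")` (a hypothetical file name) returns the total number of matches and the accuracy score. Timing each descriptor, for example with `time.perf_counter()`, would reproduce the speed comparison.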

Translated: 2021-08-18 08:41:41 Published: 2021-08-14

# PICCOLO: Point Cloud-Centric Omnidirectional Localization ( http://arxiv.org/abs/2108.06545v1 )

License: CC BY 4.0

Junho Kim, Changwoon Choi, Hojun Jang, and Young Min Kim

We present PICCOLO, a simple and efficient algorithm for omnidirectional localization. Given a colored point cloud and a 360° panorama image of a scene, our objective is to recover the camera pose at which the panorama image was taken. Our pipeline works in an off-the-shelf manner with a single image given as a query. It does not require any training of neural networks or collecting ground-truth poses of images. Instead, we match each point cloud color to the holistic view of the panorama image with gradient-descent optimization to find the camera pose. Our loss function, called sampling loss, is point cloud-centric: it is evaluated at the projected location of every point in the point cloud. In contrast, conventional photometric loss is image-centric, comparing colors at each pixel location. With a simple change in the compared entities, sampling loss effectively overcomes the severe visual distortion of omnidirectional images. It also exploits the global context of the 360° view to handle challenging scenarios for visual localization. PICCOLO outperforms existing omnidirectional localization algorithms in both accuracy and stability when evaluated in various environments.
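
A simplified sketch makes the contrast between the point cloud-centric sampling loss and image-centric photometric loss concrete. For a candidate pose, we project each colored point into the panorama and compare its color with the pixel it lands on. This is not the authors' code, and it omits the pose optimization itself. Several parts are our own assumptions:

* an equirectangular panorama
* nearest-neighbour sampling (gradient descent as in the paper would need differentiable, e.g. bilinear, sampling)
* the axis conventions
* all function names

```python
import math


def project_equirect(point, rotation, position, width, height):
    """Project a 3D world point to the nearest pixel (row, col) of an equirectangular panorama.

    rotation: 3x3 world-to-camera matrix (list of rows); position: camera center.
    Assumed convention: camera z forward, x right, y down.
    """
    rel = [point[i] - position[i] for i in range(3)]
    x, y, z = [sum(rotation[r][c] * rel[c] for c in range(3)) for r in range(3)]
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0.0:
        return None  # point coincides with the camera center
    lon = math.atan2(x, z)                               # in (-pi, pi]
    lat = math.asin(max(-1.0, min(1.0, y / norm)))       # in [-pi/2, pi/2]
    col = min(int((lon + math.pi) / (2 * math.pi) * width), width - 1)
    row = min(int((lat + math.pi / 2) / math.pi * height), height - 1)
    return row, col


def sampling_loss(points, colors, panorama, rotation, position):
    """Mean squared color error, evaluated once per projected point (not per pixel)."""
    height, width = len(panorama), len(panorama[0])
    total, count = 0.0, 0
    for p, c in zip(points, colors):
        pix = project_equirect(p, rotation, position, width, height)
        if pix is None:
            continue
        sampled = panorama[pix[0]][pix[1]]
        total += sum((a - b) ** 2 for a, b in zip(c, sampled))
        count += 1
    return total / count if count else 0.0
```

Because the loss loops over points rather than pixels, the panorama's distortion near the poles does not change how much each scene point contributes. This reflects the point cloud-centric property described above.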

Translated: 2021-08-18 08:38:20 Published: 2021-08-14

# Stacked Hourglass Network with a Multi-level Attention Mechanism: Where to Look for Intervertebral Disc Labeling ( http://arxiv.org/abs/2108.06554v1 )

License: CC BY 4.0

Reza Azad, Lucas Rouhier, Julien Cohen-Adad

Labeling vertebral discs from MRI scans is important for the proper diagnosis of spine-related diseases, including multiple sclerosis, amyotrophic lateral sclerosis, degenerative cervical myelopathy, and cancer. Automatic labeling of vertebral discs in MRI data is difficult for three reasons. Discs look similar to bone areas. The geometry of the spine and surrounding tissues varies across individuals. Scans also vary (manufacturers, pulse sequences, image contrast, resolution, and artefacts). In previous studies, vertebral disc labeling is often done after a disc detection step. It mostly fails when the localization algorithm misses discs or produces false positive detections. In this work, we aim to mitigate this problem by reformulating semantic vertebral disc labeling as a pose estimation problem. To do so, we propose a stacked hourglass network with a multi-level attention mechanism to jointly learn intervertebral disc positions and their skeleton structure. The proposed deep learning model combines the strengths of semantic segmentation and pose estimation techniques to handle missing areas and false positive detections. To further improve the performance of the proposed method, we propose a skeleton-based search space to reduce false positive detections. The proposed method was evaluated on the Spine Generic public multi-center dataset. It performed better than previous work on both T1w and T2w contrasts. The method is implemented in ivadomed (https://ivadomed.org).
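
The reformulation above builds on heatmap-based pose estimation. There, each disc is typically assigned its own output heatmap, and its position is read off at the heatmap maximum. A weak maximum can be reported as a missing disc rather than forced into a false detection. The sketch below shows this generic step only. The 0.5 threshold and the names are assumptions, and the paper's skeleton-based search space is not modeled.

```python
def heatmap_peaks(heatmaps, min_score=0.5):
    """Return the (row, col) of each heatmap's maximum, or None if the peak is too weak.

    heatmaps: one 2D list of scores per disc (hypothetical network output).
    """
    peaks = []
    for hm in heatmaps:
        best, where = float("-inf"), None
        for r, row in enumerate(hm):
            for c, value in enumerate(row):
                if value > best:
                    best, where = value, (r, c)
        # Below the threshold the disc is reported as missing instead of guessed.
        peaks.append(where if best >= min_score else None)
    return peaks


disc_maps = [
    [[0.0, 0.1], [0.9, 0.2]],  # confident peak at row 1, col 0
    [[0.1, 0.2], [0.3, 0.1]],  # no confident peak: treated as missing
]
locations = heatmap_peaks(disc_maps)
```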

Translated: 2021-08-18 08:22:35 Published: 2021-08-14

# The Neural Network shifted-Proper Orthogonal Decomposition: a Machine Learning Approach for Non-linear Reduction of Hyperbolic Equations ( http://arxiv.org/abs/2108.06558v1 )

License: CC BY 4.0

Davide Papapicco, Nicola Demo, Michele Girfoglio, Giovanni Stabile, Gianluigi Rozza

Models with dominant advection have always posed a difficult challenge for projection-based reduced order modelling. Many recently proposed methodologies pre-process the full-order solutions to accelerate the Kolmogorov N-width decay. This yields smaller linear subspaces with improved accuracy. However, these methods must rely on knowledge of the characteristic speeds in the phase space of the solution. This limits their applicability to problems with an explicit functional form for the advection field. In this work, we approach the problem of automatically detecting the correct pre-processing transformation in a statistical learning framework by implementing a deep-learning architecture. This purely data-driven method allows us to generalise existing approaches of linear subspace manipulation to non-linear hyperbolic problems with unknown advection fields. The proposed algorithm has been validated against simple test cases to benchmark its performance and later successfully applied to a multiphase simulation.
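
A toy experiment shows why shifting helps. Take pure-transport snapshots u(x, t) = u0(x − ct). In the fixed frame they need many modes. After each snapshot is shifted back by ct, they all coincide and one mode suffices. The sketch uses a known integer grid shift, which is exactly the knowledge of the advection speed that the paper replaces with a learned, data-driven transformation. The function names are hypothetical. Gram-Schmidt rank stands in for counting POD (SVD) modes, so that no external libraries are needed.

```python
import math


def traveling_wave(n, shift, width=2.0):
    """Gaussian pulse on a periodic grid of n cells, translated by `shift` cells."""
    center = (n // 4 + shift) % n
    values = []
    for i in range(n):
        d = abs(i - center)
        d = min(d, n - d)  # periodic distance
        values.append(math.exp(-(d / width) ** 2))
    return values


def shift_back(snapshot, shift):
    """Undo a translation by `shift` cells (pre-processing with a known speed)."""
    n = len(snapshot)
    return [snapshot[(i + shift) % n] for i in range(n)]


def numerical_rank(columns, tol=1e-10):
    """Rank of a list of vectors via modified Gram-Schmidt with a relative tolerance."""
    scale = max(math.sqrt(sum(v * v for v in col)) for col in columns)
    basis = []
    for col in columns:
        r = list(col)
        for q in basis:
            dot = sum(a * b for a, b in zip(r, q))
            r = [a - dot * b for a, b in zip(r, q)]
        norm = math.sqrt(sum(a * a for a in r))
        if norm > tol * scale:
            basis.append([a / norm for a in r])
    return len(basis)


shifts = range(0, 32, 4)
snapshots = [traveling_wave(64, s) for s in shifts]
aligned = [shift_back(w, s) for w, s in zip(snapshots, shifts)]
print(numerical_rank(snapshots), numerical_rank(aligned))  # prints: 8 1
```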

Translated: 2021-08-18 08:13:31 Published: 2021-08-14

# Refractive Geometry for Underwater Domes ( http://arxiv.org/abs/2108.06575v1 )

License: CC BY 4.0

Mengkun She, David Nakath, Yifan Song, Kevin Köser

Underwater cameras are typically placed behind glass windows to protect them from the water. Spherical glass, a dome port, is well suited for high water pressures at great depth and allows for a large field of view. It also avoids refraction if a pinhole camera is positioned exactly at the sphere's center. Adjusting a real lens perfectly to the dome center is challenging. It is hard to guide the centering process (e.g., by visual servoing), to measure the alignment quality, and to perform the alignment mechanically. Consequently, such systems are prone to being decentered by some offset. This leads to challenging refraction patterns at the sphere that invalidate the pinhole camera model. We show that the overall camera system becomes an axial camera, even for thick domes as used for deep-sea exploration. We also provide a non-iterative way to compute the center of refraction without requiring knowledge of the exact air, glass, or water properties. We further analyze the refractive geometry at the sphere, including forward vs. backward decentering and iso-refraction curves. From this, we obtain a 6th-degree polynomial equation for the forward projection of 3D points in thin domes. We then propose a pure underwater calibration procedure to estimate the decentering from multiple images. This estimate can either guide the mechanical positioning of the lens during adjustment or be taken into account in photogrammetric underwater applications.
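
Snell's law explains why a pinhole exactly at the dome center sees no refraction. Every ray from the center hits the sphere along its normal, so it passes straight through. The sketch below traces one camera ray through an idealized thin dome, modeled as a single air-to-water spherical interface. This model is our simplification: it ignores glass thickness, which the paper does not do for thick domes. The refractive indices 1.0 and 1.33 are illustrative, and the function names are hypothetical.

```python
import math


def normalize(v):
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v]


def refract(direction, normal, eta):
    """Vector form of Snell's law.

    direction, normal: unit vectors with dot(direction, normal) < 0 (normal faces the ray).
    eta = n_in / n_out. Returns the refracted unit direction, or None on total internal reflection.
    """
    cos_i = -sum(d * n for d, n in zip(direction, normal))
    k = 1.0 - eta * eta * (1.0 - cos_i * cos_i)
    if k < 0.0:
        return None
    return [eta * d + (eta * cos_i - math.sqrt(k)) * n
            for d, n in zip(direction, normal)]


def ray_through_dome(origin, direction, radius, n_air=1.0, n_water=1.33):
    """Trace a ray from inside a spherical dome centered at the origin into the water."""
    d = normalize(direction)
    # Solve |origin + s*d|^2 = radius^2 for the positive root (origin is inside the sphere).
    b = sum(o * di for o, di in zip(origin, d))
    c = sum(o * o for o in origin) - radius * radius
    s = -b + math.sqrt(b * b - c)
    hit = [o + s * di for o, di in zip(origin, d)]
    inward_normal = [-h / radius for h in hit]  # faces against the outgoing ray
    return hit, refract(d, inward_normal, n_air / n_water)


# Centered camera: the ray continues unbent. Camera decentered by 1 cm: it bends.
_, centered = ray_through_dome([0.0, 0.0, 0.0], [0.3, 0.2, 1.0], radius=0.1)
_, decentered = ray_through_dome([0.01, 0.0, 0.0], [0.0, 0.0, 1.0], radius=0.1)
```

With the camera off center, outgoing rays no longer pass through a single point. This is the breakdown of the pinhole model that the abstract describes.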

Translated: 2021-08-18 07:46:03 Published: 2021-08-14

# A Microscopic Pandemic Simulator for Pandemic Prediction Using Scalable Million-Agent Reinforcement Learning ( http://arxiv.org/abs/2108.06589v1 )

License: CC BY-SA 4.0

Zhenggang Tang, Kai Yan, Liting Sun, Wei Zhan, Changliu Liu

Microscopic epidemic models are powerful tools for government policy makers to predict and simulate epidemic outbreaks, as they can capture the impact of individual behaviors on macroscopic phenomena. However, existing models only consider simple rule-based individual behaviors, limiting their applicability. This paper proposes a deep-reinforcement-learning-powered microscopic model named Microscopic Pandemic Simulator (MPS). The MPS replaces rule-based agents with rational agents whose behaviors are driven to maximize rewards. As a result, it provides a better approximation of real-world dynamics. To efficiently simulate massive numbers of agents in the MPS, we propose Scalable Million-Agent DQN (SMADQN). The MPS allows us to efficiently evaluate the impact of different government strategies. This paper first calibrates the MPS against real-world data from Allegheny, US. It then evaluates two example government strategies: information disclosure and quarantine. The results validate the effectiveness of the proposed method. More broadly, this paper provides novel insights into the application of deep reinforcement learning (DRL) in large-scale agent-based networks such as economic and social networks.

翻訳日:2021-08-18 07:44:22 公開日:2021-08-14

# Few-Sample Named Entity Recognition for Security Vulnerability Reports by Fine-Tuning Pre-Trained Language Models

arXiv: http://arxiv.org/abs/2108.06590v1

License: CC BY 4.0

Guanqun Yang, Shay Dineen, Zhipeng Lin, Xueqing Liu

Public security vulnerability reports (e.g., CVE reports) play an important role in the maintenance of computer and network systems. Security companies and administrators rely on information from these reports to prioritize tasks on developing and deploying patches to their customers. Since these reports are unstructured texts, automatic information extraction (IE) can help scale up the processing by converting the unstructured reports to structured forms, e.g., software names and versions and vulnerability types. Existing work on automated IE for security vulnerability reports often relies on a large number of labeled training samples. However, creating a massive labeled training set is both expensive and time consuming. In this work, for the first time, we investigate this problem in the setting where only a small number of labeled training samples are available. In particular, we investigate the performance of fine-tuning several state-of-the-art pre-trained language models on our small training dataset. The results show that with pre-trained language models and carefully tuned hyperparameters, we reach or slightly outperform the state-of-the-art system on this task. Following the same two-step process as [7] (first fine-tuning on the main category, then transfer learning to the other categories), our approach substantially decreases the number of required labeled samples in both stages: a 90% reduction in fine-tuning (from 5758 to 576 samples) and an 88.8% reduction in transfer learning, using 64 labeled samples per category. Our experiments thus demonstrate the effectiveness of few-sample learning on NER for security vulnerability reports. This result opens up multiple research opportunities for few-sample learning on security vulnerability reports, which are discussed in the paper. Code: https://github.com/guanqun-yang/FewVulnerability.

Translated: 2021-08-18 07:21:01, Published: 2021-08-14

# Offline-Online Reinforcement Learning for Energy Pricing in Office Demand Response: Lowering Energy and Data Costs

arXiv: http://arxiv.org/abs/2108.06594v1

License: CC BY-SA 4.0

Doseok Jang, Lucas Spangher, Manan Khattar, Utkarsha Agwan, Selvaprabuh Nadarajah, Costas Spanos

Our team is proposing to run a full-scale energy demand response experiment in an office building. Although this is an exciting endeavor that will provide value to the community, collecting training data for the reinforcement learning agent is costly and will be limited. In this work, we examine how offline training can be leveraged to minimize data costs (accelerate convergence) and program implementation costs. We present two approaches for doing so: pretraining our model on simulated tasks to warm-start the experiment, and using a planning model trained to simulate the real world's rewards to the agent. We present results that demonstrate the utility of offline reinforcement learning for efficient price-setting in the energy demand response problem.

Translated: 2021-08-18 07:03:26, Published: 2021-08-14

# Prediction Analysis of Optical Tracker Parameters using Machine Learning Approaches for efficient Head Tracking

arXiv: http://arxiv.org/abs/2108.06606v1

License: CC BY 4.0

Aman Kataria, Smarajit Ghosh and Vinod Karar

A head tracker is a crucial part of head-mounted display systems, as it tracks the head of the pilot in the plane/cockpit simulator. The operational flaws of head trackers also depend on environmental conditions such as different lighting conditions and stray light interference. In this letter, an optical tracker has been employed to gather 6-DoF data of head movements under different environmental conditions. The effect of these environmental conditions, and of variation in the distance between the receiver and the optical transmitter, on the 6-DoF data was also analyzed.

Translated: 2021-08-18 06:48:30, Published: 2021-08-14

# Unsupervised Disentanglement without Autoencoding: Pitfalls and Future Directions

arXiv: http://arxiv.org/abs/2108.06613v1

License: CC BY 4.0

Andrea Burns, Aaron Sarna, Dilip Krishnan, Aaron Maschinot

Disentangled visual representations have largely been studied with generative models such as Variational AutoEncoders (VAEs). While prior work has focused on generative methods for disentangled representation learning, these approaches do not scale to large datasets due to current limitations of generative models. Instead, we explore regularization methods with contrastive learning, which could result in disentangled representations that are powerful enough for large-scale datasets and downstream applications. However, we find that unsupervised disentanglement is difficult to achieve due to optimization and initialization sensitivity, with trade-offs in task performance. We evaluate disentanglement with downstream tasks, analyze the benefits and disadvantages of each regularization used, and discuss future directions.

Translated: 2021-08-18 06:41:59, Published: 2021-08-14

# The SelectGen Challenge: Finding the Best Training Samples for Few-Shot Neural Text Generation

arXiv: http://arxiv.org/abs/2108.06614v1

License: CC BY 4.0

Ernie Chang, Xiaoyu Shen, Alex Marin, Vera Demberg

We propose a shared task on training instance selection for few-shot neural text generation. Large-scale pretrained language models have led to dramatic improvements in few-shot text generation. Nonetheless, almost all previous work simply applies random sampling to select the few-shot training instances, and little to no attention has been paid to selection strategies and how they affect model performance. Studying selection strategies can help us to (1) make the best use of our annotation budget in downstream tasks and (2) better benchmark few-shot text generation models. We welcome submissions that present their selection strategies and their effects on generation quality.

Translated: 2021-08-18 06:28:46, Published: 2021-08-14

# B-Splines

arXiv: http://arxiv.org/abs/2108.06617v1

License: CC BY 4.0

Arindam Chaudhuri

B-Splines are among the most promising curves in computer graphics. They have superior geometric properties that make them ideal candidates for several applications in the computer-aided design industry. In this article, some basic properties of B-Spline curves are presented. Two significant B-Spline properties, namely the convex hull property and the effect of repeated points, are discussed. The computation of B-Splines on computational devices is also illustrated. An industrial application based on image processing, in which B-Spline curves reconstruct 3D surfaces from CT image datasets of inner organs, further highlights the strength of these curves.

Translated: 2021-08-18 06:20:40, Published: 2021-08-14
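
The convex hull property rests on two facts about the B-spline basis. Every basis function is non-negative, and inside the curve's valid parameter range the basis functions sum to one. Each curve point is therefore a convex combination of the control points. The sketch below is a textbook Cox-de Boor evaluation, not code from the article. The uniform knot vector, the cubic degree, and the names `bspline_basis` and `bspline_point` are assumptions for illustration.

```python
def bspline_basis(i, p, t, knots):
    """Cox-de Boor recursion: value of the i-th basis function of degree p at t."""
    if p == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    value = 0.0
    left_span = knots[i + p] - knots[i]
    if left_span > 0:  # terms with a zero-length span (repeated knots) count as 0
        value += (t - knots[i]) / left_span * bspline_basis(i, p - 1, t, knots)
    right_span = knots[i + p + 1] - knots[i + 1]
    if right_span > 0:
        value += (knots[i + p + 1] - t) / right_span * bspline_basis(i + 1, p - 1, t, knots)
    return value


def bspline_point(t, control_points, p, knots):
    """Evaluate C(t) = sum_i N_{i,p}(t) * P_i for 2D control points."""
    x = y = 0.0
    for i, (px, py) in enumerate(control_points):
        w = bspline_basis(i, p, t, knots)
        x += w * px
        y += w * py
    return x, y


# Cubic curve with 4 control points: the knot vector needs 4 + 3 + 1 = 8 entries,
# and the basis sums to one on [knots[3], knots[4]) = [3, 4).
knots = [0, 1, 2, 3, 4, 5, 6, 7]
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(sum(bspline_basis(i, 3, 3.5, knots) for i in range(4)))  # approximately 1.0
print(bspline_point(3.5, square, 3, knots))  # a point inside the unit square
```

When all four control points are equal, the curve collapses onto that single point. This is the extreme case of repeating a control point.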

# A Sparse Coding Interpretation of Neural Networks and Theoretical Implications

arXiv: http://arxiv.org/abs/2108.06622v1

License: CC BY 4.0

Joshua Bowren

Neural networks, specifically deep convolutional neural networks, have achieved unprecedented performance in various computer vision tasks, but the rationale for the computations and structures of successful neural networks is not fully understood. Theories abound for the aptitude of convolutional neural networks for image classification, but less is understood about why such models would be capable of complex visual tasks such as inference and anomaly identification. Here, we propose a sparse coding interpretation of neural networks that have ReLU activations, and of convolutional neural networks in particular. In sparse coding, when the model's basis functions are assumed to be orthogonal, the optimal coefficients are given by the soft-threshold function of the basis functions projected onto the input image. In a non-negative variant of sparse coding, the soft-threshold function becomes a ReLU. Here, we derive these solutions via sparse coding with orthogonal-assumed basis functions, and then we derive the convolutional neural network forward transformation from a modified non-negative orthogonal sparse coding model with an exponential prior parameter for each sparse coding coefficient. Next, we derive a complete convolutional neural network without normalization and pooling by adding logistic regression to a hierarchical sparse coding model. Finally, we motivate potentially more robust forward transformations by maintaining sparse priors in convolutional neural networks as well as by performing a stronger nonlinear transformation.

Translated: 2021-08-18 06:15:56, Published: 2021-08-14
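
The soft-threshold and ReLU claims can be checked numerically. Assume an orthonormal dictionary `W` and the L1-penalised objective ½‖x − Wa‖² + λ‖a‖₁, with the per-coefficient weight `lam` standing in for the exponential-prior parameter. The optimal code is then the soft threshold of `Wᵀx`. Adding the constraint a ≥ 0 turns it into `ReLU(Wᵀx − λ)`, that is, a linear layer with bias −λ followed by a ReLU. This is a minimal NumPy sketch under those assumptions, not the author's derivation.

```python
import numpy as np


def soft_threshold(z, lam):
    """Coordinate-wise minimiser of 0.5 * (z - a)**2 + lam * |a|."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def sparse_code_orthonormal(x, W, lam, nonnegative=False):
    """Closed-form sparse code when the columns of W are orthonormal."""
    z = W.T @ x  # project the input onto the basis functions
    if nonnegative:
        return np.maximum(z - lam, 0.0)  # ReLU(W^T x - lam): linear layer with bias -lam
    return soft_threshold(z, lam)


def objective(a, x, W, lam):
    """Reconstruction error plus weighted L1 penalty (lam may be a vector)."""
    return 0.5 * np.sum((x - W @ a) ** 2) + np.sum(lam * np.abs(a))


rng = np.random.default_rng(0)
W, _ = np.linalg.qr(rng.normal(size=(8, 8)))  # random orthonormal basis
x = rng.normal(size=8)
lam = np.full(8, 0.3)  # one penalty weight per coefficient
a = sparse_code_orthonormal(x, W, lam, nonnegative=True)
print(np.allclose(a, np.maximum(W.T @ x + (-lam), 0.0)))  # True
```

Because W is orthonormal, the objective separates into independent one-dimensional problems. That is why a closed form exists, and why a per-coefficient λ acts exactly like a per-unit bias.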

# Equity-Directed Bootstrapping: Examples and Analysis

arXiv: http://arxiv.org/abs/2108.06624v1

License: CC BY 4.0

Harish S. Bhat and Majerle E. Reeves and Sidra Goldman-Mellor

When faced with severely imbalanced binary classification problems, we often train models on bootstrapped data in which the instances of each class occur in a more favorable ratio, e.g., 1:1. We view algorithmic inequity through the lens of imbalanced classification: to balance the performance of a classifier across groups, we can bootstrap to obtain training sets that are balanced with respect to both labels and group identity. For an example problem with severe class imbalance (prediction of suicide death from administrative patient records), we illustrate how an equity-directed bootstrap can bring test set sensitivities and specificities much closer to satisfying the equal odds criterion. In the context of naïve Bayes and logistic regression, we analyze the equity-directed bootstrap, demonstrating that it works by bringing odds ratios close to one, and link it to methods involving intercept adjustment, thresholding, and weighting.

Translated: 2021-08-18 06:14:50, Published: 2021-08-14
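
One plausible reading of "balanced with respect to both labels and group identity" is to resample so that every (label, group) cell contributes the same number of rows. The sketch below assumes that reading. The function name, the equal cell size `n_per_cell`, and the synthetic data are illustrative assumptions rather than the authors' exact procedure.

```python
import numpy as np


def equity_directed_bootstrap(y, g, n_per_cell, rng):
    """Bootstrap indices so that every (label, group) cell has n_per_cell rows.

    Sampling is with replacement inside each cell, so rare cells such as
    (positive label, minority group) are upsampled and common cells downsampled.
    """
    y, g = np.asarray(y), np.asarray(g)
    picks = []
    for label in np.unique(y):
        for group in np.unique(g):
            cell = np.flatnonzero((y == label) & (g == group))
            if cell.size == 0:
                raise ValueError(f"no examples with label={label}, group={group}")
            picks.append(rng.choice(cell, size=n_per_cell, replace=True))
    return np.concatenate(picks)


rng = np.random.default_rng(1)
y = np.array([0] * 95 + [1] * 5)  # severe class imbalance: 5% positives
g = np.array([0, 1] * 50)         # two groups, alternating
idx = equity_directed_bootstrap(y, g, n_per_cell=40, rng=rng)
y_boot, g_boot = y[idx], g[idx]   # balanced labels and groups for training
```

A classifier such as naïve Bayes or logistic regression would then be fit on the rows selected by `idx`.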

# Continuous-Time Sequential Recommendation with Temporal Graph Collaborative Transformer

arXiv: http://arxiv.org/abs/2108.06625v1

License: CC BY-SA 4.0

Ziwei Fan and Zhiwei Liu and Jiawei Zhang and Yun Xiong and Lei Zheng and Philip S. Yu

In order to model the evolution of user preference, we should learn user/item embeddings based on time-ordered item purchasing sequences, a task defined as the Sequential Recommendation (SR) problem. Existing methods leverage sequential patterns to model item transitions. However, most of them ignore crucial temporal collaborative signals, which are latent in evolving user-item interactions and coexist with sequential patterns. Therefore, we propose to unify sequential patterns and temporal collaborative signals to improve the quality of recommendation, which is rather challenging. Firstly, it is hard to simultaneously encode sequential patterns and collaborative signals. Secondly, it is non-trivial to express the temporal effects of collaborative signals. Hence, we design a new framework, Temporal Graph Sequential Recommender (TGSRec), on top of a continuous-time bipartite graph that we define. We propose a novel Temporal Collaborative Transformer (TCT) layer in TGSRec, which advances the self-attention mechanism by adopting a novel collaborative attention. The TCT layer can simultaneously capture collaborative signals from both users and items while also considering temporal dynamics inside sequential patterns. We propagate the information learned from the TCT layer over the temporal graph to unify sequential patterns and temporal collaborative signals. Empirical results on five datasets show that TGSRec significantly outperforms other baselines, with average absolute improvements of up to 22.5% and 22.1% in Recall@10 and MRR, respectively.

Translated: 2021-08-18 05:54:50, Published: 2021-08-14

# A Survey on GAN Acceleration Using Memory Compression Technique

arXiv: http://arxiv.org/abs/2108.06626v1

License: CC BY 4.0

Dina Tantawy, Mohamed Zahran, Amr Wassal

Since their invention, generative adversarial networks (GANs) have shown outstanding results in many applications. GANs are powerful yet resource-hungry deep-learning models. Their main difference from ordinary deep learning models is the nature of their output: for example, a GAN's output can be a whole image, whereas other models detect objects or classify images. Thus, the architecture and numeric precision of the network affect the quality and speed of the solution, which makes accelerating GANs pivotal. Approaches to accelerating GANs can be classified into three main tracks: (1) memory compression, (2) computation optimization, and (3) data-flow optimization. Because data transfer is the main source of energy usage, memory compression leads to the most savings. Thus, in this paper, we survey memory compression techniques for CNN-based GANs. Additionally, the paper summarizes opportunities and challenges in GAN acceleration and suggests open research problems for further investigation.

Translated: 2021-08-18 05:33:56, Published: 2021-08-14
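
Weight quantization is one widely used way to shrink the memory that must be stored and moved. The sketch below is a generic illustration of symmetric per-tensor int8 quantization, chosen here as an example. It is not taken from the survey, which may classify or define its techniques differently. The random matrix is a hypothetical stand-in for one layer's weights.

```python
import numpy as np


def quantize_int8(w):
    """Symmetric per-tensor quantisation of float weights to int8 plus one scale."""
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    """Recover approximate float32 weights for computation."""
    return q.astype(np.float32) * np.float32(scale)


rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)  # stand-in for one layer's weights
q, scale = quantize_int8(w)
print(w.nbytes, q.nbytes)  # 16384 4096: 4x less memory to store and transfer
```

Rounding to the nearest level bounds the per-weight error by half of `scale`. This is the precision-versus-memory trade-off behind the abstract's point that numeric precision affects output quality.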

# Investigating the Relationship Between Dropout Regularization and Model Complexity in Neural Networks

arXiv: http://arxiv.org/abs/2108.06628v1

License: CC BY 4.0

Christopher Sun, Jai Sharma, and Milind Maiti

Dropout regularization, which serves to reduce variance, is nearly ubiquitous in deep learning models. We explore the relationship between the dropout rate and model complexity by training 2,000 neural networks configured with random combinations of the dropout rate and the number of hidden units in each dense layer, on each of the three data sets we selected. The generated figures, with binary cross-entropy loss and binary accuracy on the z-axis, question the common assumption that adding depth to a dense layer while increasing the dropout rate will certainly enhance performance. We also discover a complex correlation between the two hyperparameters, which we proceed to quantify by building additional machine learning and deep learning models that predict the optimal dropout rate given the number of hidden units in each dense layer. Linear regression and polynomial logistic regression require arbitrary thresholds, respectively, to select the cost data points included in the regression and to assign the cost data points a binary classification. These machine learning models have mediocre performance because their naive nature prevents them from modeling complex decision boundaries. Turning to deep learning models, we build neural networks that predict the optimal dropout rate given the number of hidden units in each dense layer, the desired cost, and the desired accuracy of the model. However, this attempt encounters a mathematical error that can be attributed to the failure of the vertical line test. The ultimate deep learning model is a neural network whose decision boundary represents the 2,000 previously generated data points. This final model leads us to devise a promising method for tuning hyperparameters to minimize computational expense yet maximize performance. The strategy can be applied to any model hyperparameters, with the prospect of more efficient tuning in industrial models.

Translated: 2021-08-18 05:17:57, Published: 2021-08-14

# Joint Optimization in Edge-Cloud Continuum for Federated Unsupervised Person Re-identification

arXiv: http://arxiv.org/abs/2108.06493v1

License: see the arXiv page

Weiming Zhuang, Yonggang Wen, Shuai Zhang

Person re-identification (ReID) aims to re-identify a person from non-overlapping camera views. Since person ReID data contains sensitive personal information, researchers have adopted federated learning, an emerging distributed training method, to mitigate privacy leakage risks. However, existing studies rely on data labels that are laborious and time-consuming to obtain. We present FedUReID, a federated unsupervised person ReID system that learns person ReID models without any labels while preserving privacy. FedUReID enables in-situ model training on edges with unlabeled data. A cloud server aggregates models from edges instead of centralizing raw data, which preserves data privacy. Moreover, to tackle the problem that edges vary in data volumes and distributions, we personalize training in edges with joint optimization of cloud and edge. Specifically, we propose a personalized epoch to reassign computation throughout training, personalized clustering to iteratively predict suitable labels for unlabeled data, and a personalized update to adapt the server-aggregated model to each edge. Extensive experiments on eight person ReID datasets demonstrate that FedUReID not only achieves higher accuracy but also reduces computation cost by 29%. Our FedUReID system with joint optimization will shed light on applying federated learning to more multimedia tasks without data labels.

Translated: 2021-08-17 15:30:52, Published: 2021-08-14

# Collaborative Unsupervised Visual Representation Learning from Decentralized Data

arXiv: http://arxiv.org/abs/2108.06492v1

License: see the arXiv page

Weiming Zhuang, Xin Gan, Yonggang Wen, Shuai Zhang, Shuai Yi

Unsupervised representation learning has achieved outstanding performance using centralized data available on the Internet. However, the increasing awareness of privacy protection limits the sharing of decentralized unlabeled image data, which grows explosively across multiple parties (e.g., mobile phones and cameras). A natural question is therefore how to leverage these data to learn visual representations for downstream tasks while preserving data privacy. To address this problem, we propose a novel federated unsupervised learning framework, FedU. In this framework, each party independently trains models on its unlabeled data using contrastive learning with an online network and a target network. Then, a central server aggregates the trained models and updates the clients' models with the aggregated model. Data privacy is preserved because each party only has access to its own raw data. Decentralized data among multiple parties are normally non-independent and identically distributed (non-IID), which leads to performance degradation. To tackle this challenge, we propose two simple but effective methods: 1) we design the communication protocol to upload only the encoders of the online networks for server aggregation and to update them with the aggregated encoder; 2) we introduce a new module that dynamically decides how to update the predictors based on the divergence caused by non-IID data. The predictor is the other component of the online network. Extensive experiments and ablations demonstrate the effectiveness and significance of FedU. It outperforms training with only one party by over 5% and other methods by over 14% in linear and semi-supervised evaluation on non-IID data.

Translated: 2021-08-17 15:28:58, Published: 2021-08-14
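
Step 1) of the communication protocol can be sketched as follows. The sketch assumes that parameters are stored as dicts of NumPy arrays. It also assumes that the server weights each client by its number of samples; this FedAvg-style weighting is not stated in the abstract. Only the online encoder is averaged and sent back, while each client keeps its own predictor. The divergence-based predictor update from step 2) is not shown. The dict keys and toy values are hypothetical.

```python
import numpy as np


def aggregate_encoders(encoders, num_samples):
    """Sample-weighted average of each encoder parameter tensor."""
    total = float(sum(num_samples))
    return {
        name: sum((n / total) * enc[name] for enc, n in zip(encoders, num_samples))
        for name in encoders[0]
    }


def server_round(clients):
    """One aggregation round; each client is a dict with 'encoder', 'predictor', 'num_samples'."""
    global_encoder = aggregate_encoders(
        [c["encoder"] for c in clients], [c["num_samples"] for c in clients]
    )
    for c in clients:
        # Only the online encoder is overwritten; the predictor stays local here.
        c["encoder"] = {name: value.copy() for name, value in global_encoder.items()}
    return global_encoder


clients = [
    {"encoder": {"w": np.array([1.0, 1.0])}, "predictor": {"w": np.array([5.0])}, "num_samples": 100},
    {"encoder": {"w": np.array([4.0, 7.0])}, "predictor": {"w": np.array([9.0])}, "num_samples": 300},
]
global_encoder = server_round(clients)  # 0.25 * [1, 1] + 0.75 * [4, 7]
```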

# Fractional Transfer Learning for Deep Model-Based Reinforcement Learning

arXiv: http://arxiv.org/abs/2108.06526v1

License: see the arXiv page

Remo Sasso, Matthia Sabatelli, Marco A. Wiering

Reinforcement learning (RL) is well known for requiring large amounts of data for RL agents to learn to perform complex tasks. Recent progress in model-based RL allows agents to be much more data-efficient, as it enables them to learn behaviors of visual environments in imagination by leveraging an internal World Model of the environment. Improved sample efficiency can also be achieved by reusing knowledge from previously learned tasks, but transfer learning is still a challenging topic in RL. Parameter-based transfer learning is generally done using an all-or-nothing approach, where the network's parameters are either fully transferred or randomly initialized. In this work we present a simple alternative approach: fractional transfer learning. The idea is to transfer fractions of knowledge, as opposed to discarding potentially useful knowledge as is commonly done with random initialization. Using the World Model-based Dreamer algorithm, we identify which types of components this approach is applicable to, and perform experiments in a new multi-source transfer learning setting. The results show that fractional transfer learning often leads to substantially improved performance and faster learning compared to learning from scratch and random initialization.

Translated: 2021-08-17 15:26:14, Published: 2021-08-14
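
The abstract does not give the exact update rule. One simple way to "transfer fractions of knowledge" is to add a fraction ω of each source parameter to a fresh random initialisation: θ = θ_random + ω·θ_source. With ω = 0 this is plain random initialisation. Treat this rule, the layer names, and `omega` as assumptions in the sketch below. The `fractional_keys` argument reflects the abstract's point that the approach suits only some component types.

```python
import numpy as np


def fractional_transfer(source, fresh_init, omega, fractional_keys):
    """Combine source weights with a fresh random initialisation.

    Hypothetical rule: keys in `fractional_keys` become fresh_init + omega * source;
    every other key keeps its fresh random initialisation unchanged.
    """
    target = {}
    for name, init in fresh_init.items():
        if name in fractional_keys:
            target[name] = init + omega * source[name]
        else:
            target[name] = init.copy()
    return target


rng = np.random.default_rng(0)
source = {"layer_1": np.ones((2, 2)), "layer_2": np.full((2, 2), 3.0)}  # hypothetical layers
fresh = {name: rng.normal(scale=0.1, size=w.shape) for name, w in source.items()}
target = fractional_transfer(source, fresh, omega=0.2, fractional_keys={"layer_1"})
```

Setting `omega=0.0` recovers the "nothing" end of the all-or-nothing spectrum described above.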

# Neuron Campaign for Initialization Guided by Information Bottleneck Theory

arXiv: http://arxiv.org/abs/2108.06530v1

License: see the arXiv page

Haitao Mao, Xu Chen, Qiang Fu, Lun Du, Shi Han and Dongmei Zhang

Initialization plays a critical role in the training of deep neural networks (DNNs). Existing initialization strategies mainly focus on stabilizing the training process to mitigate vanishing/exploding gradients. However, these initialization methods do not consider how to enhance generalization ability. The Information Bottleneck (IB) theory is a well-known framework for explaining the generalization of DNNs. Guided by the insights of IB theory, we design two criteria for better initializing DNNs. We further design a neuron campaign initialization algorithm to efficiently select a good initialization for a neural network on a given dataset. Experiments on the MNIST dataset show that our method can lead to better generalization performance with faster convergence.

Translated: 2021-08-17 15:25:56, Published: 2021-08-14

# Weakly Supervised Continual Learning

arXiv: http://arxiv.org/abs/2108.06552v1

License: see the arXiv page

Matteo Boschini, Pietro Buzzega, Lorenzo Bonicelli, Angelo Porrello, Simone Calderara

Continual Learning (CL) investigates how to train Deep Networks on a stream of tasks without incurring catastrophic forgetting. CL settings proposed in the literature assume that every incoming example is paired with ground-truth annotations. However, this clashes with many real-world applications: gathering labeled data, which is in itself tedious and expensive, becomes infeasible when data flow as a stream and must be consumed in real time. This work explores Weakly Supervised Continual Learning (WSCL), in which only a small fraction of the input examples shown to the learner are labeled. We assess how current CL methods (e.g., EWC, LwF, iCaRL, ER, GDumb, DER) perform in this novel and challenging scenario, in which overfitting becomes entangled with forgetting. Subsequently, we design two novel WSCL methods that exploit metric learning and consistency regularization to leverage unsupervised data while learning. In doing so, we show not only that our proposals exhibit higher flexibility when supervised information is scarce, but also that fewer than 25% of the labels can be enough to reach or even outperform SOTA methods trained under full supervision.

Translated: 2021-08-17 15:24:14, Published: 2021-08-14

# Focus on the Positives: Self-Supervised Learning for Biodiversity Monitoring

arXiv: http://arxiv.org/abs/2108.06435v1

License: see the arXiv page

Omiros Pantazis, Gabriel Brostow, Kate Jones, Oisin Mac Aodha

We address the problem of learning self-supervised representations from unlabeled image collections. Unlike existing approaches that attempt to learn useful features by maximizing similarity between augmented versions of each input image or by speculatively picking negative samples, we instead also make use of the natural variation that occurs in image collections that are captured using static monitoring cameras. To achieve this, we exploit readily available context data that encodes information such as the spatial and temporal relationships between the input images. We are able to learn representations that are surprisingly effective for downstream supervised classification, by first identifying high probability positive pairs at training time, i.e. those images that are likely to depict the same visual concept. For the critical task of global biodiversity monitoring, this results in image features that can be adapted to challenging visual species classification tasks with limited human supervision. We present results on four different camera trap image collections, across three different families of self-supervised learning methods, and show that careful image selection at training time results in superior performance compared to existing baselines such as conventional self-supervised training and transfer learning.

Translation date: 2021-08-17 15:22:56; publication date: 2021-08-14

# Adaptive Selection of Informative Path Planning Strategies via Reinforcement Learning

Adaptive Selection of Informative Path Planning Strategies via Reinforcement Learning ( http://arxiv.org/abs/2108.06618v1 )

License: check the linked page

Taeyeong Choi, Grzegorz Cielniak

In our previous work, we designed a systematic policy that prioritizes sampling locations to achieve significant accuracy improvements in spatial interpolation, using the prediction uncertainty of Gaussian Process Regression (GPR) as an "attraction force" for deployed robots in path planning. Although integration with Traveling Salesman Problem (TSP) solvers was also shown to produce relatively short travel distances, we hypothesise here that several factors could reduce overall prediction precision, because sub-optimal locations may eventually be included in the resulting paths. To address this issue, we first explore "local planning" approaches that adopt various spatial ranges within which the next sampling locations are prioritized, and we investigate their effects on prediction performance and incurred travel distance. In addition, Reinforcement Learning (RL)-based high-level controllers are trained to adaptively produce blended plans from a particular set of local planners, inheriting the unique strengths of that selection depending on the latest prediction state. Our experiments on temperature-monitoring robots demonstrate that dynamic mixtures of planners can not only generate sophisticated, informative plans that no single planner could create alone but also significantly reduce travel distances without sacrificing prediction reliability and without any additional module for shortest-path calculation.
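The "local planner" idea can be sketched with a single function: among candidate locations within a fixed radius of the robot, pick the one with the highest GP posterior variance. This is a minimal sketch under several assumptions. The squared-exponential kernel, unit prior variance, `noise` and `length_scale` values and the fallback to a global search are illustrative choices, not the authors' settings. The RL controller that blends several planners is not shown.

```python
import numpy as np


def rbf(a, b, length_scale=1.0):
    # Squared-exponential kernel between row vectors of a (n, d) and b (m, d).
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale**2)


def posterior_variance(x_obs, x_cand, noise=1e-2, length_scale=1.0):
    """GP posterior variance at candidate points given observed locations.

    The variance depends only on where we sampled, not on the measured values.
    """
    if len(x_obs) == 0:
        return np.ones(len(x_cand))  # prior variance k(x, x) = 1 for this kernel
    k_oo = rbf(x_obs, x_obs, length_scale) + noise * np.eye(len(x_obs))
    k_oc = rbf(x_obs, x_cand, length_scale)
    solved = np.linalg.solve(k_oo, k_oc)  # (K + noise * I)^-1 k_*
    return 1.0 - (k_oc * solved).sum(0)


def next_local_target(position, x_obs, x_cand, radius):
    """Index of the most uncertain candidate within `radius` of the robot."""
    dist = np.linalg.norm(x_cand - position, axis=1)
    in_range = np.flatnonzero(dist <= radius)
    if in_range.size == 0:
        in_range = np.arange(len(x_cand))  # fall back to a global search
    var = posterior_variance(x_obs, x_cand[in_range])
    return in_range[np.argmax(var)]
```

A small `radius` yields a myopic planner with short hops, and a large `radius` approaches the global uncertainty-driven policy. An RL controller, as described above, would effectively choose among such radii at each step.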

Translation date: 2021-08-17 15:21:01; publication date: 2021-08-14

# High-dimensional Assisted Generative Model for Color Image Restoration

High-dimensional Assisted Generative Model for Color Image Restoration ( http://arxiv.org/abs/2108.06460v1 )

License: check the linked page

Kai Hong, Chunhua Wu, Cailian Yang, Minghui Zhang, Yancheng Lu, Yuhao Wang, and Qiegen Liu

This work presents an unsupervised deep learning scheme that exploits a high-dimensional assisted score-based generative model for color image restoration tasks. Because the number of samples and the internal dimension of a score-based generative model strongly influence how well it estimates the gradients of the data distribution, two high-dimensional transformations are proposed: the channel-copy transformation increases the number of samples, and the pixel-scale transformation decreases the feasible space dimension. A set of high-dimensional tensors produced by these transformations is then used to train the network through denoising score matching. Sampling is performed by annealed Langevin dynamics alternating with data-consistency updates. Furthermore, to alleviate the difficulty of learning high-dimensional representations, a progressive strategy is proposed to improve performance. The proposed unsupervised learning and iterative restoration algorithm, which uses a pre-trained generative network as a prior, has a more transparent and clear interpretation than other data-driven approaches. Experimental results on demosaicking and inpainting demonstrate the remarkable performance and diversity of the proposed method.
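The sampling stage, annealed Langevin dynamics alternated with a data-consistency step, can be sketched for inpainting. It relies on two assumptions. First, a toy i.i.d. Gaussian prior with an analytic score stands in for the trained score network. Second, the step-size schedule follows the common noise-conditional score network recipe. The channel-copy and pixel-scale transformations and the trained network are not reproduced here.

```python
import numpy as np


def gaussian_score(x, sigma, mu=0.5, prior_std=0.2):
    # Score of a toy i.i.d. Gaussian prior N(mu, prior_std^2) perturbed by noise
    # of std sigma; a trained score network s_theta(x, sigma) would replace this.
    return -(x - mu) / (prior_std**2 + sigma**2)


def annealed_langevin_inpaint(observed, mask, sigmas, steps_per_sigma=20, eps=2e-5, seed=0):
    """Annealed Langevin sampling alternated with a data-consistency step.

    observed: image with valid values where mask is True.
    mask: boolean array of the same shape, True for known pixels.
    sigmas: noise levels ordered from largest to smallest.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(size=observed.shape)
    for sigma in sigmas:
        alpha = eps * (sigma / sigmas[-1]) ** 2  # step size shrinks with the noise level
        for _ in range(steps_per_sigma):
            z = rng.standard_normal(observed.shape)
            x = x + 0.5 * alpha * gaussian_score(x, sigma) + np.sqrt(alpha) * z
            x[mask] = observed[mask]  # data consistency: keep the known pixels
    return x
```

For demosaicking, `mask` would mark the pixels the color filter array actually measured in each channel.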

Translation date: 2021-08-17 15:19:54; publication date: 2021-08-14

# Findings of the LoResMT 2021 Shared Task on COVID and Sign Language for Low-resource Languages

Findings of the LoResMT 2021 Shared Task on COVID and Sign Language for Low-resource Languages ( http://arxiv.org/abs/2108.06598v1 )

License: check the linked page

Atul Kr. Ojha, Chao-Hong Liu, Katharina Kann, John Ortega, Sheetal Shatam, Theodorus Fransen

We present the findings of the LoResMT 2021 shared task, which focuses on machine translation (MT) of COVID-19 data for both low-resource spoken and sign languages. The task was organized as part of the fourth workshop on technologies for machine translation of low-resource languages (LoResMT). Parallel corpora are presented and made publicly available for the following directions: English$\leftrightarrow$Irish, English$\leftrightarrow$Marathi, and Taiwanese Sign Language$\leftrightarrow$Traditional Chinese. The training data consist of 8112, 20933 and 128608 segments, respectively. There are also additional monolingual data sets for Marathi and English, consisting of 21901 segments. The results presented here are based on entries from a total of eight teams. Three teams submitted systems for English$\leftrightarrow$Irish, and five teams submitted systems for English$\leftrightarrow$Marathi. Unfortunately, there were no system submissions for the Taiwanese Sign Language$\leftrightarrow$Traditional Chinese task. Maximum system performance was computed using BLEU and is as follows: 36.0 for English--Irish, 34.6 for Irish--English, 24.2 for English--Marathi, and 31.3 for Marathi--English.
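The reported scores use BLEU. A minimal sketch of corpus-level BLEU helps show what the numbers measure. It computes clipped n-gram precisions up to 4-grams, combines them with a geometric mean and applies a brevity penalty. Three assumptions apply: one reference per segment, pre-tokenized input and no smoothing. Official evaluations normally use a reference implementation whose tokenization and settings change the numbers, so this sketch will not reproduce the reported scores. The scores above are on a 0 to 100 scale, that is, this function's output multiplied by 100.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU with one reference per segment (token lists in, 0..1 out).

    No smoothing: returns 0.0 if any n-gram precision is zero.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clipped counts: each hypothesis n-gram is credited at most as
            # often as it appears in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)
```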

Translation date: 2021-08-17 15:19:24; publication date: 2021-08-14

# MatSat: a matrix-based differentiable SAT solver

MatSat: a matrix-based differentiable SAT solver ( http://arxiv.org/abs/2108.06481v1 )

License: check the linked page

Taisuke Sato (1) and Ryosuke Kojima (2) ((1) National Institute of Informatics (NII), (2) Graduate School of Medicine, Kyoto University)

We propose a new approach to SAT solving that solves SAT problems in vector spaces as the minimization of a non-negative differentiable cost function J^sat. In our approach, a solution (i.e., a satisfying assignment) for a SAT problem in n variables is represented by a binary vector u in {0,1}^n that makes J^sat(u) zero. We search for such a u in the vector space R^n by cost minimization, i.e., starting from an initial u_0 and minimizing J^sat to zero while iteratively updating u by Newton's method. We implemented our approach as a matrix-based differentiable SAT solver, MatSat. Whereas existing mainstream SAT solvers, whether of the conflict-driven clause learning (CDCL) type or the stochastic local search (SLS) type, decide each bit of a solution assignment one by one, MatSat fundamentally differs from them in that it continuously approaches a solution in a vector space. We conducted an experiment to measure the scalability of MatSat on random 3-SAT problems, in which MatSat found solutions for up to n=10^5 variables. We also compared MatSat with four state-of-the-art SAT solvers, including the winners of SAT Competition 2018 and SAT Race 2019, in terms of time to find a solution, using a random benchmark set from SAT Competition 2018 and an artificial random 3-SAT instance set. The results show that MatSat ranks second on both test sets and outperforms all the CDCL-type solvers.
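The core idea, a cost that vanishes exactly at satisfying 0/1 assignments and is searched continuously, can be sketched in a few lines. This is not MatSat: the sketch swaps in a simple product-form clause relaxation and projected gradient descent in place of the paper's matrix formulation and Newton's method. The functions `cost`, `gradient` and `solve` are hypothetical names, and clauses use DIMACS-style signed integers.

```python
def literal_value(u, lit):
    # Continuous truth value of a literal: u[i] for x_i, 1 - u[i] for NOT x_i.
    i = abs(lit) - 1
    return u[i] if lit > 0 else 1.0 - u[i]


def cost(clauses, u):
    """Sum over clauses of prod(1 - literal value); zero at a 0/1 u iff all clauses hold."""
    total = 0.0
    for clause in clauses:
        unsat = 1.0
        for lit in clause:
            unsat *= 1.0 - literal_value(u, lit)
        total += unsat
    return total


def gradient(clauses, u):
    grad = [0.0] * len(u)
    for clause in clauses:
        for k, lit in enumerate(clause):
            # Product rule: derivative of this clause's product via literal k.
            others = 1.0
            for j, other in enumerate(clause):
                if j != k:
                    others *= 1.0 - literal_value(u, other)
            sign = -1.0 if lit > 0 else 1.0
            grad[abs(lit) - 1] += sign * others
    return grad


def solve(clauses, n_vars, lr=0.1, max_steps=1000):
    """Projected gradient descent on [0, 1]^n; returns a satisfying assignment or None."""
    u = [0.5] * n_vars
    for _ in range(max_steps):
        assignment = [x > 0.5 for x in u]  # round the continuous point
        if cost(clauses, [float(b) for b in assignment]) == 0.0:
            return assignment
        g = gradient(clauses, u)
        u = [min(1.0, max(0.0, x - lr * gi)) for x, gi in zip(u, g)]
    return None
```

For example, `solve([[1], [-2]], 2)` drives `u[0]` up and `u[1]` down and returns `[True, False]` after one step.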

Translation date: 2021-08-17 15:17:27; publication date: 2021-08-14

# Soccer line mark segmentation with stochastic watershed transform

Soccer line mark segmentation with stochastic watershed transform ( http://arxiv.org/abs/2108.06432v1 )

License: check the linked page

Daniel Berjón, Carlos Cuevas, Narciso García

Augmented reality applications are beginning to change the way sports are broadcast, providing richer experiences and valuable insights to fans. The first step of augmented reality systems is camera calibration, possibly based on detecting the line markings of the field of play. Most existing proposals for line detection rely on edge detection and the Hough transform, but optical distortion and extraneous edges cause inaccurate or spurious detections of line markings. We propose a novel strategy to automatically and accurately segment line markings based on a stochastic watershed transform that is robust to optical distortions, since it makes no assumptions about line straightness, and is unaffected by the presence of players or the ball in the field of play. First, the playing field as a whole is segmented, completely eliminating the stands and perimeter boards. Then the line markings are extracted. The strategy has been tested on a new public database of 60 annotated images from matches in five stadiums. The results show that the proposed segmentation algorithm successfully and precisely detects most line-mark pixels.
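The generic stochastic watershed idea works as follows: run a marker-controlled watershed many times with random markers and record how often each pixel falls on a region boundary. The result is a boundary-probability map that favors persistent contours, such as painted lines, over spurious edges. The sketch below illustrates only that generic idea on a small 2D grid with 4-connectivity. The paper's two-stage pipeline (field segmentation, then line extraction), its gradient image and its marker settings are not reproduced; `n_markers` and `n_runs` are illustrative parameters.

```python
import heapq
import random


def watershed(grad, markers):
    """Marker-controlled watershed by priority flooding on a 4-connected grid.

    grad: 2D list of gradient magnitudes; markers: {(row, col): label > 0}.
    """
    h, w = len(grad), len(grad[0])
    labels = [[0] * w for _ in range(h)]
    heap, order = [], 0
    for (r, c), lab in markers.items():
        labels[r][c] = lab
        heapq.heappush(heap, (grad[r][c], order, r, c))
        order += 1
    while heap:
        _, _, r, c = heapq.heappop(heap)  # lowest-gradient frontier pixel first
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < h and 0 <= nc < w and labels[nr][nc] == 0:
                labels[nr][nc] = labels[r][c]
                heapq.heappush(heap, (grad[nr][nc], order, nr, nc))
                order += 1
    return labels


def stochastic_watershed(grad, n_markers=5, n_runs=50, seed=0):
    """Fraction of random-marker runs in which each pixel lies on a region boundary."""
    rng = random.Random(seed)
    h, w = len(grad), len(grad[0])
    cells = [(r, c) for r in range(h) for c in range(w)]
    counts = [[0] * w for _ in range(h)]
    for _ in range(n_runs):
        seeds = rng.sample(cells, n_markers)
        labels = watershed(grad, {p: i + 1 for i, p in enumerate(seeds)})
        for r in range(h):
            for c in range(w):
                # Boundary pixel: some 4-neighbour carries a different label.
                if any(0 <= nr < h and 0 <= nc < w and labels[nr][nc] != labels[r][c]
                       for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1))):
                    counts[r][c] += 1
    return [[k / n_runs for k in row] for row in counts]
```

Thresholding the returned map, or running a final watershed on it, would give the segmentation. Because nothing in this procedure assumes straight lines, curved or lens-distorted markings are handled in the same way.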

Translation date: 2021-08-17 15:16:12; publication date: 2021-08-14

# Cross-Modal Graph with Meta Concepts for Video Captioning

Cross-Modal Graph with Meta Concepts for Video Captioning ( http://arxiv.org/abs/2108.06458v1 )

License: check the linked page

Hao Wang, Guosheng Lin, Steven C. H. Hoi, Chunyan Miao

Video captioning aims to interpret complex visual content as text descriptions, which requires the model to fully understand video scenes, including objects and their interactions. Prevailing methods adopt off-the-shelf object detection networks to produce object proposals and use an attention mechanism to model the relations between objects. They often miss semantic concepts that are undefined in the pretrained model and fail to identify exact predicate relationships between objects. In this paper, we investigate the open research task of generating text descriptions for given videos and propose a Cross-Modal Graph (CMG) with meta concepts for video captioning. Specifically, to cover the useful semantic concepts in video captions, we weakly learn the visual regions corresponding to text descriptions and call the associated visual regions and textual words cross-modal meta concepts. We further build meta concept graphs dynamically with the learned cross-modal meta concepts. We also construct holistic video-level and local frame-level video graphs with the predicted predicates to model video sequence structures. We validate the efficacy of the proposed techniques with extensive experiments and achieve state-of-the-art results on two public datasets.

Translation date: 2021-08-17 15:15:53; publication date: 2021-08-14

# Transfer Learning from an Artificial Radiograph-landmark Dataset for Registration of the Anatomic Skull Model to Dual Fluoroscopic X-ray Images

Transfer Learning from an Artificial Radiograph-landmark Dataset for Registration of the Anatomic Skull Model to Dual Fluoroscopic X-ray Images ( http://arxiv.org/abs/2108.06466v1 )

License: check the linked page

Chaochao Zhou, Thomas Cha, Yun Peng, Guoan Li

Registration of 3D anatomic structures to their 2D dual fluoroscopic X-ray images is a widely used motion tracking technique. However, deep learning implementation is often impeded by a paucity of medical images and ground truths. In this study, we propose a transfer learning strategy for 3D-to-2D registration using deep neural networks trained on an artificial dataset. Digitally reconstructed radiographs (DRRs) and radiographic skull landmarks were automatically created from craniocervical CT data of a female subject. They were used to train a residual network (ResNet) for landmark detection and a cycle generative adversarial network (GAN) to eliminate the style difference between DRRs and actual X-rays. Landmarks on the X-rays that underwent GAN style translation were detected by the ResNet and used in triangulation optimization for 3D-to-2D registration of the skull in actual dual-fluoroscope images (with a non-orthogonal setup, point X-ray sources, image distortions, and partially captured skull regions). The registration accuracy was evaluated in multiple scenarios of craniocervical motion. During walking, learning-based registration of the skull had angular/position errors of 3.9 ± 2.1 deg / 4.6 ± 2.2 mm. However, the accuracy was lower during functional neck activity, because the skull regions imaged on the dual fluoroscopic images at end-range positions were overly small. The methodology of strategically augmenting artificial training data can tackle the complicated skull registration scenario and has the potential to extend to a wide range of registration scenarios.
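The triangulation step can be illustrated with the textbook linear (DLT) method: recover a 3D landmark from its pixel positions in two calibrated views. This sketch assumes ideal pinhole projection matrices with made-up intrinsics and a 60° non-orthogonal second view. The paper's triangulation optimization, distortion correction and point-source geometry are not reproduced.

```python
import numpy as np


def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.

    P1, P2: 3x4 projection matrices; x1, x2: pixel coordinates (u, v).
    """
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # Homogeneous solution: right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]


def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


# Hypothetical calibration: shared intrinsics, second view rotated 60 degrees about y.
K = np.array([[1000.0, 0, 320], [0, 1000.0, 240], [0, 0, 1]])
theta = np.deg2rad(60.0)
R = np.array([[np.cos(theta), 0, np.sin(theta)],
              [0, 1, 0],
              [-np.sin(theta), 0, np.cos(theta)]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, np.array([[-4.0], [0.0], [1.0]])])

X = np.array([0.1, -0.2, 5.0])  # a landmark position
X_hat = triangulate(P1, P2, project(P1, X), project(P2, X))
```

With detected (noisy) landmarks, the DLT estimate would typically serve as the starting point for a nonlinear refinement of reprojection error.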

Translation date: 2021-08-17 15:15:34; publication date: 2021-08-14

# Disease-oriented image embedding with pseudo-scanner standardization for content-based image retrieval on 3D brain MRI

Disease-oriented image embedding with pseudo-scanner standardization for content-based image retrieval on 3D brain MRI ( http://arxiv.org/abs/2108.06518v1 )

License: check the linked page

Hayato Arai, Yuto Onga, Kumpei Ikuta, Yusuke Chayama, Hitoshi Iyatomi, Kenichi Oishi

To build a robust and practical content-based image retrieval (CBIR) system that is applicable to a clinical brain MRI database, we propose a new framework, Disease-oriented image embedding with pseudo-scanner standardization (DI-PSS), that consists of two core techniques: data harmonization and a dimension reduction algorithm. DI-PSS uses skull stripping and CycleGAN-based image transformations that map images to a standard brain and then transform them into brain images as if taken with a given reference scanner. Then, our 3D convolutional autoencoders (3D-CAE) with deep metric learning acquire a low-dimensional embedding that better reflects the characteristics of the disease. The effectiveness of the proposed framework was tested on T1-weighted MRIs selected from the Alzheimer's Disease Neuroimaging Initiative and the Parkinson's Progression Markers Initiative. We confirmed that PSS greatly reduced the variability of low-dimensional embeddings caused by different scanners and datasets. Compared with the baseline condition, PSS reduced the variability in the distance from Alzheimer's disease (AD) to clinically normal (CN) and Parkinson's disease (PD) cases by 15.8-22.6% and 18.0-29.9%, respectively. These properties allow DI-PSS to generate lower-dimensional representations that are more amenable to disease classification. In AD and CN classification experiments based on spectral clustering, PSS improved the average accuracy and macro-F1 by 6.2% and 10.7%, respectively. Given the potential of DI-PSS for harmonizing images scanned by MRI scanners that were not used to scan the training data, we expect DI-PSS to be suitable for application to a large number of legacy MRIs scanned in heterogeneous environments.

Translation date: 2021-08-17 15:15:01; publication date: 2021-08-14

# Foreground-Action Consistency Network for Weakly Supervised Temporal Action Localization

Foreground-Action Consistency Network for Weakly Supervised Temporal Action Localization ( http://arxiv.org/abs/2108.06524v1 )

License: check the linked page

Linjiang Huang, Liang Wang, Hongsheng Li

As a challenging task of high-level video understanding, weakly supervised temporal action localization has been attracting increasing attention. With only video-level annotations, most existing methods handle this task with a localization-by-classification framework, which generally adopts a selector to pick snippets with high action probabilities, namely the foreground. Nevertheless, existing foreground selection strategies have a major limitation: they consider only the unilateral relation from foreground to actions, which cannot guarantee foreground-action consistency. In this paper, we present a framework named FAC-Net based on the I3D backbone, to which three branches are appended: a class-wise foreground classification branch, a class-agnostic attention branch and a multiple instance learning branch. First, the class-wise foreground classification branch regularizes the relation between actions and foreground to maximize foreground-background separation. In addition, the class-agnostic attention branch and the multiple instance learning branch are adopted to regularize foreground-action consistency and help learn a meaningful foreground classifier. Within each branch, we introduce a hybrid attention mechanism that calculates multiple attention scores for each snippet, focusing on both discriminative and less-discriminative snippets to capture the full action boundaries. Experimental results on THUMOS14 and ActivityNet1.3 demonstrate the state-of-the-art performance of our method. Our code is available at https://github.com/LeonHLJ/FAC-Net.
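A common building block of localization-by-classification is multiple instance learning with top-k pooling. Snippet-level class activations are pooled into video-level scores, so training needs only video-level labels. The sketch below shows that generic pooling. The value `k_ratio=8` is an illustrative assumption, and FAC-Net's three branches and hybrid attention are not reproduced.

```python
import numpy as np


def topk_video_scores(cas, k_ratio=8):
    """Video-level class probabilities from a class activation sequence.

    cas: array of shape (T, C) with per-snippet class logits.
    Averages the k highest activations per class, k = max(1, T // k_ratio),
    then applies a softmax over classes.
    """
    t = cas.shape[0]
    k = max(1, t // k_ratio)
    topk = np.sort(cas, axis=0)[-k:]  # k largest activations per class
    pooled = topk.mean(axis=0)
    exp = np.exp(pooled - pooled.max())  # numerically stable softmax
    return exp / exp.sum()
```

At test time, the snippets whose activations (or attention scores) for the predicted class exceed a threshold would form the localized action segments.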

Translation date: 2021-08-17 15:14:32; publication date: 2021-08-14

# Exploiting a Joint Embedding Space for Generalized Zero-Shot Semantic Segmentation

Exploiting a Joint Embedding Space for Generalized Zero-Shot Semantic Segmentation ( http://arxiv.org/abs/2108.06536v1 )

License: check the linked page

Donghyeon Baek, Youngmin Oh, Bumsub Ham

We address the problem of generalized zero-shot semantic segmentation (GZS3), i.e., predicting pixel-wise semantic labels for seen and unseen classes. Most GZS3 methods adopt a generative approach that synthesizes visual features of unseen classes from the corresponding semantic features (e.g., word2vec) to train novel classifiers for both seen and unseen classes. Although generative methods show decent performance, they have two limitations: (1) the visual features are biased towards seen classes; (2) the classifier must be retrained whenever novel unseen classes appear. We propose a discriminative approach to address these limitations in a unified framework. To this end, we leverage visual and semantic encoders to learn a joint embedding space, where the semantic encoder transforms semantic features into semantic prototypes that act as centers for the visual features of the corresponding classes. Specifically, we introduce boundary-aware regression (BAR) and semantic consistency (SC) losses to learn discriminative features. Our approach to exploiting the joint embedding space, together with the BAR and SC terms, alleviates the seen bias problem. At test time, we avoid the retraining process by exploiting semantic prototypes as a nearest-neighbor (NN) classifier. To further alleviate the bias problem, we also propose an inference technique, dubbed Apollonius calibration (AC), that adaptively modulates the decision boundary of the NN classifier to the Apollonius circle. Experimental results demonstrate the effectiveness of our framework, achieving a new state of the art on standard benchmarks.
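A simplified, non-adaptive variant helps show why the calibrated boundary is an Apollonius circle. Suppose a nearest-prototype classifier scales distances to seen-class prototypes by a constant factor `gamma`. Then the boundary between a seen and an unseen prototype is the set of points whose distance ratio is fixed, which is exactly an Apollonius circle (a sphere in higher dimensions). The paper's AC chooses this boundary adaptively; the fixed `gamma` below is a hypothetical hyperparameter, not the authors' rule.

```python
import numpy as np


def calibrated_nn(features, prototypes, is_seen, gamma=1.0):
    """Nearest-prototype labels with seen-class distances scaled by gamma.

    features: (N, D); prototypes: (C, D); is_seen: boolean array of length C.
    gamma > 1 shrinks the seen-class regions; for a seen/unseen pair the
    boundary {x : gamma * d(x, p_seen) = d(x, p_unseen)} is an Apollonius circle.
    """
    d = np.linalg.norm(features[:, None, :] - prototypes[None, :, :], axis=-1)
    d = np.where(is_seen[None, :], gamma * d, d)
    return d.argmin(axis=1)
```

With `gamma=1.0` this reduces to the plain NN classifier, whose seen/unseen boundary is the perpendicular bisector between the two prototypes.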

Translation date: 2021-08-17 15:14:08; publication date: 2021-08-14

# MMOCR: A Comprehensive Toolbox for Text Detection, Recognition and Understanding

MMOCR: A Comprehensive Toolbox for Text Detection, Recognition and Understanding ( http://arxiv.org/abs/2108.06543v1 )

License: check the linked page

Zhanghui Kuang, Hongbin Sun, Zhizhong Li, Xiaoyu Yue, Tsui Hin Lin, Jianyong Chen, Huaqiang Wei, Yiqin Zhu, Tong Gao, Wenwei Zhang, Kai Chen, Wayne Zhang, Dahua Lin

We present MMOCR, an open-source toolbox that provides a comprehensive pipeline for text detection and recognition, as well as their downstream tasks such as named entity recognition and key information extraction. MMOCR implements 14 state-of-the-art algorithms, significantly more than any existing open-source OCR project we are aware of to date. To facilitate future research and industrial applications of text-recognition-related problems, we also provide a large number of trained models and detailed benchmarks that give insights into the performance of text detection, recognition and understanding. MMOCR is publicly released at https://github.com/open-mmlab/mmocr.

Translation date: 2021-08-17 15:13:39; publication date: 2021-08-14

# Unravelling the Effect of Image Distortions for Biased Prediction of Pre-trained Face Recognition Models

Unravelling the Effect of Image Distortions for Biased Prediction of Pre-trained Face Recognition Models ( http://arxiv.org/abs/2108.06581v1 )

License: check the linked page

Puspita Majumdar, Surbhi Mittal, Richa Singh, Mayank Vatsa

Identifying and mitigating bias in deep learning algorithms has gained significant attention in the past few years due to its impact on society. Researchers argue that models trained on balanced datasets with good representation provide equal and unbiased performance across subgroups. However, *can a seemingly unbiased pre-trained model become biased when input data undergo certain distortions?* For the first time, we attempt to answer this question in the context of face recognition. We provide a systematic analysis of the performance of four state-of-the-art deep face recognition models in the presence of image distortions across different *gender* and *race* subgroups. We observe that image distortions are related to the model's performance gap across subgroups.
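One simple way to quantify the effect described above is to compute accuracy per subgroup and the largest gap between subgroups. Computing this separately on clean and on distorted inputs shows whether a distortion widens the gap. The sketch assumes per-sample `(subgroup, correct)` records and uses hypothetical group names. The paper's exact evaluation protocol and metrics may differ.

```python
def subgroup_gap(records):
    """Accuracy per subgroup and the largest difference between subgroups.

    records: iterable of (subgroup, correct) pairs, e.g. ("groupA", True).
    """
    totals, hits = {}, {}
    for group, correct in records:
        totals[group] = totals.get(group, 0) + 1
        hits[group] = hits.get(group, 0) + int(correct)
    acc = {g: hits[g] / totals[g] for g in totals}
    return acc, max(acc.values()) - min(acc.values())


# Hypothetical verification outcomes for one distortion setting.
records = [("A", True)] * 9 + [("A", False)] + [("B", True)] * 8 + [("B", False)] * 2
acc, gap = subgroup_gap(records)
```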

Translation date: 2021-08-17 15:13:29; publication date: 2021-08-14

# Towards Category and Domain Alignment: Category-Invariant Feature Enhancement for Adversarial Domain Adaptation

Towards Category and Domain Alignment: Category-Invariant Feature Enhancement for Adversarial Domain Adaptation ( http://arxiv.org/abs/2108.06583v1 )

License: check the linked page

Yuan Wu, Diana Inkpen and Ahmed El-Roby

Adversarial domain adaptation has made impressive advances in transferring knowledge from the source domain to the target domain by aligning the feature distributions of both domains. These methods focus on minimizing domain divergence and treat adaptability, measured as the expected error of the ideal joint hypothesis on the two domains, as a small constant. However, these approaches still face two issues: (1) adversarial domain alignment distorts the original feature distributions, deteriorating adaptability; (2) transforming feature representations to be domain-invariant sacrifices domain-specific variations, resulting in weaker discriminability. To alleviate these issues, we propose category-invariant feature enhancement (CIFE), a general mechanism that enhances adversarial domain adaptation by optimizing adaptability. Specifically, CIFE introduces category-invariant features to boost the discriminability of domain-invariant features while preserving transferability. Experiments show that CIFE can improve upon representative adversarial domain adaptation methods, yielding state-of-the-art results on five benchmarks.

Translation date: 2021-08-17 15:13:18; publication date: 2021-08-14

# A Self-Distillation Embedded Supervised Affinity Attention Model for Few-Shot Segmentation

A Self-Distillation Embedded Supervised Affinity Attention Model for Few-Shot Segmentation ( http://arxiv.org/abs/2108.06600v1 )

License: check the linked page

Qi Zhao, Binghao Liu, Shuchang Lyu, Xu Wang and Yifan Yang

Few-shot semantic segmentation is the challenging task of predicting object categories pixel-wise with only a few annotated samples. However, existing approaches still face two main challenges. First, the large feature discrepancy between support and query images creates a knowledge-transfer barrier that harms segmentation performance. Second, the scarcity of support samples makes the support features unrepresentative, so they can hardly guide high-quality query segmentation. To deal with these two issues, we propose a self-distillation embedded supervised affinity attention model (SD-AANet) to improve performance on the few-shot segmentation task. Specifically, the self-distillation guided prototype module (SDPM) extracts an intrinsic prototype by self-distillation between support and query to capture representative features. The supervised affinity attention module (SAAM) uses the support ground truth to guide the production of a high-quality query attention map, which learns affinity information to focus on the whole area of the query target. Extensive experiments show that SD-AANet significantly improves performance compared with existing methods. Comprehensive ablation experiments and visualization studies also show the significant effect of SDPM and SAAM on the few-shot segmentation task. On the benchmark datasets PASCAL-5i and COCO-20i, SD-AANet achieves state-of-the-art results. Our code will be publicly available soon.
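Two generic building blocks from few-shot segmentation are sketched below. The first is masked average pooling, which turns support features and the support mask into a class prototype. The second is a cosine affinity map between that prototype and every query location. This is an untrained illustration only; SD-AANet's self-distillation (SDPM) and its learned, supervised attention (SAAM) are not reproduced.

```python
import numpy as np


def masked_average_prototype(support_feat, support_mask):
    """Foreground prototype of the support image.

    support_feat: (C, H, W) feature map; support_mask: (H, W) binary mask.
    """
    weights = support_mask[None].astype(float)
    return (support_feat * weights).sum(axis=(1, 2)) / max(weights.sum(), 1e-8)


def affinity_attention(query_feat, prototype):
    """Cosine similarity between each query location and the prototype, shape (H, W)."""
    q = query_feat / (np.linalg.norm(query_feat, axis=0, keepdims=True) + 1e-8)
    p = prototype / (np.linalg.norm(prototype) + 1e-8)
    return np.einsum("chw,c->hw", q, p)


# Toy 2-channel features: foreground pixels look like [1, 0], background like [0, 1].
feat = np.zeros((2, 2, 2))
feat[0, 0, :] = 1.0  # channel 0 active on row 0 (foreground)
feat[1, 1, :] = 1.0  # channel 1 active on row 1 (background)
mask = np.array([[1, 1], [0, 0]])
proto = masked_average_prototype(feat, mask)
attention = affinity_attention(feat, proto)
```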

Translation date: 2021-08-17 15:12:58; publication date: 2021-08-14

# Efficient Federated Meta-Learning over Multi-Access Wireless Networks

Efficient Federated Meta-Learning over Multi-Access Wireless Networks ( http://arxiv.org/abs/2108.06453v1 )

License: check the linked page

Sheng Yue, Ju Ren, Jiang Xin, Deyu Zhang, Yaoxue Zhang, Weihua Zhuang

Federated meta-learning (FML) has emerged as a promising paradigm to cope with the data limitation and heterogeneity challenges in today's edge learning arena. However, its performance is often limited by slow convergence and the correspondingly low communication efficiency. Moreover, since the wireless bandwidth and the energy capacity of IoT devices are usually insufficient, it is crucial to control resource allocation and energy consumption when deploying FML in realistic wireless networks. To overcome these challenges, in this paper we first rigorously analyze each device's contribution to the global loss reduction in each round and develop an FML algorithm (called NUFM) with a non-uniform device selection scheme to accelerate convergence. We then formulate a resource allocation problem that integrates NUFM into multi-access wireless systems to jointly improve the convergence rate and minimize the wall-clock time along with the energy cost. By deconstructing the original problem step by step, we devise a joint device selection and resource allocation strategy (called URAL) to solve the problem and provide theoretical guarantees. Further, we show that the computational complexity of NUFM can be reduced from $O(d^2)$ to $O(d)$ (with $d$ being the model dimension) by combining two first-order approximation techniques. Extensive simulation results demonstrate the effectiveness and superiority of the proposed methods compared with existing baselines.
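Non-uniform device selection can be sketched as sampling devices without replacement, with probability proportional to a per-device contribution score. The sketch assumes the scores are already available, for example an estimate of each device's share of the global loss reduction. NUFM's actual contribution formula and the URAL resource allocation are not reproduced, and `select_devices` is a hypothetical helper.

```python
import random


def select_devices(contributions, m, seed=None):
    """Sample m distinct devices with probability proportional to their contribution.

    contributions: dict {device_id: non-negative score}.
    Sequential sampling without replacement; devices with score 0 are only
    picked once every remaining device has score 0.
    """
    rng = random.Random(seed)
    pool = dict(contributions)
    chosen = []
    for _ in range(min(m, len(pool))):
        ids = list(pool)
        weights = [pool[i] for i in ids]
        if sum(weights) <= 0:
            pick = rng.choice(ids)  # no information left: fall back to uniform
        else:
            pick = rng.choices(ids, weights=weights, k=1)[0]
        chosen.append(pick)
        del pool[pick]
    return chosen
```

Uniform selection, the usual baseline, corresponds to giving every device the same score.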

翻訳日:2021-08-17 14:56:18 公開日:2021-08-14

# KDD Cup 2021 City Brain ChallengeのためのDQN制御ソリューション

DQN Control Solution for KDD Cup 2021 City Brain Challenge ( http://arxiv.org/abs/2108.06491v1 )

ライセンス: Link先を確認

Yitian Chen and Kunlong Chen and Kunjin Chen and Lin Wang

(参考訳) 私たちはCity Brain Challengeコンペティションに参加し、第8位を獲得した。このコンペティションでは、実世界の都市規模の道路網と、実際の交通データから得られたその交通需要がプレイヤーに提供される。プレイヤーは自ら設計したエージェントで交通信号を協調制御し、許容できる遅延を維持しつつ、処理される車両の数を最大化することが求められる。本稿では、このコンペティションに対する全体的な分析と詳細な解法を示す。提案手法は主に、リアルタイム交通信号制御のためのディープQネットワーク(DQN)の適応に基づいている。私たちの見解では、このコンペティションの主要な課題は、実世界の複雑な道路網と交通流の状況において、古典的なDQNフレームワークを交通信号制御へどのように拡張するかである。いくつかの古典的な報酬関数を試して実装した後、最終的に新たに設計した報酬をエージェントに適用することにした。新たに提案した報酬関数を適用し、制御スキームを慎重に調整することで、単一のDQNモデルに基づくエージェントが上位15チームに入ることができる。本論文が、実世界の道路網における交通信号制御のベースライン解としてある程度役立ち、さらなる試みや研究を促すことを願っている。

We took part in the City Brain Challenge competition and achieved 8th place. In this competition, the players are provided with a real-world city-scale road network and its traffic demand derived from real traffic data. The players are asked to coordinate the traffic signals with a self-designed agent to maximize the number of vehicles served while maintaining an acceptable delay. In this abstract paper, we present an overall analysis and our detailed solution to this competition. Our approach is mainly based on the adaptation of the deep Q-network (DQN) for real-time traffic signal control. From our perspective, the major challenge of this competition is how to extend the classical DQN framework to traffic signal control in complex real-world road networks and traffic flow conditions. After trying and implementing several classical reward functions, we finally chose to apply our newly designed reward in our agent. By applying our newly proposed reward function and carefully tuning the control scheme, an agent based on a single DQN model can rank among the top 15 teams. We hope this paper can serve, to some extent, as a baseline solution to traffic signal control of real-world road networks and inspire further attempts and research.
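
The solution builds on the standard DQN, but the abstract does not give its newly designed reward or network. As a minimal sketch of only the standard DQN ingredients, assuming each action selects one signal phase of an intersection (an assumption, not stated in the abstract), the one-step learning target and epsilon-greedy phase selection look like this; the helper names are hypothetical.

```python
import random


def td_target(reward, next_q_values, gamma, done):
    """One-step DQN target: r + gamma * max_a' Q_target(s', a'), cut at episode end."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a phase index: random with probability epsilon, else the arg-max phase."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Usage: Q-values of three hypothetical phases at one intersection.
rng = random.Random(42)
phase = epsilon_greedy([0.1, 0.7, 0.3], epsilon=0.1, rng=rng)
target = td_target(reward=1.0, next_q_values=[0.5, 2.0], gamma=0.9, done=False)
```

The reward plugged into `td_target` is exactly where the paper's newly designed reward would enter; it is left as a plain number here.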

翻訳日:2021-08-17 14:55:55 公開日:2021-08-14

# LinkTeller: 影響分析を通じてグラフニューラルネットワークからプライベートエッジを復元する

LinkTeller: Recovering Private Edges from Graph Neural Networks via Influence Analysis ( http://arxiv.org/abs/2108.06504v1 )

ライセンス: Link先を確認

Fan Wu, Yunhui Long, Ce Zhang, Bo Li

(参考訳) グラフ構造データは、豊富なノード特徴とエッジ情報により、推薦システムや交通予測など、いくつかの応用で成功を収めている。しかし、これらの高次元の特徴と高次の隣接情報は通常異質であり、実際には異なるデータ保有者によって保持される。このような垂直データ分割(例えば、あるデータ保有者はノード特徴かエッジ情報のどちらか一方のみを所有する)の下では、プライバシ上の懸念から、異なるデータ保有者はデータを互いに直接転送するのではなく、効率的な共同学習プロトコルを開発する必要がある。本稿では、エッジのプライバシに焦点を当て、ノード特徴を持つBobがまず、隣接情報を所有するAliceに学習用ノード特徴を送信するという学習シナリオを考える。Aliceは結合された情報でグラフニューラルネットワーク(GNN)を学習し、推論APIを公開する。推論時には、Bobはテストノードの特徴を提供し、APIに問い合わせてテストノードの予測を得ることができる。この設定の下で、まず、Bob側で敵対的クエリを設計することによりAliceが保持するプライベートなエッジ情報を推定する、影響分析に基づくプライバシ攻撃LinkTellerを提案する。次に、LinkTellerが大量のプライベートエッジを復元でき、既存のベースラインを上回ることを実験的に示す。プライバシ漏洩をさらに評価するため、差分プライバシを満たすグラフ畳み込みネットワーク(DP GCN)の学習のための既存アルゴリズムを適用するとともに、新たなDP GCNメカニズムLapGraphを提案する。これらのDP GCNメカニズムは、緩やかなプライバシ保証($\varepsilon>5$)の下では、LinkTellerに対して実験的に常に頑健であるとは限らないことを示す。本研究は、より頑健なプライバシ保護GCNモデルの設計に向けた今後の研究に光を当てるとともに、GCNモデルの有用性と潜在的なプライバシ攻撃に対する頑健性とのトレードオフについて深い理解を与える。

Graph-structured data have enabled several successful applications, such as recommendation systems and traffic prediction, given the rich node features and edge information. However, these high-dimensional features and high-order adjacency information are usually heterogeneous and held by different data holders in practice. Given such vertical data partition (e.g., one data holder will only own either the node features or edge information), different data holders have to develop efficient joint training protocols rather than directly transferring data to each other due to privacy concerns. In this paper, we focus on edge privacy and consider a training scenario where Bob, who has the node features, first sends training node features to Alice, who owns the adjacency information. Alice will then train a graph neural network (GNN) with the joint information and release an inference API. During inference, Bob is able to provide test node features and query the API to obtain the predictions for test nodes. Under this setting, we first propose a privacy attack LinkTeller via influence analysis to infer the private edge information held by Alice via designing adversarial queries for Bob. We then empirically show that LinkTeller is able to recover a significant amount of private edges, outperforming existing baselines. To further evaluate the privacy leakage, we adapt an existing algorithm for differentially private graph convolutional network (DP GCN) training and propose a new DP GCN mechanism LapGraph. We show that these DP GCN mechanisms are not always resilient against LinkTeller empirically under mild privacy guarantees ($\varepsilon>5$). Our studies will shed light on future research towards designing more resilient privacy-preserving GCN models and, in the meantime, provide an in-depth understanding of the tradeoff between GCN model utility and robustness against potential privacy attacks.
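
A toy sketch of the influence-analysis idea, not the authors' LinkTeller implementation: Bob perturbs one node's features, re-queries the API, and ranks node pairs by how strongly their predictions respond to each other. Assumptions: a hypothetical one-layer linear GCN with scalar features stands in for Alice's model, the graph below is made-up data, and Bob knows (or guesses) the number of edges `k`.

```python
from itertools import combinations

# Toy stand-in for Alice's private graph (hypothetical data, 5 nodes, undirected).
PRIVATE_EDGES = {(0, 1), (1, 2), (3, 4)}
N = 5
WEIGHT = 0.7  # hypothetical scalar weight of a one-layer linear GCN


def normalized_adjacency(n, edges):
    """Row-normalised A + I, the propagation matrix of a simple GCN layer."""
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1.0
    return [[v / sum(row) for v in row] for row in adj]


A_HAT = normalized_adjacency(N, PRIVATE_EDGES)


def inference_api(features):
    """Black-box API that Bob can query: returns one prediction per node."""
    return [WEIGHT * sum(A_HAT[i][j] * features[j] for j in range(N)) for i in range(N)]


def influence_matrix(features, delta=1e-3):
    """influence[i][j] ~ |d prediction_i / d feature_j|, via finite differences."""
    base = inference_api(features)
    infl = [[0.0] * N for _ in range(N)]
    for j in range(N):
        bumped = list(features)
        bumped[j] += delta
        out = inference_api(bumped)
        for i in range(N):
            infl[i][j] = abs(out[i] - base[i]) / delta
    return infl


def recover_edges(features, k):
    """Guess the k node pairs with the largest symmetric influence as edges."""
    infl = influence_matrix(features)
    scores = {(i, j): infl[i][j] + infl[j][i] for i, j in combinations(range(N), 2)}
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


print(recover_edges([0.1, -0.4, 0.3, 0.8, 0.2], k=3))  # {(0, 1), (1, 2), (3, 4)}
```

With one linear layer only direct neighbours respond to a perturbation, so the ranking separates edges cleanly; deeper GNNs also propagate influence to multi-hop neighbours, which is why the sketch ranks scores instead of testing for exact zeros.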

翻訳日:2021-08-17 14:55:36 公開日:2021-08-14

# 属性の組み合わせに対する精度面としての予測サービスの能動的評価

Active Assessment of Prediction Services as Accuracy Surface Over Attribute Combinations ( http://arxiv.org/abs/2108.06514v1 )

ライセンス: Link先を確認

Vihari Piratla, Soumen Chakrabarty, Sunita Sarawagi

(参考訳) 本研究の目的は、ブラックボックス分類モデルの精度を、与えられた単一のテストデータ分布上の1つの集計値としてではなく、複数のテストデータ分布を特徴付ける多数の属性の組み合わせ上の曲面として評価することである。機械学習モデルがサービスとして提供され、学習データの分布がクライアントから隠され、異なるクライアントがデータ分布の多様な領域に関心を持つようになるにつれて、このような属性付き精度指標は重要になる。本稿では、このような精度曲面に対するガウス過程(GP)ベースの確率的推定器であるAttributed Accuracy Assay(AAA)を提案する。各属性の組み合わせは「アーム」と呼ばれ、サービスの精度がそこからサンプリングされるベータ密度と関連付けられる。GPが関連するアーム間でベータ密度のパラメータを平滑化することで、データの疎性が緩和されることを期待する。しかし、GPを素朴に適用するだけでは、疎かつ不均一にデータが分布する巨大な属性空間における不均一分散(heteroscedastic)な不確実性という課題に対処できないことを示す。これに対して、疎な観測のプーリングと、ベータ密度のスケールパラメータの正則化という2つの拡張を提示する。これらの工夫を導入したうえで、広範な実験と分析を通じて、推定精度と探索効率の両面でAAAの有効性を確立する。

Our goal is to evaluate the accuracy of a black-box classification model, not as a single aggregate on a given test data distribution, but as a surface over a large number of combinations of attributes characterizing multiple test data distributions. Such attributed accuracy measures become important as machine learning models get deployed as a service, where the training data distribution is hidden from clients, and different clients may be interested in diverse regions of the data distribution. We present Attributed Accuracy Assay (AAA)--a Gaussian Process (GP)--based probabilistic estimator for such an accuracy surface. Each attribute combination, called an 'arm', is associated with a Beta density from which the service's accuracy is sampled. We expect the GP to smooth the parameters of the Beta density over related arms to mitigate sparsity. We show that obvious application of GPs cannot address the challenge of heteroscedastic uncertainty over a huge attribute space that is sparsely and unevenly populated. In response, we present two enhancements: pooling sparse observations, and regularizing the scale parameter of the Beta densities. After introducing these innovations, we establish the effectiveness of AAA in terms of both its estimation accuracy and exploration efficiency, through extensive experiments and analysis.
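
AAA itself smooths the Beta parameters across arms with a GP; the sketch below omits the GP and shows only the per-arm bookkeeping it builds on: a Beta belief over one arm's accuracy, updated by conjugate Beta-Bernoulli counting. The prior is a hypothetical shrinkage prior centred on accuracy pooled from related arms, used here as a simple stand-in for "pooling sparse observations". `BetaArm` and `pooled_prior` are illustrative names, not part of AAA.

```python
from dataclasses import dataclass


@dataclass
class BetaArm:
    """Beta(alpha, beta) belief over one arm's (attribute combination's) accuracy."""
    alpha: float
    beta: float

    def update(self, correct: int, wrong: int) -> None:
        # Conjugate Beta-Bernoulli update from labelled checks of the service.
        self.alpha += correct
        self.beta += wrong

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def pooled_prior(pooled_accuracy: float, strength: float) -> BetaArm:
    """Hypothetical pooling: a prior centred on accuracy measured over related arms.

    `strength` acts as a pseudo-count; larger values shrink sparse arms harder.
    """
    return BetaArm(pooled_accuracy * strength, (1.0 - pooled_accuracy) * strength)


# Usage: a sparse arm with only two labelled checks, shrunk toward 0.8.
arm = pooled_prior(pooled_accuracy=0.8, strength=10.0)
arm.update(correct=1, wrong=1)
print(round(arm.mean, 3))  # 0.75
```

The raw arm estimate would be 0.5 from two checks; the pooled prior pulls it toward the related arms, which is the kind of sparsity problem the GP smoothing in AAA addresses more systematically.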

翻訳日:2021-08-17 14:55:03 公開日:2021-08-14

# 適切な公正認識? 自動意思決定システムの公正性を評価するための説明文の有効性について

Appropriate Fairness Perceptions? On the Effectiveness of Explanations in Enabling People to Assess the Fairness of Automated Decision Systems ( http://arxiv.org/abs/2108.06500v1 )

ライセンス: Link先を確認

Jakob Schoeffer and Niklas Kuehl

(参考訳) 自動意思決定システム(ADS)を説明することの目的の1つは、そのようなシステムに対するユーザの肯定的な認識(例えば公平性や信頼性)を促進することだとしばしば主張される。しかしこの見方は、そもそも対象のADSが公平で信頼に値するという暗黙の仮定を置いている。ADSが不公平な結果を出す場合には、システムの仕組みに関する説明がその欠点を明らかにし、その結果として公平性の認識を低下させると期待されるかもしれない。したがって我々は、説明を、関連するADSの品質(例えば公平性)を人々が適切に評価できるようにするうえでの有効性に照らして評価する方が有意義であると提案する。効果的な説明であるためには、基礎となるADSが公平である場合に限り、公平性の認識が高まるべきであると我々は主張する。この進行中の研究では、適切な公平性認識という要件(desideratum)を導入し、それを評価するための新しい研究デザインを提案し、包括的な実験に向けた次のステップを概説する。

It is often argued that one goal of explaining automated decision systems (ADS) is to facilitate positive perceptions (e.g., fairness or trustworthiness) of users towards such systems. This viewpoint, however, makes the implicit assumption that a given ADS is fair and trustworthy, to begin with. If the ADS issues unfair outcomes, then one might expect that explanations regarding the system's workings will reveal its shortcomings and, hence, lead to a decrease in fairness perceptions. Consequently, we suggest that it is more meaningful to evaluate explanations against their effectiveness in enabling people to appropriately assess the quality (e.g., fairness) of an associated ADS. We argue that for an effective explanation, perceptions of fairness should increase if and only if the underlying ADS is fair. In this in-progress work, we introduce the desideratum of appropriate fairness perceptions, propose a novel study design for evaluating it, and outline next steps towards a comprehensive experiment.

翻訳日:2021-08-17 14:50:53 公開日:2021-08-14

# 無人航空機におけるリアルタイムマルチモーダルセマンティクス融合

Real-Time Multi-Modal Semantic Fusion on Unmanned Aerial Vehicles ( http://arxiv.org/abs/2108.06608v1 )

ライセンス: Link先を確認

Simon Bultmann, Jan Quenzel and Sven Behnke

(参考訳) 複数の相補的なセンサを搭載した無人航空機(UAV)は、例えば災害調査のような、高速な自律的または遠隔操作によるセマンティックシーン解析に大きな可能性を持つ。本研究では、複数のセンサモダリティのリアルタイムなセマンティック推論と融合のためのUAVシステムを提案する。LiDARスキャンとRGB画像のセマンティックセグメンテーション、ならびにRGB画像と熱画像の物体検出は、軽量なCNNアーキテクチャと組み込み推論アクセラレータを用いて、UAVの搭載コンピュータ上でオンラインに実行される。我々は後期融合(late fusion)のアプローチをとり、複数のモダリティからのセマンティック情報によって3次元点群と画像セグメンテーションマスクを拡張するとともに、アロセントリックなセマンティックマップを生成する。本システムは、拡張されたセマンティック画像と点群を$\approx\,$9$\,$Hzで提供する。都市環境における実世界実験により、統合システムを評価する。

Unmanned aerial vehicles (UAVs) equipped with multiple complementary sensors have tremendous potential for fast autonomous or remote-controlled semantic scene analysis, e.g., for disaster examination. In this work, we propose a UAV system for real-time semantic inference and fusion of multiple sensor modalities. Semantic segmentation of LiDAR scans and RGB images, as well as object detection on RGB and thermal images, run online onboard the UAV computer using lightweight CNN architectures and embedded inference accelerators. We follow a late fusion approach where semantic information from multiple modalities augments 3D point clouds and image segmentation masks while also generating an allocentric semantic map. Our system provides augmented semantic images and point clouds with $\approx\,$9$\,$Hz. We evaluate the integrated system in real-world experiments in an urban environment.
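
One common way to realise the point-cloud side of such late fusion is to project each LiDAR point into a segmented camera image and copy that pixel's class. The sketch assumes points already transformed into the camera frame (z pointing forward) and known pinhole intrinsics `fx`, `fy`, `cx`, `cy`; the abstract does not give the paper's calibration or fusion details, and the function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def project_point(point: Sequence[float], fx: float, fy: float,
                  cx: float, cy: float) -> Optional[Tuple[int, int]]:
    """Pinhole projection of a 3D point given in the camera frame (z forward)."""
    x, y, z = point
    if z <= 0.0:
        return None  # behind the camera, cannot be labelled from this image
    u = fx * x / z + cx
    v = fy * y / z + cy
    return round(u), round(v)


def label_points(points, label_image, fx, fy, cx, cy, unknown=-1):
    """Attach the per-pixel class of a segmentation mask to each LiDAR point."""
    height, width = len(label_image), len(label_image[0])
    labels = []
    for p in points:
        pix = project_point(p, fx, fy, cx, cy)
        if pix is None:
            labels.append(unknown)
            continue
        u, v = pix
        inside = 0 <= u < width and 0 <= v < height
        labels.append(label_image[v][u] if inside else unknown)
    return labels


# Usage: a 3x3 mask whose centre pixel has class 7.
mask = [[0, 0, 0], [0, 7, 0], [0, 0, 0]]
pts = [(0.0, 0.0, 2.0), (0.0, 0.0, -1.0), (10.0, 0.0, 1.0), (-1.0, 0.0, 1.0)]
print(label_points(pts, mask, fx=1.0, fy=1.0, cx=1.0, cy=1.0))  # [7, -1, -1, 0]
```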

翻訳日:2021-08-17 14:48:58 公開日:2021-08-14

# Software-in-the-Loopを用いたクワッドコプタードローンの単眼視覚自律着陸システム

Monocular visual autonomous landing system for quadcopter drones using software in the loop ( http://arxiv.org/abs/2108.06616v1 )

ライセンス: Link先を確認

Miguel Saavedra-Ruiz, Ana Mario Pinto-Vargas, Victor Romero-Cano

(参考訳) 自律着陸は、多くの社会的・産業的応用においてマルチロータードローンの潜在能力を最大限に引き出すために不可欠な能力である。この能力を物理プラットフォーム上で実装・試験することはリスクが高く、多くの資源を要する。そのため、健全な設計プロセスと安全な運用の両方を保証するには、物理的なプロトタイプを実装する前にシミュレーションが必要である。本稿では、Software-in-the-Loopの手法を用いて、クワッドコプタードローンを事前に定めた着陸パッドに自律的かつ効率的に着陸させ、それにより物理的試験段階のリスクを低減する単眼視覚システムの開発について述べる。本手法は、Gazeboベースのシミュレーションによって自律着陸システム全体が設計要件を満たすことを保証するだけでなく、物理的な実装に先立って安全にパラメータ調整と設計試験を行うためのツールを提供する。最後に、着陸パッド追跡に単眼視覚のみを用いる提案手法により、Odroid XU4組み込みプロセッサの標準的な計算能力を備えたF450クワッドコプタードローンにシステムを効果的に実装することができた。

Autonomous landing is a capability that is essential to achieve the full potential of multi-rotor drones in many social and industrial applications. The implementation and testing of this capability on physical platforms is risky and resource-intensive; hence, in order to ensure both a sound design process and a safe deployment, simulations are required before implementing a physical prototype. This paper presents the development of a monocular visual system, using a software-in-the-loop methodology, that autonomously and efficiently lands a quadcopter drone on a predefined landing pad, thus reducing the risks of the physical testing stage. In addition to ensuring that the autonomous landing system as a whole fulfils the design requirements using a Gazebo-based simulation, our approach provides a tool for safe parameter tuning and design testing prior to physical implementation. Finally, the proposed monocular vision-only approach to landing pad tracking made it possible to effectively implement the system in an F450 quadcopter drone with the standard computational capabilities of an Odroid XU4 embedded processor.

翻訳日:2021-08-17 14:48:43 公開日:2021-08-14

# 疎ベイズ推定のための高速非同期MCMCサンプラー

A fast asynchronous MCMC sampler for sparse Bayesian inference ( http://arxiv.org/abs/2108.06446v1 )

ライセンス: Link先を確認

Yves Atchadé and Liwei Wang

(参考訳) 疎ベイズ推定の幅広い問題に適用可能な、非常に高速な近似マルコフ連鎖モンテカルロ(MCMC)サンプリングの枠組みを提案する。いくつかのモデルでは、反復1回あたりの計算コストは $O(ns)$ のオーダーであり、ここで $n$ はサンプルサイズ、$s$ はモデルの背後にあるスパース度である。確率的勾配ランジュバン動力学を用いる場合には、データのサブサンプリングによってこのコストをさらに削減できる。このアルゴリズムはJohnsonら(2013)の非同期Gibbsサンプラーの拡張であるが、統計的観点からはベイズ版のiterated sure independence screening(Fanら(2009))の一形態とみなすこともできる。高次元線形回帰問題において、提案アルゴリズムが生成するマルコフ連鎖は、ある統計的仮定の下で主要な信号を高い確率で正しく復元する不変分布を持つことを示す。さらに、その混合時間は回帰変数の数に対して高々線形であることを示す。いくつかのモデルでアルゴリズムを例示する。

We propose a very fast approximate Markov Chain Monte Carlo (MCMC) sampling framework that is applicable to a large class of sparse Bayesian inference problems. In several models, the computational cost per iteration is of order $O(ns)$, where $n$ is the sample size and $s$ is the underlying sparsity of the model. This cost can be further reduced by data sub-sampling when stochastic gradient Langevin dynamics are employed. The algorithm is an extension of the asynchronous Gibbs sampler of Johnson et al. (2013), but can be viewed from a statistical perspective as a form of Bayesian iterated sure independence screening (Fan et al. (2009)). We show that in high-dimensional linear regression problems, the Markov chain generated by the proposed algorithm admits an invariant distribution that correctly recovers the main signal with high probability under some statistical assumptions. Furthermore, we show that its mixing time is at most linear in the number of regressors. We illustrate the algorithm with several models.

翻訳日:2021-08-17 14:46:51 公開日:2021-08-14

# AdaGNN: AdaBoostingに基づくGNNのためのマルチモーダル潜在表現メタラーナ

AdaGNN: A multi-modal latent representation meta-learner for GNNs based on AdaBoosting ( http://arxiv.org/abs/2108.06452v1 )

ライセンス: Link先を確認

Qinyi Zhu, Yiou Xiao

(参考訳) ディープラーニングの特殊な分野として、グラフニューラルネットワーク(GNN)は固有のネットワーク特徴の抽出に重点を置いており、学術界と産業界の両方で前例のない人気を集めている。最先端のGNNモデルの多くは、グラフ探索ベースの手法では計算上活用が難しい豊富なネットワーク特徴をソーシャルネットワークの推薦システムにもたらす、表現力が高く、頑健で、スケーラブルかつ帰納的な解法を提供する。最近のGNNの多くは、部分グラフからの高次元の異種情報を1つの低次元埋め込み空間に符号化するエンコーダ・デコーダのパラダイムに従っている。しかし、単一の埋め込み空間では通常、グラフ信号のすべての側面を捉えることはできない。本研究では、複数の射影と、グラフ信号の異なる側面を捉える対応する埋め込み空間を自動的に学習する、GNNのためのブースティングベースのメタ学習器を提案する。その結果、部分グラフ間の類似性は、複数の埋め込み空間上での近さによって定量化される。AdaGNNは、豊富で多様なノード近傍情報を持つアプリケーションに対して非常に良好に機能する。さらに、AdaGNNはノードレベルとエッジレベルの両方のタスクについて、任意の帰納的GNNと互換性がある。

As a special field in deep learning, Graph Neural Networks (GNNs) focus on extracting intrinsic network features and have drawn unprecedented popularity in both academia and industry. Most state-of-the-art GNN models offer expressive, robust, scalable and inductive solutions, empowering social network recommender systems with rich network features that are computationally difficult to leverage with graph-traversal-based methods. Most recent GNNs follow an encoder-decoder paradigm to encode high-dimensional heterogeneous information from a subgraph onto one low-dimensional embedding space. However, a single embedding space usually fails to capture all aspects of graph signals. In this work, we propose a boosting-based meta-learner for GNNs, which automatically learns multiple projections and the corresponding embedding spaces that capture different aspects of the graph signals. As a result, similarities between subgraphs are quantified by embedding proximity on multiple embedding spaces. AdaGNN performs exceptionally well for applications with rich and diverse node neighborhood information. Moreover, AdaGNN is compatible with any inductive GNN for both node-level and edge-level tasks.
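
The abstract names AdaBoosting but does not detail how AdaGNN trains its successive embedding spaces. For reference, the classical AdaBoost step it alludes to computes a learner weight from the weighted error and up-weights misclassified samples so that the next learner focuses on them. How AdaGNN maps this onto nodes or edges is not specified in the abstract; the function below is a generic illustration with a hypothetical name.

```python
import math


def adaboost_reweight(weights, misclassified):
    """One classical AdaBoost round: returns (alpha, new normalised sample weights).

    weights: current sample weights summing to 1.
    misclassified: booleans, True where the current weak learner erred.
    """
    err = sum(w for w, miss in zip(weights, misclassified) if miss)
    err = min(max(err, 1e-12), 1.0 - 1e-12)  # keep the log finite
    alpha = 0.5 * math.log((1.0 - err) / err)  # weight of this learner in the ensemble
    new = [w * math.exp(alpha if miss else -alpha)
           for w, miss in zip(weights, misclassified)]
    total = sum(new)
    return alpha, [w / total for w in new]


# Usage: four samples, the current learner misses only the first one.
alpha, w = adaboost_reweight([0.25] * 4, [True, False, False, False])
```

After one round the missed sample carries half of the total weight, which is the standard AdaBoost property that forces the next learner to attend to it.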

翻訳日:2021-08-17 14:46:35 公開日:2021-08-14

# ハイブリッドガウス過程モデリングを適用したバッチプロセスの経済的確率的モデル予測制御

Hybrid Gaussian Process Modeling Applied to Economic Stochastic Model Predictive Control of Batch Processes ( http://arxiv.org/abs/2108.06430v1 )

ライセンス: Link先を確認

E. Bradford, L. Imsland, M. Reble, E.A. del Rio-Chanona

(参考訳) 非線形モデル予測制御(NMPC)は、制約付きの非線形多変数動的システムを制御するための効率的な手法であるが、正確なプラントモデルを必要とする。プラントモデルは第一原理から決定できることが多いが、モデルの一部は物理法則だけで導出することが難しい。本稿では、この問題を克服するため、第一原理では記述が難しい動的システムの部分をGPでモデル化する、ハイブリッドなガウス過程(GP)・第一原理モデリングスキームを提案する。GPは正確な予測を与えるだけでなく、このモデルの残差不確実性も定量化する。制約違反や性能劣化を防ぐためには、制御アルゴリズムにおいてこの不確実性を考慮することが不可欠である。GPのモンテカルロサンプルをオフラインで生成してNMPCの制約を引き締め、オンラインで同時確率制約の充足を保証する。本手法の利点には、高速なオンライン評価時間、保守性を緩和するオンライン学習を考慮できる可能性、そしてGPの柔軟性と第一原理モデルのデータ効率の活用が含まれる。このアルゴリズムを、難しい半回分式バイオリアクターを含むケーススタディで検証する。

Nonlinear model predictive control (NMPC) is an efficient approach for the control of nonlinear multivariable dynamic systems with constraints, which, however, requires an accurate plant model. Plant models can often be determined from first principles; however, parts of the model are difficult to derive using physical laws alone. In this paper, a hybrid Gaussian process (GP) first-principles modeling scheme is proposed to overcome this issue, which exploits GPs to model the parts of the dynamic system that are difficult to describe using first principles. GPs not only give accurate predictions but also quantify the residual uncertainty of this model. It is vital to account for this uncertainty in the control algorithm to prevent constraint violations and performance deterioration. Monte Carlo samples of the GPs are generated offline to tighten the constraints of the NMPC and ensure joint probabilistic constraint satisfaction online. Advantages of our method include fast online evaluation times, the possibility to account for online learning to alleviate conservativeness, and exploiting both the flexibility of GPs and the data efficiency of first-principles models. The algorithm is verified on a case study involving a challenging semi-batch bioreactor.
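
A simplified illustration of sample-based constraint tightening, under stated assumptions: a single scalar constraint `g <= limit` at one time step, and offline Monte Carlo deviations `g_sample - g_nominal` (in the paper these would come from GP samples; here they are toy numbers). The back-off is the empirical (1 - epsilon) quantile, so at least that fraction of samples satisfies the original constraint whenever the nominal prediction meets the tightened one. The paper's joint chance constraints over the whole horizon need more than this per-constraint version, and the helper names are hypothetical.

```python
import math
import random


def empirical_backoff(deviations, epsilon):
    """Smallest sample value b such that at least (1 - epsilon) of deviations are <= b."""
    ordered = sorted(deviations)
    k = math.ceil((1.0 - epsilon) * len(ordered))  # samples that must be covered
    return ordered[min(max(k, 1), len(ordered)) - 1]


def tightened_limit(limit, deviations, epsilon):
    """The nominal constraint g_nom <= limit becomes g_nom <= limit - b."""
    return limit - empirical_backoff(deviations, epsilon)


# Usage with toy deviations standing in for GP Monte Carlo samples.
rng = random.Random(1)
devs = [rng.gauss(0.0, 0.3) for _ in range(1000)]
limit_for_nmpc = tightened_limit(10.0, devs, epsilon=0.05)
```

Computing the back-off offline keeps the online NMPC problem the same size as the nominal one, which is consistent with the fast online evaluation the abstract highlights.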

翻訳日:2021-08-17 14:44:10 公開日:2021-08-14

# LayerPipe: 層内および層間勾配パイプライン化とマルチプロセッサスケジューリングによるディープニューラルネットワーク学習の高速化

LayerPipe: Accelerating Deep Neural Network Training by Intra-Layer and Inter-Layer Gradient Pipelining and Multiprocessor Scheduling ( http://arxiv.org/abs/2108.06629v1 )

ライセンス: Link先を確認

Nanda K. Unnikrishnan and Keshab K. Parhi

(参考訳) ニューラルネットワークの学習に要する時間は、規模、複雑さ、深さとともに増加する。誤差逆伝播によるモデルパラメータの学習は、本質的にフィードバックループを生み出す。これらのループは、層内および連続する層間でのタスクの効率的なパイプライン化とスケジューリングを妨げる。PipeDreamのような従来手法は、層間パイプライン化を実現するために遅延勾配を利用してきた。しかし、これらの手法は誤差逆伝播全体を単一のタスクとして扱うため、計算時間の増加とプロセッサの低利用率を招く。本稿では、重みに関する勾配計算と活性化関数に関する勾配計算を独立に扱い、それらを並列に計算できるようにする新しい最適化手法を提案する。これを層内最適化と呼ぶ。さらに、活性化関数に関する勾配計算を2つの部分に分割し、連続する2つの層に分配する。これにより、各層の計算時間が等しくなるバランスの取れたスケジューリングが得られる。これを層間最適化と呼ぶ。LayerPipeと呼ぶ提案システムは、プロセッサ間通信のオーバーヘッドを最小限に抑えつつプロセッサ利用率を最大化し、学習に必要なクロックサイクル数を削減する。LayerPipeは、PipeDreamと比較して少ない通信オーバーヘッドで、7～9プロセッサにおいて平均25%、最大で80%を超える高速化を達成する。

The time required for training the neural networks increases with size, complexity, and depth. Training model parameters by backpropagation inherently creates feedback loops. These loops hinder efficient pipelining and scheduling of the tasks within the layer and between consecutive layers. Prior approaches, such as PipeDream, have exploited the use of delayed gradient to achieve inter-layer pipelining. However, these approaches treat the entire backpropagation as a single task; this leads to an increase in computation time and processor underutilization. This paper presents novel optimization approaches where the gradient computations with respect to the weights and the activation functions are considered independently; therefore, these can be computed in parallel. This is referred to as intra-layer optimization. Additionally, the gradient computation with respect to the activation function is further divided into two parts and distributed to two consecutive layers. This leads to balanced scheduling where the computation time of each layer is the same. This is referred to as inter-layer optimization. The proposed system, referred to as LayerPipe, reduces the number of clock cycles required for training while maximizing processor utilization with minimal inter-processor communication overhead. LayerPipe achieves an average speedup of 25% and upwards of 80% with 7 to 9 processors with less communication overhead when compared to PipeDream.
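
The intra-layer idea can be seen in a single fully connected layer $y = xW$: once the upstream gradient $\partial L/\partial y$ is known, $\partial L/\partial W = x^\top (\partial L/\partial y)$ and $\partial L/\partial x = (\partial L/\partial y) W^\top$ depend only on it and on stored values, not on each other. The sketch below (plain Python, hypothetical helper names) submits both products to a thread pool to make the missing data dependency explicit; it is not LayerPipe's scheduler, and under CPython's GIL it does not speed anything up.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul(a, b):
    """Plain nested-list matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def linear_backward(x, w, grad_y):
    """Backward pass of y = x @ w: weight and input gradients computed independently."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        grad_w = pool.submit(lambda: matmul(transpose(x), grad_y))  # needed for the update
        grad_x = pool.submit(lambda: matmul(grad_y, transpose(w)))  # sent to previous layer
        return grad_w.result(), grad_x.result()


# Usage: batch of one 2-d input, 2x3 weight matrix, upstream gradient of ones.
gw, gx = linear_backward([[1, 2]], [[1, 0, 2], [0, 1, 1]], [[1, 1, 1]])
print(gw, gx)  # [[1, 1, 1], [2, 2, 2]] [[3, 2]]
```

Only `grad_x` lies on the critical path to the previous layer, whereas `grad_w` feeds the local weight update, which is what makes it possible to schedule the two on different processors.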

翻訳日:2021-08-17 14:43:53 公開日:2021-08-14

# Graph2MDA: 微生物-薬物関連を予測するマルチモーダル変分グラフ埋め込みモデル

Graph2MDA: a multi-modal variational graph embedding model for predicting microbe-drug associations ( http://arxiv.org/abs/2108.06338v1 )

ライセンス: Link先を確認

Lei Deng, Yibiao Huang, Xuejun Liu and Hui Liu

(参考訳) 蓄積された臨床研究によると、ヒトに生息する微生物はヒト宿主と密接に相互作用し、薬効や薬物毒性の調節に関与している。微生物は抗菌剤開発の新たな標的となっている。したがって、微生物-薬物関連のスクリーニングは、医薬品の研究開発に大きく貢献しうる。微生物ゲノムおよび薬理学のデータセットの増加に伴い、新たな微生物-薬物関連を同定する効果的な計算手法を開発することへの強い動機がある。本稿では、変分グラフオートエンコーダ(VGAE)を用いて微生物-薬物関連を予測する新しい手法Graph2MDAを提案する。分子構造、微生物の遺伝子配列、機能アノテーションなど、微生物と薬物の複数の特徴に基づいてマルチモーダル属性グラフを構築した。マルチモーダル属性グラフを入力として、各ノードおよびグラフ全体の情報量が多く解釈可能な潜在表現を学習するようにVGAEを訓練し、その後、深層ニューラルネットワーク分類器を用いて微生物-薬物関連を予測した。ハイパーパラメータ解析とモデルのアブレーション研究により、本モデルの感度と頑健性を示した。3つの独立したデータセットで手法を評価し、提案手法が既存の6つの最先端手法を上回ることを実験結果で示した。また、学習された薬物の潜在表現の意味を調べ、薬物が明確なクラスタリングパターンを示し、それが薬物のATC分類と有意に一致することを見出した。さらに、2つの微生物と2つの薬物についてケーススタディを行い、予測された関連の75\%-95\%がPubMedの文献で報告されていることを確認した。広範な性能評価により、提案手法の有効性を検証した。

Accumulated clinical studies show that microbes living in humans interact closely with human hosts and are involved in modulating drug efficacy and drug toxicity. Microbes have become novel targets for the development of antibacterial agents. Therefore, screening of microbe-drug associations can greatly benefit drug research and development. With the increase of microbial genomic and pharmacological datasets, we are greatly motivated to develop an effective computational method to identify new microbe-drug associations. In this paper, we proposed a novel method, Graph2MDA, to predict microbe-drug associations by using a variational graph autoencoder (VGAE). We constructed multi-modal attributed graphs based on multiple features of microbes and drugs, such as molecular structures, microbe genetic sequences, and function annotations. Taking the multi-modal attributed graphs as input, VGAE was trained to learn the informative and interpretable latent representations of each node and the whole graph, and then a deep neural network classifier was used to predict microbe-drug associations. The hyperparameter analysis and model ablation studies showed the sensitivity and robustness of our model. We evaluated our method on three independent datasets and the experimental results showed that our proposed method outperformed six existing state-of-the-art methods. We also explored the meaningfulness of the learned latent representations of drugs and found that the drugs show obvious clustering patterns that are significantly consistent with drug ATC classification. Moreover, we conducted case studies on two microbes and two drugs and found that 75\%-95\% of the predicted associations have been reported in the PubMed literature. Our extensive performance evaluations validated the effectiveness of our proposed method.
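
Graph2MDA trains a VGAE and then feeds the learned latent representations to a separate DNN classifier. The sketch below shows only standard VGAE building blocks in Kipf and Welling's formulation: the reparameterisation $z = \mu + \sigma \epsilon$, the inner-product edge decoder, and the KL term of the loss. It assumes `mu` and `log_var` come from a GCN encoder that is not shown; none of this is taken from the Graph2MDA code, and the function names are illustrative.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """z = mu + sigma * eps with eps ~ N(0, 1), elementwise per node and dimension."""
    return [[m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu_row, lv_row)]
            for mu_row, lv_row in zip(mu, log_var)]


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def inner_product_decoder(z):
    """Edge probability p(A_ij = 1) = sigmoid(z_i . z_j) for every node pair."""
    return [[sigmoid(sum(a * b for a, b in zip(zi, zj))) for zj in z] for zi in z]


def kl_to_standard_normal(mu, log_var):
    """KL(q(Z | X, A) || N(0, I)), summed over nodes and latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for mu_row, lv_row in zip(mu, log_var)
                      for m, lv in zip(mu_row, lv_row))


# Usage: two nodes with 2-d latent parameters from a (not shown) GCN encoder.
rng = random.Random(0)
mu, log_var = [[0.0, 1.0], [2.0, -1.0]], [[0.0, 0.0], [-1.0, -1.0]]
z = reparameterize(mu, log_var, rng)
edge_probs = inner_product_decoder(z)
kl = kl_to_standard_normal(mu, log_var)
```

In Graph2MDA the downstream prediction is made by the DNN classifier on these latent vectors rather than directly by the inner-product decoder, which here only serves as the reconstruction target during VGAE training.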

翻訳日:2021-08-16 13:00:23 公開日:2021-08-14

PDF登録状況（公開日: 20210814）