Fugu-MT: arXiv Paper Translations

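This site provides Japanese translations of arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body is not CC-licensed, and for papers that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. The translations are produced with Fugu-Machine Translator.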

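The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from the use of this site (including all information and translations).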

Papers with a publication date of 20210711.

Title	Authors	Abstract	Publication date / translation date
# 量子エンタングル-プローブ散乱理論 Quantum Entangled-Probe Scattering Theory ( http://arxiv.org/abs/2008.04328v3 ) ライセンス: Link先を確認	Abu Ashik Md Irfan, Patrick Blackstone, Roger Pynn and Gerardo Ortiz	(参考訳) 我々は、量子検出を含む絡み合ったプローブ散乱理論を開発し、標準散乱アプローチの範囲を広げる。これらのプローブは、強い相関系の非伝統的な位相のような絡み合った物質の研究において革命的であるかもしれない。本発表では、光子プローブにも同様の考え方が適用されるが、スピンとパスにモード絡み合う中性子ビームプローブを[1]で実験的に実現した。従来のファンホーブ理論 [2] を一般化し、応答は2点相関関数の適切に作られた組み合わせとして記述される。プローブの絡み合い長さをチューニングすることで、差動断面の干渉パターンを分析して、興味の空間的スケールを問うことができる。注目すべきことに、スピン二量体ターゲットの場合、ターゲット状態が非絡み合っているときに観測される典型的なヤング様干渉パターンは、その状態が最大絡み合っているときに量子消去される。 We develop an entangled-probe scattering theory, including quantum detection, that extends the scope of standard scattering approaches. We argue that these probes may be revolutionary in studying entangled matter such as unconventional phases of strongly correlated systems. Our presentation focuses on a neutron beam probe that is mode-entangled in spin and path as is experimentally realized in [1], although similar ideas also apply to photon probes. We generalize the traditional van Hove theory [2] whereby the response is written as a properly-crafted combination of two-point correlation functions. Tuning the probe's entanglement length allows us to interrogate spatial scales of interest by analyzing interference patterns in the differential cross-section. Remarkably, for a spin dimer target we find that the typical Young-like interference pattern observed if the target state is un-entangled gets quantum erased when that state becomes maximally entangled.	翻訳日:2023-05-06 16:00:47 公開日:2021-07-11
# 食事アセスメントのための無料mhealthアプリの批判的特徴と一般課題 A Review of Critical Features and General Issues of Freely Available mHealth Apps For Dietary Assessment ( http://arxiv.org/abs/2008.09883v4 ) ライセンス: Link先を確認	Ghalib Ahmed Tahir, Chu Kiong Loo, Foong Ming Moy and Nadine Kong	(参考訳) 肥満は生活の質を著しく低下させることが知られている。糖尿病、心血管疾患、様々ながんなどの非感染性疾患の頻度の増加と関連していることが多い。食事関連のモバイルアプリケーションは、個人の健康的な選択や食物摂取の追跡を支援する上で重要な役割を担っているという証拠がある。しかし、類似したアプリケーションが豊富にあるため、機能、ユーザビリティ、および設計上の問題の観点からそれぞれを評価し、将来に向けて最先端のソリューションを真に決定することが重要になる。これらのアプリケーションは、異なる食生活者からの複数のユーザー要求とレコメンデーションを実装しているため、評価は非常に複雑になる。そこで本研究では,既存の食事用アプリケーションについて検討し,アプリケーションのユーザビリティを損なうおそれのある重要な特徴と問題点を強調する。本研究は, PUBMED, CINAHL (2010年1月～2019年12月) およびScience Direct (2010-2019年) の各種学術データベースから, 論文の公開について検討した。我々はPRISMAガイドラインに従い,本研究の結果から,同定,スクリーニング,適性,全文評価の56%が包括的基準を満たした。選択した研究から35のアプリを分析し,特定した各アプリのデータを抽出した。自由に利用可能なmhealthアプリケーションの包括性に関する詳細な分析を行った結果,今後の研究課題を特定し,臨床的に正確な食事関連アプリケーションを開発するための推奨事項を述べた。 Obesity is known to lower the quality of life substantially. It is often associated with increased chances of non-communicable diseases such as diabetes, cardiovascular problems, various cancers, etc. Evidence suggests that diet-related mobile applications play a vital role in assisting individuals in making healthier choices and keeping track of food intake. However, due to an abundance of similar applications, it becomes pertinent to evaluate each of them in terms of functionality, usability, and possible design issues to truly determine state-of-the-art solutions for the future. Since these applications involve implementing multiple user requirements and recommendations from different dietitians, the evaluation becomes quite complex. Therefore, this study aims to review existing dietary applications at length to highlight key features and problems that enhance or undermine an application's usability. 
For this purpose, we have examined the published literature from various scientific databases of the PUBMED, CINAHL (January 2010-December 2019) and Science Direct (2010-2019). We followed PRISMA guidelines, and out of our findings, fifty-six primary studies met our inclusion criteria after identification, screening, eligibility and full-text evaluation. We analyzed 35 apps from the selected studies and extracted the data of each of the identified apps. Following our detailed analysis on the comprehensiveness of freely available mHealth applications, we specified potential future research challenges and stated recommendations to help grow clinically accurate diet-related applications.	翻訳日:2023-05-05 05:59:00 公開日:2021-07-11
# 非ブローチパリティ時対称性と例外点の観測 Observation of non-Bloch parity-time symmetry and exceptional points ( http://arxiv.org/abs/2009.07288v2 ) ライセンス: Link先を確認	Lei Xiao, Tianshu Deng, Kunkun Wang, Zhong Wang, Wei Yi, Peng Xue	(参考訳) パリティ時(PT)対称ハミルトニアンは非エルミート物理学において広く重要である。 pt対称ハミルトニアンは、実または複素の固有スペクトルを持つ異なる位相を示すことができ、一方、中間の遷移点(いわゆる例外点)は、アプリケーションにとって大きな期待を持つ重要な振る舞いのホストとなる。空間的に周期的な非エルミート系では、pt対称性は一般にブロッホバンド理論に沿って特徴づけられ観測され、ブリルアンゾーンに例外的な点がある。ここでは、単光子の非単位量子ウォークにおいて、この共通の知恵を超えた例外的な点の族が発見される。これらの「非ブロックの例外点」は、非エルミティアスキン効果として知られる境界付近のバルク固有状態の蓄積に由来する。以上の結果から,pt対称性と非エルミート皮膚効果との間に興味深い相互作用が認められた。 Parity-time (PT)-symmetric Hamiltonians have widespread significance in non-Hermitian physics. A PT-symmetric Hamiltonian can exhibit distinct phases with either real or complex eigenspectrum, while the transition points in between, the so-called exceptional points, give rise to a host of critical behaviors that holds great promise for applications. For spatially periodic non-Hermitian systems, PT symmetries are commonly characterized and observed in line with the Bloch band theory, with exceptional points dwelling in the Brillouin zone. Here, in nonunitary quantum walks of single photons, we uncover a novel family of exceptional points beyond this common wisdom. These "non-Bloch exceptional points" originate from the accumulation of bulk eigenstates near boundaries, known as the non-Hermitian skin effect, and inhabit a generalized Brillouin zone. Our finding opens the avenue toward a generalized PT-symmetry framework, and reveals the intriguing interplay between PT symmetry and non-Hermitian skin effect.	翻訳日:2023-05-02 04:16:49 公開日:2021-07-11
# フォトニック符号化パラフェミオンのトポロジカルな文脈性と正準統計 Topological contextuality and anyonic statistics of photonic-encoded parafermions ( http://arxiv.org/abs/2011.05008v2 ) ライセンス: Link先を確認	Zheng-Hao Liu, Kai Sun, Jiannis K. Pachos, Mu Yang, Yu Meng, Yu-Wei Liao, Qiang Li, Jun-Feng Wang, Ze-Yu Luo, Yi-Fei He, Dong-Yu Huang, Guang-Rui Ding, Jin-Shi Xu, Yong-Jian Han, Chuan-Feng Li and Guang-Can Guo	(参考訳) マヨラナゼロモード状態の測定中に生じると思われる準粒子中毒は、マヨラナベースの量子計算の実現に向けて根本的な問題となる。パラフェルミオン(Parafermions)はマヨラナのフェルミオンの自然な一般化であり、準粒子中毒に免疫するトポロジカルクォーディットをコードする。パラフェルミオンは超伝導分数量子ホール系で現れることが期待されているが、現在の技術では実現できない。この問題を回避するため,我々はフォトニック量子シミュレータを用いてパラフェルミオンに基づく普遍量子計算の重要なコンポーネントを実験的に実証する。この記事への私たちの貢献は2つあります。まず、フォトニック状態を操作することで、パラフェルミオンのブレイディング統計に対応するクリフォード作用素ベリー相を実現する。第2に、パラフェルミオン符号化量子量子状態の文脈性を示すことにより、トポロジカル系の量子文脈性が初めて研究される。重要なことに、トポロジカルに符号化された文脈性は、状態蒸留の魔法の道を開く一方、コンテキスト性とブレイディングによって引き起こされるクリフォードゲートは局所雑音に対して弾力性を持つ。文脈性を導入することで、我々のフォトニック量子シミュレーションは、トポロジカル量子計算を実現するための物理的に堅牢な方法論への第一歩を提供する。 Quasiparticle poisoning, expected to arise during the measurement of Majorana zero mode state, poses a fundamental problem towards the realization of Majorana-based quantum computation. Parafermions, a natural generalization of Majorana fermions, can encode topological qudits immune to quasiparticle poisoning. While parafermions are expected to emerge in superconducting fractional quantum Hall systems, they are not yet attainable with current technology. To bypass this problem, we employ a photonic quantum simulator to experimentally demonstrate the key components of parafermion-based universal quantum computation. Our contributions in this article are twofold. First, by manipulating the photonic states, we realize Clifford operator Berry phases that correspond to braiding statistics of parafermions. Second, we investigate the quantum contextuality in a topological system for the first time by demonstrating the contextuality of parafermion encoded qudit states. 
Importantly, we find that the topologically-encoded contextuality opens the way to magic state distillation, while both the contextuality and the braiding-induced Clifford gates are resilient against local noise. By introducing contextuality, our photonic quantum simulation provides the first step towards a physically robust methodology for realizing topological quantum computation.	翻訳日:2023-04-24 19:16:01 公開日:2021-07-11
# コヒーレント状態と猫状態に対するLeggett-Garg不等式の違反 Violations of the Leggett-Garg inequality for coherent and cat states ( http://arxiv.org/abs/2101.06866v3 ) ライセンス: Link先を確認	Hiroo Azuma, Masashi Ban	(参考訳) 数値計算により,コヒーレント状態が猫状態よりもレゲット・ガーグ不等式(LGI)により大きな違反を生じうることを示す。そこで本研究では, ゼロ温度環境に弱結合した空洞モードのLGIを物理系の実例として考察した。ボソニックモードは,環境との相互作用により散逸するが,位相緩和の影響は受けないと仮定する。マスター方程式を厳密に解き、最初にコヒーレント状態 $|\alpha\rangle$ と猫状態 $(|\alpha\rangle+|-\alpha\rangle)$ に準備された両システムについて、不等式の破れの明示的な形式を導出する。不等式の評価には、複素数$\beta$を特徴とする変位パリティ作用素を選択する。我々は不等式の上限を数値的に最大にする最適なパラメータ$\beta$を求める。我々の期待に反して、コヒーレント状態は、等間隔の3つの測定時間(間隔$\tau$)の特定の範囲において、LGIの違反の上限に関して猫状態よりも強い量子性を示すことがある。さらに、$\tau$ が 0 に近づくと、最適化されたパラメータ $\beta$ が発散し、LGI は強い特異性を示す。 We show that in some cases the coherent state can have a larger violation of the Leggett-Garg inequality (LGI) than the cat state by numerical calculations. To achieve this result, we consider the LGI of the cavity mode weakly coupled to a zero-temperature environment as a practical instance of the physical system. We assume that the bosonic mode undergoes dissipation because of an interaction with the environment but is not affected by dephasing. Solving the master equation exactly, we derive an explicit form of the violation of the inequality for both systems prepared initially in the coherent state $|\alpha\rangle$ and the cat state $(|\alpha\rangle+|-\alpha\rangle)$. For the evaluation of the inequality, we choose the displaced parity operators characterized by a complex number $\beta$. We look for the optimum parameter $\beta$ that lets the upper bound of the inequality be maximum numerically. Contrary to our expectations, the coherent state occasionally exhibits quantum quality more strongly than the cat state for the upper bound of the violation of the LGI in a specific range of three equally spaced measurement times (spacing $\tau$). Moreover, as we let $\tau$ approach zero, the optimized parameter $\beta$ diverges and the LGI reveals intense singularity.	翻訳日:2023-04-14 21:25:14 公開日:2021-07-11
# スピン量子ビットアドレスを持つ2次元si量子ドットアレイの設計 Designs for a two-dimensional Si quantum dot array with spin qubit addressability ( http://arxiv.org/abs/2106.11124v2 ) ライセンス: Link先を確認	Masahiro Tadokoro, Takashi Nakajima, Takashi Kobayashi, Kenta Takeda, Akito Noiri, Kaito Tomari, Jun Yoneda, Seigo Tarucha, and Tetsuo Kodera	(参考訳) Siの電子スピンは、スケーラビリティと高速で高忠実な量子論理ゲートを基盤として、量子計算の魅力的なプラットフォームである。しかし、中規模から大規模の量子計算において、量子ビット間の効率的な接続と2次元の統合が重要であるにもかかわらず、量子ビットのアドレス性を保証する実用的なデバイス設計はまだ見つからない。本稿では,実用的な3 x 3量子ドットデバイスの設計と,長期的ターゲットとしての大規模設計を提案する。設計目標は、アドレス性を確保しながら、近接する4つの隣人とのqubit接続を実現することである。 3x3量子ドットアレイは, 1次元よりも効率よく4量子Groverのアルゴリズムを実行できることを示す。 3×3以上の二次元配列をスケールアップするために,強磁性ゲート電極を用いた新しい構造を提案する。以上の結果から,si中規模の量子プロセッサが高速な量子論理ゲートと長いコヒーレンス時間を持つ可能性を示す。 Electron spins in Si are an attractive platform for quantum computation, backed with their scalability and fast, high-fidelity quantum logic gates. Despite the importance of two-dimensional integration with efficient connectivity between qubits for medium- to large-scale quantum computation, however, a practical device design that guarantees qubit addressability is yet to be seen. Here, we propose a practical 3 x 3 quantum dot device design and a larger-scale design as a longer-term target. The design goal is to realize qubit connectivity to the four nearest neighbors while ensuring addressability. We show that a 3 x 3 quantum dot array can execute four-qubit Grover's algorithm more efficiently than the one-dimensional counterpart. To scale up the two-dimensional array beyond 3 x 3, we propose a novel structure with ferromagnetic gate electrodes. Our results showcase the possibility of medium-sized quantum processors in Si with fast quantum logic gates and long coherence times.	翻訳日:2023-03-25 23:15:29 公開日:2021-07-11
# 電荷励起と電偏光に対する創発的非エルミート境界寄与 Emergent non-Hermitian boundary contributions to charge pumping and electric polarization ( http://arxiv.org/abs/2106.14173v2 ) ライセンス: Link先を確認	K. Kyriakou and K. Moulopoulos	(参考訳) 電荷ポンプ現象と現代の電気偏光理論は、創発的非エルミート的貢献を考慮した解析的に再考される。これらは速度作用素の拡張定義を用いて説明され、ここで初めて導かれる動的ヘルマン=ファインマンの定理(DHFT)によって決定される。 DHFTは一般化されたベリー曲率を導入し、観測可能量の非摂動計算には有効である。拡張速度演算子を用いて、電荷ポンプが材料の境界とどのように結びついているか(この接続には非ハーミティティーが不可欠である)を厳密に示し、DHFTにより、周期ゲージがフロケ・ブロッホ状態に適用できないとき、駆動非平衡過程における非可積分アハロノフ・アンダン相により、励起電荷のよく知られた位相量子化が崩壊することを示す。同様に、電子分極変化は、現代の電気分極理論では見過ごせない、追加の非エルミート的寄与を持つことを示す。非エルミート寄与は定義によって、バルク積分を境界量に変換する対称構造により等しく境界量として評価できるバルク量である。この非エルミート的寄与は、波動関数に課される現実的な境界条件に非常に敏感であり、偏極変化を引き起こす過程中に境界上の電荷蓄積が存在する偏光絶縁体において重要であることが期待される。最後に、よく定義された曲面電荷定理が境界非エルミート寄与の観点から定式化できることを示す。 The phenomenon of charge pumping and the modern theory of electric polarization are reconsidered by analytically taking into account emergent non-Hermitian contributions. These are accounted for through the use of an extended definition of the velocity operator and are determined by means of a dynamic Hellmann-Feynman theorem (DHFT) that we derive here for the first time. The DHFT introduces generalized Berry curvatures and it is valid for calculating observables nonperturbatively, hence with results valid to all orders of the external fields. By using the extended velocity operator we rigorously show how the charge pumping is linked up with the boundaries of the material (with the non-Hermiticity being essential for this connection), and by means of the DHFT we show that the well-known topological quantization of the pumped charge breaks down due to a nonintegrable Aharonov-Anandan phase in driven non-equilibrium processes whenever the periodic gauge cannot be applied to the Floquet-Bloch states. Likewise, we show that the electronic polarization change has an additional non-Hermitian contribution, which is overlooked in the modern theory of electric polarization. 
The non-Hermitian contribution is by definition a bulk quantity that may equally be evaluated as a boundary quantity due to a symmetric structure that allows the bulk integration to be transformed into a boundary one. This non-Hermitian contribution is very sensitive to the realistic boundary conditions imposed on the wavefunctions and it is therefore expected to be significant in biased insulators where charge accumulation over their boundaries is present during the process that causes the polarization change. Finally, we show how a well-defined surface-charge theorem can be formulated in terms of the boundary non-Hermitian contribution.	翻訳日:2023-03-24 23:25:46 公開日:2021-07-11
# 熱場二重状態に対する奇数絡み合いエントロピーと対数ネガティクス Odd Entanglement Entropy and Logarithmic Negativity for Thermofield Double States ( http://arxiv.org/abs/2106.15451v2 ) ライセンス: Link先を確認	Mostafa Ghasemi, Ali Naseh and Reza Pirmoradian	(参考訳) 共分散行列を用いた自由スカラー量子場理論における熱場倍(TFD)状態に対する奇絡エントロピー(OEE)と対数ネガティビティ(LN)の時間発展について検討する。混合状態を持つためには、TFDの各辺の隣接区間または非連結区間である非補間サブシステムを選択する。我々は,OEEの時間進化パターンが線形成長であり,飽和が続くことを見出した。円格子上では、長い時間にわたって有限サイズ効果は振動挙動として表される。質量の消滅の限界では、TFDの両側に1度の自由度を含むサブシステムに対して、中間期の対数的成長をもたらすOEEの時間的進化に対するゼロモードの影響を解析的に見出す。さらに、隣接する区間では、LN は $t < \beta/2$ (逆温度の半分) で 0 であり、その後に線形に成長し始める。一定温度での解離区間については、時間$t<d/2$(間隔間の距離の半分)でLNの消滅が観測される。また、同じような遅延があり、$\Delta S=S_{\text{OEE}}-S_{\text{EE}}$の線形成長が見られる。これらの結果は、対数的成長とは別に、これらの測定の力学が準粒子像と一致していることを示している。 We investigate the time evolution of odd entanglement entropy (OEE) and logarithmic negativity (LN) for the thermofield double (TFD) states in free scalar quantum field theories using the covariance matrix approach. To have mixed states, we choose non-complementary subsystems, either adjacent or disjoint intervals on each side of the TFD. We find that the time evolution pattern of OEE is a linear growth followed by saturation. On a circular lattice, for longer times the finite size effect demonstrates itself as oscillatory behavior. In the limit of vanishing mass, for a subsystem containing a single degree of freedom on each side of the TFD, we analytically find the effect of zero-mode on the time evolution of OEE which leads to logarithmic growth in the intermediate times. Moreover, for adjacent intervals we find that the LN is zero for times $t < \beta/2$ (half of the inverse temperature) and after that, it begins to grow linearly. For disjoint intervals at fixed temperature, the vanishing of LN is observed for times $t<d/2$ (half of the distance between intervals). We also find a similar delay to see linear growth of $\Delta S=S_{\text{OEE}}-S_{\text{EE}}$. 
All these results show that the dynamics of these measures are consistent with the quasi-particle picture, of course apart from the logarithmic growth.	翻訳日:2023-03-24 19:32:58 公開日:2021-07-11
# 超伝導体を介する長距離磁気双極子-双極子相互作用 Long range magnetic dipole-dipole interaction mediated by a superconductor ( http://arxiv.org/abs/2107.05130v1 ) ライセンス: Link先を確認	Yoav Romach, Tal Wasserman, Shai Tishby, Nir Bar-Gill	(参考訳) 量子計算とシミュレーションは、空間的に分離される可能性がある量子ビット間の強いコヒーレント結合を必要とする。固体ベースのスピン量子ビットに対するこの結合を達成することは、長年の課題である。本稿では、量子ビットによって生成された磁束を導出する超伝導ナノ構造に基づいて、そのようなカップリングを実現する手法を理論的に検討する。超伝導層内のナノファブリケート開口の直下に位置するスピン量子ビットを描いた磁気双極子による磁場の半古典的解析計算とシミュレーションについて述べる。このような構造は磁束を流し、スピン量子ビット間の双極子-双極子相互作用を強化し、そのスケーリングを距離で変化させることで、相互作用するスピン系を制御可能とした。 Quantum computation and simulation requires strong coherent coupling between qubits, which may be spatially separated. Achieving this coupling for solid-state based spin qubits is a long-standing challenge. Here we theoretically investigate a method for achieving such coupling, based on superconducting nano-structures designed to channel the magnetic flux created by the qubits. We detail semi-classical analytical calculations and simulations of the magnetic field created by a magnetic dipole, depicting the spin qubit, positioned directly below nanofabricated apertures in a superconducting layer. We show that such structures could channel the magnetic flux, enhancing the dipole-dipole interaction between spin qubits and changing its scaling with distance, thus potentially paving the way for controllably engineering an interacting spin system.	翻訳日:2023-03-22 20:11:53 公開日:2021-07-11
# 基底状態エネルギー密度問題の計算複雑性 Computational Complexity of the Ground State Energy Density Problem ( http://arxiv.org/abs/2107.05060v1 ) ライセンス: Link先を確認	James D. Watson, Toby S. Cubitt	(参考訳) 無限格子サイズの熱力学的極限における格子上の局所ハミルトニアンの基底状態エネルギー密度を求める複雑さについて検討する。我々はこれを関数問題として厳密に定式化し、そこでは基底状態エネルギー密度を特定の精度で推定し、等価な約束問題 $\mathsf{GSED}$ として、基底状態エネルギー密度が指定されたしきい値以上か以下かを問う。基底状態エネルギー密度問題は、その基底状態エネルギー密度が一定の実数である熱力学の極限において、単一の固定ハミルトニアンに関係しているという点で珍しい。計算問題に対する唯一の入力は、基底状態エネルギー密度に対応する固定実数を計算する精度である。したがって、複雑性クラスに対するこの問題の困難性は、クラス内のすべての問題の解がこの1つの数で符号化されることを意味する(計算可能性理論におけるChaitinの定数に類似している)。これは、熱力学的極限における単一のハミルトニアンの物理的性質に通常関係する凝縮系物理学でよく見られる問題の一種を計算論的に捉えている。 2次元正方格子上の古典的、並進不変、最近接ハミルトニアンに対しては $\mathsf{P}^{\mathsf{NEEXP}}\subseteq\mathsf{EXP}^{\mathsf{GSED}}\subseteq \mathsf{EXP}^{\mathsf{NEXP}}$、量子ハミルトニアンでは $\mathsf{P}^{\mathsf{NEEXP}}\subseteq\mathsf{EXP}^{\mathsf{GSED}}\subseteq \mathsf{EXP}^{\mathsf{QMA}_{EXP}}$ であることを示す。 oracleの定義に関する技術的な注意事項により、これらの結果のいくつかにおける$\mathsf{EXP}$は$\mathsf{PSPACE}$に強化できる。また、$\mathsf{GSED}$の関数バージョンに対する類似の複雑性境界を与える。 We study the complexity of finding the ground state energy density of a local Hamiltonian on a lattice in the thermodynamic limit of infinite lattice size. We formulate this rigorously as a function problem, in which we request an estimate of the ground state energy density to some specified precision; and as an equivalent promise problem, $\mathsf{GSED}$, in which we ask whether the ground state energy density is above or below specified thresholds. The ground state energy density problem is unusual, in that it concerns a single, fixed Hamiltonian in the thermodynamic limit, whose ground state energy density is just some fixed, real number. The only input to the computational problem is the precision to which to estimate this fixed real number, corresponding to the ground state energy density. 
Hardness of this problem for a complexity class therefore implies that the solutions to all problems in the class are encoded in this single number (analogous to Chaitin's constant in computability theory). This captures computationally the type of question most commonly encountered in condensed matter physics, which is typically concerned with the physical properties of a single Hamiltonian in the thermodynamic limit. We show that for classical, translationally invariant, nearest neighbour Hamiltonians on a 2D square lattice, $\mathsf{P}^{\mathsf{NEEXP}}\subseteq\mathsf{EXP}^{\mathsf{GSED}}\subseteq \mathsf{EXP}^{\mathsf{NEXP}}$, and for quantum Hamiltonians $\mathsf{P}^{\mathsf{NEEXP}}\subseteq\mathsf{EXP}^{\mathsf{GSED}}\subseteq \mathsf{EXP}^{\mathsf{QMA}_{EXP}}$. With some technical caveats on the oracle definitions, the $\mathsf{EXP}$ in some of these results can be strengthened to $\mathsf{PSPACE}$. We also give analogous complexity bounds for the function version of $\mathsf{GSED}$.	翻訳日:2023-03-22 20:11:40 公開日:2021-07-11
# 対実プロトコルの対実性解析のための3つのアプローチ Three approaches for analyzing the counterfactuality of counterfactual protocols ( http://arxiv.org/abs/2107.05055v1 ) ライセンス: Link先を確認	Alon Wander, Eliahu Cohen and Lev Vaidman	(参考訳) 対物通信プロトコルは古典的議論、弱いトレース基準、フィッシャー情報基準の3つのアプローチを用いて分析される。古典的な分析は矛盾を招き、従って放棄されるべきである。弱いトレースとフィッシャー情報基準は, ポストセレクションを含む通信プロトコルの非現実性の程度に一致している。ポストセレクションは反事実コミュニケーションプロトコルの必要な要素であると主張する。コヒーレントな相互作用実験と、弱いトレースを除去する対実的通信装置の最近の変更について論じる。 Counterfactual communication protocols are analysed using three approaches: a classical argument, the weak trace criterion, and the Fisher information criterion. It is argued that the classical analysis leads to contradiction and should therefore be abandoned. The weak trace and Fisher information criteria are shown to agree about the degree of counterfactuality of communication protocols involving postselection. It is argued that postselection is a necessary ingredient of counterfactual communication protocols. Coherent interaction experiments, as well as a recently introduced modification of counterfactual communication setups which eliminates the weak trace, are discussed.	翻訳日:2023-03-22 20:10:33 公開日:2021-07-11
# 光浮遊誘電体ナノ粒子の双極子散乱のイメージング Imaging the dipole scattering of an optically levitated dielectric nanoparticle ( http://arxiv.org/abs/2107.05042v1 ) ライセンス: Link先を確認	Yuanbin Jin, Jiangwei Yan, Shah Jee Rahman, Xudong Yu and Jing Zhang	(参考訳) ナノ粒子の双極子散乱を高数値開口(NA)イメージングシステムを用いて実験的に観察した。光浮上性ナノ粒子は、粒子-基板相互作用のない環境を提供する。我々は、散乱収集に使用する高NA目標に強く焦点を絞った1064nmトラップレーザビームの伝播方向に対して直交する532nmレーザービームで真空中でシリカナノ粒子を照射し、暗背景と高信号ノイズ比をもたらす。入射レーザの線形偏光により誘起されるナノ粒子の双極子配向を、照明光偏光を回転させる際の像内の散乱光分布とフーリエ空間(k空間)を測定することにより研究した。顕微鏡対象の光学軸に沿ってナノ粒子の双極子配向が整列している場合には、特別に偏光渦(ベクトルビーム)が観察される。我々の研究は、カーカー条件で散乱異方性を研究するための重要なプラットフォームを提供する。 We experimentally observe the dipole scattering of a nanoparticle using a high numerical aperture (NA) imaging system. The optically levitated nanoparticle provides an environment free of particle-substrate interaction. We illuminate the silica nanoparticle in vacuum with a 532 nm laser beam orthogonally to the propagation direction of the 1064 nm trapping laser beam strongly focused by the same high NA objective used to collect the scattering, which results in a dark background and high signal-noise ratio. The dipole orientations of the nanoparticle induced by the linear polarization of the incident laser are studied by measuring the scattering light distribution in the image and the Fourier space (k-space) as we rotate the illuminating light polarization. The polarization vortex (vector beam) is observed for the special case, when the dipole orientation of the nanoparticle is aligned along the optical axis of the microscope objective. Our work offers an important platform for studying the scattering anisotropy with Kerker conditions.	翻訳日:2023-03-22 20:10:23 公開日:2021-07-11
# 1次元および2次元タイト結合格子の量子輸送と局在 Quantum transport and localization in 1d and 2d tight-binding lattices ( http://arxiv.org/abs/2107.05035v1 ) ライセンス: Link先を確認	Amir H. Karamlou, Jochen Braumüller, Yariv Yanay, Agustin Di Paolo, Patrick Harrington, Bharath Kannan, David Kim, Morten Kjaergaard, Alexander Melville, Sarah Muschinske, Bethany Niedzielski, Antti Vepsäläinen, Roni Winik, Jonilyn L. Yoder, Mollie Schwartz, Charles Tahan, Terry P. Orlando, Simon Gustavsson and William D. Oliver	(参考訳) 凝縮系における粒子輸送と局在現象は、タイト結合格子ハミルトニアンを用いてモデル化することができる。このようなモデルの理想的な実験エミュレーションは、高コヒーレントな量子システムにおいて、各格子サイトの同時かつ高忠実な制御と読み出しを利用する。ここでは, 完全に制御可能な$3 \times 3$配列の超伝導量子ビットでエミュレートした1次元および2次元のタイト結合格子における量子輸送を実験的に研究する。格子内における絡み合いの伝播を探索し,サイトごとに調整可能な乱れの強さと勾配の存在下で、アンダーソン領域およびワニエ・シュタルク領域における局在度を抽出する。この結果は数値シミュレーションと定量的に一致し,タイト結合モデルに基づく理論予測と一致する。システムオブザーバブルの抽出における実験的制御と精度の実証レベルは、数値シミュレーションが困難になる大きな相互作用格子の探索を可能にする。 Particle transport and localization phenomena in condensed-matter systems can be modeled using a tight-binding lattice Hamiltonian. The ideal experimental emulation of such a model utilizes simultaneous, high-fidelity control and readout of each lattice site in a highly coherent quantum system. Here, we experimentally study quantum transport in one-dimensional and two-dimensional tight-binding lattices, emulated by a fully controllable $3 \times 3$ array of superconducting qubits. We probe the propagation of entanglement throughout the lattice and extract the degree of localization in the Anderson and Wannier-Stark regimes in the presence of site-tunable disorder strengths and gradients. Our results are in quantitative agreement with numerical simulations and match theoretical predictions based on the tight-binding model. The demonstrated level of experimental control and accuracy in extracting the system observables of interest will enable the exploration of larger, interacting lattices where numerical simulations become intractable.	翻訳日:2023-03-22 20:10:06 公開日:2021-07-11
# 量子近似最適化アルゴリズムによる最大確率検出 Quantum Approximate Optimization Algorithm Based Maximum Likelihood Detection ( http://arxiv.org/abs/2107.05020v1 ) ライセンス: Link先を確認	Jingjing Cui, Yifeng Xiong, Soon Xin Ng, Lajos Hanzo	(参考訳) 量子技術の最近の進歩は、ノイズの多い中間スケール量子(NISQ)デバイスへの道を切り開いており、そこでは量子近似最適化アルゴリズム(QAOAs)が、NISQデバイスに基づく有形量子優位性を示す有望な候補となっている。本稿では,複数入力および複数出力(MIMO)チャネル上で送信されるバイナリシンボルの最大確率(ML)検出問題について考察する。本稿では、2pの変動パラメータを持つレベルpのQAOA回路に関心の問題を符号化することにより、ML検出にQAOAを適用する。このレベルp qaoa回路は、本問題と初期ハミルトニアンをp次ラウンドで交互に適用することにより構成される。より明確に、我々はまずML検出問題の最適解をハミルトニアン問題の基底状態に符号化する。量子断熱進化法を用いて,ml検出に用いる量子システムの固有値の進化を特徴付ける解析結果と数値値の両方を提供する。そして、レベル1のQAOA回路に対して、QAOAの期待値の解析式を導出し、QAOAベースのML検出器の複雑さについて議論する。本稿では,従来の最適化器の計算複雑性と,QAOAをシミュレーションするストレージ要件について検討する。最後に、QAOAベースのML検出器のビット誤り率(BER)を評価し、従来のML検出器と従来のMMSE検出器の両方と比較し、QAOAベースのML検出器が従来のML検出器の性能に近づくことができることを示した。これにより、NISQコンピュータによって解決される大規模な古典的最適化問題のホストの道が開ける。 Recent advances in quantum technologies pave the way for noisy intermediate-scale quantum (NISQ) devices, where quantum approximation optimization algorithms (QAOAs) constitute promising candidates for demonstrating tangible quantum advantages based on NISQ devices. In this paper, we consider the maximum likelihood (ML) detection problem of binary symbols transmitted over a multiple-input and multiple-output (MIMO) channel, where finding the optimal solution is exponentially hard using classical computers. Here, we apply the QAOA for the ML detection by encoding the problem of interest into a level-p QAOA circuit having 2p variational parameters, which can be optimized by classical optimizers. This level-p QAOA circuit is constructed by applying the prepared Hamiltonian to our problem and the initial Hamiltonian alternately in p consecutive rounds. More explicitly, we first encode the optimal solution of the ML detection problem into the ground state of a problem Hamiltonian. 
Using the quantum adiabatic evolution technique, we provide both analytical and numerical results for characterizing the evolution of the eigenvalues of the quantum system used for ML detection. Then, for level-1 QAOA circuits, we derive the analytical expressions of the expectation values of the QAOA and discuss the complexity of the QAOA based ML detector. Explicitly, we evaluate the computational complexity of the classical optimizer used and the storage requirement of simulating the QAOA. Finally, we evaluate the bit error rate (BER) of the QAOA based ML detector and compare it both to the classical ML detector and to the classical MMSE detector, demonstrating that the QAOA based ML detector is capable of approaching the performance of the classical ML detector. This paves the way for a host of large-scale classical optimization problems to be solved by NISQ computers.	翻訳日:2023-03-22 20:09:53 公開日:2021-07-11
# 微分マップエリートによる自己参照品質の多様性 Self-Referential Quality Diversity Through Differential Map-Elites ( http://arxiv.org/abs/2107.04964v1 ) ライセンス: Link先を確認	Tae Jong Choi and Julian Togelius	(参考訳) Differential MAP-ElitesはCVT-MAP-Elitesの照明能力と微分進化の連続空間最適化能力を組み合わせた新しいアルゴリズムである。このアルゴリズムは、照明アルゴリズムと品質多様性アルゴリズムが、進化的計算のための定性的に新しい機能と応用を提供するという観察によって動機付けられている。ここで初めて導入された基本的な微分 MAP-Elites アルゴリズムは、微分進化の演算子とCVT-MAP-Elites の写像構造を単純に組み合わせることで比較的単純である。 25の数値最適化問題に基づく実験により、差分MAP-エリートはCVT-MAP-エリートよりも明らかに優れ、より高品質で多様な解が見つかることが示唆された。 Differential MAP-Elites is a novel algorithm that combines the illumination capacity of CVT-MAP-Elites with the continuous-space optimization capacity of Differential Evolution. The algorithm is motivated by observations that illumination algorithms, and quality-diversity algorithms in general, offer qualitatively new capabilities and applications for evolutionary computation yet are in their original versions relatively unsophisticated optimizers. The basic Differential MAP-Elites algorithm, introduced for the first time here, is relatively simple in that it simply combines the operators from Differential Evolution with the map structure of CVT-MAP-Elites. Experiments based on 25 numerical optimization problems suggest that Differential MAP-Elites clearly outperforms CVT-MAP-Elites, finding better-quality and more diverse solutions.	翻訳日:2023-03-22 20:09:20 公開日:2021-07-11
# Designing Recommender Systems to Depolarize ( http://arxiv.org/abs/2107.04953v1 ) License: see link	Jonathan Stray	Polarization is implicated in the erosion of democracy and the progression to violence, which makes the polarization properties of large algorithmic content selection systems (recommender systems) a matter of concern for peace and security. While algorithm-driven social media does not seem to be a primary driver of polarization at the country level, it could be a useful intervention point in polarized societies. This paper examines algorithmic depolarization interventions with the goal of conflict transformation: not suppressing or eliminating conflict but moving towards more constructive conflict. Algorithmic intervention is considered at three stages: which content is available (moderation), how content is selected and personalized (ranking), and content presentation and controls (user interface). Empirical studies of online conflict suggest that the exposure diversity intervention proposed as an antidote to "filter bubbles" can be improved and can even worsen polarization under some conditions. Using civility metrics in conjunction with diversity in content selection may be more effective. 
However, diversity-based interventions have not been tested at scale and may not work in the diverse and dynamic contexts of real platforms. Instead, intervening in platform polarization dynamics will likely require continuous monitoring of polarization metrics, such as the widely used "feeling thermometer." These metrics can be used to evaluate product features, and potentially engineered as algorithmic objectives. It may further prove necessary to include polarization measures in the objective functions of recommender algorithms to prevent optimization processes from creating conflict as a side effect.	Translated: 2023-03-22 20:09:04 Published: 2021-07-11
# Collision-induced spin noise ( http://arxiv.org/abs/2107.04942v1 ) License: see link	Shiming Song, Min Jiang, Yushu Qin, Yu Tong, Wenzhe Zhang, Xi Qin, Ren-Bao Liu, Xinhua Peng	Collision phenomena are ubiquitous and important in determining the microscopic structures and intermolecular interactions of atoms and molecules. The existing approaches are mostly based on atomic or molecular scattering and are hindered by the inconvenience of using ultra-high vacuum and low-temperature systems. Here we demonstrate a new spin-noise spectroscopic approach that measures the optical polarization rotation noise of the probe light and operates with simple apparatus under ambient conditions. Our approach features tens of gigahertz bandwidth and one part-per-million resolution, outperforming existing spin-noise techniques. Enabled by the new technique, we observe the collision-induced spin noise of alkali atoms, and precisely determine key collision parameters, such as collision diameter, well depth, and dominant interaction type. Our work provides a new tool to study a broad range of collision phenomena under ambient conditions.	Translated: 2023-03-22 20:08:34 Published: 2021-07-11
# Random Features for Kernel Approximation: A Survey on Algorithms, Theory, and Beyond ( http://arxiv.org/abs/2004.11154v5 ) License: see link	Fanghui Liu, Xiaolin Huang, Yudong Chen, and Johan A.K. Suykens	Random features is one of the most popular techniques to speed up kernel methods in large-scale problems. Related work was recognized with the NeurIPS Test-of-Time award in 2017 and as an ICML Best Paper finalist in 2019. The body of work on random features has grown rapidly, and hence it is desirable to have a comprehensive overview of this topic explaining the connections among various algorithms and theoretical results. In this survey, we systematically review the work on random features from the past ten years. First, the motivations, characteristics and contributions of representative random-features-based algorithms are summarized according to their sampling schemes, learning procedures, variance reduction properties and how they exploit training data. Second, we review theoretical results that center around the following key question: how many random features are needed to ensure a high approximation quality or no loss in the empirical/expected risks of the learned estimator? 
Third, we provide a comprehensive evaluation of popular random-features-based algorithms on several large-scale benchmark datasets and discuss their approximation quality and prediction performance for classification. Last, we discuss the relationship between random features and modern over-parameterized deep neural networks (DNNs), including the use of high-dimensional random features in the analysis of DNNs as well as the gaps between current theoretical and empirical results. This survey may serve as a gentle introduction to this topic, and as a users' guide for practitioners interested in applying the representative algorithms and understanding theoretical results under various technical assumptions. We hope that this survey will facilitate discussion on the open problems in this topic, and more importantly, shed light on future research directions.	Translated: 2022-12-10 09:13:09 Published: 2021-07-11
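The idea of approximating a kernel with random features, which the survey above reviews, can be illustrated with the classic random Fourier feature construction for a Gaussian kernel. The sketch below is not taken from the survey. The Gaussian kernel, the bandwidth `sigma`, the number of features, and the helper name `make_rff_map` are all assumptions chosen for illustration.

```python
import math
import random


def gaussian_kernel(x, y, sigma=1.0):
    """Exact kernel value exp(-||x - y||^2 / (2 sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


def make_rff_map(dim, n_features, sigma=1.0, seed=0):
    """Return a map z such that dot(z(x), z(y)) approximates the Gaussian kernel."""
    rng = random.Random(seed)
    # Frequencies drawn from the kernel's spectral density N(0, sigma^-2 I);
    # phases drawn uniformly from [0, 2*pi].
    weights = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
               for _ in range(n_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def feature_map(x):
        return [scale * math.cos(sum(w_j * x_j for w_j, x_j in zip(w, x)) + b)
                for w, b in zip(weights, phases)]

    return feature_map


z = make_rff_map(dim=3, n_features=4000)
x, y = [0.2, -0.1, 0.4], [0.5, 0.3, -0.2]
approx = sum(a * b for a, b in zip(z(x), z(y)))
exact = gaussian_kernel(x, y)
print(f"exact={exact:.4f}  random-feature approx={approx:.4f}")
```

The approximation error shrinks roughly with the inverse square root of the number of features, which is why the question of how many features are needed is central to the theory the survey discusses.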
# Using Noisy Self-Reports to Predict Twitter User Demographics ( http://arxiv.org/abs/2005.00635v2 ) License: see link	Zach Wood-Doughty, Paiheng Xu, Xiao Liu, Mark Dredze	Computational social science studies often contextualize content analysis within standard demographics. Since demographics are unavailable on many social media platforms (e.g., Twitter), numerous studies have inferred demographics automatically. Despite many studies presenting proof-of-concept inference of race and ethnicity, training of practical systems remains elusive since there are few annotated datasets. Existing datasets are small, inaccurate, or fail to cover the four most common racial and ethnic groups in the United States. We present a method to identify self-reports of race and ethnicity from Twitter profile descriptions. Despite errors inherent in automated supervision, we produce models with good performance when measured on gold-standard self-report survey data. The result is a reproducible method for creating large-scale training resources for race and ethnicity.	Translated: 2022-12-08 00:05:31 Published: 2021-07-11
# Social Media Information Sharing for Natural Disaster Response ( http://arxiv.org/abs/2005.07019v5 ) License: see link	Zhijie Sasha Dong, Lingyu Meng, Lauren Christenson, Lawrence Fulton	Social media has become an essential channel for posting disaster-related information, which provides governments and relief agencies with real-time data for better disaster management. However, research in this field has not received sufficient attention, and extracting useful information is still challenging. This paper aims to improve disaster relief efficiency by mining and analyzing social media data, such as public attitudes towards disaster response and public demands for targeted relief supplies during different types of disasters. We focus on natural disasters that differ in type, duration, and damage, using a dataset that contains a total of 41,993 tweets. In this paper, public perception is assessed qualitatively through manually classified tweets, which contain information such as the demand for targeted relief supplies, satisfaction with disaster response, and public fear. Public attitudes to natural disasters are studied via a quantitative analysis using eight machine learning models. To help decision-makers choose an appropriate model, the machine learning models are compared on computational time and prediction accuracy. 
We also study how public opinion changes across different natural disasters, and how people's use of social media for disaster relief evolves across disasters of the same type as Twitter itself continues to evolve. The results in this paper demonstrate the feasibility and validity of the proposed research approach and provide relief agencies with insights into better disaster management.	Translated: 2022-12-05 12:08:51 Published: 2021-07-11
# Re-understanding Finite-State Representations of Recurrent Policy Networks ( http://arxiv.org/abs/2006.03745v3 ) License: see link	Mohamad H. Danesh, Anurag Koul, Alan Fern, Saeed Khorram	We introduce an approach for understanding control policies represented as recurrent neural networks. Recent work has approached this problem by transforming such recurrent policy networks into finite-state machines (FSMs) and then analyzing the equivalent minimized FSM. While this led to interesting insights, the minimization process can obscure a deeper understanding of a machine's operation by merging states that are semantically distinct. To address this issue, we introduce an analysis approach that starts with an unminimized FSM and applies more-interpretable reductions that preserve the key decision points of the policy. We also contribute an attention tool to attain a deeper understanding of the role of observations in the decisions. Our case studies on 7 Atari games and 3 control benchmarks demonstrate that the approach can reveal insights that have not been previously noticed.	Translated: 2022-11-24 20:56:47 Published: 2021-07-11
# Smooth Adversarial Training ( http://arxiv.org/abs/2006.14536v2 ) License: see link	Cihang Xie, Mingxing Tan, Boqing Gong, Alan Yuille, Quoc V. Le	It is commonly believed that networks cannot be both accurate and robust, and that gaining robustness means losing accuracy. It is also generally believed that, unless networks are made larger, network architectural elements matter little in improving adversarial robustness. Here we present evidence to challenge these common beliefs through a careful study of adversarial training. Our key observation is that the widely-used ReLU activation function significantly weakens adversarial training due to its non-smooth nature. Hence we propose smooth adversarial training (SAT), in which we replace ReLU with its smooth approximations to strengthen adversarial training. The purpose of smooth activation functions in SAT is to allow it to find harder adversarial examples and compute better gradient updates during adversarial training. Compared to standard adversarial training, SAT improves adversarial robustness for "free", i.e., with no drop in accuracy and no increase in computational cost. For example, without introducing additional computations, SAT significantly enhances ResNet-50's robustness from 33.0% to 42.3%, while also improving accuracy by 0.9% on ImageNet. 
SAT also works well with larger networks: it helps EfficientNet-L1 to achieve 82.2% accuracy and 58.6% robustness on ImageNet, outperforming the previous state-of-the-art defense by 9.5% for accuracy and 11.6% for robustness. Models are available at https://github.com/cihangxie/SmoothAdversarialTraining.	Translated: 2022-11-17 02:36:49 Published: 2021-07-11
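The abstract above attributes the weakness of ReLU in adversarial training to its non-smooth nature. A small numerical sketch makes that concrete by comparing the gradient of ReLU with that of one smooth approximation. Softplus is chosen here purely as an assumed example of a "smooth approximation"; the abstract does not say which functions the authors use, and nothing below reproduces their training procedure.

```python
import math


def relu(x):
    return max(0.0, x)


def relu_grad(x):
    # Piecewise constant: jumps from 0 to 1 at x = 0.
    return 1.0 if x > 0 else 0.0


def softplus(x):
    # Numerically stable form of log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def softplus_grad(x):
    # The derivative of softplus is the logistic sigmoid (stable for both signs).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


for x in (-1e-3, 0.0, 1e-3):
    print(f"x={x:+.3f}  relu'={relu_grad(x):.3f}  softplus'={softplus_grad(x):.3f}")
```

Across a tiny step around zero the ReLU gradient flips between 0 and 1, while the softplus gradient changes only slightly. That continuity is the property a gradient-based attack or weight update can exploit.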
# Recovering Joint Probability of Discrete Random Variables from Pairwise Marginals ( http://arxiv.org/abs/2006.16912v2 ) License: see link	Shahana Ibrahim, Xiao Fu	Learning the joint probability of random variables (RVs) is the cornerstone of statistical signal processing and machine learning. However, direct nonparametric estimation for high-dimensional joint probability is in general impossible, due to the curse of dimensionality. Recent work has proposed to recover the joint probability mass function (PMF) of an arbitrary number of RVs from three-dimensional marginals, leveraging the algebraic properties of low-rank tensor decomposition and the (unknown) dependence among the RVs. Nonetheless, accurately estimating three-dimensional marginals can still be costly in terms of sample complexity, affecting the performance of this line of work in practice in the sample-starved regime. Using three-dimensional marginals also involves challenging tensor decomposition problems whose tractability is unclear. This work puts forth a new framework for learning the joint PMF using only pairwise marginals, which naturally enjoys a lower sample complexity relative to the third-order ones. A coupled nonnegative matrix factorization (CNMF) framework is developed, and its joint PMF recovery guarantees under various conditions are analyzed. 
Our method also features a Gram-Schmidt (GS)-like algorithm that exhibits competitive runtime performance. The algorithm is shown to provably recover the joint PMF up to bounded error in finite iterations, under reasonable conditions. It is also shown that a recently proposed economical expectation maximization (EM) algorithm is guaranteed to improve upon the GS-like algorithm's output, thereby further improving accuracy and efficiency. Real-data experiments are employed to showcase the effectiveness.	Translated: 2022-11-15 05:21:22 Published: 2021-07-11
# Part-Aware Data Augmentation for 3D Object Detection in Point Cloud ( http://arxiv.org/abs/2007.13373v2 ) License: see link	Jaeseok Choi, Yeji Song and Nojun Kwak	Data augmentation has greatly contributed to improving the performance in image recognition tasks, and many related studies have been conducted. However, data augmentation on 3D point cloud data has not been much explored. 3D labels have more sophisticated and richer structural information than 2D labels, so they enable more diverse and effective data augmentation. In this paper, we propose part-aware data augmentation (PA-AUG) that can better utilize the rich information of 3D labels to enhance the performance of 3D object detectors. PA-AUG divides objects into partitions and stochastically applies five augmentation methods to each local region. It is compatible with existing point cloud data augmentation methods and can be used universally regardless of the detector's architecture. PA-AUG has improved the performance of state-of-the-art 3D object detectors for all classes of the KITTI dataset and has an effect equivalent to increasing the training data by about 2.5$\times$. We also show that PA-AUG not only increases performance for a given dataset but also is robust to corrupted data. The code is available at https://github.com/sky77764/pa-aug.pytorch	Translated: 2022-11-06 08:36:52 Published: 2021-07-11
# A Joint introduction to Gaussian Processes and Relevance Vector Machines with Connections to Kalman filtering and other Kernel Smoothers ( http://arxiv.org/abs/2009.09217v4 ) License: see link	Luca Martino, Jesse Read	The expressive power of Bayesian kernel-based methods has led them to become an important tool across many different facets of artificial intelligence, and useful to a plethora of modern application domains, providing both power and interpretability via uncertainty analysis. This article introduces and discusses two methods which straddle the areas of probabilistic Bayesian schemes and kernel methods for regression: Gaussian Processes and Relevance Vector Machines. Our focus is on developing a common framework with which to view these methods, via an intermediate method (a probabilistic version of the well-known kernel ridge regression), on drawing connections among them via dual formulations, and on discussing their application in the context of major tasks: regression, smoothing, interpolation, and filtering. Overall, we provide understanding of the mathematical concepts behind these models, summarize and discuss in depth different interpretations, and highlight the relationship to other methods, such as linear kernel smoothers, Kalman filtering and Fourier approximations. 
Throughout, we provide numerous figures to promote understanding, and we make numerous recommendations to practitioners. Benefits and drawbacks of the different techniques are highlighted. To our knowledge, this is the most in-depth study of its kind to date focused on these two methods, and it will be relevant to theoretical understanding and to practitioners throughout the domains of data science, signal processing, machine learning, and artificial intelligence in general.	Translated: 2022-10-16 20:53:40 Published: 2021-07-11
# Maximum Mean Discrepancy Test is Aware of Adversarial Attacks ( http://arxiv.org/abs/2010.11415v3 ) License: see link	Ruize Gao, Feng Liu, Jingfeng Zhang, Bo Han, Tongliang Liu, Gang Niu, Masashi Sugiyama	The maximum mean discrepancy (MMD) test could in principle detect any distributional discrepancy between two datasets. However, it has been shown that the MMD test is unaware of adversarial attacks: the MMD test failed to detect the discrepancy between natural and adversarial data. Given this phenomenon, we raise a question: are natural and adversarial data really from different distributions? The answer is affirmative: the previous use of the MMD test for this purpose missed three key factors, and accordingly, we propose three components. Firstly, the Gaussian kernel has limited representation power, and we replace it with an effective deep kernel. Secondly, the test power of the MMD test was neglected, and we maximize it following asymptotic statistics. Finally, adversarial data may be non-independent, and we overcome this issue with the wild bootstrap. By taking care of the three factors, we verify that the MMD test is aware of adversarial attacks, which opens a new road for adversarial data detection based on two-sample tests.	Translated: 2022-10-04 05:30:20 Published: 2021-07-11
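For readers unfamiliar with the statistic behind the abstract above, a baseline two-sample MMD test can be sketched in a few lines. This is deliberately the plain setting that the paper argues is insufficient: it uses a fixed Gaussian kernel rather than a deep kernel, and an ordinary permutation test rather than the wild bootstrap. The one-dimensional synthetic data, the bandwidth, and all function names are assumptions for illustration only.

```python
import math
import random


def gaussian_kernel(a, b, bandwidth=1.0):
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def mmd2_biased(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two 1-D samples."""
    def mean_kernel(us, vs):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in us for v in vs)
        return total / (len(us) * len(vs))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def permutation_p_value(xs, ys, n_permutations=100, seed=0):
    """Share of random relabelings whose statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = mmd2_biased(xs, ys)
    pooled = list(xs) + list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if mmd2_biased(pooled[:len(xs)], pooled[len(xs):]) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


rng = random.Random(1)
natural = [rng.gauss(0.0, 1.0) for _ in range(50)]
same = [rng.gauss(0.0, 1.0) for _ in range(50)]      # same distribution as `natural`
shifted = [rng.gauss(2.0, 1.0) for _ in range(50)]   # clearly different distribution

print("MMD^2 same:   ", round(mmd2_biased(natural, same), 4))
print("MMD^2 shifted:", round(mmd2_biased(natural, shifted), 4))
print("p-value shifted:", round(permutation_p_value(natural, shifted), 4))
```

The permutation step assumes the samples are exchangeable, which is exactly the independence assumption the abstract says may fail for adversarial data. Treat this as a baseline for comparison, not as the proposed method.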
# On the Transformer Growth for Progressive BERT Training ( http://arxiv.org/abs/2010.12562v3 ) License: see link	Xiaotao Gu, Liyuan Liu, Hongkun Yu, Jing Li, Chen Chen, Jiawei Han	Due to the excessive cost of large-scale language model pre-training, considerable efforts have been made to train BERT progressively: starting from an inferior but low-cost model and gradually growing the model to increase the computational complexity. Our objective is to advance the understanding of Transformer growth and discover principles that guide progressive training. First, we find that, similar to network architecture search, Transformer growth also favors compound scaling. Specifically, while existing methods only conduct network growth in a single dimension, we observe that it is beneficial to use compound growth operators and balance multiple dimensions (e.g., depth, width, and input length of the model). Moreover, we explore alternative growth operators in each dimension via controlled comparison to give practical guidance on operator selection. In light of our analyses, the proposed method speeds up BERT pre-training by 73.6% and 82.2% for the base and large models respectively, while achieving comparable performance.	Translated: 2022-10-03 21:41:14 Published: 2021-07-11
# A Review and Comparative Study on Probabilistic Object Detection in Autonomous Driving ( http://arxiv.org/abs/2011.10671v2 ) License: see link	Di Feng, Ali Harakeh, Steven Waslander, Klaus Dietmayer	Capturing uncertainty in object detection is indispensable for safe autonomous driving. In recent years, deep learning has become the de facto approach for object detection, and many probabilistic object detectors have been proposed. However, there is no summary of uncertainty estimation in deep object detection, and existing methods are not only built with different network architectures and uncertainty estimation methods, but also evaluated on different datasets with a wide range of evaluation metrics. As a result, a comparison among methods remains challenging, as does the selection of a model that best suits a particular application. This paper aims to alleviate this problem by providing a review and comparative study of existing probabilistic object detection methods for autonomous driving applications. First, we provide an overview of generic uncertainty estimation in deep learning, and then systematically survey existing methods and evaluation metrics for probabilistic object detection. Next, we present a strict comparative study for probabilistic object detection based on an image detector and three public autonomous driving datasets. Finally, we present a discussion of the remaining challenges and future work. 
Code has been made available at https://github.com/asharakeh/pod_compare.git	Translated: 2022-09-23 06:24:40 Published: 2021-07-11
# Efficient Attention Network: Accelerate Attention by Searching Where to Plug ( http://arxiv.org/abs/2011.14058v2 ) License: see link	Zhongzhan Huang, Senwei Liang, Mingfu Liang, Wei He, Haizhao Yang	Recently, many plug-and-play self-attention modules have been proposed to enhance model generalization by exploiting the internal information of deep convolutional neural networks (CNNs). Previous work emphasizes the design of attention modules for specific functionality, e.g., lightweight or task-oriented attention. However, it ignores the importance of where to plug in the attention module, taking for granted that a module should be connected individually to each block of the entire CNN backbone, which makes the computational cost and number of parameters grow with network depth. Thus, we propose a framework called Efficient Attention Network (EAN) to improve the efficiency of existing attention modules. In EAN, we leverage the sharing mechanism (Huang et al. 2020) to share the attention module within the backbone and search where to connect the shared attention module via reinforcement learning. 
Finally, we obtain an attention network with sparse connections between the backbone and modules, while (1) maintaining accuracy, (2) reducing the extra parameter increment, and (3) accelerating inference. Extensive experiments on widely-used benchmarks and popular attention networks show the effectiveness of EAN. Furthermore, we empirically illustrate that our EAN has the capacity of transferring to other tasks and capturing informative features. The code is available at https://github.com/gbup-group/EAN-efficient-attention-network.	Translated: 2022-09-19 19:13:51 Published: 2021-07-11
# Contrast R-CNN for Continual Learning in Object Detection ( http://arxiv.org/abs/2108.04224v1 ) License: CC BY 4.0	Kai Zheng, Cen Chen	The continual learning problem has been widely studied in image classification, while little work has explored it in object detection. Some recent works apply knowledge distillation to constrain the model to retain old knowledge, but this rigid constraint is detrimental to learning new knowledge. In our paper, we propose a new scheme for continual learning of object detection, namely Contrast R-CNN, an approach that strikes a balance between retaining the old knowledge and learning the new knowledge. Furthermore, we design a Proposal Contrast to eliminate the ambiguity between old and new instances and make continual learning more robust. Extensive evaluation on the PASCAL VOC dataset demonstrates the effectiveness of our approach.	Translated: 2021-08-15 16:35:57 Published: 2021-07-11
# Internet-of-Things Devices and Assistive Technologies for Healthcare: Applications, Challenges, and Opportunities ( http://arxiv.org/abs/2107.14112v1 ) License: CC BY 4.0	Marc Jayson Baucas, Petros Spachos, and Stefano Gregori	Medical conditions and cases are growing at a rapid pace, and physical space is starting to be constrained. Hospitals and clinics no longer have the ability to accommodate large numbers of incoming patients. It is clear that the current state of the health industry needs to improve its valuable and limited resources. The evolution of Internet of Things (IoT) devices, along with assistive technologies, can alleviate the problem in healthcare by providing a convenient and easy means of accessing healthcare services wirelessly. There is a plethora of IoT devices and potential applications that can take advantage of the unique characteristics that these technologies can offer. However, at the same time, these services pose novel challenges that need to be properly addressed. In this article, we review some popular categories of IoT-based applications for healthcare along with their devices. Then, we describe the challenges and discuss how research can properly address the open issues and improve the already existing implementations in healthcare. Further possible solutions are also discussed to show their potential as viable solutions for future healthcare applications.	Translated: 2021-08-01 13:44:31 Published: 2021-07-11
# The Role of Social Movements, Coalitions, and Workers in Resisting Harmful Artificial Intelligence and Contributing to the Development of Responsible AI ( http://arxiv.org/abs/2107.14052v1 ) License: see link	Susan von Struensee	There is mounting public concern over the influence that AI-based systems have in our society. Coalitions in all sectors are acting worldwide to resist harmful applications of AI. From indigenous people addressing the lack of reliable data, to smart city stakeholders, to students protesting the academic relationships with sex trafficker and MIT donor Jeffrey Epstein, the questionable ethics and values of those heavily investing in and profiting from AI are under global scrutiny. There are biased, wrongful, and disturbing assumptions embedded in AI algorithms that could get locked in without intervention. Our best human judgment is needed to contain AI's harmful impact. Perhaps one of the greatest contributions of AI will be to make us ultimately understand how important human wisdom truly is in life on earth.	Translated: 2021-08-01 11:01:45 Published: 2021-07-11
# Adversarial for Good? How the Adversarial ML Community's Values Impede Socially Beneficial Uses of Attacks ( http://arxiv.org/abs/2107.10302v1 ) License: CC BY 4.0	Kendra Albert, Maggie Delano, Bogdan Kulynych, Ram Shankar Siva Kumar	Attacks from adversarial machine learning (ML) have the potential to be used "for good": they can be used to run counter to the existing power structures within ML, creating breathing space for those who would otherwise be the targets of surveillance and control. But most research on adversarial ML has not engaged in developing tools for resistance against ML systems. Why? In this paper, we review the broader impact statements that adversarial ML researchers wrote as part of their NeurIPS 2020 papers and assess the assumptions that authors have about the goals of their work. We also collect information about how authors view their work's impact more generally. We find that most adversarial ML researchers at NeurIPS hold two fundamental assumptions that will make it difficult for them to consider socially beneficial uses of attacks: (1) it is desirable to make systems robust, independent of context, and (2) attackers of systems are normatively bad and defenders of systems are normatively good.
That is, despite their expressed and supposed neutrality, most adversarial ML researchers believe that the goal of their work is to secure systems, making it difficult to conceptualize and build tools for disrupting the status quo.	Translated: 2021-07-25 13:31:47 Published: 2021-07-11
# Hybrid Ant Swarm-Based Data Clustering ( http://arxiv.org/abs/2107.07382v1 ) License: see link	Md Ali Azam, Abir Hossen, Md Hafizur Rahman	Biologically inspired computing techniques are very effective and useful in many areas of research, including data clustering. The ant clustering algorithm is a nature-inspired clustering technique that has been studied extensively for over two decades. In this study, we extend the ant clustering algorithm (ACA) to a hybrid ant clustering algorithm (hACA). Specifically, we incorporate a genetic algorithm into standard ACA, yielding a hybrid algorithm with better performance. We also introduce novel pick-up and drop-off rules to speed up clustering. We study the performance of the hACA algorithm and compare it with standard ACA as a benchmark.	Translated: 2021-07-16 13:48:50 Published: 2021-07-11
# Many-to-Many Voice Conversion based Feature Disentanglement using Variational Autoencoder ( http://arxiv.org/abs/2107.06642v1 ) License: CC BY 4.0	Manh Luong and Viet Anh Tran	Voice conversion is a challenging task which transforms the voice characteristics of a source speaker to a target speaker without changing linguistic content. Recently, there have been many works on many-to-many Voice Conversion (VC) based on Variational Autoencoders (VAEs) achieving good results; however, these methods lack the ability to disentangle speaker identity and linguistic content to achieve good performance on unseen speaker scenarios. In this paper, we propose a new method based on feature disentanglement to tackle many-to-many voice conversion. The method has the capability to disentangle speaker identity and linguistic content from utterances, and it can convert from many source speakers to many target speakers with a single autoencoder network. Moreover, it naturally deals with the unseen target speaker scenarios. We perform both objective and subjective evaluations to show the competitive performance of our proposed method compared with other state-of-the-art models in terms of naturalness and target speaker similarity.	Translated: 2021-07-16 05:45:09 Published: 2021-07-11
# Pattern Discovery and Validation Using Scientific Research Methods ( http://arxiv.org/abs/2107.06065v1 ) License: CC BY 4.0	Dirk Riehle, Nikolay Harutyunyan, Ann Barcomb	Pattern discovery, the process of discovering previously unrecognized patterns, is often performed as an ad-hoc process with little resulting certainty in the quality of the proposed patterns. Pattern validation, the process of validating the accuracy of proposed patterns, remains dominated by the simple heuristic of "the rule of three". This article shows how to use established scientific research methods for the purpose of pattern discovery and validation. We present a specific approach, called the handbook method, that uses the qualitative survey, action research, and case study research for pattern discovery and evaluation, and we discuss the underlying principle of using scientific methods in general. We evaluate the handbook method using three exploratory studies and demonstrate its usefulness.	Translated: 2021-07-15 05:12:17 Published: 2021-07-11
# DiCOVA-Net: Diagnosing COVID-19 using Acoustics based on Deep Residual Network for the DiCOVA Challenge 2021 ( http://arxiv.org/abs/2107.06126v1 ) License: CC BY 4.0	Jiangeng Chang, Shaoze Cui, Mengling Feng	In this paper, we propose a deep residual network-based method, namely the DiCOVA-Net, to identify COVID-19 infected patients based on the acoustic recording of their coughs. Since there are far more healthy people than infected patients, this classification problem faces the challenge of imbalanced data. To improve the model's ability to recognize the minority class (the infected patients), we introduce data augmentation and cost-sensitive methods into our model. Besides, considering the particularity of this task, we deploy some fine-tuning techniques to adjust the pre-trained ResNet50. Furthermore, to improve the model's generalizability, we use ensemble learning to integrate prediction results from multiple base classifiers generated using different random seeds. To evaluate the proposed DiCOVA-Net's performance, we conducted experiments with the DiCOVA challenge dataset. The results show that our method has achieved 85.43% in AUC, among the top of all competing teams.	Translated: 2021-07-15 04:53:54 Published: 2021-07-11
# Functional Magnetic Resonance Imaging data augmentation through conditional ICA ( http://arxiv.org/abs/2107.06104v1 ) License: see link	Badr Tajini, Hugo Richard, Bertrand Thirion	Advances in computational cognitive neuroimaging research are related to the availability of large amounts of labeled brain imaging data, but such data are scarce and expensive to generate. While powerful data generation mechanisms, such as Generative Adversarial Networks (GANs), have been designed in the last decade for computer vision, such improvements have not yet carried over to brain imaging. A likely reason is that GAN training is ill-suited to the noisy, high-dimensional and small-sample data available in functional neuroimaging. In this paper, we introduce Conditional Independent Components Analysis (Conditional ICA): a fast functional Magnetic Resonance Imaging (fMRI) data augmentation technique that leverages abundant resting-state data to create images by sampling from an ICA decomposition. We then propose a mechanism to condition the generator on classes observed with few samples. We first show that the generative mechanism is successful at synthesizing data indistinguishable from observations, and that it yields gains in classification accuracy in brain decoding problems. In particular, it outperforms GANs while being much easier to optimize and interpret.
Lastly, Conditional ICA enhances classification accuracy in eight datasets without further parameter tuning.	Translated: 2021-07-14 14:30:30 Published: 2021-07-11
# Learn from Anywhere: Rethinking Generalized Zero-Shot Learning with Limited Supervision ( http://arxiv.org/abs/2107.04952v1 ) License: CC BY 4.0	Gaurav Bhatt, Shivam Chandok and Vineeth N Balasubramanian	A common problem with most zero and few-shot learning approaches is that they suffer from bias towards seen classes, resulting in sub-optimal performance. Existing efforts aim to utilize unlabeled images from unseen classes (i.e., transductive zero-shot) during training to enable generalization. However, this limits their use in practical scenarios where data from target unseen classes is unavailable or infeasible to collect. In this work, we present a practical setting of inductive zero and few-shot learning, where unlabeled images from other out-of-data classes, that do not belong to seen or unseen categories, can be used to improve generalization in any-shot learning. We leverage a formulation based on product-of-experts and introduce a new AUD module that enables us to use unlabeled samples from out-of-data classes, which are usually easily available and entail practically no annotation cost. In addition, we also demonstrate the applicability of our model to a more practical and challenging setting, generalized zero-shot learning under limited supervision, where even base seen classes do not have sufficient annotated samples.	Translated: 2021-07-14 05:36:18 Published: 2021-07-11
# ReconVAT: A Semi-Supervised Automatic Music Transcription Framework for Low-Resource Real-World Data ( http://arxiv.org/abs/2107.04954v1 ) License: CC BY 4.0	Kin Wai Cheuk, Dorien Herremans, Li Su	Most of the current supervised automatic music transcription (AMT) models lack the ability to generalize. This means that they have trouble transcribing real-world music recordings from diverse musical genres that are not present in the labelled training data. In this paper, we propose a semi-supervised framework, ReconVAT, which solves this issue by leveraging the huge amount of available unlabelled music recordings. The proposed ReconVAT uses reconstruction loss and virtual adversarial training. When combined with existing U-net models for AMT, ReconVAT achieves competitive results on common benchmark datasets such as MAPS and MusicNet. For example, in the few-shot setting for the string part version of MusicNet, ReconVAT achieves F1-scores of 61.0% and 41.6% for the note-wise and note-with-offset-wise metrics respectively, which translates into an improvement of 22.2% and 62.5% compared to the supervised baseline model. Our proposed framework also demonstrates the potential of continual learning on new data, which could be useful in real-world applications whereby new data is constantly available.	Translated: 2021-07-14 05:16:02 Published: 2021-07-11
# A Deep-Bayesian Framework for Adaptive Speech Duration Modification ( http://arxiv.org/abs/2107.04973v1 ) License: CC BY 4.0	Ravi Shankar and Archana Venkataraman	We propose the first method to adaptively modify the duration of a given speech signal. Our approach uses a Bayesian framework to define a latent attention map that links frames of the input and target utterances. We train a masked convolutional encoder-decoder network to produce this attention map via a stochastic version of the mean absolute error loss function; our model also predicts the length of the target speech signal using the encoder embeddings. The predicted length determines the number of steps for the decoder operation. During inference, we generate the attention map as a proxy for the similarity matrix between the given input speech and an unknown target speech signal. Using this similarity matrix, we compute a warping path of alignment between the two signals. Our experiments demonstrate that this adaptive framework produces similar results to dynamic time warping, which relies on a known target signal, on both voice conversion and emotion conversion tasks. We also show that our technique results in a high quality of generated speech that is on par with state-of-the-art vocoders.	Translated: 2021-07-14 04:59:47 Published: 2021-07-11
# STR-GODEs: Spatial-Temporal-Ridership Graph ODEs for Metro Ridership Prediction ( http://arxiv.org/abs/2107.04980v1 ) License: CC BY 4.0	Chuyu Huang	Metro ridership prediction has always received extensive attention from governments and researchers. Recent works focus on designing complicated graph convolutional recurrent network architectures to capture spatial and temporal patterns. These works extract the information of the spatial dimension well, but the limitation of the temporal dimension still exists. We extend Neural ODE algorithms to the graph network and propose the STR-GODEs network, which can effectively learn spatial, temporal, and ridership correlations without the limitation of dividing data into equal-sized intervals on the timeline. While learning the spatial relations and the temporal correlations, we modify the GODE-RNN cell to obtain the ridership feature and hidden states. Ridership information and its hidden states are added to the GODESolve to reduce the error accumulation caused by long time series in prediction. Extensive experiments on two large-scale datasets demonstrate the efficacy and robustness of our model.	Translated: 2021-07-14 04:47:04 Published: 2021-07-11
# Improving Low-resource Reading Comprehension via Cross-lingual Transposition Rethinking ( http://arxiv.org/abs/2107.05002v1 ) License: CC BY 4.0	Gaochen Wu, Bin Xu, Yuxin Qin, Fei Kong, Bangchang Liu, Hongwen Zhao, Dejie Chang	Extractive Reading Comprehension (ERC) has made tremendous advances enabled by the availability of large-scale high-quality ERC training data. Despite such rapid progress and widespread application, datasets in languages other than high-resource languages such as English remain scarce. To address this issue, we propose a Cross-Lingual Transposition ReThinking (XLTT) model by modelling existing high-quality extractive reading comprehension datasets in a multilingual environment. To be specific, we present multilingual adaptive attention (MAA) to combine intra-attention and inter-attention to learn more generalizable semantic and lexical knowledge from each pair of language families. Furthermore, to make full use of existing datasets, we adopt a new training framework to train our model by calculating task-level similarities between each existing dataset and the target dataset. The experimental results show that our XLTT model surpasses six baselines on two multilingual ERC benchmarks and is especially effective for low-resource languages, with 3.9 and 4.1 average improvement in F1 and EM, respectively.	Translated: 2021-07-14 04:34:16 Published: 2021-07-11
# Towards Accurate Localization by Instance Search ( http://arxiv.org/abs/2107.05005v1 ) License: CC BY 4.0	Yi-Geng Hong, Hui-Chu Xiao, Wan-Lei Zhao	Visual object localization is the key step in a series of object detection tasks. In the literature, high localization accuracy is achieved with the mainstream strongly supervised frameworks. However, such methods require object-level annotations and are unable to detect objects of unknown categories. Weakly supervised methods face similar difficulties. In this paper, a self-paced learning framework is proposed to achieve accurate object localization on the rank list returned by instance search. The proposed framework mines the target instance gradually from the queries and their corresponding top-ranked search results. Since a common instance is shared between the query and the images in the rank list, the target visual instance can be accurately localized even without knowing what the object category is. In addition to performing localization on instance search, the issue of few-shot object detection is also addressed under the same framework. Superior performance over state-of-the-art methods is observed on both tasks.	Translated: 2021-07-14 04:06:07 Published: 2021-07-11
# Generating stable molecules using imitation and reinforcement learning ( http://arxiv.org/abs/2107.05007v1 ) License: CC BY 4.0	Søren Ager Meldgaard, Jonas Köhler, Henrik Lund Mortensen, Mads-Peter V. Christiansen, Frank Noé, Bjørk Hammer	Chemical space is routinely explored by machine learning methods to discover interesting molecules before time-consuming experimental synthesis is attempted. However, these methods often rely on a graph representation, ignoring 3D information necessary for determining the stability of the molecules. We propose a reinforcement learning approach for generating molecules in Cartesian coordinates, allowing for quantum chemical prediction of the stability. To improve sample efficiency, we learn basic chemical rules from imitation learning on the GDB-11 database to create an initial model applicable for all stoichiometries. We then deploy multiple copies of the model conditioned on a specific stoichiometry in a reinforcement learning setting. The models correctly identify low energy molecules in the database and produce novel isomers not found in the training set. Finally, we apply the model to larger molecules to show how reinforcement learning further refines the imitation learning model in domains far from the training data.	Translated: 2021-07-14 03:52:14 Published: 2021-07-11
# BCNet: A Deep Convolutional Neural Network for Breast Cancer Grading ( http://arxiv.org/abs/2107.05037v1 ) License: CC BY 4.0	Pouya Hallaj Zavareh, Atefeh Safayari, Hamidreza Bolhasani	Breast cancer has become one of the most prevalent cancers affecting people all over the world, and it poses a serious threat to human beings, women in particular. In order to provide effective treatment or prevention of this cancer, disease diagnosis in the early stages is of high importance. There are various methods to detect this disorder, among which imaging plays a dominant role. Deep learning has recently been adopted widely in different areas of science, especially medicine. In breast cancer detection problems, diverse deep learning techniques have been developed on different datasets and have resulted in good accuracy. In this article, we present a deep neural network model to classify histopathological images from the Databiox image dataset, as the first application on this image database. Our proposed model, named BCNet, takes advantage of the transfer learning approach, in which VGG16 is selected from the available pretrained models as a feature extractor. Furthermore, to address the problem of insufficient data, we employed the data augmentation technique to expand the input dataset.
All implementations in this research, ranging from pre-processing actions to depicting the diagram of the model architecture, have been carried out using the tf.keras API. With the proposed model, a validation accuracy of 88% and an evaluation accuracy of 72% were obtained.	Translated: 2021-07-14 03:29:54 Published: 2021-07-11
# Locality Relationship Constrained Multi-view Clustering Framework ( http://arxiv.org/abs/2107.05073v1 ) License: CC0 1.0	Xiangzhu Meng, Wei Wei, Wenzhe Liu	In most practical applications, it is common to utilize multiple features from different views to represent one object. Among these works, multi-view subspace-based clustering, which aims to provide clustering solutions for multi-view data, has gained extensive attention from many researchers. However, most existing methods fail to make full use of the locality geometric structure and similarity relationship among samples under the multi-view scenario. To solve these issues, we propose a novel multi-view learning method with a locality relationship constraint to explore the problem of multi-view clustering, called Locality Relationship Constrained Multi-view Clustering Framework (LRC-MCF). LRC-MCF aims to explore the diversity, geometric, consensus and complementary information among different views by capturing the locality relationship information and the common similarity relationships among multiple views. Moreover, LRC-MCF takes sufficient account of the weights of different views in finding the common-view locality structure and straightforwardly produces the final clusters.
To effectively reduce the redundancy of the learned representations, a low-rank constraint on the common similarity matrix is additionally considered. To solve the minimization problem of LRC-MCF, an Alternating Direction Minimization (ADM) method is provided to iteratively calculate all variables of LRC-MCF. Extensive experimental results on seven benchmark multi-view datasets validate the effectiveness of the LRC-MCF method.	Translated: 2021-07-14 03:17:20 Published: 2021-07-11
# SGD: The Role of Implicit Regularization, Batch-size and Multiple-epochs ( http://arxiv.org/abs/2107.05074v1 ) License: CC BY-SA 4.0	Satyen Kale, Ayush Sekhari, Karthik Sridharan	Multi-epoch, small-batch, Stochastic Gradient Descent (SGD) has been the method of choice for learning with large over-parameterized models. A popular theory for explaining why SGD works well in practice is that the algorithm has an implicit regularization that biases its output towards a good solution. Perhaps the theoretically most well understood learning setting for SGD is that of Stochastic Convex Optimization (SCO), where it is well known that SGD learns at a rate of $O(1/\sqrt{n})$, where $n$ is the number of samples. In this paper, we consider the problem of SCO and explore the role of implicit regularization, batch size and multiple epochs for SGD. Our main contributions are threefold: (a) We show that for any regularizer, there is an SCO problem for which Regularized Empirical Risk Minimization (RERM) fails to learn. This automatically rules out any implicit regularization based explanation for the success of SGD.
(b) We provide a separation between SGD and learning via Gradient Descent on empirical loss (GD) in terms of sample complexity. We show that there is an SCO problem such that GD with any step size and number of iterations can only learn at a suboptimal rate: at least $\widetilde{\Omega}(1/n^{5/12})$. (c) We present a multi-epoch variant of SGD commonly used in practice. We prove that this algorithm is at least as good as single pass SGD in the worst case. However, for certain SCO problems, taking multiple passes over the dataset can significantly outperform single pass SGD. We extend our results to the general learning setting by showing a problem which is learnable for any data distribution, and for this problem, SGD is strictly better than RERM for any regularization function. We conclude by discussing the implications of our results for deep learning, and show a separation between SGD and ERM for two-layer diagonal neural networks.	Translated: 2021-07-14 02:56:18 Published: 2021-07-11
# Effect of Input Size on the Classification of Lung Nodules Using Convolutional Neural Networks ( http://arxiv.org/abs/2107.05085v1 ) License: CC BY 4.0	Gorkem Polat, Yesim Dogrusoz Serinagaoglu, Ugur Halici	Recent studies have shown that lung cancer screening using annual low-dose computed tomography (CT) reduces lung cancer mortality by 20% compared to traditional chest radiography. Therefore, CT lung screening has started to be used widely all across the world. However, analyzing these images is a serious burden for radiologists. The number of slices in a CT scan can be up to 600. Therefore, computer-aided detection (CAD) systems are very important for faster and more accurate assessment of the data. In this study, we propose a framework that analyzes CT lung screenings using convolutional neural networks (CNNs) to reduce false positives. We trained our model with different volume sizes and showed that volume size plays a critical role in the performance of the system. We also used different fusions in order to show their power and effect on the overall accuracy. 3D CNNs were preferred over 2D CNNs because 2D convolutional operations applied to 3D data could result in information loss. The proposed framework has been tested on the dataset provided by the LUNA16 Challenge and resulted in a sensitivity of 0.831 at 1 false positive per scan.	Translated: 2021-07-14 02:54:54 Published: 2021-07-11
# (参考訳) ニューラルネットワークを用いたビデオからのリモート酸素推定 Remote Blood Oxygen Estimation From Videos Using Neural Networks ( http://arxiv.org/abs/2107.05087v1 ) ライセンス: CC BY 4.0	Joshua Mathew, Xin Tian, Min Wu, Chau-Wai Wong	(参考訳) 血液酸素飽和度(SpO$_2$)は呼吸機能の不可欠な指標であり、新型コロナウイルスのパンデミックで注目されている。臨床所見から、COVID-19患者は明らかな症状の前にSpO$_2$が著しく低い可能性が示唆された。カメラの普及により、研究者はSpO$2$をビデオで監視する方法を調査する動機となった。スマートフォンに関するほとんどの以前のスキームはコンタクトベースで、スマートフォンのカメラと近くの光源を覆うために指先が必要で、照らされた組織から再放射された光を捉える。本稿では,スマートフォンカメラを用いた最初の畳み込みニューラルネットワークを用いた非接触SpO$_2$推定方式を提案する。このスキームは、生理的感覚のために参加者の手のビデオを分析し、便利で快適で、プライバシーを保護し、マスクを装着することを可能にする。我々は,spo$_2$測定のための光生理学的モデルに触発されたニューラルネットワークアーキテクチャを設計し,チャネル結合の重みを可視化することにより説明可能性を示す。提案手法は,接触型SpO$_2$測定のための最先端モデルよりも優れており,公衆衛生に寄与する可能性を示している。また,スキンタイプと手の側面がspo$_2$推定性能に及ぼす影響についても検討した。 Blood oxygen saturation (SpO$_2$) is an essential indicator of respiratory functionality and is receiving increasing attention during the COVID-19 pandemic. Clinical findings show that it is possible for COVID-19 patients to have significantly low SpO$_2$ before any obvious symptoms. The prevalence of cameras has motivated researchers to investigate methods for monitoring SpO$_2$ using videos. Most prior schemes involving smartphones are contact-based: They require a fingertip to cover the phone's camera and the nearby light source to capture re-emitted light from the illuminated tissue. In this paper, we propose the first convolutional neural network based noncontact SpO$_2$ estimation scheme using smartphone cameras. The scheme analyzes the videos of a participant's hand for physiological sensing, which is convenient and comfortable, and can protect their privacy and allow for keeping face masks on. We design our neural network architectures inspired by the optophysiological models for SpO$_2$ measurement and demonstrate the explainability by visualizing the weights for channel combination. 
Our proposed models outperform the state-of-the-art model that is designed for contact-based SpO$_2$ measurement, showing the potential of our proposed method to contribute to public health. We also analyze the impact of skin type and the side of a hand on SpO$_2$ estimation performance.	翻訳日:2021-07-14 02:36:57 公開日:2021-07-11
# (参考訳) Fairer Software Made Easier("Keys"を使用) Fairer Software Made Easier (using "Keys") ( http://arxiv.org/abs/2107.05088v1 ) ライセンス: CC BY 4.0	Tim Menzies, Kewen Peng, Andre Lustosa	(参考訳) ソフトウェア分析の説明を簡単にできますか? たぶん。最近の結果は、しばしば「キー効果」、すなわち「キー効果」を示すことを示している。いくつかの重要な機能が残りを制御する。言うまでもなく、いくつかのキーで制御されたシステムでは、説明と制御はキーをまたいでいくつかの"what-if"クエリを実行するだけの問題です。鍵効果を利用することで、倫理的AIシステムに必要なものなど、複雑な説明を劇的に単純化することが可能になる。 Can we simplify explanations for software analytics? Maybe. Recent results show that systems often exhibit a "keys effect"; i.e. a few key features control the rest. Just to say the obvious, for systems controlled by a few keys, explanation and control is just a matter of running a handful of "what-if" queries across the keys. By exploiting the keys effect, it should be possible to dramatically simplify even complex explanations, such as those required for ethical AI systems.	翻訳日:2021-07-14 02:22:13 公開日:2021-07-11
# (参考訳) アフリカ農業分野における機械学習の課題と機会--一般論から Machine Learning Challenges and Opportunities in the African Agricultural Sector -- A General Perspective ( http://arxiv.org/abs/2107.05101v1 ) ライセンス: CC BY-SA 4.0	Racine Ly	(参考訳) コンピュータの能力の向上、アルゴリズム技術の進歩、利用可能なデータの大幅な増加は、最近の人工知能(AI)技術の発展を可能にした。機械学習(ML)と呼ばれるその分野の1つでは、視覚、スピーチ、問題解決といった人間の知性に起因する特徴を模倣する能力が強い。しかし、以前の技術革命が示唆しているように、彼らの最も大きな影響は、そのテクノロジーの伝統的なユーザーではない他の分野にほとんど期待できる。農業部門はアフリカ経済にとって不可欠であり、気候変動時代には収量の改善、損失軽減、天然資源の効率的な管理が不可欠である。機械学習は、予測を行う上で付加価値を持つ技術であり、それゆえ、農業セクターにおける不確実性とリスクを低減する可能性がある。本研究の目的は、アフリカ農業におけるMLベースのソリューションの障壁を文脈化し、議論することである。第2部では、歴史的・技術的観点からのML技術の概要とその推進力について概説した。第3部では,農業におけるMLの利用状況について概説した。最後に、第4節では、アフリカにおけるMLの成長と、農業分野におけるMLベースのソリューションの作成と利用における潜在的な障壁について論じる。 The improvement of computers' capacities, advancements in algorithmic techniques, and the significant increase of available data have enabled the recent developments of Artificial Intelligence (AI) technology. One of its branches, called Machine Learning (ML), has shown strong capacities in mimicking characteristics attributed to human intelligence, such as vision, speech, and problem-solving. However, as previous technological revolutions suggest, their most significant impacts could be mostly expected on other sectors that were not traditional users of that technology. The agricultural sector is vital for African economies; improving yields, mitigating losses, and effective management of natural resources are crucial in a climate change era. Machine Learning is a technology with an added value in making predictions, hence the potential to reduce uncertainties and risk across sectors, in this case, the agricultural sector. The purpose of this paper is to contextualize and discuss barriers to ML-based solutions for African agriculture. In the second section, we provided an overview of ML technology from a historical and technical perspective and its main driving force. 
In the third section, we provided a brief review of the current use of ML in agriculture. Finally, in section 4, we discuss ML growing interest in Africa and the potential barriers to creating and using ML-based solutions in the agricultural sector.	翻訳日:2021-07-14 02:07:30 公開日:2021-07-11
# (参考訳) Repo2Vec:リポジトリの類似性決定のための包括的埋め込みアプローチ Repo2Vec: A Comprehensive Embedding Approach for Determining Repository Similarity ( http://arxiv.org/abs/2107.05112v1 ) ライセンス: CC BY 4.0	Md Omar Faruk Rokon, Pei Yan, Risul Islam, Michalis Faloutsos	(参考訳) GitHubのような大規模なオンラインアーカイブの中で、類似したリポジトリやクラスタをどうやって特定できるのでしょう? リポジトリの類似性の決定は、このようなソフトウェアエコシステムのダイナミクスと進化を研究する上で不可欠な構成要素である。重要な課題は、さまざまなリポジトリ機能の適切な表現を決定することである。すなわち、(a) 利用可能な情報のすべての側面をキャプチャし、(b) MLアルゴリズムによって容易に使用できる表現である。本稿では,3種類の情報源から得られる特徴を組み合わせ,リポジトリを分散ベクタとして表現するための総合的な埋め込み手法であるRepo2Vecを提案する。 (a) メタデータ、(b) リポジトリの構造、(c) ソースコードの3つのタイプの情報について検討しています。また、これらの情報型を単一の埋め込みに表現し、組み合わせるための一連の埋め込みアプローチも導入します。この手法をGitHubから得た2つの実際のデータセット(合計1013リポジトリ)で評価した。まず,提案手法が精度(93%対78%)で従来の手法を上回り,強く類似したリポジトリがほぼ2倍,偽陽性が30%少ないことを示した。次に,Repo2Vecが, (a) マルウェアと良性リポジトリの区別, (b) 有意義な階層的クラスタリングの識別といった,確かな基盤を提供する方法を示す。例えば、マルウェアと良性リポジトリの区別において、98%の精度と96%のリコールを実現しています。全体的な作業は、ターゲットプラットフォームや意図によるリポジトリ分類、コード再利用とクローンの検出、系統と進化の特定など、多くのリポジトリ分析機能を実現するための基本的なビルディングブロックです。 How can we identify similar repositories and clusters among a large online archive, such as GitHub? Determining repository similarity is an essential building block in studying the dynamics and the evolution of such software ecosystems. The key challenge is to determine the right representation for the diverse repository features in a way that: (a) it captures all aspects of the available information, and (b) it is readily usable by ML algorithms. We propose Repo2Vec, a comprehensive embedding approach to represent a repository as a distributed vector by combining features from three types of information sources. As our key novelty, we consider three types of information: (a) metadata, (b) the structure of the repository, and (c) the source code. We also introduce a series of embedding approaches to represent and combine these information types into a single embedding. We evaluate our method with two real datasets from GitHub for a combined 1013 repositories.
First, we show that our method outperforms previous methods in terms of precision (93% vs. 78%), with nearly twice as many Strongly Similar repositories and 30% fewer False Positives. Second, we show how Repo2Vec provides a solid basis for: (a) distinguishing between malware and benign repositories, and (b) identifying a meaningful hierarchical clustering. For example, we achieve 98% precision and 96% recall in distinguishing malware and benign repositories. Overall, our work is a fundamental building block for enabling many repository analysis functions such as repository categorization by target platform or intention, detecting code-reuse and clones, and identifying lineage and evolution.	翻訳日:2021-07-14 01:52:06 公開日:2021-07-11
# (参考訳) Deep Collaborative Filtering-based Method for Image Denoisingの詳細 Details Preserving Deep Collaborative Filtering-Based Method for Image Denoising ( http://arxiv.org/abs/2107.05115v1 ) ライセンス: CC BY 4.0	Basit O. Alawode, Mudassir Masood, Tarig Ballal, and Tareq Al-Naffouri	(参考訳) 何年もの間、複数のデノイジングアルゴリズムによって達成された改善にもかかわらず、その多くはデノイジング後の画像の細部を保存できていない。これは、画像に対する滑らかな効果の結果である。ほとんどのニューラルネットワークベースのアルゴリズムは、古典的な推論アルゴリズムよりも優れた量的性能を達成している。しかし、スムーズなアウト効果の結果、質的な(視覚的な)パフォーマンスに悩まされる。本稿では,この問題に対処するアルゴリズムを提案する。本稿では,画像デノイジングのための深い協調フィルタリング(deep-cofib)アルゴリズムを提案する。このアルゴリズムは、最適化されたニューラルネットワークモデルのセットを使用してスパース領域における画像パッチの協調分解を行う。これにより、ノイズ除去と詳細保存のトレードオフを良好に得ることができる高速アルゴリズムが得られる。大規模な実験により、DeepCoFiBは(PSNRとSSIMの観点から)定量的に、そして(視覚的に)多くの最先端の復調アルゴリズムより質的に(定量的に)優れていることが示された。 In spite of the improvements achieved by the several denoising algorithms over the years, many of them still fail at preserving the fine details of the image after denoising. This is as a result of the smooth-out effect they have on the images. Most neural network-based algorithms have achieved better quantitative performance than the classical denoising algorithms. However, they also suffer from qualitative (visual) performance as a result of the smooth-out effect. In this paper, we propose an algorithm to address this shortcoming. We propose a deep collaborative filtering-based (Deep-CoFiB) algorithm for image denoising. This algorithm performs collaborative denoising of image patches in the sparse domain using a set of optimized neural network models. This results in a fast algorithm that is able to excellently obtain a trade-off between noise removal and details preservation. Extensive experiments show that the DeepCoFiB performed quantitatively (in terms of PSNR and SSIM) and qualitatively (visually) better than many of the state-of-the-art denoising algorithms.	翻訳日:2021-07-14 01:32:46 公開日:2021-07-11
# (参考訳) eGHWT:拡張一般化Haar-Walsh変換 eGHWT: The extended Generalized Haar-Walsh Transform ( http://arxiv.org/abs/2107.05121v1 ) ライセンス: CC BY 4.0	Naoki Saito and Yiqun Shao	(参考訳) 正規格子の古典的設定からグラフやネットワークのより一般的な設定への計算調和解析ツールの拡張は非常に重要であり、近年多くの研究が行われている。 Irion and Saito (2014) によって開発された一般化Haar-Walsh変換(GHWT)は、古典的なハール変換とウォルシュ・アダマール変換の一般化であり、グラフ上の信号に対する多スケール変換である。我々は、Thiele と Villemoes (1996) の適応時間周波数タイリングの一般化である拡張一般化Haar-Walsh変換(eGHWT)を提案する。 eGHWTはグラフ領域分割の効率だけでなく、"シーケンシー領域"分割の効率も同時に調べている。その結果、グラフ信号に対するeGHWTとその関連するベストベーシス選択アルゴリズムは、同程度の計算コスト$O(N \log N)$($N$は入力グラフのノード数)で、従来のGHWTの性能を大幅に向上させる。 GHWT best-basisアルゴリズムは、$\mathbb{R}^N$の$(1.5)^N$個を超える正規直交基底の候補の中から与えられたタスクに最も適した正規直交基底を求めるが、eGHWT best-basisアルゴリズムは$\mathbb{R}^N$の$0.618\cdot(1.84)^N$個を超える正規直交基底の候補を探索することで、より良いものを見つけることができる。本稿では,eGHWTベストベーシスアルゴリズムの詳細を説明し,グラフ信号として見る従来のデジタル画像だけでなく,真のグラフ信号を含むいくつかの例を用いて,その優位性を実証する。さらに, eGHWTを2次元信号や行列データに拡張する方法を, 列と行から生成されたグラフのテンソル積として見ることにより示し, 画像近似などのアプリケーションでの有効性を示す。 Extending computational harmonic analysis tools from the classical setting of regular lattices to the more general setting of graphs and networks is very important and much research has been done recently. The Generalized Haar-Walsh Transform (GHWT) developed by Irion and Saito (2014) is a multiscale transform for signals on graphs, which is a generalization of the classical Haar and Walsh-Hadamard Transforms. We propose the extended Generalized Haar-Walsh Transform (eGHWT), which is a generalization of the adapted time-frequency tilings of Thiele and Villemoes (1996). The eGHWT examines not only the efficiency of graph-domain partitions but also that of "sequency-domain" partitions simultaneously. Consequently, the eGHWT and its associated best-basis selection algorithm for graph signals significantly improve the performance of the previous GHWT at a similar computational cost, $O(N \log N)$, where $N$ is the number of nodes of an input graph.
While the GHWT best-basis algorithm seeks the most suitable orthonormal basis for a given task among more than $(1.5)^N$ possible orthonormal bases in $\mathbb{R}^N$, the eGHWT best-basis algorithm can find a better one by searching through more than $0.618\cdot(1.84)^N$ possible orthonormal bases in $\mathbb{R}^N$. This article describes the details of the eGHWT best-basis algorithm and demonstrates its superiority using several examples including genuine graph signals as well as conventional digital images viewed as graph signals. Furthermore, we also show how the eGHWT can be extended to 2D signals and matrix-form data by viewing them as a tensor product of graphs generated from their columns and rows and demonstrate its effectiveness on applications such as image approximation.	翻訳日:2021-07-14 01:22:05 公開日:2021-07-11
# (参考訳) 早期行動認識のための解釈可能なDeep Feature Propagation Interpretable Deep Feature Propagation for Early Action Recognition ( http://arxiv.org/abs/2107.05122v1 ) ライセンス: CC BY 4.0	He Zhao, Richard P. Wildes	(参考訳) 限られた予備観測からの初期アクション認識(アクション予測)は、リアルタイムな推論を必要とするストリーミング視覚システムにとって重要な役割を担っている。本研究では,空間的特徴空間における行動パターンの時間的変化を解明し,行動予測に対処する。私たちのシステムには3つの重要なコンポーネントがあります。まず、空間レイアウトを維持しながら、生データからの抽象化を可能にする中間層convnet機能を扱う。第二に、各特徴を伝播するのではなく、その残余を時間にわたって伝播し、冗長性を減少させるコンパクトな表現を可能にします。第3に、エラーのビルドと予測開始時間の統一にKalmanフィルタを使用します。複数のベンチマークでの大規模な実験結果から,本手法は動作予測における競合性能をもたらすことが示された。特筆すべきは,我々のシステムの学習した構成要素を,その不透明な性質を2つの方法で照らすことである。まず,我々の学習した特徴伝達モジュールが畳み込み下での空間シフト機構として機能し,現在の観測を未来に伝播させることを示す。これにより、フローベースの画像動き情報をキャプチャする。第2に,学習したカルマンフィルタは事前推定を適応的に更新し,シーケンス学習を支援する。 Early action recognition (action prediction) from limited preliminary observations plays a critical role for streaming vision systems that demand real-time inference, as video actions often possess elongated temporal spans which cause undesired latency. In this study, we address action prediction by investigating how action patterns evolve over time in a spatial feature space. There are three key components to our system. First, we work with intermediate-layer ConvNet features, which allow for abstraction from raw data, while retaining spatial layout. Second, instead of propagating features per se, we propagate their residuals across time, which allows for a compact representation that reduces redundancy. Third, we employ a Kalman filter to combat error build-up and unify across prediction start times. Extensive experimental results on multiple benchmarks show that our approach leads to competitive performance in action prediction. Notably, we investigate the learned components of our system to shed light on their otherwise opaque natures in two ways. First, we document that our learned feature propagation module works as a spatial shifting mechanism under convolution to propagate current observations into the future. 
Thus, it captures flow-based image motion information. Second, the learned Kalman filter adaptively updates prior estimation to aid the sequence learning process.	翻訳日:2021-07-14 01:20:32 公開日:2021-07-11
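The abstract above relies on a Kalman filter to keep propagation errors from accumulating. As a rough illustration only, the following is a textbook scalar Kalman update, not the learned filter of the paper; the function name `kalman_step`, the identity dynamics, and the noise values `q` and `r` are all assumptions made for this sketch.

```python
def kalman_step(x_est, p_est, z, q=1e-2, r=1e-1):
    """One predict/update cycle of a scalar Kalman filter.

    x_est, p_est: previous state estimate and its variance.
    z: new (noisy) observation of the state.
    q, r: process and measurement noise variances (illustrative values).
    Identity dynamics are assumed, so the prediction keeps x_est unchanged.
    """
    # Predict: the state is carried over, uncertainty grows by the process noise.
    x_pred = x_est
    p_pred = p_est + q
    # Update: the gain weighs the observation against the prediction.
    gain = p_pred / (p_pred + r)
    x_new = x_pred + gain * (z - x_pred)
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new, gain


# Propagate an estimate over a short sequence of observations.
x, p = 0.0, 1.0
for z in [0.9, 1.1, 1.0, 0.95]:
    x, p, _ = kalman_step(x, p, z)
```

The gain shrinks as the estimate's variance falls, so later observations correct the propagated state less aggressively; this is the sense in which such a filter limits error build-up.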
# (参考訳) eコマースセッションベースレコメンデーションのためのマルチモーダル機能とポストフュージョンコンテキストを備えたトランスフォーマー Transformers with multi-modal features and post-fusion context for e-commerce session-based recommendation ( http://arxiv.org/abs/2107.05124v1 ) ライセンス: CC BY-SA 4.0	Gabriel de Souza P. Moreira and Sara Rabhi and Ronay Ak and Md Yasin Kabir and Even Oldridge	(参考訳) セッションベースのレコメンデーションはEコマースサービスにとって重要なタスクであり、多数のユーザが匿名でブラウズしたり、異なるセッションに対して非常に異なる関心を持つことがある。本稿では,SIGIR 2021 Workshop on E-Commerce Data Challenge の推薦課題における勝者の1つについて述べる。私たちのソリューションはnlp技術にインスパイアされ、transformer-xlとxlnetという2つのトランスフォーマーアーキテクチャで構成されています。コンペで利用可能なリッチデータセットのほとんどを活用するために、表形式のイベントとテキストベクトルと画像ベクトルを組み合わせることで、マルチモデル機能をどのように準備したかを述べる。また,セッションベースレコメンデーションにおけるアーキテクチャの有効性をよりよく理解するために,モデル予測分析を提案する。 Session-based recommendation is an important task for e-commerce services, where a large number of users browse anonymously or may have very distinct interests for different sessions. In this paper we present one of the winning solutions for the Recommendation task of the SIGIR 2021 Workshop on E-commerce Data Challenge. Our solution was inspired by NLP techniques and consists of an ensemble of two Transformer architectures - Transformer-XL and XLNet - trained with autoregressive and autoencoding approaches. To leverage most of the rich dataset made available for the competition, we describe how we prepared multi-model features by combining tabular events with textual and image vectors. We also present a model prediction analysis to better understand the effectiveness of our architectures for the session-based recommendation.	翻訳日:2021-07-14 00:52:23 公開日:2021-07-11
# (参考訳) ビデオ予測理解の概観:早期行動認識と今後の行動予測 Review of Video Predictive Understanding: Early Action Recognition and Future Action Prediction ( http://arxiv.org/abs/2107.05140v1 ) ライセンス: CC BY 4.0	He Zhao, Richard P. Wildes	(参考訳) ビデオ予測理解は、現在および過去のビデオ観測から、まだ観測されていない未来を予測することに関わる幅広い取り組みを含んでいる。アクション予測はビデオ予測理解の主要な部分であり、このレビューの焦点となっている。この亜領域には、早期行動認識と将来の行動予測という2つの大きな区分がある。早期行動認識は、進行中の行動をできるだけ早く認識することに関心がある。将来の行動予測は、以前に観察された行動に続く行動の予測に関係している。いずれの場合も、過去、現在、および潜在的将来の情報の間の因果関係が主な焦点である。 Markov Chains、Gaussian Processes、Auto-Regressive Modeling、Bayesian Recursive Filteringといった様々な数学的ツールが、これらの2つのタスクのコンピュータビジョン技術と共同で広く採用されている。しかし、これらのアプローチは、次元性の呪い、一般化の貧弱、ドメイン固有の知識からの制約といった課題に直面している。近年,既存の視覚タスク,特に行動予測タスクの性能向上のために,深部畳み込みニューラルネットワークや繰り返しニューラルネットワークに依存する構造が広く提案されている。しかし、これらにも独自の欠点、例えば大規模なトレーニングデータへの依存や強力な理論的基盤の欠如がある。本調査では,最近注目され,実用的価値が証明された映像予測理解の幅広い領域のサブ領域の導入から始める。次に、様々な早期行動認識と将来の行動予測アルゴリズムの徹底的なレビューを行い、適切に整理された分割を行う。最後に、今後の研究方針で議論を締めくくります。 Video predictive understanding encompasses a wide range of efforts that are concerned with the anticipation of the unobserved future from the current as well as historical video observations. Action prediction is a major sub-area of video predictive understanding and is the focus of this review. This sub-area has two major subdivisions: early action recognition and future action prediction. Early action recognition is concerned with recognizing an ongoing action as soon as possible. Future action prediction is concerned with the anticipation of actions that follow those previously observed. In either case, the causal relationship between the past, current, and potential future information is the main focus. Various mathematical tools such as Markov Chains, Gaussian Processes, Auto-Regressive modeling, and Bayesian recursive filtering are widely adopted jointly with computer vision techniques for these two tasks.
However, these approaches face challenges such as the curse of dimensionality, poor generalization, and constraints from domain-specific knowledge. Recently, structures that rely on deep convolutional neural networks and recurrent neural networks have been extensively proposed for improving the performance of existing vision tasks, in general, and action prediction tasks, in particular. However, they have their own shortcomings, e.g., reliance on massive training data and lack of strong theoretical underpinnings. In this survey, we start by introducing the major sub-areas of the broad area of video predictive understanding, which recently have received intensive attention and proven to have practical value. Next, a thorough review of various early action recognition and future action prediction algorithms is provided with suitably organized divisions. Finally, we conclude our discussion with future research directions.	翻訳日:2021-07-14 00:40:12 公開日:2021-07-11
# (参考訳) CFTrack:3Dマルチオブジェクトトラッキングのためのセンターベースレーダとカメラフュージョン CFTrack: Center-based Radar and Camera Fusion for 3D Multi-Object Tracking ( http://arxiv.org/abs/2107.05150v1 ) ライセンス: CC BY 4.0	Ramin Nabati, Landon Harris, Hairong Qi	(参考訳) 3Dマルチオブジェクトトラッキングは、自動運転車の認識システムにおいて重要な要素である。障害物回避や経路計画といったタスクには、車両周辺のすべての動的オブジェクトを追跡することが不可欠である。自動運転車は通常、精度と信頼性を向上させるために異なるセンサーモードを備えている。近年、センサ融合は物体検出ネットワークで広く使われているが、既存のマルチオブジェクト追跡アルゴリズムのほとんどは単一の入力モダリティに依存するか、複数のセンシングモダリティによって提供される情報を十分に活用していない。本研究では,レーダとカメラセンサの融合に基づく共同物体検出・追跡のためのエンドツーエンドネットワークを提案する。提案手法では,物体検出に中心型レーダカメラ融合アルゴリズムを用い,物体関連にグリーディアルゴリズムを用いる。提案手法は,検出した物体の深さ,速度,2次元変位を時間とともに関連づける。これにより、我々の追跡アルゴリズムは、物体をオクルードし重ね合わせるのに非常に頑健になり、深さと速度の情報がネットワークの識別に役立ちます。提案手法は,20.0AMOTAを達成し,ベンチマークにおけるすべての視覚ベースの3Dトラッキング手法と,ベースラインのLiDARベースの手法とを比較検討する。提案手法は画像当たり35msのランタイムを持つオンラインであり,自動運転アプリケーションに適している。 3D multi-object tracking is a crucial component in the perception system of autonomous driving vehicles. Tracking all dynamic objects around the vehicle is essential for tasks such as obstacle avoidance and path planning. Autonomous vehicles are usually equipped with different sensor modalities to improve accuracy and reliability. While sensor fusion has been widely used in object detection networks in recent years, most existing multi-object tracking algorithms either rely on a single input modality, or do not fully exploit the information provided by multiple sensing modalities. In this work, we propose an end-to-end network for joint object detection and tracking based on radar and camera sensor fusion. Our proposed method uses a center-based radar-camera fusion algorithm for object detection and utilizes a greedy algorithm for object association. The proposed greedy algorithm uses the depth, velocity and 2D displacement of the detected objects to associate them through time. This makes our tracking algorithm very robust to occluded and overlapping objects, as the depth and velocity information can help the network in distinguishing them. 
We evaluate our method on the challenging nuScenes dataset, where it achieves 20.0 AMOTA and outperforms all vision-based 3D tracking methods in the benchmark, as well as the baseline LiDAR-based method. Our method is online with a runtime of 35ms per image, making it very suitable for autonomous driving applications.	翻訳日:2021-07-14 00:38:47 公開日:2021-07-11
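The CFTrack abstract describes a greedy association step that links detections over time using depth, velocity and 2D displacement. The sketch below shows one plausible form of such a step under stated assumptions: the cost function, its weights, the `max_cost` gate and the dictionary field names are all hypothetical and are not taken from the paper.

```python
import math


def association_cost(track, det, w_depth=1.0, w_vel=1.0, w_disp=1.0):
    """Hypothetical cost: weighted sum of depth, velocity and 2D displacement gaps."""
    d_depth = abs(track["depth"] - det["depth"])
    d_vel = abs(track["velocity"] - det["velocity"])
    d_disp = math.hypot(track["x"] - det["x"], track["y"] - det["y"])
    return w_depth * d_depth + w_vel * d_vel + w_disp * d_disp


def greedy_associate(tracks, detections, max_cost=5.0):
    """Match tracks to detections greedily, cheapest pair first.

    Returns a dict mapping track index -> detection index. Pairs whose cost
    exceeds max_cost are left unmatched (the gate value is an assumption).
    """
    pairs = sorted(
        (association_cost(t, d), ti, di)
        for ti, t in enumerate(tracks)
        for di, d in enumerate(detections)
    )
    matches, used_detections = {}, set()
    for cost, ti, di in pairs:
        if cost > max_cost:
            break  # all remaining pairs are even more expensive
        if ti in matches or di in used_detections:
            continue  # each track and each detection is used at most once
        matches[ti] = di
        used_detections.add(di)
    return matches


# Two objects that overlap in the image plane but differ in depth and velocity.
tracks = [
    {"x": 10.0, "y": 10.0, "depth": 5.0, "velocity": 1.0},
    {"x": 10.0, "y": 10.0, "depth": 20.0, "velocity": -2.0},
]
detections = [
    {"x": 10.5, "y": 10.0, "depth": 20.5, "velocity": -2.0},
    {"x": 10.0, "y": 10.5, "depth": 5.2, "velocity": 1.1},
]
matches = greedy_associate(tracks, detections)
```

In this toy case the two objects are indistinguishable by 2D position alone, and the depth and velocity terms are what separate them, which mirrors the robustness to overlapping objects that the abstract attributes to these cues.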
# (参考訳) 学術論文への埋め込み:単語埋め込みとTFIDFの有効性 Document Embedding for Scientific Articles: Efficacy of Word Embeddings vs TFIDF ( http://arxiv.org/abs/2107.05151v1 ) ライセンス: CC BY 4.0	H.J. Meijer, J. Truong, R. Karimi	(参考訳) ここ数年、ニューラルネットワークによる単語の埋め込みは自然言語処理の文献で人気を博した。研究は主に、ウィキペディアや他のニュースやソーシャルメディアソースなどの公開コーパスで訓練された単語埋め込みの品質と応用に焦点を当てている。しかし、これらの研究は一般的なテキストに限られており、それゆえに専門的な語彙や略語、学術的な文脈で一般的に用いられる科学的公式のような技術的・科学的ニュアンスを欠いている。本研究は,大規模学術コーパスに適用した単語埋め込みの性能に着目した。具体的には、訓練された単語埋め込みの品質と効率を、科学論文のコンテンツモデリングにおけるTFIDF表現と比較する。我々は、約7000万の科学論文のタイトルと要約に基づいて訓練されたWord2vecスキップグラムモデルを使用する。さらに,コンテンツモデルを科学的文脈で評価するベンチマークを開発した。このベンチマークは、2017年に発行された約13万記事の論文とジャーナルをマッチングする分類タスクに基づいている。以上の結果から,単語埋め込みに基づくコンテンツモデルはタイトル(短文)ではよいが,TFIDFは抽象文(長文)ではよいことがわかった。しかし、より大きなテキストに対するtfidfのわずかな改善は、3.7倍のメモリ要求と最大184倍の計算時間を犠牲にして、オンラインアプリケーションでは非効率になる可能性がある。さらに,組込みモデルを用いて2次元のジャーナルの可視化を行い,定性的に組込みモデルを検査した。このグラフは有用な洞察を示し、新しいジャーナルを提案するための競合ジャーナルやギャップを見つけるために使用できる。 Over the last few years, neural network derived word embeddings became popular in the natural language processing literature. Studies conducted have mostly focused on the quality and application of word embeddings trained on public available corpuses such as Wikipedia or other news and social media sources. However, these studies are limited to generic text and thus lack technical and scientific nuances such as domain specific vocabulary, abbreviations, or scientific formulas which are commonly used in academic context. This research focuses on the performance of word embeddings applied to a large scale academic corpus. More specifically, we compare quality and efficiency of trained word embeddings to TFIDF representations in modeling content of scientific articles. We use a word2vec skip-gram model trained on titles and abstracts of about 70 million scientific articles. Furthermore, we have developed a benchmark to evaluate content models in a scientific context. 
The benchmark is based on a categorization task that matches articles to journals for about 1.3 million articles published in 2017. Our results show that content models based on word embeddings are better for titles (short text) while TFIDF works better for abstracts (longer text). However, the slight improvement of TFIDF for larger text comes at the expense of 3.7 times more memory requirement as well as up to 184 times higher computation times which may make it inefficient for online applications. In addition, we have created a 2-dimensional visualization of the journals modeled via embeddings to qualitatively inspect embedding model. This graph shows useful insights and can be used to find competitive journals or gaps to propose new journals.	翻訳日:2021-07-14 00:25:45 公開日:2021-07-11
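For readers unfamiliar with the TFIDF baseline compared above, the following is a minimal sketch of TF-IDF document vectors with cosine similarity, using only the standard library. The whitespace tokenization, the unsmoothed `log(N / df)` weighting and the toy documents are assumptions for illustration; the paper's actual pipeline is not described at this level of detail.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append(
            {t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


docs = [
    "graph signal transform",
    "graph neural network",
    "protein folding energy",
]
vecs = tfidf_vectors(docs)
sim_related = cosine(vecs[0], vecs[1])    # the two documents share the term "graph"
sim_unrelated = cosine(vecs[0], vecs[2])  # no shared terms
```

Each TF-IDF vector has one dimension per vocabulary term, whereas an embedding-based document vector has a small fixed dimension; that difference is one plausible source of the memory and computation gap reported in the abstract.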
# 階層ラッパーを用いた因果発見の効率と精度の向上 Improving Efficiency and Accuracy of Causal Discovery Using a Hierarchical Wrapper ( http://arxiv.org/abs/2107.05001v1 ) ライセンス: Link先を確認	Shami Nisimov, Yaniv Gurwicz, Raanan Y. Rohekar, Gal Novik	(参考訳) 観測データからの因果発見は多くの科学分野において重要なツールである。特定の仮定の下では、科学者は現象を説明し、予測し、決定することができる。大標本極限においては、健全かつ完全な因果探索アルゴリズムがこれまでに導入されており、因果関係を表す有向非巡回グラフ(DAG)またはその等価クラスが探索されている。しかし、現実のケースでは、有限のトレーニングデータしか利用できないため、これらのアルゴリズムが使用する統計的テストのパワーが制限され、推論された因果モデルに誤差が生じる。これは、できるだけ少ない統計テストで済ませる戦略を考案することによって、一般的に対処される。本稿では,既存の制約に基づく因果発見アルゴリズムのための再帰的ラッパーとして,健全性と完全性を保持する戦略を提案する。初期から正規化されたminカット基準を用いて観測変数を再帰的にクラスタリングし、バックトラック中にベースライン因果探索アルゴリズムを用いて局所的な部分グラフを学習する。そしてそれらを組み合わせ、完全性を保証する。合成データを用いたアブレーション研究と一般的な実世界ベンチマークにより、本手法はベースラインアルゴリズムと比べて、必要な統計的検定が大幅に少なく、より正確なグラフを学習し、実行時間も短いことを示す。 Causal discovery from observational data is an important tool in many branches of science. Under certain assumptions it allows scientists to explain phenomena, predict, and make decisions. In the large sample limit, sound and complete causal discovery algorithms have been previously introduced, where a directed acyclic graph (DAG), or its equivalence class, representing causal relations is searched. However, in real-world cases, only finite training data is available, which limits the power of statistical tests used by these algorithms, leading to errors in the inferred causal model. This is commonly addressed by devising a strategy for using as few as possible statistical tests. In this paper, we introduce such a strategy in the form of a recursive wrapper for existing constraint-based causal discovery algorithms, which preserves soundness and completeness. It recursively clusters the observed variables using the normalized min-cut criterion from the outset, and uses a baseline causal discovery algorithm during backtracking for learning local sub-graphs.
It then combines them and ensures completeness. By an ablation study, using synthetic data, and by common real-world benchmarks, we demonstrate that our approach requires significantly fewer statistical tests, learns more accurate graphs, and requires shorter run-times than the baseline algorithm.	翻訳日:2021-07-13 16:23:27 公開日:2021-07-11
# 畳み込みニューラルネットワークのためのプルーニング基準のブレンディング Blending Pruning Criteria for Convolutional Neural Networks ( http://arxiv.org/abs/2107.05033v1 ) ライセンス: Link先を確認	Wei He, Zhongzhan Huang, Mingfu Liang, Senwei Liang, Haizhao Yang	(参考訳) 様々な視覚アプリケーションにおける畳み込みニューラルネットワーク(CNN)の進歩は多くの注目を集めている。しかし、CNNの大多数は、現実世界のデプロイメントの厳しい要件を満たすことができません。これを解決するために、最近の人気ネットワークプルーニングはモデルの冗長性を抑える効果的な方法である。しかし、異なる刈り取り基準での「類似性」によるフィルタのランキングは矛盾する可能性がある。 1つのフィルタは特定の基準に従って重要であり、もう1つの基準では不要であり、これは各基準が包括的な「重要度」の部分的なビューであることを示している。このモチベーションから,既存のフィルタプルーニング基準を統合するための新しい枠組みを提案する。提案手法は,基準クラスタリングとフィルタ重要度校正の2段階を含む。まず,「重要」スコアのランクに基づいて,階層的クラスタリングによってプルーニング基準を導出する。第2に,各クラスタ内で選択されたブレンド候補の重要度を調整し,最適ブレンド基準を進化的アルゴリズムで探索するキャリブレーション係数を提案する。 CIFAR-100 と ImageNet ベンチマークの定量的結果は,我々のフレームワークが最先端のベースラインより優れており,刈り込み後のコンパクトモデル性能に低下していることを示している。 The advancement of convolutional neural networks (CNNs) on various vision applications has attracted lots of attention. Yet the majority of CNNs are unable to satisfy the strict requirement for real-world deployment. To overcome this, the recent popular network pruning is an effective method to reduce the redundancy of the models. However, the ranking of filters according to their "importance" on different pruning criteria may be inconsistent. One filter could be important according to a certain criterion, while it is unnecessary according to another one, which indicates that each criterion is only a partial view of the comprehensive "importance". From this motivation, we propose a novel framework to integrate the existing filter pruning criteria by exploring the criteria diversity. The proposed framework contains two stages: Criteria Clustering and Filters Importance Calibration. First, we condense the pruning criteria via layerwise clustering based on the rank of "importance" score. Second, within each cluster, we propose a calibration factor to adjust their significance for each selected blending candidates and search for the optimal blending criterion via Evolutionary Algorithm. 
Quantitative results on the CIFAR-100 and ImageNet benchmarks show that our framework outperforms the state-of-the-art baselines with regard to compact-model performance after pruning.	翻訳日:2021-07-13 16:22:09 公開日:2021-07-11
# 行動認識における領域適応のための相関情報の調整 Aligning Correlation Information for Domain Adaptation in Action Recognition ( http://arxiv.org/abs/2107.04932v1 ) ライセンス: Link先を確認	Yuecong Xu, Jianfei Yang, Haozhi Cao, Kezhi Mao, Jianxiong Yin, Simon See	(参考訳) ドメイン適応(DA)はドメインシフトに対処し、異なるシナリオにネットワークを適用することを可能にする。近年,様々な画像daアプローチが提案されているが,ビデオdaに対する研究は限られている。これは、時空間における画素の長期依存性として抽出された相関特徴を含む、ビデオの様々な特徴の適応の複雑さによるものである。相関特性は行動クラスと高度に関連し,教師付き行動認識タスクによる正確な映像特徴抽出における効果を証明した。しかし、同じアクションの相関特性はドメインシフトによってドメインによって異なる。そこで本研究では,画素相関を調整して動作映像をアライメントする新しいadversarial correlation adaptation network (acan)を提案する。 ACANは、Pixel correlation Discrepancy (PCD)と呼ばれる相関情報の分布を最小限にすることを目的としている。さらに、ビデオDA研究は、より大きなドメインシフトを持つクロスドメインビデオデータセットの欠如によって制限されている。そこで我々は,ドメイン間の統計的な差が大きいことによるドメインシフトが大きいhmdb-aridデータセットを導入する。このデータセットは、現在のデータセットをダークビデオ分類に活用するために構築されている。実験により,既存のビデオDAデータセットと新しいビデオDAデータセットの両方に対して,提案したACANの最先端性能を示す。 Domain adaptation (DA) approaches address domain shift and enable networks to be applied to different scenarios. Although various image DA approaches have been proposed in recent years, there is limited research towards video DA. This is partly due to the complexity in adapting the different modalities of features in videos, which includes the correlation features extracted as long-term dependencies of pixels across spatiotemporal dimensions. The correlation features are highly associated with action classes and proven their effectiveness in accurate video feature extraction through the supervised action recognition task. Yet correlation features of the same action would differ across domains due to domain shift. Therefore we propose a novel Adversarial Correlation Adaptation Network (ACAN) to align action videos by aligning pixel correlations. ACAN aims to minimize the distribution of correlation information, termed as Pixel Correlation Discrepancy (PCD). Additionally, video DA research is also limited by the lack of cross-domain video datasets with larger domain shifts. 
We, therefore, introduce a novel HMDB-ARID dataset with a larger domain shift caused by a larger statistical difference between domains. This dataset is built in an effort to leverage current datasets for dark video classification. Empirical results demonstrate the state-of-the-art performance of our proposed ACAN for both existing and the new video DA datasets.	翻訳日:2021-07-13 16:19:28 公開日:2021-07-11
# 部分逆時間的注意ネットワークを用いた部分映像領域適応 Partial Video Domain Adaptation with Partial Adversarial Temporal Attentive Network ( http://arxiv.org/abs/2107.04941v1 ) ライセンス: Link先を確認	Yuecong Xu, Jianfei Yang, Haozhi Cao, Qi Li, Kezhi Mao, Zhenghua Chen	(参考訳) 部分的ドメイン適応 (Partial Domain Adaptation, PDA) は実用的で一般的なドメイン適応シナリオであり、ソースラベル空間がターゲットとなるように、完全に共有されたラベル空間の仮定を緩和する。 PDAの主な課題は、ソースのみのクラスによる負の転送の問題である。ビデオの場合、そのような負の転送は、空間的特徴と時間的特徴の両方によって引き起こされ、より困難なビデオ領域適応(PVDA)問題を引き起こす可能性がある。本稿では,ソースのみのクラスをフィルタリングするための空間的特徴と時間的特徴を両立させてPVDA問題に対処する,新しいPATAN(Partial Adversarial Temporal Attentive Network)を提案する。さらにpatanは、クラス濾過プロセスに寄与する局所的な時間的特徴に従うことによって、効果的な時間的特徴を構築する。さらに、PVDA問題の研究を容易にするための新しいベンチマークを導入し、幅広いPVDAシナリオについて紹介する。複数のPVDAベンチマークで提案したPATANの最先端性能を実証した。 Partial Domain Adaptation (PDA) is a practical and general domain adaptation scenario, which relaxes the fully shared label space assumption such that the source label space subsumes the target one. The key challenge of PDA is the issue of negative transfer caused by source-only classes. For videos, such negative transfer could be triggered by both spatial and temporal features, which leads to a more challenging Partial Video Domain Adaptation (PVDA) problem. In this paper, we propose a novel Partial Adversarial Temporal Attentive Network (PATAN) to address the PVDA problem by utilizing both spatial and temporal features for filtering source-only classes. Besides, PATAN constructs effective overall temporal features by attending to local temporal features that contribute more toward the class filtration process. We further introduce new benchmarks to facilitate research on PVDA problems, covering a wide range of PVDA scenarios. Empirical results demonstrate the state-of-the-art performance of our proposed PATAN across the multiple PVDA benchmarks.	翻訳日:2021-07-13 16:19:12 公開日:2021-07-11
# 自律走行用物体検出モデルにおける表面不確かさの定量化 Prediction Surface Uncertainty Quantification in Object Detection Models for Autonomous Driving ( http://arxiv.org/abs/2107.04991v1 ) ライセンス: Link先を確認	Ferhat Ozgur Catak, Tao Yue, Shaukat Ali	(参考訳) 自動運転車における物体検出は、一般的にカメラ画像とlidar入力に基づいており、オブジェクト認識や速度調整などの意思決定のためのディープニューラルネットワークなどの予測モデルを訓練するためによく使用される。このような意思決定における誤りが損なわれる可能性があるため、不確実性の測定を通じて、そのような予測モデルによる決定の信頼性を測定することが不可欠である。不確実性は、ディープラーニングモデルにおいて、しばしば分類問題に対して測定される。しかし、自動運転におけるディープラーニングモデルは、しばしば多出力回帰モデルである。そこで,このような回帰モデルの予測不確実性を測定するために,pure (prediction surface uncertainty) と呼ばれる新しい手法を提案する。物体認識問題を2次元カメラビューにおける物体位置を見つけるために複数の出力を持つ回帰モデルとして定式化する。評価のために、広く応用された3つのオブジェクト認識モデル(YoLo、SSD300、SSD512)を修正し、KITTI、Stanford Cars、Berkeley DeepDrive、NEXETデータセットを使用しました。その結果,予測表面の不確かさと予測精度との間に統計的に有意な負の相関がみられた。 Object detection in autonomous cars is commonly based on camera images and Lidar inputs, which are often used to train prediction models such as deep artificial neural networks for decision making for object recognition, adjusting speed, etc. A mistake in such decision making can be damaging; thus, it is vital to measure the reliability of decisions made by such prediction models via uncertainty measurement. Uncertainty, in deep learning models, is often measured for classification problems. However, deep learning models in autonomous driving are often multi-output regression models. Hence, we propose a novel method called PURE (Prediction sURface uncErtainty) for measuring prediction uncertainty of such regression models. We formulate the object recognition problem as a regression model with more than one outputs for finding object locations in a 2-dimensional camera view. For evaluation, we modified three widely-applied object recognition models (i.e., YoLo, SSD300 and SSD512) and used the KITTI, Stanford Cars, Berkeley DeepDrive, and NEXET datasets. 
Results showed a statistically significant negative correlation between prediction surface uncertainty and prediction accuracy, suggesting that uncertainty significantly impacts the decisions made in autonomous driving.	翻訳日:2021-07-13 16:18:55 公開日:2021-07-11
# 適応型クラスリバランシング自己学習による半教師付き物体検出 Semi-Supervised Object Detection with Adaptive Class-Rebalancing Self-Training ( http://arxiv.org/abs/2107.05031v1 ) ライセンス: Link先を確認	Fangyuan Zhang, Tianxiang Pan, Bin Wang	(参考訳) 本研究は半教師付き物体検出(SSOD)を掘り下げ,ラベルなしデータの追加により検出性能を向上させる。最先端のSSOD性能は、トレーニングの監督が基礎的真実と疑似ラベルで構成されるセルフトレーニングによって最近達成されている。本研究では,SSODにおけるクラス不均衡が自己学習の有効性を著しく損なうことを観察する。クラス不均衡に対処するため、CropBankと呼ばれる新しいメモリモジュールを用いた適応型クラス再分散自己学習(ACRST)を提案する。 ACRSTは、トレーニングデータをCropBankから抽出された前景インスタンスと適応的に再バランスし、クラス不均衡を軽減する。検出タスクの複雑さが高いため,SSODでは,自己学習とデータバランスの両方が雑音を伴う疑似ラベルに苦しむのが観察された。そこで本研究では,正確な疑似ラベルを生成するための新しい2段階フィルタリングアルゴリズムを提案する。提案手法は,MS-COCOおよびVOCベンチマークの良好な改善を実現する。 MS-COCOで1%のラベル付きデータのみを使用する場合,教師付きベースラインよりも17.02mAP,最先端手法に比べて5.32mAPの改善が達成される。 This study delves into semi-supervised object detection (SSOD) to improve detector performance with additional unlabeled data. State-of-the-art SSOD performance has been achieved recently by self-training, in which training supervision consists of ground truths and pseudo-labels. In current studies, we observe that class imbalance in SSOD severely impedes the effectiveness of self-training. To address the class imbalance, we propose adaptive class-rebalancing self-training (ACRST) with a novel memory module called CropBank. ACRST adaptively rebalances the training data with foreground instances extracted from the CropBank, thereby alleviating the class imbalance. Owing to the high complexity of detection tasks, we observe that both self-training and data-rebalancing suffer from noisy pseudo-labels in SSOD. Therefore, we propose a novel two-stage filtering algorithm to generate accurate pseudo-labels. Our method achieves satisfactory improvements on MS-COCO and VOC benchmarks. When using only 1% labeled data in MS-COCO, our method achieves 17.02 mAP improvement over supervised baselines, and 5.32 mAP improvement compared with state-of-the-art methods.	翻訳日:2021-07-13 16:18:36 公開日:2021-07-11
# 対話型可視化と解釈可能な機械学習を用いたセルフサービスデータ分類 Self-service Data Classification Using Interactive Visualization and Interpretable Machine Learning ( http://arxiv.org/abs/2107.04971v1 ) ライセンス: Link先を確認	Sridevi Narayana Wagle, Boris Kovalerchuk	(参考訳) 機械学習アルゴリズムは、エンドユーザーと開発者の両方が複雑なブラックボックスモデルと見なすモデルをしばしば生成する。彼らは設計したドメインの観点からモデルを説明することができません。提案する反復的ビジュアル論理分類器(ivlc)は、エンドユーザがモデルを設計し、信頼性を高め、精度を損なうことなくデータを分類できる、解釈可能な機械学習アルゴリズムである。このようなテクニックは、医療領域におけるがんデータなどの機密で重要なデータを、高いコストで処理する上で特に有用である。インタラクティブでロスレスな多次元可視化を提案することで、エンドユーザは、説明可能な決定を下すことができるデータ内のパターンを識別できる。このようなオプションは、ブラックボックスの機械学習方法論では不可能だ。解釈可能なIVLCアルゴリズムは、Interactive Shifted Paired Coordinates Software System (SPCVis)によってサポートされている。ユーザ対話型機能を備えた無損失多次元データ可視化システムである。インタラクティブなアプローチは、マシンラーニングの専門家に頼らずに、エンドユーザがセルフサービスとしてデータ分類を実行するための柔軟性を提供する。インタラクティブなパターン発見は、数百の次元/機能を持つ大きなデータセットを扱うときに困難になる。この問題を解決するために、この章では、新しいコーディネートオーダー最適化アルゴリズム(COO)と遺伝的アルゴリズムを組み合わせた自動分類手法を提案する。 COOアルゴリズムは、データ分離を最もよく表す座標対列を自動的に生成し、遺伝的アルゴリズムは、データ分類のための領域を自動的に生成することにより、提案したIVLCアルゴリズムの最適化を支援する。このアプローチの有効性は、データ分類に使用されるインタラクティブプロセスと自動化プロセスの両方をカバーするベンチマークデータセットの実験によって示されている。 Machine learning algorithms often produce models considered as complex black-box models by both end users and developers. They fail to explain the model in terms of the domain they are designed for. The proposed Iterative Visual Logical Classifier (IVLC) is an interpretable machine learning algorithm that allows end users to design a model and classify data with more confidence and without having to compromise on the accuracy. Such technique is especially helpful when dealing with sensitive and crucial data like cancer data in the medical domain with high cost of errors. With the help of the proposed interactive and lossless multidimensional visualization, end users can identify the pattern in the data based on which they can make explainable decisions. Such options would not be possible in black box machine learning methodologies. 
The interpretable IVLC algorithm is supported by the Interactive Shifted Paired Coordinates Software System (SPCVis), a lossless multidimensional data visualization system with interactive features. The interactive approach gives end users the flexibility to perform data classification as a self-service, without having to rely on a machine learning expert. Interactive pattern discovery becomes challenging when dealing with large data sets with hundreds of dimensions/features. To overcome this problem, this chapter proposes an automated classification approach combining a new Coordinate Order Optimizer (COO) algorithm with a genetic algorithm. The COO algorithm automatically generates the coordinate pair sequences that best represent the data separation, and the genetic algorithm helps optimize the proposed IVLC algorithm by automatically generating the areas for data classification. The feasibility of the approach is shown by experiments on benchmark datasets covering both the interactive and the automated processes used for data classification.	翻訳日:2021-07-13 16:17:02 公開日:2021-07-11
# 分布外ダイナミクス検出:RL関連ベンチマークと結果 Out-of-Distribution Dynamics Detection: RL-Relevant Benchmarks and Results ( http://arxiv.org/abs/2107.04982v1 ) ライセンス: Link先を確認	Mohamad H Danesh and Alan Fern	(参考訳) 本研究では,時間的プロセスの動的変化をトレーニング・分散力学と比較して検出するOODD(Out-of-distriion dynamics)の問題点について検討する。これは制御、強化学習(RL)、多変量時系列の応用に関係しており、テスト時間ダイナミクスの変更は未知の方法で学習コントローラや予測器の性能に影響を与える可能性がある。この問題は、学習したコントローラがトレーニング環境に過度に適合する、深いRLの文脈において特に重要である。しかし、現在RL研究でよく使われる環境の種類について、OODDベンチマークが確立されていない。最初のコントリビューションは、OODDのさまざまなタイプと強度を持つ共通RL環境から派生したOODDベンチマークを設計することです。第2のコントリビューションは、繰り返し暗黙的量子化ネットワーク(RIQN)に基づいて、OODD検出のための自己回帰予測エラーを監視する強力なOODDベースラインアプローチを設計することである。最後のコントリビューションは、RIQNアプローチをベンチマークで評価し、将来の比較のためのベースライン結果を提供することです。 We study the problem of out-of-distribution dynamics (OODD) detection, which involves detecting when the dynamics of a temporal process change compared to the training-distribution dynamics. This is relevant to applications in control, reinforcement learning (RL), and multi-variate time-series, where changes to test time dynamics can impact the performance of learning controllers/predictors in unknown ways. This problem is particularly important in the context of deep RL, where learned controllers often overfit to the training environment. Currently, however, there is a lack of established OODD benchmarks for the types of environments commonly used in RL research. Our first contribution is to design a set of OODD benchmarks derived from common RL environments with varying types and intensities of OODD. Our second contribution is to design a strong OODD baseline approach based on recurrent implicit quantile networks (RIQNs), which monitors autoregressive prediction errors for OODD detection. Our final contribution is to evaluate the RIQN approach on the benchmarks to provide baseline results for future comparison.	翻訳日:2021-07-13 16:16:38 公開日:2021-07-11
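The idea of monitoring autoregressive prediction errors, as described in the abstract above, can be illustrated with a minimal sketch. It is not the paper's method: the recurrent implicit quantile network is replaced here by a plain least-squares AR(1) predictor, and the toy process, the function names (`simulate`, `fit_ar1`, `first_alarm`), the moving-average window, and the threshold rule are all assumptions made for illustration.

```python
import random


def simulate(n, rng, a=0.9, drift=0.0, noise=0.1, x0=0.0):
    """Toy process x[t] = a * x[t-1] + drift + noise * eps (hypothetical stand-in for an environment)."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(a * xs[-1] + drift + noise * rng.gauss(0.0, 1.0))
    return xs


def fit_ar1(series):
    """Closed-form least-squares fit of x[t+1] ~ a * x[t] + b on training data."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var = sum((xi - mean_x) ** 2 for xi in x)
    a = cov / var
    return a, mean_y - a * mean_x


def prediction_errors(series, a, b):
    """Absolute one-step-ahead prediction errors of the fitted model."""
    return [abs(nxt - (a * cur + b)) for cur, nxt in zip(series[:-1], series[1:])]


def first_alarm(series, a, b, threshold, window=10):
    """Return the time index at which the moving-average error first exceeds
    the threshold, or None if it never does."""
    err = prediction_errors(series, a, b)
    for start in range(len(err) - window + 1):
        if sum(err[start:start + window]) / window > threshold:
            # err[k] scores series[k + 1], so the window ends at time start + window.
            return start + window
    return None
```

One possible way to use it: fit on a trajectory from the training dynamics, set the threshold to a multiple of the mean training error, and call `first_alarm` on test trajectories. A trajectory whose dynamics gain a constant drift partway through should trigger an alarm shortly after the change, while an in-distribution trajectory should not.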
# 大量半導体プロセスにおける機械学習に基づくCVD仮想メトロロジー Machine Learning based CVD Virtual Metrology in Mass Produced Semiconductor Process ( http://arxiv.org/abs/2107.05071v1 ) ライセンス: Link先を確認	Yunsong Xie, Ryan Stearrett	(参考訳) 機械学習ベースのCVD(Chemical Vapor Deposition)仮想メトロロジー(VM)について、データインプティング、特徴選択、回帰アルゴリズムという3つの重要な側面のクロスベンチマークが行われた。その結果,線形の特徴選択・回帰アルゴリズムはVMデータに対して大幅にアンダーフィットすることが判明した。最適な精度が得られる場合でもデータの可用性は約70%にすぎないため、高い予測精度を達成するにはデータのインプティングも必要である。この研究は、非線形の特徴選択・回帰アルゴリズムと最近傍データインプティングアルゴリズムを組み合わせることで、0.7という高い予測精度が得られることを示唆している。これによりCVD処理のばらつきが70%低減され、物理メトロロジーの頻度の低下と、品質が向上したより信頼性の高い大量生産ウェハにつながると考えられている。 A cross-benchmark has been done on three critical aspects, data imputing, feature selection, and regression algorithms, for machine learning based chemical vapor deposition (CVD) virtual metrology (VM). The results reveal that linear feature selection and regression algorithms extensively under-fit the VM data. Data imputing is also necessary to achieve a higher prediction accuracy, as the data availability is only ~70% when optimal accuracy is obtained. This work suggests that a nonlinear feature selection and regression algorithm combined with a nearest data imputing algorithm would provide a prediction accuracy as high as 0.7. This would lead to a 70% reduction in CVD processing variation, which is believed to lead to a reduced frequency of physical metrology as well as more reliable mass-produced wafers with improved quality.	翻訳日:2021-07-13 16:16:19 公開日:2021-07-11
# 1つのマップがすべてに適合しない:マルチモーダル医療画像におけるサニエンシマップ説明の評価 One Map Does Not Fit All: Evaluating Saliency Map Explanation on Multi-Modal Medical Images ( http://arxiv.org/abs/2107.05047v1 ) ライセンス: Link先を確認	Weina Jin, Xiaoxiao Li, Ghassan Hamarneh	(参考訳) 臨床エンドユーザに予測を説明することは、臨床決定支援のためにAIモデルのパワーを活用する必要がある。医療画像では、塩分マップが最も一般的な説明形式である。マップはAIモデルの予測の重要な特徴を強調している。多くのサリエンシマップ法が提案されているが、それぞれのモダリティ/チャンネルが同じ基礎となる生体医学現象の異なる臨床的意味を持つマルチモーダルな医用画像において、意思決定をいかにうまく行うかは分かっていない。このようなモダリティに依存した特徴を理解することは、臨床ユーザーのAI決定の解釈に不可欠である。臨床的に重要な問題であるが技術的に無視される問題に対処するため,MSFI(Modality-Specific Feature Importance)測定基準を提案し,サリエンシマップがモダリティ特有の重要な特徴を強調できるかどうかを検討する。 MSFIは、モダリティ優先順位付けおよびモダリティ特異的特徴ローカライゼーションに関する臨床要件を符号化する。臨床用ユーザスタディを含む16のサリエンシーマップ法について評価した結果,ほとんどのサリエンシーマップ法はモダリティ重要情報を一般に捉えたものの,モダリティ固有の重要な特徴を一貫して正確に強調することはできなかった。評価結果は,サリエンシマップ法の選択をガイドし,臨床応用をターゲットとした新たな手法を提案する。 Being able to explain the prediction to clinical end-users is a necessity to leverage the power of AI models for clinical decision support. For medical images, saliency maps are the most common form of explanation. The maps highlight important features for AI model's prediction. Although many saliency map methods have been proposed, it is unknown how well they perform on explaining decisions on multi-modal medical images, where each modality/channel carries distinct clinical meanings of the same underlying biomedical phenomenon. Understanding such modality-dependent features is essential for clinical users' interpretation of AI decisions. To tackle this clinically important but technically ignored problem, we propose the MSFI (Modality-Specific Feature Importance) metric to examine whether saliency maps can highlight modality-specific important features. MSFI encodes the clinical requirements on modality prioritization and modality-specific feature localization. 
Our evaluations on 16 commonly used saliency map methods, including a clinician user study, show that although most saliency map methods captured modality importance information in general, most of them failed to highlight modality-specific important features consistently and precisely. The evaluation results guide the choices of saliency map methods and provide insights to propose new ones targeting clinical applications.	翻訳日:2021-07-13 16:11:12 公開日:2021-07-11
# コモンセンス知識統合によるゼロショットシーングラフ関係予測 Zero-Shot Scene Graph Relation Prediction through Commonsense Knowledge Integration ( http://arxiv.org/abs/2107.05080v1 ) ライセンス: Link先を確認	Xuan Kan, Hejie Cui, Carl Yang	(参考訳) 画像内の実体間の関係予測はシーングラフ生成(SGG)の重要なステップであり、様々な視覚的理解や推論タスクにさらに影響を及ぼす。しかし、既存のSGGフレームワークは重い訓練を必要とするが、目に見えない(ゼロショット)トリプルをモデル化することができない。本研究は, 共通理解の欠如, すなわち, 類似する実体を関連付け, 世界の一般的な理解に基づく類似関係を推測する能力に起因していることを強調する。このギャップを埋めるために、特にゼロショット関係予測において、SGGのコモンセンス知識を統合するフレームワークであるCommOnsense-integrAted sCenegrapHrElation pRediction (COACHER)を提案する。具体的には、外部コモンセンス知識グラフ内のエンティティの周辺と経路をモデル化し、最先端のSGGフレームワーク上でそれらを統合するための新しいグラフマイニングパイプラインを開発する。提案手法の有効性を実証するために,Visual Genome のオリジナルデータセットとオペレーテッドデータセットの総合的定量的評価と定性ケーススタディを行った。 Relation prediction among entities in images is an important step in scene graph generation (SGG), which further impacts various visual understanding and reasoning tasks. Existing SGG frameworks, however, require heavy training yet are incapable of modeling unseen (i.e.,zero-shot) triplets. In this work, we stress that such incapability is due to the lack of commonsense reasoning,i.e., the ability to associate similar entities and infer similar relations based on general understanding of the world. To fill this gap, we propose CommOnsense-integrAted sCenegrapHrElation pRediction (COACHER), a framework to integrate commonsense knowledge for SGG, especially for zero-shot relation prediction. Specifically, we develop novel graph mining pipelines to model the neighborhoods and paths around entities in an external commonsense knowledge graph, and integrate them on top of state-of-the-art SGG frameworks. Extensive quantitative evaluations and qualitative case studies on both original and manipulated datasets from Visual Genome demonstrate the effectiveness of our proposed approach.	翻訳日:2021-07-13 16:10:50 公開日:2021-07-11
# SE-PSNet:Panoptic Segmentation Networkのためのシルエットベースの拡張機能 SE-PSNet: Silhouette-based Enhancement Feature for Panoptic Segmentation Network ( http://arxiv.org/abs/2107.05093v1 ) ライセンス: Link先を確認	Shuo-En Chang, Yi-Cheng Yang, En-Ting Lin, Pei-Yung Hsiao, Li-Chen Fu	(参考訳) 最近、セマンティックセグメンテーションとインスタンスセグメンテーションを組み合わせたpanopticセグメンテーションタスクがあり、それぞれのピクセルを対応するインスタンスidで分類することを目標としている。本研究では,panoptic segmentationタスクに取り組むための解法を提案する。全体構造はボトムアップ法とトップダウン法を組み合わせている。したがって、パフォーマンスが向上するだけでなく、実行速度も維持できる。ネットワークは主にマスクの品質に注意を払っている。前の研究では、オブジェクトの不均一な輪郭が出現する可能性が高まり、結果として低品質の予測が行われることがわかりました。そこで我々は,マスク改善のために,物体と背景のシルエットに対する拡張機能とそれに対応する損失関数を提案する。一方,新しい信頼度スコアを用いて咬合問題を解決し,ネットワークがより高品質なマスクを予測結果として使用する傾向を示した。研究の検証には,cocoデータセットとcityscapesデータセットを使用して実験を行い,高速な推論時間で競合結果を得た。 Recently, there has been a panoptic segmentation task combining semantic and instance segmentation, in which the goal is to classify each pixel with the corresponding instance ID. In this work, we propose a solution to tackle the panoptic segmentation task. The overall structure combines the bottom-up method and the top-down method. Therefore, not only can there be better performance, but also the execution speed can be maintained. The network mainly pays attention to the quality of the mask. In the previous work, we can see that the uneven contour of the object is more likely to appear, resulting in low-quality prediction. Accordingly, we propose enhancement features and corresponding loss functions for the silhouette of objects and backgrounds to improve the mask. Meanwhile, we use the new proposed confidence score to solve the occlusion problem and make the network tend to use higher quality masks as prediction results. To verify our research, we used the COCO dataset and CityScapes dataset to do experiments and obtained competitive results with fast inference time.	翻訳日:2021-07-13 16:10:29 公開日:2021-07-11
# 過パラメータ浅層ニューラルネットワークを用いたエネルギーベースモデルのデュアルトレーニング Dual Training of Energy-Based Models with Overparametrized Shallow Neural Networks ( http://arxiv.org/abs/2107.05134v1 ) ライセンス: Link先を確認	Carles Domingo-Enrich, Alberto Bietti, Marylou Gabrié, Joan Bruna, Eric Vanden-Eijnden	(参考訳) エネルギーベースモデル(Energy-based model、EBM)は、通常最尤推定によって訓練される生成モデルである。このアプローチは、このエネルギーに関連するギブス分布をサンプリングする必要があるため、訓練されたエネルギーが凸でない一般的な状況では困難になる。 Fenchel双対性の一般的な結果を用いて、アクティブ(いわゆる特徴学習)レジームとレイジーレジームの両方において、浅い過パラメータ化ニューラルネットワークエネルギーを持つ最尤EBMに双対な変分原理を導出する。アクティブレジームにおいて、この双対定式化は、サンプル空間の粒子とエネルギーのパラメータ空間のニューロンを同時に更新する訓練アルゴリズムをもたらす。また,データセットからランダムに抽出したサンプルで粒子を時折リスタートさせるこのアルゴリズムの変種も検討し,反復ステップ毎にこれらのリスタートを行うことがスコアマッチングトレーニングに対応していることを示す。 この双対アルゴリズムで中間的なパラメータ設定を使用することで、最尤とスコアマッチングトレーニングを補間する方法が得られる。これらの結果は単純な数値実験で示される。 Energy-based models (EBMs) are generative models that are usually trained via maximum likelihood estimation. This approach becomes challenging in generic situations where the trained energy is nonconvex, due to the need to sample the Gibbs distribution associated with this energy. Using general Fenchel duality results, we derive variational principles dual to maximum likelihood EBMs with shallow overparametrized neural network energies, both in the active (aka feature-learning) and lazy regimes. In the active regime, this dual formulation leads to a training algorithm in which one updates concurrently the particles in the sample space and the neurons in the parameter space of the energy. We also consider a variant of this algorithm in which the particles are sometimes restarted at random samples drawn from the data set, and show that performing these restarts at every iteration step corresponds to score matching training. Using intermediate parameter setups in our dual algorithm thereby gives a way to interpolate between maximum likelihood and score matching training. These results are illustrated in simple numerical experiments.	翻訳日:2021-07-13 16:06:50 公開日:2021-07-11
# 楕円ペア座標を用いた非線形視覚知識発見 Non-linear Visual Knowledge Discovery with Elliptic Paired Coordinates ( http://arxiv.org/abs/2107.04974v1 ) ライセンス: Link先を確認	Rose McDonald, Boris Kovalerchuk	(参考訳) 裸眼で2-3次元以上のデータから視覚的な知識を発見できることは、人間が困難である。本章では,新しいepc(eliptic paired coordinates)可視化を用いて,予測機械学習モデルをインタラクティブに発見する効率について検討する。 EPCは,多次元データを可視化し,多次元情報を2次元で保存した視覚機械学習を支援する。平行座標と放射座標と比較して、epcの可視化は各n-d点の視覚要素の半分しか必要としない。本研究で開発された対話型ソフトウェアシステムEllipseVisは、高次元データセットを処理し、EPCビジュアライゼーションを作成し、EPCにおける支配ルールを発見して予測的分類モデルを生成する。インタラクティブで自動的なプロセスを使用することで、単一のクラスの高い優位性を持つEPC内のゾーンを発見する。 EPC法は計算実験において高いカバレッジと精度で非線形予測モデルを発見することに成功している。これは視覚的に魅力的な支配ルールを作成することで、複数のドメインに利益をもたらす。本章では,実データおよびシミュレーションデータを用いた実験におけるepc非線形手法の検証,動的楕円対座標(depc)に一般化されたepc,視覚発見を最適化する座標重みの組込み,代替epc設計の導入,epc/depcに基づく非コンパクト機械学習手法の概念の導入について述べる。 It is challenging for humans to enable visual knowledge discovery in data with more than 2-3 dimensions with a naked eye. This chapter explores the efficiency of discovering predictive machine learning models interactively using new Elliptic Paired coordinates (EPC) visualizations. It is shown that EPC are capable to visualize multidimensional data and support visual machine learning with preservation of multidimensional information in 2-D. Relative to parallel and radial coordinates, EPC visualization requires only a half of the visual elements for each n-D point. An interactive software system EllipseVis, which is developed in this work, processes high-dimensional datasets, creates EPC visualizations, and produces predictive classification models by discovering dominance rules in EPC. By using interactive and automatic processes it discovers zones in EPC with a high dominance of a single class. The EPC methodology has been successful in discovering non-linear predictive models with high coverage and precision in the computational experiments. This can benefit multiple domains by producing visually appealing dominance rules. 
This chapter presents the results of successfully testing the EPC non-linear methodology in experiments using real and simulated data, the generalization of EPC to Dynamic Elliptic Paired Coordinates (DEPC), the incorporation of coordinate weights to optimize visual discovery, an alternative EPC design, and the concept of an incompact machine learning methodology based on EPC/DEPC.	翻訳日:2021-07-13 16:05:56 公開日:2021-07-11
# BrainNNExplainer:Brain Networkベースの疾患分析のための解釈可能なグラフニューラルネットワークフレームワーク BrainNNExplainer: An Interpretable Graph Neural Network Framework for Brain Network based Disease Analysis ( http://arxiv.org/abs/2107.05097v1 ) ライセンス: Link先を確認	Hejie Cui, Wei Dai, Yanqiao Zhu, Xiaoxiao Li, Lifang He, Carl Yang	(参考訳) 疾患予測のための解釈可能な脳ネットワークモデルは、神経科学の進歩に非常に有用である。 gnnは複雑なネットワークデータをモデル化することを約束しているが、過度に適合しやすいため、医療などの決定クリティカルなシナリオでの使用を妨げている。このギャップを埋めるために、脳ネットワーク分析のための解釈可能なGNNフレームワークであるBrainNNExplainerを提案する。主に2つの共同学習モジュールで構成されており、脳ネットワークに特化したバックボーン予測モデルと、疾患特異的な脳ネットワーク接続を強調する説明生成器である。 BrainNNExplainerのユニークな解釈可能性と優れた性能を示す2つの難病予測データセットの可視化による大規模な実験結果。 Interpretable brain network models for disease prediction are of great value for the advancement of neuroscience. GNNs are promising to model complicated network data, but they are prone to overfitting and suffer from poor interpretability, which prevents their usage in decision-critical scenarios like healthcare. To bridge this gap, we propose BrainNNExplainer, an interpretable GNN framework for brain network analysis. It is mainly composed of two jointly learned modules: a backbone prediction model that is specifically designed for brain networks and an explanation generator that highlights disease-specific prominent brain network connections. Extensive experimental results with visualizations on two challenging disease prediction datasets demonstrate the unique interpretability and outstanding performance of BrainNNExplainer.	翻訳日:2021-07-13 16:05:30 公開日:2021-07-11
# 異なる利害関係者グループに関する組織パフォーマンスのコンピュータ支援構成分類 Computer-assisted construct classification of organizational performance concerning different stakeholder groups ( http://arxiv.org/abs/2107.05133v1 ) ライセンス: Link先を確認	Seethalakshmi Gopalakrishnan, Victor Chen, Gus Hahn-Powell, Bharadwaj Tirunagar	(参考訳) ビジネスやマネジメントにおける研究記事の数は、用語、構成、尺度とともに劇的に増加している。研究論文からの組織的業績構成の適切な分類は、その研究が関係するかもしれない文献と理解を分類する上で重要な役割を担っている。 In this work, we classify constructs (i.e., concepts and terminology used to capture different aspects of organizational performance) in research articles into a three-level categorization: (a) performance and non-performance categories (Level 0); (b) for performance constructs, stakeholder group-level of performance concerning investors, customers, employees, and the society (community and natural environment) (Level 1); and (c) for each stakeholder group-level, subcategories of different ways of measurement (Level 2). 本研究は,周辺文や外部参照から抽出した特徴を用いた文脈情報の増加が,訓練データに制限がある場合,分解レベルラベルの分類を改善することを見出した。本研究は, コンピュータ支援による構造同定と分類, 研究合成における重要なステップである。 The number of research articles in business and management has dramatically increased along with terminology, constructs, and measures. Proper classification of organizational performance constructs from research articles plays an important role in categorizing the literature and understanding to whom its research implications may be relevant. In this work, we classify constructs (i.e., concepts and terminology used to capture different aspects of organizational performance) in research articles into a three-level categorization: (a) performance and non-performance categories (Level 0); (b) for performance constructs, stakeholder group-level of performance concerning investors, customers, employees, and the society (community and natural environment) (Level 1); and (c) for each stakeholder group-level, subcategories of different ways of measurement (Level 2). 
We observed that increasing contextual information with features extracted from surrounding sentences and external references improves classification of disaggregate-level labels, given limited training data. Our research has implications for computer-assisted construct identification and classification - an essential step for research synthesis.	翻訳日:2021-07-13 16:03:58 公開日:2021-07-11
# 低リソース地理空間機械学習のためのドメイン適応化 Leveraging Domain Adaptation for Low-Resource Geospatial Machine Learning ( http://arxiv.org/abs/2107.04983v1 ) ライセンス: Link先を確認	Jack Lynch and Sam Wookey	(参考訳) リモートセンシングにおける機械学習は、地理空間画像の可用性と解像度の増大とともに成熟しているが、その実用性はラベル付きデータの必要性によってボトルネックになっている。さらに、多くのラベル付き地理空間データセットは特定の地域、機器、極端な気象イベントに特化しています。提案する複数の地理空間的ベンチマークに対する現代ドメイン適応の適用について検討し,固有の課題を明らかにし,その解決策を提案する。 Machine learning in remote sensing has matured alongside a proliferation in availability and resolution of geospatial imagery, but its utility is bottlenecked by the need for labeled data. What's more, many labeled geospatial datasets are specific to certain regions, instruments, or extreme weather events. We investigate the application of modern domain-adaptation to multiple proposed geospatial benchmarks, uncovering unique challenges and proposing solutions to them.	翻訳日:2021-07-13 16:02:20 公開日:2021-07-11
# 医用画像分割のための空間ガイド型自己監督クラスタリングネットワーク A Spatial Guided Self-supervised Clustering Network for Medical Image Segmentation ( http://arxiv.org/abs/2107.04934v1 ) ライセンス: Link先を確認	Euijoon Ahn, Dagan Feng and Jinman Kim	(参考訳) 医療画像のセグメンテーションは、自動臨床意思決定支援システムの基本的なステップである。しかし,教師付き深層学習に基づく既存の医用画像分割法は,大量のラベル付きトレーニングデータに依存するため問題視されている。医用画像データリポジトリは拡大を続けているが,注釈付きデータの量の増加は確認されていない。そこで本研究では,空間的に接続され,類似した特徴表現を持つ画像画素をグループ化するのを支援する複数の損失関数を導入することで,医療画像分割のための空間的誘導型自己教師付きクラスタリングネットワーク(sgscn)を提案する。単一の画像から、各ピクセルの特徴表現とクラスタリングの割り当てをエンドツーエンドで反復的に学習する。また,画像領域の形状と境界をより明確に示すコンテキストベースの一貫性損失を提案する。クラスタに属するすべてのピクセルを、クラスタ中心に空間的に近接するように強制する。本手法を2つの公開医用画像データセット上で評価し,従来の自己監督型クラスタリング法と比較した。実験の結果,医用画像のセグメンテーションでは最も精度が高かった。 The segmentation of medical images is a fundamental step in automated clinical decision support systems. Existing medical image segmentation methods based on supervised deep learning, however, remain problematic because of their reliance on large amounts of labelled training data. Although medical imaging data repositories continue to expand, there has not been a commensurate increase in the amount of annotated data. Hence, we propose a new spatial guided self-supervised clustering network (SGSCN) for medical image segmentation, where we introduce multiple loss functions designed to aid in grouping image pixels that are spatially connected and have similar feature representations. It iteratively learns feature representations and clustering assignment of each pixel in an end-to-end fashion from a single image. We also propose a context-based consistency loss that better delineates the shape and boundaries of image regions. It enforces all the pixels belonging to a cluster to be spatially close to the cluster centre. We evaluated our method on 2 public medical image datasets and compared it to existing conventional and self-supervised clustering methods. Experimental results show that our method was most accurate for medical image segmentation.	翻訳日:2021-07-13 15:58:50 公開日:2021-07-11
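The abstract above mentions a loss that pulls all pixels of a cluster toward the cluster centre. The sketch below shows one way such a spatial compactness term could be computed from soft cluster assignments. It is an assumption-laden illustration, not the SGSCN loss itself: the function name `spatial_compactness_loss`, the nested-list input layout, and the use of the squared Euclidean distance are all hypothetical choices.

```python
def spatial_compactness_loss(assign):
    """Assignment-weighted mean squared distance of pixels to their cluster centre.

    assign[y][x][k] is the soft probability that pixel (y, x) belongs to
    cluster k (each pixel's probabilities are assumed to sum to 1).
    """
    height, width = len(assign), len(assign[0])
    n_clusters = len(assign[0][0])
    pixels = [(y, x) for y in range(height) for x in range(width)]
    loss = 0.0
    for k in range(n_clusters):
        mass = sum(assign[y][x][k] for y, x in pixels)
        if mass == 0.0:
            continue  # an empty cluster contributes nothing
        # Cluster centre = assignment-weighted mean of the pixel coordinates.
        cy = sum(assign[y][x][k] * y for y, x in pixels) / mass
        cx = sum(assign[y][x][k] * x for y, x in pixels) / mass
        loss += sum(
            assign[y][x][k] * ((y - cy) ** 2 + (x - cx) ** 2) for y, x in pixels
        )
    return loss / len(pixels)
```

Under this definition, a segmentation that splits an image into two contiguous halves scores lower than a checkerboard-like assignment of the same pixels, which matches the intuition that spatially scattered clusters should be penalized.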
# ディープファイバクラスタリング:高速かつ効果的なホワイトマターパーセレーションのための解剖学的非教師なしディープラーニング Deep Fiber Clustering: Anatomically Informed Unsupervised Deep Learning for Fast and Effective White Matter Parcellation ( http://arxiv.org/abs/2107.04938v1 ) ライセンス: Link先を確認	Yuqian Chen, Chaoyi Zhang, Yang Song, Nikos Makris, Yogesh Rathi, Weidong Cai, Fan Zhang, Lauren J. O'Donnell	(参考訳) ホワイトマター・ファイバ・クラスタリング(wmfc)は、疾患分類や解剖学的道分画などの応用のためにホワイトマター・トラクトグラフィのパーセル化を可能にする。しかし、基底真理の欠如とファイバーデータの曖昧さ(ファイバーに沿った点を前方または逆順に表すことができる)がこの課題に挑戦する。教師なしディープラーニングに基づく新しいWMFCフレームワークを提案する。我々は,教師なしクラスタリング問題を自己教師なし学習タスクとして解決する。具体的には、畳み込みニューラルネットワークを用いて入力ファイバーの埋め込みを学習し、ペアワイズファイバー距離を擬似アノテーションとして使用する。これにより、ファイバポイント順序に敏感なwmfcが可能になる。さらに、脳解剖学的セグメンテーションデータを組み込むことにより、繊維クラスターの解剖学的コヒーレンスを向上させる。提案フレームワークは, クラスタ割り当て確率の低いファイバを拒絶することにより, 自然に外乱除去を可能にする。我々は,Human Connectome Projectから200のデータセットを用いて本手法を訓練し,評価する。その結果,提案手法の性能と効率が向上した。 White matter fiber clustering (WMFC) enables parcellation of white matter tractography for applications such as disease classification and anatomical tract segmentation. However, the lack of ground truth and the ambiguity of fiber data (the points along a fiber can equivalently be represented in forward or reverse order) pose challenges to this task. We propose a novel WMFC framework based on unsupervised deep learning. We solve the unsupervised clustering problem as a self-supervised learning task. Specifically, we use a convolutional neural network to learn embeddings of input fibers, using pairwise fiber distances as pseudo annotations. This enables WMFC that is insensitive to fiber point ordering. In addition, anatomical coherence of fiber clusters is improved by incorporating brain anatomical segmentation data. The proposed framework enables outlier removal in a natural way by rejecting fibers with low cluster assignment probability. We train and evaluate our method using 200 datasets from the Human Connectome Project. Results demonstrate superior performance and efficiency of the proposed approach.	
翻訳日:2021-07-13 15:58:35 公開日:2021-07-11
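The point-ordering ambiguity mentioned in the abstract above (a fiber can be stored in forward or reverse order) is commonly handled with a distance that takes the smaller of the direct and the flipped comparison. The sketch below shows that general idea only; the abstract does not specify which pairwise fiber distance the authors use, so the function names and the mean point-wise Euclidean distance are assumptions.

```python
import math


def mean_pointwise_distance(fiber_a, fiber_b):
    """Mean Euclidean distance between corresponding points of two fibers."""
    return sum(math.dist(p, q) for p, q in zip(fiber_a, fiber_b)) / len(fiber_a)


def flip_invariant_distance(fiber_a, fiber_b):
    """Distance that is unchanged when either fiber's point order is reversed.

    Both fibers are sequences of 3-D points resampled to the same length.
    """
    if len(fiber_a) != len(fiber_b):
        raise ValueError("fibers must have the same number of points")
    direct = mean_pointwise_distance(fiber_a, fiber_b)
    flipped = mean_pointwise_distance(fiber_a, fiber_b[::-1])
    return min(direct, flipped)
```

With this definition, a fiber and its reversed copy are at distance zero, so pseudo-annotations derived from such distances would not depend on the stored point order.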
# LiveView:ビュー合成のための動的ターゲット中心型MPI LiveView: Dynamic Target-Centered MPI for View Synthesis ( http://arxiv.org/abs/2107.05113v1 ) ライセンス: Link先を確認	Sushobhan Ghosh, Zhaoyang Lv, Nathan Matsuda, Lei Xiao, Andrew Berkovich, Oliver Cossairt	(参考訳) 既存のMulti-Plane Image (MPI) ベースのビュー合成手法は、1つの前方通過で固定された平面数を用いて入力ビューに整列したMPIを生成する。これらの手法は、新しいビューの高速で高品質なレンダリングを生成するが、リアルタイムアプリケーションには適さない低速で計算コストの高いmpi生成メソッドに依存している。加えて、ほとんどのMPI技術はトレーニングが完了すると修正できない固定深度/不均一平面を使用しているため、実行時の柔軟性は極めて低い。リアルタイムに高品質なビュー合成を実現する新しいMPI生成・レンダリング技術であるLiveViewを提案する。また,実行時にシーン依存型MPI平面(平面数と間隔)を選択するための柔軟性も提供する。 liveviewはまず、入力画像をターゲットビュー(ターゲット中心)にワープし、次に目標ビュー中心のmpi、つまり1つの深さプレーン(動的に)を生成する。高速なMPI生成と新規なビュー合成を可能にするとともに、高品質なレンダリングを生成する。その結果、LiveViewは、入力ビューのビデオストリームに基づいて、MPIを頻繁に更新する必要があるリアルタイムビュー合成アプリケーションを可能にする。我々はLiveViewが、最先端のMPIベースの手法に比べて、実行時の70倍高速で、ビュー合成の品質を向上させることを実証した。 Existing Multi-Plane Image (MPI) based view-synthesis methods generate an MPI aligned with the input view using a fixed number of planes in one forward pass. These methods produce fast, high-quality rendering of novel views, but rely on slow and computationally expensive MPI generation methods unsuitable for real-time applications. In addition, most MPI techniques use fixed depth/disparity planes which cannot be modified once the training is complete, hence offering very little flexibility at run-time. We propose LiveView - a novel MPI generation and rendering technique that produces high-quality view synthesis in real-time. Our method can also offer the flexibility to select scene-dependent MPI planes (number of planes and spacing between them) at run-time. LiveView first warps input images to target view (target-centered) and then learns to generate a target view centered MPI, one depth plane at a time (dynamically). The method generates high-quality renderings, while also enabling fast MPI generation and novel view synthesis. 
As a result, LiveView enables real-time view synthesis applications where an MPI needs to be updated frequently based on a video stream of input views. We demonstrate that LiveView improves the quality of view synthesis while being 70 times faster at run-time compared to state-of-the-art MPI-based methods.	翻訳日:2021-07-13 15:58:20 公開日:2021-07-11
# 深部政策勾配の座標方向制御変量 Coordinate-wise Control Variates for Deep Policy Gradients ( http://arxiv.org/abs/2107.04987v1 ) ライセンス: Link先を確認	Yuanyi Zhong, Yuan Zhou, Jian Peng	(参考訳) 制御変数 (CV) 法は, 実際には勾配推定器のばらつきを低減するために, 政策勾配推定に広く用いられている。状態-作用値推定からベースライン関数を減算して制御変量を適用する。そして、ばらつきが引き起こされるポリシー勾配は、おそらく学習効率を向上させる。深いニューラルネットポリシを持つ制御変数の最近の研究は、主にスカラー値のベースライン関数に焦点を当てている。ベクトル値ベースラインの効果は未探索である。本稿では,ニューラルネットワークポリシのためのベクトル値ベースラインから構築した座標ワイドおよび層ワイド制御による分散低減について検討する。本研究では,従来のスカラー値ベースラインよりも低分散のベースラインが得られることを示す実験結果を示す。我々は、これらの新しい制御変数を用いて、人気のあるPPOアルゴリズムの装備方法を示す。正規化を適切に行うアルゴリズムは、連続制御ベンチマークにおいてスカラー制御よりも高いサンプリング効率が得られることを示す。 The control variates (CV) method is widely used in policy gradient estimation to reduce the variance of the gradient estimators in practice. A control variate is applied by subtracting a baseline function from the state-action value estimates. Then the variance-reduced policy gradient presumably leads to higher learning efficiency. Recent research on control variates with deep neural net policies mainly focuses on scalar-valued baseline functions. The effect of vector-valued baselines is under-explored. This paper investigates variance reduction with coordinate-wise and layer-wise control variates constructed from vector-valued baselines for neural net policies. We present experimental evidence suggesting that lower variance can be obtained with such baselines than with the conventional scalar-valued baseline. We demonstrate how to equip the popular Proximal Policy Optimization (PPO) algorithm with these new control variates. We show that the resulting algorithm with proper regularization can achieve higher sample efficiency than scalar control variates in continuous control benchmarks.	翻訳日:2021-07-13 15:53:45 公開日:2021-07-11
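To make the difference between scalar and coordinate-wise baselines in the abstract above concrete, here is a toy sketch. It is not the paper's PPO variant. It considers the score-function estimator g_i = s_i (R - b_i), where s is the gradient of the log-policy and R the return, and compares the summed second moment of g under no baseline, the best single scalar baseline, and the best per-coordinate baseline on a given batch. Because the score has zero mean, the expected estimator does not depend on the baseline, so a lower second moment corresponds to lower variance. The function name and the batch layout are hypothetical.

```python
import random


def baseline_second_moments(scores, returns):
    """Summed second moment of s_i * (R - b_i) for three baseline choices.

    scores:  list of N score vectors (gradient of log pi), each of length D
    returns: list of N scalar returns
    Returns (no_baseline, scalar_baseline, coordinate_wise_baseline).
    """
    n, dim = len(scores), len(scores[0])
    sq = [[s * s for s in row] for row in scores]
    sq_sum = [sum(row[i] for row in sq) for i in range(dim)]
    sq_ret = [sum(row[i] * r for row, r in zip(sq, returns)) for i in range(dim)]

    # Least-squares minimizers of the batch second moment.
    b_scalar = sum(sq_ret) / sum(sq_sum)                   # one value shared by all coordinates
    b_coord = [sq_ret[i] / sq_sum[i] for i in range(dim)]  # one value per coordinate

    def moment(baselines):
        total = 0.0
        for row, r in zip(sq, returns):
            total += sum(row[i] * (r - baselines[i]) ** 2 for i in range(dim))
        return total / n

    return moment([0.0] * dim), moment([b_scalar] * dim), moment(b_coord)
```

For a unit-variance Gaussian policy centred at zero, the score of an action equals the action itself, so a batch of sampled actions can be passed directly as `scores`. When the return depends on the coordinates unequally, the per-coordinate baseline cannot do worse than the scalar one on the batch, since the scalar baseline is a special case of it.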
# 空白と不均衡アノテーションによる集団からの学習 Learning from Crowds with Sparse and Imbalanced Annotations ( http://arxiv.org/abs/2107.05039v1 ) ライセンス: Link先を確認	Ye Shi, Shao-Yuan Li, Sheng-Jun Huang	(参考訳) 従来の教師付き学習では、訓練データには基礎的真理ラベルが必要であり、その収集は多くの場合困難である。近年、クラウドソーシングは、非専門家の群衆に頼って効率的なラベリングソリューションとして確立されている。ラベル付けエラーの影響を低減するために、各インスタンスを複数のワーカーに分散するのが一般的な方法だが、各ワーカーはデータのサブセットのみに注釈を付け、その結果、「スパースアノテーション」現象が発生する。本稿では,クラス不均衡,すなわち,基底の真理ラベルが「クラス不均衡」である場合,スパースアノテーションは偏って分布する傾向にあり,学習アルゴリズムに悪影響を及ぼす可能性があることに留意する。この問題に対処するために, 自信ある擬似アノテーションを徐々に追加し, アノテーション分布を再バランスさせる, Self-Crowd と呼ばれる自己学習に基づくアプローチを提案する。具体的には,自信ある疑似アノテーションを選択するための分布意識的信頼度尺度を提案し,少数派アノテーションをオーバーサンプリングし,多数派アノテーションをアンダーサンプリングする再サンプリング戦略を採用する。 1つの実世界のクラウドソーシング画像分類タスクにおいて,提案手法は分布非依存手法よりもトレーニングを通してよりバランスの取れたアノテーションを与え,異なるアノテーションスパーシティレベルでの学習性能を大幅に向上させることを示した。 Traditional supervised learning requires ground truth labels for the training data, whose collection can be difficult in many cases. Recently, crowdsourcing has established itself as an efficient labeling solution through resorting to non-expert crowds. To reduce the labeling error effects, one common practice is to distribute each instance to multiple workers, whereas each worker only annotates a subset of data, resulting in the sparse annotation phenomenon. In this paper, we note that under class imbalance, i.e., when the ground truth labels are class-imbalanced, the sparse annotations are prone to a skewed distribution, which can severely bias the learning algorithm. To combat this issue, we propose a self-training based approach named Self-Crowd that progressively adds confident pseudo-annotations and rebalances the annotation distribution. Specifically, we propose a distribution-aware confidence measure to select confident pseudo-annotations, which adopts the resampling strategy to oversample the minority annotations and undersample the majority annotations. 
On one real-world crowdsourcing image classification task, we show that the proposed method yields more balanced annotations throughout training than the distribution agnostic methods and substantially improves the learning performance at different annotation sparsity levels.	翻訳日:2021-07-13 15:53:32 公開日:2021-07-11
# クラス優先シフトによる正ラベル分類:密度比推定に基づく事前不変アプローチ Positive-Unlabeled Classification under Class-Prior Shift: A Prior-invariant Approach Based on Density Ratio Estimation ( http://arxiv.org/abs/2107.05045v1 ) ライセンス: Link先を確認	Shota Nakajima, Masashi Sugiyama	(参考訳) 正およびラベルなし(PU)データから学ぶことは、様々なアプリケーションにおいて重要な問題である。 pu分類に対する最近のアプローチのほとんどは、トレーニング未ラベルデータセットのクラス優先(正のサンプルの割合)がテストデータと同一であると仮定している。さらに、私たちは通常、トレーニングとテストデータのクラスプライオリエントを知らないので、それらを使わずに分類器をトレーニングする方法の手がかりがありません。これらの問題に対処するために,密度比推定に基づく新しいPU分類法を提案する。提案手法の特筆すべき利点は, 学習段階ではクラスプライオリエントを必要としないこと, テスト段階でのみクラスプライオリエントシフトが組み込まれていることである。提案手法を理論的に正当化し,その効果を実験的に実証する。 Learning from positive and unlabeled (PU) data is an important problem in various applications. Most of the recent approaches for PU classification assume that the class-prior (the ratio of positive samples) in the training unlabeled dataset is identical to that of the test data, which does not hold in many practical cases. In addition, we usually do not know the class-priors of the training and test data, thus we have no clue on how to train a classifier without them. To address these problems, we propose a novel PU classification method based on density ratio estimation. A notable advantage of our proposed method is that it does not require the class-priors in the training phase; class-prior shift is incorporated only in the test phase. We theoretically justify our proposed method and experimentally demonstrate its effectiveness.	翻訳日:2021-07-13 15:53:06 公開日:2021-07-11
# LexSubCon: 語彙資源からの知識を語彙置換のための文脈埋め込みに統合する LexSubCon: Integrating Knowledge from Lexical Resources into Contextual Embeddings for Lexical Substitution ( http://arxiv.org/abs/2107.05132v1 ) ライセンス: Link先を確認	George Michalopoulos, Ian McKillop, Alexander Wong, Helen Chen	(参考訳) 語彙置換は、与えられたテキストの文脈で単語の意味のある代用を生成するタスクである。文脈単語埋め込みモデルは文中の置換語から抽出された文脈情報に頼って語彙置換タスクにおいて最先端の結果を得た。しかし、そのようなモデルは外部の語彙データベースに存在する構造化知識を考慮していない。我々は,高度に正確な代替候補を識別できる文脈埋め込みモデルに基づく,エンドツーエンドの語彙置換フレームワークlexsubconを紹介する。これは文脈情報と構造化語彙資源からの知識を組み合わせることで達成される。 Our approach involves: (i) introducing a novel mix-up embedding strategy in the creation of the input embedding of the target word through linearly interpolating the pair of the target input embedding and the average embedding of its probable synonyms; (ii) considering the similarity of the sentence-definition embeddings of the target word and its proposed candidates; and, (iii) calculating the effect of each substitution in the semantics of the sentence through a fine-tuned sentence similarity model. 実験の結果,lexsubcon は ls07 やcoinco ベンチマークデータセットにおいて,語彙置換タスクに広く使用される従来の最先端手法よりも優れていた。 Lexical substitution is the task of generating meaningful substitutes for a word in a given textual context. Contextual word embedding models have achieved state-of-the-art results in the lexical substitution task by relying on contextual information extracted from the replaced word within the sentence. However, such models do not take into account structured knowledge that exists in external lexical databases. We introduce LexSubCon, an end-to-end lexical substitution framework based on contextual embedding models that can identify highly accurate substitute candidates. This is achieved by combining contextual information with knowledge from structured lexical resources. 
Our approach involves: (i) introducing a novel mix-up embedding strategy in the creation of the input embedding of the target word through linearly interpolating the pair of the target input embedding and the average embedding of its probable synonyms; (ii) considering the similarity of the sentence-definition embeddings of the target word and its proposed candidates; and, (iii) calculating the effect of each substitution in the semantics of the sentence through a fine-tuned sentence similarity model. Our experiments show that LexSubCon outperforms previous state-of-the-art methods on LS07 and CoInCo benchmark datasets that are widely used for lexical substitution tasks.	翻訳日:2021-07-13 15:52:52 公開日:2021-07-11
# 音韻ベクトルに基づく音素埋め込みを用いた多言語・言語間音声認識 Multilingual and crosslingual speech recognition using phonological-vector based phone embeddings ( http://arxiv.org/abs/2107.05038v1 ) ライセンス: Link先を確認	Chengrui Zhu, Keyu An, Huahuan Zheng, Zhijian Ou	(参考訳) 音韻特徴量(PF)の使用は、訓練中に言語固有の音素を接続できる可能性があり、低リソース言語のための多言語および言語間音声認識方法の情報共有に非常に望ましい。従来の音韻的特徴を用いた場合の欠点は、ボトムアップ方式での音響-PF抽出自体が難しいことである。本稿では,音韻駆動型音素埋め込み(トップダウン)とディープニューラルネットワーク(DNN)を用いた音響特徴抽出(ボトムアップ)を併用し,音素の確率を推定する。新しい手法はJoinAP(Joining of Acoustics and Phonology)と呼ばれる。音声認識には音響から音韻的特徴への逆変換は不要である。 IPA(国際音声記号)表の各音素について、その音韻特徴を音韻ベクトルに符号化し、音韻ベクトルに線形または非線形の変換を適用して音素埋め込みを得る。CommonVoiceデータセット (ドイツ語, フランス語, スペイン語, イタリア語) と AISHELL-1 データセット (中国語) で多言語および言語間(ゼロショットと少数ショットの両方)の音声認識実験を行い、線形音素埋め込みによるJoinAPとフラット音素埋め込みによる従来の方法の両方に対して、非線形音素埋め込みによるJoinAPの優位性を実証した。 The use of phonological features (PFs) potentially allows language-specific phones to remain linked in training, which is highly desirable for information sharing for multilingual and crosslingual speech recognition methods for low-resourced languages. A drawback suffered by previous methods in using phonological features is that the acoustic-to-PF extraction in a bottom-up way is itself difficult. In this paper, we propose to join phonology driven phone embedding (top-down) and deep neural network (DNN) based acoustic feature extraction (bottom-up) to calculate phone probabilities. The new method is called JoinAP (Joining of Acoustics and Phonology). Remarkably, no inversion from acoustics to phonological features is required for speech recognition. For each phone in the IPA (International Phonetic Alphabet) table, we encode its phonological features to a phonological-vector, and then apply linear or nonlinear transformation of the phonological-vector to obtain the phone embedding. 
A series of multilingual and crosslingual (both zero-shot and few-shot) speech recognition experiments are conducted on the CommonVoice dataset (German, French, Spanish and Italian) and the AISHELL-1 dataset (Mandarin), and demonstrate the superiority of JoinAP with nonlinear phone embeddings over both JoinAP with linear phone embeddings and the traditional method with flat phone embeddings.	翻訳日:2021-07-13 15:49:20 公開日:2021-07-11
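The linear variant described in the abstract above can be sketched as follows. This is a minimal illustration, assuming that phone embeddings and acoustic features are joined by an inner product followed by a softmax, which the abstract does not spell out. The feature set, the matrix `A`, and the acoustic vector `h` are hypothetical values chosen for the example.

```python
import math

# Hypothetical binary phonological features: [voiced, nasal, labial, vowel].
PHONOLOGICAL_VECTORS = {
    "p": [0, 0, 1, 0],
    "b": [1, 0, 1, 0],
    "m": [1, 1, 1, 0],
    "a": [1, 0, 0, 1],
}

# Hypothetical 2 x 4 matrix mapping a phonological vector to a 2-d phone embedding.
A = [
    [0.5, -1.0, 0.8, 1.5],
    [-0.3, 0.9, 0.4, -1.2],
]

def phone_embedding(p):
    """Linear phone embedding e = A p."""
    return [sum(a * x for a, x in zip(row, p)) for row in A]

def phone_probabilities(h, phones):
    """Softmax over inner products between phone embeddings and acoustic features h."""
    logits = [
        sum(e * x for e, x in zip(phone_embedding(PHONOLOGICAL_VECTORS[ph]), h))
        for ph in phones
    ]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return {ph: v / total for ph, v in zip(phones, exps)}

h = [1.0, -0.5]  # stand-in for a DNN acoustic feature vector at one frame
probs = phone_probabilities(h, ["p", "b", "m", "a"])
print(probs)
```

Since the embedding depends only on the phonological vector, a phone that never appeared in training still receives an embedding, which is what makes the zero-shot crosslingual setting possible in principle.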
# BEV-MODNet:自律走行のための単眼カメラによる鳥の視線移動物体検出 BEV-MODNet: Monocular Camera based Bird's Eye View Moving Object Detection for Autonomous Driving ( http://arxiv.org/abs/2107.04937v1 ) ライセンス: Link先を確認	Hazem Rashed, Mariam Essam, Maha Mohamed, Ahmad El Sallab and Senthil Yogamani	(参考訳) 移動物体の検出は、自律運転システムにおいて非常に重要なタスクである。知覚フェーズの後、動作計画は通常バードアイビュー(BEV)空間で行われる。これは、画像平面上で検出されたオブジェクトをトップビューのBEV平面に投影する必要がある。このようなプロジェクションは、深度情報や遠方でのノイズマッピングの欠如によってエラーを起こしやすい。 CNNは、現場のグローバルコンテキストを活用して、より良いプロジェクトを作成することができる。本研究では,モノクル画像を直接入力として,BEVマップ上での終端移動物体検出(MOD)について検討する。我々の知る限り、そのようなデータセットは存在せず、5つのクラスのためにBEV空間で動くオブジェクトマスクのアノテーションを備えた12.9k画像からなる拡張KITTI-rawデータセットを作成します。データセットはクラスに依存しないモーションキューベースのオブジェクト検出に使用され、クラスはチューニングを改善するためにメタデータとして提供される。我々は,bev空間内で直接動作セグメンテーションを出力する2ストリームrgbとオプティカルフロー融合アーキテクチャを設計し実装する。画像平面上での最先端動作分割予測の逆視点マッピングと比較する。簡単なベースライン実装を用いてmIoUの13%の大幅な改善を観測した。これは、bev空間で動きのセグメンテーション出力を直接学習する能力を示している。私たちのベースラインとデータセットのアノテーションの質的な結果は、https://sites.google.com/view/bev-modnetで確認できます。 Detection of moving objects is a very important task in autonomous driving systems. After the perception phase, motion planning is typically performed in Bird's Eye View (BEV) space. This would require projection of objects detected on the image plane to top view BEV plane. Such a projection is prone to errors due to lack of depth information and noisy mapping in far away areas. CNNs can leverage the global context in the scene to project better. In this work, we explore end-to-end Moving Object Detection (MOD) on the BEV map directly using monocular images as input. To the best of our knowledge, such a dataset does not exist and we create an extended KITTI-raw dataset consisting of 12.9k images with annotations of moving object masks in BEV space for five classes. The dataset is intended to be used for class agnostic motion cue based object detection and classes are provided as meta-data for better tuning. 
We design and implement a two-stream RGB and optical flow fusion architecture which outputs motion segmentation directly in BEV space. We compare it with inverse perspective mapping of state-of-the-art motion segmentation predictions on the image plane. We observe a significant improvement of 13% in mIoU using the simple baseline implementation. This demonstrates the ability to directly learn motion segmentation output in BEV space. Qualitative results of our baseline and the dataset annotations can be found in https://sites.google.com/view/bev-modnet.	翻訳日:2021-07-13 15:48:09 公開日:2021-07-11
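The mIoU metric quoted above can be sketched for a two-class motion mask as follows. This is a generic illustration, assuming per-class IoU averaged over the static and moving classes; the exact evaluation protocol of the paper is not given in the abstract, and the small masks are made up.

```python
def iou(pred, target):
    """Intersection over union of two binary masks given as nested lists of 0/1."""
    inter = union = 0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    return inter / union if union else 1.0

def mean_iou(pred, target, num_classes=2):
    """Average the IoU over classes; class 1 = moving, class 0 = static."""
    scores = []
    for c in range(num_classes):
        pred_c = [[1 if v == c else 0 for v in row] for row in pred]
        target_c = [[1 if v == c else 0 for v in row] for row in target]
        scores.append(iou(pred_c, target_c))
    return sum(scores) / len(scores)

# Tiny made-up BEV grids: 1 marks a cell predicted or annotated as moving.
pred = [[0, 1, 1],
        [0, 0, 1]]
target = [[0, 1, 0],
          [0, 0, 1]]
miou = mean_iou(pred, target)
print(miou)
```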
# 圧縮センシングmriのための深部幾何蒸留ネットワーク Deep Geometric Distillation Network for Compressive Sensing MRI ( http://arxiv.org/abs/2107.04943v1 ) ライセンス: Link先を確認	Xiaohong Fan, Yin Yang, Jianping Zhang	(参考訳) 圧縮センシング(CS)は、小さなサンプルデータから$k$-spaceでMR画像を再構成し、MRIの取得を加速する効率的な方法である。本研究では,モデルに基づくCS-MRI法と深層学習に基づくCS-MRI法の利点を組み合わせた新しい深部幾何学的蒸留ネットワークを提案する。まず,モデルに基づくCS-MRI最適化問題を,画像線形近似と画像幾何補償からなる2つの部分問題に展開する。第二に、近似段階における失われたテクスチャの詳細を蒸留するための幾何補正サブプロブレムをテイラー展開により拡張し、異なる幾何学特性領域の特徴を融合させる幾何学蒸留モジュールを設計することができる。さらに、ステップ長パラメータの適応初期化を伴う学習可能なバージョンを使用しており、モデルの柔軟性が向上し、スムーズに収束することができる。数値実験により、他の最先端のCS-MRI再構成手法よりもその優位性を検証した。ソースコードは https://github.com/fanxiaohong/Deep-Geometric-Distillation-Network-for-CS-MRI で入手できる。 Compressed sensing (CS) is an efficient method to reconstruct MR images from a small amount of sampled data in $k$-space and to accelerate MRI acquisition. In this work, we propose a novel deep geometric distillation network which combines the merits of model-based and deep learning-based CS-MRI methods; it can be theoretically guaranteed to improve the geometric texture details of a linear reconstruction. Firstly, we unfold the model-based CS-MRI optimization problem into two sub-problems that consist of image linear approximation and image geometric compensation. Secondly, the geometric compensation sub-problem for distilling texture details lost in the approximation stage can be expanded by Taylor expansion to design a geometric distillation module that fuses features of different geometric characteristic domains. Additionally, we use a learnable version with adaptive initialization of the step-length parameter, which gives the model more flexibility and can lead to smooth convergence. Numerical experiments verify its superiority over other state-of-the-art CS-MRI reconstruction approaches. The source code will be available at https://github.com/fanxiaohong/Deep-Geometric-Distillation-Network-for-CS-MRI	翻訳日:2021-07-13 15:47:48 公開日:2021-07-11
# NeoUNet: 正確な大腸ポリープ分画と腫瘍検出を目指して NeoUNet: Towards accurate colon polyp segmentation and neoplasm detection ( http://arxiv.org/abs/2107.05023v1 ) ライセンス: Link先を確認	Phan Ngoc Lan, Nguyen Sy An, Dao Viet Hang, Dao Van Long, Tran Quang Trung, Nguyen Thi Thuy, Dinh Viet Sang	(参考訳) 自動ポリプセグメンテーションは内視鏡検査において非常に有用であることが証明されており、内視鏡内科医の腺腫検出率を低下させ、効率を高めている。しかし、ポリープを腫瘍かどうかを分類し、ピクセルレベルで分割することは、医師が限られた時間で実行するのが難しい課題である。本稿では,ポリプセグメンテーション問題に対する細粒度な定式化を提案する。我々の定式化は,ポリープ領域だけでなく,悪性度の高い部位を高い精度で同定することを目的としている。さらに,この問題を解決するために,NeoUNetと呼ばれるUNetベースのニューラルネットワークアーキテクチャとハイブリッド損失関数を提案する。実験では,既存のポリプセグメンテーションモデルと比較して,neounetに対する高い競合性を示す。 Automatic polyp segmentation has proven to be immensely helpful for endoscopy procedures, reducing the missing rate of adenoma detection for endoscopists while increasing efficiency. However, classifying a polyp as being neoplasm or not and segmenting it at the pixel level is still a challenging task for doctors to perform in a limited time. In this work, we propose a fine-grained formulation for the polyp segmentation problem. Our formulation aims to not only segment polyp regions, but also identify those at high risk of malignancy with high accuracy. In addition, we present a UNet-based neural network architecture called NeoUNet, along with a hybrid loss function to solve this problem. Experiments show highly competitive results for NeoUNet on our benchmark dataset compared to existing polyp segmentation models.	翻訳日:2021-07-13 15:47:30 公開日:2021-07-11
# 類似度誘導型深部顔画像検索 Similarity Guided Deep Face Image Retrieval ( http://arxiv.org/abs/2107.05025v1 ) ライセンス: Link先を確認	Young Kyun Jang, Nam Ik Cho	(参考訳) 検索入力された顔画像から同一人物の画像を検索する顔画像検索は、画像データベースのサイズが急速に増加するにつれて注目を浴びている。高速かつ正確な検索を行うために, コンパクトなハッシュコードに基づく手法が提案されており, 近年, 教師付き分類訓練による深面画像ハッシュ手法が注目されている。しかし、分類に基づくスキームは、顔画像間の複雑な類似性をハッシュコード学習に明らかにできないという欠点がある。本稿では,自己と対相同性を同時に考慮した類似性誘導ハッシュ(SGH)法を提案することにより,顔画像の検索品質の向上を試みる。 SGHは、顔画像間の精巧な類似性を探索するために設計された様々なデータ拡張を用いている。既存のベンチマークと大規模高解像度顔画像データセットによるプロトコルに関する大規模な実験結果から,我々のSGHが最先端の検索性能を実現することを示す。 Face image retrieval, which searches for images of the same identity from the query input face image, is drawing more attention as the size of the image database increases rapidly. In order to conduct fast and accurate retrieval, compact hash code-based methods have been proposed, and recently, deep face image hashing methods with supervised classification training have shown outstanding performance. However, the classification-based scheme has a disadvantage in that it cannot reveal complex similarities between face images in hash code learning. In this paper, we attempt to improve the face image retrieval quality by proposing a Similarity Guided Hashing (SGH) method, which gently considers self and pairwise-similarity simultaneously. SGH employs various data augmentations designed to explore elaborate similarities between face images, solving both intra and inter identity-wise difficulties. Extensive experimental results on the protocols with existing benchmarks and an additionally proposed large scale higher resolution face image dataset demonstrate that our SGH delivers state-of-the-art retrieval performance.	翻訳日:2021-07-13 15:47:18 公開日:2021-07-11
# 投影・撮影機能を有するハイブリッド画素を用いたプロジェクタカメラシステム A Projector-Camera System Using Hybrid Pixels with Projection and Capturing Capabilities ( http://arxiv.org/abs/2107.05043v1 ) ライセンス: Link先を確認	Kenta Yamamoto, Daisuke Iwai, Kosuke Sato	(参考訳) 本稿では,各画素に投影能力と撮影能力の両方を有するプロジェクターカメラシステム(procams)を提案する。提案するprocamsは,プロジェクタとカメラの正確な画素対応を得るのが困難である。概念実証procamsプロトタイプを実装し,その動的投影マッピングへの適用性を実証した。 We propose a novel projector-camera system (ProCams) in which each pixel has both projection and capturing capabilities. Our proposed ProCams solves the difficulty of obtaining precise pixel correspondence between the projector and the camera. We implemented a proof-of-concept ProCams prototype and demonstrated its applicability to a dynamic projection mapping.	翻訳日:2021-07-13 15:47:00 公開日:2021-07-11
# 正則化m推定器の導出と残留分布と適応チューニングへの応用 Derivatives and residual distribution of regularized M-estimators with application to adaptive tuning ( http://arxiv.org/abs/2107.05143v1 ) ライセンス: Link先を確認	Pierre C Bellec, Yiwei Shen	(参考訳) 本稿では,ガウス設計行列と任意雑音分布を持つ線形モデルにおいて,凸ペナルティを正規化した勾配-リプシッツ損失関数を持つm推定器について検討する。実例では、ハマー損失と弾性ネットのペナルティとノイズ分布の重みを持つロバストなM推定器がある。私たちの主な貢献は3倍です。 i) 正規化 M-推定器の微分に対する一般式 $\hat\beta(y,X)$ ここでの微分は$y$と$X$の両方で成り立つが、これはすべての凸正規化 M-推定器で共有される単純な微分可能性構造を示す。 (ii)これらの誘導体を用いて、次元とサンプルサイズが同一の中間高次元状態における残余 $r_i = y_i-x_i^\top\hat\beta$ の分布を特徴付ける。 (iii) 残差の分布を動機とし, 正規化m推定器のチューニングパラメータを選択するための新しい適応基準を提案する。基準は、サンプル外誤差を推定器から独立な加算定数まで近似することにより、サンプル外誤差を最小化するプロキシを提供する。提案した適応的基準は、ノイズ分布や設計の共分散の知識を必要としない。シミュレートされたデータは、残差の分布と基準の成功の両方に関して、サンプル外誤差のプロキシとして理論的な結果を確認する。最後に、我々の結果は、$\hat\beta(y,X)$ の微分と独立な興味を持つ M-推定子の有効自由度の間の新しい関係を明らかにする。 This paper studies M-estimators with gradient-Lipschitz loss function regularized with convex penalty in linear models with Gaussian design matrix and arbitrary noise distribution. A practical example is the robust M-estimator constructed with the Huber loss and the Elastic-Net penalty and the noise distribution has heavy-tails. Our main contributions are three-fold. (i) We provide general formulae for the derivatives of regularized M-estimators $\hat\beta(y,X)$ where differentiation is taken with respect to both $y$ and $X$; this reveals a simple differentiability structure shared by all convex regularized M-estimators. (ii) Using these derivatives, we characterize the distribution of the residual $r_i = y_i-x_i^\top\hat\beta$ in the intermediate high-dimensional regime where dimension and sample size are of the same order. (iii) Motivated by the distribution of the residuals, we propose a novel adaptive criterion to select tuning parameters of regularized M-estimators. 
The criterion approximates the out-of-sample error up to an additive constant independent of the estimator, so that minimizing the criterion provides a proxy for minimizing the out-of-sample error. The proposed adaptive criterion does not require the knowledge of the noise distribution or of the covariance of the design. Simulated data confirms the theoretical findings, regarding both the distribution of the residuals and the success of the criterion as a proxy of the out-of-sample error. Finally our results reveal new relationships between the derivatives of $\hat\beta(y,X)$ and the effective degrees of freedom of the M-estimator, which are of independent interest.	翻訳日:2021-07-13 15:43:52 公開日:2021-07-11
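The practical example named in the abstract above, a Huber loss with an Elastic-Net penalty, can be written out as an objective function. This is a minimal sketch: the 1/n normalization and the `lam`/`alpha` parameterization of the penalty are assumptions made for the example, not taken from the paper, and the data are made up.

```python
def huber(r, delta=1.0):
    """Huber loss: quadratic for |r| <= delta, linear beyond."""
    a = abs(r)
    return 0.5 * r * r if a <= delta else delta * (a - 0.5 * delta)

def elastic_net(beta, lam, alpha):
    """lam * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2)."""
    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2)

def objective(beta, X, y, lam=0.1, alpha=0.5, delta=1.0):
    """Average Huber loss of the residuals r_i = y_i - x_i^T beta plus the penalty."""
    residuals = [yi - sum(x * b for x, b in zip(xi, beta)) for xi, yi in zip(X, y)]
    return sum(huber(r, delta) for r in residuals) / len(y) + elastic_net(beta, lam, alpha)

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 10.0]  # the last response is an outlier for beta = (1, 2)
value = objective([1.0, 2.0], X, y)
print(value)
```

The outlier contributes linearly rather than quadratically to the loss, which is what makes this estimator robust under heavy-tailed noise.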
# 拡張勾配降下を用いたコルモゴロフモデル学習の2重最適化 Dual Optimization for Kolmogorov Model Learning Using Enhanced Gradient Descent ( http://arxiv.org/abs/2107.05011v1 ) ライセンス: Link先を確認	Qiyou Duan and Hadi Ghauch and Taejoon Kim	(参考訳) データ表現技術は、データ処理と機械学習(ML)の進歩に大きく貢献している。予測能力の向上は従来の表現技法の焦点であり、残念ながらデータの基礎となる洞察を抽出するという点では解釈可能性にかなり劣っている。近年、kolmogorov model (km) が研究され、確率変数の集合の根底にある確率的構造を学ぶための解釈可能かつ予測可能な表現アプローチである。しかし、ランダム化による半定緩和(SDRwR)や離散単調最適化(DMO)を用いた既存のKM学習アルゴリズムは、計算処理がうまく行えないため、ビッグデータアプリケーションに限られている。本稿では,拡張勾配降下法(gd)法を併用した正規化双対最適化に基づく,計算スケーラブルなkm学習アルゴリズムを提案する。提案手法を大規模化するために,固有値分解(EVD)と近似EVDアルゴリズムの2つの高速化手法を提案する。さらに、近似誤差解析を利用して正規化されたMinkowski $\ell_1$-normとそのバウンダリを利用するしきい値は、近位EVDアルゴリズムの反復数を選択するために提供される。ビッグデータアプリケーションに適用した場合,提案手法は,既存のKM学習アルゴリズムと比較して,計算複雑性を著しく低減した互換性のあるトレーニング/予測性能を実現可能であることが実証された。さらに,提案したKM学習アルゴリズムを用いた論理的関係マイニングの精度は80\%以上であることを示した。 Data representation techniques have made a substantial contribution to advancing data processing and machine learning (ML). Improving predictive power was the focus of previous representation techniques, which unfortunately perform rather poorly on the interpretability in terms of extracting underlying insights of the data. Recently, Kolmogorov model (KM) was studied, which is an interpretable and predictable representation approach to learning the underlying probabilistic structure of a set of random variables. The existing KM learning algorithms using semi-definite relaxation with randomization (SDRwR) or discrete monotonic optimization (DMO) have, however, limited utility to big data applications because they do not scale well computationally. In this paper, we propose a computationally scalable KM learning algorithm, based on the regularized dual optimization combined with enhanced gradient descent (GD) method. To make our method more scalable to large-dimensional problems, we propose two acceleration schemes, namely, eigenvalue decomposition (EVD) elimination strategy and proximal EVD algorithm. 
Furthermore, a thresholding technique by exploiting the approximation error analysis and leveraging the normalized Minkowski $\ell_1$-norm and its bounds, is provided for the selection of the number of iterations of the proximal EVD algorithm. When applied to big data applications, it is demonstrated that the proposed method can achieve compatible training/prediction performance with significantly reduced computational complexity; roughly two orders of magnitude improvement in terms of the time overhead, compared to the existing KM learning algorithms. Furthermore, it is shown that the accuracy of logical relation mining for interpretability by using the proposed KM learning algorithm exceeds $80\%$.	翻訳日:2021-07-13 15:41:14 公開日:2021-07-11
# 深層学習を用いたスペクトル時間RF識別 Spectro-Temporal RF Identification using Deep Learning ( http://arxiv.org/abs/2107.05114v1 ) ライセンス: Link先を確認	Hai N. Nguyen, Marinos Vomvas, Triet Vo-Huu, Guevara Noubir	(参考訳) RF放射の検出、分類、分光時間的局所化は、RFスペクトルの理解、管理、保護に関するタスクだけでなく、侵入するドローンやジャマーの検出などの安全およびセキュリティ用途にも不可欠である。広帯域スペクトルとリアルタイム性能のこの目標を達成することは難しい問題である。本稿では,スペクトル時間検出,フレームワーク,システムを備えた広帯域リアルタイムRF識別システムであるWRISTを提案する。得られた深層学習モデルは,100MHzスペクトルのRFサンプルをリアルタイム(入出力6Gbps以上のI&Qストリーム)で検出し,分類し,正確に検出することができる。このような機能は、深層学習に基づく一段階オブジェクト検出フレームワークを活用し、多チャンネル画像に基づくRF信号表現に学習を移すことにより実現可能である。また,合成および拡張RFデータを活用して,RFエミッションの大規模ラベル付きデータセット(SPREAD)を効率的に構築する反復的トレーニング手法を提案する。 WRIST検出器は、野生の非常に密集した環境でも mean Average Precision が90に達する。 WRISTモデルは5つの技術(Bluetooth、Lightbridge、Wi-Fi、XPD、ZigBee)を分類し、容易に拡張可能である。キュレートされた注釈付きデータセットをコミュニティ全体に公開しています。さまざまな市販の無線機から収集された100万近いラベル付きRF放射が、さまざまな環境にまたがって5つの放射クラスにまたがって構成されている。 RF emissions detection, classification, and spectro-temporal localization are crucial not only for tasks relating to understanding, managing, and protecting the RF spectrum, but also for safety and security applications such as detecting intruding drones or jammers. Achieving this goal for wideband spectrum and in real-time performance is a challenging problem. We present WRIST, a Wideband, Real-time RF Identification system with Spectro-Temporal detection, framework and system. Our resulting deep learning model is capable of detecting, classifying, and precisely locating RF emissions in time and frequency using RF samples of 100 MHz spectrum in real-time (over 6Gbps incoming I&Q streams). Such capabilities are made feasible by leveraging a deep-learning based one-stage object detection framework, and transfer learning to a multi-channel image-based RF signals representation. We also introduce an iterative training approach which leverages synthesized and augmented RF data to efficiently build large labelled datasets of RF emissions (SPREAD). The WRIST detector achieves a mean Average Precision of 90 even in extremely congested environments in the wild. 
WRIST model classifies five technologies (Bluetooth, Lightbridge, Wi-Fi, XPD, and ZigBee) and is easily extendable to others. We are making our curated and annotated dataset available to the whole community. It consists of nearly 1 million fully labelled RF emissions collected from various off-the-shelf wireless radios in a range of environments and spanning the five classes of emissions.	翻訳日:2021-07-13 15:40:44 公開日:2021-07-11
# アタックルール:機械学習を用いた産業制御システムに対するアタック生成の逆アプローチ Attack Rules: An Adversarial Approach to Generate Attacks for Industrial Control Systems using Machine Learning ( http://arxiv.org/abs/2107.05127v1 ) ライセンス: Link先を確認	Muhammad Azmi Umer, Chuadhry Mujeeb Ahmed, Muhammad Taha Jilani, Aditya P. Mathur	(参考訳) 逆学習は、攻撃中の機械学習アルゴリズムの堅牢性をテストし、産業制御システム(ICS)の異常検出手法を欺く攻撃を生成するために使用される。本研究は,icのセキュリティ評価において,攻撃パターンの徹底的な集合が研究されていることを考慮し,マイニングに基づく攻撃生成手法を提案する。この技術は安全な水処理プラントのデータを用いて実装されている。提案手法は,これまで見られなかった攻撃ベクトルの大部分を構成する30万以上の攻撃パターンを生成することができた。自動生成攻撃は、潜在的な攻撃の理解を深め、ロバストな攻撃検出技術の設計を可能にする。 Adversarial learning is used to test the robustness of machine learning algorithms under attack and create attacks that deceive the anomaly detection methods in Industrial Control System (ICS). Given that security assessment of an ICS demands that an exhaustive set of possible attack patterns is studied, in this work, we propose an association rule mining-based attack generation technique. The technique has been implemented using data from a secure Water Treatment plant. The proposed technique was able to generate more than 300,000 attack patterns constituting a vast majority of new attack vectors which were not seen before. Automatically generated attacks improve our understanding of the potential attacks and enable the design of robust attack detection techniques.	翻訳日:2021-07-13 15:40:20 公開日:2021-07-11
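Association rule mining, the technique named in the abstract above, rests on two quantities, support and confidence. The sketch below mines single-antecedent rules from a handful of discretised plant states. The state names, thresholds, and function names are invented for illustration, and how the authors turn mined rules into attack patterns is not described in the abstract and is not sketched here.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def mine_rules(transactions, min_support=0.5, min_confidence=0.8):
    """Return single-antecedent rules (lhs -> rhs) that pass both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        for lhs, rhs in ((a, b), (b, a)):
            s_both = support({lhs, rhs}, transactions)
            s_lhs = support({lhs}, transactions)
            if s_lhs == 0 or s_both < min_support:
                continue
            confidence = s_both / s_lhs
            if confidence >= min_confidence:
                rules.append((lhs, rhs, s_both, confidence))
    return rules

# Hypothetical discretised sensor/actuator states, one set per time step.
transactions = [
    {"pump=on", "valve=open", "level=high"},
    {"pump=on", "valve=open", "level=normal"},
    {"pump=on", "valve=open", "level=high"},
    {"pump=off", "valve=closed", "level=normal"},
]
rules = mine_rules(transactions)
for lhs, rhs, s, c in rules:
    print(f"{lhs} -> {rhs} (support={s:.2f}, confidence={c:.2f})")
```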
# 新型コロナウイルス対策における温熱測定のためのクラウド-エッジ-端末協調システム A Cloud-Edge-Terminal Collaborative System for Temperature Measurement in COVID-19 Prevention ( http://arxiv.org/abs/2107.05078v1 ) ライセンス: Link先を確認	Zheyi Ma, Hao Li, Wen Fang, Qingwen Liu, Bin Zhou and Zhiyong Bu	(参考訳) 新型コロナウイルス(COVID-19)の感染拡大を防止するため、公共の場で予備温度測定とマスク検出を行う。しかし、既存の温度測定手法は安全性と展開の問題に直面している。本稿では,人の顔が部分的にぼけている場合でも,安全で正確な温度測定を実現するため,軽量赤外線温度計測モデルを用いたクラウド・エッジ・ターミナル協調システムを提案する。 RGBレンズとサーマルレンズを備えた双眼鏡カメラを用いて、画像対を同時にキャプチャする。次に,マルチタスク・カスケード・畳み込みネットワーク(MTCNN)に基づく移動体検出モデルを提案し,RGB画像上での顔アライメントとマスク検出を実現する。正確な温度測定のために、RGB画像の顔のランドマークをアフィン変換により熱画像に変換し、額のより正確な温度測定領域を選択する。収集された情報は、新型コロナウイルス予防のためにリアルタイムでクラウドにアップロードされる。実験により、検出モデルはわずか6.1mで、平均検出速度は257msであることが示された。 1mの距離では、室内温度測定の誤差は約3%である。すなわち,提案システムは公共空間におけるリアルタイム温度測定を実現することができる。 To prevent the spread of coronavirus disease 2019 (COVID-19), preliminary temperature measurement and mask detection in public areas are conducted. However, the existing temperature measurement methods face the problems of safety and deployment. In this paper, to realize safe and accurate temperature measurement even when a person's face is partially obscured, we propose a cloud-edge-terminal collaborative system with a lightweight infrared temperature measurement model. A binocular camera with an RGB lens and a thermal lens is utilized to simultaneously capture image pairs. Then, a mobile detection model based on a multi-task cascaded convolutional network (MTCNN) is proposed to realize face alignment and mask detection on the RGB images. For accurate temperature measurement, we transform the facial landmarks on the RGB images to the thermal images by an affine transformation and select a more accurate temperature measurement area on the forehead. The collected information is uploaded to the cloud in real time for COVID-19 prevention. Experiments show that the detection model is only 6.1M and the average detection speed is 257ms. At a distance of 1m, the error of indoor temperature measurement is about 3%. 
That is, the proposed system can realize real-time temperature measurement in public areas.	翻訳日:2021-07-13 15:39:21 公開日:2021-07-11
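The landmark transfer step described above, mapping facial landmarks from the RGB image into the thermal image with an affine transformation, can be sketched as follows. The matrix values and landmark coordinates are hypothetical; in a real system the matrix would come from calibrating the two lenses.

```python
def apply_affine(matrix, points):
    """Map (x, y) points with a 2 x 3 affine matrix [[a, b, tx], [c, d, ty]]."""
    (a, b, tx), (c, d, ty) = matrix
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in points]

# Hypothetical calibration between the RGB and thermal lenses: scale by 0.5, then shift.
RGB_TO_THERMAL = [
    [0.5, 0.0, 10.0],
    [0.0, 0.5, -4.0],
]

# Hypothetical facial landmarks in RGB pixel coordinates (left eye, right eye).
rgb_landmarks = [(200.0, 120.0), (280.0, 120.0)]
thermal_landmarks = apply_affine(RGB_TO_THERMAL, rgb_landmarks)
print(thermal_landmarks)
```

With the eye positions known in thermal coordinates, a forehead region can be chosen relative to them, which is the area the abstract names for temperature readout.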
# 5gコネクテッド・オートマチック運転におけるqos予測 QoS Prediction for 5G Connected and Automated Driving ( http://arxiv.org/abs/2107.05000v1 ) ライセンス: Link先を確認	Apostolos Kousaridas, Ramya Panthangi Manjunath, Jose Mauricio Perdomo, Chan Zhou, Ernst Zielinski, Steffen Schmitz and Andreas Pfadler	(参考訳) 5G通信システムは、多くの高度な車両間通信(V2X)ユースケースの要求品質(QoS)要件をサポートすることができる。しかし、特に自動走行車の安全で効率的な運転は、供給されたQoSの急激な変更によって影響を受ける可能性がある。このため、QoS変更の予測と、これらの予測された変更の早期通知は、最近5G通信システムによって実現されている。このソリューションにより、車両はアプリケーションレベルでの突然のQoS変化の影響を回避または緩和することができる。本稿では,5G通信システムによってQoS予測が生成され,V2Xアプリケーションに配信される方法について述べる。遠隔操作運転使用事例は、QoS予測スキームの実現可能性を分析する例として使用される。 qos予測ソリューションを開発するための有用な推奨事項が提供され、オープンリサーチのトピックが特定される。 5G communication system can support the demanding quality-of-service (QoS) requirements of many advanced vehicle-to-everything (V2X) use cases. However, the safe and efficient driving, especially of automated vehicles, may be affected by sudden changes of the provided QoS. For that reason, the prediction of the QoS changes and the early notification of these predicted changes to the vehicles have been recently enabled by 5G communication systems. This solution enables the vehicles to avoid or mitigate the effect of sudden QoS changes at the application level. This article describes how QoS prediction could be generated by a 5G communication system and delivered to a V2X application. The tele-operated driving use case is used as an example to analyze the feasibility of a QoS prediction scheme. Useful recommendations for the development of a QoS prediction solution are provided, while open research topics are identified.	翻訳日:2021-07-13 15:39:04 公開日:2021-07-11
# ニューラルウェーブシェイピング合成 Neural Waveshaping Synthesis ( http://arxiv.org/abs/2107.05050v1 ) ライセンス: Link先を確認	Ben Hayes, Charalampos Saitis, György Fazekas	(参考訳) ニューラルウェーブシェーピングユニット (NEWT) は, 波形領域で直接動作するニューラルオーディオ合成に対して, 高速なCPU推論のためのアタッチメント最適化 (FastNEWT) を備えた, 軽量で完全な因果的アプローチである。 NEWTは周期的なアクティベーションを持つ時間分散多層パーセプトロンを使用して、ターゲットの音色の特徴を符号化する非線形伝達関数を暗黙的に学習する。訓練されると、NEWTは入力信号と出力信号の単純なアフィン変換によって複雑な音節の進化を生み出すことができる。 NEWTと差別化可能なノイズ合成器を組み合わせて残響を行い、260kの総モデルパラメータしか持たない現実的な楽器演奏をF0と大音量で再現できることを発見した。提案手法を,マルチ刺激聴取テストとFréchet Audio Distanceと比較したところ,テストされた音節領域間で競合する性能を示した。提案手法は, 生成速度のベンチマークを著しく上回り, 高速更新の有無に関わらず, 消費者cpu上でのリアルタイム性能を実現し, 将来的な創造的音響設計ツールの基盤となることを示唆する。 We present the Neural Waveshaping Unit (NEWT): a novel, lightweight, fully causal approach to neural audio synthesis which operates directly in the waveform domain, with an accompanying optimisation (FastNEWT) for efficient CPU inference. The NEWT uses time-distributed multilayer perceptrons with periodic activations to implicitly learn nonlinear transfer functions that encode the characteristics of a target timbre. Once trained, a NEWT can produce complex timbral evolutions by simple affine transformations of its input and output signals. We paired the NEWT with a differentiable noise synthesiser and reverb and found it capable of generating realistic musical instrument performances with only 260k total model parameters, conditioned on F0 and loudness features. We compared our method to state-of-the-art benchmarks with a multi-stimulus listening test and the Fréchet Audio Distance and found it performed competitively across the tested timbral domains. Our method significantly outperformed the benchmarks in terms of generation speed, and achieved real-time performance on a consumer CPU, both with and without FastNEWT, suggesting it is a viable basis for future creative sound design tools.	翻訳日:2021-07-13 15:38:52 公開日:2021-07-11
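The waveshaping structure outlined in the abstract above (affine transform of the input, a nonlinear transfer function, affine transform of the output) can be sketched on a plain sine wave. This is only an illustration of the signal path: the fixed sum of sinusoids stands in for the learned multilayer perceptron, and all weights and parameter names are hypothetical.

```python
import math

def shaping_function(x, weights):
    """Stand-in for a learned shaper: a fixed sum of sinusoids of the input sample."""
    return sum(amp * math.sin(freq * x + phase) for amp, freq, phase in weights)

def waveshape(signal, in_scale, in_shift, out_scale, out_shift, weights):
    """Affine transform -> nonlinear shaping function -> affine transform, per sample."""
    return [
        out_scale * shaping_function(in_scale * x + in_shift, weights) + out_shift
        for x in signal
    ]

sample_rate = 16000
f0 = 220.0  # fundamental frequency in Hz
sine = [math.sin(2.0 * math.pi * f0 * n / sample_rate) for n in range(sample_rate // 100)]

# Hypothetical (amplitude, frequency, phase) triples standing in for trained weights.
weights = [(0.6, 1.0, 0.0), (0.3, 3.0, 0.5), (0.1, 7.0, 1.0)]
shaped = waveshape(sine, in_scale=1.5, in_shift=0.1, out_scale=0.8, out_shift=0.0, weights=weights)
print(len(shaped), max(abs(v) for v in shaped))
```

Changing only `in_scale` and `in_shift` moves the sine wave across different regions of the transfer function, which alters the harmonic content without touching the shaper itself.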
# バングラ自然言語処理タスクのレビューと変圧器モデルの有用性 A Review of Bangla Natural Language Processing Tasks and the Utility of Transformer Models ( http://arxiv.org/abs/2107.03844v2 ) ライセンス: Link先を確認	Firoj Alam, Arid Hasan, Tanvirul Alam, Akib Khan, Janntatul Tajrin, Naira Khan, Shammur Absar Chowdhury	(参考訳) Banglaは世界で6番目に広く話されている言語(https://www.ethnologue.com/guides/ethnologue200)であり、2億3000万人のネイティブスピーカーを持つ。 30年にわたる研究を経て、Bangla NLP(BNLP)は、主に資源不足とそれに伴う課題のために、まだ遅れを取っている。 BNLPのさまざまな領域に疎結合な研究があるが、以前の研究や最近の進歩を報告する詳細な調査はまだ行われていない。本研究は,まずバングラ・nlpのタスク,リソース,ツールのレビューを行い,現状のアルゴリズム(トランスフォーマーベースモデル)を用いて,様々なプラットフォームから収集したデータセットを9つのnlpタスク向けにベンチマークする。異なる大きさの単言語モデルと多言語モデルを比較することで,NLPタスクの比較結果を提供する。個人と統合されたデータセットを用いてその結果を報告し、今後の研究にデータ分割を提供する。我々は合計108の論文をレビューし、175の実験を行った。本結果は,計算コストとのトレードオフを強調しつつ,トランスフォーマーモデルを用いた有望な性能を示す。このような包括的調査がコミュニティを活性化させ、バングラNLPの研究をさらに前進させることを期待している。 Bangla -- ranked as the 6th most widely spoken language across the world (https://www.ethnologue.com/guides/ethnologue200), with 230 million native speakers -- is still considered as a low-resource language in the natural language processing (NLP) community. With three decades of research, Bangla NLP (BNLP) is still lagging behind mainly due to the scarcity of resources and the challenges that come with it. There is sparse work in different areas of BNLP; however, a thorough survey reporting previous work and recent advances is yet to be done. In this study, we first provide a review of Bangla NLP tasks, resources, and tools available to the research community; we benchmark datasets collected from various platforms for nine NLP tasks using current state-of-the-art algorithms (i.e., transformer-based models). We provide comparative results for the studied NLP tasks by comparing monolingual vs. multilingual models of varying sizes. We report our results using both individual and consolidated datasets and provide data splits for future research. We reviewed a total of 108 papers and conducted 175 sets of experiments. Our results show promising performance using transformer-based models while highlighting the trade-off with computational costs. We hope that such a comprehensive survey will motivate the community to build on and further advance the research on Bangla NLP.	翻訳日:2021-07-13 11:42:25 公開日:2021-07-11

Title

Authors

Abstract

論文公表日・翻訳日

# 量子エンタングル-プローブ散乱理論

Quantum Entangled-Probe Scattering Theory ( http://arxiv.org/abs/2008.04328v3 )

ライセンス: Link先を確認

Abu Ashik Md Irfan, Patrick Blackstone, Roger Pynn and Gerardo Ortiz

(参考訳) 我々は、量子検出を含む絡み合ったプローブ散乱理論を開発し、標準散乱アプローチの範囲を広げる。これらのプローブは、強い相関系の非伝統的な位相のような絡み合った物質の研究において革命的であるかもしれない。本発表では、光子プローブにも同様の考え方が適用されるが、スピンとパスにモード絡み合う中性子ビームプローブを[1]で実験的に実現した。従来のファンホーブ理論 [2] を一般化し、応答は2点相関関数の適切に作られた組み合わせとして記述される。プローブの絡み合い長さをチューニングすることで、差動断面の干渉パターンを分析して、興味の空間的スケールを問うことができる。注目すべきことに、スピン二量体ターゲットの場合、ターゲット状態が非絡み合っているときに観測される典型的なヤング様干渉パターンは、その状態が最大絡み合っているときに量子消去される。

We develop an entangled-probe scattering theory, including quantum detection, that extends the scope of standard scattering approaches. We argue that these probes may be revolutionary in studying entangled matter such as unconventional phases of strongly correlated systems. Our presentation focuses on a neutron beam probe that is mode-entangled in spin and path as is experimentally realized in [1], although similar ideas also apply to photon probes. We generalize the traditional van Hove theory [2] whereby the response is written as a properly-crafted combination of two-point correlation functions. Tuning the probe's entanglement length allows us to interrogate spatial scales of interest by analyzing interference patterns in the differential cross-section. Remarkably, for a spin dimer target we find that the typical Young-like interference pattern observed if the target state is un-entangled gets quantum erased when that state becomes maximally entangled.

翻訳日:2023-05-06 16:00:47 公開日:2021-07-11

# 食事アセスメントのための無料mhealthアプリの批判的特徴と一般課題

A Review of Critical Features and General Issues of Freely Available mHealth Apps For Dietary Assessment ( http://arxiv.org/abs/2008.09883v4 )

ライセンス: Link先を確認

Ghalib Ahmed Tahir, Chu Kiong Loo, Foong Ming Moy and Nadine Kong

(参考訳) 肥満は生活の質を著しく低下させることが知られている。糖尿病、心血管疾患、様々ながんなどの非感染性疾患の頻度の増加と関連していることが多い。食事関連のモバイルアプリケーションは、個人の健康的な選択や食物摂取の追跡を支援する上で重要な役割を担っているという証拠がある。しかし、類似したアプリケーションが豊富にあるため、機能、ユーザビリティ、および設計上の問題の観点からそれぞれを評価し、将来に向けて最先端のソリューションを真に決定することが重要になる。これらのアプリケーションは、異なる食生活者からの複数のユーザー要求とレコメンデーションを実装しているため、評価は非常に複雑になる。そこで本研究では,既存の食事用アプリケーションについて検討し,アプリケーションのユーザビリティを損なうおそれのある重要な特徴と問題点を強調する。本研究は, PUBMED, CINAHL (2010年1月～2019年12月) およびScience Direct (2010-2019年) の各種学術データベースから, 論文の公開について検討した。我々はPRISMAガイドラインに従い,本研究の結果から,同定,スクリーニング,適性,全文評価の56%が包括的基準を満たした。選択した研究から35のアプリを分析し,特定した各アプリのデータを抽出した。自由に利用可能なmhealthアプリケーションの包括性に関する詳細な分析を行った結果,今後の研究課題を特定し,臨床的に正確な食事関連アプリケーションを開発するための推奨事項を述べた。

Obesity is known to lower the quality of life substantially. It is often associated with increased chances of non-communicable diseases such as diabetes, cardiovascular problems, various cancers, etc. Evidence suggests that diet-related mobile applications play a vital role in assisting individuals in making healthier choices and keeping track of food intake. However, due to an abundance of similar applications, it becomes pertinent to evaluate each of them in terms of functionality, usability, and possible design issues to truly determine state-of-the-art solutions for the future. Since these applications involve implementing multiple user requirements and recommendations from different dietitians, the evaluation becomes quite complex. Therefore, this study aims to review existing dietary applications at length to highlight key features and problems that enhance or undermine an application's usability. For this purpose, we have examined the published literature from various scientific databases of the PUBMED, CINAHL (January 2010-December 2019) and Science Direct (2010-2019). We followed PRISMA guidelines, and out of our findings, fifty-six primary studies met our inclusion criteria after identification, screening, eligibility and full-text evaluation. We analyzed 35 apps from the selected studies and extracted the data of each of the identified apps. Following our detailed analysis on the comprehensiveness of freely available mHealth applications, we specified potential future research challenges and stated recommendations to help grow clinically accurate diet-related applications.

翻訳日:2023-05-05 05:59:00 公開日:2021-07-11

# 非ブローチパリティ時対称性と例外点の観測

Observation of non-Bloch parity-time symmetry and exceptional points ( http://arxiv.org/abs/2009.07288v2 )

ライセンス: Link先を確認

Lei Xiao, Tianshu Deng, Kunkun Wang, Zhong Wang, Wei Yi, Peng Xue

(参考訳) パリティ時(PT)対称ハミルトニアンは非エルミート物理学において広く重要である。 pt対称ハミルトニアンは、実または複素の固有スペクトルを持つ異なる位相を示すことができ、一方、中間の遷移点(いわゆる例外点)は、アプリケーションにとって大きな期待を持つ重要な振る舞いのホストとなる。空間的に周期的な非エルミート系では、pt対称性は一般にブロッホバンド理論に沿って特徴づけられ観測され、ブリルアンゾーンに例外的な点がある。ここでは、単光子の非単位量子ウォークにおいて、この共通の知恵を超えた例外的な点の族が発見される。これらの「非ブロックの例外点」は、非エルミティアスキン効果として知られる境界付近のバルク固有状態の蓄積に由来する。以上の結果から,pt対称性と非エルミート皮膚効果との間に興味深い相互作用が認められた。

Parity-time (PT)-symmetric Hamiltonians have widespread significance in non-Hermitian physics. A PT-symmetric Hamiltonian can exhibit distinct phases with either real or complex eigenspectrum, while the transition points in between, the so-called exceptional points, give rise to a host of critical behaviors that holds great promise for applications. For spatially periodic non-Hermitian systems, PT symmetries are commonly characterized and observed in line with the Bloch band theory, with exceptional points dwelling in the Brillouin zone. Here, in nonunitary quantum walks of single photons, we uncover a novel family of exceptional points beyond this common wisdom. These "non-Bloch exceptional points" originate from the accumulation of bulk eigenstates near boundaries, known as the non-Hermitian skin effect, and inhabit a generalized Brillouin zone. Our finding opens the avenue toward a generalized PT-symmetry framework, and reveals the intriguing interplay between PT symmetry and non-Hermitian skin effect.

翻訳日:2023-05-02 04:16:49 公開日:2021-07-11

# フォトニック符号化パラフェミオンのトポロジカルな文脈性と正準統計

Topological contextuality and anyonic statistics of photonic-encoded parafermions ( http://arxiv.org/abs/2011.05008v2 )

ライセンス: Link先を確認

Zheng-Hao Liu, Kai Sun, Jiannis K. Pachos, Mu Yang, Yu Meng, Yu-Wei Liao, Qiang Li, Jun-Feng Wang, Ze-Yu Luo, Yi-Fei He, Dong-Yu Huang, Guang-Rui Ding, Jin-Shi Xu, Yong-Jian Han, Chuan-Feng Li and Guang-Can Guo

(参考訳) マヨラナゼロモード状態の測定中に生じると思われる準粒子中毒は、マヨラナベースの量子計算の実現に向けて根本的な問題となる。パラフェルミオン(Parafermions)はマヨラナのフェルミオンの自然な一般化であり、準粒子中毒に免疫するトポロジカルクォーディットをコードする。パラフェルミオンは超伝導分数量子ホール系で現れることが期待されているが、現在の技術では実現できない。この問題を回避するため,我々はフォトニック量子シミュレータを用いてパラフェルミオンに基づく普遍量子計算の重要なコンポーネントを実験的に実証する。この記事への私たちの貢献は2つあります。まず、フォトニック状態を操作することで、パラフェルミオンのブレイディング統計に対応するクリフォード作用素ベリー相を実現する。第2に、パラフェルミオン符号化量子量子状態の文脈性を示すことにより、トポロジカル系の量子文脈性が初めて研究される。重要なことに、トポロジカルに符号化された文脈性は、状態蒸留の魔法の道を開く一方、コンテキスト性とブレイディングによって引き起こされるクリフォードゲートは局所雑音に対して弾力性を持つ。文脈性を導入することで、我々のフォトニック量子シミュレーションは、トポロジカル量子計算を実現するための物理的に堅牢な方法論への第一歩を提供する。

Quasiparticle poisoning, expected to arise during the measurement of Majorana zero mode state, poses a fundamental problem towards the realization of Majorana-based quantum computation. Parafermions, a natural generalization of Majorana fermions, can encode topological qudits immune to quasiparticle poisoning. While parafermions are expected to emerge in superconducting fractional quantum Hall systems, they are not yet attainable with current technology. To bypass this problem, we employ a photonic quantum simulator to experimentally demonstrate the key components of parafermion-based universal quantum computation. Our contributions in this article are twofold. First, by manipulating the photonic states, we realize Clifford operator Berry phases that correspond to braiding statistics of parafermions. Second, we investigate the quantum contextuality in a topological system for the first time by demonstrating the contextuality of parafermion encoded qudit states. Importantly, we find that the topologically-encoded contextuality opens the way to magic state distillation, while both the contextuality and the braiding-induced Clifford gates are resilient against local noise. By introducing contextuality, our photonic quantum simulation provides the first step towards a physically robust methodology for realizing topological quantum computation.

翻訳日:2023-04-24 19:16:01 公開日:2021-07-11

# コヒーレント状態と猫状態に対するLeggett-Garg不等式の違反

Violations of the Leggett-Garg inequality for coherent and cat states ( http://arxiv.org/abs/2101.06866v3 )

ライセンス: Link先を確認

Hiroo Azuma, Masashi Ban

(参考訳) 数値計算により,コヒーレント状態が猫状態よりもレゲット・ガーグ不等式(lgi)により大きな違反を生じうることを示す。そこで本研究では, 空洞モードのLGIを, ゼロ温度環境に弱結合したLGIを物理系の実例として考察した。ボソニックモードは,環境との相互作用により消散するが,消耗には影響しないと仮定する。マスター方程式を正しく解くと、最初にコヒーレント状態である$|\alpha\rangle$ と cat 状態 $(|\alpha\rangle+|-\alpha\rangle)$ で準備された両システムの不等式を破る明示的な形式を導出する。不等式の評価には、複素数$\beta$を特徴とする転位パリティ作用素を選択する。我々は不等式の上限を数値的に最大にする最適なパラメータ$\beta$を求める。我々の期待に反して、コヒーレント状態は、等間隔の3つの測定時間(約$\tau$)の特定の範囲において、lgiの違反の上限よりも高い量子品質を示すことがある。さらに、$\tau$ が 0 に近づくと、最適化されたパラメータ $\beta$ が発散し、LGI は強い特異点を示す。

We show that in some cases the coherent state can have a larger violation of the Leggett-Garg inequality (LGI) than the cat state by numerical calculations. To achieve this result, we consider the LGI of the cavity mode weakly coupled to a zero-temperature environment as a practical instance of the physical system. We assume that the bosonic mode undergoes dissipation because of an interaction with the environment but is not affected by dephasing. Solving the master equation exactly, we derive an explicit form of the violation of the inequality for both systems prepared initially in the coherent state $|\alpha\rangle$ and the cat state $(|\alpha\rangle+|-\alpha\rangle)$. For the evaluation of the inequality, we choose the displaced parity operators characterized by a complex number $\beta$. We look for the optimum parameter $\beta$ that lets the upper bound of the inequality be maximum numerically. Contrary to our expectations, the coherent state occasionally exhibits quantum quality more strongly than the cat state for the upper bound of the violation of the LGI in a specific range of three equally spaced measurement times (spacing $\tau$). Moreover, as we let $\tau$ approach zero, the optimized parameter $\beta$ diverges and the LGI reveals intense singularity.

翻訳日:2023-04-14 21:25:14 公開日:2021-07-11
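
The inequality discussed above can be illustrated with a much simpler system than the dissipative cavity mode of the paper. The sketch below is a textbook toy model, not the authors' calculation: it assumes a two-level system whose two-time correlator is C(t) = cos(ωt), and evaluates the three-time Leggett-Garg parameter K = C12 + C23 - C13 for equally spaced measurements (spacing τ). Macrorealism requires K ≤ 1; in this toy model K reaches 1.5 at ωτ = π/3. The function names are hypothetical.

```python
import math

def lg_parameter(omega_tau: float) -> float:
    """K = C12 + C23 - C13 for equally spaced measurements, assuming the
    textbook two-level correlator C(t) = cos(omega * t) (an assumption,
    not the displaced-parity correlator used in the paper)."""
    c_adjacent = math.cos(omega_tau)       # C12 = C23 (spacing tau)
    c_outer = math.cos(2.0 * omega_tau)    # C13 (spacing 2 * tau)
    return 2.0 * c_adjacent - c_outer

def max_violation(steps: int = 10000):
    """Scan omega * tau over (0, pi) on a uniform grid and return the grid
    point with the largest K as (omega_tau, K)."""
    best = max(range(1, steps), key=lambda i: lg_parameter(math.pi * i / steps))
    omega_tau = math.pi * best / steps
    return omega_tau, lg_parameter(omega_tau)

if __name__ == "__main__":
    omega_tau, k_max = max_violation()
    print(f"max K = {k_max:.6f} at omega*tau = {omega_tau:.4f} (bound: K <= 1)")
```

Any K above 1 signals a violation; the paper's comparison of coherent and cat states concerns how large this excess can become once dissipation and the choice of $\beta$ are taken into account.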

# スピン量子ビットアドレスを持つ2次元si量子ドットアレイの設計

Designs for a two-dimensional Si quantum dot array with spin qubit addressability ( http://arxiv.org/abs/2106.11124v2 )

ライセンス: Link先を確認

Masahiro Tadokoro, Takashi Nakajima, Takashi Kobayashi, Kenta Takeda, Akito Noiri, Kaito Tomari, Jun Yoneda, Seigo Tarucha, and Tetsuo Kodera

(参考訳) Siの電子スピンは、スケーラビリティと高速で高忠実な量子論理ゲートを基盤として、量子計算の魅力的なプラットフォームである。しかし、中規模から大規模の量子計算において、量子ビット間の効率的な接続と2次元の統合が重要であるにもかかわらず、量子ビットのアドレス性を保証する実用的なデバイス設計はまだ見つからない。本稿では,実用的な3 x 3量子ドットデバイスの設計と,長期的ターゲットとしての大規模設計を提案する。設計目標は、アドレス性を確保しながら、近接する4つの隣人とのqubit接続を実現することである。 3x3量子ドットアレイは, 1次元よりも効率よく4量子Groverのアルゴリズムを実行できることを示す。 3×3以上の二次元配列をスケールアップするために,強磁性ゲート電極を用いた新しい構造を提案する。以上の結果から,si中規模の量子プロセッサが高速な量子論理ゲートと長いコヒーレンス時間を持つ可能性を示す。

Electron spins in Si are an attractive platform for quantum computation, backed with their scalability and fast, high-fidelity quantum logic gates. Despite the importance of two-dimensional integration with efficient connectivity between qubits for medium- to large-scale quantum computation, however, a practical device design that guarantees qubit addressability is yet to be seen. Here, we propose a practical 3 x 3 quantum dot device design and a larger-scale design as a longer-term target. The design goal is to realize qubit connectivity to the four nearest neighbors while ensuring addressability. We show that a 3 x 3 quantum dot array can execute four-qubit Grover's algorithm more efficiently than the one-dimensional counterpart. To scale up the two-dimensional array beyond 3 x 3, we propose a novel structure with ferromagnetic gate electrodes. Our results showcase the possibility of medium-sized quantum processors in Si with fast quantum logic gates and long coherence times.

翻訳日:2023-03-25 23:15:29 公開日:2021-07-11
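
For readers unfamiliar with the four-qubit Grover search mentioned in the abstract, the following is a minimal, idealized state-vector sketch. It assumes a single marked item and noiseless gates, and it does not model the 3 x 3 device layout, qubit connectivity, or gate compilation that the paper is actually about; the function names are hypothetical.

```python
import math

def grover_success_probability(n_qubits: int, marked: int, iterations: int) -> float:
    """Probability of measuring the marked basis state after the given
    number of Grover iterations (ideal, noiseless simulation)."""
    dim = 2 ** n_qubits
    amp = [1.0 / math.sqrt(dim)] * dim        # uniform superposition
    for _ in range(iterations):
        amp[marked] = -amp[marked]            # oracle: phase-flip the marked state
        mean = sum(amp) / dim
        amp = [2.0 * mean - a for a in amp]   # diffusion: inversion about the mean
    return amp[marked] ** 2

def optimal_iterations(n_qubits: int) -> int:
    """Standard estimate floor(pi/4 * sqrt(N)) for one marked item."""
    return math.floor(math.pi / 4.0 * math.sqrt(2 ** n_qubits))

if __name__ == "__main__":
    k = optimal_iterations(4)
    p = grover_success_probability(4, marked=11, iterations=k)
    print(f"{k} iterations, success probability {p:.4f}")
```

On hardware, the cost of each oracle and diffusion step depends on how many two-qubit gates the connectivity allows without extra swaps, which is where a two-dimensional array can help relative to a linear chain.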

# 電荷励起と電偏光に対する創発的非エルミート境界寄与

Emergent non-Hermitian boundary contributions to charge pumping and electric polarization ( http://arxiv.org/abs/2106.14173v2 )

ライセンス: Link先を確認

K. Kyriakou and K. Moulopoulos

(参考訳) 電荷ポンプ現象と現代の電気偏光理論は、創発的非エルミート的貢献を考慮した解析的に再考される。これらは速度作用素の拡張定義を用いて説明され、ここで初めて導かれる動的ヘルマン=ファインマンの定理(DHFT)によって決定される。 DHFTは一般化されたベリー曲率を導入し、観測可能量の非摂動計算には有効である。拡張速度演算子を用いて、電荷ポンプが材料の境界とどのように結びついているか(この接続には非ハーミティティーが不可欠である)を厳密に示し、DHFTにより、周期ゲージがフロケ・ブロッホ状態に適用できないとき、駆動非平衡過程における非可積分アハロノフ・アンダン相により、励起電荷のよく知られた位相量子化が崩壊することを示す。同様に、電子分極変化は、現代の電気分極理論では見過ごせない、追加の非エルミート的寄与を持つことを示す。非エルミート寄与は定義によって、バルク積分を境界量に変換する対称構造により等しく境界量として評価できるバルク量である。この非エルミート的寄与は、波動関数に課される現実的な境界条件に非常に敏感であり、偏極変化を引き起こす過程中に境界上の電荷蓄積が存在する偏光絶縁体において重要であることが期待される。最後に、よく定義された曲面電荷定理が境界非エルミート寄与の観点から定式化できることを示す。

The phenomenon of charge pumping and the modern theory of electric polarization are reconsidered by analytically taking into account emergent non-Hermitian contributions. These are accounted for through the use of an extended definition of the velocity operator and are determined by means of a dynamic Hellmann-Feynman theorem (DHFT) that we derive here for the first time. The DHFT introduces generalized Berry curvatures and it is valid for calculating observables nonperturbatively, hence with results valid to all orders of the external fields. By using the extended velocity operator we rigorously show how the charge pumping is linked up with the boundaries of the material (with the non-Hermiticity being essential for this connection), and by means of the DHFT we show that the well-known topological quantization of the pumped charge breaks down due to a nonintegrable Aharonov-Anandan phase in driven non-equilibrium processes whenever the periodic gauge cannot be applied to the Floquet-Bloch states. Likewise, we show that the electronic polarization change has an additional non-Hermitian contribution, which is overlooked in the modern theory of electric polarization. The non-Hermitian contribution is by definition a bulk quantity that may equally be evaluated as a boundary quantity due to a symmetric structure that allows the bulk integration to be transformed into a boundary one. This non-Hermitian contribution is very sensitive to the realistic boundary conditions imposed on the wavefunctions and it is therefore expected to be significant in biased insulators where charge accumulation over their boundaries is present during the process that causes the polarization change. Finally, we show how a well-defined surface-charge theorem can be formulated in terms of the boundary non-Hermitian contribution.

翻訳日:2023-03-24 23:25:46 公開日:2021-07-11

# 熱場二重状態に対する奇数絡み合いエントロピーと対数ネガティクス

Odd Entanglement Entropy and Logarithmic Negativity for Thermofield Double States ( http://arxiv.org/abs/2106.15451v2 )

ライセンス: Link先を確認

Mostafa Ghasemi, Ali Naseh and Reza Pirmoradian

(参考訳) 共分散行列を用いた自由スカラー量子場理論における熱場倍(TFD)状態に対する奇絡エントロピー(OEE)と対数ネガティビティ(LN)の時間発展について検討する。混合状態を持つためには、TFDの各辺の隣接区間または非連結区間である非補間サブシステムを選択する。我々は,OEEの時間進化パターンが線形成長であり,飽和が続くことを見出した。円格子上では、長い時間にわたって有限サイズ効果は振動挙動として表される。質量の消滅の限界では、TFDの両側に1度の自由度を含むサブシステムに対して、中間期の対数的成長をもたらすOEEの時間的進化に対するゼロモードの影響を解析的に見出す。さらに、隣接する区間では、LN は $t < \beta/2$ (逆温度の半分) で 0 であり、その後に線形に成長し始める。一定温度での解離区間については、時間$t<d/2$(間隔間の距離の半分)でLNの消滅が観測される。また、同じような遅延があり、$\Delta S=S_{\text{OEE}}-S_{\text{EE}}$の線形成長が見られる。これらの結果は、対数的成長とは別に、これらの測定の力学が準粒子像と一致していることを示している。

We investigate the time evolution of odd entanglement entropy (OEE) and logarithmic negativity (LN) for the thermofield double (TFD) states in free scalar quantum field theories using the covariance matrix approach. To have mixed states, we choose non-complementary subsystems, either adjacent or disjoint intervals on each side of the TFD. We find that the time evolution pattern of OEE is a linear growth followed by saturation. On a circular lattice, for longer times the finite size effect demonstrates itself as oscillatory behavior. In the limit of vanishing mass, for a subsystem containing a single degree of freedom on each side of the TFD, we analytically find the effect of zero-mode on the time evolution of OEE which leads to logarithmic growth in the intermediate times. Moreover, for adjacent intervals we find that the LN is zero for times $t < \beta/2$ (half of the inverse temperature) and after that, it begins to grow linearly. For disjoint intervals at fixed temperature, the vanishing of LN is observed for times $t<d/2$ (half of the distance between intervals). We also find a similar delay to see linear growth of $\Delta S=S_{\text{OEE}}-S_{\text{EE}}$. All these results show that the dynamics of these measures are consistent with the quasi-particle picture, of course apart from the logarithmic growth.

翻訳日:2023-03-24 19:32:58 公開日:2021-07-11

# 超伝導体を介する長距離磁気双極子-双極子相互作用

Long range magnetic dipole-dipole interaction mediated by a superconductor ( http://arxiv.org/abs/2107.05130v1 )

ライセンス: Link先を確認

Yoav Romach, Tal Wasserman, Shai Tishby, Nir Bar-Gill

(参考訳) 量子計算とシミュレーションは、空間的に分離される可能性がある量子ビット間の強いコヒーレント結合を必要とする。固体ベースのスピン量子ビットに対するこの結合を達成することは、長年の課題である。本稿では、量子ビットによって生成された磁束を導出する超伝導ナノ構造に基づいて、そのようなカップリングを実現する手法を理論的に検討する。超伝導層内のナノファブリケート開口の直下に位置するスピン量子ビットを描いた磁気双極子による磁場の半古典的解析計算とシミュレーションについて述べる。このような構造は磁束を流し、スピン量子ビット間の双極子-双極子相互作用を強化し、そのスケーリングを距離で変化させることで、相互作用するスピン系を制御可能とした。

Quantum computation and simulation requires strong coherent coupling between qubits, which may be spatially separated. Achieving this coupling for solid-state based spin qubits is a long-standing challenge. Here we theoretically investigate a method for achieving such coupling, based on superconducting nano-structures designed to channel the magnetic flux created by the qubits. We detail semi-classical analytical calculations and simulations of the magnetic field created by a magnetic dipole, depicting the spin qubit, positioned directly below nanofabricated apertures in a superconducting layer. We show that such structures could channel the magnetic flux, enhancing the dipole-dipole interaction between spin qubits and changing its scaling with distance, thus potentially paving the way for controllably engineering an interacting spin system.

翻訳日:2023-03-22 20:11:53 公開日:2021-07-11

# 基底状態エネルギー密度問題の計算複雑性

Computational Complexity of the Ground State Energy Density Problem ( http://arxiv.org/abs/2107.05060v1 )

ライセンス: Link先を確認

James D. Watson, Toby S. Cubitt

(参考訳) 無限格子サイズの熱力学的極限における格子上の局所ハミルトニアンの基底状態エネルギー密度を求める複雑さについて検討する。我々はこれを関数問題として厳密に定式化し、そこでは基底状態エネルギー密度を特定の精度で推定し、等価な公約問題として$\mathsf{GSED}$として、基底状態エネルギー密度が指定されたしきい値以上であるかを問う。基底状態エネルギー密度問題は、その基底状態エネルギー密度が一定の実数である熱力学の極限において、単一の固定ハミルトニアンに関係しているという点で珍しい。計算問題に対する唯一の入力は、基底状態エネルギー密度に対応する固定実数を計算する精度である。したがって、複雑性クラスに対するこの問題のハードネスは、クラス内のすべての問題の解がこの1つの数で符号化されることを意味する(計算可能性理論におけるChaitinの定数に類似している)。これは、熱力学的極限における単一のハミルトニアンの物理的性質に通常関係する凝縮物物理学でよく見られる問題の一種である。 2次元正方格子上の古典的、翻訳的不変、最寄りのハミルトニアンに対しては、$\mathsf{P}^{\mathsf{NEEXP}}\subseteq\mathsf{EXP}^{\mathsf{GSED}}\subseteq \mathsf{EXP}^{\mathsf{NEXP}}$、量子ハミルトンでは$\mathsf{P}^{\mathsf{NEEXP}}\subseteq\mathsf{EXP}^{\mathsf{GSED}}\subseteq \mathsf{EXP}^{\mathsf{QMA}_{EXP}}$である。 oracleの定義に関する技術的な注意事項により、これらの結果のいくつかにおける$\mathsf{EXP}$は$\mathsf{PSPACE}$に強化できる。また、$\mathsf{GSED}$の関数バージョンに対する類似の複雑性境界を与える。

We study the complexity of finding the ground state energy density of a local Hamiltonian on a lattice in the thermodynamic limit of infinite lattice size. We formulate this rigorously as a function problem, in which we request an estimate of the ground state energy density to some specified precision; and as an equivalent promise problem, $\mathsf{GSED}$, in which we ask whether the ground state energy density is above or below specified thresholds. The ground state energy density problem is unusual, in that it concerns a single, fixed Hamiltonian in the thermodynamic limit, whose ground state energy density is just some fixed, real number. The only input to the computational problem is the precision to which to estimate this fixed real number, corresponding to the ground state energy density. Hardness of this problem for a complexity class therefore implies that the solutions to all problems in the class are encoded in this single number (analogous to Chaitin's constant in computability theory). This captures computationally the type of question most commonly encountered in condensed matter physics, which is typically concerned with the physical properties of a single Hamiltonian in the thermodynamic limit. We show that for classical, translationally invariant, nearest neighbour Hamiltonians on a 2D square lattice, $\mathsf{P}^{\mathsf{NEEXP}}\subseteq\mathsf{EXP}^{\mathsf{GSED}}\subseteq \mathsf{EXP}^{\mathsf{NEXP}}$, and for quantum Hamiltonians $\mathsf{P}^{\mathsf{NEEXP}}\subseteq\mathsf{EXP}^{\mathsf{GSED}}\subseteq \mathsf{EXP}^{\mathsf{QMA}_{EXP}}$. With some technical caveats on the oracle definitions, the $\mathsf{EXP}$ in some of these results can be strengthened to $\mathsf{PSPACE}$. We also give analogous complexity bounds for the function version of $\mathsf{GSED}$.

翻訳日:2023-03-22 20:11:40 公開日:2021-07-11

# 対実プロトコルの対実性解析のための3つのアプローチ

Three approaches for analyzing the counterfactuality of counterfactual protocols ( http://arxiv.org/abs/2107.05055v1 )

ライセンス: Link先を確認

Alon Wander, Eliahu Cohen and Lev Vaidman

(参考訳) 対物通信プロトコルは古典的議論、弱いトレース基準、フィッシャー情報基準の3つのアプローチを用いて分析される。古典的な分析は矛盾を招き、従って放棄されるべきである。弱いトレースとフィッシャー情報基準は, ポストセレクションを含む通信プロトコルの非現実性の程度に一致している。ポストセレクションは反事実コミュニケーションプロトコルの必要な要素であると主張する。コヒーレントな相互作用実験と、弱いトレースを除去する対実的通信装置の最近の変更について論じる。

Counterfactual communication protocols are analysed using three approaches: a classical argument, the weak trace criterion, and the Fisher information criterion. It is argued that the classical analysis leads to contradiction and should therefore be abandoned. The weak trace and Fisher information criteria are shown to agree about the degree of counterfactuality of communication protocols involving postselection. It is argued that postselection is a necessary ingredient of counterfactual communication protocols. Coherent interaction experiments, as well as a recently introduced modification of counterfactual communication setups which eliminates the weak trace, are discussed.

翻訳日:2023-03-22 20:10:33 公開日:2021-07-11

# 光浮遊誘電体ナノ粒子の双極子散乱のイメージング

Imaging the dipole scattering of an optically levitated dielectric nanoparticle ( http://arxiv.org/abs/2107.05042v1 )

ライセンス: Link先を確認

Yuanbin Jin, Jiangwei Yan, Shah Jee Rahman, Xudong Yu and Jing Zhang

(参考訳) ナノ粒子の双極子散乱を高数値開口(NA)イメージングシステムを用いて実験的に観察した。光浮上性ナノ粒子は、粒子-基板相互作用のない環境を提供する。我々は、散乱収集に使用する高NA目標に強く焦点を絞った1064nmトラップレーザビームの伝播方向に対して直交する532nmレーザービームで真空中でシリカナノ粒子を照射し、暗背景と高信号ノイズ比をもたらす。入射レーザの線形偏光により誘起されるナノ粒子の双極子配向を、照明光偏光を回転させる際の像内の散乱光分布とフーリエ空間(k空間)を測定することにより研究した。顕微鏡対象の光学軸に沿ってナノ粒子の双極子配向が整列している場合には、特別に偏光渦(ベクトルビーム)が観察される。我々の研究は、カーカー条件で散乱異方性を研究するための重要なプラットフォームを提供する。

We experimentally observe the dipole scattering of a nanoparticle using a high numerical aperture (NA) imaging system. The optically levitated nanoparticle provides an environment free of particle-substrate interaction. We illuminate the silica nanoparticle in vacuum with a 532 nm laser beam orthogonally to the propagation direction of the 1064 nm trapping laser beam strongly focused by the same high NA objective used to collect the scattering, which results in a dark background and high signal-noise ratio. The dipole orientations of the nanoparticle induced by the linear polarization of the incident laser are studied by measuring the scattering light distribution in the image and the Fourier space (k-space) as we rotate the illuminating light polarization. The polarization vortex (vector beam) is observed for the special case, when the dipole orientation of the nanoparticle is aligned along the optical axis of the microscope objective. Our work offers an important platform for studying the scattering anisotropy with Kerker conditions.

翻訳日:2023-03-22 20:10:23 公開日:2021-07-11

# 1次元および2次元タイト結合格子の量子輸送と局在

Quantum transport and localization in 1d and 2d tight-binding lattices ( http://arxiv.org/abs/2107.05035v1 )

ライセンス: Link先を確認

Amir H. Karamlou, Jochen Braumüller, Yariv Yanay, Agustin Di Paolo, Patrick Harrington, Bharath Kannan, David Kim, Morten Kjaergaard, Alexander Melville, Sarah Muschinske, Bethany Niedzielski, Antti Vepsäläinen, Roni Winik, Jonilyn L. Yoder, Mollie Schwartz, Charles Tahan, Terry P. Orlando, Simon Gustavsson and William D. Oliver

(参考訳) 凝縮マター系における粒子輸送と局在現象は、強結合格子ハミルトニアンを用いてモデル化することができる。このようなモデルの理想的な実験エミュレーションは、高コヒーレントな量子システムにおいて、各格子サイトの同時かつ高忠実な制御と読み出しを利用する。ここでは, 量子輸送を1次元および2次元の強結合格子で実験的に研究し, 完全に制御可能な$3 \times 3$配列の超伝導量子ビットでエミュレートした。格子内における絡み合いの伝播を探索し,アンダーソン・アンド・ワニエ・スターク政権における部位可変性障害強度と勾配の存在下での局在度を抽出する。この結果は数値シミュレーションと定量的に一致し,タイト結合モデルに基づく理論予測と一致する。システムオブザーバブルの抽出における実験的制御と精度の実証レベルは、数値シミュレーションが難解になる大きな相互作用格子の探索を可能にする。

Particle transport and localization phenomena in condensed-matter systems can be modeled using a tight-binding lattice Hamiltonian. The ideal experimental emulation of such a model utilizes simultaneous, high-fidelity control and readout of each lattice site in a highly coherent quantum system. Here, we experimentally study quantum transport in one-dimensional and two-dimensional tight-binding lattices, emulated by a fully controllable $3 \times 3$ array of superconducting qubits. We probe the propagation of entanglement throughout the lattice and extract the degree of localization in the Anderson and Wannier-Stark regimes in the presence of site-tunable disorder strengths and gradients. Our results are in quantitative agreement with numerical simulations and match theoretical predictions based on the tight-binding model. The demonstrated level of experimental control and accuracy in extracting the system observables of interest will enable the exploration of larger, interacting lattices where numerical simulations become intractable.

翻訳日:2023-03-22 20:10:06 公開日:2021-07-11
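
The contrast between free transport and Wannier-Stark localization can be reproduced in a few lines for a single particle on a one-dimensional chain. This is a toy sketch under stated assumptions, not a model of the superconducting-qubit experiment: it uses one excitation, uniform nearest-neighbour hopping, a linear on-site potential measured from the chain centre, no disorder, and a plain fourth-order Runge-Kutta integrator (with ħ = 1). All function and parameter names are hypothetical.

```python
def evolve_chain(n_sites, hopping, gradient, t_final, dt=0.001):
    """Evolve a particle that starts on the central site of a 1D
    tight-binding chain and return the final site probabilities.

    H = sum_j gradient * (j - centre) |j><j| + hopping * (|j><j+1| + h.c.)
    """
    centre = n_sites // 2
    psi = [0j] * n_sites
    psi[centre] = 1 + 0j

    def deriv(state):
        # Schroedinger equation: d(psi)/dt = -i * H * psi
        out = []
        for j in range(n_sites):
            h_psi = gradient * (j - centre) * state[j]
            if j > 0:
                h_psi += hopping * state[j - 1]
            if j < n_sites - 1:
                h_psi += hopping * state[j + 1]
            out.append(-1j * h_psi)
        return out

    for _ in range(int(round(t_final / dt))):
        # One classical RK4 step
        k1 = deriv(psi)
        k2 = deriv([p + 0.5 * dt * k for p, k in zip(psi, k1)])
        k3 = deriv([p + 0.5 * dt * k for p, k in zip(psi, k2)])
        k4 = deriv([p + dt * k for p, k in zip(psi, k3)])
        psi = [p + dt / 6.0 * (a + 2 * b + 2 * c + d)
               for p, a, b, c, d in zip(psi, k1, k2, k3, k4)]
    return [abs(p) ** 2 for p in psi]

def spread(probabilities):
    """Standard deviation of the position distribution, in lattice sites."""
    mean = sum(j * p for j, p in enumerate(probabilities))
    return sum((j - mean) ** 2 * p for j, p in enumerate(probabilities)) ** 0.5

if __name__ == "__main__":
    free = evolve_chain(21, hopping=1.0, gradient=0.0, t_final=3.0)
    stark = evolve_chain(21, hopping=1.0, gradient=4.0, t_final=3.0)
    print(f"spread without gradient: {spread(free):.3f} sites")
    print(f"spread with gradient:    {spread(stark):.3f} sites")
```

Without a gradient the wave packet spreads ballistically, while a strong gradient confines it to Bloch oscillations around the starting site, which is the qualitative signature of the Wannier-Stark regime referred to in the abstract.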

# 量子近似最適化アルゴリズムによる最大確率検出

Quantum Approximate Optimization Algorithm Based Maximum Likelihood Detection ( http://arxiv.org/abs/2107.05020v1 )

ライセンス: Link先を確認

Jingjing Cui, Yifeng Xiong, Soon Xin Ng, Lajos Hanzo

(参考訳) 量子技術の最近の進歩は、ノイズの多い中間スケール量子(NISQ)デバイスへの道を切り開いており、そこでは量子近似最適化アルゴリズム(QAOAs)が、NISQデバイスに基づく有形量子優位性を示す有望な候補となっている。本稿では,複数入力および複数出力(MIMO)チャネル上で送信されるバイナリシンボルの最大確率(ML)検出問題について考察する。本稿では、2pの変動パラメータを持つレベルpのQAOA回路に関心の問題を符号化することにより、ML検出にQAOAを適用する。このレベルp qaoa回路は、本問題と初期ハミルトニアンをp次ラウンドで交互に適用することにより構成される。より明確に、我々はまずML検出問題の最適解をハミルトニアン問題の基底状態に符号化する。量子断熱進化法を用いて,ml検出に用いる量子システムの固有値の進化を特徴付ける解析結果と数値値の両方を提供する。そして、レベル1のQAOA回路に対して、QAOAの期待値の解析式を導出し、QAOAベースのML検出器の複雑さについて議論する。本稿では,従来の最適化器の計算複雑性と,QAOAをシミュレーションするストレージ要件について検討する。最後に、QAOAベースのML検出器のビット誤り率(BER)を評価し、従来のML検出器と従来のMMSE検出器の両方と比較し、QAOAベースのML検出器が従来のML検出器の性能に近づくことができることを示した。これにより、NISQコンピュータによって解決される大規模な古典的最適化問題のホストの道が開ける。

Recent advances in quantum technologies pave the way for noisy intermediate-scale quantum (NISQ) devices, where quantum approximation optimization algorithms (QAOAs) constitute promising candidates for demonstrating tangible quantum advantages based on NISQ devices. In this paper, we consider the maximum likelihood (ML) detection problem of binary symbols transmitted over a multiple-input and multiple-output (MIMO) channel, where finding the optimal solution is exponentially hard using classical computers. Here, we apply the QAOA for the ML detection by encoding the problem of interest into a level-p QAOA circuit having 2p variational parameters, which can be optimized by classical optimizers. This level-p QAOA circuit is constructed by applying the prepared Hamiltonian to our problem and the initial Hamiltonian alternately in p consecutive rounds. More explicitly, we first encode the optimal solution of the ML detection problem into the ground state of a problem Hamiltonian. Using the quantum adiabatic evolution technique, we provide both analytical and numerical results for characterizing the evolution of the eigenvalues of the quantum system used for ML detection. Then, for level-1 QAOA circuits, we derive the analytical expressions of the expectation values of the QAOA and discuss the complexity of the QAOA based ML detector. Explicitly, we evaluate the computational complexity of the classical optimizer used and the storage requirement of simulating the QAOA. Finally, we evaluate the bit error rate (BER) of the QAOA based ML detector and compare it both to the classical ML detector and to the classical MMSE detector, demonstrating that the QAOA based ML detector is capable of approaching the performance of the classical ML detector. This paves the way for a host of large-scale classical optimization problems to be solved by NISQ computers.

翻訳日:2023-03-22 20:09:53 公開日:2021-07-11
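
The step "encode the optimal solution of the ML detection problem into the ground state of a problem Hamiltonian" rests on a standard algebraic identity, sketched below. The code assumes a real-valued channel matrix and binary symbols x_i in {-1, +1}; the coefficient normalization and all function names are illustrative assumptions and need not match the paper's conventions. A brute-force search stands in for the QAOA circuit, so this only checks the encoding, not the quantum algorithm.

```python
from itertools import product

def ising_coefficients(H, y):
    """Rewrite ||y - H x||^2 for x in {-1, +1}^n as
    sum_{i<j} J_ij x_i x_j + sum_i h_i x_i + offset."""
    n_rx, n_tx = len(H), len(H[0])
    gram = [[sum(H[r][i] * H[r][j] for r in range(n_rx)) for j in range(n_tx)]
            for i in range(n_tx)]                       # G = H^T H
    matched = [sum(H[r][i] * y[r] for r in range(n_rx)) for i in range(n_tx)]  # H^T y
    couplings = {(i, j): 2.0 * gram[i][j]
                 for i in range(n_tx) for j in range(i + 1, n_tx)}
    fields = [-2.0 * m for m in matched]
    # Constant term: ||y||^2 + trace(G), using x_i^2 = 1
    offset = sum(v * v for v in y) + sum(gram[i][i] for i in range(n_tx))
    return couplings, fields, offset

def ising_energy(x, couplings, fields):
    """Energy of spin configuration x without the constant offset."""
    return (sum(c * x[i] * x[j] for (i, j), c in couplings.items())
            + sum(h * s for h, s in zip(fields, x)))

def ml_metric(H, y, x):
    """Direct ML metric ||y - H x||^2."""
    return sum((y[r] - sum(H[r][i] * x[i] for i in range(len(x)))) ** 2
               for r in range(len(H)))

def ml_detect(H, y):
    """Exhaustive ground-state search (exponential in the number of symbols)."""
    couplings, fields, _ = ising_coefficients(H, y)
    return min(product((-1, 1), repeat=len(H[0])),
               key=lambda x: ising_energy(x, couplings, fields))

if __name__ == "__main__":
    H = [[1.0, 0.5], [-0.3, 0.8], [0.2, -0.6]]
    y = [0.5, -1.1, 0.8]  # noiseless observation of x = (1, -1)
    print(ml_detect(H, y))
```

Because the Ising energy differs from the ML metric only by a constant, minimizing one minimizes the other; replacing each x_i by a Pauli-Z operator then gives a problem Hamiltonian whose ground state is the ML solution.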

# 微分マップエリートによる自己参照品質の多様性

Self-Referential Quality Diversity Through Differential Map-Elites ( http://arxiv.org/abs/2107.04964v1 )

ライセンス: Link先を確認

Tae Jong Choi and Julian Togelius

(参考訳) Differential MAP-ElitesはCVT-MAP-Elitesの照明能力と微分進化の連続空間最適化能力を組み合わせた新しいアルゴリズムである。このアルゴリズムは、照明アルゴリズムと品質多様性アルゴリズムが、進化的計算のための定性的に新しい機能と応用を提供するという観察によって動機付けられている。ここで初めて導入された基本的な微分 MAP-Elites アルゴリズムは、微分進化の演算子とCVT-MAP-Elites の写像構造を単純に組み合わせることで比較的単純である。 25の数値最適化問題に基づく実験により、差分MAP-エリートはCVT-MAP-エリートよりも明らかに優れ、より高品質で多様な解が見つかることが示唆された。

Differential MAP-Elites is a novel algorithm that combines the illumination capacity of CVT-MAP-Elites with the continuous-space optimization capacity of Differential Evolution. The algorithm is motivated by observations that illumination algorithms, and quality-diversity algorithms in general, offer qualitatively new capabilities and applications for evolutionary computation yet are in their original versions relatively unsophisticated optimizers. The basic Differential MAP-Elites algorithm, introduced for the first time here, is relatively simple in that it simply combines the operators from Differential Evolution with the map structure of CVT-MAP-Elites. Experiments based on 25 numerical optimization problems suggest that Differential MAP-Elites clearly outperforms CVT-MAP-Elites, finding better-quality and more diverse solutions.

翻訳日:2023-03-22 20:09:20 公開日:2021-07-11
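
The abstract describes the algorithm as combining Differential Evolution operators with the map structure of CVT-MAP-Elites. The sketch below is one plausible reading of that description, not the authors' implementation: it assumes DE/rand/1 mutation with binomial crossover, parents drawn uniformly from the current elites, a search space of [0, 1]^dim, and random centroids as a stand-in for a proper centroidal Voronoi tessellation. All names and default values are hypothetical.

```python
import random

def differential_map_elites(fitness, descriptor, dim, n_niches=20, n_init=100,
                            n_iters=2000, f_scale=0.5, cr=0.9, seed=0):
    """Return an archive {niche index: (fitness, solution)} (maximization)."""
    rng = random.Random(seed)
    desc_dim = len(descriptor([0.5] * dim))
    # Assumption: random centroids replace the k-means step of a real CVT.
    centroids = [[rng.random() for _ in range(desc_dim)] for _ in range(n_niches)]
    archive = {}

    def niche_of(x):
        d = descriptor(x)
        return min(range(n_niches),
                   key=lambda k: sum((a - b) ** 2 for a, b in zip(d, centroids[k])))

    def try_insert(x):
        # Keep x only if its niche is empty or x beats the current elite.
        k, f = niche_of(x), fitness(x)
        if k not in archive or f > archive[k][0]:
            archive[k] = (f, x)

    for _ in range(n_init):
        try_insert([rng.random() for _ in range(dim)])

    for _ in range(n_iters):
        elites = [sol for _, sol in archive.values()]
        if len(elites) < 4:  # not enough parents yet: fall back to random sampling
            try_insert([rng.random() for _ in range(dim)])
            continue
        target, a, b, c = rng.sample(elites, 4)
        j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
        trial = []
        for j in range(dim):
            if rng.random() < cr or j == j_rand:
                v = a[j] + f_scale * (b[j] - c[j])    # DE/rand/1 mutation
                trial.append(min(1.0, max(0.0, v)))   # clip to the unit box
            else:
                trial.append(target[j])               # inherit gene from the target
        try_insert(trial)
    return archive

if __name__ == "__main__":
    sphere = lambda x: -sum((v - 0.5) ** 2 for v in x)
    first_two = lambda x: x[:2]
    result = differential_map_elites(sphere, first_two, dim=5)
    print(len(result), "niches filled; best fitness",
          max(f for f, _ in result.values()))
```

Unlike standard Differential Evolution, the trial vector does not compete with its target; it competes with whichever elite occupies the niche its descriptor falls into, which is what keeps the archive diverse.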

# 脱分極型レコメンダシステムの設計

Designing Recommender Systems to Depolarize ( http://arxiv.org/abs/2107.04953v1 )

ライセンス: Link先を確認

Jonathan Stray

(参考訳) 分極化は民主主義の侵食と暴力の進行に関係しており、大規模なアルゴリズム的コンテンツ選択システム(レコメンダーシステム)の分極特性が平和と安全保障の懸念事項となっている。アルゴリズム駆動のソーシャルメディアは、国レベルでの偏光の主要な要因とは思えないが、偏光社会において有用な介入ポイントとなるかもしれない。本稿では,対立の抑制や排除ではなく,より建設的な対立に向けたアルゴリズム的非分極介入について検討する。アルゴリズムによる介入は、どのコンテンツが利用可能か(モデレーション)、コンテンツの選択とパーソナライズ方法(ランク付け)、コンテンツのプレゼンテーションとコントロール(ユーザインターフェース)の3段階で検討される。オンライン紛争に関する実証研究は、「フィルターバブル」に対する解毒剤として提案された暴露多様性の介入が改善され、ある条件下では分極が悪化する可能性を示唆している。コンテンツ選択の多様性にともなうcivility metricsの使用は、より効果的かもしれない。しかし、多様性に基づく介入は大規模にテストされておらず、実際のプラットフォームの多様性と動的コンテキストでは機能しない可能性がある。代わりに、プラットフォーム偏光力学の介入は、広く使われている「フィーリング温度計」のような偏光測定の連続的なモニタリングを必要とする可能性が高い。これらのメトリクスは製品の特徴を評価するのに使われ、アルゴリズムの目的として設計される可能性がある。さらに、最適化プロセスが競合を副作用として生み出すのを防ぐために、レコメンダアルゴリズムの目的関数に偏極対策を含める必要があるかもしれない。

Polarization is implicated in the erosion of democracy and the progression to violence, which makes the polarization properties of large algorithmic content selection systems (recommender systems) a matter of concern for peace and security. While algorithm-driven social media does not seem to be a primary driver of polarization at the country level, it could be a useful intervention point in polarized societies. This paper examines algorithmic depolarization interventions with the goal of conflict transformation: not suppressing or eliminating conflict but moving towards more constructive conflict. Algorithmic intervention is considered at three stages: which content is available (moderation), how content is selected and personalized (ranking), and content presentation and controls (user interface). Empirical studies of online conflict suggest that the exposure diversity intervention proposed as an antidote to "filter bubbles" can be improved and can even worsen polarization under some conditions. Using civility metrics in conjunction with diversity in content selection may be more effective. However, diversity-based interventions have not been tested at scale and may not work in the diverse and dynamic contexts of real platforms. Instead, intervening in platform polarization dynamics will likely require continuous monitoring of polarization metrics, such as the widely used "feeling thermometer." These metrics can be used to evaluate product features, and potentially engineered as algorithmic objectives. It may further prove necessary to include polarization measures in the objective functions of recommender algorithms to prevent optimization processes from creating conflict as a side effect.

翻訳日:2023-03-22 20:09:04 公開日:2021-07-11

# 衝突誘起スピンノイズ

Collision-induced spin noise ( http://arxiv.org/abs/2107.04942v1 )

ライセンス: Link先を確認

Shiming Song, Min Jiang, Yushu Qin, Yu Tong, Wenzhe Zhang, Xi Qin, Ren-Bao Liu, Xinhua Peng

(参考訳) 衝突現象はユビキタスであり、原子と分子の微細構造や分子間相互作用を決定する上で重要である。既存のアプローチは、主に原子または分子の散乱に基づいており、超高真空と低温のシステムの使用の不便さによって妨げられている。ここでは、プローブ光の光偏光回転ノイズを簡易な装置と環境条件で測定することにより、新しいスピンノイズ分光法を示す。我々の手法は、数十ギガヘルツの帯域幅と1部1万の分解能を備え、既存のスピンノイズ技術より優れている。この新しい手法により, アルカリ原子の衝突誘起スピンノイズを観測し, 衝突径, 井戸深さ, 支配的相互作用型といった重要な衝突パラメータを正確に決定する。本研究は環境条件下での幅広い衝突現象を研究するための新しいツールを提供する。

Collision phenomena are ubiquitous and of importance in determining the microscopic structures and intermolecular interactions of atoms and molecules. The existing approaches are mostly based on atomic or molecular scatterings, which are hindered by the inconvenience of using ultra-high vacuum and low temperature systems. Here we demonstrate a new spin-noise spectroscopic approach by measuring optical polarization rotation noise of the probe light, which operates with simple apparatus and ambient conditions. Our approach features tens of gigahertz bandwidth and one part-per-million resolution, outperforming existing spin-noise techniques. Enabled by the new technique, we observe the collision-induced spin noise of alkali atoms, and precisely determine key collision parameters, such as collision diameter, well depth, and dominant interaction type. Our work provides a new tool to study a broad range of collision phenomena under ambient conditions.

翻訳日:2023-03-22 20:08:34 公開日:2021-07-11

# カーネル近似のためのランダム特徴:アルゴリズム、理論、およびそれ以上に関する調査

Random Features for Kernel Approximation: A Survey on Algorithms, Theory, and Beyond ( http://arxiv.org/abs/2004.11154v5 )

ライセンス: Link先を確認

Fanghui Liu, Xiaolin Huang, Yudong Chen, and Johan A.K. Suykens

(参考訳) ランダム機能は、大規模な問題においてカーネルメソッドを高速化する最も一般的な手法の1つである。 2017年にはNeurIPS Test-of-Time Award、2019年にはICML Best Paper Finalistが受賞した。ランダムな特徴に関する研究の本体は急速に成長しており、様々なアルゴリズムと理論的結果の関連性を説明する上で、この話題を包括的に概観することが望ましい。本研究では,過去10年間のランダムな特徴に関する研究を体系的にレビューする。まず,代表的ランダム特徴に基づくアルゴリズムのモチベーション,特徴,貢献を,サンプリングスキーム,学習手順,分散還元特性,トレーニングデータの利用方法に応じて要約する。第2に,学習した推定者の経験的・予測的リスクの損失がないことを保証するために,ランダムな特徴がいくつ必要か,という疑問を中心に,理論的結果について検討する。第3に,いくつかの大規模ベンチマークデータセットに基づく一般的なランダム特徴量に基づくアルゴリズムの包括的評価を行い,分類におけるその近似品質と予測性能について考察する。最後に、DNNの分析における高次元ランダム特徴の利用や、現在の理論的および経験的結果のギャップを含む、ランダム特徴と現代の過パラメータ化ディープニューラルネットワーク(DNN)の関係について論じる。この調査は、このトピックの穏やかな紹介や、代表的アルゴリズムの適用や様々な技術的前提の下での理論的結果の理解に関心のある実践者のためのユーザガイドとして機能する可能性がある。この調査が、このトピックのオープンな問題、さらに重要なこととして、今後の研究の方向性に関する議論を促進することを期待しています。

Random features is one of the most popular techniques to speed up kernel methods in large-scale problems. Related works have been recognized by the NeurIPS Test-of-Time award in 2017 and the ICML Best Paper Finalist in 2019. The body of work on random features has grown rapidly, and hence it is desirable to have a comprehensive overview on this topic explaining the connections among various algorithms and theoretical results. In this survey, we systematically review the work on random features from the past ten years. First, the motivations, characteristics and contributions of representative random features based algorithms are summarized according to their sampling schemes, learning procedures, variance reduction properties and how they exploit training data. Second, we review theoretical results that center around the following key question: how many random features are needed to ensure a high approximation quality or no loss in the empirical/expected risks of the learned estimator. Third, we provide a comprehensive evaluation of popular random features based algorithms on several large-scale benchmark datasets and discuss their approximation quality and prediction performance for classification. Last, we discuss the relationship between random features and modern over-parameterized deep neural networks (DNNs), including the use of high dimensional random features in the analysis of DNNs as well as the gaps between current theoretical and empirical results. This survey may serve as a gentle introduction to this topic, and as a users' guide for practitioners interested in applying the representative algorithms and understanding theoretical results under various technical assumptions. We hope that this survey will facilitate discussion on the open problems in this topic, and more importantly, shed light on future research directions.

翻訳日:2022-12-10 09:13:09 公開日:2021-07-11
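
As a rough illustration of the kind of technique this survey covers, the sketch below approximates a Gaussian (RBF) kernel with random Fourier features. It is a minimal stdlib-only example, not code from the survey: the function names `rbf_kernel` and `make_rff`, the `gamma` parameterization, and the feature count are all assumptions chosen for illustration.

```python
import math
import random


def rbf_kernel(x, y, gamma):
    """Exact Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def make_rff(dim, n_features, gamma, seed=0):
    """Return a feature map phi with phi(x) . phi(y) ~= rbf_kernel(x, y, gamma).

    Hypothetical helper for illustration. For this kernel the spectral
    density is Gaussian, so frequencies are drawn from N(0, 2 * gamma * I)
    and phases from Uniform[0, 2 * pi).
    """
    rng = random.Random(seed)
    std = math.sqrt(2.0 * gamma)
    weights = [[rng.gauss(0.0, std) for _ in range(dim)] for _ in range(n_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def phi(x):
        return [
            scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(weights, phases)
        ]

    return phi
```

The inner product of two feature vectors is a Monte Carlo estimate of the kernel value, so its error shrinks roughly as one over the square root of the number of features. How many features are needed in practice is the question the survey's theory section addresses.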

# 騒がしい自己報告を使ってtwitterユーザーの人口統計を予測

Using Noisy Self-Reports to Predict Twitter User Demographics ( http://arxiv.org/abs/2005.00635v2 )

ライセンス: Link先を確認

Zach Wood-Doughty, Paiheng Xu, Xiao Liu, Mark Dredze

(参考訳) 計算社会科学の研究は、しばしば標準的な人口統計学内のコンテンツ分析を文脈化する。人口統計は多くのソーシャルメディアプラットフォーム(例えばtwitter)では利用できないため、多くの研究が自動的に人口統計を推測している。多くの研究が人種と民族の概念推論の証明を提示しているが、注釈付きデータセットがほとんどないため、実践的なシステムの訓練は明らかになっていない。既存のデータセットは小さく、不正確で、アメリカで最も一般的な4つの人種や民族をカバーできない。本稿では,twitterのプロフィールから人種と民族の自己報告を識別する手法を提案する。自動監視に固有の誤りにもかかわらず、金の標準自己報告調査データに基づいて、優れた性能のモデルを作成する。その結果は、人種や民族のための大規模な訓練資源を作成する再現可能な方法である。

Computational social science studies often contextualize content analysis within standard demographics. Since demographics are unavailable on many social media platforms (e.g. Twitter) numerous studies have inferred demographics automatically. Despite many studies presenting proof of concept inference of race and ethnicity, training of practical systems remains elusive since there are few annotated datasets. Existing datasets are small, inaccurate, or fail to cover the four most common racial and ethnic groups in the United States. We present a method to identify self-reports of race and ethnicity from Twitter profile descriptions. Despite errors inherent in automated supervision, we produce models with good performance when measured on gold standard self-report survey data. The result is a reproducible method for creating large-scale training resources for race and ethnicity.

翻訳日:2022-12-08 00:05:31 公開日:2021-07-11

# 自然災害対応のためのソーシャルメディア情報共有

Social Media Information Sharing for Natural Disaster Response ( http://arxiv.org/abs/2005.07019v5 )

ライセンス: Link先を確認

Zhijie Sasha Dong, Lingyu Meng, Lauren Christenson, Lawrence Fulton

(参考訳) ソーシャルメディアは災害関連情報を投稿するための重要なチャネルとなり、災害管理を改善するために政府や救援機関にリアルタイムデータを提供する。しかし、この分野の研究は十分な注目を集めておらず、有用な情報を抽出することは依然として困難である。本研究の目的は,災害対応に対する公衆の態度や災害時の防災物資に対する公衆の要求など,ソーシャルメディアデータのマイニング・分析による防災効率の向上である。我々は,41,993件のツイートを含むタイプ,期間,被害などの特性に基づいて,異なる自然災害に焦点を当てた。本稿では, 災害対応の満足度, 不安感などの情報を含む, 人手による分類ツイートによって, 公共の認知度を定性的に評価する。自然災害に対する公衆の態度は、8つの機械学習モデルを用いて定量的解析によって研究される。適切なモデルによる意思決定者に提供するために、計算時間と予測精度に基づく機械学習モデルの比較を行う。異なる自然災害における世論の変化と、twitterが進化を続ける中、同じタイプの自然災害に直面する災害救済にソーシャルメディアを利用する人々の行動の進化について研究している。本論文は,提案手法の有効性と妥当性を実証し,災害対策に関する知見を救援庁に提供した。

Social media has become an essential channel for posting disaster-related information, which provides governments and relief agencies with real-time data for better disaster management. However, research in this field has not received sufficient attention, and extracting useful information is still challenging. This paper aims to improve disaster relief efficiency by mining and analyzing social media data, such as public attitudes towards disaster response and public demands for targeted relief supplies during different types of disasters. We focus on different natural disasters, selected on properties such as type, duration, and damage, with a dataset totaling 41,993 tweets. Public perception is assessed qualitatively through manually classified tweets, which contain information such as the demand for targeted relief supplies, satisfaction with disaster response, and public fear. Public attitudes to natural disasters are studied via a quantitative analysis using eight machine learning models. To help decision-makers choose an appropriate model, we compare the machine learning models on computational time and prediction accuracy. We also study how public opinion changes across different natural disasters and how people's use of social media for disaster relief evolves, for the same type of natural disaster, as Twitter continues to evolve. The results demonstrate the feasibility and validity of the proposed research approach and provide relief agencies with insights into better disaster management.

翻訳日:2022-12-05 12:08:51 公開日:2021-07-11

# 再帰的政策ネットワークの有限状態表現の再理解

Re-understanding Finite-State Representations of Recurrent Policy Networks ( http://arxiv.org/abs/2006.03745v3 )

ライセンス: Link先を確認

Mohamad H. Danesh, Anurag Koul, Alan Fern, Saeed Khorram

(参考訳) 本稿では、リカレントニューラルネットワークとして表現される制御ポリシーを理解するためのアプローチを提案する。最近の研究は、このようなリカレントポリシーネットワークを有限状態マシン(FSM)に変換し、等価最小化FSMを分析することでこの問題にアプローチしている。これは興味深い洞察につながったが、最小化プロセスは、意味的に異なる状態をマージすることで、マシンの動作をより深く理解することができない。この問題に対処するため,我々は,fsmの最小化から始まって,政策の重要な決定点を保存するより解釈可能な削減を適用する分析手法を提案する。また、意思決定における観察の役割をより深く理解するための注意ツールも提供します。 7つのAtariゲームと3つの制御ベンチマークのケーススタディは、これまで気付かれていなかった洞察を明らかにすることができることを示した。

We introduce an approach for understanding control policies represented as recurrent neural networks. Recent work has approached this problem by transforming such recurrent policy networks into finite-state machines (FSM) and then analyzing the equivalent minimized FSM. While this led to interesting insights, the minimization process can obscure a deeper understanding of a machine's operation by merging states that are semantically distinct. To address this issue, we introduce an analysis approach that starts with an unminimized FSM and applies more-interpretable reductions that preserve the key decision points of the policy. We also contribute an attention tool to attain a deeper understanding of the role of observations in the decisions. Our case studies on 7 Atari games and 3 control benchmarks demonstrate that the approach can reveal insights that have not been previously noticed.

翻訳日:2022-11-24 20:56:47 公開日:2021-07-11

# 滑らかな敵対的訓練

Smooth Adversarial Training ( http://arxiv.org/abs/2006.14536v2 )

ライセンス: Link先を確認

Cihang Xie, Mingxing Tan, Boqing Gong, Alan Yuille, Quoc V. Le

(参考訳) ネットワークは正確かつ堅牢であり得ず、堅牢性を得ることは正確さを失うことを意味すると一般的に信じられている。また、ネットワークを大きくしなければ、ネットワークアーキテクチャ要素は敵の堅牢性を改善する上ではほとんど重要でないと一般的に信じられている。ここでは,これらの共通の信念に挑戦する証拠を,敵の訓練に関する注意深く研究して提示する。注意点として,広く用いられているrelu活性化関数は,その非スムース性により,逆行訓練を著しく弱めている。そこで我々は,ReLUをそのスムーズな近似で置き換えて,対人訓練を強化するスムーズな対人訓練(SAT)を提案する。 SATのスムーズなアクティベーション関数の目的は、より難しい敵の例を見つけ、敵のトレーニング中により良い勾配更新を計算することである。 SATは標準的な対人訓練と比較して、「自由」の対人ロバスト性、すなわち精度の低下や計算コストの増大を改善できる。例えば、さらなる計算を導入することなく、SATはResNet-50の堅牢性を33.0%から42.3%に大幅に向上し、ImageNetの精度も0.9%向上した。 EfficientNet-L1が82.2%の精度と58.6%の堅牢性を達成するのに役立ち、従来の最先端の防御を9.5%、ロバスト性11.6%で上回っている。モデルはhttps://github.com/cihangxie/SmoothAdversarialTrainingで入手できる。

It is commonly believed that networks cannot be both accurate and robust, and that gaining robustness means losing accuracy. It is also generally believed that, unless networks are made larger, architectural elements matter little in improving adversarial robustness. Here we present evidence to challenge these common beliefs through a careful study of adversarial training. Our key observation is that the widely used ReLU activation function significantly weakens adversarial training due to its non-smooth nature. Hence we propose smooth adversarial training (SAT), in which we replace ReLU with its smooth approximations to strengthen adversarial training. The purpose of smooth activation functions in SAT is to allow it to find harder adversarial examples and compute better gradient updates during adversarial training. Compared to standard adversarial training, SAT improves adversarial robustness for "free", i.e., no drop in accuracy and no increase in computational cost. For example, without introducing additional computations, SAT significantly enhances ResNet-50's robustness from 33.0% to 42.3%, while also improving accuracy by 0.9% on ImageNet. SAT also works well with larger networks: it helps EfficientNet-L1 to achieve 82.2% accuracy and 58.6% robustness on ImageNet, outperforming the previous state-of-the-art defense by 9.5% for accuracy and 11.6% for robustness. Models are available at https://github.com/cihangxie/SmoothAdversarialTraining.

翻訳日:2022-11-17 02:36:49 公開日:2021-07-11
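
The abstract's point about ReLU's non-smoothness can be made concrete with a small sketch. The code below compares the gradient of ReLU with the gradients of two common smooth surrogates, softplus and SiLU. These two are used here purely as illustrative choices; the abstract does not say which approximations the authors adopt, and the function names are assumptions.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def relu(x):
    return max(0.0, x)


def relu_grad(x):
    # The derivative jumps from 0 to 1 at x = 0.
    return 1.0 if x > 0 else 0.0


def softplus(x, beta=1.0):
    """Smooth ReLU surrogate: (1 / beta) * log(1 + exp(beta * x))."""
    z = beta * x
    return (max(z, 0.0) + math.log1p(math.exp(-abs(z)))) / beta


def softplus_grad(x, beta=1.0):
    # The derivative of softplus is a sigmoid, which is continuous everywhere.
    return sigmoid(beta * x)


def silu(x):
    """SiLU (also called swish): x * sigmoid(x)."""
    return x * sigmoid(x)


def silu_grad(x):
    s = sigmoid(x)
    return s * (1.0 + x * (1.0 - s))
```

Evaluating the gradients just left and right of zero shows the difference: `relu_grad` changes by a full unit, while `softplus_grad` and `silu_grad` change only slightly. Gradient-based attacks used inside adversarial training rely on these input gradients, which is the setting the abstract refers to.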

# Pairwise Marginalsによる離散ランダム変数の結合確率の復元

Recovering Joint Probability of Discrete Random Variables from Pairwise Marginals ( http://arxiv.org/abs/2006.16912v2 )

ライセンス: Link先を確認

Shahana Ibrahim, Xiao Fu

(参考訳) 確率変数(RV)の合同確率の学習は、統計信号処理と機械学習の基盤となる。しかし、高次元関節確率の直接的非パラメトリック推定は、一般に次元性の呪いのため不可能である。最近の研究は、低ランクテンソル分解の代数的性質とRV間の(未知の)依存を利用して、任意の数のRVの結合確率質量関数(PMF)を3次元境界から復元することを提案した。それでも、3次元のマージンを正確に推定することは、サンプルの複雑さの面ではコストがかかる可能性がある。三次元境界を用いた場合、トラクタビリティが不明なテンソル分解問題にも挑戦する。この研究は、ペアの辺縁のみを用いて共同PMFを学習するための新しい枠組みを提示し、これは自然に3次に比べて低いサンプル複雑性を享受する。結合型非負行列分解(CNMF)フレームワークを開発し, 種々の条件下でのPMF回復保証について検討した。また,Gram-Schmidt (GS) のようなアルゴリズムを用いて,競合する実行性能を示す。このアルゴリズムは, 有限反復の有界誤差まで, 合理的な条件下で関節pmfを回復できることが示される。また、最近提案された経済予測最大化(EM)アルゴリズムは、GSライクなアルゴリズムの出力を改善することを保証し、精度と効率をさらに高めることを示した。実データ実験は有効性を示すために使用される。

Learning the joint probability of random variables (RVs) is the cornerstone of statistical signal processing and machine learning. However, direct nonparametric estimation for high-dimensional joint probability is in general impossible, due to the curse of dimensionality. Recent work has proposed to recover the joint probability mass function (PMF) of an arbitrary number of RVs from three-dimensional marginals, leveraging the algebraic properties of low-rank tensor decomposition and the (unknown) dependence among the RVs. Nonetheless, accurately estimating three-dimensional marginals can still be costly in terms of sample complexity, affecting the performance of this line of work in practice in the sample-starved regime. Using three-dimensional marginals also involves challenging tensor decomposition problems whose tractability is unclear. This work puts forth a new framework for learning the joint PMF using only pairwise marginals, which naturally enjoys a lower sample complexity relative to the third-order ones. A coupled nonnegative matrix factorization (CNMF) framework is developed, and its joint PMF recovery guarantees under various conditions are analyzed. Our method also features a Gram--Schmidt (GS)-like algorithm that exhibits competitive runtime performance. The algorithm is shown to provably recover the joint PMF up to bounded error in finite iterations, under reasonable conditions. It is also shown that a recently proposed economical expectation maximization (EM) algorithm is guaranteed to improve upon the GS-like algorithm's output, thereby further improving accuracy and efficiency. Real-data experiments are employed to showcase the effectiveness.

翻訳日:2022-11-15 05:21:22 公開日:2021-07-11
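
To see why pairwise marginals can carry enough structure to be factorized, consider the following sketch. It assumes a latent-variable (naive Bayes) model, where the RVs are conditionally independent given a hidden variable H. This is the usual low-rank setting for this line of work, but it is an assumption here, not a detail stated in the abstract. Under that model, each pairwise marginal equals a product of two conditional-probability matrices weighted by the prior of H. The function names and argument layout are hypothetical.

```python
import itertools


def joint_pmf(lam, factors):
    """Joint PMF under an assumed latent-variable model.

    lam[f]           = P(H = f)
    factors[n][i][f] = P(X_n = i | H = f)
    Returns a dict mapping index tuples (i_1, ..., i_N) to probabilities.
    """
    sizes = [len(a) for a in factors]
    pmf = {}
    for idx in itertools.product(*(range(s) for s in sizes)):
        total = 0.0
        for f, prior in enumerate(lam):
            term = prior
            for n, i in enumerate(idx):
                term *= factors[n][i][f]
            total += term
        pmf[idx] = total
    return pmf


def pairwise_marginal(pmf, j, k):
    """Marginal P(X_j, X_k), obtained by summing out all other variables."""
    size_j = max(idx[j] for idx in pmf) + 1
    size_k = max(idx[k] for idx in pmf) + 1
    table = [[0.0] * size_k for _ in range(size_j)]
    for idx, p in pmf.items():
        table[idx[j]][idx[k]] += p
    return table


def factored_pairwise(lam, a_j, a_k):
    """The same marginal written as A_j diag(lam) A_k^T."""
    return [
        [sum(lam[f] * a_j[i][f] * a_k[l][f] for f in range(len(lam)))
         for l in range(len(a_k))]
        for i in range(len(a_j))
    ]
```

The two routes give the same table, so all pairwise marginals share the same latent factors. That shared structure is what a coupled factorization can exploit. The sketch only checks the forward identity and does not attempt the recovery algorithm itself.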

# ポイントクラウドにおける3次元物体検出のための部分認識データ拡張

Part-Aware Data Augmentation for 3D Object Detection in Point Cloud ( http://arxiv.org/abs/2007.13373v2 )

ライセンス: Link先を確認

Jaeseok Choi, Yeji Song and Nojun Kwak

(参考訳) データ拡張は画像認識タスクの性能向上に大きく貢献し、多くの関連研究が実施されている。しかし、3d point cloud dataのデータ拡張はあまり検討されていない。 3Dラベルは2Dラベルよりも高度で豊富な構造情報を持っているため、より多彩で効果的なデータ拡張を可能にする。本稿では,3dラベルのリッチな情報を利用して3d物体検出の性能を向上させるpart-aware data augmentation (pa-aug)を提案する。 PA-AUGはオブジェクトを分割に分割し、各局所領域に5つの拡張法を確率的に適用する。既存のポイントクラウドデータ拡張手法と互換性があり、検出器のアーキテクチャに関係なく普遍的に使用できる。 PA-AUGは、KITTIデータセットの全クラスに対して最先端の3Dオブジェクト検出器の性能を改善し、列車データを2.5$\times$で増加させる同等の効果を有する。また、PA-AUGは、与えられたデータセットのパフォーマンスを向上するだけでなく、破損したデータに対して堅牢であることを示す。コードはhttps://github.com/sky77764/pa-aug.pytorchで入手できる。

Data augmentation has greatly contributed to improving performance in image recognition tasks, and many related studies have been conducted. However, data augmentation on 3D point cloud data has not been explored much. A 3D label carries more sophisticated and richer structural information than a 2D label, so it enables more diverse and effective data augmentation. In this paper, we propose part-aware data augmentation (PA-AUG) that can better utilize the rich information of 3D labels to enhance the performance of 3D object detectors. PA-AUG divides objects into partitions and stochastically applies five augmentation methods to each local region. It is compatible with existing point cloud data augmentation methods and can be used universally regardless of the detector's architecture. PA-AUG has improved the performance of a state-of-the-art 3D object detector for all classes of the KITTI dataset and has an effect equivalent to increasing the training data by about 2.5$\times$. We also show that PA-AUG not only increases performance for a given dataset but also is robust to corrupted data. The code is available at https://github.com/sky77764/pa-aug.pytorch

翻訳日:2022-11-06 08:36:52 公開日:2021-07-11

# カルマンフィルターと他のカーネルスモーザーとの接続によるガウス過程と関連ベクトルマシンの合同紹介

A Joint introduction to Gaussian Processes and Relevance Vector Machines with Connections to Kalman filtering and other Kernel Smoothers ( http://arxiv.org/abs/2009.09217v4 )

ライセンス: Link先を確認

Luca Martino, Jesse Read

(参考訳) ベイジアンカーネルベースの手法の表現力は、人工知能のさまざまな側面にまたがる重要なツールとなり、多くの現代的なアプリケーションドメインに役立ち、不確実性分析によるパワーと解釈可能性の両方を提供する。本稿では,確率ベイズスキームの領域にまたがる2つの手法と回帰のためのカーネル法,ガウス過程と関連ベクトルマシンについて述べる。我々は,これらの手法を中間的手法,よく知られたカーネルリッジ回帰(kernel ridge regression)の確率的バージョン,二重定式化(dual formulas)によるそれらの接続の描画,主要なタスク(regressive, smoothing, interpolation, filter)のコンテキストにおけるそれらの応用に関する議論といった共通フレームワークの開発に焦点をあてている。全体として、これらのモデルの背後にある数学的概念を理解し、異なる解釈を深く要約し議論し、線形核スムーサ、カルマンフィルタリング、フーリエ近似など他の方法との関係を強調する。全体としては,理解を促進するために多数の図面を提供し,実践者には多数の推薦を行う。さまざまなテクニックのメリットと欠点が強調されている。私たちの知る限りでは、この2つの手法に焦点をあてたこれまでで最も詳細な研究であり、データサイエンス、信号処理、機械学習、人工知能全般の分野における理論的な理解と実践に関係します。

The expressive power of Bayesian kernel-based methods has led them to become an important tool across many different facets of artificial intelligence, and useful to a plethora of modern application domains, providing both power and interpretability via uncertainty analysis. This article introduces and discusses two methods which straddle the areas of probabilistic Bayesian schemes and kernel methods for regression: Gaussian Processes and Relevance Vector Machines. Our focus is on developing a common framework with which to view these methods, via an intermediate method (a probabilistic version of the well-known kernel ridge regression), drawing connections among them via dual formulations, and discussing their application in the context of major tasks: regression, smoothing, interpolation, and filtering. Overall, we provide understanding of the mathematical concepts behind these models, and we summarize and discuss in depth different interpretations and highlight the relationship to other methods, such as linear kernel smoothers, Kalman filtering and Fourier approximations. Throughout, we provide numerous figures to promote understanding, and we make numerous recommendations to practitioners. Benefits and drawbacks of the different techniques are highlighted. To our knowledge, this is the most in-depth study of its kind to date focused on these two methods, and will be relevant to theoretical understanding and practitioners throughout the domains of data science, signal processing, machine learning, and artificial intelligence in general.

翻訳日:2022-10-16 20:53:40 公開日:2021-07-11
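
The link between Gaussian processes and kernel ridge regression mentioned in the abstract can be sketched in a few lines. With a zero-mean GP prior and Gaussian noise of variance `noise_var`, the posterior mean has the same form as the kernel ridge regression prediction with regularization `noise_var`. The GP additionally supplies a predictive variance. The code is a stdlib-only illustration for scalar inputs with an RBF kernel; the names `rbf`, `solve`, and `gp_predict` are assumptions, not the article's notation.

```python
import math


def rbf(x, y, length_scale=1.0):
    """Squared-exponential kernel for scalar inputs."""
    return math.exp(-((x - y) ** 2) / (2.0 * length_scale ** 2))


def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - s) / m[r][r]
    return z


def gp_predict(xs, ys, x_star, noise_var=1e-2, length_scale=1.0):
    """Posterior mean and variance of the latent function at x_star.

    The mean k_*^T (K + noise_var I)^{-1} y matches the kernel ridge
    regression prediction with regularization noise_var.
    """
    n = len(xs)
    gram = [
        [rbf(xs[i], xs[j], length_scale) + (noise_var if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    k_star = [rbf(x, x_star, length_scale) for x in xs]
    alpha = solve(gram, ys)      # (K + noise_var I)^{-1} y
    v = solve(gram, k_star)      # (K + noise_var I)^{-1} k_*
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf(x_star, x_star, length_scale) - sum(k * w for k, w in zip(k_star, v))
    return mean, var
```

Near the training inputs the predictive variance collapses toward zero, and far from them it returns to the prior variance. This uncertainty information is what distinguishes the probabilistic view from plain kernel ridge regression.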

# 最大平均不一致テストは敵攻撃に注意する

Maximum Mean Discrepancy Test is Aware of Adversarial Attacks ( http://arxiv.org/abs/2010.11415v3 )

ライセンス: Link先を確認

Ruize Gao, Feng Liu, Jingfeng Zhang, Bo Han, Tongliang Liu, Gang Niu, Masashi Sugiyama

(参考訳) 最大平均誤差(MMD)テストは、原則として2つのデータセット間の分布誤差を検出できる。しかし、mmdテストは、敵の攻撃に気づいていないことが示されている。mmdテストは、自然データと敵データの間の不一致を検知できなかった。この現象を踏まえると、我々は、自然データと逆データとが本当に異なる分布から来ているか、という疑問を提起する。答えは肯定的であり、その目的に対するmmdテストの以前の使用は、3つの重要な要因を欠いている。それゆえ、我々は3つのコンポーネントを提案する。まず、ガウス核は表現力に制限があり、それを効果的なディープカーネルに置き換える。第2に,mmd試験の試験力は無視され,漸近統計により最大化した。最後に、敵データは非独立である可能性があり、ワイルドブートストラップでこの問題を克服する。これら3つの要因に対処することにより,MDD テストが敵攻撃に気づいていることを確認し,両サンプルテストに基づく敵データ検出のための新たな道を開く。

The maximum mean discrepancy (MMD) test could in principle detect any distributional discrepancy between two datasets. However, it has been shown that the MMD test is unaware of adversarial attacks -- the MMD test failed to detect the discrepancy between natural and adversarial data. Given this phenomenon, we raise a question: are natural and adversarial data really from different distributions? The answer is affirmative -- the previous use of the MMD test for this purpose missed three key factors, and accordingly, we propose three components. Firstly, the Gaussian kernel has limited representation power, and we replace it with an effective deep kernel. Secondly, the test power of the MMD test was neglected, and we maximize it following asymptotic statistics. Finally, adversarial data may be non-independent, and we overcome this issue with the wild bootstrap. By taking care of the three factors, we verify that the MMD test is aware of adversarial attacks, which opens a new road for adversarial data detection based on two-sample tests.

翻訳日:2022-10-04 05:30:20 公開日:2021-07-11
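
For readers unfamiliar with the baseline the abstract starts from, here is a minimal sketch of a plain MMD two-sample test. It uses a fixed-bandwidth Gaussian kernel and a permutation test, which is the simple setting the abstract criticizes. It does not implement the paper's deep kernel, test-power maximization, or wild bootstrap. Function names, the bandwidth, and the permutation count are assumptions.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2_unbiased(xs, ys, bandwidth=1.0):
    """Unbiased estimate of squared MMD between samples xs and ys."""
    m, n = len(xs), len(ys)
    k_xx = sum(
        gaussian_kernel(xs[i], xs[j], bandwidth)
        for i in range(m) for j in range(m) if i != j
    ) / (m * (m - 1))
    k_yy = sum(
        gaussian_kernel(ys[i], ys[j], bandwidth)
        for i in range(n) for j in range(n) if i != j
    ) / (n * (n - 1))
    k_xy = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys) / (m * n)
    return k_xx + k_yy - 2.0 * k_xy


def permutation_p_value(xs, ys, bandwidth=1.0, n_perm=100, seed=0):
    """P-value for the null hypothesis that xs and ys share a distribution.

    Assumes i.i.d. samples; this is exactly the assumption the abstract
    says may fail for adversarial data.
    """
    rng = random.Random(seed)
    observed = mmd2_unbiased(xs, ys, bandwidth)
    pooled = list(xs) + list(ys)
    m = len(xs)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mmd2_unbiased(pooled[:m], pooled[m:], bandwidth) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

A small p-value rejects the hypothesis that the two samples come from the same distribution. Because the permutation scheme treats the samples as exchangeable, it is only valid for independent data. The abstract's third component replaces it with a wild bootstrap for that reason.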

# 進行性BERTトレーニングにおける変圧器成長について

On the Transformer Growth for Progressive BERT Training ( http://arxiv.org/abs/2010.12562v3 )

ライセンス: Link先を確認

Xiaotao Gu, Liyuan Liu, Hongkun Yu, Jing Li, Chen Chen, Jiawei Han

(参考訳) 大規模言語モデルの事前トレーニングに過度なコストがかかるため、BERTを徐々にトレーニングする努力が続けられている。我々の目標は、トランスフォーマーの成長の理解を深め、進歩的トレーニングを導く原則を発見することである。まず、ネットワークアーキテクチャ検索と同様に、トランスフォーマーの成長も複合スケーリングを好むことが分かりました。具体的には、既存の手法は1次元でのみネットワーク成長を行うが、複合成長演算子を用いて複数の次元(例えば、モデルの深さ、幅、入力長)のバランスをとることは有用である。さらに,各次元の代替成長演算子を制御比較により探索し,演算子選択の実践的ガイダンスを与える。解析結果から,提案手法は,ベースモデルと大規模モデルでそれぞれ73.6%, 82.2%の事前学習を高速化し, 比較性能を実現した。

Due to the excessive cost of large-scale language model pre-training, considerable efforts have been made to train BERT progressively -- starting from an inferior but low-cost model and gradually growing the model to increase the computational complexity. Our objective is to advance the understanding of Transformer growth and discover principles that guide progressive training. First, we find that, similar to network architecture search, Transformer growth also favors compound scaling. Specifically, while existing methods only conduct network growth in a single dimension, we observe that it is beneficial to use compound growth operators and balance multiple dimensions (e.g., depth, width, and input length of the model). Moreover, we explore alternative growth operators in each dimension via controlled comparison to give practical guidance for operator selection. In light of our analyses, the proposed method speeds up BERT pre-training by 73.6% and 82.2% for the base and large models respectively, while achieving comparable performance.

翻訳日:2022-10-03 21:41:14 公開日:2021-07-11

# 自律運転における確率的物体検出に関するレビューと比較研究

A Review and Comparative Study on Probabilistic Object Detection in Autonomous Driving ( http://arxiv.org/abs/2011.10671v2 )

ライセンス: Link先を確認

Di Feng, Ali Harakeh, Steven Waslander, Klaus Dietmayer

(参考訳) 物体検出の不確実性の捕捉は安全な自動運転には不可欠である。近年,ディープラーニングがオブジェクト検出のデファクトアプローチとなり,多くの確率的オブジェクト検出法が提案されている。しかし、深層物体検出における不確実性推定の要約は存在せず、既存の手法は異なるネットワークアーキテクチャと不確実性推定法で構築されるだけでなく、幅広い評価指標を持つ異なるデータセット上で評価される。結果として、特定のアプリケーションに最も適したモデルの選択と同様に、メソッド間の比較は難しいままである。本論文は,既存の確率的物体検出法を自動運転応用に適用することで,この問題を緩和することを目的としている。まず,深層学習における汎用的不確実性推定の概観を提供し,確率的物体検出のための既存手法と評価指標を体系的に調査する。次に、画像検出器と3つの公道運転データセットに基づく確率的物体検出のための厳密な比較研究を提案する。最後に,残りの課題と今後の課題について考察する。コードはhttps://github.com/asharakeh/pod_compare.gitで利用可能である。

Capturing uncertainty in object detection is indispensable for safe autonomous driving. In recent years, deep learning has become the de-facto approach for object detection, and many probabilistic object detectors have been proposed. However, there is no summary on uncertainty estimation in deep object detection, and existing methods are not only built with different network architectures and uncertainty estimation methods, but also evaluated on different datasets with a wide range of evaluation metrics. As a result, a comparison among methods remains challenging, as does the selection of a model that best suits a particular application. This paper aims to alleviate this problem by providing a review and comparative study on existing probabilistic object detection methods for autonomous driving applications. First, we provide an overview of generic uncertainty estimation in deep learning, and then systematically survey existing methods and evaluation metrics for probabilistic object detection. Next, we present a strict comparative study for probabilistic object detection based on an image detector and three public autonomous driving datasets. Finally, we present a discussion of the remaining challenges and future works. Code has been made available at https://github.com/asharakeh/pod_compare.git

翻訳日:2022-09-23 06:24:40 公開日:2021-07-11

# 効率的な注意ネットワーク:プラグインの場所を検索して注意を加速する

Efficient Attention Network: Accelerate Attention by Searching Where to Plug ( http://arxiv.org/abs/2011.14058v2 )

ライセンス: Link先を確認

Zhongzhan Huang, Senwei Liang, Mingfu Liang, Wei He, Haizhao Yang

(参考訳) 近年,深層畳み込みニューラルネットワーク(cnns)の内部情報を活用し,モデル一般化を促進するために,プラグ・アンド・プレイ自着モジュールが多数提案されている。以前の作業では、例えば軽量やタスク指向といった特定の機能のための注意モジュールの設計に重点が置かれていた。しかし、彼らはモジュールをCNNバックボーンのブロック全体と個別に接続するので、注意モジュールをどこに差し込むかの重要性を無視し、ネットワーク深度の増加に伴う計算コストとパラメータの数の増加につながった。そこで我々は,既存のアテンションモジュールの効率を改善するために,EAN(Efficient Attention Network)というフレームワークを提案する。 EANでは、共有メカニズム(Huang et al. 2020)を活用して、バックボーン内のアテンションモジュールを共有し、強化学習を通じて共有アテンションモジュールを接続する場所を探索する。最後に,(1) 精度を維持しながら,(1) 余剰パラメータの増大と(3) 加速推論を減少させながら,背骨とモジュール間の疎結合な注意ネットワークを得る。広く使われているベンチマークと一般的な注意ネットワークに関する大規模な実験は、EANの有効性を示している。さらに、我々のEANは、他のタスクに転送し、情報的特徴を捉える能力を持っていることを実証的に説明します。コードはhttps://github.com/gbup-group/ean- efficient-attention-networkで入手できる。

Recently, many plug-and-play self-attention modules have been proposed to enhance model generalization by exploiting the internal information of deep convolutional neural networks (CNNs). Previous works emphasize the design of attention modules for specific functionality, e.g., lightweight or task-oriented attention. However, they ignore the importance of where to plug in the attention module: they take for granted that a module is connected individually to each block of the entire CNN backbone, so computational cost and the number of parameters grow with network depth. Thus, we propose a framework called Efficient Attention Network (EAN) to improve the efficiency of existing attention modules. In EAN, we leverage the sharing mechanism (Huang et al. 2020) to share the attention module within the backbone and search where to connect the shared attention module via reinforcement learning. Finally, we obtain an attention network with sparse connections between the backbone and modules, while (1) maintaining accuracy, (2) reducing extra parameter increment, and (3) accelerating inference. Extensive experiments on widely used benchmarks and popular attention networks show the effectiveness of EAN. Furthermore, we empirically illustrate that our EAN has the capacity of transferring to other tasks and capturing informative features. The code is available at https://github.com/gbup-group/EAN-efficient-attention-network.

翻訳日:2022-09-19 19:13:51 公開日:2021-07-11

# (参考訳) オブジェクト検出における連続学習のためのコントラストR-CNN

Contrast R-CNN for Continual Learning in Object Detection ( http://arxiv.org/abs/2108.04224v1 )

ライセンス: CC BY 4.0

Kai Zheng, Cen Chen

(参考訳) 連続学習問題は画像分類において広く研究され、オブジェクト検出において希少な研究がなされている。最近のいくつかの研究は、古い知識を維持するためにモデルを制約するために知識蒸留を適用するが、この厳格な制約は新しい知識を学ぶために有害である。本稿では,物体検出の連続学習のための新しい手法,すなわちコントラストr-cnnを提案する。さらに,従来型と新しいインスタンス間のあいまいさを排除し,継続的な学習をより堅牢にする提案コントラストを設計する。 PASCAL VOCデータセットの大規模評価により,本手法の有効性が示された。

The continual learning problem has been widely studied in image classification, while little work has explored it in object detection. Some recent works apply knowledge distillation to constrain the model to retain old knowledge, but this rigid constraint is detrimental to learning new knowledge. In our paper, we propose a new scheme for continual learning of object detection, namely Contrast R-CNN, an approach that strikes a balance between retaining the old knowledge and learning the new knowledge. Furthermore, we design a Proposal Contrast to eliminate the ambiguity between old and new instances to make the continual learning more robust. Extensive evaluation on the PASCAL VOC dataset demonstrates the effectiveness of our approach.

翻訳日:2021-08-15 16:35:57 公開日:2021-07-11

# (参考訳) internet-of-things デバイスと医療支援技術:アプリケーション,課題,機会

Internet-of-Things Devices and Assistive Technologies for Healthcare: Applications, Challenges, and Opportunities ( http://arxiv.org/abs/2107.14112v1 )

ライセンス: CC BY 4.0

Marc Jayson Baucas, Petros Spachos, and Stefano Gregori

(参考訳) 医療の状況やケースは急速に拡大しており、物理的な空間が制限され始めている。病院や診療所は、多くの患者を収容する能力を持っていない。健康産業の現状は、その価値ある限られた資源を改善する必要があることは明らかである。 IoT(Internet of Things)デバイスの進化と補助技術によって、医療サービスにワイヤレスでアクセスする便利な手段として、医療の問題を緩和することができる。これらの技術が提供するユニークな特徴を活用できる、IoTデバイスや潜在的なアプリケーションが数多く存在する。しかし同時に、これらのサービスは適切に対処する必要がある新しい課題を提起する。この記事では、IoTベースの医療用アプリケーションとデバイスに関する一般的なカテゴリについてレビューする。次に、課題を説明し、研究がオープンな問題に適切に対処し、医療における既存の実装を改善する方法について論じる。さらに可能な解決策は、将来の医療アプリケーションで実現可能なソリューションになる可能性を示すためにも議論されている。

Medical conditions and cases are growing at a rapid pace, to the point where physical space is starting to be constrained. Hospitals and clinics no longer have the ability to accommodate large numbers of incoming patients. It is clear that the current state of the health industry needs to improve its valuable and limited resources. The evolution of Internet of Things (IoT) devices, along with assistive technologies, can alleviate the problem in healthcare by being a convenient and easy means of accessing healthcare services wirelessly. There is a plethora of IoT devices and potential applications that can take advantage of the unique characteristics that these technologies can offer. However, at the same time, these services pose novel challenges that need to be properly addressed. In this article, we review some popular categories of IoT-based applications for healthcare along with their devices. Then, we describe the challenges and discuss how research can properly address the open issues and improve the already existing implementations in healthcare. Further possible solutions are also discussed to show their potential in being viable solutions for future healthcare applications.

翻訳日:2021-08-01 13:44:31 公開日:2021-07-11

# 有害な人工知能への抵抗と責任あるAIの発展への貢献における社会運動・連合・労働者の役割

The Role of Social Movements, Coalitions, and Workers in Resisting Harmful Artificial Intelligence and Contributing to the Development of Responsible AI ( http://arxiv.org/abs/2107.14052v1 )

ライセンス: Link先を確認

Susan von Struensee

(参考訳) AIベースのシステムが社会にもたらす影響について、世間の懸念が高まっている。あらゆる分野の連合は、AIの有害な適用に抵抗するために世界中で活動している。信頼できるデータの欠如に対処する先住民から、スマートシティの利害関係者、そして、性的人身売買業者でMITの寄付者でもあったJeffrey Epsteinとの学術的関係に抗議する学生まで、AIに多額の投資をし利益を得ている人々の疑わしい倫理と価値観は、世界的な監視下にある。 AIアルゴリズムには偏見に満ちた不当で憂慮すべき仮定が埋め込まれており、介入なしにロックインされる可能性がある。我々の最良の人間の判断は、AIの有害な影響を抑えるために必要である。 AIの最大の貢献の1つとして、人類の知恵が地球上の生命にとっていかに重要かを最終的に理解させることがあげられるだろう。

There is mounting public concern over the influence that AI-based systems have in our society. Coalitions in all sectors are acting worldwide to resist harmful applications of AI. From indigenous people addressing the lack of reliable data, to smart city stakeholders, to students protesting the academic relationships with sex trafficker and MIT donor Jeffrey Epstein, the questionable ethics and values of those heavily investing in and profiting from AI are under global scrutiny. There are biased, wrongful, and disturbing assumptions embedded in AI algorithms that could get locked in without intervention. Our best human judgment is needed to contain AI's harmful impact. Perhaps one of the greatest contributions of AI will be to make us ultimately understand how important human wisdom truly is in life on earth.

翻訳日:2021-08-01 11:01:45 公開日:2021-07-11

# (参考訳) 善は逆か? 対人MLコミュニティの価値が社会的に有利な攻撃にどのように影響するか

Adversarial for Good? How the Adversarial ML Community's Values Impede Socially Beneficial Uses of Attacks ( http://arxiv.org/abs/2107.10302v1 )

ライセンス: CC BY 4.0

Kendra Albert, Maggie Delano, Bogdan Kulynych, Ram Shankar Siva Kumar

(参考訳) 敵機械学習(ML)からの攻撃は、ML内の既存の電力構造に対抗するために使用することができ、監視と制御の標的となる人々のための呼吸スペースを作ることができる。しかし、敵対的MLに関する研究の多くは、MLシステムに対する抵抗ツールの開発に関わっていない。なぜだ? 本稿では,敵ml研究者がneurips 2020論文の一部として書いた幅広いインパクトステートメントをレビューし,著者が仕事の目的について持っている仮定を評価する。また、著者が作品の影響をより一般的に見る方法に関する情報も収集しています。我々は、NeurIPSのほとんどの敵対的ML研究者が、社会的に有益な攻撃の活用を検討するのが困難になる2つの基本的な仮定を持っていることを発見した:(1)システムを堅牢にすることが望ましいこと、(2)システムの攻撃者は規範的に悪いこと、そして、システムのディフェンダーは規範的に良いこと。つまり、その表現と中立性にもかかわらず、ほとんどの対立するML研究者は、彼らの仕事の目標はシステムを保護することであり、現状を破壊するためのツールを概念化し構築することが困難であると考えている。

Attacks from adversarial machine learning (ML) have the potential to be used "for good": they can be used to run counter to the existing power structures within ML, creating breathing space for those who would otherwise be the targets of surveillance and control. But most research on adversarial ML has not engaged in developing tools for resistance against ML systems. Why? In this paper, we review the broader impact statements that adversarial ML researchers wrote as part of their NeurIPS 2020 papers and assess the assumptions that authors have about the goals of their work. We also collect information about how authors view their work's impact more generally. We find that most adversarial ML researchers at NeurIPS hold two fundamental assumptions that will make it difficult for them to consider socially beneficial uses of attacks: (1) it is desirable to make systems robust, independent of context, and (2) attackers of systems are normatively bad and defenders of systems are normatively good. That is, despite their expressed and supposed neutrality, most adversarial ML researchers believe that the goal of their work is to secure systems, making it difficult to conceptualize and build tools for disrupting the status quo.

翻訳日:2021-07-25 13:31:47 公開日:2021-07-11

# ハイブリッドant swarmベースのデータクラスタリング

Hybrid Ant Swarm-Based Data Clustering ( http://arxiv.org/abs/2107.07382v1 )

ライセンス: Link先を確認

Md Ali Azam, Abir Hossen, Md Hafizur Rahman

(参考訳) 生物学的にインスパイアされたコンピューティング技術は非常に効果的で、データクラスタリングを含む多くの研究で有用である。アントクラスタリングアルゴリズムは自然に着想を得たクラスタリング手法であり、20年以上にわたって広く研究されてきた。本研究では,アリクラスタリングアルゴリズム(ACA)をハイブリッドアリクラスタリングアルゴリズム(hACA)に拡張する。具体的には,遺伝的アルゴリズムを標準ACAに組み込んで,ハイブリットアルゴリズムを高性能に拡張する。また、クラスタリングのパフォーマンスを高速化するために、新しいピックアップとドロップのルールを導入しました。本稿では,hACAアルゴリズムの性能について検討し,ベンチマークとして標準ACAと比較する。

Biologically inspired computing techniques are very effective and useful in many areas of research, including data clustering. The ant clustering algorithm is a nature-inspired clustering technique that has been extensively studied for over two decades. In this study, we extend the ant clustering algorithm (ACA) to a hybrid ant clustering algorithm (hACA). Specifically, we include a genetic algorithm in the standard ACA to obtain a hybrid algorithm with better performance. We also introduce novel pick-up and drop-off rules to speed up clustering. We study the performance of the hACA algorithm and compare it with the standard ACA as a benchmark.
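
For orientation, the pick-up and drop-off decisions in the classical ACA are commonly written in the Deneubourg-style form sketched below. This is the baseline rule only; the novel hACA rules are not specified in the abstract and are not reproduced here. The function names and the constants `k1` and `k2` are illustrative assumptions.

```python
def pick_up_probability(f, k1=0.1):
    """Classical ACA rule (assumed baseline, not the hACA rule):
    an unloaded ant is likely to pick up an item when the local
    similarity density f around the item is low."""
    return (k1 / (k1 + f)) ** 2


def drop_probability(f, k2=0.3):
    """Classical ACA rule (assumed baseline, not the hACA rule):
    a loaded ant is likely to drop its item when the local
    similarity density f at its position is high."""
    return (f / (k2 + f)) ** 2
```

With these rules, isolated items (f close to 0) are almost always picked up and almost never dropped, which is what gradually gathers similar items into clusters.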

翻訳日:2021-07-16 13:48:50 公開日:2021-07-11

# (参考訳) 変分オートエンコーダを用いた多対多音声変換による特徴分散

Many-to-Many Voice Conversion based Feature Disentanglement using Variational Autoencoder ( http://arxiv.org/abs/2107.06642v1 )

ライセンス: CC BY 4.0

Manh Luong and Viet Anh Tran

(参考訳) 音声変換は, 話者の音声特性を言語内容を変えることなく, 対象話者に変換する難易度の高い課題である。近年、変分オートエンコーダ(vaes)に基づく多対多音声変換(vc)において、良好な結果が得られているが、これらの手法では、話者のアイデンティティと言語コンテンツを分離して、見当たらない話者シナリオで優れたパフォーマンスを達成する能力が欠如している。本稿では,多数の音声変換に対応するために,特徴の絡み合いに基づく新しい手法を提案する。本手法は話者識別と言語コンテンツを発話から切り離す機能を備えており、音源話者を1つのオートエンコーダネットワークで多くのターゲット話者に変換することができる。さらに、目に見えないターゲット話者シナリオを自然に扱う。提案手法は,自然性や話者の類似性の観点から,他の最先端モデルと比較し,客観的評価と主観評価の両方を行う。

Voice conversion is a challenging task that transforms the voice characteristics of a source speaker into those of a target speaker without changing the linguistic content. Recently, many works on many-to-many Voice Conversion (VC) based on Variational Autoencoders (VAEs) have achieved good results. However, these methods lack the ability to disentangle speaker identity and linguistic content, which is needed to achieve good performance in unseen-speaker scenarios. In this paper, we propose a new method based on feature disentanglement to tackle many-to-many voice conversion. The method can disentangle speaker identity and linguistic content from utterances, and it can convert from many source speakers to many target speakers with a single autoencoder network. Moreover, it naturally deals with unseen target speaker scenarios. We perform both objective and subjective evaluations to show the competitive performance of our proposed method compared with other state-of-the-art models in terms of naturalness and target speaker similarity.

翻訳日:2021-07-16 05:45:09 公開日:2021-07-11

# (参考訳) 科学的研究手法を用いたパターン発見と検証

Pattern Discovery and Validation Using Scientific Research Methods ( http://arxiv.org/abs/2107.06065v1 )

ライセンス: CC BY 4.0

Dirk Riehle, Nikolay Harutyunyan, Ann Barcomb

(参考訳) 以前は認識されていなかったパターンを発見するパターン発見は、しばしばアドホックなプロセスとして行われ、提案したパターンの品質の確実性はほとんどない。パターン検証は、提案されたパターンの精度を検証するプロセスであり、「3つの規則」の単純なヒューリスティックに支配されている。本稿では,パターン発見と検証のために確立された科学的研究手法の活用方法について述べる。本稿では, 質的調査, 行動研究, ケーススタディによるパターン発見・評価の手法であるハンドブック法(handbook method)を提案し, 科学的手法全般の基本的な原理について考察する。本手法を3つの探索研究を用いて評価し,その有用性を示す。

Pattern discovery, the process of discovering previously unrecognized patterns, is often performed as an ad-hoc process with little resulting certainty in the quality of the proposed patterns. Pattern validation, the process of validating the accuracy of proposed patterns, remains dominated by the simple heuristic of "the rule of three". This article shows how to use established scientific research methods for the purpose of pattern discovery and validation. We present a specific approach, called the handbook method, that uses the qualitative survey, action research, and case study research for pattern discovery and evaluation, and we discuss the underlying principle of using scientific methods in general. We evaluate the handbook method using three exploratory studies and demonstrate its usefulness.

翻訳日:2021-07-15 05:12:17 公開日:2021-07-11

# (参考訳) DiCOVA-Net:Deep Residual Network for the DiCOVA Challenge 2021による新型コロナウイルスの診断

DiCOVA-Net: Diagnosing COVID-19 using Acoustics based on Deep Residual Network for the DiCOVA Challenge 2021 ( http://arxiv.org/abs/2107.06126v1 )

ライセンス: CC BY 4.0

Jiangeng Chang, Shaoze Cui, Mengling Feng

(参考訳) 本稿では,ウイルス感染患者を音響記録に基づいて同定するディープ残余ネットワークベース手法,すなわちDiCOVA-Netを提案する。感染した患者より健康な人がはるかに多いため、この分類問題は不均衡なデータの課題に直面している。マイノリティクラス(感染した患者)を認識できるモデルの能力を向上させるため、データ拡張とコストに敏感な手法をモデルに導入した。さらに、このタスクの特異性を考慮すると、事前トレーニングされたResNet50を調整するための微調整技術をデプロイする。さらに,モデルの一般化性を向上させるために,ランダムな種を用いた複数ベース分類器からの予測結果を統合したアンサンブル学習を行う。提案したDiCOVA-Netの性能を評価するために,DiCOVAチャレンジデータセットを用いて実験を行った。その結果,AUCでは85.43\%を達成していることがわかった。

In this paper, we propose a deep residual network-based method, namely the DiCOVA-Net, to identify COVID-19 infected patients based on the acoustic recording of their coughs. Since there are far more healthy people than infected patients, this classification problem faces the challenge of imbalanced data. To improve the model's ability to recognize the minority class (the infected patients), we introduce data augmentation and cost-sensitive methods into our model. In addition, considering the particularity of this task, we deploy fine-tuning techniques to adapt the pre-trained ResNet50. Furthermore, to improve the model's generalizability, we use ensemble learning to integrate prediction results from multiple base classifiers generated using different random seeds. To evaluate the proposed DiCOVA-Net's performance, we conducted experiments with the DiCOVA challenge dataset. The results show that our method achieved an AUC of 85.43%, among the top of all competing teams.
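
The two generic ideas named in this abstract, cost-sensitive weighting for an imbalanced dataset and averaging over classifiers trained with different seeds, can be sketched as follows. This is a minimal illustration under stated assumptions, not the DiCOVA-Net implementation: the inverse-frequency weighting scheme, the simple mean over models, and both function names are hypothetical choices, since the abstract does not say which variants were used.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Assumed weighting scheme: weight each class by
    n_samples / (n_classes * class_count), so that the rare class
    contributes more to a weighted loss."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


def ensemble_average(per_model_probs):
    """Assumed combination rule: average the positive-class
    probabilities predicted by several base classifiers
    (one inner list per model, one entry per sample)."""
    n_models = len(per_model_probs)
    return [sum(p) / n_models for p in zip(*per_model_probs)]
```

For a toy label list with nine healthy samples and one infected sample, the infected class receives a weight ten times larger than a single healthy class count would suggest, which is the intended effect of a cost-sensitive loss.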

翻訳日:2021-07-15 04:53:54 公開日:2021-07-11

# 条件ICAによる機能的磁気共鳴画像データ増大

Functional Magnetic Resonance Imaging data augmentation through conditional ICA ( http://arxiv.org/abs/2107.06104v1 )

ライセンス: Link先を確認

Badr Tajini, Hugo Richard, Bertrand Thirion

(参考訳) 計算認知神経画像研究の進歩は、大量のラベル付き脳画像データの利用可能性に関連しているが、そのようなデータは少ないし、コストもかかる。 generative adversarial networks(gans)のような強力なデータ生成メカニズムは、コンピュータビジョンのために過去10年間に設計されてきたが、このような改善はまだ脳イメージングに引き継がれていない。機能的ニューロイメージングで使用可能なノイズ,高次元,小型のサンプルデータにはgansトレーニングが不適当である可能性が高いためと考えられる。本報告では,条件付き独立成分分析(conditional ica: fast functional magnetic resonance imaging, fmri)データ拡張技術を紹介する。次に、少数のサンプルで観察されたクラスにジェネレータを条件付けるメカニズムを提案する。まず,生成機構が観察と区別できないデータの合成に成功し,脳デコード問題における分類精度が向上することを示す。特に、最適化と解釈がずっと簡単でありながら、GANよりも優れています。最後に、Conditional ICAはパラメータチューニングなしで8つのデータセットの分類精度を向上させる。

Advances in computational cognitive neuroimaging research are related to the availability of large amounts of labeled brain imaging data, but such data are scarce and expensive to generate. While powerful data generation mechanisms, such as Generative Adversarial Networks (GANs), have been designed in the last decade for computer vision, such improvements have not yet carried over to brain imaging. A likely reason is that GAN training is ill-suited to the noisy, high-dimensional and small-sample data available in functional neuroimaging. In this paper, we introduce Conditional Independent Components Analysis (Conditional ICA): a fast functional Magnetic Resonance Imaging (fMRI) data augmentation technique that leverages abundant resting-state data to create images by sampling from an ICA decomposition. We then propose a mechanism to condition the generator on classes observed with few samples. We first show that the generative mechanism is successful at synthesizing data indistinguishable from observations, and that it yields gains in classification accuracy in brain decoding problems. In particular, it outperforms GANs while being much easier to optimize and interpret. Lastly, Conditional ICA enhances classification accuracy in eight datasets without further parameter tuning.

翻訳日:2021-07-14 14:30:30 公開日:2021-07-11

# (参考訳) 他者から学ぶ - 限定スーパービジョンによる一般化ゼロショット学習の再考

Learn from Anywhere: Rethinking Generalized Zero-Shot Learning with Limited Supervision ( http://arxiv.org/abs/2107.04952v1 )

ライセンス: CC BY 4.0

Gaurav Bhatt, Shivam Chandok and Vineeth N Balasubramanian

(参考訳) ほとんどゼロと少数ショットの学習アプローチの一般的な問題は、クラスに対する偏見に悩まされ、サブ最適性能をもたらすことである。既存の取り組みは、訓練中に目に見えないクラス(すなわち、トランスダクティブゼロショット)からラベルなしの画像を活用することを目的としている。しかし、対象とするunseenクラスのデータが使用できない、あるいは収集できない、実用的なシナリオでは使用が制限される。そこで,本研究では,見知らぬカテゴリに属さない他のデータクラスからのラベルなしイメージを,任意の学習における一般化向上に活用する,帰納的ゼロ・少数ショット学習の実践的設定を提案する。我々は、製品・オブ・エキスパートズに基づく定式化を活用し、通常は利用可能であり、事実上アノテーションコストを伴わないデータ・クラスのラベルなしサンプルを使用できる新しいaudモジュールを導入する。さらに,本モデルの実用的かつ難解な汎用的なゼロショットを限定的な監督設定で解決する可能性も示し,基本視クラスでさえ十分な注釈付きサンプルを持っていないことを示した。

A common problem with most zero-shot and few-shot learning approaches is that they suffer from bias towards seen classes, resulting in sub-optimal performance. Existing efforts aim to utilize unlabeled images from unseen classes (i.e., transductive zero-shot) during training to enable generalization. However, this limits their use in practical scenarios where data from target unseen classes is unavailable or infeasible to collect. In this work, we present a practical setting of inductive zero-shot and few-shot learning, where unlabeled images from other out-of-data classes, which do not belong to seen or unseen categories, can be used to improve generalization in any-shot learning. We leverage a formulation based on product-of-experts and introduce a new AUD module that enables us to use unlabeled samples from out-of-data classes, which are usually easily available and entail practically no annotation cost. In addition, we demonstrate the applicability of our model to a more practical and challenging setting, generalized zero-shot learning under limited supervision, where even the base seen classes do not have sufficient annotated samples.

翻訳日:2021-07-14 05:36:18 公開日:2021-07-11

# (参考訳) ReconVAT:低リソース実世界のデータのための半スーパービジョン自動音楽書き起こしフレームワーク

ReconVAT: A Semi-Supervised Automatic Music Transcription Framework for Low-Resource Real-World Data ( http://arxiv.org/abs/2107.04954v1 )

ライセンス: CC BY 4.0

Kin Wai Cheuk, Dorien Herremans, Li Su

(参考訳) 現在のsupervised automatic music transcription (amt) モデルは、ほとんどが一般化することができない。これは、ラベル付きトレーニングデータに表示されない様々な音楽ジャンルから実際の音楽録音を翻訳するのに苦労していることを意味する。本稿では,膨大な量の未収録楽曲を活用できる半教師付きフレームワークReconVATを提案する。提案手法は再構成損失と仮想敵訓練を用いる。 AMTの既存のU-netモデルと組み合わせると、ReconVATはMAPSやMusicNetといった一般的なベンチマークデータセットで競合する結果が得られる。例えば、MusicNetの文字列部分バージョンの数ショット設定では、ReconVATはノートワイドとノートウィザードのメトリクスでそれぞれ61.0%と41.6%のF1スコアを達成しており、教師付きベースラインモデルと比較して22.2%と62.5%の改善となっている。提案するフレームワークでは,新たなデータに対する継続的な学習の可能性も示している。

Most of the current supervised automatic music transcription (AMT) models lack the ability to generalize. This means that they have trouble transcribing real-world music recordings from diverse musical genres that are not presented in the labelled training data. In this paper, we propose a semi-supervised framework, ReconVAT, which solves this issue by leveraging the huge amount of available unlabelled music recordings. The proposed ReconVAT uses reconstruction loss and virtual adversarial training. When combined with existing U-net models for AMT, ReconVAT achieves competitive results on common benchmark datasets such as MAPS and MusicNet. For example, in the few-shot setting for the string part version of MusicNet, ReconVAT achieves F1-scores of 61.0% and 41.6% for the note-wise and note-with-offset-wise metrics respectively, which translates into an improvement of 22.2% and 62.5% compared to the supervised baseline model. Our proposed framework also demonstrates the potential of continual learning on new data, which could be useful in real-world applications whereby new data is constantly available.

翻訳日:2021-07-14 05:16:02 公開日:2021-07-11

# (参考訳) 適応的音声持続時間修正のためのディープベイズフレームワーク

A Deep-Bayesian Framework for Adaptive Speech Duration Modification ( http://arxiv.org/abs/2107.04973v1 )

ライセンス: CC BY 4.0

Ravi Shankar and Archana Venkataraman

(参考訳) 与えられた音声信号の持続時間を適応的に修正する最初の方法を提案する。提案手法はベイズフレームワークを用いて,入力とターゲット発話のフレームをリンクする潜在注意マップを定義する。我々は、マスク付き畳み込みエンコーダデコーダネットワークをトレーニングし、このアテンションマップを平均絶対誤差損失関数の確率バージョンで生成し、またエンコーダ埋め込みを用いてターゲット音声信号の長さを予測する。予測された長さはデコーダ操作のステップ数を決定する。推定中、与えられた入力音声と未知の目標音声信号との類似度行列の代理としてアテンションマップを生成する。この類似性行列を用いて、2つの信号間のアライメントの歪み経路を計算する。この適応的フレームワークは、音声変換と感情変換の両方のタスクにおいて、既知の目標信号に依存する動的時間ワープと類似した結果が得られることを示す。また,本手法は,最先端のボコーダに匹敵する高い品質の音声を生成することを示す。

We propose the first method to adaptively modify the duration of a given speech signal. Our approach uses a Bayesian framework to define a latent attention map that links frames of the input and target utterances. We train a masked convolutional encoder-decoder network to produce this attention map via a stochastic version of the mean absolute error loss function; our model also predicts the length of the target speech signal using the encoder embeddings. The predicted length determines the number of steps for the decoder operation. During inference, we generate the attention map as a proxy for the similarity matrix between the given input speech and an unknown target speech signal. Using this similarity matrix, we compute a warping path of alignment between the two signals. Our experiments demonstrate that this adaptive framework produces similar results to dynamic time warping, which relies on a known target signal, on both voice conversion and emotion conversion tasks. We also show that our technique results in a high quality of generated speech that is on par with state-of-the-art vocoders.
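
The dynamic time warping baseline mentioned in this abstract finds a monotonic alignment path through a frame-by-frame matrix. The sketch below shows the standard dynamic-programming version on a cost matrix; it is a generic illustration, not the authors' code. It assumes that a similarity matrix has first been turned into costs (for example, cost = 1 - similarity), and the function name `dtw_path` is hypothetical.

```python
def dtw_path(cost):
    """Return the minimum-cost monotonic alignment path through a
    frame-by-frame cost matrix (list of lists), as (i, j) pairs
    from the first frame pair to the last."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    # acc[i][j] = cheapest accumulated cost of any path ending at (i, j).
    acc = [[inf] * m for _ in range(n)]
    acc[0][0] = cost[0][0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best = min(
                acc[i - 1][j] if i > 0 else inf,
                acc[i][j - 1] if j > 0 else inf,
                acc[i - 1][j - 1] if i > 0 and j > 0 else inf,
            )
            acc[i][j] = cost[i][j] + best
    # Backtrack from the last cell to the first along cheapest predecessors.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        steps = []
        if i > 0 and j > 0:
            steps.append((acc[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            steps.append((acc[i - 1][j], i - 1, j))
        if j > 0:
            steps.append((acc[i][j - 1], i, j - 1))
        _, i, j = min(steps)
        path.append((i, j))
    return path[::-1]
```

Repeated row or column indices along the returned path indicate where one signal is stretched relative to the other, which is the information a duration-modification step needs.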

翻訳日:2021-07-14 04:59:47 公開日:2021-07-11

# (参考訳) STR-GODEs:Metro Ridership Predictionのための空間時間ライダーグラフODEs

STR-GODEs: Spatial-Temporal-Ridership Graph ODEs for Metro Ridership Prediction ( http://arxiv.org/abs/2107.04980v1 )

ライセンス: CC BY 4.0

Chuyu Huang

(参考訳) 地下鉄の乗客予測は常に政府や研究者から大きな注目を集めている。近年,複雑なグラフ畳み込み型リカレントネットワークアーキテクチャの設計に注目が集まっている。これらの研究は空間次元の情報をうまく抽出するが、時間次元の限界はまだ残っている。我々は,ニューラルネットワークアルゴリズムをグラフネットワークに拡張し,時間的,時間的,ライダー間の相関関係をタイムライン上で等間隔に分割することなく効果的に学習できるSTR-GODEsネットワークを提案する。空間的関係と時間的相関を学習しながら,gode-rnnセルを改変してライダーシップ特性と隠れ状態を得る。ライダー情報とその隠れ状態がGODESolveに追加され、予測の長い時系列によるエラーの蓄積を低減する。 2つの大規模データセットに対する大規模な実験は、我々のモデルの有効性と堅牢性を示している。

Metro ridership prediction has long received extensive attention from governments and researchers. Recent works focus on designing complicated graph convolutional recurrent network architectures to capture spatial and temporal patterns. These works extract information along the spatial dimension well, but limitations remain along the temporal dimension. We extend Neural ODE algorithms to graph networks and propose the STR-GODEs network, which can effectively learn spatial, temporal, and ridership correlations without the limitation of dividing data into equal-sized intervals on the timeline. While learning the spatial relations and the temporal correlations, we modify the GODE-RNN cell to obtain the ridership feature and hidden states. Ridership information and its hidden states are added to the GODESolve to reduce the error accumulation caused by long time series in prediction. Extensive experiments on two large-scale datasets demonstrate the efficacy and robustness of our model.

翻訳日:2021-07-14 04:47:04 公開日:2021-07-11

# (参考訳) 言語間変換再考による低リソース読解理解の改善

Improving Low-resource Reading Comprehension via Cross-lingual Transposition Rethinking ( http://arxiv.org/abs/2107.05002v1 )

ライセンス: CC BY 4.0

Gaochen Wu, Bin Xu, Yuxin Qin, Fei Kong, Bangchang Liu, Hongwen Zhao, Dejie Chang

(参考訳) Extractive Reading Comprehension (ERC)は、大規模で高品質なERCトレーニングデータの提供によって、大幅に進歩した。このような急速な進歩と広範な応用にもかかわらず、英語のような高リソース言語以外の言語でのデータセットは乏しいままである。この問題に対処するために,既存の高品質抽出読解データセットを多言語環境でモデル化し,XLTT(Cross-Lingual Transposition ReThinking)モデルを提案する。具体的には、多言語適応的注意(MAA)を用いて、各言語族からより汎用的な意味と語彙の知識を学習する。さらに、既存のデータセットをフル活用するために、既存のデータセットとターゲットデータセット間のタスクレベルの類似性を計算することで、モデルをトレーニングするための新しいトレーニングフレームワークを採用しています。実験の結果、xlttモデルは2つの多言語ercベンチマークで6つのベースラインを上回っており、特に3.9と4.1の平均改善率の低リソース言語ではより効果的であった。

Extractive Reading Comprehension (ERC) has made tremendous advances enabled by the availability of large-scale, high-quality ERC training data. Despite such rapid progress and widespread application, datasets in languages other than high-resource languages such as English remain scarce. To address this issue, we propose a Cross-Lingual Transposition ReThinking (XLTT) model by modelling existing high-quality extractive reading comprehension datasets in a multilingual environment. Specifically, we present multilingual adaptive attention (MAA), which combines intra-attention and inter-attention to learn more generalizable semantic and lexical knowledge from each pair of language families. Furthermore, to make full use of existing datasets, we adopt a new training framework that trains our model by calculating task-level similarities between each existing dataset and the target dataset. The experimental results show that our XLTT model surpasses six baselines on two multilingual ERC benchmarks and is especially effective for low-resource languages, with average improvements of 3.9 and 4.1 in F1 and EM, respectively.

翻訳日:2021-07-14 04:34:16 公開日:2021-07-11

# (参考訳) インスタンス探索による正確な位置推定

Towards Accurate Localization by Instance Search ( http://arxiv.org/abs/2107.05005v1 )

ライセンス: CC BY 4.0

Yi-Geng Hong, Hui-Chu Xiao, Wan-Lei Zhao

(参考訳) ビジュアルオブジェクトのローカライゼーションは、一連のオブジェクト検出タスクにおける重要なステップである。文献では, 主観的に監視された枠組みを用いて, 高い局所化精度を達成している。しかし、そのような手法にはオブジェクトレベルのアノテーションが必要であり、未知のカテゴリのオブジェクトを検出できない。弱い管理手法も同様の困難に直面している。本稿では,インスタンス探索によって返されるランクリストの精度の高いオブジェクトローカライゼーションを実現するための自己評価学習フレームワークを提案する。提案フレームワークは,クエリと対応するトップランク検索結果から,ターゲットインスタンスを徐々にマイニングする。共通のインスタンスはクエリとランクリスト内のイメージの間で共有されるので、対象のビジュアルインスタンスは、対象のカテゴリが何であるかを知らずに正確にローカライズすることができる。インスタンス検索でのローカライズの実行に加えて、少数ショットのオブジェクト検出の問題も同じフレームワークで対処されている。両タスクで最先端手法よりも優れた性能が観察される。

Visual object localization is the key step in a series of object detection tasks. In the literature, high localization accuracy is achieved with the mainstream strongly supervised frameworks. However, such methods require object-level annotations and are unable to detect objects of unknown categories. Weakly supervised methods face similar difficulties. In this paper, a self-paced learning framework is proposed to achieve accurate object localization on the rank list returned by instance search. The proposed framework mines the target instance gradually from the queries and their corresponding top-ranked search results. Since a common instance is shared between the query and the images in the rank list, the target visual instance can be accurately localized even without knowing what the object category is. In addition to performing localization on instance search, the issue of few-shot object detection is also addressed under the same framework. Superior performance over state-of-the-art methods is observed on both tasks.

翻訳日:2021-07-14 04:06:07 公開日:2021-07-11

# (参考訳) 模倣と強化学習を用いた安定分子の生成

Generating stable molecules using imitation and reinforcement learning ( http://arxiv.org/abs/2107.05007v1 )

ライセンス: CC BY 4.0

Søren Ager Meldgaard, Jonas Köhler, Henrik Lund Mortensen, Mads-Peter V. Christiansen, Frank Noé, Bjørk Hammer

(参考訳) 化学空間は、興味深い分子を発見するために機械学習手法によって定期的に探索される。しかし、これらの方法はしばしばグラフ表現に依存し、分子の安定性を決定するのに必要な3d情報を無視している。本稿では,安定性の量子化学予測を可能にする直交座標分子生成のための強化学習手法を提案する。サンプル効率を向上させるために,GDB-11データベース上の模倣学習から基本的な化学規則を学習し,すべての確率論に適用可能な初期モデルを作成する。次に、強化学習環境において、特定の確率論に基づくモデルの複数のコピーをデプロイする。モデルはデータベース内の低エネルギー分子を正確に同定し、トレーニングセットにない新しい異性体を生成する。最後に、このモデルをより大きな分子に適用し、強化学習がトレーニングデータから離れた領域における模倣学習モデルをさらに洗練させることを示す。

Chemical space is routinely explored by machine learning methods to discover interesting molecules before time-consuming experimental synthesis is attempted. However, these methods often rely on a graph representation, ignoring the 3D information necessary for determining the stability of the molecules. We propose a reinforcement learning approach for generating molecules in Cartesian coordinates, allowing for quantum chemical prediction of their stability. To improve sample efficiency, we learn basic chemical rules from imitation learning on the GDB-11 database to create an initial model applicable to all stoichiometries. We then deploy multiple copies of the model, each conditioned on a specific stoichiometry, in a reinforcement learning setting. The models correctly identify low-energy molecules in the database and produce novel isomers not found in the training set. Finally, we apply the model to larger molecules to show how reinforcement learning further refines the imitation learning model in domains far from the training data.

翻訳日:2021-07-14 03:52:14 公開日:2021-07-11

# (参考訳) BCNet:乳がんグレーディングのための深層畳み込みニューラルネットワーク

BCNet: A Deep Convolutional Neural Network for Breast Cancer Grading ( http://arxiv.org/abs/2107.05037v1 )

ライセンス: CC BY 4.0

Pouya Hallaj Zavareh, Atefeh Safayari, Hamidreza Bolhasani

(参考訳) 乳がんは、世界中の人々が影響を受ける最も一般的ながんの1つであり、特定の女性において、人間にとって深刻な脅威となっている。この癌を効果的に治療または予防するために、早期の疾患診断は非常に重要である。画像の使用が支配的な役割を果たさなければならないこの障害を検出する様々な方法がある。近年、深層学習は科学、特に医学において広く採用されている。乳癌検出問題では、さまざまなデータセット上でさまざまなディープラーニング技術が開発され、精度が向上した。本稿では,この画像データベース上の最初のアプリケーションとして,Databioxイメージデータセットから病理像を分類するディープニューラルネットワークモデルを提案する。提案モデルであるBCNetは,VGG16を特徴抽出器として利用可能なモデルから選択するトランスファー学習手法を利用した。さらに,データ不足の問題に対処するため,データ拡張手法を用いて入力データセットの拡張を行った。本研究のすべての実装は、前処理アクションからモデルアーキテクチャの図形まで、tf.keras APIを使用して実施されている。提案したモデル実行の結果,88%の有意な検証精度と72%の評価精度が得られた。

Breast cancer has become one of the most prevalent cancers affecting people all over the world and poses a serious threat to human beings, women in particular. To provide effective treatment or prevention of this cancer, diagnosis in the early stages is of high importance. There are various methods to detect this disease, among which imaging plays a dominant role. Deep learning has recently been adopted widely in different areas of science, especially medicine. For breast cancer detection, diverse deep learning techniques have been developed on different datasets and have achieved good accuracy. In this article, we present a deep neural network model to classify histopathological images from the Databiox image dataset, as the first application on this image database. Our proposed model, named BCNet, takes advantage of the transfer learning approach, in which VGG16 is selected from the available pretrained models as a feature extractor. Furthermore, to address the problem of insufficient data, we employ data augmentation to expand the input dataset. All implementations in this research, ranging from pre-processing to depicting the diagram of the model architecture, were carried out using the tf.keras API. Running the proposed model yielded a validation accuracy of 88% and an evaluation accuracy of 72%.

翻訳日:2021-07-14 03:29:54 公開日:2021-07-11

# (参考訳) 局所性を考慮したマルチビュークラスタリングフレームワーク

Locality Relationship Constrained Multi-view Clustering Framework ( http://arxiv.org/abs/2107.05073v1 )

ライセンス: CC0 1.0

Xiangzhu Meng, Wei Wei, Wenzhe Liu

(参考訳) ほとんどの実用的なアプリケーションでは、異なるビューから複数の機能を使って1つのオブジェクトを表現するのが一般的です。これらの研究の中で、マルチビューサブスペースベースのクラスタリングは、マルチビューデータのクラスタリングソリューションを提供することを目的として、多くの研究者から注目を集めている。しかし、既存の手法のほとんどは、多視点シナリオ下でのサンプル間の局所性幾何学的構造と類似性の関係を十分に利用できない。そこで本研究では,局所性制約付きマルチビュークラスタリングフレームワーク (lrc-mcf) と呼ばれるマルチビュークラスタリングの問題を検討するために,局所性制約付きマルチビュー学習手法を提案する。 LRC-MCFは,多視点間の局所性関係情報と共通類似性関係を捉えることにより,異なる視点間の多様性,幾何学的,コンセンサス,相補的情報を探索することを目的としている。さらに、LCC-MCFは、共通ビューの局所性構造を見つける際に異なる視点の重みを十分に考慮し、最終的なクラスタを直接生成する。学習表現の冗長性を効果的に低減するため、共通類似性行列に対する低ランク制約も考慮される。 LRC-MCFの最小化問題を解決するために、全ての変数を反復的に計算する交代方向最小化(ADM)法が提供される。 7つのベンチマークマルチビューデータセットの広範な実験結果により、lrc-mcf法の有効性が検証された。

In most practical applications, it is common to utilize multiple features from different views to represent one object. Among these works, multi-view subspace-based clustering, which aims to provide clustering solutions for multi-view data, has gained extensive attention from many researchers. However, most existing methods fail to make full use of the local geometric structure and the similarity relationships among samples under the multi-view scenario. To solve these issues, we propose a novel multi-view learning method with a locality relationship constraint to address the problem of multi-view clustering, called Locality Relationship Constrained Multi-view Clustering Framework (LRC-MCF). LRC-MCF aims to explore the diversity, geometric, consensus and complementary information among different views by capturing the locality relationship information and the common similarity relationships among multiple views. Moreover, LRC-MCF takes the weights of different views into account when finding the common-view locality structure and directly produces the final clusters. To effectively reduce the redundancy of the learned representations, a low-rank constraint on the common similarity matrix is additionally considered. To solve the minimization problem of LRC-MCF, an Alternating Direction Minimization (ADM) method is provided to iteratively calculate all variables of LRC-MCF. Extensive experimental results on seven benchmark multi-view datasets validate the effectiveness of the LRC-MCF method.

翻訳日:2021-07-14 03:17:20 公開日:2021-07-11

# (参考訳) SGD: 急激な正規化, バッチサイズ, マルチエポックの役割

SGD: The Role of Implicit Regularization, Batch-size and Multiple-epochs ( http://arxiv.org/abs/2107.05074v1 )

ライセンス: CC BY-SA 4.0

Satyen Kale, Ayush Sekhari, Karthik Sridharan

(参考訳) SGD(Stochastic Gradient Descent)は,大規模過パラメータモデルを用いて学習する方法である。 SGDが実際にうまく機能する理由を説明する一般的な理論は、アルゴリズムが良い解に向けて出力をバイアスする暗黙の正規化を持っていることである。おそらく理論上最もよく知られたsgdの学習設定は確率凸最適化(sco)であり、sgdはサンプル数である$o(1/\sqrt{n})$で学習することがよく知られている。本稿ではSCOの問題点を考察し,SGDにおける暗黙の正規化,バッチサイズ,複数エポックの役割について考察する。主な貢献は3つある: (a) 正規化者にとって、正規化実証リスク最小化が学習に失敗するSCO問題が存在することを示す。これにより、暗黙の正規化に基づくSGDの成功の説明が自動的に除外される。 b)サンプル複雑性の観点から,経験的損失の勾配降下(gd)によるsgdと学習の分離を提供する。任意のステップサイズと反復数を持つ GD が最適以下でしか学べないような SCO 問題が存在することを示す:少なくとも $\widetilde{\Omega}(1/n^{5/12})$。 (c) 一般的に用いられるSGDのマルチエポック版について述べる。最悪の場合、このアルゴリズムはsingle pass sgdと同じくらい優れていることが証明される。しかし、SCOの特定の問題に対して、データセットに複数回のパスを取ることはシングルパスSGDを著しく上回る。我々は,任意のデータ分布に対して学習可能な問題を示すことによって,一般的な学習環境にまで拡張し,この問題に対して,SGDは正規化関数のRERMよりも厳密に優れていることを示す。この結果が深層学習に与える影響について考察し,2層対角型ニューラルネットワークにおけるsgdとermの分離を示す。

Multi-epoch, small-batch, Stochastic Gradient Descent (SGD) has been the method of choice for learning with large over-parameterized models. A popular theory for explaining why SGD works well in practice is that the algorithm has an implicit regularization that biases its output towards a good solution. Perhaps the theoretically best understood learning setting for SGD is that of Stochastic Convex Optimization (SCO), where it is well known that SGD learns at a rate of $O(1/\sqrt{n})$, where $n$ is the number of samples. In this paper, we consider the problem of SCO and explore the role of implicit regularization, batch size and multiple epochs for SGD. Our main contributions are threefold: (a) We show that for any regularizer, there is an SCO problem for which Regularized Empirical Risk Minimization (RERM) fails to learn. This automatically rules out any implicit regularization based explanation for the success of SGD. (b) We provide a separation between SGD and learning via Gradient Descent on empirical loss (GD) in terms of sample complexity. We show that there is an SCO problem such that GD with any step size and number of iterations can only learn at a suboptimal rate: at least $\widetilde{\Omega}(1/n^{5/12})$. (c) We present a multi-epoch variant of SGD commonly used in practice. We prove that this algorithm is at least as good as single-pass SGD in the worst case. However, for certain SCO problems, taking multiple passes over the dataset can significantly outperform single-pass SGD. We extend our results to the general learning setting by showing a problem which is learnable for any data distribution, and for this problem, SGD is strictly better than RERM for any regularization function. We conclude by discussing the implications of our results for deep learning, and show a separation between SGD and ERM for two-layer diagonal neural networks.
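
To get a feel for the size of the separation in (b), the two rates quoted above can be compared numerically. The sketch below only evaluates the stated scales $n^{-1/2}$ and $n^{-5/12}$ with constants and logarithmic factors dropped, which is a simplifying assumption; the function names are hypothetical and nothing here reproduces the paper's constructions.

```python
def sgd_rate(n):
    """Scale n^(-1/2) of the SGD rate quoted in the abstract
    (constants dropped, by assumption)."""
    return n ** -0.5


def gd_lower_bound(n):
    """Scale n^(-5/12) of the GD lower bound quoted in the abstract
    (constants and log factors dropped, by assumption)."""
    return n ** (-5.0 / 12.0)


def gap_factor(n):
    """Ratio of the two scales; algebraically this equals n^(1/12)."""
    return gd_lower_bound(n) / sgd_rate(n)
```

Because the ratio is $n^{1/12}$, the gap grows slowly with the sample size: under these simplifications it reaches a factor of about 10 only at $n = 10^{12}$.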

翻訳日:2021-07-14 02:56:18 公開日:2021-07-11

# (参考訳) 畳み込みニューラルネットワークを用いた肺結節の分類における入力サイズの影響

Effect of Input Size on the Classification of Lung Nodules Using Convolutional Neural Networks ( http://arxiv.org/abs/2107.05085v1 )

ライセンス: CC BY 4.0

Gorkem Polat, Yesim Dogrusoz Serinagaoglu, Ugur Halici

(参考訳) 近年,CTを用いた肺がん検診は,従来の胸部X線撮影と比較して肺がん死亡率を20%低下させることが示された。したがって、CT肺検診は世界中で広く使われ始めている。しかし、これらの画像の解析は放射線科医にとって深刻な負担である。 CTスキャンのスライス数は最大600までである。したがって、コンピュータ支援検出(CAD)システムは、データのより高速かつ正確な評価に非常に重要である。本研究では, 畳み込みニューラルネットワーク(CNN)を用いてCT肺検診を解析し, 偽陽性を減少させる枠組みを提案する。我々は、異なるボリュームサイズでモデルをトレーニングし、ボリュームサイズがシステムの性能に重要な役割を果たすことを示した。また, 核融合により, 核融合のパワーと全体の精度について検討した。 3dデータに適用された2次元畳み込み操作は情報損失をもたらす可能性があるため、3d cnnは2d cnnよりも好まれる。提案したフレームワークはLUNA16 Challengeで提供されるデータセット上でテストされ、1スキャンあたりの偽陽性で0.831の感度が得られた。

Recent studies have shown that lung cancer screening using annual low-dose computed tomography (CT) reduces lung cancer mortality by 20% compared to traditional chest radiography. Therefore, CT lung screening has started to be used widely all across the world. However, analyzing these images is a serious burden for radiologists. The number of slices in a CT scan can be up to 600. Therefore, computer-aided-detection (CAD) systems are very important for faster and more accurate assessment of the data. In this study, we proposed a framework that analyzes CT lung screenings using convolutional neural networks (CNNs) to reduce false positives. We trained our model with different volume sizes and showed that volume size plays a critical role in the performance of the system. We also used different fusions in order to show their power and effect on the overall accuracy. 3D CNNs were preferred over 2D CNNs because 2D convolutional operations applied to 3D data could result in information loss. The proposed framework has been tested on the dataset provided by the LUNA16 Challenge and resulted in a sensitivity of 0.831 at 1 false positive per scan.

翻訳日:2021-07-14 02:54:54 公開日:2021-07-11

# (参考訳) ニューラルネットワークを用いたビデオからのリモート酸素推定

Remote Blood Oxygen Estimation From Videos Using Neural Networks ( http://arxiv.org/abs/2107.05087v1 )

ライセンス: CC BY 4.0

Joshua Mathew, Xin Tian, Min Wu, Chau-Wai Wong

(参考訳) 血液酸素飽和度(SpO$_2$)は呼吸機能の不可欠な指標であり、新型コロナウイルスのパンデミックで注目されている。臨床所見から、COVID-19患者は明らかな症状の前にSpO$_2$が著しく低い可能性が示唆された。カメラの普及により、研究者はSpO$_2$をビデオで監視する方法を調査する動機となった。スマートフォンに関するほとんどの以前のスキームはコンタクトベースで、スマートフォンのカメラと近くの光源を覆うために指先が必要で、照らされた組織から再放射された光を捉える。本稿では,スマートフォンカメラを用いた最初の畳み込みニューラルネットワークを用いた非接触SpO$_2$推定方式を提案する。このスキームは、生理的感覚のために参加者の手のビデオを分析し、便利で快適で、プライバシーを保護し、マスクを装着することを可能にする。我々は,SpO$_2$測定のための光生理学的モデルに触発されたニューラルネットワークアーキテクチャを設計し,チャネル結合の重みを可視化することにより説明可能性を示す。提案手法は,接触型SpO$_2$測定のための最先端モデルよりも優れており,公衆衛生に寄与する可能性を示している。また,スキンタイプと手の側面がSpO$_2$推定性能に及ぼす影響についても検討した。

Blood oxygen saturation (SpO$_2$) is an essential indicator of respiratory functionality and is receiving increasing attention during the COVID-19 pandemic. Clinical findings show that it is possible for COVID-19 patients to have significantly low SpO$_2$ before any obvious symptoms. The prevalence of cameras has motivated researchers to investigate methods for monitoring SpO$_2$ using videos. Most prior schemes involving smartphones are contact-based: They require a fingertip to cover the phone's camera and the nearby light source to capture re-emitted light from the illuminated tissue. In this paper, we propose the first convolutional neural network based noncontact SpO$_2$ estimation scheme using smartphone cameras. The scheme analyzes the videos of a participant's hand for physiological sensing, which is convenient and comfortable, and can protect their privacy and allow for keeping face masks on. We design our neural network architectures inspired by the optophysiological models for SpO$_2$ measurement and demonstrate the explainability by visualizing the weights for channel combination. Our proposed models outperform the state-of-the-art model that is designed for contact-based SpO$_2$ measurement, showing the potential of our proposed method to contribute to public health. We also analyze the impact of skin type and the side of a hand on SpO$_2$ estimation performance.

翻訳日:2021-07-14 02:36:57 公開日:2021-07-11

# (参考訳) Fairer Software Made Easier("Keys"を使用)

Fairer Software Made Easier (using "Keys") ( http://arxiv.org/abs/2107.05088v1 )

ライセンス: CC BY 4.0

Tim Menzies, Kewen Peng, Andre Lustosa

(参考訳) ソフトウェア分析の説明を簡単にできますか? たぶん。最近の結果は、しばしば「キー効果」、すなわち「キー効果」を示すことを示している。いくつかの重要な機能が残りを制御する。言うまでもなく、いくつかのキーで制御されたシステムでは、説明と制御はキーをまたいでいくつかの"what-if"クエリを実行するだけの問題です。鍵効果を利用することで、倫理的AIシステムに必要なものなど、複雑な説明を劇的に単純化することが可能になる。

Can we simplify explanations for software analytics? Maybe. Recent results show that systems often exhibit a "keys effect"; i.e. a few key features control the rest. Just to say the obvious, for systems controlled by a few keys, explanation and control is just a matter of running a handful of "what-if" queries across the keys. By exploiting the keys effect, it should be possible to dramatically simplify even complex explanations, such as those required for ethical AI systems.

翻訳日:2021-07-14 02:22:13 公開日:2021-07-11

# (参考訳) アフリカ農業分野における機械学習の課題と機会--一般論から

Machine Learning Challenges and Opportunities in the African Agricultural Sector -- A General Perspective ( http://arxiv.org/abs/2107.05101v1 )

ライセンス: CC BY-SA 4.0

Racine Ly

(参考訳) コンピュータの能力の向上、アルゴリズム技術の進歩、利用可能なデータの大幅な増加は、最近の人工知能(AI)技術の発展を可能にした。機械学習(ML)と呼ばれるその分野の1つでは、視覚、スピーチ、問題解決といった人間の知性に起因する特徴を模倣する能力が強い。しかし、以前の技術革命が示唆しているように、彼らの最も大きな影響は、そのテクノロジーの伝統的なユーザーではない他の分野にほとんど期待できる。農業部門はアフリカ経済にとって不可欠であり、気候変動時代には収量の改善、損失軽減、天然資源の効率的な管理が不可欠である。機械学習は、予測を行う上で付加価値を持つ技術であり、それゆえ、農業セクターにおける不確実性とリスクを低減する可能性がある。本研究の目的は、アフリカ農業におけるMLベースのソリューションの障壁を文脈化し、議論することである。第2部では、歴史的・技術的観点からのML技術の概要とその推進力について概説した。第3部では,農業におけるMLの利用状況について概説した。最後に、第4節では、アフリカにおけるMLの成長と、農業分野におけるMLベースのソリューションの作成と利用における潜在的な障壁について論じる。

The improvement of computers' capacities, advancements in algorithmic techniques, and the significant increase of available data have enabled the recent developments of Artificial Intelligence (AI) technology. One of its branches, called Machine Learning (ML), has shown strong capacities in mimicking characteristics attributed to human intelligence, such as vision, speech, and problem-solving. However, as previous technological revolutions suggest, its most significant impacts can mostly be expected in other sectors that were not traditional users of that technology. The agricultural sector is vital for African economies; improving yields, mitigating losses, and effective management of natural resources are crucial in a climate change era. Machine Learning is a technology with an added value in making predictions, hence the potential to reduce uncertainties and risk across sectors, in this case, the agricultural sector. The purpose of this paper is to contextualize and discuss barriers to ML-based solutions for African agriculture. In the second section, we provide an overview of ML technology from a historical and technical perspective and its main driving force. In the third section, we provide a brief review of the current use of ML in agriculture. Finally, in the fourth section, we discuss the growing interest in ML in Africa and the potential barriers to creating and using ML-based solutions in the agricultural sector.

翻訳日:2021-07-14 02:07:30 公開日:2021-07-11

# (参考訳) Repo2Vec:リポジトリの類似性決定のための包括的埋め込みアプローチ

Repo2Vec: A Comprehensive Embedding Approach for Determining Repository Similarity ( http://arxiv.org/abs/2107.05112v1 )

ライセンス: CC BY 4.0

Md Omar Faruk Rokon, Pei Yan, Risul Islam, Michalis Faloutsos

(参考訳) githubのような大規模なオンラインアーカイブの中で、類似したリポジトリやクラスタをどうやって特定できるのでしょう? リポジトリの類似性の決定は、このようなソフトウェアエコシステムのダイナミクスと進化を研究する上で不可欠な構成要素である。重要な課題は、さまざまなリポジトリ機能の適切な表現を決定することである。 (a) 利用可能な情報のすべての側面をキャプチャし、 (b) MLalgorithmsによって容易に使用することができる。本稿では,リポジトリを分散ベクタとして表現するための総合的な埋め込み手法であるRepo2Vecを提案する。 a)メタデータ、(b)レポジトリの構造、(c)ソースコードの3つのタイプの情報について検討しています。また、これらの情報型を単一の埋め込みに表現し、組み合わせるための一連の埋め込みアプローチも導入します。この手法をGitHubから2つの実際のデータセットで評価し、1013リポジトリを組み合わせた。まず,提案手法が精度(93%vs78%)で従来の手法を上回り,ほぼ2倍の類似リポジトリと30%の偽陽性率を示した。次に,repo2vecが, (a) マルウェアと良性リポジトリの区別, (b) 有意義な階層的クラスタリングの識別といった,確かな基盤を提供する方法を示す。例えば、マルウェアと良性リポジトリの区別において、98%の精度と96%のリコールを実現しています。全体的な作業は、ターゲットプラットフォームや意図によるリポジトリ分類、コード再利用とクローンの検出、系統と進化の特定など、多くのリポジトリ分析機能を実現するための基本的なビルディングブロックです。

How can we identify similar repositories and clusters among a large online archive, such as GitHub? Determining repository similarity is an essential building block in studying the dynamics and the evolution of such software ecosystems. The key challenge is to determine the right representation for the diverse repository features in a way that: (a) it captures all aspects of the available information, and (b) it is readily usable by ML algorithms. We propose Repo2Vec, a comprehensive embedding approach to represent a repository as a distributed vector by combining features from three types of information sources. As our key novelty, we consider three types of information: (a) metadata, (b) the structure of the repository, and (c) the source code. We also introduce a series of embedding approaches to represent and combine these information types into a single embedding. We evaluate our method with two real datasets from GitHub for a combined 1013 repositories. First, we show that our method outperforms previous methods in terms of precision (93% vs 78%), with nearly twice as many Strongly Similar repositories and 30% fewer False Positives. Second, we show how Repo2Vec provides a solid basis for: (a) distinguishing between malware and benign repositories, and (b) identifying a meaningful hierarchical clustering. For example, we achieve 98% precision and 96% recall in distinguishing malware and benign repositories. Overall, our work is a fundamental building block for enabling many repository analysis functions such as repository categorization by target platform or intention, detecting code-reuse and clones, and identifying lineage and evolution.

翻訳日:2021-07-14 01:52:06 公開日:2021-07-11

# (参考訳) Deep Collaborative Filtering-based Method for Image Denoisingの詳細

Details Preserving Deep Collaborative Filtering-Based Method for Image Denoising ( http://arxiv.org/abs/2107.05115v1 )

ライセンス: CC BY 4.0

Basit O. Alawode, Mudassir Masood, Tarig Ballal, and Tareq Al-Naffouri

(参考訳) 何年もの間、複数のデノイジングアルゴリズムによって達成された改善にもかかわらず、その多くはデノイジング後の画像の細部を保存できていない。これは、画像に対する滑らかな効果の結果である。ほとんどのニューラルネットワークベースのアルゴリズムは、古典的な推論アルゴリズムよりも優れた量的性能を達成している。しかし、スムーズなアウト効果の結果、質的な(視覚的な)パフォーマンスに悩まされる。本稿では,この問題に対処するアルゴリズムを提案する。本稿では,画像デノイジングのための深い協調フィルタリング(deep-cofib)アルゴリズムを提案する。このアルゴリズムは、最適化されたニューラルネットワークモデルのセットを使用してスパース領域における画像パッチの協調分解を行う。これにより、ノイズ除去と詳細保存のトレードオフを良好に得ることができる高速アルゴリズムが得られる。大規模な実験により、DeepCoFiBは(PSNRとSSIMの観点から)定量的に、そして(視覚的に)多くの最先端の復調アルゴリズムより質的に(定量的に)優れていることが示された。

In spite of the improvements achieved by the several denoising algorithms over the years, many of them still fail at preserving the fine details of the image after denoising. This is as a result of the smooth-out effect they have on the images. Most neural network-based algorithms have achieved better quantitative performance than the classical denoising algorithms. However, they also suffer from qualitative (visual) performance as a result of the smooth-out effect. In this paper, we propose an algorithm to address this shortcoming. We propose a deep collaborative filtering-based (Deep-CoFiB) algorithm for image denoising. This algorithm performs collaborative denoising of image patches in the sparse domain using a set of optimized neural network models. This results in a fast algorithm that is able to excellently obtain a trade-off between noise removal and details preservation. Extensive experiments show that the DeepCoFiB performed quantitatively (in terms of PSNR and SSIM) and qualitatively (visually) better than many of the state-of-the-art denoising algorithms.

翻訳日:2021-07-14 01:32:46 公開日:2021-07-11

# (参考訳) eGHWT:拡張一般化Haar-Walsh変換

eGHWT: The extended Generalized Haar-Walsh Transform ( http://arxiv.org/abs/2107.05121v1 )

ライセンス: CC BY 4.0

Naoki Saito and Yiqun Shao

(参考訳) 正規格子の古典的設定からグラフやネットワークのより一般的な設定への計算調和解析ツールの拡張は非常に重要であり、近年多くの研究が行われている。 irion and saito (2014) によって開発された一般化ハール・ウォルシュ変換(ghwt)は、古典的なハール変換とウォルシュ・ハダマード変換の一般化であるグラフ上の信号に対する多スケール変換である。我々は、Thiele と Villemoes (1996) の適応時間周波数タイリングの一般化である拡張一般化Har-Walsh変換(eGHWT)を提案する。 eGHWTはグラフ領域分割の効率だけでなく、"シーケンス領域"分割の効率も同時に調べている。その結果、グラフ信号に対するeGHWTとその関連するベストベーシ選択アルゴリズムは、類似の計算コストである$O(N \log N)$,$N$が入力グラフのノード数である場合、前回のGHWTの性能を大幅に向上させる。 ghwt best-basisアルゴリズムは、$(1.5)^n$が$\mathbb{r}^n$で可能な正規直交基底のうち、与えられたタスクの最も適切な正規直交基底を求めるが、eghwt best-basisアルゴリズムは$0.618\cdot(1.84)^n$ で可能な正規直交基底を$\mathbb{r}^n$ で検索することで、より良いものを見つけることができる。本稿では,eGHWTベストベージアルゴリズムの詳細を説明し,グラフ信号として見る従来のデジタル画像だけでなく,真のグラフ信号を含むいくつかの例を用いて,その優位性を実証する。さらに, eghwtを2次元信号や行列データに拡張する方法を, 列や列から生成されたグラフのテンソル積として見ることにより示し, 画像近似などのアプリケーションでの有効性を示す。

Extending computational harmonic analysis tools from the classical setting of regular lattices to the more general setting of graphs and networks is very important and much research has been done recently. The Generalized Haar-Walsh Transform (GHWT) developed by Irion and Saito (2014) is a multiscale transform for signals on graphs, which is a generalization of the classical Haar and Walsh-Hadamard Transforms. We propose the extended Generalized Haar-Walsh Transform (eGHWT), which is a generalization of the adapted time-frequency tilings of Thiele and Villemoes (1996). The eGHWT examines not only the efficiency of graph-domain partitions but also that of "sequency-domain" partitions simultaneously. Consequently, the eGHWT and its associated best-basis selection algorithm for graph signals significantly improve the performance of the previous GHWT with the similar computational cost, $O(N \log N)$, where $N$ is the number of nodes of an input graph. While the GHWT best-basis algorithm seeks the most suitable orthonormal basis for a given task among more than $(1.5)^N$ possible orthonormal bases in $\mathbb{R}^N$, the eGHWT best-basis algorithm can find a better one by searching through more than $0.618\cdot(1.84)^N$ possible orthonormal bases in $\mathbb{R}^N$. This article describes the details of the eGHWT best-basis algorithm and demonstrates its superiority using several examples including genuine graph signals as well as conventional digital images viewed as graph signals. Furthermore, we also show how the eGHWT can be extended to 2D signals and matrix-form data by viewing them as a tensor product of graphs generated from their columns and rows and demonstrate its effectiveness on applications such as image approximation.
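
To get a feel for the two lower bounds quoted above, the following minimal sketch evaluates $(1.5)^N$ and $0.618\cdot(1.84)^N$ for a few values of $N$. The constants come directly from the abstract; the function names are illustrative and do not appear in the paper.

```python
def ghwt_basis_lower_bound(n: int) -> float:
    """Lower bound (1.5)^N on the bases searched by the GHWT best-basis algorithm."""
    return 1.5 ** n


def eghwt_basis_lower_bound(n: int) -> float:
    """Lower bound 0.618 * (1.84)^N on the bases searched by the eGHWT best-basis algorithm."""
    return 0.618 * 1.84 ** n


if __name__ == "__main__":
    # Print both bounds and their ratio for a few graph sizes.
    for n in (4, 8, 16, 32):
        g = ghwt_basis_lower_bound(n)
        e = eghwt_basis_lower_bound(n)
        print(f"N={n:2d}  GHWT > {g:.3e}  eGHWT > {e:.3e}  ratio = {e / g:.3e}")
```

In this arithmetic the eGHWT bound overtakes the GHWT bound from $N = 3$ onward, and the ratio between them grows by a factor of $1.84/1.5 \approx 1.23$ per additional node.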

翻訳日:2021-07-14 01:22:05 公開日:2021-07-11

# (参考訳) 早期行動認識のための解釈可能なDeep Feature Propagation

Interpretable Deep Feature Propagation for Early Action Recognition ( http://arxiv.org/abs/2107.05122v1 )

ライセンス: CC BY 4.0

He Zhao, Richard P. Wildes

(参考訳) 限られた予備観測からの初期アクション認識(アクション予測)は、リアルタイムな推論を必要とするストリーミング視覚システムにとって重要な役割を担っている。本研究では,空間的特徴空間における行動パターンの時間的変化を解明し,行動予測に対処する。私たちのシステムには3つの重要なコンポーネントがあります。まず、空間レイアウトを維持しながら、生データからの抽象化を可能にする中間層convnet機能を扱う。第二に、各特徴を伝播するのではなく、その残余を時間にわたって伝播し、冗長性を減少させるコンパクトな表現を可能にします。第3に、エラーのビルドと予測開始時間の統一にKalmanフィルタを使用します。複数のベンチマークでの大規模な実験結果から,本手法は動作予測における競合性能をもたらすことが示された。特筆すべきは,我々のシステムの学習した構成要素を,その不透明な性質を2つの方法で照らすことである。まず,我々の学習した特徴伝達モジュールが畳み込み下での空間シフト機構として機能し,現在の観測を未来に伝播させることを示す。これにより、フローベースの画像動き情報をキャプチャする。第2に,学習したカルマンフィルタは事前推定を適応的に更新し,シーケンス学習を支援する。

Early action recognition (action prediction) from limited preliminary observations plays a critical role for streaming vision systems that demand real-time inference, as video actions often possess elongated temporal spans which cause undesired latency. In this study, we address action prediction by investigating how action patterns evolve over time in a spatial feature space. There are three key components to our system. First, we work with intermediate-layer ConvNet features, which allow for abstraction from raw data, while retaining spatial layout. Second, instead of propagating features per se, we propagate their residuals across time, which allows for a compact representation that reduces redundancy. Third, we employ a Kalman filter to combat error build-up and unify across prediction start times. Extensive experimental results on multiple benchmarks show that our approach leads to competitive performance in action prediction. Notably, we investigate the learned components of our system to shed light on their otherwise opaque natures in two ways. First, we document that our learned feature propagation module works as a spatial shifting mechanism under convolution to propagate current observations into the future. Thus, it captures flow-based image motion information. Second, the learned Kalman filter adaptively updates prior estimation to aid the sequence learning process.
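
For readers unfamiliar with the Kalman filter mentioned above, here is a minimal sketch of the textbook scalar predict/update cycle. It is only a generic illustration: the paper's filter is learned and operates on ConvNet feature residuals, whereas the identity dynamics, the fixed noise variances `q` and `r`, and the function name below are assumptions made for this example.

```python
def kalman_step(x_prev, p_prev, z, q=1e-2, r=1e-1):
    """One scalar Kalman predict/update cycle (illustrative, not the authors' module).

    x_prev, p_prev: previous state estimate and its variance
    z:              new observation
    q, r:           assumed process and observation noise variances
    """
    # Predict: identity dynamics, so only the uncertainty grows.
    x_pred = x_prev
    p_pred = p_prev + q
    # Update: the gain weighs the prediction against the observation.
    gain = p_pred / (p_pred + r)
    x_new = x_pred + gain * (z - x_pred)
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new, gain


if __name__ == "__main__":
    x, p = 0.0, 1.0
    for z in (1.0, 0.9, 1.1, 1.0):
        x, p, k = kalman_step(x, p, z)
        print(f"obs={z:.1f}  estimate={x:.3f}  variance={p:.3f}  gain={k:.3f}")
```

The gain shrinks as the variance drops, which is the mechanism that keeps propagated errors from accumulating over long prediction horizons.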

翻訳日:2021-07-14 01:20:32 公開日:2021-07-11

# (参考訳) eコマースセッションベースレコメンデーションのためのマルチモーダル機能とポストフュージョンコンテキストを備えたトランスフォーマー

Transformers with multi-modal features and post-fusion context for e-commerce session-based recommendation ( http://arxiv.org/abs/2107.05124v1 )

ライセンス: CC BY-SA 4.0

Gabriel de Souza P. Moreira and Sara Rabhi and Ronay Ak and Md Yasin Kabir and Even Oldridge

(参考訳) セッションベースのレコメンデーションはEコマースサービスにとって重要なタスクであり、多数のユーザが匿名でブラウズしたり、異なるセッションに対して非常に異なる関心を持つことがある。本稿では,SIGIR 2021 Workshop on E-Commerce Data Challenge の推薦課題における勝者の1つについて述べる。私たちのソリューションはnlp技術にインスパイアされ、transformer-xlとxlnetという2つのトランスフォーマーアーキテクチャで構成されています。コンペで利用可能なリッチデータセットのほとんどを活用するために、表形式のイベントとテキストベクトルと画像ベクトルを組み合わせることで、マルチモデル機能をどのように準備したかを述べる。また,セッションベースレコメンデーションにおけるアーキテクチャの有効性をよりよく理解するために,モデル予測分析を提案する。

Session-based recommendation is an important task for e-commerce services, where a large number of users browse anonymously or may have very distinct interests for different sessions. In this paper we present one of the winning solutions for the Recommendation task of the SIGIR 2021 Workshop on E-commerce Data Challenge. Our solution was inspired by NLP techniques and consists of an ensemble of two Transformer architectures - Transformer-XL and XLNet - trained with autoregressive and autoencoding approaches. To leverage most of the rich dataset made available for the competition, we describe how we prepared multi-modal features by combining tabular events with textual and image vectors. We also present a model prediction analysis to better understand the effectiveness of our architectures for the session-based recommendation.

翻訳日:2021-07-14 00:52:23 公開日:2021-07-11

# (参考訳) ビデオ予測理解の概観:早期行動認識と今後の行動予測

Review of Video Predictive Understanding: Early Action Recognition and Future Action Prediction ( http://arxiv.org/abs/2107.05140v1 )

ライセンス: CC BY 4.0

He Zhao, Richard P. Wildes

(参考訳) ビデオ予測理解は、現在および過去のビデオ観察から、まだ観測されていない未来を予測することに関する幅広い取り組みを含んでいる。アクション予測はビデオ予測理解の主要な部分であり、このレビューの焦点となっている。この亜領域には、早期行動認識と将来の行動予測という2つの大きな区分がある。早期行動認識は、進行中の行動をできるだけ早く認識することに関心がある。将来の行動予測は、以前に観察された行動に従う行動の予測に関係している。いずれの場合も、過去、現在、および潜在的将来の情報の間の因果(causal)関係が主な焦点である。 Markov Chains、Gaussian Processes、Auto-Regressive Modeling、Bayesian Recursive Filteringといった様々な数学的ツールが、これらの2つのタスクのコンピュータビジョン技術と共同で広く採用されている。しかし、これらのアプローチは、次元性の呪い、一般化の貧弱、ドメイン固有の知識からの制約といった課題に直面している。近年,既存の視覚タスク,一般的には行動予測タスクの性能向上のために,深部畳み込みニューラルネットワークや繰り返しニューラルネットワークに依存する構造が広く提案されている。しかし、これらには独自の欠点、大規模なトレーニングデータへの依存、強力な理論的基盤の欠如がある。本調査では,最近注目され,実用的価値が証明された映像予測理解の幅広い領域のサブ領域の導入から始める。次に、様々な早期行動認識と将来の行動予測アルゴリズムの徹底的なレビューを行い、適切に整理された分割を行う。最後に、今後の研究方針で議論を締めくくります。

Video predictive understanding encompasses a wide range of efforts that are concerned with the anticipation of the unobserved future from the current as well as historical video observations. Action prediction is a major sub-area of video predictive understanding and is the focus of this review. This sub-area has two major subdivisions: early action recognition and future action prediction. Early action recognition is concerned with recognizing an ongoing action as soon as possible. Future action prediction is concerned with the anticipation of actions that follow those previously observed. In either case, the causal relationship between the past, current, and potential future information is the main focus. Various mathematical tools such as Markov Chains, Gaussian Processes, Auto-Regressive modeling, and Bayesian recursive filtering are widely adopted jointly with computer vision techniques for these two tasks. However, these approaches face challenges such as the curse of dimensionality, poor generalization, and constraints from domain-specific knowledge. Recently, structures that rely on deep convolutional neural networks and recurrent neural networks have been extensively proposed for improving the performance of existing vision tasks, in general, and action prediction tasks, in particular. However, they have their own shortcomings, e.g., reliance on massive training data and lack of strong theoretical underpinnings. In this survey, we start by introducing the major sub-areas of the broad area of video predictive understanding, which recently have received intensive attention and proven to have practical value. Next, a thorough review of various early action recognition and future action prediction algorithms is provided with suitably organized divisions. Finally, we conclude our discussion with future research directions.

翻訳日:2021-07-14 00:40:12 公開日:2021-07-11

# (参考訳) CFTrack:3Dマルチオブジェクトトラッキングのためのセンターベースレーダとカメラフュージョン

CFTrack: Center-based Radar and Camera Fusion for 3D Multi-Object Tracking ( http://arxiv.org/abs/2107.05150v1 )

ライセンス: CC BY 4.0

Ramin Nabati, Landon Harris, Hairong Qi

(参考訳) 3Dマルチオブジェクトトラッキングは、自動運転車の認識システムにおいて重要な要素である。障害物回避や経路計画といったタスクには、車両周辺のすべての動的オブジェクトを追跡することが不可欠である。自動運転車は通常、精度と信頼性を向上させるために異なるセンサーモードを備えている。近年、センサ融合は物体検出ネットワークで広く使われているが、既存のマルチオブジェクト追跡アルゴリズムのほとんどは単一の入力モダリティに依存するか、複数のセンシングモダリティによって提供される情報を十分に活用していない。本研究では,レーダとカメラセンサの融合に基づく共同物体検出・追跡のためのエンドツーエンドネットワークを提案する。提案手法では,物体検出に中心型レーダカメラ融合アルゴリズムを用い,物体関連にグリーディアルゴリズムを用いる。提案手法は,検出した物体の深さ,速度,2次元変位を時間とともに関連づける。これにより、我々の追跡アルゴリズムは、物体をオクルードし重ね合わせるのに非常に頑健になり、深さと速度の情報がネットワークの識別に役立ちます。提案手法は,20.0AMOTAを達成し,ベンチマークにおけるすべての視覚ベースの3Dトラッキング手法と,ベースラインのLiDARベースの手法とを比較検討する。提案手法は画像当たり35msのランタイムを持つオンラインであり,自動運転アプリケーションに適している。

3D multi-object tracking is a crucial component in the perception system of autonomous driving vehicles. Tracking all dynamic objects around the vehicle is essential for tasks such as obstacle avoidance and path planning. Autonomous vehicles are usually equipped with different sensor modalities to improve accuracy and reliability. While sensor fusion has been widely used in object detection networks in recent years, most existing multi-object tracking algorithms either rely on a single input modality, or do not fully exploit the information provided by multiple sensing modalities. In this work, we propose an end-to-end network for joint object detection and tracking based on radar and camera sensor fusion. Our proposed method uses a center-based radar-camera fusion algorithm for object detection and utilizes a greedy algorithm for object association. The proposed greedy algorithm uses the depth, velocity and 2D displacement of the detected objects to associate them through time. This makes our tracking algorithm very robust to occluded and overlapping objects, as the depth and velocity information can help the network in distinguishing them. We evaluate our method on the challenging nuScenes dataset, where it achieves 20.0 AMOTA and outperforms all vision-based 3D tracking methods in the benchmark, as well as the baseline LiDAR-based method. Our method is online with a runtime of 35ms per image, making it very suitable for autonomous driving applications.
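
The greedy association step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract only says that depth, velocity and 2D displacement are used, so the weighted-sum cost, the weights, the `max_cost` threshold and the dictionary layout are all hypothetical.

```python
import math


def association_cost(track, det, w_depth=1.0, w_vel=1.0, w_disp=1.0):
    """Hypothetical cost: weighted sum of depth, velocity and 2D center differences."""
    d_depth = abs(track["depth"] - det["depth"])
    d_vel = abs(track["velocity"] - det["velocity"])
    d_disp = math.hypot(track["center"][0] - det["center"][0],
                        track["center"][1] - det["center"][1])
    return w_depth * d_depth + w_vel * d_vel + w_disp * d_disp


def greedy_associate(tracks, detections, max_cost=10.0):
    """Match tracks to detections greedily, cheapest pair first.

    Returns a dict mapping track index -> detection index.
    """
    pairs = sorted(
        (association_cost(t, d), ti, di)
        for ti, t in enumerate(tracks)
        for di, d in enumerate(detections)
    )
    matches, used_dets = {}, set()
    for cost, ti, di in pairs:
        if cost > max_cost:
            break  # all remaining pairs are even more expensive
        if ti in matches or di in used_dets:
            continue  # each track and each detection is used at most once
        matches[ti] = di
        used_dets.add(di)
    return matches


if __name__ == "__main__":
    # Two objects that nearly overlap in the image but differ in depth and velocity.
    tracks = [
        {"center": (100.0, 100.0), "depth": 10.0, "velocity": 5.0},
        {"center": (102.0, 100.0), "depth": 30.0, "velocity": -3.0},
    ]
    detections = [
        {"center": (101.0, 100.0), "depth": 30.5, "velocity": -3.0},
        {"center": (101.0, 100.0), "depth": 10.2, "velocity": 5.1},
    ]
    print(greedy_associate(tracks, detections))
```

In the example the two detections share the same image position, so 2D displacement alone cannot separate them; the depth and velocity terms are what resolve the match, which is the robustness to overlap that the abstract points to.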

翻訳日:2021-07-14 00:38:47 公開日:2021-07-11

# (参考訳) 学術論文への埋め込み:単語埋め込みとTFIDFの有効性

Document Embedding for Scientific Articles: Efficacy of Word Embeddings vs TFIDF ( http://arxiv.org/abs/2107.05151v1 )

ライセンス: CC BY 4.0

H.J. Meijer, J. Truong, R. Karimi

(参考訳) ここ数年、ニューラルネットワークによる単語の埋め込みは自然言語処理の文献で人気を博した。研究は主に、ウィキペディアや他のニュースやソーシャルメディアソースなどの公開コーパスで訓練された単語埋め込みの品質と応用に焦点を当てている。しかし、これらの研究は一般的なテキストに限られており、それゆえに専門的な語彙や略語、学術的な文脈で一般的に用いられる科学的公式のような技術的・科学的ニュアンスを欠いている。本研究は,大規模学術コーパスに適用した単語埋め込みの性能に着目した。具体的には、訓練された単語埋め込みの品質と効率を、科学論文のコンテンツモデリングにおけるTFIDF表現と比較する。我々は、約7000万の科学論文のタイトルと要約に基づいて訓練されたWord2vecスキップグラムモデルを使用する。さらに,コンテンツモデルを科学的文脈で評価するベンチマークを開発した。このベンチマークは、2017年に発行された約13万記事の論文とジャーナルをマッチングする分類タスクに基づいている。以上の結果から,単語埋め込みに基づくコンテンツモデルはタイトル(短文)ではよいが,TFIDFは抽象文(長文)ではよいことがわかった。しかし、より大きなテキストに対するtfidfのわずかな改善は、3.7倍のメモリ要求と最大184倍の計算時間を犠牲にして、オンラインアプリケーションでは非効率になる可能性がある。さらに,組込みモデルを用いて2次元のジャーナルの可視化を行い,定性的に組込みモデルを検査した。このグラフは有用な洞察を示し、新しいジャーナルを提案するための競合ジャーナルやギャップを見つけるために使用できる。

Over the last few years, neural network derived word embeddings became popular in the natural language processing literature. Studies conducted have mostly focused on the quality and application of word embeddings trained on publicly available corpora such as Wikipedia or other news and social media sources. However, these studies are limited to generic text and thus lack technical and scientific nuances such as domain specific vocabulary, abbreviations, or scientific formulas which are commonly used in academic contexts. This research focuses on the performance of word embeddings applied to a large scale academic corpus. More specifically, we compare quality and efficiency of trained word embeddings to TFIDF representations in modeling content of scientific articles. We use a word2vec skip-gram model trained on titles and abstracts of about 70 million scientific articles. Furthermore, we have developed a benchmark to evaluate content models in a scientific context. The benchmark is based on a categorization task that matches articles to journals for about 1.3 million articles published in 2017. Our results show that content models based on word embeddings are better for titles (short text) while TFIDF works better for abstracts (longer text). However, the slight improvement of TFIDF for larger text comes at the expense of 3.7 times more memory requirement as well as up to 184 times higher computation times which may make it inefficient for online applications. In addition, we have created a 2-dimensional visualization of the journals modeled via embeddings to qualitatively inspect the embedding model. This graph shows useful insights and can be used to find competitive journals or gaps to propose new journals.
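
As a reminder of what a TFIDF content model looks like, the sketch below builds sparse TF-IDF vectors for a few toy titles and compares them by cosine similarity. The abstract does not specify tokenization or the exact weighting, so whitespace tokenization and the plain `tf * log(N / df)` variant used here are assumptions, as are the function names.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document (tf * log(N / df))."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    titles = [
        "graph neural networks",
        "graph signal processing",
        "protein folding dynamics",
    ]
    vecs = tfidf_vectors(titles)
    print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

Each vector has one dimension per vocabulary term, so the representation grows with the corpus vocabulary while an embedding keeps a fixed dimensionality. That difference is consistent with the higher memory and computation costs the abstract reports for TFIDF.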

翻訳日:2021-07-14 00:25:45 公開日:2021-07-11

# 階層ラッパーを用いた因果発見の効率と精度の向上

Improving Efficiency and Accuracy of Causal Discovery Using a Hierarchical Wrapper ( http://arxiv.org/abs/2107.05001v1 )

ライセンス: Link先を確認

Shami Nisimov, Yaniv Gurwicz, Raanan Y. Rohekar, Gal Novik

(参考訳) 観測データからの因果発見は多くの科学分野において重要なツールである。特定の仮定の下では、科学者は現象を説明し、予測し、決定することができる。大規模なサンプルリミットでは、健全かつ完全な因果探索アルゴリズムが導入されており、因果関係を表す有向非巡回グラフ(DAG)またはその等価クラスが探索されている。しかし、現実のケースでは、有限のトレーニングデータしか利用できないため、これらのアルゴリズムが使用する統計的テストのパワーが制限され、推論因果モデルの誤差が生じる。これは、統計的検定の使用回数をできるだけ少なくする戦略を考案することによって、一般的に対処される。本稿では,既存の制約に基づく因果発見アルゴリズムのための再帰的ラッパーとして,健全性と完全性を保持する戦略を提案する。初期から正規化されたminカット基準を用いて観測変数を再帰的にクラスタリングし、バックトラック中にベースライン因果探索アルゴリズムを用いて局所的な部分グラフを学習する。そしてそれらを組み合わせ、完全性を保証する。合成データを用いたアブレーション研究と一般的な実世界のベンチマークにより、我々のアプローチはベースラインアルゴリズムと比べて統計的検定の回数が大幅に少なく、より正確なグラフを学習し、実行時間も短いことを示す。

Causal discovery from observational data is an important tool in many branches of science. Under certain assumptions it allows scientists to explain phenomena, predict, and make decisions. In the large sample limit, sound and complete causal discovery algorithms have been previously introduced, where a directed acyclic graph (DAG), or its equivalence class, representing causal relations is searched. However, in real-world cases, only finite training data is available, which limits the power of statistical tests used by these algorithms, leading to errors in the inferred causal model. This is commonly addressed by devising a strategy for using as few statistical tests as possible. In this paper, we introduce such a strategy in the form of a recursive wrapper for existing constraint-based causal discovery algorithms, which preserves soundness and completeness. It recursively clusters the observed variables using the normalized min-cut criterion from the outset, and uses a baseline causal discovery algorithm during backtracking for learning local sub-graphs. It then combines them and ensures completeness. Through an ablation study using synthetic data, and through common real-world benchmarks, we demonstrate that our approach requires significantly fewer statistical tests, learns more accurate graphs, and requires shorter run-times than the baseline algorithm.

翻訳日:2021-07-13 16:23:27 公開日:2021-07-11

# 畳み込みニューラルネットワークのためのプルーニング基準のブレンディング

Blending Pruning Criteria for Convolutional Neural Networks ( http://arxiv.org/abs/2107.05033v1 )

ライセンス: Link先を確認

Wei He, Zhongzhan Huang, Mingfu Liang, Senwei Liang, Haizhao Yang

(参考訳) 様々な視覚アプリケーションにおける畳み込みニューラルネットワーク(CNN)の進歩は多くの注目を集めている。しかし、CNNの大多数は、現実世界のデプロイメントの厳しい要件を満たすことができません。これを解決するために、最近の人気ネットワークプルーニングはモデルの冗長性を抑える効果的な方法である。しかし、異なる刈り取り基準での「類似性」によるフィルタのランキングは矛盾する可能性がある。 1つのフィルタは特定の基準に従って重要であり、もう1つの基準では不要であり、これは各基準が包括的な「重要度」の部分的なビューであることを示している。このモチベーションから,既存のフィルタプルーニング基準を統合するための新しい枠組みを提案する。提案手法は,基準クラスタリングとフィルタ重要度校正の2段階を含む。まず,「重要」スコアのランクに基づいて,階層的クラスタリングによってプルーニング基準を導出する。第2に,各クラスタ内で選択されたブレンド候補の重要度を調整し,最適ブレンド基準を進化的アルゴリズムで探索するキャリブレーション係数を提案する。 CIFAR-100 と ImageNet ベンチマークの定量的結果は,我々のフレームワークが最先端のベースラインより優れており,刈り込み後のコンパクトモデル性能に低下していることを示している。

The advancement of convolutional neural networks (CNNs) on various vision applications has attracted lots of attention. Yet the majority of CNNs are unable to satisfy the strict requirement for real-world deployment. To overcome this, the recent popular network pruning is an effective method to reduce the redundancy of the models. However, the ranking of filters according to their "importance" on different pruning criteria may be inconsistent. One filter could be important according to a certain criterion, while it is unnecessary according to another one, which indicates that each criterion is only a partial view of the comprehensive "importance". From this motivation, we propose a novel framework to integrate the existing filter pruning criteria by exploring the criteria diversity. The proposed framework contains two stages: Criteria Clustering and Filters Importance Calibration. First, we condense the pruning criteria via layerwise clustering based on the rank of "importance" score. Second, within each cluster, we propose a calibration factor to adjust the significance of each selected blending candidate and search for the optimal blending criterion via Evolutionary Algorithm. Quantitative results on the CIFAR-100 and ImageNet benchmarks show that our framework outperforms the state-of-the-art baselines with regard to the compact model performance after pruning.
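
The inconsistency between criteria can be made concrete with a toy example. The sketch below ranks three made-up filters by two simple magnitude criteria (L1 norm and maximum absolute weight) and measures their agreement with Spearman's rank correlation. These two criteria, the weights and the helper names are assumptions chosen for illustration; the paper blends a larger set of existing criteria.

```python
def l1_norm(weights):
    """Importance score: sum of absolute weights."""
    return sum(abs(w) for w in weights)


def linf_norm(weights):
    """Importance score: largest absolute weight."""
    return max(abs(w) for w in weights)


def rank_positions(scores):
    """Rank of each item (1 = most important); assumes no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for position, idx in enumerate(order, start=1):
        ranks[idx] = position
    return ranks


def spearman(scores_a, scores_b):
    """Spearman rank correlation for tie-free score lists of length >= 2."""
    n = len(scores_a)
    ranks_a, ranks_b = rank_positions(scores_a), rank_positions(scores_b)
    d2 = sum((x - y) ** 2 for x, y in zip(ranks_a, ranks_b))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


if __name__ == "__main__":
    # Toy "filters": a dense one, a spiky one, and a weak one.
    filters = [
        [0.5, 0.5, 0.5, 0.5],
        [1.9, 0.0, 0.0, 0.0],
        [0.3, 0.3, 0.3, 0.3],
    ]
    l1_scores = [l1_norm(f) for f in filters]
    linf_scores = [linf_norm(f) for f in filters]
    print(rank_positions(l1_scores), rank_positions(linf_scores))
    print(spearman(l1_scores, linf_scores))
```

Here the dense filter ranks first under the L1 norm while the spiky filter ranks first under the maximum-weight criterion, so a pruning budget of one filter would keep a different filter depending on the criterion chosen.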

翻訳日:2021-07-13 16:22:09 公開日:2021-07-11

# 行動認識における領域適応のための相関情報の調整

Aligning Correlation Information for Domain Adaptation in Action Recognition ( http://arxiv.org/abs/2107.04932v1 )

ライセンス: Link先を確認

Yuecong Xu, Jianfei Yang, Haozhi Cao, Kezhi Mao, Jianxiong Yin, Simon See

(参考訳) ドメイン適応(DA)はドメインシフトに対処し、異なるシナリオにネットワークを適用することを可能にする。近年,様々な画像daアプローチが提案されているが,ビデオdaに対する研究は限られている。これは、時空間における画素の長期依存性として抽出された相関特徴を含む、ビデオの様々な特徴の適応の複雑さによるものである。相関特性は行動クラスと高度に関連し,教師付き行動認識タスクによる正確な映像特徴抽出における効果を証明した。しかし、同じアクションの相関特性はドメインシフトによってドメインによって異なる。そこで本研究では,画素相関を調整して動作映像をアライメントする新しいadversarial correlation adaptation network (acan)を提案する。 ACANは、Pixel correlation Discrepancy (PCD)と呼ばれる相関情報の分布を最小限にすることを目的としている。さらに、ビデオDA研究は、より大きなドメインシフトを持つクロスドメインビデオデータセットの欠如によって制限されている。そこで我々は,ドメイン間の統計的な差が大きいことによるドメインシフトが大きいhmdb-aridデータセットを導入する。このデータセットは、現在のデータセットをダークビデオ分類に活用するために構築されている。実験により,既存のビデオDAデータセットと新しいビデオDAデータセットの両方に対して,提案したACANの最先端性能を示す。

Domain adaptation (DA) approaches address domain shift and enable networks to be applied to different scenarios. Although various image DA approaches have been proposed in recent years, there is limited research towards video DA. This is partly due to the complexity in adapting the different modalities of features in videos, which include the correlation features extracted as long-term dependencies of pixels across spatiotemporal dimensions. The correlation features are highly associated with action classes and have proven effective in accurate video feature extraction through the supervised action recognition task. Yet correlation features of the same action would differ across domains due to domain shift. Therefore we propose a novel Adversarial Correlation Adaptation Network (ACAN) to align action videos by aligning pixel correlations. ACAN aims to minimize the discrepancy between the distributions of correlation information across domains, termed the Pixel Correlation Discrepancy (PCD). Additionally, video DA research is also limited by the lack of cross-domain video datasets with larger domain shifts. We, therefore, introduce a novel HMDB-ARID dataset with a larger domain shift caused by a larger statistical difference between domains. This dataset is built in an effort to leverage current datasets for dark video classification. Empirical results demonstrate the state-of-the-art performance of our proposed ACAN for both existing and the new video DA datasets.

翻訳日:2021-07-13 16:19:28 公開日:2021-07-11

# 部分逆時間的注意ネットワークを用いた部分映像領域適応

Partial Video Domain Adaptation with Partial Adversarial Temporal Attentive Network ( http://arxiv.org/abs/2107.04941v1 )

ライセンス: Link先を確認

Yuecong Xu, Jianfei Yang, Haozhi Cao, Qi Li, Kezhi Mao, Zhenghua Chen

(参考訳) 部分的ドメイン適応 (Partial Domain Adaptation, PDA) は実用的で一般的なドメイン適応シナリオであり、ソースラベル空間がターゲットとなるように、完全に共有されたラベル空間の仮定を緩和する。 PDAの主な課題は、ソースのみのクラスによる負の転送の問題である。ビデオの場合、そのような負の転送は、空間的特徴と時間的特徴の両方によって引き起こされ、より困難なビデオ領域適応(PVDA)問題を引き起こす可能性がある。本稿では,ソースのみのクラスをフィルタリングするための空間的特徴と時間的特徴を両立させてPVDA問題に対処する,新しいPATAN(Partial Adversarial Temporal Attentive Network)を提案する。さらにpatanは、クラス濾過プロセスに寄与する局所的な時間的特徴に従うことによって、効果的な時間的特徴を構築する。さらに、PVDA問題の研究を容易にするための新しいベンチマークを導入し、幅広いPVDAシナリオについて紹介する。複数のPVDAベンチマークで提案したPATANの最先端性能を実証した。

Partial Domain Adaptation (PDA) is a practical and general domain adaptation scenario, which relaxes the fully shared label space assumption such that the source label space subsumes the target one. The key challenge of PDA is the issue of negative transfer caused by source-only classes. For videos, such negative transfer could be triggered by both spatial and temporal features, which leads to a more challenging Partial Video Domain Adaptation (PVDA) problem. In this paper, we propose a novel Partial Adversarial Temporal Attentive Network (PATAN) to address the PVDA problem by utilizing both spatial and temporal features for filtering source-only classes. Besides, PATAN constructs effective overall temporal features by attending to local temporal features that contribute more toward the class filtration process. We further introduce new benchmarks to facilitate research on PVDA problems, covering a wide range of PVDA scenarios. Empirical results demonstrate the state-of-the-art performance of our proposed PATAN across the multiple PVDA benchmarks.

翻訳日:2021-07-13 16:19:12 公開日:2021-07-11

# 自律走行用物体検出モデルにおける表面不確かさの定量化

Prediction Surface Uncertainty Quantification in Object Detection Models for Autonomous Driving ( http://arxiv.org/abs/2107.04991v1 )

ライセンス: Link先を確認

Ferhat Ozgur Catak, Tao Yue, Shaukat Ali

(参考訳) 自動運転車における物体検出は、一般的にカメラ画像とlidar入力に基づいており、オブジェクト認識や速度調整などの意思決定のためのディープニューラルネットワークなどの予測モデルを訓練するためによく使用される。このような意思決定における誤りが損なわれる可能性があるため、不確実性の測定を通じて、そのような予測モデルによる決定の信頼性を測定することが不可欠である。不確実性は、ディープラーニングモデルにおいて、しばしば分類問題に対して測定される。しかし、自動運転におけるディープラーニングモデルは、しばしば多出力回帰モデルである。そこで,このような回帰モデルの予測不確実性を測定するために,pure (prediction surface uncertainty) と呼ばれる新しい手法を提案する。物体認識問題を2次元カメラビューにおける物体位置を見つけるために複数の出力を持つ回帰モデルとして定式化する。評価のために、広く応用された3つのオブジェクト認識モデル(YoLo、SSD300、SSD512)を修正し、KITTI、Stanford Cars、Berkeley DeepDrive、NEXETデータセットを使用しました。その結果,予測表面の不確かさと予測精度との間に統計的に有意な負の相関がみられた。

Object detection in autonomous cars is commonly based on camera images and Lidar inputs, which are often used to train prediction models such as deep artificial neural networks for decision making in object recognition, speed adjustment, etc. A mistake in such decision making can be damaging; thus, it is vital to measure the reliability of decisions made by such prediction models via uncertainty measurement. In deep learning models, uncertainty is often measured for classification problems. However, deep learning models in autonomous driving are often multi-output regression models. Hence, we propose a novel method called PURE (Prediction sURface uncErtainty) for measuring the prediction uncertainty of such regression models. We formulate the object recognition problem as a regression model with more than one output for finding object locations in a 2-dimensional camera view. For evaluation, we modified three widely applied object recognition models (i.e., YoLo, SSD300 and SSD512) and used the KITTI, Stanford Cars, Berkeley DeepDrive, and NEXET datasets. Results showed a statistically significant negative correlation between prediction surface uncertainty and prediction accuracy, suggesting that uncertainty significantly impacts the decisions made by autonomous driving.

翻訳日:2021-07-13 16:18:55 公開日:2021-07-11

# 適応型クラスリバランシング自己学習による半教師付き物体検出

Semi-Supervised Object Detection with Adaptive Class-Rebalancing Self-Training ( http://arxiv.org/abs/2107.05031v1 )

ライセンス: Link先を確認

Fangyuan Zhang, Tianxiang Pan, Bin Wang

(参考訳) 本研究は半教師付き物体検出(SSOD)を掘り下げ,ラベルなしデータの追加により検出性能を向上させる。最先端のSSODパフォーマンスは、トレーニングの監督が基礎的真実と疑似ラベルで構成されるセルフトレーニングによって最近達成されている。本研究では,SSODにおけるクラス不均衡が自己学習の有効性を著しく損なうことを観察する。クラス不均衡に対処するため、CropBankと呼ばれる新しいメモリモジュールを用いた適応型クラス再分散自己学習(ACRST)を提案する。 ACRSTは、トレーニングデータをCropBankから抽出された前景インスタンスと適応的に再バランスし、クラス不均衡を軽減する。検出タスクの複雑さが高いため,SSODでは,自己学習とデータバランスの両方が雑音を伴う疑似ラベルに苦しむのが観察された。そこで本研究では,正確な疑似ラベルを生成するための2段階フィルタリングアルゴリズムを提案する。提案手法は,MS-COCOおよびVOCベンチマークの良好な改善を実現する。 MS-COCOで1%のラベル付きデータのみを使用する場合,教師付きベースラインよりも17.02mAP,最先端手法に比べて5.32mAPの改善が達成される。

This study delves into semi-supervised object detection (SSOD) to improve detector performance with additional unlabeled data. State-of-the-art SSOD performance has been achieved recently by self-training, in which training supervision consists of ground truths and pseudo-labels. In current studies, we observe that class imbalance in SSOD severely impedes the effectiveness of self-training. To address the class imbalance, we propose adaptive class-rebalancing self-training (ACRST) with a novel memory module called CropBank. ACRST adaptively rebalances the training data with foreground instances extracted from the CropBank, thereby alleviating the class imbalance. Owing to the high complexity of detection tasks, we observe that both self-training and data-rebalancing suffer from noisy pseudo-labels in SSOD. Therefore, we propose a novel two-stage filtering algorithm to generate accurate pseudo-labels. Our method achieves satisfactory improvements on MS-COCO and VOC benchmarks. When using only 1% labeled data in MS-COCO, our method achieves 17.02 mAP improvement over supervised baselines, and 5.32 mAP improvement compared with state-of-the-art methods.

翻訳日:2021-07-13 16:18:36 公開日:2021-07-11

# 対話型可視化と解釈可能な機械学習を用いたセルフサービスデータ分類

Self-service Data Classification Using Interactive Visualization and Interpretable Machine Learning ( http://arxiv.org/abs/2107.04971v1 )

ライセンス: Link先を確認

Sridevi Narayana Wagle, Boris Kovalerchuk

(参考訳) 機械学習アルゴリズムは、エンドユーザーと開発者の両方が複雑なブラックボックスモデルと見なすモデルをしばしば生成する。彼らは設計したドメインの観点からモデルを説明することができません。提案する反復的ビジュアル論理分類器(ivlc)は、エンドユーザがモデルを設計し、信頼性を高め、精度を損なうことなくデータを分類できる、解釈可能な機械学習アルゴリズムである。このようなテクニックは、医療領域におけるがんデータなどの機密で重要なデータを、高いコストで処理する上で特に有用である。インタラクティブでロスレスな多次元可視化を提案することで、エンドユーザは、説明可能な決定を下すことができるデータ内のパターンを識別できる。このようなオプションは、ブラックボックスの機械学習方法論では不可能だ。解釈可能なIVLCアルゴリズムは、Interactive Shifted Paired Coordinates Software System (SPCVis)によってサポートされている。ユーザ対話型機能を備えた無損失多次元データ可視化システムである。インタラクティブなアプローチは、マシンラーニングの専門家に頼らずに、エンドユーザがセルフサービスとしてデータ分類を実行するための柔軟性を提供する。インタラクティブなパターン発見は、数百の次元/機能を持つ大きなデータセットを扱うときに困難になる。この問題を解決するために、この章では、新しいコーディネートオーダー最適化アルゴリズム(COO)と遺伝的アルゴリズムを組み合わせた自動分類手法を提案する。 COOアルゴリズムは、データ分離を最もよく表す座標対列を自動的に生成し、遺伝的アルゴリズムは、データ分類のための領域を自動的に生成することにより、提案したIVLCアルゴリズムの最適化を支援する。このアプローチの有効性は、データ分類に使用されるインタラクティブプロセスと自動化プロセスの両方をカバーするベンチマークデータセットの実験によって示されている。

Machine learning algorithms often produce models that both end users and developers regard as complex black boxes. Such models fail to explain themselves in terms of the domain they are designed for. The proposed Iterative Visual Logical Classifier (IVLC) is an interpretable machine learning algorithm that allows end users to design a model and classify data with more confidence and without having to compromise on accuracy. Such a technique is especially helpful when dealing with sensitive and crucial data, such as cancer data in the medical domain, where the cost of errors is high. With the help of the proposed interactive and lossless multidimensional visualization, end users can identify patterns in the data on which they can base explainable decisions. Such options would not be possible in black-box machine learning methodologies. The interpretable IVLC algorithm is supported by the Interactive Shifted Paired Coordinates Software System (SPCVis), a lossless multidimensional data visualization system with interactive features. The interactive approach gives end users the flexibility to perform data classification as a self-service without having to rely on a machine learning expert. Interactive pattern discovery becomes challenging when dealing with large data sets with hundreds of dimensions/features. To overcome this problem, this chapter proposes an automated classification approach that combines a new Coordinate Order Optimizer (COO) algorithm with a genetic algorithm. The COO algorithm automatically generates the coordinate pair sequences that best represent the data separation, and the genetic algorithm helps optimize the proposed IVLC algorithm by automatically generating the areas for data classification. The feasibility of the approach is shown by experiments on benchmark datasets covering both the interactive and automated processes used for data classification.
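To make the "lossless" claim concrete, here is a minimal sketch of the pairing idea behind paired-coordinate visualizations, under the assumption that consecutive coordinates are paired and each pair is drawn in its own shifted 2-D frame. The function names and the shift values are hypothetical and are not taken from SPCVis; the point is only that the mapping from an n-D point to n/2 2-D nodes can be inverted exactly.

```python
def to_shifted_pairs(point, shifts):
    """Map an n-D point (n even) to n/2 2-D nodes, one per shifted coordinate frame."""
    if len(point) % 2 != 0:
        raise ValueError("point must have an even number of coordinates")
    pairs = [(point[i], point[i + 1]) for i in range(0, len(point), 2)]
    if len(shifts) != len(pairs):
        raise ValueError("need one (dx, dy) shift per coordinate pair")
    # Each pair is drawn relative to the origin of its own shifted frame.
    return [(x + dx, y + dy) for (x, y), (dx, dy) in zip(pairs, shifts)]

def from_shifted_pairs(nodes, shifts):
    """Invert to_shifted_pairs, recovering the original n-D point."""
    point = []
    for (x, y), (dx, dy) in zip(nodes, shifts):
        point.extend([x - dx, y - dy])
    return point

point = [1, 2, 3, 4, 5, 6]
shifts = [(0, 0), (10, 0), (20, 5)]   # hypothetical frame origins
nodes = to_shifted_pairs(point, shifts)
recovered = from_shifted_pairs(nodes, shifts)
```

Connecting the nodes in order gives one polyline per n-D point, and because the round trip is exact no information is discarded by the drawing itself.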

翻訳日:2021-07-13 16:17:02 公開日:2021-07-11

# 分布外ダイナミクス検出:RL関連ベンチマークと結果

Out-of-Distribution Dynamics Detection: RL-Relevant Benchmarks and Results ( http://arxiv.org/abs/2107.04982v1 )

ライセンス: Link先を確認

Mohamad H Danesh and Alan Fern

(参考訳) 本研究では,時間的プロセスの動的変化をトレーニング・分散力学と比較して検出するOODD(Out-of-distriion dynamics)の問題点について検討する。これは制御、強化学習(RL)、多変量時系列の応用に関係しており、テスト時間ダイナミクスの変更は未知の方法で学習コントローラや予測器の性能に影響を与える可能性がある。この問題は、学習したコントローラがトレーニング環境に過度に適合する、深いRLの文脈において特に重要である。しかし、現在RL研究でよく使われる環境の種類について、OODDベンチマークが確立されていない。最初のコントリビューションは、OODDのさまざまなタイプと強度を持つ共通RL環境から派生したOODDベンチマークを設計することです。第2のコントリビューションは、繰り返し暗黙的量子化ネットワーク(RIQN)に基づいて、OODD検出のための自己回帰予測エラーを監視する強力なOODDベースラインアプローチを設計することである。最後のコントリビューションは、RIQNアプローチをベンチマークで評価し、将来の比較のためのベースライン結果を提供することです。

We study the problem of out-of-distribution dynamics (OODD) detection, which involves detecting when the dynamics of a temporal process change compared to the training-distribution dynamics. This is relevant to applications in control, reinforcement learning (RL), and multi-variate time-series, where changes to test time dynamics can impact the performance of learning controllers/predictors in unknown ways. This problem is particularly important in the context of deep RL, where learned controllers often overfit to the training environment. Currently, however, there is a lack of established OODD benchmarks for the types of environments commonly used in RL research. Our first contribution is to design a set of OODD benchmarks derived from common RL environments with varying types and intensities of OODD. Our second contribution is to design a strong OODD baseline approach based on recurrent implicit quantile networks (RIQNs), which monitors autoregressive prediction errors for OODD detection. Our final contribution is to evaluate the RIQN approach on the benchmarks to provide baseline results for future comparison.
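The monitoring idea in the second contribution can be sketched without the RIQN itself. The example below assumes that some predictor already yields a per-step prediction error; it calibrates a threshold on in-distribution errors and raises an alarm when a sliding-window mean of test-time errors exceeds it. The function names, the window length and the quantile are illustrative assumptions, not values from the paper.

```python
import numpy as np

def calibrate_threshold(train_errors, quantile=0.99):
    """Pick an alarm threshold from prediction errors on training-distribution data."""
    return float(np.quantile(train_errors, quantile))

def detect_change(errors, threshold, window=10):
    """Return the first step whose trailing-window mean error exceeds the threshold.

    Returns None if no window exceeds the threshold.
    """
    errors = np.asarray(errors, dtype=float)
    for t in range(window, len(errors) + 1):
        if errors[t - window:t].mean() > threshold:
            return t - 1   # index of the last step in the offending window
    return None

# Toy data: small errors on in-distribution dynamics, large errors after step 50.
train_errors = np.linspace(0.0, 0.2, 101)
threshold = calibrate_threshold(train_errors)
stream = np.concatenate([np.full(50, 0.05), np.full(50, 1.0)])
alarm_step = detect_change(stream, threshold)
no_alarm = detect_change(np.full(100, 0.05), threshold)
```

A longer window reduces false alarms from isolated large errors at the cost of a slower reaction to a genuine change in dynamics.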

翻訳日:2021-07-13 16:16:38 公開日:2021-07-11

# 大量半導体プロセスにおける機械学習に基づくCVD仮想メトロロジー

Machine Learning based CVD Virtual Metrology in Mass Produced Semiconductor Process ( http://arxiv.org/abs/2107.05071v1 )

ライセンス: Link先を確認

Yunsong Xie, Ryan Stearrett

(参考訳) データインプット、特徴選択、回帰アルゴリズム、マシンラーニングベースのCVD(Chemical vapor deposition)仮想メタロジ(VM)の3つの重要な側面について、クロスベンチマークが行われた。その結果,線形特徴選択回帰アルゴリズムはVMデータに不適合であることが判明した。最適な精度を得るためには、データの可用性が約70%であるので、高い予測精度を達成するためには、データのインプティングも必要である。この研究は、非線形特徴選択と回帰アルゴリズムと最も近いデータインプティングアルゴリズムを組み合わせることで、予測精度を0.7まで向上させることを示唆している。これにより、CVD処理の70%のばらつきが減少し、物理メロロジーの周波数が低下し、品質が向上したより信頼性の高い大量発生ウェハとなると考えられている。

A cross-benchmark has been done on three critical aspects (data imputing, feature selection, and regression algorithms) of machine learning based chemical vapor deposition (CVD) virtual metrology (VM). The results reveal that linear feature selection and regression algorithms extensively under-fit the VM data. Data imputing is also necessary to achieve a higher prediction accuracy, as the data availability is only ~70% when optimal accuracy is obtained. This work suggests that a nonlinear feature selection and regression algorithm combined with a nearest data imputing algorithm would provide a prediction accuracy as high as 0.7. This would lead to a 70% reduction in CVD processing variation, which is expected to reduce the frequency of physical metrology and yield more reliable mass-produced wafers with improved quality.

翻訳日:2021-07-13 16:16:19 公開日:2021-07-11

# 1つのマップがすべてに適合しない:マルチモーダル医療画像におけるサニエンシマップ説明の評価

One Map Does Not Fit All: Evaluating Saliency Map Explanation on Multi-Modal Medical Images ( http://arxiv.org/abs/2107.05047v1 )

ライセンス: Link先を確認

Weina Jin, Xiaoxiao Li, Ghassan Hamarneh

(参考訳) 臨床エンドユーザに予測を説明することは、臨床決定支援のためにAIモデルのパワーを活用する必要がある。医療画像では、塩分マップが最も一般的な説明形式である。マップはAIモデルの予測の重要な特徴を強調している。多くのサリエンシマップ法が提案されているが、それぞれのモダリティ/チャンネルが同じ基礎となる生体医学現象の異なる臨床的意味を持つマルチモーダルな医用画像において、意思決定をいかにうまく行うかは分かっていない。このようなモダリティに依存した特徴を理解することは、臨床ユーザーのAI決定の解釈に不可欠である。臨床的に重要な問題であるが技術的に無視される問題に対処するため,MSFI(Modality-Specific Feature Importance)測定基準を提案し,サリエンシマップがモダリティ特有の重要な特徴を強調できるかどうかを検討する。 MSFIは、モダリティ優先順位付けおよびモダリティ特異的特徴ローカライゼーションに関する臨床要件を符号化する。臨床用ユーザスタディを含む16のサリエンシーマップ法について評価した結果,ほとんどのサリエンシーマップ法はモダリティ重要情報を一般に捉えたものの,モダリティ固有の重要な特徴を一貫して正確に強調することはできなかった。評価結果は,サリエンシマップ法の選択をガイドし,臨床応用をターゲットとした新たな手法を提案する。

Being able to explain a prediction to clinical end-users is necessary to leverage the power of AI models for clinical decision support. For medical images, saliency maps are the most common form of explanation. The maps highlight the features that are important for an AI model's prediction. Although many saliency map methods have been proposed, it is unknown how well they explain decisions on multi-modal medical images, where each modality/channel carries distinct clinical meanings of the same underlying biomedical phenomenon. Understanding such modality-dependent features is essential for clinical users' interpretation of AI decisions. To tackle this clinically important but technically ignored problem, we propose the MSFI (Modality-Specific Feature Importance) metric to examine whether saliency maps can highlight modality-specific important features. MSFI encodes the clinical requirements on modality prioritization and modality-specific feature localization. Our evaluations of 16 commonly used saliency map methods, including a clinician user study, show that although most saliency map methods captured modality importance information in general, most of them failed to highlight modality-specific important features consistently and precisely. The evaluation results guide the choice of saliency map methods and provide insights for proposing new ones targeting clinical applications.

翻訳日:2021-07-13 16:11:12 公開日:2021-07-11

# コモンセンス知識統合によるゼロショットシーングラフ関係予測

Zero-Shot Scene Graph Relation Prediction through Commonsense Knowledge Integration ( http://arxiv.org/abs/2107.05080v1 )

ライセンス: Link先を確認

Xuan Kan, Hejie Cui, Carl Yang

(参考訳) 画像内の実体間の関係予測はシーングラフ生成(SGG)の重要なステップであり、様々な視覚的理解や推論タスクにさらに影響を及ぼす。しかし、既存のSGGフレームワークは重い訓練を必要とするが、目に見えない(ゼロショット)トリプルをモデル化することができない。本研究は, 共通理解の欠如, すなわち, 類似する実体を関連付け, 世界の一般的な理解に基づく類似関係を推測する能力に起因していることを強調する。このギャップを埋めるために、特にゼロショット関係予測において、SGGのコモンセンス知識を統合するフレームワークであるCommOnsense-integrAted sCenegrapHrElation pRediction (COACHER)を提案する。具体的には、外部コモンセンス知識グラフ内のエンティティの周辺と経路をモデル化し、最先端のSGGフレームワーク上でそれらを統合するための新しいグラフマイニングパイプラインを開発する。提案手法の有効性を実証するために,Visual Genome のオリジナルデータセットとオペレーテッドデータセットの総合的定量的評価と定性ケーススタディを行った。

Relation prediction among entities in images is an important step in scene graph generation (SGG), which further impacts various visual understanding and reasoning tasks. Existing SGG frameworks, however, require heavy training yet are incapable of modeling unseen (i.e., zero-shot) triplets. In this work, we stress that such incapability is due to the lack of commonsense reasoning, i.e., the ability to associate similar entities and infer similar relations based on a general understanding of the world. To fill this gap, we propose CommOnsense-integrAted sCene grapH rElation pRediction (COACHER), a framework to integrate commonsense knowledge for SGG, especially for zero-shot relation prediction. Specifically, we develop novel graph mining pipelines to model the neighborhoods and paths around entities in an external commonsense knowledge graph, and integrate them on top of state-of-the-art SGG frameworks. Extensive quantitative evaluations and qualitative case studies on both original and manipulated datasets from Visual Genome demonstrate the effectiveness of our proposed approach.

翻訳日:2021-07-13 16:10:50 公開日:2021-07-11

# SE-PSNet:Panoptic Segmentation Networkのためのシルエットベースの拡張機能

SE-PSNet: Silhouette-based Enhancement Feature for Panoptic Segmentation Network ( http://arxiv.org/abs/2107.05093v1 )

ライセンス: Link先を確認

Shuo-En Chang, Yi-Cheng Yang, En-Ting Lin, Pei-Yung Hsiao, Li-Chen Fu

(参考訳) 最近、セマンティックセグメンテーションとインスタンスセグメンテーションを組み合わせたpanopticセグメンテーションタスクがあり、それぞれのピクセルを対応するインスタンスidで分類することを目標としている。本研究では,panoptic segmentationタスクに取り組むための解法を提案する。全体構造はボトムアップ法とトップダウン法を組み合わせている。したがって、パフォーマンスが向上するだけでなく、実行速度も維持できる。ネットワークは主にマスクの品質に注意を払っている。前の研究では、オブジェクトの不均一な輪郭が出現する可能性が高まり、結果として低品質の予測が行われることがわかりました。そこで我々は,マスク改善のために,物体と背景のシルエットに対する拡張機能とそれに対応する損失関数を提案する。一方,新しい信頼度スコアを用いて咬合問題を解決し,ネットワークがより高品質なマスクを予測結果として使用する傾向を示した。研究の検証には,cocoデータセットとcityscapesデータセットを使用して実験を行い,高速な推論時間で競合結果を得た。

Recently, the panoptic segmentation task has emerged, combining semantic and instance segmentation, in which the goal is to classify each pixel with the corresponding instance ID. In this work, we propose a solution to tackle the panoptic segmentation task. The overall structure combines the bottom-up method and the top-down method, so that performance improves while execution speed is maintained. The network mainly pays attention to the quality of the mask. In previous work, uneven object contours are likely to appear, resulting in low-quality predictions. Accordingly, we propose enhancement features and corresponding loss functions for the silhouette of objects and backgrounds to improve the mask. Meanwhile, we use a newly proposed confidence score to solve the occlusion problem and make the network tend to use higher-quality masks as prediction results. To verify our research, we ran experiments on the COCO and CityScapes datasets and obtained competitive results with fast inference time.

翻訳日:2021-07-13 16:10:29 公開日:2021-07-11

# 過パラメータ浅層ニューラルネットワークを用いたエネルギーベースモデルのデュアルトレーニング

Dual Training of Energy-Based Models with Overparametrized Shallow Neural Networks ( http://arxiv.org/abs/2107.05134v1 )

ライセンス: Link先を確認

Carles Domingo-Enrich, Alberto Bietti, Marylou Gabrié, Joan Bruna, Eric Vanden-Eijnden

(参考訳) エネルギーベースモデル(英: Energy-based model、EBM)は、通常最大推定によって訓練される生成モデルである。このアプローチは、このエネルギーに関連するギブス分布をサンプリングする必要があるため、訓練されたエネルギーが凸でない一般的な状況では困難になる。 Fenchel双対性(英語版)の結果を用いて、能動性(いわゆる特徴学習)と遅延性(英語版)の両方において、浅度過度ニューラルネットワークエネルギーを持つ最大極大EBMに双対する変動原理を導出する。アクティブな状態において、この二重定式化は、サンプル空間の粒子とエネルギーのパラメータ空間のニューロンを同時に更新する訓練アルゴリズムをもたらす。また,このアルゴリズムでは,データセットからランダムに抽出したサンプルで粒子をリスタートさせる場合があり,反復ステップ毎にこれらのリスタートを行うことがスコアマッチングトレーニングに対応していることを示す。 2つのアルゴリズムで中間パラメータの設定を使用することで、最大確率とスコアマッチングトレーニングを補間する方法が得られます。これらの結果は単純な数値実験で示される。

Energy-based models (EBMs) are generative models that are usually trained via maximum likelihood estimation. This approach becomes challenging in generic situations where the trained energy is nonconvex, due to the need to sample the Gibbs distribution associated with this energy. Using general Fenchel duality results, we derive variational principles dual to maximum likelihood EBMs with shallow overparametrized neural network energies, both in the active (aka feature-learning) and lazy regimes. In the active regime, this dual formulation leads to a training algorithm in which one updates concurrently the particles in the sample space and the neurons in the parameter space of the energy. We also consider a variant of this algorithm in which the particles are sometimes restarted at random samples drawn from the data set, and show that performing these restarts at every iteration step corresponds to score matching training. Using intermediate parameter setups in our dual algorithm thereby gives a way to interpolate between maximum likelihood and score matching training. These results are illustrated in simple numerical experiments.
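The need to sample the Gibbs distribution can be read off the standard maximum likelihood gradient for an EBM. The identity below is textbook material rather than a result of this paper; the notation ($E_\theta$ for the energy, $Z_\theta$ for the normalizing constant, $p_{\mathrm{data}}$ for the data distribution) is chosen here for illustration.

```latex
p_\theta(x) = \frac{e^{-E_\theta(x)}}{Z_\theta},
\qquad
Z_\theta = \int e^{-E_\theta(x)}\,dx,

\nabla_\theta\, \mathbb{E}_{x \sim p_{\mathrm{data}}}\bigl[-\log p_\theta(x)\bigr]
= \mathbb{E}_{x \sim p_{\mathrm{data}}}\bigl[\nabla_\theta E_\theta(x)\bigr]
- \mathbb{E}_{x \sim p_\theta}\bigl[\nabla_\theta E_\theta(x)\bigr].
```

The second expectation is taken under the model's own Gibbs distribution $p_\theta$, which is why training requires samples from it and becomes hard when the energy is nonconvex.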

翻訳日:2021-07-13 16:06:50 公開日:2021-07-11

# 楕円ペア座標を用いた非線形視覚知識発見

Non-linear Visual Knowledge Discovery with Elliptic Paired Coordinates ( http://arxiv.org/abs/2107.04974v1 )

ライセンス: Link先を確認

Rose McDonald, Boris Kovalerchuk

(参考訳) 裸眼で2-3次元以上のデータから視覚的な知識を発見できることは、人間が困難である。本章では,新しいepc(eliptic paired coordinates)可視化を用いて,予測機械学習モデルをインタラクティブに発見する効率について検討する。 EPCは,多次元データを可視化し,多次元情報を2次元で保存した視覚機械学習を支援する。平行座標と放射座標と比較して、epcの可視化は各n-d点の視覚要素の半分しか必要としない。本研究で開発された対話型ソフトウェアシステムEllipseVisは、高次元データセットを処理し、EPCビジュアライゼーションを作成し、EPCにおける支配ルールを発見して予測的分類モデルを生成する。インタラクティブで自動的なプロセスを使用することで、単一のクラスの高い優位性を持つEPC内のゾーンを発見する。 EPC法は計算実験において高いカバレッジと精度で非線形予測モデルを発見することに成功している。これは視覚的に魅力的な支配ルールを作成することで、複数のドメインに利益をもたらす。本章では,実データおよびシミュレーションデータを用いた実験におけるepc非線形手法の検証,動的楕円対座標(depc)に一般化されたepc,視覚発見を最適化する座標重みの組込み,代替epc設計の導入,epc/depcに基づく非コンパクト機械学習手法の概念の導入について述べる。

It is challenging for humans to discover visual knowledge in data with more than 2-3 dimensions with the naked eye. This chapter explores the efficiency of discovering predictive machine learning models interactively using new Elliptic Paired Coordinates (EPC) visualizations. It is shown that EPC can visualize multidimensional data and support visual machine learning while preserving multidimensional information in 2-D. Relative to parallel and radial coordinates, EPC visualization requires only half of the visual elements for each n-D point. An interactive software system, EllipseVis, developed in this work, processes high-dimensional datasets, creates EPC visualizations, and produces predictive classification models by discovering dominance rules in EPC. Using interactive and automatic processes, it discovers zones in EPC with a high dominance of a single class. The EPC methodology has been successful in discovering non-linear predictive models with high coverage and precision in the computational experiments. This can benefit multiple domains by producing visually appealing dominance rules. This chapter presents the results of successfully testing the EPC non-linear methodology in experiments using real and simulated data, a generalization of EPC to Dynamic Elliptic Paired Coordinates (DEPC), the incorporation of coordinate weights to optimize visual discovery, an alternative EPC design, and the concept of an incompact machine learning methodology based on EPC/DEPC.

翻訳日:2021-07-13 16:05:56 公開日:2021-07-11

# BrainNNExplainer:Brain Networkベースの疾患分析のための解釈可能なグラフニューラルネットワークフレームワーク

BrainNNExplainer: An Interpretable Graph Neural Network Framework for Brain Network based Disease Analysis ( http://arxiv.org/abs/2107.05097v1 )

ライセンス: Link先を確認

Hejie Cui, Wei Dai, Yanqiao Zhu, Xiaoxiao Li, Lifang He, Carl Yang

(参考訳) 疾患予測のための解釈可能な脳ネットワークモデルは、神経科学の進歩に非常に有用である。 gnnは複雑なネットワークデータをモデル化することを約束しているが、過度に適合しやすいため、医療などの決定クリティカルなシナリオでの使用を妨げている。このギャップを埋めるために、脳ネットワーク分析のための解釈可能なGNNフレームワークであるBrainNNExplainerを提案する。主に2つの共同学習モジュールで構成されており、脳ネットワークに特化したバックボーン予測モデルと、疾患特異的な脳ネットワーク接続を強調する説明生成器である。 BrainNNExplainerのユニークな解釈可能性と優れた性能を示す2つの難病予測データセットの可視化による大規模な実験結果。

Interpretable brain network models for disease prediction are of great value for the advancement of neuroscience. GNNs are promising to model complicated network data, but they are prone to overfitting and suffer from poor interpretability, which prevents their usage in decision-critical scenarios like healthcare. To bridge this gap, we propose BrainNNExplainer, an interpretable GNN framework for brain network analysis. It is mainly composed of two jointly learned modules: a backbone prediction model that is specifically designed for brain networks and an explanation generator that highlights disease-specific prominent brain network connections. Extensive experimental results with visualizations on two challenging disease prediction datasets demonstrate the unique interpretability and outstanding performance of BrainNNExplainer.

翻訳日:2021-07-13 16:05:30 公開日:2021-07-11

# 異なる利害関係者グループに関する組織パフォーマンスのコンピュータ支援構成分類

Computer-assisted construct classification of organizational performance concerning different stakeholder groups ( http://arxiv.org/abs/2107.05133v1 )

ライセンス: Link先を確認

Seethalakshmi Gopalakrishnan, Victor Chen, Gus Hahn-Powell, Bharadwaj Tirunagar

(参考訳) ビジネスやマネジメントにおける研究記事の数は、用語、構成、尺度とともに劇的に増加している。研究論文からの組織的業績構成の適切な分類は、その研究が関係するかもしれない文献と理解を分類する上で重要な役割を担っている。 In this work, we classify constructs (i.e., concepts and terminology used to capture different aspects of organizational performance) in research articles into a three-level categorization: (a) performance and non-performance categories (Level 0); (b) for performance constructs, stakeholder group-level of performance concerning investors, customers, employees, and the society (community and natural environment) (Level 1); and (c) for each stakeholder group-level, subcategories of different ways of measurement (Level 2). 本研究は,周辺文や外部参照から抽出した特徴を用いた文脈情報の増加が,訓練データに制限がある場合,分解レベルラベルの分類を改善することを見出した。本研究は, コンピュータ支援による構造同定と分類, 研究合成における重要なステップである。

The number of research articles in business and management has dramatically increased along with terminology, constructs, and measures. Proper classification of organizational performance constructs from research articles plays an important role in categorizing the literature and understanding to whom its research implications may be relevant. In this work, we classify constructs (i.e., concepts and terminology used to capture different aspects of organizational performance) in research articles into a three-level categorization: (a) performance and non-performance categories (Level 0); (b) for performance constructs, stakeholder group-level of performance concerning investors, customers, employees, and the society (community and natural environment) (Level 1); and (c) for each stakeholder group-level, subcategories of different ways of measurement (Level 2). We observed that increasing contextual information with features extracted from surrounding sentences and external references improves classification of disaggregate-level labels, given limited training data. Our research has implications for computer-assisted construct identification and classification - an essential step for research synthesis.

翻訳日:2021-07-13 16:03:58 公開日:2021-07-11

# 低リソース地理空間機械学習のためのドメイン適応化

Leveraging Domain Adaptation for Low-Resource Geospatial Machine Learning ( http://arxiv.org/abs/2107.04983v1 )

ライセンス: Link先を確認

Jack Lynch and Sam Wookey

(参考訳) リモートセンシングにおける機械学習は、地理空間画像の可用性と解像度の増大とともに成熟しているが、その実用性はラベル付きデータの必要性によってボトルネックになっている。さらに、多くのラベル付き地理空間データセットは特定の地域、機器、極端な気象イベントに特化しています。提案する複数の地理空間的ベンチマークに対する現代ドメイン適応の適用について検討し,固有の課題を明らかにし,その解決策を提案する。

Machine learning in remote sensing has matured alongside a proliferation in availability and resolution of geospatial imagery, but its utility is bottlenecked by the need for labeled data. What's more, many labeled geospatial datasets are specific to certain regions, instruments, or extreme weather events. We investigate the application of modern domain-adaptation to multiple proposed geospatial benchmarks, uncovering unique challenges and proposing solutions to them.

翻訳日:2021-07-13 16:02:20 公開日:2021-07-11

# 医用画像分割のための空間ガイド型自己監督クラスタリングネットワーク

A Spatial Guided Self-supervised Clustering Network for Medical Image Segmentation ( http://arxiv.org/abs/2107.04934v1 )

ライセンス: Link先を確認

Euijoon Ahn, Dagan Feng and Jinman Kim

(参考訳) 医療画像のセグメンテーションは、自動臨床意思決定支援システムの基本的なステップである。しかし,教師付き深層学習に基づく既存の医用画像分割法は,大量のラベル付きトレーニングデータに依存するため問題視されている。医用画像データリポジトリは拡大を続けているが,注釈付きデータの量の増加は確認されていない。そこで本研究では,空間的に接続され,類似した特徴表現を持つ画像画素をグループ化するのを支援する複数の損失関数を導入することで,医療画像分割のための空間的誘導型自己教師付きクラスタリングネットワーク(sgscn)を提案する。単一の画像から、各ピクセルの特徴表現とクラスタリングの割り当てをエンドツーエンドで反復的に学習する。また,画像領域の形状と境界をより明確に示すコンテキストベースの一貫性損失を提案する。クラスタに属するすべてのピクセルを、クラスタ中心に空間的に近接するように強制する。本手法を2つの公開医用画像データセット上で評価し,従来の自己監督型クラスタリング法と比較した。実験の結果,医用画像のセグメンテーションでは最も精度が高かった。

The segmentation of medical images is a fundamental step in automated clinical decision support systems. Existing medical image segmentation methods based on supervised deep learning, however, remain problematic because of their reliance on large amounts of labelled training data. Although medical imaging data repositories continue to expand, there has not been a commensurate increase in the amount of annotated data. Hence, we propose a new spatial guided self-supervised clustering network (SGSCN) for medical image segmentation, where we introduce multiple loss functions designed to aid in grouping image pixels that are spatially connected and have similar feature representations. It iteratively learns feature representations and clustering assignment of each pixel in an end-to-end fashion from a single image. We also propose a context-based consistency loss that better delineates the shape and boundaries of image regions. It enforces all the pixels belonging to a cluster to be spatially close to the cluster centre. We evaluated our method on 2 public medical image datasets and compared it to existing conventional and self-supervised clustering methods. Experimental results show that our method was most accurate for medical image segmentation.
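The requirement that pixels of a cluster stay close to the cluster centre can be illustrated with a small, non-differentiable NumPy sketch. It assumes hard cluster labels and measures the mean squared distance of each pixel to the spatial centroid of its cluster. This is a simplified stand-in for a spatial loss of this kind, not the loss defined in the paper, and the function name is hypothetical.

```python
import numpy as np

def spatial_compactness(labels):
    """Mean squared distance of every pixel to the spatial centroid of its cluster."""
    labels = np.asarray(labels)
    rows, cols = np.indices(labels.shape)
    coords = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)
    flat = labels.ravel()
    total = 0.0
    for k in np.unique(flat):
        pts = coords[flat == k]            # pixel coordinates assigned to cluster k
        centre = pts.mean(axis=0)          # spatial centre of the cluster
        total += np.sum((pts - centre) ** 2)
    return total / flat.size

# Two 4x4 label maps: contiguous halves versus a checkerboard.
compact = np.array([[0, 0, 1, 1]] * 4)
checker = np.indices((4, 4)).sum(axis=0) % 2
compact_score = spatial_compactness(compact)
checker_score = spatial_compactness(checker)
```

Lower values mean spatially tighter clusters, so the contiguous labelling scores better than the checkerboard. A trainable version would replace the hard labels with soft assignment probabilities so that gradients can flow.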

翻訳日:2021-07-13 15:58:50 公開日:2021-07-11

# ディープファイバクラスタリング:高速かつ効果的なホワイトマターパーセレーションのための解剖学的非教師なしディープラーニング

Deep Fiber Clustering: Anatomically Informed Unsupervised Deep Learning for Fast and Effective White Matter Parcellation ( http://arxiv.org/abs/2107.04938v1 )

ライセンス: Link先を確認

Yuqian Chen, Chaoyi Zhang, Yang Song, Nikos Makris, Yogesh Rathi, Weidong Cai, Fan Zhang, Lauren J. O'Donnell

(参考訳) ホワイトマター・ファイバ・クラスタリング(wmfc)は、疾患分類や解剖学的道分画などの応用のためにホワイトマター・トラクトグラフィのパーセル化を可能にする。しかし、基底真理の欠如とファイバーデータの曖昧さ(ファイバーに沿った点を前方または逆順に表すことができる)がこの課題に挑戦する。教師なしディープラーニングに基づく新しいWMFCフレームワークを提案する。我々は,教師なしクラスタリング問題を自己教師なし学習タスクとして解決する。具体的には、畳み込みニューラルネットワークを用いて入力ファイバーの埋め込みを学習し、ペアワイズファイバー距離を擬似アノテーションとして使用する。これにより、ファイバポイント順序に敏感なwmfcが可能になる。さらに、脳解剖学的セグメンテーションデータを組み込むことにより、繊維クラスターの解剖学的コヒーレンスを向上させる。提案フレームワークは, クラスタ割り当て確率の低いファイバを拒絶することにより, 自然に外乱除去を可能にする。我々は,Human Connectome Projectから200のデータセットを用いて本手法を訓練し,評価する。その結果,提案手法の性能と効率が向上した。

White matter fiber clustering (WMFC) enables parcellation of white matter tractography for applications such as disease classification and anatomical tract segmentation. However, the lack of ground truth and the ambiguity of fiber data (the points along a fiber can equivalently be represented in forward or reverse order) pose challenges to this task. We propose a novel WMFC framework based on unsupervised deep learning. We solve the unsupervised clustering problem as a self-supervised learning task. Specifically, we use a convolutional neural network to learn embeddings of input fibers, using pairwise fiber distances as pseudo annotations. This enables WMFC that is insensitive to fiber point ordering. In addition, anatomical coherence of fiber clusters is improved by incorporating brain anatomical segmentation data. The proposed framework enables outlier removal in a natural way by rejecting fibers with low cluster assignment probability. We train and evaluate our method using 200 datasets from the Human Connectome Project. Results demonstrate superior performance and efficiency of the proposed approach.
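The point-ordering ambiguity can be handled at the distance level by comparing a fiber with both orientations of the other fiber and keeping the smaller value. The sketch below shows one such order-invariant distance as an assumption; the abstract says only that pairwise fiber distances serve as pseudo annotations and does not specify which distance is used. Both fibers are assumed to be resampled to the same number of points.

```python
import numpy as np

def flip_invariant_distance(fiber_a, fiber_b):
    """Mean point-wise distance between two fibers, minimised over point order."""
    a = np.asarray(fiber_a, dtype=float)
    b = np.asarray(fiber_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("fibers must be resampled to the same number of points")
    direct = np.linalg.norm(a - b, axis=1).mean()         # same point order
    flipped = np.linalg.norm(a - b[::-1], axis=1).mean()  # reversed point order
    return float(min(direct, flipped))

fiber = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
same_reversed = flip_invariant_distance(fiber, fiber[::-1])
shifted = flip_invariant_distance(fiber, fiber + np.array([0.0, 1.0, 0.0]))
```

A fiber and its reversed copy are at distance zero, so a network trained to reproduce such distances has no incentive to separate the two orderings.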

翻訳日:2021-07-13 15:58:35 公開日:2021-07-11

# LiveView:ビュー合成のための動的ターゲット中心型MPI

LiveView: Dynamic Target-Centered MPI for View Synthesis ( http://arxiv.org/abs/2107.05113v1 )

ライセンス: Link先を確認

Sushobhan Ghosh, Zhaoyang Lv, Nathan Matsuda, Lei Xiao, Andrew Berkovich, Oliver Cossairt

(参考訳) 既存のMulti-Plane Image (MPI) ベースのビュー合成手法は、1つの前方通過で固定された平面数を用いて入力ビューに整列したMPIを生成する。これらの手法は、新しいビューの高速で高品質なレンダリングを生成するが、リアルタイムアプリケーションには適さない低速で計算コストの高いmpi生成メソッドに依存している。加えて、ほとんどのMPI技術はトレーニングが完了すると修正できない固定深度/不均一平面を使用しているため、実行時の柔軟性は極めて低い。リアルタイムに高品質なビュー合成を実現する新しいMPI生成・レンダリング技術であるLiveViewを提案する。また,実行時にシーン依存型MPI平面(平面数と間隔)を選択するための柔軟性も提供する。 liveviewはまず、入力画像をターゲットビュー(ターゲット中心)にワープし、次に目標ビュー中心のmpi、つまり1つの深さプレーン(動的に)を生成する。高速なMPI生成と新規なビュー合成を可能にするとともに、高品質なレンダリングを生成する。その結果、LiveViewは、入力ビューのビデオストリームに基づいて、MPIを頻繁に更新する必要があるリアルタイムビュー合成アプリケーションを可能にする。我々はLiveViewが、最先端のMPIベースの手法に比べて、実行時の70倍高速で、ビュー合成の品質を向上させることを実証した。

Existing Multi-Plane Image (MPI) based view-synthesis methods generate an MPI aligned with the input view using a fixed number of planes in one forward pass. These methods produce fast, high-quality rendering of novel views, but rely on slow and computationally expensive MPI generation methods unsuitable for real-time applications. In addition, most MPI techniques use fixed depth/disparity planes which cannot be modified once the training is complete, hence offering very little flexibility at run-time. We propose LiveView - a novel MPI generation and rendering technique that produces high-quality view synthesis in real-time. Our method can also offer the flexibility to select scene-dependent MPI planes (number of planes and spacing between them) at run-time. LiveView first warps input images to target view (target-centered) and then learns to generate a target view centered MPI, one depth plane at a time (dynamically). The method generates high-quality renderings, while also enabling fast MPI generation and novel view synthesis. As a result, LiveView enables real-time view synthesis applications where an MPI needs to be updated frequently based on a video stream of input views. We demonstrate that LiveView improves the quality of view synthesis while being 70 times faster at run-time compared to state-of-the-art MPI-based methods.

翻訳日:2021-07-13 15:58:20 公開日:2021-07-11

# 深部政策勾配の座標方向制御変量

Coordinate-wise Control Variates for Deep Policy Gradients ( http://arxiv.org/abs/2107.04987v1 )

ライセンス: Link先を確認

Yuanyi Zhong, Yuan Zhou, Jian Peng

(参考訳) 制御変数 (CV) 法は, 実際には勾配推定器のばらつきを低減するために, 政策勾配推定に広く用いられている。状態-作用値推定からベースライン関数を減算して制御変量を適用する。そして、ばらつきが引き起こされるポリシー勾配は、おそらく学習効率を向上させる。深いニューラルネットポリシを持つ制御変数の最近の研究は、主にスカラー値のベースライン関数に焦点を当てている。ベクトル値ベースラインの効果は未探索である。本稿では,ニューラルネットワークポリシのためのベクトル値ベースラインから構築した座標ワイドおよび層ワイド制御による分散低減について検討する。本研究では,従来のスカラー値ベースラインよりも低分散のベースラインが得られることを示す実験結果を示す。我々は、これらの新しい制御変数を用いて、人気のあるPPOアルゴリズムの装備方法を示す。正規化を適切に行うアルゴリズムは、連続制御ベンチマークにおいてスカラー制御よりも高いサンプリング効率が得られることを示す。

The control variates (CV) method is widely used in policy gradient estimation to reduce the variance of the gradient estimators in practice. A control variate is applied by subtracting a baseline function from the state-action value estimates. Then the variance-reduced policy gradient presumably leads to higher learning efficiency. Recent research on control variates with deep neural net policies mainly focuses on scalar-valued baseline functions. The effect of vector-valued baselines is under-explored. This paper investigates variance reduction with coordinate-wise and layer-wise control variates constructed from vector-valued baselines for neural net policies. We present experimental evidence suggesting that lower variance can be obtained with such baselines than with the conventional scalar-valued baseline. We demonstrate how to equip the popular Proximal Policy Optimization (PPO) algorithm with these new control variates. We show that the resulting algorithm with proper regularization can achieve higher sample efficiency than scalar control variates in continuous control benchmarks.
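
The difference between a scalar baseline and a coordinate-wise baseline can be sketched on a toy problem. The example below assumes a unit-variance Gaussian policy with mean zero (so the score equals the sampled action) and a made-up return function; it is not the paper's PPO setup. Since the score has zero mean, the second moment of each gradient coordinate equals its variance plus the squared true gradient, and the latter is the same for every baseline, so comparing second moments is enough.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
actions = rng.standard_normal((n, 2))   # a ~ N(mu, I) with mu = 0
scores = actions                        # d/dmu log N(a; mu, I) at mu = 0
returns = 10.0 * actions[:, 0] ** 2     # hypothetical toy return

def total_second_moment(baseline):
    """Sum over coordinates of the mean of (score_i * (R - b_i))^2."""
    grads = scores * (returns[:, None] - baseline)
    return float(np.mean(grads ** 2, axis=0).sum())

w = scores ** 2
# Scalar baseline: a single b shared by all gradient coordinates.
b_scalar = np.sum(w.sum(axis=1) * returns) / np.sum(w)
# Coordinate-wise baseline: one b_i per gradient coordinate.
b_coord = (w * returns[:, None]).sum(axis=0) / w.sum(axis=0)

m_scalar = total_second_moment(b_scalar)
m_coord = total_second_moment(b_coord)
```

Each `b_coord[i]` minimizes the second moment of its own coordinate, so `m_coord` can never exceed `m_scalar` on the same samples.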

翻訳日:2021-07-13 15:53:45 公開日:2021-07-11

# 空白と不均衡アノテーションによる集団からの学習

Learning from Crowds with Sparse and Imbalanced Annotations ( http://arxiv.org/abs/2107.05039v1 )

ライセンス: Link先を確認

Ye Shi, Shao-Yuan Li, Sheng-Jun Huang

(参考訳) 従来の教師付き学習では、訓練データには基礎的真理ラベルが必要であり、その収集は多くの場合困難である。近年、クラウドソーシングは、非専門家の群衆に頼って効率的なラベリングソリューションとして確立されている。ラベル付けエラーの影響を低減するために、各インスタンスを複数のワーカーに分散するのが一般的な方法だが、各ワーカーはデータのサブセットのみに注釈を付け、その結果、「sparse annotation」現象が発生する。本稿では,クラス不均衡,すなわち,基底の真理ラベルが「クラス不均衡」である場合,スパースアノテーションは偏って分布する傾向にあり,学習アルゴリズムに悪影響を及ぼす可能性があることに留意する。この問題に対処するために, 自信ある擬似アノテーションを徐々に追加し, アノテーション分布を再バランスさせることにより, 自己学習に基づく1つのアプローチを提案する。具体的には,自信ある疑似アノテーションを選択するための分布意識的信頼度尺度を提案し,少数派アノテーションをオーバーサンプリングし,多数派アノテーションをアンダーサンプリングする再サンプリング戦略を採用する。 1つの実世界のクラウドソーシング画像分類タスクにおいて,提案手法は分布非依存手法よりもトレーニングを通してよりバランスの取れたアノテーションを与え,異なるアノテーションスパーシティレベルでの学習性能を大幅に向上させることを示した。

Traditional supervised learning requires ground truth labels for the training data, whose collection can be difficult in many cases. Recently, crowdsourcing has established itself as an efficient labeling solution by resorting to non-expert crowds. To reduce the effects of labeling errors, one common practice is to distribute each instance to multiple workers, while each worker annotates only a subset of the data, resulting in the sparse annotation phenomenon. In this paper, we note that under class imbalance, i.e., when the ground truth labels are class-imbalanced, the sparse annotations are prone to be skewed in distribution, which can severely bias the learning algorithm. To combat this issue, we propose a self-training based approach named Self-Crowd, which progressively adds confident pseudo-annotations and rebalances the annotation distribution. Specifically, we propose a distribution-aware confidence measure to select confident pseudo-annotations, which adopts a resampling strategy to oversample the minority annotations and undersample the majority annotations. On one real-world crowdsourcing image classification task, we show that the proposed method yields more balanced annotations throughout training than distribution-agnostic methods and substantially improves the learning performance at different annotation sparsity levels.

翻訳日:2021-07-13 15:53:32 公開日:2021-07-11

# クラス優先シフトによる正ラベル分類:密度比推定に基づく事前不変アプローチ

Positive-Unlabeled Classification under Class-Prior Shift: A Prior-invariant Approach Based on Density Ratio Estimation ( http://arxiv.org/abs/2107.05045v1 )

ライセンス: Link先を確認

Shota Nakajima, Masashi Sugiyama

(参考訳) 正およびラベルなし(PU)データから学ぶことは、様々なアプリケーションにおいて重要な問題である。 pu分類に対する最近のアプローチのほとんどは、トレーニング未ラベルデータセットのクラス優先(正のサンプルの割合)がテストデータと同一であると仮定している。さらに、私たちは通常、トレーニングとテストデータのクラスプライオリエントを知らないので、それらを使わずに分類器をトレーニングする方法の手がかりがありません。これらの問題に対処するために,密度比推定に基づく新しいPU分類法を提案する。提案手法の特筆すべき利点は, 学習段階ではクラスプライオリエントを必要としないこと, テスト段階でのみクラスプライオリエントシフトが組み込まれていることである。提案手法を理論的に正当化し,その効果を実験的に実証する。

Learning from positive and unlabeled (PU) data is an important problem in various applications. Most of the recent approaches for PU classification assume that the class-prior (the ratio of positive samples) in the training unlabeled dataset is identical to that of the test data, which does not hold in many practical cases. In addition, we usually do not know the class-priors of the training and test data, thus we have no clue on how to train a classifier without them. To address these problems, we propose a novel PU classification method based on density ratio estimation. A notable advantage of our proposed method is that it does not require the class-priors in the training phase; class-prior shift is incorporated only in the test phase. We theoretically justify our proposed method and experimentally demonstrate its effectiveness.
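
To make the role of the density ratio under class-prior shift concrete, here is a small sketch based only on Bayes' rule. It assumes an estimate of the ratio r(x) = p(x | y=+1) / p_u(x), where p_u is the training unlabeled density, and it takes both the training prior and the test prior as given inputs. That is a simplification for illustration and is not the estimator proposed in the paper; the function name `shifted_posterior` is hypothetical.

```python
import numpy as np

def shifted_posterior(ratio, train_prior, test_prior):
    """Posterior p'(y=+1 | x) under a shifted class prior.

    ratio:       r(x) = p(x | y=+1) / p_u(x), with p_u the training
                 unlabeled density (assumed to be estimated elsewhere).
    train_prior: positive fraction in the training unlabeled data.
    test_prior:  positive fraction in the test data.
    """
    ratio = np.asarray(ratio, dtype=float)
    # p(x | y=-1) / p_u(x), obtained from p_u = pi * p_pos + (1 - pi) * p_neg.
    neg_ratio = (1.0 - train_prior * ratio) / (1.0 - train_prior)
    neg_ratio = np.clip(neg_ratio, 0.0, None)  # guard against estimation noise
    num = test_prior * ratio
    return num / (num + (1.0 - test_prior) * neg_ratio)

same_prior = shifted_posterior(1.2, train_prior=0.5, test_prior=0.5)
fewer_positives = shifted_posterior(1.2, train_prior=0.5, test_prior=0.2)
```

When the two priors coincide the expression reduces to `train_prior * ratio`, and lowering the test prior lowers the posterior for the same input.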

翻訳日:2021-07-13 15:53:06 公開日:2021-07-11

# LexSubCon: 語彙資源からの知識を語彙置換のための文脈埋め込みに統合する

LexSubCon: Integrating Knowledge from Lexical Resources into Contextual Embeddings for Lexical Substitution ( http://arxiv.org/abs/2107.05132v1 )

ライセンス: Link先を確認

George Michalopoulos, Ian McKillop, Alexander Wong, Helen Chen

(参考訳) 語彙置換は、与えられたテキストの文脈で単語の意味のある代用を生成するタスクである。文脈単語埋め込みモデルは文中の置換語から抽出された文脈情報に頼って語彙置換タスクにおいて最先端の結果を得た。しかし、そのようなモデルは外部の語彙データベースに存在する構造化知識を考慮していない。我々は,高度に正確な代替候補を識別できる文脈埋め込みモデルに基づく,エンドツーエンドの語彙置換フレームワークlexsubconを紹介する。これは文脈情報と構造化語彙資源からの知識を組み合わせることで達成される。 Our approach involves: (i) introducing a novel mix-up embedding strategy in the creation of the input embedding of the target word through linearly interpolating the pair of the target input embedding and the average embedding of its probable synonyms; (ii) considering the similarity of the sentence-definition embeddings of the target word and its proposed candidates; and, (iii) calculating the effect of each substitution in the semantics of the sentence through a fine-tuned sentence similarity model. 実験の結果,lexsubcon は ls07 やcoinco ベンチマークデータセットにおいて,語彙置換タスクに広く使用される従来の最先端手法よりも優れていた。

Lexical substitution is the task of generating meaningful substitutes for a word in a given textual context. Contextual word embedding models have achieved state-of-the-art results in the lexical substitution task by relying on contextual information extracted from the replaced word within the sentence. However, such models do not take into account structured knowledge that exists in external lexical databases. We introduce LexSubCon, an end-to-end lexical substitution framework based on contextual embedding models that can identify highly accurate substitute candidates. This is achieved by combining contextual information with knowledge from structured lexical resources. Our approach involves: (i) introducing a novel mix-up embedding strategy in the creation of the input embedding of the target word through linearly interpolating the pair of the target input embedding and the average embedding of its probable synonyms; (ii) considering the similarity of the sentence-definition embeddings of the target word and its proposed candidates; and, (iii) calculating the effect of each substitution in the semantics of the sentence through a fine-tuned sentence similarity model. Our experiments show that LexSubCon outperforms previous state-of-the-art methods on LS07 and CoInCo benchmark datasets that are widely used for lexical substitution tasks.
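
Step (i) of the approach, the mix-up of the target embedding with the average embedding of its synonyms, amounts to a linear interpolation. The sketch below shows that operation on plain vectors; the mixing weight `lam`, the function name, and the two-dimensional toy embeddings are assumptions for illustration, not values from the paper.

```python
import numpy as np

def mixup_embedding(target, synonyms, lam=0.25):
    """Interpolate a target-word embedding with the mean synonym embedding.

    lam is a hypothetical mixing weight in [0, 1]; lam = 0 keeps the
    original target embedding unchanged.
    """
    target = np.asarray(target, dtype=float)
    synonym_mean = np.mean(np.asarray(synonyms, dtype=float), axis=0)
    return (1.0 - lam) * target + lam * synonym_mean

target = np.array([1.0, 0.0])
synonyms = np.array([[0.0, 1.0], [0.0, 3.0]])
mixed = mixup_embedding(target, synonyms, lam=0.5)
```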

翻訳日:2021-07-13 15:52:52 公開日:2021-07-11

# 音韻ベクトルに基づく音声埋め込みを用いた多言語・多言語音声認識

Multilingual and crosslingual speech recognition using phonological-vector based phone embeddings ( http://arxiv.org/abs/2107.05038v1 )

ライセンス: Link先を確認

Chengrui Zhu, Keyu An, Huahuan Zheng, Zhijian Ou

(参考訳) 音声特徴量(pfs)の使用は、訓練中に言語固有の電話機を接続できる可能性があり、低リソース言語のための多言語および言語間音声認識方法の情報共有に非常に望ましい。従来の音韻的特徴を用いた場合の欠点は、ボトムアップ方式での音響-PF抽出自体が難しいことである。本稿では,音韻駆動型電話埋め込み(トップダウン)とディープニューラルネットワーク(dnn)を用いた音響特徴抽出(bottom-up)を併用し,電話の確率を推定する。新しい手法はJoinAP(Joining of Acoustics and Phonology)と呼ばれる。音声認識には音響から音韻的特徴への逆変換は不要である。 For each phone in the IPA (International Phonetic Alphabet) table, we encode its phonological features to a phonological-vector, and then apply linear or nonlinear transformation of the phonological-vector to obtain the phone embedding. コモンボイスデータセット (ドイツ語, フランス語, スペイン語, イタリア語) と AISHELL-1 データセット (mandarin) で複数言語間および言語間(ゼロショットと少数ショットの両方)の音声認識実験を行い、joinap の線形電話埋め込みとフラット電話埋め込みによる従来の方法の両方において、非線形電話埋め込みによるjoinapの優位性を実証した。

The use of phonological features (PFs) potentially allows language-specific phones to remain linked in training, which is highly desirable for information sharing for multilingual and crosslingual speech recognition methods for low-resourced languages. A drawback suffered by previous methods in using phonological features is that the acoustic-to-PF extraction in a bottom-up way is itself difficult. In this paper, we propose to join phonology driven phone embedding (top-down) and deep neural network (DNN) based acoustic feature extraction (bottom-up) to calculate phone probabilities. The new method is called JoinAP (Joining of Acoustics and Phonology). Remarkably, no inversion from acoustics to phonological features is required for speech recognition. For each phone in the IPA (International Phonetic Alphabet) table, we encode its phonological features to a phonological-vector, and then apply linear or nonlinear transformation of the phonological-vector to obtain the phone embedding. A series of multilingual and crosslingual (both zero-shot and few-shot) speech recognition experiments are conducted on the CommonVoice dataset (German, French, Spanish and Italian) and the AISHELL-1 dataset (Mandarin), and demonstrate the superiority of JoinAP with nonlinear phone embeddings over both JoinAP with linear phone embeddings and the traditional method with flat phone embeddings.
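
The linear variant of the phone embedding can be sketched in a few lines: each phone's phonological vector is mapped to an embedding, and phone scores are inner products between those embeddings and the acoustic features. The dimensions, the random binary phonological vectors, and the random transform below are placeholders; in the actual method the transform and the acoustic encoder are learned.

```python
import numpy as np

def phone_posteriors(acoustic, phon_vectors, transform):
    """Phone probabilities from acoustic features and phonological vectors.

    acoustic:     (T, d) frame-level features, e.g. from a DNN encoder.
    phon_vectors: (N, F) one phonological vector per phone.
    transform:    (F, d) linear map from phonological vector to embedding.
    """
    phone_emb = phon_vectors @ transform            # (N, d), top-down
    logits = acoustic @ phone_emb.T                 # (T, N)
    logits = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)     # softmax over phones

rng = np.random.default_rng(0)
n_phones, n_features, dim, n_frames = 4, 6, 3, 5
phon_vectors = rng.integers(0, 2, size=(n_phones, n_features)).astype(float)
transform = rng.standard_normal((n_features, dim))
acoustic = rng.standard_normal((n_frames, dim))
probs = phone_posteriors(acoustic, phon_vectors, transform)
```

A phone that never appeared in training still gets an embedding as long as its phonological vector is known, which is what makes the zero-shot crosslingual setting possible in principle.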

翻訳日:2021-07-13 15:49:20 公開日:2021-07-11

# BEV-MODNet:自律走行のための単眼カメラによる鳥の視線移動物体検出

BEV-MODNet: Monocular Camera based Bird's Eye View Moving Object Detection for Autonomous Driving ( http://arxiv.org/abs/2107.04937v1 )

ライセンス: Link先を確認

Hazem Rashed, Mariam Essam, Maha Mohamed, Ahmad El Sallab and Senthil Yogamani

(参考訳) 移動物体の検出は、自律運転システムにおいて非常に重要なタスクである。知覚フェーズの後、動作計画は通常バードアイビュー(BEV)空間で行われる。これは、画像平面上で検出されたオブジェクトをトップビューのBEV平面に投影する必要がある。このようなプロジェクションは、深度情報や遠方でのノイズマッピングの欠如によってエラーを起こしやすい。 CNNは、現場のグローバルコンテキストを活用して、より良いプロジェクトを作成することができる。本研究では,モノクル画像を直接入力として,BEVマップ上での終端移動物体検出(MOD)について検討する。我々の知る限り、そのようなデータセットは存在せず、5つのクラスのためにBEV空間で動くオブジェクトマスクのアノテーションを備えた12.9k画像からなる拡張KITTI-rawデータセットを作成します。データセットはクラスに依存しないモーションキューベースのオブジェクト検出に使用され、クラスはチューニングを改善するためにメタデータとして提供される。我々は,bev空間内で直接動作セグメンテーションを出力する2ストリームrgbとオプティカルフロー融合アーキテクチャを設計し実装する。画像平面上での最先端動作分割予測の逆視点マッピングと比較する。簡単なベースライン実装を用いてmIoUの13%の大幅な改善を観測した。これは、bev空間で動きのセグメンテーション出力を直接学習する能力を示している。私たちのベースラインとデータセットのアノテーションの質的な結果は、https://sites.google.com/view/bev-modnetで確認できます。

Detection of moving objects is a very important task in autonomous driving systems. After the perception phase, motion planning is typically performed in Bird's Eye View (BEV) space. This would require projection of objects detected on the image plane to top view BEV plane. Such a projection is prone to errors due to lack of depth information and noisy mapping in far away areas. CNNs can leverage the global context in the scene to project better. In this work, we explore end-to-end Moving Object Detection (MOD) on the BEV map directly using monocular images as input. To the best of our knowledge, such a dataset does not exist and we create an extended KITTI-raw dataset consisting of 12.9k images with annotations of moving object masks in BEV space for five classes. The dataset is intended to be used for class agnostic motion cue based object detection and classes are provided as meta-data for better tuning. We design and implement a two-stream RGB and optical flow fusion architecture which outputs motion segmentation directly in BEV space. We compare it with inverse perspective mapping of state-of-the-art motion segmentation predictions on the image plane. We observe a significant improvement of 13% in mIoU using the simple baseline implementation. This demonstrates the ability to directly learn motion segmentation output in BEV space. Qualitative results of our baseline and the dataset annotations can be found in https://sites.google.com/view/bev-modnet.
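
The reported metric, mean intersection over union (mIoU), can be computed on BEV label maps as sketched below. This is the common definition of the metric; whether the authors skip absent classes in the same way is an assumption, and the 2x2 label maps are toy data.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that appear in the prediction or the target."""
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both maps, skip it
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))

pred = np.array([[0, 0], [1, 1]])     # 0 = static, 1 = moving (toy labels)
target = np.array([[0, 1], [1, 1]])
miou = mean_iou(pred, target, num_classes=2)
```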

翻訳日:2021-07-13 15:48:09 公開日:2021-07-11

# 圧縮センシングmriのための深部幾何蒸留ネットワーク

Deep Geometric Distillation Network for Compressive Sensing MRI ( http://arxiv.org/abs/2107.04943v1 )

ライセンス: Link先を確認

Xiaohong Fan, Yin Yang, Jianping Zhang

(参考訳) 圧縮センシング(CS)は、小さなサンプルデータから$k$-spaceでMR画像を再構成し、MRIの取得を加速する効率的な方法である。本研究では,モデルに基づくCS-MRI法と深層学習に基づくCS-MRI法の利点を組み合わせた新しい深部幾何学的蒸留ネットワークを提案する。まず,モデルに基づくcs-mri最適化問題を,画像線形近似と画像幾何補償からなる2つの部分問題に展開する。第二に、近似段階における失われたテクスチャの詳細を蒸留するための幾何補正サブプロブレムをテイラー展開により拡張し、異なる幾何学特性領域の特徴を融合させる幾何学蒸留モジュールを設計することができる。さらに、ステップ長パラメータの適応初期化を伴う学習可能なバージョンを使用しており、モデルの柔軟性が向上し、スムーズに収束することができる。数値実験により、他の最先端のCS-MRI再構成手法よりもその優位性を検証した。ソースコードは \url{https://github.com/fanxiaohong/deep-Geometric-Distillation-for-CS-MRI} で入手できる。

Compressed sensing (CS) is an efficient method to reconstruct MR images from a small amount of sampled data in $k$-space and to accelerate MRI acquisition. In this work, we propose a novel deep geometric distillation network which combines the merits of model-based and deep learning-based CS-MRI methods; it can be theoretically guaranteed to improve geometric texture details of a linear reconstruction. Firstly, we unfold the model-based CS-MRI optimization problem into two sub-problems that consist of image linear approximation and image geometric compensation. Secondly, the geometric compensation sub-problem for distilling texture details lost in the approximation stage can be expanded by Taylor expansion to design a geometric distillation module that fuses features of different geometric characteristic domains. Additionally, we use a learnable version with adaptive initialization of the step-length parameter, which gives the model more flexibility and can lead to smooth convergence. Numerical experiments verify its superiority over other state-of-the-art CS-MRI reconstruction approaches. The source code will be available at https://github.com/fanxiaohong/Deep-Geometric-Distillation-Network-for-CS-MRI
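
Unrolled CS-MRI networks typically alternate a data-consistency update with a learned correction. The sketch below shows only the generic data-consistency gradient step on undersampled $k$-space data, with a fixed step length; it is not the authors' geometric distillation module, and the random 8x8 image and sampling mask are toy inputs.

```python
import numpy as np

def data_consistency_step(x, y, mask, step=1.0):
    """One gradient step on 0.5 * ||M F x - y||^2 with an orthonormal FFT.

    x:    current image estimate (complex array).
    y:    measured k-space samples, zero where not sampled.
    mask: boolean sampling mask M in k-space.
    """
    residual = mask * np.fft.fft2(x, norm="ortho") - y
    return x - step * np.fft.ifft2(mask * residual, norm="ortho")

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
mask = rng.random((8, 8)) < 0.5
y = mask * np.fft.fft2(image, norm="ortho")

x0 = np.zeros((8, 8), dtype=complex)
x1 = data_consistency_step(x0, y, mask)   # equals the zero-filled reconstruction
```

With a unit step from a zero start, the result matches the measured samples exactly; a learnable step length, as mentioned in the abstract, would replace the fixed `step` argument.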

翻訳日:2021-07-13 15:47:48 公開日:2021-07-11

# NeoUNet: 正確な大腸ポリープ分画と腫瘍検出を目指して

NeoUNet: Towards accurate colon polyp segmentation and neoplasm detection ( http://arxiv.org/abs/2107.05023v1 )

ライセンス: Link先を確認

Phan Ngoc Lan, Nguyen Sy An, Dao Viet Hang, Dao Van Long, Tran Quang Trung, Nguyen Thi Thuy, Dinh Viet Sang

(参考訳) 自動ポリプセグメンテーションは内視鏡検査において非常に有用であることが証明されており、内視鏡内科医の腺腫検出率を低下させ、効率を高めている。しかし、ポリープを腫瘍かどうかを分類し、ピクセルレベルで分割することは、医師が限られた時間で実行するのが難しい課題である。本稿では,ポリプセグメンテーション問題に対する細粒度な定式化を提案する。我々の定式化は,ポリープ領域だけでなく,悪性度の高い部位を高い精度で同定することを目的としている。さらに,この問題を解決するために,NeoUNetと呼ばれるUNetベースのニューラルネットワークアーキテクチャとハイブリッド損失関数を提案する。実験では,既存のポリプセグメンテーションモデルと比較して,neounetに対する高い競合性を示す。

Automatic polyp segmentation has proven to be immensely helpful for endoscopy procedures, reducing the missing rate of adenoma detection for endoscopists while increasing efficiency. However, classifying a polyp as neoplastic or not and segmenting it at the pixel level is still a challenging task for doctors to perform in a limited time. In this work, we propose a fine-grained formulation for the polyp segmentation problem. Our formulation aims to not only segment polyp regions, but also identify those at high risk of malignancy with high accuracy. In addition, we present a UNet-based neural network architecture called NeoUNet, along with a hybrid loss function to solve this problem. Experiments show highly competitive results for NeoUNet on our benchmark dataset compared to existing polyp segmentation models.

翻訳日:2021-07-13 15:47:30 公開日:2021-07-11

# 類似度誘導型深部顔画像検索

Similarity Guided Deep Face Image Retrieval ( http://arxiv.org/abs/2107.05025v1 )

ライセンス: Link先を確認

Young Kyun Jang, Nam Ik Cho

(参考訳) 検索入力された顔画像から同一同一の画像を検索する顔画像検索は、画像データベースのサイズが急速に増加するにつれて注目を浴びている。高速かつ正確な検索を行うために, コンパクトなハッシュコードに基づく手法が提案されており, 近年, 教師付き分類訓練による深面画像ハッシュ手法が注目されている。しかし、分類に基づくスキームは、顔画像間の複雑な類似性をハッシュコード学習に明らかにできないという欠点がある。本稿では,自己と対相同性を同時に考慮した類似性誘導ハッシュ(sgh)法を提案することにより,顔画像の検索品質の向上を試みる。 sghは、顔画像間の精巧な類似性を探索するために設計された様々なデータ拡張を用いている。既存のベンチマークと大規模高解像度顔画像データセットによるプロトコルに関する大規模な実験結果から,我々のSGHが最先端の検索性能を実現することを示す。

Face image retrieval, which searches for images of the same identity as the query input face image, is drawing more attention as the size of image databases increases rapidly. In order to conduct fast and accurate retrieval, compact hash code-based methods have been proposed, and recently, deep face image hashing methods with supervised classification training have shown outstanding performance. However, the classification-based scheme has a disadvantage in that it cannot reflect complex similarities between face images in hash code learning. In this paper, we attempt to improve the face image retrieval quality by proposing a Similarity Guided Hashing (SGH) method, which gently considers self-similarity and pairwise similarity simultaneously. SGH employs various data augmentations designed to explore elaborate similarities between face images, solving both intra- and inter-identity difficulties. Extensive experimental results on the protocols with existing benchmarks and an additionally proposed large-scale, higher-resolution face image dataset demonstrate that our SGH delivers state-of-the-art retrieval performance.
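
Hash-based retrieval reduces the search itself to comparing short binary codes. The sketch below ranks database items by Hamming distance to a query code; how SGH learns the codes is not shown, and the 4-bit codes are toy values.

```python
import numpy as np

def hamming_rank(query_code, db_codes):
    """Rank database items by Hamming distance to the query code."""
    distances = np.count_nonzero(db_codes != query_code, axis=1)
    order = np.argsort(distances, kind="stable")
    return order, distances

db_codes = np.array([[1, 0, 1, 1],
                     [0, 0, 1, 1],
                     [1, 1, 0, 0]], dtype=np.uint8)
query_code = np.array([1, 0, 1, 1], dtype=np.uint8)
order, distances = hamming_rank(query_code, db_codes)
```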

翻訳日:2021-07-13 15:47:18 公開日:2021-07-11

# 投影・撮影機能を有するハイブリッド画素を用いたプロジェクタカメラシステム

A Projector-Camera System Using Hybrid Pixels with Projection and Capturing Capabilities ( http://arxiv.org/abs/2107.05043v1 )

ライセンス: Link先を確認

Kenta Yamamoto, Daisuke Iwai, Kosuke Sato

(参考訳) 本稿では,各画素に投影能力と撮影能力の両方を有するプロジェクターカメラシステム(procams)を提案する。提案するprocamsは,プロジェクタとカメラの正確な画素対応を得るのが困難である。概念実証procamsプロトタイプを実装し,その動的投影マッピングへの適用性を実証した。

We propose a novel projector-camera system (ProCams) in which each pixel has both projection and capturing capabilities. Our proposed ProCams solves the difficulty of obtaining precise pixel correspondence between the projector and the camera. We implemented a proof-of-concept ProCams prototype and demonstrated its applicability to a dynamic projection mapping.

翻訳日:2021-07-13 15:47:00 公開日:2021-07-11

# 正則化m推定器の導出と残留分布と適応チューニングへの応用

Derivatives and residual distribution of regularized M-estimators with application to adaptive tuning ( http://arxiv.org/abs/2107.05143v1 )

ライセンス: Link先を確認

Pierre C Bellec, Yiwei Shen

(参考訳) 本稿では,ガウス設計行列と任意雑音分布を持つ線形モデルにおいて,凸ペナルティを正規化した勾配-リプシッツ損失関数を持つm推定器について検討する。実例では、ハマー損失と弾性ネットのペナルティとノイズ分布の重みを持つロバストなM推定器がある。私たちの主な貢献は3倍です。 i) 正規化 M-推定器の微分に対する一般式 $\hat\beta(y,X)$ ここでの微分は$y$と$X$の両方で成り立つが、これはすべての凸正規化 M-推定器で共有される単純な微分可能性構造を示す。 (ii)これらの誘導体を用いて、次元とサンプルサイズが同一の中間高次元状態における残余 $r_i = y_i-x_i^\top\hat\beta$ の分布を特徴付ける。 (iii) 残差の分布を動機とし, 正規化m推定器のチューニングパラメータを選択するための新しい適応基準を提案する。基準は、サンプル外誤差を推定器から独立な加算定数まで近似することにより、サンプル外誤差を最小化するプロキシを提供する。提案した適応的基準は、ノイズ分布や設計の共分散の知識を必要としない。シミュレートされたデータは、残差の分布と基準の成功の両方に関して、サンプル外誤差のプロキシとして理論的な結果を確認する。最後に、我々の結果は、$\hat\beta(y,X)$ の微分と独立な興味を持つ M-推定子の有効自由度の間の新しい関係を明らかにする。

This paper studies M-estimators with gradient-Lipschitz loss function regularized with convex penalty in linear models with Gaussian design matrix and arbitrary noise distribution. A practical example is the robust M-estimator constructed with the Huber loss and the Elastic-Net penalty when the noise distribution has heavy tails. Our main contributions are three-fold. (i) We provide general formulae for the derivatives of regularized M-estimators $\hat\beta(y,X)$ where differentiation is taken with respect to both $y$ and $X$; this reveals a simple differentiability structure shared by all convex regularized M-estimators. (ii) Using these derivatives, we characterize the distribution of the residual $r_i = y_i-x_i^\top\hat\beta$ in the intermediate high-dimensional regime where dimension and sample size are of the same order. (iii) Motivated by the distribution of the residuals, we propose a novel adaptive criterion to select tuning parameters of regularized M-estimators. The criterion approximates the out-of-sample error up to an additive constant independent of the estimator, so that minimizing the criterion provides a proxy for minimizing the out-of-sample error. The proposed adaptive criterion does not require the knowledge of the noise distribution or of the covariance of the design. Simulated data confirms the theoretical findings, regarding both the distribution of the residuals and the success of the criterion as a proxy of the out-of-sample error. Finally, our results reveal new relationships between the derivatives of $\hat\beta(y,X)$ and the effective degrees of freedom of the M-estimator, which are of independent interest.
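
The Huber loss with an Elastic-Net penalty, named in the abstract as a practical example, can be fitted with a short proximal gradient loop. The sketch below is one standard way to compute such an estimator and produces the residuals $r_i = y_i - x_i^\top\hat\beta$; the objective scaling, the parameter names, and the noiseless toy data are assumptions, and the paper's adaptive tuning criterion is not implemented here.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def huber_enet(X, y, lam1=0.1, lam2=0.1, delta=1.0, n_iter=500):
    """Minimize mean Huber(y - X b) + lam1 * ||b||_1 + (lam2 / 2) * ||b||^2."""
    n, p = X.shape
    # Lipschitz constant of the smooth part (Huber curvature is at most 1).
    lipschitz = np.linalg.norm(X, 2) ** 2 / n + lam2
    beta = np.zeros(p)
    for _ in range(n_iter):
        residual = y - X @ beta
        psi = np.clip(residual, -delta, delta)      # derivative of Huber loss
        grad = -X.T @ psi / n + lam2 * beta
        beta = soft_threshold(beta - grad / lipschitz, lam1 / lipschitz)
    return beta

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))                   # Gaussian design
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true                                   # noiseless toy response
beta_hat = huber_enet(X, y, lam1=0.0, lam2=0.0)
residuals = y - X @ beta_hat
```

Running the same function over a grid of `lam1` and `lam2` values gives the family of estimators among which a tuning criterion would choose.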

翻訳日:2021-07-13 15:43:52 公開日:2021-07-11

# 拡張勾配降下を用いたコルモゴロフモデル学習の2重最適化

Dual Optimization for Kolmogorov Model Learning Using Enhanced Gradient Descent ( http://arxiv.org/abs/2107.05011v1 )

ライセンス: Link先を確認

Qiyou Duan and Hadi Ghauch and Taejoon Kim

(参考訳) データ表現技術は、データ処理と機械学習(ML)の進歩に大きく貢献している。予測能力の向上は従来の表現技法の焦点であり、残念ながらデータの基礎となる洞察を抽出するという点では解釈可能性にかなり劣っている。近年、kolmogorov model (km) が研究され、確率変数の集合の根底にある確率的構造を学ぶための解釈可能かつ予測可能な表現アプローチである。しかし、ランダム化による半定緩和(SDRwR)や離散単調最適化(DMO)を用いた既存のKM学習アルゴリズムは、計算処理がうまく行えないため、ビッグデータアプリケーションに限られている。本稿では,拡張勾配降下法(gd)法を併用した正規化双対最適化に基づく,計算スケーラブルなkm学習アルゴリズムを提案する。提案手法を大規模化するために,固有値分解(EVD)と近似EVDアルゴリズムの2つの高速化手法を提案する。さらに、近似誤差解析を利用して正規化されたMinkowski $\ell_1$-normとそのバウンダリを利用するしきい値は、近位EVDアルゴリズムの反復数を選択するために提供される。ビッグデータアプリケーションに適用した場合,提案手法は,既存のKM学習アルゴリズムと比較して,計算複雑性を著しく低減した互換性のあるトレーニング/予測性能を実現可能であることが実証された。さらに,提案したKM学習アルゴリズムを用いた論理的関係マイニングの精度は80\%以上であることを示した。

Data representation techniques have made a substantial contribution to advancing data processing and machine learning (ML). Improving predictive power was the focus of previous representation techniques, which unfortunately perform rather poorly on interpretability in terms of extracting underlying insights from the data. Recently, the Kolmogorov model (KM) was studied, which is an interpretable and predictable representation approach to learning the underlying probabilistic structure of a set of random variables. The existing KM learning algorithms using semi-definite relaxation with randomization (SDRwR) or discrete monotonic optimization (DMO) have, however, limited utility for big data applications because they do not scale well computationally. In this paper, we propose a computationally scalable KM learning algorithm, based on regularized dual optimization combined with an enhanced gradient descent (GD) method. To make our method more scalable to large-dimensional problems, we propose two acceleration schemes, namely, an eigenvalue decomposition (EVD) elimination strategy and a proximal EVD algorithm. Furthermore, a thresholding technique that exploits the approximation error analysis and leverages the normalized Minkowski $\ell_1$-norm and its bounds is provided for the selection of the number of iterations of the proximal EVD algorithm. When applied to big data applications, it is demonstrated that the proposed method can achieve compatible training/prediction performance with significantly reduced computational complexity; roughly two orders of magnitude improvement in terms of the time overhead, compared to the existing KM learning algorithms. Furthermore, it is shown that the accuracy of logical relation mining for interpretability by using the proposed KM learning algorithm exceeds $80\%$.

翻訳日:2021-07-13 15:41:14 公開日:2021-07-11

# 深層学習を用いたスペクトル時間RF識別

Spectro-Temporal RF Identification using Deep Learning ( http://arxiv.org/abs/2107.05114v1 )

ライセンス: Link先を確認

Hai N. Nguyen, Marinos Vomvas, Triet Vo-Huu, Guevara Noubir

(参考訳) RF放射の検出、分類、分光時間的局所化は、RFスペクトルの理解、管理、保護に関するタスクだけでなく、侵入するドローンやジャマーの検出などの安全およびセキュリティ用途にも不可欠である。広帯域スペクトルとリアルタイム性能のこの目標を達成することは難しい問題である。本稿では,スペクトル時間検出,フレームワーク,システムを備えた広帯域リアルタイムrf識別システムである手首を提案する。得られた深層学習モデルは,100MHzスペクトルのRFサンプルをリアルタイム(入出力6Gbps以上のI&Qストリーム)で検出し,分類し,正確に検出することができる。このような機能は、深層学習に基づく一段階オブジェクト検出フレームワークを活用し、多チャンネル画像に基づくRF信号表現に学習を移すことにより実現可能である。また,合成および拡張rfデータを活用して,rfエミッション(spread)の大規模ラベル付きデータセットを効率的に構築する反復的トレーニング手法を提案する。 WRIST検出器は、野生の非常に密集した環境でも平均的な平均精度が90に達する。 WRISTモデルは5つの技術(Bluetooth、Lightbridge、Wi-Fi、XPD、ZigBee)を分類し、容易に拡張可能である。キュレートされた注釈付きデータセットをコミュニティ全体に公開しています。さまざまな無線電波から収集された100万近いラベル付きrf放射が、さまざまな環境にまたがって5つの排出クラスにまたがって構成されている。

RF emissions detection, classification, and spectro-temporal localization are crucial not only for tasks relating to understanding, managing, and protecting the RF spectrum, but also for safety and security applications such as detecting intruding drones or jammers. Achieving this goal for wideband spectrum with real-time performance is a challenging problem. We present WRIST, a Wideband, Real-time RF Identification system with Spectro-Temporal detection, framework and system. Our resulting deep learning model is capable of detecting, classifying, and precisely locating RF emissions in time and frequency using RF samples of 100 MHz spectrum in real time (over 6 Gbps incoming I&Q streams). Such capabilities are made feasible by leveraging a deep-learning based one-stage object detection framework, and transfer learning to a multi-channel image-based RF signal representation. We also introduce an iterative training approach which leverages synthesized and augmented RF data to efficiently build large labelled datasets of RF emissions (SPREAD). The WRIST detector achieves 90 mean Average Precision even in extremely congested environments in the wild. The WRIST model classifies five technologies (Bluetooth, Lightbridge, Wi-Fi, XPD, and ZigBee) and is easily extendable to others. We are making our curated and annotated dataset available to the whole community. It consists of nearly 1 million fully labelled RF emissions collected from various off-the-shelf wireless radios in a range of environments and spanning the five classes of emissions.
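
An image-based RF representation starts from a time-frequency view of the I&Q samples. The sketch below builds a single-channel log-magnitude spectrogram with non-overlapping Hann-windowed FFT frames; the multi-channel encoding and the object detector used by WRIST are not reproduced, and the sample rate, frame size, and test tone are assumptions for the example.

```python
import numpy as np

def iq_spectrogram(iq, n_fft=64):
    """Log-magnitude spectrogram (frames x frequency bins) of complex I&Q samples."""
    n_frames = len(iq) // n_fft
    frames = iq[: n_frames * n_fft].reshape(n_frames, n_fft)
    window = np.hanning(n_fft)
    spectrum = np.fft.fft(frames * window, axis=1)
    spectrum = np.fft.fftshift(spectrum, axes=1)   # put 0 Hz in the middle
    return 20.0 * np.log10(np.abs(spectrum) + 1e-12)

fs = 1e6                                           # hypothetical sample rate, Hz
t = np.arange(1024) / fs
iq = np.exp(2j * np.pi * 250e3 * t)                # a single tone at +250 kHz
spec = iq_spectrogram(iq)
peak_bins = spec.argmax(axis=1)
```

Each emission then appears as a box-like region in the time-frequency image, which is why an object detector is a natural fit for locating it.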

翻訳日:2021-07-13 15:40:44 公開日:2021-07-11

# アタックルール:機械学習を用いた産業制御システムに対するアタック生成の逆アプローチ

Attack Rules: An Adversarial Approach to Generate Attacks for Industrial Control Systems using Machine Learning ( http://arxiv.org/abs/2107.05127v1 )

ライセンス: Link先を確認

Muhammad Azmi Umer, Chuadhry Mujeeb Ahmed, Muhammad Taha Jilani, Aditya P. Mathur

(参考訳) 逆学習は、攻撃中の機械学習アルゴリズムの堅牢性をテストし、産業制御システム(ICS)の異常検出手法を欺く攻撃を生成するために使用される。本研究は,icのセキュリティ評価において,攻撃パターンの徹底的な集合が研究されていることを考慮し,マイニングに基づく攻撃生成手法を提案する。この技術は安全な水処理プラントのデータを用いて実装されている。提案手法は,これまで見られなかった攻撃ベクトルの大部分を構成する30万以上の攻撃パターンを生成することができた。自動生成攻撃は、潜在的な攻撃の理解を深め、ロバストな攻撃検出技術の設計を可能にする。

Adversarial learning is used to test the robustness of machine learning algorithms under attack and create attacks that deceive the anomaly detection methods in Industrial Control System (ICS). Given that security assessment of an ICS demands that an exhaustive set of possible attack patterns is studied, in this work, we propose an association rule mining-based attack generation technique. The technique has been implemented using data from a secure Water Treatment plant. The proposed technique was able to generate more than 300,000 attack patterns constituting a vast majority of new attack vectors which were not seen before. Automatically generated attacks improve our understanding of the potential attacks and enable the design of robust attack detection techniques.
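
Association rule mining scores a rule "A implies B" by its support and confidence over a set of observed states. The sketch below mines single-item rules from a handful of made-up plant states; the item names, thresholds, and the restriction to item pairs are assumptions for illustration and do not reflect the dataset or the rule format used in the paper.

```python
from itertools import combinations

def mine_rules(transactions, min_support=0.5, min_confidence=0.8):
    """Return (lhs, rhs, support, confidence) for single-item rules lhs -> rhs."""
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b})
        if pair_support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support({lhs})
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules

# Hypothetical discretized sensor/actuator states, one set per time step.
states = [
    {"pump_on", "valve_open"},
    {"pump_on", "valve_open", "level_high"},
    {"pump_on"},
    {"valve_open", "pump_on"},
]
rules = mine_rules(states)
```

A rule that holds with high confidence during normal operation describes an invariant of the process, and a state that violates it is a candidate pattern for an attack.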

翻訳日:2021-07-13 15:40:20 公開日:2021-07-11

# 新型コロナウイルス対策における温熱測定のためのクラウド-エッジ-端末協調システム

A Cloud-Edge-Terminal Collaborative System for Temperature Measurement in COVID-19 Prevention ( http://arxiv.org/abs/2107.05078v1 )

ライセンス: Link先を確認

Zheyi Ma, Hao Li, Wen Fang, Qingwen Liu, Bin Zhou and Zhiyong Bu

(参考訳) 新型コロナウイルス(COVID-19)の感染拡大を防止するため、公共の場で予備温度測定とマスク検出を行う。しかし、既存の温度測定手法は安全性と展開の問題に直面している。本稿では,人の顔が部分的にぼけている場合でも,安全で正確な温度測定を実現するため,軽量赤外線温度計測モデルを用いたクラウド・エッジ・ターミナル協調システムを提案する。 RGBレンズとサーマルレンズを備えた双眼鏡カメラを用いて、画像対を同時にキャプチャする。次に,マルチタスク・カスケード・畳み込みネットワーク(MTCNN)に基づく移動体検出モデルを提案し,RGB画像上での顔アライメントとマスク検出を実現する。正確な温度測定のために、RGB画像の顔のランドマークをアフィン変換により熱画像に変換し、額のより正確な温度測定領域を選択する。収集された情報は、新型コロナウイルス予防のためにリアルタイムでクラウドにアップロードされる。実験により、検出モデルはわずか6.1mで、平均検出速度は257msであることが示された。 1mの距離では、室内温度測定の誤差は約3%である。すなわち,提案システムは公共空間におけるリアルタイム温度測定を実現することができる。

To prevent the spread of coronavirus disease 2019 (COVID-19), preliminary temperature measurement and mask detection in public areas are conducted. However, the existing temperature measurement methods face the problems of safety and deployment. In this paper, to realize safe and accurate temperature measurement even when a person's face is partially obscured, we propose a cloud-edge-terminal collaborative system with a lightweight infrared temperature measurement model. A binocular camera with an RGB lens and a thermal lens is utilized to simultaneously capture image pairs. Then, a mobile detection model based on a multi-task cascaded convolutional network (MTCNN) is proposed to realize face alignment and mask detection on the RGB images. For accurate temperature measurement, we transform the facial landmarks on the RGB images to the thermal images by an affine transformation and select a more accurate temperature measurement area on the forehead. The collected information is uploaded to the cloud in real time for COVID-19 prevention. Experiments show that the detection model is only 6.1M and the average detection speed is 257ms. At a distance of 1m, the error of indoor temperature measurement is about 3%. That is, the proposed system can realize real-time temperature measurement in public areas.
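
Mapping facial landmarks from the RGB image to the thermal image with an affine transformation can be sketched as a least-squares fit on a few corresponding calibration points. The calibration points, the scale and offset between the two lenses, and the landmark coordinates below are invented for the example.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2D affine map (2x3 matrix) from src to dst points, N >= 3."""
    src_h = np.hstack([src, np.ones((len(src), 1))])
    params, *_ = np.linalg.lstsq(src_h, dst, rcond=None)
    return params.T

def apply_affine(matrix, points):
    """Apply a 2x3 affine matrix to an (N, 2) array of points."""
    return points @ matrix[:, :2].T + matrix[:, 2]

# Hypothetical calibration pairs: thermal = 0.5 * rgb + (10, 5).
rgb_points = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0], [100.0, 100.0]])
thermal_points = rgb_points * 0.5 + np.array([10.0, 5.0])
matrix = fit_affine(rgb_points, thermal_points)

forehead_rgb = np.array([[100.0, 200.0]])          # landmark found on the RGB image
forehead_thermal = apply_affine(matrix, forehead_rgb)
```

The temperature would then be read from the thermal image around `forehead_thermal` rather than from the whole face.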

翻訳日:2021-07-13 15:39:21 公開日:2021-07-11

# 5gコネクテッド・オートマチック運転におけるqos予測

QoS Prediction for 5G Connected and Automated Driving ( http://arxiv.org/abs/2107.05000v1 )

ライセンス: Link先を確認

Apostolos Kousaridas, Ramya Panthangi Manjunath, Jose Mauricio Perdomo, Chan Zhou, Ernst Zielinski, Steffen Schmitz and Andreas Pfadler

(参考訳) 5G通信システムは、多くの高度な車両間通信(V2X)ユースケースの要求品質(QoS)要件をサポートすることができる。しかし、特に自動走行車の安全で効率的な運転は、供給されたQoSの急激な変更によって影響を受ける可能性がある。このため、QoS変更の予測と、これらの予測された変更の早期通知は、最近5G通信システムによって実現されている。このソリューションにより、車両はアプリケーションレベルでの突然のQoS変化の影響を回避または緩和することができる。本稿では,5G通信システムによってQoS予測が生成され,V2Xアプリケーションに配信される方法について述べる。遠隔操作運転使用事例は、QoS予測スキームの実現可能性を分析する例として使用される。 qos予測ソリューションを開発するための有用な推奨事項が提供され、オープンリサーチのトピックが特定される。

5G communication systems can support the demanding quality-of-service (QoS) requirements of many advanced vehicle-to-everything (V2X) use cases. However, safe and efficient driving, especially of automated vehicles, may be affected by sudden changes in the provided QoS. For that reason, the prediction of QoS changes and the early notification of these predicted changes to the vehicles have been recently enabled by 5G communication systems. This solution enables the vehicles to avoid or mitigate the effect of sudden QoS changes at the application level. This article describes how QoS prediction could be generated by a 5G communication system and delivered to a V2X application. The tele-operated driving use case is used as an example to analyze the feasibility of a QoS prediction scheme. Useful recommendations for the development of a QoS prediction solution are provided, while open research topics are identified.

翻訳日:2021-07-13 15:39:04 公開日:2021-07-11

# ニューラルウェーブシェイピング合成

Neural Waveshaping Synthesis ( http://arxiv.org/abs/2107.05050v1 )

ライセンス: Link先を確認

Ben Hayes, Charalampos Saitis, György Fazekas

(参考訳) ニューラルウェーブシェーピングユニット (NEWT) は, 波形領域で直接動作するニューラルオーディオ合成に対して, 高速なCPU推論のためのアタッチメント最適化 (FastNEWT) を備えた, 軽量で完全な因果的アプローチである。 NEWTは周期的なアクティベーションを持つ時間分散多層パーセプトロンを使用して、ターゲットの音色の特徴を符号化する非線形伝達関数を暗黙的に学習する。訓練されると、ニュートは入力信号と出力信号の単純なアフィン変換によって複雑な音節の進化を生み出すことができる。 NEWTと差別化可能なノイズ合成器を組み合わせて残響を行い、260kの総モデルパラメータしか持たない現実的な楽器演奏をF0と大音量で再現できることを発見した。提案手法を,マルチ刺激聴取テストとFréchet Audio Distanceと比較したところ,テストされた音節領域間で競合する性能を示した。提案手法は, 生成速度のベンチマークを著しく上回り, 高速更新の有無に関わらず, 消費者cpu上でのリアルタイム性能を実現し, 将来的な創造的音響設計ツールの基盤となることを示唆する。

We present the Neural Waveshaping Unit (NEWT): a novel, lightweight, fully causal approach to neural audio synthesis which operates directly in the waveform domain, with an accompanying optimisation (FastNEWT) for efficient CPU inference. The NEWT uses time-distributed multilayer perceptrons with periodic activations to implicitly learn nonlinear transfer functions that encode the characteristics of a target timbre. Once trained, a NEWT can produce complex timbral evolutions by simple affine transformations of its input and output signals. We paired the NEWT with a differentiable noise synthesiser and reverb and found it capable of generating realistic musical instrument performances with only 260k total model parameters, conditioned on F0 and loudness features. We compared our method to state-of-the-art benchmarks with a multi-stimulus listening test and the Fréchet Audio Distance and found it performed competitively across the tested timbral domains. Our method significantly outperformed the benchmarks in terms of generation speed, and achieved real-time performance on a consumer CPU, both with and without FastNEWT, suggesting it is a viable basis for future creative sound design tools.
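
Classical waveshaping, the idea the NEWT builds on, passes a simple signal through a nonlinear transfer function, with affine transformations applied to the input and output. In the sketch below a fixed `np.tanh` stands in for the learned shaping function, which is an assumption made to keep the example self-contained; the gains, sample rate, and pitch are likewise illustrative.

```python
import numpy as np

def waveshape(signal, shaper, in_gain=1.0, in_bias=0.0, out_gain=1.0, out_bias=0.0):
    """Apply a transfer function between affine input and output transforms."""
    return out_gain * shaper(in_gain * signal + in_bias) + out_bias

sample_rate, f0 = 16000, 220.0
t = np.arange(sample_rate) / sample_rate          # one second of audio
sine = np.sin(2.0 * np.pi * f0 * t)

# Driving the shaper harder (larger in_gain) adds more harmonics.
shaped = waveshape(sine, np.tanh, in_gain=3.0)
spectrum = np.abs(np.fft.rfft(shaped))            # bin k corresponds to k Hz here
```

Changing only `in_gain` and `in_bias` over time already alters the harmonic content, which gives some intuition for how affine controls can drive timbral evolution.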

Translated: 2021-07-13 15:38:52 Published: 2021-07-11

# A Review of Bangla Natural Language Processing Tasks and the Utility of Transformer Models ( http://arxiv.org/abs/2107.03844v2 )

License: see the linked page

Firoj Alam, Arid Hasan, Tanvirul Alam, Akib Khan, Janntatul Tajrin, Naira Khan, Shammur Absar Chowdhury

Bangla, ranked as the 6th most widely spoken language in the world (https://www.ethnologue.com/guides/ethnologue200) with 230 million native speakers, is still considered a low-resource language in the natural language processing (NLP) community. After three decades of research, Bangla NLP (BNLP) is still lagging behind, mainly due to the scarcity of resources and the challenges that come with it. There is sparse work in different areas of BNLP; however, a thorough survey reporting previous work and recent advances is yet to be done. In this study, we first provide a review of Bangla NLP tasks, resources, and tools available to the research community; we then benchmark datasets collected from various platforms for nine NLP tasks using current state-of-the-art algorithms (i.e., transformer-based models). We provide comparative results for the studied NLP tasks by comparing monolingual vs. multilingual models of varying sizes. We report our results using both individual and consolidated datasets and provide data splits for future research. We reviewed a total of 108 papers and conducted 175 sets of experiments. Our results show promising performance using transformer-based models while highlighting the trade-off with computational costs. We hope that such a comprehensive survey will motivate the community to build on and further advance the research on Bangla NLP.

Translated: 2021-07-13 11:42:25 Published: 2021-07-11

PDF registration status (publication date: 20210711)