Fugu-MT: Translations of arXiv papers

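This site provides Japanese translations of arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body text is not CC-licensed, and for papers that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is performed with Fugu-Machine Translator.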

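The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from the use of this site (including all information and translations).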

Papers with a publication date of 20210328.

Title	Authors	Abstract	Publication date / translation date
# Recovering quantum correlations in optical lattices from interaction quenches ( http://arxiv.org/abs/2005.09000v2 ) License: see the linked page	M. Gluza, J. Eisert	Quantum simulations with ultra-cold atoms in optical lattices open up an exciting path towards understanding strongly interacting quantum systems. Atom gas microscopes are crucial for this as they offer single-site density resolution, unparalleled in other quantum many-body systems. However, a direct measurement of local coherent currents is currently out of reach. In this work, we show how to achieve that by measuring densities that are altered in response to quenches to non-interacting dynamics, e.g., after tilting the optical lattice. For this, we establish a data analysis method that solves the closed set of equations relating tunnelling currents and atom number dynamics, allowing the full covariance matrix, including off-diagonal terms representing coherent currents, to be reliably recovered. The signal processing builds upon semi-definite optimization, providing bona fide covariance matrices optimally matching the observed data. We demonstrate how the obtained information about non-commuting observables allows one to lower-bound entanglement at finite temperature, which opens up the possibility of studying quantum correlations in quantum simulations going beyond classical capabilities.	Translated: 2023-05-19 11:04:41 Published: 2021-03-28
# Spin-state estimation using the Stern-Gerlach experiment ( http://arxiv.org/abs/2009.01877v3 ) License: see the linked page	Javier Martínez-Cifuentes, K.M. Fonseca-Romero	We propose a state estimation scheme for spins, using a modified setup of the Stern-Gerlach experiment, in which a beam of neutral spin-1/2 point particles interacts with a quadrupolar magnetic field. The proposed linear inversion estimation procedure, based on a quadrant intensity detector, requires a suitable initial spatial state of the beam. The statistical characterization of the estimator of the initial spin state allows us not only to associate an error with the estimated parameters, but also to define a measure for comparing estimation procedures corresponding to different Stern-Gerlach setups.	Translated: 2023-05-03 22:44:13 Published: 2021-03-28
# Loss channels affecting lithium niobate phononic crystal resonators at cryogenic temperature ( http://arxiv.org/abs/2010.01025v3 ) License: see the linked page	E. Alex Wollack, Agnetta Y. Cleland, Patricio Arrangoiz-Arriola, Timothy P. McKenna, Rachel G. Gruenke, Rishi N. Patel, Wentao Jiang, Christopher J. Sarabalis, Amir H. Safavi-Naeini	We investigate the performance of microwave-frequency phononic crystal resonators fabricated on thin-film lithium niobate for integration with superconducting quantum circuits. For different design geometries at millikelvin temperatures, we achieve mechanical internal quality factors $Q_i$ above $10^5 - 10^6$ at high microwave drive power, corresponding to $5\times10^6$ phonons inside the resonator. By sweeping the defect size of resonators with identical mirror cell designs, we are able to indirectly observe signatures of the complete phononic bandgap via the resonators' internal quality factors. Examination of the quality factors' temperature dependence shows how superconducting and two-level system (TLS) loss channels impact device performance. Finally, we observe an anomalous low-temperature frequency shift consistent with resonant TLS decay and find that material choice can help to mitigate these losses.	Translated: 2023-04-30 04:01:30 Published: 2021-03-28
# Coherent randomized benchmarking ( http://arxiv.org/abs/2010.13810v2 ) License: see the linked page	Jorge Miguel-Ramiro, Alexander Pirker, Wolfgang Dür	Randomized benchmarking is a powerful technique to efficiently estimate the performance and reliability of quantum gates, circuits and devices. Here we propose to perform randomized benchmarking in a coherent way, where superpositions of different random sequences rather than independent samples are used. We show that this leads to a uniform and simple protocol with significant advantages with respect to the gates that can be benchmarked, and in terms of efficiency and scalability. We show that, e.g., universal gate sets, the set of $n$-qudit Pauli operators or more general sets including arbitrary unitaries, as well as a particular $n$-qudit Clifford gate using only the Pauli set, can be efficiently benchmarked. The price to pay is the additional complexity of adding control to the quantum operations involved. However, we demonstrate that this can be done by using auxiliary degrees of freedom that are naturally available in basically any physical realization, and are independent of the gates to be tested.	Translated: 2023-04-27 11:21:24 Published: 2021-03-28
# Symmetry-Resolved Entanglement in AdS${}_3$/CFT${}_2$ coupled to $U(1)$ Chern-Simons Theory ( http://arxiv.org/abs/2012.11274v3 ) License: see the linked page	Suting Zhao, Christian Northe, René Meyer	We consider symmetry-resolved entanglement entropy in AdS${}_3$/CFT${}_2$ coupled to $U(1)$ Chern-Simons theory. We identify the holographic dual of the charged moments in the two-dimensional conformal field theory as a charged Wilson line in the bulk of AdS${}_3$, namely the Ryu-Takayanagi geodesic minimally coupled to the $U(1)$ Chern-Simons gauge field. We identify the holonomy around the Wilson line as the Aharonov-Bohm phases which, in the two-dimensional field theory, are generated by charged $U(1)$ vertex operators inserted at the endpoints of the entangling interval. Furthermore, we devise a new method to calculate the symmetry-resolved entanglement entropy by relating the generating function for the charged moments to the amount of charge in the entangling subregion. We calculate the subregion charge from the $U(1)$ Chern-Simons gauge field sourced by the bulk Wilson line. We use our method to derive the symmetry-resolved entanglement entropy for the Poincaré patch and global AdS${}_3$, as well as for the conical defect geometries. In all three cases, the symmetry-resolved entanglement entropy is determined by the length of the Ryu-Takayanagi geodesic and the Chern-Simons level $k$, and fulfills equipartition of entanglement. The asymptotic symmetry algebra of the bulk theory is of $\hat{\mathfrak{u}}(1)_k$ Kac-Moody type. Employing the $\hat{\mathfrak{u}}(1)_k$ Kac-Moody symmetry, we confirm our holographic results by a calculation in the dual conformal field theory.	Translated: 2023-04-20 00:27:25 Published: 2021-03-28
# Defending Quantum Objectivity ( http://arxiv.org/abs/2103.11530v2 ) License: see the linked page	Elias Okon	A recent argument, attributed to Masanes, is claimed to show that the assumption that quantum measurements have definite, objective outcomes is incompatible with quantum predictions. In this work, a detailed examination of the argument shows that it has a much narrower field of application than previously recognized. In particular, it is found: i) that the argument only applies to hidden-variable models with a particular feature; and ii) that such a feature is not present in most hidden-variable models, including pilot-wave theory. It is concluded that the argument does not succeed in calling into question the objectivity of quantum measurements.	Translated: 2023-04-07 04:48:12 Published: 2021-03-28
# Nontrivial interband effect: applications to magnetic susceptibility, nonlinear optics, and topological degeneracy pressure ( http://arxiv.org/abs/2103.13281v2 ) License: see the linked page	Nobuyuki Okuma	The interband effect is an important concept both in traditional and modern solid-state physics. In this paper, we present a theory of the nontrivial interband effect, which cannot be removed without breaking given rules. We define the general nontrivial interband effect by regarding a property of the set of the total bands of a tight-binding Hamiltonian as the triviality. As examples of the source of the nontrivial interband effect, we consider several topological concepts: stable topological insulator, symmetry-based indicator, fragile topological insulator, and multipole/higher-order topological insulator. As an application, we calculate the orbital magnetic susceptibility for tight-binding Hamiltonians with topological properties. In addition, we consider the mechanical properties induced by the nontrivial interband effect. We define interband-induced degeneracy pressure, which tends to take a negative value, and calculate it for the Chern insulator. This calculation demonstrates the importance of topological band structures in the mechanical properties of solids. We also discuss the application to nonlinear optics characterized by the polarization difference and the generalization to interacting systems with entanglement. The framework of the nontrivial interband effect, which includes topological concepts as subsets, might be useful for finding unexplored concepts.	Translated: 2023-04-06 23:43:43 Published: 2021-03-28
# Droplet Epitaxy of Semiconductor Nanostructures for Quantum Photonic Devices ( http://arxiv.org/abs/2103.15083v1 ) License: see the linked page	Massimo Gurioli, Zhiming Wang, Armando Rastelli, Takashi Kuroda, Stefano Sanguinetti	The long-dreamed quantum internet would consist of a network of quantum nodes (solid-state or atomic systems) linked by flying qubits, naturally based on photons, travelling over long distances at the speed of light, with negligible decoherence. A key component is a light source able to provide single or entangled photon pairs. Among the different platforms, semiconductor quantum dots are very attractive, as they can be integrated with other photonic and electronic components in miniaturized chips. In the early 1990s two approaches were developed to synthesize self-assembled epitaxial semiconductor quantum dots (QDs), or artificial atoms, namely the Stranski-Krastanov (SK) and the droplet epitaxy (DE) methods. Because of its robustness and simplicity, the SK method became the workhorse to achieve several breakthroughs in both fundamental and technological areas. The need for specific emission wavelengths or structural and optical properties has nevertheless motivated further research on the DE method and its more recent development, local droplet etching (LDE), as complementary routes to obtain high-quality semiconductor nanostructures. The recent reports on the generation of highly entangled photon pairs, combined with good photon indistinguishability, suggest that DE and LDE QDs may complement (and sometimes even outperform) conventional SK InGaAs QDs as quantum emitters. We present here a critical survey of the state of the art of DE and LDE, highlighting the advantages and weaknesses, the achievements obtained and the challenges still open, in view of applications in quantum communication and technology.	Translated: 2023-04-06 08:11:29 Published: 2021-03-28
# Conductance zeros in complex molecules and lattices from the interference set method ( http://arxiv.org/abs/2103.15082v1 ) License: see the linked page	M. Nita, M. Tolea, D.C. Marinescu	Destructive quantum interference (DQI) and its effects on electron transport are studied in chemical molecules and finite physical lattices that can be described by a discrete Hamiltonian. Starting from a bipartite system whose conductance zeros are known to exist between any two points of a specially designated set, the interference set, we use the Dyson equation to develop a general algorithm for determining the zero-conductance points in complex systems, which are not necessarily bipartite. We illustrate this procedure as it applies to the fulvene molecule. The stability of the conductance zeros is analyzed with respect to external perturbations.	Translated: 2023-04-06 08:10:58 Published: 2021-03-28
# Experimental demonstration of quantum pigeonhole paradox ( http://arxiv.org/abs/2103.15070v1 ) License: see the linked page	Ming-Cheng Chen, Chang Liu, Yi-Han Luo, He-Liang Huang, Bi-Ying Wang, Xi-Lin Wang, Li Li, Nai-Le Liu, Chao-Yang Lu, Jian-Wei Pan	We experimentally demonstrate that when three single photons are transmitted through two polarization channels, in a well-defined pre- and postselected ensemble, no two photons are found in the same polarization channel under weak-strength measurement, a counter-intuitive quantum counting effect called the quantum pigeonhole paradox. We further show that this effect breaks down in second-order measurement. These results indicate the existence of the quantum pigeonhole paradox and its operating regime.	Translated: 2023-04-06 08:10:47 Published: 2021-03-28
# TransICD: Transformer Based Code-wise Attention Model for Explainable ICD Coding ( http://arxiv.org/abs/2104.10652v1 ) License: see the linked page	Biplob Biswas, Thai-Hoang Pham, Ping Zhang	The International Classification of Diseases (ICD) coding procedure, which refers to tagging medical notes with diagnosis codes, has been shown to be effective and crucial to the billing system in the medical sector. Currently, ICD codes are assigned to a clinical note manually, which is likely to cause many errors. Moreover, training skilled coders also requires time and human resources. Therefore, automating the ICD code determination process is an important task. With the advancement of artificial intelligence theory and computational hardware, machine learning approaches have emerged as a suitable solution to automate this process. In this project, we apply a transformer-based architecture to capture the interdependence among the tokens of a document and then use a code-wise attention mechanism to learn code-specific representations of the entire document. Finally, these representations are fed to separate dense layers for the corresponding code prediction. Furthermore, to handle the imbalance in the code frequency of clinical datasets, we employ a label distribution aware margin (LDAM) loss function. The experimental results on the MIMIC-III dataset show that our proposed model outperforms other baselines by a significant margin. In particular, our best setting achieves a micro-AUC score of 0.923 compared to 0.868 for bidirectional recurrent neural networks. We also show that by using the code-wise attention mechanism, the model can provide more insights about its predictions, and thus it can support clinicians in making reliable decisions. Our code is available online (https://github.com/biplob1ly/TransICD).	Translated: 2023-04-06 08:05:40 Published: 2021-03-28
# BCNN: Binary Complex Neural Network ( http://arxiv.org/abs/2104.10044v1 ) License: see the linked page	Yanfei Li, Tong Geng, Ang Li, Huimin Yu	Binarized neural networks, or BNNs, show great promise in edge-side applications with resource-limited hardware, but raise concerns about reduced accuracy. Motivated by complex neural networks, in this paper we introduce complex representation into BNNs and propose the Binary Complex Neural Network (BCNN), a novel network design that processes binary complex inputs and weights through complex convolution, but can still harvest the extraordinary computation efficiency of BNNs. To ensure a fast convergence rate, we propose novel BCNN-based batch normalization and weight initialization functions. Experimental results on CIFAR-10 and ImageNet using state-of-the-art network models (e.g., ResNet, ResNetE and NIN) show that BCNN can achieve better accuracy than the original BNN models. BCNN improves on BNN by strengthening its learning capability through complex representation and extending its applicability to complex-valued input data. The source code of BCNN will be released on GitHub.	Translated: 2023-04-06 08:05:17 Published: 2021-03-28
# CyberLearning: Effectiveness Analysis of Machine Learning Security Modeling to Detect Cyber-Anomalies and Multi-Attacks ( http://arxiv.org/abs/2104.08080v1 ) License: see the linked page	Iqbal H. Sarker	Detecting cyber-anomalies and attacks is becoming a rising concern these days in the domain of cybersecurity. Knowledge of artificial intelligence, particularly machine learning techniques, can be used to tackle these issues. However, the effectiveness of a learning-based security model may vary depending on the security features and the data characteristics. In this paper, we present "CyberLearning", a machine learning-based cybersecurity modeling approach with correlated-feature selection, and a comprehensive empirical analysis of the effectiveness of various machine learning based security models. In our CyberLearning modeling, we take into account a binary classification model for detecting anomalies, and a multi-class classification model for various types of cyber-attacks. To build the security model, we first employ ten popular machine learning classification techniques: naive Bayes, logistic regression, stochastic gradient descent, K-nearest neighbors, support vector machine, decision tree, random forest, adaptive boosting, extreme gradient boosting, and linear discriminant analysis. We then present an artificial neural network-based security model considering multiple hidden layers. The effectiveness of these learning-based security models is examined by conducting a range of experiments utilizing the two most popular security datasets, UNSW-NB15 and NSL-KDD. Overall, this paper aims to serve as a reference point for data-driven security modeling through our experimental analysis and findings in the context of cybersecurity.	Translated: 2023-04-06 08:04:58 Published: 2021-03-28
# Mathematics of Digital Hyperspace ( http://arxiv.org/abs/2103.15203v1 ) License: see the linked page	Jeremy Kepner, Timothy Davis, Vijay Gadepally, Hayden Jananthan, Lauren Milechin	Social media, e-commerce, streaming video, e-mail, cloud documents, web pages, traffic flows, and network packets fill vast digital lakes, rivers, and oceans that we each navigate daily. This digital hyperspace is an amorphous flow of data supported by continuous streams that stretch standard concepts of type and dimension. The unstructured data of digital hyperspace can be elegantly represented, traversed, and transformed via the mathematics of hypergraphs, hypersparse matrices, and associative array algebra. This paper explores a novel mathematical concept, the semilink, that combines pairs of semirings to provide the essential operations for graph analytics, database operations, and machine learning. The GraphBLAS standard currently supports hypergraphs, hypersparse matrices, the mathematics required for semilinks, and seamlessly performs graph, network, and matrix operations. With the addition of key-based indices (such as pointers to strings) and semilinks, GraphBLAS can become a richer associative array algebra and be a plug-in replacement for spreadsheets, database tables, and data-centric operating systems, enhancing the navigation of unstructured data found in digital hyperspace.	Translated: 2023-04-06 08:03:08 Published: 2021-03-28
# Do sequence-to-sequence VAEs learn global features of sentences? ( http://arxiv.org/abs/2004.07683v2 ) License: see the linked page	Tom Bosc, Pascal Vincent	Autoregressive language models are powerful and relatively easy to train. However, these models are usually trained without explicit conditioning labels and do not offer easy ways to control global aspects such as sentiment or topic during generation. Bowman et al. (2016) adapted the Variational Autoencoder (VAE) for natural language with the sequence-to-sequence architecture and claimed that the latent vector was able to capture such global features in an unsupervised manner. We question this claim. We measure which words benefit most from the latent information by decomposing the reconstruction loss per position in the sentence. Using this method, we find that VAEs are prone to memorizing the first words and the sentence length, producing local features of limited usefulness. To alleviate this, we investigate alternative architectures based on bag-of-words assumptions and language model pretraining. These variants learn latent variables that are more global, i.e., more predictive of topic or sentiment labels. Moreover, using reconstructions, we observe that they decrease memorization: the first word and the sentence length are not recovered as accurately as with the baselines, consequently yielding more diverse reconstructions.	Translated: 2022-12-12 21:02:13 Published: 2021-03-28
# Red-GAN: Attacking class imbalance via conditioned generation. Yet another perspective on medical image synthesis for skin lesion dermoscopy and brain tumor MRI ( http://arxiv.org/abs/2004.10734v4 ) License: see the linked page	Ahmad B Qasim, Ivan Ezhov, Suprosanna Shit, Oliver Schoppe, Johannes C Paetzold, Anjany Sekuboyina, Florian Kofler, Jana Lipkova, Hongwei Li, Bjoern Menze	Exploiting learning algorithms under scarce data regimes is a limitation and a reality of the medical imaging field. In an attempt to mitigate the problem, we propose a data augmentation protocol based on generative adversarial networks. We condition the networks on pixel-level information (a segmentation mask) and on global-level information (acquisition environment or lesion type). Such conditioning provides immediate access to the image-label pairs while controlling the global class-specific appearance of the synthesized images. To stimulate synthesis of the features relevant for the segmentation task, an additional passive player in the form of a segmentor is introduced into the adversarial game. We validate the approach on two medical datasets: BraTS and ISIC. By controlling the class distribution through injection of synthetic images into the training set, we achieve control over the accuracy levels of the datasets' classes.	Translated: 2022-12-10 18:06:51 Published: 2021-03-28
# Logical Team Q-learning: An approach towards factored policies in cooperative MARL ( http://arxiv.org/abs/2006.03553v2 ) License: see the linked page	Lucas Cassano, Ali H. Sayed	We address the challenge of learning factored policies in cooperative MARL scenarios. In particular, we consider the situation in which a team of agents collaborates to optimize a common cost. The goal is to obtain factored policies that determine the individual behavior of each agent so that the resulting joint policy is optimal. The main contribution of this work is the introduction of Logical Team Q-learning (LTQL). LTQL does not rely on assumptions about the environment and hence is generally applicable to any collaborative MARL scenario. We derive LTQL as a stochastic approximation to a dynamic programming method we introduce in this work. We conclude the paper by providing experiments (both in the tabular and deep settings) that illustrate the claims.	Translated: 2022-11-25 02:42:09 Published: 2021-03-28
# Low-Rank Factorization for Rank Minimization with Nonconvex Regularizers ( http://arxiv.org/abs/2006.07702v2 ) License: see the linked page	April Sagan, John E. Mitchell	Rank minimization is of interest in machine learning applications such as recommender systems and robust principal component analysis. Minimizing the convex relaxation of the rank minimization problem, the nuclear norm, is an effective technique for solving the problem with strong performance guarantees. However, nonconvex relaxations have less estimation bias than the nuclear norm and can more accurately reduce the effect of noise on the measurements. We develop efficient algorithms based on iteratively reweighted nuclear norm schemes, while also utilizing the low-rank factorization for semidefinite programs put forth by Burer and Monteiro. We prove convergence and computationally show the advantages over convex relaxations and alternating minimization methods. Additionally, the computational complexity of each iteration of our algorithm is on par with other state-of-the-art algorithms, allowing us to quickly find solutions to the rank minimization problem for large matrices.	Translated: 2022-11-21 21:36:13 Published: 2021-03-28
# BERTology Meets Biology: Interpreting Attention in Protein Language Models ( http://arxiv.org/abs/2006.15222v3 ) License: see link	Jesse Vig, Ali Madani, Lav R. Varshney, Caiming Xiong, Richard Socher, Nazneen Fatema Rajani	Transformer architectures have proven to learn useful representations for protein classification and generation tasks. However, these representations present challenges in interpretability. In this work, we demonstrate a set of methods for analyzing protein Transformer models through the lens of attention. We show that attention: (1) captures the folding structure of proteins, connecting amino acids that are far apart in the underlying sequence, but spatially close in the three-dimensional structure, (2) targets binding sites, a key functional component of proteins, and (3) focuses on progressively more complex biophysical properties with increasing layer depth. We find this behavior to be consistent across three Transformer architectures (BERT, ALBERT, XLNet) and two distinct protein datasets. We also present a three-dimensional visualization of the interaction between attention and protein structure. Code for visualization and analysis is available at https://github.com/salesforce/provis.	Translated: 2022-11-16 21:20:58 Published: 2021-03-28
# Camera Pose Matters: Improving Depth Prediction by Mitigating Pose Distribution Bias ( http://arxiv.org/abs/2007.03887v2 ) License: see link	Yunhan Zhao, Shu Kong, Charless Fowlkes	Monocular depth predictors are typically trained on large-scale training sets which are naturally biased with respect to the distribution of camera poses. As a result, trained predictors fail to make reliable depth predictions for testing examples captured under uncommon camera poses. To address this issue, we propose two novel techniques that exploit the camera pose during training and prediction. First, we introduce a simple perspective-aware data augmentation that synthesizes new training examples with more diverse views by perturbing the existing ones in a geometrically consistent manner. Second, we propose a conditional model that exploits the per-image camera pose as prior knowledge by encoding it as a part of the input. We show that jointly applying the two methods improves depth prediction on images captured under uncommon and even never-before-seen camera poses. We show that our methods improve performance when applied to a range of different predictor architectures. Lastly, we show that explicitly encoding the camera pose distribution improves the generalization performance of a synthetically trained depth predictor when evaluated on real images.	Translated: 2022-11-12 13:15:19 Published: 2021-03-28
# Optimal Experimental Design for Uncertain Systems Based on Coupled Differential Equations ( http://arxiv.org/abs/2007.06117v2 ) License: see link	Youngjoon Hong, Bongsuk Kwon, and Byung-Jun Yoon	We consider the optimal experimental design problem for an uncertain Kuramoto model, which consists of N interacting oscillators described by coupled ordinary differential equations. The objective is to design experiments that can effectively reduce the uncertainty present in the coupling strengths between the oscillators, thereby minimizing the cost of robust control of the uncertain Kuramoto model. We demonstrate the importance of quantifying the operational impact of the potential experiments in designing optimal experiments.	Translated: 2022-11-11 06:14:10 Published: 2021-03-28
# Learning Differentiable Programs with Admissible Neural Heuristics ( http://arxiv.org/abs/2007.12101v5 ) License: see link	Ameesh Shah, Eric Zhan, Jennifer J. Sun, Abhinav Verma, Yisong Yue, Swarat Chaudhuri	We study the problem of learning differentiable functions expressed as programs in a domain-specific language. Such programmatic models can offer benefits such as composability and interpretability; however, learning them requires optimizing over a combinatorial space of program "architectures". We frame this optimization problem as a search in a weighted graph whose paths encode top-down derivations of program syntax. Our key innovation is to view various classes of neural networks as continuous relaxations over the space of programs, which can then be used to complete any partial program. This relaxed program is differentiable and can be trained end-to-end, and the resulting training loss is an approximately admissible heuristic that can guide the combinatorial search. We instantiate our approach on top of the A-star algorithm and an iteratively deepened branch-and-bound search, and use these algorithms to learn programmatic classifiers in three sequence classification tasks. Our experiments show that the algorithms outperform state-of-the-art methods for program learning, and that they discover programmatic classifiers that yield natural interpretations and achieve competitive accuracy.	Translated: 2022-11-07 11:47:22 Published: 2021-03-28
# Time-Varying Parameters as Ridge Regressions ( http://arxiv.org/abs/2009.00401v2 ) License: see link	Philippe Goulet Coulombe	Time-varying parameter (TVP) models are frequently used in economics to model structural change. I show that they are in fact ridge regressions. Instantly, this makes computations, tuning, and implementation much easier than in the state-space paradigm. Among other things, solving the equivalent dual ridge problem is computationally very fast even in high dimensions, and the crucial "amount of time variation" is tuned by cross-validation. Evolving volatility is dealt with using a two-step ridge regression. I consider extensions that incorporate sparsity (the algorithm selects which parameters vary and which do not) and reduced-rank restrictions (variation is tied to a factor model). To demonstrate the usefulness of the approach, I use it to study the evolution of monetary policy in Canada. The application requires the estimation of about 4600 TVPs, a task well within the reach of the new method.	Translated: 2022-10-23 01:54:31 Published: 2021-03-28
# Enhancing and Learning Denoiser without Clean Reference ( http://arxiv.org/abs/2009.04286v2 ) License: see link	Rui Zhao and Daniel P.K. Lun and Kin-Man Lam	Recent studies on learning-based image denoising have achieved promising performance on various noise reduction tasks. Most of these deep denoisers are trained either under the supervision of clean references, or unsupervised on synthetic noise. The assumption with the synthetic noise leads to poor generalization when facing real photographs. To address this issue, we propose a novel deep image-denoising method by regarding the noise reduction task as a special case of the noise transference task. Learning noise transference enables the network to acquire the denoising ability by observing the corrupted samples. The results on real-world denoising benchmarks demonstrate that our proposed method achieves promising performance on removing realistic noise, making it a potential solution to practical noise reduction problems.	Translated: 2022-10-20 11:57:21 Published: 2021-03-28
# PAL : Pretext-based Active Learning ( http://arxiv.org/abs/2010.15947v3 ) License: see link	Shubhang Bhatnagar, Sachin Goyal, Darshan Tank, Amit Sethi	The goal of pool-based active learning is to judiciously select a fixed-sized subset of unlabeled samples from a pool to query an oracle for their labels, in order to maximize the accuracy of a supervised learner. However, the unsaid requirement that the oracle should always assign correct labels is unreasonable for most situations. We propose an active learning technique for deep neural networks that is more robust to mislabeling than the previously proposed techniques. Previous techniques rely on the task network itself to estimate the novelty of the unlabeled samples, but learning the task (generalization) and selecting samples (out-of-distribution detection) can be conflicting goals. We use a separate network to score the unlabeled samples for selection. The scoring network relies on self-supervision for modeling the distribution of the labeled samples to reduce the dependency on potentially noisy labels. To counter the paucity of data, we also deploy another head on the scoring network for regularization via multi-task learning and use an unusual self-balancing hybrid scoring function. Furthermore, we divide each query into sub-queries before labeling to ensure that the query has diverse samples. In addition to having a higher tolerance to mislabeling of samples by the oracle, the resultant technique also produces competitive accuracy in the absence of label noise. The technique also handles the introduction of new classes on-the-fly well by temporarily increasing the sampling rate of these classes.	Translated: 2022-10-01 22:28:06 Published: 2021-03-28
# Concentration Inequalities for Statistical Inference ( http://arxiv.org/abs/2011.02258v3 ) License: see link	Huiming Zhang, Song Xi Chen	This paper gives a review of concentration inequalities which are widely employed in non-asymptotic analyses of mathematical statistics in a wide range of settings, from distribution-free to distribution-dependent, from sub-Gaussian to sub-exponential, sub-Gamma, and sub-Weibull random variables, and from the mean to the maximum concentration. This review provides results in these settings with some fresh new results. Given the increasing popularity of high-dimensional data and inference, results in the context of high-dimensional linear and Poisson regressions are also provided. We aim to illustrate the concentration inequalities with known constants and to improve existing bounds with sharper constants.	Translated: 2022-09-29 22:06:20 Published: 2021-03-28
# Intentonomy: a Dataset and Study towards Human Intent Understanding ( http://arxiv.org/abs/2011.05558v2 ) License: see link	Menglin Jia and Zuxuan Wu and Austin Reiter and Claire Cardie and Serge Belongie and Ser-Nam Lim	An image is worth a thousand words, conveying information that goes beyond the physical visual content therein. In this paper, we study the intent behind social media images with an aim to analyze how visual information can help the recognition of human intent. Towards this goal, we introduce an intent dataset, Intentonomy, comprising 14K images covering a wide range of everyday scenes. These images are manually annotated with 28 intent categories that are derived from a social psychology taxonomy. We then systematically study whether, and to what extent, commonly used visual information, i.e., object and context, contribute to human motive understanding. Based on our findings, we conduct further study to quantify the effect of attending to object and context classes as well as textual information in the form of hashtags when training an intent classifier. Our results quantitatively and qualitatively shed light on how visual and textual information can produce observable effects when predicting intent.	Translated: 2022-09-27 00:52:59 Published: 2021-03-28
# End-To-End Semi-supervised Learning for Differentiable Particle Filters ( http://arxiv.org/abs/2011.05748v2 ) License: see link	Hao Wen, Xiongjie Chen, Georgios Papagiannis, Conghui Hu and Yunpeng Li	Recent advances in incorporating neural networks into particle filters provide the desired flexibility to apply particle filters in large-scale real-world applications. The dynamic and measurement models in this framework are learnable through the differentiable implementation of particle filters. Past efforts in optimising such models often require the knowledge of true states which can be expensive to obtain or even unavailable in practice. In this paper, in order to reduce the demand for annotated data, we present an end-to-end learning objective based upon the maximisation of a pseudo-likelihood function which can improve the estimation of states when a large portion of the true states is unknown. We assess the performance of the proposed method in state estimation tasks in robotics with simulated and real-world datasets.	Translated: 2022-09-26 23:13:33 Published: 2021-03-28
# Medical symptom recognition from patient text: An active learning approach for long-tailed multilabel distributions ( http://arxiv.org/abs/2011.06874v2 ) License: see link	Ali Mottaghi, Prathusha K Sarma, Xavier Amatriain, Serena Yeung, Anitha Kannan	We study the problem of medical symptoms recognition from patient text, for the purposes of gathering pertinent information from the patient (known as history-taking). A typical patient text is often descriptive of the symptoms the patient is experiencing and a single instance of such a text can be "labeled" with multiple symptoms. This makes learning a medical symptoms recognizer challenging on account of i) the lack of availability of voluminous annotated data as well as ii) the large unknown universe of multiple symptoms that a single text can map to. Furthermore, patient text is often characterized by a long tail in the data (i.e., some labels/symptoms occur more frequently than others, e.g., "fever" vs. "hematochezia"). In this paper, we introduce an active learning method that leverages underlying structure of a continually refined, learned latent space to select the most informative examples to label. This enables the selection of the most informative examples that progressively increases the coverage on the universe of symptoms via the learned model, despite the long tail in data distribution.	Translated: 2022-09-26 06:22:41 Published: 2021-03-28
# Your "Flamingo" is My "Bird": Fine-Grained, or Not ( http://arxiv.org/abs/2011.09040v3 ) License: see link	Dongliang Chang, Kaiyue Pang, Yixiao Zheng, Zhanyu Ma, Yi-Zhe Song, and Jun Guo	Whether what you see in Figure 1 is a "flamingo" or a "bird", is the question we ask in this paper. While fine-grained visual classification (FGVC) strives to arrive at the former, for the majority of us non-experts just "bird" would probably suffice. The real question is therefore -- how can we tailor for different fine-grained definitions under divergent levels of expertise? For that, we re-envisage the traditional setting of FGVC, from single-label classification, to that of top-down traversal of a pre-defined coarse-to-fine label hierarchy -- so that our answer becomes "bird"-->"Phoenicopteriformes"-->"Phoenicopteridae"-->"flamingo". To approach this new problem, we first conduct a comprehensive human study where we confirm that most participants prefer multi-granularity labels, regardless of whether they consider themselves experts. We then discover the key intuition that: coarse-level label prediction exacerbates fine-grained feature learning, yet fine-level feature betters the learning of coarse-level classifier. This discovery enables us to design a very simple albeit surprisingly effective solution to our new problem, where we (i) leverage level-specific classification heads to disentangle coarse-level features with fine-grained ones, and (ii) allow finer-grained features to participate in coarser-grained label predictions, which in turn helps with better disentanglement. Experiments show that our method achieves superior performance in the new FGVC setting, and performs better than state-of-the-art on traditional single-label FGVC problem as well. Thanks to its simplicity, our method can be easily implemented on top of any existing FGVC frameworks and is parameter-free.	Translated: 2022-09-24 04:12:18 Published: 2021-03-28
# Privacy-preserving Collaborative Learning with Automatic Transformation Search ( http://arxiv.org/abs/2011.12505v2 ) License: see link	Wei Gao, Shangwei Guo, Tianwei Zhang, Han Qiu, Yonggang Wen, Yang Liu	Collaborative learning has gained great popularity due to its benefit of data privacy protection: participants can jointly train a Deep Learning model without sharing their training sets. However, recent works discovered that an adversary can fully recover the sensitive training samples from the shared gradients. Such reconstruction attacks pose severe threats to collaborative learning. Hence, effective mitigation solutions are urgently desired. In this paper, we propose to leverage data augmentation to defeat reconstruction attacks: by preprocessing sensitive images with carefully-selected transformation policies, it becomes infeasible for the adversary to extract any useful information from the corresponding gradients. We design a novel search method to automatically discover qualified policies. We adopt two new metrics to quantify the impacts of transformations on data privacy and model usability, which can significantly accelerate the search speed. Comprehensive evaluations demonstrate that the policies discovered by our method can defeat existing reconstruction attacks in collaborative learning, with high efficiency and negligible impact on the model performance.	Translated: 2022-09-21 02:55:59 Published: 2021-03-28
# A Recurrent Vision-and-Language BERT for Navigation ( http://arxiv.org/abs/2011.13922v2 ) License: see link	Yicong Hong, Qi Wu, Yuankai Qi, Cristian Rodriguez-Opazo, Stephen Gould	Accuracy of many visiolinguistic tasks has benefited significantly from the application of vision-and-language (V&L) BERT. However, its application for the task of vision-and-language navigation (VLN) remains limited. One reason for this is the difficulty adapting the BERT architecture to the partially observable Markov decision process present in VLN, requiring history-dependent attention and decision making. In this paper we propose a recurrent BERT model that is time-aware for use in VLN. Specifically, we equip the BERT model with a recurrent function that maintains cross-modal state information for the agent. Through extensive experiments on R2R and REVERIE we demonstrate that our model can replace more complex encoder-decoder models to achieve state-of-the-art results. Moreover, our approach can be generalised to other transformer-based architectures, supports pre-training, and is capable of solving navigation and referring expression tasks simultaneously.	Translated: 2022-09-20 09:12:47 Published: 2021-03-28
# Constrained Risk-Averse Markov Decision Processes ( http://arxiv.org/abs/2012.02423v2 ) License: see link	Mohamadreza Ahmadi, Ugo Rosolia, Michel D. Ingham, Richard M. Murray, and Aaron D. Ames	We consider the problem of designing policies for Markov decision processes (MDPs) with dynamic coherent risk objectives and constraints. We begin by formulating the problem in a Lagrangian framework. Under the assumption that the risk objectives and constraints can be represented by a Markov risk transition mapping, we propose an optimization-based method to synthesize Markovian policies that lower-bound the constrained risk-averse problem. We demonstrate that the formulated optimization problems are in the form of difference convex programs (DCPs) and can be solved by the disciplined convex-concave programming (DCCP) framework. We show that these results generalize linear programs for constrained MDPs with total discounted expected costs and constraints. Finally, we illustrate the effectiveness of the proposed method with numerical experiments on a rover navigation problem involving conditional-value-at-risk (CVaR) and entropic-value-at-risk (EVaR) coherent risk measures.	Translated: 2021-05-22 20:34:03 Published: 2021-03-28
# Learning Tactile Models for Factor Graph-based Estimation ( http://arxiv.org/abs/2012.03768v2 ) License: CC BY 4.0	Paloma Sodhi, Michael Kaess, Mustafa Mukadam, Stuart Anderson	We're interested in the problem of estimating object states from touch during manipulation under occlusions. In this work, we address the problem of estimating object poses from touch during planar pushing. Vision-based tactile sensors provide rich, local image measurements at the point of contact. A single such measurement, however, contains limited information and multiple measurements are needed to infer latent object state. We solve this inference problem using a factor graph. In order to incorporate tactile measurements in the graph, we need local observation models that can map high-dimensional tactile images onto a low-dimensional state space. Prior work has used low-dimensional force measurements or engineered functions to interpret tactile measurements. These methods, however, can be brittle and difficult to scale across objects and sensors. Our key insight is to directly learn tactile observation models that predict the relative pose of the sensor given a pair of tactile images. These relative poses can then be incorporated as factors within a factor graph. We propose a two-stage approach: first we learn local tactile observation models supervised with ground truth data, and then integrate these models along with physics and geometric factors within a factor graph optimizer. We demonstrate reliable object tracking using only tactile feedback for 150 real-world planar pushing sequences with varying trajectories across three object shapes. Supplementary video: https://youtu.be/y1kBfSmi8w0	Translated: 2021-05-18 13:28:47 Published: 2021-03-28
# Adversarial Permutation Guided Node Representations for Link Prediction ( http://arxiv.org/abs/2012.08974v2 ) License: see link	Indradyumna Roy, Abir De, Soumen Chakrabarti	After observing a snapshot of a social network, a link prediction (LP) algorithm identifies node pairs between which new edges will likely materialize in future. Most LP algorithms estimate a score for currently non-neighboring node pairs, and rank them by this score. Recent LP systems compute this score by comparing dense, low dimensional vector representations of nodes. Graph neural networks (GNNs), in particular graph convolutional networks (GCNs), are popular examples. For two nodes to be meaningfully compared, their embeddings should be indifferent to reordering of their neighbors. GNNs typically use simple, symmetric set aggregators to ensure this property, but this design decision has been shown to produce representations with limited expressive power. Sequence encoders are more expressive, but are permutation sensitive by design. Recent efforts to overcome this dilemma turn out to be unsatisfactory for LP tasks. In response, we propose PermGNN, which aggregates neighbor features using a recurrent, order-sensitive aggregator and directly minimizes an LP loss while it is 'attacked' by an adversarial generator of neighbor permutations. By design, PermGNN has more expressive power compared to earlier symmetric aggregators. Next, we devise an optimization framework to map PermGNN's node embeddings to a suitable locality-sensitive hash, which speeds up reporting the top-$K$ most likely edges for the LP task. Our experiments on diverse datasets show that PermGNN outperforms several state-of-the-art link predictors by a significant margin, and can predict the most likely edges fast.	Translated: 2021-05-09 12:37:11 Published: 2021-03-28
# Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting ( http://arxiv.org/abs/2012.07436v3 ) License: see link	Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, Wancai Zhang	Many real-world applications require the prediction of long sequence time-series, such as electricity consumption planning. Long sequence time-series forecasting (LSTF) demands a high prediction capacity of the model, which is the ability to capture precise long-range dependency coupling between output and input efficiently. Recent studies have shown the potential of Transformer to increase the prediction capacity. However, there are several severe issues with Transformer that prevent it from being directly applicable to LSTF, including quadratic time complexity, high memory usage, and inherent limitation of the encoder-decoder architecture. To address these issues, we design an efficient transformer-based model for LSTF, named Informer, with three distinctive characteristics: (i) a $ProbSparse$ self-attention mechanism, which achieves $O(L \log L)$ in time complexity and memory usage, and has comparable performance on sequences' dependency alignment. (ii) the self-attention distilling highlights dominating attention by halving cascading layer input, and efficiently handles extreme long input sequences. (iii) the generative style decoder, while conceptually simple, predicts the long time-series sequences in one forward operation rather than in a step-by-step way, which drastically improves the inference speed of long-sequence predictions. Extensive experiments on four large-scale datasets demonstrate that Informer significantly outperforms existing methods and provides a new solution to the LSTF problem.	Translated: 2021-05-08 14:38:11 Published: 2021-03-28
# トランスを用いたエンド・ツー・エンドヒューマン・ポースとメッシュ再構成 End-to-End Human Pose and Mesh Reconstruction with Transformers ( http://arxiv.org/abs/2012.09760v2 ) ライセンス: Link先を確認	Kevin Lin, Lijuan Wang, Zicheng Liu	(参考訳) 本研究では,メッシュトランスフォーマタ(metro)と呼ばれる新しい手法を提案し,人間の3次元ポーズとメッシュ頂点を1つの画像から再構成する。本手法では、トランスコーダを用いて頂点-頂点-接合相互作用をモデル化し、3次元ジョイント座標とメッシュ頂点を同時に出力する。ポーズと形状パラメータを回帰する既存の手法と比較して、METROはSMPLのようなパラメトリックメッシュモデルに依存しないので、手などの他のオブジェクトにも容易に拡張できる。さらにメッシュトポロジーを緩和し、トランスフォーマー自着機構が任意の2つの頂点間を自由に参加できるようにし、メッシュ頂点と関節間の非局所関係を学べるようにした。提案するマスキング頂点モデリングは, 部分閉塞などの困難な状況に対してより頑健で効果的な手法である。 METROは、パブリックなHuman3.6Mと3DPWデータセット上で、人間のメッシュ再構築のための新しい最先端の結果を生成する。さらに,METROの3次元手指再構成への一般化性を示し,FreiHANDデータセットにおける既存の最先端手法よりも優れていた。 We present a new method, called MEsh TRansfOrmer (METRO), to reconstruct 3D human pose and mesh vertices from a single image. Our method uses a transformer encoder to jointly model vertex-vertex and vertex-joint interactions, and outputs 3D joint coordinates and mesh vertices simultaneously. Compared to existing techniques that regress pose and shape parameters, METRO does not rely on any parametric mesh models like SMPL, thus it can be easily extended to other objects such as hands. We further relax the mesh topology and allow the transformer self-attention mechanism to freely attend between any two vertices, making it possible to learn non-local relationships among mesh vertices and joints. With the proposed masked vertex modeling, our method is more robust and effective in handling challenging situations like partial occlusions. METRO generates new state-of-the-art results for human mesh reconstruction on the public Human3.6M and 3DPW datasets. Moreover, we demonstrate the generalizability of METRO to 3D hand reconstruction in the wild, outperforming existing state-of-the-art methods on FreiHAND dataset.	翻訳日:2021-05-02 07:23:24 公開日:2021-03-28
# SegGroup: 3DインスタンスとセマンティックセグメンテーションのためのSeg-Levelスーパービジョン SegGroup: Seg-Level Supervision for 3D Instance and Semantic Segmentation ( http://arxiv.org/abs/2012.10217v2 ) ライセンス: Link先を確認	An Tao, Yueqi Duan, Yi Wei, Jiwen Lu, Jie Zhou	(参考訳) ほとんどの既存のポイントクラウドインスタンスとセマンティックセグメンテーションメソッドは、シーンのすべてのポイントに対してポイントレベルのラベルを必要とする強力な監視信号に大きく依存しています。しかし、このような強い監督はアノテーションのコストの増大に苦しめられ、効率的な注釈研究の必要性が高まる。本稿では,3次元シーンセグメンテーションにおけるインスタンスの位置が重要であることを明らかにする。ロケーションの利点をフルに活用することで、アノテーションの場所を示すためにインスタンス毎に1つのポイントをクリックするだけで、弱教師付きポイントクラウドセグメンテーションアルゴリズムを設計する。事前処理のオーバーセグメンテーションにより、これらの位置アノテーションをセグレベルのラベルとしてセグメントに拡張する。さらにセグメントグループ化ネットワーク(seggroup)を設計、segレベルラベルの下で擬似的なポイントレベルラベルを生成するために、ラベルのないセグメントを関連するラベル付きセグメントに階層的にグループ化することで、既存のポイントレベルの教師付きセグメントモデルがこれらの擬似ラベルを直接使用してトレーニングできるようにする。実験結果から, セグレベル制御法 (SegGroup) は, 完全注釈付き点レベル制御法と同等の結果が得られることがわかった。さらに、固定アノテーション予算が与えられた最近の弱い監督手法よりも優れています。 Most existing point cloud instance and semantic segmentation methods rely heavily on strong supervision signals, which require point-level labels for every point in the scene. However, such strong supervision suffers from large annotation costs, arousing the need to study efficient annotating. In this paper, we discover that the locations of instances matter for 3D scene segmentation. By fully taking the advantages of locations, we design a weakly supervised point cloud segmentation algorithm that only requires clicking on one point per instance to indicate its location for annotation. With over-segmentation for pre-processing, we extend these location annotations into segments as seg-level labels. We further design a segment grouping network (SegGroup) to generate pseudo point-level labels under seg-level labels by hierarchically grouping the unlabeled segments into the relevant nearby labeled segments, so that existing point-level supervised segmentation models can directly consume these pseudo labels for training. 
Experimental results show that our seg-level supervised method (SegGroup) achieves comparable results with the fully annotated point-level supervised methods. Moreover, it also outperforms the recent weakly supervised methods given a fixed annotation budget.	翻訳日:2021-05-01 18:11:15 公開日:2021-03-28
# BAF検出器:太陽電池欠陥検出のための効率的なCNN検出器 BAF-Detector: An Efficient CNN-Based Detector for Photovoltaic Cell Defect Detection ( http://arxiv.org/abs/2012.10631v2 ) ライセンス: Link先を確認	Binyi Su, Haiyong Chen, Zhong Zhou	(参考訳) 太陽電池(PV)セルエレクトロルミネッセンス(EL)画像のマルチスケール欠陥検出は,ネットワークの深層化に伴う特徴の消失による課題である。この問題に対処するため,マルチスケール機能融合を実現するため,アテンションベースのトップダウン・ボトムアップアーキテクチャを開発した。このアーキテクチャはBAFPN(Bidirectional Attention Feature Pyramid Network)と呼ばれ、ピラミッドのすべての層が同様のセマンティックな特徴を共有することができる。 BAFPNでは、融合特徴における各画素の重要性を測定するためにコサイン類似性を用いる。さらに、高速RCNN+FPNの領域提案ネットワーク(RPN)にBAFPNを埋め込んだ新しい物体検出器BAF-Detectorが提案されている。 BAFPNはネットワークの堅牢性を改善してスケールし,マルチスケール欠陥検出タスクにおいて優れた性能を実現する。最後に,3629画像,2129画像を含む大規模elデータセットにおける実験結果から,本手法は生のpvセルel画像において,マルチスケールの欠陥分類と検出結果の点で98.70% (f-measure),88.07% (map),73.29% (iou) を達成した。 The multi-scale defect detection for photovoltaic (PV) cell electroluminescence (EL) images is a challenging task, due to the feature vanishing as network deepens. To address this problem, an attention-based top-down and bottom-up architecture is developed to accomplish multi-scale feature fusion. This architecture, called Bidirectional Attention Feature Pyramid Network (BAFPN), can make all layers of the pyramid share similar semantic features. In BAFPN, cosine similarity is employed to measure the importance of each pixel in the fused features. Furthermore, a novel object detector is proposed, called BAF-Detector, which embeds BAFPN into Region Proposal Network (RPN) in Faster RCNN+FPN. BAFPN improves the robustness of the network to scales, thus the proposed detector achieves a good performance in multi-scale defects detection task. Finally, the experimental results on a large-scale EL dataset including 3629 images, 2129 of which are defective, show that the proposed method achieves 98.70% (F-measure), 88.07% (mAP), and 73.29% (IoU) in terms of multi-scale defects classification and detection results in raw PV cell EL images.	翻訳日:2021-05-01 11:12:31 公開日:2021-03-28
# (参考訳) ブラックボックスソースモデルの教師なし領域適応 Unsupervised Domain Adaptation of Black-Box Source Models ( http://arxiv.org/abs/2101.02839v2 ) ライセンス: CC BY 4.0	Haojian Zhang, Yabin Zhang, Kui Jia, Lei Zhang	(参考訳) unsupervised domain adaptation(uda)は、ラベル付きソースドメインから知識を転送することで、ラベル付きデータのターゲットドメインのモデルを学ぶことを目的としている。従来のUDA設定では、ラベル付きソースデータが適応可能であると仮定される。データプライバシに関する懸念が高まっているため、ソースフリーなUDAは、トレーニング済みのソースモデルのみが利用可能であると想定される新しいUDA設定として高く評価されている。しかし、ソースモデルが商業的価値を持つ可能性があり、ソースモデルがソースドメインにリスクをもたらす可能性があるため、トレーニング済みのソースモデルも実際には使用できない場合がある。本研究では,B$^2$UDA (Black-Box Unsupervised Domain Adaptation) という,ソースモデルのアプリケーションプログラミングインターフェースのみを対象ドメインにアクセス可能なサブセットについて検討する。 B$^2$UDAに取り組むために,ノイズラベルを用いた反復学習(IterLNL)という,シンプルで効果的な手法を提案する。ブラックボックスモデルをノイズラベリングのツールとして、IterLNLはノイズラベリングと学習をノイズラベリング(LNL)で反復的に行う。 b$^2$uda における lnl の実装を容易にするために,ラベルなし対象データのモデル予測から雑音率を推定し,カテゴリ間の不均衡ラベルノイズに対処するためにカテゴリ毎のサンプリングを提案する。ベンチマークデータセットの実験は、IterLNLの有効性を示している。ソースデータもソースモデルも考慮しないため、IterLNLはラベル付きソースデータを完全に利用する従来のUDAメソッドと互換性がある。 Unsupervised domain adaptation (UDA) aims to learn models for a target domain of unlabeled data by transferring knowledge from a labeled source domain. In the traditional UDA setting, labeled source data are assumed to be available for adaptation. Due to increasing concerns for data privacy, source-free UDA is highly appreciated as a new UDA setting, where only a trained source model is assumed to be available, while labeled source data remain private. However, trained source models may also be unavailable in practice since source models may have commercial values and exposing source models brings risks to the source domain, e.g., problems of model misuse and white-box attacks. In this work, we study a subtly different setting, named Black-Box Unsupervised Domain Adaptation (B$^2$UDA), where only the application programming interface of source model is accessible to the target domain; in other words, the source model itself is kept as a black-box one. 
To tackle B$^2$UDA, we propose a simple yet effective method, termed Iterative Learning with Noisy Labels (IterLNL). With black-box models as tools of noisy labeling, IterLNL conducts noisy labeling and learning with noisy labels (LNL), iteratively. To facilitate the implementation of LNL in B$^2$UDA, we estimate the noise rate from model predictions of unlabeled target data and propose category-wise sampling to tackle the unbalanced label noise among categories. Experiments on benchmark datasets show the efficacy of IterLNL. Given neither source data nor source models, IterLNL performs comparably with traditional UDA methods that make full use of labeled source data.	翻訳日:2021-04-10 12:16:51 公開日:2021-03-28
# ガウス過程畳み込み辞書学習 Gaussian Process Convolutional Dictionary Learning ( http://arxiv.org/abs/2104.00530v1 ) ライセンス: Link先を確認	Andrew H. Song, Bahareh Tolooshams, Demba Ba	(参考訳) データからシフト不変テンプレートを推定する問題である畳み込み辞書学習(cdl)は、通常、テンプレートの事前構造や構造がない状態で実行される。コミュニティからほとんど注目を集めていないSNR(Data-Scarce or Low Signal-to-Noise ratio)体制では、下流タスクの予測性能に影響を与えるような、データの過度な適合と滑らかさの欠如を学習した。この制限に対処するため,GPCDLを提案する。GPCDLはガウス過程(GP)を用いたテンプレートの事前処理を行う畳み込み辞書学習フレームワークである。滑らか性に着目して,gpを事前設定することは,学習したテンプレートのワイナーフィルタリングと等価であることを示し,高周波成分の抑制と滑らか性の向上を理論的に示す。このアルゴリズムは古典的反復重み付け最小二乗の単純な拡張であり、柔軟性は異なる滑らかさの仮定で実験できることを示す。シミュレーションにより,GPCDLはSNRの非正規化よりもスムーズな辞書を学習できることを示す。ラットの神経スパイクデータに適用することにより、GPCDLによる学習テンプレートはより正確で視覚的に解釈可能なスムーズな辞書となり、非正規化されたCDLよりも予測性能が優れ、パラメトリックな代替品が得られた。 Convolutional dictionary learning (CDL), the problem of estimating shift-invariant templates from data, is typically conducted in the absence of a prior/structure on the templates. In data-scarce or low signal-to-noise ratio (SNR) regimes, which have received little attention from the community, learned templates overfit the data and lack smoothness, which can affect the predictive performance of downstream tasks. To address this limitation, we propose GPCDL, a convolutional dictionary learning framework that enforces priors on templates using Gaussian Processes (GPs). With the focus on smoothness, we show theoretically that imposing a GP prior is equivalent to Wiener filtering the learned templates, thereby suppressing high-frequency components and promoting smoothness. We show that the algorithm is a simple extension of the classical iteratively reweighted least squares, which allows the flexibility to experiment with different smoothness assumptions. Through simulation, we show that GPCDL learns smooth dictionaries with better accuracy than the unregularized alternative across a range of SNRs. 
Through an application to neural spiking data from rats, we show that learning templates by GPCDL results in a more accurate and visually-interpretable smooth dictionary, leading to superior predictive performance compared to non-regularized CDL, as well as parametric alternatives.	翻訳日:2021-04-02 13:48:50 公開日:2021-03-28
# (参考訳) 「あなたが正しいからといって、私が間違っているというわけではない」:オープンエンドのビジュアル質問回答(VQA)タスクの開発と評価におけるボトルネックの克服 'Just because you are right, doesn't mean I am wrong': Overcoming a Bottleneck in the Development and Evaluation of Open-Ended Visual Question Answering (VQA) Tasks ( http://arxiv.org/abs/2103.15022v1 ) ライセンス: CC BY 4.0	Man Luo, Shailaja Keyur Sampat, Riley Tallman, Yankai Zeng, Manuha Vancha, Akarshan Sajja, Chitta Baral	(参考訳) GQA (Hudson and Manning, 2019) は、現実の視覚的推論と構成的質問応答のためのデータセットである。 GQAデータセット上で最高の視覚言語モデルによって予測される多くの回答は、正解(ground-truth)の答えと一致しないが、与えられた文脈において意味的に意味があり正しいことがわかった。実際、ほとんどの既存の視覚的質問応答(VQA)データセットでは、各質問に対して1つの正解しか想定していない。我々は,この制限に対処するために,既製のNLPツールを用いて自動生成する,正解の代替アンサーセット(AAS)を提案する。 AASに基づくセマンティックメトリックを導入し、トップVQAソルバを修正して、質問に対する複数の妥当な回答をサポートする。このアプローチをGQAデータセットに実装し、性能改善を示す。 GQA (Hudson and Manning, 2019) is a dataset for real-world visual reasoning and compositional question answering. We found that many answers predicted by the best vision-language models on the GQA dataset do not match the ground-truth answer but are still semantically meaningful and correct in the given context. In fact, this is the case with most existing visual question answering (VQA) datasets, which assume only one ground-truth answer for each question. To address this limitation, we propose Alternative Answer Sets (AAS) of ground-truth answers, which are created automatically using off-the-shelf NLP tools. We introduce a semantic metric based on AAS and modify top VQA solvers to support multiple plausible answers for a question. We implement this approach on the GQA dataset and show the performance improvements.	翻訳日:2021-04-01 09:24:13 公開日:2021-03-28
# (参考訳) 条件言語生成における幻覚と予測不確かさについて On Hallucination and Predictive Uncertainty in Conditional Language Generation ( http://arxiv.org/abs/2103.15025v1 ) ライセンス: CC BY 4.0	Yijun Xiao, William Yang Wang	(参考訳) 異なる自然言語生成タスクのパフォーマンスは改善されているが、深いニューラルモデルは、誤ったあるいは存在しない事実を幻覚させる傾向がある。異なるタスクに対して異なる仮説が提案され、個別に検討されるが、これらのタスクの体系的な説明は得られない。本研究では,条件言語生成における幻覚と予測の不確かさの関連性を示す。画像キャプションとデータ対テキスト生成の両方におけるそれらの関係を調べ、幻覚を減少させるビーム探索の簡単な拡張を提案する。分析の結果,高い予測不確実性は幻覚の確率が高いことがわかった。てんかんの不確実性は、失語症や全不確実性よりも幻覚を示す。提案したビームサーチ変種との幻覚を抑えるため,標準メートル法での取引性能の向上に寄与する。 Despite improvements in performances on different natural language generation tasks, deep neural models are prone to hallucinating facts that are incorrect or nonexistent. Different hypotheses are proposed and examined separately for different tasks, but no systematic explanations are available across these tasks. In this study, we draw connections between hallucinations and predictive uncertainty in conditional language generation. We investigate their relationship in both image captioning and data-to-text generation and propose a simple extension to beam search to reduce hallucination. Our analysis shows that higher predictive uncertainty corresponds to a higher chance of hallucination. Epistemic uncertainty is more indicative of hallucination than aleatoric or total uncertainties. It helps to achieve better results of trading performance in standard metric for less hallucination with the proposed beam search variant.	翻訳日:2021-04-01 09:17:05 公開日:2021-03-28
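The abstract above contrasts epistemic, aleatoric, and total uncertainty. One common way to separate them, sketched here under the assumption that several next-token distributions are available (for example from an ensemble; the abstract does not say how the authors obtain them), is to take the entropy of the averaged distribution as total uncertainty, the average per-member entropy as aleatoric uncertainty, and their difference as epistemic uncertainty. The function names are illustrative only.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def decompose_uncertainty(member_probs):
    """Split predictive uncertainty into (total, aleatoric, epistemic).

    member_probs: list of next-token distributions, one per ensemble
    member (an assumed setup, not taken from the abstract).
    """
    n = len(member_probs)
    vocab = len(member_probs[0])
    # Average the members' distributions token by token.
    mean_p = [sum(p[i] for p in member_probs) / n for i in range(vocab)]
    total = entropy(mean_p)                               # entropy of the mean
    aleatoric = sum(entropy(p) for p in member_probs) / n  # mean of entropies
    epistemic = total - aleatoric                          # disagreement term
    return total, aleatoric, epistemic
```

When all members agree, the epistemic term vanishes; when confident members disagree, nearly all of the uncertainty is epistemic, which is the situation the abstract associates most strongly with hallucination.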
# (参考訳) ノイズ注入によるポイントクラウド処理の規則化 Noise Injection-based Regularization for Point Cloud Processing ( http://arxiv.org/abs/2103.15027v1 ) ライセンス: CC BY 4.0	Xiao Zang, Yi Xie, Siyu Liao, Jie Chen, Bo Yuan	(参考訳) Dropoutのようなノイズ注入に基づく正規化は、ディープニューラルネットワーク(DNN)の性能向上のために画像領域で広く利用されている。しかし、ポイントクラウド領域における効率的な正規化はめったに利用されず、最先端の作業の多くはデータ拡張ベースの正規化に焦点を当てている。本稿では,まず,ポイントクラウドドメインDNNにおけるノイズ注入に基づく正規化の体系化について検討する。具体的には、機能レベル、ポイントレベル、クラスタレベルの各ポイントフィーチャーマップにノイズ注入を行うために、dropfeat、droppoint、dropclusterという一連の正規化手法を提案する。また、異なるデータセットやdnnアーキテクチャにおけるアプローチの採用を促進する有用な洞察と一般的なデプロイメントガイドラインを得るために、速度の低下、クラスタサイズ、位置の低下など、さまざまな要因の影響を実証的に分析します。異なるポイントクラウド処理タスクに対する様々なDNNモデルに対する提案手法の評価を行った。実験の結果,本手法による性能改善効果が示された。特に、私たちのDropClusterは、ModelNet40形状分類データセットで、PointNet、PointNet++、DGCNNの全体的な精度を1.5%、1.3%、0.8%向上させています。 shapenetの部分セグメンテーションデータセットでは、dropclusterは0.5%、0.5%、0.2%がpointnet、pointnet++、dgcnnのintersection-over-union(iou)増加をもたらしている。 s3disセマンティックセグメンテーションデータセットでは、dropclusterはpointnet、pointnet++、dgcnnの平均iouをそれぞれ3.2%、2.9%、3.7%改善している。一方、DropClusterは、これら3つの人気バックボーンDNNの全体的な精度を2.4%、 2.2%、 1.8%向上させることができる。 Noise injection-based regularization, such as Dropout, has been widely used in image domain to improve the performance of deep neural networks (DNNs). However, efficient regularization in the point cloud domain is rarely exploited, and most of the state-of-the-art works focus on data augmentation-based regularization. In this paper, we, for the first time, perform systematic investigation on noise injection-based regularization for point cloud-domain DNNs. To be specific, we propose a series of regularization techniques, namely DropFeat, DropPoint and DropCluster, to perform noise injection on the point feature maps at the feature level, point level and cluster level, respectively. 
We also empirically analyze the impacts of different factors, including dropping rate, cluster size and dropping position, to obtain useful insights and general deployment guidelines, which can facilitate the adoption of our approaches across different datasets and DNN architectures. We evaluate our proposed approaches on various DNN models for different point cloud processing tasks. Experimental results show our approaches enable significant performance improvement. Notably, our DropCluster brings 1.5%, 1.3% and 0.8% higher overall accuracy for PointNet, PointNet++ and DGCNN, respectively, on ModelNet40 shape classification dataset. On ShapeNet part segmentation dataset, DropCluster brings 0.5%, 0.5% and 0.2% mean Intersection-over-union (IoU) increase for PointNet, PointNet++ and DGCNN, respectively. On S3DIS semantic segmentation dataset, DropCluster improves the mean IoU of PointNet, PointNet++ and DGCNN by 3.2%, 2.9% and 3.7%, respectively. Meanwhile, DropCluster also enables the overall accuracy increase for these three popular backbone DNNs by 2.4%, 2.2% and 1.8%, respectively.	翻訳日:2021-04-01 09:05:06 公開日:2021-03-28
# (参考訳) 確率微分方程式を用いた精度・信頼性予測 Accurate and Reliable Forecasting using Stochastic Differential Equations ( http://arxiv.org/abs/2103.15041v1 ) ライセンス: CC BY-SA 4.0	Peng Cui, Zhijie Deng, Wenbo Hu and Jun Zhu	(参考訳) ディープラーニングモデルにとって、現実世界の環境に浸透する不確実性を適切に特徴付けることは、非常に困難である。ヘテロシドスティックニューラルネットワーク(hnn)など多くの努力がなされているが、学習効率、不確実性推定の質、予測性能の異なるレベルの妥協によって、実用性が満足できる成果は少ない。さらに、既存のHNNは予測と関連する不確実性の間に明確な相互作用を構築することができない。本稿では、確率微分方程式(SDE)を備えた新しいヘテロ代用ニューラルネットワークであるSDE-HNNを開発し、HNNの予測平均と分散の相互作用を正確にかつ信頼性の高い回帰のために特徴付けることにより、これらの問題を解決することを目的とする。理論的には、考案されたニューラルSDEに対する解の存在と特異性を示す。さらに、SDE-HNNにおける最適化のためのバイアス分散トレードオフに基づいて、学習安定性を向上させるために、改良された数値SDEソルバを設計する。最後に、予測の不確かさをより体系的に評価するために、2つの新しい診断不確実性指標を示す。本手法は,予測性能と不確実性定量化の両方の観点から,最先端のベースラインを著しく上回り,良好な校正と鋭い予測間隔を提供することを示す。 It is critical yet challenging for deep learning models to properly characterize uncertainty that is pervasive in real-world environments. Although a lot of efforts have been made, such as heteroscedastic neural networks (HNNs), little work has demonstrated satisfactory practicability due to the different levels of compromise on learning efficiency, quality of uncertainty estimates, and predictive performance. Moreover, existing HNNs typically fail to construct an explicit interaction between the prediction and its associated uncertainty. This paper aims to remedy these issues by developing SDE-HNN, a new heteroscedastic neural network equipped with stochastic differential equations (SDE) to characterize the interaction between the predictive mean and variance of HNNs for accurate and reliable regression. Theoretically, we show the existence and uniqueness of the solution to the devised neural SDE. Moreover, based on the bias-variance trade-off for the optimization in SDE-HNN, we design an enhanced numerical SDE solver to improve the learning stability. Finally, to more systematically evaluate the predictive uncertainty, we present two new diagnostic uncertainty metrics. 
Experiments on the challenging datasets show that our method significantly outperforms the state-of-the-art baselines in terms of both predictive performance and uncertainty quantification, delivering well-calibrated and sharp prediction intervals.	翻訳日:2021-03-31 14:17:09 公開日:2021-03-28
# (参考訳) 表現学習による知識グラフエンティティアライメントに関する包括的調査 A Comprehensive Survey on Knowledge Graph Entity Alignment via Representation Learning ( http://arxiv.org/abs/2103.15059v1 ) ライセンス: CC BY 4.0	Rui Zhang, Bayu Distiawan Trisedy, Miao Li, Yong Jiang, Jianzhong Qi	(参考訳) ここ数年、AIアプリケーションにおいて重要な役割を担っているため、研究コミュニティと業界の両方で知識ベースへの関心が指数関数的に高まっている。エンティティアライメントは知識ベースを強化する上で重要なタスクです。本稿では,表現学習の新しいアプローチを用いた代表者アライメント手法に関する総合的なチュートリアル型調査を行う。本稿では,これらの手法の重要な特徴を捉えるためのフレームワークを提案し,既存のベンチマークデータセットの制限に対処する2つのデータセットを提案し,提案したデータセットを用いて広範な実験を行う。フレームワークは、テクニックの動作方法を明確に示しています。実験により,実験手法の実証的性能と各種要因が性能に与える影響について重要な結果が得られた。以前の研究で強調されなかった重要な観察の1つは、特徴が勝者として際立っているように、属性トリプルと関係式をうまく活用するテクニックである。 In the last few years, the interest in knowledge bases has grown exponentially in both the research community and the industry due to their essential role in AI applications. Entity alignment is an important task for enriching knowledge bases. This paper provides a comprehensive tutorial-type survey on representative entity alignment techniques that use the new approach of representation learning. We present a framework for capturing the key characteristics of these techniques, propose two datasets to address the limitation of existing benchmark datasets, and conduct extensive experiments using the proposed datasets. The framework gives a clear picture of how the techniques work. The experiments yield important results about the empirical performance of the techniques and how various factors affect the performance. One important observation not stressed by previous work is that techniques making good use of attribute triples and relation predicates as features stand out as winners.	翻訳日:2021-03-31 11:56:27 公開日:2021-03-28
# (参考訳) IUP: 5G IoTにおける固体発酵のためのインテリジェントなユーティリティ予測スキーム IUP: An Intelligent Utility Prediction Scheme for Solid-State Fermentation in 5G IoT ( http://arxiv.org/abs/2103.15073v1 ) ライセンス: CC BY 4.0	Min Wang, Shanchen Pang, Tong Ding, Sibo Qiao, Xue Zhai, Shuo Wang, Neal N. Xiong, Zhengwen Huang	(参考訳) 現在,固体発酵(SSF)は主に人の経験によって制御されており,生産品質と収量は安定していない。 SSFの品質と収量を正確に予測することは、食品の安全性と供給を改善する上で非常に重要である。本稿では,5G Industrial Internet of Things(IoT)におけるSSFのためのインテリジェントユーティリティ予測(IUP)手法を提案する。この IUP スキームは,5G 産業用 IoT の環境認識と知的学習アルゴリズムに基づいている。 rewritable Petri netに基づくワークフローモデルを構築し,システムモデル機能とプロセスの正確性を検証する。さらに,GAN(Generative Adversarial Networks)とFCNN(Fully Connected Neural Network)に基づくSSFの実用予測モデルを設計する。平均二乗誤差の制約付きGAN(MSE-GAN)を設計し、SSFの少数ショット学習の問題を解決するとともに、FCNNと組み合わせてSSFの効用予測(通常アルコール)を実現する。実験室での酒類製造から,SSFの実用性予測における他の予測手法よりも精度が高く,予め設定した原料の割合とセラー温度の適切な設定に関する数値解析の基礎を提供する。 At present, Solid-State Fermentation (SSF) is mainly controlled by human experience, and the product quality and yield are not stable. Accurately predicting the quality and yield of SSF is of great significance for improving human food security and supply. In this paper, we propose an Intelligent Utility Prediction (IUP) scheme for SSF in the 5G Industrial Internet of Things (IoT), including parameter collection and utility prediction of the SSF process. This IUP scheme is based on the environmental perception and intelligent learning algorithms of the 5G Industrial IoT. We build a workflow model based on a rewritable Petri net to verify the correctness of the system model function and process. In addition, we design a utility prediction model for SSF based on Generative Adversarial Networks (GAN) and a Fully Connected Neural Network (FCNN). We design a GAN with a mean square error constraint (MSE-GAN) to solve the problem of few-shot learning of SSF, and then combine it with the FCNN to realize the utility prediction (usually the alcohol) of SSF. 
Experiments based on laboratory liquor production show that the proposed method predicts SSF utility more accurately than other prediction methods and provides a basis for the numerical analysis of the proportion of preconfigured raw materials and the appropriate setting of cellar temperature.	翻訳日:2021-03-31 11:17:40 公開日:2021-03-28
# (参考訳) PENELOPIE: 機械翻訳によるギリシア語のオープン情報抽出の実現 PENELOPIE: Enabling Open Information Extraction for the Greek Language through Machine Translation ( http://arxiv.org/abs/2103.15075v1 ) ライセンス: CC BY 4.0	Dimitris Papadopoulos, Nikolaos Papadakis and Nikolaos Matsatsinis	(参考訳) 本稿では,オープンインフォメーション抽出の文脈において,高リソース言語と低リソース言語のギャップを埋めることを目的とした方法論であるEACL 2021 SRWの提出について述べる。第一に、トランスフォーマーアーキテクチャに基づいて、英語からギリシャ語、ギリシャ語への翻訳のためのニューラルマシン翻訳(NMT)モデルを構築する。第二に、これらのNMTモデルを利用して、NLPパイプラインの入力としてギリシャ語のテキストの英語翻訳を作成し、一連の前処理と三重抽出タスクを適用します。最後に、抽出したトリプルをギリシャ語にバックトランスレートします。我々はNMT法とOIE法の両方をベンチマークデータセット上で評価し、我々のアプローチがギリシャの自然言語の最先端技術よりも優れていることを示す。 In this paper we present our submission for the EACL 2021 SRW; a methodology that aims at bridging the gap between high and low-resource languages in the context of Open Information Extraction, showcasing it on the Greek language. The goals of this paper are twofold: First, we build Neural Machine Translation (NMT) models for English-to-Greek and Greek-to-English based on the Transformer architecture. Second, we leverage these NMT models to produce English translations of Greek text as input for our NLP pipeline, to which we apply a series of pre-processing and triple extraction tasks. Finally, we back-translate the extracted triples to Greek. We conduct an evaluation of both our NMT and OIE methods on benchmark datasets and demonstrate that our approach outperforms the current state-of-the-art for the Greek natural language.	翻訳日:2021-03-31 10:55:02 公開日:2021-03-28
# (参考訳) スケッチテンソル空間の学習による人工シーンのイメージインペインティング Learning a Sketch Tensor Space for Image Inpainting of Man-made Scenes ( http://arxiv.org/abs/2103.15087v1 ) ライセンス: CC0 1.0	Chenjie Cao, Yanwei Fu	(参考訳) 本稿では,人工シーンのインペインティングタスクについて検討する。エッジ、ライン、ジャンクションといった画像の視覚的パターンを保存するのが難しいため、非常に難しい。特に、それまでのほとんどの研究は、人工シーンの画像のオブジェクト/建物構造を復元できなかった。そこで本稿では,人工シーンのインペインティングのためのスケッチテンソル(ST)空間の学習を提案する。このような空間は、画像のエッジ、ライン、ジャンクションを復元するために学習され、その結果、全体像構造の信頼できる予測を行う。構造改善を容易にするために,新しいエンコーダ・デコーダ構造を持つマルチスケール・スケッチ・テンソル・インペインティング (MST) ネットワークを提案する。エンコーダは入力画像から線とエッジを抽出してST空間に投影する。この空間からデコーダが学習され、入力画像が復元される。広範な実験は、我々のモデルの有効性を検証する。さらに,本モデルは一般的な自然画像のインペインティングにおいても,競合手法に対して競争力のある性能を達成する。 This paper studies the task of inpainting man-made scenes. It is very challenging due to the difficulty in preserving the visual patterns of images, such as edges, lines, and junctions. In particular, most previous works fail to restore the object/building structures in images of man-made scenes. To this end, this paper proposes learning a Sketch Tensor (ST) space for inpainting man-made scenes. Such a space is learned to restore the edges, lines, and junctions in images, and thus makes reliable predictions of the holistic image structures. To facilitate the structure refinement, we propose a Multi-scale Sketch Tensor inpainting (MST) network, with a novel encoder-decoder structure. The encoder extracts lines and edges from the input images to project them into an ST space. From this space, the decoder is learned to restore the input images. Extensive experiments validate the efficacy of our model. Furthermore, our model also achieves competitive performance against its competitors in inpainting general natural images.	翻訳日:2021-03-31 10:45:38 公開日:2021-03-28
# (参考訳) 複数の課題によるランキングによる表現学習 Representation Learning by Ranking under multiple tasks ( http://arxiv.org/abs/2103.15093v1 ) ライセンス: CC BY 4.0	Lifeng Gu	(参考訳) 近年,表現学習が機械学習コミュニティの研究の焦点となっている。大規模事前学習ニューラルネットワークは、汎用知性を実現するための最初のステップとなっている。ニューラルネットワークの成功の鍵は、データの抽象表現能力にある。いくつかの学習分野は実際に表現の学習方法について議論しており、統一された視点がない。我々は、複数のタスクの表現学習問題をランキング問題に変換し、ランキング問題を統一的な視点として、近似的なNDCG損失を最適化することにより、異なるタスクの表現学習を解決する。分類、検索、マルチラベル学習、回帰、自己教師あり学習などの異なる学習タスクの下での実験は、近似NDCG損失の優位性を示す。さらに、自己教師付き学習タスクにおいて、トレーニングデータをデータ拡張法により変換し、近似NDCG損失の性能を向上させることにより、近似NDCG損失が教師なしトレーニングデータの情報をフル活用できることを示す。 In recent years, representation learning has become a research focus of the machine learning community. Large-scale pre-trained neural networks have become the first step toward realizing general intelligence. The key to the success of neural networks lies in their ability to form abstract representations of data. Several learning fields discuss how to learn representations, but a unified perspective is lacking. We convert the representation learning problem under multiple tasks into a ranking problem. Taking ranking as a unified perspective, we solve representation learning under different tasks by optimizing an approximate NDCG loss. Experiments on different learning tasks, including classification, retrieval, multi-label learning, regression, and self-supervised learning, demonstrate the superiority of the approximate NDCG loss. Further, in the self-supervised learning task, the training data are transformed by data augmentation to improve the performance of the approximate NDCG loss, which shows that the approximate NDCG loss can make full use of the information in unsupervised training data.	翻訳日:2021-03-31 10:26:50 公開日:2021-03-28
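For readers unfamiliar with the ranking metric named in the abstract above, the following is a minimal sketch of the exact (non-differentiable) NDCG. The paper optimizes an approximate, differentiable surrogate whose form the abstract does not give, so this only illustrates the quantity being approximated. The exponential gain `2**rel - 1` is one common convention, assumed here.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of relevance labels in ranked order."""
    return sum(
        (2 ** rel - 1) / math.log2(rank + 2)  # rank is 0-based
        for rank, rel in enumerate(relevances)
    )


def ndcg(scores, relevances):
    """NDCG of the ranking induced by sorting items by descending score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [relevances[i] for i in order]
    ideal_dcg = dcg(sorted(relevances, reverse=True))
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0
```

A model whose scores order the items exactly by relevance reaches an NDCG of 1; any misordering of items with different relevance lowers the value.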
# (参考訳) BA^2M:画像分類のためのバッチ注意モジュール BA^2M: A Batch Aware Attention Module for Image Classification ( http://arxiv.org/abs/2103.15099v1 ) ライセンス: CC BY 4.0	Qishang Cheng, Hongliang Li, Qingbo Wu and King Ngi Ngan	(参考訳) 特徴表現を強化するために畳み込みニューラルネットワーク(cnn)では注意機構が採用されている。しかし、既存の注意機構は、各サンプル内の特徴を精錬することのみに集中し、異なるサンプル間の識別を無視する。本稿では,特徴量強化のためのバッチアウェアメントモジュール(ba2m)を提案する。具体的には、まず、各サンプル内のチャネル、局所空間及びグローバル空間の注意マップを融合させることにより、サンプルワイズアテンション表現(SAR)を得る。次に,全バッチのSARを正規化関数に供給し,各サンプルの重み付けを行う。重み付けは、内容の複雑さが異なるトレーニングバッチにおけるサンプル間の機能の重要性を区別するのに役立つ。 BA2MはCNNの様々な部分に埋め込まれ、エンドツーエンドでネットワークに最適化された。 BA2Mの設計は軽量で、パラメータや計算は少ない。 CIFAR-100 と ImageNet-1K の広汎な実験により BA2M を検証する。その結果、ba2mは様々なネットワークアーキテクチャの性能を高め、多くの古典的な注意手法を上回っている。さらに、BA2Mは損失値に基づいてサンプルを再重み付けする従来の方法を上回る。 The attention mechanisms have been employed in Convolutional Neural Network (CNN) to enhance the feature representation. However, existing attention mechanisms only concentrate on refining the features inside each sample and neglect the discrimination between different samples. In this paper, we propose a batch aware attention module (BA2M) for feature enrichment from a distinctive perspective. More specifically, we first get the sample-wise attention representation (SAR) by fusing the channel, local spatial and global spatial attention maps within each sample. Then, we feed the SARs of the whole batch to a normalization function to get the weights for each sample. The weights serve to distinguish the features' importance between samples in a training batch with different complexity of content. The BA2M could be embedded into different parts of CNN and optimized with the network in an end-to-end manner. The design of BA2M is lightweight with few extra parameters and calculations. We validate BA2M through extensive experiments on CIFAR-100 and ImageNet-1K for the image recognition task. The results show that BA2M can boost the performance of various network architectures and outperforms many classical attention methods. 
Besides, BA2M exceeds traditional methods of re-weighting samples based on the loss value.	翻訳日:2021-03-31 10:14:37 公開日:2021-03-28
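The batch-level step described in the BA2M abstract (normalizing per-sample attention representations across the batch to obtain one weight per sample) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: each SAR is reduced to a single scalar score, softmax is used as the normalization function, and the names `batch_weights` and `reweight_features` are hypothetical.

```python
import math


def batch_weights(sar_scores):
    """Softmax over per-sample scores within one batch (assumed normalizer)."""
    m = max(sar_scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in sar_scores]
    z = sum(exps)
    return [e / z for e in exps]


def reweight_features(features, sar_scores):
    """Scale each sample's feature vector by its batch-level weight."""
    weights = batch_weights(sar_scores)
    return [[w * x for x in feat] for w, feat in zip(weights, features)]
```

Because the weights are normalized over the batch rather than within a sample, a sample's weight depends on the other samples it is batched with, which is the property the abstract uses to distinguish BA2M from per-sample attention.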
# (参考訳) 階層的関係調整メトリックラーニング Hierarchical Relationship Alignment Metric Learning ( http://arxiv.org/abs/2103.15107v1 ) ライセンス: CC BY 4.0	Lifeng Gu	(参考訳) 既存のメトリック学習法の多くは、サンプルペア間の類似・非類似関係に依存する類似度や距離尺度の学習に焦点を当てている。しかし、サンプルのペアは、例えばマルチラベル学習、ラベル分布学習など、現実世界の多くのアプリケーションにおいて、単に類似または非類似と特定できない。この目的のために,これらのシナリオにおける距離学習問題を扱うために,関係アライメントメトリック学習(RAML)フレームワークが提案されている。しかし、RAMLは複雑なデータセットをモデル化できない線形メトリックを学ぶ。深層学習とRAMLフレームワークを組み合わせることで,複数の学習課題における距離学習問題に対する関係アライメントの概念を用いて,特徴空間におけるサンプルペア関係とラベル空間におけるサンプルペア関係との整合性をフル活用する階層的関係アライメントメトリック学習モデルHRAMLを提案する。さらに,学習タスクによって分割されたいくつかの実験を整理し,多くの一般的なメソッドやRAMLフレームワークに対して,HRAMLの優れた性能を検証した。 Most existing metric learning methods focus on learning a similarity or distance measure relying on similar and dissimilar relations between sample pairs. However, pairs of samples cannot be simply identified as similar or dissimilar in many real-world applications, e.g., multi-label learning and label distribution learning. To this end, the relation alignment metric learning (RAML) framework was proposed to handle the metric learning problem in those scenarios. However, RAML learns a linear metric, which cannot model complex datasets. Combining deep learning with the RAML framework, we propose a hierarchical relationship alignment metric learning model, HRAML, which uses the concept of relationship alignment to model metric learning problems under multiple learning tasks and makes full use of the consistency between the sample pair relationship in the feature space and the sample pair relationship in the label space. Further, we organize several experiments by learning task and verify the better performance of HRAML against many popular methods and the RAML framework.	翻訳日:2021-03-31 09:53:36 公開日:2021-03-28
# (参考訳) 相互情報による表現の説明 Explaining Representation by Mutual Information ( http://arxiv.org/abs/2103.15114v1 ) ライセンス: CC BY 4.0	Lifeng Gu	(参考訳) 科学は世界の法則を発見するために使われる。機械学習は、データの法則の発見に使用できる。近年,機械学習コミュニティにおける解釈可能性に関する研究がますます増えている。機械学習の手法が安全で解釈可能であり、データに意味のあるパターンを見つけるのに役立つことを願っています。本稿では,深層表現の解釈可能性に着目する。本稿では,相互情報に基づく解釈可能な表現法を提案し,その解釈を入力データと表現の間の3種類の情報に要約する。さらに、モデルに挿入して、モデル表現を説明するための情報量を推定できるMI-LRモジュールを提案する。最後に,プロトタイプネットワークの可視化による検証を行う。 Science is used to discover the law of world. Machine learning can be used to discover the law of data. In recent years, there are more and more research about interpretability in machine learning community. We hope the machine learning methods are safe, interpretable, and they can help us to find meaningful pattern in data. In this paper, we focus on interpretability of deep representation. We propose a interpretable method of representation based on mutual information, which summarizes the interpretation of representation into three types of information between input data and representation. We further proposed MI-LR module, which can be inserted into the model to estimate the amount of information to explain the model's representation. Finally, we verify the method through the visualization of the prototype network.	翻訳日:2021-03-31 09:46:44 公開日:2021-03-28
# (参考訳) 人間言語の区別不能概念に対する量子ボース・アインシュタイン統計 Quantum Bose-Einstein Statistics for Indistinguishable Concepts in Human Language ( http://arxiv.org/abs/2103.15125v1 ) ライセンス: CC BY 4.0	Lester Beltran	(参考訳) 本研究では,「数の概念」と「従属概念」の組合せにおいて,概念のレベルに存在する同一性と識別可能性,すなわち11種の動物が同一であり識別不能であることにより,ボース=アインシュタイン型の統計構造が同一で識別不能な量子粒子に対してボース=アインシュタイン統計が存在しているのと類似する仮説について検討する。 Google Searchツールを用いてWorld-Wide-Webから統計データを抽出し,この仮説の証拠を特定する。 Kullback-Leibler分散法を用いて、得られた分布をマクスウェル-ボルツマン分布およびボース=アインシュタイン分布と比較し、ボース=アインシュタイン分布がマクスウェル-ボルツマン分布と比較してよりよく適合することを示す。 We investigate the hypothesis that within a combination of a 'number concept' plus a 'substantive concept', such as 'eleven animals,' the identity and indistinguishability present on the level of the concepts, i.e., all eleven animals are identical and indistinguishable, gives rise to a statistical structure of the Bose-Einstein type similar to how Bose-Einstein statistics is present for identical and indistinguishable quantum particles. We proceed by identifying evidence for this hypothesis by extracting the statistical data from the World-Wide-Web utilizing the Google Search tool. By using the Kullback-Leibler divergence method, we then compare the obtained distribution with the Maxwell-Boltzmann as well as with the Bose-Einstein distributions and show that the Bose-Einstein's provides a better fit as compared to the Maxwell-Boltzmanns.	翻訳日:2021-03-31 09:38:24 公開日:2021-03-28
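The comparison described in the abstract above (scoring an observed distribution against Bose-Einstein and Maxwell-Boltzmann candidates with the Kullback-Leibler divergence) can be sketched as follows. The counts, energy levels, and parameter values are hypothetical placeholders, and the functional forms are the textbook ones. The paper's exact parameterization and fitting procedure are not given in the abstract.

```python
import math

def normalize(weights):
    """Turn non-negative weights into a probability distribution."""
    total = sum(weights)
    return [w / total for w in weights]

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def bose_einstein(levels, a, b):
    """Normalized 1 / (a * exp(E / b) - 1); needs a * exp(E / b) > 1 at every level."""
    return normalize([1.0 / (a * math.exp(e / b) - 1.0) for e in levels])

def maxwell_boltzmann(levels, c, d):
    """Normalized 1 / (c * exp(E / d))."""
    return normalize([1.0 / (c * math.exp(e / d)) for e in levels])

# Hypothetical data: counts for six "energy levels" (not from the paper).
levels = list(range(6))
observed = normalize([120, 60, 33, 20, 12, 8])

# Hypothetical, hand-picked parameters; a real analysis would fit them.
kl_be = kl_divergence(observed, bose_einstein(levels, 1.5, 2.0))
kl_mb = kl_divergence(observed, maxwell_boltzmann(levels, 1.0, 2.0))
print("KL to Bose-Einstein:    ", kl_be)
print("KL to Maxwell-Boltzmann:", kl_mb)
```

The candidate with the smaller divergence is the better fit in this sense. Which one wins here depends entirely on the placeholder numbers and says nothing about the paper's result.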
# (参考訳) 非線形逆問題におけるモデルベース学習のためのグラフ畳み込みネットワーク Graph Convolutional Networks for Model-Based Learning in Nonlinear Inverse Problems ( http://arxiv.org/abs/2103.15138v1 ) ライセンス: CC BY 4.0	William Herzberg, Daniel B. Rowe, Andreas Hauptmann, and Sarah J. Hamilton	(参考訳) 医用画像における学習画像再構成法の大部分は、画素画像などの一様領域に限られている。非線形逆問題に典型的な有限要素法から生じる非一様メッシュ上で基礎モデルが解かれた場合、補間と埋め込みが必要である。これを克服するために,メッシュをグラフとして解釈し,グラフ畳み込みニューラルネットワークを用いてネットワークアーキテクチャを定式化することにより,モデルベース学習を非一様メッシュに直接拡張するフレキシブルなフレームワークを提案する。これにより、提案された反復グラフ畳み込みニュートン法(GCNM)が、逆問題の解にフォワードモデルを直接含み、すべての更新は問題固有のメッシュ上でネットワークによって直接計算される。本研究では, 有限要素法を用いてフォワード問題を解く最適化に基づく手法で頻繁に解く非線形逆問題である電気インピーダンストモグラフィについて報告する。絶対eitイメージングの結果は、グラフ残差ネットワークと同様に、標準的な反復的手法と比較される。我々はGCNMが純粋にシミュレートされたトレーニングデータから分布データと実験データから、異なる領域形状に強く一般化可能であることを示す。 The majority of model-based learned image reconstruction methods in medical imaging have been limited to uniform domains, such as pixelated images. If the underlying model is solved on nonuniform meshes, arising from a finite element method typical for nonlinear inverse problems, interpolation and embeddings are needed. To overcome this, we present a flexible framework to extend model-based learning directly to nonuniform meshes, by interpreting the mesh as a graph and formulating our network architectures using graph convolutional neural networks. This gives rise to the proposed iterative Graph Convolutional Newton's Method (GCNM), which directly includes the forward model into the solution of the inverse problem, while all updates are directly computed by the network on the problem specific mesh. We present results for Electrical Impedance Tomography, a severely ill-posed nonlinear inverse problem that is frequently solved via optimization-based methods, where the forward problem is solved by finite element methods. Results for absolute EIT imaging are compared to standard iterative methods as well as a graph residual network. 
We show that the GCNM has strong generalizability to different domain shapes, out of distribution data as well as experimental data, from purely simulated training data.	翻訳日:2021-03-31 09:27:09 公開日:2021-03-28
# (参考訳) マルコフ論理ネットワークにおける重みパラメータのスケーリングと関係ロジスティック回帰モデル Scaling the weight parameters in Markov logic networks and relational logistic regression models ( http://arxiv.org/abs/2103.15140v1 ) ライセンス: CC BY 4.0	Felix Weitk\"amper	(参考訳) 我々はマルコフ論理ネットワークとリレーショナルロジスティック回帰を、その仕様に重み付き公式を用いる統計リレーショナル人工知能の2つの基本的な表現形式として考える。しかし、マルコフ論理ネットワークは無向グラフに基づいており、リレーショナルロジスティック回帰は有向非巡回グラフに基づいている。重みパラメータをドメインサイズでスケーリングする場合、関係ロジスティック回帰モデルの漸近的挙動はパラメータによって透過的に制御され、漸近確率を計算するアルゴリズムを提供する。また、マルコフ論理ネットワークには当てはまらない2つの例を示す。また、主に文献から、そのようなスケーリングが適切かどうか、生の未スケールパラメータの使用が望ましいかどうかをユーザが判断する上で、アプリケーションコンテキストがどのように役立つかを議論する。本稿では,特に有望なスケールモデルの適用分野としてランダムサンプリングに注目し,さらなる研究の道筋を示す。 We consider Markov logic networks and relational logistic regression as two fundamental representation formalisms in statistical relational artificial intelligence that use weighted formulas in their specification. However, Markov logic networks are based on undirected graphs, while relational logistic regression is based on directed acyclic graphs. We show that when scaling the weight parameters with the domain size, the asymptotic behaviour of a relational logistic regression model is transparently controlled by the parameters, and we supply an algorithm to compute asymptotic probabilities. We also show using two examples that this is not true for Markov logic networks. We also discuss using several examples, mainly from the literature, how the application context can help the user to decide when such scaling is appropriate and when using the raw unscaled parameters might be preferable. We highlight random sampling as a particularly promising area of application for scaled models and expound possible avenues for further research.	翻訳日:2021-03-31 09:11:05 公開日:2021-03-28
# (参考訳) webベースシステムにおける認証方式としての顔認識 Face Recognition as a Method of Authentication in a Web-Based System ( http://arxiv.org/abs/2103.15144v1 ) ライセンス: CC BY 4.0	Ben Wycliff Mugalu, Rodrick Calvin Wamala, Jonathan Serugunda, Andrew Katumba	(参考訳) オンライン情報システムは現在、情報保護とアクセス制御にユーザー名とパスワードの伝統的な方法に大きく依存している。生体認証技術の進歩とAIや機械学習などの分野の人気により、生体認証のセキュリティはユーザビリティの優位性から、ますます人気が高まっている。本稿では,ユーザビリティ向上のメリットを享受するための認証手法として,機械学習による顔認識をWebベースシステムに統合する方法を報告する。本稿では,顔認識のためのFaceNetと検出アルゴリズムと分類アルゴリズムの組み合わせを比較した。その結果,検出用MTCNN,埋め込み生成用FaceNet,分類用LinearSVCの組み合わせは95%の精度で他の組み合わせよりも優れていることがわかった。得られた分類器は、Webベースシステムに統合され、ユーザ認証に使用される。 Online information systems currently rely heavily on the traditional username-and-password method for protecting information and controlling access. With the advancement in biometric technology and the popularity of fields like AI and machine learning, biometric security is becoming increasingly popular because of its usability advantage. This paper reports how machine-learning-based face recognition can be integrated into a web-based system as a method of authentication to reap the benefits of improved usability. This paper includes a comparison of combinations of detection and classification algorithms with FaceNet for face recognition. The results show that a combination of MTCNN for detection, FaceNet for generating embeddings, and LinearSVC for classification outperforms other combinations with 95% accuracy. The resulting classifier is integrated into the web-based system and used for authenticating users.	翻訳日:2021-03-31 08:44:51 公開日:2021-03-28
# (参考訳) シンボリック回帰は小さなデータセットの他のモデルを上回る Symbolic regression outperforms other models for small data sets ( http://arxiv.org/abs/2103.15147v1 ) ライセンス: CC BY 4.0	Casper Wilstrup and Jaan Kasak	(参考訳) 機械学習は複雑な現象や関係の予測や新しい理解にしばしば応用されるが、モデルトレーニングに十分なデータの提供は広く問題となっている。ランダムフォレストや勾配向上といった従来の機械学習技術は、数百のサンプルのデータセットを扱う場合、過度に適合する傾向にある。本研究は,250個の観測値の小さなトレーニングセットに対して,線形モデルと決定木の解釈可能性を維持しつつ,精度を向上し,これらの機械学習モデルに代えてシンボル回帰が優れていることを示す。 240例中132例において、シンボリック回帰モデルは、サンプルデータ上で他のどのモデルよりも優れている。第2の最良のアルゴリズムはランダムな森林であることが判明し、240件中37件で最善を尽くした。解釈可能なモデルとの比較を制限する場合、シンボリック回帰は240例中184例で最良である。 Machine learning is often applied to obtain predictions and new understanding of complex phenomena and relationships, but availability of sufficient data for model training is a widespread problem. Traditional machine learning techniques such as random forests and gradient boosting tend to overfit when working with data sets of a few hundred samples. This study demonstrates that for small training sets of 250 observations, symbolic regression is a superior alternative to these machine learning models by providing better accuracy while preserving the interpretability of linear models and decision trees. In 132 out of 240 cases, the symbolic regression model performs better than any of the other models on the out-of-sample data. The second best algorithm was found to be a random forest, which performs best in 37 of the 240 cases. When restricting the comparison to interpretable models, symbolic regression performs best in 184 out of 240 cases.	翻訳日:2021-03-31 08:37:55 公開日:2021-03-28
# (参考訳) mri画像中の腫瘍同定のための画像処理技術 Image Processing Techniques for identifying tumors in an MRI image ( http://arxiv.org/abs/2103.15152v1 ) ライセンス: CC BY 4.0	Jacob John	(参考訳) 医学共鳴イメージングまたはMRIは、電波を使って体をスキャンする医療画像処理技術である。断層撮影技術であり、主に放射線医学の分野で用いられる。痛みのない診断方法の利点として、MRIでは、医療従事者が体内で発生した解剖や生理的過程の鮮明な画像を説明することができ、疾患の早期発見と治療が可能になる。これらの画像と画像処理技術を組み合わせることで、肉眼では識別が難しい腫瘍の検出に利用することができる。本稿では,ATD(Automated tumor Detection)における画像処理技術について検討する。この課題は,形態学ツール (MT) や地域成長技術 (RGT) といった従来の技術との比較から議論を始める。 Medical Resonance Imaging or MRI is a medical image processing technique that used radio waves to scan the body. It is a tomographic imaging technique, principally used in the field of radiology. With the advantage of being a painless diagnostic procedure, MRI allows medical personnel to illustrate clear pictures of the anatomy and the physiological processes occurring in the body, thus allowing early detection and treatment of diseases. These images, combined with image processing techniques may be used in the detection of tumors, difficult to identify with the naked eye. This digital assignment surveys the different image processing techniques used in Automated Tumor Detection (ATD). This assignment initiates the discussion with a comparison of traditional techniques such as Morphological Tools (MT) and Region Growing Technique (RGT).	翻訳日:2021-03-31 08:30:09 公開日:2021-03-28
# (参考訳) 表象的誤りを同定するベイズ的アプローチ A Bayesian Approach to Identifying Representational Errors ( http://arxiv.org/abs/2103.15171v1 ) ライセンス: CC BY 4.0	Ramya Ramakrishnan, Vaibhav Unhelkar, Ece Kamar, Julie Shah	(参考訳) 訓練されたAIシステムと専門家の意思決定者は、しばしば識別と理解が難しいエラーを犯すことができる。これらのエラーの根本原因を決定することは、将来の決定を改善することができる。本研究は,俳優の行動(模擬エージェント,ロボット,人間)の観察に基づいて表現誤差を推定する生成モデルである生成誤差モデル(gem)を提案する。このモデルは2つのエラー源を考察している: 表現上の制限によって発生するもの -- "盲点" -- と、実行時のノイズやアクターのポリシーに存在する系統的エラーなど、非表現的エラーである。これら2つのエラータイプを曖昧にすることで、アクタのポリシー(つまり、表現エラーは知覚的な拡張を必要とするが、他のエラーはトレーニングの改善や注意支援といった方法によって削減できる)をターゲットとする改善が可能になる。本稿では,GEMのベイズ推定アルゴリズムを提案し,複数の領域における表現誤りの回復にその有用性を評価する。その結果,本手法は,強化学習エージェントとユーザの両方の盲点を回復できることがわかった。 Trained AI systems and expert decision makers can make errors that are often difficult to identify and understand. Determining the root cause for these errors can improve future decisions. This work presents Generative Error Model (GEM), a generative model for inferring representational errors based on observations of an actor's behavior (either simulated agent, robot, or human). The model considers two sources of error: those that occur due to representational limitations -- "blind spots" -- and non-representational errors, such as those caused by noise in execution or systematic errors present in the actor's policy. Disambiguating these two error types allows for targeted refinement of the actor's policy (i.e., representational errors require perceptual augmentation, while other errors can be reduced through methods such as improved training or attention support). We present a Bayesian inference algorithm for GEM and evaluate its utility in recovering representational errors on multiple domains. Results show that our approach can recover blind spots of both reinforcement learning agents as well as human users.	翻訳日:2021-03-31 08:21:37 公開日:2021-03-28
# (参考訳) バングラデシュにおけるcovid-19ワクチンの受容とその決定要因 Acceptance of COVID-19 Vaccine and Its Determinants in Bangladesh ( http://arxiv.org/abs/2103.15206v1 ) ライセンス: CC BY 4.0	Sultan Mahmud, Md. Mohsin, Ijaz Ahmed Khan, Ashraf Uddin Mian, Miah Akib Zaman	(参考訳) 背景:バングラデシュのゴヴト。 2021年2月上旬からSARS-CoV-2感染に対する全国的なワクチン接種を開始した。本研究の目的は、新型コロナウイルスワクチンの受け入れを評価し、バングラデシュでの受け入れに関連する要因を検討することである。方法:2021年1月30日から2月6日まで,バングラデシュの一般住民を対象に,webベースの匿名横断調査を実施した。多変量ロジスティック回帰は、新型コロナウイルスワクチンの受け入れに影響を与える要因を特定するために用いられた。結果:61.16%(370/605)の回答者が新型コロナウイルスワクチンの受け入れ/接種を希望していた。承認されたグループの中で、すぐにワクチンを接種する意思を示したのは35.14%に過ぎず、64.86%はワクチンの有効性や安全性がバングラデシュで死亡し、ワクチンの接種を遅らせることになる。その結果、年齢、性別、場所(都市/声)、教育水準、収入、将来covid-19に感染するリスクが認識され、感染の重症度が認識され、18歳以上の過去の予防接種経験、covid-19に関する知識、およびワクチン接種は、covid-19ワクチンの受容に著しく関連していた。結論: この研究はバングラデシュにおいて、新型コロナウイルスのワクチンの拒絶と不服従の頻度が高いと報告した。ワクチンの根絶を減らし、摂取量を増やすために、政策立案者は予防接種障壁を取り除くための十分に調査された免疫戦略を設計する必要がある。ワクチンの受け入れを改善するため、新型コロナウイルスワクチンに関する誤った噂や誤解は(特にインターネット上で)排除され、人々は実際の科学的事実に晒されなければならない。 Background: Bangladesh govt. launched a nationwide vaccination drive against SARS-CoV-2 infection from early February 2021. The objectives of this study were to evaluate the acceptance of the COVID-19 vaccines and examine the factors associated with the acceptance in Bangladesh. Method: In between January 30 to February 6, 2021, we conducted a web-based anonymous cross-sectional survey among the Bangladeshi general population. The multivariate logistic regression was used to identify the factors that influence the acceptance of the COVID-19 vaccination. Results: 61.16% (370/605) of the respondents were willing to accept/take the COVID-19 vaccine. Among the accepted group, only 35.14% showed the willingness to take the COVID-19 vaccine immediately, while 64.86% would delay the vaccination until they are confirmed about the vaccine's efficacy and safety or COVID-19 become deadlier in Bangladesh. 
The regression results showed that age, gender, location (urban/rural), level of education, income, perceived risk of being infected with COVID-19 in the future, perceived severity of infection, previous vaccination experience after age 18, and greater knowledge about COVID-19 and vaccination were significantly associated with the acceptance of COVID-19 vaccines. Conclusion: The research reported a high prevalence of COVID-19 vaccine refusal and hesitancy in Bangladesh. To reduce vaccine hesitancy and increase uptake, policymakers need to design a well-researched immunization strategy to remove the vaccination barriers. To improve vaccine acceptance among people, false rumors and misconceptions about the COVID-19 vaccines must be dispelled (especially on the internet) and people must be exposed to the actual scientific facts.	翻訳日:2021-03-31 08:07:31 公開日:2021-03-28
# (参考訳) 微分可能なモンテカルロレンダリングによる統一形状とSVBRDF回収 Unified Shape and SVBRDF Recovery using Differentiable Monte Carlo Rendering ( http://arxiv.org/abs/2103.15208v1 ) ライセンス: CC BY 4.0	Fujun Luan, Shuang Zhao, Kavita Bala, Zhao Dong	(参考訳) 実世界の物体の形状と外観を2次元画像で再現することは、コンピュータビジョンにおいて長年の課題であった。本稿では,ロバストな粗い最適化と物理に基づく微分可能レンダリングにより,高品質な再構成を実現する新手法を提案する。幾何と反射率をほぼ別々に扱う従来の手法とは異なり、この手法は物体の反射率と反射率の両方について画像勾配を活用することで両方の最適化を統一する。物理的に正確な勾配推定を得るために,最近の微分可能レンダリング理論の進歩を利用して,pytorch3dやrednerといった既存のツールよりも優れた性能を享受しながら,偏りのない勾配を提供する新しいgpuベースのモンテカルロ微分可能レンダラを開発した。さらにロバスト性を向上させるために,形状や素材の先行性や粗大な最適化戦略を利用して幾何を再構築する。本手法は,従来のcolmapやkinect fusion法よりも高品質な再構築を実現できることを示す。 Reconstructing the shape and appearance of real-world objects using measured 2D images has been a long-standing problem in computer vision. In this paper, we introduce a new analysis-by-synthesis technique capable of producing high-quality reconstructions through robust coarse-to-fine optimization and physics-based differentiable rendering. Unlike most previous methods that handle geometry and reflectance largely separately, our method unifies the optimization of both by leveraging image gradients with respect to both object reflectance and geometry. To obtain physically accurate gradient estimates, we develop a new GPU-based Monte Carlo differentiable renderer leveraging recent advances in differentiable rendering theory to offer unbiased gradients while enjoying better performance than existing tools like PyTorch3D and redner. To further improve robustness, we utilize several shape and material priors as well as a coarse-to-fine optimization strategy to reconstruct geometry. We demonstrate that our technique can produce reconstructions with higher quality than previous methods such as COLMAP and Kinect Fusion.	翻訳日:2021-03-31 07:50:24 公開日:2021-03-28
# (参考訳) 地球におけるアルゴリズム予測の限界について On the limits of algorithmic prediction across the globe ( http://arxiv.org/abs/2103.15212v1 ) ライセンス: CC BY 4.0	Xingyu Li, Difan Song, Miaozhe Han, Yu Zhang, Rene F. Kizilcec	(参考訳) 予測アルゴリズムが人々の生活や生活に与える影響は、医学、刑事司法、金融、雇用、入場などで指摘されている。これらのアルゴリズムの多くは高度に発達した国々のデータと人的資本を用いて開発されている。先進国で訓練された人間の行動の予測モデルが、65カ国の全国代表学生データに基づく200人の学歴達成予測者のグローバル変動をモデル化し、先進国の人々に広く普及するかどうかを検証した。ここでは、米国のデータに基づいてトレーニングされた最先端の機械学習モデルが、高い精度で達成を予測でき、同等の精度で他の先進国に一般化できることを示す。しかし、様々な達成予測者の重要性のグローバル変動により、国家発展とともに精度は直線的に低下し、政策立案者にとって有用なヒューリスティックとなる。同じモデルを全国データでトレーニングすると、各国で高い精度が得られ、ローカルデータ収集の価値が強調される。 The impact of predictive algorithms on people's lives and livelihoods has been noted in medicine, criminal justice, finance, hiring and admissions. Most of these algorithms are developed using data and human capital from highly developed nations. We tested how well predictive models of human behavior trained in a developed country generalize to people in less developed countries by modeling global variation in 200 predictors of academic achievement on nationally representative student data for 65 countries. Here we show that state-of-the-art machine learning models trained on data from the United States can predict achievement with high accuracy and generalize to other developed countries with comparable accuracy. However, accuracy drops linearly with national development due to global variation in the importance of different achievement predictors, providing a useful heuristic for policymakers. Training the same model on national data yields high accuracy in every country, which highlights the value of local data collection.	翻訳日:2021-03-31 07:30:19 公開日:2021-03-28
# (参考訳) グラフニューラルネットワークを用いた3Dポイントクラウド処理のための特徴とグラフ構築のための局所幾何学の展開 Exploiting Local Geometry for Feature and Graph Construction for Better 3D Point Cloud Processing with Graph Neural Networks ( http://arxiv.org/abs/2103.15226v1 ) ライセンス: CC BY 4.0	Siddharth Srivastava, Gaurav Sharma	(参考訳) 本稿では,3次元クラウド処理のためのグラフニューラルネットワーク(GNN)の汎用フレームワークにおいて,点表現と局所グラフ構築の簡易かつ効果的な改善を提案する。まず,点の局所的幾何的な重要な情報を用いて頂点表現を拡大し,次にMLPを用いた非線形投影を提案する。第2の貢献として,GNNの3次元点群に対するグラフ構築の改善を提案する。既存の手法では、k-nn に基づく局所近傍グラフの構築を行う。現場の一部地域では,センサによる高密度サンプリングを行う場合,カバー範囲が減少する可能性があると論じる。提案手法は,このような問題に対処し,適用範囲を改善することを目的としている。従来のGNNは、頂点が幾何学的解釈を持たないような一般グラフを扱うように設計されているため、この2つの提案は3次元点雲の幾何学的性質を取り入れた一般グラフを増大させるものである。単純ではあるが,実世界のノイズスキャンと同様に比較的クリーンなcadモデルを用いた複数の難解なベンチマークを用いて,提案手法が3d分類(modelnet40),部分セグメンテーション(shapenet),意味セグメンテーション(stanford 3d indoor scene dataset)のベンチマークにおいて,最先端の技術結果が得られることを示す。また,提案ネットワークがより高速な学習収束を実現することを示す。～40%少なかった。プロジェクトの詳細はhttps://siddharthsrivastava.github.io/publication/geomgcnn/で確認できる。 We propose simple yet effective improvements in point representations and local neighborhood graph construction within the general framework of graph neural networks (GNNs) for 3D point cloud processing. As a first contribution, we propose to augment the vertex representations with important local geometric information of the points, followed by nonlinear projection using a MLP. As a second contribution, we propose to improve the graph construction for GNNs for 3D point clouds. The existing methods work with a k-nn based approach for constructing the local neighborhood graph. We argue that it might lead to reduction in coverage in case of dense sampling by sensors in some regions of the scene. The proposed methods aims to counter such problems and improve coverage in such cases. As the traditional GNNs were designed to work with general graphs, where vertices may have no geometric interpretations, we see both our proposals as augmenting the general graphs to incorporate the geometric nature of 3D point clouds. 
While being simple, we demonstrate with multiple challenging benchmarks, with relatively clean CAD models as well as with real-world noisy scans, that the proposed method achieves state-of-the-art results on benchmarks for 3D classification (ModelNet40), part segmentation (ShapeNet), and semantic segmentation (Stanford 3D Indoor Scenes Dataset). We also show that the proposed network achieves faster training convergence, i.e., ~40% fewer epochs for classification. The project details are available at https://siddharthsrivastava.github.io/publication/geomgcnn/	翻訳日:2021-03-31 07:29:19 公開日:2021-03-28
# (参考訳) 地域降雨予測のための過小評価モデルKNN KNN, An Underestimated Model for Regional Rainfall Forecasting ( http://arxiv.org/abs/2103.15235v1 ) ライセンス: CC BY 4.0	Ning Yu and Timothy Haskins	(参考訳) 地域降雨予測は水文学と気象学において重要な課題である。本稿では,特に深層ニューラルネットワーク,ワイドニューラルネットワーク,ディープ・アンド・ワイドニューラルネットワーク,Reservoir Computing,Long Short Term Memory,Support Vector Machine,K-Nearest Neighborといった最先端のディープラーニングアルゴリズムを応用して,地域降水量を予測する統合ツールの設計を目的とする。実験結果と,分類と回帰を含む機械学習モデルとの比較により,KNNは降水データの不確実性を扱う他のモデルよりも優れたモデルであることがわかった。また, ZScore や MinMax などのデータ正規化手法も検討し検討した。 Regional rainfall forecasting is an important issue in hydrology and meteorology. This paper aims to design an integrated tool by applying various machine learning algorithms, especially the state-of-the-art deep learning algorithms including Deep Neural Network, Wide Neural Network, Deep and Wide Neural Network, Reservoir Computing, Long Short Term Memory, Support Vector Machine, K-Nearest Neighbor for forecasting regional precipitations over different catchments in Upstate New York. Through the experimental results and the comparison among machine learning models including classification and regression, we find that KNN is an outstanding model over other models to handle the uncertainty in the precipitation data. The data normalization methods such as ZScore and MinMax are also evaluated and discussed.	翻訳日:2021-03-31 07:12:32 公開日:2021-03-28
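The two normalization methods named in the abstract above (ZScore and MinMax), combined with a K-Nearest Neighbor regressor, can be sketched with the standard library alone. This is a generic illustration, not the authors' pipeline: the helper names, the toy data, and the choice of k are assumptions.

```python
import math
import statistics

def zscore(values):
    """Standardize to zero mean and unit (population) standard deviation.
    Assumes the values are not all identical, otherwise the std is zero."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def minmax(values):
    """Rescale linearly to [0, 1]. Assumes max(values) > min(values)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def knn_predict(train_x, train_y, query, k=3):
    """Average the targets of the k training points closest to `query`
    (Euclidean distance)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / k

# Hypothetical toy data: one feature (e.g. humidity), target = rainfall in mm.
humidity = [30.0, 45.0, 60.0, 75.0, 90.0]
rainfall = [0.0, 1.0, 4.0, 9.0, 20.0]
scaled = minmax(humidity)
train_x = [[h] for h in scaled]
query = [(70.0 - min(humidity)) / (max(humidity) - min(humidity))]
print(knn_predict(train_x, rainfall, query, k=2))
```

Scaling matters for KNN because the distance computation otherwise lets features with large numeric ranges dominate. The query point has to be scaled with the same statistics as the training data, as done above.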
# (参考訳) resnetsの再検討: 高次スキームによるスタック戦略の改善 Rethinking ResNets: Improved Stacking Strategies With High Order Schemes ( http://arxiv.org/abs/2103.15244v1 ) ライセンス: CC BY 4.0	Zhengbo Luo and Zitang Sun and Weilian Zhou and Sei-ichiro Kamata	(参考訳) さまざまなDeep Neural Networkアーキテクチャは、コンピュータビジョンにおいて非常に重要な記録を維持している。世界中の注目を集めている一方で、全体構造の設計には一般的なガイダンスが欠けている。近年,数名の研究者が観測したdnn設計と数値微分方程式の関係から,残差設計と高次視点との公平な比較を行った。我々は,dnnの設計戦略を広く活用し,小さな設計を常に積み重ねることで,理論的知識が充実し,余分なパラメータも必要とせず,容易に改善できることを示す。我々は,多くの実効ネットワークを微分方程式の異なる数値的離散化として解釈できるという観測から着想を得た,高次手法で残差設計を再構成した。 resnet の設計は euler forward という比較的単純なスキームに従っているが、スタックの状況は急速に複雑になっている。スタックされたresnetが何らかの高次スキームに等しくなっていると仮定すると、現在の転送の方法はrunge-kuttaのような典型的な高次手法と比較すると比較的弱い可能性がある。そこで本研究では, cvベンチマークを十分な実験で検証するために, 高次resnetを提案する。安定して顕著なパフォーマンスの上昇が観察され、収束と堅牢性が恩恵を受ける。 Various Deep Neural Network architectures are keeping massive vital records in computer vision. While drawing attention worldwide, the design of the overall structure somehow lacks general guidance. Based on the relationship between DNN design with numerical differential equations, which several researchers observed in recent years, we perform a fair comparison of residual design with higher-order perspectives. We show that the widely used DNN design strategy, constantly stacking a small design, could be easily improved, supported by solid theoretical knowledge and no extra parameters needed. We reorganize the residual design in higher-order ways, which is inspired by the observation that many effective networks could be interpreted as different numerical discretizations of differential equations. The design of ResNet follows a relatively simple scheme which is Euler forward; however, the situation is getting complicated rapidly while stacking. We suppose stacked ResNet is somehow equalled to a higher order scheme, then the current way of forwarding propagation might be relatively weak compared with a typical high-order method like Runge-Kutta. 
We propose a higher-order ResNet to verify the hypothesis on widely used CV benchmarks with sufficient experiments. We observe stable and noticeable gains in performance, and convergence and robustness also benefit.	翻訳日:2021-03-31 06:57:04 公開日:2021-03-28
# HiT:ビデオテキスト検索のためのモーメントコントラスト付き階層変換器 HiT: Hierarchical Transformer with Momentum Contrast for Video-Text Retrieval ( http://arxiv.org/abs/2103.15049v1 ) ライセンス: Link先を確認	Song Liu and Haoqi Fan and Shengsheng Qian and Yiru Chen and Wenkui Ding and Zhongyuan Wang	(参考訳) インターネット上のマルチメディアデータの爆発的増加に伴い,ビデオテキスト検索はホットな研究課題となっている。ビデオテキスト学習用トランスフォーマーは,有望な性能により注目を集めている。しかしながら,既存のクロスモーダルトランスフォーマーアプローチでは,(1)異なる層が異なる特徴を持つトランスフォーマーアーキテクチャの活用が制限されている。 2) エンドツーエンドトレーニング機構は, サンプル間の負の相互作用を制限する。本稿では,ビデオテキスト検索のための階層変換器 (HiT) という新しい手法を提案する。 HiTは特徴レベルと意味レベルで階層的相互モーダルコントラストマッチングを行い、多視点および包括的検索結果を得る。さらに,MoCoに触発されたクロスモーダル学習のためのMomentum Cross-modal Contrastを提案する。 3つの主要ビデオテキスト検索ベンチマークデータセットの実験結果は,本手法の利点を示している。 Video-Text Retrieval has been a hot research topic with the explosion of multimedia data on the Internet. Transformer for video-text learning has attracted increasing attention due to the promising performance. However, existing cross-modal transformer approaches typically suffer from two major limitations: 1) Limited exploitation of the transformer architecture where different layers have different feature characteristics. 2) End-to-end training mechanism limits negative interactions among samples in a mini-batch. In this paper, we propose a novel approach named Hierarchical Transformer (HiT) for video-text retrieval. HiT performs hierarchical cross-modal contrastive matching in feature-level and semantic-level to achieve multi-view and comprehensive retrieval results. Moreover, inspired by MoCo, we propose Momentum Cross-modal Contrast for cross-modal learning to enable large-scale negative interactions on-the-fly, which contributes to the generation of more precise and discriminative representations. Experimental results on three major Video-Text Retrieval benchmark datasets demonstrate the advantages of our methods.	翻訳日:2021-03-30 15:25:56 公開日:2021-03-28
# グラフ埋め込みによる一般ハイパーグラフのコミュニティ検出 Community Detection in General Hypergraph via Graph Embedding ( http://arxiv.org/abs/2103.15035v1 ) ライセンス: Link先を確認	Yaoming Zhen and Junhui Wang	(参考訳) 近年,ネットワークデータに注目が集まっており,従来のネットワークは2つの頂点間のペアワイズインタラクションに注目している。しかし、実際のネットワークデータはより複雑な構造を示し、頂点間の多方向相互作用は自然に発生する。本稿では,一般ハイパーグラフネットワーク,均一型,非一様型のコミュニティ構造を検出する手法を提案する。提案手法では,非一様超グラフを一様多重超グラフに拡張するためにヌル頂点を導入し,その多重超グラフを低次元ベクトル空間に埋め込み,同一コミュニティ内の頂点が互いに近接する。結果最適化タスクは、代替更新スキームによって効率的に取り組める。提案手法の漸近成分は,コミュニティ検出とハイパーグラフ推定の両面で確立され,いくつかの合成および実生活ハイパーグラフネットワークにおける数値実験でも支持されている。 Network data has attracted tremendous attention in recent years, and most conventional networks focus on pairwise interactions between two vertices. However, real-life network data may display more complex structures, and multi-way interactions among vertices arise naturally. In this article, we propose a novel method for detecting community structure in general hypergraph networks, uniform or non-uniform. The proposed method introduces a null vertex to augment a non-uniform hypergraph into a uniform multi-hypergraph, and then embeds the multi-hypergraph in a low-dimensional vector space such that vertices within the same community are close to each other. The resultant optimization task can be efficiently tackled by an alternative updating scheme. The asymptotic consistencies of the proposed method are established in terms of both community detection and hypergraph estimation, which are also supported by numerical experiments on some synthetic and real-life hypergraph networks.	翻訳日:2021-03-30 15:24:02 公開日:2021-03-28
# Deconfounded Score Method: Scoring DAGs with Dense Unobserved Confounding Deconfounded Score Method: Scoring DAGs with Dense Unobserved Confounding ( http://arxiv.org/abs/2103.15106v1 ) ライセンス: Link先を確認	Alexis Bellot, Mihaela van der Schaar	(参考訳) 観測されていない発見は因果発見の最大の課題の1つである。観測されていない変数が観測された変数の多くに潜在的に広範囲に影響を及ぼす場合、ほとんどの変数は他の部分集合に対して条件依存であるため、特に困難である。本稿では, 条件の不整合性を超えて, 観測データ分布に特徴的なフットプリントを残し, 突発的・因果的影響を解消できることを示す。この知見を用いて,観測変数間のスパース線形ガウス有向非巡回グラフをほぼ復元し,汎用解法や高次元問題へのスケールで実装可能な調整スコアに基づく因果発見アルゴリズムを提案する。さらに,因果回復を保証しようとする条件にもかかわらず,実際の性能はモデル仮定の大きな偏差に対して頑健であることが判明した。 Unobserved confounding is one of the greatest challenges for causal discovery. The case in which unobserved variables have a potentially widespread effect on many of the observed ones is particularly difficult because most pairs of variables are conditionally dependent given any other subset. In this paper, we show that beyond conditional independencies, unobserved confounding in this setting leaves a characteristic footprint in the observed data distribution that allows for disentangling spurious and causal effects. Using this insight, we demonstrate that a sparse linear Gaussian directed acyclic graph among observed variables may be recovered approximately and propose an adjusted score-based causal discovery algorithm that may be implemented with general-purpose solvers and scales to high-dimensional problems. We find, in addition, that despite the conditions we pose to guarantee causal recovery, performance in practice is robust to large deviations in model assumptions.	翻訳日:2021-03-30 15:23:47 公開日:2021-03-28
# マルチビュークラスタリングのための自己教師付き判別特徴学習 Self-supervised Discriminative Feature Learning for Multi-view Clustering ( http://arxiv.org/abs/2103.15069v1 ) ライセンス: Link先を確認	Jie Xu, Yazhou Ren, Huayi Tang, Zhimeng Yang, Lili Pan, Yang Yang, Xiaorong Pu	(参考訳) マルチビュークラスタリングは、複数のビューから補完情報を活用できるため、重要な研究トピックである。しかし、クラスタリング構造が不明な特定のビューによる負の影響を考慮する方法はほとんどなく、結果としてマルチビュークラスタリング性能が低下する。この欠点に対処するために,マルチビュークラスタリング(SDMVC)のための自己教師付き識別特徴学習を提案する。具体的には、ディープオートエンコーダを用いて各ビューの埋め込み機能を独立して学習する。マルチビュー補完情報を活用するために、すべてのビューの組み込み機能を結合してグローバル機能を形成することにより、一部のビューの不明瞭なクラスタリング構造による負の影響を克服する。自己教師方式で擬似ラベルを取得し、統一された目標分布を構築し、多視点識別特徴学習を行う。このプロセスでは、全ビューを監督するためにグローバル判別情報を掘り出し、より識別的な特徴を学習し、ターゲットディストリビューションを更新するために使用される。さらに、この統合されたターゲットディストリビューションは、SDMVCが一貫性のあるクラスタ割り当てを学習できるようにし、特徴の多様性を維持しながら、複数のビューのクラスタ化一貫性を達成する。様々なタイプのマルチビューデータセットの実験により、SDMVCが最先端のパフォーマンスを達成することが示された。 Multi-view clustering is an important research topic due to its capability to utilize complementary information from multiple views. However, there are few methods to consider the negative impact caused by certain views with unclear clustering structures, resulting in poor multi-view clustering performance. To address this drawback, we propose self-supervised discriminative feature learning for multi-view clustering (SDMVC). Concretely, deep autoencoders are applied to learn embedded features for each view independently. To leverage the multi-view complementary information, we concatenate all views' embedded features to form the global features, which can overcome the negative impact of some views' unclear clustering structures. In a self-supervised manner, pseudo-labels are obtained to build a unified target distribution to perform multi-view discriminative feature learning. During this process, global discriminative information can be mined to supervise all views to learn more discriminative features, which in turn are used to update the target distribution. 
Besides, this unified target distribution can make SDMVC learn consistent cluster assignments, which accomplishes the clustering consistency of multiple views while preserving their features' diversity. Experiments on various types of multi-view datasets show that SDMVC achieves state-of-the-art performance.	翻訳日:2021-03-30 15:22:14 公開日:2021-03-28
# 分散平滑化による自己回帰モデリングの改善 Improved Autoregressive Modeling with Distribution Smoothing ( http://arxiv.org/abs/2103.15089v1 ) ライセンス: Link先を確認	Chenlin Meng, Jiaming Song, Yang Song, Shengjia Zhao, and Stefano Ermon	(参考訳) 自己回帰モデルは画像圧縮に優れるが、そのサンプル品質はしばしば欠落している。現実的ではないものの、生成された画像は、しばしばモデルに従って高い確率を持ち、逆の例の場合に似ている。敵対的防御法の成功に触発されて,ランダム化平滑化を自己回帰的生成モデルに取り入れた。まず、まずスムーズなデータ分布をモデル化し、次にスムーズな処理を反転させて元のデータ分布を復元する。この手順は、合成データセットと実世界の画像データセットの既存の自己回帰モデルのサンプル品質を劇的に改善し、合成データセットの競合可能性を得る。 While autoregressive models excel at image compression, their sample quality is often lacking. Although not realistic, generated images often have high likelihood according to the model, resembling the case of adversarial examples. Inspired by a successful adversarial defense method, we incorporate randomized smoothing into autoregressive generative modeling. We first model a smoothed version of the data distribution, and then reverse the smoothing process to recover the original data distribution. This procedure drastically improves the sample quality of existing autoregressive models on several synthetic and real-world image datasets while obtaining competitive likelihoods on synthetic datasets.	翻訳日:2021-03-30 15:21:57 公開日:2021-03-28
# 確率的分類モデルの信頼度評価のためのエントロピー法 Entropy methods for the confidence assessment of probabilistic classification models ( http://arxiv.org/abs/2103.15157v1 ) ライセンス: Link先を確認	Gabriele N. Tornetta	(参考訳) 多くの分類モデルは予測の結果として確率分布を生成する。この情報は一般に、最も高い関連する確率で単一のクラスに圧縮される。本稿では、このプロセスで破棄された情報の一部は、実際にモデルの良さ、特に各予測の信頼性を更に評価するために利用できると論じる。本稿では,本論文で提示された概念の応用として,(ベルヌーリ)ナイーブベイズ生成モデルに対する補完的アプローチで観測される信頼度低下現象の理論的説明を提案する。 Many classification models produce a probability distribution as the outcome of a prediction. This information is generally compressed down to the single class with the highest associated probability. In this paper, we argue that part of the information that is discarded in this process can be in fact used to further evaluate the goodness of models, and in particular the confidence with which each prediction is made. As an application of the ideas presented in this paper, we provide a theoretical explanation of a confidence degradation phenomenon observed in the complement approach to the (Bernoulli) Naive Bayes generative model.	翻訳日:2021-03-30 15:16:16 公開日:2021-03-28
# InsertGNN: グラフニューラルネットワークはTOEFL文挿入問題において人間より優れているか? InsertGNN: Can Graph Neural Networks Outperform Humans in TOEFL Sentence Insertion Problem? ( http://arxiv.org/abs/2103.15066v1 ) ライセンス: Link先を確認	Fang Wu and Xiang Bai	(参考訳) 文挿入は繊細だが基本的なNLP問題である。文順序付け、テキストコヒーレンス、質問応答(QA)の現在のアプローチは、その解決には適さない。本稿では,この問題をグラフとして表現し,グラフニューラルネットワーク(GNN)を用いて文間の関係を学習するシンプルなモデルであるInsertGNNを提案する。また、近隣の文の局所的な相互作用を考慮できる地域情報とグローバル情報の両方で教師されている。我々の知る限りでは、文挿入に教師付きグラフ構造化モデルを適用する試みとしてはこれが初めてである。本手法を新たに収集したTOEFLデータセットで評価し,クロスドメイン学習を用いた大規模arXivデータセットの有効性をさらに検証した。実験の結果,InsertGNNは教師なしテキストコヒーレンス手法,トポロジカル文順序付け手法,QAアーキテクチャよりも優れていた。具体的には、平均的な人間のテストスコアに匹敵する70%の精度を達成する。 Sentence insertion is a delicate but fundamental NLP problem. Current approaches in sentence ordering, text coherence, and question answering (QA) are neither suitable nor good at solving it. In this paper, we propose InsertGNN, a simple yet effective model that represents the problem as a graph and adopts a graph neural network (GNN) to learn the connection between sentences. It is also supervised by both local and global information so that the local interactions of neighboring sentences can be considered. To the best of our knowledge, this is the first recorded attempt to apply a supervised graph-structured model in sentence insertion. We evaluate our method on our newly collected TOEFL dataset and further verify its effectiveness on the larger arXiv dataset using cross-domain learning. The experiments show that InsertGNN outperforms the unsupervised text coherence method, the topological sentence ordering approach, and the QA architecture. Specifically, it achieves an accuracy of 70%, rivaling the average human test scores.	翻訳日:2021-03-30 15:12:58 公開日:2021-03-28
# 汎用知能の一般理論 : プラグマティック・パターン主義の視点から The General Theory of General Intelligence: A Pragmatic Patternist Perspective ( http://arxiv.org/abs/2103.15100v1 ) ライセンス: Link先を確認	Ben Goertzel	(参考訳) 一連の書籍や論文で表現され、一連の実用および研究プロトタイプソフトウェアシステムのガイドに使用される、人工的および自然的汎用知性の理論的基礎に関する多年にわたる調査が、適度なレベルでレビューされている。このレビューでは、基礎となる哲学(心のパターン哲学、基礎現象論と論理オントロジー)、知性の概念の形式化、そしてこれらの形式化と哲学によって部分的に駆動されるagiシステムのための高レベルアーキテクチャの提案などを取り上げている。論理的推論、プログラム学習、クラスタリング、注意割当てといった特定の認知過程の実装は、このハイレベルアーキテクチャの文脈と言語において、共通の(例えば)重要性と同様に考慮される。タイプ付きメタグラフベース) 様々なプロセス間の「認知シナジー」を可能にする知識表現。人間のような認知アーキテクチャの特質は、これらの一般的な原則の表象として提示され、機械意識と機械倫理の重要な側面もこの文脈で扱われる。 OpenCog Hyperonのようなフレームワークにおける高度なAGIの実践的な実装の教訓を簡潔に検討する。 A multi-decade exploration into the theoretical foundations of artificial and natural general intelligence, which has been expressed in a series of books and papers and used to guide a series of practical and research-prototype software systems, is reviewed at a moderate level of detail. The review covers underlying philosophies (patternist philosophy of mind, foundational phenomenological and logical ontology), formalizations of the concept of intelligence, and a proposed high level architecture for AGI systems partly driven by these formalizations and philosophies. The implementation of specific cognitive processes such as logical reasoning, program learning, clustering and attention allocation in the context and language of this high level architecture is considered, as is the importance of a common (e.g. typed metagraph based) knowledge representation for enabling "cognitive synergy" between the various processes. The specifics of human-like cognitive architecture are presented as manifestations of these general principles, and key aspects of machine consciousness and machine ethics are also treated in this context. Lessons for practical implementation of advanced AGI in frameworks such as OpenCog Hyperon are briefly considered.	翻訳日:2021-03-30 15:11:29 公開日:2021-03-28
# LSG-CPD:点雲登録のための局所表面形状のコヒーレント点ドリフト LSG-CPD: Coherent Point Drift with Local Surface Geometry for Point Cloud Registration ( http://arxiv.org/abs/2103.15039v1 ) ライセンス: Link先を確認	Weixiao Liu, Hongtao Wu, Gregory Chirikjian	(参考訳) 確率的ポイントクラウド登録手法は,その堅牢性から人気が高まっている。しかし、局所的な表面幾何情報を含む反復的最接近点(ICP)の点対平面変種とは異なり、ほとんどの確率的手法(例えばコヒーレント点ドリフト(CPD))はそのような情報を無視し、等方的ガウス共分散を持つガウス混合モデル(GMMs)を構築する。この結果、球状GMM成分は2つの点雲の間の点間距離のみをペナルティ化する。本稿では,剛点雲登録のための局所表面形状(LSG-CPD)を用いたCPD法を提案する。本手法は,局所表面の平坦度に基づいて,点対点のペナルティに加えて異なるレベルの点対平面のペナルティを適応的に付加する。これにより、異方性共分散を持つGMM成分が得られる。我々は,最大極大推定(MLE)問題として点雲登録を定式化し,期待最大化(EM)アルゴリズムを用いて解いた。 Eステップでは、計算を単純な行列操作に再キャストし、GPU上で効率的に計算できることを実証する。 M ステップでは、行列リー群上で制約のない最適化を行い、登録の剛性変換を効率的に更新する。提案手法は、レンジスキャナ、RGBDカメラ、LiDARでキャプチャした各種データセットの精度とロバスト性の観点から最先端アルゴリズムを上回る。また、CPDの現代的な実装よりもかなり高速である。コードはリリースされます。 Probabilistic point cloud registration methods are becoming more popular because of their robustness. However, unlike point-to-plane variants of iterative closest point (ICP) which incorporate local surface geometric information such as surface normals, most probabilistic methods (e.g., coherent point drift (CPD)) ignore such information and build Gaussian mixture models (GMMs) with isotropic Gaussian covariances. This results in sphere-like GMM components which only penalize the point-to-point distance between the two point clouds. In this paper, we propose a novel method called CPD with Local Surface Geometry (LSG-CPD) for rigid point cloud registration. Our method adaptively adds different levels of point-to-plane penalization on top of the point-to-point penalization based on the flatness of the local surface. This results in GMM components with anisotropic covariances. We formulate point cloud registration as a maximum likelihood estimation (MLE) problem and solve it with the Expectation-Maximization (EM) algorithm. In the E step, we demonstrate that the computation can be recast into simple matrix manipulations and efficiently computed on a GPU. 
In the M step, we perform an unconstrained optimization on a matrix Lie group to efficiently update the rigid transformation of the registration. The proposed method outperforms state-of-the-art algorithms in terms of accuracy and robustness on various datasets captured with range scanners, RGBD cameras, and LiDARs. Also, it is significantly faster than modern implementations of CPD. The code will be released.	翻訳日:2021-03-30 15:04:47 公開日:2021-03-28
# 長期音声認識における蒸留仮想例 Distilling Virtual Examples for Long-tailed Recognition ( http://arxiv.org/abs/2103.15042v1 ) ライセンス: Link先を確認	Yin-Yin He, Jianxin Wu, Xiu-Shen Wei	(参考訳) 本稿では,Distill the Virtual Examples(DiVE)法を提案することにより,知識蒸留の観点からの長期視覚認識問題に取り組む。具体的には,教師モデルの予測を仮想例として扱うことで,これらの仮想例からの蒸留が一定の制約下でラベル分布学習と等価であることを示す。仮想的なサンプル分布が元の入力分布よりも平坦になると、表現不足のテールクラスは大幅に改善され、ロングテール認識に欠かせないことが示される。提案手法では,仮想サンプル分布のフラット化を明示的に調整できる。大規模なiNaturalistを含む3つのベンチマークデータセットに対する大規模な実験は、提案したDiVEメソッドが最先端の手法を大幅に上回ることを正当化している。さらに、仮想的なサンプル解釈を検証し、長い尾問題に対するDiVEの調整済み設計の有効性を実証する。 In this paper, we tackle the long-tailed visual recognition problem from the knowledge distillation perspective by proposing a Distill the Virtual Examples (DiVE) method. Specifically, by treating the predictions of a teacher model as virtual examples, we prove that distilling from these virtual examples is equivalent to label distribution learning under certain constraints. We show that when the virtual example distribution becomes flatter than the original input distribution, the under-represented tail classes will receive significant improvements, which is crucial in long-tailed recognition. The proposed DiVE method can explicitly tune the virtual example distribution to become flat. Extensive experiments on three benchmark datasets, including the large-scale iNaturalist ones, justify that the proposed DiVE method can significantly outperform state-of-the-art methods. Furthermore, additional analyses and experiments verify the virtual example interpretation, and demonstrate the effectiveness of tailored designs in DiVE for long-tailed problems.	翻訳日:2021-03-30 15:04:20 公開日:2021-03-28
# ノイズのあるラベルからの学習における友と敵 Friends and Foes in Learning from Noisy Labels ( http://arxiv.org/abs/2103.15055v1 ) ライセンス: Link先を確認	Yifan Zhou, Yifan Ge, Jianxin Wu	(参考訳) ノイズの多いラベルを持つ例から学ぶことが近年注目を集めている。しかし,本論文では,CIFARに基づくデータセットと文献で使用される精度評価基準が,この文脈では不適切であることを示す。本稿では,この分野における適切な研究と評価を促進するために,代替有効な評価指標と新しいデータセットを提案する。そして, 従来の手法から, ノイズのあるラベル付きサンプルからの深層学習に有益あるいは有害な技術要素として友人や敵を同定し, 自己教師付き学習, 新たなウォームアップ戦略, インスタンスフィルタリング, ラベル修正など, 友人のカテゴリからの技術的構成要素を改善し, 組み合わせる。得られたF&F法は,提案したnCIFARデータセットと実世界のClothing1Mデータセットの既存手法を著しく上回っている。 Learning from examples with noisy labels has attracted increasing attention recently. However, this paper shows that the commonly used CIFAR-based datasets and the accuracy evaluation metric used in the literature are both inappropriate in this context. An alternative valid evaluation metric and new datasets are proposed in this paper to promote proper research and evaluation in this area. Then, friends and foes are identified from existing methods as technical components that are either beneficial or detrimental to deep learning from noisy labeled examples, respectively, and this paper improves and combines technical components from the friends category, including self-supervised learning, a new warmup strategy, instance filtering and label correction. The resulting F&F method significantly outperforms existing methods on the proposed nCIFAR datasets and the real-world Clothing1M dataset.	翻訳日:2021-03-30 15:04:06 公開日:2021-03-28
# warpへの注意:多変量時系列のためのディープメトリック学習 Attention to Warp: Deep Metric Learning for Multivariate Time Series ( http://arxiv.org/abs/2103.15074v1 ) ライセンス: Link先を確認	Shinnosuke Matsuo, Xiaomeng Wu, Gantugs Atarsaikhan, Akisato Kimura, Kunio Kashino, Brian Kenji Iwana, Seiichi Uchida	(参考訳) ディープ時系列計量学習は、非マッチングシーケンスを識別する際の時間的不変性と非線形歪みとのトレードオフが難しいため、困難である。本稿では,ロバストだが判別可能な時系列分類と検証のためのニューラルネットワークに基づく新しい手法を提案する。このアプローチは、より大きく適応的な時間的不変性に対して、パラメータ化された注意モデルを時間歪みに適応させる。局所的だけでなく大きな大域的歪みにも頑健であり、一調性、連続性、境界条件を満たさない対でさえもうまく同定できる。このモデルの学習は動的時間ワープによってさらに誘導され、安定したトレーニングと高い差別力のための時間的制約が課される。ウォーピングによってクラス間のバリエーションを増強し、類似しているが異なるクラスを効果的に区別することができる。提案手法は,Unipenデータセット上のシングルレター手書き分類において有望な動作を確認した後,オンライン署名検証フレームワークと組み合わせることで,従来の非パラメトリック・ディープモデルよりも優れていることを示す。 Deep time series metric learning is challenging due to the difficult trade-off between temporal invariance to nonlinear distortion and discriminative power in identifying non-matching sequences. This paper proposes a novel neural network-based approach for robust yet discriminative time series classification and verification. This approach adapts a parameterized attention model to time warping for greater and more adaptive temporal invariance. It is robust against not only local but also large global distortions, so that even matching pairs that do not satisfy the monotonicity, continuity, and boundary conditions can still be successfully identified. Learning of this model is further guided by dynamic time warping to impose temporal constraints for stabilized training and higher discriminative power. It can learn to augment the inter-class variation through warping, so that similar but different classes can be effectively distinguished. We experimentally demonstrate the superiority of the proposed approach over previous non-parametric and deep models by combining it with a deep online signature verification framework, after confirming its promising behavior in single-letter handwriting classification on the Unipen dataset.	
翻訳日:2021-03-30 15:03:51 公開日:2021-03-28
# Picasso:3DメッシュによるディープラーニングのためのCUDAベースのライブラリ Picasso: A CUDA-based Library for Deep Learning over 3D Meshes ( http://arxiv.org/abs/2103.15076v1 ) ライセンス: Link先を確認	Huan Lei, Naveed Akhtar, Ajmal Mian	(参考訳) 複雑な現実世界の3Dメッシュ上でのディープラーニングのための新しいモジュールで構成されるCUDAベースのライブラリであるPicassoを紹介する。階層型ニューラルネットワークアーキテクチャは、高速メッシュデシミテーションの必要性を示すマルチスケールの特徴抽出に有効であることが証明されている。しかし、既存の手法はマルチレゾリューションメッシュを得るためにCPUベースの実装に依存している。我々は,ネットワーク解像度の低減を図るために,GPU加速メッシュデシメーションを設計する。プールおよびアンプールモジュールは、デシメーション時に収集された頂点クラスタ上で定義される。メッシュ上の特徴学習には、facet2vertex、vertex2facet、facet2facetという3種類の新しい畳み込みが含まれている。したがって、メッシュを従来の方法のようにエッジを持つ空間グラフではなく、頂点と面からなる幾何学的構造として扱う。 Picassoはまた、メッシュサンプリング(頂点密度)に対する堅牢性のためのファジィ機構をフィルタに組み込んでいる。これは、ファゼット2頂点の畳み込みのファジィ係数を定義するためにガウス混合を利用し、残りの2つの畳み込みの係数を定義するためにバリ中心補間を行う。本稿では,S3DIS上での競合セグメンテーション結果を用いた提案モジュールの有効性を示す。ライブラリはhttps://github.com/hlei-ziyan/picassoで公開される。 We present Picasso, a CUDA-based library comprising novel modules for deep learning over complex real-world 3D meshes. Hierarchical neural architectures have proved effective in multi-scale feature extraction which signifies the need for fast mesh decimation. However, existing methods rely on CPU-based implementations to obtain multi-resolution meshes. We design GPU-accelerated mesh decimation to facilitate network resolution reduction efficiently on-the-fly. Pooling and unpooling modules are defined on the vertex clusters gathered during decimation. For feature learning over meshes, Picasso contains three types of novel convolutions namely, facet2vertex, vertex2facet, and facet2facet convolution. Hence, it treats a mesh as a geometric structure comprising vertices and facets, rather than a spatial graph with edges as previous methods do. Picasso also incorporates a fuzzy mechanism in its filters for robustness to mesh sampling (vertex density). It exploits Gaussian mixtures to define fuzzy coefficients for the facet2vertex convolution, and barycentric interpolation to define the coefficients for the remaining two convolutions. 
In this release, we demonstrate the effectiveness of the proposed modules with competitive segmentation results on S3DIS. The library will be made public through https://github.com/hlei-ziyan/Picasso.	翻訳日:2021-03-30 15:03:33 公開日:2021-03-28
# オープンセット認識のための学習プレースホルダ Learning Placeholders for Open-Set Recognition ( http://arxiv.org/abs/2103.15086v1 ) ライセンス: Link先を確認	Da-Wei Zhou, Han-Jia Ye, De-Chuan Zhan	(参考訳) 従来の分類器はクローズドセット設定でデプロイされ、トレーニングクラスとテストクラスは同じセットに属する。しかし、現実世界のアプリケーションはおそらく未知のカテゴリの入力に直面し、モデルはそれらを既知のカテゴリとして認識する。このような状況下では、既知のクラスにおける分類性能を維持し、未知のクラスを拒否するオープンセット認識が提案されている。クローズドセットモデルは、既知のクラスインスタンスよりも自信過剰な予測を行うため、オープンセット環境に拡張する際に、カテゴリ間のキャリブレーションとしきい値化が不可欠な問題となる。そこで我々は,データと分類器の両方にプレースホルダを割り当てることで未知のクラスに備えるオープンセット認識 (Proser) のためのプレースホルダの学習を提案した。具体的には、学習データプレースホルダはオープンセットのクラスデータを予測し、クローズドセットのトレーニングをオープンセットのトレーニングに変換する。さらに,ターゲットクラスと非ターゲットクラスの不変情報を学習するために,分類器のプレースホルダーを,既知と未知のクラス固有の境界として予約する。提案するProserは,多様体混合により新しいクラスを効率的に生成し,訓練中に予約されたオープンセット分類器の値を適応的に設定する。提案手法の有効性を検証した各種データセットの実験を行った。 Traditional classifiers are deployed under a closed-set setting, with both training and test classes belonging to the same set. However, real-world applications probably face the input of unknown categories, and the model will recognize them as known ones. Under such circumstances, open-set recognition is proposed to maintain classification performance on known classes and reject unknowns. The closed-set models make overconfident predictions over familiar known class instances, so that calibration and thresholding across categories become essential issues when extending to an open-set environment. To this end, we propose to learn PlaceholdeRs for Open-SEt Recognition (Proser), which prepares for the unknown classes by allocating placeholders for both data and classifier. In detail, learning data placeholders tries to anticipate open-set class data, thus transforming closed-set training into open-set training. Besides, to learn the invariant information between target and non-target classes, we reserve classifier placeholders as the class-specific boundary between known and unknown. 
The proposed Proser efficiently generates novel classes by manifold mixup, and adaptively sets the value of the reserved open-set classifier during training. Experiments on various datasets validate the effectiveness of our proposed method.	翻訳日:2021-03-30 15:03:14 公開日:2021-03-28
# ACSNet:弱教師付き時間的行動局所化のための行動コンテキスト分離ネットワーク ACSNet: Action-Context Separation Network for Weakly Supervised Temporal Action Localization ( http://arxiv.org/abs/2103.15088v1 ) ライセンス: Link先を確認	Ziyi Liu, Le Wang, Qilin Zhang, Wei Tang, Junsong Yuan, Nanning Zheng, Gang Hua	(参考訳) Weakly-supervised Temporal Action Localization (WS-TAL) の目的は、ビデオレベルの監視のみで、トリミングされていないビデオ内のすべてのアクションインスタンスをローカライズすることである。トレーニング中にフレームレベルのアノテーションがないため、現在のWS-TALメソッドはビデオレベルの分類タスクに寄与する前景のスニペットやフレームをローカライズするアテンションメカニズムに依存している。この戦略は、ローカライゼーション結果においてコンテキストを実際のアクションと混同することが多い。アクションとコンテキストの分離は、正確なWS-TALにとって重要な問題ですが、非常に困難で、文献でほとんど無視されています。本稿では,アクションローカライズのためのコンテキストを明示的に考慮した行動コンテキスト分離ネットワーク(ACSNet)を提案する。 2つのブランチ(すなわちフォアグラウンドバックグラウンドブランチとアクションコンテキストブランチ)で構成されている。前景背景ブランチは、まずビデオ全体の背景と前景を区別する一方、Action-Contextブランチは、その前景をアクションとコンテキストとして分離する。我々はビデオスニペットを2つの潜伏成分(正の成分と負の成分)に関連付け、それらの組み合わせは前景、アクション、コンテキストを効果的に特徴付けることができる。さらに,アクション・コンテキスト分離の学習を容易にするために,補助コンテキストカテゴリを持つ拡張ラベルを導入する。 THUMOS14とActivityNet v1.2/v1.3データセットの実験では、ACSNetが既存のWS-TALメソッドよりも大きなマージンで優れていることが示されている。 The objective of Weakly-supervised Temporal Action Localization (WS-TAL) is to localize all action instances in an untrimmed video with only video-level supervision. Due to the lack of frame-level annotations during training, current WS-TAL methods rely on attention mechanisms to localize the foreground snippets or frames that contribute to the video-level classification task. This strategy frequently confuses context with the actual action in the localization result. Separating action and context is a core problem for precise WS-TAL, but it is very challenging and has been largely ignored in the literature. In this paper, we introduce an Action-Context Separation Network (ACSNet) that explicitly takes into account context for accurate action localization. It consists of two branches (i.e., the Foreground-Background branch and the Action-Context branch). 
The Foreground-Background branch first distinguishes foreground from background within the entire video while the Action-Context branch further separates the foreground as action and context. We associate video snippets with two latent components (i.e., a positive component and a negative component), and their different combinations can effectively characterize foreground, action and context. Furthermore, we introduce extended labels with auxiliary context categories to facilitate the learning of action-context separation. Experiments on THUMOS14 and ActivityNet v1.2/v1.3 datasets demonstrate that ACSNet outperforms existing state-of-the-art WS-TAL methods by a large margin.	翻訳日:2021-03-30 15:02:53 公開日:2021-03-28
# 高速かつ効果的な単一多重モデル畳み込みニューラルネットワークによる単一物体追跡 Single Object Tracking through a Fast and Effective Single-Multiple Model Convolutional Neural Network ( http://arxiv.org/abs/2103.15105v1 ) ライセンス: Link先を確認	Faraz Lotfi, Hamid D. Taghirad	(参考訳) 類似したオブジェクトが同じ領域に存在する場合、オブジェクト追跡は特に重要になる。近年のSOTA(State-of-the-art)アプローチは,トラッカーの性能を大幅に低下させる領域において,ターゲットを他の物体と区別するために,重構造と整合するネットワークを用いて提案されている。また、複数の候補が考慮され、時間を要する各フレームの関心領域に対象オブジェクトをローカライズするために処理される。本稿では,従来のアプローチとは対照的に,同一領域の類似したオブジェクトと区別するためにテンプレートを考慮しながら,単一のショットでオブジェクトの位置を識別することが可能な,特別なアーキテクチャを提案する。まず第一に、ターゲットサイズが2倍のオブジェクトを含むウィンドウを考える。このウィンドウは完全な畳み込みニューラルネットワーク(CNN)に入力され、各フレームのマトリックスの形式で関心領域(RoI)を抽出する。はじめに、ターゲットのテンプレートもCNNへの入力として取り込まれる。このRoI行列を考慮すると、トラッカーの次の動きは単純かつ高速な方法に基づいて決定される。さらに、このマトリックスは、時間とともに変化するときに重要なオブジェクトサイズを推定するのに役立ちます。マッチングネットワークがないにもかかわらず、提示されたトラッカーはSOTAと比較的困難な状況下で動作し、それに比べて超高速である(1080 Tiで最大120 FPS)。この主張を調べるため、GOT-10kデータセットで比較研究を行った。その結果,提案手法の課題遂行における優れた性能が得られた。 Object tracking becomes critical especially when similar objects are present in the same area. Recent state-of-the-art (SOTA) approaches rely on a heavy matching network to distinguish the target from other objects in the area, which drastically degrades the speed of the tracker. Besides, several candidates are considered and processed to localize the intended object in a region of interest for each frame, which is time-consuming. In this article, a special architecture is proposed with which, in contrast to previous approaches, it is possible to identify the object location in a single shot while taking its template into account to distinguish it from similar objects in the same area. In brief, a window containing the object with twice the target size is first considered. This window is then fed into a fully convolutional neural network (CNN) to extract a region of interest (RoI) in the form of a matrix for each frame. 
At the beginning, a template of the target is also taken as an input to the CNN. Considering this RoI matrix, the next movement of the tracker is determined based on a simple and fast method. Moreover, this matrix helps to estimate the object size, which is crucial when it changes over time. Despite the absence of a matching network, the presented tracker performs comparably with the SOTA in challenging situations while being much faster (up to 120 FPS on a 1080 Ti). To investigate this claim, a comparison study is carried out on the GOT-10k dataset. Results reveal the outstanding performance of the proposed method in fulfilling the task.	翻訳日:2021-03-30 15:02:28 公開日:2021-03-28
# 血縁検証のためのメタマイニング判別サンプル Meta-Mining Discriminative Samples for Kinship Verification ( http://arxiv.org/abs/2103.15108v1 ) ライセンス: Link先を確認	Wanhua Li, Shiwei Wang, Jiwen Lu, Jianjiang Feng, Jie Zhou	(参考訳) 血縁検証は、与えられた顔画像の親族関係が存在するかどうかを調べることを目的としている。 血縁検証データベースは、アンバランスなデータで生まれます。 N 個の正の血縁ペアを持つデータベースに対して、自然に N(N-1) 個の負の対を得る。限定された正の対を完全に活用し、血縁検証のための十分な負のサンプルから識別情報をマイニングする方法は、未解決の問題である。この問題に対処するため,本論文では識別サンプルメタマイニング(DSMM)手法を提案する。固定的な負のペアを持つバランスのとれたデータセットを構築する既存の方法とは異なり、全ての可能なペアを活用し、データから判別情報を自動学習する。具体的には、各イテレーションでバランスの取れない列車バッチとバランスのとれたメタ列車バッチをサンプリングします。次に、バランスのとれたメタトレーニングバッチでメタ勾配を持つメタマイナを学習します。最終的に、バランスのとれない列車バッチのサンプルは、学習したメタマイナによって再重み付けされ、血縁検証モデルが最適化される。 KinFaceW-I, KinFaceW-II, TSKinFace, Cornell Kinshipデータセットを用いた実験結果から, 提案手法の有効性が示された。 Kinship verification aims to find out whether there is a kin relation for a given pair of facial images. Kinship verification databases are born with unbalanced data. For a database with N positive kinship pairs, we naturally obtain N(N-1) negative pairs. How to fully utilize the limited positive pairs and mine discriminative information from sufficient negative samples for kinship verification remains an open issue. To address this problem, we propose a Discriminative Sample Meta-Mining (DSMM) approach in this paper. Unlike existing methods that usually construct a balanced dataset with fixed negative pairs, we propose to utilize all possible pairs and automatically learn discriminative information from data. Specifically, we sample an unbalanced train batch and a balanced meta-train batch for each iteration. Then we learn a meta-miner with the meta-gradient on the balanced meta-train batch. In the end, the samples in the unbalanced train batch are re-weighted by the learned meta-miner to optimize the kinship models. Experimental results on the widely used KinFaceW-I, KinFaceW-II, TSKinFace, and Cornell Kinship datasets demonstrate the effectiveness of the proposed approach.	翻訳日:2021-03-30 15:02:00 公開日:2021-03-28
# 野生における顔面表情認識のためのインポンダラスネット Imponderous Net for Facial Expression Recognition in the Wild ( http://arxiv.org/abs/2103.15136v1 ) ライセンス: Link先を確認	Darshan Gera and S. Balasubramanian	(参考訳) ディープラーニング (DL) のルネサンス以来, 顔表情認識 (FER) が注目され, 性能が継続的に向上している。パフォーマンスと引き換えに、新たな課題が生まれました。現代のFERシステムは、オクルージョンやポーズのバリエーションを含む、制御されていない条件下で撮影された顔画像("in-the-wild scenario"とも呼ばれる)を扱う。彼らは、転送学習、アテンション機構、局所的グローバルコンテキスト抽出器といったさまざまなコンポーネントを備えたディープネットワークを使用して、そのような条件をうまく処理します。しかし、これらのディープネットワークは多数のパラメータで非常に複雑であり、実際のシナリオにデプロイするには適さない。in-the-wildシナリオ下で、FER上で非常に優れたパフォーマンスを示す軽量ネットワークを構築することは可能か? 本研究では,このようなネットワークを体系的に構築し,Imponderous Netと呼ぶ。我々は、先のディープネットワークのコンポーネントをFERに活用し、分析し、慎重に選択し、Imponderous Netに到達させる。我々のインポンダラスネットは1.45Mパラメータしか持たない低カロリーネットであり、最先端(SOTA)アーキテクチャの約50倍小さい。さらに、推論の間、Intel i7 CPUでリアルタイムレート40フレーム/秒(fps)で処理することができる。カロリーは低いが、その性能は依然としてパワー満載であり、他の軽量アーキテクチャや高容量アーキテクチャを圧倒している。具体的には、in-the-wildデータセット RAFDB, FERPlus, AffectNet でそれぞれ 87.09%, 88.17%, 62.06% の精度を報告している。また、オクルージョン下では優れたロバスト性を示し、文献の他の軽量建築と比べてポーズのバリエーションも示している。 Since the renaissance of deep learning (DL), facial expression recognition (FER) has received a lot of interest, with continual improvement in the performance. Hand-in-hand with performance, new challenges have come up. Modern FER systems deal with face images captured under uncontrolled conditions (also called in-the-wild scenario) including occlusions and pose variations. They successfully handle such conditions using deep networks that come with various components like transfer learning, attention mechanism and local-global context extractor. However, these deep networks are highly complex with a large number of parameters, making them unfit to be deployed in real scenarios. Is it possible to build a light-weight network that can still show significantly good performance on FER under in-the-wild scenario? In this work, we methodically build such a network and call it Imponderous Net. 
We leverage the aforementioned components of deep networks for FER, and analyse, carefully choose and fit them to arrive at Imponderous Net. Our Imponderous Net is a low calorie net with only 1.45M parameters, which is almost 50x less than that of a state-of-the-art (SOTA) architecture. Further, during inference, it can process at the real-time rate of 40 frames per second (fps) on an Intel i7 CPU. Though it is low calorie, it is still power packed in its performance, overpowering other light-weight architectures and even a few high capacity architectures. Specifically, Imponderous Net reports 87.09%, 88.17% and 62.06% accuracies on in-the-wild datasets RAFDB, FERPlus and AffectNet respectively. It also exhibits superior robustness under occlusions and pose variations in comparison to other light-weight architectures from the literature.	翻訳日:2021-03-30 15:01:42 公開日:2021-03-28
# TransCenter:マルチオブジェクトトラッキングのためのDense Queries付きトランスフォーマー TransCenter: Transformers with Dense Queries for Multiple-Object Tracking ( http://arxiv.org/abs/2103.15145v1 ) ライセンス: Link先を確認	Yihong Xu, Yutong Ban, Guillaume Delorme, Chuang Gan, Daniela Rus, Xavier Alameda-Pineda	(参考訳) トランスフォーマーネットワークは、導入以来、さまざまなタスクで非常に強力であることが証明されている。コンピュータビジョンは例外ではなく、近年ではトランスフォーマーの使用が視覚コミュニティで非常に人気になっている。この波にもかかわらず、MOT(Multiple-object Tracking)はトランスフォーマーと何らかの非互換性を示す。標準表現 -- 境界ボックス -- はMOTの学習トランスフォーマーに適応していない、と我々は主張する。最近の研究から着想を得たTransCenterは,複数のターゲットの中心を追跡するトランスフォーマーベースのアーキテクチャである。本研究では,2重デコーダネットワークにおいて,ターゲットの中心のヒートマップをロバストに推定し,時間を通じてそれらを関連付ける手法を提案する。 TransCenterは、MOT17とMOT20の両方において、現在の最先端のマルチオブジェクトトラッキングよりも優れている。本研究は,より単純な代替案と比較して,提案アーキテクチャの利点を実証するものである。コードは公開される予定だ。 Transformer networks have proven extremely powerful for a wide variety of tasks since they were introduced. Computer vision is not an exception, as the use of transformers has become very popular in the vision community in recent years. Despite this wave, multiple-object tracking (MOT) exhibits for now some sort of incompatibility with transformers. We argue that the standard representation -- bounding boxes -- is not adapted to learning transformers for MOT. Inspired by recent research, we propose TransCenter, the first transformer-based architecture for tracking the centers of multiple targets. Methodologically, we propose the use of dense queries in a double-decoder network, to be able to robustly infer the heatmap of targets' centers and associate them through time. TransCenter outperforms the current state-of-the-art in multiple-object tracking, both in MOT17 and MOT20. Our ablation study demonstrates the advantage in the proposed architecture compared to more naive alternatives. The code will be made publicly available.	翻訳日:2021-03-30 15:01:11 公開日:2021-03-28
# ビジュアルギャップのブリッジ:ワイドレンジ画像のブレンド Bridging the Visual Gap: Wide-Range Image Blending ( http://arxiv.org/abs/2103.15149v1 ) ライセンス: Link先を確認	Chia-Ni Lu, Ya-Chu Chang and Wei-Chen Chiu	(参考訳) 本稿では,2つの異なる入力画像をパノラマにスムーズに融合し,その中間領域に新たな画像コンテンツを生成することを目的とした,画像処理における新たな問題シナリオである広域画像ブレンディングを提案する。このような問題は、画像インペインティング、画像アウトペインティング、画像ブレンドといったトピックと密接に関連しているが、これらのトピックからのアプローチは、いずれも簡単に対処できない。広帯域画像ブレンディングを実現するための効果的な深層学習モデルを導入し、新しい双方向コンテンツトランスファーモジュールを提案し、リカレントニューラルネットワークを介して中間領域の特徴表現の条件付き予測を行う。ブレンディング時の空間的・意味的整合性を確保することに加えて,提案手法では,視覚的パノラマの質を向上させるために,文脈的注意機構と対角学習方式も採用している。提案手法は,広視野画像ブレンディングのための視覚的に魅力的な結果を生成するだけでなく,最先端画像インパインティングおよびアウトパインティングアプローチ上に構築された複数のベースラインに対して優れた性能を提供することができることを実験的に実証した。 In this paper we propose a new problem scenario in image processing, wide-range image blending, which aims to smoothly merge two different input photos into a panorama by generating novel image content for the intermediate region between them. Although such problem is closely related to the topics of image inpainting, image outpainting, and image blending, none of the approaches from these topics is able to easily address it. We introduce an effective deep-learning model to realize wide-range image blending, where a novel Bidirectional Content Transfer module is proposed to perform the conditional prediction for the feature representation of the intermediate region via recurrent neural networks. In addition to ensuring the spatial and semantic consistency during the blending, we also adopt the contextual attention mechanism as well as the adversarial learning scheme in our proposed method for improving the visual quality of the resultant panorama. We experimentally demonstrate that our proposed method is not only able to produce visually appealing results for wide-range image blending, but also able to provide superior performance with respect to several baselines built upon the state-of-the-art image inpainting and outpainting approaches.	翻訳日:2021-03-30 15:00:58 公開日:2021-03-28
# Defect-GAN:自動欠陥検査のための高忠実度欠陥合成 Defect-GAN: High-Fidelity Defect Synthesis for Automated Defect Inspection ( http://arxiv.org/abs/2103.15158v1 ) ライセンス: Link先を確認	Gongjie Zhang, Kaiwen Cui, Tzu-Yi Hung, Shijian Lu	(参考訳) 自動欠陥検査は、先進的な製造業における効果的かつ効率的な保守・修理・運用に不可欠である。一方で、特にこのタスクにディープニューラルネットワークを用いる場合、欠陥サンプルの不足によって自動欠陥検査が制約されることが多い。本稿では,高精度でロバストな欠陥検査ネットワークの訓練のために,現実的で多様な欠陥サンプルを生成する自動欠陥合成ネットワークであるDefect-GANを提案する。Defect-GANは欠陥付与(defacement)と修復(restoration)のプロセスを通じて学習する。欠陥付与は正常な表面画像上に欠陥を生成し、修復は欠陥を除去して正常な画像を生成する。テクスチャや外観の異なる様々な画像背景の中に現実的な欠陥を生成するために、新しい合成的なレイヤベースのアーキテクチャを用いる。また、欠陥の確率的な変動を模倣し、画像背景内に生成する欠陥の位置やカテゴリを柔軟に制御できる。広範な実験により、Defect-GANは様々な欠陥を優れた多様性と忠実度で合成できることが示された。さらに, 合成した欠陥サンプルは, より優れた欠陥検査ネットワークの訓練に有効であることを示した。 Automated defect inspection is critical for effective and efficient maintenance, repair, and operations in advanced manufacturing. On the other hand, automated defect inspection is often constrained by the lack of defect samples, especially when we adopt deep neural networks for this task. This paper presents Defect-GAN, an automated defect synthesis network that generates realistic and diverse defect samples for training accurate and robust defect inspection networks. Defect-GAN learns through defacement and restoration processes, where the defacement generates defects on normal surface images while the restoration removes defects to generate normal images. It employs a novel compositional layer-based architecture for generating realistic defects within various image backgrounds with different textures and appearances. It can also mimic the stochastic variations of defects and offer flexible control over the locations and categories of the generated defects within the image background. Extensive experiments show that Defect-GAN is capable of synthesizing various defects with superior diversity and fidelity. In addition, the synthesized defect samples demonstrate their effectiveness in training better defect inspection networks.	翻訳日:2021-03-30 15:00:38 公開日:2021-03-28
# ReAgent: 模倣学習と強化学習を用いた点群レジストレーション ReAgent: Point Cloud Registration using Imitation and Reinforcement Learning ( http://arxiv.org/abs/2103.15231v1 ) ライセンス: Link先を確認	Dominik Bauer, Timothy Patten and Markus Vincze	(参考訳) 点群レジストレーションは、3Dモデルを観測に位置合わせする物体姿勢推定など、多くの3Dコンピュータビジョンタスクに共通するステップである。古典的なレジストレーション手法は新しいドメインによく汎化するが、ノイズの多い観測や悪い初期値が与えられると失敗する。これに対し学習ベースの手法はよりロバストであるが、汎化能力に欠ける。本稿では,反復的な点群レジストレーションを強化学習タスクとして捉え,そのための新しいレジストレーションエージェント(ReAgent)を提案する。安定したエキスパート方策に基づく模倣学習を用いて、離散的なレジストレーション方策を初期化する。さらに、提案するアライメント報酬に基づく方策最適化と組み合わせることで、エージェントのレジストレーション性能が一層向上する。ModelNet40(合成データ)とScanObjectNN(実データ)の両方で古典的手法および学習ベースのレジストレーション手法と比較し,ReAgentが最先端の精度を達成することを示す。また、エージェントの軽量なアーキテクチャにより、関連手法と比べて推論時間を短縮できる。さらに,本手法を実データ(LINEMOD)上の物体姿勢推定タスクに適用し,最先端のポーズリファインメント手法を上回る結果を得た。 Point cloud registration is a common step in many 3D computer vision tasks such as object pose estimation, where a 3D model is aligned to an observation. Classical registration methods generalize well to novel domains but fail when given a noisy observation or a bad initialization. Learning-based methods, in contrast, are more robust but lack in generalization capacity. We propose to consider iterative point cloud registration as a reinforcement learning task and, to this end, present a novel registration agent (ReAgent). We employ imitation learning to initialize its discrete registration policy based on a steady expert policy. Integration with policy optimization, based on our proposed alignment reward, further improves the agent's registration performance. We compare our approach to classical and learning-based registration methods on both ModelNet40 (synthetic) and ScanObjectNN (real data) and show that our ReAgent achieves state-of-the-art accuracy. The lightweight architecture of the agent, moreover, enables reduced inference time as compared to related approaches. In addition, we apply our method to the object pose estimation task on real data (LINEMOD), outperforming state-of-the-art pose refinement approaches.	翻訳日:2021-03-30 15:00:22 公開日:2021-03-28
# 時間的動作局所化のための低忠実度エンド・ツー・エンドビデオエンコーダ事前学習 Low-Fidelity End-to-End Video Encoder Pre-training for Temporal Action Localization ( http://arxiv.org/abs/2103.15233v1 ) ライセンス: Link先を確認	Mengmeng Xu, Juan-Manuel Perez-Ru, Xiatian Zhu, Bernard Ghanem, Brais Martinez	(参考訳) 時間的行動ローカライゼーション(TAL)は、ビデオ理解における基本的でありながら困難な課題である。既存のTAL手法は、行動分類の教師信号によってビデオエンコーダを事前学習することに依存している。その結果、ビデオエンコーダは行動分類のために訓練されながらTALに使用されるという、タスク不一致の問題が生じる。直感的には、エンドツーエンドのモデル最適化がよい解決策である。しかし、長い未編集ビデオの処理には膨大な計算コストがかかるため、GPUメモリの制約下にあるTALでは実行できない。本稿では,新しい低忠実度エンド・ツー・エンド(LoFi)ビデオエンコーダ事前学習手法を導入することで,この課題を解決する。TAL学習に常に完全な訓練構成を用いる代わりに,時間的・空間的・時空間的な解像度の面でミニバッチ構成を削減し,ミッドレンジのハードウェア予算のメモリ条件下でもビデオエンコーダのエンド・ツー・エンド最適化を実行可能にすることを提案する。重要な点として、これによりTAL損失の教師信号から勾配がビデオエンコーダを逆向きに流れるようになり、タスク不一致の問題を適切に解決し、より効果的な特徴表現が得られる。広範な実験により,提案するLoFi事前学習手法が既存のTAL手法の性能を大きく向上させることが示された。軽量なResNet18ベースのビデオエンコーダを単一のRGBストリームで用いた場合でも、本手法は高価なオプティカルフローを用いる2ストリームのResNet50ベースの代替手法を、多くの場合かなりの差で上回る。 Temporal action localization (TAL) is a fundamental yet challenging task in video understanding. Existing TAL methods rely on pre-training a video encoder through action classification supervision. This results in a task discrepancy problem for the video encoder -- trained for action classification, but used for TAL. Intuitively, end-to-end model optimization is a good solution. However, this is not operable for TAL subject to the GPU memory constraints, due to the prohibitive computational cost in processing long untrimmed videos. In this paper, we resolve this challenge by introducing a novel low-fidelity end-to-end (LoFi) video encoder pre-training method. Instead of always using the full training configurations for TAL learning, we propose to reduce the mini-batch composition in terms of temporal, spatial or spatio-temporal resolution so that end-to-end optimization for the video encoder becomes operable under the memory conditions of a mid-range hardware budget. Crucially, this enables the gradient to flow backward through the video encoder from a TAL loss supervision, favourably solving the task discrepancy problem and providing more effective feature representations. Extensive experiments show that the proposed LoFi pre-training approach can significantly enhance the performance of existing TAL methods. Encouragingly, even with a lightweight ResNet18 based video encoder in a single RGB stream, our method surpasses two-stream ResNet50 based alternatives with expensive optical flow, often by a good margin.	翻訳日:2021-03-30 15:00:04 公開日:2021-03-28
# 深層学習における重み付けの役割を理解する Understanding the role of importance weighting for deep learning ( http://arxiv.org/abs/2103.15209v1 ) ライセンス: Link先を確認	Da Xu, Yuting Ye, Chuanwei Ruan	(参考訳) Byrd & Lipton (2019) による最近の論文は、経験的な観察に基づいて、過度にパラメータ化されたディープラーニングモデルに対する重み付けの影響に大きな懸念を提起している。彼らは、モデルがトレーニングデータを分離できる限り、重要度重み付けの影響はトレーニングが進むにつれて減少する、と観察する。しかし、この現象の厳密な特徴が欠けている。本稿では,勾配降下の暗黙のバイアスとマージンに基づく学習理論に対する重要度重み付けの役割に関する形式的特徴と理論的正当性について述べる。ディープラーニングモデルの下で最適化力学と一般化性能の両方を明らかにする。本研究は,深層学習において重み付けを重要視する様々な新しい現象を説明するだけでなく,モデルの一部として重み付けが最適化されている研究にも応用する。 The recent paper by Byrd & Lipton (2019), based on empirical observations, raises a major concern on the impact of importance weighting for the over-parameterized deep learning models. They observe that as long as the model can separate the training data, the impact of importance weighting diminishes as the training proceeds. Nevertheless, there lacks a rigorous characterization of this phenomenon. In this paper, we provide formal characterizations and theoretical justifications on the role of importance weighting with respect to the implicit bias of gradient descent and margin-based learning theory. We reveal both the optimization dynamics and generalization performance under deep learning models. Our work not only explains the various novel phenomenons observed for importance weighting in deep learning, but also extends to the studies where the weights are being optimized as part of the model, which applies to a number of topics under active research.	翻訳日:2021-03-30 14:48:26 公開日:2021-03-28
# 連続時間情報を用いた深層学習のための時間カーネルアプローチ A Temporal Kernel Approach for Deep Learning with Continuous-time Information ( http://arxiv.org/abs/2103.15213v1 ) ライセンス: Link先を確認	Da Xu, Chuanwei Ruan, Evren Korpeoglu, Sushant Kumar, Kannan Achan	(参考訳) RNN、因果CNN、アテンション機構などの逐次ディープラーニングモデルは、連続時間情報をそのままでは扱えない。本稿で示すように、時間データの離散化は単純な連続時間過程に対してさえ不整合を引き起こす。現在のアプローチは、既存のディープラーニングアーキテクチャや実装と整合させるために、時間をヒューリスティックに扱うことが多い。本稿では,ディープラーニングのツールを用いて連続時間システムを特徴付ける、原理に基づいた方法を提供する。特に、提案手法は主要なディープラーニングアーキテクチャすべてに適用でき、実装の変更もほとんど必要としない。重要な洞察は、ニューラルネットワークを時間カーネルと合成することで連続時間システムを表現することであり、その着想はガウス過程とニューラル・タンジェント・カーネルによるディープラーニング理解の最近の進展から得ている。時間カーネルを表現するために、ランダム特徴アプローチを導入し、再パラメータ化の下でカーネル学習問題をスペクトル密度推定に変換する。さらに、時間カーネルが非定常で、スペクトル密度が誤って特定されている場合でも、収束性と一致性が成り立つことを証明する。シミュレーションと実データ実験により,幅広い設定における本手法の経験的有効性を示す。 Sequential deep learning models such as RNN, causal CNN and attention mechanism do not readily consume continuous-time information. Discretizing the temporal data, as we show, causes inconsistency even for simple continuous-time processes. Current approaches often handle time in a heuristic manner to be consistent with the existing deep learning architectures and implementations. In this paper, we provide a principled way to characterize continuous-time systems using deep learning tools. Notably, the proposed approach applies to all the major deep learning architectures and requires little modification to the implementation. The critical insight is to represent the continuous-time system by composing neural networks with a temporal kernel, where we gain our intuition from the recent advancements in understanding deep learning with Gaussian process and neural tangent kernel. To represent the temporal kernel, we introduce the random feature approach and convert the kernel learning problem to spectral density estimation under reparameterization. We further prove the convergence and consistency results even when the temporal kernel is non-stationary, and the spectral density is misspecified. The simulations and real-data experiments demonstrate the empirical effectiveness of our temporal kernel approach in a broad range of settings.	翻訳日:2021-03-30 14:48:12 公開日:2021-03-28
# PnG BERT: ニューラルTTSのための音素と書記素による拡張BERT PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS ( http://arxiv.org/abs/2103.15060v1 ) ライセンス: Link先を確認	Ye Jia, Heiga Zen, Jonathan Shen, Yu Zhang, Yonghui Wu	(参考訳) 本稿では,ニューラルTTSの新しいエンコーダモデルであるPnG BERTを紹介する。このモデルは元のBERTモデルを拡張したもので、テキストの音素表現と書記素表現の両方、およびそれらの間の単語レベルのアライメントを入力とする。大規模テキストコーパス上で自己教師あり方式で事前学習し、TTSタスクでファインチューニングできる。実験の結果,事前学習したPnG BERTをエンコーダとして用いるニューラルTTSモデルは,事前学習なしで音素入力のみを用いるベースラインモデルよりも、自然な韻律と正確な発音が得られる。主観的な対比較による選好評価では、PnG BERTを用いて合成した音声と、プロの話者による正解録音との間に、評価者の統計的に有意な選好は見られなかった。 This paper introduces PnG BERT, a new encoder model for neural TTS. This model is augmented from the original BERT model, by taking both phoneme and grapheme representations of text as input, as well as the word-level alignment between them. It can be pre-trained on a large text corpus in a self-supervised manner, and fine-tuned in a TTS task. Experimental results show that a neural TTS model using a pre-trained PnG BERT as its encoder yields more natural prosody and more accurate pronunciation than a baseline model using only phoneme input with no pre-training. Subjective side-by-side preference evaluations show that raters have no statistically significant preference between the speech synthesized using a PnG BERT and ground truth recordings from professional speakers.	翻訳日:2021-03-30 14:44:05 公開日:2021-03-28
# 自動音声認識におけるバイアスの定量化 Quantifying Bias in Automatic Speech Recognition ( http://arxiv.org/abs/2103.15122v1 ) ライセンス: Link先を確認	Siyuan Feng, Olya Kudina, Bence Mark Halpern and Odette Scharenborg	(参考訳) 自動音声認識(ASR)システムは、人間の発話の客観的な解釈を提供すると期待されている。実運用と最近の証拠は、最先端(SotA)のASRが、性別、年齢、発話障害、人種、アクセントに起因する発話のばらつきに対処しきれていないことを示唆している。ASRシステムのバイアスは、訓練データの構成や調音の違いなど、多くの要因によって生じうる。我々の包括的な目標は、ASRにおける積極的なバイアス緩和に向けて、ASRシステムのバイアスを明らかにすることである。本稿では,性別,年齢,地域アクセント,非ネイティブアクセントに対するSotA ASRシステムのバイアスを体系的に定量化する。単語誤り率を比較し, 詳細な音素レベルの誤り分析を行って, バイアスがどこで生じているかを把握する。特に、データセット内の調音の違いに起因するバイアスに焦点を当てる。これらの結果に基づき,ASR開発におけるバイアス緩和戦略を提案する。 Automatic speech recognition (ASR) systems promise to deliver objective interpretation of human speech. Practice and recent evidence suggest that the state-of-the-art (SotA) ASRs struggle with speech variance due to gender, age, speech impairment, race, and accents. Many factors can cause the bias of an ASR system, e.g. composition of the training material and articulation differences. Our overarching goal is to uncover bias in ASR systems to work towards proactive bias mitigation in ASR. This paper systematically quantifies the bias of a SotA ASR system against gender, age, regional accents and non-native accents. Word error rates are compared, and in-depth phoneme-level error analysis is conducted to understand where bias is occurring. We focus on bias due to articulation differences in the dataset. Based on our findings, we suggest bias mitigation strategies for ASR development.	翻訳日:2021-03-30 14:43:53 公開日:2021-03-28
# 盤面を相手にプレイする: Pandemicに挑むローリングホライズン進化的アルゴリズム Playing Against the Board: Rolling Horizon Evolutionary Algorithms Against Pandemic ( http://arxiv.org/abs/2103.15090v1 ) ライセンス: Link先を確認	Konstantinos Sfikas and Antonios Liapis	(参考訳) 競争型のボードゲームは、人工知能にとって豊かで多様なテストベッドを提供してきた。本稿では,協力型ボードゲームは短期的なリスク軽減と長期的な勝利戦略のバランスをとる必要があるため、人工知能に異なる課題をもたらすと主張する。協力型ボードゲームでは、盤面と確率的なルールセットがもたらす、次第に厳しくなる課題を克服するために、すべてのプレイヤーがそれぞれ異なる能力を連携させたり、リソースを出し合ったりすることが求められる。本稿では,代表的な協力型ボードゲームであるPandemicに着目し,このゲーム専用に設計したローリングホライズン進化的アルゴリズムを提案する。Pandemicのゲーム状態は確率的だが予測可能な形で複雑に変化するため、特別に設計した複数のフォワードモデル、意思決定のためのマクロアクション表現、進化的アルゴリズムの遺伝的操作のための修復関数が必要であった。楽観的および悲観的なゲーム状態評価、異なる突然変異率、異なるイベントホライズンを探索するアルゴリズムの変種を、ベースラインとなる階層的方策エージェントと比較する。結果は、短いホライズンのロールアウトによる進化的アプローチが、盤面がもたらしうる将来の危険をよりよく考慮し、それに備えられることを示している。また結果は、協力型ボードゲームが人工知能にもたらす課題の種類、特にマルチプレイヤーの協調的なインタラクションの扱いを浮き彫りにする。 Competitive board games have provided a rich and diverse testbed for artificial intelligence. This paper contends that collaborative board games pose a different challenge to artificial intelligence as it must balance short-term risk mitigation with long-term winning strategies. Collaborative board games task all players to coordinate their different powers or pool their resources to overcome an escalating challenge posed by the board and a stochastic ruleset. This paper focuses on the exemplary collaborative board game Pandemic and presents a rolling horizon evolutionary algorithm designed specifically for this game. The complex way in which the Pandemic game state changes in a stochastic but predictable way required a number of specially designed forward models, macro-action representations for decision-making, and repair functions for the genetic operations of the evolutionary algorithm. Variants of the algorithm which explore optimistic versus pessimistic game state evaluations, different mutation rates and event horizons are compared against a baseline hierarchical policy agent. Results show that an evolutionary approach via short-horizon rollouts can better account for the future dangers that the board may introduce, and guard against them. Results highlight the types of challenges that collaborative board games pose to artificial intelligence, especially for handling multi-player collaboration interactions.	翻訳日:2021-03-30 14:42:53 公開日:2021-03-28
# ヒューマン・オン・ザ・ループ型の視覚ベースロボットシステムにおける適応的自律性 Adaptive Autonomy in Human-on-the-Loop Vision-Based Robotics Systems ( http://arxiv.org/abs/2103.15053v1 ) ライセンス: Link先を確認	Sophia Abraham, Zachariah Carmichael, Sreya Banerjee, Rosaura VidalMata, Ankit Agrawal, Md Nafee Al Islam, Walter Scheirer, Jane Cleland-Huang	(参考訳) コンピュータビジョンの手法は、自律ロボットシステムが周囲の世界を感知し、衝突回避、捜索救助、物体操作といった多様なタスクを実行する際の意思決定を導くために広く使われている。高い精度は極めて重要であり、特に、意思決定をシステムが自律的に行い人間は監督的な役割のみを担うHuman-on-the-loop(HoTL)システムではなおさらである。ビジョンモデルの失敗は、生死に関わりうる誤った決定につながる可能性がある。本稿では,適応的な自律性レベルに基づく解決策を提案する。すなわち、システムがこれらのモデルの信頼性低下を検知し、自身の自律性レベルを一時的に引き下げ、意思決定プロセスへの人間の関与を高めることで対応する。本解決策は、人間が反応して指示を与える時間的余裕のある視覚ベースのタスクに適用できる。実装された場合、本手法は、モデルの不確実性を考慮するとともに、共変量分析によって現在の動作環境がモデルの訓練データと適合しない状況を判定することで、視覚タスクの信頼性を推定することになる。小型無人航空システムを緊急対応ミッションに投入するDroneResponseの事例を示し,システムの自律性の挙動と適応を駆動・規定するために、信頼度スコアに加えてビジョンモデルの信頼性をどのように利用するかを示す。このワークショップ論文では,提案手法の概要を述べ,自律システムの意思決定におけるビジョンモデルの安全で信頼性の高い展開に向けた、コンピュータビジョンとソフトウェア工学の交点における未解決の課題について述べる。 Computer vision approaches are widely used by autonomous robotic systems to sense the world around them and to guide their decision making as they perform diverse tasks such as collision avoidance, search and rescue, and object manipulation. High accuracy is critical, particularly for Human-on-the-loop (HoTL) systems where decisions are made autonomously by the system, and humans play only a supervisory role. Failures of the vision model can lead to erroneous decisions with potentially life or death consequences. In this paper, we propose a solution based upon adaptive autonomy levels, whereby the system detects loss of reliability of these models and responds by temporarily lowering its own autonomy levels and increasing engagement of the human in the decision-making process. Our solution is applicable for vision-based tasks in which humans have time to react and provide guidance. When implemented, our approach would estimate the reliability of the vision task by considering uncertainty in its model, and by performing covariate analysis to determine when the current operating environment is ill-matched to the model's training data. We provide examples from DroneResponse, in which small Unmanned Aerial Systems are deployed for Emergency Response missions, and show how the vision model's reliability would be used in addition to confidence scores to drive and specify the behavior and adaptation of the system's autonomy. This workshop paper outlines our proposed approach and describes open challenges at the intersection of Computer Vision and Software Engineering for the safe and reliable deployment of vision models in the decision making of autonomous systems.	翻訳日:2021-03-30 14:40:51 公開日:2021-03-28
# 可逆画像信号処理 Invertible Image Signal Processing ( http://arxiv.org/abs/2103.15061v1 ) ライセンス: Link先を確認	Yazhou Xing, Zian Qian, Qifeng Chen	(参考訳) 未処理のRAWデータは、画像編集とコンピュータビジョンにとって非常に価値の高い画像フォーマットである。しかし、RAWデータのファイルサイズは巨大であるため、ほとんどのユーザーは処理・圧縮されたsRGB画像にしかアクセスできない。このギャップを埋めるために、可逆画像信号処理(InvISP)パイプラインを設計する。これは視覚的に魅力的なsRGB画像のレンダリングを可能にするだけでなく、ほぼ完全なRAWデータの復元も可能にする。本フレームワークが本質的に可逆であるため、sRGB画像からRAWデータを合成するのではなく、メモリオーバーヘッドなしに現実的なRAWデータを再構成できる。さらに、微分可能なJPEG圧縮シミュレータを統合し、JPEG画像からもRAWデータを再構成できるようにした。2台のデジタル一眼レフカメラでの広範な定量的・定性的実験により,本手法はレンダリングしたsRGB画像と再構成したRAWデータの両方で、他の手法よりもはるかに高い品質を得られることを示した。 Unprocessed RAW data is a highly valuable image format for image editing and computer vision. However, since the file size of RAW data is huge, most users can only get access to processed and compressed sRGB images. To bridge this gap, we design an Invertible Image Signal Processing (InvISP) pipeline, which not only enables rendering visually appealing sRGB images but also allows recovering nearly perfect RAW data. Due to our framework's inherent reversibility, we can reconstruct realistic RAW data instead of synthesizing RAW data from sRGB images without any memory overhead. We also integrate a differentiable JPEG compression simulator that empowers our framework to reconstruct RAW data from JPEG images. Extensive quantitative and qualitative experiments on two DSLRs demonstrate that our method obtains much higher quality in both rendered sRGB images and reconstructed RAW data than alternative methods.	翻訳日:2021-03-30 14:40:28 公開日:2021-03-28
# ManhattanSLAM: マンハッタンフレームの混合を利用したロバストな平面追跡とマッピング ManhattanSLAM: Robust Planar Tracking and Mapping Leveraging Mixture of Manhattan Frames ( http://arxiv.org/abs/2103.15068v1 ) ライセンス: Link先を確認	Raza Yunus, Yanyan Li and Federico Tombari	(参考訳) 本稿では,屋内シーンの構造情報を活用し,CPU上で正確な追跡と効率的な高密度マッピングを実現するロバストなRGB-D SLAMシステムを提案する。従来の研究では、マンハッタンワールド(MW)の仮定を用いて低ドリフトのカメラポーズを推定しており、それがこうしたシステムの適用範囲を制限していた。これに対し本稿では,MW環境と非MW環境の両方でロバストなトラッキングを実現する新しい手法を提案する。平面間の直交関係を調べてマンハッタンフレームを直接検出し、シーンをマンハッタンフレームの混合としてモデル化する。MWシーンでは、ポーズ推定を分離し、マンハッタンフレームの観測に基づく新しいドリフトフリーな回転推定を行う。MWシーンにおける並進推定と非MWシーンにおけるフルカメラポーズ推定では、点・線・平面の特徴を利用して、難易度の高いシーンでもロバストにトラッキングする。さらに,各フレームで検出した平面特徴を活用し,各画像を平面領域と非平面領域に分割する、効率的なサーフェルベースの高密度マッピング戦略を提案する。平面サーフェルは地図内のスパースな平面から直接初期化し、非平面サーフェルはスーパーピクセルを抽出して構築する。ポーズ推定,ドリフト,再構成精度について公開ベンチマークで評価し、他の最先端手法を上回る性能を達成した。コードは将来オープンソース化する予定である。 In this paper, a robust RGB-D SLAM system is proposed to utilize the structural information in indoor scenes, allowing for accurate tracking and efficient dense mapping on a CPU. Prior works have used the Manhattan World (MW) assumption to estimate low-drift camera pose, in turn limiting the applications of such systems. This paper, in contrast, proposes a novel approach delivering robust tracking in MW and non-MW environments. We check orthogonal relations between planes to directly detect Manhattan Frames, modeling the scene as a Mixture of Manhattan Frames. For MW scenes, we decouple pose estimation and provide a novel drift-free rotation estimation based on Manhattan Frame observations. For translation estimation in MW scenes and full camera pose estimation in non-MW scenes, we make use of point, line and plane features for robust tracking in challenging scenes. Additionally, by exploiting plane features detected in each frame, we also propose an efficient surfel-based dense mapping strategy, which divides each image into planar and non-planar regions. Planar surfels are initialized directly from sparse planes in our map while non-planar surfels are built by extracting superpixels. We evaluate our method on public benchmarks for pose estimation, drift and reconstruction accuracy, achieving superior performance compared to other state-of-the-art methods. We will open-source our code in the future.	翻訳日:2021-03-30 14:40:12 公開日:2021-03-28
# MergeComp: スケーラブルで通信効率のよい分散トレーニングのための圧縮スケジューラ MergeComp: A Compression Scheduler for Scalable Communication-Efficient Distributed Training ( http://arxiv.org/abs/2103.15195v1 ) ライセンス: Link先を確認	Zhuang Wang, Xinyu Wu, T.S. Eugene Ng	(参考訳) 大規模分散トレーニングは、ますます通信がボトルネックになりつつある。通信オーバーヘッドを減らしスケーラビリティを向上させるために、多くの勾配圧縮アルゴリズムが提案されている。しかし、勾配圧縮が分散トレーニングの性能をかえって損なう場合もあることが観察されている。本稿では,通信効率のよい分散トレーニングのスケーラビリティを最適化する圧縮スケジューラであるMergeCompを提案する。MergeCompは、モデルアーキテクチャやシステムパラメータの知識なしに、圧縮アルゴリズムの性能を最適化するように圧縮操作を自動的にスケジュールする。我々はMergeCompを9つの一般的な圧縮アルゴリズムに適用した。評価の結果,MergeCompは精度を損なうことなく圧縮アルゴリズムの性能を最大3.83倍向上できることがわかった。高速ネットワーク上では、分散トレーニングのスケーリング係数を最大99%まで達成できる。 Large-scale distributed training is increasingly becoming communication bound. Many gradient compression algorithms have been proposed to reduce the communication overhead and improve scalability. However, it has been observed that in some cases gradient compression may even harm the performance of distributed training. In this paper, we propose MergeComp, a compression scheduler to optimize the scalability of communication-efficient distributed training. It automatically schedules the compression operations to optimize the performance of compression algorithms without the knowledge of model architectures or system parameters. We have applied MergeComp to nine popular compression algorithms. Our evaluations show that MergeComp can improve the performance of compression algorithms by up to 3.83x without losing accuracy. It can even achieve a scaling factor of distributed training up to 99% over high-speed networks.	翻訳日:2021-03-30 14:36:47 公開日:2021-03-28
# ロバストと最適化ディープラーニングモデルを用いた正確な株価予測 Accurate Stock Price Forecasting Using Robust and Optimized Deep Learning Models ( http://arxiv.org/abs/2103.15096v1 ) ライセンス: Link先を確認	Jaydip Sen and Sidra Mehtab	(参考訳) 将来の株価を正確に予測するための堅牢なフレームワークを設計することは、常に非常に困難な研究課題とみなされてきた。古典的効率的な市場仮説の提唱者は、株価変動の確率的な性質のため、効率的な市場における将来の価格を正確に予測することは不可能であると断言する。しかし、アルゴリズムやモデルがどのように株式価格の効率的で正確で堅牢な予測を設計できるかを示す、洗練度と複雑さの程度が異なる文献には、多くの命題が存在する。本稿では,インドの自動車部門における重要な企業の株価の将来価格を正確に予測するために,回帰モデルの10種類の深層学習モデルを提案する。 2012年12月31日から2013年12月27日までの記録に基づいて、非常に細かな株価を5分間隔で収集し、モデルをトレーニングします。テストは2013年12月30日から2015年1月9日までの記録を用いて行われた。本稿では,モデルの設計原理を説明し,予測精度と実行速度に基づいてその性能を解析する。 Designing robust frameworks for precise prediction of future prices of stocks has always been considered a very challenging research problem. The advocates of the classical efficient market hypothesis affirm that it is impossible to accurately predict the future prices in an efficiently operating market due to the stochastic nature of the stock price variables. However, numerous propositions exist in the literature with varying degrees of sophistication and complexity that illustrate how algorithms and models can be designed for making efficient, accurate, and robust predictions of stock prices. We present a gamut of ten deep learning models of regression for precise and robust prediction of the future prices of the stock of a critical company in the auto sector of India. Using a very granular stock price collected at 5 minutes intervals, we train the models based on the records from 31st Dec, 2012 to 27th Dec, 2013. The testing of the models is done using records from 30th Dec, 2013 to 9th Jan 2015. We explain the design principles of the models and analyze the results of their performance based on accuracy in forecasting and speed of execution.	翻訳日:2021-03-30 14:33:08 公開日:2021-03-28

# 相互作用クエンチからの光学格子の量子相関の復元

Recovering quantum correlations in optical lattices from interaction quenches ( http://arxiv.org/abs/2005.09000v2 )

ライセンス: Link先を確認

M. Gluza, J. Eisert

(参考訳) 光格子中の超低温原子による量子シミュレーションは、強く相互作用する量子系の理解に向けた刺激的な道を開く。原子気体顕微鏡は、他の量子多体系には類を見ない単一サイトの密度分解能を提供するため、そのために不可欠である。しかし現在のところ、局所的なコヒーレント電流の直接測定は実現できていない。本研究では,光格子を傾けた後など,非相互作用ダイナミクスへのクエンチに応じて変化する密度を測定することで,それを実現する方法を示す。そのために,トンネル電流と原子数ダイナミクスを関係づける閉じた方程式系を解くデータ解析手法を確立し,コヒーレント電流を表す非対角項を含む完全な共分散行列を確実に復元できるようにする。信号処理は半正定値最適化に基づいており、観測データに最もよく適合する正当な(bona fide)共分散行列を与える。得られた非可換な可観測量に関する情報によって、有限温度でのエンタングルメントの下界を与えられることを示す。これにより、古典的な計算能力を超える量子シミュレーションにおいて量子相関を研究する可能性が開かれる。

Quantum simulations with ultra-cold atoms in optical lattices open up an exciting path towards understanding strongly interacting quantum systems. Atom gas microscopes are crucial for this as they offer single-site density resolution, unparalleled in other quantum many-body systems. However, currently a direct measurement of local coherent currents is out of reach. In this work, we show how to achieve that by measuring densities that are altered in response to quenches to non-interacting dynamics, e.g., after tilting the optical lattice. For this, we establish a data analysis method solving the closed set of equations relating tunnelling currents and atom number dynamics, allowing to reliably recover the full covariance matrix, including off-diagonal terms representing coherent currents. The signal processing builds upon semi-definite optimization, providing bona fide covariance matrices optimally matching the observed data. We demonstrate how the obtained information about non-commuting observables allows to lower bound entanglement at finite temperature which opens up the possibility to study quantum correlations in quantum simulations going beyond classical capabilities.

翻訳日:2023-05-19 11:04:41 公開日:2021-03-28
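
The link between quenched densities and currents can be made concrete on a toy two-site lattice. The sketch below is an illustration under stated assumptions, not the paper's method: it assumes a hopping matrix $h = -J\sigma_x$ with $\hbar = 1$, a covariance matrix `gamma[k][l]` standing for $\langle a_k^\dagger a_l \rangle$, and hypothetical helper names (`propagator`, `density_site0`, `recover_imag_coherence`). It recovers only the imaginary part of the off-diagonal term from a single density measurement, and does not attempt the semi-definite optimization over many sites and times that the abstract describes.

```python
import math

def propagator(J, t):
    """G(t) = exp(+i J t sigma_x), i.e. exp(-i h t) for the assumed h = -J sigma_x."""
    c, s = math.cos(J * t), math.sin(J * t)
    return [[c, 1j * s], [1j * s, c]]

def density_site0(gamma, J, t):
    """Density on site 0 after evolving for time t.

    n_0(t) = sum_{k,l} conj(G[0][k]) * gamma[k][l] * G[0][l],
    where gamma[k][l] is the initial covariance <a_k^dagger a_l>.
    """
    G = propagator(J, t)
    total = 0j
    for k in range(2):
        for l in range(2):
            total += G[0][k].conjugate() * gamma[k][l] * G[0][l]
    return total.real

def recover_imag_coherence(n0_initial, n1_initial, n0_at_quarter):
    """Invert n_0(t*) = (n_0 + n_1) / 2 - Im(gamma[0][1]), valid at J * t* = pi / 4."""
    return 0.5 * (n0_initial + n1_initial) - n0_at_quarter

# Toy state: densities 0.7 and 0.3, off-diagonal coherence 0.1 + 0.2i (assumed values).
gamma = [[0.7, 0.1 + 0.2j], [0.1 - 0.2j, 0.3]]
n0_quench = density_site0(gamma, J=1.0, t=math.pi / 4)
imag_part = recover_imag_coherence(0.7, 0.3, n0_quench)
```

In this toy setting the pure hopping quench is sensitive only to the imaginary part of the coherence; the real part does not enter `n0_quench`, which is one way to see why a different quench (such as the lattice tilt mentioned in the abstract) is useful.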

# Stern-Gerlach実験によるスピン状態推定

Spin-state estimation using the Stern-Gerlach experiment ( http://arxiv.org/abs/2009.01877v3 )

ライセンス: Link先を確認

Javier Martínez-Cifuentes, K.M. Fonseca-Romero

(参考訳) 本稿では,中性スピン1/2点粒子のビームが四重極磁場と相互作用する,修正したStern-Gerlach実験の構成を用いたスピンの状態推定手法を提案する。提案する線形反転推定法は四分割強度検出器に基づいており、ビームの適切な初期空間状態を必要とする。初期スピン状態の推定量を統計的に特徴付けることで、推定パラメータに誤差を付与できるだけでなく、異なるStern-Gerlach構成に対応する推定手順を比較するための尺度を定義することもできる。

We propose a state estimation scheme for spins, using a modified setup of the Stern-Gerlach experiment, in which a beam of neutral spin-1/2 point particles interacts with a quadrupolar magnetic field. The proposed linear inversion estimation procedure, based on a quadrant intensity detector, requires a suitable initial spatial state of the beam. The statistical characterization of the estimator of the initial spin state allows us not only to associate an error to the estimated parameters, but also to define a measure for comparing estimation procedures corresponding to different Stern-Gerlach setups.

翻訳日:2023-05-03 22:44:13 公開日:2021-03-28
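
To give a feel for what linear inversion with an attached error means for a spin-1/2 state, here is a minimal sketch. It assumes the textbook setting of two-outcome counts along the x, y and z axes, not the quadrant intensity detector of the paper, and the function names and count values are hypothetical. Each Bloch component is estimated as a normalized count difference, and its standard error follows from binomial statistics.

```python
import math

def bloch_component(n_up, n_down):
    """Linear-inversion estimate r = (N_up - N_down) / (N_up + N_down)."""
    return (n_up - n_down) / (n_up + n_down)

def standard_error(r, shots):
    """Binomial standard error of r = 2p - 1 after `shots` trials: sqrt((1 - r^2) / shots)."""
    return math.sqrt(max(0.0, 1.0 - r * r) / shots)

def linear_inversion(counts):
    """Map {'x': (up, down), 'y': ..., 'z': ...} to (Bloch vector, per-axis errors)."""
    vector, errors = [], []
    for axis in ("x", "y", "z"):
        n_up, n_down = counts[axis]
        r = bloch_component(n_up, n_down)
        vector.append(r)
        errors.append(standard_error(r, n_up + n_down))
    return vector, errors

# Assumed example counts, 1000 shots per axis.
counts = {"x": (600, 400), "y": (500, 500), "z": (900, 100)}
vector, errors = linear_inversion(counts)
```

Linear inversion is simple but can return a vector of length greater than one for noisy data, which is one reason error estimates and comparisons between setups matter.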

# 低温におけるニオブ酸リチウムフォノニック結晶共振器に影響する損失チャネル

Loss channels affecting lithium niobate phononic crystal resonators at cryogenic temperature ( http://arxiv.org/abs/2010.01025v3 )

ライセンス: Link先を確認

E. Alex Wollack, Agnetta Y. Cleland, Patricio Arrangoiz-Arriola, Timothy P. McKenna, Rachel G. Gruenke, Rishi N. Patel, Wentao Jiang, Christopher J. Sarabalis, Amir H. Safavi-Naeini

(参考訳) 超伝導量子回路との集積に向けて、薄膜ニオブ酸リチウム上に作製したマイクロ波周波数フォノニック結晶共振器の性能を調べた。ミリケルビン温度において、異なる設計形状に対し、共振器内のフォノン数 $5\times10^6$ に相当する高いマイクロ波駆動電力で、$10^5 - 10^6$ を超える機械的内部品質係数 $Q_i$ を達成した。同一のミラーセル設計を持つ共振器の欠陥サイズを掃引することで、共振器の内部品質係数を通じて完全フォノニックバンドギャップの兆候を間接的に観測できる。品質係数の温度依存性を調べることで、超伝導および2準位系(TLS)の損失チャネルがデバイス性能にどう影響するかがわかる。最後に,共鳴的なTLS減衰と整合する異常な低温周波数シフトを観測し,材料の選択がこれらの損失の軽減に役立ちうることを見出した。

We investigate the performance of microwave-frequency phononic crystal resonators fabricated on thin-film lithium niobate for integration with superconducting quantum circuits. For different design geometries at millikelvin temperatures, we achieve mechanical internal quality factors $Q_i$ above $10^5 - 10^6$ at high microwave drive power, corresponding to $5\times10^6$ phonons inside the resonator. By sweeping the defect size of resonators with identical mirror cell designs, we are able to indirectly observe signatures of the complete phononic bandgap via the resonators' internal quality factors. Examination of quality factors' temperature dependence shows how superconducting and two-level system (TLS) loss channels impact device performance. Finally, we observe an anomalous low-temperature frequency shift consistent with resonant TLS decay and find that material choice can help to mitigate these losses.

翻訳日:2023-04-30 04:01:30 公開日:2021-03-28

# コヒーレントランダム化ベンチマーク

Coherent randomized benchmarking ( http://arxiv.org/abs/2010.13810v2 )

ライセンス: Link先を確認

Jorge Miguel-Ramiro and Alexander Pirker and Wolfgang Dür

(参考訳) ランダム化ベンチマークは、量子ゲート、回路、デバイスの性能と信頼性を効率的に推定する強力な手法である。本稿では,独立したサンプルではなく,異なるランダムシーケンスの重ね合わせを用いる,コヒーレントな方法でランダム化ベンチマークを行うことを提案する。これにより、ベンチマーク可能なゲートの範囲、および効率と拡張性の面で大きな利点を持つ、統一的で単純なプロトコルが得られることを示す。例えば、普遍ゲート集合、$n$-qudit パウリ演算子の集合、任意のユニタリを含むより一般的な集合、さらにパウリ集合のみを用いた特定の $n$-qudit クリフォードゲートを、効率的にベンチマークできることを示す。その代償は、関与する量子演算に制御を付加するための追加の複雑さである。しかし,ほぼあらゆる物理的実現において自然に利用可能で,テスト対象のゲートとは独立な補助的自由度を用いることで,これを実現できることを示す。

Randomized benchmarking is a powerful technique to efficiently estimate the performance and reliability of quantum gates, circuits and devices. Here we propose to perform randomized benchmarking in a coherent way, where superpositions of different random sequences rather than independent samples are used. We show that this leads to a uniform and simple protocol with significant advantages with respect to gates that can be benchmarked, and in terms of efficiency and scalability. We show that e.g. universal gate sets, the set of $n$-qudit Pauli operators or more general sets including arbitrary unitaries, as well as a particular $n$-qudit Clifford gate using only the Pauli set, can be efficiently benchmarked. The price to pay is an additional complexity to add control to the involved quantum operations. However, we demonstrate that this can be done by using auxiliary degrees of freedom that are naturally available in basically any physical realization, and are independent of the gates to be tested.

翻訳日:2023-04-27 11:21:24 公開日:2021-03-28
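
For orientation, the sketch below shows how a decay parameter is usually extracted in standard (incoherent) randomized benchmarking; it is not the coherent protocol proposed in the paper. It assumes the common model in which the average survival probability after $m$ random gates is $F(m) = A p^m + B$, and uses the standard relation $r = (d-1)(1-p)/d$ between the decay parameter and the average error rate for Hilbert-space dimension $d$. The function names and numerical values are hypothetical.

```python
def survival(m, A, p, B):
    """Assumed standard RB model: average survival probability after m random gates."""
    return A * p ** m + B

def decay_from_three_points(f1, f2, f3, step):
    """Estimate p from survival values at lengths m, m + step, m + 2 * step.

    The ratio (f2 - f3) / (f1 - f2) equals p ** step, so the state-preparation
    and measurement constants A and B drop out without any curve fitting.
    """
    ratio = (f2 - f3) / (f1 - f2)
    return ratio ** (1.0 / step)

def average_error_rate(p, d):
    """Average error per gate r = (d - 1) * (1 - p) / d for dimension d."""
    return (d - 1) * (1.0 - p) / d

# Noise-free synthetic data for a single qubit (d = 2) with p = 0.98.
f10, f20, f30 = (survival(m, 0.5, 0.98, 0.5) for m in (10, 20, 30))
p_est = decay_from_three_points(f10, f20, f30, step=10)
r_est = average_error_rate(p_est, d=2)
```

With real, noisy data one would fit the full decay curve over many sequence lengths rather than rely on three points; the three-point ratio is used here only because it keeps the example dependency-free.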

# $U(1)$ チャーン・サイモンズ理論と結合した AdS${}_3$/CFT${}_2$ における対称性分解エンタングルメント

Symmetry-Resolved Entanglement in AdS${}_3$/CFT${}_2$ coupled to $U(1)$ Chern-Simons Theory ( http://arxiv.org/abs/2012.11274v3 )

ライセンス: Link先を確認

Suting Zhao, Christian Northe, René Meyer

(参考訳) $U(1)$ チャーン・サイモンズ理論と結合した AdS${}_3$/CFT${}_2$ における対称性分解エンタングルメントエントロピーを考える。2次元共形場理論における荷電モーメントのホログラフィック双対を、AdS${}_3$ のバルク内の荷電ウィルソン線、すなわち $U(1)$ チャーン・サイモンズゲージ場に最小結合した笠-高柳測地線として同定する。ウィルソン線まわりのホロノミーを、2次元場の理論においてエンタングルする区間の端点に挿入された荷電 $U(1)$ 頂点演算子が生成するアハロノフ-ボーム位相として同定する。さらに,荷電モーメントの母関数をエンタングルする部分領域内の電荷量に関係づけることで,対称性分解エンタングルメントエントロピーを計算する新しい方法を考案する。部分領域の電荷は、バルクのウィルソン線を源とする $U(1)$ チャーン・サイモンズゲージ場から計算する。この方法を用いて、Poincaré パッチ、大域的 AdS${}_3$、および円錐欠陥幾何に対する対称性分解エンタングルメントエントロピーを導出する。3つの場合すべてにおいて、対称性分解エンタングルメントエントロピーは笠-高柳測地線の長さとチャーン・サイモンズ準位 $k$ によって決まり、エンタングルメントの等分配を満たす。バルク理論の漸近対称性代数は $\hat{\mathfrak{u}}(1)_k$ カッツ・ムーディ型である。この $\hat{\mathfrak{u}}(1)_k$ カッツ・ムーディ対称性を用い、双対な共形場理論での計算によってホログラフィックな結果を確認する。

We consider symmetry-resolved entanglement entropy in AdS${}_3$/CFT${}_2$ coupled to $U(1)$ Chern-Simons theory. We identify the holographic dual of the charged moments in the two-dimensional conformal field theory as a charged Wilson line in the bulk of AdS${}_3$, namely the Ryu-Takayanagi geodesic minimally coupled to the $U(1)$ Chern-Simons gauge field. We identify the holonomy around the Wilson line as the Aharonov-Bohm phases which, in the two-dimensional field theory, are generated by charged $U(1)$ vertex operators inserted at the endpoints of the entangling interval. Furthermore, we devise a new method to calculate the symmetry-resolved entanglement entropy by relating the generating function for the charged moments to the amount of charge in the entangling subregion. We calculate the subregion charge from the $U(1)$ Chern-Simons gauge field sourced by the bulk Wilson line. We use our method to derive the symmetry-resolved entanglement entropy for Poincaré patch and global AdS${}_3$, as well as for the conical defect geometries. In all three cases, the symmetry-resolved entanglement entropy is determined by the length of the Ryu-Takayanagi geodesic and the Chern-Simons level $k$, and fulfills equipartition of entanglement. The asymptotic symmetry algebra of the bulk theory is of $\hat{\mathfrak{u}}{(1)_k}$ Kac-Moody type. Employing the $\hat{\mathfrak{u}}{(1)_k}$ Kac-Moody symmetry, we confirm our holographic results by a calculation in the dual conformal field theory.

翻訳日:2023-04-20 00:27:25 公開日:2021-03-28

# 量子客観性を守る

Defending Quantum Objectivity ( http://arxiv.org/abs/2103.11530v2 )

ライセンス: Link先を確認

Elias Okon

(参考訳) Masanesによるとされる最近の議論は、量子測定が確定した客観的な結果を持つという仮定が量子力学の予測と両立しないことを示すと主張されている。本研究では、この議論を詳細に検討し、その適用範囲が従来認識されていたよりもはるかに狭いことを示す。具体的には、(i)この議論は特定の特徴を持つ隠れた変数モデルにのみ適用されること、(ii)パイロット波理論を含むほとんどの隠れた変数モデルにはその特徴が存在しないことがわかる。この議論は量子測定の客観性に疑問を投げかけることに成功していないと結論づける。

A recent argument, attributed to Masanes, is claimed to show that the assumption that quantum measurements have definite, objective outcomes, is incompatible with quantum predictions. In this work, a detailed examination of the argument shows that it has a much narrower field of application than previously recognized. In particular, it is found: i) that the argument only applies to hidden-variable models with a particular feature; and ii) that such a feature is not present in most hidden-variable models, including pilot-wave theory. It is concluded that the argument does not succeed in calling into question the objectivity of quantum measurements.

翻訳日:2023-04-07 04:48:12 公開日:2021-03-28

# 非自明なバンド間効果:磁気感受性、非線形光学、トポロジカル縮退圧力への応用

Nontrivial interband effect: applications to magnetic susceptibility, nonlinear optics, and topological degeneracy pressure ( http://arxiv.org/abs/2103.13281v2 )

ライセンス: Link先を確認

Nobuyuki Okuma

(参考訳) バンド間効果は、伝統的および現代の固体物理学において重要な概念である。本稿では、与えられた規則を破ることなく削除できない非自明なバンド間効果の理論を提案する。一般の非自明なバンド間効果を、タイト結合ハミルトニアンの全バンド集合の性質を自明性として定義する。非自明なバンド間効果の源泉として、安定なトポロジカル絶縁体、対称性に基づく指標、脆弱なトポロジカル絶縁体、多極/高次トポロジカル絶縁体などを考える。応用として、トポロジカル特性を持つ強結合ハミルトニアンに対する軌道磁気感受性を計算する。さらに,非自明なバンド間効果による機械的特性について考察する。我々は、負の値を取る傾向があるバンド間縮退圧力を定義し、チャーン絶縁体に対して計算する。この計算は固体の力学特性におけるトポロジカルバンド構造の重要性を示している。また、偏光差を特徴とする非線形光学への応用と、絡み合う相互作用系への一般化についても論じる。位相的概念を部分集合として含む非自明なバンド間効果の枠組みは、未探索の概念を見つけるのに役立つかもしれない。

The interband effect is an important concept both in traditional and modern solid-state physics. In this paper, we present a theory of the nontrivial interband effect, which cannot be removed without breaking given rules. We define the general nontrivial interband effect by regarding a property of the set of the total bands of a tight-binding Hamiltonian as the triviality. As examples of the source of the nontrivial interband effect, we consider several topological concepts: stable topological insulator, symmetry-based indicator, fragile topological insulator, and multipole/higher-order topological insulator. As an application, we calculate the orbital magnetic susceptibility for tight-binding Hamiltonians with topological properties. In addition, we consider the mechanical properties induced by the nontrivial interband effect. We define interband-induced degeneracy pressure, which tends to take a negative value, and calculate it for the Chern insulator. This calculation demonstrates the importance of topological band structures in mechanical properties of solids. We also discuss the application to nonlinear optics characterized by the polarization difference and the generalization to interacting systems with entanglement. The framework of the nontrivial interband effect, which includes topological concepts as subsets, might be useful for finding unexplored concepts.

翻訳日:2023-04-06 23:43:43 公開日:2021-03-28

# 量子フォトニックデバイス用半導体ナノ構造の液滴エピタキシー

Droplet Epitaxy of Semiconductor Nanostructures for Quantum Photonic Devices ( http://arxiv.org/abs/2103.15083v1 )

ライセンス: Link先を確認

Massimo Gurioli, Zhiming Wang, Armando Rastelli, Takashi Kuroda, Stefano Sanguinetti

(参考訳) 長年の夢である量子インターネットは、飛行量子ビットで結ばれた量子ノード(固体系または原子系)のネットワークで構成される。飛行量子ビットには、光速で長距離を伝搬し、デコヒーレンスが無視できるほど小さい光子を用いるのが自然である。鍵となる構成要素は、単一光子またはもつれ光子対を供給できる光源である。さまざまなプラットフォームの中でも、半導体量子ドットは、小型チップ上で他のフォトニック部品や電子部品と集積できるため、非常に魅力的である。1990年代初頭、自己形成型エピタキシャル半導体量子ドット(QD)、すなわち人工原子を作製する2つの手法、Stranski-Krastanov(SK)法と液滴エピタキシー(DE)法が開発された。その頑健さと単純さのため、SK法は基礎分野と技術分野の両方でいくつものブレークスルーを達成するための主力手法となった。それでも、特定の発光波長や構造的・光学的性質への要求から、高品質な半導体ナノ構造を得るための相補的な経路として、DE法とそのより新しい発展形である局所液滴エッチング(LDE)の研究が進められてきた。高度にもつれた光子対の生成と良好な光子の識別不可能性に関する最近の報告は、DEおよびLDE QDが量子エミッタとして従来のSK InGaAs QDを補完し(ときには上回り)うることを示唆している。本稿では、DEとLDEの現状を批判的に概観し、量子通信と量子技術への応用の観点から、その利点と弱点、これまでの成果、未解決の課題を述べる。

The long-dreamed quantum internet would consist of a network of quantum nodes (solid-state or atomic systems) linked by flying qubits, naturally based on photons, travelling over long distances at the speed of light, with negligible decoherence. A key component is a light source, able to provide single or entangled photon pairs. Among the different platforms, semiconductor quantum dots are very attractive, as they can be integrated with other photonic and electronic components in miniaturized chips. In the early 1990s two approaches were developed to synthesize self-assembled epitaxial semiconductor quantum dots (QDs), or artificial atoms, namely the Stranski-Krastanov (SK) and the droplet epitaxy (DE) method. Because of its robustness and simplicity, the SK method became the workhorse to achieve several breakthroughs in both fundamental and technological areas. The need for specific emission wavelengths or structural and optical properties has nevertheless motivated further research on the DE method and its more recent development, the local-droplet-etching (LDE), as complementary routes to obtain high-quality semiconductor nanostructures. The recent reports on the generation of highly entangled photon pairs, combined with good photon indistinguishability, suggest that DE and LDE QDs may complement (and sometimes even outperform) conventional SK InGaAs QDs as quantum emitters. We present here a critical survey of the state of the art of DE and LDE, highlighting the advantages and weaknesses, the obtained achievements and the still open challenges, in view of applications in quantum communication and technology.

翻訳日:2023-04-06 08:11:29 公開日:2021-03-28

# 干渉集合法による複雑な分子と格子のコンダクタンス零点

Conductance zeros in complex molecules and lattices from the interference set method ( http://arxiv.org/abs/2103.15082v1 )

ライセンス: Link先を確認

M. Nita, M. Tolea, D.C. Marinescu

(参考訳) 離散ハミルトニアンで記述できる化学分子と有限の物理格子において、破壊的量子干渉(DQI)とその電子輸送への影響を調べる。干渉集合と呼ぶ特別に指定された集合の任意の2点間にコンダクタンス零点が存在することが知られている二部系から出発し、ダイソン方程式を用いて、必ずしも二部系ではない複雑な系におけるコンダクタンス零点を決定する一般的なアルゴリズムを構築する。この手順をフルベン分子への適用例で説明する。外部摂動に対するコンダクタンス零点の安定性も解析する。

Destructive quantum interference (DQI) and its effects on electron transport are studied in chemical molecules and finite physical lattices that can be described by a discrete Hamiltonian. Starting from a bipartite system whose conductance zeros are known to exist between any two points of a specially designated set, the interference set, we use the Dyson equation to develop a general algorithm for determining the zero-conductance points in complex systems, which are not necessarily bipartite. We illustrate this procedure as it applies to the fulvene molecule. The stability of the conductance zeros is analyzed with respect to external perturbations.

翻訳日:2023-04-06 08:10:58 公開日:2021-03-28

# 量子鳩の巣パラドックスの実験的実証

Experimental demonstration of quantum pigeonhole paradox ( http://arxiv.org/abs/2103.15070v1 )

ライセンス: Link先を確認

Ming-Cheng Chen, Chang Liu, Yi-Han Luo, He-Liang Huang, Bi-Ying Wang, Xi-Lin Wang, Li Li, Nai-Le Liu, Chao-Yang Lu, Jian-Wei Pan

(参考訳) 明確に定義された事前・事後選択アンサンブルにおいて、3つの単一光子が2つの偏光チャネルを通過するとき、弱い強度の測定ではどの2つの光子も同じ偏光チャネルに存在しないことを実験的に実証する。これは量子鳩の巣パラドックスと呼ばれる、直観に反する量子的な数え上げ効果である。さらに、この効果は2次の測定では成り立たなくなることを示す。これらの結果は、量子鳩の巣パラドックスの存在とそれが成立する領域を示している。

We experimentally demonstrate that when three single photons transmit through two polarization channels, in a well-defined pre- and postselected ensemble, there are no two photons in the same polarization channel by weak-strength measurement, a counter-intuitive quantum counting effect called quantum pigeonhole paradox. We further show that this effect breaks down in second-order measurement. These results indicate the existence of quantum pigeonhole paradox and its operating regime.

翻訳日:2023-04-06 08:10:47 公開日:2021-03-28

# TransICD:説明可能なICD符号化のためのトランスフォーマーに基づくコードワイズアテンションモデル

TransICD: Transformer Based Code-wise Attention Model for Explainable ICD Coding ( http://arxiv.org/abs/2104.10652v1 )

ライセンス: Link先を確認

Biplob Biswas, Thai-Hoang Pham, Ping Zhang

(参考訳) 国際疾病分類(ICD)コーディングは、診療記録に診断コードを付与する作業であり、医療分野の請求システムにおいて有効かつ不可欠であることが示されている。現在、ICDコードは臨床記録に手作業で割り当てられており、多くの誤りを招きやすい。さらに、熟練したコーダーの育成にも時間と人的資源が必要である。したがって、ICDコード決定プロセスの自動化は重要な課題である。人工知能理論と計算ハードウェアの進歩により、機械学習アプローチがこのプロセスを自動化する適切な解決策として登場した。本研究では、トランスフォーマーベースのアーキテクチャを適用して文書内のトークン間の相互依存を捉え、コードワイズアテンション機構を用いて文書全体のコード固有表現を学習する。最後に、これらの表現を、対応するコードを予測するための個別の全結合層に入力する。さらに、臨床データセットにおけるコード頻度の不均衡に対処するため、ラベル分布を考慮したマージン(LDAM)損失関数を用いる。MIMIC-IIIデータセットでの実験結果は、提案モデルが他のベースラインを大幅に上回ることを示している。特に、最良の設定ではマイクロAUCスコア0.923を達成し、双方向リカレントニューラルネットワークの0.868を上回る。また、コードワイズアテンション機構を用いることで、モデルが予測に関するより多くの洞察を提供でき、臨床医の信頼できる意思決定を支援できることを示す。コードはオンラインで入手できる(https://github.com/biplob1ly/TransICD)。

The International Classification of Diseases (ICD) coding procedure, which refers to tagging medical notes with diagnosis codes, has been shown to be effective and crucial to the billing system in the medical sector. Currently, ICD codes are assigned to a clinical note manually, which is likely to cause many errors. Moreover, training skilled coders also requires time and human resources. Therefore, automating the ICD code determination process is an important task. With the advancement of artificial intelligence theory and computational hardware, the machine learning approach has emerged as a suitable solution to automate this process. In this project, we apply a transformer-based architecture to capture the interdependence among the tokens of a document and then use a code-wise attention mechanism to learn code-specific representations of the entire document. Finally, they are fed to separate dense layers for corresponding code prediction. Furthermore, to handle the imbalance in the code frequency of clinical datasets, we employ a label distribution aware margin (LDAM) loss function. The experimental results on the MIMIC-III dataset show that our proposed model outperforms other baselines by a significant margin. In particular, our best setting achieves a micro-AUC score of 0.923 compared to 0.868 of bidirectional recurrent neural networks. We also show that by using the code-wise attention mechanism, the model can provide more insights about its prediction, and thus it can support clinicians to make reliable decisions. Our code is available online (https://github.com/biplob1ly/TransICD).
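
The LDAM loss mentioned in the abstract can be illustrated with a minimal sketch. It assumes the usual multi-class form of LDAM, in which the true-class logit is reduced by a margin proportional to n_j^(-1/4) before a softmax cross-entropy; how the paper adapts this to multi-label ICD coding is not stated in the abstract. The class counts, logits, and function names below are hypothetical.

```python
import math

def ldam_margins(class_counts, max_margin=0.5):
    """Per-class margins proportional to n_j^(-1/4), rescaled so the largest equals max_margin."""
    raw = [n ** -0.25 for n in class_counts]
    scale = max_margin / max(raw)
    return [scale * r for r in raw]

def ldam_loss(logits, target, margins):
    """Softmax cross-entropy after subtracting the target class margin from its logit."""
    shifted = list(logits)
    shifted[target] -= margins[target]
    peak = max(shifted)  # subtract the max before exponentiating for numerical stability
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in shifted))
    return log_norm - shifted[target]

counts = [1000, 10]  # hypothetical code frequencies: one common code, one rare code
margins = ldam_margins(counts)
plain = ldam_loss([2.0, 1.0], 1, [0.0, 0.0])   # ordinary cross-entropy
with_margin = ldam_loss([2.0, 1.0], 1, margins)  # rare class is penalized more
```

The rare class receives the larger margin, so a misclassified rare example incurs a higher loss than under plain cross-entropy, which is the intended counterweight to frequency imbalance.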

翻訳日:2023-04-06 08:05:40 公開日:2021-03-28

# BCNN: バイナリ複素ニューラルネットワーク

BCNN: Binary Complex Neural Network ( http://arxiv.org/abs/2104.10044v1 )

ライセンス: Link先を確認

Yanfei Li, Tong Geng, Ang Li, Huimin Yu

(参考訳) 二値化ニューラルネットワーク(BNN)は、リソースの限られたハードウェアを用いるエッジ側アプリケーションで大いに有望であるが、精度低下の懸念がある。複素ニューラルネットワークに着想を得て、本稿ではBNNに複素表現を導入し、バイナリ複素ニューラルネットワーク(BCNN)を提案する。これは、二値の複素入力と重みを複素畳み込みで処理しながら、BNNの優れた計算効率を享受できる新しいネットワーク設計である。高速な収束を確保するため、BCNNに基づく新しいバッチ正規化関数と重み初期化関数を提案する。最先端のネットワークモデル(ResNet、ResNetE、NINなど)を用いたCifar10とImageNetでの実験結果から、BCNNは元のBNNモデルよりも高い精度を達成できることが示された。BCNNは、複素表現によって学習能力を強化し、複素数値の入力データへ適用範囲を広げることでBNNを改善する。BCNNのソースコードはGitHubで公開される予定である。

Binarized neural networks, or BNNs, show great promise in edge-side applications with resource limited hardware, but raise the concerns of reduced accuracy. Motivated by the complex neural networks, in this paper we introduce complex representation into the BNNs and propose Binary complex neural network -- a novel network design that processes binary complex inputs and weights through complex convolution, but still can harvest the extraordinary computation efficiency of BNNs. To ensure fast convergence rate, we propose novel BCNN based batch normalization function and weight initialization function. Experimental results on Cifar10 and ImageNet using state-of-the-art network models (e.g., ResNet, ResNetE and NIN) show that BCNN can achieve better accuracy compared to the original BNN models. BCNN improves BNN by strengthening its learning capability through complex representation and extending its applicability to complex-valued input data. The source code of BCNN will be released on GitHub.

翻訳日:2023-04-06 08:05:17 公開日:2021-03-28

# CyberLearning: サイバー異常とマルチアタックを検出する機械学習セキュリティモデリングの有効性分析

CyberLearning: Effectiveness Analysis of Machine Learning Security Modeling to Detect Cyber-Anomalies and Multi-Attacks ( http://arxiv.org/abs/2104.08080v1 )

ライセンス: Link先を確認

Iqbal H. Sarker

(参考訳) サイバー異常や攻撃の検出は、近年サイバーセキュリティ分野でますます大きな関心事となっている。人工知能の知識、特に機械学習技術は、これらの問題への対処に利用できる。しかし、学習ベースのセキュリティモデルの有効性は、セキュリティ特徴量やデータの特性によって変わりうる。本稿では、相関特徴選択を伴う機械学習ベースのサイバーセキュリティモデリング「CyberLearning」と、各種の機械学習ベースのセキュリティモデルの有効性に関する包括的な実証分析を提示する。CyberLearningモデリングでは、異常検出のための二値分類モデルと、各種サイバー攻撃に対する多クラス分類モデルを考慮する。セキュリティモデルの構築には、まず、ナイーブベイズ、ロジスティック回帰、確率的勾配降下法、k近傍法、サポートベクターマシン、決定木、ランダムフォレスト、適応ブースティング、eXtreme Gradient Boosting、線形判別分析という10種類の代表的な機械学習分類手法を用いる。次に、複数の隠れ層を考慮した人工ニューラルネットワークベースのセキュリティモデルを提示する。これらの学習ベースのセキュリティモデルの有効性を、最も広く使われている2つのセキュリティデータセットUNSW-NB15とNSL-KDDを用いた一連の実験で検証する。本稿は全体として、サイバーセキュリティの文脈における実験的分析と知見を通じて、データ駆動型セキュリティモデリングの参照点となることを目指す。

Detecting cyber-anomalies and attacks is becoming a rising concern these days in the domain of cybersecurity. The knowledge of artificial intelligence, particularly the machine learning techniques, can be used to tackle these issues. However, the effectiveness of a learning-based security model may vary depending on the security features and the data characteristics. In this paper, we present "CyberLearning", a machine learning-based cybersecurity modeling approach with correlated-feature selection, and a comprehensive empirical analysis on the effectiveness of various machine learning based security models. In our CyberLearning modeling, we take into account a binary classification model for detecting anomalies, and a multi-class classification model for various types of cyber-attacks. To build the security model, we first employ ten popular machine learning classification techniques, namely naive Bayes, Logistic regression, Stochastic gradient descent, K-nearest neighbors, Support vector machine, Decision Tree, Random Forest, Adaptive Boosting, eXtreme Gradient Boosting, and Linear discriminant analysis. We then present the artificial neural network-based security model considering multiple hidden layers. The effectiveness of these learning-based security models is examined by conducting a range of experiments utilizing the two most popular security datasets, UNSW-NB15 and NSL-KDD. Overall, this paper aims to serve as a reference point for data-driven security modeling through our experimental analysis and findings in the context of cybersecurity.

翻訳日:2023-04-06 08:04:58 公開日:2021-03-28

# デジタル超空間の数学

Mathematics of Digital Hyperspace ( http://arxiv.org/abs/2103.15203v1 )

ライセンス: Link先を確認

Jeremy Kepner, Timothy Davis, Vijay Gadepally, Hayden Jananthan, Lauren Milechin

(参考訳) ソーシャルメディア、eコマース、ストリーミングビデオ、電子メール、クラウドドキュメント、Webページ、トラフィックフロー、ネットワークパケットは、私たちが毎日使っている巨大なデジタル湖、川、海を埋めます。このデジタルハイパースペースは、型と次元の標準概念を拡張する連続ストリームによって支えられるデータのアモルファスフローである。デジタル超空間の非構造化データは、超グラフ、超疎行列、連想配列代数の数学を通じてエレガントに表現、横断、変換することができる。本稿では,グラフ解析,データベース操作,機械学習に不可欠な操作を提供するためのセミリングのペアを組み合わせた,新しい数学的概念であるsemilinkについて検討する。 GraphBLAS標準は現在、ハイパーグラフ、ハイパースパース行列、セミリンクに必要な数学をサポートし、グラフ、ネットワーク、行列操作をシームレスに実行する。キーベースのインデックス(文字列へのポインタなど)とセミリンクの追加により、GraphBLASはよりリッチな連想配列代数となり、スプレッドシート、データベーステーブル、データ中心のオペレーティングシステムのプラグイン代替となり、デジタルハイパースペースで見つかった非構造化データのナビゲーションが強化される。

Social media, e-commerce, streaming video, e-mail, cloud documents, web pages, traffic flows, and network packets fill vast digital lakes, rivers, and oceans that we each navigate daily. This digital hyperspace is an amorphous flow of data supported by continuous streams that stretch standard concepts of type and dimension. The unstructured data of digital hyperspace can be elegantly represented, traversed, and transformed via the mathematics of hypergraphs, hypersparse matrices, and associative array algebra. This paper explores a novel mathematical concept, the semilink, that combines pairs of semirings to provide the essential operations for graph analytics, database operations, and machine learning. The GraphBLAS standard currently supports hypergraphs, hypersparse matrices, the mathematics required for semilinks, and seamlessly performs graph, network, and matrix operations. With the addition of key based indices (such as pointers to strings) and semilinks, GraphBLAS can become a richer associative array algebra and be a plug-in replacement for spreadsheets, database tables, and data centric operating systems, enhancing the navigation of unstructured data found in digital hyperspace.
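
As a rough illustration of the semiring building block the abstract refers to, the sketch below multiplies a sparse matrix stored as an associative array (string keys) by a sparse vector, with the "add" and "multiply" operations supplied by the caller. It does not implement the paper's semilink construction and does not use the GraphBLAS API; the function name and the toy graph are assumptions made for this example.

```python
import operator

def semiring_matvec(matrix, vector, add, mul, zero):
    """Sparse matrix-vector product over a caller-supplied semiring.

    matrix: dict mapping (row_key, col_key) -> value, stored entries only.
    vector: dict mapping col_key -> value, stored entries only.
    """
    result = {}
    for (row, col), a in matrix.items():
        if col in vector:
            term = mul(a, vector[col])
            result[row] = add(result.get(row, zero), term)
    return result

INF = float("inf")
# Toy directed graph keyed by strings; entry (dst, src) holds the weight of edge src -> dst.
edges = {("b", "a"): 2.0, ("c", "a"): 5.0, ("c", "b"): 1.0}

# Plus-times semiring: ordinary linear algebra.
counts = semiring_matvec(edges, {"a": 1.0}, operator.add, operator.mul, 0.0)

# Min-plus semiring: each product relaxes shortest-path distances by one hop.
dist = {"a": 0.0}
for _ in range(2):
    step = semiring_matvec(edges, dist, min, operator.add, INF)
    dist = {k: min(dist.get(k, INF), step.get(k, INF)) for k in dist.keys() | step.keys()}
```

Swapping the semiring changes the meaning of the same traversal: plus-times accumulates weights, while min-plus finds that the cheapest route from `a` to `c` goes through `b`.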

翻訳日:2023-04-06 08:03:08 公開日:2021-03-28

# sequence-to-sequence VAEは文のグローバルな特徴を学習するか?

Do sequence-to-sequence VAEs learn global features of sentences? ( http://arxiv.org/abs/2004.07683v2 )

ライセンス: Link先を確認

Tom Bosc and Pascal Vincent

(参考訳) 自己回帰言語モデルは強力で、訓練も比較的容易である。しかし、これらのモデルは通常、明示的な条件付けラベルなしで訓練され、生成時に感情やトピックといったグローバルな側面を制御する簡単な方法を提供しない。Bowman & al. (2016) は、変分オートエンコーダ(VAE)をsequence-to-sequenceアーキテクチャで自然言語に適応させ、潜在ベクトルがそのようなグローバルな特徴を教師なしで捉えられると主張した。我々はこの主張に疑問を呈する。文中の位置ごとに再構成損失を分解することで、潜在情報から最も恩恵を受ける単語を測定する。この方法により、VAEは最初の数語と文の長さを記憶しやすく、有用性の限られた局所的な特徴を生み出すことがわかった。これを緩和するため、bag-of-words仮定と言語モデルの事前学習に基づく代替アーキテクチャを検討する。これらの変種は、よりグローバルな、すなわちトピックや感情のラベルをより良く予測できる潜在変数を学習する。さらに、再構成結果を用いて、これらの変種では記憶が減少することを観察した。最初の単語と文の長さはベースラインほど正確には復元されず、その結果、より多様な再構成が得られる。

Autoregressive language models are powerful and relatively easy to train. However, these models are usually trained without explicit conditioning labels and do not offer easy ways to control global aspects such as sentiment or topic during generation. Bowman & al. (2016) adapted the Variational Autoencoder (VAE) for natural language with the sequence-to-sequence architecture and claimed that the latent vector was able to capture such global features in an unsupervised manner. We question this claim. We measure which words benefit most from the latent information by decomposing the reconstruction loss per position in the sentence. Using this method, we find that VAEs are prone to memorizing the first words and the sentence length, producing local features of limited usefulness. To alleviate this, we investigate alternative architectures based on bag-of-words assumptions and language model pretraining. These variants learn latent variables that are more global, i.e., more predictive of topic or sentiment labels. Moreover, using reconstructions, we observe that they decrease memorization: the first word and the sentence length are not recovered as accurately as with the baselines, consequently yielding more diverse reconstructions.

翻訳日:2022-12-12 21:02:13 公開日:2021-03-28

# Red-GAN: 条件付き生成によるクラス不均衡への対処。皮膚病変ダーモスコピーと脳腫瘍MRIにおける医用画像合成のもう一つの視点

Red-GAN: Attacking class imbalance via conditioned generation. Yet another perspective on medical image synthesis for skin lesion dermoscopy and brain tumor MRI ( http://arxiv.org/abs/2004.10734v4 )

ライセンス: Link先を確認

Ahmad B Qasim, Ivan Ezhov, Suprosanna Shit, Oliver Schoppe, Johannes C Paetzold, Anjany Sekuboyina, Florian Kofler, Jana Lipkova, Hongwei Li, Bjoern Menze

(参考訳) データが乏しい状況下で学習アルゴリズムを活用することは、医用画像分野の制約であり現実でもある。この問題を緩和するために、敵対的生成ネットワークに基づくデータ拡張プロトコルを提案する。ネットワークを、ピクセルレベルの情報(セグメンテーションマスク)とグローバルレベルの情報(撮像環境または病変の種類)で条件付ける。このような条件付けにより、合成画像のクラス固有の大域的な外観を制御しつつ、画像とラベルの対を直ちに得ることができる。セグメンテーションタスクに関連する特徴の合成を促すため、セグメンタという形の追加の受動的プレイヤーを敵対的ゲームに導入する。このアプローチをBraTSとISICの2つの医用データセットで検証する。合成画像を訓練セットに注入してクラス分布を制御することで、データセットの各クラスの精度水準を制御できる。

Exploiting learning algorithms under scarce data regimes is a limitation and a reality of the medical imaging field. In an attempt to mitigate the problem, we propose a data augmentation protocol based on generative adversarial networks. We condition the networks on pixel-level (segmentation mask) and global-level (acquisition environment or lesion type) information. Such conditioning provides immediate access to the image-label pairs while controlling global class specific appearance of the synthesized images. To stimulate synthesis of the features relevant for the segmentation task, an additional passive player in the form of a segmentor is introduced into the adversarial game. We validate the approach on two medical datasets: BraTS, ISIC. By controlling the class distribution through injection of synthetic images into the training set we achieve control over the accuracy levels of the datasets' classes.

翻訳日:2022-12-10 18:06:51 公開日:2021-03-28

# Logical Team Q-learning: 協調的MARLにおける因子化方策へのアプローチ

Logical Team Q-learning: An approach towards factored policies in cooperative MARL ( http://arxiv.org/abs/2006.03553v2 )

ライセンス: Link先を確認

Lucas Cassano and Ali H. Sayed

(参考訳) 協調的MARLシナリオにおいて因子化された方策を学習するという課題に取り組む。特に、エージェントのチームが協力して共通のコストを最適化する状況を考える。目標は、結果として得られる共同方策が最適となるように、各エージェントの個々の行動を決定する因子化された方策を得ることである。本研究の主な貢献は、Logical Team Q-learning(LTQL)の導入である。LTQLは環境に関する仮定に依存しないため、あらゆる協調的MARLシナリオに一般に適用できる。LTQLは、本研究で導入する動的計画法の確率近似として導出される。最後に、これらの主張を裏付ける実験(表形式と深層の両方の設定)を示す。

We address the challenge of learning factored policies in cooperative MARL scenarios. In particular, we consider the situation in which a team of agents collaborates to optimize a common cost. The goal is to obtain factored policies that determine the individual behavior of each agent so that the resulting joint policy is optimal. The main contribution of this work is the introduction of Logical Team Q-learning (LTQL). LTQL does not rely on assumptions about the environment and hence is generally applicable to any collaborative MARL scenario. We derive LTQL as a stochastic approximation to a dynamic programming method we introduce in this work. We conclude the paper by providing experiments (both in the tabular and deep settings) that illustrate the claims.

翻訳日:2022-11-25 02:42:09 公開日:2021-03-28

# 非凸正則化項を用いたランク最小化のための低ランク分解

Low-Rank Factorization for Rank Minimization with Nonconvex Regularizers ( http://arxiv.org/abs/2006.07702v2 )

ライセンス: Link先を確認

April Sagan, John E. Mitchell

(参考訳) ランク最小化は、推薦システムやロバスト主成分分析といった機械学習の応用で関心を集めている。ランク最小化問題の凸緩和である核ノルムの最小化は、強い性能保証のもとで問題を解く有効な手法である。しかし、非凸緩和は核ノルムよりも推定バイアスが小さく、測定値に対するノイズの影響をより正確に低減できる。本研究では、反復再重み付け核ノルム法に基づく効率的なアルゴリズムを開発し、BurerとMonteiroが提案した半正定値計画の低ランク分解も利用する。収束を証明し、凸緩和や交互最小化法に対する優位性を数値計算で示す。さらに、本アルゴリズムの各反復の計算量は他の最先端アルゴリズムと同程度であり、大規模な行列に対するランク最小化問題の解を高速に求めることができる。

Rank minimization is of interest in machine learning applications such as recommender systems and robust principal component analysis. Minimizing the convex relaxation to the rank minimization problem, the nuclear norm, is an effective technique to solve the problem with strong performance guarantees. However, nonconvex relaxations have less estimation bias than the nuclear norm and can more accurately reduce the effect of noise on the measurements. We develop efficient algorithms based on iteratively reweighted nuclear norm schemes, while also utilizing the low rank factorization for semidefinite programs put forth by Burer and Monteiro. We prove convergence and computationally show the advantages over convex relaxations and alternating minimization methods. Additionally, the computational complexity of each iteration of our algorithm is on par with other state of the art algorithms, allowing us to quickly find solutions to the rank minimization problem for large matrices.

翻訳日:2022-11-21 21:36:13 公開日:2021-03-28

# BERTology Meets Biology: タンパク質言語モデルにおける注意の解釈

BERTology Meets Biology: Interpreting Attention in Protein Language Models ( http://arxiv.org/abs/2006.15222v3 )

ライセンス: Link先を確認

Jesse Vig, Ali Madani, Lav R. Varshney, Caiming Xiong, Richard Socher, Nazneen Fatema Rajani

(参考訳) トランスフォーマーアーキテクチャは、タンパク質の分類と生成タスクに有用な表現を学習することが示されている。しかし、これらの表現には解釈可能性の課題がある。本研究では、タンパク質トランスフォーマーモデルを注意(アテンション)というレンズを通して分析するための一連の手法を示す。注意は、(1) タンパク質の折りたたみ構造を捉え、配列上は遠く離れているが三次元構造では空間的に近いアミノ酸同士を結びつけ、(2) タンパク質の主要な機能的構成要素である結合部位を標的とし、(3) 層が深くなるにつれて、より複雑な生物物理学的性質に着目することを示す。この挙動は、3つのトランスフォーマーアーキテクチャ(BERT、ALBERT、XLNet)と2つの異なるタンパク質データセットで一貫している。また、注意とタンパク質構造の相互作用の三次元可視化も示す。可視化と分析のためのコードはhttps://github.com/salesforce/provisで入手できる。

Transformer architectures have proven to learn useful representations for protein classification and generation tasks. However, these representations present challenges in interpretability. In this work, we demonstrate a set of methods for analyzing protein Transformer models through the lens of attention. We show that attention: (1) captures the folding structure of proteins, connecting amino acids that are far apart in the underlying sequence, but spatially close in the three-dimensional structure, (2) targets binding sites, a key functional component of proteins, and (3) focuses on progressively more complex biophysical properties with increasing layer depth. We find this behavior to be consistent across three Transformer architectures (BERT, ALBERT, XLNet) and two distinct protein datasets. We also present a three-dimensional visualization of the interaction between attention and protein structure. Code for visualization and analysis is available at https://github.com/salesforce/provis.

翻訳日:2022-11-16 21:20:58 公開日:2021-03-28

# カメラポーズ問題:姿勢分布バイアス緩和による深度予測の改善

Camera Pose Matters: Improving Depth Prediction by Mitigating Pose Distribution Bias ( http://arxiv.org/abs/2007.03887v2 )

ライセンス: Link先を確認

Yunhan Zhao, Shu Kong, Charless Fowlkes

(参考訳) 単眼深度予測装置は通常、カメラポーズの分布に偏りがある大規模なトレーニングセットで訓練される。その結果、訓練された予測者は、珍しいカメラポーズで撮影されたサンプルをテストするために、信頼できる深さ予測を行うことができない。この問題に対処するために、トレーニングと予測中にカメラのポーズを利用する2つの新しい手法を提案する。まず、幾何学的に一貫した方法で既存のものを摂動することで、より多様な視点で新しいトレーニング例を合成する単純な視点対応データ拡張を提案する。次に,画像当たりのカメラポーズを先行知識として利用し,入力の一部として符号化する条件モデルを提案する。この2つの手法を共同で適用することで、撮影される画像の深度予測が向上することを示す。提案手法は,様々な予測アーキテクチャに適用することで性能が向上することを示す。最後に,実画像上で評価した場合,カメラポーズ分布を明示的にエンコードすることで,合成学習した深度予測器の一般化性能が向上することを示す。

Monocular depth predictors are typically trained on large-scale training sets which are naturally biased with respect to the distribution of camera poses. As a result, trained predictors fail to make reliable depth predictions for testing examples captured under uncommon camera poses. To address this issue, we propose two novel techniques that exploit the camera pose during training and prediction. First, we introduce a simple perspective-aware data augmentation that synthesizes new training examples with more diverse views by perturbing the existing ones in a geometrically consistent manner. Second, we propose a conditional model that exploits the per-image camera pose as prior knowledge by encoding it as a part of the input. We show that jointly applying the two methods improves depth prediction on images captured under uncommon and even never-before-seen camera poses. We show that our methods improve performance when applied to a range of different predictor architectures. Lastly, we show that explicitly encoding the camera pose distribution improves the generalization performance of a synthetically trained depth predictor when evaluated on real images.

翻訳日:2022-11-12 13:15:19 公開日:2021-03-28

# 連成微分方程式に基づく不確かさ系の最適実験設計

Optimal Experimental Design for Uncertain Systems Based on Coupled Differential Equations ( http://arxiv.org/abs/2007.06117v2 )

ライセンス: Link先を確認

Youngjoon Hong, Bongsuk Kwon, and Byung-Jun Yoon

(参考訳) 結合した常微分方程式で記述されるN個の相互作用する振動子からなる、不確実性を持つ蔵本モデルに対する最適実験計画問題を考える。目的は、振動子間の結合強度に含まれる不確実性を効果的に低減できる実験を設計し、それによって不確実な蔵本モデルのロバスト制御のコストを最小化することである。最適な実験を設計するうえで、候補となる実験の運用上の影響を定量化することの重要性を示す。

We consider the optimal experimental design problem for an uncertain Kuramoto model, which consists of N interacting oscillators described by coupled ordinary differential equations. The objective is to design experiments that can effectively reduce the uncertainty present in the coupling strengths between the oscillators, thereby minimizing the cost of robust control of the uncertain Kuramoto model. We demonstrate the importance of quantifying the operational impact of the potential experiments in designing optimal experiments.
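
For readers unfamiliar with the Kuramoto model, the sketch below integrates its standard form, d(theta_i)/dt = w_i + sum_j K_ij sin(theta_j - theta_i), with a forward-Euler step and reports the usual synchrony order parameter. It only illustrates how the coupling strengths K_ij (the uncertain quantities in the abstract) shape the dynamics; the experimental design procedure itself is not reproduced, and the exact normalization used in the paper is an assumption here. All numerical values are hypothetical.

```python
import math

def simulate_kuramoto(phases, freqs, coupling, dt=0.01, steps=2000):
    """Forward-Euler integration of d(theta_i)/dt = w_i + sum_j K_ij * sin(theta_j - theta_i)."""
    theta = list(phases)
    n = len(theta)
    for _ in range(steps):
        rates = [
            freqs[i] + sum(coupling[i][j] * math.sin(theta[j] - theta[i]) for j in range(n))
            for i in range(n)
        ]
        theta = [t + dt * r for t, r in zip(theta, rates)]
    return theta

def order_parameter(theta):
    """Magnitude of the mean of exp(i*theta); a value of 1 means full synchrony."""
    n = len(theta)
    re = sum(math.cos(t) for t in theta) / n
    im = sum(math.sin(t) for t in theta) / n
    return math.hypot(re, im)

start = [0.0, 1.0, 2.0]
freqs = [1.0, 1.0, 1.0]  # identical natural frequencies
coupled = [[0.0 if i == j else 1.0 for j in range(3)] for i in range(3)]
uncoupled = [[0.0] * 3 for _ in range(3)]

r_coupled = order_parameter(simulate_kuramoto(start, freqs, coupled))
r_uncoupled = order_parameter(simulate_kuramoto(start, freqs, uncoupled))
```

With coupling switched on the three phases lock together, while without coupling they keep their initial spread, which is why knowing the coupling strengths matters for any controller that aims at synchronization.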

翻訳日:2022-11-11 06:14:10 公開日:2021-03-28

# 許容的ニューラルヒューリスティックを用いた微分可能プログラムの学習

Learning Differentiable Programs with Admissible Neural Heuristics ( http://arxiv.org/abs/2007.12101v5 )

ライセンス: Link先を確認

Ameesh Shah, Eric Zhan, Jennifer J. Sun, Abhinav Verma, Yisong Yue, Swarat Chaudhuri

(参考訳) ドメイン固有言語のプログラムとして表現される微分可能関数を学習する問題を研究する。このようなプログラム的モデルは合成可能性や解釈可能性といった利点をもたらしうるが、その学習にはプログラム「アーキテクチャ」の組合せ的空間上での最適化が必要である。この最適化問題を、経路がプログラム構文のトップダウン導出を符号化する重み付きグラフ上の探索として定式化する。我々の鍵となる着想は、さまざまなクラスのニューラルネットワークをプログラム空間上の連続緩和と見なすことであり、これを用いて任意の部分プログラムを補完できる。この緩和されたプログラムは微分可能でエンドツーエンドに訓練でき、得られる訓練損失は、組合せ探索を導く近似的に許容的なヒューリスティックとなる。A-starアルゴリズムと反復深化分枝限定探索の上にこのアプローチを実装し、これらのアルゴリズムを用いて3つの系列分類タスクでプログラム的分類器を学習する。実験の結果、これらのアルゴリズムはプログラム学習の最先端手法を上回り、自然な解釈を与え、競争力のある精度を達成するプログラム的分類器を発見することが示された。

We study the problem of learning differentiable functions expressed as programs in a domain-specific language. Such programmatic models can offer benefits such as composability and interpretability; however, learning them requires optimizing over a combinatorial space of program "architectures". We frame this optimization problem as a search in a weighted graph whose paths encode top-down derivations of program syntax. Our key innovation is to view various classes of neural networks as continuous relaxations over the space of programs, which can then be used to complete any partial program. This relaxed program is differentiable and can be trained end-to-end, and the resulting training loss is an approximately admissible heuristic that can guide the combinatorial search. We instantiate our approach on top of the A-star algorithm and an iteratively deepened branch-and-bound search, and use these algorithms to learn programmatic classifiers in three sequence classification tasks. Our experiments show that the algorithms outperform state-of-the-art methods for program learning, and that they discover programmatic classifiers that yield natural interpretations and achieve competitive accuracy.

翻訳日:2022-11-07 11:47:22 公開日:2021-03-28

# リッジ回帰としての時変パラメータ

Time-Varying Parameters as Ridge Regressions ( http://arxiv.org/abs/2009.00401v2 )

ライセンス: Link先を確認

Philippe Goulet Coulombe

(参考訳) 時変パラメータ(TVP)モデルは、経済学で構造変化をモデル化するためによく用いられる。本稿では、それらが実際にはリッジ回帰であることを示す。これにより、状態空間の枠組みよりも計算、チューニング、実装がはるかに容易になる。とりわけ、等価な双対リッジ問題は高次元でも非常に高速に解くことができ、重要な「時間変動の量」は交差検証で調整される。時間変化するボラティリティは2段階のリッジ回帰で扱う。スパース性(どのパラメータが変化し、どのパラメータが変化しないかをアルゴリズムが選択する)や低ランク制約(変動が因子モデルに結びつけられる)を取り入れた拡張も検討する。このアプローチの有用性を示すため、カナダにおける金融政策の変遷の研究に適用する。この応用では約4600個のTVPを推定する必要があるが、これは新しい手法で十分に実行できる課題である。

Time-varying parameters (TVPs) models are frequently used in economics to model structural change. I show that they are in fact ridge regressions. Instantly, this makes computations, tuning, and implementation much easier than in the state-space paradigm. Among other things, solving the equivalent dual ridge problem is computationally very fast even in high dimensions, and the crucial "amount of time variation" is tuned by cross-validation. Evolving volatility is dealt with using a two-step ridge regression. I consider extensions that incorporate sparsity (the algorithm selects which parameters vary and which do not) and reduced-rank restrictions (variation is tied to a factor model). To demonstrate the usefulness of the approach, I use it to study the evolution of monetary policy in Canada. The application requires the estimation of about 4600 TVPs, a task well within the reach of the new method.
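
The claim that a TVP model is a ridge regression can be made concrete with a small sketch. It assumes a single regressor and a random-walk coefficient, so that beta_t = beta_0 + sum of increments u_s up to time t; regressing y on an expanded design in (beta_0, u_1, ..., u_{T-1}) with a ridge penalty on the increments only then yields the coefficient path. This is a primal toy version under those assumptions, not the paper's dual solution or its cross-validation of the penalty, and the helper names and data are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def tvp_ridge_path(x, y, lam):
    """Coefficient path beta_t = beta_0 + sum_{s<=t} u_s, with ridge penalty lam on the u_s only."""
    T = len(y)
    # Column 0 carries beta_0; column s >= 1 carries increment u_s, active from time s onward.
    z = [[x[t] if s <= t else 0.0 for s in range(T)] for t in range(T)]
    gram = [[sum(z[t][i] * z[t][j] for t in range(T)) for j in range(T)] for i in range(T)]
    for s in range(1, T):
        gram[s][s] += lam  # beta_0 is left unpenalized
    rhs = [sum(z[t][i] * y[t] for t in range(T)) for i in range(T)]
    theta = solve(gram, rhs)
    path, level = [], 0.0
    for value in theta:
        level += value
        path.append(level)
    return path

x = [1.0] * 12
y = [1.0] * 6 + [2.0] * 6          # noiseless data with one structural break
tight = tvp_ridge_path(x, y, 1e-8)  # almost no penalty: the path follows the break
flat = tvp_ridge_path(x, y, 1e8)    # heavy penalty: the path collapses to a constant
```

The single penalty `lam` plays the role of the "amount of time variation": a small value lets the coefficient jump at the break, while a large value recovers an ordinary constant-coefficient fit.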

翻訳日:2022-10-23 01:54:31 公開日:2021-03-28

# クリーンな参照なしにDenoiserの強化と学習

Enhancing and Learning Denoiser without Clean Reference ( http://arxiv.org/abs/2009.04286v2 )

ライセンス: Link先を確認

Rui Zhao and Daniel P.K. Lun and Kin-Man Lam

(参考訳) 近年,様々なノイズ低減タスクにおいて,学習に基づく画像認識が有望な性能を達成している。これらの深い雑音の多くは、クリーンな参照の監督の下で訓練されるか、合成ノイズの監視を受けていないかのどちらかである。合成ノイズの仮定は、実際の写真に直面する際の一般化を損なう。この問題に対処するために,ノイズ伝達タスクの特別な場合として,ノイズ低減タスクを考慮し,新しい深部画像デオライズ手法を提案する。学習ノイズ伝達により、破損したサンプルを観察することで、ネットワークがノイズ除去能力を取得することができる。実世界の雑音除去ベンチマークの結果,提案手法は現実的な雑音除去に有望な性能を発揮でき,実用的な雑音低減問題に対する潜在的な解決策となることが示された。

Recent studies on learning-based image denoising have achieved promising performance on various noise reduction tasks. Most of these deep denoisers are trained either under the supervision of clean references, or unsupervised on synthetic noise. The synthetic-noise assumption leads to poor generalization on real photographs. To address this issue, we propose a novel deep image-denoising method by regarding the noise reduction task as a special case of the noise transference task. Learning noise transference enables the network to acquire the denoising ability by observing the corrupted samples. The results on real-world denoising benchmarks demonstrate that our proposed method achieves promising performance in removing realistic noise, making it a potential solution to practical noise reduction problems.

翻訳日:2022-10-20 11:57:21 公開日:2021-03-28

# PAL : プレテキストに基づくアクティブラーニング

PAL : Pretext-based Active Learning ( http://arxiv.org/abs/2010.15947v3 )

ライセンス: Link先を確認

Shubhang Bhatnagar, Sachin Goyal, Darshan Tank, Amit Sethi

(参考訳) プールベースのアクティブラーニングの目標は、教師付き学習者の精度を最大化するために、プールからラベルのないサンプルの固定サイズのサブセットを選択して、oracleにラベルを問い合わせることである。しかし、oracleが常に正しいラベルを割り当てるべきという不必要な要件は、ほとんどの状況において理不尽です。提案手法は,従来の提案手法よりも,誤ラベルに頑健な深層ニューラルネットワークの能動的学習手法を提案する。従来の手法は、未ラベルサンプルの新規性を推定するためにタスクネットワーク自体に依存していたが、タスクの学習(一般化)とサンプルの選択(分布外検出)は相反する。ラベルのないサンプルを選別するために、別ネットワークを使用します。スコアリングネットワークは、ラベル付きサンプルの分布をモデル化し、潜在的にノイズの多いラベルへの依存性を減らすための自己スーパービジョンに依存している。また,多タスク学習による正規化のためのスコアリングネットワーク上に別のヘッドを配置し,異常な自己分散型ハイブリットスコアリング機能を利用する。さらに,各クエリをラベル付けする前にサブクエリに分割することで,クエリが多種多様なサンプルを持つことを保証する。オラクルによるサンプルの誤ラベルに対する耐性が高いことに加えて、この結果の手法はラベルノイズのない場合の競合精度も生み出す。この技術は、これらのクラスのサンプリング率を一時的に増加させることで、新しいクラスをオンザフライで導入する処理も行う。

The goal of pool-based active learning is to judiciously select a fixed-size subset of unlabeled samples from a pool to query an oracle for their labels, in order to maximize the accuracy of a supervised learner. However, the unstated requirement that the oracle should always assign correct labels is unreasonable in most situations. We propose an active learning technique for deep neural networks that is more robust to mislabeling than previously proposed techniques. Previous techniques rely on the task network itself to estimate the novelty of the unlabeled samples, but learning the task (generalization) and selecting samples (out-of-distribution detection) can be conflicting goals. We use a separate network to score the unlabeled samples for selection. The scoring network relies on self-supervision for modeling the distribution of the labeled samples to reduce the dependency on potentially noisy labels. To counter the paucity of data, we also deploy another head on the scoring network for regularization via multi-task learning and use an unusual self-balancing hybrid scoring function. Furthermore, we divide each query into sub-queries before labeling to ensure that the query has diverse samples. In addition to having a higher tolerance to mislabeling of samples by the oracle, the resultant technique also produces competitive accuracy in the absence of label noise. The technique also handles well the on-the-fly introduction of new classes by temporarily increasing the sampling rate of these classes.

翻訳日:2022-10-01 22:28:06 公開日:2021-03-28

# 統計的推論における濃度不等式

Concentration Inequalities for Statistical Inference ( http://arxiv.org/abs/2011.02258v3 )

ライセンス: Link先を確認

Huiming Zhang, Song Xi Chen

(参考訳) 本稿では, 分布非依存から分布依存まで, サブゲージ変数からサブ指数変数, サブガンマ, サブワイブル変数, および平均から最大濃度まで, 広範囲の数学統計学の非漸近解析において広く用いられている濃度不等式について考察する。このレビューは、これらの設定の結果に新しい結果を与えます。高次元データや推論の普及に伴い、高次元線形回帰やポアソン回帰の文脈における結果も提供される。我々は既知の定数の濃度不等式を説明し、より鋭い定数で既存の境界を改善することを目的とする。

This paper reviews concentration inequalities that are widely employed in non-asymptotic analyses in mathematical statistics across a wide range of settings, from distribution-free to distribution-dependent, from sub-Gaussian to sub-exponential, sub-Gamma, and sub-Weibull random variables, and from the mean to the maximum concentration. The review presents results in these settings, including some new ones. Given the increasing popularity of high-dimensional data and inference, results in the context of high-dimensional linear and Poisson regressions are also provided. We aim to illustrate the concentration inequalities with known constants and to improve existing bounds with sharper constants.

翻訳日:2022-09-29 22:06:20 公開日:2021-03-28

# intentonomy: 人間の意図理解のためのデータセットと研究

Intentonomy: a Dataset and Study towards Human Intent Understanding ( http://arxiv.org/abs/2011.05558v2 )

ライセンス: Link先を確認

Menglin Jia and Zuxuan Wu and Austin Reiter and Claire Cardie and Serge Belongie and Ser-Nam Lim

(参考訳) 画像は1000ワードの価値があり、物理的な視覚的コンテンツを超えた情報を伝達する。本稿では,視覚情報がどのように人間の意図を認識するのに役立つかを分析する目的で,ソーシャルメディア画像の背景にある意図について検討する。この目的に向けて,広範囲の日常シーンをカバーする14K画像からなる意図的データセットIntentonomyを導入する。これらの画像は、社会心理学の分類から派生した28の意図カテゴリで手動で注釈付けされる。次に、視覚情報(オブジェクトとコンテキスト)が人間のモチベーション理解にどの程度寄与するかを体系的に研究した。本研究は,対象クラスや文脈クラスへの参加効果の定量化と,意図分類器を訓練する際のハッシュタグ形式のテキスト情報の定量化を目的としている。その結果,視覚的およびテキスト的情報の意図予測における可観測的効果について,定量的かつ定性的に考察した。

An image is worth a thousand words, conveying information that goes beyond the physical visual content therein. In this paper, we study the intent behind social media images with an aim to analyze how visual information can help the recognition of human intent. Towards this goal, we introduce an intent dataset, Intentonomy, comprising 14K images covering a wide range of everyday scenes. These images are manually annotated with 28 intent categories that are derived from a social psychology taxonomy. We then systematically study whether, and to what extent, commonly used visual information, i.e., object and context, contribute to human motive understanding. Based on our findings, we conduct further study to quantify the effect of attending to object and context classes as well as textual information in the form of hashtags when training an intent classifier. Our results quantitatively and qualitatively shed light on how visual and textual information can produce observable effects when predicting intent.

翻訳日:2022-09-27 00:52:59 公開日:2021-03-28

# 微分型粒子フィルタのエンド・ツー・エンド半教師付き学習

End-To-End Semi-supervised Learning for Differentiable Particle Filters ( http://arxiv.org/abs/2011.05748v2 )

ライセンス: Link先を確認

Hao Wen, Xiongjie Chen, Georgios Papagiannis, Conghui Hu and Yunpeng Li

(参考訳) ニューラルネットワークを粒子フィルタに組み込むことの最近の進歩は、大規模実世界のアプリケーションに粒子フィルタを適用するために望ましい柔軟性を提供する。このフレームワークの動的および測定モデルは、粒子フィルタの微分可能実装により学習可能である。このようなモデルを最適化する過去の努力は、実際に入手したり利用できないほど高価である真の状態の知識を必要とすることが多い。本稿では,アノテートされたデータに対する需要を減らすために,真の状態の大部分が未知である場合の状態を推定し,擬似的様相関数の最大化に基づくエンドツーエンド学習目標を提案する。シミュレーションおよび実世界のデータセットを用いたロボット工学における状態推定タスクにおける提案手法の性能を評価する。

Recent advances in incorporating neural networks into particle filters provide the desired flexibility to apply particle filters in large-scale real-world applications. The dynamic and measurement models in this framework are learnable through the differentiable implementation of particle filters. Past efforts in optimising such models often require knowledge of the true states, which can be expensive to obtain or even unavailable in practice. In this paper, in order to reduce the demand for annotated data, we present an end-to-end learning objective based upon the maximisation of a pseudo-likelihood function, which can improve the estimation of states when a large portion of the true states is unknown. We assess the performance of the proposed method in state estimation tasks in robotics with simulated and real-world datasets.

翻訳日:2022-09-26 23:13:33 公開日:2021-03-28

# 患者テキストからの医学症状認識:長期多ラベル分布に対するアクティブラーニングアプローチ

Medical symptom recognition from patient text: An active learning approach for long-tailed multilabel distributions ( http://arxiv.org/abs/2011.06874v2 )

ライセンス: Link先を確認

Ali Mottaghi, Prathusha K Sarma, Xavier Amatriain, Serena Yeung, Anitha Kannan

(参考訳) 本研究は,患者から関連する情報を収集すること(問診)を目的として,患者テキストからの医療症状認識の問題について検討する。典型的な患者テキストは、患者が経験している症状を記述しており、そのようなテキストの1つの例を複数の症状で"ラベル"することができる。このため、i) 大量のアノテーション付きデータが利用できないこと、ii) 1つのテキストが対応しうる複数の症状からなる未知の空間が大きいことから、医療症状認識器の学習は困難になる。さらに、患者テキストはデータのロングテールで特徴づけられることが多い(すなわち、"fever" 対 "hematochezia" のように、一部のラベル(症状)が他よりも頻繁に発生する)。本稿では,継続的に洗練され学習された潜在空間の構造を活用し,ラベル付けする最も有益な例を選択するアクティブラーニング手法を提案する。これにより、データ分布のロングテールにもかかわらず、学習されたモデルを通して症状空間のカバレッジを徐々に高める最も有益な例を選択できる。

We study the problem of medical symptom recognition from patient text, for the purpose of gathering pertinent information from the patient (known as history-taking). A typical patient text is often descriptive of the symptoms the patient is experiencing, and a single instance of such a text can be "labeled" with multiple symptoms. This makes learning a medical symptom recognizer challenging on account of i) the lack of voluminous annotated data and ii) the large unknown universe of multiple symptoms that a single text can map to. Furthermore, patient text is often characterized by a long tail in the data (i.e., some labels/symptoms occur more frequently than others, e.g., "fever" vs. "hematochezia"). In this paper, we introduce an active learning method that leverages the underlying structure of a continually refined, learned latent space to select the most informative examples to label. This enables the selection of the most informative examples, which progressively increases the coverage of the universe of symptoms via the learned model, despite the long tail in the data distribution.

翻訳日:2022-09-26 06:22:41 公開日:2021-03-28

# あなたの"Flamingo"は私の"Bird":ファイングラインドかノーか

Your "Flamingo" is My "Bird": Fine-Grained, or Not ( http://arxiv.org/abs/2011.09040v3 )

ライセンス: Link先を確認

Dongliang Chang, Kaiyue Pang, Yixiao Zheng, Zhanyu Ma, Yi-Zhe Song, and Jun Guo

(参考訳) 図1で目にするものが"flamingo"なのか"bird"なのかは、この論文で私たちが問う疑問です。きめ細かい視覚分類(FGVC)は前者への到達を試みていますが、ほとんどの場合、非専門家は「鳥」だけで十分でしょう。それゆえ、本当の質問は -- 異なる専門知識のレベルの下で、どのように異なるきめ細かい定義を調整できるのか? そのために、FGVCの従来の設定を、シングルラベルの分類から、事前に定義された粗いラベル階層のトップダウンのトラバーサルへと再検討し、私たちの答えが"bird"->"Phoenicopteriformes"->"Phoenicopteridae"->"flamingo"になるようにしました。この新たな問題に取り組むために、まず、多くの参加者が専門家であるかどうかに関わらず、マルチグラニュラティラベルを好むことを確認するための、包括的な人間研究を行う。粗いレベルのラベル予測は、きめ細かい特徴学習を悪化させるが、細い特徴は粗いレベルの分類器の学習をより良くする。この発見によって私たちは、新しい問題に対して驚くほど効果的なソリューションを設計できます。 (i)粒度の細かい粗い特徴を乱すために、レベル固有の分類ヘッドを利用する。 (ii) よりきめ細かい特徴を粗いラベル予測に組み込むことにより, よりゆがみが良くなる。実験により,本手法は新たなFGVC設定において優れた性能を示し,従来のシングルラベルFGVC問題よりも優れた性能を示した。その単純さにより、既存のFGVCフレームワーク上で容易に実装でき、パラメータフリーである。

Whether what you see in Figure 1 is a "flamingo" or a "bird" is the question we ask in this paper. While fine-grained visual classification (FGVC) strives to arrive at the former, for the majority of us non-experts just "bird" would probably suffice. The real question is therefore -- how can we tailor for different fine-grained definitions under divergent levels of expertise? For that, we re-envisage the traditional setting of FGVC, from single-label classification, to that of top-down traversal of a pre-defined coarse-to-fine label hierarchy -- so that our answer becomes "bird"-->"Phoenicopteriformes"-->"Phoenicopteridae"-->"flamingo". To approach this new problem, we first conduct a comprehensive human study where we confirm that most participants prefer multi-granularity labels, regardless of whether they consider themselves experts. We then discover the key intuition that coarse-level label prediction hurts fine-grained feature learning, yet fine-level features improve the learning of the coarse-level classifier. This discovery enables us to design a very simple albeit surprisingly effective solution to our new problem, where we (i) leverage level-specific classification heads to disentangle coarse-level features from fine-grained ones, and (ii) allow finer-grained features to participate in coarser-grained label predictions, which in turn helps with better disentanglement. Experiments show that our method achieves superior performance in the new FGVC setting, and performs better than the state of the art on the traditional single-label FGVC problem as well. Thanks to its simplicity, our method can be easily implemented on top of any existing FGVC framework and is parameter-free.

翻訳日:2022-09-24 04:12:18 公開日:2021-03-28

# 自動変換検索によるプライバシー保護協調学習

Privacy-preserving Collaborative Learning with Automatic Transformation Search ( http://arxiv.org/abs/2011.12505v2 )

ライセンス: Link先を確認

Wei Gao, Shangwei Guo, Tianwei Zhang, Han Qiu, Yonggang Wen, Yang Liu

(参考訳) 参加者は、トレーニングセットを共有することなく、Deep Learningモデルを共同でトレーニングすることができる。しかし、最近の研究では、敵が共有勾配からセンシティブなトレーニングサンプルを完全に回収できることが判明した。このような再建攻撃は、協調学習に深刻な脅威をもたらす。したがって、効果的な緩和ソリューションが緊急に望まれる。本稿では,データ拡張を利用して再構築攻撃を打倒することを提案する。慎重に選択された変換ポリシーで機密画像を前処理することで,敵が対応する勾配から有用な情報を抽出することは不可能となる。我々は、資格ポリシーを自動的に発見する新しい探索法をデザインする。データプライバシとモデルユーザビリティに対するトランスフォーメーションの影響を定量化するために,私たちは2つの新しいメトリクスを採用しています。包括的評価により,本手法が発見した方針は,協調学習における既存のレコンストラクション攻撃を克服し,高効率かつ無視可能なモデル性能への影響を実証する。

Collaborative learning has gained great popularity due to its benefit of data privacy protection: participants can jointly train a Deep Learning model without sharing their training sets. However, recent work has shown that an adversary can fully recover the sensitive training samples from the shared gradients. Such reconstruction attacks pose severe threats to collaborative learning. Hence, effective mitigation solutions are urgently desired. In this paper, we propose to leverage data augmentation to defeat reconstruction attacks: by preprocessing sensitive images with carefully selected transformation policies, it becomes infeasible for the adversary to extract any useful information from the corresponding gradients. We design a novel search method to automatically discover qualified policies. We adopt two new metrics to quantify the impacts of transformations on data privacy and model usability, which can significantly accelerate the search. Comprehensive evaluations demonstrate that the policies discovered by our method can defeat existing reconstruction attacks in collaborative learning, with high efficiency and negligible impact on the model performance.

翻訳日:2022-09-21 02:55:59 公開日:2021-03-28

# ナビゲーションのための反復的視覚言語bert

A Recurrent Vision-and-Language BERT for Navigation ( http://arxiv.org/abs/2011.13922v2 )

ライセンス: Link先を確認

Yicong Hong, Qi Wu, Yuankai Qi, Cristian Rodriguez-Opazo, Stephen Gould

(参考訳) 多くの視覚言語的タスクの精度は、視覚言語(V&L) BERT の応用から大きな恩恵を受けている。しかし,視覚・言語ナビゲーション(VLN)への応用は依然として限られている。この理由の1つは、BERTアーキテクチャを部分的に観測可能なマルコフ決定プロセスに適合させることが困難であることであり、歴史に依存した注意と意思決定が必要である。本稿では,vln で使用する時間に着目した再帰的 bert モデルを提案する。具体的には、エージェントのクロスモーダル状態情報を保持する再帰関数をBERTモデルに装備する。 R2RとREVERIEに関する広範な実験を通じて、我々のモデルはより複雑なエンコーダデコーダモデルを置き換えて最先端の結果が得られることを示した。さらに,本手法を他のトランスフォーマーアーキテクチャに一般化し,事前学習をサポートし,ナビゲーションと表現タスクの同時参照を可能とした。

The accuracy of many visiolinguistic tasks has benefited significantly from the application of vision-and-language (V&L) BERT. However, its application to the task of vision-and-language navigation (VLN) remains limited. One reason for this is the difficulty of adapting the BERT architecture to the partially observable Markov decision process present in VLN, which requires history-dependent attention and decision making. In this paper we propose a recurrent BERT model that is time-aware for use in VLN. Specifically, we equip the BERT model with a recurrent function that maintains cross-modal state information for the agent. Through extensive experiments on R2R and REVERIE we demonstrate that our model can replace more complex encoder-decoder models to achieve state-of-the-art results. Moreover, our approach can be generalised to other transformer-based architectures, supports pre-training, and is capable of solving navigation and referring expression tasks simultaneously.

翻訳日:2022-09-20 09:12:47 公開日:2021-03-28

# 制約付きリスク逆マルコフ決定過程

Constrained Risk-Averse Markov Decision Processes ( http://arxiv.org/abs/2012.02423v2 )

ライセンス: Link先を確認

Mohamadreza Ahmadi, Ugo Rosolia, Michel D. Ingham, Richard M. Murray, and Aaron D. Ames

(参考訳) マルコフ決定プロセス(MDP)の方針を動的コヒーレントなリスク目標と制約で設計する問題を考察する。まず、問題をLagrangianフレームワークで定式化することから始めます。リスク目標と制約をマルコフリスク遷移マッピングで表現できるという仮定の下で,制約付きリスク回避問題の下限となるマルコフポリシーを合成する最適化ベース手法を提案する。定式化された最適化問題は差分凸プログラム (dcps) の形式であり、disciplined convex-concave programming (dccp) フレームワークによって解決できることを実証する。これらの結果は,制約付きmdpの線形プログラムを,期待コストと制約の合計値で一般化することを示す。最後に,条件値-値-リスク(CVaR)とエントロピー-値-リスク(EVaR)のコヒーレントリスク対策を含むローバーナビゲーション問題に対する数値実験による提案手法の有効性について述べる。

We consider the problem of designing policies for Markov decision processes (MDPs) with dynamic coherent risk objectives and constraints. We begin by formulating the problem in a Lagrangian framework. Under the assumption that the risk objectives and constraints can be represented by a Markov risk transition mapping, we propose an optimization-based method to synthesize Markovian policies that lower-bound the constrained risk-averse problem. We demonstrate that the formulated optimization problems are in the form of difference convex programs (DCPs) and can be solved by the disciplined convex-concave programming (DCCP) framework. We show that these results generalize linear programs for constrained MDPs with total discounted expected costs and constraints. Finally, we illustrate the effectiveness of the proposed method with numerical experiments on a rover navigation problem involving conditional-value-at-risk (CVaR) and entropic-value-at-risk (EVaR) coherent risk measures.
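
As a small illustration of one of the coherent risk measures named above, the following sketch computes the empirical conditional value-at-risk (CVaR) of a finite set of equally likely costs using the Rockafellar-Uryasev minimization form. It is a static, sample-based toy and not the paper's dynamic risk formulation; the function name `cvar` and the convention that `alpha` is the confidence level (larger `alpha` is more risk-averse) are assumptions made for this example.

```python
def cvar(costs, alpha):
    """Empirical CVaR at level alpha (0 <= alpha < 1) of equally likely costs.

    Uses CVaR_alpha(X) = min_z { z + E[(X - z)^+] / (1 - alpha) }.
    The objective is piecewise linear and convex in z with breakpoints at
    the sample values, so it suffices to evaluate it at the samples.
    """
    n = len(costs)

    def objective(z):
        excess = sum(max(c - z, 0.0) for c in costs) / n
        return z + excess / (1.0 - alpha)

    return min(objective(z) for z in costs)
```

With `alpha = 0` the measure reduces to the plain expectation, which is the risk-neutral case of constrained MDPs with expected costs; as `alpha` approaches 1 it approaches the worst-case cost.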

翻訳日:2021-05-22 20:34:03 公開日:2021-03-28

# (参考訳) 因子グラフに基づく推定のための学習触覚モデル

Learning Tactile Models for Factor Graph-based Estimation ( http://arxiv.org/abs/2012.03768v2 )

ライセンス: CC BY 4.0

Paloma Sodhi, Michael Kaess, Mustafa Mukadam, Stuart Anderson

(参考訳) 咬合下での操作時のタッチから物体状態を推定する問題に興味がある。本研究では,平面押下時のタッチから物体のポーズを推定する問題に対処する。視覚ベースの触覚センサーは、接触点におけるリッチで局所的な画像計測を提供する。しかし、そのような測定には限られた情報が含まれており、潜伏状態の推測には複数の測定が必要である。この推論問題を因子グラフを用いて解く。触覚測定をグラフに組み込むためには,高次元の触覚画像を低次元の状態空間にマッピングできる局所観測モデルが必要である。以前の研究では、触覚測定を解釈するために低次元の力測定や工学的機能を使用してきた。しかし、これらの方法は脆く、物体やセンサーにまたがるスケールが困難である。私たちの重要な洞察は、触覚画像からセンサーの相対的な位置を予測する触覚観察モデルを直接学習することだ。これらの相対的なポーズは、因子グラフ内の因子として組み込むことができる。そこで我々は,まず,地中真理データに基づく局所触覚観測モデルを学習し,それらのモデルと物理および幾何学的要素を因子グラフオプティマイザに統合する2段階のアプローチを提案する。 3つの物体形状にまたがる様々な軌跡を持つ150の実世界の平面プッシュシーケンスに対して触覚フィードバックのみを用いて,信頼性の高い物体追跡を行う。追加ビデオ: https://youtu.be/y1kbfsmi8w0

We're interested in the problem of estimating object states from touch during manipulation under occlusions. In this work, we address the problem of estimating object poses from touch during planar pushing. Vision-based tactile sensors provide rich, local image measurements at the point of contact. A single such measurement, however, contains limited information and multiple measurements are needed to infer latent object state. We solve this inference problem using a factor graph. In order to incorporate tactile measurements in the graph, we need local observation models that can map high-dimensional tactile images onto a low-dimensional state space. Prior work has used low-dimensional force measurements or engineered functions to interpret tactile measurements. These methods, however, can be brittle and difficult to scale across objects and sensors. Our key insight is to directly learn tactile observation models that predict the relative pose of the sensor given a pair of tactile images. These relative poses can then be incorporated as factors within a factor graph. We propose a two-stage approach: first we learn local tactile observation models supervised with ground truth data, and then integrate these models along with physics and geometric factors within a factor graph optimizer. We demonstrate reliable object tracking using only tactile feedback for 150 real-world planar pushing sequences with varying trajectories across three object shapes. Supplementary video: https://youtu.be/y1kBfSmi8w0

翻訳日:2021-05-18 13:28:47 公開日:2021-03-28

# リンク予測のための逆順順列ノード表現法

Adversarial Permutation Guided Node Representations for Link Prediction ( http://arxiv.org/abs/2012.08974v2 )

ライセンス: Link先を確認

Indradyumna Roy, Abir De, Soumen Chakrabarti

(参考訳) ソーシャルネットワークのスナップショットを観察した後、リンク予測(LP)アルゴリズムは、将来新たなエッジが成立する可能性のあるノードペアを特定する。ほとんどのLPアルゴリズムは、現在隣接していないノード対のスコアを推定し、このスコアでランク付けする。最近のLPシステムは、ノードの密な低次元ベクトル表現を比較することでこのスコアを計算する。グラフニューラルネットワーク(GNN)、特にグラフ畳み込みネットワーク(GCN)は一般的な例である。 2つのノードを有意義に比較するためには、それらの埋め込みは隣人の並べ替えとは無関係であるべきである。 GNNは通常、この特性を保証するために単純で対称な集合アグリゲータを使用するが、この設計決定は表現力に制限のある表現を生成することが示されている。シーケンスエンコーダはより表現力が高いが、設計上、置換に敏感である。このジレンマを克服する最近の取り組みは、LPタスクに不満足であることが判明した。提案するPermGNNは,リカレントかつオーダーセンシティブなアグリゲータを用いて隣接した特徴を集約し,隣り合う置換の逆生成器によって「攻撃」される場合,LP損失を直接最小化する。設計上、PermGNNは以前の対称アグリゲータよりも表現力が高い。次に、PermGNNのノード埋め込みを適切な局所性に敏感なハッシュにマッピングする最適化フレームワークを考案し、LPタスクのトップ$K$のエッジの報告を高速化する。多様なデータセットに関する実験によれば、PermGNNは最先端のリンク予測器をかなり上回っており、最も可能性の高いエッジを素早く予測できる。

After observing a snapshot of a social network, a link prediction (LP) algorithm identifies node pairs between which new edges will likely materialize in future. Most LP algorithms estimate a score for currently non-neighboring node pairs, and rank them by this score. Recent LP systems compute this score by comparing dense, low dimensional vector representations of nodes. Graph neural networks (GNNs), in particular graph convolutional networks (GCNs), are popular examples. For two nodes to be meaningfully compared, their embeddings should be indifferent to reordering of their neighbors. GNNs typically use simple, symmetric set aggregators to ensure this property, but this design decision has been shown to produce representations with limited expressive power. Sequence encoders are more expressive, but are permutation sensitive by design. Recent efforts to overcome this dilemma turn out to be unsatisfactory for LP tasks. In response, we propose PermGNN, which aggregates neighbor features using a recurrent, order-sensitive aggregator and directly minimizes an LP loss while it is `attacked' by an adversarial generator of neighbor permutations. By design, PermGNN has more expressive power compared to earlier symmetric aggregators. Next, we devise an optimization framework to map PermGNN's node embeddings to a suitable locality-sensitive hash, which speeds up reporting the top-$K$ most likely edges for the LP task. Our experiments on diverse datasets show that PermGNN outperforms several state-of-the-art link predictors by a significant margin, and can predict the most likely edges fast.
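
The dilemma described above, symmetric aggregators being permutation invariant while sequence encoders are order sensitive, can be seen in a toy example. This is only an illustration under assumed names and weights (`sum_aggregate`, `recurrent_aggregate`, `w_in`, `w_rec` are all hypothetical); it is not the aggregator used in PermGNN.

```python
import math


def sum_aggregate(neighbors):
    """Symmetric set aggregator: element-wise sum of neighbor feature vectors."""
    return [sum(column) for column in zip(*neighbors)]


def recurrent_aggregate(neighbors, w_in=0.5, w_rec=0.9):
    """Toy recurrent aggregator: h <- tanh(w_in * x + w_rec * h), element-wise.

    The fixed scalar weights are placeholders for learned parameters.
    """
    h = [0.0] * len(neighbors[0])
    for x in neighbors:  # the result depends on the order of this loop
        h = [math.tanh(w_in * xi + w_rec * hi) for xi, hi in zip(x, h)]
    return h
```

Feeding the same neighbor set in two different orders gives identical sums but different recurrent states, which is why an order-sensitive aggregator has to be trained to behave consistently across neighbor permutations.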

翻訳日:2021-05-09 12:37:11 公開日:2021-03-28

# Informer: 時系列予測のための効率的なトランスフォーマー

Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting ( http://arxiv.org/abs/2012.07436v3 )

ライセンス: Link先を確認

Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, Wancai Zhang

(参考訳) 実世界のアプリケーションの多くは、電力消費計画のような長い時系列の予測を必要とする。長周期時系列予測(LSTF)は、出力と入力の正確な長距離依存性を効率的に捉える能力であるモデルの高い予測能力を必要とする。近年の研究では、トランスフォーマーが予測能力を高める可能性を示している。しかしtransformerには、二次時間の複雑さ、高メモリ使用量、エンコーダ-デコーダアーキテクチャの固有の制限など、lstfに直接適用できないいくつかの深刻な問題がある。 i)$ProbSparse$ self-attention mechanism, 時間複雑性とメモリ使用量で$O(L \log L)$を達成し, シーケンスの依存性アライメントに匹敵する性能を持つ。 (ii)カスケード層入力を半減し、極端に長い入力列を効率的に処理することにより、自己着脱蒸留が注目の高まりを強調する。 (iii)生成型デコーダは概念的には単純であるが、ステップバイステップではなく1回のフォワード操作で長い時系列シーケンスを予測し、長シーケンス予測の推論速度を大幅に向上させる。 4つの大規模なデータセットに対する大規模な実験は、Informerが既存のメソッドを著しく上回り、LSTF問題に対する新しい解決策を提供することを示した。

Many real-world applications require the prediction of long sequence time-series, such as electricity consumption planning. Long sequence time-series forecasting (LSTF) demands a high prediction capacity of the model, which is the ability to capture precise long-range dependency coupling between output and input efficiently. Recent studies have shown the potential of Transformer to increase the prediction capacity. However, there are several severe issues with Transformer that prevent it from being directly applicable to LSTF, including quadratic time complexity, high memory usage, and inherent limitation of the encoder-decoder architecture. To address these issues, we design an efficient transformer-based model for LSTF, named Informer, with three distinctive characteristics: (i) a $ProbSparse$ self-attention mechanism, which achieves $O(L \log L)$ time complexity and memory usage and has comparable performance on sequences' dependency alignment; (ii) self-attention distilling, which highlights dominating attention by halving the cascading layer input and efficiently handles extremely long input sequences; (iii) a generative-style decoder, which, while conceptually simple, predicts long time-series sequences in one forward operation rather than step by step, drastically improving the inference speed of long-sequence predictions. Extensive experiments on four large-scale datasets demonstrate that Informer significantly outperforms existing methods and provides a new solution to the LSTF problem.

翻訳日:2021-05-08 14:38:11 公開日:2021-03-28

# トランスを用いたエンド・ツー・エンドヒューマン・ポースとメッシュ再構成

End-to-End Human Pose and Mesh Reconstruction with Transformers ( http://arxiv.org/abs/2012.09760v2 )

ライセンス: Link先を確認

Kevin Lin, Lijuan Wang, Zicheng Liu

(参考訳) 本研究では,メッシュトランスフォーマタ(metro)と呼ばれる新しい手法を提案し,人間の3次元ポーズとメッシュ頂点を1つの画像から再構成する。本手法では、トランスコーダを用いて頂点-頂点-接合相互作用をモデル化し、3次元ジョイント座標とメッシュ頂点を同時に出力する。ポーズと形状パラメータを回帰する既存の手法と比較して、METROはSMPLのようなパラメトリックメッシュモデルに依存しないので、手などの他のオブジェクトにも容易に拡張できる。さらにメッシュトポロジーを緩和し、トランスフォーマー自着機構が任意の2つの頂点間を自由に参加できるようにし、メッシュ頂点と関節間の非局所関係を学べるようにした。提案するマスキング頂点モデリングは, 部分閉塞などの困難な状況に対してより頑健で効果的な手法である。 METROは、パブリックなHuman3.6Mと3DPWデータセット上で、人間のメッシュ再構築のための新しい最先端の結果を生成する。さらに,METROの3次元手指再構成への一般化性を示し,FreiHANDデータセットにおける既存の最先端手法よりも優れていた。

We present a new method, called MEsh TRansfOrmer (METRO), to reconstruct 3D human pose and mesh vertices from a single image. Our method uses a transformer encoder to jointly model vertex-vertex and vertex-joint interactions, and outputs 3D joint coordinates and mesh vertices simultaneously. Compared to existing techniques that regress pose and shape parameters, METRO does not rely on any parametric mesh model like SMPL, so it can be easily extended to other objects such as hands. We further relax the mesh topology and allow the transformer self-attention mechanism to freely attend between any two vertices, making it possible to learn non-local relationships among mesh vertices and joints. With the proposed masked vertex modeling, our method is more robust and effective in handling challenging situations like partial occlusions. METRO generates new state-of-the-art results for human mesh reconstruction on the public Human3.6M and 3DPW datasets. Moreover, we demonstrate the generalizability of METRO to 3D hand reconstruction in the wild, outperforming existing state-of-the-art methods on the FreiHAND dataset.

翻訳日:2021-05-02 07:23:24 公開日:2021-03-28

# SegGroup: 3DインスタンスとセマンティックセグメンテーションのためのSeg-Levelスーパービジョン

SegGroup: Seg-Level Supervision for 3D Instance and Semantic Segmentation ( http://arxiv.org/abs/2012.10217v2 )

ライセンス: Link先を確認

An Tao, Yueqi Duan, Yi Wei, Jiwen Lu, Jie Zhou

(参考訳) ほとんどの既存のポイントクラウドインスタンスとセマンティックセグメンテーションメソッドは、シーンのすべてのポイントに対してポイントレベルのラベルを必要とする強力な監視信号に大きく依存しています。しかし、このような強い監督はアノテーションのコストの増大に苦しめられ、効率的な注釈研究の必要性が高まる。本稿では,3次元シーンセグメンテーションにおけるインスタンスの位置が重要であることを明らかにする。ロケーションの利点をフルに活用することで、アノテーションの場所を示すためにインスタンス毎に1つのポイントをクリックするだけで、弱教師付きポイントクラウドセグメンテーションアルゴリズムを設計する。事前処理のオーバーセグメンテーションにより、これらの位置アノテーションをセグレベルのラベルとしてセグメントに拡張する。さらにセグメントグループ化ネットワーク(seggroup)を設計、segレベルラベルの下で擬似的なポイントレベルラベルを生成するために、ラベルのないセグメントを関連するラベル付きセグメントに階層的にグループ化することで、既存のポイントレベルの教師付きセグメントモデルがこれらの擬似ラベルを直接使用してトレーニングできるようにする。実験結果から, セグレベル制御法 (SegGroup) は, 完全注釈付き点レベル制御法と同等の結果が得られることがわかった。さらに、固定アノテーション予算が与えられた最近の弱い監督手法よりも優れています。

Most existing point cloud instance and semantic segmentation methods rely heavily on strong supervision signals, which require point-level labels for every point in the scene. However, such strong supervision incurs large annotation costs, motivating the study of efficient annotation. In this paper, we discover that the locations of instances matter for 3D scene segmentation. By taking full advantage of locations, we design a weakly supervised point cloud segmentation algorithm that only requires clicking on one point per instance to indicate its location for annotation. With over-segmentation for pre-processing, we extend these location annotations into segments as seg-level labels. We further design a segment grouping network (SegGroup) to generate pseudo point-level labels under seg-level labels by hierarchically grouping the unlabeled segments into the relevant nearby labeled segments, so that existing point-level supervised segmentation models can directly consume these pseudo labels for training. Experimental results show that our seg-level supervised method (SegGroup) achieves results comparable to fully annotated point-level supervised methods. Moreover, it also outperforms recent weakly supervised methods given a fixed annotation budget.

翻訳日:2021-05-01 18:11:15 公開日:2021-03-28

# BAF検出器:太陽電池欠陥検出のための効率的なCNN検出器

BAF-Detector: An Efficient CNN-Based Detector for Photovoltaic Cell Defect Detection ( http://arxiv.org/abs/2012.10631v2 )

ライセンス: Link先を確認

Binyi Su, Haiyong Chen, Zhong Zhou

(参考訳) 太陽電池(PV)セルエレクトロルミネッセンス(EL)画像のマルチスケール欠陥検出は,ネットワークの深層化に伴う特徴の消失による課題である。この問題に対処するため,マルチスケール機能融合を実現するため,アテンションベースのトップダウン・ボトムアップアーキテクチャを開発した。このアーキテクチャはBAFPN(Bidirectional Attention Feature Pyramid Network)と呼ばれ、ピラミッドのすべての層が同様のセマンティックな特徴を共有することができる。 BAFPNでは、融合特徴における各画素の重要性を測定するためにコサイン類似性を用いる。さらに、高速RCNN+FPNの領域提案ネットワーク(RPN)にBAFPNを埋め込んだ新しい物体検出器BAF-Detectorが提案されている。 BAFPNはネットワークの堅牢性を改善してスケールし,マルチスケール欠陥検出タスクにおいて優れた性能を実現する。最後に,3629画像,2129画像を含む大規模elデータセットにおける実験結果から,本手法は生のpvセルel画像において,マルチスケールの欠陥分類と検出結果の点で98.70% (f-measure),88.07% (map),73.29% (iou) を達成した。

Multi-scale defect detection for photovoltaic (PV) cell electroluminescence (EL) images is a challenging task, because features vanish as the network deepens. To address this problem, an attention-based top-down and bottom-up architecture is developed to accomplish multi-scale feature fusion. This architecture, called Bidirectional Attention Feature Pyramid Network (BAFPN), can make all layers of the pyramid share similar semantic features. In BAFPN, cosine similarity is employed to measure the importance of each pixel in the fused features. Furthermore, a novel object detector is proposed, called BAF-Detector, which embeds BAFPN into the Region Proposal Network (RPN) in Faster RCNN+FPN. BAFPN improves the robustness of the network to scale, so the proposed detector achieves good performance in the multi-scale defect detection task. Finally, the experimental results on a large-scale EL dataset including 3629 images, 2129 of which are defective, show that the proposed method achieves 98.70% (F-measure), 88.07% (mAP), and 73.29% (IoU) in terms of multi-scale defect classification and detection results in raw PV cell EL images.
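
The abstract states only that cosine similarity measures per-pixel importance in the fused features; which feature maps are compared is not specified here. As a hedged sketch of the basic operation, the following computes a cosine similarity along the channel axis at every spatial position of two feature maps stored as nested lists of shape `[C][H][W]`. The function name `pixelwise_cosine`, the choice of inputs, and the stabilizer `eps` are assumptions of this example.

```python
import math


def pixelwise_cosine(f1, f2, eps=1e-12):
    """Cosine similarity between the channel vectors of f1 and f2 at each pixel.

    f1, f2: nested lists indexed as [channel][row][col] with equal shapes.
    Returns an [H][W] map with values in [-1, 1].
    """
    C, H, W = len(f1), len(f1[0]), len(f1[0][0])
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            dot = sum(f1[c][i][j] * f2[c][i][j] for c in range(C))
            n1 = math.sqrt(sum(f1[c][i][j] ** 2 for c in range(C)))
            n2 = math.sqrt(sum(f2[c][i][j] ** 2 for c in range(C)))
            out[i][j] = dot / (n1 * n2 + eps)  # eps guards against zero vectors
    return out
```

A map like this could then serve as a per-pixel weight when two pyramid levels are fused, though how BAFPN applies it is not described in the abstract.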

翻訳日:2021-05-01 11:12:31 公開日:2021-03-28

# (参考訳) ブラックボックスソースモデルの教師なし領域適応

Unsupervised Domain Adaptation of Black-Box Source Models ( http://arxiv.org/abs/2101.02839v2 )

ライセンス: CC BY 4.0

Haojian Zhang, Yabin Zhang, Kui Jia, Lei Zhang

(参考訳) unsupervised domain adaptation(uda)は、ラベル付きソースドメインから知識を転送することで、ラベル付きデータのターゲットドメインのモデルを学ぶことを目的としている。従来のUDA設定では、ラベル付きソースデータが適応可能であると仮定される。データプライバシに関する懸念が高まっているため、ソースフリーなUDAは、トレーニング済みのソースモデルのみが利用可能であると想定される新しいUDA設定として高く評価されている。しかし、ソースモデルが商業的価値を持つ可能性があり、ソースモデルがソースドメインにリスクをもたらす可能性があるため、トレーニング済みのソースモデルも実際には使用できない場合がある。本研究では,B$^2$UDA (Black-Box Unsupervised Domain Adaptation) という,ソースモデルのアプリケーションプログラミングインターフェースのみを対象ドメインにアクセス可能なサブセットについて検討する。 B$^2$UDAに取り組むために,ノイズラベルを用いた反復学習(IterLNL)という,シンプルで効果的な手法を提案する。ブラックボックスモデルをノイズラベリングのツールとして、IterLNLはノイズラベリングと学習をノイズラベリング(LNL)で反復的に行う。 b$^2$uda における lnl の実装を容易にするために,ラベルなし対象データのモデル予測から雑音率を推定し,カテゴリ間の不均衡ラベルノイズに対処するためにカテゴリ毎のサンプリングを提案する。ベンチマークデータセットの実験は、IterLNLの有効性を示している。ソースデータもソースモデルも考慮しないため、IterLNLはラベル付きソースデータを完全に利用する従来のUDAメソッドと互換性がある。

Unsupervised domain adaptation (UDA) aims to learn models for a target domain of unlabeled data by transferring knowledge from a labeled source domain. In the traditional UDA setting, labeled source data are assumed to be available for adaptation. Due to increasing concerns for data privacy, source-free UDA is highly appreciated as a new UDA setting, where only a trained source model is assumed to be available, while labeled source data remain private. However, trained source models may also be unavailable in practice since source models may have commercial values and exposing source models brings risks to the source domain, e.g., problems of model misuse and white-box attacks. In this work, we study a subtly different setting, named Black-Box Unsupervised Domain Adaptation (B$^2$UDA), where only the application programming interface of source model is accessible to the target domain; in other words, the source model itself is kept as a black-box one. To tackle B$^2$UDA, we propose a simple yet effective method, termed Iterative Learning with Noisy Labels (IterLNL). With black-box models as tools of noisy labeling, IterLNL conducts noisy labeling and learning with noisy labels (LNL), iteratively. To facilitate the implementation of LNL in B$^2$UDA, we estimate the noise rate from model predictions of unlabeled target data and propose category-wise sampling to tackle the unbalanced label noise among categories. Experiments on benchmark datasets show the efficacy of IterLNL. Given neither source data nor source models, IterLNL performs comparably with traditional UDA methods that make full use of labeled source data.
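
The two ingredients named in the abstract, a noise-rate estimate and category-wise sampling, can be illustrated with a rough Python sketch. This is not the authors' procedure: the disagreement-based noise proxy, the small-loss selection rule, and the function names `estimate_noise_rate` and `category_wise_select` are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def estimate_noise_rate(probs, noisy_labels):
    """Assumed proxy: fraction of samples whose current prediction
    (argmax of probs) disagrees with the black-box label."""
    disagreements = sum(
        1 for p, y in zip(probs, noisy_labels)
        if max(range(len(p)), key=p.__getitem__) != y
    )
    return disagreements / len(noisy_labels)


def category_wise_select(probs, noisy_labels, noise_rate):
    """Keep the (1 - noise_rate) fraction of lowest-loss samples
    separately inside each noisy category (hypothetical rule)."""
    by_class = defaultdict(list)
    for idx, (p, y) in enumerate(zip(probs, noisy_labels)):
        loss = -math.log(max(p[y], 1e-12))  # cross-entropy w.r.t. the noisy label
        by_class[y].append((loss, idx))
    kept = []
    for items in by_class.values():
        items.sort()
        n_keep = max(1, round((1.0 - noise_rate) * len(items)))
        kept.extend(idx for _, idx in items[:n_keep])
    return sorted(kept)
```

Selecting per category rather than globally keeps a class with systematically higher loss from being filtered out entirely, which is one plausible reading of why category-wise sampling helps with unbalanced label noise.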

翻訳日:2021-04-10 12:16:51 公開日:2021-03-28

# ガウス過程畳み込み辞書学習

Gaussian Process Convolutional Dictionary Learning ( http://arxiv.org/abs/2104.00530v1 )

ライセンス: Link先を確認

Andrew H. Song, Bahareh Tolooshams, Demba Ba

(参考訳) データからシフト不変テンプレートを推定する問題である畳み込み辞書学習(CDL)は、通常、テンプレートに事前分布や構造を仮定せずに行われる。コミュニティからほとんど注目されてこなかった、データが少ない、あるいは信号対雑音比(SNR)が低い状況では、学習されたテンプレートはデータに過剰適合して滑らかさを欠き、下流タスクの予測性能に影響を与えうる。この制限に対処するため、ガウス過程(GP)を用いてテンプレートに事前分布を課す畳み込み辞書学習フレームワークGPCDLを提案する。滑らかさに焦点を当て、GP事前分布を課すことは学習されたテンプレートのウィーナーフィルタリングと等価であり、それにより高周波成分が抑制され滑らかさが促進されることを理論的に示す。このアルゴリズムは古典的な反復再重み付け最小二乗法の単純な拡張であり、異なる滑らかさの仮定を柔軟に試すことができることを示す。シミュレーションにより、GPCDLは広い範囲のSNRにおいて、正則化なしの手法よりも高い精度で滑らかな辞書を学習することを示す。ラットの神経スパイクデータへの適用により、GPCDLによるテンプレート学習はより正確で視覚的に解釈可能な滑らかな辞書をもたらし、正則化なしのCDLやパラメトリックな代替手法よりも優れた予測性能につながることを示す。

Convolutional dictionary learning (CDL), the problem of estimating shift-invariant templates from data, is typically conducted in the absence of a prior/structure on the templates. In data-scarce or low signal-to-noise ratio (SNR) regimes, which have received little attention from the community, learned templates overfit the data and lack smoothness, which can affect the predictive performance of downstream tasks. To address this limitation, we propose GPCDL, a convolutional dictionary learning framework that enforces priors on templates using Gaussian Processes (GPs). With the focus on smoothness, we show theoretically that imposing a GP prior is equivalent to Wiener filtering the learned templates, thereby suppressing high-frequency components and promoting smoothness. We show that the algorithm is a simple extension of the classical iteratively reweighted least squares, which allows the flexibility to experiment with different smoothness assumptions. Through simulation, we show that GPCDL learns smooth dictionaries with better accuracy than the unregularized alternative across a range of SNRs. Through an application to neural spiking data from rats, we show that learning templates by GPCDL results in a more accurate and visually-interpretable smooth dictionary, leading to superior predictive performance compared to non-regularized CDL, as well as parametric alternatives.

翻訳日:2021-04-02 13:48:50 公開日:2021-03-28

# (参考訳) 「あなたが正しいからといって、私が間違っているというわけではない」:オープンエンドのビジュアル質問応答(VQA)タスクの開発と評価におけるボトルネックの克服

'Just because you are right, doesn't mean I am wrong': Overcoming a Bottleneck in the Development and Evaluation of Open-Ended Visual Question Answering (VQA) Tasks ( http://arxiv.org/abs/2103.15022v1 )

ライセンス: CC BY 4.0

Man Luo, Shailaja Keyur Sampat, Riley Tallman, Yankai Zeng, Manuha Vancha, Akarshan Sajja, Chitta Baral

(参考訳) GQA (Hudson and Manning, 2019) は、現実世界の視覚的推論と構成的質問応答のためのデータセットである。我々は、GQAデータセット上で最良の視覚言語モデルが予測する回答の多くが、正解(ground truth)とは一致しないものの、与えられた文脈においては意味的に妥当で正しいことを見出した。実際、既存の視覚的質問応答(VQA)データセットの多くは各質問に対して1つの正解しか想定しておらず、同じ問題を抱えている。この制限に対処するために、既製のNLPツールを用いて自動生成される、正解の代替回答集合(AAS)を提案する。AASに基づく意味的指標を導入し、上位のVQAソルバを修正して、1つの質問に対する複数の妥当な回答をサポートする。このアプローチをGQAデータセットに実装し、性能の改善を示す。

GQA (Hudson and Manning, 2019) is a dataset for real-world visual reasoning and compositional question answering. We found that many answers predicted by the best vision-language models on the GQA dataset do not match the ground-truth answer but still are semantically meaningful and correct in the given context. In fact, this is the case with most existing visual question answering (VQA) datasets, which assume only one ground-truth answer for each question. We propose Alternative Answer Sets (AAS) of ground-truth answers to address this limitation, which are created automatically using off-the-shelf NLP tools. We introduce a semantic metric based on AAS and modify top VQA solvers to support multiple plausible answers for a question. We implement this approach on the GQA dataset and show the performance improvements.
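
The idea of scoring against a set of acceptable answers instead of a single ground truth can be sketched as follows. This is only an illustration: the exact metric in the paper may differ, the answer sets below are made up, and the names `aas_score` and `aas_accuracy` are hypothetical.

```python
def normalise(answer):
    """Lower-case and trim so that 'Sofa ' and 'sofa' compare equal."""
    return answer.strip().lower()


def aas_score(prediction, answer_set):
    """1.0 if the prediction is any member of the alternative answer set."""
    accepted = {normalise(a) for a in answer_set}
    return 1.0 if normalise(prediction) in accepted else 0.0


def aas_accuracy(predictions, answer_sets):
    """Mean AAS score over a list of questions."""
    scores = [aas_score(p, s) for p, s in zip(predictions, answer_sets)]
    return sum(scores) / len(scores)
```

With exact-match accuracy, a prediction of "sofa" against the ground truth "couch" counts as wrong; with an answer set such as `{"couch", "sofa"}` it counts as correct.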

翻訳日:2021-04-01 09:24:13 公開日:2021-03-28

# (参考訳) 条件言語生成における幻覚と予測不確かさについて

On Hallucination and Predictive Uncertainty in Conditional Language Generation ( http://arxiv.org/abs/2103.15025v1 )

ライセンス: CC BY 4.0

Yijun Xiao, William Yang Wang

(参考訳) さまざまな自然言語生成タスクで性能は向上しているものの、深層ニューラルモデルは誤った、あるいは存在しない事実を幻覚(hallucination)として生成しやすい。タスクごとに異なる仮説が提案され個別に検証されているが、タスク横断的な体系的説明は得られていない。本研究では、条件付き言語生成における幻覚と予測不確実性との関連を示す。画像キャプション生成とデータ対テキスト生成の両方でその関係を調べ、幻覚を減らすためのビームサーチの単純な拡張を提案する。分析の結果、予測不確実性が高いほど幻覚が生じる可能性が高いことがわかった。認識論的(epistemic)不確実性は、偶然的(aleatoric)不確実性や全不確実性よりも幻覚をよく示す。提案するビームサーチの変種により、標準的な指標での性能と引き換えに幻覚を減らすトレードオフにおいて、より良い結果が得られる。

Despite improvements in performances on different natural language generation tasks, deep neural models are prone to hallucinating facts that are incorrect or nonexistent. Different hypotheses are proposed and examined separately for different tasks, but no systematic explanations are available across these tasks. In this study, we draw connections between hallucinations and predictive uncertainty in conditional language generation. We investigate their relationship in both image captioning and data-to-text generation and propose a simple extension to beam search to reduce hallucination. Our analysis shows that higher predictive uncertainty corresponds to a higher chance of hallucination. Epistemic uncertainty is more indicative of hallucination than aleatoric or total uncertainties. It helps to achieve better results of trading performance in standard metric for less hallucination with the proposed beam search variant.
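
The distinction between total, aleatoric, and epistemic uncertainty can be made concrete with the common entropy-based decomposition over several model samples (for example ensemble members). The sketch below assumes that decomposition; whether the paper uses exactly this estimator is not stated in the abstract, and the function names are illustrative.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def decompose_uncertainty(member_probs):
    """Split predictive uncertainty for one token.

    member_probs: one next-token distribution per model sample.
    total     = entropy of the averaged distribution
    aleatoric = average entropy of the individual distributions
    epistemic = total - aleatoric (disagreement between members)
    """
    n = len(member_probs)
    k = len(member_probs[0])
    mean_p = [sum(m[j] for m in member_probs) / n for j in range(k)]
    total = entropy(mean_p)
    aleatoric = sum(entropy(m) for m in member_probs) / n
    return total, aleatoric, total - aleatoric
```

Two members that both predict `[0.5, 0.5]` give purely aleatoric uncertainty, while members that confidently disagree (`[1, 0]` versus `[0, 1]`) give purely epistemic uncertainty, even though the total is the same in both cases.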

翻訳日:2021-04-01 09:17:05 公開日:2021-03-28

# (参考訳) ノイズ注入に基づく点群処理の正則化

Noise Injection-based Regularization for Point Cloud Processing ( http://arxiv.org/abs/2103.15027v1 )

ライセンス: CC BY 4.0

Xiao Zang, Yi Xie, Siyu Liao, Jie Chen, Bo Yuan

(参考訳) Dropoutのようなノイズ注入に基づく正則化は、ディープニューラルネットワーク(DNN)の性能向上のために画像領域で広く利用されている。しかし、点群領域における効率的な正則化はほとんど活用されておらず、最先端の研究の多くはデータ拡張に基づく正則化に焦点を当てている。本稿では、点群領域のDNNに対するノイズ注入に基づく正則化を初めて体系的に調査する。具体的には、点特徴マップに対して特徴レベル、点レベル、クラスタレベルでそれぞれノイズ注入を行う、DropFeat、DropPoint、DropClusterという一連の正則化手法を提案する。また、ドロップ率、クラスタサイズ、ドロップ位置などのさまざまな要因の影響を実証的に分析し、異なるデータセットやDNNアーキテクチャへの本手法の適用を促進する有用な知見と一般的な導入指針を得る。さまざまな点群処理タスクに対する各種DNNモデルで提案手法を評価する。実験の結果、本手法により大幅な性能向上が得られることが示された。特に、DropClusterはModelNet40形状分類データセットにおいて、PointNet、PointNet++、DGCNNの全体精度をそれぞれ1.5%、1.3%、0.8%向上させる。ShapeNetパートセグメンテーションデータセットでは、DropClusterはPointNet、PointNet++、DGCNNの平均Intersection-over-Union(IoU)をそれぞれ0.5%、0.5%、0.2%向上させる。S3DISセマンティックセグメンテーションデータセットでは、DropClusterはPointNet、PointNet++、DGCNNの平均IoUをそれぞれ3.2%、2.9%、3.7%改善する。同時に、DropClusterはこれら3つの代表的なバックボーンDNNの全体精度をそれぞれ2.4%、2.2%、1.8%向上させる。

Noise injection-based regularization, such as Dropout, has been widely used in image domain to improve the performance of deep neural networks (DNNs). However, efficient regularization in the point cloud domain is rarely exploited, and most of the state-of-the-art works focus on data augmentation-based regularization. In this paper, we, for the first time, perform systematic investigation on noise injection-based regularization for point cloud-domain DNNs. To be specific, we propose a series of regularization techniques, namely DropFeat, DropPoint and DropCluster, to perform noise injection on the point feature maps at the feature level, point level and cluster level, respectively. We also empirically analyze the impacts of different factors, including dropping rate, cluster size and dropping position, to obtain useful insights and general deployment guidelines, which can facilitate the adoption of our approaches across different datasets and DNN architectures. We evaluate our proposed approaches on various DNN models for different point cloud processing tasks. Experimental results show our approaches enable significant performance improvement. Notably, our DropCluster brings 1.5%, 1.3% and 0.8% higher overall accuracy for PointNet, PointNet++ and DGCNN, respectively, on ModelNet40 shape classification dataset. On ShapeNet part segmentation dataset, DropCluster brings 0.5%, 0.5% and 0.2% mean Intersection-over-union (IoU) increase for PointNet, PointNet++ and DGCNN, respectively. On S3DIS semantic segmentation dataset, DropCluster improves the mean IoU of PointNet, PointNet++ and DGCNN by 3.2%, 2.9% and 3.7%, respectively. Meanwhile, DropCluster also enables the overall accuracy increase for these three popular backbone DNNs by 2.4%, 2.2% and 1.8%, respectively.
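
To make the point-level and cluster-level variants more tangible, here is a rough sketch of what such noise injection could look like on a list of per-point feature vectors. The details are assumptions, not the paper's definitions: features are simply zeroed with no rescaling, the cluster is taken as the nearest neighbours of one random centre, and `drop_point` and `drop_cluster` are illustrative names.

```python
import random


def drop_point(features, rate, rng):
    """Zero the entire feature vector of each point with probability `rate`."""
    return [[0.0] * len(f) if rng.random() < rate else list(f) for f in features]


def drop_cluster(points, features, cluster_size, rng):
    """Zero the features of one random centre and its nearest neighbours."""
    centre = points[rng.randrange(len(points))]

    def sq_dist(p):
        return sum((a - b) ** 2 for a, b in zip(p, centre))

    # Indices of the `cluster_size` points closest to the centre (centre included).
    nearest = sorted(range(len(points)), key=lambda i: sq_dist(points[i]))[:cluster_size]
    dropped = set(nearest)
    return [[0.0] * len(f) if i in dropped else list(f) for i, f in enumerate(features)]
```

Dropping a spatially contiguous cluster removes correlated information that independent per-point dropping would leave largely recoverable from neighbours, which is one intuition for why the cluster-level variant is reported as the strongest.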

翻訳日:2021-04-01 09:05:06 公開日:2021-03-28

# (参考訳) 確率微分方程式を用いた精度・信頼性予測

Accurate and Reliable Forecasting using Stochastic Differential Equations ( http://arxiv.org/abs/2103.15041v1 )

ライセンス: CC BY-SA 4.0

Peng Cui, Zhijie Deng, Wenbo Hu and Jun Zhu

(参考訳) 現実世界の環境に広く存在する不確実性を適切に特徴付けることは、ディープラーニングモデルにとって重要でありながら困難である。不等分散ニューラルネットワーク(heteroscedastic neural network, HNN)など多くの取り組みがなされてきたが、学習効率、不確実性推定の質、予測性能のそれぞれで程度の異なる妥協があるため、十分な実用性を示した研究は少ない。さらに、既存のHNNは通常、予測とそれに付随する不確実性との間の明示的な相互作用を構築できない。本稿では、確率微分方程式(SDE)を備えた新しい不等分散ニューラルネットワークであるSDE-HNNを開発し、HNNの予測平均と分散の相互作用を特徴付けることで、正確かつ信頼性の高い回帰を実現し、これらの問題の解決を目指す。理論面では、考案したニューラルSDEの解の存在と一意性を示す。さらに、SDE-HNNの最適化におけるバイアスと分散のトレードオフに基づいて、学習の安定性を向上させる改良版の数値SDEソルバを設計する。最後に、予測不確実性をより体系的に評価するために、2つの新しい不確実性診断指標を提示する。難しいデータセットでの実験により、本手法は予測性能と不確実性定量化の両面で最先端のベースラインを大きく上回り、よく較正された鋭い予測区間を与えることを示す。

It is critical yet challenging for deep learning models to properly characterize uncertainty that is pervasive in real-world environments. Although a lot of efforts have been made, such as heteroscedastic neural networks (HNNs), little work has demonstrated satisfactory practicability due to the different levels of compromise on learning efficiency, quality of uncertainty estimates, and predictive performance. Moreover, existing HNNs typically fail to construct an explicit interaction between the prediction and its associated uncertainty. This paper aims to remedy these issues by developing SDE-HNN, a new heteroscedastic neural network equipped with stochastic differential equations (SDE) to characterize the interaction between the predictive mean and variance of HNNs for accurate and reliable regression. Theoretically, we show the existence and uniqueness of the solution to the devised neural SDE. Moreover, based on the bias-variance trade-off for the optimization in SDE-HNN, we design an enhanced numerical SDE solver to improve the learning stability. Finally, to more systematically evaluate the predictive uncertainty, we present two new diagnostic uncertainty metrics. Experiments on the challenging datasets show that our method significantly outperforms the state-of-the-art baselines in terms of both predictive performance and uncertainty quantification, delivering well-calibrated and sharp prediction intervals.
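
As background for the heteroscedastic setting, the sketch below shows the Gaussian negative log-likelihood that is commonly used when a network predicts both a mean and a variance, together with a simple empirical coverage check for prediction intervals. It does not reproduce the SDE-based model or the two new metrics from the paper; the function names are illustrative.

```python
import math


def gaussian_nll(y, mean, var):
    """Negative log-likelihood of y under N(mean, var), with var > 0."""
    return 0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)


def interval_coverage(ys, means, variances, z=1.96):
    """Fraction of targets inside mean +/- z * std (about 95% for z = 1.96)."""
    hits = sum(
        1 for y, m, v in zip(ys, means, variances)
        if abs(y - m) <= z * math.sqrt(v)
    )
    return hits / len(ys)
```

The loss couples the two outputs: a large residual is penalised less when the predicted variance is large, but the `log` term penalises inflating the variance everywhere. Coverage close to the nominal level with narrow intervals is what "well-calibrated and sharp" refers to.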

翻訳日:2021-03-31 14:17:09 公開日:2021-03-28

# (参考訳) 表現学習による知識グラフエンティティアライメントに関する包括的調査

A Comprehensive Survey on Knowledge Graph Entity Alignment via Representation Learning ( http://arxiv.org/abs/2103.15059v1 )

ライセンス: CC BY 4.0

Rui Zhang, Bayu Distiawan Trisedy, Miao Li, Yong Jiang, Jianzhong Qi

(参考訳) ここ数年、AIアプリケーションにおいて不可欠な役割を担うことから、研究コミュニティと産業界の両方で知識ベースへの関心が指数関数的に高まっている。エンティティアライメントは知識ベースを充実させる上で重要なタスクである。本稿では、表現学習という新しいアプローチを用いる代表的なエンティティアライメント手法について、包括的なチュートリアル型のサーベイを行う。これらの手法の主要な特徴を捉えるフレームワークを提示し、既存のベンチマークデータセットの限界に対処する2つのデータセットを提案し、提案したデータセットを用いて広範な実験を行う。フレームワークは、各手法がどのように動作するかを明確に示す。実験からは、各手法の実証的な性能と、さまざまな要因が性能に与える影響について重要な結果が得られた。先行研究で強調されてこなかった重要な知見の1つは、属性トリプルと関係述語を特徴としてうまく活用する手法が際立って優れていることである。

In the last few years, the interest in knowledge bases has grown exponentially in both the research community and the industry due to their essential role in AI applications. Entity alignment is an important task for enriching knowledge bases. This paper provides a comprehensive tutorial-type survey on representative entity alignment techniques that use the new approach of representation learning. We present a framework for capturing the key characteristics of these techniques, propose two datasets to address the limitation of existing benchmark datasets, and conduct extensive experiments using the proposed datasets. The framework gives a clear picture of how the techniques work. The experiments yield important results about the empirical performance of the techniques and how various factors affect the performance. One important observation not stressed by previous work is that techniques making good use of attribute triples and relation predicates as features stand out as winners.

翻訳日:2021-03-31 11:56:27 公開日:2021-03-28

# (参考訳) IUP: 5G IoTにおける固体発酵のためのインテリジェントなユーティリティ予測スキーム

IUP: An Intelligent Utility Prediction Scheme for Solid-State Fermentation in 5G IoT ( http://arxiv.org/abs/2103.15073v1 )

ライセンス: CC BY 4.0

Min Wang, Shanchen Pang, Tong Ding, Sibo Qiao, Xue Zhai, Shuo Wang, Neal N. Xiong, Zhengwen Huang

(参考訳) 現在、固体発酵(SSF)は主に人の経験に基づいて制御されており、製品の品質と収量は安定していない。SSFの品質と収量を正確に予測することは、人類の食料安全保障と供給を改善する上で非常に重要である。本稿では、5G産業用モノのインターネット(IoT)におけるSSFのためのインテリジェントユーティリティ予測(IUP)スキームを提案する。このスキームには、SSFプロセスのパラメータ収集とユーティリティ予測が含まれる。このIUPスキームは、5G産業用IoTの環境認識と知的学習アルゴリズムに基づいている。書き換え可能ペトリネットに基づくワークフローモデルを構築し、システムモデルの機能とプロセスの正しさを検証する。さらに、敵対的生成ネットワーク(GAN)と全結合ニューラルネットワーク(FCNN)に基づくSSFのユーティリティ予測モデルを設計する。平均二乗誤差の制約を持つGAN(MSE-GAN)を設計してSSFにおける少数ショット学習の問題を解決し、これをFCNNと組み合わせてSSFのユーティリティ(通常はアルコールを用いる)の予測を実現する。実験室での酒類製造に基づく実験により、提案手法はSSFのユーティリティ予測において他の予測手法よりも精度が高く、あらかじめ設定する原料の配合比率とセラー温度の適切な設定に関する数値解析の基礎を与えることが示された。

At present, Solid-State Fermentation (SSF) is mainly controlled by artificial experience, and the product quality and yield are not stable. Accurately predicting the quality and yield of SSF is of great significance for improving human food security and supply. In this paper, we propose an Intelligent Utility Prediction (IUP) scheme for SSF in 5G Industrial Internet of Things (IoT), including parameter collection and utility prediction of SSF process. This IUP scheme is based on the environmental perception and intelligent learning algorithms of the 5G Industrial IoT. We build a workflow model based on rewritable petri net to verify the correctness of the system model function and process. In addition, we design a utility prediction model for SSF based on the Generative Adversarial Networks (GAN) and Fully Connected Neural Network (FCNN). We design a GAN with constraint of mean square error (MSE-GAN) to solve the problem of few-shot learning of SSF, and then combine with the FCNN to realize the utility prediction (usually use the alcohol) of SSF. Based on the production of liquor in laboratory, the experiments show that the proposed method is more accurate than the other prediction methods in the utility prediction of SSF, and provide the basis for the numerical analysis of the proportion of preconfigured raw materials and the appropriate setting of cellar temperature.

翻訳日:2021-03-31 11:17:40 公開日:2021-03-28

# (参考訳) PENELOPIE: 機械翻訳によるギリシア語のオープン情報抽出の実現

PENELOPIE: Enabling Open Information Extraction for the Greek Language through Machine Translation ( http://arxiv.org/abs/2103.15075v1 )

ライセンス: CC BY 4.0

Dimitris Papadopoulos, Nikolaos Papadakis and Nikolaos Matsatsinis

(参考訳) 本稿では、EACL 2021 SRWへの投稿内容として、オープン情報抽出の文脈で高リソース言語と低リソース言語のギャップを埋めることを目指す方法論を、ギリシャ語を例に示す。本稿の目的は2つある。第一に、Transformerアーキテクチャに基づいて、英語からギリシャ語、およびギリシャ語から英語へのニューラル機械翻訳(NMT)モデルを構築する。第二に、これらのNMTモデルを利用してギリシャ語テキストの英訳を作成し、これをNLPパイプラインへの入力として、一連の前処理とトリプル抽出タスクを適用する。最後に、抽出したトリプルをギリシャ語に逆翻訳する。NMT手法とOIE手法の両方をベンチマークデータセット上で評価し、本アプローチがギリシャ語における現在の最先端技術を上回ることを示す。

In this paper we present our submission for the EACL 2021 SRW; a methodology that aims at bridging the gap between high and low-resource languages in the context of Open Information Extraction, showcasing it on the Greek language. The goals of this paper are twofold: First, we build Neural Machine Translation (NMT) models for English-to-Greek and Greek-to-English based on the Transformer architecture. Second, we leverage these NMT models to produce English translations of Greek text as input for our NLP pipeline, to which we apply a series of pre-processing and triple extraction tasks. Finally, we back-translate the extracted triples to Greek. We conduct an evaluation of both our NMT and OIE methods on benchmark datasets and demonstrate that our approach outperforms the current state-of-the-art for the Greek natural language.

翻訳日:2021-03-31 10:55:02 公開日:2021-03-28

# (参考訳) スケッチテンソル空間の学習による人工シーンのイメージインペインティング

Learning a Sketch Tensor Space for Image Inpainting of Man-made Scenes ( http://arxiv.org/abs/2103.15087v1 )

ライセンス: CC0 1.0

Chenjie Cao, Yanwei Fu

(参考訳) 本稿では、人工シーンのインペインティングというタスクを研究する。エッジ、線、接合点といった画像の視覚パターンを保持することが難しいため、このタスクは非常に困難である。特に、従来研究の多くは、人工シーンの画像における物体や建物の構造を復元できなかった。そこで本稿では、人工シーンのインペインティングのためにスケッチテンソル(ST)空間を学習することを提案する。この空間は画像のエッジ、線、接合点を復元するように学習され、その結果、画像全体の構造について信頼できる予測を行う。構造の精緻化を容易にするために、新しいエンコーダ・デコーダ構造を持つマルチスケール・スケッチテンソル・インペインティング(MST)ネットワークを提案する。エンコーダは入力画像から線とエッジを抽出し、ST空間に射影する。デコーダは、この空間から入力画像を復元するように学習される。広範な実験により、本モデルの有効性を検証する。さらに、本モデルは一般的な自然画像のインペインティングにおいても、競合手法と比べて遜色のない性能を達成できる。

This paper studies the task of inpainting man-made scenes. It is very challenging due to the difficulty in preserving the visual patterns of images, such as edges, lines, and junctions. In particular, most previous works fail to restore the object/building structures in images of man-made scenes. To this end, this paper proposes learning a Sketch Tensor (ST) space for inpainting man-made scenes. Such a space is learned to restore the edges, lines, and junctions in images, and thus makes reliable predictions of the holistic image structures. To facilitate the structure refinement, we propose a Multi-scale Sketch Tensor inpainting (MST) network, with a novel encoder-decoder structure. The encoder extracts lines and edges from the input images to project them into an ST space. From this space, the decoder is learned to restore the input images. Extensive experiments validate the efficacy of our model. Furthermore, our model can also achieve competitive performance in inpainting general natural images compared with competing methods.

翻訳日:2021-03-31 10:45:38 公開日:2021-03-28

# (参考訳) 複数の課題によるランキングによる表現学習

Representation Learning by Ranking under multiple tasks ( http://arxiv.org/abs/2103.15093v1 )

ライセンス: CC BY 4.0

Lifeng Gu

(参考訳) 近年,表現学習が機械学習コミュニティの研究の焦点となっている。大規模事前学習ニューラルネットワークは、汎用知性を実現するための最初のステップとなっている。ニューラルネットワークの成功の鍵は、データの抽象表現能力にある。いくつかの学習分野は実際に表現の学習方法について議論しており、統一された視点がない。我々は、複数のタスクの表現学習問題をランキング問題に変換し、ランキング問題を統一的な視点として、近似的なNDCG損失を最適化することにより、異なるタスクの表現学習を解決する。分類、検索、マルチラベル学習、回帰、自己教師あり学習などの異なる学習タスクの下での実験は、近似ndcg損失の優位性が証明される。さらに、自己教師付き学習タスクにおいて、トレーニングデータをデータ拡張法により変換し、近似NDCG損失の性能を向上させることにより、近似NDCG損失が教師なしトレーニングデータの情報をフル活用できることを示す。

In recent years, representation learning has become the research focus of the machine learning community. Large-scale pre-training neural networks have become the first step to realize general intelligence. The key to the success of neural networks lies in their abstract representation capabilities for data. Several learning fields are actually discussing how to learn representations and there lacks a unified perspective. We convert the representation learning problem under multiple tasks into a ranking problem, taking the ranking problem as a unified perspective, the representation learning under different tasks is solved by optimizing the approximate NDCG loss. Experiments under different learning tasks like classification, retrieval, multi-label learning, regression, self-supervised learning prove the superiority of approximate NDCG loss. Further, under the self-supervised learning task, the training data is transformed by data augmentation method to improve the performance of the approximate NDCG loss, which proves that the approximate NDCG loss can make full use of the information of the unsupervised training data.
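
Since the abstract relies on an approximate NDCG loss without defining it, the following sketch shows exact NDCG next to one well-known smooth surrogate in which each hard rank is replaced by a sigmoid-based soft rank. Whether the paper uses this exact surrogate is an assumption, as are the `temperature` parameter and the function names.

```python
import math


def dcg(relevances_in_rank_order):
    """Discounted cumulative gain with gain 2^rel - 1 and log2 discount."""
    return sum((2 ** r - 1) / math.log2(i + 2)
               for i, r in enumerate(relevances_in_rank_order))


def ndcg(scores, relevances):
    """Exact NDCG: sort items by score and normalise by the ideal DCG."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg([relevances[i] for i in order]) / ideal if ideal > 0 else 0.0


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def approx_ndcg(scores, relevances, temperature=0.1):
    """Smooth NDCG: soft_rank_i = 1 + sum_j sigmoid((s_j - s_i) / T)."""
    ideal = dcg(sorted(relevances, reverse=True))
    if ideal == 0:
        return 0.0
    total = 0.0
    for i, (s_i, r_i) in enumerate(zip(scores, relevances)):
        soft_rank = 1.0 + sum(_sigmoid((s_j - s_i) / temperature)
                              for j, s_j in enumerate(scores) if j != i)
        total += (2 ** r_i - 1) / math.log2(1.0 + soft_rank)
    return total / ideal
```

A training loss would be `1 - approx_ndcg(...)` computed in an autodiff framework; the soft rank makes the quantity differentiable in the scores, and it approaches exact NDCG as `temperature` goes to zero.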

翻訳日:2021-03-31 10:26:50 公開日:2021-03-28

# (参考訳) BA^2M:画像分類のためのバッチ注意モジュール

BA^2M: A Batch Aware Attention Module for Image Classification ( http://arxiv.org/abs/2103.15099v1 )

ライセンス: CC BY 4.0

Qishang Cheng, Hongliang Li, Qingbo Wu and King Ngi Ngan

(参考訳) 特徴表現を強化するために、畳み込みニューラルネットワーク(CNN)にはアテンション機構が採用されてきた。しかし、既存のアテンション機構は各サンプル内の特徴の洗練にのみ注力し、異なるサンプル間の識別を無視している。本稿では、独自の視点から特徴を強化するバッチ認識型アテンションモジュール(BA2M)を提案する。具体的には、まず各サンプル内でチャネル、局所空間、大域空間のアテンションマップを融合して、サンプル単位のアテンション表現(SAR)を得る。次に、バッチ全体のSARを正規化関数に入力し、各サンプルの重みを得る。この重みは、内容の複雑さが異なるサンプルを含む訓練バッチにおいて、サンプル間の特徴の重要度を区別する役割を果たす。BA2MはCNNのさまざまな部分に埋め込むことができ、ネットワークとともにエンドツーエンドで最適化できる。BA2Mの設計は軽量で、追加のパラメータや計算はわずかである。CIFAR-100とImageNet-1Kでの画像認識タスクの広範な実験によりBA2Mを検証する。その結果、BA2Mはさまざまなネットワークアーキテクチャの性能を向上させ、多くの古典的なアテンション手法を上回ることが示された。さらに、BA2Mは損失値に基づいてサンプルを再重み付けする従来手法も上回る。

The attention mechanisms have been employed in Convolutional Neural Network (CNN) to enhance the feature representation. However, existing attention mechanisms only concentrate on refining the features inside each sample and neglect the discrimination between different samples. In this paper, we propose a batch aware attention module (BA2M) for feature enrichment from a distinctive perspective. More specifically, we first get the sample-wise attention representation (SAR) by fusing the channel, local spatial and global spatial attention maps within each sample. Then, we feed the SARs of the whole batch to a normalization function to get the weights for each sample. The weights serve to distinguish the features' importance between samples in a training batch with different complexity of content. The BA2M could be embedded into different parts of CNN and optimized with the network in an end-to-end manner. The design of BA2M is lightweight with few extra parameters and calculations. We validate BA2M through extensive experiments on CIFAR-100 and ImageNet-1K for the image recognition task. The results show that BA2M can boost the performance of various network architectures and outperforms many classical attention methods. Besides, BA2M exceeds traditional methods of re-weighting samples based on the loss value.
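
The batch-level step, turning one attention summary per sample into one weight per sample, can be sketched as below. Several choices here are assumptions rather than details from the paper: each SAR is reduced to a single scalar score, the normalization function is taken to be a softmax over the batch, and the weights are scaled by the batch size so that they average to one.

```python
import math


def batch_attention_weights(sample_scores):
    """Softmax across the batch, rescaled so the mean weight is 1."""
    m = max(sample_scores)                      # subtract max for stability
    exps = [math.exp(s - m) for s in sample_scores]
    z = sum(exps)
    n = len(sample_scores)
    return [n * e / z for e in exps]


def reweight(features, weights):
    """Scale every feature of sample i by its batch-level weight."""
    return [[w * x for x in f] for f, w in zip(features, weights)]
```

Because the weights depend on the other samples in the batch, the same image can be emphasised in one batch and de-emphasised in another, which is the sense in which the module is batch aware.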

翻訳日:2021-03-31 10:14:37 公開日:2021-03-28

# (参考訳) 階層的関係調整メトリックラーニング

Hierarchical Relationship Alignment Metric Learning ( http://arxiv.org/abs/2103.15107v1 )

ライセンス: CC BY 4.0

Lifeng Gu

(参考訳) 既存のメトリック学習手法の多くは、サンプルペア間の類似・非類似の関係に基づいて類似度や距離尺度を学習することに焦点を当てている。しかし、マルチラベル学習やラベル分布学習など、現実世界の多くのアプリケーションでは、サンプルのペアを単純に類似または非類似と特定することはできない。この目的のために、これらのシナリオにおけるメトリック学習問題を扱う関係アライメントメトリック学習(RAML)フレームワークが提案されている。しかし、RAMLが学習するのは線形メトリックであり、複雑なデータセットをモデル化できない。そこで、深層学習とRAMLフレームワークを組み合わせた階層的関係アライメントメトリック学習モデルHRAMLを提案する。HRAMLは関係アライメントの概念を用いて複数の学習タスクにおけるメトリック学習問題をモデル化し、特徴空間におけるサンプルペア関係とラベル空間におけるサンプルペア関係との整合性を十分に活用する。さらに、学習タスクごとに分けた複数の実験を行い、多くの一般的な手法やRAMLフレームワークに対するHRAMLの優れた性能を検証した。

Most existing metric learning methods focus on learning a similarity or distance measure relying on similar and dissimilar relations between sample pairs. However, pairs of samples cannot be simply identified as similar or dissimilar in many real-world applications, e.g., multi-label learning and label distribution learning. To this end, the relation alignment metric learning (RAML) framework has been proposed to handle the metric learning problem in those scenarios. However, RAML learns a linear metric, which cannot model complex datasets. Combining deep learning with the RAML framework, we propose a hierarchical relationship alignment metric learning model, HRAML, which uses the concept of relationship alignment to model metric learning problems under multiple learning tasks, and makes full use of the consistency between the sample pair relationship in the feature space and the sample pair relationship in the label space. Further, we organize several experiments divided by learning task, and verify the better performance of HRAML against many popular methods and the RAML framework.

翻訳日:2021-03-31 09:53:36 公開日:2021-03-28

# (参考訳) 相互情報による表現の説明

Explaining Representation by Mutual Information ( http://arxiv.org/abs/2103.15114v1 )

ライセンス: CC BY 4.0

Lifeng Gu

(参考訳) 科学は世界の法則を発見するために使われる。機械学習は、データの法則の発見に使用できる。近年,機械学習コミュニティにおける解釈可能性に関する研究がますます増えている。機械学習の手法が安全で解釈可能であり、データに意味のあるパターンを見つけるのに役立つことを願っています。本稿では,深層表現の解釈可能性に着目する。本稿では,相互情報に基づく解釈可能な表現法を提案し,その解釈を入力データと表現の間の3種類の情報に要約する。さらに、モデルに挿入して、モデル表現を説明するための情報量を推定できるMI-LRモジュールを提案する。最後に,プロトタイプネットワークの可視化による検証を行う。

Science is used to discover the laws of the world. Machine learning can be used to discover the laws of data. In recent years, there has been more and more research on interpretability in the machine learning community. We hope that machine learning methods are safe and interpretable, and that they can help us find meaningful patterns in data. In this paper, we focus on the interpretability of deep representations. We propose an interpretable method of representation based on mutual information, which summarizes the interpretation of a representation into three types of information between the input data and the representation. We further propose the MI-LR module, which can be inserted into a model to estimate the amount of information and thereby explain the model's representation. Finally, we verify the method through visualization of the prototype network.

翻訳日:2021-03-31 09:46:44 公開日:2021-03-28

# (参考訳) 人間言語の区別不能概念に対する量子ボース・アインシュタイン統計

Quantum Bose-Einstein Statistics for Indistinguishable Concepts in Human Language ( http://arxiv.org/abs/2103.15125v1 )

ライセンス: CC BY 4.0

Lester Beltran

(参考訳) 本研究では、「11匹の動物」のような「数の概念」と「実体の概念」の組み合わせにおいて、概念のレベルに存在する同一性と識別不可能性(すなわち、11匹の動物はすべて同一で識別不能であること)が、同一で識別不能な量子粒子にボース=アインシュタイン統計が現れるのと同様に、ボース=アインシュタイン型の統計構造を生じさせるという仮説を検討する。Google検索ツールを用いてWorld-Wide-Webから統計データを抽出し、この仮説を支持する証拠を特定する。次に、Kullback-Leiblerダイバージェンスを用いて、得られた分布をマクスウェル=ボルツマン分布およびボース=アインシュタイン分布と比較し、ボース=アインシュタイン分布のほうがマクスウェル=ボルツマン分布よりもよく適合することを示す。

We investigate the hypothesis that within a combination of a 'number concept' plus a 'substantive concept', such as 'eleven animals,' the identity and indistinguishability present on the level of the concepts, i.e., all eleven animals are identical and indistinguishable, gives rise to a statistical structure of the Bose-Einstein type similar to how Bose-Einstein statistics is present for identical and indistinguishable quantum particles. We proceed by identifying evidence for this hypothesis by extracting the statistical data from the World-Wide-Web utilizing the Google Search tool. By using the Kullback-Leibler divergence method, we then compare the obtained distribution with the Maxwell-Boltzmann as well as with the Bose-Einstein distributions and show that the Bose-Einstein's provides a better fit as compared to the Maxwell-Boltzmanns.
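
The comparison step, measuring which candidate family lies closer to an observed distribution in Kullback-Leibler divergence, can be sketched as follows. Everything concrete here is an assumption: the integer "energy levels", the parameter values, the coarse grid search, and above all the data, which are synthetic and generated from the Bose-Einstein form itself, so the outcome is true by construction and says nothing about the paper's web counts.

```python
import math


def normalise(weights):
    total = sum(weights)
    return [w / total for w in weights]


def kl_divergence(p, q):
    """D_KL(p || q) for discrete distributions with q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def bose_einstein(levels, a, b):
    """Normalised weights 1 / (a * exp(E / b) - 1); requires a > 1 for E >= 0."""
    return normalise([1.0 / (a * math.exp(e / b) - 1.0) for e in levels])


def maxwell_boltzmann(levels, d):
    """Normalised weights exp(-E / d); the prefactor cancels on normalising."""
    return normalise([math.exp(-e / d) for e in levels])


def best_maxwell_boltzmann_kl(observed, levels):
    """Smallest KL divergence over a coarse grid of d in [0.5, 9.9]."""
    return min(kl_divergence(observed, maxwell_boltzmann(levels, d / 10.0))
               for d in range(5, 100))
```

With real frequency counts, both families would be fitted (here only the Maxwell-Boltzmann parameter is searched) and the family with the smaller divergence would be reported as the better fit.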

翻訳日:2021-03-31 09:38:24 公開日:2021-03-28

# (参考訳) 非線形逆問題におけるモデルベース学習のためのグラフ畳み込みネットワーク

Graph Convolutional Networks for Model-Based Learning in Nonlinear Inverse Problems ( http://arxiv.org/abs/2103.15138v1 )

ライセンス: CC BY 4.0

William Herzberg, Daniel B. Rowe, Andreas Hauptmann, and Sarah J. Hamilton

(参考訳) 医用画像におけるモデルベースの学習型画像再構成手法の大部分は、ピクセル画像などの一様な領域に限定されてきた。非線形逆問題で典型的な有限要素法から生じる非一様メッシュ上で基礎モデルを解く場合、補間や埋め込みが必要になる。これを克服するために、メッシュをグラフとして解釈し、グラフ畳み込みニューラルネットワークを用いてネットワークアーキテクチャを定式化することで、モデルベース学習を非一様メッシュに直接拡張する柔軟なフレームワークを提示する。これにより、提案する反復的なグラフ畳み込みニュートン法(GCNM)が得られる。GCNMは逆問題の解法にフォワードモデルを直接組み込み、すべての更新は問題固有のメッシュ上でネットワークにより直接計算される。最適化に基づく手法で解かれることが多く、フォワード問題が有限要素法で解かれる、非適切性の強い非線形逆問題である電気インピーダンストモグラフィ(EIT)についての結果を示す。絶対EITイメージングの結果を、標準的な反復法およびグラフ残差ネットワークと比較する。GCNMは、純粋にシミュレーションで得た訓練データのみから、異なる領域形状、分布外データ、実験データに対して強い汎化性を持つことを示す。

The majority of model-based learned image reconstruction methods in medical imaging have been limited to uniform domains, such as pixelated images. If the underlying model is solved on nonuniform meshes, arising from a finite element method typical for nonlinear inverse problems, interpolation and embeddings are needed. To overcome this, we present a flexible framework to extend model-based learning directly to nonuniform meshes, by interpreting the mesh as a graph and formulating our network architectures using graph convolutional neural networks. This gives rise to the proposed iterative Graph Convolutional Newton's Method (GCNM), which directly includes the forward model into the solution of the inverse problem, while all updates are directly computed by the network on the problem specific mesh. We present results for Electrical Impedance Tomography, a severely ill-posed nonlinear inverse problem that is frequently solved via optimization-based methods, where the forward problem is solved by finite element methods. Results for absolute EIT imaging are compared to standard iterative methods as well as a graph residual network. We show that the GCNM has strong generalizability to different domain shapes, out of distribution data as well as experimental data, from purely simulated training data.

翻訳日:2021-03-31 09:27:09 公開日:2021-03-28

# (参考訳) マルコフ論理ネットワークにおける重みパラメータのスケーリングと関係ロジスティック回帰モデル

Scaling the weight parameters in Markov logic networks and relational logistic regression models ( http://arxiv.org/abs/2103.15140v1 )

ライセンス: CC BY 4.0

Felix Weitkämper

(参考訳) 我々はマルコフ論理ネットワークとリレーショナルロジスティック回帰を、その仕様に重み付き公式を用いる統計リレーショナル人工知能の2つの基本的な表現形式として考える。しかし、マルコフ論理ネットワークは無向グラフに基づいており、リレーショナルロジスティック回帰は有向非巡回グラフに基づいている。重みパラメータをドメインサイズでスケーリングする場合、関係ロジスティック回帰モデルの漸近的挙動はパラメータによって透過的に制御され、漸近確率を計算するアルゴリズムを提供する。また、マルコフ論理ネットワークには当てはまらない2つの例を示す。また、主に文献から、そのようなスケーリングが適切かどうか、生の未スケールパラメータの使用が望ましいかどうかをユーザが判断する上で、アプリケーションコンテキストがどのように役立つかを議論する。本稿では,特に有望なスケールモデルの適用分野としてランダムサンプリングに注目し,さらなる研究の道筋を示す。

We consider Markov logic networks and relational logistic regression as two fundamental representation formalisms in statistical relational artificial intelligence that use weighted formulas in their specification. However, Markov logic networks are based on undirected graphs, while relational logistic regression is based on directed acyclic graphs. We show that when scaling the weight parameters with the domain size, the asymptotic behaviour of a relational logistic regression model is transparently controlled by the parameters, and we supply an algorithm to compute asymptotic probabilities. We also show using two examples that this is not true for Markov logic networks. We also discuss using several examples, mainly from the literature, how the application context can help the user to decide when such scaling is appropriate and when using the raw unscaled parameters might be preferable. We highlight random sampling as a particularly promising area of application for scaled models and expound possible avenues for further research.
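
The effect of scaling a weight by the domain size can be seen in a toy relational logistic regression with a single weighted formula. The sketch assumes the usual form in which the probability is a sigmoid of a bias plus the weight times the number of true formula instances; the parameter values and the function name `rlr_probability` are illustrative.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def rlr_probability(bias, weight, n_true, domain_size, scaled):
    """P(child = true) given n_true satisfying instances among domain_size.

    With scaled=True the weight is divided by the domain size, so the
    result depends on the proportion n_true / domain_size.
    """
    w = weight / domain_size if scaled else weight
    return sigmoid(bias + w * n_true)
```

Holding the proportion of true instances at 30% with `bias=-1.0` and `weight=2.0`, the unscaled probability is pushed towards 1 as the domain grows, while the scaled probability stays at `sigmoid(-0.4)` for every domain size, which is the kind of parameter-controlled asymptotic behaviour the abstract describes.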

翻訳日:2021-03-31 09:11:05 公開日:2021-03-28

# (参考訳) webベースシステムにおける認証方式としての顔認識

Face Recognition as a Method of Authentication in a Web-Based System ( http://arxiv.org/abs/2103.15144v1 )

ライセンス: CC BY 4.0

Ben Wycliff Mugalu, Rodrick Calvin Wamala, Jonathan Serugunda, Andrew Katumba

(参考訳) オンライン情報システムは現在、情報の保護とアクセス制御において、ユーザー名とパスワードという従来の方法に大きく依存している。生体認証技術の進歩とAIや機械学習などの分野の普及に伴い、ユーザビリティ上の利点から生体認証によるセキュリティの人気が高まっている。本稿では、ユーザビリティ向上の利点を得るために、機械学習に基づく顔認識を認証手段としてWebベースシステムに統合する方法を報告する。顔認識について、検出アルゴリズムおよび分類アルゴリズムとFaceNetとの組み合わせを比較する。その結果、検出にMTCNN、埋め込み生成にFaceNet、分類にLinearSVCを用いる組み合わせが、95%の精度で他の組み合わせを上回ることが示された。得られた分類器はWebベースシステムに統合され、ユーザ認証に使用される。

Online information systems currently rely heavily on the traditional username-and-password method for protecting information and controlling access. With the advancement in biometric technology and popularity of fields like AI and Machine Learning, biometric security is becoming increasingly popular because of the usability advantage. This paper reports how machine learning based face recognition can be integrated into a web-based system as a method of authentication to reap the benefits of improved usability. This paper includes a comparison of combinations of detection and classification algorithms with FaceNet for face recognition. The results show that a combination of MTCNN for detection, FaceNet for generating embeddings, and LinearSVC for classification outperforms other combinations with 95% accuracy. The resulting classifier is integrated into the web-based system and used for authenticating users.

翻訳日:2021-03-31 08:44:51 公開日:2021-03-28

# (参考訳) シンボリック回帰は小さなデータセットの他のモデルを上回る

Symbolic regression outperforms other models for small data sets ( http://arxiv.org/abs/2103.15147v1 )

ライセンス: CC BY 4.0

Casper Wilstrup and Jaan Kasak

(参考訳) 機械学習は複雑な現象や関係の予測や新しい理解にしばしば応用されるが、モデルトレーニングに十分なデータの確保は広く問題となっている。ランダムフォレストや勾配ブースティングといった従来の機械学習技術は、数百サンプル規模のデータセットを扱う場合、過学習する傾向にある。本研究は,250個の観測値からなる小さなトレーニングセットに対して,シンボリック回帰が線形モデルや決定木の解釈可能性を維持しつつ,より高い精度を提供し,これらの機械学習モデルに代わる優れた選択肢であることを示す。240例中132例において、シンボリック回帰モデルは、サンプル外データ上で他のどのモデルよりも優れている。2番目に優れたアルゴリズムはランダムフォレストであり、240例中37例で最良であった。解釈可能なモデルとの比較に限定した場合、シンボリック回帰は240例中184例で最良である。

Machine learning is often applied to obtain predictions and new understanding of complex phenomena and relationships, but availability of sufficient data for model training is a widespread problem. Traditional machine learning techniques such as random forests and gradient boosting tend to overfit when working with data sets of a few hundred samples. This study demonstrates that for small training sets of 250 observations, symbolic regression is a superior alternative to these machine learning models by providing better accuracy while preserving the interpretability of linear models and decision trees. In 132 out of 240 cases, the symbolic regression model performs better than any of the other models on the out-of-sample data. The second best algorithm was found to be a random forest, which performs best in 37 of the 240 cases. When restricting the comparison to interpretable models, symbolic regression performs best in 184 out of 240 cases.

翻訳日:2021-03-31 08:37:55 公開日:2021-03-28

# (参考訳) mri画像中の腫瘍同定のための画像処理技術

Image Processing Techniques for identifying tumors in an MRI image ( http://arxiv.org/abs/2103.15152v1 )

ライセンス: CC BY 4.0

Jacob John

(参考訳) 磁気共鳴イメージング(MRI)は、電波を使って体をスキャンする医用画像技術である。断層撮影技術であり、主に放射線医学の分野で用いられる。痛みのない診断方法という利点があり、MRIによって医療従事者は体内の解剖学的構造や生理的過程を鮮明な画像として描出でき、疾患の早期発見と治療が可能になる。これらの画像と画像処理技術を組み合わせることで、肉眼では識別が難しい腫瘍の検出に利用することができる。本稿では,自動腫瘍検出(ATD: Automated Tumor Detection)に用いられる画像処理技術について調査する。まず,形態学的ツール(MT)や領域成長法(RGT)といった従来の技術の比較から議論を始める。

Magnetic Resonance Imaging, or MRI, is a medical imaging technique that uses radio waves to scan the body. It is a tomographic imaging technique, principally used in the field of radiology. With the advantage of being a painless diagnostic procedure, MRI allows medical personnel to obtain clear pictures of the anatomy and the physiological processes occurring in the body, thus allowing early detection and treatment of diseases. These images, combined with image processing techniques, may be used in the detection of tumors that are difficult to identify with the naked eye. This digital assignment surveys the different image processing techniques used in Automated Tumor Detection (ATD). The assignment opens the discussion with a comparison of traditional techniques such as Morphological Tools (MT) and Region Growing Technique (RGT).

翻訳日:2021-03-31 08:30:09 公開日:2021-03-28

# (参考訳) 表象的誤りを同定するベイズ的アプローチ

A Bayesian Approach to Identifying Representational Errors ( http://arxiv.org/abs/2103.15171v1 )

ライセンス: CC BY 4.0

Ramya Ramakrishnan, Vaibhav Unhelkar, Ece Kamar, Julie Shah

(参考訳) 訓練されたAIシステムと専門家の意思決定者は、しばしば識別と理解が難しいエラーを犯すことがある。これらのエラーの根本原因を特定することで、将来の決定を改善できる。本研究は,アクター(模擬エージェント,ロボット,人間のいずれか)の行動の観察に基づいて表現誤りを推定する生成モデルである生成誤りモデル(GEM)を提案する。このモデルは2つのエラー源を考察している: 表現上の制限によって発生するもの -- "盲点" -- と、実行時のノイズやアクターのポリシーに存在する系統的エラーなど、非表現的エラーである。これら2つのエラータイプを区別することで、アクターのポリシーを的を絞って改善できる(つまり、表現誤りは知覚的な拡張を必要とするが、他のエラーはトレーニングの改善や注意支援といった方法によって削減できる)。本稿では,GEMのベイズ推定アルゴリズムを提案し,複数の領域における表現誤りの回復に対する有用性を評価する。その結果,本手法は,強化学習エージェントと人間のユーザの両方の盲点を回復できることがわかった。

Trained AI systems and expert decision makers can make errors that are often difficult to identify and understand. Determining the root cause for these errors can improve future decisions. This work presents Generative Error Model (GEM), a generative model for inferring representational errors based on observations of an actor's behavior (either simulated agent, robot, or human). The model considers two sources of error: those that occur due to representational limitations -- "blind spots" -- and non-representational errors, such as those caused by noise in execution or systematic errors present in the actor's policy. Disambiguating these two error types allows for targeted refinement of the actor's policy (i.e., representational errors require perceptual augmentation, while other errors can be reduced through methods such as improved training or attention support). We present a Bayesian inference algorithm for GEM and evaluate its utility in recovering representational errors on multiple domains. Results show that our approach can recover blind spots of both reinforcement learning agents as well as human users.

翻訳日:2021-03-31 08:21:37 公開日:2021-03-28

# (参考訳) バングラデシュにおけるcovid-19ワクチンの受容とその決定要因

Acceptance of COVID-19 Vaccine and Its Determinants in Bangladesh ( http://arxiv.org/abs/2103.15206v1 )

ライセンス: CC BY 4.0

Sultan Mahmud, Md. Mohsin, Ijaz Ahmed Khan, Ashraf Uddin Mian, Miah Akib Zaman

(参考訳) 背景:バングラデシュ政府は2021年2月上旬からSARS-CoV-2感染に対する全国的なワクチン接種を開始した。本研究の目的は、COVID-19ワクチンの受容を評価し、バングラデシュでの受容に関連する要因を検討することである。方法:2021年1月30日から2月6日まで,バングラデシュの一般住民を対象に,webベースの匿名横断調査を実施した。多変量ロジスティック回帰を用いて、COVID-19ワクチンの受容に影響を与える要因を特定した。結果:61.16%(370/605)の回答者がCOVID-19ワクチンの接種を希望していた。接種を希望したグループのうち、すぐに接種する意思を示したのは35.14%に過ぎず、64.86%はワクチンの有効性と安全性が確認されるか、バングラデシュでCOVID-19の致死性が高まるまで接種を遅らせると回答した。回帰分析の結果、年齢、性別、居住地(都市/農村)、教育水準、収入、将来COVID-19に感染するリスクの認識、感染の重症度の認識、18歳以降の予防接種経験、COVID-19とワクチン接種に関する知識の高さが、COVID-19ワクチンの受容と有意に関連していた。結論: この研究は、バングラデシュにおいてCOVID-19ワクチンの拒否と忌避の割合が高いことを報告した。ワクチン忌避を減らし、接種率を高めるために、政策立案者は予防接種の障壁を取り除くための十分に調査された予防接種戦略を設計する必要がある。ワクチンの受容を改善するため、COVID-19ワクチンに関する誤った噂や誤解を(特にインターネット上で)払拭し、人々が実際の科学的事実に触れられるようにしなければならない。

Background: The Bangladesh government launched a nationwide vaccination drive against SARS-CoV-2 infection in early February 2021. The objectives of this study were to evaluate the acceptance of the COVID-19 vaccines and examine the factors associated with the acceptance in Bangladesh. Method: Between January 30 and February 6, 2021, we conducted a web-based anonymous cross-sectional survey among the Bangladeshi general population. Multivariate logistic regression was used to identify the factors that influence the acceptance of the COVID-19 vaccination. Results: 61.16% (370/605) of the respondents were willing to accept/take the COVID-19 vaccine. Among those willing to accept it, only 35.14% showed the willingness to take the COVID-19 vaccine immediately, while 64.86% would delay the vaccination until the vaccine's efficacy and safety are confirmed or COVID-19 becomes deadlier in Bangladesh. The regression results showed that age, gender, location (urban/rural), level of education, income, perceived risk of being infected with COVID-19 in the future, perceived severity of infection, having previous vaccination experience after age 18, and having higher knowledge about COVID-19 and vaccination were significantly associated with the acceptance of COVID-19 vaccines. Conclusion: The research reported a high prevalence of COVID-19 vaccine refusal and hesitancy in Bangladesh. To diminish vaccine hesitancy and increase uptake, policymakers need to design a well-researched immunization strategy to remove the vaccination barriers. To improve vaccine acceptance among people, false rumors and misconceptions about the COVID-19 vaccines must be dispelled (especially on the internet) and people must be exposed to the actual scientific facts.

翻訳日:2021-03-31 08:07:31 公開日:2021-03-28

# (参考訳) 微分可能なモンテカルロレンダリングによる統一形状とSVBRDF回収

Unified Shape and SVBRDF Recovery using Differentiable Monte Carlo Rendering ( http://arxiv.org/abs/2103.15208v1 )

ライセンス: CC BY 4.0

Fujun Luan, Shuang Zhao, Kavita Bala, Zhao Dong

(参考訳) 計測された2次元画像から実世界の物体の形状と外観を再構成することは、コンピュータビジョンにおいて長年の課題であった。本稿では,ロバストなcoarse-to-fine最適化と物理ベースの微分可能レンダリングにより,高品質な再構成を実現するanalysis-by-synthesisに基づく新手法を提案する。幾何と反射率をほぼ別々に扱う従来の手法の多くとは異なり、この手法は物体の反射率と幾何形状の両方に関する画像勾配を活用することで両者の最適化を統一する。物理的に正確な勾配推定を得るために,最近の微分可能レンダリング理論の進歩を利用して,PyTorch3Dやrednerといった既存のツールよりも優れた性能を持ちながら,不偏な勾配を提供する新しいGPUベースのモンテカルロ微分可能レンダラを開発した。さらにロバスト性を向上させるために,形状や材質に関する複数の事前知識(prior)とcoarse-to-fine最適化戦略を利用して幾何を再構成する。本手法は,COLMAPやKinect Fusionといった従来手法よりも高品質な再構成を実現できることを示す。

Reconstructing the shape and appearance of real-world objects using measured 2D images has been a long-standing problem in computer vision. In this paper, we introduce a new analysis-by-synthesis technique capable of producing high-quality reconstructions through robust coarse-to-fine optimization and physics-based differentiable rendering. Unlike most previous methods that handle geometry and reflectance largely separately, our method unifies the optimization of both by leveraging image gradients with respect to both object reflectance and geometry. To obtain physically accurate gradient estimates, we develop a new GPU-based Monte Carlo differentiable renderer leveraging recent advances in differentiable rendering theory to offer unbiased gradients while enjoying better performance than existing tools like PyTorch3D and redner. To further improve robustness, we utilize several shape and material priors as well as a coarse-to-fine optimization strategy to reconstruct geometry. We demonstrate that our technique can produce reconstructions with higher quality than previous methods such as COLMAP and Kinect Fusion.

翻訳日:2021-03-31 07:50:24 公開日:2021-03-28

# (参考訳) 地球におけるアルゴリズム予測の限界について

On the limits of algorithmic prediction across the globe ( http://arxiv.org/abs/2103.15212v1 )

ライセンス: CC BY 4.0

Xingyu Li, Difan Song, Miaozhe Han, Yu Zhang, Rene F. Kizilcec

(参考訳) 予測アルゴリズムが人々の生活や生計に与える影響は、医学、刑事司法、金融、雇用、入学選考などで指摘されている。これらのアルゴリズムの多くは高度に発展した国々のデータと人的資本を用いて開発されている。我々は、65カ国の全国代表性のある学生データを用いて学業成績の200個の予測因子の世界的なばらつきをモデル化することにより、先進国で訓練された人間行動の予測モデルが、発展度の低い国の人々にどの程度一般化するかを検証した。ここでは、米国のデータに基づいてトレーニングされた最先端の機械学習モデルが、高い精度で学業成績を予測でき、同等の精度で他の先進国に一般化できることを示す。しかし、様々な成績予測因子の重要性が世界的にばらつくため、国の発展度が下がるにつれて精度は線形に低下し、これは政策立案者にとって有用なヒューリスティックとなる。同じモデルを各国のデータでトレーニングすると、すべての国で高い精度が得られ、ローカルデータ収集の価値が強調される。

The impact of predictive algorithms on people's lives and livelihoods has been noted in medicine, criminal justice, finance, hiring and admissions. Most of these algorithms are developed using data and human capital from highly developed nations. We tested how well predictive models of human behavior trained in a developed country generalize to people in less developed countries by modeling global variation in 200 predictors of academic achievement on nationally representative student data for 65 countries. Here we show that state-of-the-art machine learning models trained on data from the United States can predict achievement with high accuracy and generalize to other developed countries with comparable accuracy. However, accuracy drops linearly with national development due to global variation in the importance of different achievement predictors, providing a useful heuristic for policymakers. Training the same model on national data yields high accuracy in every country, which highlights the value of local data collection.

翻訳日:2021-03-31 07:30:19 公開日:2021-03-28

# (参考訳) グラフニューラルネットワークを用いた3Dポイントクラウド処理のための特徴とグラフ構築のための局所幾何学の展開

Exploiting Local Geometry for Feature and Graph Construction for Better 3D Point Cloud Processing with Graph Neural Networks ( http://arxiv.org/abs/2103.15226v1 )

ライセンス: CC BY 4.0

Siddharth Srivastava, Gaurav Sharma

(参考訳) 本稿では,3次元点群処理のためのグラフニューラルネットワーク(GNN)の汎用フレームワークにおいて,点表現と局所近傍グラフ構築の簡易かつ効果的な改善を提案する。第1の貢献として,点の重要な局所幾何情報を用いて頂点表現を拡張し,続いてMLPを用いた非線形射影を行うことを提案する。第2の貢献として,3次元点群向けGNNのグラフ構築の改善を提案する。既存の手法では、k-NNに基づいて局所近傍グラフを構築する。シーンの一部領域でセンサによる高密度サンプリングが行われる場合,この方法ではカバー範囲が減少する可能性があると論じる。提案手法は,このような問題に対処し,そのような場合のカバー範囲を改善することを目的としている。従来のGNNは、頂点が幾何学的解釈を持たないような一般グラフを扱うように設計されているため、この2つの提案はいずれも、3次元点群の幾何学的性質を取り入れるように一般グラフを拡張するものと見なせる。単純ではあるが,比較的クリーンなCADモデルと実世界のノイズの多いスキャンの両方を含む複数の難しいベンチマークにおいて,提案手法が3D分類(ModelNet40),部品セグメンテーション(ShapeNet),セマンティックセグメンテーション(Stanford 3D Indoor Scenes Dataset)で最先端の結果を達成することを示す。また,提案ネットワークがより高速な学習収束を実現すること,すなわち分類において約40%少ないエポック数で収束することを示す。プロジェクトの詳細はhttps://siddharthsrivastava.github.io/publication/geomgcnn/で確認できる。

We propose simple yet effective improvements in point representations and local neighborhood graph construction within the general framework of graph neural networks (GNNs) for 3D point cloud processing. As a first contribution, we propose to augment the vertex representations with important local geometric information of the points, followed by nonlinear projection using an MLP. As a second contribution, we propose to improve the graph construction for GNNs for 3D point clouds. The existing methods work with a k-NN based approach for constructing the local neighborhood graph. We argue that it might lead to reduction in coverage in case of dense sampling by sensors in some regions of the scene. The proposed method aims to counter such problems and improve coverage in such cases. As the traditional GNNs were designed to work with general graphs, where vertices may have no geometric interpretations, we see both our proposals as augmenting the general graphs to incorporate the geometric nature of 3D point clouds. While being simple, we demonstrate with multiple challenging benchmarks, with relatively clean CAD models, as well as with real world noisy scans, that the proposed method achieves state of the art results on benchmarks for 3D classification (ModelNet40), part segmentation (ShapeNet) and semantic segmentation (Stanford 3D Indoor Scenes Dataset). We also show that the proposed network achieves faster training convergence, i.e., ~40% fewer epochs for classification. The project details are available at https://siddharthsrivastava.github.io/publication/geomgcnn/

翻訳日:2021-03-31 07:29:19 公開日:2021-03-28

# (参考訳) 地域降雨予測のための過小評価モデルKNN

KNN, An Underestimated Model for Regional Rainfall Forecasting ( http://arxiv.org/abs/2103.15235v1 )

ライセンス: CC BY 4.0

Ning Yu and Timothy Haskins

(参考訳) 地域降雨予測は水文学と気象学において重要な課題である。本稿では,特に深層ニューラルネットワーク,ワイドニューラルネットワーク,ディープ・アンド・ワイドニューラルネットワーク,Reservoir Computing,Long Short Term Memory,Support Vector Machine,K-Nearest Neighborといった最先端のディープラーニングアルゴリズムを応用して,地域降水量を予測する統合ツールの設計を目的とする。実験結果と,分類と回帰を含む機械学習モデルとの比較により,KNNは降水データの不確実性を扱う他のモデルよりも優れたモデルであることがわかった。また, ZScore や MinMax などのデータ正規化手法も検討し検討した。

Regional rainfall forecasting is an important issue in hydrology and meteorology. This paper aims to design an integrated tool by applying various machine learning algorithms, especially the state-of-the-art deep learning algorithms including Deep Neural Network, Wide Neural Network, Deep and Wide Neural Network, Reservoir Computing, Long Short Term Memory, Support Vector Machine, K-Nearest Neighbor for forecasting regional precipitations over different catchments in Upstate New York. Through the experimental results and the comparison among machine learning models including classification and regression, we find that KNN is an outstanding model over other models to handle the uncertainty in the precipitation data. The data normalization methods such as ZScore and MinMax are also evaluated and discussed.
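
As a rough illustration of the two ingredients named in the abstract above (K-Nearest Neighbor prediction and ZScore/MinMax normalization), here is a minimal sketch in plain Python. It is not the authors' tool: the helper names, the toy feature columns (humidity, pressure) and the rainfall values are all invented for illustration, and the query is normalized together with the training rows only to keep the example short.

```python
import math

def zscore(column):
    """Standardize one feature column to zero mean and unit variance."""
    mean = sum(column) / len(column)
    var = sum((x - mean) ** 2 for x in column) / len(column)
    std = math.sqrt(var) or 1.0  # guard against constant columns
    return [(x - mean) / std for x in column]

def minmax(column):
    """Rescale one feature column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    span = (hi - lo) or 1.0  # guard against constant columns
    return [(x - lo) / span for x in column]

def normalize(rows, scaler):
    """Apply a per-feature scaler to a list of feature rows."""
    columns = [scaler(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]

def knn_predict(train_x, train_y, query, k=3):
    """Average the targets of the k training rows nearest to the query."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

# Hypothetical toy data: (humidity %, pressure hPa) -> rainfall in mm.
features = [[60.0, 1012.0], [85.0, 1003.0], [90.0, 1001.0],
            [55.0, 1015.0], [88.0, 1002.0]]
rainfall = [0.0, 12.0, 15.0, 0.0, 13.0]
query = [87.0, 1002.5]

scaled = normalize(features + [query], zscore)
prediction = knn_predict(scaled[:-1], rainfall, scaled[-1], k=3)
print(prediction)
```

Without normalization the pressure column, whose raw values are around 1000, would dominate the Euclidean distance, which is presumably why the paper evaluates the scaling choice alongside the model.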

翻訳日:2021-03-31 07:12:32 公開日:2021-03-28

# (参考訳) resnetsの再検討: 高次スキームによるスタック戦略の改善

Rethinking ResNets: Improved Stacking Strategies With High Order Schemes ( http://arxiv.org/abs/2103.15244v1 )

ライセンス: CC BY 4.0

Zhengbo Luo and Zitang Sun and Weilian Zhou and Sei-ichiro Kamata

(参考訳) さまざまなDeep Neural Networkアーキテクチャは、コンピュータビジョンにおいて非常に重要な記録を維持している。世界中の注目を集めている一方で、全体構造の設計には一般的なガイダンスが欠けている。近年,数名の研究者が観測したdnn設計と数値微分方程式の関係から,残差設計と高次視点との公平な比較を行った。我々は,dnnの設計戦略を広く活用し,小さな設計を常に積み重ねることで,理論的知識が充実し,余分なパラメータも必要とせず,容易に改善できることを示す。我々は,多くの実効ネットワークを微分方程式の異なる数値的離散化として解釈できるという観測から着想を得た,高次手法で残差設計を再構成した。 resnet の設計は euler forward という比較的単純なスキームに従っているが、スタックの状況は急速に複雑になっている。スタックされたresnetが何らかの高次スキームに等しくなっていると仮定すると、現在の転送の方法はrunge-kuttaのような典型的な高次手法と比較すると比較的弱い可能性がある。そこで本研究では, cvベンチマークを十分な実験で検証するために, 高次resnetを提案する。安定して顕著なパフォーマンスの上昇が観察され、収束と堅牢性が恩恵を受ける。

Various Deep Neural Network architectures hold many important records in computer vision. While drawing attention worldwide, the design of the overall structure somehow lacks general guidance. Based on the relationship between DNN design and numerical differential equations, which several researchers observed in recent years, we perform a fair comparison of residual design with higher-order perspectives. We show that the widely used DNN design strategy, constantly stacking a small design, could be easily improved, supported by solid theoretical knowledge and with no extra parameters needed. We reorganize the residual design in higher-order ways, which is inspired by the observation that many effective networks could be interpreted as different numerical discretizations of differential equations. The design of ResNet follows a relatively simple scheme, which is Euler forward; however, the situation gets complicated rapidly while stacking. We suppose stacked ResNet is somehow equivalent to a higher-order scheme; then the current way of forward propagation might be relatively weak compared with a typical high-order method like Runge-Kutta. We propose higher-order ResNet to verify the hypothesis on widely used CV benchmarks with sufficient experiments. Stable and noticeable rises in performance are observed, and convergence and robustness also benefit.
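
The Euler-forward reading of a residual block mentioned above can be made concrete with a scalar toy problem. The sketch below is only an illustration under stated assumptions: the abstract does not say which higher-order scheme the authors use, so Heun's method (a second-order Runge-Kutta scheme) is swapped in here as an example, and the test function `decay` and all helper names are hypothetical. Both updates reuse the same function `f`, so the second-order variant adds no parameters.

```python
import math

def euler_block(f, x, h):
    """ResNet-style update x + h * f(x): one forward Euler step."""
    return x + h * f(x)

def heun_block(f, x, h):
    """Second-order Runge-Kutta (Heun) update that reuses the same f twice."""
    k1 = f(x)
    k2 = f(x + h * k1)
    return x + 0.5 * h * (k1 + k2)

def stack(block, f, x0, depth, h):
    """Apply the same block `depth` times, like stacking identical layers."""
    x = x0
    for _ in range(depth):
        x = block(f, x, h)
    return x

def decay(x):
    """Right-hand side of dx/dt = -x, whose exact solution is exp(-t)."""
    return -x

depth, h = 10, 0.1
exact = math.exp(-1.0)  # exact value of x(1) for x(0) = 1
euler_err = abs(stack(euler_block, decay, 1.0, depth, h) - exact)
heun_err = abs(stack(heun_block, decay, 1.0, depth, h) - exact)
print(euler_err, heun_err)
```

On this toy problem the second-order stack lands closer to the exact solution at the same depth, which is the intuition behind comparing residual stacking with higher-order schemes; it says nothing by itself about accuracy on vision benchmarks.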

翻訳日:2021-03-31 06:57:04 公開日:2021-03-28

# HiT:ビデオテキスト検索のためのモーメントコントラスト付き階層変換器

HiT: Hierarchical Transformer with Momentum Contrast for Video-Text Retrieval ( http://arxiv.org/abs/2103.15049v1 )

ライセンス: Link先を確認

Song Liu and Haoqi Fan and Shengsheng Qian and Yiru Chen and Wenkui Ding and Zhongyuan Wang

(参考訳) インターネット上のマルチメディアデータの爆発的増加に伴い,ビデオテキスト検索はホットな研究課題となっている。ビデオテキスト学習用トランスフォーマーは,有望な性能により注目を集めている。しかしながら,既存のクロスモーダルトランスフォーマーアプローチには,一般に2つの大きな制限がある。(1)層ごとに異なる特徴特性を持つトランスフォーマーアーキテクチャが十分に活用されていない。(2)エンドツーエンドのトレーニング機構が,ミニバッチ内のサンプル間の負例の相互作用を制限する。本稿では,ビデオテキスト検索のための階層変換器(HiT)という新しい手法を提案する。HiTは特徴レベルと意味レベルで階層的なクロスモーダルコントラストマッチングを行い、多視点で包括的な検索結果を得る。さらに,MoCoに着想を得て,クロスモーダル学習のためのMomentum Cross-modal Contrastを提案する。これは大規模な負例の相互作用をオンザフライで可能にし,より精密で識別的な表現の生成に寄与する。3つの主要なビデオテキスト検索ベンチマークデータセットの実験結果は,本手法の利点を示している。

Video-Text Retrieval has been a hot research topic with the explosion of multimedia data on the Internet. Transformer for video-text learning has attracted increasing attention due to the promising performance. However, existing cross-modal transformer approaches typically suffer from two major limitations: 1) Limited exploitation of the transformer architecture where different layers have different feature characteristics. 2) End-to-end training mechanism limits negative interactions among samples in a mini-batch. In this paper, we propose a novel approach named Hierarchical Transformer (HiT) for video-text retrieval. HiT performs hierarchical cross-modal contrastive matching in feature-level and semantic-level to achieve multi-view and comprehensive retrieval results. Moreover, inspired by MoCo, we propose Momentum Cross-modal Contrast for cross-modal learning to enable large-scale negative interactions on-the-fly, which contributes to the generation of more precise and discriminative representations. Experimental results on three major Video-Text Retrieval benchmark datasets demonstrate the advantages of our methods.

翻訳日:2021-03-30 15:25:56 公開日:2021-03-28

# グラフ埋め込みによる一般ハイパーグラフのコミュニティ検出

Community Detection in General Hypergraph via Graph Embedding ( http://arxiv.org/abs/2103.15035v1 )

ライセンス: Link先を確認

Yaoming Zhen and Junhui Wang

(参考訳) 近年,ネットワークデータに大きな注目が集まっており,従来のネットワークの多くは2つの頂点間のペアワイズな相互作用に注目している。しかし、実際のネットワークデータはより複雑な構造を示すことがあり、頂点間の多方向の相互作用が自然に発生する。本稿では,一様か非一様かを問わず,一般のハイパーグラフネットワークにおけるコミュニティ構造を検出する新しい手法を提案する。提案手法では,ヌル頂点を導入して非一様ハイパーグラフを一様な多重ハイパーグラフに拡張し,同一コミュニティ内の頂点が互いに近くなるように,その多重ハイパーグラフを低次元ベクトル空間に埋め込む。得られる最適化問題は、交互更新スキームによって効率的に解くことができる。提案手法の漸近的一致性は,コミュニティ検出とハイパーグラフ推定の両面で確立され,いくつかの合成および実世界のハイパーグラフネットワークにおける数値実験でも支持されている。

Network data has attracted tremendous attention in recent years, and most conventional networks focus on pairwise interactions between two vertices. However, real-life network data may display more complex structures, and multi-way interactions among vertices arise naturally. In this article, we propose a novel method for detecting community structure in general hypergraph networks, uniform or non-uniform. The proposed method introduces a null vertex to augment a non-uniform hypergraph into a uniform multi-hypergraph, and then embeds the multi-hypergraph in a low-dimensional vector space such that vertices within the same community are close to each other. The resultant optimization task can be efficiently tackled by an alternative updating scheme. The asymptotic consistencies of the proposed method are established in terms of both community detection and hypergraph estimation, which are also supported by numerical experiments on some synthetic and real-life hypergraph networks.
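
The null-vertex augmentation described above can be sketched in a few lines. This is one possible reading of the abstract, not the authors' implementation: it assumes that each hyperedge is padded with repeated copies of a single null vertex until all hyperedges have the size of the largest one (which is what would make the result a multi-hypergraph), and it assumes vertex labels are positive integers so that `0` is free to serve as the hypothetical null label.

```python
NULL = 0  # hypothetical label reserved for the null vertex

def augment_to_uniform(hyperedges, null=NULL):
    """Pad every hyperedge with copies of the null vertex up to the largest size."""
    m = max(len(edge) for edge in hyperedges)
    return [tuple(sorted(edge)) + (null,) * (m - len(edge))
            for edge in hyperedges]

# A non-uniform hypergraph with hyperedges of size 2, 3 and 4.
edges = [(1, 2), (2, 3, 4), (1, 3, 4, 5)]
uniform = augment_to_uniform(edges)
print(uniform)  # [(1, 2, 0, 0), (2, 3, 4, 0), (1, 3, 4, 5)]
```

After this step every hyperedge has the same cardinality, so the hypergraph can be stored as a fixed-order tensor, which is the form an embedding method for uniform hypergraphs would typically expect.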

翻訳日:2021-03-30 15:24:02 公開日:2021-03-28

# Deconfounded Score Method: Scoring DAGs with Dense Unobserved Confounding

Deconfounded Score Method: Scoring DAGs with Dense Unobserved Confounding ( http://arxiv.org/abs/2103.15106v1 )

ライセンス: Link先を確認

Alexis Bellot, Mihaela van der Schaar

(参考訳) 未観測の交絡は因果発見の最大の課題の1つである。未観測の変数が観測変数の多くに広範囲に影響を及ぼしうる場合は、他のどの部分集合で条件付けてもほとんどの変数対が条件付き従属となるため、特に困難である。本稿では,このような設定における未観測の交絡が,条件付き独立性にとどまらず,観測データ分布に特徴的な痕跡を残し,それによって見かけ上の効果と因果効果を分離できることを示す。この知見を用いて,観測変数間のスパースな線形ガウス有向非巡回グラフを近似的に復元できることを示し,汎用ソルバで実装でき高次元問題にもスケールする,調整されたスコアベースの因果発見アルゴリズムを提案する。さらに,因果構造の復元を保証するために課した条件にもかかわらず,実際の性能はモデル仮定からの大きな逸脱に対して頑健であることが判明した。

Unobserved confounding is one of the greatest challenges for causal discovery. The case in which unobserved variables have a potentially widespread effect on many of the observed ones is particularly difficult because most pairs of variables are conditionally dependent given any other subset. In this paper, we show that beyond conditional independencies, unobserved confounding in this setting leaves a characteristic footprint in the observed data distribution that allows for disentangling spurious and causal effects. Using this insight, we demonstrate that a sparse linear Gaussian directed acyclic graph among observed variables may be recovered approximately and propose an adjusted score-based causal discovery algorithm that may be implemented with general-purpose solvers and scales to high-dimensional problems. We find, in addition, that despite the conditions we pose to guarantee causal recovery, performance in practice is robust to large deviations in model assumptions.

翻訳日:2021-03-30 15:23:47 公開日:2021-03-28

# マルチビュークラスタリングのための自己教師付き判別特徴学習

Self-supervised Discriminative Feature Learning for Multi-view Clustering ( http://arxiv.org/abs/2103.15069v1 )

ライセンス: Link先を確認

Jie Xu, Yazhou Ren, Huayi Tang, Zhimeng Yang, Lili Pan, Yang Yang, Xiaorong Pu

(参考訳) マルチビュークラスタリングは、複数のビューから補完情報を活用できるため、重要な研究トピックである。しかし、クラスタリング構造が不明な特定のビューによる負の影響を考慮する方法はほとんどなく、結果としてマルチビュークラスタリング性能が低下する。この欠点に対処するために,マルチビュークラスタリング(SDMVC)のための自己教師付き識別特徴学習を提案する。具体的には、ディープオートエンコーダを用いて各ビューの埋め込み機能を独立して学習する。マルチビュー補完情報を活用するために、すべてのビューの組み込み機能を結合してグローバル機能を形成することにより、一部のビューの不明瞭なクラスタリング構造による負の影響を克服する。自己教師方式で擬似ラベルを取得し、統一された目標分布を構築し、多視点識別特徴学習を行う。このプロセスでは、全ビューを監督するためにグローバル判別情報を掘り出し、より識別的な特徴を学習し、ターゲットディストリビューションを更新するために使用される。さらに、この統合されたターゲットディストリビューションは、SDMVCが一貫性のあるクラスタ割り当てを学習できるようにし、特徴の多様性を維持しながら、複数のビューのクラスタ化一貫性を達成する。様々なタイプのマルチビューデータセットの実験により、SDMVCが最先端のパフォーマンスを達成することが示された。

Multi-view clustering is an important research topic due to its capability to utilize complementary information from multiple views. However, there are few methods to consider the negative impact caused by certain views with unclear clustering structures, resulting in poor multi-view clustering performance. To address this drawback, we propose self-supervised discriminative feature learning for multi-view clustering (SDMVC). Concretely, deep autoencoders are applied to learn embedded features for each view independently. To leverage the multi-view complementary information, we concatenate all views' embedded features to form the global features, which can overcome the negative impact of some views' unclear clustering structures. In a self-supervised manner, pseudo-labels are obtained to build a unified target distribution to perform multi-view discriminative feature learning. During this process, global discriminative information can be mined to supervise all views to learn more discriminative features, which in turn are used to update the target distribution. Besides, this unified target distribution can make SDMVC learn consistent cluster assignments, which accomplishes the clustering consistency of multiple views while preserving their features' diversity. Experiments on various types of multi-view datasets show that SDMVC achieves state-of-the-art performance.

翻訳日:2021-03-30 15:22:14 公開日:2021-03-28

# 分布平滑化による自己回帰モデリングの改善

Improved Autoregressive Modeling with Distribution Smoothing ( http://arxiv.org/abs/2103.15089v1 )

ライセンス: Link先を確認

Chenlin Meng, Jiaming Song, Yang Song, Shengjia Zhao, and Stefano Ermon

(参考訳) 自己回帰モデルは画像圧縮に優れるが、そのサンプル品質はしばしば不十分である。生成された画像は現実的ではないにもかかわらず、モデルによれば高い尤度を持つことが多く、敵対的サンプルの場合に似ている。成功した敵対的防御法に着想を得て,ランダム化平滑化を自己回帰的生成モデリングに取り入れた。まず平滑化されたデータ分布をモデル化し、次に平滑化の過程を反転させて元のデータ分布を復元する。この手順は、複数の合成データセットと実世界の画像データセットにおいて既存の自己回帰モデルのサンプル品質を劇的に改善し、合成データセットでは競争力のある尤度を得る。

While autoregressive models excel at image compression, their sample quality is often lacking. Although not realistic, generated images often have high likelihood according to the model, resembling the case of adversarial examples. Inspired by a successful adversarial defense method, we incorporate randomized smoothing into autoregressive generative modeling. We first model a smoothed version of the data distribution, and then reverse the smoothing process to recover the original data distribution. This procedure drastically improves the sample quality of existing autoregressive models on several synthetic and real-world image datasets while obtaining competitive likelihoods on synthetic datasets.

翻訳日:2021-03-30 15:21:57 公開日:2021-03-28

# 確率的分類モデルの信頼度評価のためのエントロピー法

Entropy methods for the confidence assessment of probabilistic classification models ( http://arxiv.org/abs/2103.15157v1 )

ライセンス: Link先を確認

Gabriele N. Tornetta

(参考訳) 多くの分類モデルは予測の結果として確率分布を生成する。この情報は一般に、最も高い確率を持つ単一のクラスに圧縮される。本稿では、このプロセスで破棄された情報の一部は、実際にモデルの良さ、特に各予測の信頼性を更に評価するために利用できると論じる。本稿で提示した概念の応用として,(ベルヌーイ)ナイーブベイズ生成モデルに対する補集合(complement)アプローチで観測される信頼度低下現象の理論的説明を与える。

Many classification models produce a probability distribution as the outcome of a prediction. This information is generally compressed down to the single class with the highest associated probability. In this paper, we argue that part of the information that is discarded in this process can be in fact used to further evaluate the goodness of models, and in particular the confidence with which each prediction is made. As an application of the ideas presented in this paper, we provide a theoretical explanation of a confidence degradation phenomenon observed in the complement approach to the (Bernoulli) Naive Bayes generative model.
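
To make the idea above concrete, the following sketch keeps the arg-max class but also derives a confidence score from the full predicted distribution instead of discarding it. The abstract does not specify the paper's exact measures, so the choice here (one minus the Shannon entropy normalized by its maximum, log k) is an assumption, and the function names are hypothetical.

```python
import math

def normalized_entropy(probs):
    """Shannon entropy of a predicted distribution, scaled to [0, 1]."""
    k = len(probs)
    if k < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(k)

def predict_with_confidence(probs):
    """Return (arg-max class index, confidence = 1 - normalized entropy)."""
    label = max(range(len(probs)), key=lambda i: probs[i])
    return label, 1.0 - normalized_entropy(probs)

sharp = [0.90, 0.05, 0.05]  # peaked distribution
flat = [0.40, 0.35, 0.25]   # nearly uniform distribution
print(predict_with_confidence(sharp))
print(predict_with_confidence(flat))
```

Both distributions yield the same predicted class, yet the peaked one receives a clearly higher confidence score, which is exactly the information lost when only the top class is reported.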

翻訳日:2021-03-30 15:16:16 公開日:2021-03-28

# InsertGNN: グラフニューラルネットワークはTOEFL文挿入問題において人間より優れているか?

InsertGNN: Can Graph Neural Networks Outperform Humans in TOEFL Sentence Insertion Problem? ( http://arxiv.org/abs/2103.15066v1 )

ライセンス: Link先を確認

Fang Wu and Xiang Bai

(参考訳) 文挿入は繊細だが基本的なNLP問題である。文順序付け、テキストコヒーレンス、質問応答(QA)の現在のアプローチは、その解決には適さない。本稿では,この問題をグラフとして表現し,グラフニューラルネットワーク(GNN)を用いて文間の関係を学習するシンプルなモデルであるInsertGNNを提案する。また、近隣の文の局所的な相互作用を考慮できる地域情報とグローバル情報の両方で教師されている。我々の知る限りでは、文挿入に教師付きグラフ構造化モデルを適用する試みとしてはこれが初めてである。本手法を新たに収集したtoeflデータセットで評価し,クロスドメイン学習を用いた大規模arxivデータセットの有効性をさらに検証した。実験の結果,InsertGNNは教師なしテキストコヒーレンス手法,トポロジカル文順序付け手法,QAアーキテクチャよりも優れていた。具体的には、平均的な人間のテストスコアに匹敵する70%の精度を達成する。

Sentence insertion is a delicate but fundamental NLP problem. Current approaches in sentence ordering, text coherence, and question answering (QA) are neither suitable nor good at solving it. In this paper, we propose InsertGNN, a simple yet effective model that represents the problem as a graph and adopts the graph neural network (GNN) to learn the connection between sentences. It is also supervised by both the local and global information so that the local interactions of neighboring sentences can be considered. To the best of our knowledge, this is the first recorded attempt to apply a supervised graph-structured model in sentence insertion. We evaluate our method on our newly collected TOEFL dataset and further verify its effectiveness on the larger arXiv dataset using cross-domain learning. The experiments show that InsertGNN outperforms the unsupervised text coherence method, the topological sentence ordering approach, and the QA architecture. Specifically, it achieves an accuracy of 70%, rivaling the average human test scores.

翻訳日:2021-03-30 15:12:58 公開日:2021-03-28

# 汎用知能の一般理論 : プラグマティック・パターン主義の視点から

The General Theory of General Intelligence: A Pragmatic Patternist Perspective ( http://arxiv.org/abs/2103.15100v1 )

ライセンス: Link先を確認

Ben Goertzel

(参考訳) 一連の書籍や論文で表現され、一連の実用および研究プロトタイプソフトウェアシステムの指針として用いられてきた、人工的および自然的な汎用知能の理論的基礎に関する数十年にわたる探究を、適度な詳細度でレビューする。このレビューでは、基礎となる哲学(心のパターン主義哲学、基礎的な現象論的および論理的オントロジー)、知能の概念の形式化、そしてこれらの形式化と哲学によって部分的に導かれるAGIシステムのための高レベルアーキテクチャの提案を取り上げている。論理的推論、プログラム学習、クラスタリング、注意割当てといった特定の認知過程の実装を、この高レベルアーキテクチャの文脈と言語において考察し、様々なプロセス間の「認知シナジー」を可能にする共通の(例えば型付きメタグラフに基づく)知識表現の重要性も同様に考察する。人間のような認知アーキテクチャの詳細は、これらの一般原則の現れとして提示され、機械意識と機械倫理の重要な側面もこの文脈で扱われる。OpenCog Hyperonのようなフレームワークにおける高度なAGIの実践的な実装への教訓を簡潔に検討する。

A multi-decade exploration into the theoretical foundations of artificial and natural general intelligence, which has been expressed in a series of books and papers and used to guide a series of practical and research-prototype software systems, is reviewed at a moderate level of detail. The review covers underlying philosophies (patternist philosophy of mind, foundational phenomenological and logical ontology), formalizations of the concept of intelligence, and a proposed high level architecture for AGI systems partly driven by these formalizations and philosophies. The implementation of specific cognitive processes such as logical reasoning, program learning, clustering and attention allocation in the context and language of this high level architecture is considered, as is the importance of a common (e.g. typed metagraph based) knowledge representation for enabling "cognitive synergy" between the various processes. The specifics of human-like cognitive architecture are presented as manifestations of these general principles, and key aspects of machine consciousness and machine ethics are also treated in this context. Lessons for practical implementation of advanced AGI in frameworks such as OpenCog Hyperon are briefly considered.

翻訳日:2021-03-30 15:11:29 公開日:2021-03-28

# LSG-CPD:点雲登録のための局所表面形状のコヒーレント点ドリフト

LSG-CPD: Coherent Point Drift with Local Surface Geometry for Point Cloud Registration ( http://arxiv.org/abs/2103.15039v1 )

ライセンス: Link先を確認

Weixiao Liu, Hongtao Wu, Gregory Chirikjian

(参考訳) 確率的な点群レジストレーション手法は,その堅牢性から人気が高まっている。しかし、表面法線などの局所的な表面幾何情報を取り入れる反復最近接点(ICP)の点対平面型の変種とは異なり、ほとんどの確率的手法(例えばコヒーレント点ドリフト(CPD))はそのような情報を無視し、等方的ガウス共分散を持つガウス混合モデル(GMM)を構築する。この結果、球状のGMM成分は2つの点群の間の点対点距離のみにペナルティを課す。本稿では,剛体点群レジストレーションのための,局所表面形状を用いたCPD(LSG-CPD)と呼ぶ新しい手法を提案する。本手法は,局所表面の平坦度に基づいて,点対点のペナルティに加えて異なるレベルの点対平面のペナルティを適応的に付加する。これにより、異方性共分散を持つGMM成分が得られる。我々は,点群レジストレーションを最尤推定(MLE)問題として定式化し,期待値最大化(EM)アルゴリズムを用いて解く。Eステップでは、計算を単純な行列操作に書き換え、GPU上で効率的に計算できることを示す。Mステップでは、行列リー群上で制約なし最適化を行い、レジストレーションの剛体変換を効率的に更新する。提案手法は、レンジスキャナ、RGBDカメラ、LiDARで取得した各種データセットにおいて、精度とロバスト性の観点から最先端アルゴリズムを上回る。また、CPDの最近の実装よりもかなり高速である。コードは公開予定である。

Probabilistic point cloud registration methods are becoming more popular because of their robustness. However, unlike point-to-plane variants of iterative closest point (ICP) which incorporate local surface geometric information such as surface normals, most probabilistic methods (e.g., coherent point drift (CPD)) ignore such information and build Gaussian mixture models (GMMs) with isotropic Gaussian covariances. This results in sphere-like GMM components which only penalize the point-to-point distance between the two point clouds. In this paper, we propose a novel method called CPD with Local Surface Geometry (LSG-CPD) for rigid point cloud registration. Our method adaptively adds different levels of point-to-plane penalization on top of the point-to-point penalization based on the flatness of the local surface. This results in GMM components with anisotropic covariances. We formulate point cloud registration as a maximum likelihood estimation (MLE) problem and solve it with the Expectation-Maximization (EM) algorithm. In the E step, we demonstrate that the computation can be recast into simple matrix manipulations and efficiently computed on a GPU. In the M step, we perform an unconstrained optimization on a matrix Lie group to efficiently update the rigid transformation of the registration. The proposed method outperforms state-of-the-art algorithms in terms of accuracy and robustness on various datasets captured with range scanners, RGBD cameras, and LiDARs. Also, it is significantly faster than modern implementations of CPD. The code will be released.
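
The combination of point-to-point and point-to-plane penalization described above can be sketched as a single anisotropic distance. This is a minimal illustration, not the authors' implementation: the function name `anisotropic_penalty`, the weight `alpha`, and the exact form of the penalty (squared distance plus `alpha` times the squared distance along the surface normal, divided by a variance `sigma2`) are assumptions chosen to show the idea.

```python
def anisotropic_penalty(x, y, normal, alpha, sigma2):
    """Hypothetical anisotropic penalty between points x and y.

    normal : unit surface normal at y (assumed to be given)
    alpha  : extra weight on the point-to-plane term (0 gives the isotropic case)
    sigma2 : isotropic variance shared by both terms
    """
    d = [xi - yi for xi, yi in zip(x, y)]
    # Point-to-point term: squared Euclidean distance.
    point_to_point = sum(di * di for di in d)
    # Point-to-plane term: squared displacement along the normal.
    point_to_plane = sum(di * ni for di, ni in zip(d, normal)) ** 2
    return (point_to_point + alpha * point_to_plane) / sigma2
```

With `alpha > 0`, a displacement along the normal costs more than the same displacement within the tangent plane, which corresponds to a Gaussian component that is flattened along the normal direction.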

翻訳日:2021-03-30 15:04:47 公開日:2021-03-28

# 長期音声認識における蒸留仮想例

Distilling Virtual Examples for Long-tailed Recognition ( http://arxiv.org/abs/2103.15042v1 )

ライセンス: Link先を確認

Yin-Yin He, Jianxin Wu, Xiu-Shen Wei

(参考訳) 本稿では,Distill the Virtual Examples(DiVE)法を提案することにより,知識蒸留の観点からの長期視覚認識問題に取り組む。具体的には,教師モデルの予測を仮想例として扱うことで,これらの仮想例からの蒸留が一定の制約下でラベル分布学習と等価であることを示す。仮想的なサンプル分布が元の入力分布よりも平坦になると、表現不足のテールクラスは大幅に改善され、ロングテール認識に欠かせないことが示される。提案手法では,仮想サンプル分布のフラット化を明示的に調整できる。大規模なiNaturalistを含む3つのベンチマークデータセットに対する大規模な実験は、提案したDiVEメソッドが最先端の手法を大幅に上回ることを正当化している。さらに、仮想的なサンプル解釈を検証し、長い尾問題に対するDiVEの調整済み設計の有効性を実証する。

In this paper, we tackle the long-tailed visual recognition problem from the knowledge distillation perspective by proposing a Distill the Virtual Examples (DiVE) method. Specifically, by treating the predictions of a teacher model as virtual examples, we prove that distilling from these virtual examples is equivalent to label distribution learning under certain constraints. We show that when the virtual example distribution becomes flatter than the original input distribution, the under-represented tail classes will receive significant improvements, which is crucial in long-tailed recognition. The proposed DiVE method can explicitly tune the virtual example distribution to become flat. Extensive experiments on three benchmark datasets, including the large-scale iNaturalist ones, show that the proposed DiVE method can significantly outperform state-of-the-art methods. Furthermore, additional analyses and experiments verify the virtual example interpretation, and demonstrate the effectiveness of tailored designs in DiVE for long-tailed problems.
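
One standard way to flatten a teacher's predicted distribution in knowledge distillation is a softmax temperature. The sketch below uses that device only to illustrate what "flatter" means; the abstract does not say which mechanism DiVE uses, so the temperature parameter and the helper names are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax with a temperature; larger temperatures give flatter outputs."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; higher entropy means a flatter distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)
```

For a teacher output dominated by a head class, raising the temperature moves probability mass toward the low-scoring (tail) classes and increases the entropy of the virtual example distribution.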

翻訳日:2021-03-30 15:04:20 公開日:2021-03-28

# 騒々しいラベルから学ぶ友人とファン

Friends and Foes in Learning from Noisy Labels ( http://arxiv.org/abs/2103.15055v1 )

ライセンス: Link先を確認

Yifan Zhou, Yifan Ge, Jianxin Wu

(参考訳) ノイズの多いラベルを持つ例から学ぶことが近年注目を集めている。しかし,本論文では,CIFARに基づくデータセットと文献で使用される精度評価基準が,この文脈では不適切であることを示す。本稿では,この分野における適切な研究と評価を促進するために,代替有効な評価指標と新しいデータセットを提案する。そして, 従来の手法から, ノイズのあるラベル付きサンプルからの深層学習に有益あるいは有害な技術要素として友人や敵を同定し, 自己教師付き学習, 新たなウォームアップ戦略, インスタンスフィルタリング, ラベル修正など, 友人のカテゴリからの技術的構成要素を改善し, 組み合わせる。得られたF&F法は,提案したnCIFARデータセットと実世界のChrothing1Mデータセットの既存手法を著しく上回っている。

Learning from examples with noisy labels has attracted increasing attention recently. However, this paper shows that the commonly used CIFAR-based datasets and the accuracy evaluation metric used in the literature are both inappropriate in this context. An alternative valid evaluation metric and new datasets are proposed to promote proper research and evaluation in this area. Friends and foes are then identified from existing methods as technical components that are, respectively, beneficial or detrimental to deep learning from examples with noisy labels, and this paper improves and combines technical components from the friends category, including self-supervised learning, a new warmup strategy, instance filtering, and label correction. The resulting F&F method significantly outperforms existing methods on the proposed nCIFAR datasets and the real-world Clothing1M dataset.

翻訳日:2021-03-30 15:04:06 公開日:2021-03-28

# warpへの注意:多変量時系列のためのディープメトリック学習

Attention to Warp: Deep Metric Learning for Multivariate Time Series ( http://arxiv.org/abs/2103.15074v1 )

ライセンス: Link先を確認

Shinnosuke Matsuo, Xiaomeng Wu, Gantugs Atarsaikhan, Akisato Kimura, Kunio Kashino, Brian Kenji Iwana, Seiichi Uchida

(参考訳) ディープ時系列計量学習は、非マッチングシーケンスを識別する際の時間的不変性と非線形歪みとのトレードオフが難しいため、困難である。本稿では,ロバストだが判別可能な時系列分類と検証のためのニューラルネットワークに基づく新しい手法を提案する。このアプローチは、より大きく適応的な時間的不変性に対して、パラメータ化された注意モデルを時間歪みに適応させる。局所的だけでなく大きな大域的歪みにも頑健であり、一調性、連続性、境界条件を満たさない対でさえもうまく同定できる。このモデルの学習は動的時間ワープによってさらに誘導され、安定したトレーニングと高い差別力のための時間的制約が課される。ウォーピングによってクラス間のバリエーションを増強し、類似しているが異なるクラスを効果的に区別することができる。提案手法は,Unipenデータセット上のシングルレター手書き分類において有望な動作を確認した後,オンライン署名検証フレームワークと組み合わせることで,従来の非パラメトリック・ディープモデルよりも優れていることを示す。

Deep time series metric learning is challenging due to the difficult trade-off between temporal invariance to nonlinear distortion and discriminative power in identifying non-matching sequences. This paper proposes a novel neural network-based approach for robust yet discriminative time series classification and verification. This approach adapts a parameterized attention model to time warping for greater and more adaptive temporal invariance. It is robust against not only local but also large global distortions, so that even matching pairs that do not satisfy the monotonicity, continuity, and boundary conditions can still be successfully identified. Learning of this model is further guided by dynamic time warping to impose temporal constraints for stabilized training and higher discriminative power. It can learn to augment the inter-class variation through warping, so that similar but different classes can be effectively distinguished. We experimentally demonstrate the superiority of the proposed approach over previous non-parametric and deep models by combining it with a deep online signature verification framework, after confirming its promising behavior in single-letter handwriting classification on the Unipen dataset.
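
For reference, the classical dynamic time warping (DTW) that the abstract uses as guidance can be sketched as below. This is the textbook dynamic program for scalar sequences with an absolute-difference cost, not the learned attention model of the paper; the function name is illustrative.

```python
def dtw_distance(a, b):
    """Classical DTW cost between two scalar sequences.

    The recurrence enforces the boundary condition (paths start at the first
    pair and end at the last pair) and only allows monotone, continuous steps.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

A sequence and a locally stretched copy of it have cost zero, for example `dtw_distance([1, 2, 3], [1, 2, 2, 3])`. Pairs that violate monotonicity or continuity cannot be aligned this way, which is the limitation the attention-based warping is meant to relax.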

翻訳日:2021-03-30 15:03:51 公開日:2021-03-28

# Picasso:3DメッシュによるディープラーニングのためのCUDAベースのライブラリ

Picasso: A CUDA-based Library for Deep Learning over 3D Meshes ( http://arxiv.org/abs/2103.15076v1 )

ライセンス: Link先を確認

Huan Lei, Naveed Akhtar, Ajmal Mian

(参考訳) 複雑な現実世界の3Dメッシュ上でのディープラーニングのための新しいモジュールで構成されるCUDAベースのライブラリであるPicassoを紹介する。階層型ニューラルネットワークアーキテクチャは、高速メッシュデシミテーションの必要性を示すマルチスケールの特徴抽出に有効であることが証明されている。しかし、既存の手法はマルチレゾリューションメッシュを得るためにCPUベースの実装に依存している。我々は,ネットワーク解像度の低減を図るために,GPU加速メッシュデシメーションを設計する。プールおよびアンプールモジュールは、デシメーション時に収集された頂点クラスタ上で定義される。メッシュ上の特徴学習には、facet2vertex、vertex2facet、facet2facetという3種類の新しい畳み込みが含まれている。したがって、メッシュを従来の方法のようにエッジを持つ空間グラフではなく、頂点と面からなる幾何学的構造として扱う。 Picassoはまた、メッシュサンプリング(頂点密度)に対する堅牢性のためのファジィ機構をフィルタに組み込んでいる。これは、ファゼット2頂点の畳み込みのファジィ係数を定義するためにガウス混合を利用し、残りの2つの畳み込みの係数を定義するためにバリ中心補間を行う。本稿では,S3DIS上での競合セグメンテーション結果を用いた提案モジュールの有効性を示す。ライブラリはhttps://github.com/hlei-ziyan/picassoで公開される。

We present Picasso, a CUDA-based library comprising novel modules for deep learning over complex real-world 3D meshes. Hierarchical neural architectures have proved effective in multi-scale feature extraction, which signifies the need for fast mesh decimation. However, existing methods rely on CPU-based implementations to obtain multi-resolution meshes. We design GPU-accelerated mesh decimation to facilitate network resolution reduction efficiently on-the-fly. Pooling and unpooling modules are defined on the vertex clusters gathered during decimation. For feature learning over meshes, Picasso contains three types of novel convolutions, namely facet2vertex, vertex2facet, and facet2facet convolution. Hence, it treats a mesh as a geometric structure comprising vertices and facets, rather than a spatial graph with edges as previous methods do. Picasso also incorporates a fuzzy mechanism in its filters for robustness to mesh sampling (vertex density). It exploits Gaussian mixtures to define fuzzy coefficients for the facet2vertex convolution, and barycentric interpolation to define the coefficients for the remaining two convolutions. In this release, we demonstrate the effectiveness of the proposed modules with competitive segmentation results on S3DIS. The library will be made public through https://github.com/hlei-ziyan/Picasso.
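
Barycentric interpolation, which the abstract names as the source of coefficients for two of the convolutions, can be sketched for a 2D triangle as follows. This is the generic geometric formula, not code from the Picasso library; the function names are illustrative.

```python
def barycentric(p, a, b, c):
    """Barycentric coordinates of 2D point p with respect to triangle (a, b, c)."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    w_a = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    w_b = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return w_a, w_b, 1.0 - w_a - w_b


def interpolate(weights, vertex_values):
    """Blend per-vertex values with barycentric weights."""
    return sum(w * v for w, v in zip(weights, vertex_values))
```

The three weights sum to one, equal `(1, 0, 0)` at vertex `a`, and are all equal at the centroid, so a value attached to a point inside a facet varies smoothly between the values at its vertices.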

翻訳日:2021-03-30 15:03:33 公開日:2021-03-28

# オープンセット認識のための学習プレースホルダ

Learning Placeholders for Open-Set Recognition ( http://arxiv.org/abs/2103.15086v1 )

ライセンス: Link先を確認

Da-Wei Zhou, Han-Jia Ye, De-Chuan Zhan

(参考訳) 従来の分類器はクローズドセット設定でデプロイされ、トレーニングクラスとテストクラスは同じセットに属する。しかし、現実世界のアプリケーションはおそらく未知のカテゴリの入力に直面し、モデルはそれらを既知のカテゴリとして認識する。このような状況下では、既知のクラスにおける分類性能を維持し、未知のクラスを拒否するオープンセット認識が提案されている。クローズドセットモデルは、既知のクラスインスタンスよりも自信過剰な予測を行うため、オープンセット環境に拡張する際に、カテゴリ間のキャリブレーションとしきい値化が不可欠な問題となる。そこで我々は,データと分類器の両方にプレースホルダを割り当てることで未知のクラスに備えるオープンセット認識 (proser) のためのプレースホルダの学習を提案した。具体的には、学習データプレースホルダはオープンセットのクラスデータを予測し、クローズドセットのトレーニングをオープンセットのトレーニングに変換する。さらに,ターゲットクラスと非ターゲットクラスの不変情報を学習するために,分類器のプレースホルダーを,未知と未知のクラス固有の境界として予約する。提案するproserは,多様体混合により新しいクラスを効率的に生成し,訓練中に予約されたオープンセット分類器の値を適応的に設定する。提案手法の有効性を検証した各種データセットの実験を行った。

Traditional classifiers are deployed under a closed-set setting, with both training and test classes belonging to the same set. However, real-world applications are likely to face inputs from unknown categories, and the model will recognize them as known ones. Under such circumstances, open-set recognition is proposed to maintain classification performance on known classes and reject unknowns. Closed-set models make overconfident predictions over familiar known-class instances, so calibration and thresholding across categories become essential issues when extending to an open-set environment. To this end, we propose to learn PlaceholdeRs for Open-SEt Recognition (Proser), which prepares for the unknown classes by allocating placeholders for both data and classifier. In detail, learning data placeholders tries to anticipate open-set class data, thus transforming closed-set training into open-set training. In addition, to learn the invariant information between target and non-target classes, we reserve classifier placeholders as the class-specific boundary between known and unknown. The proposed Proser efficiently generates novel classes by manifold mixup, and adaptively sets the value of the reserved open-set classifier during training. Experiments on various datasets validate the effectiveness of our proposed method.
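
The two kinds of placeholder can be illustrated with a small sketch. It assumes that a data placeholder is formed by mixing the hidden features of two samples (manifold mixup) and that the classifier placeholder is one extra logit appended after the known classes; both helper names and the decision rule are assumptions for illustration, not the authors' code.

```python
def mixup(hidden_a, hidden_b, lam):
    """Convex combination of two hidden feature vectors (manifold mixup)."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(hidden_a, hidden_b)]


def predict_open_set(logits):
    """Return the predicted known-class index, or "unknown".

    Assumption: the last entry of `logits` is the reserved placeholder logit.
    """
    best = max(range(len(logits)), key=logits.__getitem__)
    return "unknown" if best == len(logits) - 1 else best
```

Under these assumptions, features mixed from two different known classes would be trained toward the placeholder output, and at test time an input is rejected whenever the placeholder logit is the largest.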

翻訳日:2021-03-30 15:03:14 公開日:2021-03-28

# ACSNet:時間的行動局所化を弱める行動コンテキスト分離ネットワーク

ACSNet: Action-Context Separation Network for Weakly Supervised Temporal Action Localization ( http://arxiv.org/abs/2103.15088v1 )

ライセンス: Link先を確認

Ziyi Liu, Le Wang, Qilin Zhang, Wei Tang, Junsong Yuan, Nanning Zheng, Gang Hua

(参考訳) Weakly-supervised Temporal Action Localization (WS-TAL) の目的は、すべてのアクションインスタンスをビデオレベルの監視のみでトリミングされたビデオにローカライズすることである。トレーニング中にフレームレベルのアノテーションがないため、現在のWS-TALメソッドはビデオレベルの分類タスクに寄与する前景のスニペットやフレームをローカライズするアテンションメカニズムに依存している。この戦略は、ローカライゼーション結果においてコンテキストを実際のアクションと混同することが多い。アクションとコンテキストの分離は、正確なWS-TALにとって重要な問題ですが、非常に困難で、文献でほとんど無視されています。本稿では,アクションローカライズのためのコンテキストを明示的に考慮した行動コンテキスト分離ネットワーク(ACSNet)を提案する。 2つのブランチ(すなわちフォアグラウンドバックグラウンドブランチとアクションコンテキストブランチ)で構成されている。前景背景ブランチは、まずビデオ全体の背景と前景を区別する一方、Action-Contextブランチは、その前景をアクションとコンテキストとして分離する。我々はビデオスニペットを2つの潜伏成分(正の成分と負の成分)に関連付け、それらの組み合わせは前景、アクション、コンテキストを効果的に特徴付けることができる。さらに,アクション・コンテキスト分離の学習を容易にするために,補助コンテキストカテゴリを持つ拡張ラベルを導入する。 THUMOS14とActivityNet v1.2/v1.3データセットの実験では、ACSNetが既存のWS-TALメソッドよりも大きなマージンで優れていることが示されている。

The objective of Weakly-supervised Temporal Action Localization (WS-TAL) is to localize all action instances in an untrimmed video with only video-level supervision. Due to the lack of frame-level annotations during training, current WS-TAL methods rely on attention mechanisms to localize the foreground snippets or frames that contribute to the video-level classification task. This strategy frequently confuses context with the actual action in the localization result. Separating action and context is a core problem for precise WS-TAL, but it is very challenging and has been largely ignored in the literature. In this paper, we introduce an Action-Context Separation Network (ACSNet) that explicitly takes into account context for accurate action localization. It consists of two branches (i.e., the Foreground-Background branch and the Action-Context branch). The Foreground-Background branch first distinguishes foreground from background within the entire video, while the Action-Context branch further separates the foreground into action and context. We associate video snippets with two latent components (i.e., a positive component and a negative component), and their different combinations can effectively characterize foreground, action and context. Furthermore, we introduce extended labels with auxiliary context categories to facilitate the learning of action-context separation. Experiments on the THUMOS14 and ActivityNet v1.2/v1.3 datasets demonstrate that ACSNet outperforms existing state-of-the-art WS-TAL methods by a large margin.

翻訳日:2021-03-30 15:02:53 公開日:2021-03-28

# 高速かつ効果的な単一多重モデル畳み込みニューラルネットワークによる単一物体追跡

Single Object Tracking through a Fast and Effective Single-Multiple Model Convolutional Neural Network ( http://arxiv.org/abs/2103.15105v1 )

ライセンス: Link先を確認

Faraz Lotfi, Hamid D. Taghirad

(参考訳) 類似したオブジェクトが同じ領域に存在する場合、オブジェクト追跡は特に重要になる。近年のSOTA(State-of-the-art)アプローチは,トラッカーの性能を大幅に低下させる領域において,ターゲットを他の物体と区別するために,重構造と整合するネットワークを用いて提案されている。また、複数の候補が考慮され、時間を要する各フレームの関心領域に対象オブジェクトをローカライズするために処理される。本稿では,従来のアプローチとは対照的に,同一領域の類似したオブジェクトと区別するためにテンプレートを考慮しながら,単一のショットでオブジェクトの位置を識別することが可能な,特別なアーキテクチャを提案する。まず第一に、ターゲットサイズが2倍のオブジェクトを含むウィンドウを考える。このウィンドウは完全な畳み込みニューラルネットワーク(CNN)に入力され、各フレームのマトリックスの形式で関心領域(RoI)を抽出する。はじめに、ターゲットのテンプレートもcnnへの入力として取り込まれる。このRoI行列を考慮すると、トラッカーの次の動きは単純かつ高速な方法に基づいて決定される。さらに、このマトリックスは、時間とともに変化するときに重要なオブジェクトサイズを推定するのに役立ちます。マッチングネットワークがないにもかかわらず、提示されたトラッカーはSOTAと比較的困難な状況下で動作し、それに比べて超高速である(最大120FPS$ on 1080ti)。この主張を調べるため、GOT-10kデータセットで比較研究を行った。その結果,提案手法の課題遂行における優れた性能が得られた。

Object tracking becomes critical especially when similar objects are present in the same area. Recent state-of-the-art (SOTA) approaches rely on a matching network with a heavy structure to distinguish the target from other objects in the area, which drastically degrades the speed of the tracker. In addition, several candidates are considered and processed to localize the intended object in a region of interest for each frame, which is time-consuming. In this article, a special architecture is proposed which, in contrast to previous approaches, makes it possible to identify the object location in a single shot while taking its template into account to distinguish it from similar objects in the same area. In brief, a window containing the object with twice the target size is first considered. This window is then fed into a fully convolutional neural network (CNN) to extract a region of interest (RoI) in the form of a matrix for each frame. At the beginning, a template of the target is also taken as input to the CNN. Given this RoI matrix, the next movement of the tracker is determined by a simple and fast method. Moreover, this matrix helps to estimate the object size, which is crucial when it changes over time. Despite the absence of a matching network, the presented tracker performs comparably to the SOTA in challenging situations while being much faster (up to 120 FPS on a 1080ti). To investigate this claim, a comparison study is carried out on the GOT-10k dataset. Results reveal the outstanding performance of the proposed method in fulfilling the task.
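
The first step, a window with twice the target size, can be sketched as a simple box computation. The centre-based box convention and the function name `search_window` are assumptions; the abstract does not specify how the window is parameterized or clipped to the image.

```python
def search_window(cx, cy, width, height, scale=2.0):
    """Axis-aligned window centred on the target and `scale` times its size.

    Returns (x_min, y_min, x_max, y_max); clipping to the image is left out.
    """
    half_w = scale * width / 2.0
    half_h = scale * height / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)
```

For a 20 by 10 target centred at (50, 40), this yields the window (30, 30, 70, 50), which would then be cropped and passed to the network.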

翻訳日:2021-03-30 15:02:28 公開日:2021-03-28

# 血縁検証のためのメタマイニング判別サンプル

Meta-Mining Discriminative Samples for Kinship Verification ( http://arxiv.org/abs/2103.15108v1 )

ライセンス: Link先を確認

Wanhua Li, Shiwei Wang, Jiwen Lu, Jianjiang Feng, Jie Zhou

(参考訳) Kinship confirmedは、与えられた顔画像の親族関係が存在するかどうかを調べることを目的としている。 kinship検証データベースは、アンバランスなデータで生まれます。 N 個の正の親和対を持つデータベースに対して、自然に N(N-1) 個の負の対を得る。限定された正の対を完全に活用し、血縁検証のための十分な負のサンプルから識別情報をマイニングする方法は、未解決の問題である。この問題に対処するため,本論文では識別サンプルメタマイニング(DSMM)手法を提案する。固定的な負のペアを持つバランスのとれたデータセットを構築する既存の方法とは異なり、全ての可能なペアを活用し、データから判別情報を自動学習する。具体的には、各イテレーションでバランスの取れない列車バッチとバランスのとれたメタ列車バッチをサンプリングします。次に、バランスのとれたメタトレーニングバッチでメタ勾配を持つメタマイナを学習します。最終的に、バランスのとれない列車バッチのサンプルは、学習したメタマイナによって再重み付けされ、キンシップモデルが最適化される。 KinFaceW-I, KinFaceW-II, TSKinFace, Cornell Kinshipデータセットを用いた実験結果から, 提案手法の有効性が示された。

Kinship verification aims to find out whether there is a kin relation for a given pair of facial images. Kinship verification databases are born with unbalanced data. For a database with N positive kinship pairs, we naturally obtain N(N-1) negative pairs. How to fully utilize the limited positive pairs and mine discriminative information from sufficient negative samples for kinship verification remains an open issue. To address this problem, we propose a Discriminative Sample Meta-Mining (DSMM) approach in this paper. Unlike existing methods that usually construct a balanced dataset with fixed negative pairs, we propose to utilize all possible pairs and automatically learn discriminative information from data. Specifically, we sample an unbalanced train batch and a balanced meta-train batch for each iteration. Then we learn a meta-miner with the meta-gradient on the balanced meta-train batch. In the end, the samples in the unbalanced train batch are re-weighted by the learned meta-miner to optimize the kinship models. Experimental results on the widely used KinFaceW-I, KinFaceW-II, TSKinFace, and Cornell Kinship datasets demonstrate the effectiveness of the proposed approach.
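
The imbalance follows from simple counting. The sketch below assumes each positive pair is a (parent, child) tuple and that a negative pair combines the parent of one positive pair with the child of a different one; the abstract states only the resulting count N(N-1), so this construction is an illustrative assumption.

```python
def negative_pairs(positive_pairs):
    """All cross pairings of a parent with a child from a different positive pair."""
    negatives = []
    for i, (parent, _) in enumerate(positive_pairs):
        for j, (_, child) in enumerate(positive_pairs):
            if i != j:
                negatives.append((parent, child))
    return negatives
```

With N = 5 positive pairs this gives 5 x 4 = 20 negative pairs, so negatives outnumber positives by a factor of N - 1, which grows with the size of the database.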

翻訳日:2021-03-30 15:02:00 公開日:2021-03-28

# 野生における顔面表情認識のためのインポンダラスネット

Imponderous Net for Facial Expression Recognition in the Wild ( http://arxiv.org/abs/2103.15136v1 )

ライセンス: Link先を確認

Darshan Gera and S. Balasubramanian

(参考訳) ディープラーニング (DL) のルネサンス以来, 顔表情認識 (FER) が注目され, 性能が継続的に向上している。パフォーマンスと引き換えに、新たな課題が生まれました。現代のFERシステムは、オクルージョンやポーズのバリエーションを含む、制御されていない条件下で撮影された顔画像("in-the-wild scenario"とも呼ばれる)を扱う。彼らは、転送学習、アテンション機構、局所的グローバルコンテキスト抽出器といったさまざまなコンポーネントを備えたディープネットワークを使用して、そのような条件をうまく処理します。しかし、これらのディープネットワークは多数のパラメータで非常に複雑であり、実際のシナリオにデプロイするには適さない。内蔵のシナリオ下で、FER上で非常に優れたパフォーマンスを示す軽量ネットワークを構築することは可能か? 本研究では,このようなネットワークを体系的に構築し,Imponderous Netと呼ぶ。我々は、先のディープネットワークのコンポーネントをFERに活用し、分析し、慎重に選択し、Imponderous Netに到達させる。我々のインポンダラスネットは1.45Mパラメータしか持たない低カロリーネットであり、最先端(SOTA)アーキテクチャの約50倍小さい。さらに、推論の間、intel-i7 cpuでリアルタイムレート40フレーム/秒(fps)で処理することができる。カロリーは低いが、その性能は依然としてパワー満載であり、他の軽量アーキテクチャや高容量アーキテクチャを圧倒している。具体的には、in-the-wildデータセット rafdb, ferplus, affectnet それぞれ 87.09\%, 88.17\%, 62.06\% を報告している。また、オクルージョン下では優れたロバスト性を示し、文献の他の軽量建築と比べてポーズのバリエーションも示している。

Since the renaissance of deep learning (DL), facial expression recognition (FER) has received a lot of interest, with continual improvement in performance. Hand in hand with performance, new challenges have come up. Modern FER systems deal with face images captured under uncontrolled conditions (also called the in-the-wild scenario), including occlusions and pose variations. They successfully handle such conditions using deep networks that come with various components like transfer learning, attention mechanisms and local-global context extractors. However, these deep networks are highly complex, with a large number of parameters, making them unfit for deployment in real scenarios. Is it possible to build a light-weight network that can still show significantly good performance on FER under the in-the-wild scenario? In this work, we methodically build such a network and call it Imponderous Net. We leverage the aforementioned components of deep networks for FER, and analyse, carefully choose and fit them to arrive at Imponderous Net. Our Imponderous Net is a low calorie net with only 1.45M parameters, which is almost 50x fewer than that of a state-of-the-art (SOTA) architecture. Further, during inference, it can process at the real-time rate of 40 frames per second (fps) on an Intel i7 CPU. Though it is low calorie, it is still power packed in its performance, overpowering other light-weight architectures and even a few high-capacity architectures. Specifically, Imponderous Net reports 87.09%, 88.17% and 62.06% accuracies on the in-the-wild datasets RAFDB, FERPlus and AffectNet, respectively. It also exhibits superior robustness under occlusions and pose variations in comparison to other light-weight architectures from the literature.

翻訳日:2021-03-30 15:01:42 公開日:2021-03-28

# TransCenter:マルチオブジェクトトラッキングのためのDense Queries付きトランスフォーマー

TransCenter: Transformers with Dense Queries for Multiple-Object Tracking ( http://arxiv.org/abs/2103.15145v1 )

ライセンス: Link先を確認

Yihong Xu, Yutong Ban, Guillaume Delorme, Chuang Gan, Daniela Rus, Xavier Alameda-Pineda

(参考訳) トランスフォーマーネットワークは、導入以来、さまざまなタスクで非常に強力であることが証明されている。コンピュータビジョンは例外ではなく、近年ではトランスフォーマーの使用が視覚コミュニティで非常に人気になっている。この波にもかかわらず、MOT(Multiple-object Tracking)はトランスフォーマーと何らかの非互換性を示す。標準表現 -- 境界ボックス -- はMOTの学習トランスフォーマーに適応していない、と我々は主張する。最近の研究から着想を得たTransCenterは,複数のターゲットの中心を追跡するトランスフォーマーベースのアーキテクチャである。本研究では,2重デコーダネットワークにおいて,ターゲットの中心のヒートマップをロバストに推定し,時間を通じてそれらを関連付ける手法を提案する。 TransCenterは、MOT17とMOT20の両方において、現在の最先端のマルチオブジェクトトラッキングよりも優れている。本研究は,より単純な代替案と比較して,提案アーキテクチャの利点を実証するものである。コードは公開される予定だ。

Transformer networks have proven extremely powerful for a wide variety of tasks since they were introduced. Computer vision is not an exception, as the use of transformers has become very popular in the vision community in recent years. Despite this wave, multiple-object tracking (MOT) exhibits for now some sort of incompatibility with transformers. We argue that the standard representation -- bounding boxes -- is not adapted to learning transformers for MOT. Inspired by recent research, we propose TransCenter, the first transformer-based architecture for tracking the centers of multiple targets. Methodologically, we propose the use of dense queries in a double-decoder network, to be able to robustly infer the heatmap of targets' centers and associate them through time. TransCenter outperforms the current state-of-the-art in multiple-object tracking, both in MOT17 and MOT20. Our ablation study demonstrates the advantage of the proposed architecture compared to more naive alternatives. The code will be made publicly available.

翻訳日:2021-03-30 15:01:11 公開日:2021-03-28

# ビジュアルギャップのブリッジ:ワイドレンジ画像のブレンド

Bridging the Visual Gap: Wide-Range Image Blending ( http://arxiv.org/abs/2103.15149v1 )

ライセンス: Link先を確認

Chia-Ni Lu, Ya-Chu Chang and Wei-Chen Chiu

(参考訳) 本稿では,2つの異なる入力画像をパノラマにスムーズに融合し,その中間領域に新たな画像コンテンツを生成することを目的とした,画像処理における新たな問題シナリオである広域画像ブレンディングを提案する。このような問題は、画像インペインティング、画像アウトペインティング、画像ブレンドといったトピックと密接に関連しているが、これらのトピックからのアプローチは、いずれも簡単に対処できない。広帯域画像ブレンディングを実現するための効果的な深層学習モデルを導入し、新しい双方向コンテンツトランスファーモジュールを提案し、リカレントニューラルネットワークを介して中間領域の特徴表現の条件付き予測を行う。ブレンディング時の空間的・意味的整合性を確保することに加えて,提案手法では,視覚的パノラマの質を向上させるために,文脈的注意機構と対角学習方式も採用している。提案手法は,広視野画像ブレンディングのための視覚的に魅力的な結果を生成するだけでなく,最先端画像インパインティングおよびアウトパインティングアプローチ上に構築された複数のベースラインに対して優れた性能を提供することができることを実験的に実証した。

In this paper we propose a new problem scenario in image processing, wide-range image blending, which aims to smoothly merge two different input photos into a panorama by generating novel image content for the intermediate region between them. Although such a problem is closely related to the topics of image inpainting, image outpainting, and image blending, none of the approaches from these topics can easily address it. We introduce an effective deep-learning model to realize wide-range image blending, in which a novel Bidirectional Content Transfer module is proposed to perform the conditional prediction for the feature representation of the intermediate region via recurrent neural networks. In addition to ensuring spatial and semantic consistency during the blending, we also adopt the contextual attention mechanism as well as the adversarial learning scheme in our proposed method to improve the visual quality of the resultant panorama. We experimentally demonstrate that our proposed method not only produces visually appealing results for wide-range image blending, but also provides superior performance with respect to several baselines built upon state-of-the-art image inpainting and outpainting approaches.

翻訳日:2021-03-30 15:00:58 公開日:2021-03-28

# 欠陥GAN:自動欠陥検査のための高忠実欠陥合成

Defect-GAN: High-Fidelity Defect Synthesis for Automated Defect Inspection ( http://arxiv.org/abs/2103.15158v1 )

ライセンス: Link先を確認

Gongjie Zhang, Kaiwen Cui, Tzu-Yi Hung, Shijian Lu

(参考訳) 自動欠陥検査は、先進的な製造における効率よく効率的なメンテナンス、修理、運用に重要である。一方で、特にこのタスクにディープニューラルネットワークを採用する場合、欠陥サンプルの欠如によって、自動欠陥検査が制限されることが多い。本稿では,正確でロバストな欠陥検査ネットワークをトレーニングするために,現実的で多様な欠陥サンプルを生成する自動欠陥合成ネットワークであるdefy-ganを提案する。欠陥-ganはデファクトと修復プロセスを通じて学習し、デファクトは通常の表面画像に欠陥を生成し、修復は欠陥を除去して正常な画像を生成する。テクスチャや外観の異なる様々な画像背景において現実的な欠陥を生成するために、新しい合成層ベースのアーキテクチャを用いる。また、欠陥の確率的なバリエーションを模倣し、画像背景内の生成された欠陥の位置やカテゴリを柔軟に制御することができる。大規模な実験により、欠陥GANは様々な欠陥を優れた多様性と忠実さで合成できることが示された。さらに, 合成欠陥サンプルは, より良い欠陥検査ネットワークの訓練に有効であることを示した。

Automated defect inspection is critical for effective and efficient maintenance, repair, and operations in advanced manufacturing. On the other hand, automated defect inspection is often constrained by the lack of defect samples, especially when we adopt deep neural networks for this task. This paper presents Defect-GAN, an automated defect synthesis network that generates realistic and diverse defect samples for training accurate and robust defect inspection networks. Defect-GAN learns through defacement and restoration processes, where the defacement generates defects on normal surface images while the restoration removes defects to generate normal images. It employs a novel compositional layer-based architecture for generating realistic defects within various image backgrounds with different textures and appearances. It can also mimic the stochastic variations of defects and offer flexible control over the locations and categories of the generated defects within the image background. Extensive experiments show that Defect-GAN is capable of synthesizing various defects with superior diversity and fidelity. In addition, the synthesized defect samples demonstrate their effectiveness in training better defect inspection networks.

翻訳日:2021-03-30 15:00:38 公開日:2021-03-28

# ReAgent: 模倣と強化学習を用いたポイントクラウド登録

ReAgent: Point Cloud Registration using Imitation and Reinforcement Learning ( http://arxiv.org/abs/2103.15231v1 )

ライセンス: Link先を確認

Dominik Bauer, Timothy Patten and Markus Vincze

(参考訳) ポイントクラウドの登録は、オブジェクトポーズ推定のような多くの3Dコンピュータビジョンタスクにおいて一般的なステップであり、3Dモデルが観察に一致している。古典的な登録法は、新しいドメインによく一般化するが、ノイズの多い観察や悪い初期化が与えられると失敗する。対照的に学習ベースの手法はより堅牢であるが、一般化能力に欠ける。本稿では,反復的ポイントクラウド登録を強化学習タスクとして考慮し,そのために新たな登録エージェント(エージェント)を提案する。我々は,定常的な専門家政策に基づいて,個別登録ポリシーを初期化するために模倣学習を用いる。提案するアライメント報酬に基づくポリシー最適化との統合は,エージェントの登録性能をさらに向上させる。我々は,ModelNet40(合成)とScanObjectNN(実データ)の両方における古典的および学習的登録手法との比較を行い,ReAgentが最先端の精度を実現することを示す。エージェントの軽量なアーキテクチャは、関連するアプローチと比較して推論時間を短縮することができる。さらに,本手法を実データ(linemod)上のオブジェクトポーズ推定タスクに適用し,最先端のポーズリファインメント手法を上回っている。

Point cloud registration is a common step in many 3D computer vision tasks such as object pose estimation, where a 3D model is aligned to an observation. Classical registration methods generalize well to novel domains but fail when given a noisy observation or a bad initialization. Learning-based methods, in contrast, are more robust but lack generalization capacity. We propose to consider iterative point cloud registration as a reinforcement learning task and, to this end, present a novel registration agent (ReAgent). We employ imitation learning to initialize its discrete registration policy based on a steady expert policy. Integration with policy optimization, based on our proposed alignment reward, further improves the agent's registration performance. We compare our approach to classical and learning-based registration methods on both ModelNet40 (synthetic) and ScanObjectNN (real data) and show that our ReAgent achieves state-of-the-art accuracy. The lightweight architecture of the agent, moreover, enables reduced inference time as compared to related approaches. In addition, we apply our method to the object pose estimation task on real data (LINEMOD), outperforming state-of-the-art pose refinement approaches.

翻訳日:2021-03-30 15:00:22 公開日:2021-03-28

# 時間的動作局所化のための低忠実度エンド・ツー・エンドビデオエンコーダ事前学習

Low-Fidelity End-to-End Video Encoder Pre-training for Temporal Action Localization ( http://arxiv.org/abs/2103.15233v1 )

ライセンス: Link先を確認

Mengmeng Xu, Juan-Manuel Perez-Ru, Xiatian Zhu, Bernard Ghanem, Brais Martinez

(参考訳) 時間的行動ローカライゼーション(TAL)は、ビデオ理解における基本的な課題である。既存のtalメソッドは、アクション分類の監督を通じてビデオエンコーダを事前トレーニングする。これにより、ビデオエンコーダ -- アクションの分類のために訓練されるが、talで使用される -- のタスク不一致問題が発生する。直感的には、エンドツーエンドのモデル最適化はよいソリューションです。しかし、長い未処理ビデオを処理するのに計算コストがかかるため、gpuメモリの制約を受けるtalでは動作できない。本稿では,ローファイダリティ・エンド・ツー・エンド(LoFi)ビデオエンコーダの事前学習手法を導入することで,この問題を解決する。ビデオエンコーダのエンド・ツー・エンド最適化が中間ハードウェア予算のメモリ条件下で操作可能となるように,時間的・空間的・時空間的・時空間的分解能の面でのミニバッチ構成の削減を提案する。これにより、TAL損失監視からビデオエンコーダを逆向きに流し、タスクの不一致の問題を良好に解決し、より効果的な特徴表現を提供する。広範な実験により,lofiプリトレーニング手法が既存のtal法の性能を著しく向上させることが示された。軽量なResNet18ベースのビデオエンコーダを1つのRGBストリームで使用しても、当社の手法は高価な光フローを持つ2ストリームのResNet50ベースの代替手段をはるかに上回ります。

Temporal action localization (TAL) is a fundamental yet challenging task in video understanding. Existing TAL methods rely on pre-training a video encoder through action classification supervision. This results in a task discrepancy problem for the video encoder -- trained for action classification, but used for TAL. Intuitively, end-to-end model optimization is a good solution. However, this is not operable for TAL subject to the GPU memory constraints, due to the prohibitive computational cost in processing long untrimmed videos. In this paper, we resolve this challenge by introducing a novel low-fidelity end-to-end (LoFi) video encoder pre-training method. Instead of always using the full training configurations for TAL learning, we propose to reduce the mini-batch composition in terms of temporal, spatial or spatio-temporal resolution so that end-to-end optimization for the video encoder becomes operable under the memory conditions of a mid-range hardware budget. Crucially, this enables the gradient to flow backward through the video encoder from a TAL loss supervision, favourably solving the task discrepancy problem and providing more effective feature representations. Extensive experiments show that the proposed LoFi pre-training approach can significantly enhance the performance of existing TAL methods. Encouragingly, even with a lightweight ResNet18 based video encoder in a single RGB stream, our method surpasses two-stream ResNet50 based alternatives with expensive optical flow, often by a good margin.
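
The effect of lowering temporal or spatial resolution can be estimated with back-of-the-envelope arithmetic. The sketch assumes that the encoder's activation memory scales roughly linearly with the number of input voxels (frames x height x width); that scaling rule and the example clip sizes are assumptions, not figures from the paper.

```python
def input_voxels(frames, height, width):
    """Number of spatio-temporal input positions in one clip."""
    return frames * height * width


def relative_cost(low_fidelity, full_fidelity):
    """Ratio of input voxels between a reduced clip and the full clip."""
    return input_voxels(*low_fidelity) / input_voxels(*full_fidelity)
```

Under this assumption, halving each spatial side of a hypothetical 16 x 224 x 224 clip cuts the cost to one quarter, and additionally halving the number of frames cuts it to one eighth.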

翻訳日:2021-03-30 15:00:04 公開日:2021-03-28

# 深層学習における重み付けの役割を理解する

Understanding the role of importance weighting for deep learning ( http://arxiv.org/abs/2103.15209v1 )

ライセンス: Link先を確認

Da Xu, Yuting Ye, Chuanwei Ruan

(参考訳) Byrd & Lipton (2019) による最近の論文は、経験的な観察に基づいて、過度にパラメータ化されたディープラーニングモデルに対する重み付けの影響に大きな懸念を提起している。彼らは、モデルがトレーニングデータを分離できる限り、重要度重み付けの影響はトレーニングが進むにつれて減少する、と観察する。しかし、この現象の厳密な特徴が欠けている。本稿では,勾配降下の暗黙のバイアスとマージンに基づく学習理論に対する重要度重み付けの役割に関する形式的特徴と理論的正当性について述べる。ディープラーニングモデルの下で最適化力学と一般化性能の両方を明らかにする。本研究は,深層学習において重み付けを重要視する様々な新しい現象を説明するだけでなく,モデルの一部として重み付けが最適化されている研究にも応用する。

The recent paper by Byrd & Lipton (2019), based on empirical observations, raises a major concern about the impact of importance weighting on over-parameterized deep learning models. They observe that as long as the model can separate the training data, the impact of importance weighting diminishes as training proceeds. Nevertheless, a rigorous characterization of this phenomenon is lacking. In this paper, we provide formal characterizations and theoretical justifications of the role of importance weighting with respect to the implicit bias of gradient descent and margin-based learning theory. We reveal both the optimization dynamics and generalization performance under deep learning models. Our work not only explains the various novel phenomena observed for importance weighting in deep learning, but also extends to studies where the weights are optimized as part of the model, which applies to a number of topics under active research.
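
As a reading aid for the abstract above, here is a minimal sketch of where importance weights enter a training objective. It is not the authors' code: the logistic loss, the normalization by the sum of the weights, and the function names are all assumptions chosen for illustration.

```python
import math


def logistic_loss(margin):
    """Numerically stable log(1 + exp(-margin))."""
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def importance_weighted_loss(scores, labels, weights):
    """Weighted average of per-example logistic losses.

    scores  : real-valued model outputs f(x_i)
    labels  : +1 / -1 class labels y_i
    weights : non-negative importance weights w_i (hypothetical values)
    Dividing by sum(weights) is an illustrative choice, not taken from the paper.
    """
    total_weight = sum(weights)
    weighted = sum(
        w * logistic_loss(y * s) for s, y, w in zip(scores, labels, weights)
    )
    return weighted / total_weight


# Uniform weights reduce to the ordinary mean loss; raising the weight of a
# misclassified example raises the objective.
uniform = importance_weighted_loss([2.0, -1.0], [1, 1], [1.0, 1.0])
upweighted = importance_weighted_loss([2.0, -1.0], [1, 1], [1.0, 3.0])
```

The sketch only shows the form of the weighted objective; how the weights affect the solution reached by gradient descent is the question the paper itself analyzes.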

翻訳日:2021-03-30 14:48:26 公開日:2021-03-28

# 連続時間情報を用いた深層学習のための時間カーネルアプローチ

A Temporal Kernel Approach for Deep Learning with Continuous-time Information ( http://arxiv.org/abs/2103.15213v1 )

ライセンス: Link先を確認

Da Xu, Chuanwei Ruan, Evren Korpeoglu, Sushant Kumar, Kannan Achan

(参考訳) RNN、causal CNN、アテンション機構などの逐次ディープラーニングモデルは、連続時間情報を容易に消費しない。時間データの離散化は、我々が示すように、単純な連続時間過程においても矛盾を引き起こす。現在のアプローチは、しばしば既存のディープラーニングアーキテクチャや実装と整合するように、ヒューリスティックな方法で時間を扱う。本稿では,ディープラーニングツールを用いて連続時間システムを特徴付ける原理的な手法を提案する。特に、提案されたアプローチはすべての主要なディープラーニングアーキテクチャに適用でき、実装の変更をほとんど必要としない。重要な洞察は、ニューラルネットワークを時間的カーネルと組み合わせることで、連続時間システムを表現することであり、そこでは、ガウス過程とニューラル・タンジェント・カーネルによるディープラーニング理解の最近の進歩から直感を得る。時間的カーネルを表現するために、ランダムな特徴アプローチを導入し、再パラメータ化の下でカーネル学習問題をスペクトル密度推定に変換する。さらに、時間的カーネルが非定常で、スペクトル密度が誤特定されている場合でも、収束性と一致性を証明する。シミュレーションと実データ実験は,我々の時間的カーネルアプローチの幅広い設定における経験的有効性を示す。

Sequential deep learning models such as RNNs, causal CNNs, and attention mechanisms do not readily consume continuous-time information. Discretizing the temporal data, as we show, causes inconsistency even for simple continuous-time processes. Current approaches often handle time in a heuristic manner to be consistent with the existing deep learning architectures and implementations. In this paper, we provide a principled way to characterize continuous-time systems using deep learning tools. Notably, the proposed approach applies to all the major deep learning architectures and requires little modification to the implementation. The critical insight is to represent the continuous-time system by composing neural networks with a temporal kernel, where we gain our intuition from the recent advancements in understanding deep learning with Gaussian processes and the neural tangent kernel. To represent the temporal kernel, we introduce the random feature approach and convert the kernel learning problem to spectral density estimation under reparameterization. We further prove convergence and consistency results even when the temporal kernel is non-stationary and the spectral density is misspecified. The simulations and real-data experiments demonstrate the empirical effectiveness of our temporal kernel approach in a broad range of settings.
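
The link between random features and a spectral density mentioned in the abstract can be illustrated with the classical random Fourier feature construction. The sketch below is an assumption-laden simplification, not the paper's method: it uses a stationary RBF kernel over timestamps with a fixed Gaussian spectral density, whereas the paper learns the density and also covers non-stationary kernels. All function names and parameter values are hypothetical.

```python
import math
import random


def sample_random_features(num_features, length_scale, rng):
    """Draw frequencies from the Gaussian spectral density of an RBF kernel
    (standard deviation 1 / length_scale) and uniform phase offsets."""
    freqs = [rng.gauss(0.0, 1.0 / length_scale) for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    return freqs, phases


def feature_map(t, freqs, phases):
    """Map a timestamp t to a random Fourier feature vector."""
    scale = math.sqrt(2.0 / len(freqs))
    return [scale * math.cos(w * t + b) for w, b in zip(freqs, phases)]


def approx_kernel(t1, t2, freqs, phases):
    """Inner product of feature vectors, a Monte Carlo estimate of k(t1, t2)."""
    phi1 = feature_map(t1, freqs, phases)
    phi2 = feature_map(t2, freqs, phases)
    return sum(a * b for a, b in zip(phi1, phi2))


def exact_rbf_kernel(t1, t2, length_scale):
    """Closed-form RBF kernel used as the reference value."""
    return math.exp(-((t1 - t2) ** 2) / (2.0 * length_scale ** 2))


rng = random.Random(0)
freqs, phases = sample_random_features(4000, length_scale=1.5, rng=rng)
estimate = approx_kernel(0.3, 1.1, freqs, phases)
reference = exact_rbf_kernel(0.3, 1.1, length_scale=1.5)
```

With a few thousand features the estimate should land close to the closed-form value. Replacing the fixed Gaussian sampler with a learnable distribution over frequencies is, roughly, what turns kernel learning into spectral density estimation.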

翻訳日:2021-03-30 14:48:12 公開日:2021-03-28

# PnG BERT:Augmented BERT on Phonemes and Graphemes for Neural TTS

PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS ( http://arxiv.org/abs/2103.15060v1 )

ライセンス: Link先を確認

Ye Jia, Heiga Zen, Jonathan Shen, Yu Zhang, Yonghui Wu

(参考訳) 本稿では,ニューラルTTSの新しいエンコーダモデルであるPnG BERTを紹介する。このモデルは元のBERTモデルを拡張したもので、テキストの音素表現と書記素表現の両方に加え、両者の単語レベルのアライメントを入力とする。大規模テキストコーパス上で自己教師型で事前訓練し、TTSタスクで微調整することができる。実験結果から,事前学習したPnG BERTをエンコーダとして使用するニューラルTTSモデルは,事前学習のない音素入力のみを用いたベースラインモデルよりも自然な韻律と精度の高い発音が得られることがわかった。主観的な対比較による選好評価では、PnG BERTを用いて合成した音声とプロの話者による実際の録音との間に、評価者の統計的に有意な選好の差は見られなかった。

This paper introduces PnG BERT, a new encoder model for neural TTS. This model is augmented from the original BERT model, by taking both phoneme and grapheme representations of text as input, as well as the word-level alignment between them. It can be pre-trained on a large text corpus in a self-supervised manner, and fine-tuned in a TTS task. Experimental results show that a neural TTS model using a pre-trained PnG BERT as its encoder yields more natural prosody and more accurate pronunciation than a baseline model using only phoneme input with no pre-training. Subjective side-by-side preference evaluations show that raters have no statistically significant preference between the speech synthesized using a PnG BERT and ground truth recordings from professional speakers.

翻訳日:2021-03-30 14:44:05 公開日:2021-03-28

# 自動音声認識におけるバイアスの定量化

Quantifying Bias in Automatic Speech Recognition ( http://arxiv.org/abs/2103.15122v1 )

ライセンス: Link先を確認

Siyuan Feng, Olya Kudina, Bence Mark Halpern and Odette Scharenborg

(参考訳) 自動音声認識(ASR)システムは、人間の発話を客観的に解釈することを約束する。実践と最近の証拠は、最先端(SotA)のASRが、性別、年齢、言語障害、人種、アクセントによる発話のばらつきに苦戦することを示唆している。訓練材料の構成や調音の違いなど、多くの要因がASRシステムのバイアスを引き起こしうる。我々の包括的な目標は、ASRにおける積極的なバイアス緩和に向けて、ASRシステムのバイアスを明らかにすることである。本稿では,性別,年齢,地域アクセント,非ネイティブアクセントに対するSotA ASRシステムのバイアスを体系的に定量化する。単語誤り率を比較し, 詳細な音素レベルの誤り分析を行い, バイアスの発生箇所を理解する。データセット内の調音の違いによるバイアスに焦点を当てる。得られた知見に基づき,ASR開発におけるバイアス緩和戦略を提案する。

Automatic speech recognition (ASR) systems promise to deliver objective interpretation of human speech. Practice and recent evidence suggest that state-of-the-art (SotA) ASRs struggle with speech variance due to gender, age, speech impairment, race, and accents. Many factors can cause the bias of an ASR system, e.g., composition of the training material and articulation differences. Our overarching goal is to uncover bias in ASR systems to work towards proactive bias mitigation in ASR. This paper systematically quantifies the bias of a SotA ASR system against gender, age, regional accents and non-native accents. Word error rates are compared, and in-depth phoneme-level error analysis is conducted to understand where bias is occurring. We focus on bias due to articulation differences in the dataset. Based on our findings, we suggest bias mitigation strategies for ASR development.
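
The abstract compares word error rates across speaker groups. As a hedged illustration of that metric (not the authors' evaluation code), the sketch below computes WER as the word-level edit distance divided by the number of reference words, then pools it per group. The group labels and sentences are invented for the example.

```python
def word_edit_distance(ref_words, hyp_words):
    """Minimum number of substitutions, deletions and insertions needed to
    turn ref_words into hyp_words (Levenshtein distance over words)."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER for one utterance: edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def wer_by_group(samples):
    """Pooled WER per group from (group, reference, hypothesis) triples."""
    errors, words = {}, {}
    for group, reference, hypothesis in samples:
        ref_words = reference.split()
        errors[group] = errors.get(group, 0) + word_edit_distance(
            ref_words, hypothesis.split()
        )
        words[group] = words.get(group, 0) + len(ref_words)
    return {group: errors[group] / words[group] for group in errors}


# Hypothetical transcripts for two speaker groups.
samples = [
    ("group_a", "the cat sat on the mat", "the cat sat on the mat"),
    ("group_b", "the cat sat on the mat", "the cat sit on mat"),
]
group_wer = wer_by_group(samples)
```

A gap between the per-group values is the kind of signal the paper quantifies; its phoneme-level analysis goes further than this word-level sketch.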

翻訳日:2021-03-30 14:43:53 公開日:2021-03-28

# ボードと対戦する: Pandemicに対するローリングホライズン進化アルゴリズム

Playing Against the Board: Rolling Horizon Evolutionary Algorithms Against Pandemic ( http://arxiv.org/abs/2103.15090v1 )

ライセンス: Link先を確認

Konstantinos Sfikas and Antonios Liapis

(参考訳) 競争的なボードゲームは、人工知能のための豊かで多様なテストベッドを提供してきた。本稿では,短期的リスク軽減と長期的勝利戦略のバランスをとる必要があるため,協力型ボードゲームが人工知能に異なる課題をもたらすことを主張する。協力型ボードゲームは、ボードと確率的なルールセットによって引き起こされるエスカレートする課題を克服するために、すべてのプレイヤーに異なる能力の連携やリソースの共有を求める。本稿では,代表的な協力型ボードゲームであるPandemicに着目し,このゲーム専用に設計したローリングホライズン進化アルゴリズムを提案する。Pandemicのゲーム状態は確率的だが予測可能な形で複雑に変化するため、専用に設計された複数のフォワードモデル、意思決定のためのマクロアクション表現、進化アルゴリズムの遺伝的操作のための修復関数が必要となった。楽観的・悲観的なゲーム状態評価,異なる突然変異率,イベントホライズンを探索するアルゴリズムの変種を,ベースラインの階層的方策エージェントと比較する。結果は、短いホライズンのロールアウトによる進化的アプローチが、ボードがもたらしうる将来の危険をよりよく考慮し、それらに備えられることを示している。また結果は、協力型ボードゲームが人工知能にもたらす課題の種類、特にマルチプレイヤーの協調的インタラクションの扱いを浮き彫りにする。

Competitive board games have provided a rich and diverse testbed for artificial intelligence. This paper contends that collaborative board games pose a different challenge to artificial intelligence as it must balance short-term risk mitigation with long-term winning strategies. Collaborative board games task all players to coordinate their different powers or pool their resources to overcome an escalating challenge posed by the board and a stochastic ruleset. This paper focuses on the exemplary collaborative board game Pandemic and presents a rolling horizon evolutionary algorithm designed specifically for this game. The complex way in which the Pandemic game state changes in a stochastic but predictable way required a number of specially designed forward models, macro-action representations for decision-making, and repair functions for the genetic operations of the evolutionary algorithm. Variants of the algorithm which explore optimistic versus pessimistic game state evaluations, different mutation rates and event horizons are compared against a baseline hierarchical policy agent. Results show that an evolutionary approach via short-horizon rollouts can better account for the future dangers that the board may introduce, and guard against them. Results highlight the types of challenges that collaborative board games pose to artificial intelligence, especially for handling multi-player collaboration interactions.

翻訳日:2021-03-30 14:42:53 公開日:2021-03-28

# Human-on-the-Loop型ビジョンベースロボットシステムにおける適応的自律性

Adaptive Autonomy in Human-on-the-Loop Vision-Based Robotics Systems ( http://arxiv.org/abs/2103.15053v1 )

ライセンス: Link先を確認

Sophia Abraham, Zachariah Carmichael, Sreya Banerjee, Rosaura VidalMata, Ankit Agrawal, Md Nafee Al Islam, Walter Scheirer, Jane Cleland-Huang

(参考訳) コンピュータビジョンのアプローチは、自律ロボットシステムが周囲の世界を感知し、衝突回避、捜索救助、オブジェクト操作など様々なタスクを実行する際の意思決定を導くために広く使われている。システムが自律的に意思決定を行い、人間は監督的な役割のみを担うHuman-on-the-loop(HoTL)システムでは特に、高い精度が不可欠である。ビジョンモデルの失敗は、生死に関わりうる誤った決定につながる可能性がある。本稿では,適応的自律性レベルに基づくソリューションを提案する。このソリューションでは、システムがこれらのモデルの信頼性の低下を検知し,自身の自律性レベルを一時的に下げ,意思決定プロセスへの人間の関与を高めることで対応する。我々のソリューションは、人間が反応しガイダンスを提供する時間的余裕のある視覚ベースのタスクに適用できる。実装された場合、提案手法は,モデルの不確実性を考慮するとともに、共変量解析によって現在の動作環境がモデルのトレーニングデータと適合していない状況を判定することで、視覚タスクの信頼性を推定する。小型無人航空システムを緊急対応ミッションに配備するDroneResponseの例を示し,システムの自律性の挙動と適応を駆動・規定するために,信頼度スコアに加えてビジョンモデルの信頼性をどのように利用するかを示す。本ワークショップ論文では,提案手法を概説し,自律システムの意思決定におけるビジョンモデルの安全かつ信頼性の高い展開に向けた,コンピュータビジョンとソフトウェアエンジニアリングの交点におけるオープンチャレンジについて述べる。

Computer vision approaches are widely used by autonomous robotic systems to sense the world around them and to guide their decision making as they perform diverse tasks such as collision avoidance, search and rescue, and object manipulation. High accuracy is critical, particularly for Human-on-the-loop (HoTL) systems where decisions are made autonomously by the system, and humans play only a supervisory role. Failures of the vision model can lead to erroneous decisions with potentially life or death consequences. In this paper, we propose a solution based upon adaptive autonomy levels, whereby the system detects loss of reliability of these models and responds by temporarily lowering its own autonomy levels and increasing engagement of the human in the decision-making process. Our solution is applicable for vision-based tasks in which humans have time to react and provide guidance. When implemented, our approach would estimate the reliability of the vision task by considering uncertainty in its model, and by performing covariate analysis to determine when the current operating environment is ill-matched to the model's training data. We provide examples from DroneResponse, in which small Unmanned Aerial Systems are deployed for Emergency Response missions, and show how the vision model's reliability would be used in addition to confidence scores to drive and specify the behavior and adaptation of the system's autonomy. This workshop paper outlines our proposed approach and describes open challenges at the intersection of Computer Vision and Software Engineering for the safe and reliable deployment of vision models in the decision making of autonomous systems.

翻訳日:2021-03-30 14:40:51 公開日:2021-03-28

# 可逆画像信号処理

Invertible Image Signal Processing ( http://arxiv.org/abs/2103.15061v1 )

ライセンス: Link先を確認

Yazhou Xing, Zian Qian, Qifeng Chen

(参考訳) 未処理のRAWデータは、画像編集とコンピュータビジョンにとって非常に価値の高い画像フォーマットである。しかし、RAWデータのファイルサイズは巨大であるため、ほとんどのユーザーは処理・圧縮されたsRGB画像にしかアクセスできない。このギャップを埋めるために、可逆画像信号処理(InvISP)パイプラインを設計し、視覚的に魅力的なsRGB画像のレンダリングを可能にするだけでなく、ほぼ完璧なRAWデータの復元を可能にする。本フレームワークの持つ可逆性により、メモリオーバーヘッドを伴わずに、sRGB画像からRAWデータを合成する代わりに、現実的なRAWデータを再構成することができる。また、微分可能なJPEG圧縮シミュレータを統合し、JPEG画像からRAWデータを再構成できるようにする。2台のデジタル一眼レフカメラでの広範囲な定量的・定性的実験により,レンダリングされたsRGB画像と再構成されたRAWデータの両方において,代替法よりもはるかに高い品質が得られることを示した。

Unprocessed RAW data is a highly valuable image format for image editing and computer vision. However, since the file size of RAW data is huge, most users can only get access to processed and compressed sRGB images. To bridge this gap, we design an Invertible Image Signal Processing (InvISP) pipeline, which not only enables rendering visually appealing sRGB images but also allows recovering nearly perfect RAW data. Due to our framework's inherent reversibility, we can reconstruct realistic RAW data instead of synthesizing RAW data from sRGB images without any memory overhead. We also integrate a differentiable JPEG compression simulator that empowers our framework to reconstruct RAW data from JPEG images. Extensive quantitative and qualitative experiments on two DSLRs demonstrate that our method obtains much higher quality in both rendered sRGB images and reconstructed RAW data than alternative methods.

翻訳日:2021-03-30 14:40:28 公開日:2021-03-28

# ManhattanSLAM: マンハッタンフレームの混合を利用したロバストな平面追跡とマッピング

ManhattanSLAM: Robust Planar Tracking and Mapping Leveraging Mixture of Manhattan Frames ( http://arxiv.org/abs/2103.15068v1 )

ライセンス: Link先を確認

Raza Yunus, Yanyan Li and Federico Tombari

(参考訳) 本稿では,室内環境における構造情報を活用することで,CPU上での正確な追跡と効率的な高密度マッピングを可能にするロバストなRGB-D SLAMシステムを提案する。従来の研究は、マンハッタンワールド(MW)の仮定を用いて低ドリフトのカメラポーズを推定しており、その結果そのようなシステムの応用範囲が制限されていた。一方,本稿では,MW環境と非MW環境の両方でロバストなトラッキングを実現する新しい手法を提案する。平面間の直交関係をチェックしてマンハッタンフレームを直接検出し、シーンをマンハッタンフレームの混合としてモデル化する。MWシーンでは、ポーズ推定を分離し、マンハッタンフレームの観測に基づく新しいドリフトフリーの回転推定を提供する。MWシーンにおける並進推定と非MWシーンにおけるフルカメラポーズ推定では,点,線,平面の特徴を利用して,難易度の高いシーンでもロバストなトラッキングを行う。さらに,各フレームで検出された平面特徴を活用し,各画像を平面領域と非平面領域に分割する,効率的なサーフェルベースの高密度マッピング戦略を提案する。平面サーフェルは地図上のスパースな平面から直接初期化され、非平面サーフェルはスーパーピクセルを抽出することによって構築される。提案手法を公開ベンチマークでポーズ推定,ドリフト,再構成精度について評価し,他の最先端手法と比較して優れた性能を達成した。将来的にはコードをオープンソース化する予定である。

In this paper, a robust RGB-D SLAM system is proposed to utilize the structural information in indoor scenes, allowing for accurate tracking and efficient dense mapping on a CPU. Prior works have used the Manhattan World (MW) assumption to estimate low-drift camera pose, in turn limiting the applications of such systems. This paper, in contrast, proposes a novel approach delivering robust tracking in MW and non-MW environments. We check orthogonal relations between planes to directly detect Manhattan Frames, modeling the scene as a Mixture of Manhattan Frames. For MW scenes, we decouple pose estimation and provide a novel drift-free rotation estimation based on Manhattan Frame observations. For translation estimation in MW scenes and full camera pose estimation in non-MW scenes, we make use of point, line and plane features for robust tracking in challenging scenes. Additionally, by exploiting plane features detected in each frame, we also propose an efficient surfel-based dense mapping strategy, which divides each image into planar and non-planar regions. Planar surfels are initialized directly from sparse planes in our map while non-planar surfels are built by extracting superpixels. We evaluate our method on public benchmarks for pose estimation, drift and reconstruction accuracy, achieving superior performance compared to other state-of-the-art methods. We will open-source our code in the future.

翻訳日:2021-03-30 14:40:12 公開日:2021-03-28

# MergeComp: スケーラブルな分散トレーニングのための圧縮スケジューリング

MergeComp: A Compression Scheduler for Scalable Communication-Efficient Distributed Training ( http://arxiv.org/abs/2103.15195v1 )

ライセンス: Link先を確認

Zhuang Wang, Xinyu Wu, T.S. Eugene Ng

(参考訳) 大規模分散トレーニングは、ますます通信律速になりつつある。通信オーバーヘッドを減らし、スケーラビリティを向上させるために、多くの勾配圧縮アルゴリズムが提案されている。しかし、勾配圧縮が分散トレーニングの性能をかえって損なう場合もあることが観察されている。本稿では,通信効率のよい分散トレーニングのスケーラビリティを最適化する圧縮スケジューラであるMergeCompを提案する。MergeCompは、モデルアーキテクチャやシステムパラメータの知識なしに、圧縮アルゴリズムの性能を最適化するよう圧縮操作を自動的にスケジュールする。我々はMergeCompを9つの一般的な圧縮アルゴリズムに適用した。評価の結果,MergeCompは精度を損なうことなく圧縮アルゴリズムの性能を最大3.83倍向上させることができた。高速ネットワーク上では、分散トレーニングのスケーリング係数を最大99%まで達成することも可能である。

Large-scale distributed training is increasingly becoming communication bound. Many gradient compression algorithms have been proposed to reduce the communication overhead and improve scalability. However, it has been observed that in some cases gradient compression may even harm the performance of distributed training. In this paper, we propose MergeComp, a compression scheduler to optimize the scalability of communication-efficient distributed training. It automatically schedules the compression operations to optimize the performance of compression algorithms without the knowledge of model architectures or system parameters. We have applied MergeComp to nine popular compression algorithms. Our evaluations show that MergeComp can improve the performance of compression algorithms by up to 3.83x without losing accuracy. It can even achieve a scaling factor of distributed training up to 99% over high-speed networks.

翻訳日:2021-03-30 14:36:47 公開日:2021-03-28

# ロバストで最適化されたディープラーニングモデルを用いた正確な株価予測

Accurate Stock Price Forecasting Using Robust and Optimized Deep Learning Models ( http://arxiv.org/abs/2103.15096v1 )

ライセンス: Link先を確認

Jaydip Sen and Sidra Mehtab

(参考訳) 将来の株価を正確に予測するための堅牢なフレームワークを設計することは、常に非常に困難な研究課題とみなされてきた。古典的な効率的市場仮説の支持者は、株価変数の確率的な性質のため、効率的に機能する市場において将来の価格を正確に予測することは不可能であると主張する。しかし、文献には、株価を効率的かつ正確かつ堅牢に予測するアルゴリズムやモデルをどのように設計できるかを示す、洗練度と複雑さの程度が異なる多くの提案が存在する。本稿では,インドの自動車部門における重要な企業の将来の株価を正確かつ堅牢に予測するために,10種類の深層学習回帰モデルを提案する。5分間隔で収集した非常に粒度の細かい株価を用いて、2012年12月31日から2013年12月27日までの記録に基づいてモデルを訓練する。モデルのテストは、2013年12月30日から2015年1月9日までの記録を用いて行う。モデルの設計原理を説明し,予測精度と実行速度に基づいてその性能を解析する。

Designing robust frameworks for precise prediction of future prices of stocks has always been considered a very challenging research problem. The advocates of the classical efficient market hypothesis affirm that it is impossible to accurately predict the future prices in an efficiently operating market due to the stochastic nature of the stock price variables. However, numerous propositions exist in the literature with varying degrees of sophistication and complexity that illustrate how algorithms and models can be designed for making efficient, accurate, and robust predictions of stock prices. We present a gamut of ten deep learning models of regression for precise and robust prediction of the future prices of the stock of a critical company in the auto sector of India. Using very granular stock prices collected at 5-minute intervals, we train the models based on the records from 31st Dec 2012 to 27th Dec 2013. The testing of the models is done using records from 30th Dec 2013 to 9th Jan 2015. We explain the design principles of the models and analyze the results of their performance based on accuracy in forecasting and speed of execution.

翻訳日:2021-03-30 14:33:08 公開日:2021-03-28

PDF登録状況（公開日: 20210328）