Fugu-MT: arxivの論文翻訳

このサイトではarxivの論文のうち、30ページ以下でCreative Commonsライセンス（CC 0, CC BY, CC BY-SA）の論文を日本語訳しています。本文がCCでない論文、長すぎる論文はメタデータのみを翻訳しています。（arxivのメタデータは CC 0です。）翻訳文のライセンスはCC BY-SA 4.0です。翻訳にはFugu-Machine Translatorを利用しています。

本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。

公開日が20200714となっている論文です。

Title	Authors	Abstract	論文公表日・翻訳日
# キャビティの自己分極と光化学的抑制に及ぼす多モードの影響 Effect of Many Modes on Self-Polarization and Photochemical Suppression in Cavities ( http://arxiv.org/abs/2001.07330v3 ) ライセンス: Link先を確認	Norah M. Hoffmann, Lionel Lacombe, Angel Rubio, Neepa T. Maitra	(参考訳) キャビティ修飾分子反応の標準的な記述は、通常は単一(共鳴)モードを含むが、実際には量子キャビティは様々な光子モードをサポートする。ここでは、より多くの光子モードが説明されるにつれて、物理化学的現象が劇的に変化し、陽子結合電子移動の重要でユビキタスな過程のキャビティ誘起の抑制によって示される。光子モードに対するマルチトラックのehrenfest法を用いて、自己分極効果が必須となり、自己分極変調ボルン-オッペンハイマー面の概念を動力学解析の新しい構成法として導入する。キャビティフォトンモードの数が増加すると、キャビティフリーのボルン・オッペンハイマー表面からの表面のずれが増大し、光子放出とこれらの表面の拡幅帯域内の吸収との間の相互作用が抑制される。本研究は一般的な知見であり, キャビティに埋め込まれた分子, ナノ構造, 固体のキャビティ駆動の物理過程の記述と制御に影響を及ぼすであろう。 The standard description of cavity-modified molecular reactions typically involves a single (resonant) mode, while in reality the quantum cavity supports a range of photon modes. Here we demonstrate that as more photon modes are accounted for, physico-chemical phenomena can dramatically change, as illustrated by the cavity-induced suppression of the important and ubiquitous process of proton-coupled electron-transfer. Using a multi-trajectory Ehrenfest treatment for the photon-modes, we find that self-polarization effects become essential, and we introduce the concept of self-polarization-modified Born-Oppenheimer surfaces as a new construct to analyze dynamics. As the number of cavity photon modes increases, the increasing deviation of these surfaces from the cavity-free Born-Oppenheimer surfaces, together with the interplay between photon emission and absorption inside the widening bands of these surfaces, leads to enhanced suppression. The present findings are general and will have implications for the description and control of cavity-driven physical processes of molecules, nanostructures and solids embedded in cavities.	翻訳日:2023-06-06 11:37:58 公開日:2020-07-14
# マルチビット計測誤差の効率的な補正 Efficient correction of multiqubit measurement errors ( http://arxiv.org/abs/2001.09980v2 ) ライセンス: Link先を確認	Michael R. Geller and Mingyu Sun	(参考訳) 状態準備と測定(SPAM)エラーは、短期量子コンピュータの性能と実用化の可能性を制限する。スパムエラーは、n$ qubitsのレジスタの完全な実装のために2^n$の追加測定を必要とするキャリブレーションステップの後に部分的に修正可能である。ここでは,2^n \!の古典的処理を必要とするマルチキュービットSPAM誤差のキャラクタリゼーションと緩和のための近似的だが効率的な手法を提案する。 2^n$行列であるが、$O(4^k n^2)$測定のみであり、$k=O(1)$は相関体積の量子ビットの数である。 4および8個の超伝導量子ビットのレジスタ上でibm qプロセッサを用いてこの技術を実証・検証する。 State preparation and measurement (SPAM) errors limit the performance of near-term quantum computers and their potential for practical application. SPAM errors are partly correctable after a calibration step that requires, for a complete implementation on a register of $n$ qubits, $2^n$ additional measurements. Here we introduce an approximate but efficient method for multiqubit SPAM error characterization and mitigation requiring the classical processing of $2^n \! \times 2^n$ matrices, but only $O(4^k n^2)$ measurements, where $k=O(1)$ is the number of qubits in a correlation volume. We demonstrate and validate the technique using an IBM Q processor on registers of 4 and 8 superconducting qubits.	翻訳日:2023-06-05 11:42:14 公開日:2020-07-14
# 厳密な測定誤差補正 Rigorous measurement error correction ( http://arxiv.org/abs/2002.01471v2 ) ライセンス: Link先を確認	Michael R. Geller	(参考訳) 本稿では,ゲート型量子コンピュータにおける状態準備および測定誤差の補正に用いる実験手法について検討し,その厳密な正当性について論じる。特定の偏量子測定モデルにおいて、任意の$n$-量子ビット状態の非理想的測定は理想射影的測定と等価であり、出力確率分布に作用する古典的なマルコフ過程 $\gamma$ が続くことを証明する。測定誤差は厳密な正当化によって取り除くことができ、$\gamma$ が学習され反転できる。ゲートセットトモグラフィー(R. Blume-Kohout et al., arXiv:1310.4492)から$\Gamma$を得る方法を示し、IBM Q超伝導量子ビットに誤差補正手法を適用する。 We review an experimental technique used to correct state preparation and measurement errors on gate-based quantum computers, and discuss its rigorous justification. Within a specific biased quantum measurement model, we prove that nonideal measurement of an arbitrary $n$-qubit state is equivalent to ideal projective measurement followed by a classical Markov process $\Gamma$ acting on the output probability distribution. Measurement errors can be removed, with rigorous justification, if $\Gamma$ can be learned and inverted. We show how to obtain $\Gamma$ from gate set tomography (R. Blume-Kohout et al., arXiv:1310.4492) and apply the error correction technique to single IBM Q superconducting qubits.	翻訳日:2023-06-04 18:36:14 公開日:2020-07-14
# 量子ハト群集からの脱出 Escape from the Quantum Pigeon Conundrum ( http://arxiv.org/abs/2002.01876v3 ) ライセンス: Link先を確認	Gabor Kunstatter, Jonathan Ziprick, Victoria McNab, Alexander Rennie, Connor Speidel, and Jovin Toews	(参考訳) Aharonovらで最近議論されている。量子力学が2つの箱に3羽のハトを分配した場合、少なくとも2羽のハトを箱の1つに配置しなければならないというピジョン計数原理(PCP)に違反しているとするal. (2016)。しかし、この結論は厳密な理論的議論によって正当化できない。この問題は,PCP違反の結論を裏付けるものではないと予測される遷移振幅を実験的に確認することでさらに複雑になる。ここでは、解釈によらず、PCPが量子力学に違反しないことを演算子アイデンティティのセットで証明する。 It has recently been argued in Aharonov et. al. (2016) that quantum mechanics violates the Pigeon Counting Principle (PCP) which states that if one distributes three pigeons among two boxes there must be at least two pigeons in one of the boxes. However, this conclusion cannot justified by rigorous theoretical arguments. The issue is further complicated by experimental confirmation of the transition amplitudes predicted in this paper that nevertheless do not support the conclusion of PCP violation. Here we prove via a set of operator identities that the PCP is not violated within quantum mechanics, regardless of interpretation.	翻訳日:2023-06-04 16:18:25 公開日:2020-07-14
# ブロードバンド光を用いた光浮上 Optical levitation using broadband light ( http://arxiv.org/abs/2002.04650v3 ) ライセンス: Link先を確認	A. T. M. Anishur Rahman and P. F. Barker	(参考訳) 動的に調整された光学ポテンシャルを作り出す能力は、生物学から量子科学まで幅広い分野において重要になっている。超発光ダイオードの広帯域スペクトルプロファイルとレンズの色収差を組み合わせた任意の光学トウェザ電位の作成方法を示す。超高速レーザーパルス整形に使用される波長可変フィルタにより、広帯域のスペクトルプロファイルと、この光の集光によって形成される光ツイーザー電位を操作できる。これらのポテンシャルを真空中における浮遊ナノ粒子のブラウン運動の測定と、粒子の干渉検出とフィードバック冷却によって特徴付ける。このシンプルで費用対効果の高い技術により、幅広い応用が可能となり、MHz周波数を超える光ポテンシャルランドスケープの迅速な変調が可能となる。 The ability to create dynamic, tailored optical potentials has become important across fields ranging from biology to quantum science. We demonstrate a method for the creation of arbitrary optical tweezer potentials using the broadband spectral profile of a superluminescent diode combined with the chromatic aberration of a lens. A tunable filter, typically used for ultra-fast laser pulse shaping, allows us to manipulate the broad spectral profile and therefore the optical tweezer potentials formed by focusing of this light. We characterize these potentials by measuring the Brownian motion of levitated nanoparticles in vacuum and, also demonstrate interferometric detection and feedback cooling of the particle,s motion. This simple and cost-effective technique will enable a wide range of applications and allow rapid modulation of the optical potential landscape in excess of MHz frequencies.	翻訳日:2023-06-03 23:21:19 公開日:2020-07-14
# 2次元正方格子上の量子ウォークのためのトラップコインの完全分類 Complete classification of trapping coins for quantum walks on the 2D square lattice ( http://arxiv.org/abs/2002.08070v2 ) ライセンス: Link先を確認	B\'alint Koll\'ar, Andr\'as Gily\'en, Iva Tk\'a\v{c}ov\'a, Tam\'as Kiss, Igor Jex, Martin \v{S}tefa\v{n}\'ak	(参考訳) 離散時間量子ウォークのユニークな特徴の1つはトラップと呼ばれるもので、これは量子ウォーカーが最初の位置から完全に脱出できないことを意味する。この効果は、局所貨幣の寸法と明示的な形に依存する。正方格子上の4状態離散時間量子ウォークは、そのユニタリコイン作用素によって定義され、4次元コインヒルベルト空間に作用する。グロバー硬貨のよく知られた例は、部分的なトラップ、すなわち、初期位置に留まる確率が消えるいくつかの脱出初期状態をもたらす。一方、他のいくつかの硬貨は、そのような逃避状態が存在しない強いトラップを示すことが知られている。本稿では,2次元正方格子上での離散時間量子ウォークのために,これらすべてのコインを明示的に構成し,演算子の構造とトラップ効果の顕在化に基づいてそれらを分類する。本研究では, エスケープ状態の存在や非存在が示すように, 異なる動的特性を示す3種類のトラップコインと, 拡散波パレットが被覆する領域とを区別する。 One of the unique features of discrete-time quantum walks is called trapping, meaning the inability of the quantum walker to completely escape from its initial position, albeit the system is translationally invariant. The effect is dependent on the dimension and the explicit form of the local coin. A four state discrete-time quantum walk on a square lattice is defined by its unitary coin operator, acting on the four dimensional coin Hilbert space. The well known example of the Grover coin leads to a partial trapping, i.e., there exists some escaping initial state for which the probability of staying at the initial position vanishes. On the other hand, some other coins are known to exhibit strong trapping, where such escaping state does not exist. We present a systematic study of coins leading to trapping, explicitly construct all such coins for discrete-time quantum walks on the 2D square lattice, and classify them according to the structure of the operator and the manifestation of the trapping effect. We distinguish three types of trapping coins exhibiting distinct dynamical properties, as exemplified by the existence or non-existence of the escaping state and the area covered by the spreading wave-packet.	翻訳日:2023-06-03 04:56:24 公開日:2020-07-14
# 超伝導回路の最適制御による非断熱幾何量子計算 Nonadiabatic geometric quantum computation with optimal control on superconducting circuits ( http://arxiv.org/abs/2004.10199v2 ) ライセンス: Link先を確認	Jing Xu, Sai Li, Tao Chen, and Zheng-Yuan Xue	(参考訳) 量子コンピュータの重要な構成要素である量子ゲートは、非常に脆弱である。したがって、忠実度の高い頑健な量子ゲートを実現することは量子操作の究極の目標である。本稿では、任意の量子ゲートを設計するための超伝導回路上の非断熱的幾何学的量子計算手法を提案する。これは、幾何位相の強みと、最適制御技術と組み合わせてゲートロバスト性をさらに高める能力の両方を共有する。具体的には、任意の幾何学的単一量子ビットゲートを共振マイクロ波フィールド駆動によりトランスモン量子ビット上で実現し、その振幅と位相は時間依存である。一方、非自明な2量子ビットの幾何ゲートは2つの容量結合されたトランモン量子ビットで実装でき、トランモン量子ビットの周波数の1つを変調してそれらの間の効果的な共振結合を得る。したがって,本手法はフォールトトレラントな固体量子計算への有望な一歩となる。 Quantum gates, which are the essential building blocks of quantum computers, are very fragile. Thus, to realize robust quantum gates with high fidelity is the ultimate goal of quantum manipulation. Here, we propose a nonadiabatic geometric quantum computation scheme on superconducting circuits to engineer arbitrary quantum gates, which share both the robust merit of geometric phases and the capacity to combine with optimal control technique to further enhance the gate robustness. Specifically, in our proposal, arbitrary geometric single-qubit gates can be realized on a transmon qubit, by a resonant microwave field driving, with both the amplitude and phase of the driving being time-dependent. Meanwhile, nontrivial two-qubit geometric gates can be implemented by two capacitively coupled transmon qubits, with one of the transmon qubits' frequency being modulated to obtain effective resonant coupling between them. Therefore, our scheme provides a promising step towards fault-tolerant solid-state quantum computation.	翻訳日:2023-05-22 20:29:13 公開日:2020-07-14
# サーバーレス電子メール Serverless Electronic Mail ( http://arxiv.org/abs/2007.04608v2 ) ライセンス: Link先を確認	Geoffrey Goodell	(参考訳) 本稿では、通常のワークステーションやモバイル端末の利用者が、サードパーティーのメールサーバに頼らずにメッセージを交換できるピアツーピア電子メールへの簡単なアプローチについて述べる。重要なことに、このシステムは参加者が互いに通信するために複数のリンクされていないアイデンティティを確立および使用できるようにする。このアーキテクチャは、メッセージ配信に通常のSMTP、ピアツーピア通信にTorを利用する。この設計は、エンドツーエンド認証と暗号化のための公開鍵に基づく信頼のwebをブートストラップするために、自己認証のtor onionサービス名を使用する堅牢で意図しない方法を提供する。本システムは既存の電子メールシステムやパラダイムと相互運用可能であり,IMAP経由で他のメールを受信したり,システム参加者と外部メールユーザとの中継として操作したりすることができる。最後に,ブロードキャストプロトコルを使用してメーリングリストを実装する方法と,分散台帳技術がリストメンバ間の共有知識に関するコンセンサスをブートストラップする方法について述べる。 We describe a simple approach to peer-to-peer electronic mail that would allow users of ordinary workstations and mobile devices to exchange messages without relying upon third-party mail server operators. Crucially, the system allows participants to establish and use multiple unlinked identities for communication with each other. The architecture leverages ordinary SMTP for message delivery and Tor for peer-to-peer communication. The design offers a robust, unintrusive method to use self-certifying Tor onion service names to bootstrap a web of trust based on public keys for end-to-end authentication and encryption, which in turn can be used to facilitate message delivery when the sender and recipient are not online simultaneously. We show how the system can interoperate with existing email systems and paradigms, allowing users to hold messages that others can retrieve via IMAP or to operate as a relay between system participants and external email users. Finally, we show how it is possible to use a broadcast protocol to implement mailing lists and how distributed ledger technology might be used to bootstrap consensus about shared knowledge among list members.	翻訳日:2023-05-10 21:35:28 公開日:2020-07-14
# フェルミ表面形状 Fermi Surface Geometry ( http://arxiv.org/abs/2007.05525v2 ) ライセンス: Link先を確認	Elena Derunova, Jacob Gayles, Yan Sun, Michael W. Gaultois, Mazhar N. Ali	(参考訳) ペレルマン,ハミルトン,サーストンの著名かつ先駆的な数学作品に動機づけられ,多次元多様体の現代の幾何学的数学的分類を用いて電子構造を特徴付け,非自明な電子輸送現象を予測するという概念を導入した。ここでは、接束とガウス曲率を不変量として用いたフェルミ曲面幾何効果(FSGE)を開発する。我々はフェルミ表面(fs)の「双曲性」を記述するための指数である$\mathbb{h}_f$を開発し、現在の方法が苦しんだものを含む、様々な結晶、化学、電子構造ファミリーにまたがる16種類の化合物の実験的に測定された内在的異常ホール効果と普遍的相関(r$^2$ = 0.97)を示す。この研究は、フェルミ曲面から始まる電子(および拡張マグノニックおよびフォノン)構造多様体の幾何学的理解の完全な理論を開発する基礎を築いた。トポロジカル物理学の広範な影響と類似して、ここで始められた概念は、電子輸送の理解においてパラダイムシフトをもたらし、E と k の多様体の幾何学的性質と位相的性質を含むように移動する。 Motivated by the famous and pioneering mathematical works by Perelman, Hamilton, and Thurston, we introduce the concept of using modern geometrical mathematical classifications of multi-dimensional manifolds to characterize electronic structures and predict non-trivial electron transport phenomena. Here we develop the Fermi Surface Geometry Effect (FSGE), using the concepts of tangent bundles and Gaussian curvature as an invariant. We develop an index, $\mathbb{H}_F$, for describing the the "hyperbolicity" of the Fermi Surface (FS) and show a universal correlation (R$^2$ = 0.97) with the experimentally measured intrinsic anomalous Hall effect of 16 different compounds spanning a wide variety of crystal, chemical, and electronic structure families, including where current methods have struggled. This work lays the foundation for developing a complete theory of geometrical understanding of electronic (and by extension magnonic and phononic) structure manifolds, beginning with Fermi surfaces. In analogy to the broad impact of topological physics, the concepts begun here will have far reaching consequences and lead to a paradigm shift in the understanding of electron transport, moving it to include geometrical properties of the E vs k manifold as well as topological properties.	翻訳日:2023-05-10 17:14:08 公開日:2020-07-14
# ノイズ横運動を受ける個々の適応原子ビット上の量子ゲート Quantum Gates on Individually-Addressed Atomic Qubits Subject to Noisy Transverse Motion ( http://arxiv.org/abs/2007.06768v1 ) ライセンス: Link先を確認	M. Cetina, L. N. Egan, C. A. Noel, M. L. Goldman, A. R. Risinger, D. Zhu, D. Biswas, C. Monroe	(参考訳) 個々の閉じ込められた原子量子ビットは、その無視可能なアイドルエラーと集中した光学場を介して再構成可能なゲート操作の完全なセットを実装する能力により、量子コンピュータをスケールする最も有望な技術の1つである。しかし、量子ゲート演算の忠実度は、レーザーに横切る原子の弱い閉じ込めによって制限することができる。鎖軸に沿って弱く閉じ込められた25個の閉じ込められた原子イオンの鎖に個々に絡み合うゲートを配置することにより,この効果の測定を行う。ノイズ電界に起因するイオンの残留加熱から観測されたデコヒーレンスを正確に記述するモデルを提案する。量子回路を通して量子ビットイオンを同調的に冷却するために,鎖に分散したアンシライオンを用いてこれらの効果を抑制することを提案する。 Individual trapped atomic qubits represent one of the most promising technologies to scale quantum computers, owing to their negligible idle errors and the ability to implement a full set of reconfigurable gate operations via focused optical fields. However, the fidelity of quantum gate operations can be limited by weak confinement of the atoms transverse to the laser. We present measurements of this effect by performing individually-addressed entangling gates in chains of up to 25 trapped atomic ions that are weakly confined along the chain axis. We present a model that accurately describes the observed decoherence from the residual heating of the ions caused by noisy electric fields. We propose to suppress these effects through the use of ancilla ions interspersed in the chain to sympathetically cool the qubit ions throughout a quantum circuit.	翻訳日:2023-05-10 02:29:42 公開日:2020-07-14
# adiabaticityへの近道を用いた粒子速度の高速化 Speeding Up Particle Slowing using Shortcuts to Adiabaticity ( http://arxiv.org/abs/2007.06752v1 ) ライセンス: Link先を確認	John P. Bartolotta, Jarrod T. Reilly, and Murray J. Holland	(参考訳) 自発的に散乱した光子のランダムな方向から生じる運動量拡散を伴わずに大きな力を生み出すことができるレーザー場による粒子の減速法を提案する。この方法では、周期的に変形した時間分解型レーザーパルスが極小電子遷移に対処し、繰り返し吸収して放出サイクルを刺激することで粒子運動量を減少させる。ルイス・リースンフェルド不変量理論に基づく断熱アプローチへの近道を実装した。これにより, 短時間の減速距離を得るのに必要となる急速移動の利点を生かして, 応用分野の精密な強度や変形特性に本質的な不感が生じるという, 断熱移動の利点が得られる。熱オーブン源の典型的なパラメータは、毎秒1メートルの速度で中心速度を持つ粒子ビームを生成するため、これは粒子を1ミリメートル未満で定常付近に減速させる可能性がある。放射圧力に依存する広範に実装されたスローング技術と比較し,励起状態の減衰速度が小さい場合に生じる可能性のある利点を示す。したがって、このスキームは特定の分子で起こるような閉環遷移を欠く狭い線幅系の減速に特に有望な候補である。 We propose a method for slowing particles by laser fields that potentially has the ability to generate large forces without the associated momentum diffusion that results from the random directions of spontaneously scattered photons. In this method, time-resolved laser pulses with periodically modified detunings address an ultranarrow electronic transition to reduce the particle momentum through repeated absorption and stimulated emission cycles. We implement a shortcut to adiabaticity approach that is based on Lewis-Riesenfeld invariant theory. This affords our scheme the advantages of adiabatic transfer, where there can be an intrinsic insensitivity to the precise strength and detuning characteristics of the applied field, with the advantages of rapid transfer that is necessary for obtaining a short slowing distance. For typical parameters of a thermal oven source that generates a particle beam with a central velocity on the order of meters per second, this could result in slowing the particles to near stationary in less than a millimeter. We compare the slowing scheme to widely-implemented slowing techniques that rely on radiation pressure forces and show the advantages that potentially arise when the excited state decay rate is small. Thus, this scheme is a particularly promising candidate to slow narrow-linewidth systems that lack closed cycling transitions, such as occurs in certain molecules.	翻訳日:2023-05-10 02:29:08 公開日:2020-07-14
# 絡み合った光子源の最大忠実度のための集光光学の最適化 Optimization of collection optics for maximum fidelity in entangled photon sources ( http://arxiv.org/abs/2007.06748v1 ) ライセンス: Link先を確認	Kadir Durak	(参考訳) 本稿では,自然パラメトリックダウン変換現象によって生じる絡み合った光子のデコヒーレンス源について検討する。直交結晶からの光子対の位相および空間的識別性は、最大エンタングルメント忠実度を減少させる。慎重に選択された補償結晶は、ダウン変換原点の位相と空間的痕跡を消去するために使用される。光子対の放出角も光路差をもたらし、位相識別性をもたらす。現実的なシナリオは数値的にモデル化され、非ゼロ放射角の光子対が位相差を集める。これらのペアは、実用上はまだ収集と操作が可能であるが、収集光学は位相差を付加する。市販の2つの光学系(非球面レンズと無彩レンズ)を比較する。シミュレーション結果と実験結果を比較し,構築したモデルを用いて最大エンタングルメント忠実度を推定した。その結果, 実験パラメータを挿入することにより, 実測精度を提示モデルで正確に推定できることが示唆された。この研究は、臨界位相整合構成における絡み合った光子対源の調製と最適化に非常に有用であることが期待されている。 In this report the decoherence sources for entangled photons created by spontaneous parametric down conversion phenomenon is studied. The phase and spatial distinguishability of photon pairs from orthogonal crystals reduce the maximum achievable entanglement fidelity. Carefully chosen compensation crystals are used to erase the phase and spatial traces of down conversion origins. Emission angle of photon pairs also leads to optical path difference and resulting in phase distinguishability. A realistic scenario is numerically modelled, where the photon pairs with nonzero emission angle gather a phase difference. These pairs can still be collected and manipulated for practical use but the collection optics adds upon the phase difference. Two commercially available optics for collection; aspheric and achromatic lenses are compared. The numerical simulation results are compared with the experimental results to validate the built model for predicting the maximum achievable entanglement fidelity. The results indicate that the fidelity can be accurately estimated with the presented model by inserting the experimental parameters to it. The study is expected to be very useful for preparation and optimization of entangled photon pair sources in critical phase-matching configuration.	翻訳日:2023-05-10 02:28:46 公開日:2020-07-14
# オンラインコースにおけるデータ駆動モデルとタスク完了シーケンスのキャラクタリゼーション Data-driven modelling and characterisation of task completion sequences in online courses ( http://arxiv.org/abs/2007.07003v1 ) ライセンス: Link先を確認	Robert L. Peach and Sam F. Greenbury and Iain G. Johnston and Sophia N. Yaliraki and David Lefevre and Mauricio Barahona	(参考訳) 学習の本質的な時間性は、時系列情報を活用できる方法論の採用を要求する。本研究では、オンラインコースにおけるタスク完了の時間的シーケンスのデータ駆動分析を用いて、個人的および集団的学習者の振る舞いを特徴付け、所定のコース設計における重要なタスクやコースセッションを識別する方法を示す。また,最近開発した確率ベイズモデルを導入し,学生のシーケンス軌跡を学習し,学生の成績を予測する。オンラインビジネスマネジメントコースを受講する学習者からのデータに対するデータ駆動シーケンス分析の適用により、学習者のコーホート内で異なる行動が明らかになり、学習者や学習者のグループを識別し、コースで期待される名目上の順序から逸脱する。コースグレードを後進として,ハイパフォーマンスと低パフォーマンスの学習者間の行動の違いについて検討する。ハイパフォーマンスな学習者は、低パフォーマンスな学習者よりも週次セッションの進行に追随するが、各週次セッションのハイパフォーマンスな学習者は、名目上のタスク順序に縛られない。次に,確率ベイズモデルを用いてハイパフォーマンスとローパフォーマンスの学生のシーケンスをモデル化し,パフォーマンスに関連するエンゲージメント行動の学習を可能にする。また,データ・シーケンス・フレームワークをタスク中心の分析に利用し,重要な点とコース設計におけるタスクの種類の違いを特定する。対話型タスクや議論投稿などの非ロボット学習タスクは高いパフォーマンスと相関していることがわかった。本稿では,授業設計,介入,学生の指導を支援するため,このような分析手法の適用について論じる。 The intrinsic temporality of learning demands the adoption of methodologies capable of exploiting time-series information. In this study we leverage the sequence data framework and show how data-driven analysis of temporal sequences of task completion in online courses can be used to characterise personal and group learners' behaviors, and to identify critical tasks and course sessions in a given course design. We also introduce a recently developed probabilistic Bayesian model to learn sequence trajectories of students and predict student performance. The application of our data-driven sequence-based analyses to data from learners undertaking an on-line Business Management course reveals distinct behaviors within the cohort of learners, identifying learners or groups of learners that deviate from the nominal order expected in the course. Using course grades a posteriori, we explore differences in behavior between high and low performing learners. We find that high performing learners follow the progression between weekly sessions more regularly than low performing learners, yet within each weekly session high performing learners are less tied to the nominal task order. We then model the sequences of high and low performance students using the probablistic Bayesian model and show that we can learn engagement behaviors associated with performance. We also show that the data sequence framework can be used for task centric analysis; we identify critical junctures and differences among types of tasks within the course design. We find that non-rote learning tasks, such as interactive tasks or discussion posts, are correlated with higher performance. We discuss the application of such analytical techniques as an aid to course design, intervention, and student supervision.	翻訳日:2023-05-10 02:20:38 公開日:2020-07-14
# 量子スピン鎖の創発的絡み合い構造と自己相似性 Emergent entanglement structures and self-similarity in quantum spin chains ( http://arxiv.org/abs/2007.06989v1 ) ライセンス: Link先を確認	Boris Sokolov, Matteo A. C. Rossi, Guillermo Garc\'ia-P\'erez and Sabrina Maniscalco	(参考訳) 本稿では,多体量子状態に対する実験的にアクセス可能なネットワーク表現を提案する。我々は、この表現のパワーをパラダイム的なスピンチェーンモデルであるxxモデルに適用し、新しい現象をもたらすことを示した。これらの絡み合いネットワークの解析により、準長域秩序の漸進的確立は、ネットワークトポロジーの不安定性と同様に、単スピン共起分布に関する対称性を伴うことが明らかとなった。さらに,空間的局所化コミュニティである創発的絡み合い構造の存在を,モデルに依存しないコミュニティ検出アルゴリズムによって明らかにできるシステムの大域対称性により同定する。ネットワーク表現はさらに、状態における構造クラスの存在と循環的な自己相似性を明らかにし、これはコミュニティ構造と密接に関連していると推測する。その結果、複雑なネットワーク理論からツールや概念を用いることで、何十年も研究されたモデルでも新しい物理現象の発見、理解、記述が可能になることが示された。 We introduce an experimentally accessible network representation for many-body quantum states based on entanglement between all pairs of its constituents. We illustrate the power of this representation by applying it to a paradigmatic spin chain model, the XX model, and showing that it brings to light new phenomena. The analysis of these entanglement networks reveals that the gradual establishment of quasi-long range order is accompanied by a symmetry regarding single-spin concurrence distributions, as well as by instabilities in the network topology. Moreover, we identify the existence of emergent entanglement structures, spatially localised communities enforced by the global symmetry of the system that can be revealed by model-agnostic community detection algorithms. The network representation further unveils the existence of structural classes and a cyclic self-similarity in the state, which we conjecture to be intimately linked to the community structure. Our results demonstrate that the use of tools and concepts from complex network theory enables the discovery, understanding, and description of new physical phenomena even in models studied for decades.	翻訳日:2023-05-10 02:20:11 公開日:2020-07-14
# 絡み合いと対称性の合同効果:物性と排他性 Joint effects of entanglement and symmetrization: physical properties and exclusion ( http://arxiv.org/abs/2007.06982v1 ) ライセンス: Link先を確認	Pedro Sancho	(参考訳) 絡み合いと対称性は物理的性質を変更できる非分離状態をもたらす。原子吸収の例を使って、それらが一度に関連している両方の種類の効果を比較します。多粒子重ね合わせの存在は、同じ原子の吸収率を大きく変化させ、フェルミオンの重なりに依存することさえ阻害する。また、この文脈で自然に現れる多重フェルミオン重ね合わせに関連する非標準排除状態のセットも同定する。これらのアイデアをテストするために分子の解離に基づくアレンジメントを提案する。 Entanglement and symmetrization lead to non-separable states that can modify physical properties. Using the example of atomic absorption we compare both types of effects when they are relevant at once. The presence of multi-particle superpositions largely alters the absorption rates of identical atoms, even inhibiting the dependence on overlapping for fermions. We also identify a set of non-standard excluded states related to multi-fermion superposition that naturally emerge in this context. We propose an arrangement based on the dissociation of molecules to test these ideas.	翻訳日:2023-05-10 02:19:21 公開日:2020-07-14
# 光の多光子状態の対称性保護 Symmetry-protection of multiphoton states of light ( http://arxiv.org/abs/2007.06942v1 ) ライセンス: Link先を確認	Jon Lasa-Alonso, Martin Molezuelas, J. J. Miguel Varga, Aitzol Garcia-Etxarri, Geza Giedke and Gabriel Molina-Terriza	(参考訳) 本稿では,円筒対称性を持つ散乱問題における保護多光子状態の出現を解析する。そのため、まず、選択後対称性保護の概念を形式的に定義する。対称保護状態は1光子状態や2光子状態に制限されないが、反対に、正式に多光子状態に拡張できることを示す。さらに,多光子保護状態が1光子状態と2光子状態の小さな集合から構成されていることを円柱対称性の場合には証明する。最後に、特にデコヒーレンスフリー部分空間の構築において、量子通信において対称性が保護された状態が有する可能性があることを指摘する。 In this manuscript we analyze the emergence of protected multiphoton states in scattering problems with cylindrical symmetry. In order to do that, we first provide a formal definition of the concept of postselected symmetry-protection. We show that symmetry-protected states are not limited to one- or two-photon states, on the contrary, it can be formally extended to the multiphoton case. In addition, we prove for the case of cylindrical symmetry that all possible multiphoton protected states are constructed from a small set of one- and two-photon states. Finally, we point out possible applications that symmetry-protected states may have in quantum communications, concretely, in the construction of decoherence-free subspaces.	翻訳日:2023-05-10 02:19:08 公開日:2020-07-14
# ASHRAE Great Energy Predictor IIIコンペティションの概要と結果 The ASHRAE Great Energy Predictor III competition: Overview and results ( http://arxiv.org/abs/2007.06933v1 ) ライセンス: Link先を確認	Clayton Miller, Pandarasamy Arjunan, Anjukan Kathirgamanathan, Chun Fu, Jonathan Roth, June Young Park, Chris Balbach, Krishnan Gowri, Zoltan Nagy, Anthony Fontanini, Jeff Haberl	(参考訳) 2019年後半、ASHRAEはカグルプラットフォーム上でGEPIII(Great Energy Predictor III)機械学習コンペティションを開催した。この打ち上げはアシュレーから3度目のエネルギー予測競争となり、1990年代半ば以来となる。この改訂版では、16のソースから1,448の建物から収集された2,380エネルギーメーターから2000万点以上のトレーニングデータを提供した。このコンペティションの全体的な目標は、4100万以上のプライベートおよびパブリックテストデータポイントの予測のための最も正確なモデリングソリューションを見つけることであった。参加者は4,370人で、94カ国の3,614チームが39,403件の予測を提出した。上位5つの勝利ワークフローに加えて、競合他社は40以上の完全なソリューションを含む415の再現可能なオンライン機械学習ワークフロー例(ノートブック)を公開している。本稿では,コンペティションの準備とデータセット,競争相手とその議論,機械学習ワークフローとモデルの生成,勝者とその提案,学んだ教訓の議論,競技成果と次のステップについて概説する。最もポピュラーで正確な機械学習ワークフローは、lightgbmのような勾配ブースティングツリーモデルの大規模なアンサンブルを使用していた。最初の予測競合と同様に、データセットの事前処理が重要な差別化要因として現れた。 In late 2019, ASHRAE hosted the Great Energy Predictor III (GEPIII) machine learning competition on the Kaggle platform. This launch marked the third energy prediction competition from ASHRAE and the first since the mid-1990s. In this updated version, the competitors were provided with over 20 million points of training data from 2,380 energy meters collected for 1,448 buildings from 16 sources. This competition's overall objective was to find the most accurate modeling solutions for the prediction of over 41 million private and public test data points. The competition had 4,370 participants, split across 3,614 teams from 94 countries who submitted 39,403 predictions. In addition to the top five winning workflows, the competitors publicly shared 415 reproducible online machine learning workflow examples (notebooks), including over 40 additional, full solutions. This paper gives a high-level overview of the competition preparation and dataset, competitors and their discussions, machine learning workflows and models generated, winners and their submissions, discussion of lessons learned, and competition outputs and next steps. The most popular and accurate machine learning workflows used large ensembles of mostly gradient boosting tree models, such as LightGBM. Similar to the first predictor competition, preprocessing of the data sets emerged as a key differentiator.	翻訳日:2023-05-10 02:18:49 公開日:2020-07-14
# 量子グラフニューラルネットワークによる粒子トラック再構成 A Quantum Graph Neural Network Approach to Particle Track Reconstruction ( http://arxiv.org/abs/2007.06868v1 ) ライセンス: Link先を確認	Cenk T\"uys\"uz, Federico Carminati, Bilge Demirk\"oz, Daniel Dobos, Fabio Fracas, Kristiane Novotny, Karolos Potamianos, Sofia Vallecorsa, Jean-Roch Vlimant	(参考訳) HL-LHC(High Luminosity Large Hadron Collider)実験の追跡検出器の計算に必要となる複雑性とデータのスケールの未熟な増加が期待されている。現在使われているカルマンフィルタに基づくアルゴリズムは、同時衝突の数の増加、占有率、拡張性(二次的よりも重要)といった曖昧さの観点から限界に達しているが、粒子トラック再構成に対する機械学習アプローチは様々である。 HEP.TrkXは以前、トラックMLデータセットを使用して、トラック計測を接続するグラフとしてイベントを処理することで、組合せ背景を管理可能な量に減らし、計算上妥当なサイズにスケールすることで、有望なソリューションを提供できることを実証した。これまでの研究では、粒子の再構成を追跡するために、量子コンピューティングからグラフニューラルネットワークへの最初の試みを示す。我々は、量子コンピューティングの能力を活用して、非常に多くの状態を同時に評価し、大きなパラメータ空間を効果的に探索することを目指している。本論文の次のステップとして,初期単純化ツリーテンソルネットワーク(TTN)モデルの低精度収束を克服するための反復的アプローチによる改良モデルを提案する。 Unprecedented increase of complexity and scale of data is expected in computation necessary for the tracking detectors of the High Luminosity Large Hadron Collider (HL-LHC) experiments. While currently used Kalman filter based algorithms are reaching their limits in terms of ambiguities from increasing number of simultaneous collisions, occupancy, and scalability (worse than quadratic), a variety of machine learning approaches to particle track reconstruction are explored. It has been demonstrated previously by HEP.TrkX using TrackML datasets, that graph neural networks, by processing events as a graph connecting track measurements can provide a promising solution by reducing the combinatorial background to a manageable amount and are scaling to a computationally reasonable size. In previous work, we have shown a first attempt of Quantum Computing to Graph Neural Networks for track reconstruction of particles. We aim to leverage the capability of quantum computing to evaluate a very large number of states simultaneously and thus to effectively search a large parameter space. As the next step in this paper, we present an improved model with an iterative approach to overcome the low accuracy convergence of the initial oversimplified Tree Tensor Network (TTN) model.	翻訳日:2023-05-10 02:18:01 公開日:2020-07-14
# ロングパスを用いた量子回路の2次元量子ビット配置 2D Qubit Placement of Quantum Circuits using LONGPATH ( http://arxiv.org/abs/2007.06804v1 ) ライセンス: Link先を確認	Mrityunjay Ghosh, Nivedita Dey, Debdeep Mitra, Amlan Chakrabarti	(参考訳) 計算困難問題の解を求める従来の古典的計算よりも高速化を実現するため、量子コンピューティングが導入された。量子アルゴリズムは擬似量子環境でシミュレートできるが、実装には量子ゲートの物理合成による量子回路の実現が含まれる。これは複素量子ゲートを単純な1量子ビットと2量子ビットゲートのカスケードに分解する必要がある。物理合成の方法論的枠組みは、オペランド(量子ビット)と演算子の配置に関する制約を課している。格子の各ノードが量子ビットを表す格子上に物理量子ビットを置くことができれば、隣接する量子ビット上でのみ量子ゲートを操作でき、そうでなければ、非線形近接近傍アーキテクチャを線形近接近傍アーキテクチャに変換するためにSWAPゲートを挿入しなければならない。スワップゲートの挿入は物理的実装の累積コストを減らすために最適である。実際の実装への配置とルーティングにはスケジュールレイアウト生成が必要である。本稿では、任意の量子回路におけるSWAPゲート数を最適化する2つのアルゴリズムを提案する。最初のアルゴリズムは、相互作用グラフの生成から始まり、次にノードから始まる最も長い経路を最大度で見つけることを意図している。第2のアルゴリズムは、任意の非隣接量子ビット間のSWAPゲート数を最適化する。提案手法は1Dおよび2D NTCアーキテクチャにおけるSWAPゲート数を大幅に削減する。 In order to achieve speedup over conventional classical computing for finding solution of computationally hard problems, quantum computing was introduced. Quantum algorithms can be simulated in a pseudo quantum environment, but implementation involves realization of quantum circuits through physical synthesis of quantum gates. This requires decomposition of complex quantum gates into a cascade of simple one qubit and two qubit gates. The methodological framework for physical synthesis imposes a constraint regarding placement of operands (qubits) and operators. If physical qubits can be placed on a grid, where each node of the grid represents a qubit then quantum gates can only be operated on adjacent qubits, otherwise SWAP gates must be inserted to convert non-Linear Nearest Neighbor architecture to Linear Nearest Neighbor architecture. Insertion of SWAP gates should be made optimal to reduce cumulative cost of physical implementation. A schedule layout generation is required for placement and routing apriori to actual implementation. In this paper, two algorithms are proposed to optimize the number of SWAP gates in any arbitrary quantum circuit. The first algorithm is intended to start with generation of an interaction graph followed by finding the longest path starting from the node with maximum degree. The second algorithm optimizes the number of SWAP gates between any pair of non-neighbouring qubits. Our proposed approach has a significant reduction in number of SWAP gates in 1D and 2D NTC architecture.	翻訳日:2023-05-10 02:17:42 公開日:2020-07-14
# 60%検出効率1550nmのInGaAs/InP単光子検出器 InGaAs/InP single-photon detectors with 60% detection efficiency at 1550 nm ( http://arxiv.org/abs/2007.06792v1 ) ライセンス: Link先を確認	Yu-Qiang Fang, Wei Chen, Tian-Hong Ao, Cong Liu, Li Wang, Xin-Jiang Gao, Jun Zhang, Jian-Wei Pan	(参考訳) InGaAs/InP単光子検出器(SPD)は近赤外光子計数に広く用いられている。光子検出効率(PDE)は、SPDのキャラクタリゼーションにおいて最も重要なパラメータの1つであり、PDEの増加は、産業開発と学術研究において一貫して中心的な役割を果たす。本稿では,1550nmにおいて,pdeを60%まで高めた高周波ゲイティングingaas/inp spdの実装について述べる。一方,ingaas/inp単光子アバランシェダイオードの誘電金属反射層を付加した構造設計とデバイス製造を最適化し,入射光子の吸収効率を20%程度向上させた。一方,アフターパルス効果抑制のための寄生容量を最小化するために,弱い雪崩抽出のためのモノリシックリードアウト回路を開発した。 1.25GHzの正弦波ゲーティングと最適化ゲート振幅と動作温度により、SPDは340kcpsの暗カウントレート(DCR)で60%のPDEに達する。 3kcpsのDCRを基準として、PDEは5.5%の余パルス確率で40% PDEに達し、近赤外線SPDベースのアプリケーションの性能を大幅に向上させることができる。 InGaAs/InP single-photon detectors (SPDs) are widely used for near-infrared photon counting in practical applications. Photon detection efficiency (PDE) is one of the most important parameters for SPD characterization, and therefore increasing PDE consistently plays a central role in both industrial development and academic research. Here we present the implementation of high-frequency gating InGaAs/InP SPD with a PDE as high as 60% at 1550 nm. On one hand, we optimize the structure design and device fabrication of InGaAs/InP single-photon avalanche diode with an additional dielectric-metal reflection layer to relatively increase the absorption efficiency of incident photons by ~ 20%. On the other hand, we develop a monolithic readout circuit of weak avalanche extraction to minimize the parasitic capacitance for the suppression of the afterpulsing effect. With 1.25 GHz sine wave gating and optimized gate amplitude and operation temperature, the SPD is characterized to reach a PDE of 60% with a dark count rate (DCR) of 340 kcps. For practical use, given 3 kcps DCR as a reference the PDE reaches ~ 40% PDE with an afterpulse probability of 5.5%, which can significantly improve the performance for the near-infrared SPD based applications.	翻訳日:2023-05-10 02:17:22 公開日:2020-07-14
# ゲージ場が維持する安定非線形モード Stable nonlinear modes sustained by gauge fields ( http://arxiv.org/abs/2007.07245v1 ) ライセンス: Link先を確認	Yaroslav V. Kartashov and Vladimir V. Konotop	(参考訳) スピノール多次元非線形schr\"{o}dinger方程式におけるソリトンの存在、進化、安定性に対するゲージ場の普遍的効果を明らかにする。二次元の場合に着目して、ゲージ場を純粋なゲージに分割して \rtext{non-pure gauge} を生成すると、ソリトン力学におけるこれらの成分の役割が異なる: 新興状態の \btext{localization characteristics} は曲率によって決定され、純粋なゲージはモードの安定性に影響を与える。それぞれの解は、純粋ゲージとは独立なエンベロープとして正確に表現でき、曲率とは独立な定常キャリアモード状態を変調することができる。我々の中心的な発見は、非ゼロ曲率が異常なモードの存在に繋がることであり、特に、定常的な反発相互作用を持つ媒体において、外部収束電位を伴わず、また、外部トラップにおいても、安定した局所的な自走基本状態と渦搬送状態が可能である。 We reveal the universal effect of gauge fields on the existence, evolution, and stability of solitons in the spinor multidimensional nonlinear Schr\"{o}dinger equation. Focusing on the two-dimensional case, we show that when gauge field can be split in a pure gauge and a \rtext{non-pure gauge} generating \rtext{effective potential}, the roles of these components in soliton dynamics are different: the \btext{localization characteristics} of emerging states are determined by the curvature, while pure gauge affects the stability of the modes. Respectively the solutions can be exactly represented as the envelopes independent of the pure gauge, modulating stationary carrier-mode states, which are independent of the curvature. Our central finding is that nonzero curvature can lead to the existence of unusual modes, in particular, enabling stable localized self-trapped fundamental and vortex-carrying states in media with constant repulsive interactions without additional external confining potentials and even in the expulsive external traps.	翻訳日:2023-05-10 02:10:52 公開日:2020-07-14
# 高Q間隔正方形結合マイクロリング共振器アレイ High-Q Interstitial Square Coupled Microring Resonators Arrays ( http://arxiv.org/abs/2007.07179v1 ) ライセンス: Link先を確認	Shaolin Liao and Lu Ou	(参考訳) 中間環を持つマイクロリング共振器(MRR)の正方配列の特性について検討した。 Floquet-Bloch周期状態の遷移行列法により, 正方形結合型MRRの分散挙動を求める。固有波動ベクトル,バンドギャップおよび固有モードベクトルの解析式は、同一のカップラを持つ間方結合MRR配列と、間方結合MRRを含まない正方結合MRR配列の特別な場合に対して導出される。そして、所定の周波数の4つの固有波ベクトルそれぞれについて、世俗方程式を介して固有モードの場分布を算出する。最後に、同一のカプラと正方形結合mrs配列を有する間欠的正方形結合mrs配列について数値シミュレーションを行う。シミュレーション結果は解析分析を検証する。最後に、中間5リング構成、正規4リング構成及び1リング構成の負荷品質係数を求める。その結果, 共振周波数における固有モードの劣化により, 1リング構成の20倍, 通常の4リング構成の8倍の負荷品質が得られた。したがって、正方形結合型MRRアレイは、パリティ時間対称センサのようなフィルタや共振に基づくセンシング装置を含む高品質なフォトニクスコンポーネントを形成する大きな可能性を持っている。 The properties of the square array of coupled Microring Resonators (MRRs) with interstitial rings are studied. Dispersion behavior of the interstitial square coupled MRRs is obtained through the transfer matrix method with the Floquet-Bloch periodic condition. Analytical formulas of the eigen wave vectors, band gaps and eigen mode vectors are derived for the special cases of the interstitial square coupled MRRs array with identical couplers and the regular square coupled MRRs array without the interstitial rings. Then, the eigen modes' field distribution are calculated for each of the four eigen wave vectors for a given frequency through the secular equation. Finally, numerical simulation is performed for an interstitial square coupled MRRs array with identical couplers and a regular square coupled MRRs array. The simulation result verifies the analytical analysis. Finally, the loaded quality factors of the interstitial 5-ring configuration, the regular 4-ring configuration and the 1-ring configuration are obtained. It is found that the loaded quality factor of the interstitial 5-ring configuration is up to 20 times and 8 times as high as those of the 1-ring configuration and the regular 4-ring configuration respectively, mainly due to the degenerated eigen modes at the resonant frequency. Thus, the interstitial square coupled MRRs array has the great potential to form high-quality integrated photonics components, including filters and resonance based sensing devices like the parity-time symmetric sensors.	翻訳日:2023-05-10 02:10:31 公開日:2020-07-14
# マンガ行列モデルにおける演算子成長境界 Operator growth bounds in a cartoon matrix model ( http://arxiv.org/abs/2007.07165v1 ) ライセンス: Link先を確認	Andrew Lucas, Andrew Osborne	(参考訳) n(n-1)/2$相互作用するマヨラナフェルミオンのモデルにおいて、演算子の成長を研究する。ハミルトニアンの項は、長さ$q$の周期の辺に生きる$q$フェルミオンの積に比例する。このモデルはマンガ「行列モデル」であり、相互作用グラフはホログラフィック的に量子重力に双対な単一のトレース行列モデルを模倣している。我々は(非摂動的に1/N$で、平均的なアンサンブルなしで)このモデルのスクランブル時間は少なくとも位数$\log N$であり、高速スクランブル予想と一致することを証明している。我々は、我々の「行列モデル」とメロニカルモデルにおける演算子の成長の明らかな類似性と相違についてコメントする。 We study operator growth in a model of $N(N-1)/2$ interacting Majorana fermions, which live on the edges of a complete graph of $N$ vertices. Terms in the Hamiltonian are proportional to the product of $q$ fermions which live on the edges of cycles of length $q$. This model is a cartoon "matrix model": the interaction graph mimics that of a single-trace matrix model, which can be holographically dual to quantum gravity. We prove (non-perturbatively in $1/N$, and without averaging over any ensemble) that the scrambling time of this model is at least of order $\log N$, consistent with the fast scrambling conjecture. We comment on apparent similarities and differences between operator growth in our "matrix model" and in the melonic models.	翻訳日:2023-05-10 02:10:09 公開日:2020-07-14
# マルチユーティリティ市場:持続可能な開発のためのブロックチェーン交換プラットフォームのためのフレームワーク Multi-Utility Market: Framework for a Blockchain Exchange Platform for Sustainable Development ( http://arxiv.org/abs/2007.07096v1 ) ライセンス: Link先を確認	Jacques Bou Abdo and Sherali Zeadally	(参考訳) 水やその他の資源は日々不足しており、発展途上国は直ちに介入する必要性が最も高い。国家のニーズとして水は、21世紀における紛争の主な原因の1つと考えられている。ピアツーピアトレーディングは、最も便利でスケーラブルで持続可能なソリューションの1つだが、適切なビジネスモデルが欠如しているため、通常のユーザーが生成されたリソース、通貨や金融決済の複雑さ、単一ユーティリティ市場を売却する動機となっている。本稿では,ピアツーピアトレーディングが直面する課題を解決するブロックチェーン技術に基づく多機能トレーディングプラットフォームを提案する。このプラットフォームは、先進国の農村部だけでなく、特に発展途上国のニーズを満たしている。提案する設計のオープン性は、様々な利害関係者による採用と利用に適しています。 Water and other resources are becoming scarcer every day, and developing countries are the neediest for an immediate intervention. Water, as a national need, is considered to be one of the main causes for conflicts in the 21st century. Peer-to-peer trading is one of the most convenient, scalable and sustainable solutions but faces organization challenges such as: the absence of suitable business models motivating normal users to sell their generated resources, currency and financial settlement complexities, and single utility markets. We propose a multi-utility trading platform, based on blockchain technology which can address the challenges faced by peer-to-peer trading. This platform meets the needs of developing countries in particular as well as rural areas of developed countries. The open nature of our proposed design makes it suitable for adoption and use by various stakeholders.	翻訳日:2023-05-10 02:09:52 公開日:2020-07-14
# プライベートデータから得られる公共財 - デジタルコンタクトトラクションの有効性と正当化パラドックス Public Goods From Private Data -- An Efficacy and Justification Paradox for Digital Contact Tracing ( http://arxiv.org/abs/2007.07016v1 ) ライセンス: Link先を確認	Andrew Buzzell	(参考訳) 新型コロナウイルスの感染拡大を抑えるためのデジタルコンタクトトラッキング(DCT)アプリの採用に関する議論は、個人のプライバシーへのリスクに焦点を当てている(Sharma & Bashir 2020, Tang 2020)。この強調は、DCTの倫理的展開に重大な課題を示すが、DCTを実装するための正当化を損なう制約を生成する。この結果のみを倫理的監視分析(Floridi & Strait 2020)の成功であり、潜在的に有害な技術の配備を妨げていると考えるのは間違いである。プライバシー中心の分析は、データを私有財産として扱い、個人と政府の関係を敵とみなし、技術プラットフォームをゲートキーパーとして定着させ、個人の同意と公共衛生倫理を知らせるよりコミュニタリズム的な価値観とある程度の緊張関係にある企業の影響によって、緊急公衆衛生当局の概念を支持している。倫理的かつ効果的なDCTの障壁を克服し、デジタル技術の公共的利益の実現を支援するインフラと政策を開発するためには、集約データの公開リソース概念を開発する必要がある。 Debate about the adoption of digital contact tracing (DCT) apps to control the spread of COVID-19 has focussed on risks to individual privacy (Sharma & Bashir 2020, Tang 2020). This emphasis reveals significant challenges to ethical deployment of DCT, but generates constraints which undermine justification to implement DCT. It would be a mistake to view this result solely as the successful operation of ethical foresight analysis (Floridi & Strait 2020), preventing deployment of potentially harmful technology. Privacy-centric analysis treats data as private property, frames the relationship between individuals and governments as adversarial, entrenches technology platforms as gatekeepers, and supports a conception of emergency public health authority as limited by individual consent and considerable corporate influence that is in some tension with the more communitarian values that typically inform public health ethics. To overcome the barriers to ethical and effective DCT, and develop infrastructure and policy that supports the realization of potential public benefits of digital technology, a public resource conception of aggregate data should be developed.	翻訳日:2023-05-10 02:08:50 公開日:2020-07-14
# 線形光学に基づくGHZ型絡み合いコヒーレント状態の絡み合い濃度プロトコル Entanglement concentration protocols for GHZ-type entangled coherent state based on linear optics ( http://arxiv.org/abs/2007.07014v1 ) ライセンス: Link先を確認	Mitali Sisodia, Chitra Shukla	(参考訳) 我々は,GHZ型ECSから最大絡み合いのグリーンベルガー・ホルン・ザイリンガー型絡み合いコヒーレント状態(ECS)を得るための2つの絡み合い濃度プロトコル(ECP)を提案した。コンヒーレント状態の重畳を補助する部分絡み付きGHZ型ECSを用いた第1のECPを得たが,第2のECPは部分絡み付きGHZ型ECSの2つのコピーを用いて設計した。成功確率も計算され、ecpsの両方で議論されている。我々は、3モードのGHZ型ECSに対する最初のECPの成功確率を、3モードのW型ECSのECPと比較し、状態パラメータのより大きな値(\beta=0.7)に対して、我々のECPがより効率的(最大成功確率)であることを示した。物理実現のために、線形光学素子を用いた2つの光回路(2つのecps)、viz 50:50ビームスプリッタ、位相シフト器、および光子検出器が提供され、この技術で可能な将来の実験実装をサポートする。 We proposed two entanglement concentration protocols (ECPs) to obtain maximally entangled Greenberger-Horne-Zeilinger (GHZ)-type entangled coherent state (ECS) from the corresponding partially entangled GHZ-type ECSs. We obtained the first ECP using a partially entangled GHZ-type ECS assisted with a superposition of single-mode coherent state, however the second ECP is designed using two copies of partially entangled GHZ-type ECSs. The success probabilities have also been calculated and discussed for both the ECPs. We have further compared the success probabilities of our first ECP for 3-mode GHZ-type ECS with an ECP of 3-mode W-type ECS and found that our ECP is more efficient (maximal success probabilities) for larger value (\beta=0.7) of state parameter. For the physical realization, two optical circuits (for two ECPs) using linear optical elements, viz 50:50 beam splitter, phase shifter, and photon detectors are provided, which support the future experimental implementation possible with the present technology.	翻訳日:2023-05-10 02:08:29 公開日:2020-07-14
# 一般化イジングマシンによる量子干渉のエミュレート Emulating Quantum Interference with Generalized Ising Machines ( http://arxiv.org/abs/2007.07379v1 ) ライセンス: Link先を確認	Shuvro Chowdhury, Kerem Y. Camsari and Supriyo Datta	(参考訳) ノイズの多い中間スケール量子(nisq)時代の量子超越性の最近の画期的な実証は、古典計算と量子計算の間のより細かい境界を確立するための激しい活動をもたらした。本稿では、量子モンテカルロ法(QMC)を用いて、$n$ q-bitsに作用する$d$量子ゲートの列を、2つの値 "0" と "1" を持つ古典スピンまたは p-bits を持つボルツマン機械(BM)に変換する体系的な手順を定式化する。この手順を用いて、ショアのアルゴリズムを、通常のラップトップコンピュータ上で1日以内の90ドルのpビットを用いて最大36ドルのqビットでエミュレートし、naive schr\"{o}dingerの実装では約10^{21}$要素の行列を乗算する必要がある。さらに大きな問題は専用イジングマシンでアクセス可能であるべきである。しかし、量子コンピュータに対する非効率性に対して量的計量$s_{\text{total}}$を導入することにより、確率論的アプローチの明確な限界も明らかにする。例えば、$n$ q-bitsのShorのアルゴリズムの簡単な確率的実装は、$S_{\text{Total}} \sim \exp{(-n/2)}$となり、真量子コンピュータで期待される多項式スケーリングの代わりに、確率的Shorのアルゴリズムの計算時間を指数関数的に2^{n/2$にする。これはQMCでよく知られた符号問題の顕在化であり、適切な変換で「テーム」することが可能である。最後に、純粋に実エネルギー関数に基づく標準的な最適化アルゴリズムを特徴とし、虚部$\Im{(E)}$を加算することにより、量子的な位相キャンセルを伴うファインマンパスの統計的抑制を増大させる例を示す。この例は、古典的アニーラーで遭遇した符号問題を量子アニーラーの計算資源にすることができることを示している。 The recent groundbreaking demonstration of quantum supremacy in the noisy intermediate scale quantum (NISQ) era has led to an intense activity in establishing finer boundaries between classical and quantum computing. In this paper, we use quantum Monte Carlo (QMC) techniques to formulate a systematic procedure for translating any sequence of $d$ quantum gates acting on $n$ q-bits into a Boltzmann machine (BM) having $n+g(d)$ classical spins or p-bits with two values "0" and "1", but with a complex energy function $E$. Using this procedure we emulate Shor's algorithm with up to $36$ q-bits using $90$ p-bits, on an ordinary laptop computer in less than a day, while a naive Schr\"{o}dinger implementation would require multiplying matrices with $\approx 10^{21}$ elements. Even larger problems should be accessible on dedicated Ising Machines. However, we also identify clear limitations of the probabilistic approach by introducing a quantitative metric $S_{\text{Total}}$ for its inefficiency relative to a quantum computer. For example, a straightforward probabilistic implementation of Shor's algorithm with $n$ q-bits leads to an $S_{\text{Total}} \sim \exp{(-n/2)}$, making the computation time for the probabilistic Shor's algorithm scale exponentially as $2^{n/2}$ instead of the polynomial scaling expected for true quantum computers. This is a manifestation of the well-known sign problem in QMC and it may be possible to "tame" it with appropriate transformations. Finally, we present an example featuring a standard optimization algorithm based on a purely real energy function to which we add an imaginary part $\Im{(E)}$, thereby augmenting the statistical suppression of Feynman paths with quantum-like phase cancellation. This example illustrates how the sign problem encountered in classical annealers can potentially be turned into a computational resource for quantum annealers.	翻訳日:2023-05-10 02:01:00 公開日:2020-07-14
# SaYoPillow: IoMTの睡眠習慣を考慮したストレス検出・予測・制御のためのブロックチェーン対応プライバシ保証フレームワーク SaYoPillow: A Blockchain-Enabled, Privacy-Assured Framework for Stress Detection, Prediction and Control Considering Sleeping Habits in the IoMT ( http://arxiv.org/abs/2007.07377v1 ) ライセンス: Link先を確認	Laavanya Rachakonda and Anand K. Bapatla and Saraju P. Mohanty and Elias Kougianos	(参考訳) 今日の生活様式を考えると、人々は人間の体に与える利益を忘れるだけである。生産的な睡眠をとらない理由は多々ある。 Smart-Yoga Pillow(SaYoPillow)は、ストレスを緩和し、ストレスと睡眠習慣の計測可能な関係を確立しながら、良質な睡眠の重要性を認識するためのデバイスとして構想されている。本研究では、急速眼球運動(REM)および非急速眼球運動(NREM)段階における生理的変化を継続的に監視し、睡眠習慣を分析するシステムを提案する。生理的パラメータの変化に加えて、睡眠時間、嗅覚範囲、眼球運動、四肢運動などの要因も監視される。 SaYoPillowシステムはエッジレベルで処理され、ストレージはクラウドにある。ユーザのプライバシを侵害する必要はなく、SaYoPillow氏は、医療に対する悪意のある攻撃を減らすために、アップロードと検索の両方にセキュアなデータ送信を提案している。ユーザインタフェースは、データアクセシビリティと可視性を制御するために提供される。 SaYoPillowの精度は96%で、既存の研究成果に近い。しかし、SaYoPillowは、セキュリティ機能を扱う唯一の仕事であり、ストレスに対する睡眠習慣を考慮する唯一の仕事である。 Considering today's lifestyle, people just sleep forgetting the benefits it provides to the human body. The reasons for not having a productive sleep could be many. Smart-Yoga Pillow (SaYoPillow) is envisioned as a device that may help in recognizing the importance of a good quality sleep to alleviate stress while establishing a measurable relationship between stress and sleeping habits. A system that analyzes the sleeping habits by continuously monitoring the physiological changes that occur during rapid eye movement (REM) and non-rapid eye movement (NREM) stages of sleep is proposed in the current work. In addition to the physiological parameter changes, factors such as sleep duration, snoring range, eye movement, and limb movements are also monitored. The SaYoPillow system is processed at the edge level with the storage being at the cloud. Not having to compromise the user's privacy, SaYoPillow proposes secure data transmission for both uploading and retrieving, and secure storage and communications as an attempt to reduce malicious attacks on healthcare. A user interface is provided for the user to control data accessibility and visibility. SaYoPillow has 96% accuracy which is close to other existing research works. However, SaYoPillow is the only work with security features as well as only work that considers sleeping habits for stress.	翻訳日:2023-05-10 02:00:18 公開日:2020-07-14
# 簡単な迷路を通る複数の経路を量子ウォークで見つける Finding more than one path through a simple maze with a quantum walk ( http://arxiv.org/abs/2007.07340v1 ) ライセンス: Link先を確認	Mark Hillery	(参考訳) 2つと3つの星グラフからなる鎖を通る量子ウォークを研究する。第一星は識別された頂点のラベル付きスタートを持ち、最後の星は1つのラベル付き終端を持つ。これら2つの頂点の間には複数の経路があり、対象はこれらの経路を見つけることである。量子ウォークは量子スピードアップによってこれを実現できることを示す。 We study quantum walks through chains consisting of two and three star graphs. The first star has a distinguished vertex labelled START and the last has one labelled END. There are multiple paths between these two vertices, and the object is to find these paths. We show that a quantum walk can do this with a quantum speedup.	翻訳日:2023-05-10 01:59:54 公開日:2020-07-14
# 窒化チタンとアルミニウム超伝導共振器の誘電損失の比較 Comparison of Dielectric Loss in Titanium Nitride and Aluminum Superconducting Resonators ( http://arxiv.org/abs/2007.07338v1 ) ライセンス: Link先を確認	Alexander Melville, Greg Calusine, Wayne Woods, Kyle Serniak, Evan Golden, Bethany M. Niedzielski, David K. Kim, Arjan Sevi, Jonilyn L. Yoder, Eric A. Dauler, William D. Oliver	(参考訳) 損失誘電体は超伝導量子回路における大きなデコヒーレンス源である。本稿では, 窒化チタン (TiN) とアルミニウム (Al) 超伝導コプラナー導波管 (CPW) 共振器のバルクおよび界面誘電体の誘電損失をモデル化し, 比較する。我々は等方トレンチ型共振器を作製し、特定の誘電領域の共振器品質因子に対する寄与を強調する一連のデバイスジオメトリを生成する。各誘電体領域はTiNデバイスの損失に大きく寄与するが、金属-空気界面はAlデバイスの損失を支配している。さらに、後プロセスハイドロフッ化物(hf)エッチングの有無に関わらず、各tin共振器形状の品質因子を評価し、基板-空気界面の損失を低減し、品質因子を改善する。 Lossy dielectrics are a significant source of decoherence in superconducting quantum circuits. In this report, we model and compare the dielectric loss in bulk and interfacial dielectrics in titanium nitride (TiN) and aluminum (Al) superconducting coplanar waveguide (CPW) resonators. We fabricate isotropically trenched resonators to produce a series of device geometries that accentuate a specific dielectric region's contribution to resonator quality factor. While each dielectric region contributes significantly to loss in TiN devices, the metal-air interface dominates the loss in the Al devices. Furthermore, we evaluate the quality factor of each TiN resonator geometry with and without a post-process hydrofluoric (HF) etch, and find that it reduced losses from the substrate-air interface, thereby improving the quality factor.	翻訳日:2023-05-10 01:59:49 公開日:2020-07-14
# ウィキペディアの取り組みと貢献に影響を与える要因 Individual Factors that Influence Effort and Contributions on Wikipedia ( http://arxiv.org/abs/2007.07333v1 ) ライセンス: Link先を確認	Luiz F. Pinto, Carlos Denner dos Santos, Silvia Onoyama	(参考訳) 本研究は,ウィキペディアに対する態度,自己効力,利他主義が努力や積極的な貢献にどのように影響するかを分析することを目的とする。本稿では,計画行動理論とオンラインコミュニティにおける文献からの知見に基づく新しい概念モデルを提案する。このモデルは、様々な面(識別、相互性、評判)における利他主義を考慮し、組織文献に拠れば、積極的な貢献の観点で測定されるパフォーマンス結果に先立って、努力を要素として扱うことによって、これまで提案されてきた他のモデルと異なる。研究の目的を達成するため、wikipediaはコミュニティのメンバーを調査し、二次的なデータを収集した。異常値を除くと,最終サンプルが212名であった。探索的因子分析と構造方程式モデリングを適用し,良好な適合指標を持つモデルを得た。その結果, 努力が積極的な貢献, 態度, 評価による利他主義, 識別による利他主義に影響を及ぼすことが示唆された。提案された要因はいずれも、アクティブな貢献に直接関係しない。経験は自己効力感に直接影響を与え、努力と積極的貢献の関係を肯定的に抑制する。最後に,文献への示唆と今後の研究への示唆を通じて結論を述べる。 In this work, we aim to analyze how attitude, self-efficacy, and altruism influence effort and active contributions on Wikipedia. We propose a new conceptual model based on the theory of planned behavior and findings from the literature on online communities. This model differs from other models that have been previously proposed by considering altruism in its various facets (identification, reciprocity, and reputation), and by treating effort as a factor prior to performance results, which is measured in terms of active contributions, according to the organizational literature. To fulfill the study specific objectives, Wikipedia surveyed community members and collected secondary data. After excluding outliers, we obtained a final sample with 212 participants. We applied exploratory factor analysis and structural equation modeling, which resulted in a model with satisfactory fit indices. The results indicate that effort influences active contributions, and attitude, altruism by reputation, and altruism by identification influence effort. None of the proposed factors are directly related to active contributions. Experience directly influences self-efficacy while it positively moderates the relation between effort and active contributions. Finally, we present the conclusions via several implications for the literature as well as suggestions for future research.	翻訳日:2023-05-10 01:59:31 公開日:2020-07-14
# 二元系ボース・アインシュタイン凝縮体におけるフィードバック誘起磁性相 Feedback Induced Magnetic Phases in Binary Bose-Einstein Condensates ( http://arxiv.org/abs/2007.07266v1 ) ライセンス: Link先を確認	Hilary M. Hurst, Shangjie Guo, I. B. Spielman	(参考訳) 実時間フィードバック制御を伴うタンデムの弱い測定は、新しい非平衡量子物質工学への新しい道である。本稿では,多成分ボース・アインシュタイン凝縮体(becs)の量子フィードバック制御のための理論的ツールボックスを開発した。単粒子ポテンシャルの形でのフィードバックは、系のダイナミクスを支配する確率方程式に入る効果的な相互作用をもたらすことができる。効果的な相互作用は調整可能であり、スピン非依存およびスピン依存のフェシュバッハ共鳴と類似するが、原子散乱パラメータを変更することはない。フィードバック冷却は測定バックアクションによる暴走加熱を防止し,その効果を説明する解析モデルを提案する。我々は,2成分のBECを確率平均場理論を用いて研究し,フィードバックが易軸強磁性体とスピン非秩序パラマグネット相の相転移を誘導するツールボックスを展示する。本研究では,スピン依存相互作用強度の関数として定常相図を示す。この結果は,ボース・アインシュタイン凝縮体の閉ループ量子制御が,低温原子系における量子工学の強力な新しいツールであることを示す。 Weak measurement in tandem with real-time feedback control is a new route toward engineering novel non-equilibrium quantum matter. Here we develop a theoretical toolbox for quantum feedback control of multicomponent Bose-Einstein condensates (BECs) using backaction-limited weak measurements in conjunction with spatially resolved feedback. Feedback in the form of a single-particle potential can introduce effective interactions that enter into the stochastic equation governing system dynamics. The effective interactions are tunable and can be made analogous to Feshbach resonances -- spin-independent and spin-dependent -- but without changing atomic scattering parameters. Feedback cooling prevents runaway heating due to measurement backaction and we present an analytical model to explain its effectiveness. We showcase our toolbox by studying a two-component BEC using a stochastic mean-field theory, where feedback induces a phase transition between easy-axis ferromagnet and spin-disordered paramagnet phases. We present the steady-state phase diagram as a function of intrinsic and effective spin-dependent interaction strengths. Our result demonstrates that closed-loop quantum control of Bose-Einstein condensates is a powerful new tool for quantum engineering in cold-atom systems.	翻訳日:2023-05-10 01:58:13 公開日:2020-07-14
# 人々を復活させる - ベンチマーク機械学習データセットのコンテスト Bringing the People Back In: Contesting Benchmark Machine Learning Datasets ( http://arxiv.org/abs/2007.07399v1 ) ライセンス: Link先を確認	Emily Denton, Alex Hanna, Razvan Amironesei, Andrew Smart, Hilary Nicole, Morgan Klaus Scheuerman	(参考訳) 社会技術システムに埋め込まれたアルゴリズム上の不公平さに対して、白人、シスジェンダー、男性、西洋のデータ被験者に対する偏見を明らかにする機械学習データセットの内容に注目が集まっている。対照的に、そのようなデータセットに埋め込まれた履歴、値、規範に比較的注意が払われていない。本稿では,機械学習データの系譜である研究プログラムを概説し,これらのデータセットが作成されている理由,収集すべきデータの選択にどのような影響を与えるか,それらの生成の文脈的条件と付随的条件について検討する。機械学習におけるベンチマークデータセットを基盤として運用する方法を説明し、これらのデータセットについて4つの研究課題を提起する。この尋問は、データセット構築に埋め込まれた労働力を理解し、データに遭遇する他の研究者に対する新たなコンテストの道を示すことで、私たちを「人々を取り戻す」よう促します。 In response to algorithmic unfairness embedded in sociotechnical systems, significant attention has been focused on the contents of machine learning datasets which have revealed biases towards white, cisgender, male, and Western data subjects. In contrast, comparatively less attention has been paid to the histories, values, and norms embedded in such datasets. In this work, we outline a research program - a genealogy of machine learning data - for investigating how and why these datasets have been created, what and whose values influence the choices of data to collect, the contextual and contingent conditions of their creation. We describe the ways in which benchmark datasets in machine learning operate as infrastructure and pose four research questions for these datasets. This interrogation forces us to "bring the people back in" by aiding us in understanding the labor embedded in dataset construction, and thereby presenting new avenues of contestation for other researchers encountering the data.	翻訳日:2023-05-10 01:50:46 公開日:2020-07-14
# 多次元格子上のフーリエ歩行では局在化は起こらない Localization does not occur for the Fourier walk on the multi-dimensional lattice ( http://arxiv.org/abs/2007.07398v1 ) ライセンス: Link先を確認	Akihiro Narimatsu	(参考訳) 多次元格子上のグロバーウォークの局所化の存在が知られている。本稿では,空間均質な量子ウォークの局在性の存在条件について述べる。また,多次元格子上のフーリエ歩行では局在化は起こらないことを証明した。 The existence of localization for the Grover walk on the multi-dimensional lattice is known. This paper gives some conditions for the existence of localization for the space-homogeneous quantum walks. We also prove that localization does not occur for the Fourier walk on the multi-dimensional lattice.	翻訳日:2023-05-10 01:50:30 公開日:2020-07-14
# 集積フォトニクスを用いた高純度パルススクイーズ生成 High-Purity Pulsed Squeezing Generation with Integrated Photonics ( http://arxiv.org/abs/2007.07387v1 ) ライセンス: Link先を確認	Chaohan Cui, Christos N. Gagatsos, Saikat Guha and Linran Fan	(参考訳) スクイーズド光は、部分的なポストセレクション技術に基づく量子強化センシングや量子状態工学など、量子技術の強力なツールへと進化してきた。複雑な通信ネットワークや大規模情報処理で好まれる正確なタイムスタンプと物理的に定義された時間モードを提供するため、絞り込み光のパルス発生は特に興味深い。しかしながら、従来の単一パス構成におけるパルススクイージングのマルチモード特性は出力状態の純度を制限し、量子技術における応用に悪影響を及ぼす。本報告では,パルススクイーズを高い時間的純度で生成する新しい手法を提案する。フォトニックキャビティのパラメトリックダウンコンバージョンに基づくパルススクイージングの解析を行った。出力された励起光の有効モード数がユニティに近づくことを示す。このような高純度励起光は広いパラメータと低いポンプパワーで実現でき、大規模な量子資源を生成するための堅牢なアプローチを提供する。 Squeezed light has evolved into a powerful tool for quantum technology, ranging from quantum enhanced sensing and quantum state engineering based on partial post-selection techniques. The pulsed generation of squeezed light is of particular interest, as it can provide accurate time stamp and physically defined temporal mode, which are highly preferred in complex communication networks and large-scale information processing. However, the multimode feature of pulsed squeezing in conventional single-pass configuration limits the purity of the output state, negatively impacting its application in quantum technology. In this Letter, we propose a new approach to generate pulsed squeezing with high temporal purity. Pulsed squeezing based on parametric down-conversion in photonic cavities is analyzed. We show that the effective mode number of the output squeezed light approaches unity. Such a high-purity squeezed light can be realized with broad parameters and low pump power, providing a robust approach to generate large-scale quantum resource.	翻訳日:2023-05-10 01:49:49 公開日:2020-07-14
# パッシブフォトニクスと時間分解検出を用いた高次元周波数エンコード量子情報処理 High-dimensional Frequency-Encoded Quantum Information Processing with Passive Photonics and Time-Resolving Detection ( http://arxiv.org/abs/2007.07386v1 ) ライセンス: Link先を確認	Chaohan Cui, Kaushik P. Seshadreesan, Saikat Guha and Linran Fan	(参考訳) 本稿では,光子周波数領域に符号化された高次元量子情報を処理する新しい手法を提案する。非線形光学過程に基づく以前のアプローチとは対照的に、光子エネルギーのアクティブ制御は不要である。任意ユニタリ変換と投影測定は、受動フォトニック回路と時間分解検出によって実現できる。任意の大きさの量子周波数コムの系統回路設計が提案されている。量子周波数相関の検証基準が導出されている。検出器の有限応答時間の実用的条件を考慮し、現在の装置性能で高忠実度動作を容易に実現できることを示す。この研究は、高次元周波数符号化に基づくスケーラブルで高忠実な量子情報処理への道を開く。 In this Letter, we propose a new approach to process high-dimensional quantum information encoded in a photon frequency domain. In contrast to previous approaches based on nonlinear optical processes, no active control of photon energy is required. Arbitrary unitary transformation and projection measurement can be realized with passive photonic circuits and time-resolving detection. A systematic circuit design for a quantum frequency comb with arbitrary size has been given. The criteria to verify quantum frequency correlation has been derived. By considering the practical condition of detector's finite response time, we show that high-fidelity operation can be readily realized with current device performance. This work will pave the way towards scalable and high-fidelity quantum information processing based on high-dimensional frequency encoding.	翻訳日:2023-05-10 01:49:34 公開日:2020-07-14
# 変分量子分類器を用いた認知症予測 Dementia Prediction Applying Variational Quantum Classifier ( http://arxiv.org/abs/2007.08653v1 ) ライセンス: Link先を確認	Daniel Sierra-Sosa, Juan Arcila-Moreno, Begonya Garcia-Zapirain, Cristian Castillo-Olea, Adel Elmaghraby	(参考訳) 認知症は世界で5番目の死因であり、毎年1000万人の新規患者が死亡している。機械学習技術を用いた医療アプリケーションは物理的限界にほぼ達しているが、診断の頻度の増加によってより多くのデータが利用できるようになっている。量子機械学習(QML)技術に関する最近の研究は、既存の機械学習モデルのトレーニングプロセスを加速し、より複雑なパターンを学ぶための代替手段を提供するのに役立つ、さまざまなアプローチを発見した。本研究は,量子機械学習アルゴリズムの実世界の応用を報告することを目的としており,特に,ibmのフレームワークにおける変分量子分類(vqc)に実装されたバージョンを用いることにより,高齢者の認知症予測を可能にしている。 Dementia is the fifth cause of death worldwide with 10 million new cases every year. Healthcare applications using machine learning techniques have almost reached the physical limits while more data is becoming available resulting from the increasing rate of diagnosis. Recent research in Quantum Machine Learning (QML) techniques have found different approaches that may be useful to accelerate the training process of existing machine learning models and provide an alternative to learn more complex patterns. This work aims to report a real-world application of a Quantum Machine Learning Algorithm, in particular, we found that using the implemented version for Variational Quantum Classiffication (VQC) in IBM's framework Qiskit allows predicting dementia in elderly patients, this approach proves to provide more consistent results when compared with a classical Support Vector Machine (SVM) with a linear kernel using different number of features.	翻訳日:2023-05-10 01:39:28 公開日:2020-07-14
# 相対論的安息エネルギーと次数1/2のフラクショナルモーメント演算子との関係 A Link Between Relativistic Rest Energy and Fractionary Momentum Operators of Order 1/2 ( http://arxiv.org/abs/1912.12770v4 ) ライセンス: Link先を確認	Luis Fernando Mora Mora	(参考訳) 無限ポテンシャル井戸における因果分数波方程式の解を得た。第一に、いわゆる「自由粒子」の場合が解決され、正規化可能な解は波のパケットに似た減衰振動の重ね合わせとなる。この結果から無限ポテンシャルウェルケースが解かれた。得られた方程式の減衰係数は、ユカワポテンシャルまたは「遮蔽」クーロンポテンシャルに現れる指数と一致した。このマッチングが強制されると、粒子はE = mc^2/2のオフセットエネルギーを取得し、各エネルギーレベルによって増大する。箱の中の波動解の指数減衰は、粒子が陽子の質量に等しい質量を持つとき、陽子の半径と密接に関連していることがわかった。最後に、分数的波動方程式は球面座標で表現され、解析的あるいは数値的な方法で解かれる。 The solution of a causal fractionary wave equation in an infinite potential well was obtained. First, the so-called "free particle" case was solved, giving as normalizable solutions a superposition of damped oscillations similar to a wave packet. From this results, the infinite potential well case was then solved. The damping coefficient of the equation obtained was matched with the exponent appearing in the Yucawa potential or "screened" Coulomb potential. When this matching was forced, the particle aquires an offset energy of E = mc^2/2 which then can be increased by each energy level. The expontential damping of the wave solutions in the box was found to be closely related with the radius of the proton when the particle has a mass equal to the mass of the proton. Lastly the fractionary wave equation was expressed in spherical coordinates and remains to be solved through analytical or numerical methods.	翻訳日:2023-01-17 02:52:57 公開日:2020-07-14
# ローカルに考える、グローバルに行動する - ローカルとグローバル表現による連合学習 Think Locally, Act Globally: Federated Learning with Local and Global Representations ( http://arxiv.org/abs/2001.01523v3 ) ライセンス: Link先を確認	Paul Pu Liang, Terrance Liu, Liu Ziyin, Nicholas B. Allen, Randy P. Auerbach, David Brent, Ruslan Salakhutdinov, Louis-Philippe Morency	(参考訳) フェデレートラーニング(Federated Learning)とは、複数のデバイスに分散したプライベートデータをトレーニングする手法である。デバイスのデータをプライベートに保つため、グローバルモデルはパラメータとアップデートを通信するだけでトレーニングされる。そこで本研究では,各デバイス上のコンパクトな局所表現と,全デバイスにまたがるグローバルモデルを同時に学習する,新しいフェデレーション学習アルゴリズムを提案する。その結果、グローバルモデルは局所表現のみで動作するため、より小さくなり、通信されるパラメータの数を減らすことができる。理論的には, 局所モデルと大域モデルの組み合わせにより, デバイス分布のばらつきだけでなく, データのばらつきも減少することを示す一般化解析を行う。実演的に、我々はローカルモデルが性能を維持しながらコミュニケーション効率の高い訓練を可能にすることを示した。また、プライバシーが重要な現実世界のモバイルデータから個人化された気分予測のタスクを評価する。最後に、ローカルモデルは、新しいデバイスからの異種データを処理し、人種、年齢、性別などの保護された属性を隠蔽する公平な表現を学ぶ。 Federated learning is a method of training models on private data distributed over multiple devices. To keep device data private, the global model is trained by only communicating parameters and updates which poses scalability challenges for large models. To this end, we propose a new federated learning algorithm that jointly learns compact local representations on each device and a global model across all devices. As a result, the global model can be smaller since it only operates on local representations, reducing the number of communicated parameters. Theoretically, we provide a generalization analysis which shows that a combination of local and global models reduces both variance in the data as well as variance across device distributions. Empirically, we demonstrate that local models enable communication-efficient training while retaining performance. We also evaluate on the task of personalized mood prediction from real-world mobile data where privacy is key. Finally, local models handle heterogeneous data from new devices, and learn fair representations that obfuscate protected attributes such as race, age, and gender.	翻訳日:2023-01-14 02:08:44 公開日:2020-07-14
# 部分空間分割のための群ノルム正規化分解モデル A Group Norm Regularized Factorization Model for Subspace Segmentation ( http://arxiv.org/abs/2001.02568v2 ) ライセンス: Link先を確認	Xishun Wang and Zhouwang Yang and Xingye Yue and Hui Wang	(参考訳) 部分空間のセグメンテーションは、データが異なる部分空間の結合から来ると仮定し、セグメンテーションの目的は、データを対応する部分空間に分割することである。低ランク表現(LRR)は、サブスペースセグメンテーション問題を解決するための古典的なスペクトル型手法であり、まずLRRモデルを解くことで親和性行列を取得し、次にセグメンテーションのためのスペクトルクラスタリングを実行する。本稿では,部分空間分割のためのlrrモデルに触発された群ノルム正規化分解モデル(gnrfm)を提案し,このモデルを解くために拡張ラグランジアン法(aalm)アルゴリズムを設計する。具体的には, 因子行列の列を疎くするために群ノルム正規化を適用し, 低階の目的を達成することにより, 特異値分解 (svd) は不要となり, 各ステップの計算複雑性が大幅に低減される。我々は、異なるLRRモデルを用いて親和性行列を取得し、それぞれ異なる合成ノイズデータと実データを用いてクラスタテストを行う。従来のモデルやアルゴリズムと比較して、提案手法はより高速でノイズに強いため、最終的なクラスタリング結果の方が優れている。さらに, 計算結果から, アルゴリズムは高速に収束し, 約10回しか要しないことがわかった。 Subspace segmentation assumes that data comes from the union of different subspaces and the purpose of segmentation is to partition the data into the corresponding subspace. Low-rank representation (LRR) is a classic spectral-type method for solving subspace segmentation problems, that is, one first obtains an affinity matrix by solving a LRR model and then performs spectral clustering for segmentation. This paper proposes a group norm regularized factorization model (GNRFM) inspired by the LRR model for subspace segmentation and then designs an Accelerated Augmented Lagrangian Method (AALM) algorithm to solve this model. Specifically, we adopt group norm regularization to make the columns of the factor matrix sparse, thereby achieving a purpose of low rank, which means no Singular Value Decompositions (SVD) are required and the computational complexity of each step is greatly reduced. We obtain affinity matrices by using different LRR models and then performing cluster testing on different sets of synthetic noisy data and real data, respectively. Compared with traditional models and algorithms, the proposed method is faster and more robust to noise, so the final clustering results are better. Moreover, the numerical results show that our algorithm converges fast and only requires approximately ten iterations.	翻訳日:2023-01-13 09:40:38 公開日:2020-07-14
# 単一画像の空間適応ネットワーク Spatial-Adaptive Network for Single Image Denoising ( http://arxiv.org/abs/2001.10291v2 ) ライセンス: Link先を確認	Meng Chang, Qi Li, Huajun Feng, Zhihai Xu	(参考訳) 前回の研究では、畳み込みニューラルネットワークが画像のノイズ処理において優れたパフォーマンスを達成できることが示されている。しかし、局所的な強固な畳み込み操作によって制限されるため、これらの手法は過剰な人工物に繋がる。より深いネットワーク構造はこれらの問題を緩和するが、より多くの計算オーバーヘッドが必要である。本稿では,効率的な単一画像ブラインドノイズ除去のための空間適応型雑音除去ネットワーク(SADNet)を提案する。空間テクスチャやエッジの変化に適応するため, 残留空間適応ブロックを設計する。重み付けのための空間的相関特徴をサンプリングするために変形可能な畳み込みを導入する。コンテキストブロック付きエンコーダデコーダ構造を導入し、マルチスケール情報をキャプチャする。粗さから微細なノイズ除去により、高品質なノイズフリー画像を得ることができる。本手法を合成および実雑音画像データセットに適用する。実験の結果,本手法は定量的および視覚的に,最先端の弁別法を上回ることができることがわかった。 Previous works have shown that convolutional neural networks can achieve good performance in image denoising tasks. However, limited by the local rigid convolutional operation, these methods lead to oversmoothing artifacts. A deeper network structure could alleviate these problems, but more computational overhead is needed. In this paper, we propose a novel spatial-adaptive denoising network (SADNet) for efficient single image blind noise removal. To adapt to changes in spatial textures and edges, we design a residual spatial-adaptive block. Deformable convolution is introduced to sample the spatially correlated features for weighting. An encoder-decoder structure with a context block is introduced to capture multiscale information. With noise removal from the coarse to fine, a high-quality noisefree image can be obtained. We apply our method to both synthetic and real noisy image datasets. The experimental results demonstrate that our method can surpass the state-of-the-art denoising methods both quantitatively and visually.	翻訳日:2023-01-06 02:52:19 公開日:2020-07-14
# 音韻学における系統信号 Phylogenetic signal in phonotactics ( http://arxiv.org/abs/2002.00527v2 ) ライセンス: Link先を確認	Jayden L. Macklin-Cordes, Claire Bowern and Erich R. Round	(参考訳) 系統学的手法は、樹木の推測を超えた言語学に幅広い可能性を秘めている。ここでは, 系統学的なアプローチが, 全く新しい言語データから歴史的知見を得る可能性を明らかにする。本研究では,111パマ・ニュンガン語彙から音声学的データを抽出し,系統発生の履歴を反映する程度を定量化する。 1) セグメント間の遷移周波数と, (3) 自然音クラス間の遷移周波数とで, biphone (two-segment sequences) の存在の有無を記録するバイナリ変数と, (3) 自然音クラス間の遷移頻度の3つのデータセットをテストした。オーストラリアの言語は高い音韻的同質性を持っていると特徴付けられる。それにもかかわらず、すべてのデータセットで系統発生シグナルを検出する。系統発生シグナルは、二進法データよりも粒度の細かい周波数データで、自然クラスデータで最大である。これらの結果は, 歴史的・比較言語学において, 容易に抽出できる新たなデータ源を活用できることを示す。 Phylogenetic methods have broad potential in linguistics beyond tree inference. Here, we show how a phylogenetic approach opens the possibility of gaining historical insights from entirely new kinds of linguistic data--in this instance, statistical phonotactics. We extract phonotactic data from 111 Pama-Nyungan vocabularies and apply tests for phylogenetic signal, quantifying the degree to which the data reflect phylogenetic history. We test three datasets: (1) binary variables recording the presence or absence of biphones (two-segment sequences) in a lexicon (2) frequencies of transitions between segments, and (3) frequencies of transitions between natural sound classes. Australian languages have been characterized as having a high degree of phonotactic homogeneity. Nevertheless, we detect phylogenetic signal in all datasets. Phylogenetic signal is greater in finer-grained frequency data than in binary data, and greatest in natural-class-based data. These results demonstrate the viability of employing a new source of readily extractable data in historical and comparative linguistics.	翻訳日:2023-01-04 09:15:28 公開日:2020-07-14
# 1クラス潜在正規化ネットワークによる異常検出 Anomaly Detection by One Class Latent Regularized Networks ( http://arxiv.org/abs/2002.01607v2 ) ライセンス: Link先を確認	Chengwei Chen and Pan Chen and Haichuan Song and Yiqing Tao and Yuan Xie and Shouhong Ding and Lizhuang Ma	(参考訳) 異常検出は多くの実世界の応用でコンピュータビジョン領域の基本的な問題である。正規クラスに属する幅広い画像が、ある分布から現れると、このタスクの目的は、異常な事象に属する分布外画像を検出するモデルを構築することである。近年,GANに基づく半教師付きジェネレーティブ・アドバイザリアル・ネットワーク(GAN)手法が,異常検出タスクで人気を集めている。しかし、GANのトレーニングプロセスはまだ不安定で困難である。これらの問題を解決するために, 学習データの基盤構造を潜在的特徴空間にとらえるだけでなく, 潜在表現の空間を識別的に制限し, より正確な検出を行うことができる, 新たな対向的二重オートエンコーダネットワークを提案する。さらに、判別器と見なされる補助オートエンコーダは、より安定した訓練プロセスを得ることができた。実験の結果,MNISTおよびCIFAR10データセットおよびGTSRB停止信号データセットの最先端結果が得られた。 Anomaly detection is a fundamental problem in computer vision area with many real-world applications. Given a wide range of images belonging to the normal class, emerging from some distribution, the objective of this task is to construct the model to detect out-of-distribution images belonging to abnormal instances. Semi-supervised Generative Adversarial Networks (GAN)-based methods have been gaining popularity in anomaly detection task recently. However, the training process of GAN is still unstable and challenging. To solve these issues, a novel adversarial dual autoencoder network is proposed, in which the underlying structure of training data is not only captured in latent feature space, but also can be further restricted in the space of latent representation in a discriminant manner, leading to a more accurate detector. In addition, the auxiliary autoencoder regarded as a discriminator could obtain an more stable training process. Experiments show that our model achieves the state-of-the-art results on MNIST and CIFAR10 datasets as well as GTSRB stop signs dataset.	翻訳日:2023-01-03 21:45:56 公開日:2020-07-14
# 不均一顔認識のための関係深い特徴学習 Relational Deep Feature Learning for Heterogeneous Face Recognition ( http://arxiv.org/abs/2003.00697v3 ) ライセンス: Link先を確認	MyeongAh Cho, Taeoh Kim, Ig-Jae Kim, Kyungjae Lee, and Sangyoun Lee	(参考訳) Heterogeneous Face Recognition (HFR)は、可視光(VIS)、近赤外線(NIR)、スケッチ領域などの2つの異なる領域の顔にマッチするタスクである。データベースがないため、HFR法は通常、一般的な顔情報を含む大規模視覚データベースの事前訓練された特徴を利用する。しかし、これらの事前訓練された特徴は、視覚領域とのテクスチャの相違による性能劣化を引き起こす。そこで本研究では,汎用的な顔特徴に加えて,グローバルリレーショナル情報を抽出するリレーショナルグラフモジュール (rgm) を提案する。各アイデンティティの界面部分間の関係情報はどんなモダリティでも似ているため、特徴間のモデリング関係はドメイン間のマッチングに役立つ。 RGMを通して、関係伝播は事前訓練された特徴から利点を失うことなくテクスチャ依存性を減少させる。さらに、RGMは、局所的に相関した畳み込み特徴からグローバルな顔幾何学を捉え、長距離関係を識別する。さらに,ノードワイズ・リカレーションを行うノード注意ユニット(NAU)を提案する。さらに,HFRにおける埋め込みベクトルの効率的な投影学習のための条件付きマージン損失関数(C-softmax)を提案する。提案手法は5つのHFRデータベース上での最先端手法よりも優れている。さらに,我々のモジュールを任意のトレーニング済みの顔認識バックボーンにプラグインすることで,小さなHFRデータベースの制限を克服し,3つのバックボーンの性能向上を実証する。 Heterogeneous Face Recognition (HFR) is a task that matches faces across two different domains such as visible light (VIS), near-infrared (NIR), or the sketch domain. Due to the lack of databases, HFR methods usually exploit the pre-trained features on a large-scale visual database that contain general facial information. However, these pre-trained features cause performance degradation due to the texture discrepancy with the visual domain. With this motivation, we propose a graph-structured module called Relational Graph Module (RGM) that extracts global relational information in addition to general facial features. Because each identity's relational information between intra-facial parts is similar in any modality, the modeling relationship between features can help cross-domain matching. Through the RGM, relation propagation diminishes texture dependency without losing its advantages from the pre-trained features. Furthermore, the RGM captures global facial geometrics from locally correlated convolutional features to identify long-range relationships. In addition, we propose a Node Attention Unit (NAU) that performs node-wise recalibration to concentrate on the more informative nodes arising from relation-based propagation. Furthermore, we suggest a novel conditional-margin loss function (C-softmax) for the efficient projection learning of the embedding vector in HFR. The proposed method outperforms other state-of-the-art methods on five HFR databases. Furthermore, we demonstrate performance improvement on three backbones because our module can be plugged into any pre-trained face recognition backbone to overcome the limitations of a small HFR database.	翻訳日:2022-12-27 05:13:48 公開日:2020-07-14
# 深層強化学習を用いた効率的かつ効果的な類似サブトラジェクション探索 Efficient and Effective Similar Subtrajectory Search with Deep Reinforcement Learning ( http://arxiv.org/abs/2003.02542v2 ) ライセンス: Link先を確認	Zheng Wang, Cheng Long, Gao Cong, Yiding Liu	(参考訳) 同様の軌道探索は基本的な問題であり、過去20年間にわたってよく研究されてきた。しかしながら、クエリの軌跡に最もよく似た軌道の一部(すなわち、サブ軌跡)を返すことを目的とした類似のサブ軌跡探索(simsub)問題は、より細かい方法で軌道の類似性を捉えることができ、多くのアプリケーションが解析の基本的な単位としてサブ軌跡を取るにもかかわらず、ほとんど無視されている。本稿では,SimSub問題について検討し,精度と近似性の両方を含むアルゴリズムスイートを開発する。これらの近似アルゴリズムのうち、深層強化学習に基づく2つのアルゴリズムは、有効性と効率の観点からこれらの非学習ベースのアルゴリズムよりも優れている。提案手法の有効性と効率を検証した実世界の軌道データセットの実験を行った。 Similar trajectory search is a fundamental problem and has been well studied over the past two decades. However, the similar subtrajectory search (SimSub) problem, aiming to return a portion of a trajectory (i.e., a subtrajectory) which is the most similar to a query trajectory, has been mostly disregarded despite that it could capture trajectory similarity in a finer-grained way and many applications take subtrajectories as basic units for analysis. In this paper, we study the SimSub problem and develop a suite of algorithms including both exact and approximate ones. Among those approximate algorithms, two that are based on deep reinforcement learning stand out and outperform those non-learning based algorithms in terms of effectiveness and efficiency. We conduct experiments on real-world trajectory datasets, which verify the effectiveness and efficiency of the proposed algorithms.	翻訳日:2022-12-26 07:45:49 公開日:2020-07-14
# 強化学習における分布ロバスト性と正規化 Distributional Robustness and Regularization in Reinforcement Learning ( http://arxiv.org/abs/2003.02894v2 ) ライセンス: Link先を確認	Esther Derman and Shie Mannor	(参考訳) 分散ロバスト最適化(DRO)は、分類と回帰におけるロバスト性と正規化の等価性を証明し、正規化が統計的学習においてうまく一般化する解析的理由を与える。 DROのシーケンシャルな意思決定への拡張は、ロバストなマルコフ決定プロセス(MDP)設定を通じて$\textit{external uncertainty}$を克服するが、結果の定式化は特に大域において解決が難しい。一方、強化学習における既存の正規化法は確率性のため$\textit{internal uncertainty}$のみを扱う。本研究は,強固なmdpと正則化の二重関係を確立することにより,強固な強化学習を促進することを目的としている。本稿では,分散ロバストなMPPを導入し,非サンプル性能を保証することを証明する。次に,経験値関数に対する新しい正規化器を導入し,ワッサースタイン分布ロバストな値関数の下限を示す。結果は大きな状態空間に対する線形値関数近似に拡張する。提案手法は,有限サンプル性能を保証したロバストネスの定式化を提供する。さらに、強化学習法で$\textit{external uncertainty}$を扱うための実用的なツールとして正規化を使うことを提案する。 Distributionally Robust Optimization (DRO) has enabled to prove the equivalence between robustness and regularization in classification and regression, thus providing an analytical reason why regularization generalizes well in statistical learning. Although DRO's extension to sequential decision-making overcomes $\textit{external uncertainty}$ through the robust Markov Decision Process (MDP) setting, the resulting formulation is hard to solve, especially on large domains. On the other hand, existing regularization methods in reinforcement learning only address $\textit{internal uncertainty}$ due to stochasticity. Our study aims to facilitate robust reinforcement learning by establishing a dual relation between robust MDPs and regularization. We introduce Wasserstein distributionally robust MDPs and prove that they hold out-of-sample performance guarantees. Then, we introduce a new regularizer for empirical value functions and show that it lower bounds the Wasserstein distributionally robust value function. We extend the result to linear value function approximation for large state spaces. Our approach provides an alternative formulation of robustness with guaranteed finite-sample performance. Moreover, it suggests using regularization as a practical tool for dealing with $\textit{external uncertainty}$ in reinforcement learning methods.	翻訳日:2022-12-26 07:02:27 公開日:2020-07-14
# OVC-Net: テンポラルグラフと詳細拡張によるオブジェクト指向ビデオキャプション OVC-Net: Object-Oriented Video Captioning with Temporal Graph and Detail Enhancement ( http://arxiv.org/abs/2003.03715v5 ) ライセンス: Link先を確認	Fangyi Zhu, Jenq-Neng Hwang, Zhanyu Ma, Guang Chen, Jun Guo	(参考訳) 従来のビデオキャプションでは、ビデオの総合的な説明を要求するが、特定のオブジェクトの詳細な説明は利用できない。移動軌跡を関連づけることなく、これらの画像に基づくデータ駆動手法は、物体間視覚特徴の時空間遷移からの活動を理解することができない。さらに、トレーニングであいまいなクリップ・センテンスペアを採用することで、単対多の性質からマルチモーダル機能マッピングを学ぶことを妨げる。本稿では,オブジェクト指向ビデオキャプションと呼ばれる,映像をオブジェクト指向で理解するための新しいタスクを提案する。ビデオベースのオブジェクト指向ビデオキャプションネットワーク(OVC)-Netを時間グラフと詳細拡張により導入し、時間とともに活動を分析し、小さなサンプル条件下での視覚言語接続を安定的に捕捉する。時間グラフは、以前のイメージベースアプローチよりも有用な補足を提供し、視覚特徴の時間的進化と空間的位置の動的移動からアクティビティを推論することができる。細部の拡張は、異なるオブジェクト間の識別的特徴をキャプチャし、それに続くキャプションモジュールによりより情報的で正確な記述が得られる。その後、効果的なクロスモーダル学習を容易にするために、一貫性のあるオブジェクト指向ペアを提供する新しいデータセットを構築した。提案手法の有効性を示すため,新しいデータセットの実験を行い,最先端のビデオキャプション手法と比較する。実験結果から,OVC-Netは並列オブジェクトを正確に記述する能力を示し,最先端の性能を実現する。 Traditional video captioning requests a holistic description of the video, yet the detailed descriptions of the specific objects may not be available. Without associating the moving trajectories, these image-based data-driven methods cannot understand the activities from the spatio-temporal transitions in the inter-object visual features. Besides, adopting ambiguous clip-sentence pairs in training, it goes against learning the multi-modal functional mappings owing to the one-to-many nature. In this paper, we propose a novel task to understand the videos in object-level, named object-oriented video captioning. We introduce the video-based object-oriented video captioning network (OVC)-Net via temporal graph and detail enhancement to effectively analyze the activities along time and stably capture the vision-language connections under small-sample condition. The temporal graph provides useful supplement over previous image-based approaches, allowing to reason the activities from the temporal evolution of visual features and the dynamic movement of spatial locations. The detail enhancement helps to capture the discriminative features among different objects, with which the subsequent captioning module can yield more informative and precise descriptions. Thereafter, we construct a new dataset, providing consistent object-sentence pairs, to facilitate effective cross-modal learning. To demonstrate the effectiveness, we conduct experiments on the new dataset and compare it with the state-of-the-art video captioning methods. From the experimental results, the OVC-Net exhibits the ability of precisely describing the concurrent objects, and achieves the state-of-the-art performance.	翻訳日:2022-12-25 14:25:05 公開日:2020-07-14
# 階層型運動型メッシュリカバリ Hierarchical Kinematic Human Mesh Recovery ( http://arxiv.org/abs/2003.04232v2 ) ライセンス: Link先を確認	Georgios Georgakis, Ren Li, Srikrishna Karanam, Terrence Chen, Jana Kosecka, Ziyan Wu	(参考訳) 一つの画像から3次元メッシュのパラメトリックモデルを推定する問題を考察する。モデルパラメータの直接回帰によるこの分野の最近の進歩は大きいが、これらの手法は人体の運動的構造を暗黙的に活用するだけであり、それ以前のモデルの最適使用へと繋がる。本研究では,このギャップに対処すべく,既知の階層構造によって,モデルの相互依存性を含む,人間のパラメトリックモデルの回帰のための新しい手法を提案する。これにより、regressorアーキテクチャの事前インフォームド設計と、現在の3dヒューマンメッシュリカバリの標準フレームワークと連携して使用するためのフレキシブルな階層最適化が実現されている。これらの側面を、標準ベンチマークデータセットに関する広範な実験によって実証し、提案した新しい設計が、既存および一般的ないくつかの手法より優れており、新しい最先端の結果が確立されていることを示す。本手法は, データの破損下においても接合部を推定する機能を備えており, 閉塞度の異なる実験を行うことで実証する。 We consider the problem of estimating a parametric model of 3D human mesh from a single image. While there has been substantial recent progress in this area with direct regression of model parameters, these methods only implicitly exploit the human body kinematic structure, leading to sub-optimal use of the model prior. In this work, we address this gap by proposing a new technique for regression of human parametric model that is explicitly informed by the known hierarchical structure, including joint interdependencies of the model. This results in a strong prior-informed design of the regressor architecture and an associated hierarchical optimization that is flexible to be used in conjunction with the current standard frameworks for 3D human mesh recovery. We demonstrate these aspects by means of extensive experiments on standard benchmark datasets, showing how our proposed new design outperforms several existing and popular methods, establishing new state-of-the-art results. By considering joint interdependencies, our method is equipped to infer joints even under data corruptions, which we demonstrate by conducting experiments under varying degrees of occlusion.	翻訳日:2022-12-25 08:42:55 公開日:2020-07-14
# giqa: 生成した画像品質評価 GIQA: Generated Image Quality Assessment ( http://arxiv.org/abs/2003.08932v3 ) ライセンス: Link先を確認	Shuyang Gu, Jianmin Bao, Dong Chen, Fang Wen	(参考訳) 現在、GAN(Generative Adversarial Network)は素晴らしい成果を上げているが、すべての生成画像が完璧ではない。最近、生成モデルにいくつかの量的基準が現れたが、いずれも単一の生成画像のために設計されていない。本稿では,各画像の品質を定量的に評価するgiqa(generate image quality assessment)という新たな研究テーマを提案する。学習ベースとデータベースという2つの観点からGIQAアルゴリズムを導入する。我々は、様々なデータセット上で様々なGANモデルによって生成された多数の画像を評価し、それらが人間の評価と一致していることを示す。さらに、GIQAは、生成モデルのリアリズムと多様性を別々に評価し、GANのトレーニングにおいてオンラインのハードネガティブマイニング(OHEM)を可能にするなど、多くのアプリケーションで利用することができる。 Generative adversarial networks (GANs) have achieved impressive results today, but not all generated images are perfect. A number of quantitative criteria have recently emerged for generative model, but none of them are designed for a single generated image. In this paper, we propose a new research topic, Generated Image Quality Assessment (GIQA), which quantitatively evaluates the quality of each generated image. We introduce three GIQA algorithms from two perspectives: learning-based and data-based. We evaluate a number of images generated by various recent GAN models on different datasets and demonstrate that they are consistent with human assessments. Furthermore, GIQA is available to many applications, like separately evaluating the realism and diversity of generative models, and enabling online hard negative mining (OHEM) in the training of GANs to improve the results.	翻訳日:2022-12-22 04:51:09 公開日:2020-07-14
# 自然言語推論モデルはIMPPRESsiveか? 学習のインプリケーションと前提 Are Natural Language Inference Models IMPPRESsive? Learning IMPlicature and PRESupposition ( http://arxiv.org/abs/2004.03066v2 ) ライセンス: Link先を確認	Paloma Jeretic, Alex Warstadt, Suvrat Bhooshan, Adina Williams	(参考訳) 自然言語推論(NLI)は、自然言語理解にとってますます重要なタスクであり、ある文が他の文に関係しているかどうかを推測する必要がある。しかし、nliモデルが実用的推論を行う能力は未定である。そこで我々は,約25kの半自動生成文対からなるIMPPRES(IMPlicature and PreSupposition diagnosis dataset)を作成した。我々は、multiNLI(Williams et al., 2018)でトレーニングされたBERT、InferSent、BOW NLIモデルが実用的推論を学習するかどうかを評価するためにIMPPRESを使用する。 multinli はこれらの推論型を示すペアをほとんど含まないように見えるが、bert が実用的推論を描くことを学ぶ。によって引き起こされるスカラーの模倣を包含物として確実に扱う。のみ"のような前提条件のトリガに対して、BERTは、否定のような追加条件のキャンセル演算子の下にトリガーが埋め込まれた場合でも、前提条件をエンセレーションとして確実に認識する。 BOWとInferSentは、実用的な推論の弱い証拠を示している。 nliトレーニングはモデルに実用的推論を学ぶことを奨励するものだと結論付けています。 Natural language inference (NLI) is an increasingly important task for natural language understanding, which requires one to infer whether a sentence entails another. However, the ability of NLI models to make pragmatic inferences remains understudied. We create an IMPlicature and PRESupposition diagnostic dataset (IMPPRES), consisting of >25k semiautomatically generated sentence pairs illustrating well-studied pragmatic inference types. We use IMPPRES to evaluate whether BERT, InferSent, and BOW NLI models trained on MultiNLI (Williams et al., 2018) learn to make pragmatic inferences. Although MultiNLI appears to contain very few pairs illustrating these inference types, we find that BERT learns to draw pragmatic inferences. It reliably treats scalar implicatures triggered by "some" as entailments. For some presupposition triggers like "only", BERT reliably recognizes the presupposition as an entailment, even when the trigger is embedded under an entailment canceling operator like negation. BOW and InferSent show weaker evidence of pragmatic reasoning. We conclude that NLI training encourages models to learn some, but not all, pragmatic inferences.	翻訳日:2022-12-15 23:39:34 公開日:2020-07-14
# 球面画像の高精度セグメンテーションのための一般化された最短パスベーススーパーピクセル Generalized Shortest Path-based Superpixels for Accurate Segmentation of Spherical Images ( http://arxiv.org/abs/2004.07394v3 ) ライセンス: Link先を確認	R\'emi Giraud, Rodrigo Borba Pinheiro, Yannick Berthoumieu	(参考訳) 既存のスーパーピクセル法のほとんどは、コンピュータビジョンパイプラインのプリプロセスとして標準平面画像を分割するために設計されている。それでも、主に360{\deg}球面画像を生成する広角キャプチャデバイスに基づくアプリケーションの増加は、専用のスーパーピクセルアプローチの必要性を強制している。本稿では,SphSPS(Spherical Shortest Path-based Superpixels)と呼ばれる,球面画像の新しいスーパーピクセル法を提案する。我々のアプローチは球面幾何学を尊重し、3次元球面取得空間上の画素とスーパーピクセル中心の間の最短経路の概念を一般化する。このようなパスの特徴情報をクラスタリングフレームワークに効率的に統合でき、オブジェクトの輪郭や形状の規則性に関して共同で改善できることを示します。球面空間におけるこの最後の側面を相対的に評価するために、平面大域正規度計量も一般化する。最後に,提案手法は,360{\deg}球面パノラマセグメンテーションデータセットにおける平面および近年の球面スーパーピクセルアプローチよりもかなり優れた性能が得られることを示す。 Most of existing superpixel methods are designed to segment standard planar images as pre-processing for computer vision pipelines. Nevertheless, the increasing number of applications based on wide angle capture devices, mainly generating 360{\deg} spherical images, have enforced the need for dedicated superpixel approaches. In this paper, we introduce a new superpixel method for spherical images called SphSPS (for Spherical Shortest Path-based Superpixels). Our approach respects the spherical geometry and generalizes the notion of shortest path between a pixel and a superpixel center on the 3D spherical acquisition space. We show that the feature information on such path can be efficiently integrated into our clustering framework and jointly improves the respect of object contours and the shape regularity. To relevantly evaluate this last aspect in the spherical space, we also generalize a planar global regularity metric. Finally, the proposed SphSPS method obtains significantly better performance than both planar and recent spherical superpixel approaches on the reference 360{\deg} spherical panorama segmentation dataset.	翻訳日:2022-12-13 03:48:44 公開日:2020-07-14
# PolyLaneNet: 深い多項式回帰によるレーン推定 PolyLaneNet: Lane Estimation via Deep Polynomial Regression ( http://arxiv.org/abs/2004.10924v2 ) ライセンス: Link先を確認	Lucas Tabelini, Rodrigo Berriel, Thiago M. Paix\~ao, Claudine Badue, Alberto F. De Souza and Thiago Oliveira-Santos	(参考訳) 自動運転の大きな進歩に貢献した主な要因の1つは、ディープラーニングの出現である。安全な自動運転車にとって、まだ完全に解決されていない問題の1つは車線検出だ。このタスクのメソッドはリアルタイム(+30fps)で動作する必要があるため、効果的(すなわち高い精度)でなければならないだけでなく、効率的(すなわち高速)でなければならない。本研究では,車両に搭載された前方カメラからのイメージを入力として用いて,画像中の各レーンマーキングを表す多項式を深い多項式回帰により出力するレーン検出手法を提案する。提案手法は,tusimpleデータセットの効率(115fps)を維持しつつ,既存の最先端手法と競合することが示されている。さらに、さらに2つの公開データセットに関する広範な質的結果と、近年のレーン検出で使用された評価指標の制限が提示されている。最後に、私たちはソースコードとトレーニングされたモデルを提供して、他の人が本論文で示したすべての結果を再現できるようにします。ソースコードと事前訓練済みのモデルはhttps://github.com/lucastabelini/PolyLaneNet.comで入手できる。 One of the main factors that contributed to the large advances in autonomous driving is the advent of deep learning. For safer self-driving vehicles, one of the problems that has yet to be solved completely is lane detection. Since methods for this task have to work in real-time (+30 FPS), they not only have to be effective (i.e., have high accuracy) but they also have to be efficient (i.e., fast). In this work, we present a novel method for lane detection that uses as input an image from a forward-looking camera mounted in the vehicle and outputs polynomials representing each lane marking in the image, via deep polynomial regression. The proposed method is shown to be competitive with existing state-of-the-art methods in the TuSimple dataset while maintaining its efficiency (115 FPS). Additionally, extensive qualitative results on two additional public datasets are presented, alongside with limitations in the evaluation metrics used by recent works for lane detection. Finally, we provide source code and trained models that allow others to replicate all the results shown in this paper, which is surprisingly rare in state-of-the-art lane detection methods. The full source code and pretrained models are available at https://github.com/lucastabelini/PolyLaneNet.	翻訳日:2022-12-10 09:48:50 公開日:2020-07-14
# ミニマックスレイテンシを用いたマルチロボットパトロールスケジューリングの近似アルゴリズム Approximation Algorithms for Multi-Robot Patrol-Scheduling with Min-Max Latency ( http://arxiv.org/abs/2005.02530v3 ) ライセンス: Link先を確認	Peyman Afshani, Mark De Berg, Kevin Buchin, Jie Gao, Maarten Loffler, Amir Nayyeri, Benjamin Raichel, Rik Sarkar, Haotian Wang, Hao-Tsung Yang	(参考訳) 我々は、メートル法空間内の所定の$n$のサイトを訪れるために、$k$ロボットのパトロールスケジュールを見つける問題を考える。各ロボットは、同じ最大速度を持ち、そのサイトの連続訪問間の最大時間と定義されている任意のサイトの重み付けされた最大レイテンシを最小限にすることを目的としている。問題はnp-hardであり、旅行セールスマンの問題を特別なケースとして抱えている($k=1$、すべてのサイトが同じ重量を持つ場合)。近似係数が o(k^2 \log \frac{w_{\max}}{w_{\min}}) の多項式時間アルゴリズムを最適解に与え、そこではそれぞれ $w_{\max}$ と $w_{\min}$ が各サイトの最大重みと最小重みである。さらに, サイトが1次元の特別な場合についても検討する。すべてのサイトが同じ重みを持つ場合、問題を正確に解くために多項式時間アルゴリズムを示す。サイトが異なる重みを持つ場合、12ドル程度の解が提示され、ロボットの数であるk$が一定である場合、多項式時間で実行される。 We consider the problem of finding patrol schedules for $k$ robots to visit a given set of $n$ sites in a metric space. Each robot has the same maximum speed and the goal is to minimize the weighted maximum latency of any site, where the latency of a site is defined as the maximum time duration between consecutive visits of that site. The problem is NP-hard, as it has the traveling salesman problem as a special case (when $k=1$ and all sites have the same weight). We present a polynomial-time algorithm with an approximation factor of $O(k^2 \log \frac{w_{\max}}{w_{\min}})$ to the optimal solution, where $w_{\max}$ and $w_{\min}$ are the maximum and minimum weight of the sites respectively. Further, we consider the special case where the sites are in 1D. When all sites have the same weight, we present a polynomial-time algorithm to solve the problem exactly. If the sites may have different weights, we present a $12$-approximate solution, which runs in polynomial time when the number of robots, $k$, is a constant.	翻訳日:2022-12-06 14:33:53 公開日:2020-07-14
# ウェーブレット統合CNNによるノイズ・ロバスト画像分類 Wavelet Integrated CNNs for Noise-Robust Image Classification ( http://arxiv.org/abs/2005.03337v2 ) ライセンス: Link先を確認	Qiufu Li, Linlin Shen, Sheng Guo, Zhihui Lai	(参考訳) 畳み込みニューラルネットワーク(CNN)は、一般的にノイズの中断、すなわち小さな画像ノイズが出力に劇的な変化を引き起こす。最終述語に対するノイズ効果を抑制するために,max-pooling, strided-convolution, average-poolingを離散ウェーブレット変換(dwt)に置き換え,cnnの強化を行う。本稿では,haar,daubechies,cohenなど様々なウェーブレットに適用可能な一般dwtおよび逆dwt(idwt)層と,これらの層を画像分類に用いるウェーブレット統合cnn(wavecnets)の設計について述べる。ウェーブネットでは、ダウンサンプリング中に特徴マップを低周波および高周波成分に分解する。低周波成分は、基本オブジェクト構造を含む主情報を格納し、後続の層に送信してロバストな高レベル特徴を抽出する。データノイズのほとんどを含む高周波成分を推論中に落とし、ウェーブネットのノイズロバスト性を改善する。 imagenet と imagenet-c (imagenet のノイズバージョン) による実験の結果,wavecnets は vgg, resnet, densenet の統合バージョンであり,バニラ版よりも高い精度とノイズロバスト性を実現していることがわかった。 Convolutional Neural Networks (CNNs) are generally prone to noise interruptions, i.e., small image noise can cause drastic changes in the output. To suppress the noise effect to the final predication, we enhance CNNs by replacing max-pooling, strided-convolution, and average-pooling with Discrete Wavelet Transform (DWT). We present general DWT and Inverse DWT (IDWT) layers applicable to various wavelets like Haar, Daubechies, and Cohen, etc., and design wavelet integrated CNNs (WaveCNets) using these layers for image classification. In WaveCNets, feature maps are decomposed into the low-frequency and high-frequency components during the down-sampling. The low-frequency component stores main information including the basic object structures, which is transmitted into the subsequent layers to extract robust high-level features. The high-frequency components, containing most of the data noise, are dropped during inference to improve the noise-robustness of the WaveCNets. Our experimental results on ImageNet and ImageNet-C (the noisy version of ImageNet) show that WaveCNets, the wavelet integrated versions of VGG, ResNets, and DenseNet, achieve higher accuracy and better noise-robustness than their vanilla versions.	翻訳日:2022-12-05 23:33:33 公開日:2020-07-14
# 反復的信頼フィードバックとガイドアップサンプリングによる高分解能イメージパインティング High-Resolution Image Inpainting with Iterative Confidence Feedback and Guided Upsampling ( http://arxiv.org/abs/2005.11742v2 ) ライセンス: Link先を確認	Yu Zeng, Zhe Lin, Jimei Yang, Jianming Zhang, Eli Shechtman, Huchuan Lu	(参考訳) 既存の画像塗装法は、実アプリケーションで大きな穴を扱う際に、しばしばアーティファクトを生成する。この課題に対処するため,フィードバック機構を備えた反復的塗装法を提案する。具体的には, 暗黙の結果だけでなく, 対応する信頼度マップも出力する深層生成モデルを導入する。このマップをフィードバックとして使用すると、各イテレーションでホール内の高信頼画素のみを信頼して、次のイテレーションで残るピクセルにフォーカスすることで、徐々に穴を埋める。前回のイテレーションからの部分的な予測を既知のピクセルとして再利用することで、このプロセスは徐々に結果を改善する。また,高分解能インペインティング結果を生成するための誘導型アップサンプリングネットワークを提案する。我々は、Contextual Attentionモジュールを拡張して、入力画像の高解像度な特徴パッチを借用する。さらに,実際のオブジェクト除去シナリオを模倣するために,大規模なオブジェクトマスクデータセットを収集し,ユーザ入力をシミュレートするより現実的なトレーニングデータを合成する。実験により,本手法は定量評価と定性評価の両方において既存手法よりも有意に優れていた。さらなる結果とWeb APPはhttps://zengxianyu.github.io/iic.comで入手できる。 Existing image inpainting methods often produce artifacts when dealing with large holes in real applications. To address this challenge, we propose an iterative inpainting method with a feedback mechanism. Specifically, we introduce a deep generative model which not only outputs an inpainting result but also a corresponding confidence map. Using this map as feedback, it progressively fills the hole by trusting only high-confidence pixels inside the hole at each iteration and focuses on the remaining pixels in the next iteration. As it reuses partial predictions from the previous iterations as known pixels, this process gradually improves the result. In addition, we propose a guided upsampling network to enable generation of high-resolution inpainting results. We achieve this by extending the Contextual Attention module to borrow high-resolution feature patches in the input image. Furthermore, to mimic real object removal scenarios, we collect a large object mask dataset and synthesize more realistic training data that better simulates user inputs. Experiments show that our method significantly outperforms existing methods in both quantitative and qualitative evaluations. More results and Web APP are available at https://zengxianyu.github.io/iic.	翻訳日:2022-11-29 14:00:44 公開日:2020-07-14
# M2Net:脳腫瘍患者の生存時間予測のためのマルチモーダルマルチチャネルネットワーク M2Net: Multi-modal Multi-channel Network for Overall Survival Time Prediction of Brain Tumor Patients ( http://arxiv.org/abs/2006.10135v2 ) ライセンス: Link先を確認	Tao Zhou, Huazhu Fu, Yu Zhang, Changqing Zhang, Xiankai Lu, Jianbing Shen, and Ling Shao	(参考訳) 総生存時間(os)の早期かつ正確な予測は、脳腫瘍患者のより良い治療計画を得るのに役立つ。多くのOS時間予測手法が開発され、有望な結果が得られたが、まだいくつか問題がある。第1に、従来の予測方法は、磁気共鳴(mr)ボリュームの局所病変領域における放射線学的特徴に依存しており、完全な画像や複雑な腫瘍のモデルを表すものではない。第二に、異なるタイプのスキャナー(つまりマルチモーダルデータ)は異なる脳領域に敏感であり、複数のモーダルにまたがる補完情報を効果的に活用し、モダリティ固有の特性を維持することが困難である。第三に、既存の手法は予測モデルに焦点を合わせ、複雑なデータ-ラベル関係を無視している。上記の問題に対処するため,マルチモーダルマルチチャネルネットワーク (M2Net) のエンドツーエンドOS時間予測モデルを提案する。具体的には、まず3dmrボリュームを異なる方向の2d画像に投影し、計算コストを低減し、重要な情報を保存し、事前学習したモデルを他のタスクから転送できるようにする。次に,モダリティ特有のネットワークを用いて,mrスキャンから暗黙的かつ高レベルな特徴を抽出する。マルチモーダル共有ネットワークは、これらの機能をバイリニアプーリングモデルを用いて融合させ、それらの相関を利用して補完情報を提供する。最後に、各モダリティ固有ネットワークとマルチモーダル共有ネットワークからの出力を統合し、最終的な予測結果を生成する。 M2Netモデルが他の手法よりも優れていることを示す実験結果を得た。 Early and accurate prediction of overall survival (OS) time can help to obtain better treatment planning for brain tumor patients. Although many OS time prediction methods have been developed and obtain promising results, there are still several issues. First, conventional prediction methods rely on radiomic features at the local lesion area of a magnetic resonance (MR) volume, which may not represent the full image or model complex tumor patterns. Second, different types of scanners (i.e., multi-modal data) are sensitive to different brain regions, which makes it challenging to effectively exploit the complementary information across multiple modalities and also preserve the modality-specific properties. Third, existing methods focus on prediction models, ignoring complex data-to-label relationships. To address the above issues, we propose an end-to-end OS time prediction model; namely, Multi-modal Multi-channel Network (M2Net). Specifically, we first project the 3D MR volume onto 2D images in different directions, which reduces computational costs, while preserving important information and enabling pre-trained models to be transferred from other tasks. Then, we use a modality-specific network to extract implicit and high-level features from different MR scans. A multi-modal shared network is built to fuse these features using a bilinear pooling model, exploiting their correlations to provide complementary information. Finally, we integrate the outputs from each modality-specific network and the multi-modal shared network to generate the final prediction result. Experimental results demonstrate the superiority of our M2Net model over other methods.	翻訳日:2022-11-26 06:58:50 公開日:2020-07-14
# 人間行動を持つ超人的AI:モデルシステムとしてのチェス Aligning Superhuman AI with Human Behavior: Chess as a Model System ( http://arxiv.org/abs/2006.01855v3 ) ライセンス: Link先を確認	Reid McIlroy-Young and Siddhartha Sen and Jon Kleinberg and Ashton Anderson	(参考訳) 人工知能がますます知性が高まる中、スーパーヒューマンのパフォーマンスを達成することは、人間がアルゴリズムから学び、協力する可能性を高めている。しかし、AIシステムが問題にアプローチする方法は、人々が行う方法としばしば異なるため、解釈不能であり、そこから学ぶのが困難である。人間と人工知能の間のこのギャップを埋める上で重要なステップは、人間の行動を構成する粒度の大きいアクションをモデリングすることだ。我々は、人工知能の長い歴史を持つモデルシステム、チェスでこの目標を追求する。チェス選手の総合的なパフォーマンスは、ゲーム中に決定をするときに展開します。あらゆるスキルレベルでプレイヤーがオンラインでプレイする数億のゲームは、これらの決定とその正確な文脈が詳細に記録される豊富なデータソースを形成する。 AlphaZeroのオープンソース実装を含む既存のチェスエンジンをこのデータに適用すると、人間の動きをうまく予測できないことが分かる。人間のチェスゲームで訓練されたAlpha-Zeroのカスタマイズ版であるMaiaを開発し,既存のエンジンよりもはるかに高い精度で人間の動きを予測し,特定のスキルレベルでのプレイヤーによる決定を調整可能な方法で予測する際の最大精度を実現する。人間が次の動きで大きな間違いを犯すかどうかを予測する2つのタスクに対して、我々は、競争ベースラインを大幅に上回るディープニューラルネットワークを開発する。その結果,まず人間の意思決定を正確にモデル化することで,人間のコラボレーションを念頭に置いて人工知能システムを設計することができる可能性が示唆された。 As artificial intelligence becomes increasingly intelligent---in some cases, achieving superhuman performance---there is growing potential for humans to learn from and collaborate with algorithms. However, the ways in which AI systems approach problems are often different from the ways people do, and thus may be uninterpretable and hard to learn from. A crucial step in bridging this gap between human and artificial intelligence is modeling the granular actions that constitute human behavior, rather than simply matching aggregate human performance. We pursue this goal in a model system with a long history in artificial intelligence: chess. The aggregate performance of a chess player unfolds as they make decisions over the course of a game. The hundreds of millions of games played online by players at every skill level form a rich source of data in which these decisions, and their exact context, are recorded in minute detail. Applying existing chess engines to this data, including an open-source implementation of AlphaZero, we find that they do not predict human moves well. We develop and introduce Maia, a customized version of Alpha-Zero trained on human chess games, that predicts human moves at a much higher accuracy than existing engines, and can achieve maximum accuracy when predicting decisions made by players at a specific skill level in a tuneable way. For a dual task of predicting whether a human will make a large mistake on the next move, we develop a deep neural network that significantly outperforms competitive baselines. Taken together, our results suggest that there is substantial promise in designing artificial intelligence systems with human collaboration in mind by first accurately modeling granular human decision-making.	翻訳日:2022-11-25 23:43:39 公開日:2020-07-14
# 参照誘導顔成分編集 Reference-guided Face Component Editing ( http://arxiv.org/abs/2006.02051v2 ) ライセンス: Link先を確認	Qiyao Deng, Jie Cao, Yunfan Liu, Zhenhua Chai, Qi Li and Zhenan Sun	(参考訳) 近年,顔画像の編集は大きな進歩を遂げている。しかし以前の方法も 1)顔の特徴を事前に定義し、高レベルの顔成分(目、鼻、口など)の形状を制御する柔軟性に欠ける。 2)手作業で編集したマスクやスケッチをオブザーバブルな変更の中間表現として取り出すが、このような追加入力は通常、追加の労力を要する。既存の手法の限界(形状、マスク、スケッチなど)を断ち切るため、幾何学的変化を伴う多様かつ制御可能な顔コンポーネント編集のための r-FACE (Reference-guided FAce Component Editing) と呼ばれる新しいフレームワークを提案する。特に、r-faceは、顔成分の形状を制御するための条件として参照画像を利用するバックボーンとしてイメージインペインティングモデルを取る。フレームワークが対象の顔コンポーネントに集中するよう促すために、サンプルガイドアテンションモジュールは、参照画像から抽出された注意特徴と対象顔コンポーネント特徴とを融合するように設計されている。実験的な検証と比較を通じて,提案手法の有効性を検証した。 Face portrait editing has achieved great progress in recent years. However, previous methods either 1) operate on pre-defined face attributes, lacking the flexibility of controlling shapes of high-level semantic facial components (e.g., eyes, nose, mouth), or 2) take manually edited mask or sketch as an intermediate representation for observable changes, but such additional input usually requires extra efforts to obtain. To break the limitations (e.g. shape, mask or sketch) of the existing methods, we propose a novel framework termed r-FACE (Reference-guided FAce Component Editing) for diverse and controllable face component editing with geometric changes. Specifically, r-FACE takes an image inpainting model as the backbone, utilizing reference images as conditions for controlling the shape of face components. In order to encourage the framework to concentrate on the target face components, an example-guided attention module is designed to fuse attention features and the target face component features extracted from the reference image. Through extensive experimental validation and comparisons, we verify the effectiveness of the proposed framework.	翻訳日:2022-11-25 18:13:35 公開日:2020-07-14
# 深層学習時代の創発的マルチエージェントコミュニケーション Emergent Multi-Agent Communication in the Deep Learning Era ( http://arxiv.org/abs/2006.02419v2 ) ライセンス: Link先を確認	Angeliki Lazaridou, Marco Baroni	(参考訳) 言語を通して協力する能力は、人間の定義的な特徴である。深層人工ネットワークの知覚、運動、計画能力が増大するにつれて、研究者は相互作用するための共有言語の開発も可能であるかどうかの研究を行っている。科学的観点から、深いエージェントのコミュニティで言語が進化する条件を理解することは、人間の言語進化に光を当てることができる。応用の観点からは、ディープネットワークに相互通信によって対話的に問題を解決する能力を持たせることで、日々の生活においてより柔軟で有用なものにすることができる。本稿では,この2つの角度から最近の言語出現研究について概説する。 The ability to cooperate through language is a defining feature of humans. As the perceptual, motory and planning capabilities of deep artificial networks increase, researchers are studying whether they also can develop a shared language to interact. From a scientific perspective, understanding the conditions under which language evolves in communities of deep agents and its emergent features can shed light on human language evolution. From an applied perspective, endowing deep networks with the ability to solve problems interactively by communicating with each other and with us should make them more flexible and useful in everyday life. This article surveys representative recent language emergence studies from both of these two angles.	翻訳日:2022-11-25 17:19:15 公開日:2020-07-14
# Cascaded Opponent Filter Network を用いた視覚誘導音源分離 Visually Guided Sound Source Separation using Cascaded Opponent Filter Network ( http://arxiv.org/abs/2006.03028v2 ) ライセンス: Link先を確認	Lingyu Zhu, Esa Rahtu	(参考訳) 本研究の目的は、音源の視覚的な手がかりの助けを借りて、混合音声から元の成分信号を回収することである。このようなタスクは通常、視覚的に誘導された音源分離と呼ばれる。提案するCascaded Opponent Filter (COF) フレームワークは複数のステージで構成され,ソース分離を再帰的に洗練する。 COFのキー要素は、ソース間の残留成分を識別し、再配置する新しい反対フィルタモジュールである。本システムでは,映像フレーム,光学フロー,動画像,それらの組み合わせに基づいて,映像の出現と動きをガイドし,様々な表現について検討する。最後に, cofと共に音源位置の画素レベルマスクを生成する, 音源位置マスキング(sslm)手法を提案する。システム全体は、大量のビデオを使ってエンドツーエンドに訓練されている。我々はCOFを最近のベースラインと比較し、3つの挑戦的データセット(MUSIC、A-MUSIC、A-NATURAL)で最先端のパフォーマンスを得る。プロジェクトページ: https://ly-zhu.github.io/cof-net The objective of this paper is to recover the original component signals from a mixture audio with the aid of visual cues of the sound sources. Such task is usually referred as visually guided sound source separation. The proposed Cascaded Opponent Filter (COF) framework consists of multiple stages, which recursively refine the source separation. A key element in COF is a novel opponent filter module that identifies and relocates residual components between sources. The system is guided by the appearance and motion of the source, and, for this purpose, we study different representations based on video frames, optical flows, dynamic images, and their combinations. Finally, we propose a Sound Source Location Masking (SSLM) technique, which, together with COF, produces a pixel level mask of the source location. The entire system is trained end-to-end using a large set of unlabelled videos. We compare COF with recent baselines and obtain the state-of-the-art performance in three challenging datasets (MUSIC, A-MUSIC, and A-NATURAL). Project page: https://ly-zhu.github.io/cof-net.	翻訳日:2022-11-25 12:43:18 公開日:2020-07-14
# 不均一自律システムを用いた教師なし異常検出 Unsupervised Abnormality Detection Using Heterogeneous Autonomous Systems ( http://arxiv.org/abs/2006.03733v2 ) ライセンス: Link先を確認	Sayeed Shafayet Chowdhury, Kazi Mejbaul Islam and Rouhan Noor	(参考訳) 監視シナリオにおける異常検出(ad)は、新しくて困難な研究分野である。ドローンや自動車のような自動運転車にとって、通常の状態と異常状態をリアルタイムで区別することが極めて重要である。さらに、いかなるデバイス故障も検出する必要があります。しかし、その性質や異常の程度は実際の環境や逆境によって異なる可能性がある。結果として、すべてのケースa-prioriをモデル化し、教師付きメソッドを使用して分類することは非現実的である。また、自動運転車は、画像やその他のアナログまたはデジタルセンサーデータなどのさまざまなデータタイプを提供しており、これら全てを実効的に活用すれば、異常検出に役立てることができる。そこで本研究では,無人監視ドローンの異常度を推定し,リアルタイム画像とIMUセンサデータを教師なしで解析する異種システムを提案する。本稿では,convolutional neural network (cnn) アーキテクチャを実演し,通常の画像と検討中の画像との角度を推定し,デバイス異常の計測を行う。さらに、IMUデータはオートエンコーダで異常を予測するために使用される。最後に、これら2つのアルゴリズムの結果をアンサンブルして、最終異常度を推定する。提案手法は, IEEE SP Cup-2020データセットで97.3%の精度で良好に動作する。さらに、このアプローチを社内データセットでテストして、堅牢性を確認しました。 Anomaly detection (AD) in a surveillance scenario is an emerging and challenging field of research. For autonomous vehicles like drones or cars, it is immensely important to distinguish between normal and abnormal states in real-time. Additionally, we also need to detect any device malfunction. But the nature and degree of abnormality may vary depending upon the actual environment and adversary. As a result, it is impractical to model all cases a-priori and use supervised methods to classify. Also, an autonomous vehicle provides various data types like images and other analog or digital sensor data, all of which can be useful in anomaly detection if leveraged fruitfully. To that effect, in this paper, a heterogeneous system is proposed which estimates the degree of abnormality of an unmanned surveillance drone, analyzing real-time image and IMU (Inertial Measurement Unit) sensor data in an unsupervised manner. Here, we have demonstrated a Convolutional Neural Network (CNN) architecture, named AngleNet to estimate the angle between a normal image and another image under consideration, which provides us with a measure of anomaly of the device. Moreover, the IMU data are used in autoencoder to predict abnormality. Finally, the results from these two algorithms are ensembled to estimate the final degree of abnormality. The proposed method performs satisfactorily on the IEEE SP Cup-2020 dataset with an accuracy of 97.3%. Additionally, we have also tested this approach on an in-house dataset to validate its robustness.	翻訳日:2022-11-25 03:42:57 公開日:2020-07-14
# sigmorphon 2020 タスク0: タイプ論的に多様な形態変化 SIGMORPHON 2020 Shared Task 0: Typologically Diverse Morphological Inflection ( http://arxiv.org/abs/2006.11572v2 ) ライセンス: Link先を確認	Ekaterina Vylomova, Jennifer White, Elizabeth Salesky, Sabrina J. Mielke, Shijie Wu, Edoardo Ponti, Rowan Hall Maudslay, Ran Zmigrod, Josef Valvoda, Svetlana Toldova, Francis Tyers, Elena Klyachko, Ilya Yegorov, Natalia Krizhanovsky, Paula Czarnowska, Irene Nikkarinen, Andrew Krizhanovsky, Tiago Pimentel, Lucas Torroba Hennigen, Christo Kirov, Garrett Nicolai, Adina Williams, Antonios Anastasopoulos, Hilaria Cruz, Eleanor Chodroff, Ryan Cotterell, Miikka Silfverberg, Mans Hulden	(参考訳) 自然言語処理(nlp)の幅広い目標は、任意の自然言語を処理する能力を持つシステムを開発することである。しかし、ほとんどのシステムは英語のような1つの言語からのデータを使って開発されている。 sigmorphon 2020では、形態学的再帰に関する共通タスクが、タイプ論的に異なる言語を一般化するシステムの能力を調査することを目的としている。システムは45言語と5つの言語ファミリーのデータを使用して開発され、追加の45言語と10の言語ファミリー(合計13言語)のデータで微調整され、90言語すべてで評価された。タスクには10チームから合計22のシステム(19のニューラル)が提出された。 4つの勝利システムはすべてニューラルネットワーク(単言語トランスフォーマー2台と多言語rnnベースのモデル2台)であった。ほとんどのチームは、低リソース言語のためのデータ幻覚と拡張、アンサンブル、多言語トレーニングの有用性を示しています。非神経学習者や手動で設計した文法は、Ingrian, Tajik, Tagalog, Zarma, Lingalaなど一部の言語で特に限られたデータで、競争力があり、優れた性能を示した。一部の言語ファミリー(afro-asiatic、niger-congo、turkic)は、ほとんどのシステムで比較的簡単であり、90%以上の精度を達成したが、他の言語はより困難であった。 A broad goal in natural language processing (NLP) is to develop a system that has the capacity to process any natural language. Most systems, however, are developed using data from just one language such as English. The SIGMORPHON 2020 shared task on morphological reinflection aims to investigate systems' ability to generalize across typologically distinct languages, many of which are low resource. Systems were developed using data from 45 languages and just 5 language families, fine-tuned with data from an additional 45 languages and 10 language families (13 in total), and evaluated on all 90 languages. A total of 22 systems (19 neural) from 10 teams were submitted to the task. All four winning systems were neural (two monolingual transformers and two massively multilingual RNN-based models with gated attention). Most teams demonstrate utility of data hallucination and augmentation, ensembles, and multilingual training for low-resource languages. Non-neural learners and manually designed grammars showed competitive and even superior performance on some languages (such as Ingrian, Tajik, Tagalog, Zarma, Lingala), especially with very limited data. Some language families (Afro-Asiatic, Niger-Congo, Turkic) were relatively easy for most systems and achieved over 90% mean accuracy while others were more challenging.	翻訳日:2022-11-18 22:47:08 公開日:2020-07-14
# セルフプレイによる近最適強化学習 Near-Optimal Reinforcement Learning with Self-Play ( http://arxiv.org/abs/2006.12007v2 ) ライセンス: Link先を確認	Yu Bai, Chi Jin, Tiancheng Yu	(参考訳) 本稿では,2プレイヤーゼロサムゲームにおける強化学習のための最適アルゴリズムの設計問題について考察する。我々は,直接の監督なしに自己対決で最適な政策を学ぶセルフプレイアルゴリズムに焦点を当てる。 s$状態、$a$ max-playerアクション、$b$ min-playerアクションを持つ表型エピソディックマルコフゲームでは、近似ナッシュ均衡を見つけるための最良の既存のアルゴリズムは、$(s,a,b)$にのみ依存を強調するとき、ゲームプレイのステップである$\tilde{\mathcal{o}}(s^2ab)$を必要とする。対照的に、最も高い既存の下界スケールは$\Omega(S(A+B))$で、上界と大きな差がある。本稿では, サンプル複雑性を$\tilde{\mathcal{O}}(SAB)$, サンプル複雑性を$\tilde{\mathcal{O}}(S(A+B)$とする新しい \emph{Nash V-learning} アルゴリズムの楽観的な変種を提案する。後者の結果は、各エピソードの長さの多項式係数を除く全ての問題依存パラメータにおける情報理論の下限と一致する。さらに,マルコフゲームにおける固定対戦相手に対する最善の応答を学習する計算の難易度を,nash平衡を求めることとは異なる学習目標として提示する。 This paper considers the problem of designing optimal algorithms for reinforcement learning in two-player zero-sum games. We focus on self-play algorithms which learn the optimal policy by playing against itself without any direct supervision. In a tabular episodic Markov game with $S$ states, $A$ max-player actions and $B$ min-player actions, the best existing algorithm for finding an approximate Nash equilibrium requires $\tilde{\mathcal{O}}(S^2AB)$ steps of game playing, when only highlighting the dependency on $(S,A,B)$. In contrast, the best existing lower bound scales as $\Omega(S(A+B))$ and has a significant gap from the upper bound. This paper closes this gap for the first time: we propose an optimistic variant of the \emph{Nash Q-learning} algorithm with sample complexity $\tilde{\mathcal{O}}(SAB)$, and a new \emph{Nash V-learning} algorithm with sample complexity $\tilde{\mathcal{O}}(S(A+B))$. The latter result matches the information-theoretic lower bound in all problem-dependent parameters except for a polynomial factor of the length of each episode. In addition, we present a computational hardness result for learning the best responses against a fixed opponent in Markov games---a learning objective different from finding the Nash equilibrium.	翻訳日:2022-11-18 04:17:38 公開日:2020-07-14
# 超低消費電力FDSOIニューラル回路による極端ニューロモーフィックインテリジェンス Ultra-Low-Power FDSOI Neural Circuits for Extreme-Edge Neuromorphic Intelligence ( http://arxiv.org/abs/2006.14270v2 ) ライセンス: Link先を確認	Arianna Rubino, Can Livanelioglu, Ning Qiao, Melika Payvand, and Giacomo Indiveri	(参考訳) 近年、エッジコンピューティングアプリケーションのための人工知能回路やシステムの開発への関心が高まっている。インメモリコンピューティング混合信号ニューロモルフィックアーキテクチャは、スパイクニューラルネットワークをリアルタイムでエミュレートする能力のおかげで、エッジコンピューティングのセンサー処理アプリケーションに有望な超低消費電力ソリューションを提供する。このアプローチによって提供される微粒な並列性は、フォン・ノイマンアーキテクチャの時間多重計算パラダイムに頼ることなく、知覚された信号にそれらのダイナミクスを適用することによって、知覚データを効率的に処理することができる。さらに電力消費を低減するため、FDSOI(Fully-Depleted Silicon on Insulator)統合プロセスの特徴を利用した混合信号アナログ/デジタル回路を提案する。具体的には,アナログ設計問題に対処し,シナプスインテグレータの設計と適応ニューロン回路の設計を最適化するためのfdsoi技術の選択肢を検討する。本稿では,回路シミュレーションの結果を示し,小型設計による生物学的に妥当な神経動力学を作製する能力を示し,ニューロモルフィックプロセッサにおける大規模スパイクニューラルネットワークの実現に最適化した。 Recent years have seen an increasing interest in the development of artificial intelligence circuits and systems for edge computing applications. In-memory computing mixed-signal neuromorphic architectures provide promising ultra-low-power solutions for edge-computing sensory-processing applications, thanks to their ability to emulate spiking neural networks in real-time. The fine-grain parallelism offered by this approach allows such neural circuits to process the sensory data efficiently by adapting their dynamics to the ones of the sensed signals, without having to resort to the time-multiplexed computing paradigm of von Neumann architectures. To reduce power consumption even further, we present a set of mixed-signal analog/digital circuits that exploit the features of advanced Fully-Depleted Silicon on Insulator (FDSOI) integration processes. Specifically, we explore the options of advanced FDSOI technologies to address analog design issues and optimize the design of the synapse integrator and of the adaptive neuron circuits accordingly. We present circuit simulation results and demonstrate the circuit's ability to produce biologically plausible neural dynamics with compact designs, optimized for the realization of large-scale spiking neural networks in neuromorphic processors.	翻訳日:2022-11-17 04:06:28 公開日:2020-07-14
# MvMM-RegNet:多変量混合モデルとニューラルネットワーク推定に基づく新しい画像登録フレームワーク MvMM-RegNet: A new image registration framework based on multivariate mixture model and neural network estimation ( http://arxiv.org/abs/2006.15573v2 ) ライセンス: Link先を確認	Xinzhe Luo and Xiahai Zhuang	(参考訳) 現在のディープラーニングベースの登録アルゴリズムは、トレーニング中のバックプロパゲーションによって、一対の移動画像と固定画像との密接な対応を最適化するロス関数として、強度に基づく類似度尺度を利用することが多い。しかし、強度に基づくメトリクスは、特にクロスモダリティやコントラスト強調画像において、強度クラス対応の仮定に違反する場合、誤解を招くことがある。また、既存の学習に基づく登録方法は、ペアワイズ登録に主に適用され、グループ登録や複数画像の同時登録に拡張されることは稀である。本稿では,多変量混合モデル(MvMM)とニューラルネットワーク推定に基づく新しい画像登録フレームワークを提案する。外観と解剖情報を一体化した生成モデルを構築し、グループ登録が可能な新規な損失関数を導出する。本稿では,マルチモーダル心画像に対する各種応用について,ペアワイズ登録によるsas (single-atlas-based segmentation) やグループワイズ登録で統一されたmas (multi-atlas segmentation) など,汎用性について述べる。 MM-WHS-2017とMS-CMRSeg-2019の2つの公開データセットの性能評価を行った。以上の結果から, MR画像上でのDiceスコアは平均0.871 pm 0.025ドル, LGE MR画像上での心筋セグメンテーションは0.783 pm 0.082ドルであった。 Current deep-learning-based registration algorithms often exploit intensity-based similarity measures as the loss function, where dense correspondence between a pair of moving and fixed images is optimized through backpropagation during training. However, intensity-based metrics can be misleading when the assumption of intensity class correspondence is violated, especially in cross-modality or contrast-enhanced images. Moreover, existing learning-based registration methods are predominantly applicable to pairwise registration and are rarely extended to groupwise registration or simultaneous registration with multiple images. In this paper, we propose a new image registration framework based on multivariate mixture model (MvMM) and neural network estimation. A generative model consolidating both appearance and anatomical information is established to derive a novel loss function capable of implementing groupwise registration. We highlight the versatility of the proposed framework for various applications on multimodal cardiac images, including single-atlas-based segmentation (SAS) via pairwise registration and multi-atlas segmentation (MAS) unified by groupwise registration. We evaluated performance on two publicly available datasets, i.e. MM-WHS-2017 and MS-CMRSeg-2019. The results show that the proposed framework achieved an average Dice score of $0.871\pm 0.025$ for whole-heart segmentation on MR images and $0.783\pm 0.082$ for myocardium segmentation on LGE MR images.	翻訳日:2022-11-16 02:40:34 公開日:2020-07-14
# 線形サンプルの混合によるスパース信号の復元 Recovery of Sparse Signals from a Mixture of Linear Samples ( http://arxiv.org/abs/2006.16406v2 ) ライセンス: Link先を確認	Arya Mazumdar and Soumyabrata Pal	(参考訳) 線形回帰の混合は、異種データを表現するために広く使われる一般的な学習理論モデルである。最も単純な形式では、2つの異なる線形モデルからラベルが生成され、混合されると仮定する。 Yin et al.とKrishnamurthy et al., 2019の最近の研究は、この問題に対するモデルリカバリの実験的設計設定に焦点を当てている。それらの特徴を設計・クエリしてラベルを得ることが可能であると考えられる。クエリを行うと、オラクルは2つの異なるスパース線形モデルの1つをランダムに選択し、それに応じてラベルを生成する。両方のモデルを同時にリカバリするために、oracleのクエリはいくつ必要か? この問題は、よく知られた圧縮センシング問題の一般化とも考えられる(Cand\`es and Tao, 2005, Donoho, 2006)。本研究では,この問合せの複雑性問題に対処し,これまで最もよく知られた結果に基づいて効率的なアルゴリズムを提供する。 Mixture of linear regressions is a popular learning theoretic model that is used widely to represent heterogeneous data. In the simplest form, this model assumes that the labels are generated from either of two different linear models and mixed together. Recent works of Yin et al. and Krishnamurthy et al., 2019, focus on an experimental design setting of model recovery for this problem. It is assumed that the features can be designed and queried with to obtain their label. When queried, an oracle randomly selects one of the two different sparse linear models and generates a label accordingly. How many such oracle queries are needed to recover both of the models simultaneously? This question can also be thought of as a generalization of the well-known compressed sensing problem (Cand\`es and Tao, 2005, Donoho, 2006). In this work, we address this query complexity problem and provide efficient algorithms that improves on the previously best known results.	翻訳日:2022-11-15 14:31:05 公開日:2020-07-14
# B\'ezierSketch:スケーラブルなベクトルスケッチの生成モデル B\'ezierSketch: A generative model for scalable vector sketches ( http://arxiv.org/abs/2007.02190v2 ) ライセンス: Link先を確認	Ayan Das, Yongxin Yang, Timothy Hospedales, Tao Xiang and Yi-Zhe Song	(参考訳) 人間のスケッチの神経生成モデルの研究は、スケッチ画像の生成と人間の描画過程の関係から、現代の興味深いモデリング問題となっている。ランドマークであるSketchRNNは、一連のウェイポイントとしてスケッチを逐次生成することでブレークスルーを提供した。しかし、これは低解像度の画像生成につながり、長いスケッチのモデル化に失敗する。本稿では,完全ベクトルスケッチのための新しい生成モデルであるB\'ezierSketchについて述べる。この目的のために、まず、エンコーダに各ストロークを最適なB'ezier曲線に埋め込むよう訓練する、ストローク埋め込みに対する新しい逆グラフアプローチを導入する。これにより、スケッチをパラマテライズドストロークの短いシーケンスとして扱うことができ、より長いスケッチの容量で再帰的なスケッチジェネレータを訓練でき、スケーラブルな高解像度な結果が得られる。我々はQuick, Draw!ベンチマークで定性的かつ定量的な結果を報告する。 The study of neural generative models of human sketches is a fascinating contemporary modeling problem due to the links between sketch image generation and the human drawing process. The landmark SketchRNN provided breakthrough by sequentially generating sketches as a sequence of waypoints. However this leads to low-resolution image generation, and failure to model long sketches. In this paper we present B\'ezierSketch, a novel generative model for fully vector sketches that are automatically scalable and high-resolution. To this end, we first introduce a novel inverse graphics approach to stroke embedding that trains an encoder to embed each stroke to its best fit B\'ezier curve. This enables us to treat sketches as short sequences of paramaterized strokes and thus train a recurrent sketch generator with greater capacity for longer sketches, while producing scalable high-resolution results. We report qualitative and quantitative results on the Quick, Draw! benchmark.	翻訳日:2022-11-13 13:30:29 公開日:2020-07-14
# medas: 医療とインフォマティクスの間の壁を壊すためのオープンソースのプラットフォーム・アズ・サービス MeDaS: An open-source platform as service to help break the walls between medicine and informatics ( http://arxiv.org/abs/2007.06013v2 ) ライセンス: Link先を確認	Liang Zhang, Johann Li, Ping Li, Xiaoyuan Lu, Peiyi Shen, Guangming Zhu, Syed Afaq Shah, Mohammed Bennarmoun, Kun Qian, Bj\"orn W. Schuller	(参考訳) 過去10年間、ディープラーニング(DL)はコンピュータビジョン、自然言語処理、医療など多くの分野で前例のない成功を収めてきた。特にDLは, 分析, セグメンテーション, 分類などの観点から, 高度な医用画像解析への応用が進んでいる。一方,医学的,臨床的,情報学的な背景を持つ研究コミュニティから,医学的,知識的,スキル的,経験的知識を共同で共有するDLの力を活用した膨大なニーズが生まれている。一方で、規律間の障壁は、しばしばフルで効率的なコラボレーションを妨げるため、進行中です。この目的のために、私たちはMeDicalオープンソースプラットフォームであるMeDaSという新しいオープンソースプラットフォームを提案しています。私たちの知識を最大限に活用するために、MeDaSは、医学的背景から研究者が簡単にDL関連ツールキットを使って、共同で対話的なサービスを証明し、同時に情報科学の科学者やエンジニアが医療知識の側面を理解するための最初のオープンソースプラットフォームです。提案するMeDaSプラットフォームは,RINV(Rapid implementation aNd Verification)の考え方に基づく一連のツールキットとユーティリティに基づいて,医療画像解析に必要な前処理,後処理,拡張,可視化,その他のフェーズを実装できる。肺,肝臓,脳,胸部,病理などの5つの課題を検証し,MeDaSを用いて効率よく実現可能であることを実証した。 In the past decade, deep learning (DL) has achieved unprecedented success in numerous fields including computer vision, natural language processing, and healthcare. In particular, DL is experiencing an increasing development in applications for advanced medical image analysis in terms of analysis, segmentation, classification, and furthermore. On the one hand, tremendous needs that leverage the power of DL for medical image analysis are arising from the research community of a medical, clinical, and informatics background to jointly share their expertise, knowledge, skills, and experience. On the other hand, barriers between disciplines are on the road for them often hampering a full and efficient collaboration. To this end, we propose our novel open-source platform, i.e., MeDaS -- the MeDical open-source platform as Service. To the best of our knowledge, MeDaS is the first open-source platform proving a collaborative and interactive service for researchers from a medical background easily using DL related toolkits, and at the same time for scientists or engineers from information sciences to understand the medical knowledge side. Based on a series of toolkits and utilities from the idea of RINV (Rapid Implementation aNd Verification), our proposed MeDaS platform can implement pre-processing, post-processing, augmentation, visualization, and other phases needed in medical image analysis. Five tasks including the subjects of lung, liver, brain, chest, and pathology, are validated and demonstrated to be efficiently realisable by using MeDaS.	翻訳日:2022-11-11 06:04:29 公開日:2020-07-14
# 手術用ジェスチャー認識のための対称拡張畳み込み Symmetric Dilated Convolution for Surgical Gesture Recognition ( http://arxiv.org/abs/2007.06373v2 ) ライセンス: Link先を確認	Jinglu Zhang, Yinyu Nie, Yao Lyu, Hailin Li, Jian Chang, Xiaosong Yang, Jian Jun Zhang	(参考訳) 自動手術ジェスチャー認識は術中コンピュータ支援と客観的手術スキル評価の前提条件である。以前の作業では、キネマティックなデータを集めるために追加のセンサーが必要か、長くて未撮影の手術ビデオから時間情報を取得することの制限が必要だった。これらの課題に対処するため,RGBビデオのみを用いて外科的ジェスチャーを自動的に検出・分節する新しい時間的畳み込みアーキテクチャを提案する。本手法は,長期の時間パターンを符号化・復号化するために,自己結合モジュールで橋渡しされた対称拡張構造を考案し,それに従ってフレーム間関係を確立する。 JIGSAWSデータセットからの基本的なロボット縫合作業におけるアプローチの有効性を検証する。実験の結果,F1@50スコア~6ポイントまでのフレーム単位の精度で,最先端の手法よりも優れる長期フレーム依存性の把握に本手法が有効であることが示された。 Automatic surgical gesture recognition is a prerequisite of intra-operative computer assistance and objective surgical skill assessment. Prior works either require additional sensors to collect kinematics data or have limitations on capturing temporal information from long and untrimmed surgical videos. To tackle these challenges, we propose a novel temporal convolutional architecture to automatically detect and segment surgical gestures with corresponding boundaries only using RGB videos. We devise our method with a symmetric dilation structure bridged by a self-attention module to encode and decode the long-term temporal patterns and establish the frame-to-frame relationship accordingly. We validate the effectiveness of our approach on a fundamental robotic suturing task from the JIGSAWS dataset. The experiment results demonstrate the ability of our method on capturing long-term frame dependencies, which largely outperform the state-of-the-art methods on the frame-wise accuracy up to ~6 points and the F1@50 score ~6 points.	翻訳日:2022-11-11 00:34:09 公開日:2020-07-14
# 限定スーパービジョンによるアクティブクラウドカウント Active Crowd Counting with Limited Supervision ( http://arxiv.org/abs/2007.06334v2 ) ライセンス: Link先を確認	Zhen Zhao, Miaojing Shi, Xiaoxiao Zhao, Li Li	(参考訳) 群衆画像から信頼できる人々を知るには、通常、ヘッドセンターアノテーションが必要である。しかし、アノテーティング・センターは密集した群衆にとって退屈で退屈なプロセスである。本稿では,アノテートする画像をランダムに選択する代わりに,少量のラベリング予算が与えられた場合,まず,アノテートするデータセットの最も情報性の高い画像をアノテートするアクティブなラベリング戦略を導入し,その上にカウントモデルを学習する。このプロセスを繰り返して、各サイクルで、群衆密度が多様で、以前の選択と異なるサンプルを選択するようにします。ラベル付け予算が満たされた最後のサイクルでは、ラベル付きデータをラベル付きデータと整列する分布分類器を導入し、さらに、ネットワーク内の分散ラベルと遅延表現を混合して、特にトレーニングサンプル間の分散アライメントを改善することを提案する。群衆カウントのための一般的な密度推定パイプラインに従う。上海技術、UCF CC 50、MAll、TRANCOS、DCCといった標準ベンチマークで大規模な実験が行われる。限られた数の画像(例えばデータセットの10%)にアノテートすることで、データセットの完全なアノテーションを利用する技術の状態から遠く離れたレベルのパフォーマンスに達する。 To learn a reliable people counter from crowd images, head center annotations are normally required. Annotating head centers is however a laborious and tedious process in dense crowds. In this paper, we present an active learning framework which enables accurate crowd counting with limited supervision: given a small labeling budget, instead of randomly selecting images to annotate, we first introduce an active labeling strategy to annotate the most informative images in the dataset and learn the counting model upon them. The process is repeated such that in every cycle we select the samples that are diverse in crowd density and dissimilar to previous selections. In the last cycle when the labeling budget is met, the large amount of unlabeled data are also utilized: a distribution classifier is introduced to align the labeled data with unlabeled data; furthermore, we propose to mix up the distribution labels and latent representations of data in the network to particularly improve the distribution alignment in-between training samples. We follow the popular density estimation pipeline for crowd counting. Extensive experiments are conducted on standard benchmarks i.e. ShanghaiTech, UCF CC 50, MAll, TRANCOS, and DCC. By annotating limited number of images (e.g. 10% of the dataset), our method reaches levels of performance not far from the state of the art which utilize full annotations of the dataset.	翻訳日:2022-11-11 00:18:00 公開日:2020-07-14
# 粗いものから細かいものへの複数の音源の定位 Multiple Sound Sources Localization from Coarse to Fine ( http://arxiv.org/abs/2007.06355v2 ) ライセンス: Link先を確認	Rui Qian, Di Hu, Heinrich Dinkel, Mengyue Wu, Ning Xu, Weiyao Lin	(参考訳) 制約のないビデオで複数の音源を視覚的にローカライズする方法は、特にペアワイズなサウンドオブジェクトアノテーションが欠けている場合、恐ろしい問題です。そこで本研究では,複雑なシーンから異なるカテゴリの音声表現と視覚表現を分離し,粗面から細部までのクロスモーダル特徴のアライメントを行う2段階視聴覚学習フレームワークを開発した。本モデルでは,局所化の公開データセット上での最先端結果と,複雑な場面における複数音源音像定位における有意な性能を実現する。次に, 音像分離のための局所化結果を用い, 既存の手法に匹敵する性能を得る。これらの結果は、特定の視覚源と効果的に音を調整できるモデルの能力を示している。コードはhttps://github.com/shvdiwnkozbw/Multi-Source-Sound-Localizationで入手できる。 How to visually localize multiple sound sources in unconstrained videos is a formidable problem, especially when lack of the pairwise sound-object annotations. To solve this problem, we develop a two-stage audiovisual learning framework that disentangles audio and visual representations of different categories from complex scenes, then performs cross-modal feature alignment in a coarse-to-fine manner. Our model achieves state-of-the-art results on public dataset of localization, as well as considerable performance on multi-source sound localization in complex scenes. We then employ the localization results for sound separation and obtain comparable performance to existing methods. These outcomes demonstrate our model's ability in effectively aligning sounds with specific visual sources. Code is available at https://github.com/shvdiwnkozbw/Multi-Source-Sound-Localization	翻訳日:2022-11-11 00:17:36 公開日:2020-07-14
# ニューラルネットワークにおけるSGDのカオスの定量的伝播 Quantitative Propagation of Chaos for SGD in Wide Neural Networks ( http://arxiv.org/abs/2007.06352v2 ) ライセンス: Link先を確認	Valentin De Bortoli, Alain Durmus, Xavier Fontaine, Umut Simsekli	(参考訳) 本稿では,2層超パラメータニューラルネットワークに適用される確率的勾配降下(sgd)アルゴリズムの,数やニューロン(つまり隠れた層の大きさ)である$n \to +\infty$ の連続時間に対する制限挙動について検討する。確率論的アプローチに従って,この連続時間ダイナミクスによって定義される粒子系の「カオスの伝播」を示し,粒子間の統計的相互作用が漸近的に消失することを示す。特に、ワッサースタイン距離が与えられた距離空間における平均場mckean-vlasov方程式の解に対する任意の粒子のn$に関して定量的収束を確立する。これまでの研究と比較して、SGDのステップサイズ列がニューロンの数や反復数に依存する可能性のある設定について考察する。次に,それぞれ異なる平均場限界が得られた2つのレジームを同定し,そのうちの1つは手元の最小化問題の暗黙的に正規化されたバージョンに対応する。理論的な結果を検証するために実データ集合について様々な実験を行い、分類問題におけるこれら2つのレジームの存在を評価し、収束結果を示す。 In this paper, we investigate the limiting behavior of a continuous-time counterpart of the Stochastic Gradient Descent (SGD) algorithm applied to two-layer overparameterized neural networks, as the number or neurons (ie, the size of the hidden layer) $N \to +\infty$. Following a probabilistic approach, we show 'propagation of chaos' for the particle system defined by this continuous-time dynamics under different scenarios, indicating that the statistical interaction between the particles asymptotically vanishes. In particular, we establish quantitative convergence with respect to $N$ of any particle to a solution of a mean-field McKean-Vlasov equation in the metric space endowed with the Wasserstein distance. In comparison to previous works on the subject, we consider settings in which the sequence of stepsizes in SGD can potentially depend on the number of neurons and the iterations. We then identify two regimes under which different mean-field limits are obtained, one of them corresponding to an implicitly regularized version of the minimization problem at hand. We perform various experiments on real datasets to validate our theoretical results, assessing the existence of these two regimes on classification problems and illustrating our convergence results.	翻訳日:2022-11-10 23:43:06 公開日:2020-07-14
# 人工データセットにおけるGANの訓練から学んだ教訓 Lessons Learned from the Training of GANs on Artificial Datasets ( http://arxiv.org/abs/2007.06418v2 ) ライセンス: Link先を確認	Shichang Tang	(参考訳) 近年,GAN(Generative Adversarial Networks)は現実的な画像の合成に大きく進歩している。しかし、しばしば、サンプルが少ないか、異なるデータ分布に属するクラスが多すぎるイメージデータセットで訓練される。その結果、GANは不適合や過剰適合の傾向があり、分析が困難で制約される。したがって、データセットがもたらした不要な干渉を回避しつつ、ganを徹底的に研究するために、無限に多くのサンプルと実際のデータ分布が単純で高次元で構造化多様体を持つ人工データセットでそれらを訓練する。さらに、ジェネレータは最適なパラメータ集合が存在するように設計されている。実験により,様々な距離測定において,生成元はGAN訓練手順でそのようなパラメータを学習できないことがわかった。また、GANのトレーニング混合物は、モデル複雑さが十分に高い場合のネットワーク深さや幅を増大させるよりも、パフォーマンスが向上することがわかった。実験の結果,複数のジェネレータの混合が教師なし設定で異なるモードや異なるクラスを自動的に発見できることが示され,複数のジェネレータと識別器にまたがる生成タスクと識別タスクの分散を特徴付ける。現実的なデータセットへの結論の一般化可能性の例として、CIFAR-10データセット上でGANの混合を訓練し、一般的なメトリクス、すなわちインセプションスコア(IS)とFr\echet Inception Distance(FID)で最先端の手法を著しく上回ります。 Generative Adversarial Networks (GANs) have made great progress in synthesizing realistic images in recent years. However, they are often trained on image datasets with either too few samples or too many classes belonging to different data distributions. Consequently, GANs are prone to underfitting or overfitting, making the analysis of them difficult and constrained. Therefore, in order to conduct a thorough study on GANs while obviating unnecessary interferences introduced by the datasets, we train them on artificial datasets where there are infinitely many samples and the real data distributions are simple, high-dimensional and have structured manifolds. Moreover, the generators are designed such that optimal sets of parameters exist. Empirically, we find that under various distance measures, the generator fails to learn such parameters with the GAN training procedure. We also find that training mixtures of GANs leads to more performance gain compared to increasing the network depth or width when the model complexity is high enough. Our experimental results demonstrate that a mixture of generators can discover different modes or different classes automatically in an unsupervised setting, which we attribute to the distribution of the generation and discrimination tasks across multiple generators and discriminators. As an example of the generalizability of our conclusions to realistic datasets, we train a mixture of GANs on the CIFAR-10 dataset and our method significantly outperforms the state-of-the-art in terms of popular metrics, i.e., Inception Score (IS) and Fr\'echet Inception Distance (FID).	翻訳日:2022-11-10 22:55:51 公開日:2020-07-14
# 自律走行車のロバストセンシングに向けて--敵対的視点から Towards robust sensing for Autonomous Vehicles: An adversarial perspective ( http://arxiv.org/abs/2007.10115v1 ) ライセンス: Link先を確認	Apostolos Modas, Ricardo Sanchez-Matilla, Pascal Frossard, Andrea Cavallaro	(参考訳) 自動運転車は、さまざまな状況において安全クリティカルな意思決定のために、正確でロバストなセンサー観測に依存している。このようなシステムの基本的な構成要素は、超音波、RADAR、GPS、LiDARおよびカメラ信号を処理するセンサーと分類器である。結果として得られる決定が摂動に対して堅牢であり、異なる種類のニュアンスやデータ変換の形式を取ることができ、また敵対的摂動(AP)にもなり得ることが重要である。敵対的摂動は、自律的なシステムを攻撃し、破壊することを目的として、意図的に環境または感覚測定を改変する。 AVの高速進化領域において、より安全なシステムを構築し、デプロイするには、センサーシステムの脆弱性を慎重に評価する必要がある。そこで,本稿では,自律システムに対するセンサ・モダリティに対する敵の攻撃をレビューした上で,その対策と今後の研究方向性について論じる。 Autonomous Vehicles rely on accurate and robust sensor observations for safety critical decision-making in a variety of conditions. Fundamental building blocks of such systems are sensors and classifiers that process ultrasound, RADAR, GPS, LiDAR and camera signals~\cite{Khan2018}. It is of primary importance that the resulting decisions are robust to perturbations, which can take the form of different types of nuisances and data transformations, and can even be adversarial perturbations (APs). Adversarial perturbations are purposefully crafted alterations of the environment or of the sensory measurements, with the objective of attacking and defeating the autonomous systems. A careful evaluation of the vulnerabilities of their sensing system(s) is necessary in order to build and deploy safer systems in the fast-evolving domain of AVs. To this end, we survey the emerging field of sensing in adversarial settings: after reviewing adversarial attacks on sensing modalities for autonomous systems, we discuss countermeasures and present future research directions.	翻訳日:2022-11-10 15:46:05 公開日:2020-07-14
# 遅延制約型無線フェデレート学習のための協調デバイススケジューリングと資源割り当て Joint Device Scheduling and Resource Allocation for Latency Constrained Wireless Federated Learning ( http://arxiv.org/abs/2007.07174v1 ) ライセンス: Link先を確認	Wenqi Shi, Sheng Zhou, Zhisheng Niu, Miao Jiang, Lu Geng	(参考訳) フェデレーション学習(fl)では、デバイスはワイヤレスチャネルを介してローカルモデルのアップデートをアップロードすることで、グローバルなトレーニングに寄与する。計算量や通信資源が限られているため、デバイススケジューリングはFLの収束速度に不可欠である。本稿では,遅延制約のある無線FLに対して,与えられたトレーニング時間予算のモデル精度を最大化するために,共同装置スケジューリングと資源割当ポリシを提案する。訓練性能損失の相反性に対する低いバウンダリは、訓練ラウンド数と1ラウンド当たりの予定装置数との観点から導出される。この境界に基づいて、精度最大化問題は2つのサブプロブレムに分解することで解決される。まず、スケジュールされたデバイスを考えると、最適な帯域割り当ては、より悪いチャネル条件や計算能力の弱いデバイスへの帯域幅を割り当てることを示唆する。そして、各ステップにおいて、最適な帯域割り当てによって得られる最小更新時間を消費する装置を、下限が増加するまで選択する欲望デバイススケジューリングアルゴリズムを導入することにより、より多くのデバイスがモデル精度を低下させる。実験により,提案手法は,データ分布とセル半径の広範囲な設定の下で,最先端のスケジューリングポリシーより優れていることが示された。 In federated learning (FL), devices contribute to the global training by uploading their local model updates via wireless channels. Due to limited computation and communication resources, device scheduling is crucial to the convergence rate of FL. In this paper, we propose a joint device scheduling and resource allocation policy to maximize the model accuracy within a given total training time budget for latency constrained wireless FL. A lower bound on the reciprocal of the training performance loss, in terms of the number of training rounds and the number of scheduled devices per round, is derived. Based on the bound, the accuracy maximization problem is solved by decoupling it into two sub-problems. First, given the scheduled devices, the optimal bandwidth allocation suggests allocating more bandwidth to the devices with worse channel conditions or weaker computation capabilities. Then, a greedy device scheduling algorithm is introduced, which in each step selects the device consuming the least updating time obtained by the optimal bandwidth allocation, until the lower bound begins to increase, meaning that scheduling more devices will degrade the model accuracy. Experiments show that the proposed policy outperforms state-of-the-art scheduling policies under extensive settings of data distributions and cell radius.	翻訳日:2022-11-10 15:45:36 公開日:2020-07-14
# セキュリティ制約付き最適潮流に対する深層学習と最適化の併用 Combining Deep Learning and Optimization for Security-Constrained Optimal Power Flow ( http://arxiv.org/abs/2007.07002v1 ) ライセンス: Link先を確認	Alexandre Velloso and Pascal Van Hentenryck	(参考訳) セキュリティに制約のある最適電力フロー(SCOPF)は電力システムの基本であり、同期発電機の自動一次応答(APR)と短期スケジュールを接続する。様々な入力に対して、SCOPFの問題を毎日繰り返し解決し、一組のタイミングで頑健なスケジュールを決定する。残念ながら、SCOPF問題におけるAPRのモデリングは、複雑な大規模混合整数プログラムをもたらすが、解決は困難である。この課題に対処するため,本研究では,深層学習と頑健な最適化技術を組み合わせた新しい手法を提案する。厳密解法の計算負荷を軽減することを目的とした最近の機械学習アプリケーションとは異なり、提案手法はscopf実装可能な解を直接予測する。 2つのステップで実現可能である。まず、トレーニング中にラグランジアン二重法は、コロン・アンド・制約生成アルゴリズム(CCGA)によって機械学習モデルに反復的に追加される物理的および操作上の制約の違反を罰する。第二に、別のccgaは予測に最も近い解を見つけることで実現可能性を取り戻す。大規模なテストケースでの実験では、最適度ギャップが0.1%以下で実現可能な解を得るためのかなりの時間短縮が得られた。 The security-constrained optimal power flow (SCOPF) is fundamental in power systems and connects the automatic primary response (APR) of synchronized generators with the short-term schedule. Every day, the SCOPF problem is repeatedly solved for various inputs to determine robust schedules given a set of contingencies. Unfortunately, the modeling of APR within the SCOPF problem results in complex large-scale mixed-integer programs, which are hard to solve. To address this challenge, leveraging the wealth of available historical data, this paper proposes a novel approach that combines deep learning and robust optimization techniques. Unlike recent machine-learning applications where the aim is to mitigate the computational burden of exact solvers, the proposed method predicts directly the SCOPF implementable solution. Feasibility is enforced in two steps. First, during training, a Lagrangian dual method penalizes violations of physical and operations constraints, which are iteratively added as necessary to the machine-learning model by a Column-and-Constraint-Generation Algorithm (CCGA). Second, another different CCGA restores feasibility by finding the closest feasible solution to the prediction. Experiments on large test cases show that the method results in significant time reduction for obtaining feasible solutions with an optimality gap below 0.1%.	翻訳日:2022-11-10 15:37:09 公開日:2020-07-14
# 地底真理の欠如による特徴不合理性予測 Predicting feature imputability in the absence of ground truth ( http://arxiv.org/abs/2007.07052v1 ) ライセンス: Link先を確認	Niamh McCombe, Xuemei Ding, Girijesh Prasad, David P. Finn, Stephen Todd, Paula L. McClean, KongFatt Wong-Lin	(参考訳) データ計算は、欠落した値を扱う最も一般的な方法であるが、ほとんどの実生活アプリケーションでは、大きな欠落データが発生する可能性があり、データが正確にインプットされたかどうかを評価することは困難または不可能である(基礎的真実の欠如)。本稿では,個々のデータの特徴を正確に説明できるかどうかを判断するための,効果的でシンプルな主成分に基づく手法を提案する。特に, 極度の欠如や根拠の欠如がある場合でも, 主成分負荷と特徴インプタビリティとの間に強い線形関係が確立される。この研究は、実践的なデータ計算戦略に重要な意味を持つだろう。 Data imputation is the most popular method of dealing with missing values, but in most real life applications, large missing data can occur and it is difficult or impossible to evaluate whether data has been imputed accurately (lack of ground truth). This paper addresses these issues by proposing an effective and simple principal component based method for determining whether individual data features can be accurately imputed - feature imputability. In particular, we establish a strong linear relationship between principal component loadings and feature imputability, even in the presence of extreme missingness and lack of ground truth. This work will have important implications in practical data imputation strategies.	翻訳日:2022-11-10 15:36:46 公開日:2020-07-14
# マルチドメイン医用画像検索のためのユニバーサルモデル Universal Model for Multi-Domain Medical Image Retrieval ( http://arxiv.org/abs/2007.08628v1 ) ライセンス: Link先を確認	Yang Feng, Yubao Liu, Jiebo Luo	(参考訳) 医用画像検索(MIR)は、医師が類似した患者のデータを素早く見つけるのに役立つ。デジタル画像モダリティの広範利用と医用画像レポジトリの成長により、MIRはますます役に立ちつつある。しかし、病院における様々なデジタル画像モダリティの人気もまた、MIRにいくつかの課題をもたらしている。通常、1つの画像検索モデルは、1つのモダリティまたは1つのソースの画像を扱うためにのみ訓練される。複数のソースやドメインから医療画像を取得する必要がある場合、複数の検索モデルを維持する必要があります。本稿では,複数の領域の医用画像に適用可能な1つのMIRモデルをトレーニングする方法について検討する。複数のドメインからトレーニングデータを融合するだけでは、既存のメソッドを使ってトレーニングすると、いくつかのドメインがより早く適合するため、この問題は解決できない。そこで本研究では,複数の専門的MIRモデルの知識を汎用埋め込みにより単一のマルチドメインMIRモデルに抽出し,その問題を解決することを提案する。皮膚疾患,X線,網膜画像データセットを用いて,提案したユニバーサルモデルがマルチドメインMIRを効果的に実現できることを検証する。 Medical Image Retrieval (MIR) helps doctors quickly find similar patients' data, which can considerably aid the diagnosis process. MIR is becoming increasingly helpful due to the wide use of digital imaging modalities and the growth of the medical image repositories. However, the popularity of various digital imaging modalities in hospitals also poses several challenges to MIR. Usually, one image retrieval model is only trained to handle images from one modality or one source. When there are needs to retrieve medical images from several sources or domains, multiple retrieval models need to be maintained, which is cost ineffective. In this paper, we study an important but unexplored task: how to train one MIR model that is applicable to medical images from multiple domains? Simply fusing the training data from multiple domains cannot solve this problem because some domains become over-fit sooner when trained together using existing methods. Therefore, we propose to distill the knowledge in multiple specialist MIR models into a single multi-domain MIR model via universal embedding to solve this problem. Using skin disease, x-ray, and retina image datasets, we validate that our proposed universal model can effectively accomplish multi-domain MIR.	翻訳日:2022-11-10 15:36:34 公開日:2020-07-14
# 映像から視覚的な音を生成する Generating Visually Aligned Sound from Videos ( http://arxiv.org/abs/2008.00820v1 ) ライセンス: Link先を確認	Peihao Chen, Yang Zhang, Mingkui Tan, Hongdong Xiao, Deng Huang, Chuang Gan	(参考訳) 我々は,自然映像から音を生成する作業に焦点をあて,その音は時間的にも内容的にも視覚信号と一致すべきである。このタスクは、ビデオコンテンツからカメラを推測できない音が生成されるため、非常に難しい。このモデルは、視覚的内容とこれらの無関係な音の間違ったマッピングを学習せざるを得ない。この課題に対処するため,我々はREGNETというフレームワークを提案する。本稿では,複雑な背景情報から音声を発する物体をよりよく識別するために,まず映像フレームから外観や動きの特徴を抽出する。次に,実音を入力として直接考慮し,ボトルネック音の特徴を出力する,革新的な音声フォワード正則化器を導入する。訓練中の音の予測に視覚的特徴とボトルネック的特徴の両方を使用すると、音の予測の監督が強化される。音声フォワーディングレギュレータは、無関係な音成分を制御でき、これにより、画面外にある物体から放射される映像フレームと音との誤ったマッピングを学習するのを防止する。テスト中、オーディオフォワードレギュラライザが削除され、regnetが純粋に調整されたサウンドを視覚的な特徴からのみ生成できるようになる。 Amazon Mechanical Turkに基づく大規模評価の結果,時間的・内容的アライメントが大幅に向上した。驚くべきことに、我々の生成した音は68.12%の成功率で人間を騙すことができる。コードと事前訓練されたモデルはhttps://github.com/PeihaoChen/regnetで公開されている。 We focus on the task of generating sound from natural videos, and the sound should be both temporally and content-wise aligned with visual signals. This task is extremely challenging because some sounds generated \emph{outside} a camera can not be inferred from video content. The model may be forced to learn an incorrect mapping between visual content and these irrelevant sounds. To address this challenge, we propose a framework named REGNET. In this framework, we first extract appearance and motion features from video frames to better distinguish the object that emits sound from complex background information. We then introduce an innovative audio forwarding regularizer that directly considers the real sound as input and outputs bottlenecked sound features. Using both visual and bottlenecked sound features for sound prediction during training provides stronger supervision for the sound prediction. The audio forwarding regularizer can control the irrelevant sound component and thus prevent the model from learning an incorrect mapping between video frames and sound emitted by the object that is out of the screen. During testing, the audio forwarding regularizer is removed to ensure that REGNET can produce purely aligned sound only from visual features. Extensive evaluations based on Amazon Mechanical Turk demonstrate that our method significantly improves both temporal and content-wise alignment. Remarkably, our generated sound can fool the human with a 68.12% success rate. Code and pre-trained models are publicly available at https://github.com/PeihaoChen/regnet	翻訳日:2022-11-10 15:36:16 公開日:2020-07-14
# 非線形ロバスト制御のための三元ポリシー反復アルゴリズム Ternary Policy Iteration Algorithm for Nonlinear Robust Control ( http://arxiv.org/abs/2007.06810v1 ) ライセンス: Link先を確認	Jie Li, Shengbo Eben Li, Yang Guan, Jingliang Duan, Wenyu Li, Yuming Yin	(参考訳) 植物力学の不確実性は、非線形制御問題への挑戦である。本稿では,境界不確実性を伴う非線形ロバスト制御問題を解くための3次ポリシー反復(TPI)アルゴリズムを開発する。コントローラとシステムの不確実性はゲームプレイヤーと見なされ、ロバスト制御問題は2つのプレイヤーゼロサム差分ゲームとして定式化される。微分ゲームを解くために、対応するhamilton-jacobi-isaacs(hji)方程式が導出される。 3つの損失関数と3つの更新フェーズは、それぞれHJI方程式の恒等式、最小化、最大化に対応するように設計されている。これらの損失関数は、全状態が同時に設定されるのを防ぐために生成された状態集合における近似ハミルトン状態の期待によって定義される。勾配降下法を用いて設計した損失関数を小さくすることで、値関数とポリシーのパラメータを直接更新する。さらに、制御ポリシのパラメータにもゼロ初期化を適用することができる。提案アルゴリズムの有効性は2つのシミュレーション研究を通して実証した。シミュレーションの結果, tpiアルゴリズムは線形プラントの最適解に収束し, 非線形プラントの外乱に対する高い抵抗を持つことがわかった。 The uncertainties in plant dynamics remain a challenge for nonlinear control problems. This paper develops a ternary policy iteration (TPI) algorithm for solving nonlinear robust control problems with bounded uncertainties. The controller and uncertainty of the system are considered as game players, and the robust control problem is formulated as a two-player zero-sum differential game. In order to solve the differential game, the corresponding Hamilton-Jacobi-Isaacs (HJI) equation is then derived. Three loss functions and three update phases are designed to match the identity equation, minimization and maximization of the HJI equation, respectively. These loss functions are defined by the expectation of the approximate Hamiltonian in a generated state set to prevent operating all the states in the entire state set concurrently. The parameters of value function and policies are directly updated by diminishing the designed loss functions using the gradient descent method. Moreover, zero-initialization can be applied to the parameters of the control policy. The effectiveness of the proposed TPI algorithm is demonstrated through two simulation studies. The simulation results show that the TPI algorithm can converge to the optimal solution for the linear plant, and has high resistance to disturbances for the nonlinear plant.	翻訳日:2022-11-10 15:35:24 公開日:2020-07-14
# LSTMとトレーニング可能な初期隠れ状態を用いた金融時系列のモデル化 Modeling Financial Time Series using LSTM with Trainable Initial Hidden States ( http://arxiv.org/abs/2007.06848v1 ) ライセンス: Link先を確認	Jungsik Hwang	(参考訳) 過去の未知のパターンや情報を時系列で抽出することは、多くの現実世界のアプリケーションの中心である。本研究では,深層学習モデルを用いて金融時系列をモデル化する新しい手法を提案する。トレーニング可能な初期隠れ状態を備えたLong Short-Term Memory(LSTM)ネットワークを使用する。時系列の再構成を学習することにより,そのパラメータで高次元時系列データを表現できる。韓国株式市場のデータを用いた実験により、このモデルは潜在空間における大量の株価の相対的類似性を捉えることができた。さらに、このモデルでは、潜在分野から将来の株価トレンドを予測することもできる。提案手法は,多くの時系列間の関係を識別する上で有用であり,投資ポートフォリオの最適化など,金融アプリケーションに適用することができる。 Extracting previously unknown patterns and information in time series is central to many real-world applications. In this study, we introduce a novel approach to modeling financial time series using a deep learning model. We use a Long Short-Term Memory (LSTM) network equipped with the trainable initial hidden states. By learning to reconstruct time series, the proposed model can represent high-dimensional time series data with its parameters. An experiment with the Korean stock market data showed that the model was able to capture the relative similarity between a large number of stock prices in its latent space. Besides, the model was also able to predict the future stock trends from the latent space. The proposed method can help to identify relationships among many time series, and it could be applied to financial applications, such as optimizing the investment portfolios.	翻訳日:2022-11-10 15:35:07 公開日:2020-07-14
# ネットワーク音楽演奏アプリケーションにおける音声信号の低遅延パケット損失隠蔽のためのディープラーニング手法 A Deep Learning Approach for Low-Latency Packet Loss Concealment of Audio Signals in Networked Music Performance Applications ( http://arxiv.org/abs/2007.07132v1 ) ライセンス: Link先を確認	Prateek Verma, Alessandro Ilic Mezza, Chris Chafe, Cristina Rottondi	(参考訳) Networked Music Performance (NMP)は、インターネットアプリケーションにおける潜在的なゲームチェンジャーとして構想されており、遠隔のミュージシャンが遠隔通信ネットワークを介して対話し、一緒に演奏できるようにすることによって、従来の音楽インタラクションの概念に革命をもたらすことを目的としている。しかし、音楽演奏の現実的な条件を保証することは、音質やネットワークの遅延といった極めて厳しい要件のため、重要なエンジニアリング上の課題となっている。ミュージシャンが経験したエンドツーエンドの遅延を最小限に抑えるため、NMPアプリケーションの典型的な実装では、圧縮されていない双方向オーディオストリームを使用し、UDPをトランスポートプロトコルとして利用する。接続が小さく信頼性の低いため、UDP経由で送信されるオーディオパケットは再送信されず、レシーバのオーディオ再生に不具合が発生する。本稿では,深層学習手法を用いてパケットの損失をリアルタイムで予測する手法について述べる。エラーをリアルタイムで隠蔽する能力は、パケット損失によるオーディオ障害の軽減に役立ち、現実世界のシナリオにおけるオーディオプレイアウトの品質を向上させる。 Networked Music Performance (NMP) is envisioned as a potential game changer among Internet applications: it aims at revolutionizing the traditional concept of musical interaction by enabling remote musicians to interact and perform together through a telecommunication network. Ensuring realistic conditions for music performance, however, constitutes a significant engineering challenge due to extremely strict requirements in terms of audio quality and, most importantly, network delay. To minimize the end-to-end delay experienced by the musicians, typical implementations of NMP applications use un-compressed, bidirectional audio streams and leverage UDP as transport protocol. Being connection less and unreliable,audio packets transmitted via UDP which become lost in transit are not re-transmitted and thus cause glitches in the receiver audio playout. This article describes a technique for predicting lost packet content in real-time using a deep learning approach. The ability of concealing errors in real time can help mitigate audio impairments caused by packet losses, thus improving the quality of audio playout in real-world scenarios.	翻訳日:2022-11-10 15:34:29 公開日:2020-07-14
# Explore and Explain: セルフ教師付きナビゲーションとリカウント Explore and Explain: Self-supervised Navigation and Recounting ( http://arxiv.org/abs/2007.07268v1 ) ライセンス: Link先を確認	Roberto Bigazzi, Federico Landi, Marcella Cornia, Silvia Cascianelli, Lorenzo Baraldi, Rita Cucchiara	(参考訳) 自律的でインテリジェントなエージェントの開発を促進することを目的として、Embodied AIは最近注目を集めている。本稿では,エージェントが未知の環境を探索し,その経路に何が見えるのかを記述する必要がある,新たな具体的設定を考案する。この文脈では、エージェントは探索目標によって駆動される環境をナビゲートし、説明のための適切なモーメントを選択し、関連するオブジェクトとシーンの自然言語記述を出力する必要がある。本モデルでは,新たな自己監督探索モジュールとペナルティと,説明のための完全なキャプションモデルを統合する。また,環境とナビゲーションの双方から得られる情報によって,説明の適切なモーメントを選択するための異なるポリシーについて検討する。 Matterport3Dデータセットからフォトリアリスティックな環境下で実験を行い、エージェントのナビゲーションと説明機能およびそれらの相互作用の役割について調査する。 Embodied AI has been recently gaining attention as it aims to foster the development of autonomous and intelligent agents. In this paper, we devise a novel embodied setting in which an agent needs to explore a previously unknown environment while recounting what it sees during the path. In this context, the agent needs to navigate the environment driven by an exploration goal, select proper moments for description, and output natural language descriptions of relevant objects and scenes. Our model integrates a novel self-supervised exploration module with penalty, and a fully-attentive captioning model for explanation. Also, we investigate different policies for selecting proper moments for explanation, driven by information coming from both the environment and the navigation. Experiments are conducted on photorealistic environments from the Matterport3D dataset and investigate the navigation and explanation capabilities of the agent as well as the role of their interactions.	翻訳日:2022-11-10 15:28:03 公開日:2020-07-14
# TinyVIRAT:低解像度ビデオアクション認識 TinyVIRAT: Low-resolution Video Action Recognition ( http://arxiv.org/abs/2007.07355v1 ) ライセンス: Link先を確認	Ugur Demir, Yogesh S Rawat, Mubarak Shah	(参考訳) 既存のアクション認識の研究は主に、アクションがはっきりと見える高品質のビデオに焦点を当てている。現実世界の監視環境では、ビデオ内のアクションは幅広い解像度でキャプチャされる。ほとんどの活動は小さな解像度で発生し、そのような活動を認識することは難しい問題である。本研究では,ビデオ中の小さなアクションを認識することに焦点を当てる。我々は,天然の低解像度アクティビティを含むベンチマークデータセットであるtinyviratを紹介する。 TinyVIRATビデオのアクションには複数のラベルがあり、監視ビデオから抽出され、現実的でより困難なものになる。本稿では,低解像度アクションの品質向上のために,プログレッシブ・ジェネレーティブ・アプローチを用いたビデオにおける小さなアクションの認識手法を提案する。提案手法は,映像中の活動領域に焦点を合わせるのに役立つ弱訓練された注意機構も備えている。提案するtinyviratデータセットのベンチマーク実験を行い,提案手法がベースライン上での動作認識性能を大幅に向上させることを確認した。また,提案手法は,既存の手法と比較して,合成的再構成された行動認識データセットに対するアプローチを評価した。データセットとコードはhttps://github.com/UgurDemir/Tiny-VIRATで公開されている。 The existing research in action recognition is mostly focused on high-quality videos where the action is distinctly visible. In real-world surveillance environments, the actions in videos are captured at a wide range of resolutions. Most activities occur at a distance with a small resolution and recognizing such activities is a challenging problem. In this work, we focus on recognizing tiny actions in videos. We introduce a benchmark dataset, TinyVIRAT, which contains natural low-resolution activities. The actions in TinyVIRAT videos have multiple labels and they are extracted from surveillance videos which makes them realistic and more challenging. We propose a novel method for recognizing tiny actions in videos which utilizes a progressive generative approach to improve the quality of low-resolution actions. The proposed method also consists of a weakly trained attention mechanism which helps in focusing on the activity regions in the video. We perform extensive experiments to benchmark the proposed TinyVIRAT dataset and observe that the proposed method significantly improves the action recognition performance over baselines. We also evaluate the proposed approach on synthetically resized action recognition datasets and achieve state-of-the-art results when compared with existing methods. The dataset and code is publicly available at https://github.com/UgurDemir/Tiny-VIRAT.	翻訳日:2022-11-10 15:27:48 公開日:2020-07-14
# 攻撃的セキュリティのための機械学習:決定木とニューラルネットワークを用いたサンドボックス分類 Machine Learning for Offensive Security: Sandbox Classification Using Decision Trees and Artificial Neural Networks ( http://arxiv.org/abs/2007.06763v1 ) ライセンス: Link先を確認	Will Pearce, Nick Landers, and Nancy Fulda	(参考訳) 情報セキュリティにおける機械学習のメリットは、主に防衛を強化することに焦点を当てている。しかし、機械学習(ml)のテクニックは、深いポケットと巨大なデータリポジトリを持つ組織に留まらず、mlの民主化によって、mlを使用して攻撃的な操作をサポートするセキュリティチームの数が増えています。ここで提示された研究は、我々のチームが1つの攻撃的タスクを解決するために使った2つのモデルを調べ、サンドボックスを検出する。フィッシングメールで収集されたプロセスリストデータを用いて、サンドボックスの分類にDecision TreesとArtificial Neural Networksを用いることで、安全でない実行を避けることができる。本稿は,実際の攻撃的チームが機械学習を用いて攻撃的操作をサポートする方法について,ユニークな洞察を提供することを目的とする。 The merits of machine learning in information security have primarily focused on bolstering defenses. However, machine learning (ML) techniques are not reserved for organizations with deep pockets and massive data repositories; the democratization of ML has lead to a rise in the number of security teams using ML to support offensive operations. The research presented here will explore two models that our team has used to solve a single offensive task, detecting a sandbox. Using process list data gathered with phishing emails, we will demonstrate the use of Decision Trees and Artificial Neural Networks to successfully classify sandboxes, thereby avoiding unsafe execution. This paper aims to give unique insight into how a real offensive team is using machine learning to support offensive operations.	翻訳日:2022-11-10 15:26:34 公開日:2020-07-14
# マルチスタティック・ローカライゼーションのための深部ニューラルネットワークの不確かさと超音波構造健康モニタリングへの応用 Uncertainty Aware Deep Neural Network for Multistatic Localization with Application to Ultrasonic Structural Health Monitoring ( http://arxiv.org/abs/2007.06814v1 ) ライセンス: Link先を確認	Ishan D. Khurjekar, Joel B. Harley	(参考訳) 誘導超音波定位法は、空間分布型マルチスタティックセンサアレイと一般化ビームフォーミング戦略を用いて、構造物全体の損傷を検出し、発見する。伝播チャネルはしばしば非常に複雑である。波動伝播モデルとデータを比較して損傷を特定できる。しかし、環境の不確実性(例えば温度やストレスの変化)は、しばしば精度を低下させる。本稿では,不確実性を考慮した深層ニューラルネットワークフレームワークを用いて,ロバストな局所化モデルを学習し,不確実性を表現する。訓練データの不確実性に基づき,混合密度ネットワークを用いて損傷位置分布を生成する。これは、出力点推定を行うほとんどのローカライゼーション手法とは対照的である。本手法を一般化ビームフォーミングフレームワークであるmatched field processing(mfp)と比較した。提案手法は, 環境不確かさや騒音がある場合の0.1425mに対して0.0625mのローカライズ誤差を達成する。また,環境不確実性の増加に伴う予測的不確実性は,局所化精度を評価する統計的に有意な指標となることを示す。 Guided ultrasonic wave localization uses spatially distributed multistatic sensor arrays and generalized beamforming strategies to detect and locate damage across a structure. The propagation channel is often very complex. Methods can compare data with models of wave propagation to locate damage. Yet, environmental uncertainty (e.g., temperature or stress variations) often degrade accuracies. This paper uses an uncertainty-aware deep neural network framework to learn robust localization models and represent uncertainty. We use mixture density networks to generate damage location distributions based on training data uncertainty. This is in contrast with most localization methods, which output point estimates. We compare our approach with matched field processing (MFP), a generalized beamforming framework. The proposed approach achieves a localization error of 0.0625 m as compared to 0.1425 m with MFP when data has environmental uncertainty and noise. We also show that the predictive uncertainty scales as environmental uncertainty increases to provide a statistically meaningful metric for assessing localization accuracy.	翻訳日:2022-11-10 15:26:24 公開日:2020-07-14
# SRDCNN:時系列センサ信号分類タスクのための強正規化深部畳み込みニューラルネットワークアーキテクチャ SRDCNN: Strongly Regularized Deep Convolution Neural Network Architecture for Time-series Sensor Signal Classification Tasks ( http://arxiv.org/abs/2007.06909v1 ) ライセンス: Link先を確認	Arijit Ukil, Antonio Jara, Leandro Marin	(参考訳) ディープニューラルネットワーク(DNN)は、特にコンピュータビジョンベースのアプリケーションにおいて、分類および回帰タスクの実行に成功している。近年,IoT(Internet of Things,モノのインターネット)の普及により,時系列データ,特にセンサの分類タスクが最も重要になっている。本稿では, SRDCNN: Strongly Regularized Deep Convolution Neural Network (DCNN) に基づく,時系列分類タスクを実行するディープアーキテクチャを提案する。提案手法の新規性は、ネットワークウェイトが L1 と L2 のノルム法則によって正則化されることである。どちらも、より少ないトレーニングインスタンスの実践的な問題、より迅速なトレーニングプロセスの要求、重みベクトルのスパース化と重み値の制御によるオーバーフィッティングの問題を回避するために、協調的に対処する。提案手法(SRDCNN)と,公開時系列分類ベンチマーク(UCR/UEAアーカイブ)を用いて異なるDNNを含む関連技術アルゴリズムを比較し,提案手法が優れた性能を提供することを示す。 SRDCNNは,実時間時系列センサ信号のトレーニングインスタンス不足問題に対処するために,ネットワークパラメータを深く制御することで,より優れた一般化能力を深層アーキテクチャに保証していると感じている。 Deep Neural Networks (DNN) have been successfully used to perform classification and regression tasks, particularly in computer vision based applications. Recently, owing to the widespread deployment of Internet of Things (IoT), we identify that the classification tasks for time series data, specifically from different sensors are of utmost importance. In this paper, we present SRDCNN: Strongly Regularized Deep Convolution Neural Network (DCNN) based deep architecture to perform time series classification tasks. The novelty of the proposed approach is that the network weights are regularized by both L1 and L2 norm penalties. Both of the regularization approaches jointly address the practical issues of smaller number of training instances, requirement of quicker training process, avoiding overfitting problem by incorporating sparsification of weight vectors as well as through controlling of weight values. We compare the proposed method (SRDCNN) with relevant state-of-the-art algorithms including different DNNs using publicly available time series classification benchmark (the UCR/UEA archive) time series datasets and demonstrate that the proposed method provides superior performance. We feel that SRDCNN warrants better generalization capability to the deep architecture by profoundly controlling the network parameters to combat the training instance insufficiency problem of real-life time series sensor signals.	翻訳日:2022-11-10 15:26:06 公開日:2020-07-14
# ビセクターを追従する:多目的最適化のための簡単な方法 Follow the bisector: a simple method for multi-objective optimization ( http://arxiv.org/abs/2007.06937v1 ) ライセンス: Link先を確認	Alexandr Katrutsa, Daniil Merkulov, Nurislam Tursynbek and Ivan Oseledets	(参考訳) 本研究では,多目的最適化問題を解くための新しい等角方向法(EDM)を提案する。複数の異なる損失を最小化しなければならない最適化問題を考える。提案手法は,各イテレーションにおける降下方向を計算し,目的関数の相対的減少を保証する。この降下方向は個々の損失の正規化勾配に基づいている。したがって、マルチスケール損失を伴う多目的最適化問題を解くのが適切である。標準データセットを用いた不均衡分類問題とマルチタスク学習問題において,提案手法を検証した。 EDMはこれらの問題を解決する他の方法と比較される。 This study presents a novel Equiangular Direction Method (EDM) to solve a multi-objective optimization problem. We consider optimization problems, where multiple differentiable losses have to be minimized. The presented method computes descent direction in every iteration to guarantee equal relative decrease of objective functions. This descent direction is based on the normalized gradients of the individual losses. Therefore, it is appropriate to solve multi-objective optimization problems with multi-scale losses. We test the proposed method on the imbalanced classification problem and multi-task learning problem, where standard datasets are used. EDM is compared with other methods to solve these problems.	翻訳日:2022-11-10 15:25:41 公開日:2020-07-14
# Fenton-Wilkinson順序統計を用いた正規回帰:オリエンテーリングレースを事例として Ordinal Regression with Fenton-Wilkinson Order Statistics: A Case Study of an Orienteering Race ( http://arxiv.org/abs/2007.07369v1 ) ライセンス: Link先を確認	Joonas P\"a\"akk\"onen	(参考訳) スポーツでは、個人とチームは一般的に最終ランキングに興味を持つ。時間や距離などの最終結果は、これらのランキング(場所)を定めている。場所は順序確率変数(一般に順序統計と呼ばれる)にさらに関連付けることができる。そこで本研究では,更新時間を伴うレース場所の中継を予測できる簡易かつ高精度な順序統計順序回帰関数を提案する。この関数をfenton-wilkinson order statistics modelと呼ぶ。このモデルは次のような教育的な仮定に基づいて構築されている。さらに, フェントン・ウィルキンソンは, ドイツの戦車問題と同様に, チームの総数を推定するエスティメータと並行して, チェンジオーバタイムの近似値を用いることが目的である。この元のプレース回帰関数はsgmoidalであり、その結果、他のチームを大きく上回る少数のエリートチームが存在することを正しく予測している。また,本モデルでは,対数正規分布関数のインフレクション点における切替時間とともに,位置が線形に増大する様子を述べる。大規模なオリエンテーリングリレーレースであるJukola 2019の実際のデータから、トレーニングセットのサイズがデータセット全体のわずか5%である場合でも、モデルは極めて正確であることが示されている。また,本モデルでは,線形回帰,モード回帰,ガウス過程回帰よりも局所的根-平均二乗誤差が小さいことを示した。 In sports, individuals and teams are typically interested in final rankings. Final results, such as times or distances, dictate these rankings, also known as places. Places can be further associated with ordered random variables, commonly referred to as order statistics. In this work, we introduce a simple, yet accurate order statistical ordinal regression function that predicts relay race places with changeover-times. We call this function the Fenton-Wilkinson Order Statistics model. This model is built on the following educated assumption: individual leg-times follow log-normal distributions. Moreover, our key idea is to utilize Fenton-Wilkinson approximations of changeover-times alongside an estimator for the total number of teams as in the notorious German tank problem. This original place regression function is sigmoidal and thus correctly predicts the existence of a small number of elite teams that significantly outperform the rest of the teams. Our model also describes how place increases linearly with changeover-time at the inflection point of the log-normal distribution function. With real-world data from Jukola 2019, a massive orienteering relay race, the model is shown to be highly accurate even when the size of the training set is only 5% of the whole data set. Numerical results also show that our model exhibits smaller place prediction root-mean-square-errors than linear regression, mord regression and Gaussian process regression.	翻訳日:2022-11-10 15:20:18 公開日:2020-07-14
# Meta-rPPG:トランスダクティブメタラーナーを用いた遠隔心拍数推定 Meta-rPPG: Remote Heart Rate Estimation Using a Transductive Meta-Learner ( http://arxiv.org/abs/2007.06786v1 ) ライセンス: Link先を確認	Eugene Lee, Evan Chen, Chen-Yi Lee	(参考訳) 遠隔心拍数推定は、被験者と物理的に接触することなく心拍数を計測し、この研究で遠隔胸腔鏡(rPPG)を用いて達成する。 rPPG信号は通常、皮膚のトーンの変化、照明条件、顔の構造など、複数の要因に敏感なビデオカメラを使用して収集される。エンドツーエンドの教師あり学習アプローチは、トレーニングデータが豊富で、テストデータやデプロイメント中の分散からあまり逸脱しない分布をカバーしている。展開中の予期せぬ分布変化に対処するため,自己監督型重み調整(トランスダクティブ推論)のための試験(デプロイ)中に未ラベルのサンプルを採取し,分布変化に迅速に適応するトランスダクティブメタラーナを提案する。このアプローチを用いて,MAHNOB-HCIとUBFC-rPPGの最先端性能を実現する。 Remote heart rate estimation is the measurement of heart rate without any physical contact with the subject and is accomplished using remote photoplethysmography (rPPG) in this work. rPPG signals are usually collected using a video camera with a limitation of being sensitive to multiple contributing factors, e.g. variation in skin tone, lighting condition and facial structure. End-to-end supervised learning approach performs well when training data is abundant, covering a distribution that doesn't deviate too much from the distribution of testing data or during deployment. To cope with the unforeseeable distributional changes during deployment, we propose a transductive meta-learner that takes unlabeled samples during testing (deployment) for a self-supervised weight adjustment (also known as transductive inference), providing fast adaptation to the distributional changes. Using this approach, we achieve state-of-the-art performance on MAHNOB-HCI and UBFC-rPPG.	翻訳日:2022-11-10 15:20:00 公開日:2020-07-14
# BUNET:セキュアなUNETに基づいたブラインド医療画像セグメンテーション BUNET: Blind Medical Image Segmentation Based on Secure UNET ( http://arxiv.org/abs/2007.06855v1 ) ライセンス: Link先を確認	Song Bian, Xiaowei Xu, Weiwen Jiang, Yiyu Shi, Takashi Sato	(参考訳) さまざまなプライバシー規制によって医療記録に課される厳格なセキュリティ要件は、ビッグデータ時代の大きな障害となる。本研究では,データ機密性を保護しつつ,サービス方式としての効率的な機械学習を確保するために,UNETアーキテクチャに基づくプライバシ保存医療画像セグメンテーションを実装したセキュアプロトコルである盲点UNET(BUNET)を提案する。 BUNETでは、同相暗号やガーブロード回路(GC)などの暗号プリミティブを効率よく利用し、UNETニューラルアーキテクチャのための完全なセキュアなプロトコルを設計する。また,高次元入力データを用いたgcベースのセキュアアクティベーションプロトコルの計算ボトルネックを削減するため,広範なアーキテクチャ探索を行う。実験では,本プロトコルのパラメータ空間を徹底的に検討し,精度を損なうことなくベースラインアーキテクチャ上での最先端のセキュアな推論手法と比較して,最大14倍の推論時間を短縮できることを示す。 The strict security requirements placed on medical records by various privacy regulations become major obstacles in the age of big data. To ensure efficient machine learning as a service schemes while protecting data confidentiality, in this work, we propose blind UNET (BUNET), a secure protocol that implements privacy-preserving medical image segmentation based on the UNET architecture. In BUNET, we efficiently utilize cryptographic primitives such as homomorphic encryption and garbled circuits (GC) to design a complete secure protocol for the UNET neural architecture. In addition, we perform extensive architectural search in reducing the computational bottleneck of GC-based secure activation protocols with high-dimensional input data. In the experiment, we thoroughly examine the parameter space of our protocol, and show that we can achieve up to 14x inference time reduction compared to the-state-of-the-art secure inference technique on a baseline architecture with negligible accuracy degradation.	翻訳日:2022-11-10 15:19:42 公開日:2020-07-14
# イコサヘドロンの折り紙クラウン表現を用いた複数の魚眼画像からの360$^\circ$深度推定 360$^\circ$ Depth Estimation from Multiple Fisheye Images with Origami Crown Representation of Icosahedron ( http://arxiv.org/abs/2007.06891v1 ) ライセンス: Link先を確認	Ren Komatsu, Hiromitsu Fujii, Yusuke Tamura, Atsushi Yamashita, Hajime Asama	(参考訳) 本研究では,屋内環境における多方向画像からの全周深度推定手法を提案する。特に,画像から深度を推定する手法として,平面スウィーピングステレオに着目した。オリガミの冠に類似しているため、「CrownConv」と命名した全方位画像に対して、新しいイコサヘドロンに基づく表現とConvNetsを提案する。 crownconvは魚眼画像と等角画像の両方に適用でき、特徴を抽出することができる。さらに,抽出した特徴量からイコサヘドロンのコストボリュームを生成するために,イコサヘドロンを用いた球面スイーピングを提案する。コストボリュームは3次元クラウンコンブを用いて正規化し、コストボリュームから深さ回帰によって最終的な深さを求める。提案手法は,外部カメラパラメータを用いてカメラアライメントに頑健であるため,トレーニングデータセットとカメラアライメントが異なる場合でも,正確な深度推定が可能である。提案する合成データセットのモデルを評価し,その有効性を実証する。提案手法は計算効率がよいため,GPUを搭載したラップトップを用いて,魚眼画像4枚から1秒以内で深度を推定する。そのため、現実世界のロボット応用に適している。ソースコードはhttps://github.com/matsuren/crownconv360depthから入手できます。 In this study, we present a method for all-around depth estimation from multiple omnidirectional images for indoor environments. In particular, we focus on plane-sweeping stereo as the method for depth estimation from the images. We propose a new icosahedron-based representation and ConvNets for omnidirectional images, which we name "CrownConv" because the representation resembles a crown made of origami. CrownConv can be applied to both fisheye images and equirectangular images to extract features. Furthermore, we propose icosahedron-based spherical sweeping for generating the cost volume on an icosahedron from the extracted features. The cost volume is regularized using the three-dimensional CrownConv, and the final depth is obtained by depth regression from the cost volume. Our proposed method is robust to camera alignments by using the extrinsic camera parameters; therefore, it can achieve precise depth estimation even when the camera alignment differs from that in the training dataset. We evaluate the proposed model on synthetic datasets and demonstrate its effectiveness. As our proposed method is computationally efficient, the depth is estimated from four fisheye images in less than a second using a laptop with a GPU. Therefore, it is suitable for real-world robotics applications. Our source code is available at https://github.com/matsuren/crownconv360depth.	翻訳日:2022-11-10 15:19:23 公開日:2020-07-14
# 自己発見、自己分類、自己修復による意味論的表現の学習 Learning Semantics-enriched Representation via Self-discovery, Self-classification, and Self-restoration ( http://arxiv.org/abs/2007.06959v1 ) ライセンス: Link先を確認	Fatemeh Haghighi, Mohammad Reza Hosseinzadeh Taher, Zongwei Zhou, Michael B. Gotway, Jianming Liang	(参考訳) 医療画像は、人間の解剖学に関する豊富な意味論と自然に関連し、繰り返し繰り返される解剖学的パターンに反映され、深い意味表現学習を育むユニークな可能性を提供し、異なる医療応用のための意味論的により強力なモデルを提供する。しかし、そのような強いが自由なセマンティックスを医療画像に埋め込むことができるのかは、まだ明らかにされていない。この目的のために,深層モデルを用いて自己発見,自己分類,および医用画像下の解剖学の自己修復により,意味論的に強化された視覚表現を学習し,意味論的に強化された汎用的3dモデルであるセマンティック・ジェネシス(semantic genesis)を実現する。我々は,様々な医学的特徴(ct,mri,x線)の分類と分節化の両方をカバーする6つの異なる対象課題について,自己監督または完全な監督によって,利用可能なすべての事前学習モデルを用いて意味論的生成を検討する。我々の広範な実験は、セマンティック・ジェネシスが3Dの全てをはるかに上回り、2Dのイメージネットに基づくデファクト・トランスファー学習をはるかに上回っていることを示している。これは、医療画像に埋め込まれた一貫した解剖学から得られる豊富な解剖学的パターンから、深層モデルに説得力のある意味表現を学ぶように促すものです。コードと事前トレーニングされたSemantic Genesisはhttps://github.com/JLiangLab/SemanticGenesis で入手できる。 Medical images are naturally associated with rich semantics about the human anatomy, reflected in an abundance of recurring anatomical patterns, offering unique potential to foster deep semantic representation learning and yield semantically more powerful models for different medical applications. But how exactly such strong yet free semantics embedded in medical images can be harnessed for self-supervised learning remains largely unexplored. To this end, we train deep models to learn semantically enriched visual representation by self-discovery, self-classification, and self-restoration of the anatomy underneath medical images, resulting in a semantics-enriched, general-purpose, pre-trained 3D model, named Semantic Genesis. We examine our Semantic Genesis with all the publicly-available pre-trained models, by either self-supervision or fully supervision, on the six distinct target tasks, covering both classification and segmentation in various medical modalities (i.e.,CT, MRI, and X-ray). Our extensive experiments demonstrate that Semantic Genesis significantly exceeds all of its 3D counterparts as well as the de facto ImageNet-based transfer learning in 2D. This performance is attributed to our novel self-supervised learning framework, encouraging deep models to learn compelling semantic representation from abundant anatomical patterns resulting from consistent anatomies embedded in medical images. Code and pre-trained Semantic Genesis are available at https://github.com/JLiangLab/SemanticGenesis .	翻訳日:2022-11-10 15:18:59 公開日:2020-07-14
# Pose2RGB。絶対位置からの深度とrgb画像の生成 Pose2RGBD. Generating Depth and RGB images from absolute positions ( http://arxiv.org/abs/2007.07013v1 ) ライセンス: Link先を確認	Mihai Cristian P\^irvu	(参考訳) 本稿では,ニューラルネットワークを用いてrgbd画像を自動的に生成するコンピュータビジョンとコンピュータグラフィックスの交点における手法を提案する。モデルはテクスチャ(RGB)と構造(Depth)の両方を再構築できなければならないため、メッシュやポイントクラウドのような明示的な表現とは対照的に、シーンの暗黙的な表現を生成する。このプロセスはニューラルレンダリング(Neural rendering)とみなすことができ、この関数 f : Pose -> RGBD は、グラフィックシミュレーションと同様、生成されたシーンをナビゲートするために使用できる。本稿では2つの新しいデータセットについて紹介する。1つは合成データに基づくデータで,もう1つは映像とgps信号のみを用いて,大学キャンパスのドローン飛行から記録する。最後に,Pose2RGBDネットワークをトレーニングするために,ビデオのみからデータセットを生成する教師なしの手法を提案する。コードとデータセットは: https://gitlab.com/mihaicristianpirvu/pose2rgbd。 We propose a method at the intersection of Computer Vision and Computer Graphics fields, which automatically generates RGBD images using neural networks, based on previously seen and synchronized video, depth and pose signals. Since the models must be able to reconstruct both texture (RGB) and structure (Depth), it creates an implicit representation of the scene, as opposed to explicit ones, such as meshes or point clouds. The process can be thought of as neural rendering, where we obtain a function f : Pose -> RGBD, which we can use to navigate through the generated scene, similarly to graphics simulations. We introduce two new datasets, one based on synthetic data with full ground truth information, while the other one being recorded from a drone flight in an university campus, using only video and GPS signals. Finally, we propose a fully unsupervised method of generating datasets from videos alone, in order to train the Pose2RGBD networks. Code and datasets are available at:: https://gitlab.com/mihaicristianpirvu/pose2rgbd.	翻訳日:2022-11-10 15:18:24 公開日:2020-07-14
# 共有潜在ガウス混合モデルによるクロスドメイン医用画像変換 Cross-Domain Medical Image Translation by Shared Latent Gaussian Mixture Model ( http://arxiv.org/abs/2007.07230v1 ) ライセンス: Link先を確認	Yingying Zhu, Youbao Tang, Yuxing Tang, Daniel C. Elton, Sungwon Lee, Perry J. Pickhardt, Ronald M. Summers	(参考訳) 現在のディープラーニングベースのセグメンテーションモデルは、訓練データ不足のため、ドメイン間の疎結合をよく一般化する。実世界の臨床応用では、異なる領域の医用画像が正確な診断に必要とされるため、クロスドメイン画像解析ツールが要求されている。放射線学における重要な例は、非造影CTから造影CTへの一般化である。異なる位相における造影CTは、特定の病理や臓器を増強するために用いられる。多くの既存のクロスドメイン画像-画像翻訳モデルは、大きな臓器のクロスドメインセグメンテーションを改善することが示されている。しかし、これらのモデルには翻訳過程で微細な構造を維持する能力がないため、大動脈や骨盤動脈の小さな石灰化プラークの分節化など、多くの臨床応用において重要である。医用画像翻訳中に微細な構造を保存するため,ガウス混合モデルから共有潜在変数を用いたパッチベースモデルを提案する。画像翻訳フレームワークを,クロスドメイン画像翻訳における最先端手法と比較し,詳細な構造保存に優れた性能を示す。大動脈プラークと膵のセグメンテーションの変換画像検出とセグメンテーションで2つのタスクをこなすことで,本モデルの優れた性能を検証した。生成された画像の品質が向上し、小さな構造を保存する能力が向上するため、セグメンテーション以外の問題にもフレームワークの有用性が拡張されることを期待します。 Current deep learning based segmentation models often generalize poorly between domains due to insufficient training data. In real-world clinical applications, cross-domain image analysis tools are in high demand since medical images from different domains are often needed to achieve a precise diagnosis. An important example in radiology is generalizing from non-contrast CT to contrast enhanced CTs. Contrast enhanced CT scans at different phases are used to enhance certain pathologies or organs. Many existing cross-domain image-to-image translation models have been shown to improve cross-domain segmentation of large organs. However, such models lack the ability to preserve fine structures during the translation process, which is significant for many clinical applications, such as segmenting small calcified plaques in the aorta and pelvic arteries. In order to preserve fine structures during medical image translation, we propose a patch-based model using shared latent variables from a Gaussian mixture model. We compare our image translation framework to several state-of-the-art methods on cross-domain image translation and show our model does a better job preserving fine structures. The superior performance of our model is verified by performing two tasks with the translated images - detection and segmentation of aortic plaques and pancreas segmentation. We expect the utility of our framework will extend to other problems beyond segmentation due to the improved quality of the generated images and enhanced ability to preserve small structures.	翻訳日:2022-11-10 15:17:29 公開日:2020-07-14
# Transposer:Feature Map を変換畳み込みフィルタとして用いたユニバーサルテクスチャ合成 Transposer: Universal Texture Synthesis Using Feature Maps as Transposed Convolution Filter ( http://arxiv.org/abs/2007.07243v1 ) ライセンス: Link先を確認	Guilin Liu, Rohan Taori, Ting-Chun Wang, Zhiding Yu, Shiqiu Liu, Fitsum A. Reda, Karan Sapra, Andrew Tao, Bryan Catanzaro	(参考訳) テクスチャ合成のための従来のcnnは、(de)コンボリューションとアップ/ダウンサンプリングの一連の層で構成されており、各層はローカルに動作し、テクスチャ合成に必要な長期的な構造依存性を捉えることができない。したがって、彼らはしばしば合理的な合成を行うのではなく、単に入力テクスチャを拡大する。妥協として、近年の多くの手法は、同じ単一の(または固定された)テクスチャイメージ上でのトレーニングとテストによって一般化性を犠牲にしており、その結果、目に見えない画像に対して膨大な再トレーニング時間コストが生じる。本研究では,従来のテクスチャ合成における組立・ステーシング操作が,転置畳み込み操作と類似していることから,転置畳み込み操作を用いた新しい方法を提案する。具体的には, 入力テクスチャの符号化特徴マップ全体を変換畳み込みフィルタとして, 自己相関情報をキャプチャする特徴の自己相似性マップを変換畳み込みの入力として直接扱う。このような設計により、トレーニングされたフレームワークは、ほぼリアルタイムで単一のフォワードパスで、見えないテクスチャの合成を一般化することができます。本手法は,様々な指標に基づき,最先端のテクスチャ合成品質を実現する。自己相似性は入力テクスチャの規則的な構造パターンを保存するのに役立つが、我々のフレームワークは、自己相似性マップの代わりに不規則な入力テクスチャのためのランダムノイズマップを変換畳み込み入力として利用することもできる。より多様な結果を得ることができ、また、1回のパスで大きなノイズマップを直接サンプリングすることで、任意に大きなテクスチャ出力を生成することができる。 Conventional CNNs for texture synthesis consist of a sequence of (de)-convolution and up/down-sampling layers, where each layer operates locally and lacks the ability to capture the long-term structural dependency required by texture synthesis. Thus, they often simply enlarge the input texture, rather than perform reasonable synthesis. As a compromise, many recent methods sacrifice generalizability by training and testing on the same single (or fixed set of) texture image(s), resulting in huge re-training time costs for unseen images. In this work, based on the discovery that the assembling/stitching operation in traditional texture synthesis is analogous to a transposed convolution operation, we propose a novel way of using transposed convolution operation. Specifically, we directly treat the whole encoded feature map of the input texture as transposed convolution filters and the features' self-similarity map, which captures the auto-correlation information, as input to the transposed convolution. Such a design allows our framework, once trained, to be generalizable to perform synthesis of unseen textures with a single forward pass in nearly real-time. Our method achieves state-of-the-art texture synthesis quality based on various metrics. While self-similarity helps preserve the input textures' regular structural patterns, our framework can also take random noise maps for irregular input textures instead of self-similarity maps as transposed convolution inputs. It allows to get more diverse results as well as generate arbitrarily large texture outputs by directly sampling large noise maps in a single pass as well.	翻訳日:2022-11-10 15:17:04 公開日:2020-07-14
# 意味セグメンテーションにおける限定データとアノテーションの問題に取り組む Tackling the Problem of Limited Data and Annotations in Semantic Segmentation ( http://arxiv.org/abs/2007.07357v1 ) ライセンス: Link先を確認	Ahmadreza Jeddi	(参考訳) 本研究では,小さな画像データセット(PASCAL VOC 2012からランダムに選択された1000個の画像)におけるセマンティックセグメンテーション(セマンティックセグメンテーション)について検討した。特に,画像セグメンテーションにおける限られたデータアノテーションの問題に対処するため,画像セグメンテーション性能を向上させるために,様々な事前訓練されたモデルとCRFベースの手法を転送する。この目的のために、RotNet、DeeperCluster、Semi&Weakly Supervised Learning (SWSL)事前訓練されたモデルをDeepLab-v2ベースラインで転送、微調整し、高密度CRFを後処理および損失正規化技術として適用する。私の研究の結果は、この小さなデータセットでは、プリトレーニングされたresnet50 swslモデルを使用することで、imagenetプリトレーニングモデルよりも7.4%優れた結果が得られることを示しています。一方、高密度CRFは非常に有効であることが示され、弱い教師付きトレーニングにおける損失正規化技術や後処理ツールとしての結果が高められる。 In this work, the case of semantic segmentation on a small image dataset (simulated by 1000 randomly selected images from PASCAL VOC 2012), where only weak supervision signals (scribbles from user interaction) are available is studied. Especially, to tackle the problem of limited data annotations in image segmentation, transferring different pre-trained models and CRF based methods are applied to enhance the segmentation performance. To this end, RotNet, DeeperCluster, and Semi&Weakly Supervised Learning (SWSL) pre-trained models are transferred and finetuned in a DeepLab-v2 baseline, and dense CRF is applied both as a post-processing and loss regularization technique. The results of my study show that, on this small dataset, using a pre-trained ResNet50 SWSL model gives results that are 7.4% better than applying an ImageNet pre-trained model; moreover, for the case of training on the full PASCAL VOC 2012 training data, this pre-training approach increases the mIoU results by almost 4%. On the other hand, dense CRF is shown to be very effective as well, enhancing the results both as a loss regularization technique in weakly supervised training and as a post-processing tool.	翻訳日:2022-11-10 15:10:04 公開日:2020-07-14
# 深層畳み込みニューラルネットワークを用いたusgs歴史地図列からの道路交差点点の自動抽出 Automatic extraction of road intersection points from USGS historical map series using deep convolutional neural networks ( http://arxiv.org/abs/2007.07404v1 ) ライセンス: Link先を確認	Mahmoud Saeedimoghaddam and T. F. Stepinski	(参考訳) 道路交差点のデータは様々な地理空間的応用と分析に利用されている。 GIS以前の道路網のデータセットは、歴史印刷された地図の形でしか利用できない。 GISソフトウェアで解析する前には、スキャンして、使用可能なベクトルベースのフォーマットに変換する必要がある。スキャンされた歴史的地図の膨大な量のため、それらをデジタルデータセットに変換する自動化方法が採用される必要がある。このプロセスはコンピュータビジョンアルゴリズムに基づくことが多い。しかし、低品質かつ視覚的に複雑なマップと最適パラメータの設定のための変換精度は、これらのアルゴリズムを使用する際の2つの課題である。本稿では,地域別CNNと呼ばれるオブジェクト検出タスクにディープ畳み込みニューラルネットワークを用いる標準的なパラダイムを用いて,米国各都市の歴史的USGS地図における道路交差点の自動同定を行った。その結果,道路地図の複線地図表現における変換精度は,単線地図よりも高いことがわかった。また、従来のコンピュータビジョンアルゴリズムと比較して、RCNNはより正確な抽出を提供する。最後に, 検出出力における誤差の量は, 地図の複雑さや曖昧さに敏感であるとともに, 内部のRGB組み合わせの数にも敏感であることを示した。 Road intersections data have been used across different geospatial applications and analysis. The road network datasets dating from pre-GIS years are only available in the form of historical printed maps. Before they can be analyzed by a GIS software, they need to be scanned and transformed into the usable vector-based format. Due to the great bulk of scanned historical maps, automated methods of transforming them into digital datasets need to be employed. Frequently, this process is based on computer vision algorithms. However, low conversion accuracy for low quality and visually complex maps and setting optimal parameters are the two challenges of using those algorithms. In this paper, we employed the standard paradigm of using deep convolutional neural network for object detection task named region-based CNN for automatically identifying road intersections in scanned historical USGS maps of several U.S. cities. We have found that the algorithm showed higher conversion accuracy for the double line cartographic representations of the road maps than the single line ones. Also, compared to the majority of traditional computer vision algorithms RCNN provides more accurate extraction. Finally, the results show that the amount of errors in the detection outputs is sensitive to complexity and blurriness of the maps as well as the number of distinct RGB combinations within them.	翻訳日:2022-11-10 15:09:39 公開日:2020-07-14
# 言語・コミュニケーション・社会 : ジェンダーに基づく言語分析 Language, communication and society: a gender based linguistics analysis ( http://arxiv.org/abs/2007.06908v1 ) ライセンス: Link先を確認	P. Cutugno, D. Chiarella, R. Lucentini, L. Marconi and G. Morgavi	(参考訳) 本研究の目的は,言語が思考の鏡であり,偏見であり,文化的ステレオタイプであるとする仮説を支持する証拠を見つけることである。 537名を対象にアンケート調査を行った。回答は、心理的特徴や行動特性の帰属など、性別のステレオタイプが存在するかどうかを調べるために分析されてきた。特に、現代社会における男女の役割を定義する際に現れるステレオタイプ画像が何であるかを識別することを目的としていた。さらに、与えられた結果は、性別のステレオタイプと、それらが生み出す期待が、罰や不平等をもたらすかどうかを理解するための良い出発点となる。もしそうなら、言語とその使用は本質的にジェンダーバイアスを生じさせ、日々の生活でも仕事の設定でも評価に影響します。 The purpose of this study is to find evidence for supporting the hypothesis that language is the mirror of our thinking, our prejudices and cultural stereotypes. In this analysis, a questionnaire was administered to 537 people. The answers have been analysed to see if gender stereotypes were present such as the attribution of psychological and behavioural characteristics. In particular, the aim was to identify, if any, what are the stereotyped images, which emerge in defining the roles of men and women in modern society. Moreover, the results given can be a good starting point to understand if gender stereotypes, and the expectations they produce, can result in penalization or inequality. If so, the language and its use would create inherently a gender bias, which influences evaluations both in work settings both in everyday life.	翻訳日:2022-11-10 15:09:22 公開日:2020-07-14
# ポートノイズ調査に最も適した調査を定義するためのアンケート調査分析 Questionnaire analysis to define the most suitable survey for port-noise investigation ( http://arxiv.org/abs/2007.06915v1 ) ライセンス: Link先を確認	Andrea Cerniglia, Davide Chiarella, Paola Cutugno, Lucia Marconi, Anna Magrini, Gelsomina Di Feo, Melissa Ferretti	(参考訳) 港湾とロジスティックプラットフォームの間の地域に影響を与える高レベルの騒音汚染は、異なる観点から直面することができる問題である。音響モニタリング,マッピング,短期計測,港湾および道路交通流の解析は,この問題のより良い管理のために提案すべき戦略について有用な指標を与えることができる。バックポート地域の騒音に曝された住民へのアンケート作成による調査活動は,主観的視点の理解を深める上で有用である。本論文は,国際的に提案されている主観的調査のためのアンケートデータベースの一部として選択された,特定の研究に適した質問のサンプルを分析する。第1次データ収集キャンペーンの予備結果は,調査に使用する数,質問の種類,サンプルノイズの種類の妥当性を検証するために検討された。調査はTRIPLOプロジェクト(TRansports and Innovative sustainable connection between Ports and LOGistic platform)に分散するよう最適化される。本調査の結果は,音響モニタリングと組み合わせて行った言語調査の出発点となり,個人の感情と技術的側面との関係の理解を深める。 The high level of noise pollution affecting the areas between ports and logistic platforms represents a problem that can be faced from different points of view. Acoustic monitoring, mapping, short-term measurements, port and road traffic flows analyses can give useful indications on the strategies to be proposed for a better management of the problem. A survey campaign through the preparation of questionnaires to be submitted to the population exposed to noise in the back-port areas will help to better understand the subjective point of view. The paper analyses a sample of questions suitable for the specific research, chosen as part of the wide database of questionnaires internationally proposed for subjective investigations. The preliminary results of a first data collection campaign are considered to verify the adequacy of the number, the type of questions, and the type of sample noise used for the survey. The questionnaire will be optimized to be distributed in the TRIPLO project (TRansports and Innovative sustainable connections between Ports and LOgistic platforms). The results of this survey will be the starting point for the linguistic investigation carried out in combination with the acoustic monitoring, to improve understanding the connections between personal feeling and technical aspects.	翻訳日:2022-11-10 15:09:08 公開日:2020-07-14
# Covidex:COVID-19オープン研究データセットのニューラルネットワークランキングモデルとキーワード検索基盤 Covidex: Neural Ranking Models and Keyword Search Infrastructure for the COVID-19 Open Research Dataset ( http://arxiv.org/abs/2007.07846v1 ) ライセンス: Link先を確認	Edwin Zhang, Nikhil Gupta, Raphael Tang, Xiao Han, Ronak Pradeep, Kuang Lu, Yue Zhang, Rodrigo Nogueira, Kyunghyun Cho, Hui Fang, Jimmy Lin	(参考訳) 我々は、最新のニューラルネットワークランキングモデルを利用して、Allen Institute for AIがキュレートしたCOVID-19 Open Research Datasetに情報アクセスを提供する検索エンジンであるCovidexを紹介する。当社のシステムは,2020年3月下旬以降,オンラインでユーザに提供する。 covidexは、現在進行中の世界的なパンデミックに取り組むドメインエキスパートを支援する技術を開発するための、3段階の戦略のユーザアプリケーションコンポーネントです。さらに、成熟したフュージョンベースの手法を利用する堅牢で使いやすいキーワード検索インフラストラクチャや、他のアプリケーションに組み込むことのできるスタンドアロンのニューラルネットワークランキングモデルも提供しています。私たちのインフラとベースラインは多くの参加者によって採用されています。第3ラウンドでは、前回のトレーニングデータと第2のフルオートマチックランを活用した最高スケアランを報告します。 We present Covidex, a search engine that exploits the latest neural ranking models to provide information access to the COVID-19 Open Research Dataset curated by the Allen Institute for AI. Our system has been online and serving users since late March 2020. The Covidex is the user application component of our three-pronged strategy to develop technologies for helping domain experts tackle the ongoing global pandemic. In addition, we provide robust and easy-to-use keyword search infrastructure that exploits mature fusion-based methods as well as standalone neural ranking models that can be incorporated into other applications. These techniques have been evaluated in the ongoing TREC-COVID challenge: Our infrastructure and baselines have been adopted by many participants, including some of the highest-scoring runs in rounds 1, 2, and 3. In round 3, we report the highest-scoring run that takes advantage of previous training data and the second-highest fully automatic run.	翻訳日:2022-11-10 15:07:59 公開日:2020-07-14
# クロスモーダル変調と選択によるRGB-D能動物体検出 RGB-D Salient Object Detection with Cross-Modality Modulation and Selection ( http://arxiv.org/abs/2007.07051v1 ) ライセンス: Link先を確認	Chongyi Li and Runmin Cong and Yongri Piao and Qianqian Xu and Chen Change Loy	(参考訳) 本稿では, RGB-D salient Object Detection (SOD) において, モーダリティの相互補完性を段階的に統合し, 改良する有効な方法を提案する。提案するネットワークは主に2つの課題を解決している。 1)RGB画像とその対応する深度マップからの補完情報を効果的に統合する方法、及び 2) より衛生的な特徴を適応的に選択する方法。まず,rgb-dデータの相補関係をモデル化する奥行き特徴を予め考慮し,特徴表現を強調するクロスモダリティ特徴変調(cmfm)モジュールを提案する。第2に,サリエンシー関連特徴を選択し,下位特徴を抑圧する適応特徴選択(afs)モジュールを提案する。 AFSモジュールは、自己モダリティとチャネル特徴の相互依存性を考慮した多モード空間的特徴融合を利用する。第3に,saliency-guided position-edge attention(sg-pea)モジュールを使用して,ネットワークがsariency-related regionに集中するよう促す。上記のモジュール全体であるcmMSブロック(英語版)は、粗い微細な方法での塩分濃度特性の洗練を促進する。ボトムアップ推論と組み合わせて、改良されたサリエンシ機能は正確かつエッジ保存のSODを可能にする。大規模な実験により、我々のネットワークは6つのRGB-D SODベンチマークで最先端の精度検出器より優れていることが示された。 We present an effective method to progressively integrate and refine the cross-modality complementarities for RGB-D salient object detection (SOD). The proposed network mainly solves two challenging issues: 1) how to effectively integrate the complementary information from RGB image and its corresponding depth map, and 2) how to adaptively select more saliency-related features. First, we propose a cross-modality feature modulation (cmFM) module to enhance feature representations by taking the depth features as prior, which models the complementary relations of RGB-D data. Second, we propose an adaptive feature selection (AFS) module to select saliency-related features and suppress the inferior ones. The AFS module exploits multi-modality spatial feature fusion with the self-modality and cross-modality interdependencies of channel features are considered. Third, we employ a saliency-guided position-edge attention (sg-PEA) module to encourage our network to focus more on saliency-related regions. The above modules as a whole, called cmMS block, facilitates the refinement of saliency features in a coarse-to-fine fashion. Coupled with a bottom-up inference, the refined saliency features enable accurate and edge-preserving SOD. Extensive experiments demonstrate that our network outperforms state-of-the-art saliency detectors on six popular RGB-D SOD benchmarks.	翻訳日:2022-11-10 15:02:18 公開日:2020-07-14
# 書き手識別と書き手検索のための再ランク付け Re-ranking for Writer Identification and Writer Retrieval ( http://arxiv.org/abs/2007.07101v1 ) ライセンス: Link先を確認	Simon Jordan, Mathias Seuret, Pavel Kr\'al, Ladislav Lenc, Ji\v{r}\'i Mart\'inek, Barbara Wiermann, Tobias Schwinger, Andreas Maier, Vincent Christlein	(参考訳) 自動ライタ識別は文書解析において一般的な問題である。 state-of-the-artメソッドは通常、従来的あるいはディープラーニングベースのテクニックによる特徴抽出ステップにフォーカスする。検索問題では、再ランク付けは結果を改善するのによく使われる手法である。ランク付けされた結果に含まれる知識を用いて、初期ランク付け結果を洗練する。 g. 最寄りの近隣関係を利用する。私たちの知る限りでは、再ランク付けはライターの識別/再利用には使われていません。考えられる理由は、公開利用可能なベンチマークデータセットには、書き込み毎のサンプルがわずかしかないため、再ランク付けが期待できないことだ。著者1人あたりのサンプル数が少ない場合でも,k-相反的近傍関係に基づく再ランク付けが,著者識別に有利であることを示す。これらの相互関係は、もともと提案されたような新しいベクトルにエンコードするか、クエリ拡張の観点でそれらを統合するかの2つの方法で利用します。両手法が3つの著者識別データセット上でmAPの基準値よりも優れていることを示す。 Automatic writer identification is a common problem in document analysis. State-of-the-art methods typically focus on the feature extraction step with traditional or deep-learning-based techniques. In retrieval problems, re-ranking is a commonly used technique to improve the results. Re-ranking refines an initial ranking result by using the knowledge contained in the ranked result, e. g., by exploiting nearest neighbor relations. To the best of our knowledge, re-ranking has not been used for writer identification/retrieval. A possible reason might be that publicly available benchmark datasets contain only few samples per writer which makes a re-ranking less promising. We show that a re-ranking step based on k-reciprocal nearest neighbor relationships is advantageous for writer identification, even if only a few samples per writer are available. We use these reciprocal relationships in two ways: encode them into new vectors, as originally proposed, or integrate them in terms of query-expansion. We show that both techniques outperform the baseline results in terms of mAP on three writer identification datasets.	翻訳日:2022-11-10 15:00:19 公開日:2020-07-14
# 画像生成と編集のためのアートワークフローのモデリング Modeling Artistic Workflows for Image Generation and Editing ( http://arxiv.org/abs/2007.07238v1 ) ライセンス: Link先を確認	Hung-Yu Tseng, Matthew Fisher, Jingwan Lu, Yijun Li, Vladimir Kim, Ming-Hsuan Yang	(参考訳) 人々は、デザイン全体を伝える複数のステージを含む芸術的なワークフローに従うことで、しばしばアートを作成する。アーティストが初期の決定を修正したい場合、この新たな決定を最終的な作品に広めるために重要な作業が必要となる。上記の観察に動機づけられ,既存の芸術作品の多段階画像生成と多段階画像編集の両方を可能にする,所定の芸術的ワークフローに従う生成モデルを提案する。さらに, 編集シナリオでは, モデルが生成した編集画像が元の画像と密接に一致するように, 学習に基づく正規化とともに最適化プロセスを導入する。 3つの異なる芸術的データセットの質的および定量的な結果は、画像生成と編集の両方におけるフレームワークの有効性を示す。 People often create art by following an artistic workflow involving multiple stages that inform the overall design. If an artist wishes to modify an earlier decision, significant work may be required to propagate this new decision forward to the final artwork. Motivated by the above observations, we propose a generative model that follows a given artistic workflow, enabling both multi-stage image generation as well as multi-stage image editing of an existing piece of art. Furthermore, for the editing scenario, we introduce an optimization process along with learning-based regularization to ensure the edited image produced by the model closely aligns with the originally provided image. Qualitative and quantitative results on three different artistic datasets demonstrate the effectiveness of the proposed framework on both image generation and editing tasks.	翻訳日:2022-11-10 14:58:58 公開日:2020-07-14
# JSENet:3Dポイントクラウドのための共同セマンティックセグメンテーションとエッジ検出ネットワーク JSENet: Joint Semantic Segmentation and Edge Detection Network for 3D Point Clouds ( http://arxiv.org/abs/2007.06888v1 ) ライセンス: Link先を確認	Zeyu Hu, Mingmin Zhen, Xuyang Bai, Hongbo Fu and Chiew-lan Tai	(参考訳) セマンティックセグメンテーションとセマンティックエッジ検出は、コンピュータビジョンにおける密接な関係を持つ2つの双対問題と見なすことができる。学習に基づく3Dセマンティックセグメンテーション法の急速な進化にもかかわらず、3Dセマンティックエッジ検出器の学習には注意が向けられていない。本稿では,3次元意味エッジ検出タスクを初めて取り上げ,これら2つのタスクを共同で実行する新たな2ストリーム完全畳み込みネットワークを提案する。特に,両タスクの性能向上のために,領域情報とエッジ情報を明示的に関連付ける共同改良モジュールを設計する。さらに,ネットワークが境界を良くして意味的セグメンテーション結果を生成することを促す新しい損失関数を提案する。 S3DISおよびScanNetデータセットの大規模評価により,本手法はセマンティックセグメンテーションの最先端手法よりも高い性能を示し,セマンティックエッジ検出のベースライン手法よりも優れていた。コードリリース:https://github.com/hzykent/JSENet Semantic segmentation and semantic edge detection can be seen as two dual problems with close relationships in computer vision. Despite the fast evolution of learning-based 3D semantic segmentation methods, little attention has been drawn to the learning of 3D semantic edge detectors, even less to a joint learning method for the two tasks. In this paper, we tackle the 3D semantic edge detection task for the first time and present a new two-stream fully-convolutional network that jointly performs the two tasks. In particular, we design a joint refinement module that explicitly wires region information and edge information to improve the performances of both tasks. Further, we propose a novel loss function that encourages the network to produce semantic segmentation results with better boundaries. Extensive evaluations on S3DIS and ScanNet datasets show that our method achieves on par or better performance than the state-of-the-art methods for semantic segmentation and outperforms the baseline methods for semantic edge detection. Code release: https://github.com/hzykent/JSENet	翻訳日:2022-11-10 14:53:22 公開日:2020-07-14
# 歴史的文書化のための共同レイアウト解析・文字検出・認識 Joint Layout Analysis, Character Detection and Recognition for Historical Document Digitization ( http://arxiv.org/abs/2007.06890v1 ) ライセンス: Link先を確認	Weihong Ma, Hesuo Zhang, Lianwen Jin, Sihang Wu, Jiapeng Wang, Yongpan Wang	(参考訳) 本稿では,正しい読み順に従って履歴文書を復元するためのエンドツーエンドの学習フレームワークを提案する。このフレームワークでは、キャラクタブランチとレイアウトブランチという2つのブランチが特徴抽出ネットワークの背後に追加される。文字ブランチは、文書画像中の個々の文字をローカライズし、同時に認識する。次に,テキスト行にグループ化するための後処理手法を採用する。完全な畳み込みネットワークに基づくレイアウト分岐は、バイナリマスクを出力する。次に,バイナリマスクの行検出にhough変換を使用し,文字結果とレイアウト情報を組み合わせて文書コンテンツを復元する。これら2つの枝は並行して訓練でき、容易に訓練できる。さらに,認識誤差を最小化する再スコア機構を提案する。中国の歴史文書MTHv2データセットの実験結果から,提案手法の有効性が示された。 In this paper, we propose an end-to-end trainable framework for restoring historical documents content that follows the correct reading order. In this framework, two branches named character branch and layout branch are added behind the feature extraction network. The character branch localizes individual characters in a document image and recognizes them simultaneously. Then we adopt a post-processing method to group them into text lines. The layout branch based on fully convolutional network outputs a binary mask. We then use Hough transform for line detection on the binary mask and combine character results with the layout information to restore document content. These two branches can be trained in parallel and are easy to train. Furthermore, we propose a re-score mechanism to minimize recognition error. Experiment results on the extended Chinese historical document MTHv2 dataset demonstrate the effectiveness of the proposed framework.	翻訳日:2022-11-10 14:52:40 公開日:2020-07-14
# 人-物体相互作用検出のためのグラフに基づく対話型推論 A Graph-based Interactive Reasoning for Human-Object Interaction Detection ( http://arxiv.org/abs/2007.06925v1 ) ライセンス: Link先を確認	Dongming Yang and Yuexian Zou	(参考訳) 人間-物体相互作用(Human-Object Interaction, HOI)検出は,<人,動詞,オブジェクト>の推論によって,人間が周囲の物体とどのように相互作用するかを学ぶ。しかし、最近のhoi検出手法は、主に追加のアノテーション(人間のポーズなど)と、畳み込みを超えて強力な対話的推論を無視する。本稿では,対話型意味論を視覚的対象に対して効果的に活用する,インタラクティブグラフ(in-Graph)と呼ばれる新しいグラフベースの対話型推論モデルを提案する。提案モデルは,コンボリューション空間からグラフベースのセマンティック空間へ関連ターゲットをマッピングするプロジェクト関数と,すべてのノード間のセマンティクスを伝播するメッセージパッシングプロセスと,理由付けられたノードを畳み込み空間に変換する更新関数とから構成される。さらに,新たなフレームワークを構築して,HOI,すなわち-GraphNetを検出する。このフレームワークは、それぞれインスタンス機能を使用してHOIを推論する以外に、2レベルイングラフ、すなわちシーンワイドとインスタンスワイドイングラフを統合することで、視覚的ターゲット間のペアワイズなセマンティクスを動的に解析する。私たちのフレームワークはエンドツーエンドでトレーニング可能で、人間のポーズのような高価なアノテーションは不要です。 V-COCOとHICO-DETのベンチマークにおいて,提案手法が既存のHOI検出法より優れ,ベースラインが約9.4%,15%向上し,HOI検出の有効性が検証された。 Human-Object Interaction (HOI) detection devotes to learn how humans interact with surrounding objects via inferring triplets of < human, verb, object >. However, recent HOI detection methods mostly rely on additional annotations (e.g., human pose) and neglect powerful interactive reasoning beyond convolutions. In this paper, we present a novel graph-based interactive reasoning model called Interactive Graph (abbr. in-Graph) to infer HOIs, in which interactive semantics implied among visual targets are efficiently exploited. The proposed model consists of a project function that maps related targets from convolution space to a graph-based semantic space, a message passing process propagating semantics among all nodes and an update function transforming the reasoned nodes back to convolution space. Furthermore, we construct a new framework to assemble in-Graph models for detecting HOIs, namely in-GraphNet. Beyond inferring HOIs using instance features respectively, the framework dynamically parses pairwise interactive semantics among visual targets by integrating two-level in-Graphs, i.e., scene-wide and instance-wide in-Graphs. Our framework is end-to-end trainable and free from costly annotations like human pose. Extensive experiments show that our proposed framework outperforms existing HOI detection methods on both V-COCO and HICO-DET benchmarks and improves the baseline about 9.4% and 15% relatively, validating its efficacy in detecting HOIs.	翻訳日:2022-11-10 14:52:10 公開日:2020-07-14
# 特徴等化をもつ相互エンコーダデコーダによる画像の描画再考 Rethinking Image Inpainting via a Mutual Encoder-Decoder with Feature Equalizations ( http://arxiv.org/abs/2007.06929v1 ) ライセンス: Link先を確認	Hongyu Liu, Bin Jiang, Yibing Song, Wei Huang, and Chao Yang	(参考訳) ディープエンコーダデコーダベースのcnnは、ホール充填のための高度なイメージインペインティング手法を備えている。既存の手法では、ホール領域の構造とテクスチャを段階的に復元するが、通常は2つのエンコーダデコーダを使用して別々のリカバリを行う。各エンコーダのCNN機能は、それら全体を考慮せずに、欠落した構造やテクスチャをキャプチャする。これらのエンコーダの不十分な利用により、構造とテクスチャの回復性能が制限される。本稿では,相互エンコーダとデコーダのCNNを用いて,両者の結合回復を提案する。入力画像の構造とテクスチャをそれぞれ表現するために,エンコーダの深層と浅層からのcnn機能を使用する。深層特徴は構造分岐に送られ、浅層特徴はテクスチャ分岐に送られる。各ブランチでは、CNNの機能の複数のスケールで穴を埋めます。両方のブランチから満たされたCNN機能は連結され、その後等化される。特徴均等化の際,まずチャネルの注意を尊重し,空間等化を実現するための双方向伝搬活性化関数を提案する。この目的のために、構造とテクスチャの満たされたCNN特徴は、すべての特徴レベルで画像コンテンツを表現するのに相互に有利である。我々は、スキップ接続による出力画像生成のためのデコーダ機能を補うために等化機能を使用する。評価実験の結果,提案手法は構造やテクスチャの復元に有効であり,最先端のアプローチに対して良好に機能することがわかった。 Deep encoder-decoder based CNNs have advanced image inpainting methods for hole filling. While existing methods recover structures and textures step-by-step in the hole regions, they typically use two encoder-decoders for separate recovery. The CNN features of each encoder are learned to capture either missing structures or textures without considering them as a whole. The insufficient utilization of these encoder features limit the performance of recovering both structures and textures. In this paper, we propose a mutual encoder-decoder CNN for joint recovery of both. We use CNN features from the deep and shallow layers of the encoder to represent structures and textures of an input image, respectively. The deep layer features are sent to a structure branch and the shallow layer features are sent to a texture branch. In each branch, we fill holes in multiple scales of the CNN features. The filled CNN features from both branches are concatenated and then equalized. During feature equalization, we reweigh channel attentions first and propose a bilateral propagation activation function to enable spatial equalization. To this end, the filled CNN features of structure and texture mutually benefit each other to represent image content at all feature levels. We use the equalized feature to supplement decoder features for output image generation through skip connections. Experiments on the benchmark datasets show the proposed method is effective to recover structures and textures and performs favorably against state-of-the-art approaches.	翻訳日:2022-11-10 14:51:39 公開日:2020-07-14
# 高精度スケール推定のための適応的提案選択による相関フィルタ追跡 Correlation filter tracking with adaptive proposal selection for accurate scale estimation ( http://arxiv.org/abs/2007.07018v1 ) ライセンス: Link先を確認	Luo Xiong, Yanjie Liang, Yan Yan, Hanzi Wang	(参考訳) 近年,相関フィルタを用いた検出手法が最先端の追跡結果を達成している。しかし、提案生成器によって与えられる多くの冗長な提案は、これらのトラッカーの性能と速度を低下させる可能性がある。本稿では,視覚物体追跡のためのスケール変動問題に対処するために,少数の高品質提案を生成できる適応型提案選択アルゴリズムを提案する。具体的には、まず、HSV色空間における色ヒストグラムを用いて、インスタンス(例えば、最初のフレームにおける初期ターゲットと、前のフレームにおける予測ターゲット)と提案を行う。そして、色相似性に基づく適応戦略を定式化し、高品質の提案を選択する。さらに,提案する適応型提案選択アルゴリズムを細かな深層特徴と統合することで,トラッカの一般化と効率性を検証する。 2つのベンチマークデータセットの実験では、提案アルゴリズムがいくつかの最先端トラッカーに対して好適に動作することを示した。 Recently, some correlation filter based trackers with detection proposals have achieved state-of-the-art tracking results. However, a large number of redundant proposals given by the proposal generator may degrade the performance and speed of these trackers. In this paper, we propose an adaptive proposal selection algorithm which can generate a small number of high-quality proposals to handle the problem of scale variations for visual object tracking. Specifically, we firstly utilize the color histograms in the HSV color space to represent the instances (i.e., the initial target in the first frame and the predicted target in the previous frame) and proposals. Then, an adaptive strategy based on the color similarity is formulated to select high-quality proposals. We further integrate the proposed adaptive proposal selection algorithm with coarse-to-fine deep features to validate the generalization and efficiency of the proposed tracker. Experiments on two benchmark datasets demonstrate that the proposed algorithm performs favorably against several state-of-the-art trackers.	翻訳日:2022-11-10 14:50:21 公開日:2020-07-14
# 一般属性予測のための教師学生ネットワークによる半教師付き学習 Semi-supervised Learning with a Teacher-student Network for Generalized Attribute Prediction ( http://arxiv.org/abs/2007.06769v1 ) ライセンス: Link先を確認	Minchul Shin	(参考訳) 本稿では,視覚特性予測問題を解くための半教師付き学習について述べる。視覚アルゴリズムの多くの応用において、物体の視覚特性の正確な認識は重要であるが、それでも難しい。これは属性のクラス階層の定義があいまいであるため、トレーニングデータは必然的にクラスの不均衡とラベルのスパーシティに苦しむため、効果的なアノテーションが欠如している。直感的な解決策は、ラベルのない画像を利用して画像表現を効果的に学習する方法を見つけることである。そこで本研究では,マルチタスク学習と半教師学習の蒸留に触発されたマルチティーチャー・シングルスチューデント(mtss)アプローチを提案する。我々のMTSSはラベル埋め込み技術を用いて教師ネットワークと呼ばれるタスク固有のドメインエキスパートを学習し、モデルにドメインエキスパートが学習した分布を模倣するように強制することで学生ネットワークと呼ばれる統一モデルを学ぶ。提案手法は, ファッション属性予測のための様々なベンチマークにおいて, 競争性能を達成するだけでなく, ドメイン間適応性やロバスト性も向上することを示した。 This paper presents a study on semi-supervised learning to solve the visual attribute prediction problem. In many applications of vision algorithms, the precise recognition of visual attributes of objects is important but still challenging. This is because defining a class hierarchy of attributes is ambiguous, so training data inevitably suffer from class imbalance and label sparsity, leading to a lack of effective annotations. An intuitive solution is to find a method to effectively learn image representations by utilizing unlabeled images. With that in mind, we propose a multi-teacher-single-student (MTSS) approach inspired by the multi-task learning and the distillation of semi-supervised learning. Our MTSS learns task-specific domain experts called teacher networks using the label embedding technique and learns a unified model called a student network by forcing a model to mimic the distributions learned by domain experts. Our experiments demonstrate that our method not only achieves competitive performance on various benchmarks for fashion attribute prediction, but also improves robustness and cross-domain adaptability for unseen domains.	翻訳日:2022-11-10 14:43:45 公開日:2020-07-14
# Face to Purchase: 顔構造と行動特性を組み込んだ消費者選択予測 Face to Purchase: Predicting Consumer Choices with Structured Facial and Behavioral Traits Embedding ( http://arxiv.org/abs/2007.06842v1 ) ライセンス: Link先を確認	Zhe Liu, Xianzhi Wang, Lina Yao, Jake An, Lei Bai, Ee-Peng Lim	(参考訳) 消費者の購買行動を予測することは、eコマースのターゲット広告や販売促進にとって重要である。人間の顔は、消費者の性格や行動特性に関する洞察を得るための貴重な情報源である。しかし、消費者の顔は、これまでの研究ではほとんど研究されておらず、既存の顔関連研究は、顔データから学ぶことのビジネス的重要性を無視しながら、パーソナリティ特性のようなハイレベルな特徴に焦点を当てている。顔の特徴や購買履歴から消費者の購買予測を行う。我々は,階層的埋め込みネットワークに基づく半教師付きモデルを設計し,消費者の高レベルな特徴を抽出し,消費者の最上位の購入先を予測する。実世界のデータセットを用いた実験結果から,消費者の購買行動予測に顔情報を導入する効果が示された。 Predicting consumers' purchasing behaviors is critical for targeted advertisement and sales promotion in e-commerce. Human faces are an invaluable source of information for gaining insights into consumer personality and behavioral traits. However, consumer's faces are largely unexplored in previous research, and the existing face-related studies focus on high-level features such as personality traits while neglecting the business significance of learning from facial data. We propose to predict consumers' purchases based on their facial features and purchasing histories. We design a semi-supervised model based on a hierarchical embedding network to extract high-level features of consumers and to predict the top-$N$ purchase destinations of a consumer. Our experimental results on a real-world dataset demonstrate the positive effect of incorporating facial information in predicting consumers' purchasing behaviors.	翻訳日:2022-11-10 14:42:42 公開日:2020-07-14
# 社会的・文脈的に認知される人間の動きとポーズ予測 Socially and Contextually Aware Human Motion and Pose Forecasting ( http://arxiv.org/abs/2007.06843v1 ) ライセンス: Link先を確認	Vida Adeli, Ehsan Adeli, Ian Reid, Juan Carlos Niebles, Hamid Rezatofighi	(参考訳) 人間と対話しながらスムーズでシームレスなロボットナビゲーションは、人間の動きを予測することに依存する。このような人間のダイナミクスの予測には、人間の軌跡(球運動)や詳細な体の動き(局所運動)をモデル化することが多い。先行研究は通常、地域と世界の動きを別々に取り組んだ。本稿では,人間の動作(あるいは軌道)と身体骨格のポーズ予測の両方を統一されたエンドツーエンドパイプラインで行うための新しい枠組みを提案する。この現実的な問題に対処するため、我々は、この予測タスクの重要な手がかりとして、シーンと社会的文脈の両方を、提案フレームワークに組み込むことを検討する。この2つのタスクを一つにまとめます一共有GRUエンコーダ及び共有GRUエンコーダを用いてその履歴を符号化すること二基準を損失として適用し、各業務における誤差の源泉を単一の距離として連続的に測定すること。次に,映像データの時空間表現を符号化することでシーンコンテキストを組み込む。また,ソーシャル・プーリング・レイヤを用いて,人物の動作とポーズから共同特徴表現を生成することにより,社会的手がかりも含んでいる。最後に、GRUベースのデコーダを使用して、動きと骨格のポーズを予測します。提案手法は,2つのソーシャルデータセットのベースラインよりも優れた性能を示す。 Smooth and seamless robot navigation while interacting with humans depends on predicting human movements. Forecasting such human dynamics often involves modeling human trajectories (global motion) or detailed body joint movements (local motion). Prior work typically tackled local and global human movements separately. In this paper, we propose a novel framework to tackle both tasks of human motion (or trajectory) and body skeleton pose forecasting in a unified end-to-end pipeline. To deal with this real-world problem, we consider incorporating both scene and social contexts, as critical clues for this prediction task, into our proposed framework. To this end, we first couple these two tasks by i) encoding their history using a shared Gated Recurrent Unit (GRU) encoder and ii) applying a metric as loss, which measures the source of errors in each task jointly as a single distance. Then, we incorporate the scene context by encoding a spatio-temporal representation of the video data. We also include social clues by generating a joint feature representation from motion and pose of all individuals from the scene using a social pooling layer. Finally, we use a GRU based decoder to forecast both motion and skeleton pose. We demonstrate that our proposed framework achieves a superior performance compared to several baselines on two social datasets.	翻訳日:2022-11-10 14:42:29 公開日:2020-07-14
# 動的シーン再構築のためのトポロジー・チェンジ対応ボリュームフュージョン Topology-Change-Aware Volumetric Fusion for Dynamic Scene Reconstruction ( http://arxiv.org/abs/2007.06853v1 ) ライセンス: Link先を確認	Chao Li and Xiaohu Guo	(参考訳) トポロジー変化は動的シーンの4次元再構成において難しい問題である。古典的な体積融合に基づくフレームワークでは、メッシュは通常TSDF体積から標準表面表現として抽出され、変形場の推定に役立ちます。しかし、表面メッシュが固定接続性を持つため、表面変形グラフと埋め込み変形グラフ(EDG)の表現はトポロジ上の矛盾をもたらすが、変形場は不連続である。本稿では, TSDFとEDGの両方に非多様体体積格子の新たな構造を導入し, セル分割・複製による接続更新を可能にすることにより, トポロジ変化下での動的シーンの4次元再構成を実現する。実験では、最先端手法と比較して、トポロジー変化の動的シーンに対する説得力のある再構成結果を示す。 Topology change is a challenging problem for 4D reconstruction of dynamic scenes. In the classic volumetric fusion-based framework, a mesh is usually extracted from the TSDF volume as the canonical surface representation to help estimating deformation field. However, the surface and Embedded Deformation Graph (EDG) representations bring conflicts under topology changes since the surface mesh has fixed-connectivity but the deformation field can be discontinuous. In this paper, the classic framework is re-designed to enable 4D reconstruction of dynamic scene under topology changes, by introducing a novel structure of Non-manifold Volumetric Grid to the re-design of both TSDF and EDG, which allows connectivity updates by cell splitting and replication. Experiments show convincing reconstruction results for dynamic scenes of topology changes, as compared to the state-of-the-art methods.	翻訳日:2022-11-10 14:42:09 公開日:2020-07-14
# 動作境界検出による過分割誤りの軽減 Alleviating Over-segmentation Errors by Detecting Action Boundaries ( http://arxiv.org/abs/2007.06866v1 ) ライセンス: Link先を確認	Yuchi Ishikawa, Seito Kasai, Yoshimitsu Aoki, Hirokatsu Kataoka	(参考訳) 本稿では,時間的行動セグメント化作業,すなわちアクションセグメンテーション・リファインメント・フレームワーク(ASRF)の効果的なフレームワークを提案する。我々のモデルアーキテクチャは、長期的特徴抽出器と、アクションセグメンテーションブランチ(ASB)と境界回帰ブランチ(BRB)の2つのブランチから構成される。長期特徴抽出器は、広時間受容野を有する2つの枝に共通特徴を提供する。 ASBはビデオフレームをアクションクラスに分類し、BRBはアクション境界確率を回帰する。 BRBが予測した動作境界はASBの出力を洗練し、性能が大幅に向上した。私たちの貢献は3倍です。 i) 時間的行動セグメント化のためのフレームワークであるASRFを提案し, 時間的行動セグメント化をフレーム単位の行動分類と行動境界回帰に分割する。我々のフレームワークは、予測されたアクション境界を用いてアクションクラスのフレームレベル仮説を洗練する。二) 行動確率の遷移を円滑にするための損失関数を提案し, 時間的行動区分のための各種損失関数の組み合わせを分析する。 (iii)本フレームワークは,3つの難題データセットにおいて最先端手法を上回り,セグメント編集距離で最大13.7%,セグメントf1スコアで最大16.1%の改善を提供する。私たちのコードはまもなく公開されます。 We propose an effective framework for the temporal action segmentation task, namely an Action Segment Refinement Framework (ASRF). Our model architecture consists of a long-term feature extractor and two branches: the Action Segmentation Branch (ASB) and the Boundary Regression Branch (BRB). The long-term feature extractor provides shared features for the two branches with a wide temporal receptive field. The ASB classifies video frames with action classes, while the BRB regresses the action boundary probabilities. The action boundaries predicted by the BRB refine the output from the ASB, which results in a significant performance improvement. Our contributions are three-fold: (i) We propose a framework for temporal action segmentation, the ASRF, which divides temporal action segmentation into frame-wise action classification and action boundary regression. Our framework refines frame-level hypotheses of action classes using predicted action boundaries. (ii) We propose a loss function for smoothing the transition of action probabilities, and analyze combinations of various loss functions for temporal action segmentation. (iii) Our framework outperforms state-of-the-art methods on three challenging datasets, offering an improvement of up to 13.7% in terms of segmental edit distance and up to 16.1% in terms of segmental F1 score. Our code will be publicly available soon.	翻訳日:2022-11-10 14:41:52 公開日:2020-07-14
# TridentAlignとコンテキスト埋め込みによる視覚追跡 Visual Tracking by TridentAlign and Context Embedding ( http://arxiv.org/abs/2007.06887v1 ) ライセンス: Link先を確認	Janghoon Choi, Junseok Kwon, Kyoung Mu Lee	(参考訳) シームズネットワークに基づく視覚追跡手法の最近の進歩は、多数のトラッキングベンチマークで高いパフォーマンスを実現している。しかし、ターゲットオブジェクトと類似のカテゴリを持つイントラクタオブジェクトの広範なスケールのバリエーションは、常に視覚的トラッキングの課題を提起している。このような持続的な問題に対処するために,Siamese ネットワークに基づく視覚的トラッキングのための新しい TridentAlign とコンテキスト埋め込みモジュールを提案する。 tridentalignモジュールは、ターゲットの広範囲なバリエーションや大きな変形への適応性を促進し、対象オブジェクトの特徴表現を複数の空間次元にプールし、特徴ピラミッドを形成する。一方、コンテキスト埋め込みモジュールは、オブジェクト間のグローバルなコンテキスト情報を考慮し、ターゲットを邪魔対象から識別することを目的としている。コンテキスト埋め込みモジュールは、所定のフレームのグローバルコンテキスト情報を、最終分類段階で活用できるように、ローカルな特徴表現に抽出して埋め込みます。複数のベンチマークデータセットから得られた実験結果から,提案トラッカーの性能は最先端トラッカーと同等であり,提案トラッカーはリアルタイムに動作していることがわかった。 Recent advances in Siamese network-based visual tracking methods have enabled high performance on numerous tracking benchmarks. However, extensive scale variations of the target object and distractor objects with similar categories have consistently posed challenges in visual tracking. To address these persisting issues, we propose novel TridentAlign and context embedding modules for Siamese network-based visual tracking methods. The TridentAlign module facilitates adaptability to extensive scale variations and large deformations of the target, where it pools the feature representation of the target object into multiple spatial dimensions to form a feature pyramid, which is then utilized in the region proposal stage. Meanwhile, context embedding module aims to discriminate the target from distractor objects by accounting for the global context information among objects. The context embedding module extracts and embeds the global context information of a given frame into a local feature representation such that the information can be utilized in the final classification stage. Experimental results obtained on multiple benchmark datasets show that the performance of the proposed tracker is comparable to that of state-of-the-art trackers, while the proposed tracker runs at real-time speed.	翻訳日:2022-11-10 14:41:29 公開日:2020-07-14
# 深層学習と深部画像を用いた高密度人物検出に向けて Towards Dense People Detection with Deep Learning and Depth images ( http://arxiv.org/abs/2007.07171v1 ) ライセンス: Link先を確認	David Fuentes-Jimenez and Cristina Losada-Gutierrez and David Casillas-Perez and Javier Macias-Guarasa and Roberto Martin-Lopez and Daniel Pizarro and Carlos A.Luna	(参考訳) 本稿では,1つの深度画像から複数の人物を検出するDNNシステムを提案する。我々のニューラルネットワークは深度画像を処理し、画像座標の確率マップを出力し、各検出は人の頭を中心にしたガウス型の局所分布に対応する。検出された人物の数と2D画像位置の両方をエンコードし、奥行き画像とカメラキャリブレーションパラメータを用いて各人物の3D位置を復元することができる。私たちのアーキテクチャはコンパクトで、分離された畳み込みを使ってパフォーマンスを高め、低予算gpuでリアルタイムに動作します。まずネットワークのトレーニングにシミュレーションデータを使用し,その後,比較的少ない実データで微調整を行う。我々は,この戦略が効果的であることを示し,訓練中に使用する場面とは異なる場面を一般化するネットワークを創り出す。我々は,従来のDNNベースのソリューションを含め,既存の最先端技術と比較した。本手法は既存の手法よりも優れており,有意な咬合を有するシーンの人物を正確に検出できる。 This paper proposes a DNN-based system that detects multiple people from a single depth image. Our neural network processes a depth image and outputs a likelihood map in image coordinates, where each detection corresponds to a Gaussian-shaped local distribution, centered at the person's head. The likelihood map encodes both the number of detected people and their 2D image positions, and can be used to recover the 3D position of each person using the depth image and the camera calibration parameters. Our architecture is compact, using separated convolutions to increase performance, and runs in real-time with low budget GPUs. We use simulated data for initially training the network, followed by fine tuning with a relatively small amount of real data. We show this strategy to be effective, producing networks that generalize to work with scenes different from those used during training. We thoroughly compare our method against the existing state-of-the-art, including both classical and DNN-based solutions. Our method outperforms existing methods and can accurately detect people in scenes with significant occlusions.	翻訳日:2022-11-10 14:35:26 公開日:2020-07-14
# 絵文字予測:拡張とベンチマーク Emoji Prediction: Extensions and Benchmarking ( http://arxiv.org/abs/2007.07389v1 ) ライセンス: Link先を確認	Weicheng Ma, Ruibo Liu, Lili Wang, Soroush Vosoughi	(参考訳) 絵文字は、具体的な意味、感情、意図を表現できる簡潔な言語である。絵文字には、コミュニケーションの意図をより理解するために使用できるシグナルも備わっている。それらは私たちの日常生活のユビキタスな部分となり、ユーザー生成コンテンツを理解する重要な部分となっている。絵文字予測タスクは、テキストに関連付けられた適切な絵文字セットを予測することを目的としている。絵文字予測により、モデルは書かれたテキストのコミュニケーション意図の豊かな表現を学ぶことができる。絵文字予測タスクに関する既存の研究は、特定の感情と密接に関連する絵文字の少数のサブセットに焦点を当てているが、この設定はタスクを単純化し、絵文字の表現力を無駄にする。本稿では,絵文字予測タスクの既存の設定を,よりリッチな絵文字セットを含むように拡張し,タスクのマルチラベル分類を可能にする。トランスフォーマーネットワークに基づくマルチクラス・マルチラベル絵文字予測のための新しいモデルを提案する。また、ヒューリスティックスを用いてTwitterから複数の絵文字予測データセットを構築する。 BERTモデルは、すべてのデータセットに対して、すべての設定下で最先端のパフォーマンスを達成し、相対的な改善は27.21%から236.36%、トップ5の精度は2.01%から88.28%、F-1のスコアは65.19%から346.79%である。本研究は,絵文字予測タスクにおける深いトランスフォーマーモデルの有効性を示す。また、将来の研究者のために、https://github.com/hikari-NYU/Emoji_Prediction_Datasets_MMSでデータセットをリリースしています。 Emojis are a succinct form of language which can express concrete meanings, emotions, and intentions. Emojis also carry signals that can be used to better understand communicative intent. They have become a ubiquitous part of our daily lives, making them an important part of understanding user-generated content. The emoji prediction task aims at predicting the proper set of emojis associated with a piece of text. Through emoji prediction, models can learn rich representations of the communicative intent of the written text. While existing research on the emoji prediction task focus on a small subset of emoji types closely related to certain emotions, this setting oversimplifies the task and wastes the expressive power of emojis. In this paper, we extend the existing setting of the emoji prediction task to include a richer set of emojis and to allow multi-label classification on the task. We propose novel models for multi-class and multi-label emoji prediction based on Transformer networks. We also construct multiple emoji prediction datasets from Twitter using heuristics. The BERT models achieve state-of-the-art performances on all our datasets under all the settings, with relative improvements of 27.21% to 236.36% in accuracy, 2.01% to 88.28% in top-5 accuracy and 65.19% to 346.79% in F-1 score, compared to the prior state-of-the-art. Our results demonstrate the efficacy of deep Transformer-based models on the emoji prediction task. We also release our datasets at https://github.com/hikari-NYU/Emoji_Prediction_Datasets_MMS for future researchers.	翻訳日:2022-11-10 14:33:58 公開日:2020-07-14
# 深層学習者の活用によるメール生成におけるコヒーレンシーのモデル化 Modeling Coherency in Generated Emails by Leveraging Deep Neural Learners ( http://arxiv.org/abs/2007.07403v1 ) ライセンス: Link先を確認	Avisha Das and Rakesh M. Verma	(参考訳) 高度な機械学習と自然言語技術により、攻撃者は高度なソーシャルエンジニアリングに基づく攻撃を開始することができる。攻撃的な問題に対処するため、研究者は積極的に検出する方法に頼ってきた。標的のメールを使って被害者を騙すメールは、高度な攻撃方法である。しかし、自動テキスト生成には生成したコンテンツのコンテキストと一貫性の制御が必要である。この方法は、入力文書内の文の学習表現を用いて構造化された電子メールを生成する階層型ディープニューラルモデルを利用する。深層モデルを用いて,ターゲットとする短文メッセージの生成を実証する。合成テキストのグローバルなコヒーレンシーを質的研究と複数の定量的尺度を用いて評価する。 Advanced machine learning and natural language techniques enable attackers to launch sophisticated and targeted social engineering-based attacks. To counter the active attacker issue, researchers have since resorted to proactive methods of detection. Email masquerading using targeted emails to fool the victim is an advanced attack method. However automatic text generation requires controlling the context and coherency of the generated content, which has been identified as an increasingly difficult problem. The method used leverages a hierarchical deep neural model which uses a learned representation of the sentences in the input document to generate structured written emails. We demonstrate the generation of short and targeted text messages using the deep model. The global coherency of the synthesized text is evaluated using a qualitative study as well as multiple quantitative measures.	翻訳日:2022-11-10 14:33:32 公開日:2020-07-14
# 集合的推論を支援するモデル:形式化・分析・計算評価 A model to support collective reasoning: Formalization, analysis and computational assessment ( http://arxiv.org/abs/2007.06850v1 ) ライセンス: Link先を確認	Jordi Ganzer, Natalia Criado, Maite Lopez-Sanchez, Simon Parsons, Juan A. Rodriguez-Aguilar	(参考訳) 本稿では,e-participationシステムに着想を得て,人間の議論を表現し,それらから集団的な結論を得るための新しいモデルを提案する。このモデルは,ユーザが議論に新たな情報を導入し,既存の情報に関連付けることによって,既存のアプローチの欠点を克服すると同時に,他のユーザの提案した情報に対する意見を表明する。また,このモデルでは,ユーザの意見が合理的であるとして,情報抽出を前提とせず,現在のアプローチを著しく制限している。代わりに、一貫性のある意見を特徴付ける合理性の弱い概念を定義し、個別の意見の一貫性とユーザーが議論構造に持つコンセンサスレベルに基づいて異なるシナリオを考察する。この2つの要因を考慮し,個別の意見と討論構造に基づいて集団意思決定を行う異なる意見集約関数の結果を分析した。特に,合意の欠如や個々人の意見が一貫性がない場合でも,総合的な意見が一貫性を持つことを実証する。本研究は,実物大の議論に対して,集団的意見を効率的に計算できることを示す数値的評価で結論づける。 Inspired by e-participation systems, in this paper we propose a new model to represent human debates and methods to obtain collective conclusions from them. This model overcomes drawbacks of existing approaches by allowing users to introduce new pieces of information into the discussion, to relate them to existing pieces, and also to express their opinion on the pieces proposed by other users. In addition, our model does not assume that users' opinions are rational in order to extract information from it, an assumption that significantly limits current approaches. Instead, we define a weaker notion of rationality that characterises coherent opinions, and we consider different scenarios based on the coherence of individual opinions and the level of consensus that users have on the debate structure. Considering these two factors, we analyse the outcomes of different opinion aggregation functions that compute a collective decision based on the individual opinions and the debate structure. In particular, we demonstrate that aggregated opinions can be coherent even if there is a lack of consensus and individual opinions are not coherent. We conclude our analysis with a computational evaluation demonstrating that collective opinions can be computed efficiently for real-sized debates.	翻訳日:2022-11-10 14:33:07 公開日:2020-07-14
# ReLUネットワークのグラディエントDescent Trainingにおけるプラトー現象:説明,定量化,回避 Plateau Phenomenon in Gradient Descent Training of ReLU networks: Explanation, Quantification and Avoidance ( http://arxiv.org/abs/2007.07213v1 ) ライセンス: Link先を確認	Mark Ainsworth and Yeonjong Shin	(参考訳) ニューラルネットワークが幅広いアプリケーションに‘クラス最高の’近似を提供する能力は、十分に文書化されている。それでも、ニューラルネットワークの強力な表現性は、ネットワークを定義するパラメータを効果的にトレーニング(チョース)できない場合に問題となる。一般に、ニューラルネットワークは勾配降下型最適化法またはその確率的変種によって訓練される。実際には、そのような方法ではトレーニング開始時に損失関数が急速に低下するが、比較的少数のステップの後、大幅に低下する。この損失は、多くのエポックの期間に停滞しているように見えるが、その間に突然減少し始めるが、その原因は明らかでない。このいわゆるプラトー現象は多くの学習課題に現れている。本研究の目的は,高原現象の根本原因の同定と定量化である。トレーニングデータ数に対するニューロン数についての仮定は行われず,怠け者と適応者の両方について結果が得られた。主な発見は、活性化パターンが一定である期間、活性化パターンは与えられたニューロンを活性化するデータ点の数、勾配流れのダイナミクスの収束の定量化、およびトレーニングデータのサブセット上の局所的最小二乗回帰線の解による静止点のキャラクタリゼーションである。そこで,本研究では,各ステップにおける活性化パターンの明示的な調整により特徴付けられる,新しい反復学習法である活動ニューロン最小二乗法(anls)を提案する。図示的な数値の例が全て含まれている。 The ability of neural networks to provide `best in class' approximation across a wide range of applications is well-documented. Nevertheless, the powerful expressivity of neural networks comes to naught if one is unable to effectively train (choose) the parameters defining the network. In general, neural networks are trained by gradient descent type optimization methods, or a stochastic variant thereof. In practice, such methods result in the loss function decreases rapidly at the beginning of training but then, after a relatively small number of steps, significantly slow down. The loss may even appear to stagnate over the period of a large number of epochs, only to then suddenly start to decrease fast again for no apparent reason. This so-called plateau phenomenon manifests itself in many learning tasks. The present work aims to identify and quantify the root causes of plateau phenomenon. No assumptions are made on the number of neurons relative to the number of training data, and our results hold for both the lazy and adaptive regimes. The main findings are: plateaux correspond to periods during which activation patterns remain constant, where activation pattern refers to the number of data points that activate a given neuron; quantification of convergence of the gradient flow dynamics; and, characterization of stationary points in terms solutions of local least squares regression lines over subsets of the training data. Based on these conclusions, we propose a new iterative training method, the Active Neuron Least Squares (ANLS), characterised by the explicit adjustment of the activation pattern at each step, which is designed to enable a quick exit from a plateau. Illustrative numerical examples are included throughout.	翻訳日:2022-11-10 14:26:30 公開日:2020-07-14
# k-centerクラスタリングに対するペアワイズフェアとコミュニティ保存アプローチ A Pairwise Fair and Community-preserving Approach to k-Center Clustering ( http://arxiv.org/abs/2007.07384v1 ) ライセンス: Link先を確認	Brian Brubach, Darshan Chakrabarti, John P. Dickerson, Samir Khuller, Aravind Srinivasan, Leonidas Tsepenekas	(参考訳) クラスタリングは多くのアプリケーションで機械学習の基本的な問題である。機械学習が自動化システムのバックエンドとして普及するにつれて、公平性に関する懸念が生まれます。フェアネスに関する現在の文献の多くは、教師付き学習(グループフェアネス)における保護されたクラスに対する差別を扱う。 2つの点(あるいは1つの点のコミュニティ)が分離される確率が、ペアワイズ距離(あるいはコミュニティの直径)の増大関数によって境界づけられるという、フェアクラスタリングの異なる概念を定義する。データポイントが一括してクラスタ化されるメリットを享受する人々を表す状況を取り除きます。不公平は、特定のポイントが任意に、あるいは選挙地区のように、彼らを傷つけようとする誰かによって、決定論的に分離されたときに生じる。そこで我々は,クラスタリング設定において,ペアワイズフェアネスとコミュニティ保存という2つの新たなフェアネスを正式に定義する。公平性目標の実用性を探るために、我々は、これらの公平性制約を満たすために既存の$k$中心アルゴリズムを拡張するアプローチを考案する。このアプローチの解析は、公平性を維持しながら合理的な近似が達成できることを証明している。実験では、従来の$k$-centerアルゴリズム/ヒューリスティックスに対するアプローチの有効性を比較し、最適なクラスタリングと公正性のトレードオフを探る。 Clustering is a foundational problem in machine learning with numerous applications. As machine learning increases in ubiquity as a backend for automated systems, concerns about fairness arise. Much of the current literature on fairness deals with discrimination against protected classes in supervised learning (group fairness). We define a different notion of fair clustering wherein the probability that two points (or a community of points) become separated is bounded by an increasing function of their pairwise distance (or community diameter). We capture the situation where data points represent people who gain some benefit from being clustered together. Unfairness arises when certain points are deterministically separated, either arbitrarily or by someone who intends to harm them as in the case of gerrymandering election districts. In response, we formally define two new types of fairness in the clustering setting, pairwise fairness and community preservation. To explore the practicality of our fairness goals, we devise an approach for extending existing $k$-center algorithms to satisfy these fairness constraints. Analysis of this approach proves that reasonable approximations can be achieved while maintaining fairness. In experiments, we compare the effectiveness of our approach to classical $k$-center algorithms/heuristics and explore the tradeoff between optimal clustering and fairness.	翻訳日:2022-11-10 14:25:28 公開日:2020-07-14
# グラフの深層学習によるタンパク質の情報還元表現の同定の高速化 Accelerating the identification of informative reduced representations of proteins with deep learning for graphs ( http://arxiv.org/abs/2007.08658v1 ) ライセンス: Link先を確認	Federico Errica, Marco Giulini, Davide Bacciu, Roberto Menichetti, Alessio Micheli, Raffaello Potestio	(参考訳) 分子動力学(MD)シミュレーションの限界は、コンピュータアーキテクチャとアルゴリズムの絶え間ない発展によって着実に前進している。このMD軌道の量と範囲(サイズと時間)の爆発は、原データの合理化と定量化のための自動化および転送可能な方法の必要性を引き起こす。近年,タンパク質の原子のサブセットを同定するアルゴリズム的手法が開発され,最も情報的な記述が可能となった。この方法は、与えられた縮小表現に対して、関連するマッピングエントロピー(つまり、単純化による情報損失の尺度)の計算に依存する。比較的単純だが、この計算には時間がかかる。本稿では,マッピングエントロピーの計算の高速化を目的としたディープラーニング手法の実装について述べる。この方法はディープグラフネットワークに依存しており、入力フォーマットの柔軟性が極めて高い。深部グラフネットワークは正確かつ極めて効率的であり,マッピングエントロピーのアルゴリズム計算に対して最大10^5$の高速化係数を持つことを示す。この手法の応用は、マッピングエントロピーの景観を再構築する際に生体分子の研究に大きな可能性をもたらすが、この手法は分子の構造の任意の関数の計算に容易に移行できるスキームである。 The limits of molecular dynamics (MD) simulations of macromolecules are steadily pushed forward by the relentless developments of computer architectures and algorithms. This explosion in the number and extent (in size and time) of MD trajectories induces the need of automated and transferable methods to rationalise the raw data and make quantitative sense out of them. Recently, an algorithmic approach was developed by some of us to identify the subset of a protein's atoms, or mapping, that enables the most informative description of it. This method relies on the computation, for a given reduced representation, of the associated mapping entropy, that is, a measure of the information loss due to the simplification. Albeit relatively straightforward, this calculation can be time consuming. Here, we describe the implementation of a deep learning approach aimed at accelerating the calculation of the mapping entropy. The method relies on deep graph networks, which provide extreme flexibility in the input format. We show that deep graph networks are accurate and remarkably efficient, with a speedup factor as large as $10^5$ with respect to the algorithmic computation of the mapping entropy. Applications of this method, which entails a great potential in the study of biomolecules when used to reconstruct its mapping entropy landscape, reach much farther than this, being the scheme easily transferable to the computation of arbitrary functions of a molecule's structure.	翻訳日:2022-11-10 14:24:46 公開日:2020-07-14
# モノのインターネットのためのレコメンダシステム:調査 Recommender Systems for the Internet of Things: A Survey ( http://arxiv.org/abs/2007.06758v1 ) ライセンス: Link先を確認	May Altulyan, Lina Yao, Xianzhi Wang, Chaoran Huang, Salil S Kanhere, Quan Z Sheng	(参考訳) 勧告はIoT(Internet of Things)のメリットの開発と促進において重要な段階である。従来のレコメンデータシステムは、成長を続ける、動的で、異質なIoTデータを利用できない。本稿では,最先端のレコメンダシステムに関する総合的なレビューと,iotの活気ある分野における関連技術とアプリケーションについて述べる。本稿では,iotへのレコメンデーションシステムの適用に関するいくつかの制限について議論し,既存の研究を比較するための参照フレームワークを提案する。 Recommendation represents a vital stage in developing and promoting the benefits of the Internet of Things (IoT). Traditional recommender systems fail to exploit ever-growing, dynamic, and heterogeneous IoT data. This paper presents a comprehensive review of the state-of-the-art recommender systems, as well as related techniques and application in the vibrant field of IoT. We discuss several limitations of applying recommendation systems to IoT and propose a reference framework for comparing existing studies to guide future research and practices.	翻訳日:2022-11-10 14:17:45 公開日:2020-07-14
# Pareto-Embeddingsによる選択関数の学習 Learning Choice Functions via Pareto-Embeddings ( http://arxiv.org/abs/2007.06927v1 ) ライセンス: Link先を確認	Karlson Pfannschmidt, Eyke H\"ullermeier	(参考訳) 本研究では,各オブジェクトが特徴ベクトルで表現される対象の集合から選択することの難しさを考察する。選択モデリングにおける伝統的なアプローチは、主に潜在、実数値の効用関数の学習に基づいており、選択の代替関数に対して線形順序を誘導する。このアプローチは離散的な(トップ-1)選択に適しているが、サブセットの選択にどのように使うかは単純ではない。実数直線の選択肢を写像する代わりに、パレート最適点を持つ選択集合を識別する高次元のユーティリティ空間にそれらを埋め込むことを提案する。そこで本研究では,このタスクに適した微分可能損失関数を最小化する学習アルゴリズムを提案する。ベンチマークデータセットのスイート上でPareto-embeddingを学習する可能性を示す。 We consider the problem of learning to choose from a given set of objects, where each object is represented by a feature vector. Traditional approaches in choice modelling are mainly based on learning a latent, real-valued utility function, thereby inducing a linear order on choice alternatives. While this approach is suitable for discrete (top-1) choices, it is not straightforward how to use it for subset choices. Instead of mapping choice alternatives to the real number line, we propose to embed them into a higher-dimensional utility space, in which we identify choice sets with Pareto-optimal points. To this end, we propose a learning algorithm that minimizes a differentiable loss function suitable for this task. We demonstrate the feasibility of learning a Pareto-embedding on a suite of benchmark datasets.	翻訳日:2022-11-10 14:16:35 公開日:2020-07-14
# ADSAGE: きめ細かいレベルでのインサイダー脅威検出に応用した分散グラフエッジ列の異常検出 ADSAGE: Anomaly Detection in Sequences of Attributed Graph Edges applied to insider threat detection at fine-grained level ( http://arxiv.org/abs/2007.06985v1 ) ライセンス: Link先を確認	Mathieu Garchery and Michael Granitzer	(参考訳) CERTインサイダーの脅威検出ケースに関する以前の研究は、ユーザ動作の関連性にもかかわらず、グラフとテキストの特徴を無視している。さらに既存のシステムは、悪意のあるアクティビティを検出するために、機能エンジニアリングと監査データアグリゲーションに大きく依存している。これは時間がかかり、専門家の知識が必要であり、正確なユーザーアクションに対する警告のトレースを防ぐ。これらの問題に対処するために、グラフエッジとしてモデル化された監査ログイベントの異常を検出するADSAGEを導入する。私たちの一般的な方法は、エッジシーケンスと属性の両方をサポートしながら、エッジレベルで異常検出を行う最初の方法です。本稿では、CERTのユースケースから異なる監査ログにおいて、ADSAGEをきめ細かなイベントレベルのインサイダー脅威検出に利用する方法について述べる。 CERT問題に標準ベンチマークがないことに留意し、現実的なリコールベースのメトリクスに基づいた評価設定を以前提案した。我々は、CERTインサイダー脅威データセットの認証、Eメールトラフィック、Webブラウジングログ、および実世界の認証イベントについてADSAGEを評価する。 ADSAGEは認証の異常、ユーザとコンピュータのインタラクション、メール通信の異常を検出するのに有効である。単純なベースラインも驚くほど強い結果をもたらす。興味深いことに、いくつかの検出器は相補的であり、検出を改善するために組み合わせられる可能性がある。全体として,グラフの特徴は悪意のあるインサイダー活動を特徴付けるのに有益であり,きめ細かいレベルでの検知が可能であることを示す。 Previous works on the CERT insider threat detection case have neglected graph and text features despite their relevance to describe user behavior. Additionally, existing systems heavily rely on feature engineering and audit data aggregation to detect malicious activities. This is time consuming, requires expert knowledge and prevents tracing back alerts to precise user actions. To address these issues we introduce ADSAGE to detect anomalies in audit log events modeled as graph edges. Our general method is the first to perform anomaly detection at edge level while supporting both edge sequences and attributes, which can be numeric, categorical or even text. We describe how ADSAGE can be used for fine-grained, event level insider threat detection in different audit logs from the CERT use case. Remarking that there is no standard benchmark for the CERT problem, we use a previously proposed evaluation setting based on realistic recall-based metrics. We evaluate ADSAGE on authentication, email traffic and web browsing logs from the CERT insider threat datasets, as well as on real-world authentication events. ADSAGE is effective to detect anomalies in authentications, modeled as user to computer interactions, and in email communications. Simple baselines give surprisingly strong results as well. We also report performance split by malicious scenarios present in the CERT datasets: interestingly, several detectors are complementary and could be combined to improve detection. Overall, our results show that graph features are informative to characterize malicious insider activities, and that detection at fine-grained level is possible.	翻訳日:2022-11-10 14:16:14 公開日:2020-07-14
# 構造付き潜在共同設立者によるガウス過程を用いた因果推論 Causal Inference using Gaussian Processes with Structured Latent Confounders ( http://arxiv.org/abs/2007.07127v1 ) ライセンス: Link先を確認	Sam Witty, Kenta Takatsu, David Jensen, Vikash Mansinghka	(参考訳) 潜在的共同設立者--治療と結果の両方に影響を与える未観測変数---因果効果の偏見を推定する。例えば、コースを受講しているすべての学生は、個別に受ける教育的介入に加えて、コースの難しさの影響を受けている。本稿では,この構造を持つ助成金を半パラメトリックにモデル化し,因果効果の評価を改善する方法について述べる。鍵となる革新は階層的ベイズモデル、構造化潜在共同設立者(GP-SLC)を持つガウス過程、楕円スライスサンプリングに基づくモンテカルロ推論アルゴリズムである。 GP-SLCは、共同設立者、共変量、治療、結果に関連する機能形式に関する最小限の仮定で、個々の治療効果のベイズ的不確実性推定を提供する。最後に, gp-slcは, 乳幼児保健開発プログラムや, 温度変化がニューイングランド全域のエネルギー消費に与える影響を示すデータセットなど, 3つのベンチマークデータセットにおいて, 広く使用されている因果推論技術と競合しているか, またはより正確であることを示す。 Latent confounders---unobserved variables that influence both treatment and outcome---can bias estimates of causal effects. In some cases, these confounders are shared across observations, e.g. all students taking a course are influenced by the course's difficulty in addition to any educational interventions they receive individually. This paper shows how to semiparametrically model latent confounders that have this structure and thereby improve estimates of causal effects. The key innovations are a hierarchical Bayesian model, Gaussian processes with structured latent confounders (GP-SLC), and a Monte Carlo inference algorithm for this model based on elliptical slice sampling. GP-SLC provides principled Bayesian uncertainty estimates of individual treatment effect with minimal assumptions about the functional forms relating confounders, covariates, treatment, and outcome. Finally, this paper shows GP-SLC is competitive with or more accurate than widely used causal inference techniques on three benchmark datasets, including the Infant Health and Development Program and a dataset showing the effect of changing temperatures on state-wide energy consumption across New England.	翻訳日:2022-11-10 14:15:04 公開日:2020-07-14
# 注意と差別:ウェアラブルセンサを用いた人間行動認識の現状と課題 Attend And Discriminate: Beyond the State-of-the-Art for Human Activity Recognition using Wearable Sensors ( http://arxiv.org/abs/2007.07172v1 ) ライセンス: Link先を確認	Alireza Abedin, Mahsa Ehsanpour, Qinfeng Shi, Hamid Rezatofighi, Damith C. Ranasinghe	(参考訳) ウェアラブルは、特にリハビリテーションからきめ細かい歩行分析に至るまで、医療応用の増加のために、人間の活動に対する理解を改善するための基本となる。ウェアラブルに関するHAR(Human Activity Recognition)問題を解決するための総合的なノウハウは、エンドツーエンドのディープラーニングパラダイムによって大きく進歩しているが、いくつかの基本的な機会は見過ごされ続けている。我々は、豊かで差別性の高い活動表現を学習するこれらの新しい機会を精力的に探求する。提案します一マルチチャネルセンサモダリティと特定活動の潜在関係を利用するための学習二深部HARモデルの標準化のためのマルチモーダルセンサデータストリームにおけるデータ非依存化の有効性の検討及び三クラス間差を最大化しつつ、クラス内差の最小化を図るために分類損失基準を組み込むこと。当社の貢献は、4つの多様なアクティビティ認識問題ベンチマークで新たな最先端のパフォーマンスを達成し、大きなマージンと最大6%のマージン改善を実現しています。我々は,この設計概念からの貢献を,量的および質的研究を通じて共有される活動的不均衡尺度,アブレーション研究,洞察など,広範な実験を通じて広範囲に検証した。 Wearables are fundamental to improving our understanding of human activities, especially for an increasing number of healthcare applications from rehabilitation to fine-grained gait analysis. Although our collective know-how to solve Human Activity Recognition (HAR) problems with wearables has progressed immensely with end-to-end deep learning paradigms, several fundamental opportunities remain overlooked. We rigorously explore these new opportunities to learn enriched and highly discriminating activity representations. We propose: i) learning to exploit the latent relationships between multi-channel sensor modalities and specific activities; ii) investigating the effectiveness of data-agnostic augmentation for multi-modal sensor data streams to regularize deep HAR models; and iii) incorporating a classification loss criterion to encourage minimal intra-class representation differences whilst maximising inter-class differences to achieve more discriminative features. Our contributions achieves new state-of-the-art performance on four diverse activity recognition problem benchmarks with large margins -- with up to 6% relative margin improvement. We extensively validate the contributions from our design concepts through extensive experiments, including activity misalignment measures, ablation studies and insights shared through both quantitative and qualitative studies.	翻訳日:2022-11-10 14:14:47 公開日:2020-07-14
# TCGM: 半教師付きマルチモーダル学習のための情報理論フレームワーク TCGM: An Information-Theoretic Framework for Semi-Supervised Multi-Modality Learning ( http://arxiv.org/abs/2007.06793v1 ) ライセンス: Link先を確認	Xinwei Sun, Yilun Xu, Peng Cao, Yuqing Kong, Lingjing Hu, Shanghang Zhang, Yizhou Wang	(参考訳) 複数のモダリティからデータを抽出することで、機械学習システムのトレーニングにより多くの情報を提供する。しかしながら、各モダリティを大量のデータでラベル付けすることは、非常に高価で時間がかかるため、半教師付きマルチモーダル学習の重要な問題となる。既存の手法は、適切な仮定の下でのモダリティ間の非効率的な融合または理論的保証の欠如に苦しむ。本稿では, 半教師付きマルチモーダル学習のための新しい情報理論的手法, \textbf{t}otal \textbf{c}orrelation \textbf{g}ain \textbf{m}aximization (tcgm)を提案する。一ラベルなしデータポイントの異なるモダリティの情報を有効活用して各モダリティの訓練分類を行うことができること。 (ii) ベイズ分類器を同定する理論的保証、すなわちすべてのモダリティの根本的真理を同定すること。具体的には、すべてのモダリティの分類器に対するtc誘発損失(すなわちtcゲイン)を最大化することで、これらの分類器は協調的に対応する接地型分類器の類型を発見し、ラベル付きデータの限られた割合を活用することでユニークなものを識別することができる。本手法を様々なタスクに適用し,ニュース分類,感情認識,疾患予測など最新の結果を得る。 Fusing data from multiple modalities provides more information to train machine learning systems. However, it is prohibitively expensive and time-consuming to label each modality with a large amount of data, which leads to a crucial problem of semi-supervised multi-modal learning. Existing methods suffer from either ineffective fusion across modalities or lack of theoretical guarantees under proper assumptions. In this paper, we propose a novel information-theoretic approach, namely \textbf{T}otal \textbf{C}orrelation \textbf{G}ain \textbf{M}aximization (TCGM), for semi-supervised multi-modal learning, which is endowed with promising properties: (i) it can utilize effectively the information across different modalities of unlabeled data points to facilitate training classifiers of each modality (ii) it has theoretical guarantee to identify Bayesian classifiers, i.e., the ground truth posteriors of all modalities. Specifically, by maximizing TC-induced loss (namely TC gain) over classifiers of all modalities, these classifiers can cooperatively discover the equivalent class of ground-truth classifiers; and identify the unique ones by leveraging limited percentage of labeled data. We apply our method to various tasks and achieve state-of-the-art results, including news classification, emotion recognition and disease prediction.	翻訳日:2022-11-10 14:09:00 公開日:2020-07-14
# コーン・エプシロン・ドミナンス:進化的多目的最適化へのアプローチ The Cone epsilon-Dominance: An Approach for Evolutionary Multiobjective Optimization ( http://arxiv.org/abs/2008.04224v1 ) ライセンス: Link先を確認	Lucas S. Batista, Felipe Campelo, Frederico G. Guimar\~aes and Jaime A. Ram\'irez	(参考訳) 本稿では,多目的進化アルゴリズム(moeas)の収束と多様性を改善するためのコーン・エプシロン・ドミナンス手法を提案する。標準パレート関係(NSGA-II,NSGA-II,SPEA2,クラスタ化NSGA-II)およびエプシロン支配(eps-MOEA)に基づいて、コーン-エピス-MOEAをMOEAと比較した。この比較は、計算の複雑さと、各アルゴリズムによって得られた最終結果の質を定量化するために選択された4つのパフォーマンス指標、すなわち、多くの集合メトリクスの収束、多様性、ハイパーボリューム、カバレッジの両方において行われる。 ZDTやDTLZファミリーを含む16の有名なベンチマーク問題が実験室で検討されている。アルゴリズム間の相違性を評価するため、4つの性能指標について慎重に設計した実験を行った。その結果、コーン・エプス・MOEAは、考慮されたすべての性能指標に対して、効率的かつバランスの取れた性能を示すことができることが示唆された。これらの結果は、コーン-エプス-MOEAは、パレートフロントへの収束と多様性の効率的なバランスを得るための競争的アプローチであり、多目的最適化問題の解決に有用なツールである、という結論を強く支持している。 We propose the cone epsilon-dominance approach to improve convergence and diversity in multiobjective evolutionary algorithms (MOEAs). A cone-eps-MOEA is presented and compared with MOEAs based on the standard Pareto relation (NSGA-II, NSGA-II, SPEA2, and a clustered NSGA-II) and on the epsilon-dominance (eps-MOEA). The comparison is performed both in terms of computational complexity and on four performance indicators selected to quantify the quality of the final results obtained by each algorithm: the convergence, diversity, hypervolume, and coverage of many sets metrics. Sixteen well-known benchmark problems are considered in the experimental section, including the ZDT and the DTLZ families. To evaluate the possible differences amongst the algorithms, a carefully designed experiment is performed for the four performance metrics. The results obtained suggest that the cone-eps-MOEA is capable of presenting an efficient and balanced performance over all the performance metrics considered. These results strongly support the conclusion that the cone-eps-MOEA is a competitive approach for obtaining an efficient balance between convergence and diversity to the Pareto front, and as such represents a useful tool for the solution of multiobjective optimization problems.	翻訳日:2022-11-10 14:06:30 公開日:2020-07-14
# 2層ニューラルネットワークにおける2次ダイナミクスの大域収束 Global Convergence of Second-order Dynamics in Two-layer Neural Networks ( http://arxiv.org/abs/2007.06852v1 ) ライセンス: Link先を確認	Walid Krichene, Kenneth F. Caluya, Abhishek Halder	(参考訳) 近年, 2層完全連結ニューラルネットワークでは, 平均場力学とワッサーシュタイン勾配流との接続により, 勾配流は無限幅限界における大域的最適に収束することが示されている。これらの結果は一階の勾配流のために導出され、自然な疑問は二階の力学、すなわち運動量を持つ力学が同様の保証を示すかどうかである。その結果,重球法では正の解が得られた。この場合、結果の積分 pde は非線形運動論的フォッカープランク方程式であり、一階の場合とは異なり、ワッサースタイン勾配流とは明確な関係を持たない。代わりに、解軌道に沿ったリアプノフ汎関数の変種を研究し、定常点を特徴付け、収束を証明する。平均場限界は漸近的であるが,数値シミュレーションにより,大域収束は比較的小さなネットワークで既に発生している可能性が示唆された。 Recent results have shown that for two-layer fully connected neural networks, gradient flow converges to a global optimum in the infinite width limit, by making a connection between the mean field dynamics and the Wasserstein gradient flow. These results were derived for first-order gradient flow, and a natural question is whether second-order dynamics, i.e., dynamics with momentum, exhibit a similar guarantee. We show that the answer is positive for the heavy ball method. In this case, the resulting integro-PDE is a nonlinear kinetic Fokker Planck equation, and unlike the first-order case, it has no apparent connection with the Wasserstein gradient flow. Instead, we study the variations of a Lyapunov functional along the solution trajectories to characterize the stationary points and to prove convergence. While our results are asymptotic in the mean field limit, numerical simulations indicate that global convergence may already occur for reasonably small networks.	翻訳日:2022-11-10 14:06:04 公開日:2020-07-14
# 少数変数ガウス近似による信用フルート検出に向けて Towards Credit-Fraud Detection via Sparsely Varying Gaussian Approximations ( http://arxiv.org/abs/2007.07181v1 ) ライセンス: Link先を確認	Harshit Sharma, Harsh K. Gandhi, Apoorv Jain	(参考訳) 不正行為は多くの金融機関にとって高価な問題であり、年間数十億ドルを企業に費やしている。この分野でのより一般的な活動はクレジットカード詐欺である。この文脈において、クレジットカード不正検出の概念は、予測システムに不確実性を組み込んで、そのような重要なタスクにおけるより良い判断を確実にするために開発された。本稿では,大規模なデータセットを扱うためにスパースガウス分類法を用い,擬似的あるいは誘導的入力の概念を用いることを提案する。異なるカーネルセットと異なるインジェクションデータポイント数を用いて、RBFカーネルを高いインジェクションポイント数で選択することで、最も精度の高いデータポイントを得ることができた。提案手法は,提案手法の確率的性質と,モデルの信頼性と堅牢性を示す予測に対して,低分散の試験精度を考慮し,大規模な財務データを扱うことができた。ベイズ学習手法の方法論を組み込んだ誘導点現象を用いて、健全な精度と高い信頼度を得ることができる。 Fraudulent activities are an expensive problem for many financial institutions, costing billions of dollars to corporations annually. More commonly occurring activities in this regard are credit card frauds. In this context, the credit card fraud detection concept has been developed over the lines of incorporating the uncertainty in our prediction system to ensure better judgment in such a crucial task. We propose to use a sparse Gaussian classification method to work with the large data-set and use the concept of pseudo or inducing inputs. We perform the same with different sets of kernels and the different number of inducing data points to show the best accuracy was obtained with the selection of RBF kernel with a higher number of inducing points. Our approach was able to work over large financial data given the stochastic nature of our method employed and also good test accuracy with low variance over the prediction suggesting confidence and robustness in our model. Using the methodologies of Bayesian learning techniques with the incorporated inducing points phenomenon, are successfully able to obtain a healthy accuracy and a high confidence score.	翻訳日:2022-11-10 13:59:31 公開日:2020-07-14
# リカレントニューラルネットワークのシャッフリング Shuffling Recurrent Neural Networks ( http://arxiv.org/abs/2007.07324v1 ) ライセンス: Link先を確認	Michael Rotman and Lior Wolf	(参考訳) 本稿では,従来の隠れ状態$h_{t-1}$のベクトル要素を置換し,学習関数$b(x_t)$の入力値$x_t$の出力をt$で加算することにより,隠れ状態$h_t$が得られる新しいリカレントニューラルネットワークモデルを提案する。我々のモデルでは、予測は第2の学習関数によって与えられ、隠れた状態$s(h_t)$に適用される。この方法は実装が容易で、非常に効率的であり、消滅や爆発的な勾配に苦しむことはない。広範な実験において,本手法は主要な文献ベースラインと比較して,競争力のある結果を示す。 We propose a novel recurrent neural network model, where the hidden state $h_t$ is obtained by permuting the vector elements of the previous hidden state $h_{t-1}$ and adding the output of a learned function $b(x_t)$ of the input $x_t$ at time $t$. In our model, the prediction is given by a second learned function, which is applied to the hidden state $s(h_t)$. The method is easy to implement, extremely efficient, and does not suffer from vanishing nor exploding gradients. In an extensive set of experiments, the method shows competitive results, in comparison to the leading literature baselines.	翻訳日:2022-11-10 13:59:02 公開日:2020-07-14
# コスト依存型アンサンブル学習の誤分類:統一フレームワーク Misclassification cost-sensitive ensemble learning: A unifying framework ( http://arxiv.org/abs/2007.07361v1 ) ライセンス: Link先を確認	George Petrides and Wouter Verbeke	(参考訳) 長年にわたり、異なるタイプの誤分類エラーが異なるコストをもたらす場合にデータについて学ぶために、多くのコストに敏感な方法が提案されてきた。私たちの貢献は、コストに敏感なアンサンブルメソッドに関する包括的かつ洞察に富んだ概要を提供する統一フレームワークです。我々のフレームワークには、AdaBoost、Bagging、Random Forestなど、メソッド間の自然な拡張とアイデアの一般化が含まれており、結果として、現在知られているすべてのメソッドだけでなく、これまで検討されていないいくつかのメソッドも得られます。 Over the years, a plethora of cost-sensitive methods have been proposed for learning on data when different types of misclassification errors incur different costs. Our contribution is a unifying framework that provides a comprehensive and insightful overview on cost-sensitive ensemble methods, pinpointing their differences and similarities via a fine-grained categorization. Our framework contains natural extensions and generalisations of ideas across methods, be it AdaBoost, Bagging or Random Forest, and as a result not only yields all methods known to date but also some not previously considered.	翻訳日:2022-11-10 13:58:25 公開日:2020-07-14
# ストリーミング確率的深部テンソル因子化 Streaming Probabilistic Deep Tensor Factorization ( http://arxiv.org/abs/2007.07367v1 ) ライセンス: Link先を確認	Shikai Fang, Zheng Wang, Zhimeng Pan, Ji Liu, Shandian Zhe	(参考訳) 既存のテンソル分解法の成功にもかかわらず、それらのほとんどが多重線形分解を行い、データ内の様々な複雑な相互作用を捉えるためにディープニューラルネットワークのような強力なモデリングフレームワークを利用することは滅多にない。より重要なのは、非常に表現力が高く、深い因子化のために、実世界のアプリケーションで広く使われているストリーミングデータを扱う効果的なアプローチが欠けていることです。これらの問題に対処するため、SPIDER(Streaming ProbabilistIc Deep tEnsoR factorization method)を提案する。まずベイズ型ニューラルネットワーク(nns)を用いて,深いテンソル分解モデルを構築した。我々は,nn重みよりも先にスパイク・アンド・スラブを割り当て,スパーシティを奨励し,過剰フィットを防止する。そこで我々はTaylor拡張とモーメントマッチングを用いてNN出力の後部を近似し、仮定密度フィルタおよび期待伝搬フレームワークにおいて効率的な後部推論アルゴリズムを開発するランニングモデルエビデンスを算出する。提案アルゴリズムは,新しいテンソルエントリを受信すると,潜在因子とnn重みの後方に応答的な更新を行い,一方,冗長/無使用重みを選択・抑制する。実世界の4つのアプリケーションにアプローチの利点を示す。 Despite the success of existing tensor factorization methods, most of them conduct a multilinear decomposition, and rarely exploit powerful modeling frameworks, like deep neural networks, to capture a variety of complicated interactions in data. More important, for highly expressive, deep factorization, we lack an effective approach to handle streaming data, which are ubiquitous in real-world applications. To address these issues, we propose SPIDER, a Streaming ProbabilistIc Deep tEnsoR factorization method. We first use Bayesian neural networks (NNs) to construct a deep tensor factorization model. We assign a spike-and-slab prior over the NN weights to encourage sparsity and prevent overfitting. We then use Taylor expansions and moment matching to approximate the posterior of the NN output and calculate the running model evidence, based on which we develop an efficient streaming posterior inference algorithm in the assumed-density-filtering and expectation propagation framework. Our algorithm provides responsive incremental updates for the posterior of the latent factors and NN weights upon receiving new tensor entries, and meanwhile select and inhibit redundant/useless weights. We show the advantages of our approach in four real-world applications.	翻訳日:2022-11-10 13:57:56 公開日:2020-07-14
# MainNetにSideNetを追加する Add a SideNet to your MainNet ( http://arxiv.org/abs/2007.13512v1 ) ライセンス: Link先を確認	Adrien Morisot	(参考訳) ディープニューラルネットワークの性能と人気が高まるにつれて、計算コストも増大している。ネットワークの計算フットプリント(量子化、プルーニング、知識蒸留)を減らすための効果的な技術は数多く存在するが、これらは入力に関係なく計算コストが同じであるモデルにつながる。私たちの人間の反応時間は、我々が実行するタスクの複雑さによって異なります。より簡単なタスク(例えば、ボートから犬を区別する)は、より難しいタスクよりもずっと高速に実行される(例えば、類似した2種類の犬種を区別する)。そこで本研究では,我々がsidenetと呼ぶ小さな分類層をmainnetと呼ぶ大規模事前学習済みネットワークにアタッチすることで,適応的ネットワーク複雑化の手法を開発した。入力が与えられると、サイドネットは、softmaxによって得られた信頼度レベルがユーザ決定しきい値を超えている場合に分類を返し、信頼度が低すぎる場合は、大きなメインネットに渡すだけである。これにより、ネットワークのパフォーマンスを計算コストで柔軟にトレードオフすることができます。実験結果から,プレトレーニング済みのResNetとBERT MainNetに加えられた単純な単一層パーセプトロン・サイドネットは,画像やテキストの分類タスクのパフォーマンスを最小限に抑えることができることがわかった。また,サイドネットによって得られる分類を校正し,他の計算量削減手法を補完し,計算精度空間の探索を容易にすること,という3つの望ましい特徴を強調する。 As the performance and popularity of deep neural networks has increased, so too has their computational cost. There are many effective techniques for reducing a network's computational footprint (quantisation, pruning, knowledge distillation), but these lead to models whose computational cost is the same regardless of their input. Our human reaction times vary with the complexity of the tasks we perform: easier tasks (e.g. telling apart dogs from boat) are executed much faster than harder ones (e.g. telling apart two similar looking breeds of dogs). Driven by this observation, we develop a method for adaptive network complexity by attaching a small classification layer, which we call SideNet, to a large pretrained network, which we call MainNet. Given an input, the SideNet returns a classification if its confidence level, obtained via softmax, surpasses a user determined threshold, and only passes it along to the large MainNet for further processing if its confidence is too low. This allows us to flexibly trade off the network's performance with its computational cost. Experimental results show that simple single hidden layer perceptron SideNets added onto pretrained ResNet and BERT MainNets allow for substantial decreases in compute with minimal drops in performance on image and text classification tasks. We also highlight three other desirable properties of our method, namely that the classifications obtained by SideNets are calibrated, complementary to other compute reduction techniques, and that they enable the easy exploration of compute accuracy space.	翻訳日:2022-11-10 13:57:26 公開日:2020-07-14
# マルチタスクランキングを用いたソーシャルメディア画像からの水位予測 Water level prediction from social media images with a multi-task ranking approach ( http://arxiv.org/abs/2007.06749v1 ) ライセンス: Link先を確認	P. Chaudhary, S. D'Aronco, J.P. Leitao, K. Schindler, J.D. Wegner	(参考訳) 洪水は最も頻繁で壊滅的な自然災害であり、世界中の何百万人もの人々に影響を与えている。正確な洪水地図を作成して(オフライン)計画し、(リアルタイム)洪水対策と洪水救助活動を行うことが重要である。おそらくソーシャルメディアから集めた画像は、そのタスクに有用な情報を提供することができるだろう。我々は,洪水時のソーシャルメディア画像から水深を推定するコンピュータビジョンシステムを導入し,洪水マップを(ほぼ)リアルタイムに構築する。本稿では,回帰学習とペアランキング損失の両方を用いてモデルを訓練するマルチタスク(ディープ)学習手法を提案する。画像に基づく水位推定の主なボトルネックはトレーニングデータであり,未制御の画像に適切な水深で注釈を付けるのに多くの労力を要する。本研究では,2つの画像のうち,どの画像が水位が高いかを示すのみを示す,注釈付き水位と,より弱いアノテーションのセットから,予測器を消耗的に学習する方法を実証する。さらに,DeepFloodという新たなデータセットと8145の注釈付き地上レベルの画像を提供し,提案手法により,1つのクラウドソース画像から約11cmの平均平方誤差で水位を予測することができることを示す。 Floods are among the most frequent and catastrophic natural disasters and affect millions of people worldwide. It is important to create accurate flood maps to plan (offline) and conduct (real-time) flood mitigation and flood rescue operations. Arguably, images collected from social media can provide useful information for that task, which would otherwise be unavailable. We introduce a computer vision system that estimates water depth from social media images taken during flooding events, in order to build flood maps in (near) real-time. We propose a multi-task (deep) learning approach, where a model is trained using both a regression and a pairwise ranking loss. Our approach is motivated by the observation that a main bottleneck for image-based flood level estimation is training data: it is diffcult and requires a lot of effort to annotate uncontrolled images with the correct water depth. We demonstrate how to effciently learn a predictor from a small set of annotated water levels and a larger set of weaker annotations that only indicate in which of two images the water level is higher, and are much easier to obtain. Moreover, we provide a new dataset, named DeepFlood, with 8145 annotated ground-level images, and show that the proposed multi-task approach can predict the water level from a single, crowd-sourced image with ~11 cm root mean square error.	翻訳日:2022-11-10 13:57:01 公開日:2020-07-14
# reluアクティベーションを用いたニューラルネットワークのための局所領域の線形領域数制限 Bounding The Number of Linear Regions in Local Area for Neural Networks with ReLU Activations ( http://arxiv.org/abs/2007.06803v1 ) ライセンス: Link先を確認	Rui Zhu, Bo Lin, Haixu Tang	(参考訳) 線形領域の数は、ReLUのような一方向線形活性化関数を用いたニューラルネットワークの特性の1つであり、他のアクティベーション関数を用いた従来の領域と比較する。この特性はニューラルネットワークファミリー([14])の表現性を反映しており、結果として、ニューラルネットワークモデルの構造的複雑さが計算する関数にどのように影響するかを特徴付けるのに使うことができる。それにもかかわらず、線形領域の数を直接計算することは困難であり、多くの研究者はReLUを用いて深部ニューラルネットワークの線形領域の数(特に上限値)を推定することに集中している。しかし、これらの手法は入力空間全体の上限を推定しようと試みた。理論的な手法では、入力空間の特定の領域内の線形領域の数、例えば、逆例やバックドアトリガーのような訓練データポイントを中心とする球数を推定することができない。本稿では,与えられたReLUニューラルネットワークの入力空間内の任意の球面における線形領域数の上界を推定する最初の手法を提案する。本手法を実装し,区分線形能動関数を用いて深層ニューラルネットワークにおける境界を計算した。実験の結果、ニューラルネットワークをトレーニングしている間、線形領域の境界はトレーニングデータポイントから離れる傾向にあることがわかった。さらに、トレーニングデータ点を中心とする球体は、入力空間内の任意の点よりも線状領域を多く含む傾向があることを観察する。我々の知る限りでは、これは特定のデータポイントの周りの線形領域の境界に関する最初の研究である。我々は、特定の入力領域におけるディープニューラルネットワークの構造的複雑さの調査に向けた第一歩であると考えている。 The number of linear regions is one of the distinct properties of the neural networks using piecewise linear activation functions such as ReLU, comparing with those conventional ones using other activation functions. Previous studies showed this property reflected the expressivity of a neural network family ([14]); as a result, it can be used to characterize how the structural complexity of a neural network model affects the function it aims to compute. Nonetheless, it is challenging to directly compute the number of linear regions; therefore, many researchers focus on estimating the bounds (in particular the upper bound) of the number of linear regions for deep neural networks using ReLU. These methods, however, attempted to estimate the upper bound in the entire input space. The theoretical methods are still lacking to estimate the number of linear regions within a specific area of the input space, e.g., a sphere centered at a training data point such as an adversarial example or a backdoor trigger. In this paper, we present the first method to estimate the upper bound of the number of linear regions in any sphere in the input space of a given ReLU neural network. We implemented the method, and computed the bounds in deep neural networks using the piece-wise linear active function. Our experiments showed that, while training a neural network, the boundaries of the linear regions tend to move away from the training data points. In addition, we observe that the spheres centered at the training data points tend to contain more linear regions than any arbitrary points in the input space. To the best of our knowledge, this is the first study of bounding linear regions around a specific data point. We consider our work as a first step toward the investigation of the structural complexity of deep neural networks in a specific input area.	翻訳日:2022-11-10 13:50:38 公開日:2020-07-14
# スペクトル誘導逆差学習 Spectrum-Guided Adversarial Disparity Learning ( http://arxiv.org/abs/2007.06831v1 ) ライセンス: Link先を確認	Zhe Liu, Lina Yao, Lei Bai, Xianzhi Wang, Can Wang	(参考訳) 行動認識領域におけるクラス内格差を正確に表現することは重要な課題であり、各活動クラスにおける主題固有の変動間の相関を堅牢に表現する必要がある。本研究では,2つの競合する符号化分布を用いてクラス条件付きクラス内不一致を表現し,学習された不一致を識別して精製された潜時符号を学習する,新しいエンド・ツー・エンドの学習フレームワークを提案する。さらに、ドメイン知識を教師なしの方法で組み込んで最適化をガイドし、パフォーマンスをさらに向上させる。 4つのharベンチマークデータセットを用いた実験により,提案手法のロバスト性と一般化が実証された。さらに,性能向上におけるドメイン知識の自動導入の有効性を実証する。 It has been a significant challenge to portray intraclass disparity precisely in the area of activity recognition, as it requires a robust representation of the correlation between subject-specific variation for each activity class. In this work, we propose a novel end-to-end knowledge directed adversarial learning framework, which portrays the class-conditioned intraclass disparity using two competitive encoding distributions and learns the purified latent codes by denoising learned disparity. Furthermore, the domain knowledge is incorporated in an unsupervised manner to guide the optimization and further boosts the performance. The experiments on four HAR benchmark datasets demonstrate the robustness and generalization of our proposed methods over a set of state-of-the-art. We further prove the effectiveness of automatic domain knowledge incorporation in performance enhancement.	翻訳日:2022-11-10 13:50:04 公開日:2020-07-14
# 非対称協調機械学習のための追加同型暗号化に基づくディープニューラルネットワーク Additively Homomorphical Encryption based Deep Neural Network for Asymmetrically Collaborative Machine Learning ( http://arxiv.org/abs/2007.06849v1 ) ライセンス: Link先を確認	Yifei Zhang and Hao Zhu	(参考訳) 金融セクターは、さまざまな機械学習技術を適用する多くの機会を提供する。集中型機械学習は金融セクターにおけるさらなる適用を制限する制約を生み出す。データプライバシは、さまざまなセクションでモデルを学習するさまざまな金融および保険アプリケーションにとって、基本的な課題である。本稿では,一方の当事者がデータを所有し,他方がラベルのみを所有する協調機械学習の新たな実践的手法を定義し,これを「非対称協調機械学習」と呼ぶ。本研究では,両者が協調的に深層学習モデルを学習し,それぞれのデータのプライバシーを保ちながら効率的に学習できる新しいプライバシ保護アーキテクチャを提案する。より具体的には、ニューラルネットワークの前方伝播と後方伝播を4つの異なるステップに分解し、これらのステップで情報漏洩を処理する新しいプロトコルを提案する。異なるデータセットに対する広範な実験は、精度の低下なしに安定したトレーニングを行うだけでなく、最先端システムと比較して100倍以上のスピードアップを示す。 The financial sector presents many opportunities to apply various machine learning techniques. Centralized machine learning creates a constraint which limits further applications in finance sectors. Data privacy is a fundamental challenge for a variety of finance and insurance applications that account on learning a model across different sections. In this paper, we define a new practical scheme of collaborative machine learning that one party owns data, but another party owns labels only, and term this \textbf{Asymmetrically Collaborative Machine Learning}. For this scheme, we propose a novel privacy-preserving architecture where two parties can collaboratively train a deep learning model efficiently while preserving the privacy of each party's data. More specifically, we decompose the forward propagation and backpropagation of the neural network into four different steps and propose a novel protocol to handle information leakage in these steps. Our extensive experiments on different datasets demonstrate not only stable training without accuracy loss, but also more than 100 times speedup compared with the state-of-the-art system.	翻訳日:2022-11-10 13:49:51 公開日:2020-07-14
# 行動空間対応訓練による強化学習エージェントの強固化 Robustifying Reinforcement Learning Agents via Action Space Adversarial Training ( http://arxiv.org/abs/2007.07176v1 ) ライセンス: Link先を確認	Kai Liang Tan, Yasaman Esfandiari, Xian Yeow Lee, Aakanksha, Soumik Sarkar	(参考訳) 機械学習(ML)に対応したサイバー物理システム(CPS)の採用は、輸送、産業、電力網といった現代社会の様々な分野で広く普及している。深層強化学習(DRL)の最近の研究は、様々なデータ駆動型意思決定と制御アプリケーションにおいてその利点を実証している。 ML対応システムへの依存度が高まるにつれて、悪意のある状態とアクチュエーター攻撃の下でこれらのシステムの性能を研究することが不可欠である。従来の制御システムはレジリエント/フォールト耐性のコントローラを採用しており、エラー観測によってシステムを修正している。しかし、いくつかのアプリケーションでは、回復力のあるコントローラは破滅的な失敗を避けるには不十分である。理想的には、堅牢なアプローチは、システムを本質的に(設計によって)敵の攻撃に対して堅牢なシナリオにおいてより有用である。堅牢な制御には長い歴史があるが、堅牢なMLは、その関連性と緊急性をすでに示している新興の研究分野である。しかしながら、ロバストなML研究の大部分は、意思決定や制御タスクではなく、知覚タスクに焦点を合わせてきたが、制御アプリケーションに使用されるML(特にRL)モデルは、敵の攻撃に対して等しく脆弱である。本稿では,動作空間の摂動(アクチュエータアタックなど)の影響を受けやすいDRLエージェントを,対向訓練により同様の摂動に対して堅牢化することができることを示す。 Adoption of machine learning (ML)-enabled cyber-physical systems (CPS) are becoming prevalent in various sectors of modern society such as transportation, industrial, and power grids. Recent studies in deep reinforcement learning (DRL) have demonstrated its benefits in a large variety of data-driven decisions and control applications. As reliance on ML-enabled systems grows, it is imperative to study the performance of these systems under malicious state and actuator attacks. Traditional control systems employ resilient/fault-tolerant controllers that counter these attacks by correcting the system via error observations. However, in some applications, a resilient controller may not be sufficient to avoid a catastrophic failure. Ideally, a robust approach is more useful in these scenarios where a system is inherently robust (by design) to adversarial attacks. While robust control has a long history of development, robust ML is an emerging research area that has already demonstrated its relevance and urgency. However, the majority of robust ML research has focused on perception tasks and not on decision and control tasks, although the ML (specifically RL) models used for control applications are equally vulnerable to adversarial attacks. In this paper, we show that a well-performing DRL agent that is initially susceptible to action space perturbations (e.g. actuator attacks) can be robustified against similar perturbations through adversarial training.	翻訳日:2022-11-10 13:48:02 公開日:2020-07-14
# 自然言語からの知的な要求工学とCADモデルへの連鎖 Intelligent requirements engineering from natural language and their chaining toward CAD models ( http://arxiv.org/abs/2007.07825v1 ) ライセンス: Link先を確認	Alain-J\'er\^ome Foug\`eres and Egon Ostrosi	(参考訳) 本稿では,デザイナーの創造性を設計する上で,デザイン言語が重要な役割を担っていると仮定する。設計者は、思考の補助、議論と意思決定の焦点、提案の信頼性を評価する手段としてモデルを使用し、開発する。本稿では,自然言語からの要求工学とCADモデルへの連鎖に関するインテリジェントな手法を提案する。言語分析から工学的要求の表現への移行は、構文構造を概念グラフで表される意味形式に変換することから成り立っている。概念グラフと述語論理の間の同型に基づいて、仕様の形式言語が提案されている。この言語の結果は連鎖し、コンピュータ支援3次元インタラクティブアプリケーション(catia)モデルに翻訳される。このツール(EGEON: Engineering desiGn sEmantics elabOration and ApplicatioN)は、エンジニアリング要件のセマンティックネットワークを表現するために開発された。提案手法を説明するために, 自動車ドアヒンジの設計に関する事例研究を行った。 This paper assumes that design language plays an important role in how designers design and on the creativity of designers. Designers use and develop models as an aid to thinking, a focus for discussion and decision-making and a means of evaluating the reliability of the proposals. This paper proposes an intelligent method for requirements engineering from natural language and their chaining toward CAD models. The transition from linguistic analysis to the representation of engineering requirements consists of the translation of the syntactic structure into semantic form represented by conceptual graphs. Based on the isomorphism between conceptual graphs and predicate logic, a formal language of the specification is proposed. The outcome of this language is chained and translated in Computer Aided Three-Dimensional Interactive Application (CATIA) models. The tool (EGEON: Engineering desiGn sEmantics elabOration and applicatioN) is developed to represent the semantic network of engineering requirements. A case study on the design of a car door hinge is presented to illustrates the proposed method.	翻訳日:2022-11-10 13:41:46 公開日:2020-07-14
# 名前の由来は? BERT は Entity Representations を他のどの名前にも最適か? What's in a Name? Are BERT Named Entity Representations just as Good for any other Name? ( http://arxiv.org/abs/2007.06897v1 ) ライセンス: Link先を確認	Sriram Balasubramanian, Naman Jain, Gaurav Jindal, Abhijeet Awasthi, Sunita Sarawagi	(参考訳) BERTをベースとしたNLPモデルの名前付きエンティティ表現は,入力中の同じ型付きクラスからの置換に対するロバスト性を調べることで評価する。このような摂動は自然であるが、いくつかのタスクにおいて、訓練されたモデルの状況は驚くほど不安定である。脆性は、最近のエンティティ対応bertモデルでも継続される。また,この非ロバスト性の原因を,トークン化や発生頻度などの要因を考慮して識別する。タイプアノテーションの不確かさとラベル予測を共同でモデル化しながら,複数の置換子から予測をアンサンブルする簡易な手法を提案する。 3つのNLPタスクの実験から,本手法は自然・逆のデータセットの堅牢性を向上し,精度を高めることが示された。 We evaluate named entity representations of BERT-based NLP models by investigating their robustness to replacements from the same typed class in the input. We highlight that on several tasks while such perturbations are natural, state of the art trained models are surprisingly brittle. The brittleness continues even with the recent entity-aware BERT models. We also try to discern the cause of this non-robustness, considering factors such as tokenization and frequency of occurrence. Then we provide a simple method that ensembles predictions from multiple replacements while jointly modeling the uncertainty of type annotations and label predictions. Experiments on three NLP tasks show that our method enhances robustness and increases accuracy on both natural and adversarial datasets.	翻訳日:2022-11-10 13:41:01 公開日:2020-07-14
# 機械翻訳におけるシステム結合投票のモデル化 Modeling Voting for System Combination in Machine Translation ( http://arxiv.org/abs/2007.06943v1 ) ライセンス: Link先を確認	Xuancheng Huang, Jiacheng Zhang, Zhixing Tan, Derek F. Wong, Huanbo Luan, Jingfang Xu, Maosong Sun, Yang Liu	(参考訳) システム結合は、異なる機械翻訳システムの仮説を組み合わせる重要な技術であり、翻訳性能を向上させる。システム組み合わせに対する初期の統計的アプローチは仮説間のコンセンサスを分析するのに有効であることが証明されているが、パイプラインの使用によるエラー伝搬の問題に悩まされている。この問題は、近年のマルチソースシーケンス・ツー・シーケンスモデルのエンドツーエンドトレーニングによって緩和されているが、これらのニューラルモデルは仮説間の関係を明示的に分析せず、仮説中の単語への注意が独立に計算されるため、複数の仮説で単語が生じる可能性を無視する。本研究では,機械翻訳におけるシステム組み合わせに対する投票のモデル化手法を提案する。基本的な考え方は、異なるシステムからの仮説における単語を、代表的で生成プロセスに関与するべき単語に投票できるようにすることである。これは、各投票者の影響力と各候補者の選好を定量化する。本手法は,仮説間の関係を解析できるだけでなく,エンドツーエンドのトレーニングを可能にするため,統計的手法とニューラル手法の利点を組み合わせる。実験の結果,我々の手法は仮説のコンセンサスをうまく活用でき,中国語とドイツ語の機械翻訳タスクにおける最先端のベースラインを大幅に改善できることがわかった。 System combination is an important technique for combining the hypotheses of different machine translation systems to improve translation performance. Although early statistical approaches to system combination have been proven effective in analyzing the consensus between hypotheses, they suffer from the error propagation problem due to the use of pipelines. While this problem has been alleviated by end-to-end training of multi-source sequence-to-sequence models recently, these neural models do not explicitly analyze the relations between hypotheses and fail to capture their agreement because the attention to a word in a hypothesis is calculated independently, ignoring the fact that the word might occur in multiple hypotheses. In this work, we propose an approach to modeling voting for system combination in machine translation. The basic idea is to enable words in hypotheses from different systems to vote on words that are representative and should get involved in the generation process. This can be done by quantifying the influence of each voter and its preference for each candidate. Our approach combines the advantages of statistical and neural methods since it can not only analyze the relations between hypotheses but also allow for end-to-end training. Experiments show that our approach is capable of better taking advantage of the consensus between hypotheses and achieves significant improvements over state-of-the-art baselines on Chinese-English and English-German machine translation tasks.	翻訳日:2022-11-10 13:40:48 公開日:2020-07-14
# 質問応答におけるホログラフィック圧縮埋め込みの利用 Using Holographically Compressed Embeddings in Question Answering ( http://arxiv.org/abs/2007.07287v1 ) ライセンス: Link先を確認	Salvador E. Barbosa	(参考訳) 単語ベクトル表現は、ディープラーニング自然言語処理モデルの中心である。埋め込みとして知られるこれらのベクトルの多くの形式があり、例えば word2vec や GloVe がある。埋め込みは大きなコーパスで訓練され、文脈で単語の使用法を学び、単語間の意味的関係を捉える。しかし、そのような訓練のセマンティクスは(単語型として知られる)異なる単語のレベルであり、例えば、単語型が名詞または動詞である場合、曖昧である可能性がある。質問応答では、入力部分と名前付きエンティティタイプが重要であるが、これらの属性を神経モデルにエンコードすることで入力のサイズが拡大する。本研究は,予め訓練された埋め込みのホログラフィック圧縮を用いて,トークン,その部分表現,名前付きエンティティタイプを,トークンのみを表すのと同じ次元で表現する。この実装は、修正された質問応答の繰り返しディープラーニングネットワークにおいて、意味的関係が保存され、高い性能が得られることを示す。 Word vector representations are central to deep learning natural language processing models. Many forms of these vectors, known as embeddings, exist, including word2vec and GloVe. Embeddings are trained on large corpora and learn the word's usage in context, capturing the semantic relationship between words. However, the semantics from such training are at the level of distinct words (known as word types), and can be ambiguous when, for example, a word type can be either a noun or a verb. In question answering, parts-of-speech and named entity types are important, but encoding these attributes in neural models expands the size of the input. This research employs holographic compression of pre-trained embeddings, to represent a token, its part-of-speech, and named entity type, in the same dimension as representing only the token. The implementation, in a modified question answering recurrent deep learning network, shows that semantic relationships are preserved, and yields strong performance.	翻訳日:2022-11-10 13:40:23 公開日:2020-07-14
# 単部適応Q-ラーニング Single-partition adaptive Q-learning ( http://arxiv.org/abs/2007.06741v1 ) ライセンス: Link先を確認	Jo\~ao Pedro Ara\'ujo, M\'ario Figueiredo, Miguel Ayala Botto	(参考訳) 本稿では、マルコフ決定過程(MDP)の状態空間を適応的に分割するモデルフリー・エピソード強化学習(RL)のアルゴリズムである単一分割適応Q-ラーニング(SPAQL)を紹介し、同時に時間不変ポリシー(例えば、状態から行動へのマッピングはエピソード時間ステップに依存しない)を学習し、累積報酬を最大化する。探索と搾取の間のトレードオフは、訓練中にuper confidence bounds(ucb)とboltzmann exploration(ボルツマン探索)の混合物を使い、トレーニングの進捗に合わせて自動的に調整される温度パラメータを用いて処理される。このアルゴリズムは適応型Q-ラーニング(AQL)よりも改善されている。最適な解に速く収束すると同時に、より少ないアームを使用する。多数のタイムステップを持つエピソードのテストでは、SPAQLはAQLとは異なり、スケーリングに問題はないことが示されている。この経験的証拠に基づき、SPAQLはAQLよりも高いサンプリング効率を持つため、効率的なモデルフリーなRL手法の分野における重要な貢献であると主張している。 This paper introduces single-partition adaptive Q-learning (SPAQL), an algorithm for model-free episodic reinforcement learning (RL), which adaptively partitions the state-action space of a Markov decision process (MDP), while simultaneously learning a time-invariant policy (i. e., the mapping from states to actions does not depend explicitly on the episode time step) for maximizing the cumulative reward. The trade-off between exploration and exploitation is handled by using a mixture of upper confidence bounds (UCB) and Boltzmann exploration during training, with a temperature parameter that is automatically tuned as training progresses. The algorithm is an improvement over adaptive Q-learning (AQL). It converges faster to the optimal solution, while also using fewer arms. Tests on episodes with a large number of time steps show that SPAQL has no problems scaling, unlike AQL. Based on this empirical evidence, we claim that SPAQL may have a higher sample efficiency than AQL, thus being a relevant contribution to the field of efficient model-free RL methods.	翻訳日:2022-11-10 13:39:15 公開日:2020-07-14
# 比較とリウェイト:類似画像集合を用いた識別的画像キャプション Compare and Reweight: Distinctive Image Captioning Using Similar Images Sets ( http://arxiv.org/abs/2007.06877v1 ) ライセンス: Link先を確認	Jiuniu Wang, Wenjia Xu, Qingzhong Wang, Antoni B. Chan	(参考訳) BLEU、CIDEr、SPICEといった一般的な指標に基づいて、幅広い画像キャプションモデルが開発され、大幅に改善されている。しかし、生成されたキャプションは画像を正確に記述できるが、類似した画像には汎用的であり、各画像の特異性を適切に記述することができない。本稿では,類似画像の集合を用いた訓練により,画像キャプションの識別性を向上することを目的とする。まず,類似画像に対する字幕の識別性を評価するために,セットcider(ciderbtw)間の識別性指標を提案する。評価基準は,各画像の人的アノテーションが特徴性に基づいて等価でないことを示す。そこで本研究では,CIDErBtwを重み付き損失関数あるいは強化学習報酬として用いることにより,画像毎のキャプションの特異性を高めるための新たなトレーニング戦略を提案する。最後に,提案手法は,CIDErBtwで測定した特徴量と,CIDErで測定した精度(例えば,CIDErで測定した精度)を,多種多様な画像キャプションベースラインに対して有意に改善することを示す。これらの結果はユーザ調査によってさらに確認される。 A wide range of image captioning models has been developed, achieving significant improvement based on popular metrics, such as BLEU, CIDEr, and SPICE. However, although the generated captions can accurately describe the image, they are generic for similar images and lack distinctiveness, i.e., cannot properly describe the uniqueness of each image. In this paper, we aim to improve the distinctiveness of image captions through training with sets of similar images. First, we propose a distinctiveness metric -- between-set CIDEr (CIDErBtw) to evaluate the distinctiveness of a caption with respect to those of similar images. Our metric shows that the human annotations of each image are not equivalent based on distinctiveness. Thus we propose several new training strategies to encourage the distinctiveness of the generated caption for each image, which are based on using CIDErBtw in a weighted loss function or as a reinforcement learning reward. Finally, extensive experiments are conducted, showing that our proposed approach significantly improves both distinctiveness (as measured by CIDErBtw and retrieval metrics) and accuracy (e.g., as measured by CIDEr) for a wide variety of image captioning baselines. These results are further confirmed through a user study.	翻訳日:2022-11-10 13:33:27 公開日:2020-07-14
# 自動合成から現実への一般化 Automated Synthetic-to-Real Generalization ( http://arxiv.org/abs/2007.06965v1 ) ライセンス: Link先を確認	Wuyang Chen, Zhiding Yu, Zhangyang Wang, Anima Anandkumar	(参考訳) 合成画像で訓練されたモデルは、しばしば実データへの分解された一般化に直面します。慣例として、これらのモデルはimagenet事前学習された表現で初期化されることが多い。しかし、この知識を活用して一般化能力を維持する慣習にもかかわらず、イメージネット知識の役割はほとんど議論されない。例えば、早期停止と階層的学習率の慎重な調整は、合成と現実の一般化を改善することが示されるが、熱心でヒューリスティックでもある。本研究では, 合成学習モデルに対して, imagenet 事前学習モデルと類似表現を維持することを明示的に推奨し, 層別学習率の自動選択のための \textit{learning-to-optimize (l2o)" 戦略を提案する。提案フレームワークは,実データを見たりトレーニングしたりすることなく,合成から実への一般化性能を大幅に向上できると同時に,ドメイン適応などの下流タスクにもメリットがある。コードは、https://github.com/NVlabs/ASG.comで入手できる。 Models trained on synthetic images often face degraded generalization to real data. As a convention, these models are often initialized with ImageNet pre-trained representation. Yet the role of ImageNet knowledge is seldom discussed despite common practices that leverage this knowledge to maintain the generalization ability. An example is the careful hand-tuning of early stopping and layer-wise learning rates, which is shown to improve synthetic-to-real generalization but is also laborious and heuristic. In this work, we explicitly encourage the synthetically trained model to maintain similar representations with the ImageNet pre-trained model, and propose a \textit{learning-to-optimize (L2O)} strategy to automate the selection of layer-wise learning rates. We demonstrate that the proposed framework can significantly improve the synthetic-to-real generalization performance without seeing and training on real data, while also benefiting downstream tasks such as domain adaptation. Code is available at: https://github.com/NVlabs/ASG.	翻訳日:2022-11-10 13:33:03 公開日:2020-07-14
# 破滅的忘れの解剖--隠れた表現とタスクの意味論 Anatomy of Catastrophic Forgetting: Hidden Representations and Task Semantics ( http://arxiv.org/abs/2007.07400v1 ) ライセンス: Link先を確認	Vinay V. Ramasesh, Ethan Dyer, Maithra Raghu	(参考訳) 汎用機械学習システムの開発における中心的な課題は、破滅的な忘れさだ。タスクの順序でトレーニングされたモデルが、以前のタスクで大幅なパフォーマンス低下を被る。破滅的な忘れ物が多用されているにもかかわらず、基礎となるプロセスとその原因についての理解は限られている。本稿では,この重要な知識ギャップに対処し,ニューラルネットワークモデルにおいて,忘れることが表現に与える影響について検討する。表現分析手法により,深い層が忘れの源であることがわかった。これを支持するために、忘れを緩和する方法の研究は、より深い層を安定化するために働くことを示す。これらの洞察は、タスク間の表象的類似性を忘れる程度に関連する分析的議論と経験的図の開発を可能にする。この図と一致して、中間相似性を持つタスクシーケンスの最大忘れが観測される。我々は、標準分割CIFAR-10セットアップに関する実証的研究を行い、また、現実的な入力分布シフトを近似する新しいCIFAR-100タスクを導入する。 A central challenge in developing versatile machine learning systems is catastrophic forgetting: a model trained on tasks in sequence will suffer significant performance drops on earlier tasks. Despite the ubiquity of catastrophic forgetting, there is limited understanding of the underlying process and its causes. In this paper, we address this important knowledge gap, investigating how forgetting affects representations in neural network models. Through representational analysis techniques, we find that deeper layers are disproportionately the source of forgetting. Supporting this, a study of methods to mitigate forgetting illustrates that they act to stabilize deeper layers. These insights enable the development of an analytic argument and empirical picture relating the degree of forgetting to representational similarity between tasks. Consistent with this picture, we observe maximal forgetting occurs for task sequences with intermediate similarity. We perform empirical studies on the standard split CIFAR-10 setup and also introduce a novel CIFAR-100 based task approximating realistic input distribution shift.	翻訳日:2022-11-10 13:32:02 公開日:2020-07-14
# Sudo rm -rf: ユニバーサル音源分離のための効率的なネットワーク Sudo rm -rf: Efficient Networks for Universal Audio Source Separation ( http://arxiv.org/abs/2007.06833v1 ) ライセンス: Link先を確認	Efthymios Tzinis, Zhepei Wang and Paris Smaragdis	(参考訳) 本稿では,エンドツーエンドの汎用音源分離のための効率的なニューラルネットワークを提案する。具体的には、この畳み込みネットワークのバックボーン構造は、単純な1次元畳み込みによって実行される、複数の解像度特徴(sudormrf)の連続的なダウンサンプリングと再サンプリングである。このようにして,浮動小数点演算数,メモリ要求数,パラメータ数,レイテンシを限定した高品質なオーディオソース分離を実現することができる。音声と環境音の分離データセットを用いた実験により,SuDoRMRFは相容れない性能を示し,計算資源の要求が大幅に高い様々な最先端手法を超越していることがわかった。 In this paper, we present an efficient neural network for end-to-end general purpose audio source separation. Specifically, the backbone structure of this convolutional network is the SUccessive DOwnsampling and Resampling of Multi-Resolution Features (SuDoRMRF) as well as their aggregation which is performed through simple one-dimensional convolutions. In this way, we are able to obtain high quality audio source separation with limited number of floating point operations, memory requirements, number of parameters and latency. Our experiments on both speech and environmental sound separation datasets show that SuDoRMRF performs comparably and even surpasses various state-of-the-art approaches with significantly higher computational resource requirements.	翻訳日:2022-11-10 13:31:47 公開日:2020-07-14
# Rewardsによるプログラミング Programming by Rewards ( http://arxiv.org/abs/2007.06835v1 ) ライセンス: Link先を確認	Nagarajan Natarajan, Ajaykrishna Karthikeyan, Prateek Jain, Ivan Radicek, Sriram Rajamani, Sumit Gulwani, Johannes Gehrke	(参考訳) PBR(Programming by rewards)は,パフォーマンスや資源利用,あるいはベンチマーク上の正当性などの定量的指標を最適化するために,サブルーチンを指定・合成するための新しい手法である。 PBR仕様は(1)入力機能$x$、(2)報酬関数$r$で、ブラックボックスコンポーネントとしてモデル化され、実行毎に報酬を割り当てる。シンセサイザーの目標は「決定関数」$f$を合成することであり、ブラックボックスコンポーネントの判断値を変換して、様々な値の$x$に対して$f(x)$を実行するための期待報酬$e[r \circ f(x)]$を最大化することである。我々は,木構造における入力特徴の線形関数を分岐し,木の葉における入力の線形関数を計算するループフリーif-then-elseプログラムのdslにおける決定関数の空間を考える。このdslはプログラマが実際に手作業で記述した決定関数をキャプチャする。我々の技術的貢献は、if-then-elseプログラムのような決定関数の合成に連続最適化技術を使うことである。また、このフレームワークは理論的に確立された -- 報酬が優れた特性を満たす場合において、合成されたコードは正確な意味で最適であることを示す。我々は,pbrを活用して,proseコードベースにおける検索・ランキングヒューリスティックスに関連する非自明な決定関数(産業強度プログラム合成フレームワーク)を合成し,複数人のチューニングにおいて手作業による手続きと競合する結果を得る。実世界のケーススタディ(PROSEを含む)と単純な合成ベンチマークにおいて,他のベースライン技術に対する実証評価を行った。 We formalize and study ``programming by rewards'' (PBR), a new approach for specifying and synthesizing subroutines for optimizing some quantitative metric such as performance, resource utilization, or correctness over a benchmark. A PBR specification consists of (1) input features $x$, and (2) a reward function $r$, modeled as a black-box component (which we can only run), that assigns a reward for each execution. The goal of the synthesizer is to synthesize a "decision function" $f$ which transforms the features to a decision value for the black-box component so as to maximize the expected reward $E[r \circ f (x)]$ for executing decisions $f(x)$ for various values of $x$. We consider a space of decision functions in a DSL of loop-free if-then-else programs, which can branch on linear functions of the input features in a tree-structure and compute a linear function of the inputs in the leaves of the tree. We find that this DSL captures decision functions that are manually written in practice by programmers. Our technical contribution is the use of continuous-optimization techniques to perform synthesis of such decision functions as if-then-else programs. We also show that the framework is theoretically-founded ---in cases when the rewards satisfy nice properties, the synthesized code is optimal in a precise sense. We have leveraged PBR to synthesize non-trivial decision functions related to search and ranking heuristics in the PROSE codebase (an industrial strength program synthesis framework) and achieve competitive results to manually written procedures over multiple man years of tuning. We present empirical evaluation against other baseline techniques over real-world case studies (including PROSE) as well on simple synthetic benchmarks.	翻訳日:2022-11-10 13:31:36 公開日:2020-07-14
# 注目すべき発話予測による医師・患者会話の構造化データ抽出 Extracting Structured Data from Physician-Patient Conversations By Predicting Noteworthy Utterances ( http://arxiv.org/abs/2007.07151v1 ) ライセンス: Link先を確認	Kundan Krishna, Amy Pavel, Benjamin Schloss, Jeffrey P. Bigham, Zachary C. Lipton	(参考訳) 医療データの多様なモダリティを発掘する様々な努力にもかかわらず、診療当時の医師と患者の会話は未解決の洞察の源である。本稿では,このデータを利用して医師の電子的健康記録における訪問後の文書化を支援する構造情報を抽出し,聖職者の負担軽減を図る。本稿では,会話の書き起こし,ビジット後の要約,それに対応する証拠(転写文),構造化ラベルからなる新しいデータセットについて述べる。我々は, 臓器システム(ros)のレビューにおいて, 関連する診断や異常の認識の課題に焦点をあてる。方法論上の課題の1つは、会話が長い(約1500語)ため、現代のディープラーニングモデルがそれらを入力として使用するのが困難である。この課題に対処するために,会話の一部が要約文を支持する証拠として引用される可能性が高い,注目すべき発話を抽出する。予測音声を初めてフィルタリングすることにより,診断とRoS異常の両方を認識するための予測性能を大幅に向上させることができる。 Despite diverse efforts to mine various modalities of medical data, the conversations between physicians and patients at the time of care remain an untapped source of insights. In this paper, we leverage this data to extract structured information that might assist physicians with post-visit documentation in electronic health records, potentially lightening the clerical burden. In this exploratory study, we describe a new dataset consisting of conversation transcripts, post-visit summaries, corresponding supporting evidence (in the transcript), and structured labels. We focus on the tasks of recognizing relevant diagnoses and abnormalities in the review of organ systems (RoS). One methodological challenge is that the conversations are long (around 1500 words), making it difficult for modern deep-learning models to use them as input. To address this challenge, we extract noteworthy utterances---parts of the conversation likely to be cited as evidence supporting some summary sentence. We find that by first filtering for (predicted) noteworthy utterances, we can significantly boost predictive performance for recognizing both diagnoses and RoS abnormalities.	翻訳日:2022-11-10 13:24:04 公開日:2020-07-14
# 評価基準は包括的一般化の更新を必要とします Our Evaluation Metric Needs an Update to Encourage Generalization ( http://arxiv.org/abs/2007.06898v1 ) ライセンス: Link先を確認	Swaroop Mishra, Anjana Arunkumar, Chris Bryan and Chitta Baral	(参考訳) いくつかの人気のあるベンチマークで人的パフォーマンスを上回るモデルでは、out of Distribution(OOD)データに曝露した場合のパフォーマンスが著しく低下する。最近の研究では、モデルが人間のような一般化可能な特徴を学習する代わりに、刺激的なバイアスや「ハック」データセットに過度に適合していることが示されている。モデル性能のインフレーション(つまりAIシステムの能力の過大評価)を抑えるため、我々は、評価中の一般化を促進する単純で斬新な評価指標であるWOODスコアを提案する。 Models that surpass human performance on several popular benchmarks display significant degradation in performance on exposure to Out of Distribution (OOD) data. Recent research has shown that models overfit to spurious biases and `hack' datasets, in lieu of learning generalizable features like humans. In order to stop the inflation in model performance -- and thus overestimation in AI systems' capabilities -- we propose a simple and novel evaluation metric, WOOD Score, that encourages generalization during evaluation.	翻訳日:2022-11-10 13:23:47 公開日:2020-07-14
# 固有タスクを用いた生涯学習:タスク分離、スキル獲得、選択転送 Lifelong Learning using Eigentasks: Task Separation, Skill Acquisition, and Selective Transfer ( http://arxiv.org/abs/2007.06918v1 ) ライセンス: Link先を確認	Aswin Raghavan, Jesse Hostetler, Indranil Sur, Abrar Rahman, Ajay Divakaran	(参考訳) 生涯学習のための固有タスクフレームワークを紹介する。固有タスク(eigentask)とは、関連するタスクの集合を解決するスキルのペアであり、そのスキルの入力空間からサンプルできる生成モデルとペアリングする。このフレームワークは、主に破滅的な忘れを避けるために使われてきた生成的リプレイアプローチを拡張し、フォワード・ナレッジ・トランスファーのような他の生涯学習目標にも対処する。我々は,学習のためのタスク学習と知識統合を交互に行うウェイクスリープサイクルを提案し,生涯教師付き学習と生涯rlをインスタンス化する。我々は,教師付き連続学習における最先端の性能向上を実現し,ゲーム『スタークラフト2』の生涯RLアプリケーションにおけるフォワード知識伝達の証拠を示す。 We introduce the eigentask framework for lifelong learning. An eigentask is a pairing of a skill that solves a set of related tasks, paired with a generative model that can sample from the skill's input space. The framework extends generative replay approaches, which have mainly been used to avoid catastrophic forgetting, to also address other lifelong learning goals such as forward knowledge transfer. We propose a wake-sleep cycle of alternating task learning and knowledge consolidation for learning in our framework, and instantiate it for lifelong supervised learning and lifelong RL. We achieve improved performance over the state-of-the-art in supervised continual learning, and show evidence of forward knowledge transfer in a lifelong RL application in the game Starcraft2.	翻訳日:2022-11-10 13:23:38 公開日:2020-07-14
# リパラメータ化によるMLシステムの検証 Verification of ML Systems via Reparameterization ( http://arxiv.org/abs/2007.06776v1 ) ライセンス: Link先を確認	Jean-Baptiste Tristan, Joseph Tassarotti, Koundinya Vajjha, Michael L. Wick, Anindya Banerjee	(参考訳) 機械学習が本質的なシステムでますます使われているため、深刻なバグの発生を低減または排除することが重要である。成長する研究機関は、パフォーマンス、堅牢性、公正性に関する正式な保証を備えた機械学習アルゴリズムを開発した。しかし、これらのアルゴリズムの分析はしばしば複雑であり、実際にそのようなシステムを実装するとエラーの余地が生じる。証明アシスタントは、そのようなバグを除外する正当性のマシンチェック証明を構築することによって、機械学習システムの正式な検証に使用できる。しかし、証明アシスタントの内部での確率的主張の推論は依然として困難である。確率的プログラムが 'emph{reparameterization} という概念を用いて定理証明器で自動的に表現され、また可測性の退屈な証明が確率的プログラムから自動的に生成されることを示す。このアプローチが、かなり異なるタイプの機械学習システムを扱うのに十分広いことを実証するために、統計的学習理論(PAC-learnability of decision stumps)からの古典的な結果と、ベイズ仮説テストで用いられるヌルモデルが、人口統計パリティと呼ばれる公正な基準を満たすことを証明した。 As machine learning is increasingly used in essential systems, it is important to reduce or eliminate the incidence of serious bugs. A growing body of research has developed machine learning algorithms with formal guarantees about performance, robustness, or fairness. Yet, the analysis of these algorithms is often complex, and implementing such systems in practice introduces room for error. Proof assistants can be used to formally verify machine learning systems by constructing machine checked proofs of correctness that rule out such bugs. However, reasoning about probabilistic claims inside of a proof assistant remains challenging. We show how a probabilistic program can be automatically represented in a theorem prover using the concept of \emph{reparameterization}, and how some of the tedious proofs of measurability can be generated automatically from the probabilistic program. To demonstrate that this approach is broad enough to handle rather different types of machine learning systems, we verify both a classic result from statistical learning theory (PAC-learnability of decision stumps) and prove that the null model used in a Bayesian hypothesis test satisfies a fairness criterion called demographic parity.	翻訳日:2022-11-10 13:23:24 公開日:2020-07-14
# 因果推論の線形構造方程式モデルにおけるロバスト同定可能性 Robust Identifiability in Linear Structural Equation Models of Causal Inference ( http://arxiv.org/abs/2007.06869v1 ) ライセンス: Link先を確認	Karthik Abinav Sankararaman, Anand Louis, Navin Goyal	(参考訳) 本研究では,線形構造方程式モデル(LSEM)の文脈における観測データからのロバストパラメータ推定の問題について考察する。 LSEMは、自然科学と社会科学の因果関係を推定するための、人気がありよく研究されているモデルのクラスである。 LSEMに関連する主な問題の1つは、観測データからモデルパラメータを復元することである。 LSEMとモデルパラメータの様々な条件の下で、先行研究はパラメータを復元する効率的なアルゴリズムを提供する。しかし、これらの結果はしばしば汎用的な識別可能性に関するものである。実際には、一般的な識別性は十分ではなく、堅牢な識別性が必要であり、観測データの小さな変化はパラメータに多大な影響を及ぼすべきではない。ロバストな識別性は、はるかに少ない注目を受けており、まだ理解されていない。 sankararaman et al. (2019) は最近、ロバストな識別性が実現可能なパラメータに関する十分条件のセットを提供した。しかしながら、彼らの研究の限界は、それらの結果は ``bow-free paths と呼ばれるLSEMの小さなサブクラスにのみ適用されることである。「'この作業では、複数の次元に沿って作業を大幅に拡張します。まず,大規模かつ十分に検討されたlsemsクラス,すなわち`bow free''モデルに対して,ロバスト識別性が保持するモデルパラメータに関する十分な条件を提供し,事前作業に必要なパスの制限を解消する。次に,この十分条件が高い確率で保持されることを示すことにより,頑健な識別可能性の大きい集合に対して,既存のアルゴリズムが既に頑健な識別可能性を達成していることを示す。最後に、シミュレーションと実世界の両方のデータセットで結果を検証する。 In this work, we consider the problem of robust parameter estimation from observational data in the context of linear structural equation models (LSEMs). LSEMs are a popular and well-studied class of models for inferring causality in the natural and social sciences. One of the main problems related to LSEMs is to recover the model parameters from the observational data. Under various conditions on LSEMs and the model parameters the prior work provides efficient algorithms to recover the parameters. However, these results are often about generic identifiability. In practice, generic identifiability is not sufficient and we need robust identifiability: small changes in the observational data should not affect the parameters by a huge amount. Robust identifiability has received far less attention and remains poorly understood. Sankararaman et al. (2019) recently provided a set of sufficient conditions on parameters under which robust identifiability is feasible. However, a limitation of their work is that their results only apply to a small sub-class of LSEMs, called ``bow-free paths.'' In this work, we significantly extend their work along multiple dimensions. First, for a large and well-studied class of LSEMs, namely ``bow free'' models, we provide a sufficient condition on model parameters under which robust identifiability holds, thereby removing the restriction of paths required by prior work. We then show that this sufficient condition holds with high probability which implies that for a large set of parameters robust identifiability holds and that for such parameters, existing algorithms already achieve robust identifiability. Finally, we validate our results on both simulated and real-world datasets.	翻訳日:2022-11-10 13:23:03 公開日:2020-07-14
# 多腕バンディットにおける汎用的異常検出 Generic Outlier Detection in Multi-Armed Bandit ( http://arxiv.org/abs/2007.07293v1 ) ライセンス: Link先を確認	Yikun Ban and Jingrui He	(参考訳) 本稿では,金融,医療,オンライン広告など多くのハイパフォーマンスな分野において,多腕のバンディット設定における異常アーム検出の問題点について検討する。この問題に対して、学習者は、期待された報酬が他のほとんどの腕から著しく逸脱する腕を特定することを目指している。既存の作業とは違って、期待される報酬がより大きく、小さく、あるいは通常のアーム間でも得る、汎用的なアウトリアーアームまたはアウトリアーアームグループをターゲットにしています。この目的のために、我々は、そのようなジェネリックアウトリアーアームとアウトリアーアーム群の包括的定義を提供することから始める。そこで本研究では,GOLDと呼ばれる新しい引抜きアルゴリズムを提案する。これは、高信頼境界に基づくリアルタイムな近傍グラフを構築し、通常の腕から外れ値の振る舞いパターンをキャッチする。また、その性能を様々な側面から分析する。合成データと実世界のデータの両方で行った実験において,提案アルゴリズムは98 %の精度を実現し,最先端技術と比較して平均83 %の探索コストを節約した。 In this paper, we study the problem of outlier arm detection in multi-armed bandit settings, which finds plenty of applications in many high-impact domains such as finance, healthcare, and online advertising. For this problem, a learner aims to identify the arms whose expected rewards deviate significantly from most of the other arms. Different from existing work, we target the generic outlier arms or outlier arm groups whose expected rewards can be larger, smaller, or even in between those of normal arms. To this end, we start by providing a comprehensive definition of such generic outlier arms and outlier arm groups. Then we propose a novel pulling algorithm named GOLD to identify such generic outlier arms. It builds a real-time neighborhood graph based on upper confidence bounds and catches the behavior pattern of outliers from normal arms. We also analyze its performance from various aspects. In the experiments conducted on both synthetic and real-world data sets, the proposed algorithm achieves 98 % accuracy while saving 83 % exploration cost on average compared with state-of-the-art techniques.	翻訳日:2022-11-10 13:21:48 公開日:2020-07-14

Title

Authors

Abstract

論文公表日・翻訳日

# キャビティの自己分極と光化学的抑制に及ぼす多モードの影響

Effect of Many Modes on Self-Polarization and Photochemical Suppression in Cavities ( http://arxiv.org/abs/2001.07330v3 )

ライセンス: Link先を確認

Norah M. Hoffmann, Lionel Lacombe, Angel Rubio, Neepa T. Maitra

(参考訳) キャビティ修飾分子反応の標準的な記述は、通常は単一(共鳴)モードを含むが、実際には量子キャビティは様々な光子モードをサポートする。ここでは、より多くの光子モードが説明されるにつれて、物理化学的現象が劇的に変化し、陽子結合電子移動の重要でユビキタスな過程のキャビティ誘起の抑制によって示される。光子モードに対するマルチトラックのehrenfest法を用いて、自己分極効果が必須となり、自己分極変調ボルン-オッペンハイマー面の概念を動力学解析の新しい構成法として導入する。キャビティフォトンモードの数が増加すると、キャビティフリーのボルン・オッペンハイマー表面からの表面のずれが増大し、光子放出とこれらの表面の拡幅帯域内の吸収との間の相互作用が抑制される。本研究は一般的な知見であり, キャビティに埋め込まれた分子, ナノ構造, 固体のキャビティ駆動の物理過程の記述と制御に影響を及ぼすであろう。

The standard description of cavity-modified molecular reactions typically involves a single (resonant) mode, while in reality the quantum cavity supports a range of photon modes. Here we demonstrate that as more photon modes are accounted for, physico-chemical phenomena can dramatically change, as illustrated by the cavity-induced suppression of the important and ubiquitous process of proton-coupled electron-transfer. Using a multi-trajectory Ehrenfest treatment for the photon-modes, we find that self-polarization effects become essential, and we introduce the concept of self-polarization-modified Born-Oppenheimer surfaces as a new construct to analyze dynamics. As the number of cavity photon modes increases, the increasing deviation of these surfaces from the cavity-free Born-Oppenheimer surfaces, together with the interplay between photon emission and absorption inside the widening bands of these surfaces, leads to enhanced suppression. The present findings are general and will have implications for the description and control of cavity-driven physical processes of molecules, nanostructures and solids embedded in cavities.

翻訳日:2023-06-06 11:37:58 公開日:2020-07-14

# マルチビット計測誤差の効率的な補正

Efficient correction of multiqubit measurement errors ( http://arxiv.org/abs/2001.09980v2 )

ライセンス: Link先を確認

Michael R. Geller and Mingyu Sun

(参考訳) 状態準備と測定(SPAM)エラーは、短期量子コンピュータの性能と実用化の可能性を制限する。スパムエラーは、n$ qubitsのレジスタの完全な実装のために2^n$の追加測定を必要とするキャリブレーションステップの後に部分的に修正可能である。ここでは,2^n \!の古典的処理を必要とするマルチキュービットSPAM誤差のキャラクタリゼーションと緩和のための近似的だが効率的な手法を提案する。 2^n$行列であるが、$O(4^k n^2)$測定のみであり、$k=O(1)$は相関体積の量子ビットの数である。 4および8個の超伝導量子ビットのレジスタ上でibm qプロセッサを用いてこの技術を実証・検証する。

State preparation and measurement (SPAM) errors limit the performance of near-term quantum computers and their potential for practical application. SPAM errors are partly correctable after a calibration step that requires, for a complete implementation on a register of $n$ qubits, $2^n$ additional measurements. Here we introduce an approximate but efficient method for multiqubit SPAM error characterization and mitigation requiring the classical processing of $2^n \! \times 2^n$ matrices, but only $O(4^k n^2)$ measurements, where $k=O(1)$ is the number of qubits in a correlation volume. We demonstrate and validate the technique using an IBM Q processor on registers of 4 and 8 superconducting qubits.

翻訳日:2023-06-05 11:42:14 公開日:2020-07-14

# 厳密な測定誤差補正

Rigorous measurement error correction ( http://arxiv.org/abs/2002.01471v2 )

ライセンス: Link先を確認

Michael R. Geller

(参考訳) 本稿では,ゲート型量子コンピュータにおける状態準備および測定誤差の補正に用いる実験手法について検討し,その厳密な正当性について論じる。特定の偏量子測定モデルにおいて、任意の$n$-量子ビット状態の非理想的測定は理想射影的測定と等価であり、出力確率分布に作用する古典的なマルコフ過程 $\gamma$ が続くことを証明する。測定誤差は厳密な正当化によって取り除くことができ、$\gamma$ が学習され反転できる。ゲートセットトモグラフィー(R. Blume-Kohout et al., arXiv:1310.4492)から$\Gamma$を得る方法を示し、IBM Q超伝導量子ビットに誤差補正手法を適用する。

We review an experimental technique used to correct state preparation and measurement errors on gate-based quantum computers, and discuss its rigorous justification. Within a specific biased quantum measurement model, we prove that nonideal measurement of an arbitrary $n$-qubit state is equivalent to ideal projective measurement followed by a classical Markov process $\Gamma$ acting on the output probability distribution. Measurement errors can be removed, with rigorous justification, if $\Gamma$ can be learned and inverted. We show how to obtain $\Gamma$ from gate set tomography (R. Blume-Kohout et al., arXiv:1310.4492) and apply the error correction technique to single IBM Q superconducting qubits.

翻訳日:2023-06-04 18:36:14 公開日:2020-07-14

# 量子ハト群集からの脱出

Escape from the Quantum Pigeon Conundrum ( http://arxiv.org/abs/2002.01876v3 )

ライセンス: Link先を確認

Gabor Kunstatter, Jonathan Ziprick, Victoria McNab, Alexander Rennie, Connor Speidel, and Jovin Toews

(参考訳) Aharonovらで最近議論されている。量子力学が2つの箱に3羽のハトを分配した場合、少なくとも2羽のハトを箱の1つに配置しなければならないというピジョン計数原理(PCP)に違反しているとするal. (2016)。しかし、この結論は厳密な理論的議論によって正当化できない。この問題は,PCP違反の結論を裏付けるものではないと予測される遷移振幅を実験的に確認することでさらに複雑になる。ここでは、解釈によらず、PCPが量子力学に違反しないことを演算子アイデンティティのセットで証明する。

It has recently been argued in Aharonov et. al. (2016) that quantum mechanics violates the Pigeon Counting Principle (PCP) which states that if one distributes three pigeons among two boxes there must be at least two pigeons in one of the boxes. However, this conclusion cannot justified by rigorous theoretical arguments. The issue is further complicated by experimental confirmation of the transition amplitudes predicted in this paper that nevertheless do not support the conclusion of PCP violation. Here we prove via a set of operator identities that the PCP is not violated within quantum mechanics, regardless of interpretation.

翻訳日:2023-06-04 16:18:25 公開日:2020-07-14

# ブロードバンド光を用いた光浮上

Optical levitation using broadband light ( http://arxiv.org/abs/2002.04650v3 )

ライセンス: Link先を確認

A. T. M. Anishur Rahman and P. F. Barker

(参考訳) 動的に調整された光学ポテンシャルを作り出す能力は、生物学から量子科学まで幅広い分野において重要になっている。超発光ダイオードの広帯域スペクトルプロファイルとレンズの色収差を組み合わせた任意の光学トウェザ電位の作成方法を示す。超高速レーザーパルス整形に使用される波長可変フィルタにより、広帯域のスペクトルプロファイルと、この光の集光によって形成される光ツイーザー電位を操作できる。これらのポテンシャルを真空中における浮遊ナノ粒子のブラウン運動の測定と、粒子の干渉検出とフィードバック冷却によって特徴付ける。このシンプルで費用対効果の高い技術により、幅広い応用が可能となり、MHz周波数を超える光ポテンシャルランドスケープの迅速な変調が可能となる。

The ability to create dynamic, tailored optical potentials has become important across fields ranging from biology to quantum science. We demonstrate a method for the creation of arbitrary optical tweezer potentials using the broadband spectral profile of a superluminescent diode combined with the chromatic aberration of a lens. A tunable filter, typically used for ultra-fast laser pulse shaping, allows us to manipulate the broad spectral profile and therefore the optical tweezer potentials formed by focusing of this light. We characterize these potentials by measuring the Brownian motion of levitated nanoparticles in vacuum and, also demonstrate interferometric detection and feedback cooling of the particle,s motion. This simple and cost-effective technique will enable a wide range of applications and allow rapid modulation of the optical potential landscape in excess of MHz frequencies.

翻訳日:2023-06-03 23:21:19 公開日:2020-07-14

# 2次元正方格子上の量子ウォークのためのトラップコインの完全分類

Complete classification of trapping coins for quantum walks on the 2D square lattice ( http://arxiv.org/abs/2002.08070v2 )

ライセンス: Link先を確認

B\'alint Koll\'ar, Andr\'as Gily\'en, Iva Tk\'a\v{c}ov\'a, Tam\'as Kiss, Igor Jex, Martin \v{S}tefa\v{n}\'ak

(参考訳) 離散時間量子ウォークのユニークな特徴の1つはトラップと呼ばれるもので、これは量子ウォーカーが最初の位置から完全に脱出できないことを意味する。この効果は、局所貨幣の寸法と明示的な形に依存する。正方格子上の4状態離散時間量子ウォークは、そのユニタリコイン作用素によって定義され、4次元コインヒルベルト空間に作用する。グロバー硬貨のよく知られた例は、部分的なトラップ、すなわち、初期位置に留まる確率が消えるいくつかの脱出初期状態をもたらす。一方、他のいくつかの硬貨は、そのような逃避状態が存在しない強いトラップを示すことが知られている。本稿では,2次元正方格子上での離散時間量子ウォークのために,これらすべてのコインを明示的に構成し,演算子の構造とトラップ効果の顕在化に基づいてそれらを分類する。本研究では, エスケープ状態の存在や非存在が示すように, 異なる動的特性を示す3種類のトラップコインと, 拡散波パレットが被覆する領域とを区別する。

One of the unique features of discrete-time quantum walks is called trapping, meaning the inability of the quantum walker to completely escape from its initial position, albeit the system is translationally invariant. The effect is dependent on the dimension and the explicit form of the local coin. A four state discrete-time quantum walk on a square lattice is defined by its unitary coin operator, acting on the four dimensional coin Hilbert space. The well known example of the Grover coin leads to a partial trapping, i.e., there exists some escaping initial state for which the probability of staying at the initial position vanishes. On the other hand, some other coins are known to exhibit strong trapping, where such escaping state does not exist. We present a systematic study of coins leading to trapping, explicitly construct all such coins for discrete-time quantum walks on the 2D square lattice, and classify them according to the structure of the operator and the manifestation of the trapping effect. We distinguish three types of trapping coins exhibiting distinct dynamical properties, as exemplified by the existence or non-existence of the escaping state and the area covered by the spreading wave-packet.

翻訳日:2023-06-03 04:56:24 公開日:2020-07-14

# 超伝導回路の最適制御による非断熱幾何量子計算

Nonadiabatic geometric quantum computation with optimal control on superconducting circuits ( http://arxiv.org/abs/2004.10199v2 )

ライセンス: Link先を確認

Jing Xu, Sai Li, Tao Chen, and Zheng-Yuan Xue

(参考訳) 量子コンピュータの重要な構成要素である量子ゲートは、非常に脆弱である。したがって、忠実度の高い頑健な量子ゲートを実現することは量子操作の究極の目標である。本稿では、任意の量子ゲートを設計するための超伝導回路上の非断熱的幾何学的量子計算手法を提案する。これは、幾何位相の強みと、最適制御技術と組み合わせてゲートロバスト性をさらに高める能力の両方を共有する。具体的には、任意の幾何学的単一量子ビットゲートを共振マイクロ波フィールド駆動によりトランスモン量子ビット上で実現し、その振幅と位相は時間依存である。一方、非自明な2量子ビットの幾何ゲートは2つの容量結合されたトランモン量子ビットで実装でき、トランモン量子ビットの周波数の1つを変調してそれらの間の効果的な共振結合を得る。したがって,本手法はフォールトトレラントな固体量子計算への有望な一歩となる。

Quantum gates, which are the essential building blocks of quantum computers, are very fragile. Thus, to realize robust quantum gates with high fidelity is the ultimate goal of quantum manipulation. Here, we propose a nonadiabatic geometric quantum computation scheme on superconducting circuits to engineer arbitrary quantum gates, which share both the robust merit of geometric phases and the capacity to combine with optimal control technique to further enhance the gate robustness. Specifically, in our proposal, arbitrary geometric single-qubit gates can be realized on a transmon qubit, by a resonant microwave field driving, with both the amplitude and phase of the driving being time-dependent. Meanwhile, nontrivial two-qubit geometric gates can be implemented by two capacitively coupled transmon qubits, with one of the transmon qubits' frequency being modulated to obtain effective resonant coupling between them. Therefore, our scheme provides a promising step towards fault-tolerant solid-state quantum computation.

翻訳日:2023-05-22 20:29:13 公開日:2020-07-14

# サーバーレス電子メール

Serverless Electronic Mail ( http://arxiv.org/abs/2007.04608v2 )

ライセンス: Link先を確認

Geoffrey Goodell

(参考訳) 本稿では、通常のワークステーションやモバイル端末の利用者が、サードパーティーのメールサーバに頼らずにメッセージを交換できるピアツーピア電子メールへの簡単なアプローチについて述べる。重要なことに、このシステムは参加者が互いに通信するために複数のリンクされていないアイデンティティを確立および使用できるようにする。このアーキテクチャは、メッセージ配信に通常のSMTP、ピアツーピア通信にTorを利用する。この設計は、エンドツーエンド認証と暗号化のための公開鍵に基づく信頼のwebをブートストラップするために、自己認証のtor onionサービス名を使用する堅牢で意図しない方法を提供する。本システムは既存の電子メールシステムやパラダイムと相互運用可能であり,IMAP経由で他のメールを受信したり,システム参加者と外部メールユーザとの中継として操作したりすることができる。最後に,ブロードキャストプロトコルを使用してメーリングリストを実装する方法と,分散台帳技術がリストメンバ間の共有知識に関するコンセンサスをブートストラップする方法について述べる。

We describe a simple approach to peer-to-peer electronic mail that would allow users of ordinary workstations and mobile devices to exchange messages without relying upon third-party mail server operators. Crucially, the system allows participants to establish and use multiple unlinked identities for communication with each other. The architecture leverages ordinary SMTP for message delivery and Tor for peer-to-peer communication. The design offers a robust, unintrusive method to use self-certifying Tor onion service names to bootstrap a web of trust based on public keys for end-to-end authentication and encryption, which in turn can be used to facilitate message delivery when the sender and recipient are not online simultaneously. We show how the system can interoperate with existing email systems and paradigms, allowing users to hold messages that others can retrieve via IMAP or to operate as a relay between system participants and external email users. Finally, we show how it is possible to use a broadcast protocol to implement mailing lists and how distributed ledger technology might be used to bootstrap consensus about shared knowledge among list members.

翻訳日:2023-05-10 21:35:28 公開日:2020-07-14

# フェルミ表面形状

Fermi Surface Geometry ( http://arxiv.org/abs/2007.05525v2 )

ライセンス: Link先を確認

Elena Derunova, Jacob Gayles, Yan Sun, Michael W. Gaultois, Mazhar N. Ali

(参考訳) ペレルマン,ハミルトン,サーストンの著名かつ先駆的な数学作品に動機づけられ,多次元多様体の現代の幾何学的数学的分類を用いて電子構造を特徴付け,非自明な電子輸送現象を予測するという概念を導入した。ここでは、接束とガウス曲率を不変量として用いたフェルミ曲面幾何効果(FSGE)を開発する。我々はフェルミ表面(fs)の「双曲性」を記述するための指数である$\mathbb{h}_f$を開発し、現在の方法が苦しんだものを含む、様々な結晶、化学、電子構造ファミリーにまたがる16種類の化合物の実験的に測定された内在的異常ホール効果と普遍的相関(r$^2$ = 0.97)を示す。この研究は、フェルミ曲面から始まる電子(および拡張マグノニックおよびフォノン)構造多様体の幾何学的理解の完全な理論を開発する基礎を築いた。トポロジカル物理学の広範な影響と類似して、ここで始められた概念は、電子輸送の理解においてパラダイムシフトをもたらし、E と k の多様体の幾何学的性質と位相的性質を含むように移動する。

Motivated by the famous and pioneering mathematical works by Perelman, Hamilton, and Thurston, we introduce the concept of using modern geometrical mathematical classifications of multi-dimensional manifolds to characterize electronic structures and predict non-trivial electron transport phenomena. Here we develop the Fermi Surface Geometry Effect (FSGE), using the concepts of tangent bundles and Gaussian curvature as an invariant. We develop an index, $\mathbb{H}_F$, for describing the the "hyperbolicity" of the Fermi Surface (FS) and show a universal correlation (R$^2$ = 0.97) with the experimentally measured intrinsic anomalous Hall effect of 16 different compounds spanning a wide variety of crystal, chemical, and electronic structure families, including where current methods have struggled. This work lays the foundation for developing a complete theory of geometrical understanding of electronic (and by extension magnonic and phononic) structure manifolds, beginning with Fermi surfaces. In analogy to the broad impact of topological physics, the concepts begun here will have far reaching consequences and lead to a paradigm shift in the understanding of electron transport, moving it to include geometrical properties of the E vs k manifold as well as topological properties.

翻訳日:2023-05-10 17:14:08 公開日:2020-07-14

# ノイズ横運動を受ける個々の適応原子ビット上の量子ゲート

Quantum Gates on Individually-Addressed Atomic Qubits Subject to Noisy Transverse Motion ( http://arxiv.org/abs/2007.06768v1 )

ライセンス: Link先を確認

M. Cetina, L. N. Egan, C. A. Noel, M. L. Goldman, A. R. Risinger, D. Zhu, D. Biswas, C. Monroe

(参考訳) 個々の閉じ込められた原子量子ビットは、その無視可能なアイドルエラーと集中した光学場を介して再構成可能なゲート操作の完全なセットを実装する能力により、量子コンピュータをスケールする最も有望な技術の1つである。しかし、量子ゲート演算の忠実度は、レーザーに横切る原子の弱い閉じ込めによって制限することができる。鎖軸に沿って弱く閉じ込められた25個の閉じ込められた原子イオンの鎖に個々に絡み合うゲートを配置することにより,この効果の測定を行う。ノイズ電界に起因するイオンの残留加熱から観測されたデコヒーレンスを正確に記述するモデルを提案する。量子回路を通して量子ビットイオンを同調的に冷却するために,鎖に分散したアンシライオンを用いてこれらの効果を抑制することを提案する。

Individual trapped atomic qubits represent one of the most promising technologies to scale quantum computers, owing to their negligible idle errors and the ability to implement a full set of reconfigurable gate operations via focused optical fields. However, the fidelity of quantum gate operations can be limited by weak confinement of the atoms transverse to the laser. We present measurements of this effect by performing individually-addressed entangling gates in chains of up to 25 trapped atomic ions that are weakly confined along the chain axis. We present a model that accurately describes the observed decoherence from the residual heating of the ions caused by noisy electric fields. We propose to suppress these effects through the use of ancilla ions interspersed in the chain to sympathetically cool the qubit ions throughout a quantum circuit.

翻訳日:2023-05-10 02:29:42 公開日:2020-07-14

# adiabaticityへの近道を用いた粒子速度の高速化

Speeding Up Particle Slowing using Shortcuts to Adiabaticity ( http://arxiv.org/abs/2007.06752v1 )

ライセンス: Link先を確認

John P. Bartolotta, Jarrod T. Reilly, and Murray J. Holland

(参考訳) 自発的に散乱した光子のランダムな方向から生じる運動量拡散を伴わずに大きな力を生み出すことができるレーザー場による粒子の減速法を提案する。この方法では、周期的に変形した時間分解型レーザーパルスが極小電子遷移に対処し、繰り返し吸収して放出サイクルを刺激することで粒子運動量を減少させる。ルイス・リースンフェルド不変量理論に基づく断熱アプローチへの近道を実装した。これにより, 短時間の減速距離を得るのに必要となる急速移動の利点を生かして, 応用分野の精密な強度や変形特性に本質的な不感が生じるという, 断熱移動の利点が得られる。熱オーブン源の典型的なパラメータは、毎秒1メートルの速度で中心速度を持つ粒子ビームを生成するため、これは粒子を1ミリメートル未満で定常付近に減速させる可能性がある。放射圧力に依存する広範に実装されたスローング技術と比較し,励起状態の減衰速度が小さい場合に生じる可能性のある利点を示す。したがって、このスキームは特定の分子で起こるような閉環遷移を欠く狭い線幅系の減速に特に有望な候補である。

We propose a method for slowing particles by laser fields that potentially has the ability to generate large forces without the associated momentum diffusion that results from the random directions of spontaneously scattered photons. In this method, time-resolved laser pulses with periodically modified detunings address an ultranarrow electronic transition to reduce the particle momentum through repeated absorption and stimulated emission cycles. We implement a shortcut to adiabaticity approach that is based on Lewis-Riesenfeld invariant theory. This affords our scheme the advantages of adiabatic transfer, where there can be an intrinsic insensitivity to the precise strength and detuning characteristics of the applied field, with the advantages of rapid transfer that is necessary for obtaining a short slowing distance. For typical parameters of a thermal oven source that generates a particle beam with a central velocity on the order of meters per second, this could result in slowing the particles to near stationary in less than a millimeter. We compare the slowing scheme to widely-implemented slowing techniques that rely on radiation pressure forces and show the advantages that potentially arise when the excited state decay rate is small. Thus, this scheme is a particularly promising candidate to slow narrow-linewidth systems that lack closed cycling transitions, such as occurs in certain molecules.

翻訳日:2023-05-10 02:29:08 公開日:2020-07-14

# 絡み合った光子源の最大忠実度のための集光光学の最適化

Optimization of collection optics for maximum fidelity in entangled photon sources ( http://arxiv.org/abs/2007.06748v1 )

ライセンス: Link先を確認

Kadir Durak

(参考訳) 本稿では,自然パラメトリックダウン変換現象によって生じる絡み合った光子のデコヒーレンス源について検討する。直交結晶からの光子対の位相および空間的識別性は、最大エンタングルメント忠実度を減少させる。慎重に選択された補償結晶は、ダウン変換原点の位相と空間的痕跡を消去するために使用される。光子対の放出角も光路差をもたらし、位相識別性をもたらす。現実的なシナリオは数値的にモデル化され、非ゼロ放射角の光子対が位相差を集める。これらのペアは、実用上はまだ収集と操作が可能であるが、収集光学は位相差を付加する。市販の2つの光学系(非球面レンズと無彩レンズ)を比較する。シミュレーション結果と実験結果を比較し,構築したモデルを用いて最大エンタングルメント忠実度を推定した。その結果, 実験パラメータを挿入することにより, 実測精度を提示モデルで正確に推定できることが示唆された。この研究は、臨界位相整合構成における絡み合った光子対源の調製と最適化に非常に有用であることが期待されている。

In this report the decoherence sources for entangled photons created by spontaneous parametric down conversion phenomenon is studied. The phase and spatial distinguishability of photon pairs from orthogonal crystals reduce the maximum achievable entanglement fidelity. Carefully chosen compensation crystals are used to erase the phase and spatial traces of down conversion origins. Emission angle of photon pairs also leads to optical path difference and resulting in phase distinguishability. A realistic scenario is numerically modelled, where the photon pairs with nonzero emission angle gather a phase difference. These pairs can still be collected and manipulated for practical use but the collection optics adds upon the phase difference. Two commercially available optics for collection; aspheric and achromatic lenses are compared. The numerical simulation results are compared with the experimental results to validate the built model for predicting the maximum achievable entanglement fidelity. The results indicate that the fidelity can be accurately estimated with the presented model by inserting the experimental parameters to it. The study is expected to be very useful for preparation and optimization of entangled photon pair sources in critical phase-matching configuration.

翻訳日:2023-05-10 02:28:46 公開日:2020-07-14

# オンラインコースにおけるデータ駆動モデルとタスク完了シーケンスのキャラクタリゼーション

Data-driven modelling and characterisation of task completion sequences in online courses ( http://arxiv.org/abs/2007.07003v1 )

ライセンス: Link先を確認

Robert L. Peach and Sam F. Greenbury and Iain G. Johnston and Sophia N. Yaliraki and David Lefevre and Mauricio Barahona

(参考訳) 学習の本質的な時間性は、時系列情報を活用できる方法論の採用を要求する。本研究では、オンラインコースにおけるタスク完了の時間的シーケンスのデータ駆動分析を用いて、個人的および集団的学習者の振る舞いを特徴付け、所定のコース設計における重要なタスクやコースセッションを識別する方法を示す。また,最近開発した確率ベイズモデルを導入し,学生のシーケンス軌跡を学習し,学生の成績を予測する。オンラインビジネスマネジメントコースを受講する学習者からのデータに対するデータ駆動シーケンス分析の適用により、学習者のコーホート内で異なる行動が明らかになり、学習者や学習者のグループを識別し、コースで期待される名目上の順序から逸脱する。コースグレードを後進として,ハイパフォーマンスと低パフォーマンスの学習者間の行動の違いについて検討する。ハイパフォーマンスな学習者は、低パフォーマンスな学習者よりも週次セッションの進行に追随するが、各週次セッションのハイパフォーマンスな学習者は、名目上のタスク順序に縛られない。次に,確率ベイズモデルを用いてハイパフォーマンスとローパフォーマンスの学生のシーケンスをモデル化し,パフォーマンスに関連するエンゲージメント行動の学習を可能にする。また,データ・シーケンス・フレームワークをタスク中心の分析に利用し,重要な点とコース設計におけるタスクの種類の違いを特定する。対話型タスクや議論投稿などの非ロボット学習タスクは高いパフォーマンスと相関していることがわかった。本稿では,授業設計,介入,学生の指導を支援するため,このような分析手法の適用について論じる。

The intrinsic temporality of learning demands the adoption of methodologies capable of exploiting time-series information. In this study we leverage the sequence data framework and show how data-driven analysis of temporal sequences of task completion in online courses can be used to characterise personal and group learners' behaviors, and to identify critical tasks and course sessions in a given course design. We also introduce a recently developed probabilistic Bayesian model to learn sequence trajectories of students and predict student performance. The application of our data-driven sequence-based analyses to data from learners undertaking an on-line Business Management course reveals distinct behaviors within the cohort of learners, identifying learners or groups of learners that deviate from the nominal order expected in the course. Using course grades a posteriori, we explore differences in behavior between high and low performing learners. We find that high performing learners follow the progression between weekly sessions more regularly than low performing learners, yet within each weekly session high performing learners are less tied to the nominal task order. We then model the sequences of high and low performance students using the probablistic Bayesian model and show that we can learn engagement behaviors associated with performance. We also show that the data sequence framework can be used for task centric analysis; we identify critical junctures and differences among types of tasks within the course design. We find that non-rote learning tasks, such as interactive tasks or discussion posts, are correlated with higher performance. We discuss the application of such analytical techniques as an aid to course design, intervention, and student supervision.

翻訳日:2023-05-10 02:20:38 公開日:2020-07-14

# 量子スピン鎖の創発的絡み合い構造と自己相似性

Emergent entanglement structures and self-similarity in quantum spin chains ( http://arxiv.org/abs/2007.06989v1 )

ライセンス: Link先を確認

Boris Sokolov, Matteo A. C. Rossi, Guillermo Garc\'ia-P\'erez and Sabrina Maniscalco

(参考訳) 本稿では,多体量子状態に対する実験的にアクセス可能なネットワーク表現を提案する。我々は、この表現のパワーをパラダイム的なスピンチェーンモデルであるxxモデルに適用し、新しい現象をもたらすことを示した。これらの絡み合いネットワークの解析により、準長域秩序の漸進的確立は、ネットワークトポロジーの不安定性と同様に、単スピン共起分布に関する対称性を伴うことが明らかとなった。さらに,空間的局所化コミュニティである創発的絡み合い構造の存在を,モデルに依存しないコミュニティ検出アルゴリズムによって明らかにできるシステムの大域対称性により同定する。ネットワーク表現はさらに、状態における構造クラスの存在と循環的な自己相似性を明らかにし、これはコミュニティ構造と密接に関連していると推測する。その結果、複雑なネットワーク理論からツールや概念を用いることで、何十年も研究されたモデルでも新しい物理現象の発見、理解、記述が可能になることが示された。

We introduce an experimentally accessible network representation for many-body quantum states based on entanglement between all pairs of its constituents. We illustrate the power of this representation by applying it to a paradigmatic spin chain model, the XX model, and showing that it brings to light new phenomena. The analysis of these entanglement networks reveals that the gradual establishment of quasi-long range order is accompanied by a symmetry regarding single-spin concurrence distributions, as well as by instabilities in the network topology. Moreover, we identify the existence of emergent entanglement structures, spatially localised communities enforced by the global symmetry of the system that can be revealed by model-agnostic community detection algorithms. The network representation further unveils the existence of structural classes and a cyclic self-similarity in the state, which we conjecture to be intimately linked to the community structure. Our results demonstrate that the use of tools and concepts from complex network theory enables the discovery, understanding, and description of new physical phenomena even in models studied for decades.

翻訳日:2023-05-10 02:20:11 公開日:2020-07-14

# 絡み合いと対称性の合同効果:物性と排他性

Joint effects of entanglement and symmetrization: physical properties and exclusion ( http://arxiv.org/abs/2007.06982v1 )

ライセンス: Link先を確認

Pedro Sancho

(参考訳) 絡み合いと対称性は物理的性質を変更できる非分離状態をもたらす。原子吸収の例を使って、それらが一度に関連している両方の種類の効果を比較します。多粒子重ね合わせの存在は、同じ原子の吸収率を大きく変化させ、フェルミオンの重なりに依存することさえ阻害する。また、この文脈で自然に現れる多重フェルミオン重ね合わせに関連する非標準排除状態のセットも同定する。これらのアイデアをテストするために分子の解離に基づくアレンジメントを提案する。

Entanglement and symmetrization lead to non-separable states that can modify physical properties. Using the example of atomic absorption we compare both types of effects when they are relevant at once. The presence of multi-particle superpositions largely alters the absorption rates of identical atoms, even inhibiting the dependence on overlapping for fermions. We also identify a set of non-standard excluded states related to multi-fermion superposition that naturally emerge in this context. We propose an arrangement based on the dissociation of molecules to test these ideas.

翻訳日:2023-05-10 02:19:21 公開日:2020-07-14

# 光の多光子状態の対称性保護

Symmetry-protection of multiphoton states of light ( http://arxiv.org/abs/2007.06942v1 )

ライセンス: Link先を確認

Jon Lasa-Alonso, Martin Molezuelas, J. J. Miguel Varga, Aitzol Garcia-Etxarri, Geza Giedke and Gabriel Molina-Terriza

(参考訳) 本稿では,円筒対称性を持つ散乱問題における保護多光子状態の出現を解析する。そのため、まず、選択後対称性保護の概念を形式的に定義する。対称保護状態は1光子状態や2光子状態に制限されないが、反対に、正式に多光子状態に拡張できることを示す。さらに,多光子保護状態が1光子状態と2光子状態の小さな集合から構成されていることを円柱対称性の場合には証明する。最後に、特にデコヒーレンスフリー部分空間の構築において、量子通信において対称性が保護された状態が有する可能性があることを指摘する。

In this manuscript we analyze the emergence of protected multiphoton states in scattering problems with cylindrical symmetry. In order to do that, we first provide a formal definition of the concept of postselected symmetry-protection. We show that symmetry-protected states are not limited to one- or two-photon states, on the contrary, it can be formally extended to the multiphoton case. In addition, we prove for the case of cylindrical symmetry that all possible multiphoton protected states are constructed from a small set of one- and two-photon states. Finally, we point out possible applications that symmetry-protected states may have in quantum communications, concretely, in the construction of decoherence-free subspaces.

翻訳日:2023-05-10 02:19:08 公開日:2020-07-14

# ASHRAE Great Energy Predictor IIIコンペティションの概要と結果

The ASHRAE Great Energy Predictor III competition: Overview and results ( http://arxiv.org/abs/2007.06933v1 )

ライセンス: Link先を確認

Clayton Miller, Pandarasamy Arjunan, Anjukan Kathirgamanathan, Chun Fu, Jonathan Roth, June Young Park, Chris Balbach, Krishnan Gowri, Zoltan Nagy, Anthony Fontanini, Jeff Haberl

(参考訳) 2019年後半、ASHRAEはカグルプラットフォーム上でGEPIII(Great Energy Predictor III)機械学習コンペティションを開催した。この打ち上げはアシュレーから3度目のエネルギー予測競争となり、1990年代半ば以来となる。この改訂版では、16のソースから1,448の建物から収集された2,380エネルギーメーターから2000万点以上のトレーニングデータを提供した。このコンペティションの全体的な目標は、4100万以上のプライベートおよびパブリックテストデータポイントの予測のための最も正確なモデリングソリューションを見つけることであった。参加者は4,370人で、94カ国の3,614チームが39,403件の予測を提出した。上位5つの勝利ワークフローに加えて、競合他社は40以上の完全なソリューションを含む415の再現可能なオンライン機械学習ワークフロー例(ノートブック)を公開している。本稿では,コンペティションの準備とデータセット,競争相手とその議論,機械学習ワークフローとモデルの生成,勝者とその提案,学んだ教訓の議論,競技成果と次のステップについて概説する。最もポピュラーで正確な機械学習ワークフローは、lightgbmのような勾配ブースティングツリーモデルの大規模なアンサンブルを使用していた。最初の予測競合と同様に、データセットの事前処理が重要な差別化要因として現れた。

In late 2019, ASHRAE hosted the Great Energy Predictor III (GEPIII) machine learning competition on the Kaggle platform. This launch marked the third energy prediction competition from ASHRAE and the first since the mid-1990s. In this updated version, the competitors were provided with over 20 million points of training data from 2,380 energy meters collected for 1,448 buildings from 16 sources. This competition's overall objective was to find the most accurate modeling solutions for the prediction of over 41 million private and public test data points. The competition had 4,370 participants, split across 3,614 teams from 94 countries who submitted 39,403 predictions. In addition to the top five winning workflows, the competitors publicly shared 415 reproducible online machine learning workflow examples (notebooks), including over 40 additional, full solutions. This paper gives a high-level overview of the competition preparation and dataset, competitors and their discussions, machine learning workflows and models generated, winners and their submissions, discussion of lessons learned, and competition outputs and next steps. The most popular and accurate machine learning workflows used large ensembles of mostly gradient boosting tree models, such as LightGBM. Similar to the first predictor competition, preprocessing of the data sets emerged as a key differentiator.

翻訳日:2023-05-10 02:18:49 公開日:2020-07-14

# 量子グラフニューラルネットワークによる粒子トラック再構成

A Quantum Graph Neural Network Approach to Particle Track Reconstruction ( http://arxiv.org/abs/2007.06868v1 )

ライセンス: Link先を確認

Cenk T\"uys\"uz, Federico Carminati, Bilge Demirk\"oz, Daniel Dobos, Fabio Fracas, Kristiane Novotny, Karolos Potamianos, Sofia Vallecorsa, Jean-Roch Vlimant

(参考訳) HL-LHC(High Luminosity Large Hadron Collider)実験の追跡検出器の計算に必要となる複雑性とデータのスケールの未熟な増加が期待されている。現在使われているカルマンフィルタに基づくアルゴリズムは、同時衝突の数の増加、占有率、拡張性(二次的よりも重要)といった曖昧さの観点から限界に達しているが、粒子トラック再構成に対する機械学習アプローチは様々である。 HEP.TrkXは以前、トラックMLデータセットを使用して、トラック計測を接続するグラフとしてイベントを処理することで、組合せ背景を管理可能な量に減らし、計算上妥当なサイズにスケールすることで、有望なソリューションを提供できることを実証した。これまでの研究では、粒子の再構成を追跡するために、量子コンピューティングからグラフニューラルネットワークへの最初の試みを示す。我々は、量子コンピューティングの能力を活用して、非常に多くの状態を同時に評価し、大きなパラメータ空間を効果的に探索することを目指している。本論文の次のステップとして,初期単純化ツリーテンソルネットワーク(TTN)モデルの低精度収束を克服するための反復的アプローチによる改良モデルを提案する。

Unprecedented increase of complexity and scale of data is expected in computation necessary for the tracking detectors of the High Luminosity Large Hadron Collider (HL-LHC) experiments. While currently used Kalman filter based algorithms are reaching their limits in terms of ambiguities from increasing number of simultaneous collisions, occupancy, and scalability (worse than quadratic), a variety of machine learning approaches to particle track reconstruction are explored. It has been demonstrated previously by HEP.TrkX using TrackML datasets, that graph neural networks, by processing events as a graph connecting track measurements can provide a promising solution by reducing the combinatorial background to a manageable amount and are scaling to a computationally reasonable size. In previous work, we have shown a first attempt of Quantum Computing to Graph Neural Networks for track reconstruction of particles. We aim to leverage the capability of quantum computing to evaluate a very large number of states simultaneously and thus to effectively search a large parameter space. As the next step in this paper, we present an improved model with an iterative approach to overcome the low accuracy convergence of the initial oversimplified Tree Tensor Network (TTN) model.

翻訳日:2023-05-10 02:18:01 公開日:2020-07-14

# ロングパスを用いた量子回路の2次元量子ビット配置

2D Qubit Placement of Quantum Circuits using LONGPATH ( http://arxiv.org/abs/2007.06804v1 )

ライセンス: Link先を確認

Mrityunjay Ghosh, Nivedita Dey, Debdeep Mitra, Amlan Chakrabarti

(参考訳) 計算困難問題の解を求める従来の古典的計算よりも高速化を実現するため、量子コンピューティングが導入された。量子アルゴリズムは擬似量子環境でシミュレートできるが、実装には量子ゲートの物理合成による量子回路の実現が含まれる。これは複素量子ゲートを単純な1量子ビットと2量子ビットゲートのカスケードに分解する必要がある。物理合成の方法論的枠組みは、オペランド(量子ビット)と演算子の配置に関する制約を課している。格子の各ノードが量子ビットを表す格子上に物理量子ビットを置くことができれば、隣接する量子ビット上でのみ量子ゲートを操作でき、そうでなければ、非線形近接近傍アーキテクチャを線形近接近傍アーキテクチャに変換するためにSWAPゲートを挿入しなければならない。スワップゲートの挿入は物理的実装の累積コストを減らすために最適である。実際の実装への配置とルーティングにはスケジュールレイアウト生成が必要である。本稿では、任意の量子回路におけるSWAPゲート数を最適化する2つのアルゴリズムを提案する。最初のアルゴリズムは、相互作用グラフの生成から始まり、次にノードから始まる最も長い経路を最大度で見つけることを意図している。第2のアルゴリズムは、任意の非隣接量子ビット間のSWAPゲート数を最適化する。提案手法は1Dおよび2D NTCアーキテクチャにおけるSWAPゲート数を大幅に削減する。

In order to achieve speedup over conventional classical computing for finding solution of computationally hard problems, quantum computing was introduced. Quantum algorithms can be simulated in a pseudo quantum environment, but implementation involves realization of quantum circuits through physical synthesis of quantum gates. This requires decomposition of complex quantum gates into a cascade of simple one qubit and two qubit gates. The methodological framework for physical synthesis imposes a constraint regarding placement of operands (qubits) and operators. If physical qubits can be placed on a grid, where each node of the grid represents a qubit then quantum gates can only be operated on adjacent qubits, otherwise SWAP gates must be inserted to convert non-Linear Nearest Neighbor architecture to Linear Nearest Neighbor architecture. Insertion of SWAP gates should be made optimal to reduce cumulative cost of physical implementation. A schedule layout generation is required for placement and routing apriori to actual implementation. In this paper, two algorithms are proposed to optimize the number of SWAP gates in any arbitrary quantum circuit. The first algorithm is intended to start with generation of an interaction graph followed by finding the longest path starting from the node with maximum degree. The second algorithm optimizes the number of SWAP gates between any pair of non-neighbouring qubits. Our proposed approach has a significant reduction in number of SWAP gates in 1D and 2D NTC architecture.

翻訳日:2023-05-10 02:17:42 公開日:2020-07-14

# 60%検出効率1550nmのInGaAs/InP単光子検出器

InGaAs/InP single-photon detectors with 60% detection efficiency at 1550 nm ( http://arxiv.org/abs/2007.06792v1 )

ライセンス: Link先を確認

Yu-Qiang Fang, Wei Chen, Tian-Hong Ao, Cong Liu, Li Wang, Xin-Jiang Gao, Jun Zhang, Jian-Wei Pan

(参考訳) InGaAs/InP単光子検出器(SPD)は近赤外光子計数に広く用いられている。光子検出効率(PDE)は、SPDのキャラクタリゼーションにおいて最も重要なパラメータの1つであり、PDEの増加は、産業開発と学術研究において一貫して中心的な役割を果たす。本稿では,1550nmにおいて,pdeを60%まで高めた高周波ゲイティングingaas/inp spdの実装について述べる。一方,ingaas/inp単光子アバランシェダイオードの誘電金属反射層を付加した構造設計とデバイス製造を最適化し,入射光子の吸収効率を20%程度向上させた。一方,アフターパルス効果抑制のための寄生容量を最小化するために,弱い雪崩抽出のためのモノリシックリードアウト回路を開発した。 1.25GHzの正弦波ゲーティングと最適化ゲート振幅と動作温度により、SPDは340kcpsの暗カウントレート(DCR)で60%のPDEに達する。 3kcpsのDCRを基準として、PDEは5.5%の余パルス確率で40% PDEに達し、近赤外線SPDベースのアプリケーションの性能を大幅に向上させることができる。

InGaAs/InP single-photon detectors (SPDs) are widely used for near-infrared photon counting in practical applications. Photon detection efficiency (PDE) is one of the most important parameters for SPD characterization, and therefore increasing PDE consistently plays a central role in both industrial development and academic research. Here we present the implementation of high-frequency gating InGaAs/InP SPD with a PDE as high as 60% at 1550 nm. On one hand, we optimize the structure design and device fabrication of InGaAs/InP single-photon avalanche diode with an additional dielectric-metal reflection layer to relatively increase the absorption efficiency of incident photons by ~ 20%. On the other hand, we develop a monolithic readout circuit of weak avalanche extraction to minimize the parasitic capacitance for the suppression of the afterpulsing effect. With 1.25 GHz sine wave gating and optimized gate amplitude and operation temperature, the SPD is characterized to reach a PDE of 60% with a dark count rate (DCR) of 340 kcps. For practical use, given 3 kcps DCR as a reference the PDE reaches ~ 40% PDE with an afterpulse probability of 5.5%, which can significantly improve the performance for the near-infrared SPD based applications.

翻訳日:2023-05-10 02:17:22 公開日:2020-07-14

# ゲージ場が維持する安定非線形モード

Stable nonlinear modes sustained by gauge fields ( http://arxiv.org/abs/2007.07245v1 )

ライセンス: Link先を確認

Yaroslav V. Kartashov and Vladimir V. Konotop

(参考訳) スピノール多次元非線形schr\"{o}dinger方程式におけるソリトンの存在、進化、安定性に対するゲージ場の普遍的効果を明らかにする。二次元の場合に着目して、ゲージ場を純粋なゲージに分割して \rtext{non-pure gauge} を生成すると、ソリトン力学におけるこれらの成分の役割が異なる: 新興状態の \btext{localization characteristics} は曲率によって決定され、純粋なゲージはモードの安定性に影響を与える。それぞれの解は、純粋ゲージとは独立なエンベロープとして正確に表現でき、曲率とは独立な定常キャリアモード状態を変調することができる。我々の中心的な発見は、非ゼロ曲率が異常なモードの存在に繋がることであり、特に、定常的な反発相互作用を持つ媒体において、外部収束電位を伴わず、また、外部トラップにおいても、安定した局所的な自走基本状態と渦搬送状態が可能である。

We reveal the universal effect of gauge fields on the existence, evolution, and stability of solitons in the spinor multidimensional nonlinear Schr\"{o}dinger equation. Focusing on the two-dimensional case, we show that when gauge field can be split in a pure gauge and a \rtext{non-pure gauge} generating \rtext{effective potential}, the roles of these components in soliton dynamics are different: the \btext{localization characteristics} of emerging states are determined by the curvature, while pure gauge affects the stability of the modes. Respectively the solutions can be exactly represented as the envelopes independent of the pure gauge, modulating stationary carrier-mode states, which are independent of the curvature. Our central finding is that nonzero curvature can lead to the existence of unusual modes, in particular, enabling stable localized self-trapped fundamental and vortex-carrying states in media with constant repulsive interactions without additional external confining potentials and even in the expulsive external traps.

翻訳日:2023-05-10 02:10:52 公開日:2020-07-14

# 高Q間隔正方形結合マイクロリング共振器アレイ

High-Q Interstitial Square Coupled Microring Resonators Arrays ( http://arxiv.org/abs/2007.07179v1 )

ライセンス: Link先を確認

Shaolin Liao and Lu Ou

(参考訳) 中間環を持つマイクロリング共振器(MRR)の正方配列の特性について検討した。 Floquet-Bloch周期状態の遷移行列法により, 正方形結合型MRRの分散挙動を求める。固有波動ベクトル,バンドギャップおよび固有モードベクトルの解析式は、同一のカップラを持つ間方結合MRR配列と、間方結合MRRを含まない正方結合MRR配列の特別な場合に対して導出される。そして、所定の周波数の4つの固有波ベクトルそれぞれについて、世俗方程式を介して固有モードの場分布を算出する。最後に、同一のカプラと正方形結合mrs配列を有する間欠的正方形結合mrs配列について数値シミュレーションを行う。シミュレーション結果は解析分析を検証する。最後に、中間5リング構成、正規4リング構成及び1リング構成の負荷品質係数を求める。その結果, 共振周波数における固有モードの劣化により, 1リング構成の20倍, 通常の4リング構成の8倍の負荷品質が得られた。したがって、正方形結合型MRRアレイは、パリティ時間対称センサのようなフィルタや共振に基づくセンシング装置を含む高品質なフォトニクスコンポーネントを形成する大きな可能性を持っている。

The properties of the square array of coupled Microring Resonators (MRRs) with interstitial rings are studied. Dispersion behavior of the interstitial square coupled MRRs is obtained through the transfer matrix method with the Floquet-Bloch periodic condition. Analytical formulas of the eigen wave vectors, band gaps and eigen mode vectors are derived for the special cases of the interstitial square coupled MRRs array with identical couplers and the regular square coupled MRRs array without the interstitial rings. Then, the eigen modes' field distribution are calculated for each of the four eigen wave vectors for a given frequency through the secular equation. Finally, numerical simulation is performed for an interstitial square coupled MRRs array with identical couplers and a regular square coupled MRRs array. The simulation result verifies the analytical analysis. Finally, the loaded quality factors of the interstitial 5-ring configuration, the regular 4-ring configuration and the 1-ring configuration are obtained. It is found that the loaded quality factor of the interstitial 5-ring configuration is up to 20 times and 8 times as high as those of the 1-ring configuration and the regular 4-ring configuration respectively, mainly due to the degenerated eigen modes at the resonant frequency. Thus, the interstitial square coupled MRRs array has the great potential to form high-quality integrated photonics components, including filters and resonance based sensing devices like the parity-time symmetric sensors.

翻訳日:2023-05-10 02:10:31 公開日:2020-07-14

# マンガ行列モデルにおける演算子成長境界

Operator growth bounds in a cartoon matrix model ( http://arxiv.org/abs/2007.07165v1 )

ライセンス: Link先を確認

Andrew Lucas, Andrew Osborne

(参考訳) n(n-1)/2$相互作用するマヨラナフェルミオンのモデルにおいて、演算子の成長を研究する。ハミルトニアンの項は、長さ$q$の周期の辺に生きる$q$フェルミオンの積に比例する。このモデルはマンガ「行列モデル」であり、相互作用グラフはホログラフィック的に量子重力に双対な単一のトレース行列モデルを模倣している。我々は(非摂動的に1/N$で、平均的なアンサンブルなしで)このモデルのスクランブル時間は少なくとも位数$\log N$であり、高速スクランブル予想と一致することを証明している。我々は、我々の「行列モデル」とメロニカルモデルにおける演算子の成長の明らかな類似性と相違についてコメントする。

We study operator growth in a model of $N(N-1)/2$ interacting Majorana fermions, which live on the edges of a complete graph of $N$ vertices. Terms in the Hamiltonian are proportional to the product of $q$ fermions which live on the edges of cycles of length $q$. This model is a cartoon "matrix model": the interaction graph mimics that of a single-trace matrix model, which can be holographically dual to quantum gravity. We prove (non-perturbatively in $1/N$, and without averaging over any ensemble) that the scrambling time of this model is at least of order $\log N$, consistent with the fast scrambling conjecture. We comment on apparent similarities and differences between operator growth in our "matrix model" and in the melonic models.

翻訳日:2023-05-10 02:10:09 公開日:2020-07-14

# マルチユーティリティ市場:持続可能な開発のためのブロックチェーン交換プラットフォームのためのフレームワーク

Multi-Utility Market: Framework for a Blockchain Exchange Platform for Sustainable Development ( http://arxiv.org/abs/2007.07096v1 )

ライセンス: Link先を確認

Jacques Bou Abdo and Sherali Zeadally

(参考訳) 水やその他の資源は日々不足しており、発展途上国は直ちに介入する必要性が最も高い。国家のニーズとして水は、21世紀における紛争の主な原因の1つと考えられている。ピアツーピアトレーディングは、最も便利でスケーラブルで持続可能なソリューションの1つだが、適切なビジネスモデルが欠如しているため、通常のユーザーが生成されたリソース、通貨や金融決済の複雑さ、単一ユーティリティ市場を売却する動機となっている。本稿では,ピアツーピアトレーディングが直面する課題を解決するブロックチェーン技術に基づく多機能トレーディングプラットフォームを提案する。このプラットフォームは、先進国の農村部だけでなく、特に発展途上国のニーズを満たしている。提案する設計のオープン性は、様々な利害関係者による採用と利用に適しています。

Water and other resources are becoming scarcer every day, and developing countries are the neediest for an immediate intervention. Water, as a national need, is considered to be one of the main causes for conflicts in the 21st century. Peer-to-peer trading is one of the most convenient, scalable and sustainable solutions but faces organization challenges such as: the absence of suitable business models motivating normal users to sell their generated resources, currency and financial settlement complexities, and single utility markets. We propose a multi-utility trading platform, based on blockchain technology which can address the challenges faced by peer-to-peer trading. This platform meets the needs of developing countries in particular as well as rural areas of developed countries. The open nature of our proposed design makes it suitable for adoption and use by various stakeholders.

翻訳日:2023-05-10 02:09:52 公開日:2020-07-14

# プライベートデータから得られる公共財 - デジタルコンタクトトラクションの有効性と正当化パラドックス

Public Goods From Private Data -- An Efficacy and Justification Paradox for Digital Contact Tracing ( http://arxiv.org/abs/2007.07016v1 )

ライセンス: Link先を確認

Andrew Buzzell

(参考訳) 新型コロナウイルスの感染拡大を抑えるためのデジタルコンタクトトラッキング(DCT)アプリの採用に関する議論は、個人のプライバシーへのリスクに焦点を当てている(Sharma & Bashir 2020, Tang 2020)。この強調は、DCTの倫理的展開に重大な課題を示すが、DCTを実装するための正当化を損なう制約を生成する。この結果のみを倫理的監視分析(Floridi & Strait 2020)の成功であり、潜在的に有害な技術の配備を妨げていると考えるのは間違いである。プライバシー中心の分析は、データを私有財産として扱い、個人と政府の関係を敵とみなし、技術プラットフォームをゲートキーパーとして定着させ、個人の同意と公共衛生倫理を知らせるよりコミュニタリズム的な価値観とある程度の緊張関係にある企業の影響によって、緊急公衆衛生当局の概念を支持している。倫理的かつ効果的なDCTの障壁を克服し、デジタル技術の公共的利益の実現を支援するインフラと政策を開発するためには、集約データの公開リソース概念を開発する必要がある。

Debate about the adoption of digital contact tracing (DCT) apps to control the spread of COVID-19 has focussed on risks to individual privacy (Sharma & Bashir 2020, Tang 2020). This emphasis reveals significant challenges to ethical deployment of DCT, but generates constraints which undermine justification to implement DCT. It would be a mistake to view this result solely as the successful operation of ethical foresight analysis (Floridi & Strait 2020), preventing deployment of potentially harmful technology. Privacy-centric analysis treats data as private property, frames the relationship between individuals and governments as adversarial, entrenches technology platforms as gatekeepers, and supports a conception of emergency public health authority as limited by individual consent and considerable corporate influence that is in some tension with the more communitarian values that typically inform public health ethics. To overcome the barriers to ethical and effective DCT, and develop infrastructure and policy that supports the realization of potential public benefits of digital technology, a public resource conception of aggregate data should be developed.

翻訳日:2023-05-10 02:08:50 公開日:2020-07-14

# 線形光学に基づくGHZ型絡み合いコヒーレント状態の絡み合い濃度プロトコル

Entanglement concentration protocols for GHZ-type entangled coherent state based on linear optics ( http://arxiv.org/abs/2007.07014v1 )

ライセンス: Link先を確認

Mitali Sisodia, Chitra Shukla

(参考訳) 我々は,GHZ型ECSから最大絡み合いのグリーンベルガー・ホルン・ザイリンガー型絡み合いコヒーレント状態(ECS)を得るための2つの絡み合い濃度プロトコル(ECP)を提案した。コンヒーレント状態の重畳を補助する部分絡み付きGHZ型ECSを用いた第1のECPを得たが,第2のECPは部分絡み付きGHZ型ECSの2つのコピーを用いて設計した。成功確率も計算され、ecpsの両方で議論されている。我々は、3モードのGHZ型ECSに対する最初のECPの成功確率を、3モードのW型ECSのECPと比較し、状態パラメータのより大きな値(\beta=0.7)に対して、我々のECPがより効率的(最大成功確率)であることを示した。物理実現のために、線形光学素子を用いた2つの光回路(2つのecps)、viz 50:50ビームスプリッタ、位相シフト器、および光子検出器が提供され、この技術で可能な将来の実験実装をサポートする。

We proposed two entanglement concentration protocols (ECPs) to obtain maximally entangled Greenberger-Horne-Zeilinger (GHZ)-type entangled coherent state (ECS) from the corresponding partially entangled GHZ-type ECSs. We obtained the first ECP using a partially entangled GHZ-type ECS assisted with a superposition of single-mode coherent state, however the second ECP is designed using two copies of partially entangled GHZ-type ECSs. The success probabilities have also been calculated and discussed for both the ECPs. We have further compared the success probabilities of our first ECP for 3-mode GHZ-type ECS with an ECP of 3-mode W-type ECS and found that our ECP is more efficient (maximal success probabilities) for larger value (\beta=0.7) of state parameter. For the physical realization, two optical circuits (for two ECPs) using linear optical elements, viz 50:50 beam splitter, phase shifter, and photon detectors are provided, which support the future experimental implementation possible with the present technology.

翻訳日:2023-05-10 02:08:29 公開日:2020-07-14

# 一般化イジングマシンによる量子干渉のエミュレート

Emulating Quantum Interference with Generalized Ising Machines ( http://arxiv.org/abs/2007.07379v1 )

ライセンス: Link先を確認

Shuvro Chowdhury, Kerem Y. Camsari and Supriyo Datta

(参考訳) ノイズの多い中間スケール量子(nisq)時代の量子超越性の最近の画期的な実証は、古典計算と量子計算の間のより細かい境界を確立するための激しい活動をもたらした。本稿では、量子モンテカルロ法(QMC)を用いて、$n$ q-bitsに作用する$d$量子ゲートの列を、2つの値 "0" と "1" を持つ古典スピンまたは p-bits を持つボルツマン機械(BM)に変換する体系的な手順を定式化する。この手順を用いて、ショアのアルゴリズムを、通常のラップトップコンピュータ上で1日以内の90ドルのpビットを用いて最大36ドルのqビットでエミュレートし、naive schr\"{o}dingerの実装では約10^{21}$要素の行列を乗算する必要がある。さらに大きな問題は専用イジングマシンでアクセス可能であるべきである。しかし、量子コンピュータに対する非効率性に対して量的計量$s_{\text{total}}$を導入することにより、確率論的アプローチの明確な限界も明らかにする。例えば、$n$ q-bitsのShorのアルゴリズムの簡単な確率的実装は、$S_{\text{Total}} \sim \exp{(-n/2)}$となり、真量子コンピュータで期待される多項式スケーリングの代わりに、確率的Shorのアルゴリズムの計算時間を指数関数的に2^{n/2$にする。これはQMCでよく知られた符号問題の顕在化であり、適切な変換で「テーム」することが可能である。最後に、純粋に実エネルギー関数に基づく標準的な最適化アルゴリズムを特徴とし、虚部$\Im{(E)}$を加算することにより、量子的な位相キャンセルを伴うファインマンパスの統計的抑制を増大させる例を示す。この例は、古典的アニーラーで遭遇した符号問題を量子アニーラーの計算資源にすることができることを示している。

The recent groundbreaking demonstration of quantum supremacy in the noisy intermediate scale quantum (NISQ) era has led to an intense activity in establishing finer boundaries between classical and quantum computing. In this paper, we use quantum Monte Carlo (QMC) techniques to formulate a systematic procedure for translating any sequence of $d$ quantum gates acting on $n$ q-bits into a Boltzmann machine (BM) having $n+g(d)$ classical spins or p-bits with two values "0" and "1", but with a complex energy function $E$. Using this procedure we emulate Shor's algorithm with up to $36$ q-bits using $90$ p-bits, on an ordinary laptop computer in less than a day, while a naive Schr\"{o}dinger implementation would require multiplying matrices with $\approx 10^{21}$ elements. Even larger problems should be accessible on dedicated Ising Machines. However, we also identify clear limitations of the probabilistic approach by introducing a quantitative metric $S_{\text{Total}}$ for its inefficiency relative to a quantum computer. For example, a straightforward probabilistic implementation of Shor's algorithm with $n$ q-bits leads to an $S_{\text{Total}} \sim \exp{(-n/2)}$, making the computation time for the probabilistic Shor's algorithm scale exponentially as $2^{n/2}$ instead of the polynomial scaling expected for true quantum computers. This is a manifestation of the well-known sign problem in QMC and it may be possible to "tame" it with appropriate transformations. Finally, we present an example featuring a standard optimization algorithm based on a purely real energy function to which we add an imaginary part $\Im{(E)}$, thereby augmenting the statistical suppression of Feynman paths with quantum-like phase cancellation. This example illustrates how the sign problem encountered in classical annealers can potentially be turned into a computational resource for quantum annealers.

翻訳日:2023-05-10 02:01:00 公開日:2020-07-14

# SaYoPillow: IoMTの睡眠習慣を考慮したストレス検出・予測・制御のためのブロックチェーン対応プライバシ保証フレームワーク

SaYoPillow: A Blockchain-Enabled, Privacy-Assured Framework for Stress Detection, Prediction and Control Considering Sleeping Habits in the IoMT ( http://arxiv.org/abs/2007.07377v1 )

ライセンス: Link先を確認

Laavanya Rachakonda and Anand K. Bapatla and Saraju P. Mohanty and Elias Kougianos

(参考訳) 今日の生活様式を考えると、人々は人間の体に与える利益を忘れるだけである。生産的な睡眠をとらない理由は多々ある。 Smart-Yoga Pillow(SaYoPillow)は、ストレスを緩和し、ストレスと睡眠習慣の計測可能な関係を確立しながら、良質な睡眠の重要性を認識するためのデバイスとして構想されている。本研究では、急速眼球運動(REM)および非急速眼球運動(NREM)段階における生理的変化を継続的に監視し、睡眠習慣を分析するシステムを提案する。生理的パラメータの変化に加えて、睡眠時間、嗅覚範囲、眼球運動、四肢運動などの要因も監視される。 SaYoPillowシステムはエッジレベルで処理され、ストレージはクラウドにある。ユーザのプライバシを侵害する必要はなく、SaYoPillow氏は、医療に対する悪意のある攻撃を減らすために、アップロードと検索の両方にセキュアなデータ送信を提案している。ユーザインタフェースは、データアクセシビリティと可視性を制御するために提供される。 SaYoPillowの精度は96%で、既存の研究成果に近い。しかし、SaYoPillowは、セキュリティ機能を扱う唯一の仕事であり、ストレスに対する睡眠習慣を考慮する唯一の仕事である。

Considering today's lifestyle, people just sleep forgetting the benefits it provides to the human body. The reasons for not having a productive sleep could be many. Smart-Yoga Pillow (SaYoPillow) is envisioned as a device that may help in recognizing the importance of a good quality sleep to alleviate stress while establishing a measurable relationship between stress and sleeping habits. A system that analyzes the sleeping habits by continuously monitoring the physiological changes that occur during rapid eye movement (REM) and non-rapid eye movement (NREM) stages of sleep is proposed in the current work. In addition to the physiological parameter changes, factors such as sleep duration, snoring range, eye movement, and limb movements are also monitored. The SaYoPillow system is processed at the edge level with the storage being at the cloud. Not having to compromise the user's privacy, SaYoPillow proposes secure data transmission for both uploading and retrieving, and secure storage and communications as an attempt to reduce malicious attacks on healthcare. A user interface is provided for the user to control data accessibility and visibility. SaYoPillow has 96% accuracy which is close to other existing research works. However, SaYoPillow is the only work with security features as well as only work that considers sleeping habits for stress.

翻訳日:2023-05-10 02:00:18 公開日:2020-07-14

# 簡単な迷路を通る複数の経路を量子ウォークで見つける

Finding more than one path through a simple maze with a quantum walk ( http://arxiv.org/abs/2007.07340v1 )

ライセンス: Link先を確認

Mark Hillery

(参考訳) 2つと3つの星グラフからなる鎖を通る量子ウォークを研究する。第一星は識別された頂点のラベル付きスタートを持ち、最後の星は1つのラベル付き終端を持つ。これら2つの頂点の間には複数の経路があり、対象はこれらの経路を見つけることである。量子ウォークは量子スピードアップによってこれを実現できることを示す。

We study quantum walks through chains consisting of two and three star graphs. The first star has a distinguished vertex labelled START and the last has one labelled END. There are multiple paths between these two vertices, and the object is to find these paths. We show that a quantum walk can do this with a quantum speedup.

翻訳日:2023-05-10 01:59:54 公開日:2020-07-14

# 窒化チタンとアルミニウム超伝導共振器の誘電損失の比較

Comparison of Dielectric Loss in Titanium Nitride and Aluminum Superconducting Resonators ( http://arxiv.org/abs/2007.07338v1 )

ライセンス: Link先を確認

Alexander Melville, Greg Calusine, Wayne Woods, Kyle Serniak, Evan Golden, Bethany M. Niedzielski, David K. Kim, Arjan Sevi, Jonilyn L. Yoder, Eric A. Dauler, William D. Oliver

(参考訳) 損失誘電体は超伝導量子回路における大きなデコヒーレンス源である。本稿では, 窒化チタン (TiN) とアルミニウム (Al) 超伝導コプラナー導波管 (CPW) 共振器のバルクおよび界面誘電体の誘電損失をモデル化し, 比較する。我々は等方トレンチ型共振器を作製し、特定の誘電領域の共振器品質因子に対する寄与を強調する一連のデバイスジオメトリを生成する。各誘電体領域はTiNデバイスの損失に大きく寄与するが、金属-空気界面はAlデバイスの損失を支配している。さらに、後プロセスハイドロフッ化物(hf)エッチングの有無に関わらず、各tin共振器形状の品質因子を評価し、基板-空気界面の損失を低減し、品質因子を改善する。

Lossy dielectrics are a significant source of decoherence in superconducting quantum circuits. In this report, we model and compare the dielectric loss in bulk and interfacial dielectrics in titanium nitride (TiN) and aluminum (Al) superconducting coplanar waveguide (CPW) resonators. We fabricate isotropically trenched resonators to produce a series of device geometries that accentuate a specific dielectric region's contribution to resonator quality factor. While each dielectric region contributes significantly to loss in TiN devices, the metal-air interface dominates the loss in the Al devices. Furthermore, we evaluate the quality factor of each TiN resonator geometry with and without a post-process hydrofluoric (HF) etch, and find that it reduced losses from the substrate-air interface, thereby improving the quality factor.

翻訳日:2023-05-10 01:59:49 公開日:2020-07-14

# ウィキペディアの取り組みと貢献に影響を与える要因

Individual Factors that Influence Effort and Contributions on Wikipedia ( http://arxiv.org/abs/2007.07333v1 )

ライセンス: Link先を確認

Luiz F. Pinto, Carlos Denner dos Santos, Silvia Onoyama

(参考訳) 本研究は,ウィキペディアに対する態度,自己効力,利他主義が努力や積極的な貢献にどのように影響するかを分析することを目的とする。本稿では,計画行動理論とオンラインコミュニティにおける文献からの知見に基づく新しい概念モデルを提案する。このモデルは、様々な面(識別、相互性、評判)における利他主義を考慮し、組織文献に拠れば、積極的な貢献の観点で測定されるパフォーマンス結果に先立って、努力を要素として扱うことによって、これまで提案されてきた他のモデルと異なる。研究の目的を達成するため、wikipediaはコミュニティのメンバーを調査し、二次的なデータを収集した。異常値を除くと,最終サンプルが212名であった。探索的因子分析と構造方程式モデリングを適用し,良好な適合指標を持つモデルを得た。その結果, 努力が積極的な貢献, 態度, 評価による利他主義, 識別による利他主義に影響を及ぼすことが示唆された。提案された要因はいずれも、アクティブな貢献に直接関係しない。経験は自己効力感に直接影響を与え、努力と積極的貢献の関係を肯定的に抑制する。最後に,文献への示唆と今後の研究への示唆を通じて結論を述べる。

In this work, we aim to analyze how attitude, self-efficacy, and altruism influence effort and active contributions on Wikipedia. We propose a new conceptual model based on the theory of planned behavior and findings from the literature on online communities. This model differs from other models that have been previously proposed by considering altruism in its various facets (identification, reciprocity, and reputation), and by treating effort as a factor prior to performance results, which is measured in terms of active contributions, according to the organizational literature. To fulfill the study specific objectives, Wikipedia surveyed community members and collected secondary data. After excluding outliers, we obtained a final sample with 212 participants. We applied exploratory factor analysis and structural equation modeling, which resulted in a model with satisfactory fit indices. The results indicate that effort influences active contributions, and attitude, altruism by reputation, and altruism by identification influence effort. None of the proposed factors are directly related to active contributions. Experience directly influences self-efficacy while it positively moderates the relation between effort and active contributions. Finally, we present the conclusions via several implications for the literature as well as suggestions for future research.

翻訳日:2023-05-10 01:59:31 公開日:2020-07-14

# 二元系ボース・アインシュタイン凝縮体におけるフィードバック誘起磁性相

Feedback Induced Magnetic Phases in Binary Bose-Einstein Condensates ( http://arxiv.org/abs/2007.07266v1 )

ライセンス: Link先を確認

Hilary M. Hurst, Shangjie Guo, I. B. Spielman

(参考訳) 実時間フィードバック制御を伴うタンデムの弱い測定は、新しい非平衡量子物質工学への新しい道である。本稿では,多成分ボース・アインシュタイン凝縮体(becs)の量子フィードバック制御のための理論的ツールボックスを開発した。単粒子ポテンシャルの形でのフィードバックは、系のダイナミクスを支配する確率方程式に入る効果的な相互作用をもたらすことができる。効果的な相互作用は調整可能であり、スピン非依存およびスピン依存のフェシュバッハ共鳴と類似するが、原子散乱パラメータを変更することはない。フィードバック冷却は測定バックアクションによる暴走加熱を防止し,その効果を説明する解析モデルを提案する。我々は,2成分のBECを確率平均場理論を用いて研究し,フィードバックが易軸強磁性体とスピン非秩序パラマグネット相の相転移を誘導するツールボックスを展示する。本研究では,スピン依存相互作用強度の関数として定常相図を示す。この結果は,ボース・アインシュタイン凝縮体の閉ループ量子制御が,低温原子系における量子工学の強力な新しいツールであることを示す。

Weak measurement in tandem with real-time feedback control is a new route toward engineering novel non-equilibrium quantum matter. Here we develop a theoretical toolbox for quantum feedback control of multicomponent Bose-Einstein condensates (BECs) using backaction-limited weak measurements in conjunction with spatially resolved feedback. Feedback in the form of a single-particle potential can introduce effective interactions that enter into the stochastic equation governing system dynamics. The effective interactions are tunable and can be made analogous to Feshbach resonances -- spin-independent and spin-dependent -- but without changing atomic scattering parameters. Feedback cooling prevents runaway heating due to measurement backaction and we present an analytical model to explain its effectiveness. We showcase our toolbox by studying a two-component BEC using a stochastic mean-field theory, where feedback induces a phase transition between easy-axis ferromagnet and spin-disordered paramagnet phases. We present the steady-state phase diagram as a function of intrinsic and effective spin-dependent interaction strengths. Our result demonstrates that closed-loop quantum control of Bose-Einstein condensates is a powerful new tool for quantum engineering in cold-atom systems.

翻訳日:2023-05-10 01:58:13 公開日:2020-07-14

# 人々を復活させる - ベンチマーク機械学習データセットのコンテスト

Bringing the People Back In: Contesting Benchmark Machine Learning Datasets ( http://arxiv.org/abs/2007.07399v1 )

ライセンス: Link先を確認

Emily Denton, Alex Hanna, Razvan Amironesei, Andrew Smart, Hilary Nicole, Morgan Klaus Scheuerman

(参考訳) 社会技術システムに埋め込まれたアルゴリズム上の不公平さに対して、白人、シスジェンダー、男性、西洋のデータ被験者に対する偏見を明らかにする機械学習データセットの内容に注目が集まっている。対照的に、そのようなデータセットに埋め込まれた履歴、値、規範に比較的注意が払われていない。本稿では,機械学習データの系譜である研究プログラムを概説し,これらのデータセットが作成されている理由,収集すべきデータの選択にどのような影響を与えるか,それらの生成の文脈的条件と付随的条件について検討する。機械学習におけるベンチマークデータセットを基盤として運用する方法を説明し、これらのデータセットについて4つの研究課題を提起する。この尋問は、データセット構築に埋め込まれた労働力を理解し、データに遭遇する他の研究者に対する新たなコンテストの道を示すことで、私たちを「人々を取り戻す」よう促します。

In response to algorithmic unfairness embedded in sociotechnical systems, significant attention has been focused on the contents of machine learning datasets which have revealed biases towards white, cisgender, male, and Western data subjects. In contrast, comparatively less attention has been paid to the histories, values, and norms embedded in such datasets. In this work, we outline a research program - a genealogy of machine learning data - for investigating how and why these datasets have been created, what and whose values influence the choices of data to collect, the contextual and contingent conditions of their creation. We describe the ways in which benchmark datasets in machine learning operate as infrastructure and pose four research questions for these datasets. This interrogation forces us to "bring the people back in" by aiding us in understanding the labor embedded in dataset construction, and thereby presenting new avenues of contestation for other researchers encountering the data.

翻訳日:2023-05-10 01:50:46 公開日:2020-07-14

# 多次元格子上のフーリエ歩行では局在化は起こらない

Localization does not occur for the Fourier walk on the multi-dimensional lattice ( http://arxiv.org/abs/2007.07398v1 )

ライセンス: Link先を確認

Akihiro Narimatsu

(参考訳) 多次元格子上のグロバーウォークの局所化の存在が知られている。本稿では,空間均質な量子ウォークの局在性の存在条件について述べる。また,多次元格子上のフーリエ歩行では局在化は起こらないことを証明した。

The existence of localization for the Grover walk on the multi-dimensional lattice is known. This paper gives some conditions for the existence of localization for the space-homogeneous quantum walks. We also prove that localization does not occur for the Fourier walk on the multi-dimensional lattice.

翻訳日:2023-05-10 01:50:30 公開日:2020-07-14

# 集積フォトニクスを用いた高純度パルススクイーズ生成

High-Purity Pulsed Squeezing Generation with Integrated Photonics ( http://arxiv.org/abs/2007.07387v1 )

ライセンス: Link先を確認

Chaohan Cui, Christos N. Gagatsos, Saikat Guha and Linran Fan

(参考訳) スクイーズド光は、部分的なポストセレクション技術に基づく量子強化センシングや量子状態工学など、量子技術の強力なツールへと進化してきた。複雑な通信ネットワークや大規模情報処理で好まれる正確なタイムスタンプと物理的に定義された時間モードを提供するため、絞り込み光のパルス発生は特に興味深い。しかしながら、従来の単一パス構成におけるパルススクイージングのマルチモード特性は出力状態の純度を制限し、量子技術における応用に悪影響を及ぼす。本報告では,パルススクイーズを高い時間的純度で生成する新しい手法を提案する。フォトニックキャビティのパラメトリックダウンコンバージョンに基づくパルススクイージングの解析を行った。出力された励起光の有効モード数がユニティに近づくことを示す。このような高純度励起光は広いパラメータと低いポンプパワーで実現でき、大規模な量子資源を生成するための堅牢なアプローチを提供する。

Squeezed light has evolved into a powerful tool for quantum technology, ranging from quantum enhanced sensing and quantum state engineering based on partial post-selection techniques. The pulsed generation of squeezed light is of particular interest, as it can provide accurate time stamp and physically defined temporal mode, which are highly preferred in complex communication networks and large-scale information processing. However, the multimode feature of pulsed squeezing in conventional single-pass configuration limits the purity of the output state, negatively impacting its application in quantum technology. In this Letter, we propose a new approach to generate pulsed squeezing with high temporal purity. Pulsed squeezing based on parametric down-conversion in photonic cavities is analyzed. We show that the effective mode number of the output squeezed light approaches unity. Such a high-purity squeezed light can be realized with broad parameters and low pump power, providing a robust approach to generate large-scale quantum resource.

翻訳日:2023-05-10 01:49:49 公開日:2020-07-14

# パッシブフォトニクスと時間分解検出を用いた高次元周波数エンコード量子情報処理

High-dimensional Frequency-Encoded Quantum Information Processing with Passive Photonics and Time-Resolving Detection ( http://arxiv.org/abs/2007.07386v1 )

ライセンス: Link先を確認

Chaohan Cui, Kaushik P. Seshadreesan, Saikat Guha and Linran Fan

(参考訳) 本稿では,光子周波数領域に符号化された高次元量子情報を処理する新しい手法を提案する。非線形光学過程に基づく以前のアプローチとは対照的に、光子エネルギーのアクティブ制御は不要である。任意ユニタリ変換と投影測定は、受動フォトニック回路と時間分解検出によって実現できる。任意の大きさの量子周波数コムの系統回路設計が提案されている。量子周波数相関の検証基準が導出されている。検出器の有限応答時間の実用的条件を考慮し、現在の装置性能で高忠実度動作を容易に実現できることを示す。この研究は、高次元周波数符号化に基づくスケーラブルで高忠実な量子情報処理への道を開く。

In this Letter, we propose a new approach to process high-dimensional quantum information encoded in a photon frequency domain. In contrast to previous approaches based on nonlinear optical processes, no active control of photon energy is required. Arbitrary unitary transformation and projection measurement can be realized with passive photonic circuits and time-resolving detection. A systematic circuit design for a quantum frequency comb with arbitrary size has been given. The criteria to verify quantum frequency correlation has been derived. By considering the practical condition of detector's finite response time, we show that high-fidelity operation can be readily realized with current device performance. This work will pave the way towards scalable and high-fidelity quantum information processing based on high-dimensional frequency encoding.

翻訳日:2023-05-10 01:49:34 公開日:2020-07-14

# 変分量子分類器を用いた認知症予測

Dementia Prediction Applying Variational Quantum Classifier ( http://arxiv.org/abs/2007.08653v1 )

ライセンス: Link先を確認

Daniel Sierra-Sosa, Juan Arcila-Moreno, Begonya Garcia-Zapirain, Cristian Castillo-Olea, Adel Elmaghraby

(参考訳) 認知症は世界で5番目の死因であり、毎年1000万人の新規患者が死亡している。機械学習技術を用いた医療アプリケーションは物理的限界にほぼ達しているが、診断の頻度の増加によってより多くのデータが利用できるようになっている。量子機械学習(QML)技術に関する最近の研究は、既存の機械学習モデルのトレーニングプロセスを加速し、より複雑なパターンを学ぶための代替手段を提供するのに役立つ、さまざまなアプローチを発見した。本研究は,量子機械学習アルゴリズムの実世界の応用を報告することを目的としており,特に,ibmのフレームワークにおける変分量子分類(vqc)に実装されたバージョンを用いることにより,高齢者の認知症予測を可能にしている。

Dementia is the fifth cause of death worldwide with 10 million new cases every year. Healthcare applications using machine learning techniques have almost reached the physical limits while more data is becoming available resulting from the increasing rate of diagnosis. Recent research in Quantum Machine Learning (QML) techniques have found different approaches that may be useful to accelerate the training process of existing machine learning models and provide an alternative to learn more complex patterns. This work aims to report a real-world application of a Quantum Machine Learning Algorithm, in particular, we found that using the implemented version for Variational Quantum Classiffication (VQC) in IBM's framework Qiskit allows predicting dementia in elderly patients, this approach proves to provide more consistent results when compared with a classical Support Vector Machine (SVM) with a linear kernel using different number of features.

翻訳日:2023-05-10 01:39:28 公開日:2020-07-14

# 相対論的安息エネルギーと次数1/2のフラクショナルモーメント演算子との関係

A Link Between Relativistic Rest Energy and Fractionary Momentum Operators of Order 1/2 ( http://arxiv.org/abs/1912.12770v4 )

ライセンス: Link先を確認

Luis Fernando Mora Mora

(参考訳) 無限ポテンシャル井戸における因果分数波方程式の解を得た。第一に、いわゆる「自由粒子」の場合が解決され、正規化可能な解は波のパケットに似た減衰振動の重ね合わせとなる。この結果から無限ポテンシャルウェルケースが解かれた。得られた方程式の減衰係数は、ユカワポテンシャルまたは「遮蔽」クーロンポテンシャルに現れる指数と一致した。このマッチングが強制されると、粒子はE = mc^2/2のオフセットエネルギーを取得し、各エネルギーレベルによって増大する。箱の中の波動解の指数減衰は、粒子が陽子の質量に等しい質量を持つとき、陽子の半径と密接に関連していることがわかった。最後に、分数的波動方程式は球面座標で表現され、解析的あるいは数値的な方法で解かれる。

The solution of a causal fractionary wave equation in an infinite potential well was obtained. First, the so-called "free particle" case was solved, giving as normalizable solutions a superposition of damped oscillations similar to a wave packet. From this results, the infinite potential well case was then solved. The damping coefficient of the equation obtained was matched with the exponent appearing in the Yucawa potential or "screened" Coulomb potential. When this matching was forced, the particle aquires an offset energy of E = mc^2/2 which then can be increased by each energy level. The expontential damping of the wave solutions in the box was found to be closely related with the radius of the proton when the particle has a mass equal to the mass of the proton. Lastly the fractionary wave equation was expressed in spherical coordinates and remains to be solved through analytical or numerical methods.

翻訳日:2023-01-17 02:52:57 公開日:2020-07-14

# ローカルに考える、グローバルに行動する - ローカルとグローバル表現による連合学習

Think Locally, Act Globally: Federated Learning with Local and Global Representations ( http://arxiv.org/abs/2001.01523v3 )

ライセンス: Link先を確認

Paul Pu Liang, Terrance Liu, Liu Ziyin, Nicholas B. Allen, Randy P. Auerbach, David Brent, Ruslan Salakhutdinov, Louis-Philippe Morency

(参考訳) フェデレートラーニング(Federated Learning)とは、複数のデバイスに分散したプライベートデータをトレーニングする手法である。デバイスのデータをプライベートに保つため、グローバルモデルはパラメータとアップデートを通信するだけでトレーニングされる。そこで本研究では,各デバイス上のコンパクトな局所表現と,全デバイスにまたがるグローバルモデルを同時に学習する,新しいフェデレーション学習アルゴリズムを提案する。その結果、グローバルモデルは局所表現のみで動作するため、より小さくなり、通信されるパラメータの数を減らすことができる。理論的には, 局所モデルと大域モデルの組み合わせにより, デバイス分布のばらつきだけでなく, データのばらつきも減少することを示す一般化解析を行う。実演的に、我々はローカルモデルが性能を維持しながらコミュニケーション効率の高い訓練を可能にすることを示した。また、プライバシーが重要な現実世界のモバイルデータから個人化された気分予測のタスクを評価する。最後に、ローカルモデルは、新しいデバイスからの異種データを処理し、人種、年齢、性別などの保護された属性を隠蔽する公平な表現を学ぶ。

Federated learning is a method of training models on private data distributed over multiple devices. To keep device data private, the global model is trained by only communicating parameters and updates which poses scalability challenges for large models. To this end, we propose a new federated learning algorithm that jointly learns compact local representations on each device and a global model across all devices. As a result, the global model can be smaller since it only operates on local representations, reducing the number of communicated parameters. Theoretically, we provide a generalization analysis which shows that a combination of local and global models reduces both variance in the data as well as variance across device distributions. Empirically, we demonstrate that local models enable communication-efficient training while retaining performance. We also evaluate on the task of personalized mood prediction from real-world mobile data where privacy is key. Finally, local models handle heterogeneous data from new devices, and learn fair representations that obfuscate protected attributes such as race, age, and gender.

翻訳日:2023-01-14 02:08:44 公開日:2020-07-14

# 部分空間分割のための群ノルム正規化分解モデル

A Group Norm Regularized Factorization Model for Subspace Segmentation ( http://arxiv.org/abs/2001.02568v2 )

ライセンス: Link先を確認

Xishun Wang and Zhouwang Yang and Xingye Yue and Hui Wang

(参考訳) 部分空間のセグメンテーションは、データが異なる部分空間の結合から来ると仮定し、セグメンテーションの目的は、データを対応する部分空間に分割することである。低ランク表現(LRR)は、サブスペースセグメンテーション問題を解決するための古典的なスペクトル型手法であり、まずLRRモデルを解くことで親和性行列を取得し、次にセグメンテーションのためのスペクトルクラスタリングを実行する。本稿では,部分空間分割のためのlrrモデルに触発された群ノルム正規化分解モデル(gnrfm)を提案し,このモデルを解くために拡張ラグランジアン法(aalm)アルゴリズムを設計する。具体的には, 因子行列の列を疎くするために群ノルム正規化を適用し, 低階の目的を達成することにより, 特異値分解 (svd) は不要となり, 各ステップの計算複雑性が大幅に低減される。我々は、異なるLRRモデルを用いて親和性行列を取得し、それぞれ異なる合成ノイズデータと実データを用いてクラスタテストを行う。従来のモデルやアルゴリズムと比較して、提案手法はより高速でノイズに強いため、最終的なクラスタリング結果の方が優れている。さらに, 計算結果から, アルゴリズムは高速に収束し, 約10回しか要しないことがわかった。

Subspace segmentation assumes that data comes from the union of different subspaces and the purpose of segmentation is to partition the data into the corresponding subspace. Low-rank representation (LRR) is a classic spectral-type method for solving subspace segmentation problems, that is, one first obtains an affinity matrix by solving a LRR model and then performs spectral clustering for segmentation. This paper proposes a group norm regularized factorization model (GNRFM) inspired by the LRR model for subspace segmentation and then designs an Accelerated Augmented Lagrangian Method (AALM) algorithm to solve this model. Specifically, we adopt group norm regularization to make the columns of the factor matrix sparse, thereby achieving a purpose of low rank, which means no Singular Value Decompositions (SVD) are required and the computational complexity of each step is greatly reduced. We obtain affinity matrices by using different LRR models and then performing cluster testing on different sets of synthetic noisy data and real data, respectively. Compared with traditional models and algorithms, the proposed method is faster and more robust to noise, so the final clustering results are better. Moreover, the numerical results show that our algorithm converges fast and only requires approximately ten iterations.

翻訳日:2023-01-13 09:40:38 公開日:2020-07-14

# 単一画像の空間適応ネットワーク

Spatial-Adaptive Network for Single Image Denoising ( http://arxiv.org/abs/2001.10291v2 )

ライセンス: Link先を確認

Meng Chang, Qi Li, Huajun Feng, Zhihai Xu

(参考訳) 前回の研究では、畳み込みニューラルネットワークが画像のノイズ処理において優れたパフォーマンスを達成できることが示されている。しかし、局所的な強固な畳み込み操作によって制限されるため、これらの手法は過剰な人工物に繋がる。より深いネットワーク構造はこれらの問題を緩和するが、より多くの計算オーバーヘッドが必要である。本稿では,効率的な単一画像ブラインドノイズ除去のための空間適応型雑音除去ネットワーク(SADNet)を提案する。空間テクスチャやエッジの変化に適応するため, 残留空間適応ブロックを設計する。重み付けのための空間的相関特徴をサンプリングするために変形可能な畳み込みを導入する。コンテキストブロック付きエンコーダデコーダ構造を導入し、マルチスケール情報をキャプチャする。粗さから微細なノイズ除去により、高品質なノイズフリー画像を得ることができる。本手法を合成および実雑音画像データセットに適用する。実験の結果,本手法は定量的および視覚的に,最先端の弁別法を上回ることができることがわかった。

Previous works have shown that convolutional neural networks can achieve good performance in image denoising tasks. However, limited by the local rigid convolutional operation, these methods lead to oversmoothing artifacts. A deeper network structure could alleviate these problems, but more computational overhead is needed. In this paper, we propose a novel spatial-adaptive denoising network (SADNet) for efficient single image blind noise removal. To adapt to changes in spatial textures and edges, we design a residual spatial-adaptive block. Deformable convolution is introduced to sample the spatially correlated features for weighting. An encoder-decoder structure with a context block is introduced to capture multiscale information. With noise removal from the coarse to fine, a high-quality noisefree image can be obtained. We apply our method to both synthetic and real noisy image datasets. The experimental results demonstrate that our method can surpass the state-of-the-art denoising methods both quantitatively and visually.

翻訳日:2023-01-06 02:52:19 公開日:2020-07-14

# 音韻学における系統信号

Phylogenetic signal in phonotactics ( http://arxiv.org/abs/2002.00527v2 )

ライセンス: Link先を確認

Jayden L. Macklin-Cordes, Claire Bowern and Erich R. Round

(参考訳) 系統学的手法は、樹木の推測を超えた言語学に幅広い可能性を秘めている。ここでは, 系統学的なアプローチが, 全く新しい言語データから歴史的知見を得る可能性を明らかにする。本研究では,111パマ・ニュンガン語彙から音声学的データを抽出し,系統発生の履歴を反映する程度を定量化する。 1) セグメント間の遷移周波数と, (3) 自然音クラス間の遷移周波数とで, biphone (two-segment sequences) の存在の有無を記録するバイナリ変数と, (3) 自然音クラス間の遷移頻度の3つのデータセットをテストした。オーストラリアの言語は高い音韻的同質性を持っていると特徴付けられる。それにもかかわらず、すべてのデータセットで系統発生シグナルを検出する。系統発生シグナルは、二進法データよりも粒度の細かい周波数データで、自然クラスデータで最大である。これらの結果は, 歴史的・比較言語学において, 容易に抽出できる新たなデータ源を活用できることを示す。

Phylogenetic methods have broad potential in linguistics beyond tree inference. Here, we show how a phylogenetic approach opens the possibility of gaining historical insights from entirely new kinds of linguistic data--in this instance, statistical phonotactics. We extract phonotactic data from 111 Pama-Nyungan vocabularies and apply tests for phylogenetic signal, quantifying the degree to which the data reflect phylogenetic history. We test three datasets: (1) binary variables recording the presence or absence of biphones (two-segment sequences) in a lexicon (2) frequencies of transitions between segments, and (3) frequencies of transitions between natural sound classes. Australian languages have been characterized as having a high degree of phonotactic homogeneity. Nevertheless, we detect phylogenetic signal in all datasets. Phylogenetic signal is greater in finer-grained frequency data than in binary data, and greatest in natural-class-based data. These results demonstrate the viability of employing a new source of readily extractable data in historical and comparative linguistics.

翻訳日:2023-01-04 09:15:28 公開日:2020-07-14

# 1クラス潜在正規化ネットワークによる異常検出

Anomaly Detection by One Class Latent Regularized Networks ( http://arxiv.org/abs/2002.01607v2 )

ライセンス: Link先を確認

Chengwei Chen and Pan Chen and Haichuan Song and Yiqing Tao and Yuan Xie and Shouhong Ding and Lizhuang Ma

(参考訳) 異常検出は多くの実世界の応用でコンピュータビジョン領域の基本的な問題である。正規クラスに属する幅広い画像が、ある分布から現れると、このタスクの目的は、異常な事象に属する分布外画像を検出するモデルを構築することである。近年,GANに基づく半教師付きジェネレーティブ・アドバイザリアル・ネットワーク(GAN)手法が,異常検出タスクで人気を集めている。しかし、GANのトレーニングプロセスはまだ不安定で困難である。これらの問題を解決するために, 学習データの基盤構造を潜在的特徴空間にとらえるだけでなく, 潜在表現の空間を識別的に制限し, より正確な検出を行うことができる, 新たな対向的二重オートエンコーダネットワークを提案する。さらに、判別器と見なされる補助オートエンコーダは、より安定した訓練プロセスを得ることができた。実験の結果,MNISTおよびCIFAR10データセットおよびGTSRB停止信号データセットの最先端結果が得られた。

Anomaly detection is a fundamental problem in computer vision area with many real-world applications. Given a wide range of images belonging to the normal class, emerging from some distribution, the objective of this task is to construct the model to detect out-of-distribution images belonging to abnormal instances. Semi-supervised Generative Adversarial Networks (GAN)-based methods have been gaining popularity in anomaly detection task recently. However, the training process of GAN is still unstable and challenging. To solve these issues, a novel adversarial dual autoencoder network is proposed, in which the underlying structure of training data is not only captured in latent feature space, but also can be further restricted in the space of latent representation in a discriminant manner, leading to a more accurate detector. In addition, the auxiliary autoencoder regarded as a discriminator could obtain an more stable training process. Experiments show that our model achieves the state-of-the-art results on MNIST and CIFAR10 datasets as well as GTSRB stop signs dataset.

翻訳日:2023-01-03 21:45:56 公開日:2020-07-14

# 不均一顔認識のための関係深い特徴学習

Relational Deep Feature Learning for Heterogeneous Face Recognition ( http://arxiv.org/abs/2003.00697v3 )

ライセンス: Link先を確認

MyeongAh Cho, Taeoh Kim, Ig-Jae Kim, Kyungjae Lee, and Sangyoun Lee

(参考訳) Heterogeneous Face Recognition (HFR)は、可視光(VIS)、近赤外線(NIR)、スケッチ領域などの2つの異なる領域の顔にマッチするタスクである。データベースがないため、HFR法は通常、一般的な顔情報を含む大規模視覚データベースの事前訓練された特徴を利用する。しかし、これらの事前訓練された特徴は、視覚領域とのテクスチャの相違による性能劣化を引き起こす。そこで本研究では,汎用的な顔特徴に加えて,グローバルリレーショナル情報を抽出するリレーショナルグラフモジュール (rgm) を提案する。各アイデンティティの界面部分間の関係情報はどんなモダリティでも似ているため、特徴間のモデリング関係はドメイン間のマッチングに役立つ。 RGMを通して、関係伝播は事前訓練された特徴から利点を失うことなくテクスチャ依存性を減少させる。さらに、RGMは、局所的に相関した畳み込み特徴からグローバルな顔幾何学を捉え、長距離関係を識別する。さらに,ノードワイズ・リカレーションを行うノード注意ユニット(NAU)を提案する。さらに,HFRにおける埋め込みベクトルの効率的な投影学習のための条件付きマージン損失関数(C-softmax)を提案する。提案手法は5つのHFRデータベース上での最先端手法よりも優れている。さらに,我々のモジュールを任意のトレーニング済みの顔認識バックボーンにプラグインすることで,小さなHFRデータベースの制限を克服し,3つのバックボーンの性能向上を実証する。

Heterogeneous Face Recognition (HFR) is a task that matches faces across two different domains such as visible light (VIS), near-infrared (NIR), or the sketch domain. Due to the lack of databases, HFR methods usually exploit the pre-trained features on a large-scale visual database that contain general facial information. However, these pre-trained features cause performance degradation due to the texture discrepancy with the visual domain. With this motivation, we propose a graph-structured module called Relational Graph Module (RGM) that extracts global relational information in addition to general facial features. Because each identity's relational information between intra-facial parts is similar in any modality, the modeling relationship between features can help cross-domain matching. Through the RGM, relation propagation diminishes texture dependency without losing its advantages from the pre-trained features. Furthermore, the RGM captures global facial geometrics from locally correlated convolutional features to identify long-range relationships. In addition, we propose a Node Attention Unit (NAU) that performs node-wise recalibration to concentrate on the more informative nodes arising from relation-based propagation. Furthermore, we suggest a novel conditional-margin loss function (C-softmax) for the efficient projection learning of the embedding vector in HFR. The proposed method outperforms other state-of-the-art methods on five HFR databases. Furthermore, we demonstrate performance improvement on three backbones because our module can be plugged into any pre-trained face recognition backbone to overcome the limitations of a small HFR database.

翻訳日:2022-12-27 05:13:48 公開日:2020-07-14

# 深層強化学習を用いた効率的かつ効果的な類似サブトラジェクション探索

Efficient and Effective Similar Subtrajectory Search with Deep Reinforcement Learning ( http://arxiv.org/abs/2003.02542v2 )

ライセンス: Link先を確認

Zheng Wang, Cheng Long, Gao Cong, Yiding Liu

(参考訳) 同様の軌道探索は基本的な問題であり、過去20年間にわたってよく研究されてきた。しかしながら、クエリの軌跡に最もよく似た軌道の一部(すなわち、サブ軌跡)を返すことを目的とした類似のサブ軌跡探索(simsub)問題は、より細かい方法で軌道の類似性を捉えることができ、多くのアプリケーションが解析の基本的な単位としてサブ軌跡を取るにもかかわらず、ほとんど無視されている。本稿では,SimSub問題について検討し,精度と近似性の両方を含むアルゴリズムスイートを開発する。これらの近似アルゴリズムのうち、深層強化学習に基づく2つのアルゴリズムは、有効性と効率の観点からこれらの非学習ベースのアルゴリズムよりも優れている。提案手法の有効性と効率を検証した実世界の軌道データセットの実験を行った。

Similar trajectory search is a fundamental problem and has been well studied over the past two decades. However, the similar subtrajectory search (SimSub) problem, aiming to return a portion of a trajectory (i.e., a subtrajectory) which is the most similar to a query trajectory, has been mostly disregarded despite that it could capture trajectory similarity in a finer-grained way and many applications take subtrajectories as basic units for analysis. In this paper, we study the SimSub problem and develop a suite of algorithms including both exact and approximate ones. Among those approximate algorithms, two that are based on deep reinforcement learning stand out and outperform those non-learning based algorithms in terms of effectiveness and efficiency. We conduct experiments on real-world trajectory datasets, which verify the effectiveness and efficiency of the proposed algorithms.

翻訳日:2022-12-26 07:45:49 公開日:2020-07-14

# 強化学習における分布ロバスト性と正規化

Distributional Robustness and Regularization in Reinforcement Learning ( http://arxiv.org/abs/2003.02894v2 )

ライセンス: Link先を確認

Esther Derman and Shie Mannor

(参考訳) 分散ロバスト最適化(DRO)は、分類と回帰におけるロバスト性と正規化の等価性を証明し、正規化が統計的学習においてうまく一般化する解析的理由を与える。 DROのシーケンシャルな意思決定への拡張は、ロバストなマルコフ決定プロセス(MDP)設定を通じて$\textit{external uncertainty}$を克服するが、結果の定式化は特に大域において解決が難しい。一方、強化学習における既存の正規化法は確率性のため$\textit{internal uncertainty}$のみを扱う。本研究は,強固なmdpと正則化の二重関係を確立することにより,強固な強化学習を促進することを目的としている。本稿では,分散ロバストなMPPを導入し,非サンプル性能を保証することを証明する。次に,経験値関数に対する新しい正規化器を導入し,ワッサースタイン分布ロバストな値関数の下限を示す。結果は大きな状態空間に対する線形値関数近似に拡張する。提案手法は,有限サンプル性能を保証したロバストネスの定式化を提供する。さらに、強化学習法で$\textit{external uncertainty}$を扱うための実用的なツールとして正規化を使うことを提案する。

Distributionally Robust Optimization (DRO) has enabled to prove the equivalence between robustness and regularization in classification and regression, thus providing an analytical reason why regularization generalizes well in statistical learning. Although DRO's extension to sequential decision-making overcomes $\textit{external uncertainty}$ through the robust Markov Decision Process (MDP) setting, the resulting formulation is hard to solve, especially on large domains. On the other hand, existing regularization methods in reinforcement learning only address $\textit{internal uncertainty}$ due to stochasticity. Our study aims to facilitate robust reinforcement learning by establishing a dual relation between robust MDPs and regularization. We introduce Wasserstein distributionally robust MDPs and prove that they hold out-of-sample performance guarantees. Then, we introduce a new regularizer for empirical value functions and show that it lower bounds the Wasserstein distributionally robust value function. We extend the result to linear value function approximation for large state spaces. Our approach provides an alternative formulation of robustness with guaranteed finite-sample performance. Moreover, it suggests using regularization as a practical tool for dealing with $\textit{external uncertainty}$ in reinforcement learning methods.

翻訳日:2022-12-26 07:02:27 公開日:2020-07-14

# OVC-Net: テンポラルグラフと詳細拡張によるオブジェクト指向ビデオキャプション

OVC-Net: Object-Oriented Video Captioning with Temporal Graph and Detail Enhancement ( http://arxiv.org/abs/2003.03715v5 )

ライセンス: Link先を確認

Fangyi Zhu, Jenq-Neng Hwang, Zhanyu Ma, Guang Chen, Jun Guo

(参考訳) 従来のビデオキャプションでは、ビデオの総合的な説明を要求するが、特定のオブジェクトの詳細な説明は利用できない。移動軌跡を関連づけることなく、これらの画像に基づくデータ駆動手法は、物体間視覚特徴の時空間遷移からの活動を理解することができない。さらに、トレーニングであいまいなクリップ・センテンスペアを採用することで、単対多の性質からマルチモーダル機能マッピングを学ぶことを妨げる。本稿では,オブジェクト指向ビデオキャプションと呼ばれる,映像をオブジェクト指向で理解するための新しいタスクを提案する。ビデオベースのオブジェクト指向ビデオキャプションネットワーク(OVC)-Netを時間グラフと詳細拡張により導入し、時間とともに活動を分析し、小さなサンプル条件下での視覚言語接続を安定的に捕捉する。時間グラフは、以前のイメージベースアプローチよりも有用な補足を提供し、視覚特徴の時間的進化と空間的位置の動的移動からアクティビティを推論することができる。細部の拡張は、異なるオブジェクト間の識別的特徴をキャプチャし、それに続くキャプションモジュールによりより情報的で正確な記述が得られる。その後、効果的なクロスモーダル学習を容易にするために、一貫性のあるオブジェクト指向ペアを提供する新しいデータセットを構築した。提案手法の有効性を示すため,新しいデータセットの実験を行い,最先端のビデオキャプション手法と比較する。実験結果から,OVC-Netは並列オブジェクトを正確に記述する能力を示し,最先端の性能を実現する。

Traditional video captioning requests a holistic description of the video, yet the detailed descriptions of the specific objects may not be available. Without associating the moving trajectories, these image-based data-driven methods cannot understand the activities from the spatio-temporal transitions in the inter-object visual features. Besides, adopting ambiguous clip-sentence pairs in training, it goes against learning the multi-modal functional mappings owing to the one-to-many nature. In this paper, we propose a novel task to understand the videos in object-level, named object-oriented video captioning. We introduce the video-based object-oriented video captioning network (OVC)-Net via temporal graph and detail enhancement to effectively analyze the activities along time and stably capture the vision-language connections under small-sample condition. The temporal graph provides useful supplement over previous image-based approaches, allowing to reason the activities from the temporal evolution of visual features and the dynamic movement of spatial locations. The detail enhancement helps to capture the discriminative features among different objects, with which the subsequent captioning module can yield more informative and precise descriptions. Thereafter, we construct a new dataset, providing consistent object-sentence pairs, to facilitate effective cross-modal learning. To demonstrate the effectiveness, we conduct experiments on the new dataset and compare it with the state-of-the-art video captioning methods. From the experimental results, the OVC-Net exhibits the ability of precisely describing the concurrent objects, and achieves the state-of-the-art performance.

翻訳日:2022-12-25 14:25:05 公開日:2020-07-14

# 階層型運動型メッシュリカバリ

Hierarchical Kinematic Human Mesh Recovery ( http://arxiv.org/abs/2003.04232v2 )

ライセンス: Link先を確認

Georgios Georgakis, Ren Li, Srikrishna Karanam, Terrence Chen, Jana Kosecka, Ziyan Wu

(参考訳) 一つの画像から3次元メッシュのパラメトリックモデルを推定する問題を考察する。モデルパラメータの直接回帰によるこの分野の最近の進歩は大きいが、これらの手法は人体の運動的構造を暗黙的に活用するだけであり、それ以前のモデルの最適使用へと繋がる。本研究では,このギャップに対処すべく,既知の階層構造によって,モデルの相互依存性を含む,人間のパラメトリックモデルの回帰のための新しい手法を提案する。これにより、regressorアーキテクチャの事前インフォームド設計と、現在の3dヒューマンメッシュリカバリの標準フレームワークと連携して使用するためのフレキシブルな階層最適化が実現されている。これらの側面を、標準ベンチマークデータセットに関する広範な実験によって実証し、提案した新しい設計が、既存および一般的ないくつかの手法より優れており、新しい最先端の結果が確立されていることを示す。本手法は, データの破損下においても接合部を推定する機能を備えており, 閉塞度の異なる実験を行うことで実証する。

We consider the problem of estimating a parametric model of 3D human mesh from a single image. While there has been substantial recent progress in this area with direct regression of model parameters, these methods only implicitly exploit the human body kinematic structure, leading to sub-optimal use of the model prior. In this work, we address this gap by proposing a new technique for regression of human parametric model that is explicitly informed by the known hierarchical structure, including joint interdependencies of the model. This results in a strong prior-informed design of the regressor architecture and an associated hierarchical optimization that is flexible to be used in conjunction with the current standard frameworks for 3D human mesh recovery. We demonstrate these aspects by means of extensive experiments on standard benchmark datasets, showing how our proposed new design outperforms several existing and popular methods, establishing new state-of-the-art results. By considering joint interdependencies, our method is equipped to infer joints even under data corruptions, which we demonstrate by conducting experiments under varying degrees of occlusion.

翻訳日:2022-12-25 08:42:55 公開日:2020-07-14

# giqa: 生成した画像品質評価

GIQA: Generated Image Quality Assessment ( http://arxiv.org/abs/2003.08932v3 )

ライセンス: Link先を確認

Shuyang Gu, Jianmin Bao, Dong Chen, Fang Wen

(参考訳) 現在、GAN(Generative Adversarial Network)は素晴らしい成果を上げているが、すべての生成画像が完璧ではない。最近、生成モデルにいくつかの量的基準が現れたが、いずれも単一の生成画像のために設計されていない。本稿では,各画像の品質を定量的に評価するgiqa(generate image quality assessment)という新たな研究テーマを提案する。学習ベースとデータベースという2つの観点からGIQAアルゴリズムを導入する。我々は、様々なデータセット上で様々なGANモデルによって生成された多数の画像を評価し、それらが人間の評価と一致していることを示す。さらに、GIQAは、生成モデルのリアリズムと多様性を別々に評価し、GANのトレーニングにおいてオンラインのハードネガティブマイニング(OHEM)を可能にするなど、多くのアプリケーションで利用することができる。

Generative adversarial networks (GANs) have achieved impressive results today, but not all generated images are perfect. A number of quantitative criteria have recently emerged for generative model, but none of them are designed for a single generated image. In this paper, we propose a new research topic, Generated Image Quality Assessment (GIQA), which quantitatively evaluates the quality of each generated image. We introduce three GIQA algorithms from two perspectives: learning-based and data-based. We evaluate a number of images generated by various recent GAN models on different datasets and demonstrate that they are consistent with human assessments. Furthermore, GIQA is available to many applications, like separately evaluating the realism and diversity of generative models, and enabling online hard negative mining (OHEM) in the training of GANs to improve the results.

翻訳日:2022-12-22 04:51:09 公開日:2020-07-14

# 自然言語推論モデルはIMPPRESsiveか? 学習のインプリケーションと前提

Are Natural Language Inference Models IMPPRESsive? Learning IMPlicature and PRESupposition ( http://arxiv.org/abs/2004.03066v2 )

ライセンス: Link先を確認

Paloma Jeretic, Alex Warstadt, Suvrat Bhooshan, Adina Williams

(参考訳) 自然言語推論(NLI)は、自然言語理解にとってますます重要なタスクであり、ある文が他の文に関係しているかどうかを推測する必要がある。しかし、nliモデルが実用的推論を行う能力は未定である。そこで我々は,約25kの半自動生成文対からなるIMPPRES(IMPlicature and PreSupposition diagnosis dataset)を作成した。我々は、multiNLI(Williams et al., 2018)でトレーニングされたBERT、InferSent、BOW NLIモデルが実用的推論を学習するかどうかを評価するためにIMPPRESを使用する。 multinli はこれらの推論型を示すペアをほとんど含まないように見えるが、bert が実用的推論を描くことを学ぶ。によって引き起こされるスカラーの模倣を包含物として確実に扱う。のみ"のような前提条件のトリガに対して、BERTは、否定のような追加条件のキャンセル演算子の下にトリガーが埋め込まれた場合でも、前提条件をエンセレーションとして確実に認識する。 BOWとInferSentは、実用的な推論の弱い証拠を示している。 nliトレーニングはモデルに実用的推論を学ぶことを奨励するものだと結論付けています。

Natural language inference (NLI) is an increasingly important task for natural language understanding, which requires one to infer whether a sentence entails another. However, the ability of NLI models to make pragmatic inferences remains understudied. We create an IMPlicature and PRESupposition diagnostic dataset (IMPPRES), consisting of >25k semiautomatically generated sentence pairs illustrating well-studied pragmatic inference types. We use IMPPRES to evaluate whether BERT, InferSent, and BOW NLI models trained on MultiNLI (Williams et al., 2018) learn to make pragmatic inferences. Although MultiNLI appears to contain very few pairs illustrating these inference types, we find that BERT learns to draw pragmatic inferences. It reliably treats scalar implicatures triggered by "some" as entailments. For some presupposition triggers like "only", BERT reliably recognizes the presupposition as an entailment, even when the trigger is embedded under an entailment canceling operator like negation. BOW and InferSent show weaker evidence of pragmatic reasoning. We conclude that NLI training encourages models to learn some, but not all, pragmatic inferences.

翻訳日:2022-12-15 23:39:34 公開日:2020-07-14

# 球面画像の高精度セグメンテーションのための一般化された最短パスベーススーパーピクセル

Generalized Shortest Path-based Superpixels for Accurate Segmentation of Spherical Images ( http://arxiv.org/abs/2004.07394v3 )

ライセンス: Link先を確認

R\'emi Giraud, Rodrigo Borba Pinheiro, Yannick Berthoumieu

(参考訳) 既存のスーパーピクセル法のほとんどは、コンピュータビジョンパイプラインのプリプロセスとして標準平面画像を分割するために設計されている。それでも、主に360{\deg}球面画像を生成する広角キャプチャデバイスに基づくアプリケーションの増加は、専用のスーパーピクセルアプローチの必要性を強制している。本稿では,SphSPS(Spherical Shortest Path-based Superpixels)と呼ばれる,球面画像の新しいスーパーピクセル法を提案する。我々のアプローチは球面幾何学を尊重し、3次元球面取得空間上の画素とスーパーピクセル中心の間の最短経路の概念を一般化する。このようなパスの特徴情報をクラスタリングフレームワークに効率的に統合でき、オブジェクトの輪郭や形状の規則性に関して共同で改善できることを示します。球面空間におけるこの最後の側面を相対的に評価するために、平面大域正規度計量も一般化する。最後に,提案手法は,360{\deg}球面パノラマセグメンテーションデータセットにおける平面および近年の球面スーパーピクセルアプローチよりもかなり優れた性能が得られることを示す。

Most of existing superpixel methods are designed to segment standard planar images as pre-processing for computer vision pipelines. Nevertheless, the increasing number of applications based on wide angle capture devices, mainly generating 360{\deg} spherical images, have enforced the need for dedicated superpixel approaches. In this paper, we introduce a new superpixel method for spherical images called SphSPS (for Spherical Shortest Path-based Superpixels). Our approach respects the spherical geometry and generalizes the notion of shortest path between a pixel and a superpixel center on the 3D spherical acquisition space. We show that the feature information on such path can be efficiently integrated into our clustering framework and jointly improves the respect of object contours and the shape regularity. To relevantly evaluate this last aspect in the spherical space, we also generalize a planar global regularity metric. Finally, the proposed SphSPS method obtains significantly better performance than both planar and recent spherical superpixel approaches on the reference 360{\deg} spherical panorama segmentation dataset.

翻訳日:2022-12-13 03:48:44 公開日:2020-07-14

# PolyLaneNet: 深い多項式回帰によるレーン推定

PolyLaneNet: Lane Estimation via Deep Polynomial Regression ( http://arxiv.org/abs/2004.10924v2 )

ライセンス: Link先を確認

Lucas Tabelini, Rodrigo Berriel, Thiago M. Paix\~ao, Claudine Badue, Alberto F. De Souza and Thiago Oliveira-Santos

(参考訳) 自動運転の大きな進歩に貢献した主な要因の1つは、ディープラーニングの出現である。安全な自動運転車にとって、まだ完全に解決されていない問題の1つは車線検出だ。このタスクのメソッドはリアルタイム(+30fps)で動作する必要があるため、効果的(すなわち高い精度)でなければならないだけでなく、効率的(すなわち高速)でなければならない。本研究では,車両に搭載された前方カメラからのイメージを入力として用いて,画像中の各レーンマーキングを表す多項式を深い多項式回帰により出力するレーン検出手法を提案する。提案手法は,tusimpleデータセットの効率(115fps)を維持しつつ,既存の最先端手法と競合することが示されている。さらに、さらに2つの公開データセットに関する広範な質的結果と、近年のレーン検出で使用された評価指標の制限が提示されている。最後に、私たちはソースコードとトレーニングされたモデルを提供して、他の人が本論文で示したすべての結果を再現できるようにします。ソースコードと事前訓練済みのモデルはhttps://github.com/lucastabelini/PolyLaneNet.comで入手できる。

One of the main factors that contributed to the large advances in autonomous driving is the advent of deep learning. For safer self-driving vehicles, one of the problems that has yet to be solved completely is lane detection. Since methods for this task have to work in real-time (+30 FPS), they not only have to be effective (i.e., have high accuracy) but they also have to be efficient (i.e., fast). In this work, we present a novel method for lane detection that uses as input an image from a forward-looking camera mounted in the vehicle and outputs polynomials representing each lane marking in the image, via deep polynomial regression. The proposed method is shown to be competitive with existing state-of-the-art methods in the TuSimple dataset while maintaining its efficiency (115 FPS). Additionally, extensive qualitative results on two additional public datasets are presented, alongside with limitations in the evaluation metrics used by recent works for lane detection. Finally, we provide source code and trained models that allow others to replicate all the results shown in this paper, which is surprisingly rare in state-of-the-art lane detection methods. The full source code and pretrained models are available at https://github.com/lucastabelini/PolyLaneNet.

翻訳日:2022-12-10 09:48:50 公開日:2020-07-14

# ミニマックスレイテンシを用いたマルチロボットパトロールスケジューリングの近似アルゴリズム

Approximation Algorithms for Multi-Robot Patrol-Scheduling with Min-Max Latency ( http://arxiv.org/abs/2005.02530v3 )

ライセンス: Link先を確認

Peyman Afshani, Mark De Berg, Kevin Buchin, Jie Gao, Maarten Loffler, Amir Nayyeri, Benjamin Raichel, Rik Sarkar, Haotian Wang, Hao-Tsung Yang

(参考訳) 我々は、メートル法空間内の所定の$n$のサイトを訪れるために、$k$ロボットのパトロールスケジュールを見つける問題を考える。各ロボットは、同じ最大速度を持ち、そのサイトの連続訪問間の最大時間と定義されている任意のサイトの重み付けされた最大レイテンシを最小限にすることを目的としている。問題はnp-hardであり、旅行セールスマンの問題を特別なケースとして抱えている($k=1$、すべてのサイトが同じ重量を持つ場合)。近似係数が o(k^2 \log \frac{w_{\max}}{w_{\min}}) の多項式時間アルゴリズムを最適解に与え、そこではそれぞれ $w_{\max}$ と $w_{\min}$ が各サイトの最大重みと最小重みである。さらに, サイトが1次元の特別な場合についても検討する。すべてのサイトが同じ重みを持つ場合、問題を正確に解くために多項式時間アルゴリズムを示す。サイトが異なる重みを持つ場合、12ドル程度の解が提示され、ロボットの数であるk$が一定である場合、多項式時間で実行される。

We consider the problem of finding patrol schedules for $k$ robots to visit a given set of $n$ sites in a metric space. Each robot has the same maximum speed and the goal is to minimize the weighted maximum latency of any site, where the latency of a site is defined as the maximum time duration between consecutive visits of that site. The problem is NP-hard, as it has the traveling salesman problem as a special case (when $k=1$ and all sites have the same weight). We present a polynomial-time algorithm with an approximation factor of $O(k^2 \log \frac{w_{\max}}{w_{\min}})$ to the optimal solution, where $w_{\max}$ and $w_{\min}$ are the maximum and minimum weight of the sites respectively. Further, we consider the special case where the sites are in 1D. When all sites have the same weight, we present a polynomial-time algorithm to solve the problem exactly. If the sites may have different weights, we present a $12$-approximate solution, which runs in polynomial time when the number of robots, $k$, is a constant.

翻訳日:2022-12-06 14:33:53 公開日:2020-07-14

# ウェーブレット統合CNNによるノイズ・ロバスト画像分類

Wavelet Integrated CNNs for Noise-Robust Image Classification ( http://arxiv.org/abs/2005.03337v2 )

ライセンス: Link先を確認

Qiufu Li, Linlin Shen, Sheng Guo, Zhihui Lai

(参考訳) 畳み込みニューラルネットワーク(CNN)は、一般的にノイズの中断、すなわち小さな画像ノイズが出力に劇的な変化を引き起こす。最終述語に対するノイズ効果を抑制するために,max-pooling, strided-convolution, average-poolingを離散ウェーブレット変換(dwt)に置き換え,cnnの強化を行う。本稿では,haar,daubechies,cohenなど様々なウェーブレットに適用可能な一般dwtおよび逆dwt(idwt)層と,これらの層を画像分類に用いるウェーブレット統合cnn(wavecnets)の設計について述べる。ウェーブネットでは、ダウンサンプリング中に特徴マップを低周波および高周波成分に分解する。低周波成分は、基本オブジェクト構造を含む主情報を格納し、後続の層に送信してロバストな高レベル特徴を抽出する。データノイズのほとんどを含む高周波成分を推論中に落とし、ウェーブネットのノイズロバスト性を改善する。 imagenet と imagenet-c (imagenet のノイズバージョン) による実験の結果,wavecnets は vgg, resnet, densenet の統合バージョンであり,バニラ版よりも高い精度とノイズロバスト性を実現していることがわかった。

Convolutional Neural Networks (CNNs) are generally prone to noise interruptions, i.e., small image noise can cause drastic changes in the output. To suppress the noise effect to the final predication, we enhance CNNs by replacing max-pooling, strided-convolution, and average-pooling with Discrete Wavelet Transform (DWT). We present general DWT and Inverse DWT (IDWT) layers applicable to various wavelets like Haar, Daubechies, and Cohen, etc., and design wavelet integrated CNNs (WaveCNets) using these layers for image classification. In WaveCNets, feature maps are decomposed into the low-frequency and high-frequency components during the down-sampling. The low-frequency component stores main information including the basic object structures, which is transmitted into the subsequent layers to extract robust high-level features. The high-frequency components, containing most of the data noise, are dropped during inference to improve the noise-robustness of the WaveCNets. Our experimental results on ImageNet and ImageNet-C (the noisy version of ImageNet) show that WaveCNets, the wavelet integrated versions of VGG, ResNets, and DenseNet, achieve higher accuracy and better noise-robustness than their vanilla versions.

翻訳日:2022-12-05 23:33:33 公開日:2020-07-14

# 反復的信頼フィードバックとガイドアップサンプリングによる高分解能イメージパインティング

High-Resolution Image Inpainting with Iterative Confidence Feedback and Guided Upsampling ( http://arxiv.org/abs/2005.11742v2 )

ライセンス: Link先を確認

Yu Zeng, Zhe Lin, Jimei Yang, Jianming Zhang, Eli Shechtman, Huchuan Lu

(参考訳) 既存の画像塗装法は、実アプリケーションで大きな穴を扱う際に、しばしばアーティファクトを生成する。この課題に対処するため,フィードバック機構を備えた反復的塗装法を提案する。具体的には, 暗黙の結果だけでなく, 対応する信頼度マップも出力する深層生成モデルを導入する。このマップをフィードバックとして使用すると、各イテレーションでホール内の高信頼画素のみを信頼して、次のイテレーションで残るピクセルにフォーカスすることで、徐々に穴を埋める。前回のイテレーションからの部分的な予測を既知のピクセルとして再利用することで、このプロセスは徐々に結果を改善する。また,高分解能インペインティング結果を生成するための誘導型アップサンプリングネットワークを提案する。我々は、Contextual Attentionモジュールを拡張して、入力画像の高解像度な特徴パッチを借用する。さらに,実際のオブジェクト除去シナリオを模倣するために,大規模なオブジェクトマスクデータセットを収集し,ユーザ入力をシミュレートするより現実的なトレーニングデータを合成する。実験により,本手法は定量評価と定性評価の両方において既存手法よりも有意に優れていた。さらなる結果とWeb APPはhttps://zengxianyu.github.io/iic.comで入手できる。

Existing image inpainting methods often produce artifacts when dealing with large holes in real applications. To address this challenge, we propose an iterative inpainting method with a feedback mechanism. Specifically, we introduce a deep generative model which not only outputs an inpainting result but also a corresponding confidence map. Using this map as feedback, it progressively fills the hole by trusting only high-confidence pixels inside the hole at each iteration and focuses on the remaining pixels in the next iteration. As it reuses partial predictions from the previous iterations as known pixels, this process gradually improves the result. In addition, we propose a guided upsampling network to enable generation of high-resolution inpainting results. We achieve this by extending the Contextual Attention module to borrow high-resolution feature patches in the input image. Furthermore, to mimic real object removal scenarios, we collect a large object mask dataset and synthesize more realistic training data that better simulates user inputs. Experiments show that our method significantly outperforms existing methods in both quantitative and qualitative evaluations. More results and Web APP are available at https://zengxianyu.github.io/iic.

翻訳日:2022-11-29 14:00:44 公開日:2020-07-14

# M2Net:脳腫瘍患者の生存時間予測のためのマルチモーダルマルチチャネルネットワーク

M2Net: Multi-modal Multi-channel Network for Overall Survival Time Prediction of Brain Tumor Patients ( http://arxiv.org/abs/2006.10135v2 )

ライセンス: Link先を確認

Tao Zhou, Huazhu Fu, Yu Zhang, Changqing Zhang, Xiankai Lu, Jianbing Shen, and Ling Shao

(参考訳) 総生存時間(os)の早期かつ正確な予測は、脳腫瘍患者のより良い治療計画を得るのに役立つ。多くのOS時間予測手法が開発され、有望な結果が得られたが、まだいくつか問題がある。第1に、従来の予測方法は、磁気共鳴(mr)ボリュームの局所病変領域における放射線学的特徴に依存しており、完全な画像や複雑な腫瘍のモデルを表すものではない。第二に、異なるタイプのスキャナー(つまりマルチモーダルデータ)は異なる脳領域に敏感であり、複数のモーダルにまたがる補完情報を効果的に活用し、モダリティ固有の特性を維持することが困難である。第三に、既存の手法は予測モデルに焦点を合わせ、複雑なデータ-ラベル関係を無視している。上記の問題に対処するため,マルチモーダルマルチチャネルネットワーク (M2Net) のエンドツーエンドOS時間予測モデルを提案する。具体的には、まず3dmrボリュームを異なる方向の2d画像に投影し、計算コストを低減し、重要な情報を保存し、事前学習したモデルを他のタスクから転送できるようにする。次に,モダリティ特有のネットワークを用いて,mrスキャンから暗黙的かつ高レベルな特徴を抽出する。マルチモーダル共有ネットワークは、これらの機能をバイリニアプーリングモデルを用いて融合させ、それらの相関を利用して補完情報を提供する。最後に、各モダリティ固有ネットワークとマルチモーダル共有ネットワークからの出力を統合し、最終的な予測結果を生成する。 M2Netモデルが他の手法よりも優れていることを示す実験結果を得た。

Early and accurate prediction of overall survival (OS) time can help to obtain better treatment planning for brain tumor patients. Although many OS time prediction methods have been developed and obtain promising results, there are still several issues. First, conventional prediction methods rely on radiomic features at the local lesion area of a magnetic resonance (MR) volume, which may not represent the full image or model complex tumor patterns. Second, different types of scanners (i.e., multi-modal data) are sensitive to different brain regions, which makes it challenging to effectively exploit the complementary information across multiple modalities and also preserve the modality-specific properties. Third, existing methods focus on prediction models, ignoring complex data-to-label relationships. To address the above issues, we propose an end-to-end OS time prediction model; namely, Multi-modal Multi-channel Network (M2Net). Specifically, we first project the 3D MR volume onto 2D images in different directions, which reduces computational costs, while preserving important information and enabling pre-trained models to be transferred from other tasks. Then, we use a modality-specific network to extract implicit and high-level features from different MR scans. A multi-modal shared network is built to fuse these features using a bilinear pooling model, exploiting their correlations to provide complementary information. Finally, we integrate the outputs from each modality-specific network and the multi-modal shared network to generate the final prediction result. Experimental results demonstrate the superiority of our M2Net model over other methods.

翻訳日:2022-11-26 06:58:50 公開日:2020-07-14

# 人間行動を持つ超人的AI:モデルシステムとしてのチェス

Aligning Superhuman AI with Human Behavior: Chess as a Model System ( http://arxiv.org/abs/2006.01855v3 )

ライセンス: Link先を確認

Reid McIlroy-Young and Siddhartha Sen and Jon Kleinberg and Ashton Anderson

(参考訳) 人工知能がますます知性が高まる中、スーパーヒューマンのパフォーマンスを達成することは、人間がアルゴリズムから学び、協力する可能性を高めている。しかし、AIシステムが問題にアプローチする方法は、人々が行う方法としばしば異なるため、解釈不能であり、そこから学ぶのが困難である。人間と人工知能の間のこのギャップを埋める上で重要なステップは、人間の行動を構成する粒度の大きいアクションをモデリングすることだ。我々は、人工知能の長い歴史を持つモデルシステム、チェスでこの目標を追求する。チェス選手の総合的なパフォーマンスは、ゲーム中に決定をするときに展開します。あらゆるスキルレベルでプレイヤーがオンラインでプレイする数億のゲームは、これらの決定とその正確な文脈が詳細に記録される豊富なデータソースを形成する。 AlphaZeroのオープンソース実装を含む既存のチェスエンジンをこのデータに適用すると、人間の動きをうまく予測できないことが分かる。人間のチェスゲームで訓練されたAlpha-Zeroのカスタマイズ版であるMaiaを開発し,既存のエンジンよりもはるかに高い精度で人間の動きを予測し,特定のスキルレベルでのプレイヤーによる決定を調整可能な方法で予測する際の最大精度を実現する。人間が次の動きで大きな間違いを犯すかどうかを予測する2つのタスクに対して、我々は、競争ベースラインを大幅に上回るディープニューラルネットワークを開発する。その結果,まず人間の意思決定を正確にモデル化することで,人間のコラボレーションを念頭に置いて人工知能システムを設計することができる可能性が示唆された。

As artificial intelligence becomes increasingly intelligent---in some cases, achieving superhuman performance---there is growing potential for humans to learn from and collaborate with algorithms. However, the ways in which AI systems approach problems are often different from the ways people do, and thus may be uninterpretable and hard to learn from. A crucial step in bridging this gap between human and artificial intelligence is modeling the granular actions that constitute human behavior, rather than simply matching aggregate human performance. We pursue this goal in a model system with a long history in artificial intelligence: chess. The aggregate performance of a chess player unfolds as they make decisions over the course of a game. The hundreds of millions of games played online by players at every skill level form a rich source of data in which these decisions, and their exact context, are recorded in minute detail. Applying existing chess engines to this data, including an open-source implementation of AlphaZero, we find that they do not predict human moves well. We develop and introduce Maia, a customized version of Alpha-Zero trained on human chess games, that predicts human moves at a much higher accuracy than existing engines, and can achieve maximum accuracy when predicting decisions made by players at a specific skill level in a tuneable way. For a dual task of predicting whether a human will make a large mistake on the next move, we develop a deep neural network that significantly outperforms competitive baselines. Taken together, our results suggest that there is substantial promise in designing artificial intelligence systems with human collaboration in mind by first accurately modeling granular human decision-making.

翻訳日:2022-11-25 23:43:39 公開日:2020-07-14

# 参照誘導顔成分編集

Reference-guided Face Component Editing ( http://arxiv.org/abs/2006.02051v2 )

ライセンス: Link先を確認

Qiyao Deng, Jie Cao, Yunfan Liu, Zhenhua Chai, Qi Li and Zhenan Sun

(参考訳) 近年,顔画像の編集は大きな進歩を遂げている。しかし以前の方法も 1)顔の特徴を事前に定義し、高レベルの顔成分(目、鼻、口など)の形状を制御する柔軟性に欠ける。 2)手作業で編集したマスクやスケッチをオブザーバブルな変更の中間表現として取り出すが、このような追加入力は通常、追加の労力を要する。既存の手法の限界(形状、マスク、スケッチなど)を断ち切るため、幾何学的変化を伴う多様かつ制御可能な顔コンポーネント編集のための r-FACE (Reference-guided FAce Component Editing) と呼ばれる新しいフレームワークを提案する。特に、r-faceは、顔成分の形状を制御するための条件として参照画像を利用するバックボーンとしてイメージインペインティングモデルを取る。フレームワークが対象の顔コンポーネントに集中するよう促すために、サンプルガイドアテンションモジュールは、参照画像から抽出された注意特徴と対象顔コンポーネント特徴とを融合するように設計されている。実験的な検証と比較を通じて,提案手法の有効性を検証した。

Face portrait editing has achieved great progress in recent years. However, previous methods either 1) operate on pre-defined face attributes, lacking the flexibility of controlling shapes of high-level semantic facial components (e.g., eyes, nose, mouth), or 2) take manually edited mask or sketch as an intermediate representation for observable changes, but such additional input usually requires extra efforts to obtain. To break the limitations (e.g. shape, mask or sketch) of the existing methods, we propose a novel framework termed r-FACE (Reference-guided FAce Component Editing) for diverse and controllable face component editing with geometric changes. Specifically, r-FACE takes an image inpainting model as the backbone, utilizing reference images as conditions for controlling the shape of face components. In order to encourage the framework to concentrate on the target face components, an example-guided attention module is designed to fuse attention features and the target face component features extracted from the reference image. Through extensive experimental validation and comparisons, we verify the effectiveness of the proposed framework.

翻訳日:2022-11-25 18:13:35 公開日:2020-07-14

# 深層学習時代の創発的マルチエージェントコミュニケーション

Emergent Multi-Agent Communication in the Deep Learning Era ( http://arxiv.org/abs/2006.02419v2 )

ライセンス: Link先を確認

Angeliki Lazaridou, Marco Baroni

(参考訳) 言語を通して協力する能力は、人間の定義的な特徴である。深層人工ネットワークの知覚、運動、計画能力が増大するにつれて、研究者は相互作用するための共有言語の開発も可能であるかどうかの研究を行っている。科学的観点から、深いエージェントのコミュニティで言語が進化する条件を理解することは、人間の言語進化に光を当てることができる。応用の観点からは、ディープネットワークに相互通信によって対話的に問題を解決する能力を持たせることで、日々の生活においてより柔軟で有用なものにすることができる。本稿では,この2つの角度から最近の言語出現研究について概説する。

The ability to cooperate through language is a defining feature of humans. As the perceptual, motory and planning capabilities of deep artificial networks increase, researchers are studying whether they also can develop a shared language to interact. From a scientific perspective, understanding the conditions under which language evolves in communities of deep agents and its emergent features can shed light on human language evolution. From an applied perspective, endowing deep networks with the ability to solve problems interactively by communicating with each other and with us should make them more flexible and useful in everyday life. This article surveys representative recent language emergence studies from both of these two angles.

翻訳日:2022-11-25 17:19:15 公開日:2020-07-14

# Cascaded Opponent Filter Network を用いた視覚誘導音源分離

Visually Guided Sound Source Separation using Cascaded Opponent Filter Network ( http://arxiv.org/abs/2006.03028v2 )

ライセンス: Link先を確認

Lingyu Zhu, Esa Rahtu

(参考訳) 本研究の目的は、音源の視覚的な手がかりの助けを借りて、混合音声から元の成分信号を回収することである。このようなタスクは通常、視覚的に誘導された音源分離と呼ばれる。提案するCascaded Opponent Filter (COF) フレームワークは複数のステージで構成され,ソース分離を再帰的に洗練する。 COFのキー要素は、ソース間の残留成分を識別し、再配置する新しい反対フィルタモジュールである。本システムでは,映像フレーム,光学フロー,動画像,それらの組み合わせに基づいて,映像の出現と動きをガイドし,様々な表現について検討する。最後に, cofと共に音源位置の画素レベルマスクを生成する, 音源位置マスキング(sslm)手法を提案する。システム全体は、大量のビデオを使ってエンドツーエンドに訓練されている。我々はCOFを最近のベースラインと比較し、3つの挑戦的データセット(MUSIC、A-MUSIC、A-NATURAL)で最先端のパフォーマンスを得る。プロジェクトページ: https://ly-zhu.github.io/cof-net

The objective of this paper is to recover the original component signals from a mixture audio with the aid of visual cues of the sound sources. Such task is usually referred as visually guided sound source separation. The proposed Cascaded Opponent Filter (COF) framework consists of multiple stages, which recursively refine the source separation. A key element in COF is a novel opponent filter module that identifies and relocates residual components between sources. The system is guided by the appearance and motion of the source, and, for this purpose, we study different representations based on video frames, optical flows, dynamic images, and their combinations. Finally, we propose a Sound Source Location Masking (SSLM) technique, which, together with COF, produces a pixel level mask of the source location. The entire system is trained end-to-end using a large set of unlabelled videos. We compare COF with recent baselines and obtain the state-of-the-art performance in three challenging datasets (MUSIC, A-MUSIC, and A-NATURAL). Project page: https://ly-zhu.github.io/cof-net.

翻訳日:2022-11-25 12:43:18 公開日:2020-07-14

# 不均一自律システムを用いた教師なし異常検出

Unsupervised Abnormality Detection Using Heterogeneous Autonomous Systems ( http://arxiv.org/abs/2006.03733v2 )

ライセンス: Link先を確認

Sayeed Shafayet Chowdhury, Kazi Mejbaul Islam and Rouhan Noor

(参考訳) 監視シナリオにおける異常検出(ad)は、新しくて困難な研究分野である。ドローンや自動車のような自動運転車にとって、通常の状態と異常状態をリアルタイムで区別することが極めて重要である。さらに、いかなるデバイス故障も検出する必要があります。しかし、その性質や異常の程度は実際の環境や逆境によって異なる可能性がある。結果として、すべてのケースa-prioriをモデル化し、教師付きメソッドを使用して分類することは非現実的である。また、自動運転車は、画像やその他のアナログまたはデジタルセンサーデータなどのさまざまなデータタイプを提供しており、これら全てを実効的に活用すれば、異常検出に役立てることができる。そこで本研究では,無人監視ドローンの異常度を推定し,リアルタイム画像とIMUセンサデータを教師なしで解析する異種システムを提案する。本稿では,convolutional neural network (cnn) アーキテクチャを実演し,通常の画像と検討中の画像との角度を推定し,デバイス異常の計測を行う。さらに、IMUデータはオートエンコーダで異常を予測するために使用される。最後に、これら2つのアルゴリズムの結果をアンサンブルして、最終異常度を推定する。提案手法は, IEEE SP Cup-2020データセットで97.3%の精度で良好に動作する。さらに、このアプローチを社内データセットでテストして、堅牢性を確認しました。

Anomaly detection (AD) in a surveillance scenario is an emerging and challenging field of research. For autonomous vehicles like drones or cars, it is immensely important to distinguish between normal and abnormal states in real-time. Additionally, we also need to detect any device malfunction. But the nature and degree of abnormality may vary depending upon the actual environment and adversary. As a result, it is impractical to model all cases a-priori and use supervised methods to classify. Also, an autonomous vehicle provides various data types like images and other analog or digital sensor data, all of which can be useful in anomaly detection if leveraged fruitfully. To that effect, in this paper, a heterogeneous system is proposed which estimates the degree of abnormality of an unmanned surveillance drone, analyzing real-time image and IMU (Inertial Measurement Unit) sensor data in an unsupervised manner. Here, we have demonstrated a Convolutional Neural Network (CNN) architecture, named AngleNet to estimate the angle between a normal image and another image under consideration, which provides us with a measure of anomaly of the device. Moreover, the IMU data are used in autoencoder to predict abnormality. Finally, the results from these two algorithms are ensembled to estimate the final degree of abnormality. The proposed method performs satisfactorily on the IEEE SP Cup-2020 dataset with an accuracy of 97.3%. Additionally, we have also tested this approach on an in-house dataset to validate its robustness.

翻訳日:2022-11-25 03:42:57 公開日:2020-07-14

# sigmorphon 2020 タスク0: タイプ論的に多様な形態変化

SIGMORPHON 2020 Shared Task 0: Typologically Diverse Morphological Inflection ( http://arxiv.org/abs/2006.11572v2 )

ライセンス: Link先を確認

Ekaterina Vylomova, Jennifer White, Elizabeth Salesky, Sabrina J. Mielke, Shijie Wu, Edoardo Ponti, Rowan Hall Maudslay, Ran Zmigrod, Josef Valvoda, Svetlana Toldova, Francis Tyers, Elena Klyachko, Ilya Yegorov, Natalia Krizhanovsky, Paula Czarnowska, Irene Nikkarinen, Andrew Krizhanovsky, Tiago Pimentel, Lucas Torroba Hennigen, Christo Kirov, Garrett Nicolai, Adina Williams, Antonios Anastasopoulos, Hilaria Cruz, Eleanor Chodroff, Ryan Cotterell, Miikka Silfverberg, Mans Hulden

(参考訳) 自然言語処理(nlp)の幅広い目標は、任意の自然言語を処理する能力を持つシステムを開発することである。しかし、ほとんどのシステムは英語のような1つの言語からのデータを使って開発されている。 sigmorphon 2020では、形態学的再帰に関する共通タスクが、タイプ論的に異なる言語を一般化するシステムの能力を調査することを目的としている。システムは45言語と5つの言語ファミリーのデータを使用して開発され、追加の45言語と10の言語ファミリー(合計13言語)のデータで微調整され、90言語すべてで評価された。タスクには10チームから合計22のシステム(19のニューラル)が提出された。 4つの勝利システムはすべてニューラルネットワーク(単言語トランスフォーマー2台と多言語rnnベースのモデル2台)であった。ほとんどのチームは、低リソース言語のためのデータ幻覚と拡張、アンサンブル、多言語トレーニングの有用性を示しています。非神経学習者や手動で設計した文法は、Ingrian, Tajik, Tagalog, Zarma, Lingalaなど一部の言語で特に限られたデータで、競争力があり、優れた性能を示した。一部の言語ファミリー(afro-asiatic、niger-congo、turkic)は、ほとんどのシステムで比較的簡単であり、90%以上の精度を達成したが、他の言語はより困難であった。

A broad goal in natural language processing (NLP) is to develop a system that has the capacity to process any natural language. Most systems, however, are developed using data from just one language such as English. The SIGMORPHON 2020 shared task on morphological reinflection aims to investigate systems' ability to generalize across typologically distinct languages, many of which are low resource. Systems were developed using data from 45 languages and just 5 language families, fine-tuned with data from an additional 45 languages and 10 language families (13 in total), and evaluated on all 90 languages. A total of 22 systems (19 neural) from 10 teams were submitted to the task. All four winning systems were neural (two monolingual transformers and two massively multilingual RNN-based models with gated attention). Most teams demonstrate utility of data hallucination and augmentation, ensembles, and multilingual training for low-resource languages. Non-neural learners and manually designed grammars showed competitive and even superior performance on some languages (such as Ingrian, Tajik, Tagalog, Zarma, Lingala), especially with very limited data. Some language families (Afro-Asiatic, Niger-Congo, Turkic) were relatively easy for most systems and achieved over 90% mean accuracy while others were more challenging.

翻訳日:2022-11-18 22:47:08 公開日:2020-07-14

# セルフプレイによる近最適強化学習

Near-Optimal Reinforcement Learning with Self-Play ( http://arxiv.org/abs/2006.12007v2 )

ライセンス: Link先を確認

Yu Bai, Chi Jin, Tiancheng Yu

(参考訳) 本稿では,2プレイヤーゼロサムゲームにおける強化学習のための最適アルゴリズムの設計問題について考察する。我々は,直接の監督なしに自己対決で最適な政策を学ぶセルフプレイアルゴリズムに焦点を当てる。 s$状態、$a$ max-playerアクション、$b$ min-playerアクションを持つ表型エピソディックマルコフゲームでは、近似ナッシュ均衡を見つけるための最良の既存のアルゴリズムは、$(s,a,b)$にのみ依存を強調するとき、ゲームプレイのステップである$\tilde{\mathcal{o}}(s^2ab)$を必要とする。対照的に、最も高い既存の下界スケールは$\Omega(S(A+B))$で、上界と大きな差がある。本稿では, サンプル複雑性を$\tilde{\mathcal{O}}(SAB)$, サンプル複雑性を$\tilde{\mathcal{O}}(S(A+B)$とする新しい \emph{Nash V-learning} アルゴリズムの楽観的な変種を提案する。後者の結果は、各エピソードの長さの多項式係数を除く全ての問題依存パラメータにおける情報理論の下限と一致する。さらに,マルコフゲームにおける固定対戦相手に対する最善の応答を学習する計算の難易度を,nash平衡を求めることとは異なる学習目標として提示する。

This paper considers the problem of designing optimal algorithms for reinforcement learning in two-player zero-sum games. We focus on self-play algorithms which learn the optimal policy by playing against itself without any direct supervision. In a tabular episodic Markov game with $S$ states, $A$ max-player actions and $B$ min-player actions, the best existing algorithm for finding an approximate Nash equilibrium requires $\tilde{\mathcal{O}}(S^2AB)$ steps of game playing, when only highlighting the dependency on $(S,A,B)$. In contrast, the best existing lower bound scales as $\Omega(S(A+B))$ and has a significant gap from the upper bound. This paper closes this gap for the first time: we propose an optimistic variant of the \emph{Nash Q-learning} algorithm with sample complexity $\tilde{\mathcal{O}}(SAB)$, and a new \emph{Nash V-learning} algorithm with sample complexity $\tilde{\mathcal{O}}(S(A+B))$. The latter result matches the information-theoretic lower bound in all problem-dependent parameters except for a polynomial factor of the length of each episode. In addition, we present a computational hardness result for learning the best responses against a fixed opponent in Markov games---a learning objective different from finding the Nash equilibrium.

翻訳日:2022-11-18 04:17:38 公開日:2020-07-14

# 超低消費電力FDSOIニューラル回路による極端ニューロモーフィックインテリジェンス

Ultra-Low-Power FDSOI Neural Circuits for Extreme-Edge Neuromorphic Intelligence ( http://arxiv.org/abs/2006.14270v2 )

ライセンス: Link先を確認

Arianna Rubino, Can Livanelioglu, Ning Qiao, Melika Payvand, and Giacomo Indiveri

(参考訳) 近年、エッジコンピューティングアプリケーションのための人工知能回路やシステムの開発への関心が高まっている。インメモリコンピューティング混合信号ニューロモルフィックアーキテクチャは、スパイクニューラルネットワークをリアルタイムでエミュレートする能力のおかげで、エッジコンピューティングのセンサー処理アプリケーションに有望な超低消費電力ソリューションを提供する。このアプローチによって提供される微粒な並列性は、フォン・ノイマンアーキテクチャの時間多重計算パラダイムに頼ることなく、知覚された信号にそれらのダイナミクスを適用することによって、知覚データを効率的に処理することができる。さらに電力消費を低減するため、FDSOI(Fully-Depleted Silicon on Insulator)統合プロセスの特徴を利用した混合信号アナログ/デジタル回路を提案する。具体的には,アナログ設計問題に対処し,シナプスインテグレータの設計と適応ニューロン回路の設計を最適化するためのfdsoi技術の選択肢を検討する。本稿では,回路シミュレーションの結果を示し,小型設計による生物学的に妥当な神経動力学を作製する能力を示し,ニューロモルフィックプロセッサにおける大規模スパイクニューラルネットワークの実現に最適化した。

Recent years have seen an increasing interest in the development of artificial intelligence circuits and systems for edge computing applications. In-memory computing mixed-signal neuromorphic architectures provide promising ultra-low-power solutions for edge-computing sensory-processing applications, thanks to their ability to emulate spiking neural networks in real-time. The fine-grain parallelism offered by this approach allows such neural circuits to process the sensory data efficiently by adapting their dynamics to the ones of the sensed signals, without having to resort to the time-multiplexed computing paradigm of von Neumann architectures. To reduce power consumption even further, we present a set of mixed-signal analog/digital circuits that exploit the features of advanced Fully-Depleted Silicon on Insulator (FDSOI) integration processes. Specifically, we explore the options of advanced FDSOI technologies to address analog design issues and optimize the design of the synapse integrator and of the adaptive neuron circuits accordingly. We present circuit simulation results and demonstrate the circuit's ability to produce biologically plausible neural dynamics with compact designs, optimized for the realization of large-scale spiking neural networks in neuromorphic processors.

翻訳日:2022-11-17 04:06:28 公開日:2020-07-14

# MvMM-RegNet:多変量混合モデルとニューラルネットワーク推定に基づく新しい画像登録フレームワーク

MvMM-RegNet: A new image registration framework based on multivariate mixture model and neural network estimation ( http://arxiv.org/abs/2006.15573v2 )

ライセンス: Link先を確認

Xinzhe Luo and Xiahai Zhuang

(参考訳) 現在のディープラーニングベースの登録アルゴリズムは、トレーニング中のバックプロパゲーションによって、一対の移動画像と固定画像との密接な対応を最適化するロス関数として、強度に基づく類似度尺度を利用することが多い。しかし、強度に基づくメトリクスは、特にクロスモダリティやコントラスト強調画像において、強度クラス対応の仮定に違反する場合、誤解を招くことがある。また、既存の学習に基づく登録方法は、ペアワイズ登録に主に適用され、グループ登録や複数画像の同時登録に拡張されることは稀である。本稿では,多変量混合モデル(MvMM)とニューラルネットワーク推定に基づく新しい画像登録フレームワークを提案する。外観と解剖情報を一体化した生成モデルを構築し、グループ登録が可能な新規な損失関数を導出する。本稿では,マルチモーダル心画像に対する各種応用について,ペアワイズ登録によるsas (single-atlas-based segmentation) やグループワイズ登録で統一されたmas (multi-atlas segmentation) など,汎用性について述べる。 MM-WHS-2017とMS-CMRSeg-2019の2つの公開データセットの性能評価を行った。以上の結果から, MR画像上でのDiceスコアは平均0.871 pm 0.025ドル, LGE MR画像上での心筋セグメンテーションは0.783 pm 0.082ドルであった。

Current deep-learning-based registration algorithms often exploit intensity-based similarity measures as the loss function, where dense correspondence between a pair of moving and fixed images is optimized through backpropagation during training. However, intensity-based metrics can be misleading when the assumption of intensity class correspondence is violated, especially in cross-modality or contrast-enhanced images. Moreover, existing learning-based registration methods are predominantly applicable to pairwise registration and are rarely extended to groupwise registration or simultaneous registration with multiple images. In this paper, we propose a new image registration framework based on multivariate mixture model (MvMM) and neural network estimation. A generative model consolidating both appearance and anatomical information is established to derive a novel loss function capable of implementing groupwise registration. We highlight the versatility of the proposed framework for various applications on multimodal cardiac images, including single-atlas-based segmentation (SAS) via pairwise registration and multi-atlas segmentation (MAS) unified by groupwise registration. We evaluated performance on two publicly available datasets, i.e. MM-WHS-2017 and MS-CMRSeg-2019. The results show that the proposed framework achieved an average Dice score of $0.871\pm 0.025$ for whole-heart segmentation on MR images and $0.783\pm 0.082$ for myocardium segmentation on LGE MR images.

翻訳日:2022-11-16 02:40:34 公開日:2020-07-14

# 線形サンプルの混合によるスパース信号の復元

Recovery of Sparse Signals from a Mixture of Linear Samples ( http://arxiv.org/abs/2006.16406v2 )

ライセンス: Link先を確認

Arya Mazumdar and Soumyabrata Pal

(参考訳) 線形回帰の混合は、異種データを表現するために広く使われる一般的な学習理論モデルである。最も単純な形式では、2つの異なる線形モデルからラベルが生成され、混合されると仮定する。 Yin et al.とKrishnamurthy et al., 2019の最近の研究は、この問題に対するモデルリカバリの実験的設計設定に焦点を当てている。それらの特徴を設計・クエリしてラベルを得ることが可能であると考えられる。クエリを行うと、オラクルは2つの異なるスパース線形モデルの1つをランダムに選択し、それに応じてラベルを生成する。両方のモデルを同時にリカバリするために、oracleのクエリはいくつ必要か? この問題は、よく知られた圧縮センシング問題の一般化とも考えられる(Cand\`es and Tao, 2005, Donoho, 2006)。本研究では,この問合せの複雑性問題に対処し,これまで最もよく知られた結果に基づいて効率的なアルゴリズムを提供する。

Mixture of linear regressions is a popular learning theoretic model that is used widely to represent heterogeneous data. In the simplest form, this model assumes that the labels are generated from either of two different linear models and mixed together. Recent works of Yin et al. and Krishnamurthy et al., 2019, focus on an experimental design setting of model recovery for this problem. It is assumed that the features can be designed and queried with to obtain their label. When queried, an oracle randomly selects one of the two different sparse linear models and generates a label accordingly. How many such oracle queries are needed to recover both of the models simultaneously? This question can also be thought of as a generalization of the well-known compressed sensing problem (Cand\`es and Tao, 2005, Donoho, 2006). In this work, we address this query complexity problem and provide efficient algorithms that improves on the previously best known results.

翻訳日:2022-11-15 14:31:05 公開日:2020-07-14

# B\'ezierSketch:スケーラブルなベクトルスケッチの生成モデル

B\'ezierSketch: A generative model for scalable vector sketches ( http://arxiv.org/abs/2007.02190v2 )

ライセンス: Link先を確認

Ayan Das, Yongxin Yang, Timothy Hospedales, Tao Xiang and Yi-Zhe Song

(参考訳) 人間のスケッチの神経生成モデルの研究は、スケッチ画像の生成と人間の描画過程の関係から、現代の興味深いモデリング問題となっている。ランドマークであるSketchRNNは、一連のウェイポイントとしてスケッチを逐次生成することでブレークスルーを提供した。しかし、これは低解像度の画像生成につながり、長いスケッチのモデル化に失敗する。本稿では,完全ベクトルスケッチのための新しい生成モデルであるB\'ezierSketchについて述べる。この目的のために、まず、エンコーダに各ストロークを最適なB'ezier曲線に埋め込むよう訓練する、ストローク埋め込みに対する新しい逆グラフアプローチを導入する。これにより、スケッチをパラマテライズドストロークの短いシーケンスとして扱うことができ、より長いスケッチの容量で再帰的なスケッチジェネレータを訓練でき、スケーラブルな高解像度な結果が得られる。我々はQuick, Draw!ベンチマークで定性的かつ定量的な結果を報告する。

The study of neural generative models of human sketches is a fascinating contemporary modeling problem due to the links between sketch image generation and the human drawing process. The landmark SketchRNN provided breakthrough by sequentially generating sketches as a sequence of waypoints. However this leads to low-resolution image generation, and failure to model long sketches. In this paper we present B\'ezierSketch, a novel generative model for fully vector sketches that are automatically scalable and high-resolution. To this end, we first introduce a novel inverse graphics approach to stroke embedding that trains an encoder to embed each stroke to its best fit B\'ezier curve. This enables us to treat sketches as short sequences of paramaterized strokes and thus train a recurrent sketch generator with greater capacity for longer sketches, while producing scalable high-resolution results. We report qualitative and quantitative results on the Quick, Draw! benchmark.

翻訳日:2022-11-13 13:30:29 公開日:2020-07-14

# medas: 医療とインフォマティクスの間の壁を壊すためのオープンソースのプラットフォーム・アズ・サービス

MeDaS: An open-source platform as service to help break the walls between medicine and informatics ( http://arxiv.org/abs/2007.06013v2 )

ライセンス: Link先を確認

Liang Zhang, Johann Li, Ping Li, Xiaoyuan Lu, Peiyi Shen, Guangming Zhu, Syed Afaq Shah, Mohammed Bennarmoun, Kun Qian, Bj\"orn W. Schuller

(参考訳) 過去10年間、ディープラーニング(DL)はコンピュータビジョン、自然言語処理、医療など多くの分野で前例のない成功を収めてきた。特にDLは, 分析, セグメンテーション, 分類などの観点から, 高度な医用画像解析への応用が進んでいる。一方,医学的,臨床的,情報学的な背景を持つ研究コミュニティから,医学的,知識的,スキル的,経験的知識を共同で共有するDLの力を活用した膨大なニーズが生まれている。一方で、規律間の障壁は、しばしばフルで効率的なコラボレーションを妨げるため、進行中です。この目的のために、私たちはMeDicalオープンソースプラットフォームであるMeDaSという新しいオープンソースプラットフォームを提案しています。私たちの知識を最大限に活用するために、MeDaSは、医学的背景から研究者が簡単にDL関連ツールキットを使って、共同で対話的なサービスを証明し、同時に情報科学の科学者やエンジニアが医療知識の側面を理解するための最初のオープンソースプラットフォームです。提案するMeDaSプラットフォームは,RINV(Rapid implementation aNd Verification)の考え方に基づく一連のツールキットとユーティリティに基づいて,医療画像解析に必要な前処理,後処理,拡張,可視化,その他のフェーズを実装できる。肺,肝臓,脳,胸部,病理などの5つの課題を検証し,MeDaSを用いて効率よく実現可能であることを実証した。

In the past decade, deep learning (DL) has achieved unprecedented success in numerous fields including computer vision, natural language processing, and healthcare. In particular, DL is experiencing an increasing development in applications for advanced medical image analysis in terms of analysis, segmentation, classification, and furthermore. On the one hand, tremendous needs that leverage the power of DL for medical image analysis are arising from the research community of a medical, clinical, and informatics background to jointly share their expertise, knowledge, skills, and experience. On the other hand, barriers between disciplines are on the road for them often hampering a full and efficient collaboration. To this end, we propose our novel open-source platform, i.e., MeDaS -- the MeDical open-source platform as Service. To the best of our knowledge, MeDaS is the first open-source platform proving a collaborative and interactive service for researchers from a medical background easily using DL related toolkits, and at the same time for scientists or engineers from information sciences to understand the medical knowledge side. Based on a series of toolkits and utilities from the idea of RINV (Rapid Implementation aNd Verification), our proposed MeDaS platform can implement pre-processing, post-processing, augmentation, visualization, and other phases needed in medical image analysis. Five tasks including the subjects of lung, liver, brain, chest, and pathology, are validated and demonstrated to be efficiently realisable by using MeDaS.

翻訳日:2022-11-11 06:04:29 公開日:2020-07-14

# 手術用ジェスチャー認識のための対称拡張畳み込み

Symmetric Dilated Convolution for Surgical Gesture Recognition ( http://arxiv.org/abs/2007.06373v2 )

ライセンス: Link先を確認

Jinglu Zhang, Yinyu Nie, Yao Lyu, Hailin Li, Jian Chang, Xiaosong Yang, Jian Jun Zhang

(参考訳) 自動手術ジェスチャー認識は術中コンピュータ支援と客観的手術スキル評価の前提条件である。以前の作業では、キネマティックなデータを集めるために追加のセンサーが必要か、長くて未撮影の手術ビデオから時間情報を取得することの制限が必要だった。これらの課題に対処するため,RGBビデオのみを用いて外科的ジェスチャーを自動的に検出・分節する新しい時間的畳み込みアーキテクチャを提案する。本手法は,長期の時間パターンを符号化・復号化するために,自己結合モジュールで橋渡しされた対称拡張構造を考案し,それに従ってフレーム間関係を確立する。 JIGSAWSデータセットからの基本的なロボット縫合作業におけるアプローチの有効性を検証する。実験の結果,F1@50スコア~6ポイントまでのフレーム単位の精度で,最先端の手法よりも優れる長期フレーム依存性の把握に本手法が有効であることが示された。

Automatic surgical gesture recognition is a prerequisite of intra-operative computer assistance and objective surgical skill assessment. Prior works either require additional sensors to collect kinematics data or have limitations on capturing temporal information from long and untrimmed surgical videos. To tackle these challenges, we propose a novel temporal convolutional architecture to automatically detect and segment surgical gestures with corresponding boundaries only using RGB videos. We devise our method with a symmetric dilation structure bridged by a self-attention module to encode and decode the long-term temporal patterns and establish the frame-to-frame relationship accordingly. We validate the effectiveness of our approach on a fundamental robotic suturing task from the JIGSAWS dataset. The experiment results demonstrate the ability of our method on capturing long-term frame dependencies, which largely outperform the state-of-the-art methods on the frame-wise accuracy up to ~6 points and the F1@50 score ~6 points.

翻訳日:2022-11-11 00:34:09 公開日:2020-07-14

# 限定スーパービジョンによるアクティブクラウドカウント

Active Crowd Counting with Limited Supervision ( http://arxiv.org/abs/2007.06334v2 )

ライセンス: Link先を確認

Zhen Zhao, Miaojing Shi, Xiaoxiao Zhao, Li Li

(参考訳) 群衆画像から信頼できる人々を知るには、通常、ヘッドセンターアノテーションが必要である。しかし、アノテーティング・センターは密集した群衆にとって退屈で退屈なプロセスである。本稿では,アノテートする画像をランダムに選択する代わりに,少量のラベリング予算が与えられた場合,まず,アノテートするデータセットの最も情報性の高い画像をアノテートするアクティブなラベリング戦略を導入し,その上にカウントモデルを学習する。このプロセスを繰り返して、各サイクルで、群衆密度が多様で、以前の選択と異なるサンプルを選択するようにします。ラベル付け予算が満たされた最後のサイクルでは、ラベル付きデータをラベル付きデータと整列する分布分類器を導入し、さらに、ネットワーク内の分散ラベルと遅延表現を混合して、特にトレーニングサンプル間の分散アライメントを改善することを提案する。群衆カウントのための一般的な密度推定パイプラインに従う。上海技術、UCF CC 50、MAll、TRANCOS、DCCといった標準ベンチマークで大規模な実験が行われる。限られた数の画像(例えばデータセットの10%)にアノテートすることで、データセットの完全なアノテーションを利用する技術の状態から遠く離れたレベルのパフォーマンスに達する。

To learn a reliable people counter from crowd images, head center annotations are normally required. Annotating head centers is however a laborious and tedious process in dense crowds. In this paper, we present an active learning framework which enables accurate crowd counting with limited supervision: given a small labeling budget, instead of randomly selecting images to annotate, we first introduce an active labeling strategy to annotate the most informative images in the dataset and learn the counting model upon them. The process is repeated such that in every cycle we select the samples that are diverse in crowd density and dissimilar to previous selections. In the last cycle when the labeling budget is met, the large amount of unlabeled data are also utilized: a distribution classifier is introduced to align the labeled data with unlabeled data; furthermore, we propose to mix up the distribution labels and latent representations of data in the network to particularly improve the distribution alignment in-between training samples. We follow the popular density estimation pipeline for crowd counting. Extensive experiments are conducted on standard benchmarks i.e. ShanghaiTech, UCF CC 50, MAll, TRANCOS, and DCC. By annotating limited number of images (e.g. 10% of the dataset), our method reaches levels of performance not far from the state of the art which utilize full annotations of the dataset.

翻訳日:2022-11-11 00:18:00 公開日:2020-07-14

# 粗いものから細かいものへの複数の音源の定位

Multiple Sound Sources Localization from Coarse to Fine ( http://arxiv.org/abs/2007.06355v2 )

ライセンス: Link先を確認

Rui Qian, Di Hu, Heinrich Dinkel, Mengyue Wu, Ning Xu, Weiyao Lin

(参考訳) 制約のないビデオで複数の音源を視覚的にローカライズする方法は、特にペアワイズなサウンドオブジェクトアノテーションが欠けている場合、恐ろしい問題です。そこで本研究では,複雑なシーンから異なるカテゴリの音声表現と視覚表現を分離し,粗面から細部までのクロスモーダル特徴のアライメントを行う2段階視聴覚学習フレームワークを開発した。本モデルでは,局所化の公開データセット上での最先端結果と,複雑な場面における複数音源音像定位における有意な性能を実現する。次に, 音像分離のための局所化結果を用い, 既存の手法に匹敵する性能を得る。これらの結果は、特定の視覚源と効果的に音を調整できるモデルの能力を示している。コードはhttps://github.com/shvdiwnkozbw/Multi-Source-Sound-Localizationで入手できる。

How to visually localize multiple sound sources in unconstrained videos is a formidable problem, especially when lack of the pairwise sound-object annotations. To solve this problem, we develop a two-stage audiovisual learning framework that disentangles audio and visual representations of different categories from complex scenes, then performs cross-modal feature alignment in a coarse-to-fine manner. Our model achieves state-of-the-art results on public dataset of localization, as well as considerable performance on multi-source sound localization in complex scenes. We then employ the localization results for sound separation and obtain comparable performance to existing methods. These outcomes demonstrate our model's ability in effectively aligning sounds with specific visual sources. Code is available at https://github.com/shvdiwnkozbw/Multi-Source-Sound-Localization

翻訳日:2022-11-11 00:17:36 公開日:2020-07-14

# ニューラルネットワークにおけるSGDのカオスの定量的伝播

Quantitative Propagation of Chaos for SGD in Wide Neural Networks ( http://arxiv.org/abs/2007.06352v2 )

ライセンス: Link先を確認

Valentin De Bortoli, Alain Durmus, Xavier Fontaine, Umut Simsekli

(参考訳) 本稿では,2層超パラメータニューラルネットワークに適用される確率的勾配降下(sgd)アルゴリズムの,数やニューロン(つまり隠れた層の大きさ)である$n \to +\infty$ の連続時間に対する制限挙動について検討する。確率論的アプローチに従って,この連続時間ダイナミクスによって定義される粒子系の「カオスの伝播」を示し,粒子間の統計的相互作用が漸近的に消失することを示す。特に、ワッサースタイン距離が与えられた距離空間における平均場mckean-vlasov方程式の解に対する任意の粒子のn$に関して定量的収束を確立する。これまでの研究と比較して、SGDのステップサイズ列がニューロンの数や反復数に依存する可能性のある設定について考察する。次に,それぞれ異なる平均場限界が得られた2つのレジームを同定し,そのうちの1つは手元の最小化問題の暗黙的に正規化されたバージョンに対応する。理論的な結果を検証するために実データ集合について様々な実験を行い、分類問題におけるこれら2つのレジームの存在を評価し、収束結果を示す。

In this paper, we investigate the limiting behavior of a continuous-time counterpart of the Stochastic Gradient Descent (SGD) algorithm applied to two-layer overparameterized neural networks, as the number or neurons (ie, the size of the hidden layer) $N \to +\infty$. Following a probabilistic approach, we show 'propagation of chaos' for the particle system defined by this continuous-time dynamics under different scenarios, indicating that the statistical interaction between the particles asymptotically vanishes. In particular, we establish quantitative convergence with respect to $N$ of any particle to a solution of a mean-field McKean-Vlasov equation in the metric space endowed with the Wasserstein distance. In comparison to previous works on the subject, we consider settings in which the sequence of stepsizes in SGD can potentially depend on the number of neurons and the iterations. We then identify two regimes under which different mean-field limits are obtained, one of them corresponding to an implicitly regularized version of the minimization problem at hand. We perform various experiments on real datasets to validate our theoretical results, assessing the existence of these two regimes on classification problems and illustrating our convergence results.

翻訳日:2022-11-10 23:43:06 公開日:2020-07-14

# 人工データセットにおけるGANの訓練から学んだ教訓

Lessons Learned from the Training of GANs on Artificial Datasets ( http://arxiv.org/abs/2007.06418v2 )

ライセンス: Link先を確認

Shichang Tang

(参考訳) 近年,GAN(Generative Adversarial Networks)は現実的な画像の合成に大きく進歩している。しかし、しばしば、サンプルが少ないか、異なるデータ分布に属するクラスが多すぎるイメージデータセットで訓練される。その結果、GANは不適合や過剰適合の傾向があり、分析が困難で制約される。したがって、データセットがもたらした不要な干渉を回避しつつ、ganを徹底的に研究するために、無限に多くのサンプルと実際のデータ分布が単純で高次元で構造化多様体を持つ人工データセットでそれらを訓練する。さらに、ジェネレータは最適なパラメータ集合が存在するように設計されている。実験により,様々な距離測定において,生成元はGAN訓練手順でそのようなパラメータを学習できないことがわかった。また、GANのトレーニング混合物は、モデル複雑さが十分に高い場合のネットワーク深さや幅を増大させるよりも、パフォーマンスが向上することがわかった。実験の結果,複数のジェネレータの混合が教師なし設定で異なるモードや異なるクラスを自動的に発見できることが示され,複数のジェネレータと識別器にまたがる生成タスクと識別タスクの分散を特徴付ける。現実的なデータセットへの結論の一般化可能性の例として、CIFAR-10データセット上でGANの混合を訓練し、一般的なメトリクス、すなわちインセプションスコア(IS)とFr\echet Inception Distance(FID)で最先端の手法を著しく上回ります。

Generative Adversarial Networks (GANs) have made great progress in synthesizing realistic images in recent years. However, they are often trained on image datasets with either too few samples or too many classes belonging to different data distributions. Consequently, GANs are prone to underfitting or overfitting, making the analysis of them difficult and constrained. Therefore, in order to conduct a thorough study on GANs while obviating unnecessary interferences introduced by the datasets, we train them on artificial datasets where there are infinitely many samples and the real data distributions are simple, high-dimensional and have structured manifolds. Moreover, the generators are designed such that optimal sets of parameters exist. Empirically, we find that under various distance measures, the generator fails to learn such parameters with the GAN training procedure. We also find that training mixtures of GANs leads to more performance gain compared to increasing the network depth or width when the model complexity is high enough. Our experimental results demonstrate that a mixture of generators can discover different modes or different classes automatically in an unsupervised setting, which we attribute to the distribution of the generation and discrimination tasks across multiple generators and discriminators. As an example of the generalizability of our conclusions to realistic datasets, we train a mixture of GANs on the CIFAR-10 dataset and our method significantly outperforms the state-of-the-art in terms of popular metrics, i.e., Inception Score (IS) and Fr\'echet Inception Distance (FID).

翻訳日:2022-11-10 22:55:51 公開日:2020-07-14

# 自律走行車のロバストセンシングに向けて--敵対的視点から

Towards robust sensing for Autonomous Vehicles: An adversarial perspective ( http://arxiv.org/abs/2007.10115v1 )

ライセンス: Link先を確認

Apostolos Modas, Ricardo Sanchez-Matilla, Pascal Frossard, Andrea Cavallaro

(参考訳) 自動運転車は、さまざまな状況において安全クリティカルな意思決定のために、正確でロバストなセンサー観測に依存している。このようなシステムの基本的な構成要素は、超音波、RADAR、GPS、LiDARおよびカメラ信号を処理するセンサーと分類器である。結果として得られる決定が摂動に対して堅牢であり、異なる種類のニュアンスやデータ変換の形式を取ることができ、また敵対的摂動(AP)にもなり得ることが重要である。敵対的摂動は、自律的なシステムを攻撃し、破壊することを目的として、意図的に環境または感覚測定を改変する。 AVの高速進化領域において、より安全なシステムを構築し、デプロイするには、センサーシステムの脆弱性を慎重に評価する必要がある。そこで,本稿では,自律システムに対するセンサ・モダリティに対する敵の攻撃をレビューした上で,その対策と今後の研究方向性について論じる。

Autonomous Vehicles rely on accurate and robust sensor observations for safety critical decision-making in a variety of conditions. Fundamental building blocks of such systems are sensors and classifiers that process ultrasound, RADAR, GPS, LiDAR and camera signals~\cite{Khan2018}. It is of primary importance that the resulting decisions are robust to perturbations, which can take the form of different types of nuisances and data transformations, and can even be adversarial perturbations (APs). Adversarial perturbations are purposefully crafted alterations of the environment or of the sensory measurements, with the objective of attacking and defeating the autonomous systems. A careful evaluation of the vulnerabilities of their sensing system(s) is necessary in order to build and deploy safer systems in the fast-evolving domain of AVs. To this end, we survey the emerging field of sensing in adversarial settings: after reviewing adversarial attacks on sensing modalities for autonomous systems, we discuss countermeasures and present future research directions.

翻訳日:2022-11-10 15:46:05 公開日:2020-07-14

# 遅延制約型無線フェデレート学習のための協調デバイススケジューリングと資源割り当て

Joint Device Scheduling and Resource Allocation for Latency Constrained Wireless Federated Learning ( http://arxiv.org/abs/2007.07174v1 )

ライセンス: Link先を確認

Wenqi Shi, Sheng Zhou, Zhisheng Niu, Miao Jiang, Lu Geng

(参考訳) フェデレーション学習(fl)では、デバイスはワイヤレスチャネルを介してローカルモデルのアップデートをアップロードすることで、グローバルなトレーニングに寄与する。計算量や通信資源が限られているため、デバイススケジューリングはFLの収束速度に不可欠である。本稿では,遅延制約のある無線FLに対して,与えられたトレーニング時間予算のモデル精度を最大化するために,共同装置スケジューリングと資源割当ポリシを提案する。訓練性能損失の相反性に対する低いバウンダリは、訓練ラウンド数と1ラウンド当たりの予定装置数との観点から導出される。この境界に基づいて、精度最大化問題は2つのサブプロブレムに分解することで解決される。まず、スケジュールされたデバイスを考えると、最適な帯域割り当ては、より悪いチャネル条件や計算能力の弱いデバイスへの帯域幅を割り当てることを示唆する。そして、各ステップにおいて、最適な帯域割り当てによって得られる最小更新時間を消費する装置を、下限が増加するまで選択する欲望デバイススケジューリングアルゴリズムを導入することにより、より多くのデバイスがモデル精度を低下させる。実験により,提案手法は,データ分布とセル半径の広範囲な設定の下で,最先端のスケジューリングポリシーより優れていることが示された。

In federated learning (FL), devices contribute to the global training by uploading their local model updates via wireless channels. Due to limited computation and communication resources, device scheduling is crucial to the convergence rate of FL. In this paper, we propose a joint device scheduling and resource allocation policy to maximize the model accuracy within a given total training time budget for latency constrained wireless FL. A lower bound on the reciprocal of the training performance loss, in terms of the number of training rounds and the number of scheduled devices per round, is derived. Based on the bound, the accuracy maximization problem is solved by decoupling it into two sub-problems. First, given the scheduled devices, the optimal bandwidth allocation suggests allocating more bandwidth to the devices with worse channel conditions or weaker computation capabilities. Then, a greedy device scheduling algorithm is introduced, which in each step selects the device consuming the least updating time obtained by the optimal bandwidth allocation, until the lower bound begins to increase, meaning that scheduling more devices will degrade the model accuracy. Experiments show that the proposed policy outperforms state-of-the-art scheduling policies under extensive settings of data distributions and cell radius.

翻訳日:2022-11-10 15:45:36 公開日:2020-07-14

# セキュリティ制約付き最適潮流に対する深層学習と最適化の併用

Combining Deep Learning and Optimization for Security-Constrained Optimal Power Flow ( http://arxiv.org/abs/2007.07002v1 )

ライセンス: Link先を確認

Alexandre Velloso and Pascal Van Hentenryck

(参考訳) セキュリティに制約のある最適電力フロー(SCOPF)は電力システムの基本であり、同期発電機の自動一次応答(APR)と短期スケジュールを接続する。様々な入力に対して、SCOPFの問題を毎日繰り返し解決し、一組のタイミングで頑健なスケジュールを決定する。残念ながら、SCOPF問題におけるAPRのモデリングは、複雑な大規模混合整数プログラムをもたらすが、解決は困難である。この課題に対処するため,本研究では,深層学習と頑健な最適化技術を組み合わせた新しい手法を提案する。厳密解法の計算負荷を軽減することを目的とした最近の機械学習アプリケーションとは異なり、提案手法はscopf実装可能な解を直接予測する。 2つのステップで実現可能である。まず、トレーニング中にラグランジアン二重法は、コロン・アンド・制約生成アルゴリズム(CCGA)によって機械学習モデルに反復的に追加される物理的および操作上の制約の違反を罰する。第二に、別のccgaは予測に最も近い解を見つけることで実現可能性を取り戻す。大規模なテストケースでの実験では、最適度ギャップが0.1%以下で実現可能な解を得るためのかなりの時間短縮が得られた。

The security-constrained optimal power flow (SCOPF) is fundamental in power systems and connects the automatic primary response (APR) of synchronized generators with the short-term schedule. Every day, the SCOPF problem is repeatedly solved for various inputs to determine robust schedules given a set of contingencies. Unfortunately, the modeling of APR within the SCOPF problem results in complex large-scale mixed-integer programs, which are hard to solve. To address this challenge, leveraging the wealth of available historical data, this paper proposes a novel approach that combines deep learning and robust optimization techniques. Unlike recent machine-learning applications where the aim is to mitigate the computational burden of exact solvers, the proposed method predicts directly the SCOPF implementable solution. Feasibility is enforced in two steps. First, during training, a Lagrangian dual method penalizes violations of physical and operations constraints, which are iteratively added as necessary to the machine-learning model by a Column-and-Constraint-Generation Algorithm (CCGA). Second, another different CCGA restores feasibility by finding the closest feasible solution to the prediction. Experiments on large test cases show that the method results in significant time reduction for obtaining feasible solutions with an optimality gap below 0.1%.

翻訳日:2022-11-10 15:37:09 公開日:2020-07-14

# 地底真理の欠如による特徴不合理性予測

Predicting feature imputability in the absence of ground truth ( http://arxiv.org/abs/2007.07052v1 )

ライセンス: Link先を確認

Niamh McCombe, Xuemei Ding, Girijesh Prasad, David P. Finn, Stephen Todd, Paula L. McClean, KongFatt Wong-Lin

(参考訳) データ計算は、欠落した値を扱う最も一般的な方法であるが、ほとんどの実生活アプリケーションでは、大きな欠落データが発生する可能性があり、データが正確にインプットされたかどうかを評価することは困難または不可能である(基礎的真実の欠如)。本稿では,個々のデータの特徴を正確に説明できるかどうかを判断するための,効果的でシンプルな主成分に基づく手法を提案する。特に, 極度の欠如や根拠の欠如がある場合でも, 主成分負荷と特徴インプタビリティとの間に強い線形関係が確立される。この研究は、実践的なデータ計算戦略に重要な意味を持つだろう。

Data imputation is the most popular method of dealing with missing values, but in most real life applications, large missing data can occur and it is difficult or impossible to evaluate whether data has been imputed accurately (lack of ground truth). This paper addresses these issues by proposing an effective and simple principal component based method for determining whether individual data features can be accurately imputed - feature imputability. In particular, we establish a strong linear relationship between principal component loadings and feature imputability, even in the presence of extreme missingness and lack of ground truth. This work will have important implications in practical data imputation strategies.

翻訳日:2022-11-10 15:36:46 公開日:2020-07-14

# マルチドメイン医用画像検索のためのユニバーサルモデル

Universal Model for Multi-Domain Medical Image Retrieval ( http://arxiv.org/abs/2007.08628v1 )

ライセンス: Link先を確認

Yang Feng, Yubao Liu, Jiebo Luo

(参考訳) 医用画像検索(MIR)は、医師が類似した患者のデータを素早く見つけるのに役立つ。デジタル画像モダリティの広範利用と医用画像レポジトリの成長により、MIRはますます役に立ちつつある。しかし、病院における様々なデジタル画像モダリティの人気もまた、MIRにいくつかの課題をもたらしている。通常、1つの画像検索モデルは、1つのモダリティまたは1つのソースの画像を扱うためにのみ訓練される。複数のソースやドメインから医療画像を取得する必要がある場合、複数の検索モデルを維持する必要があります。本稿では,複数の領域の医用画像に適用可能な1つのMIRモデルをトレーニングする方法について検討する。複数のドメインからトレーニングデータを融合するだけでは、既存のメソッドを使ってトレーニングすると、いくつかのドメインがより早く適合するため、この問題は解決できない。そこで本研究では,複数の専門的MIRモデルの知識を汎用埋め込みにより単一のマルチドメインMIRモデルに抽出し,その問題を解決することを提案する。皮膚疾患,X線,網膜画像データセットを用いて,提案したユニバーサルモデルがマルチドメインMIRを効果的に実現できることを検証する。

Medical Image Retrieval (MIR) helps doctors quickly find similar patients' data, which can considerably aid the diagnosis process. MIR is becoming increasingly helpful due to the wide use of digital imaging modalities and the growth of the medical image repositories. However, the popularity of various digital imaging modalities in hospitals also poses several challenges to MIR. Usually, one image retrieval model is only trained to handle images from one modality or one source. When there are needs to retrieve medical images from several sources or domains, multiple retrieval models need to be maintained, which is cost ineffective. In this paper, we study an important but unexplored task: how to train one MIR model that is applicable to medical images from multiple domains? Simply fusing the training data from multiple domains cannot solve this problem because some domains become over-fit sooner when trained together using existing methods. Therefore, we propose to distill the knowledge in multiple specialist MIR models into a single multi-domain MIR model via universal embedding to solve this problem. Using skin disease, x-ray, and retina image datasets, we validate that our proposed universal model can effectively accomplish multi-domain MIR.

翻訳日:2022-11-10 15:36:34 公開日:2020-07-14

# 映像から視覚的な音を生成する

Generating Visually Aligned Sound from Videos ( http://arxiv.org/abs/2008.00820v1 )

ライセンス: Link先を確認

Peihao Chen, Yang Zhang, Mingkui Tan, Hongdong Xiao, Deng Huang, Chuang Gan

(参考訳) 我々は,自然映像から音を生成する作業に焦点をあて,その音は時間的にも内容的にも視覚信号と一致すべきである。このタスクは、ビデオコンテンツからカメラを推測できない音が生成されるため、非常に難しい。このモデルは、視覚的内容とこれらの無関係な音の間違ったマッピングを学習せざるを得ない。この課題に対処するため,我々はREGNETというフレームワークを提案する。本稿では,複雑な背景情報から音声を発する物体をよりよく識別するために,まず映像フレームから外観や動きの特徴を抽出する。次に,実音を入力として直接考慮し,ボトルネック音の特徴を出力する,革新的な音声フォワード正則化器を導入する。訓練中の音の予測に視覚的特徴とボトルネック的特徴の両方を使用すると、音の予測の監督が強化される。音声フォワーディングレギュレータは、無関係な音成分を制御でき、これにより、画面外にある物体から放射される映像フレームと音との誤ったマッピングを学習するのを防止する。テスト中、オーディオフォワードレギュラライザが削除され、regnetが純粋に調整されたサウンドを視覚的な特徴からのみ生成できるようになる。 Amazon Mechanical Turkに基づく大規模評価の結果,時間的・内容的アライメントが大幅に向上した。驚くべきことに、我々の生成した音は68.12%の成功率で人間を騙すことができる。コードと事前訓練されたモデルはhttps://github.com/PeihaoChen/regnetで公開されている。

We focus on the task of generating sound from natural videos, and the sound should be both temporally and content-wise aligned with visual signals. This task is extremely challenging because some sounds generated \emph{outside} a camera can not be inferred from video content. The model may be forced to learn an incorrect mapping between visual content and these irrelevant sounds. To address this challenge, we propose a framework named REGNET. In this framework, we first extract appearance and motion features from video frames to better distinguish the object that emits sound from complex background information. We then introduce an innovative audio forwarding regularizer that directly considers the real sound as input and outputs bottlenecked sound features. Using both visual and bottlenecked sound features for sound prediction during training provides stronger supervision for the sound prediction. The audio forwarding regularizer can control the irrelevant sound component and thus prevent the model from learning an incorrect mapping between video frames and sound emitted by the object that is out of the screen. During testing, the audio forwarding regularizer is removed to ensure that REGNET can produce purely aligned sound only from visual features. Extensive evaluations based on Amazon Mechanical Turk demonstrate that our method significantly improves both temporal and content-wise alignment. Remarkably, our generated sound can fool the human with a 68.12% success rate. Code and pre-trained models are publicly available at https://github.com/PeihaoChen/regnet

翻訳日:2022-11-10 15:36:16 公開日:2020-07-14

# 非線形ロバスト制御のための三元ポリシー反復アルゴリズム

Ternary Policy Iteration Algorithm for Nonlinear Robust Control ( http://arxiv.org/abs/2007.06810v1 )

ライセンス: Link先を確認

Jie Li, Shengbo Eben Li, Yang Guan, Jingliang Duan, Wenyu Li, Yuming Yin

(参考訳) 植物力学の不確実性は、非線形制御問題への挑戦である。本稿では,境界不確実性を伴う非線形ロバスト制御問題を解くための3次ポリシー反復(TPI)アルゴリズムを開発する。コントローラとシステムの不確実性はゲームプレイヤーと見なされ、ロバスト制御問題は2つのプレイヤーゼロサム差分ゲームとして定式化される。微分ゲームを解くために、対応するhamilton-jacobi-isaacs(hji)方程式が導出される。 3つの損失関数と3つの更新フェーズは、それぞれHJI方程式の恒等式、最小化、最大化に対応するように設計されている。これらの損失関数は、全状態が同時に設定されるのを防ぐために生成された状態集合における近似ハミルトン状態の期待によって定義される。勾配降下法を用いて設計した損失関数を小さくすることで、値関数とポリシーのパラメータを直接更新する。さらに、制御ポリシのパラメータにもゼロ初期化を適用することができる。提案アルゴリズムの有効性は2つのシミュレーション研究を通して実証した。シミュレーションの結果, tpiアルゴリズムは線形プラントの最適解に収束し, 非線形プラントの外乱に対する高い抵抗を持つことがわかった。

The uncertainties in plant dynamics remain a challenge for nonlinear control problems. This paper develops a ternary policy iteration (TPI) algorithm for solving nonlinear robust control problems with bounded uncertainties. The controller and uncertainty of the system are considered as game players, and the robust control problem is formulated as a two-player zero-sum differential game. In order to solve the differential game, the corresponding Hamilton-Jacobi-Isaacs (HJI) equation is then derived. Three loss functions and three update phases are designed to match the identity equation, minimization and maximization of the HJI equation, respectively. These loss functions are defined by the expectation of the approximate Hamiltonian in a generated state set to prevent operating all the states in the entire state set concurrently. The parameters of value function and policies are directly updated by diminishing the designed loss functions using the gradient descent method. Moreover, zero-initialization can be applied to the parameters of the control policy. The effectiveness of the proposed TPI algorithm is demonstrated through two simulation studies. The simulation results show that the TPI algorithm can converge to the optimal solution for the linear plant, and has high resistance to disturbances for the nonlinear plant.

翻訳日:2022-11-10 15:35:24 公開日:2020-07-14

# LSTMとトレーニング可能な初期隠れ状態を用いた金融時系列のモデル化

Modeling Financial Time Series using LSTM with Trainable Initial Hidden States ( http://arxiv.org/abs/2007.06848v1 )

ライセンス: Link先を確認

Jungsik Hwang

(参考訳) 過去の未知のパターンや情報を時系列で抽出することは、多くの現実世界のアプリケーションの中心である。本研究では,深層学習モデルを用いて金融時系列をモデル化する新しい手法を提案する。トレーニング可能な初期隠れ状態を備えたLong Short-Term Memory(LSTM)ネットワークを使用する。時系列の再構成を学習することにより,そのパラメータで高次元時系列データを表現できる。韓国株式市場のデータを用いた実験により、このモデルは潜在空間における大量の株価の相対的類似性を捉えることができた。さらに、このモデルでは、潜在分野から将来の株価トレンドを予測することもできる。提案手法は,多くの時系列間の関係を識別する上で有用であり,投資ポートフォリオの最適化など,金融アプリケーションに適用することができる。

Extracting previously unknown patterns and information in time series is central to many real-world applications. In this study, we introduce a novel approach to modeling financial time series using a deep learning model. We use a Long Short-Term Memory (LSTM) network equipped with the trainable initial hidden states. By learning to reconstruct time series, the proposed model can represent high-dimensional time series data with its parameters. An experiment with the Korean stock market data showed that the model was able to capture the relative similarity between a large number of stock prices in its latent space. Besides, the model was also able to predict the future stock trends from the latent space. The proposed method can help to identify relationships among many time series, and it could be applied to financial applications, such as optimizing the investment portfolios.

翻訳日:2022-11-10 15:35:07 公開日:2020-07-14

# ネットワーク音楽演奏アプリケーションにおける音声信号の低遅延パケット損失隠蔽のためのディープラーニング手法

A Deep Learning Approach for Low-Latency Packet Loss Concealment of Audio Signals in Networked Music Performance Applications ( http://arxiv.org/abs/2007.07132v1 )

ライセンス: Link先を確認

Prateek Verma, Alessandro Ilic Mezza, Chris Chafe, Cristina Rottondi

(参考訳) Networked Music Performance (NMP)は、インターネットアプリケーションにおける潜在的なゲームチェンジャーとして構想されており、遠隔のミュージシャンが遠隔通信ネットワークを介して対話し、一緒に演奏できるようにすることによって、従来の音楽インタラクションの概念に革命をもたらすことを目的としている。しかし、音楽演奏の現実的な条件を保証することは、音質やネットワークの遅延といった極めて厳しい要件のため、重要なエンジニアリング上の課題となっている。ミュージシャンが経験したエンドツーエンドの遅延を最小限に抑えるため、NMPアプリケーションの典型的な実装では、圧縮されていない双方向オーディオストリームを使用し、UDPをトランスポートプロトコルとして利用する。接続が小さく信頼性の低いため、UDP経由で送信されるオーディオパケットは再送信されず、レシーバのオーディオ再生に不具合が発生する。本稿では,深層学習手法を用いてパケットの損失をリアルタイムで予測する手法について述べる。エラーをリアルタイムで隠蔽する能力は、パケット損失によるオーディオ障害の軽減に役立ち、現実世界のシナリオにおけるオーディオプレイアウトの品質を向上させる。

Networked Music Performance (NMP) is envisioned as a potential game changer among Internet applications: it aims at revolutionizing the traditional concept of musical interaction by enabling remote musicians to interact and perform together through a telecommunication network. Ensuring realistic conditions for music performance, however, constitutes a significant engineering challenge due to extremely strict requirements in terms of audio quality and, most importantly, network delay. To minimize the end-to-end delay experienced by the musicians, typical implementations of NMP applications use un-compressed, bidirectional audio streams and leverage UDP as transport protocol. Being connection less and unreliable,audio packets transmitted via UDP which become lost in transit are not re-transmitted and thus cause glitches in the receiver audio playout. This article describes a technique for predicting lost packet content in real-time using a deep learning approach. The ability of concealing errors in real time can help mitigate audio impairments caused by packet losses, thus improving the quality of audio playout in real-world scenarios.

翻訳日:2022-11-10 15:34:29 公開日:2020-07-14

# Explore and Explain: セルフ教師付きナビゲーションとリカウント

Explore and Explain: Self-supervised Navigation and Recounting ( http://arxiv.org/abs/2007.07268v1 )

ライセンス: Link先を確認

Roberto Bigazzi, Federico Landi, Marcella Cornia, Silvia Cascianelli, Lorenzo Baraldi, Rita Cucchiara

(参考訳) 自律的でインテリジェントなエージェントの開発を促進することを目的として、Embodied AIは最近注目を集めている。本稿では,エージェントが未知の環境を探索し,その経路に何が見えるのかを記述する必要がある,新たな具体的設定を考案する。この文脈では、エージェントは探索目標によって駆動される環境をナビゲートし、説明のための適切なモーメントを選択し、関連するオブジェクトとシーンの自然言語記述を出力する必要がある。本モデルでは,新たな自己監督探索モジュールとペナルティと,説明のための完全なキャプションモデルを統合する。また,環境とナビゲーションの双方から得られる情報によって,説明の適切なモーメントを選択するための異なるポリシーについて検討する。 Matterport3Dデータセットからフォトリアリスティックな環境下で実験を行い、エージェントのナビゲーションと説明機能およびそれらの相互作用の役割について調査する。

Embodied AI has been recently gaining attention as it aims to foster the development of autonomous and intelligent agents. In this paper, we devise a novel embodied setting in which an agent needs to explore a previously unknown environment while recounting what it sees during the path. In this context, the agent needs to navigate the environment driven by an exploration goal, select proper moments for description, and output natural language descriptions of relevant objects and scenes. Our model integrates a novel self-supervised exploration module with penalty, and a fully-attentive captioning model for explanation. Also, we investigate different policies for selecting proper moments for explanation, driven by information coming from both the environment and the navigation. Experiments are conducted on photorealistic environments from the Matterport3D dataset and investigate the navigation and explanation capabilities of the agent as well as the role of their interactions.

翻訳日:2022-11-10 15:28:03 公開日:2020-07-14

# TinyVIRAT:低解像度ビデオアクション認識

TinyVIRAT: Low-resolution Video Action Recognition ( http://arxiv.org/abs/2007.07355v1 )

ライセンス: Link先を確認

Ugur Demir, Yogesh S Rawat, Mubarak Shah

(参考訳) 既存のアクション認識の研究は主に、アクションがはっきりと見える高品質のビデオに焦点を当てている。現実世界の監視環境では、ビデオ内のアクションは幅広い解像度でキャプチャされる。ほとんどの活動は小さな解像度で発生し、そのような活動を認識することは難しい問題である。本研究では,ビデオ中の小さなアクションを認識することに焦点を当てる。我々は,天然の低解像度アクティビティを含むベンチマークデータセットであるtinyviratを紹介する。 TinyVIRATビデオのアクションには複数のラベルがあり、監視ビデオから抽出され、現実的でより困難なものになる。本稿では,低解像度アクションの品質向上のために,プログレッシブ・ジェネレーティブ・アプローチを用いたビデオにおける小さなアクションの認識手法を提案する。提案手法は,映像中の活動領域に焦点を合わせるのに役立つ弱訓練された注意機構も備えている。提案するtinyviratデータセットのベンチマーク実験を行い,提案手法がベースライン上での動作認識性能を大幅に向上させることを確認した。また,提案手法は,既存の手法と比較して,合成的再構成された行動認識データセットに対するアプローチを評価した。データセットとコードはhttps://github.com/UgurDemir/Tiny-VIRATで公開されている。

The existing research in action recognition is mostly focused on high-quality videos where the action is distinctly visible. In real-world surveillance environments, the actions in videos are captured at a wide range of resolutions. Most activities occur at a distance with a small resolution and recognizing such activities is a challenging problem. In this work, we focus on recognizing tiny actions in videos. We introduce a benchmark dataset, TinyVIRAT, which contains natural low-resolution activities. The actions in TinyVIRAT videos have multiple labels and they are extracted from surveillance videos which makes them realistic and more challenging. We propose a novel method for recognizing tiny actions in videos which utilizes a progressive generative approach to improve the quality of low-resolution actions. The proposed method also consists of a weakly trained attention mechanism which helps in focusing on the activity regions in the video. We perform extensive experiments to benchmark the proposed TinyVIRAT dataset and observe that the proposed method significantly improves the action recognition performance over baselines. We also evaluate the proposed approach on synthetically resized action recognition datasets and achieve state-of-the-art results when compared with existing methods. The dataset and code is publicly available at https://github.com/UgurDemir/Tiny-VIRAT.

翻訳日:2022-11-10 15:27:48 公開日:2020-07-14

# 攻撃的セキュリティのための機械学習:決定木とニューラルネットワークを用いたサンドボックス分類

Machine Learning for Offensive Security: Sandbox Classification Using Decision Trees and Artificial Neural Networks ( http://arxiv.org/abs/2007.06763v1 )

ライセンス: Link先を確認

Will Pearce, Nick Landers, and Nancy Fulda

(参考訳) 情報セキュリティにおける機械学習のメリットは、主に防衛を強化することに焦点を当てている。しかし、機械学習(ml)のテクニックは、深いポケットと巨大なデータリポジトリを持つ組織に留まらず、mlの民主化によって、mlを使用して攻撃的な操作をサポートするセキュリティチームの数が増えています。ここで提示された研究は、我々のチームが1つの攻撃的タスクを解決するために使った2つのモデルを調べ、サンドボックスを検出する。フィッシングメールで収集されたプロセスリストデータを用いて、サンドボックスの分類にDecision TreesとArtificial Neural Networksを用いることで、安全でない実行を避けることができる。本稿は,実際の攻撃的チームが機械学習を用いて攻撃的操作をサポートする方法について,ユニークな洞察を提供することを目的とする。

The merits of machine learning in information security have primarily focused on bolstering defenses. However, machine learning (ML) techniques are not reserved for organizations with deep pockets and massive data repositories; the democratization of ML has lead to a rise in the number of security teams using ML to support offensive operations. The research presented here will explore two models that our team has used to solve a single offensive task, detecting a sandbox. Using process list data gathered with phishing emails, we will demonstrate the use of Decision Trees and Artificial Neural Networks to successfully classify sandboxes, thereby avoiding unsafe execution. This paper aims to give unique insight into how a real offensive team is using machine learning to support offensive operations.

翻訳日:2022-11-10 15:26:34 公開日:2020-07-14

# マルチスタティック・ローカライゼーションのための深部ニューラルネットワークの不確かさと超音波構造健康モニタリングへの応用

Uncertainty Aware Deep Neural Network for Multistatic Localization with Application to Ultrasonic Structural Health Monitoring ( http://arxiv.org/abs/2007.06814v1 )

ライセンス: Link先を確認

Ishan D. Khurjekar, Joel B. Harley

(参考訳) 誘導超音波定位法は、空間分布型マルチスタティックセンサアレイと一般化ビームフォーミング戦略を用いて、構造物全体の損傷を検出し、発見する。伝播チャネルはしばしば非常に複雑である。波動伝播モデルとデータを比較して損傷を特定できる。しかし、環境の不確実性(例えば温度やストレスの変化)は、しばしば精度を低下させる。本稿では,不確実性を考慮した深層ニューラルネットワークフレームワークを用いて,ロバストな局所化モデルを学習し,不確実性を表現する。訓練データの不確実性に基づき,混合密度ネットワークを用いて損傷位置分布を生成する。これは、出力点推定を行うほとんどのローカライゼーション手法とは対照的である。本手法を一般化ビームフォーミングフレームワークであるmatched field processing(mfp)と比較した。提案手法は, 環境不確かさや騒音がある場合の0.1425mに対して0.0625mのローカライズ誤差を達成する。また,環境不確実性の増加に伴う予測的不確実性は,局所化精度を評価する統計的に有意な指標となることを示す。

Guided ultrasonic wave localization uses spatially distributed multistatic sensor arrays and generalized beamforming strategies to detect and locate damage across a structure. The propagation channel is often very complex. Methods can compare data with models of wave propagation to locate damage. Yet, environmental uncertainty (e.g., temperature or stress variations) often degrade accuracies. This paper uses an uncertainty-aware deep neural network framework to learn robust localization models and represent uncertainty. We use mixture density networks to generate damage location distributions based on training data uncertainty. This is in contrast with most localization methods, which output point estimates. We compare our approach with matched field processing (MFP), a generalized beamforming framework. The proposed approach achieves a localization error of 0.0625 m as compared to 0.1425 m with MFP when data has environmental uncertainty and noise. We also show that the predictive uncertainty scales as environmental uncertainty increases to provide a statistically meaningful metric for assessing localization accuracy.

翻訳日:2022-11-10 15:26:24 公開日:2020-07-14

# SRDCNN:時系列センサ信号分類タスクのための強正規化深部畳み込みニューラルネットワークアーキテクチャ

SRDCNN: Strongly Regularized Deep Convolution Neural Network Architecture for Time-series Sensor Signal Classification Tasks ( http://arxiv.org/abs/2007.06909v1 )

ライセンス: Link先を確認

Arijit Ukil, Antonio Jara, Leandro Marin

(参考訳) ディープニューラルネットワーク(DNN)は、特にコンピュータビジョンベースのアプリケーションにおいて、分類および回帰タスクの実行に成功している。近年,IoT(Internet of Things,モノのインターネット)の普及により,時系列データ,特にセンサの分類タスクが最も重要になっている。本稿では, SRDCNN: Strongly Regularized Deep Convolution Neural Network (DCNN) に基づく,時系列分類タスクを実行するディープアーキテクチャを提案する。提案手法の新規性は、ネットワークウェイトが L1 と L2 のノルム法則によって正則化されることである。どちらも、より少ないトレーニングインスタンスの実践的な問題、より迅速なトレーニングプロセスの要求、重みベクトルのスパース化と重み値の制御によるオーバーフィッティングの問題を回避するために、協調的に対処する。提案手法(SRDCNN)と,公開時系列分類ベンチマーク(UCR/UEAアーカイブ)を用いて異なるDNNを含む関連技術アルゴリズムを比較し,提案手法が優れた性能を提供することを示す。 SRDCNNは,実時間時系列センサ信号のトレーニングインスタンス不足問題に対処するために,ネットワークパラメータを深く制御することで,より優れた一般化能力を深層アーキテクチャに保証していると感じている。

Deep Neural Networks (DNN) have been successfully used to perform classification and regression tasks, particularly in computer vision based applications. Recently, owing to the widespread deployment of Internet of Things (IoT), we identify that the classification tasks for time series data, specifically from different sensors are of utmost importance. In this paper, we present SRDCNN: Strongly Regularized Deep Convolution Neural Network (DCNN) based deep architecture to perform time series classification tasks. The novelty of the proposed approach is that the network weights are regularized by both L1 and L2 norm penalties. Both of the regularization approaches jointly address the practical issues of smaller number of training instances, requirement of quicker training process, avoiding overfitting problem by incorporating sparsification of weight vectors as well as through controlling of weight values. We compare the proposed method (SRDCNN) with relevant state-of-the-art algorithms including different DNNs using publicly available time series classification benchmark (the UCR/UEA archive) time series datasets and demonstrate that the proposed method provides superior performance. We feel that SRDCNN warrants better generalization capability to the deep architecture by profoundly controlling the network parameters to combat the training instance insufficiency problem of real-life time series sensor signals.

翻訳日:2022-11-10 15:26:06 公開日:2020-07-14

# ビセクターを追従する:多目的最適化のための簡単な方法

Follow the bisector: a simple method for multi-objective optimization ( http://arxiv.org/abs/2007.06937v1 )

ライセンス: Link先を確認

Alexandr Katrutsa, Daniil Merkulov, Nurislam Tursynbek and Ivan Oseledets

(参考訳) 本研究では,多目的最適化問題を解くための新しい等角方向法(EDM)を提案する。複数の異なる損失を最小化しなければならない最適化問題を考える。提案手法は,各イテレーションにおける降下方向を計算し,目的関数の相対的減少を保証する。この降下方向は個々の損失の正規化勾配に基づいている。したがって、マルチスケール損失を伴う多目的最適化問題を解くのが適切である。標準データセットを用いた不均衡分類問題とマルチタスク学習問題において,提案手法を検証した。 EDMはこれらの問題を解決する他の方法と比較される。

This study presents a novel Equiangular Direction Method (EDM) to solve a multi-objective optimization problem. We consider optimization problems, where multiple differentiable losses have to be minimized. The presented method computes descent direction in every iteration to guarantee equal relative decrease of objective functions. This descent direction is based on the normalized gradients of the individual losses. Therefore, it is appropriate to solve multi-objective optimization problems with multi-scale losses. We test the proposed method on the imbalanced classification problem and multi-task learning problem, where standard datasets are used. EDM is compared with other methods to solve these problems.

翻訳日:2022-11-10 15:25:41 公開日:2020-07-14

# Fenton-Wilkinson順序統計を用いた正規回帰:オリエンテーリングレースを事例として

Ordinal Regression with Fenton-Wilkinson Order Statistics: A Case Study of an Orienteering Race ( http://arxiv.org/abs/2007.07369v1 )

ライセンス: Link先を確認

Joonas P\"a\"akk\"onen

(参考訳) スポーツでは、個人とチームは一般的に最終ランキングに興味を持つ。時間や距離などの最終結果は、これらのランキング(場所)を定めている。場所は順序確率変数(一般に順序統計と呼ばれる)にさらに関連付けることができる。そこで本研究では,更新時間を伴うレース場所の中継を予測できる簡易かつ高精度な順序統計順序回帰関数を提案する。この関数をfenton-wilkinson order statistics modelと呼ぶ。このモデルは次のような教育的な仮定に基づいて構築されている。さらに, フェントン・ウィルキンソンは, ドイツの戦車問題と同様に, チームの総数を推定するエスティメータと並行して, チェンジオーバタイムの近似値を用いることが目的である。この元のプレース回帰関数はsgmoidalであり、その結果、他のチームを大きく上回る少数のエリートチームが存在することを正しく予測している。また,本モデルでは,対数正規分布関数のインフレクション点における切替時間とともに,位置が線形に増大する様子を述べる。大規模なオリエンテーリングリレーレースであるJukola 2019の実際のデータから、トレーニングセットのサイズがデータセット全体のわずか5%である場合でも、モデルは極めて正確であることが示されている。また,本モデルでは,線形回帰,モード回帰,ガウス過程回帰よりも局所的根-平均二乗誤差が小さいことを示した。

In sports, individuals and teams are typically interested in final rankings. Final results, such as times or distances, dictate these rankings, also known as places. Places can be further associated with ordered random variables, commonly referred to as order statistics. In this work, we introduce a simple, yet accurate order statistical ordinal regression function that predicts relay race places with changeover-times. We call this function the Fenton-Wilkinson Order Statistics model. This model is built on the following educated assumption: individual leg-times follow log-normal distributions. Moreover, our key idea is to utilize Fenton-Wilkinson approximations of changeover-times alongside an estimator for the total number of teams as in the notorious German tank problem. This original place regression function is sigmoidal and thus correctly predicts the existence of a small number of elite teams that significantly outperform the rest of the teams. Our model also describes how place increases linearly with changeover-time at the inflection point of the log-normal distribution function. With real-world data from Jukola 2019, a massive orienteering relay race, the model is shown to be highly accurate even when the size of the training set is only 5% of the whole data set. Numerical results also show that our model exhibits smaller place prediction root-mean-square-errors than linear regression, mord regression and Gaussian process regression.

翻訳日:2022-11-10 15:20:18 公開日:2020-07-14

# Meta-rPPG:トランスダクティブメタラーナーを用いた遠隔心拍数推定

Meta-rPPG: Remote Heart Rate Estimation Using a Transductive Meta-Learner ( http://arxiv.org/abs/2007.06786v1 )

ライセンス: Link先を確認

Eugene Lee, Evan Chen, Chen-Yi Lee

(参考訳) 遠隔心拍数推定は、被験者と物理的に接触することなく心拍数を計測し、この研究で遠隔胸腔鏡(rPPG)を用いて達成する。 rPPG信号は通常、皮膚のトーンの変化、照明条件、顔の構造など、複数の要因に敏感なビデオカメラを使用して収集される。エンドツーエンドの教師あり学習アプローチは、トレーニングデータが豊富で、テストデータやデプロイメント中の分散からあまり逸脱しない分布をカバーしている。展開中の予期せぬ分布変化に対処するため,自己監督型重み調整(トランスダクティブ推論)のための試験(デプロイ)中に未ラベルのサンプルを採取し,分布変化に迅速に適応するトランスダクティブメタラーナを提案する。このアプローチを用いて,MAHNOB-HCIとUBFC-rPPGの最先端性能を実現する。

Remote heart rate estimation is the measurement of heart rate without any physical contact with the subject and is accomplished using remote photoplethysmography (rPPG) in this work. rPPG signals are usually collected using a video camera with a limitation of being sensitive to multiple contributing factors, e.g. variation in skin tone, lighting condition and facial structure. End-to-end supervised learning approach performs well when training data is abundant, covering a distribution that doesn't deviate too much from the distribution of testing data or during deployment. To cope with the unforeseeable distributional changes during deployment, we propose a transductive meta-learner that takes unlabeled samples during testing (deployment) for a self-supervised weight adjustment (also known as transductive inference), providing fast adaptation to the distributional changes. Using this approach, we achieve state-of-the-art performance on MAHNOB-HCI and UBFC-rPPG.

翻訳日:2022-11-10 15:20:00 公開日:2020-07-14

# BUNET:セキュアなUNETに基づいたブラインド医療画像セグメンテーション

BUNET: Blind Medical Image Segmentation Based on Secure UNET ( http://arxiv.org/abs/2007.06855v1 )

ライセンス: Link先を確認

Song Bian, Xiaowei Xu, Weiwen Jiang, Yiyu Shi, Takashi Sato

(参考訳) さまざまなプライバシー規制によって医療記録に課される厳格なセキュリティ要件は、ビッグデータ時代の大きな障害となる。本研究では,データ機密性を保護しつつ,サービス方式としての効率的な機械学習を確保するために,UNETアーキテクチャに基づくプライバシ保存医療画像セグメンテーションを実装したセキュアプロトコルである盲点UNET(BUNET)を提案する。 BUNETでは、同相暗号やガーブロード回路(GC)などの暗号プリミティブを効率よく利用し、UNETニューラルアーキテクチャのための完全なセキュアなプロトコルを設計する。また,高次元入力データを用いたgcベースのセキュアアクティベーションプロトコルの計算ボトルネックを削減するため,広範なアーキテクチャ探索を行う。実験では,本プロトコルのパラメータ空間を徹底的に検討し,精度を損なうことなくベースラインアーキテクチャ上での最先端のセキュアな推論手法と比較して,最大14倍の推論時間を短縮できることを示す。

The strict security requirements placed on medical records by various privacy regulations become major obstacles in the age of big data. To ensure efficient machine learning as a service schemes while protecting data confidentiality, in this work, we propose blind UNET (BUNET), a secure protocol that implements privacy-preserving medical image segmentation based on the UNET architecture. In BUNET, we efficiently utilize cryptographic primitives such as homomorphic encryption and garbled circuits (GC) to design a complete secure protocol for the UNET neural architecture. In addition, we perform extensive architectural search in reducing the computational bottleneck of GC-based secure activation protocols with high-dimensional input data. In the experiment, we thoroughly examine the parameter space of our protocol, and show that we can achieve up to 14x inference time reduction compared to the-state-of-the-art secure inference technique on a baseline architecture with negligible accuracy degradation.

翻訳日:2022-11-10 15:19:42 公開日:2020-07-14

# イコサヘドロンの折り紙クラウン表現を用いた複数の魚眼画像からの360$^\circ$深度推定

360$^\circ$ Depth Estimation from Multiple Fisheye Images with Origami Crown Representation of Icosahedron ( http://arxiv.org/abs/2007.06891v1 )

ライセンス: Link先を確認

Ren Komatsu, Hiromitsu Fujii, Yusuke Tamura, Atsushi Yamashita, Hajime Asama

(参考訳) 本研究では,屋内環境における多方向画像からの全周深度推定手法を提案する。特に,画像から深度を推定する手法として,平面スウィーピングステレオに着目した。オリガミの冠に類似しているため、「CrownConv」と命名した全方位画像に対して、新しいイコサヘドロンに基づく表現とConvNetsを提案する。 crownconvは魚眼画像と等角画像の両方に適用でき、特徴を抽出することができる。さらに,抽出した特徴量からイコサヘドロンのコストボリュームを生成するために,イコサヘドロンを用いた球面スイーピングを提案する。コストボリュームは3次元クラウンコンブを用いて正規化し、コストボリュームから深さ回帰によって最終的な深さを求める。提案手法は,外部カメラパラメータを用いてカメラアライメントに頑健であるため,トレーニングデータセットとカメラアライメントが異なる場合でも,正確な深度推定が可能である。提案する合成データセットのモデルを評価し,その有効性を実証する。提案手法は計算効率がよいため,GPUを搭載したラップトップを用いて,魚眼画像4枚から1秒以内で深度を推定する。そのため、現実世界のロボット応用に適している。ソースコードはhttps://github.com/matsuren/crownconv360depthから入手できます。

In this study, we present a method for all-around depth estimation from multiple omnidirectional images for indoor environments. In particular, we focus on plane-sweeping stereo as the method for depth estimation from the images. We propose a new icosahedron-based representation and ConvNets for omnidirectional images, which we name "CrownConv" because the representation resembles a crown made of origami. CrownConv can be applied to both fisheye images and equirectangular images to extract features. Furthermore, we propose icosahedron-based spherical sweeping for generating the cost volume on an icosahedron from the extracted features. The cost volume is regularized using the three-dimensional CrownConv, and the final depth is obtained by depth regression from the cost volume. Our proposed method is robust to camera alignments by using the extrinsic camera parameters; therefore, it can achieve precise depth estimation even when the camera alignment differs from that in the training dataset. We evaluate the proposed model on synthetic datasets and demonstrate its effectiveness. As our proposed method is computationally efficient, the depth is estimated from four fisheye images in less than a second using a laptop with a GPU. Therefore, it is suitable for real-world robotics applications. Our source code is available at https://github.com/matsuren/crownconv360depth.

翻訳日:2022-11-10 15:19:23 公開日:2020-07-14

# 自己発見、自己分類、自己修復による意味論的表現の学習

Learning Semantics-enriched Representation via Self-discovery, Self-classification, and Self-restoration ( http://arxiv.org/abs/2007.06959v1 )

ライセンス: Link先を確認

Fatemeh Haghighi, Mohammad Reza Hosseinzadeh Taher, Zongwei Zhou, Michael B. Gotway, Jianming Liang

(参考訳) 医療画像は、人間の解剖学に関する豊富な意味論と自然に関連し、繰り返し繰り返される解剖学的パターンに反映され、深い意味表現学習を育むユニークな可能性を提供し、異なる医療応用のための意味論的により強力なモデルを提供する。しかし、そのような強いが自由なセマンティックスを医療画像に埋め込むことができるのかは、まだ明らかにされていない。この目的のために,深層モデルを用いて自己発見,自己分類,および医用画像下の解剖学の自己修復により,意味論的に強化された視覚表現を学習し,意味論的に強化された汎用的3dモデルであるセマンティック・ジェネシス(semantic genesis)を実現する。我々は,様々な医学的特徴(ct,mri,x線)の分類と分節化の両方をカバーする6つの異なる対象課題について,自己監督または完全な監督によって,利用可能なすべての事前学習モデルを用いて意味論的生成を検討する。我々の広範な実験は、セマンティック・ジェネシスが3Dの全てをはるかに上回り、2Dのイメージネットに基づくデファクト・トランスファー学習をはるかに上回っていることを示している。これは、医療画像に埋め込まれた一貫した解剖学から得られる豊富な解剖学的パターンから、深層モデルに説得力のある意味表現を学ぶように促すものです。コードと事前トレーニングされたSemantic Genesisはhttps://github.com/JLiangLab/SemanticGenesis で入手できる。

Medical images are naturally associated with rich semantics about the human anatomy, reflected in an abundance of recurring anatomical patterns, offering unique potential to foster deep semantic representation learning and yield semantically more powerful models for different medical applications. But how exactly such strong yet free semantics embedded in medical images can be harnessed for self-supervised learning remains largely unexplored. To this end, we train deep models to learn semantically enriched visual representation by self-discovery, self-classification, and self-restoration of the anatomy underneath medical images, resulting in a semantics-enriched, general-purpose, pre-trained 3D model, named Semantic Genesis. We examine our Semantic Genesis with all the publicly-available pre-trained models, by either self-supervision or fully supervision, on the six distinct target tasks, covering both classification and segmentation in various medical modalities (i.e.,CT, MRI, and X-ray). Our extensive experiments demonstrate that Semantic Genesis significantly exceeds all of its 3D counterparts as well as the de facto ImageNet-based transfer learning in 2D. This performance is attributed to our novel self-supervised learning framework, encouraging deep models to learn compelling semantic representation from abundant anatomical patterns resulting from consistent anatomies embedded in medical images. Code and pre-trained Semantic Genesis are available at https://github.com/JLiangLab/SemanticGenesis .

翻訳日:2022-11-10 15:18:59 公開日:2020-07-14

# Pose2RGB。絶対位置からの深度とrgb画像の生成

Pose2RGBD. Generating Depth and RGB images from absolute positions ( http://arxiv.org/abs/2007.07013v1 )

ライセンス: Link先を確認

Mihai Cristian P\^irvu

(参考訳) 本稿では,ニューラルネットワークを用いてrgbd画像を自動的に生成するコンピュータビジョンとコンピュータグラフィックスの交点における手法を提案する。モデルはテクスチャ(RGB)と構造(Depth)の両方を再構築できなければならないため、メッシュやポイントクラウドのような明示的な表現とは対照的に、シーンの暗黙的な表現を生成する。このプロセスはニューラルレンダリング(Neural rendering)とみなすことができ、この関数 f : Pose -> RGBD は、グラフィックシミュレーションと同様、生成されたシーンをナビゲートするために使用できる。本稿では2つの新しいデータセットについて紹介する。1つは合成データに基づくデータで,もう1つは映像とgps信号のみを用いて,大学キャンパスのドローン飛行から記録する。最後に,Pose2RGBDネットワークをトレーニングするために,ビデオのみからデータセットを生成する教師なしの手法を提案する。コードとデータセットは: https://gitlab.com/mihaicristianpirvu/pose2rgbd。

We propose a method at the intersection of Computer Vision and Computer Graphics fields, which automatically generates RGBD images using neural networks, based on previously seen and synchronized video, depth and pose signals. Since the models must be able to reconstruct both texture (RGB) and structure (Depth), it creates an implicit representation of the scene, as opposed to explicit ones, such as meshes or point clouds. The process can be thought of as neural rendering, where we obtain a function f : Pose -> RGBD, which we can use to navigate through the generated scene, similarly to graphics simulations. We introduce two new datasets, one based on synthetic data with full ground truth information, while the other one being recorded from a drone flight in an university campus, using only video and GPS signals. Finally, we propose a fully unsupervised method of generating datasets from videos alone, in order to train the Pose2RGBD networks. Code and datasets are available at:: https://gitlab.com/mihaicristianpirvu/pose2rgbd.

翻訳日:2022-11-10 15:18:24 公開日:2020-07-14

# 共有潜在ガウス混合モデルによるクロスドメイン医用画像変換

Cross-Domain Medical Image Translation by Shared Latent Gaussian Mixture Model ( http://arxiv.org/abs/2007.07230v1 )

ライセンス: Link先を確認

Yingying Zhu, Youbao Tang, Yuxing Tang, Daniel C. Elton, Sungwon Lee, Perry J. Pickhardt, Ronald M. Summers

(参考訳) 現在のディープラーニングベースのセグメンテーションモデルは、訓練データ不足のため、ドメイン間の疎結合をよく一般化する。実世界の臨床応用では、異なる領域の医用画像が正確な診断に必要とされるため、クロスドメイン画像解析ツールが要求されている。放射線学における重要な例は、非造影CTから造影CTへの一般化である。異なる位相における造影CTは、特定の病理や臓器を増強するために用いられる。多くの既存のクロスドメイン画像-画像翻訳モデルは、大きな臓器のクロスドメインセグメンテーションを改善することが示されている。しかし、これらのモデルには翻訳過程で微細な構造を維持する能力がないため、大動脈や骨盤動脈の小さな石灰化プラークの分節化など、多くの臨床応用において重要である。医用画像翻訳中に微細な構造を保存するため,ガウス混合モデルから共有潜在変数を用いたパッチベースモデルを提案する。画像翻訳フレームワークを,クロスドメイン画像翻訳における最先端手法と比較し,詳細な構造保存に優れた性能を示す。大動脈プラークと膵のセグメンテーションの変換画像検出とセグメンテーションで2つのタスクをこなすことで,本モデルの優れた性能を検証した。生成された画像の品質が向上し、小さな構造を保存する能力が向上するため、セグメンテーション以外の問題にもフレームワークの有用性が拡張されることを期待します。

Current deep learning based segmentation models often generalize poorly between domains due to insufficient training data. In real-world clinical applications, cross-domain image analysis tools are in high demand since medical images from different domains are often needed to achieve a precise diagnosis. An important example in radiology is generalizing from non-contrast CT to contrast enhanced CTs. Contrast enhanced CT scans at different phases are used to enhance certain pathologies or organs. Many existing cross-domain image-to-image translation models have been shown to improve cross-domain segmentation of large organs. However, such models lack the ability to preserve fine structures during the translation process, which is significant for many clinical applications, such as segmenting small calcified plaques in the aorta and pelvic arteries. In order to preserve fine structures during medical image translation, we propose a patch-based model using shared latent variables from a Gaussian mixture model. We compare our image translation framework to several state-of-the-art methods on cross-domain image translation and show our model does a better job preserving fine structures. The superior performance of our model is verified by performing two tasks with the translated images - detection and segmentation of aortic plaques and pancreas segmentation. We expect the utility of our framework will extend to other problems beyond segmentation due to the improved quality of the generated images and enhanced ability to preserve small structures.

翻訳日:2022-11-10 15:17:29 公開日:2020-07-14

# Transposer:Feature Map を変換畳み込みフィルタとして用いたユニバーサルテクスチャ合成

Transposer: Universal Texture Synthesis Using Feature Maps as Transposed Convolution Filter ( http://arxiv.org/abs/2007.07243v1 )

ライセンス: Link先を確認

Guilin Liu, Rohan Taori, Ting-Chun Wang, Zhiding Yu, Shiqiu Liu, Fitsum A. Reda, Karan Sapra, Andrew Tao, Bryan Catanzaro

(参考訳) テクスチャ合成のための従来のcnnは、(de)コンボリューションとアップ/ダウンサンプリングの一連の層で構成されており、各層はローカルに動作し、テクスチャ合成に必要な長期的な構造依存性を捉えることができない。したがって、彼らはしばしば合理的な合成を行うのではなく、単に入力テクスチャを拡大する。妥協として、近年の多くの手法は、同じ単一の(または固定された)テクスチャイメージ上でのトレーニングとテストによって一般化性を犠牲にしており、その結果、目に見えない画像に対して膨大な再トレーニング時間コストが生じる。本研究では,従来のテクスチャ合成における組立・ステーシング操作が,転置畳み込み操作と類似していることから,転置畳み込み操作を用いた新しい方法を提案する。具体的には, 入力テクスチャの符号化特徴マップ全体を変換畳み込みフィルタとして, 自己相関情報をキャプチャする特徴の自己相似性マップを変換畳み込みの入力として直接扱う。このような設計により、トレーニングされたフレームワークは、ほぼリアルタイムで単一のフォワードパスで、見えないテクスチャの合成を一般化することができます。本手法は,様々な指標に基づき,最先端のテクスチャ合成品質を実現する。自己相似性は入力テクスチャの規則的な構造パターンを保存するのに役立つが、我々のフレームワークは、自己相似性マップの代わりに不規則な入力テクスチャのためのランダムノイズマップを変換畳み込み入力として利用することもできる。より多様な結果を得ることができ、また、1回のパスで大きなノイズマップを直接サンプリングすることで、任意に大きなテクスチャ出力を生成することができる。

Conventional CNNs for texture synthesis consist of a sequence of (de)-convolution and up/down-sampling layers, where each layer operates locally and lacks the ability to capture the long-term structural dependency required by texture synthesis. Thus, they often simply enlarge the input texture, rather than perform reasonable synthesis. As a compromise, many recent methods sacrifice generalizability by training and testing on the same single (or fixed set of) texture image(s), resulting in huge re-training time costs for unseen images. In this work, based on the discovery that the assembling/stitching operation in traditional texture synthesis is analogous to a transposed convolution operation, we propose a novel way of using transposed convolution operation. Specifically, we directly treat the whole encoded feature map of the input texture as transposed convolution filters and the features' self-similarity map, which captures the auto-correlation information, as input to the transposed convolution. Such a design allows our framework, once trained, to be generalizable to perform synthesis of unseen textures with a single forward pass in nearly real-time. Our method achieves state-of-the-art texture synthesis quality based on various metrics. While self-similarity helps preserve the input textures' regular structural patterns, our framework can also take random noise maps for irregular input textures instead of self-similarity maps as transposed convolution inputs. It allows to get more diverse results as well as generate arbitrarily large texture outputs by directly sampling large noise maps in a single pass as well.

翻訳日:2022-11-10 15:17:04 公開日:2020-07-14

# 意味セグメンテーションにおける限定データとアノテーションの問題に取り組む

Tackling the Problem of Limited Data and Annotations in Semantic Segmentation ( http://arxiv.org/abs/2007.07357v1 )

ライセンス: Link先を確認

Ahmadreza Jeddi

(参考訳) 本研究では,小さな画像データセット(PASCAL VOC 2012からランダムに選択された1000個の画像)におけるセマンティックセグメンテーション(セマンティックセグメンテーション)について検討した。特に,画像セグメンテーションにおける限られたデータアノテーションの問題に対処するため,画像セグメンテーション性能を向上させるために,様々な事前訓練されたモデルとCRFベースの手法を転送する。この目的のために、RotNet、DeeperCluster、Semi&Weakly Supervised Learning (SWSL)事前訓練されたモデルをDeepLab-v2ベースラインで転送、微調整し、高密度CRFを後処理および損失正規化技術として適用する。私の研究の結果は、この小さなデータセットでは、プリトレーニングされたresnet50 swslモデルを使用することで、imagenetプリトレーニングモデルよりも7.4%優れた結果が得られることを示しています。一方、高密度CRFは非常に有効であることが示され、弱い教師付きトレーニングにおける損失正規化技術や後処理ツールとしての結果が高められる。

In this work, the case of semantic segmentation on a small image dataset (simulated by 1000 randomly selected images from PASCAL VOC 2012), where only weak supervision signals (scribbles from user interaction) are available is studied. Especially, to tackle the problem of limited data annotations in image segmentation, transferring different pre-trained models and CRF based methods are applied to enhance the segmentation performance. To this end, RotNet, DeeperCluster, and Semi&Weakly Supervised Learning (SWSL) pre-trained models are transferred and finetuned in a DeepLab-v2 baseline, and dense CRF is applied both as a post-processing and loss regularization technique. The results of my study show that, on this small dataset, using a pre-trained ResNet50 SWSL model gives results that are 7.4% better than applying an ImageNet pre-trained model; moreover, for the case of training on the full PASCAL VOC 2012 training data, this pre-training approach increases the mIoU results by almost 4%. On the other hand, dense CRF is shown to be very effective as well, enhancing the results both as a loss regularization technique in weakly supervised training and as a post-processing tool.

翻訳日:2022-11-10 15:10:04 公開日:2020-07-14

# 深層畳み込みニューラルネットワークを用いたusgs歴史地図列からの道路交差点点の自動抽出

Automatic extraction of road intersection points from USGS historical map series using deep convolutional neural networks ( http://arxiv.org/abs/2007.07404v1 )

ライセンス: Link先を確認

Mahmoud Saeedimoghaddam and T. F. Stepinski

(参考訳) 道路交差点のデータは様々な地理空間的応用と分析に利用されている。 GIS以前の道路網のデータセットは、歴史印刷された地図の形でしか利用できない。 GISソフトウェアで解析する前には、スキャンして、使用可能なベクトルベースのフォーマットに変換する必要がある。スキャンされた歴史的地図の膨大な量のため、それらをデジタルデータセットに変換する自動化方法が採用される必要がある。このプロセスはコンピュータビジョンアルゴリズムに基づくことが多い。しかし、低品質かつ視覚的に複雑なマップと最適パラメータの設定のための変換精度は、これらのアルゴリズムを使用する際の2つの課題である。本稿では,地域別CNNと呼ばれるオブジェクト検出タスクにディープ畳み込みニューラルネットワークを用いる標準的なパラダイムを用いて,米国各都市の歴史的USGS地図における道路交差点の自動同定を行った。その結果,道路地図の複線地図表現における変換精度は,単線地図よりも高いことがわかった。また、従来のコンピュータビジョンアルゴリズムと比較して、RCNNはより正確な抽出を提供する。最後に, 検出出力における誤差の量は, 地図の複雑さや曖昧さに敏感であるとともに, 内部のRGB組み合わせの数にも敏感であることを示した。

Road intersections data have been used across different geospatial applications and analysis. The road network datasets dating from pre-GIS years are only available in the form of historical printed maps. Before they can be analyzed by a GIS software, they need to be scanned and transformed into the usable vector-based format. Due to the great bulk of scanned historical maps, automated methods of transforming them into digital datasets need to be employed. Frequently, this process is based on computer vision algorithms. However, low conversion accuracy for low quality and visually complex maps and setting optimal parameters are the two challenges of using those algorithms. In this paper, we employed the standard paradigm of using deep convolutional neural network for object detection task named region-based CNN for automatically identifying road intersections in scanned historical USGS maps of several U.S. cities. We have found that the algorithm showed higher conversion accuracy for the double line cartographic representations of the road maps than the single line ones. Also, compared to the majority of traditional computer vision algorithms RCNN provides more accurate extraction. Finally, the results show that the amount of errors in the detection outputs is sensitive to complexity and blurriness of the maps as well as the number of distinct RGB combinations within them.

翻訳日:2022-11-10 15:09:39 公開日:2020-07-14

# 言語・コミュニケーション・社会 : ジェンダーに基づく言語分析

Language, communication and society: a gender based linguistics analysis ( http://arxiv.org/abs/2007.06908v1 )

ライセンス: Link先を確認

P. Cutugno, D. Chiarella, R. Lucentini, L. Marconi and G. Morgavi

(参考訳) 本研究の目的は,言語が思考の鏡であり,偏見であり,文化的ステレオタイプであるとする仮説を支持する証拠を見つけることである。 537名を対象にアンケート調査を行った。回答は、心理的特徴や行動特性の帰属など、性別のステレオタイプが存在するかどうかを調べるために分析されてきた。特に、現代社会における男女の役割を定義する際に現れるステレオタイプ画像が何であるかを識別することを目的としていた。さらに、与えられた結果は、性別のステレオタイプと、それらが生み出す期待が、罰や不平等をもたらすかどうかを理解するための良い出発点となる。もしそうなら、言語とその使用は本質的にジェンダーバイアスを生じさせ、日々の生活でも仕事の設定でも評価に影響します。

The purpose of this study is to find evidence for supporting the hypothesis that language is the mirror of our thinking, our prejudices and cultural stereotypes. In this analysis, a questionnaire was administered to 537 people. The answers have been analysed to see if gender stereotypes were present such as the attribution of psychological and behavioural characteristics. In particular, the aim was to identify, if any, what are the stereotyped images, which emerge in defining the roles of men and women in modern society. Moreover, the results given can be a good starting point to understand if gender stereotypes, and the expectations they produce, can result in penalization or inequality. If so, the language and its use would create inherently a gender bias, which influences evaluations both in work settings both in everyday life.

翻訳日:2022-11-10 15:09:22 公開日:2020-07-14

# ポートノイズ調査に最も適した調査を定義するためのアンケート調査分析

Questionnaire analysis to define the most suitable survey for port-noise investigation ( http://arxiv.org/abs/2007.06915v1 )

ライセンス: Link先を確認

Andrea Cerniglia, Davide Chiarella, Paola Cutugno, Lucia Marconi, Anna Magrini, Gelsomina Di Feo, Melissa Ferretti

(参考訳) 港湾とロジスティックプラットフォームの間の地域に影響を与える高レベルの騒音汚染は、異なる観点から直面することができる問題である。音響モニタリング,マッピング,短期計測,港湾および道路交通流の解析は,この問題のより良い管理のために提案すべき戦略について有用な指標を与えることができる。バックポート地域の騒音に曝された住民へのアンケート作成による調査活動は,主観的視点の理解を深める上で有用である。本論文は,国際的に提案されている主観的調査のためのアンケートデータベースの一部として選択された,特定の研究に適した質問のサンプルを分析する。第1次データ収集キャンペーンの予備結果は,調査に使用する数,質問の種類,サンプルノイズの種類の妥当性を検証するために検討された。調査はTRIPLOプロジェクト(TRansports and Innovative sustainable connection between Ports and LOGistic platform)に分散するよう最適化される。本調査の結果は,音響モニタリングと組み合わせて行った言語調査の出発点となり,個人の感情と技術的側面との関係の理解を深める。

The high level of noise pollution affecting the areas between ports and logistic platforms represents a problem that can be faced from different points of view. Acoustic monitoring, mapping, short-term measurements, port and road traffic flows analyses can give useful indications on the strategies to be proposed for a better management of the problem. A survey campaign through the preparation of questionnaires to be submitted to the population exposed to noise in the back-port areas will help to better understand the subjective point of view. The paper analyses a sample of questions suitable for the specific research, chosen as part of the wide database of questionnaires internationally proposed for subjective investigations. The preliminary results of a first data collection campaign are considered to verify the adequacy of the number, the type of questions, and the type of sample noise used for the survey. The questionnaire will be optimized to be distributed in the TRIPLO project (TRansports and Innovative sustainable connections between Ports and LOgistic platforms). The results of this survey will be the starting point for the linguistic investigation carried out in combination with the acoustic monitoring, to improve understanding the connections between personal feeling and technical aspects.

翻訳日:2022-11-10 15:09:08 公開日:2020-07-14

# Covidex:COVID-19オープン研究データセットのニューラルネットワークランキングモデルとキーワード検索基盤

Covidex: Neural Ranking Models and Keyword Search Infrastructure for the COVID-19 Open Research Dataset ( http://arxiv.org/abs/2007.07846v1 )

ライセンス: Link先を確認

Edwin Zhang, Nikhil Gupta, Raphael Tang, Xiao Han, Ronak Pradeep, Kuang Lu, Yue Zhang, Rodrigo Nogueira, Kyunghyun Cho, Hui Fang, Jimmy Lin

(参考訳) 我々は、最新のニューラルネットワークランキングモデルを利用して、Allen Institute for AIがキュレートしたCOVID-19 Open Research Datasetに情報アクセスを提供する検索エンジンであるCovidexを紹介する。当社のシステムは,2020年3月下旬以降,オンラインでユーザに提供する。 covidexは、現在進行中の世界的なパンデミックに取り組むドメインエキスパートを支援する技術を開発するための、3段階の戦略のユーザアプリケーションコンポーネントです。さらに、成熟したフュージョンベースの手法を利用する堅牢で使いやすいキーワード検索インフラストラクチャや、他のアプリケーションに組み込むことのできるスタンドアロンのニューラルネットワークランキングモデルも提供しています。私たちのインフラとベースラインは多くの参加者によって採用されています。第3ラウンドでは、前回のトレーニングデータと第2のフルオートマチックランを活用した最高スケアランを報告します。

We present Covidex, a search engine that exploits the latest neural ranking models to provide information access to the COVID-19 Open Research Dataset curated by the Allen Institute for AI. Our system has been online and serving users since late March 2020. The Covidex is the user application component of our three-pronged strategy to develop technologies for helping domain experts tackle the ongoing global pandemic. In addition, we provide robust and easy-to-use keyword search infrastructure that exploits mature fusion-based methods as well as standalone neural ranking models that can be incorporated into other applications. These techniques have been evaluated in the ongoing TREC-COVID challenge: Our infrastructure and baselines have been adopted by many participants, including some of the highest-scoring runs in rounds 1, 2, and 3. In round 3, we report the highest-scoring run that takes advantage of previous training data and the second-highest fully automatic run.

翻訳日:2022-11-10 15:07:59 公開日:2020-07-14

# クロスモーダル変調と選択によるRGB-D能動物体検出

RGB-D Salient Object Detection with Cross-Modality Modulation and Selection ( http://arxiv.org/abs/2007.07051v1 )

ライセンス: Link先を確認

Chongyi Li and Runmin Cong and Yongri Piao and Qianqian Xu and Chen Change Loy

(参考訳) 本稿では, RGB-D salient Object Detection (SOD) において, モーダリティの相互補完性を段階的に統合し, 改良する有効な方法を提案する。提案するネットワークは主に2つの課題を解決している。 1)RGB画像とその対応する深度マップからの補完情報を効果的に統合する方法、及び 2) より衛生的な特徴を適応的に選択する方法。まず,rgb-dデータの相補関係をモデル化する奥行き特徴を予め考慮し,特徴表現を強調するクロスモダリティ特徴変調(cmfm)モジュールを提案する。第2に,サリエンシー関連特徴を選択し,下位特徴を抑圧する適応特徴選択(afs)モジュールを提案する。 AFSモジュールは、自己モダリティとチャネル特徴の相互依存性を考慮した多モード空間的特徴融合を利用する。第3に,saliency-guided position-edge attention(sg-pea)モジュールを使用して,ネットワークがsariency-related regionに集中するよう促す。上記のモジュール全体であるcmMSブロック(英語版)は、粗い微細な方法での塩分濃度特性の洗練を促進する。ボトムアップ推論と組み合わせて、改良されたサリエンシ機能は正確かつエッジ保存のSODを可能にする。大規模な実験により、我々のネットワークは6つのRGB-D SODベンチマークで最先端の精度検出器より優れていることが示された。

We present an effective method to progressively integrate and refine the cross-modality complementarities for RGB-D salient object detection (SOD). The proposed network mainly solves two challenging issues: 1) how to effectively integrate the complementary information from RGB image and its corresponding depth map, and 2) how to adaptively select more saliency-related features. First, we propose a cross-modality feature modulation (cmFM) module to enhance feature representations by taking the depth features as prior, which models the complementary relations of RGB-D data. Second, we propose an adaptive feature selection (AFS) module to select saliency-related features and suppress the inferior ones. The AFS module exploits multi-modality spatial feature fusion with the self-modality and cross-modality interdependencies of channel features are considered. Third, we employ a saliency-guided position-edge attention (sg-PEA) module to encourage our network to focus more on saliency-related regions. The above modules as a whole, called cmMS block, facilitates the refinement of saliency features in a coarse-to-fine fashion. Coupled with a bottom-up inference, the refined saliency features enable accurate and edge-preserving SOD. Extensive experiments demonstrate that our network outperforms state-of-the-art saliency detectors on six popular RGB-D SOD benchmarks.

翻訳日:2022-11-10 15:02:18 公開日:2020-07-14

# 書き手識別と書き手検索のための再ランク付け

Re-ranking for Writer Identification and Writer Retrieval ( http://arxiv.org/abs/2007.07101v1 )

ライセンス: Link先を確認

Simon Jordan, Mathias Seuret, Pavel Kr\'al, Ladislav Lenc, Ji\v{r}\'i Mart\'inek, Barbara Wiermann, Tobias Schwinger, Andreas Maier, Vincent Christlein

(参考訳) 自動ライタ識別は文書解析において一般的な問題である。 state-of-the-artメソッドは通常、従来的あるいはディープラーニングベースのテクニックによる特徴抽出ステップにフォーカスする。検索問題では、再ランク付けは結果を改善するのによく使われる手法である。ランク付けされた結果に含まれる知識を用いて、初期ランク付け結果を洗練する。 g. 最寄りの近隣関係を利用する。私たちの知る限りでは、再ランク付けはライターの識別/再利用には使われていません。考えられる理由は、公開利用可能なベンチマークデータセットには、書き込み毎のサンプルがわずかしかないため、再ランク付けが期待できないことだ。著者1人あたりのサンプル数が少ない場合でも,k-相反的近傍関係に基づく再ランク付けが,著者識別に有利であることを示す。これらの相互関係は、もともと提案されたような新しいベクトルにエンコードするか、クエリ拡張の観点でそれらを統合するかの2つの方法で利用します。両手法が3つの著者識別データセット上でmAPの基準値よりも優れていることを示す。

Automatic writer identification is a common problem in document analysis. State-of-the-art methods typically focus on the feature extraction step with traditional or deep-learning-based techniques. In retrieval problems, re-ranking is a commonly used technique to improve the results. Re-ranking refines an initial ranking result by using the knowledge contained in the ranked result, e. g., by exploiting nearest neighbor relations. To the best of our knowledge, re-ranking has not been used for writer identification/retrieval. A possible reason might be that publicly available benchmark datasets contain only few samples per writer which makes a re-ranking less promising. We show that a re-ranking step based on k-reciprocal nearest neighbor relationships is advantageous for writer identification, even if only a few samples per writer are available. We use these reciprocal relationships in two ways: encode them into new vectors, as originally proposed, or integrate them in terms of query-expansion. We show that both techniques outperform the baseline results in terms of mAP on three writer identification datasets.

翻訳日:2022-11-10 15:00:19 公開日:2020-07-14

# 画像生成と編集のためのアートワークフローのモデリング

Modeling Artistic Workflows for Image Generation and Editing ( http://arxiv.org/abs/2007.07238v1 )

ライセンス: Link先を確認

Hung-Yu Tseng, Matthew Fisher, Jingwan Lu, Yijun Li, Vladimir Kim, Ming-Hsuan Yang

(参考訳) 人々は、デザイン全体を伝える複数のステージを含む芸術的なワークフローに従うことで、しばしばアートを作成する。アーティストが初期の決定を修正したい場合、この新たな決定を最終的な作品に広めるために重要な作業が必要となる。上記の観察に動機づけられ,既存の芸術作品の多段階画像生成と多段階画像編集の両方を可能にする,所定の芸術的ワークフローに従う生成モデルを提案する。さらに, 編集シナリオでは, モデルが生成した編集画像が元の画像と密接に一致するように, 学習に基づく正規化とともに最適化プロセスを導入する。 3つの異なる芸術的データセットの質的および定量的な結果は、画像生成と編集の両方におけるフレームワークの有効性を示す。

People often create art by following an artistic workflow involving multiple stages that inform the overall design. If an artist wishes to modify an earlier decision, significant work may be required to propagate this new decision forward to the final artwork. Motivated by the above observations, we propose a generative model that follows a given artistic workflow, enabling both multi-stage image generation as well as multi-stage image editing of an existing piece of art. Furthermore, for the editing scenario, we introduce an optimization process along with learning-based regularization to ensure the edited image produced by the model closely aligns with the originally provided image. Qualitative and quantitative results on three different artistic datasets demonstrate the effectiveness of the proposed framework on both image generation and editing tasks.

翻訳日:2022-11-10 14:58:58 公開日:2020-07-14

# JSENet:3Dポイントクラウドのための共同セマンティックセグメンテーションとエッジ検出ネットワーク

JSENet: Joint Semantic Segmentation and Edge Detection Network for 3D Point Clouds ( http://arxiv.org/abs/2007.06888v1 )

ライセンス: Link先を確認

Zeyu Hu, Mingmin Zhen, Xuyang Bai, Hongbo Fu and Chiew-lan Tai

(参考訳) セマンティックセグメンテーションとセマンティックエッジ検出は、コンピュータビジョンにおける密接な関係を持つ2つの双対問題と見なすことができる。学習に基づく3Dセマンティックセグメンテーション法の急速な進化にもかかわらず、3Dセマンティックエッジ検出器の学習には注意が向けられていない。本稿では,3次元意味エッジ検出タスクを初めて取り上げ,これら2つのタスクを共同で実行する新たな2ストリーム完全畳み込みネットワークを提案する。特に,両タスクの性能向上のために,領域情報とエッジ情報を明示的に関連付ける共同改良モジュールを設計する。さらに,ネットワークが境界を良くして意味的セグメンテーション結果を生成することを促す新しい損失関数を提案する。 S3DISおよびScanNetデータセットの大規模評価により,本手法はセマンティックセグメンテーションの最先端手法よりも高い性能を示し,セマンティックエッジ検出のベースライン手法よりも優れていた。コードリリース:https://github.com/hzykent/JSENet

Semantic segmentation and semantic edge detection can be seen as two dual problems with close relationships in computer vision. Despite the fast evolution of learning-based 3D semantic segmentation methods, little attention has been drawn to the learning of 3D semantic edge detectors, even less to a joint learning method for the two tasks. In this paper, we tackle the 3D semantic edge detection task for the first time and present a new two-stream fully-convolutional network that jointly performs the two tasks. In particular, we design a joint refinement module that explicitly wires region information and edge information to improve the performances of both tasks. Further, we propose a novel loss function that encourages the network to produce semantic segmentation results with better boundaries. Extensive evaluations on S3DIS and ScanNet datasets show that our method achieves on par or better performance than the state-of-the-art methods for semantic segmentation and outperforms the baseline methods for semantic edge detection. Code release: https://github.com/hzykent/JSENet

翻訳日:2022-11-10 14:53:22 公開日:2020-07-14

# 歴史的文書化のための共同レイアウト解析・文字検出・認識

Joint Layout Analysis, Character Detection and Recognition for Historical Document Digitization ( http://arxiv.org/abs/2007.06890v1 )

ライセンス: Link先を確認

Weihong Ma, Hesuo Zhang, Lianwen Jin, Sihang Wu, Jiapeng Wang, Yongpan Wang

(参考訳) 本稿では,正しい読み順に従って履歴文書を復元するためのエンドツーエンドの学習フレームワークを提案する。このフレームワークでは、キャラクタブランチとレイアウトブランチという2つのブランチが特徴抽出ネットワークの背後に追加される。文字ブランチは、文書画像中の個々の文字をローカライズし、同時に認識する。次に,テキスト行にグループ化するための後処理手法を採用する。完全な畳み込みネットワークに基づくレイアウト分岐は、バイナリマスクを出力する。次に,バイナリマスクの行検出にhough変換を使用し,文字結果とレイアウト情報を組み合わせて文書コンテンツを復元する。これら2つの枝は並行して訓練でき、容易に訓練できる。さらに,認識誤差を最小化する再スコア機構を提案する。中国の歴史文書MTHv2データセットの実験結果から,提案手法の有効性が示された。

In this paper, we propose an end-to-end trainable framework for restoring historical documents content that follows the correct reading order. In this framework, two branches named character branch and layout branch are added behind the feature extraction network. The character branch localizes individual characters in a document image and recognizes them simultaneously. Then we adopt a post-processing method to group them into text lines. The layout branch based on fully convolutional network outputs a binary mask. We then use Hough transform for line detection on the binary mask and combine character results with the layout information to restore document content. These two branches can be trained in parallel and are easy to train. Furthermore, we propose a re-score mechanism to minimize recognition error. Experiment results on the extended Chinese historical document MTHv2 dataset demonstrate the effectiveness of the proposed framework.

翻訳日:2022-11-10 14:52:40 公開日:2020-07-14

# 人-物体相互作用検出のためのグラフに基づく対話型推論

A Graph-based Interactive Reasoning for Human-Object Interaction Detection ( http://arxiv.org/abs/2007.06925v1 )

ライセンス: Link先を確認

Dongming Yang and Yuexian Zou

(参考訳) 人間-物体相互作用(Human-Object Interaction, HOI)検出は,<人,動詞,オブジェクト>の推論によって,人間が周囲の物体とどのように相互作用するかを学ぶ。しかし、最近のhoi検出手法は、主に追加のアノテーション(人間のポーズなど)と、畳み込みを超えて強力な対話的推論を無視する。本稿では,対話型意味論を視覚的対象に対して効果的に活用する,インタラクティブグラフ(in-Graph)と呼ばれる新しいグラフベースの対話型推論モデルを提案する。提案モデルは,コンボリューション空間からグラフベースのセマンティック空間へ関連ターゲットをマッピングするプロジェクト関数と,すべてのノード間のセマンティクスを伝播するメッセージパッシングプロセスと,理由付けられたノードを畳み込み空間に変換する更新関数とから構成される。さらに,新たなフレームワークを構築して,HOI,すなわち-GraphNetを検出する。このフレームワークは、それぞれインスタンス機能を使用してHOIを推論する以外に、2レベルイングラフ、すなわちシーンワイドとインスタンスワイドイングラフを統合することで、視覚的ターゲット間のペアワイズなセマンティクスを動的に解析する。私たちのフレームワークはエンドツーエンドでトレーニング可能で、人間のポーズのような高価なアノテーションは不要です。 V-COCOとHICO-DETのベンチマークにおいて,提案手法が既存のHOI検出法より優れ,ベースラインが約9.4%,15%向上し,HOI検出の有効性が検証された。

Human-Object Interaction (HOI) detection devotes to learn how humans interact with surrounding objects via inferring triplets of < human, verb, object >. However, recent HOI detection methods mostly rely on additional annotations (e.g., human pose) and neglect powerful interactive reasoning beyond convolutions. In this paper, we present a novel graph-based interactive reasoning model called Interactive Graph (abbr. in-Graph) to infer HOIs, in which interactive semantics implied among visual targets are efficiently exploited. The proposed model consists of a project function that maps related targets from convolution space to a graph-based semantic space, a message passing process propagating semantics among all nodes and an update function transforming the reasoned nodes back to convolution space. Furthermore, we construct a new framework to assemble in-Graph models for detecting HOIs, namely in-GraphNet. Beyond inferring HOIs using instance features respectively, the framework dynamically parses pairwise interactive semantics among visual targets by integrating two-level in-Graphs, i.e., scene-wide and instance-wide in-Graphs. Our framework is end-to-end trainable and free from costly annotations like human pose. Extensive experiments show that our proposed framework outperforms existing HOI detection methods on both V-COCO and HICO-DET benchmarks and improves the baseline about 9.4% and 15% relatively, validating its efficacy in detecting HOIs.

翻訳日:2022-11-10 14:52:10 公開日:2020-07-14

# 特徴等化をもつ相互エンコーダデコーダによる画像の描画再考

Rethinking Image Inpainting via a Mutual Encoder-Decoder with Feature Equalizations ( http://arxiv.org/abs/2007.06929v1 )

ライセンス: Link先を確認

Hongyu Liu, Bin Jiang, Yibing Song, Wei Huang, and Chao Yang

(参考訳) ディープエンコーダデコーダベースのcnnは、ホール充填のための高度なイメージインペインティング手法を備えている。既存の手法では、ホール領域の構造とテクスチャを段階的に復元するが、通常は2つのエンコーダデコーダを使用して別々のリカバリを行う。各エンコーダのCNN機能は、それら全体を考慮せずに、欠落した構造やテクスチャをキャプチャする。これらのエンコーダの不十分な利用により、構造とテクスチャの回復性能が制限される。本稿では,相互エンコーダとデコーダのCNNを用いて,両者の結合回復を提案する。入力画像の構造とテクスチャをそれぞれ表現するために,エンコーダの深層と浅層からのcnn機能を使用する。深層特徴は構造分岐に送られ、浅層特徴はテクスチャ分岐に送られる。各ブランチでは、CNNの機能の複数のスケールで穴を埋めます。両方のブランチから満たされたCNN機能は連結され、その後等化される。特徴均等化の際,まずチャネルの注意を尊重し,空間等化を実現するための双方向伝搬活性化関数を提案する。この目的のために、構造とテクスチャの満たされたCNN特徴は、すべての特徴レベルで画像コンテンツを表現するのに相互に有利である。我々は、スキップ接続による出力画像生成のためのデコーダ機能を補うために等化機能を使用する。評価実験の結果,提案手法は構造やテクスチャの復元に有効であり,最先端のアプローチに対して良好に機能することがわかった。

Deep encoder-decoder based CNNs have advanced image inpainting methods for hole filling. While existing methods recover structures and textures step-by-step in the hole regions, they typically use two encoder-decoders for separate recovery. The CNN features of each encoder are learned to capture either missing structures or textures without considering them as a whole. The insufficient utilization of these encoder features limit the performance of recovering both structures and textures. In this paper, we propose a mutual encoder-decoder CNN for joint recovery of both. We use CNN features from the deep and shallow layers of the encoder to represent structures and textures of an input image, respectively. The deep layer features are sent to a structure branch and the shallow layer features are sent to a texture branch. In each branch, we fill holes in multiple scales of the CNN features. The filled CNN features from both branches are concatenated and then equalized. During feature equalization, we reweigh channel attentions first and propose a bilateral propagation activation function to enable spatial equalization. To this end, the filled CNN features of structure and texture mutually benefit each other to represent image content at all feature levels. We use the equalized feature to supplement decoder features for output image generation through skip connections. Experiments on the benchmark datasets show the proposed method is effective to recover structures and textures and performs favorably against state-of-the-art approaches.

翻訳日:2022-11-10 14:51:39 公開日:2020-07-14

# 高精度スケール推定のための適応的提案選択による相関フィルタ追跡

Correlation filter tracking with adaptive proposal selection for accurate scale estimation ( http://arxiv.org/abs/2007.07018v1 )

ライセンス: Link先を確認

Luo Xiong, Yanjie Liang, Yan Yan, Hanzi Wang

(参考訳) 近年,相関フィルタを用いた検出手法が最先端の追跡結果を達成している。しかし、提案生成器によって与えられる多くの冗長な提案は、これらのトラッカーの性能と速度を低下させる可能性がある。本稿では,視覚物体追跡のためのスケール変動問題に対処するために,少数の高品質提案を生成できる適応型提案選択アルゴリズムを提案する。具体的には、まず、HSV色空間における色ヒストグラムを用いて、インスタンス(例えば、最初のフレームにおける初期ターゲットと、前のフレームにおける予測ターゲット)と提案を行う。そして、色相似性に基づく適応戦略を定式化し、高品質の提案を選択する。さらに,提案する適応型提案選択アルゴリズムを細かな深層特徴と統合することで,トラッカの一般化と効率性を検証する。 2つのベンチマークデータセットの実験では、提案アルゴリズムがいくつかの最先端トラッカーに対して好適に動作することを示した。

Recently, some correlation filter based trackers with detection proposals have achieved state-of-the-art tracking results. However, a large number of redundant proposals given by the proposal generator may degrade the performance and speed of these trackers. In this paper, we propose an adaptive proposal selection algorithm which can generate a small number of high-quality proposals to handle the problem of scale variations for visual object tracking. Specifically, we firstly utilize the color histograms in the HSV color space to represent the instances (i.e., the initial target in the first frame and the predicted target in the previous frame) and proposals. Then, an adaptive strategy based on the color similarity is formulated to select high-quality proposals. We further integrate the proposed adaptive proposal selection algorithm with coarse-to-fine deep features to validate the generalization and efficiency of the proposed tracker. Experiments on two benchmark datasets demonstrate that the proposed algorithm performs favorably against several state-of-the-art trackers.

翻訳日:2022-11-10 14:50:21 公開日:2020-07-14

# 一般属性予測のための教師学生ネットワークによる半教師付き学習

Semi-supervised Learning with a Teacher-student Network for Generalized Attribute Prediction ( http://arxiv.org/abs/2007.06769v1 )

ライセンス: Link先を確認

Minchul Shin

(参考訳) 本稿では,視覚特性予測問題を解くための半教師付き学習について述べる。視覚アルゴリズムの多くの応用において、物体の視覚特性の正確な認識は重要であるが、それでも難しい。これは属性のクラス階層の定義があいまいであるため、トレーニングデータは必然的にクラスの不均衡とラベルのスパーシティに苦しむため、効果的なアノテーションが欠如している。直感的な解決策は、ラベルのない画像を利用して画像表現を効果的に学習する方法を見つけることである。そこで本研究では,マルチタスク学習と半教師学習の蒸留に触発されたマルチティーチャー・シングルスチューデント(mtss)アプローチを提案する。我々のMTSSはラベル埋め込み技術を用いて教師ネットワークと呼ばれるタスク固有のドメインエキスパートを学習し、モデルにドメインエキスパートが学習した分布を模倣するように強制することで学生ネットワークと呼ばれる統一モデルを学ぶ。提案手法は, ファッション属性予測のための様々なベンチマークにおいて, 競争性能を達成するだけでなく, ドメイン間適応性やロバスト性も向上することを示した。

This paper presents a study on semi-supervised learning to solve the visual attribute prediction problem. In many applications of vision algorithms, the precise recognition of visual attributes of objects is important but still challenging. This is because defining a class hierarchy of attributes is ambiguous, so training data inevitably suffer from class imbalance and label sparsity, leading to a lack of effective annotations. An intuitive solution is to find a method to effectively learn image representations by utilizing unlabeled images. With that in mind, we propose a multi-teacher-single-student (MTSS) approach inspired by the multi-task learning and the distillation of semi-supervised learning. Our MTSS learns task-specific domain experts called teacher networks using the label embedding technique and learns a unified model called a student network by forcing a model to mimic the distributions learned by domain experts. Our experiments demonstrate that our method not only achieves competitive performance on various benchmarks for fashion attribute prediction, but also improves robustness and cross-domain adaptability for unseen domains.

翻訳日:2022-11-10 14:43:45 公開日:2020-07-14

# Face to Purchase: 顔構造と行動特性を組み込んだ消費者選択予測

Face to Purchase: Predicting Consumer Choices with Structured Facial and Behavioral Traits Embedding ( http://arxiv.org/abs/2007.06842v1 )

ライセンス: Link先を確認

Zhe Liu, Xianzhi Wang, Lina Yao, Jake An, Lei Bai, Ee-Peng Lim

(参考訳) 消費者の購買行動を予測することは、eコマースのターゲット広告や販売促進にとって重要である。人間の顔は、消費者の性格や行動特性に関する洞察を得るための貴重な情報源である。しかし、消費者の顔は、これまでの研究ではほとんど研究されておらず、既存の顔関連研究は、顔データから学ぶことのビジネス的重要性を無視しながら、パーソナリティ特性のようなハイレベルな特徴に焦点を当てている。顔の特徴や購買履歴から消費者の購買予測を行う。我々は,階層的埋め込みネットワークに基づく半教師付きモデルを設計し,消費者の高レベルな特徴を抽出し,消費者の最上位の購入先を予測する。実世界のデータセットを用いた実験結果から,消費者の購買行動予測に顔情報を導入する効果が示された。

Predicting consumers' purchasing behaviors is critical for targeted advertisement and sales promotion in e-commerce. Human faces are an invaluable source of information for gaining insights into consumer personality and behavioral traits. However, consumer's faces are largely unexplored in previous research, and the existing face-related studies focus on high-level features such as personality traits while neglecting the business significance of learning from facial data. We propose to predict consumers' purchases based on their facial features and purchasing histories. We design a semi-supervised model based on a hierarchical embedding network to extract high-level features of consumers and to predict the top-$N$ purchase destinations of a consumer. Our experimental results on a real-world dataset demonstrate the positive effect of incorporating facial information in predicting consumers' purchasing behaviors.

翻訳日:2022-11-10 14:42:42 公開日:2020-07-14

# 社会的・文脈的に認知される人間の動きとポーズ予測

Socially and Contextually Aware Human Motion and Pose Forecasting ( http://arxiv.org/abs/2007.06843v1 )

ライセンス: Link先を確認

Vida Adeli, Ehsan Adeli, Ian Reid, Juan Carlos Niebles, Hamid Rezatofighi

(参考訳) 人間と対話しながらスムーズでシームレスなロボットナビゲーションは、人間の動きを予測することに依存する。このような人間のダイナミクスの予測には、人間の軌跡(球運動)や詳細な体の動き(局所運動)をモデル化することが多い。先行研究は通常、地域と世界の動きを別々に取り組んだ。本稿では,人間の動作(あるいは軌道)と身体骨格のポーズ予測の両方を統一されたエンドツーエンドパイプラインで行うための新しい枠組みを提案する。この現実的な問題に対処するため、我々は、この予測タスクの重要な手がかりとして、シーンと社会的文脈の両方を、提案フレームワークに組み込むことを検討する。この2つのタスクを一つにまとめます一共有GRUエンコーダ及び共有GRUエンコーダを用いてその履歴を符号化すること二基準を損失として適用し、各業務における誤差の源泉を単一の距離として連続的に測定すること。次に,映像データの時空間表現を符号化することでシーンコンテキストを組み込む。また,ソーシャル・プーリング・レイヤを用いて,人物の動作とポーズから共同特徴表現を生成することにより,社会的手がかりも含んでいる。最後に、GRUベースのデコーダを使用して、動きと骨格のポーズを予測します。提案手法は,2つのソーシャルデータセットのベースラインよりも優れた性能を示す。

Smooth and seamless robot navigation while interacting with humans depends on predicting human movements. Forecasting such human dynamics often involves modeling human trajectories (global motion) or detailed body joint movements (local motion). Prior work typically tackled local and global human movements separately. In this paper, we propose a novel framework to tackle both tasks of human motion (or trajectory) and body skeleton pose forecasting in a unified end-to-end pipeline. To deal with this real-world problem, we consider incorporating both scene and social contexts, as critical clues for this prediction task, into our proposed framework. To this end, we first couple these two tasks by i) encoding their history using a shared Gated Recurrent Unit (GRU) encoder and ii) applying a metric as loss, which measures the source of errors in each task jointly as a single distance. Then, we incorporate the scene context by encoding a spatio-temporal representation of the video data. We also include social clues by generating a joint feature representation from motion and pose of all individuals from the scene using a social pooling layer. Finally, we use a GRU based decoder to forecast both motion and skeleton pose. We demonstrate that our proposed framework achieves a superior performance compared to several baselines on two social datasets.

翻訳日:2022-11-10 14:42:29 公開日:2020-07-14

# 動的シーン再構築のためのトポロジー・チェンジ対応ボリュームフュージョン

Topology-Change-Aware Volumetric Fusion for Dynamic Scene Reconstruction ( http://arxiv.org/abs/2007.06853v1 )

ライセンス: Link先を確認

Chao Li and Xiaohu Guo

(参考訳) トポロジー変化は動的シーンの4次元再構成において難しい問題である。古典的な体積融合に基づくフレームワークでは、メッシュは通常TSDF体積から標準表面表現として抽出され、変形場の推定に役立ちます。しかし、表面メッシュが固定接続性を持つため、表面変形グラフと埋め込み変形グラフ(EDG)の表現はトポロジ上の矛盾をもたらすが、変形場は不連続である。本稿では, TSDFとEDGの両方に非多様体体積格子の新たな構造を導入し, セル分割・複製による接続更新を可能にすることにより, トポロジ変化下での動的シーンの4次元再構成を実現する。実験では、最先端手法と比較して、トポロジー変化の動的シーンに対する説得力のある再構成結果を示す。

Topology change is a challenging problem for 4D reconstruction of dynamic scenes. In the classic volumetric fusion-based framework, a mesh is usually extracted from the TSDF volume as the canonical surface representation to help estimating deformation field. However, the surface and Embedded Deformation Graph (EDG) representations bring conflicts under topology changes since the surface mesh has fixed-connectivity but the deformation field can be discontinuous. In this paper, the classic framework is re-designed to enable 4D reconstruction of dynamic scene under topology changes, by introducing a novel structure of Non-manifold Volumetric Grid to the re-design of both TSDF and EDG, which allows connectivity updates by cell splitting and replication. Experiments show convincing reconstruction results for dynamic scenes of topology changes, as compared to the state-of-the-art methods.

翻訳日:2022-11-10 14:42:09 公開日:2020-07-14

# 動作境界検出による過分割誤りの軽減

Alleviating Over-segmentation Errors by Detecting Action Boundaries ( http://arxiv.org/abs/2007.06866v1 )

ライセンス: Link先を確認

Yuchi Ishikawa, Seito Kasai, Yoshimitsu Aoki, Hirokatsu Kataoka

(参考訳) 本稿では,時間的行動セグメント化作業,すなわちアクションセグメンテーション・リファインメント・フレームワーク(ASRF)の効果的なフレームワークを提案する。我々のモデルアーキテクチャは、長期的特徴抽出器と、アクションセグメンテーションブランチ(ASB)と境界回帰ブランチ(BRB)の2つのブランチから構成される。長期特徴抽出器は、広時間受容野を有する2つの枝に共通特徴を提供する。 ASBはビデオフレームをアクションクラスに分類し、BRBはアクション境界確率を回帰する。 BRBが予測した動作境界はASBの出力を洗練し、性能が大幅に向上した。私たちの貢献は3倍です。 i) 時間的行動セグメント化のためのフレームワークであるASRFを提案し, 時間的行動セグメント化をフレーム単位の行動分類と行動境界回帰に分割する。我々のフレームワークは、予測されたアクション境界を用いてアクションクラスのフレームレベル仮説を洗練する。二) 行動確率の遷移を円滑にするための損失関数を提案し, 時間的行動区分のための各種損失関数の組み合わせを分析する。 (iii)本フレームワークは,3つの難題データセットにおいて最先端手法を上回り,セグメント編集距離で最大13.7%,セグメントf1スコアで最大16.1%の改善を提供する。私たちのコードはまもなく公開されます。

We propose an effective framework for the temporal action segmentation task, namely an Action Segment Refinement Framework (ASRF). Our model architecture consists of a long-term feature extractor and two branches: the Action Segmentation Branch (ASB) and the Boundary Regression Branch (BRB). The long-term feature extractor provides shared features for the two branches with a wide temporal receptive field. The ASB classifies video frames with action classes, while the BRB regresses the action boundary probabilities. The action boundaries predicted by the BRB refine the output from the ASB, which results in a significant performance improvement. Our contributions are three-fold: (i) We propose a framework for temporal action segmentation, the ASRF, which divides temporal action segmentation into frame-wise action classification and action boundary regression. Our framework refines frame-level hypotheses of action classes using predicted action boundaries. (ii) We propose a loss function for smoothing the transition of action probabilities, and analyze combinations of various loss functions for temporal action segmentation. (iii) Our framework outperforms state-of-the-art methods on three challenging datasets, offering an improvement of up to 13.7% in terms of segmental edit distance and up to 16.1% in terms of segmental F1 score. Our code will be publicly available soon.

翻訳日:2022-11-10 14:41:52 公開日:2020-07-14

# TridentAlignとコンテキスト埋め込みによる視覚追跡

Visual Tracking by TridentAlign and Context Embedding ( http://arxiv.org/abs/2007.06887v1 )

ライセンス: Link先を確認

Janghoon Choi, Junseok Kwon, Kyoung Mu Lee

(参考訳) シームズネットワークに基づく視覚追跡手法の最近の進歩は、多数のトラッキングベンチマークで高いパフォーマンスを実現している。しかし、ターゲットオブジェクトと類似のカテゴリを持つイントラクタオブジェクトの広範なスケールのバリエーションは、常に視覚的トラッキングの課題を提起している。このような持続的な問題に対処するために,Siamese ネットワークに基づく視覚的トラッキングのための新しい TridentAlign とコンテキスト埋め込みモジュールを提案する。 tridentalignモジュールは、ターゲットの広範囲なバリエーションや大きな変形への適応性を促進し、対象オブジェクトの特徴表現を複数の空間次元にプールし、特徴ピラミッドを形成する。一方、コンテキスト埋め込みモジュールは、オブジェクト間のグローバルなコンテキスト情報を考慮し、ターゲットを邪魔対象から識別することを目的としている。コンテキスト埋め込みモジュールは、所定のフレームのグローバルコンテキスト情報を、最終分類段階で活用できるように、ローカルな特徴表現に抽出して埋め込みます。複数のベンチマークデータセットから得られた実験結果から,提案トラッカーの性能は最先端トラッカーと同等であり,提案トラッカーはリアルタイムに動作していることがわかった。

Recent advances in Siamese network-based visual tracking methods have enabled high performance on numerous tracking benchmarks. However, extensive scale variations of the target object and distractor objects with similar categories have consistently posed challenges in visual tracking. To address these persisting issues, we propose novel TridentAlign and context embedding modules for Siamese network-based visual tracking methods. The TridentAlign module facilitates adaptability to extensive scale variations and large deformations of the target, where it pools the feature representation of the target object into multiple spatial dimensions to form a feature pyramid, which is then utilized in the region proposal stage. Meanwhile, context embedding module aims to discriminate the target from distractor objects by accounting for the global context information among objects. The context embedding module extracts and embeds the global context information of a given frame into a local feature representation such that the information can be utilized in the final classification stage. Experimental results obtained on multiple benchmark datasets show that the performance of the proposed tracker is comparable to that of state-of-the-art trackers, while the proposed tracker runs at real-time speed.

翻訳日:2022-11-10 14:41:29 公開日:2020-07-14

# 深層学習と深部画像を用いた高密度人物検出に向けて

Towards Dense People Detection with Deep Learning and Depth images ( http://arxiv.org/abs/2007.07171v1 )

ライセンス: Link先を確認

David Fuentes-Jimenez and Cristina Losada-Gutierrez and David Casillas-Perez and Javier Macias-Guarasa and Roberto Martin-Lopez and Daniel Pizarro and Carlos A.Luna

(参考訳) 本稿では,1つの深度画像から複数の人物を検出するDNNシステムを提案する。我々のニューラルネットワークは深度画像を処理し、画像座標の確率マップを出力し、各検出は人の頭を中心にしたガウス型の局所分布に対応する。検出された人物の数と2D画像位置の両方をエンコードし、奥行き画像とカメラキャリブレーションパラメータを用いて各人物の3D位置を復元することができる。私たちのアーキテクチャはコンパクトで、分離された畳み込みを使ってパフォーマンスを高め、低予算gpuでリアルタイムに動作します。まずネットワークのトレーニングにシミュレーションデータを使用し,その後,比較的少ない実データで微調整を行う。我々は,この戦略が効果的であることを示し,訓練中に使用する場面とは異なる場面を一般化するネットワークを創り出す。我々は,従来のDNNベースのソリューションを含め,既存の最先端技術と比較した。本手法は既存の手法よりも優れており,有意な咬合を有するシーンの人物を正確に検出できる。

This paper proposes a DNN-based system that detects multiple people from a single depth image. Our neural network processes a depth image and outputs a likelihood map in image coordinates, where each detection corresponds to a Gaussian-shaped local distribution, centered at the person's head. The likelihood map encodes both the number of detected people and their 2D image positions, and can be used to recover the 3D position of each person using the depth image and the camera calibration parameters. Our architecture is compact, using separated convolutions to increase performance, and runs in real-time with low budget GPUs. We use simulated data for initially training the network, followed by fine tuning with a relatively small amount of real data. We show this strategy to be effective, producing networks that generalize to work with scenes different from those used during training. We thoroughly compare our method against the existing state-of-the-art, including both classical and DNN-based solutions. Our method outperforms existing methods and can accurately detect people in scenes with significant occlusions.

翻訳日:2022-11-10 14:35:26 公開日:2020-07-14

# 絵文字予測:拡張とベンチマーク

Emoji Prediction: Extensions and Benchmarking ( http://arxiv.org/abs/2007.07389v1 )

ライセンス: Link先を確認

Weicheng Ma, Ruibo Liu, Lili Wang, Soroush Vosoughi

(参考訳) 絵文字は、具体的な意味、感情、意図を表現できる簡潔な言語である。絵文字には、コミュニケーションの意図をより理解するために使用できるシグナルも備わっている。それらは私たちの日常生活のユビキタスな部分となり、ユーザー生成コンテンツを理解する重要な部分となっている。絵文字予測タスクは、テキストに関連付けられた適切な絵文字セットを予測することを目的としている。絵文字予測により、モデルは書かれたテキストのコミュニケーション意図の豊かな表現を学ぶことができる。絵文字予測タスクに関する既存の研究は、特定の感情と密接に関連する絵文字の少数のサブセットに焦点を当てているが、この設定はタスクを単純化し、絵文字の表現力を無駄にする。本稿では,絵文字予測タスクの既存の設定を,よりリッチな絵文字セットを含むように拡張し,タスクのマルチラベル分類を可能にする。トランスフォーマーネットワークに基づくマルチクラス・マルチラベル絵文字予測のための新しいモデルを提案する。また、ヒューリスティックスを用いてTwitterから複数の絵文字予測データセットを構築する。 BERTモデルは、すべてのデータセットに対して、すべての設定下で最先端のパフォーマンスを達成し、相対的な改善は27.21%から236.36%、トップ5の精度は2.01%から88.28%、F-1のスコアは65.19%から346.79%である。本研究は,絵文字予測タスクにおける深いトランスフォーマーモデルの有効性を示す。また、将来の研究者のために、https://github.com/hikari-NYU/Emoji_Prediction_Datasets_MMSでデータセットをリリースしています。

Emojis are a succinct form of language which can express concrete meanings, emotions, and intentions. Emojis also carry signals that can be used to better understand communicative intent. They have become a ubiquitous part of our daily lives, making them an important part of understanding user-generated content. The emoji prediction task aims at predicting the proper set of emojis associated with a piece of text. Through emoji prediction, models can learn rich representations of the communicative intent of the written text. While existing research on the emoji prediction task focus on a small subset of emoji types closely related to certain emotions, this setting oversimplifies the task and wastes the expressive power of emojis. In this paper, we extend the existing setting of the emoji prediction task to include a richer set of emojis and to allow multi-label classification on the task. We propose novel models for multi-class and multi-label emoji prediction based on Transformer networks. We also construct multiple emoji prediction datasets from Twitter using heuristics. The BERT models achieve state-of-the-art performances on all our datasets under all the settings, with relative improvements of 27.21% to 236.36% in accuracy, 2.01% to 88.28% in top-5 accuracy and 65.19% to 346.79% in F-1 score, compared to the prior state-of-the-art. Our results demonstrate the efficacy of deep Transformer-based models on the emoji prediction task. We also release our datasets at https://github.com/hikari-NYU/Emoji_Prediction_Datasets_MMS for future researchers.

翻訳日:2022-11-10 14:33:58 公開日:2020-07-14

# 深層学習者の活用によるメール生成におけるコヒーレンシーのモデル化

Modeling Coherency in Generated Emails by Leveraging Deep Neural Learners ( http://arxiv.org/abs/2007.07403v1 )

ライセンス: Link先を確認

Avisha Das and Rakesh M. Verma

(参考訳) 高度な機械学習と自然言語技術により、攻撃者は高度なソーシャルエンジニアリングに基づく攻撃を開始することができる。攻撃的な問題に対処するため、研究者は積極的に検出する方法に頼ってきた。標的のメールを使って被害者を騙すメールは、高度な攻撃方法である。しかし、自動テキスト生成には生成したコンテンツのコンテキストと一貫性の制御が必要である。この方法は、入力文書内の文の学習表現を用いて構造化された電子メールを生成する階層型ディープニューラルモデルを利用する。深層モデルを用いて,ターゲットとする短文メッセージの生成を実証する。合成テキストのグローバルなコヒーレンシーを質的研究と複数の定量的尺度を用いて評価する。

Advanced machine learning and natural language techniques enable attackers to launch sophisticated and targeted social engineering-based attacks. To counter the active attacker issue, researchers have since resorted to proactive methods of detection. Email masquerading using targeted emails to fool the victim is an advanced attack method. However automatic text generation requires controlling the context and coherency of the generated content, which has been identified as an increasingly difficult problem. The method used leverages a hierarchical deep neural model which uses a learned representation of the sentences in the input document to generate structured written emails. We demonstrate the generation of short and targeted text messages using the deep model. The global coherency of the synthesized text is evaluated using a qualitative study as well as multiple quantitative measures.

翻訳日:2022-11-10 14:33:32 公開日:2020-07-14

# 集合的推論を支援するモデル:形式化・分析・計算評価

A model to support collective reasoning: Formalization, analysis and computational assessment ( http://arxiv.org/abs/2007.06850v1 )

ライセンス: Link先を確認

Jordi Ganzer, Natalia Criado, Maite Lopez-Sanchez, Simon Parsons, Juan A. Rodriguez-Aguilar

(参考訳) 本稿では,e-participationシステムに着想を得て,人間の議論を表現し,それらから集団的な結論を得るための新しいモデルを提案する。このモデルは,ユーザが議論に新たな情報を導入し,既存の情報に関連付けることによって,既存のアプローチの欠点を克服すると同時に,他のユーザの提案した情報に対する意見を表明する。また,このモデルでは,ユーザの意見が合理的であるとして,情報抽出を前提とせず,現在のアプローチを著しく制限している。代わりに、一貫性のある意見を特徴付ける合理性の弱い概念を定義し、個別の意見の一貫性とユーザーが議論構造に持つコンセンサスレベルに基づいて異なるシナリオを考察する。この2つの要因を考慮し,個別の意見と討論構造に基づいて集団意思決定を行う異なる意見集約関数の結果を分析した。特に,合意の欠如や個々人の意見が一貫性がない場合でも,総合的な意見が一貫性を持つことを実証する。本研究は,実物大の議論に対して,集団的意見を効率的に計算できることを示す数値的評価で結論づける。

Inspired by e-participation systems, in this paper we propose a new model to represent human debates and methods to obtain collective conclusions from them. This model overcomes drawbacks of existing approaches by allowing users to introduce new pieces of information into the discussion, to relate them to existing pieces, and also to express their opinion on the pieces proposed by other users. In addition, our model does not assume that users' opinions are rational in order to extract information from it, an assumption that significantly limits current approaches. Instead, we define a weaker notion of rationality that characterises coherent opinions, and we consider different scenarios based on the coherence of individual opinions and the level of consensus that users have on the debate structure. Considering these two factors, we analyse the outcomes of different opinion aggregation functions that compute a collective decision based on the individual opinions and the debate structure. In particular, we demonstrate that aggregated opinions can be coherent even if there is a lack of consensus and individual opinions are not coherent. We conclude our analysis with a computational evaluation demonstrating that collective opinions can be computed efficiently for real-sized debates.

翻訳日:2022-11-10 14:33:07 公開日:2020-07-14

# ReLUネットワークのグラディエントDescent Trainingにおけるプラトー現象:説明,定量化,回避

Plateau Phenomenon in Gradient Descent Training of ReLU networks: Explanation, Quantification and Avoidance ( http://arxiv.org/abs/2007.07213v1 )

ライセンス: Link先を確認

Mark Ainsworth and Yeonjong Shin

(参考訳) ニューラルネットワークが幅広いアプリケーションに‘クラス最高の’近似を提供する能力は、十分に文書化されている。それでも、ニューラルネットワークの強力な表現性は、ネットワークを定義するパラメータを効果的にトレーニング(チョース)できない場合に問題となる。一般に、ニューラルネットワークは勾配降下型最適化法またはその確率的変種によって訓練される。実際には、そのような方法ではトレーニング開始時に損失関数が急速に低下するが、比較的少数のステップの後、大幅に低下する。この損失は、多くのエポックの期間に停滞しているように見えるが、その間に突然減少し始めるが、その原因は明らかでない。このいわゆるプラトー現象は多くの学習課題に現れている。本研究の目的は,高原現象の根本原因の同定と定量化である。トレーニングデータ数に対するニューロン数についての仮定は行われず,怠け者と適応者の両方について結果が得られた。主な発見は、活性化パターンが一定である期間、活性化パターンは与えられたニューロンを活性化するデータ点の数、勾配流れのダイナミクスの収束の定量化、およびトレーニングデータのサブセット上の局所的最小二乗回帰線の解による静止点のキャラクタリゼーションである。そこで,本研究では,各ステップにおける活性化パターンの明示的な調整により特徴付けられる,新しい反復学習法である活動ニューロン最小二乗法(anls)を提案する。図示的な数値の例が全て含まれている。

The ability of neural networks to provide `best in class' approximation across a wide range of applications is well-documented. Nevertheless, the powerful expressivity of neural networks comes to naught if one is unable to effectively train (choose) the parameters defining the network. In general, neural networks are trained by gradient descent type optimization methods, or a stochastic variant thereof. In practice, such methods result in the loss function decreases rapidly at the beginning of training but then, after a relatively small number of steps, significantly slow down. The loss may even appear to stagnate over the period of a large number of epochs, only to then suddenly start to decrease fast again for no apparent reason. This so-called plateau phenomenon manifests itself in many learning tasks. The present work aims to identify and quantify the root causes of plateau phenomenon. No assumptions are made on the number of neurons relative to the number of training data, and our results hold for both the lazy and adaptive regimes. The main findings are: plateaux correspond to periods during which activation patterns remain constant, where activation pattern refers to the number of data points that activate a given neuron; quantification of convergence of the gradient flow dynamics; and, characterization of stationary points in terms solutions of local least squares regression lines over subsets of the training data. Based on these conclusions, we propose a new iterative training method, the Active Neuron Least Squares (ANLS), characterised by the explicit adjustment of the activation pattern at each step, which is designed to enable a quick exit from a plateau. Illustrative numerical examples are included throughout.

翻訳日:2022-11-10 14:26:30 公開日:2020-07-14

# k-centerクラスタリングに対するペアワイズフェアとコミュニティ保存アプローチ

A Pairwise Fair and Community-preserving Approach to k-Center Clustering ( http://arxiv.org/abs/2007.07384v1 )

ライセンス: Link先を確認

Brian Brubach, Darshan Chakrabarti, John P. Dickerson, Samir Khuller, Aravind Srinivasan, Leonidas Tsepenekas

(参考訳) クラスタリングは多くのアプリケーションで機械学習の基本的な問題である。機械学習が自動化システムのバックエンドとして普及するにつれて、公平性に関する懸念が生まれます。フェアネスに関する現在の文献の多くは、教師付き学習(グループフェアネス)における保護されたクラスに対する差別を扱う。 2つの点(あるいは1つの点のコミュニティ)が分離される確率が、ペアワイズ距離(あるいはコミュニティの直径)の増大関数によって境界づけられるという、フェアクラスタリングの異なる概念を定義する。データポイントが一括してクラスタ化されるメリットを享受する人々を表す状況を取り除きます。不公平は、特定のポイントが任意に、あるいは選挙地区のように、彼らを傷つけようとする誰かによって、決定論的に分離されたときに生じる。そこで我々は,クラスタリング設定において,ペアワイズフェアネスとコミュニティ保存という2つの新たなフェアネスを正式に定義する。公平性目標の実用性を探るために、我々は、これらの公平性制約を満たすために既存の$k$中心アルゴリズムを拡張するアプローチを考案する。このアプローチの解析は、公平性を維持しながら合理的な近似が達成できることを証明している。実験では、従来の$k$-centerアルゴリズム/ヒューリスティックスに対するアプローチの有効性を比較し、最適なクラスタリングと公正性のトレードオフを探る。

Clustering is a foundational problem in machine learning with numerous applications. As machine learning increases in ubiquity as a backend for automated systems, concerns about fairness arise. Much of the current literature on fairness deals with discrimination against protected classes in supervised learning (group fairness). We define a different notion of fair clustering wherein the probability that two points (or a community of points) become separated is bounded by an increasing function of their pairwise distance (or community diameter). We capture the situation where data points represent people who gain some benefit from being clustered together. Unfairness arises when certain points are deterministically separated, either arbitrarily or by someone who intends to harm them as in the case of gerrymandering election districts. In response, we formally define two new types of fairness in the clustering setting, pairwise fairness and community preservation. To explore the practicality of our fairness goals, we devise an approach for extending existing $k$-center algorithms to satisfy these fairness constraints. Analysis of this approach proves that reasonable approximations can be achieved while maintaining fairness. In experiments, we compare the effectiveness of our approach to classical $k$-center algorithms/heuristics and explore the tradeoff between optimal clustering and fairness.

翻訳日:2022-11-10 14:25:28 公開日:2020-07-14

# グラフの深層学習によるタンパク質の情報還元表現の同定の高速化

Accelerating the identification of informative reduced representations of proteins with deep learning for graphs ( http://arxiv.org/abs/2007.08658v1 )

ライセンス: Link先を確認

Federico Errica, Marco Giulini, Davide Bacciu, Roberto Menichetti, Alessio Micheli, Raffaello Potestio

(参考訳) 分子動力学(MD)シミュレーションの限界は、コンピュータアーキテクチャとアルゴリズムの絶え間ない発展によって着実に前進している。このMD軌道の量と範囲(サイズと時間)の爆発は、原データの合理化と定量化のための自動化および転送可能な方法の必要性を引き起こす。近年,タンパク質の原子のサブセットを同定するアルゴリズム的手法が開発され,最も情報的な記述が可能となった。この方法は、与えられた縮小表現に対して、関連するマッピングエントロピー(つまり、単純化による情報損失の尺度)の計算に依存する。比較的単純だが、この計算には時間がかかる。本稿では,マッピングエントロピーの計算の高速化を目的としたディープラーニング手法の実装について述べる。この方法はディープグラフネットワークに依存しており、入力フォーマットの柔軟性が極めて高い。深部グラフネットワークは正確かつ極めて効率的であり,マッピングエントロピーのアルゴリズム計算に対して最大10^5$の高速化係数を持つことを示す。この手法の応用は、マッピングエントロピーの景観を再構築する際に生体分子の研究に大きな可能性をもたらすが、この手法は分子の構造の任意の関数の計算に容易に移行できるスキームである。

The limits of molecular dynamics (MD) simulations of macromolecules are steadily pushed forward by the relentless developments of computer architectures and algorithms. This explosion in the number and extent (in size and time) of MD trajectories induces the need of automated and transferable methods to rationalise the raw data and make quantitative sense out of them. Recently, an algorithmic approach was developed by some of us to identify the subset of a protein's atoms, or mapping, that enables the most informative description of it. This method relies on the computation, for a given reduced representation, of the associated mapping entropy, that is, a measure of the information loss due to the simplification. Albeit relatively straightforward, this calculation can be time consuming. Here, we describe the implementation of a deep learning approach aimed at accelerating the calculation of the mapping entropy. The method relies on deep graph networks, which provide extreme flexibility in the input format. We show that deep graph networks are accurate and remarkably efficient, with a speedup factor as large as $10^5$ with respect to the algorithmic computation of the mapping entropy. Applications of this method, which entails a great potential in the study of biomolecules when used to reconstruct its mapping entropy landscape, reach much farther than this, being the scheme easily transferable to the computation of arbitrary functions of a molecule's structure.

翻訳日:2022-11-10 14:24:46 公開日:2020-07-14

# モノのインターネットのためのレコメンダシステム:調査

Recommender Systems for the Internet of Things: A Survey ( http://arxiv.org/abs/2007.06758v1 )

ライセンス: Link先を確認

May Altulyan, Lina Yao, Xianzhi Wang, Chaoran Huang, Salil S Kanhere, Quan Z Sheng

(参考訳) 勧告はIoT(Internet of Things)のメリットの開発と促進において重要な段階である。従来のレコメンデータシステムは、成長を続ける、動的で、異質なIoTデータを利用できない。本稿では,最先端のレコメンダシステムに関する総合的なレビューと,iotの活気ある分野における関連技術とアプリケーションについて述べる。本稿では,iotへのレコメンデーションシステムの適用に関するいくつかの制限について議論し,既存の研究を比較するための参照フレームワークを提案する。

Recommendation represents a vital stage in developing and promoting the benefits of the Internet of Things (IoT). Traditional recommender systems fail to exploit ever-growing, dynamic, and heterogeneous IoT data. This paper presents a comprehensive review of the state-of-the-art recommender systems, as well as related techniques and application in the vibrant field of IoT. We discuss several limitations of applying recommendation systems to IoT and propose a reference framework for comparing existing studies to guide future research and practices.

翻訳日:2022-11-10 14:17:45 公開日:2020-07-14

# Pareto-Embeddingsによる選択関数の学習

Learning Choice Functions via Pareto-Embeddings ( http://arxiv.org/abs/2007.06927v1 )

ライセンス: Link先を確認

Karlson Pfannschmidt, Eyke H\"ullermeier

(参考訳) 本研究では,各オブジェクトが特徴ベクトルで表現される対象の集合から選択することの難しさを考察する。選択モデリングにおける伝統的なアプローチは、主に潜在、実数値の効用関数の学習に基づいており、選択の代替関数に対して線形順序を誘導する。このアプローチは離散的な(トップ-1)選択に適しているが、サブセットの選択にどのように使うかは単純ではない。実数直線の選択肢を写像する代わりに、パレート最適点を持つ選択集合を識別する高次元のユーティリティ空間にそれらを埋め込むことを提案する。そこで本研究では,このタスクに適した微分可能損失関数を最小化する学習アルゴリズムを提案する。ベンチマークデータセットのスイート上でPareto-embeddingを学習する可能性を示す。

We consider the problem of learning to choose from a given set of objects, where each object is represented by a feature vector. Traditional approaches in choice modelling are mainly based on learning a latent, real-valued utility function, thereby inducing a linear order on choice alternatives. While this approach is suitable for discrete (top-1) choices, it is not straightforward how to use it for subset choices. Instead of mapping choice alternatives to the real number line, we propose to embed them into a higher-dimensional utility space, in which we identify choice sets with Pareto-optimal points. To this end, we propose a learning algorithm that minimizes a differentiable loss function suitable for this task. We demonstrate the feasibility of learning a Pareto-embedding on a suite of benchmark datasets.

翻訳日:2022-11-10 14:16:35 公開日:2020-07-14

# ADSAGE: きめ細かいレベルでのインサイダー脅威検出に応用した分散グラフエッジ列の異常検出

ADSAGE: Anomaly Detection in Sequences of Attributed Graph Edges applied to insider threat detection at fine-grained level ( http://arxiv.org/abs/2007.06985v1 )

ライセンス: Link先を確認

Mathieu Garchery and Michael Granitzer

(参考訳) CERTインサイダーの脅威検出ケースに関する以前の研究は、ユーザ動作の関連性にもかかわらず、グラフとテキストの特徴を無視している。さらに既存のシステムは、悪意のあるアクティビティを検出するために、機能エンジニアリングと監査データアグリゲーションに大きく依存している。これは時間がかかり、専門家の知識が必要であり、正確なユーザーアクションに対する警告のトレースを防ぐ。これらの問題に対処するために、グラフエッジとしてモデル化された監査ログイベントの異常を検出するADSAGEを導入する。私たちの一般的な方法は、エッジシーケンスと属性の両方をサポートしながら、エッジレベルで異常検出を行う最初の方法です。本稿では、CERTのユースケースから異なる監査ログにおいて、ADSAGEをきめ細かなイベントレベルのインサイダー脅威検出に利用する方法について述べる。 CERT問題に標準ベンチマークがないことに留意し、現実的なリコールベースのメトリクスに基づいた評価設定を以前提案した。我々は、CERTインサイダー脅威データセットの認証、Eメールトラフィック、Webブラウジングログ、および実世界の認証イベントについてADSAGEを評価する。 ADSAGEは認証の異常、ユーザとコンピュータのインタラクション、メール通信の異常を検出するのに有効である。単純なベースラインも驚くほど強い結果をもたらす。興味深いことに、いくつかの検出器は相補的であり、検出を改善するために組み合わせられる可能性がある。全体として,グラフの特徴は悪意のあるインサイダー活動を特徴付けるのに有益であり,きめ細かいレベルでの検知が可能であることを示す。

Previous works on the CERT insider threat detection case have neglected graph and text features despite their relevance to describe user behavior. Additionally, existing systems heavily rely on feature engineering and audit data aggregation to detect malicious activities. This is time consuming, requires expert knowledge and prevents tracing back alerts to precise user actions. To address these issues we introduce ADSAGE to detect anomalies in audit log events modeled as graph edges. Our general method is the first to perform anomaly detection at edge level while supporting both edge sequences and attributes, which can be numeric, categorical or even text. We describe how ADSAGE can be used for fine-grained, event level insider threat detection in different audit logs from the CERT use case. Remarking that there is no standard benchmark for the CERT problem, we use a previously proposed evaluation setting based on realistic recall-based metrics. We evaluate ADSAGE on authentication, email traffic and web browsing logs from the CERT insider threat datasets, as well as on real-world authentication events. ADSAGE is effective to detect anomalies in authentications, modeled as user to computer interactions, and in email communications. Simple baselines give surprisingly strong results as well. We also report performance split by malicious scenarios present in the CERT datasets: interestingly, several detectors are complementary and could be combined to improve detection. Overall, our results show that graph features are informative to characterize malicious insider activities, and that detection at fine-grained level is possible.

翻訳日:2022-11-10 14:16:14 公開日:2020-07-14

# 構造付き潜在共同設立者によるガウス過程を用いた因果推論

Causal Inference using Gaussian Processes with Structured Latent Confounders ( http://arxiv.org/abs/2007.07127v1 )

ライセンス: Link先を確認

Sam Witty, Kenta Takatsu, David Jensen, Vikash Mansinghka

(参考訳) 潜在的共同設立者--治療と結果の両方に影響を与える未観測変数---因果効果の偏見を推定する。例えば、コースを受講しているすべての学生は、個別に受ける教育的介入に加えて、コースの難しさの影響を受けている。本稿では,この構造を持つ助成金を半パラメトリックにモデル化し,因果効果の評価を改善する方法について述べる。鍵となる革新は階層的ベイズモデル、構造化潜在共同設立者(GP-SLC)を持つガウス過程、楕円スライスサンプリングに基づくモンテカルロ推論アルゴリズムである。 GP-SLCは、共同設立者、共変量、治療、結果に関連する機能形式に関する最小限の仮定で、個々の治療効果のベイズ的不確実性推定を提供する。最後に, gp-slcは, 乳幼児保健開発プログラムや, 温度変化がニューイングランド全域のエネルギー消費に与える影響を示すデータセットなど, 3つのベンチマークデータセットにおいて, 広く使用されている因果推論技術と競合しているか, またはより正確であることを示す。

Latent confounders---unobserved variables that influence both treatment and outcome---can bias estimates of causal effects. In some cases, these confounders are shared across observations, e.g. all students taking a course are influenced by the course's difficulty in addition to any educational interventions they receive individually. This paper shows how to semiparametrically model latent confounders that have this structure and thereby improve estimates of causal effects. The key innovations are a hierarchical Bayesian model, Gaussian processes with structured latent confounders (GP-SLC), and a Monte Carlo inference algorithm for this model based on elliptical slice sampling. GP-SLC provides principled Bayesian uncertainty estimates of individual treatment effect with minimal assumptions about the functional forms relating confounders, covariates, treatment, and outcome. Finally, this paper shows GP-SLC is competitive with or more accurate than widely used causal inference techniques on three benchmark datasets, including the Infant Health and Development Program and a dataset showing the effect of changing temperatures on state-wide energy consumption across New England.

翻訳日:2022-11-10 14:15:04 公開日:2020-07-14

# 注意と差別:ウェアラブルセンサを用いた人間行動認識の現状と課題

Attend And Discriminate: Beyond the State-of-the-Art for Human Activity Recognition using Wearable Sensors ( http://arxiv.org/abs/2007.07172v1 )

ライセンス: Link先を確認

Alireza Abedin, Mahsa Ehsanpour, Qinfeng Shi, Hamid Rezatofighi, Damith C. Ranasinghe

(参考訳) ウェアラブルは、特にリハビリテーションからきめ細かい歩行分析に至るまで、医療応用の増加のために、人間の活動に対する理解を改善するための基本となる。ウェアラブルに関するHAR(Human Activity Recognition)問題を解決するための総合的なノウハウは、エンドツーエンドのディープラーニングパラダイムによって大きく進歩しているが、いくつかの基本的な機会は見過ごされ続けている。我々は、豊かで差別性の高い活動表現を学習するこれらの新しい機会を精力的に探求する。提案します一マルチチャネルセンサモダリティと特定活動の潜在関係を利用するための学習二深部HARモデルの標準化のためのマルチモーダルセンサデータストリームにおけるデータ非依存化の有効性の検討及び三クラス間差を最大化しつつ、クラス内差の最小化を図るために分類損失基準を組み込むこと。当社の貢献は、4つの多様なアクティビティ認識問題ベンチマークで新たな最先端のパフォーマンスを達成し、大きなマージンと最大6%のマージン改善を実現しています。我々は,この設計概念からの貢献を,量的および質的研究を通じて共有される活動的不均衡尺度,アブレーション研究,洞察など,広範な実験を通じて広範囲に検証した。

Wearables are fundamental to improving our understanding of human activities, especially for an increasing number of healthcare applications from rehabilitation to fine-grained gait analysis. Although our collective know-how to solve Human Activity Recognition (HAR) problems with wearables has progressed immensely with end-to-end deep learning paradigms, several fundamental opportunities remain overlooked. We rigorously explore these new opportunities to learn enriched and highly discriminating activity representations. We propose: i) learning to exploit the latent relationships between multi-channel sensor modalities and specific activities; ii) investigating the effectiveness of data-agnostic augmentation for multi-modal sensor data streams to regularize deep HAR models; and iii) incorporating a classification loss criterion to encourage minimal intra-class representation differences whilst maximising inter-class differences to achieve more discriminative features. Our contributions achieves new state-of-the-art performance on four diverse activity recognition problem benchmarks with large margins -- with up to 6% relative margin improvement. We extensively validate the contributions from our design concepts through extensive experiments, including activity misalignment measures, ablation studies and insights shared through both quantitative and qualitative studies.

翻訳日:2022-11-10 14:14:47 公開日:2020-07-14

# TCGM: 半教師付きマルチモーダル学習のための情報理論フレームワーク

TCGM: An Information-Theoretic Framework for Semi-Supervised Multi-Modality Learning ( http://arxiv.org/abs/2007.06793v1 )

ライセンス: Link先を確認

Xinwei Sun, Yilun Xu, Peng Cao, Yuqing Kong, Lingjing Hu, Shanghang Zhang, Yizhou Wang

(参考訳) 複数のモダリティからデータを抽出することで、機械学習システムのトレーニングにより多くの情報を提供する。しかしながら、各モダリティを大量のデータでラベル付けすることは、非常に高価で時間がかかるため、半教師付きマルチモーダル学習の重要な問題となる。既存の手法は、適切な仮定の下でのモダリティ間の非効率的な融合または理論的保証の欠如に苦しむ。本稿では, 半教師付きマルチモーダル学習のための新しい情報理論的手法, \textbf{t}otal \textbf{c}orrelation \textbf{g}ain \textbf{m}aximization (tcgm)を提案する。一ラベルなしデータポイントの異なるモダリティの情報を有効活用して各モダリティの訓練分類を行うことができること。 (ii) ベイズ分類器を同定する理論的保証、すなわちすべてのモダリティの根本的真理を同定すること。具体的には、すべてのモダリティの分類器に対するtc誘発損失(すなわちtcゲイン)を最大化することで、これらの分類器は協調的に対応する接地型分類器の類型を発見し、ラベル付きデータの限られた割合を活用することでユニークなものを識別することができる。本手法を様々なタスクに適用し,ニュース分類,感情認識,疾患予測など最新の結果を得る。

Fusing data from multiple modalities provides more information to train machine learning systems. However, it is prohibitively expensive and time-consuming to label each modality with a large amount of data, which leads to a crucial problem of semi-supervised multi-modal learning. Existing methods suffer from either ineffective fusion across modalities or lack of theoretical guarantees under proper assumptions. In this paper, we propose a novel information-theoretic approach, namely \textbf{T}otal \textbf{C}orrelation \textbf{G}ain \textbf{M}aximization (TCGM), for semi-supervised multi-modal learning, which is endowed with promising properties: (i) it can utilize effectively the information across different modalities of unlabeled data points to facilitate training classifiers of each modality (ii) it has theoretical guarantee to identify Bayesian classifiers, i.e., the ground truth posteriors of all modalities. Specifically, by maximizing TC-induced loss (namely TC gain) over classifiers of all modalities, these classifiers can cooperatively discover the equivalent class of ground-truth classifiers; and identify the unique ones by leveraging limited percentage of labeled data. We apply our method to various tasks and achieve state-of-the-art results, including news classification, emotion recognition and disease prediction.

翻訳日:2022-11-10 14:09:00 公開日:2020-07-14

# コーン・エプシロン・ドミナンス:進化的多目的最適化へのアプローチ

The Cone epsilon-Dominance: An Approach for Evolutionary Multiobjective Optimization ( http://arxiv.org/abs/2008.04224v1 )

ライセンス: Link先を確認

Lucas S. Batista, Felipe Campelo, Frederico G. Guimar\~aes and Jaime A. Ram\'irez

(参考訳) 本稿では,多目的進化アルゴリズム(moeas)の収束と多様性を改善するためのコーン・エプシロン・ドミナンス手法を提案する。標準パレート関係(NSGA-II,NSGA-II*,SPEA2,クラスタ化NSGA-II)およびエプシロン支配(eps-MOEA)に基づいて、コーン-エピス-MOEAをMOEAと比較した。この比較は、計算の複雑さと、各アルゴリズムによって得られた最終結果の質を定量化するために選択された4つのパフォーマンス指標、すなわち、多くの集合メトリクスの収束、多様性、ハイパーボリューム、カバレッジの両方において行われる。 ZDTやDTLZファミリーを含む16の有名なベンチマーク問題が実験室で検討されている。アルゴリズム間の相違性を評価するため、4つの性能指標について慎重に設計した実験を行った。その結果、コーン・エプス・MOEAは、考慮されたすべての性能指標に対して、効率的かつバランスの取れた性能を示すことができることが示唆された。これらの結果は、コーン-エプス-MOEAは、パレートフロントへの収束と多様性の効率的なバランスを得るための競争的アプローチであり、多目的最適化問題の解決に有用なツールである、という結論を強く支持している。

We propose the cone epsilon-dominance approach to improve convergence and diversity in multiobjective evolutionary algorithms (MOEAs). A cone-eps-MOEA is presented and compared with MOEAs based on the standard Pareto relation (NSGA-II, NSGA-II*, SPEA2, and a clustered NSGA-II) and on the epsilon-dominance (eps-MOEA). The comparison is performed both in terms of computational complexity and on four performance indicators selected to quantify the quality of the final results obtained by each algorithm: the convergence, diversity, hypervolume, and coverage of many sets metrics. Sixteen well-known benchmark problems are considered in the experimental section, including the ZDT and the DTLZ families. To evaluate the possible differences amongst the algorithms, a carefully designed experiment is performed for the four performance metrics. The results obtained suggest that the cone-eps-MOEA is capable of presenting an efficient and balanced performance over all the performance metrics considered. These results strongly support the conclusion that the cone-eps-MOEA is a competitive approach for obtaining an efficient balance between convergence and diversity to the Pareto front, and as such represents a useful tool for the solution of multiobjective optimization problems.

翻訳日:2022-11-10 14:06:30 公開日:2020-07-14

# 2層ニューラルネットワークにおける2次ダイナミクスの大域収束

Global Convergence of Second-order Dynamics in Two-layer Neural Networks ( http://arxiv.org/abs/2007.06852v1 )

ライセンス: Link先を確認

Walid Krichene, Kenneth F. Caluya, Abhishek Halder

(参考訳) 近年, 2層完全連結ニューラルネットワークでは, 平均場力学とワッサーシュタイン勾配流との接続により, 勾配流は無限幅限界における大域的最適に収束することが示されている。これらの結果は一階の勾配流のために導出され、自然な疑問は二階の力学、すなわち運動量を持つ力学が同様の保証を示すかどうかである。その結果,重球法では正の解が得られた。この場合、結果の積分 pde は非線形運動論的フォッカープランク方程式であり、一階の場合とは異なり、ワッサースタイン勾配流とは明確な関係を持たない。代わりに、解軌道に沿ったリアプノフ汎関数の変種を研究し、定常点を特徴付け、収束を証明する。平均場限界は漸近的であるが,数値シミュレーションにより,大域収束は比較的小さなネットワークで既に発生している可能性が示唆された。

Recent results have shown that for two-layer fully connected neural networks, gradient flow converges to a global optimum in the infinite width limit, by making a connection between the mean field dynamics and the Wasserstein gradient flow. These results were derived for first-order gradient flow, and a natural question is whether second-order dynamics, i.e., dynamics with momentum, exhibit a similar guarantee. We show that the answer is positive for the heavy ball method. In this case, the resulting integro-PDE is a nonlinear kinetic Fokker Planck equation, and unlike the first-order case, it has no apparent connection with the Wasserstein gradient flow. Instead, we study the variations of a Lyapunov functional along the solution trajectories to characterize the stationary points and to prove convergence. While our results are asymptotic in the mean field limit, numerical simulations indicate that global convergence may already occur for reasonably small networks.

翻訳日:2022-11-10 14:06:04 公開日:2020-07-14

# 少数変数ガウス近似による信用フルート検出に向けて

Towards Credit-Fraud Detection via Sparsely Varying Gaussian Approximations ( http://arxiv.org/abs/2007.07181v1 )

ライセンス: Link先を確認

Harshit Sharma, Harsh K. Gandhi, Apoorv Jain

(参考訳) 不正行為は多くの金融機関にとって高価な問題であり、年間数十億ドルを企業に費やしている。この分野でのより一般的な活動はクレジットカード詐欺である。この文脈において、クレジットカード不正検出の概念は、予測システムに不確実性を組み込んで、そのような重要なタスクにおけるより良い判断を確実にするために開発された。本稿では,大規模なデータセットを扱うためにスパースガウス分類法を用い,擬似的あるいは誘導的入力の概念を用いることを提案する。異なるカーネルセットと異なるインジェクションデータポイント数を用いて、RBFカーネルを高いインジェクションポイント数で選択することで、最も精度の高いデータポイントを得ることができた。提案手法は,提案手法の確率的性質と,モデルの信頼性と堅牢性を示す予測に対して,低分散の試験精度を考慮し,大規模な財務データを扱うことができた。ベイズ学習手法の方法論を組み込んだ誘導点現象を用いて、健全な精度と高い信頼度を得ることができる。

Fraudulent activities are an expensive problem for many financial institutions, costing billions of dollars to corporations annually. More commonly occurring activities in this regard are credit card frauds. In this context, the credit card fraud detection concept has been developed over the lines of incorporating the uncertainty in our prediction system to ensure better judgment in such a crucial task. We propose to use a sparse Gaussian classification method to work with the large data-set and use the concept of pseudo or inducing inputs. We perform the same with different sets of kernels and the different number of inducing data points to show the best accuracy was obtained with the selection of RBF kernel with a higher number of inducing points. Our approach was able to work over large financial data given the stochastic nature of our method employed and also good test accuracy with low variance over the prediction suggesting confidence and robustness in our model. Using the methodologies of Bayesian learning techniques with the incorporated inducing points phenomenon, are successfully able to obtain a healthy accuracy and a high confidence score.

翻訳日:2022-11-10 13:59:31 公開日:2020-07-14

# リカレントニューラルネットワークのシャッフリング

Shuffling Recurrent Neural Networks ( http://arxiv.org/abs/2007.07324v1 )

ライセンス: Link先を確認

Michael Rotman and Lior Wolf

(参考訳) 本稿では,従来の隠れ状態$h_{t-1}$のベクトル要素を置換し,学習関数$b(x_t)$の入力値$x_t$の出力をt$で加算することにより,隠れ状態$h_t$が得られる新しいリカレントニューラルネットワークモデルを提案する。我々のモデルでは、予測は第2の学習関数によって与えられ、隠れた状態$s(h_t)$に適用される。この方法は実装が容易で、非常に効率的であり、消滅や爆発的な勾配に苦しむことはない。広範な実験において,本手法は主要な文献ベースラインと比較して,競争力のある結果を示す。

We propose a novel recurrent neural network model, where the hidden state $h_t$ is obtained by permuting the vector elements of the previous hidden state $h_{t-1}$ and adding the output of a learned function $b(x_t)$ of the input $x_t$ at time $t$. In our model, the prediction is given by a second learned function, which is applied to the hidden state $s(h_t)$. The method is easy to implement, extremely efficient, and does not suffer from vanishing nor exploding gradients. In an extensive set of experiments, the method shows competitive results, in comparison to the leading literature baselines.

翻訳日:2022-11-10 13:59:02 公開日:2020-07-14

# コスト依存型アンサンブル学習の誤分類:統一フレームワーク

Misclassification cost-sensitive ensemble learning: A unifying framework ( http://arxiv.org/abs/2007.07361v1 )

ライセンス: Link先を確認

George Petrides and Wouter Verbeke

(参考訳) 長年にわたり、異なるタイプの誤分類エラーが異なるコストをもたらす場合にデータについて学ぶために、多くのコストに敏感な方法が提案されてきた。私たちの貢献は、コストに敏感なアンサンブルメソッドに関する包括的かつ洞察に富んだ概要を提供する統一フレームワークです。我々のフレームワークには、AdaBoost、Bagging、Random Forestなど、メソッド間の自然な拡張とアイデアの一般化が含まれており、結果として、現在知られているすべてのメソッドだけでなく、これまで検討されていないいくつかのメソッドも得られます。

Over the years, a plethora of cost-sensitive methods have been proposed for learning on data when different types of misclassification errors incur different costs. Our contribution is a unifying framework that provides a comprehensive and insightful overview on cost-sensitive ensemble methods, pinpointing their differences and similarities via a fine-grained categorization. Our framework contains natural extensions and generalisations of ideas across methods, be it AdaBoost, Bagging or Random Forest, and as a result not only yields all methods known to date but also some not previously considered.

翻訳日:2022-11-10 13:58:25 公開日:2020-07-14

# ストリーミング確率的深部テンソル因子化

Streaming Probabilistic Deep Tensor Factorization ( http://arxiv.org/abs/2007.07367v1 )

ライセンス: Link先を確認

Shikai Fang, Zheng Wang, Zhimeng Pan, Ji Liu, Shandian Zhe

(参考訳) 既存のテンソル分解法の成功にもかかわらず、それらのほとんどが多重線形分解を行い、データ内の様々な複雑な相互作用を捉えるためにディープニューラルネットワークのような強力なモデリングフレームワークを利用することは滅多にない。より重要なのは、非常に表現力が高く、深い因子化のために、実世界のアプリケーションで広く使われているストリーミングデータを扱う効果的なアプローチが欠けていることです。これらの問題に対処するため、SPIDER(Streaming ProbabilistIc Deep tEnsoR factorization method)を提案する。まずベイズ型ニューラルネットワーク(nns)を用いて,深いテンソル分解モデルを構築した。我々は,nn重みよりも先にスパイク・アンド・スラブを割り当て,スパーシティを奨励し,過剰フィットを防止する。そこで我々はTaylor拡張とモーメントマッチングを用いてNN出力の後部を近似し、仮定密度フィルタおよび期待伝搬フレームワークにおいて効率的な後部推論アルゴリズムを開発するランニングモデルエビデンスを算出する。提案アルゴリズムは,新しいテンソルエントリを受信すると,潜在因子とnn重みの後方に応答的な更新を行い,一方,冗長/無使用重みを選択・抑制する。実世界の4つのアプリケーションにアプローチの利点を示す。

Despite the success of existing tensor factorization methods, most of them conduct a multilinear decomposition, and rarely exploit powerful modeling frameworks, like deep neural networks, to capture a variety of complicated interactions in data. More important, for highly expressive, deep factorization, we lack an effective approach to handle streaming data, which are ubiquitous in real-world applications. To address these issues, we propose SPIDER, a Streaming ProbabilistIc Deep tEnsoR factorization method. We first use Bayesian neural networks (NNs) to construct a deep tensor factorization model. We assign a spike-and-slab prior over the NN weights to encourage sparsity and prevent overfitting. We then use Taylor expansions and moment matching to approximate the posterior of the NN output and calculate the running model evidence, based on which we develop an efficient streaming posterior inference algorithm in the assumed-density-filtering and expectation propagation framework. Our algorithm provides responsive incremental updates for the posterior of the latent factors and NN weights upon receiving new tensor entries, and meanwhile select and inhibit redundant/useless weights. We show the advantages of our approach in four real-world applications.

翻訳日:2022-11-10 13:57:56 公開日:2020-07-14

# MainNetにSideNetを追加する

Add a SideNet to your MainNet ( http://arxiv.org/abs/2007.13512v1 )

ライセンス: Link先を確認

Adrien Morisot

(参考訳) ディープニューラルネットワークの性能と人気が高まるにつれて、計算コストも増大している。ネットワークの計算フットプリント(量子化、プルーニング、知識蒸留)を減らすための効果的な技術は数多く存在するが、これらは入力に関係なく計算コストが同じであるモデルにつながる。私たちの人間の反応時間は、我々が実行するタスクの複雑さによって異なります。より簡単なタスク(例えば、ボートから犬を区別する)は、より難しいタスクよりもずっと高速に実行される(例えば、類似した2種類の犬種を区別する)。そこで本研究では,我々がsidenetと呼ぶ小さな分類層をmainnetと呼ぶ大規模事前学習済みネットワークにアタッチすることで,適応的ネットワーク複雑化の手法を開発した。入力が与えられると、サイドネットは、softmaxによって得られた信頼度レベルがユーザ決定しきい値を超えている場合に分類を返し、信頼度が低すぎる場合は、大きなメインネットに渡すだけである。これにより、ネットワークのパフォーマンスを計算コストで柔軟にトレードオフすることができます。実験結果から,プレトレーニング済みのResNetとBERT MainNetに加えられた単純な単一層パーセプトロン・サイドネットは,画像やテキストの分類タスクのパフォーマンスを最小限に抑えることができることがわかった。また,サイドネットによって得られる分類を校正し,他の計算量削減手法を補完し,計算精度空間の探索を容易にすること,という3つの望ましい特徴を強調する。

As the performance and popularity of deep neural networks has increased, so too has their computational cost. There are many effective techniques for reducing a network's computational footprint (quantisation, pruning, knowledge distillation), but these lead to models whose computational cost is the same regardless of their input. Our human reaction times vary with the complexity of the tasks we perform: easier tasks (e.g. telling apart dogs from boat) are executed much faster than harder ones (e.g. telling apart two similar looking breeds of dogs). Driven by this observation, we develop a method for adaptive network complexity by attaching a small classification layer, which we call SideNet, to a large pretrained network, which we call MainNet. Given an input, the SideNet returns a classification if its confidence level, obtained via softmax, surpasses a user determined threshold, and only passes it along to the large MainNet for further processing if its confidence is too low. This allows us to flexibly trade off the network's performance with its computational cost. Experimental results show that simple single hidden layer perceptron SideNets added onto pretrained ResNet and BERT MainNets allow for substantial decreases in compute with minimal drops in performance on image and text classification tasks. We also highlight three other desirable properties of our method, namely that the classifications obtained by SideNets are calibrated, complementary to other compute reduction techniques, and that they enable the easy exploration of compute accuracy space.

翻訳日:2022-11-10 13:57:26 公開日:2020-07-14

# マルチタスクランキングを用いたソーシャルメディア画像からの水位予測

Water level prediction from social media images with a multi-task ranking approach ( http://arxiv.org/abs/2007.06749v1 )

ライセンス: Link先を確認

P. Chaudhary, S. D'Aronco, J.P. Leitao, K. Schindler, J.D. Wegner

(参考訳) 洪水は最も頻繁で壊滅的な自然災害であり、世界中の何百万人もの人々に影響を与えている。正確な洪水地図を作成して(オフライン)計画し、(リアルタイム)洪水対策と洪水救助活動を行うことが重要である。おそらくソーシャルメディアから集めた画像は、そのタスクに有用な情報を提供することができるだろう。我々は,洪水時のソーシャルメディア画像から水深を推定するコンピュータビジョンシステムを導入し,洪水マップを(ほぼ)リアルタイムに構築する。本稿では,回帰学習とペアランキング損失の両方を用いてモデルを訓練するマルチタスク(ディープ)学習手法を提案する。画像に基づく水位推定の主なボトルネックはトレーニングデータであり,未制御の画像に適切な水深で注釈を付けるのに多くの労力を要する。本研究では,2つの画像のうち,どの画像が水位が高いかを示すのみを示す,注釈付き水位と,より弱いアノテーションのセットから,予測器を消耗的に学習する方法を実証する。さらに,DeepFloodという新たなデータセットと8145の注釈付き地上レベルの画像を提供し,提案手法により,1つのクラウドソース画像から約11cmの平均平方誤差で水位を予測することができることを示す。

Floods are among the most frequent and catastrophic natural disasters and affect millions of people worldwide. It is important to create accurate flood maps to plan (offline) and conduct (real-time) flood mitigation and flood rescue operations. Arguably, images collected from social media can provide useful information for that task, which would otherwise be unavailable. We introduce a computer vision system that estimates water depth from social media images taken during flooding events, in order to build flood maps in (near) real-time. We propose a multi-task (deep) learning approach, where a model is trained using both a regression and a pairwise ranking loss. Our approach is motivated by the observation that a main bottleneck for image-based flood level estimation is training data: it is diffcult and requires a lot of effort to annotate uncontrolled images with the correct water depth. We demonstrate how to effciently learn a predictor from a small set of annotated water levels and a larger set of weaker annotations that only indicate in which of two images the water level is higher, and are much easier to obtain. Moreover, we provide a new dataset, named DeepFlood, with 8145 annotated ground-level images, and show that the proposed multi-task approach can predict the water level from a single, crowd-sourced image with ~11 cm root mean square error.

翻訳日:2022-11-10 13:57:01 公開日:2020-07-14

# reluアクティベーションを用いたニューラルネットワークのための局所領域の線形領域数制限

Bounding The Number of Linear Regions in Local Area for Neural Networks with ReLU Activations ( http://arxiv.org/abs/2007.06803v1 )

ライセンス: Link先を確認

Rui Zhu, Bo Lin, Haixu Tang

(参考訳) 線形領域の数は、ReLUのような一方向線形活性化関数を用いたニューラルネットワークの特性の1つであり、他のアクティベーション関数を用いた従来の領域と比較する。この特性はニューラルネットワークファミリー([14])の表現性を反映しており、結果として、ニューラルネットワークモデルの構造的複雑さが計算する関数にどのように影響するかを特徴付けるのに使うことができる。それにもかかわらず、線形領域の数を直接計算することは困難であり、多くの研究者はReLUを用いて深部ニューラルネットワークの線形領域の数(特に上限値)を推定することに集中している。しかし、これらの手法は入力空間全体の上限を推定しようと試みた。理論的な手法では、入力空間の特定の領域内の線形領域の数、例えば、逆例やバックドアトリガーのような訓練データポイントを中心とする球数を推定することができない。本稿では,与えられたReLUニューラルネットワークの入力空間内の任意の球面における線形領域数の上界を推定する最初の手法を提案する。本手法を実装し,区分線形能動関数を用いて深層ニューラルネットワークにおける境界を計算した。実験の結果、ニューラルネットワークをトレーニングしている間、線形領域の境界はトレーニングデータポイントから離れる傾向にあることがわかった。さらに、トレーニングデータ点を中心とする球体は、入力空間内の任意の点よりも線状領域を多く含む傾向があることを観察する。我々の知る限りでは、これは特定のデータポイントの周りの線形領域の境界に関する最初の研究である。我々は、特定の入力領域におけるディープニューラルネットワークの構造的複雑さの調査に向けた第一歩であると考えている。

The number of linear regions is one of the distinct properties of the neural networks using piecewise linear activation functions such as ReLU, comparing with those conventional ones using other activation functions. Previous studies showed this property reflected the expressivity of a neural network family ([14]); as a result, it can be used to characterize how the structural complexity of a neural network model affects the function it aims to compute. Nonetheless, it is challenging to directly compute the number of linear regions; therefore, many researchers focus on estimating the bounds (in particular the upper bound) of the number of linear regions for deep neural networks using ReLU. These methods, however, attempted to estimate the upper bound in the entire input space. The theoretical methods are still lacking to estimate the number of linear regions within a specific area of the input space, e.g., a sphere centered at a training data point such as an adversarial example or a backdoor trigger. In this paper, we present the first method to estimate the upper bound of the number of linear regions in any sphere in the input space of a given ReLU neural network. We implemented the method, and computed the bounds in deep neural networks using the piece-wise linear active function. Our experiments showed that, while training a neural network, the boundaries of the linear regions tend to move away from the training data points. In addition, we observe that the spheres centered at the training data points tend to contain more linear regions than any arbitrary points in the input space. To the best of our knowledge, this is the first study of bounding linear regions around a specific data point. We consider our work as a first step toward the investigation of the structural complexity of deep neural networks in a specific input area.

翻訳日:2022-11-10 13:50:38 公開日:2020-07-14

# スペクトル誘導逆差学習

Spectrum-Guided Adversarial Disparity Learning ( http://arxiv.org/abs/2007.06831v1 )

ライセンス: Link先を確認

Zhe Liu, Lina Yao, Lei Bai, Xianzhi Wang, Can Wang

(参考訳) 行動認識領域におけるクラス内格差を正確に表現することは重要な課題であり、各活動クラスにおける主題固有の変動間の相関を堅牢に表現する必要がある。本研究では,2つの競合する符号化分布を用いてクラス条件付きクラス内不一致を表現し,学習された不一致を識別して精製された潜時符号を学習する,新しいエンド・ツー・エンドの学習フレームワークを提案する。さらに、ドメイン知識を教師なしの方法で組み込んで最適化をガイドし、パフォーマンスをさらに向上させる。 4つのharベンチマークデータセットを用いた実験により,提案手法のロバスト性と一般化が実証された。さらに,性能向上におけるドメイン知識の自動導入の有効性を実証する。

It has been a significant challenge to portray intraclass disparity precisely in the area of activity recognition, as it requires a robust representation of the correlation between subject-specific variation for each activity class. In this work, we propose a novel end-to-end knowledge directed adversarial learning framework, which portrays the class-conditioned intraclass disparity using two competitive encoding distributions and learns the purified latent codes by denoising learned disparity. Furthermore, the domain knowledge is incorporated in an unsupervised manner to guide the optimization and further boosts the performance. The experiments on four HAR benchmark datasets demonstrate the robustness and generalization of our proposed methods over a set of state-of-the-art. We further prove the effectiveness of automatic domain knowledge incorporation in performance enhancement.

翻訳日:2022-11-10 13:50:04 公開日:2020-07-14

# 非対称協調機械学習のための追加同型暗号化に基づくディープニューラルネットワーク

Additively Homomorphical Encryption based Deep Neural Network for Asymmetrically Collaborative Machine Learning ( http://arxiv.org/abs/2007.06849v1 )

ライセンス: Link先を確認

Yifei Zhang and Hao Zhu

(参考訳) 金融セクターは、さまざまな機械学習技術を適用する多くの機会を提供する。集中型機械学習は金融セクターにおけるさらなる適用を制限する制約を生み出す。データプライバシは、さまざまなセクションでモデルを学習するさまざまな金融および保険アプリケーションにとって、基本的な課題である。本稿では,一方の当事者がデータを所有し,他方がラベルのみを所有する協調機械学習の新たな実践的手法を定義し,これを「非対称協調機械学習」と呼ぶ。本研究では,両者が協調的に深層学習モデルを学習し,それぞれのデータのプライバシーを保ちながら効率的に学習できる新しいプライバシ保護アーキテクチャを提案する。より具体的には、ニューラルネットワークの前方伝播と後方伝播を4つの異なるステップに分解し、これらのステップで情報漏洩を処理する新しいプロトコルを提案する。異なるデータセットに対する広範な実験は、精度の低下なしに安定したトレーニングを行うだけでなく、最先端システムと比較して100倍以上のスピードアップを示す。

The financial sector presents many opportunities to apply various machine learning techniques. Centralized machine learning creates a constraint which limits further applications in finance sectors. Data privacy is a fundamental challenge for a variety of finance and insurance applications that account on learning a model across different sections. In this paper, we define a new practical scheme of collaborative machine learning that one party owns data, but another party owns labels only, and term this \textbf{Asymmetrically Collaborative Machine Learning}. For this scheme, we propose a novel privacy-preserving architecture where two parties can collaboratively train a deep learning model efficiently while preserving the privacy of each party's data. More specifically, we decompose the forward propagation and backpropagation of the neural network into four different steps and propose a novel protocol to handle information leakage in these steps. Our extensive experiments on different datasets demonstrate not only stable training without accuracy loss, but also more than 100 times speedup compared with the state-of-the-art system.

翻訳日:2022-11-10 13:49:51 公開日:2020-07-14

# 行動空間対応訓練による強化学習エージェントの強固化

Robustifying Reinforcement Learning Agents via Action Space Adversarial Training ( http://arxiv.org/abs/2007.07176v1 )

ライセンス: Link先を確認

Kai Liang Tan, Yasaman Esfandiari, Xian Yeow Lee, Aakanksha, Soumik Sarkar

(参考訳) 機械学習(ML)に対応したサイバー物理システム(CPS)の採用は、輸送、産業、電力網といった現代社会の様々な分野で広く普及している。深層強化学習(DRL)の最近の研究は、様々なデータ駆動型意思決定と制御アプリケーションにおいてその利点を実証している。 ML対応システムへの依存度が高まるにつれて、悪意のある状態とアクチュエーター攻撃の下でこれらのシステムの性能を研究することが不可欠である。従来の制御システムはレジリエント/フォールト耐性のコントローラを採用しており、エラー観測によってシステムを修正している。しかし、いくつかのアプリケーションでは、回復力のあるコントローラは破滅的な失敗を避けるには不十分である。理想的には、堅牢なアプローチは、システムを本質的に(設計によって)敵の攻撃に対して堅牢なシナリオにおいてより有用である。堅牢な制御には長い歴史があるが、堅牢なMLは、その関連性と緊急性をすでに示している新興の研究分野である。しかしながら、ロバストなML研究の大部分は、意思決定や制御タスクではなく、知覚タスクに焦点を合わせてきたが、制御アプリケーションに使用されるML(特にRL)モデルは、敵の攻撃に対して等しく脆弱である。本稿では,動作空間の摂動(アクチュエータアタックなど)の影響を受けやすいDRLエージェントを,対向訓練により同様の摂動に対して堅牢化することができることを示す。

Adoption of machine learning (ML)-enabled cyber-physical systems (CPS) are becoming prevalent in various sectors of modern society such as transportation, industrial, and power grids. Recent studies in deep reinforcement learning (DRL) have demonstrated its benefits in a large variety of data-driven decisions and control applications. As reliance on ML-enabled systems grows, it is imperative to study the performance of these systems under malicious state and actuator attacks. Traditional control systems employ resilient/fault-tolerant controllers that counter these attacks by correcting the system via error observations. However, in some applications, a resilient controller may not be sufficient to avoid a catastrophic failure. Ideally, a robust approach is more useful in these scenarios where a system is inherently robust (by design) to adversarial attacks. While robust control has a long history of development, robust ML is an emerging research area that has already demonstrated its relevance and urgency. However, the majority of robust ML research has focused on perception tasks and not on decision and control tasks, although the ML (specifically RL) models used for control applications are equally vulnerable to adversarial attacks. In this paper, we show that a well-performing DRL agent that is initially susceptible to action space perturbations (e.g. actuator attacks) can be robustified against similar perturbations through adversarial training.

翻訳日:2022-11-10 13:48:02 公開日:2020-07-14

# 自然言語からの知的な要求工学とCADモデルへの連鎖

Intelligent requirements engineering from natural language and their chaining toward CAD models ( http://arxiv.org/abs/2007.07825v1 )

ライセンス: Link先を確認

Alain-J\'er\^ome Foug\`eres and Egon Ostrosi

(参考訳) 本稿では,デザイナーの創造性を設計する上で,デザイン言語が重要な役割を担っていると仮定する。設計者は、思考の補助、議論と意思決定の焦点、提案の信頼性を評価する手段としてモデルを使用し、開発する。本稿では,自然言語からの要求工学とCADモデルへの連鎖に関するインテリジェントな手法を提案する。言語分析から工学的要求の表現への移行は、構文構造を概念グラフで表される意味形式に変換することから成り立っている。概念グラフと述語論理の間の同型に基づいて、仕様の形式言語が提案されている。この言語の結果は連鎖し、コンピュータ支援3次元インタラクティブアプリケーション(catia)モデルに翻訳される。このツール(EGEON: Engineering desiGn sEmantics elabOration and ApplicatioN)は、エンジニアリング要件のセマンティックネットワークを表現するために開発された。提案手法を説明するために, 自動車ドアヒンジの設計に関する事例研究を行った。

This paper assumes that design language plays an important role in how designers design and on the creativity of designers. Designers use and develop models as an aid to thinking, a focus for discussion and decision-making and a means of evaluating the reliability of the proposals. This paper proposes an intelligent method for requirements engineering from natural language and their chaining toward CAD models. The transition from linguistic analysis to the representation of engineering requirements consists of the translation of the syntactic structure into semantic form represented by conceptual graphs. Based on the isomorphism between conceptual graphs and predicate logic, a formal language of the specification is proposed. The outcome of this language is chained and translated in Computer Aided Three-Dimensional Interactive Application (CATIA) models. The tool (EGEON: Engineering desiGn sEmantics elabOration and applicatioN) is developed to represent the semantic network of engineering requirements. A case study on the design of a car door hinge is presented to illustrates the proposed method.

翻訳日:2022-11-10 13:41:46 公開日:2020-07-14

# 名前の由来は? BERT は Entity Representations を他のどの名前にも最適か?

What's in a Name? Are BERT Named Entity Representations just as Good for any other Name? ( http://arxiv.org/abs/2007.06897v1 )

ライセンス: Link先を確認

Sriram Balasubramanian, Naman Jain, Gaurav Jindal, Abhijeet Awasthi, Sunita Sarawagi

(参考訳) BERTをベースとしたNLPモデルの名前付きエンティティ表現は,入力中の同じ型付きクラスからの置換に対するロバスト性を調べることで評価する。このような摂動は自然であるが、いくつかのタスクにおいて、訓練されたモデルの状況は驚くほど不安定である。脆性は、最近のエンティティ対応bertモデルでも継続される。また,この非ロバスト性の原因を,トークン化や発生頻度などの要因を考慮して識別する。タイプアノテーションの不確かさとラベル予測を共同でモデル化しながら,複数の置換子から予測をアンサンブルする簡易な手法を提案する。 3つのNLPタスクの実験から,本手法は自然・逆のデータセットの堅牢性を向上し,精度を高めることが示された。

We evaluate named entity representations of BERT-based NLP models by investigating their robustness to replacements from the same typed class in the input. We highlight that on several tasks while such perturbations are natural, state of the art trained models are surprisingly brittle. The brittleness continues even with the recent entity-aware BERT models. We also try to discern the cause of this non-robustness, considering factors such as tokenization and frequency of occurrence. Then we provide a simple method that ensembles predictions from multiple replacements while jointly modeling the uncertainty of type annotations and label predictions. Experiments on three NLP tasks show that our method enhances robustness and increases accuracy on both natural and adversarial datasets.

翻訳日:2022-11-10 13:41:01 公開日:2020-07-14

# 機械翻訳におけるシステム結合投票のモデル化

Modeling Voting for System Combination in Machine Translation ( http://arxiv.org/abs/2007.06943v1 )

ライセンス: Link先を確認

Xuancheng Huang, Jiacheng Zhang, Zhixing Tan, Derek F. Wong, Huanbo Luan, Jingfang Xu, Maosong Sun, Yang Liu

(参考訳) システム結合は、異なる機械翻訳システムの仮説を組み合わせる重要な技術であり、翻訳性能を向上させる。システム組み合わせに対する初期の統計的アプローチは仮説間のコンセンサスを分析するのに有効であることが証明されているが、パイプラインの使用によるエラー伝搬の問題に悩まされている。この問題は、近年のマルチソースシーケンス・ツー・シーケンスモデルのエンドツーエンドトレーニングによって緩和されているが、これらのニューラルモデルは仮説間の関係を明示的に分析せず、仮説中の単語への注意が独立に計算されるため、複数の仮説で単語が生じる可能性を無視する。本研究では,機械翻訳におけるシステム組み合わせに対する投票のモデル化手法を提案する。基本的な考え方は、異なるシステムからの仮説における単語を、代表的で生成プロセスに関与するべき単語に投票できるようにすることである。これは、各投票者の影響力と各候補者の選好を定量化する。本手法は,仮説間の関係を解析できるだけでなく,エンドツーエンドのトレーニングを可能にするため,統計的手法とニューラル手法の利点を組み合わせる。実験の結果,我々の手法は仮説のコンセンサスをうまく活用でき,中国語とドイツ語の機械翻訳タスクにおける最先端のベースラインを大幅に改善できることがわかった。

System combination is an important technique for combining the hypotheses of different machine translation systems to improve translation performance. Although early statistical approaches to system combination have been proven effective in analyzing the consensus between hypotheses, they suffer from the error propagation problem due to the use of pipelines. While this problem has been alleviated by end-to-end training of multi-source sequence-to-sequence models recently, these neural models do not explicitly analyze the relations between hypotheses and fail to capture their agreement because the attention to a word in a hypothesis is calculated independently, ignoring the fact that the word might occur in multiple hypotheses. In this work, we propose an approach to modeling voting for system combination in machine translation. The basic idea is to enable words in hypotheses from different systems to vote on words that are representative and should get involved in the generation process. This can be done by quantifying the influence of each voter and its preference for each candidate. Our approach combines the advantages of statistical and neural methods since it can not only analyze the relations between hypotheses but also allow for end-to-end training. Experiments show that our approach is capable of better taking advantage of the consensus between hypotheses and achieves significant improvements over state-of-the-art baselines on Chinese-English and English-German machine translation tasks.

翻訳日:2022-11-10 13:40:48 公開日:2020-07-14

# 質問応答におけるホログラフィック圧縮埋め込みの利用

Using Holographically Compressed Embeddings in Question Answering ( http://arxiv.org/abs/2007.07287v1 )

ライセンス: Link先を確認

Salvador E. Barbosa

(参考訳) 単語ベクトル表現は、ディープラーニング自然言語処理モデルの中心である。埋め込みとして知られるこれらのベクトルの多くの形式があり、例えば word2vec や GloVe がある。埋め込みは大きなコーパスで訓練され、文脈で単語の使用法を学び、単語間の意味的関係を捉える。しかし、そのような訓練のセマンティクスは(単語型として知られる)異なる単語のレベルであり、例えば、単語型が名詞または動詞である場合、曖昧である可能性がある。質問応答では、入力部分と名前付きエンティティタイプが重要であるが、これらの属性を神経モデルにエンコードすることで入力のサイズが拡大する。本研究は,予め訓練された埋め込みのホログラフィック圧縮を用いて,トークン,その部分表現,名前付きエンティティタイプを,トークンのみを表すのと同じ次元で表現する。この実装は、修正された質問応答の繰り返しディープラーニングネットワークにおいて、意味的関係が保存され、高い性能が得られることを示す。

Word vector representations are central to deep learning natural language processing models. Many forms of these vectors, known as embeddings, exist, including word2vec and GloVe. Embeddings are trained on large corpora and learn the word's usage in context, capturing the semantic relationship between words. However, the semantics from such training are at the level of distinct words (known as word types), and can be ambiguous when, for example, a word type can be either a noun or a verb. In question answering, parts-of-speech and named entity types are important, but encoding these attributes in neural models expands the size of the input. This research employs holographic compression of pre-trained embeddings, to represent a token, its part-of-speech, and named entity type, in the same dimension as representing only the token. The implementation, in a modified question answering recurrent deep learning network, shows that semantic relationships are preserved, and yields strong performance.

翻訳日:2022-11-10 13:40:23 公開日:2020-07-14

# 単部適応Q-ラーニング

Single-partition adaptive Q-learning ( http://arxiv.org/abs/2007.06741v1 )

ライセンス: Link先を確認

Jo\~ao Pedro Ara\'ujo, M\'ario Figueiredo, Miguel Ayala Botto

(参考訳) 本稿では、マルコフ決定過程(MDP)の状態空間を適応的に分割するモデルフリー・エピソード強化学習(RL)のアルゴリズムである単一分割適応Q-ラーニング(SPAQL)を紹介し、同時に時間不変ポリシー(例えば、状態から行動へのマッピングはエピソード時間ステップに依存しない)を学習し、累積報酬を最大化する。探索と搾取の間のトレードオフは、訓練中にuper confidence bounds(ucb)とboltzmann exploration(ボルツマン探索)の混合物を使い、トレーニングの進捗に合わせて自動的に調整される温度パラメータを用いて処理される。このアルゴリズムは適応型Q-ラーニング(AQL)よりも改善されている。最適な解に速く収束すると同時に、より少ないアームを使用する。多数のタイムステップを持つエピソードのテストでは、SPAQLはAQLとは異なり、スケーリングに問題はないことが示されている。この経験的証拠に基づき、SPAQLはAQLよりも高いサンプリング効率を持つため、効率的なモデルフリーなRL手法の分野における重要な貢献であると主張している。

This paper introduces single-partition adaptive Q-learning (SPAQL), an algorithm for model-free episodic reinforcement learning (RL), which adaptively partitions the state-action space of a Markov decision process (MDP), while simultaneously learning a time-invariant policy (i. e., the mapping from states to actions does not depend explicitly on the episode time step) for maximizing the cumulative reward. The trade-off between exploration and exploitation is handled by using a mixture of upper confidence bounds (UCB) and Boltzmann exploration during training, with a temperature parameter that is automatically tuned as training progresses. The algorithm is an improvement over adaptive Q-learning (AQL). It converges faster to the optimal solution, while also using fewer arms. Tests on episodes with a large number of time steps show that SPAQL has no problems scaling, unlike AQL. Based on this empirical evidence, we claim that SPAQL may have a higher sample efficiency than AQL, thus being a relevant contribution to the field of efficient model-free RL methods.

翻訳日:2022-11-10 13:39:15 公開日:2020-07-14

# 比較とリウェイト:類似画像集合を用いた識別的画像キャプション

Compare and Reweight: Distinctive Image Captioning Using Similar Images Sets ( http://arxiv.org/abs/2007.06877v1 )

ライセンス: Link先を確認

Jiuniu Wang, Wenjia Xu, Qingzhong Wang, Antoni B. Chan

(参考訳) BLEU、CIDEr、SPICEといった一般的な指標に基づいて、幅広い画像キャプションモデルが開発され、大幅に改善されている。しかし、生成されたキャプションは画像を正確に記述できるが、類似した画像には汎用的であり、各画像の特異性を適切に記述することができない。本稿では,類似画像の集合を用いた訓練により,画像キャプションの識別性を向上することを目的とする。まず,類似画像に対する字幕の識別性を評価するために,セットcider(ciderbtw)間の識別性指標を提案する。評価基準は,各画像の人的アノテーションが特徴性に基づいて等価でないことを示す。そこで本研究では,CIDErBtwを重み付き損失関数あるいは強化学習報酬として用いることにより,画像毎のキャプションの特異性を高めるための新たなトレーニング戦略を提案する。最後に,提案手法は,CIDErBtwで測定した特徴量と,CIDErで測定した精度(例えば,CIDErで測定した精度)を,多種多様な画像キャプションベースラインに対して有意に改善することを示す。これらの結果はユーザ調査によってさらに確認される。

A wide range of image captioning models has been developed, achieving significant improvement based on popular metrics, such as BLEU, CIDEr, and SPICE. However, although the generated captions can accurately describe the image, they are generic for similar images and lack distinctiveness, i.e., cannot properly describe the uniqueness of each image. In this paper, we aim to improve the distinctiveness of image captions through training with sets of similar images. First, we propose a distinctiveness metric -- between-set CIDEr (CIDErBtw) to evaluate the distinctiveness of a caption with respect to those of similar images. Our metric shows that the human annotations of each image are not equivalent based on distinctiveness. Thus we propose several new training strategies to encourage the distinctiveness of the generated caption for each image, which are based on using CIDErBtw in a weighted loss function or as a reinforcement learning reward. Finally, extensive experiments are conducted, showing that our proposed approach significantly improves both distinctiveness (as measured by CIDErBtw and retrieval metrics) and accuracy (e.g., as measured by CIDEr) for a wide variety of image captioning baselines. These results are further confirmed through a user study.

翻訳日:2022-11-10 13:33:27 公開日:2020-07-14

# 自動合成から現実への一般化

Automated Synthetic-to-Real Generalization ( http://arxiv.org/abs/2007.06965v1 )

ライセンス: Link先を確認

Wuyang Chen, Zhiding Yu, Zhangyang Wang, Anima Anandkumar

(参考訳) 合成画像で訓練されたモデルは、しばしば実データへの分解された一般化に直面します。慣例として、これらのモデルはimagenet事前学習された表現で初期化されることが多い。しかし、この知識を活用して一般化能力を維持する慣習にもかかわらず、イメージネット知識の役割はほとんど議論されない。例えば、早期停止と階層的学習率の慎重な調整は、合成と現実の一般化を改善することが示されるが、熱心でヒューリスティックでもある。本研究では, 合成学習モデルに対して, imagenet 事前学習モデルと類似表現を維持することを明示的に推奨し, 層別学習率の自動選択のための \textit{learning-to-optimize (l2o)" 戦略を提案する。提案フレームワークは,実データを見たりトレーニングしたりすることなく,合成から実への一般化性能を大幅に向上できると同時に,ドメイン適応などの下流タスクにもメリットがある。コードは、https://github.com/NVlabs/ASG.comで入手できる。

Models trained on synthetic images often face degraded generalization to real data. As a convention, these models are often initialized with ImageNet pre-trained representation. Yet the role of ImageNet knowledge is seldom discussed despite common practices that leverage this knowledge to maintain the generalization ability. An example is the careful hand-tuning of early stopping and layer-wise learning rates, which is shown to improve synthetic-to-real generalization but is also laborious and heuristic. In this work, we explicitly encourage the synthetically trained model to maintain similar representations with the ImageNet pre-trained model, and propose a \textit{learning-to-optimize (L2O)} strategy to automate the selection of layer-wise learning rates. We demonstrate that the proposed framework can significantly improve the synthetic-to-real generalization performance without seeing and training on real data, while also benefiting downstream tasks such as domain adaptation. Code is available at: https://github.com/NVlabs/ASG.

翻訳日:2022-11-10 13:33:03 公開日:2020-07-14

# 破滅的忘れの解剖--隠れた表現とタスクの意味論

Anatomy of Catastrophic Forgetting: Hidden Representations and Task Semantics ( http://arxiv.org/abs/2007.07400v1 )

ライセンス: Link先を確認

Vinay V. Ramasesh, Ethan Dyer, Maithra Raghu

(参考訳) 汎用機械学習システムの開発における中心的な課題は、破滅的な忘れさだ。タスクの順序でトレーニングされたモデルが、以前のタスクで大幅なパフォーマンス低下を被る。破滅的な忘れ物が多用されているにもかかわらず、基礎となるプロセスとその原因についての理解は限られている。本稿では,この重要な知識ギャップに対処し,ニューラルネットワークモデルにおいて,忘れることが表現に与える影響について検討する。表現分析手法により,深い層が忘れの源であることがわかった。これを支持するために、忘れを緩和する方法の研究は、より深い層を安定化するために働くことを示す。これらの洞察は、タスク間の表象的類似性を忘れる程度に関連する分析的議論と経験的図の開発を可能にする。この図と一致して、中間相似性を持つタスクシーケンスの最大忘れが観測される。我々は、標準分割CIFAR-10セットアップに関する実証的研究を行い、また、現実的な入力分布シフトを近似する新しいCIFAR-100タスクを導入する。

A central challenge in developing versatile machine learning systems is catastrophic forgetting: a model trained on tasks in sequence will suffer significant performance drops on earlier tasks. Despite the ubiquity of catastrophic forgetting, there is limited understanding of the underlying process and its causes. In this paper, we address this important knowledge gap, investigating how forgetting affects representations in neural network models. Through representational analysis techniques, we find that deeper layers are disproportionately the source of forgetting. Supporting this, a study of methods to mitigate forgetting illustrates that they act to stabilize deeper layers. These insights enable the development of an analytic argument and empirical picture relating the degree of forgetting to representational similarity between tasks. Consistent with this picture, we observe maximal forgetting occurs for task sequences with intermediate similarity. We perform empirical studies on the standard split CIFAR-10 setup and also introduce a novel CIFAR-100 based task approximating realistic input distribution shift.

翻訳日:2022-11-10 13:32:02 公開日:2020-07-14

# Sudo rm -rf: ユニバーサル音源分離のための効率的なネットワーク

Sudo rm -rf: Efficient Networks for Universal Audio Source Separation ( http://arxiv.org/abs/2007.06833v1 )

ライセンス: Link先を確認

Efthymios Tzinis, Zhepei Wang and Paris Smaragdis

(参考訳) 本稿では,エンドツーエンドの汎用音源分離のための効率的なニューラルネットワークを提案する。具体的には、この畳み込みネットワークのバックボーン構造は、単純な1次元畳み込みによって実行される、複数の解像度特徴(sudormrf)の連続的なダウンサンプリングと再サンプリングである。このようにして,浮動小数点演算数,メモリ要求数,パラメータ数,レイテンシを限定した高品質なオーディオソース分離を実現することができる。音声と環境音の分離データセットを用いた実験により,SuDoRMRFは相容れない性能を示し,計算資源の要求が大幅に高い様々な最先端手法を超越していることがわかった。

In this paper, we present an efficient neural network for end-to-end general purpose audio source separation. Specifically, the backbone structure of this convolutional network is the SUccessive DOwnsampling and Resampling of Multi-Resolution Features (SuDoRMRF) as well as their aggregation which is performed through simple one-dimensional convolutions. In this way, we are able to obtain high quality audio source separation with limited number of floating point operations, memory requirements, number of parameters and latency. Our experiments on both speech and environmental sound separation datasets show that SuDoRMRF performs comparably and even surpasses various state-of-the-art approaches with significantly higher computational resource requirements.

翻訳日:2022-11-10 13:31:47 公開日:2020-07-14

# Rewardsによるプログラミング

Programming by Rewards ( http://arxiv.org/abs/2007.06835v1 )

ライセンス: Link先を確認

Nagarajan Natarajan, Ajaykrishna Karthikeyan, Prateek Jain, Ivan Radicek, Sriram Rajamani, Sumit Gulwani, Johannes Gehrke

(参考訳) PBR(Programming by rewards)は,パフォーマンスや資源利用,あるいはベンチマーク上の正当性などの定量的指標を最適化するために,サブルーチンを指定・合成するための新しい手法である。 PBR仕様は(1)入力機能$x$、(2)報酬関数$r$で、ブラックボックスコンポーネントとしてモデル化され、実行毎に報酬を割り当てる。シンセサイザーの目標は「決定関数」$f$を合成することであり、ブラックボックスコンポーネントの判断値を変換して、様々な値の$x$に対して$f(x)$を実行するための期待報酬$e[r \circ f(x)]$を最大化することである。我々は,木構造における入力特徴の線形関数を分岐し,木の葉における入力の線形関数を計算するループフリーif-then-elseプログラムのdslにおける決定関数の空間を考える。このdslはプログラマが実際に手作業で記述した決定関数をキャプチャする。我々の技術的貢献は、if-then-elseプログラムのような決定関数の合成に連続最適化技術を使うことである。また、このフレームワークは理論的に確立された -- 報酬が優れた特性を満たす場合において、合成されたコードは正確な意味で最適であることを示す。我々は,pbrを活用して,proseコードベースにおける検索・ランキングヒューリスティックスに関連する非自明な決定関数(産業強度プログラム合成フレームワーク)を合成し,複数人のチューニングにおいて手作業による手続きと競合する結果を得る。実世界のケーススタディ(PROSEを含む)と単純な合成ベンチマークにおいて,他のベースライン技術に対する実証評価を行った。

We formalize and study ``programming by rewards'' (PBR), a new approach for specifying and synthesizing subroutines for optimizing some quantitative metric such as performance, resource utilization, or correctness over a benchmark. A PBR specification consists of (1) input features $x$, and (2) a reward function $r$, modeled as a black-box component (which we can only run), that assigns a reward for each execution. The goal of the synthesizer is to synthesize a "decision function" $f$ which transforms the features to a decision value for the black-box component so as to maximize the expected reward $E[r \circ f (x)]$ for executing decisions $f(x)$ for various values of $x$. We consider a space of decision functions in a DSL of loop-free if-then-else programs, which can branch on linear functions of the input features in a tree-structure and compute a linear function of the inputs in the leaves of the tree. We find that this DSL captures decision functions that are manually written in practice by programmers. Our technical contribution is the use of continuous-optimization techniques to perform synthesis of such decision functions as if-then-else programs. We also show that the framework is theoretically-founded ---in cases when the rewards satisfy nice properties, the synthesized code is optimal in a precise sense. We have leveraged PBR to synthesize non-trivial decision functions related to search and ranking heuristics in the PROSE codebase (an industrial strength program synthesis framework) and achieve competitive results to manually written procedures over multiple man years of tuning. We present empirical evaluation against other baseline techniques over real-world case studies (including PROSE) as well on simple synthetic benchmarks.

翻訳日:2022-11-10 13:31:36 公開日:2020-07-14

# 注目すべき発話予測による医師・患者会話の構造化データ抽出

Extracting Structured Data from Physician-Patient Conversations By Predicting Noteworthy Utterances ( http://arxiv.org/abs/2007.07151v1 )

ライセンス: Link先を確認

Kundan Krishna, Amy Pavel, Benjamin Schloss, Jeffrey P. Bigham, Zachary C. Lipton

(参考訳) 医療データの多様なモダリティを発掘する様々な努力にもかかわらず、診療当時の医師と患者の会話は未解決の洞察の源である。本稿では,このデータを利用して医師の電子的健康記録における訪問後の文書化を支援する構造情報を抽出し,聖職者の負担軽減を図る。本稿では,会話の書き起こし,ビジット後の要約,それに対応する証拠(転写文),構造化ラベルからなる新しいデータセットについて述べる。我々は, 臓器システム(ros)のレビューにおいて, 関連する診断や異常の認識の課題に焦点をあてる。方法論上の課題の1つは、会話が長い(約1500語)ため、現代のディープラーニングモデルがそれらを入力として使用するのが困難である。この課題に対処するために,会話の一部が要約文を支持する証拠として引用される可能性が高い,注目すべき発話を抽出する。予測音声を初めてフィルタリングすることにより,診断とRoS異常の両方を認識するための予測性能を大幅に向上させることができる。

Despite diverse efforts to mine various modalities of medical data, the conversations between physicians and patients at the time of care remain an untapped source of insights. In this paper, we leverage this data to extract structured information that might assist physicians with post-visit documentation in electronic health records, potentially lightening the clerical burden. In this exploratory study, we describe a new dataset consisting of conversation transcripts, post-visit summaries, corresponding supporting evidence (in the transcript), and structured labels. We focus on the tasks of recognizing relevant diagnoses and abnormalities in the review of organ systems (RoS). One methodological challenge is that the conversations are long (around 1500 words), making it difficult for modern deep-learning models to use them as input. To address this challenge, we extract noteworthy utterances---parts of the conversation likely to be cited as evidence supporting some summary sentence. We find that by first filtering for (predicted) noteworthy utterances, we can significantly boost predictive performance for recognizing both diagnoses and RoS abnormalities.

翻訳日:2022-11-10 13:24:04 公開日:2020-07-14

# 評価基準は包括的一般化の更新を必要とします

Our Evaluation Metric Needs an Update to Encourage Generalization ( http://arxiv.org/abs/2007.06898v1 )

ライセンス: Link先を確認

Swaroop Mishra, Anjana Arunkumar, Chris Bryan and Chitta Baral

(参考訳) いくつかの人気のあるベンチマークで人的パフォーマンスを上回るモデルでは、out of Distribution(OOD)データに曝露した場合のパフォーマンスが著しく低下する。最近の研究では、モデルが人間のような一般化可能な特徴を学習する代わりに、刺激的なバイアスや「ハック」データセットに過度に適合していることが示されている。モデル性能のインフレーション(つまりAIシステムの能力の過大評価)を抑えるため、我々は、評価中の一般化を促進する単純で斬新な評価指標であるWOODスコアを提案する。

Models that surpass human performance on several popular benchmarks display significant degradation in performance on exposure to Out of Distribution (OOD) data. Recent research has shown that models overfit to spurious biases and `hack' datasets, in lieu of learning generalizable features like humans. In order to stop the inflation in model performance -- and thus overestimation in AI systems' capabilities -- we propose a simple and novel evaluation metric, WOOD Score, that encourages generalization during evaluation.

翻訳日:2022-11-10 13:23:47 公開日:2020-07-14

# 固有タスクを用いた生涯学習:タスク分離、スキル獲得、選択転送

Lifelong Learning using Eigentasks: Task Separation, Skill Acquisition, and Selective Transfer ( http://arxiv.org/abs/2007.06918v1 )

ライセンス: Link先を確認

Aswin Raghavan, Jesse Hostetler, Indranil Sur, Abrar Rahman, Ajay Divakaran

(参考訳) 生涯学習のための固有タスクフレームワークを紹介する。固有タスク(eigentask)とは、関連するタスクの集合を解決するスキルのペアであり、そのスキルの入力空間からサンプルできる生成モデルとペアリングする。このフレームワークは、主に破滅的な忘れを避けるために使われてきた生成的リプレイアプローチを拡張し、フォワード・ナレッジ・トランスファーのような他の生涯学習目標にも対処する。我々は,学習のためのタスク学習と知識統合を交互に行うウェイクスリープサイクルを提案し,生涯教師付き学習と生涯rlをインスタンス化する。我々は,教師付き連続学習における最先端の性能向上を実現し,ゲーム『スタークラフト2』の生涯RLアプリケーションにおけるフォワード知識伝達の証拠を示す。

We introduce the eigentask framework for lifelong learning. An eigentask is a pairing of a skill that solves a set of related tasks, paired with a generative model that can sample from the skill's input space. The framework extends generative replay approaches, which have mainly been used to avoid catastrophic forgetting, to also address other lifelong learning goals such as forward knowledge transfer. We propose a wake-sleep cycle of alternating task learning and knowledge consolidation for learning in our framework, and instantiate it for lifelong supervised learning and lifelong RL. We achieve improved performance over the state-of-the-art in supervised continual learning, and show evidence of forward knowledge transfer in a lifelong RL application in the game Starcraft2.

翻訳日:2022-11-10 13:23:38 公開日:2020-07-14

# リパラメータ化によるMLシステムの検証

Verification of ML Systems via Reparameterization ( http://arxiv.org/abs/2007.06776v1 )

ライセンス: Link先を確認

Jean-Baptiste Tristan, Joseph Tassarotti, Koundinya Vajjha, Michael L. Wick, Anindya Banerjee

(参考訳) 機械学習が本質的なシステムでますます使われているため、深刻なバグの発生を低減または排除することが重要である。成長する研究機関は、パフォーマンス、堅牢性、公正性に関する正式な保証を備えた機械学習アルゴリズムを開発した。しかし、これらのアルゴリズムの分析はしばしば複雑であり、実際にそのようなシステムを実装するとエラーの余地が生じる。証明アシスタントは、そのようなバグを除外する正当性のマシンチェック証明を構築することによって、機械学習システムの正式な検証に使用できる。しかし、証明アシスタントの内部での確率的主張の推論は依然として困難である。確率的プログラムが 'emph{reparameterization} という概念を用いて定理証明器で自動的に表現され、また可測性の退屈な証明が確率的プログラムから自動的に生成されることを示す。このアプローチが、かなり異なるタイプの機械学習システムを扱うのに十分広いことを実証するために、統計的学習理論(PAC-learnability of decision stumps)からの古典的な結果と、ベイズ仮説テストで用いられるヌルモデルが、人口統計パリティと呼ばれる公正な基準を満たすことを証明した。

As machine learning is increasingly used in essential systems, it is important to reduce or eliminate the incidence of serious bugs. A growing body of research has developed machine learning algorithms with formal guarantees about performance, robustness, or fairness. Yet, the analysis of these algorithms is often complex, and implementing such systems in practice introduces room for error. Proof assistants can be used to formally verify machine learning systems by constructing machine checked proofs of correctness that rule out such bugs. However, reasoning about probabilistic claims inside of a proof assistant remains challenging. We show how a probabilistic program can be automatically represented in a theorem prover using the concept of \emph{reparameterization}, and how some of the tedious proofs of measurability can be generated automatically from the probabilistic program. To demonstrate that this approach is broad enough to handle rather different types of machine learning systems, we verify both a classic result from statistical learning theory (PAC-learnability of decision stumps) and prove that the null model used in a Bayesian hypothesis test satisfies a fairness criterion called demographic parity.

翻訳日:2022-11-10 13:23:24 公開日:2020-07-14

# 因果推論の線形構造方程式モデルにおけるロバスト同定可能性

Robust Identifiability in Linear Structural Equation Models of Causal Inference ( http://arxiv.org/abs/2007.06869v1 )

ライセンス: Link先を確認

Karthik Abinav Sankararaman, Anand Louis, Navin Goyal

(参考訳) 本研究では,線形構造方程式モデル(LSEM)の文脈における観測データからのロバストパラメータ推定の問題について考察する。 LSEMは、自然科学と社会科学の因果関係を推定するための、人気がありよく研究されているモデルのクラスである。 LSEMに関連する主な問題の1つは、観測データからモデルパラメータを復元することである。 LSEMとモデルパラメータの様々な条件の下で、先行研究はパラメータを復元する効率的なアルゴリズムを提供する。しかし、これらの結果はしばしば汎用的な識別可能性に関するものである。実際には、一般的な識別性は十分ではなく、堅牢な識別性が必要であり、観測データの小さな変化はパラメータに多大な影響を及ぼすべきではない。ロバストな識別性は、はるかに少ない注目を受けており、まだ理解されていない。 sankararaman et al. (2019) は最近、ロバストな識別性が実現可能なパラメータに関する十分条件のセットを提供した。しかしながら、彼らの研究の限界は、それらの結果は ``bow-free paths と呼ばれるLSEMの小さなサブクラスにのみ適用されることである。「'この作業では、複数の次元に沿って作業を大幅に拡張します。まず,大規模かつ十分に検討されたlsemsクラス,すなわち`bow free''モデルに対して,ロバスト識別性が保持するモデルパラメータに関する十分な条件を提供し,事前作業に必要なパスの制限を解消する。次に,この十分条件が高い確率で保持されることを示すことにより,頑健な識別可能性の大きい集合に対して,既存のアルゴリズムが既に頑健な識別可能性を達成していることを示す。最後に、シミュレーションと実世界の両方のデータセットで結果を検証する。

In this work, we consider the problem of robust parameter estimation from observational data in the context of linear structural equation models (LSEMs). LSEMs are a popular and well-studied class of models for inferring causality in the natural and social sciences. One of the main problems related to LSEMs is to recover the model parameters from the observational data. Under various conditions on LSEMs and the model parameters the prior work provides efficient algorithms to recover the parameters. However, these results are often about generic identifiability. In practice, generic identifiability is not sufficient and we need robust identifiability: small changes in the observational data should not affect the parameters by a huge amount. Robust identifiability has received far less attention and remains poorly understood. Sankararaman et al. (2019) recently provided a set of sufficient conditions on parameters under which robust identifiability is feasible. However, a limitation of their work is that their results only apply to a small sub-class of LSEMs, called ``bow-free paths.'' In this work, we significantly extend their work along multiple dimensions. First, for a large and well-studied class of LSEMs, namely ``bow free'' models, we provide a sufficient condition on model parameters under which robust identifiability holds, thereby removing the restriction of paths required by prior work. We then show that this sufficient condition holds with high probability which implies that for a large set of parameters robust identifiability holds and that for such parameters, existing algorithms already achieve robust identifiability. Finally, we validate our results on both simulated and real-world datasets.

翻訳日:2022-11-10 13:23:03 公開日:2020-07-14

# 多腕バンディットにおける汎用的異常検出

Generic Outlier Detection in Multi-Armed Bandit ( http://arxiv.org/abs/2007.07293v1 )

ライセンス: Link先を確認

Yikun Ban and Jingrui He

(参考訳) 本稿では,金融,医療,オンライン広告など多くのハイパフォーマンスな分野において,多腕のバンディット設定における異常アーム検出の問題点について検討する。この問題に対して、学習者は、期待された報酬が他のほとんどの腕から著しく逸脱する腕を特定することを目指している。既存の作業とは違って、期待される報酬がより大きく、小さく、あるいは通常のアーム間でも得る、汎用的なアウトリアーアームまたはアウトリアーアームグループをターゲットにしています。この目的のために、我々は、そのようなジェネリックアウトリアーアームとアウトリアーアーム群の包括的定義を提供することから始める。そこで本研究では,GOLDと呼ばれる新しい引抜きアルゴリズムを提案する。これは、高信頼境界に基づくリアルタイムな近傍グラフを構築し、通常の腕から外れ値の振る舞いパターンをキャッチする。また、その性能を様々な側面から分析する。合成データと実世界のデータの両方で行った実験において,提案アルゴリズムは98 %の精度を実現し,最先端技術と比較して平均83 %の探索コストを節約した。

In this paper, we study the problem of outlier arm detection in multi-armed bandit settings, which finds plenty of applications in many high-impact domains such as finance, healthcare, and online advertising. For this problem, a learner aims to identify the arms whose expected rewards deviate significantly from most of the other arms. Different from existing work, we target the generic outlier arms or outlier arm groups whose expected rewards can be larger, smaller, or even in between those of normal arms. To this end, we start by providing a comprehensive definition of such generic outlier arms and outlier arm groups. Then we propose a novel pulling algorithm named GOLD to identify such generic outlier arms. It builds a real-time neighborhood graph based on upper confidence bounds and catches the behavior pattern of outliers from normal arms. We also analyze its performance from various aspects. In the experiments conducted on both synthetic and real-world data sets, the proposed algorithm achieves 98 % accuracy while saving 83 % exploration cost on average compared with state-of-the-art techniques.

翻訳日:2022-11-10 13:21:48 公開日:2020-07-14

PDF登録状況（公開日: 20200714）