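Fugu-MT: arXiv paper translations

This site provides Japanese translations of arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body text is not under a CC license, or that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is performed with Fugu-Machine Translator.

The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from the use of this site (including all information and translations).

These are the papers whose publication date is 20200318.

Title	Authors	Abstract	Publication date / translation date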
# 物理的に定義されたp型MOSシリコン二重量子ドットにおけるスピン軌道場 Spin orbit field in a physically defined p type MOS silicon double quantum dot ( http://arxiv.org/abs/2003.07079v2 ) ライセンス: Link先を確認	Marian Marx, Jun Yoneda, \'Angel Guti\'errez Rubio, Peter Stano, Tomohiro Otsuka, Kenta Takeda, Sen Li, Yu Yamaoka, Takashi Nakajima, Akito Noiri, Daniel Loss, Tetsuo Kodera and Seigo Tarucha	(参考訳) シリコン中のp型金属酸化物半導体二重量子ドットにおけるスピン軌道(so)場を実験的に理論的に検討した。パウリのスピン遮断における二重点を通した漏れ電流の磁場依存性を測定する。有限磁場は、外部磁場とSO磁場が平行であるときに、昇降が最も効果的である。このようにして、トンネル孔のスピンフリップは、二重点軸に垂直なSO場が量子井戸面からほぼ完全に外れているためである。群対称表現理論を用いて、SO項の導出による測定を拡大する。平面電場(量子井戸の場合)がなければ、so場は主に平面内にあり、ラシュバの和とドレッテルハウスの項が支配的であると予測される。したがって, 観測された等電界は, 平面成分が相当な電界に起因していると解釈した。 We experimentally and theoretically investigate the spin orbit (SO) field in a physically defined, p type metal oxide semiconductor double quantum dot in silicon. We measure the magnetic field dependence of the leakage current through the double dot in the Pauli spin blockade. A finite magnetic field lifts the blockade, with the lifting least effective when the external and SO fields are parallel. In this way, we find that the spin flip of a tunneling hole is due to a SO field pointing perpendicular to the double dot axis and almost fully out of the quantum well plane. We augment the measurements by a derivation of SO terms using group symmetric representations theory. It predicts that without in plane electric fields (a quantum well case), the SO field would be mostly within the plane, dominated by a sum of a Rashba and a Dresselhaus like term. We, therefore, interpret the observed SO field as originated in the electric fields with substantial in plane components.	翻訳日:2023-05-29 00:33:59 公開日:2020-03-18
# コンパクト位相空間宇宙論の量子揺らぎ Quantum fluctuations of the compact phase space cosmology ( http://arxiv.org/abs/2003.08129v1 ) ライセンス: Link先を確認	Danilo Artigas, Sean Crowe, Jakub Mielczarek	(参考訳) 最近の記事ではPhys。 D 100, No. 4, 043533 (2019) 平面ド・ジッター宇宙論のコンパクト位相空間の一般化が提案されている。コンパクト化の主な利点は、物理量は有界であり、量子論は有限次元ヒルベルト空間によって特徴づけられる。さらに、$\mathbb{s}^2$位相空間を考えることで、量子記述は$su(2)$表現理論を用いて構成される。本研究の目的は、量子力学の半古典的状態の抽出に効果的な方法を適用することである。解析は、量子制約の事前解法とモデルの物理ハミルトニアンを抽出することによって行われる。有効レベルでは、2つの手順の結果が等価であることが示される。我々は、標準平坦位相空間での量子化後のものとは異なる、宇宙の再集合の周りのゆらぎの非自明な挙動を見つける。この挙動は量子バック反応効果を持つ修正フリードマン方程式のレベルに反映され、導出される。最後に、宇宙セクターの量子ゆらぎとホログラフィック・ブッソ境界との予期せぬ関係を示す。 In the recent article Phys. Rev. D 100, no. 4, 043533 (2019) a compact phase space generalization of the flat de Sitter cosmology has been proposed. The main advantages of the compactification is that physical quantities are bounded, and the quantum theory is characterized by finite dimensional Hilbert space. Furthermore, by considering the $\mathbb{S}^2$ phase space, quantum description is constructed with the use $SU(2)$ representation theory. The purpose of this article is to apply effective methods to extract semi-classical regime of the quantum dynamics. The analysis is performed both without prior solving of the quantum constraint and by extracting physical Hamiltonian of the model. At the effective level, the results of the two procedures are shown to be equivalent. We find a nontrivial behavior of the fluctuations around the recollapse of the universe, which is distinct from what is found after quantization with the standard flat phase space. The behavior is reflected at the level of the modified Friedmann equation with quantum back-reaction effects, which is derived. Finally, an unexpected relation between the quantum fluctuations of the cosmological sector and the holographic Bousso bound is shown.	翻訳日:2023-05-28 20:26:26 公開日:2020-03-18
# 量子アルゴリズムによる粒子トラック再構成 Particle Track Reconstruction with Quantum Algorithms ( http://arxiv.org/abs/2003.08126v1 ) ライセンス: Link先を確認	Cenk T\"uys\"uz, Federico Carminati, Bilge Demirk\"oz, Daniel Dobos, Fabio Fracas, Kristiane Novotny, Karolos Potamianos, Sofia Vallecorsa, Jean-Roch Vlimant	(参考訳) 粒子軌道再構成パラメータの正確な決定は、HL-LHC(High Luminosity Large Hadron Collider)実験において大きな課題となる。 HL-LHCにおける同時衝突の数の増加と高い検出器占有率により、トラック再構成アルゴリズムは時間と計算資源の面で極めて要求される。ヒット数の増加は、トラック再構築アルゴリズムの複雑さを増大させる。加えて、粒子のトラックにヒットを割り当てる際の曖昧さは、検出器の有限分解能とヒットの物理的近接性によって増大する。したがって、荷電粒子軌道の再構成はHL-LHCデータの正しい解釈にとって大きな課題となる。現在使われているほとんどの手法はカルマンフィルタに基づいており、ロバストであり、優れた物理性能を提供する。しかし、二乗よりはスケールが悪くなることが期待されている。ヒットレベルの組合せ背景を低減できるアルゴリズムを設計することで、カルマンフィルタに対するよりクリーンな初期シードが提供され、全体の処理時間が大幅に短縮される。量子コンピュータの顕著な特徴の1つは、非常に多くの状態を同時に評価でき、大きなパラメータ空間で検索するのに理想的な手段となることである。実際、異なるr\&dイニシアティブは、量子追跡アルゴリズムがそのような能力をどのように活用できるかを探求している。本稿では,初期シード段階における組合せ背景の低減を目的とした量子ベーストラック探索アルゴリズムの実装について述べる。 kaggle trackmlチャレンジ用に設計された公開データセットを使用します。 Accurate determination of particle track reconstruction parameters will be a major challenge for the High Luminosity Large Hadron Collider (HL-LHC) experiments. The expected increase in the number of simultaneous collisions at the HL-LHC and the resulting high detector occupancy will make track reconstruction algorithms extremely demanding in terms of time and computing resources. The increase in number of hits will increase the complexity of track reconstruction algorithms. In addition, the ambiguity in assigning hits to particle tracks will be increased due to the finite resolution of the detector and the physical closeness of the hits. Thus, the reconstruction of charged particle tracks will be a major challenge to the correct interpretation of the HL-LHC data. Most methods currently in use are based on Kalman filters which are shown to be robust and to provide good physics performance. However, they are expected to scale worse than quadratically. 
Designing an algorithm capable of reducing the combinatorial background at the hit level, would provide a much cleaner initial seed to the Kalman filter, strongly reducing the total processing time. One of the salient features of Quantum Computers is the ability to evaluate a very large number of states simultaneously, making them an ideal instrument for searches in a large parameter space. In fact, different R\&D initiatives are exploring how Quantum Tracking Algorithms could leverage such capabilities. In this paper, we present our work on the implementation of a quantum-based track finding algorithm aimed at reducing combinatorial background during the initial seeding stage. We use the publicly available dataset designed for the kaggle TrackML challenge.	翻訳日:2023-05-28 20:25:55 公開日:2020-03-18
# ARIMAモデルを用いた予測犯罪 Forecasting Crime Using ARIMA Model ( http://arxiv.org/abs/2003.08006v1 ) ライセンス: Link先を確認	Khawar Islam and Akhter Raza	(参考訳) データマイニングとは,大規模データセットからさまざまなパターンや有用な情報を抽出するプロセスである。ロンドン警察によると、犯罪は2017年の初めからロンドン各区で増加している。今後、犯罪防止のための有用な情報は得られない。我々は,ロンドン地区における犯罪率の予測を,ロンドンにおける大規模な犯罪データセットを抽出し,将来における犯罪数を予測する。ロンドンにおける犯罪予測に時系列ARIMAモデルを用いた。 2年間の犯罪データを予測するARIMAモデルに5年間のデータを与える。対照的に指数的滑らかな ARIMA モデルはより高い適合値を持つ。ロンドン警視庁がウェブサイトやその他の資料から収集した実際の犯罪のデータセット。私たちの主な概念は4つの部分に分かれている。データ抽出(DE)、非構造化データのデータ処理(DP)、IBM SPSSの可視化モデル。 DEは、2012年の2016年のWebソースから犯罪データを抽出する。 DPはデータを統合して還元し、事前に定義された属性を与える。犯罪予測は、いくつかの計算を適用して分析され、その移動平均、差、自動回帰を計算する。予測モデルは80%の正確な値を与えるが、これは正確なモデルである。この作業はロンドン警察の犯罪に対する意思決定に役立つ。 Data mining is the process in which we extract the different patterns and useful Information from large dataset. According to London police, crimes are immediately increases from beginning of 2017 in different borough of London. No useful information is available for prevent crime on future basis. We forecasts crime rates in London borough by extracting large dataset of crime in London and predicted number of crimes in future. We used time series ARIMA model for forecasting crimes in London. By giving 5 years of data to ARIMA model forecasting 2 years crime data. Comparatively, with exponential smoothing ARIMA model has higher fitting values. A real dataset of crimes reported by London police collected from its website and other resources. Our main concept is divided into four parts. Data extraction (DE), data processing (DP) of unstructured data, visualizing model in IBM SPSS. DE extracts crime data from web sources during 2012 for the 2016 year. DP integrates and reduces data and give them predefined attributes. Crime prediction is analyzed by applying some calculation, calculated their moving average, difference, and auto-regression. Forecasted Model gives 80% correct values, which is formed to be an accurate model. This work helps for London police in decision-making against crime.	
翻訳日:2023-05-28 20:23:09 公開日:2020-03-18
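The ARIMA entry above names differencing, moving averages and auto-regression as the ingredients of its forecast. A minimal sketch of those three steps in plain Python might look as follows. It is not the authors' IBM SPSS workflow: the function names (`difference`, `moving_average`, `fit_ar1`, `forecast`) are hypothetical, the model is restricted to a first-order auto-regression on first differences, and the monthly counts are made-up numbers, not the London data.

```python
def difference(series):
    """First-order differencing: d[t] = y[t+1] - y[t] (the "I" in ARIMA)."""
    return [b - a for a, b in zip(series, series[1:])]


def moving_average(series, window):
    """Simple trailing moving average, used here only for smoothing and inspection."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]


def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1] (an order-1 "AR" part)."""
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    return c, phi


def forecast(series, steps):
    """Fit AR(1) on the differenced series, then integrate the predictions back."""
    diffs = difference(series)
    c, phi = fit_ar1(diffs)
    level, last_diff = series[-1], diffs[-1]
    out = []
    for _ in range(steps):
        last_diff = c + phi * last_diff
        level += last_diff
        out.append(level)
    return out


# Hypothetical monthly counts (illustrative only): the differences are built
# to follow an exact AR(1) rule so the fit can be checked by hand.
counts = [100.0]
step = 4.0
for _ in range(11):
    counts.append(counts[-1] + step)
    step = 1.0 + 0.5 * step

predicted = forecast(counts, steps=3)
```

On real data the differences would be noisy, so the fitted coefficients would only approximate the underlying trend, and a full ARIMA fit would also estimate moving-average error terms, which this sketch leaves out.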
# 光子と相互作用する原子配列に現れる量子ホール位相 Quantum Hall phase emerging in an array of atoms interacting with photons ( http://arxiv.org/abs/2003.08257v1 ) ライセンス: Link先を確認	Alexander V. Poshakinskiy, Janet Zhong, Yongguan Ke, Nikita A. Olekhno, Chaohong Lee, Yuri S. Kivshar, Alexander N. Poddubny	(参考訳) 位相量子相は現代物理学の多くの概念の根底にある。電子の無秩序なトポロジカルエッジ状態の存在は通常磁場を必要とするが、光に対する磁場の直接効果は非常に弱い。結果として、光子の位相状態のデモンストレーションは、特別な複素構造や外部時間依存変調で設計された合成場を用いる。ここでは、トポロジカルなエッジ状態、スペクトルランダウレベル、ホフスタッター・バターフライを持つ量子ホール相が単純な量子系に現れ、トポロジカルな秩序は微調整なしで相互作用からのみ生じる。このような系は、古典ディッケモデルによって記述された光に結合された2段階の原子(量子ビット)の配列であり、最近、低温原子と超伝導量子ビットの実験で実現されている。我々は、量子物理学、多体物理学、非線形トポロジカルフォトニクスを含むいくつかの分野において新たな地平線が開かれ、量子ビットアレイや量子シミュレータの実験において重要な基準点となると考えている。 Topological quantum phases underpin many concepts of modern physics. While the existence of disorder-immune topological edge states of electrons usually requires magnetic fields, direct effects of magnetic field on light are very weak. As a result, demonstrations of topological states of photons employ synthetic fields engineered in special complex structures or external time-dependent modulations. Here, we reveal that the quantum Hall phase with topological edge states, spectral Landau levels and Hofstadter butterfly can emerge in a simple quantum system, where topological order arises solely from interactions without any fine-tuning. Such systems, arrays of two-level atoms (qubits) coupled to light being described by the classical Dicke model, have recently been realized in experiments with cold atoms and superconducting qubits. We believe that our finding will open new horizons in several disciplines including quantum physics, many-body physics, and nonlinear topological photonics, and it will set an important reference point for experiments on qubit arrays and quantum simulators.	翻訳日:2023-05-28 20:16:44 公開日:2020-03-18
# 動的に分離された単一中性原子のコヒーレンス Coherence of a dynamically decoupled single neutral atom ( http://arxiv.org/abs/2003.08163v1 ) ライセンス: Link先を確認	Chang Hoong Chow, Boon Long Ng, Christian Kurtsiefer	(参考訳) 量子通信における高度な応用には、長い量子ビットコヒーレンスと効率的な原子-光子カップリングが不可欠である。コヒーレンスを維持するための1つのテクニックは動的疎結合であり、システムと環境との相互作用を減らすために周期的な再焦点パルス列を用いる。スピン分極した$^{87}$rb原子上での動的デカップリングの実装を実験的に検討した。 2つの磁気感度を持つ5S_{1/2}=ゼーマンレベル、$\lvert{F=2,\ m_{F}=-2}\rangle$と$\lvert{F=1,\ m_{F}=-1}\rangle$をクォービット状態として使用し、$\lvert{F=2,\ m_{F}=-2}\rangle$から$5P_{3/2}$に励起状態$\lvert{F'=3,\ m'_{F}=-3}\rangle$を閉光遷移によって結合する。動的デカップリング法において、より多くの再焦点パルスにより、コヒーレンス時間を38(3)$\mu$sから2ミリ秒以上にまで拡張することができた。また, 原子の運動状態と再焦点後のクビットコヒーレンスとの間には強い相関関係が見られ, トラップパラメータの解法として利用することができる。 Long qubit coherence and efficient atom-photon coupling are essential for advanced applications in quantum communication. One technique to maintain coherence is dynamical decoupling, where a periodic sequence of refocusing pulses is employed to reduce the interaction of the system with the environment. We experimentally study the implementation of dynamical decoupling on an optically-trapped, spin-polarized $^{87}$Rb atom. We use the two magnetic-sensitive $5S_{1/2}$ Zeeman levels, $\lvert{F=2,\ m_{F}=-2}\rangle$ and $\lvert{F=1,\ m_{F}=-1}\rangle$ as qubit states, motivated by the possibility to couple $\lvert{F=2,\ m_{F}=-2}\rangle$ to $5P_{3/2}$ the excited state $\lvert{F'=3,\ m'_{F}=-3}\rangle$ via a closed optical transition. With more refocusing pulses in the dynamical decoupling technique, we manage to extend the coherence time from 38(3)$\mu$s to more than two milliseconds. We also observe a strong correlation between the motional states of the atom and the qubit coherence after the refocusing, which can be used as a measurement basis to resolve trapping parameters.	翻訳日:2023-05-28 20:15:55 公開日:2020-03-18
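The dynamical-decoupling entry above relies on refocusing pulses that cancel slowly varying phase noise. The toy model below sketches that mechanism for a single refocusing pulse. It is an assumption-laden illustration, not the experiment's analysis: the noise model (a random static detuning plus a small linear drift), the dimensionless units, and the names `accumulated_phase` and `coherence` are all hypothetical.

```python
import cmath
import random


def accumulated_phase(detuning, drift, total_time, pulse_times):
    """Phase picked up by a qubit whose detuning is detuning + drift * t.

    Each ideal pi pulse in `pulse_times` flips the sign with which later
    phase is accumulated, which is the refocusing mechanism.
    """
    edges = [0.0] + sorted(pulse_times) + [total_time]
    phase, sign = 0.0, 1.0
    for start, end in zip(edges, edges[1:]):
        # Exact integral of (detuning + drift * t) over [start, end].
        phase += sign * (detuning * (end - start) + drift * (end**2 - start**2) / 2)
        sign = -sign
    return phase


def coherence(total_time, pulse_times, shots=2000, seed=0):
    """Magnitude of the shot-averaged phase factor, between 0 and 1."""
    rng = random.Random(seed)
    total = 0j
    for _ in range(shots):
        detuning = rng.gauss(0.0, 5.0)  # assumed shot-to-shot offset (rad per unit time)
        drift = rng.gauss(0.0, 0.5)     # assumed slow drift within one shot
        total += cmath.exp(1j * accumulated_phase(detuning, drift, total_time, pulse_times))
    return abs(total) / shots


free = coherence(1.0, pulse_times=[])     # no refocusing pulse
echo = coherence(1.0, pulse_times=[0.5])  # one pi pulse at the midpoint
```

In this model the static offset cancels exactly under the midpoint pulse, so `echo` stays close to 1 while `free` collapses towards 0. Only the drift term survives, which is why longer pulse sequences are needed when the noise varies faster.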
# 量子集合モデルにおける異常熱化 Anomalous Thermalization in Quantum Collective Models ( http://arxiv.org/abs/2003.08141v1 ) ライセンス: Link先を確認	Armando Rela\~no	(参考訳) 熱状態は,非平衡過程を含む実験によって追跡可能な過去の情報と関連する量の情報をいまだに保存していることが明らかとなった。我々は,マイクロカノニカル量子クルックの定理の条件を提供し,数値実験により検証する。 lipkin-meshkov-glickモデルでは、同じ平衡状態につながる2つの異なる手順は、非平衡過程における仕事の異なる統計をもたらす。ディックモデルでは、同じ非平衡プロトコルに対する2つの異なる軌道が、異なる作業統計を生成する。マイクロカノニカル平均は、全ての場合において物理観測可能な期待値の正しい結果を与えるが、マイクロカノニカル量子クルックの定理はそれらのいくつかでは失敗する。量子ゆらぎ定理のテストは、システムが適切に熱化されているかどうかを検証することが必須である。 We show that apparently thermalized states still store relevant amounts of information about their past, information that can be tracked by experiments involving nonequilibrium processes. We provide a condition for the microcanonical quantum Crook\'s theorem, and we test it by means of numerical experiments. In the Lipkin-Meshkov-Glick model, two different procedures leading to the same equilibrium states give rise to different statistics of work in nonequilibrium processes. In the Dicke model, two different trajectories for the same nonequilibrium protocol produce different statistics of work. Microcanonical averages provide the correct results for the expectation values of physical observables in all the cases; the microcanonical quantum Crook\'s theorem fails in some of them. We conclude that testing quantum fluctuation theorems is mandatory to verify if a system is properly thermalized.	翻訳日:2023-05-28 20:14:32 公開日:2020-03-18
# キャビティエンハンスドノイズ抑制を用いた寒冷原子時空間多重量子メモリ A cold atom temporally multiplexed quantum memory with cavity-enhanced noise suppression ( http://arxiv.org/abs/2003.08418v1 ) ライセンス: Link先を確認	Lukas Heller, Pau Farrera, Georg Heinze, Hugues de Riedmatten	(参考訳) 将来の量子リピータアーキテクチャは、遠く離れた光の量子状態に符号化された情報を効率的に分散することができる。本研究では, レーザー冷却雲内における時相多重量子リピータノードを, $^{87}$rb 原子で実演する。我々はDLCZプロトコルを用いて、光子対と1つの集合スピン励起(いわゆるスピン波)を複数の時間モードで書き込みパルスを用いて生成する。異なる時間モードで生成されたスピン波を識別可能とし、選択的読み出しを可能にするため、磁場勾配によるスピン波の強調と強調を制御し、関連する原子超微粒子の可逆的不均質化を誘導する。低精細な光学空洞内に原子アンサンブルを埋め込むことにより、マルチモード動作で発生する追加ノイズが強く抑制されることを示す。フィードフォワード読み出しを利用して、最大10の時間モードの区別可能な検索を示す。各モードについて、第一光子と第二光子の間の非古典的相関を証明する。さらに、メモリに記憶されている時間モードの数を増やすことにより、相関光子対の速度の増大が観察される。報告された能力は、多重量子メモリに基づく量子リピータアーキテクチャの重要な要素である。 Future quantum repeater architectures, capable of efficiently distributing information encoded in quantum states of light over large distances, will benefit from multiplexed photonic quantum memories. In this work we demonstrate a temporally multiplexed quantum repeater node in a laser-cooled cloud of $^{87}$Rb atoms. We employ the DLCZ protocol where pairs of photons and single collective spin excitations (so called spin waves) are created in several temporal modes using a train of write pulses. To make the spin waves created in different temporal modes distinguishable and enable selective readout, we control the dephasing and rephasing of the spin waves by a magnetic field gradient, which induces a controlled reversible inhomogeneous broadening of the involved atomic hyperfine levels. We demonstrate that by embedding the atomic ensemble inside a low finesse optical cavity, the additional noise generated in multi-mode operation is strongly suppressed. By employing feed forward readout, we demonstrate distinguishable retrieval of up to 10 temporal modes. For each mode, we prove non-classical correlations between the first and second photon. 
Furthermore, an enhancement in rates of correlated photon pairs is observed as we increase the number of temporal modes stored in the memory. The reported capability is a key element of a quantum repeater architecture based on multiplexed quantum memories.	翻訳日:2023-05-28 20:05:48 公開日:2020-03-18
# 1064nm顕微鏡光ツイーザの単一セシウム原子トラップ寿命に及ぼすレーザ強度変動の影響 Influence of Laser Intensity Fluctuation on Single-Cesium Atom Trapping Lifetime in a 1064-nm Microscopic Optical Tweezer ( http://arxiv.org/abs/2003.08415v1 ) ライセンス: Link先を確認	Rui Sun, Xin Wang, Kong Zhang, Jun He and Junmin Wang	(参考訳) 赤調1064nmレーザーの強集束単空間モードガウスビームからなる光tweezerは、光強度の最も強い点において単一セシウム(cs)原子を固定することができる。これを単一量子ビットと単一光子ソースのコヒーレントな操作に使用できる。光ツイーザ中の原子のトラップ寿命は、背景原子の影響、光ツイーザのレーザー強度変動、原子の残留熱運動により非常に短い。本稿では,背景圧力,光トワイザのトラップ周波数,光トワイザのパラメトリック加熱が原子トラップ寿命に及ぼす影響を分析した。 AOM(Acousto-optical modulator)に基づく外部フィードバックループと組み合わせて、時間領域における1064nmレーザーの強度変動を$\pm$3.360$\%$から$\pm$0.064$\%$に抑制し、抑制帯域はおよそ33kHzに達した。光学トウィーザーにおける単一cs原子のトラップ寿命は4.04 sから6.34 sに延長された。 An optical tweezer composed of a strongly focused single-spatial-mode Gaussian beam of a red-detuned 1064-nm laser can confine a single-cesium (Cs) atom at the strongest point of the light intensity. We can use this for coherent manipulation of single-quantum bits and single-photon sources. The trapping lifetime of the atoms in the optical tweezers is very short due to the impact of the background atoms, the laser intensity fluctuation of optical tweezer and the residual thermal motion of the atoms. In this paper, we analyzed the influence of the background pressure, the trap frequency of optical tweezers and the parametric heating of the optical tweezer on the atomic trapping lifetime. Combined with the external feedback loop based on an acousto-optical modulator (AOM), the intensity fluctuation of the 1064-nm laser in the time domain was suppressed from $\pm$ 3.360$\%$ to $\pm$ 0.064$\%$, and the suppression bandwidth reached approximately 33 kHz. The trapping lifetime of a single Cs atom in the microscopic optical tweezer was extended from 4.04 s to 6.34 s.	翻訳日:2023-05-28 20:05:27 公開日:2020-03-18
# 退化基底空間に対するAGSPのシャープ含意 Sharp implications of AGSPs for degenerate ground spaces ( http://arxiv.org/abs/2003.08406v1 ) ライセンス: Link先を確認	Nilin Abrahamsen	(参考訳) We generalize the `off-the-rack' AGSP$\Rightarrow$entanglement bound implication of [Arad, Landau, and Vazirani '12] from unique ground states to degenerate ground spaces. Our condition $R\Delta\le1/2$ on a $(\Delta,R)$-AGSP matches the non-degenerate case, whereas existing tools in the literature of spin chains would only be adequate to prove a less natural implication which assumes $R^{\text{Const}}\Delta\le c$. To show that $R\Delta\le1/2$ still suffices in the degenerate case we prove an optimal error reduction bound which improves on the literature by a factor $\delta\mu$ where $\delta=1-\mu$ is the viability. The generalized off-the-rack bound implies the generalization of a recent 2D subvolume law of [Anshu, Arad, and Gosset '19] from the non-degenerate case to the sub-exponentially degenerate case.	翻訳日:2023-05-28 20:04:32 公開日:2020-03-18
# 超伝導量子プロセッサ,スピンメモリ,フォトニック量子ネットワーク間のコヒーレントインタフェースのための光バス A Phononic Bus for Coherent Interfaces Between a Superconducting Quantum Processor, Spin Memory, and Photonic Quantum Networks ( http://arxiv.org/abs/2003.08383v1 ) ライセンス: Link先を確認	Tomas Neuman, Matt Eichenfield, Matthew Trusheim, Lisa Hackett, Prineha Narang, Dirk Englund	(参考訳) 超伝導マイクロ波量子ビットと固体人工原子の基底状態スピン系との間の高忠実な量子状態変換法を圧電トランスデューサに接続された音響バスを介して紹介する。最適化フォノニックキャビティにおける超伝導回路量子ビットおよびダイヤモンドシリコン空隙中心の現在の実験パラメータに適用し,99\%を超える忠実性を有する量子状態変換をmhz規模の帯域幅で推定する。超伝導回路量子コンピューティングと人工原子の相補的な強度を組み合わせることで、ハイブリッドアーキテクチャは、長寿命量子メモリ、高忠実度測定、大きな量子ビット数、再構成可能な量子ビット接続、光量子ネットワークによる高忠実度状態とゲートテレポーテーションを提供する。 We introduce a method for high-fidelity quantum state transduction between a superconducting microwave qubit and the ground state spin system of a solid-state artificial atom, mediated via an acoustic bus connected by piezoelectric transducers. Applied to present-day experimental parameters for superconducting circuit qubits and diamond silicon vacancy centers in an optimized phononic cavity, we estimate quantum state transduction with fidelity exceeding 99\% at a MHz-scale bandwidth. By combining the complementary strengths of superconducting circuit quantum computing and artificial atoms, the hybrid architecture provides high-fidelity qubit gates with long-lived quantum memory, high-fidelity measurement, large qubit number, reconfigurable qubit connectivity, and high-fidelity state and gate teleportation through optical quantum networks.	翻訳日:2023-05-28 20:04:07 公開日:2020-03-18
# タンパク質$\alpha$-helicesにおける自由エネルギーの量子輸送と利用 Quantum transport and utilization of free energy in protein $\alpha$-helices ( http://arxiv.org/abs/2003.13814v1 ) ライセンス: Link先を確認	Danko D. Georgiev, James F. Glazebrook	(参考訳) 生命を維持する重要な生物学的プロセスは、タンパク質ナノエンジンによって触媒され、生体系を非平衡状態から維持する。タンパク質のエネルギー過程を調べるために, 水素結合ペプチド基に沿って伝播する複数のアミドIエキシトン量子の量子力学を, 一般化したダヴィドフ方程式の系を解析した。計算シミュレーションにより、様々な長さのタンパク質$\alpha$-helicesに対してアミドiエネルギーのパルスを印加することで動くダヴィドフソリトンの生成が確認された。これらのソリトンの安定性と移動性は、アミドi振動子間の双極子-双極子カップリングの均一性とエキシトン-フォノン相互作用の等方性に依存する。ダヴィドフ・ソリトンは巨大な障壁を通る量子トンネルや、衝突地点での量子干渉も可能であった。この結果は、高分子構造の結合剤としての共有結合の力学的支持を超えた生物学的システムにおける量子効果の非自明な役割を支持する。ダヴィドフソリトン(英語版)の量子トンネルと干渉は、そのような真の量子現象の存在を支持する生物学的秩序の進化的な委任に加えて、高効率な輸送、輸送、利用を可能にする物理機構を持つ触媒活性のマクロ分子タンパク質複合体を提供する。 The essential biological processes that sustain life are catalyzed by protein nano-engines, which maintain living systems in far-from-equilibrium ordered states. To investigate energetic processes in proteins, we have analyzed the system of generalized Davydov equations that govern the quantum dynamics of multiple amide I exciton quanta propagating along the hydrogen-bonded peptide groups in $\alpha$-helices. Computational simulations have confirmed the generation of moving Davydov solitons by applied pulses of amide I energy for protein $\alpha$-helices of varying length. The stability and mobility of these solitons depended on the uniformity of dipole-dipole coupling between amide I oscillators, and the isotropy of the exciton-phonon interaction. Davydov solitons were also able to quantum tunnel through massive barriers, or to quantum interfere at collision sites. The results presented here support a nontrivial role of quantum effects in biological systems that lies beyond the mechanistic support of covalent bonds as binding agents of macromolecular structures. 
Quantum tunneling and interference of Davydov solitons provide catalytically active macromolecular protein complexes with a physical mechanism allowing highly efficient transport, delivery, and utilization of free energy, besides the evolutionary mandate of biological order that supports the existence of such genuine quantum phenomena, and may indeed demarcate the quantum boundaries of life.	翻訳日:2023-05-28 19:58:18 公開日:2020-03-18
# テキストマイニングフォーマメンティスネットワークはソーシャルメディアにおけるSTEM性差の一般認識を再構築する Text-mining forma mentis networks reconstruct public perception of the STEM gender gap in social media ( http://arxiv.org/abs/2003.08835v1 ) ライセンス: Link先を確認	Massimo Stella	(参考訳) マインドセット再構成(Mindset reconstruction)は、個人の構造と知識の知覚のマッピングであり、言語とその人間の心における認知的反射(精神の語彙)を調べることによって、ここで展開された地図である。 textual forma mentis networks (tfmn) は、文章データからマインドセットの構造を抽出、表現、理解するために導入されたガラスの箱である。ネットワーク科学、心理言語学、ビッグデータを組み合わせることで、TFMNは、ベンチマークテキストにおいて、監督なしに関連する概念を特定できた。ひとたび検証されると、tfmnは科学における男女格差のケーススタディに応用され、近年の研究によって歪んだ考え方に強く関連した。ソーシャルメディアの認識とオンラインの談話に焦点を当て、この研究は1万の関連ツイートを分析した。ジェンダー」と「ギャップ」はほとんど肯定的な認識を示し、信頼とジョーヨーの感情的プロファイルと意味的関連性: 女性科学者の成功、男女格差と賃金差の関連、将来の解決への期待。女性」の認識は、科学における女性に対する性的嫌がらせとステレオタイプ的脅威(暗黙の認知バイアスの一形態)に関する議論を「成功のための個人的スキルを犠牲にする」ことを強調した。人」の再構築された認識は、科学における男性の優越という神話に対する社会ユーザの認識を強調した。人」に関する怒りは検出されず、ギャップに焦点をあてた談話が性別のない言葉に関して緊張しなくなったことを示唆している。科学者」に対する定型的な認識は、実世界の調査とは異なるオンライン上では見つからなかった。総合分析では、オンラインの談話は、主にステレオタイプフリーでポジティブで信頼に満ちたジェンダー格差の認識を促進し、暗黙の/説明的な偏見を認識し、ギャップを縮めると予測している。 TFMNは、異なるグループの認識を調査するための新しい方法を開き、政策決定のための詳細なデータインフォームド基盤を提供した。 Mindset reconstruction maps how individuals structure and perceive knowledge, a map unfolded here by investigating language and its cognitive reflection in the human mind, i.e. the mental lexicon. Textual forma mentis networks (TFMN) are glass boxes introduced for extracting, representing and understanding mindsets' structure, in Latin "forma mentis", from textual data. Combining network science, psycholinguistics and Big Data, TFMNs successfully identified relevant concepts, without supervision, in benchmark texts. Once validated, TFMNs were applied to the case study of the gender gap in science, which was strongly linked to distorted mindsets by recent studies. Focusing over social media perception and online discourse, this work analysed 10,000 relevant tweets. 
"Gender" and "gap" elicited a mostly positive perception, with a trustful/joyous emotional profile and semantic associates that: celebrated successful female scientists, related gender gap to wage differences, and hoped for a future resolution. The perception of "woman" highlighted discussion about sexual harassment and stereotype threat (a form of implicit cognitive bias) relative to women in science "sacrificing personal skills for success". The reconstructed perception of "man" highlighted social users' awareness of the myth of male superiority in science. No anger was detected around "person", suggesting that gap-focused discourse got less tense around genderless terms. No stereotypical perception of "scientist" was identified online, differently from real-world surveys. The overall analysis identified the online discourse as promoting a mostly stereotype-free, positive/trustful perception of gender disparity, aware of implicit/explicit biases and projected to closing the gap. TFMNs opened new ways for investigating perceptions in different groups, offering detailed data-informed grounding for policy making.	翻訳日:2023-05-28 19:56:40 公開日:2020-03-18
# ハイブリッド量子システムの非線形光学応答の位相制御 Topological control of the nonlinear-optical response of hybrid quantum systems ( http://arxiv.org/abs/2003.08465v1 ) ライセンス: Link先を確認	Ethan L. Crowell and Mark G. Kuzyk	(参考訳) 1次元超格子の位相特性を電子系の光学的性質にマッピングする。非線形光学応答は、位相的に保護されたエッジ状態と非局在化された固有状態の間の遷移形態にある電子に最適化されている。これはハイブリッド量子システムの非線形光学応答をチューニングする新しい手段を提供する。これらの特性を飽和吸収の模倣に利用し,効率の良い全光スイッチの構築に「量子コード」をどのように利用できるかを示す。 We map the topological properties of a one dimensional superlattice to the optical properties of an electronic system. We find that the nonlinear-optical response is optimized for electrons that live in the transitional morphology between topologically protected edge states and delocalized eigenstates. This provides a novel means of tuning the nonlinear-optical response of hybrid quantum systems. We show how these characteristics can be used to mimic saturable absorption and illustrate how `quantum cords' can be used to build an efficient all-optical switch.	翻訳日:2023-05-28 19:55:12 公開日:2020-03-18
# 時間軌道電位トラップにおける磁場の精密制御と光偏光 Precise control of magnetic fields and optical polarization in a time-orbiting potential trap ( http://arxiv.org/abs/2003.08459v1 ) ライセンス: Link先を確認	A. J. Fallon and C. A. Sackett	(参考訳) 時間軌道ポテンシャルトラップは、回転磁場中の中性原子を拘束する。磁場の回転は、系統的な効果を平均化できるため、精密な測定に有用である。しかし、磁場は静的場よりも特性付けが難しく、原子に適用される光が量子化軸に対して時間的に変動する光偏光を持つ。これらの問題は、電波磁場またはレーザーが回転磁場に同期するパルスに印加されるストロボスコープ技術を用いて克服することができる。これらの方法を用いて、磁場は10mGの精度で特徴付けることができ、光は5\times 10^{-5}$の偏光誤差で適用することができる。 A time orbiting potential trap confines neutral atoms in a rotating magnetic field. The rotation of the field can be useful for precision measurements, since it can average out some systematic effects. However, the field is more difficult to characterize than a static field, and it makes light applied to the atoms have a time-varying optical polarization relative to the quantization axis. These problems can be overcome using stroboscopic techniques, where either a radio-frequency field or a laser is applied in pulses that are synchronized to the rotating field. Using these methods, the magnetic field can be characterized with a precision of 10 mG and light can be applied with a polarization error of $5\times 10^{-5}$.	翻訳日:2023-05-28 19:55:03 公開日:2020-03-18
# 量子イジングリングにおける縦磁化ダイナミクス:運動量空間と実空間の対応に基づくパフィアン法 Longitudinal magnetization dynamics in the quantum Ising ring: A Pfaffian method based on correspondence between momentum space and real space ( http://arxiv.org/abs/2001.00511v2 ) ライセンス: Link先を確認	Ning Wu	(参考訳) おそらく最も研究されている量子相転移のパラダイムとして、周期的量子イジング鎖はジョルダン・ウィグナー変換によって正確に解くことができ、続いてスピンレスフェルミオンの運動量空間でモデルを対角化するフーリエ変換が続く。上記の手順はよく知られているが、量子イジング環の実空間と運動量空間の表現、特にフェルミオンパリティに関する対応に関して、いくつかの微妙な点がある。本研究では、実空間における2つの完全整列強磁性状態と古典的イジング環の2つの退化運動量空間基底状態との関係を定め、前者はフラストレーションのない超曲面上のより一般的なXYZモデルの分解基底状態の特別な場合である。この観察に基づいて, 2つの強磁性状態のうちの1つと, 並進不変な駆動下で作製した系を用いて, パリティ破断した縦磁化のリアルタイムダイナミクスを計算するためのファフィアン公式を提供する。この形式主義は、パフィアンの数値計算のためのオンラインプログラムの助けを借りてシステムに適用できるため、例えば関連するシステムにおける離散時間結晶の出現を数値的に研究するための効率的な手法を提供する。 As perhaps the most studied paradigm for a quantum phase transition, the periodic quantum Ising chain is exactly solvable via the Jordan-Wigner transformation followed by a Fourier transform that diagonalizes the model in the momentum space of spinless fermions. Although the above procedures are well-known, there remain some subtle points to be clarified regarding the correspondence between the real-space and momentum-space representations of the quantum Ising ring, especially those related to fermion parities. In this work, we establish the relationship between the two fully aligned ferromagnetic states in real space and the two degenerate momentum-space ground states of the classical Ising ring, with the former being a special case of the factorized ground states of the more general XYZ model on the frustration-free hypersurface. Based on this observation, we then provide a Pfaffian formula for calculating real-time dynamics of the parity-breaking longitudinal magnetization with the system initially prepared in one of the two ferromagnetic states and under translationally invariant drivings. 
The formalism is shown to be applicable to systems with the help of online programs for the numerical computation of the Pfaffian, thus providing an efficient method to numerically study, for example, the emergence of discrete time crystals in related systems.	翻訳日:2023-01-16 04:50:55 公開日:2020-03-18
# 進化的ニューラルアーキテクチャによる網膜血管セグメンテーションの探索 Evolutionary Neural Architecture Search for Retinal Vessel Segmentation ( http://arxiv.org/abs/2001.06678v3 ) ライセンス: Link先を確認	Zhun Fan, Jiahong Wei, Guijie Zhu, Jiajie Mo, Wenji Li	(参考訳) 正確な網膜血管セグメンテーション(RVS)は、眼科疾患やその他の全身疾患の診断において医師を支援する上で非常に重要である。網膜血管セグメンテーションのための有効なニューラルネットワークアーキテクチャを手作業で設計するには、高度な専門知識と大きなワークロードが必要です。血管セグメンテーションの性能を改善し,手動で設計するニューラルネットワークの作業量を削減するために,網膜血管セグメンテーションのためのエンコーダデコーダアーキテクチャを最適化するためのニューラルネットワーク探索(NAS)を適用した新しいアプローチを提案する。改良進化アルゴリズムは、限られた計算資源でエンコーダ・デコーダ・フレームワークのアーキテクチャを発展させるために用いられる。提案手法により得られた進化的モデルは,DRIVE, STARE, CHASE_DB1 という3つのデータセットで比較した手法の上位性能を実現するが,パラメータははるかに少ない。さらに, クロストレーニングの結果, 進化したモデルはかなりのスケーラビリティを示し, 臨床疾患診断の可能性も示唆した。 The accurate retinal vessel segmentation (RVS) is of great significance to assist doctors in the diagnosis of ophthalmology diseases and other systemic diseases. Manually designing a valid neural network architecture for retinal vessel segmentation requires high expertise and a large workload. In order to improve the performance of vessel segmentation and reduce the workload of manually designing neural network, we propose novel approach which applies neural architecture search (NAS) to optimize an encoder-decoder architecture for retinal vessel segmentation. A modified evolutionary algorithm is used to evolve the architectures of encoder-decoder framework with limited computing resources. The evolved model obtained by the proposed approach achieves top performance among all compared methods on the three datasets, namely DRIVE, STARE and CHASE_DB1, but with much fewer parameters. Moreover, the results of cross-training show that the evolved model is with considerable scalability, which indicates a great potential for clinical disease diagnosis.	翻訳日:2023-01-10 05:30:56 公開日:2020-03-18
# 任意の対象関数は、十分に広い任意のランダムネットワークの近傍に存在する:幾何学的視点 Any Target Function Exists in a Neighborhood of Any Sufficiently Wide Random Network: A Geometrical Perspective ( http://arxiv.org/abs/2001.06931v2 ) ライセンス: Link先を確認	Shun-ichi Amari	(参考訳) 任意の対象関数は、幅(層内のニューロン数)が十分に大きい場合、ランダムに接続された任意のディープネットワークの十分小さな近傍で実現されることが知られている。この顕著な事実については洗練された理論や議論があるが、厳密な理論は非常に複雑である。構造を解明するために単純なモデルを用いて基本的な幾何学的証明を与える。高次元幾何学が魔法のような役割を果たすことを示す。すなわち、半径 1 の高次元球面を低次元部分空間に射影すると、球面上の一様分布は無視できるほど小さな共分散を持つガウス分布に帰着する。 It is known that any target function is realized in a sufficiently small neighborhood of any randomly connected deep network, provided the width (the number of neurons in a layer) is sufficiently large. There are sophisticated theories and discussions concerning this striking fact, but rigorous theories are very complicated. We give an elementary geometrical proof by using a simple model for the purpose of elucidating its structure. We show that high-dimensional geometry plays a magical role: When we project a high-dimensional sphere of radius 1 to a low-dimensional subspace, the uniform distribution over the sphere reduces to a Gaussian distribution of negligibly small covariances.	翻訳日:2023-01-08 05:04:34 公開日:2020-03-18
# 深部ニューラルネットワークのための学習後線形量子化 Post-Training Piecewise Linear Quantization for Deep Neural Networks ( http://arxiv.org/abs/2002.00104v2 ) ライセンス: Link先を確認	Jun Fang, Ali Shafiee, Hamzah Abdel-Aziz, David Thorsley, Georgios Georgiadis, Joseph Hassoun	(参考訳) リソース制限されたデバイスへのディープニューラルネットワークのエネルギー効率向上において、量子化は重要な役割を果たす。トレーニング後の量子化は、完全なトレーニングデータセットの再トレーニングやアクセスを必要としないため、非常に望ましい。ニューラルネットワークを完全精度から8ビットの固定点整数に変換することにより、学習後量子化のための確立された均一なスキームが良好な結果を得る。しかし、ビット幅の量子化では性能が著しく低下する。本稿では,長尾のベル型分布を持つテンソル値の高精度近似を実現するために,区分線形量子化(pwlq)スキームを提案する。提案手法では、量子化範囲全体をテンソル毎に重複しない領域に分割し、各領域に等数の量子化レベルを割り当てる。範囲全体を分割する最適なブレークポイントは、量子化誤差を最小化する。実験結果から,提案手法は画像分類,セマンティックセグメンテーション,オブジェクト検出において,少ないオーバーヘッドで優れた性能を発揮することが示された。 Quantization plays an important role in the energy-efficient deployment of deep neural networks on resource-limited devices. Post-training quantization is highly desirable since it does not require retraining or access to the full training dataset. The well-established uniform scheme for post-training quantization achieves satisfactory results by converting neural networks from full-precision to 8-bit fixed-point integers. However, it suffers from significant performance degradation when quantizing to lower bit-widths. In this paper, we propose a piecewise linear quantization (PWLQ) scheme to enable accurate approximation for tensor values that have bell-shaped distributions with long tails. Our approach breaks the entire quantization range into non-overlapping regions for each tensor, with each region being assigned an equal number of quantization levels. Optimal breakpoints that divide the entire range are found by minimizing the quantization error. Compared to state-of-the-art post-training quantization methods, experimental results show that our proposed method achieves superior performance on image classification, semantic segmentation, and object detection with minor overhead.	翻訳日:2023-01-05 06:04:22 公開日:2020-03-18
# ML-misfit:機械学習を用いたフルウェーブフォームインバージョンのためのロバストミスフィット関数の学習 ML-misfit: Learning a robust misfit function for full-waveform inversion using machine learning ( http://arxiv.org/abs/2002.03163v2 ) ライセンス: Link先を確認	Bingbing Sun and Tariq Alkhalifah	(参考訳) フル波形インバージョン(FWI)用の利用可能なadvanced misfit関数のほとんどは手作りであり、これらのmisfit関数のパフォーマンスはデータ依存である。そこで本研究では,FWI のための ML-misfit というミスフィット関数を機械学習に基づいて学習することを提案する。マッチングフィルタの最適輸送にインスパイアされ、2つの分布の平均と分散を比較するのに類似した形で、不適合関数のためのニューラルネットワーク(NN)アーキテクチャを設計する。その結果得られたミスフィットがメトリックであることを保証するために、入力に対するミスフィットの対称性と、「三角不等式」規則を満たすためのヒンジ損失正則化項とをメタ損失関数に組み込む。メタラーニングの枠組みでは、FWIを実行してネットワークをトレーニングし、ランダムに生成された速度モデルを逆転させ、真のモデルと逆モデルの累積差として定義されるメタロスを最小化してNNのパラメータを更新する。まず,移動時シフト信号に対する凸不適合関数を学習するためのMLミスフィットの基本原理を説明する。さらに,2次元水平層モデル上でNNを訓練し,よく知られたMarmousiモデルに適用することにより,学習したMLミスフィットの有効性と堅牢性を示す。 Most of the available advanced misfit functions for full waveform inversion (FWI) are hand-crafted, and the performance of those misfit functions is data-dependent. Thus, we propose to learn a misfit function for FWI, entitled ML-misfit, based on machine learning. Inspired by the optimal transport of the matching filter misfit, we design a neural network (NN) architecture for the misfit function in a form similar to comparing the mean and variance for two distributions. To guarantee the resulting learned misfit is a metric, we accommodate the symmetry of the misfit with respect to its input and a Hinge loss regularization term in a meta-loss function to satisfy the "triangle inequality" rule. In the framework of meta-learning, we train the network by running FWI to invert for randomly generated velocity models and update the parameters of the NN by minimizing the meta-loss, which is defined as accumulated difference between the true and inverted models. We first illustrate the basic principle of the ML-misfit for learning a convex misfit function for travel-time shifted signals. Further, we train the NN on 2D horizontally layered models, and we demonstrate the effectiveness and robustness of the learned ML-misfit by applying it to the well-known Marmousi model.	翻訳日:2023-01-02 23:08:47 公開日:2020-03-18
# 不変リスク最小化ゲーム Invariant Risk Minimization Games ( http://arxiv.org/abs/2002.04692v2 ) ライセンス: Link先を確認	Kartik Ahuja, Karthikeyan Shanmugam, Kush R. Varshney, Amit Dhurandhar	(参考訳) 機械学習の標準的なリスク最小化パラダイムは、スプリアス相関によるトレーニング分布とテスト分布が異なる環境での運用では不安定である。多くの環境からのデータのトレーニングと不変な予測器の発見は、結果と因果関係を持つ特徴にモデルを集中させることで、スプリアスな特徴の効果を減らす。本研究では,複数の環境においてアンサンブルゲームのナッシュ均衡を求めるような不変リスク最小化を行う。そこで我々は,最良の応答ダイナミクスを用いた簡易な学習アルゴリズムを開発し,実験では,Arjovsky et al. (2019) の挑戦的2レベル最適化問題よりも,非常に低い分散で類似または優れた経験的精度を与える。 1つの重要な理論的貢献は、提案されたゲームに対するナッシュ均衡の集合が、非線形分類器や変換でさえ、任意の有限個の環境に対する不変予測子の集合と等価であることを示すことである。その結果、この手法はArjovsky et al. (2019) に示される大きな環境の集合に対する一般化保証も維持する。提案アルゴリズムは, 敵対的生成ネットワークなどの成功したゲーム理論的機械学習アルゴリズムの一群に加わるものである。 The standard risk minimization paradigm of machine learning is brittle when operating in environments whose test distributions are different from the training distribution due to spurious correlations. Training on data from many environments and finding invariant predictors reduces the effect of spurious features by concentrating models on features that have a causal relationship with the outcome. In this work, we pose such invariant risk minimization as finding the Nash equilibrium of an ensemble game among several environments. By doing so, we develop a simple training algorithm that uses best response dynamics and, in our experiments, yields similar or better empirical accuracy with much lower variance than the challenging bi-level optimization problem of Arjovsky et al. (2019). One key theoretical contribution is showing that the set of Nash equilibria for the proposed game is equivalent to the set of invariant predictors for any finite number of environments, even with nonlinear classifiers and transformations. As a result, our method also retains the generalization guarantees to a large set of environments shown in Arjovsky et al. (2019). The proposed algorithm adds to the collection of successful game-theoretic machine learning algorithms such as generative adversarial networks.	翻訳日:2023-01-02 01:37:44 公開日:2020-03-18
# 人物画像生成のための深部画像空間変換 Deep Image Spatial Transformation for Person Image Generation ( http://arxiv.org/abs/2003.00696v2 ) ライセンス: Link先を確認	Yurui Ren, Xiaoming Yu, Junming Chen, Thomas H. Li, Ge Li	(参考訳) ポーズ誘導人物画像生成は、ソース人物画像をターゲットのポーズへ変換するタスクである。このタスクはソースデータの空間的操作を必要とする。しかし、畳み込みニューラルネットワークは、入力を空間的に変換する能力の欠如によって制限される。本稿では,入力を特徴レベルで再構成するための微分可能なグローバルフローローカルアテンションフレームワークを提案する。具体的には、まずソースとターゲットのグローバルな相関を計算し、流れ場を予測する。そして、特徴マップからフローした局所パッチ対を抽出して局所注意係数を算出する。最後に,得られた局所的注意係数を用いたコンテンツ認識サンプリング手法を用いて,ソース特徴をワープする。主観的および客観的実験の結果から,モデルの優越性が示された。さらに,映像アニメーションとビュー合成のさらなる結果は,我々のモデルは空間変換を必要とする他のタスクに適用可能であることを示している。ソースコードはhttps://github.com/RenYurui/Global-Flow-Local-Attentionで公開している。 Pose-guided person image generation is to transform a source person image to a target pose. This task requires spatial manipulations of source data. However, Convolutional Neural Networks are limited by the lack of ability to spatially transform the inputs. In this paper, we propose a differentiable global-flow local-attention framework to reassemble the inputs at the feature level. Specifically, our model first calculates the global correlations between sources and targets to predict flow fields. Then, the flowed local patch pairs are extracted from the feature maps to calculate the local attention coefficients. Finally, we warp the source features using a content-aware sampling method with the obtained local attention coefficients. The results of both subjective and objective experiments demonstrate the superiority of our model. Besides, additional results in video animation and view synthesis show that our model is applicable to other tasks requiring spatial transformation. Our source code is available at https://github.com/RenYurui/Global-Flow-Local-Attention.	翻訳日:2022-12-27 04:13:03 公開日:2020-03-18
# GenNet : ジェネレーションとセレクションモデルを用いた複数選択質問の読解 GenNet : Reading Comprehension with Multiple Choice Questions using Generation and Selection model ( http://arxiv.org/abs/2003.04360v2 ) ライセンス: Link先を確認	Vaishali Ingale, Pushpender Singh	(参考訳) 複数選択式の機械読解は, 与えられた文章と質問を用いて, 候補となる選択肢の集合から正しい選択肢を機械が選ぶ必要があるため, 困難なタスクである。複数選択式質問による読解タスクでは, 人間(または機械)が与えられた文章と質問の組を読み, 与えられた n 個の選択肢から最適なものを1つ選ぶ。与えられた文章から正しい答えを選択するには2つの異なる方法がある。最もよく一致する解答を選ぶ方法と、最も一致しない解答を排除する方法である。本稿では、ニューラルネットワークベースのモデルであるGenNetモデルを提案する。このモデルでは、まず文章から質問の答えを生成し、次に生成された回答を与えられた選択肢と照合して、最もよく一致する選択肢を解答とする。回答生成にはSQuADでトレーニングしたS-net(Tan et al., 2017)モデルを使用し,モデルの評価には大規模なRACE(ReAding Comprehension Dataset From Examinations)(Lai et al., 2017)を使用した。 Multiple-choice machine reading comprehension is a difficult task, as it requires machines to select the correct option from a set of candidate options using the given passage and question. The task of reading comprehension with multiple-choice questions requires a human (or machine) to read a given passage and question pair and select the best option from n given options. There are two different ways to select the correct answer from the given passage: either by selecting the best-matching answer or by eliminating the worst-matching answer. Here we propose the GenNet model, a neural network-based model. In this model, we first generate the answer to the question from the passage and then match the generated answer against the given options; the best-matched option is our answer. For answer generation we use the S-net (Tan et al., 2017) model trained on SQuAD, and to evaluate our model we use the large-scale RACE (ReAding Comprehension Dataset From Examinations) dataset (Lai et al., 2017).	翻訳日:2022-12-26 21:50:20 公開日:2020-03-18
# 自己教師付き時間領域適応による動作セグメンテーション Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation ( http://arxiv.org/abs/2003.02824v3 ) ライセンス: Link先を確認	Min-Hung Chen, Baopu Li, Yingze Bao, Ghassan AlRegib, Zsolt Kira	(参考訳) 完全教師付きアクションセグメンテーション技術の最近の進歩にもかかわらず、パフォーマンスはまだ完全には満足できない。主な課題は時空間変動の問題である(例えば、異なる人が様々な方法で同じ活動をすることができる)。そこで本稿では,非ラベル映像を用いて,時空間変動による領域差を伴うクロスドメイン問題としてアクションセグメンテーションタスクを再構成し,この問題に対処する。この領域差を減らすために,2つの自己教師付き補助タスク(バイナリおよびシーケンシャルなドメイン予測)を含む自己教師付き時間領域適応(SSTDA)を提案する。SSTDAは,局所的および大域的な時間的ダイナミクスを組み込んだクロスドメイン特徴空間を協調的に整列させ,他のドメイン適応(DA)手法よりも優れた性能を実現する。 3つの挑戦的なベンチマークデータセット(GTEA、50Salads、およびBreakfast)において、SSTDAは、現在の最先端の手法を大きなマージン(例えばF1@25スコアで、Breakfastでは59.6%から69.1%、50Saladsでは73.4%から81.5%、GTEAでは83.6%から89.1%)で上回り、同等の性能を得るのに必要なラベル付きトレーニングデータは65%のみである。ソースコードはhttps://github.com/cmhungsteve/SSTDAで入手できる。 Despite the recent progress of fully-supervised action segmentation techniques, the performance is still not fully satisfactory. One main challenge is the problem of spatiotemporal variations (e.g. different people may perform the same activity in various ways). Therefore, we exploit unlabeled videos to address this problem by reformulating the action segmentation task as a cross-domain problem with domain discrepancy caused by spatio-temporal variations. To reduce the discrepancy, we propose Self-Supervised Temporal Domain Adaptation (SSTDA), which contains two self-supervised auxiliary tasks (binary and sequential domain prediction) to jointly align cross-domain feature spaces embedded with local and global temporal dynamics, achieving better performance than other Domain Adaptation (DA) approaches. On three challenging benchmark datasets (GTEA, 50Salads, and Breakfast), SSTDA outperforms the current state-of-the-art method by large margins (e.g. for the F1@25 score, from 59.6% to 69.1% on Breakfast, from 73.4% to 81.5% on 50Salads, and from 83.6% to 89.1% on GTEA), and requires only 65% of the labeled training data for comparable performance, demonstrating the usefulness of adapting to unlabeled target videos across variations. The source code is available at https://github.com/cmhungsteve/SSTDA.	翻訳日:2022-12-26 06:59:32 公開日:2020-03-18
# 機械学習バイアスの半自動検出のための設計ツール:インタビュー研究 Designing Tools for Semi-Automated Detection of Machine Learning Biases: An Interview Study ( http://arxiv.org/abs/2003.07680v2 ) ライセンス: Link先を確認	Po-Ming Law, Sana Malik, Fan Du, Moumita Sinha	(参考訳) 機械学習モデルは、入力データの特定のサブグループに対するバイアスを予測します。検出されない場合、機械学習のバイアスは、重要な経済的および倫理的影響を構成する可能性がある。ループに人間を巻き込む半自動ツールは、バイアス検出を容易にする。しかし、その設計にかかわる考慮事項についてはほとんど分かっていない。本稿では,機械学習の実践者11人とのインタビューで,半自動バイアス検出ツールに関するニーズを調査した。この結果に基づき,バイアス検出のための将来のツール開発を目指すシステム設計者の指導を行うための設計上の4つの考察を強調する。 Machine learning models often make predictions that bias against certain subgroups of input data. When undetected, machine learning biases can constitute significant financial and ethical implications. Semi-automated tools that involve humans in the loop could facilitate bias detection. Yet, little is known about the considerations involved in their design. In this paper, we report on an interview study with 11 machine learning practitioners for investigating the needs surrounding semi-automated bias detection tools. Based on the findings, we highlight four considerations in designing to guide system designers who aim to create future tools for bias detection.	翻訳日:2022-12-24 01:31:39 公開日:2020-03-18
# 集中治療における敗血症治療の最適化 : 強化学習から治療前評価まで Optimizing Medical Treatment for Sepsis in Intensive Care: from Reinforcement Learning to Pre-Trial Evaluation ( http://arxiv.org/abs/2003.06474v2 ) ライセンス: Link先を確認	Luchen Li, Ignacio Albert-Smet, and Aldo A. Faisal	(参考訳) 本研究の目的は,介入を最適化する強化学習(RL)を遡及的に行うことで,学習した方策を臨床展開の中で前向きに臨床試験するための,規制に準拠した経路が得られる枠組みを確立することにある。我々は, 複雑で不透明な患者動態が原因で治療が困難である集中治療室における感染症と, 個々の患者が必要とする治療方針について, 臨床的に議論され, 高度に分断され, かつ集中治療室は自然にデータに富んでいることに焦点を当てた。本研究は、医療におけるRLアプローチ(AI臨床医)を基礎とし、部分的に観察可能なMDP(POMDP)の下での歴史的集中治療データを用いて、敗血症治療のための医薬のオフポリシーの連続投与方策を学習する。 POMDPは、すべての歴史的情報を取り込み、効率的な表現をもたらすことで、患者の状態の不確実性をよりよく捉える。最優先のツリーサーチによって、遭遇した各状態を評価することで、振り返りデータにおける探索の欠如を補う。我々は, 臨床医の複合政策近傍の政策を最適化することで, 状態分布の変化を緩和する。要は,従来の政策評価だけでなく,臨床医の意思決定の正確さと不確実性を評価するための,モデルに依存しない事前臨床評価手法を,同一の患者履歴に直面する場合のシステムレコメンデーション(シャドウモード)と比較した。 Our aim is to establish a framework where reinforcement learning (RL) of optimizing interventions retrospectively allows us a regulatory compliant pathway to prospective clinical testing of the learned policies in a clinical deployment. We focus on infections in intensive care units which are one of the major causes of death and difficult to treat because of the complex and opaque patient dynamics, and the clinically debated, highly-divergent set of intervention policies required by each individual patient, yet intensive care units are naturally data rich. In our work, we build on RL approaches in healthcare ("AI Clinicians"), and learn off-policy continuous dosing policy of pharmaceuticals for sepsis treatment using historical intensive care data under partially observable MDPs (POMDPs). POMDPs capture uncertainty in patient state better by taking in all historical information, yielding an efficient representation, which we investigate through ablations. We compensate for the lack of exploration in our retrospective data by evaluating each encountered state with a best-first tree search. We mitigate state distributional shift by optimizing our policy in the vicinity of the clinicians' compound policy. Crucially, we evaluate our model recommendations using not only conventional policy evaluations but a novel framework that incorporates human experts: a model-agnostic pre-clinical evaluation method to estimate the accuracy and uncertainty of clinician's decisions versus our system recommendations when confronted with the same individual patient history ("shadow mode").	翻訳日:2022-12-24 01:05:55 公開日:2020-03-18
# SDNを用いたモノのインターネットのためのSOM型DDoS防御機構 SOM-based DDoS Defense Mechanism using SDN for the Internet of Things ( http://arxiv.org/abs/2003.06834v2 ) ライセンス: Link先を確認	Yunfei Meng, Zhiqiu Huang, Senzhang Wang, Guohua Shen, Changbo Ke	(参考訳) 本稿では,モノのインターネットに対するセキュリティの脅威に効果的に取り組むために,ソフトウェア定義ネットワーク(SDN)を用いたSOMベースのDDoS防御機構を提案する。このメカニズムの主な考え方は、物のインターネットにおけるデバイスサービスを保護するためにSDNベースのゲートウェイをデプロイすることだ。ゲートウェイは、somニューラルネットワークに基づくddos防御メカニズムを提供する。 SOMベースのDDoS防御機構により、ゲートウェイはIoT内の悪意あるセンサーデバイスを効果的に識別し、検出した後にそれらの悪意のあるデバイスを自動的にブロックし、DDoS攻撃を受けた場合のシステムのセキュリティと堅牢性を効果的に強化することができる。この機構の有効性と有効性を検証するため,実験システムの実装にはPOXコントローラとミニネットエミュレータを使用し,さらに前述のセキュリティ対策機構をPythonで実装する。最後の実験結果は、異なるテストシナリオでメカニズムが本当に効果的であることを示している。 To effectively tackle the security threats towards the Internet of things, we propose a SOM-based DDoS defense mechanism using software-defined networking (SDN) in this paper. The main idea of the mechanism is to deploy a SDN-based gateway to protect the device services in the Internet of things. The gateway provides DDoS defense mechanism based on SOM neural network. By means of SOM-based DDoS defense mechanism, the gateway can effectively identify the malicious sensing devices in the IoT, and automatically block those malicious devices after detecting them, so that it can effectively enforce the security and robustness of the system when it is under DDoS attacks. In order to validate the feasibility and effectiveness of the mechanism, we leverage POX controller and Mininet emulator to implement an experimental system, and further implement the aforementioned security enforcement mechanisms with Python. The final experimental results illustrate that the mechanism is truly effective under the different test scenarios.	翻訳日:2022-12-23 08:45:59 公開日:2020-03-18
# アフリカ小児てんかん患者における低磁場脳MRIのコントラストと分解能の向上 Image Quality Transfer Enhances Contrast and Resolution of Low-Field Brain MRI in African Paediatric Epilepsy Patients ( http://arxiv.org/abs/2003.07216v2 ) ライセンス: Link先を確認	Matteo Figini (1), Hongxiang Lin (1), Godwin Ogbole (2), Felice D Arco (3), Stefano B. Blumberg (1), David W. Carmichael (4 and 5), Ryutaro Tanno (1 and 6), Enrico Kaden (1 and 4), Biobele J. Brown (7), Ikeoluwa Lagunju (7), Helen J. Cross (3 and 4), Delmiro Fernandez-Reyes (1 and 7), Daniel C. Alexander (1) ((1) Centre for Medical Image Computing and Department of Computer Science - University College London - UK, (2) Department of Radiology - College of Medicine - University of Ibadan - Nigeria, (3) Great Ormond Street Hospital for Children - London - UK, (4) UCL Great Ormond Street Institute of Child Health - London - UK, (5) Department of Biomedical Engineering - Kings College London - UK, (6) Machine Intelligence and Perception Group - Microsoft Research Cambridge - UK, (7) Department of Paediatrics - College of Medicine - University of Ibadan - Nigeria)	(参考訳) 1.5Tまたは3Tスキャナは、現在の臨床MRIの標準であるが、低磁場の(<1T)スキャナは、コストと停電に対する堅牢性のために、多くの低・中所得国で依然として一般的である。現代の高磁場スキャナと比較すると、低磁場スキャナーは同等の解像度で信号と雑音の比が低い画像を提供し、実践者は大きなスライス厚さと不完全な空間カバレッジを用いて補償する。さらに、異なる種類の脳組織間のコントラストは、診断値を制限する等信号対雑音比でも著しく減少する可能性がある。近年,1.5T画像や3T画像の解像度,空間被覆,コントラストの近似を目的とした0.36T画像の高精細化のために,画像品質転送のパラダイムが適用されている。ニューラルネットワークU-Netの亜種は、公開されている3T Human Connectome Projectデータセットからシミュレーションされたローフィールドイメージを使用してトレーニングされた。今回我々は,手軽に利用できる低磁場MRIのてんかん管理における臨床的有用性を高めるために,IQTの有用性を示すリアルおよびシミュレートされた臨床低磁場脳画像の質的結果を示す。 1.5T or 3T scanners are the current standard for clinical MRI, but low-field (<1T) scanners are still common in many lower- and middle-income countries for reasons of cost and robustness to power failures. Compared to modern high-field scanners, low-field scanners provide images with lower signal-to-noise ratio at equivalent resolution, leaving practitioners to compensate by using large slice thickness and incomplete spatial coverage. Furthermore, the contrast between different types of brain tissue may be substantially reduced even at equal signal-to-noise ratio, which limits diagnostic value. Recently the paradigm of Image Quality Transfer has been applied to enhance 0.36T structural images aiming to approximate the resolution, spatial coverage, and contrast of typical 1.5T or 3T images. A variant of the neural network U-Net was trained using low-field images simulated from the publicly available 3T Human Connectome Project dataset. Here we present qualitative results from real and simulated clinical low-field brain images showing the potential value of IQT to enhance the clinical utility of readily accessible low-field MRIs in the management of epilepsy.	翻訳日:2022-12-23 04:09:30 公開日:2020-03-18
# ギリシア語における攻撃的言語識別 Offensive Language Identification in Greek ( http://arxiv.org/abs/2003.07459v2 ) ライセンス: Link先を確認	Zeses Pitenis, Marcos Zampieri, Tharindu Ranasinghe	(参考訳) オンラインコミュニティやソーシャルメディアプラットフォームでは、攻撃的言語が問題になってきたため、研究者は乱暴なコンテンツに対処する方法や、サイバーいじめ、ヘイトスピーチ、攻撃など、さまざまなタイプを検出するシステムの開発を行っている。特筆すべき例外がいくつかあるが、この話題に関するほとんどの研究は英語を扱っている。これは主に英語の言語リソースが利用できるためである。この欠点に対処するため,攻撃的言語識別のための最初のギリシャの注釈付きデータセットであるOGTDを提案する。 OGTDは、Twitterから4,779件の投稿が攻撃的であり、攻撃的ではないという手動の注釈付きデータセットである。データセットの詳細な説明とともに、このデータに基づいてトレーニングおよびテストされたいくつかの計算モデルを評価する。 As offensive language has become a rising issue for online communities and social media platforms, researchers have been investigating ways of coping with abusive content and developing systems to detect its different types: cyberbullying, hate speech, aggression, etc. With a few notable exceptions, most research on this topic so far has dealt with English. This is mostly due to the availability of language resources for English. To address this shortcoming, this paper presents the first Greek annotated dataset for offensive language identification: the Offensive Greek Tweet Dataset (OGTD). OGTD is a manually annotated dataset containing 4,779 posts from Twitter annotated as offensive and not offensive. Along with a detailed description of the dataset, we evaluate several computational models trained and tested on this data.	翻訳日:2022-12-23 03:12:47 公開日:2020-03-18
# ロバスト画像を用いた植物病診断のためのノイズラベルからのメタラーニング Rectified Meta-Learning from Noisy Labels for Robust Image-based Plant Disease Diagnosis ( http://arxiv.org/abs/2003.07603v2 ) ライセンス: Link先を確認	Ruifeng Shi, Deming Zhai, Xianming Liu, Junjun Jiang, Wen Gao	(参考訳) 植物病は食料安全保障と作物生産の主な脅威の1つである。したがって、最近の人工知能の進歩を利用して植物病の診断を支援することは重要である。一般的なアプローチの1つは、葉画像分類タスクとしてこの問題を変換し、強力な畳み込みニューラルネットワーク(CNN)によって処理することができる。しかしながら、CNNに基づく分類手法の性能は、実際にはラベルに必然的にノイズをもたらし、モデルオーバーフィッティングとパフォーマンス低下をもたらす、高品質な手動ラベルトレーニングデータに依存する。そこで本稿では,修正メタ学習モジュールを共通CNNパラダイムに組み込んだ新しいフレームワークを提案する。提案手法は以下の利点を享受する。 i) 補正メタラーニングは、偏見のないサンプルにより多くの注意を払って、収束の加速と分類精度の向上を図っている。 ii) 本手法はラベルノイズ分布に関する仮定を必要とせず、様々な種類のノイズに対してうまく機能する。 iii) 本手法は、勾配降下法により最適化されたディープモデルに組み込むことができるプラグアンドプレイモジュールとして機能する。最先端のアルゴリズムよりも優れた性能を示すために,広範な実験を行った。 Plant diseases serve as one of main threats to food security and crop production. It is thus valuable to exploit recent advances of artificial intelligence to assist plant disease diagnosis. One popular approach is to transform this problem as a leaf image classification task, which can be then addressed by the powerful convolutional neural networks (CNNs). However, the performance of CNN-based classification approach depends on a large amount of high-quality manually labeled training data, which are inevitably introduced noise on labels in practice, leading to model overfitting and performance degradation. To overcome this problem, we propose a novel framework that incorporates rectified meta-learning module into common CNN paradigm to train a noise-robust deep network without using extra supervision information. The proposed method enjoys the following merits: i) A rectified meta-learning is designed to pay more attention to unbiased samples, leading to accelerated convergence and improved classification accuracy. ii) Our method is free on assumption of label noise distribution, which works well on various kinds of noise. iii) Our method serves as a plug-and-play module, which can be embedded into any deep models optimized by gradient descent based method. Extensive experiments are conducted to demonstrate the superior performance of our algorithm over the state-of-the-arts.	翻訳日:2022-12-22 21:03:34 公開日:2020-03-18
# TREC 2019ディープラーニングトラックの概要 Overview of the TREC 2019 deep learning track ( http://arxiv.org/abs/2003.07820v2 ) ライセンス: Link先を確認	Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, Ellen M. Voorhees	(参考訳) Deep Learning TrackはTREC 2019の新しいトラックで、大規模データ体制におけるアドホックランキングの研究を目的としている。 2つのタスクに対応する2つのセットを導入し、それぞれに厳格なTRECスタイルのブラインド評価と再利用可能なテストセットがある。文書検索タスクには320万のドキュメントと367万のトレーニングクエリからなるコーパスがあり、43のクエリで再利用可能なテストセットを生成する。パス検索タスクは880万のパスと503万のトレーニングクエリからなるコーパスを持ち、43のクエリの再利用テストセットを生成する。今年、15のグループは、ディープラーニング、転送学習、従来のirランキング手法のさまざまな組み合わせを使用して、合計75のランを提出した。ディープラーニングは従来のir実行を大きく上回っている。この結果から,大規模なトレーニングデータを導入し,評価プールにトレーニングした深層モデルを含めることができたが,過去の研究ではそのようなトレーニングデータやプールは存在しなかった。 The Deep Learning Track is a new track for TREC 2019, with the goal of studying ad hoc ranking in a large data regime. It is the first track with large human-labeled training sets, introducing two sets corresponding to two tasks, each with rigorous TREC-style blind evaluation and reusable test sets. The document retrieval task has a corpus of 3.2 million documents with 367 thousand training queries, for which we generate a reusable test set of 43 queries. The passage retrieval task has a corpus of 8.8 million passages with 503 thousand training queries, for which we generate a reusable test set of 43 queries. This year 15 groups submitted a total of 75 runs, using various combinations of deep learning, transfer learning and traditional IR ranking methods. Deep learning runs significantly outperformed traditional IR runs. Possible explanations for this result are that we introduced large training data and we included deep models trained on such data in our judging pools, whereas some past studies did not have such training data or pooling.	翻訳日:2022-12-22 20:44:41 公開日:2020-03-18
# プライバシー保護協調フィルタリングの実態調査 Survey of Privacy-Preserving Collaborative Filtering ( http://arxiv.org/abs/2003.08343v1 ) ライセンス: Link先を確認	Islam Elnabarawy, Wei Jiang, Donald C. Wunsch II	(参考訳) 協調フィルタリングレコメンデーションシステムは、自身の過去の好みと、同様の関心を持つ他のユーザの好みに基づいて、ユーザにレコメンデーションを提供する。近年、レコメンデーションシステムの利用は広く増えており、どの映画を見るか、本を読むか、購入するアイテムを選ぶのに役立っている。しかし、このようなシステムを使う場合、ユーザーはプライバシーを心配することが多く、ほとんどのオンラインサービスに正確な情報を提供することに消極的である。プライバシー保護協調フィルタリングレコメンデーションシステムは、データのプライバシーに関する一定の保証を維持しながら、ユーザに正確なレコメンデーションを提供することを目的としている。この調査は、プライバシ保護協調フィルタリングにおける最近の文献を調査し、分野の広い視点を提供し、文献における主要なコントリビューションを、それらが対処する脆弱性の種類と、それを解決するために使用するアプローチのタイプという2つの異なる基準を用いて分類する。 Collaborative filtering recommendation systems provide recommendations to users based on their own past preferences, as well as those of other users who share similar interests. The use of recommendation systems has grown widely in recent years, helping people choose which movies to watch, books to read, and items to buy. However, users are often concerned about their privacy when using such systems, and many users are reluctant to provide accurate information to most online services. Privacy-preserving collaborative filtering recommendation systems aim to provide users with accurate recommendations while maintaining certain guarantees about the privacy of their data. This survey examines the recent literature in privacy-preserving collaborative filtering, providing a broad perspective of the field and classifying the key contributions in the literature using two different criteria: the type of vulnerability they address and the type of approach they use to solve it.	翻訳日:2022-12-22 13:27:45 公開日:2020-03-18
# ニューラルファジィエクストラクタ:バイオメトリックユーザ認証にニューラルネットワークを使用するセキュアな方法 Neural Fuzzy Extractors: A Secure Way to Use Artificial Neural Networks for Biometric User Authentication ( http://arxiv.org/abs/2003.08433v1 ) ライセンス: Link先を確認	Abhishek Jana, Md Kamruzzaman Sarker, Monireh Ebrahimi, Pascal Hitzler, George T Amariucai	(参考訳) センサ開発と人工知能の新たな進歩、計算コストの低減、ハンドヘルド計算デバイスの普及により、生体認証(および識別)が急速に普及している。高度な機械学習技術に基づくバイオメトリック認証への現代的なアプローチは、訓練済みの分類器の詳細または明示的なユーザバイオメトリックデータの保存を回避できないため、ユーザの認証情報が偽造の危険にさらされる。本稿では,生体認証のためのベクトル空間分類器や人工ニューラルネットワークを用いたユーザ固有情報を扱うためのセキュアな方法を提案する。提案アーキテクチャはニューラルファジィ・エクストラクタ (NFE) と呼ばれ,エキスパンダと呼ばれる人工ニューラルネットワークベースのバッファを介して,性能劣化を最小限または皆無に抑えつつ,既存の分類器とファジィ抽出器の結合を可能にする。したがって、NFEは、現代のディープラーニングベースの分類器のすべてのパフォーマンス上の利点と、標準的なファジィ抽出器のセキュリティを提供する。指紋によるユーザ認証の簡単なシナリオについて,従来型の人工ニューラルネットワークへのNFEの後付けを実証する。 Powered by new advances in sensor development and artificial intelligence, the decreasing cost of computation, and the pervasiveness of handheld computation devices, biometric user authentication (and identification) is rapidly becoming ubiquitous. Modern approaches to biometric authentication, based on sophisticated machine learning techniques, cannot avoid storing either trained-classifier details or explicit user biometric data, thus exposing users' credentials to falsification. In this paper, we introduce a secure way to handle user-specific information involved with the use of vector-space classifiers or artificial neural networks for biometric authentication. Our proposed architecture, called a Neural Fuzzy Extractor (NFE), allows the coupling of pre-existing classifiers with fuzzy extractors, through an artificial-neural-network-based buffer called an expander, with minimal or no performance degradation. The NFE thus offers all the performance advantages of modern deep-learning-based classifiers, and all the security of standard fuzzy extractors. We demonstrate the NFE retrofit to a classic artificial neural network for a simple scenario of fingerprint-based user authentication.	翻訳日:2022-12-22 13:27:04 公開日:2020-03-18
# オンライン予測における悪質専門家と乗法重みアルゴリズム Malicious Experts versus the multiplicative weights algorithm in online prediction ( http://arxiv.org/abs/2003.08457v1 ) ライセンス: Link先を確認	Erhan Bayraktar, H. Vincent Poor, Xin Zhang	(参考訳) 2人の専門家と予測者による予測問題を考える。専門家の一人が正直で、各ラウンドで確率$\mu$で正しい予測をしていると仮定する。もう一方は悪意があり、各ラウンドの真の結果を知り、予測者の損失を最大化するために予測を行う。予測者が古典的な乗法重みアルゴリズムを採用すると仮定すると、悪意のある専門家の値関数の上限と下限が見つかる。その結果,乗算重みアルゴリズムは悪意のある専門家の腐敗に抵抗できないことが示唆された。また,適応乗法重み付けアルゴリズムは予測者にとって漸近的に最適であり,従って悪質な専門家の腐敗に抵抗することを示した。 We consider a prediction problem with two experts and a forecaster. We assume that one of the experts is honest and makes correct prediction with probability $\mu$ at each round. The other one is malicious, who knows true outcomes at each round and makes predictions in order to maximize the loss of the forecaster. Assuming the forecaster adopts the classical multiplicative weights algorithm, we find upper and lower bounds for the value function of the malicious expert. Our results imply that the multiplicative weights algorithm cannot resist the corruption of malicious experts. We also show that an adaptive multiplicative weights algorithm is asymptotically optimal for the forecaster, and hence more resistant to the corruption of malicious experts.	翻訳日:2022-12-22 13:26:45 公開日:2020-03-18
# LTE-U/Wi-Fi共存シナリオにおける機械学習によるスペクトル共有 Machine Learning enabled Spectrum Sharing in Dense LTE-U/Wi-Fi Coexistence Scenarios ( http://arxiv.org/abs/2003.13652v1 ) ライセンス: Link先を確認	Adam Dziedzic, Vanlin Sathya, Muhammad Iqbal Rochman, Monisha Ghosh and Sanjay Krishnan	(参考訳) 複雑なエンジニアリング問題に対する機械学習(ML)技術の適用は、魅力的で効率的なソリューションであることが証明されている。 MLは、画像認識や産業運用の自動化など、いくつかの実践的なタスクにうまく適用されています。非線形問題の解決におけるML技術の約束は、既知のML技術の適用と、未ライセンスのスペクトルにおけるWi-FiとLTE間の無線スペクトル共有のための新しい手法の開発を目的として、この研究に影響を与えた。本研究では,LTE-Uフォーラムが開発したLTE-Unlicensed (LTE-U)仕様に焦点をあてる。この仕様は、コチャネルWi-Fiベーシックサービスセット(BSS)の数が1つから2つ以上に増加すると、LTE-Uベースステーション(BS)のデューティサイクルが減少することを示唆している。しかし、Wi-Fiパケットを復号することなく、リアルタイムにチャンネル上で動作しているWi-Fi BSSの数を検出することは難しい問題である。本研究では,LTE-U OFF時間帯に観測されたエネルギー値を用いて,MLに基づく新しい手法を提案する。 LTE-U BS OFF時間中のエネルギー値だけを観測するのは、LTE-Uベースステーションで完全なWi-Fi受信機を必要とするWi-Fiパケット全体を復号するのに比べれば比較的簡単である。提案手法を実時間実験により実装・検証し,一方と多数の Wi-Fi AP 伝送間のエネルギー分布に異なるパターンが存在することを示す。提案手法は,従来の自己相関法 (AC) やエネルギー検出法 (ED) と比較して高い精度 (すべての場合 99 % に近づいた) が得られる。 The application of Machine Learning (ML) techniques to complex engineering problems has proved to be an attractive and efficient solution. ML has been successfully applied to several practical tasks like image recognition, automating industrial operations, etc. The promise of ML techniques in solving non-linear problems influenced this work which aims to apply known ML techniques and develop new ones for wireless spectrum sharing between Wi-Fi and LTE in the unlicensed spectrum. In this work, we focus on the LTE-Unlicensed (LTE-U) specification developed by the LTE-U Forum, which uses the duty-cycle approach for fair coexistence. The specification suggests reducing the duty cycle at the LTE-U base-station (BS) when the number of co-channel Wi-Fi basic service sets (BSSs) increases from one to two or more. However, without decoding the Wi-Fi packets, detecting the number of Wi-Fi BSSs operating on the channel in real-time is a challenging problem. In this work, we demonstrate a novel ML-based approach which solves this problem by using energy values observed during the LTE-U OFF duration. It is relatively straightforward to observe only the energy values during the LTE-U BS OFF time compared to decoding the entire Wi-Fi packet, which would require a full Wi-Fi receiver at the LTE-U base-station. We implement and validate the proposed ML-based approach by real-time experiments and demonstrate that there exist distinct patterns between the energy distributions between one and many Wi-Fi AP transmissions. The proposed ML-based approach results in a higher accuracy (close to 99% in all cases) as compared to the existing auto-correlation (AC) and energy detection (ED) approaches.	翻訳日:2022-12-22 13:26:33 公開日:2020-03-18
# eisen: 堅固なディープラーニングのためのpythonパッケージ Eisen: a python package for solid deep learning ( http://arxiv.org/abs/2004.02747v1 ) ライセンス: Link先を確認	Frank Mancolo	(参考訳) Eisenは、ディープラーニングメソッドの実装を簡単にするオープンソースのPythonパッケージである。医用画像解析やコンピュータビジョンタスクに特化しているが、柔軟性によって任意のアプリケーションへの拡張が可能になる。 EisenはPyTorchをベースにしており、PyTorchエコシステムに属する他のパッケージと同じアーキテクチャに従っている。これにより使用が簡単になり、他のパッケージが提供するモジュールと互換性がある。 Eisenは、複数のデータセットローディングメソッド、さまざまなデータフォーマットのためのI/O、データ操作と変換、トレーニング、検証とテストループの完全な実装、損失とネットワークアーキテクチャの実装、トレーニングアーティファクトの自動エクスポート、サマリーとログ、ビジュアル実験構築、コマンドラインインターフェースなどを実装している。さらに,コミュニティからのユーザコントリビューションも受け付けている。ドキュメント、例、コードは http://eisen.ai からダウンロードできる。 Eisen is an open source python package making the implementation of deep learning methods easy. It is specifically tailored to medical image analysis and computer vision tasks, but its flexibility allows extension to any application. Eisen is based on PyTorch and it follows the same architecture as other packages belonging to the PyTorch ecosystem. This simplifies its use and allows it to be compatible with modules provided by other packages. Eisen implements multiple dataset loading methods, I/O for various data formats, data manipulation and transformation, full implementation of training, validation and test loops, implementation of losses and network architectures, automatic export of training artifacts, summaries and logs, visual experiment building, command line interface and more. Furthermore, it is open to user contributions by the community. Documentation, examples and code can be downloaded from http://eisen.ai.	翻訳日:2022-12-22 13:18:26 公開日:2020-03-18
# ディープニューラルネットワーク学習のためのブロック層分割スキーム Block Layer Decomposition schemes for training Deep Neural Networks ( http://arxiv.org/abs/2003.08123v1 ) ライセンス: Link先を確認	Laura Palagi, Ruggiero Seccia	(参考訳) ディープフィードフォワードニューラルネットワーク(dfnn)の重み推定は、非常に大きな非凸最適化問題の解に依存している。その結果、最適化アルゴリズムは、悪い解決策につながるか、最適化プロセスを遅くする可能性がある局所的最小化器に惹かれることができる。さらに、トレーニング問題に対する優れた解を見つけるのに必要な時間は、サンプルの数と変数の数の両方に依存する。本稿では,ブロック座標降下法(bcd法)を用いて,定常点や平坦領域を回避し,最先端アルゴリズムの性能を向上させる方法を示す。まず、ネットワークの深さに効果的に取り組むことができるバッチBCD法について述べ、次に、BCDアプローチをミニバッチフレームワークに埋め込むことで、変数数とサンプル数の両方をスケールできる \textit{minibatch} BCD フレームワークを提案するアルゴリズムをさらに拡張する。複数のアーキテクチャネットワークにおける標準データセットの広範囲な数値計算により,dfnnのトレーニングフェーズへのbcd手法の適用が,標準バッチアルゴリズムやミニバッチアルゴリズムよりも優れており,トレーニングフェーズとネットワークの一般化性能の両方が向上していることを示す。 Deep Feedforward Neural Networks' (DFNNs) weights estimation relies on the solution of a very large nonconvex optimization problem that may have many local (no global) minimizers, saddle points and large plateaus. As a consequence, optimization algorithms can be attracted toward local minimizers which can lead to bad solutions or can slow down the optimization process. Furthermore, the time needed to find good solutions to the training problem depends on both the number of samples and the number of variables. In this work, we show how Block Coordinate Descent (BCD) methods can be applied to improve performance of state-of-the-art algorithms by avoiding bad stationary points and flat regions. We first describe a batch BCD method able to effectively tackle the network's depth and then we further extend the algorithm proposing a \textit{minibatch} BCD framework able to scale with respect to both the number of variables and the number of samples by embedding a BCD approach into a minibatch framework. 
Through extensive numerical experiments on standard datasets and several network architectures, we show that applying BCD methods to the training phase of DFNNs outperforms standard batch and minibatch algorithms, improving both the training phase and the generalization performance of the networks.	翻訳日:2022-12-22 13:17:51 公開日:2020-03-18
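To make the idea of block coordinate descent over layers concrete, here is a toy sketch: a two-"layer" scalar model y = w2 * w1 * x is trained by alternating a gradient step on w1 (with w2 fixed) and a gradient step on w2 (with the updated w1 fixed). The model, the function name `bcd_train`, the step size, and the sweep count are all illustrative assumptions. This is not the batch or minibatch BCD scheme of the paper, only an instance of the general block-wise update pattern.

```python
def bcd_train(xs, ys, w1=0.5, w2=0.5, lr=0.02, sweeps=200):
    """Block coordinate descent on the mean squared error of y = w2 * w1 * x.

    Each sweep updates one block (one layer's weight) at a time while the
    other block is held fixed. Returns the weights and the loss history.
    """
    n = len(xs)

    def loss(a, b):
        return sum((b * a * x - y) ** 2 for x, y in zip(xs, ys)) / n

    history = [loss(w1, w2)]
    for _ in range(sweeps):
        # Block 1: gradient step on the first layer, second layer fixed.
        g1 = sum(2 * (w2 * w1 * x - y) * w2 * x for x, y in zip(xs, ys)) / n
        w1 -= lr * g1
        # Block 2: gradient step on the second layer, using the updated w1.
        g2 = sum(2 * (w2 * w1 * x - y) * w1 * x for x, y in zip(xs, ys)) / n
        w2 -= lr * g2
        history.append(loss(w1, w2))
    return w1, w2, history
```

On data generated by y = 2x, the product w1 * w2 should approach 2 and the loss should shrink from sweep to sweep, since each block update is a plain gradient step on a one-dimensional quadratic.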
# クラスタリングと影響分析を用いたプロセスマイニング分析におけるビジネスエリア効果の発見 Discovering Business Area Effects to Process Mining Analysis Using Clustering and Influence Analysis ( http://arxiv.org/abs/2003.08170v1 ) ライセンス: Link先を確認	Teemu Lehto and Markku Hinkka	(参考訳) 大きな組織でビジネスプロセスを改善するための一般的な課題は、オペレーションを担当するビジネスパーソンが、ビジネスオペレーションで実行される実行の詳細、プロセス変種、例外の事実に基づく理解を欠いていることです。既存のプロセスマイニング方法論はイベントログに基づいてこれらの詳細を発見できるが、プロセスマイニングの知見をビジネス関係者に伝えることは困難である。本稿では,プロセス実行の詳細に重要な影響を与えるビジネス領域を発見するための新しい手法を提案する。本手法はクラスタリングを用いてプロセスフロー特性に基づいて類似の事例をグループ化し,クラスタに最も相関するビジネス領域を検出するための影響分析を行う。我々の分析はBPM担当者とビジネス担当者の橋渡しとなり,両者間の知識共有を促進する。また,公開されている実物購入注文プロセスデータに基づく事例分析を行った。 A common challenge for improving business processes in large organizations is that business people in charge of the operations are lacking a fact-based understanding of the execution details, process variants, and exceptions taking place in business operations. While existing process mining methodologies can discover these details based on event logs, it is challenging to communicate the process mining findings to business people. In this paper, we present a novel methodology for discovering business areas that have a significant effect on the process execution details. Our method uses clustering to group similar cases based on process flow characteristics and then influence analysis for detecting those business areas that correlate most with the discovered clusters. Our analysis serves as a bridge between BPM people and business people, facilitating the knowledge sharing between these groups. We also present an example analysis based on publicly available real-life purchase order process data.	翻訳日:2022-12-22 13:17:29 公開日:2020-03-18
# 機械学習を用いた新しいコロナウイルスに対する中和抗体の発見 Potential Neutralizing Antibodies Discovered for Novel Corona Virus Using Machine Learning ( http://arxiv.org/abs/2003.08447v1 ) ライセンス: Link先を確認	Rishikesh Magar, Prakarsh Yadav, Amir Barati Farimani	(参考訳) 迅速で追跡不能なウイルス変異は、免疫系が抑制抗体を産生する前に数千人の命を奪う。新型コロナウイルスの感染拡大を受け、世界中で数千人が死亡した。新型コロナウイルスのウイルスエピトープを阻害するペプチドや抗体配列の迅速発見法は、数千人の命を救う。本稿では,コロナウイルスに対する阻害性合成抗体を予測するための機械学習(ML)モデルを考案した。 1933件のウイルス・抗体配列とその臨床患者中和反応を収集し,抗体反応を予測するMLモデルを訓練した。グラフ特徴化と各種ML法を用いて, 数千の仮説的な抗体配列をスクリーニングし, COVID-19を阻害しうる8種類の安定な抗体を見出した。我々は、コロナウイルスを阻害する候補抗体の安定性を検証するために、バイオインフォマティクス、構造生物学、分子動力学(MD)シミュレーションを組み合わせた。 The fast and untraceable virus mutations take lives of thousands of people before the immune system can produce the inhibitory antibody. The recent outbreak of novel coronavirus infected and killed thousands of people in the world. Rapid methods in finding peptides or antibody sequences that can inhibit the viral epitopes of COVID-19 will save thousands of lives. In this paper, we devised a machine learning (ML) model to predict the possible inhibitory synthetic antibodies for Corona virus. We collected 1933 virus-antibody sequences and their clinical patient neutralization response and trained an ML model to predict the antibody response. Using graph featurization with a variety of ML methods, we screened thousands of hypothetical antibody sequences and found 8 stable antibodies that potentially inhibit COVID-19. We combined bioinformatics, structural biology, and Molecular Dynamics (MD) simulations to verify the stability of the candidate antibodies that can inhibit the Corona virus.	翻訳日:2022-12-22 13:17:15 公開日:2020-03-18
# AIはファッションジャーゴンを解読できるのか? Can AI decrypt fashion jargon for you? ( http://arxiv.org/abs/2003.08052v1 ) ライセンス: Link先を確認	Yuan Shen, Shanduojiao Jiang, Muhammad Rizky Wellyanto, and Ranjitha Kumar	(参考訳) ファッションについて語るとき、ファッションの概念の根底にある意味、例えばスタイルに気を配り、例えば、このドレスのどの機能がスマートかといった質問をするが、今日のファッションウェブサイトの製品説明はドメイン固有の言葉と低レベルの言葉でいっぱいである。これらの低レベルの記述が、いかにしてスタイルや高レベルのファッション概念に貢献できるかは、人々には明らかではない。本稿では,ファッションサイトにおける既存製品データを活用することで,この概念理解問題に対処するためのデータ駆動型ソリューションを提案する。最初に1546のファッションキーワードを5つのファッションカテゴリに分類した。次に,853,056製品からなる新しいファッション製品データセットを収集した。最後に、プロダクトイメージの低レベルとドメイン固有のファッション機能でハイレベルなファッションコンセプトを明示的に予測し、説明できるディープラーニングモデルをトレーニングしました。 When people talk about fashion, they care about the underlying meaning of fashion concepts, e.g., style. For example, people ask questions like what features make this dress smart. However, the product descriptions in today's fashion websites are full of domain specific and low level words. It is not clear to people how exactly those low level descriptions can contribute to a style or any high level fashion concept. In this paper, we proposed a data driven solution to address this concept understanding issue by leveraging a large number of existing product data on fashion sites. We first collected and categorized 1546 fashion keywords into 5 different fashion categories. Then, we collected a new fashion product dataset with 853,056 products in total. Finally, we trained a deep learning model that can explicitly predict and explain high level fashion concepts in a product image with its low level and domain specific fashion features.	翻訳日:2022-12-22 13:16:59 公開日:2020-03-18
# クロスリンガルクロスコーパス音声の感情認識 Cross Lingual Cross Corpus Speech Emotion Recognition ( http://arxiv.org/abs/2003.07996v1 ) ライセンス: Link先を確認	Shivali Goel (1), Homayoon Beigi (1 and 2) ((1) Department of Computer Science, Columbia University, (2) Recognition Technologies, Inc., South Salem, New York, United States)	(参考訳) 既存の音声感情認識モデルの大部分は、単一のコーパスと単一の言語設定で訓練され、評価される。これらのシステムは、クロスコーパスかつクロス言語のシナリオに適用された場合、うまく機能しない。本稿では,単一コーパスとクロスコーパスのいずれにおいても,4言語における音声感情認識の結果について述べる。さらに,ジェンダー,自然性,覚醒を補助課題とするマルチタスク学習(MTL)は,感情モデルの一般化能力を高めることが示されており,本研究では,まだ研究されていない感情認識における音声言語の役割を探るため,MTLフレームワークのもう一つの補助課題として言語IDを導入する。 The majority of existing speech emotion recognition models are trained and evaluated on a single corpus and a single language setting. These systems do not perform as well when applied in a cross-corpus and cross-language scenario. This paper presents results for speech emotion recognition for 4 languages in both single corpus and cross corpus setting. Additionally, since multi-task learning (MTL) with gender, naturalness and arousal as auxiliary tasks has been shown to enhance the generalisation capabilities of the emotion models, this paper introduces language ID as another auxiliary task in MTL framework to explore the role of spoken language on emotion recognition which has not been studied yet.	翻訳日:2022-12-22 13:10:14 公開日:2020-03-18
# 造影CT画像からの3次元肝血管形態再構成のためのグラフ注意ネットワークを用いたプルーニング Graph Attention Network based Pruning for Reconstructing 3D Liver Vessel Morphology from Contrasted CT Images ( http://arxiv.org/abs/2003.07999v1 ) ライセンス: Link先を確認	Donghao Zhang, Siqi Liu, Shikha Chaganti, Eli Gibson, Zhoubing Xu, Sasa Grbic, Weidong Cai, and Dorin Comaniciu	(参考訳) 造影剤を血管に注入することで、多相造影ct画像は人体における血管ネットワークの可視性を高めることができる。造影CT画像から肝血管の3次元幾何学的形態を再構築することで, 複数種類の術前手術計画が可能である。肝血管形態の再構成は, 肝血管の形態学的複雑度と, 多相造影CT像の非一貫性により, 依然として困難である。一方, 意思決定バイアスを回避するためには, 3次元再構成において高い整合性が必要である。本稿では,完全畳み込みニューラルネットワークとグラフアテンションネットワークを併用した肝血管形態再構築フレームワークを提案する。完全な畳み込みニューラルネットワークは、まず肝臓の中枢熱マップを生成するために訓練される。その後、画像処理に基づくアルゴリズムを用いて、熱マップに基づいてオーバー再構成肝血管グラフモデルをトレースする。グラフアテンションネットワークを用いて、集約されたCNN特徴を用いて、初期再構成における各セグメント分岐の存在確率を予測する。 418個の多相腹部ct画像からなる社内データセット上で提案手法を評価した。提案したグラフネットワークのプルーニングにより,全体のF1スコアが6.4%向上した。また、他の最先端の曲率構造再構成アルゴリズムよりも優れていた。 With the injection of contrast material into blood vessels, multi-phase contrasted CT images can enhance the visibility of vessel networks in the human body. Reconstructing the 3D geometric morphology of liver vessels from the contrasted CT images can enable multiple liver preoperative surgical planning applications. Automatic reconstruction of liver vessel morphology remains a challenging problem due to the morphological complexity of liver vessels and the inconsistent vessel intensities among different multi-phase contrasted CT images. On the other side, high integrity is required for the 3D reconstruction to avoid decision making biases. In this paper, we propose a framework for liver vessel morphology reconstruction using both a fully convolutional neural network and a graph attention network. A fully convolutional neural network is first trained to produce the liver vessel centerline heatmap. An over-reconstructed liver vessel graph model is then traced based on the heatmap using an image processing based algorithm. 
We use a graph attention network to prune the false-positive branches by predicting the presence probability of each segmented branch in the initial reconstruction using the aggregated CNN features. We evaluated the proposed framework on an in-house dataset consisting of 418 multi-phase abdomen CT images with contrast. The proposed graph network pruning improves the overall reconstruction F1 score by 6.4% over the baseline. It also outperformed the other state-of-the-art curvilinear structure reconstruction algorithms.	翻訳日:2022-12-22 13:10:02 公開日:2020-03-18
# オブジェクトベースの画像符号化: 学習駆動再訪 Object-Based Image Coding: A Learning-Driven Revisit ( http://arxiv.org/abs/2003.08033v1 ) ライセンス: Link先を確認	Qi Xia, Haojie Liu and Zhan Ma	(参考訳) 20年ほど前に広く研究されたObject-Based Image Coding (OBIC)は、超低ビットレート通信と高レベルのセマンティックコンテンツ理解の両方に広大なアプリケーション視点を約束していたが、任意の形状のオブジェクトの非効率なコンパクト表現のために、ほとんど使われなかった。根本的な問題は、任意の形のオブジェクトを細かい粒度で効率的に処理する方法である(フィーチャー要素やピクセルワイズなど)。そこで本稿では,画像層分解のためのオブジェクトセグメンテーションネットワークと,マスク付き前景オブジェクトと背景シーンを別々に処理するための並列畳み込みに基づくニューラルイメージ圧縮ネットワークを考案して,要素ワイズマスキングと圧縮の適用を提案する。すべてのコンポーネントはエンドツーエンドの学習フレームワークで最適化され、視覚的に快適なリコンストラクションのために、その(オブジェクトや背景といった)貢献をインテリジェントに重み付けます。我々は, JPEG2K, HEVCベースのBPGおよび他の学習画像圧縮法と比較して, 主観的品質向上を顕著に示す, 非常に低ビットレートのシナリオ(例えば, $\lesssim$0.1 bits per pixel - bpp)において, PASCAL VOCデータセットの性能を評価するための総合的な実験を行った。関連資料はすべてhttps://njuvision.github.io/Neural-Object-Coding/で公開されています。 The Object-Based Image Coding (OBIC) that was extensively studied about two decades ago, promised a vast application perspective for both ultra-low bitrate communication and high-level semantical content understanding, but it had rarely been used due to the inefficient compact representation of object with arbitrary shape. A fundamental issue behind is how to efficiently process the arbitrary-shaped objects at a fine granularity (e.g., feature element or pixel wise). To attack this, we have proposed to apply the element-wise masking and compression by devising an object segmentation network for image layer decomposition, and parallel convolution-based neural image compression networks to process masked foreground objects and background scene separately. All components are optimized in an end-to-end learning framework to intelligently weigh their (e.g., object and background) contributions for visually pleasant reconstruction. 
We have conducted comprehensive experiments to evaluate the performance on PASCAL VOC dataset at a very low bitrate scenario (e.g., $\lesssim$0.1 bits per pixel - bpp) which have demonstrated noticeable subjective quality improvement compared with JPEG2K, HEVC-based BPG and another learned image compression method. All relevant materials are made publicly accessible at https://njuvision.github.io/Neural-Object-Coding/.	翻訳日:2022-12-22 13:09:42 公開日:2020-03-18
# OmniSLAM:ワイドベースラインマルチカメラシステムにおける全方向のローカライゼーションとディエンスマッピング OmniSLAM: Omnidirectional Localization and Dense Mapping for Wide-baseline Multi-camera Systems ( http://arxiv.org/abs/2003.08056v1 ) ライセンス: Link先を確認	Changhee Won, Hochang Seok, Zhaopeng Cui, Marc Pollefeys, Jongwoo Lim	(参考訳) 本稿では,超広視野(FOV)魚眼カメラを用いたワイドベースライン多視点ステレオ装置における全方位位置推定と高密度マッピングシステムについて述べる。この装置は環境のステレオ観測を360度カバーする。より実用的で正確な再構築のために、我々はまず、既存のネットワークよりも高速かつ高精度な全方位深度推定のために、改良された軽量深層ニューラルネットワークを導入する。第2に,全方位深度推定を視覚オドメトリ(VO)に統合し,グローバル一貫性のためのループクローズモジュールを追加した。推定深度マップを用いて、キーポイントを互いのビューに再投影し、より良く、より効率的な特徴マッチングプロセスをもたらす。最後に,全方位深度マップと推定したリグ姿勢をTSDF(truncated signed distance function)ボリュームに融合して3Dマップを得る。真値付きの合成データセットと困難な環境の実世界シーケンスで評価し,広範な実験により,提案システムが合成環境と実環境の両方において優れた復元結果を生成することを示す。 In this paper, we present an omnidirectional localization and dense mapping system for a wide-baseline multiview stereo setup with ultra-wide field-of-view (FOV) fisheye cameras, which has a 360 degrees coverage of stereo observations of the environment. For more practical and accurate reconstruction, we first introduce improved and lightweight deep neural networks for the omnidirectional depth estimation, which are faster and more accurate than the existing networks. Second, we integrate our omnidirectional depth estimates into the visual odometry (VO) and add a loop closing module for global consistency. Using the estimated depth map, we reproject keypoints onto each other view, which leads to a better and more efficient feature matching process. Finally, we fuse the omnidirectional depth maps and the estimated rig poses into the truncated signed distance function (TSDF) volume to acquire a 3D map. We evaluate our method on synthetic datasets with ground-truth and real-world sequences of challenging environments, and the extensive experiments show that the proposed system generates excellent reconstruction results in both synthetic and real-world environments.	翻訳日:2022-12-22 13:09:15 公開日:2020-03-18
# 測地線に基づく3次元形状のキャラクタリゼーション : 軟部組織臓器の時間的変形への応用 A new geodesic-based feature for characterization of 3D shapes: application to soft tissue organ temporal deformations ( http://arxiv.org/abs/2003.08332v1 ) ライセンス: Link先を確認	Karim Makki, Amine Bohi, Augustin C. Ogier, Marc-Emmanuel Bellemare	(参考訳) 本稿では,点雲から3次元形状を特徴付ける手法を提案し,臓器の時間的変形の研究への直接的応用を示す。一例として, 強制呼吸運動中の膀胱の挙動を3次元表面点の減少で特徴づける: まず, 大規模な変形Diffomorphic Metric Mapping (LDDMM) フレームワークを用いて, 表面の四角形メッシュの頂点を表す等距離点の集合を, 長いダイナミックMRIシーケンスを通して追跡する。次に, ユークリッド偏微分方程式 (pdes) を用いた時間的臓器変形を特徴付けるために, スケーリングと回転に不変な新しい幾何学的特徴を提案する。我々は, 人工3次元形状と, 強制呼吸運動時の膀胱変形を特徴とする動的MRIデータの両方に特徴の堅牢性を示す。提案手法は, 医用画像, 空気力学, ロボット工学など, コンピュータビジョンの応用に有用である可能性が示唆された。 In this paper, we propose a method for characterizing 3D shapes from point clouds and we show a direct application on a study of organ temporal deformations. As an example, we characterize the behavior of a bladder during a forced respiratory motion with a reduced number of 3D surface points: first, a set of equidistant points representing the vertices of quadrilateral mesh for the surface in the first time frame are tracked throughout a long dynamic MRI sequence using a Large Deformation Diffeomorphic Metric Mapping (LDDMM) framework. Second, a novel geometric feature which is invariant to scaling and rotation is proposed for characterizing the temporal organ deformations by employing an Eulerian Partial Differential Equations (PDEs) methodology. We demonstrate the robustness of our feature on both synthetic 3D shapes and realistic dynamic MRI data portraying the bladder deformation during forced respiratory motions. Promising results are obtained, showing that the proposed feature may be useful for several computer vision applications such as medical imaging, aerodynamics and robotics.	翻訳日:2022-12-22 13:08:27 公開日:2020-03-18
# 画像スタイル変換のためのコンテンツ変換ブロック A Content Transformation Block For Image Style Transfer ( http://arxiv.org/abs/2003.08407v1 ) ライセンス: Link先を確認	Dmytro Kotovenko, Artsiom Sanakoyeu, Pingchuan Ma, Sabine Lang, Bj\"orn Ommer	(参考訳) 画像理解と合成における根本的な課題を研究できるため、スタイル転送は最近多くの注目を集めている。最近の研究は、色、テクスチャ、計算速度、画像解像度の表現を大幅に改善した。芸術的なスタイルは、色、形、テクスチャといった画像の形式的特徴に影響を与えるが、コンテンツの詳細を変形、追加、削除する。本稿では,コンテンツイメージのコンテンツとスタイルを意識したスタイル化に焦点を当てた。そこで,エンコーダとデコーダの間にコンテンツ変換モジュールを導入する。さらに、写真やスタイルサンプルに現れる類似コンテンツを利用して、スタイルがコンテンツの詳細をどのように変更するかを学習し、これを他のクラスの詳細に一般化する。さらに,高分解能画像合成に不可欠な新しい正規化層を提案する。モデルの堅牢性と速度は,リアルタイムかつ高精細なビデオスタイリングを可能にする。我々は,提案手法の有効性を示すために,質的かつ定量的な評価を行う。 Style transfer has recently received a lot of attention, since it allows to study fundamental challenges in image understanding and synthesis. Recent work has significantly improved the representation of color and texture and computational speed and image resolution. The explicit transformation of image content has, however, been mostly neglected: while artistic style affects formal characteristics of an image, such as color, shape or texture, it also deforms, adds or removes content details. This paper explicitly focuses on a content-and style-aware stylization of a content image. Therefore, we introduce a content transformation module between the encoder and decoder. Moreover, we utilize similar content appearing in photographs and style samples to learn how style alters content details and we generalize this to other class details. Additionally, this work presents a novel normalization layer critical for high resolution image synthesis. The robustness and speed of our model enables a video stylization in real-time and high definition. We perform extensive qualitative and quantitative evaluations to demonstrate the validity of our approach.	翻訳日:2022-12-22 13:00:20 公開日:2020-03-18
# 脳における社会的フィードバック処理がソーシャルメディア時代における集団意見プロセスをどのように形成するか How social feedback processing in the brain shapes collective opinion processes in the era of social media ( http://arxiv.org/abs/2003.08154v1 ) ライセンス: Link先を確認	Sven Banisch and Felix Gaisbauer and Eckehard Olbrich	(参考訳) 特定の意見を持つグループが公的な声を出し、異なる見解を持つ人たちを黙らせる仕組みは何でしょう? ソーシャルメディアはどのように機能するのか? 社会的フィードバックの処理に関する最近の神経科学的知見に基づいて,これらの問題に対処できる理論モデルを構築した。このモデルは、世論の沈黙理論のスパイラルによって説明される現象を捉え、そのメカニズムに基づく基礎を提供し、この方法で異なる集団構造が集団的意見表現の異なる体制とどのように関連しているかについてのより一般的な洞察を可能にする。少数派が結束全体として振る舞うと、強い多数派でさえ沈黙を余儀なくされる。社会フィードバック理論(英語版) (SFT) の枠組みは、社会的および認知神経科学における発見の社会的レベルの影響を理解するための社会理論の必要性を強調している。 What are the mechanisms by which groups with certain opinions gain public voice and force others holding a different view into silence? And how does social media play into this? Drawing on recent neuro-scientific insights into the processing of social feedback, we develop a theoretical model that allows to address these questions. The model captures phenomena described by spiral of silence theory of public opinion, provides a mechanism-based foundation for it, and allows in this way more general insight into how different group structures relate to different regimes of collective opinion expression. Even strong majorities can be forced into silence if a minority acts as a cohesive whole. The proposed framework of social feedback theory (SFT) highlights the need for sociological theorising to understand the societal-level implications of findings in social and cognitive neuroscience.	翻訳日:2022-12-22 12:58:19 公開日:2020-03-18
# 中国語における接置詞スーパーセンスのコーパス A Corpus of Adpositional Supersenses for Mandarin Chinese ( http://arxiv.org/abs/2003.08437v1 ) ライセンス: Link先を確認	Siyao Peng, Yang Liu, Yilun Zhu, Austin Blodgett, Yushi Zhao, Nathan Schneider	(参考訳) 接置詞は、意味関係を示す標識として頻繁に現れるが、非常に曖昧であり、言語によって大きく異なる。さらに,接置詞の意味論の言語間差異を調査したり,多言語の曖昧性解消システムを構築したりするための注釈付きコーパスは不足している。本稿は,中国語(標準中国語)の全ての接置詞に意味アノテーションを付与したコーパスについて述べる。我々の知る限り,接置詞の意味論について広くアノテーションされた最初の中国語コーパスである。提案手法は,言語に依存しないとされる意味的基準に従って一般的なスーパーセンスの集合を定義した枠組みを適応したものであるが,その枠組みの開発は主に英語の前置詞に焦点を当てていた(Schneider et al., 2018)。このスーパーセンスカテゴリーは、英語との構文的差異があるにもかかわらず、中国語の接置詞に適していることがわかった。『The Little Prince』の中国語訳において、高いアノテータ間合意を達成し、対訳テキストにおける接置詞トークンの意味対応を解析する。 Adpositions are frequent markers of semantic relations, but they are highly ambiguous and vary significantly from language to language. Moreover, there is a dearth of annotated corpora for investigating the cross-linguistic variation of adposition semantics, or for building multilingual disambiguation systems. This paper presents a corpus in which all adpositions have been semantically annotated in Mandarin Chinese; to the best of our knowledge, this is the first Chinese corpus to be broadly annotated with adposition semantics. Our approach adapts a framework that defined a general set of supersenses according to ostensibly language-independent semantic criteria, though its development focused primarily on English prepositions (Schneider et al., 2018). We find that the supersense categories are well-suited to Chinese adpositions despite syntactic differences from English. On a Mandarin translation of The Little Prince, we achieve high inter-annotator agreement and analyze semantic correspondences of adposition tokens in bitext.	翻訳日:2022-12-22 12:58:04 公開日:2020-03-18
# 公理ピンポイント Axiom Pinpointing ( http://arxiv.org/abs/2003.08298v1 ) ライセンス: Link先を確認	Rafael Pe\~naloza	(参考訳) 公理ピンポインティング(axiom pinpointing)とは、ある帰結を導く原因となるオントロジー内の特定の公理を見つけるタスクである。この課題は多くの研究分野において異なる名称で研究され、手法の再定式化と再発明につながっている。本稿では,公理ピンポインティングの概要を概説し,基本的な概念,それを解決するための異なるアプローチ,そして文献で検討されたバリエーションや応用について述べる。これは、関連する問題に関心のある研究者の出発点となり、詳細を深く掘り下げるための豊富な書誌がある。 Axiom pinpointing refers to the task of finding the specific axioms in an ontology which are responsible for a consequence to follow. This task has been studied, under different names, in many research areas, leading to a reformulation and reinvention of techniques. In this work, we present a general overview of axiom pinpointing, providing the basic notions, different approaches for solving it, and some variations and applications which have been considered in the literature. This should serve as a starting point for researchers interested in related problems, with an ample bibliography for delving deeper into the details.	翻訳日:2022-12-22 12:57:48 公開日:2020-03-18
# 単発人物再同定の深層学習のための三重項置換法 Triplet Permutation Method for Deep Learning of Single-Shot Person Re-Identification ( http://arxiv.org/abs/2003.08303v1 ) ライセンス: Link先を確認	M. J. G\'omez-Silva, J.M. Armingol, A. de la Escalera	(参考訳) 深層畳み込みニューラルネットワークのトレーニングによる単発人物再識別(re-id)の解決は、1人当たり2枚の画像しか利用できないため、トレーニングデータの欠如による厄介な課題である。これによりモデルがオーバーフィッティングされ、性能が劣化する。本稿では,特定のre-idデータセットから複数のトレーニングセットを生成するために,Triplet Permutation法を定式化する。これはトリプルトネットワークを供給するための新しい戦略であり、シングルショットRe-Idモデルのオーバーフィッティングを低減する。改良されたパフォーマンスは、最も挑戦的なre-idデータセットであるprid2011で実証され、この方法の有効性が証明された。 Solving Single-Shot Person Re-Identification (Re-Id) by training Deep Convolutional Neural Networks is a daunting challenge, due to the lack of training data, since only two images per person are available. This causes the overfitting of the models, leading to degenerated performance. This paper formulates the Triplet Permutation method to generate multiple training sets, from a certain re-id dataset. This is a novel strategy for feeding triplet networks, which reduces the overfitting of the Single-Shot Re-Id model. The improved performance has been demonstrated over one of the most challenging Re-Id datasets, PRID2011, proving the effectiveness of the method.	翻訳日:2022-12-22 12:51:08 公開日:2020-03-18
# 内科的回転平均化におけるミニマの分布について On the Distribution of Minima in Intrinsic-Metric Rotation Averaging ( http://arxiv.org/abs/2003.08310v1 ) ライセンス: Link先を確認	Kyle Wilson and David Bindel	(参考訳) 回転平均化は3dシーンの画像からカメラの集合の向きを決定する非凸最適化問題である。この問題は様々な距離とロバスト化器を用いて研究されている。 SO(3) 上の内在的(あるいは測地的)距離は幾何学的に有意であるが、外在的距離に基づく解法では(条件付き)正当性を保証するが、内在的計量では同等の結果が見つからない。本稿では,局所ミニマの空間分布について検討する。まず、質的行動における鋭い遷移を示すために、新しい実証研究を行い、問題がより不安定になるにつれて、それらは単一の(簡単に探せる)支配的な最小の面からミニマで満たされたコスト面へと遷移する。本論文の第2部では、この遷移が起こるときの理論的境界を導出する。これは[24]の結果を拡張したもので、この問題の難しさを研究するためのプロキシとして局所凸性を用いたものです。問題の基底となる商多様体幾何を認識することにより、先行作業よりも n-次元の改善が得られる。ちなみに、我々の分析では、以前の$l_2$ワークを一般的な$l_p$コストにまで拡張しています。本研究は,問題難易度を示す指標として代数的接続性を用いることを提案する。 Rotation Averaging is a non-convex optimization problem that determines orientations of a collection of cameras from their images of a 3D scene. The problem has been studied using a variety of distances and robustifiers. The intrinsic (or geodesic) distance on SO(3) is geometrically meaningful; but while some extrinsic distance-based solvers admit (conditional) guarantees of correctness, no comparable results have been found under the intrinsic metric. In this paper, we study the spatial distribution of local minima. First, we do a novel empirical study to demonstrate sharp transitions in qualitative behavior: as problems become noisier, they transition from a single (easy-to-find) dominant minimum to a cost surface filled with minima. In the second part of this paper we derive a theoretical bound for when this transition occurs. This is an extension of the results of [24], which used local convexity as a proxy to study the difficulty of problem. By recognizing the underlying quotient manifold geometry of the problem we achieve an n-fold improvement over prior work. Incidentally, our analysis also extends the prior $l_2$ work to general $l_p$ costs. Our results suggest using algebraic connectivity as an indicator of problem difficulty.	翻訳日:2022-12-22 12:50:54 公開日:2020-03-18
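The abstract above contrasts the intrinsic (geodesic) distance on SO(3) with extrinsic distances. As a small worked illustration, the sketch below computes both for two rotation matrices: the geodesic distance is the angle of the relative rotation, recovered from the trace, and the chordal distance is the Frobenius norm of the difference. The helper names and the pure-Python list representation are assumptions made for this example, and the chordal distance is just one common choice of extrinsic metric, not necessarily the one used in the works the paper refers to.

```python
import math

def rot_z(angle):
    """Rotation by `angle` radians about the z-axis, as nested lists."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def geodesic_distance(r1, r2):
    """Intrinsic distance on SO(3): the rotation angle of r1^T r2."""
    # trace(r1^T r2) equals the elementwise inner product of the two matrices.
    trace = sum(r1[i][j] * r2[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.acos(cos_theta)

def chordal_distance(r1, r2):
    """Extrinsic distance: Frobenius norm of r1 - r2."""
    return math.sqrt(sum((r1[i][j] - r2[i][j]) ** 2
                         for i in range(3) for j in range(3)))
```

For a relative rotation angle theta, the two are related by chordal = 2 * sqrt(2) * sin(theta / 2), so they agree to first order for small angles and diverge as the angle grows.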
# DeepCap:Weak Supervisionを使った単眼の人間パフォーマンスキャプチャ DeepCap: Monocular Human Performance Capture Using Weak Supervision ( http://arxiv.org/abs/2003.08325v1 ) ライセンス: Link先を確認	Marc Habermann, Weipeng Xu, Michael Zollhoefer, Gerard Pons-Moll, Christian Theobalt	(参考訳) 人間のパフォーマンスキャプチャは、映画制作やバーチャル/拡張現実における多くの応用において、非常に重要なコンピュータビジョン問題である。以前の多くのパフォーマンスキャプチャアプローチでは、高価なマルチビューの設定が必要か、フレーム間対応で密集した時空コヒーレント形状を回復しなかった。本稿では,単眼高密度ヒトパフォーマンスキャプチャのための新しい深層学習手法を提案する。提案手法は,3次元基底真理アノテーションを用いたトレーニングデータを完全に除去する多視点監視に基づいて,弱教師付きで訓練される。ネットワークアーキテクチャは、タスクをポーズ推定と非剛性表面変形ステップに切り離す2つの別々のネットワークに基づいている。広範な質的・定量的評価は,我々のアプローチが品質と堅牢性の観点から,最先端手法を上回っていることを示している。 Human performance capture is a highly important computer vision problem with many applications in movie production and virtual/augmented reality. Many previous performance capture approaches either required expensive multi-view setups or did not recover dense space-time coherent geometry with frame-to-frame correspondences. We propose a novel deep learning approach for monocular dense human performance capture. Our method is trained in a weakly supervised manner based on multi-view supervision completely removing the need for training data with 3D ground truth annotations. The network architecture is based on two separate networks that disentangle the task into a pose estimation and a non-rigid surface deformation step. Extensive qualitative and quantitative evaluations show that our approach outperforms the state of the art in terms of quality and robustness.	翻訳日:2022-12-22 12:50:32 公開日:2020-03-18
# RGB-Dスキャンによる逆テクスチャ最適化 Adversarial Texture Optimization from RGB-D Scans ( http://arxiv.org/abs/2003.08400v1 ) ライセンス: Link先を確認	Jingwei Huang, Justus Thies, Angela Dai, Abhijit Kundu, Chiyu Max Jiang, Leonidas Guibas, Matthias Nie{\ss}ner, Thomas Funkhouser	(参考訳) リアルなカラーテクスチャの生成は、rgb-d表面再構成の重要なステップであるが、再構成された形状の不正確さ、カメラのポーズのミスアライメント、ビュー依存のイメージアーティファクトのため、実際はまだ困難である。本研究では,弱教師付き視点から得られた条件付き逆数損失を用いた色彩テクスチャ生成手法を提案する。具体的には,これらの誤差に頑健な客観的関数を学習することにより,不整合画像からでも近似面に対してフォトリアリスティックなテクスチャを生成する手法を提案する。提案手法の鍵となる考え方は,テクスチャ最適化をミスアライメントに寛容に導くパッチベースの条件判別器を学習することである。識別器は合成ビューと実画像を取り、合成ビューが現実主義の広い定義の下で現実的かどうかを評価する。私たちは、'リアル'な'例の入力ビューとそれらの不整合バージョンを提供することで、判別子を訓練し、学習した敵の損失がスキャンからエラーを許容できるようにします。定量的・質的評価の下での合成データおよび実データ実験は,最先端技術と比較して,本手法の利点を実証する。私たちのコードはビデオデモで公開されています。 Realistic color texture generation is an important step in RGB-D surface reconstruction, but remains challenging in practice due to inaccuracies in reconstructed geometry, misaligned camera poses, and view-dependent imaging artifacts. In this work, we present a novel approach for color texture generation using a conditional adversarial loss obtained from weakly-supervised views. Specifically, we propose an approach to produce photorealistic textures for approximate surfaces, even from misaligned images, by learning an objective function that is robust to these errors. The key idea of our approach is to learn a patch-based conditional discriminator which guides the texture optimization to be tolerant to misalignments. Our discriminator takes a synthesized view and a real image, and evaluates whether the synthesized one is realistic, under a broadened definition of realism. We train the discriminator by providing as `real' examples pairs of input views and their misaligned versions -- so that the learned adversarial loss will tolerate errors from the scans. Experiments on synthetic and real data under quantitative or qualitative evaluation demonstrate the advantage of our approach in comparison to state of the art. 
Our code is publicly available with video demonstration.	翻訳日:2022-12-22 12:48:51 公開日:2020-03-18
# SwapText: シーン内の画像ベースのテキスト転送 SwapText: Image Based Texts Transfer in Scenes ( http://arxiv.org/abs/2003.08152v1 ) ライセンス: Link先を確認	Qiangpeng Yang, Hongsheng Jin, Jun Huang, Wei Lin	(参考訳) オリジナルのフォント、色、サイズ、背景テクスチャを保存しながらシーンイメージ内のテキストを入れ替えることは、異なる要因間の複雑な相互作用のために難しい課題である。本研究では,シーンイメージ間でテキストを転送する3段階フレームワークであるSwapTextを紹介する。まず,前景画像のテキストラベルのみを置換するために,新しいテキストスワップネットワークを提案する。次に、背景補完ネットワークを学習して背景画像を再構成する。最後に、生成された前景画像と背景画像を用いて、融合ネットワークにより単語画像を生成する。提案フレームワークを用いることで,重度の幾何学的歪みがあっても入力画像のテキストを操作できる。定性的かつ定量的な結果を、規則的および不規則なテキストデータセットを含むいくつかのシーンテキストデータセット上で示す。我々は,画像ベースのテキスト翻訳やテキスト画像合成などの応用において本手法の有用性を示すため,広範な実験を行った。 Swapping text in scene images while preserving original fonts, colors, sizes and background textures is a challenging task due to the complex interplay between different factors. In this work, we present SwapText, a three-stage framework to transfer texts across scene images. First, a novel text swapping network is proposed to replace text labels only in the foreground image. Second, a background completion network is learned to reconstruct background images. Finally, the generated foreground image and background image are used to generate the word image by the fusion network. Using the proposed framework, we can manipulate the texts of the input images even with severe geometric distortion. Qualitative and quantitative results are presented on several scene text datasets, including regular and irregular text datasets. We conducted extensive experiments to demonstrate the usefulness of our method in applications such as image-based text translation and text image synthesis.	翻訳日:2022-12-22 12:42:42 公開日:2020-03-18
# 3次元ガウス核とのマルチビュー融合による3次元群数計測 3D Crowd Counting via Multi-View Fusion with 3D Gaussian Kernels ( http://arxiv.org/abs/2003.08162v1 ) ライセンス: Link先を確認	Qi Zhang and Antoni B. Chan	(参考訳) 群衆の数え上げは数十年にわたって研究され、特にDNNに基づく密度マップ推定法において、多くの研究が優れた成果を上げている。既存の群衆計数作業の多くは単一視点計数に重点を置いているが、複数のカメラを使用する大規模・広視野の多視点計数の研究は少ない。近年,Multi-view Multi-scale (MVMS) と呼ばれる,複数のカメラビューをCNNで融合し,平面上の2次元シーンレベルの密度マップを推定する手法が提案されている。 MVMSとは違って,2次元地上平面ではなく3次元シーンレベル密度マップと3次元特徴融合による多視点群カウントタスクを提案する。 2D融合と比較して、3D融合は、z次元(高さ)に沿った人々のより多くの情報を抽出し、複数のビューにわたるスケールの変動を解決するのに役立つ。 3D密度マップは、和がカウントである2D密度マップの特性を保ちながら、群衆密度に関する3D情報も提供する。また,2次元ビューにおける3次元予測と基底構造間の投影整合性について検討し,計数性能をさらに向上させる。提案手法は,3つのマルチビュー計数データセット上でテストし,最先端の計数性能と同等の性能を実現する。 Crowd counting has been studied for decades and a lot of works have achieved good performance, especially the DNNs-based density map estimation methods. Most existing crowd counting works focus on single-view counting, while few works have studied multi-view counting for large and wide scenes, where multiple cameras are used. Recently, an end-to-end multi-view crowd counting method called multi-view multi-scale (MVMS) has been proposed, which fuses multiple camera views using a CNN to predict a 2D scene-level density map on the ground-plane. Unlike MVMS, we propose to solve the multi-view crowd counting task through 3D feature fusion with 3D scene-level density maps, instead of the 2D ground-plane ones. Compared to 2D fusion, the 3D fusion extracts more information of the people along z-dimension (height), which helps to solve the scale variations across multiple views. The 3D density maps still preserve the 2D density maps property that the sum is the count, while also providing 3D information about the crowd density. We also explore the projection consistency among the 3D prediction and the ground-truth in the 2D views to further enhance the counting performance. 
The proposed method is tested on 3 multi-view counting datasets and achieves better or comparable counting performance to the state-of-the-art.	翻訳日:2022-12-22 12:42:28 公開日:2020-03-18
# 少数音源ラベルを用いたドメイン適応のためのクロスドメイン自己教師型学習 Cross-domain Self-supervised Learning for Domain Adaptation with Few Source Labels ( http://arxiv.org/abs/2003.08264v1 ) ライセンス: Link先を確認	Donghyun Kim, Kuniaki Saito, Tae-Hyun Oh, Bryan A. Plummer, Stan Sclaroff, and Kate Saenko	(参考訳) 既存の教師なしドメイン適応メソッドは、ラベル豊富なソースドメインからラベルなしのターゲットドメインに知識を転送することを目的としている。しかしながら、一部のソースドメインのラベルを取得することは非常に高価であり、以前の作業で使われるような完全なラベル付けは実用的ではない。本研究では,対象領域がラベル付けされていない場合,ソース領域内のいくつかの例のみをラベル付けする,スパースラベル付きソースデータを用いた新しいドメイン適応シナリオについて検討する。ラベル付きソースの例が限られている場合、既存のメソッドはソースドメインとターゲットドメインの両方に適用可能な差別的特徴を学習できないことが多い。本稿では,ドメイン不変性だけでなく,クラス識別性も備えた特徴を学習する,ドメイン適応のための新しいクロスドメイン自己教師型学習手法を提案する。本手法は,ドメイン内自己スーパービジョンと視覚的類似性をドメイン適応的に捉え,ドメイン間自己スーパービジョンと整合するクロスドメイン機能を実行する。 3つの標準ベンチマークデータセットによる広範な実験において、本手法は、ソースラベルが少ない新しいターゲット領域におけるターゲット精度を著しく向上させ、古典的なドメイン適応シナリオにおいてさらに有用である。 Existing unsupervised domain adaptation methods aim to transfer knowledge from a label-rich source domain to an unlabeled target domain. However, obtaining labels for some source domains may be very expensive, making complete labeling as used in prior work impractical. In this work, we investigate a new domain adaptation scenario with sparsely labeled source data, where only a few examples in the source domain have been labeled, while the target domain is unlabeled. We show that when labeled source examples are limited, existing methods often fail to learn discriminative features applicable for both source and target domains. We propose a novel Cross-Domain Self-supervised (CDS) learning approach for domain adaptation, which learns features that are not only domain-invariant but also class-discriminative. Our self-supervised learning method captures apparent visual similarity with in-domain self-supervision in a domain adaptive manner and performs cross-domain feature matching with across-domain self-supervision. 
In extensive experiments with three standard benchmark datasets, our method significantly boosts target accuracy in the new target domain with few source labels and is even helpful in classical domain adaptation scenarios.	翻訳日:2022-12-22 12:40:27 公開日:2020-03-18
# PIC:長距離活動認識のための置換不変畳み込み PIC: Permutation Invariant Convolution for Recognizing Long-range Activities ( http://arxiv.org/abs/2003.08275v1 ) ライセンス: Link先を確認	Noureldien Hussein, Efstratios Gavves, Arnold W.M. Smeulders	(参考訳) 畳み込み、自己注意、ベクトル集約などのニューラル演算は、短距離行動を認識するための定番の選択肢である。しかし、長距離活動のモデリングには3つの制限がある。本稿では,長距離活動の時間的構造をモデル化する新しいニューラル層であるPIC(Permutation Invariant Convolution)を提案する。PICには望ましい性質が3つある。 i. 標準的な畳み込みとは異なり、PICは受容野内の特徴の時間的置換に対して不変であり、弱い時間構造のモデル化に適している。 ii. ベクトル集約と異なり、PICは局所接続を尊重し、カスケード層を用いて長距離の時間的抽象を学習することができる。 iii. 自己注意とは対照的に、PICは共有重みを使い、長くノイズの多いビデオの中で最も識別的な視覚的証拠をより良く検出できる。本研究では,PICの3つの特性について検討し,Charades,Breakfast,MultiThumosの長距離活動の認識におけるその有効性を示す。 Neural operations such as convolutions, self-attention, and vector aggregation are the go-to choices for recognizing short-range actions. However, they have three limitations in modeling long-range activities. This paper presents PIC, Permutation Invariant Convolution, a novel neural layer to model the temporal structure of long-range activities. It has three desirable properties. i. Unlike standard convolution, PIC is invariant to the temporal permutations of features within its receptive field, qualifying it to model weak temporal structures. ii. Different from vector aggregation, PIC respects local connectivity, enabling it to learn long-range temporal abstractions using cascaded layers. iii. In contrast to self-attention, PIC uses shared weights, making it more capable of detecting the most discriminant visual evidence across long and noisy videos. We study the three properties of PIC and demonstrate its effectiveness in recognizing the long-range activities of Charades, Breakfast, and MultiThumos.	翻訳日:2022-12-22 12:40:05 公開日:2020-03-18
# AMIL:人文推定のための対話型マルチインスタンス学習 AMIL: Adversarial Multi Instance Learning for Human Pose Estimation ( http://arxiv.org/abs/2003.08002v1 ) ライセンス: Link先を確認	Pourya Shamsolmoali, Masoumeh Zareapoor, Huiyu Zhou, Jie Yang	(参考訳) 人間のポーズ推定は、ヒューマンコンピュータインタフェースから監視やコンテンツに基づくビデオ検索まで幅広い応用に重要な影響を与える。人間のポーズ推定では、関節の障害や人体の重なりが、離脱したポーズ推定に繋がる。これらの問題に対処するために,人体の構造の優先順位を統合することにより,ネットワークのトレーニング中にその優先順位を慎重に検討する新しい構造認識ネットワークを提案する。通常、そのような制約を学ぶことは難しい課題です。そこで本研究では,同一のアーキテクチャを持つ2つの残差複数インスタンス学習モデル(mil)を設計し,一方を生成器として,もう一方を判別器として使用する学習モデルとして生成型逆ネットワークを提案する。判別作業は、実際のポーズと偽のポーズを区別することである。ポーズ生成器が、判別器が実際のものと区別できない結果を生成すると、モデルが事前学習に成功する。提案モデルでは, 地中断熱マップと生成熱マップを区別し, その後, 逆方向の損失が生成体に逆伝搬する。このような手順は、発電機が合理的な身体構成を学ぶのを補助し、ポーズ推定精度を向上させるのに有利であることが証明される。一方,我々はmilの新しい機能を提案する。インスタンス選択とモデリングの両方を行うための調整可能な構造で、ひとつのバッグ内のインスタンス間で情報を適切に渡すことができる。提案された残留MILニューラルネットワークでは、プールアクションがバッグへのインスタンスコントリビューションを適切に更新する。ヒトのポーズ推定タスクの2つのデータセットにおいて、プールに基づく逆数残差マルチインスタンスニューラルネットワークが検証され、他の最先端モデルよりもうまく性能が向上した。 Human pose estimation has an important impact on a wide range of applications from human-computer interface to surveillance and content-based video retrieval. For human pose estimation, joint obstructions and overlapping upon human bodies result in departed pose estimation. To address these problems, by integrating priors of the structure of human bodies, we present a novel structure-aware network to discreetly consider such priors during the training of the network. Typically, learning such constraints is a challenging task. Instead, we propose generative adversarial networks as our learning model in which we design two residual multiple instance learning (MIL) models with the identical architecture, one is used as the generator and the other one is used as the discriminator. The discriminator task is to distinguish the actual poses from the fake ones. If the pose generator generates the results that the discriminator is not able to distinguish from the real ones, the model has successfully learnt the priors. 
In the proposed model, the discriminator differentiates the ground-truth heatmaps from the generated ones, and the adversarial loss is then back-propagated to the generator. Such a procedure helps the generator learn reasonable body configurations and proves advantageous for improving pose estimation accuracy. Meanwhile, we propose a novel function for MIL. It is an adjustable structure for both instance selection and modeling to appropriately pass the information between instances in a single bag. In the proposed residual MIL neural network, the pooling action adequately updates the instance contribution to its bag. The proposed pooling-based adversarial residual multi-instance neural network has been validated on two datasets for the human pose estimation task and outperforms the other state-of-the-art models.	翻訳日:2022-12-22 10:18:21 公開日:2020-03-18
# ScanSSD:PDF文書画像における数式のための走査型シングルショット検出器 ScanSSD: Scanning Single Shot Detector for Mathematical Formulas in PDF Document Images ( http://arxiv.org/abs/2003.08005v1 ) ライセンス: Link先を確認	Parag Mali, Puneeth Kukkadapu, Mahshad Mahdavi, Richard Zanibbi	(参考訳) 本稿では,本文から独立して配置された数式と,テキスト行に埋め込まれた数式の位置を特定するScanning Single Shot Detector(ScanSSD)を提案する。 ScanSSDは検出に視覚的特徴のみを使用し、レイアウト、フォント、文字ラベルなどの書式や組版の情報は使用しない。 600dpiの文書ページ画像が与えられると、Single Shot Detector (SSD) はスライディングウィンドウを使用して複数のスケールで数式を見つけ、次に候補検出をプールしてページレベルの結果を得る。実験では, GTDBのスキャン済み数学論文コレクションを修正したTFD-ICDAR2019v2データセットを用いた。 ScanSSDは数式中の文字を高精度に検出して0.926のfスコアを達成し、数式も全体として高い再現率で検出する。検出誤りの大部分は軽微であり、例えば、大きな空白ギャップ(変数の制約など)で数式を分割したり、隣接するテキスト行の数式をマージしたりするものである。数式検出のfスコアは 0.796 (IOU $\geq 0.5$) と 0.733 (IOU $\geq 0.75$) である。私たちのデータ、評価ツール、コードは公開されています。 We introduce the Scanning Single Shot Detector (ScanSSD) for locating math formulas offset from text and embedded in textlines. ScanSSD uses only visual features for detection: no formatting or typesetting information such as layout, font, or character labels is employed. Given a 600 dpi document page image, a Single Shot Detector (SSD) locates formulas at multiple scales using sliding windows, after which candidate detections are pooled to obtain page-level results. For our experiments we use the TFD-ICDAR2019v2 dataset, a modification of the GTDB scanned math article collection. ScanSSD detects characters in formulas with high accuracy, obtaining a 0.926 f-score, and detects formulas with high recall overall. Detection errors are largely minor, such as splitting formulas at large whitespace gaps (e.g., for variable constraints) and merging formulas on adjacent textlines. Formula detection f-scores of 0.796 (IOU $\geq 0.5$) and 0.733 (IOU $\geq 0.75$) are obtained. Our data, evaluation tools, and code are publicly available.	翻訳日:2022-12-22 10:17:55 公開日:2020-03-18
# 物体追跡におけるr-spatiogramの適用による閉塞処理 Applying r-spatiogram in object tracking for occlusion handling ( http://arxiv.org/abs/2003.08021v1 ) ライセンス: Link先を確認	Niloufar Salehi Dastjerdi and M. Omair Ahmad	(参考訳) 物体追跡はコンピュータビジョンにおける最も重要な問題の1つである。ビデオトラッキングの目的は、ターゲットまたは関心対象の軌跡を抽出すること、すなわち、動画シーケンス内の移動ターゲットを正確に特定し、シーケンスの特徴空間においてターゲットを非ターゲットから判別することである。したがって、特徴記述子はそのような判別に大きな影響を与える可能性がある。本稿では,参照モデルの3つの主要コンポーネント,すなわちオブジェクトモデリング,オブジェクト検出とローカライゼーション,モデル更新からなる,多くのトラッカに共通する基本的な考え方を用いる。しかし、我々のシステムには大きな改善がある。第4のコンポーネントであるオクルージョン処理は、r-spatiogramを利用して最適なターゲット候補を検出する。spatiogramがピクセル座標に関するいくつかのモーメントを含むのに対し、r-spatiogramは画像内の与えられた特徴の分布について領域ベースのコンパクト性を計算し、オブジェクトを表現するためのよりリッチな特徴を捉える。本研究は,大きな外観変化や重度の閉塞が存在する場合でも、ビデオシーケンス全体にわたって物体を追跡し続けるための効率的かつ堅牢な方法を開発する。提案手法は,課題の異なるシーケンスを考慮してPrinceton RGBD追跡データセット上で評価され,得られた結果はその有効性を示す。 Object tracking is one of the most important problems in computer vision. The aim of video tracking is to extract the trajectories of a target or object of interest, i.e., to accurately locate a moving target in a video sequence and discriminate the target from non-targets in the feature space of the sequence. So, feature descriptors can have significant effects on such discrimination. In this paper, we use the basic idea of many trackers, which consists of three main components of the reference model, i.e., object modeling, object detection and localization, and model updating. However, there are major improvements in our system. Our fourth component, occlusion handling, utilizes the r-spatiogram to detect the best target candidate. While a spatiogram contains some moments of the pixel coordinates, the r-spatiogram computes region-based compactness on the distribution of the given feature in the image, which captures richer features to represent the objects. The proposed research develops an efficient and robust way to keep tracking the object throughout video sequences in the presence of significant appearance variations and severe occlusions. 
The proposed method is evaluated on the Princeton RGBD tracking dataset considering sequences with different challenges and the obtained results demonstrate the effectiveness of the proposed method.	翻訳日:2022-12-22 10:17:34 公開日:2020-03-18
# STH:効率的な行動認識のための時空間ハイブリッド畳み込み STH: Spatio-Temporal Hybrid Convolution for Efficient Action Recognition ( http://arxiv.org/abs/2003.08042v1 ) ライセンス: Link先を確認	Xu Li, Jingwen Wang, Lin Ma, Kaihao Zhang, Fengzong Lian, Zhanhui Kang and Jinjun Wang	(参考訳) 効果的かつ効率的な時空間モデリングは行動認識に不可欠である。既存の手法は、モデルの性能とモデルの複雑さの間のトレードオフに苦しむ。本稿では,空間的・時間的映像情報を少ないパラメータコストで同時に符号化する,新しい時空間ハイブリッド畳み込みネットワーク(STH)を提案する。異なる畳み込み層で空間的・時間的情報を逐次的または並列に抽出する既存の研究とは異なり、入力チャネルを複数のグループに分割し、空間的・時間的操作を1つの畳み込み層内でインターリーブすることで、空間的・時間的手がかりを深く統合する。このような設計は効率的な時空間モデリングを可能にし、小さなモデルスケールを維持する。 STH-Convは汎用的なビルディングブロックであり、従来の2D-Convブロック(2D畳み込み)を置き換えることで、ResNetやMobileNetのような既存の2D CNNアーキテクチャに組み込むことができる。 STHネットワークは、Something-Something (V1 & V2)、Jester、HMDB-51といったベンチマークデータセットにおいて、競合手法と同等かそれ以上の性能を達成している。さらに、STHは2D CNNよりもさらに小さなパラメータコストを維持しながら、3D CNNよりも優れた性能を示す。 Effective and efficient spatio-temporal modeling is essential for action recognition. Existing methods suffer from the trade-off between model performance and model complexity. In this paper, we present a novel Spatio-Temporal Hybrid Convolution Network (denoted as "STH") which simultaneously encodes spatial and temporal video information with a small parameter cost. Different from existing works that extract spatial and temporal information sequentially or in parallel with different convolutional layers, we divide the input channels into multiple groups and interleave the spatial and temporal operations in one convolutional layer, which deeply incorporates spatial and temporal clues. Such a design enables efficient spatio-temporal modeling and maintains a small model scale. STH-Conv is a general building block, which can be plugged into existing 2D CNN architectures such as ResNet and MobileNet by replacing the conventional 2D-Conv blocks (2D convolutions). The STH network achieves competitive or even better performance than its competitors on benchmark datasets such as Something-Something (V1 & V2), Jester, and HMDB-51. 
Moreover, STH enjoys performance superiority over 3D CNNs while maintaining an even smaller parameter cost than 2D CNNs.	翻訳日:2022-12-22 10:16:56 公開日:2020-03-18
# パーキンソン病における口腔顔面運動の推定:運動追跡のための2次元および3次元マーカーレスシステムの比較 Estimation of Orofacial Kinematics in Parkinson's Disease: Comparison of 2D and 3D Markerless Systems for Motion Tracking ( http://arxiv.org/abs/2003.08048v1 ) ライセンス: Link先を確認	Diego L. Guarin, Aidan Dempster, Andrea Bandini, Yana Yunusova and Babak Taati	(参考訳) 口腔顔面の障害はパーキンソン病(PD)の患者によく見られ、その経過は疾患進行の重要なバイオマーカーである可能性がある。我々は, 家庭でもクリニックでも使用でき, 疾患管理に役立つ有用で客観的な臨床情報を提供できる, PDにおける口腔顔面機能評価の自動化システムを開発している。我々の現在のアプローチは、3次元の顔の動きを推定するためにカラーカメラと深度カメラに依存している。しかし、深度カメラは一般的には普及しておらず、高価な場合があり、制御とデータ処理のために特別なソフトウェアを必要とする。本研究の目的は,口腔顔面運動学から抽出した特徴に基づいて健常対照者とPD患者を区別するうえで、深度カメラが必要かどうかを評価することである。その結果,カラーカメラのみから抽出した2次元特徴は,健常対照者とPD患者の区別において、カラーカメラと深度カメラから抽出した3次元特徴と同程度に有用であることが示された。これらの結果は,PDにおける口腔顔面機能の自動的かつ客観的な評価のための汎用システム開発への道を開くものである。 Orofacial deficits are common in people with Parkinson's disease (PD) and their evolution might represent an important biomarker of disease progression. We are developing an automated system for assessment of orofacial function in PD that can be used in-home or in-clinic and can provide useful and objective clinical information that informs disease management. Our current approach relies on color and depth cameras for the estimation of 3D facial movements. However, depth cameras are not commonly available, might be expensive, and require specialized software for control and data processing. The objective of this paper was to evaluate whether depth cameras are needed to differentiate between healthy controls and PD patients based on features extracted from orofacial kinematics. Results indicate that 2D features, extracted from color cameras only, are as informative as 3D features, extracted from color and depth cameras, in differentiating healthy controls from PD patients. These results pave the way for the development of a universal system for automatic and objective assessment of orofacial function in PD.	翻訳日:2022-12-22 10:16:33 公開日:2020-03-18
# 顔のなりすまし防止のための深層空間勾配と時間的深度学習 Deep Spatial Gradient and Temporal Depth Learning for Face Anti-spoofing ( http://arxiv.org/abs/2003.08061v1 ) ライセンス: Link先を確認	Zezheng Wang, Zitong Yu, Chenxu Zhao, Xiangyu Zhu, Yunxiao Qin, Qiusheng Zhou, Feng Zhou, Zhen Lei	(参考訳) 顔認識システムのセキュリティには顔のなりすまし防止(face anti-spoofing)が不可欠である。深度教師あり学習は、顔のなりすまし防止の最も効果的な方法の1つであることが示されている。大きな成功にもかかわらず、以前のほとんどの研究は、詳細なきめ細かい情報や顔の深度と動きパターンの相互作用を無視したまま、単に損失に深度を加えることで、単一フレームのマルチタスク問題として定式化している。対照的に,我々は2つの洞察に基づいて,複数のフレームからプレゼンテーション攻撃を検出する新しいアプローチを設計する。 1) 本物の顔と偽装顔の間の詳細な識別的手がかり(例えば、空間勾配の大きさ)は、積み重ねた通常の畳み込みによって失われうる。 2) 3次元的に動く顔のダイナミクスは、偽装顔を検出する上で重要な手がかりとなる。提案手法は,Residual Spatial Gradient Block (RSGB) を用いて識別的な詳細を捉え,Spatio-Temporal Propagation Module (STPM) により時空間情報を効率よく符号化する。さらに、より正確な深度の教師信号のために、新しいContrastive Depth Lossを提示する。また,本手法の有効性を評価するために,サンプル毎に実際の深度を提供するDMAD(Double-modal Anti-spoofing Dataset)も収集した。実験により,提案手法はOULU-NPU, SiW, CASIA-MFSD, Replay-Attack, そして新しいDMADを含む5つのベンチマークデータセットにおいて,最先端の結果を達成することが示された。コードは https://github.com/clks-wzz/FAS-SGTD で公開予定である。 Face anti-spoofing is critical to the security of face recognition systems. Depth supervised learning has been proven as one of the most effective methods for face anti-spoofing. Despite the great success, most previous works still formulate the problem as a single-frame multi-task one by simply augmenting the loss with depth, while neglecting the detailed fine-grained information and the interplay between facial depths and moving patterns. In contrast, we design a new approach to detect presentation attacks from multiple frames based on two insights: 1) detailed discriminative clues (e.g., spatial gradient magnitude) between living and spoofing faces may be discarded through stacked vanilla convolutions, and 2) the dynamics of 3D moving faces provide important clues in detecting the spoofing faces. The proposed method is able to capture discriminative details via Residual Spatial Gradient Block (RSGB) and encode spatio-temporal information from Spatio-Temporal Propagation Module (STPM) efficiently. 
Moreover, a novel Contrastive Depth Loss is presented for more accurate depth supervision. To assess the efficacy of our method, we also collect a Double-modal Anti-spoofing Dataset (DMAD) which provides actual depth for each sample. The experiments demonstrate that the proposed approach achieves state-of-the-art results on five benchmark datasets including OULU-NPU, SiW, CASIA-MFSD, Replay-Attack, and the new DMAD. Codes will be available at https://github.com/clks-wzz/FAS-SGTD.	翻訳日:2022-12-22 10:16:13 公開日:2020-03-18
# 画像から画像への変換を保存した幾何による教師なしマルチモーダル画像登録 Unsupervised Multi-Modal Image Registration via Geometry Preserving Image-to-Image Translation ( http://arxiv.org/abs/2003.08073v1 ) ライセンス: Link先を確認	Moab Arar, Yiftach Ginger, Dov Danon, Ilya Leizerson, Amit Bermano, Daniel Cohen-Or	(参考訳) 自動運転のような多くの応用は、モダリティ間の空間的アライメントを必要とするマルチモーダルデータに大きく依存している。多くのマルチモーダル登録法は、画像間の空間的対応の計算に苦慮している。本研究では,2つの入力モダリティのイメージ・ツー・イメージ翻訳ネットワークをトレーニングすることにより,モダリティ間の類似度向上の難しさを回避する。この学習された翻訳は、単純で信頼性の高いモノモダリティメトリクスを使用して登録ネットワークをトレーニングできる。空間変換ネットワークと翻訳ネットワークの2つのネットワークを用いてマルチモーダル登録を行う。我々は,翻訳ネットワークの幾何学的保存を奨励することで,正確な空間変換ネットワークをトレーニングできることを示す。最先端のマルチモーダル手法と比較して,提案手法は教師なしであり,トレーニングにアライメントされたモーダルのペアを必要とせず,任意の対のモーダルに適応できる。本手法は,商用データセット上で定量的・定性的に評価し,複数の形態で良好に動作し,高精度なアライメントを実現する。 Many applications, such as autonomous driving, heavily rely on multi-modal data where spatial alignment between the modalities is required. Most multi-modal registration methods struggle computing the spatial correspondence between the images using prevalent cross-modality similarity measures. In this work, we bypass the difficulties of developing cross-modality similarity measures, by training an image-to-image translation network on the two input modalities. This learned translation allows training the registration network using simple and reliable mono-modality metrics. We perform multi-modal registration using two networks - a spatial transformation network and a translation network. We show that by encouraging our translation network to be geometry preserving, we manage to train an accurate spatial transformation network. Compared to state-of-the-art multi-modal methods our presented method is unsupervised, requiring no pairs of aligned modalities for training, and can be adapted to any pair of modalities. We evaluate our method quantitatively and qualitatively on commercial datasets, showing that it performs well on several modalities and achieves accurate alignment.	翻訳日:2022-12-22 10:15:18 公開日:2020-03-18
# MagicEyes: 複合現実感のための大規模視線推定データセット MagicEyes: A Large Scale Eye Gaze Estimation Dataset for Mixed Reality ( http://arxiv.org/abs/2003.08806v1 ) ライセンス: Link先を確認	Zhengyang Wu, Srivignesh Rajendran, Tarrence van As, Joelle Zimmermann, Vijay Badrinarayanan, Andrew Rabinovich	(参考訳) 仮想および複合現実(XR)デバイスが出現したことで、コンピュータビジョンコミュニティではアイトラッキングが注目されている。視線推定はXRの重要な要素であり、エネルギー効率の良いレンダリング、多焦点ディスプレイ、コンテンツとの効果的な相互作用を可能にする。ヘッドマウントXRデバイスでは、視野を塞ぐのを避けるために、眼を軸外から撮像する。これにより、眼に関連する量を推定する際の課題が増加すると同時に、正確で堅牢な学習ベースのアプローチを開発する機会が生まれる。そこで本研究では,実際のMRデバイスを用いて収集し、包括的な正解ラベルを付与した最初の大規模な眼のデータセットであるMagicEyesを提案する。 MagicEyesには、587人の被験者から得た、人手でラベル付けされた正解付きの画像80,000枚と、視線ターゲットラベル付きの画像800,000枚以上が含まれている。我々は,MagicEyes上でいくつかの最先端手法を評価するとともに,眼のセグメンテーションと併せて角膜,グリント,瞳孔を1回のフォワードパスで検出するよう設計した新しいマルチタスクEyeNetモデルを提案する。 With the emergence of Virtual and Mixed Reality (XR) devices, eye tracking has received significant attention in the computer vision community. Eye gaze estimation is a crucial component in XR -- enabling energy efficient rendering, multi-focal displays, and effective interaction with content. In head-mounted XR devices, the eyes are imaged off-axis to avoid blocking the field of view. This leads to increased challenges in inferring eye related quantities and simultaneously provides an opportunity to develop accurate and robust learning based approaches. To this end, we present MagicEyes, the first large scale eye dataset collected using real MR devices with comprehensive ground truth labeling. MagicEyes includes $587$ subjects with $80,000$ images of human-labeled ground truth and over $800,000$ images with gaze target labels. We evaluate several state-of-the-art methods on MagicEyes and also propose a new multi-task EyeNet model designed for detecting the cornea, glints and pupil along with eye segmentation in a single forward pass.	翻訳日:2022-12-22 10:08:38 公開日:2020-03-18
# 自動車レーダを用いた深層オープンスペースセグメンテーション Deep Open Space Segmentation using Automotive Radar ( http://arxiv.org/abs/2004.03449v1 ) ライセンス: Link先を確認	Farzan Erlik Nowruzi, Dhanvin Kolhatkar, Prince Kapoor, Fahed Al Hassanat, Elnaz Jahani Heravi, Robert Laganiere, Julien Rebut, Waqas Malik	(参考訳) 本研究では,駐車シナリオにおける開放空間を特定するために,レーダと高度な深層セグメンテーションモデルを組み合わせて用いることを提案する。 SCORPと呼ばれるレーダー観測の公開データセットが収集された。深層モデルは様々なレーダ入力表現で評価される。提案手法は,低メモリ使用量およびリアルタイム処理速度を実現し,組込み機器への展開に非常に適している。 In this work, we propose the use of radar with advanced deep segmentation models to identify open space in parking scenarios. A publicly available dataset of radar observations called SCORP was collected. Deep models are evaluated with various radar input representations. Our proposed approach achieves low memory usage and real-time processing speeds, and is thus very well suited for embedded deployment.	翻訳日:2022-12-22 10:08:16 公開日:2020-03-18
# オープンソース音声資源におけるジェンダー表現 Gender Representation in Open Source Speech Resources ( http://arxiv.org/abs/2003.08132v1 ) ライセンス: Link先を確認	Mahault Garnerin, Solange Rossato, Laurent Besacier	(参考訳) 人工知能(AI)の台頭とディープラーニングアーキテクチャの利用の増加に伴い、AIシステムの倫理、透明性、公正性の問題は研究コミュニティの中心的な関心事となっている。我々は,open speech and language resource platform を通じて利用可能な音声資源における性表現に関する研究を行い,音声言語システムの透明性と公平性について論じる。オープンソースコーパスにおけるジェンダー情報の発見は簡単ではなく、ジェンダーバランスは他のコーパスの特徴にも依存することを示す(Elicited/non elicited speech, Low/high Resource Language, speech task targeted)。この論文は、このようなコーパスを用いて構築された音声システムの透明性を高めるために、研究者のためのメタデータと性別情報に関する勧告で締めくくられる。 With the rise of artificial intelligence (AI) and the growing use of deep-learning architectures, the question of ethics, transparency and fairness of AI systems has become a central concern within the research community. We address transparency and fairness in spoken language systems by proposing a study about gender representation in speech resources available through the Open Speech and Language Resource platform. We show that finding gender information in open source corpora is not straightforward and that gender balance depends on other corpus characteristics (elicited/non elicited speech, low/high resource language, speech task targeted). The paper ends with recommendations about metadata and gender information for researchers in order to assure better transparency of the speech systems built using such corpora.	翻訳日:2022-12-22 10:07:45 公開日:2020-03-18
# スキップグラムモデルの学習規則の分析 An Analysis on the Learning Rules of the Skip-Gram Model ( http://arxiv.org/abs/2003.08489v1 ) ライセンス: Link先を確認	Canlin Zhang, Xiuwen Liu and Daniel Bis	(参考訳) 自然言語処理タスクにおける表現の一般化を改善するため、単語はベクトルを用いて表現され、ベクトル間の距離は単語の類似度と関連付けられる。スキップグラムモデルの最先端実装である word2vec は、多くの自然言語処理タスクの性能向上に広く利用されているが、そのメカニズムはまだよく理解されていない。本研究では,スキップグラムモデルの学習ルールを導出し,それらの競合学習との密接な関係を確立する。さらに,スキップグラムモデルに対する大域的最適解制約を提供し,実験結果を用いて検証する。 To improve the generalization of the representations for natural language processing tasks, words are commonly represented using vectors, where distances among the vectors are related to the similarity of the words. While word2vec, the state-of-the-art implementation of the skip-gram model, is widely used and improves the performance of many natural language processing tasks, its mechanism is not yet well understood. In this work, we derive the learning rules for the skip-gram model and establish their close relationship to competitive learning. In addition, we provide the global optimal solution constraints for the skip-gram model and validate them by experimental results.	翻訳日:2022-12-22 10:06:56 公開日:2020-03-18
# 深層強化学習による配置最適化 Placement Optimization with Deep Reinforcement Learning ( http://arxiv.org/abs/2003.08445v1 ) ライセンス: Link先を確認	Anna Goldie and Azalia Mirhoseini	(参考訳) 配置最適化はシステムやチップ設計において重要な問題であり、グラフのノードを制約の対象となる目的のために最適化するための限られたリソースセットにマッピングする。本稿では,配置問題の解法として強化学習を動機づけることから始める。次に、深い強化学習とは何かの概要を示す。次に、配置問題を強化学習問題として定式化し、政策勾配最適化を用いてこの問題をいかに解決できるかを示す。最後に,様々な配置最適化問題に対する深層強化学習政策の訓練から学んだ教訓について述べる。 Placement Optimization is an important problem in systems and chip design, which consists of mapping the nodes of a graph onto a limited set of resources to optimize for an objective, subject to constraints. In this paper, we start by motivating reinforcement learning as a solution to the placement problem. We then give an overview of what deep reinforcement learning is. We next formulate the placement problem as a reinforcement learning problem and show how this problem can be solved with policy gradient optimization. Finally, we describe lessons we have learned from training deep reinforcement learning policies across a variety of placement optimization problems.	翻訳日:2022-12-22 10:06:19 公開日:2020-03-18
# ロータテ・アンド・レンダー:シングルビュー画像からの教師なしフォトリアリスティック顔回転 Rotate-and-Render: Unsupervised Photorealistic Face Rotation from Single-View Images ( http://arxiv.org/abs/2003.08124v1 ) ライセンス: Link先を確認	Hang Zhou, Jihao Liu, Ziwei Liu, Yu Liu, Xiaogang Wang	(参考訳) 顔の回転は近年急速に進歩しているが、高品質なペアリングトレーニングデータの欠如は、既存の手法にとって大きなハードルとなっている。現在の生成モデルは、同一人物のマルチビューイメージを持つデータセットに大きく依存している。したがって、生成された結果は、データソースのスケールとドメインによって制限される。これらの課題を克服するために、野生の単視点画像コレクションのみを用いて、写真リアルな回転面を合成できる新しい教師なしフレームワークを提案する。私たちの重要な洞察は、3D空間の顔を前後に回転させ、2D平面に再レンダリングすることで、強力な自己スーパービジョンになるということです。我々は3次元顔モデリングと高分解能GANの最近の進歩を活用して構築ブロックを構成する。顔の3次元回転・回転は細部を損なうことなく任意の角度に適用できるため,既存の手法が不足している実地シナリオ(ペアデータがない場合など)に極めて適している。広範な実験により,提案手法は合成品質が優れ,かつ最先端の手法に対するアイデンティティの保存が幅広いポーズやドメインにまたがることを示した。さらに,我々のローテーション・アンド・レンダー・フレームワークが,強力なベースラインモデルであっても,現代の顔認識システムを強化する効果的なデータ拡張エンジンとして機能することを検証する。 Though face rotation has achieved rapid progress in recent years, the lack of high-quality paired training data remains a great hurdle for existing methods. The current generative models heavily rely on datasets with multi-view images of the same person. Thus, their generated results are restricted by the scale and domain of the data source. To overcome these challenges, we propose a novel unsupervised framework that can synthesize photo-realistic rotated faces using only single-view image collections in the wild. Our key insight is that rotating faces in the 3D space back and forth, and re-rendering them to the 2D plane can serve as a strong self-supervision. We leverage the recent advances in 3D face modeling and high-resolution GAN to constitute our building blocks. Since the 3D rotation-and-render on faces can be applied to arbitrary angles without losing details, our approach is extremely suitable for in-the-wild scenarios (i.e. no paired data are available), where existing methods fall short. 
Extensive experiments demonstrate that our approach has superior synthesis quality as well as identity preservation over the state-of-the-art methods, across a wide range of poses and domains. Furthermore, we validate that our rotate-and-render framework naturally can act as an effective data augmentation engine for boosting modern face recognition systems even on strong baseline models.	翻訳日:2022-12-22 10:00:41 公開日:2020-03-18
# 時空間特徴系列に基づくドライバ疲労認識アルゴリズム A Driver Fatigue Recognition Algorithm Based on Spatio-Temporal Feature Sequence ( http://arxiv.org/abs/2003.08134v1 ) ライセンス: Link先を確認	Chen Zhang, Xiaobo Lu, Zhiliang Huang	(参考訳) 研究によれば、疲労運転は道路交通事故の重要な原因の一つであり、道路交通の安全性を高めるために運転者の疲労認識アルゴリズムを研究することには大きな意義がある。近年、ディープラーニングの発展に伴い、パターン認識の分野は大きな発展を遂げている。本稿では, 時空間特徴系列に基づくリアルタイム疲労状態認識アルゴリズムを設計する。このアルゴリズムは主に疲労運転認識の場面に適用できる。このアルゴリズムは,顔検出ネットワーク,顔のランドマーク検出・頭部姿勢推定ネットワーク,疲労認識ネットワークの3つのタスクネットワークに分けられる。実験により,このアルゴリズムは小規模,高速,高精度という利点を有することが示された。 Research shows that fatigued driving is one of the important causes of road traffic accidents, so studying driver fatigue recognition algorithms is of great significance for improving road traffic safety. In recent years, with the development of deep learning, the field of pattern recognition has advanced considerably. This paper designs a real-time fatigue state recognition algorithm based on spatio-temporal feature sequences, which is mainly intended for fatigue driving recognition. The algorithm is divided into three task networks: a face detection network, a facial landmark detection and head pose estimation network, and a fatigue recognition network. Experiments show that the algorithm has the advantages of small size, high speed, and high accuracy.	翻訳日:2022-12-22 10:00:17 公開日:2020-03-18
# CAFENet: クラス非依存の少数ショットエッジ検出ネットワーク CAFENet: Class-Agnostic Few-Shot Edge Detection Network ( http://arxiv.org/abs/2003.08235v1 ) ライセンス: Link先を確認	Young-Hyun Park, Jun Seo, Jaekyun Moon	(参考訳) 少数のラベル付きサンプルのみを用いて、新しいカテゴリの鮮明な境界をローカライズすることを目的とした、少数ショットセマンティックエッジ検出と呼ぶ新しい少数ショット学習課題に取り組む。また,メタ学習戦略に基づくクラス非依存Few-shot Edge Detection Network (CAFENet)を提案する。 CAFENetは、エッジラベルのセマンティック情報の欠如を補うために、小規模のセマンティックセグメンテーションモジュールを採用している。予測されたセグメンテーションマスクは、対象物体領域をハイライトするアテンションマップの生成に用いられ、デコーダモジュールをその領域に集中させる。また,マルチスプリットマッチングに基づく新たな正則化手法を提案する。メタトレーニングでは、高次元ベクトルを持つ計量学習問題は、低次元部分ベクトルを持つ小さな部分問題に分割される。少数ショットセマンティックエッジ検出のための既存データセットは存在しないため、FSE-1000とSBD-$5^i$という2つの新しいデータセットを構築し,提案したCAFENetの性能評価を行った。広範なシミュレーション結果から、CAFENetで採用した手法の性能上の利点が確認された。 We tackle a novel few-shot learning challenge, which we call few-shot semantic edge detection, aiming to localize crisp boundaries of novel categories using only a few labeled samples. We also present a Class-Agnostic Few-shot Edge detection Network (CAFENet) based on a meta-learning strategy. CAFENet employs a small-scale semantic segmentation module to compensate for the lack of semantic information in edge labels. The predicted segmentation mask is used to generate an attention map that highlights the target object region and makes the decoder module concentrate on that region. We also propose a new regularization method based on multi-split matching. In meta-training, the metric-learning problem with high-dimensional vectors is divided into small subproblems with low-dimensional sub-vectors. Since there is no existing dataset for few-shot semantic edge detection, we construct two new datasets, FSE-1000 and SBD-$5^i$, and evaluate the performance of the proposed CAFENet on them. Extensive simulation results confirm the performance merits of the techniques adopted in CAFENet.	翻訳日:2022-12-22 10:00:07 公開日:2020-03-18
# 入院患者の栄養素摂取量評価のための人工知能システム An Artificial Intelligence-Based System to Assess Nutrient Intake for Hospitalised Patients ( http://arxiv.org/abs/2003.08273v1 ) ライセンス: Link先を確認	Ya Lu, Thomai Stathopoulou, Maria F. Vasiloglou, Stergios Christodoulidis, Zeno Stanga, Stavroula Mougiakakou	(参考訳) 入院患者の栄養摂取の定期的なモニタリングは、疾患関連栄養失調のリスクを低減する上で重要な役割を果たす。栄養素摂取量を推定するいくつかの手法が開発されているが、データ精度を改善し、参加者の負担と健康コストを軽減できるため、より信頼性が高く完全に自動化された技術が要求されている。本稿では,食事摂取前後のRGB深度(RGB-D)画像ペアを簡便に処理することで,栄養摂取量を正確に推定する人工知能(AI)に基づく新しいシステムを提案する。このシステムは、食品セグメンテーションのための新しいマルチタスクコンテキストネットワークと、食品認識のための限られたトレーニングサンプルで構築された数ショット学習ベースの分類器と、3d表面構築のためのアルゴリズムを含んでいる。これにより、食品のシーケンシャルセグメンテーション、認識、消費食品量の推定が可能になり、各食事の栄養素摂取量を完全に自動で推定することができる。システムの開発と評価のために,322食の画像と栄養素のレシピを含む専用データベースを,革新的な戦略を用いてデータアノテーションと組み合わせて構築した。実験の結果, 推定栄養素摂取量は, 地上の真実と高い相関関係を示し, 平均相対誤差(20%)が非常に小さく, 既存の栄養素摂取評価技術よりも優れていた。 Regular monitoring of nutrient intake in hospitalised patients plays a critical role in reducing the risk of disease-related malnutrition. Although several methods to estimate nutrient intake have been developed, there is still a clear demand for a more reliable and fully automated technique, as this could improve data accuracy and reduce both the burden on participants and health costs. In this paper, we propose a novel system based on artificial intelligence (AI) to accurately estimate nutrient intake, by simply processing RGB Depth (RGB-D) image pairs captured before and after meal consumption. The system includes a novel multi-task contextual network for food segmentation, a few-shot learning-based classifier built by limited training samples for food recognition, and an algorithm for 3D surface construction. This allows sequential food segmentation, recognition, and estimation of the consumed food volume, permitting fully automatic estimation of the nutrient intake for each meal. 
For the development and evaluation of the system, a dedicated new database containing images and nutrient recipes of 322 meals is assembled, coupled to data annotation using innovative strategies. Experimental results demonstrate that the estimated nutrient intake is highly correlated (> 0.91) to the ground truth and shows very small mean relative errors (< 20%), outperforming existing techniques proposed for nutrient intake assessment.	翻訳日:2022-12-22 09:59:49 公開日:2020-03-18
# MINT:相互情報に基づくニューロントリミングによるディープネットワーク圧縮 MINT: Deep Network Compression via Mutual Information-based Neuron Trimming ( http://arxiv.org/abs/2003.08472v1 ) ライセンス: Link先を確認	Madan Ravi Ganesh, Jason J. Corso, Salimeh Yasaei Sekeh	(参考訳) プルーニングによるディープニューラルネットワーク圧縮へのほとんどのアプローチは、その重みを使ってフィルタの重要性を評価するか、あるいはスパシティ制約のある代替目的関数を最適化する。これらの手法は、類似のフィルタからの貢献を近似する有用な方法を提供するが、しばしば層間の依存性を無視したり、標準的なクロスエントロピーよりもより難しい最適化目標を解決したりする。我々の手法であるMINT(Multual Information-based Neuron Trimming)は,各層にまたがる隣接層間のフィルタの強度に基づいて,パーキングによる深部圧縮にアプローチする。この関係は、グラフベースの基準を用いてフィルタ間で交換される類似情報量を評価する条件付き幾何相互情報を用いて算出される。ネットワークをプルーニングする場合、保持されたフィルタが、高い性能を保証する後続層への情報の大部分に寄与することを保証する。提案手法は,MNIST, CIFAR-10, ILSVRC2012など,様々なネットワークアーキテクチャの標準ベンチマークにおいて,既存の最先端圧縮処理手法よりも優れている。さらに,本手法の逆攻撃に対する応答と,元のネットワークと比較した場合の校正統計との共通分母の観測について検討した。 Most approaches to deep neural network compression via pruning either evaluate a filter's importance using its weights or optimize an alternative objective function with sparsity constraints. While these methods offer a useful way to approximate contributions from similar filters, they often either ignore the dependency between layers or solve a more difficult optimization objective than standard cross-entropy. Our method, Mutual Information-based Neuron Trimming (MINT), approaches deep compression via pruning by enforcing sparsity based on the strength of the relationship between filters of adjacent layers, across every pair of layers. The relationship is calculated using conditional geometric mutual information which evaluates the amount of similar information exchanged between the filters using a graph-based criterion. When pruning a network, we ensure that retained filters contribute the majority of the information towards succeeding layers which ensures high performance. Our novel approach outperforms existing state-of-the-art compression-via-pruning methods on the standard benchmarks for this task: MNIST, CIFAR-10, and ILSVRC2012, across a variety of network architectures. 
In addition, we discuss our observations of a common denominator between our pruning methodology's response to adversarial attacks and calibration statistics when compared to the original network.	翻訳日:2022-12-22 09:57:47 公開日:2020-03-18
# ヘッドマウントディスプレイ用視線センシングLED Gaze-Sensing LEDs for Head Mounted Displays ( http://arxiv.org/abs/2003.08499v1 ) ライセンス: Link先を確認	Kaan Akşit, Jan Kautz, David Luebke	(参考訳) ヘッドマウントディスプレイ(HMD)のための新しい視線トラッカーを導入する。発光ダイオード(LED)を用いて、2つの既製のHMDを視線対応に修正する。私たちの重要な貢献は、LEDのセンシング機能を利用して、仮想現実(VR)アプリケーションのための低消費電力の視線トラッカーを作成することです。これにより、最小限のハードウェアを使用して、モバイルデバイス上で動作する軽量な教師付きガウス過程回帰(GPR)により、良好な精度と低レイテンシを実現するシンプルなアプローチが得られる。我々のハードウェアでは,ミンコフスキー距離測度に基づくGPR実装は,自由パラメータを正確に決定することなく,一般的に使用される放射基底関数に基づくサポートベクター回帰(SVR)よりも優れていることを示す。本手法では,複雑な次元削減,特徴抽出,オフ軸光路に起因する歪み補正を必要としないことを示す。視線追跡を用いたサンプルアプリケーションを備えた2つの完全なHMDプロトタイプを実演し,プロトタイプを用いた一連の主観的テストについて報告する。 We introduce a new gaze tracker for Head Mounted Displays (HMDs). We modify two off-the-shelf HMDs to be gaze-aware using Light Emitting Diodes (LEDs). Our key contribution is to exploit the sensing capability of LEDs to create a low-power gaze tracker for virtual reality (VR) applications. This yields a simple approach using minimal hardware to achieve good accuracy and low latency using light-weight supervised Gaussian Process Regression (GPR) running on a mobile device. With our hardware, we show that a GPR implementation based on the Minkowski distance measure outperforms the commonly used radial basis function-based support vector regression (SVR) without the need to precisely determine free parameters. We show that our gaze estimation method does not require complex dimension reduction techniques, feature extraction, or distortion corrections due to off-axis optical paths. We demonstrate two complete HMD prototypes with a sample eye-tracked application, and report on a series of subjective tests using our prototypes.	翻訳日:2022-12-22 09:57:23 公開日:2020-03-18
# 安定なニューラルフロー Stable Neural Flows ( http://arxiv.org/abs/2003.08063v1 ) ライセンス: Link先を確認	Stefano Massaroli, Michael Poli, Michelangelo Bin, Jinkyoo Park, Atsushi Yamashita, Hajime Asama	(参考訳) ニューラルネットワークによってパラメータ化されたエネルギー汎関数上で軌道が発展する、ニューラル常微分方程式(ニューラルODE)の証明可能に安定な変種を導入する。安定なニューラルフローは、深さ方向の流れの漸近安定性を暗黙的に保証し、入力摂動に対する頑健性と数値解法の計算負荷の低減につながる。学習手順は最適制御問題として定式化され、随伴感度解析に基づいて近似解が提案される。さらに最適化プロセスの容易化と収束の高速化を目的とした新しい正則化器を導入する。提案するモデルクラスは非線形分類と関数近似タスクで評価される。 We introduce a provably stable variant of neural ordinary differential equations (neural ODEs) whose trajectories evolve on an energy functional parametrised by a neural network. Stable neural flows provide an implicit guarantee on asymptotic stability of the depth-flows, leading to robustness against input perturbations and low computational burden for the numerical solver. The learning procedure is cast as an optimal control problem, and an approximate solution is proposed based on adjoint sensitivity analysis. We further introduce novel regularizers designed to ease the optimization process and speed up convergence. The proposed model class is evaluated on non-linear classification and function approximation tasks.	翻訳日:2022-12-22 09:51:08 公開日:2020-03-18
# 近接勾配法による非凸非微分可能ミニマックスゲームの解法 Solving Non-Convex Non-Differentiable Min-Max Games using Proximal Gradient Method ( http://arxiv.org/abs/2003.08093v1 ) ライセンス: Link先を確認	Babak Barazandeh and Meisam Razaviyayn	(参考訳) min-max鞍点ゲームは、機械学習や信号処理における幅広い応用に現れる。適用性は広いが、理論的な研究は主に特別な凸凹構造に限られる。最近の研究では、これらの結果を特別な滑らかな非凸ケースに一般化したものの、非滑らかなシナリオに対する理解はまだ限られている。本研究では,目的関数がプレイヤーの決定変数の1つに対して(強く)凸である場合の,非滑らかなmin-maxゲームの特殊な形式について検討する。単純な多段階の近接勾配降下上昇(descent-ascent)アルゴリズムが、min-maxゲームの$\epsilon$-1次ナッシュ平衡に収束し、その勾配評価回数が$1/\epsilon$の多項式となることを示す。また、我々の定常性の概念が文献上の既存のものよりも強いことも示す。最後に,LASSO推定器への敵対的攻撃を通じて提案アルゴリズムの性能評価を行った。 Min-max saddle point games appear in a wide range of applications in machine learning and signal processing. Despite their wide applicability, theoretical studies are mostly limited to the special convex-concave structure. While some recent works generalized these results to special smooth non-convex cases, our understanding of non-smooth scenarios is still limited. In this work, we study a special form of non-smooth min-max games in which the objective function is (strongly) convex with respect to one of the players' decision variables. We show that a simple multi-step proximal gradient descent-ascent algorithm converges to an $\epsilon$-first-order Nash equilibrium of the min-max game with the number of gradient evaluations being polynomial in $1/\epsilon$. We will also show that our notion of stationarity is stronger than existing ones in the literature. Finally, we evaluate the performance of the proposed algorithm through an adversarial attack on a LASSO estimator.	翻訳日:2022-12-22 09:50:48 公開日:2020-03-18
# 点雲の動的還元ネットワーク A Dynamic Reduction Network for Point Clouds ( http://arxiv.org/abs/2003.08013v1 ) ライセンス: Link先を確認	Lindsey Gray (1), Thomas Klijnsma (1), Shamik Ghosh (2) ((1) Fermi National Accelerator Laboratory, (2) Saha Institute of Nuclear Physics)	(参考訳) 画像全体を分類することは機械学習の古典的な問題であり、グラフニューラルネットワークは非常に不規則な幾何学を学ぶための強力な手法である。全体分類を決定する場合、点雲の一部が他の部分よりも重要である場合がしばしばある。グラフ構造では、これは畳み込みフィルタの最後に情報をプールすることから始まり、静的グラフ上の様々なステージ付きプーリング技術へと進化した。本稿では,所定のグラフ構造の必要性を排除したプーリングの動的グラフ定式化を導入する。中間クラスタリングを通じてデータ間の最も重要な関係を動的に学習することで、これを実現する。ネットワークアーキテクチャは、表現サイズと効率性を考慮した興味深い結果をもたらす。また、高エネルギー粒子物理学における画像分類からエネルギー回帰まで、多くのタスクに容易に適応できる。 Classifying whole images is a classic problem in machine learning, and graph neural networks are a powerful methodology to learn highly irregular geometries. It is often the case that certain parts of a point cloud are more important than others when determining overall classification. On graph structures this started by pooling information at the end of convolutional filters, and has evolved to a variety of staged pooling techniques on static graphs. In this paper, a dynamic graph formulation of pooling is introduced that removes the need for predetermined graph structure. It achieves this by dynamically learning the most important relationships between data via an intermediate clustering. The network architecture yields interesting results considering representation size and efficiency. It also adapts easily to a large number of tasks from image classification to energy regression in high energy particle physics.	翻訳日:2022-12-22 09:49:35 公開日:2020-03-18
# カプセルネットワークを用いたジェネレータアーキテクチャのためのカプセルGAN Capsule GAN Using Capsule Network for Generator Architecture ( http://arxiv.org/abs/2003.08047v1 ) ライセンス: Link先を確認	Kanako Marusaki and Hiroshi Watanabe	(参考訳) 本稿では,キャプリケータだけでなく,ジェネレータ内でもCapsule Networkを用いた生成逆ネットワークであるCapsule GANを提案する。近年,GAN(Generative Adversarial Network)の研究が盛んに行われている。しかし,GANによる画像生成は困難である。したがって、GANは時に画質の悪い画像を生成する。これらのGANは畳み込みニューラルネットワーク(CNN)を使用する。しかし、cnnには画像の特徴間の関係情報が失われる可能性があるという欠陥がある。 2017年に hinton が提案した capsule network は cnn の欠陥を克服している。 Capsule GANは以前、差別装置でCapsule Networkを使用していると報告している。しかし、Capsule Networkを使う代わりに、Capsule GANは以前の研究でDCGANのようなジェネレータアーキテクチャでCNNを使用していると報告している。本稿では,ジェネレータにCapsule Networkを使用する2つのアプローチを紹介する。 1つは、ジェネレータへの入力として識別器からdigitcaps層を使用することである。 DigitCaps層はCapsule Networkの出力層である。判別器の入力画像の特徴を有する。もう1つは、ジェネレータ内のカプセルネットワークにおける認識プロセスの逆操作を使用することである。本稿では,この論文で提案したCapsule GANと,CNNとCapsule GANを用いた従来のGANを比較した。データセットはMNIST、Fashion-MNIST、カラー画像である。 Capsule GAN は CNN と Capsule Network で GAN より優れていることを示す。本稿では, Capsule GAN のアーキテクチャを Capsule Network を用いた基本アーキテクチャとして提案する。したがって,既存のGANの改良手法をCapsule GANに適用することができる。 This paper presents Capsule GAN, a Generative adversarial network using Capsule Network not only in the discriminator but also in the generator. Recently, Generative adversarial networks (GANs) has been intensively studied. However, generating images by GANs is difficult. Therefore, GANs sometimes generate poor quality images. These GANs use convolutional neural networks (CNNs). However, CNNs have the defect that the relational information between features of the image may be lost. Capsule Network, proposed by Hinton in 2017, overcomes the defect of CNNs. Capsule GAN reported previously uses Capsule Network in the discriminator. However, instead of using Capsule Network, Capsule GAN reported in previous studies uses CNNs in generator architecture like DCGAN. This paper introduces two approaches to use Capsule Network in the generator. 
One is to use DigitCaps layer from the discriminator as the input to the generator. DigitCaps layer is the output layer of Capsule Network. It has the features of the input images of the discriminator. The other is to use the reverse operation of recognition process in Capsule Network in the generator. We compare Capsule GAN proposed in this paper with conventional GAN using CNN and Capsule GAN which uses Capsule Network in the discriminator only. The datasets are MNIST, Fashion-MNIST and color images. We show that Capsule GAN outperforms the GAN using CNN and the GAN using Capsule Network in the discriminator only. The architecture of Capsule GAN proposed in this paper is a basic architecture using Capsule Network. Therefore, we can apply the existing improvement techniques for GANs to Capsule GAN.	翻訳日:2022-12-22 09:48:38 公開日:2020-03-18
# ブートストラップバイアス補正クロス検証のスーパーラーニングへの応用 Bootstrap Bias Corrected Cross Validation applied to Super Learning ( http://arxiv.org/abs/2003.08342v1 ) ライセンス: Link先を確認	Krzysztof Mnich and Agnieszka Kitlas Goli\'nska and Aneta Polewko-Klim and Witold R. Rudnicki	(参考訳) 超学習者アルゴリズムは、複数のベース学習者の結果を組み合わせて予測の質を向上させることができる。超学習者の結果を検証するデフォルトの方法は、ネストされたクロスバリデーションである。 Tsamardinosらは、ネストしたクロスバリデーションを学習アルゴリズムのハイパーパラメータをチューニングするための再サンプリングに置き換えることを提案した。このアイデアをsuper learnerの検証に適用し,nested cross validationを含む他の検証手法と比較する。様々なサイズの人工データセットと7つの実際の生物医学データセットでテストが行われた。 Bootstrap Bias Correctionと呼ばれる再サンプリング手法は、ネストされたクロスバリデーションに対して、合理的に正確でコスト効率のよい代替手段であることが判明した。 Super learner algorithm can be applied to combine results of multiple base learners to improve quality of predictions. The default method for verification of super learner results is by nested cross validation. It has been proposed by Tsamardinos et al., that nested cross validation can be replaced by resampling for tuning hyper-parameters of the learning algorithms. We apply this idea to verification of super learner and compare with other verification methods, including nested cross validation. Tests were performed on artificial data sets of diverse size and on seven real, biomedical data sets. The resampling method, called Bootstrap Bias Correction, proved to be a reasonably precise and very cost-efficient alternative for nested cross validation.	翻訳日:2022-12-22 09:42:05 公開日:2020-03-18
# survlime: 機械学習生存モデルを説明する方法 SurvLIME: A method for explaining machine learning survival models ( http://arxiv.org/abs/2003.08371v1 ) ライセンス: Link先を確認	Maxim S. Kovalev, Lev V. Utkin, Ernest M. Kasimov	(参考訳) 機械学習生存モデルを説明するためにsurvlimeと呼ばれる新しい手法を提案する。これはよく知られたメソッド LIME の拡張や修正と見なすことができる。提案手法の背景にある主な考え方は,Cox比例ハザードモデルを用いて,試験例の周辺地域における生存率モデルを近似することである。コックスモデルは、例の共変数の線形結合を、共変数の係数が予測に定量的に影響を及ぼすとみなすことができると考えるために用いられる。もう1つのアイデアは、説明されたモデルとcoxモデルの累積ハザード関数を、関心点周辺の局所領域における摂動点の集合を用いて近似することである。この方法は制約のない凸最適化問題に還元される。多くの数値実験がサーヴライム効率を示している。 A new method called SurvLIME for explaining machine learning survival models is proposed. It can be viewed as an extension or modification of the well-known method LIME. The main idea behind the proposed method is to apply the Cox proportional hazards model to approximate the survival model at the local area around a test example. The Cox model is used because it considers a linear combination of the example covariates such that coefficients of the covariates can be regarded as quantitative impacts on the prediction. Another idea is to approximate cumulative hazard functions of the explained model and the Cox model by using a set of perturbed points in a local area around the point of interest. The method is reduced to solving an unconstrained convex optimization problem. A lot of numerical experiments demonstrate the SurvLIME efficiency.	翻訳日:2022-12-22 09:41:53 公開日:2020-03-18
# NeCPD: 最適確率的勾配降下を伴うオンラインテンソル分解 NeCPD: An Online Tensor Decomposition with Optimal Stochastic Gradient Descent ( http://arxiv.org/abs/2003.08844v1 ) ライセンス: Link先を確認	Ali Anaissi, Basem Suleiman, Seid Miad Zandavi	(参考訳) マルチウェイデータ分析は、テンソル $\mathcal{X} \in \mathbb{R}^{I_1 \times \dots \times I_N}$ に格納された高次データセットの基盤構造をキャプチャするための重要なツールとなっている。 CANDECOMP/PARAFAC (CP)分解は広く研究され、$N$個の負荷行列 $A^{(1)}, \dots, A^{(N)}$ によって $\mathcal{X}$ を近似するために適用されてきた。ここで$N$はテンソルの階数(order)を表す。確率的勾配降下(SGD)アルゴリズムに基づく、マルチウェイオンラインデータにおける非凸問題に対するNeCPDという新しい効率的なCP分解解法を提案する。 SGDは1ステップで$\mathcal{X}^{(t+1)}$を更新できるので、オンライン設定では非常に有用である。大域収束に関しては、SGDが非凸問題を扱う際に多くの鞍点に留まることが知られている。ヘシアン行列を解析してこれらの鞍点を同定し,勾配更新ステップにわずかなノイズを付加する摂動法を用いてそれらから逃れようとする。さらに,Nesterov の Accelerated Gradient (NAG) 法をSGD アルゴリズムに適用し,収束速度を最適に高速化し,エポック毎のヘシアン計算遅延時間を補償する。実験室ベースおよび実生活構造データセットを用いた構造健康モニタリングの分野での実験的な評価により,既存のオンラインテンソル解析法と比較して,より正確な結果が得られた。 Multi-way data analysis has become an essential tool for capturing underlying structures in higher-order datasets stored in a tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times \dots \times I_N}$. CANDECOMP/PARAFAC (CP) decomposition has been extensively studied and applied to approximate $\mathcal{X}$ by $N$ loading matrices $A^{(1)}, \dots, A^{(N)}$, where $N$ represents the order of the tensor. We propose a new efficient CP decomposition solver named NeCPD for non-convex problems in multi-way online data based on the stochastic gradient descent (SGD) algorithm. SGD is very useful in the online setting since it allows us to update $\mathcal{X}^{(t+1)}$ in one single step. In terms of global convergence, it is well known that SGD gets stuck in many saddle points when it deals with non-convex problems. We study the Hessian matrix to identify these saddle points, and then try to escape them using the perturbation approach, which adds a little noise to the gradient update step. 
We further apply Nesterov's Accelerated Gradient (NAG) method to the SGD algorithm to optimally accelerate the convergence rate and compensate for the Hessian computation delay per epoch. Experimental evaluation in the field of structural health monitoring using laboratory-based and real-life structural datasets shows that our method provides more accurate results compared with existing online tensor analysis methods.	翻訳日:2022-12-22 09:41:00 公開日:2020-03-18
# opengan: オープンセット生成型広告ネットワーク OpenGAN: Open Set Generative Adversarial Networks ( http://arxiv.org/abs/2003.08074v1 ) ライセンス: Link先を確認	Luke Ditria, Benjamin J. Meyer, Tom Drummond	(参考訳) 既存の条件付きジェネレータネットワーク(cGAN)の多くは、事前に定義されたクラスレベルのセマンティックラベルや属性の条件付けに限られている。計量空間から特徴埋め込みした入力サンプル毎に条件付けされたオープン集合 gan アーキテクチャ (opengan) を提案する。クラスレベルときめ細かなセマンティック情報をエンコードする最先端のメトリック学習モデルを用いて、与えられたソース画像にセマンティックに類似したサンプルを生成することができる。計量学習モデルによって抽出された意味情報は、分布外の新しいクラスに転送され、生成モデルがトレーニング分布外のサンプルを生成する。提案手法は,学習クラスに類似した視覚的品質を持つ新しいクラスから256$\times$256の解像度画像を生成することができることを示す。ソース画像の代わりに、距離空間のランダムなサンプリングも高品質なサンプルをもたらすことを示す。特徴空間と潜在空間の補間は画像空間における意味的かつ視覚的に可算な変換をもたらすことを示す。最後に、データ拡張の下流タスクに対する生成されたサンプルの有用性を示す。 GAN トレーニング分布外のクラスにおいて,OpenGAN サンプルを用いて学習データを増強することにより,分類器の性能を大幅に向上できることを示す。 Many existing conditional Generative Adversarial Networks (cGANs) are limited to conditioning on pre-defined and fixed class-level semantic labels or attributes. We propose an open set GAN architecture (OpenGAN) that is conditioned per-input sample with a feature embedding drawn from a metric space. Using a state-of-the-art metric learning model that encodes both class-level and fine-grained semantic information, we are able to generate samples that are semantically similar to a given source image. The semantic information extracted by the metric learning model transfers to out-of-distribution novel classes, allowing the generative model to produce samples that are outside of the training distribution. We show that our proposed method is able to generate 256$\times$256 resolution images from novel classes that are of similar visual quality to those from the training classes. In lieu of a source image, we demonstrate that random sampling of the metric space also results in high-quality samples. We show that interpolation in the feature space and latent space results in semantically and visually plausible transformations in the image space. 
Finally, the usefulness of the generated samples to the downstream task of data augmentation is demonstrated. We show that classifier performance can be significantly improved by augmenting the training data with OpenGAN samples on classes that are outside of the GAN training distribution.	翻訳日:2022-12-22 09:40:29 公開日:2020-03-18
# ContainerStress:ビッグデータMLユースケースのためのクラウドノード自動スコープフレームワーク ContainerStress: Autonomous Cloud-Node Scoping Framework for Big-Data ML Use Cases ( http://arxiv.org/abs/2003.08011v1 ) ライセンス: Link先を確認	Guang Chao Wang, Kenny Gross, and Akshay Subramaniam	(参考訳) クラウド環境にビッグデータ機械学習(ML)サービスをデプロイすることは、クラウドベンダにとって、任意の顧客ユースケースのサイズを拡大するクラウドコンテナの構成に関する課題となる。 OracleLabsは、ネストループのMonte Carloシミュレーションを使用して、クラウドCPU-GPU"Shapes"(エンドユーザが利用可能なクラウドコンテナ内のCPUやGPUの設定)の範囲で、任意のサイズの顧客MLユースケースを自律的にスケールする自動フレームワークを開発した。さらに、OracleLabsとNVIDIAの著者は、MLの予測アルゴリズムの計算コストとGPUアクセラレーションを分析し、従来のCPUとNVIDIA GPUで構成されるクラウドコンテナの計算コストの削減を評価するMLベンチマーク研究に協力している。 Deploying big-data Machine Learning (ML) services in a cloud environment presents a challenge to the cloud vendor with respect to the cloud container configuration sizing for any given customer use case. OracleLabs has developed an automated framework that uses nested-loop Monte Carlo simulation to autonomously scale any size customer ML use cases across the range of cloud CPU-GPU "Shapes" (configurations of CPUs and/or GPUs in Cloud containers available to end customers). Moreover, the OracleLabs and NVIDIA authors have collaborated on a ML benchmark study which analyzes the compute cost and GPU acceleration of any ML prognostic algorithm and assesses the reduction of compute cost in a cloud container comprising conventional CPUs and NVIDIA GPUs.	翻訳日:2022-12-22 09:39:15 公開日:2020-03-18
# 活性化関数とXavierおよびHe正規初期化との関連性に関する調査 A Survey on Activation Functions and their relation with Xavier and He Normal Initialization ( http://arxiv.org/abs/2004.06632v1 ) ライセンス: Link先を確認	Leonid Datta	(参考訳) 人工ニューラルネットワークでは、活性化関数と重み初期化法は、ニューラルネットワークのトレーニングとパフォーマンスにおいて重要な役割を果たす。そこで問題となるのは、良好に機能する活性化関数であるために、関数のどのような性質が重要/必要であるかである。また、最も広く使われている重み初期化法(XavierとHe normal initialization)は、活性化関数と基本的な関係がある。本サーベイでは、活性化関数の重要/必要な性質と、最も広く利用されている活性化関数(sigmoid, tanh, ReLU, LReLU, PReLU)について述べる。また,これらの活性化関数と2つの重み初期化法 (Xavier と He normal initialization) との関係についても検討した。 In an artificial neural network, the activation function and the weight initialization method play important roles in the training and performance of the network. The question that arises is what properties of a function are important/necessary for it to be a well-performing activation function. Also, the most widely used weight initialization methods, Xavier and He normal initialization, have a fundamental connection with the activation function. This survey discusses the important/necessary properties of activation functions and the most widely used activation functions (sigmoid, tanh, ReLU, LReLU and PReLU). This survey also explores the relationship between these activation functions and the two weight initialization methods, Xavier and He normal initialization.	翻訳日:2022-12-22 09:32:37 公開日:2020-03-18
# 知識グラフ補完手法の現実的再評価:実験的検討 Realistic Re-evaluation of Knowledge Graph Completion Methods: An Experimental Study ( http://arxiv.org/abs/2003.08001v1 ) ライセンス: Link先を確認	Farahnaz Akrami (1), Mohammed Samiul Saeef (1), Qingheng Zhang (2), Wei Hu (2), Chengkai Li (1) ((1) Department of Computer Science and Engineering, University of Texas at Arlington, (2) State Key Laboratory for Novel Software Technology, Nanjing University)	(参考訳) 知識グラフの補完、特にリンク予測のタスクに埋め込みモデルを用いる活発な研究分野において、ほとんどの先行研究は2つのベンチマークデータセット fb15k と wn18 を用いた。これらの研究における多くの3つのデータセットは、意味的重複、相関、データ不完全性による高いデータ冗長性を示す逆関係と重複関係に属する。これは過剰なデータ漏洩のケースです – モデルが実際の予測に適用される必要がある場合に利用できない機能を使用して、モデルをトレーニングします。また、適用対象と対象のデカルト積によって形成される全ての三重項が真の事実であるデカルト積関係もある。上記の関係に関するリンク予測は簡単であり、洗練された埋め込みモデルではなく、単純な規則を用いてより正確な精度で実現できる。これらのモデルのより根本的な欠点は、リンク予測のシナリオが実世界では存在しないことである。本論文は,非現実的三重項除去時の埋め込みモデルの真の有効性を評価することを目的とした,最初の体系的研究である。実験の結果、これらのモデルは以前よりもはるかに精度が低いことがわかった。それらの精度の低さは、リンク予測を真に効果的な自動化ソリューションなしでタスクにします。したがって、有効なアプローチの再検討が必要である。 In the active research area of employing embedding models for knowledge graph completion, particularly for the task of link prediction, most prior studies used two benchmark datasets FB15k and WN18 in evaluating such models. Most triples in these and other datasets in such studies belong to reverse and duplicate relations which exhibit high data redundancy due to semantic duplication, correlation or data incompleteness. This is a case of excessive data leakage---a model is trained using features that otherwise would not be available when the model needs to be applied for real prediction. There are also Cartesian product relations for which every triple formed by the Cartesian product of applicable subjects and objects is a true fact. Link prediction on the aforementioned relations is easy and can be achieved with even better accuracy using straightforward rules instead of sophisticated embedding models. 
A more fundamental defect of these models is that the link prediction scenario, given such data, is non-existent in the real world. This paper is the first systematic study with the main objective of assessing the true effectiveness of embedding models when the unrealistic triples are removed. Our experimental results show that these models are much less accurate than previously perceived. Their poor accuracy renders link prediction a task without a truly effective automated solution. Hence, we call for a re-investigation of possible effective approaches.	翻訳日:2022-12-22 09:32:24 公開日:2020-03-18
# 構文グラフ畳み込みネットワークによる文書要約のための選択的注意エンコーダ Selective Attention Encoders by Syntactic Graph Convolutional Networks for Document Summarization ( http://arxiv.org/abs/2003.08004v1 ) ライセンス: Link先を確認	Haiyang Xu, Yun Wang, Kun Han, Baochang Ma, Junwen Chen, Xiangang Li	(参考訳) 抽象的なテキスト要約は難しい課題であり、ソーステキストから有意な情報を効果的に抽出し、要約を生成するメカニズムを設計する必要がある。ソーステキストの構文解析プロセスには、より正確な要約を生成するのに役立つ重要な構文構造や意味構造が含まれている。しかし、テキスト要約のための解析木をモデル化することは、その非線形構造のため自明ではなく、複数の文とその解析木を含む文書を扱うのはさらに困難である。本稿では,文書中の文から得られる解析木をグラフで接続し,積層グラフ畳み込みネットワーク(GCN)を用いて文書の構文表現を学習することを提案する。選択的注意機構は、意味的・構造的側面において有意な情報を抽出し、抽象的な要約を生成する。 CNN/Daily Mailテキスト要約データセットに対する我々のアプローチを評価する。実験結果は,提案手法がベースラインを上回り,データセット上での最先端性能を実現することを示す。 Abstractive text summarization is a challenging task, and one needs to design a mechanism to effectively extract salient information from the source text and then generate a summary. A parsing process of the source text contains critical syntactic or semantic structures, which are useful for generating more accurate summaries. However, modeling a parsing tree for text summarization is not trivial due to its non-linear structure, and it is harder to deal with a document that includes multiple sentences and their parsing trees. In this paper, we propose to use a graph to connect the parsing trees from the sentences in a document and utilize stacked graph convolutional networks (GCNs) to learn the syntactic representation for a document. The selective attention mechanism is used to extract salient information in semantic and structural aspects and generate an abstractive summary. We evaluate our approach on the CNN/Daily Mail text summarization dataset. The experimental results show that the proposed GCN-based selective attention approach outperforms the baselines and achieves state-of-the-art performance on the dataset.	翻訳日:2022-12-22 09:32:01 公開日:2020-03-18
# TTTTTackling WinoGrande Schemas TTTTTackling WinoGrande Schemas ( http://arxiv.org/abs/2003.08380v1 ) ライセンス: Link先を確認	Sheng-Chieh Lin, Jheng-Hong Yang, Rodrigo Nogueira, Ming-Feng Tsai, Chuan-Ju Wang, Jimmy Lin	(参考訳) 各例をそれぞれ仮説を含む2つの入力テキスト文字列に分解し,「entailment」トークンに割り当てられた確率を仮説のスコアとして用いることで,AI2 WinoGrande Challengeに取り組むためにT5 sequence-to-sequenceモデルを適用した。 2020年3月13日、公式のリーダーボードへの最初の(そして唯一の)提出は0.7673 AUCだった。これは現時点で知られている最良の結果であり、従来の最先端を5ポイント以上上回る。 We applied the T5 sequence-to-sequence model to tackle the AI2 WinoGrande Challenge by decomposing each example into two input text strings, each containing a hypothesis, and using the probabilities assigned to the "entailment" token as a score of the hypothesis. Our first (and only) submission to the official leaderboard yielded 0.7673 AUC on March 13, 2020, which is the best known result at this time and beats the previous state of the art by over five points.	翻訳日:2022-12-22 09:31:19 公開日:2020-03-18
# 高次イジングモデルにおける推論によるピアグループ効果によるロジスティック回帰 Logistic-Regression with peer-group effects via inference in higher order Ising models ( http://arxiv.org/abs/2003.08259v1 ) ライセンス: Link先を確認	Constantinos Daskalakis, Nishanth Dikkala and Ioannis Panageas	(参考訳) スピングラスモデル(例えば、シェリントン=キルクパトリック、ホップフィールド、イジングモデル)は全て指数関数的離散分布の族としてよく研究されており、ネットワーク上の相関現象のモデル化に使用される多くのアプリケーション領域に影響を与えている。従来、これらのモデルは2次統計量を持ち、その結果、対相互作用から生じる相関を捉える。本研究では,ピアグループ効果を持つソーシャルネットワーク上での行動のモデル化を行い,高次統計量モデルへのこれらの拡張について検討する。特に、ネットワーク上の二進結果を高次スピングラスとしてモデル化し、個人の振る舞いは、自身の共変量のベクトルの線型関数と、他の振る舞いの多項式関数に依存し、ピアグループ効果を捉えている。このようなモデルから高次元のサンプルである {\em single} を用いて、我々の目標は、線型関数の係数とピアグループ効果の強さを回復することである。この結果の核心は、モデルのlog pseudo-likelihoodの強い結合性を示す新しいアプローチであり、最大 pseudo-likelihood estimator (mple) に対する統計エラーレートは$\sqrt{d/n}$であり、ここでは$d$は共変ベクトルの次元であり、$n$はネットワークのサイズ(ノード数)である。我々のモデルは、最近の研究で研究されているベニラロジスティック回帰とピアエフェクトモデルを一般化し、これらの結果を高次相互作用に対応するように拡張する。 Spin glass models, such as the Sherrington-Kirkpatrick, Hopfield and Ising models, are all well-studied members of the exponential family of discrete distributions, and have been influential in a number of application domains where they are used to model correlation phenomena on networks. Conventionally these models have quadratic sufficient statistics and consequently capture correlations arising from pairwise interactions. In this work we study extensions of these to models with higher-order sufficient statistics, modeling behavior on a social network with peer-group effects. In particular, we model binary outcomes on a network as a higher-order spin glass, where the behavior of an individual depends on a linear function of their own vector of covariates and some polynomial function of the behavior of others, capturing peer-group effects. Using a {\em single}, high-dimensional sample from such model our goal is to recover the coefficients of the linear function as well as the strength of the peer-group effects. 
The heart of our result is a novel approach for showing strong concavity of the log pseudo-likelihood of the model, implying statistical error rate of $\sqrt{d/n}$ for the Maximum Pseudo-Likelihood Estimator (MPLE), where $d$ is the dimensionality of the covariate vectors and $n$ is the size of the network (number of nodes). Our model generalizes vanilla logistic regression as well as the peer-effect models studied in recent works, and our results extend these results to accommodate higher-order interactions.	翻訳日:2022-12-22 09:30:40 公開日:2020-03-18
# unfolding reweighted $\ell_1$-$\ell_1$ による解釈可能なディープリカレントニューラルネットワーク:アーキテクチャ設計と一般化解析 Interpretable Deep Recurrent Neural Networks via Unfolding Reweighted $\ell_1$-$\ell_1$ Minimization: Architecture Design and Generalization Analysis ( http://arxiv.org/abs/2003.08334v1 ) ライセンス: Link先を確認	Huynh Van Luong, Boris Joukovsky, Nikos Deligiannis	(参考訳) 例えば、学習された反復収縮しきい値アルゴリズム(LISTA)は、最適化方法の学習的なバリエーションとして、ディープニューラルネットワークを設計する。これらのネットワークは、元の最適化手法よりも高速な収束と高精度を実現することが示されている。本稿では,再重み付けされた$\ell_1$-$\ell_1$最小化アルゴリズムを展開することにより,新しいディープリカレントニューラルネットワーク (coined reweighted-rnn) を開発し,シーケンシャル信号再構成のタスクに適用する。私たちの知る限りでは、これは再重み付け最小化を探求する最初の深い展開方法です。下位の再重み付け最小化モデルにより、rnnは各層内の隠れたユニットごとに異なるソフトthresholding関数(alia、異なるアクティベーション関数)を持つ。さらに、オーバーパラメータ化重みによる既存の深部展開RNNモデルよりも高いネットワーク表現性を有する。重要なことは、Rademacher複雑性を用いて提案したreweighted-RNNモデルの理論的一般化誤差境界を確立することである。境界は、提案されたreweighted-RNNのパラメータ化が良い一般化を保証することを示している。本研究では,低次元計測による映像フレーム再構成問題,すなわち逐次フレーム再構成問題に対して,提案手法を適用した。移動MNISTデータセットの実験結果から,提案した深度再重み付きRNNは既存のRNNモデルよりも大幅に優れていた。 Deep unfolding methods---for example, the learned iterative shrinkage thresholding algorithm (LISTA)---design deep neural networks as learned variations of optimization methods. These networks have been shown to achieve faster convergence and higher accuracy than the original optimization methods. In this line of research, this paper develops a novel deep recurrent neural network (coined reweighted-RNN) by the unfolding of a reweighted $\ell_1$-$\ell_1$ minimization algorithm and applies it to the task of sequential signal reconstruction. To the best of our knowledge, this is the first deep unfolding method that explores reweighted minimization. Due to the underlying reweighted minimization model, our RNN has a different soft-thresholding function (alias, different activation functions) for each hidden unit in each layer. Furthermore, it has higher network expressivity than existing deep unfolding RNN models due to the over-parameterizing weights. 
Importantly, we establish theoretical generalization error bounds for the proposed reweighted-RNN model by means of Rademacher complexity. The bounds reveal that the parameterization of the proposed reweighted-RNN ensures good generalization. We apply the proposed reweighted-RNN to the problem of video frame reconstruction from low-dimensional measurements, that is, sequential frame reconstruction. The experimental results on the moving MNIST dataset demonstrate that the proposed deep reweighted-RNN significantly outperforms existing RNN models.	翻訳日:2022-12-22 09:23:54 公開日:2020-03-18
# コンピュータビジョンにおける自己監督型コンテキスト帯域 Self-Supervised Contextual Bandits in Computer Vision ( http://arxiv.org/abs/2003.08485v1 ) ライセンス: Link先を確認	Aniket Anand Deshmukh, Abhimanu Kumar, Levi Boyles, Denis Charles, Eren Manavoglu, Urun Dogan	(参考訳) コンテキストバンディットは、仮説テストから製品レコメンデーションまで、ドメイン内の機械学習実践者が直面する一般的な問題である。様々な成功の度合いでコンテキスト的バンディット問題にリッチなデータ表現を利用するには、多くのアプローチがあった。自己教師付き学習は、明示的なラベルなしでリッチなデータ表現を見つけるための有望なアプローチである。典型的な自己指導型学習スキームでは、第一のタスクは問題目標(クラスタリング、分類、埋め込み生成など)によって定義され、第二のタスクは自己監督目標(回転予測、近傍の言葉、着色など)によって定義される。通常のセルフスーパービジョンでは,2次タスクのトレーニングデータから暗黙のラベルを学習する。しかし、文脈的バンディット設定では、学習の初期段階でデータが不足しているため、暗黙的なラベルを得るという利点はありません。文脈的バンディット目標と自己監督目標を組み合わせることにより,この問題に取り組むための新たなアプローチを提案する。文脈的バンディット学習を自己超越で強化することで、より累積的な報酬を得ることができます。 8種類のコンピュータビジョンデータセットを用いた結果,累積報酬が大幅に向上した。提案手法が最適に動作しないケースを提供し、これらのケースでより良い学習を行うための代替手法を提供する。 Contextual bandits are a common problem faced by machine learning practitioners in domains as diverse as hypothesis testing to product recommendations. There have been a lot of approaches in exploiting rich data representations for contextual bandit problems with varying degree of success. Self-supervised learning is a promising approach to find rich data representations without explicit labels. In a typical self-supervised learning scheme, the primary task is defined by the problem objective (e.g. clustering, classification, embedding generation etc.) and the secondary task is defined by the self-supervision objective (e.g. rotation prediction, words in neighborhood, colorization, etc.). In the usual self-supervision, we learn implicit labels from the training data for a secondary task. However, in the contextual bandit setting, we don't have the advantage of getting implicit labels due to lack of data in the initial phase of learning. We provide a novel approach to tackle this issue by combining a contextual bandit objective with a self supervision objective. By augmenting contextual bandit learning with self-supervision we get a better cumulative reward. 
Our results on eight popular computer vision datasets show substantial gains in cumulative reward. We provide cases where the proposed scheme doesn't perform optimally and give alternative methods for better learning in these cases.	翻訳日:2022-12-22 09:22:20 公開日:2020-03-18
# 半教師付き少数ショット分類のためのタスク適応クラスタリング Task-Adaptive Clustering for Semi-Supervised Few-Shot Classification ( http://arxiv.org/abs/2003.08221v1 ) ライセンス: Link先を確認	Jun Seo, Sung Whan Yoon, Jaekyun Moon	(参考訳) 未確認のタスクを、少量の新しいトレーニングデータだけで処理することを目的としている。しかし、数発学習者の準備(あるいはメタトレーニング)では、大量のラベル付きデータが必要である。実世界では、残念ながらラベル付きデータは高価で不足している。本研究では,トレーニングデータの大部分をラベル付けしていない半教師環境下でうまく機能する,数発学習システムを提案する。提案手法では, 組込み特徴空間とは異なる新しい射影空間において, 現在のタスクに対するラベル付きサンプルクラスタリングを行う。条件付きクラスタリング空間は、現在のタスクのクラスセンタロイドと、タスク間でメタトレーニングされる独立したクラス毎の参照ベクトルとの間のギャップを迅速に閉じるように線形に構成される。より一般的な設定では,メタ学習におけるタスクコンディショニングの度合いを制御するという概念を導入し,クラスタリング空間の繰り返し更新数に応じてタスクコンディショニングの量が変化する。 miniImageNet と tieredImageNet のデータセットに基づく広範囲なシミュレーション結果から,提案手法の半教師付きセミショット分類性能を示す。シミュレーションの結果,提案したタスク適応型クラスタリングは,対象クラス外からのラベル付きサンプル画像の増加に伴い,優れた劣化を示すことが示された。 Few-shot learning aims to handle previously unseen tasks using only a small amount of new training data. In preparing (or meta-training) a few-shot learner, however, massive labeled data are necessary. In the real world, unfortunately, labeled data are expensive and/or scarce. In this work, we propose a few-shot learner that can work well under the semi-supervised setting where a large portion of training data is unlabeled. Our method employs explicit task-conditioning in which unlabeled sample clustering for the current task takes place in a new projection space different from the embedding feature space. The conditioned clustering space is linearly constructed so as to quickly close the gap between the class centroids for the current task and the independent per-class reference vectors meta-trained across tasks. In a more general setting, our method introduces a concept of controlling the degree of task-conditioning for meta-learning: the amount of task-conditioning varies with the number of repetitive updates for the clustering space. 
Extensive simulation results based on the miniImageNet and tieredImageNet datasets show state-of-the-art semi-supervised few-shot classification performance of the proposed method. Simulation results also indicate that the proposed task-adaptive clustering shows graceful degradation with a growing number of distractor samples, i.e., unlabeled sample images coming from outside the candidate classes.	翻訳日:2022-12-22 09:22:01 公開日:2020-03-18
# コネクション型AIアプリケーションの脆弱性:評価と防御 Vulnerabilities of Connectionist AI Applications: Evaluation and Defence ( http://arxiv.org/abs/2003.08837v1 ) ライセンス: Link先を確認	Christian Berghoff and Matthias Neu and Arndt von Twickel	(参考訳) この記事では、コネクショナリスト人工知能(AI)アプリケーションのITセキュリティを扱い、三つのITセキュリティ目標の1つである完全性への脅威に焦点を当てます。このような脅威は、例えば、著名なAIコンピュータビジョンアプリケーションにおいて最も関係がある。 ITセキュリティの目標整合性に関する総合的な見解を示すために、解釈可能性、堅牢性、ドキュメントなど多くの追加的な側面が考慮されている。脅威と可能な緩和の包括的なリストは、最先端の文献をレビューすることで提示される。敵の攻撃や毒殺攻撃などのai固有の脆弱性や、ai固有の根本原因を詳細に論じる。さらに、以前のレビューとは対照的に、AIサプライチェーン全体が、計画、データ取得、トレーニング、評価、運用フェーズを含む脆弱性について分析されている。緩和に関する議論は同様に、AIシステム自体のレベルに限らず、サプライチェーンとそのより大きなITインフラストラクチャやハードウェアデバイスへの組み込みという文脈でAIシステムを見ることを提唱している。これと、アダプティブアタッカーがこれまでに公表された任意のAI固有の防御を回避できるという観察に基づいて、記事は、単一の保護措置が不十分である代わりに、AIアプリケーションのための最小レベルのITセキュリティを達成するために、異なるレベルの複数の対策を組み合わせる必要があると結論付けている。 This article deals with the IT security of connectionist artificial intelligence (AI) applications, focusing on threats to integrity, one of the three IT security goals. Such threats are for instance most relevant in prominent AI computer vision applications. In order to present a holistic view on the IT security goal integrity, many additional aspects such as interpretability, robustness and documentation are taken into account. A comprehensive list of threats and possible mitigations is presented by reviewing the state-of-the-art literature. AI-specific vulnerabilities such as adversarial attacks and poisoning attacks as well as their AI-specific root causes are discussed in detail. Additionally and in contrast to former reviews, the whole AI supply chain is analysed with respect to vulnerabilities, including the planning, data acquisition, training, evaluation and operation phases. The discussion of mitigations is likewise not restricted to the level of the AI system itself but rather advocates viewing AI systems in the context of their supply chains and their embeddings in larger IT infrastructures and hardware devices. 
Based on this and the observation that adaptive attackers may circumvent any single published AI-specific defence to date, the article concludes that single protective measures are not sufficient but rather multiple measures on different levels have to be combined to achieve a minimum level of IT security for AI applications.	翻訳日:2022-12-22 09:21:41 公開日:2020-03-18

# 物理的に定義されたp型MOSシリコン二重量子ドットにおけるスピン軌道場

Spin orbit field in a physically defined p type MOS silicon double quantum dot ( http://arxiv.org/abs/2003.07079v2 )

ライセンス: Link先を確認

Marian Marx, Jun Yoneda, Ángel Gutiérrez Rubio, Peter Stano, Tomohiro Otsuka, Kenta Takeda, Sen Li, Yu Yamaoka, Takashi Nakajima, Akito Noiri, Daniel Loss, Tetsuo Kodera and Seigo Tarucha

(参考訳) シリコン中のp型金属酸化物半導体二重量子ドットにおけるスピン軌道(so)場を実験的に理論的に検討した。パウリのスピン遮断における二重点を通した漏れ電流の磁場依存性を測定する。有限磁場は、外部磁場とSO磁場が平行であるときに、昇降が最も効果的である。このようにして、トンネル孔のスピンフリップは、二重点軸に垂直なSO場が量子井戸面からほぼ完全に外れているためである。群対称表現理論を用いて、SO項の導出による測定を拡大する。平面電場(量子井戸の場合)がなければ、so場は主に平面内にあり、ラシュバの和とドレッテルハウスの項が支配的であると予測される。したがって, 観測された等電界は, 平面成分が相当な電界に起因していると解釈した。

We experimentally and theoretically investigate the spin orbit (SO) field in a physically defined, p type metal oxide semiconductor double quantum dot in silicon. We measure the magnetic field dependence of the leakage current through the double dot in the Pauli spin blockade. A finite magnetic field lifts the blockade, with the lifting least effective when the external and SO fields are parallel. In this way, we find that the spin flip of a tunneling hole is due to a SO field pointing perpendicular to the double dot axis and almost fully out of the quantum well plane. We augment the measurements by a derivation of SO terms using group symmetric representations theory. It predicts that without in plane electric fields (a quantum well case), the SO field would be mostly within the plane, dominated by a sum of a Rashba and a Dresselhaus like term. We, therefore, interpret the observed SO field as originated in the electric fields with substantial in plane components.

翻訳日:2023-05-29 00:33:59 公開日:2020-03-18

# コンパクト位相空間宇宙論の量子揺らぎ

Quantum fluctuations of the compact phase space cosmology ( http://arxiv.org/abs/2003.08129v1 )

ライセンス: Link先を確認

Danilo Artigas, Sean Crowe, Jakub Mielczarek

(参考訳) 最近の論文 Phys. Rev. D 100, no. 4, 043533 (2019) では、平坦なド・ジッター宇宙論のコンパクト位相空間への一般化が提案されている。コンパクト化の主な利点は、物理量は有界であり、量子論は有限次元ヒルベルト空間によって特徴づけられる。さらに、$\mathbb{S}^2$位相空間を考えることで、量子記述は$SU(2)$表現理論を用いて構成される。本研究の目的は、量子力学の半古典的状態の抽出に効果的な方法を適用することである。解析は、量子制約の事前解法とモデルの物理ハミルトニアンを抽出することによって行われる。有効レベルでは、2つの手順の結果が等価であることが示される。我々は、標準平坦位相空間での量子化後のものとは異なる、宇宙の再崩壊の周りのゆらぎの非自明な挙動を見つける。この挙動は量子バック反応効果を持つ修正フリードマン方程式のレベルに反映され、導出される。最後に、宇宙セクターの量子ゆらぎとホログラフィック・ブッソ境界との予期せぬ関係を示す。

In the recent article Phys. Rev. D 100, no. 4, 043533 (2019), a compact phase space generalization of the flat de Sitter cosmology has been proposed. The main advantage of the compactification is that physical quantities are bounded, and the quantum theory is characterized by a finite-dimensional Hilbert space. Furthermore, by considering the $\mathbb{S}^2$ phase space, the quantum description is constructed with the use of $SU(2)$ representation theory. The purpose of this article is to apply effective methods to extract the semi-classical regime of the quantum dynamics. The analysis is performed both without prior solving of the quantum constraint and by extracting the physical Hamiltonian of the model. At the effective level, the results of the two procedures are shown to be equivalent. We find a nontrivial behavior of the fluctuations around the recollapse of the universe, which is distinct from what is found after quantization with the standard flat phase space. The behavior is reflected at the level of the modified Friedmann equation with quantum back-reaction effects, which is derived. Finally, an unexpected relation between the quantum fluctuations of the cosmological sector and the holographic Bousso bound is shown.

翻訳日:2023-05-28 20:26:26 公開日:2020-03-18

# 量子アルゴリズムによる粒子トラック再構成

Particle Track Reconstruction with Quantum Algorithms ( http://arxiv.org/abs/2003.08126v1 )

ライセンス: Link先を確認

Cenk Tüysüz, Federico Carminati, Bilge Demirköz, Daniel Dobos, Fabio Fracas, Kristiane Novotny, Karolos Potamianos, Sofia Vallecorsa, Jean-Roch Vlimant

(参考訳) 粒子軌道再構成パラメータの正確な決定は、HL-LHC(High Luminosity Large Hadron Collider)実験において大きな課題となる。 HL-LHCにおける同時衝突の数の増加と高い検出器占有率により、トラック再構成アルゴリズムは時間と計算資源の面で極めて要求される。ヒット数の増加は、トラック再構築アルゴリズムの複雑さを増大させる。加えて、粒子のトラックにヒットを割り当てる際の曖昧さは、検出器の有限分解能とヒットの物理的近接性によって増大する。したがって、荷電粒子軌道の再構成はHL-LHCデータの正しい解釈にとって大きな課題となる。現在使われているほとんどの手法はカルマンフィルタに基づいており、ロバストであり、優れた物理性能を提供する。しかし、二乗よりはスケールが悪くなることが期待されている。ヒットレベルの組合せ背景を低減できるアルゴリズムを設計することで、カルマンフィルタに対するよりクリーンな初期シードが提供され、全体の処理時間が大幅に短縮される。量子コンピュータの顕著な特徴の1つは、非常に多くの状態を同時に評価でき、大きなパラメータ空間で検索するのに理想的な手段となることである。実際、異なるR&Dイニシアティブは、量子追跡アルゴリズムがそのような能力をどのように活用できるかを探求している。本稿では,初期シード段階における組合せ背景の低減を目的とした量子ベーストラック探索アルゴリズムの実装について述べる。 Kaggle TrackMLチャレンジ用に設計された公開データセットを使用します。

Accurate determination of particle track reconstruction parameters will be a major challenge for the High Luminosity Large Hadron Collider (HL-LHC) experiments. The expected increase in the number of simultaneous collisions at the HL-LHC and the resulting high detector occupancy will make track reconstruction algorithms extremely demanding in terms of time and computing resources. The increase in number of hits will increase the complexity of track reconstruction algorithms. In addition, the ambiguity in assigning hits to particle tracks will be increased due to the finite resolution of the detector and the physical closeness of the hits. Thus, the reconstruction of charged particle tracks will be a major challenge to the correct interpretation of the HL-LHC data. Most methods currently in use are based on Kalman filters which are shown to be robust and to provide good physics performance. However, they are expected to scale worse than quadratically. Designing an algorithm capable of reducing the combinatorial background at the hit level would provide a much cleaner initial seed to the Kalman filter, strongly reducing the total processing time. One of the salient features of Quantum Computers is the ability to evaluate a very large number of states simultaneously, making them an ideal instrument for searches in a large parameter space. In fact, different R&D initiatives are exploring how Quantum Tracking Algorithms could leverage such capabilities. In this paper, we present our work on the implementation of a quantum-based track finding algorithm aimed at reducing combinatorial background during the initial seeding stage. We use the publicly available dataset designed for the Kaggle TrackML challenge.

翻訳日:2023-05-28 20:25:55 公開日:2020-03-18

# ARIMAモデルを用いた犯罪予測

Forecasting Crime Using ARIMA Model ( http://arxiv.org/abs/2003.08006v1 )

ライセンス: Link先を確認

Khawar Islam and Akhter Raza

(参考訳) データマイニングとは,大規模データセットからさまざまなパターンや有用な情報を抽出するプロセスである。ロンドン警察によると、犯罪は2017年の初めからロンドン各区で増加している。今後、犯罪防止のための有用な情報は得られない。我々は,ロンドン地区における犯罪率の予測を,ロンドンにおける大規模な犯罪データセットを抽出し,将来における犯罪数を予測する。ロンドンにおける犯罪予測に時系列ARIMAモデルを用いた。 2年間の犯罪データを予測するARIMAモデルに5年間のデータを与える。対照的に指数的滑らかな ARIMA モデルはより高い適合値を持つ。ロンドン警視庁がウェブサイトやその他の資料から収集した実際の犯罪のデータセット。私たちの主な概念は4つの部分に分かれている。データ抽出(DE)、非構造化データのデータ処理(DP)、IBM SPSSの可視化モデル。 DEは、2012年の2016年のWebソースから犯罪データを抽出する。 DPはデータを統合して還元し、事前に定義された属性を与える。犯罪予測は、いくつかの計算を適用して分析され、その移動平均、差、自動回帰を計算する。予測モデルは80%の正確な値を与えるが、これは正確なモデルである。この作業はロンドン警察の犯罪に対する意思決定に役立つ。

Data mining is the process of extracting patterns and useful information from large datasets. According to the London police, crime has been increasing in various London boroughs since the beginning of 2017, and no useful information is available for preventing future crime. We forecast crime rates in London boroughs by extracting a large dataset of crimes in London and predicting the number of crimes in the future. We use a time-series ARIMA model for forecasting crimes in London: given 5 years of data, the ARIMA model forecasts 2 years of crime data. Compared with exponential smoothing, the ARIMA model has higher fitting values. The real dataset of crimes reported by the London police was collected from its website and other resources. Our main concept is divided into four parts. Data extraction (DE), data processing (DP) of unstructured data, visualizing model in IBM SPSS. DE extracts crime data from web sources during 2012 for the 2016 year. DP integrates and reduces the data and gives them predefined attributes. Crime prediction is analyzed by applying several calculations: moving average, differencing, and auto-regression. The forecasting model gives 80% correct values, which we find to be an accurate model. This work helps the London police in decision-making against crime.

翻訳日:2023-05-28 20:23:09 公開日:2020-03-18
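
The abstract above names three computations behind the forecast: moving average, differencing, and auto-regression. The following is a minimal Python sketch of how those pieces fit together in an ARIMA(1,1,0)-style forecast. It is an illustration under stated assumptions, not the authors' IBM SPSS workflow: the function names and the sample counts are hypothetical, the AR coefficient is fitted by ordinary least squares rather than maximum likelihood, and `moving_average` is a plain trailing smoother, not the MA error term of a full ARIMA model.

```python
from statistics import mean

def difference(series, lag=1):
    """Return the differenced series y[t] - y[t-lag] (the "I" step)."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

def moving_average(series, window):
    """Trailing moving average over `window` points (a smoother only)."""
    return [mean(series[t - window + 1 : t + 1])
            for t in range(window - 1, len(series))]

def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    x_prev, x_next = series[:-1], series[1:]
    mx, my = mean(x_prev), mean(x_next)
    var = sum((a - mx) ** 2 for a in x_prev)
    if var == 0:
        return my, 0.0  # constant series: no autoregressive signal to fit
    cov = sum((a - mx) * (b - my) for a, b in zip(x_prev, x_next))
    phi = cov / var
    return my - phi * mx, phi

def forecast(series, steps):
    """Forecast `steps` values: AR(1) on first differences, then re-integrate.

    Needs at least three observations so that two differences exist.
    """
    diffs = difference(series)
    c, phi = fit_ar1(diffs)
    level, last_diff, out = series[-1], diffs[-1], []
    for _ in range(steps):
        last_diff = c + phi * last_diff  # predict the next difference
        level += last_diff               # undo the differencing
        out.append(level)
    return out

# Hypothetical monthly crime counts, made up for illustration only.
monthly_counts = [100, 104, 109, 113, 118, 122, 127, 131, 136, 140]
print(moving_average(monthly_counts, 3)[:3])
print(forecast(monthly_counts, 3))
```

For real data, a maintained library implementation with proper order selection and diagnostics would be the safer choice; the sketch only shows the mechanics.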

# 光子と相互作用する原子配列に現れる量子ホール位相

Quantum Hall phase emerging in an array of atoms interacting with photons ( http://arxiv.org/abs/2003.08257v1 )

ライセンス: Link先を確認

Alexander V. Poshakinskiy, Janet Zhong, Yongguan Ke, Nikita A. Olekhno, Chaohong Lee, Yuri S. Kivshar, Alexander N. Poddubny

(参考訳) 位相量子相は現代物理学の多くの概念の根底にある。電子の無秩序なトポロジカルエッジ状態の存在は通常磁場を必要とするが、光に対する磁場の直接効果は非常に弱い。結果として、光子の位相状態のデモンストレーションは、特別な複素構造や外部時間依存変調で設計された合成場を用いる。ここでは、トポロジカルなエッジ状態、スペクトルランダウレベル、ホフスタッター・バターフライを持つ量子ホール相が単純な量子系に現れ、トポロジカルな秩序は微調整なしで相互作用からのみ生じる。このような系は、古典ディッケモデルによって記述された光に結合された2段階の原子(量子ビット)の配列であり、最近、低温原子と超伝導量子ビットの実験で実現されている。我々は、量子物理学、多体物理学、非線形トポロジカルフォトニクスを含むいくつかの分野において新たな地平線が開かれ、量子ビットアレイや量子シミュレータの実験において重要な基準点となると考えている。

Topological quantum phases underpin many concepts of modern physics. While the existence of disorder-immune topological edge states of electrons usually requires magnetic fields, direct effects of magnetic field on light are very weak. As a result, demonstrations of topological states of photons employ synthetic fields engineered in special complex structures or external time-dependent modulations. Here, we reveal that the quantum Hall phase with topological edge states, spectral Landau levels and Hofstadter butterfly can emerge in a simple quantum system, where topological order arises solely from interactions without any fine-tuning. Such systems, arrays of two-level atoms (qubits) coupled to light being described by the classical Dicke model, have recently been realized in experiments with cold atoms and superconducting qubits. We believe that our finding will open new horizons in several disciplines including quantum physics, many-body physics, and nonlinear topological photonics, and it will set an important reference point for experiments on qubit arrays and quantum simulators.

翻訳日:2023-05-28 20:16:44 公開日:2020-03-18

# 動的に分離された単一中性原子のコヒーレンス

Coherence of a dynamically decoupled single neutral atom ( http://arxiv.org/abs/2003.08163v1 )

ライセンス: Link先を確認

Chang Hoong Chow, Boon Long Ng, Christian Kurtsiefer

(参考訳) 量子通信における高度な応用には、長い量子ビットコヒーレンスと効率的な原子-光子カップリングが不可欠である。コヒーレンスを維持するための1つのテクニックは動的疎結合であり、システムと環境との相互作用を減らすために周期的な再焦点パルス列を用いる。スピン分極した$^{87}$Rb原子上での動的デカップリングの実装を実験的に検討した。 2つの磁気感度を持つ$5S_{1/2}$ゼーマンレベル、$\lvert{F=2,\ m_{F}=-2}\rangle$と$\lvert{F=1,\ m_{F}=-1}\rangle$をクォービット状態として使用し、$\lvert{F=2,\ m_{F}=-2}\rangle$から$5P_{3/2}$に励起状態$\lvert{F'=3,\ m'_{F}=-3}\rangle$を閉光遷移によって結合する。動的デカップリング法において、より多くの再焦点パルスにより、コヒーレンス時間を38(3) $\mu$sから2ミリ秒以上にまで拡張することができた。また, 原子の運動状態と再焦点後のクビットコヒーレンスとの間には強い相関関係が見られ, トラップパラメータの解法として利用することができる。

Long qubit coherence and efficient atom-photon coupling are essential for advanced applications in quantum communication. One technique to maintain coherence is dynamical decoupling, where a periodic sequence of refocusing pulses is employed to reduce the interaction of the system with the environment. We experimentally study the implementation of dynamical decoupling on an optically-trapped, spin-polarized $^{87}$Rb atom. We use the two magnetic-sensitive $5S_{1/2}$ Zeeman levels, $\lvert{F=2,\ m_{F}=-2}\rangle$ and $\lvert{F=1,\ m_{F}=-1}\rangle$, as qubit states, motivated by the possibility to couple $\lvert{F=2,\ m_{F}=-2}\rangle$ to the $5P_{3/2}$ excited state $\lvert{F'=3,\ m'_{F}=-3}\rangle$ via a closed optical transition. With more refocusing pulses in the dynamical decoupling technique, we manage to extend the coherence time from 38(3) $\mu$s to more than two milliseconds. We also observe a strong correlation between the motional states of the atom and the qubit coherence after the refocusing, which can be used as a measurement basis to resolve trapping parameters.

翻訳日:2023-05-28 20:15:55 公開日:2020-03-18
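
The abstract above describes dynamical decoupling as a periodic sequence of refocusing pulses. The toy Python model below sketches why such pulses help. It rests on assumptions that are not taken from the paper: the qubit sees a classical detuning `delta(t) = delta0 + drift * t` that varies randomly from shot to shot, the pi pulses are ideal and equally spaced (a CPMG-like timing), and all quantities are in arbitrary units. Each pulse flips the sign with which later detuning is accumulated, so slowly varying noise cancels.

```python
import cmath
import random

def cpmg_pulse_times(total_time, n_pulses):
    """Equally spaced pi-pulse times T*(k - 1/2)/N for k = 1..N."""
    return [total_time * (k - 0.5) / n_pulses for k in range(1, n_pulses + 1)]

def accumulated_phase(delta0, drift, total_time, n_pulses):
    """Net phase for detuning delta(t) = delta0 + drift * t.

    The sign of the accumulation flips after every pi pulse.
    """
    edges = [0.0] + cpmg_pulse_times(total_time, n_pulses) + [total_time]
    phase, sign = 0.0, 1.0
    for t0, t1 in zip(edges, edges[1:]):
        # Exact integral of delta(t) over the segment [t0, t1].
        phase += sign * (delta0 * (t1 - t0) + 0.5 * drift * (t1**2 - t0**2))
        sign = -sign
    return phase

def coherence(n_pulses, total_time=1.0, shots=2000, seed=0):
    """|<exp(i*phase)>| over random shots; 1.0 means fully coherent."""
    rng = random.Random(seed)
    total = 0j
    for _ in range(shots):
        delta0 = rng.gauss(0.0, 20.0)  # static detuning spread (assumed)
        drift = rng.gauss(0.0, 4.0)    # slow linear drift (assumed)
        total += cmath.exp(
            1j * accumulated_phase(delta0, drift, total_time, n_pulses))
    return abs(total) / shots

for n in (0, 1, 2):
    print(n, round(coherence(n), 3))
```

In this model no pulse leaves almost no coherence, a single pulse removes the static part of the detuning, and two pulses also cancel the linear drift. Real noise has a richer spectrum, which is why experiments such as the one described use longer pulse trains.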

# 量子集合モデルにおける異常熱化

Anomalous Thermalization in Quantum Collective Models ( http://arxiv.org/abs/2003.08141v1 )

ライセンス: Link先を確認

Armando Relaño

(参考訳) 熱状態は,非平衡過程を含む実験によって追跡可能な過去の情報と関連する量の情報をいまだに保存していることが明らかとなった。我々は,マイクロカノニカル量子クルックの定理の条件を提供し,数値実験により検証する。 lipkin-meshkov-glickモデルでは、同じ平衡状態につながる2つの異なる手順は、非平衡過程における仕事の異なる統計をもたらす。ディックモデルでは、同じ非平衡プロトコルに対する2つの異なる軌道が、異なる作業統計を生成する。マイクロカノニカル平均は、全ての場合において物理観測可能な期待値の正しい結果を与えるが、マイクロカノニカル量子クルックの定理はそれらのいくつかでは失敗する。量子ゆらぎ定理のテストは、システムが適切に熱化されているかどうかを検証することが必須である。

We show that apparently thermalized states still store relevant amounts of information about their past, information that can be tracked by experiments involving nonequilibrium processes. We provide a condition for the microcanonical quantum Crooks theorem, and we test it by means of numerical experiments. In the Lipkin-Meshkov-Glick model, two different procedures leading to the same equilibrium states give rise to different statistics of work in nonequilibrium processes. In the Dicke model, two different trajectories for the same nonequilibrium protocol produce different statistics of work. Microcanonical averages provide the correct results for the expectation values of physical observables in all the cases; the microcanonical quantum Crooks theorem fails in some of them. We conclude that testing quantum fluctuation theorems is mandatory to verify if a system is properly thermalized.

翻訳日:2023-05-28 20:14:32 公開日:2020-03-18

# キャビティエンハンスドノイズ抑制を用いた寒冷原子時空間多重量子メモリ

A cold atom temporally multiplexed quantum memory with cavity-enhanced noise suppression ( http://arxiv.org/abs/2003.08418v1 )

ライセンス: Link先を確認

Lukas Heller, Pau Farrera, Georg Heinze, Hugues de Riedmatten

(参考訳) 将来の量子リピータアーキテクチャは、遠く離れた光の量子状態に符号化された情報を効率的に分散することができる。本研究では, レーザー冷却雲内における時相多重量子リピータノードを, $^{87}$rb 原子で実演する。我々はDLCZプロトコルを用いて、光子対と1つの集合スピン励起(いわゆるスピン波)を複数の時間モードで書き込みパルスを用いて生成する。異なる時間モードで生成されたスピン波を識別可能とし、選択的読み出しを可能にするため、磁場勾配によるスピン波の強調と強調を制御し、関連する原子超微粒子の可逆的不均質化を誘導する。低精細な光学空洞内に原子アンサンブルを埋め込むことにより、マルチモード動作で発生する追加ノイズが強く抑制されることを示す。フィードフォワード読み出しを利用して、最大10の時間モードの区別可能な検索を示す。各モードについて、第一光子と第二光子の間の非古典的相関を証明する。さらに、メモリに記憶されている時間モードの数を増やすことにより、相関光子対の速度の増大が観察される。報告された能力は、多重量子メモリに基づく量子リピータアーキテクチャの重要な要素である。

Future quantum repeater architectures, capable of efficiently distributing information encoded in quantum states of light over large distances, will benefit from multiplexed photonic quantum memories. In this work we demonstrate a temporally multiplexed quantum repeater node in a laser-cooled cloud of $^{87}$Rb atoms. We employ the DLCZ protocol where pairs of photons and single collective spin excitations (so called spin waves) are created in several temporal modes using a train of write pulses. To make the spin waves created in different temporal modes distinguishable and enable selective readout, we control the dephasing and rephasing of the spin waves by a magnetic field gradient, which induces a controlled reversible inhomogeneous broadening of the involved atomic hyperfine levels. We demonstrate that by embedding the atomic ensemble inside a low finesse optical cavity, the additional noise generated in multi-mode operation is strongly suppressed. By employing feed forward readout, we demonstrate distinguishable retrieval of up to 10 temporal modes. For each mode, we prove non-classical correlations between the first and second photon. Furthermore, an enhancement in rates of correlated photon pairs is observed as we increase the number of temporal modes stored in the memory. The reported capability is a key element of a quantum repeater architecture based on multiplexed quantum memories.

翻訳日:2023-05-28 20:05:48 公開日:2020-03-18

# 1064nm顕微鏡光ツイーザの単一セシウム原子トラップ寿命に及ぼすレーザ強度変動の影響

Influence of Laser Intensity Fluctuation on Single-Cesium Atom Trapping Lifetime in a 1064-nm Microscopic Optical Tweezer ( http://arxiv.org/abs/2003.08415v1 )

ライセンス: Link先を確認

Rui Sun, Xin Wang, Kong Zhang, Jun He and Junmin Wang

(参考訳) 赤調1064nmレーザーの強集束単空間モードガウスビームからなる光tweezerは、光強度の最も強い点において単一セシウム(cs)原子を固定することができる。これを単一量子ビットと単一光子ソースのコヒーレントな操作に使用できる。光ツイーザ中の原子のトラップ寿命は、背景原子の影響、光ツイーザのレーザー強度変動、原子の残留熱運動により非常に短い。本稿では,背景圧力,光トワイザのトラップ周波数,光トワイザのパラメトリック加熱が原子トラップ寿命に及ぼす影響を分析した。 AOM(Acousto-optical modulator)に基づく外部フィードバックループと組み合わせて、時間領域における1064nmレーザーの強度変動を$\pm$3.360$\%$から$\pm$0.064$\%$に抑制し、抑制帯域はおよそ33kHzに達した。光学トウィーザーにおける単一cs原子のトラップ寿命は4.04 sから6.34 sに延長された。

An optical tweezer composed of a strongly focused single-spatial-mode Gaussian beam of a red-detuned 1064-nm laser can confine a single-cesium (Cs) atom at the strongest point of the light intensity. We can use this for coherent manipulation of single-quantum bits and single-photon sources. The trapping lifetime of the atoms in the optical tweezers is very short due to the impact of the background atoms, the laser intensity fluctuation of optical tweezer and the residual thermal motion of the atoms. In this paper, we analyzed the influence of the background pressure, the trap frequency of optical tweezers and the parametric heating of the optical tweezer on the atomic trapping lifetime. Combined with the external feedback loop based on an acousto-optical modulator (AOM), the intensity fluctuation of the 1064-nm laser in the time domain was suppressed from $\pm$ 3.360$\%$ to $\pm$ 0.064$\%$, and the suppression bandwidth reached approximately 33 kHz. The trapping lifetime of a single Cs atom in the microscopic optical tweezer was extended from 4.04 s to 6.34 s.

翻訳日:2023-05-28 20:05:27 公開日:2020-03-18
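
The figures quoted in the abstract above can be turned into two ratios, which make the size of the improvement easier to compare. The short Python sketch below uses only the numbers stated in the abstract; the variable names are illustrative.

```python
# Values quoted in the abstract.
fluct_before_pct, fluct_after_pct = 3.360, 0.064  # +/- intensity fluctuation, %
lifetime_before_s, lifetime_after_s = 4.04, 6.34  # single-atom trapping lifetime, s

suppression = fluct_before_pct / fluct_after_pct
lifetime_gain = lifetime_after_s / lifetime_before_s

print(f"fluctuation reduced {suppression:.1f}x, "
      f"lifetime extended {lifetime_gain:.2f}x")
```

The fluctuation amplitude drops by roughly a factor of 50 while the lifetime grows by roughly a factor of 1.6, which fits the abstract's statement that background atoms and residual thermal motion also limit the lifetime.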

# 退化基底空間に対するAGSPのシャープ含意

Sharp implications of AGSPs for degenerate ground spaces ( http://arxiv.org/abs/2003.08406v1 )

ライセンス: Link先を確認

Nilin Abrahamsen

We generalize the `off-the-rack' AGSP$\Rightarrow$entanglement bound implication of [Arad, Landau, and Vazirani '12] from unique ground states to degenerate ground spaces. Our condition $R\Delta\le1/2$ on a $(\Delta,R)$-AGSP matches the non-degenerate case, whereas existing tools in the literature of spin chains would only be adequate to prove a less natural implication which assumes $R^{\text{Const}}\Delta\le c$. To show that $R\Delta\le1/2$ still suffices in the degenerate case we prove an optimal error reduction bound which improves on the literature by a factor $\delta\mu$ where $\delta=1-\mu$ is the viability. The generalized off-the-rack bound implies the generalization of a recent 2D subvolume law of [Anshu, Arad, and Gosset '19] from the non-degenerate case to the sub-exponentially degenerate case.

翻訳日:2023-05-28 20:04:32 公開日:2020-03-18

# 超伝導量子プロセッサ,スピンメモリ,フォトニック量子ネットワーク間のコヒーレントインタフェースのためのフォノニックバス

A Phononic Bus for Coherent Interfaces Between a Superconducting Quantum Processor, Spin Memory, and Photonic Quantum Networks ( http://arxiv.org/abs/2003.08383v1 )

ライセンス: Link先を確認

Tomas Neuman, Matt Eichenfield, Matthew Trusheim, Lisa Hackett, Prineha Narang, Dirk Englund

(参考訳) 超伝導マイクロ波量子ビットと固体人工原子の基底状態スピン系との間の高忠実な量子状態変換法を圧電トランスデューサに接続された音響バスを介して紹介する。最適化フォノニックキャビティにおける超伝導回路量子ビットおよびダイヤモンドシリコン空隙中心の現在の実験パラメータに適用し,99%を超える忠実性を有する量子状態変換をMHz規模の帯域幅で推定する。超伝導回路量子コンピューティングと人工原子の相補的な強度を組み合わせることで、ハイブリッドアーキテクチャは、長寿命量子メモリ、高忠実度測定、大きな量子ビット数、再構成可能な量子ビット接続、光量子ネットワークによる高忠実度状態とゲートテレポーテーションを提供する。

We introduce a method for high-fidelity quantum state transduction between a superconducting microwave qubit and the ground state spin system of a solid-state artificial atom, mediated via an acoustic bus connected by piezoelectric transducers. Applied to present-day experimental parameters for superconducting circuit qubits and diamond silicon vacancy centers in an optimized phononic cavity, we estimate quantum state transduction with fidelity exceeding 99% at a MHz-scale bandwidth. By combining the complementary strengths of superconducting circuit quantum computing and artificial atoms, the hybrid architecture provides high-fidelity qubit gates with long-lived quantum memory, high-fidelity measurement, large qubit number, reconfigurable qubit connectivity, and high-fidelity state and gate teleportation through optical quantum networks.

翻訳日:2023-05-28 20:04:07 公開日:2020-03-18

# タンパク質$\alpha$-helicesにおける自由エネルギーの量子輸送と利用

Quantum transport and utilization of free energy in protein $\alpha$-helices ( http://arxiv.org/abs/2003.13814v1 )

ライセンス: Link先を確認

Danko D. Georgiev, James F. Glazebrook

(参考訳) 生命を維持する重要な生物学的プロセスは、タンパク質ナノエンジンによって触媒され、生体系を非平衡状態から維持する。タンパク質のエネルギー過程を調べるために, 水素結合ペプチド基に沿って伝播する複数のアミドIエキシトン量子の量子力学を, 一般化したダヴィドフ方程式の系を解析した。計算シミュレーションにより、様々な長さのタンパク質$\alpha$-helicesに対してアミドiエネルギーのパルスを印加することで動くダヴィドフソリトンの生成が確認された。これらのソリトンの安定性と移動性は、アミドi振動子間の双極子-双極子カップリングの均一性とエキシトン-フォノン相互作用の等方性に依存する。ダヴィドフ・ソリトンは巨大な障壁を通る量子トンネルや、衝突地点での量子干渉も可能であった。この結果は、高分子構造の結合剤としての共有結合の力学的支持を超えた生物学的システムにおける量子効果の非自明な役割を支持する。ダヴィドフソリトン(英語版)の量子トンネルと干渉は、そのような真の量子現象の存在を支持する生物学的秩序の進化的な委任に加えて、高効率な輸送、輸送、利用を可能にする物理機構を持つ触媒活性のマクロ分子タンパク質複合体を提供する。

The essential biological processes that sustain life are catalyzed by protein nano-engines, which maintain living systems in far-from-equilibrium ordered states. To investigate energetic processes in proteins, we have analyzed the system of generalized Davydov equations that govern the quantum dynamics of multiple amide I exciton quanta propagating along the hydrogen-bonded peptide groups in $\alpha$-helices. Computational simulations have confirmed the generation of moving Davydov solitons by applied pulses of amide I energy for protein $\alpha$-helices of varying length. The stability and mobility of these solitons depended on the uniformity of dipole-dipole coupling between amide I oscillators, and the isotropy of the exciton-phonon interaction. Davydov solitons were also able to quantum tunnel through massive barriers, or to quantum interfere at collision sites. The results presented here support a nontrivial role of quantum effects in biological systems that lies beyond the mechanistic support of covalent bonds as binding agents of macromolecular structures. Quantum tunneling and interference of Davydov solitons provide catalytically active macromolecular protein complexes with a physical mechanism allowing highly efficient transport, delivery, and utilization of free energy, besides the evolutionary mandate of biological order that supports the existence of such genuine quantum phenomena, and may indeed demarcate the quantum boundaries of life.

翻訳日:2023-05-28 19:58:18 公開日:2020-03-18

# テキストマイニングフォーマメンティスネットワークはソーシャルメディアにおけるSTEM性差の一般認識を再構築する

Text-mining forma mentis networks reconstruct public perception of the STEM gender gap in social media ( http://arxiv.org/abs/2003.08835v1 )

ライセンス: Link先を確認

Massimo Stella

(参考訳) マインドセット再構成(Mindset reconstruction)は、個人の構造と知識の知覚のマッピングであり、言語とその人間の心における認知的反射(精神の語彙)を調べることによって、ここで展開された地図である。 textual forma mentis networks (tfmn) は、文章データからマインドセットの構造を抽出、表現、理解するために導入されたガラスの箱である。ネットワーク科学、心理言語学、ビッグデータを組み合わせることで、TFMNは、ベンチマークテキストにおいて、監督なしに関連する概念を特定できた。ひとたび検証されると、tfmnは科学における男女格差のケーススタディに応用され、近年の研究によって歪んだ考え方に強く関連した。ソーシャルメディアの認識とオンラインの談話に焦点を当て、この研究は1万の関連ツイートを分析した。ジェンダー」と「ギャップ」はほとんど肯定的な認識を示し、信頼とジョーヨーの感情的プロファイルと意味的関連性: 女性科学者の成功、男女格差と賃金差の関連、将来の解決への期待。女性」の認識は、科学における女性に対する性的嫌がらせとステレオタイプ的脅威(暗黙の認知バイアスの一形態)に関する議論を「成功のための個人的スキルを犠牲にする」ことを強調した。人」の再構築された認識は、科学における男性の優越という神話に対する社会ユーザの認識を強調した。人」に関する怒りは検出されず、ギャップに焦点をあてた談話が性別のない言葉に関して緊張しなくなったことを示唆している。科学者」に対する定型的な認識は、実世界の調査とは異なるオンライン上では見つからなかった。総合分析では、オンラインの談話は、主にステレオタイプフリーでポジティブで信頼に満ちたジェンダー格差の認識を促進し、暗黙の/説明的な偏見を認識し、ギャップを縮めると予測している。 TFMNは、異なるグループの認識を調査するための新しい方法を開き、政策決定のための詳細なデータインフォームド基盤を提供した。

Mindset reconstruction maps how individuals structure and perceive knowledge, a map unfolded here by investigating language and its cognitive reflection in the human mind, i.e. the mental lexicon. Textual forma mentis networks (TFMNs) are glass boxes introduced for extracting, representing and understanding mindsets' structure, in Latin "forma mentis", from textual data. Combining network science, psycholinguistics and Big Data, TFMNs successfully identified relevant concepts, without supervision, in benchmark texts. Once validated, TFMNs were applied to the case study of the gender gap in science, which was strongly linked to distorted mindsets by recent studies. Focusing on social media perception and online discourse, this work analysed 10,000 relevant tweets. "Gender" and "gap" elicited a mostly positive perception, with a trustful/joyous emotional profile and semantic associates that: celebrated successful female scientists, related the gender gap to wage differences, and hoped for a future resolution. The perception of "woman" highlighted discussion about sexual harassment and stereotype threat (a form of implicit cognitive bias) relative to women in science "sacrificing personal skills for success". The reconstructed perception of "man" highlighted social users' awareness of the myth of male superiority in science. No anger was detected around "person", suggesting that gap-focused discourse got less tense around genderless terms. No stereotypical perception of "scientist" was identified online, unlike in real-world surveys. The overall analysis identified the online discourse as promoting a mostly stereotype-free, positive/trustful perception of gender disparity, aware of implicit/explicit biases and oriented toward closing the gap. TFMNs opened new ways for investigating perceptions in different groups, offering detailed data-informed grounding for policy making.

翻訳日:2023-05-28 19:56:40 公開日:2020-03-18

# ハイブリッド量子システムの非線形光学応答の位相制御

Topological control of the nonlinear-optical response of hybrid quantum systems ( http://arxiv.org/abs/2003.08465v1 )

ライセンス: Link先を確認

Ethan L. Crowell and Mark G. Kuzyk

(参考訳) 1次元超格子の位相特性を電子系の光学的性質にマッピングする。非線形光学応答は、位相的に保護されたエッジ状態と非局在化された固有状態の間の遷移形態にある電子に最適化されている。これはハイブリッド量子システムの非線形光学応答をチューニングする新しい手段を提供する。これらの特性を飽和吸収の模倣に利用し,効率の良い全光スイッチの構築に「量子コード」をどのように利用できるかを示す。

We map the topological properties of a one dimensional superlattice to the optical properties of an electronic system. We find that the nonlinear-optical response is optimized for electrons that live in the transitional morphology between topologically protected edge states and delocalized eigenstates. This provides a novel means of tuning the nonlinear-optical response of hybrid quantum systems. We show how these characteristics can be used to mimic saturable absorption and illustrate how `quantum cords' can be used to build an efficient all-optical switch.

翻訳日:2023-05-28 19:55:12 公開日:2020-03-18

# 時間軌道電位トラップにおける磁場の精密制御と光偏光

Precise control of magnetic fields and optical polarization in a time-orbiting potential trap ( http://arxiv.org/abs/2003.08459v1 )

ライセンス: Link先を確認

A. J. Fallon and C. A. Sackett

(参考訳) 時間軌道ポテンシャルトラップは、回転磁場中の中性原子を拘束する。磁場の回転は、系統的な効果を平均化できるため、精密な測定に有用である。しかし、磁場は静的場よりも特性付けが難しく、原子に適用される光が量子化軸に対して時間的に変動する光偏光を持つ。これらの問題は、電波磁場またはレーザーが回転磁場に同期するパルスに印加されるストロボスコープ技術を用いて克服することができる。これらの方法を用いて、磁場は10mGの精度で特徴付けることができ、光は5\times 10^{-5}$の偏光誤差で適用することができる。

A time orbiting potential trap confines neutral atoms in a rotating magnetic field. The rotation of the field can be useful for precision measurements, since it can average out some systematic effects. However, the field is more difficult to characterize than a static field, and it makes light applied to the atoms have a time-varying optical polarization relative to the quantization axis. These problems can be overcome using stroboscopic techniques, where either a radio-frequency field or a laser is applied in pulses that are synchronized to the rotating field. Using these methods, the magnetic field can be characterized with a precision of 10 mG and light can be applied with a polarization error of $5\times 10^{-5}$.
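
The stroboscopic idea in the abstract above can be illustrated with a small sketch. It assumes an idealized field that rotates uniformly in the x-y plane and a beam fixed along x; the rotation frequency, pulse timing, and function names are hypothetical and are not taken from the paper. If pulses are spaced by exactly one rotation period, every pulse sees the field at the same angle to the beam.

```python
import math

def field_direction(t, rotation_hz):
    """Unit vector of an idealized field rotating in the x-y plane (assumption)."""
    phase = 2.0 * math.pi * rotation_hz * t
    return (math.cos(phase), math.sin(phase), 0.0)

def strobe_times(first_pulse, rotation_hz, pulses):
    """Pulse times spaced by exactly one rotation period."""
    period = 1.0 / rotation_hz
    return [first_pulse + k * period for k in range(pulses)]

def angle_to_beam(t, rotation_hz, beam=(1.0, 0.0, 0.0)):
    """Angle in radians between the rotating field and a fixed beam axis."""
    b = field_direction(t, rotation_hz)
    dot = sum(p * q for p, q in zip(b, beam))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot)))

# Hypothetical numbers: 10 kHz rotation, first pulse a quarter period in.
times = strobe_times(2.5e-5, 10_000.0, 5)
angles = [angle_to_beam(t, 10_000.0) for t in times]
```

With unsynchronized light the same angle would sweep through a full turn every period, which is the time-varying polarization the abstract refers to.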

翻訳日:2023-05-28 19:55:03 公開日:2020-03-18

# 量子イジングリングにおける縦磁化ダイナミクス:運動量空間と実空間の対応に基づくパフィアン法

Longitudinal magnetization dynamics in the quantum Ising ring: A Pfaffian method based on correspondence between momentum space and real space ( http://arxiv.org/abs/2001.00511v2 )

ライセンス: Link先を確認

Ning Wu

(参考訳) おそらく最も研究されている量子相転移のパラダイムとして、周期的量子イジング鎖はジョルダン・ウィグナー変換によって正確に解くことができ、続いてスピンレスフェルミオンの運動量空間でモデルを対角化するフーリエ変換が続く。上記の手順はよく知られているが、量子イジング環の実空間と運動量空間の表現、特にフェルミオンパリティに関する対応に関して、いくつかの微妙な点がある。本研究では、実空間における2つの完全整列強磁性状態と古典的イジング環の2つの退化運動量空間基底状態との関係を定め、前者はフラストレーションのない超曲面上のより一般的なXYZモデルの分解基底状態の特別な場合である。この観察に基づいて, 2つの強磁性状態のうちの1つと, 並進不変な駆動下で作製した系を用いて, パリティ破断した縦磁化のリアルタイムダイナミクスを計算するためのファフィアン公式を提供する。この形式主義は、パフィアンの数値計算のためのオンラインプログラムの助けを借りてシステムに適用できるため、例えば関連するシステムにおける離散時間結晶の出現を数値的に研究するための効率的な手法を提供する。

As perhaps the most studied paradigm for a quantum phase transition, the periodic quantum Ising chain is exactly solvable via the Jordan-Wigner transformation followed by a Fourier transform that diagonalizes the model in the momentum space of spinless fermions. Although the above procedures are well-known, there remain some subtle points to be clarified regarding the correspondence between the real-space and momentum-space representations of the quantum Ising ring, especially those related to fermion parities. In this work, we establish the relationship between the two fully aligned ferromagnetic states in real space and the two degenerate momentum-space ground states of the classical Ising ring, with the former being a special case of the factorized ground states of the more general XYZ model on the frustration-free hypersurface. Based on this observation, we then provide a Pfaffian formula for calculating real-time dynamics of the parity-breaking longitudinal magnetization with the system initially prepared in one of the two ferromagnetic states and under translationally invariant drivings. The formalism is shown to be applicable to systems with the help of online programs for the numerical computation of the Pfaffian, thus providing an efficient method to numerically study, for example, the emergence of discrete time crystals in related systems.
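
Since the method above rests on evaluating Pfaffians numerically, a minimal sketch of what a Pfaffian is may help. The function below uses the textbook recursive expansion along the first row of a skew-symmetric matrix; it is not the routine referred to in the abstract, and its cost grows factorially, so it is only suitable for very small matrices. Practical codes use elimination-based algorithms with polynomial cost. The function name is an assumption.

```python
def pfaffian(a):
    """Pfaffian of a skew-symmetric matrix given as a list of lists.

    Recursive expansion along row 0:
    pf(A) = sum_j (-1)^(j-1) * a[0][j] * pf(A with rows/columns 0 and j removed).
    """
    n = len(a)
    if n == 0:
        return 1
    if n % 2 == 1:
        return 0  # odd-dimensional skew-symmetric matrices have zero Pfaffian
    total = 0
    for j in range(1, n):
        if a[0][j] == 0:
            continue
        keep = [k for k in range(1, n) if k != j]
        minor = [[a[r][c] for c in keep] for r in keep]
        total += (-1) ** (j - 1) * a[0][j] * pfaffian(minor)
    return total

# A 4x4 example: pf = a01*a23 - a02*a13 + a03*a12.
A = [
    [0, 1, 2, 3],
    [-1, 0, 4, 5],
    [-2, -4, 0, 6],
    [-3, -5, -6, 0],
]
value = pfaffian(A)
```

A useful sanity check for any implementation is the identity $\mathrm{pf}(A)^2 = \det(A)$.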

翻訳日:2023-01-16 04:50:55 公開日:2020-03-18

# 進化的ニューラルアーキテクチャによる網膜血管セグメンテーションの探索

Evolutionary Neural Architecture Search for Retinal Vessel Segmentation ( http://arxiv.org/abs/2001.06678v3 )

ライセンス: Link先を確認

Zhun Fan, Jiahong Wei, Guijie Zhu, Jiajie Mo, Wenji Li

(参考訳) 正確な網膜血管セグメンテーション(RVS)は、眼科疾患やその他の全身疾患の診断において医師を支援する上で非常に重要である。網膜血管セグメンテーションのための有効なニューラルネットワークアーキテクチャを手作業で設計するには、高度な専門知識と大きなワークロードが必要です。血管セグメンテーションの性能を改善し,手動で設計するニューラルネットワークの作業量を削減するために,網膜血管セグメンテーションのためのエンコーダデコーダアーキテクチャを最適化するためのニューラルネットワーク探索(NAS)を適用した新しいアプローチを提案する。改良進化アルゴリズムは、限られた計算資源でエンコーダ・デコーダ・フレームワークのアーキテクチャを発展させるために用いられる。提案手法により得られた進化的モデルは,DRIVE, STARE, CHASE_DB1 という3つのデータセットで比較した手法の上位性能を実現するが,パラメータははるかに少ない。さらに, クロストレーニングの結果, 進化したモデルはかなりのスケーラビリティを示し, 臨床疾患診断の可能性も示唆した。

Accurate retinal vessel segmentation (RVS) is of great significance in assisting doctors with the diagnosis of ophthalmic diseases and other systemic diseases. Manually designing a valid neural network architecture for retinal vessel segmentation requires high expertise and a large workload. To improve the performance of vessel segmentation and reduce the workload of manually designing neural networks, we propose a novel approach that applies neural architecture search (NAS) to optimize an encoder-decoder architecture for retinal vessel segmentation. A modified evolutionary algorithm is used to evolve the architectures of the encoder-decoder framework with limited computing resources. The evolved model obtained by the proposed approach achieves top performance among all compared methods on three datasets, namely DRIVE, STARE and CHASE_DB1, but with far fewer parameters. Moreover, the results of cross-training show that the evolved model has considerable scalability, which indicates great potential for clinical disease diagnosis.

翻訳日:2023-01-10 05:30:56 公開日:2020-03-18

# 対象関数は、十分広いランダムネットワークの近傍に存在する:幾何学的視点

Any Target Function Exists in a Neighborhood of Any Sufficiently Wide Random Network: A Geometrical Perspective ( http://arxiv.org/abs/2001.06931v2 )

ライセンス: Link先を確認

Shun-ichi Amari

(参考訳) 任意の対象関数は、幅(層内のニューロン数)が十分に大きい場合、ランダムに接続された任意のディープネットワークの十分小さな近傍で実現されることが知られている。この顕著な事実については洗練された理論や議論があるが、厳密な理論は非常に複雑である。構造を解明するために単純なモデルを用いて基本的な幾何学的証明を与える。半径 1 の高次元球面を低次元部分空間に投影すると、球面上の一様分布は無視できるほど小さな共分散のガウス分布に還元される。

It is known that any target function is realized in a sufficiently small neighborhood of any randomly connected deep network, provided the width (the number of neurons in a layer) is sufficiently large. There are sophisticated theories and discussions concerning this striking fact, but rigorous theories are very complicated. We give an elementary geometrical proof by using a simple model for the purpose of elucidating its structure. We show that high-dimensional geometry plays a magical role: When we project a high-dimensional sphere of radius 1 to a low-dimensional subspace, the uniform distribution over the sphere reduces to a Gaussian distribution of negligibly small covariances.
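
The geometric statement at the end of the abstract above can be checked numerically. The sketch below draws points uniformly from the unit sphere in $n$ dimensions (by normalizing standard Gaussian vectors) and looks at a single coordinate, i.e. a projection onto a one-dimensional subspace. By symmetry each coordinate has variance exactly $1/n$, so for large $n$ the projection is tightly concentrated. The function names and sample sizes are illustrative assumptions, not part of the paper.

```python
import math
import random

def sample_on_unit_sphere(n, rng):
    """Uniform sample on the unit sphere in R^n via a normalized Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]

def projected_variance(n, samples, rng):
    """Empirical variance of the first coordinate of uniform sphere samples."""
    xs = [sample_on_unit_sphere(n, rng)[0] for _ in range(samples)]
    mean = sum(xs) / samples
    return sum((x - mean) ** 2 for x in xs) / samples

rng = random.Random(0)
n = 100
var = projected_variance(n, 2000, rng)
# n * var should be close to 1, reflecting a variance of about 1/n.
scaled = n * var
```

Increasing `n` shrinks the variance proportionally, which is the sense in which the projected distribution has negligibly small covariance for wide networks.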

翻訳日:2023-01-08 05:04:34 公開日:2020-03-18

# 深部ニューラルネットワークのための学習後線形量子化

Post-Training Piecewise Linear Quantization for Deep Neural Networks ( http://arxiv.org/abs/2002.00104v2 )

ライセンス: Link先を確認

Jun Fang, Ali Shafiee, Hamzah Abdel-Aziz, David Thorsley, Georgios Georgiadis, Joseph Hassoun

(参考訳) リソース制限されたデバイスへのディープニューラルネットワークのエネルギー効率向上において、量子化は重要な役割を果たす。トレーニング後の量子化は、完全なトレーニングデータセットの再トレーニングやアクセスを必要としないため、非常に望ましい。ニューラルネットワークを完全精度から8ビットの固定点整数に変換することにより、学習後量子化のための確立された均一なスキームが良好な結果を得る。しかし、ビット幅の量子化では性能が著しく低下する。本稿では,長尾のベル型分布を持つテンソル値の高精度近似を実現するために,区分線形量子化(pwlq)スキームを提案する。提案手法では、量子化範囲全体をテンソル毎に重複しない領域に分割し、各領域に等数の量子化レベルを割り当てる。範囲全体を分割する最適なブレークポイントは、量子化誤差を最小化する。実験結果から,提案手法は画像分類,セマンティックセグメンテーション,オブジェクト検出において,少ないオーバーヘッドで優れた性能を発揮することが示された。

Quantization plays an important role in the energy-efficient deployment of deep neural networks on resource-limited devices. Post-training quantization is highly desirable since it does not require retraining or access to the full training dataset. The well-established uniform scheme for post-training quantization achieves satisfactory results by converting neural networks from full-precision to 8-bit fixed-point integers. However, it suffers from significant performance degradation when quantizing to lower bit-widths. In this paper, we propose a piecewise linear quantization (PWLQ) scheme to enable accurate approximation for tensor values that have bell-shaped distributions with long tails. Our approach breaks the entire quantization range into non-overlapping regions for each tensor, with each region being assigned an equal number of quantization levels. Optimal breakpoints that divide the entire range are found by minimizing the quantization error. Compared to state-of-the-art post-training quantization methods, experimental results show that our proposed method achieves superior performance on image classification, semantic segmentation, and object detection with minor overhead.
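
The two-region idea described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' implementation: values in $[-m, m]$ are split at a single breakpoint $p$ into a dense centre $|x| \le p$ and a sparse tail $p < |x| \le m$, each region gets the same number of magnitude levels, and $p$ is chosen by a plain grid search on the mean squared error. All function names, the grid search, and the synthetic long-tailed data are hypothetical.

```python
import random

def uniform_quantize(x, lo, hi, levels):
    """Round x to the nearest of `levels` evenly spaced points in [lo, hi]."""
    step = (hi - lo) / (levels - 1)
    x = min(max(x, lo), hi)
    return lo + round((x - lo) / step) * step

def pwlq_quantize(x, p, m, levels):
    """Quantize the magnitude in the centre [0, p] or the tail (p, m]; keep the sign."""
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), m)
    if a <= p:
        return sign * uniform_quantize(a, 0.0, p, levels)
    return sign * uniform_quantize(a, p, m, levels)

def mse(values, quantizer):
    """Mean squared quantization error of `quantizer` over `values`."""
    return sum((v - quantizer(v)) ** 2 for v in values) / len(values)

def best_breakpoint(values, m, levels, candidates=32):
    """Grid search for the breakpoint with the smallest quantization error."""
    grid = [m * k / candidates for k in range(1, candidates)]
    return min(grid, key=lambda p: mse(values, lambda v: pwlq_quantize(v, p, m, levels)))

# Synthetic bell-shaped, long-tailed data (Laplace-like), purely for illustration.
rng = random.Random(0)
weights = [rng.choice((-1.0, 1.0)) * rng.expovariate(1.0) for _ in range(2000)]
m = max(abs(w) for w in weights)
p = best_breakpoint(weights, m, levels=8)
err_piecewise = mse(weights, lambda v: pwlq_quantize(v, p, m, 8))
# Uniform baseline with at least as many representable values.
err_uniform = mse(weights, lambda v: uniform_quantize(v, -m, m, 32))
```

The design choice to illustrate is that a finer step near zero, where most values lie, is paid for with a coarser step in the rarely populated tail.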

翻訳日:2023-01-05 06:04:22 公開日:2020-03-18

# ml-misfit:機械学習を用いたフルウェーブフォームインバージョンのためのロバストミスフィット関数の学習

ML-misfit: Learning a robust misfit function for full-waveform inversion using machine learning ( http://arxiv.org/abs/2002.03163v2 )

ライセンス: Link先を確認

Bingbing Sun and Tariq Alkhalifah

(参考訳) フル波形インバージョン(fwi)用の利用可能なadvanced misfit関数のほとんどは手作りであり、これらのmisfit関数のパフォーマンスはデータ依存である。そこで本研究では,fwi の ml-misfit というミスフィット関数を機械学習に基づいて学習することを提案する。マッチングフィルタの最適輸送にインスパイアされ、2つの分布の平均と分散を比較するのに類似した形で、不適合関数のためのニューラルネットワーク(NN)アーキテクチャを設計する。その結果得られたミスフィットがメトリックであることを保証するために、入力に対するミスフィットの対称性と「三角不等式」規則を満たすメタ損失関数におけるヒンジ損失正規化項を満足する。メタラーニングの枠組みでは、FWIを実行してネットワークをトレーニングし、ランダムに生成された速度モデルを逆転させ、真のモデルと逆モデルの累積差として定義されるメタロスを最小化してNNのパラメータを更新する。まず,移動時シフト信号に対する凸不適合関数を学習するためのMLミスフィットの基本原理を説明する。さらに,2次元水平層モデル上でNNを訓練し,よく知られたMarmousiモデルに適用することにより,学習したMLミスフィットの有効性と堅牢性を示す。

Most of the available advanced misfit functions for full waveform inversion (FWI) are hand-crafted, and the performance of those misfit functions is data-dependent. Thus, we propose to learn a misfit function for FWI, entitled ML-misfit, based on machine learning. Inspired by the optimal transport of the matching filter misfit, we design a neural network (NN) architecture for the misfit function in a form similar to comparing the mean and variance for two distributions. To guarantee that the resulting learned misfit is a metric, we accommodate the symmetry of the misfit with respect to its input and a hinge loss regularization term in a meta-loss function to satisfy the "triangle inequality" rule. In the framework of meta-learning, we train the network by running FWI to invert for randomly generated velocity models and update the parameters of the NN by minimizing the meta-loss, which is defined as the accumulated difference between the true and inverted models. We first illustrate the basic principle of the ML-misfit for learning a convex misfit function for travel-time shifted signals. Further, we train the NN on 2D horizontally layered models, and we demonstrate the effectiveness and robustness of the learned ML-misfit by applying it to the well-known Marmousi model.
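
The two metric conditions mentioned above, symmetry and the triangle inequality, can be made concrete with a small sketch. It assumes a generic distance-like function `d` on scalars; the hinge form and the helper names are illustrative assumptions and do not reproduce the paper's network or meta-loss. The penalty is zero when $d(a, c) \le d(a, b) + d(b, c)$ and grows linearly with the violation otherwise.

```python
def hinge(x):
    """Standard hinge: zero for non-positive input, identity otherwise."""
    return max(0.0, x)

def triangle_penalty(d, a, b, c):
    """Hinge penalty on violations of d(a, c) <= d(a, b) + d(b, c)."""
    return hinge(d(a, c) - d(a, b) - d(b, c))

def symmetrized(f):
    """Build a symmetric function from f by averaging both argument orders."""
    return lambda a, b: 0.5 * (f(a, b) + f(b, a))

# The squared difference violates the triangle inequality; the absolute difference does not.
squared = lambda a, b: (a - b) ** 2
absolute = lambda a, b: abs(a - b)
penalty_squared = triangle_penalty(squared, 0.0, 1.0, 2.0)
penalty_absolute = triangle_penalty(absolute, 0.0, 1.0, 2.0)
```

In a training loop, a term of this kind would be averaged over sampled triples and added to the main loss with a weighting factor.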

翻訳日:2023-01-02 23:08:47 公開日:2020-03-18

# 不変リスク最小化ゲーム

Invariant Risk Minimization Games ( http://arxiv.org/abs/2002.04692v2 )

ライセンス: Link先を確認

Kartik Ahuja, Karthikeyan Shanmugam, Kush R. Varshney, Amit Dhurandhar

(参考訳) 機械学習の標準的なリスク最小化パラダイムは、スプリアス相関によるトレーニング分布とテスト分布が異なる環境での運用では不安定である。多くの環境からのデータのトレーニングと不変な予測器の発見は、結果と因果関係を持つ特徴にモデルを集中させることで、刺激的な特徴の効果を減らす。本研究では,複数の環境においてアンサンブルゲームのナッシュ平衡を求めるような不変リスク最小化を行う。そこで我々は,最良の応答ダイナミクスを用いた簡易な学習アルゴリズムを開発し,実験では,arjovsky et al.(2019)の挑戦的2レベル最適化問題よりも,非常に低い分散で類似または優れた経験的精度を与える。 1つの重要な理論的貢献は、提案されたゲームに対するナッシュ均衡の集合が、非線形分類器や変換でさえ、任意の有限個の環境に対する不変予測子の集合と等価であることを示すことである。その結果、この手法はArjovsky et al. (2019) に示される大きな環境の集合に対する一般化保証も維持する。提案アルゴリズムは, 生成逆数ネットワークなどのゲーム理論機械学習アルゴリズムの収集に成功した。

The standard risk minimization paradigm of machine learning is brittle when operating in environments whose test distributions are different from the training distribution due to spurious correlations. Training on data from many environments and finding invariant predictors reduces the effect of spurious features by concentrating models on features that have a causal relationship with the outcome. In this work, we pose such invariant risk minimization as finding the Nash equilibrium of an ensemble game among several environments. By doing so, we develop a simple training algorithm that uses best response dynamics and, in our experiments, yields similar or better empirical accuracy with much lower variance than the challenging bi-level optimization problem of Arjovsky et al. (2019). One key theoretical contribution is showing that the set of Nash equilibria for the proposed game is equivalent to the set of invariant predictors for any finite number of environments, even with nonlinear classifiers and transformations. As a result, our method also retains the generalization guarantees to a large set of environments shown in Arjovsky et al. (2019). The proposed algorithm adds to the collection of successful game-theoretic machine learning algorithms such as generative adversarial networks.

翻訳日:2023-01-02 01:37:44 公開日:2020-03-18

# 人物画像生成のための深部画像空間変換

Deep Image Spatial Transformation for Person Image Generation ( http://arxiv.org/abs/2003.00696v2 )

ライセンス: Link先を確認

Yurui Ren, Xiaoming Yu, Junming Chen, Thomas H. Li, Ge Li

(参考訳) ポーズ誘導人物画像生成は、対象人物画像から対象人物画像への変換である。このタスクはソースデータの空間的操作を必要とする。しかし、畳み込みニューラルネットワークは、入力を空間的に変換する能力の欠如によって制限される。本稿では,インプットを機能レベルで再アセンブルするための微分可能なグローバルフローローカルアテンションフレームワークを提案する。具体的には、まずソースとターゲットのグローバルな相関を計算し、流れ場を予測する。そして、特徴地図からフローした局所パッチ対を抽出して局所注意係数を算出する。最後に,得られた局所的注意係数を用いたコンテンツ認識サンプリング手法を用いて,ソース特性を警告する。主観的および客観的実験の結果から,モデルの優越性が示された。さらに,映像アニメーションとビュー合成のさらなる結果は,我々のモデルは空間変換を必要とする他のタスクに適用可能であることを示している。ソースコードはhttps://github.com/RenYurui/Global-Flow-Local-Attentionで公開しています。

Pose-guided person image generation is to transform a source person image to a target pose. This task requires spatial manipulations of source data. However, Convolutional Neural Networks are limited by the lack of ability to spatially transform the inputs. In this paper, we propose a differentiable global-flow local-attention framework to reassemble the inputs at the feature level. Specifically, our model first calculates the global correlations between sources and targets to predict flow fields. Then, the flowed local patch pairs are extracted from the feature maps to calculate the local attention coefficients. Finally, we warp the source features using a content-aware sampling method with the obtained local attention coefficients. The results of both subjective and objective experiments demonstrate the superiority of our model. Besides, additional results in video animation and view synthesis show that our model is applicable to other tasks requiring spatial transformation. Our source code is available at https://github.com/RenYurui/Global-Flow-Local-Attention.

翻訳日:2022-12-27 04:13:03 公開日:2020-03-18

# GenNet : ジェネレーションとセレクションモデルを用いた複数選択質問の読解

GenNet : Reading Comprehension with Multiple Choice Questions using Generation and Selection model ( http://arxiv.org/abs/2003.04360v2 )

ライセンス: Link先を確認

Vaishali Ingale, Pushpender Singh

(参考訳) 複数選択機械読解は, 与えられた項目と質問項目から正しい選択肢を選択するために必要な機械として困難な作業であり, 複数選択問合せタスクによる理解を読み, 与えられた項目, 質問ペア, 与えられた選択肢から最適な選択肢を選択するための人間(または機械)を検索する。与えられた節から正しい答えを選択するには2つの異なる方法がある。最悪の解答を排除して、ベストマッチの解答を選択する。本稿では、ニューラルネットワークベースのモデルであるGenNetモデルを提案する。このモデルでは、まずその文から質問の答えを生成し、それから生成された回答と与えられた回答とを一致させる。回答生成にはS-net(Tan et al., 2017)モデルをSQuADでトレーニングし,そのモデルを評価するために大規模RAS(ReAding Comprehension Dataset From Examinations)(Lai et al., 2017)を使用しました。

Multiple-choice machine reading comprehension is a difficult task because it requires machines to select the correct option from a set of candidate options using the given passage and question. The Reading Comprehension with Multiple Choice Questions task requires a human (or machine) to read a given (passage, question) pair and select the best option from n given options. There are two ways to select the correct answer from the given passage: either by selecting the best-matching answer or by eliminating the worst-matching answers. Here we propose the GenNet model, a neural network-based model. In this model, we first generate the answer to the question from the passage and then match the generated answer against the given options; the best-matched option is our answer. For answer generation we used the S-net (Tan et al., 2017) model trained on SQuAD, and to evaluate our model we used the large-scale RACE (ReAding Comprehension Dataset From Examinations) dataset (Lai et al., 2017).
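
The selection step of a generate-then-select pipeline can be sketched as follows. The paper uses neural models for both generation and matching; the token-overlap score below is only a simple stand-in chosen for illustration, and the function names and example strings are hypothetical. The generated answer is assumed to be available already as plain text.

```python
import re

def tokens(text):
    """Lowercased alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlap_score(generated, option):
    """Jaccard overlap between the token sets of two strings."""
    g, o = tokens(generated), tokens(option)
    if not g or not o:
        return 0.0
    return len(g & o) / len(g | o)

def select_option(generated_answer, options):
    """Index of the option that best matches the generated answer."""
    scores = [overlap_score(generated_answer, opt) for opt in options]
    return max(range(len(options)), key=scores.__getitem__)

# Hypothetical generated answer and candidate options.
choice = select_option(
    "in the spring of 1912",
    ["during the winter", "in spring 1912", "nobody knows"],
)
```

Swapping `overlap_score` for a learned similarity keeps the same interface while moving closer to the neural matching described in the abstract.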

翻訳日:2022-12-26 21:50:20 公開日:2020-03-18

# 自己教師付き時間領域適応による動作セグメンテーション

Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation ( http://arxiv.org/abs/2003.02824v3 )

ライセンス: Link先を確認

Min-Hung Chen, Baopu Li, Yingze Bao, Ghassan AlRegib, Zsolt Kira

(参考訳) 完全教師付きアクションセグメンテーション技術の最近の進歩にもかかわらず、パフォーマンスはまだ完全には満足できない。主な課題は時空間変動の問題である(例えば、異なる人が様々な方法で同じ活動をすることができる)。そこで本稿では,非ラベル映像を用いて,時空間変動による領域差を伴うクロスドメイン問題としてアクションセグメンテーションタスクを再構成し,この問題に対処する。そこで本研究では,局所的および大域的な時間的ダイナミクスを組み込んだクロスドメイン特徴空間を協調的に整列させ,他のドメイン適応(da)手法よりも優れた性能を実現するために,自己教師付き時間領域適応(sstda)を提案する。 3つの挑戦的なベンチマークデータセット(GTEA、50Salads、およびBreakfast)において、SSTDAは、現在の最先端の手法を大きなマージン(例えば、F1@25スコア、59.6%から69.1%、Breakfastスコア、50Saladsの73.4%から81.5%、GTEAの83.6%から89.1%)で上回り、ラベル付きトレーニングデータの65%に匹敵するパフォーマンスを保っている。ソースコードはhttps://github.com/cmhungsteve/SSTDAで入手できる。

Despite the recent progress of fully-supervised action segmentation techniques, the performance is still not fully satisfactory. One main challenge is the problem of spatiotemporal variations (e.g. different people may perform the same activity in various ways). Therefore, we exploit unlabeled videos to address this problem by reformulating the action segmentation task as a cross-domain problem with domain discrepancy caused by spatio-temporal variations. To reduce the discrepancy, we propose Self-Supervised Temporal Domain Adaptation (SSTDA), which contains two self-supervised auxiliary tasks (binary and sequential domain prediction) to jointly align cross-domain feature spaces embedded with local and global temporal dynamics, achieving better performance than other Domain Adaptation (DA) approaches. On three challenging benchmark datasets (GTEA, 50Salads, and Breakfast), SSTDA outperforms the current state-of-the-art method by large margins (e.g. for the F1@25 score, from 59.6% to 69.1% on Breakfast, from 73.4% to 81.5% on 50Salads, and from 83.6% to 89.1% on GTEA), and requires only 65% of the labeled training data for comparable performance, demonstrating the usefulness of adapting to unlabeled target videos across variations. The source code is available at https://github.com/cmhungsteve/SSTDA.

翻訳日:2022-12-26 06:59:32 公開日:2020-03-18

# 機械学習バイアスの半自動検出のための設計ツール:インタビュー研究

Designing Tools for Semi-Automated Detection of Machine Learning Biases: An Interview Study ( http://arxiv.org/abs/2003.07680v2 )

ライセンス: Link先を確認

Po-Ming Law, Sana Malik, Fan Du, Moumita Sinha

(参考訳) 機械学習モデルは、入力データの特定のサブグループに対するバイアスを予測します。検出されない場合、機械学習のバイアスは、重要な経済的および倫理的影響を構成する可能性がある。ループに人間を巻き込む半自動ツールは、バイアス検出を容易にする。しかし、その設計にかかわる考慮事項についてはほとんど分かっていない。本稿では,機械学習の実践者11人とのインタビューで,半自動バイアス検出ツールに関するニーズを調査した。この結果に基づき,バイアス検出のための将来のツール開発を目指すシステム設計者の指導を行うための設計上の4つの考察を強調する。

Machine learning models often make predictions that are biased against certain subgroups of input data. When undetected, machine learning biases can have significant financial and ethical implications. Semi-automated tools that involve humans in the loop could facilitate bias detection. Yet, little is known about the considerations involved in their design. In this paper, we report on an interview study with 11 machine learning practitioners that investigated the needs surrounding semi-automated bias detection tools. Based on the findings, we highlight four design considerations to guide system designers who aim to create future tools for bias detection.

翻訳日:2022-12-24 01:31:39 公開日:2020-03-18

# 集中治療における敗血症治療の最適化 : 強化学習から治療前評価まで

Optimizing Medical Treatment for Sepsis in Intensive Care: from Reinforcement Learning to Pre-Trial Evaluation ( http://arxiv.org/abs/2003.06474v2 )

ライセンス: Link先を確認

Luchen Li, Ignacio Albert-Smet, and Aldo A. Faisal

(参考訳) 本研究の目的は,インベンションを最適化する強化学習(rl)が,臨床展開における学習方針の今後の臨床検査への規制に準拠する経路を,遡及的に確立することにある。我々は, 複雑で不透明な患者動態が原因で治療が困難である集中治療室における感染症と, 個々の患者が必要とする治療方針について, 臨床的に議論され, 高度に分断され, かつ集中治療室は自然にデータに富んでいることに焦点を当てた。本研究は、医療におけるRLアプローチ(AI臨床医)の構築と、部分的に観察可能なMDP(POMDP)の下での歴史的集中治療データを用いた敗血症治療のための医薬の非政治的服用方針を学習する。 POMPDは、すべての歴史的情報を取り込み、効率的な表現をもたらすことで、患者の状態の不確実性をよりよく捉えます。最優先のツリーサーチによって、遭遇した各状態を評価することで、振り返りデータにおける探索の欠如を補う。我々は, 臨床医の複合政策近傍の政策を最適化することで, 状態分布の変化を緩和する。要は,従来の政策評価だけでなく,臨床医の意思決定の正確さと不確実性を評価するための,モデルに依存しない事前臨床評価手法を,同一の患者履歴に直面する場合のシステムレコメンデーション(シャドウモード)と比較した。

Our aim is to establish a framework in which retrospectively optimizing interventions with reinforcement learning (RL) provides a regulatory-compliant pathway to prospective clinical testing of the learned policies in a clinical deployment. We focus on infections in intensive care units, which are one of the major causes of death and difficult to treat because of the complex and opaque patient dynamics and the clinically debated, highly divergent set of intervention policies required by each individual patient; yet intensive care units are naturally data rich. In our work, we build on RL approaches in healthcare ("AI Clinicians"), and learn an off-policy continuous dosing policy of pharmaceuticals for sepsis treatment using historical intensive care data under partially observable MDPs (POMDPs). POMDPs capture uncertainty in patient state better by taking in all historical information, yielding an efficient representation, which we investigate through ablations. We compensate for the lack of exploration in our retrospective data by evaluating each encountered state with a best-first tree search. We mitigate state distributional shift by optimizing our policy in the vicinity of the clinicians' compound policy. Crucially, we evaluate our model recommendations using not only conventional policy evaluations but a novel framework that incorporates human experts: a model-agnostic pre-clinical evaluation method to estimate the accuracy and uncertainty of clinician's decisions versus our system recommendations when confronted with the same individual patient history ("shadow mode").

翻訳日:2022-12-24 01:05:55 公開日:2020-03-18

# SDNを用いたモノのインターネットのためのSOM型DDoS防御機構

SOM-based DDoS Defense Mechanism using SDN for the Internet of Things ( http://arxiv.org/abs/2003.06834v2 )

ライセンス: Link先を確認

Yunfei Meng, Zhiqiu Huang, Senzhang Wang, Guohua Shen, Changbo Ke

(参考訳) 本稿では,モノのインターネットに対するセキュリティの脅威に効果的に取り組むために,ソフトウェア定義ネットワーク(SDN)を用いたSOMベースのDDoS防御機構を提案する。このメカニズムの主な考え方は、物のインターネットにおけるデバイスサービスを保護するためにSDNベースのゲートウェイをデプロイすることだ。ゲートウェイは、somニューラルネットワークに基づくddos防御メカニズムを提供する。 SOMベースのDDoS防御機構により、ゲートウェイはIoT内の悪意あるセンサーデバイスを効果的に識別し、検出した後にそれらの悪意のあるデバイスを自動的にブロックし、DDoS攻撃を受けた場合のシステムのセキュリティと堅牢性を効果的に強化することができる。この機構の有効性と有効性を検証するため,実験システムの実装にはPOXコントローラとミニネットエミュレータを使用し,さらに前述のセキュリティ対策機構をPythonで実装する。最後の実験結果は、異なるテストシナリオでメカニズムが本当に効果的であることを示している。

To effectively tackle the security threats to the Internet of Things (IoT), we propose a SOM-based DDoS defense mechanism using software-defined networking (SDN) in this paper. The main idea of the mechanism is to deploy an SDN-based gateway to protect the device services in the IoT. The gateway provides a DDoS defense mechanism based on a SOM neural network. By means of the SOM-based DDoS defense mechanism, the gateway can effectively identify the malicious sensing devices in the IoT and automatically block them once detected, so that it can strengthen the security and robustness of the system when it is under DDoS attack. To validate the feasibility and effectiveness of the mechanism, we use the POX controller and the Mininet emulator to implement an experimental system, and further implement the aforementioned security enforcement mechanisms in Python. The final experimental results illustrate that the mechanism is effective across the different test scenarios.

翻訳日:2022-12-23 08:45:59 公開日:2020-03-18

# アフリカ小児てんかん患者における低磁場脳mriのコントラストと分解能の向上

Image Quality Transfer Enhances Contrast and Resolution of Low-Field Brain MRI in African Paediatric Epilepsy Patients ( http://arxiv.org/abs/2003.07216v2 )

ライセンス: Link先を確認

Matteo Figini (1), Hongxiang Lin (1), Godwin Ogbole (2), Felice D Arco (3), Stefano B. Blumberg (1), David W. Carmichael (4 and 5), Ryutaro Tanno (1 and 6), Enrico Kaden (1 and 4), Biobele J. Brown (7), Ikeoluwa Lagunju (7), Helen J. Cross (3 and 4), Delmiro Fernandez-Reyes (1 and 7), Daniel C. Alexander (1) ((1) Centre for Medical Image Computing and Department of Computer Science - University College London - UK, (2) Department of Radiology - College of Medicine - University of Ibadan - Nigeria, (3) Great Ormond Street Hospital for Children - London - UK, (4) UCL Great Ormond Street Institute of Child Health - London - UK, (5) Department of Biomedical Engineering - Kings College London - UK, (6) Machine Intelligence and Perception Group - Microsoft Research Cambridge - UK, (7) Department of Paediatrics - College of Medicine - University of Ibadan - Nigeria)

(参考訳) 1.5tまたは3tスキャナは、現在の臨床mriの標準であるが、低磁場の(<1t)スキャナは、コストと停電に対する堅牢性のために、多くの低所得国で依然として一般的である。現代の高磁場スキャナと比較すると、低磁場スキャナーは同等の解像度で信号と雑音の比が低い画像を提供し、実践者は大きなスライス厚さと不完全な空間カバレッジを用いて補償する。さらに、異なる種類の脳組織間のコントラストは、診断値を制限する等信号対雑音比でも著しく減少する可能性がある。近年,1.5T画像や3T画像の解像度,空間被覆,コントラストの近似を目的とした0.36T画像の高精細化のために,画像品質転送のパラダイムが適用されている。ニューラルネットワークU-Netの亜種は、公開されている3T Human Connectome Projectデータセットからシミュレーションされたローフィールドイメージを使用してトレーニングされた。今回我々は,手軽に手軽に利用できる低磁場MRIのてんかん管理における臨床的有用性を高めるために,IQTの有用性を示すリアルおよびシミュレートされた臨床低磁場脳画像の質的結果を示す。

1.5T or 3T scanners are the current standard for clinical MRI, but low-field (<1T) scanners are still common in many lower- and middle-income countries for reasons of cost and robustness to power failures. Compared to modern high-field scanners, low-field scanners provide images with lower signal-to-noise ratio at equivalent resolution, leaving practitioners to compensate by using large slice thickness and incomplete spatial coverage. Furthermore, the contrast between different types of brain tissue may be substantially reduced even at equal signal-to-noise ratio, which limits diagnostic value. Recently the paradigm of Image Quality Transfer has been applied to enhance 0.36T structural images aiming to approximate the resolution, spatial coverage, and contrast of typical 1.5T or 3T images. A variant of the neural network U-Net was trained using low-field images simulated from the publicly available 3T Human Connectome Project dataset. Here we present qualitative results from real and simulated clinical low-field brain images showing the potential value of IQT to enhance the clinical utility of readily accessible low-field MRIs in the management of epilepsy.

翻訳日:2022-12-23 04:09:30 公開日:2020-03-18

# ギリシア語における攻撃的言語識別

Offensive Language Identification in Greek ( http://arxiv.org/abs/2003.07459v2 )

ライセンス: Link先を確認

Zeses Pitenis, Marcos Zampieri, Tharindu Ranasinghe

(参考訳) オンラインコミュニティやソーシャルメディアプラットフォームでは、攻撃的言語が問題になってきたため、研究者は乱暴なコンテンツに対処する方法や、サイバーいじめ、ヘイトスピーチ、攻撃など、さまざまなタイプを検出するシステムの開発を行っている。特筆すべき例外がいくつかあるが、この話題に関するほとんどの研究は英語を扱っている。これは主に英語の言語リソースが利用できるためである。この欠点に対処するため,攻撃的言語識別のための最初のギリシャの注釈付きデータセットであるOGTDを提案する。 OGTDは、Twitterから4,779件の投稿が攻撃的であり、攻撃的ではないという手動の注釈付きデータセットである。データセットの詳細な説明とともに、このデータに基づいてトレーニングおよびテストされたいくつかの計算モデルを評価する。

As offensive language has become a rising issue for online communities and social media platforms, researchers have been investigating ways of coping with abusive content and developing systems to detect its different types: cyberbullying, hate speech, aggression, etc. With a few notable exceptions, most research on this topic so far has dealt with English. This is mostly due to the availability of language resources for English. To address this shortcoming, this paper presents the first Greek annotated dataset for offensive language identification: the Offensive Greek Tweet Dataset (OGTD). OGTD is a manually annotated dataset containing 4,779 posts from Twitter annotated as offensive and not offensive. Along with a detailed description of the dataset, we evaluate several computational models trained and tested on this data.

翻訳日:2022-12-23 03:12:47 公開日:2020-03-18

# ロバスト画像を用いた植物病診断のためのノイズラベルからのメタラーニング

Rectified Meta-Learning from Noisy Labels for Robust Image-based Plant Disease Diagnosis ( http://arxiv.org/abs/2003.07603v2 )

ライセンス: Link先を確認

Ruifeng Shi, Deming Zhai, Xianming Liu, Junjun Jiang, Wen Gao

(参考訳) 植物病は食料安全保障と作物生産の主な脅威の1つである。したがって、最近の人工知能の進歩を利用して植物病の診断を支援することは重要である。一般的なアプローチの1つは、葉画像分類タスクとしてこの問題を変換し、強力な畳み込みニューラルネットワーク(CNN)によって処理することができる。しかしながら、cnnに基づく分類手法の性能は、実際にはラベルに必然的にノイズをもたらし、モデルオーバーフィッティングとパフォーマンス低下をもたらす、高品質な手動ラベルトレーニングデータに依存する。そこで本稿では,修正メタ学習モジュールを共通CNNパラダイムに組み込んだ新しいフレームワークを提案する。提案手法は以下の利点を享受する。一補正メタラーニングは、偏見のないサンプルにより多くの注意を払って、収束の加速と分類精度の向上を図っている。二この方法は、様々な種類の騒音によく作用するラベルノイズ分布を仮定して、自由である。三本手法は、勾配降下法により最適化されたディープモデルに組み込むことができるプラグアンドプレイモジュールとして機能する。最先端のアルゴリズムよりも優れた性能を示すために,広範な実験を行った。

Plant diseases are one of the main threats to food security and crop production. It is thus valuable to exploit recent advances in artificial intelligence to assist plant disease diagnosis. One popular approach is to cast this problem as a leaf image classification task, which can then be addressed by powerful convolutional neural networks (CNNs). However, the performance of CNN-based classification depends on a large amount of high-quality manually labeled training data, whose labels inevitably contain noise in practice, leading to model overfitting and performance degradation. To overcome this problem, we propose a novel framework that incorporates a rectified meta-learning module into the common CNN paradigm to train a noise-robust deep network without using extra supervision information. The proposed method enjoys the following merits: i) A rectified meta-learning is designed to pay more attention to unbiased samples, leading to accelerated convergence and improved classification accuracy. ii) Our method makes no assumption about the label noise distribution and works well on various kinds of noise. iii) Our method serves as a plug-and-play module, which can be embedded into any deep model optimized by a gradient-descent-based method. Extensive experiments are conducted to demonstrate the superior performance of our algorithm over the state of the art.

翻訳日:2022-12-22 21:03:34 公開日:2020-03-18

# TREC 2019ディープラーニングトラックの概要

Overview of the TREC 2019 deep learning track ( http://arxiv.org/abs/2003.07820v2 )

ライセンス: Link先を確認

Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, Ellen M. Voorhees

(参考訳) Deep Learning TrackはTREC 2019の新しいトラックで、大規模データ体制におけるアドホックランキングの研究を目的としている。大規模な人手ラベル付きトレーニングセットを持つ最初のトラックであり、2つのタスクに対応する2つのセットを導入し、それぞれに厳格なTRECスタイルのブラインド評価と再利用可能なテストセットがある。文書検索タスクには320万のドキュメントと36万7千のトレーニングクエリからなるコーパスがあり、43のクエリで再利用可能なテストセットを生成する。パス検索タスクは880万のパスと50万3千のトレーニングクエリからなるコーパスを持ち、43のクエリの再利用可能なテストセットを生成する。今年、15のグループは、ディープラーニング、転送学習、従来のIRランキング手法のさまざまな組み合わせを使用して、合計75のランを提出した。ディープラーニングのランは従来のIRのランを大きく上回っている。この結果の説明として考えられるのは、大規模なトレーニングデータを導入し、そのデータで訓練した深層モデルを評価プールに含めたことであり、過去の一部の研究にはそのようなトレーニングデータやプーリングがなかった。

The Deep Learning Track is a new track for TREC 2019, with the goal of studying ad hoc ranking in a large data regime. It is the first track with large human-labeled training sets, introducing two sets corresponding to two tasks, each with rigorous TREC-style blind evaluation and reusable test sets. The document retrieval task has a corpus of 3.2 million documents with 367 thousand training queries, for which we generate a reusable test set of 43 queries. The passage retrieval task has a corpus of 8.8 million passages with 503 thousand training queries, for which we generate a reusable test set of 43 queries. This year 15 groups submitted a total of 75 runs, using various combinations of deep learning, transfer learning and traditional IR ranking methods. Deep learning runs significantly outperformed traditional IR runs. Possible explanations for this result are that we introduced large training data and we included deep models trained on such data in our judging pools, whereas some past studies did not have such training data or pooling.

翻訳日:2022-12-22 20:44:41 公開日:2020-03-18

# プライバシー保護協調フィルタリングの実態調査

Survey of Privacy-Preserving Collaborative Filtering ( http://arxiv.org/abs/2003.08343v1 )

ライセンス: Link先を確認

Islam Elnabarawy, Wei Jiang, Donald C. Wunsch II

(参考訳) 協調フィルタリングレコメンデーションシステムは、自身の過去の好みと、同様の関心を持つ他のユーザの好みに基づいて、ユーザにレコメンデーションを提供する。近年、レコメンデーションシステムの利用は広く増えており、どの映画を見るか、本を読むか、購入するアイテムを選ぶのに役立っている。しかし、このようなシステムを使う場合、ユーザーはプライバシーを心配することが多く、ほとんどのオンラインサービスに正確な情報を提供することに消極的である。プライバシー保護協調フィルタリングレコメンデーションシステムは、データのプライバシーに関する一定の保証を維持しながら、ユーザに正確なレコメンデーションを提供することを目的としている。この調査は、プライバシ保護協調フィルタリングにおける最近の文献を調査し、分野の広い視点を提供し、文献における主要なコントリビューションを、それらが対処する脆弱性の種類と、それを解決するために使用するアプローチのタイプという2つの異なる基準を用いて分類する。

Collaborative filtering recommendation systems provide recommendations to users based on their own past preferences, as well as those of other users who share similar interests. The use of recommendation systems has grown widely in recent years, helping people choose which movies to watch, books to read, and items to buy. However, users are often concerned about their privacy when using such systems, and many users are reluctant to provide accurate information to most online services. Privacy-preserving collaborative filtering recommendation systems aim to provide users with accurate recommendations while maintaining certain guarantees about the privacy of their data. This survey examines the recent literature in privacy-preserving collaborative filtering, providing a broad perspective of the field and classifying the key contributions in the literature using two different criteria: the type of vulnerability they address and the type of approach they use to solve it.

翻訳日:2022-12-22 13:27:45 公開日:2020-03-18

# ニューラルファジィエクストラクタ:バイオメトリックユーザ認証にニューラルネットワークを使用するセキュアな方法

Neural Fuzzy Extractors: A Secure Way to Use Artificial Neural Networks for Biometric User Authentication ( http://arxiv.org/abs/2003.08433v1 )

ライセンス: Link先を確認

Abhishek Jana, Md Kamruzzaman Sarker, Monireh Ebrahimi, Pascal Hitzler, George T Amariucai

(参考訳) センサ開発と人工知能の新たな進歩、計算コストの低減、ハンドヘルド計算デバイスの普及により、生体認証(および識別)が急速に普及している。高度な機械学習技術に基づくバイオメトリック認証への現代的なアプローチは、訓練済みの分類器の詳細または明示的なユーザバイオメトリックデータの保存を回避できないため、ユーザの認証情報が偽造される。本稿では,生体認証のためのベクトル空間分類器や人工ニューラルネットワークを用いたユーザ固有情報を扱うためのセキュアな方法を提案する。提案アーキテクチャはニューラルファジィ・エクストラクタ (NFE) と呼ばれ,既存の分類器とファジィ抽出器の結合を可能にする。したがって、NFEは、現代のディープラーニングベースの分類器のすべてのパフォーマンス上の利点と、標準的なファジィ抽出器のセキュリティを提供する。従来型ニューラルネットワークのnfeを,指紋認証によるユーザ認証の簡単なシナリオに適合させることを実証する。

Powered by new advances in sensor development and artificial intelligence, the decreasing cost of computation, and the pervasiveness of handheld computation devices, biometric user authentication (and identification) is rapidly becoming ubiquitous. Modern approaches to biometric authentication, based on sophisticated machine learning techniques, cannot avoid storing either trained-classifier details or explicit user biometric data, thus exposing users' credentials to falsification. In this paper, we introduce a secure way to handle user-specific information involved with the use of vector-space classifiers or artificial neural networks for biometric authentication. Our proposed architecture, called a Neural Fuzzy Extractor (NFE), allows the coupling of pre-existing classifiers with fuzzy extractors, through an artificial-neural-network-based buffer called an expander, with minimal or no performance degradation. The NFE thus offers all the performance advantages of modern deep-learning-based classifiers, and all the security of standard fuzzy extractors. We demonstrate the NFE retrofit to a classic artificial neural network for a simple scenario of fingerprint-based user authentication.

翻訳日:2022-12-22 13:27:04 公開日:2020-03-18

# オンライン予測における悪質専門家と乗法重みアルゴリズム

Malicious Experts versus the multiplicative weights algorithm in online prediction ( http://arxiv.org/abs/2003.08457v1 )

ライセンス: Link先を確認

Erhan Bayraktar, H. Vincent Poor, Xin Zhang

(参考訳) 2人の専門家と予測者による予測問題を考える。専門家の一人が正直で、各ラウンドで確率$\mu$で正しい予測をしていると仮定する。もう一方は悪意があり、各ラウンドの真の結果を知り、予測者の損失を最大化するために予測を行う。予測者が古典的な乗法重みアルゴリズムを採用すると仮定すると、悪意のある専門家の値関数の上限と下限が見つかる。その結果,乗算重みアルゴリズムは悪意のある専門家の腐敗に抵抗できないことが示唆された。また,適応乗法重み付けアルゴリズムは予測者にとって漸近的に最適であり,従って悪質な専門家の腐敗に抵抗することを示した。

We consider a prediction problem with two experts and a forecaster. We assume that one of the experts is honest and makes correct prediction with probability $\mu$ at each round. The other one is malicious, who knows true outcomes at each round and makes predictions in order to maximize the loss of the forecaster. Assuming the forecaster adopts the classical multiplicative weights algorithm, we find upper and lower bounds for the value function of the malicious expert. Our results imply that the multiplicative weights algorithm cannot resist the corruption of malicious experts. We also show that an adaptive multiplicative weights algorithm is asymptotically optimal for the forecaster, and hence more resistant to the corruption of malicious experts.
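The classical multiplicative weights update that the abstract refers to can be sketched as follows. This is a minimal illustration, not the authors' analysis: the function name `multiplicative_weights`, the learning rate `eps`, and the 0/1 loss are assumptions chosen for the example, and the malicious expert's optimal strategy from the paper is not modeled. The forecaster follows each expert with probability proportional to its weight, and every wrong prediction shrinks that expert's weight by a factor of `1 - eps`.

```python
def multiplicative_weights(expert_predictions, outcomes, eps=0.5):
    """Run a basic multiplicative weights forecaster (illustrative sketch).

    expert_predictions: one tuple per round, holding each expert's prediction.
    outcomes: the true outcome for each round.
    eps: assumed learning rate in (0, 1).
    Returns the final weights and the forecaster's cumulative expected loss.
    """
    n_experts = len(expert_predictions[0])
    weights = [1.0] * n_experts
    expected_loss = 0.0
    for preds, outcome in zip(expert_predictions, outcomes):
        total = sum(weights)
        # Probability of following each expert in this round.
        probs = [w / total for w in weights]
        # 0/1 loss: an expert loses 1 when its prediction is wrong.
        losses = [0.0 if p == outcome else 1.0 for p in preds]
        expected_loss += sum(q * l for q, l in zip(probs, losses))
        # Multiplicative update: penalize experts that were wrong.
        weights = [w * (1.0 - eps) ** l for w, l in zip(weights, losses)]
    return weights, expected_loss


# Expert 0 is right in the first two rounds; expert 1 is always wrong.
final_weights, loss = multiplicative_weights(
    [(1, 0), (0, 1), (0, 0)], [1, 0, 1], eps=0.5
)
```

In this toy run the always-wrong expert loses weight quickly, which is exactly why a malicious expert who knows the outcomes would not behave this way; the paper's bounds concern an adversary who times its errors strategically.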

翻訳日:2022-12-22 13:26:45 公開日:2020-03-18

# LTE-U/Wi-Fi共存シナリオにおける機械学習によるスペクトル共有

Machine Learning enabled Spectrum Sharing in Dense LTE-U/Wi-Fi Coexistence Scenarios ( http://arxiv.org/abs/2003.13652v1 )

ライセンス: Link先を確認

Adam Dziedzic, Vanlin Sathya, Muhammad Iqbal Rochman, Monisha Ghosh and Sanjay Krishnan

(参考訳) 複雑なエンジニアリング問題に対する機械学習(ML)技術の適用は、魅力的で効率的なソリューションであることが証明されている。 MLは、画像認識や産業運用の自動化など、いくつかの実践的なタスクにうまく適用されています。非線形問題の解決におけるML技術の約束は、既知のML技術の適用と、未ライセンスのスペクトルにおけるWi-FiとLTE間の無線スペクトル共有のための新しい手法の開発を目的として、この研究に影響を与えた。本研究では,LTE-Uフォーラムが開発したLTE-Unlicensed (LTE-U)仕様に焦点をあてる。この仕様は、コチャネルWi-Fiベーシックサービスセット(BSS)の数が1つから2つ以上に増加すると、LTE-Uベースステーション(BS)のデューティサイクルが減少することを示唆している。しかし、Wi-Fiパケットを復号することなく、リアルタイムにチャンネル上で動作しているWi-Fi BSSの数を検出することは難しい問題である。本研究では,LTE-U OFF時間帯に観測されたエネルギー値を用いて,MLに基づく新しい手法を提案する。 LTE-U BS OFF時間中のエネルギー値だけを観測するのは、LTE-Uベースステーションで完全なWi-Fi受信機を必要とするWi-Fiパケット全体を復号するのに比べれば比較的簡単である。提案手法を実時間実験により実装・検証し,一方と多数の Wi-Fi AP 伝送間のエネルギー分布に異なるパターンが存在することを示す。提案手法は,従来の自己相関法 (AC) やエネルギー検出法 (ED) と比較して高い精度 (すべての場合 99 % に近づいた) が得られる。

The application of Machine Learning (ML) techniques to complex engineering problems has proved to be an attractive and efficient solution. ML has been successfully applied to several practical tasks like image recognition, automating industrial operations, etc. The promise of ML techniques in solving non-linear problems influenced this work which aims to apply known ML techniques and develop new ones for wireless spectrum sharing between Wi-Fi and LTE in the unlicensed spectrum. In this work, we focus on the LTE-Unlicensed (LTE-U) specification developed by the LTE-U Forum, which uses the duty-cycle approach for fair coexistence. The specification suggests reducing the duty cycle at the LTE-U base-station (BS) when the number of co-channel Wi-Fi basic service sets (BSSs) increases from one to two or more. However, without decoding the Wi-Fi packets, detecting the number of Wi-Fi BSSs operating on the channel in real-time is a challenging problem. In this work, we demonstrate a novel ML-based approach which solves this problem by using energy values observed during the LTE-U OFF duration. It is relatively straightforward to observe only the energy values during the LTE-U BS OFF time compared to decoding the entire Wi-Fi packet, which would require a full Wi-Fi receiver at the LTE-U base-station. We implement and validate the proposed ML-based approach by real-time experiments and demonstrate that there exist distinct patterns in the energy distributions of one versus many Wi-Fi AP transmissions. The proposed ML-based approach results in a higher accuracy (close to 99% in all cases) as compared to the existing auto-correlation (AC) and energy detection (ED) approaches.

翻訳日:2022-12-22 13:26:33 公開日:2020-03-18

# eisen: 堅固なディープラーニングのためのpythonパッケージ

Eisen: a python package for solid deep learning ( http://arxiv.org/abs/2004.02747v1 )

ライセンス: Link先を確認

Frank Mancolo

(参考訳) eisenは、ディープラーニングメソッドの実装を簡単にするオープンソースのpythonパッケージである。医用画像解析やコンピュータビジョンタスクに特化しているが、柔軟性によって任意のアプリケーションへの拡張が可能になる。 EisenはPyTorchをベースにしており、PyTorchエコシステムに属する他のパッケージと同じアーキテクチャに従っている。これにより使用が簡単になり、他のパッケージが提供するモジュールと互換性がある。 eisenは、複数のデータセットローディングメソッド、さまざまなデータフォーマットのためのi/o、データ操作と変換、トレーニング、検証とテストループの完全な実装、損失とネットワークアーキテクチャの実装、トレーニングアーティファクトの自動エクスポート、サマリーとログ、ビジュアル実験構築、コマンドラインインターフェースなどを実装している。さらに,コミュニティによるユーザコントリビューションも公開されている。ドキュメント、例、コードはhttp://eisen.ai からダウンロードできる。

Eisen is an open source python package making the implementation of deep learning methods easy. It is specifically tailored to medical image analysis and computer vision tasks, but its flexibility allows extension to any application. Eisen is based on PyTorch and it follows the same architecture of other packages belonging to the PyTorch ecosystem. This simplifies its use and allows it to be compatible with modules provided by other packages. Eisen implements multiple dataset loading methods, I/O for various data formats, data manipulation and transformation, full implementation of training, validation and test loops, implementation of losses and network architectures, automatic export of training artifacts, summaries and logs, visual experiment building, command line interface and more. Furthermore, it is open to user contributions by the community. Documentation, examples and code can be downloaded from http://eisen.ai.

翻訳日:2022-12-22 13:18:26 公開日:2020-03-18

# ディープニューラルネットワーク学習のためのブロック層分割スキーム

Block Layer Decomposition schemes for training Deep Neural Networks ( http://arxiv.org/abs/2003.08123v1 )

ライセンス: Link先を確認

Laura Palagi, Ruggiero Seccia

(参考訳) ディープフィードフォワードニューラルネットワーク(dfnn)の重み推定は、非常に大きな非凸最適化問題の解に依存している。その結果、最適化アルゴリズムは、悪い解決策につながるか、最適化プロセスを遅くする可能性がある局所的最小化器に惹かれることができる。さらに、トレーニング問題に対する優れた解を見つけるのに必要な時間は、サンプルの数と変数の数の両方に依存する。本稿では,ブロック座標降下法(bcd法)を用いて,定常点や平坦領域を回避し,最先端アルゴリズムの性能を向上させる方法を示す。まず、ネットワークの深さに効果的に取り組むことができるバッチBCD法について述べ、次に、BCDアプローチをミニバッチフレームワークに埋め込むことで、変数数とサンプル数の両方をスケールできる minibatch BCD フレームワークを提案するアルゴリズムをさらに拡張する。複数のアーキテクチャネットワークにおける標準データセットの広範囲な数値計算により,dfnnのトレーニングフェーズへのbcd手法の適用が,標準バッチアルゴリズムやミニバッチアルゴリズムよりも優れており,トレーニングフェーズとネットワークの一般化性能の両方が向上していることを示す。

Deep Feedforward Neural Networks' (DFNNs) weights estimation relies on the solution of a very large nonconvex optimization problem that may have many local (no global) minimizers, saddle points and large plateaus. As a consequence, optimization algorithms can be attracted toward local minimizers which can lead to bad solutions or can slow down the optimization process. Furthermore, the time needed to find good solutions to the training problem depends on both the number of samples and the number of variables. In this work, we show how Block Coordinate Descent (BCD) methods can be applied to improve performance of state-of-the-art algorithms by avoiding bad stationary points and flat regions. We first describe a batch BCD method able to effectively tackle the network's depth and then we further extend the algorithm proposing a minibatch BCD framework able to scale with respect to both the number of variables and the number of samples by embedding a BCD approach into a minibatch framework. By extensive numerical results on standard datasets for several architecture networks, we show how the application of BCD methods to the training phase of DFNNs outperforms standard batch and minibatch algorithms, leading to an improvement on both the training phase and the generalization performance of the networks.
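The idea of treating each layer as a block can be illustrated with a toy example. The sketch below is an assumption-laden illustration rather than the authors' batch or minibatch scheme: it uses a hypothetical two-layer scalar linear network `y = w2 * w1 * x`, a mean squared error loss, plain gradient steps, and a made-up function name `layerwise_bcd`. Each sweep updates one layer's weight while the other is held fixed, then switches blocks.

```python
def layerwise_bcd(xs, ts, w1=0.5, w2=0.5, lr=0.05, sweeps=200):
    """Cyclic block coordinate descent on a toy two-layer linear network.

    The model is y = w2 * w1 * x and the loss is the mean squared error.
    Each layer's weight is one block (an assumption made for illustration).
    """
    n = len(xs)
    for _ in range(sweeps):
        # Block 1: gradient step on the first layer, second layer fixed.
        g1 = sum(2.0 * (w2 * w1 * x - t) * w2 * x for x, t in zip(xs, ts)) / n
        w1 -= lr * g1
        # Block 2: gradient step on the second layer, using the updated w1.
        g2 = sum(2.0 * (w2 * w1 * x - t) * w1 * x for x, t in zip(xs, ts)) / n
        w2 -= lr * g2
    return w1, w2


# Fit the target t = 2 * x; only the product w1 * w2 is identifiable.
w1, w2 = layerwise_bcd([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Even in this tiny problem the loss is nonconvex in `(w1, w2)` jointly but convex in each block separately, which is the structure that block-wise schemes exploit.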

翻訳日:2022-12-22 13:17:51 公開日:2020-03-18

# クラスタリングと影響分析を用いたプロセスマイニング分析におけるビジネスエリア効果の発見

Discovering Business Area Effects to Process Mining Analysis Using Clustering and Influence Analysis ( http://arxiv.org/abs/2003.08170v1 )

ライセンス: Link先を確認

Teemu Lehto and Markku Hinkka

(参考訳) 大きな組織でビジネスプロセスを改善するための一般的な課題は、オペレーションを担当するビジネスパーソンが、ビジネスオペレーションで実行される実行の詳細、プロセス変種、例外の事実に基づく理解を欠いていることです。既存のプロセスマイニング方法論はイベントログに基づいてこれらの詳細を発見できるが、プロセスマイニングの知見をビジネス関係者に伝えることは困難である。本稿では,プロセス実行の詳細に重要な影響を与えるビジネス領域を発見するための新しい手法を提案する。本手法はクラスタリングを用いてプロセスフロー特性に基づいて類似の事例をグループ化し,クラスタに最も相関するビジネス領域を検出するための影響分析を行う。私たちの分析はBPMの人々とビジネスの間の橋渡しとして役立ちます。また,公開されている実物購入注文プロセスデータに基づく事例分析を行った。

A common challenge for improving business processes in large organizations is that business people in charge of the operations are lacking a fact-based understanding of the execution details, process variants, and exceptions taking place in business operations. While existing process mining methodologies can discover these details based on event logs, it is challenging to communicate the process mining findings to business people. In this paper, we present a novel methodology for discovering business areas that have a significant effect on the process execution details. Our method uses clustering to group similar cases based on process flow characteristics and then influence analysis for detecting those business areas that correlate most with the discovered clusters. Our analysis serves as a bridge between BPM people and business people, facilitating the knowledge sharing between these groups. We also present an example analysis based on publicly available real-life purchase order process data.

翻訳日:2022-12-22 13:17:29 公開日:2020-03-18

# 機械学習を用いた新しいコロナウイルスに対する中和抗体の発見

Potential Neutralizing Antibodies Discovered for Novel Corona Virus Using Machine Learning ( http://arxiv.org/abs/2003.08447v1 )

ライセンス: Link先を確認

Rishikesh Magar, Prakarsh Yadav, Amir Barati Farimani

(参考訳) 迅速で追跡不能なウイルス変異は、免疫系が抑制抗体を産生する前に数千人の命を奪う。新型コロナウイルスの感染拡大を受け、世界中で数千人が死亡した。新型コロナウイルスのウイルスエピトープを阻害するペプチドや抗体配列の迅速発見法は、数千人の命を救う。本稿では,コロナウイルスに対する阻害性合成抗体を予測するための機械学習(ML)モデルを考案した。 1933件のウイルス-抗体配列と臨床患者中和反応を収集し,MLモデルを用いて抗体反応の予測を行った。各種ML法を用いて, 数千の仮説抗体配列をスクリーニングし, 8種類の安定抗体が検出された。我々は、コロナウイルスを阻害する候補抗体の安定性を検証するために、バイオインフォマティクス、構造生物学、分子動力学(md)シミュレーションを組み合わせた。

The fast and untraceable virus mutations take lives of thousands of people before the immune system can produce the inhibitory antibody. Recent outbreak of novel coronavirus infected and killed thousands of people in the world. Rapid methods in finding peptides or antibody sequences that can inhibit the viral epitopes of COVID-19 will save the life of thousands. In this paper, we devised a machine learning (ML) model to predict the possible inhibitory synthetic antibodies for Corona virus. We collected 1933 virus-antibody sequences and their clinical patient neutralization response and trained an ML model to predict the antibody response. Using graph featurization with variety of ML methods, we screened thousands of hypothetical antibody sequences and found 8 stable antibodies that potentially inhibit COVID-19. We combined bioinformatics, structural biology, and Molecular Dynamics (MD) simulations to verify the stability of the candidate antibodies that can inhibit the Corona virus.

翻訳日:2022-12-22 13:17:15 公開日:2020-03-18

# AIはファッションジャーゴンを解読できるのか?

Can AI decrypt fashion jargon for you? ( http://arxiv.org/abs/2003.08052v1 )

ライセンス: Link先を確認

Yuan Shen, Shanduojiao Jiang, Muhammad Rizky Wellyanto, and Ranjitha Kumar

(参考訳) ファッションについて語るとき、ファッションの概念の根底にある意味、例えばスタイルに気を配り、例えば、このドレスのどの機能がスマートかといった質問をするが、今日のファッションウェブサイトの製品説明はドメイン固有の言葉と低レベルの言葉でいっぱいである。これらの低レベルの記述が、いかにしてスタイルや高レベルのファッション概念に貢献できるかは、人々には明らかではない。本稿では,ファッションサイトにおける既存製品データを活用することで,この概念理解問題に対処するためのデータ駆動型ソリューションを提案する。最初に1546のファッションキーワードを5つのファッションカテゴリに分類した。次に,853,056製品からなる新しいファッション製品データセットを収集した。最後に、プロダクトイメージの低レベルとドメイン固有のファッション機能でハイレベルなファッションコンセプトを明示的に予測し、説明できるディープラーニングモデルをトレーニングしました。

When people talk about fashion, they care about the underlying meaning of fashion concepts, e.g., style. For example, people ask questions like what features make this dress smart. However, the product descriptions on today's fashion websites are full of domain-specific and low-level words. It is not clear to people how exactly those low-level descriptions can contribute to a style or any high-level fashion concept. In this paper, we propose a data-driven solution to address this concept understanding issue by leveraging a large amount of existing product data on fashion sites. We first collected and categorized 1546 fashion keywords into 5 different fashion categories. Then, we collected a new fashion product dataset with 853,056 products in total. Finally, we trained a deep learning model that can explicitly predict and explain high-level fashion concepts in a product image with its low-level and domain-specific fashion features.

翻訳日:2022-12-22 13:16:59 公開日:2020-03-18

# クロスリンガルクロスコーパス音声の感情認識

Cross Lingual Cross Corpus Speech Emotion Recognition ( http://arxiv.org/abs/2003.07996v1 )

ライセンス: Link先を確認

Shivali Goel (1), Homayoon Beigi (1 and 2) ((1) Department of Computer Science, Columbia University, (2) Recognition Technologies, Inc., South Salem, New York, United States)

(参考訳) 既存の音声感情認識モデルの大部分は、単一のコーパスと単一の言語設定で訓練され、評価される。これらのシステムは、クロスコーポレートかつクロス言語シナリオに適用された場合、うまく機能しない。本稿では,単一コーパスとクロスコーパスのいずれにおいても,4言語における音声感情認識の結果について述べる。さらに,ジェンダー,自然性,覚醒を補助課題とするマルチタスク学習(MTL)は,感情モデルの一般化能力を高めることが示されており,本研究では,まだ研究されていない感情認識における音声言語の役割を探るため,MTLフレームワークのもう一つの補助課題として言語IDを導入する。

The majority of existing speech emotion recognition models are trained and evaluated on a single corpus and a single language setting. These systems do not perform as well when applied in a cross-corpus and cross-language scenario. This paper presents results for speech emotion recognition for 4 languages in both single corpus and cross corpus setting. Additionally, since multi-task learning (MTL) with gender, naturalness and arousal as auxiliary tasks has shown to enhance the generalisation capabilities of the emotion models, this paper introduces language ID as another auxiliary task in MTL framework to explore the role of spoken language on emotion recognition which has not been studied yet.

翻訳日:2022-12-22 13:10:14 公開日:2020-03-18

# 造影CT画像からの3次元肝血管形態再構成のためのグラフ注意ネットワークを用いたプルーニング

Graph Attention Network based Pruning for Reconstructing 3D Liver Vessel Morphology from Contrasted CT Images ( http://arxiv.org/abs/2003.07999v1 )

ライセンス: Link先を確認

Donghao Zhang, Siqi Liu, Shikha Chaganti, Eli Gibson, Zhoubing Xu, Sasa Grbic, Weidong Cai, and Dorin Comaniciu

(参考訳) 造影剤を血管に注入することで、多相造影ct画像は人体における血管ネットワークの可視性を高めることができる。造影CT画像から肝血管の3次元幾何学的形態を再構築することで, 複数種類の術前手術計画が可能である。肝血管形態の再構成は, 肝血管の形態学的複雑度と, 多相造影CT像の非一貫性により, 依然として困難である。一方, 意思決定バイアスを回避するためには, 3次元再構成において高い整合性が必要である。本稿では,完全畳み込みニューラルネットワークとグラフアテンションネットワークを併用した肝血管形態再構築フレームワークを提案する。完全な畳み込みニューラルネットワークは、まず肝臓の中枢熱マップを生成するために訓練される。その後、画像処理に基づくアルゴリズムを用いて、熱マップに基づいてオーバー再構成肝血管グラフモデルをトレースする。グラフアテンションネットワークを用いて、集約されたCNN特徴を用いて、初期再構成における各セグメント分岐の存在確率を予測する。 418個の多相腹部ct画像からなる社内データセット上で提案手法を評価した。提案したグラフネットワークのプルーニングにより,全体のF1スコアが6.4%向上した。また、他の最先端の曲率構造再構成アルゴリズムよりも優れていた。

With the injection of contrast material into blood vessels, multi-phase contrasted CT images can enhance the visibility of vessel networks in the human body. Reconstructing the 3D geometric morphology of liver vessels from the contrasted CT images can enable multiple liver preoperative surgical planning applications. Automatic reconstruction of liver vessel morphology remains a challenging problem due to the morphological complexity of liver vessels and the inconsistent vessel intensities among different multi-phase contrasted CT images. On the other side, high integrity is required for the 3D reconstruction to avoid decision making biases. In this paper, we propose a framework for liver vessel morphology reconstruction using both a fully convolutional neural network and a graph attention network. A fully convolutional neural network is first trained to produce the liver vessel centerline heatmap. An over-reconstructed liver vessel graph model is then traced based on the heatmap using an image processing based algorithm. We use a graph attention network to prune the false-positive branches by predicting the presence probability of each segmented branch in the initial reconstruction using the aggregated CNN features. We evaluated the proposed framework on an in-house dataset consisting of 418 multi-phase abdomen CT images with contrast. The proposed graph network pruning improves the overall reconstruction F1 score by 6.4% over the baseline. It also outperformed the other state-of-the-art curvilinear structure reconstruction algorithms.

翻訳日:2022-12-22 13:10:02 公開日:2020-03-18

# オブジェクトベースの画像符号化: 学習駆動再訪

Object-Based Image Coding: A Learning-Driven Revisit ( http://arxiv.org/abs/2003.08033v1 )

ライセンス: Link先を確認

Qi Xia, Haojie Liu and Zhan Ma

(参考訳) 20年ほど前に広く研究されたObject-Based Image Coding (OBIC)は、超低ビットレート通信と高レベルのセマンティックコンテンツ理解の両方に広大なアプリケーション視点を約束していたが、任意の形状のオブジェクトの非効率なコンパクト表現のために、ほとんど使われなかった。根本的な問題は、任意の形のオブジェクトを細かい粒度で効率的に処理する方法である(フィーチャー要素やピクセルワイズなど)。そこで本稿では,画像層分解のためのオブジェクトセグメンテーションネットワークと,マスク付き前景オブジェクトと背景シーンを別々に処理するための並列畳み込みに基づくニューラルイメージ圧縮ネットワークを考案して,要素ワイズマスキングと圧縮の適用を提案する。すべてのコンポーネントはエンドツーエンドの学習フレームワークで最適化され、視覚的に快適なリコンストラクションのために、その(オブジェクトや背景といった)貢献をインテリジェントに重み付けます。我々は, JPEG2K, HEVCベースのBPGおよび他の学習画像圧縮法と比較して, 主観的品質向上を顕著に示す, 非常に低ビットレートのシナリオ(例えば, $\lesssim$0.1 bits per pixel - bpp)において, PASCAL VOCデータセットの性能を評価するための総合的な実験を行った。関連資料はすべてhttps://njuvision.github.io/Neural-Object-Coding/で公開されています。

The Object-Based Image Coding (OBIC) that was extensively studied about two decades ago, promised a vast application perspective for both ultra-low bitrate communication and high-level semantical content understanding, but it had rarely been used due to the inefficient compact representation of object with arbitrary shape. A fundamental issue behind is how to efficiently process the arbitrary-shaped objects at a fine granularity (e.g., feature element or pixel wise). To attack this, we have proposed to apply the element-wise masking and compression by devising an object segmentation network for image layer decomposition, and parallel convolution-based neural image compression networks to process masked foreground objects and background scene separately. All components are optimized in an end-to-end learning framework to intelligently weigh their (e.g., object and background) contributions for visually pleasant reconstruction. We have conducted comprehensive experiments to evaluate the performance on PASCAL VOC dataset at a very low bitrate scenario (e.g., $\lesssim$0.1 bits per pixel - bpp) which have demonstrated noticeable subjective quality improvement compared with JPEG2K, HEVC-based BPG and another learned image compression method. All relevant materials are made publicly accessible at https://njuvision.github.io/Neural-Object-Coding/.

翻訳日:2022-12-22 13:09:42 公開日:2020-03-18

# OmniSLAM:ワイドベースラインマルチカメラシステムにおける全方向のローカライゼーションとディエンスマッピング

OmniSLAM: Omnidirectional Localization and Dense Mapping for Wide-baseline Multi-camera Systems ( http://arxiv.org/abs/2003.08056v1 )

ライセンス: Link先を確認

Changhee Won, Hochang Seok, Zhaopeng Cui, Marc Pollefeys, Jongwoo Lim

(参考訳) 本稿では,環境を360度カバーするステレオ観測が可能な,超広視野(FOV)魚眼カメラを用いたワイドベースライン多視点ステレオ装置のための全方位位置推定と高密度マッピングシステムについて述べる。より実用的で正確な再構築のために、我々はまず、既存のネットワークよりも高速かつ高精度な全方位深度推定のために、改良された軽量深層ニューラルネットワークを導入する。第2に,全方位深度推定を視覚オドメトリ(vo)に統合し,グローバル一貫性のためのループクローズモジュールを追加した。推定深度マップを用いて、お互いのビューにキーポイントを再投影し、より良く、より効率的な特徴マッチングプロセスをもたらす。最後に,全方位深度マップと推定したリグ姿勢をTSDF(truncated signed distance function)ボリュームに融合して3Dマップを得る。真値付きの合成データセットと困難な環境の実世界シーケンスで本手法を評価し,広範な実験により,提案システムが合成環境と実環境の両方で優れた復元結果を生成することを示す。

In this paper, we present an omnidirectional localization and dense mapping system for a wide-baseline multiview stereo setup with ultra-wide field-of-view (FOV) fisheye cameras, which has a 360 degrees coverage of stereo observations of the environment. For more practical and accurate reconstruction, we first introduce improved and light-weighted deep neural networks for the omnidirectional depth estimation, which are faster and more accurate than the existing networks. Second, we integrate our omnidirectional depth estimates into the visual odometry (VO) and add a loop closing module for global consistency. Using the estimated depth map, we reproject keypoints onto each other view, which leads to a better and more efficient feature matching process. Finally, we fuse the omnidirectional depth maps and the estimated rig poses into the truncated signed distance function (TSDF) volume to acquire a 3D map. We evaluate our method on synthetic datasets with ground-truth and real-world sequences of challenging environments, and the extensive experiments show that the proposed system generates excellent reconstruction results in both synthetic and real-world environments.

翻訳日:2022-12-22 13:09:15 公開日:2020-03-18

# 測地線に基づく3次元形状のキャラクタリゼーション : 軟部組織臓器の時間的変形への応用

A new geodesic-based feature for characterization of 3D shapes: application to soft tissue organ temporal deformations ( http://arxiv.org/abs/2003.08332v1 )

ライセンス: Link先を確認

Karim Makki, Amine Bohi, Augustin C. Ogier, Marc-Emmanuel Bellemare

(参考訳) 本稿では,点雲から3次元形状を特徴付ける手法を提案し,臓器の時間的変形の研究への直接的応用を示す。一例として, 強制呼吸運動中の膀胱の挙動を3次元表面点の減少で特徴づける: まず, 大規模な変形Diffomorphic Metric Mapping (LDDMM) フレームワークを用いて, 表面の四角形メッシュの頂点を表す等距離点の集合を, 長いダイナミックMRIシーケンスを通して追跡する。次に, ユークリッド偏微分方程式 (pdes) を用いた時間的臓器変形を特徴付けるために, スケーリングと回転に不変な新しい幾何学的特徴を提案する。我々は, 人工3次元形状と, 強制呼吸運動時の膀胱変形を特徴とする動的MRIデータの両方に特徴の堅牢性を示す。提案手法は, 医用画像, 空気力学, ロボット工学など, コンピュータビジョンの応用に有用である可能性が示唆された。

In this paper, we propose a method for characterizing 3D shapes from point clouds and we show a direct application on a study of organ temporal deformations. As an example, we characterize the behavior of a bladder during a forced respiratory motion with a reduced number of 3D surface points: first, a set of equidistant points representing the vertices of quadrilateral mesh for the surface in the first time frame are tracked throughout a long dynamic MRI sequence using a Large Deformation Diffeomorphic Metric Mapping (LDDMM) framework. Second, a novel geometric feature which is invariant to scaling and rotation is proposed for characterizing the temporal organ deformations by employing an Eulerian Partial Differential Equations (PDEs) methodology. We demonstrate the robustness of our feature on both synthetic 3D shapes and realistic dynamic MRI data portraying the bladder deformation during forced respiratory motions. Promising results are obtained, showing that the proposed feature may be useful for several computer vision applications such as medical imaging, aerodynamics and robotics.

翻訳日:2022-12-22 13:08:27 公開日:2020-03-18

# 画像スタイル変換のためのコンテンツ変換ブロック

A Content Transformation Block For Image Style Transfer ( http://arxiv.org/abs/2003.08407v1 )

ライセンス: Link先を確認

Dmytro Kotovenko, Artsiom Sanakoyeu, Pingchuan Ma, Sabine Lang, Bj\"orn Ommer

(参考訳) 画像理解と合成における根本的な課題を研究できるため、スタイル転送は最近多くの注目を集めている。最近の研究は、色、テクスチャ、計算速度、画像解像度の表現を大幅に改善した。芸術的なスタイルは、色、形、テクスチャといった画像の形式的特徴に影響を与えるが、コンテンツの詳細を変形、追加、削除する。本稿では,コンテンツイメージのコンテンツとスタイルを意識したスタイル化に焦点を当てた。そこで,エンコーダとデコーダの間にコンテンツ変換モジュールを導入する。さらに、写真やスタイルサンプルに現れる類似コンテンツを利用して、スタイルがコンテンツの詳細をどのように変更するかを学習し、これを他のクラスの詳細に一般化する。さらに,高分解能画像合成に不可欠な新しい正規化層を提案する。モデルの堅牢性と速度は,リアルタイムかつ高精細なビデオスタイリングを可能にする。我々は,提案手法の有効性を示すために,質的かつ定量的な評価を行う。

Style transfer has recently received a lot of attention, since it allows the study of fundamental challenges in image understanding and synthesis. Recent work has significantly improved the representation of color and texture, as well as computational speed and image resolution. The explicit transformation of image content has, however, been mostly neglected: while artistic style affects formal characteristics of an image, such as color, shape or texture, it also deforms, adds or removes content details. This paper explicitly focuses on a content- and style-aware stylization of a content image. Therefore, we introduce a content transformation module between the encoder and decoder. Moreover, we utilize similar content appearing in photographs and style samples to learn how style alters content details and we generalize this to other class details. Additionally, this work presents a novel normalization layer critical for high resolution image synthesis. The robustness and speed of our model enable video stylization in real-time and high definition. We perform extensive qualitative and quantitative evaluations to demonstrate the validity of our approach.

翻訳日:2022-12-22 13:00:20 公開日:2020-03-18

# 脳における社会的フィードバック処理がソーシャルメディア時代における集団意見プロセスをどのように形成するか

How social feedback processing in the brain shapes collective opinion processes in the era of social media ( http://arxiv.org/abs/2003.08154v1 )

ライセンス: Link先を確認

Sven Banisch and Felix Gaisbauer and Eckehard Olbrich

(参考訳) 特定の意見を持つグループが公的な声を出し、異なる見解を持つ人たちを黙らせる仕組みは何でしょう? ソーシャルメディアはどのように機能するのか? 社会的フィードバックの処理に関する最近の神経科学的知見に基づいて,これらの問題に対処できる理論モデルを構築した。このモデルは、世論の沈黙理論のスパイラルによって説明される現象を捉え、そのメカニズムに基づく基礎を提供し、この方法で異なる集団構造が集団的意見表現の異なる体制とどのように関連しているかについてのより一般的な洞察を可能にする。少数派が結束全体として振る舞うと、強い多数派でさえ沈黙を余儀なくされる。社会フィードバック理論(英語版) (SFT) の枠組みは、社会的および認知神経科学における発見の社会的レベルの影響を理解するための社会理論の必要性を強調している。

What are the mechanisms by which groups with certain opinions gain public voice and force others holding a different view into silence? And how does social media play into this? Drawing on recent neuro-scientific insights into the processing of social feedback, we develop a theoretical model that allows us to address these questions. The model captures phenomena described by the spiral of silence theory of public opinion, provides a mechanism-based foundation for it, and in this way allows more general insight into how different group structures relate to different regimes of collective opinion expression. Even strong majorities can be forced into silence if a minority acts as a cohesive whole. The proposed framework of social feedback theory (SFT) highlights the need for sociological theorising to understand the societal-level implications of findings in social and cognitive neuroscience.

翻訳日:2022-12-22 12:58:19 公開日:2020-03-18

# 中国語における代用スーパーセンスのコーパス

A Corpus of Adpositional Supersenses for Mandarin Chinese ( http://arxiv.org/abs/2003.08437v1 )

ライセンス: Link先を確認

Siyao Peng, Yang Liu, Yilun Zhu, Austin Blodgett, Yushi Zhao, Nathan Schneider

(参考訳) 格付けは、しばしば意味関係の指標となるが、非常に曖昧であり、言語によって大きく異なる。さらに,形容詞意味論の言語間差異を調査したり,多言語的曖昧化システムを構築するための注釈付きコーパスのデジェストが存在する。本稿は,中国語における全ての格付けが意味論的にアノテートされたコーパスについて述べる。提案手法は,言語に依存しないセマンティックな基準に従って,一般的なスーパーセンスの集合を定義する枠組みに適応するが,その開発は主に英語の前置詞に焦点を当てている(Schneider et al., 2018)。このスーパーセンスカテゴリーは、英語と構文的差異があるにもかかわらず、中国語の表記に適していることがわかった。 The Little Prince』のマンダリン翻訳では、高いアノテータ間合意を達成し、ビットクストの付加トークンの意味対応を解析する。

Adpositions are frequent markers of semantic relations, but they are highly ambiguous and vary significantly from language to language. Moreover, there is a dearth of annotated corpora for investigating the cross-linguistic variation of adposition semantics, or for building multilingual disambiguation systems. This paper presents a corpus in which all adpositions have been semantically annotated in Mandarin Chinese; to the best of our knowledge, this is the first Chinese corpus to be broadly annotated with adposition semantics. Our approach adapts a framework that defined a general set of supersenses according to ostensibly language-independent semantic criteria, though its development focused primarily on English prepositions (Schneider et al., 2018). We find that the supersense categories are well-suited to Chinese adpositions despite syntactic differences from English. On a Mandarin translation of The Little Prince, we achieve high inter-annotator agreement and analyze semantic correspondences of adposition tokens in bitext.

翻訳日:2022-12-22 12:58:04 公開日:2020-03-18

# 公理ピンポイント

Axiom Pinpointing ( http://arxiv.org/abs/2003.08298v1 )

ライセンス: Link先を確認

Rafael Pe\~naloza

(参考訳) 公理ピンポイント(英: axiom pinpointing)とは、結果が従うべき存在論における特定の公理を見つけること。この課題は多くの研究分野において異なる名称で研究され、技術改革と再発明につながっている。本稿では,公理ピンポインティングの概要を概説し,基本的な概念,それを解決するための異なるアプローチ,そして文献で検討されたバリエーションや応用について述べる。これは、関連する問題に関心のある研究者の出発点となり、詳細を深く掘り下げるための豊富な書誌がある。

Axiom pinpointing refers to the task of finding the specific axioms in an ontology that are responsible for a given consequence. This task has been studied, under different names, in many research areas, leading to a reformulation and reinvention of techniques. In this work, we present a general overview of axiom pinpointing, providing the basic notions, different approaches for solving it, and some variations and applications which have been considered in the literature. This should serve as a starting point for researchers interested in related problems, with an ample bibliography for delving deeper into the details.
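
To make the task concrete, here is a minimal sketch of one generic black-box strategy: start from the full axiom set and delete every axiom whose removal keeps the consequence derivable. It is not taken from the paper. The toy Horn-rule "ontology", the forward-chaining `entails` check, and all names are assumptions made for illustration.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# Hypothetical toy axiom format: (set of premises, conclusion).
Axiom = Tuple[FrozenSet[str], str]


def entails(axioms: Iterable[Axiom], goal: str) -> bool:
    """Forward chaining over Horn rules: is `goal` derivable?"""
    rules = list(axioms)
    derived: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in derived and premises <= derived:
                derived.add(conclusion)
                changed = True
    return goal in derived


def one_justification(axioms: List[Axiom], goal: str) -> List[Axiom]:
    """Shrink `axioms` to one minimal subset that still entails `goal`."""
    if not entails(axioms, goal):
        raise ValueError("goal is not a consequence of the axioms")
    kept = list(axioms)
    for axiom in axioms:
        trial = [a for a in kept if a != axiom]
        # Entailment is monotone, so an axiom that can be dropped now
        # never needs to be reconsidered later.
        if entails(trial, goal):
            kept = trial
    return kept


a1: Axiom = (frozenset(), "A")        # fact A
a2: Axiom = (frozenset({"A"}), "B")   # A implies B
a3: Axiom = (frozenset({"B"}), "C")   # B implies C
a4: Axiom = (frozenset(), "D")        # fact D
a5: Axiom = (frozenset({"D"}), "C")   # D implies C
ontology = [a1, a2, a3, a4, a5]

justification = one_justification(ontology, "C")
```

In this toy case the consequence C has two justifications, {a1, a2, a3} and {a4, a5}. The deletion order decides which one is returned, and here it is the second. Enumerating all justifications needs more machinery than this sketch shows.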

翻訳日:2022-12-22 12:57:48 公開日:2020-03-18

# 単発人物再同定の深層学習のための三重項置換法

Triplet Permutation Method for Deep Learning of Single-Shot Person Re-Identification ( http://arxiv.org/abs/2003.08303v1 )

ライセンス: Link先を確認

M. J. G\'omez-Silva, J.M. Armingol, A. de la Escalera

(参考訳) 深層畳み込みニューラルネットワークのトレーニングによる単発人物再識別(re-id)の解決は、1人当たり2枚の画像しか利用できないため、トレーニングデータの欠如による厄介な課題である。これによりモデルがオーバーフィッティングされ、性能が劣化する。本稿では,特定のre-idデータセットから複数のトレーニングセットを生成するために,Triplet Permutation法を定式化する。これはトリプルトネットワークを供給するための新しい戦略であり、シングルショットRe-Idモデルのオーバーフィッティングを低減する。改良されたパフォーマンスは、最も挑戦的なre-idデータセットであるprid2011で実証され、この方法の有効性が証明された。

Solving Single-Shot Person Re-Identification (Re-Id) by training Deep Convolutional Neural Networks is a daunting challenge, due to the lack of training data, since only two images per person are available. This causes the models to overfit, leading to degraded performance. This paper formulates the Triplet Permutation method to generate multiple training sets from a given Re-Id dataset. This is a novel strategy for feeding triplet networks, which reduces the overfitting of the Single-Shot Re-Id model. The improved performance has been demonstrated on one of the most challenging Re-Id datasets, PRID2011, proving the effectiveness of the method.

翻訳日:2022-12-22 12:51:08 公開日:2020-03-18

# 内科的回転平均化におけるミニマの分布について

On the Distribution of Minima in Intrinsic-Metric Rotation Averaging ( http://arxiv.org/abs/2003.08310v1 )

ライセンス: Link先を確認

Kyle Wilson and David Bindel

(参考訳) 回転平均化は3dシーンの画像からカメラの集合の向きを決定する非凸最適化問題である。この問題は様々な距離とロバスト化器を用いて研究されている。 SO(3) 上の内在的(あるいは測地的)距離は幾何学的に有意であるが、外在的距離に基づく解法では(条件付き)正当性を保証するが、内在的計量では同等の結果が見つからない。本稿では,局所ミニマの空間分布について検討する。まず、質的行動における鋭い遷移を示すために、新しい実証研究を行い、問題がより不安定になるにつれて、それらは単一の(簡単に探せる)支配的な最小の面からミニマで満たされたコスト面へと遷移する。本論文の第2部では、この遷移が起こるときの理論的境界を導出する。これは[24]の結果を拡張したもので、この問題の難しさを研究するためのプロキシとして局所凸性を用いたものです。問題の基底となる商多様体幾何を認識することにより、先行作業よりも n-次元の改善が得られる。ちなみに、我々の分析では、以前の$l_2$ワークを一般的な$l_p$コストにまで拡張しています。本研究は,問題難易度を示す指標として代数的接続性を用いることを提案する。

Rotation Averaging is a non-convex optimization problem that determines orientations of a collection of cameras from their images of a 3D scene. The problem has been studied using a variety of distances and robustifiers. The intrinsic (or geodesic) distance on SO(3) is geometrically meaningful; but while some extrinsic distance-based solvers admit (conditional) guarantees of correctness, no comparable results have been found under the intrinsic metric. In this paper, we study the spatial distribution of local minima. First, we conduct a novel empirical study to demonstrate sharp transitions in qualitative behavior: as problems become noisier, they transition from a single (easy-to-find) dominant minimum to a cost surface filled with minima. In the second part of this paper we derive a theoretical bound for when this transition occurs. This is an extension of the results of [24], which used local convexity as a proxy to study the difficulty of the problem. By recognizing the underlying quotient manifold geometry of the problem we achieve an n-fold improvement over prior work. Incidentally, our analysis also extends the prior $l_2$ work to general $l_p$ costs. Our results suggest using algebraic connectivity as an indicator of problem difficulty.
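
As an illustration of the quantities the abstract names, the sketch below computes the intrinsic (geodesic) distance on SO(3), the extrinsic chordal (Frobenius) distance, and an $l_p$ rotation averaging cost. The relative-rotation convention $R_{ij} \approx R_j R_i^\top$ and all function names are assumptions made for this example and are not taken from the paper.

```python
import math
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def rot_z(theta: float) -> Matrix:
    """Rotation by `theta` radians about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def geodesic_distance(r1: Matrix, r2: Matrix) -> float:
    """Intrinsic distance: the rotation angle of R1^T R2."""
    # trace(R1^T R2) equals the sum of elementwise products.
    trace = sum(r1[i][j] * r2[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.acos(cos_angle)


def chordal_distance(r1: Matrix, r2: Matrix) -> float:
    """Extrinsic distance: Frobenius norm of R1 - R2."""
    return math.sqrt(sum((r1[i][j] - r2[i][j]) ** 2
                         for i in range(3) for j in range(3)))


def relative(ri: Matrix, rj: Matrix) -> Matrix:
    """R_j R_i^T under the convention assumed in this sketch."""
    return [[sum(rj[a][k] * ri[b][k] for k in range(3)) for b in range(3)]
            for a in range(3)]


def lp_cost(rotations: List[Matrix],
            measurements: Dict[Tuple[int, int], Matrix],
            p: float) -> float:
    """Sum over edges of the geodesic residual raised to the power p."""
    return sum(
        geodesic_distance(meas, relative(rotations[i], rotations[j])) ** p
        for (i, j), meas in measurements.items())


angles = [0.0, 0.5, 1.2]
cameras = [rot_z(a) for a in angles]
edges = {(0, 1): rot_z(0.5), (1, 2): rot_z(0.7), (0, 2): rot_z(1.2)}
exact_cost = lp_cost(cameras, edges, p=2.0)
```

For two rotations that differ by an angle $\theta$, the chordal distance equals $2\sqrt{2}\,\sin(\theta/2)$. It flattens as $\theta$ approaches $\pi$, while the geodesic distance keeps growing linearly.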

翻訳日:2022-12-22 12:50:54 公開日:2020-03-18

# DeepCap:Weak Supervisionを使った単眼の人間パフォーマンスキャプチャ

DeepCap: Monocular Human Performance Capture Using Weak Supervision ( http://arxiv.org/abs/2003.08325v1 )

ライセンス: Link先を確認

Marc Habermann, Weipeng Xu, Michael Zollhoefer, Gerard Pons-Moll, Christian Theobalt

(参考訳) 人間のパフォーマンスキャプチャは、映画制作やバーチャル/拡張現実における多くの応用において、非常に重要なコンピュータビジョン問題である。以前の多くのパフォーマンスキャプチャアプローチでは、高価なマルチビューの設定が必要か、フレーム間対応で密集した時空コヒーレント形状を回復しなかった。本稿では,単眼高密度ヒトパフォーマンスキャプチャのための新しい深層学習手法を提案する。提案手法は,3次元基底真理アノテーションを用いたトレーニングデータを完全に除去する多視点監視に基づいて,弱教師付きで訓練される。ネットワークアーキテクチャは、タスクをポーズ推定と非剛性表面変形ステップに切り離す2つの別々のネットワークに基づいている。広範な質的・定量的評価は,我々のアプローチが品質と堅牢性の観点から,芸術の状態を上回っていることを示している。

Human performance capture is a highly important computer vision problem with many applications in movie production and virtual/augmented reality. Many previous performance capture approaches either required expensive multi-view setups or did not recover dense space-time coherent geometry with frame-to-frame correspondences. We propose a novel deep learning approach for monocular dense human performance capture. Our method is trained in a weakly supervised manner based on multi-view supervision completely removing the need for training data with 3D ground truth annotations. The network architecture is based on two separate networks that disentangle the task into a pose estimation and a non-rigid surface deformation step. Extensive qualitative and quantitative evaluations show that our approach outperforms the state of the art in terms of quality and robustness.

翻訳日:2022-12-22 12:50:32 公開日:2020-03-18

# RGB-Dスキャンによる逆テクスチャ最適化

Adversarial Texture Optimization from RGB-D Scans ( http://arxiv.org/abs/2003.08400v1 )

ライセンス: Link先を確認

Jingwei Huang, Justus Thies, Angela Dai, Abhijit Kundu, Chiyu Max Jiang, Leonidas Guibas, Matthias Nie{\ss}ner, Thomas Funkhouser

(参考訳) リアルなカラーテクスチャの生成は、rgb-d表面再構成の重要なステップであるが、再構成された形状の不正確さ、カメラのポーズのミスアライメント、ビュー依存のイメージアーティファクトのため、実際はまだ困難である。本研究では,弱教師付き視点から得られた条件付き逆数損失を用いた色彩テクスチャ生成手法を提案する。具体的には,これらの誤差に頑健な客観的関数を学習することにより,不整合画像からでも近似面に対してフォトリアリスティックなテクスチャを生成する手法を提案する。提案手法の鍵となる考え方は,テクスチャ最適化をミスアライメントに寛容に導くパッチベースの条件判別器を学習することである。識別器は合成ビューと実画像を取り、合成ビューが現実主義の広い定義の下で現実的かどうかを評価する。私たちは、'リアル'な'例の入力ビューとそれらの不整合バージョンを提供することで、判別子を訓練し、学習した敵の損失がスキャンからエラーを許容できるようにします。定量的・質的評価の下での合成データおよび実データ実験は,最先端技術と比較して,本手法の利点を実証する。私たちのコードはビデオデモで公開されています。

Realistic color texture generation is an important step in RGB-D surface reconstruction, but remains challenging in practice due to inaccuracies in reconstructed geometry, misaligned camera poses, and view-dependent imaging artifacts. In this work, we present a novel approach for color texture generation using a conditional adversarial loss obtained from weakly-supervised views. Specifically, we propose an approach to produce photorealistic textures for approximate surfaces, even from misaligned images, by learning an objective function that is robust to these errors. The key idea of our approach is to learn a patch-based conditional discriminator which guides the texture optimization to be tolerant to misalignments. Our discriminator takes a synthesized view and a real image, and evaluates whether the synthesized one is realistic, under a broadened definition of realism. We train the discriminator by providing as `real' examples pairs of input views and their misaligned versions -- so that the learned adversarial loss will tolerate errors from the scans. Experiments on synthetic and real data under quantitative or qualitative evaluation demonstrate the advantage of our approach in comparison to the state of the art. Our code is publicly available with video demonstration.

翻訳日:2022-12-22 12:48:51 公開日:2020-03-18

# swaptext: シーン内の画像ベースのテキスト転送

SwapText: Image Based Texts Transfer in Scenes ( http://arxiv.org/abs/2003.08152v1 )

ライセンス: Link先を確認

Qiangpeng Yang, Hongsheng Jin, Jun Huang, Wei Lin

(参考訳) オリジナルのフォント、色、サイズ、背景テクスチャを保存しながらシーンイメージにテキストをスワップすることは、異なる要因間の複雑な相互作用のために難しい課題である。本研究では,シーンイメージ間でテキストを転送する3段階フレームワークであるSwapTextを紹介する。まず,前景画像にのみテキストラベルを置換するために,新しいテキストスワップネットワークを提案する。次に、背景完了ネットワークを学習して背景画像を再構成する。最後に、生成された前景画像と背景画像を用いて、融合ネットワークにより単語画像を生成する。提案フレームワークを用いて,重度の幾何学的歪みであっても入力画像のテキストを操作できる。定性的かつ定量的な結果は、正規および不規則なテキストデータセットを含むいくつかのシーンテキストデータセットに表示される。我々は,画像ベーステキスト翻訳やテキスト画像合成などの手法の有用性を証明するため,広範な実験を行った。

Swapping text in scene images while preserving original fonts, colors, sizes and background textures is a challenging task due to the complex interplay between different factors. In this work, we present SwapText, a three-stage framework to transfer texts across scene images. First, a novel text swapping network is proposed to replace text labels only in the foreground image. Second, a background completion network is learned to reconstruct background images. Finally, the generated foreground image and background image are used to generate the word image by the fusion network. Using the proposed framework, we can manipulate the texts of the input images even with severe geometric distortion. Qualitative and quantitative results are presented on several scene text datasets, including regular and irregular text datasets. We conducted extensive experiments to demonstrate the usefulness of our method in applications such as image-based text translation and text image synthesis.

翻訳日:2022-12-22 12:42:42 公開日:2020-03-18

# 3次元ガウス核とのマルチビュー融合による3次元群数計測

3D Crowd Counting via Multi-View Fusion with 3D Gaussian Kernels ( http://arxiv.org/abs/2003.08162v1 )

ライセンス: Link先を確認

Qi Zhang and Antoni B. Chan

(参考訳) 群衆の数え上げは数十年にわたって研究され、特にDNNに基づく密度マップ推定法において、多くの研究が優れた成果を上げている。既存の群衆計数作業の多くは単一視点計数に重点を置いているが、複数のカメラを使用する大規模・広視野の多視点計数の研究は少ない。近年,Multi-view Multi-scale (MVMS) と呼ばれる,複数のカメラビューをCNNで融合し,平面上の2次元シーンレベルの密度マップを推定する手法が提案されている。 MVMSとは違って,2次元地上平面ではなく3次元シーンレベル密度マップと3次元特徴融合による多視点群カウントタスクを提案する。 2D融合と比較して、3D融合は、z次元(高さ)に沿った人々のより多くの情報を抽出し、複数のビューにわたるスケールの変動を解決するのに役立つ。 3D密度マップは、和がカウントである2D密度マップの特性を保ちながら、群衆密度に関する3D情報も提供する。また,2次元ビューにおける3次元予測と基底構造間の投影整合性について検討し,計数性能をさらに向上させる。提案手法は,3つのマルチビュー計数データセット上でテストし,最先端の計数性能と同等の性能を実現する。

Crowd counting has been studied for decades and many works have achieved good performance, especially the DNN-based density map estimation methods. Most existing crowd counting works focus on single-view counting, while few works have studied multi-view counting for large and wide scenes, where multiple cameras are used. Recently, an end-to-end multi-view crowd counting method called multi-view multi-scale (MVMS) has been proposed, which fuses multiple camera views using a CNN to predict a 2D scene-level density map on the ground-plane. Unlike MVMS, we propose to solve the multi-view crowd counting task through 3D feature fusion with 3D scene-level density maps, instead of the 2D ground-plane ones. Compared to 2D fusion, the 3D fusion extracts more information about the people along the z-dimension (height), which helps to resolve the scale variations across multiple views. The 3D density maps still preserve the property of 2D density maps that the sum is the count, while also providing 3D information about the crowd density. We also explore the projection consistency between the 3D prediction and the ground-truth in the 2D views to further enhance the counting performance. The proposed method is tested on 3 multi-view counting datasets and achieves better or comparable counting performance to the state-of-the-art.
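
The "sum is the count" property can be checked with a small sketch: each person contributes one 3D Gaussian kernel that is normalized to unit mass on the voxel grid, so the grid total equals the number of people, and summing along the height axis gives a 2D ground-plane map with the same total. Grid size, sigma, coordinates and function names are hypothetical. The paper's actual ground-truth construction may differ.

```python
import math
from typing import List, Sequence, Tuple

Grid3D = List[List[List[float]]]


def density_map_3d(people: Sequence[Tuple[float, float, float]],
                   shape: Tuple[int, int, int],
                   sigma: float = 1.0) -> Grid3D:
    """Place one unit-mass 3D Gaussian per person on a voxel grid."""
    nx, ny, nz = shape
    grid = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    for px, py, pz in people:
        weights = {}
        total = 0.0
        for x in range(nx):
            for y in range(ny):
                for z in range(nz):
                    d2 = (x - px) ** 2 + (y - py) ** 2 + (z - pz) ** 2
                    w = math.exp(-d2 / (2.0 * sigma ** 2))
                    weights[(x, y, z)] = w
                    total += w
        # Normalizing on the grid keeps each person's mass at exactly 1,
        # even when the kernel is truncated at the grid border.
        for (x, y, z), w in weights.items():
            grid[x][y][z] += w / total
    return grid


def total_count(grid: Grid3D) -> float:
    """Sum of all voxels, which is the estimated count."""
    return sum(v for plane in grid for row in plane for v in row)


def project_to_ground(grid: Grid3D) -> List[List[float]]:
    """Sum along z (height) to obtain a 2D ground-plane density map."""
    return [[sum(column) for column in plane] for plane in grid]


people = [(2.0, 3.0, 1.5), (8.0, 8.0, 1.7), (5.5, 1.0, 1.6)]
grid = density_map_3d(people, shape=(12, 12, 4), sigma=1.0)
ground = project_to_ground(grid)
```

Both `total_count(grid)` and the sum of `ground` come out at 3 up to floating-point error, matching the three people placed in the scene.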

翻訳日:2022-12-22 12:42:28 公開日:2020-03-18

# 少数音源ラベルを用いたドメイン適応のためのクロスドメイン自己教師型学習

Cross-domain Self-supervised Learning for Domain Adaptation with Few Source Labels ( http://arxiv.org/abs/2003.08264v1 )

ライセンス: Link先を確認

Donghyun Kim, Kuniaki Saito, Tae-Hyun Oh, Bryan A. Plummer, Stan Sclaroff, and Kate Saenko

(参考訳) 既存の教師なしドメイン適応メソッドは、ラベル豊富なソースドメインからラベルなしのターゲットドメインに知識を転送することを目的としている。しかしながら、一部のソースドメインのラベルを取得することは非常に高価であり、以前の作業で使われるような完全なラベル付けは実用的ではない。本研究では,対象領域がラベル付けされていない場合,ソース領域内のいくつかの例のみをラベル付けする,スパースラベル付きソースデータを用いた新しいドメイン適応シナリオについて検討する。ラベル付きソースの例が限られている場合、既存のメソッドはソースドメインとターゲットドメインの両方に適用可能な差別的特徴を学習できないことが多い。本稿では,ドメイン不変性だけでなく,クラス識別性も備えた特徴を学習する,ドメイン適応のための新しいクロスドメイン自己教師型学習手法を提案する。本手法は,ドメイン内自己スーパービジョンと視覚的類似性をドメイン適応的に捉え,ドメイン間自己スーパービジョンと整合するクロスドメイン機能を実行する。 3つの標準ベンチマークデータセットによる広範な実験において、本手法は、ソースラベルが少ない新しいターゲット領域におけるターゲット精度を著しく向上させ、古典的なドメイン適応シナリオにおいてさらに有用である。

Existing unsupervised domain adaptation methods aim to transfer knowledge from a label-rich source domain to an unlabeled target domain. However, obtaining labels for some source domains may be very expensive, making complete labeling as used in prior work impractical. In this work, we investigate a new domain adaptation scenario with sparsely labeled source data, where only a few examples in the source domain have been labeled, while the target domain is unlabeled. We show that when labeled source examples are limited, existing methods often fail to learn discriminative features applicable to both source and target domains. We propose a novel Cross-Domain Self-supervised (CDS) learning approach for domain adaptation, which learns features that are not only domain-invariant but also class-discriminative. Our self-supervised learning method captures apparent visual similarity with in-domain self-supervision in a domain adaptive manner and performs cross-domain feature matching with across-domain self-supervision. In extensive experiments with three standard benchmark datasets, our method significantly boosts target accuracy in the new target domain with few source labels and is even helpful in classical domain adaptation scenarios.

翻訳日:2022-12-22 12:40:27 公開日:2020-03-18

# PIC:長距離活動認識のための変分不変畳み込み

PIC: Permutation Invariant Convolution for Recognizing Long-range Activities ( http://arxiv.org/abs/2003.08275v1 )

ライセンス: Link先を確認

Noureldien Hussein, Efstratios Gavves, Arnold W.M. Smeulders

(参考訳) 畳み込み、自己注意、ベクトル集約などの神経操作は、短距離行動を認識するための選択肢である。しかし、長距離活動のモデリングには3つの制限がある。本稿では,長期活動の時間的構造をモデル化する新しい神経層であるPIC (Permutation Invariant Convolution) を提案する。望ましい性質は3つある。 i. 標準的な畳み込みとは異なり、PICは受容領域内の特徴の時間的置換に不変であり、弱い時間構造をモデル化する資格がある。 ii. ベクトルアグリゲーションと異なり、PICは局所接続を尊重し、カスケード層を用いて長距離時間抽象を学習することができる。 iii. 自己注意とは対照的に、PICは共有重量を使い、長く騒々しいビデオの中で最も差別的な視覚的証拠を検出することができる。本研究では,PICの3つの特性について検討し,Charades, Breakfast, MultiThumosの長距離活動の認識にその効果を示す。

Neural operations such as convolutions, self-attention, and vector aggregation are the go-to choices for recognizing short-range actions. However, they have three limitations in modeling long-range activities. This paper presents PIC, Permutation Invariant Convolution, a novel neural layer to model the temporal structure of long-range activities. It has three desirable properties. i. Unlike standard convolution, PIC is invariant to the temporal permutations of features within its receptive field, qualifying it to model the weak temporal structures. ii. Different from vector aggregation, PIC respects local connectivity, enabling it to learn long-range temporal abstractions using cascaded layers. iii. In contrast to self-attention, PIC uses shared weights, making it more capable of detecting the most discriminant visual evidence across long and noisy videos. We study the three properties of PIC and demonstrate its effectiveness in recognizing the long-range activities of Charades, Breakfast, and MultiThumos.

翻訳日:2022-12-22 12:40:05 公開日:2020-03-18

# AMIL:人文推定のための対話型マルチインスタンス学習

AMIL: Adversarial Multi Instance Learning for Human Pose Estimation ( http://arxiv.org/abs/2003.08002v1 )

ライセンス: Link先を確認

Pourya Shamsolmoali, Masoumeh Zareapoor, Huiyu Zhou, Jie Yang

(参考訳) 人間のポーズ推定は、ヒューマンコンピュータインタフェースから監視やコンテンツに基づくビデオ検索まで幅広い応用に重要な影響を与える。人間のポーズ推定では、関節の障害や人体の重なりが、離脱したポーズ推定に繋がる。これらの問題に対処するために,人体の構造の優先順位を統合することにより,ネットワークのトレーニング中にその優先順位を慎重に検討する新しい構造認識ネットワークを提案する。通常、そのような制約を学ぶことは難しい課題です。そこで本研究では,同一のアーキテクチャを持つ2つの残差複数インスタンス学習モデル(mil)を設計し,一方を生成器として,もう一方を判別器として使用する学習モデルとして生成型逆ネットワークを提案する。判別作業は、実際のポーズと偽のポーズを区別することである。ポーズ生成器が、判別器が実際のものと区別できない結果を生成すると、モデルが事前学習に成功する。提案モデルでは, 地中断熱マップと生成熱マップを区別し, その後, 逆方向の損失が生成体に逆伝搬する。このような手順は、発電機が合理的な身体構成を学ぶのを補助し、ポーズ推定精度を向上させるのに有利であることが証明される。一方,我々はmilの新しい機能を提案する。インスタンス選択とモデリングの両方を行うための調整可能な構造で、ひとつのバッグ内のインスタンス間で情報を適切に渡すことができる。提案された残留MILニューラルネットワークでは、プールアクションがバッグへのインスタンスコントリビューションを適切に更新する。ヒトのポーズ推定タスクの2つのデータセットにおいて、プールに基づく逆数残差マルチインスタンスニューラルネットワークが検証され、他の最先端モデルよりもうまく性能が向上した。

Human pose estimation has an important impact on a wide range of applications from human-computer interface to surveillance and content-based video retrieval. In human pose estimation, occluded joints and overlapping human bodies lead to inaccurate pose estimates. To address these problems, by integrating priors of the structure of human bodies, we present a novel structure-aware network to carefully consider such priors during the training of the network. Typically, learning such constraints is a challenging task. Instead, we propose generative adversarial networks as our learning model, in which we design two residual multiple instance learning (MIL) models with an identical architecture: one is used as the generator and the other as the discriminator. The discriminator's task is to distinguish the actual poses from the fake ones. If the pose generator generates results that the discriminator is not able to distinguish from the real ones, the model has successfully learnt the priors. In the proposed model, the discriminator differentiates the ground-truth heatmaps from the generated ones, and the adversarial loss is then back-propagated to the generator. Such a procedure helps the generator learn reasonable body configurations and proves advantageous for improving pose estimation accuracy. Meanwhile, we propose a novel function for MIL. It is an adjustable structure for both instance selection and modeling to appropriately pass the information between instances in a single bag. In the proposed residual MIL neural network, the pooling action adequately updates the instance contribution to its bag. The proposed pooling-based adversarial residual multi-instance neural network has been validated on two datasets for the human pose estimation task and outperforms other state-of-the-art models.

翻訳日:2022-12-22 10:18:21 公開日:2020-03-18

# ScanSSD:PDF文書画像における数式用シングルショット検出器

ScanSSD: Scanning Single Shot Detector for Mathematical Formulas in PDF Document Images ( http://arxiv.org/abs/2003.08005v1 )

ライセンス: Link先を確認

Parag Mali, Puneeth Kukkadapu, Mahshad Mahdavi, Richard Zanibbi

(参考訳) 本稿では,テキストからオフセットした数式をテキストラインに埋め込むScanning Single Shot Detector(ScanSSD)を提案する。 ScanSSDは検出に視覚的機能のみを使用し、レイアウト、フォント、文字ラベルなどのフォーマットやタイプセット情報を使用しない。 600dpiのドキュメントページイメージが与えられた場合、Single Shot Detector (SSD) はスライドウィンドウを使用して複数のスケールで公式を見つけ、次に候補検出をプールしてページレベルの結果を得る。実験では, TFD-ICDAR2019v2データセットを用いた。 ScanSSDは精度の高い公式の文字を検出し、0.926 fスコアを取得し、全体的なリコール率の高い公式を検出する。例えば、大きな空白ギャップ(変数の制約など)で式を分割したり、隣接するテキストラインで式をマージしたりするなどである。式検出f-スコアは 0.796 (iou $\geq0.5$) と 0.733 (iou $\ge 0.75$) である。私たちのデータ、評価ツール、コードは公開されています。

We introduce the Scanning Single Shot Detector (ScanSSD) for locating math formulas offset from text and embedded in textlines. ScanSSD uses only visual features for detection: no formatting or typesetting information such as layout, font, or character labels is employed. Given a 600 dpi document page image, a Single Shot Detector (SSD) locates formulas at multiple scales using sliding windows, after which candidate detections are pooled to obtain page-level results. For our experiments we use the TFD-ICDAR2019v2 dataset, a modification of the GTDB scanned math article collection. ScanSSD detects characters in formulas with high accuracy, obtaining a 0.926 f-score, and detects formulas with high recall overall. Detection errors are largely minor, such as splitting formulas at large whitespace gaps (e.g., for variable constraints) and merging formulas on adjacent textlines. Formula detection f-scores of 0.796 (IOU $\geq0.5$) and 0.733 (IOU $\ge 0.75$) are obtained. Our data, evaluation tools, and code are publicly available.
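
The reported f-scores depend on the IOU threshold used to decide whether a detection matches a ground-truth formula. The following sketch shows how such a score can be computed with a simple greedy one-to-one matching. It is an illustration under assumed conventions (boxes as `(x1, y1, x2, y2)`, greedy matching in detection order) and is not the authors' evaluation tool.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def f_score(detections: Sequence[Box],
            ground_truth: Sequence[Box],
            threshold: float) -> float:
    """F-score where a detection counts if its IOU >= threshold."""
    matched: set = set()
    true_positives = 0
    for det in detections:
        best_idx, best_iou = None, threshold
        for idx, gt in enumerate(ground_truth):
            if idx in matched:
                continue  # each ground-truth box is matched at most once
            value = iou(det, gt)
            if value >= best_iou:
                best_idx, best_iou = idx, value
        if best_idx is not None:
            matched.add(best_idx)
            true_positives += 1
    precision = true_positives / len(detections) if detections else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


truth: List[Box] = [(0, 0, 10, 10), (20, 0, 30, 10)]
found: List[Box] = [(0, 0, 10, 6), (21, 0, 30, 10), (50, 50, 60, 60)]
loose = f_score(found, truth, threshold=0.5)
strict = f_score(found, truth, threshold=0.75)
```

In this toy example the first detection overlaps its formula with an IOU of 0.6, so it counts at the 0.5 threshold but not at 0.75. That is the same mechanism that makes the paper's 0.75 score lower than its 0.5 score.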

翻訳日:2022-12-22 10:17:55 公開日:2020-03-18

# 物体追跡におけるr-spatiogramの適用による閉塞処理

Applying r-spatiogram in object tracking for occlusion handling ( http://arxiv.org/abs/2003.08021v1 )

ライセンス: Link先を確認

Niloufar Salehi Dastjerdi and M. Omair Ahmad

(参考訳) 物体追跡はコンピュータビジョンにおける最も重要な問題の1つである。ビデオトラッキングの目的は、対象または対象の軌跡を抽出し、すなわち、動画シーケンス内の移動対象を正確に特定し、シーケンスの特徴空間における非対象からターゲットを判別することである。したがって、特徴記述子はそのような差別に大きな影響を与える可能性がある。本稿では,参照モデルの3つの主要コンポーネント,すなわちオブジェクトモデリング,オブジェクト検出とローカライゼーション,モデル更新からなる,多くのトラッカの基本的な考え方について述べる。しかし、我々のシステムには大きな改善がある。我々のforthコンポーネントであるocclusion handlingはr-spatiogramを利用して最適なターゲット候補を検知する。スパティグラムはピクセルの座標上のいくつかのモーメントを含むが、r-spatiogramは、オブジェクトを表現するためによりリッチな特徴をキャプチャする画像内の与えられた特徴の分布に関する領域ベースのコンパクト性を計算する。本研究は,映像中の物体の出現変化や重度の閉塞の存在下での追跡を効果的かつ堅牢に行う方法を開発した。提案手法は,課題の異なるシーケンスを考慮し,プリンストン rgbd 追跡データセット上で評価し,提案手法の有効性を示す。

Object tracking is one of the most important problems in computer vision. The aim of video tracking is to extract the trajectories of a target or object of interest, i.e. accurately locate a moving target in a video sequence and discriminate the target from non-targets in the feature space of the sequence. So, feature descriptors can have significant effects on such discrimination. In this paper, we use the basic idea of many trackers, which consists of three main components of the reference model, i.e., object modeling, object detection and localization, and model updating. However, there are major improvements in our system. Our fourth component, occlusion handling, utilizes the r-spatiogram to detect the best target candidate. While the spatiogram contains some moments upon the coordinates of the pixels, the r-spatiogram computes region-based compactness on the distribution of the given feature in the image, which captures richer features to represent the objects. The proposed research develops an efficient and robust way to keep tracking the object throughout video sequences in the presence of significant appearance variations and severe occlusions. The proposed method is evaluated on the Princeton RGBD tracking dataset considering sequences with different challenges and the obtained results demonstrate the effectiveness of the proposed method.

翻訳日:2022-12-22 10:17:34 公開日:2020-03-18

# STH:効率的な行動認識のための時空間ハイブリッド畳み込み

STH: Spatio-Temporal Hybrid Convolution for Efficient Action Recognition ( http://arxiv.org/abs/2003.08042v1 )

ライセンス: Link先を確認

Xu Li, Jingwen Wang, Lin Ma, Kaihao Zhang, Fengzong Lian, Zhanhui Kang and Jinjun Wang

(参考訳) 効果的な時空間モデリングは行動認識に不可欠である。既存のメソッドは、モデルのパフォーマンスとモデルの複雑さの間のトレードオフに苦しむ。本稿では,空間的・時間的映像情報を少ないパラメータコストで同時に符号化する,空間的・時間的ハイブリッド・コンボリューション・ネットワーク(STH)を提案する。コンボリューション層が異なる空間的・時間的情報を逐次的または並列に抽出する既存の作業とは異なり、入力チャネルを複数のグループに分割し、空間的・時間的操作を1つの畳み込み層にインターリーブする。このような設計は効率的な時空間モデリングを可能にし、小さなモデルスケールを維持する。 STH-Convは一般的なビルディングブロックであり、従来の2D-Convブロック(2D畳み込み)を置き換えることで、ResNetやMobileNetのような既存の2D CNNアーキテクチャにプラグインすることができる。 STHネットワークは、Something (V1 & V2)、Jester、HMDB-51といったベンチマークデータセットの競合製品よりも、競争力やパフォーマンスの向上を実現している。さらに、sthは2d cnnよりもさらに小さなパラメータコストを維持しながら、3d cnnよりも優れた性能を享受する。

Effective and efficient spatio-temporal modeling is essential for action recognition. Existing methods suffer from the trade-off between model performance and model complexity. In this paper, we present a novel Spatio-Temporal Hybrid Convolution Network (denoted as "STH") which simultaneously encodes spatial and temporal video information with a small parameter cost. Different from existing works that extract spatial and temporal information sequentially or in parallel with different convolutional layers, we divide the input channels into multiple groups and interleave the spatial and temporal operations in one convolutional layer, which deeply incorporates spatial and temporal clues. Such a design enables efficient spatio-temporal modeling and maintains a small model scale. STH-Conv is a general building block, which can be plugged into existing 2D CNN architectures such as ResNet and MobileNet by replacing the conventional 2D-Conv blocks (2D convolutions). STH network achieves competitive or even better performance than its competitors on benchmark datasets such as Something-Something (V1 & V2), Jester, and HMDB-51. Moreover, STH enjoys performance superiority over 3D CNNs while maintaining an even smaller parameter cost than 2D CNNs.
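
The claim of a smaller parameter cost than 2D CNNs can be made plausible with back-of-the-envelope arithmetic. The sketch assumes, purely for illustration, that the spatial channel group uses 1x3x3 kernels and the temporal group uses 3x1x1 kernels, and that one quarter of the input channels is temporal. The abstract states neither the kernel shapes nor the split ratio. Biases are ignored.

```python
def conv_params(c_in: int, c_out: int, kt: int, kh: int, kw: int) -> int:
    """Weight count of a dense convolution with a kt x kh x kw kernel."""
    return c_in * c_out * kt * kh * kw


def hybrid_params(c_in: int, c_out: int, temporal_fraction: float) -> int:
    """Weight count when input channels are split into two groups.

    Assumed for illustration: spatial channels use 1x3x3 kernels,
    temporal channels use 3x1x1 kernels.
    """
    c_temporal = int(c_in * temporal_fraction)
    c_spatial = c_in - c_temporal
    return (conv_params(c_spatial, c_out, 1, 3, 3)
            + conv_params(c_temporal, c_out, 3, 1, 1))


channels_in, channels_out = 64, 64
params_2d = conv_params(channels_in, channels_out, 1, 3, 3)
params_3d = conv_params(channels_in, channels_out, 3, 3, 3)
params_hybrid = hybrid_params(channels_in, channels_out, temporal_fraction=0.25)
```

Under these assumptions a 64-to-64 channel layer needs 30,720 weights in the hybrid form, against 36,864 for a 2D 3x3 convolution and 110,592 for a full 3x3x3 convolution. Every channel moved from the spatial to the temporal group replaces nine weights per output channel with three.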

翻訳日:2022-12-22 10:16:56 公開日:2020-03-18

# パーキンソン病における顔面運動動態の推定:運動追跡のための2次元および3次元マーカーレスシステムの比較

Estimation of Orofacial Kinematics in Parkinson's Disease: Comparison of 2D and 3D Markerless Systems for Motion Tracking ( http://arxiv.org/abs/2003.08048v1 )

ライセンス: Link先を確認

Diego L. Guarin, Aidan Dempster, Andrea Bandini, Yana Yunusova and Babak Taati

(参考訳) 顔面の欠損はパーキンソン病(PD)の患者によく見られ、その進化は疾患進行の重要なバイオマーカーである可能性がある。本研究は, PDにおける口腔機能評価の自動化システムを開発し, 家庭内, クリニックで使用でき, 疾患管理に有用な, 客観的な臨床情報を提供する。我々の現在のアプローチは3次元顔の動きを推定するために色と深度カメラに依存している。しかし、深度カメラは一般的には利用できず、高価であり、制御とデータ処理のために特別なソフトウェアを必要とする。本研究の目的は,口腔顔面運動学から抽出した特徴に基づいて,健康管理とpd患者との鑑別に深度カメラが必要かどうかを評価することである。その結果,カラーカメラのみから抽出した2次元特徴は,カラーカメラと深度カメラから抽出した3次元特徴と同程度に情報的であり,PD患者の健康管理の差異が示唆された。これらの結果は,PDにおける口腔機能の自動的,客観的評価のための汎用システム開発への道を開くものである。

Orofacial deficits are common in people with Parkinson's disease (PD) and their evolution might represent an important biomarker of disease progression. We are developing an automated system for assessment of orofacial function in PD that can be used in-home or in-clinic and can provide useful and objective clinical information that informs disease management. Our current approach relies on color and depth cameras for the estimation of 3D facial movements. However, depth cameras are not commonly available, might be expensive, and require specialized software for control and data processing. The objective of this paper was to evaluate whether depth cameras are needed to differentiate between healthy controls and PD patients based on features extracted from orofacial kinematics. Results indicate that 2D features, extracted from color cameras only, are as informative as 3D features, extracted from color and depth cameras, for differentiating healthy controls from PD patients. These results pave the way for the development of a universal system for automatic and objective assessment of orofacial function in PD.

翻訳日:2022-12-22 10:16:33 公開日:2020-03-18

# 対面防止のための深部空間勾配と時間深度学習

Deep Spatial Gradient and Temporal Depth Learning for Face Anti-spoofing ( http://arxiv.org/abs/2003.08061v1 )

ライセンス: Link先を確認

Zezheng Wang, Zitong Yu, Chenxu Zhao, Xiangyu Zhu, Yunxiao Qin, Qiusheng Zhou, Feng Zhou, Zhen Lei

(参考訳) 顔認識システムのセキュリティには顔認識対策が不可欠である。深層学習は、顔の反偽造の最も効果的な方法の1つとして証明されている。大きな成功にもかかわらず、以前のほとんどの研究は、詳細な細かい情報と顔深度と動きパターンの相互作用を無視しながら、単に深度による損失を増大させることで、単一フレームのマルチタスクとして問題を定式化している。対照的に,我々は2つの洞察に基づいて,複数のフレームからプレゼンテーションアタックを検出する新しいアプローチをデザインする。 1)生活と陰影の間の詳細な識別的手がかり(例えば、空間的勾配等級)は、積み重ねられたバニラの畳み込みによって破棄され得る。 2) 3次元移動面のダイナミクスは, スプーフィング面を検出する上で重要な手がかりとなる。提案手法は,Residual Spatial Gradient Block (RSGB) を用いて識別の詳細を抽出し,時空間伝搬モジュール (STPM) から時空間情報を効率よく符号化する。さらに、より正確な深度監視のために、新しいContrastive Depth Lossが提示される。また,本手法の有効性を評価するために,サンプル毎に実際の深度を提供するDMAD(Double-modal Anti-Spoofing Dataset)も収集した。実験により,提案手法はOULU-NPU, SiW, CASIA-MFSD, Replay-Attack, そして新しいDMADを含む5つのベンチマークデータセットに対して,最先端の結果が得られることを示した。コードは https://github.com/clks-wzz/FAS-SGTD で入手できる。

Face anti-spoofing is critical to the security of face recognition systems. Depth supervised learning has been proven as one of the most effective methods for face anti-spoofing. Despite the great success, most previous works still formulate the problem as a single-frame multi-task one by simply augmenting the loss with depth, while neglecting the detailed fine-grained information and the interplay between facial depths and moving patterns. In contrast, we design a new approach to detect presentation attacks from multiple frames based on two insights: 1) detailed discriminative clues (e.g., spatial gradient magnitude) between living and spoofing faces may be discarded through stacked vanilla convolutions, and 2) the dynamics of 3D moving faces provide important clues in detecting the spoofing faces. The proposed method is able to capture discriminative details via Residual Spatial Gradient Block (RSGB) and encode spatio-temporal information from Spatio-Temporal Propagation Module (STPM) efficiently. Moreover, a novel Contrastive Depth Loss is presented for more accurate depth supervision. To assess the efficacy of our method, we also collect a Double-modal Anti-spoofing Dataset (DMAD) which provides actual depth for each sample. The experiments demonstrate that the proposed approach achieves state-of-the-art results on five benchmark datasets including OULU-NPU, SiW, CASIA-MFSD, Replay-Attack, and the new DMAD. Codes will be available at https://github.com/clks-wzz/FAS-SGTD.

翻訳日:2022-12-22 10:16:13 公開日:2020-03-18

# 画像から画像への変換を保存した幾何による教師なしマルチモーダル画像登録

Unsupervised Multi-Modal Image Registration via Geometry Preserving Image-to-Image Translation ( http://arxiv.org/abs/2003.08073v1 )

ライセンス: Link先を確認

Moab Arar, Yiftach Ginger, Dov Danon, Ilya Leizerson, Amit Bermano, Daniel Cohen-Or

(参考訳) 自動運転のような多くの応用は、モダリティ間の空間的アライメントを必要とするマルチモーダルデータに大きく依存している。多くのマルチモーダル登録法は、画像間の空間的対応の計算に苦慮している。本研究では,2つの入力モダリティのイメージ・ツー・イメージ翻訳ネットワークをトレーニングすることにより,モダリティ間の類似度向上の難しさを回避する。この学習された翻訳は、単純で信頼性の高いモノモダリティメトリクスを使用して登録ネットワークをトレーニングできる。空間変換ネットワークと翻訳ネットワークの2つのネットワークを用いてマルチモーダル登録を行う。我々は,翻訳ネットワークの幾何学的保存を奨励することで,正確な空間変換ネットワークをトレーニングできることを示す。最先端のマルチモーダル手法と比較して,提案手法は教師なしであり,トレーニングにアライメントされたモーダルのペアを必要とせず,任意の対のモーダルに適応できる。本手法は,商用データセット上で定量的・定性的に評価し,複数の形態で良好に動作し,高精度なアライメントを実現する。

Many applications, such as autonomous driving, heavily rely on multi-modal data where spatial alignment between the modalities is required. Most multi-modal registration methods struggle to compute the spatial correspondence between the images using prevalent cross-modality similarity measures. In this work, we bypass the difficulties of developing cross-modality similarity measures by training an image-to-image translation network on the two input modalities. This learned translation allows training the registration network using simple and reliable mono-modality metrics. We perform multi-modal registration using two networks: a spatial transformation network and a translation network. We show that by encouraging our translation network to be geometry preserving, we manage to train an accurate spatial transformation network. Compared to state-of-the-art multi-modal methods, our presented method is unsupervised, requiring no pairs of aligned modalities for training, and can be adapted to any pair of modalities. We evaluate our method quantitatively and qualitatively on commercial datasets, showing that it performs well on several modalities and achieves accurate alignment.

翻訳日:2022-12-22 10:15:18 公開日:2020-03-18

# MagicEyes: 複合現実感のための大規模視線推定データセット

MagicEyes: A Large Scale Eye Gaze Estimation Dataset for Mixed Reality ( http://arxiv.org/abs/2003.08806v1 )

ライセンス: Link先を確認

Zhengyang Wu, Srivignesh Rajendran, Tarrence van As, Joelle Zimmermann, Vijay Badrinarayanan, Andrew Rabinovich

(参考訳) 仮想および混合現実(xr)デバイスが出現したことで、コンピュータビジョンコミュニティではアイトラッキングが注目されている。視線推定はXRの重要な要素であり、エネルギー効率の良いレンダリング、多焦点ディスプレイ、コンテンツとの効果的な相互作用を可能にする。ヘッドマウントXRデバイスでは、視野を塞ぐのを避けるために、視線をオフ軸に撮像する。これにより、目に関連する量を推測する際の課題が増加し、同時に正確で堅牢な学習ベースのアプローチを開発する機会を提供する。そこで本研究では,実mr装置を用いて収集した最初の大規模眼球データセットであるmagiceyesを提案する。 MagicEyesには、587ドルの被験者と80,000ドルの人間ラベル付き地上の真実の画像、80,000ドルの目標ラベル付き画像が含まれている。そこで本研究では,magiceyesの最先端手法を評価するとともに,角膜,グリム,瞳孔を1回のフォワードパスで検出するマルチタスクアイネットモデルを提案する。

With the emergence of Virtual and Mixed Reality (XR) devices, eye tracking has received significant attention in the computer vision community. Eye gaze estimation is a crucial component in XR -- enabling energy efficient rendering, multi-focal displays, and effective interaction with content. In head-mounted XR devices, the eyes are imaged off-axis to avoid blocking the field of view. This leads to increased challenges in inferring eye related quantities and simultaneously provides an opportunity to develop accurate and robust learning based approaches. To this end, we present MagicEyes, the first large scale eye dataset collected using real MR devices with comprehensive ground truth labeling. MagicEyes includes $587$ subjects with $80,000$ images of human-labeled ground truth and over $800,000$ images with gaze target labels. We evaluate several state-of-the-art methods on MagicEyes and also propose a new multi-task EyeNet model designed for detecting the cornea, glints and pupil along with eye segmentation in a single forward pass.

翻訳日:2022-12-22 10:08:38 公開日:2020-03-18

# 自動車レーダを用いた深部空間セグメンテーション

Deep Open Space Segmentation using Automotive Radar ( http://arxiv.org/abs/2004.03449v1 )

ライセンス: Link先を確認

Farzan Erlik Nowruzi, Dhanvin Kolhatkar, Prince Kapoor, Fahed Al Hassanat, Elnaz Jahani Heravi, Robert Laganiere, Julien Rebut, Waqas Malik

(参考訳) 本研究では,駐車シナリオにおける開放空間を特定するために,高度な深部セグメンテーションモデルを用いたレーダを提案する。 SCORPと呼ばれるレーダー観測の公開データセットが収集された。深いモデルは様々なレーダ入力表現で評価される。提案手法は,低メモリ使用量およびリアルタイム処理速度を実現し,組込み配置に非常に適している。

In this work, we propose the use of radar with advanced deep segmentation models to identify open space in parking scenarios. A publicly available dataset of radar observations called SCORP was collected. Deep models are evaluated with various radar input representations. Our proposed approach achieves low memory usage and real-time processing speeds, and is thus very well suited for embedded deployment.

翻訳日:2022-12-22 10:08:16 公開日:2020-03-18

# オープンソース音声資源におけるジェンダー表現

Gender Representation in Open Source Speech Resources ( http://arxiv.org/abs/2003.08132v1 )

ライセンス: Link先を確認

Mahault Garnerin, Solange Rossato, Laurent Besacier

(参考訳) 人工知能(AI)の台頭とディープラーニングアーキテクチャの利用の増加に伴い、AIシステムの倫理、透明性、公正性の問題は研究コミュニティの中心的な関心事となっている。我々は,open speech and language resource platform を通じて利用可能な音声資源における性表現に関する研究を行い,音声言語システムの透明性と公平性について論じる。オープンソースコーパスにおけるジェンダー情報の発見は簡単ではなく、ジェンダーバランスは他のコーパスの特徴にも依存することを示す(Elicited/non elicited speech, Low/high Resource Language, speech task targeted)。この論文は、このようなコーパスを用いて構築された音声システムの透明性を高めるために、研究者のためのメタデータと性別情報に関する勧告で締めくくられる。

With the rise of artificial intelligence (AI) and the growing use of deep-learning architectures, the question of ethics, transparency and fairness of AI systems has become a central concern within the research community. We address transparency and fairness in spoken language systems by proposing a study about gender representation in speech resources available through the Open Speech and Language Resource platform. We show that finding gender information in open source corpora is not straightforward and that gender balance depends on other corpus characteristics (elicited/non elicited speech, low/high resource language, speech task targeted). The paper ends with recommendations about metadata and gender information for researchers in order to assure better transparency of the speech systems built using such corpora.

翻訳日:2022-12-22 10:07:45 公開日:2020-03-18

# スキップグラムモデルの学習規則の分析

An Analysis on the Learning Rules of the Skip-Gram Model ( http://arxiv.org/abs/2003.08489v1 )

ライセンス: Link先を確認

Canlin Zhang, Xiuwen Liu and Daniel Bis

(参考訳) 自然言語処理タスクにおける表現の一般化を改善するため、単語はベクトルを用いて表現され、ベクトル間の距離は単語の類似度と関連付けられる。スキップグラムモデルの最先端実装である word2vec は、多くの自然言語処理タスクの性能向上に広く利用されているが、そのメカニズムはまだよく理解されていない。本研究では,スキップグラムモデルの学習ルールを導出し,それらの競合学習との密接な関係を確立する。さらに,スキップグラムモデルに対する大域的最適解制約を提供し,実験結果を用いて検証する。

To improve the generalization of the representations for natural language processing tasks, words are commonly represented using vectors, where distances among the vectors are related to the similarity of the words. While word2vec, the state-of-the-art implementation of the skip-gram model, is widely used and improves the performance of many natural language processing tasks, its mechanism is not yet well understood. In this work, we derive the learning rules for the skip-gram model and establish their close relationship to competitive learning. In addition, we provide the global optimal solution constraints for the skip-gram model and validate them by experimental results.
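
As a point of reference for the learning rules discussed in this abstract, the following is a minimal sketch of one stochastic gradient step of skip-gram with negative sampling. It is the textbook update, not the rules derived in the paper. The names `sgns_update`, `w_in`, `w_out`, the learning rate, and the plain-list data layout are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sgns_update(w_in, w_out, center, context, negatives, lr=0.1):
    """One SGD step of skip-gram with negative sampling (illustrative sketch).

    w_in / w_out are lists of input / output embedding vectors (hypothetical layout).
    The true context word has label 1, each sampled negative word has label 0.
    """
    v = w_in[center]
    grad_v = [0.0] * len(v)
    for word, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        u = w_out[word]
        # Derivative of the logistic loss with respect to the score v . u
        g = sigmoid(dot(v, u)) - label
        for i in range(len(v)):
            grad_v[i] += g * u[i]      # accumulate gradient for the center vector
            u[i] -= lr * g * v[i]      # update the output vector in place
    for i in range(len(v)):
        v[i] -= lr * grad_v[i]         # update the center vector last
```

The step pulls the center vector toward the observed context vector and pushes it away from the sampled negatives, which is the kind of attract/repel behaviour that makes a comparison with competitive learning natural.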

翻訳日:2022-12-22 10:06:56 公開日:2020-03-18

# 深層強化学習による配置最適化

Placement Optimization with Deep Reinforcement Learning ( http://arxiv.org/abs/2003.08445v1 )

ライセンス: Link先を確認

Anna Goldie and Azalia Mirhoseini

(参考訳) 配置最適化はシステムやチップ設計において重要な問題であり、グラフのノードを制約の対象となる目的のために最適化するための限られたリソースセットにマッピングする。本稿では,配置問題の解法として強化学習を動機づけることから始める。次に、深い強化学習とは何かの概要を示す。次に、配置問題を強化学習問題として定式化し、政策勾配最適化を用いてこの問題をいかに解決できるかを示す。最後に,様々な配置最適化問題に対する深層強化学習政策の訓練から学んだ教訓について述べる。

Placement Optimization is an important problem in systems and chip design, which consists of mapping the nodes of a graph onto a limited set of resources to optimize for an objective, subject to constraints. In this paper, we start by motivating reinforcement learning as a solution to the placement problem. We then give an overview of what deep reinforcement learning is. We next formulate the placement problem as a reinforcement learning problem and show how this problem can be solved with policy gradient optimization. Finally, we describe lessons we have learned from training deep reinforcement learning policies across a variety of placement optimization problems.

翻訳日:2022-12-22 10:06:19 公開日:2020-03-18

# ロータテ・アンド・レンダー:シングルビュー画像からの教師なしフォトリアリスティック顔回転

Rotate-and-Render: Unsupervised Photorealistic Face Rotation from Single-View Images ( http://arxiv.org/abs/2003.08124v1 )

ライセンス: Link先を確認

Hang Zhou, Jihao Liu, Ziwei Liu, Yu Liu, Xiaogang Wang

(参考訳) 顔の回転は近年急速に進歩しているが、高品質なペアリングトレーニングデータの欠如は、既存の手法にとって大きなハードルとなっている。現在の生成モデルは、同一人物のマルチビューイメージを持つデータセットに大きく依存している。したがって、生成された結果は、データソースのスケールとドメインによって制限される。これらの課題を克服するために、野生の単視点画像コレクションのみを用いて、写真リアルな回転面を合成できる新しい教師なしフレームワークを提案する。私たちの重要な洞察は、3D空間の顔を前後に回転させ、2D平面に再レンダリングすることで、強力な自己スーパービジョンになるということです。我々は3次元顔モデリングと高分解能GANの最近の進歩を活用して構築ブロックを構成する。顔の3次元回転・回転は細部を損なうことなく任意の角度に適用できるため,既存の手法が不足している実地シナリオ(ペアデータがない場合など)に極めて適している。広範な実験により,提案手法は合成品質が優れ,かつ最先端の手法に対するアイデンティティの保存が幅広いポーズやドメインにまたがることを示した。さらに,我々のローテーション・アンド・レンダー・フレームワークが,強力なベースラインモデルであっても,現代の顔認識システムを強化する効果的なデータ拡張エンジンとして機能することを検証する。

Though face rotation has achieved rapid progress in recent years, the lack of high-quality paired training data remains a great hurdle for existing methods. The current generative models heavily rely on datasets with multi-view images of the same person. Thus, their generated results are restricted by the scale and domain of the data source. To overcome these challenges, we propose a novel unsupervised framework that can synthesize photo-realistic rotated faces using only single-view image collections in the wild. Our key insight is that rotating faces in the 3D space back and forth, and re-rendering them to the 2D plane can serve as a strong self-supervision. We leverage the recent advances in 3D face modeling and high-resolution GAN to constitute our building blocks. Since the 3D rotation-and-render on faces can be applied to arbitrary angles without losing details, our approach is extremely suitable for in-the-wild scenarios (i.e. no paired data are available), where existing methods fall short. Extensive experiments demonstrate that our approach has superior synthesis quality as well as identity preservation over the state-of-the-art methods, across a wide range of poses and domains. Furthermore, we validate that our rotate-and-render framework naturally can act as an effective data augmentation engine for boosting modern face recognition systems even on strong baseline models.

翻訳日:2022-12-22 10:00:41 公開日:2020-03-18

# 時空間特徴系列に基づくドライバ疲労認識アルゴリズム

A Driver Fatigue Recognition Algorithm Based on Spatio-Temporal Feature Sequence ( http://arxiv.org/abs/2003.08134v1 )

ライセンス: Link先を確認

Chen Zhang, Xiaobo Lu, Zhiliang Huang

(参考訳) 道路交通事故における疲労運転は交通事故の重要な原因の一つであり,運転者の疲労認識アルゴリズムを用いて道路交通安全を改善することが重要である。近年、ディープラーニングの発展に伴い、パターン認識の分野は大きな発展を遂げている。本稿では, 時空間特徴系列に基づくリアルタイム疲労状態認識アルゴリズムを設計し, 主に疲労運転認識の現場に適用できることを示す。このアルゴリズムは,顔検出ネットワーク,顔のランドマーク検出,頭部ポーズ推定ネットワーク,疲労認識ネットワークの3つのタスクネットワークに分けられる。実験により,このアルゴリズムは小体積,高速,高精度の利点を有することが示された。

Research shows that fatigue driving is one of the important causes of road traffic accidents, so it is of great significance to study driver fatigue recognition algorithms to improve road traffic safety. In recent years, with the development of deep learning, the field of pattern recognition has made great progress. This paper designs a real-time fatigue state recognition algorithm based on spatio-temporal feature sequences, which can be mainly applied to fatigue driving recognition. The algorithm is divided into three task networks: a face detection network, a facial landmark detection and head pose estimation network, and a fatigue recognition network. Experiments show that the algorithm has the advantages of small volume, high speed and high accuracy.

翻訳日:2022-12-22 10:00:17 公開日:2020-03-18

# cafenet:クラス非依存のマイナショットエッジ検出ネットワーク

CAFENet: Class-Agnostic Few-Shot Edge Detection Network ( http://arxiv.org/abs/2003.08235v1 )

ライセンス: Link先を確認

Young-Hyun Park, Jun Seo, Jaekyun Moon

(参考訳) 少数のラベル付きサンプルのみを用いて、新しいカテゴリのクリップ境界をローカライズすることを目的とした、数発のセマンティックエッジ検出と呼ばれる、新しい数発の学習課題に取り組む。また,メタ学習戦略に基づくクラス非依存Few-shot Edge Detection Network (CAFENet)を提案する。 CAFENetは、エッジラベルのセマンティック情報の欠如を補うために、セマンティックセグメンテーションモジュールを小規模に採用している。予測されたセグメンテーションマスクは、対象対象領域をハイライトするアテンションマップを生成し、デコーダモジュールをその領域に集中させる。また,マルチスプリットマッチングに基づく新たな正規化手法を提案する。メタトレーニングでは、高次元ベクトルを持つ計量学習問題は、低次元部分ベクトルを持つ小さな部分問題に分割される。そこで我々はFSE-1000とSBD-$5^i$という2つの新しいデータセットを構築し,提案したCAFENetの性能評価を行った。大規模なシミュレーション結果からCAFENetで採用した手法の性能評価が得られた。

We tackle a novel few-shot learning challenge, which we call few-shot semantic edge detection, aiming to localize crisp boundaries of novel categories using only a few labeled samples. We also present a Class-Agnostic Few-shot Edge detection Network (CAFENet) based on a meta-learning strategy. CAFENet employs a small-scale semantic segmentation module to compensate for the lack of semantic information in edge labels. The predicted segmentation mask is used to generate an attention map to highlight the target object region, and make the decoder module concentrate on that region. We also propose a new regularization method based on multi-split matching. In meta-training, the metric-learning problem with high-dimensional vectors is divided into small subproblems with low-dimensional sub-vectors. Since there is no existing dataset for few-shot semantic edge detection, we construct two new datasets, FSE-1000 and SBD-$5^i$, and evaluate the performance of the proposed CAFENet on them. Extensive simulation results confirm the performance merits of the techniques adopted in CAFENet.

翻訳日:2022-12-22 10:00:07 公開日:2020-03-18

# 入院患者の栄養素摂取量評価のための人工知能システム

An Artificial Intelligence-Based System to Assess Nutrient Intake for Hospitalised Patients ( http://arxiv.org/abs/2003.08273v1 )

ライセンス: Link先を確認

Ya Lu, Thomai Stathopoulou, Maria F. Vasiloglou, Stergios Christodoulidis, Zeno Stanga, Stavroula Mougiakakou

(参考訳) 入院患者の栄養摂取の定期的なモニタリングは、疾患関連栄養失調のリスクを低減する上で重要な役割を果たす。栄養素摂取量を推定するいくつかの手法が開発されているが、データ精度を改善し、参加者の負担と健康コストを軽減できるため、より信頼性が高く完全に自動化された技術が要求されている。本稿では,食事摂取前後のRGB深度(RGB-D)画像ペアを簡便に処理することで,栄養摂取量を正確に推定する人工知能(AI)に基づく新しいシステムを提案する。このシステムは、食品セグメンテーションのための新しいマルチタスクコンテキストネットワークと、食品認識のための限られたトレーニングサンプルで構築された数ショット学習ベースの分類器と、3d表面構築のためのアルゴリズムを含んでいる。これにより、食品のシーケンシャルセグメンテーション、認識、消費食品量の推定が可能になり、各食事の栄養素摂取量を完全に自動で推定することができる。システムの開発と評価のために,322食の画像と栄養素のレシピを含む専用データベースを,革新的な戦略を用いてデータアノテーションと組み合わせて構築した。実験の結果, 推定栄養素摂取量は, 地上の真実と高い相関関係を示し, 平均相対誤差(20%)が非常に小さく, 既存の栄養素摂取評価技術よりも優れていた。

Regular monitoring of nutrient intake in hospitalised patients plays a critical role in reducing the risk of disease-related malnutrition. Although several methods to estimate nutrient intake have been developed, there is still a clear demand for a more reliable and fully automated technique, as this could improve data accuracy and reduce both the burden on participants and health costs. In this paper, we propose a novel system based on artificial intelligence (AI) to accurately estimate nutrient intake, by simply processing RGB Depth (RGB-D) image pairs captured before and after meal consumption. The system includes a novel multi-task contextual network for food segmentation, a few-shot learning-based classifier built by limited training samples for food recognition, and an algorithm for 3D surface construction. This allows sequential food segmentation, recognition, and estimation of the consumed food volume, permitting fully automatic estimation of the nutrient intake for each meal. For the development and evaluation of the system, a dedicated new database containing images and nutrient recipes of 322 meals is assembled, coupled to data annotation using innovative strategies. Experimental results demonstrate that the estimated nutrient intake is highly correlated (> 0.91) to the ground truth and shows very small mean relative errors (< 20%), outperforming existing techniques proposed for nutrient intake assessment.

翻訳日:2022-12-22 09:59:49 公開日:2020-03-18

# MINT:相互情報に基づくニューロントリミングによるディープネットワーク圧縮

MINT: Deep Network Compression via Mutual Information-based Neuron Trimming ( http://arxiv.org/abs/2003.08472v1 )

ライセンス: Link先を確認

Madan Ravi Ganesh, Jason J. Corso, Salimeh Yasaei Sekeh

(参考訳) プルーニングによるディープニューラルネットワーク圧縮へのほとんどのアプローチは、その重みを使ってフィルタの重要性を評価するか、あるいはスパシティ制約のある代替目的関数を最適化する。これらの手法は、類似のフィルタからの貢献を近似する有用な方法を提供するが、しばしば層間の依存性を無視したり、標準的なクロスエントロピーよりもより難しい最適化目標を解決したりする。我々の手法であるMINT(Multual Information-based Neuron Trimming)は,各層にまたがる隣接層間のフィルタの強度に基づいて,パーキングによる深部圧縮にアプローチする。この関係は、グラフベースの基準を用いてフィルタ間で交換される類似情報量を評価する条件付き幾何相互情報を用いて算出される。ネットワークをプルーニングする場合、保持されたフィルタが、高い性能を保証する後続層への情報の大部分に寄与することを保証する。提案手法は,MNIST, CIFAR-10, ILSVRC2012など,様々なネットワークアーキテクチャの標準ベンチマークにおいて,既存の最先端圧縮処理手法よりも優れている。さらに,本手法の逆攻撃に対する応答と,元のネットワークと比較した場合の校正統計との共通分母の観測について検討した。

Most approaches to deep neural network compression via pruning either evaluate a filter's importance using its weights or optimize an alternative objective function with sparsity constraints. While these methods offer a useful way to approximate contributions from similar filters, they often either ignore the dependency between layers or solve a more difficult optimization objective than standard cross-entropy. Our method, Mutual Information-based Neuron Trimming (MINT), approaches deep compression via pruning by enforcing sparsity based on the strength of the relationship between filters of adjacent layers, across every pair of layers. The relationship is calculated using conditional geometric mutual information which evaluates the amount of similar information exchanged between the filters using a graph-based criterion. When pruning a network, we ensure that retained filters contribute the majority of the information towards succeeding layers which ensures high performance. Our novel approach outperforms existing state-of-the-art compression-via-pruning methods on the standard benchmarks for this task: MNIST, CIFAR-10, and ILSVRC2012, across a variety of network architectures. In addition, we discuss our observations of a common denominator between our pruning methodology's response to adversarial attacks and calibration statistics when compared to the original network.

翻訳日:2022-12-22 09:57:47 公開日:2020-03-18

# ヘッドマウントディスプレイ用ガゼセンシングLED

Gaze-Sensing LEDs for Head Mounted Displays ( http://arxiv.org/abs/2003.08499v1 )

ライセンス: Link先を確認

Kaan Ak\c{s}it, Jan Kautz, David Luebke

(参考訳) ヘッドマウントディスプレイ(HMD)のための新しいガウントラッカーを導入する。光エミットダイオード(LED)を用いて、2つの既製のHMDを視線対応に修正する。私たちの重要な貢献は、LEDのセンシング機能を利用して、仮想現実(VR)アプリケーションのための低消費電力の視線トラッカーを作成することです。これにより、最小限のハードウェアを使用して、モバイルデバイス上で動作する軽量教師付きガウスプロセス回帰(GPR)を使用して、高い精度と低レイテンシを実現するシンプルなアプローチが得られる。ハードウェアを用いて,ミンコフスキー距離測度に基づくGPR実装は,自由パラメータを正確に決定することなく,一般的に使用される放射基底関数に基づくサポートベクター回帰(SVR)よりも優れていることを示す。本手法では,オフ軸光路による複雑な次元縮小,特徴抽出,歪み補正を必要としないことを示す。眼球追跡アプリケーションを用いた2つの完全なHMDプロトタイプを実演し,プロトタイプを用いた一連の主観的テストについて報告する。

We introduce a new gaze tracker for Head Mounted Displays (HMDs). We modify two off-the-shelf HMDs to be gaze-aware using Light Emitting Diodes (LEDs). Our key contribution is to exploit the sensing capability of LEDs to create a low-power gaze tracker for virtual reality (VR) applications. This yields a simple approach using minimal hardware to achieve good accuracy and low latency using light-weight supervised Gaussian Process Regression (GPR) running on a mobile device. With our hardware, we show that a GPR implementation based on the Minkowski distance measure outperforms the commonly used radial basis function-based support vector regression (SVR) without the need to precisely determine free parameters. We show that our gaze estimation method does not require complex dimension reduction techniques, feature extraction, or distortion corrections due to off-axis optical paths. We demonstrate two complete HMD prototypes with a sample eye-tracked application, and report on a series of subjective tests using our prototypes.
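
For readers unfamiliar with the distance measure named in this abstract, here is a minimal sketch of the Minkowski distance between two feature vectors. How the authors plug it into their GPR implementation is not stated in the abstract, so only the distance itself is shown; the function name and the choice of plain Python lists are assumptions.

```python
def minkowski_distance(x, y, p=2.0):
    """Minkowski distance of order p between two equal-length vectors.

    p = 1 gives the Manhattan distance and p = 2 the Euclidean distance.
    """
    if p < 1:
        raise ValueError("p must be >= 1 for the result to be a metric")
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)
```

In a setting like the one described, `x` and `y` could be vectors of per-LED sensor readings, but that mapping is a guess rather than something the abstract specifies.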

翻訳日:2022-12-22 09:57:23 公開日:2020-03-18

# 安定な神経流れ

Stable Neural Flows ( http://arxiv.org/abs/2003.08063v1 )

ライセンス: Link先を確認

Stefano Massaroli, Michael Poli, Michelangelo Bin, Jinkyoo Park, Atsushi Yamashita, Hajime Asama

(参考訳) ニューラルネットワークによってパラメータ化されたエネルギー関数上で軌道が進化するニューラル常微分方程式(ニューラルODE)の確率的に安定な変種を導入する。安定なニューラルフローは、深さ流の漸近安定性を暗黙的に保証し、数値解法に対する入力摂動に対する頑健さと計算負荷を低下させる。学習手順は最適制御問題としてキャストされ、随伴感性分析に基づいて近似解が提案される。さらに最適化プロセスの容易化と収束の高速化を目的とした新しい正規化器を導入する。提案するモデルクラスは非線形分類と関数近似タスクで評価される。

We introduce a provably stable variant of neural ordinary differential equations (neural ODEs) whose trajectories evolve on an energy functional parametrised by a neural network. Stable neural flows provide an implicit guarantee on asymptotic stability of the depth-flows, leading to robustness against input perturbations and low computational burden for the numerical solver. The learning procedure is cast as an optimal control problem, and an approximate solution is proposed based on adjoint sensitivity analysis. We further introduce novel regularizers designed to ease the optimization process and speed up convergence. The proposed model class is evaluated on non-linear classification and function approximation tasks.
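
The idea of trajectories evolving on an energy functional can be illustrated with a toy gradient flow, dx/dt = -∇E(x), integrated with explicit Euler steps. This is only a sketch under strong assumptions: the energy below is a fixed hand-written polynomial rather than a neural network, and the abstract does not state that the paper's dynamics take exactly this form. All names are hypothetical.

```python
def energy(x):
    """Toy energy with a single minimum at the origin (stand-in for a learned one)."""
    return sum(0.5 * xi ** 2 + 0.25 * xi ** 4 for xi in x)


def grad_energy(x):
    """Analytic gradient of the toy energy."""
    return [xi + xi ** 3 for xi in x]


def euler_flow(x0, step=0.05, n_steps=100):
    """Integrate dx/dt = -grad E(x) with explicit Euler and record the energy."""
    x = list(x0)
    energies = [energy(x)]
    for _ in range(n_steps):
        g = grad_energy(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
        energies.append(energy(x))
    return x, energies
```

For a sufficiently small step size the recorded energy decreases along the trajectory and the state settles at the minimum, which is the intuition behind obtaining stability from the structure of the dynamics rather than from training alone.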

翻訳日:2022-12-22 09:51:08 公開日:2020-03-18

# 近似勾配法による非凸非微分可能ミニマックスゲームの解法

Solving Non-Convex Non-Differentiable Min-Max Games using Proximal Gradient Method ( http://arxiv.org/abs/2003.08093v1 )

ライセンス: Link先を確認

Babak Barazandeh and Meisam Razaviyayn

(参考訳) min-max saddle pointゲームは、機械の傾きや信号処理における幅広い応用に現れる。適用性は広いが、理論的な研究は主に特別な凸凹構造に限られる。最近の研究では、これらの結果を特別な滑らかな非凸ケースに一般化したものの、非滑らかなシナリオに対する理解はまだ限られている。本研究では,目的関数がプレイヤーの決定変数の1つに対して(強く)凸である場合,非滑らかなmin-maxゲームの特徴形式について検討する。単純な多段階の近位勾配降下勾配アルゴリズムは、min-maxゲームの1次ナッシュ平衡に収束し、1/\epsilon$の多項式となる勾配評価の個数を示す。また、文献上に存在するものよりも定常性の概念が強いことも示します。最後に,LASSO推定器への逆攻撃による提案アルゴリズムの性能評価を行った。

Min-max saddle point games appear in a wide range of applications in machine learning and signal processing. Despite their wide applicability, theoretical studies are mostly limited to the special convex-concave structure. While some recent works generalized these results to special smooth non-convex cases, our understanding of non-smooth scenarios is still limited. In this work, we study a special form of non-smooth min-max games in which the objective function is (strongly) convex with respect to one of the players' decision variables. We show that a simple multi-step proximal gradient descent-ascent algorithm converges to an $\epsilon$-first-order Nash equilibrium of the min-max game with the number of gradient evaluations being polynomial in $1/\epsilon$. We also show that our notion of stationarity is stronger than existing ones in the literature. Finally, we evaluate the performance of the proposed algorithm through an adversarial attack on a LASSO estimator.
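
The basic building block behind proximal gradient methods can be sketched for the common case of an $\ell_1$ penalty, where the proximal operator is soft-thresholding. The code below shows a single proximal gradient step on the minimisation side only; it does not reproduce the paper's multi-step descent-ascent algorithm, and the function names and the $\ell_1$ choice are assumptions made for illustration.

```python
def soft_threshold(v, tau):
    """Proximal operator of tau * |v|: shrink v toward zero by tau."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def proximal_gradient_step(x, grad, step, lam):
    """One step x <- prox_{step * lam * ||.||_1}(x - step * grad).

    x and grad are equal-length lists; grad is the gradient of the smooth part at x.
    """
    return [soft_threshold(xi - step * gi, step * lam) for xi, gi in zip(x, grad)]
```

The gradient step handles the smooth part of the objective and the thresholding handles the non-differentiable part exactly, which is why the method applies to non-smooth objectives such as the LASSO.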

翻訳日:2022-12-22 09:50:48 公開日:2020-03-18

# 点雲の動的還元ネットワーク

A Dynamic Reduction Network for Point Clouds ( http://arxiv.org/abs/2003.08013v1 )

ライセンス: Link先を確認

Lindsey Gray (1), Thomas Klijnsma (1), Shamik Ghosh (2) ((1) Fermi National Accelerator Laboratory, (2) Saha Institute of Nuclear Physics)

(参考訳) 画像全体を分類することは機械学習の古典的な問題であり、グラフニューラルネットワークは非常に不規則な幾何学を学ぶための強力な手法である。全体分類を決定する場合、点雲の一部が他の部分よりも重要である場合がしばしばある。グラフ構造では、これは畳み込みフィルタの最後に情報をプールすることから始まり、静的グラフ上の様々なステージ付きプーリング技術へと進化した。本稿では,所定のグラフ構造の必要性を排除したプーリングの動的グラフ定式化を導入する。中間クラスタリングを通じてデータ間の最も重要な関係を動的に学習することで、これを実現する。ネットワークアーキテクチャは、表現サイズと効率性を考慮した興味深い結果をもたらす。また、高エネルギー粒子物理学における画像分類からエネルギー回帰まで、多くのタスクに容易に適応できる。

Classifying whole images is a classic problem in machine learning, and graph neural networks are a powerful methodology to learn highly irregular geometries. It is often the case that certain parts of a point cloud are more important than others when determining overall classification. On graph structures this started by pooling information at the end of convolutional filters, and has evolved to a variety of staged pooling techniques on static graphs. In this paper, a dynamic graph formulation of pooling is introduced that removes the need for predetermined graph structure. It achieves this by dynamically learning the most important relationships between data via an intermediate clustering. The network architecture yields interesting results considering representation size and efficiency. It also adapts easily to a large number of tasks from image classification to energy regression in high energy particle physics.

翻訳日:2022-12-22 09:49:35 公開日:2020-03-18

# カプセルネットワークを用いたジェネレータアーキテクチャのためのカプセルGAN

Capsule GAN Using Capsule Network for Generator Architecture ( http://arxiv.org/abs/2003.08047v1 )

ライセンス: Link先を確認

Kanako Marusaki and Hiroshi Watanabe

(参考訳) 本稿では,キャプリケータだけでなく,ジェネレータ内でもCapsule Networkを用いた生成逆ネットワークであるCapsule GANを提案する。近年,GAN(Generative Adversarial Network)の研究が盛んに行われている。しかし,GANによる画像生成は困難である。したがって、GANは時に画質の悪い画像を生成する。これらのGANは畳み込みニューラルネットワーク(CNN)を使用する。しかし、cnnには画像の特徴間の関係情報が失われる可能性があるという欠陥がある。 2017年に hinton が提案した capsule network は cnn の欠陥を克服している。 Capsule GANは以前、差別装置でCapsule Networkを使用していると報告している。しかし、Capsule Networkを使う代わりに、Capsule GANは以前の研究でDCGANのようなジェネレータアーキテクチャでCNNを使用していると報告している。本稿では,ジェネレータにCapsule Networkを使用する2つのアプローチを紹介する。 1つは、ジェネレータへの入力として識別器からdigitcaps層を使用することである。 DigitCaps層はCapsule Networkの出力層である。判別器の入力画像の特徴を有する。もう1つは、ジェネレータ内のカプセルネットワークにおける認識プロセスの逆操作を使用することである。本稿では,この論文で提案したCapsule GANと,CNNとCapsule GANを用いた従来のGANを比較した。データセットはMNIST、Fashion-MNIST、カラー画像である。 Capsule GAN は CNN と Capsule Network で GAN より優れていることを示す。本稿では, Capsule GAN のアーキテクチャを Capsule Network を用いた基本アーキテクチャとして提案する。したがって,既存のGANの改良手法をCapsule GANに適用することができる。

This paper presents Capsule GAN, a generative adversarial network using Capsule Network not only in the discriminator but also in the generator. Recently, generative adversarial networks (GANs) have been intensively studied. However, generating images with GANs is difficult, and GANs sometimes generate poor quality images. These GANs use convolutional neural networks (CNNs). However, CNNs have the defect that the relational information between features of the image may be lost. Capsule Network, proposed by Hinton in 2017, overcomes this defect of CNNs. The previously reported Capsule GAN uses Capsule Network in the discriminator, but in its generator it uses CNNs, as in DCGAN, instead of Capsule Network. This paper introduces two approaches to using Capsule Network in the generator. One is to use the DigitCaps layer from the discriminator as the input to the generator. The DigitCaps layer is the output layer of Capsule Network. It has the features of the input images of the discriminator. The other is to use the reverse operation of the recognition process in Capsule Network in the generator. We compare the Capsule GAN proposed in this paper with a conventional GAN using CNNs and with the Capsule GAN that uses Capsule Network in the discriminator only. The datasets are MNIST, Fashion-MNIST and color images. We show that Capsule GAN outperforms the GAN using CNNs and the GAN using Capsule Network in the discriminator only. The architecture of Capsule GAN proposed in this paper is a basic architecture using Capsule Network. Therefore, we can apply the existing improvement techniques for GANs to Capsule GAN.

翻訳日:2022-12-22 09:48:38 公開日:2020-03-18

# ブートストラップバイアス補正クロス検証のスーパーラーニングへの応用

Bootstrap Bias Corrected Cross Validation applied to Super Learning ( http://arxiv.org/abs/2003.08342v1 )

ライセンス: Link先を確認

Krzysztof Mnich and Agnieszka Kitlas Goli\'nska and Aneta Polewko-Klim and Witold R. Rudnicki

(参考訳) 超学習者アルゴリズムは、複数のベース学習者の結果を組み合わせて予測の質を向上させることができる。超学習者の結果を検証するデフォルトの方法は、ネストされたクロスバリデーションである。 Tsamardinosらは、ネストしたクロスバリデーションを学習アルゴリズムのハイパーパラメータをチューニングするための再サンプリングに置き換えることを提案した。このアイデアをsuper learnerの検証に適用し,nested cross validationを含む他の検証手法と比較する。様々なサイズの人工データセットと7つの実際の生物医学データセットでテストが行われた。 Bootstrap Bias Correctionと呼ばれる再サンプリング手法は、ネストされたクロスバリデーションに対して、合理的に正確でコスト効率のよい代替手段であることが判明した。

The super learner algorithm can be applied to combine the results of multiple base learners to improve the quality of predictions. The default method for verification of super learner results is nested cross validation. Tsamardinos et al. proposed that nested cross validation can be replaced by resampling for tuning hyper-parameters of the learning algorithms. We apply this idea to verification of the super learner and compare it with other verification methods, including nested cross validation. Tests were performed on artificial data sets of diverse size and on seven real biomedical data sets. The resampling method, called Bootstrap Bias Correction, proved to be a reasonably precise and very cost-efficient alternative to nested cross validation.
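
The resampling idea can be sketched as follows, based on a general reading of bootstrap bias corrected cross validation: pool the out-of-sample losses of every configuration, pick the best configuration on each bootstrap sample, and score that pick on the samples left out of the bootstrap. This is an illustrative sketch, not the procedure used in the paper; in particular, how the paper adapts it to the super learner is not described in the abstract. The function name, the loss-matrix layout, and the default parameters are assumptions.

```python
import random


def bbc_cv(loss_matrix, n_bootstraps=200, seed=0):
    """Bootstrap bias corrected estimate of the loss of the selected configuration.

    loss_matrix[i][c] is the out-of-sample (cross-validated) loss of
    configuration c on sample i (hypothetical layout).
    """
    rng = random.Random(seed)
    n = len(loss_matrix)
    n_configs = len(loss_matrix[0])
    estimates = []
    for _ in range(n_bootstraps):
        in_bag = [rng.randrange(n) for _ in range(n)]
        in_bag_set = set(in_bag)
        out_of_bag = [i for i in range(n) if i not in in_bag_set]
        if not out_of_bag:
            continue  # no held-out samples in this draw, skip it
        # Select the configuration with the lowest total loss on the bootstrap sample
        best = min(range(n_configs),
                   key=lambda c: sum(loss_matrix[i][c] for i in in_bag))
        # Score the selected configuration on samples it was not selected on
        oob_loss = sum(loss_matrix[i][best] for i in out_of_bag) / len(out_of_bag)
        estimates.append(oob_loss)
    return sum(estimates) / len(estimates)
```

Because selection and scoring use disjoint samples within each bootstrap draw, the optimism from picking the best of many configurations is reduced without refitting any model, which is where the cost saving over nested cross validation comes from.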

翻訳日:2022-12-22 09:42:05 公開日:2020-03-18

# survlime: 機械学習生存モデルを説明する方法

SurvLIME: A method for explaining machine learning survival models ( http://arxiv.org/abs/2003.08371v1 )

ライセンス: Link先を確認

Maxim S. Kovalev, Lev V. Utkin, Ernest M. Kasimov

(参考訳) 機械学習生存モデルを説明するためにsurvlimeと呼ばれる新しい手法を提案する。これはよく知られたメソッド LIME の拡張や修正と見なすことができる。提案手法の背景にある主な考え方は,Cox比例ハザードモデルを用いて,試験例の周辺地域における生存率モデルを近似することである。コックスモデルは、例の共変数の線形結合を、共変数の係数が予測に定量的に影響を及ぼすとみなすことができると考えるために用いられる。もう1つのアイデアは、説明されたモデルとcoxモデルの累積ハザード関数を、関心点周辺の局所領域における摂動点の集合を用いて近似することである。この方法は制約のない凸最適化問題に還元される。多くの数値実験がサーヴライム効率を示している。

A new method called SurvLIME for explaining machine learning survival models is proposed. It can be viewed as an extension or modification of the well-known method LIME. The main idea behind the proposed method is to apply the Cox proportional hazards model to approximate the survival model in the local area around a test example. The Cox model is used because it considers a linear combination of the example covariates such that coefficients of the covariates can be regarded as quantitative impacts on the prediction. Another idea is to approximate cumulative hazard functions of the explained model and the Cox model by using a set of perturbed points in a local area around the point of interest. The method is reduced to solving an unconstrained convex optimization problem. Numerous numerical experiments demonstrate the efficiency of SurvLIME.
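
To make concrete why Cox coefficients can be read as impacts on the prediction, the sketch below evaluates the standard Cox proportional hazards relations H(t | x) = H0(t) · exp(b · x) and S(t | x) = exp(-H(t | x)). It shows only the surrogate model's functional form, not the SurvLIME optimisation itself; the function names and the representation of the baseline cumulative hazard as a list over time points are assumptions.

```python
import math


def cox_cumulative_hazard(baseline_cum_hazard, coefficients, covariates):
    """Cumulative hazard under the Cox model at each time point.

    baseline_cum_hazard: H0(t) evaluated on a grid of time points.
    The covariates enter only through the linear score b . x.
    """
    risk = math.exp(sum(b * x for b, x in zip(coefficients, covariates)))
    return [h0 * risk for h0 in baseline_cum_hazard]


def survival_from_cumulative_hazard(cum_hazard):
    """Survival function S(t) = exp(-H(t))."""
    return [math.exp(-h) for h in cum_hazard]
```

Taking logarithms gives log H(t | x) - log H0(t) = b · x, so each coefficient is the change in log cumulative hazard per unit change of its covariate, independent of time.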

翻訳日:2022-12-22 09:41:53 公開日:2020-03-18

# necpd:最適確率勾配降下を伴うオンラインテンソル分解

NeCPD: An Online Tensor Decomposition with Optimal Stochastic Gradient Descent ( http://arxiv.org/abs/2003.08844v1 )

ライセンス: Link先を確認

Ali Anaissi, Basem Suleiman, Seid Miad Zandavi

(参考訳) マルチウェイデータ分析は、テンソル $\mathcal{X} \in \mathbb{R} ^{I_1 \times \dots \times I_N} $ に格納された高次データセットの基盤構造をキャプチャするための重要なツールとなっている。 CANDECOMP/PARAFAC$ (CP)分解は広く研究され、$\mathcal{X}$ by $N$ loading matrices $A^{(1)}, \dots, A^{(N)}$ ここで$N$はテンソルの順序を表す。確率勾配勾配(SGD)アルゴリズムに基づくマルチウェイオンラインデータにおける非凸問題に対するNeCPDという新しい効率的なCP分解解法を提案する。 SGDは1ステップで$\mathcal{X}^{(t+1)}$を更新できるので、オンライン設定では非常に便利です。大域収束に関しては、SGDが非凸問題を扱う際に多くのサドル点に留まることが知られている。ヘシアン行列を解析し,これらの鞍点を同定し,勾配更新ステップにノイズをほとんど付加しない摂動法を用いてそれらから逃れようとする。さらに,Nesterov の Accelerated Gradient (NAG) 法をSGD アルゴリズムに適用し,収束速度を最適に高速化し,エポック毎のヘシアン計算遅延時間を補償する。実験室ベースおよび実生活構造データセットを用いた構造健康モニタリングの分野での実験的な評価により,既存のオンラインテンソル解析法と比較して,より正確な結果が得られた。

Multi-way data analysis has become an essential tool for capturing underlying structures in higher-order datasets stored in a tensor $\mathcal{X} \in \mathbb{R} ^{I_1 \times \dots \times I_N} $. CANDECOMP/PARAFAC (CP) decomposition has been extensively studied and applied to approximate $\mathcal{X}$ by $N$ loading matrices $A^{(1)}, \dots, A^{(N)}$, where $N$ represents the order of the tensor. We propose a new efficient CP decomposition solver named NeCPD for non-convex problems in multi-way online data based on the stochastic gradient descent (SGD) algorithm. SGD is very useful in the online setting since it allows us to update $\mathcal{X}^{(t+1)}$ in one single step. In terms of global convergence, it is well known that SGD gets stuck at many saddle points when it deals with non-convex problems. We study the Hessian matrix to identify these saddle points, and then try to escape them using the perturbation approach, which adds a little noise to the gradient update step. We further apply Nesterov's Accelerated Gradient (NAG) method in the SGD algorithm to optimally accelerate the convergence rate and compensate for the Hessian computational delay time per epoch. Experimental evaluation in the field of structural health monitoring using laboratory-based and real-life structural datasets shows that our method provides more accurate results compared with existing online tensor analysis methods.
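
The CP model referred to here approximates a tensor by a sum of rank-one terms built from the loading matrices. As a minimal sketch for the three-way case, the code below reconstructs $\mathcal{X}_{ijk} = \sum_r A_{ir} B_{jr} C_{kr}$ and measures the squared reconstruction error that a solver would minimise. It does not implement NeCPD, its saddle-point perturbation, or the NAG acceleration; the function names and nested-list layout are assumptions.

```python
def cp_reconstruct(A, B, C):
    """Rebuild a 3-way tensor from CP loading matrices A (I x R), B (J x R), C (K x R)."""
    rank = len(A[0])
    return [[[sum(A[i][r] * B[j][r] * C[k][r] for r in range(rank))
              for k in range(len(C))]
             for j in range(len(B))]
            for i in range(len(A))]


def squared_error(X, X_hat):
    """Sum of squared differences between two equally shaped 3-way tensors."""
    return sum((x - y) ** 2
               for slab, slab_hat in zip(X, X_hat)
               for row, row_hat in zip(slab, slab_hat)
               for x, y in zip(row, row_hat))
```

An SGD-based solver would repeatedly adjust the entries of `A`, `B` and `C` to reduce this error on newly arriving data, and because the objective is non-convex in the factors jointly, saddle points are a practical concern.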

翻訳日:2022-12-22 09:41:00 公開日:2020-03-18

# opengan: オープンセット生成型広告ネットワーク

OpenGAN: Open Set Generative Adversarial Networks ( http://arxiv.org/abs/2003.08074v1 )

ライセンス: Link先を確認

Luke Ditria, Benjamin J. Meyer, Tom Drummond

(参考訳) 既存の条件付きジェネレータネットワーク(cGAN)の多くは、事前に定義されたクラスレベルのセマンティックラベルや属性の条件付けに限られている。計量空間から特徴埋め込みした入力サンプル毎に条件付けされたオープン集合 gan アーキテクチャ (opengan) を提案する。クラスレベルときめ細かなセマンティック情報をエンコードする最先端のメトリック学習モデルを用いて、与えられたソース画像にセマンティックに類似したサンプルを生成することができる。計量学習モデルによって抽出された意味情報は、分布外の新しいクラスに転送され、生成モデルがトレーニング分布外のサンプルを生成する。提案手法は,学習クラスに類似した視覚的品質を持つ新しいクラスから256$\times$256の解像度画像を生成することができることを示す。ソース画像の代わりに、距離空間のランダムなサンプリングも高品質なサンプルをもたらすことを示す。特徴空間と潜在空間の補間は画像空間における意味的かつ視覚的に可算な変換をもたらすことを示す。最後に、データ拡張の下流タスクに対する生成されたサンプルの有用性を示す。 GAN トレーニング分布外のクラスにおいて,OpenGAN サンプルを用いて学習データを増強することにより,分類器の性能を大幅に向上できることを示す。

Many existing conditional Generative Adversarial Networks (cGANs) are limited to conditioning on pre-defined and fixed class-level semantic labels or attributes. We propose an open set GAN architecture (OpenGAN) that is conditioned per-input sample with a feature embedding drawn from a metric space. Using a state-of-the-art metric learning model that encodes both class-level and fine-grained semantic information, we are able to generate samples that are semantically similar to a given source image. The semantic information extracted by the metric learning model transfers to out-of-distribution novel classes, allowing the generative model to produce samples that are outside of the training distribution. We show that our proposed method is able to generate 256$\times$256 resolution images from novel classes that are of similar visual quality to those from the training classes. In lieu of a source image, we demonstrate that random sampling of the metric space also results in high-quality samples. We show that interpolation in the feature space and latent space results in semantically and visually plausible transformations in the image space. Finally, the usefulness of the generated samples to the downstream task of data augmentation is demonstrated. We show that classifier performance can be significantly improved by augmenting the training data with OpenGAN samples on classes that are outside of the GAN training distribution.

翻訳日:2022-12-22 09:40:29 公開日:2020-03-18

# ContainerStress:ビッグデータMLユースケースのためのクラウドノード自動スコープフレームワーク

ContainerStress: Autonomous Cloud-Node Scoping Framework for Big-Data ML Use Cases ( http://arxiv.org/abs/2003.08011v1 )

ライセンス: Link先を確認

Guang Chao Wang, Kenny Gross, and Akshay Subramaniam

(参考訳) クラウド環境にビッグデータ機械学習(ML)サービスをデプロイすることは、クラウドベンダにとって、任意の顧客ユースケースのサイズを拡大するクラウドコンテナの構成に関する課題となる。 OracleLabsは、ネストループのMonte Carloシミュレーションを使用して、クラウドCPU-GPU"Shapes"(エンドユーザが利用可能なクラウドコンテナ内のCPUやGPUの設定)の範囲で、任意のサイズの顧客MLユースケースを自律的にスケールする自動フレームワークを開発した。さらに、OracleLabsとNVIDIAの著者は、MLの予測アルゴリズムの計算コストとGPUアクセラレーションを分析し、従来のCPUとNVIDIA GPUで構成されるクラウドコンテナの計算コストの削減を評価するMLベンチマーク研究に協力している。

Deploying big-data Machine Learning (ML) services in a cloud environment presents a challenge to the cloud vendor with respect to sizing the cloud container configuration for any given customer use case. OracleLabs has developed an automated framework that uses nested-loop Monte Carlo simulation to autonomously scale customer ML use cases of any size across the range of cloud CPU-GPU "Shapes" (configurations of CPUs and/or GPUs in cloud containers available to end customers). Moreover, the OracleLabs and NVIDIA authors have collaborated on an ML benchmark study which analyzes the compute cost and GPU acceleration of any ML prognostic algorithm and assesses the reduction of compute cost in a cloud container comprising conventional CPUs and NVIDIA GPUs.

翻訳日:2022-12-22 09:39:15 公開日:2020-03-18

# 活性化機能とXavierおよびHe正規初期化との関連性に関する調査

A Survey on Activation Functions and their relation with Xavier and He Normal Initialization ( http://arxiv.org/abs/2004.06632v1 )

ライセンス: Link先を確認

Leonid Datta

(参考訳) ニューラルネットワークでは、活性化関数と重み初期化法は、ニューラルネットワークのトレーニングとパフォーマンスにおいて重要な役割を果たす。問題は、機能の性質が、よく機能するアクティベーション機能として重要/必要であるかどうかである。また、最も広く使われている重み初期化法(xavierとhe normal initialization)は、アクティベーション関数と基本的な関係がある。本研究は活性化機能と最も広く利用されている活性化機能(sgmoid, tanh, relu, lrelu, prelu)の重要/必要特性について述べる。また,これらの活性化関数と2つの重み初期化法 (xavier と he normal initialization) との関係についても検討した。

In artificial neural networks, the activation function and the weight initialization method play important roles in the training and performance of the network. The question that arises is what properties of a function are important or necessary for it to be a well-performing activation function. Also, the most widely used weight initialization methods, Xavier and He normal initialization, have a fundamental connection with the activation function. This survey discusses the important or necessary properties of activation functions and the most widely used activation functions (sigmoid, tanh, ReLU, LReLU and PReLU). This survey also explores the relationship between these activation functions and the two weight initialization methods, Xavier and He normal initialization.
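For readers unfamiliar with the two schemes, the following is a minimal NumPy sketch using the standard textbook formulas: Xavier normal draws weights with variance 2 / (fan_in + fan_out), and He normal with variance 2 / fan_in. The function names and the (fan_in, fan_out) weight layout are assumptions made for this example, not taken from the survey.

```python
import numpy as np

def xavier_normal(fan_in, fan_out, rng):
    """Xavier (Glorot) normal: std = sqrt(2 / (fan_in + fan_out))."""
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_in, fan_out))

def he_normal(fan_in, fan_out, rng):
    """He normal: std = sqrt(2 / fan_in), commonly paired with ReLU."""
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

rng = np.random.default_rng(0)
w_xavier = xavier_normal(256, 128, rng)   # e.g. for a tanh or sigmoid layer
w_he = he_normal(256, 128, rng)           # e.g. for a ReLU-family layer
```

The extra factor in the He variance compensates for ReLU zeroing roughly half of its inputs, which is the kind of link between activation function and initialization that the survey examines.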

翻訳日:2022-12-22 09:32:37 公開日:2020-03-18

# 知識グラフ補完手法の現実的再評価:実験的検討

Realistic Re-evaluation of Knowledge Graph Completion Methods: An Experimental Study ( http://arxiv.org/abs/2003.08001v1 )

ライセンス: Link先を確認

Farahnaz Akrami (1), Mohammed Samiul Saeef (1), Qingheng Zhang (2), Wei Hu (2), Chengkai Li (1) ((1) Department of Computer Science and Engineering, University of Texas at Arlington, (2) State Key Laboratory for Novel Software Technology, Nanjing University)

(参考訳) 知識グラフの補完、特にリンク予測のタスクに埋め込みモデルを用いる活発な研究分野において、ほとんどの先行研究は2つのベンチマークデータセット fb15k と wn18 を用いた。これらの研究における多くの3つのデータセットは、意味的重複、相関、データ不完全性による高いデータ冗長性を示す逆関係と重複関係に属する。これは過剰なデータ漏洩のケースです – モデルが実際の予測に適用される必要がある場合に利用できない機能を使用して、モデルをトレーニングします。また、適用対象と対象のデカルト積によって形成される全ての三重項が真の事実であるデカルト積関係もある。上記の関係に関するリンク予測は簡単であり、洗練された埋め込みモデルではなく、単純な規則を用いてより正確な精度で実現できる。これらのモデルのより根本的な欠点は、リンク予測のシナリオが実世界では存在しないことである。本論文は,非現実的三重項除去時の埋め込みモデルの真の有効性を評価することを目的とした,最初の体系的研究である。実験の結果、これらのモデルは以前よりもはるかに精度が低いことがわかった。それらの精度の低さは、リンク予測を真に効果的な自動化ソリューションなしでタスクにします。したがって、有効なアプローチの再検討が必要である。

In the active research area of employing embedding models for knowledge graph completion, particularly for the task of link prediction, most prior studies used two benchmark datasets, FB15k and WN18, in evaluating such models. Most triples in these and other datasets in such studies belong to reverse and duplicate relations which exhibit high data redundancy due to semantic duplication, correlation or data incompleteness. This is a case of excessive data leakage: a model is trained using features that otherwise would not be available when the model needs to be applied for real prediction. There are also Cartesian product relations for which every triple formed by the Cartesian product of applicable subjects and objects is a true fact. Link prediction on the aforementioned relations is easy and can be achieved with even better accuracy using straightforward rules instead of sophisticated embedding models. A more fundamental defect of these models is that the link prediction scenario, given such data, is non-existent in the real world. This paper is the first systematic study with the main objective of assessing the true effectiveness of embedding models when the unrealistic triples are removed. Our experimental results show these models are much less accurate than previously perceived. Their poor accuracy renders link prediction a task without a truly effective automated solution. Hence, we call for re-investigation of possible effective approaches.
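The leakage described above can be made concrete with a small rule-based baseline. The sketch below is an illustration under stated assumptions, not the paper's procedure: the relation names, the toy triples and the 0.8 overlap threshold are all hypothetical. It first detects relation pairs that are mostly reverses of each other in the training set, then answers a test query by flipping a training triple.

```python
def find_reverse_pairs(train, min_overlap=0.8):
    """Map r1 -> r2 when most (h, t) pairs of r1 appear as (t, h) under r2."""
    by_rel = {}
    for h, r, t in train:
        by_rel.setdefault(r, set()).add((h, t))
    reverse_of = {}
    for r1, pairs1 in by_rel.items():
        for r2, pairs2 in by_rel.items():
            if r1 == r2:
                continue
            flipped = {(t, h) for h, t in pairs2}
            if len(pairs1 & flipped) / len(pairs1) >= min_overlap:
                reverse_of[r1] = r2
    return reverse_of

def predict_tails(head, rel, train, reverse_of):
    """Answer (head, rel, ?) using only the reverse-relation rule."""
    inverse = reverse_of.get(rel)
    if inverse is None:
        return set()
    return {h for h, r, t in train if r == inverse and t == head}

# Hypothetical toy data: the test fact (e, parent_of, f) is leaked by the
# training triple (f, child_of, e).
train = [
    ("a", "parent_of", "b"), ("b", "child_of", "a"),
    ("c", "parent_of", "d"), ("d", "child_of", "c"),
    ("f", "child_of", "e"),
]
reverse_of = find_reverse_pairs(train)
answer = predict_tails("e", "parent_of", train, reverse_of)
```

No embedding is trained here; the rule succeeds purely because the test fact already exists in flipped form, which is the redundancy the study removes before re-evaluating the models.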

翻訳日:2022-12-22 09:32:24 公開日:2020-03-18

# 構文グラフ畳み込みネットワークによる文書要約のための選択的注意エンコーダ

Selective Attention Encoders by Syntactic Graph Convolutional Networks for Document Summarization ( http://arxiv.org/abs/2003.08004v1 )

ライセンス: Link先を確認

Haiyang Xu, Yun Wang, Kun Han, Baochang Ma, Junwen Chen, Xiangang Li

(参考訳) 抽象的なテキスト要約は難しい課題であり、ソーステキストから有意な情報を効果的に抽出し、要約を生成するメカニズムを設計する必要がある。ソーステキストの構文解析プロセスには、より正確な要約を生成するのに役立つ重要な構文構造や意味構造が含まれている。しかし、テキスト要約のための解析木をモデル化することは、その非線形構造のため自明ではなく、複数の文とその解析木を含む文書を扱うのが困難である。本稿では,文書中の文から解析木を接続するためのグラフと,文書の構文表現を学習するための重ね畳み込みグラフ畳み込みネットワーク(GCN)を提案する。選択的注意機構は、意味的・構造的側面において有意な情報を抽出し、抽象的な要約を生成する。 CNN/Daily Mailテキスト要約データセットに対する我々のアプローチを評価する。実験結果は,提案手法がベースラインを上回り,データセット上での最先端性能を実現することを示す。

Abstractive text summarization is a challenging task, and one needs to design a mechanism to effectively extract salient information from the source text and then generate a summary. Parsing the source text yields critical syntactic or semantic structures, which are useful for generating a more accurate summary. However, modeling a parsing tree for text summarization is not trivial due to its non-linear structure, and it is harder to deal with a document that includes multiple sentences and their parsing trees. In this paper, we propose to use a graph to connect the parsing trees from the sentences in a document and utilize stacked graph convolutional networks (GCNs) to learn the syntactic representation for a document. The selective attention mechanism is used to extract salient information in semantic and structural aspects and generate an abstractive summary. We evaluate our approach on the CNN/Daily Mail text summarization dataset. The experimental results show that the proposed GCN-based selective attention approach outperforms the baselines and achieves state-of-the-art performance on the dataset.
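As background for the stacked GCNs mentioned above, here is a minimal NumPy sketch of one graph convolution layer in the widely used symmetric-normalized form, ReLU(D^{-1/2} (A + I) D^{-1/2} H W). Whether the paper uses exactly this variant is not stated in the abstract, so treat the formula, the function name and the toy graph as assumptions.

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ feats @ weight, 0.0)

# Toy graph: two tokens joined by one dependency edge (hypothetical).
adj = np.array([[0.0, 1.0],
                [1.0, 0.0]])
feats = np.eye(2)            # one-hot node features
w1 = np.eye(2)               # identity weights keep the arithmetic visible
w2 = np.eye(2)

h1 = gcn_layer(adj, feats, w1)   # first layer
h2 = gcn_layer(adj, h1, w2)      # stacking a second layer
```

Each layer mixes a node's features with those of its neighbours, so stacking k layers lets information travel k edges along the connected parse trees.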

翻訳日:2022-12-22 09:32:01 公開日:2020-03-18

# TTTTTackling WinoGrande Schemas

TTTTTackling WinoGrande Schemas ( http://arxiv.org/abs/2003.08380v1 )

ライセンス: Link先を確認

Sheng-Chieh Lin, Jheng-Hong Yang, Rodrigo Nogueira, Ming-Feng Tsai, Chuan-Ju Wang, Jimmy Lin

(参考訳) 各例を仮説を含む2つの入力テキスト文字列に分解し,「補足」トークンに割り当てられた確率を仮説のスコアとして用いることで,ai2 winograndeチャレンジに取り組むためにt5シーケンシャル・ツー・シーケンスモデルを適用した。 2020年3月13日、公式のリーダーボードへの最初の(そして唯一の)提出は0.7673 AUCだった。

We applied the T5 sequence-to-sequence model to tackle the AI2 WinoGrande Challenge by decomposing each example into two input text strings, each containing a hypothesis, and using the probabilities assigned to the "entailment" token as a score of the hypothesis. Our first (and only) submission to the official leaderboard yielded 0.7673 AUC on March 13, 2020, which is the best known result at this time and beats the previous state of the art by over five points.
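The decomposition step can be sketched as follows. WinoGrande sentences mark the blank with an underscore, and the abstract says each example becomes two hypothesis strings that are scored separately. The exact T5 input template is not given here, so the sketch only fills the blank, and the `score` callable is a hypothetical stand-in for the probability the model assigns to the "entailment" token.

```python
from typing import Callable, Tuple

def make_hypotheses(sentence: str, option1: str, option2: str) -> Tuple[str, str]:
    """Fill the single '_' blank with each candidate option."""
    if sentence.count("_") != 1:
        raise ValueError("expected exactly one '_' placeholder")
    return sentence.replace("_", option1), sentence.replace("_", option2)

def pick_option(sentence: str, option1: str, option2: str,
                score: Callable[[str], float]) -> int:
    """Return 1 or 2, whichever hypothesis the scorer rates higher."""
    hyp1, hyp2 = make_hypotheses(sentence, option1, option2)
    return 1 if score(hyp1) >= score(hyp2) else 2

# Usage with a dummy scorer; a real scorer would query the model instead.
sentence = "The trophy doesn't fit into the suitcase because _ is too large."
dummy_score = lambda text: 1.0 if "because trophy" in text else 0.0
choice = pick_option(sentence, "trophy", "suitcase", dummy_score)
```

Passing the scorer as a parameter keeps the decomposition logic testable without loading a model.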

翻訳日:2022-12-22 09:31:19 公開日:2020-03-18

# 高次イジングモデルにおける推論によるピアグループ効果によるロジスティック回帰

Logistic-Regression with peer-group effects via inference in higher order Ising models ( http://arxiv.org/abs/2003.08259v1 )

ライセンス: Link先を確認

Constantinos Daskalakis, Nishanth Dikkala and Ioannis Panageas

(参考訳) スピングラスモデル(例えば、シェリントン=キルクパトリック、ホップフィールド、イジングモデル)は全て指数関数的離散分布の族としてよく研究されており、ネットワーク上の相関現象のモデル化に使用される多くのアプリケーション領域に影響を与えている。従来、これらのモデルは2次統計量を持ち、その結果、対相互作用から生じる相関を捉える。本研究では,ピアグループ効果を持つソーシャルネットワーク上での行動のモデル化を行い,高次統計量モデルへのこれらの拡張について検討する。特に、ネットワーク上の二進結果を高次スピングラスとしてモデル化し、個人の振る舞いは、自身の共変量のベクトルの線型関数と、他の振る舞いの多項式関数に依存し、ピアグループ効果を捉えている。このようなモデルから高次元のサンプルである {\em single} を用いて、我々の目標は、線型関数の係数とピアグループ効果の強さを回復することである。この結果の核心は、モデルのlog pseudo-likelihoodの強い結合性を示す新しいアプローチであり、最大 pseudo-likelihood estimator (mple) に対する統計エラーレートは$\sqrt{d/n}$であり、ここでは$d$は共変ベクトルの次元であり、$n$はネットワークのサイズ(ノード数)である。我々のモデルは、最近の研究で研究されているベニラロジスティック回帰とピアエフェクトモデルを一般化し、これらの結果を高次相互作用に対応するように拡張する。

Spin glass models, such as the Sherrington-Kirkpatrick, Hopfield and Ising models, are all well-studied members of the exponential family of discrete distributions, and have been influential in a number of application domains where they are used to model correlation phenomena on networks. Conventionally these models have quadratic sufficient statistics and consequently capture correlations arising from pairwise interactions. In this work we study extensions of these to models with higher-order sufficient statistics, modeling behavior on a social network with peer-group effects. In particular, we model binary outcomes on a network as a higher-order spin glass, where the behavior of an individual depends on a linear function of their own vector of covariates and some polynomial function of the behavior of others, capturing peer-group effects. Using a {\em single}, high-dimensional sample from such a model, our goal is to recover the coefficients of the linear function as well as the strength of the peer-group effects. The heart of our result is a novel approach for showing strong concavity of the log pseudo-likelihood of the model, implying a statistical error rate of $\sqrt{d/n}$ for the Maximum Pseudo-Likelihood Estimator (MPLE), where $d$ is the dimensionality of the covariate vectors and $n$ is the size of the network (number of nodes). Our model generalizes vanilla logistic regression as well as the peer-effect models studied in recent works, and our results extend these results to accommodate higher-order interactions.
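To make the pseudo-likelihood concrete, the sketch below evaluates the negative log pseudo-likelihood for the pairwise special case only, with spins in {-1, +1}. Under an Ising model with local field m_i = x_i . theta + beta * sum_j A_ij y_j, the conditional probability of y_i given the other spins is sigmoid(2 y_i m_i). The higher-order polynomial term studied in the paper is omitted, and the variable names and toy network are assumptions for this example.

```python
import numpy as np

def neg_log_pseudo_likelihood(theta, beta, X, A, y):
    """Sum over nodes of -log P(y_i | y_-i) for a pairwise Ising model.

    X: (n, d) covariates, A: (n, n) symmetric adjacency with zero diagonal,
    y: (n,) spins in {-1, +1}.
    """
    field = X @ theta + beta * (A @ y)          # local field m_i
    # -log sigmoid(z) = log(1 + exp(-z)), computed stably with logaddexp.
    return np.sum(np.logaddexp(0.0, -2.0 * y * field))

# Toy ring network of four nodes where everyone agrees (hypothetical data).
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
X = np.zeros((4, 3))
y = np.ones(4)
loss_no_peer = neg_log_pseudo_likelihood(np.zeros(3), 0.0, X, A, y)
loss_peer = neg_log_pseudo_likelihood(np.zeros(3), 1.0, X, A, y)
```

Setting beta to zero recovers ordinary logistic regression on the covariates (up to the factor 2 from the plus/minus-one encoding), and the MPLE minimizes this quantity over theta and beta.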

翻訳日:2022-12-22 09:30:40 公開日:2020-03-18

# unfolding reweighted $\ell_1$-$\ell_1$ による解釈可能なディープリカレントニューラルネットワーク:アーキテクチャ設計と一般化解析

Interpretable Deep Recurrent Neural Networks via Unfolding Reweighted $\ell_1$-$\ell_1$ Minimization: Architecture Design and Generalization Analysis ( http://arxiv.org/abs/2003.08334v1 )

ライセンス: Link先を確認

Huynh Van Luong, Boris Joukovsky, Nikos Deligiannis

(参考訳) 例えば、学習された反復収縮しきい値アルゴリズム(LISTA)は、最適化方法の学習的なバリエーションとして、ディープニューラルネットワークを設計する。これらのネットワークは、元の最適化手法よりも高速な収束と高精度を実現することが示されている。本稿では,再重み付けされた$\ell_1$-$\ell_1$最小化アルゴリズムを展開することにより,新しいディープリカレントニューラルネットワーク (coined reweighted-rnn) を開発し,シーケンシャル信号再構成のタスクに適用する。私たちの知る限りでは、これは再重み付け最小化を探求する最初の深い展開方法です。下位の再重み付け最小化モデルにより、rnnは各層内の隠れたユニットごとに異なるソフトthresholding関数(alia、異なるアクティベーション関数)を持つ。さらに、オーバーパラメータ化重みによる既存の深部展開RNNモデルよりも高いネットワーク表現性を有する。重要なことは、Rademacher複雑性を用いて提案したreweighted-RNNモデルの理論的一般化誤差境界を確立することである。境界は、提案されたreweighted-RNNのパラメータ化が良い一般化を保証することを示している。本研究では,低次元計測による映像フレーム再構成問題,すなわち逐次フレーム再構成問題に対して,提案手法を適用した。移動MNISTデータセットの実験結果から,提案した深度再重み付きRNNは既存のRNNモデルよりも大幅に優れていた。

Deep unfolding methods, for example the learned iterative shrinkage thresholding algorithm (LISTA), design deep neural networks as learned variations of optimization methods. These networks have been shown to achieve faster convergence and higher accuracy than the original optimization methods. In this line of research, this paper develops a novel deep recurrent neural network (coined reweighted-RNN) by unfolding a reweighted $\ell_1$-$\ell_1$ minimization algorithm and applies it to the task of sequential signal reconstruction. To the best of our knowledge, this is the first deep unfolding method that explores reweighted minimization. Due to the underlying reweighted minimization model, our RNN has a different soft-thresholding function (that is, a different activation function) for each hidden unit in each layer. Furthermore, it has higher network expressivity than existing deep unfolding RNN models due to its over-parameterized weights. Importantly, we establish theoretical generalization error bounds for the proposed reweighted-RNN model by means of Rademacher complexity. The bounds reveal that the parameterization of the proposed reweighted-RNN ensures good generalization. We apply the proposed reweighted-RNN to the problem of video frame reconstruction from low-dimensional measurements, that is, sequential frame reconstruction. The experimental results on the moving MNIST dataset demonstrate that the proposed deep reweighted-RNN significantly outperforms existing RNN models.
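The building block behind LISTA-style networks is the soft-thresholding operator, and a per-unit threshold is the simplest way to picture "a different activation function for each hidden unit". The sketch below shows plain soft-thresholding with element-wise thresholds and one ISTA iteration for an ordinary l1 problem. It is not the paper's reweighted l1-l1 proximal operator; the names and the toy problem are assumptions.

```python
import numpy as np

def soft_threshold(x, lam):
    """Element-wise soft-thresholding: sign(x) * max(|x| - lam, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def ista_step(h, y, D, lam, step):
    """One ISTA iteration for min_h 0.5 * ||y - D h||^2 + sum_i lam_i |h_i|."""
    grad = D.T @ (D @ h - y)
    return soft_threshold(h - step * grad, step * lam)

# Per-unit thresholds: each hidden unit gets its own shrinkage level.
x = np.array([3.0, -0.5, -2.0])
lam = np.array([1.0, 1.0, 0.5])
shrunk = soft_threshold(x, lam)

# One unfolded layer on a toy problem with an identity dictionary.
h_next = ista_step(np.zeros(3), x, np.eye(3), lam, step=1.0)
```

Unfolding treats each such iteration as a network layer and learns quantities like `D`, `lam` and `step` from data instead of fixing them in advance.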

翻訳日:2022-12-22 09:23:54 公開日:2020-03-18

# コンピュータビジョンにおける自己監督型コンテキスト帯域

Self-Supervised Contextual Bandits in Computer Vision ( http://arxiv.org/abs/2003.08485v1 )

ライセンス: Link先を確認

Aniket Anand Deshmukh, Abhimanu Kumar, Levi Boyles, Denis Charles, Eren Manavoglu, Urun Dogan

(参考訳) コンテキストバンディットは、仮説テストから製品レコメンデーションまで、ドメイン内の機械学習実践者が直面する一般的な問題である。様々な成功の度合いでコンテキスト的バンディット問題にリッチなデータ表現を利用するには、多くのアプローチがあった。自己教師付き学習は、明示的なラベルなしでリッチなデータ表現を見つけるための有望なアプローチである。典型的な自己指導型学習スキームでは、第一のタスクは問題目標(クラスタリング、分類、埋め込み生成など)によって定義され、第二のタスクは自己監督目標(回転予測、近傍の言葉、着色など)によって定義される。通常のセルフスーパービジョンでは,2次タスクのトレーニングデータから暗黙のラベルを学習する。しかし、文脈的バンディット設定では、学習の初期段階でデータが不足しているため、暗黙的なラベルを得るという利点はありません。文脈的バンディット目標と自己監督目標を組み合わせることにより,この問題に取り組むための新たなアプローチを提案する。文脈的バンディット学習を自己超越で強化することで、より累積的な報酬を得ることができます。 8種類のコンピュータビジョンデータセットを用いた結果,累積報酬が大幅に向上した。提案手法が最適に動作しないケースを提供し、これらのケースでより良い学習を行うための代替手法を提供する。

Contextual bandits are a common problem faced by machine learning practitioners in domains ranging from hypothesis testing to product recommendations. There have been many approaches to exploiting rich data representations for contextual bandit problems, with varying degrees of success. Self-supervised learning is a promising approach to find rich data representations without explicit labels. In a typical self-supervised learning scheme, the primary task is defined by the problem objective (e.g. clustering, classification, embedding generation, etc.) and the secondary task is defined by the self-supervision objective (e.g. rotation prediction, words in neighborhood, colorization, etc.). In the usual self-supervision, we learn implicit labels from the training data for a secondary task. However, in the contextual bandit setting, we don't have the advantage of getting implicit labels due to lack of data in the initial phase of learning. We provide a novel approach to tackle this issue by combining a contextual bandit objective with a self-supervision objective. By augmenting contextual bandit learning with self-supervision we get a better cumulative reward. Our results on eight popular computer vision datasets show substantial gains in cumulative reward. We provide cases where the proposed scheme doesn't perform optimally and give alternative methods for better learning in these cases.

翻訳日:2022-12-22 09:22:20 公開日:2020-03-18

# 半教師付き少数ショット分類のためのタスク適応クラスタリング

Task-Adaptive Clustering for Semi-Supervised Few-Shot Classification ( http://arxiv.org/abs/2003.08221v1 )

ライセンス: Link先を確認

Jun Seo, Sung Whan Yoon, Jaekyun Moon

(参考訳) 未確認のタスクを、少量の新しいトレーニングデータだけで処理することを目的としている。しかし、数発学習者の準備(あるいはメタトレーニング)では、大量のラベル付きデータが必要である。実世界では、残念ながらラベル付きデータは高価で不足している。本研究では,トレーニングデータの大部分をラベル付けしていない半教師環境下でうまく機能する,数発学習システムを提案する。提案手法では, 組込み特徴空間とは異なる新しい射影空間において, 現在のタスクに対するラベル付きサンプルクラスタリングを行う。条件付きクラスタリング空間は、現在のタスクのクラスセンタロイドと、タスク間でメタトレーニングされる独立したクラス毎の参照ベクトルとの間のギャップを迅速に閉じるように線形に構成される。より一般的な設定では,メタ学習におけるタスクコンディショニングの度合いを制御するという概念を導入し,クラスタリング空間の繰り返し更新数に応じてタスクコンディショニングの量が変化する。 miniImageNet と tieredImageNet のデータセットに基づく広範囲なシミュレーション結果から,提案手法の半教師付きセミショット分類性能を示す。シミュレーションの結果,提案したタスク適応型クラスタリングは,対象クラス外からのラベル付きサンプル画像の増加に伴い,優れた劣化を示すことが示された。

Few-shot learning aims to handle previously unseen tasks using only a small amount of new training data. In preparing (or meta-training) a few-shot learner, however, massive labeled data are necessary. In the real world, unfortunately, labeled data are expensive and/or scarce. In this work, we propose a few-shot learner that can work well under the semi-supervised setting where a large portion of training data is unlabeled. Our method employs explicit task-conditioning in which unlabeled sample clustering for the current task takes place in a new projection space different from the embedding feature space. The conditioned clustering space is linearly constructed so as to quickly close the gap between the class centroids for the current task and the independent per-class reference vectors meta-trained across tasks. In a more general setting, our method introduces a concept of controlling the degree of task-conditioning for meta-learning: the amount of task-conditioning varies with the number of repetitive updates for the clustering space. Extensive simulation results based on the miniImageNet and tieredImageNet datasets show state-of-the-art semi-supervised few-shot classification performance of the proposed method. Simulation results also indicate that the proposed task-adaptive clustering shows graceful degradation with a growing number of distractor samples, i.e., unlabeled sample images coming from outside the candidate classes.

翻訳日:2022-12-22 09:22:01 公開日:2020-03-18

# コネクション型AIアプリケーションの脆弱性:評価と防御

Vulnerabilities of Connectionist AI Applications: Evaluation and Defence ( http://arxiv.org/abs/2003.08837v1 )

ライセンス: Link先を確認

Christian Berghoff and Matthias Neu and Arndt von Twickel

(参考訳) この記事では、コネクショナリスト人工知能(AI)アプリケーションのITセキュリティを扱い、三つのITセキュリティ目標の1つである完全性への脅威に焦点を当てます。このような脅威は、例えば、著名なAIコンピュータビジョンアプリケーションにおいて最も関係がある。 ITセキュリティの目標整合性に関する総合的な見解を示すために、解釈可能性、堅牢性、ドキュメントなど多くの追加的な側面が考慮されている。脅威と可能な緩和の包括的なリストは、最先端の文献をレビューすることで提示される。敵の攻撃や毒殺攻撃などのai固有の脆弱性や、ai固有の根本原因を詳細に論じる。さらに、以前のレビューとは対照的に、AIサプライチェーン全体が、計画、データ取得、トレーニング、評価、運用フェーズを含む脆弱性について分析されている。緩和に関する議論は同様に、AIシステム自体のレベルに限らず、サプライチェーンとそのより大きなITインフラストラクチャやハードウェアデバイスへの組み込みという文脈でAIシステムを見ることを提唱している。これと、アダプティブアタッカーがこれまでに公表された任意のAI固有の防御を回避できるという観察に基づいて、記事は、単一の保護措置が不十分である代わりに、AIアプリケーションのための最小レベルのITセキュリティを達成するために、異なるレベルの複数の対策を組み合わせる必要があると結論付けている。

This article deals with the IT security of connectionist artificial intelligence (AI) applications, focusing on threats to integrity, one of the three IT security goals. Such threats are for instance most relevant in prominent AI computer vision applications. In order to present a holistic view on the IT security goal integrity, many additional aspects such as interpretability, robustness and documentation are taken into account. A comprehensive list of threats and possible mitigations is presented by reviewing the state-of-the-art literature. AI-specific vulnerabilities such as adversarial attacks and poisoning attacks as well as their AI-specific root causes are discussed in detail. Additionally and in contrast to former reviews, the whole AI supply chain is analysed with respect to vulnerabilities, including the planning, data acquisition, training, evaluation and operation phases. The discussion of mitigations is likewise not restricted to the level of the AI system itself but rather advocates viewing AI systems in the context of their supply chains and their embeddings in larger IT infrastructures and hardware devices. Based on this and the observation that adaptive attackers may circumvent any single published AI-specific defence to date, the article concludes that single protective measures are not sufficient but rather multiple measures on different levels have to be combined to achieve a minimum level of IT security for AI applications.

翻訳日:2022-12-22 09:21:41 公開日:2020-03-18

PDF登録状況（公開日: 20200318）