Fugu-MT: arXiv Paper Translations

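This site provides Japanese translations of arXiv papers that are 30 pages or shorter and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body text is not under a CC license, and for papers that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translated text is licensed under CC BY-SA 4.0. Translations are produced with Fugu-Machine Translator.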

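The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from the use of this site (including all information and translations).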

These papers have a publication date of 20220111.

Title	Authors	Abstract	Publication date / translation date
# AI研究の絞り込み? A narrowing of AI research? ( http://arxiv.org/abs/2009.10385v4 ) ライセンス: Link先を確認	Joel Klinger, Juan Mateos-Garcia and Konstantinos Stathoulopoulos	(参考訳) 大規模なデータセットからパターンを推論できるディープラーニング技術が登場し、人工知能(AI)システムの性能が劇的に向上した。しかし、ディープラーニングの急速な発展と採用は、多くの場合、大企業が主導し、堅牢性の欠如、環境コストの高さ、そして潜在的に不公平な結果を含む、その弱点にもかかわらず、AI研究の技術的軌道の早期縮小に関する懸念を生み出している。我々は、人気のあるプレプリントデータベースであるarXivにおけるAI研究のセマンティック分析により、エビデンスベースを改善しようとしている。我々は、AI研究のテーマ的多様性の進化について研究し、学術分野と民間分野におけるAI研究のテーマ的多様性を比較し、AI研究における民間企業の影響力を他の機関との協力を通じて測定する。以上の結果から,近年のai研究の多様性は停滞しており,民間分野でのai研究は学界での研究よりも多様で影響力が強い傾向がみられた。プライベートセクターのAI研究者は、他のAI手法を含む研究、AIの社会的および倫理的意味を考慮に入れた研究、健康のような分野の応用を犠牲にして、データ収集と計算集約的なディープラーニング手法を専門とする傾向にある。我々の研究結果は、その社会的利益を制限する可能性のあるAI研究の早期縮小を防ぐための政策行動の理論的根拠を提供するが、そのような介入の仕方における情報、インセンティブ、スケールハードルに留意する。 The arrival of deep learning techniques able to infer patterns from large datasets has dramatically improved the performance of Artificial Intelligence (AI) systems. Deep learning's rapid development and adoption, in great part led by large technology companies, has however created concerns about a premature narrowing in the technological trajectory of AI research despite its weaknesses, which include lack of robustness, high environmental costs, and potentially unfair outcomes. We seek to improve the evidence base with a semantic analysis of AI research in arXiv, a popular pre-prints database. We study the evolution of the thematic diversity of AI research, compare the thematic diversity of AI research in academia and the private sector and measure the influence of private companies in AI research through the citations they receive and their collaborations with other institutions. Our results suggest that diversity in AI research has stagnated in recent years, and that AI research involving the private sector tends to be less diverse and more influential than research in academia. 
We also find that private sector AI researchers tend to specialise in data-hungry and computationally intensive deep learning methods at the expense of research involving other AI methods, research that considers the societal and ethical implications of AI, and applications in sectors like health. Our results provide a rationale for policy action to prevent a premature narrowing of AI research that could constrain its societal benefits, but we note the informational, incentive and scale hurdles standing in the way of such interventions.	翻訳日:2023-05-01 07:17:14 公開日:2022-01-11
# 2次元量子渦ガス中の乱流緩和と平衡 Turbulent relaxation to equilibrium in a two-dimensional quantum vortex gas ( http://arxiv.org/abs/2010.10049v3 ) ライセンス: Link先を確認	Matthew T. Reeves, Kwan Goddard-Lee, Guillaume Gauthier, Oliver R. Stockdale, Hayder Salman, Timothy Edmonds, Xiaoquan Yu, Ashton S. Bradley, Mark Baker, Halina Rubinsztein-Dunlop, Matthew J. Davis, and Tyler W. Neely	(参考訳) 二次元カイラル渦気体の乱流緩和ダイナミクスにおける微小キャノニカル平衡状態の出現を実験的に研究した。メカニカル・スターリング・プロトコルを用いて、準2次元円盤状原子ボース・アインシュタイン凝縮体に同符号渦を注入する。結果として生じる長周期渦分布は、固定エネルギー$\cal{H}$と角運動量$\cal{M}$でマイクロカノニカルアンサンブルを記述する系に対する平均場ポアソン・ボルツマン方程式と良好な一致である。平衡状態は逆温度 $\hat{\beta}$ と回転周波数 $\hat{\omega}$ の対応する熱力学的変数によって特徴づけられる。我々は、ゼロ温度、無限温度、負絶対温度に近いオン軸状態を含む渦ガスの全相ダイアグラムにまたがる平衡を実現することができる。十分に高いエネルギーで、システムは対称性破壊遷移を示し、容器の対称性をもはや共有しない負の絶対温度におけるオフ軸平衡相をもたらす。平衡力学を定量的に再現できる現象論的減衰と雑音を伴う点渦モデルを提案する。 We experimentally study emergence of microcanonical equilibrium states in the turbulent relaxation dynamics of a two-dimensional chiral vortex gas. Same-sign vortices are injected into a quasi-two-dimensional disk-shaped atomic Bose-Einstein condensate using a range of mechanical stirring protocols. The resulting long-time vortex distributions are found to be in excellent agreement with the meanfield Poisson-Boltzmann equation for the system describing the microcanonical ensemble at fixed energy $\cal{H}$ and angular momentum $\cal{M}$. The equilibrium states are characterized by the corresponding thermodynamic variables of inverse temperature $\hat{\beta}$ and rotation frequency $\hat{\omega}$. We are able to realize equilibria spanning the full phase diagram of the vortex gas, including on-axis states near zero-temperature, infinite temperature, and negative absolute temperatures. At sufficiently high energies the system exhibits a symmetry-breaking transition, resulting in an off-axis equilibrium phase at negative absolute temperature that no longer shares the symmetry of the container. 
We introduce a point-vortex model with phenomenological damping and noise that is able to quantitatively reproduce the equilibration dynamics.	翻訳日:2023-04-28 05:51:27 公開日:2022-01-11
# ランダム製品状態からのエンタングルメントダイナミクス:最大エンタングルメントからの逸脱 Entanglement Dynamics From Random Product States: Deviation From Maximal Entanglement ( http://arxiv.org/abs/2102.07584v3 ) ライセンス: Link先を確認	Yichen Huang	(参考訳) 量子多体系の絡み合い力学を研究し、次のことを証明する: (I) 格子上の任意の幾何学的局所ハミルトニアンに対して、ランダムな積状態から始めると、絡み合いエントロピーは高い確率で常に最大エントロピーから遠ざかる。 (II) ランダムな全対全相互作用を持つスピングラスモデルでは、任意の積状態から始めると、平均エンタングルメントエントロピーは常に最大エントロピーから切り離される。また、これらの結果は電荷保存を伴う任意のユニタリ進化や、Sachdev-Ye-Kitaevモデルにも拡張する。この結果は、(カオス)ハミルトン力学によって生じる絡み合いとランダム状態との差を強調し、後者は最大に近い。 We study the entanglement dynamics of quantum many-body systems and prove the following: (I) For any geometrically local Hamiltonian on a lattice, starting from a random product state the entanglement entropy is bounded away from the maximum entropy at all times with high probability. (II) In a spin-glass model with random all-to-all interactions, starting from any product state the average entanglement entropy is bounded away from the maximum entropy at all times. We also extend these results to any unitary evolution with charge conservation and to the Sachdev-Ye-Kitaev model. Our results highlight the difference between the entanglement generated by (chaotic) Hamiltonian dynamics and that of random states, for the latter is nearly maximal.	翻訳日:2023-04-11 02:20:16 公開日:2022-01-11
# 位相空間における複素ネットワークのシミュレーション:ガウスボソンサンプリング Simulating complex networks in phase space: Gaussian boson sampling ( http://arxiv.org/abs/2102.10341v5 ) ライセンス: Link先を確認	Peter D. Drummond, Bogdan Opanchuk, Alexander Dellios and Margaret D. Reid	(参考訳) フォトニックネットワークにおけるガウス量子状態の位相空間シミュレーションにより,ガウスボソンサンプリング(GBS)量子コンピュータの可測相関の検証が可能であることを示す。以上の結果と100次相関実験は一致し,デコヒーレンスを考慮に入れた。これを16,000以上のモードに拡張し、真のマルチパーティの絡み合いをシミュレートする方法を説明する。 We show how phase-space simulations of Gaussian quantum states in a photonic network permit verification of measurable correlations of Gaussian boson sampling (GBS) quantum computers. Our results agree with experiments for up to 100-th order correlations, provided decoherence is included. We extend this to more than 16,000 modes, and describe how to simulate genuine multipartite entanglement.	翻訳日:2023-04-10 15:57:27 公開日:2022-01-11
# 散逸下における量子同期と極限サイクルの代数理論 Algebraic Theory of Quantum Synchronization and Limit Cycles under Dissipation ( http://arxiv.org/abs/2103.01808v5 ) ライセンス: Link先を確認	Berislav Buca, Cameron Booker, Dieter Jaksch	(参考訳) 同期は相互作用する粒子が動きをロックし、非自明なダイナミクスを表示する現象である。古典的極限のない系における同期の研究には激しい努力があったが、包括的な理論は発見されていない。我々は、時間非依存の量子マスター方程式の固有モード(極限サイクル)を持続的に振動させるために必要な新しい代数的基準に基づいて、そのような一般理論を開発する。これらの固有モードは量子コヒーレントでなければならないことを示し、動的対称性代数の観点から、そのような全ての力学に対する厳密な解析解を与える。この理論を用いて、安定同期と準安定/過渡同期の両方について検討する。我々は我々の理論を用いて自律システムの自然同期を完全に特徴づける。さらに、同期の欠如を証明するために使用できるコンパクトな代数的基準を与える。種々のフェルミオン性低温原子実験に関連する複数の系で同期を示す。 Synchronization is a phenomenon where interacting particles lock their motion and display non-trivial dynamics. Despite intense efforts studying synchronization in systems without clear classical limits, no comprehensive theory has been found. We develop such a general theory based on novel necessary and sufficient algebraic criteria for persistently oscillating eigenmodes (limit cycles) of time-independent quantum master equations. We show these eigenmodes must be quantum coherent and give an exact analytical solution for all such dynamics in terms of a dynamical symmetry algebra. Using our theory, we study both stable synchronization and metastable/transient synchronization. We use our theory to fully characterise spontaneous synchronization of autonomous systems. Moreover, we give compact algebraic criteria that may be used to prove absence of synchronization. We demonstrate synchronization in several systems relevant for various fermionic cold atom experiments.	翻訳日:2023-04-09 12:13:52 公開日:2022-01-11
# ブラックホールメソ状態のホールボ情報 Holevo Information of Black Hole Mesostates ( http://arxiv.org/abs/2103.06888v2 ) ライセンス: Link先を確認	Ning Bao, Jonathan Harper, Grant N. Remmen	(参考訳) 我々は、異なる大きさの地平線間を補間するバルクワームホール幾何学を定義し、これらの幾何学におけるHRT表面の特性を決定する。この構造はブラックホールの中間状態と双対であり、ブラックホールのミクロ状態と完全なブラックホール状態の間の状態の中間粗粒化である。近年のホログラフィック・ホレボ情報技術を用いて,これらの物体の識別可能性を分析し,新しい位相遷移挙動を示す。 We define a bulk wormhole geometry interpolating between horizons of differing size and determine characteristics of the HRT surface in these geometries. This construction is dual to black hole mesostates, an intermediate coarse-graining of states between black hole microstates and the full black hole state. We analyze the distinguishability of these objects using the recently derived holographic Holevo information techniques, demonstrating novel phase transition behavior for such systems.	翻訳日:2023-04-08 10:54:51 公開日:2022-01-11
# 正の演算子価値尺度の直交化 Orthogonalization of Positive Operator Valued Measures ( http://arxiv.org/abs/2103.14126v2 ) ライセンス: Link先を確認	Mikael de la Salle	(参考訳) 我々は、ほぼ直交するヒルベルト空間上のユニタリ(またはPOVM)の分割が、同じフォン・ノイマン代数の直交のPOVMに近いことを示す。これはKempe-Vidick と Ji-Natarajan-Vidick-Wright-Yuen による行列代数における無限次元の以前の結果を一般化する。定量的には, 最適となる線形依存度が得られるので, 結果もより微妙である。また、フォン・ノイマン代数の前の有限部分集合の極小行列とPOVMの間の無限次元の双対結果にも一般化する。 We show that a partition of unity (or POVM) on a Hilbert space that is almost orthogonal is close to an orthogonal POVM in the same von Neumann algebra. This generalizes to infinite dimension previous results in matrix algebras by Kempe-Vidick and Ji-Natarajan-Vidick-Wright-Yuen. Quantitatively, our results are also finer, as we obtain a linear dependence, which is optimal. We also generalize to infinite dimension a duality result between POVMs and minimal majorants of finite subsets in the predual of a von Neumann algebra.	翻訳日:2023-04-06 21:14:56 公開日:2022-01-11
# 閉・開系における二次ボソニック励起の位相的分類とその例 Topological classifications of quadratic bosonic excitations in closed and open systems with examples ( http://arxiv.org/abs/2103.15200v5 ) ライセンス: Link先を確認	Yan He and Chih-Chun Chien	(参考訳) 閉システムの運動方程式からの動的行列の対称性と開システムのリンドブラッド方程式からの有効ハミルトニアンによる二次ボソニック系の位相的分類を解析した。非エルミート動的行列と有効ハミルトニアンはどちらも10倍の方向テーブルをもたらすが、系-保存結合は、貯水池と結合しない系を異なるクラスに分解させる可能性がある。 2Dチャーン絶縁体は異なる分類に敏感であることが示されている。対照的に、カイラル対称性を持つ1次元ボソニックSu-Schrieffer-Heegerモデルと時間反転対称性を持つ2次元ボソニックトポロジカル絶縁体を、対応する開系が異なるクラスに分類できることを示す。 The topological classifications of quadratic bosonic systems according to the symmetries of the dynamic matrices from the equations of motion of closed systems and the effective Hamiltonians from the Lindblad equations of open systems are analyzed. While the non-Hermitian dynamic matrix and effective Hamiltonian both lead to a ten-fold way table, the system-reservoir coupling may cause a system with or without coupling to a reservoir to fall into different classes. A 2D Chern insulator is shown to be insensitive to the different classifications. In contrast, we present a 1D bosonic Su-Schrieffer-Heeger model with chiral symmetry and a 2D bosonic topological insulator with time-reversal symmetry to show the corresponding open systems may fall into different classes.	翻訳日:2023-04-06 08:02:47 公開日:2022-01-11
# 位相項を持つシュウィンガー模型のスクリーニングと閉じ込めのための古典的エミュレートディジタル量子シミュレーション Classically emulated digital quantum simulation for screening and confinement in the Schwinger model with a topological term ( http://arxiv.org/abs/2105.03276v2 ) ライセンス: Link先を確認	Masazumi Honda, Etsuko Itou, Yuta Kikuchi, Lento Nagano, Takuya Okuda	(参考訳) 古典的シミュレータを用いてデジタル量子シミュレーションを行い、ゲージ理論における位相項によるスクリーニングと閉じ込めの研究を行い、テタ項を持つ(1+1$)次元量子電磁力学(シュウィンガーモデル)に焦点を当てた。プローブ電荷の存在下で基底状態エネルギーを計算し、それらの間のポテンシャルを断熱的状態準備によって推定する。有限体積のシミュレーション結果と解析的予測を比較し,良好な一致を求める。特に, 大規模の場合において, 非整数電荷に対する線形挙動と整数電荷に対する非線形挙動は, 非整数電荷に対する期待閉じ込め(スクリーニング)挙動と一貫して一致している。 We perform digital quantum simulation, using a classical simulator, to study screening and confinement in a gauge theory with a topological term, focusing on ($1+1$)-dimensional quantum electrodynamics (Schwinger model) with a theta term. We compute the ground state energy in the presence of probe charges to estimate the potential between them, via adiabatic state preparation. We compare our simulation results and analytical predictions for a finite volume, finding good agreements. In particular our result in the massive case shows a linear behavior for non-integer charges and a non-linear behavior for integer charges, consistently with the expected confinement (screening) behavior for non-integer (integer) charges.	翻訳日:2023-04-01 05:36:36 公開日:2022-01-11
# フィードバックコヒーレントイジング機の位相空間シミュレーション Phase-space simulations of feedback coherent Ising machines ( http://arxiv.org/abs/2105.04190v2 ) ライセンス: Link先を確認	Simon Kiesewetter, Peter D Drummond (Swinburne University of Technology, Melbourne, Australia)	(参考訳) コヒーレントIsingマシン量子コンピュータの正位相空間シミュレーションを行うための新しい手法が実証された。結合行列の適切な設計により、一般的なハード最適化問題を解くことができる。ここでは、コヒーレントイジングマシンの実装である、フィードバックタイプのフォトニックパラメトリックネットワークの計算量子シミュレーションを行う。量子フィードバック装置の量子シミュレーションのためのスケーラブルな位相空間アルゴリズムを用いて成功率を求める。 A new technique is demonstrated for carrying out exact positive-P phase-space simulations of the coherent Ising machine quantum computer. By suitable design of the coupling matrix, general hard optimization problems can be solved. Here, computational quantum simulations of a feedback type of photonic parametric network are carried out, which is the implementation of the coherent Ising machine. Results for success rates are obtained using this scalable phase-space algorithm for quantum simulations of quantum feedback devices.	翻訳日:2023-03-31 23:40:15 公開日:2022-01-11
# 量子鎖上の同変準局所自己同型の分類 Classification of equivariant quasi-local automorphisms on quantum chains ( http://arxiv.org/abs/2106.02145v2 ) ライセンス: Link先を確認	Alex Bols	(参考訳) 量子鎖上の自己同型を分類し、スピンとフェルミオンの自由度の両方を許容し、それはさらに有限対称性群 $G$ の局所対称性作用に関して同値である。この分類は、強い連続的な変形と分離された補助自己同型の積み重ねを通じて同値となる。等価クラスは、$\mathbb{Q} \cup \sqrt{2} \mathbb{Q} \times \mathrm{Hom}(G, \mathbb{Z}_2) \times H^2(G, U(1))$ の値を取るインデックスによって一意にラベル付けされる。スピン鎖上の一次元対称性保護位相相の指数に対するこの指数の関係について検討し, $H^2(G, U(1))$ の値を取る。 We classify automorphisms on quantum chains, allowing both spin and fermionic degrees of freedom, that are moreover equivariant with respect to a local symmetry action of a finite symmetry group $G$. The classification is up to equivalence through strongly continuous deformation and stacking with decoupled auxiliary automorphisms. We find that the equivalence classes are uniquely labeled by an index taking values in $\mathbb{Q} \cup \sqrt{2} \mathbb{Q} \times \mathrm{Hom}(G, \mathbb{Z}_2) \times H^2(G, U(1))$. We discuss the relation of this index to the index of one-dimensional symmetry protected topological phases on spin chains, which takes values in $H^2(G, U(1))$.	翻訳日:2023-03-27 23:11:30 公開日:2022-01-11
# 正確な被覆問題における量子アニールギャップとクエンチダイナミクス The quantum annealing gap and quench dynamics in the exact cover problem ( http://arxiv.org/abs/2106.08101v3 ) ライセンス: Link先を確認	Bernhard Irsigler and Tobias Grass	(参考訳) 熱処理と熱処理は、量子系の時間進化において極端な反対である: 熱処理は、ゆっくりと変化するパラメータを持つハミルトンの平衡位相を探索し、複雑な最適化問題を解決するツールとして利用することができる。対照的に、クエンチはハミルトンの突然の変化であり、非平衡状態を生み出す。本稿では,両者の関係について検討する。具体的には,量子アニーリングアルゴリズムの重要なボトルネックであるアニーリングギャップの最小値が,クエンチ後の動的量子状態を記述する動的クエンチパラメータから明らかにできることを示す。ニューラルネットワークのトレーニングを含む統計的ツールと組み合わせることで、クエンチとアニーリングダイナミクスの関係を利用して、クエンチデータからアニーリングギャップの完全な機能的挙動を再現することができる。このような方法で得られるアニーリングギャップに関する部分的あるいは完全な知識は、実用的な時間対解の利点を持つ最適化された量子アニーリングプロトコルの設計に使用できる。我々の結果は、厳密な被覆問題の難解な例を表すランダムなイジング・ハミルトニアンのシミュレーションから得られる。 Quenching and annealing are extreme opposites in the time evolution of a quantum system: Annealing explores equilibrium phases of a Hamiltonian with slowly changing parameters and can be exploited as a tool for solving complex optimization problems. In contrast, quenches are sudden changes of the Hamiltonian, producing a non-equilibrium situation. Here, we investigate the relation between the two cases. Specifically, we show that the minimum of the annealing gap, which is an important bottleneck of quantum annealing algorithms, can be revealed from a dynamical quench parameter which describes the dynamical quantum state after the quench. Combined with statistical tools including the training of a neural network, the relation between quench and annealing dynamics can be exploited to reproduce the full functional behavior of the annealing gap from the quench data. We show that the partial or full knowledge about the annealing gap which can be gained in this way can be used to design optimized quantum annealing protocols with a practical time-to-solution benefit. Our results are obtained from simulating random Ising Hamiltonians, representing hard-to-solve instances of the exact cover problem.	翻訳日:2023-03-26 15:31:57 公開日:2022-01-11
# o(n)^{q-1}$テンソルモデルによる平均値のないワームホール計算 Wormhole calculus without averaging from $O(N)^{q-1}$ tensor model ( http://arxiv.org/abs/2106.14886v4 ) ライセンス: Link先を確認	Sayantan Choudhury, K. Shirish	(参考訳) SYKモデルは、ほぼ$AdS_2$空間のフェルミオン結合を平均化した後、ワームホールのような解を持つ。結合が固定されたとしても、これらのワームホールの寄与は引き続き存在し、新しいサドルポイントは「半ワームホール」と解釈される。本稿では,これらのワームホールの運命を,SYKモデルと相関関数とSYKモデルとの相関関係を持つ,$O(N)^{q-1}$ゲージ対称性を持つテンソルモデルを用いて検討する。ワームホール・スレッドド・ウィルソン作用素に関連付けられた因子分解問題を、大域電荷や非自明なコボルディズムクラスにおいて、解消されたワームホールに関連付ける。したがって、特に短距離で分解する分断関数には、ワームホールに関連する大域対称性を破り、大域対称性を欠くような位相的欠陥が存在する必要がある。我々はこれらのワームホールを、トポロジカルな欠陥を加えて「半ワームホール」と解釈する。また、スペクトルフォームファクタの遅い時間的挙動、特に重力セクターの高次ワームホール(英語版)からの先行およびサブリードの貢献についてもコメントする。最後に、他の「ハーフワームホール」の非自明なサドルがどのように支配的であり、非摂動効果によってバルクセクターで異常な熱力学を引き起こすかを示す。 The SYK model has a wormhole-like solution after averaging over the fermionic couplings in the nearly $AdS_2$ space. Even when the couplings are fixed the contribution of these wormholes continues to exist and new saddle points appear which are interpreted as "half-wormholes". In this paper, we will study the fate of these wormholes in a model without quenched disorder namely a tensor model with $O(N)^{q-1}$ gauge symmetry whose correlation function and thermodynamics in the large $N$ limit are the same as that of the SYK model. We will restate the factorization problem linked with the wormhole threaded Wilson operator, in terms of global charges or non-trivial cobordism classes associated with disconnected wormholes. Therefore for the partition function to factorize especially at short distances, there must exist certain topological defects which break the global symmetry associated with wormholes and make the theory devoid of global symmetries. We will interpret these wormholes with added topological defects as our "half-wormholes". We will also comment on the late time behavior of the spectral form factor, particularly its leading and sub-leading order contributions coming from higher genus wormholes in the gravitational sector. 
Finally, we will show how the other non-trivial saddles from "half-wormholes" dominate and give rise to unusual thermodynamics in the bulk sector due to non-perturbative effects.	翻訳日:2023-03-24 22:03:41 公開日:2022-01-11
# グラフ上の密度Functional Theory Density-Functional Theory on Graphs ( http://arxiv.org/abs/2106.15370v2 ) ライセンス: Link先を確認	Markus Penz, Robert van Leeuwen	(参考訳) 密度汎関数理論の原理はグラフで表される有限格子系に対して研究される。驚くべきことに、ベーシックなホヘンベルク・コーンの定理は一般には無効であるが、密度-ポテンシャル写像の位相構造に関する多くの洞察は得られる。基底状態の正確な条件を一意に v-表現可能とし、この性質がほぼすべての密度に対して成り立つことを証明できる。一連の例がこの理論を示し、純粋な状態制約付き探索汎関数の非凸性を示す。 The principles of density-functional theory are studied for finite lattice systems represented by graphs. Surprisingly, the fundamental Hohenberg-Kohn theorem is found void in general, while many insights into the topological structure of the density-potential mapping can be won. We give precise conditions for a ground state to be uniquely v-representable and are able to prove that this property holds for almost all densities. A set of examples illustrates the theory and demonstrates the non-convexity of the pure-state constrained-search functional.	翻訳日:2023-03-24 19:42:57 公開日:2022-01-11
# 単一量子ビット解離による量子位相の学習 Learning quantum phases via single-qubit disentanglement ( http://arxiv.org/abs/2107.03542v3 ) ライセンス: Link先を確認	Zheng An, Chenfeng Cao, Cheng-Qian Xu, D. L. Zhou	(参考訳) 物質の相の同定は複雑なプロセスであり、特に量子論では、基底状態の複雑さはシステムサイズとともに指数関数的に増加するように見える。量子多体系の絡み合いは異なる相における異なる複雑な構造を示す。異なる絡み合い構造を検出して異なる量子相を区別する自然な方法である。絡み合い構造の検出に直接的に取り組む方法として、絡み合い構造に関する事実情報を提供することができる。本研究では, 量子相を同定するアプローチとして, 異なる位相下での絡み合い構造の決定と, イジングスピン鎖系の相の識別という, 根本的に異なるアプローチを踏襲する。本稿では,RL設計逆アンタングル回路の性能に基づく位相遷移を求めるための強化学習(RL)手法を提案する。さらに,本手法では局所演算が限定され,量子回路が拡張可能で,大規模システムにも拡張可能である。本研究では, 横場イジングモデル(TFIM)とXXZモデルにおける量子相転移におけるこの手法の成功例を示す。さらに、RLエージェントは、TFIMの下での絡み合い構造のクラマース・ワニエ双対性を学ぶ。この研究は、量子多体系の絡み合い構造で量子相を特徴づけることに光を当てる。 Identifying phases of matter is a complicated process, especially in quantum theory, where the complexity of the ground state appears to rise exponentially with system size. The entanglement of quantum many-body systems exhibits different complex structures in different phases. It is a natural way to distinguish different quantum phases by detecting different entanglement structures. As a method that works directly on the detection of the entanglement structure, disentanglement can provide factual information on the entanglement structure. In this work, we follow a radically different approach to identifying quantum phases: we utilize disentanglement to determine the entanglement structures under different phases to distinguish the phases of Ising spin chain systems. Here, we propose a reinforcement learning (RL) approach to finding phase transitions based on the performance of the RL-designed disentangling circuit. Further, our method uses limited local operations and one qubit measurement, making the quantum circuit scalable and easily extended to large-sized systems. We demonstrate the success of this method on the quantum phase transition in the transverse field Ising model (TFIM) and the XXZ model. 
Moreover, we find that the RL agent learns the Kramers-Wannier duality on entanglement structures under the TFIM. This study sheds light on characterizing quantum phases with the entanglement structures of quantum many-body systems.	翻訳日:2023-03-23 02:21:44 公開日:2022-01-11
# ピロクロア格子上のキラル$\mathbb{Z}_2$スピン液体の射影対称性群の分類:スピン-$1/2$ XXZハイゼンベルクモデルへの応用 Projective symmetry group classification of chiral $\mathbb{Z}_2$ spin liquids on the pyrochlore lattice: application to the spin-$1/2$ XXZ Heisenberg model ( http://arxiv.org/abs/2107.13574v3 ) ライセンス: Link先を確認	Benedikt Schneider, Jad C. Halimeh, Matthias Punk	(参考訳) 我々は、シュヴィンガーボソン平均場状態の射影対称性群解析を用いて、ピロクロア格子上のキラル$\mathbb{Z}_2$量子スピン液体と同様に、完全対称の完全な分類を与える。 Liu らによって分類された 12 個の完全対称近傍の $\mathbb{Z}_2$ スピン液体を含む 50 個の独立 ansätze が存在する。 [https://journals.aps.org/prb/abstract/10.1103/PhysRevB.100.075125] 各クラスについて、最も一般的な対称性許容平均場ハミルトニアンを指定する。さらに,反強磁性ハイゼンベルク点近傍のスピン-$1/2$ XXZ 模型の平均場方程式を解いて,スピン液体 ansätze の部分集合の性質を検証した。 4つのキラルスピン液体が、格子モジュラー時間反転対称性のネジ対称性を破っている。これらの状態は、以前研究されたモノポール束状態と異なる対称性を持ち、その特異な特徴は格子のすべてのロンボで囲まれた$\frac{\pi}{3}$フラックスである。 We give a complete classification of fully symmetric as well as chiral $\mathbb{Z}_2$ quantum spin liquids on the pyrochlore lattice using a projective symmetry group analysis of Schwinger boson mean-field states. We find 50 independent ansätze, including the 12 fully symmetric nearest-neighbor $\mathbb{Z}_2$ spin liquids that have been classified by Liu et al. [https://journals.aps.org/prb/abstract/10.1103/PhysRevB.100.075125]. For each class we specify the most general symmetry-allowed mean-field Hamiltonian. Additionally, we test the properties of a subset of the spin liquid ansätze by solving the mean-field equations for the spin-$1/2$ XXZ model near the antiferromagnetic Heisenberg point. We find four chiral spin liquids that break the screw symmetry of the lattice modulo time reversal symmetry. These states have a different symmetry than the previously studied monopole flux state and their unique characteristic is a $\frac{\pi}{3}$ flux enclosed by every rhombus of the lattice.	翻訳日:2023-03-20 16:53:00 公開日:2022-01-11
# 量子気体中の弱い相互作用と強い相互作用の二重性 Duality between weak and strong interactions in quantum gases ( http://arxiv.org/abs/2109.08626v2 ) ライセンス: Link先を確認	Etienne Granet, Bruno Bertini, Fabian H. L. Essler	(参考訳) 1次元の量子気体では、ハードコアボソンと非相互作用フェルミオンの間によく知られた「双対性」が存在する。しかし、場の理論のレベルでは、強く相互作用するボソンと弱い相互作用するフェルミオンをつなぐ完全双対性は知られていない。ここでは、この長期的問題に対する解決策を提案する。我々の導出は、その導出に比例する波動関数の不連続を誘導する1次元におけるフェルミオン間の唯一の点的相互作用を規則化することに依存する。すべての既知の正規化とは対照的に、我々のポテンシャルは小さな相互作用強度に対して弱い。これにより、強い相互作用を持つボソンに図式摂動理論の標準的な方法を適用することができる。第一の応用として, 有名なリーブ・リンガーモデルと双対なフェルミオン模型であるcheon-shigeharaモデルの有限温度スペクトル関数を計算する。 In one dimensional quantum gases there is a well known "duality" between hard core bosons and non-interacting fermions. However, at the field theory level, no exact duality connecting strongly interacting bosons to weakly interacting fermions is known. Here we propose a solution to this long standing problem. Our derivation relies on regularizing the only point-like interaction between fermions in 1D that induces a discontinuity in the wave function proportional to its derivative. In contrast to all known regularisations our potential is weak for small interaction strengths. Crucially, this allows one to apply standard methods of diagrammatic perturbation theory to strongly interacting bosons. As a first application we compute the finite temperature spectral function of the Cheon-Shigehara model, the fermionic model dual to the celebrated Lieb-Liniger model.	翻訳日:2023-03-14 11:26:43 公開日:2022-01-11
# 非共鳴二段階遷移:量子熱力学からの教訓 Nonresonant two-level transitions: Lessons from quantum thermodynamics ( http://arxiv.org/abs/2109.11413v2 ) ライセンス: Link先を確認	Andreas Wacker	(参考訳) 量子熱力学の概念に基づき、1つの電磁モードに結合した2段階の系を解析する。モード周波数が遷移周波数に一致しないデチューニングの場合には、レベルと光子エネルギーに対して有効エネルギーが導出される。熱力学的に一貫した記述を達成するためには, フェルミオン, ボソニック貯留層と定常状態でのエネルギー交換に使用する必要がある。周波数プルやブロッホゲインなどの既知の特徴を回復する一方で、熱力学的背景に光を当て、一貫性のある理解を可能にする。 Based on concepts from quantum thermodynamics the two-level system coupled to a single electromagnetic mode is analyzed. Focusing on the case of detuning, where the mode frequency does not match the transition frequency, effective energies are derived for the levels and the photon energy. It is shown that these should be used for energy exchange with fermionic and bosonic reservoirs in the steady state in order to achieve a thermodynamically consistent description. While recovering known features such as frequency pulling or Bloch gain, this sheds light on their thermodynamic background and allows for a coherent understanding.	翻訳日:2023-03-13 23:00:57 公開日:2022-01-11
# 競合する長距離相互作用を持つ原子BECにおける超ガラス形成 Superglass formation in an atomic BEC with competing long-range interactions ( http://arxiv.org/abs/2109.14709v2 ) ライセンス: Link先を確認	Stefan Ostermann, Valentin Walther and Susanne F. Yelin	(参考訳) 量子系の複雑な動的位相は、通常、創発的な周期次数を引き起こす原子相互作用によって引き起こされる。本稿では,密度秩序に対する動的不安定性が超ガラス相に与える2つの競合する,実質的に異なる長距離相互作用ポテンシャルを持つ量子多体系について検討する。 e. 局所密度変調を呈する超流動性非晶質固体であり, 長周期秩序は存在しない。ライドベルク装束法における2次元BECと光立位波共振器について検討する。この系の動的パターン形成は、リドバーグドレッシングによって生じる反発的ソフトコア相互作用と、キャビティフォトンによって引き起こされる無限範囲の符号変化相互作用の2つの相互作用電位の競合によって制御される。超ガラス相は、2つの相互作用ポテンシャルが不規則な長さスケールを導入するときに起こる。外部付加障害のないこの特異位相の動的形成は量子ゆらぎによって駆動され、2つの競合する相互作用エネルギーと長さスケールによって引き起こされるフラストレーションに起因する。 The complex dynamical phases of quantum systems are dictated by atomic interactions that usually evoke an emergent periodic order. Here, we study a quantum many-body system with two competing and substantially different long-range interaction potentials where the dynamical instability towards density order can give way to a superglass phase, i. e., a superfluid disordered amorphous solid, which exhibits local density modulations but no long-range periodic order. We consider a two-dimensional BEC in the Rydberg-dressing regime coupled to an optical standing wave resonator. The dynamic pattern formation in this system is governed by the competition between the two involved interaction potentials: repulsive soft-core interactions arising due to Rydberg dressing and infinite-range sign changing interactions induced by the cavity photons. The superglass phase is found when the two interaction potentials introduce incommensurate length scales. The dynamic formation of this peculiar phase without any externally added disorder is driven by quantum fluctuations and can be attributed to frustration induced by the two competing interaction energies and length scales.	翻訳日:2023-03-13 04:49:56 公開日:2022-01-11
# パラメータ空間に例外点が包囲された場合の量子力学の複素時間法 Complex time method for quantum dynamics when an exceptional point is encircled in the parameter space ( http://arxiv.org/abs/2110.14473v2 ) ライセンス: Link先を確認	Petra Ruth Kapralova-Zdanska	(参考訳) 我々は、量子力学への応用のための複素時間法を再考し、例外点がハミルトニアンのパラメータ空間に囲むようにした。複素時間法の基本的な考え方は、複素輪郭積分を用いて一階断熱摂動積分を行うことである。このようにして、量子力学問題は複素時間平面(遷移点)の特異点の研究に変換され、これは断熱ハミルトンの複素退化を表すものであり、周囲の輪郭を定義する時間依存パラメータは解析的に複素平面に継続される。このアプローチの基本的な例として、特別な時間対称の場合、例外点を囲む際に発生するラビ振動と急な断熱通路の切り替えについて論じる。 We revisit the complex time method for the application to quantum dynamics as an exceptional point is encircled in the parameter space of the Hamiltonian. The basic idea of the complex time method is using complex contour integration to perform the first-order adiabatic perturbation integral. In this way, the quantum dynamical problem is transformed to a study of singularities in the complex time plane -- transition points -- which represent complex degeneracies of the adiabatic Hamiltonian as the time-dependent parameters defining the encircling contour are analytically continued to complex plane. As an underlying illustration of the approach we discuss a switch between Rabi oscillations and rapid adiabatic passage which occurs upon the encircling of an exceptional point in a special time-symmetric case.	翻訳日:2023-03-10 03:14:53 公開日:2022-01-11
# 重力絡みの推測における限界 Limits on inference of gravitational entanglement ( http://arxiv.org/abs/2111.00936v2 ) ライセンス: Link先を確認	Yue Ma, Thomas Guff, Gavin Morley, Igor Pikovski, M. S. Kim	(参考訳) 重力と量子力学を組み合わせることは物理学の最大の課題である。近年、重力の量子的性質について間接的な手がかりを与えるオプト・メカニカル・システムによる実験が提案されている。近年の [D. Carney et al., PRX Quantum 2, 030330 (2021)] では, 原子干渉計をメソスコピック発振器で重力的に絡めることが提案されている。この相互作用は、干渉計の視界の周期的な低下と回復をもたらし、特定の仮定の下では、絡み合いの重力発生を示す。ここでは、同じ効果を再現できる原子干渉計の半古典的モデルについて研究する。周期的崩壊と可視性の復活といったコアシグネチャは、原子がランダムなユニタリチャネルの下にある場合、発振器の明示的なモデリングがなくても、発振器が完全に古典的かつ状況的になる場合を含む。また,振動子の非古典性は,系が基底状態に非常に近くなければ消失し,系が基底状態であっても結合強度によって非古典性が制限されることを示した。その結果,非古典性仮定を満たし検証すること自体が大きな課題となるため,提案実験から絡み合いを導き出すことは非常に困難であることが示唆された。 Combining gravity with quantum mechanics remains one of the biggest challenges of physics. In the past years, experiments with opto-mechanical systems have been proposed that may give indirect clues about the quantum nature of gravity. In a recent variation of such tests [D. Carney et al., PRX Quantum 2, 030330 (2021)], the authors propose to gravitationally entangle an atom interferometer with a mesoscopic oscillator. The interaction results in periodic drops and revivals of the interferometric visibility, which under specific assumptions indicate the gravitational generation of entanglement. Here we study semi-classical models of the atom interferometer that can reproduce the same effect. We show that the core signature -- periodic collapses and revivals of the visibility -- can appear if the atom is subject to a random unitary channel, including the case where the oscillator is fully classical and situations even without explicit modelling of the oscillator. We also show that the non-classicality of the oscillator vanishes unless the system is very close to its ground state, and even when the system is in the ground state, the non-classicality is limited by the coupling strength. 
Our results thus indicate that deducing entanglement from the proposed experiment is very challenging, since fulfilling and verifying the non-classicality assumptions is a significant challenge in its own right.	翻訳日:2023-03-09 16:58:46 公開日:2022-01-11
# ウィグナーの友人の古典的モデルは存在するか。 Is there a classical model of Wigner's friend? ( http://arxiv.org/abs/2111.02807v2 ) ライセンス: Link先を確認	Anthony Sudbery	(参考訳) 「ウィグナーの友人」(Wigner's friend)とは、量子力学の規則に従う異なる観測者が矛盾する記述を与える量子過程を指す。 Lostaglio と Bowles は最近、同じ効果を示す古典系を記述したと主張した。本論文では、この主張は正当化されないと論じる。古典力学と量子力学とで確率の意味が異なることを考慮していないためである。 "Wigner's friend" refers to a quantum process of which different observers, following the rules of quantum mechanics, give contradictory descriptions. Lostaglio and Bowles have recently claimed to describe a classical system showing the same effect. It is argued that this claim is not justified; it fails to take into account the different meanings of probability in classical and quantum mechanics.	翻訳日:2023-03-09 06:31:49 公開日:2022-01-11
# 生物学的刺激による活性化関数は、生体と人工ニューロンのパフォーマンスギャップを橋渡しできる Biologically Inspired Oscillating Activation Functions Can Bridge the Performance Gap between Biological and Artificial Neurons ( http://arxiv.org/abs/2111.04020v3 ) ライセンス: Link先を確認	Matthew Mithra Noel, Shubham Bharadwaj, Venkataraman Muthiah-Nakarajan, Praneet Dutta, Geraldine Bessie Amali	(参考訳) 非線形活性化関数はニューラルネットワークに複雑な高次元関数を学習する能力を与える。活性化関数の選択は、ディープニューラルネットワークの性能を決定する重要なハイパーパラメータである。これは勾配流、トレーニングの速度、最終的にはニューラルネットワークの表現力に大きく影響する。 sigmoidsのような飽和アクティベーション関数は、消失する勾配問題に悩まされ、ディープニューラルネットワークでは使用できない。普遍近似定理は、シグモイドとReLUの多層ネットワークが任意の精度で任意の複素連続函数を学習できることを保証する。多層ニューラルネットワークが任意の複雑な活性化関数を学習する能力にもかかわらず、従来のニューラルネットワーク(シグモイドとReLUをアクティベーションに用いたネットワーク)の各ニューロンはその決定境界として単一の超平面を持ち、従って線形分類を行う。したがって、Sigmoidal、ReLU、Swish、Mishの活性化機能を持つ単一ニューロンはXOR関数を学習できない。近年の研究では、振動活性化機能を有し、XOR機能を個別に学習できるヒト大脳皮質の2層と3層の生物学的ニューロンが発見された。生体ニューロンにおける振動活性化機能の存在は、生物学的ニューラルネットワークと人工神経ネットワークのパフォーマンスギャップを部分的に説明できるかもしれない。本稿では,個々のニューロンが手作業でxor機能を学習できる4つの新しい振動活性化機能を提案する。本稿では、発振活性化関数を用いて、ニューロンの少ない分類問題を解消し、トレーニング時間を短縮する可能性を検討する。 Nonlinear activation functions endow neural networks with the ability to learn complex high-dimensional functions. The choice of activation function is a crucial hyperparameter that determines the performance of deep neural networks. It significantly affects the gradient flow, speed of training and ultimately the representation power of the neural network. Saturating activation functions like sigmoids suffer from the vanishing gradient problem and cannot be used in deep neural networks. Universal approximation theorems guarantee that multilayer networks of sigmoids and ReLU can learn arbitrarily complex continuous functions to any accuracy. Despite the ability of multilayer neural networks to learn arbitrarily complex activation functions, each neuron in a conventional neural network (networks using sigmoids and ReLU like activations) has a single hyperplane as its decision boundary and hence makes a linear classification. 
Thus single neurons with sigmoidal, ReLU, Swish, and Mish activation functions cannot learn the XOR function. Recent research has discovered biological neurons in layers two and three of the human cortex having oscillating activation functions and capable of individually learning the XOR function. The presence of oscillating activation functions in biological neural neurons might partially explain the performance gap between biological and artificial neural networks. This paper proposes 4 new oscillating activation functions which enable individual neurons to learn the XOR function without manual feature engineering. The paper explores the possibility of using oscillating activation functions to solve classification problems with fewer neurons and reduce training time.	翻訳日:2023-03-08 22:32:25 公開日:2022-01-11
# NUTブラックホールのスカラーおよびディラック摂動のグレイボディ放射 Greybody Radiation of scalar and Dirac perturbations of NUT Black Holes ( http://arxiv.org/abs/2111.15005v2 ) ライセンス: Link先を確認	Ahmad Al-Badawi, Sara Kanzi, and İzzet Sakallı	(参考訳) スピノル波動方程式、すなわちディラック方程式とクライン・ゴルドン方程式、およびNUTブラックホール時空におけるグレイボディ放射を考える。この目的のために、まずNUT時空におけるディラック方程式を、Newman-Penrose (NP)フォーマリズムにおけるヌルテトラッドを用いて研究する。次に、ディラック方程式をラジアル集合と角集合に分離する。角集合は関連するルジャンドル関数の観点で解かれる。放射状集合では、分離された放射状波動方程式を求め、有効ポテンシャルとともに1次元Schrödinger波動方程式を導出する。次に, 物理的に許容される領域において, 半径距離の関数としてプロットすることでポテンシャルを議論する。また、Klein-Gordon方程式を調べ、ボソンおよびフェルミオンのグレーボディ因子(GF)を計算する。 NUT時空のGFに対するNUTパラメータの影響を詳細に検討した。 We consider the spinorial wave equations, namely the Dirac and the Klein-Gordon equations, and greybody radiation in the NUT black hole spacetime. To this end, we first study the Dirac equation in NUT spacetime by using a null tetrad in the Newman-Penrose (NP) formalism. Next, we separate the Dirac equation into radial and angular sets. The angular set is solved in terms of associated Legendre functions. With the radial set, we obtain the decoupled radial wave equations and derive the one-dimensional Schrödinger wave equations together with effective potentials. Then, we discuss the potentials by plotting them as a function of radial distance in a physically acceptable region. We also study the Klein-Gordon equation to compute the greybody factors (GFs) for both bosons and fermions. The influence of the NUT parameter on the GFs of the NUT spacetime is investigated in detail.	翻訳日:2023-03-06 09:04:15 公開日:2022-01-11
# スピン鎖可換ハミルトニアンの急速な熱平衡化 Rapid thermalization of spin chain commuting Hamiltonians ( http://arxiv.org/abs/2112.00593v2 ) ライセンス: Link先を確認	Ivan Bardet, Ángela Capel, Li Gao, Angelo Lucia, David Pérez-García and Cambyse Rouzé	(参考訳) 大きな熱浴と弱く結合したスピン鎖は、有限範囲で並進不変な可換ハミルトニアンに対して任意の温度で急速に熱平衡化し、システムサイズに対して対数的にスケールする時間で平衡に達することを証明した。我々の主な結果は、古典的なスピン鎖に対する Holley と Stroock の先駆的な結果の量子的設定への一般化であり、スペクトルギャップが閉じないことに基づく境界に対する指数関数的な改善を表している。物理的観点からは、この結果は並進不変スピン鎖上のデイビス進化に対する散逸相転移の欠如を厳密に立証する。この結果は、進化が位相の対称性を尊重する対称性保護位相(Symmetry Protected Topological phases)にも適用される。これは平衡および非平衡の多体量子系の研究に広く応用されている。 We prove that spin chains weakly coupled to a large heat bath thermalize rapidly at any temperature for finite-range, translation-invariant commuting Hamiltonians, reaching equilibrium in a time which scales logarithmically with the system size. Our main result is a generalization to the quantum setting of a seminal result of Holley and Stroock for classical spin chains and represents an exponential improvement over bounds based on the non-closure of the spectral gap. From a physical point of view, our result rigorously establishes the absence of dissipative phase transition for Davies evolutions over translation-invariant spin chains. The result also applies in the case of Symmetry Protected Topological phases where the evolution is respecting the symmetry of the phase. This has wide-ranging applications to the study of many-body in and out-of-equilibrium quantum systems.	翻訳日:2023-03-06 04:43:18 公開日:2022-01-11
# $\alpha-z$ Bures-Wasserstein 量子分岐に対する正しい平均 Right mean for the $\alpha-z$ Bures-Wasserstein quantum divergence ( http://arxiv.org/abs/2201.03732v1 ) ライセンス: Link先を確認	Miran Jeong, Jinmi Hwang, Sejong Kim	(参考訳) $\alpha-z$ Renyi相対エントロピーから誘導される新しい量子分岐は、最近、$\alpha-z$ Bures-Wasserstein 量子分岐と呼ばれる。本稿では,各点に対する$\alpha-z$ bures-wasserstein量子発散の重み付き和の一意的最小化である右平均の性質について検討する。カルタン平均を含む行列パワー平均を持つ正しい平均の多くの興味深い作用素不等式が提示される。さらに、ワッサースタイン平均とのトレース不等式を検証し、2つの右平均のハダマール積の境界を与える。 A new quantum divergence induced from the $\alpha-z$ Renyi relative entropy, called the $\alpha-z$ Bures-Wasserstein quantum divergence, has been recently introduced. We investigate in this paper properties of the right mean, which is a unique minimizer of the weighted sum of $\alpha-z$ Bures-Wasserstein quantum divergences to each points. Many interesting operator inequalities of the right mean with the matrix power mean including the Cartan mean are presented. Moreover, we verify the trace inequality with the Wasserstein mean and provide bounds for the Hadamard product of two right means.	翻訳日:2023-03-01 13:08:51 公開日:2022-01-11
# 3量子ビット状態の準備 Preparation of 3-qubit states ( http://arxiv.org/abs/2201.03724v1 ) ライセンス: Link先を確認	Oscar Perdomo, Nelson Castaneda and Roger Vogeler	(参考訳) すべての振幅が実数である純粋な量子ビット状態を「実」状態と呼ぶ。任意の実3量子ビット状態は、$R_y(\theta)$ ゲートと高々4つの制御 $Z$ ゲートを用いて準備できることを示し、4が最適であると予想する。また、Znidaric, Giraud, Georgeot による2008年のアルゴリズムとは異なる、局所ゲートと高々3つの制御 $Z$ ゲートを用いて任意の3量子ビット状態を準備するアルゴリズムも提示する。本手法が2量子ビットおよび3量子ビット状態に対してどのように動作するかを示すビデオは、https://youtu.be/LIdYSs-rE-o と https://youtu.be/Kne0Vq7gyzQ で見ることができる。 We will call a pure qubit state real if all its amplitudes are real numbers. We show that any real 3-qubit state can be prepared using $R_y(\theta)$ gates and at most four controlled-$Z$ gates, and we conjecture that four is optimal. We also present an algorithm -- different from the 2008 algorithm given by Znidaric, Giraud and Georgeot -- that prepares any 3-qubit state using local gates and at most three controlled-$Z$ gates. Videos showing how our method works for two- and three-qubit states can be found at https://youtu.be/LIdYSs-rE-o and https://youtu.be/Kne0Vq7gyzQ	翻訳日:2023-03-01 13:08:41 公開日:2022-01-11
# アームチェアグラフェンナノリボンの多光子吸収とラビ振動 Multiphoton absorption and Rabi oscillations in armchair graphene nanoribbons ( http://arxiv.org/abs/2201.03896v1 ) ライセンス: Link先を確認	B.S. Monozon and P. Schmelcher	(参考訳) 本稿では,リボン軸に平行な光波によって誘起される時間振動する強電界の存在下でのアルチェエアグラフェンナノリボン(agnr)の多光子吸収とラビ振動の問題に対する解析的アプローチを提案する。リボン閉じ込めを受ける無質量電子に対する2次元ディラック方程式を用いる。価値と導電サイズ量子化サブバンド間の電子遷移に関連する電子-ホール対生成率の共振近似では、対応する多光子吸収係数とラビ振動の周波数が明示的な形で得られる。以上の量とリボン幅および電界強度の依存性を追尾し, 時間振動と実質的に一定な電界に関係した多光子アシストレジームとトンネルレジームの両方について検討した。サブバンド間遷移に対する電場の振動特性の顕著な増強効果に遭遇する。解析結果はグラフェン層で得られたものと数値計算により定性的に一致している。典型的なAGNRとレーザーパラメータの予測実験値は、ラボでラビ振動と多光子吸収の両方がアクセス可能であることを示している。サブバンド間トンネルに関連するデータは、agnrを外部実験室電場を印加することにより量子電磁気真空崩壊を検出できる1次元凝縮物アナログとする。 We present an analytical approach to the problem of the multiphoton absorption and Rabi oscillations in an armchair graphene nanoribbon (AGNR) in the presence of a time-oscillating strong electric field induced by a light wave directed parallel to the ribbon axis. The two-dimensional Dirac equation for the massless electron subject to the ribbon confinement is employed. In the resonant approximation the electron-hole pair production rate, associated with the electron transitions between the valence and conduction size-quantized subbands, the corresponding multiphoton absorption coefficient and the frequency of the Rabi oscillations are obtained in an explicit form. We trace the dependencies of the above quantities on the ribbon width and electric field strength for both the multiphoton assisted and tunneling regimes relevant to the time-oscillating and practically constant electric field, respectively. A significant enhancement effect of the oscillating character of the electric field on the intersubband transitions is encountered. Our analytical results are in qualitative agreement with those obtained for the graphene layer by numerical methods. 
Estimates of the expected experimental values for the typically employed AGNR and laser parameters show that both the Rabi oscillations and multiphoton absorption are accessible in the laboratory. The data relevant to the intersubband tunneling makes the AGNR a 1D condensed matter analog in which the quantum electrodynamic vacuum decay can be detected by applying an external laboratory electric field.	翻訳日:2023-03-01 13:05:34 公開日:2022-01-11
# ランクアグリゲーションのヒューリスティック検索とラベルランキングへの応用 Heuristic Search for Rank Aggregation with Application to Label Ranking ( http://arxiv.org/abs/2201.03893v1 ) ライセンス: Link先を確認	Yangming Zhou and Jin-Kao Hao and Zhen Li and Fred Glover	(参考訳) ランクアグリゲーションは、異なる有権者の選択肢の選好ランキングを単一のコンセンサスランキングに統合することを目的としている。しかし、様々な実用的応用のための有用なモデルとして、計算的に難しい問題である。本稿では,完全ランキングと部分ランキングの両方を用いて,ランクアグリゲーション問題を解くための効果的なハイブリッド進化的ランキングアルゴリズムを提案する。このアルゴリズムは、コンコーダントペアに基づくセマンティッククロスオーバーと、効率的な漸進的評価手法によって強化された遅延受容局所探索を特徴とする。アルゴリズムを評価するために実験を行い、最先端のアルゴリズムと比較してベンチマークインスタンスで高い競合性を示す。その実用性を示すために、このアルゴリズムは重要な機械学習タスクであるラベルランキングに適用される。 Rank aggregation aims to combine the preference rankings of a number of alternatives from different voters into a single consensus ranking. As a useful model for a variety of practical applications, however, it is a computationally challenging problem. In this paper, we propose an effective hybrid evolutionary ranking algorithm to solve the rank aggregation problem with both complete and partial rankings. The algorithm features a semantic crossover based on concordant pairs and a late acceptance local search reinforced by an efficient incremental evaluation technique. Experiments are conducted to assess the algorithm, indicating a highly competitive performance on benchmark instances compared with state-of-the-art algorithms. To demonstrate its practical usefulness, the algorithm is applied to label ranking, which is an important machine learning task.	翻訳日:2023-03-01 13:05:14 公開日:2022-01-11
# 自然パラメトリックダウン変換の古典的モデル A classical model of spontaneous parametric down-conversion ( http://arxiv.org/abs/2201.03842v1 ) ライセンス: Link先を確認	Girish Kulkarni, Jeremy Rioux, Boris Braverman, Maria V. Chekhova, and Robert. W. Boyd	(参考訳) 我々は,自然パラメトリックダウンコンバージョン(SPDC)を,ポンプ場の古典的差分周波数生成(DFG)および仮想確率的「真空」シードフィールドとしてモデル化した。 DFGプロセスから発生するフィールドの2次時空間相関がSPDCから信号フィールドの2次時空間相関を再現することを示した。特に、低利得の場合、このモデルは信号光子の密度行列の量子計算と一致し、高利得の場合、モデルの予測は、ポンプ強度を上げるためのspd場の遠方磁場強度プロファイル、軌道角運動量スペクトル、波長スペクトルの実験的測定とよく一致している。さらに、モデルが二階のSU(1,1)干渉を捕捉し、両方の利得状態におけるコヒーレンス効果を誘導することを示す。興味深いことに、このモデルはまた、低利得状態におけるオブジェクト透過性による干渉視認性の線形スケーリングを正しく予測している。本モデルは,spdcと誘導コヒーレンスという文脈における古典量子分断に関する新たな基礎的洞察をもたらすだけでなく,spdに基づく多数の実験や応用のための有用な理論的ツールとなる。 We model spontaneous parametric down-conversion (SPDC) as classical difference frequency generation (DFG) of the pump field and a hypothetical stochastic "vacuum" seed field. We analytically show that the second-order spatiotemporal correlations of the field generated from the DFG process replicate those of the signal field from SPDC. Specifically, for low gain, the model is consistent with the quantum calculation of the signal photon's reduced density matrix; and for high gain, the model's predictions are in good agreement with our experimental measurements of the far-field intensity profile, orbital angular momentum spectrum, and wavelength spectrum of the SPDC field for increasing pump strengths. We further theoretically show that the model successfully captures second-order SU(1,1) interference and induced coherence effects in both gain regimes. Intriguingly, the model also correctly predicts the linear scaling of the interference visibility with object transmittance in the low-gain regime -- a feature that is often regarded as a quintessential signature of the nonclassicality of induced coherence. 
Our model may not only lead to novel fundamental insights into the classical-quantum divide in the context of SPDC and induced coherence, but can also be a useful theoretical tool for numerous experiments and applications based on SPDC.	翻訳日:2023-03-01 13:04:44 公開日:2022-01-11
# 原子-光ハイブリッド干渉計における非対称利得最適化によるセンシング性能向上 Sensing performance enhancement via asymmetric gain optimization in the atom-light hybrid interferometer ( http://arxiv.org/abs/2201.03818v1 ) ライセンス: Link先を確認	Zhifei Yu, Bo Fang, Shuying Chen, Pan Liu, Guzhi Bao, Chun-hua Yuan, and L.Q Chen	(参考訳) SU(1,1)型原子-光ハイブリッド干渉計(SALHI)は、光学相と原子相の両方に敏感な干渉計の一種である。しかし、この損失は実用上避けられない問題であり、干渉計の使用を大幅に制限している。可視性は干渉計のセンシング性能を評価する重要なパラメータである。そこで本研究では,salhiの視認性に対する損失の軽減効果を非対称ゲイン最適化により実験的に示し,視認性に対する損失の最大閾値を100〜$近く増加させる。さらに,最大視認性に対する最適条件は,信号対雑音比(snr)を強度検出を用いた損失の有無において最良値に向上させる条件と同一であり,snr改善のための実験的運用基準として有効であることを示す。干渉可視性の向上はSNR増強の達成を意味する。本研究は,SALHIをレーダーおよび測度測定に応用するための重要な基礎となる。 The SU (1,1)-type atom-light hybrid interferometer (SALHI) is a kind of interferometer that is sensitive to both the optical phase and atomic phase. However, the loss has been an unavoidable problem in practical applications and greatly limits the use of interferometers. Visibility is an important parameter to evaluate the sensing performance of interferometers. Here, we experimentally demonstrate the mitigating effect of the loss on visibility of the SALHI via asymmetric gain optimization, where the maximum threshold of loss to visibility close to $100\%$ is increased. Furthermore, we theoretically find that the optimal condition for the largest visibility is the same as that for the enhancement of signal-to-noise ratio (SNR) to the best value in the presence of losses using the intensity detection, indicating that the visibility can act as an experimental operational criterion for SNR improvement in practical applications. Improvement of the interference visibility means achievement of SNR enhancement. Our results provide a significant foundation for practical application of the SALHI in radar and ranging measurements.	翻訳日:2023-03-01 13:04:21 公開日:2022-01-11
# カルデイラ・レゲット形式論における未探究のデコヒーレンスについて:到着時間分布、同一粒子および時間内の回折 On some unexplored decoherence aspects in the Caldeira-Leggett formalism: arrival time distributions, identical particles and diffraction in time ( http://arxiv.org/abs/2201.03778v1 ) ライセンス: Link先を確認	S. V. Mousavi and S. Miret-Artes	(参考訳) カルデイラ・レゲット・マスター方程式における未検討のデコヒーレンスについて解析・考察した。デコヒーレンス過程は、緩和速度または摩擦と温度という2つの環境パラメータによって制御され、量子状態から古典状態へ徐々に遷移する。時間依存干渉パターンによるデコヒーレンス過程において, 時間分布, 非最小不確かさ生成物, 拡張ガウス波パケット, 同一粒子および回折は, 興味深い特徴を示す。定力場の存在がデコヒーレンスに影響を与えないこと, ストレッチパラメータの正値がデコヒーレンス率を減少させること, 同一粒子に対する波動関数の対称性がオープンダイナミクスを考慮した場合, 時間と空間の回折は, いわゆる量子シャッター問題におけるゼロ散逸限界における温度および/または緩和速度を増大させることによって徐々に洗い出されることを示す。 Some unexplored decoherence aspects within the Caldeira-Leggett master equation are analyzed and discussed. The decoherence process is controlled by the two environment parameters, the relaxation rate or friction and the temperature, leading to a gradual transition from the quantum to classical regime. Arrival time distributions, nonminimum-uncertainty-product or stretching Gaussian wave packets, identical particles and diffraction in time display interesting features during the decoherence process undergone by the time dependent interference patterns. We show that the presence of a constant force field does not affect the decoherence, {\it positive} values of the stretching parameter reduces the rate of decoherence, the symmetry of the wave function for identical particles plays no role when open dynamics are considered; and diffraction in time and space is gradually washed out by increasing the temperature and/or relaxation rate in the zero dissipation limit within the so-called quantum shutter problem.	翻訳日:2023-03-01 13:04:00 公開日:2022-01-11
# ナノファイバー空洞量子力学系における高次例外点 High-order exceptional point in a nanofiber cavity quantum electrodynamics system ( http://arxiv.org/abs/2201.03768v1 ) ライセンス: Link先を確認	Zigeng Li and Xiaomiao Li and Xiaolan Zhong	(参考訳) 本稿では,2レベルエミッタとナノファイバーキャビティからなる全繊維エミッタキャビティ量子電磁力学(QED)システムを提案する。本手法により,エミッタとナノファイバーキャビティの結合に基づく高次例外点の観測が可能となった。このキャビティの有効な利得は、2つの同一のレーザー場を介してナノファイバーキャビティに弱い駆動によって得られ、実験の実行においてコヒーレント完全吸収(CPA)を実現する。実験可能なパラメータの下では、このシステムのハミルトニアンは擬エルミティティ(英語版)の状態にあり、その固有値は1つの実と1つの複素共役からなるか、あるいは全て実となる。 2つのエミッタ-キャビティ結合強度の比とエミッタの崩壊率の比を制御的に調整することにより、エミッタ-キャビティ系におけるパリティ時間対称性のない3階例外点(EP3)と2階例外点(EP2)の両方を発見することができる。これらの結果は、全出力スペクトルと透過スペクトルによっても示される。また,ep3点において結合強度が臨界結合強度よりも大きい場合,対称モードが発生することがわかった。本稿では,高次例外点を実現する新しい手法を提案する。 We present an all-fiber emitter-cavity quantum electrodynamics (QED) system which consists of two two-level emitters and a nanofiber cavity. Our scheme makes it possible to observe the higher-order exceptional points based on the coupling between the emitters and the nanofiber cavity. The effective gain of this cavity can be obtained by weakly driven to the nanofiber cavity via two identical laser fields, which will realize coherent perfect absorption (CPA) in the implementation of the experiments. Under the experimental feasible parameters, the Hamiltonian of this system is in the condition of pseudo-Hermiticity, which means that its eigenvalues can be made of one real and a pair of complex conjugates, or be all real. By controllably tuned the ratio of the two emitter-cavity coupling strengths, and the ratio of the decay rates of the emitters, we can discover both the three-order exceptional point (EP3) and the second-order exceptional point (EP2) without parity-time symmetry in our emitter-cavity system. These results can also be demonstrated by the total output spectra and transmission spectra. We also find that the symmetric modes come into being when the coupling strength greater than the critical coupling strength at EP3 points. 
Our proposal will provide a new method to realize higher-order exceptional points.	翻訳日:2023-03-01 13:03:34 公開日:2022-01-11
# 密度汎関数理論を一電子還元密度行列関数理論に変換して静的相関を捉える Density Functional Theory Transformed into a One-electron Reduced Density Matrix Functional Theory for the Capture of Static Correlation ( http://arxiv.org/abs/2201.03736v1 ) ライセンス: Link先を確認	Daniel Gibney, Jan-Niklas Boyn and David A. Mazziotti	(参考訳) 現代の計算化学において最も広く採用されている密度汎関数論(DFT)は、強い相関系の電子構造を正確に記述することができない。ここでは、DFTを1電子還元密度行列(1-RDM)関数理論にフォーマルかつ実用的に変換できることを示す。運動エネルギー項における1-RDMのイデオロポシシ制限の緩和に加えて、DFTの密度に基づく交換相関関数に2次1-RDMに基づく項を追加する。我々のアプローチは、DFTの計算スケール$O(r^{3})$の2次半定値プログラミングによって実装され、シングルトビラディカルや結合解離のような化学構造やプロセスにおける静的相関の記述において、従来のDFTよりも大幅に改善されている。 Density functional theory (DFT), the most widely adopted method in modern computational chemistry, fails to describe accurately the electronic structure of strongly correlated systems. Here we show that DFT can be formally and practically transformed into a one-electron reduced-density-matrix (1-RDM) functional theory, which can address the limitations of DFT while retaining favorable computational scaling compared to wavefunction-based approaches. In addition to relaxing the idempotency restriction on the 1-RDM in the kinetic energy term, we add a quadratic 1-RDM-based term to DFT's density-based exchange-correlation functional. Our approach, which we implement by quadratic semidefinite programming at DFT's computational scaling of $O(r^{3})$, yields substantial improvements over traditional DFT in the description of static correlation in chemical structures and processes such as singlet biradicals and bond dissociations.	翻訳日:2023-03-01 13:02:57 公開日:2022-01-11
# モードパラメータ推定における量子限界 Quantum Limits on Mode Parameter Estimation ( http://arxiv.org/abs/2201.04050v1 ) ライセンス: Link先を確認	Manuel Gessner, Nicolas Treps, and Claude Fabre	(参考訳) パラメータ非依存の量子状態によって占有されるモードの時空間構造を変更する「モードパラメータ」の推定における究極の量子限界を決定する。純粋あるいは混合、ガウス的あるいは非ガウス的といった任意の多モード状態に対する量子 Cramér-Rao 限界の解析的表現は、非古典的状態によって達成される可能性のある量子精度向上の起源を明らかにする。推定誤差のスケーリングの改善は、特定のモードが占有され測定された場合にのみ可能である。その結果,広範囲の時空間モードパラメータと超解像イメージングに対する量子エンハンスド推定手法の同定が可能となった。 We determine the ultimate quantum limits on the estimation of a "mode parameter" that modifies the spatiotemporal structure of the modes occupied by a parameter-independent quantum state. Our analytical expression for the quantum Cramér-Rao bound for arbitrary multimode states, pure or mixed, Gaussian or non-Gaussian, reveals the origin of quantum precision enhancements that may be achieved with nonclassical states. Improved scaling of the estimation error is possible only if specific modes are populated and measured. Our results allow us to identify quantum-enhanced estimation strategies for a wide range of spatio-temporal mode parameters and in superresolution imaging.	翻訳日:2023-03-01 12:55:36 公開日:2022-01-11
# rydberg原子配列における量子スピン液体の動的合成 Dynamical preparation of quantum spin liquids in Rydberg atom arrays ( http://arxiv.org/abs/2201.04034v1 ) ライセンス: Link先を確認	Giuliano Giudici, Mikhail D. Lukin, Hannes Pichler	(参考訳) ライドバーグ原子配列に基づくプログラム可能な量子シミュレータを用いて,最近のスピン液体の開始を示す実験(semeghini et al., science 374, 1242 (2021))を理論的に解析した。実験では, 準断熱的状態準備プロトコルを用いて調製した平衡外状態において, トポロジカル秩序のロバストなシグネチャが出現する。状態準備プロトコルは、原子数と線形にスケールする時間において、位相位相の固定点(硬二量体の共鳴価結合(RVB)状態)を目標に最適化できることを理論的に示す。さらに, テンソルネットワーク(TN)状態の2パラメータ変動多様体について, 合成過程の多体ダイナミクスを正確に記述する。このアプローチを用いて,非平衡状態の性質を解析し,位相秩序の出現を明らかにした。 We theoretically analyze recent experiments [G. Semeghini et al., Science 374, 1242 (2021)] demonstrating the onset of a topological spin liquid using a programmable quantum simulator based on Rydberg atom arrays. In the experiment, robust signatures of topological order emerge in out-of-equilibrium states that are prepared using a quasi-adiabatic state preparation protocol. We show theoretically that the state preparation protocol can be optimized to target the fixed point of the topological phase -- the resonating valence bond (RVB) state of hard dimers -- in a time that scales linearly with the number of atoms. Moreover, we provide a two-parameter variational manifold of tensor network (TN) states that accurately describe the many-body dynamics of the preparation process. Using this approach we analyze the nature of the non-equilibrium state, establishing the emergence of topological order.	翻訳日:2023-03-01 12:55:23 公開日:2022-01-11
# 非エルミート準結晶の動的局在 Dynamical localization in non-Hermitian quasi-crystals ( http://arxiv.org/abs/2201.04028v1 ) ライセンス: Link先を確認	C. M. Dai, Yunbo Zhang, and Xuexi Yi	(参考訳) 片方向の2ステップ駆動は一様コヒーレントトンネルと非共分散オンサイトゲインと損失により構成される周期的に駆動される1次元非エルミタン格子の局所化遷移について検討した。複雑なポテンシャルの駆動周波数と位相シフトに応じて, システムは局所化, 非局在化, 混合相にすることができる。システムの2つの臨界駆動周波数を特定し、最初の1つは複素ポテンシャルの最大位相シフトに対応し、準エネルギースペクトルが依然として存在し、全ての状態が拡張され、もう1つは完全な実スペクトルの消失に対応し、非常に弱い複素ポテンシャルは、駆動周波数がこの臨界周波数より低い場合に局所状態の出現に繋がる。高周波の極限において、実スペクトルと複素スペクトルの2つの領域を分離する臨界位相シフトは、有効非エルミート・ハミルトニアンによって捉えることができる定数値に傾向する。 We study the localization transition in periodically driven one-dimensional non-Hermitian lattices where the piece-wise two-step drive is constituted by uniform coherent tunneling and incommensurate onsite gain and loss. We find that the system can be in localized, delocalized, or mixed-phase depending on the driving frequency and the phase shift of complex potential. Two critical driving frequencies of the system are identified, the first one corresponds to the largest phase shift of the complex potential so that the quasi-energy spectrum is still real and all the states are extended, the second one corresponds to the disappear of full real spectrum, and very weak complex potential leads to the emergence of localized states when the driving frequency is lower than this critical frequency. In the high frequency limit, we find the critical phase shift that separates the two regions with respectively real and complex spectrum tends to a constant value that can be captured by an effective non-Hermitian Hamiltonian.	翻訳日:2023-03-01 12:55:06 公開日:2022-01-11
# 定電界におけるガウス相関のシュウィンガー効果 Schwinger effect of Gaussian correlations in constant electric fields ( http://arxiv.org/abs/2201.04001v1 ) ライセンス: Link先を確認	Shu-Min Wu, Hao-Sheng Zeng	(参考訳) 我々はアリスとボブが共有した連続可変2モード圧縮状態のガウス相関(量子エンタングルメント、不協和および相互情報)のシュウィンガー効果について検討し、フェルミオン-フェルミオンモードとクビット-ボゾン場間の相関関係のシュウィンガー効果の差に特に注意を払う。また,シュウィンガー効果下での相関関係の再分配と保守性についても検討した。 We study the Schwinger effect of Gaussian correlations (quantum entanglement, discord and mutual information) of the continuous-variable two-mode squeezed states shared by Alice and Bob, paying special attention to the difference of the Schwinger effect of correlations between modes of fermion-fermion and qubit-bosonic fields studied previously. We also study the redistribution and conservativeness of the correlations under the Schwinger effect.	翻訳日:2023-03-01 12:54:49 公開日:2022-01-11
# 負性ハミルトニアン:混合状態絡みの作用素的特徴 The Negativity Hamiltonian: An operator characterization of mixed-state entanglement ( http://arxiv.org/abs/2201.03989v1 ) ライセンス: Link先を確認	Sara Murciano, Vittorio Vitale, Marcello Dalmonte, Pasquale Calabrese	(参考訳) 量子多体系の基底状態の文脈において、空間の連結領域間の絡み合いの局所性は対応する絡み合いハミルトニアンの局所性に直接結びついている。本研究では,多体系の部分転置の対数を記述する(非エルミート的)実効ハミルトニアン作用素として,負性ハミルトニアンを導入する。これにより、二部的な純粋システムのパラダイムを超えて、絡み合いと演算子の局所性の間の接続に対処できる。この方向への第一歩として、フェルミオン共形場の理論と自由フェルミオン鎖に対する負性ハミルトニアンの構造について研究する:どちらの場合も、負性ハミルトニアンが半局所函数形式を仮定し、単純な函数関係によって捉えることを示す。 In the context of ground states of quantum many-body systems, the locality of entanglement between connected regions of space is directly tied to the locality of the corresponding entanglement Hamiltonian: the latter is dominated by local, few-body terms. In this work, we introduce the negativity Hamiltonian as the (non hermitian) effective Hamiltonian operator describing the logarithm of the partial transpose of a many-body system. This allows us to address the connection between entanglement and operator locality beyond the paradigm of bipartite pure systems. As a first step in this direction, we study the structure of the negativity Hamiltonian for fermionic conformal field theories and a free fermion chain: in both cases, we show that the negativity Hamiltonian assumes a quasi-local functional form, that is captured by simple functional relations.	翻訳日:2023-03-01 12:54:38 公開日:2022-01-11
# 量子通信におけるPOVM測定の簡単な紹介 A Brief Introduction to POVM Measurement in Quantum Communications ( http://arxiv.org/abs/2201.07968v1 ) ライセンス: Link先を確認	Renzhi Yuan	(参考訳) 本稿では,量子通信における正作用素値測度(POVM)について概説する。まず射影値測度(PVM)を導入し、次にPOVMを導入する。 POVM と PVM の関係を論じ,実測における POVM の例を示す。本稿は,量子通信におけるPOVM測定についていくつかの知見を提供する。 This paper gives a brief introduction to the Positive-Operator Valued Measure (POVM) in quantum communications. The Projection-Valued Measure (PVM) is first introduced and then the POVM. The relation between POVM and PVM is discussed and an example of POVM in practical measurement is given. This paper provides some insight into POVM measurement for quantum communications.	翻訳日:2023-03-01 12:47:09 公開日:2022-01-11
# ワクチン規制の強化とワクチンに対する公衆の態度: Google検索活動から何が学べるか? Reinforcement of vaccine mandates and public attitudes towards vaccines: What can we learn from google search activity ? ( http://arxiv.org/abs/2201.06965v1 ) ライセンス: Link先を確認	Florian Cafiero (GEMASS), Jeremy Ward	(参考訳) 国際公衆衛生政策はますます強制免疫を優先している。ワクチン接種に対する短期的な影響が十分に文書化されている場合、ワクチンに対する公衆の態度に対する影響についてはほとんど考慮されていない。本稿では,過去10年で少なくとも1回のワクチン委任延長を経験した5カ国(オーストラリア,フランス,ドイツ,イタリア,セルビア)および2つの米国国家(カリフォルニア)のワクチンに関するGoogle検索について検討する。新たな委任統治の実施の効果は、それぞれの国や国家の状況に大きく依存していることが判明した。また,新規あるいは延長された委任状がワクチンに対する公衆の疑念を和らげる兆候はほとんどなかった。 International public health policies increasingly favor mandatory immunization. If its short-term effects on vaccine coverage are well documented, there has been little consideration to its effects on public attitudes towards vaccines. In this paper, we examine Google searches related to vaccines in five countries (Australia, France, Germany, Italy, Serbia) and two American states (California) which experienced at least one vaccine mandate extension in the past decade. We found that the effects of a new mandate implementation heavily depends on the context in each specific country or state. We also observed that there is little indication that the passing of new or extended mandates attenuated public doubt towards vaccines.	翻訳日:2023-03-01 12:46:28 公開日:2022-01-11
# データマーケットプレースとそのビジネスモデルに関する調査 A Survey of Data Marketplaces and Their Business Models ( http://arxiv.org/abs/2201.04561v1 ) ライセンス: Link先を確認	Santiago Andrés Azcoitia and Nikolaos Laoutaris	(参考訳) 「データ」は、土地、インフラ、労働、資本のように、必要不可欠な生産要素になりつつある。これの一環として、さまざまな分野の無数のアプリケーションが、生産チェーンやビジネスプロセスにおいて重要な役割を担うモデルやアルゴリズムを供給するために、膨大な量の情報を必要とします。特定の機能の自動化から、データ駆動型組織における意思決定の促進に至るまで、サードパーティからのデータインプットを取得することのメリットはますます増えています。この要求に応えて、データ要求を適切なプロバイダと一致させ、情報の交換を容易にすることを目的として、新しいエンティティと新しいビジネスモデルが登場した。本稿では,インターネット上でデータ取引を行う企業の現状に関する包括的調査の結果と結論と,研究コミュニティによる新たなデータマーケットプレースの設計について述べる。 "Data" is becoming an indispensable production factor, just like land, infrastructure, labor or capital. As part of this, a myriad of applications in different sectors require huge amounts of information to feed models and algorithms responsible for critical roles in production chains and business processes. Tasks ranging from automating certain functions to facilitating decision-making in data-driven organizations increasingly benefit from acquiring data inputs from third parties. Responding to this demand, new entities and novel business models have appeared with the aim of matching such data requirements with the right providers and facilitating the exchange of information. In this paper, we present the results and conclusions of a comprehensive survey on the state of the art of entities trading data on the internet, as well as novel data marketplace designs from the research community.	翻訳日:2023-03-01 12:46:15 公開日:2022-01-11
# トランスモン量子ビットの自発的放出率計算のための全波法 Full-Wave Methodology to Compute the Spontaneous Emission Rate of a Transmon Qubit ( http://arxiv.org/abs/2201.04244v1 ) ライセンス: Link先を確認	Thomas E. Roth and Weng C. Chew	(参考訳) 自発的放出速度(ser)は量子ビット(qubit)の制御と非一貫性において重要な役割を果たすため、量子ビット(qubit)にとって重要なメリットの指標である。その結果、実用機器のSERを正確に特徴付けることは、量子情報処理装置の設計における重要なステップである。ここでは、超伝導回路の量子ビットの一種であるトランスモン量子ビットの実験的に人気のあるプラットフォームに焦点を当てる。これらの量子ビットのSERを理解することの重要性にもかかわらず、近似回路モデルを用いてしばしば決定される。設計過程における予測の精度を向上させるためには,実用システムの記述において最小の近似をすることができるフルウェーブ数値手法を用いることが望ましい。本稿では,最近開発されたトランスモン量子ビットを電磁環境に結合したフィールドベースで記述することで,これを実現する方法を示す。実験でよく特性化された文献と類似したデバイスに対してサーを計算し,モデルを検証する。さらに,単純化した集積素子回路と伝送線路モデルとの比較を行い,検討を行った。 The spontaneous emission rate (SER) is an important figure of merit for any quantum bit (qubit), as it can play a significant role in the control and decoherence of the qubit. As a result, accurately characterizing the SER for practical devices is an important step in the design of quantum information processing devices. Here, we specifically focus on the experimentally popular platform of a transmon qubit, which is a kind of superconducting circuit qubit. Despite the importance of understanding the SER of these qubits, it is often determined using approximate circuit models or is inferred from measurements on a fabricated device. To improve the accuracy of predictions in the design process, it is better to use full-wave numerical methods that can make a minimal number of approximations in the description of practical systems. In this work, we show how this can be done with a recently developed field-based description of transmon qubits coupled to an electromagnetic environment. We validate our model by computing the SER for devices similar to those found in the literature that have been well-characterized experimentally. We further cross-validate our results by comparing them to simplified lumped element circuit and transmission line models as appropriate.	翻訳日:2023-03-01 12:46:05 公開日:2022-01-11
# すべての量子混合物は All quantum mixtures are proper ( http://arxiv.org/abs/2201.04143v1 ) ライセンス: Link先を確認	Leonardo Castellani	(参考訳) 固有かつ不適切な量子混合状態は観測可能な差を持たず、区別するべきではないと論じられている。これは量子力学に対する主観的なアプローチに影響を及ぼし、QMのリレーショナル解釈の主要な動機の1つを無効にする。 It is argued that proper and improper quantum mixed states have no observable differences, and hence should not be distinguished. This has implications for subjective approaches to quantum mechanics, and invalidates one of the main motivations for relational interpretations of QM.	翻訳日:2023-03-01 12:44:39 公開日:2022-01-11
# ガウス変調CV-QKDの性能に及ぼすサブシステム非理想性の影響 Influence of sub-system non-idealities on the performance of Gaussian modulated CV-QKD ( http://arxiv.org/abs/2202.01311v1 ) ライセンス: Link先を確認	R Muralekrishnan, Lakshmi Narayanan Venkatasubramani, Sameer Ahmad Mir and Deepa Venkitesh	(参考訳) 本稿では,ガウス変調CV-QKDシステムにおける数値モデリングとサブシステムの評価について,非理想的操作を取り入れた詳細な解析と関連する結果について述べる。 We present a detailed analysis of the numerical modelling and evaluation of sub-systems in a Gaussian modulated CV-QKD system, incorporating non-ideal operations, along with associated results.	翻訳日:2023-03-01 12:35:41 公開日:2022-01-11
# ランダムRNNとCNN:RGB-Dオブジェクトのマルチレベル解析とシーン認識を目指して When CNNs Meet Random RNNs: Towards Multi-Level Analysis for RGB-D Object and Scene Recognition ( http://arxiv.org/abs/2004.12349v2 ) ライセンス: Link先を確認	Ali Caglayan and Nevrez Imamoglu and Ahmet Burak Can and Ryosuke Nakamura	(参考訳) オブジェクトとシーンを認識することは、イメージ理解において困難な2つの重要なタスクである。特に、これらのタスクの処理におけるrgb-dセンサーの使用は、視覚理解を改善するための重要な焦点となっている。一方、ニューラルネットワーク、特に畳み込みニューラルネットワーク(cnns)は広く普及し、手作りの機能を効果的なディープ機能に置き換えることで、多くの視覚タスクに応用されている。しかし、多層CNNモデルの深い特徴を効果的に活用する方法は、オープンな問題である。本稿では,オブジェクトおよびシーン認識タスクのための多モードRGB-D画像から識別的特徴表現を抽出する新しい2段階フレームワークを提案する。第1段階では、事前訓練されたcnnモデルがバックボーンとして採用され、複数のレベルで視覚的な特徴を抽出する。第2段階は、これらの特徴を再帰的ニューラルネットワーク(rnn)の完全ランダム構造を持つ高レベル表現にマップする。 CNNアクティベーションの高次元性に対応するため、RNNにおけるランダム性の概念を拡張したランダム重み付けプール方式が提案されている。マルチモーダル融合は、RGBと深度ストリームの個人認識信頼度(すなわちSVMスコア)に基づいて重みを計算し、ソフト投票方式によって実現されている。これにより、最終RGB-D分類性能において一貫したクラスラベル推定が得られる。広範囲な実験により、rnnステージの完全ランダム化構造がcnnの活性化を識別的固体機能にうまくエンコードしていることが確かめられた。人気の高いWashington RGB-D ObjectとSUN RGB-D Sceneデータセットの比較実験結果から,提案手法はオブジェクト認識タスクとシーン認識タスクの両方における最先端の手法と比較して,優れた性能,即時性能を実現していることが示された。コードはhttps://github.com/acaglayan/cnn_randrnnで入手できる。 Recognizing objects and scenes are two challenging but essential tasks in image understanding. In particular, the use of RGB-D sensors in handling these tasks has emerged as an important area of focus for better visual understanding. Meanwhile, deep neural networks, specifically convolutional neural networks (CNNs), have become widespread and have been applied to many visual tasks by replacing hand-crafted features with effective deep features. However, it is an open problem how to exploit deep features from a multi-layer CNN model effectively. In this paper, we propose a novel two-stage framework that extracts discriminative feature representations from multi-modal RGB-D images for object and scene recognition tasks. In the first stage, a pretrained CNN model has been employed as a backbone to extract visual features at multiple levels. 
The second stage efficiently maps these features into high-level representations using a fully randomized structure of recursive neural networks (RNNs). To cope with the high dimensionality of CNN activations, a random weighted pooling scheme is proposed that extends the idea of randomness in RNNs. Multi-modal fusion is performed through a soft voting approach, with weights computed from the individual recognition confidences (i.e., SVM scores) of the RGB and depth streams separately. This produces consistent class label estimates in the final RGB-D classification. Extensive experiments verify that the fully randomized structure in the RNN stage successfully encodes CNN activations into discriminative features. Comparative experimental results on the popular Washington RGB-D Object and SUN RGB-D Scene datasets show that the proposed approach achieves superior or on-par performance compared to state-of-the-art methods in both object and scene recognition tasks. Code is available at https://github.com/acaglayan/CNN_randRNN.	翻訳日:2022-12-09 13:35:55 公開日:2022-01-11
# MCMC出力の最適薄膜化 Optimal Thinning of MCMC Output ( http://arxiv.org/abs/2005.03952v5 ) ライセンス: Link先を確認	Marina Riabiz, Wilson Chen, Jon Cockayne, Pawel Swietach, Steven A. Niederer, Lester Mackey, Chris. J. Oates	(参考訳) マルコフ連鎖モンテカルロの出力の収束と圧縮を評価するためのヒューリスティックスの使用は、生成される経験的近似の観点からは、準最適である。典型的には、初期状態のいくつかは「燃え尽きる」とされ除去されるが、圧縮も必要であれば残りの鎖は「薄められる」。本稿では,実験分布から得られる近似が最適に近いようなサンプルパスから,固定濃度を持つ状態の部分集合を遡及的に選択する問題を考察する。重圧縮を必要とする問題に適合するカーネルSteinの差分最小化に基づく新しい手法を提案する。一般微分方程式に対するパラメータ推論の難解な文脈において, この手法の有効性を理論的に保証する。ソフトウェアは、Python、R、MATLABのStein Thinningパッケージで利用可能である。 The use of heuristics to assess the convergence and compress the output of Markov chain Monte Carlo can be sub-optimal in terms of the empirical approximations that are produced. Typically a number of the initial states are attributed to "burn in" and removed, whilst the remainder of the chain is "thinned" if compression is also required. In this paper we consider the problem of retrospectively selecting a subset of states, of fixed cardinality, from the sample path such that the approximation provided by their empirical distribution is close to optimal. A novel method is proposed, based on greedy minimisation of a kernel Stein discrepancy, that is suitable for problems where heavy compression is required. Theoretical results guarantee consistency of the method and its effectiveness is demonstrated in the challenging context of parameter inference for ordinary differential equations. Software is available in the Stein Thinning package in Python, R and MATLAB.	翻訳日:2022-12-05 13:09:47 公開日:2022-01-11
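The greedy selection described in the abstract above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Stein Thinning package itself: the target is taken to be a one-dimensional standard normal (score $-x$) and the base kernel a Gaussian, both chosen here only so that the Stein kernel has a simple closed form. The names `stein_kernel` and `greedy_thin` are hypothetical.

```python
import math

def stein_kernel(x, y):
    """Stein kernel for an N(0, 1) target with base kernel exp(-(x - y)^2 / 2).

    Assumption for illustration: the target score is s(x) = -x.
    Expanding dxdy k + s(x) dy k + s(y) dx k + s(x) s(y) k gives
    k(x, y) * (1 - 2 (x - y)^2 + x y).
    """
    d = x - y
    return math.exp(-0.5 * d * d) * (1.0 - 2.0 * d * d + x * y)

def greedy_thin(samples, m):
    """Return m indices into samples (repeats allowed), chosen greedily."""
    # running[i] accumulates the Stein kernel between sample i and every
    # point selected so far.
    running = [0.0] * len(samples)
    chosen = []
    for _ in range(m):
        scores = [0.5 * stein_kernel(x, x) + running[i]
                  for i, x in enumerate(samples)]
        best = min(range(len(samples)), key=scores.__getitem__)
        chosen.append(best)
        for i, x in enumerate(samples):
            running[i] += stein_kernel(samples[best], x)
    return chosen

chain = [2.0, -0.1, 1.0, -1.5, 0.3]   # toy "MCMC output"
picked = greedy_thin(chain, 3)
print(picked)
```

With an empty selection the criterion reduces to `0.5 * (1 + x**2)`, so the first point picked is the sample closest to the mode of the assumed target. Later picks trade closeness to the mode against redundancy with points already selected.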
# 高齢者の身体活動エネルギー消費のモニタリングに関するRNN RNNs on Monitoring Physical Activity Energy Expenditure in Older People ( http://arxiv.org/abs/2006.01169v2 ) ライセンス: Link先を確認	Stylianos Paraschiakos, Cl\'audio Rebelo de S\'a, Jeremiah Okai, Eline P. Slagboom, Marian Beekman, Arno Knobbe	(参考訳) 身体活動エネルギー支出(PAEE)の定量化を通じて、医療モニタリングは、活力と健康な高齢化を刺激し、高齢者の行動変化を誘発し、これらを個人の健康増進と結びつける可能性がある。監視環境においてPAEEを測定するために,若年層を対象としたウェアラブル加速度計の手法が開発されている。高齢者は,エネルギー要求や身体活動の幅が異なるため,高齢者のPAEE推定には適さない可能性がある。過去の活動が現在のPAEEに影響を与えるため、逐次データモデリング能力で知られるモデリング手法であるRecurrent Neural Network (RNN)を提案する。高齢者のためのrnnのトレーニングには,60歳以上(平均65歳)の健常者34名を対象に,gotovデータセットを用いて16種類の活動を行った。我々は,手首と足首に設置した加速度計を用いて,間接熱量測定によるエネルギー数の測定を行った。最適化後、3つのGRU層を持つRNNと、加速度計と参加者レベルのデータを組み合わせたフィードフォワードネットワークからなるアーキテクチャを提案する。本稿では,grgベースのrnnの標準設備を越え,最先端技術を上回る精度を達成するための取り組みについて述べる。これらの取り組みには、アグリゲーション関数を平均から分散尺度(SD, IQR, ...)に切り替えること、時間的データと静的データ(年齢、体重、BMIなど個人固有の詳細)を組み合わせて、以前に訓練されたMLモデルによって予測されたシンボル的活動データを追加することが含まれる。得られたアーキテクチャは、トレーニング入力を10倍減らしながら、ほぼ10%の性能向上を実現している。したがって、PAEEと代謝と認知の健康と精神の健康に関連する活力パラメータの関連を調査するために使用できる。 Through the quantification of physical activity energy expenditure (PAEE), health care monitoring has the potential to stimulate vital and healthy ageing, inducing behavioural changes in older people and linking these to personal health gains. To be able to measure PAEE in a monitoring environment, methods from wearable accelerometers have been developed, however, mainly targeted towards younger people. Since elderly subjects differ in energy requirements and range of physical activities, the current models may not be suitable for estimating PAEE among the elderly. Because past activities influence present PAEE, we propose a modeling approach known for its ability to model sequential data, the Recurrent Neural Network (RNN). To train the RNN for an elderly population, we used the GOTOV dataset with 34 healthy participants of 60 years and older (mean 65 years old), performing 16 different activities. 
We used accelerometers placed on the wrist and ankle, and measurements of energy counts by means of indirect calorimetry. After optimization, we propose an architecture consisting of an RNN with 3 GRU layers and a feedforward network combining both accelerometer and participant-level data. In this paper, we describe our efforts to go beyond the standard facilities of a GRU-based RNN, with the aim of achieving accuracy surpassing the state of the art. These efforts include switching the aggregation function from mean to dispersion measures (SD, IQR, ...), combining temporal and static data (person-specific details such as age, weight, BMI) and adding symbolic activity data as predicted by a previously trained ML model. The resulting architecture manages to increase its performance by approximately 10% while decreasing training input by a factor of 10. It can thus be employed to investigate associations of PAEE with vitality parameters related to metabolic and cognitive health and mental well-being.	翻訳日:2022-11-26 07:57:29 公開日:2022-01-11
# DensE: アダプティブセマンティック階層を組み込んだ知識グラフのための非可換表現の強化 DensE: An Enhanced Non-commutative Representation for Knowledge Graph Embedding with Adaptive Semantic Hierarchy ( http://arxiv.org/abs/2008.04548v2 ) ライセンス: Link先を確認	Haonan Lu, Hailin Hu, Xiaodong Lin	(参考訳) 関係の合成パターンのキャプチャは、知識グラフの完成において重要なタスクである。学習知識に対するマルチホップ推論の基本的なステップとしても機能する。これまで、数種類の複素値対角行列の積を用いて複合関係をモデル化するための回転に基づく翻訳法が開発されてきた。しかし、これらの手法は複合関係を単純化しすぎる傾向があり、例えば、それらは可換であり、実体とは独立であり、意味的階層を欠いている。そこで我々は,これらの問題を体系的に解決するために,複雑な構成パターンをモデル化するための新しい知識グラフ埋め込み法DensEを開発した。特に、3次元 (3-d) ユークリッド空間において、各関係をso(3)群に基づく回転作用素とスケーリング作用素に分解する。 This design principle leads to several advantages of our method: (1) For composite relations, the corresponding diagonal relation matrices can be non-commutative, reflecting a predominant scenario in real world applications; (2) Our model preserves the natural interaction between relational operations and entity embeddings; (3) The scaling operation provides the modeling power for the intrinsic semantic hierarchical structure of entities; (4) The enhanced expressiveness of DensE is achieved with high computational efficiency in terms of both parameter size and training time; and (5) Modeling entities in Euclidean space instead of quaternion space keeps the direct geometrical interpretations of relational patterns. 複数のベンチマークナレッジグラフの実験的結果は、特に複合関係において、リンク予測が欠如している現在の最先端モデルよりも密度が高いことを示している。 Capturing the composition patterns of relations is a vital task in knowledge graph completion. It also serves as a fundamental step towards multi-hop reasoning over learned knowledge. Previously, several rotation-based translational methods have been developed to model composite relations using the product of a series of complex-valued diagonal matrices. However, these methods tend to make several oversimplified assumptions on the composite relations, e.g., forcing them to be commutative, independent from entities and lacking semantic hierarchy. 
To systematically tackle these problems, we have developed a novel knowledge graph embedding method, named DensE, to provide an improved modeling scheme for the complex composition patterns of relations. In particular, our method decomposes each relation into an SO(3) group-based rotation operator and a scaling operator in the three dimensional (3-D) Euclidean space. This design principle leads to several advantages of our method: (1) For composite relations, the corresponding diagonal relation matrices can be non-commutative, reflecting a predominant scenario in real world applications; (2) Our model preserves the natural interaction between relational operations and entity embeddings; (3) The scaling operation provides the modeling power for the intrinsic semantic hierarchical structure of entities; (4) The enhanced expressiveness of DensE is achieved with high computational efficiency in terms of both parameter size and training time; and (5) Modeling entities in Euclidean space instead of quaternion space keeps the direct geometrical interpretations of relational patterns. Experimental results on multiple benchmark knowledge graphs show that DensE outperforms the current state-of-the-art models for missing link prediction, especially on composite relations.	翻訳日:2022-10-31 10:37:50 公開日:2022-01-11
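The non-commutativity that the abstract above attributes to composing rotations in 3-D can be checked directly. The sketch below is only an illustration of that principle, not the DensE parameterisation: the relations `r1` and `r2`, their angles and their scale factors are hypothetical, and plain rotation matrices are used purely for readability.

```python
import math

def rot_x(theta):
    """3x3 rotation matrix about the x axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_z(theta):
    """3x3 rotation matrix about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def apply(rotation, scale, v):
    """Rotate the 3-vector v, then scale it."""
    return [scale * sum(rotation[i][j] * v[j] for j in range(3))
            for i in range(3)]

head = [1.0, 0.0, 0.0]             # toy entity embedding
quarter = math.pi / 2
r1 = (rot_z(quarter), 2.0)         # hypothetical relation 1: rotate about z, scale by 2
r2 = (rot_x(quarter), 0.5)         # hypothetical relation 2: rotate about x, scale by 0.5

r1_then_r2 = apply(*r2, apply(*r1, head))
r2_then_r1 = apply(*r1, apply(*r2, head))
print(r1_then_r2, r2_then_r1)
```

Up to floating-point error, applying `r1` then `r2` sends the head to the z axis, while the reverse order sends it to the y axis. The two scale factors cancel in both orders, so the difference comes from the rotations alone.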
# 制約付き最適化の近距離法の拡張 Extensions to the Proximal Distance Method of Constrained Optimization ( http://arxiv.org/abs/2009.00801v2 ) ライセンス: Link先を確認	Alfonso Landeros, Oscar Hernan Madrid Padilla, Hua Zhou, Kenneth Lange	(参考訳) 現在の論文では、損失$f(\boldsymbol{x})$を、パラメータを融合する行列である$\boldsymbol{D}\boldsymbol{x} \in S$という形式の制約を最小化する問題を研究している。融合制約は、滑らかさ、疎さ、あるいはより一般的な制約パターンをキャプチャすることができる。このような一般的な問題に対処するために、ベルトラミ・コースト法と近距離原理を組み合わせる。後者はペナル化対象の最小化によって駆動される: $f(\boldsymbol{x})+\frac{\rho}{2}\text{dist}(\boldsymbol{D}\boldsymbol{x},S)^2$ で、大きなチューニング定数が $\rho$ で、平方ユークリッド距離が $\boldsymbol{D}\boldsymbol{x}$ である。対応する近距離アルゴリズムの次のイテレート$\boldsymbol{x}_{n+1}$は、主要なサロゲート関数$f(\boldsymbol{x})+\frac{\rho}{2}\|\boldsymbol{D}\boldsymbol{x}-\mathcal{P}_{S}(\boldsymbol{D}\boldsymbol{x}_n)\|^2$を最小化することにより、現在のイテレート$\boldsymbol{x}_n$から構成される。固定 $\rho$ と部分解析損失 $f(\boldsymbol{x})$ と部分解析制約セット $S$ に対して、我々は定常点への収束を証明する。強い仮定の下では、収束率を提供し、線形局所収束を示す。また, コストのかかる線形システム問題を回避するために, 最急降下型 (SD) も構築した。アルゴリズムをベンチマークするために、乗算器の交互方向法(ADMM)と比較する。大規模な数値実験では, 距離予測, 凸回帰, 凸クラスタリング, 総変動像のデノイング, 行列の良好な条件数への投影に関する問題を含む。これらの実験は,高次元問題に対する最も急な変形の速度と許容可能な精度を示す。 The current paper studies the problem of minimizing a loss $f(\boldsymbol{x})$ subject to constraints of the form $\boldsymbol{D}\boldsymbol{x} \in S$, where $S$ is a closed set, convex or not, and $\boldsymbol{D}$ is a matrix that fuses parameters. Fusion constraints can capture smoothness, sparsity, or more general constraint patterns. To tackle this generic class of problems, we combine the Beltrami-Courant penalty method with the proximal distance principle. The latter is driven by minimization of penalized objectives $f(\boldsymbol{x})+\frac{\rho}{2}\text{dist}(\boldsymbol{D}\boldsymbol{x},S)^2$ involving large tuning constants $\rho$ and the squared Euclidean distance of $\boldsymbol{D}\boldsymbol{x}$ from $S$. 
The next iterate $\boldsymbol{x}_{n+1}$ of the corresponding proximal distance algorithm is constructed from the current iterate $\boldsymbol{x}_n$ by minimizing the majorizing surrogate function $f(\boldsymbol{x})+\frac{\rho}{2}\|\boldsymbol{D}\boldsymbol{x}-\mathcal{P}_{S}(\boldsymbol{D}\boldsymbol{x}_n)\|^2$. For fixed $\rho$ and a subanalytic loss $f(\boldsymbol{x})$ and a subanalytic constraint set $S$, we prove convergence to a stationary point. Under stronger assumptions, we provide convergence rates and demonstrate linear local convergence. We also construct a steepest descent (SD) variant to avoid costly linear system solves. To benchmark our algorithms, we compare against the alternating direction method of multipliers (ADMM). Our extensive numerical tests include problems on metric projection, convex regression, convex clustering, total variation image denoising, and projection of a matrix to a good condition number. These experiments demonstrate the superior speed and acceptable accuracy of our steepest variant on high-dimensional problems.	翻訳日:2022-10-22 20:06:09 publicized:2022-01-11
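The surrogate update in the abstract above has a closed form in the simplest setting, which makes the iteration easy to see. The sketch below is a toy instance under assumptions that are not taken from the paper: $f(\boldsymbol{x}) = \frac{1}{2}\|\boldsymbol{x}-\boldsymbol{y}\|^2$, $\boldsymbol{D}$ is the identity, and $S$ is the set of vectors with at most $k$ nonzero entries. Minimizing the surrogate then gives $\boldsymbol{x}_{n+1} = (\boldsymbol{y} + \rho\,\mathcal{P}_S(\boldsymbol{x}_n))/(1+\rho)$. The function names and the values of `rho` and `iters` are hypothetical.

```python
def project_sparse(x, k):
    """Euclidean projection onto {x : at most k nonzero entries}."""
    keep = set(sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:k])
    return [x[i] if i in keep else 0.0 for i in range(len(x))]

def proximal_distance(y, k, rho=1000.0, iters=50):
    """Iterate the surrogate minimizer for f(x) = 0.5 * ||x - y||^2, D = I."""
    x = list(y)
    for _ in range(iters):
        p = project_sparse(x, k)
        # Closed-form minimizer of 0.5*||x - y||^2 + (rho/2)*||x - p||^2.
        x = [(yi + rho * pi) / (1.0 + rho) for yi, pi in zip(y, p)]
    return x

y = [3.0, -1.0, 0.5, 2.0]
x_hat = proximal_distance(y, k=2)
print(x_hat)
```

For a fixed `rho` the iterate only approximately satisfies the constraint: the entries outside the support shrink to `y_i / (1 + rho)` rather than to exactly zero, which is why the method relies on large tuning constants.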
# リッジレット事前:ベイズニューラルネットワークの事前仕様に対する共分散関数アプローチ The Ridgelet Prior: A Covariance Function Approach to Prior Specification for Bayesian Neural Networks ( http://arxiv.org/abs/2010.08488v4 ) ライセンス: Link先を確認	Takuo Matsubara and Chris J. Oates and Fran\c{c}ois-Xavier Briol	(参考訳) ベイジアンニューラルネットワークは、ニューラルネットワークの強い予測性能と、ベイジアンフレームワークの予測出力に関連する不確実性の形式的定量化を組み合わせようとする。しかし、ネットワークの出力空間に持ち上げられたときに意味のある事前分布をネットワークのパラメータに与える方法はまだ不明である。タスクに対して適切なガウスプロセス共分散関数を提案できる可能性のあるソリューションが提案されている。提案手法は、ネットワークの出力空間における擬似ガウス過程を近似した、リッジレット事前と呼ばれる、ネットワークのパラメータの事前分布を構築する。ニューラルネットワークとガウス過程の接続に関する既存の研究とは対照的に,本解析は非漸近的であり,有限サンプルサイズ誤差境界が提供されている。これは、ベイズニューラルネットワークが共分散関数が十分正則である任意のガウス過程を近似できる普遍性を確立する。実験評価は概念実証に限定し,適切なガウス過程が提供できる回帰問題に対して,リッジレットプリアーが非構造化プリアーよりも優れることを示す。 Bayesian neural networks attempt to combine the strong predictive performance of neural networks with formal quantification of uncertainty associated with the predictive output in the Bayesian framework. However, it remains unclear how to endow the parameters of the network with a prior distribution that is meaningful when lifted into the output space of the network. A possible solution is proposed that enables the user to posit an appropriate Gaussian process covariance function for the task at hand. Our approach constructs a prior distribution for the parameters of the network, called a ridgelet prior, that approximates the posited Gaussian process in the output space of the network. In contrast to existing work on the connection between neural networks and Gaussian processes, our analysis is non-asymptotic, with finite sample-size error bounds provided. This establishes the universality property that a Bayesian neural network can approximate any Gaussian process whose covariance function is sufficiently regular. Our experimental assessment is limited to a proof-of-concept, where we demonstrate that the ridgelet prior can out-perform an unstructured prior on regression problems for which a suitable Gaussian process prior can be provided.	
翻訳日:2022-10-06 20:20:31 公開日:2022-01-11
# 人間による位置推定のためのロボットナビゲーショングラフの最小化 Minimizing Robot Navigation-Graph For Position-Based Predictability By Humans ( http://arxiv.org/abs/2010.15255v2 ) ライセンス: Link先を確認	Sriram Gopalakrishnan, Subbarao Kambhampati	(参考訳) 人間とロボットが同じ空間で作業しながら移動している状況では、移動ロボットが捉えた予測可能な経路は環境をより安全に感じさせるだけでなく、人間が経路の衝突を避けたり、道を塞いだりすることで空間内のナビゲーションを助けることができる。予測可能な経路が不可欠になりますロボットの数が増えるにつれて、人間がロボットの経路を予測するための認知的努力は維持できなくなる。人間の数が増えるにつれて、複数の人間の動きを考慮しながらロボットが動くのも難しくなる。さらに、レストランや銀行、病院など、新しい人がこの分野に足を踏み入れると、ロボットが一般的に行う軌道への親密性が低下し、経路に沿って予測可能なロボットの動きの必要性がさらに高まる。そこで本研究では,ロボットの現在位置からの予測可能性である位置に基づく予測可能性について,ロボットのナビゲーショングラフを最小化することを提案する。これは、人間が自身の作業に加えて、ロボットの目標や以前の行動を追跡することは期待できないため、重要である。本稿では,位置に基づく予測可能性の尺度を定義し,ロボットの動きのナビゲーショングラフ(方向グラフ)を最小化するためのヒルクライミングアルゴリズムの提案と評価を行う。これに続いて,提案手法をサポートする人間-対象実験の結果が得られた。 In situations where humans and robots are moving in the same space whilst performing their own tasks, predictable paths taken by mobile robots can not only make the environment feel safer, but humans can also help with the navigation in the space by avoiding path conflicts or not blocking the way. So predictable paths become vital. The cognitive effort for the human to predict the robot's path becomes untenable as the number of robots increases. As the number of humans increase, it also makes it harder for the robots to move while considering the motion of multiple humans. Additionally, if new people are entering the space -- like in restaurants, banks, and hospitals -- they would have less familiarity with the trajectories typically taken by the robots; this further increases the needs for predictable robot motion along paths. With this in mind, we propose to minimize the navigation-graph of the robot for position-based predictability, which is predictability from just the current position of the robot. This is important since the human cannot be expected to keep track of the goals and prior actions of the robot in addition to doing their own tasks. 
In this paper, we define measures for position-based predictability, then present and evaluate a hill-climbing algorithm to minimize the navigation-graph (directed graph) of robot motion. This is followed by the results of our human-subject experiments which support our proposed methodology.	翻訳日:2022-10-02 05:31:45 公開日:2022-01-11
# 複数の二次変数を用いた非定常ランダム関数の組込みモデル推定器 An Embedded Model Estimator for Non-Stationary Random Functions using Multiple Secondary Variables ( http://arxiv.org/abs/2011.04116v4 ) ライセンス: Link先を確認	Colin Daly	(参考訳) 複数の二次変数を用いた非定常空間モデリングアルゴリズムを開発した。ジオ統計学と量子ランダムフォレストを組み合わせて、新しい補間と確率シミュレーションアルゴリズムを提供する。本稿では,本手法を導入し,地理的モデリングや量子ランダムフォレストに適用した結果と自然に類似した一貫性を有することを示す。この方法では、モデルをさらに条件づけるために、krigingのような単純な補間技法を組み込むことができる。このアルゴリズムは、各ターゲット位置における目標変数の条件分布を推定することで動作する。このような分布の族は対象変数のエンベロープと呼ばれる。このことから、空間推定、量子化、不確実性を得ることができる。エンベロープから条件付きシミュレーションを生成するアルゴリズムも開発されている。封筒からサンプルを採取すると、二次変数の重要性、傾向、変数の相対的な変化に局所的に影響される。 An algorithm for non-stationary spatial modelling using multiple secondary variables is developed. It combines Geostatistics with Quantile Random Forests to give a new interpolation and stochastic simulation algorithm. This paper introduces the method and shows that it has consistency results that are similar in nature to those applying to geostatistical modelling and to Quantile Random Forests. The method allows for embedding of simpler interpolation techniques, such as Kriging, to further condition the model. The algorithm works by estimating a conditional distribution for the target variable at each target location. The family of such distributions is called the envelope of the target variable. From this, it is possible to obtain spatial estimates, quantiles and uncertainty. An algorithm to produce conditional simulations from the envelope is also developed. As they sample from the envelope, realizations are therefore locally influenced by relative changes of importance of secondary variables, trends and variability.	翻訳日:2022-09-28 02:04:42 公開日:2022-01-11
# (参考訳) 音声コマンド認識のためのテンソルトレインネットワークのハイブリッドモデルの構築 Exploiting Hybrid Models of Tensor-Train Networks for Spoken Command Recognition ( http://arxiv.org/abs/2201.10609v1 ) ライセンス: CC BY 4.0	Jun Qi, Javier Tejedor	(参考訳) 本研究の目的は,モデルパラメータ数と分類精度のトレードオフを考慮し,低複雑性音声コマンド認識(SCR)システムを設計することである。具体的には、テンソルトレイン(TT)ネットワークの深いハイブリッドアーキテクチャを利用して、エンドツーエンドのSCRパイプラインを構築します。我々のコマンド認識システムであるCNN+(TT-DNN)は、スペクトル特徴抽出のための下部の畳み込み層と、コマンド分類のための上部のTT層で構成されている。提案するCNN+(TT-DNN)モデルでは,従来のCNNベースラインと比較して,完全連結(FC)層をTTモデルに置き換えることができ,CNNモデルのベースライン性能を維持しながら,モデルパラメータの大幅な削減が可能である。我々は、CNN+(TT-DNN)モデルをランダムに初期化し、あるいはよく訓練されたCNN+DNNに基づいて、Google Speech Command Dataset上でCNN+(TT-DNN)モデルを評価する。実験の結果,提案したCNN+(TT-DNN)モデルでは,CNNモデルよりも4倍少ないモデルパラメータで96.31%の競争精度が得られた。さらに、パラメータ数が増加するとCNN+(TT-DNN)モデルは97.2%の精度が得られる。 This work aims to design a low complexity spoken command recognition (SCR) system by considering different trade-offs between the number of model parameters and classification accuracy. More specifically, we exploit a deep hybrid architecture of a tensor-train (TT) network to build an end-to-end SCR pipeline. Our command recognition system, namely CNN+(TT-DNN), is composed of convolutional layers at the bottom for spectral feature extraction and TT layers at the top for command classification. Compared with a traditional end-to-end CNN baseline for SCR, our proposed CNN+(TT-DNN) model replaces fully connected (FC) layers with TT ones and it can substantially reduce the number of model parameters while maintaining the baseline performance of the CNN model. We initialize the CNN+(TT-DNN) model in a randomized manner or based on a well-trained CNN+DNN, and assess the CNN+(TT-DNN) models on the Google Speech Command Dataset. Our experimental results show that the proposed CNN+(TT-DNN) model attains a competitive accuracy of 96.31% with 4 times fewer model parameters than the CNN model. Furthermore, the CNN+(TT-DNN) model can obtain a 97.2% accuracy when the number of parameters is increased.	
翻訳日:2022-01-30 13:51:16 公開日:2022-01-11
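To see why replacing a fully connected layer with a tensor-train layer reduces parameters, it helps to count them. The sketch below uses the standard TT-matrix format, in which a weight matrix of size $\prod_k m_k \times \prod_k n_k$ is stored as cores of shape $r_{k-1} \times m_k \times n_k \times r_k$. The mode sizes and ranks are hypothetical and are not the configuration used in the paper; biases are ignored.

```python
from math import prod

def fc_params(in_modes, out_modes):
    """Weights in a dense layer of shape prod(in_modes) x prod(out_modes)."""
    return prod(in_modes) * prod(out_modes)

def tt_params(in_modes, out_modes, ranks):
    """Weights in the TT cores; ranks has len(in_modes) + 1 entries,
    with ranks[0] == ranks[-1] == 1."""
    return sum(ranks[k] * in_modes[k] * out_modes[k] * ranks[k + 1]
               for k in range(len(in_modes)))

in_modes = [4, 8, 8, 4]     # hypothetical factorisation of a 1024-dim input
out_modes = [4, 4, 4, 4]    # hypothetical factorisation of a 256-dim output
ranks = [1, 4, 4, 4, 1]     # hypothetical TT ranks

dense = fc_params(in_modes, out_modes)
tt = tt_params(in_modes, out_modes, ranks)
print(dense, tt)
```

The per-layer saving in this toy setting is far larger than the overall 4x reduction quoted in the abstract, since that figure covers the whole model including the convolutional layers. Raising the ranks increases both the parameter count and the expressiveness of the layer.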
# (参考訳) 学習者のコース選択を支援するオープンMOOCレビューの大規模分析 Large Scale Analysis of Open MOOC Reviews to Support Learners' Course Selection ( http://arxiv.org/abs/2201.06967v1 ) ライセンス: CC BY-SA 4.0	Manuel J. Gomez, Mario Calder\'on, Victor S\'anchez, F\'elix J. Garc\'ia Clemente, Jos\'e A. Ruip\'erez-Valiente	(参考訳) 最近のパンデミックは教育の見方を変えました。子供や大学生だけがオンライン教育を利用しているわけではないことは驚きではない。過去数年間、何百万人もの大人がオンラインの授業やコースにサインアップし、courseraやedxなどのmoocプロバイダが、彼らのプラットフォームに登録した新規ユーザーを報告している。しかし、学生はコースを選択する際にいくつかの課題に直面している。オンラインレビューシステムは、多くの分野において標準的なものであるが、moocエコシステムには標準化された、あるいは完全に分散されたレビューシステムは存在しない。この分野では、よりシンプルで透明性の高いレビューシステムを構築するために、利用可能なオープンMOOCレビューを活用する機会があると考えています。 Specifically, in our research we analyze 2.4 million reviews (which is the largest MOOC reviews dataset used until now) from five different platforms in order to determine the following: (1) if the numeric ratings provide discriminant information to learners, (2) if NLP-driven sentiment analysis on textual reviews could provide valuable information to learners, (3) if we can leverage NLP-driven topic finding techniques to infer themes that could be important for learners, and (4) if we can use these models to effectively characterize MOOCs based on the open reviews. その結果,数値評価は偏りが顕著であり (その63\%は5つ星評価である),トピック・モデリングにより,コース広告や実際の適用性,異なるコースの難易度などに関連する興味深い話題が明らかになった。我々は、この領域に光を当て、オンライン教育レビューにおいてより透明なアプローチを推進し、ポストパンデミック時代に入るにつれて、ますます人気が高まっていることを期待している。 The recent pandemic has changed the way we see education. It is not surprising that children and college students are not the only ones using online education. Millions of adults have signed up for online classes and courses during last years, and MOOC providers, such as Coursera or edX, are reporting millions of new users signing up in their platforms. However, students do face some challenges when choosing courses. Though online review systems are standard among many verticals, no standardized or fully decentralized review systems exist in the MOOC ecosystem. 
In this vein, we believe that there is an opportunity to leverage available open MOOC reviews in order to build simpler and more transparent reviewing systems, allowing users to really identify the best courses out there. Specifically, in our research we analyze 2.4 million reviews (which is the largest MOOC reviews dataset used until now) from five different platforms in order to determine the following: (1) if the numeric ratings provide discriminant information to learners, (2) if NLP-driven sentiment analysis on textual reviews could provide valuable information to learners, (3) if we can leverage NLP-driven topic finding techniques to infer themes that could be important for learners, and (4) if we can use these models to effectively characterize MOOCs based on the open reviews. Results show that numeric ratings are clearly biased (63% of them are 5-star ratings), and the topic modeling reveals some interesting topics related to course advertisements, the real applicability, or the difficulty of the different courses. We expect our study to shed some light on the area and promote a more transparent approach in online education reviews, which are becoming more and more popular as we enter the post-pandemic era.	翻訳日:2022-01-23 20:09:11 公開日:2022-01-11
# (参考訳) Quasi-Framelet: GraphNeural Networksのもうひとつの改善 Quasi-Framelets: Another Improvement to GraphNeural Networks ( http://arxiv.org/abs/2201.04728v1 ) ライセンス: CC BY 4.0	Mengxi Yang, Xuebin Zheng, Jie Yin and Junbin Gao	(参考訳) 本稿では,スペクトルグラフニューラルネットワークのためのマルチスケールフレームレット畳み込みの新しい設計を提案する。スペクトルパラダイムでは、スペクトル領域に様々なスペクトルフィルタを提案し、グローバルグラフ構造情報とローカルグラフ構造情報の両方をキャプチャすることで、グラフ学習タスクの性能を向上させる。既存のスペクトルアプローチは、いくつかのグラフでは優れた性能を示すが、柔軟性の欠如と、グラフ情報が不完全あるいは摂動的である場合に脆弱である。新しいフレームレット畳み込みは、スペクトル領域で直接設計されたフィルタリング関数を組み込んで、これらの制限を克服します。提案した畳み込みはスペクトル情報の遮断に優れた柔軟性を示し,ノイズグラフ信号の負の効果を効果的に緩和する。また、実世界のグラフデータの不均一性を利用するため、新しいフレームレット畳み込みを用いたヘテロジニアスグラフニューラルネットワークは、マルチレベルグラフ解析によりメタパスの固有トポロジ情報を埋め込むソリューションを提供する。 This paper aims to provide a novel design of a multiscale framelets convolution for spectral graph neural networks. In the spectral paradigm, spectral GNNs improve graph learning task performance via proposing various spectral filters in the spectral domain to capture both global and local graph structure information. Although the existing spectral approaches show superior performance in some graphs, they lack flexibility and are fragile when graph information is incomplete or perturbed. Our new framelets convolution incorporates the filtering functions directly designed in the spectral domain to overcome these limitations. The proposed convolution shows great flexibility in cutting off spectral information and effectively mitigates the negative effect of noisy graph signals. Besides, to exploit the heterogeneity in real-world graph data, the heterogeneous graph neural network with our new framelet convolution provides a solution for embedding the intrinsic topological information of meta-path with a multi-level graph analysis. Extensive experiments have been conducted on real-world heterogeneous graphs and homogeneous graphs under settings with noisy node features and superior performance results are achieved.	翻訳日:2022-01-15 05:30:00 公開日:2022-01-11
# (参考訳) デュアルアテンションネットワークを用いた二型・ハイブリッド型市場知識グラフに基づく株価変動予測 Stock Movement Prediction Based on Bi-typed and Hybrid-relational Market Knowledge Graph via Dual Attention Networks ( http://arxiv.org/abs/2201.04965v1 ) ライセンス: CC BY 4.0	Yu Zhao, Huaming Du, Ying Liu, Shaopeng Wei, Xingyan Chen, Huali Feng, Qinghong Shuai, Qing Li, Fuzhen Zhuang, Gang Kou	(参考訳) 株式移動予測(SMP)は、上場企業の株価動向を予測することを目的としており、これは金融市場の不安定な性質のために難しい課題である。近年の金融研究では、モーメントの流出効果が株価変動に重要な役割を果たすことが示されている。しかし、従来の研究は通常、関連企業間の単純な接続情報のみを学習するが、実際の金融市場における上場企業の複雑な関係をモデル化することは必然的に失敗する。この問題に対処するため,我々はまず,上場企業とその関連役員を含む2種類のエンティティと,明示的関係と暗黙的関係を含むハイブリッド関係を含む,より包括的な市場ナレッジグラフ(mkg)を構築する。その後、構築したMKGに基づいて運動量流出信号を学習し、株価予測を行う新しいデュアルアテンションネットワークであるDanSmpを提案する。 sotaベースライン9に対して構築したデータセットを実験した結果,提案手法が構築したmkgを用いて在庫予測を改善できることが確認された。 Stock Movement Prediction (SMP) aims at predicting listed companies' stock future price trend, which is a challenging task due to the volatile nature of financial markets. Recent financial studies show that the momentum spillover effect plays a significant role in stock fluctuation. However, previous studies typically only learn the simple connection information among related companies, which inevitably fail to model complex relations of listed companies in the real financial market. To address this issue, we first construct a more comprehensive Market Knowledge Graph (MKG) which contains bi-typed entities including listed companies and their associated executives, and hybrid-relations including the explicit relations and implicit relations. Afterward, we propose DanSmp, a novel Dual Attention Networks to learn the momentum spillover signals based upon the constructed MKG for stock prediction. The empirical experiments on our constructed datasets against nine SOTA baselines demonstrate that the proposed DanSmp is capable of improving stock prediction with the constructed MKG.	翻訳日:2022-01-15 05:10:15 公開日:2022-01-11
# (参考訳) インターネット提供型認知行動療法におけるアドヒアランス予測 : 最小データ感度アプローチ Adherence Forecasting for Guided Internet-Delivered Cognitive Behavioral Therapy: A Minimally Data-Sensitive Approach ( http://arxiv.org/abs/2201.04967v1 ) ライセンス: CC BY 4.0	Ulysse Côté-Allard, Minh H. Pham, Alexandra K. Schultz, Tine Nordgreen, Jim Torresen	(参考訳) インターネット提供型心理的治療(IDPT)は、メンタルヘルスのアクセシビリティを向上させるための効果的でスケーラブルな経路であると考えられている。この文脈において、治療の順守は、伝統的な介入に比べて医療専門家と患者との相互作用が減っているため、特に問題となる。並行して、特にデジタル分野において、人々の個人データを使用する際の規制が増加している。このような規制では、データ最小化はしばしばGDPR(General Data Protection Regulation)のような中核的な原則となる。そこで本研究では,最小限の敏感なログイン/ログアウトデータにのみ依存しながら,自動アドヒアランス予測を行うディープラーニング手法を提案する。本研究は,インターネット提供型認知行動療法(G-ICBT)を施行した342例を対象に行った。提案するセルフアテンションネットワークは平均バランス精度を70%以上達成し,治療期間の1/3しか経過しなかった。そこで本研究では,G-ICBTの自動アドヒアランス予測が,最小限の感度データのみを用いて実現可能であることを示す。 Internet-delivered psychological treatments (IDPT) are seen as an effective and scalable pathway to improving the accessibility of mental healthcare. Within this context, treatment adherence is an especially relevant challenge to address due to the reduced interaction between healthcare professionals and patients, compared to more traditional interventions. In parallel, there are increasing regulations when using people's personal data, especially in the digital sphere. In such regulations, data minimization is often a core tenet, such as within the General Data Protection Regulation (GDPR). Consequently, this work proposes a deep-learning approach to perform automatic adherence forecasting, while only relying on minimally sensitive login/logout data. This approach was tested on a dataset containing 342 patients undergoing guided internet-delivered cognitive behavioral therapy (G-ICBT) treatment. The proposed Self-Attention Network achieved over 70% average balanced accuracy, when only 1/3 of the treatment duration had elapsed. 
As such, this study demonstrates that automatic adherence forecasting for G-ICBT is achievable using only minimally sensitive data, thus facilitating the implementation of such tools within real-world IDPT platforms.	翻訳日:2022-01-15 04:55:11 公開日:2022-01-11
# fusion autoencoderによる深層クラスタリング Deep clustering with fusion autoencoder ( http://arxiv.org/abs/2201.04727v1 ) ライセンス: Link先を確認	Shuai Chang	(参考訳) 近年,クラスタリング研究における表現学習の深層学習技術の導入が注目され,新たに開発されたクラスタリングパラダイムであるviz. the Deep Clustering (DC) が生み出されている。通常、DCモデルはオートエンコーダを利用して、結果としてクラスタリングプロセスを促進する固有の特徴を学ぶ。近年, 可変オートエンコーダ (VAE) と呼ばれる生成モデルがDC研究で広く受け入れられている。それでも、一般的なVAEは、包括的な潜伏する特徴を認識できないため、劣化するクラスタリングのパフォーマンスにつながる。本稿では,この問題に対処する新しいDC法を提案する。特に、生成逆数ネットワークとVAEは、下流クラスタリングタスクの恩恵を受けるより差別的な表現を識別するために、融合オートエンコーダ(FAE)と呼ばれる新しいオートエンコーダに結合される。さらに、FAEは、表現学習能力をさらに強化するディープ残差ネットワークアーキテクチャで実装されている。最後に、faeの潜在空間は、異なるクラスタを互いに引き離し、個々のクラスタ内のデータポイントを崩壊させる、深密ニューラルネットワークによって形成される埋め込み空間に変換される。複数の画像データセットを用いて実験を行い、ベースライン法に対する提案したDCモデルの有効性を示した。 Embracing the deep learning techniques for representation learning in clustering research has attracted broad attention in recent years, yielding a newly developed clustering paradigm, viz. the deep clustering (DC). Typically, the DC models capitalize on autoencoders to learn the intrinsic features which facilitate the clustering process in consequence. Nowadays, a generative model named variational autoencoder (VAE) has got wide acceptance in DC studies. Nevertheless, the plain VAE is insufficient to perceive the comprehensive latent features, leading to the deteriorative clustering performance. In this paper, a novel DC method is proposed to address this issue. Specifically, the generative adversarial network and VAE are coalesced into a new autoencoder called fusion autoencoder (FAE) for discerning more discriminative representation that benefits the downstream clustering task. Besides, the FAE is implemented with the deep residual network architecture which further enhances the representation learning ability. Finally, the latent space of the FAE is transformed to an embedding space shaped by a deep dense neural network for pulling away different clusters from each other and collapsing data points within individual clusters. 
Experiments conducted on several image datasets demonstrate the effectiveness of the proposed DC model against the baseline methods.	翻訳日:2022-01-14 14:16:04 公開日:2022-01-11
# 道路交通プロファイルのセンサレス推定のためのグラフ埋め込みの設計について On the Design of Graph Embeddings for the Sensorless Estimation of Road Traffic Profiles ( http://arxiv.org/abs/2201.04968v1 ) ライセンス: Link先を確認	Eric L. Manibardo, Ibai Laña, Esther Villar, and Javier Del Ser	(参考訳) トラフィック予測モデルは、認識、処理、保存が必要なデータに依存します。これには交通センシングインフラストラクチャの展開とメンテナンスが必要であり、しばしば耐え難い金銭コストに繋がる。センシングされた位置の欠如は、交通監視に必要な経済的投資をさらに減少させる合成データシミュレーションによって補うことができる。最も一般的なデータ生成アプローチの1つは、類似する道路のデータ分布に基づいて、実際のトラフィックパターンを生成することだ。同様の交通量で道路を検出するプロセスは、これらのシステムの重要なポイントである。しかし、この類似性に基づく探索には、ターゲット位置でデータを集めることなくフローメトリクスを使用できない。本稿では,道路セグメントのトポロジ的特徴を検査することで,交通データのある場所を検出する手法を提案する。関連する位相的特徴を数値表現(埋め込み)として抽出し、異なる場所を比較し、最終的にそれらの埋め込み間の類似性に基づいて最も類似した道路を見つける。本システムの性能について検討し,より単純なトラフィック推定手法と比較した。類似したデータソースを見つけた後、トラフィックプロファイルを合成するために生成手法が使用される。認識された道路における交通行動の類似性に応じて、生成法は1つの道路からのデータのみを供給できる。合成試料の精度の観点から, 数世代にわたって解析を行った。とりわけ,本研究は,合成交通試料の品質向上に向けたさらなる研究努力を刺激し,センサインフラストラクチャの必要性を減らすことを目的としている。 Traffic forecasting models rely on data that needs to be sensed, processed, and stored. This requires the deployment and maintenance of traffic sensing infrastructure, often leading to unaffordable monetary costs. The lack of sensed locations can be complemented with synthetic data simulations that further lower the economic investment needed for traffic monitoring. One of the most common data generative approaches consists of producing real-like traffic patterns, according to data distributions from analogous roads. The process of detecting roads with similar traffic is the key point of these systems. However, without collecting data at the target location no flow metrics can be employed for this similarity-based search. We present a method to discover locations among those with available traffic data by inspecting topological features of road segments. Relevant topological features are extracted as numerical representations (embeddings) to compare different locations and eventually find the most similar roads based on the similarity between their embeddings. 
The performance of this novel selection system is examined and compared to simpler traffic estimation approaches. After finding a similar source of data, a generative method is used to synthesize traffic profiles. Depending on the resemblance of the traffic behavior at the sensed road, the generation method can be fed with data from one road only. Several generation approaches are analyzed in terms of the precision of the synthesized samples. Above all, this work intends to stimulate further research efforts towards enhancing the quality of synthetic traffic samples, thereby reducing the need for sensing infrastructure.	翻訳日:2022-01-14 13:05:15 公開日:2022-01-11
# (参考訳) MICCAI2021におけるHECKTORチャレンジの概要:PET/CT画像における頭頸部腫瘍セグメンテーションとアウトカム予測 Overview of the HECKTOR Challenge at MICCAI 2021: Automatic Head and Neck Tumor Segmentation and Outcome Prediction in PET/CT Images ( http://arxiv.org/abs/2201.04138v1 ) ライセンス: CC BY 4.0	Vincent Andrearczyk, Valentin Oreiller, Sarah Boughdad, Catherine Chez Le Rest, Hesham Elhalawani, Mario Jreige, John O. Prior, Martin Vallières, Dimitris Visvikis, Mathieu Hatt, Adrien Depeursinge	(参考訳) 本稿では,第24回医療画像コンピューティング・コンピュータ支援介入国際会議(MICCAI)のサテライトイベントとして組織されたHECKTOR(HEad and neCK TumOR)チャレンジの第2版の概要を紹介する。この課題は、頭頸部癌(H&N)に対するPET/CT画像の自動解析に関連する3つの課題から成り、咽頭領域に焦点を当てている。タスク1は、FDG-PET/CT画像におけるH&N原発グロス腫瘍ボリューム(GTVt)の自動セグメンテーションである。タスク2は、同じFDG-PET/CTからPFS(Progression Free Survival)の自動予測である。最後に、第3タスクは第2タスクと同じで、参加者にGTVtアノテーションが提供されている。データは6つのセンターから収集され、合計325枚の画像が224のトレーニングと101のテストケースに分割された。チャレンジへの関心は、103の登録チームと448の結果の提出による重要な参加によって強調された。第1タスクではDice similarity Coefficient(DSC)が0.7591、第2タスクでは0.7196、第3タスクでは0.6978のConcordance Index(C-index)がそれぞれ得られた。あらゆるタスクにおいて、アプローチの単純さが一般化性能を保証する鍵であることが判明した。タスク2と3におけるPFS予測性能の比較では、GTVt輪郭の提供は最良の結果を得るためには重要ではなかったことが示唆され、完全な自動手法が利用可能であることが示唆された。これはGTVtの輪郭描出の必要性をなくす可能性があり、何千もの潜在的対象を含む再現可能で大規模な放射線学研究への道を開く可能性がある。 This paper presents an overview of the second edition of the HEad and neCK TumOR (HECKTOR) challenge, organized as a satellite event of the 24th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) 2021. The challenge is composed of three tasks related to the automatic analysis of PET/CT images for patients with Head and Neck cancer (H&N), focusing on the oropharynx region. Task 1 is the automatic segmentation of H&N primary Gross Tumor Volume (GTVt) in FDG-PET/CT images. Task 2 is the automatic prediction of Progression Free Survival (PFS) from the same FDG-PET/CT. Finally, Task 3 is the same as Task 2 with ground truth GTVt annotations provided to the participants. 
The data were collected from six centers for a total of 325 images, split into 224 training and 101 testing cases. The interest in the challenge was highlighted by the significant participation, with 103 registered teams and 448 result submissions. The best methods obtained a Dice Similarity Coefficient (DSC) of 0.7591 in the first task, and a Concordance index (C-index) of 0.7196 and 0.6978 in Tasks 2 and 3, respectively. In all tasks, simplicity of the approach was found to be key to ensuring generalization performance. The comparison of the PFS prediction performance in Tasks 2 and 3 suggests that providing the GTVt contour was not crucial to achieving the best results, which indicates that fully automatic methods can be used. This potentially obviates the need for GTVt contouring, opening avenues for reproducible and large-scale radiomics studies including thousands of potential subjects.	翻訳日:2022-01-13 23:49:07 公開日:2022-01-11
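For readers unfamiliar with the segmentation metric reported in the HECKTOR entry, the Dice Similarity Coefficient between a predicted and a reference binary mask is 2|A∩B| / (|A| + |B|). The sketch below is only an illustration of that standard formula on flat binary masks; the function name, the flat-list mask representation, and the convention of returning 1.0 when both masks are empty are assumptions, not the challenge's official evaluation code.

```python
def dice_coefficient(pred, truth):
    """Dice Similarity Coefficient for two flat binary masks (illustrative helper).

    pred and truth are equal-length sequences of 0/1 (or False/True) voxel labels.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labelled as tumor in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total number of tumor voxels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total
```

A score of 1.0 means the two masks coincide and 0.0 means they do not overlap at all, so the reported 0.7591 sits between those extremes.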
# (参考訳) 自動tether-netシステムを用いた一般化デブリ捕獲のためのロバストポリシーの学習 Learning Robust Policies for Generalized Debris Capture with an Automated Tether-Net System ( http://arxiv.org/abs/2201.04180v1 ) ライセンス: CC BY 4.0	Chen Zeng, Grant Hecht, Prajit KrisshnaKumar, Raj K. Shah, Souma Chowdhury and Eleonora M. Botta	(参考訳) チェイサー宇宙船から打ち上げられたテザーネットは、軌道上の大きな宇宙ゴミを捕獲し処分する有望な方法を提供する。このテザネットシステムは、ネット発射・閉鎖制御の性能に影響を及ぼすセンサとアクチュエーターの不確実性の原因を複数抱えている。しかし、初期の信頼性に基づく制御アクション設計の最適化アプローチは、チェイサーに対する様々な発射シナリオと目標(デブリス)状態の一般化を困難かつ計算的に禁止している。本稿では,汎用かつ信頼性の高い制御ポリシを探索するために,ppo(proximal policy optimization)アプローチとネットダイナミクスシミュレーションを統合した強化学習フレームワークを提案する。後者は、ネットベースのターゲットキャプチャのエピソードを評価し、PPO2に対する報酬フィードバックとして機能するキャプチャ品質指標を推定する。ここで、学習されたポリシーは、任意の発射シナリオに基づいて、移動網の状態と目標に基づいて、網閉動作のタイミングをモデル化するように設計されている。状態推定と起動動作に合成不確実性を組み込むために,確率的状態遷移モデルを考える。トレーニング中の顕著な報酬改善に加えて、トレーニングされたポリシは、個々のシナリオで実行される信頼性ベースの最適化によって得られたものに近い(幅広い発射/目標シナリオにわたる)キャプチャパフォーマンスを実証する。 Tether-net launched from a chaser spacecraft provides a promising method to capture and dispose of large space debris in orbit. This tether-net system is subject to several sources of uncertainty in sensing and actuation that affect the performance of its net launch and closing control. Earlier reliability-based optimization approaches to design control actions however remain challenging and computationally prohibitive to generalize over varying launch scenarios and target (debris) state relative to the chaser. To search for a general and reliable control policy, this paper presents a reinforcement learning framework that integrates a proximal policy optimization (PPO2) approach with net dynamics simulations. The latter allows evaluating the episodes of net-based target capture, and estimate the capture quality index that serves as the reward feedback to PPO2. Here, the learned policy is designed to model the timing of the net closing action based on the state of the moving net and the target, under any given launch scenario. 
A stochastic state transition model is considered in order to incorporate synthetic uncertainties in state estimation and launch actuation. Along with notable reward improvement during training, the trained policy demonstrates capture performance (over a wide range of launch/target scenarios) that is close to that obtained with reliability-based optimization run over an individual scenario.	翻訳日:2022-01-13 23:47:09 公開日:2022-01-11
# (参考訳) Hyper Transformer: 教師付き半教師付きFew-Shot学習のためのモデル生成 HyperTransformer: Model Generation for Supervised and Semi-Supervised Few-Shot Learning ( http://arxiv.org/abs/2201.04182v1 ) ライセンス: CC BY 4.0	Andrey Zhmoginov, Mark Sandler, Max Vladymyrov	(参考訳) 本研究では,支援サンプルから直接畳み込みニューラルネットワーク(CNN)の重みを生成する,数ショット学習のためのトランスフォーマーベースモデルであるHyperTransformerを提案する。特定のタスクに対する小さなCNNモデルの依存は、高容量トランスフォーマーモデルによって符号化されるので、大きなタスク空間の複雑さと個々のタスクの複雑さを効果的に分離する。提案手法は, タスク依存型埋め込みの学習が最適ではなく, タスクに関する情報が全てのモデルパラメータを変調できる場合に, より優れた性能が得られるような, 小さなターゲットCNNアーキテクチャにおいて特に有効である。より大きなモデルの場合、最後のレイヤを生成するだけで、最先端のメソッドで得られるものよりも競争性や優れた結果を生み出すことができることが分かりました。最後に,提案手法を,サポートセットの未ラベルサンプルを利用した半教師付きシステムに拡張し,さらに撮影性能を向上する。 In this work we propose a HyperTransformer, a transformer-based model for few-shot learning that generates weights of a convolutional neural network (CNN) directly from support samples. Since the dependence of a small generated CNN model on a specific task is encoded by a high-capacity transformer model, we effectively decouple the complexity of the large task space from the complexity of individual tasks. Our method is particularly effective for small target CNN architectures where learning a fixed universal task-independent embedding is not optimal and better performance is attained when the information about the task can modulate all model parameters. For larger models we discover that generating the last layer alone allows us to produce competitive or better results than those obtained with state-of-the-art methods while being end-to-end differentiable. Finally, we extend our approach to a semi-supervised regime utilizing unlabeled samples in the support set and further improving few-shot performance.	翻訳日:2022-01-13 23:34:31 公開日:2022-01-11
# (参考訳) ディープラーニングに基づく駐車場サービスの動的価格設定 Dynamic Price of Parking Service based on Deep Learning ( http://arxiv.org/abs/2201.04188v1 ) ライセンス: CC BY 4.0	Alejandro Luque-Cerpa, Miguel A. Gutiérrez-Naranjo, Miguel Cárdenas-Montes	(参考訳) 都市における空気質の向上は公共団体の主要な関心事の一つである。この懸念は、空気質と公衆衛生の間の証拠から生じている。この地域の政府機関の主な取り組みは、監視と予測システム、汚染度の高い自動車の禁止、低品質空気の期間の交通制限などである。本研究は,規制駐車場サービスにおける動的価格設定の提案である。駐車場サービスの動的な価格は、低空気質のエピソードが予測されたときに自動車の駐車を抑制しなければならない。この目的のために、多様なディープラーニング戦略を評価する。彼らは共通して、市内の空気品質に関するラベルを予測するために、集合的空気品質測定を使用する。本提案はマドリード(スペイン)の経済パラメータと深層学習品質基準を用いて評価される。 The improvement of air quality in urban areas is one of the main concerns of public government bodies. This concern emerges from the evidence linking air quality and public health. Major efforts from government bodies in this area include monitoring and forecasting systems, banning the most polluting motor vehicles, and traffic limitations during periods of low air quality. In this work, a proposal for dynamic prices in regulated parking services is presented. The dynamic prices in parking services must discourage motor vehicle parking when episodes of low air quality are predicted. For this purpose, diverse deep learning strategies are evaluated. They have in common the use of collective air-quality measurements for forecasting labels about air quality in the city. The proposal is evaluated by using economic parameters and deep learning quality criteria in Madrid (Spain).	翻訳日:2022-01-13 23:07:37 公開日:2022-01-11
# (参考訳) ニューラルネットワーク容量:エッジダイナミクスによるニューラルネットワーク選択の新しい視点 Neural Capacitance: A New Perspective of Neural Network Selection via Edge Dynamics ( http://arxiv.org/abs/2201.04194v1 ) ライセンス: CC BY 4.0	Chunheng Jiang, Tejaswini Pedapati, Pin-Yu Chen, Yizhou Sun, Jianxi Gao	(参考訳) 下流タスクに適切なトレーニング済みニューラルネットワークを特定するための効率的なモデル選択は、ディープラーニングの基本的な課題である。現在の実践では、パフォーマンス予測のためのモデルトレーニングに高価な計算コストを必要とする。本稿では,学習中のシナプス接続(エッジ)上の制御ダイナミクスを解析し,ニューラルネットワーク選択のための新しいフレームワークを提案する。我々のフレームワークは、ニューラルネットワークトレーニング中のバックプロパゲーションがシナプス接続の動的進化と等価であるという事実に基づいている。したがって、収束ニューラルネットワークは、これらのエッジからなるネットワークシステムの平衡状態と関連付けられる。この目的のために、ニューラルネットワーク$G_A$を有向線グラフ$G_B$に変換するネットワークマッピング$\phi$を構築し、これらエッジ上で定義した$G_A$を$G_A$とする。次に、一握りの初期のトレーニング結果を用いて、下流タスクにおける$g_a$の一般化能力を普遍的に捉える予測指標として、ニューラルキャパシタンスメトリック$\beta_{\rm eff}$を導出する。本フレームワークの微調整性能を評価するために,17種類のイメージネットモデルとcifar10,cifar100,svhn,fashion mnist,birdsを含む5つのベンチマークデータセットを用いて広範な実験を行った。我々のニューラルキャパシタンスメトリックは、初期トレーニング結果のみに基づいたモデル選択の強力な指標であり、最先端の手法よりも効率的である。 Efficient model selection for identifying a suitable pre-trained neural network to a downstream task is a fundamental yet challenging task in deep learning. Current practice requires expensive computational costs in model training for performance prediction. In this paper, we propose a novel framework for neural network selection by analyzing the governing dynamics over synaptic connections (edges) during training. Our framework is built on the fact that back-propagation during neural network training is equivalent to the dynamical evolution of synaptic connections. Therefore, a converged neural network is associated with an equilibrium state of a networked system composed of those edges. To this end, we construct a network mapping $\phi$, converting a neural network $G_A$ to a directed line graph $G_B$ that is defined on those edges in $G_A$. 
Next, we derive a neural capacitance metric $\beta_{\rm eff}$ as a predictive measure universally capturing the generalization capability of $G_A$ on the downstream task using only a handful of early training results. We carried out extensive experiments using 17 popular pre-trained ImageNet models and five benchmark datasets, including CIFAR10, CIFAR100, SVHN, Fashion MNIST and Birds, to evaluate the fine-tuning performance of our framework. Our neural capacitance metric is shown to be a powerful indicator for model selection based only on early training results and is more efficient than state-of-the-art methods.	翻訳日:2022-01-13 22:54:11 公開日:2022-01-11
# (参考訳) 視覚ロボットのための深層強化学習アルゴリズムのベンチマーク Benchmarking Deep Reinforcement Learning Algorithms for Vision-based Robotics ( http://arxiv.org/abs/2201.04224v1 ) ライセンス: CC BY 4.0	Swagat Kumar, Hayden Sampson, Ardhendu Behera	(参考訳) 本稿では,2つの視覚に基づくロボット工学問題の解法として,最先端の強化学習アルゴリズムのベンチマーク研究を行う。本研究で検討されているアルゴリズムは、ソフトアクター・クリティック(SAC)、近位ポリシー最適化(PPO)、補間ポリシー勾配(IPG)、およびHER(Hindsight Experience replay)を含む。これらのアルゴリズムの性能は、PyBulletの2つのシミュレーション環境であるKukaDiverseObjectEnvとRacecarZEDGymEnvと比較される。これらの環境における状態観察はRGB画像の形で利用可能であり、アクション空間は連続しており、解決が困難である。基本的には単一ゴール環境であるこれらの問題に対してHERアルゴリズムを実装するのに必要な、いくつかの戦略が提案されている。また,学習過程に空間的および時間的注意を組み込むために,いくつかの特徴抽出アーキテクチャが提案されている。厳密なシミュレーション実験により、これらの成分による改善が確立される。私たちの知る限りでは、上記の2つのビジョンベースのロボット工学の問題に対して、このようなベンチマーク研究は利用できない。 This paper presents a benchmarking study of some of the state-of-the-art reinforcement learning algorithms used for solving two simulated vision-based robotics problems. The algorithms considered in this study include soft actor-critic (SAC), proximal policy optimization (PPO), interpolated policy gradients (IPG), and their variants with Hindsight Experience replay (HER). The performances of these algorithms are compared against PyBullet's two simulation environments known as KukaDiverseObjectEnv and RacecarZEDGymEnv respectively. The state observations in these environments are available in the form of RGB images and the action space is continuous, making them difficult to solve. A number of strategies are suggested to provide intermediate hindsight goals required for implementing HER algorithm on these problems which are essentially single-goal environments. In addition, a number of feature extraction architectures are proposed to incorporate spatial and temporal attention in the learning process. Through rigorous simulation experiments, the improvement achieved with these components are established. 
To the best of our knowledge, such a benchmarking study is not available for the above two vision-based robotics problems, making it a novel contribution in the field.	翻訳日:2022-01-13 22:06:30 公開日:2022-01-11
# (参考訳) ラベルなしデータを活用して分散性能を予測する Leveraging Unlabeled Data to Predict Out-of-Distribution Performance ( http://arxiv.org/abs/2201.04234v1 ) ライセンス: CC BY 4.0	Saurabh Garg, Sivaraman Balakrishnan, Zachary C. Lipton, Behnam Neyshabur, Hanie Sedghi	(参考訳) 実世界の機械学習のデプロイメントは、パフォーマンス低下を引き起こす可能性のあるソース(トレーニング)とターゲット(テスト)ディストリビューションのミスマッチによって特徴づけられる。本研究では,ラベル付きソースデータとラベルなしターゲットデータのみを用いて,対象領域の精度を予測する手法を検討する。本稿では,モデル信頼度がしきい値を超える未ラベル例のごく一部として精度を予測し,モデルの信頼度にしきい値を求める実践的手法である平均閾値保持信頼度(ATC)を提案する。 ATCは、いくつかのモデルアーキテクチャ、分散シフトのタイプ(例えば、合成腐敗、データセットの再生、新しいサブポピュレーション)、データセット(Wilds、ImageNet、Breeds、CIFAR、MNIST)において、以前の方法よりも優れていた。我々の実験では、ATCは目標性能を従来の方法よりも正確に2-4\times$と見積もっている。また,問題の理論的基礎を探究し,一般には,最適な予測者を特定するのと同じくらい精度の特定が困難であることを示す。最後に,この手法をおもちゃの分布上で解析し,その動作状況について考察する。 Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions that may cause performance drops. In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data. We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples for which model confidence exceeds that threshold. ATC outperforms previous methods across several model architectures, types of distribution shifts (e.g., due to synthetic corruptions, dataset reproduction, or novel subpopulations), and datasets (Wilds, ImageNet, Breeds, CIFAR, and MNIST). In our experiments, ATC estimates target performance $2$-$4\times$ more accurately than prior methods. We also explore the theoretical foundations of the problem, proving that, in general, identifying the accuracy is just as hard as identifying the optimal predictor and thus, the efficacy of any method rests upon (perhaps unstated) assumptions on the nature of the shift. Finally, analyzing our method on some toy distributions, we provide insights concerning when it works.	
翻訳日:2022-01-13 21:54:35 公開日:2022-01-11
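The Average Thresholded Confidence (ATC) idea described in the entry above can be sketched in a few lines. The abstract only states that a threshold on model confidence is learned and that accuracy is predicted as the fraction of unlabeled examples whose confidence exceeds it; the fitting rule used here (choose the threshold so that the share of labeled source examples above it equals the source accuracy) is an assumption made for illustration, as are the function names and the plain-list inputs.

```python
def fit_threshold(source_conf, source_correct):
    """Choose a confidence threshold on labeled source data (assumed fitting rule).

    The threshold is placed so that the fraction of source examples whose
    confidence exceeds it matches the observed source accuracy.
    """
    n = len(source_conf)
    accuracy = sum(source_correct) / n
    ranked = sorted(source_conf, reverse=True)
    k = round(accuracy * n)  # how many source examples should lie above the threshold
    if k <= 0:
        return ranked[0]       # strict ">" means nothing exceeds the top confidence
    if k >= n:
        return float("-inf")   # every example exceeds the threshold
    # Midpoint between the k-th and (k+1)-th largest confidences.
    return (ranked[k - 1] + ranked[k]) / 2


def predict_accuracy(target_conf, threshold):
    """Estimate target accuracy as the share of unlabeled examples above the threshold."""
    return sum(1 for c in target_conf if c > threshold) / len(target_conf)
```

With source confidences `[0.9, 0.8, 0.6, 0.55]` of which three are correct, the threshold falls between 0.55 and 0.6, and half of the target confidences `[0.95, 0.5, 0.7, 0.4]` exceed it, giving an estimated target accuracy of 0.5. Tied confidences around the cut point would make the match to source accuracy only approximate.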
# 2つの誤りが正しい: 化学の精度による化学発見のための転移学習アプローチ Two Wrongs Can Make a Right: A Transfer Learning Approach for Chemical Discovery with Chemical Accuracy ( http://arxiv.org/abs/2201.04243v1 ) ライセンス: Link先を確認	Chenru Duan, Daniel B. K. Chu, Aditya Nandy, and Heather J. Kulik	(参考訳) 仮想高スループットスクリーニング(VHTS)において,MR特性を有する分子や材料を適切に同定・処理することが,高データの忠実性を実現する上で重要である。しかしながら、ほとんどのVHTSは1つの関数を使って近似密度汎関数理論(DFT)を用いて実行される。多くのMR診断が開発されているにもかかわらず、そのような診断の単一値が化学特性予測に対するMR効果を示す範囲は十分に確立されていない。我々は1万以上の遷移金属錯体(TMC)のMR診断を評価し,有機分子と比較した。 MR診断は,これらの材料空間間でのみ行うことができる。 mr特性が複数のポテンシャルエネルギー面(すなわち、断熱スピン分割、$\delta e_\mathrm{h-l}$、イオン化ポテンシャル、ip)を含む化学的性質(すなわちmr効果)に与える影響を調べることにより、mr効果のキャンセルが蓄積よりも優れていることを観察する。 MR特性の差は特性予測におけるMR効果の予測においてMR特性の総程度よりも重要である。この観測により、我々は、CCSD(T)レベルのアダイバティック$\Delta E_\mathrm{H-L}$とIPを低い理論レベルから直接予測する転送学習モデルを構築した。これらのモデルと不確実性定量化と多レベルモデリングを組み合わせることで、ロバストvhtsの化学精度(すなわち1kcal/mol)を保ちながら、データ取得を少なくとも3倍に加速するマルチプロング戦略を導入する。 Appropriately identifying and treating molecules and materials with significant multi-reference (MR) character is crucial for achieving high data fidelity in virtual high throughput screening (VHTS). Nevertheless, most VHTS is carried out with approximate density functional theory (DFT) using a single functional. Despite development of numerous MR diagnostics, the extent to which a single value of such a diagnostic indicates MR effect on chemical property prediction is not well established. We evaluate MR diagnostics of over 10,000 transition metal complexes (TMCs) and compare to those in organic molecules. We reveal that only some MR diagnostics are transferable across these materials spaces. By studying the influence of MR character on chemical properties (i.e., MR effect) that involves multiple potential energy surfaces (i.e., adiabatic spin splitting, $\Delta E_\mathrm{H-L}$, and ionization potential, IP), we observe that cancellation in MR effect outweighs accumulation. 
Differences in MR character are more important than the total degree of MR character in predicting MR effect in property prediction. Motivated by this observation, we build transfer learning models to directly predict CCSD(T)-level adiabatic $\Delta E_\mathrm{H-L}$ and IP from lower levels of theory. By combining these models with uncertainty quantification and multi-level modeling, we introduce a multi-pronged strategy that accelerates data acquisition by at least a factor of three while achieving chemical accuracy (i.e., 1 kcal/mol) for robust VHTS.	翻訳日:2022-01-13 15:18:24 公開日:2022-01-11
# チューリングトラップ:人間のような人工知能の約束と危険 The Turing Trap: The Promise & Peril of Human-Like Artificial Intelligence ( http://arxiv.org/abs/2201.04200v1 ) ライセンス: Link先を確認	Erik Brynjolfsson	(参考訳) 1950年、アラン・チューリングは機械が知的かどうかの究極のテストとして模倣ゲームを提案した。それ以来、人間の知性と一致するインテリジェンスを作ることが、何千人もの研究者、エンジニア、起業家の目標となっている。 human-like artificial intelligence(HLAI)のメリットには、生産性の高騰、余暇の増加、そしておそらく最も深く、私たちの心をより深く理解することが含まれる。しかし、あらゆるタイプのAIが人間に似ているわけではない。実際、最も強力なシステムの多くは、人間とは大きく異なる。そのため、HLAIの開発とデプロイに過度な注力が、私たちを罠に陥れさせます。機械が人間の労働の代用品になるにつれて、労働者は経済的、政治的交渉力を失い、テクノロジーを制御する人々に依存するようになる。対照的に、AIが人間を模倣するのではなく強化することに焦点を当てている場合、人間は創造された価値の共有を主張する力を保持します。さらに、強化は新たな機能と新製品やサービスを生み出し、最終的には人間のようなAIよりもはるかに多くの価値を生み出す。どちらのタイプのAIも非常に有益だが、現在、技術者やビジネスエグゼクティブ、政策立案者の間では、自動化に対する過度のインセンティブがある。 In 1950, Alan Turing proposed an imitation game as the ultimate test of whether a machine was intelligent: could a machine imitate a human so well that its answers to questions are indistinguishable from a human's? Ever since, creating intelligence that matches human intelligence has implicitly or explicitly been the goal of thousands of researchers, engineers, and entrepreneurs. The benefits of human-like artificial intelligence (HLAI) include soaring productivity, increased leisure, and perhaps most profoundly, a better understanding of our own minds. But not all types of AI are human-like. In fact, many of the most powerful systems are very different from humans. So an excessive focus on developing and deploying HLAI can lead us into a trap. As machines become better substitutes for human labor, workers lose economic and political bargaining power and become increasingly dependent on those who control the technology. In contrast, when AI is focused on augmenting humans rather than mimicking them, then humans retain the power to insist on a share of the value created. Furthermore, augmentation creates new capabilities and new products and services, ultimately generating far more value than merely human-like AI. 
While both types of AI can be enormously beneficial, there are currently excess incentives for automation rather than augmentation among technologists, business executives, and policymakers.	翻訳日:2022-01-13 15:01:49 公開日:2022-01-11
# 信頼できない知的意思決定支援システムのサブゴールによる説明 Subgoal-Based Explanations for Unreliable Intelligent Decision Support Systems ( http://arxiv.org/abs/2201.04204v1 ) ライセンス: Link先を確認	Devleena Das, Been Kim, Sonia Chernova	(参考訳) インテリジェント意思決定支援(IDS)システムは、人工知能技術を活用して、タスクの意思決定フェーズを通じて人間のユーザを導くレコメンデーションを生成する。しかし、重要な課題は、IDSシステムが完璧ではなく、複雑な実世界のシナリオでは誤った出力を生成したり、完全に動作しない可能性があることである。説明可能なAI計画(XAIP)の分野は、AIシステムをエンドユーザにとってより説明しやすいものにするシーケンシャルな意思決定を行う技術の開発を目指している。批判的に、IDSシステムにXAIP技術を適用する前の作業は、プランナーが提案するプランが常に最適であり、ユーザへの意思決定支援として推奨されるアクションやプランが常に正しいと仮定されている。本研究は,非不正IDSシステムとの初級ユーザインタラクションについて検討し,ユーザがガイダンスに慣れた後に利用できなくなる可能性のある,誤ったアクションを推奨するシステムについて検討する。本報告では,従来のIDS出力を補完し,推奨行動が寄与するサブゴールに関する情報を付加する,新たな説明型,サブゴールベース説明法を提案する。我々は、サブゴールベースの説明がユーザタスクのパフォーマンスの向上、最適なIDSレコメンデーションと最適でないIDSレコメンデーションを区別するユーザ能力の向上、IDS障害時により堅牢なユーザパフォーマンスの実現につながることを実証した。 Intelligent decision support (IDS) systems leverage artificial intelligence techniques to generate recommendations that guide human users through the decision making phases of a task. However, a key challenge is that IDS systems are not perfect, and in complex real-world scenarios may produce incorrect output or fail to work altogether. The field of explainable AI planning (XAIP) has sought to develop techniques that make the decision making of sequential decision making AI systems more explainable to end-users. Critically, prior work in applying XAIP techniques to IDS systems has assumed that the plan being proposed by the planner is always optimal, and therefore the action or plan being recommended as decision support to the user is always correct. In this work, we examine novice user interactions with a non-robust IDS system -- one that occasionally recommends the wrong action, and one that may become unavailable after users have become accustomed to its guidance. 
We introduce a novel explanation type, subgoal-based explanations, for planning-based IDS systems, that supplements traditional IDS output with information about the subgoal toward which the recommended action would contribute. We demonstrate that subgoal-based explanations lead to improved user task performance, improve user ability to distinguish optimal and suboptimal IDS recommendations, are preferred by users, and enable more robust user performance in the case of IDS failure.	翻訳日:2022-01-13 14:58:09 公開日:2022-01-11
# 音楽スコア画像の領域ベースレイアウト解析 Region-based Layout Analysis of Music Score Images ( http://arxiv.org/abs/2201.04214v1 ) ライセンス: Link先を確認	Francisco J. Castellanos, Carlos Garrido-Munoz, Antonio Ríos-Vila, Jorge Calvo-Zaragoza	(参考訳) レイアウト解析(LA)ステージは、光学音楽認識(OMR)システムの正しい性能において極めて重要である。譜表や歌詞などの興味のある領域を識別し、その内容の書き起こしのために処理しなければならない。ディープラーニングに基づく現代的なアプローチが存在するにもかかわらず、OMRにおけるLAの徹底的な研究は、異なるモデルの精度、異なるドメインへの一般化、あるいはより重要なのは、パイプラインのその後のステージへの影響に関してまだ行われていない。この研究は、異なるニューラルアーキテクチャ、音楽文書タイプ、評価シナリオの実験的な研究により、文学におけるこのギャップを埋めることに焦点を当てている。トレーニングデータの必要性は、実際のシナリオにおけるLAアプローチの効率的な適用を可能にする、新しい半合成データ生成技術の提案につながっている。結果はこう示しています (i)モデルの選択とその性能は、転写過程全体において不可欠である。 (ii)LAステージを評価するために一般的に用いられる指標は、OMRシステムの最終性能と必ずしも相関しない。 (iii)提案手法は,ラベル付きデータの限られたセットで最先端の成果を実現できる。 The Layout Analysis (LA) stage is of vital importance to the correct performance of an Optical Music Recognition (OMR) system. It identifies the regions of interest, such as staves or lyrics, which must then be processed in order to transcribe their content. Despite the existence of modern approaches based on deep learning, an exhaustive study of LA in OMR has not yet been carried out with regard to the precision of different models, their generalization to different domains or, more importantly, their impact on subsequent stages of the pipeline. This work focuses on filling this gap in the literature by means of an experimental study of different neural architectures, music document types and evaluation scenarios. The need for training data has also led to a proposal for a new semi-synthetic data generation technique that enables the efficient applicability of LA approaches in real scenarios. 
Our results show that: (i) the choice of the model and its performance are crucial for the entire transcription process; (ii) the metrics commonly used to evaluate the LA stage do not always correlate with the final performance of the OMR system, and (iii) the proposed data-generation technique enables state-of-the-art results to be achieved with a limited set of labeled data.	翻訳日:2022-01-13 14:57:24 公開日:2022-01-11
# インシデント1M:自然災害・被害・インシデントを含む大規模画像データセット Incidents1M: a large-scale dataset of images with natural disasters, damage, and incidents ( http://arxiv.org/abs/2201.04236v1 ) ライセンス: Link先を確認	Ethan Weber, Dim P. Papadopoulos, Agata Lapedriza, Ferda Ofli, Muhammad Imran, Antonio Torralba	(参考訳) 洪水、竜巻、山火事などの自然災害は、地球が地球温暖化に陥るにつれてますます広まりつつある。事故の発生時期や発生時期を予測することは困難であり、破壊的な出来事によって危険にさらされている人々の命を救うために、時折緊急対応が重要となる。幸いなことに、このような状況ではテクノロジーが役割を担います。ソーシャルメディア投稿は、災害の進行と余波を理解するために低レイテンシデータソースとして使用できるが、このデータを解析するのは、自動化された方法なしでは面倒である。以前の研究はテキストベースのフィルタリングが中心だったが、画像とビデオベースのフィルタリングはほとんど未調査のままである。本研究では,43のインシデントと49のカテゴリを含む977,088の画像を含む大規模マルチラベルデータセットであるインシデント1Mデータセットを提案する。データセットの構築、統計、潜在的なバイアスの詳細、インシデント検出モデルの導入とトレーニング、flickrやtwitter上の数百万の画像に対するイメージフィルタリング実験を実施します。また,人道支援のためのコンピュータビジョンにおける今後の作業を促進するために,インシデント分析に関するいくつかの応用について述べる。コード、データ、モデルはhttp://incidentsdataset.csail.mit.eduで利用可能である。 Natural disasters, such as floods, tornadoes, or wildfires, are increasingly pervasive as the Earth undergoes global warming. It is difficult to predict when and where an incident will occur, so timely emergency response is critical to saving the lives of those endangered by destructive events. Fortunately, technology can play a role in these situations. Social media posts can be used as a low-latency data source to understand the progression and aftermath of a disaster, yet parsing this data is tedious without automated methods. Prior work has mostly focused on text-based filtering, yet image and video-based filtering remains largely unexplored. In this work, we present the Incidents1M Dataset, a large-scale multi-label dataset which contains 977,088 images, with 43 incident and 49 place categories. We provide details of the dataset construction, statistics and potential biases; introduce and train a model for incident detection; and perform image-filtering experiments on millions of images on Flickr and Twitter. 
We also present some applications on incident analysis to encourage and enable future work in computer vision for humanitarian aid. Code, data, and models are available at http://incidentsdataset.csail.mit.edu.	翻訳日:2022-01-13 14:57:07 公開日:2022-01-11
# デジタル人間アバターのバーチャルコプレゼンスへの応用に関する調査研究 A Survey on Applications of Digital Human Avatars toward Virtual Co-presence ( http://arxiv.org/abs/2201.04168v1 ) ライセンス: Link先を確認	Matthew Korban, Xin Li	(参考訳) 本稿では,対話型仮想コプレゼンス(VCP)環境に対するデジタルアバターの構築と利用について検討する。我々は, VCP環境構築技術の発展と, 人工知能(AI)とコンピュータグラフィックスの進歩が, VCP環境の品質に与える影響を評価する。文献における様々な手法を応用と方法論に基づいて分類し,その応用,貢献,限界に基づく様々なグループと戦略を比較した。また、デジタル人間のアバターではなく、他の形態の人間の表現がVCP環境で利用されるというアプローチについても、簡単な議論がなされている。我々の目標は、アバターベースのVCP環境を構築するための様々なアプローチを調査する文献レビューが不足している研究領域のギャップを埋めることである。 VCPやバーチャルリアリティ(VR)環境における人間の表現に関する今後の研究に役立つと期待する。我々の知る限りでは、アバターベースのVCP環境を調査する最初の調査である。具体的には,アバターベース手法の分類手法が新たに提案されている。 This paper investigates different approaches to build and use digital human avatars toward interactive Virtual Co-presence (VCP) environments. We evaluate the evolution of technologies for creating VCP environments and how the advancement in Artificial Intelligence (AI) and Computer Graphics affect the quality of VCP environments. We categorize different methods in the literature based on their applications and methodology and compare various groups and strategies based on their applications, contributions, and limitations. We also have a brief discussion about the approaches that other forms of human representation, rather than digital human avatars, have been utilized in VCP environments. Our goal is to fill the gap in the research domain where there is a lack of literature review investigating different approaches for creating avatar-based VCP environments. We hope this study will be useful for future research involving human representation in VCP or Virtual Reality (VR) environments. To the best of our knowledge, it is the first survey research that investigates avatar-based VCP environments. Specifically, the categorization methodology suggested in this paper for avatar-based methods is new.	翻訳日:2022-01-13 14:50:38 公開日:2022-01-11
# MDPose:WiFiマイクロドップラー信号を用いた人骨格運動再構成 MDPose: Human Skeletal Motion Reconstruction Using WiFi Micro-Doppler Signatures ( http://arxiv.org/abs/2201.04212v1 ) ライセンス: Link先を確認	Chong Tang, Wenda Li, Shelly Vishwakarma, Fangzhan Shi, Simon Julier, Kevin Chetty	(参考訳) 光学センサに基づくモーショントラッキングシステムは、通常、照明条件の悪さ、閉塞、カバー範囲の制限などの問題に苦しめられ、プライバシーの懸念が高まる。最近では、無線周波数(RF)ベースの商用WiFiデバイスによるアプローチが登場し、プライバシーを保護しながら、低コストでユビキタスなセンシングを提供する。しかし、Range-Doppler SpectrogramsのようなRFセンシングシステムの出力は直感的に人間の動きを表現することができず、通常はさらなる処理を必要とする。本研究では,WiFiマイクロドップラーシグネチャに基づくヒト骨格運動再建のための新しいフレームワークであるMDPoseを提案する。従来のRFセンシング出力の解釈をより理解しやすい方法で支援できる17個のキーポイントを持つ骨格モデルを再構築することで、人間の活動を追跡する効果的なソリューションを提供する。第一に、特徴抽出に影響を及ぼす可能性のある不要なノイズを除去し、弱いドップラーシグネチャを強化するために、デノイジングアルゴリズムが実装されている。次に、畳み込みニューラルネットワーク(cnn)-リカレントニューラルネットワーク(rnn)アーキテクチャを用いて、クリーンなマイクロドップラー署名から時間空間依存性を学習し、キーポイントの速度情報を復元する。最後に、スケルトンの初期状態を推定し、エラーの増加を制限するためにポーズ最適化機構を用いる。我々は,MDPoseの性能を示すために,複数の被験者を用いて様々な環境で総合的な実験を行い,29.4mm平均キーポイント位置における絶対誤差を報告した。 Motion tracking systems based on optical sensors often suffer from issues such as poor lighting conditions, occlusion, and limited coverage, and may raise privacy concerns. More recently, radio frequency (RF)-based approaches using commercial WiFi devices have emerged which offer low-cost ubiquitous sensing whilst preserving privacy. However, the output of an RF sensing system, such as Range-Doppler spectrograms, cannot represent human motion intuitively and usually requires further processing. In this study, MDPose, a novel framework for human skeletal motion reconstruction based on WiFi micro-Doppler signatures, is proposed. It provides an effective solution to track human activities by reconstructing a skeleton model with 17 key points, which can assist with the interpretation of conventional RF sensing outputs in a more understandable way. 
Specifically, MDPose has various incremental stages to gradually address a series of challenges: First, a denoising algorithm is implemented to remove any unwanted noise that may affect the feature extraction and enhance weak Doppler signatures. Secondly, the convolutional neural network (CNN)-recurrent neural network (RNN) architecture is applied to learn temporal-spatial dependency from clean micro-Doppler signatures and restore key points' velocity information. Finally, a pose optimising mechanism is employed to estimate the initial state of the skeleton and to limit the increase of error. We have conducted comprehensive tests in a variety of environments using numerous subjects with a single receiver radar system to demonstrate the performance of MDPose, and report 29.4mm mean absolute error over all key points positions, which outperforms state-of-the-art RF-based pose estimation systems.	翻訳日:2022-01-13 14:50:23 公開日:2022-01-11
# smartdet:モバイルオブジェクト検出のためのエッジタスクオフロードのコンテキストアウェア動的制御 SmartDet: Context-Aware Dynamic Control of Edge Task Offloading for Mobile Object Detection ( http://arxiv.org/abs/2201.04235v1 ) ライセンス: Link先を確認	Davide Callegaro and Francesco Restuccia and Marco Levorato	(参考訳) モバイルデバイスはますます、重要なタスクを実行するためにディープニューラルネットワーク(DNN)を介してオブジェクト検出(OD)に依存している。複雑さが高いため、これらのDNNの実行には過剰な時間とエネルギーが必要である。低複雑さオブジェクトトラッキング(OT)はODで使用することができ、後者はトラッキングのための"フレッシュ"参照を生成するために定期的に適用される。しかし、odで処理されたフレームには大きな遅延が発生し、基準が時代遅れとなり追跡品質が低下する可能性がある。本稿では、エッジコンピューティングをこの文脈で使用し、大規模なODレイテンシに耐性のある(モバイルデバイスで)並列OTとOD(エッジサーバで)プロセスを確立することを提案する。過度のod遅延に対するシステムのレジリエンスを向上させる新しいトラッキング機構であるkatch-upを提案する。しかし、Katch-Upは性能が大幅に向上する一方、モバイルデバイスの計算負荷も増大する。そこで我々は,資源利用とOD性能のトレードオフの制御を学習する深層強化学習(DRL)に基づく,低複雑さのコントローラであるSmartDetを設計する。 smartdetは、現在のビデオコンテンツと現在のネットワーク条件に関連する入力コンテキスト関連情報を取り、odオフロードの頻度とタイプを最適化し、katch-up利用を最適化する。我々は,JetSon Nanoをモバイルデバイスとして,GTX 980 Tiをエッジサーバとして,Wi-Fiリンクを介して接続した実世界のテストベッド上でSmartDetを広範囲に評価した。実験結果によると、SmartDetは、平均的リコール(mAR)とリソース使用量という、トラッキングパフォーマンスの最適なバランスを実現している。完全なKatch-Upusageと最大チャネル使用率を持つベースラインに関しては、チャネルの50%削減とKatch-Upに関連する30%の電力リソースを使用しながら、mARを4%増加させています。最小限の資源を用いた固定戦略では、フレームの1/3でKatch-Upを用いてmARを20%増加させる。 Mobile devices increasingly rely on object detection (OD) through deep neural networks (DNNs) to perform critical tasks. Due to their high complexity, the execution of these DNNs requires excessive time and energy. Low-complexity object tracking (OT) can be used with OD, where the latter is periodically applied to generate "fresh" references for tracking. However, the frames processed with OD incur large delays, which may make the reference outdated and degrade tracking quality. Herein, we propose to use edge computing in this context, and establish parallel OT (at the mobile device) and OD (at the edge server) processes that are resilient to large OD latency. We propose Katch-Up, a novel tracking mechanism that improves the system resilience to excessive OD delay. 
However, while Katch-Up significantly improves performance, it also increases the computing load of the mobile device. Hence, we design SmartDet, a low-complexity controller based on deep reinforcement learning (DRL) that learns to control the trade-off between resource utilization and OD performance. SmartDet takes as input context information about the current video content and the current network conditions to optimize the frequency and type of OD offloading, as well as Katch-Up utilization. We extensively evaluate SmartDet on a real-world testbed composed of a Jetson Nano as mobile device and a GTX 980 Ti as edge server, connected through a Wi-Fi link. Experimental results show that SmartDet achieves an optimal balance between tracking performance, measured as mean Average Recall (mAR), and resource usage. With respect to a baseline with full Katch-Up usage and maximum channel usage, we still increase mAR by 4% while using 50% less of the channel and 30% power resources associated with Katch-Up. With respect to a fixed strategy using minimal resources, we increase mAR by 20% while using Katch-Up on 1/3 of the frames.	翻訳日:2022-01-13 14:19:09 公開日:2022-01-11
# (参考訳) ヘイト音声識別のための特徴抽出に基づくモデル A Feature Extraction based Model for Hate Speech Identification ( http://arxiv.org/abs/2201.04227v1 ) ライセンス: CC BY 4.0	Salar Mohtaj, Vera Schmitt, Sebastian Möller	(参考訳) ネット上でヘイトスピーチを検出することは重要な課題となり、傷つき、わいせつ、侮辱的コンテンツといった攻撃的な言語は、疎外された人々やグループを傷つける可能性がある。本稿では,インド・ヨーロッパ言語2021におけるヘイトスピーチと攻撃的コンテンツの識別に関する共通タスクのタスク1a,1bについて,TU Berlinチームによる実験と結果について述べる。異なる自然言語処理モデルの成功は、競争を通じて各サブタスクに対して評価される。我々は,単語・文字レベルにおける再帰ニューラルネットワークに基づく異なるモデルと,競合によって提供されたデータセットに基づくBERTに基づくトランスファー学習アプローチをテストした。実験に使用した実験モデルのうち、転送学習に基づくモデルは両方のサブタスクで最良の結果を得た。 The detection of hate speech online has become an important task, as offensive language such as hurtful, obscene and insulting content can harm marginalized people or groups. This paper presents the TU Berlin team's experiments and results on tasks 1A and 1B of the shared task on hate speech and offensive content identification in Indo-European languages 2021. The success of different Natural Language Processing models is evaluated for the respective subtasks throughout the competition. We tested different models based on recurrent neural networks at the word and character levels, and transfer learning approaches based on BERT, on the dataset provided by the competition. Among the tested models, the transfer learning-based models achieved the best results in both subtasks.	翻訳日:2022-01-13 14:16:26 公開日:2022-01-11
# 統計学と機械学習でマネーロンダリングと戦う - 序文とレビュー Fighting Money-Laundering with Statistics and Machine Learning: An Introduction and Review ( http://arxiv.org/abs/2201.04207v1 ) ライセンス: Link先を確認	Rasmus Jensen and Alexandros Iosifidis	(参考訳) マネーロンダリングは深刻な世界的な問題だ。それでも、このトピックに関する統計的および機械学習の研究はほとんどない。本稿では,銀行におけるマネーロンダリング対策に着目する。この分野の既存の研究を整理するために,統一的な用語を提案し,文献のレビューを行う。これは2つの中心的なタスクを中心に構成されている。 (i)クライアントのリスク・プロファイリング (ii)不審な行動顧客リスクプロファイリングは、診断、すなわちリスク要因の発見と説明の努力によって特徴づけられる。一方、突発的な行動フラグングは、開示されていない特徴と手作りのリスク指標によって特徴付けられる。最後に,今後の研究の方向性について述べる。大きな課題のひとつは、公開データセットの欠如だ。これは、合成データ生成によって対処される可能性がある。その他の研究の方向性としては、半教師付き深層学習、解釈可能性、結果の公平性などがある。 Money laundering is a profound, global problem. Nonetheless, there is little statistical and machine learning research on the topic. In this paper, we focus on anti-money laundering in banks. To help organize existing research in the field, we propose a unifying terminology and provide a review of the literature. This is structured around two central tasks: (i) client risk profiling and (ii) suspicious behavior flagging. We find that client risk profiling is characterized by diagnostics, i.e., efforts to find and explain risk factors. Suspicious behavior flagging, on the other hand, is characterized by non-disclosed features and hand-crafted risk indices. Finally, we discuss directions for future research. One major challenge is the lack of public data sets. This may, potentially, be addressed by synthetic data generation. Other possible research directions include semi-supervised and deep learning, interpretability and fairness of the results.	翻訳日:2022-01-13 14:01:19 公開日:2022-01-11
# (参考訳) 悪天候のビジョン:各種物体検出器を用いたサイクロンGANによる自律走行の堅牢な認識 Vision in adverse weather: Augmentation using CycleGANs with various object detectors for robust perception in autonomous racing ( http://arxiv.org/abs/2201.03246v2 ) ライセンス: CC BY 4.0	Izzeddin Teeti, Valentina Musat, Salman Khan, Alexander Rast, Fabio Cuzzolin, Andrew Bradley	(参考訳) 自律運転システムでは、環境からの特徴や物体を識別する認識が重要である。自律レースでは、高速と小さなマージンは迅速かつ正確な検知システムを必要とする。レース中、天候は突然変化し、認識が著しく低下し、非効率な操作が生じる。悪天候の検出を改善するために、ディープラーニングベースのモデルは、通常、そのような状況でキャプチャされた広範なデータセットを必要とする。しかし、最近のCycleGANアーキテクチャは、複数の気象条件下で非常に現実的なシーンを合成することができる。そこで本研究では, 夜間条件下での5つの最先端検出器のうち4つを平均42.7と4.4mAPのパーセンテージで改善するため, 自律レースにおける合成悪条件データセット(CycleGANを用いた)を用いたアプローチを提案する。さらに,5つの物体検出器の比較分析を行い,自律走行時に使用する検出器の最適ペアリングとトレーニングデータの同定を行った。 In an autonomous driving system, perception - identification of features and objects from the environment - is crucial. In autonomous racing, high speeds and small margins demand rapid and accurate detection systems. During the race, the weather can change abruptly, causing significant degradation in perception, resulting in ineffective manoeuvres. In order to improve detection in adverse weather, deep-learning-based models typically require extensive datasets captured in such conditions - the collection of which is a tedious, laborious, and costly process. However, recent developments in CycleGAN architectures allow the synthesis of highly realistic scenes in multiple weather conditions. To this end, we introduce an approach of using synthesised adverse condition datasets in autonomous racing (generated using CycleGAN) to improve the performance of four out of five state-of-the-art detectors by an average of 42.7 and 4.4 mAP percentage points in the presence of night-time conditions and droplets, respectively. Furthermore, we present a comparative analysis of five object detectors - identifying the optimal pairing of detector and training data for use during autonomous racing in challenging conditions.	翻訳日:2022-01-13 12:38:25 公開日:2022-01-11
# (参考訳) ミラーラーニング:政策最適化の統一的枠組み Mirror Learning: A Unifying Framework of Policy Optimisation ( http://arxiv.org/abs/2201.02373v2 ) ライセンス: CC BY 4.0	Jakub Grudzien Kuba, Christian Schroeder de Witt, Jakob Foerster	(参考訳) 総合政策改善(GPI)と信頼領域学習(TRL)は、マルコフ決定プロセス(MDP)のコアモデルとして機能する、現代強化学習(RL)における主要なフレームワークである。残念なことに、それらの数学的形式は修正に敏感であるため、それらを実装する実用的なインスタンス化は自動的に改善保証を継承しない。その結果、利用可能な厳密なMDP溶媒のスペクトルは狭い。実際、TRPOやPPOのような多くの最先端(SOTA)アルゴリズムは収束することが証明されていない。本稿では,RL問題に対する一般解である「mirror learning」を提案する。我々は,GPI と TRL は,モノトニック改善特性を誇示し,最適ポリシーに収束する,このはるかに大きなアルゴリズム空間内の小さな点であることを明らかにした。 RLのための事実上全てのSOTAアルゴリズムがミラー学習の例であり、その経験的性能は近似的な類似ではなく理論的性質の結果であることを示す。興味深いことに、ミラー学習は、収束保証を伴う政策学習手法の全く新しい空間を開くことを示す。 General policy improvement (GPI) and trust-region learning (TRL) are the predominant frameworks within contemporary reinforcement learning (RL), which serve as the core models for solving Markov decision processes (MDPs). Unfortunately, in their mathematical form, they are sensitive to modifications, and thus, the practical instantiations that implement them do not automatically inherit their improvement guarantees. As a result, the spectrum of available rigorous MDP-solvers is narrow. Indeed, many state-of-the-art (SOTA) algorithms, such as TRPO and PPO, are not proven to converge. In this paper, we propose mirror learning -- a general solution to the RL problem. We reveal GPI and TRL to be but small points within this far greater space of algorithms which boasts the monotonic improvement property and converges to the optimal policy. We show that virtually all SOTA algorithms for RL are instances of mirror learning, and thus suggest that their empirical performance is a consequence of their theoretical properties, rather than of approximate analogies. Excitingly, we show that mirror learning opens up a whole new space of policy learning methods with convergence guarantees.	翻訳日:2022-01-13 00:34:26 公開日:2022-01-11
# (参考訳) セグメンテーション性能に対する事前ベース損失の影響:ベンチマーク Effect of Prior-based Losses on Segmentation Performance: A Benchmark ( http://arxiv.org/abs/2201.02428v3 ) ライセンス: CC BY 4.0	Rosana El Jurdi, Caroline Petitjean, Veronika Cheplygina, Paul Honeine, Fahed Abdallah	(参考訳) 今日、深層畳み込みニューラルネットワーク(cnns)は、様々な画像モードやタスクに基づいて、医用画像セグメンテーションの最先端のパフォーマンスを実証している。初期の成功にもかかわらず、セグメンテーションネットワークは依然として解剖学的に異常なセグメンテーションを生成し、オブジェクト境界付近に穴や不正確さがある。解剖学的可能性を強化するために、近年の研究は、損失関数の制約として、物体形状や境界などの事前知識を取り入れることに焦点を当てている。以前の統合は、基幹領域から抽出された再構成された表現を低レベル、または臓器の形状や大きさなどの外部医療情報を高レベルに表すことができる。過去数年間、事前の損失は、アーキテクチャに依存しながら専門家の知識の統合を可能にしているため、研究分野への関心が高まった。しかしながら、さまざまな医療画像の課題やタスクにおける事前ベース損失の多様性を考えると、どのデータセットに最適な損失を識別することが困難になっている。本稿では,医療画像分割における最近の先行的損失のベンチマークについて述べる。主な目的は、特定のタスクやデータセットに与えられた損失を選択するための直感を提供することである。この目的のために、4つの低レベルおよび高レベルの事前ベース損失が選択される。評価された損失は、Decathlon、ISLES、WMHチャレンジなど、さまざまな医療画像セグメンテーション課題から8つの異なるデータセットで検証される。その結果、低レベルの事前ベース損失はデータセット特性に関わらずDice損失ベースラインよりも性能が向上することを保証できるが、高レベルの事前ベース損失はデータ特性に応じて解剖学的信頼性が向上することが示された。 Today, deep convolutional neural networks (CNNs) have demonstrated state-of-the-art performance for medical image segmentation, on various imaging modalities and tasks. Despite early success, segmentation networks may still generate anatomically aberrant segmentations, with holes or inaccuracies near the object boundaries. To enforce anatomical plausibility, recent research studies have focused on incorporating prior knowledge such as object shape or boundary, as constraints in the loss function. Prior integrated could be low-level referring to reformulated representations extracted from the ground-truth segmentations, or high-level representing external medical information such as the organ's shape or size. Over the past few years, prior-based losses exhibited a rising interest in the research field since they allow integration of expert knowledge while still being architecture-agnostic. 
However, given the diversity of prior-based losses on different medical imaging challenges and tasks, it has become hard to identify what loss works best for which dataset. In this paper, we establish a benchmark of recent prior-based losses for medical image segmentation. The main objective is to provide intuition onto which losses to choose given a particular task or dataset. To this end, four low-level and high-level prior-based losses are selected. The considered losses are validated on 8 different datasets from a variety of medical image segmentation challenges including the Decathlon, the ISLES and the WMH challenge. Results show that whereas low-level prior-based losses can guarantee an increase in performance over the Dice loss baseline regardless of the dataset characteristics, high-level prior-based losses can increase anatomical plausibility as per data characteristics.	翻訳日:2022-01-13 00:05:16 公開日:2022-01-11
# (参考訳) エージェントエージェント時間決定における一般値関数を持つパブロフ信号伝達 Pavlovian Signalling with General Value Functions in Agent-Agent Temporal Decision Making ( http://arxiv.org/abs/2201.03709v1 ) ライセンス: CC BY 4.0	Andrew Butcher, Michael Bradley Johanson, Elnaz Davoodi, Dylan J. A. Brenneis, Leslie Acker, Adam S. R. Parker, Adam White, Joseph Modayil, Patrick M. Pilarski	(参考訳) 本稿では,パブロフ信号の多面的研究に寄与し,あるエージェントが他のエージェントから意思決定を通知する時間的拡張予測プロセスを提案する。信号は時間とタイミングに密接に関連している。信号の生成と受信を行う際、人間や他の動物は時間を表し、過去の出来事から時間を決定し、将来の刺激まで時間を予測し、時間とともに広がるパターンを認識し、生成することが知られている。時間的プロセスの違いが学習エージェント間の協調とシグナル伝達にどのように影響するかを,Frost Hollowと呼ばれる部分的に観測可能な意思決定ドメインを導入することによって検討する。このドメインでは、予測学習エージェントと強化学習エージェントとを、時間的ハザードを避けながらスパース報酬を得るための2部意思決定システムに結合する。 7状態線形歩行における機械エージェントの相互作用と,仮想現実環境における人間と機械の相互作用である。その結果,パブロフ信号の学習速度,時間的表現の違いがエージェントエージェント協調に与える影響,時間的エイリアシングがエージェントエージェントと人間エージェントの相互作用にどう影響するかが示された。主な貢献として、固定信号のパラダイムと2つのエージェント間の完全適応通信学習の自然なブリッジとしてパブロフ信号を確立する。さらに,高速な連続予測学習とエージェント受信信号の性質に関する最小限の制約を特徴とする,固定的な信号処理からこの適応的信号処理を計算的に構築する方法を示す。この結果から,強化学習エージェント間のコミュニケーション学習への実践的で建設的な道筋が示唆された。 In this paper, we contribute a multi-faceted study into Pavlovian signalling -- a process by which learned, temporally extended predictions made by one agent inform decision-making by another agent. Signalling is intimately connected to time and timing. In service of generating and receiving signals, humans and other animals are known to represent time, determine time since past events, predict the time until a future stimulus, and both recognize and generate patterns that unfold in time. We investigate how different temporal processes impact coordination and signalling between learning agents by introducing a partially observable decision-making domain we call the Frost Hollow. In this domain, a prediction learning agent and a reinforcement learning agent are coupled into a two-part decision-making system that works to acquire sparse reward while avoiding time-conditional hazards. 
We evaluate two domain variations: machine agents interacting in a seven-state linear walk, and human-machine interaction in a virtual-reality environment. Our results showcase the speed of learning for Pavlovian signalling, the impact that different temporal representations do (and do not) have on agent-agent coordination, and how temporal aliasing impacts agent-agent and human-agent interactions differently. As a main contribution, we establish Pavlovian signalling as a natural bridge between fixed signalling paradigms and fully adaptive communication learning between two agents. We further show how to computationally build this adaptive signalling process out of a fixed signalling process, characterized by fast continual prediction learning and minimal constraints on the nature of the agent receiving signals. Our results therefore suggest an actionable, constructivist path towards communication learning between reinforcement learning agents.	翻訳日:2022-01-12 19:25:53 公開日:2022-01-11
# (参考訳) 予算のオンライン変更点検出 Online Changepoint Detection on a Budget ( http://arxiv.org/abs/2201.03710v1 ) ライセンス: CC BY 4.0	Zhaohui Wang, Xiao Lin, Abhinav Mishra, Ram Sriharsha	(参考訳) 変更ポイントは、基礎となるデータの分布の急激なバリエーションである。データストリームの変更を検出することは、多くのアプリケーションにとって重要な問題である。本稿では,従来の観測回数とは無関係に,記憶条件と最悪の計算複雑性の両方を考慮し,オンライン環境で動作する変更点検出アルゴリズムに関心がある。そこで本研究では,オフライン・チェンジポイント検出アルゴリズムと好適な比較を行うとともに,厳密に制約された計算モデルで動作するオンライン・チェンジポイント検出アルゴリズムを提案する。さらに,これらのアルゴリズムに対する簡易なオンラインハイパーパラメータ自動チューニング手法を提案する。 Changepoints are abrupt variations in the underlying distribution of data. Detecting changes in a data stream is an important problem with many applications. In this paper, we are interested in changepoint detection algorithms which operate in an online setting in the sense that both its storage requirements and worst-case computational complexity per observation are independent of the number of previous observations. We propose an online changepoint detection algorithm for both univariate and multivariate data which compares favorably with offline changepoint detection algorithms while also operating in a strictly more constrained computational model. In addition, we present a simple online hyperparameter auto tuning technique for these algorithms.	翻訳日:2022-01-12 19:09:38 公開日:2022-01-11
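As a rough illustration of the constraint stated in the entry above (storage and per-observation cost independent of the number of past observations), the following Python sketch uses a classical two-sided CUSUM detector with Welford running statistics. This is a swapped-in textbook technique, not the algorithm proposed in the paper, which the abstract does not describe. The class name `OnlineCusum`, the helper `detect`, and the `drift`, `threshold`, and `warmup` defaults are all hypothetical choices made for illustration.

```python
import math
from typing import Iterable, List


class OnlineCusum:
    """Two-sided CUSUM detector (illustrative, not the paper's method).

    The state is five numbers, so memory and per-observation work stay
    constant no matter how many observations have been seen.
    """

    def __init__(self, drift: float = 0.5, threshold: float = 8.0,
                 warmup: int = 20) -> None:
        if warmup < 2:
            raise ValueError("warmup must be at least 2 to estimate a variance")
        self.drift = drift          # slack subtracted from every z-score
        self.threshold = threshold  # alarm level for the cumulative sums
        self.warmup = warmup        # observations used before testing starts
        self._reset()

    def _reset(self) -> None:
        self.n = 0       # observations in the current segment
        self.mean = 0.0  # running mean (Welford)
        self.m2 = 0.0    # running sum of squared deviations (Welford)
        self.pos = 0.0   # cumulative evidence for an upward shift
        self.neg = 0.0   # cumulative evidence for a downward shift

    def update(self, x: float) -> bool:
        """Consume one observation; return True if a change is flagged."""
        if self.n >= self.warmup:
            # Fall back to 1.0 if the segment has had zero variance so far.
            std = math.sqrt(self.m2 / (self.n - 1)) or 1.0
            z = (x - self.mean) / std
            self.pos = max(0.0, self.pos + z - self.drift)
            self.neg = max(0.0, self.neg - z - self.drift)
            if max(self.pos, self.neg) > self.threshold:
                # Start a fresh segment; the triggering sample is discarded.
                self._reset()
                return True
        # Welford update of the running mean and squared deviations.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return False


def detect(stream: Iterable[float]) -> List[int]:
    """Return the indices at which the default detector raises an alarm."""
    detector = OnlineCusum()
    return [i for i, x in enumerate(stream) if detector.update(x)]
```

Calling `detect` on any iterable of floats returns the alarm indices. Resetting after each alarm is one simple way to handle multiple changepoints, at the cost of a blind warm-up window after every detection.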
# (参考訳) フェデレーション学習における部分モデル平均化:パフォーマンス保証とメリット Partial Model Averaging in Federated Learning: Performance Guarantees and Benefits ( http://arxiv.org/abs/2201.03789v1 ) ライセンス: CC BY 4.0	Sunwoo Lee, Anit Kumar Sahu, Chaoyang He, and Salman Avestimehr	(参考訳) 周期モデル平均化(FedAvg)を用いた局所確率勾配決定(SGD)は、フェデレートラーニングにおける基礎的アルゴリズムである。アルゴリズムは独立して複数のワーカー上でsgdを実行し、すべてのワーカーに対して定期的にモデルを平均化する。しかし、局所的なSGDが多くの労働者と共に実行されると、周期的な平均化は労働者間で重要なモデル差を引き起こし、グローバルな損失は緩やかに収束する。最近の高度な最適化手法は、非IID設定に焦点をあてた問題に対処しているが、根底にある周期モデル平均化によるモデル差の問題はまだ残っている。フェデレートラーニングにおけるモデルの相違を緩和する部分モデル平均化フレームワークを提案する。部分平均化により、局所モデル同士がパラメータ空間に近接することを奨励し、より効果的にグローバル損失を最小化することができる。一定回数の反復と多数の作業者(128)が与えられた場合、部分的平均は周期的全平均よりも最大2.2%高い検証精度が得られる。 Local Stochastic Gradient Descent (SGD) with periodic model averaging (FedAvg) is a foundational algorithm in Federated Learning. The algorithm independently runs SGD on multiple workers and periodically averages the model across all the workers. When local SGD runs with many workers, however, the periodic averaging causes a significant model discrepancy across the workers making the global loss converge slowly. While recent advanced optimization methods tackle the issue focused on non-IID settings, there still exists the model discrepancy issue due to the underlying periodic model averaging. We propose a partial model averaging framework that mitigates the model discrepancy issue in Federated Learning. The partial averaging encourages the local models to stay close to each other on parameter space, and it enables to more effectively minimize the global loss. Given a fixed number of iterations and a large number of workers (128), the partial averaging achieves up to 2.2% higher validation accuracy than the periodic full averaging.	翻訳日:2022-01-12 19:01:21 公開日:2022-01-11
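To make the contrast between periodic full averaging and partial averaging in the entry above more concrete, here is a small Python sketch on a toy problem. It assumes one possible partial scheme: the parameter vector is split into `period` equal chunks and one chunk is averaged across workers at every step, so the communication volume per period matches full averaging. The chunking rule, the quadratic toy loss, and all function names are assumptions for illustration; the abstract does not specify the actual scheme.

```python
import random
from typing import List

Vector = List[float]


def local_sgd_step(w: Vector, target: Vector, lr: float) -> Vector:
    """One gradient step on the toy loss 0.5 * ||w - target||^2."""
    return [wi - lr * (wi - ti) for wi, ti in zip(w, target)]


def average_slice(models: List[Vector], start: int, stop: int) -> None:
    """Average coordinates [start, stop) across all workers, in place."""
    n = len(models)
    for j in range(start, stop):
        mean_j = sum(m[j] for m in models) / n
        for m in models:
            m[j] = mean_j


def discrepancy(models: List[Vector]) -> float:
    """Mean squared distance of the workers from their average model."""
    n, d = len(models), len(models[0])
    mean = [sum(m[j] for m in models) / n for j in range(d)]
    return sum((m[j] - mean[j]) ** 2 for m in models for j in range(d)) / n


def train(partial: bool, workers: int = 8, dim: int = 8, period: int = 4,
          steps: int = 40, lr: float = 0.1, seed: int = 0) -> float:
    """Run local SGD and return the largest discrepancy seen after any step."""
    if dim % period != 0:
        raise ValueError("dim must be divisible by period in this sketch")
    rng = random.Random(seed)
    # Each worker gets its own optimum, mimicking non-identical local data.
    targets = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(workers)]
    models = [[0.0] * dim for _ in range(workers)]
    chunk = dim // period  # hypothetical rule: one chunk synchronized per step
    worst = 0.0
    for t in range(steps):
        models = [local_sgd_step(m, tg, lr) for m, tg in zip(models, targets)]
        if partial:
            k = t % period
            average_slice(models, k * chunk, (k + 1) * chunk)
        elif (t + 1) % period == 0:
            average_slice(models, 0, dim)
        worst = max(worst, discrepancy(models))
    return worst
```

On this toy problem, `train(partial=True)` returns a smaller peak discrepancy than `train(partial=False)`, because no coordinate goes more than `period - 1` steps without synchronization and at least one chunk is freshly averaged at every step. This says nothing about accuracy on real models, which is what the paper measures.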
# (参考訳) 物体検出と転移学習を用いたビールボトルの分類 Classification of Beer Bottles using Object Detection and Transfer Learning ( http://arxiv.org/abs/2201.03791v1 ) ライセンス: CC BY 4.0	Philipp Hohlfeld, Tobias Ostermeier, Dominik Brandl	(参考訳) 分類問題はコンピュータビジョンでよく見られる。それにもかかわらず、ビール瓶の分類には専用の作業はない。マスターコースのDeep Learningの課題の一環として、5207個のビールボトルの画像とブランドラベルのデータセットが作成された。画像にはちょうど1つのビールボトルが含まれています。本稿では,ビールボトルの画像を2段階のアプローチで分類する深層学習モデルを提案する。最初のステップとして、Faster-R-CNNはブランドとは独立して分類に関連する画像区間を検出する。第2ステップでは、関連する画像セクションをResNet-18で分類する。最も信頼度の高い画像セクションはクラスラベルとして返される。最終テストデータセットの課題において、古典的な1ステップのトランスファー学習アプローチを超越し、99.86%の精度に達したモデルを提案する。挑戦が終わった後、100%の精度で達成できた。 Classification problems are common in Computer Vision. Despite this, there is no dedicated work for the classification of beer bottles. As part of the challenge of the master course Deep Learning, a dataset of 5207 beer bottle images and brand labels was created. An image contains exactly one beer bottle. In this paper we present a deep learning model which classifies pictures of beer bottles in a two step approach. As the first step, a Faster-R-CNN detects image sections relevant for classification independently of the brand. In the second step, the relevant image sections are classified by a ResNet-18. The image section with the highest confidence is returned as class label. We propose a model, with which we surpass the classic one step transfer learning approach and reached an accuracy of 99.86% during the challenge on the final test dataset. We were able to achieve 100% accuracy after the challenge ended.	翻訳日:2022-01-12 18:59:54 公開日:2022-01-11
# (参考訳) 超解像に対する効率的な非局所コントラストアテンション Efficient Non-Local Contrastive Attention for Image Super-Resolution ( http://arxiv.org/abs/2201.03794v1 ) ライセンス: CC BY 4.0	Bin Xia, Yucheng Hang, Yapeng Tian, Wenming Yang, Qingmin Liao, Jie Zhou	(参考訳) 非局所的注意(NLA)は、自然画像の内在的特徴相関を利用して、単一画像超解法(SISR)に大きな改善をもたらす。しかし、NLAはノイズの多い情報を提供し、入力サイズに関して二次計算資源を消費し、その性能と応用を制限する。本稿では,長期ビジュアルモデリングを行い,より関連性の高い非局所的特徴を活用するための,効率的な非局所的コントラスト注意(ENLCA)を提案する。具体的には、ENLCAは、効率的な非局所的注意(ENLA)とスパース集約(Sparse Aggregation)の2つの部分から構成される。 ENLAは指数関数を近似するためにカーネル法を採用し、線形計算複雑性を得る。 Sparse Aggregationでは、増幅係数で入力を乗算して情報的特徴にフォーカスするが、近似のばらつきは指数関数的に増加する。したがって、コントラスト学習は、さらに関係性および無関係な特徴を分離するために適用される。 ENLCAの有効性を示すため,簡単なバックボーンにいくつかのモジュールを追加することで,ENLCN(Efficient Non-Local Contrastive Network)と呼ばれるアーキテクチャを構築した。実験結果から,ENLCNは定量評価と定性評価の両方において,最先端手法よりも優れた性能を示した。 Non-Local Attention (NLA) brings significant improvement for Single Image Super-Resolution (SISR) by leveraging intrinsic feature correlation in natural images. However, NLA gives noisy information large weights and consumes quadratic computation resources with respect to the input size, limiting its performance and application. In this paper, we propose a novel Efficient Non-Local Contrastive Attention (ENLCA) to perform long-range visual modeling and leverage more relevant non-local features. Specifically, ENLCA consists of two parts, Efficient Non-Local Attention (ENLA) and Sparse Aggregation. ENLA adopts the kernel method to approximate exponential function and obtains linear computation complexity. For Sparse Aggregation, we multiply inputs by an amplification factor to focus on informative features, yet the variance of approximation increases exponentially. Therefore, contrastive learning is applied to further separate relevant and irrelevant features. To demonstrate the effectiveness of ENLCA, we build an architecture called Efficient Non-Local Contrastive Network (ENLCN) by adding a few of our modules in a simple backbone. 
Extensive experimental results show that ENLCN reaches superior performance over state-of-the-art approaches on both quantitative and qualitative evaluations.	翻訳日:2022-01-12 18:52:16 公開日:2022-01-11
# (参考訳) CI-AVSR:車内コマンド認識のためのカントン音声・ビジュアル音声データセット CI-AVSR: A Cantonese Audio-Visual Speech Dataset for In-car Command Recognition ( http://arxiv.org/abs/2201.03804v1 ) ライセンス: CC BY 4.0	Wenliang Dai, Samuel Cahyawijaya, Tiezheng Yu, Elham J. Barezi, Peng Xu, Cheuk Tung Shadow Yiu, Rita Frieske, Holy Lovenia, Genta Indra Winata, Qifeng Chen, Xiaojuan Ma, Bertram E. Shi, Pascale Fung	(参考訳) ディープラーニングとインテリジェントな車両の台頭により、スマートアシスタントは、運転を容易にし、余分な機能を提供するために、車内コンポーネントとして不可欠なものになっている。車内スマートアシスタントは、運転を楽にし、安全性を向上させるために、一般および車関連コマンドを処理し、対応するアクションを実行することができるべきである。しかし、低リソース言語にはデータ不足の問題があり、研究やアプリケーションの開発を妨げている。本稿では,Cantonese言語における車内コマンド認識のための新しいデータセットであるCantonese In-car Audio-Visual Speech Recognition (CI-AVSR)を導入する。カントン語話者30人が記録した200の車載コマンドの4,984サンプル(8.3時間)で構成されている。さらに,実環境をシミュレートするために,車内背景雑音を用いたデータセットの拡張を行い,収集したデータより10倍大きいデータセットを生成する。当社のデータセットのクリーンバージョンと拡張バージョンの両方に関する詳細な統計情報を提供しています。さらに,CI-AVSRの有効性を示すために,2つのマルチモーダルベースラインを実装した。実験の結果,視覚信号の活用により,モデル全体の性能が向上することがわかった。私たちの最良のモデルはクリーンなテストセットでかなりの品質を達成できますが、ノイズの多いデータの音声認識品質はいまだに劣っており、実際の車内音声認識システムにとって非常に困難なタスクです。データセットとコードはhttps://github.com/HLTCHKUST/CI-AVSRで公開される。 With the rise of deep learning and intelligent vehicle, the smart assistant has become an essential in-car component to facilitate driving and provide extra functionalities. In-car smart assistants should be able to process general as well as car-related commands and perform corresponding actions, which eases driving and improves safety. However, there is a data scarcity issue for low resource languages, hindering the development of research and applications. In this paper, we introduce a new dataset, Cantonese In-car Audio-Visual Speech Recognition (CI-AVSR), for in-car command recognition in the Cantonese language with both video and audio data. It consists of 4,984 samples (8.3 hours) of 200 in-car commands recorded by 30 native Cantonese speakers. 
Furthermore, we augment our dataset using common in-car background noises to simulate real environments, producing a dataset 10 times larger than the collected one. We provide detailed statistics of both the clean and the augmented versions of our dataset. Moreover, we implement two multimodal baselines to demonstrate the validity of CI-AVSR. Experiment results show that leveraging the visual signal improves the overall performance of the model. Although our best model can achieve a considerable quality on the clean test set, the speech recognition quality on the noisy data is still inferior and remains as an extremely challenging task for real in-car speech recognition systems. The dataset and code will be released at https://github.com/HLTCHKUST/CI-AVSR.	翻訳日:2022-01-12 18:38:33 公開日:2022-01-11
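The noise augmentation mentioned in the entry above can be sketched as mixing a background-noise recording into a clean clip at a chosen signal-to-noise ratio. The Python example below is a minimal sketch under that assumption. The abstract does not say how the noise was mixed or at which levels, so the function names `power` and `mix_at_snr`, the tiling of short noise clips, and the use of an SNR target are all hypothetical.

```python
import math
from typing import List, Sequence


def power(signal: Sequence[float]) -> float:
    """Mean squared amplitude of a signal."""
    return sum(s * s for s in signal) / len(signal)


def mix_at_snr(speech: Sequence[float], noise: Sequence[float],
               snr_db: float) -> List[float]:
    """Add noise to speech so that the result has the requested SNR in dB."""
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    # Repeat the noise clip so that it covers the whole utterance.
    tiled = [noise[i % len(noise)] for i in range(len(speech))]
    p_speech, p_noise = power(speech), power(tiled)
    if p_noise == 0.0:
        return list(speech)  # silent noise: nothing to mix in
    # Scale the noise so that p_speech / (gain^2 * p_noise) = 10^(snr_db / 10).
    gain = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [s + gain * n for s, n in zip(speech, tiled)]
```

Applying such a function to each clean sample with several noise clips or SNR levels would multiply the dataset size; how the tenfold figure in the abstract was actually obtained is not stated there.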
# (参考訳) メタラーニングによる情報グラフ拡張のブートストラップ Bootstrapping Informative Graph Augmentation via A Meta Learning Approach ( http://arxiv.org/abs/2201.03812v1 ) ライセンス: CC BY 4.0	Hang Gao, Jiangmeng Li, Wenwen Qiang, Lingyu Si, Changwen Zheng, Fuchun Sun	(参考訳) 近年の研究では、グラフ表現を自己教師型で学習する。グラフコントラスト学習では、ベンチマーク手法は様々なグラフ拡張アプローチを適用する。しかし、増分法のほとんどは学習不可能であり、不便な増分グラフを生成する問題を引き起こす。このような拡張は、グラフコントラスト学習法の表現能力を低下させる可能性がある。そこで本稿では,Meta Graph Augmentation (MEGA) と呼ばれる学習可能なグラフ拡張器を用いてグラフを生成する方法を提案する。そして、"良い"グラフ拡張は、インスタンスレベルでは均一で、機能レベルではインフォメーション性を持つ必要があります。そこで本研究では,一様性,情報性に富んだ拡張を生成できるグラフ強化器の学習手法を提案する。グラフ拡張器の目的は,特徴抽出ネットワークを促進させ,より差別的な特徴表現を学習することであり,メタラーニングパラダイムを提案する動機となっている。実験的に、複数のベンチマークデータセットに対する実験は、MEGAがグラフ自己教師付き学習タスクにおいて最先端の手法よりも優れていることを示した。さらなる実験的研究により、MEGAの異なる用語の有効性が証明された。 Recent works explore learning graph representations in a self-supervised manner. In graph contrastive learning, benchmark methods apply various graph augmentation approaches. However, most of the augmentation methods are non-learnable, which causes the issue of generating unbeneficial augmented graphs. Such augmentation may degenerate the representation ability of graph contrastive learning methods. Therefore, we motivate our method to generate augmented graph by a learnable graph augmenter, called MEta Graph Augmentation (MEGA). We then clarify that a "good" graph augmentation must have uniformity at the instance-level and informativeness at the feature-level. To this end, we propose a novel approach to learning a graph augmenter that can generate an augmentation with uniformity and informativeness. The objective of the graph augmenter is to promote our feature extraction network to learn a more discriminative feature representation, which motivates us to propose a meta-learning paradigm. Empirically, the experiments across multiple benchmark datasets demonstrate that MEGA outperforms the state-of-the-art methods in graph self-supervised learning tasks. 
Further experimental studies prove the effectiveness of different terms of MEGA.	翻訳日:2022-01-12 18:25:47 公開日:2022-01-11
# (参考訳) 機械学習を用いたトルコ語の感情分析:オンライン食品注文サイトレビューへの適用 Turkish Sentiment Analysis Using Machine Learning Methods: Application on Online Food Order Site Reviews ( http://arxiv.org/abs/2201.03848v1 ) ライセンス: CC BY 4.0	Özlem Aktaş, Berkay Coşkuner, İlker Soner	(参考訳) あらゆるセクターで今日現れる満足度測定は、多くの企業にとって非常に重要な要素だ。本研究では,Yemek Sepetiのデータとこれらのデータのバリエーションを用いて,さまざまな機械学習アルゴリズムで最高の精度に達することを目的とした。各アルゴリズムの精度は、使用する自然言語処理手法とともに算出された。これらの精度を計算しながら、使用するアルゴリズムのパラメータを最適化しようと試みた。この研究でトレーニングされたラベル付きデータに関するモデルは、ラベルなしデータで使用することができ、顧客満足度を測定するアイデアを企業に提供することができる。 3つの異なる自然言語処理手法が適用され,ほとんどのモデルにおいて約5%の精度向上が得られた。 Satisfaction measurement, which emerges in every sector today, is a very important factor for many companies. In this study, it is aimed to reach the highest accuracy rate with various machine learning algorithms by using the data on Yemek Sepeti and variations of this data. The accuracy values of each algorithm were calculated together with the various natural language processing methods used. While calculating these accuracy values, the parameters of the algorithms used were tried to be optimized. The models trained in this study on labeled data can be used on unlabeled data and can give companies an idea in measuring customer satisfaction. It was observed that 3 different natural language processing methods applied resulted in approximately 5% accuracy increase in most of the developed models.	翻訳日:2022-01-12 18:10:07 公開日:2022-01-11
# (参考訳) 野生の文書のWebジャンル識別のためのGINCOトレーニングデータセット The GINCO Training Dataset for Web Genre Identification of Documents Out in the Wild ( http://arxiv.org/abs/2201.03857v1 ) ライセンス: CC BY-SA 4.0	Taja Kuzman, Peter Rupnik and Nikola Ljubešić	(参考訳) 本稿では,65万語からなる1,125件のクロールされたスロベニア語 web 文書に基づく,自動ジャンル識別用の新しい学習データセットGINCOを提案する。各ドキュメントは、既存のスキーマ上に構築された新しいアノテーションスキーマを使って、ジャンル向けに手作業で注釈付けされ、主にラベルの明確さとアノテータ間の合意を念頭に置いている。このデータセットは、機械翻訳コンテンツ、エンコーディングエラー、一つの文書に表示される複数のコンテンツなど、Webベースのデータに関連するさまざまな課題で構成され、現実的な条件下での分類器の評価を可能にする。データセット上の最初の機械学習実験では、(1)プリトランスフォーマモデルでは、マクロf1メトリクスが約0.22で、一方、トランスフォーマベースモデルは約0.58点、(2)マルチリンガルトランスフォーマモデルは、これまで標準nlpタスクでマルチリンガルモデルよりも優れていることが証明されていた単言語モデルと同様に、タスクでも動作する、という結果が得られた。 This paper presents a new training dataset for automatic genre identification GINCO, which is based on 1,125 crawled Slovenian web documents that consist of 650 thousand words. Each document was manually annotated for genre with a new annotation schema that builds upon existing schemata, having primarily clarity of labels and inter-annotator agreement in mind. The dataset consists of various challenges related to web-based data, such as machine translated content, encoding errors, multiple contents presented in one document etc., enabling evaluation of classifiers in realistic conditions. The initial machine learning experiments on the dataset show that (1) pre-Transformer models are drastically less able to model the phenomena, with macro F1 metrics ranging around 0.22, while Transformer-based models achieve scores of around 0.58, and (2) multilingual Transformer models work as well on the task as the monolingual models that were previously proven to be superior to multilingual models on standard NLP tasks.	翻訳日:2022-01-12 18:02:17 公開日:2022-01-11
# (参考訳) eegからの感情推定 ---saliencyと組み合わせた二重ディープラーニングアプローチ Emotion Estimation from EEG -- A Dual Deep Learning Approach Combined with Saliency ( http://arxiv.org/abs/2201.03891v1 ) ライセンス: CC BY 4.0	Victor Delvigne, Antoine Facchini, Hazem Wannous, Thierry Dutoit, Laurence Ris and Jean-Philippe Vandeborre	(参考訳) 感情の推定は、人間とコンピュータの相互作用に重要な影響を与える研究の活発な分野である。感情を評価するための異なるモダリティの中で、脳波(EEG)は過去10年間に動機づけた結果を示した。脳波による感情推定は、特定の疾患の診断やリハビリに役立つ。本稿では,コンピュータビジョンに特化した新しい深層学習(DL)モデルと,専門家が定義する生理的知識を考慮した2つの手法を提案する。モデル塩分分析により共同学習が強化された。グローバルなアプローチを提案するため、このモデルは4つの公開データセットで評価され、最先端のアプローチと同じような結果が得られ、より高い安定性を反映した標準偏差の低い2つのデータセットに対して性能が向上する。本論文で提案するコードとモデルは再現性のためにgithub.com/VDelv/Emotion-EEGで公開されている。 Emotion estimation is an active field of research that has an important impact on the interaction between human and computer. Among the different modalities to assess emotion, electroencephalogram (EEG) representing the electrical brain activity presented motivating results during the last decade. Emotion estimation from EEG could help in the diagnosis or rehabilitation of certain diseases. In this paper, we propose a dual method considering the physiological knowledge defined by specialists combined with novel deep learning (DL) models initially dedicated to computer vision. The joint learning has been enhanced with model saliency analysis. To present a global approach, the model has been evaluated on four publicly available datasets and achieves similar results to the state-of-the-art approaches and outperforming results for two of the proposed datasets with a lower standard deviation that reflects higher stability. For the sake of reproducibility, the codes and models proposed in this paper are available at github.com/VDelv/Emotion-EEG.	翻訳日:2022-01-12 17:44:07 公開日:2022-01-11
# (参考訳) オートエンコーダ入門 An Introduction to Autoencoders ( http://arxiv.org/abs/2201.03898v1 ) ライセンス: CC BY 4.0	Umberto Michelucci	(参考訳) 本稿では,オートエンコーダについて述べる。本稿では,オートエンコーダの基本概念と数学について述べる。それらは何か、制限は何か、典型的なユースケースについて議論し、いくつかの例を見ていきます。まず、オートエンコーダの一般的な紹介から始め、出力層におけるアクティベーション関数の役割と損失関数について議論する。次に、再構築エラーとは何かについて議論する。最後に, 典型的な応用として, 次元の縮小, 分類, 分節化, 異常検出について考察する。本論文は2021年に与えられたオートエンコーダに関する博士号レベルの講義のノートを含む。 In this article, we will look at autoencoders. This article covers the mathematics and the fundamental concepts of autoencoders. We will discuss what they are, what the limitations are, the typical use cases, and we will look at some examples. We will start with a general introduction to autoencoders, and we will discuss the role of the activation function in the output layer and the loss function. We will then discuss what the reconstruction error is. Finally, we will look at typical applications as dimensionality reduction, classification, denoising, and anomaly detection. This paper contains the notes of a PhD-level lecture on autoencoders given in 2021.	翻訳日:2022-01-12 17:32:36 公開日:2022-01-11
# (参考訳) 私の心はどこにありますか. 脳活動からの視覚的注意の予測 Where Is My Mind (looking at)? Predicting Visual Attention from Brain Activity ( http://arxiv.org/abs/2201.03902v1 ) ライセンス: CC BY 4.0	Victor Delvigne, Noé Tits, Luca La Fisca, Nathan Hubens, Antoine Maiorca, Hazem Wannous, Thierry Dutoit and Jean-Philippe Vandeborre	(参考訳) 視覚的注意の推定は、コンピュータビジョン、人工知能、医学といった様々な分野の研究の活発な分野である。注目度マップを推定する最も一般的なアプローチの1つは、観察された画像に基づいている。本稿では,脳波の取得から視覚的注意を抽出できることを示す。結果は観測された画像からの従来の予測に匹敵する。この目的のために一連の信号が記録され、視覚注意と脳活動の関係を研究するために異なるモデルが開発されている。結果は奨励的であり、他のモダリティによる注意を推定する他のアプローチと比較できる。この論文で検討されているコードとデータセットは、この分野の研究を促進するために https://figshare.com/s/3e353bd1c621962888ad で利用可能である。 Visual attention estimation is an active field of research at the crossroads of different disciplines: computer vision, artificial intelligence and medicine. One of the most common approaches to estimate a saliency map representing attention is based on the observed images. In this paper, we show that visual attention can be retrieved from EEG acquisition. The results are comparable to traditional predictions from observed images, which is of great interest. For this purpose, a set of signals has been recorded and different models have been developed to study the relationship between visual attention and brain activity. The results are encouraging and comparable with other approaches estimating attention with other modalities. The codes and dataset considered in this paper have been made available at https://figshare.com/s/3e353bd1c621962888ad to promote research in the field.	翻訳日:2022-01-12 17:20:19 公開日:2022-01-11
# (参考訳) 適応正と負のサンプルを用いたコントラスト学習に基づく特徴抽出フレームワーク Feature Extraction Framework based on Contrastive Learning with Adaptive Positive and Negative Samples ( http://arxiv.org/abs/2201.03942v1 ) ライセンス: CC BY 4.0	Hongjie Zhang	(参考訳) 本研究では,教師なし・教師あり・半教師付き単一視点特徴抽出に適した適応正負サンプル(CL-FEFA)を用いたコントラスト学習に基づく特徴抽出フレームワークを提案する。 CL-FEFAは、特徴抽出の結果から正および負のサンプルを適応的に構成し、より適切かつ正確である。その後、前回の正および負のサンプルに基づいてInfoNCEの損失に基づいて識別特性を再抽出し、クラス内サンプルをよりコンパクト化し、クラス間サンプルをより分散させる。同時に、サブスペースサンプルの潜在的構造情報を用いて、正および負のサンプルを動的に構築することで、我々のフレームワークはノイズの多いデータに対してより堅牢になる。さらに、CL-FEFAは、潜在的な構造に類似したサンプルである正のサンプル間の相互情報を考慮し、特徴抽出の利点を理論的に支持する。最終数値実験により,提案手法は従来の特徴抽出法やコントラスト学習法よりも大きなアドバンテージを持つことが示された。 In this study, we propose a feature extraction framework based on contrastive learning with adaptive positive and negative samples (CL-FEFA) that is suitable for unsupervised, supervised, and semi-supervised single-view feature extraction. CL-FEFA adaptively constructs the positive and negative samples from the results of feature extraction, which makes it more appropriate and accurate. Thereafter, the discriminative features are re-extracted according to the InfoNCE loss based on the previous positive and negative samples, which will make the intra-class samples more compact and the inter-class samples more dispersed. At the same time, using the potential structure information of subspace samples to dynamically construct positive and negative samples can make our framework more robust to noisy data. Furthermore, CL-FEFA considers the mutual information between positive samples, that is, similar samples in potential structures, which provides theoretical support for its advantages in feature extraction. The final numerical experiments prove that the proposed framework has a strong advantage over the traditional feature extraction methods and contrastive learning methods.	翻訳日:2022-01-12 17:05:33 公開日:2022-01-11
# (参考訳) アクティブ強化学習--興味ある自己適応のための分類システムへのロードマップ Active Reinforcement Learning -- A Roadmap Towards Curious Classifier Systems for Self-Adaptation ( http://arxiv.org/abs/2201.03947v1 ) ライセンス: CC BY 4.0	Simon Reichhuber, Sven Tomforde	(参考訳) インテリジェントなシステムには、観察や経験、明示的なフィードバックを考慮して、時間とともに行動を改善する能力がある。従来のアプローチでは、学習問題を分離し、強化学習、アクティブ学習、異常検出、転送学習など、さまざまな分野の機械学習から分離したテクニックを使用する。このような状況下では、基本的な強化学習アプローチには、試行錯誤、純粋なリアクティブな振る舞い、分離された問題処理といった、現実のシステムへの応用を妨げるいくつかの欠点がある。本稿では,知的システムにおける「能動的強化学習」という研究課題を立案し,これらの欠点を軽減するための概念を提案する。 Intelligent systems have the ability to improve their behaviour over time taking observations, experiences or explicit feedback into account. Traditional approaches separate the learning problem and make isolated use of techniques from different field of machine learning such as reinforcement learning, active learning, anomaly detection or transfer learning, for instance. In this context, the fundamental reinforcement learning approaches come with several drawbacks that hinder their application to real-world systems: trial-and-error, purely reactive behaviour or isolated problem handling. The idea of this article is to present a concept for alleviating these drawbacks by setting up a research agenda towards what we call "active reinforcement learning" in intelligent systems.	翻訳日:2022-01-12 16:51:57 公開日:2022-01-11
# (参考訳) 不均衡データに対するマルチグラニュラリティリラベルアンダーサンプリングアルゴリズム Multi-granularity Relabeled Under-sampling Algorithm for Imbalanced Data ( http://arxiv.org/abs/2201.03957v1 ) ライセンス: CC BY 4.0	Qi Dai, Jian-wei Liu, Yang Liu	(参考訳) 不均衡な分類問題は、データマイニングと機械学習において重要かつ困難な問題の1つであることが判明した。従来の分類器の性能は、クラス不均衡問題、クラスオーバーラップ、ノイズなど、多くのデータ問題に大きく影響を受ける。 tomek-linkアルゴリズムは、提案時にデータクリーニングにのみ使用された。近年,Tomek-Linkアルゴリズムとサンプリング手法の組み合わせが報告されている。 Tomek-Linkサンプリングアルゴリズムは、データ上のクラスオーバーラップを効果的に低減し、識別が難しい多数インスタンスを除去し、アルゴリズムの分類精度を向上させる。しかし、tomek-linksのアンダーサンプリングアルゴリズムは、互いに最も近い境界インスタンスのみをグローバルに考慮し、潜在的に重複するインスタンスを無視する。マイノリティインスタンス数が小さい場合、アンサンプリング効果が不十分であり、分類モデルの性能改善は明らかではない。そこで,tomek-linkに基づき,マルチグラニュラリティリラベル化アンダーサンプリングアルゴリズム(mgru)を提案する。このアルゴリズムは、局所粒度部分空間に設定されたデータセットの局所情報を十分に考慮し、データセット内の局所ポテンシャル重複インスタンスを検出する。そして、重なり合う多数派インスタンスをグローバルレザベルインデックス値に従って排除し、トメックリンクの検出範囲を効果的に拡大する。その結果,アンダーサンプリングの最適大域レラベルインデックス値を選択した場合,提案するアンダーサンプリングアルゴリズムの分類精度と一般化性能は,他のベースラインアルゴリズムよりも有意に優れていることがわかった。 The imbalanced classification problem turns out to be one of the important and challenging problems in data mining and machine learning. The performances of traditional classifiers will be severely affected by many data problems, such as class imbalanced problem, class overlap and noise. The Tomek-Link algorithm was only used to clean data when it was proposed. In recent years, there have been reports of combining Tomek-Link algorithm with sampling technique. The Tomek-Link sampling algorithm can effectively reduce the class overlap on data, remove the majority instances that are difficult to distinguish, and improve the algorithm classification accuracy. However, the Tomek-Links under-sampling algorithm only considers the boundary instances that are the nearest neighbors to each other globally and ignores the potential local overlapping instances. When the number of minority instances is small, the under-sampling effect is not satisfactory, and the performance improvement of the classification model is not obvious. 
Therefore, on the basis of Tomek-Link, a multi-granularity relabeled under-sampling algorithm (MGRU) is proposed. This algorithm fully considers the local information of the data set in the local granularity subspace, and detects the local potential overlapping instances in the data set. Then, the overlapped majority instances are eliminated according to the global relabeled index value, which effectively expands the detection range of Tomek-Links. The simulation results show that when we select the optimal global relabeled index value for under-sampling, the classification accuracy and generalization performance of the proposed under-sampling algorithm are significantly better than other baseline algorithms.	翻訳日:2022-01-12 16:33:08 公開日:2022-01-11
# (参考訳) フーリエリング相関を用いた画質測定とノイズ除去 Image quality measurements and denoising using Fourier Ring Correlations ( http://arxiv.org/abs/2201.03992v1 ) ライセンス: CC BY 4.0	J. Kaczmar-Michalska, N.R. Hajizadeh, A.J. Rzepiela and S.F. Nørrelykke	(参考訳) 画像の品質は、異なる人々に対して異なる意味を持つ曖昧な概念です。画像品質を定量化するため、劣化画像と接地真実画像との相対差を典型的に算出する。しかし、この違いを測定するのにどんな指標を使うべきか? 理想的には、メトリックは自然画像と科学画像の両方でうまく機能するはずだ。構造類似度指数(SSIM)は、人間が画像の類似性をどう知覚するかの指標であるが、科学的に顕微鏡で意味のある違いには敏感ではない。電子および超解像顕微鏡では、フーリエリング相関 (FRC) がしばしば用いられるが、これらの分野以外ではほとんど知られていない。ここでは、FRCがGoogle Open Imagesデータセットなど、自然画像にも同じように適用可能であることを示す。次に、frcに基づいて損失関数を定義し、解析的に微分可能であることを示し、画像のノイズ除去のためのU-netの訓練に用いる。このFRCベースの損失関数は、L1またはL2ベースの損失を使用する場合よりも、ネットワークを高速にトレーニングし、類似またはより良い結果が得られる。また、FRC解析によるニューラルネットワークの特性と限界についても検討する。 Image quality is a nebulous concept with different meanings to different people. To quantify image quality a relative difference is typically calculated between a corrupted image and a ground truth image. But what metric should we use for measuring this difference? Ideally, the metric should perform well for both natural and scientific images. The structural similarity index (SSIM) is a good measure for how humans perceive image similarities, but is not sensitive to differences that are scientifically meaningful in microscopy. In electron and super-resolution microscopy, the Fourier Ring Correlation (FRC) is often used, but is little known outside of these fields. Here we show that the FRC can equally well be applied to natural images, e.g. the Google Open Images dataset. We then define a loss function based on the FRC, show that it is analytically differentiable, and use it to train a U-net for denoising of images. This FRC-based loss function allows the network to train faster and achieve similar or better results than when using L1- or L2- based losses. We also investigate the properties and limitations of neural network denoising with the FRC analysis.	翻訳日:2022-01-12 16:27:01 公開日:2022-01-11
# (参考訳) ディープフェイス認識に対する類似性に基づくグレイボックス逆攻撃 Similarity-based Gray-box Adversarial Attack Against Deep Face Recognition ( http://arxiv.org/abs/2201.04011v1 ) ライセンス: CC BY 4.0	Hanrui Wang, Shuo Wang, Zhe Jin, Yandan Wang, Cunjian Chen, Massimo Tistarell	(参考訳) 敵対的攻撃手法の大半は、システムの全知識が明らかにされると、深い顔認識に対して良好に機能する(white-box)。しかし、このような手法は攻撃者に顔テンプレートが未知のグレーボックス設定ではうまく機能しない。本研究では,新たに開発された目的関数を持つ類似性に基づくグレーボックス敵対的攻撃(SGADV)手法を提案する。 SGADVは、非類似度スコアを使用して、最適化された敵の例、すなわち類似性に基づく敵攻撃を生成する。このテクニックは、ホワイトボックスとグレーボックスの両方で、非類似度スコアを使用して真正ユーザか偽ユーザかを判定する認証システムに対して適用される。 SGADVの有効性を検証するため,LFW,CelebA,CelebA-HQの顔データセットに対して,ホワイトボックスとグレーボックスの両方でFaceNetとInsightFaceの深層顔認識モデルに対して広範な実験を行った。提案手法は,グレーボックス設定において既存の攻撃手法よりも有意に優れていた。したがって,本手法の類似性ベースアプローチは,非認証のためのグレイボックス攻撃シナリオに十分対応できる可能性が示唆された。 The majority of adversarial attack techniques perform well against deep face recognition when the full knowledge of the system is revealed (white-box). However, such techniques perform poorly in the gray-box setting where the face templates are unknown to the attackers. In this work, we propose a similarity-based gray-box adversarial attack (SGADV) technique with a newly developed objective function. SGADV utilizes the dissimilarity score to produce the optimized adversarial example, i.e., similarity-based adversarial attack. This technique applies to both white-box and gray-box attacks against authentication systems that determine genuine or imposter users using the dissimilarity score. To validate the effectiveness of SGADV, we conduct extensive experiments on face datasets of LFW, CelebA, and CelebA-HQ against deep face recognition models of FaceNet and InsightFace in both white-box and gray-box settings. The results suggest that the proposed method significantly outperforms the existing adversarial attack techniques in the gray-box setting. We hence summarize that the similarity-based approaches to develop the adversarial example could satisfactorily cater to the gray-box attack scenarios for de-authentication.	翻訳日:2022-01-12 16:04:49 公開日:2022-01-11
# (参考訳) 差動的個人分割学習に対する特徴空間ハイジャック攻撃 Feature Space Hijacking Attacks against Differentially Private Split Learning ( http://arxiv.org/abs/2201.04018v1 ) ライセンス: CC BY-SA 4.0	Grzegorz Gawron, Philip Stubbings	(参考訳) 分割学習と差分プライバシーは、分散データセット上のプライバシーに準拠した高度な分析を支援する可能性がある技術である。分割学習に対する攻撃は重要な評価ツールであり、近年研究の注目を集めている。この研究の貢献は、クライアントサイドのオフザシェルフDPオプティマイザを使用して、差分プライバシ(DP)によって強化されたスプリットニューラルネットワークの学習プロセスに、最近のフィーチャースペースハイジャック攻撃(FSHA)を適用することである。 FSHA攻撃は、任意に設定されたDPエプシロンレベルでエラー率の低いクライアントのプライベートデータ再構成を取得する。また,攻撃リスク軽減の可能性を示唆する次元的低減実験を行い,ある程度の有効性を示す。この設定において、差分プライバシーが効果的な保護ではない理由を論じ、他のリスク軽減手法についても言及する。 Split learning and differential privacy are technologies with growing potential to help with privacy-compliant advanced analytics on distributed datasets. Attacks against split learning are an important evaluation tool and have been receiving increased research attention recently. This work's contribution is applying a recent feature space hijacking attack (FSHA) to the learning process of a split neural network enhanced with differential privacy (DP), using a client-side off-the-shelf DP optimizer. The FSHA attack obtains client's private data reconstruction with low error rates at arbitrarily set DP epsilon levels. We also experiment with dimensionality reduction as a potential attack risk mitigation and show that it might help to some extent. We discuss the reasons why differential privacy is not an effective protection in this setting and mention potential other risk mitigation methods.	翻訳日:2022-01-12 16:03:48 公開日:2022-01-11
# (参考訳) グラフニューラルネットワークを利用した電力系統の状態推定 State Estimation in Electric Power Systems Leveraging Graph Neural Networks ( http://arxiv.org/abs/2201.04056v1 ) ライセンス: CC BY 4.0	Ognjen Kundacina, Mirsad Cosovic, Dejan Vukobratovic	(参考訳) 状態推定(SE)アルゴリズムの目標は、電力系統で利用可能な測定値セットに基づいて、複雑なバス電圧を状態変数として推定することである。ファサー測定ユニット (pmus) は送電系統で使われるようになりつつあるため, pmu の高いサンプリング率を活用できる高速 se ソルバが必要である。本稿では,pmu電圧と電流測定を入力として,評価フェーズ中に高速かつ正確な予測を得るために,グラフニューラルネットワーク(gnn)をトレーニングすることを提案する。 GNNは、電力系統内の測定セットをランダムにサンプリングし、PMUsソルバを備えた線形SEを用いて得られる解でラベル付けすることで、合成データセットを用いて訓練される。その結果,様々なテストシナリオにおけるGNN予測の精度を示し,欠落した入力データに対する予測の感度に対処した。 The goal of the state estimation (SE) algorithm is to estimate complex bus voltages as state variables based on the available set of measurements in the power system. Because phasor measurement units (PMUs) are increasingly being used in transmission power systems, there is a need for a fast SE solver that can take advantage of PMU high sampling rates. This paper proposes training a graph neural network (GNN) to learn the estimates given the PMU voltage and current measurements as inputs, with the intent of obtaining fast and accurate predictions during the evaluation phase. GNN is trained using synthetic datasets, created by randomly sampling sets of measurements in the power system and labelling them with a solution obtained using a linear SE with PMUs solver. The presented results display the accuracy of GNN predictions in various test scenarios and tackle the sensitivity of the predictions to the missing input data.	翻訳日:2022-01-12 15:57:00 公開日:2022-01-11
# (参考訳) 重力波検出における共通空間パターンの適用 Application of Common Spatial Patterns in Gravitational Waves Detection ( http://arxiv.org/abs/2201.04086v1 ) ライセンス: CC BY 4.0	Damodar Dahal	(参考訳) 共通空間パターン (Common Spatial Patterns, CSP) は、脳-コンピュータインタフェース(BCI)システムで多チャンネル脳磁図・脳波(MEG/EEG)時系列データ中の事象関連電位(ERP)を検出するために広く使われている特徴抽出アルゴリズムである。本稿では,多検出器重力波(GW)ひずみの所定のエポックが合体(coalescence)を含むかどうかを判定する問題に対して,CSPアルゴリズムを開発し,適用する。信号処理技術とロジスティック回帰分類器を用いて、我々のパイプラインは、H1およびL1ひずみを用いて、重力波トランジェントカタログの82の信頼できるイベントのうち76を正確に検出でき、$10 \times 5$ 交差検証による分類スコアは $93.72 \pm 0.04\%$ であることがわかった。偽陰性事象は、GW170817-v3、GW191219_163120-v1、GW200115_042309-v2、GW200210_092254-v1、GW200220_061928-v1、GW200322_091133-v1である。 Common Spatial Patterns (CSP) is a feature extraction algorithm widely used in Brain-Computer Interface (BCI) Systems for detecting Event-Related Potentials (ERPs) in multi-channel magneto/electroencephalography (MEG/EEG) time series data. In this article, we develop and apply a CSP algorithm to the problem of identifying whether a given epoch of multi-detector Gravitational Wave (GW) strains contains coalescences. Paired with Signal Processing techniques and a Logistic Regression classifier, we find that our pipeline is able to correctly detect 76 out of 82 confident events from Gravitational Wave Transient Catalog, using H1 and L1 strains, with a classification score of $93.72 \pm 0.04\%$ using $10 \times 5$ cross validation. The false negative events were: GW170817-v3, GW191219_163120-v1, GW200115_042309-v2, GW200210_092254-v1, GW200220_061928-v1, and GW200322_091133-v1.	翻訳日:2022-01-12 15:46:43 公開日:2022-01-11
# (参考訳) スペクトルサーベイ:自律型UAVを用いたアクティブ無線マップ推定 Spectrum Surveying: Active Radio Map Estimation with Autonomous UAVs ( http://arxiv.org/abs/2201.04125v1 ) ライセンス: CC BY 4.0	Raju Shrestha, Daniel Romero, Sundeep Prabhakar Chepuri	(参考訳) 無線地図は、リソース割り当て、干渉調整、ミッションプランニングなど、無線通信や移動ロボットのタスクに多くの応用を見出している。空間分布測定から無線地図を構築する手法が多数提案されているが, 事前にその位置を推定する。そこで,本稿では,無人航空機 (uav) などの移動ロボットが,短時間の測量で高品質な地図推定を行うために,活発に選択された複数の場所で計測を収集するスペクトラムサーベイを提案する。これは2つのステップで行われる。まず,モデルベースオンラインベイズ推定器とデータ駆動深層学習アルゴリズムの2つの新しいアルゴリズムを考案し,地図推定値の更新と,可能な各場所における測定値の有意性を示す不確実性指標を提案する。これらのアルゴリズムは、相補的な利点と測定毎の特徴的複雑さを提供する。第二に、不確実性測定基準は、UAVの軌道を計画し、最も情報性の高い場所で測定を収集するために用いられる。この問題の組合せ複雑性を克服するために、線形時間における大きな不確実性のある領域を通して経路点のリストを得る動的プログラミング手法を提案する。実データを用いた数値実験により,提案手法が正確な無線地図を高速に構築できることが確認された。 Radio maps find numerous applications in wireless communications and mobile robotics tasks, including resource allocation, interference coordination, and mission planning. Although numerous techniques have been proposed to construct radio maps from spatially distributed measurements, the locations of such measurements are assumed predetermined beforehand. In contrast, this paper proposes spectrum surveying, where a mobile robot such as an unmanned aerial vehicle (UAV) collects measurements at a set of locations that are actively selected to obtain high-quality map estimates in a short surveying time. This is performed in two steps. First, two novel algorithms, a model-based online Bayesian estimator and a data-driven deep learning algorithm, are devised for updating a map estimate and an uncertainty metric that indicates the informativeness of measurements at each possible location. These algorithms offer complementary benefits and feature constant complexity per measurement. Second, the uncertainty metric is used to plan the trajectory of the UAV to gather measurements at the most informative locations. 
To overcome the combinatorial complexity of this problem, a dynamic programming approach is proposed to obtain lists of waypoints through areas of large uncertainty in linear time. Numerical experiments conducted on a realistic dataset confirm that the proposed scheme constructs accurate radio maps quickly.	翻訳日:2022-01-12 15:36:57 公開日:2022-01-11
# ODEフローの経路微分可能性 Path differentiability of ODE flows ( http://arxiv.org/abs/2201.03819v1 ) ライセンス: Link先を確認	Swann Marx (LS2N), Edouard Pauwels (IRIT)	(参考訳) 経路微分ベクトル場によって駆動される常微分方程式(ODE)の流れを考える。経路微分可能関数は、基本計算規則と相反する一般化微分の概念である保守勾配を受け入れるリプシッツ函数の固有部分類を構成する。我々の主な結果は、そのような流れが駆動ベクトル場の経路微分可能性特性を継承することを示している。感度差分包有物によって与えられる導関数の前方伝播が流れに保守的ジャコビアンを与えることを示す。これにより、ODE制約の下で積分コストに適用可能な非滑らかなアジョイント法を提案することができる。この結果は、パラメトリズドODE制約を用いた多種多様な非滑らかな最適化問題を解くための小さなステップ一階法の適用の理論的根拠となっている。これは、提案する非スムース随伴に基づく小さなステップ一階法を収束させることで示される。 We consider flows of ordinary differential equations (ODEs) driven by path differentiable vector fields. Path differentiable functions constitute a proper subclass of Lipschitz functions which admit conservative gradients, a notion of generalized derivative compatible with basic calculus rules. Our main result states that such flows inherit the path differentiability property of the driving vector field. We show indeed that forward propagation of derivatives given by the sensitivity differential inclusions provide a conservative Jacobian for the flow. This allows to propose a nonsmooth version of the adjoint method, which can be applied to integral costs under an ODE constraint. This result constitutes a theoretical ground to the application of small step first order methods to solve a broad class of nonsmooth optimization problems with parametrized ODE constraints. This is illustrated with the convergence of small step first order methods based on the proposed nonsmooth adjoint.	翻訳日:2022-01-12 15:07:20 公開日:2022-01-11
# 機械学習時代の反応と分光に関する原子論的シミュレーション - quo vadis? Atomistic Simulations for Reactions and Spectroscopy in the Era of Machine Learning -- Quo Vadis? ( http://arxiv.org/abs/2201.03822v1 ) ライセンス: Link先を確認	M. Meuwly	(参考訳) 正確なエネルギー関数を用いた原子論シミュレーションは、気体および凝縮相における分子の機能運動に関する分子レベルの洞察を与えることができる。最近開発され現在進行中の機械学習技術の統合と組み合わせは、そのようなダイナミクスシミュレーションを現実に近づけるユニークな機会を提供する。この視点は、この分野における他者の努力と、自身の仕事のいくつかから現場の現状を明確にし、オープンな質問と将来の展望について議論する。 Atomistic simulations using accurate energy functions can provide molecular-level insight into functional motions of molecules in the gas- and in the condensed phase. Integrating and combining this with recently developed and currently pursued machine learning techniques provides a unique opportunity to bring such dynamics simulations closer to reality. This perspective delineates the present status of the field from efforts of others in the field and some of our own work and discusses open questions and future prospects.	翻訳日:2022-01-12 15:07:06 公開日:2022-01-11
# スパース・リワード課題における強化と模倣学習を組み合わせたリワード・リラベリング Reward Relabelling for combined Reinforcement and Imitation Learning on sparse-reward tasks ( http://arxiv.org/abs/2201.03834v1 ) ライセンス: Link先を確認	Jesus Bujalance Martin, Fabien Moutarde	(参考訳) 近年、深層強化学習(DRL)は、ロボット工学、自律運転、ビデオゲームといった複雑な意思決定アプリケーションへの侵入に成功した。よりサンプル効率の良いアルゴリズムの探索では、できるだけ多くの外部のオフポリシーデータを活用することが有望な方向である。このデータ駆動アプローチの要点は、専門家のデモから学ぶことだ。過去には、デモのみの事前トレーニングや追加コスト関数の最小化など、リプレイバッファに追加されるデモをうまく活用するための複数のアイデアが提案されている。我々は,オンライン上で収集したデモやエピソードを,オフ・ポリシー・アルゴリズムを用いて,どのようなスパース・リワード環境でも活用できる新しい手法を提案する。本手法は,実演や成功したエピソードに与えられる報酬ボーナスに基づいて,専門家の模倣と自己模倣を奨励する。まず、エージェントが実証された動作にマッチするように促すために、デモから来る遷移に報奨ボーナスを与える。次に、成功したエピソードを収集すると、リプレイバッファに追加する前に同じボーナスで遷移を緩和し、エージェントが以前の成功と一致するように促します。実験はロボットの操作,特に6自由度ロボットアームの3つのタスクに焦点をあてた。報奨関係に基づく手法は, 実演がなくても, 基本アルゴリズム (sac, ddpg) の性能を向上させることを示す。さらに,従来の方法から2つの改善点を取り入れることで,すべてのベースラインを上回ります。 During recent years, deep reinforcement learning (DRL) has made successful incursions into complex decision-making applications such as robotics, autonomous driving or video games. In the search for more sample-efficient algorithms, a promising direction is to leverage as much external off-policy data as possible. One staple of this data-driven approach is to learn from expert demonstrations. In the past, multiple ideas have been proposed to make good use of the demonstrations added to the replay buffer, such as pretraining on demonstrations only or minimizing additional cost functions. We present a new method, able to leverage demonstrations and episodes collected online in any sparse-reward environment with any off-policy algorithm. Our method is based on a reward bonus given to demonstrations and successful episodes, encouraging expert imitation and self-imitation. First, we give a reward bonus to the transitions coming from demonstrations to encourage the agent to match the demonstrated behaviour. 
Then, upon collecting a successful episode, we relabel its transitions with the same bonus before adding them to the replay buffer, encouraging the agent to also match its previous successes. Our experiments focus on manipulation robotics, specifically on three tasks for a 6 degrees-of-freedom robotic arm in simulation. We show that our method based on reward relabeling improves the performance of the base algorithm (SAC and DDPG) on these tasks, even in the absence of demonstrations. Furthermore, integrating into our method two improvements from previous works allows our approach to outperform all baselines.	翻訳日:2022-01-12 15:06:57 公開日:2022-01-11
# 3次元物体検出と位置推定のためのLiDARビーム構成のエンドツーエンド最適化 End-To-End Optimization of LiDAR Beam Configuration for 3D Object Detection and Localization ( http://arxiv.org/abs/2201.03860v1 ) ライセンス: Link先を確認	Niclas Vödisch, Ozan Unal, Ke Li, Luc Van Gool, Dengxin Dai	(参考訳) lidarベースのアプリケーションのための既存の学習方法は、あらかじめ決められたビーム構成の下でスキャンされた3dポイントを使用する。これらの固定構成はタスクに依存しないため、単純に使用すればサブ最適パフォーマンスにつながる可能性がある。本研究では,あるアプリケーションに対して,lidarビーム構成の最適化を学ぶための新しい経路を提案する。具体的には、異なるLiDARベースのアプリケーションに対して、ビーム構成を自動的にエンドツーエンドに最適化する強化学習ベースの学習最適化(RL-L2O)フレームワークを提案する。この最適化は,目標タスクの最終的な性能によって導かれるので,簡単なドロップインモジュールとして任意のLiDARアプリケーションと容易に統合できる。この方法は、例えば大規模なシステム展開において、低解像度(低コスト)のLiDARが必要な場合に特に有用である。我々は,低分解能LiDARのビーム構成を3次元物体検出と局所化という2つの重要なタスクに対して探索する。実験の結果,RL-L2O法はベースライン法に比べて両タスクの性能が有意に向上することがわかった。我々は,プログラム可能なLiDARの最近の進歩と組み合わせることで,LiDARをベースとしたアクティブな知覚のための新たな研究方向を創出できると考えている。コードはhttps://github.com/vniclas/lidar_beam_selectionで公開されている。 Existing learning methods for LiDAR-based applications use 3D points scanned under a pre-determined beam configuration, e.g., the elevation angles of beams are often evenly distributed. Those fixed configurations are task-agnostic, so simply using them can lead to sub-optimal performance. In this work, we take a new route to learn to optimize the LiDAR beam configuration for a given application. Specifically, we propose a reinforcement learning-based learning-to-optimize (RL-L2O) framework to automatically optimize the beam configuration in an end-to-end manner for different LiDAR-based applications. The optimization is guided by the final performance of the target task and thus our method can be integrated easily with any LiDAR-based application as a simple drop-in module. The method is especially useful when a low-resolution (low-cost) LiDAR is needed, for instance, for system deployment at a massive scale. We use our method to search for the beam configuration of a low-resolution LiDAR for two important tasks: 3D object detection and localization. 
Experiments show that the proposed RL-L2O method improves the performance in both tasks significantly compared to the baseline methods. We believe that a combination of our method with the recent advances of programmable LiDARs can start a new research direction for LiDAR-based active perception. The code is publicly available at https://github.com/vniclas/lidar_beam_selection	翻訳日:2022-01-12 15:06:32 公開日:2022-01-11
# Systematic Literature Review: Quantum Machine Learningとその応用 Systematic Literature Review: Quantum Machine Learning and its applications ( http://arxiv.org/abs/2201.04093v1 ) ライセンス: Link先を確認	David Peral García, Juan Cruz-Benito and Francisco José García-Peñalvo	(参考訳) 量子コンピューティングは、量子力学を用いて計算を行う過程である。このフィールドは、その後の計算や大規模情報処理に使用するために、特定のサブ原子粒子の量子的挙動を研究する。これらの能力により、量子コンピュータは従来のコンピュータよりも計算時間とコストの面で有利になる。今日では、計算の複雑さや計算にかかる時間によって古典的な計算で実行できない科学的課題があり、量子計算は可能な答えの1つである。しかし、現在の量子デバイスはまだ必要な量子ビットを持っておらず、これらの目標を達成するのに十分なフォールトトレラントではない。それでも、機械学習や化学など、現在の量子デバイスで量子計算が役立つ分野は他にもある。この原稿は、2017年から2021年にかけて出版された論文の体系的な文献レビューを行い、量子機械学習で使用される異なるアルゴリズムとその応用を識別、分析、分類することを目的としている。その結果,量子機械学習技術とアルゴリズムを用いた52の論文を同定した。見つかったアルゴリズムの主な種類は、サポートベクターマシンやk近傍モデルのような古典的な機械学習アルゴリズムの量子実装と、量子ニューラルネットワークのような古典的なディープラーニングアルゴリズムである。古典的機械学習によって現在回答されている問題を、量子デバイスとアルゴリズムを使って解こうとする記事が多い。結果は有望だが、量子機械学習はその潜在能力を完全に達成するには程遠い。既存の量子コンピュータには、量子コンピューティングがその潜在能力を達成するのに十分な品質、速度、スケールが欠けているため、量子ハードウェアの改善が必要である。 Quantum computing is the process of performing calculations using quantum mechanics. This field studies the quantum behavior of certain subatomic particles for subsequent use in performing calculations, as well as for large-scale information processing. These capabilities can give quantum computers an advantage in terms of computational time and cost over classical computers. Nowadays, there are scientific challenges that are impossible to perform by classical computation due to computational complexity or the time the calculation would take, and quantum computation is one of the possible answers. However, current quantum devices have not yet the necessary qubits and are not fault-tolerant enough to achieve these goals. Nonetheless, there are other fields like machine learning or chemistry where quantum computation could be useful with current quantum devices. 
This manuscript aims to present a Systematic Literature Review of the papers published between 2017 and 2021 to identify, analyze and classify the different algorithms used in quantum machine learning and their applications. Consequently, this study identified 52 articles that used quantum machine learning techniques and algorithms. The main types of found algorithms are quantum implementations of classical machine learning algorithms, such as support vector machines or the k-nearest neighbor model, and classical deep learning algorithms, like quantum neural networks. Many articles try to solve problems currently answered by classical machine learning but using quantum devices and algorithms. Even though results are promising, quantum machine learning is far from achieving its full potential. An improvement in the quantum hardware is required since the existing quantum computers lack enough quality, speed, and scale to allow quantum computing to achieve its full potential.	翻訳日:2022-01-12 15:06:15 公開日:2022-01-11
# VCクラスの最適圧縮 Optimally compressing VC classes ( http://arxiv.org/abs/2201.04131v1 ) ライセンス: Link先を確認	Zachary Chase	(参考訳) Littlestone と Warmuth の予想を解決し、VC次元 $d$ の任意の概念クラスがサイズ $d$ のサンプル圧縮スキームを持つことを示す。 Resolving a conjecture of Littlestone and Warmuth, we show that any concept class of VC-dimension $d$ has a sample compression scheme of size $d$.	翻訳日:2022-01-12 15:05:51 公開日:2022-01-11
# アンダーサンプド4次元流れMRIからの再構成ノイズの解析 An analysis of reconstruction noise from undersampled 4D flow MRI ( http://arxiv.org/abs/2201.03715v1 ) ライセンス: Link先を確認	Lauren Partin, Daniele E. Schiavazzi and Carlos A. Sing Long	(参考訳) 新しいMR画像モダリティは血行動態を定量化できるが、心血管疾患の早期診断に広く用いられていることを除いて、長い取得時間を必要とする。取得時間を短縮するため、画像圧縮性を高めるために設計された表現を活用するアンダーサンプル計測による再構成手法が日常的に使用される。再構成された解剖学的および血行力学的画像は、視覚的アーティファクトを呈することがある。これらのアーティファクトのいくつかは本質的にレコンストラクションエラーであり、アンダーサンプリングの結果であるが、測定ノイズやサンプル周波数のランダムな選択によるものもある。そうでなければ、再構成された画像はランダムな変数となり、そのバイアスと共分散の両方が視覚的なアーティファクトにつながる可能性がある。前者の性質は文献で研究されているが、後者はそれほど注目されていない。本研究では,再建過程から生じるランダム摂動の理論的性質について検討し,シミュレーションおよびMR大動脈流に関する数値実験を行った。その結果,gaussian undersamplingパターンと$\ell_1$-norm最小化に基づくリカバリアルゴリズムを組み合わせた場合,相関長は2～3ピクセルに制限されることがわかった。しかし, 他のアンダーサンプリングパターン, 高いアンダーサンプリング因子 (すなわち8xまたは16x圧縮) , 異なる再構成法では相関長が有意に増加する可能性がある。 Novel Magnetic Resonance (MR) imaging modalities can quantify hemodynamics but require long acquisition times, precluding its widespread use for early diagnosis of cardiovascular disease. To reduce the acquisition times, reconstruction methods from undersampled measurements are routinely used, that leverage representations designed to increase image compressibility. Reconstructed anatomical and hemodynamic images may present visual artifacts. Although some of these artifact are essentially reconstruction errors, and thus a consequence of undersampling, others may be due to measurement noise or the random choice of the sampled frequencies. Said otherwise, a reconstructed image becomes a random variable, and both its bias and its covariance can lead to visual artifacts; the latter leads to spatial correlations that may be misconstrued for visual information. Although the nature of the former has been studied in the literature, the latter has not received as much attention. 
In this study, we investigate the theoretical properties of the random perturbations arising from the reconstruction process, and perform a number of numerical experiments on simulated and MR aortic flow. Our results show that the correlation length remains limited to two to three pixels when a Gaussian undersampling pattern is combined with recovery algorithms based on $\ell_1$-norm minimization. However, the correlation length may increase significantly for other undersampling patterns, higher undersampling factors (i.e., 8x or 16x compression), and different reconstruction methods.	翻訳日:2022-01-12 15:05:23 公開日:2022-01-11
# 異常検出のための一様スパース表現を用いた辞書学習 Dictionary Learning with Uniform Sparse Representations for Anomaly Detection ( http://arxiv.org/abs/2201.03869v1 ) ライセンス: Link先を確認	Paul Irofti, Cristian Rusu, Andrei Pătrașcu	(参考訳) オーディオや画像処理のような多くのアプリケーションはスパース表現が強力で効率的な信号モデリング技術であることを示している。データの最もスパースな表現と最小の近似誤差を同時に生成する最適な辞書を見つけることは難しい問題であり、辞書学習(DL)はこの問題に取り組む手法である。信号のデータセットにおける異常サンプルの検出において,DLがどの程度機能するかを検討した。本稿では,K-SVD型アルゴリズムを用いて,一様スパース表現モデルを求める特定のDL定式化により,データセットの大多数のサンプルの基礎となる部分空間を検出する。数値シミュレーションにより、得られた部分空間を効率よく利用し、正規データ点から異常を識別できることが示されている。 Many applications like audio and image processing show that sparse representations are a powerful and efficient signal modeling technique. Finding an optimal dictionary that generates at the same time the sparsest representations of data and the smallest approximation error is a hard problem approached by dictionary learning (DL). We study how DL performs in detecting abnormal samples in a dataset of signals. In this paper we use a particular DL formulation that seeks uniform sparse representations model to detect the underlying subspace of the majority of samples in a dataset, using a K-SVD-type algorithm. Numerical simulations show that one can efficiently use this resulted subspace to discriminate the anomalies over the regular data points.	翻訳日:2022-01-12 15:04:07 公開日:2022-01-11
# PEPit: Pythonにおける一階最適化手法のコンピュータ支援最悪ケース解析 PEPit: computer-assisted worst-case analyses of first-order optimization methods in Python ( http://arxiv.org/abs/2201.04040v1 ) ライセンス: Link先を確認	Baptiste Goujaud, Céline Moucer, François Glineur, Julien Hendrickx, Adrien Taylor, Aymeric Dieuleveut	(参考訳) PEPitはPythonパッケージで、勾配、射影、近接、線形最適化オラクル、およびそれらの近似版やブレグマン版を含む多くの一階最適化手法の最悪ケース解析へのアクセスを単純化することを目的としている。簡単に言えば、PEPitはコンピュータ支援による一階最適化手法の最悪ケース解析を可能にするパッケージである。基本となる考え方は、性能推定問題(PEP)と呼ばれることの多い最悪ケース解析の問題を、数値的に解くことができる半定値計画問題(SDP)として定式化することである。そのため、パッケージのユーザは、一階法を実装する場合とほぼ同じように記述するだけでよい。その後、パッケージはSDPモデリング部分の処理を行い、最悪ケース解析は標準的なソルバを介して数値的に行われる。 PEPit is a Python package aiming at simplifying the access to worst-case analyses of a large family of first-order optimization methods possibly involving gradient, projection, proximal, or linear optimization oracles, along with their approximate, or Bregman variants. In short, PEPit is a package enabling computer-assisted worst-case analyses of first-order optimization methods. The key underlying idea is to cast the problem of performing a worst-case analysis, often referred to as a performance estimation problem (PEP), as a semidefinite program (SDP) which can be solved numerically. For doing that, the package users are only required to write first-order methods nearly as they would have implemented them. The package then takes care of the SDP modelling parts, and the worst-case analysis is performed numerically via a standard solver.	翻訳日:2022-01-12 15:03:17 公開日:2022-01-11
# (参考訳) 通信産業におけるデータ変換に基づく最適顧客チャーン予測モデル Data transformation based optimized customer churn prediction model for the telecommunication industry ( http://arxiv.org/abs/2201.04088v1 ) ライセンス: CC BY 4.0	Joydeb Kumar Sana, Mohammad Zoynul Abedin, M. Sohel Rahman, M. Saifur Rahman	(参考訳) データ変換(DT)は、元のデータを特定の分類アルゴリズムをサポートする形式で転送し、特別な目的のためにデータを解析するプロセスである。予測性能を向上させるため,様々なデータ変換法を検討した。本研究は、顧客誘引が一般的な現象である通信産業(TCI)における顧客チャーン予測(CCP)の文脈で実施する。本研究では, ccp問題に対するデータ変換法と機械学習モデルを組み合わせた新しい手法を提案する。公開TIデータセットを用いて実験を行い,広く利用されている評価尺度(AUC,精度,リコール,F尺度など)を用いて評価を行った。本研究では,変換手法の効果を確認するための包括的比較を行った。比較結果と統計的テストの結果,提案したデータ変換に基づく最適化モデルのほとんどはCCPの性能を著しく向上させることがわかった。全体として、この原稿を通じて、通信産業のための効率的で最適化されたCCPモデルが提示されている。 Data transformation (DT) is a process that transfers the original data into a form which supports a particular classification algorithm and helps to analyze the data for a special purpose. To improve the prediction performance we investigated various data transform methods. This study is conducted in a customer churn prediction (CCP) context in the telecommunication industry (TCI), where customer attrition is a common phenomenon. We have proposed a novel approach of combining data transformation methods with the machine learning models for the CCP problem. We conducted our experiments on publicly available TCI datasets and assessed the performance in terms of the widely used evaluation measures (e.g. AUC, precision, recall, and F-measure). In this study, we presented comprehensive comparisons to affirm the effect of the transformation methods. The comparison results and statistical test proved that most of the proposed data transformation based optimized models improve the performance of CCP significantly. Overall, an efficient and optimized CCP model for the telecommunication industry has been presented through this manuscript.	翻訳日:2022-01-12 15:01:05 公開日:2022-01-11
# gDNA: 生成の詳細なニューラルアバターを目指して gDNA: Towards Generative Detailed Neural Avatars ( http://arxiv.org/abs/2201.04123v1 ) ライセンス: Link先を確認	Xu Chen, Tianjian Jiang, Jie Song, Jinlong Yang, Michael J. Black, Andreas Geiger, Otmar Hilliges	(参考訳) 3Dアバターを広く利用するためには、任意のポーズでさまざまなアイデンティティと形状を持つ様々な3D仮想人間を生成する必要がある。この課題は、衣服の形状の多様性、複雑な調音、そして衣服における豊かでしかし確率的な幾何学的詳細のためである。したがって、現在の3D人を表す方法は、衣服の人々の完全な生成モデルを提供していない。本稿では,スキンの重みに対応するさまざまな衣服の人物の詳細な3次元形状を学習する新しい手法を提案する。具体的には,被験者1人あたり数回のポーズ・アンリグドスキャンから学習したマルチサブジェクト・フォワード・スキニングモジュールを考案する。衣料品の高周波詳細の確率的性質を捉えるために,モデルが基礎となる統計を捉えることを奨励する逆損失定式化を利用する。このことがシワなどの局所的な詳細の現実的な生成につながるという実証的な証拠を提供する。我々は,多様で詳細な衣服を身に着けた天然のアバターを生産できることを示した。さらに,本手法は,人間のモデルを生のスキャンに適合させることで,従来の技術よりも優れることを示す。 To make 3D human avatars widely available, we must be able to generate a variety of 3D virtual humans with varied identities and shapes in arbitrary poses. This task is challenging due to the diversity of clothed body shapes, their complex articulations, and the resulting rich, yet stochastic geometric detail in clothing. Hence, current methods to represent 3D people do not provide a full generative model of people in clothing. In this paper, we propose a novel method that learns to generate detailed 3D shapes of people in a variety of garments with corresponding skinning weights. Specifically, we devise a multi-subject forward skinning module that is learned from only a few posed, un-rigged scans per subject. To capture the stochastic nature of high-frequency details in garments, we leverage an adversarial loss formulation that encourages the model to capture the underlying statistics. We provide empirical evidence that this leads to realistic generation of local details such as wrinkles. We show that our model is able to generate natural human avatars wearing diverse and detailed clothing. Furthermore, we show that our method can be used on the task of fitting human models to raw scans, outperforming the previous state-of-the-art.	翻訳日:2022-01-12 14:46:05 公開日:2022-01-11
# DANNTe:ドメインシフト下におけるターボ機械センサ仮想化の事例研究 DANNTe: a case study of a turbo-machinery sensor virtualization under domain shift ( http://arxiv.org/abs/2201.03850v1 ) ライセンス: Link先を確認	Luca Strazzera and Valentina Gori and Giacomo Veneri	(参考訳) 本稿では,ドメイン適応(DA)時系列回帰タスク(DANNTe)に取り組むための逆学習手法を提案する。この回帰は、ガスタービンに搭載されたセンサーの仮想コピーを構築することを目的としており、特定の状況で失われる可能性のある物理センサーの代わりに使用される。我々のDAアプローチは、特徴のドメイン不変表現を探すことです。学習者はラベル付きソースデータセットとラベル付きターゲットデータセット(教師なしDA)の両方にアクセスでき、タスク回帰器とドメイン分類器ニューラルネットワークの間のminmaxゲームを利用するようにトレーニングされる。両方のモデルは同じ特徴表現を共有し、特徴抽出器によって学習される。この研究は Ganin et al. arXiv:1505.07818 によって発表された結果に基づいている。ソースドメインでのみトレーニングされたベースラインモデルと比較して,回帰性能が大幅に向上したことを報告する。 We propose an adversarial learning method to tackle a Domain Adaptation (DA) time series regression task (DANNTe). The regression aims at building a virtual copy of a sensor installed on a gas turbine, to be used in place of the physical sensor which can be missing in certain situations. Our DA approach is to search for a domain-invariant representation of the features. The learner has access to both a labelled source dataset and an unlabeled target dataset (unsupervised DA) and is trained on both, exploiting the minmax game between a task regressor and a domain classifier Neural Networks. Both models share the same feature representation, learnt by a feature extractor. This work is based on the results published by Ganin et al. arXiv:1505.07818; indeed, we present an extension suitable to time series applications. We report a significant improvement in regression performance, compared to the baseline model trained on the source domain only.	翻訳日:2022-01-12 14:45:33 公開日:2022-01-11
# 自動強化学習(AutoRL: Automated Reinforcement Learning)の調査と課題 Automated Reinforcement Learning (AutoRL): A Survey and Open Problems ( http://arxiv.org/abs/2201.03916v1 ) ライセンス: Link先を確認	Jack Parker-Holder, Raghu Rajan, Xingyou Song, André Biedenkapp, Yingjie Miao, Theresa Eimer, Baohe Zhang, Vu Nguyen, Roberto Calandra, Aleksandra Faust, Frank Hutter, Marius Lindauer	(参考訳) 強化学習(RL)とディープラーニングの組み合わせは、多くの(深い)RLが一般的に有能なエージェントへの道筋を提供する、という印象的な成果をもたらした。しかしながら、RLエージェントの成功は、しばしばトレーニングプロセスにおける設計選択に非常に敏感であり、退屈でエラーを起こしやすい手動チューニングを必要とする。これにより、新しい問題にRLを使用することが難しくなり、また、その潜在能力を最大限に制限する。機械学習の他の多くの分野において、AutoMLはそのような設計選択を自動化できることを示しており、RLに適用すると有望な初期結果も得られている。しかし、AutoRL(Automated Reinforcement Learning)は、AutoMLの標準的なアプリケーションだけでなく、RL特有の課題も含んでいる。そのため、AutoRLはRLにおける重要な研究領域として現れており、RNA設計からGoのようなゲームまで、様々なアプリケーションで約束されている。 RLにおける手法や環境の多様性を考えると、研究の多くはメタラーニングから進化まで、異なるサブフィールドで行われている。本調査では,AutoRLの分野を統一し,共通分類学を提供し,各分野を詳細に議論し,今後の研究者にとって関心のあるオープンな問題を提起する。 The combination of Reinforcement Learning (RL) with deep learning has led to a series of impressive feats, with many believing (deep) RL provides a path towards generally capable agents. However, the success of RL agents is often highly sensitive to design choices in the training process, which may require tedious and error-prone manual tuning. This makes it challenging to use RL for new problems, while also limits its full potential. In many other areas of machine learning, AutoML has shown it is possible to automate such design choices and has also yielded promising initial results when applied to RL. However, Automated Reinforcement Learning (AutoRL) involves not only standard applications of AutoML but also includes additional challenges unique to RL, that naturally produce a different set of methods. As such, AutoRL has been emerging as an important area of research in RL, providing promise in a variety of applications from RNA design to playing games such as Go. 
Given the diversity of methods and environments considered in RL, much of the research has been conducted in distinct subfields, ranging from meta-learning to evolution. In this survey we seek to unify the field of AutoRL, we provide a common taxonomy, discuss each area in detail and pose open problems which would be of interest to researchers going forward.	翻訳日:2022-01-12 14:45:17 公開日:2022-01-11
# 関連キー差分を用いた改良型ニューラルディスタンス:SIMONとSIMECKへの応用 Improved Neural Distinguishers with (Related-key) Differentials: Applications in SIMON and SIMECK ( http://arxiv.org/abs/2201.03767v1 ) ライセンス: Link先を確認	Jinyu Lu and Guoqiang Liu and Yunwen Liu and Bing Sun and Chao Li and Li Liu	(参考訳) CRYPTO 2019で、Gohr氏は先駆的な試みを行い、NSAブロック暗号Speck32/64に対する差分暗号解析にディープラーニングをうまく適用し、純粋な差分区別器よりも高い精度を実現した。その性質上、データ内の効果的な特徴のマイニングは、データ駆動型ディープラーニングにおいて重要な役割を果たす。本稿では,暗号文ペアの学習データからの情報の整合性を考慮することに加えて,ディープラーニングの学習プロセスにおいて,差分暗号解析の構造に関するドメイン知識も考慮し,性能の向上を図る。また,sat/smtソルバに基づいて,従来に比べて性能を効果的に向上させる他の高確率対応差分特性を見出す。我々は,Simon と Simeck に対してニューラル差別器 (NDs) と関連キーニューラル差別器 (RKNDs) を構築する。 Simon32/64 の ND と RKND はそれぞれ 11-, 11-round に達し、それぞれ 59.55% と 97.90% である。 Simon64/128では、NDは13ラウンドで60.32%、RKNDは95.49%である。 Simeck32/64では、11ラウンド、14ラウンドのNDとRKNDがそれぞれ63.32%、87.06%の精度に達する。また、Simeck64/128向けに17ラウンドのNDと21ラウンドのRKNDをそれぞれ64.24%と62.96%の精度で構築する。現在、これらはSimon32/64、Simon64/128、Simeck32/64、Simeck64/128の最も長い(関連するキー)神経識別器である。 In CRYPTO 2019, Gohr made a pioneering attempt, and successfully applied deep learning to the differential cryptanalysis against NSA block cipher Speck32/64, achieving higher accuracy than the pure differential distinguishers. By its very nature, mining effective features in data plays a crucial role in data-driven deep learning. In this paper, in addition to considering the integrity of the information from the training data of the ciphertext pair, domain knowledge about the structure of differential cryptanalysis is also considered into the training process of deep learning to improve the performance. Besides, based on the SAT/SMT solvers, we find other high probability compatible differential characteristics which effectively improve the performance compared with previous work. We build neural distinguishers (NDs) and related-key neural distinguishers (RKNDs) against Simon and Simeck. The ND and RKND for Simon32/64 reach 11-, 11-round with an accuracy of 59.55% and 97.90%, respectively. 
For Simon64/128, the ND achieve an accuracy of 60.32% in 13-round, while it is 95.49% for the RKND. For Simeck32/64, ND and RKND of 11-, 14-round are obtained, reaching an accuracy of 63.32% and 87.06%, respectively. And we build 17-round ND and 21-round RKND for Simeck64/128 with an accuracy of 64.24% and 62.96%, respectively. Currently, these are the longest (related-key) neural distinguishers with higher accuracy for Simon32/64, Simon64/128, Simeck32/64 and Simeck64/128.	翻訳日:2022-01-12 14:42:55 公開日:2022-01-11
# CVSSコーパスと多言語音声合成 CVSS Corpus and Massively Multilingual Speech-to-Speech Translation ( http://arxiv.org/abs/2201.03713v1 ) ライセンス: Link先を確認	Ye Jia, Michelle Tadmor Ramanovich, Quan Wang, Heiga Zen	(参考訳) CVSSは,21言語から英語への文レベル並列S2ST対をカバーする,多言語から英語への多言語翻訳(S2ST)コーパスである。 CVSSはコモンボイス音声コーパスとCoVoST2音声テキスト翻訳(ST)コーパスから派生したもので、CoVoST2からの翻訳テキストを最先端のTSSシステムを用いて音声に合成する。翻訳文には2つのバージョンがある。 1)CVSS-C:全ての翻訳音声は高品質の標準音声である。 2) CVSS-T: 翻訳音声は対応する音源から伝達される音声である。さらに、CVSSは、翻訳音声の発音と一致する正規化翻訳テキストを提供する。 CVSSの各バージョンにおいて,ベースライン多言語直接S2STモデルとカスケードS2STモデルを構築し,コーパスの有効性を検証した。強力なカスケードS2STベースラインを構築するために、我々はCoVoST 2上でSTモデルを訓練した。それでも、直接S2STモデルの性能は、スクラッチからトレーニングされたときの強いカスケードベースラインに近づき、一致するSTモデルから初期化されるときのASR転写翻訳における0.1または0.7BLEU差のみである。 We introduce CVSS, a massively multilingual-to-English speech-to-speech translation (S2ST) corpus, covering sentence-level parallel S2ST pairs from 21 languages into English. CVSS is derived from the Common Voice speech corpus and the CoVoST 2 speech-to-text translation (ST) corpus, by synthesizing the translation text from CoVoST 2 into speech using state-of-the-art TTS systems. Two versions of translation speeches are provided: 1) CVSS-C: All the translation speeches are in a single high-quality canonical voice; 2) CVSS-T: The translation speeches are in voices transferred from the corresponding source speeches. In addition, CVSS provides normalized translation text which matches the pronunciation in the translation speech. On each version of CVSS, we built baseline multilingual direct S2ST models and cascade S2ST models, verifying the effectiveness of the corpus. To build strong cascade S2ST baselines, we trained an ST model on CoVoST 2, which outperforms the previous state-of-the-art trained on the corpus without extra data by 5.8 BLEU. Nevertheless, the performance of the direct S2ST models approaches the strong cascade baselines when trained from scratch, and with only 0.1 or 0.7 BLEU difference on ASR transcribed translation when initialized from matching ST models.	
翻訳日:2022-01-12 14:42:28 公開日:2022-01-11
# TSA-Net:行動品質評価のためのチューブ自己注意ネットワーク TSA-Net: Tube Self-Attention Network for Action Quality Assessment ( http://arxiv.org/abs/2201.03746v1 ) ライセンス: Link先を確認	Shunli Wang, Dingkang Yang, Peng Zhai, Chixiao Chen, Lihua Zhang	(参考訳) 近年,映像からのアクションクオリティの評価がコンピュータビジョンコミュニティやヒューマンコンピュータインタラクションにおいて注目を集めている。既存のアプローチの多くは、フォアグラウンドやバックグラウンド情報といった機能マップ内の本質的な違いを無視するアクション認識タスクからモデルを直接移行することで、この問題に対処している。この問題に対処するために,行動品質評価(AQA)のためのチューブ自己注意ネットワーク(TSA-Net)を提案する。具体的には、単一オブジェクトトラッカーをAQAに導入し、スパースな特徴相互作用を採用することで、時空間情報を高効率に生成できるチューブ自己認識モジュール(TSA)を提案する。 TSAモジュールは既存のビデオネットワークに埋め込まれ、TSA-Netを形成する。全体として、私たちのTSA-Netには以下のメリットがあります。 1)高い計算効率、 2)高い柔軟性、そして 3)最先端の芸術作品。 AQA-7 や MTL-AQA など,一般的な行動品質評価データセットに対して大規模な実験を行った。さらに、フィギュアスケートシーンにおける基本的なアクションアセスメントを検討するために、Fall Recognition in Figure Skating (FR-FS) というデータセットが提案されている。 In recent years, assessing action quality from videos has attracted growing attention in computer vision community and human computer interaction. Most existing approaches usually tackle this problem by directly migrating the model from action recognition tasks, which ignores the intrinsic differences within the feature map such as foreground and background information. To address this issue, we propose a Tube Self-Attention Network (TSA-Net) for action quality assessment (AQA). Specifically, we introduce a single object tracker into AQA and propose the Tube Self-Attention Module (TSA), which can efficiently generate rich spatio-temporal contextual information by adopting sparse feature interactions. The TSA module is embedded in existing video networks to form TSA-Net. Overall, our TSA-Net is with the following merits: 1) High computational efficiency, 2) High flexibility, and 3) The state-of-the art performance. Extensive experiments are conducted on popular action quality assessment datasets including AQA-7 and MTL-AQA. Besides, a dataset named Fall Recognition in Figure Skating (FR-FS) is proposed to explore the basic action assessment in the figure skating scene.	
翻訳日:2022-01-12 14:41:31 公開日:2022-01-11
# 脳腫瘍分節に対する相互対位学習: BraTS Challenge 2021 分節課題への解法 Reciprocal Adversarial Learning for Brain Tumor Segmentation: A Solution to BraTS Challenge 2021 Segmentation Task ( http://arxiv.org/abs/2201.03777v1 ) ライセンス: Link先を確認	Himashi Peiris, Zhaolin Chen, Gary Egan, Mehrtash Harandi	(参考訳) 本稿では,脳腫瘍セグメンテーション課題に対する対角学習に基づくトレーニング手法を提案する。この概念では、3Dセグメンテーションネットワークは2つの相互対角学習アプローチから学習する。セグメンテーション予測の一般化を図り,セグメンテーションネットワークの堅牢化を図るため,本研究は,患者データにノイズを付加することにより,より逆行例を生成することにより,仮想逆トレーニングアプローチに固執する。定量的主観的審判として機能する批評家を取り入れることで、セグメンテーションネットワークは、セグメンテーション結果に関連する不確実性情報から学習する。 RSNA-ASNR-MICCAI BraTS 2021データセットを用いてネットワークアーキテクチャのトレーニングと評価を行った。オンライン検証データセットでの性能は、増強腫瘍、腫瘍全体、腫瘍コアについてそれぞれ、Dice類似スコアが81.38%、90.77%、85.39%、Hausdorff Distance (95%)が21.83mm、5.37mm、8.56mmである。同様に、我々のアプローチは最終試験データセットで84.55%、90.46%、85.30%のDice類似スコア、13.48mm、6.32mm、16.98mmのHausdorff Distance (95%)を達成した。全体として,提案手法は各腫瘍部分領域の分節精度が向上した。私たちのコード実装はhttps://github.com/himashi92/vizviva_brats_2021で公開されています。 This paper proposes an adversarial learning based training approach for brain tumor segmentation task. In this concept, the 3D segmentation network learns from dual reciprocal adversarial learning approaches. To enhance the generalization across the segmentation predictions and to make the segmentation network robust, we adhere to the Virtual Adversarial Training approach by generating more adversarial examples via adding some noise on original patient data. By incorporating a critic that acts as a quantitative subjective referee, the segmentation network learns from the uncertainty information associated with segmentation results. We trained and evaluated network architecture on the RSNA-ASNR-MICCAI BraTS 2021 dataset. Our performance on the online validation dataset is as follows: Dice Similarity Score of 81.38%, 90.77% and 85.39%; Hausdorff Distance (95%) of 21.83 mm, 5.37 mm, 8.56 mm for the enhancing tumor, whole tumor and tumor core, respectively. 
Similarly, our approach achieved a Dice Similarity Score of 84.55%, 90.46% and 85.30%, as well as Hausdorff Distance (95%) of 13.48 mm, 6.32 mm and 16.98 mm on the final test dataset. Overall, our proposed approach yielded better performance in segmentation accuracy for each tumor sub-region. Our code implementation is publicly available at https://github.com/himashi92/vizviva_brats_2021	翻訳日:2022-01-12 14:41:13 公開日:2022-01-11
# COROLLA:緑内障治療のためのコントラスト学習を改良した多モード統合フレームワーク COROLLA: An Efficient Multi-Modality Fusion Framework with Supervised Contrastive Learning for Glaucoma Grading ( http://arxiv.org/abs/2201.03795v1 ) ライセンス: Link先を確認	Zhiyuan Cai, Li Lin, Huaqing He, Xiaoying Tang	(参考訳) 緑内障は盲目を引き起こす可能性のある眼疾患の1つであり、早期発見と治療は非常に重要である。眼底画像と光学コヒーレンス断層撮影(oct)画像はどちらも緑内障の診断に広く用いられている。しかし, 既存の緑内障分類法は, 眼底と眼底の相補情報を無視して, 単一のモダリティを主に活用している。本稿では,緑内障評価のための効率的な多モード教師付きコントラスト学習フレームワークcorollaを提案する。層分割と厚さ計算と投影により、元のoctボリュームから網膜厚マップを抽出し、置換モードとして使用することにより、メモリ使用量が少なく、より効率的な計算が可能になる。医用画像サンプルの高構造と分布の類似性を考慮し,教師付きコントラスト学習を用いて,モデルの識別能力を向上させる。さらに, 診断精度を高めるため, 足底画像と厚みマップの特徴レベル融合を行った。 GAMMAデータセットでは,我々のCOROLLAフレームワークは最先端の手法と比較して圧倒的な緑内障グレーディング性能を達成している。 Glaucoma is one of the ophthalmic diseases that may cause blindness, for which early detection and treatment are very important. Fundus images and optical coherence tomography (OCT) images are both widely-used modalities in diagnosing glaucoma. However, existing glaucoma grading approaches mainly utilize a single modality, ignoring the complementary information between fundus and OCT. In this paper, we propose an efficient multi-modality supervised contrastive learning framework, named COROLLA, for glaucoma grading. Through layer segmentation as well as thickness calculation and projection, retinal thickness maps are extracted from the original OCT volumes and used as a replacing modality, resulting in more efficient calculations with less memory usage. Given the high structure and distribution similarities across medical image samples, we employ supervised contrastive learning to increase our models' discriminative power with better convergence. Moreover, feature-level fusion of paired fundus image and thickness map is conducted for enhanced diagnosis accuracy. On the GAMMA dataset, our COROLLA framework achieves overwhelming glaucoma grading performance compared to state-of-the-art methods.	翻訳日:2022-01-12 14:40:42 公開日:2022-01-11
# Smart Director:ライブ放送のためのイベント駆動ディレクティブシステム Smart Director: An Event-Driven Directing System for Live Broadcasting ( http://arxiv.org/abs/2201.04024v1 ) ライセンス: Link先を確認	Yingwei Pan and Yue Chen and Qian Bao and Ning Zhang and Ting Yao and Jingen Liu and Tao Mei	(参考訳) ライブビデオ放送は通常、マルチカメラ生産を可能にするために、様々な技術と専門知識を必要とする。カメラの数が増えるにつれて、ライブスポーツ放送の監督は、これまで以上に複雑で難しいものになっている。放送監督は、製作中にもっと集中し、反応し、知識を持てなければならない。そこで我々は,従来の人間間放送を模倣して,先進的な多視点ビデオ解析アルゴリズムを用いて,ほぼ専門的な放送番組をリアルタイムで自動作成することを目的とした,Smart Directorという,革新的な自動スポーツ放送ディレクティブシステムを開発した。スポーツ放送のいわゆる「3つのイベント」構成に着想を得て、3つの連続した新規コンポーネントからなるイベント駆動パイプラインでシステムを構築する。 1)マルチビュー相関をモデル化してイベントを検出するマルチビューイベントローカライゼーション 2)視点選択の視覚的重要度によるカメラビューのランク付けのためのマルチビューハイライト検出 3)放送映像の制作を制御する自動放送スケジューリング装置。我々の知る限り,本システムはスポーツイベントのセマンティック理解によって完全に駆動される,マルチカメラスポーツ放送のための初のエンドツーエンド自動ディレクティブシステムである。また、クロスビュー関係モデリングによる多視点共同イベント検出の新たな問題を解決した最初のシステムでもある。我々は,実世界のマルチカメラサッカーデータセット上で客観的および主観的評価を行い,自動生成ビデオの品質が人間に匹敵することを示す。より高速な応答によって、私たちのシステムはより高速で短時間のイベントをキャプチャすることができます。 Live video broadcasting normally requires a multitude of skills and expertise with domain knowledge to enable multi-camera productions. As the number of cameras keep increasing, directing a live sports broadcast has now become more complicated and challenging than ever before. The broadcast directors need to be much more concentrated, responsive, and knowledgeable, during the production. To relieve the directors from their intensive efforts, we develop an innovative automated sports broadcast directing system, called Smart Director, which aims at mimicking the typical human-in-the-loop broadcasting process to automatically create near-professional broadcasting programs in real-time by using a set of advanced multi-view video analysis algorithms. 
Inspired by the so-called "three-event" construction of sports broadcast, we build our system with an event-driven pipeline consisting of three consecutive novel components: 1) the Multi-view Event Localization to detect events by modeling multi-view correlations, 2) the Multi-view Highlight Detection to rank camera views by the visual importance for view selection, 3) the Auto-Broadcasting Scheduler to control the production of broadcasting videos. To our best knowledge, our system is the first end-to-end automated directing system for multi-camera sports broadcasting, completely driven by the semantic understanding of sports events. It is also the first system to solve the novel problem of multi-view joint event detection by cross-view relation modeling. We conduct both objective and subjective evaluations on a real-world multi-camera soccer dataset, which demonstrate the quality of our auto-generated videos is comparable to that of the human-directed. Thanks to its faster response, our system is able to capture more fast-passing and short-duration events which are usually missed by human directors.	翻訳日:2022-01-12 14:40:23 公開日:2022-01-11
# MobilePhys: パーソナライズされたカメラベースのコンタクトレス生理的センシング MobilePhys: Personalized Mobile Camera-Based Contactless Physiological Sensing ( http://arxiv.org/abs/2201.04039v1 ) ライセンス: Link先を確認	Xin Liu, Yuntao Wang, Sinan Xie, Xiaoyu Zhang, Zixian Ma, Daniel McDuff, Shwetak Patel	(参考訳) カメラベースのコンタクトレスフォトプレチモグラフィ(英: contactless photoplethysmography)は、コンタクトレス生理測定のための一般的な技術である。現在の最先端のニューラルモデルは通常、金の標準的な生理学的測定を伴うビデオを使用して教師ありの方法で訓練される。しかし、多くの場合、ドメイン外の例(トレーニングセットと異なるビデオ)を一般化する。パーソナライズモデルはモデルの一般化性を改善するのに役立つが、多くのパーソナライズ技術は金の標準データを必要とする。そこで本稿では,スマートフォンの前面カメラと背面カメラの両方を利用して,パーソナライズされたコンタクトレスカメラベースのppgモデルをトレーニングするための高品質な自己教師付きラベルを生成する,モバイルパーソナライズ型リモート生理センシングシステムmobilephysを提案する。 MobilePhysのロバスト性を評価するために,異なるモバイルデバイス,照明条件/強度,動作タスク,皮膚タイプでタスクセットを完了した39名の被験者を対象に,ユーザスタディを行った。以上の結果から,MobilePhysはデバイス上での教師付きトレーニングや少数ショット適応手法よりも優れていた。広範なユーザ研究を通じて,MobilePhysは複雑な実世界の環境でどのように機能するかをさらに検討する。提案するデュアルカメラ・モバイルセンシングシステムから生成したカメラベースコンタクトレスppgモデルのキャリブレーションやパーソナライズによるppgモデルは,スマートミラーやフィットネス,モバイルヘルスアプリケーションなど,将来の多くのアプリケーションへの扉を開くだろう。 Camera-based contactless photoplethysmography refers to a set of popular techniques for contactless physiological measurement. The current state-of-the-art neural models are typically trained in a supervised manner using videos accompanied by gold standard physiological measurements. However, they often generalize poorly out-of-domain examples (i.e., videos that are unlike those in the training set). Personalizing models can help improve model generalizability, but many personalization techniques still require some gold standard data. To help alleviate this dependency, in this paper, we present a novel mobile sensing system called MobilePhys, the first mobile personalized remote physiological sensing system, that leverages both front and rear cameras on a smartphone to generate high-quality self-supervised labels for training personalized contactless camera-based PPG models. 
To evaluate the robustness of MobilePhys, we conducted a user study with 39 participants who completed a set of tasks under different mobile devices, lighting conditions/intensities, motion tasks, and skin types. Our results show that MobilePhys significantly outperforms the state-of-the-art on-device supervised training and few-shot adaptation methods. Through extensive user studies, we further examine how MobilePhys performs in complex real-world settings. We envision that calibrated or personalized camera-based contactless PPG models generated from our proposed dual-camera mobile sensing system will open the door for numerous future applications such as smart mirrors, fitness and mobile health applications.	翻訳日:2022-01-12 14:39:55 公開日:2022-01-11
# DM-VIO: 遅延不整形視覚慣性オドメトリー DM-VIO: Delayed Marginalization Visual-Inertial Odometry ( http://arxiv.org/abs/2201.04114v1 ) ライセンス: Link先を確認	Lukas von Stumberg, Daniel Cremers	(参考訳) DM-VIOは,遅延境界化法とポーズグラフバンドル調整法という2つの新しい手法に基づく単眼視覚・慣性オドメトリーシステムである。 dm-vioは、視覚残差のために動的重みで測光束調整を行う。我々は、更新時間を制限し続けるための一般的な戦略である限界化を採用するが、容易に逆転することはできず、連結変数の線形化点を固定する必要がある。この問題を解決するために、我々は、遅延余分化を提案する: この考え方は、余分化が遅れる第2因子グラフを維持することである。これにより、この遅延グラフを後で読み出し、新しい一貫性のある線形化点に先行して辺縁化を更新できる。さらに, 限界化の遅れにより, IMU 情報を既存の限界化状態に注入することができる。これは、IMU初期化に使用する提案されたポーズグラフバンドル調整の基礎である。 IMU初期化に関する以前の研究とは対照的に、完全な測光の不確かさを捉え、スケール推定を改善することができる。当初観測不能なスケールに対応するため、IMU初期化が完了した後も、メインシステムのスケールと重力方向を最適化し続けます。我々は,EuRoC,TUM-VI,および4Seasonsデータセットを用いて,空飛ぶドローン,大規模ハンドヘルド,自動車シナリオからなるシステム評価を行った。提案したIMUイニシャライゼーションにより,本システムは視覚・慣性オードメトリーにおいて,単一のカメラとIMUのみを使用しながら,ステレオ慣性手法よりも優れていた。コードはhttp://vision.in.tum.de/dm-vioで公開される。 We present DM-VIO, a monocular visual-inertial odometry system based on two novel techniques called delayed marginalization and pose graph bundle adjustment. DM-VIO performs photometric bundle adjustment with a dynamic weight for visual residuals. We adopt marginalization, which is a popular strategy to keep the update time constrained, but it cannot easily be reversed, and linearization points of connected variables have to be fixed. To overcome this we propose delayed marginalization: The idea is to maintain a second factor graph, where marginalization is delayed. This allows us to later readvance this delayed graph, yielding an updated marginalization prior with new and consistent linearization points. In addition, delayed marginalization enables us to inject IMU information into already marginalized states. This is the foundation of the proposed pose graph bundle adjustment, which we use for IMU initialization. In contrast to prior works on IMU initialization, it is able to capture the full photometric uncertainty, improving the scale estimation. 
In order to cope with initially unobservable scale, we continue to optimize scale and gravity direction in the main system after IMU initialization is complete. We evaluate our system on the EuRoC, TUM-VI, and 4Seasons datasets, which comprise flying drone, large-scale handheld, and automotive scenarios. Thanks to the proposed IMU initialization, our system exceeds the state of the art in visual-inertial odometry, even outperforming stereo-inertial methods while using only a single camera and IMU. The code will be published at http://vision.in.tum.de/dm-vio	翻訳日:2022-01-12 14:39:13 公開日:2022-01-11
# humannerf: モノクロビデオから人を動かす自由視点レンダリング HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular Video ( http://arxiv.org/abs/2201.04127v1 ) ライセンス: Link先を確認	Chung-Yi Weng, Brian Curless, Pratul P. Srinivasan, Jonathan T. Barron and Ira Kemelmacher-Shlizerman	(参考訳) 我々は、人間の複雑な身体の動き、例えばYouTubeのビデオの特定の単眼ビデオで動作する自由視点レンダリング手法、HumanNeRFを紹介した。提案手法では,任意のフレームで動画をパージングし,任意のカメラ視点から被写体をレンダリングしたり,特定のフレームとボディポーズのための360度カメラパスをフルに作成することができる。この作業は特に困難であり、入力ビデオに存在しない様々なカメラアングルから見るように、身体のフォトリアリスティックな詳細を合成し、布の折りたたみや顔の外観などの細かい詳細を合成する必要がある。提案手法は, 逆ワープによる映像のすべてのフレームに推定された正準表現をマッピングする運動場と協調して, 標準T位置における人物の体積表現を最適化する。運動場は、深層ネットワークによって生成される骨格剛体および非剛体運動に分解される。先行作業よりも性能が大幅に向上し,無制御のキャプチャシナリオに挑戦する単眼映像からのフリー視点レンダリングの説得力のある例を示す。 We introduce a free-viewpoint rendering method -- HumanNeRF -- that works on a given monocular video of a human performing complex body motions, e.g. a video from YouTube. Our method enables pausing the video at any frame and rendering the subject from arbitrary new camera viewpoints or even a full 360-degree camera path for that particular frame and body pose. This task is particularly challenging, as it requires synthesizing photorealistic details of the body, as seen from various camera angles that may not exist in the input video, as well as synthesizing fine details such as cloth folds and facial appearance. Our method optimizes for a volumetric representation of the person in a canonical T-pose, in concert with a motion field that maps the estimated canonical representation to every frame of the video via backward warps. The motion field is decomposed into skeletal rigid and non-rigid motions, produced by deep networks. We show significant performance improvements over prior work, and compelling examples of free-viewpoint renderings from monocular video of moving humans in challenging uncontrolled capture scenarios.	翻訳日:2022-01-12 14:38:47 公開日:2022-01-11
# pymdp: 離散状態空間におけるアクティブ推論のためのPythonライブラリ pymdp: A Python library for active inference in discrete state spaces ( http://arxiv.org/abs/2201.03904v1 ) ライセンス: Link先を確認	Conor Heins, Beren Millidge, Daphne Demekas, Brennan Klein, Karl Friston, Iain Couzin, Alexander Tschantz	(参考訳) アクティブ推論(Active Inference)は、ベイズ推論の理論マントルの下で行動、知覚、学習をまとめる複雑なシステムにおける認知と行動の説明である。アクティブ推論は学術研究、特に人間や動物の行動をモデル化しようとする分野に応用が増えている。近年では、pythonやjuliaのようなオープンソース言語で、アクティブな推論エージェントをシミュレートするための最も人気のあるソフトウェアであるto-dateは、神経画像データの統計解析とモデリングのために元々開発されたmatlabライブラリであるspmのdemツールボックスである。アクティブ推論への関心の高まりは、膨大な数と、科学分野にわたるアプリケーションの多様化の両方において現れ、pythonのようなオープンソースの科学計算言語でアクティブ推論をシミュレートするための汎用的で広く利用可能な、ユーザフレンドリなコードの必要性を生み出した。ここで紹介するpythonパッケージであるpymdp(https://github.com/infer-actively/pymdp参照)は、この方向への大きなステップを示しています。我々は、パッケージの構造をレビューし、モジュール設計やカスタマイズ性といった利点を説明しながら、テキスト内コードブロックを提供し、アクティブな推論プロセスの構築と実行を簡単にする方法をデモする。我々は,様々な学際的背景を持つ研究者,技術者,開発者に対して,アクティブ推論フレームワークのアクセシビリティと露出を高めるために,pymdpを開発した。オープンソースソフトウェアの精神では、活発な推論コミュニティにおいて、新たなイノベーション、開発、コラボレーションが促進されることを願っています。 Active inference is an account of cognition and behavior in complex systems which brings together action, perception, and learning under the theoretical mantle of Bayesian inference. Active inference has seen growing applications in academic research, especially in fields that seek to model human or animal behavior. While in recent years, some of the code arising from the active inference literature has been written in open source languages like Python and Julia, to-date, the most popular software for simulating active inference agents is the DEM toolbox of SPM, a MATLAB library originally developed for the statistical analysis and modelling of neuroimaging data. 
Increasing interest in active inference, manifested both in terms of sheer number as well as diversifying applications across scientific disciplines, has thus created a need for generic, widely-available, and user-friendly code for simulating active inference in open-source scientific computing languages like Python. The Python package we present here, pymdp (see https://github.com/infer-actively/pymdp), represents a significant step in this direction: namely, we provide the first open-source package for simulating active inference with partially-observable Markov Decision Processes or POMDPs. We review the package's structure and explain its advantages like modular design and customizability, while providing in-text code blocks along the way to demonstrate how it can be used to build and run active inference processes with ease. We developed pymdp to increase the accessibility and exposure of the active inference framework to researchers, engineers, and developers with diverse disciplinary backgrounds. In the spirit of open-source software, we also hope that it spurs new innovation, development, and collaboration in the growing active inference community.	翻訳日:2022-01-12 14:38:05 公開日:2022-01-11
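As a rough illustration of the state-inference step that a discrete POMDP agent performs, the sketch below applies one predict-and-update cycle of Bayes' rule. It is plain Python and does not use the pymdp API; the matrices `A` (observation likelihood) and `B_stay` (transition under a single action), the function names, and all numbers are hypothetical toy values chosen only for this example.

```python
# Minimal sketch of discrete Bayesian state inference in a POMDP.
# NOT the pymdp API: A, B_stay and the prior are made-up toy values.

def normalize(v):
    """Rescale a list of non-negative numbers so that it sums to 1."""
    total = sum(v)
    return [x / total for x in v]

def predict(belief, B_action):
    """Push the belief through the transition model.

    B_action[s_next][s_prev] = P(s_next | s_prev, action)
    """
    n = len(belief)
    return [sum(B_action[i][j] * belief[j] for j in range(n)) for i in range(n)]

def update(prior, A, obs):
    """Bayes' rule: posterior is proportional to likelihood times prior.

    A[o][s] = P(o | s)
    """
    return normalize([A[obs][s] * prior[s] for s in range(len(prior))])

# Two hidden states, two possible observations.
A = [[0.9, 0.2],
     [0.1, 0.8]]
# The "stay" action leaves the hidden state unchanged.
B_stay = [[1.0, 0.0],
          [0.0, 1.0]]

belief = [0.5, 0.5]                                  # flat prior over states
belief = update(predict(belief, B_stay), A, obs=0)   # observe outcome 0
print(belief)
```

With a flat prior, observing outcome 0 shifts the belief toward the first state, because that state makes the observation more likely. A full active inference agent would additionally score policies before acting, which this sketch leaves out.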
# Hölder関数に対するディープニューラルネットワーク近似 Deep Neural Network Approximation For Hölder Functions ( http://arxiv.org/abs/2201.03747v1 ) ライセンス: Link先を確認	Ahmed Abdeljawad	(参考訳) 本研究では,Hölder-regular関数に対する深層Rectified Quadratic Unitニューラルネットワークの近似能力について,一様ノルムに関して検討する。理論的近似はニューラルネットワークにおいて選択された活性化関数に大きく依存する。 In this work, we explore the approximation capability of deep Rectified Quadratic Unit neural networks for Hölder-regular functions, with respect to the uniform norm. We find that theoretical approximation heavily depends on the selected activation function in the neural network.	翻訳日:2022-01-12 14:36:27 公開日:2022-01-11
# (参考訳) 軽量ニューラルネットワークアニメーションを目指して : エキスパートベースアニメーションモデルの混合によるニューラルネットワークプルーニングの探索 Towards Lightweight Neural Animation : Exploration of Neural Network Pruning in Mixture of Experts-based Animation Models ( http://arxiv.org/abs/2201.04042v1 ) ライセンス: CC BY 4.0	Antoine Maiorca, Nathan Hubens, Sohaib Laraba and Thierry Dutoit	(参考訳) 近年,ニューラルネットワークアニメーションが登場し,仮想キャラクターをアニメーション化する自動手法が提案されている。それらの動きはニューラルネットワークによって合成される。この動きをユーザ定義の制御信号でリアルタイムに制御することは、ビデオゲームでも重要なタスクである。完全連結層(MLP)と混合専門家(MoE)に基づくソリューションは、環境と仮想キャラクタ間の密接な相互作用によって様々な動きを生成し制御する素晴らしい結果をもたらしている。しかし、完全接続層の主な欠点は、計算コストとメモリコストが準最適なソリューションにつながる可能性があることである。本研究では,MLP-MoEニューラルネットワークをインタラクティブなキャラクターアニメーションの文脈で圧縮するためにプルーニングアルゴリズムを適用し,パラメータの数を削減し,このアクセラレーションと合成された動き品質とのトレードオフにより計算時間を高速化する。この研究は、同じ数のエキスパートとパラメータで、刈り取ったモデルは密度の高いモデルよりも動きのアーティファクトを少なくし、学習されたハイレベルな運動特徴が両方のモデルに類似していることを示します。 In the past few years, neural character animation has emerged and offered an automatic method for animating virtual characters. Their motion is synthesized by a neural network. Controlling this movement in real time with a user-defined control signal is also an important task in video games, for example. Solutions based on fully-connected layers (MLPs) and Mixture-of-Experts (MoE) have given impressive results in generating and controlling various movements with close-range interactions between the environment and the virtual character. However, a major shortcoming of fully-connected layers is their computational and memory cost, which may lead to sub-optimal solutions. In this work, we apply pruning algorithms to compress an MLP-MoE neural network in the context of interactive character animation, which reduces its number of parameters and accelerates its computation time with a trade-off between this acceleration and the synthesized motion quality. 
This work demonstrates that, with the same number of experts and parameters, the pruned model produces fewer motion artifacts than the dense model and the learned high-level motion features are similar for both	翻訳日:2022-01-12 14:35:53 公開日:2022-01-11
# DDG-DA:予測可能なコンセプトドリフト適応のためのデータ分散生成 DDG-DA: Data Distribution Generation for Predictable Concept Drift Adaptation ( http://arxiv.org/abs/2201.04038v1 ) ライセンス: Link先を確認	Wendi Li, Xiao Yang, Weiqing Liu, Yingce Xia, Jiang Bian	(参考訳) 多くの現実世界のシナリオでは、時間とともに順次収集されるストリーミングデータを扱うことが多い。環境の非定常的な性質のため、ストリーミングデータ分布は予測不可能な方法で変化する可能性がある。概念ドリフトを処理するために、従来の手法はまず、概念ドリフトの発生時期を検知し、次に最新のデータの分布に合わせてモデルを適用する。しかしながら、環境進化の基盤となる要因が予測可能であり、ストリーミングデータの将来の概念ドリフト傾向をモデル化できるケースは多いが、以前の作業では十分に検討されていない。本稿では,データ分散の進化を効果的に予測し,モデルの性能を向上させる手法DDG-DAを提案する。具体的には、まず予測器をトレーニングして将来のデータ分布を推定し、次にトレーニングサンプルを生成し、最終的に生成されたデータでモデルをトレーニングします。我々は,3つの実世界の課題(株価動向,電力負荷,日射量の予測)について実験を行い,多種多様なモデルにおいて有意な改善を得る。 In many real-world scenarios, we often deal with streaming data that is sequentially collected over time. Due to the non-stationary nature of the environment, the streaming data distribution may change in unpredictable ways, which is known as concept drift. To handle concept drift, previous methods first detect when/where the concept drift happens and then adapt models to fit the distribution of the latest data. However, there are still many cases that some underlying factors of environment evolution are predictable, making it possible to model the future concept drift trend of the streaming data, while such cases are not fully explored in previous work. In this paper, we propose a novel method DDG-DA, that can effectively forecast the evolution of data distribution and improve the performance of models. Specifically, we first train a predictor to estimate the future data distribution, then leverage it to generate training samples, and finally train models on the generated data. We conduct experiments on three real-world tasks (forecasting on stock price trend, electricity load and solar irradiance) and obtain significant improvement on multiple widely-used models.	翻訳日:2022-01-12 14:24:53 公開日:2022-01-11
# ランダムグラフにおけるエントロピー最適輸送 Entropic Optimal Transport in Random Graphs ( http://arxiv.org/abs/2201.03949v1 ) ライセンス: Link先を確認	Nicolas Keriven	(参考訳) グラフ解析において、古典的なタスクはノード間の(グループの)類似性の計算によって構成される。潜在空間ランダムグラフでは、ノードは未知の潜在変数に関連付けられる。すると、グラフ構造のみを用いて、潜在空間内で直接距離を計算することができる。本稿では,潜在空間内のノード群間におけるエントロピー規則化された最適輸送(OT)距離を一貫して推定できることを示す。コスト行列の摂動に対するエントロピーOTの一般的な安定性結果を提供する。その後、グラトンや多様体上の$\epsilon$-graphsのようなランダムグラフのいくつかの例に適用する。その過程で、いわゆる普遍特異値しきい値推定器と、多様体上の測地距離の推定のための新しい濃度結果が証明される。 In graph analysis, a classic task consists in computing similarity measures between (groups of) nodes. In latent space random graphs, nodes are associated to unknown latent variables. One may then seek to compute distances directly in the latent space, using only the graph structure. In this paper, we show that it is possible to consistently estimate entropic-regularized Optimal Transport (OT) distances between groups of nodes in the latent space. We provide a general stability result for entropic OT with respect to perturbations of the cost matrix. We then apply it to several examples of random graphs, such as graphons or $\epsilon$-graphs on manifolds. Along the way, we prove new concentration results for the so-called Universal Singular Value Thresholding estimator, and for the estimation of geodesic distances on a manifold.	翻訳日:2022-01-12 14:24:14 公開日:2022-01-11
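For readers unfamiliar with entropic-regularized OT, the following sketch runs the standard Sinkhorn iterations on a tiny, hand-written cost matrix. It only illustrates the regularized transport problem itself. It does not estimate the cost from a graph as the paper does, and the function name, the regularization strength `eps`, and the toy histograms are assumptions made for this example.

```python
import math

def sinkhorn(a, b, cost, eps=0.5, n_iters=200):
    """Entropic-regularized OT between histograms a and b (Sinkhorn iterations).

    Returns the transport plan and its transport cost <plan, cost>
    (the entropy term is not included in the returned cost).
    """
    n, m = len(a), len(b)
    # Gibbs kernel K_ij = exp(-C_ij / eps)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    transport_cost = sum(plan[i][j] * cost[i][j]
                         for i in range(n) for j in range(m))
    return plan, transport_cost

# Toy example: two source points, two target points, unit cost off the diagonal.
a = [0.5, 0.5]
b = [0.25, 0.75]
cost = [[0.0, 1.0],
        [1.0, 0.0]]
plan, transport_cost = sinkhorn(a, b, cost)
print(plan, transport_cost)
```

A smaller `eps` brings the plan closer to the unregularized optimum but slows convergence. Perturbing `cost` slightly and comparing the resulting plans gives an informal feel for the kind of stability question the abstract refers to.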
# Captcha攻撃:人間性に対する攻撃 Captcha Attack: Turning Captchas Against Humanity ( http://arxiv.org/abs/2201.04014v1 ) ライセンス: Link先を確認	Mauro Conti, Luca Pajola, Pier Paolo Tricomi	(参考訳) 現在、人々はオンラインプラットフォーム(例えば、ソーシャルネットワーク、ブログ)で大量のコンテンツを作成、共有している。 2021年、19億人のデイリーアクティブFacebookユーザーが毎分約15万枚の写真を投稿した。コンテンツモデレーターは常にこれらのオンラインプラットフォームを監視し、不適切なコンテンツ(ヘイトスピーチ、ヌード画像など)の拡散を防ぐ。ディープラーニング(DL)の進歩に基づいて、自動コンテンツモデレータ(ACM)は、人間のモデレーターが高いデータボリュームを処理するのに役立つ。アドバンテージにもかかわらず、攻撃者はDLコンポーネント(例えば前処理、モデル)の弱点を利用してパフォーマンスに影響を与えることができる。したがって、攻撃者はACMを回避して不適切なコンテンツを拡散することができる。そこで本研究では,ACM制御を回避して不適切なテキストをオンラインで拡散できるCAPtcha Attack (CAPA)を提案する。 CAPAはカスタムテキストCAPTCHAを生成することで、ACMの不注意な設計実装と内部プロシージャの脆弱性を利用する。実世界のACMに対する攻撃を検証し、その結果、単純で効果的な攻撃の威力を確認し、ほとんどのケースで100%の回避に成功した。同時に、CAPTCHAs研究領域におけるCAPA緩和の難しさを実証し、新たな課題を提起する。 Nowadays, people generate and share massive content on online platforms (e.g., social networks, blogs). In 2021, the 1.9 billion daily active Facebook users posted around 150 thousand photos every minute. Content moderators constantly monitor these online platforms to prevent the spreading of inappropriate content (e.g., hate speech, nudity images). Based on deep learning (DL) advances, Automatic Content Moderators (ACM) help human moderators handle high data volume. Despite their advantages, attackers can exploit weaknesses of DL components (e.g., preprocessing, model) to affect their performance. Therefore, an attacker can leverage such techniques to spread inappropriate content by evading ACM. In this work, we propose CAPtcha Attack (CAPA), an adversarial technique that allows users to spread inappropriate text online by evading ACM controls. CAPA, by generating custom textual CAPTCHAs, exploits ACM's careless design implementations and internal procedures vulnerabilities. We test our attack on real-world ACM, and the results confirm the ferocity of our simple yet effective attack, reaching up to a 100% evasion success in most cases. 
At the same time, we demonstrate the difficulties in designing CAPA mitigations, opening new challenges in the CAPTCHA research area.	翻訳日:2022-01-12 14:23:13 公開日:2022-01-11
# 大規模なデータセット改善のためのモバイルUIレイアウトのノイズ除去学習 Learning to Denoise Raw Mobile UI Layouts for Improving Datasets at Scale ( http://arxiv.org/abs/2201.04100v1 ) ライセンス: Link先を確認	Gang Li, Gilles Baechler, Manuel Tragut, Yang Li	(参考訳) モバイル画面のレイアウトは、UI設計研究と画面のセマンティック理解のための重要なデータソースである。しかし、既存のデータセットのUIレイアウトは、しばしばノイズが多いか、視覚表現とミスマッチしているか、あるいは分析やモデル化が難しいジェネリックまたはアプリ固有型で構成されている。本稿では,既存のモバイルUIレイアウトデータセットを大規模に自動改善可能な,UIレイアウトのノイズを除去するディープラーニングアプローチを用いたCLAYパイプラインを提案する。パイプラインは、スクリーンショットと生のUIレイアウトの両方を取り、不正なノードを削除し、各ノードに意味のあるタイプを割り当てることで、生のレイアウトに注釈を付ける。データクリーニングパイプラインを試すために、公開モバイルUIコーパスであるRicoのスクリーンショットと生のレイアウトに基づいて、59,555のヒューマンアノテーション付きスクリーンレイアウトのCLAYデータセットを作成しました。深層モデルでは,有効な視覚的表現を持たないレイアウトオブジェクトの検出では82.7%,オブジェクトタイプ認識では85.9%のF1スコアと,ヒューリスティックベースラインを有意に上回る精度を実現している。本研究は、データ駆動型モバイルUI研究のための大規模で高品質なUIレイアウトデータセットを作成するための基盤を築き、非常に高価な手動ラベル付け作業の必要性を低減する。 The layout of a mobile screen is a critical data source for UI design research and semantic understanding of the screen. However, UI layouts in existing datasets are often noisy, have mismatches with their visual representation, or consist of generic or app-specific types that are difficult to analyze and model. In this paper, we propose the CLAY pipeline that uses a deep learning approach for denoising UI layouts, allowing us to automatically improve existing mobile UI layout datasets at scale. Our pipeline takes both the screenshot and the raw UI layout, and annotates the raw layout by removing incorrect nodes and assigning a semantically meaningful type to each node. To experiment with our data-cleaning pipeline, we create the CLAY dataset of 59,555 human-annotated screen layouts, based on screenshots and raw layouts from Rico, a public mobile UI corpus. Our deep models achieve high accuracy with F1 scores of 82.7% for detecting layout objects that do not have a valid visual representation and 85.9% for recognizing object types, which significantly outperforms a heuristic baseline. 
Our work lays a foundation for creating large-scale high quality UI layout datasets for data-driven mobile UI research and reduces the need for manual labeling efforts that are prohibitively expensive.	翻訳日:2022-01-12 14:22:53 公開日:2022-01-11
# 入力中の不確かさの検出による事前学習言語モデルの予測不確かさの説明 Explaining Prediction Uncertainty of Pre-trained Language Models by Detecting Uncertain Words in Inputs ( http://arxiv.org/abs/2201.03742v1 ) ライセンス: Link先を確認	Hanjie Chen, Yangfeng Ji	(参考訳) プレトレーニング言語モデルの予測不確実性を推定することは,NLPにおける信頼性を高める上で重要である。先行研究の多くは予測の不確かさの定量化に重点を置いているが、不確実性を説明する作業はほとんどない。本稿では,事前訓練後の言語モデルの不確定な予測について,さらに説明していく。 2つの摂動法に基づくポストホック解釈法であるlet-one-out と sample shapley を適用し,予測の不確実性を引き起こす入力中の単語を同定した。提案手法をBERTとRoBERTaの3つのタスク(感情分類、自然言語推論、パラフレーズ識別)で、ドメイン内およびドメイン外の両方で検証する。実験により、どちらの手法も、予測の不確実性を引き起こす入力中の単語を常に捕捉することを示した。 Estimating the predictive uncertainty of pre-trained language models is important for increasing their trustworthiness in NLP. Although many previous works focus on quantifying prediction uncertainty, there is little work on explaining the uncertainty. This paper pushes a step further on explaining uncertain predictions of post-calibrated pre-trained language models. We adapt two perturbation-based post-hoc interpretation methods, Leave-one-out and Sampling Shapley, to identify words in inputs that cause the uncertainty in predictions. We test the proposed methods on BERT and RoBERTa with three tasks: sentiment classification, natural language inference, and paraphrase identification, in both in-domain and out-of-domain settings. Experiments show that both methods consistently capture words in inputs that cause prediction uncertainty.	翻訳日:2022-01-12 14:21:24 公開日:2022-01-11
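The Leave-one-out idea mentioned in the abstract above can be sketched on a toy classifier: remove one word at a time and record how much the predictive entropy drops. This is only an illustration under stated assumptions. The word weights, the sigmoid "model", and the function names are hypothetical stand-ins, not BERT or RoBERTa and not the authors' implementation, and no calibration step is included.

```python
import math

# Hypothetical toy sentiment "model": per-word scores, sigmoid over their sum.
WEIGHTS = {"great": 2.0, "terrible": -2.0, "but": 0.0, "movie": 0.0}

def predict_proba(words):
    """Return [P(negative), P(positive)] for a list of words."""
    score = sum(WEIGHTS.get(w, 0.0) for w in words)
    p_pos = 1.0 / (1.0 + math.exp(-score))
    return [1.0 - p_pos, p_pos]

def entropy(probs):
    """Shannon entropy (in nats), used here as the uncertainty measure."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def leave_one_out_uncertainty(words):
    """Score each word by how much uncertainty drops when it is removed."""
    base = entropy(predict_proba(words))
    scores = []
    for i, w in enumerate(words):
        reduced = words[:i] + words[i + 1:]
        scores.append((w, base - entropy(predict_proba(reduced))))
    return scores

sentence = ["great", "movie", "but", "terrible"]
attributions = leave_one_out_uncertainty(sentence)
for word, score in attributions:
    print(f"{word:10s} {score:+.3f}")
```

In this toy sentence the two conflicting sentiment words receive positive scores, since removing either one makes the prediction more confident, while the neutral words score zero.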
# 単語置換に対するロバストさの定量化 Quantifying Robustness to Adversarial Word Substitutions ( http://arxiv.org/abs/2201.03829v1 ) ライセンス: Link先を確認	Yuting Yang, Pei Huang, FeiFei Ma, Juan Cao, Meishan Zhang, Jian Zhang and Jintao Li	(参考訳) 深層学習に基づくNLPモデルは単語置換摂動に弱いことが判明した。広く採用される前に、堅牢性に関する基本的な問題に対処する必要がある。本稿では,単語レベルのロバスト性を評価するための形式的枠組みを提案する。まず,モデルの安全な領域を研究するために,モデルが摂動に抵抗できる境界であるロバスト性半径を導入する。最大ロバスト性半径の計算は計算が難しいので、その上限と下限を推定する。攻撃手法を上界を求める方法として再活用し,より強固な上界に対して擬似動的プログラミングアルゴリズムを設計する。そして、下限に対して検証方法を利用する。さらに,安全な半径外の領域のロバスト性を評価するために,別の視点からロバスト性を再検討する。厳密な統計的保証を持つロバストネス計量を導入し、モデルが安全な半径の外の摂動に感受性を示す逆例の定量化を計測する。このメトリクスは、BERTのような最先端のモデルがいくつかの単語置換によって簡単に騙されることができる理由を理解するのに役立ちます。 Deep-learning-based NLP models are found to be vulnerable to word substitution perturbations. Before they are widely adopted, the fundamental issues of robustness need to be addressed. Along this line, we propose a formal framework to evaluate word-level robustness. First, to study safe regions for a model, we introduce robustness radius which is the boundary where the model can resist any perturbation. As calculating the maximum robustness radius is computationally hard, we estimate its upper and lower bound. We repurpose attack methods as ways of seeking upper bound and design a pseudo-dynamic programming algorithm for a tighter upper bound. Then verification method is utilized for a lower bound. Further, for evaluating the robustness of regions outside a safe radius, we reexamine robustness from another view: quantification. A robustness metric with a rigorous statistical guarantee is introduced to measure the quantification of adversarial examples, which indicates the model's susceptibility to perturbations outside the safe radius. The metric helps us figure out why state-of-the-art models like BERT can be easily fooled by a few word substitutions, but generalize well in the presence of real-world noises.	翻訳日:2022-01-12 14:21:10 公開日:2022-01-11
# 因果グラフのない因果推論のためのアンセストラル法 Ancestral instrument method for causal inference without a causal graph ( http://arxiv.org/abs/2201.03810v1 ) ライセンス: Link先を確認	Debo Cheng (1) and Jiuyong Li (1) and Lin Liu (1) and Jiji Zhang (2) and Thuc duy Le (1) and Jixue Liu (1) ((1) STEM, University of South Australia, Adelaide, SA, Australia, (2) Department of Religion and Philosophy, Hong Kong Baptist University, Hong Kong, China)	(参考訳) 観測データから因果効果を推定する主な障害は、観測不能な共起である。インストゥルメンタル変数(ivs)は潜在共同創設者が存在する場合の因果効果推定に広く使われている。標準 iv 法では、与えられた iv が有効であれば、偏りのない推定が得られるが、標準 iv の妥当性要件は厳格で検証不可能である。条件IVは、観測変数の集合(条件IVの条件集合として知られる)を条件付けすることで標準IVの要求を緩和するために提案されている。しかし、条件付きivの条件付き集合を見つけるための基準は、観測変数と観測変数の両方の因果関係を表す完全な因果構造知識または有向非巡回グラフ(dag)が必要である。これにより、データから直接条件セットを見つけることが不可能になる。本稿では,潜在変数を用いた因果推論において最大祖先グラフ(mags)を活用し,magにおける新しいタイプのiv, ancestral ivを提案し,magにおける所定の祖先ivの条件付け集合をデータ駆動的に発見する理論を考案する。この理論に基づいて,マグおよび観測データにおける祖先ivを用いた非バイアス因果効果推定アルゴリズムを開発した。合成および実世界のデータセットに関する大規模な実験は、既存のIV法と比較してアルゴリズムの性能を実証した。 Unobserved confounding is the main obstacle to causal effect estimation from observational data. Instrumental variables (IVs) are widely used for causal effect estimation when there exist latent confounders. With the standard IV method, when a given IV is valid, unbiased estimation can be obtained, but the validity requirement of standard IV is strict and untestable. Conditional IV has been proposed to relax the requirement of standard IV by conditioning on a set of observed variables (known as a conditioning set for a conditional IV). However, the criterion for finding a conditioning set for a conditional IV needs complete causal structure knowledge or a directed acyclic graph (DAG) representing the causal relationships of both observed and unobserved variables. This makes it impossible to discover a conditioning set directly from data. 
In this paper, by leveraging maximal ancestral graphs (MAGs) in causal inference with latent variables, we propose a new type of IV, ancestral IV in MAG, and develop the theory to support data-driven discovery of the conditioning set for a given ancestral IV in MAG. Based on the theory, we develop an algorithm for unbiased causal effect estimation with an ancestral IV in MAG and observational data. Extensive experiments on synthetic and real-world datasets have demonstrated the performance of the algorithm in comparison with existing IV methods.	翻訳日:2022-01-12 14:20:51 公開日:2022-01-11
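To make the role of an instrument concrete, the sketch below simulates data with an unobserved confounder and compares a naive regression slope with the standard IV ratio estimator cov(Z, Y) / cov(Z, X). This is the textbook standard IV setting, not the ancestral IV method or the conditioning-set discovery proposed in the paper. The data-generating process, the coefficients, and the function names are assumptions made for this example.

```python
import random

def cov(x, y):
    """Population covariance of two equal-length lists."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def ols_slope(x, y):
    """Naive regression slope of y on x (biased under confounding)."""
    return cov(x, y) / cov(x, x)

def iv_estimate(z, x, y):
    """Standard IV (ratio) estimator: cov(Z, Y) / cov(Z, X)."""
    return cov(z, y) / cov(z, x)

random.seed(0)
n = 20000
true_effect = 2.0
z = [random.gauss(0, 1) for _ in range(n)]   # instrument: affects y only through x
u = [random.gauss(0, 1) for _ in range(n)]   # unobserved confounder of x and y
x = [zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [true_effect * xi + 3.0 * ui + random.gauss(0, 1) for xi, ui in zip(x, u)]

naive = ols_slope(x, y)
iv = iv_estimate(z, x, y)
print(f"naive OLS slope: {naive:.2f}, IV estimate: {iv:.2f}, truth: {true_effect}")
```

The naive slope absorbs the confounder's influence and overshoots, while the ratio estimator stays near the true effect because the simulated instrument is valid by construction. With real data that validity cannot be checked this way, which is the difficulty the abstract describes.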
# オントロジーによるユーザ嗜好の獲得と表現 Acquisition and Representation of User Preferences Guided by an Ontology ( http://arxiv.org/abs/2201.03824v1 ) ライセンス: Link先を確認	Rahma Dandan, Sylvie Despres, Karima Sedki	(参考訳) 私たちの食物選好は、食べ物の選択を導き、個人の健康や社会生活に影響を与えます。本稿では,CP-Net形式における嗜好の獲得と表現を支援するためにOWL2で表現されたドメインオントロジーを用いたアプローチを採用する。具体的には,ドメインオントロジーの構築とアンケート設計を示し,嗜好の獲得と表現を行う。嗜好の獲得と表現は大学キャンティーンの分野で実施される。この予備作業における私たちの大きな貢献は、嗜好を獲得し、オントロジーで表現されたドメイン知識によって、望ましくはモデルを強化することです。 Our food preferences guide our food choices and in turn affect our personal health and our social life. In this paper, we adopt an approach using a domain ontology expressed in OWL2 to support the acquisition and representation of preferences in the CP-Net formalism. Specifically, we present the construction of the domain ontology and questionnaire design to acquire and represent the preferences. The acquisition and representation of preferences are implemented in the setting of a university canteen. Our main contribution in this preliminary work is to acquire preferences and enrich the model preferably with domain knowledge represented in the ontology.	翻訳日:2022-01-12 14:20:30 公開日:2022-01-11
# rgb/ir融合によるドローン物体検出 Drone Object Detection Using RGB/IR Fusion ( http://arxiv.org/abs/2201.03786v1 ) ライセンス: Link先を確認	Lizhi Yang, Ruhang Ma, Avideh Zakhor	(参考訳) 近年,空中ドローン画像による物体検出が注目されている。可視光画像はほとんどのシナリオで物体を検出するのに適しているが、サーマルカメラは物体の検出能力を夜間や隠された物体に拡張することができる。そのため、オブジェクト検出のためのRGBおよび赤外線融合法が有用かつ重要である。 RGB/IRオブジェクト検出にディープラーニング手法を適用する際の最大の課題のひとつは、特に夜間におけるドローン赤外線画像のトレーニングデータが不足していることである。本稿では,airsimシミュレーションエンジンとcycleganを用いて合成ir画像を作成するためのいくつかの戦略を考案する。さらに,地上での物体検出のためにRGBとIR画像の融合に照明対応融合フレームワークを利用する。シミュレーションデータと実データの両方に対して,本手法を特徴付ける。我々のソリューションはnvidia jetson xavierで実際のドローンで動作し、rgb/ir画像ペアあたり約28ミリ秒の処理を必要とする。 Object detection using aerial drone imagery has received a great deal of attention in recent years. While visible light images are adequate for detecting objects in most scenarios, thermal cameras can extend the capabilities of object detection to night-time or occluded objects. As such, RGB and Infrared (IR) fusion methods for object detection are useful and important. One of the biggest challenges in applying deep learning methods to RGB/IR object detection is the lack of available training data for drone IR imagery, especially at night. In this paper, we develop several strategies for creating synthetic IR images using the AIRSim simulation engine and CycleGAN. Furthermore, we utilize an illumination-aware fusion framework to fuse RGB and IR images for object detection on the ground. We characterize and test our methods for both simulated and actual data. Our solution is implemented on an NVIDIA Jetson Xavier running on an actual drone, requiring about 28 milliseconds of processing per RGB/IR image pair.	翻訳日:2022-01-12 14:19:28 公開日:2022-01-11
# ローカルエンハンスと原型辞書学習による教師なしドメイン適応型人物の再認識 Unsupervised Domain Adaptive Person Re-id with Local-enhance and Prototype Dictionary Learning ( http://arxiv.org/abs/2201.03803v1 ) ライセンス: Link先を確認	Haopeng Hou	(参考訳) 非教師付きドメイン適応型人物再識別(re-ID)タスクは、一般的なドメイン適応型タスクとは異なり、ソースデータとターゲットドメインデータのクラスが重複しないため、大きなドメインギャップにつながるため、課題となっている。最先端のunsupervised re-IDメソッドは、メモリベースのコントラスト損失を使用してニューラルネットワークをトレーニングする。しかし、ラベルのない各インスタンスをクラスとして扱うことで対照的な学習を行うと、クラス衝突の問題が起こり、メモリバンクの更新時に異なるカテゴリのインスタンスの数が異なるため、更新強度が矛盾する。そこで本研究では,クラス衝突問題やクラスタレベルのプロトタイプ辞書学習による強度不整合の問題を回避しつつ,ソースドメインデータとターゲットドメインデータの両方を1つのトレーニング段階で活用できる人向け辞書学習のプロトタイプを提案する。モデル上のドメインギャップの干渉を低減するために,モデルパラメータ数を増加させることなく,モデルのドメイン適応性を向上させるローカルエンハンスモジュールを提案する。 2つの大きなデータセットに対する実験により,試作辞書学習の有効性が示された。 71.5\% mAP は Market-to-Duke タスクで達成され、最先端の非教師なしドメイン適応型 re-ID メソッドと比較して 2.3\% 改善されている。 Duke-to-Marketタスクでは83.9\%のmAPを実現しており、最先端の非教師なし適応型re-IDメソッドと比較して4.4\%改善されている。 The unsupervised domain adaptive person re-identification (re-ID) task has been a challenge because, unlike the general domain adaptive tasks, there is no overlap between the classes of source and target domain data in the person re-ID, which leads to a significant domain gap. State-of-the-art unsupervised re-ID methods train the neural networks using a memory-based contrastive loss. However, performing contrastive learning by treating each unlabeled instance as a class will lead to the problem of class collision, and the updating intensity is inconsistent due to the difference in the number of instances of different categories when updating in the memory bank. To address such problems, we propose Prototype Dictionary Learning for person re-ID which is able to utilize both source domain data and target domain data by one training stage while avoiding the problem of class collision and the problem of updating intensity inconsistency by cluster-level prototype dictionary learning. 
In order to reduce the interference of the domain gap on the model, we propose a local-enhance module to improve the domain adaptation of the model without increasing the number of model parameters. Our experiments on two large datasets demonstrate the effectiveness of the prototype dictionary learning. 71.5% mAP is achieved in the Market-to-Duke task, which is a 2.3% improvement compared to the state-of-the-art unsupervised domain adaptive re-ID methods. It achieves 83.9% mAP in the Duke-to-Market task, which improves by 4.4% compared to the state-of-the-art unsupervised adaptive re-ID methods.	翻訳日:2022-01-12 14:18:05 公開日:2022-01-11
# MobileFaceSwap: ビデオ顔スワッピングのための軽量フレームワーク MobileFaceSwap: A Lightweight Framework for Video Face Swapping ( http://arxiv.org/abs/2201.03808v1 ) ライセンス: Link先を確認	Zhiliang Xu, Zhibin Hong, Changxing Ding, Zhen Zhu, Junyu Han, Jingtuo Liu, Errui Ding	(参考訳) 高度な顔交換法は魅力的な結果を得た。しかし、これらのメソッドの多くは多くのパラメータと計算を持っているため、リアルタイムアプリケーションに適用したり、携帯電話のようなエッジデバイスにデプロイすることは困難である。本研究では,識別情報に基づいてモデルパラメータを動的に調整し,主観的顔交換のための軽量ID-Aware Dynamic Network (IDN)を提案する。特に,重み予測と重み変調を含む2つの動的ニューラルネットワーク技術を導入することで,効率的なid注入モジュール(iim)を設計する。 IDNが更新されると、ターゲット画像やビデオが与えられた顔のスワップに適用される。提示されたIDNは0.50Mパラメータのみを含み、1フレームあたり0.33GのFLOPを必要とするため、携帯電話でリアルタイムのビデオ顔交換が可能である。さらに, 安定トレーニングのための知識蒸留に基づく方法を導入し, よりよい合成結果を得るために損失重み付けモジュールを用いる。最後に,本手法は教師モデルや他の最先端手法と同等の結果を得る。 Advanced face swapping methods have achieved appealing results. However, most of these methods have many parameters and computations, which makes it challenging to apply them in real-time applications or deploy them on edge devices like mobile phones. In this work, we propose a lightweight Identity-aware Dynamic Network (IDN) for subject-agnostic face swapping by dynamically adjusting the model parameters according to the identity information. In particular, we design an efficient Identity Injection Module (IIM) by introducing two dynamic neural network techniques, including the weights prediction and weights modulation. Once the IDN is updated, it can be applied to swap faces given any target image or video. The presented IDN contains only 0.50M parameters and needs 0.33G FLOPs per frame, making it capable for real-time video face swapping on mobile phones. In addition, we introduce a knowledge distillation-based method for stable training, and a loss reweighting module is employed to obtain better synthesized results. Finally, our method achieves comparable results with the teacher models and other state-of-the-art methods.	翻訳日:2022-01-12 14:17:35 公開日:2022-01-11
# 可視赤外人物再同定のための補助学習タスクとしてのポーズ推定の検討 On Exploring Pose Estimation as an Auxiliary Learning Task for Visible-Infrared Person Re-identification ( http://arxiv.org/abs/2201.03859v1 ) ライセンス: Link先を確認	Yunqi Miao, Nianchang Huang, Xiao Ma, Qiang Zhang, and Jungong Han	(参考訳) 可視赤外人物再同定(vi-reid)は,可視光と赤外線の差が大きいため困難である。ほとんどの先駆的なアプローチは、モダリティ共有とID関連の特徴を学習することで、クラス内変異とモダリティ間格差を減らす。しかし、明示的なモダリティ共有のキュー、すなわちボディキーポイントは、VI-ReIDで完全に活用されていない。さらに、既存の機能学習パラダイムは、グローバル機能と部分機能の予測一貫性を無視した、グローバル機能または分割された機能ストライプに制約を課している。上記の問題に対処するため、我々はPose Estimationを補助学習タスクとして活用し、エンドツーエンドフレームワークにおけるVI-ReIDタスクを支援する。これら2つのタスクを相互に有益にトレーニングすることで、より高品質なモダリティ共有およびid関連特徴を学習する。その上、グローバルな特徴と局所的な特徴の学習は階層的特徴制約(HFC)によってシームレスに同期され、前者は知識蒸留戦略を用いて後者を監督する。 2つのベンチマークVI-ReIDデータセットの実験結果から,提案手法は一定のマージンで最先端の手法を継続的に改善することが示された。具体的には,RegDBデータセットの最先端手法に対して,約20$\%$ mAPの改善を実現する。興味深い結果として,VI-ReIDにおける補助課題学習の利用が注目された。 Visible-infrared person re-identification (VI-ReID) has been challenging due to the existence of large discrepancies between visible and infrared modalities. Most pioneering approaches reduce intra-class variations and inter-modality discrepancies by learning modality-shared and ID-related features. However, an explicit modality-shared cue, i.e., body keypoints, has not been fully exploited in VI-ReID. Additionally, existing feature learning paradigms imposed constraints on either global features or partitioned feature stripes, which neglect the prediction consistency of global and part features. To address the above problems, we exploit Pose Estimation as an auxiliary learning task to assist the VI-ReID task in an end-to-end framework. By jointly training these two tasks in a mutually beneficial manner, our model learns higher quality modality-shared and ID-related features. On top of it, the learnings of global features and local features are seamlessly synchronized by Hierarchical Feature Constraint (HFC), where the former supervises the latter using the knowledge distillation strategy. 
Experimental results on two benchmark VI-ReID datasets show that the proposed method consistently improves on state-of-the-art methods by significant margins. Specifically, our method achieves nearly 20% mAP improvement over the state-of-the-art method on the RegDB dataset. Our findings highlight the use of auxiliary task learning in VI-ReID.	翻訳日:2022-01-12 14:17:17 公開日:2022-01-11
# 3D ConvNet の最適化計画 Optimization Planning for 3D ConvNets ( http://arxiv.org/abs/2201.04021v1 ) ライセンス: Link先を確認	Zhaofan Qiu and Ting Yao and Chong-Wah Ngo and Tao Mei	(参考訳) 3次元畳み込みニューラルネットワーク(3d convnets)を最適に学習するのは、高い複雑性とトレーニングスキームの様々なオプションのためである。最も一般的なハンドチューニングプロセスは、短いビデオクリップを使って3dコンベネットを学習することから始まり、その後、長いクリップを使って長期の時間依存を学習し、トレーニングが進むにつれて学習率を徐々に低下させる。このようなプロセスといくつかのヒューリスティックな設定が組み合わさったという事実は、トレーニング全体を自動化するための最適な"パス"を求めて研究を動機付ける。本稿では,パスを一連のトレーニング「状態」に分解し,各状態における学習率や入力クリップの長さなどのハイパーパラメータを指定する。パフォーマンス・エピック曲線における膝点の推定は、ある状態から別の状態への遷移を引き起こす。我々は全ての候補状態に対して動的プログラミングを行い、最適な状態の置換、すなわち最適化経路を計画する。さらに,空間的および時間的識別性を改善するために,デュアルヘッド分類器を独自に設計した新しい3次元convnetを考案する。 7つの公開ビデオ認識ベンチマークに関する広範囲な実験が提案の利点を示している。最適化計画では、3D ConvNetsは最先端の認識手法と比較して優れた結果が得られる。より顕著に、Kinetics-400とKinetics-600のデータセットでそれぞれ80.5%と82.7%というトップ1の精度を得る。ソースコードはhttps://github.com/ZhaofanQiu/Optimization-Planning-for-3D-ConvNetsで入手できる。 It is not trivial to optimally learn a 3D Convolutional Neural Networks (3D ConvNets) due to high complexity and various options of the training scheme. The most common hand-tuning process starts from learning 3D ConvNets using short video clips and then is followed by learning long-term temporal dependency using lengthy clips, while gradually decaying the learning rate from high to low as training progresses. The fact that such process comes along with several heuristic settings motivates the study to seek an optimal "path" to automate the entire training. In this paper, we decompose the path into a series of training "states" and specify the hyper-parameters, e.g., learning rate and the length of input clips, in each state. The estimation of the knee point on the performance-epoch curve triggers the transition from one state to another. We perform dynamic programming over all the candidate states to plan the optimal permutation of states, i.e., optimization path. 
Furthermore, we devise a new 3D ConvNet with a unique dual-head classifier design to improve spatial and temporal discrimination. Extensive experiments on seven public video recognition benchmarks demonstrate the advantages of our proposal. With optimization planning, our 3D ConvNets achieve superior results compared to state-of-the-art recognition methods. More remarkably, we obtain top-1 accuracies of 80.5% and 82.7% on the Kinetics-400 and Kinetics-600 datasets, respectively. Source code is available at https://github.com/ZhaofanQiu/Optimization-Planning-for-3D-ConvNets.	翻訳日:2022-01-12 14:16:54 公開日:2022-01-11
# 映像認識のための1つの情報フレームにシーケンスを凝縮する Condensing a Sequence to One Informative Frame for Video Recognition ( http://arxiv.org/abs/2201.04022v1 ) ライセンス: Link先を確認	Zhaofan Qiu and Ting Yao and Yan Shu and Chong-Wah Ngo and Tao Mei	(参考訳) 動画は、動きのばらつきと、細かな視覚詳細の豊富なコンテンツによって複雑である。このような情報集約メディアから有用な情報を抽象化するには、網羅的な計算資源が必要である。本稿では,まず映像シーケンスを情報的「フレーム」に凝縮し,次に合成フレーム上の既製の画像認識システムを利用する2段階の方法を提案する。有効な疑問は、どのように「有用な情報」を定義し、それをビデオシーケンスから1つの合成フレームに蒸留するかである。本稿では,視覚再構成,映像分類,運動推定,および2つの正則化,すなわち,逆学習,色一貫性という3つの客観的タスクを組み込んだ,新しい情報フレーム合成(ifs)アーキテクチャを提案する。各タスクは合成フレームに1つの能力を与え、各レギュレータはその視覚品質を高める。これにより、フレーム合成をエンドツーエンドで共同で学習することにより、ビデオ解析に有用な時空間情報をカプセル化することが期待できる。大規模なKineeticsデータセット上で大規模な実験を行う。ビデオシーケンスを1つの画像にマッピングするベースライン手法と比較すると、IFSは優れた性能を示す。さらに印象的なことに、IFSは画像ベースの2Dネットワークとクリップベースの3Dネットワークの明確な改善を一貫して示しており、計算コストの少ない最先端の手法と同等のパフォーマンスを実現している。 Video is complex due to large variations in motion and rich content in fine-grained visual details. Abstracting useful information from such information-intensive media requires exhaustive computing resources. This paper studies a two-step alternative that first condenses the video sequence to an informative "frame" and then exploits off-the-shelf image recognition system on the synthetic frame. A valid question is how to define "useful information" and then distill it from a video sequence down to one synthetic frame. This paper presents a novel Informative Frame Synthesis (IFS) architecture that incorporates three objective tasks, i.e., appearance reconstruction, video categorization, motion estimation, and two regularizers, i.e., adversarial learning, color consistency. Each task equips the synthetic frame with one ability, while each regularizer enhances its visual quality. With these, by jointly learning the frame synthesis in an end-to-end manner, the generated frame is expected to encapsulate the required spatio-temporal information useful for video analysis. Extensive experiments are conducted on the large-scale Kinetics dataset. 
Compared to baseline methods that map a video sequence to a single image, IFS shows superior performance. More remarkably, IFS consistently demonstrates clear improvements on image-based 2D networks and clip-based 3D networks, and achieves performance comparable to state-of-the-art methods at lower computational cost.	翻訳日:2022-01-12 14:16:32 公開日:2022-01-11
# 多面統合による映像表現学習の促進 Boosting Video Representation Learning with Multi-Faceted Integration ( http://arxiv.org/abs/2201.04023v1 ) ライセンス: Link先を確認	Zhaofan Qiu and Ting Yao and Chong-Wah Ngo and Xiao-Ping Zhang and Dong Wu and Tao Mei	(参考訳) ビデオコンテンツは多面的であり、オブジェクト、シーン、インタラクション、アクションで構成される。既存のデータセットは、モデルトレーニング用のファセットの1つだけをラベル付けし、トレーニングデータセットに依存する1つのファセットに偏るビデオ表現を生成する。多面ラベルからビデオ表現を学ぶ方法や、多面情報をビデオ表現学習に有用かどうかについてはまだ研究されていない。本稿では,ビデオコンテンツの全スペクトルを反映した表現を学習するために,異なるデータセットから顔データを集約する,MUFI(MUlti-Faceted Integration)という新たな学習フレームワークを提案する。 MUFIは、映像表現をリッチなセマンティックな埋め込み空間に明示的にマッピングし、2つの視点から映像表現を協調的に最適化する視覚意味埋め込み学習として問題を定式化する。 1つは、各ビデオとそのラベル記述間の顔内監督を活かし、もう1つは、他のデータセットの顔から各ビデオの「意味表現」を顔間監督として予測することである。大規模な4つのビデオデータセットと2つの画像データセットを組み合わせることで、MUFIフレームワークを介して3D CNNを学習することが、ビデオ表現の優れた能力をもたらすことを示す。 MUFIを使った事前学習型3D CNNは、ダウンストリームビデオアプリケーションにおける他のアプローチよりも明らかに改善されている。 UCF101/HMDB51では98.1%/80.9%、ビデオキャプションではCIDEr-Dスコアでは101.5%である。 Video content is multifaceted, consisting of objects, scenes, interactions or actions. The existing datasets mostly label only one of the facets for model training, resulting in the video representation that biases to only one facet depending on the training dataset. There is no study yet on how to learn a video representation from multifaceted labels, and whether multifaceted information is helpful for video representation learning. In this paper, we propose a new learning framework, MUlti-Faceted Integration (MUFI), to aggregate facets from different datasets for learning a representation that could reflect the full spectrum of video content. Technically, MUFI formulates the problem as visual-semantic embedding learning, which explicitly maps video representation into a rich semantic embedding space, and jointly optimizes video representation from two perspectives. 
The first capitalizes on the intra-facet supervision between each video and its own label descriptions; the second predicts the "semantic representation" of each video from the facets of other datasets as the inter-facet supervision. Extensive experiments demonstrate that learning a 3D CNN via our MUFI framework on a union of four large-scale video datasets plus two image datasets leads to superior video representation capability. The pre-learnt 3D CNN with MUFI also shows clear improvements over other approaches on several downstream video applications. More remarkably, MUFI achieves 98.1%/80.9% on UCF101/HMDB51 for action recognition and 101.5% in terms of CIDEr-D score on MSVD for video captioning.	翻訳日:2022-01-12 14:16:08 公開日:2022-01-11
# 行動認識のための識別サブグラフとしての映像表現 Representing Videos as Discriminative Sub-graphs for Action Recognition ( http://arxiv.org/abs/2201.04027v1 ) ライセンス: Link先を確認	Dong Li and Zhaofan Qiu and Yingwei Pan and Ting Yao and Houqiang Li and Tao Mei	(参考訳) 人間の行動は、典型的には組合せ構造やパターン、すなわち主題、対象、そして時空間的相互作用である。このような構造を発見することは、相互作用のダイナミクスを推論し、行動を認識する報奨となる。本稿では,ビデオ中の各行動の識別パターンを表現・符号化するサブグラフの新たな設計を提案する。具体的には,MUSLE(MUlti-scale Sub-graph LEarning)フレームワークを新たに構築し,ノード数に関するグラフを各スケールでコンパクトなサブグラフにクラスタ化する。技術的には、MUSLEは各ビデオクリップに3Dバウンディングボックス、すなわちチューブレットをグラフノードとして生成し、チューブレット間のグラフエッジとして密接な接続を行う。各アクションカテゴリに対して、ガウス混合層を学習し、認識のためのアクションプロトタイプとして識別サブグラフを選択することにより、グラフを各スケールでサブグラフに分解するオンラインクラスタリングを実行する。 Some-Something V1 & V2 と Kinetics-400 の2つのデータセットで大規模な実験を行い、最先端の手法と比較して優れた結果を報告する。さらに、我々のMUSLEは、Something V2バリデーションセットで65.0%の最高の報告精度を達成した。 Human actions are typically of combinatorial structures or patterns, i.e., subjects, objects, plus spatio-temporal interactions in between. Discovering such structures is therefore a rewarding way to reason about the dynamics of interactions and recognize the actions. In this paper, we introduce a new design of sub-graphs to represent and encode the discriminative patterns of each action in the videos. Specifically, we present MUlti-scale Sub-graph LEarning (MUSLE) framework that novelly builds space-time graphs and clusters the graphs into compact sub-graphs on each scale with respect to the number of nodes. Technically, MUSLE produces 3D bounding boxes, i.e., tubelets, in each video clip, as graph nodes and takes dense connectivity as graph edges between tubelets. For each action category, we execute online clustering to decompose the graph into sub-graphs on each scale through learning Gaussian Mixture Layer and select the discriminative sub-graphs as action prototypes for recognition. Extensive experiments are conducted on both Something-Something V1 & V2 and Kinetics-400 datasets, and superior results are reported when comparing to state-of-the-art methods. 
More remarkably, MUSLE achieves the best accuracy reported to date, 65.0%, on the Something-Something V2 validation set.	翻訳日:2022-01-12 14:15:41 公開日:2022-01-11
# 動きに着目した映像表現のコントラスト学習 Motion-Focused Contrastive Learning of Video Representations ( http://arxiv.org/abs/2201.04029v1 ) ライセンス: Link先を確認	Rui Li and Yiheng Zhang and Zhaofan Qiu and Ting Yao and Dong Liu and Tao Mei	(参考訳) 動画における動きは、時間とともに変化する変化を巻き込む最も独特な現象であり、ビデオ表現学習の発展に欠かせないものとなっている。本稿では,特に自己監督型映像表現学習において,どのような動きが重要か,という疑問を呈する。この目的のために、コントラスト学習の体制において、データ拡張と特徴学習のための動きを利用するデュエットを構成する。具体的には,このようなデュエットを基礎とみなす動き中心のコントラスト学習(MCL)手法を提案する。一方、MCLはビデオ内の各フレームの光学的流れを利用して、時間的および空間的にチューブレット(すなわち時間的に関連するフレームパッチのシーケンス)をデータ拡張としてサンプリングする。一方,MCLは,空間的・時間的・時空間的視点からの光学的フローマップに,畳み込み層の勾配図を合わせることで,特徴学習における運動情報の基礎となる。 R(2+1)Dバックボーンを用いた広範囲な実験により, MCLの有効性が示された。 UCF101では、MCLが学習した表現に基づいて訓練された線形分類器が81.91%のトップ-1の精度を達成し、ImageNetの教師付き事前トレーニングを6.78%上回った。 Kinetics-400では、MCLは線形プロトコルの下で66.62%のトップ-1の精度を達成する。コードはhttps://github.com/YihengZhang-CV/MCL-Motion-Focused-Contrastive-Learningで公開されている。 Motion, as the most distinct phenomenon in a video to involve the changes over time, has been unique and critical to the development of video representation learning. In this paper, we ask the question: how important is the motion particularly for self-supervised video representation learning. To this end, we compose a duet of exploiting the motion for data augmentation and feature learning in the regime of contrastive learning. Specifically, we present a Motion-focused Contrastive Learning (MCL) method that regards such duet as the foundation. On one hand, MCL capitalizes on optical flow of each frame in a video to temporally and spatially sample the tubelets (i.e., sequences of associated frame patches across time) as data augmentations. On the other hand, MCL further aligns gradient maps of the convolutional layers to optical flow maps from spatial, temporal and spatio-temporal perspectives, in order to ground motion information in feature learning. Extensive experiments conducted on R(2+1)D backbone demonstrate the effectiveness of our MCL. 
On UCF101, the linear classifier trained on the representations learnt by MCL achieves 81.91% top-1 accuracy, outperforming ImageNet supervised pre-training by 6.78%. On Kinetics-400, MCL achieves 66.62% top-1 accuracy under the linear protocol. Code is available at https://github.com/YihengZhang-CV/MCL-Motion-Focused-Contrastive-Learning.	翻訳日:2022-01-12 14:15:18 公開日:2022-01-11
# (参考訳) 視覚質問応答における共用変圧器層の有効性について On the Efficacy of Co-Attention Transformer Layers in Visual Question Answering ( http://arxiv.org/abs/2201.03965v1 ) ライセンス: CC BY 4.0	Ankur Sikarwar and Gabriel Kreiman	(参考訳) 近年、マルチモーダルトランスフォーマーは視覚言語タスクにおいて、視覚質問応答(vqa)のような著しい進歩を示しており、以前のアーキテクチャをかなり上回っている。このVQAの改善は、視覚と言語ストリーム間の豊富な相互作用に起因することが多い。本研究では,ネットワークが質問に回答しながら関連領域に集中するのを助けるために,コアテンショントランスフォーマー層の有効性について検討する。我々は,これらのコアテンション層における疑問条件付きイメージアテンションスコアを用いて視覚アテンションマップを生成する。現状VQAモデルの視覚的注意に対する以下の臨界成分の影響を評価する。 (i)対象領域の提案数 (ii)音声(POS)タグの質問部分 (iii)質問の意味論 (iv)コアテンション層の数、及び (v)正確性に答える。ニューラルネットワークのアテンションマップと人間のアテンションマップを質的・定量的に比較した。以上の結果から,画像の関連領域への応答にはコアテンショントランスフォーマーモジュールが重要であることが示唆された。重要なことに、質問の意味は視覚的注意を惹きつけるものではなく、質問の特定のキーワードが行うものである。我々の研究は、コアテンショントランスフォーマー層の機能と解釈に光を当て、現在のネットワークのギャップを強調し、視覚と言語ストリームを同時に処理する将来のVQAモデルとネットワークの開発をガイドすることができる。 In recent years, multi-modal transformers have shown significant progress in Vision-Language tasks, such as Visual Question Answering (VQA), outperforming previous architectures by a considerable margin. This improvement in VQA is often attributed to the rich interactions between vision and language streams. In this work, we investigate the efficacy of co-attention transformer layers in helping the network focus on relevant regions while answering the question. We generate visual attention maps using the question-conditioned image attention scores in these co-attention layers. We evaluate the effect of the following critical components on visual attention of a state-of-the-art VQA model: (i) number of object region proposals, (ii) question part of speech (POS) tags, (iii) question semantics, (iv) number of co-attention layers, and (v) answer accuracy. We compare the neural network attention maps against human attention maps both qualitatively and quantitatively. Our findings indicate that co-attention transformer modules are crucial in attending to relevant regions of the image given a question. 
Importantly, we observe that the semantic meaning of the question is not what drives visual attention, but specific keywords in the question do. Our work sheds light on the function and interpretation of co-attention transformer layers, highlights gaps in current networks, and can guide the development of future VQA models and networks that simultaneously process visual and language streams.	翻訳日:2022-01-12 14:13:41 公開日:2022-01-11
# 深層学習モデルを用いた感情分析--シンハラ語10年間のFacebookデータの比較研究 Sentiment Analysis with Deep Learning Models: A Comparative Study on a Decade of Sinhala Language Facebook Data ( http://arxiv.org/abs/2201.03941v1 ) ライセンス: Link先を確認	Gihan Weeraprameshwara, Vihanga Jayawickrama, Nisansa de Silva, Yudhanjaya Wijeratne	(参考訳) Facebookの投稿とそれに対応するリアクション機能の関係は、探究と理解のための興味深いテーマだ。この目的を達成するために、現在最先端のSinhala感情分析モデルを、何百万もの反応を伴う10年分のSinhalaポストを含むデータセットに対してテストした。ベンチマークの確立と、Sinhalaの感情分析に最適なモデルを特定することを目的として、同じデータセットの設定で、感情分析に適した他のディープラーニングモデルをテストする。本研究では,3層双方向LSTMモデルがSinhala感情分析で84.58%のF1スコアを達成し,F1スコアが82.04%にとどまる現在の最先端モデルCapsule Bを上回ることを報告する。さらに、すべてのディープラーニングモデルが75%以上のF1スコアを示しているので、Facebookの反応がテキストの感情を予測するのに適していると主張することは安全である。 The relationship between Facebook posts and the corresponding reaction feature is an interesting subject to explore and understand. To achieve this end, we test state-of-the-art Sinhala sentiment analysis models against a data set containing a decade's worth of Sinhala posts with millions of reactions. For the purpose of establishing benchmarks and with the goal of identifying the best model for Sinhala sentiment analysis, we also test, on the same data set configuration, other deep learning models suited to sentiment analysis. In this study we report that the 3-layer Bidirectional LSTM model achieves an F1 score of 84.58% for Sinhala sentiment analysis, surpassing the current state-of-the-art model, Capsule B, which only manages an F1 score of 82.04%. Further, since all the deep learning models show F1 scores above 75%, we conclude that it is safe to claim that Facebook reactions are suitable to predict the sentiment of a text.	翻訳日:2022-01-12 13:56:37 公開日:2022-01-11
# ChaLearn AutoDL challenge 2019の勝利ソリューションとチャレンジ後の分析 Winning solutions and post-challenge analyses of the ChaLearn AutoDL challenge 2019 ( http://arxiv.org/abs/2201.03801v1 ) ライセンス: Link先を確認	Zhengying Liu, Adrien Pavao, Zhen Xu, Sergio Escalera, Fabio Ferreira, Isabelle Guyon, Sirui Hong, Frank Hutter, Rongrong Ji, Julio C. S. Jacques Junior, Ge Li, Marius Lindauer, Zhipeng Luo, Meysam Madadi, Thomas Nierhoff, Kangning Niu, Chunguang Pan, Danny Stoll, Sebastien Treguer, Jin Wang, Peng Wang, Chenglin Wu, Youcheng Xiong, Arber Zela, Yang Zhang	(参考訳) 本稿では,ChaLearnのAutoDLチャレンジシリーズの結果とチャレンジ後の分析を報告する。このシリーズは,さまざまな環境で導入されてきたが公正な比較を欠いていた,ディープラーニング(DL)のためのAutoMLソリューションの乱立を整理するのに役立った。全ての入力データモダリティ(時系列、画像、ビデオ、テキスト、表形式)はテンソルとしてフォーマットされ、全てのタスクはマルチラベルの分類問題であった。コード提出は、限られた時間と計算資源で隠れたタスクで実行され、素早く結果を得るソリューションをプッシュした。この設定では、一般的なニューラルネットワークサーチ(NAS)は実用的ではなかったが、DLメソッドが支配的であった。ソリューションは、データモダリティにマッチするアーキテクチャを備えた、微調整済みのネットワークに依存していた。チャレンジ後のテストでは、制限時間を超える改善は示されなかった。コンポーネントは特にオリジナルでも新しいものでもないが、"meta-learner"、"data ingestor"、"model selector"、"model/learner"、"evaluator"を特徴とするハイレベルなモジュール化組織が登場した。このモジュラリティによってアブレーション研究が可能となり、(プラットフォーム外の)メタラーニング、アンサンブル、効率的なデータ管理の重要性が明らかになった。異種モジュールの組み合わせに関する実験は、勝利した解の(局所的な)最適性をさらに確認する。本チャレンジの遺産には、継続するベンチマーク(http://autodl.chalearn.org)、勝者のオープンソースコード、無償の"AutoDLセルフサービス"が含まれています。 This paper reports the results and post-challenge analyses of ChaLearn's AutoDL challenge series, which helped sort out a profusion of AutoML solutions for Deep Learning (DL) that had been introduced in a variety of settings, but lacked fair comparisons. All input data modalities (time series, images, videos, text, tabular) were formatted as tensors and all tasks were multi-label classification problems. Code submissions were executed on hidden tasks, with limited time and computational resources, pushing solutions that get results quickly. In this setting, DL methods dominated, though popular Neural Architecture Search (NAS) was impractical. 
Solutions relied on fine-tuned pre-trained networks, with architectures matching data modality. Post-challenge tests did not reveal improvements beyond the imposed time limit. While no component is particularly original or novel, a high level modular organization emerged featuring a "meta-learner", "data ingestor", "model selector", "model/learner", and "evaluator". This modularity enabled ablation studies, which revealed the importance of (off-platform) meta-learning, ensembling, and efficient data management. Experiments on heterogeneous module combinations further confirm the (local) optimality of the winning solutions. Our challenge legacy includes an ever-lasting benchmark (http://autodl.chalearn.org), the open-sourced code of the winners, and a free "AutoDL self-service".	翻訳日:2022-01-12 13:55:32 公開日:2022-01-11
# 覚えることを学ぶ Learning what to remember ( http://arxiv.org/abs/2201.03806v1 ) ライセンス: Link先を確認	Robi Bhattacharjee and Gaurav Mahajan	(参考訳) 我々は,学習者が絶え間なく任意の事実の流れに直面する生涯学習シナリオを考察し,その記憶に保持すべきものを決定する必要がある。オンライン学習フレームワークに基づく数学的モデルを導入し、学習者は記憶に制約のある専門家の集合に対して自己測定を行い、記憶すべきものに対する異なるポリシーを反映する。事実のストリームに散らばっているのは時々の質問であり、これらの各学習者は、対応する事実を覚えていなければ損失を被る。そのゴールは、ほぼ同じ量のメモリを使用しながら、後見で最高の専門家とほとんど同じことをすることです。このメモリ制約のあるシナリオにおいて乗算重み更新アルゴリズムを用いることの難しさを特定し、後悔の保証が最良に近い代替スキームを設計する。 We consider a lifelong learning scenario in which a learner faces a neverending and arbitrary stream of facts and has to decide which ones to retain in its limited memory. We introduce a mathematical model based on the online learning framework, in which the learner measures itself against a collection of experts that are also memory-constrained and that reflect different policies for what to remember. Interspersed with the stream of facts are occasional questions, and on each of these the learner incurs a loss if it has not remembered the corresponding fact. Its goal is to do almost as well as the best expert in hindsight, while using roughly the same amount of memory. We identify difficulties with using the multiplicative weights update algorithm in this memory-constrained scenario, and design an alternative scheme whose regret guarantees are close to the best possible.	翻訳日:2022-01-12 13:53:29 公開日:2022-01-11
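For orientation, here is a minimal Python sketch of the standard multiplicative weights (Hedge) update that the abstract above refers to. It deliberately ignores the memory constraint the paper studies and is not the authors' alternative scheme. The function name `multiplicative_weights`, the learning rate `eta`, and the toy loss sequence are illustrative assumptions, with losses assumed to lie in [0, 1].

```python
import math


def multiplicative_weights(loss_rounds, eta=0.5):
    """Run the textbook Hedge update over a sequence of loss vectors.

    loss_rounds: list of rounds; each round is a list with one loss in [0, 1]
                 per expert (hypothetical toy input, not data from the paper).
    Returns the final distribution over experts and the learner's cumulative
    expected loss.
    """
    num_experts = len(loss_rounds[0])
    weights = [1.0] * num_experts
    expected_loss = 0.0
    for losses in loss_rounds:
        total = sum(weights)
        probs = [w / total for w in weights]
        # The learner follows expert i with probability probs[i].
        expected_loss += sum(p * l for p, l in zip(probs, losses))
        # Experts that incurred loss are down-weighted exponentially.
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    total = sum(weights)
    return [w / total for w in weights], expected_loss


# Toy run: expert 0 never errs, expert 1 always errs.
final_probs, learner_loss = multiplicative_weights([[0.0, 1.0]] * 20)
print(final_probs, learner_loss)
```

Note that this update keeps one weight per expert and evaluates every expert's loss each round, which is the kind of bookkeeping that becomes problematic when the learner's own memory is limited.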
# Uni-EDEN:マルチグラニュラービジョンランゲージ事前学習によるユニバーサルエンコーダデコーダネットワーク Uni-EDEN: Universal Encoder-Decoder Network by Multi-Granular Vision-Language Pre-training ( http://arxiv.org/abs/2201.04026v1 ) ライセンス: Link先を確認	Yehao Li and Jiahao Fan and Yingwei Pan and Ting Yao and Weiyao Lin and Tao Mei	(参考訳) 視覚言語プレトレーニングは、リッチリソースプレトレーニングタスクから限定リソースダウンストリームタスクにマルチモーダル知識を転送する、新興かつ迅速な研究トピックである。単一のジェネリックエンコーダを主に学習する既存の研究とは異なり、視覚言語認識(例えば、視覚的質問応答)と生成(例えば、画像キャプション)の両方を容易にする訓練済みのユニバーサルエンコーダ・デコーダネットワーク(Uni-EDEN)を提示する。 Uni-EDENは2ストリームトランスフォーマーベースの構造で、各モダリティの表現を個別に学習するオブジェクトと文エンコーダと、モーダル間相互作用による多モーダル推論と文生成を可能にする文デコーダの3つのモジュールで構成される。画像の言語表現は, 単純から包括的, 個々のラベル, フレーズ, 自然文まで, この階層のさまざまな粒度にまたがる可能性があることを考慮し, 多粒度視覚言語プロキシタスク(MOC), マスケ領域フレーズ生成(MRPG), イメージセンテンスマッチング(ISM), マスケ領域生成(MSG)を事前訓練する。このように、Uni-EDENにはマルチモーダル表現抽出と言語モデリングの両方の能力がある。広汎な実験は、Uni-EDENを4つの視覚言語知覚と下流タスクに微調整することで、説得力のある一般化性を示す。 Vision-language pre-training has been an emerging and fast-developing research topic, which transfers multi-modal knowledge from rich-resource pre-training task to limited-resource downstream tasks. Unlike existing works that predominantly learn a single generic encoder, we present a pre-trainable Universal Encoder-DEcoder Network (Uni-EDEN) to facilitate both vision-language perception (e.g., visual question answering) and generation (e.g., image captioning). Uni-EDEN is a two-stream Transformer based structure, consisting of three modules: object and sentence encoders that separately learns the representations of each modality, and sentence decoder that enables both multi-modal reasoning and sentence generation via inter-modal interaction. 
Considering that the linguistic representations of each image can span different granularities in this hierarchy, from simple to comprehensive (an individual label, a phrase, and a natural sentence), we pre-train Uni-EDEN through multi-granular vision-language proxy tasks: Masked Object Classification (MOC), Masked Region Phrase Generation (MRPG), Image-Sentence Matching (ISM), and Masked Sentence Generation (MSG). In this way, Uni-EDEN is endowed with the power of both multi-modal representation extraction and language modeling. Extensive experiments demonstrate the compelling generalizability of Uni-EDEN by fine-tuning it to four vision-language perception and generation downstream tasks.	翻訳日:2022-01-12 13:53:15 公開日:2022-01-11
# (参考訳) 信号デノナイズのためのクロスバリデーションフレームワークとそのトレンドフィルタリング, Dyadic CARTなどへの応用 A Cross Validation Framework for Signal Denoising with Applications to Trend Filtering, Dyadic CART and Beyond ( http://arxiv.org/abs/2201.02654v2 ) ライセンス: CC BY 4.0	Anamitra Chaudhuri and Sabyasachi Chatterjee	(参考訳) 本稿では,信号復調のための一般的なクロス検証フレームワークを定式化する。一般的なフレームワークは、トレンドフィルタリングやdyadic cartのような非パラメトリック回帰法に適用される。得られたクロス検証されたバージョンは、最適に調整されたアナログで知られているように、ほぼ同じ収束率に達することが示される。トレンドフィルタリングやDyadic CARTのクロスバリデーションバージョンに関する以前の理論的分析は存在しなかった。フレームワークの汎用性を説明するために, 2つの基本推定器の相互検証版, 高次元線形回帰のためのラッソ, 行列推定のための特異値閾値付けを提案する。我々の一般的なフレームワークはChatterjee と Jafarov (2015) のアイデアにインスパイアされており、チューニングパラメータを使用する幅広い推定手法に適用できる可能性がある。 This paper formulates a general cross validation framework for signal denoising. The general framework is then applied to nonparametric regression methods such as Trend Filtering and Dyadic CART. The resulting cross validated versions are then shown to attain nearly the same rates of convergence as are known for the optimally tuned analogues. There did not exist any previous theoretical analyses of cross validated versions of Trend Filtering or Dyadic CART. To illustrate the generality of the framework we also propose and study cross validated versions of two fundamental estimators; lasso for high dimensional linear regression and singular value thresholding for matrix estimation. Our general framework is inspired by the ideas in Chatterjee and Jafarov (2015) and is potentially applicable to a wide range of estimation methods which use tuning parameters.	翻訳日:2022-01-12 13:26:38 公開日:2022-01-11
# 深層マルチタスク学習のためのユニタリスカラー化の擁護 In Defense of the Unitary Scalarization for Deep Multi-Task Learning ( http://arxiv.org/abs/2201.04122v1 ) ライセンス: Link先を確認	Vitaly Kurin, Alessandro De Palma, Ilya Kostrikov, Shimon Whiteson, M. Pawan Kumar	(参考訳) 最近のマルチタスク学習研究は、訓練が単にタスク損失の総和を最小化するユニタリスカラー化に反対している。アドホックなマルチタスク最適化アルゴリズムが提案され、マルチタスク設定の難しさに関する様々な仮説に着想を得た。これらのオプティマイザの大部分は、タスク毎の勾配を必要とし、メモリ、ランタイム、実装のオーバーヘッドを大きく導入する。本稿では,多くの専用マルチタスクオプティマイザを正規化の形式として解釈できることを示す理論的解析を行う。さらに,単タスク学習の標準正規化と安定化技術とを組み合わせることで,教師付き学習と強化学習の両方において,複雑なマルチタスクオプティマイザの性能が一致するか,あるいは向上することを示す。我々は,本研究の結果から,近年の地域研究に対する批判的な再評価が求められていると信じている。 Recent multi-task learning research argues against unitary scalarization, where training simply minimizes the sum of the task losses. Several ad-hoc multi-task optimization algorithms have instead been proposed, inspired by various hypotheses about what makes multi-task settings difficult. The majority of these optimizers require per-task gradients, and introduce significant memory, runtime, and implementation overhead. We present a theoretical analysis suggesting that many specialized multi-task optimizers can be interpreted as forms of regularization. Moreover, we show that, when coupled with standard regularization and stabilization techniques from single-task learning, unitary scalarization matches or improves upon the performance of complex multi-task optimizers in both supervised and reinforcement learning settings. We believe our results call for a critical reevaluation of recent research in the area.	翻訳日:2022-01-12 13:26:07 公開日:2022-01-11
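To make the term concrete, the following is a small Python sketch of unitary scalarization as the abstract defines it: training minimizes the plain sum of the task losses, so a single gradient of the summed loss is used rather than per-task gradient manipulation. The two quadratic tasks, the shared scalar parameter `w`, and the helper names are hypothetical illustrations, not the paper's experimental setup.

```python
def task_losses(w):
    """Two toy tasks that share one scalar parameter w (illustrative only)."""
    return [(w - 1.0) ** 2, (w - 3.0) ** 2]


def task_grads(w):
    """Analytic gradient of each toy task loss with respect to w."""
    return [2.0 * (w - 1.0), 2.0 * (w - 3.0)]


def train(w=0.0, lr=0.1, steps=200):
    """Plain gradient descent on the unweighted sum of the task losses."""
    for _ in range(steps):
        # The gradient of the summed loss equals the sum of task gradients,
        # so no per-task weighting or projection step is involved.
        grad = sum(task_grads(w))
        w -= lr * grad
    return w


w_star = train()
print(w_star, sum(task_losses(w_star)))
```

In an automatic differentiation framework the same idea amounts to one backward pass on the summed loss, which is why this baseline avoids the per-task gradient overhead mentioned in the abstract.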
# 事前知識が放射線レポート生成を増強する Prior Knowledge Enhances Radiology Report Generation ( http://arxiv.org/abs/2201.03761v1 ) ライセンス: Link先を確認	Song Wang, Liyan Tang, Mingquan Lin, George Shih, Ying Ding, Yifan Peng	(参考訳) 放射線医学報告生成は, 放射線科医の作業負荷を軽減するため, コンピュータ支援診断を作成することを目的としており, 近年注目を集めている。しかし, 従来の深層学習手法では, 医学的所見間の相互影響を無視する傾向があり, 報告の質を損なうボトルネックとなる。本稿では,情報的知識グラフを用いて,医学的発見の関連性について検討し,その先行知識を放射線学的報告生成に取り入れ,報告の質を向上させることを提案する。実験の結果, ROUGE-Lを0.384$\pm$0.007, CIDErを0.340$\pm$0.011とするIU X線データセットにおいて, 提案手法の優れた性能を示した。 CIDErとROUGE-Lは平均1.6%の改善(それぞれ2.0%と1.5%の改善)を達成した。実験により, 先行知識が正確な放射線学レポート生成に性能向上をもたらす可能性が示唆された。コードはhttps://github.com/bionlplab/report_generation_amia2022で公開します。 Radiology report generation aims to produce computer-aided diagnoses to alleviate the workload of radiologists and has drawn increasing attention recently. However, previous deep learning methods tend to neglect the mutual influences between medical findings, which can be the bottleneck that limits the quality of generated reports. In this work, we propose to mine and represent the associations among medical findings in an informative knowledge graph and incorporate this prior knowledge with radiology report generation to help improve the quality of generated reports. Experiment results demonstrate the superior performance of our proposed method on the IU X-ray dataset with a ROUGE-L of 0.384$\pm$0.007 and CIDEr of 0.340$\pm$0.011. Compared with previous works, our model achieves an average of 1.6% improvement (2.0% and 1.5% improvements in CIDEr and ROUGE-L, respectively). The experiments suggest that prior knowledge can bring performance gains to accurate radiology report generation. We will make the code publicly available at https://github.com/bionlplab/report_generation_amia2022.	翻訳日:2022-01-12 13:25:32 公開日:2022-01-11
# セマンティクスセグメンテーションのためのピラミッド融合トランスフォーマ Pyramid Fusion Transformer for Semantic Segmentation ( http://arxiv.org/abs/2201.04019v1 ) ライセンス: Link先を確認	Zipeng Qin, Jianbo Liu, Xiaolin Zhang, Maoqing Tian, Aojun Zhou, Shuai Yi, Hongsheng Li	(参考訳) 最近提案されたMaskFormerは、セマンティックセグメンテーションのタスクについて、新たな視点を与えている。本質的には、カテゴリセグメントに対応するペア確率とマスクを生成し、セグメンテーションマップの推論中にそれらを組み合わせます。したがって、セグメンテーションの品質は、クエリが画像内のカテゴリとその空間的位置に関するセマンティック情報をいかにうまくキャプチャできるかに依存する。本研究では,シングルスケール機能上のマスク分類デコーダは,信頼性の高い確率やマスクを抽出できるほど有効ではないことを見出した。特徴ピラミッド全体にわたって豊富な意味情報を求めるため,マルチスケール機能上にマスク毎のセマンティクスセグメンテーションを実現するトランスフォーマーベースのピラミッド融合トランスフォーマ(PFT)を提案する。計算オーバーヘッドを過大に発生させることなく、異なる解像度の画像特徴を効率的に活用するために、PFTは、スケール間のクエリ間アテンションを備えたマルチスケールトランスフォーマーデコーダを用いて補完情報を交換する。広範な実験評価とアブレーションを行い,その効果を実証した。特に、MaskFormerと比較して、ResNet-101cでCOCO-Stuff 10Kデータセットを3.2mIoU改善しました。さらに、ADE20K検証セットでは、Swin-BのバックボーンとMaskFormerのバックボーンと、シングルスケールとマルチスケールの両方でずっと大きなSwin-Lのバックボーンが一致し、それぞれ54.1 mIoUと55.3 mIoUを達成した。 Swin-Lのバックボーンを用いてADE20K検証セット上で56.0 mIoUのシングルスケール結果と57.2のマルチスケール結果を得る。 The recently proposed MaskFormer gives a refreshed perspective on the task of semantic segmentation: it shifts from the popular pixel-level classification paradigm to a mask-level classification method. In essence, it generates paired probabilities and masks corresponding to category segments and combines them during inference for the segmentation maps. The segmentation quality thus relies on how well the queries can capture the semantic information for categories and their spatial locations within the images. In our study, we find that per-mask classification decoder on top of a single-scale feature is not effective enough to extract reliable probability or mask. To mine for rich semantic information across the feature pyramid, we propose a transformer-based Pyramid Fusion Transformer (PFT) for per-mask approach semantic segmentation on top of multi-scale features. 
To efficiently utilize image features of different resolutions without incurring too much computational overhead, PFT uses a multi-scale transformer decoder with cross-scale inter-query attention to exchange complementary information. Extensive experimental evaluations and ablations demonstrate the efficacy of our framework. In particular, we achieve a 3.2 mIoU improvement on the COCO-Stuff 10K dataset with ResNet-101c compared to MaskFormer. Besides, on the ADE20K validation set, our result with a Swin-B backbone matches that of MaskFormer with a much larger Swin-L backbone in both single-scale and multi-scale inference, achieving 54.1 mIoU and 55.3 mIoU respectively. Using a Swin-L backbone, we achieve a 56.0 mIoU single-scale result and a 57.2 mIoU multi-scale result on the ADE20K validation set, obtaining state-of-the-art performance on the dataset.	翻訳日:2022-01-12 13:25:10 公開日:2022-01-11
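The abstract above says that mask-level classification produces paired class probabilities and masks and combines them at inference time. One common way to do this, sketched below in plain Python, is to weight each query's mask by its class probability, sum over queries, and take the per-pixel argmax. This is an assumption about the general per-mask approach, not code from PFT or MaskFormer; the function name `combine_masks` and the toy inputs are hypothetical.

```python
def combine_masks(class_probs, masks):
    """Turn per-query (class probability, mask) pairs into a label map.

    class_probs: Q lists of C class probabilities, one list per query.
    masks:       Q soft masks, each an H x W nested list with values in [0, 1].
    Returns an H x W nested list of predicted class indices.
    """
    num_queries = len(masks)
    num_classes = len(class_probs[0])
    height, width = len(masks[0]), len(masks[0][0])
    label_map = []
    for y in range(height):
        row = []
        for x in range(width):
            # Score of class c at this pixel: sum over queries of
            # (probability that the query is class c) * (mask value here).
            scores = [
                sum(class_probs[q][c] * masks[q][y][x] for q in range(num_queries))
                for c in range(num_classes)
            ]
            row.append(max(range(num_classes), key=scores.__getitem__))
        label_map.append(row)
    return label_map


# Toy input: two queries, two classes, a 1 x 2 image.
toy_probs = [[0.9, 0.1], [0.2, 0.8]]
toy_masks = [[[0.9, 0.1]], [[0.1, 0.9]]]
print(combine_masks(toy_probs, toy_masks))
```

Under this reading, a query only helps the final map if both its class probability and its mask are reliable, which matches the abstract's point that segmentation quality depends on how well the queries capture category and location information.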
# (参考訳) 核電子顕微鏡における体積再構成のための深部生成モデル Deep Generative Modeling for Volume Reconstruction in Cryo-Electron Microscopy ( http://arxiv.org/abs/2201.02867v2 ) ライセンス: CC BY 4.0	Claire Donnat, Axel Levy, Frederic Poitevin, Nina Miolane	(参考訳) 低温電子顕微鏡(cryo-EM)による生体分子の高分解能イメージングの最近の進歩は、分子体積の再構築のための新しい扉を開放し、生物学、化学、薬理学研究のさらなる進歩を約束している。重要な道のりにもかかわらず、Cryo-EMデータ分析における大きな課題は、物理学者、構造生物学者、計算機科学者、統計学者、応用数学者からの洞察を必要とする、自然界における厳密で複雑な学際的な課題のままである。一方, 生成モデルとエンドツーエンドの教師なし深層学習技術を組み合わせた次世代のボリューム再構成アルゴリズムでは, シミュレーションデータに対して有望な結果が得られた。そこで本稿では,このような手法の普及と課題の学際的性質を踏まえ,高分解能cryo-emボリューム再構成のための深部生成モデリングの最近の進歩を批判的に検討する。本日のレビューは (i)これらの新しい方法を比較して対比する一方で (ii)cryo-emの特定の背景を持たない5つの分野の科学者に親しみやすい用語を用いて、視点から提示すること。このレビューは、Creo-EMボリューム再構成のための深部生成モデルの数学的および計算的課題の紹介と、このクラスのアルゴリズム間で共有されるベースライン方法論の概要から始まる。これらの異なるモデルを通して共通のスレッドウィービングを確立し、これらの最先端のアルゴリズムを実践的に比較し、それらが依存する仮定とともに、それらの相対的な強みと弱みを強調します。これにより、将来の研究のための現在の方法や道のボトルネックを特定できます。 Recent breakthroughs in high resolution imaging of biomolecules in solution with cryo-electron microscopy (cryo-EM) have unlocked new doors for the reconstruction of molecular volumes, thereby promising further advances in biology, chemistry, and pharmacological research amongst others. Despite significant headway, the immense challenges in cryo-EM data analysis remain legion and intricately inter-disciplinary in nature, requiring insights from physicists, structural biologists, computer scientists, statisticians, and applied mathematicians. Meanwhile, recent next-generation volume reconstruction algorithms that combine generative modeling with end-to-end unsupervised deep learning techniques have shown promising results on simulated data, but still face considerable hurdles when applied to experimental cryo-EM images. In light of the proliferation of such methods and given the interdisciplinary nature of the task, we propose here a critical review of recent advances in the field of deep generative modeling for high resolution cryo-EM volume reconstruction. 
The present review aims to (i) compare and contrast these new methods, while (ii) presenting them from a perspective and using terminology familiar to scientists in each of the five aforementioned fields with no specific background in cryo-EM. The review begins with an introduction to the mathematical and computational challenges of deep generative models for cryo-EM volume reconstruction, along with an overview of the baseline methodology shared across this class of algorithms. Having established the common thread weaving through these different models, we provide a practical comparison of these state-of-the-art algorithms, highlighting their relative strengths and weaknesses, along with the assumptions that they rely on. This allows us to identify bottlenecks in current methods and avenues for future research.	翻訳日:2022-01-12 13:22:54 公開日:2022-01-11
# (参考訳) 新型コロナウイルス(COVID-19)パンデミックにおけるバイオメディカル記事のゼロショットと少数ショットの分類 Zero-Shot and Few-Shot Classification of Biomedical Articles in Context of the COVID-19 Pandemic ( http://arxiv.org/abs/2201.03017v2 ) ライセンス: CC BY 4.0	Simon Lupart, Benoit Favre, Vassilina Nikoulina, Salah Ait-Mokhtar	(参考訳) MeSH (Medical Subject Headings) は国立医学図書館によって作成され、生物医学領域の出版物の細かなインデックス化に使われる大きなシソーラスである。新型コロナウイルス(COVID-19)パンデミックの文脈では、MeSH記述子は対応するトピックに関する記事に関連して現れている。ゼロショット分類は、MeSHカテゴリで論文の流れをタイムリーにラベリングするのに適切な応答である。本研究では,MeSHで利用可能なリッチな意味情報によってBioBERT表現が向上し,ゼロショット/少数ショットタスクにより適合するという仮説を立てる。本稿では,MeSHの項定義と論文の要約を連結したものが有効な事例であるか否かを判断する問題として定式化し,マルチタスク学習を活用して,seq2seqタスクによって表現にMeSH階層を誘導する。結果は、MedLineとLitCovidデータセットのベースラインを確立し、プロービングにより、結果の表現がMeSHに存在する階層的関係を伝達していることを示す。 MeSH (Medical Subject Headings) is a large thesaurus created by the National Library of Medicine and used for fine-grained indexing of publications in the biomedical domain. In the context of the COVID-19 pandemic, MeSH descriptors have emerged in relation to articles published on the corresponding topic. Zero-shot classification is an adequate response for timely labeling of the stream of papers with MeSH categories. In this work, we hypothesise that the rich semantic information available in MeSH has the potential to improve BioBERT representations and make them more suitable for zero-shot/few-shot tasks. We frame the problem as determining whether MeSH term definitions, concatenated with paper abstracts, are valid instances or not, and leverage multi-task learning to induce the MeSH hierarchy in the representations thanks to a seq2seq task. Results establish a baseline on the MedLine and LitCovid datasets, and probing shows that the resulting representations convey the hierarchical relations present in MeSH.	翻訳日:2022-01-12 12:43:07 公開日:2022-01-11
# (参考訳) 階層的多粒度分類のための階層的残差ネットワーク強化ラベル関係グラフ Label Relation Graphs Enhanced Hierarchical Residual Network for Hierarchical Multi-Granularity Classification ( http://arxiv.org/abs/2201.03194v2 ) ライセンス: CC BY 4.0	Jingzhou Chen, Peng Wang, Jian Liu, Yuntao Qian	(参考訳) 階層的多粒度分類(HMC)は、各オブジェクトに階層的多粒度ラベルを割り当て、["Albatross", "Laysan Albatross"]のようなラベル階層を粗いレベルから細かいレベルまで符号化することに焦点を当てる。しかしながら、細粒度の定義は主観的であり、画像品質が識別に影響する可能性がある。したがって、サンプルは階層の任意のレベル、例えば ["Albatross"] や ["Albatross", "Laysan Albatross"] で観察することができ、粗いカテゴリで識別される例は、従来のHMCの設定では無視されることが多い。本稿では,オブジェクトを階層の任意のレベルにラベル付けするHMC問題について検討する。提案手法の基本設計は,(1) 様々なレベルにラベル付けされた物体の学習は階層的な知識をレベル間で伝達し,(2) 下位クラスは上位レベルのスーパークラスに関連する属性を継承する,という2つの動機から導かれる。提案する組合せ損失は、ツリー階層で定義された関連ラベルから情報を集約することにより、観測された基底真理ラベルの限界確率を最大化する。観測されたラベルが葉のレベルであれば、組合せ損失はさらに多種クロスエントロピー損失を課し、細粒度の分類損失の重みを増加させる。本研究では,階層的特徴の相互作用を考慮した階層的残差ネットワーク(hrn)を提案する。 3つの一般的なデータセットを用いた実験は、最新のHMCアプローチや、ラベル階層を利用したきめ細かな視覚分類(FGVC)手法と比較して、我々のアプローチの有効性を実証している。 Hierarchical multi-granularity classification (HMC) assigns hierarchical multi-granularity labels to each object and focuses on encoding the label hierarchy, e.g., ["Albatross", "Laysan Albatross"] from coarse-to-fine levels. However, the definition of what is fine-grained is subjective, and the image quality may affect the identification. Thus, samples could be observed at any level of the hierarchy, e.g., ["Albatross"] or ["Albatross", "Laysan Albatross"], and examples discerned at coarse categories are often neglected in the conventional setting of HMC. In this paper, we study the HMC problem in which objects are labeled at any level of the hierarchy. The essential designs of the proposed method are derived from two motivations: (1) learning with objects labeled at various levels should transfer hierarchical knowledge between levels; (2) lower-level classes should inherit attributes related to upper-level superclasses. 
The proposed combinatorial loss maximizes the marginal probability of the observed ground truth label by aggregating information from related labels defined in the tree hierarchy. If the observed label is at the leaf level, the combinatorial loss further imposes the multi-class cross-entropy loss to increase the weight of fine-grained classification loss. Considering the hierarchical feature interaction, we propose a hierarchical residual network (HRN), in which granularity-specific features from parent levels acting as residual connections are added to features of children levels. Experiments on three commonly used datasets demonstrate the effectiveness of our approach compared to the state-of-the-art HMC approaches and fine-grained visual classification (FGVC) methods exploiting the label hierarchy.	翻訳日:2022-01-12 12:31:09 公開日:2022-01-11
# ブートストラップによる異種グラフニューラルネットワークのクロスビュー自己監督学習 Cross-view Self-Supervised Learning on Heterogeneous Graph Neural Network via Bootstrapping ( http://arxiv.org/abs/2201.03340v2 ) ライセンス: Link先を確認	Minjae Park	(参考訳) 不均一グラフニューラルネットワークは、優れた能力を持つ異種グラフの情報を表現することができる。近年,グラフの独特な表現を対照的な学習方法で学習する自己教師型学習法が研究されている。ラベルがない場合、この学習方法は大きな可能性を秘めている。しかし、対照的な学習は正と負のペアに大きく依存しており、異種グラフから高品質なペアを生成することは困難である。本稿では,BYOL(ブートストラップ)と呼ばれる自己教師型学習における最近の革新に則って,多数のペアを生成することなく優れた表現を生成できる手法を提案する。さらに、ネットワークスキーマとメタパスビューという2つの視点から異種グラフを見ることができるという事実に注目して、グラフ内の高レベル表現をキャプチャして表現する。提案モデルは,様々な実世界のデータセットにおいて,他の手法よりも最先端の性能を示した。 Heterogeneous graph neural networks can represent the information in heterogeneous graphs very well. Recently, self-supervised learning approaches have been studied that learn the unique expression of a graph through contrastive learning. In the absence of labels, such learning methods show great potential. However, contrastive learning relies heavily on positive and negative pairs, and generating high-quality pairs from heterogeneous graphs is difficult. In this paper, in line with recent innovations in self-supervised learning called BYOL or bootstrapping, we introduce a method that can generate good representations without generating a large number of pairs. In addition, paying attention to the fact that heterogeneous graphs can be viewed from two perspectives, network schema and meta-path views, high-level expressions in the graphs are captured and expressed. The proposed model outperformed other methods on various real-world datasets, achieving state-of-the-art performance.	翻訳日:2022-01-12 12:12:00 公開日:2022-01-11
# GBRS: Pawlakラフセットと近隣ラフセットの統一モデル GBRS: An Unified Model of Pawlak Rough Set and Neighborhood Rough Set ( http://arxiv.org/abs/2201.03349v2 ) ライセンス: Link先を確認	Shuyin Xia, Cheng Wang, GuoYing Wang, XinBo Gao, Elisabeth Giem, JianHang Yu	(参考訳) パウラーク粗集合と近傍粗集合は、最も一般的な粗集合理論モデルである。 Pawlak粗集合は知識を表現するために同値クラスを使用することができるが、連続データを処理することはできない。近傍粗集合は連続データを処理できるが、同値クラスを用いて知識を表現する能力を失う。そこで本稿では,グラニュラーボール計算に基づく粒状ボール粗さ集合を提案する。粒状ボール粗さ集合は、パウラーク粗さ集合と近傍粗さ集合を同時に表現することができ、2つの統一表現を実現することができる。これにより、粒度ボールの粗い集合は連続データを扱うだけでなく、知識表現に同値クラスを使うことができる。さらに,粒状球粗集合の実装アルゴリズムを提案する。ベンチマークデータセットを用いた実験の結果,粒球計算のロバスト性と適応性の組み合わせにより,粒球粗さ集合の学習精度は,pawlak粗さ集合と従来の近傍粗さ集合と比較して大幅に向上した。粒状ボール粗さ集合はまた、9つの人気または最先端の特徴選択方法よりも優れている。 Pawlak rough set and neighborhood rough set are the two most common rough set theoretical models. Pawlak rough sets can use equivalence classes to represent knowledge, but they cannot process continuous data; neighborhood rough sets can process continuous data, but they lose the ability to use equivalence classes to represent knowledge. To this end, this paper presents a granular-ball rough set based on granular-ball computing. The granular-ball rough set can simultaneously represent Pawlak rough sets and neighborhood rough sets, so as to realize a unified representation of the two. As a result, the granular-ball rough set can not only deal with continuous data but also use equivalence classes for knowledge representation. In addition, we propose an implementation algorithm for granular-ball rough sets. The experimental results on benchmark datasets demonstrate that, due to the combination of the robustness and adaptability of granular-ball computing, the learning accuracy of the granular-ball rough set is greatly improved compared with the Pawlak rough set and the traditional neighborhood rough set. The granular-ball rough set also outperforms nine popular or state-of-the-art feature selection methods.	翻訳日:2022-01-12 12:11:45 公開日:2022-01-11
# コンピュータビジョンによるUAV作物画像からの農業プラントカタログ作成とデータフレームワークの構築 Agricultural Plant Cataloging and Establishment of a Data Framework from UAV-based Crop Images by Computer Vision ( http://arxiv.org/abs/2201.02885v2 ) ライセンス: Link先を確認	Maurice Günder, Facundo R. Ispizua Yamati, Jana Kierdorf, Ribana Roscher, Anne-Katrin Mahlein, Christian Bauckhage	(参考訳) 近代農業におけるUAVに基づく画像検索は、大量の空間的に参照された作物の画像データを収集することを可能にする。しかし、大規模な実験では、UAV画像は複雑な天蓋構造に多量の作物を含むことに苦しむ。特に時間的効果の観察においては、複数の画像上の個々の植物の認識と関連する情報の抽出が複雑になる。本研究は,理解可能なコンピュータビジョン手法に基づいて,UAVの作物画像の時間的・空間的識別と個別化を自動化するためのハンズオンワークフローを提案する。実世界の2つのデータセット上でワークフローを評価する。 1つのデータセットは、成長サイクル全体を通してテンサイのCercospora葉斑病(真菌病)を観察するために記録されている。もう1つは、カリフラワー植物の収穫予測に関するものである。植物カタログは、複数のタイムポイントで見られる単一の植物画像の抽出に利用される。これにより、大規模な時空間画像データセットを収集し、さまざまなデータレイヤを含むさらなる機械学習モデルをトレーニングすることができる。提案手法は農業におけるUAVデータの分析と解釈を大幅に改善する。参照データによる検証により,より複雑な深層学習に基づく認識手法と類似した精度を示す。私たちのワークフローは、特に大規模なデータセットに対して、植物のカタログ作成と画像抽出のトレーニングを自動化できます。 UAV-based image retrieval in modern agriculture enables gathering large amounts of spatially referenced crop image data. In large-scale experiments, however, UAV images suffer from containing a very large number of crops in a complex canopy architecture. Especially for the observation of temporal effects, this tremendously complicates the recognition of individual plants over several images and the extraction of relevant information. In this work, we present a hands-on workflow for the automatized temporal and spatial identification and individualization of crop images from UAVs, abbreviated as "cataloging", based on comprehensible computer vision methods. We evaluate the workflow on two real-world datasets. One dataset is recorded for observation of Cercospora leaf spot - a fungal disease - in sugar beet over an entire growing cycle. The other one deals with harvest prediction of cauliflower plants. The plant catalog is utilized for the extraction of single plant images seen over multiple time points. 
This yields a large-scale spatio-temporal image dataset that in turn can be applied to train further machine learning models including various data layers. The presented approach improves analysis and interpretation of UAV data in agriculture significantly. Validated against reference data, our method shows an accuracy that is similar to more complex deep learning-based recognition techniques. Our workflow is able to automatize plant cataloging and training image extraction, especially for large datasets.	翻訳日:2022-01-12 12:11:26 公開日:2022-01-11
# MaskMTL:深層マルチタスク学習によるマスク付き顔画像の属性予測 MaskMTL: Attribute prediction in masked facial images with deep multitask learning ( http://arxiv.org/abs/2201.03002v2 ) ライセンス: Link先を確認	Prerana Mukherjee, Vinay Kaushik, Ronak Gupta, Ritika Jha, Daneshwari Kankanwadi, and Brejesh Lall	(参考訳) 目印の自由な顔画像の属性を予測することは、マスクの使用によって顔が目立たなくなるとさらに複雑になる課題である。身元確認や個人情報へのセキュアなログインを利用するスマートアクセス制御ゲートは、生体認証特性として顔を利用することができる。特に、Covid-19パンデミックは、衛生的および接触のない身元確認の重要性をますます証明している。このような場合、マスクの使用はより避けられないものとなり、属性予測は、コミュニティの広がりからターゲットの脆弱なグループを分離したり、共同環境での社会的距離を確保するのに役立つ。マスクの形状,大きさ,テクスチャの異なるマスクを効率的にオーバーレイすることで,マスクの装着による変動を効果的にモデル化する。本稿では,マスク付き顔画像から多種多様な属性を同時推定する深層マルチタスク学習(MTL)手法を提案する。ベンチマーク顔属性UTKFaceデータセットの実験結果から,提案手法が他の競合技術を上回ることを示す。ソースコードはhttps://github.com/ritikajha/Attribute-prediction-in-masked-facial-images-with-deep-multitask-learning で入手できる。 Predicting attributes in landmark-free facial images is itself a challenging task, which gets further complicated when the face is occluded by a mask. Smart access control gates which utilize identity verification or the secure login to personal electronic gadgets may utilize face as a biometric trait. Particularly, the Covid-19 pandemic increasingly validates the essentiality of hygienic and contactless identity verification. In such cases, the usage of masks becomes more inevitable and performing attribute prediction helps in segregating the target vulnerable groups from community spread or ensuring social distancing for them in a collaborative environment. We create a masked face dataset by efficiently overlaying masks of different shapes, sizes and textures to effectively model the variability generated by wearing a mask. This paper presents a deep Multi-Task Learning (MTL) approach to jointly estimate various heterogeneous attributes from a single masked facial image. Experimental results on the benchmark face attribute UTKFace dataset demonstrate that the proposed approach outperforms other competing techniques. 
The source code is available at https://github.com/ritikajha/Attribute-prediction-in-masked-facial-images-with-deep-multitask-learning	翻訳日:2022-01-12 12:11:06 公開日:2022-01-11
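
The GBRS entry above (arXiv 2201.03349) contrasts Pawlak rough sets, which represent knowledge through equivalence classes, with neighborhood rough sets. The Pawlak side can be sketched as follows. This is a minimal illustration, not the paper's granular-ball algorithm: the toy decision table and the function names `equivalence_classes` and `approximations` are hypothetical.

```python
from collections import defaultdict

def equivalence_classes(objects, attributes):
    """Group object ids whose values agree on every chosen attribute."""
    classes = defaultdict(set)
    for obj_id, row in objects.items():
        key = tuple(row[a] for a in attributes)
        classes[key].add(obj_id)
    return list(classes.values())

def approximations(objects, attributes, target):
    """Return the (lower, upper) Pawlak approximations of the set `target`."""
    lower, upper = set(), set()
    for cls in equivalence_classes(objects, attributes):
        if cls <= target:   # class lies entirely inside the target
            lower |= cls
        if cls & target:    # class overlaps the target
            upper |= cls
    return lower, upper

# Toy table with two discrete attributes (illustrative values only).
objects = {
    1: {"color": "red", "size": "big"},
    2: {"color": "red", "size": "big"},
    3: {"color": "blue", "size": "big"},
    4: {"color": "blue", "size": "small"},
}
target = {1, 3}  # objects carrying the positive label
lower, upper = approximations(objects, ["color", "size"], target)
print(lower, upper)
```

Objects 1 and 2 are indistinguishable on the chosen attributes, so object 1 cannot be placed in the lower approximation even though it is in the target. Because the grouping relies on exact equality of attribute values, continuous data would put nearly every object in its own class, which is the limitation the abstract attributes to the Pawlak model.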


# AI研究の絞り込み?

A narrowing of AI research? ( http://arxiv.org/abs/2009.10385v4 )

ライセンス: Link先を確認

Joel Klinger, Juan Mateos-Garcia and Konstantinos Stathoulopoulos

(参考訳) 大規模なデータセットからパターンを推論できるディープラーニング技術が登場し、人工知能(AI)システムの性能が劇的に向上した。しかし、ディープラーニングの急速な発展と採用は、その大部分を大手テクノロジー企業が主導しており、堅牢性の欠如、環境コストの高さ、そして潜在的に不公平な結果を含む、その弱点にもかかわらず、AI研究の技術的軌道の早期縮小に関する懸念を生み出している。我々は、人気のあるプレプリントデータベースであるarXivにおけるAI研究のセマンティック分析により、エビデンスベースを改善しようとしている。我々は、AI研究のテーマ的多様性の進化について研究し、学術分野と民間分野におけるAI研究のテーマ的多様性を比較し、AI研究における民間企業の影響力を、被引用数と他の機関との協力を通じて測定する。以上の結果から,近年のAI研究の多様性は停滞しており,民間分野が関与するAI研究は学界での研究よりも多様性が低く、影響力が強い傾向がみられた。プライベートセクターのAI研究者は、他のAI手法を含む研究、AIの社会的および倫理的意味を考慮に入れた研究、健康のような分野の応用を犠牲にして、大量のデータを必要とし計算集約的なディープラーニング手法を専門とする傾向にある。我々の研究結果は、その社会的利益を制限する可能性のあるAI研究の早期縮小を防ぐための政策行動の理論的根拠を提供するが、そのような介入の妨げとなる情報、インセンティブ、スケールのハードルに留意する。

The arrival of deep learning techniques able to infer patterns from large datasets has dramatically improved the performance of Artificial Intelligence (AI) systems. Deep learning's rapid development and adoption, in great part led by large technology companies, has however created concerns about a premature narrowing in the technological trajectory of AI research despite its weaknesses, which include lack of robustness, high environmental costs, and potentially unfair outcomes. We seek to improve the evidence base with a semantic analysis of AI research in arXiv, a popular pre-prints database. We study the evolution of the thematic diversity of AI research, compare the thematic diversity of AI research in academia and the private sector and measure the influence of private companies in AI research through the citations they receive and their collaborations with other institutions. Our results suggest that diversity in AI research has stagnated in recent years, and that AI research involving the private sector tends to be less diverse and more influential than research in academia. We also find that private sector AI researchers tend to specialise in data-hungry and computationally intensive deep learning methods at the expense of research involving other AI methods, research that considers the societal and ethical implications of AI, and applications in sectors like health. Our results provide a rationale for policy action to prevent a premature narrowing of AI research that could constrain its societal benefits, but we note the informational, incentive and scale hurdles standing in the way of such interventions.

翻訳日:2023-05-01 07:17:14 公開日:2022-01-11

# 2次元量子渦ガス中の乱流緩和と平衡

Turbulent relaxation to equilibrium in a two-dimensional quantum vortex gas ( http://arxiv.org/abs/2010.10049v3 )

ライセンス: Link先を確認

Matthew T. Reeves, Kwan Goddard-Lee, Guillaume Gauthier, Oliver R. Stockdale, Hayder Salman, Timothy Edmonds, Xiaoquan Yu, Ashton S. Bradley, Mark Baker, Halina Rubinsztein-Dunlop, Matthew J. Davis, and Tyler W. Neely

(参考訳) 二次元カイラル渦気体の乱流緩和ダイナミクスにおける微小キャノニカル平衡状態の出現を実験的に研究した。メカニカル・スターリング・プロトコルを用いて、準2次元円盤状原子ボース・アインシュタイン凝縮体に同符号渦を注入する。結果として生じる長周期渦分布は、固定エネルギー$\cal{H}$と角運動量$\cal{M}$でマイクロカノニカルアンサンブルを記述する系に対する平均場ポアソン・ボルツマン方程式と良好な一致である。平衡状態は逆温度 $\hat{\beta}$ と回転周波数 $\hat{\omega}$ の対応する熱力学的変数によって特徴づけられる。我々は、ゼロ温度、無限温度、負絶対温度に近いオン軸状態を含む渦ガスの全相ダイアグラムにまたがる平衡を実現することができる。十分に高いエネルギーで、システムは対称性破壊遷移を示し、容器の対称性をもはや共有しない負の絶対温度におけるオフ軸平衡相をもたらす。平衡力学を定量的に再現できる現象論的減衰と雑音を伴う点渦モデルを提案する。

We experimentally study the emergence of microcanonical equilibrium states in the turbulent relaxation dynamics of a two-dimensional chiral vortex gas. Same-sign vortices are injected into a quasi-two-dimensional disk-shaped atomic Bose-Einstein condensate using a range of mechanical stirring protocols. The resulting long-time vortex distributions are found to be in excellent agreement with the mean-field Poisson-Boltzmann equation for the system describing the microcanonical ensemble at fixed energy $\mathcal{H}$ and angular momentum $\mathcal{M}$. The equilibrium states are characterized by the corresponding thermodynamic variables of inverse temperature $\hat{\beta}$ and rotation frequency $\hat{\omega}$. We are able to realize equilibria spanning the full phase diagram of the vortex gas, including on-axis states near zero-temperature, infinite temperature, and negative absolute temperatures. At sufficiently high energies the system exhibits a symmetry-breaking transition, resulting in an off-axis equilibrium phase at negative absolute temperature that no longer shares the symmetry of the container. We introduce a point-vortex model with phenomenological damping and noise that is able to quantitatively reproduce the equilibration dynamics.

翻訳日:2023-04-28 05:51:27 公開日:2022-01-11

# ランダム製品状態からのエンタングルメントダイナミクス:最大エンタングルメントからの逸脱

Entanglement Dynamics From Random Product States: Deviation From Maximal Entanglement ( http://arxiv.org/abs/2102.07584v3 )

ライセンス: Link先を確認

Yichen Huang

(参考訳) 量子多体系の絡み合い力学を研究し、次のことを証明する: (I) 格子上の任意の幾何学的局所ハミルトニアンに対して、ランダムな積状態から始めると、絡み合いエントロピーは高い確率で常に最大エントロピーから遠ざかる。 (II) ランダムな全対全相互作用を持つスピングラスモデルでは、任意の積状態から始めると、平均エンタングルメントエントロピーは常に最大エントロピーから切り離される。また、これらの結果は電荷保存を伴う任意のユニタリ進化や、Sachdev-Ye-Kitaevモデルにも拡張する。この結果は、(カオス)ハミルトン力学によって生じる絡み合いとランダム状態との差を強調し、後者は最大に近い。

We study the entanglement dynamics of quantum many-body systems and prove the following: (I) For any geometrically local Hamiltonian on a lattice, starting from a random product state the entanglement entropy is bounded away from the maximum entropy at all times with high probability. (II) In a spin-glass model with random all-to-all interactions, starting from any product state the average entanglement entropy is bounded away from the maximum entropy at all times. We also extend these results to any unitary evolution with charge conservation and to the Sachdev-Ye-Kitaev model. Our results highlight the difference between the entanglement generated by (chaotic) Hamiltonian dynamics and that of random states, for the latter is nearly maximal.

翻訳日:2023-04-11 02:20:16 公開日:2022-01-11
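
The entry above compares entanglement entropy with its maximum value. As a minimal illustration of the quantity involved (not of the paper's many-body proofs), the sketch below computes the entanglement entropy of one qubit in a two-qubit pure state. The function name and the example states are hypothetical choices for this listing.

```python
import math

def entanglement_entropy(amps):
    """Von Neumann entropy (in nats) of one qubit of a two-qubit pure state.

    `amps` = [a00, a01, a10, a11], assumed normalized.
    """
    a00, a01, a10, a11 = amps
    # For the 2x2 amplitude matrix M, rho_A = M M^dagger has trace 1 and
    # determinant |det M|^2, which fixes both eigenvalues.
    det = abs(a00 * a11 - a01 * a10) ** 2
    disc = math.sqrt(max(0.0, 1.0 - 4.0 * det))
    eigenvalues = [(1.0 + disc) / 2.0, (1.0 - disc) / 2.0]
    return -sum(p * math.log(p) for p in eigenvalues if p > 1e-15)

s = 1 / math.sqrt(2)
product_state = [0.5, 0.5, 0.5, 0.5]  # |+>|+>, no entanglement
bell_state = [s, 0.0, 0.0, s]         # (|00> + |11>)/sqrt(2), maximal

print(entanglement_entropy(product_state))
print(entanglement_entropy(bell_state), math.log(2))
```

A product state has zero entanglement entropy and the Bell state reaches the one-qubit maximum of ln 2. The abstract's results concern how far from the corresponding maximum the entropy stays when dynamics start from product states.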

# 位相空間における複素ネットワークのシミュレーション:ガウスボソンサンプリング

Simulating complex networks in phase space: Gaussian boson sampling ( http://arxiv.org/abs/2102.10341v5 )

ライセンス: Link先を確認

Peter D. Drummond, Bogdan Opanchuk, Alexander Dellios and Margaret D. Reid

(参考訳) フォトニックネットワークにおけるガウス量子状態の位相空間シミュレーションにより,ガウスボソンサンプリング(GBS)量子コンピュータの可測相関の検証が可能であることを示す。以上の結果と100次相関実験は一致し,デコヒーレンスを考慮に入れた。これを16,000以上のモードに拡張し、真のマルチパーティの絡み合いをシミュレートする方法を説明する。

We show how phase-space simulations of Gaussian quantum states in a photonic network permit verification of measurable correlations of Gaussian boson sampling (GBS) quantum computers. Our results agree with experiments for correlations of up to 100th order, provided decoherence is included. We extend this to more than 16,000 modes, and describe how to simulate genuine multipartite entanglement.

翻訳日:2023-04-10 15:57:27 公開日:2022-01-11

# 散逸下における量子同期と極限サイクルの代数理論

Algebraic Theory of Quantum Synchronization and Limit Cycles under Dissipation ( http://arxiv.org/abs/2103.01808v5 )

ライセンス: Link先を確認

Berislav Buca, Cameron Booker, Dieter Jaksch

(参考訳) 同期は相互作用する粒子が動きをロックし、非自明なダイナミクスを表示する現象である。古典的極限のない系における同期の研究には激しい努力があったが、包括的な理論は発見されていない。我々は、時間非依存の量子マスター方程式の固有モード(極限サイクル)を持続的に振動させるために必要な新しい代数的基準に基づいて、そのような一般理論を開発する。これらの固有モードは量子コヒーレントでなければならないことを示し、動的対称性代数の観点から、そのような全ての力学に対する厳密な解析解を与える。この理論を用いて、安定同期と準安定/過渡同期の両方について検討する。我々は我々の理論を用いて自律システムの自然同期を完全に特徴づける。さらに、同期の欠如を証明するために使用できるコンパクトな代数的基準を与える。種々のフェルミオン性低温原子実験に関連する複数の系で同期を示す。

Synchronization is a phenomenon where interacting particles lock their motion and display non-trivial dynamics. Despite intense efforts studying synchronization in systems without clear classical limits, no comprehensive theory has been found. We develop such a general theory based on novel necessary and sufficient algebraic criteria for persistently oscillating eigenmodes (limit cycles) of time-independent quantum master equations. We show these eigenmodes must be quantum coherent and give an exact analytical solution for all such dynamics in terms of a dynamical symmetry algebra. Using our theory, we study both stable synchronization and metastable/transient synchronization. We use our theory to fully characterise spontaneous synchronization of autonomous systems. Moreover, we give compact algebraic criteria that may be used to prove absence of synchronization. We demonstrate synchronization in several systems relevant for various fermionic cold atom experiments.

翻訳日:2023-04-09 12:13:52 公開日:2022-01-11

# ブラックホールメソ状態のホールボ情報

Holevo Information of Black Hole Mesostates ( http://arxiv.org/abs/2103.06888v2 )

ライセンス: Link先を確認

Ning Bao, Jonathan Harper, Grant N. Remmen

(参考訳) 我々は、異なる大きさの地平線間を補間するバルクワームホール幾何学を定義し、これらの幾何学におけるHRT表面の特性を決定する。この構造はブラックホールの中間状態と双対であり、ブラックホールのミクロ状態と完全なブラックホール状態の間の状態の中間粗粒化である。近年のホログラフィック・ホレボ情報技術を用いて,これらの物体の識別可能性を分析し,新しい位相遷移挙動を示す。

We define a bulk wormhole geometry interpolating between horizons of differing size and determine characteristics of the HRT surface in these geometries. This construction is dual to black hole mesostates, an intermediate coarse-graining of states between black hole microstates and the full black hole state. We analyze the distinguishability of these objects using the recently derived holographic Holevo information techniques, demonstrating novel phase transition behavior for such systems.

翻訳日:2023-04-08 10:54:51 公開日:2022-01-11

# 正の演算子価値尺度の直交化

Orthogonalization of Positive Operator Valued Measures ( http://arxiv.org/abs/2103.14126v2 )

ライセンス: Link先を確認

Mikael de la Salle

(参考訳) 我々は、ほぼ直交するヒルベルト空間上のユニタリ(またはPOVM)の分割が、同じフォン・ノイマン代数の直交のPOVMに近いことを示す。これはKempe-Vidick と Ji-Natarajan-Vidick-Wright-Yuen による行列代数における無限次元の以前の結果を一般化する。定量的には, 最適となる線形依存度が得られるので, 結果もより微妙である。また、フォン・ノイマン代数の前の有限部分集合の極小行列とPOVMの間の無限次元の双対結果にも一般化する。

We show that a partition of unity (or POVM) on a Hilbert space that is almost orthogonal is close to an orthogonal POVM in the same von Neumann algebra. This generalizes to infinite dimension previous results in matrix algebras by Kempe-Vidick and Ji-Natarajan-Vidick-Wright-Yuen. Quantitatively, our results are also finer, as we obtain a linear dependence, which is optimal. We also generalize to infinite dimension a duality result between POVMs and minimal majorants of finite subsets in the predual of a von Neumann algebra.

翻訳日:2023-04-06 21:14:56 公開日:2022-01-11

# 閉・開系における二次ボソニック励起の位相的分類とその例

Topological classifications of quadratic bosonic excitations in closed and open systems with examples ( http://arxiv.org/abs/2103.15200v5 )

ライセンス: Link先を確認

Yan He and Chih-Chun Chien

(参考訳) 閉システムの運動方程式からの動的行列の対称性と開システムのリンドブラッド方程式からの有効ハミルトニアンによる二次ボソニック系の位相的分類を解析した。非エルミート動的行列と有効ハミルトニアンはどちらも10倍の方向テーブルをもたらすが、系-保存結合は、貯水池と結合しない系を異なるクラスに分解させる可能性がある。 2Dチャーン絶縁体は異なる分類に敏感であることが示されている。対照的に、カイラル対称性を持つ1次元ボソニックSu-Schrieffer-Heegerモデルと時間反転対称性を持つ2次元ボソニックトポロジカル絶縁体を、対応する開系が異なるクラスに分類できることを示す。

The topological classifications of quadratic bosonic systems according to the symmetries of the dynamic matrices from the equations of motion of closed systems and the effective Hamiltonians from the Lindblad equations of open systems are analyzed. While the non-Hermitian dynamic matrix and effective Hamiltonian both lead to a ten-fold way table, the system-reservoir coupling may cause a system with or without coupling to a reservoir to fall into different classes. A 2D Chern insulator is shown to be insensitive to the different classifications. In contrast, we present a 1D bosonic Su-Schrieffer-Heeger model with chiral symmetry and a 2D bosonic topological insulator with time-reversal symmetry to show the corresponding open systems may fall into different classes.

翻訳日:2023-04-06 08:02:47 公開日:2022-01-11

# 位相項を持つシュウィンガー模型のスクリーニングと閉じ込めのための古典的エミュレートディジタル量子シミュレーション

Classically emulated digital quantum simulation for screening and confinement in the Schwinger model with a topological term ( http://arxiv.org/abs/2105.03276v2 )

ライセンス: Link先を確認

Masazumi Honda, Etsuko Itou, Yuta Kikuchi, Lento Nagano, Takuya Okuda

(参考訳) 古典的シミュレータを用いてデジタル量子シミュレーションを行い、ゲージ理論における位相項によるスクリーニングと閉じ込めの研究を行い、テタ項を持つ(1+1$)次元量子電磁力学(シュウィンガーモデル)に焦点を当てた。プローブ電荷の存在下で基底状態エネルギーを計算し、それらの間のポテンシャルを断熱的状態準備によって推定する。有限体積のシミュレーション結果と解析的予測を比較し,良好な一致を求める。特に, 大規模の場合において, 非整数電荷に対する線形挙動と整数電荷に対する非線形挙動は, 非整数電荷に対する期待閉じ込め(スクリーニング)挙動と一貫して一致している。

We perform digital quantum simulation, using a classical simulator, to study screening and confinement in a gauge theory with a topological term, focusing on ($1+1$)-dimensional quantum electrodynamics (Schwinger model) with a theta term. We compute the ground state energy in the presence of probe charges to estimate the potential between them, via adiabatic state preparation. We compare our simulation results and analytical predictions for a finite volume, finding good agreement. In particular, our result in the massive case shows a linear behavior for non-integer charges and a non-linear behavior for integer charges, consistent with the expected confinement (screening) behavior for non-integer (integer) charges.

翻訳日:2023-04-01 05:36:36 公開日:2022-01-11

# フィードバックコヒーレントイジング機の位相空間シミュレーション

Phase-space simulations of feedback coherent Ising machines ( http://arxiv.org/abs/2105.04190v2 )

ライセンス: Link先を確認

Simon Kiesewetter, Peter D Drummond (Swinburne University of Technology, Melbourne, Australia)

(参考訳) コヒーレントIsingマシン量子コンピュータの正位相空間シミュレーションを行うための新しい手法が実証された。結合行列の適切な設計により、一般的なハード最適化問題を解くことができる。ここでは、コヒーレントイジングマシンの実装である、フィードバックタイプのフォトニックパラメトリックネットワークの計算量子シミュレーションを行う。量子フィードバック装置の量子シミュレーションのためのスケーラブルな位相空間アルゴリズムを用いて成功率を求める。

A new technique is demonstrated for carrying out exact positive-P phase-space simulations of the coherent Ising machine quantum computer. By suitable design of the coupling matrix, general hard optimization problems can be solved. Here, computational quantum simulations of a feedback type of photonic parametric network are carried out, which is the implementation of the coherent Ising machine. Results for success rates are obtained using this scalable phase-space algorithm for quantum simulations of quantum feedback devices.

翻訳日:2023-03-31 23:40:15 公開日:2022-01-11

# 量子鎖上の同変準局所自己同型の分類

Classification of equivariant quasi-local automorphisms on quantum chains ( http://arxiv.org/abs/2106.02145v2 )

ライセンス: Link先を確認

Alex Bols

(参考訳) 量子鎖上の自己同型を分類し、スピンとフェルミオンの自由度の両方を許容し、それはさらに有限対称性群 $G$ の局所対称性作用に関して同変である。この分類は、強い連続的な変形と分離された補助自己同型の積み重ねを通じた同値性を除いて行われる。同値類は、$\mathbb{Q} \cup \sqrt{2} \mathbb{Q} \times \mathrm{Hom}(G, \mathbb{Z}_2) \times H^2(G, U(1))$ の値を取るインデックスによって一意にラベル付けされる。この指数と、$H^2(G, U(1))$ の値を取るスピン鎖上の一次元対称性保護位相相の指数との関係について検討する。

We classify automorphisms on quantum chains, allowing both spin and fermionic degrees of freedom, that are moreover equivariant with respect to a local symmetry action of a finite symmetry group $G$. The classification is up to equivalence through strongly continuous deformation and stacking with decoupled auxiliary automorphisms. We find that the equivalence classes are uniquely labeled by an index taking values in $\mathbb{Q} \cup \sqrt{2} \mathbb{Q} \times \mathrm{Hom}(G, \mathbb{Z}_2) \times H^2(G, U(1))$. We discuss the relation of this index to the index of one-dimensional symmetry protected topological phases on spin chains, which takes values in $H^2(G, U(1))$.

翻訳日:2023-03-27 23:11:30 公開日:2022-01-11

# 正確な被覆問題における量子アニールギャップとクエンチダイナミクス

The quantum annealing gap and quench dynamics in the exact cover problem ( http://arxiv.org/abs/2106.08101v3 )

ライセンス: Link先を確認

Bernhard Irsigler and Tobias Grass

(参考訳) 熱処理と熱処理は、量子系の時間進化において極端な反対である: 熱処理は、ゆっくりと変化するパラメータを持つハミルトンの平衡位相を探索し、複雑な最適化問題を解決するツールとして利用することができる。対照的に、クエンチはハミルトンの突然の変化であり、非平衡状態を生み出す。本稿では,両者の関係について検討する。具体的には,量子アニーリングアルゴリズムの重要なボトルネックであるアニーリングギャップの最小値が,クエンチ後の動的量子状態を記述する動的クエンチパラメータから明らかにできることを示す。ニューラルネットワークのトレーニングを含む統計的ツールと組み合わせることで、クエンチとアニーリングダイナミクスの関係を利用して、クエンチデータからアニーリングギャップの完全な機能的挙動を再現することができる。このような方法で得られるアニーリングギャップに関する部分的あるいは完全な知識は、実用的な時間対解の利点を持つ最適化された量子アニーリングプロトコルの設計に使用できる。我々の結果は、厳密な被覆問題の難解な例を表すランダムなイジング・ハミルトニアンのシミュレーションから得られる。

Quenching and annealing are extreme opposites in the time evolution of a quantum system: Annealing explores equilibrium phases of a Hamiltonian with slowly changing parameters and can be exploited as a tool for solving complex optimization problems. In contrast, quenches are sudden changes of the Hamiltonian, producing a non-equilibrium situation. Here, we investigate the relation between the two cases. Specifically, we show that the minimum of the annealing gap, which is an important bottleneck of quantum annealing algorithms, can be revealed from a dynamical quench parameter which describes the dynamical quantum state after the quench. Combined with statistical tools including the training of a neural network, the relation between quench and annealing dynamics can be exploited to reproduce the full functional behavior of the annealing gap from the quench data. We show that the partial or full knowledge about the annealing gap which can be gained in this way can be used to design optimized quantum annealing protocols with a practical time-to-solution benefit. Our results are obtained from simulating random Ising Hamiltonians, representing hard-to-solve instances of the exact cover problem.

翻訳日:2023-03-26 15:31:57 公開日:2022-01-11
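
The entry above relies on Ising Hamiltonians that encode exact cover instances. The sketch below shows the usual textbook penalty form, assumed here for illustration and not taken from the paper: each clause contributes zero only when exactly one of its bits is set. The instance and function names are hypothetical.

```python
from itertools import product

def exact_cover_energy(bits, clauses):
    """Penalty that is zero iff every clause has exactly one selected bit."""
    return sum((sum(bits[i] for i in clause) - 1) ** 2 for clause in clauses)

def ground_states(n_bits, clauses):
    """Brute-force all assignments; return (minimum energy, minimizers)."""
    best_energy, best = None, []
    for bits in product((0, 1), repeat=n_bits):
        energy = exact_cover_energy(bits, clauses)
        if best_energy is None or energy < best_energy:
            best_energy, best = energy, [bits]
        elif energy == best_energy:
            best.append(bits)
    return best_energy, best

clauses = [(0, 1, 2), (1, 2, 3), (0, 2, 3)]  # hypothetical 4-bit instance
energy, solutions = ground_states(4, clauses)
print(energy, solutions)
```

Substituting x_i = (1 - s_i)/2 with spins s_i = ±1 turns this penalty into an Ising energy with local fields and pairwise couplings. Brute force is only feasible for tiny instances; the annealing gap discussed in the abstract governs how slowly a quantum annealer must run to reach the same ground state.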

# $O(N)^{q-1}$テンソルモデルによる平均値のないワームホール計算

Wormhole calculus without averaging from $O(N)^{q-1}$ tensor model ( http://arxiv.org/abs/2106.14886v4 )

ライセンス: Link先を確認

Sayantan Choudhury, K. Shirish

(参考訳) SYKモデルは、ほぼ$AdS_2$空間のフェルミオン結合を平均化した後、ワームホールのような解を持つ。結合が固定されたとしても、これらのワームホールの寄与は引き続き存在し、新しいサドルポイントは「半ワームホール」と解釈される。本稿では,これらのワームホールの運命を,SYKモデルと相関関数とSYKモデルとの相関関係を持つ,$O(N)^{q-1}$ゲージ対称性を持つテンソルモデルを用いて検討する。ワームホール・スレッドド・ウィルソン作用素に関連付けられた因子分解問題を、大域電荷や非自明なコボルディズムクラスにおいて、解消されたワームホールに関連付ける。したがって、特に短距離で分解する分断関数には、ワームホールに関連する大域対称性を破り、大域対称性を欠くような位相的欠陥が存在する必要がある。我々はこれらのワームホールを、トポロジカルな欠陥を加えて「半ワームホール」と解釈する。また、スペクトルフォームファクタの遅い時間的挙動、特に重力セクターの高次ワームホール(英語版)からの先行およびサブリードの貢献についてもコメントする。最後に、他の「ハーフワームホール」の非自明なサドルがどのように支配的であり、非摂動効果によってバルクセクターで異常な熱力学を引き起こすかを示す。

The SYK model has a wormhole-like solution after averaging over the fermionic couplings in the nearly $AdS_2$ space. Even when the couplings are fixed, the contribution of these wormholes continues to exist and new saddle points appear which are interpreted as "half-wormholes". In this paper, we will study the fate of these wormholes in a model without quenched disorder, namely a tensor model with $O(N)^{q-1}$ gauge symmetry whose correlation function and thermodynamics in the large $N$ limit are the same as that of the SYK model. We will restate the factorization problem linked with the wormhole threaded Wilson operator, in terms of global charges or non-trivial cobordism classes associated with disconnected wormholes. Therefore, for the partition function to factorize, especially at short distances, there must exist certain topological defects which break the global symmetry associated with wormholes and make the theory devoid of global symmetries. We will interpret these wormholes with added topological defects as our "half-wormholes". We will also comment on the late time behavior of the spectral form factor, particularly its leading and sub-leading order contributions coming from higher genus wormholes in the gravitational sector. Finally, we will show how the other non-trivial saddles from "half-wormholes" dominate and give rise to unusual thermodynamics in the bulk sector due to non-perturbative effects.

翻訳日:2023-03-24 22:03:41 公開日:2022-01-11

# グラフ上の密度Functional Theory

Density-Functional Theory on Graphs ( http://arxiv.org/abs/2106.15370v2 )

ライセンス: Link先を確認

Markus Penz, Robert van Leeuwen

(参考訳) 密度汎関数理論の原理はグラフで表される有限格子系に対して研究される。驚くべきことに、ベーシックなホヘンベルク・コーンの定理は一般には無効であるが、密度-ポテンシャル写像の位相構造に関する多くの洞察は得られる。基底状態の正確な条件を一意に v-表現可能とし、この性質がほぼすべての密度に対して成り立つことを証明できる。一連の例がこの理論を示し、純粋な状態制約付き探索汎関数の非凸性を示す。

The principles of density-functional theory are studied for finite lattice systems represented by graphs. Surprisingly, the fundamental Hohenberg-Kohn theorem is found void in general, while many insights into the topological structure of the density-potential mapping can be won. We give precise conditions for a ground state to be uniquely v-representable and are able to prove that this property holds for almost all densities. A set of examples illustrates the theory and demonstrates the non-convexity of the pure-state constrained-search functional.

翻訳日:2023-03-24 19:42:57 公開日:2022-01-11

# 単一量子ビット解離による量子位相の学習

Learning quantum phases via single-qubit disentanglement ( http://arxiv.org/abs/2107.03542v3 )

ライセンス: Link先を確認

Zheng An, Chenfeng Cao, Cheng-Qian Xu, D. L. Zhou

(参考訳) 物質の相の同定は複雑なプロセスであり、特に量子論では、基底状態の複雑さはシステムサイズとともに指数関数的に増加するように見える。量子多体系の絡み合いは異なる相における異なる複雑な構造を示す。異なる絡み合い構造を検出して異なる量子相を区別する自然な方法である。絡み合い構造の検出に直接的に取り組む方法として、絡み合い構造に関する事実情報を提供することができる。本研究では, 量子相を同定するアプローチとして, 異なる位相下での絡み合い構造の決定と, イジングスピン鎖系の相の識別という, 根本的に異なるアプローチを踏襲する。本稿では,RL設計逆アンタングル回路の性能に基づく位相遷移を求めるための強化学習(RL)手法を提案する。さらに,本手法では局所演算が限定され,量子回路が拡張可能で,大規模システムにも拡張可能である。本研究では, 横場イジングモデル(TFIM)とXXZモデルにおける量子相転移におけるこの手法の成功例を示す。さらに、RLエージェントは、TFIMの下での絡み合い構造のクラマース・ワニエ双対性を学ぶ。この研究は、量子多体系の絡み合い構造で量子相を特徴づけることに光を当てる。

Identifying phases of matter is a complicated process, especially in quantum theory, where the complexity of the ground state appears to rise exponentially with system size. The entanglement of quantum many-body systems exhibits different complex structures in different phases. Detecting these different entanglement structures is therefore a natural way to distinguish different quantum phases. As a method that works directly on the detection of the entanglement structure, disentanglement can provide factual information on the entanglement structure. In this work, we follow a radically different approach to identifying quantum phases: we utilize disentanglement to determine the entanglement structures under different phases to distinguish the phases of Ising spin chain systems. Here, we propose a reinforcement learning (RL) approach to finding phase transitions based on the performance of the RL-designed disentangling circuit. Further, our method uses limited local operations and one-qubit measurement, making the quantum circuit scalable and easily extended to large-sized systems. We demonstrate the success of this method on the quantum phase transition in the transverse field Ising model (TFIM) and the XXZ model. Moreover, we find that the RL agent learns the Kramers-Wannier duality on entanglement structures under the TFIM. This study sheds light on characterizing quantum phases with the entanglement structures of quantum many-body systems.

翻訳日:2023-03-23 02:21:44 公開日:2022-01-11

# ピロクロア格子上のキラル$\mathbb{Z}_2$スピン液体の射影対称性群の分類:スピン-1/2$XXZハイゼンベルクモデルへの応用

Projective symmetry group classification of chiral $\mathbb{Z}_2$ spin liquids on the pyrochlore lattice: application to the spin-$1/2$ XXZ Heisenberg model ( http://arxiv.org/abs/2107.13574v3 )

ライセンス: Link先を確認

Benedikt Schneider, Jad C. Halimeh, Matthias Punk

(参考訳) 我々は、シュヴィンガーボソン平均場状態の射影対称性群解析を用いて、ピロクロア格子上のキラル$\mathbb{Z}_2$量子スピン液体と同様に、完全対称の完全な分類を与える。 Liu らによって分類された 12 個の完全対称近傍の $\mathbb{Z}_2$ スピン液体を含む 50 個の独立 ansätze が存在する。 [https://journals.aps.org/prb/abstract/10.1103/PhysRevB.100.075125] 各クラスについて、最も一般的な対称性許容平均場ハミルトニアンを指定する。さらに,反強磁性ハイゼンベルク点近傍のスピン-$1/2$ XXZ 模型の平均場方程式を解いて,スピン液体 ansätze の部分集合の性質を検証した。 4つのキラルスピン液体が、格子モジュラー時間反転対称性のネジ対称性を破っている。これらの状態は、以前研究されたモノポール束状態と異なる対称性を持ち、その特異な特徴は格子のすべてのロンボで囲まれた$\frac{\pi}{3}$フラックスである。

We give a complete classification of fully symmetric as well as chiral $\mathbb{Z}_2$ quantum spin liquids on the pyrochlore lattice using a projective symmetry group analysis of Schwinger boson mean-field states. We find 50 independent ansätze, including the 12 fully symmetric nearest-neighbor $\mathbb{Z}_2$ spin liquids that have been classified by Liu et al. [https://journals.aps.org/prb/abstract/10.1103/PhysRevB.100.075125]. For each class we specify the most general symmetry-allowed mean-field Hamiltonian. Additionally, we test the properties of a subset of the spin liquid ansätze by solving the mean-field equations for the spin-$1/2$ XXZ model near the antiferromagnetic Heisenberg point. We find four chiral spin liquids that break the screw symmetry of the lattice modulo time reversal symmetry. These states have a different symmetry than the previously studied monopole flux state and their unique characteristic is a $\frac{\pi}{3}$ flux enclosed by every rhombus of the lattice.

翻訳日:2023-03-20 16:53:00 公開日:2022-01-11

# 量子気体中の弱い相互作用と強い相互作用の二重性

Duality between weak and strong interactions in quantum gases ( http://arxiv.org/abs/2109.08626v2 )

ライセンス: Link先を確認

Etienne Granet, Bruno Bertini, Fabian H. L. Essler

(参考訳) 1次元の量子気体では、ハードコアボソンと非相互作用フェルミオンの間によく知られた「双対性」が存在する。しかし、場の理論のレベルでは、強く相互作用するボソンと弱い相互作用するフェルミオンをつなぐ完全双対性は知られていない。ここでは、この長期的問題に対する解決策を提案する。我々の導出は、その導出に比例する波動関数の不連続を誘導する1次元におけるフェルミオン間の唯一の点的相互作用を規則化することに依存する。すべての既知の正規化とは対照的に、我々のポテンシャルは小さな相互作用強度に対して弱い。これにより、強い相互作用を持つボソンに図式摂動理論の標準的な方法を適用することができる。第一の応用として, 有名なリーブ・リンガーモデルと双対なフェルミオン模型であるcheon-shigeharaモデルの有限温度スペクトル関数を計算する。

In one-dimensional quantum gases there is a well-known "duality" between hard-core bosons and non-interacting fermions. However, at the field theory level, no exact duality connecting strongly interacting bosons to weakly interacting fermions is known. Here we propose a solution to this long-standing problem. Our derivation relies on regularizing the only point-like interaction between fermions in 1D that induces a discontinuity in the wave function proportional to its derivative. In contrast to all known regularizations, our potential is weak for small interaction strengths. Crucially, this allows one to apply standard methods of diagrammatic perturbation theory to strongly interacting bosons. As a first application we compute the finite temperature spectral function of the Cheon-Shigehara model, the fermionic model dual to the celebrated Lieb-Liniger model.

翻訳日:2023-03-14 11:26:43 公開日:2022-01-11

# 非共鳴二段階遷移:量子熱力学からの教訓

Nonresonant two-level transitions: Lessons from quantum thermodynamics ( http://arxiv.org/abs/2109.11413v2 )

ライセンス: Link先を確認

Andreas Wacker

(参考訳) 量子熱力学の概念に基づき、1つの電磁モードに結合した2段階の系を解析する。モード周波数が遷移周波数に一致しないデチューニングの場合には、レベルと光子エネルギーに対して有効エネルギーが導出される。熱力学的に一貫した記述を達成するためには, フェルミオン, ボソニック貯留層と定常状態でのエネルギー交換に使用する必要がある。周波数プルやブロッホゲインなどの既知の特徴を回復する一方で、熱力学的背景に光を当て、一貫性のある理解を可能にする。

Based on concepts from quantum thermodynamics the two-level system coupled to a single electromagnetic mode is analyzed. Focusing on the case of detuning, where the mode frequency does not match the transition frequency, effective energies are derived for the levels and the photon energy. It is shown that these should be used for energy exchange with fermionic and bosonic reservoirs in the steady state in order to achieve a thermodynamically consistent description. While recovering known features such as frequency pulling or Bloch gain, this sheds light on their thermodynamic background and allows for a coherent understanding.

翻訳日:2023-03-13 23:00:57 公開日:2022-01-11

# 競合する長距離相互作用を持つ原子BECにおける超ガラス形成

Superglass formation in an atomic BEC with competing long-range interactions ( http://arxiv.org/abs/2109.14709v2 )

ライセンス: Link先を確認

Stefan Ostermann, Valentin Walther and Susanne F. Yelin

(参考訳) 量子系の複雑な動的位相は、通常、創発的な周期次数を引き起こす原子相互作用によって引き起こされる。本稿では,2つの競合する,実質的に異なる長距離相互作用ポテンシャルを持つ量子多体系について検討する。この系では,密度秩序に対する動的不安定性が超ガラス相,すなわち局所的な密度変調を示すが長距離周期秩序を持たない超流動性の無秩序な非晶質固体に道を譲りうる。ライドベルク装束法における2次元BECと光定在波共振器の結合系について検討する。この系の動的パターン形成は、リドバーグドレッシングによって生じる反発的ソフトコア相互作用と、キャビティフォトンによって引き起こされる無限範囲の符号変化相互作用の2つの相互作用電位の競合によって制御される。超ガラス相は、2つの相互作用ポテンシャルが不規則な長さスケールを導入するときに起こる。外部付加障害のないこの特異位相の動的形成は量子ゆらぎによって駆動され、2つの競合する相互作用エネルギーと長さスケールによって引き起こされるフラストレーションに起因する。

The complex dynamical phases of quantum systems are dictated by atomic interactions that usually evoke an emergent periodic order. Here, we study a quantum many-body system with two competing and substantially different long-range interaction potentials where the dynamical instability towards density order can give way to a superglass phase, i.e., a superfluid disordered amorphous solid, which exhibits local density modulations but no long-range periodic order. We consider a two-dimensional BEC in the Rydberg-dressing regime coupled to an optical standing wave resonator. The dynamic pattern formation in this system is governed by the competition between the two involved interaction potentials: repulsive soft-core interactions arising due to Rydberg dressing and infinite-range sign changing interactions induced by the cavity photons. The superglass phase is found when the two interaction potentials introduce incommensurate length scales. The dynamic formation of this peculiar phase without any externally added disorder is driven by quantum fluctuations and can be attributed to frustration induced by the two competing interaction energies and length scales.

翻訳日:2023-03-13 04:49:56 公開日:2022-01-11

# パラメータ空間に例外点が包囲された場合の量子力学の複素時間法

Complex time method for quantum dynamics when an exceptional point is encircled in the parameter space ( http://arxiv.org/abs/2110.14473v2 )

ライセンス: Link先を確認

Petra Ruth Kapralova-Zdanska

(参考訳) 我々は、量子力学への応用のための複素時間法を再考し、例外点がハミルトニアンのパラメータ空間に囲むようにした。複素時間法の基本的な考え方は、複素輪郭積分を用いて一階断熱摂動積分を行うことである。このようにして、量子力学問題は複素時間平面(遷移点)の特異点の研究に変換され、これは断熱ハミルトンの複素退化を表すものであり、周囲の輪郭を定義する時間依存パラメータは解析的に複素平面に継続される。このアプローチの基本的な例として、特別な時間対称の場合、例外点を囲む際に発生するラビ振動と急な断熱通路の切り替えについて論じる。

We revisit the complex time method for the application to quantum dynamics as an exceptional point is encircled in the parameter space of the Hamiltonian. The basic idea of the complex time method is to use complex contour integration to perform the first-order adiabatic perturbation integral. In this way, the quantum dynamical problem is transformed to a study of singularities in the complex time plane -- transition points -- which represent complex degeneracies of the adiabatic Hamiltonian as the time-dependent parameters defining the encircling contour are analytically continued to the complex plane. As an underlying illustration of the approach we discuss a switch between Rabi oscillations and rapid adiabatic passage which occurs upon the encircling of an exceptional point in a special time-symmetric case.

翻訳日:2023-03-10 03:14:53 公開日:2022-01-11

# 重力絡みの推測における限界

Limits on inference of gravitational entanglement ( http://arxiv.org/abs/2111.00936v2 )

ライセンス: Link先を確認

Yue Ma, Thomas Guff, Gavin Morley, Igor Pikovski, M. S. Kim

(参考訳) 重力と量子力学を組み合わせることは物理学の最大の課題である。近年、重力の量子的性質について間接的な手がかりを与えるオプト・メカニカル・システムによる実験が提案されている。近年のD. Carney et al., Phys.Rev.X Quantum 2, 030330 (2021)] では, 原子干渉計をメソスコピック発振器で重力的に絡めることが提案されている。この相互作用は、干渉計の視界の周期的な低下と回復をもたらし、特定の仮定の下では、絡み合いの重力発生を示す。ここでは、同じ効果を再現できる原子干渉計の半古典的モデルについて研究する。周期的崩壊と可視性の復活といったコアシグネチャは、原子がランダムなユニタリチャネルの下にある場合、発振器の明示的なモデリングがなくても、発振器が完全に古典的かつ状況的になる場合を含む。また,振動子の非古典性は,系が基底状態に非常に近くなければ消失し,系が基底状態であっても結合強度によって非古典性が制限されることを示した。その結果,非古典性仮定を満たし検証すること自体が大きな課題となるため,提案実験から絡み合いを導き出すことは非常に困難であることが示唆された。

Combining gravity with quantum mechanics remains one of the biggest challenges of physics. In the past years, experiments with opto-mechanical systems have been proposed that may give indirect clues about the quantum nature of gravity. In a recent variation of such tests [D. Carney et al., PRX Quantum 2, 030330 (2021)], the authors propose to gravitationally entangle an atom interferometer with a mesoscopic oscillator. The interaction results in periodic drops and revivals of the interferometric visibility, which under specific assumptions indicate the gravitational generation of entanglement. Here we study semi-classical models of the atom interferometer that can reproduce the same effect. We show that the core signature -- periodic collapses and revivals of the visibility -- can appear if the atom is subject to a random unitary channel, including the case where the oscillator is fully classical and situations even without explicit modelling of the oscillator. We also show that the non-classicality of the oscillator vanishes unless the system is very close to its ground state, and even when the system is in the ground state, the non-classicality is limited by the coupling strength. Our results thus indicate that deducing entanglement from the proposed experiment is very challenging, since fulfilling and verifying the non-classicality assumptions is a significant challenge in its own right.
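
The claim that collapses and revivals of the visibility can arise from a purely classical oscillator can be illustrated with a toy calculation. The sketch below is not the authors' model: it assumes a hypothetical coupling `g`, a classical oscillator position x(t) = A cos(ωt + θ) with randomly drawn amplitude and initial phase, and an atomic phase equal to `g` times the time integral of x. Averaging exp(iφ) over the random draws gives a visibility that drops mid-period and returns to one after a full period, without any entanglement being involved.

```python
import numpy as np

def visibility(t, omega=1.0, g=2.0, n_samples=20000, seed=0):
    """Visibility |mean(exp(i*phi))| for an atom whose phase is driven by a
    classical oscillator x(t) = A*cos(omega*t + theta) with random A, theta.
    All parameter values are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    amplitude = np.abs(rng.normal(size=n_samples))           # random classical amplitude
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n_samples)    # random initial phase
    # phi(t) = g * integral of x(t') from 0 to t, evaluated in closed form
    phi = (g * amplitude / omega) * (np.sin(omega * t + theta) - np.sin(theta))
    return float(np.abs(np.mean(np.exp(1j * phi))))

period = 2.0 * np.pi            # oscillator period for omega = 1
v_half = visibility(period / 2)  # mid-period: the random phases wash out the fringes
v_full = visibility(period)      # full period: every realization returns to phi = 0
```

Because each realization acts on the atom as a unitary phase kick, the averaged map is a random unitary channel in the sense used in the abstract; the revival at one period follows from the integral of x over a full cycle vanishing for every draw.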

翻訳日:2023-03-09 16:58:46 公開日:2022-01-11

# ウィグナーの友人の古典的モデルがありますか。

Is there a classical model of Wigner's friend? ( http://arxiv.org/abs/2111.02807v2 )

ライセンス: Link先を確認

Anthony Sudbery

(参考訳) ウィグナーの友人(Wigner's friend)とは、量子力学の規則に従って異なる観察者が矛盾する記述を与える量子過程を指す。 Lostaglio と Bowles は、最近同じ効果を示す古典的なシステムを記述すると主張した。この主張は正当化されていない。古典力学や量子力学における確率の異なる意味を考慮に入れない。

"Wigner's friend" refers to a quantum process of which different observers, following the rules of quantum mechanics, give contradictory descriptions. Lostaglio and Bowles have recently claimed to describe a classical system showing the same effect. It is argued that this claim is not justified. It fails to take into account the different meanings of probability in classical and quantum mechanics.

翻訳日:2023-03-09 06:31:49 公開日:2022-01-11

# 生物学的刺激による活性化関数は、生体と人工ニューロンのパフォーマンスギャップを橋渡しできる

Biologically Inspired Oscillating Activation Functions Can Bridge the Performance Gap between Biological and Artificial Neurons ( http://arxiv.org/abs/2111.04020v3 )

ライセンス: Link先を確認

Matthew Mithra Noel, Shubham Bharadwaj, Venkataraman Muthiah-Nakarajan, Praneet Dutta, Geraldine Bessie Amali

(参考訳) 非線形活性化関数はニューラルネットワークに複雑な高次元関数を学習する能力を与える。活性化関数の選択は、ディープニューラルネットワークの性能を決定する重要なハイパーパラメータである。これは勾配流、トレーニングの速度、最終的にはニューラルネットワークの表現力に大きく影響する。 sigmoidsのような飽和アクティベーション関数は、消失する勾配問題に悩まされ、ディープニューラルネットワークでは使用できない。普遍近似定理は、シグモイドとReLUの多層ネットワークが任意の精度で任意の複素連続函数を学習できることを保証する。多層ニューラルネットワークが任意の複雑な活性化関数を学習する能力にもかかわらず、従来のニューラルネットワーク(シグモイドとReLUをアクティベーションに用いたネットワーク)の各ニューロンはその決定境界として単一の超平面を持ち、従って線形分類を行う。したがって、Sigmoidal、ReLU、Swish、Mishの活性化機能を持つ単一ニューロンはXOR関数を学習できない。近年の研究では、振動活性化機能を有し、XOR機能を個別に学習できるヒト大脳皮質の2層と3層の生物学的ニューロンが発見された。生体ニューロンにおける振動活性化機能の存在は、生物学的ニューラルネットワークと人工神経ネットワークのパフォーマンスギャップを部分的に説明できるかもしれない。本稿では,個々のニューロンが手作業でxor機能を学習できる4つの新しい振動活性化機能を提案する。本稿では、発振活性化関数を用いて、ニューロンの少ない分類問題を解消し、トレーニング時間を短縮する可能性を検討する。

Nonlinear activation functions endow neural networks with the ability to learn complex high-dimensional functions. The choice of activation function is a crucial hyperparameter that determines the performance of deep neural networks. It significantly affects the gradient flow, speed of training and ultimately the representation power of the neural network. Saturating activation functions like sigmoids suffer from the vanishing gradient problem and cannot be used in deep neural networks. Universal approximation theorems guarantee that multilayer networks of sigmoids and ReLU can learn arbitrarily complex continuous functions to any accuracy. Despite the ability of multilayer neural networks to learn arbitrarily complex activation functions, each neuron in a conventional neural network (networks using sigmoids and ReLU-like activations) has a single hyperplane as its decision boundary and hence makes a linear classification. Thus single neurons with sigmoidal, ReLU, Swish, and Mish activation functions cannot learn the XOR function. Recent research has discovered biological neurons in layers two and three of the human cortex having oscillating activation functions and capable of individually learning the XOR function. The presence of oscillating activation functions in biological neurons might partially explain the performance gap between biological and artificial neural networks. This paper proposes 4 new oscillating activation functions which enable individual neurons to learn the XOR function without manual feature engineering. The paper explores the possibility of using oscillating activation functions to solve classification problems with fewer neurons and reduce training time.
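
The statement that a single neuron with an oscillating activation can represent XOR can be checked directly. The sketch below is only an illustration under stated assumptions: it uses a plain sine as the oscillating activation and hand-picked weights (both hypothetical choices, not one of the four functions proposed in the paper, and nothing is trained). The pre-activation z = w(x1 + x2) takes the values 0, π/2 and π on the four XOR inputs, and the sine is large only at the middle value.

```python
import math

def oscillating_neuron(x1, x2, w=math.pi / 2, b=0.0):
    """Single neuron with a sinusoidal activation (illustrative choice)."""
    z = w * x1 + w * x2 + b
    return math.sin(z)

def predict(x1, x2, threshold=0.5):
    """Binary decision obtained by thresholding the neuron output."""
    return int(oscillating_neuron(x1, x2) > threshold)

xor_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
predictions = [predict(a, b) for a, b in xor_inputs]
```

A monotonic activation cannot do this, because thresholding it always yields a single hyperplane in input space, whereas the sine crosses the threshold twice along the direction x1 + x2 and so produces two parallel decision boundaries.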

翻訳日:2023-03-08 22:32:25 公開日:2022-01-11

# NUTブラックホールのスカラーおよびディラック摂動のグレイボディ放射

Greybody Radiation of scalar and Dirac perturbations of NUT Black Holes ( http://arxiv.org/abs/2111.15005v2 )

ライセンス: Link先を確認

Ahmad Al-Badawi, Sara Kanzi, and İzzet Sakallı

(参考訳) スピノリアル波動方程式、すなわちディラック方程式とクライン・ゴルドン方程式、およびナットブラックホール時空におけるグレイボディ放射を考える。この目的のために、NUT時空におけるディラック方程式をNewman-Penrose (NP)フォーマリズムにおけるヌルテトラッドを用いて初めて研究する。次に、ディラック方程式をラジアル集合と角集合に分離する。角集合は関連するルジャンドル関数の観点で解かれる。放射状集合では、分離された放射状波動方程式を求め、有効ポテンシャルとともに1次元Schrödinger波動方程式を導出する。次に, 物理的に許容される領域において, 半径距離の関数としてプロットすることでポテンシャルを議論する。また、Klein-Gordon方程式を用いて、ボソンおよびフェルミオンのグレーボディ因子(GF)を計算する。 NUT時空のGFに対するNUTパラメータの影響を詳細に検討した。

We consider the spinorial wave equations, namely the Dirac and the Klein-Gordon equations, and greybody radiation in the NUT black hole spacetime. To this end, we first study the Dirac equation in NUT spacetime by using a null tetrad in the Newman-Penrose (NP) formalism. Next, we separate the Dirac equation into radial and angular sets. The angular set is solved in terms of associated Legendre functions. With the radial set, we obtain the decoupled radial wave equations and derive the one-dimensional Schrödinger wave equations together with effective potentials. Then, we discuss the potentials by plotting them as a function of radial distance in a physically acceptable region. We also study the Klein-Gordon equation to compute the greybody factors (GFs) for both bosons and fermions. The influence of the NUT parameter on the GFs of the NUT spacetime is investigated in detail.

翻訳日:2023-03-06 09:04:15 公開日:2022-01-11

# スピンチェーン交換ハミルトニアンの高速熱分解

Rapid thermalization of spin chain commuting Hamiltonians ( http://arxiv.org/abs/2112.00593v2 )

ライセンス: Link先を確認

Ivan Bardet, Ángela Capel, Li Gao, Angelo Lucia, David Pérez-García and Cambyse Rouzé

(参考訳) スピン鎖と大きな熱浴との弱い結合は、有限範囲の翻訳不変の通勤ハミルトニアンの任意の温度で急速に熱し、システムサイズと対数的にスケールする時間で平衡に達することを証明した。我々の主な結果は、古典的なスピン鎖に対するホリーとストロックの半次結果の量子的設定への一般化であり、スペクトルギャップの非閉性に基づく境界に対する指数関数的な改善を表している。物理的観点からは、この結果は変換不変スピン鎖上のデイビス進化に対する散逸相転移の欠如を厳密に立証する。この結果は、進化が位相の対称性を尊重する対称性保護位相(Symmetry Protected Topological phases)にも適用される。これは多体および非平衡量子系の研究に広く応用されている。

We prove that spin chains weakly coupled to a large heat bath thermalize rapidly at any temperature for finite-range, translation-invariant commuting Hamiltonians, reaching equilibrium in a time which scales logarithmically with the system size. Our main result is a generalization to the quantum setting of a seminal result of Holley and Stroock for classical spin chains and represents an exponential improvement over bounds based on the non-closure of the spectral gap. From a physical point of view, our result rigorously establishes the absence of dissipative phase transition for Davies evolutions over translation-invariant spin chains. The result also applies in the case of Symmetry Protected Topological phases where the evolution respects the symmetry of the phase. This has wide-ranging applications to the study of many-body quantum systems both in and out of equilibrium.

翻訳日:2023-03-06 04:43:18 公開日:2022-01-11

# $\alpha-z$ Bures-Wasserstein 量子分岐に対する正しい平均

Right mean for the $\alpha-z$ Bures-Wasserstein quantum divergence ( http://arxiv.org/abs/2201.03732v1 )

ライセンス: Link先を確認

Miran Jeong, Jinmi Hwang, Sejong Kim

(参考訳) $\alpha-z$ Renyi相対エントロピーから誘導される新しい量子分岐は、最近、$\alpha-z$ Bures-Wasserstein 量子分岐と呼ばれる。本稿では,各点に対する$\alpha-z$ bures-wasserstein量子発散の重み付き和の一意的最小化である右平均の性質について検討する。カルタン平均を含む行列パワー平均を持つ正しい平均の多くの興味深い作用素不等式が提示される。さらに、ワッサースタイン平均とのトレース不等式を検証し、2つの右平均のハダマール積の境界を与える。

A new quantum divergence induced from the $\alpha-z$ Rényi relative entropy, called the $\alpha-z$ Bures-Wasserstein quantum divergence, has been recently introduced. We investigate in this paper properties of the right mean, which is a unique minimizer of the weighted sum of $\alpha-z$ Bures-Wasserstein quantum divergences to each point. Many interesting operator inequalities of the right mean with the matrix power mean including the Cartan mean are presented. Moreover, we verify the trace inequality with the Wasserstein mean and provide bounds for the Hadamard product of two right means.

翻訳日:2023-03-01 13:08:51 公開日:2022-01-11

# 3量子状態の合成

Preparation of 3-qubit states ( http://arxiv.org/abs/2201.03724v1 )

ライセンス: Link先を確認

Oscar Perdomo, Nelson Castaneda and Roger Vogeler

(参考訳) すべての振幅が実数であれば、純粋なqubit状態を実数と呼ぶ。任意の実3量子ビット状態は、$R_y(\theta)$ ゲートと高々4つの制御$Z$ゲートを用いて準備できることを示し、4 が最適であると予想する。また、Znidaric, Giraud, Georgeot による2008年のアルゴリズムとは異なる、ローカルゲートと高々3つの制御$Z$ゲートを使って任意の3量子ビット状態を準備するアルゴリズムも提示する。私たちのメソッドが2および3量子状態に対してどのように動作するかを示すビデオは、https://youtu.be/LIdYSs-rE-oとhttps://youtu.be/Kne0Vq7gyzQで見ることができる。

We will call a pure qubit state real if all its amplitudes are real numbers. We show that any real 3-qubit state can be prepared using $R_y(\theta)$ gates and at most four controlled-$Z$ gates, and we conjecture that four is optimal. We also present an algorithm -- different from the 2008 algorithm given by Znidaric, Giraud and Georgeot -- that prepares any 3-qubit state using local gates and at most three controlled-$Z$ gates. Videos showing how our method works for two- and three-qubit states can be found at https://youtu.be/LIdYSs-rE-o and https://youtu.be/Kne0Vq7gyzQ
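
The restriction to $R_y(\theta)$ and controlled-$Z$ gates fits real states naturally because both gate matrices are real, so any circuit built from them maps real amplitudes to real amplitudes. The sketch below checks this numerically for three qubits. It is only an illustration: the gate order and the angles are arbitrary assumptions and do not reproduce the authors' preparation algorithm or its gate count.

```python
import numpy as np

def ry(theta):
    """Single-qubit rotation about the y axis; its matrix is real."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

I2 = np.eye(2)
CZ = np.diag([1.0, 1.0, 1.0, -1.0])   # controlled-Z on two qubits

def on_qubit(gate, k):
    """Embed a single-qubit gate on qubit k (0, 1 or 2) of three qubits."""
    ops = [I2, I2, I2]
    ops[k] = gate
    return np.kron(np.kron(ops[0], ops[1]), ops[2])

cz_01 = np.kron(CZ, I2)   # CZ acting on qubits 0 and 1
cz_12 = np.kron(I2, CZ)   # CZ acting on qubits 1 and 2

state = np.zeros(8)
state[0] = 1.0            # start from |000>

# Arbitrary example circuit (angles are hypothetical illustration values).
circuit = [on_qubit(ry(0.7), 0), on_qubit(ry(1.1), 1), cz_01,
           on_qubit(ry(0.4), 2), cz_12, on_qubit(ry(2.3), 1)]
for gate in circuit:
    state = gate @ state

norm = float(np.linalg.norm(state))
```

The resulting `state` is a real, normalized vector of eight amplitudes; reaching every real 3-qubit state with at most four controlled-$Z$ gates is the nontrivial part established in the paper.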

翻訳日:2023-03-01 13:08:41 公開日:2022-01-11

# アームチェアグラフェンナノリボンの多光子吸収とラビ振動

Multiphoton absorption and Rabi oscillations in armchair graphene nanoribbons ( http://arxiv.org/abs/2201.03896v1 )

ライセンス: Link先を確認

B.S. Monozon and P. Schmelcher

(参考訳) 本稿では,リボン軸に平行な光波によって誘起される時間振動する強電界の存在下でのアルチェエアグラフェンナノリボン(agnr)の多光子吸収とラビ振動の問題に対する解析的アプローチを提案する。リボン閉じ込めを受ける無質量電子に対する2次元ディラック方程式を用いる。価値と導電サイズ量子化サブバンド間の電子遷移に関連する電子-ホール対生成率の共振近似では、対応する多光子吸収係数とラビ振動の周波数が明示的な形で得られる。以上の量とリボン幅および電界強度の依存性を追尾し, 時間振動と実質的に一定な電界に関係した多光子アシストレジームとトンネルレジームの両方について検討した。サブバンド間遷移に対する電場の振動特性の顕著な増強効果に遭遇する。解析結果はグラフェン層で得られたものと数値計算により定性的に一致している。典型的なAGNRとレーザーパラメータの予測実験値は、ラボでラビ振動と多光子吸収の両方がアクセス可能であることを示している。サブバンド間トンネルに関連するデータは、agnrを外部実験室電場を印加することにより量子電磁気真空崩壊を検出できる1次元凝縮物アナログとする。

We present an analytical approach to the problem of the multiphoton absorption and Rabi oscillations in an armchair graphene nanoribbon (AGNR) in the presence of a time-oscillating strong electric field induced by a light wave directed parallel to the ribbon axis. The two-dimensional Dirac equation for the massless electron subject to the ribbon confinement is employed. In the resonant approximation the electron-hole pair production rate, associated with the electron transitions between the valence and conduction size-quantized subbands, the corresponding multiphoton absorption coefficient and the frequency of the Rabi oscillations are obtained in an explicit form. We trace the dependencies of the above quantities on the ribbon width and electric field strength for both the multiphoton assisted and tunneling regimes relevant to the time-oscillating and practically constant electric field, respectively. A significant enhancement effect of the oscillating character of the electric field on the intersubband transitions is encountered. Our analytical results are in qualitative agreement with those obtained for the graphene layer by numerical methods. Estimates of the expected experimental values for the typically employed AGNR and laser parameters show that both the Rabi oscillations and multiphoton absorption are accessible in the laboratory. The data relevant to the intersubband tunneling makes the AGNR a 1D condensed matter analog in which the quantum electrodynamic vacuum decay can be detected by applying an external laboratory electric field.

翻訳日:2023-03-01 13:05:34 公開日:2022-01-11

# ランクアグリゲーションのヒューリスティック検索とラベルランキングへの応用

Heuristic Search for Rank Aggregation with Application to Label Ranking ( http://arxiv.org/abs/2201.03893v1 )

ライセンス: Link先を確認

Yangming Zhou and Jin-Kao Hao and Zhen Li and Fred Glover

(参考訳) ランクアグリゲーションは、異なる有権者の選択肢の選好ランキングを単一のコンセンサスランキングに統合することを目的としている。しかし、様々な実用的応用のための有用なモデルとして、計算的に難しい問題である。本稿では,完全ランキングと部分ランキングの両方を用いて,ランクアグリゲーション問題を解くための効果的なハイブリッド進化的ランキングアルゴリズムを提案する。このアルゴリズムは、コンコーダントペアに基づくセマンティッククロスオーバーと、効率的な漸進的評価手法によって強化された遅延受容局所探索を特徴とする。アルゴリズムを評価するために実験を行い、最先端のアルゴリズムと比較してベンチマークインスタンスで高い競合性を示す。その実用性を示すために、このアルゴリズムは重要な機械学習タスクであるラベルランキングに適用される。

Rank aggregation aims to combine the preference rankings of a number of alternatives from different voters into a single consensus ranking. As a useful model for a variety of practical applications, however, it is a computationally challenging problem. In this paper, we propose an effective hybrid evolutionary ranking algorithm to solve the rank aggregation problem with both complete and partial rankings. The algorithm features a semantic crossover based on concordant pairs and a late acceptance local search reinforced by an efficient incremental evaluation technique. Experiments are conducted to assess the algorithm, indicating a highly competitive performance on benchmark instances compared with state-of-the-art algorithms. To demonstrate its practical usefulness, the algorithm is applied to label ranking, which is an important machine learning task.
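
To make the problem statement concrete, the sketch below scores a candidate consensus by the total number of pairwise disagreements with the voters (the Kendall tau distance, a common formalisation that is assumed here rather than taken from the abstract) and finds the best ranking by exhaustive search. This is not the hybrid evolutionary algorithm of the paper; brute force is shown precisely because its factorial cost is what makes heuristic search necessary for realistic sizes. The votes are hypothetical.

```python
from itertools import combinations, permutations

def kendall_tau_distance(r1, r2):
    """Number of item pairs that the two complete rankings order differently."""
    pos1 = {item: i for i, item in enumerate(r1)}
    pos2 = {item: i for i, item in enumerate(r2)}
    return sum(
        1
        for a, b in combinations(r1, 2)
        if (pos1[a] > pos1[b]) != (pos2[a] > pos2[b])
    )

def total_distance(candidate, rankings):
    """Sum of distances from a candidate consensus to every voter."""
    return sum(kendall_tau_distance(candidate, r) for r in rankings)

def consensus_by_brute_force(rankings):
    """Exhaustive search over all permutations; feasible only for a few items."""
    items = rankings[0]
    return min(permutations(items), key=lambda cand: total_distance(cand, rankings))

votes = [("a", "b", "c", "d"), ("a", "c", "b", "d"), ("b", "a", "c", "d")]
consensus = consensus_by_brute_force(votes)
```

Pairs ordered the same way by two rankings are the concordant pairs mentioned in the abstract, so minimizing the distance above is equivalent to maximizing concordance with the voters.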

翻訳日:2023-03-01 13:05:14 公開日:2022-01-11

# 自然パラメトリックダウン変換の古典的モデル

A classical model of spontaneous parametric down-conversion ( http://arxiv.org/abs/2201.03842v1 )

ライセンス: Link先を確認

Girish Kulkarni, Jeremy Rioux, Boris Braverman, Maria V. Chekhova, and Robert W. Boyd

(参考訳) 我々は,自然パラメトリックダウンコンバージョン(SPDC)を,ポンプ場の古典的差分周波数生成(DFG)および仮想確率的「真空」シードフィールドとしてモデル化した。 DFGプロセスから発生するフィールドの2次時空間相関がSPDCから信号フィールドの2次時空間相関を再現することを示した。特に、低利得の場合、このモデルは信号光子の密度行列の量子計算と一致し、高利得の場合、モデルの予測は、ポンプ強度を上げるためのspd場の遠方磁場強度プロファイル、軌道角運動量スペクトル、波長スペクトルの実験的測定とよく一致している。さらに、モデルが二階のSU(1,1)干渉を捕捉し、両方の利得状態におけるコヒーレンス効果を誘導することを示す。興味深いことに、このモデルはまた、低利得状態におけるオブジェクト透過性による干渉視認性の線形スケーリングを正しく予測している。本モデルは,spdcと誘導コヒーレンスという文脈における古典量子分断に関する新たな基礎的洞察をもたらすだけでなく,spdに基づく多数の実験や応用のための有用な理論的ツールとなる。

We model spontaneous parametric down-conversion (SPDC) as classical difference frequency generation (DFG) of the pump field and a hypothetical stochastic "vacuum" seed field. We analytically show that the second-order spatiotemporal correlations of the field generated from the DFG process replicate those of the signal field from SPDC. Specifically, for low gain, the model is consistent with the quantum calculation of the signal photon's reduced density matrix; and for high gain, the model's predictions are in good agreement with our experimental measurements of the far-field intensity profile, orbital angular momentum spectrum, and wavelength spectrum of the SPDC field for increasing pump strengths. We further theoretically show that the model successfully captures second-order SU(1,1) interference and induced coherence effects in both gain regimes. Intriguingly, the model also correctly predicts the linear scaling of the interference visibility with object transmittance in the low-gain regime -- a feature that is often regarded as a quintessential signature of the nonclassicality of induced coherence. Our model may not only lead to novel fundamental insights into the classical-quantum divide in the context of SPDC and induced coherence, but can also be a useful theoretical tool for numerous experiments and applications based on SPDC.

翻訳日:2023-03-01 13:04:44 公開日:2022-01-11

# 原子-光ハイブリッド干渉計における非対称利得最適化によるセンシング性能向上

Sensing performance enhancement via asymmetric gain optimization in the atom-light hybrid interferometer ( http://arxiv.org/abs/2201.03818v1 )

ライセンス: Link先を確認

Zhifei Yu, Bo Fang, Shuying Chen, Pan Liu, Guzhi Bao, Chun-hua Yuan, and L. Q. Chen

(参考訳) SU(1,1)型原子-光ハイブリッド干渉計(SALHI)は、光学相と原子相の両方に敏感な干渉計の一種である。しかし、この損失は実用上避けられない問題であり、干渉計の使用を大幅に制限している。可視性は干渉計のセンシング性能を評価する重要なパラメータである。そこで本研究では,非対称ゲイン最適化によりSALHIの可視性に対する損失の影響が軽減されることを実験的に示し,可視性を$100\%$近くに保てる損失の最大閾値を引き上げる。さらに,最大可視性に対する最適条件は,強度検出を用いた場合に損失の存在下で信号対雑音比(SNR)を最良値に向上させる条件と同一であり,可視性がSNR改善のための実験的運用基準として有効であることを示す。干渉可視性の向上はSNR増強の達成を意味する。本研究は,SALHIをレーダーおよび測距に応用するための重要な基礎となる。

The SU(1,1)-type atom-light hybrid interferometer (SALHI) is a kind of interferometer that is sensitive to both the optical phase and atomic phase. However, the loss has been an unavoidable problem in practical applications and greatly limits the use of interferometers. Visibility is an important parameter to evaluate the sensing performance of interferometers. Here, we experimentally demonstrate the mitigating effect of the loss on visibility of the SALHI via asymmetric gain optimization, where the maximum threshold of loss to visibility close to $100\%$ is increased. Furthermore, we theoretically find that the optimal condition for the largest visibility is the same as that for the enhancement of signal-to-noise ratio (SNR) to the best value in the presence of losses using the intensity detection, indicating that the visibility can act as an experimental operational criterion for SNR improvement in practical applications. Improvement of the interference visibility means achievement of SNR enhancement. Our results provide a significant foundation for practical application of the SALHI in radar and ranging measurements.

翻訳日:2023-03-01 13:04:21 公開日:2022-01-11

# カルデイラ・レゲット形式論における未探究のデコヒーレンスについて:到着時間分布、同一粒子および時間内の回折

On some unexplored decoherence aspects in the Caldeira-Leggett formalism: arrival time distributions, identical particles and diffraction in time ( http://arxiv.org/abs/2201.03778v1 )

ライセンス: Link先を確認

S. V. Mousavi and S. Miret-Artes

(参考訳) カルデイラ・レゲット・マスター方程式における未検討のデコヒーレンスについて解析・考察した。デコヒーレンス過程は、緩和速度または摩擦と温度という2つの環境パラメータによって制御され、量子状態から古典状態へ徐々に遷移する。時間依存干渉パターンによるデコヒーレンス過程において, 時間分布, 非最小不確かさ生成物, 拡張ガウス波パケット, 同一粒子および回折は, 興味深い特徴を示す。定力場の存在がデコヒーレンスに影響を与えないこと, ストレッチパラメータの正値がデコヒーレンス率を減少させること, 同一粒子に対する波動関数の対称性がオープンダイナミクスを考慮した場合, 時間と空間の回折は, いわゆる量子シャッター問題におけるゼロ散逸限界における温度および/または緩和速度を増大させることによって徐々に洗い出されることを示す。

Some unexplored decoherence aspects within the Caldeira-Leggett master equation are analyzed and discussed. The decoherence process is controlled by the two environment parameters, the relaxation rate or friction and the temperature, leading to a gradual transition from the quantum to classical regime. Arrival time distributions, nonminimum-uncertainty-product or stretching Gaussian wave packets, identical particles and diffraction in time display interesting features during the decoherence process undergone by the time dependent interference patterns. We show that the presence of a constant force field does not affect the decoherence; positive values of the stretching parameter reduce the rate of decoherence; the symmetry of the wave function for identical particles plays no role when open dynamics are considered; and diffraction in time and space is gradually washed out by increasing the temperature and/or relaxation rate in the zero dissipation limit within the so-called quantum shutter problem.

翻訳日:2023-03-01 13:04:00 公開日:2022-01-11

# ナノファイバー空洞量子力学系における高次例外点

High-order exceptional point in a nanofiber cavity quantum electrodynamics system ( http://arxiv.org/abs/2201.03768v1 )

ライセンス: Link先を確認

Zigeng Li and Xiaomiao Li and Xiaolan Zhong

(参考訳) 本稿では,2レベルエミッタとナノファイバーキャビティからなる全繊維エミッタキャビティ量子電磁力学(QED)システムを提案する。本手法により,エミッタとナノファイバーキャビティの結合に基づく高次例外点の観測が可能となった。このキャビティの有効な利得は、2つの同一のレーザー場を介してナノファイバーキャビティに弱い駆動によって得られ、実験の実行においてコヒーレント完全吸収(CPA)を実現する。実験可能なパラメータの下では、このシステムのハミルトニアンは擬エルミティティ(英語版)の状態にあり、その固有値は1つの実と1つの複素共役からなるか、あるいは全て実となる。 2つのエミッタ-キャビティ結合強度の比とエミッタの崩壊率の比を制御的に調整することにより、エミッタ-キャビティ系におけるパリティ時間対称性のない3階例外点(EP3)と2階例外点(EP2)の両方を発見することができる。これらの結果は、全出力スペクトルと透過スペクトルによっても示される。また,ep3点において結合強度が臨界結合強度よりも大きい場合,対称モードが発生することがわかった。本稿では,高次例外点を実現する新しい手法を提案する。

We present an all-fiber emitter-cavity quantum electrodynamics (QED) system which consists of two two-level emitters and a nanofiber cavity. Our scheme makes it possible to observe higher-order exceptional points based on the coupling between the emitters and the nanofiber cavity. The effective gain of this cavity can be obtained by weakly driving the nanofiber cavity via two identical laser fields, which realizes coherent perfect absorption (CPA) in the implementation of the experiments. Under experimentally feasible parameters, the Hamiltonian of this system satisfies the condition of pseudo-Hermiticity, which means that its eigenvalues either consist of one real value and a pair of complex conjugates or are all real. By controllably tuning the ratio of the two emitter-cavity coupling strengths and the ratio of the decay rates of the emitters, we can find both the third-order exceptional point (EP3) and the second-order exceptional point (EP2) without parity-time symmetry in our emitter-cavity system. These results can also be demonstrated by the total output spectra and transmission spectra. We also find that the symmetric modes come into being when the coupling strength is greater than the critical coupling strength at the EP3 points. Our proposal will provide a new method to realize higher-order exceptional points.
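
For readers unfamiliar with exceptional points, the coalescence of eigenvalues can be seen in a minimal two-mode example. The sketch below is a generic textbook-style model and not the three-mode emitter-cavity Hamiltonian of the paper: it assumes two coupled modes with hypothetical decay rates `gamma1` and `gamma2` and coupling `g`, for which the eigenvalues are $-i(\gamma_1+\gamma_2)/2 \pm \sqrt{g^2 - ((\gamma_1-\gamma_2)/2)^2}$ and merge into a second-order exceptional point at $g = |\gamma_1-\gamma_2|/2$. Like the system in the abstract, this passive model has no parity-time symmetry.

```python
import numpy as np

GAMMA1, GAMMA2 = 1.0, 0.2   # hypothetical decay rates of the two modes

def eigenvalues(g, gamma1=GAMMA1, gamma2=GAMMA2):
    """Eigenvalues of a generic two-mode non-Hermitian Hamiltonian."""
    h = np.array([[-1j * gamma1, g], [g, -1j * gamma2]])
    return np.linalg.eigvals(h)

def splitting(g):
    """Magnitude of the difference between the two eigenvalues."""
    ev = eigenvalues(g)
    return float(abs(ev[0] - ev[1]))

g_ep = abs(GAMMA1 - GAMMA2) / 2        # coupling at which the modes coalesce
split_below = splitting(0.5 * g_ep)    # eigenvalues differ in their imaginary parts
split_at_ep = splitting(g_ep)          # eigenvalues (and eigenvectors) coalesce
split_above = splitting(2.0 * g_ep)    # eigenvalues differ in their real parts
```

At an exceptional point of order three, as targeted in the paper, three eigenvalues and their eigenvectors coalesce simultaneously, which requires tuning more parameters than in this two-mode illustration.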

翻訳日:2023-03-01 13:03:34 公開日:2022-01-11

# 密度汎関数理論を一電子還元密度行列関数理論に変換して静的相関を捉える

Density Functional Theory Transformed into a One-electron Reduced Density Matrix Functional Theory for the Capture of Static Correlation ( http://arxiv.org/abs/2201.03736v1 )

ライセンス: Link先を確認

Daniel Gibney, Jan-Niklas Boyn and David A. Mazziotti

(参考訳) 現代の計算化学において最も広く採用されている密度汎関数論(DFT)は、強い相関系の電子構造を正確に記述することができない。ここでは、DFTを1電子還元密度行列(1-RDM)関数理論にフォーマルかつ実用的に変換できることを示す。運動エネルギー項における1-RDMのイデオロポシシ制限の緩和に加えて、DFTの密度に基づく交換相関関数に2次1-RDMに基づく項を追加する。我々のアプローチは、DFTの計算スケール$O(r^{3})$の2次半定値プログラミングによって実装され、シングルトビラディカルや結合解離のような化学構造やプロセスにおける静的相関の記述において、従来のDFTよりも大幅に改善されている。

Density functional theory (DFT), the most widely adopted method in modern computational chemistry, fails to describe accurately the electronic structure of strongly correlated systems. Here we show that DFT can be formally and practically transformed into a one-electron reduced-density-matrix (1-RDM) functional theory, which can address the limitations of DFT while retaining favorable computational scaling compared to wavefunction-based approaches. In addition to relaxing the idempotency restriction on the 1-RDM in the kinetic energy term, we add a quadratic 1-RDM-based term to DFT's density-based exchange-correlation functional. Our approach, which we implement by quadratic semidefinite programming at DFT's computational scaling of $O(r^{3})$, yields substantial improvements over traditional DFT in the description of static correlation in chemical structures and processes such as singlet biradicals and bond dissociations.

翻訳日:2023-03-01 13:02:57 公開日:2022-01-11

# モードパラメータ推定における量子限界

Quantum Limits on Mode Parameter Estimation ( http://arxiv.org/abs/2201.04050v1 )

ライセンス: Link先を確認

Manuel Gessner, Nicolas Treps, and Claude Fabre

(参考訳) パラメータ非依存の量子状態によって占有されるモードの時空間構造を変更する「モードパラメータ」の推定における究極の量子限界を決定する。純粋あるいは混合、ガウス的あるいは非ガウス的といった任意の多モード状態に対して有界な量子 Cram\'{e}r-Rao に対する解析的表現は、非古典的状態によって達成される可能性のある量子精度拡張の起源を明らかにする。推定誤差のスケーリングの改善は、特定のモードが集約され測定された場合にのみ可能である。その結果,広帯域の時空間モードパラメータと超解像画像に対する量子エンハンスド推定手法の同定が可能となった。

We determine the ultimate quantum limits on the estimation of a "mode parameter" that modifies the spatiotemporal structure of the modes occupied by a parameter-independent quantum state. Our analytical expression for the quantum Cramér-Rao bound for arbitrary multimode states, pure or mixed, Gaussian or non-Gaussian, reveals the origin of quantum precision enhancements that may be achieved with nonclassical states. Improved scaling of the estimation error is possible only if specific modes are populated and measured. Our results allow us to identify quantum-enhanced estimation strategies for a wide range of spatio-temporal mode parameters and in superresolution imaging.

翻訳日:2023-03-01 12:55:36 公開日:2022-01-11

# rydberg原子配列における量子スピン液体の動的合成

Dynamical preparation of quantum spin liquids in Rydberg atom arrays ( http://arxiv.org/abs/2201.04034v1 )

ライセンス: Link先を確認

Giuliano Giudici, Mikhail D. Lukin, Hannes Pichler

(参考訳) ライドバーグ原子配列に基づくプログラム可能な量子シミュレータを用いて,最近のスピン液体の開始を示す実験(semeghini et al., science 374, 1242 (2021))を理論的に解析した。実験では, 準断熱的状態準備プロトコルを用いて調製した平衡外状態において, トポロジカル秩序のロバストなシグネチャが出現する。状態準備プロトコルは、原子数と線形にスケールする時間において、位相位相の固定点(硬二量体の共鳴価結合(RVB)状態)を目標に最適化できることを理論的に示す。さらに, テンソルネットワーク(TN)状態の2パラメータ変動多様体について, 合成過程の多体ダイナミクスを正確に記述する。このアプローチを用いて,非平衡状態の性質を解析し,位相秩序の出現を明らかにした。

We theoretically analyze recent experiments [G. Semeghini et al., Science 374, 1242 (2021)] demonstrating the onset of a topological spin liquid using a programmable quantum simulator based on Rydberg atom arrays. In the experiment, robust signatures of topological order emerge in out-of-equilibrium states that are prepared using a quasi-adiabatic state preparation protocol. We show theoretically that the state preparation protocol can be optimized to target the fixed point of the topological phase -- the resonating valence bond (RVB) state of hard dimers -- in a time that scales linearly with the number of atoms. Moreover, we provide a two-parameter variational manifold of tensor network (TN) states that accurately describe the many-body dynamics of the preparation process. Using this approach we analyze the nature of the non-equilibrium state, establishing the emergence of topological order.

翻訳日:2023-03-01 12:55:23 公開日:2022-01-11

# 非エルミート準結晶の動的局在

Dynamical localization in non-Hermitian quasi-crystals ( http://arxiv.org/abs/2201.04028v1 )

ライセンス: Link先を確認

C. M. Dai, Yunbo Zhang, and Xuexi Yi

(参考訳) 片方向の2ステップ駆動は一様コヒーレントトンネルと非共分散オンサイトゲインと損失により構成される周期的に駆動される1次元非エルミタン格子の局所化遷移について検討した。複雑なポテンシャルの駆動周波数と位相シフトに応じて, システムは局所化, 非局在化, 混合相にすることができる。システムの2つの臨界駆動周波数を特定し、最初の1つは複素ポテンシャルの最大位相シフトに対応し、準エネルギースペクトルが依然として存在し、全ての状態が拡張され、もう1つは完全な実スペクトルの消失に対応し、非常に弱い複素ポテンシャルは、駆動周波数がこの臨界周波数より低い場合に局所状態の出現に繋がる。高周波の極限において、実スペクトルと複素スペクトルの2つの領域を分離する臨界位相シフトは、有効非エルミート・ハミルトニアンによって捉えることができる定数値に傾向する。

We study the localization transition in periodically driven one-dimensional non-Hermitian lattices where the piecewise two-step drive consists of uniform coherent tunneling and incommensurate onsite gain and loss. We find that the system can be in a localized, delocalized, or mixed phase depending on the driving frequency and the phase shift of the complex potential. Two critical driving frequencies of the system are identified. The first corresponds to the largest phase shift of the complex potential for which the quasi-energy spectrum is still real and all the states are extended. The second corresponds to the disappearance of the fully real spectrum, and a very weak complex potential leads to the emergence of localized states when the driving frequency is lower than this critical frequency. In the high-frequency limit, we find that the critical phase shift separating the two regions with real and complex spectra, respectively, tends to a constant value that can be captured by an effective non-Hermitian Hamiltonian.

翻訳日:2023-03-01 12:55:06 公開日:2022-01-11

# 定電界におけるガウス相関のシュウィンガー効果

Schwinger effect of Gaussian correlations in constant electric fields ( http://arxiv.org/abs/2201.04001v1 )

ライセンス: Link先を確認

Shu-Min Wu, Hao-Sheng Zeng

(参考訳) 我々はアリスとボブが共有した連続可変2モード圧縮状態のガウス相関(量子エンタングルメント、不協和および相互情報)のシュウィンガー効果について検討し、フェルミオン-フェルミオンモードとクビット-ボゾン場間の相関関係のシュウィンガー効果の差に特に注意を払う。また,シュウィンガー効果下での相関関係の再分配と保守性についても検討した。

We study the Schwinger effect of Gaussian correlations (quantum entanglement, discord and mutual information) of the continuous-variable two-mode squeezed states shared by Alice and Bob, paying special attention to the difference of the Schwinger effect of correlations between modes of fermion-fermion and qubit-bosonic fields studied previously. We also study the redistribution and conservativeness of the correlations under the Schwinger effect.

翻訳日:2023-03-01 12:54:49 公開日:2022-01-11

# 負性ハミルトニアン:混合状態絡みの作用素的特徴

The Negativity Hamiltonian: An operator characterization of mixed-state entanglement ( http://arxiv.org/abs/2201.03989v1 )

ライセンス: Link先を確認

Sara Murciano, Vittorio Vitale, Marcello Dalmonte, Pasquale Calabrese

(参考訳) 量子多体系の基底状態の文脈において、空間の連結領域間の絡み合いの局所性は対応する絡み合いハミルトニアンの局所性に直接結びついている。本研究では,多体系の部分転置の対数を記述する(非エルミート的)実効ハミルトニアン作用素として,負性ハミルトニアンを導入する。これにより、二部的な純粋システムのパラダイムを超えて、絡み合いと演算子の局所性の間の接続に対処できる。この方向への第一歩として、フェルミオン共形場の理論と自由フェルミオン鎖に対する負性ハミルトニアンの構造について研究する:どちらの場合も、負性ハミルトニアンが半局所函数形式を仮定し、単純な函数関係によって捉えることを示す。

In the context of ground states of quantum many-body systems, the locality of entanglement between connected regions of space is directly tied to the locality of the corresponding entanglement Hamiltonian: the latter is dominated by local, few-body terms. In this work, we introduce the negativity Hamiltonian as the (non-Hermitian) effective Hamiltonian operator describing the logarithm of the partial transpose of a many-body system. This allows us to address the connection between entanglement and operator locality beyond the paradigm of bipartite pure systems. As a first step in this direction, we study the structure of the negativity Hamiltonian for fermionic conformal field theories and a free fermion chain: in both cases, we show that the negativity Hamiltonian assumes a quasi-local functional form that is captured by simple functional relations.

翻訳日:2023-03-01 12:54:38 公開日:2022-01-11

# 量子通信におけるPOVM測定の簡単な紹介

A Brief Introduction to POVM Measurement in Quantum Communications ( http://arxiv.org/abs/2201.07968v1 )

ライセンス: Link先を確認

Renzhi Yuan

(参考訳) 本稿では,量子通信におけるポジティブ演算値測定(POVM)について概説する。 Projection-Valued Measure(PVM)が最初に導入され、次にPOVM。 POVM と PVM の関係を論じ,実測における POVM の例を示す。本稿では,量子通信におけるPOVM測定について考察する。

This paper gives a brief introduction to the Positive-Operator Valued Measure (POVM) in quantum communications. The Projection-Valued Measure (PVM) is introduced first, followed by the POVM. The relation between POVM and PVM is discussed and an example of POVM in practical measurement is given. This paper provides some insight into POVM measurement for quantum communications.
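The difference between a POVM and a PVM can be made concrete with a small numerical check. The sketch below uses the textbook "trine" POVM on a single qubit, which is an assumption chosen for illustration and not necessarily the example used in the paper. All helper names are hypothetical. It checks that the three elements sum to the identity, computes Born-rule outcome probabilities for the state |0>, and confirms that the elements are not projectors.

```python
import math

def outer(v):
    """Return the 2x2 matrix |v><v| for a real 2-vector v."""
    return [[v[0] * v[0], v[0] * v[1]],
            [v[1] * v[0], v[1] * v[1]]]

def scale(c, m):
    """Multiply every entry of matrix m by the scalar c."""
    return [[c * x for x in row] for row in m]

def add(a, b):
    """Entrywise sum of two 2x2 matrices."""
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def trace(m):
    return m[0][0] + m[1][1]

def max_abs_diff(a, b):
    """Largest entrywise absolute difference between two 2x2 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))

# Trine POVM (illustrative assumption): E_k = (2/3) |psi_k><psi_k|,
# with three real qubit states separated by 60 degrees.
angles = [0.0, math.pi / 3, 2 * math.pi / 3]
povm = [scale(2 / 3, outer([math.cos(t), math.sin(t)])) for t in angles]

# Completeness: the elements must sum to the identity.
total = [[0.0, 0.0], [0.0, 0.0]]
for element in povm:
    total = add(total, element)

# Born rule p_k = Tr(E_k rho) for the pure state rho = |0><0|.
rho = outer([1.0, 0.0])
probs = [trace(matmul(element, rho)) for element in povm]

# A PVM element would satisfy E @ E == E; these POVM elements do not.
projector_gap = max_abs_diff(matmul(povm[0], povm[0]), povm[0])

print(total, probs, projector_gap)
```

For the state |0>, the three outcome probabilities are 2/3, 1/6 and 1/6. A PVM on a qubit can have at most two nonzero outcomes, so a three-outcome measurement like this one can only be described as a POVM.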

翻訳日:2023-03-01 12:47:09 公開日:2022-01-11

# ワクチン規制の強化とワクチンに対する公衆の態度: Google検索活動から何が学べるか?

Reinforcement of vaccine mandates and public attitudes towards vaccines: What can we learn from google search activity ? ( http://arxiv.org/abs/2201.06965v1 )

ライセンス: Link先を確認

Florian Cafiero (GEMASS), Jeremy Ward

(参考訳) 国際公衆衛生政策はますます強制免疫を優先している。ワクチン接種に対する短期的な影響が十分に文書化されている場合、ワクチンに対する公衆の態度に対する影響についてはほとんど考慮されていない。本稿では,過去10年で少なくとも1回のワクチン委任延長を経験した5カ国(オーストラリア,フランス,ドイツ,イタリア,セルビア)および2つの米国国家(カリフォルニア)のワクチンに関するGoogle検索について検討する。新たな委任統治の実施の効果は、それぞれの国や国家の状況に大きく依存していることが判明した。また,新規あるいは延長された委任状がワクチンに対する公衆の疑念を和らげる兆候はほとんどなかった。

International public health policies increasingly favor mandatory immunization. While its short-term effects on vaccine coverage are well documented, little consideration has been given to its effects on public attitudes towards vaccines. In this paper, we examine Google searches related to vaccines in five countries (Australia, France, Germany, Italy, Serbia) and two American states (California) which experienced at least one vaccine mandate extension in the past decade. We found that the effects of a new mandate implementation depend heavily on the context in each specific country or state. We also observed little indication that the passing of new or extended mandates attenuated public doubt towards vaccines.

翻訳日:2023-03-01 12:46:28 公開日:2022-01-11

# データマーケットプレースとそのビジネスモデルに関する調査

A Survey of Data Marketplaces and Their Business Models ( http://arxiv.org/abs/2201.04561v1 )

ライセンス: Link先を確認

Santiago Andrés Azcoitia and Nikolaos Laoutaris

(参考訳) 「データは、土地、インフラ、労働、資本のように、必要不可欠な生産要素になりつつある。これの一環として、さまざまな分野の無数のアプリケーションが、生産チェーンやビジネスプロセスにおいて重要な役割を担うモデルやアルゴリズムを供給するために、膨大な量の情報を必要とします。特定の機能の自動化から、データ駆動型組織における意思決定の促進に至るまで、サードパーティからのデータインプットを取得することのメリットはますます増えています。この要求に応えて、データ要求を適切なプロバイダと一致させ、情報の交換を容易にすることを目的として、新しいエンティティと新しいビジネスモデルが登場した。本稿では,インターネット上でデータ取引を行う企業の現状に関する包括的調査の結果と結論と,研究コミュニティによる新たなデータマーケットプレースの設計について述べる。

"Data" is becoming an indispensable production factor, just like land, infrastructure, labor or capital. As part of this, a myriad of applications in different sectors require huge amounts of information to feed models and algorithms responsible for critical roles in production chains and business processes. Tasks ranging from automating certain functions to facilitating decision-making in data-driven organizations increasingly benefit from acquiring data inputs from third parties. Responding to this demand, new entities and novel business models have appeared with the aim of matching such data requirements with the right providers and facilitating the exchange of information. In this paper, we present the results and conclusions of a comprehensive survey on the state of the art of entities trading data on the internet, as well as novel data marketplace designs from the research community.

翻訳日:2023-03-01 12:46:15 公開日:2022-01-11

# トランスモン量子ビットの自発的放出率計算のための全波法

Full-Wave Methodology to Compute the Spontaneous Emission Rate of a Transmon Qubit ( http://arxiv.org/abs/2201.04244v1 )

ライセンス: Link先を確認

Thomas E. Roth and Weng C. Chew

(参考訳) 自発的放出速度(ser)は量子ビット(qubit)の制御と非一貫性において重要な役割を果たすため、量子ビット(qubit)にとって重要なメリットの指標である。その結果、実用機器のSERを正確に特徴付けることは、量子情報処理装置の設計における重要なステップである。ここでは、超伝導回路の量子ビットの一種であるトランスモン量子ビットの実験的に人気のあるプラットフォームに焦点を当てる。これらの量子ビットのSERを理解することの重要性にもかかわらず、近似回路モデルを用いてしばしば決定される。設計過程における予測の精度を向上させるためには,実用システムの記述において最小の近似をすることができるフルウェーブ数値手法を用いることが望ましい。本稿では,最近開発されたトランスモン量子ビットを電磁環境に結合したフィールドベースで記述することで,これを実現する方法を示す。実験でよく特性化された文献と類似したデバイスに対してサーを計算し,モデルを検証する。さらに,単純化した集積素子回路と伝送線路モデルとの比較を行い,検討を行った。

The spontaneous emission rate (SER) is an important figure of merit for any quantum bit (qubit), as it can play a significant role in the control and decoherence of the qubit. As a result, accurately characterizing the SER for practical devices is an important step in the design of quantum information processing devices. Here, we specifically focus on the experimentally popular platform of a transmon qubit, which is a kind of superconducting circuit qubit. Despite the importance of understanding the SER of these qubits, it is often determined using approximate circuit models or is inferred from measurements on a fabricated device. To improve the accuracy of predictions in the design process, it is better to use full-wave numerical methods that can make a minimal number of approximations in the description of practical systems. In this work, we show how this can be done with a recently developed field-based description of transmon qubits coupled to an electromagnetic environment. We validate our model by computing the SER for devices similar to those found in the literature that have been well-characterized experimentally. We further cross-validate our results by comparing them to simplified lumped element circuit and transmission line models as appropriate.

翻訳日:2023-03-01 12:46:05 公開日:2022-01-11

# すべての量子混合物は

All quantum mixtures are proper ( http://arxiv.org/abs/2201.04143v1 )

ライセンス: Link先を確認

Leonardo Castellani

(参考訳) 固有かつ不適切な量子混合状態は観測可能な差を持たず、区別するべきではないと論じられている。これは量子力学に対する主観的なアプローチに影響を及ぼし、QMのリレーショナル解釈の主要な動機の1つを無効にする。

It is argued that proper and improper quantum mixed states have no observable differences, and hence should not be distinguished. This has implications for subjective approaches to quantum mechanics, and invalidates one of the main motivations for relational interpretations of QM.

翻訳日:2023-03-01 12:44:39 公開日:2022-01-11

# ガウス変調CV-QKDの性能に及ぼすサブシステム非理想性の影響

Influence of sub-system non-idealities on the performance of Gaussian modulated CV-QKD ( http://arxiv.org/abs/2202.01311v1 )

ライセンス: Link先を確認

R Muralekrishnan, Lakshmi Narayanan Venkatasubramani, Sameer Ahmad Mir and Deepa Venkitesh

(参考訳) 本稿では,ガウス変調CV-QKDシステムにおける数値モデリングとサブシステムの評価について,非理想的操作を取り入れた詳細な解析と関連する結果について述べる。

We present a detailed analysis of the numerical modelling and evaluation of sub-systems in a Gaussian modulated CV-QKD system, incorporating non-ideal operations, along with associated results.

翻訳日:2023-03-01 12:35:41 公開日:2022-01-11

# ランダムRNNとCNN:RGB-Dオブジェクトのマルチレベル解析とシーン認識を目指して

When CNNs Meet Random RNNs: Towards Multi-Level Analysis for RGB-D Object and Scene Recognition ( http://arxiv.org/abs/2004.12349v2 )

ライセンス: Link先を確認

Ali Caglayan and Nevrez Imamoglu and Ahmet Burak Can and Ryosuke Nakamura

(参考訳) オブジェクトとシーンを認識することは、イメージ理解において困難な2つの重要なタスクである。特に、これらのタスクの処理におけるrgb-dセンサーの使用は、視覚理解を改善するための重要な焦点となっている。一方、ニューラルネットワーク、特に畳み込みニューラルネットワーク(cnns)は広く普及し、手作りの機能を効果的なディープ機能に置き換えることで、多くの視覚タスクに応用されている。しかし、多層CNNモデルの深い特徴を効果的に活用する方法は、オープンな問題である。本稿では,オブジェクトおよびシーン認識タスクのための多モードRGB-D画像から識別的特徴表現を抽出する新しい2段階フレームワークを提案する。第1段階では、事前訓練されたcnnモデルがバックボーンとして採用され、複数のレベルで視覚的な特徴を抽出する。第2段階は、これらの特徴を再帰的ニューラルネットワーク(rnn)の完全ランダム構造を持つ高レベル表現にマップする。 CNNアクティベーションの高次元性に対応するため、RNNにおけるランダム性の概念を拡張したランダム重み付けプール方式が提案されている。マルチモーダル融合は、RGBと深度ストリームの個人認識信頼度(すなわちSVMスコア)に基づいて重みを計算し、ソフト投票方式によって実現されている。これにより、最終RGB-D分類性能において一貫したクラスラベル推定が得られる。広範囲な実験により、rnnステージの完全ランダム化構造がcnnの活性化を識別的固体機能にうまくエンコードしていることが確かめられた。人気の高いWashington RGB-D ObjectとSUN RGB-D Sceneデータセットの比較実験結果から,提案手法はオブジェクト認識タスクとシーン認識タスクの両方における最先端の手法と比較して,優れた性能,即時性能を実現していることが示された。コードはhttps://github.com/acaglayan/cnn_randrnnで入手できる。

Recognizing objects and scenes are two challenging but essential tasks in image understanding. In particular, the use of RGB-D sensors in handling these tasks has emerged as an important area of focus for better visual understanding. Meanwhile, deep neural networks, specifically convolutional neural networks (CNNs), have become widespread and have been applied to many visual tasks by replacing hand-crafted features with effective deep features. However, it is an open problem how to exploit deep features from a multi-layer CNN model effectively. In this paper, we propose a novel two-stage framework that extracts discriminative feature representations from multi-modal RGB-D images for object and scene recognition tasks. In the first stage, a pretrained CNN model is employed as a backbone to extract visual features at multiple levels. The second stage efficiently maps these features into high-level representations with a fully randomized structure of recursive neural networks (RNNs). To cope with the high dimensionality of CNN activations, a random weighted pooling scheme is proposed by extending the idea of randomness in RNNs. Multi-modal fusion is performed through a soft voting approach by computing weights based on individual recognition confidences (i.e. SVM scores) of RGB and depth streams separately. This produces consistent class label estimation in final RGB-D classification performance. Extensive experiments verify that the fully randomized structure in the RNN stage successfully encodes CNN activations into discriminative solid features. Comparative experimental results on the popular Washington RGB-D Object and SUN RGB-D Scene datasets show that the proposed approach achieves superior or on-par performance compared to state-of-the-art methods in both object and scene recognition tasks. Code is available at https://github.com/acaglayan/CNN_randRNN.
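The soft voting step described in the abstract can be sketched in a few lines. The abstract does not give the exact weighting formula, so the rule below (each stream is weighted by its own peak class probability) is an assumption made purely for illustration, and the function names and scores are hypothetical. The authors' repository linked above is the authoritative implementation.

```python
import math

def softmax(scores):
    """Turn raw per-class scores (e.g. SVM scores) into probabilities."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_vote(rgb_scores, depth_scores):
    """Fuse two streams, weighting each by its own confidence.

    Assumption: confidence is the largest class probability of a stream.
    Returns the fused class probabilities and the weight given to RGB.
    """
    p_rgb = softmax(rgb_scores)
    p_depth = softmax(depth_scores)
    c_rgb, c_depth = max(p_rgb), max(p_depth)
    w_rgb = c_rgb / (c_rgb + c_depth)
    w_depth = 1.0 - w_rgb
    fused = [w_rgb * a + w_depth * b for a, b in zip(p_rgb, p_depth)]
    return fused, w_rgb

# Hypothetical scores: the RGB stream is confident, the depth stream is not.
fused, w_rgb = soft_vote([2.0, 0.5, 0.1], [0.2, 0.3, 0.25])
predicted_class = max(range(len(fused)), key=fused.__getitem__)
print(fused, w_rgb, predicted_class)
```

In this toy case the confident RGB stream receives the larger weight, so the fused prediction follows it even though the depth stream weakly prefers another class.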

翻訳日:2022-12-09 13:35:55 公開日:2022-01-11

# MCMC出力の最適薄膜化

Optimal Thinning of MCMC Output ( http://arxiv.org/abs/2005.03952v5 )

ライセンス: Link先を確認

Marina Riabiz, Wilson Chen, Jon Cockayne, Pawel Swietach, Steven A. Niederer, Lester Mackey, Chris. J. Oates

(参考訳) マルコフ連鎖モンテカルロの出力の収束と圧縮を評価するためのヒューリスティックスの使用は、生成される経験的近似の観点からは、準最適である。典型的には、初期状態のいくつかは「燃え尽きる」とされ除去されるが、圧縮も必要であれば残りの鎖は「薄められる」。本稿では,実験分布から得られる近似が最適に近いようなサンプルパスから,固定濃度を持つ状態の部分集合を遡及的に選択する問題を考察する。重圧縮を必要とする問題に適合するカーネルSteinの差分最小化に基づく新しい手法を提案する。一般微分方程式に対するパラメータ推論の難解な文脈において, この手法の有効性を理論的に保証する。ソフトウェアは、Python、R、MATLABのStein Thinningパッケージで利用可能である。

The use of heuristics to assess the convergence and compress the output of Markov chain Monte Carlo can be sub-optimal in terms of the empirical approximations that are produced. Typically a number of the initial states are attributed to "burn in" and removed, whilst the remainder of the chain is "thinned" if compression is also required. In this paper we consider the problem of retrospectively selecting a subset of states, of fixed cardinality, from the sample path such that the approximation provided by their empirical distribution is close to optimal. A novel method is proposed, based on greedy minimisation of a kernel Stein discrepancy, that is suitable for problems where heavy compression is required. Theoretical results guarantee consistency of the method and its effectiveness is demonstrated in the challenging context of parameter inference for ordinary differential equations. Software is available in the Stein Thinning package in Python, R and MATLAB.
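The greedy selection idea can be illustrated in one dimension. The sketch below is not the Stein Thinning package. It assumes a standard normal target (score function -x), an inverse multiquadric base kernel, and a greedy rule that picks, at each step, the state minimising half its own Stein kernel value plus its Stein kernel sum with the states already chosen. All names are hypothetical, and the "chain" is a deliberately biased random sample standing in for real MCMC output.

```python
import random

def score(x):
    """Score d/dx log p(x) of the assumed target, a standard normal."""
    return -x

def stein_kernel(x, y):
    """Langevin Stein kernel built from the IMQ kernel (1 + (x - y)^2)^(-1/2)."""
    r = x - y
    q = 1.0 + r * r
    return (q ** -1.5
            - 3.0 * r * r * q ** -2.5
            + (score(x) - score(y)) * r * q ** -1.5
            + score(x) * score(y) * q ** -0.5)

def greedy_thin(samples, m):
    """Greedily pick m indices (repeats allowed) to reduce the kernel Stein discrepancy."""
    n = len(samples)
    # running[i] = k_p(x_i, x_i) / 2 + sum of k_p(x_selected, x_i) so far
    running = [0.5 * stein_kernel(x, x) for x in samples]
    chosen = []
    for _ in range(m):
        j = min(range(n), key=running.__getitem__)
        chosen.append(j)
        for i in range(n):
            running[i] += stein_kernel(samples[j], samples[i])
    return chosen

# Stand-in for MCMC output: shifted and over-dispersed relative to N(0, 1).
rng = random.Random(0)
chain = [rng.gauss(0.5, 1.5) for _ in range(300)]
idx = greedy_thin(chain, 10)
thinned = [chain[i] for i in idx]
print(idx)
```

Because the rule scores candidates against the target rather than against the chain itself, it can down-weight a poor initial stretch without a separate burn-in step. With this kernel the first selected state is always the one closest to the mode of the assumed target.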

翻訳日:2022-12-05 13:09:47 公開日:2022-01-11

# 高齢者の身体活動エネルギー消費のモニタリングに関するRNN

RNNs on Monitoring Physical Activity Energy Expenditure in Older People ( http://arxiv.org/abs/2006.01169v2 )

ライセンス: Link先を確認

Stylianos Paraschiakos, Cláudio Rebelo de Sá, Jeremiah Okai, Eline P. Slagboom, Marian Beekman, Arno Knobbe

(参考訳) 身体活動エネルギー支出(PAEE)の定量化を通じて、医療モニタリングは、活力と健康な高齢化を刺激し、高齢者の行動変化を誘発し、これらを個人の健康増進と結びつける可能性がある。監視環境においてPAEEを測定するために,若年層を対象としたウェアラブル加速度計の手法が開発されている。高齢者は,エネルギー要求や身体活動の幅が異なるため,高齢者のPAEE推定には適さない可能性がある。過去の活動が現在のPAEEに影響を与えるため、逐次データモデリング能力で知られるモデリング手法であるRecurrent Neural Network (RNN)を提案する。高齢者のためのrnnのトレーニングには,60歳以上(平均65歳)の健常者34名を対象に,gotovデータセットを用いて16種類の活動を行った。我々は,手首と足首に設置した加速度計を用いて,間接熱量測定によるエネルギー数の測定を行った。最適化後、3つのGRU層を持つRNNと、加速度計と参加者レベルのデータを組み合わせたフィードフォワードネットワークからなるアーキテクチャを提案する。本稿では,grgベースのrnnの標準設備を越え,最先端技術を上回る精度を達成するための取り組みについて述べる。これらの取り組みには、アグリゲーション関数を平均から分散尺度(SD, IQR, ...)に切り替えること、時間的データと静的データ(年齢、体重、BMIなど個人固有の詳細)を組み合わせて、以前に訓練されたMLモデルによって予測されたシンボル的活動データを追加することが含まれる。得られたアーキテクチャは、トレーニング入力を10倍減らしながら、ほぼ10%の性能向上を実現している。したがって、PAEEと代謝と認知の健康と精神の健康に関連する活力パラメータの関連を調査するために使用できる。

Through the quantification of physical activity energy expenditure (PAEE), health care monitoring has the potential to stimulate vital and healthy ageing, inducing behavioural changes in older people and linking these to personal health gains. To be able to measure PAEE in a monitoring environment, methods based on wearable accelerometers have been developed; however, these mainly target younger people. Since elderly subjects differ in energy requirements and range of physical activities, the current models may not be suitable for estimating PAEE among the elderly. Because past activities influence present PAEE, we propose a modeling approach known for its ability to model sequential data, the Recurrent Neural Network (RNN). To train the RNN for an elderly population, we used the GOTOV dataset with 34 healthy participants of 60 years and older (mean 65 years old), performing 16 different activities. We used accelerometers placed on wrist and ankle, and measurements of energy counts by means of indirect calorimetry. After optimization, we propose an architecture consisting of an RNN with 3 GRU layers and a feedforward network combining both accelerometer and participant-level data. In this paper, we describe our efforts to go beyond the standard facilities of a GRU-based RNN, with the aim of achieving accuracy surpassing the state of the art. These efforts include switching aggregation function from mean to dispersion measures (SD, IQR, ...), combining temporal and static data (person-specific details such as age, weight, BMI) and adding symbolic activity data as predicted by a previously trained ML model. The resulting architecture manages to increase its performance by approximately 10% while decreasing training input by a factor of 10. It can thus be employed to investigate associations of PAEE with vitality parameters related to metabolic and cognitive health and mental well-being.
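Why a dispersion measure can carry more information than the mean is easy to see on a toy signal. The sketch below aggregates fixed-size windows of a made-up accelerometer magnitude series into mean, standard deviation and interquartile range. The window size, the signal values and the function name are assumptions for illustration only and are not taken from the GOTOV data or the paper's pipeline.

```python
import statistics

def window_features(signal, size):
    """Aggregate non-overlapping windows into mean, SD and IQR features."""
    feats = []
    for start in range(0, len(signal) - size + 1, size):
        window = signal[start:start + size]
        q1, _, q3 = statistics.quantiles(window, n=4)
        feats.append({
            "mean": statistics.fmean(window),
            "sd": statistics.pstdev(window),
            "iqr": q3 - q1,
        })
    return feats

# Hypothetical signal: a motionless window followed by a vigorous one.
signal = [1.0] * 8 + [0.0, 2.0] * 4
features = window_features(signal, size=8)
print(features)
```

Both windows have the same mean, so a mean-aggregated input cannot tell rest from vigorous movement here, while the SD and IQR separate the two clearly.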

翻訳日:2022-11-26 07:57:29 公開日:2022-01-11

# DensE: アダプティブセマンティック階層を組み込んだ知識グラフのための非可換表現の強化

DensE: An Enhanced Non-commutative Representation for Knowledge Graph Embedding with Adaptive Semantic Hierarchy ( http://arxiv.org/abs/2008.04548v2 )

ライセンス: Link先を確認

Haonan Lu, Hailin Hu, Xiaodong Lin

(参考訳) 関係の合成パターンのキャプチャは、知識グラフの完成において重要なタスクである。学習知識に対するマルチホップ推論の基本的なステップとしても機能する。これまで、数種類の複素値対角行列の積を用いて複合関係をモデル化するための回転に基づく翻訳法が開発されてきた。しかし、これらの手法は複合関係を単純化しすぎる傾向があり、例えば、それらは可換であり、実体とは独立であり、意味的階層を欠いている。そこで我々は,これらの問題を体系的に解決するために,複雑な構成パターンをモデル化するための新しい知識グラフ埋め込み法DensEを開発した。特に、3次元 (3-d) ユークリッド空間において、各関係をso(3)群に基づく回転作用素とスケーリング作用素に分解する。 This design principle leads to several advantages of our method: (1) For composite relations, the corresponding diagonal relation matrices can be non-commutative, reflecting a predominant scenario in real world applications; (2) Our model preserves the natural interaction between relational operations and entity embeddings; (3) The scaling operation provides the modeling power for the intrinsic semantic hierarchical structure of entities; (4) The enhanced expressiveness of DensE is achieved with high computational efficiency in terms of both parameter size and training time; and (5) Modeling entities in Euclidean space instead of quaternion space keeps the direct geometrical interpretations of relational patterns. 複数のベンチマークナレッジグラフの実験的結果は、特に複合関係において、リンク予測が欠如している現在の最先端モデルよりも密度が高いことを示している。

Capturing the composition patterns of relations is a vital task in knowledge graph completion. It also serves as a fundamental step towards multi-hop reasoning over learned knowledge. Previously, several rotation-based translational methods have been developed to model composite relations using the product of a series of complex-valued diagonal matrices. However, these methods tend to make several oversimplified assumptions on the composite relations, e.g., forcing them to be commutative, independent from entities and lacking semantic hierarchy. To systematically tackle these problems, we have developed a novel knowledge graph embedding method, named DensE, to provide an improved modeling scheme for the complex composition patterns of relations. In particular, our method decomposes each relation into an SO(3) group-based rotation operator and a scaling operator in the three dimensional (3-D) Euclidean space. This design principle leads to several advantages of our method: (1) For composite relations, the corresponding diagonal relation matrices can be non-commutative, reflecting a predominant scenario in real world applications; (2) Our model preserves the natural interaction between relational operations and entity embeddings; (3) The scaling operation provides the modeling power for the intrinsic semantic hierarchical structure of entities; (4) The enhanced expressiveness of DensE is achieved with high computational efficiency in terms of both parameter size and training time; and (5) Modeling entities in Euclidean space instead of quaternion space keeps the direct geometrical interpretations of relational patterns. Experimental results on multiple benchmark knowledge graphs show that DensE outperforms the current state-of-the-art models for missing link prediction, especially on composite relations.
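The non-commutativity claimed in point (1) follows directly from the geometry of 3-D rotations. The sketch below models a relation as a scaling factor combined with a rotation about an axis and applies two relations to an entity vector in both orders. It is a minimal illustration under assumed conventions (one 3-D block, axis-angle rotations via Rodrigues' formula, hypothetical names) and not the DensE implementation or its training objective.

```python
import math

def rotate(v, axis, angle):
    """Rotate the 3-vector v about a unit axis by angle (Rodrigues' formula)."""
    c, s = math.cos(angle), math.sin(angle)
    kx, ky, kz = axis
    cross = (ky * v[2] - kz * v[1],
             kz * v[0] - kx * v[2],
             kx * v[1] - ky * v[0])
    dot = kx * v[0] + ky * v[1] + kz * v[2]
    return tuple(v[i] * c + cross[i] * s + axis[i] * dot * (1.0 - c)
                 for i in range(3))

def apply_relation(v, relation):
    """Apply a relation given as (scale, unit axis, angle) to an entity vector."""
    scale, axis, angle = relation
    return tuple(scale * x for x in rotate(v, axis, angle))

# Two hypothetical relations: quarter turns about z and x with different scales.
r1 = (2.0, (0.0, 0.0, 1.0), math.pi / 2)
r2 = (3.0, (1.0, 0.0, 0.0), math.pi / 2)
head = (1.0, 0.0, 0.0)

r1_then_r2 = apply_relation(apply_relation(head, r1), r2)
r2_then_r1 = apply_relation(apply_relation(head, r2), r1)
print(r1_then_r2, r2_then_r1)
```

Up to floating-point rounding, applying r1 then r2 sends the head to (0, 0, 6), while the reverse order sends it to (0, 6, 0). The scaling factors commute, so the order dependence comes entirely from the rotations, which is the behaviour that complex-valued diagonal rotations cannot represent.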

翻訳日:2022-10-31 10:37:50 公開日:2022-01-11

# 制約付き最適化の近距離法の拡張

Extensions to the Proximal Distance Method of Constrained Optimization ( http://arxiv.org/abs/2009.00801v2 )

ライセンス: Link先を確認

Alfonso Landeros, Oscar Hernan Madrid Padilla, Hua Zhou, Kenneth Lange

(参考訳) 現在の論文では、損失$f(\boldsymbol{x})$を、パラメータを融合する行列である$\boldsymbol{D}\boldsymbol{x} \in S$という形式の制約を最小化する問題を研究している。融合制約は、滑らかさ、疎さ、あるいはより一般的な制約パターンをキャプチャすることができる。このような一般的な問題に対処するために、ベルトラミ・コースト法と近距離原理を組み合わせる。後者はペナル化対象の最小化によって駆動される: $f(\boldsymbol{x})+\frac{\rho}{2}\text{dist}(\boldsymbol{D}\boldsymbol{x},S)^2$ で、大きなチューニング定数が $\rho$ で、平方ユークリッド距離が $\boldsymbol{D}\boldsymbol{x}$ である。対応する近距離アルゴリズムの次のイテレート$\boldsymbol{x}_{n+1}$は、主要なサロゲート関数$f(\boldsymbol{x})+\frac{\rho}{2}\|\boldsymbol{d}\boldsymbol{x}-\mathcal{p}_{s}(\boldsymbol{d}\boldsymbol{x}_n)\|^2$を最小化することにより、現在のイテレート$\boldsymbol{x}_n$から構成される。固定 $\rho$ と部分解析損失 $f(\boldsymbol{x})$ と部分解析制約セット $s$ に対して、我々は定常点への収束を証明する。強い仮定の下では、収束率を提供し、線形局所収束を示す。また, コストのかかる線形システム問題を回避するために, 最急降下型 (sd) も構築した。アルゴリズムをベンチマークするために、乗算器の交互方向法(ADMM)と比較する。大規模な数値実験では, 距離予測, 凸回帰, 凸クラスタリング, 総変動像のデノイング, 行列の良好な条件数への投影に関する問題を含む。これらの実験は,高次元問題に対する最も急な変形の速度と許容可能な精度を示す。

The current paper studies the problem of minimizing a loss $f(\boldsymbol{x})$ subject to constraints of the form $\boldsymbol{D}\boldsymbol{x} \in S$, where $S$ is a closed set, convex or not, and $\boldsymbol{D}$ is a matrix that fuses parameters. Fusion constraints can capture smoothness, sparsity, or more general constraint patterns. To tackle this generic class of problems, we combine the Beltrami-Courant penalty method with the proximal distance principle. The latter is driven by minimization of penalized objectives $f(\boldsymbol{x})+\frac{\rho}{2}\text{dist}(\boldsymbol{D}\boldsymbol{x},S)^2$ involving large tuning constants $\rho$ and the squared Euclidean distance of $\boldsymbol{D}\boldsymbol{x}$ from $S$. The next iterate $\boldsymbol{x}_{n+1}$ of the corresponding proximal distance algorithm is constructed from the current iterate $\boldsymbol{x}_n$ by minimizing the majorizing surrogate function $f(\boldsymbol{x})+\frac{\rho}{2}\|\boldsymbol{D}\boldsymbol{x}-\mathcal{P}_{S}(\boldsymbol{D}\boldsymbol{x}_n)\|^2$. For fixed $\rho$ and a subanalytic loss $f(\boldsymbol{x})$ and a subanalytic constraint set $S$, we prove convergence to a stationary point. Under stronger assumptions, we provide convergence rates and demonstrate linear local convergence. We also construct a steepest descent (SD) variant to avoid costly linear system solves. To benchmark our algorithms, we compare against the alternating direction method of multipliers (ADMM). Our extensive numerical tests include problems on metric projection, convex regression, convex clustering, total variation image denoising, and projection of a matrix to a good condition number. These experiments demonstrate the superior speed and acceptable accuracy of our steepest variant on high-dimensional problems.
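The surrogate minimisation and the steepest descent variant can be sketched on a small fused problem. The example below assumes the loss $f(\boldsymbol{x}) = \frac{1}{2}\|\boldsymbol{x}-\boldsymbol{y}\|^2$, a first-difference fusion matrix $\boldsymbol{D}$, and a non-convex constraint set $S$ of vectors with at most $k$ nonzero entries. Each iteration takes one exact line-search descent step on the surrogate $f(\boldsymbol{x})+\frac{\rho}{2}\|\boldsymbol{D}\boldsymbol{x}-\mathcal{P}_{S}(\boldsymbol{D}\boldsymbol{x}_n)\|^2$. These choices, the data, the single-step schedule and all function names are assumptions for illustration and are not the authors' code.

```python
def matvec(D, x):
    """Compute D x for a matrix stored as a list of rows."""
    return [sum(d * v for d, v in zip(row, x)) for row in D]

def rmatvec(D, z):
    """Compute D^T z."""
    return [sum(D[i][j] * z[i] for i in range(len(D))) for j in range(len(D[0]))]

def project_sparse(z, k):
    """Project onto S = {z with at most k nonzeros}: keep the k largest |z_i|."""
    keep = set(sorted(range(len(z)), key=lambda i: -abs(z[i]))[:k])
    return [z[i] if i in keep else 0.0 for i in range(len(z))]

def objective(x, y, D, k, rho):
    """Penalised objective f(x) + (rho / 2) * dist(D x, S)^2."""
    z = matvec(D, x)
    p = project_sparse(z, k)
    loss = 0.5 * sum((a - b) ** 2 for a, b in zip(x, y))
    dist2 = sum((a - b) ** 2 for a, b in zip(z, p))
    return loss + 0.5 * rho * dist2

def proximal_distance_sd(y, D, k, rho, iters=200):
    """One exact steepest-descent step on the surrogate per iteration."""
    x = list(y)
    for _ in range(iters):
        z = matvec(D, x)
        target = project_sparse(z, k)  # P_S(D x_n), held fixed for this step
        resid = [a - b for a, b in zip(z, target)]
        grad = [(a - b) + rho * g
                for a, b, g in zip(x, y, rmatvec(D, resid))]
        gg = sum(g * g for g in grad)
        if gg == 0.0:
            break
        Dg = matvec(D, grad)
        # Exact line search: the surrogate is quadratic with Hessian I + rho D^T D.
        step = gg / (gg + rho * sum(v * v for v in Dg))
        x = [a - step * g for a, g in zip(x, grad)]
    return x

# Hypothetical noisy two-level signal; at most one jump allowed in D x.
y = [1.0, 1.2, 0.9, 3.0, 3.1, 2.9]
D = [[-1.0 if j == i else 1.0 if j == i + 1 else 0.0 for j in range(6)]
     for i in range(5)]
x_hat = proximal_distance_sd(y, D, k=1, rho=10.0)
print(x_hat, objective(x_hat, y, D, 1, 10.0))
```

Because the surrogate majorises the penalised objective and touches it at the current iterate, each descent step on the surrogate cannot increase the penalised objective, which is the descent property behind the method. Avoiding the linear solve with $\boldsymbol{I}+\rho\boldsymbol{D}^\top\boldsymbol{D}$ is what makes this variant attractive in high dimensions.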

翻訳日:2022-10-22 20:06:09 公開日:2022-01-11

# リッジレット事前:ベイズニューラルネットワークの事前仕様に対する共分散関数アプローチ

The Ridgelet Prior: A Covariance Function Approach to Prior Specification for Bayesian Neural Networks ( http://arxiv.org/abs/2010.08488v4 )

ライセンス: Link先を確認

Takuo Matsubara and Chris J. Oates and François-Xavier Briol

(参考訳) ベイジアンニューラルネットワークは、ニューラルネットワークの強い予測性能と、ベイジアンフレームワークの予測出力に関連する不確実性の形式的定量化を組み合わせようとする。しかし、ネットワークの出力空間に持ち上げられたときに意味のある事前分布をネットワークのパラメータに与える方法はまだ不明である。タスクに対して適切なガウスプロセス共分散関数を提案できる可能性のあるソリューションが提案されている。提案手法は、ネットワークの出力空間における擬似ガウス過程を近似した、リッジレット事前と呼ばれる、ネットワークのパラメータの事前分布を構築する。ニューラルネットワークとガウス過程の接続に関する既存の研究とは対照的に,本解析は非漸近的であり,有限サンプルサイズ誤差境界が提供されている。これは、ベイズニューラルネットワークが共分散関数が十分正則である任意のガウス過程を近似できる普遍性を確立する。実験評価は概念実証に限定し,適切なガウス過程が提供できる回帰問題に対して,リッジレットプリアーが非構造化プリアーよりも優れることを示す。

Bayesian neural networks attempt to combine the strong predictive performance of neural networks with formal quantification of uncertainty associated with the predictive output in the Bayesian framework. However, it remains unclear how to endow the parameters of the network with a prior distribution that is meaningful when lifted into the output space of the network. A possible solution is proposed that enables the user to posit an appropriate Gaussian process covariance function for the task at hand. Our approach constructs a prior distribution for the parameters of the network, called a ridgelet prior, that approximates the posited Gaussian process in the output space of the network. In contrast to existing work on the connection between neural networks and Gaussian processes, our analysis is non-asymptotic, with finite sample-size error bounds provided. This establishes the universality property that a Bayesian neural network can approximate any Gaussian process whose covariance function is sufficiently regular. Our experimental assessment is limited to a proof-of-concept, where we demonstrate that the ridgelet prior can out-perform an unstructured prior on regression problems for which a suitable Gaussian process prior can be provided.

翻訳日:2022-10-06 20:20:31 公開日:2022-01-11

# 人間による位置推定のためのロボットナビゲーショングラフの最小化

Minimizing Robot Navigation-Graph For Position-Based Predictability By Humans ( http://arxiv.org/abs/2010.15255v2 )

ライセンス: Link先を確認

Sriram Gopalakrishnan, Subbarao Kambhampati

(参考訳) 人間とロボットが同じ空間で作業しながら移動している状況では、移動ロボットが捉えた予測可能な経路は環境をより安全に感じさせるだけでなく、人間が経路の衝突を避けたり、道を塞いだりすることで空間内のナビゲーションを助けることができる。予測可能な経路が不可欠になりますロボットの数が増えるにつれて、人間がロボットの経路を予測するための認知的努力は維持できなくなる。人間の数が増えるにつれて、複数の人間の動きを考慮しながらロボットが動くのも難しくなる。さらに、レストランや銀行、病院など、新しい人がこの分野に足を踏み入れると、ロボットが一般的に行う軌道への親密性が低下し、経路に沿って予測可能なロボットの動きの必要性がさらに高まる。そこで本研究では,ロボットの現在位置からの予測可能性である位置に基づく予測可能性について,ロボットのナビゲーショングラフを最小化することを提案する。これは、人間が自身の作業に加えて、ロボットの目標や以前の行動を追跡することは期待できないため、重要である。本稿では,位置に基づく予測可能性の尺度を定義し,ロボットの動きのナビゲーショングラフ(方向グラフ)を最小化するためのヒルクライミングアルゴリズムの提案と評価を行う。これに続いて,提案手法をサポートする人間-対象実験の結果が得られた。

In situations where humans and robots are moving in the same space whilst performing their own tasks, predictable paths taken by mobile robots can not only make the environment feel safer, but humans can also help with the navigation in the space by avoiding path conflicts or not blocking the way. So predictable paths become vital. The cognitive effort for the human to predict the robot's path becomes untenable as the number of robots increases. As the number of humans increases, it also becomes harder for the robots to move while considering the motion of multiple humans. Additionally, if new people are entering the space -- like in restaurants, banks, and hospitals -- they would have less familiarity with the trajectories typically taken by the robots; this further increases the need for predictable robot motion along paths. With this in mind, we propose to minimize the navigation-graph of the robot for position-based predictability, which is predictability from just the current position of the robot. This is important since the human cannot be expected to keep track of the goals and prior actions of the robot in addition to doing their own tasks. In this paper, we define measures for position-based predictability, then present and evaluate a hill-climbing algorithm to minimize the navigation-graph (directed graph) of robot motion. This is followed by the results of our human-subject experiments which support our proposed methodology.

翻訳日:2022-10-02 05:31:45 公開日:2022-01-11

# 複数の二次変数を用いた非定常ランダム関数の組込みモデル推定器

An Embedded Model Estimator for Non-Stationary Random Functions using Multiple Secondary Variables ( http://arxiv.org/abs/2011.04116v4 )

ライセンス: Link先を確認

Colin Daly

(参考訳) 複数の二次変数を用いた非定常空間モデリングアルゴリズムを開発した。ジオ統計学と量子ランダムフォレストを組み合わせて、新しい補間と確率シミュレーションアルゴリズムを提供する。本稿では,本手法を導入し,地理的モデリングや量子ランダムフォレストに適用した結果と自然に類似した一貫性を有することを示す。この方法では、モデルをさらに条件づけるために、krigingのような単純な補間技法を組み込むことができる。このアルゴリズムは、各ターゲット位置における目標変数の条件分布を推定することで動作する。このような分布の族は対象変数のエンベロープと呼ばれる。このことから、空間推定、量子化、不確実性を得ることができる。エンベロープから条件付きシミュレーションを生成するアルゴリズムも開発されている。封筒からサンプルを採取すると、二次変数の重要性、傾向、変数の相対的な変化に局所的に影響される。

An algorithm for non-stationary spatial modelling using multiple secondary variables is developed. It combines Geostatistics with Quantile Random Forests to give a new interpolation and stochastic simulation algorithm. This paper introduces the method and shows that it has consistency results that are similar in nature to those applying to geostatistical modelling and to Quantile Random Forests. The method allows for embedding of simpler interpolation techniques, such as Kriging, to further condition the model. The algorithm works by estimating a conditional distribution for the target variable at each target location. The family of such distributions is called the envelope of the target variable. From this, it is possible to obtain spatial estimates, quantiles and uncertainty. An algorithm to produce conditional simulations from the envelope is also developed. As they sample from the envelope, realizations are therefore locally influenced by relative changes of importance of secondary variables, trends and variability.

翻訳日:2022-09-28 02:04:42 公開日:2022-01-11

# (参考訳) 音声コマンド認識のためのテンソルトレインネットワークのハイブリッドモデルの構築

Exploiting Hybrid Models of Tensor-Train Networks for Spoken Command Recognition ( http://arxiv.org/abs/2201.10609v1 )

ライセンス: CC BY 4.0

Jun Qi, Javier Tejedor

(参考訳) 本研究の目的は,モデルパラメータ数と分類精度のトレードオフを考慮し,低複雑性音声コマンド認識(SCR)システムを設計することである。具体的には、テンソルトレイン(TT)ネットワークの深いハイブリッドアーキテクチャを利用して、エンドツーエンドのSCRパイプラインを構築する。我々のコマンド認識システムであるCNN+(TT-DNN)は、スペクトル特徴抽出のための下部の畳み込み層と、コマンド分類のための上部のTT層で構成されている。提案するCNN+(TT-DNN)モデルでは,従来のCNNベースラインと比較して,完全連結(FC)層をTT層に置き換えることで,CNNモデルのベースライン性能を維持しながら,モデルパラメータの大幅な削減が可能である。我々は、CNN+(TT-DNN)モデルをランダムに、あるいはよく訓練されたCNN+DNNに基づいて初期化し、Google Speech Command Dataset上でCNN+(TT-DNN)モデルを評価する。実験の結果,提案したCNN+(TT-DNN)モデルでは,CNNモデルよりも4倍少ないモデルパラメータで96.31%の競争精度が得られた。さらに、パラメータ数が増加するとCNN+(TT-DNN)モデルは97.2%の精度が得られる。

This work aims to design a low complexity spoken command recognition (SCR) system by considering different trade-offs between the number of model parameters and classification accuracy. More specifically, we exploit a deep hybrid architecture of a tensor-train (TT) network to build an end-to-end SCR pipeline. Our command recognition system, namely CNN+(TT-DNN), is composed of convolutional layers at the bottom for spectral feature extraction and TT layers at the top for command classification. Compared with a traditional end-to-end CNN baseline for SCR, our proposed CNN+(TT-DNN) model replaces fully connected (FC) layers with TT ones, and it can substantially reduce the number of model parameters while maintaining the baseline performance of the CNN model. We initialize the CNN+(TT-DNN) model in a randomized manner or based on a well-trained CNN+DNN, and assess the CNN+(TT-DNN) models on the Google Speech Command Dataset. Our experimental results show that the proposed CNN+(TT-DNN) model attains a competitive accuracy of 96.31% with 4 times fewer model parameters than the CNN model. Furthermore, the CNN+(TT-DNN) model can obtain a 97.2% accuracy when the number of parameters is increased.
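
To make the parameter saving from replacing FC layers with TT layers concrete, the following minimal sketch counts weights for a dense layer and for a tensor-train factorization of the same layer. It uses the standard TT-matrix parameter count (the sum of `r[k] * m[k] * n[k] * r[k+1]` over the cores). The factor shapes and TT ranks below are illustrative assumptions and are not taken from the paper; the function names `fc_params` and `tt_params` are hypothetical.

```python
from math import prod


def fc_params(in_factors, out_factors):
    """Weight count of a dense layer whose input and output sizes are the
    products of the given factors (bias terms are ignored)."""
    return prod(in_factors) * prod(out_factors)


def tt_params(in_factors, out_factors, ranks):
    """Weight count of a tensor-train (TT) matrix with one core per factor.

    Core k has shape (ranks[k], in_factors[k], out_factors[k], ranks[k + 1]);
    the boundary ranks must both be 1.
    """
    d = len(in_factors)
    if len(out_factors) != d:
        raise ValueError("input and output need the same number of factors")
    if len(ranks) != d + 1 or ranks[0] != 1 or ranks[-1] != 1:
        raise ValueError("ranks must have length d + 1 and start and end with 1")
    return sum(
        ranks[k] * in_factors[k] * out_factors[k] * ranks[k + 1]
        for k in range(d)
    )


# Illustrative shapes only (assumed, not from the paper): a 1024 x 1024 layer.
in_factors = (4, 8, 8, 4)
out_factors = (4, 8, 8, 4)
ranks = (1, 4, 4, 4, 1)

dense = fc_params(in_factors, out_factors)      # 1048576 weights
tt = tt_params(in_factors, out_factors, ranks)  # 2176 weights
print(dense, tt)
```

Larger TT ranks raise the parameter count, which is the kind of trade-off between model size and accuracy that the abstract describes.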

翻訳日:2022-01-30 13:51:16 公開日:2022-01-11

# (参考訳) 学習者のコース選択を支援するオープンMOOCレビューの大規模分析

Large Scale Analysis of Open MOOC Reviews to Support Learners' Course Selection ( http://arxiv.org/abs/2201.06967v1 )

ライセンス: CC BY-SA 4.0

Manuel J. Gomez, Mario Calderón, Victor Sánchez, Félix J. García Clemente, José A. Ruipérez-Valiente

(参考訳) 最近のパンデミックは教育の見方を変えた。オンライン教育を利用しているのが子供や大学生だけではないことは驚きではない。ここ数年、何百万人もの大人がオンラインの授業やコースに登録しており、CourseraやedXなどのMOOCプロバイダは、数百万人の新規ユーザーがプラットフォームに登録したと報告している。しかし、学生はコースを選択する際にいくつかの課題に直面している。オンラインレビューシステムは多くの分野において標準的なものであるが、MOOCエコシステムには標準化された、あるいは完全に分散されたレビューシステムは存在しない。そこで我々は、利用可能なオープンMOOCレビューを活用して、よりシンプルで透明性の高いレビューシステムを構築し、ユーザーが最良のコースを実際に見つけられるようにする機会があると考えている。具体的には、5つの異なるプラットフォームから得た240万件のレビュー(これまでに使用された中で最大のMOOCレビューデータセット)を分析し、(1)数値評価が学習者に識別的な情報を提供するか、(2)テキストレビューに対するNLPベースの感情分析が学習者に有益な情報を提供できるか、(3)NLPベースのトピック抽出手法を利用して学習者にとって重要となりうるテーマを推定できるか、(4)これらのモデルを用いてオープンレビューに基づきMOOCを効果的に特徴づけられるか、を明らかにする。その結果、数値評価には明らかな偏りがあり(その63%が5つ星評価である)、トピックモデリングにより、コースの広告、実際の適用性、各コースの難易度などに関連する興味深いトピックが明らかになった。我々は、本研究がこの領域に光を当て、ポストパンデミック時代に入るにつれてますます普及しているオンライン教育レビューにおいて、より透明なアプローチを促進することを期待している。

The recent pandemic has changed the way we see education. It is not surprising that children and college students are not the only ones using online education. Millions of adults have signed up for online classes and courses in recent years, and MOOC providers, such as Coursera or edX, are reporting millions of new users signing up on their platforms. However, students do face some challenges when choosing courses. Though online review systems are standard among many verticals, no standardized or fully decentralized review systems exist in the MOOC ecosystem. In this vein, we believe that there is an opportunity to leverage available open MOOC reviews in order to build simpler and more transparent reviewing systems, allowing users to really identify the best courses out there. Specifically, in our research we analyze 2.4 million reviews (which is the largest MOOC reviews dataset used until now) from five different platforms in order to determine the following: (1) if the numeric ratings provide discriminant information to learners, (2) if NLP-driven sentiment analysis on textual reviews could provide valuable information to learners, (3) if we can leverage NLP-driven topic finding techniques to infer themes that could be important for learners, and (4) if we can use these models to effectively characterize MOOCs based on the open reviews. Results show that numeric ratings are clearly biased (63% of them are 5-star ratings), and the topic modeling reveals some interesting topics related to course advertisements, real applicability, or the difficulty of the different courses. We expect our study to shed some light on the area and promote a more transparent approach in online education reviews, which are becoming more and more popular as we enter the post-pandemic era.

翻訳日:2022-01-23 20:09:11 公開日:2022-01-11

# (参考訳) Quasi-Framelet: GraphNeural Networksのもうひとつの改善

Quasi-Framelets: Another Improvement to GraphNeural Networks ( http://arxiv.org/abs/2201.04728v1 )

ライセンス: CC BY 4.0

Mengxi Yang, Xuebin Zheng, Jie Yin and Junbin Gao

(参考訳) 本稿では,スペクトルグラフニューラルネットワークのためのマルチスケールフレームレット畳み込みの新しい設計を提案する。スペクトルパラダイムでは、スペクトル領域に様々なスペクトルフィルタを提案し、グローバルグラフ構造情報とローカルグラフ構造情報の両方をキャプチャすることで、グラフ学習タスクの性能を向上させる。既存のスペクトルアプローチは、いくつかのグラフでは優れた性能を示すが、柔軟性の欠如と、グラフ情報が不完全あるいは摂動的である場合に脆弱である。新しいフレームレット畳み込みは、スペクトル領域で直接設計されたフィルタリングファンクメントを組み込んで、これらの制限を克服します。提案した畳み込みはスペクトル情報の遮断に優れた柔軟性を示し,ノイズグラフ信号の負の効果を効果的に緩和する。また、実世界のグラフデータの不均一性を利用するため、新しいフレームレット畳み込みを用いたヘテロジニアスグラフニューラルネットワークは、マルチレベルグラフ解析によりメタパスの固有トポロジ情報を埋め込むソリューションを提供する。

This paper aims to provide a novel design of a multiscale framelet convolution for spectral graph neural networks. In the spectral paradigm, spectral GNNs improve graph learning task performance by proposing various spectral filters in the spectral domain to capture both global and local graph structure information. Although the existing spectral approaches show superior performance on some graphs, they suffer from a lack of flexibility and are fragile when graph information is incomplete or perturbed. Our new framelet convolution incorporates filtering functions designed directly in the spectral domain to overcome these limitations. The proposed convolution shows great flexibility in cutting off spectral information and effectively mitigates the negative effect of noisy graph signals. Besides, to exploit the heterogeneity in real-world graph data, the heterogeneous graph neural network with our new framelet convolution provides a solution for embedding the intrinsic topological information of meta-paths with a multi-level graph analysis. Extensive experiments have been conducted on real-world heterogeneous and homogeneous graphs under settings with noisy node features, and superior performance results are achieved.

翻訳日:2022-01-15 05:30:00 公開日:2022-01-11

# (参考訳) デュアルアテンションネットワークを用いた二型・ハイブリッド型市場知識グラフに基づく株価変動予測

Stock Movement Prediction Based on Bi-typed and Hybrid-relational Market Knowledge Graph via Dual Attention Networks ( http://arxiv.org/abs/2201.04965v1 )

ライセンス: CC BY 4.0

Yu Zhao, Huaming Du, Ying Liu, Shaopeng Wei, Xingyan Chen, Huali Feng, Qinghong Shuai, Qing Li, Fuzhen Zhuang, Gang Kou

(参考訳) 株式移動予測(SMP)は、上場企業の株価動向を予測することを目的としており、これは金融市場の不安定な性質のために難しい課題である。近年の金融研究では、モーメントの流出効果が株価変動に重要な役割を果たすことが示されている。しかし、従来の研究は通常、関連企業間の単純な接続情報のみを学習するが、実際の金融市場における上場企業の複雑な関係をモデル化することは必然的に失敗する。この問題に対処するため,我々はまず,上場企業とその関連役員を含む2種類のエンティティと,明示的関係と暗黙的関係を含むハイブリッド関係を含む,より包括的な市場ナレッジグラフ(mkg)を構築する。その後、構築したMKGに基づいて運動量流出信号を学習し、株価予測を行う新しいデュアルアテンションネットワークであるDanSmpを提案する。 sotaベースライン9に対して構築したデータセットを実験した結果,提案手法が構築したmkgを用いて在庫予測を改善できることが確認された。

Stock Movement Prediction (SMP) aims at predicting the future price trends of listed companies' stocks, which is a challenging task due to the volatile nature of financial markets. Recent financial studies show that the momentum spillover effect plays a significant role in stock fluctuation. However, previous studies typically learn only simple connection information among related companies, and thus inevitably fail to model the complex relations of listed companies in the real financial market. To address this issue, we first construct a more comprehensive Market Knowledge Graph (MKG) which contains bi-typed entities, including listed companies and their associated executives, and hybrid relations, including explicit and implicit relations. Afterward, we propose DanSmp, a novel Dual Attention Network that learns the momentum spillover signals based upon the constructed MKG for stock prediction. The empirical experiments on our constructed datasets against nine SOTA baselines demonstrate that the proposed DanSmp is capable of improving stock prediction with the constructed MKG.

翻訳日:2022-01-15 05:10:15 公開日:2022-01-11

# (参考訳) インターネット提供型認知行動療法におけるアドヒアランス予測 : 最小データ感度アプローチ

Adherence Forecasting for Guided Internet-Delivered Cognitive Behavioral Therapy: A Minimally Data-Sensitive Approach ( http://arxiv.org/abs/2201.04967v1 )

ライセンス: CC BY 4.0

Ulysse Côté-Allard, Minh H. Pham, Alexandra K. Schultz, Tine Nordgreen, Jim Torresen

(参考訳) インターネット提供型心理的治療(IDPT)は、メンタルヘルスのアクセシビリティを向上させるための効果的でスケーラブルな経路であると考えられている。この文脈において、治療の順守は、伝統的な介入に比べて医療専門家と患者との相互作用が減っているため、特に問題となる。並行して、特にデジタル分野において、人々の個人データを使用する際の規制が増加している。このような規制では、データ最小化はしばしばGDPR(General Data Protection Regulation)のような中核的なテナントとなる。そこで本研究では,最小限の敏感なログイン/ログアウトデータにのみ依存しながら,自動アドバンス予測を行うディープラーニング手法を提案する。本研究は,インターネット提供型認知行動療法(G-ICBT)を施行した342例を対象に行った。提案するセルフアテンションネットワークは平均平均バランス精度を70%以上達成し,治療期間の1/3しか経過しなかった。そこで本研究では,G-ICBTの自動付着予測が,最小限の感度データのみを用いて実現可能であることを示す。

Internet-delivered psychological treatments (IDPT) are seen as an effective and scalable pathway to improving the accessibility of mental healthcare. Within this context, treatment adherence is an especially relevant challenge to address due to the reduced interaction between healthcare professionals and patients, compared to more traditional interventions. In parallel, there are increasing regulations on the use of people's personal data, especially in the digital sphere. In such regulations, data minimization is often a core tenet, as in the General Data Protection Regulation (GDPR). Consequently, this work proposes a deep-learning approach to perform automatic adherence forecasting while relying only on minimally sensitive login/logout data. This approach was tested on a dataset containing 342 patients undergoing guided internet-delivered cognitive behavioral therapy (G-ICBT) treatment. The proposed Self-Attention Network achieved over 70% average balanced accuracy when only 1/3 of the treatment duration had elapsed. As such, this study demonstrates that automatic adherence forecasting for G-ICBT is achievable using only minimally sensitive data, thus facilitating the implementation of such tools within real-world IDPT platforms.
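
The result above is reported as balanced accuracy, which is the mean of the per-class recalls and is therefore less flattering than plain accuracy when adherent and non-adherent patients are unevenly represented. The sketch below shows that definition only; the labels are made up for illustration, and the paper's exact evaluation protocol (for example, how the average over runs is taken) is not given in the abstract and is not reproduced here.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for cls in sorted(set(y_true)):
        # Indices of all examples whose true label is this class.
        idx = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def accuracy(y_true, y_pred):
    """Fraction of examples predicted correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Made-up labels: 1 = adherent, 0 = non-adherent (8 vs 2 patients).
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
# A trivial model that always predicts "adherent".
y_pred = [1] * 10

print(accuracy(y_true, y_pred))           # 0.8
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

A model that ignores the minority class scores 0.5 here, the same as chance for two classes, which is why a balanced accuracy above 70% is a more informative figure than raw accuracy on imbalanced data.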

翻訳日:2022-01-15 04:55:11 公開日:2022-01-11

# fusion autoencoderによる深層クラスタリング

Deep clustering with fusion autoencoder ( http://arxiv.org/abs/2201.04727v1 )

ライセンス: Link先を確認

Shuai Chang

(参考訳) 近年,クラスタリング研究における表現学習の深層学習技術の導入が注目され,新たに開発されたクラスタリングパラダイムであるviz. the Deep Clustering (DC) が生み出されている。通常、DCモデルはオートエンコーダを利用して、結果としてクラスタリングプロセスを促進する固有の特徴を学ぶ。近年, 可変オートエンコーダ (VAE) と呼ばれる生成モデルがDC研究で広く受け入れられている。それでも、一般的なVAEは、包括的な潜伏する特徴を認識できないため、劣化するクラスタリングのパフォーマンスにつながる。本稿では,この問題に対処する新しいDC法を提案する。特に、生成逆数ネットワークとVAEは、下流クラスタリングタスクの恩恵を受けるより差別的な表現を識別するために、融合オートエンコーダ(FAE)と呼ばれる新しいオートエンコーダに結合される。さらに、FAEは、表現学習能力をさらに強化するディープ残差ネットワークアーキテクチャで実装されている。最後に、faeの潜在空間は、異なるクラスタを互いに引き離し、個々のクラスタ内のデータポイントを崩壊させる、深密ニューラルネットワークによって形成される埋め込み空間に変換される。複数の画像データセットを用いて実験を行い、ベースライン法に対する提案したDCモデルの有効性を示した。

Embracing deep learning techniques for representation learning in clustering research has attracted broad attention in recent years, yielding a newly developed clustering paradigm, viz. deep clustering (DC). Typically, DC models capitalize on autoencoders to learn intrinsic features that in turn facilitate the clustering process. Nowadays, a generative model named the variational autoencoder (VAE) has gained wide acceptance in DC studies. Nevertheless, the plain VAE is insufficient to perceive comprehensive latent features, leading to degraded clustering performance. In this paper, a novel DC method is proposed to address this issue. Specifically, the generative adversarial network and the VAE are coalesced into a new autoencoder called the fusion autoencoder (FAE) for discerning more discriminative representations that benefit the downstream clustering task. Besides, the FAE is implemented with a deep residual network architecture, which further enhances its representation learning ability. Finally, the latent space of the FAE is transformed into an embedding space shaped by a deep dense neural network that pulls different clusters away from each other and collapses data points within individual clusters. Experiments conducted on several image datasets demonstrate the effectiveness of the proposed DC model against the baseline methods.

翻訳日:2022-01-14 14:16:04 公開日:2022-01-11

# 道路交通プロファイルのセンサレス推定のためのグラフ埋め込みの設計について

On the Design of Graph Embeddings for the Sensorless Estimation of Road Traffic Profiles ( http://arxiv.org/abs/2201.04968v1 )

ライセンス: Link先を確認

Eric L. Manibardo, Ibai Laña, Esther Villar, and Javier Del Ser

(参考訳) トラフィック予測モデルは、認識、処理、保存が必要なデータに依存します。これには交通センシングインフラストラクチャの展開とメンテナンスが必要であり、しばしば耐え難い金銭コストに繋がる。センシングされた位置の欠如は、交通監視に必要な経済的投資をさらに減少させる合成データシミュレーションによって補うことができる。最も一般的なデータ生成アプローチの1つは、類似する道路のデータ分布に基づいて、実際のトラフィックパターンを生成することだ。同様の交通量で道路を検出するプロセスは、これらのシステムの重要なポイントである。しかし、この類似性に基づく探索には、ターゲット位置でデータを集めることなくフローメトリクスを使用できない。本稿では,道路セグメントのトポロジ的特徴を検査することで,交通データのある場所を検出する手法を提案する。関連する位相的特徴を数値表現(埋め込み)として抽出し、異なる場所を比較し、最終的にそれらの埋め込み間の類似性に基づいて最も類似した道路を見つける。本システムの性能について検討し,より単純なトラフィック推定手法と比較した。類似したデータソースを見つけた後、トラフィックプロファイルを合成するために生成手法が使用される。認識された道路における交通行動の類似性に応じて、生成法は1つの道路からのデータのみを供給できる。合成試料の精度の観点から, 数世代にわたって解析を行った。とりわけ,本研究は,合成交通試料の品質向上に向けたさらなる研究努力を刺激し,センサインフラストラクチャの必要性を減らすことを目的としている。

Traffic forecasting models rely on data that needs to be sensed, processed, and stored. This requires the deployment and maintenance of traffic sensing infrastructure, often leading to unaffordable monetary costs. The lack of sensed locations can be complemented with synthetic data simulations that further lower the economical investment needed for traffic monitoring. One of the most common data generative approaches consists of producing real-like traffic patterns, according to data distributions from analogous roads. The process of detecting roads with similar traffic is the key point of these systems. However, without collecting data at the target location no flow metrics can be employed for this similarity-based search. We present a method to discover locations among those with available traffic data by inspecting topological features of road segments. Relevant topological features are extracted as numerical representations (embeddings) to compare different locations and eventually find the most similar roads based on the similarity between their embeddings. The performance of this novel selection system is examined and compared to simpler traffic estimation approaches. After finding a similar source of data, a generative method is used to synthesize traffic profiles. Depending on the resemblance of the traffic behavior at the sensed road, the generation method can be fed with data from one road only. Several generation approaches are analyzed in terms of the precision of the synthesized samples. Above all, this work intends to stimulate further research efforts towards enhancing the quality of synthetic traffic samples and thereby, reducing the need for sensing infrastructure.
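
The selection step described above, finding the sensed road whose embedding is most similar to that of the unsensed target, can be sketched as a nearest-neighbour search. The abstract does not state which similarity measure is used, so cosine similarity is an assumption here, and the road names, embedding values and function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for a zero vector; assumed, not from the paper
    return dot / (norm_a * norm_b)


def most_similar_road(target_embedding, sensed_embeddings):
    """Return the name of the sensed road closest to the target embedding."""
    return max(
        sensed_embeddings,
        key=lambda name: cosine_similarity(target_embedding, sensed_embeddings[name]),
    )


# Hypothetical topological embeddings of roads that do have traffic sensors.
sensed = {
    "road_a": [1.0, 0.0, 0.0],
    "road_b": [0.0, 1.0, 0.0],
    "road_c": [0.6, 0.8, 0.0],
}
# Embedding of the target road, which has no sensor.
target = [0.9, 0.1, 0.0]

print(most_similar_road(target, sensed))  # road_a
```

The traffic data of the selected road would then be passed to the generative method that synthesizes the target's traffic profile; that step is outside this sketch.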

翻訳日:2022-01-14 13:05:15 公開日:2022-01-11

# (参考訳) MICCAI2021におけるHECKTORチャレンジの概要:PET/CT画像における頭頸部腫瘍分離とアウトカム予測

Overview of the HECKTOR Challenge at MICCAI 2021: Automatic Head and Neck Tumor Segmentation and Outcome Prediction in PET/CT Images ( http://arxiv.org/abs/2201.04138v1 )

ライセンス: CC BY 4.0

Vincent Andrearczyk, Valentin Oreiller, Sarah Boughdad, Catherine Chez Le Rest, Hesham Elhalawani, Mario Jreige, John O. Prior, Martin Vallières, Dimitris Visvikis, Mathieu Hatt, Adrien Depeursinge

(参考訳) 本稿では,第24回医療画像コンピューティング・コンピュータ支援干渉会議(MICCAI)のサテライトイベントとして組織されたHECKTOR(HEAD and neCK Tumor)チャレンジの第2版の概要を紹介する。この課題は、頭頸部癌(h&n)に対するpet/ct画像の自動解析に関連する3つの課題から成り、咽頭領域に焦点を当てている。タスク1は、FDG-PET/CT画像におけるH&N原発グロス腫瘍ボリューム(GTVt)の自動セグメンテーションである。タスク2は、同じFDG-PET/CTからPFS(Progression Free Survival)の自動予測である。最後に、第3タスクは第2タスクと同じで、参加者にGTVtアノテーションが提供されている。データは6つのセンターから収集され、合計325枚の画像が224のトレーニングと101のテストケースに分割された。チャレンジへの関心は、103の登録チームと448の結果の提出による重要な参加によって強調された。第1タスクではDice similarity Coefficient(DSC)が0.7591、第2タスクでは0.7196、第3タスクでは0.6978のConcordance Index(C-index)がそれぞれ得られた。あらゆるタスクにおいて、アプローチの単純さが一般化性能を保証する鍵であることが判明した。タスク2と3におけるPFS予測性能の比較では、GTVt輪郭の提供は最良の結果を得るためには重要ではなかったことが示唆され、完全な自動手法が利用可能であることが示唆された。これはgtvtの整備の必要性を損なう可能性があり、何千もの潜在的対象を含む再現可能で大規模な放射線学研究への道を開く可能性がある。

This paper presents an overview of the second edition of the HEad and neCK TumOR (HECKTOR) challenge, organized as a satellite event of the 24th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) 2021. The challenge is composed of three tasks related to the automatic analysis of PET/CT images for patients with Head and Neck cancer (H&N), focusing on the oropharynx region. Task 1 is the automatic segmentation of H&N primary Gross Tumor Volume (GTVt) in FDG-PET/CT images. Task 2 is the automatic prediction of Progression Free Survival (PFS) from the same FDG-PET/CT. Finally, Task 3 is the same as Task 2 with ground truth GTVt annotations provided to the participants. The data were collected from six centers for a total of 325 images, split into 224 training and 101 testing cases. The interest in the challenge was highlighted by the strong participation, with 103 registered teams and 448 result submissions. The best methods obtained a Dice Similarity Coefficient (DSC) of 0.7591 in the first task, and a Concordance index (C-index) of 0.7196 and 0.6978 in Tasks 2 and 3, respectively. In all tasks, simplicity of the approach was found to be key to ensure generalization performance. The comparison of the PFS prediction performance in Tasks 2 and 3 suggests that providing the GTVt contour was not crucial to achieve best results, which indicates that fully automatic methods can be used. This potentially obviates the need for GTVt contouring, opening avenues for reproducible and large scale radiomics studies including thousands of potential subjects.
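
The two metrics quoted above can be illustrated with a short sketch. The Dice Similarity Coefficient is computed on flattened binary masks, and the concordance index follows the common Harrell-style pairwise definition in which a higher predicted risk should correspond to earlier progression. The challenge's exact evaluation code, including how it handles ties and censoring, is not described in the abstract, so those details are assumptions, and all data below are made up.

```python
def dice_coefficient(pred, truth):
    """Dice = 2 * |A and B| / (|A| + |B|) for flattened binary masks."""
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        return 1.0  # both masks empty; a common convention, assumed here
    return 2.0 * intersection / total


def concordance_index(times, events, risks):
    """Fraction of comparable pairs ranked correctly by the risk score.

    A pair (i, j) is comparable when subject i had the event (events[i] is
    truthy) and strictly earlier than time j. Tied risks count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier one in a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Made-up segmentation masks (1 = tumor voxel).
pred_mask = [1, 1, 0, 0, 1]
true_mask = [1, 0, 0, 1, 1]
print(dice_coefficient(pred_mask, true_mask))

# Made-up survival data: time to progression, event flag, predicted risk.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.5, 0.1]
print(concordance_index(times, events, risks))
```

In this toy data four of the five comparable pairs are ordered correctly, so the C-index is 0.8, and the masks overlap on two of three foreground voxels each, so the DSC is 2/3.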

翻訳日:2022-01-13 23:49:07 公開日:2022-01-11

# (参考訳) 自動tether-netシステムを用いた一般化デブリ捕獲のためのロバストポリシーの学習

Learning Robust Policies for Generalized Debris Capture with an Automated Tether-Net System ( http://arxiv.org/abs/2201.04180v1 )

ライセンス: CC BY 4.0

Chen Zeng, Grant Hecht, Prajit KrisshnaKumar, Raj K. Shah, Souma Chowdhury and Eleonora M. Botta

(参考訳) チェイサー宇宙船から打ち上げられたテザーネットは、軌道上の大きな宇宙ゴミを捕獲し処分する有望な方法を提供する。このテザネットシステムは、ネット発射・閉鎖制御の性能に影響を及ぼすセンサとアクチュエーターの不確実性の原因を複数抱えている。しかし、初期の信頼性に基づく制御アクション設計の最適化アプローチは、チェイサーに対する様々な発射シナリオと目標(デブリス)状態の一般化を困難かつ計算的に禁止している。本稿では,汎用かつ信頼性の高い制御ポリシを探索するために,ppo(proximal policy optimization)アプローチとネットダイナミクスシミュレーションを統合した強化学習フレームワークを提案する。後者は、ネットベースのターゲットキャプチャのエピソードを評価し、PPO2に対する報酬フィードバックとして機能するキャプチャ品質指標を推定する。ここで、学習されたポリシーは、任意の発射シナリオに基づいて、移動網の状態と目標に基づいて、網閉動作のタイミングをモデル化するように設計されている。状態推定と起動動作に合成不確実性を組み込むために,確率的状態遷移モデルを考える。トレーニング中の顕著な報酬改善に加えて、トレーニングされたポリシは、個々のシナリオで実行される信頼性ベースの最適化によって得られたものに近い(幅広い発射/目標シナリオにわたる)キャプチャパフォーマンスを実証する。

A tether-net launched from a chaser spacecraft provides a promising method to capture and dispose of large space debris in orbit. This tether-net system is subject to several sources of uncertainty in sensing and actuation that affect the performance of its net launch and closing control. Earlier reliability-based optimization approaches to designing control actions, however, remain challenging and computationally prohibitive to generalize over varying launch scenarios and target (debris) states relative to the chaser. To search for a general and reliable control policy, this paper presents a reinforcement learning framework that integrates a proximal policy optimization (PPO2) approach with net dynamics simulations. The latter allow evaluating the episodes of net-based target capture and estimating the capture quality index that serves as the reward feedback to PPO2. Here, the learned policy is designed to model the timing of the net closing action based on the state of the moving net and the target, under any given launch scenario. A stochastic state transition model is considered in order to incorporate synthetic uncertainties in state estimation and launch actuation. Along with notable reward improvement during training, the trained policy demonstrates capture performance (over a wide range of launch/target scenarios) that is close to that obtained with reliability-based optimization run over an individual scenario.

翻訳日:2022-01-13 23:47:09 公開日:2022-01-11

# (参考訳) Hyper Transformer: 教師付き半教師付きFew-Shot学習のためのモデル生成

HyperTransformer: Model Generation for Supervised and Semi-Supervised Few-Shot Learning ( http://arxiv.org/abs/2201.04182v1 )

ライセンス: CC BY 4.0

Andrey Zhmoginov, Mark Sandler, Max Vladymyrov

(参考訳) 本研究では,支援サンプルから直接畳み込みニューラルネットワーク(CNN)の重みを生成する,数ショット学習のためのトランスフォーマーベースモデルであるHyperTransformerを提案する。特定のタスクに対する小さなCNNモデルの依存は、高容量トランスフォーマーモデルによって符号化されるので、大きなタスク空間の複雑さと個々のタスクの複雑さを効果的に分離する。提案手法は, タスク依存型埋め込みの学習が最適ではなく, タスクに関する情報が全てのモデルパラメータを変調できる場合に, より優れた性能が得られるような, 小さなターゲットCNNアーキテクチャにおいて特に有効である。より大きなモデルの場合、最後のレイヤを生成するだけで、最先端のメソッドで得られるものよりも競争性や優れた結果を生み出すことができることが分かりました。最後に,提案手法を,サポートセットの未ラベルサンプルを利用した半教師付きシステムに拡張し,さらに撮影性能を向上する。

In this work we propose a HyperTransformer, a transformer-based model for few-shot learning that generates weights of a convolutional neural network (CNN) directly from support samples. Since the dependence of a small generated CNN model on a specific task is encoded by a high-capacity transformer model, we effectively decouple the complexity of the large task space from the complexity of individual tasks. Our method is particularly effective for small target CNN architectures where learning a fixed universal task-independent embedding is not optimal and better performance is attained when the information about the task can modulate all model parameters. For larger models we discover that generating the last layer alone allows us to produce competitive or better results than those obtained with state-of-the-art methods while being end-to-end differentiable. Finally, we extend our approach to a semi-supervised regime utilizing unlabeled samples in the support set and further improving few-shot performance.

翻訳日:2022-01-13 23:34:31 公開日:2022-01-11

# (参考訳) ディープラーニングに基づく駐車場サービスの動的価格設定

Dynamic Price of Parking Service based on Deep Learning ( http://arxiv.org/abs/2201.04188v1 )

ライセンス: CC BY 4.0

Alejandro Luque-Cerpa, Miguel A. Gutiérrez-Naranjo, Miguel Cárdenas-Montes

(参考訳) 都市における空気質の向上は公共団体の主要な関心事の一つである。この懸念は、空気質と公衆衛生の間の証拠から生じている。この地域の政府機関の主な取り組みは、監視と予測システム、汚染された自動車の禁止、低品質空気の期間の交通制限などである。本研究は,規制駐車場サービスにおける動的価格設定の提案である。駐車場サービスの動的な価格は、低品質のエピソードが予測されたときに自動車の駐車を妨げなければならない。この目的のために、多様なディープラーニング戦略を評価する。彼らは共通して、市内の空気品質に関するラベルを予測するために、集合的空気品質測定を使用する。本提案はマドリード(spain)の経済パラメータと深層学習品質基準を用いて評価される。

The improvement of air quality in urban areas is one of the main concerns of public government bodies. This concern emerges from the evidence linking air quality and public health. Major efforts from government bodies in this area include monitoring and forecasting systems, banning more polluting motor vehicles, and traffic limitations during periods of low air quality. In this work, a proposal for dynamic prices in regulated parking services is presented. The dynamic prices in the parking service must discourage motor vehicle parking when episodes of low air quality are predicted. For this purpose, diverse deep learning strategies are evaluated. They have in common the use of collective air-quality measurements for forecasting labels about air quality in the city. The proposal is evaluated using economic parameters and deep learning quality criteria for Madrid (Spain).

翻訳日:2022-01-13 23:07:37 公開日:2022-01-11

# (参考訳) ニューラルネットワーク容量:エッジダイナミクスによるニューラルネットワーク選択の新しい視点

Neural Capacitance: A New Perspective of Neural Network Selection via Edge Dynamics ( http://arxiv.org/abs/2201.04194v1 )

ライセンス: CC BY 4.0

Chunheng Jiang, Tejaswini Pedapati, Pin-Yu Chen, Yizhou Sun, Jianxi Gao

(参考訳) 下流タスクに適切なトレーニング済みニューラルネットワークを特定するための効率的なモデル選択は、ディープラーニングの基本的な課題である。現在の実践では、パフォーマンス予測のためのモデルトレーニングに高価な計算コストを必要とする。本稿では,学習中のシナプス接続(エッジ)上の制御ダイナミクスを解析し,ニューラルネットワーク選択のための新しいフレームワークを提案する。我々のフレームワークは、ニューラルネットワークトレーニング中のバックプロパゲーションがシナプス接続の動的進化と等価であるという事実に基づいている。したがって、収束ニューラルネットワークは、これらのエッジからなるネットワークシステムの平衡状態と関連付けられる。この目的のために、ニューラルネットワーク$G_A$を有向線グラフ$G_B$に変換するネットワークマッピング$\phi$を構築し、これらエッジ上で定義した$G_A$を$G_A$とする。次に、一握りの初期のトレーニング結果を用いて、下流タスクにおける$g_a$の一般化能力を普遍的に捉える予測指標として、ニューラルキャパシタンスメトリック$\beta_{\rm eff}$を導出する。本フレームワークの微調整性能を評価するために,17種類のイメージネットモデルとcifar10,cifar100,svhn,fashion mnist,birdsを含む5つのベンチマークデータセットを用いて広範な実験を行った。我々のニューラルキャパシタンスメトリックは、初期トレーニング結果のみに基づいたモデル選択の強力な指標であり、最先端の手法よりも効率的である。

Efficient model selection for identifying a suitable pre-trained neural network to a downstream task is a fundamental yet challenging task in deep learning. Current practice requires expensive computational costs in model training for performance prediction. In this paper, we propose a novel framework for neural network selection by analyzing the governing dynamics over synaptic connections (edges) during training. Our framework is built on the fact that back-propagation during neural network training is equivalent to the dynamical evolution of synaptic connections. Therefore, a converged neural network is associated with an equilibrium state of a networked system composed of those edges. To this end, we construct a network mapping $\phi$, converting a neural network $G_A$ to a directed line graph $G_B$ that is defined on those edges in $G_A$. Next, we derive a neural capacitance metric $\beta_{\rm eff}$ as a predictive measure universally capturing the generalization capability of $G_A$ on the downstream task using only a handful of early training results. We carried out extensive experiments using 17 popular pre-trained ImageNet models and five benchmark datasets, including CIFAR10, CIFAR100, SVHN, Fashion MNIST and Birds, to evaluate the fine-tuning performance of our framework. Our neural capacitance metric is shown to be a powerful indicator for model selection based only on early training results and is more efficient than state-of-the-art methods.

翻訳日:2022-01-13 22:54:11 公開日:2022-01-11

# (参考訳) 視覚ロボットのための深層強化学習アルゴリズムのベンチマーク

Benchmarking Deep Reinforcement Learning Algorithms for Vision-based Robotics ( http://arxiv.org/abs/2201.04224v1 )

ライセンス: CC BY 4.0

Swagat Kumar, Hayden Sampson, Ardhendu Behera

(参考訳) 本稿では,2つの視覚に基づくロボット工学問題の解法として,最先端の強化学習アルゴリズムのベンチマーク研究を行う。本研究で検討されているアルゴリズムは、ソフトアクター・クリティック(SAC)、近位ポリシー最適化(PPO)、補間ポリシー勾配(IPG)、およびHER(Hindsight Experience replay)を含む。これらのアルゴリズムの性能は、PyBulletの2つのシミュレーション環境であるKukaDiverseObjectEnvとRacecarZEDGymEnvと比較される。これらの環境における状態観察はRGB画像の形で利用可能であり、アクション空間は連続しており、解決が困難である。基本的には単一ゴール環境であるこれらの問題に対してHERアルゴリズムを実装するのに必要な、いくつかの戦略が提案されている。また,学習過程に空間的および時間的注意を組み込むために,いくつかの特徴抽出アーキテクチャが提案されている。厳密なシミュレーション実験により、これらの成分による改善が確立される。私たちの知る限りでは、上記の2つのビジョンベースのロボット工学の問題に対して、このようなベンチマーク研究は利用できない。

This paper presents a benchmarking study of some of the state-of-the-art reinforcement learning algorithms used for solving two simulated vision-based robotics problems. The algorithms considered in this study include soft actor-critic (SAC), proximal policy optimization (PPO), interpolated policy gradients (IPG), and their variants with Hindsight Experience Replay (HER). The performances of these algorithms are compared on two PyBullet simulation environments, KukaDiverseObjectEnv and RacecarZEDGymEnv. The state observations in these environments are available in the form of RGB images and the action space is continuous, making them difficult to solve. A number of strategies are suggested to provide the intermediate hindsight goals required for implementing the HER algorithm on these problems, which are essentially single-goal environments. In addition, a number of feature extraction architectures are proposed to incorporate spatial and temporal attention in the learning process. Through rigorous simulation experiments, the improvements achieved with these components are established. To the best of our knowledge, such a benchmarking study is not available for the above two vision-based robotics problems, making it a novel contribution to the field.

翻訳日:2022-01-13 22:06:30 公開日:2022-01-11

# (参考訳) ラベルなしデータを活用して分散性能を予測する

Leveraging Unlabeled Data to Predict Out-of-Distribution Performance ( http://arxiv.org/abs/2201.04234v1 )

ライセンス: CC BY 4.0

Saurabh Garg, Sivaraman Balakrishnan, Zachary C. Lipton, Behnam Neyshabur, Hanie Sedghi

(参考訳) 実世界の機械学習のデプロイメントは、パフォーマンス低下を引き起こす可能性のあるソース(トレーニング)とターゲット(テスト)分布のミスマッチによって特徴づけられる。本研究では,ラベル付きソースデータとラベルなしターゲットデータのみを用いて,対象領域の精度を予測する手法を検討する。本稿では,モデルの信頼度に対するしきい値を学習し,モデルの信頼度がそのしきい値を超えるラベルなし例の割合として精度を予測する実践的手法である平均閾値保持信頼度(ATC)を提案する。ATCは、いくつかのモデルアーキテクチャ、分布シフトのタイプ(例えば、合成的な破損、データセットの再現、新しいサブポピュレーション)、データセット(Wilds、ImageNet、Breeds、CIFAR、MNIST)において、以前の方法よりも優れていた。我々の実験では、ATCは目標性能を従来の方法よりも$2$-$4\times$正確に推定する。また,問題の理論的基礎を探究し,一般には,精度の特定は最適な予測器の特定と同程度に困難であり,したがってあらゆる手法の有効性は分布シフトの性質に関する(おそらく明示されていない)仮定に依存することを証明する。最後に,この手法をいくつかのトイ分布上で解析し,それが機能する条件について考察する。

Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions that may cause performance drops. In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data. We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples for which model confidence exceeds that threshold. ATC outperforms previous methods across several model architectures, types of distribution shifts (e.g., due to synthetic corruptions, dataset reproduction, or novel subpopulations), and datasets (Wilds, ImageNet, Breeds, CIFAR, and MNIST). In our experiments, ATC estimates target performance $2$-$4\times$ more accurately than prior methods. We also explore the theoretical foundations of the problem, proving that, in general, identifying the accuracy is just as hard as identifying the optimal predictor and thus, the efficacy of any method rests upon (perhaps unstated) assumptions on the nature of the shift. Finally, analyzing our method on some toy distributions, we provide insights concerning when it works.
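
The ATC idea described above can be sketched in a few lines. The abstract states only that a threshold on model confidence is learned and that target accuracy is predicted as the fraction of unlabeled examples whose confidence exceeds it. The calibration rule used below, choosing the threshold so that the fraction of labeled source examples above it matches the source accuracy, is an assumption made for illustration, as are the function names and the confidence values.

```python
def learn_threshold(source_conf, source_correct):
    """Pick a threshold so that the share of source examples with confidence
    above it matches the source accuracy (assumed calibration rule)."""
    n = len(source_conf)
    n_correct = sum(1 for c in source_correct if c)
    ranked = sorted(source_conf, reverse=True)
    if n_correct == 0:
        return ranked[0]       # nothing is strictly above the maximum
    if n_correct == n:
        return float("-inf")   # everything is above the threshold
    # Midpoint between the last confidence that should be above the
    # threshold and the first that should be below it.
    return (ranked[n_correct - 1] + ranked[n_correct]) / 2.0


def predict_accuracy(target_conf, threshold):
    """Fraction of unlabeled target examples whose confidence exceeds the threshold."""
    return sum(1 for c in target_conf if c > threshold) / len(target_conf)


# Made-up labeled source data: model confidence and whether it was correct.
source_conf = [0.95, 0.9, 0.8, 0.7, 0.6, 0.55]
source_correct = [1, 1, 1, 0, 1, 0]

# Made-up unlabeled target data: confidence only.
target_conf = [0.9, 0.5, 0.62, 0.7, 0.4]

threshold = learn_threshold(source_conf, source_correct)
print(threshold)                                 # about 0.65
print(predict_accuracy(target_conf, threshold))  # 0.4
```

With ties in the confidence values the source fraction above the threshold may not match the source accuracy exactly, so a real implementation would need to decide how to handle them.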

翻訳日:2022-01-13 21:54:35 公開日:2022-01-11

# 2つの誤りが正しい: 化学の精度による化学発見のための転移学習アプローチ

Two Wrongs Can Make a Right: A Transfer Learning Approach for Chemical Discovery with Chemical Accuracy ( http://arxiv.org/abs/2201.04243v1 )

ライセンス: Link先を確認

Chenru Duan, Daniel B. K. Chu, Aditya Nandy, and Heather J. Kulik

(参考訳) 仮想高スループットスクリーニング(VHTS)において,MR特性を有する分子や材料を適切に同定・処理することが,高データの忠実性を実現する上で重要である。しかしながら、ほとんどのVHTSは1つの関数を使って近似密度汎関数理論(DFT)を用いて実行される。多くのMR診断が開発されているにもかかわらず、そのような診断の単一値が化学特性予測に対するMR効果を示す範囲は十分に確立されていない。我々は1万以上の遷移金属錯体(TMC)のMR診断を評価し,有機分子と比較した。 MR診断は,これらの材料空間間でのみ行うことができる。 mr特性が複数のポテンシャルエネルギー面(すなわち、断熱スピン分割、$\delta e_\mathrm{h-l}$、イオン化ポテンシャル、ip)を含む化学的性質(すなわちmr効果)に与える影響を調べることにより、mr効果のキャンセルが蓄積よりも優れていることを観察する。 MR特性の差は特性予測におけるMR効果の予測においてMR特性の総程度よりも重要である。この観測により、我々は、CCSD(T)レベルのアダイバティック$\Delta E_\mathrm{H-L}$とIPを低い理論レベルから直接予測する転送学習モデルを構築した。これらのモデルと不確実性定量化と多レベルモデリングを組み合わせることで、ロバストvhtsの化学精度(すなわち1kcal/mol)を保ちながら、データ取得を少なくとも3倍に加速するマルチプロング戦略を導入する。

Appropriately identifying and treating molecules and materials with significant multi-reference (MR) character is crucial for achieving high data fidelity in virtual high throughput screening (VHTS). Nevertheless, most VHTS is carried out with approximate density functional theory (DFT) using a single functional. Despite development of numerous MR diagnostics, the extent to which a single value of such a diagnostic indicates MR effect on chemical property prediction is not well established. We evaluate MR diagnostics of over 10,000 transition metal complexes (TMCs) and compare to those in organic molecules. We reveal that only some MR diagnostics are transferable across these materials spaces. By studying the influence of MR character on chemical properties (i.e., MR effect) that involves multiple potential energy surfaces (i.e., adiabatic spin splitting, $\Delta E_\mathrm{H-L}$, and ionization potential, IP), we observe that cancellation in MR effect outweighs accumulation. Differences in MR character are more important than the total degree of MR character in predicting MR effect in property prediction. Motivated by this observation, we build transfer learning models to directly predict CCSD(T)-level adiabatic $\Delta E_\mathrm{H-L}$ and IP from lower levels of theory. By combining these models with uncertainty quantification and multi-level modeling, we introduce a multi-pronged strategy that accelerates data acquisition by at least a factor of three while achieving chemical accuracy (i.e., 1 kcal/mol) for robust VHTS.

翻訳日:2022-01-13 15:18:24 公開日:2022-01-11

# チューリングトラップ:人間のような人工知能の約束と危険

The Turing Trap: The Promise & Peril of Human-Like Artificial Intelligence ( http://arxiv.org/abs/2201.04200v1 )

ライセンス: Link先を確認

Erik Brynjolfsson

(参考訳) 1950年、アラン・チューリングは機械が知的かどうかの究極のテストとして模倣ゲームを提案した。それ以来、人間の知性と一致するインテリジェンスを作ることが、何千人もの研究者、エンジニア、起業家の目標となっている。 human-like artificial intelligence(hlai)のメリットには、生産性の高騰、余暇の増加、そしておそらく最も深く、私たちの心をより深く理解することが含まれる。しかし、あらゆるタイプのAIが人間に似ているわけではない。実際、最も強力なシステムの多くは、人間とは大きく異なる。そのため、HLAIの開発とデプロイに過度な注力が、私たちを罠に陥れさせます。機械が人間の労働の代用品になるにつれて、労働者は経済的、政治的交渉力を失い、テクノロジーを制御する人々に依存するようになる。対照的に、AIが人間を模倣するのではなく強化することに焦点を当てている場合、人間は創造された価値の共有を主張する力を保持します。さらに、強化は新たな機能と新製品やサービスを生み出し、最終的には人間のようなAIよりもはるかに多くの価値を生み出す。どちらのタイプのAIも非常に有益だが、現在、技術者やビジネスエグゼクティブ、政策立案者の間では、自動化に対する過度のインセンティブがある。

In 1950, Alan Turing proposed an imitation game as the ultimate test of whether a machine was intelligent: could a machine imitate a human so well that its answers to questions are indistinguishable from a human's? Ever since, creating intelligence that matches human intelligence has implicitly or explicitly been the goal of thousands of researchers, engineers, and entrepreneurs. The benefits of human-like artificial intelligence (HLAI) include soaring productivity, increased leisure, and perhaps most profoundly, a better understanding of our own minds. But not all types of AI are human-like. In fact, many of the most powerful systems are very different from humans. So an excessive focus on developing and deploying HLAI can lead us into a trap. As machines become better substitutes for human labor, workers lose economic and political bargaining power and become increasingly dependent on those who control the technology. In contrast, when AI is focused on augmenting humans rather than mimicking them, then humans retain the power to insist on a share of the value created. Furthermore, augmentation creates new capabilities and new products and services, ultimately generating far more value than merely human-like AI. While both types of AI can be enormously beneficial, there are currently excess incentives for automation rather than augmentation among technologists, business executives, and policymakers.

翻訳日:2022-01-13 15:01:49 公開日:2022-01-11

# 信頼できない知的意思決定支援システムのサブゴールによる説明

Subgoal-Based Explanations for Unreliable Intelligent Decision Support Systems ( http://arxiv.org/abs/2201.04204v1 )

ライセンス: Link先を確認

Devleena Das, Been Kim, Sonia Chernova

(参考訳) インテリジェント意思決定支援(IDS)システムは、人工知能技術を活用して、タスクの意思決定フェーズを通じて人間のユーザを導くレコメンデーションを生成する。しかし、重要な課題は、IDSシステムが完璧ではなく、複雑な実世界のシナリオでは誤った出力を生成したり、完全に動作しない可能性があることである。説明可能なAI計画(XAIP)の分野は、AIシステムをエンドユーザにとってより説明しやすいものにするシーケンシャルな意思決定を行う技術の開発を目指している。批判的に、IDSシステムにXAIP技術を適用する前の作業は、プランナーが提案するプランが常に最適であり、ユーザへの意思決定支援として推奨されるアクションやプランが常に正しいと仮定されている。本研究は,非不正IDSシステムとの初級ユーザインタラクションについて検討し,ユーザがガイダンスに慣れた後に利用できなくなる可能性のある,誤ったアクションを推奨するシステムについて検討する。本報告では,従来のIDS出力を補完し,推奨行動が寄与するサブゴールに関する情報を付加する,新たな説明型,サブゴールベース説明法を提案する。我々は、サブゴールベースの説明がユーザタスクのパフォーマンスの向上、最適なIDSレコメンデーションと最適でないIDSレコメンデーションを区別するユーザ能力の向上、IDS障害時により堅牢なユーザパフォーマンスの実現につながることを実証した。

Intelligent decision support (IDS) systems leverage artificial intelligence techniques to generate recommendations that guide human users through the decision-making phases of a task. However, a key challenge is that IDS systems are not perfect, and in complex real-world scenarios may produce incorrect output or fail to work altogether. The field of explainable AI planning (XAIP) has sought to develop techniques that make the decision making of sequential decision-making AI systems more explainable to end-users. Critically, prior work in applying XAIP techniques to IDS systems has assumed that the plan being proposed by the planner is always optimal, and therefore the action or plan being recommended as decision support to the user is always correct. In this work, we examine novice user interactions with a non-robust IDS system -- one that occasionally recommends the wrong action, and one that may become unavailable after users have become accustomed to its guidance. We introduce a novel explanation type, subgoal-based explanations, for planning-based IDS systems, that supplements traditional IDS output with information about the subgoal toward which the recommended action would contribute. We demonstrate that subgoal-based explanations lead to improved user task performance, improve user ability to distinguish optimal and suboptimal IDS recommendations, are preferred by users, and enable more robust user performance in the case of IDS failure.

翻訳日:2022-01-13 14:58:09 公開日:2022-01-11

# 音楽スコア画像の領域ベースレイアウト解析

Region-based Layout Analysis of Music Score Images ( http://arxiv.org/abs/2201.04214v1 )

ライセンス: Link先を確認

Francisco J. Castellanos, Carlos Garrido-Munoz, Antonio R\'ios-Vila, Jorge Calvo-Zaragoza

(参考訳) レイアウト解析(LA)ステージは、光学音楽認識(OMR)システムの正しい性能において極めて重要である。スタブや歌詞などの興味のある領域を識別し、その内容の書き起こしのために処理しなければならない。ディープラーニングに基づく現代的なアプローチが存在するにもかかわらず、OMRにおけるLAの徹底的な研究は、異なるモデルの精度、異なるドメインへの一般化、あるいはより重要なのは、パイプラインのその後のステージへの影響に関してまだ行われていない。この研究は、異なるニューラルアーキテクチャ、音楽文書タイプ、評価シナリオの実験的な研究により、文学におけるこのギャップを埋めることに焦点を当てている。トレーニングデータの必要性は、実際のシナリオにおけるLAアプローチの効率的な適用を可能にする、新しい半合成データ生成技術の提案につながっている。結果はこう示しています (i)モデルの選択とその性能は、転写過程全体において不可欠である。 (ii)laステージを評価するために一般的に用いられる指標は、omrシステムの最終性能と必ずしも相関しない。 (iii)提案手法は,ラベル付きデータの限られたセットで最先端の成果を実現できる。

The Layout Analysis (LA) stage is of vital importance to the correct performance of an Optical Music Recognition (OMR) system. It identifies the regions of interest, such as staves or lyrics, which must then be processed in order to transcribe their content. Despite the existence of modern approaches based on deep learning, an exhaustive study of LA in OMR has not yet been carried out with regard to the precision of different models, their generalization to different domains or, more importantly, their impact on subsequent stages of the pipeline. This work focuses on filling this gap in literature by means of an experimental study of different neural architectures, music document types and evaluation scenarios. The need for training data has also led to a proposal for a new semi-synthetic data generation technique that enables the efficient applicability of LA approaches in real scenarios. Our results show that: (i) the choice of the model and its performance are crucial for the entire transcription process; (ii) the metrics commonly used to evaluate the LA stage do not always correlate with the final performance of the OMR system, and (iii) the proposed data-generation technique enables state-of-the-art results to be achieved with a limited set of labeled data.

翻訳日:2022-01-13 14:57:24 公開日:2022-01-11

# インシデント1M:自然災害・被害・インシデントを含む大規模画像データセット

Incidents1M: a large-scale dataset of images with natural disasters, damage, and incidents ( http://arxiv.org/abs/2201.04236v1 )

ライセンス: Link先を確認

Ethan Weber, Dim P. Papadopoulos, Agata Lapedriza, Ferda Ofli, Muhammad Imran, Antonio Torralba

(参考訳) 洪水、竜巻、山火事などの自然災害は、地球が地球温暖化に陥るにつれてますます広まりつつある。事故の発生時期や発生時期を予測することは困難であり、破壊的な出来事によって危険にさらされている人々の命を救うために、時折緊急対応が重要となる。幸いなことに、このような状況ではテクノロジーが役割を担います。ソーシャルメディア投稿は、災害の進行と余波を理解するために低レイテンシデータソースとして使用できるが、このデータを解析するのは、自動化された方法なしでは面倒である。以前の研究はテキストベースのフィルタリングが中心だったが、画像とビデオベースのフィルタリングはほとんど未調査のままである。本研究では,43のインシデントと49のカテゴリを含む977,088の画像を含む大規模マルチラベルデータセットであるインシデント1Mデータセットを提案する。データセットの構築、統計、潜在的なバイアスの詳細、インシデント検出モデルの導入とトレーニング、flickrやtwitter上の数百万の画像に対するイメージフィルタリング実験を実施します。また,人道支援のためのコンピュータビジョンにおける今後の作業を促進するために,インシデント分析に関するいくつかの応用について述べる。コード、データ、モデルはhttp://incidentsdataset.csail.mit.eduで利用可能である。

Natural disasters, such as floods, tornadoes, or wildfires, are increasingly pervasive as the Earth undergoes global warming. It is difficult to predict when and where an incident will occur, so timely emergency response is critical to saving the lives of those endangered by destructive events. Fortunately, technology can play a role in these situations. Social media posts can be used as a low-latency data source to understand the progression and aftermath of a disaster, yet parsing this data is tedious without automated methods. Prior work has mostly focused on text-based filtering, yet image and video-based filtering remains largely unexplored. In this work, we present the Incidents1M Dataset, a large-scale multi-label dataset which contains 977,088 images, with 43 incident and 49 place categories. We provide details of the dataset construction, statistics and potential biases; introduce and train a model for incident detection; and perform image-filtering experiments on millions of images on Flickr and Twitter. We also present some applications on incident analysis to encourage and enable future work in computer vision for humanitarian aid. Code, data, and models are available at http://incidentsdataset.csail.mit.edu.

翻訳日:2022-01-13 14:57:07 公開日:2022-01-11

# デジタル人間アバターのバーチャルコプレゼンスへの応用に関する調査研究

A Survey on Applications of Digital Human Avatars toward Virtual Co-presence ( http://arxiv.org/abs/2201.04168v1 )

ライセンス: Link先を確認

Matthew Korban, Xin Li

(参考訳) 本稿では,対話型仮想コプレゼンス(VCP)環境に対するデジタルアバターの構築と利用について検討する。我々は, VCP環境構築技術の発展と, 人工知能(AI)とコンピュータグラフィックスの進歩が, VCP環境の品質に与える影響を評価する。文献における様々な手法を応用と方法論に基づいて分類し,その応用,貢献,限界に基づく様々なグループと戦略を比較した。また、デジタル人間のアバターではなく、他の形態の人間の表現がVCP環境で利用されるというアプローチについても、簡単な議論がなされている。我々の目標は、アバターベースのVCP環境を構築するための様々なアプローチを調査する文献レビューが不足している研究領域のギャップを埋めることである。 VCPやバーチャルリアリティ(VR)環境における人間の表現に関する今後の研究に役立つと期待する。我々の知る限りでは、アバターベースのVCP環境を調査する最初の調査である。具体的には,アバターベース手法の分類手法が新たに提案されている。

This paper investigates different approaches to building and using digital human avatars toward interactive Virtual Co-presence (VCP) environments. We evaluate the evolution of technologies for creating VCP environments and how advances in Artificial Intelligence (AI) and Computer Graphics affect the quality of VCP environments. We categorize different methods in the literature based on their applications and methodology and compare various groups and strategies based on their applications, contributions, and limitations. We also briefly discuss approaches in which other forms of human representation, rather than digital human avatars, have been utilized in VCP environments. Our goal is to fill the gap in the research domain where there is a lack of literature reviews investigating different approaches for creating avatar-based VCP environments. We hope this study will be useful for future research involving human representation in VCP or Virtual Reality (VR) environments. To the best of our knowledge, it is the first survey that investigates avatar-based VCP environments. Specifically, the categorization methodology suggested in this paper for avatar-based methods is new.

翻訳日:2022-01-13 14:50:38 公開日:2022-01-11

# MDPose:WiFiマイクロドップラー信号を用いた人骨格運動再構成

MDPose: Human Skeletal Motion Reconstruction Using WiFi Micro-Doppler Signatures ( http://arxiv.org/abs/2201.04212v1 )

ライセンス: Link先を確認

Chong Tang, Wenda Li, Shelly Vishwakarma, Fangzhan Shi, Simon Julier, Kevin Chetty

(参考訳) 光学センサに基づくモーショントラッキングシステムは、通常、照明条件の悪さ、閉塞、カバー範囲の制限などの問題に苦しめられ、プライバシーの懸念が高まる。最近では、無線周波数(RF)ベースの商用WiFiデバイスによるアプローチが登場し、プライバシーを保護しながら、低コストでユビキタスなセンシングを提供する。しかし、Range-Doppler SpectrogramsのようなRFセンシングシステムの出力は直感的に人間の動きを表現することができず、通常はさらなる処理を必要とする。本研究では,WiFiマイクロドップラーシグネチャに基づくヒト骨格運動再建のための新しいフレームワークであるMDPoseを提案する。従来のRFセンシング出力の解釈をより理解しやすい方法で支援できる17個のキーポイントを持つ骨格モデルを再構築することで、人間の活動を追跡する効果的なソリューションを提供する。第一に、特徴抽出に影響を及ぼす可能性のある不要なノイズを除去し、弱いドップラーシグネチャを強化するために、デノイジングアルゴリズムが実装されている。次に、畳み込みニューラルネットワーク(cnn)-リカレントニューラルネットワーク(rnn)アーキテクチャを用いて、クリーンなマイクロドップラー署名から時間空間依存性を学習し、キーポイントの速度情報を復元する。最後に、スケルトンの初期状態を推定し、エラーの増加を制限するためにポーズ最適化機構を用いる。我々は,MDPoseの性能を示すために,複数の被験者を用いて様々な環境で総合的な実験を行い,29.4mm平均キーポイント位置における絶対誤差を報告した。

Motion tracking systems based on optical sensors often suffer from issues such as poor lighting conditions, occlusion, and limited coverage, and may raise privacy concerns. More recently, radio frequency (RF)-based approaches using commercial WiFi devices have emerged, offering low-cost ubiquitous sensing whilst preserving privacy. However, the output of an RF sensing system, such as Range-Doppler spectrograms, cannot represent human motion intuitively and usually requires further processing. In this study, MDPose, a novel framework for human skeletal motion reconstruction based on WiFi micro-Doppler signatures, is proposed. It provides an effective solution to track human activities by reconstructing a skeleton model with 17 key points, which can assist with the interpretation of conventional RF sensing outputs in a more understandable way. Specifically, MDPose has various incremental stages to gradually address a series of challenges: First, a denoising algorithm is implemented to remove any unwanted noise that may affect the feature extraction and enhance weak Doppler signatures. Second, a convolutional neural network (CNN)-recurrent neural network (RNN) architecture is applied to learn temporal-spatial dependency from clean micro-Doppler signatures and restore the key points' velocity information. Finally, a pose optimising mechanism is employed to estimate the initial state of the skeleton and to limit the increase of error. We have conducted comprehensive tests in a variety of environments using numerous subjects with a single receiver radar system to demonstrate the performance of MDPose, and report a 29.4 mm mean absolute error over all key-point positions, which outperforms state-of-the-art RF-based pose estimation systems.

翻訳日:2022-01-13 14:50:23 公開日:2022-01-11

# SmartDet:モバイルオブジェクト検出のためのエッジタスクオフロードのコンテキストアウェア動的制御

SmartDet: Context-Aware Dynamic Control of Edge Task Offloading for Mobile Object Detection ( http://arxiv.org/abs/2201.04235v1 )

ライセンス: Link先を確認

Davide Callegaro and Francesco Restuccia and Marco Levorato

(参考訳) モバイルデバイスはますます、重要なタスクを実行するためにディープニューラルネットワーク(DNN)を介してオブジェクト検出(OD)に依存している。複雑さが高いため、これらのDNNの実行には過剰な時間とエネルギーが必要である。低複雑さオブジェクトトラッキング(OT)はODで使用することができ、後者はトラッキングのための"フレッシュ"参照を生成するために定期的に適用される。しかし、odで処理されたフレームには大きな遅延が発生し、基準が時代遅れとなり追跡品質が低下する可能性がある。本稿では、エッジコンピューティングをこの文脈で使用し、大規模なODレイテンシに耐性のある(モバイルデバイスで)並列OTとOD(エッジサーバで)プロセスを確立することを提案する。過度のod遅延に対するシステムのレジリエンスを向上させる新しいトラッキング機構であるkatch-upを提案する。しかし、Katch-Upは性能が大幅に向上する一方、モバイルデバイスの計算負荷も増大する。そこで我々は,資源利用とOD性能のトレードオフの制御を学習する深層強化学習(DRL)に基づく,低複雑さのコントローラであるSmartDetを設計する。 smartdetは、現在のビデオコンテンツと現在のネットワーク条件に関連する入力コンテキスト関連情報を取り、odオフロードの頻度とタイプを最適化し、katch-up利用を最適化する。我々は,JetSon Nanoをモバイルデバイスとして,GTX 980 Tiをエッジサーバとして,Wi-Fiリンクを介して接続した実世界のテストベッド上でSmartDetを広範囲に評価した。実験結果によると、SmartDetは、平均的リコール(mAR)とリソース使用量という、トラッキングパフォーマンスの最適なバランスを実現している。完全なKatch-Upusageと最大チャネル使用率を持つベースラインに関しては、チャネルの50%削減とKatch-Upに関連する30%の電力リソースを使用しながら、mARを4%増加させています。最小限の資源を用いた固定戦略では、フレームの1/3でKatch-Upを用いてmARを20%増加させる。

Mobile devices increasingly rely on object detection (OD) through deep neural networks (DNNs) to perform critical tasks. Due to their high complexity, the execution of these DNNs requires excessive time and energy. Low-complexity object tracking (OT) can be used with OD, where the latter is periodically applied to generate "fresh" references for tracking. However, the frames processed with OD incur large delays, which may make the reference outdated and degrade tracking quality. Herein, we propose to use edge computing in this context, and establish parallel OT (at the mobile device) and OD (at the edge server) processes that are resilient to large OD latency. We propose Katch-Up, a novel tracking mechanism that improves the system resilience to excessive OD delay. However, while Katch-Up significantly improves performance, it also increases the computing load of the mobile device. Hence, we design SmartDet, a low-complexity controller based on deep reinforcement learning (DRL) that learns to control the trade-off between resource utilization and OD performance. SmartDet takes as input context information about the current video content and the current network conditions to optimize the frequency and type of OD offloading, as well as Katch-Up utilization. We extensively evaluate SmartDet on a real-world testbed composed of a Jetson Nano as mobile device and a GTX 980 Ti as edge server, connected through a Wi-Fi link. Experimental results show that SmartDet achieves an optimal balance between tracking performance, measured as mean Average Recall (mAR), and resource usage. With respect to a baseline with full Katch-Up usage and maximum channel usage, we still increase mAR by 4% while using 50% less of the channel and 30% power resources associated with Katch-Up. With respect to a fixed strategy using minimal resources, we increase mAR by 20% while using Katch-Up on 1/3 of the frames.

翻訳日:2022-01-13 14:19:09 公開日:2022-01-11

# (参考訳) ヘイト音声識別のための特徴抽出に基づくモデル

A Feature Extraction based Model for Hate Speech Identification ( http://arxiv.org/abs/2201.04227v1 )

ライセンス: CC BY 4.0

Salar Mohtaj, Vera Schmitt, Sebastian M\"oller

(参考訳) ネット上でヘイトスピーチを検出することは重要な課題となり、傷つき、わいせつ、侮辱的コンテンツといった攻撃的な言語は、疎外された人々やグループを傷つける可能性がある。本稿では,インド・ヨーロッパ言語2021におけるヘイトスピーチと攻撃的コンテンツの識別に関する共通タスクのタスク1a,1bについて,tu berlinチームによる実験と結果について述べる。異なる自然言語処理モデルの成功は、競争を通じて各サブタスクに対して評価される。我々は,単語・文字レベルにおける再帰ニューラルネットワークに基づく異なるモデルと,競合によって提供されたデータセットに基づくbertに基づくトランスファー学習アプローチをテストした。実験に使用した実験モデルのうち、転送学習に基づくモデルは両方のサブタスクで最良の結果を得た。

The detection of hate speech online has become an important task, as offensive language such as hurtful, obscene and insulting content can harm marginalized people or groups. This paper presents the TU Berlin team's experiments and results on tasks 1A and 1B of the shared task on hate speech and offensive content identification in Indo-European languages 2021. The success of different Natural Language Processing models is evaluated for the respective subtasks throughout the competition. We tested different models based on recurrent neural networks at the word and character levels, and transfer learning approaches based on BERT, on the dataset provided by the competition. Among the models tested in the experiments, the transfer learning-based models achieved the best results in both subtasks.

翻訳日:2022-01-13 14:16:26 公開日:2022-01-11

# 統計学と機械学習でマネーロンダリングと戦う - 序文とレビュー

Fighting Money-Laundering with Statistics and Machine Learning: An Introduction and Review ( http://arxiv.org/abs/2201.04207v1 )

ライセンス: Link先を確認

Rasmus Jensen and Alexandros Iosifidis

(参考訳) マネーロンダリングは深刻な世界的な問題だ。それでも、このトピックに関する統計的および機械学習の研究はほとんどない。本稿では,銀行におけるマネーロンダリング対策に着目する。この分野の既存の研究を整理するために,統一的な用語を提案し,文献のレビューを行う。これは2つの中心的なタスクを中心に構成されている。 (i)クライアントのリスク・プロファイリング (ii)不審な行動顧客リスクプロファイリングは、診断、すなわちリスク要因の発見と説明の努力によって特徴づけられる。一方、突発的な行動フラグングは、開示されていない特徴と手作りのリスク指標によって特徴付けられる。最後に,今後の研究の方向性について述べる。大きな課題のひとつは、公開データセットの欠如だ。これは、合成データ生成によって対処される可能性がある。その他の研究の方向性としては、半教師付き深層学習、解釈可能性、結果の公平性などがある。

Money laundering is a profound, global problem. Nonetheless, there is little statistical and machine learning research on the topic. In this paper, we focus on anti-money laundering in banks. To help organize existing research in the field, we propose a unifying terminology and provide a review of the literature. This is structured around two central tasks: (i) client risk profiling and (ii) suspicious behavior flagging. We find that client risk profiling is characterized by diagnostics, i.e., efforts to find and explain risk factors. Suspicious behavior flagging, on the other hand, is characterized by non-disclosed features and hand-crafted risk indices. Finally, we discuss directions for future research. One major challenge is the lack of public data sets. This may, potentially, be addressed by synthetic data generation. Other possible research directions include semi-supervised and deep learning, interpretability and fairness of the results.

翻訳日:2022-01-13 14:01:19 公開日:2022-01-11

# (参考訳) 悪天候のビジョン:各種物体検出器を用いたサイクロンGANによる自律走行の堅牢な認識

Vision in adverse weather: Augmentation using CycleGANs with various object detectors for robust perception in autonomous racing ( http://arxiv.org/abs/2201.03246v2 )

ライセンス: CC BY 4.0

Izzeddin Teeti, Valentina Musat, Salman Khan, Alexander Rast, Fabio Cuzzolin, Andrew Bradley

(参考訳) 自律運転システムでは、環境からの特徴や物体を識別する認識が重要である。自律レースでは、高速と小さなマージンは迅速かつ正確な検知システムを必要とする。レース中、天候は突然変化し、認識が著しく低下し、非効率な操作が生じる。悪天候の検出を改善するために、ディープラーニングベースのモデルは、通常、そのような状況でキャプチャされた広範なデータセットを必要とする。しかし、最近のCycleGANアーキテクチャは、複数の気象条件下で非常に現実的なシーンを合成することができる。そこで本研究では, 夜間条件下での5つの最先端検出器のうち4つを平均42.7と4.4mAPのパーセンテージで改善するため, 自律レースにおける合成悪条件データセット(CycleGANを用いた)を用いたアプローチを提案する。さらに,5つの物体検出器の比較分析を行い,自律走行時に使用する検出器の最適ペアリングとトレーニングデータの同定を行った。

In an autonomous driving system, perception - identification of features and objects from the environment - is crucial. In autonomous racing, high speeds and small margins demand rapid and accurate detection systems. During the race, the weather can change abruptly, causing significant degradation in perception, resulting in ineffective manoeuvres. In order to improve detection in adverse weather, deep-learning-based models typically require extensive datasets captured in such conditions - the collection of which is a tedious, laborious, and costly process. However, recent developments in CycleGAN architectures allow the synthesis of highly realistic scenes in multiple weather conditions. To this end, we introduce an approach of using synthesised adverse condition datasets in autonomous racing (generated using CycleGAN) to improve the performance of four out of five state-of-the-art detectors by an average of 42.7 and 4.4 mAP percentage points in the presence of night-time conditions and droplets, respectively. Furthermore, we present a comparative analysis of five object detectors - identifying the optimal pairing of detector and training data for use during autonomous racing in challenging conditions.

翻訳日:2022-01-13 12:38:25 公開日:2022-01-11

# (参考訳) ミラーラーニング:政策最適化の統一的枠組み

Mirror Learning: A Unifying Framework of Policy Optimisation ( http://arxiv.org/abs/2201.02373v2 )

ライセンス: CC BY 4.0

Jakub Grudzien Kuba, Christian Schroeder de Witt, Jakob Foerster

(参考訳) 総合政策改善(GPI)と信頼領域学習(TRL)は、マルコフ決定プロセス(MDP)のコアモデルとして機能する、現代強化学習(RL)における主要なフレームワークである。残念なことに、それらの数学的形式は修正に敏感であるため、それらを実装する実用的なインスタンス化は自動的に改善保証を継承しない。その結果、利用可能な厳密なMDPソルバーのスペクトルは狭い。実際、TRPOやPPOのような多くの最先端(SOTA)アルゴリズムは収束することが証明されていない。本稿では,RL問題に対する一般解である「ミラー学習(mirror learning)」を提案する。我々は,GPI と TRL は,モノトニック改善特性を誇示し,最適ポリシーに収束する,このはるかに大きなアルゴリズム空間内の小さな点であることを明らかにした。 RLのための事実上全てのSOTAアルゴリズムがミラー学習の例であり、その経験的性能は近似的な類似ではなく理論的性質の結果であることを示す。興味深いことに、ミラー学習は、収束保証を伴う政策学習手法の全く新しい空間を開くことを示す。

General policy improvement (GPI) and trust-region learning (TRL) are the predominant frameworks within contemporary reinforcement learning (RL), which serve as the core models for solving Markov decision processes (MDPs). Unfortunately, in their mathematical form, they are sensitive to modifications, and thus, the practical instantiations that implement them do not automatically inherit their improvement guarantees. As a result, the spectrum of available rigorous MDP-solvers is narrow. Indeed, many state-of-the-art (SOTA) algorithms, such as TRPO and PPO, are not proven to converge. In this paper, we propose mirror learning -- a general solution to the RL problem. We reveal GPI and TRL to be but small points within this far greater space of algorithms which boasts the monotonic improvement property and converges to the optimal policy. We show that virtually all SOTA algorithms for RL are instances of mirror learning, and thus suggest that their empirical performance is a consequence of their theoretical properties, rather than of approximate analogies. Excitingly, we show that mirror learning opens up a whole new space of policy learning methods with convergence guarantees.

翻訳日:2022-01-13 00:34:26 公開日:2022-01-11

# (参考訳) セグメンテーション性能に対する事前ベース損失の影響:ベンチマーク

Effect of Prior-based Losses on Segmentation Performance: A Benchmark ( http://arxiv.org/abs/2201.02428v3 )

ライセンス: CC BY 4.0

Rosana El Jurdi, Caroline Petitjean, Veronika Cheplygina, Paul Honeine, Fahed Abdallah

(参考訳) 今日、深層畳み込みニューラルネットワーク(CNN)は、様々な画像モードやタスクに基づいて、医用画像セグメンテーションの最先端のパフォーマンスを実証している。初期の成功にもかかわらず、セグメンテーションネットワークは依然として解剖学的に異常なセグメンテーションを生成し、オブジェクト境界付近に穴や不正確さがある。解剖学的可能性を強化するために、近年の研究は、損失関数の制約として、物体形状や境界などの事前知識を取り入れることに焦点を当てている。以前の統合は、基幹領域から抽出された再構成された表現を低レベル、または臓器の形状や大きさなどの外部医療情報を高レベルに表すことができる。過去数年間、事前の損失は、アーキテクチャに依存しながら専門家の知識の統合を可能にしているため、研究分野への関心が高まった。しかしながら、さまざまな医療画像の課題やタスクにおける事前ベース損失の多様性を考えると、どのデータセットに最適な損失を識別することが困難になっている。本稿では,医療画像分割における最近の先行的損失のベンチマークについて述べる。主な目的は、特定のタスクやデータセットに与えられた損失を選択するための直感を提供することである。この目的のために、4つの低レベルおよび高レベルの事前ベース損失が選択される。評価された損失は、Decathlon、ISLES、WMHチャレンジなど、さまざまな医療画像セグメンテーション課題から8つの異なるデータセットで検証される。その結果、低レベルの事前ベース損失はデータセット特性に関わらずDice損失ベースラインよりも性能が向上することを保証できるが、高レベルの事前ベース損失はデータ特性に応じて解剖学的信頼性が向上することが示された。

Today, deep convolutional neural networks (CNNs) have demonstrated state-of-the-art performance for medical image segmentation across various imaging modalities and tasks. Despite early success, segmentation networks may still generate anatomically aberrant segmentations, with holes or inaccuracies near the object boundaries. To enforce anatomical plausibility, recent research studies have focused on incorporating prior knowledge, such as object shape or boundary, as constraints in the loss function. The integrated prior can be low-level, referring to reformulated representations extracted from the ground-truth segmentations, or high-level, representing external medical information such as the organ's shape or size. Over the past few years, prior-based losses have attracted rising interest in the research field since they allow integration of expert knowledge while still being architecture-agnostic. However, given the diversity of prior-based losses on different medical imaging challenges and tasks, it has become hard to identify which loss works best for which dataset. In this paper, we establish a benchmark of recent prior-based losses for medical image segmentation. The main objective is to provide intuition about which losses to choose given a particular task or dataset. To this end, four low-level and high-level prior-based losses are selected. The considered losses are validated on 8 different datasets from a variety of medical image segmentation challenges including the Decathlon, the ISLES and the WMH challenge. Results show that whereas low-level prior-based losses can guarantee an increase in performance over the Dice loss baseline regardless of the dataset characteristics, high-level prior-based losses can increase anatomical plausibility as per data characteristics.
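
To make the terms concrete, the following sketch shows a soft Dice loss (the baseline named in the abstract) next to a simple size-based penalty of the "high-level prior" kind. It is an illustration under stated assumptions and not one of the four benchmarked losses: the function `size_prior_penalty`, its bounds, and the weight `lam` are hypothetical, and segmentations are flattened to plain lists of per-pixel foreground probabilities.

```python
def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss on flattened masks: 1 - 2|P∩T| / (|P| + |T|)."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)


def size_prior_penalty(pred, min_size, max_size):
    """Hypothetical high-level prior: penalize a predicted foreground
    area that falls outside an expected [min_size, max_size] range."""
    size = sum(pred)
    if size < min_size:
        gap = min_size - size
    elif size > max_size:
        gap = size - max_size
    else:
        gap = 0.0
    # Normalize by the squared number of pixels to keep the scale small.
    return (gap ** 2) / (len(pred) ** 2)


def combined_loss(pred, target, min_size, max_size, lam=0.1):
    """Dice baseline plus a weighted prior term (lam is an assumption)."""
    return dice_loss(pred, target) + lam * size_prior_penalty(pred, min_size, max_size)


# Toy 4-pixel example, invented for illustration only.
pred = [1.0, 1.0, 0.0, 0.0]
target = [1.0, 0.0, 0.0, 0.0]
loss = dice_loss(pred, target)
penalty = size_prior_penalty(pred, min_size=1.0, max_size=1.0)
```

The prior term is added to the loss rather than built into the network, which is what keeps this style of constraint architecture-agnostic.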

翻訳日:2022-01-13 00:05:16 公開日:2022-01-11

# (参考訳) エージェントエージェント時間決定における一般値関数を持つパブロフ信号伝達

Pavlovian Signalling with General Value Functions in Agent-Agent Temporal Decision Making ( http://arxiv.org/abs/2201.03709v1 )

ライセンス: CC BY 4.0

Andrew Butcher, Michael Bradley Johanson, Elnaz Davoodi, Dylan J. A. Brenneis, Leslie Acker, Adam S. R. Parker, Adam White, Joseph Modayil, Patrick M. Pilarski

(参考訳) 本稿では,パブロフ信号の多面的研究に寄与し,あるエージェントが他のエージェントから意思決定を通知する時間的拡張予測プロセスを提案する。信号は時間とタイミングに密接に関連している。信号の生成と受信を行う際、人間や他の動物は時間を表し、過去の出来事から時間を決定し、将来の刺激まで時間を予測し、時間とともに広がるパターンを認識し、生成することが知られている。時間的プロセスの違いが学習エージェント間の協調とシグナル伝達にどのように影響するかを,Frost Hollowと呼ばれる部分的に観測可能な意思決定ドメインを導入することによって検討する。このドメインでは、予測学習エージェントと強化学習エージェントとを、時間的ハザードを避けながらスパース報酬を得るための2部意思決定システムに結合する。 7状態線形歩行における機械エージェントの相互作用と,仮想現実環境における人間と機械の相互作用である。その結果,パブロフ信号の学習速度,時間的表現の違いがエージェントエージェント協調に与える影響,時間的エイリアシングがエージェントエージェントと人間エージェントの相互作用にどう影響するかが示された。主な貢献として、固定信号のパラダイムと2つのエージェント間の完全適応通信学習の自然なブリッジとしてパブロフ信号を確立する。さらに,高速な連続予測学習とエージェント受信信号の性質に関する最小限の制約を特徴とする,固定的な信号処理からこの適応的信号処理を計算的に構築する方法を示す。この結果から,強化学習エージェント間のコミュニケーション学習への実践的で建設的な道筋が示唆された。

In this paper, we contribute a multi-faceted study into Pavlovian signalling -- a process by which learned, temporally extended predictions made by one agent inform decision-making by another agent. Signalling is intimately connected to time and timing. In service of generating and receiving signals, humans and other animals are known to represent time, determine time since past events, predict the time until a future stimulus, and both recognize and generate patterns that unfold in time. We investigate how different temporal processes impact coordination and signalling between learning agents by introducing a partially observable decision-making domain we call the Frost Hollow. In this domain, a prediction learning agent and a reinforcement learning agent are coupled into a two-part decision-making system that works to acquire sparse reward while avoiding time-conditional hazards. We evaluate two domain variations: machine agents interacting in a seven-state linear walk, and human-machine interaction in a virtual-reality environment. Our results showcase the speed of learning for Pavlovian signalling, the impact that different temporal representations do (and do not) have on agent-agent coordination, and how temporal aliasing impacts agent-agent and human-agent interactions differently. As a main contribution, we establish Pavlovian signalling as a natural bridge between fixed signalling paradigms and fully adaptive communication learning between two agents. We further show how to computationally build this adaptive signalling process out of a fixed signalling process, characterized by fast continual prediction learning and minimal constraints on the nature of the agent receiving signals. Our results therefore suggest an actionable, constructivist path towards communication learning between reinforcement learning agents.

翻訳日:2022-01-12 19:25:53 公開日:2022-01-11

# (参考訳) 予算のオンライン変更点検出

Online Changepoint Detection on a Budget ( http://arxiv.org/abs/2201.03710v1 )

ライセンス: CC BY 4.0

Zhaohui Wang, Xiao Lin, Abhinav Mishra, Ram Sriharsha

(参考訳) 変更ポイントは、基礎となるデータの分布の急激なバリエーションである。データストリームの変更を検出することは、多くのアプリケーションにとって重要な問題である。本稿では,従来の観測回数とは無関係に,記憶条件と最悪の計算複雑性の両方を考慮し,オンライン環境で動作する変更点検出アルゴリズムに関心がある。そこで本研究では,オフライン・チェンジポイント検出アルゴリズムと好適な比較を行うとともに,厳密に制約された計算モデルで動作するオンライン・チェンジポイント検出アルゴリズムを提案する。さらに,これらのアルゴリズムに対する簡易なオンラインハイパーパラメータ自動チューニング手法を提案する。

Changepoints are abrupt variations in the underlying distribution of data. Detecting changes in a data stream is an important problem with many applications. In this paper, we are interested in changepoint detection algorithms that operate in an online setting, in the sense that both their storage requirements and their worst-case computational complexity per observation are independent of the number of previous observations. We propose an online changepoint detection algorithm for both univariate and multivariate data which compares favorably with offline changepoint detection algorithms while also operating in a strictly more constrained computational model. In addition, we present a simple online hyperparameter auto-tuning technique for these algorithms.
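To make the "independent of the number of previous observations" constraint concrete, the following is a minimal sketch of a detector whose state never grows with the stream. It is a textbook two-sided CUSUM with a running mean, not the algorithm proposed in the paper; the class name `OnlineCusum` and the `drift` and `threshold` defaults are illustrative assumptions.

```python
class OnlineCusum:
    """Two-sided CUSUM detector: O(1) memory and O(1) work per observation.

    Illustrative only; this is NOT the paper's algorithm.
    """

    def __init__(self, drift=0.5, threshold=5.0):
        self.drift = drift          # slack subtracted at each step (assumed default)
        self.threshold = threshold  # alarm level (assumed default)
        self.reset()

    def reset(self):
        self.n = 0       # observations in the current segment
        self.mean = 0.0  # running mean of the current segment
        self.pos = 0.0   # cumulative evidence for an upward shift
        self.neg = 0.0   # cumulative evidence for a downward shift

    def update(self, x):
        """Consume one observation; return True if a change is flagged."""
        if self.n == 0:
            # First observation of a segment only initializes the mean.
            self.n, self.mean = 1, float(x)
            return False
        dev = x - self.mean
        self.pos = max(0.0, self.pos + dev - self.drift)
        self.neg = max(0.0, self.neg - dev - self.drift)
        if self.pos > self.threshold or self.neg > self.threshold:
            self.reset()  # start a fresh segment after the alarm
            return True
        self.n += 1
        self.mean += dev / self.n  # incremental mean update
        return False


stream = [0.0] * 20 + [3.0] * 20
detector = OnlineCusum()
alarms = [i for i, x in enumerate(stream) if detector.update(x)]
```

The detector keeps four numbers regardless of how long the stream is, which is the property the abstract asks of an online method; a real system would also need the hyperparameter tuning the paper addresses.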

翻訳日:2022-01-12 19:09:38 公開日:2022-01-11

# (参考訳) フェデレーション学習における部分モデル平均化:パフォーマンス保証とメリット

Partial Model Averaging in Federated Learning: Performance Guarantees and Benefits ( http://arxiv.org/abs/2201.03789v1 )

ライセンス: CC BY 4.0

Sunwoo Lee, Anit Kumar Sahu, Chaoyang He, and Salman Avestimehr

(参考訳) 周期モデル平均化(FedAvg)を用いた局所確率勾配降下(SGD)は、フェデレートラーニングにおける基礎的アルゴリズムである。アルゴリズムは独立して複数のワーカー上でSGDを実行し、すべてのワーカーに対して定期的にモデルを平均化する。しかし、局所的なSGDが多くのワーカーと共に実行されると、周期的な平均化はワーカー間で大きなモデル差を引き起こし、グローバルな損失は緩やかに収束する。最近の高度な最適化手法は、非IID設定に焦点をあてた問題に対処しているが、根底にある周期モデル平均化によるモデル差の問題はまだ残っている。フェデレートラーニングにおけるモデルの相違を緩和する部分モデル平均化フレームワークを提案する。部分平均化により、局所モデル同士がパラメータ空間に近接することを奨励し、より効果的にグローバル損失を最小化することができる。一定回数の反復と多数のワーカー(128)が与えられた場合、部分的平均は周期的全平均よりも最大2.2%高い検証精度が得られる。

Local Stochastic Gradient Descent (SGD) with periodic model averaging (FedAvg) is a foundational algorithm in Federated Learning. The algorithm independently runs SGD on multiple workers and periodically averages the model across all the workers. When local SGD runs with many workers, however, the periodic averaging causes a significant model discrepancy across the workers, making the global loss converge slowly. While recent advanced optimization methods tackle the issue with a focus on non-IID settings, the model discrepancy issue caused by the underlying periodic model averaging remains. We propose a partial model averaging framework that mitigates the model discrepancy issue in Federated Learning. Partial averaging encourages the local models to stay close to each other in parameter space, which makes it possible to minimize the global loss more effectively. Given a fixed number of iterations and a large number of workers (128), partial averaging achieves up to 2.2% higher validation accuracy than periodic full averaging.
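The averaging step itself can be sketched in a few lines. The abstract does not say how the partial scheme chooses what to average, so the `indices` argument below, which synchronizes only a chosen subset of coordinates, is an assumption made for illustration. The function names are likewise hypothetical.

```python
def average_models(models, indices=None):
    """Average the selected coordinates across workers, in place.

    models  : list of equal-length parameter lists, one per worker
    indices : coordinates to synchronize; None means full averaging
              (the FedAvg-style step). Passing a subset is an assumed
              reading of "partial averaging", not the paper's definition.
    """
    dim = len(models[0])
    for i in (range(dim) if indices is None else indices):
        mean_i = sum(m[i] for m in models) / len(models)
        for m in models:
            m[i] = mean_i


def max_discrepancy(models):
    """Largest per-coordinate spread between any two workers."""
    dim = len(models[0])
    return max(max(m[i] for m in models) - min(m[i] for m in models)
               for i in range(dim))


workers = [[1.0, 4.0], [3.0, 0.0]]
before = max_discrepancy(workers)        # spread before any synchronization
average_models(workers, indices=[0])     # partial: only coordinate 0 is averaged
after_partial = [list(m) for m in workers]
average_models(workers)                  # full: every coordinate is averaged
```

After the partial step the workers agree on coordinate 0 only; after the full step they hold identical models, which is the state FedAvg reaches at each synchronization point.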

翻訳日:2022-01-12 19:01:21 公開日:2022-01-11

# (参考訳) 物体検出と移動学習を用いたビールボトルの分類

Classification of Beer Bottles using Object Detection and Transfer Learning ( http://arxiv.org/abs/2201.03791v1 )

ライセンス: CC BY 4.0

Philipp Hohlfeld, Tobias Ostermeier, Dominik Brandl

(参考訳) 分類問題はコンピュータビジョンでよく見られる。それにもかかわらず、ビール瓶の分類には専用の作業はない。マスターコースのDeep Learningの課題の一環として、5207個のビールボトルの画像とブランドラベルのデータセットが作成された。画像にはちょうど1つのビールボトルが含まれています。本稿では,ビールボトルの画像を2段階のアプローチで分類する深層学習モデルを提案する。最初のステップとして、Faster-R-CNNはブランドとは独立して分類に関連する画像区間を検出する。第2ステップでは、関連する画像セクションをResNet-18で分類する。最も信頼度の高い画像セクションはクラスラベルとして返される。最終テストデータセットの課題において、古典的な1ステップのトランスファー学習アプローチを超越し、99.86%の精度に達したモデルを提案する。挑戦が終わった後、100%の精度で達成できた

Classification problems are common in Computer Vision. Despite this, there is no dedicated work on the classification of beer bottles. As part of the challenge of the master course Deep Learning, a dataset of 5207 beer bottle images and brand labels was created. Each image contains exactly one beer bottle. In this paper we present a deep learning model which classifies pictures of beer bottles in a two-step approach. In the first step, a Faster-R-CNN detects image sections relevant for classification independently of the brand. In the second step, the relevant image sections are classified by a ResNet-18. The class of the image section with the highest confidence is returned as the class label. We propose a model with which we surpass the classic one-step transfer learning approach and reach an accuracy of 99.86% on the final test dataset during the challenge. We were able to achieve 100% accuracy after the challenge ended.

翻訳日:2022-01-12 18:59:54 公開日:2022-01-11

# (参考訳) 超解像に対する効率的な非局所コントラストアテンション

Efficient Non-Local Contrastive Attention for Image Super-Resolution ( http://arxiv.org/abs/2201.03794v1 )

ライセンス: CC BY 4.0

Bin Xia, Yucheng Hang, Yapeng Tian, Wenming Yang, Qingmin Liao, Jie Zhou

(参考訳) 非局所的注意(NLA)は、自然画像の内在的特徴相関を利用して、単一画像超解法(SISR)に大きな改善をもたらす。しかし、NLAはノイズの多い情報を提供し、入力サイズに関して二次計算資源を消費し、その性能と応用を制限する。本稿では,長期ビジュアルモデリングを行い,より関連性の高い非局所的特徴を活用するための,効率的な非局所的コントラスト注意(ENLCA)を提案する。具体的には、ENLCAは、効率的な非局所的注意(ENLA)とスパース集約(Sparse Aggregation)の2つの部分から構成される。 ENLAは指数関数を近似するためにカーネル法を採用し、線形計算複雑性を得る。 Sparse Aggregationでは、増幅係数で入力を乗算して情報的特徴にフォーカスするが、近似のばらつきは指数関数的に増加する。したがって、コントラスト学習は、さらに関係性および無関係な特徴を分離するために適用される。 ENLCAの有効性を示すため,簡単なバックボーンにいくつかのモジュールを追加することで,ENLCN(Efficient Non-Local Contrastive Network)と呼ばれるアーキテクチャを構築した。実験結果から,ENLCNは定量評価と定性評価の両方において,最先端手法よりも優れた性能を示した。

Non-Local Attention (NLA) brings significant improvement for Single Image Super-Resolution (SISR) by leveraging intrinsic feature correlation in natural images. However, NLA gives noisy information large weights and consumes computation resources that grow quadratically with the input size, limiting its performance and application. In this paper, we propose a novel Efficient Non-Local Contrastive Attention (ENLCA) to perform long-range visual modeling and leverage more relevant non-local features. Specifically, ENLCA consists of two parts, Efficient Non-Local Attention (ENLA) and Sparse Aggregation. ENLA adopts the kernel method to approximate the exponential function and obtains linear computational complexity. For Sparse Aggregation, we multiply inputs by an amplification factor to focus on informative features, yet the variance of the approximation increases exponentially. Therefore, contrastive learning is applied to further separate relevant and irrelevant features. To demonstrate the effectiveness of ENLCA, we build an architecture called Efficient Non-Local Contrastive Network (ENLCN) by adding a few of our modules to a simple backbone. Extensive experimental results show that ENLCN reaches superior performance over state-of-the-art approaches on both quantitative and qualitative evaluations.

翻訳日:2022-01-12 18:52:16 公開日:2022-01-11

# (参考訳) CI-AVSR:車内コマンド認識のためのカントン音声・ビジュアル音声データセット

CI-AVSR: A Cantonese Audio-Visual Speech Dataset for In-car Command Recognition ( http://arxiv.org/abs/2201.03804v1 )

ライセンス: CC BY 4.0

Wenliang Dai, Samuel Cahyawijaya, Tiezheng Yu, Elham J. Barezi, Peng Xu, Cheuk Tung Shadow Yiu, Rita Frieske, Holy Lovenia, Genta Indra Winata, Qifeng Chen, Xiaojuan Ma, Bertram E. Shi, Pascale Fung

(参考訳) ディープラーニングとインテリジェントな車両の台頭により、スマートアシスタントは、運転を容易にし、余分な機能を提供するために、車内コンポーネントとして不可欠なものになっている。車内スマートアシスタントは、運転を楽にし、安全性を向上させるために、一般および車関連コマンドを処理し、対応するアクションを実行することができるべきである。しかし、低リソース言語にはデータ不足の問題があり、研究やアプリケーションの開発を妨げている。本稿では,Cantonese言語における車内コマンド認識のための新しいデータセットであるCantonese In-car Audio-Visual Speech Recognition (CI-AVSR)を導入する。カントン語話者30人が記録した200の車載コマンドの4,984サンプル(8.3時間)で構成されている。さらに,実環境をシミュレートするために,車内背景雑音を用いたデータセットの拡張を行い,収集したデータより10倍大きいデータセットを生成する。当社のデータセットのクリーンバージョンと拡張バージョンの両方に関する詳細な統計情報を提供しています。さらに,CI-AVSRの有効性を示すために,2つのマルチモーダルベースラインを実装した。実験の結果,視覚信号の活用により,モデル全体の性能が向上することがわかった。私たちの最良のモデルはクリーンなテストセットでかなりの品質を達成できますが、ノイズの多いデータの音声認識品質はいまだに劣っており、実際の車内音声認識システムにとって非常に困難なタスクです。データセットとコードはhttps://github.com/HLTCHKUST/CI-AVSRで公開される。

With the rise of deep learning and intelligent vehicles, the smart assistant has become an essential in-car component to facilitate driving and provide extra functionalities. In-car smart assistants should be able to process general as well as car-related commands and perform corresponding actions, which eases driving and improves safety. However, there is a data scarcity issue for low-resource languages, hindering the development of research and applications. In this paper, we introduce a new dataset, Cantonese In-car Audio-Visual Speech Recognition (CI-AVSR), for in-car command recognition in the Cantonese language with both video and audio data. It consists of 4,984 samples (8.3 hours) of 200 in-car commands recorded by 30 native Cantonese speakers. Furthermore, we augment our dataset using common in-car background noises to simulate real environments, producing a dataset 10 times larger than the collected one. We provide detailed statistics of both the clean and the augmented versions of our dataset. Moreover, we implement two multimodal baselines to demonstrate the validity of CI-AVSR. Experiment results show that leveraging the visual signal improves the overall performance of the model. Although our best model can achieve considerable quality on the clean test set, the speech recognition quality on the noisy data is still inferior and remains an extremely challenging task for real in-car speech recognition systems. The dataset and code will be released at https://github.com/HLTCHKUST/CI-AVSR.

翻訳日:2022-01-12 18:38:33 公開日:2022-01-11

# (参考訳) メタラーニングによる情報グラフ拡張のブートストラップ

Bootstrapping Informative Graph Augmentation via A Meta Learning Approach ( http://arxiv.org/abs/2201.03812v1 )

ライセンス: CC BY 4.0

Hang Gao, Jiangmeng Li, Wenwen Qiang, Lingyu Si, Changwen Zheng, Fuchun Sun

(参考訳) 近年の研究では、グラフ表現を自己教師型で学習する。グラフコントラスト学習では、ベンチマーク手法は様々なグラフ拡張アプローチを適用する。しかし、増分法のほとんどは学習不可能であり、不便な増分グラフを生成する問題を引き起こす。このような拡張は、グラフコントラスト学習法の表現能力を低下させる可能性がある。そこで本稿では,Meta Graph Augmentation (MEGA) と呼ばれる学習可能なグラフ拡張器を用いてグラフを生成する方法を提案する。そして、"良い"グラフ拡張は、インスタンスレベルでは均一で、機能レベルではインフォメーション性を持つ必要があります。そこで本研究では,一様性,情報性に富んだ拡張を生成できるグラフ強化器の学習手法を提案する。グラフ拡張器の目的は,特徴抽出ネットワークを促進させ,より差別的な特徴表現を学習することであり,メタラーニングパラダイムを提案する動機となっている。実験的に、複数のベンチマークデータセットに対する実験は、MEGAがグラフ自己教師付き学習タスクにおいて最先端の手法よりも優れていることを示した。さらなる実験的研究により、MEGAの異なる用語の有効性が証明された。

Recent works explore learning graph representations in a self-supervised manner. In graph contrastive learning, benchmark methods apply various graph augmentation approaches. However, most of the augmentation methods are non-learnable, which causes the issue of generating unbeneficial augmented graphs. Such augmentation may degrade the representation ability of graph contrastive learning methods. Therefore, we motivate a method that generates augmented graphs with a learnable graph augmenter, called MEta Graph Augmentation (MEGA). We then clarify that a "good" graph augmentation must have uniformity at the instance level and informativeness at the feature level. To this end, we propose a novel approach to learning a graph augmenter that can generate an augmentation with uniformity and informativeness. The objective of the graph augmenter is to push our feature extraction network to learn a more discriminative feature representation, which motivates us to propose a meta-learning paradigm. Empirically, the experiments across multiple benchmark datasets demonstrate that MEGA outperforms the state-of-the-art methods in graph self-supervised learning tasks. Further experimental studies demonstrate the effectiveness of the different terms of MEGA.

翻訳日:2022-01-12 18:25:47 公開日:2022-01-11

# (参考訳) 機械学習を用いたトルコの感情分析:オンライン食品注文サイトレビューへの適用

Turkish Sentiment Analysis Using Machine Learning Methods: Application on Online Food Order Site Reviews ( http://arxiv.org/abs/2201.03848v1 )

ライセンス: CC BY 4.0

Özlem Aktaş, Berkay Coşkuner, İlker Soner

(参考訳) あらゆるセクターで今日現れる満足度測定は、多くの企業にとって非常に重要な要素だ。本研究では,Yemek Sepetiのデータとこれらのデータのバリエーションを用いて,さまざまな機械学習アルゴリズムで最高の精度に達することを目的とした。各アルゴリズムの精度は、使用する自然言語処理手法とともに算出された。これらの精度を計算しながら、使用するアルゴリズムのパラメータを最適化しようと試みた。この研究でトレーニングされたラベル付きデータに関するモデルは、ラベルなしデータで使用することができ、顧客満足度を測定するアイデアを企業に提供することができる。 3つの異なる自然言語処理手法が適用され,ほとんどのモデルにおいて約5%の精度向上が得られた。

Satisfaction measurement, which emerges in every sector today, is a very important factor for many companies. This study aims to reach the highest accuracy rate with various machine learning algorithms by using data from Yemek Sepeti and variations of this data. The accuracy of each algorithm was calculated together with the various natural language processing methods used. While calculating these accuracy values, we tried to optimize the parameters of the algorithms used. The models trained in this study on labeled data can be used on unlabeled data and can give companies an idea of how to measure customer satisfaction. We observed that applying 3 different natural language processing methods resulted in an accuracy increase of approximately 5% in most of the developed models.

翻訳日:2022-01-12 18:10:07 公開日:2022-01-11

# (参考訳) 野生の文書のWebジェネア識別のためのGINCOトレーニングデータセット

The GINCO Training Dataset for Web Genre Identification of Documents Out in the Wild ( http://arxiv.org/abs/2201.03857v1 )

ライセンス: CC BY-SA 4.0

Taja Kuzman, Peter Rupnik and Nikola Ljubešić

(参考訳) 本稿では,6万語からなる1,125クロールされたスロベニア語 web 文書に基づく,ジャンル識別用自動学習データセットを提案する。各ドキュメントは、既存のスキーマ上に構築された新しいアノテーションスキーマを使って、ジャンル向けに手作業で注釈付けされ、主にラベルとアノテーション間の合意を念頭に置いている。このデータセットは、機械翻訳コンテンツ、エンコーディングエラー、一つの文書に表示される複数のコンテンツなど、Webベースのデータに関連するさまざまな課題で構成され、現実的な条件下での分類器の評価を可能にする。データセット上の最初の機械学習実験では、(1)プリトランスフォーマモデルでは、マクロf1メトリクスが約0.22で、一方、トランスフォーマベースモデルは約0.58点、(2)マルチリンガルトランスフォーマモデルは、これまで標準nlpタスクでマルチリンガルモデルよりも優れていることが証明されていた単言語モデルと同様に、タスクでも動作する、という結果が得られた。

This paper presents a new training dataset for automatic genre identification GINCO, which is based on 1,125 crawled Slovenian web documents that consist of 650 thousand words. Each document was manually annotated for genre with a new annotation schema that builds upon existing schemata, having primarily clarity of labels and inter-annotator agreement in mind. The dataset consists of various challenges related to web-based data, such as machine translated content, encoding errors, multiple contents presented in one document etc., enabling evaluation of classifiers in realistic conditions. The initial machine learning experiments on the dataset show that (1) pre-Transformer models are drastically less able to model the phenomena, with macro F1 metrics ranging around 0.22, while Transformer-based models achieve scores of around 0.58, and (2) multilingual Transformer models work as well on the task as the monolingual models that were previously proven to be superior to multilingual models on standard NLP tasks.
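Because the reported scores are macro F1, it may help to see why that metric punishes a classifier that ignores rare genres. The sketch below computes the unweighted mean of per-class F1; the function name and the two genre labels are made up for illustration and are not taken from the GINCO schema.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels: a classifier that always predicts the majority genre.
gold = ["news", "news", "news", "forum"]
pred = ["news", "news", "news", "news"]
score = macro_f1(gold, pred)
```

Here accuracy is 0.75, yet macro F1 is only 3/7 (about 0.43) because the missed class contributes an F1 of zero with the same weight as the majority class.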

翻訳日:2022-01-12 18:02:17 公開日:2022-01-11

# (参考訳) eegからの感情推定 ---saliencyと組み合わせた二重ディープラーニングアプローチ

Emotion Estimation from EEG -- A Dual Deep Learning Approach Combined with Saliency ( http://arxiv.org/abs/2201.03891v1 )

ライセンス: CC BY 4.0

Victor Delvigne, Antoine Facchini, Hazem Wannous, Thierry Dutoit, Laurence Ris and Jean-Philippe Vandeborre

(参考訳) 感情の推定は、人間とコンピュータの相互作用に重要な影響を与える研究の活発な分野である。感情を評価するための異なるモダリティの中で、脳波(EEG)は過去10年間に動機づけた結果を示した。脳波による感情推定は、特定の疾患の診断やリハビリに役立つ。本稿では,コンピュータビジョンに特化した新しい深層学習(DL)モデルと,専門家が定義する生理的知識を考慮した2つの手法を提案する。モデル塩分分析により共同学習が強化された。グローバルなアプローチを提案するため、このモデルは4つの公開データセットで評価され、最先端のアプローチと同じような結果が得られ、より高い安定性を反映した標準偏差の低い2つのデータセットに対して性能が向上する。本論文で提案するコードとモデルは再現性のためにgithub.com/VDelv/Emotion-EEGで公開されている。

Emotion estimation is an active field of research that has an important impact on the interaction between human and computer. Among the different modalities used to assess emotion, the electroencephalogram (EEG), which represents the electrical activity of the brain, has produced encouraging results during the last decade. Emotion estimation from EEG could help in the diagnosis or rehabilitation of certain diseases. In this paper, we propose a dual method that combines the physiological knowledge defined by specialists with novel deep learning (DL) models initially dedicated to computer vision. The joint learning has been enhanced with model saliency analysis. To present a global approach, the model has been evaluated on four publicly available datasets; it achieves results similar to the state-of-the-art approaches, and outperforms them on two of the datasets with a lower standard deviation that reflects higher stability. For the sake of reproducibility, the code and models proposed in this paper are available at github.com/VDelv/Emotion-EEG.

翻訳日:2022-01-12 17:44:07 公開日:2022-01-11

# (参考訳) オートエンコーダ入門

An Introduction to Autoencoders ( http://arxiv.org/abs/2201.03898v1 )

ライセンス: CC BY 4.0

Umberto Michelucci

(参考訳) 本稿では,オートエンコーダについて述べる。本稿では,オートエンコーダの基本概念と数学について述べる。それらは何か、制限は何か、典型的なユースケースについて議論し、いくつかの例を見ていきます。まず、オートエンコーダの一般的な紹介から始め、出力層におけるアクティベーション関数の役割と損失関数について議論する。次に、再構築エラーとは何かについて議論する。最後に, 典型的な応用として, 次元の縮小, 分類, 分節化, 異常検出について考察する。本論文は2021年に与えられたオートエンコーダに関する博士号レベルの講義のノートを含む。

In this article, we will look at autoencoders. This article covers the mathematics and the fundamental concepts of autoencoders. We will discuss what they are, what their limitations are, and their typical use cases, and we will look at some examples. We will start with a general introduction to autoencoders, and we will discuss the role of the activation function in the output layer and the loss function. We will then discuss what the reconstruction error is. Finally, we will look at typical applications such as dimensionality reduction, classification, denoising, and anomaly detection. This paper contains the notes of a PhD-level lecture on autoencoders given in 2021.

翻訳日:2022-01-12 17:32:36 公開日:2022-01-11

# (参考訳) 私の心はどこにありますか. 脳活動からの視覚的注意の予測

Where Is My Mind (looking at)? Predicting Visual Attention from Brain Activity ( http://arxiv.org/abs/2201.03902v1 )

ライセンス: CC BY 4.0

Victor Delvigne, Noé Tits, Luca La Fisca, Nathan Hubens, Antoine Maiorca, Hazem Wannous, Thierry Dutoit and Jean-Philippe Vandeborre

(参考訳) 視覚的注意の推定は、コンピュータビジョン、人工知能、医学といった様々な分野の研究の活発な分野である。注目度マップを推定する最も一般的なアプローチの1つは、観察された画像に基づいている。本稿では,脳波の取得から視覚的注意を抽出できることを示す。結果は観測された画像からの従来の予測に匹敵する。この目的のために一連の信号が記録され、視覚注意と脳活動の関係を研究するために異なるモデルが開発されている。結果は奨励的であり、他のモダリティによる注意を推定する他のアプローチと比較できる。この論文で検討されているコードとデータセットは、この分野の研究を促進するために https://figshare.com/s/3e353bd1c621962888ad で利用可能である。

Visual attention estimation is an active field of research at the crossroads of different disciplines: computer vision, artificial intelligence and medicine. One of the most common approaches to estimate a saliency map representing attention is based on the observed images. In this paper, we show that visual attention can be retrieved from EEG acquisition. The results are comparable to traditional predictions from observed images, which is of great interest. For this purpose, a set of signals has been recorded and different models have been developed to study the relationship between visual attention and brain activity. The results are encouraging and comparable with other approaches estimating attention with other modalities. The code and dataset considered in this paper have been made available at https://figshare.com/s/3e353bd1c621962888ad to promote research in the field.

翻訳日:2022-01-12 17:20:19 公開日:2022-01-11

# (参考訳) 適応正と負のサンプルを用いたコントラスト学習に基づく特徴抽出フレームワーク

Feature Extraction Framework based on Contrastive Learning with Adaptive Positive and Negative Samples ( http://arxiv.org/abs/2201.03942v1 )

ライセンス: CC BY 4.0

Hongjie Zhang

(参考訳) 本研究では,教師なし・教師なし・半教師付き単一視点特徴抽出に適した適応正負サンプル(CL-FEFA)を用いたコントラスト学習に基づく特徴抽出フレームワークを提案する。 CL-FEFAは、特徴抽出の結果から正および負のサンプルを適応的に構成し、より適切かつ正確である。その後、前回の正および負のサンプルに基づいてInfoNCEの損失に基づいて識別特性を再抽出し、クラス内サンプルをよりコンパクト化し、クラス間サンプルをより分散させる。同時に、サブスペースサンプルの潜在的構造情報を用いて、正および負のサンプルを動的に構築することで、我々のフレームワークはノイズの多いデータに対してより堅牢になる。さらに、CL-FEFAは、潜在的な構造に類似したサンプルである正のサンプル間の相互情報を考慮し、特徴抽出の利点を理論的に支持する。最終数値実験により,提案手法は従来の特徴抽出法やコントラスト学習法よりも大きなアドバンテージを持つことが示された。

In this study, we propose a feature extraction framework based on contrastive learning with adaptive positive and negative samples (CL-FEFA) that is suitable for unsupervised, supervised, and semi-supervised single-view feature extraction. CL-FEFA adaptively constructs the positive and negative samples from the results of feature extraction, which makes them more appropriate and accurate. Thereafter, the discriminative features are re-extracted according to the InfoNCE loss based on the previous positive and negative samples, which makes the intra-class samples more compact and the inter-class samples more dispersed. At the same time, using the potential structure information of subspace samples to dynamically construct positive and negative samples makes our framework more robust to noisy data. Furthermore, CL-FEFA considers the mutual information between positive samples, that is, similar samples in potential structures, which provides theoretical support for its advantages in feature extraction. The final numerical experiments show that the proposed framework has a strong advantage over traditional feature extraction methods and contrastive learning methods.
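For readers unfamiliar with the InfoNCE loss mentioned above, here is a minimal single-anchor sketch. It uses cosine similarity and a temperature, which is one common formulation; the abstract does not specify the similarity function or temperature used by CL-FEFA, so both (and the function name) are assumptions.

```python
import math


def info_nce(anchor, positive, negatives, temperature=0.5):
    """InfoNCE loss for one anchor: -log softmax of the positive's similarity.

    Cosine similarity and the temperature default are assumed choices.
    """
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm

    pos_logit = cosine(anchor, positive) / temperature
    logits = [pos_logit] + [cosine(anchor, n) / temperature for n in negatives]
    # log-sum-exp with the maximum subtracted for numerical stability
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - pos_logit


aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])     # positive matches anchor
misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])  # negative matches anchor
```

The loss is small when the positive is the most similar sample and large when a negative is, which is the pressure that pulls intra-class samples together and pushes inter-class samples apart.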

翻訳日:2022-01-12 17:05:33 公開日:2022-01-11

# (参考訳) アクティブ強化学習--興味ある自己適応のための分類システムへのロードマップ

Active Reinforcement Learning -- A Roadmap Towards Curious Classifier Systems for Self-Adaptation ( http://arxiv.org/abs/2201.03947v1 )

ライセンス: CC BY 4.0

Simon Reichhuber, Sven Tomforde

(参考訳) インテリジェントなシステムには、観察や経験、明示的なフィードバックを考慮して、時間とともに行動を改善する能力がある。従来のアプローチでは、学習問題を分離し、強化学習、アクティブ学習、異常検出、転送学習など、さまざまな分野の機械学習から分離したテクニックを使用する。このような状況下では、基本的な強化学習アプローチには、試行錯誤、純粋なリアクティブな振る舞い、分離された問題処理といった、現実のシステムへの応用を妨げるいくつかの欠点がある。本稿では,知的システムにおける「能動的強化学習」という研究課題を立案し,これらの欠点を軽減するための概念を提案する。

Intelligent systems have the ability to improve their behaviour over time by taking observations, experiences or explicit feedback into account. Traditional approaches separate the learning problem and make isolated use of techniques from different fields of machine learning, such as reinforcement learning, active learning, anomaly detection or transfer learning. In this context, the fundamental reinforcement learning approaches come with several drawbacks that hinder their application to real-world systems: trial-and-error, purely reactive behaviour or isolated problem handling. The idea of this article is to present a concept for alleviating these drawbacks by setting up a research agenda towards what we call "active reinforcement learning" in intelligent systems.

翻訳日:2022-01-12 16:51:57 公開日:2022-01-11

# (参考訳) 不均衡データに対するマルチグラニュラリティリラベルアンダーサンプリングアルゴリズム

Multi-granularity Relabeled Under-sampling Algorithm for Imbalanced Data ( http://arxiv.org/abs/2201.03957v1 )

ライセンス: CC BY 4.0

Qi Dai, Jian-wei Liu, Yang Liu

(参考訳) 不均衡な分類問題は、データマイニングと機械学習において重要かつ困難な問題の1つであることが判明した。従来の分類器の性能は、クラス不均衡問題、クラスオーバーラップ、ノイズなど、多くのデータ問題に大きく影響を受ける。 tomek-linkアルゴリズムは、提案時にデータクリーニングにのみ使用された。近年,Tomek-Linkアルゴリズムとサンプリング手法の組み合わせが報告されている。 Tomek-Linkサンプリングアルゴリズムは、データ上のクラスオーバーラップを効果的に低減し、識別が難しい多数インスタンスを除去し、アルゴリズムの分類精度を向上させる。しかし、tomek-linksのアンダーサンプリングアルゴリズムは、互いに最も近い境界インスタンスのみをグローバルに考慮し、潜在的に重複するインスタンスを無視する。マイノリティインスタンス数が小さい場合、アンサンプリング効果が不十分であり、分類モデルの性能改善は明らかではない。そこで,tomek-linkに基づき,マルチグラニュラリティリラベル化アンダーサンプリングアルゴリズム(mgru)を提案する。このアルゴリズムは、局所粒度部分空間に設定されたデータセットの局所情報を十分に考慮し、データセット内の局所ポテンシャル重複インスタンスを検出する。そして、重なり合う多数派インスタンスをグローバルレザベルインデックス値に従って排除し、トメックリンクの検出範囲を効果的に拡大する。その結果,アンダーサンプリングの最適大域レラベルインデックス値を選択した場合,提案するアンダーサンプリングアルゴリズムの分類精度と一般化性能は,他のベースラインアルゴリズムよりも有意に優れていることがわかった。

The imbalanced classification problem is one of the important and challenging problems in data mining and machine learning. The performance of traditional classifiers is severely affected by many data problems, such as class imbalance, class overlap and noise. When it was proposed, the Tomek-Link algorithm was used only to clean data. In recent years, there have been reports of combining the Tomek-Link algorithm with sampling techniques. The Tomek-Link sampling algorithm can effectively reduce the class overlap in the data, remove the majority instances that are difficult to distinguish, and improve classification accuracy. However, the Tomek-Links under-sampling algorithm only considers the boundary instances that are the nearest neighbors of each other globally and ignores the potential local overlapping instances. When the number of minority instances is small, the under-sampling effect is not satisfactory, and the performance improvement of the classification model is not obvious. Therefore, on the basis of Tomek-Link, a multi-granularity relabeled under-sampling algorithm (MGRU) is proposed. This algorithm fully considers the local information of the data set in the local granularity subspace, and detects the local potential overlapping instances in the data set. Then, the overlapped majority instances are eliminated according to the global relabeled index value, which effectively expands the detection range of Tomek-Links. The simulation results show that when we select the optimal global relabeled index value for under-sampling, the classification accuracy and generalization performance of the proposed under-sampling algorithm are significantly better than those of other baseline algorithms.
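The baseline that MGRU extends is easy to state in code: a Tomek link is a pair of instances from different classes that are each other's nearest neighbor. The sketch below finds such pairs by brute force; it illustrates only the classic global Tomek-Link definition, not the multi-granularity relabeling of the paper, and the function name and toy data are illustrative.

```python
def tomek_links(points, labels):
    """Return index pairs (i, j), i < j, that form Tomek links.

    A pair is a Tomek link when the two points are mutual nearest
    neighbors and carry different class labels. Brute force, O(n^2).
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def nearest(i):
        return min((j for j in range(len(points)) if j != i),
                   key=lambda j: sq_dist(points[i], points[j]))

    nn = [nearest(i) for i in range(len(points))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and labels[i] != labels[j]]


# Toy data: label 0 is the majority class, label 1 the minority class.
points = [(0.0, 0.0), (0.2, 0.0), (1.0, 0.0), (1.1, 0.0), (2.0, 0.0), (2.1, 0.0)]
labels = [0, 0, 0, 1, 1, 1]
links = tomek_links(points, labels)
```

Under-sampling then removes the majority member of each link (index 2 in this toy example). Points 4 and 5 are also mutual nearest neighbors but share a label, so they are not a link.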

翻訳日:2022-01-12 16:33:08 公開日:2022-01-11

# (参考訳) フーリエリング相関を用いた画質測定とノイズ除去

Image quality measurements and denoising using Fourier Ring Correlations ( http://arxiv.org/abs/2201.03992v1 )

ライセンス: CC BY 4.0

J. Kaczmar-Michalska, N.R. Hajizadeh, A.J. Rzepiela and S.F. Nørrelykke

(参考訳) 画像の品質は、異なる人々に対して異なる意味を持つ誤った概念です。画像品質を定量化するため、劣化画像と接地真実画像との相対差を典型的に算出する。しかし、この違いを測定するのにどんな指標を使うべきか? 理想的には、メトリックは自然画像と科学画像の両方でうまく機能するはずだ。構造類似度指数(SSIM)は、人間が画像の類似性をどう知覚するかの指標であるが、科学的に顕微鏡で意味のある違いには敏感ではない。電子および超解像顕微鏡では、フーリエリング相関 (FRC) がしばしば用いられるが、これらの分野以外ではほとんど知られていない。ここでは、FRCがGoogle Open Imagesデータセットなど、自然画像にも同じように適用可能であることを示す。次に、frcに基づいて損失関数を定義し、解析的に微分可能であることを示す。このFRCベースの損失関数は、L1またはL2ベースの損失を使用する場合よりも、ネットワークを高速にトレーニングし、類似またはより良い結果が得られる。また、FRC解析によるニューラルネットワークの特性と限界についても検討する。

Image quality is a nebulous concept with different meanings to different people. To quantify image quality a relative difference is typically calculated between a corrupted image and a ground truth image. But what metric should we use for measuring this difference? Ideally, the metric should perform well for both natural and scientific images. The structural similarity index (SSIM) is a good measure for how humans perceive image similarities, but is not sensitive to differences that are scientifically meaningful in microscopy. In electron and super-resolution microscopy, the Fourier Ring Correlation (FRC) is often used, but is little known outside of these fields. Here we show that the FRC can equally well be applied to natural images, e.g. the Google Open Images dataset. We then define a loss function based on the FRC, show that it is analytically differentiable, and use it to train a U-net for denoising of images. This FRC-based loss function allows the network to train faster and achieve similar or better results than when using L1- or L2- based losses. We also investigate the properties and limitations of neural network denoising with the FRC analysis.
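Since the FRC is little known outside microscopy, a small worked sketch may help. For two images it correlates their Fourier coefficients ring by ring: the real part of the sum of F1 times the conjugate of F2 over a ring, divided by the square root of the product of the two ring powers. The code below uses a naive DFT so that it needs only the standard library; the rounding-based ring binning and the function names are illustrative choices, not the authors' implementation.

```python
import cmath
import math


def dft2(img):
    """Naive 2-D DFT of a square list-of-lists image (fine for tiny inputs)."""
    n = len(img)
    w = [[cmath.exp(-2j * math.pi * u * x / n) for x in range(n)] for u in range(n)]
    # Transform along x for every row, then along y for every column.
    rows = [[sum(img[y][x] * w[u][x] for x in range(n)) for u in range(n)]
            for y in range(n)]
    return [[sum(rows[y][u] * w[v][y] for y in range(n)) for u in range(n)]
            for v in range(n)]


def frc(img1, img2):
    """Fourier Ring Correlation per integer-radius ring, up to Nyquist."""
    n = len(img1)
    f1, f2 = dft2(img1), dft2(img2)
    n_rings = n // 2 + 1
    num = [0.0] * n_rings
    p1 = [0.0] * n_rings
    p2 = [0.0] * n_rings
    for v in range(n):
        for u in range(n):
            # Map DFT indices to signed frequencies (index n-1 is frequency -1).
            fu = u if u <= n // 2 else u - n
            fv = v if v <= n // 2 else v - n
            r = int(round(math.hypot(fu, fv)))
            if r >= n_rings:
                continue  # corners beyond the Nyquist ring are ignored
            num[r] += (f1[v][u] * f2[v][u].conjugate()).real
            p1[r] += abs(f1[v][u]) ** 2
            p2[r] += abs(f2[v][u]) ** 2
    return [num[r] / math.sqrt(p1[r] * p2[r]) if p1[r] > 0 and p2[r] > 0 else 0.0
            for r in range(n_rings)]


image = [[1, 2, 3, 4], [2, 0, 1, 5], [7, 1, 0, 2], [3, 3, 8, 1]]
same = frc(image, image)
flipped = frc(image, [[-p for p in row] for row in image])
```

An image compared with itself gives a correlation of 1 in every ring, and an image compared with its negation gives -1. With noisy copies of the same scene the curve typically falls towards 0 at high spatial frequencies, which is what makes it usable as a resolution and quality measure.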

翻訳日:2022-01-12 16:27:01 公開日:2022-01-11

# (参考訳) ディープフェイス認識に対する類似性に基づくグレイボックス逆攻撃

Similarity-based Gray-box Adversarial Attack Against Deep Face Recognition ( http://arxiv.org/abs/2201.04011v1 )

ライセンス: CC BY 4.0

Hanrui Wang, Shuo Wang, Zhe Jin, Yandan Wang, Cunjian Chen, Massimo Tistarelli

(参考訳) 敵対的攻撃手法の大半は、システムの全知識が明らかにされると、深い顔認識に対して良好に機能する(white-box)。しかし、このような手法は攻撃者に顔テンプレートが未知のグレーボックス設定ではうまく機能しない。本研究では,新たに開発された目的関数を持つ類似性に基づくグレーボックス敵対的攻撃(SGADV)手法を提案する。 SGADVは、相似性スコアを使用して、最適化された敵の例、すなわち類似性に基づく敵攻撃を生成する。このテクニックは、ホワイトボックスとグレーボックスの両方で、異なる類似度スコアを使用して真正または偽のユーザを決定する認証システムに対して適用される。 SGADVの有効性を検証するため,LFW,CelebA,CelebA-HQの顔データセットに対して,ホワイトボックスとグレーボックスの両方でFaceNetとInsightFaceの深層顔認識モデルに対して広範な実験を行った。提案手法は,グレーボックス設定において既存の攻撃手法よりも有意に優れていた。したがって,本手法の類似性ベースアプローチは,非認証のためのグレーボックス攻撃シナリオに十分対応できる可能性が示唆された。

The majority of adversarial attack techniques perform well against deep face recognition when the full knowledge of the system is revealed (white-box). However, such techniques perform poorly in the gray-box setting, where the face templates are unknown to the attackers. In this work, we propose a similarity-based gray-box adversarial attack (SGADV) technique with a newly developed objective function. SGADV utilizes the dissimilarity score to produce the optimized adversarial example, i.e., similarity-based adversarial attack. This technique applies to both white-box and gray-box attacks against authentication systems that determine genuine or imposter users using the dissimilarity score. To validate the effectiveness of SGADV, we conduct extensive experiments on the face datasets LFW, CelebA, and CelebA-HQ against the deep face recognition models FaceNet and InsightFace in both white-box and gray-box settings. The results suggest that the proposed method significantly outperforms the existing adversarial attack techniques in the gray-box setting. We hence conclude that similarity-based approaches to developing the adversarial example could satisfactorily cater to gray-box attack scenarios for de-authentication.

翻訳日:2022-01-12 16:04:49 公開日:2022-01-11

# (参考訳) 差動的個人分割学習に対する特徴空間ハイジャック攻撃

Feature Space Hijacking Attacks against Differentially Private Split Learning ( http://arxiv.org/abs/2201.04018v1 )

ライセンス: CC BY-SA 4.0

Grzegorz Gawron, Philip Stubbings

(参考訳) 分散学習と差分プライバシーは、分散データセット上のプライバシーに準拠した高度な分析を支援する可能性がある技術である。分割学習に対する攻撃は重要な評価ツールであり、近年研究の注目を集めている。この研究の貢献は、クライアントサイドのオフザシェルDPオプティマイザを使用して、差分プライバシ(DP)によって強化されたスプリットニューラルネットワークの学習プロセスに、最近のフィーチャースペースハイジャック攻撃(FSHA)を適用することである。 FSHA攻撃は、任意に設定されたDPエプシロンレベルでエラー率の低いクライアントのプライベートデータ再構成を取得する。また,攻撃リスク軽減の可能性を示唆する次元的低減実験を行い,ある程度の有効性を示す。この設定において、差分プライバシーが効果的な保護ではない理由を論じ、他のリスク軽減手法についても言及する。

Split learning and differential privacy are technologies with growing potential to help with privacy-compliant advanced analytics on distributed datasets. Attacks against split learning are an important evaluation tool and have been receiving increased research attention recently. This work's contribution is applying a recent feature space hijacking attack (FSHA) to the learning process of a split neural network enhanced with differential privacy (DP), using a client-side off-the-shelf DP optimizer. The FSHA attack reconstructs the client's private data with low error rates at arbitrarily set DP epsilon levels. We also experiment with dimensionality reduction as a potential attack risk mitigation and show that it might help to some extent. We discuss the reasons why differential privacy is not an effective protection in this setting and mention other potential risk mitigation methods.

翻訳日:2022-01-12 16:03:48 公開日:2022-01-11

# (参考訳) グラフニューラルネットワークを利用した電力系統の状態推定

State Estimation in Electric Power Systems Leveraging Graph Neural Networks ( http://arxiv.org/abs/2201.04056v1 )

ライセンス: CC BY 4.0

Ognjen Kundacina, Mirsad Cosovic, Dejan Vukobratovic

(参考訳) 状態推定(SE)アルゴリズムの目標は、電力系統で利用可能な測定値セットに基づいて、複雑なバス電圧を状態変数として推定することである。ファサー測定ユニット (pmus) は送電系統で使われるようになりつつあるため, pmu の高いサンプリング率を活用できる高速 se ソルバが必要である。本稿では,pmu電圧と電流測定を入力として,評価フェーズ中に高速かつ正確な予測を得るために,グラフニューラルネットワーク(gnn)をトレーニングすることを提案する。 GNNは、電力系統内の測定セットをランダムにサンプリングし、PMUsソルバを備えた線形SEを用いて得られる解でラベル付けすることで、合成データセットを用いて訓練される。その結果,様々なテストシナリオにおけるGNN予測の精度を示し,欠落した入力データに対する予測の感度に対処した。

The goal of the state estimation (SE) algorithm is to estimate complex bus voltages as state variables based on the available set of measurements in the power system. Because phasor measurement units (PMUs) are increasingly being used in transmission power systems, there is a need for a fast SE solver that can take advantage of PMU high sampling rates. This paper proposes training a graph neural network (GNN) to learn the estimates given the PMU voltage and current measurements as inputs, with the intent of obtaining fast and accurate predictions during the evaluation phase. GNN is trained using synthetic datasets, created by randomly sampling sets of measurements in the power system and labelling them with a solution obtained using a linear SE with PMUs solver. The presented results display the accuracy of GNN predictions in various test scenarios and tackle the sensitivity of the predictions to the missing input data.

翻訳日:2022-01-12 15:57:00 公開日:2022-01-11

# (参考訳) 重力波検出における共通空間パターンの適用

Application of Common Spatial Patterns in Gravitational Waves Detection ( http://arxiv.org/abs/2201.04086v1 )

ライセンス: CC BY 4.0

Damodar Dahal

(参考訳) 共通空間パターン (Common Spatial Patterns, CSP) は、脳-コンピュータインタフェース(BCI)システムで多チャンネル脳磁図・脳波(MEG/EEG)時系列データ中の事象関連電位(ERP)を検出するために広く使われている特徴抽出アルゴリズムである。本稿では,多検出器重力波(GW)ひずみの所定のエポックが合体(コアレッセンス)を含むかどうかを判定する問題に対して,CSPアルゴリズムを開発し,適用する。信号処理技術とロジスティック回帰分類器を組み合わせた我々のパイプラインは、H1およびL1ひずみを用いて、重力波トランジェントカタログの82の信頼できるイベントのうち76を正しく検出でき、$10 \times 5$ 交差検証による分類スコアは $93.72 \pm 0.04\%$ であることがわかった。偽陰性事象は、GW170817-v3、GW191219_163120-v1、GW200115_042309-v2、GW200210_092254-v1、GW200220_061928-v1、GW200322_091133-v1である。

Common Spatial Patterns (CSP) is a feature extraction algorithm widely used in Brain-Computer Interface (BCI) systems for detecting Event-Related Potentials (ERPs) in multi-channel magneto/electroencephalography (MEG/EEG) time series data. In this article, we develop and apply a CSP algorithm to the problem of identifying whether a given epoch of multi-detector Gravitational Wave (GW) strains contains coalescences. Paired with signal processing techniques and a logistic regression classifier, our pipeline is able to correctly detect 76 out of 82 confident events from the Gravitational Wave Transient Catalog, using H1 and L1 strains, with a classification score of $93.72 \pm 0.04\%$ using $10 \times 5$ cross validation. The false negative events were: GW170817-v3, GW191219_163120-v1, GW200115_042309-v2, GW200210_092254-v1, GW200220_061928-v1, and GW200322_091133-v1.

翻訳日:2022-01-12 15:46:43 公開日:2022-01-11

# (参考訳) スペクトルサーベイ:自律型UAVを用いたアクティブ無線マップ推定

Spectrum Surveying: Active Radio Map Estimation with Autonomous UAVs ( http://arxiv.org/abs/2201.04125v1 )

ライセンス: CC BY 4.0

Raju Shrestha, Daniel Romero, Sundeep Prabhakar Chepuri

(参考訳) 無線地図は、リソース割り当て、干渉調整、ミッションプランニングなど、無線通信や移動ロボットのタスクに多くの応用を見出している。空間的に分布した測定から無線地図を構築する手法は多数提案されているが、そうした測定の位置はあらかじめ決められていると仮定されている。これに対し本稿では,無人航空機 (UAV) などの移動ロボットが,短い測量時間で高品質な地図推定を得るために,能動的に選択された複数の場所で測定を収集するスペクトラムサーベイを提案する。これは2つのステップで行われる。まず,地図推定値と,各候補地点における測定の情報量を示す不確実性指標とを更新するために,モデルベースのオンラインベイズ推定器とデータ駆動の深層学習アルゴリズムという2つの新しいアルゴリズムを考案する。これらのアルゴリズムは相補的な利点を持ち、測定ごとの計算量が一定であるという特徴を持つ。第二に、不確実性指標は、UAVの軌道を計画し、最も情報量の多い場所で測定を収集するために用いられる。この問題の組合せ的複雑性を克服するために、不確実性の大きい領域を通る経路点のリストを線形時間で得る動的計画法を提案する。現実的なデータセットを用いた数値実験により,提案手法が正確な無線地図を高速に構築できることが確認された。

Radio maps find numerous applications in wireless communications and mobile robotics tasks, including resource allocation, interference coordination, and mission planning. Although numerous techniques have been proposed to construct radio maps from spatially distributed measurements, the locations of such measurements are assumed predetermined beforehand. In contrast, this paper proposes spectrum surveying, where a mobile robot such as an unmanned aerial vehicle (UAV) collects measurements at a set of locations that are actively selected to obtain high-quality map estimates in a short surveying time. This is performed in two steps. First, two novel algorithms, a model-based online Bayesian estimator and a data-driven deep learning algorithm, are devised for updating a map estimate and an uncertainty metric that indicates the informativeness of measurements at each possible location. These algorithms offer complementary benefits and feature constant complexity per measurement. Second, the uncertainty metric is used to plan the trajectory of the UAV to gather measurements at the most informative locations. To overcome the combinatorial complexity of this problem, a dynamic programming approach is proposed to obtain lists of waypoints through areas of large uncertainty in linear time. Numerical experiments conducted on a realistic dataset confirm that the proposed scheme constructs accurate radio maps quickly.

翻訳日:2022-01-12 15:36:57 公開日:2022-01-11

# ODEフローの経路微分可能性

Path differentiability of ODE flows ( http://arxiv.org/abs/2201.03819v1 )

ライセンス: Link先を確認

Swann Marx (LS2N), Edouard Pauwels (IRIT)

(参考訳) 経路微分ベクトル場によって駆動される常微分方程式(ODE)の流れを考える。経路微分可能関数は、基本計算規則と相反する一般化微分の概念である保守勾配を受け入れるリプシッツ函数の固有部分類を構成する。我々の主な結果は、そのような流れが駆動ベクトル場の経路微分可能性特性を継承することを示している。感度差分包有物によって与えられる導関数の前方伝播が流れに保守的ジャコビアンを与えることを示す。これにより、ODE制約の下で積分コストに適用可能な非滑らかなアジョイント法を提案することができる。この結果は、パラメトリズドODE制約を用いた多種多様な非滑らかな最適化問題を解くための小さなステップ一階法の適用の理論的根拠となっている。これは、提案する非スムース随伴に基づく小さなステップ一階法を収束させることで示される。

We consider flows of ordinary differential equations (ODEs) driven by path differentiable vector fields. Path differentiable functions constitute a proper subclass of Lipschitz functions which admit conservative gradients, a notion of generalized derivative compatible with basic calculus rules. Our main result states that such flows inherit the path differentiability property of the driving vector field. Indeed, we show that forward propagation of derivatives given by the sensitivity differential inclusions provides a conservative Jacobian for the flow. This allows us to propose a nonsmooth version of the adjoint method, which can be applied to integral costs under an ODE constraint. This result constitutes a theoretical foundation for the application of small step first order methods to solve a broad class of nonsmooth optimization problems with parametrized ODE constraints. This is illustrated with the convergence of small step first order methods based on the proposed nonsmooth adjoint.

翻訳日:2022-01-12 15:07:20 公開日:2022-01-11

# 機械学習時代の反応と分光に関する原子論的シミュレーション - quo vadis?

Atomistic Simulations for Reactions and Spectroscopy in the Era of Machine Learning -- Quo Vadis? ( http://arxiv.org/abs/2201.03822v1 )

ライセンス: Link先を確認

M. Meuwly

(参考訳) 正確なエネルギー関数を用いた原子論シミュレーションは、気体および凝縮相における分子の機能運動に関する分子レベルの洞察を与えることができる。最近開発され現在進行中の機械学習技術の統合と組み合わせは、そのようなダイナミクスシミュレーションを現実に近づけるユニークな機会を提供する。この視点は、この分野における他者の努力と、自身の仕事のいくつかから現場の現状を明確にし、オープンな質問と将来の展望について議論する。

Atomistic simulations using accurate energy functions can provide molecular-level insight into functional motions of molecules in the gas phase and in the condensed phase. Recently developed and currently pursued efforts to integrate and combine such simulations with machine learning techniques provide a unique opportunity to bring these dynamics simulations closer to reality. This perspective delineates the present status of the field, drawing on the efforts of others and some of our own work, and discusses open questions and future prospects.

翻訳日:2022-01-12 15:07:06 公開日:2022-01-11

# スパース・リワード課題における強化と模倣学習を組み合わせたリワード・リラベリング

Reward Relabelling for combined Reinforcement and Imitation Learning on sparse-reward tasks ( http://arxiv.org/abs/2201.03834v1 )

ライセンス: Link先を確認

Jesus Bujalance Martin, Fabien Moutarde

(参考訳) 近年、深層強化学習(DRL)は、ロボット工学、自律運転、ビデオゲームといった複雑な意思決定アプリケーションへの侵入に成功した。よりサンプル効率の良いアルゴリズムの探索では、できるだけ多くの外部のオフポリシーデータを活用することが有望な方向である。このデータ駆動アプローチの要点は、専門家のデモから学ぶことだ。過去には、デモのみの事前トレーニングや追加コスト関数の最小化など、リプレイバッファに追加されるデモをうまく活用するための複数のアイデアが提案されている。我々は,オンライン上で収集したデモやエピソードを,オフ・ポリシー・アルゴリズムを用いて,どのようなスパース・リワード環境でも活用できる新しい手法を提案する。本手法は,実演や成功したエピソードに与えられる報酬ボーナスに基づいて,専門家の模倣と自己模倣を奨励する。まず、エージェントが実証された動作にマッチするように促すために、デモから来る遷移に報奨ボーナスを与える。次に、成功したエピソードを収集すると、リプレイバッファに追加する前に同じボーナスで遷移を緩和し、エージェントが以前の成功と一致するように促します。実験はロボットの操作,特に6自由度ロボットアームの3つのタスクに焦点をあてた。報奨関係に基づく手法は, 実演がなくても, 基本アルゴリズム (sac, ddpg) の性能を向上させることを示す。さらに,従来の方法から2つの改善点を取り入れることで,すべてのベースラインを上回ります。

During recent years, deep reinforcement learning (DRL) has made successful incursions into complex decision-making applications such as robotics, autonomous driving or video games. In the search for more sample-efficient algorithms, a promising direction is to leverage as much external off-policy data as possible. One staple of this data-driven approach is to learn from expert demonstrations. In the past, multiple ideas have been proposed to make good use of the demonstrations added to the replay buffer, such as pretraining on demonstrations only or minimizing additional cost functions. We present a new method, able to leverage demonstrations and episodes collected online in any sparse-reward environment with any off-policy algorithm. Our method is based on a reward bonus given to demonstrations and successful episodes, encouraging expert imitation and self-imitation. First, we give a reward bonus to the transitions coming from demonstrations to encourage the agent to match the demonstrated behaviour. Then, upon collecting a successful episode, we relabel its transitions with the same bonus before adding them to the replay buffer, encouraging the agent to also match its previous successes. Our experiments focus on manipulation robotics, specifically on three tasks for a 6 degrees-of-freedom robotic arm in simulation. We show that our method based on reward relabeling improves the performance of the base algorithm (SAC and DDPG) on these tasks, even in the absence of demonstrations. Furthermore, integrating into our method two improvements from previous works allows our approach to outperform all baselines.
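
The relabelling step described above can be sketched in a few lines. This is a minimal illustration that assumes transitions are stored as `(state, action, reward, next_state, done)` tuples; the names `add_bonus` and `ReplayBuffer`, and the way success is signalled, are hypothetical and not taken from the paper's code.

```python
from collections import deque


def add_bonus(transitions, bonus):
    """Return copies of (state, action, reward, next_state, done) tuples
    with the reward increased by `bonus`."""
    return [(s, a, r + bonus, s2, done) for (s, a, r, s2, done) in transitions]


class ReplayBuffer:
    """Replay buffer sketch (hypothetical) that applies a reward bonus to
    demonstrations and to successful online episodes before storing them."""

    def __init__(self, capacity=10000):
        self.data = deque(maxlen=capacity)

    def add_demonstration(self, transitions, bonus):
        # Expert imitation: every demonstrated transition gets the bonus.
        self.data.extend(add_bonus(transitions, bonus))

    def add_episode(self, transitions, success, bonus):
        # Self-imitation: only successful episodes are relabelled.
        if success:
            transitions = add_bonus(transitions, bonus)
        self.data.extend(transitions)
```

An off-policy learner such as SAC or DDPG would then sample from `data` as usual, which is why the idea is not tied to a particular base algorithm.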

翻訳日:2022-01-12 15:06:57 公開日:2022-01-11

# 3次元物体検出と位置推定のためのLiDARビーム構成のエンドツーエンド最適化

End-To-End Optimization of LiDAR Beam Configuration for 3D Object Detection and Localization ( http://arxiv.org/abs/2201.03860v1 )

ライセンス: Link先を確認

Niclas Vödisch, Ozan Unal, Ke Li, Luc Van Gool, Dengxin Dai

(参考訳) lidarベースのアプリケーションのための既存の学習方法は、あらかじめ決められたビーム構成の下でスキャンされた3dポイントを使用する。これらの固定構成はタスクに依存しないため、単純に使用すればサブ最適パフォーマンスにつながる可能性がある。本研究では,あるアプリケーションに対して,lidarビーム構成の最適化を学ぶための新しい経路を提案する。具体的には、異なるLiDARベースのアプリケーションに対して、ビーム構成を自動的にエンドツーエンドに最適化する強化学習ベースの学習最適化(RL-L2O)フレームワークを提案する。この最適化は,目標タスクの最終的な性能によって導かれるので,簡単なドロップインモジュールとして任意のLiDARアプリケーションと容易に統合できる。この方法は、例えば大規模なシステム展開において、低解像度(低コスト)のLiDARが必要な場合に特に有用である。我々は,低分解能LiDARのビーム構成を3次元物体検出と局所化という2つの重要なタスクに対して探索する。実験の結果,RL-L2O法はベースライン法に比べて両タスクの性能が有意に向上することがわかった。我々は,プログラム可能なLiDARの最近の進歩と組み合わせることで,LiDARをベースとしたアクティブな知覚のための新たな研究方向を創出できると考えている。コードはhttps://github.com/vniclas/lidar_beam_selectionで公開されている。

Existing learning methods for LiDAR-based applications use 3D points scanned under a pre-determined beam configuration, e.g., the elevation angles of beams are often evenly distributed. Those fixed configurations are task-agnostic, so simply using them can lead to sub-optimal performance. In this work, we take a new route to learn to optimize the LiDAR beam configuration for a given application. Specifically, we propose a reinforcement learning-based learning-to-optimize (RL-L2O) framework to automatically optimize the beam configuration in an end-to-end manner for different LiDAR-based applications. The optimization is guided by the final performance of the target task and thus our method can be integrated easily with any LiDAR-based application as a simple drop-in module. The method is especially useful when a low-resolution (low-cost) LiDAR is needed, for instance, for system deployment at a massive scale. We use our method to search for the beam configuration of a low-resolution LiDAR for two important tasks: 3D object detection and localization. Experiments show that the proposed RL-L2O method improves the performance in both tasks significantly compared to the baseline methods. We believe that a combination of our method with the recent advances of programmable LiDARs can start a new research direction for LiDAR-based active perception. The code is publicly available at https://github.com/vniclas/lidar_beam_selection

翻訳日:2022-01-12 15:06:32 公開日:2022-01-11

# Systematic Literature Review: Quantum Machine Learningとその応用

Systematic Literature Review: Quantum Machine Learning and its applications ( http://arxiv.org/abs/2201.04093v1 )

ライセンス: Link先を確認

David Peral García, Juan Cruz-Benito and Francisco José García-Peñalvo

(参考訳) 量子コンピューティングは、量子力学を用いて計算を行う過程である。このフィールドは、その後の計算や大規模情報処理に使用するために、特定のサブ原子粒子の量子的挙動を研究する。これらの能力により、量子コンピュータは従来のコンピュータよりも計算時間とコストの面で有利になる。今日では、計算の複雑さや計算にかかる時間によって古典的な計算で実行できない科学的課題があり、量子計算は可能な答えの1つである。しかし、現在の量子デバイスはまだ必要な量子ビットを持っておらず、これらの目標を達成するのに十分なフォールトトレラントではない。それでも、機械学習や化学など、現在の量子デバイスで量子計算が役立つ分野は他にもある。この原稿は、2017年から2021年にかけて出版された論文の体系的な文献レビューを行い、量子機械学習で使用される異なるアルゴリズムとその応用を識別、分析、分類することを目的としている。その結果,量子機械学習技術とアルゴリズムを用いた52の論文を同定した。発見アルゴリズムの主な種類は、サポートベクターマシンやk-ネアレスト隣モデルのような古典的な機械学習アルゴリズムの量子実装と、量子ニューラルネットワークのような古典的なディープラーニングアルゴリズムである。古典的機械学習によって現在回答されている問題を、量子デバイスとアルゴリズムを使って解こうとする記事が多い。結果は有望だが、量子機械学習はその潜在能力を完全に達成するには程遠い。既存の量子コンピュータには、量子コンピューティングがその潜在能力を達成するのに十分な品質、速度、スケールが欠けているため、量子ハードウェアの改善が必要である。

Quantum computing is the process of performing calculations using quantum mechanics. This field studies the quantum behavior of certain subatomic particles for subsequent use in performing calculations, as well as for large-scale information processing. These capabilities can give quantum computers an advantage in terms of computational time and cost over classical computers. Nowadays, there are scientific challenges that cannot be solved by classical computation because of their computational complexity or the time the calculation would take, and quantum computation is one of the possible answers. However, current quantum devices do not yet have the necessary qubits and are not fault-tolerant enough to achieve these goals. Nonetheless, there are other fields like machine learning or chemistry where quantum computation could be useful with current quantum devices. This manuscript aims to present a Systematic Literature Review of the papers published between 2017 and 2021 to identify, analyze and classify the different algorithms used in quantum machine learning and their applications. Consequently, this study identified 52 articles that used quantum machine learning techniques and algorithms. The main types of algorithms found are quantum implementations of classical machine learning algorithms, such as support vector machines or the k-nearest neighbor model, and classical deep learning algorithms, like quantum neural networks. Many articles try to solve problems currently answered by classical machine learning but using quantum devices and algorithms. Even though results are promising, quantum machine learning is far from achieving its full potential. An improvement in the quantum hardware is required since the existing quantum computers lack sufficient quality, speed, and scale to allow quantum computing to achieve its full potential.

翻訳日:2022-01-12 15:06:15 公開日:2022-01-11

# 最適圧縮VCクラス

Optimally compressing VC classes ( http://arxiv.org/abs/2201.04131v1 )

ライセンス: Link先を確認

Zachary Chase

(参考訳) Littlestone と Warmuth の予想を解くと、VC-dimension $d$ の任意の概念クラスは、サンプル圧縮スキームが$d$ であることを示す。

Resolving a conjecture of Littlestone and Warmuth, we show that any concept class of VC-dimension $d$ has a sample compression scheme of size $d$.
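
To make the notion of a sample compression scheme concrete, the sketch below treats the simplest textbook case: threshold classifiers $h_t(x) = 1$ if $x \ge t$ on the real line, a class of VC-dimension 1. Any sample labelled by some threshold can be compressed to at most one labelled point from which a hypothesis consistent with the whole sample is reconstructed. This only illustrates the definition; it is not the construction from the paper, and the function names are made up for the example.

```python
def compress(sample):
    """Compress a sample of (x, label) pairs that is consistent with some
    threshold classifier h_t(x) = int(x >= t) to at most one labelled point:
    the smallest positively labelled x, or nothing if there are no positives."""
    positives = [x for x, y in sample if y == 1]
    return [(min(positives), 1)] if positives else []


def reconstruct(compressed):
    """Rebuild a hypothesis from the compressed set. It labels every point
    of the original sample correctly."""
    if not compressed:
        return lambda x: 0
    threshold = compressed[0][0]
    return lambda x: int(x >= threshold)
```

For instance, `reconstruct(compress([(0.1, 0), (0.7, 1), (0.9, 1)]))` keeps only the point `0.7` and still classifies all three sample points correctly.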

翻訳日:2022-01-12 15:05:51 公開日:2022-01-11

# アンダーサンプド4次元流れMRIからの再構成ノイズの解析

An analysis of reconstruction noise from undersampled 4D flow MRI ( http://arxiv.org/abs/2201.03715v1 )

ライセンス: Link先を確認

Lauren Partin, Daniele E. Schiavazzi and Carlos A. Sing Long

(参考訳) 新しいMR画像モダリティは血行動態を定量化できるが、心血管疾患の早期診断に広く用いられていることを除いて、長い取得時間を必要とする。取得時間を短縮するため、画像圧縮性を高めるために設計された表現を活用するアンダーサンプル計測による再構成手法が日常的に使用される。再構成された解剖学的および血行力学的画像は、視覚的アーティファクトを呈することがある。これらのアーティファクトのいくつかは本質的にレコンストラクションエラーであり、アンダーサンプリングの結果であるが、測定ノイズやサンプル周波数のランダムな選択によるものもある。そうでなければ、再構成された画像はランダムな変数となり、そのバイアスと共分散の両方が視覚的なアーティファクトにつながる可能性がある。前者の性質は文献で研究されているが、後者はそれほど注目されていない。本研究では,再建過程から生じるランダム摂動の理論的性質について検討し,シミュレーションおよびMR大動脈流に関する数値実験を行った。その結果,gaussian undersamplingパターンと$\ell_1$-norm最小化に基づくリカバリアルゴリズムを組み合わせた場合,相関長は2～3ピクセルに制限されることがわかった。しかし, 他のアンダーサンプリングパターン, 高いアンダーサンプリング因子 (すなわち8xまたは16x圧縮) , 異なる再構成法では相関長が有意に増加する可能性がある。

Novel Magnetic Resonance (MR) imaging modalities can quantify hemodynamics but require long acquisition times, precluding their widespread use for early diagnosis of cardiovascular disease. To reduce acquisition times, reconstruction methods from undersampled measurements are routinely used; these leverage representations designed to increase image compressibility. Reconstructed anatomical and hemodynamic images may present visual artifacts. Although some of these artifacts are essentially reconstruction errors, and thus a consequence of undersampling, others may be due to measurement noise or the random choice of the sampled frequencies. In other words, a reconstructed image becomes a random variable, and both its bias and its covariance can lead to visual artifacts; the latter leads to spatial correlations that may be misconstrued as visual information. Although the nature of the former has been studied in the literature, the latter has not received as much attention. In this study, we investigate the theoretical properties of the random perturbations arising from the reconstruction process, and perform a number of numerical experiments on simulated and MR aortic flow. Our results show that the correlation length remains limited to two to three pixels when a Gaussian undersampling pattern is combined with recovery algorithms based on $\ell_1$-norm minimization. However, the correlation length may increase significantly for other undersampling patterns, higher undersampling factors (i.e., 8x or 16x compression), and different reconstruction methods.

翻訳日:2022-01-12 15:05:23 公開日:2022-01-11

# 異常検出のための一様スパース表現を用いた辞書学習

Dictionary Learning with Uniform Sparse Representations for Anomaly Detection ( http://arxiv.org/abs/2201.03869v1 )

ライセンス: Link先を確認

Paul Irofti, Cristian Rusu, Andrei Pătrașcu

(参考訳) オーディオや画像処理のような多くのアプリケーションはスパース表現が強力で効率的な信号モデリング技術であることを示している。辞書学習(DL)によってアプローチされる難解な問題として,データの最短表現と最小近似誤差を同時に生成する最適な辞書を見つけることが挙げられる。信号のデータセットにおける異常サンプルの検出において,DLが果たす効果について検討した。本稿では,K-SVD型アルゴリズムを用いて,一様スパース表現モデルを求める特定のDL定式化を用いて,データセットの多数サンプルの下位部分空間を検出する。数値シミュレーションにより、この結果のサブスペースを効率よく利用し、正規データ点上の異常を識別できることが示されている。

Many applications like audio and image processing show that sparse representations are a powerful and efficient signal modeling technique. Finding an optimal dictionary that generates at the same time the sparsest representations of data and the smallest approximation error is a hard problem approached by dictionary learning (DL). We study how DL performs in detecting abnormal samples in a dataset of signals. In this paper we use a particular DL formulation that seeks a uniform sparse representation model to detect the underlying subspace of the majority of samples in a dataset, using a K-SVD-type algorithm. Numerical simulations show that one can efficiently use the resulting subspace to discriminate anomalies from regular data points.

翻訳日:2022-01-12 15:04:07 公開日:2022-01-11

# PEPit: Pythonにおける一階最適化手法のコンピュータ支援最悪ケース解析

PEPit: computer-assisted worst-case analyses of first-order optimization methods in Python ( http://arxiv.org/abs/2201.04040v1 )

ライセンス: Link先を確認

Baptiste Goujaud, Céline Moucer, François Glineur, Julien Hendrickx, Adrien Taylor, Aymeric Dieuleveut

(参考訳) PEPitはPythonパッケージで、勾配、プロジェクション、近さ、線形最適化オラクルを含む多くの一階最適化メソッドの最悪のケース分析へのアクセスを、近似やブレグマン変種とともに単純化することを目的としている。簡単に言えば、PEPitはコンピュータ支援による一階最適化手法の最悪のケース解析を可能にするパッケージである。基本となる考え方は、数値的に解くことができる半定値プログラム(sdp)として、パフォーマンス推定問題(pep)と呼ばれる最悪のケース分析を行う問題を引き起こすことである。そのため、パッケージのユーザは、実装するのとほとんど同じくらいに、ファーストオーダーのメソッドを書かなければならない。その後、パッケージはSDPモデリング部品の処理を行い、最悪のケース解析は標準解法を介して数値的に行われる。

PEPit is a Python package aiming at simplifying the access to worst-case analyses of a large family of first-order optimization methods possibly involving gradient, projection, proximal, or linear optimization oracles, along with their approximate, or Bregman variants. In short, PEPit is a package enabling computer-assisted worst-case analyses of first-order optimization methods. The key underlying idea is to cast the problem of performing a worst-case analysis, often referred to as a performance estimation problem (PEP), as a semidefinite program (SDP) which can be solved numerically. For doing that, the package users are only required to write first-order methods nearly as they would have implemented them. The package then takes care of the SDP modelling parts, and the worst-case analysis is performed numerically via a standard solver.

翻訳日:2022-01-12 15:03:17 公開日:2022-01-11

# (参考訳) 通信産業におけるデータ変換に基づく最適顧客チャーン予測モデル

Data transformation based optimized customer churn prediction model for the telecommunication industry ( http://arxiv.org/abs/2201.04088v1 )

ライセンス: CC BY 4.0

Joydeb Kumar Sana, Mohammad Zoynul Abedin, M. Sohel Rahman, M. Saifur Rahman

(参考訳) データ変換(DT)は、元のデータを特定の分類アルゴリズムをサポートする形式で転送し、特別な目的のためにデータを解析するプロセスである。予測性能を向上させるため,様々なデータ変換法を検討した。本研究は、顧客誘引が一般的な現象である通信産業(TCI)における顧客チャーン予測(CCP)の文脈で実施する。本研究では, ccp問題に対するデータ変換法と機械学習モデルを組み合わせた新しい手法を提案する。公開TIデータセットを用いて実験を行い,広く利用されている評価尺度(AUC,精度,リコール,F尺度など)を用いて評価を行った。本研究では,変換手法の効果を確認するための包括的比較を行った。比較結果と統計的テストの結果,提案したデータ変換に基づく最適化モデルのほとんどはCCPの性能を著しく向上させることがわかった。全体として、この原稿を通じて、通信産業のための効率的で最適化されたCCPモデルが提示されている。

Data transformation (DT) is a process that converts the original data into a form which supports a particular classification algorithm and helps to analyze the data for a special purpose. To improve the prediction performance, we investigated various data transformation methods. This study is conducted in a customer churn prediction (CCP) context in the telecommunication industry (TCI), where customer attrition is a common phenomenon. We have proposed a novel approach of combining data transformation methods with machine learning models for the CCP problem. We conducted our experiments on publicly available TCI datasets and assessed the performance in terms of widely used evaluation measures (e.g. AUC, precision, recall, and F-measure). In this study, we presented comprehensive comparisons to affirm the effect of the transformation methods. The comparison results and statistical tests showed that most of the proposed data transformation based optimized models improve the performance of CCP significantly. Overall, an efficient and optimized CCP model for the telecommunication industry has been presented through this manuscript.

翻訳日:2022-01-12 15:01:05 公開日:2022-01-11

# gDNA: 生成の詳細なニューラルアバターを目指して

gDNA: Towards Generative Detailed Neural Avatars ( http://arxiv.org/abs/2201.04123v1 )

ライセンス: Link先を確認

Xu Chen, Tianjian Jiang, Jie Song, Jinlong Yang, Michael J. Black, Andreas Geiger, Otmar Hilliges

(参考訳) 3Dアバターを広く利用するためには、任意のポーズでさまざまなアイデンティティと形状を持つ様々な3D仮想人間を生成する必要がある。この課題は、衣服の形状の多様性、複雑な調音、そして衣服における豊かでしかし確率的な幾何学的詳細のためである。したがって、現在の3D人を表す方法は、衣服の人々の完全な生成モデルを提供していない。本稿では,スキンの重みに対応するさまざまな衣服の人物の詳細な3次元形状を学習する新しい手法を提案する。具体的には,被験者1人あたり数回のポーズ・アンリグドスキャンから学習したマルチサブジェクト・フォワード・スキニングモジュールを考案する。衣料品の高周波詳細の確率的性質を捉えるために,モデルが基礎となる統計を捉えることを奨励する逆損失定式化を利用する。このことがシワなどの局所的な詳細の現実的な生成につながるという実証的な証拠を提供する。我々は,多様で詳細な衣服を身に着けた天然のアバターを生産できることを示した。さらに,本手法は,人間のモデルを生のスキャンに適合させることで,従来の技術よりも優れることを示す。

To make 3D human avatars widely available, we must be able to generate a variety of 3D virtual humans with varied identities and shapes in arbitrary poses. This task is challenging due to the diversity of clothed body shapes, their complex articulations, and the resulting rich, yet stochastic geometric detail in clothing. Hence, current methods to represent 3D people do not provide a full generative model of people in clothing. In this paper, we propose a novel method that learns to generate detailed 3D shapes of people in a variety of garments with corresponding skinning weights. Specifically, we devise a multi-subject forward skinning module that is learned from only a few posed, un-rigged scans per subject. To capture the stochastic nature of high-frequency details in garments, we leverage an adversarial loss formulation that encourages the model to capture the underlying statistics. We provide empirical evidence that this leads to realistic generation of local details such as wrinkles. We show that our model is able to generate natural human avatars wearing diverse and detailed clothing. Furthermore, we show that our method can be used on the task of fitting human models to raw scans, outperforming the previous state-of-the-art.

翻訳日:2022-01-12 14:46:05 公開日:2022-01-11

# DANNTe:ドメインシフト下におけるターボ機械センサ仮想化の事例研究

DANNTe: a case study of a turbo-machinery sensor virtualization under domain shift ( http://arxiv.org/abs/2201.03850v1 )

ライセンス: Link先を確認

Luca Strazzera and Valentina Gori and Giacomo Veneri

(参考訳) 本稿では,ドメイン適応(DA)時系列回帰タスク(DANNTe)に取り組むための逆学習手法を提案する。この回帰は、ガスタービンに搭載されたセンサーの仮想コピーを構築することを目的としており、特定の状況で失われる可能性のある物理センサーの代わりに使用される。我々のDAアプローチは、特徴のドメイン不変表現を探すことです。学習者はラベル付きソースデータセットとラベル付きターゲットデータセット(教師なしDA)の両方にアクセスでき、タスク回帰器とドメイン分類器ニューラルネットワークの間のminmaxゲームを利用するようにトレーニングされる。両方のモデルは同じ特徴表現を共有し、特徴抽出器によって学習される。この研究は Ganin et al. arXiv:1505.07818 によって発表された結果に基づいている。ソースドメインでのみトレーニングされたベースラインモデルと比較して,回帰性能が大幅に向上したことを報告する。

We propose an adversarial learning method to tackle a Domain Adaptation (DA) time series regression task (DANNTe). The regression aims at building a virtual copy of a sensor installed on a gas turbine, to be used in place of the physical sensor which can be missing in certain situations. Our DA approach is to search for a domain-invariant representation of the features. The learner has access to both a labelled source dataset and an unlabeled target dataset (unsupervised DA) and is trained on both, exploiting the minmax game between a task regressor and a domain classifier Neural Networks. Both models share the same feature representation, learnt by a feature extractor. This work is based on the results published by Ganin et al. arXiv:1505.07818; indeed, we present an extension suitable to time series applications. We report a significant improvement in regression performance, compared to the baseline model trained on the source domain only.

翻訳日:2022-01-12 14:45:33 公開日:2022-01-11

# 自動強化学習(AutoRL: Automated Reinforcement Learning)の調査と課題

Automated Reinforcement Learning (AutoRL): A Survey and Open Problems ( http://arxiv.org/abs/2201.03916v1 )

ライセンス: Link先を確認

Jack Parker-Holder, Raghu Rajan, Xingyou Song, André Biedenkapp, Yingjie Miao, Theresa Eimer, Baohe Zhang, Vu Nguyen, Roberto Calandra, Aleksandra Faust, Frank Hutter, Marius Lindauer

(参考訳) 強化学習(RL)とディープラーニングの組み合わせは、多くの(深い)RLが一般的に有能なエージェントへの道筋を提供する、という印象的な成果をもたらした。しかしながら、RLエージェントの成功は、しばしばトレーニングプロセスにおける設計選択に非常に敏感であり、退屈でエラーを起こしやすい手動チューニングを必要とする。これにより、新しい問題にRLを使用することが難しくなり、また、その潜在能力を最大限に制限する。機械学習の他の多くの分野において、AutoMLはそのような設計選択を自動化できることを示しており、RLに適用すると有望な初期結果も得られている。しかし、AutoRL(Automated Reinforcement Learning)は、AutoMLの標準的なアプリケーションだけでなく、RL特有の課題も含んでいる。そのため、AutoRLはRLにおける重要な研究領域として現れており、RNA設計からGoのようなゲームまで、様々なアプリケーションで約束されている。 RLにおける手法や環境の多様性を考えると、研究の多くはメタラーニングから進化まで、異なるサブフィールドで行われている。本調査では,AutoRLの分野を統一し,共通分類学を提供し,各分野を詳細に議論し,今後の研究者にとって関心のあるオープンな問題を提起する。

The combination of Reinforcement Learning (RL) with deep learning has led to a series of impressive feats, with many believing (deep) RL provides a path towards generally capable agents. However, the success of RL agents is often highly sensitive to design choices in the training process, which may require tedious and error-prone manual tuning. This makes it challenging to use RL for new problems and also limits its full potential. In many other areas of machine learning, AutoML has shown it is possible to automate such design choices and has also yielded promising initial results when applied to RL. However, Automated Reinforcement Learning (AutoRL) involves not only standard applications of AutoML but also additional challenges unique to RL, which naturally produce a different set of methods. As such, AutoRL has been emerging as an important area of research in RL, providing promise in a variety of applications from RNA design to playing games such as Go. Given the diversity of methods and environments considered in RL, much of the research has been conducted in distinct subfields, ranging from meta-learning to evolution. In this survey, we seek to unify the field of AutoRL, provide a common taxonomy, discuss each area in detail, and pose open problems of interest to researchers going forward.

翻訳日:2022-01-12 14:45:17 公開日:2022-01-11

# (関連鍵)差分を用いた改良型ニューラル識別器:SIMONとSIMECKへの応用

Improved Neural Distinguishers with (Related-key) Differentials: Applications in SIMON and SIMECK ( http://arxiv.org/abs/2201.03767v1 )

ライセンス: Link先を確認

Jinyu Lu and Guoqiang Liu and Yunwen Liu and Bing Sun and Chao Li and Li Liu

(参考訳) CRYPTO 2019で、Gohr氏は先駆的な試みを行い、NSAブロック暗号Speck32/64に対する差分暗号解析にディープラーニングをうまく適用し、純粋な差分区別器よりも高い精度を実現した。その性質上、データ内の効果的な特徴のマイニングは、データ駆動型ディープラーニングにおいて重要な役割を果たす。本稿では,暗号文ペアの学習データからの情報の整合性を考慮することに加えて,ディープラーニングの学習プロセスにおいて,差分暗号解析の構造に関するドメイン知識も考慮し,性能の向上を図る。また,sat/smtソルバに基づいて,従来に比べて性能を効果的に向上させる他の高確率対応差分特性を見出す。我々は,Simon と Simeck に対してニューラル差別器 (NDs) と関連キーニューラル差別器 (RKNDs) を構築する。 Simon32/64 の ND と RKND はそれぞれ 11-, 11-round に達し、それぞれ 59.55% と 97.90% である。 Simon64/128では、NDは13ラウンドで60.32%、RKNDは95.49%である。 Simeck32/64では、11ラウンド、14ラウンドのNDとRKNDがそれぞれ63.32%、87.06%の精度に達する。また、Simeck64/128向けに17ラウンドのNDと21ラウンドのRKNDをそれぞれ64.24%と62.96%の精度で構築する。現在、これらはSimon32/64、Simon64/128、Simeck32/64、Simeck64/128の最も長い(関連するキー)神経識別器である。

In CRYPTO 2019, Gohr made a pioneering attempt and successfully applied deep learning to differential cryptanalysis against the NSA block cipher Speck32/64, achieving higher accuracy than pure differential distinguishers. By its very nature, mining effective features in data plays a crucial role in data-driven deep learning. In this paper, in addition to considering the integrity of the information from the training data of the ciphertext pair, domain knowledge about the structure of differential cryptanalysis is also incorporated into the training process of deep learning to improve the performance. In addition, based on SAT/SMT solvers, we find other high-probability compatible differential characteristics which effectively improve the performance compared with previous work. We build neural distinguishers (NDs) and related-key neural distinguishers (RKNDs) against Simon and Simeck. The ND and RKND for Simon32/64 both reach 11 rounds, with accuracies of 59.55% and 97.90%, respectively. For Simon64/128, the ND achieves an accuracy of 60.32% at 13 rounds, while the RKND achieves 95.49%. For Simeck32/64, an 11-round ND and a 14-round RKND are obtained, reaching accuracies of 63.32% and 87.06%, respectively. We also build a 17-round ND and a 21-round RKND for Simeck64/128 with accuracies of 64.24% and 62.96%, respectively. Currently, these are the longest (related-key) neural distinguishers with higher accuracy for Simon32/64, Simon64/128, Simeck32/64 and Simeck64/128.
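
A neural distinguisher of this kind is trained on labelled ciphertext pairs: "real" pairs come from plaintexts with a fixed input difference, "random" pairs from unrelated plaintexts. The sketch below generates such samples for a reduced-round Simon-like cipher with 16-bit words, using the standard Simon round function. As simplifying assumptions, it draws independent random round keys instead of running the real Simon32/64 key schedule, and the input difference `(0x0000, 0x0040)` is only an illustrative choice, not necessarily the one used in the paper.

```python
import random

WORD = 16                     # word size of Simon32
MASK = (1 << WORD) - 1


def rol(x, r):
    """Rotate a WORD-bit value left by r bits."""
    return ((x << r) | (x >> (WORD - r))) & MASK


def f(x):
    """Simon round function: (x <<< 1) & (x <<< 8) ^ (x <<< 2)."""
    return (rol(x, 1) & rol(x, 8)) ^ rol(x, 2)


def encrypt(left, right, round_keys):
    """Apply one Feistel round per key in round_keys."""
    for k in round_keys:
        left, right = right ^ f(left) ^ k, left
    return left, right


def decrypt(left, right, round_keys):
    """Invert encrypt() for the same round keys."""
    for k in reversed(round_keys):
        left, right = right, left ^ f(right) ^ k
    return left, right


def make_sample(n_rounds, delta, real, rng):
    """Return (ciphertext0, ciphertext1, label). Assumption: independent
    random round keys stand in for the real key schedule."""
    keys = [rng.getrandbits(WORD) for _ in range(n_rounds)]
    p0 = (rng.getrandbits(WORD), rng.getrandbits(WORD))
    if real:
        p1 = (p0[0] ^ delta[0], p0[1] ^ delta[1])
    else:
        p1 = (rng.getrandbits(WORD), rng.getrandbits(WORD))
    return encrypt(*p0, keys), encrypt(*p1, keys), int(real)
```

A classifier trained on many such samples acts as a distinguisher if it predicts the label with accuracy above 50%; a related-key variant would additionally inject a fixed difference into the key.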

翻訳日:2022-01-12 14:42:55 公開日:2022-01-11

# CVSSコーパスと大規模多言語音声間翻訳

CVSS Corpus and Massively Multilingual Speech-to-Speech Translation ( http://arxiv.org/abs/2201.03713v1 )

ライセンス: Link先を確認

Ye Jia, Michelle Tadmor Ramanovich, Quan Wang, Heiga Zen

(参考訳) CVSSは,21言語から英語への文レベル並列S2ST対をカバーする,多言語から英語への多言語翻訳(S2ST)コーパスである。 CVSSはコモンボイス音声コーパスとCoVoST2音声テキスト翻訳(ST)コーパスから派生したもので、CoVoST2からの翻訳テキストを最先端のTSSシステムを用いて音声に合成する。翻訳文には2つのバージョンがある。 1)CVSS-C:全ての翻訳音声は高品質の標準音声である。 2) CVSS-T: 翻訳音声は対応する音源から伝達される音声である。さらに、CVSSは、翻訳音声の発音と一致する正規化翻訳テキストを提供する。 CVSSの各バージョンにおいて,ベースライン多言語直接S2STモデルとカスケードS2STモデルを構築し,コーパスの有効性を検証した。強力なカスケードS2STベースラインを構築するために、我々はCoVoST 2上でSTモデルを訓練した。それでも、直接S2STモデルの性能は、スクラッチからトレーニングされたときの強いカスケードベースラインに近づき、一致するSTモデルから初期化されるときのASR転写翻訳における0.1または0.7BLEU差のみである。

We introduce CVSS, a massively multilingual-to-English speech-to-speech translation (S2ST) corpus, covering sentence-level parallel S2ST pairs from 21 languages into English. CVSS is derived from the Common Voice speech corpus and the CoVoST 2 speech-to-text translation (ST) corpus, by synthesizing the translation text from CoVoST 2 into speech using state-of-the-art TTS systems. Two versions of translation speeches are provided: 1) CVSS-C: All the translation speeches are in a single high-quality canonical voice; 2) CVSS-T: The translation speeches are in voices transferred from the corresponding source speeches. In addition, CVSS provides normalized translation text which matches the pronunciation in the translation speech. On each version of CVSS, we built baseline multilingual direct S2ST models and cascade S2ST models, verifying the effectiveness of the corpus. To build strong cascade S2ST baselines, we trained an ST model on CoVoST 2, which outperforms the previous state-of-the-art trained on the corpus without extra data by 5.8 BLEU. Nevertheless, the performance of the direct S2ST models approaches the strong cascade baselines when trained from scratch, and with only 0.1 or 0.7 BLEU difference on ASR transcribed translation when initialized from matching ST models.

翻訳日:2022-01-12 14:42:28 公開日:2022-01-11

# TSA-Net:行動品質評価のためのチューブ自己注意ネットワーク

TSA-Net: Tube Self-Attention Network for Action Quality Assessment ( http://arxiv.org/abs/2201.03746v1 )

ライセンス: Link先を確認

Shunli Wang, Dingkang Yang, Peng Zhai, Chixiao Chen, Lihua Zhang

(参考訳) 近年、映像からの行動品質の評価が、コンピュータビジョンコミュニティやヒューマンコンピュータインタラクションにおいて注目を集めている。既存のアプローチの多くは、行動認識タスクからモデルを直接移行することでこの問題に対処しているが、これは前景情報や背景情報といった特徴マップ内の本質的な違いを無視している。この問題に対処するために、行動品質評価(AQA)のためのチューブ自己注意ネットワーク(TSA-Net)を提案する。具体的には、単一オブジェクトトラッカーをAQAに導入し、スパースな特徴相互作用を採用することで豊富な時空間文脈情報を効率的に生成できるチューブ自己注意モジュール(TSA)を提案する。 TSAモジュールは既存のビデオネットワークに埋め込まれ、TSA-Netを形成する。全体として、TSA-Netには以下の利点がある。 1)高い計算効率、 2)高い柔軟性、 3)最先端の性能。 AQA-7やMTL-AQAなど、一般的な行動品質評価データセットで大規模な実験を行った。さらに、フィギュアスケートシーンにおける基本的な行動評価を検討するために、Fall Recognition in Figure Skating (FR-FS) というデータセットを提案する。

In recent years, assessing action quality from videos has attracted growing attention in the computer vision community and in human-computer interaction. Most existing approaches tackle this problem by directly migrating models from action recognition tasks, which ignores the intrinsic differences within the feature map, such as foreground and background information. To address this issue, we propose a Tube Self-Attention Network (TSA-Net) for action quality assessment (AQA). Specifically, we introduce a single object tracker into AQA and propose the Tube Self-Attention Module (TSA), which can efficiently generate rich spatio-temporal contextual information by adopting sparse feature interactions. The TSA module is embedded in existing video networks to form TSA-Net. Overall, our TSA-Net has the following merits: 1) high computational efficiency, 2) high flexibility, and 3) state-of-the-art performance. Extensive experiments are conducted on popular action quality assessment datasets including AQA-7 and MTL-AQA. In addition, a dataset named Fall Recognition in Figure Skating (FR-FS) is proposed to explore basic action assessment in the figure skating scene.

翻訳日:2022-01-12 14:41:31 公開日:2022-01-11

# 脳腫瘍セグメンテーションのための相互敵対的学習: BraTS Challenge 2021 セグメンテーションタスクへの解法

Reciprocal Adversarial Learning for Brain Tumor Segmentation: A Solution to BraTS Challenge 2021 Segmentation Task ( http://arxiv.org/abs/2201.03777v1 )

ライセンス: Link先を確認

Himashi Peiris, Zhaolin Chen, Gary Egan, Mehrtash Harandi

(参考訳) 本稿では、脳腫瘍セグメンテーションタスクのための敵対的学習に基づくトレーニング手法を提案する。この手法では、3Dセグメンテーションネットワークが2つの相互的な敵対的学習アプローチから学習する。セグメンテーション予測の汎化性能を高め、セグメンテーションネットワークを頑健にするために、元の患者データにノイズを加えてより多くの敵対的サンプルを生成する仮想敵対的トレーニング(Virtual Adversarial Training)のアプローチに従う。定量的かつ主観的な審判として機能する批評家(critic)を組み込むことで、セグメンテーションネットワークはセグメンテーション結果に関連する不確実性情報から学習する。 RSNA-ASNR-MICCAI BraTS 2021データセットを用いてネットワークアーキテクチャのトレーニングと評価を行った。オンライン検証データセットでの性能は、増強腫瘍、腫瘍全体、腫瘍コアの順に、Dice類似スコアが81.38%、90.77%、85.39%、Hausdorff Distance (95%)が21.83 mm、5.37 mm、8.56 mmであった。同様に、最終テストデータセットでは、Dice類似スコア84.55%、90.46%、85.30%、Hausdorff Distance (95%) 13.48 mm、6.32 mm、16.98 mmを達成した。全体として、提案手法は各腫瘍部分領域のセグメンテーション精度を向上させた。コード実装は https://github.com/himashi92/vizviva_brats_2021 で公開されている。

This paper proposes an adversarial learning based training approach for the brain tumor segmentation task. In this concept, the 3D segmentation network learns from dual reciprocal adversarial learning approaches. To enhance the generalization across the segmentation predictions and to make the segmentation network robust, we adhere to the Virtual Adversarial Training approach by generating more adversarial examples via adding some noise to the original patient data. By incorporating a critic that acts as a quantitative subjective referee, the segmentation network learns from the uncertainty information associated with segmentation results. We trained and evaluated the network architecture on the RSNA-ASNR-MICCAI BraTS 2021 dataset. Our performance on the online validation dataset is as follows: Dice Similarity Score of 81.38%, 90.77% and 85.39%; Hausdorff Distance (95%) of 21.83 mm, 5.37 mm and 8.56 mm for the enhancing tumor, whole tumor and tumor core, respectively. Similarly, our approach achieved a Dice Similarity Score of 84.55%, 90.46% and 85.30%, as well as a Hausdorff Distance (95%) of 13.48 mm, 6.32 mm and 16.98 mm on the final test dataset. Overall, our proposed approach yielded better performance in segmentation accuracy for each tumor sub-region. Our code implementation is publicly available at https://github.com/himashi92/vizviva_brats_2021

翻訳日:2022-01-12 14:41:13 公開日:2022-01-11
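
The Dice Similarity Score reported in the entry above can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `dice_score`, the flat 0/1 mask representation, and the convention for two empty masks are assumptions made here for illustration only.

```python
def dice_score(pred, target):
    """Dice similarity between two binary masks given as flat sequences of 0/1."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labelled 1 in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


pred = [1, 1, 0, 0, 1, 0]    # hypothetical predicted tumor mask
target = [1, 0, 0, 0, 1, 1]  # hypothetical ground-truth mask
score = dice_score(pred, target)  # 2 * 2 / (3 + 3)
print(f"Dice = {score:.4f}")
```

A score of 1.0 means perfect overlap and 0.0 means none; the percentages in the abstract are this quantity scaled by 100 and computed per tumor sub-region.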

# COROLLA: 緑内障グレーディングのための教師付きコントラスト学習を用いた効率的なマルチモダリティ融合フレームワーク

COROLLA: An Efficient Multi-Modality Fusion Framework with Supervised Contrastive Learning for Glaucoma Grading ( http://arxiv.org/abs/2201.03795v1 )

ライセンス: Link先を確認

Zhiyuan Cai, Li Lin, Huaqing He, Xiaoying Tang

(参考訳) 緑内障は失明を引き起こす可能性のある眼疾患の1つであり、早期発見と治療が非常に重要である。眼底画像と光コヒーレンス断層撮影(OCT)画像は、どちらも緑内障の診断に広く用いられるモダリティである。しかし、既存の緑内障グレーディング手法は主に単一のモダリティを利用しており、眼底とOCTの間の相補的な情報を無視している。本稿では、緑内障グレーディングのための効率的なマルチモダリティ教師付きコントラスト学習フレームワークCOROLLAを提案する。層セグメンテーションと厚さの計算および投影により、元のOCTボリュームから網膜厚マップを抽出して代替モダリティとして使用することで、メモリ使用量が少なく、より効率的な計算が可能になる。医用画像サンプル間の構造と分布の類似性が高いことを踏まえ、教師付きコントラスト学習を用いて、収束性を改善しつつモデルの識別能力を高める。さらに、診断精度を高めるため、対になった眼底画像と厚さマップの特徴レベル融合を行う。 GAMMAデータセットにおいて、COROLLAフレームワークは最先端の手法と比較して圧倒的な緑内障グレーディング性能を達成している。

Glaucoma is one of the ophthalmic diseases that may cause blindness, for which early detection and treatment are very important. Fundus images and optical coherence tomography (OCT) images are both widely-used modalities in diagnosing glaucoma. However, existing glaucoma grading approaches mainly utilize a single modality, ignoring the complementary information between fundus and OCT. In this paper, we propose an efficient multi-modality supervised contrastive learning framework, named COROLLA, for glaucoma grading. Through layer segmentation as well as thickness calculation and projection, retinal thickness maps are extracted from the original OCT volumes and used as a replacing modality, resulting in more efficient calculations with less memory usage. Given the high structure and distribution similarities across medical image samples, we employ supervised contrastive learning to increase our models' discriminative power with better convergence. Moreover, feature-level fusion of paired fundus image and thickness map is conducted for enhanced diagnosis accuracy. On the GAMMA dataset, our COROLLA framework achieves overwhelming glaucoma grading performance compared to state-of-the-art methods.

翻訳日:2022-01-12 14:40:42 公開日:2022-01-11

# Smart Director: ライブ放送のためのイベント駆動型ディレクションシステム

Smart Director: An Event-Driven Directing System for Live Broadcasting ( http://arxiv.org/abs/2201.04024v1 )

ライセンス: Link先を確認

Yingwei Pan and Yue Chen and Qian Bao and Ning Zhang and Ting Yao and Jingen Liu and Tao Mei

(参考訳) ライブビデオ放送では通常、マルチカメラ制作を可能にするために、専門領域の知識を伴うさまざまな技術と専門性が必要とされる。カメラの数が増え続けるにつれて、ライブスポーツ放送のディレクションは、これまで以上に複雑で難しいものになっている。放送ディレクターは、制作中により集中し、より素早く反応し、より豊富な知識を持たなければならない。ディレクターをこうした多大な労力から解放するために、我々はSmart Directorという革新的な自動スポーツ放送ディレクションシステムを開発した。このシステムは、一連の高度な多視点ビデオ解析アルゴリズムを用いて、典型的な人間参加型(human-in-the-loop)の放送プロセスを模倣し、プロに近い放送番組をリアルタイムで自動生成することを目指している。スポーツ放送のいわゆる「3つのイベント」構成に着想を得て、3つの連続した新規コンポーネントからなるイベント駆動パイプラインでシステムを構築する。 1)多視点の相関をモデル化してイベントを検出するマルチビューイベントローカライゼーション、 2)視点選択のために視覚的重要度でカメラビューをランク付けするマルチビューハイライト検出、 3)放送映像の制作を制御する自動放送スケジューラ。我々の知る限り、本システムは、スポーツイベントの意味理解によって完全に駆動される、マルチカメラスポーツ放送のための初のエンドツーエンド自動ディレクションシステムである。また、クロスビュー関係モデリングによる多視点共同イベント検出という新たな問題を解決した最初のシステムでもある。実世界のマルチカメラサッカーデータセット上で客観的および主観的評価を行い、自動生成された映像の品質が人間のディレクターによるものに匹敵することを示す。応答が速いため、本システムは、人間のディレクターが見逃しがちな、素早く過ぎ去る短時間のイベントをより多く捉えることができる。

Live video broadcasting normally requires a multitude of skills and expertise with domain knowledge to enable multi-camera productions. As the number of cameras keeps increasing, directing a live sports broadcast has now become more complicated and challenging than ever before. The broadcast directors need to be much more concentrated, responsive, and knowledgeable during the production. To relieve the directors from their intensive efforts, we develop an innovative automated sports broadcast directing system, called Smart Director, which aims at mimicking the typical human-in-the-loop broadcasting process to automatically create near-professional broadcasting programs in real time by using a set of advanced multi-view video analysis algorithms. Inspired by the so-called "three-event" construction of sports broadcast, we build our system with an event-driven pipeline consisting of three consecutive novel components: 1) the Multi-view Event Localization to detect events by modeling multi-view correlations, 2) the Multi-view Highlight Detection to rank camera views by the visual importance for view selection, 3) the Auto-Broadcasting Scheduler to control the production of broadcasting videos. To the best of our knowledge, our system is the first end-to-end automated directing system for multi-camera sports broadcasting, completely driven by the semantic understanding of sports events. It is also the first system to solve the novel problem of multi-view joint event detection by cross-view relation modeling. We conduct both objective and subjective evaluations on a real-world multi-camera soccer dataset, which demonstrate that the quality of our auto-generated videos is comparable to that of human-directed videos. Thanks to its faster response, our system is able to capture more fast-passing and short-duration events which are usually missed by human directors.

翻訳日:2022-01-12 14:40:23 公開日:2022-01-11

# MobilePhys: パーソナライズされたモバイルカメラベースの非接触生理センシング

MobilePhys: Personalized Mobile Camera-Based Contactless Physiological Sensing ( http://arxiv.org/abs/2201.04039v1 )

ライセンス: Link先を確認

Xin Liu, Yuntao Wang, Sinan Xie, Xiaoyu Zhang, Zixian Ma, Daniel McDuff, Shwetak Patel

(参考訳) カメラベースの非接触フォトプレチスモグラフィ(PPG)は、非接触で生理指標を計測するための一連の一般的な技術を指す。現在の最先端のニューラルモデルは通常、ゴールドスタンダードの生理学的測定値を伴うビデオを用いて教師ありで訓練される。しかし、これらのモデルはドメイン外の例(トレーニングセットとは異なるビデオ)に対してうまく汎化できないことが多い。モデルのパーソナライズは汎化性の改善に役立つが、多くのパーソナライズ手法は依然としてある程度のゴールドスタンダードデータを必要とする。この依存を軽減するために、本稿では、初のモバイル向けパーソナライズ型リモート生理センシングシステムであるMobilePhysを提案する。MobilePhysは、スマートフォンの前面カメラと背面カメラの両方を利用して、パーソナライズされた非接触カメラベースPPGモデルを訓練するための高品質な自己教師付きラベルを生成する。 MobilePhysの頑健性を評価するために、異なるモバイルデバイス、照明条件・強度、動作タスク、肌タイプのもとで一連のタスクを完了した39名の参加者を対象にユーザスタディを行った。その結果、MobilePhysは最先端のオンデバイス教師あり学習および少数ショット適応手法を大幅に上回ることが示された。広範なユーザスタディを通じて、複雑な実世界の環境でMobilePhysがどのように機能するかをさらに検証する。提案するデュアルカメラ・モバイルセンシングシステムから生成される、較正またはパーソナライズされたカメラベースの非接触PPGモデルは、スマートミラー、フィットネス、モバイルヘルスアプリケーションなど、将来の多くのアプリケーションへの扉を開くと考えている。

Camera-based contactless photoplethysmography refers to a set of popular techniques for contactless physiological measurement. The current state-of-the-art neural models are typically trained in a supervised manner using videos accompanied by gold standard physiological measurements. However, they often generalize poorly to out-of-domain examples (i.e., videos that are unlike those in the training set). Personalizing models can help improve model generalizability, but many personalization techniques still require some gold standard data. To help alleviate this dependency, in this paper, we present a novel mobile sensing system called MobilePhys, the first mobile personalized remote physiological sensing system, that leverages both front and rear cameras on a smartphone to generate high-quality self-supervised labels for training personalized contactless camera-based PPG models. To evaluate the robustness of MobilePhys, we conducted a user study with 39 participants who completed a set of tasks under different mobile devices, lighting conditions/intensities, motion tasks, and skin types. Our results show that MobilePhys significantly outperforms the state-of-the-art on-device supervised training and few-shot adaptation methods. Through extensive user studies, we further examine how MobilePhys performs in complex real-world settings. We envision that calibrated or personalized camera-based contactless PPG models generated from our proposed dual-camera mobile sensing system will open the door for numerous future applications such as smart mirrors, fitness and mobile health applications.

翻訳日:2022-01-12 14:39:55 公開日:2022-01-11

# DM-VIO: 遅延周辺化による視覚慣性オドメトリ

DM-VIO: Delayed Marginalization Visual-Inertial Odometry ( http://arxiv.org/abs/2201.04114v1 )

ライセンス: Link先を確認

Lukas von Stumberg, Daniel Cremers

(参考訳) DM-VIOは、遅延周辺化(delayed marginalization)とポーズグラフバンドル調整という2つの新しい手法に基づく単眼視覚慣性オドメトリシステムである。 DM-VIOは、視覚残差に動的な重みを付けてフォトメトリックバンドル調整を行う。更新時間を抑えるための一般的な戦略である周辺化を採用するが、周辺化は容易に取り消すことができず、接続された変数の線形化点を固定する必要がある。この問題を克服するために、遅延周辺化を提案する。その考え方は、周辺化を遅らせた第2の因子グラフを維持することである。これにより、後でこの遅延グラフを再度進めて、新しく一貫した線形化点を持つ更新された周辺化事前分布を得ることができる。さらに、遅延周辺化により、すでに周辺化された状態にIMU情報を注入できる。これが、IMU初期化に用いる提案手法のポーズグラフバンドル調整の基礎である。 IMU初期化に関する先行研究とは対照的に、フォトメトリックな不確かさを完全に捉えることができ、スケール推定が改善される。当初は観測不能なスケールに対処するため、IMU初期化の完了後も、メインシステムでスケールと重力方向の最適化を続ける。我々は、飛行ドローン、大規模ハンドヘルド、自動車のシナリオからなるEuRoC、TUM-VI、4Seasonsデータセットでシステムを評価した。提案するIMU初期化により、本システムは視覚慣性オドメトリにおける最先端を上回り、単一のカメラとIMUのみを使用しながらステレオ慣性手法をも凌駕する。コードは http://vision.in.tum.de/dm-vio で公開される予定である。

We present DM-VIO, a monocular visual-inertial odometry system based on two novel techniques called delayed marginalization and pose graph bundle adjustment. DM-VIO performs photometric bundle adjustment with a dynamic weight for visual residuals. We adopt marginalization, which is a popular strategy to keep the update time constrained, but it cannot easily be reversed, and linearization points of connected variables have to be fixed. To overcome this we propose delayed marginalization: The idea is to maintain a second factor graph, where marginalization is delayed. This allows us to later readvance this delayed graph, yielding an updated marginalization prior with new and consistent linearization points. In addition, delayed marginalization enables us to inject IMU information into already marginalized states. This is the foundation of the proposed pose graph bundle adjustment, which we use for IMU initialization. In contrast to prior works on IMU initialization, it is able to capture the full photometric uncertainty, improving the scale estimation. In order to cope with initially unobservable scale, we continue to optimize scale and gravity direction in the main system after IMU initialization is complete. We evaluate our system on the EuRoC, TUM-VI, and 4Seasons datasets, which comprise flying drone, large-scale handheld, and automotive scenarios. Thanks to the proposed IMU initialization, our system exceeds the state of the art in visual-inertial odometry, even outperforming stereo-inertial methods while using only a single camera and IMU. The code will be published at http://vision.in.tum.de/dm-vio

翻訳日:2022-01-12 14:39:13 公開日:2022-01-11
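
The marginalization step that the DM-VIO entry builds on is commonly implemented with a Schur complement on the linearized system. The sketch below shows only that standard step for a single scalar variable; it does not implement the delayed variant proposed in the paper, and the function name `marginalize_last` and the toy numbers are assumptions for illustration.

```python
def marginalize_last(H, b):
    """Marginalize the last variable of the linear system H x = b.

    Returns the reduced (prior) system on the remaining variables,
    computed with the Schur complement of the last diagonal entry.
    """
    n = len(b) - 1
    h_mm = H[n][n]
    if h_mm == 0:
        raise ValueError("marginalized variable carries no information")
    H_prior = [[H[i][j] - H[i][n] * H[n][j] / h_mm for j in range(n)]
               for i in range(n)]
    b_prior = [b[i] - H[i][n] * b[n] / h_mm for i in range(n)]
    return H_prior, b_prior


# Toy 3-variable system (hypothetical numbers); the third variable is removed.
H = [[4.0, 1.0, 2.0],
     [1.0, 3.0, 0.0],
     [2.0, 0.0, 2.0]]
b = [2.0, 1.0, 2.0]
H_prior, b_prior = marginalize_last(H, b)
print(H_prior, b_prior)
```

The reduced system is tied to the values at which `H` and `b` were linearized, which is why, as the abstract notes, ordinary marginalization cannot easily be reversed and fixes the linearization points of connected variables.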

# HumanNeRF: 単眼動画からの動く人物の自由視点レンダリング

HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular Video ( http://arxiv.org/abs/2201.04127v1 )

ライセンス: Link先を確認

Chung-Yi Weng, Brian Curless, Pratul P. Srinivasan, Jonathan T. Barron and Ira Kemelmacher-Shlizerman

(参考訳) 我々は、複雑な身体動作を行う人間を撮影した任意の単眼ビデオ(例えばYouTubeのビデオ)に適用できる自由視点レンダリング手法、HumanNeRFを紹介する。本手法では、任意のフレームでビデオを一時停止し、任意の新しいカメラ視点から被写体をレンダリングしたり、その特定のフレームと身体ポーズに対して360度全周のカメラパスをレンダリングしたりできる。このタスクは、入力ビデオに存在しない可能性のあるさまざまなカメラアングルから見た身体のフォトリアリスティックな詳細を合成し、さらに布のしわや顔の外観などの細部も合成する必要があるため、特に困難である。本手法は、標準Tポーズにおける人物のボリューム表現を、推定された標準表現を逆方向ワープによってビデオの各フレームに写像するモーションフィールドと併せて最適化する。モーションフィールドは、深層ネットワークによって生成される骨格の剛体運動と非剛体運動に分解される。我々は、先行研究に対する大幅な性能向上を示すとともに、制御されていない困難な撮影シナリオにおいて、動く人間の単眼ビデオから得た自由視点レンダリングの説得力のある例を示す。

We introduce a free-viewpoint rendering method -- HumanNeRF -- that works on a given monocular video of a human performing complex body motions, e.g. a video from YouTube. Our method enables pausing the video at any frame and rendering the subject from arbitrary new camera viewpoints or even a full 360-degree camera path for that particular frame and body pose. This task is particularly challenging, as it requires synthesizing photorealistic details of the body, as seen from various camera angles that may not exist in the input video, as well as synthesizing fine details such as cloth folds and facial appearance. Our method optimizes for a volumetric representation of the person in a canonical T-pose, in concert with a motion field that maps the estimated canonical representation to every frame of the video via backward warps. The motion field is decomposed into skeletal rigid and non-rigid motions, produced by deep networks. We show significant performance improvements over prior work, and compelling examples of free-viewpoint renderings from monocular video of moving humans in challenging uncontrolled capture scenarios.

翻訳日:2022-01-12 14:38:47 公開日:2022-01-11

# pymdp: 離散状態空間におけるアクティブ推論のためのPythonライブラリ

pymdp: A Python library for active inference in discrete state spaces ( http://arxiv.org/abs/2201.03904v1 )

ライセンス: Link先を確認

Conor Heins, Beren Millidge, Daphne Demekas, Brennan Klein, Karl Friston, Iain Couzin, Alexander Tschantz

(参考訳) アクティブ推論(Active Inference)は、行動、知覚、学習をベイズ推論という理論的枠組みのもとにまとめる、複雑なシステムにおける認知と行動の説明である。アクティブ推論は学術研究、特に人間や動物の行動をモデル化しようとする分野での応用が増えている。近年、アクティブ推論の文献から生まれたコードの一部はPythonやJuliaのようなオープンソース言語で書かれているが、現在までのところ、アクティブ推論エージェントをシミュレートするための最も一般的なソフトウェアは、もともと神経画像データの統計解析とモデリングのために開発されたMATLABライブラリであるSPMのDEMツールボックスである。アクティブ推論への関心の高まりは、その数の多さと科学分野をまたぐ応用の多様化の両面に現れており、Pythonのようなオープンソースの科学計算言語でアクティブ推論をシミュレートするための、汎用的で広く利用可能なユーザフレンドリなコードの必要性を生み出している。ここで紹介するPythonパッケージpymdp(https://github.com/infer-actively/pymdp 参照)は、この方向への大きな一歩である。すなわち、部分観測マルコフ決定過程(POMDP)を用いてアクティブ推論をシミュレートするための初のオープンソースパッケージを提供する。我々は、パッケージの構造をレビューし、モジュール設計やカスタマイズ性といった利点を説明するとともに、本文中にコードブロックを示して、アクティブ推論プロセスの構築と実行を簡単に行う方法を実演する。我々は、多様な専門分野の背景を持つ研究者、技術者、開発者に対して、アクティブ推論フレームワークのアクセシビリティと認知度を高めるためにpymdpを開発した。オープンソースソフトウェアの精神に則り、成長するアクティブ推論コミュニティにおいて、新たなイノベーション、開発、コラボレーションが促進されることも願っている。

Active inference is an account of cognition and behavior in complex systems which brings together action, perception, and learning under the theoretical mantle of Bayesian inference. Active inference has seen growing applications in academic research, especially in fields that seek to model human or animal behavior. While in recent years, some of the code arising from the active inference literature has been written in open source languages like Python and Julia, to-date, the most popular software for simulating active inference agents is the DEM toolbox of SPM, a MATLAB library originally developed for the statistical analysis and modelling of neuroimaging data. Increasing interest in active inference, manifested both in terms of sheer number as well as diversifying applications across scientific disciplines, has thus created a need for generic, widely-available, and user-friendly code for simulating active inference in open-source scientific computing languages like Python. The Python package we present here, pymdp (see https://github.com/infer-actively/pymdp), represents a significant step in this direction: namely, we provide the first open-source package for simulating active inference with partially-observable Markov Decision Processes or POMDPs. We review the package's structure and explain its advantages like modular design and customizability, while providing in-text code blocks along the way to demonstrate how it can be used to build and run active inference processes with ease. We developed pymdp to increase the accessibility and exposure of the active inference framework to researchers, engineers, and developers with diverse disciplinary backgrounds. In the spirit of open-source software, we also hope that it spurs new innovation, development, and collaboration in the growing active inference community.

翻訳日:2022-01-12 14:38:05 公開日:2022-01-11
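
To make the phrase "active inference with POMDPs" in the entry above more concrete, here is a minimal sketch of the perceptual half of the loop: a Bayesian update of beliefs over discrete hidden states after one observation. It deliberately uses plain Python and does not call or imitate the pymdp API; the name `update_belief`, the matrix layout `A[o][s]`, and the numbers are assumptions for illustration.

```python
def update_belief(A, prior, obs):
    """Posterior over hidden states after observing outcome index `obs`.

    A[o][s] is P(observation o | state s); prior[s] is P(state s).
    """
    unnormalized = [A[obs][s] * prior[s] for s in range(len(prior))]
    evidence = sum(unnormalized)
    if evidence == 0:
        raise ValueError("observation has zero probability under the model")
    return [u / evidence for u in unnormalized]


# Hypothetical two-state, two-observation generative model.
A = [[0.9, 0.2],   # P(o=0 | s=0), P(o=0 | s=1)
     [0.1, 0.8]]   # P(o=1 | s=0), P(o=1 | s=1)
prior = [0.5, 0.5]
posterior = update_belief(A, prior, obs=0)
print(posterior)
```

A full active inference agent would additionally score candidate policies and select actions; that part is omitted here.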

# Hölder関数に対するディープニューラルネットワーク近似

Deep Neural Network Approximation For Hölder Functions ( http://arxiv.org/abs/2201.03747v1 )

ライセンス: Link先を確認

Ahmed Abdeljawad

(参考訳) 本研究では、Hölder正則関数に対する深層Rectified Quadratic Unit (ReQU) ニューラルネットワークの近似能力を、一様ノルムに関して検討する。理論的な近似は、ニューラルネットワークで選択される活性化関数に大きく依存することがわかった。

In this work, we explore the approximation capability of deep Rectified Quadratic Unit neural networks for Hölder-regular functions, with respect to the uniform norm. We find that theoretical approximation heavily depends on the selected activation function in the neural network.

翻訳日:2022-01-12 14:36:27 公開日:2022-01-11
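
As a small illustration of why the activation function matters in the entry above, the sketch below uses the standard definition of the Rectified Quadratic Unit, ReQU(x) = max(0, x)^2, and shows that squares and products can be written exactly with a few ReQU units. The abstract does not describe its proof technique, so this is offered only as background; the helper names `square` and `product` are assumptions.

```python
def requ(x):
    """Rectified Quadratic Unit: max(0, x) squared."""
    return max(0.0, x) ** 2


def square(x):
    # x^2 = ReQU(x) + ReQU(-x): exactly one of the two terms is non-zero.
    return requ(x) + requ(-x)


def product(x, y):
    # Polarization identity: x * y = ((x + y)^2 - (x - y)^2) / 4.
    return (square(x + y) - square(x - y)) / 4.0


print(square(-3.0), product(2.0, -5.0))
```

Because products are exact, polynomials can be represented without approximation error by such networks, which is not the case for piecewise-linear activations like ReLU.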

# (参考訳) 軽量ニューラルアニメーションを目指して : Mixture of Expertsベースのアニメーションモデルにおけるニューラルネットワークプルーニングの探索

Towards Lightweight Neural Animation : Exploration of Neural Network Pruning in Mixture of Experts-based Animation Models ( http://arxiv.org/abs/2201.04042v1 )

ライセンス: CC BY 4.0

Antoine Maiorca, Nathan Hubens, Sohaib Laraba and Thierry Dutoit

(参考訳) 近年、ニューラルキャラクターアニメーションが登場し、仮想キャラクターをアニメーション化する自動的な手法を提供している。キャラクターの動きはニューラルネットワークによって合成される。この動きをユーザ定義の制御信号でリアルタイムに制御することも、例えばビデオゲームにおいて重要なタスクである。全結合層(MLP)とMixture-of-Experts (MoE)に基づくソリューションは、環境と仮想キャラクターの間の近距離の相互作用を伴うさまざまな動きの生成と制御において、優れた結果をもたらしている。しかし、全結合層の大きな欠点は計算コストとメモリコストであり、最適とは言えないソリューションにつながる可能性がある。本研究では、インタラクティブなキャラクターアニメーションの文脈でMLP-MoEニューラルネットワークを圧縮するためにプルーニングアルゴリズムを適用する。これによりパラメータ数が削減され、計算時間が短縮されるが、この高速化と合成される動きの品質との間にはトレードオフがある。本研究は、エキスパート数とパラメータ数が同じ場合、プルーニングされたモデルは密なモデルよりも動きのアーティファクトが少なく、学習された高レベルの動きの特徴は両モデルで類似していることを示す。

In the past few years, neural character animation has emerged and offered an automatic method for animating virtual characters. Their motion is synthesized by a neural network. Controlling this movement in real time with a user-defined control signal is also an important task, in video games for example. Solutions based on fully-connected layers (MLPs) and Mixture-of-Experts (MoE) have given impressive results in generating and controlling various movements with close-range interactions between the environment and the virtual character. However, a major shortcoming of fully-connected layers is their computational and memory cost, which may lead to sub-optimized solutions. In this work, we apply pruning algorithms to compress an MLP-MoE neural network in the context of interactive character animation, which reduces its number of parameters and accelerates its computation time with a trade-off between this acceleration and the synthesized motion quality. This work demonstrates that, with the same number of experts and parameters, the pruned model produces fewer motion artifacts than the dense model and the learned high-level motion features are similar for both models.

翻訳日:2022-01-12 14:35:53 公開日:2022-01-11
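
The abstract above does not say which pruning algorithms were applied, so the following is only a sketch of one common baseline, unstructured magnitude pruning, under that stated assumption. The function name `magnitude_prune` and the toy weight matrix are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of a weight matrix.

    `weights` is a list of rows; a new matrix is returned.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    magnitudes = sorted(abs(w) for row in weights for w in row)
    n_prune = int(len(magnitudes) * sparsity)
    if n_prune == 0:
        return [row[:] for row in weights]
    threshold = magnitudes[n_prune - 1]
    # Weights at or below the threshold are removed (ties may prune a few more).
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in weights]


W = [[0.5, -0.1, 0.05],
     [-0.8, 0.02, 0.3]]
pruned = magnitude_prune(W, sparsity=0.5)
print(pruned)
```

In practice the pruned network is usually fine-tuned afterwards, and the speed-up depends on whether the runtime can exploit the resulting sparsity, which is the trade-off the abstract refers to.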

# DDG-DA:予測可能なコンセプトドリフト適応のためのデータ分布生成

DDG-DA: Data Distribution Generation for Predictable Concept Drift Adaptation ( http://arxiv.org/abs/2201.04038v1 )

ライセンス: Link先を確認

Wendi Li, Xiao Yang, Weiqing Liu, Yingce Xia, Jiang Bian

(参考訳) 多くの現実世界のシナリオでは、時間とともに順次収集されるストリーミングデータを扱うことが多い。環境の非定常性のため、ストリーミングデータの分布は予測不可能な形で変化する可能性があり、これはコンセプトドリフトとして知られている。コンセプトドリフトに対処するために、従来の手法はまずコンセプトドリフトがいつ・どこで発生するかを検出し、次に最新のデータの分布に合うようにモデルを適応させる。しかし、環境変化の背後にある要因の一部が予測可能であり、ストリーミングデータの将来のコンセプトドリフトの傾向をモデル化できるケースも多く存在するにもかかわらず、そのようなケースは従来研究では十分に検討されていない。本稿では、データ分布の変化を効果的に予測し、モデルの性能を向上させる新しい手法DDG-DAを提案する。具体的には、まず予測器を訓練して将来のデータ分布を推定し、次にそれを利用してトレーニングサンプルを生成し、最後に生成されたデータでモデルを訓練する。 3つの実世界のタスク(株価動向、電力負荷、日射量の予測)で実験を行い、広く使われている複数のモデルにおいて有意な改善を得た。

In many real-world scenarios, we often deal with streaming data that is sequentially collected over time. Due to the non-stationary nature of the environment, the streaming data distribution may change in unpredictable ways, which is known as concept drift. To handle concept drift, previous methods first detect when/where the concept drift happens and then adapt models to fit the distribution of the latest data. However, there are still many cases in which some underlying factors of environment evolution are predictable, making it possible to model the future concept drift trend of the streaming data, while such cases are not fully explored in previous work. In this paper, we propose a novel method, DDG-DA, that can effectively forecast the evolution of data distribution and improve the performance of models. Specifically, we first train a predictor to estimate the future data distribution, then leverage it to generate training samples, and finally train models on the generated data. We conduct experiments on three real-world tasks (forecasting stock price trend, electricity load and solar irradiance) and obtain significant improvement on multiple widely-used models.

翻訳日:2022-01-12 14:24:53 公開日:2022-01-11

# ランダムグラフにおけるエントロピー最適輸送

Entropic Optimal Transport in Random Graphs ( http://arxiv.org/abs/2201.03949v1 )

ライセンス: Link先を確認

Nicolas Keriven

(参考訳) グラフ解析における古典的なタスクは、ノード(のグループ)間の類似度の計算である。潜在空間ランダムグラフでは、ノードは未知の潜在変数に関連付けられる。そこで、グラフ構造のみを用いて、潜在空間内の距離を直接計算することが考えられる。本稿では、潜在空間内のノード群間のエントロピー正則化最適輸送(OT)距離を一致推定できることを示す。コスト行列の摂動に対するエントロピーOTの一般的な安定性の結果を与える。次に、それをグラフォンや多様体上の$\epsilon$-グラフなど、ランダムグラフのいくつかの例に適用する。その過程で、いわゆるUniversal Singular Value Thresholding推定量と、多様体上の測地距離の推定に対する新しい集中の結果を証明する。

In graph analysis, a classic task consists in computing similarity measures between (groups of) nodes. In latent space random graphs, nodes are associated to unknown latent variables. One may then seek to compute distances directly in the latent space, using only the graph structure. In this paper, we show that it is possible to consistently estimate entropic-regularized Optimal Transport (OT) distances between groups of nodes in the latent space. We provide a general stability result for entropic OT with respect to perturbations of the cost matrix. We then apply it to several examples of random graphs, such as graphons or $\epsilon$-graphs on manifolds. Along the way, we prove new concentration results for the so-called Universal Singular Value Thresholding estimator, and for the estimation of geodesic distances on a manifold.

翻訳日:2022-01-12 14:24:14 公開日:2022-01-11
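
Entropic-regularized OT between two discrete distributions, the quantity studied in the entry above, is typically computed with Sinkhorn iterations. The sketch below assumes a cost matrix that is already known; the paper's contribution, estimating that cost from the graph alone, is not reproduced. The function name `sinkhorn`, the regularization value, and the toy inputs are assumptions.

```python
import math


def sinkhorn(cost, a, b, eps=0.5, n_iter=500):
    """Entropic OT plan between histograms a and b for a given cost matrix."""
    n, m = len(a), len(b)
    # Gibbs kernel K = exp(-cost / eps).
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


cost = [[0.0, 1.0],
        [1.0, 0.0]]   # hypothetical pairwise costs between two groups of nodes
a = [0.7, 0.3]
b = [0.4, 0.6]
plan = sinkhorn(cost, a, b)
ot_cost = sum(plan[i][j] * cost[i][j] for i in range(2) for j in range(2))
print(plan, ot_cost)
```

A stability result of the kind stated in the abstract bounds how much this transport cost can change when `cost` is replaced by a perturbed estimate.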

# Captcha Attack: CAPTCHAを人類への攻撃手段に変える

Captcha Attack: Turning Captchas Against Humanity ( http://arxiv.org/abs/2201.04014v1 )

ライセンス: Link先を確認

Mauro Conti, Luca Pajola, Pier Paolo Tricomi

(参考訳) 現在、人々はオンラインプラットフォーム(ソーシャルネットワーク、ブログなど)で大量のコンテンツを作成し、共有している。 2021年には、1日あたり19億人のアクティブなFacebookユーザーが毎分約15万枚の写真を投稿した。コンテンツモデレーターは、不適切なコンテンツ(ヘイトスピーチ、ヌード画像など)の拡散を防ぐために、これらのオンラインプラットフォームを常に監視している。ディープラーニング(DL)の進歩に基づく自動コンテンツモデレーター(ACM)は、人間のモデレーターが大量のデータを処理するのを助ける。その利点にもかかわらず、攻撃者はDLコンポーネント(前処理、モデルなど)の弱点を悪用して性能に影響を与えることができる。したがって、攻撃者はそのような手法を利用してACMを回避し、不適切なコンテンツを拡散できる。本研究では、ACMの制御を回避して不適切なテキストをオンラインで拡散できるようにする敵対的手法、CAPtcha Attack (CAPA)を提案する。 CAPAは、カスタムのテキストCAPTCHAを生成することで、ACMの不注意な設計・実装と内部手順の脆弱性を突く。実世界のACMに対して攻撃を検証した結果、単純ながら効果的な本攻撃の威力が確認され、ほとんどのケースで最大100%の回避成功率に達した。同時に、CAPAへの対策を設計することの難しさを示し、CAPTCHA研究領域に新たな課題を提起する。

Nowadays, people generate and share massive content on online platforms (e.g., social networks, blogs). In 2021, the 1.9 billion daily active Facebook users posted around 150 thousand photos every minute. Content moderators constantly monitor these online platforms to prevent the spreading of inappropriate content (e.g., hate speech, nudity images). Based on deep learning (DL) advances, Automatic Content Moderators (ACM) help human moderators handle high data volume. Despite their advantages, attackers can exploit weaknesses of DL components (e.g., preprocessing, model) to affect their performance. Therefore, an attacker can leverage such techniques to spread inappropriate content by evading ACM. In this work, we propose CAPtcha Attack (CAPA), an adversarial technique that allows users to spread inappropriate text online by evading ACM controls. CAPA, by generating custom textual CAPTCHAs, exploits ACM's careless design implementations and internal procedures vulnerabilities. We test our attack on real-world ACM, and the results confirm the ferocity of our simple yet effective attack, reaching up to a 100% evasion success in most cases. At the same time, we demonstrate the difficulties in designing CAPA mitigations, opening new challenges in CAPTCHAs research area.

翻訳日:2022-01-12 14:23:13 公開日:2022-01-11

# 大規模なデータセット改善のための生のモバイルUIレイアウトのノイズ除去学習

Learning to Denoise Raw Mobile UI Layouts for Improving Datasets at Scale ( http://arxiv.org/abs/2201.04100v1 )

ライセンス: Link先を確認

Gang Li, Gilles Baechler, Manuel Tragut, Yang Li

(参考訳) モバイル画面のレイアウトは、UIデザイン研究や画面のセマンティック理解のための重要なデータソースである。しかし、既存のデータセットのUIレイアウトは、ノイズが多かったり、視覚表現と一致しなかったり、分析やモデル化が難しい汎用的な型やアプリ固有の型で構成されていたりすることが多い。本稿では、UIレイアウトのノイズ除去にディープラーニングを用いるCLAYパイプラインを提案する。これにより、既存のモバイルUIレイアウトデータセットを大規模に自動改善できる。パイプラインは、スクリーンショットと生のUIレイアウトの両方を入力とし、不正なノードを削除して各ノードに意味のある型を割り当てることで、生のレイアウトにアノテーションを付ける。このデータクリーニングパイプラインを検証するために、公開モバイルUIコーパスであるRicoのスクリーンショットと生のレイアウトに基づいて、人手でアノテーションを付けた59,555件の画面レイアウトからなるCLAYデータセットを作成した。我々の深層モデルは、有効な視覚表現を持たないレイアウトオブジェクトの検出でF1スコア82.7%、オブジェクト型の認識で85.9%という高い精度を達成し、ヒューリスティックなベースラインを大幅に上回る。本研究は、データ駆動型のモバイルUI研究のための大規模で高品質なUIレイアウトデータセットを作成する基盤を築き、非常に高価な手動ラベリング作業の必要性を低減する。

The layout of a mobile screen is a critical data source for UI design research and semantic understanding of the screen. However, UI layouts in existing datasets are often noisy, have mismatches with their visual representation, or consist of generic or app-specific types that are difficult to analyze and model. In this paper, we propose the CLAY pipeline that uses a deep learning approach for denoising UI layouts, allowing us to automatically improve existing mobile UI layout datasets at scale. Our pipeline takes both the screenshot and the raw UI layout, and annotates the raw layout by removing incorrect nodes and assigning a semantically meaningful type to each node. To experiment with our data-cleaning pipeline, we create the CLAY dataset of 59,555 human-annotated screen layouts, based on screenshots and raw layouts from Rico, a public mobile UI corpus. Our deep models achieve high accuracy with F1 scores of 82.7% for detecting layout objects that do not have a valid visual representation and 85.9% for recognizing object types, which significantly outperforms a heuristic baseline. Our work lays a foundation for creating large-scale high quality UI layout datasets for data-driven mobile UI research and reduces the need for manual labeling efforts that are prohibitively expensive.

翻訳日:2022-01-12 14:22:53 公開日:2022-01-11

# 入力中の不確かな単語の検出による事前学習済み言語モデルの予測不確実性の説明

Explaining Prediction Uncertainty of Pre-trained Language Models by Detecting Uncertain Words in Inputs ( http://arxiv.org/abs/2201.03742v1 )

ライセンス: Link先を確認

Hanjie Chen, Yangfeng Ji

(参考訳) 事前学習済み言語モデルの予測不確実性を推定することは、NLPにおける信頼性を高める上で重要である。先行研究の多くは予測不確実性の定量化に重点を置いているが、不確実性を説明する研究はほとんどない。本稿は、事後較正された事前学習済み言語モデルの不確実な予測の説明へと一歩踏み込む。 2つの摂動ベースの事後(post-hoc)解釈手法であるLeave-one-outとSampling Shapleyを適応させ、予測の不確実性を引き起こす入力中の単語を特定する。提案手法を、BERTとRoBERTa上で、3つのタスク(感情分類、自然言語推論、言い換え識別)について、ドメイン内とドメイン外の両方の設定で検証する。実験により、どちらの手法も、予測の不確実性を引き起こす入力中の単語を一貫して捉えることが示された。

Estimating the predictive uncertainty of pre-trained language models is important for increasing their trustworthiness in NLP. Although many previous works focus on quantifying prediction uncertainty, there is little work on explaining the uncertainty. This paper pushes a step further on explaining uncertain predictions of post-calibrated pre-trained language models. We adapt two perturbation-based post-hoc interpretation methods, Leave-one-out and Sampling Shapley, to identify words in inputs that cause the uncertainty in predictions. We test the proposed methods on BERT and RoBERTa with three tasks: sentiment classification, natural language inference, and paraphrase identification, in both in-domain and out-of-domain settings. Experiments show that both methods consistently capture words in inputs that cause prediction uncertainty.

翻訳日:2022-01-12 14:21:24 公開日:2022-01-11
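
The Leave-one-out idea named in the entry above can be sketched on a toy classifier: remove one word at a time and record how much the predictive entropy drops. This is an illustrative reading of the method, not the authors' implementation; the toy weights in `TOY_WEIGHTS`, the logistic scorer, and all function names are hypothetical stand-ins for a calibrated BERT or RoBERTa model.

```python
import math

# Hypothetical word weights standing in for a real sentiment model.
TOY_WEIGHTS = {"great": 2.0, "terrible": -2.0}


def predict_positive(words):
    """Toy probability of the positive class."""
    score = sum(TOY_WEIGHTS.get(w, 0.0) for w in words)
    return 1.0 / (1.0 + math.exp(-score))


def entropy(p):
    """Binary predictive entropy, used here as the uncertainty measure."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def leave_one_out_uncertainty(words):
    """Score each word by how much uncertainty disappears when it is removed."""
    base = entropy(predict_positive(words))
    return [(w, base - entropy(predict_positive(words[:i] + words[i + 1:])))
            for i, w in enumerate(words)]


scores = leave_one_out_uncertainty(["great", "movie", "but", "terrible"])
for word, contribution in scores:
    print(f"{word}: {contribution:.3f}")
```

In this toy input the two conflicting sentiment words receive positive scores, because removing either one makes the prediction more confident, while the neutral words score zero.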

# 敵対的単語置換に対するロバスト性の定量化

Quantifying Robustness to Adversarial Word Substitutions ( http://arxiv.org/abs/2201.03829v1 )

ライセンス: Link先を確認

Yuting Yang, Pei Huang, FeiFei Ma, Juan Cao, Meishan Zhang, Jian Zhang and Jintao Li

(参考訳) ディープラーニングに基づくNLPモデルは、単語置換による摂動に対して脆弱であることがわかっている。広く採用される前に、ロバスト性に関する根本的な問題に対処する必要がある。この方向性に沿って、単語レベルのロバスト性を評価するための形式的な枠組みを提案する。まず、モデルの安全な領域を調べるために、モデルがあらゆる摂動に耐えられる境界であるロバスト性半径を導入する。最大ロバスト性半径の計算は計算量的に困難であるため、その上界と下界を推定する。攻撃手法を上界を求める方法として転用し、より厳密な上界のために擬似動的計画法アルゴリズムを設計する。下界には検証手法を用いる。さらに、安全な半径の外側の領域のロバスト性を評価するために、定量化という別の観点からロバスト性を再検討する。厳密な統計的保証を持つロバスト性指標を導入して敵対的サンプルの量を測定し、これにより安全な半径の外側の摂動に対するモデルの影響の受けやすさを示す。この指標は、BERTのような最先端のモデルが、わずかな単語置換で簡単に騙される一方で、実世界のノイズがある場合にはうまく汎化する理由を理解するのに役立つ。

Deep-learning-based NLP models are found to be vulnerable to word substitution perturbations. Before they are widely adopted, the fundamental issues of robustness need to be addressed. Along this line, we propose a formal framework to evaluate word-level robustness. First, to study safe regions for a model, we introduce the robustness radius, which is the boundary within which the model can resist any perturbation. As calculating the maximum robustness radius is computationally hard, we estimate its upper and lower bounds. We repurpose attack methods as ways of seeking an upper bound and design a pseudo-dynamic programming algorithm for a tighter upper bound. Then a verification method is utilized for a lower bound. Further, for evaluating the robustness of regions outside a safe radius, we reexamine robustness from another view: quantification. A robustness metric with a rigorous statistical guarantee is introduced to measure the quantification of adversarial examples, which indicates the model's susceptibility to perturbations outside the safe radius. The metric helps us figure out why state-of-the-art models like BERT can be easily fooled by a few word substitutions, but generalize well in the presence of real-world noises.

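Translated: 2022-01-12 14:21:10 Published: 2022-01-11

# Ancestral Instrument Method for Causal Inference without a Causal Graph

Ancestral instrument method for causal inference without a causal graph ( http://arxiv.org/abs/2201.03810v1 )

License: see the linked page

Debo Cheng (1) and Jiuyong Li (1) and Lin Liu (1) and Jiji Zhang (2) and Thuc duy Le (1) and Jixue Liu (1) ((1) STEM, University of South Australia, Adelaide, SA, Australia, (2) Department of Religion and Philosophy, Hong Kong Baptist University, Hong Kong, China)

(Reference translation) The main obstacle to estimating causal effects from observational data is unobserved confounding. Instrumental variables (IVs) are widely used for causal effect estimation when latent confounders exist. With the standard IV method, an unbiased estimate can be obtained if the given IV is valid, but the validity requirements of a standard IV are strict and untestable. Conditional IVs have been proposed to relax the requirements of a standard IV by conditioning on a set of observed variables (known as the conditioning set of the conditional IV). However, the criterion for finding a conditioning set for a conditional IV requires complete knowledge of the causal structure, or a directed acyclic graph (DAG) representing the causal relationships among both observed and unobserved variables. This makes it impossible to find a conditioning set directly from data. In this paper, we leverage maximal ancestral graphs (MAGs) for causal inference with latent variables, propose a new type of IV in a MAG, the ancestral IV, and develop a theory for data-driven discovery of the conditioning set of a given ancestral IV in a MAG. Based on this theory, we develop an algorithm for unbiased causal effect estimation using an ancestral IV in a MAG and observational data. Extensive experiments on synthetic and real-world datasets demonstrate the algorithm's performance in comparison with existing IV methods.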

Unobserved confounding is the main obstacle to causal effect estimation from observational data. Instrumental variables (IVs) are widely used for causal effect estimation when there exist latent confounders. With the standard IV method, when a given IV is valid, unbiased estimation can be obtained, but the validity requirement of standard IV is strict and untestable. Conditional IV has been proposed to relax the requirement of standard IV by conditioning on a set of observed variables (known as a conditioning set for a conditional IV). However, the criterion for finding a conditioning set for a conditional IV needs complete causal structure knowledge or a directed acyclic graph (DAG) representing the causal relationships of both observed and unobserved variables. This makes it impossible to discover a conditioning set directly from data. In this paper, by leveraging maximal ancestral graphs (MAGs) in causal inference with latent variables, we propose a new type of IV, ancestral IV in MAG, and develop the theory to support data-driven discovery of the conditioning set for a given ancestral IV in MAG. Based on the theory, we develop an algorithm for unbiased causal effect estimation with an ancestral IV in MAG and observational data. Extensive experiments on synthetic and real-world datasets have demonstrated the performance of the algorithm in comparison with existing IV methods.
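
To show why an instrument helps under latent confounding, here is a minimal sketch of the standard IV baseline, not the ancestral IV method proposed in the paper. The linear data-generating process, its coefficients, and the function names are made up for illustration.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def simulate(n=20000, effect=2.0, seed=0):
    """Linear toy model: Z -> X -> Y, with a latent confounder U of X and Y."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]   # instrument
    u = [rng.gauss(0, 1) for _ in range(n)]   # latent confounder (unobserved in practice)
    x = [zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]                  # treatment
    y = [effect * xi + 3.0 * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]   # outcome
    return z, x, y

z, x, y = simulate()
ols_estimate = cov(x, y) / cov(x, x)   # regression of Y on X, biased by U
iv_estimate = cov(z, y) / cov(z, x)    # standard IV (Wald) ratio
print(round(ols_estimate, 2), round(iv_estimate, 2))
```

With these coefficients the regression estimate converges to about 3 while the IV ratio recovers the true effect of 2, because Z is independent of U by construction. The paper addresses the harder case where such validity holds only after conditioning on a set of observed variables that has to be found from data.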

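Translated: 2022-01-12 14:20:51 Published: 2022-01-11

# Acquisition and Representation of User Preferences Guided by an Ontology

Acquisition and Representation of User Preferences Guided by an Ontology ( http://arxiv.org/abs/2201.03824v1 )

License: see the linked page

Rahma Dandan, Sylvie Despres, Karima Sedki

(Reference translation) Our food preferences guide our food choices and affect our personal health and social life. In this paper, we adopt an approach that uses a domain ontology expressed in OWL2 to support the acquisition and representation of preferences in the CP-Net formalism. Specifically, we construct the domain ontology and design a questionnaire to acquire and represent preferences. Preference acquisition and representation are carried out in the setting of a university canteen. Our main contribution in this preliminary work is to acquire preferences and to enrich the preference model with domain knowledge represented in the ontology.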

Our food preferences guide our food choices and in turn affect our personal health and our social life. In this paper, we adopt an approach that uses a domain ontology expressed in OWL2 to support the acquisition and representation of preferences in the CP-Net formalism. Specifically, we present the construction of the domain ontology and the design of a questionnaire to acquire and represent the preferences. The acquisition and representation of preferences are implemented in the setting of a university canteen. Our main contribution in this preliminary work is to acquire preferences and to enrich the preference model with domain knowledge represented in the ontology.

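Translated: 2022-01-12 14:20:30 Published: 2022-01-11

# Drone Object Detection Using RGB/IR Fusion

Drone Object Detection Using RGB/IR Fusion ( http://arxiv.org/abs/2201.03786v1 )

License: see the linked page

Lizhi Yang, Ruhang Ma, Avideh Zakhor

(Reference translation) Object detection from aerial drone imagery has attracted much attention in recent years. Visible-light images are adequate for detecting objects in most scenarios, but thermal cameras can extend detection to night-time and occluded objects. RGB and infrared (IR) fusion methods for object detection are therefore useful and important. One of the biggest challenges in applying deep learning to RGB/IR object detection is the shortage of training data for drone IR imagery, especially at night. In this paper, we devise several strategies for creating synthetic IR images using the AIRSim simulation engine and CycleGAN. In addition, we use an illumination-aware fusion framework to fuse RGB and IR images for detecting objects on the ground. We characterize and test our methods on both simulated and real data. Our solution runs on an NVIDIA Jetson Xavier aboard an actual drone and requires about 28 milliseconds of processing per RGB/IR image pair.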

Object detection using aerial drone imagery has received a great deal of attention in recent years. While visible light images are adequate for detecting objects in most scenarios, thermal cameras can extend the capabilities of object detection to night-time or occluded objects. As such, RGB and Infrared (IR) fusion methods for object detection are useful and important. One of the biggest challenges in applying deep learning methods to RGB/IR object detection is the lack of available training data for drone IR imagery, especially at night. In this paper, we develop several strategies for creating synthetic IR images using the AIRSim simulation engine and CycleGAN. Furthermore, we utilize an illumination-aware fusion framework to fuse RGB and IR images for object detection on the ground. We characterize and test our methods for both simulated and actual data. Our solution is implemented on an NVIDIA Jetson Xavier running on an actual drone, requiring about 28 milliseconds of processing per RGB/IR image pair.
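
The idea of illumination-aware fusion can be sketched as a brightness-dependent blend of the two branches. The abstract does not describe the actual fusion framework, so the logistic weighting, its constants, and the function names below are purely illustrative assumptions.

```python
import math

def illumination_weight(mean_brightness, midpoint=0.35, steepness=12.0):
    """Map mean RGB brightness in [0, 1] to a weight for the RGB branch.
    The logistic form and its constants are illustrative assumptions."""
    return 1.0 / (1.0 + math.exp(-steepness * (mean_brightness - midpoint)))

def fuse_scores(rgb_score, ir_score, mean_brightness):
    """Blend per-detection confidences from the RGB and IR branches."""
    w = illumination_weight(mean_brightness)
    return w * rgb_score + (1.0 - w) * ir_score

# Bright scene: the RGB branch dominates. Dark scene: the IR branch dominates.
day = fuse_scores(rgb_score=0.9, ir_score=0.4, mean_brightness=0.7)
night = fuse_scores(rgb_score=0.2, ir_score=0.8, mean_brightness=0.05)
print(round(day, 3), round(night, 3))
```

A learned fusion module would typically predict the weight from image features rather than from a single brightness value, but the blend shows why a detector can fall back on thermal evidence at night.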

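Translated: 2022-01-12 14:19:28 Published: 2022-01-11

# Unsupervised Domain Adaptive Person Re-identification with Local-enhance and Prototype Dictionary Learning

Unsupervised Domain Adaptive Person Re-id with Local-enhance and Prototype Dictionary Learning ( http://arxiv.org/abs/2201.03803v1 )

License: see the linked page

Haopeng Hou

(Reference translation) The unsupervised domain adaptive person re-identification (re-ID) task is challenging because, unlike general domain adaptation tasks, the classes of the source and target domain data do not overlap, which leads to a large domain gap. State-of-the-art unsupervised re-ID methods train neural networks with a memory-based contrastive loss. However, performing contrastive learning by treating each unlabeled instance as a class causes class collisions, and because the number of instances differs across categories, the update intensity is inconsistent when the memory bank is updated. We therefore propose Prototype Dictionary Learning for person re-ID, which can use both source and target domain data in a single training stage while avoiding the class collision and update-intensity inconsistency problems through cluster-level prototype dictionary learning. To reduce the effect of the domain gap on the model, we propose a local-enhance module that improves the model's domain adaptation without increasing the number of model parameters. Experiments on two large datasets show the effectiveness of prototype dictionary learning. The method achieves 71.5% mAP on the Market-to-Duke task, a 2.3% improvement over state-of-the-art unsupervised domain adaptive re-ID methods, and 83.9% mAP on the Duke-to-Market task, a 4.4% improvement over state-of-the-art unsupervised adaptive re-ID methods.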

The unsupervised domain adaptive person re-identification (re-ID) task has been a challenge because, unlike general domain adaptive tasks, there is no overlap between the classes of source and target domain data in person re-ID, which leads to a significant domain gap. State-of-the-art unsupervised re-ID methods train neural networks using a memory-based contrastive loss. However, performing contrastive learning by treating each unlabeled instance as a class leads to the problem of class collision, and the updating intensity is inconsistent because the number of instances differs across categories when the memory bank is updated. To address these problems, we propose Prototype Dictionary Learning for person re-ID, which is able to utilize both source domain data and target domain data in a single training stage while avoiding the problems of class collision and inconsistent updating intensity through cluster-level prototype dictionary learning. In order to reduce the interference of the domain gap on the model, we propose a local-enhance module to improve the domain adaptation of the model without increasing the number of model parameters. Our experiments on two large datasets demonstrate the effectiveness of prototype dictionary learning. It achieves 71.5% mAP in the Market-to-Duke task, a 2.3% improvement over state-of-the-art unsupervised domain adaptive re-ID methods, and 83.9% mAP in the Duke-to-Market task, a 4.4% improvement over state-of-the-art unsupervised adaptive re-ID methods.
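
A cluster-level prototype dictionary can be sketched as follows: the contrastive loss compares a feature with one prototype per cluster, and each prototype is updated with momentum. This is a generic InfoNCE-style sketch under assumed hyper-parameters (`temperature`, `momentum`), not the paper's exact loss or update rule.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def prototype_contrastive_loss(feature, prototypes, cluster_id, temperature=0.05):
    """Cross-entropy over similarities to all cluster prototypes (InfoNCE-style)."""
    f = normalize(feature)
    logits = [dot(f, p) / temperature for p in prototypes]
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[cluster_id]

def update_prototype(prototypes, cluster_id, feature, momentum=0.9):
    """Momentum update of one prototype; the dictionary holds one entry per
    cluster rather than one entry per instance."""
    f = normalize(feature)
    old = prototypes[cluster_id]
    mixed = [momentum * o + (1.0 - momentum) * x for o, x in zip(old, f)]
    prototypes[cluster_id] = normalize(mixed)

prototypes = [normalize([1.0, 0.0]), normalize([0.0, 1.0])]
loss_match = prototype_contrastive_loss([0.9, 0.1], prototypes, cluster_id=0)
loss_mismatch = prototype_contrastive_loss([0.9, 0.1], prototypes, cluster_id=1)
update_prototype(prototypes, 0, [0.9, 0.1])
print(round(loss_match, 4), round(loss_mismatch, 4))
```

Because two instances of the same cluster share a prototype, they are never pushed apart as negatives, which is the class-collision issue that instance-level memory banks run into.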

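Translated: 2022-01-12 14:18:05 Published: 2022-01-11

# MobileFaceSwap: A Lightweight Framework for Video Face Swapping

MobileFaceSwap: A Lightweight Framework for Video Face Swapping ( http://arxiv.org/abs/2201.03808v1 )

License: see the linked page

Zhiliang Xu, Zhibin Hong, Changxing Ding, Zhen Zhu, Junyu Han, Jingtuo Liu, Errui Ding

(Reference translation) Advanced face swapping methods have achieved appealing results. However, many of these methods have large parameter counts and computational costs, which makes them hard to apply in real-time applications or deploy on edge devices such as mobile phones. In this work, we propose a lightweight Identity-aware Dynamic Network (IDN) for subject-agnostic face swapping that dynamically adjusts model parameters according to identity information. In particular, we design an efficient Identity Injection Module (IIM) by introducing two dynamic neural network techniques, weight prediction and weight modulation. Once the IDN is updated, it can be applied to swap faces given any target image or video. The presented IDN contains only 0.50M parameters and requires 0.33G FLOPs per frame, which enables real-time video face swapping on mobile phones. We also introduce a knowledge-distillation-based method for stable training and use a loss reweighting module to obtain better synthesized results. Finally, our method achieves results comparable to the teacher model and other state-of-the-art methods.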

Advanced face swapping methods have achieved appealing results. However, most of these methods have many parameters and computations, which makes it challenging to apply them in real-time applications or deploy them on edge devices like mobile phones. In this work, we propose a lightweight Identity-aware Dynamic Network (IDN) for subject-agnostic face swapping by dynamically adjusting the model parameters according to the identity information. In particular, we design an efficient Identity Injection Module (IIM) by introducing two dynamic neural network techniques, weight prediction and weight modulation. Once the IDN is updated, it can be applied to swap faces given any target image or video. The presented IDN contains only 0.50M parameters and needs 0.33G FLOPs per frame, making it capable of real-time video face swapping on mobile phones. In addition, we introduce a knowledge distillation-based method for stable training, and a loss reweighting module is employed to obtain better synthesized results. Finally, our method achieves results comparable to the teacher models and other state-of-the-art methods.

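Translated: 2022-01-12 14:17:35 Published: 2022-01-11

# Exploring Pose Estimation as an Auxiliary Learning Task for Visible-Infrared Person Re-identification

On Exploring Pose Estimation as an Auxiliary Learning Task for Visible-Infrared Person Re-identification ( http://arxiv.org/abs/2201.03859v1 )

License: see the linked page

Yunqi Miao, Nianchang Huang, Xiao Ma, Qiang Zhang, and Jungong Han

(Reference translation) Visible-infrared person re-identification (VI-ReID) is difficult because of the large discrepancy between the visible and infrared modalities. Most pioneering approaches reduce intra-class variation and inter-modality discrepancy by learning modality-shared, ID-related features. However, an explicit modality-shared cue, namely body keypoints, has not been fully exploited in VI-ReID. Moreover, existing feature learning paradigms impose constraints on either global features or partitioned feature stripes, ignoring the prediction consistency between global and part features. To address these problems, we use Pose Estimation as an auxiliary learning task to assist the VI-ReID task in an end-to-end framework. By training the two tasks jointly in a mutually beneficial manner, the model learns higher-quality modality-shared and ID-related features. On top of this, the learning of global and local features is seamlessly synchronized by a Hierarchical Feature Constraint (HFC), in which the former supervises the latter through a knowledge distillation strategy. Experimental results on two benchmark VI-ReID datasets show that the proposed method consistently improves on state-of-the-art methods by significant margins. Specifically, it achieves nearly 20% mAP improvement over the state-of-the-art method on the RegDB dataset. These findings highlight the use of auxiliary task learning in VI-ReID.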

Visible-infrared person re-identification (VI-ReID) has been challenging due to the existence of large discrepancies between visible and infrared modalities. Most pioneering approaches reduce intra-class variations and inter-modality discrepancies by learning modality-shared and ID-related features. However, an explicit modality-shared cue, i.e., body keypoints, has not been fully exploited in VI-ReID. Additionally, existing feature learning paradigms impose constraints on either global features or partitioned feature stripes, neglecting the prediction consistency of global and part features. To address the above problems, we exploit Pose Estimation as an auxiliary learning task to assist the VI-ReID task in an end-to-end framework. By jointly training these two tasks in a mutually beneficial manner, our model learns higher quality modality-shared and ID-related features. On top of this, the learning of global features and local features is seamlessly synchronized by a Hierarchical Feature Constraint (HFC), where the former supervises the latter using a knowledge distillation strategy. Experimental results on two benchmark VI-ReID datasets show that the proposed method consistently improves on state-of-the-art methods by significant margins. Specifically, our method achieves nearly 20% mAP improvement over the state-of-the-art method on the RegDB dataset. Our intriguing findings highlight the usage of auxiliary task learning in VI-ReID.
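
The statement that global features supervise local features through knowledge distillation can be pictured with a generic distillation loss. The sketch below is the textbook temperature-softened KL divergence, with the global prediction as teacher and a part prediction as student; the logits are made up, and the actual form of the Hierarchical Feature Constraint is not given in the abstract.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened identity predictions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

global_logits = [4.0, 1.0, 0.5]   # hypothetical prediction from the global feature
part_logits = [2.0, 2.5, 0.5]     # hypothetical prediction from one part feature
print(round(distillation_loss(global_logits, part_logits), 4))
```

The loss is zero when the part prediction matches the global one and grows as they disagree, which is one way to enforce prediction consistency between global and part features.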

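Translated: 2022-01-12 14:17:17 Published: 2022-01-11

# Optimization Planning for 3D ConvNets

Optimization Planning for 3D ConvNets ( http://arxiv.org/abs/2201.04021v1 )

License: see the linked page

Zhaofan Qiu and Ting Yao and Chong-Wah Ngo and Tao Mei

(Reference translation) Learning 3D Convolutional Neural Networks (3D ConvNets) optimally is not trivial because of their high complexity and the many options in the training scheme. The most common hand-tuning process starts by learning 3D ConvNets on short video clips, then learns long-term temporal dependencies on longer clips, while gradually decaying the learning rate as training progresses. The fact that this process involves several heuristic settings motivates the search for an optimal "path" that automates the entire training. In this paper, we decompose the path into a series of training "states" and specify the hyper-parameters in each state, such as the learning rate and the length of the input clips. Estimating the knee point of the performance-epoch curve triggers the transition from one state to another. We perform dynamic programming over all candidate states to plan the optimal permutation of states, i.e., the optimization path. In addition, we devise a new 3D ConvNet with a uniquely designed dual-head classifier to improve spatial and temporal discrimination. Extensive experiments on seven public video recognition benchmarks show the advantages of the proposal. With optimization planning, our 3D ConvNets achieve superior results compared with state-of-the-art recognition methods. More remarkably, we obtain top-1 accuracies of 80.5% and 82.7% on the Kinetics-400 and Kinetics-600 datasets, respectively. Source code is available at https://github.com/ZhaofanQiu/Optimization-Planning-for-3D-ConvNets.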

It is not trivial to optimally learn 3D Convolutional Neural Networks (3D ConvNets) due to their high complexity and the various options of the training scheme. The most common hand-tuning process starts from learning 3D ConvNets using short video clips and is then followed by learning long-term temporal dependency using lengthy clips, while gradually decaying the learning rate from high to low as training progresses. The fact that such a process comes along with several heuristic settings motivates the study to seek an optimal "path" to automate the entire training. In this paper, we decompose the path into a series of training "states" and specify the hyper-parameters, e.g., learning rate and the length of input clips, in each state. The estimation of the knee point on the performance-epoch curve triggers the transition from one state to another. We perform dynamic programming over all the candidate states to plan the optimal permutation of states, i.e., the optimization path. Furthermore, we devise a new 3D ConvNet with a unique design of dual-head classifier to improve spatial and temporal discrimination. Extensive experiments on seven public video recognition benchmarks demonstrate the advantages of our proposal. With the optimization planning, our 3D ConvNets achieve superior results compared to the state-of-the-art recognition methods. More remarkably, we obtain top-1 accuracies of 80.5% and 82.7% on the Kinetics-400 and Kinetics-600 datasets, respectively. Source code is available at https://github.com/ZhaofanQiu/Optimization-Planning-for-3D-ConvNets.
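
The knee point that triggers a state transition can be estimated in several ways, and the abstract does not say which one the authors use. The sketch below uses a common heuristic as an assumption: pick the point on the performance-epoch curve that lies farthest from the straight line joining its first and last points. The accuracy values are made up.

```python
def knee_point(values):
    """Index of the point farthest from the straight line joining the first
    and last points of the curve (a common knee heuristic)."""
    n = len(values)
    x0, y0, x1, y1 = 0, values[0], n - 1, values[-1]
    dx, dy = x1 - x0, y1 - y0
    best_idx, best_dist = 0, -1.0
    for i, v in enumerate(values):
        # Unnormalized perpendicular distance; the constant denominator is omitted.
        dist = abs(dy * (i - x0) - dx * (v - y0))
        if dist > best_dist:
            best_idx, best_dist = i, dist
    return best_idx

# Hypothetical validation accuracy per epoch within one training state.
accuracy = [0.20, 0.45, 0.60, 0.68, 0.71, 0.72, 0.725, 0.73]
print(knee_point(accuracy))
```

Once the curve flattens past the knee, further epochs in the same state buy little, so a planner can switch to the next state, for example a lower learning rate or longer input clips.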

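Translated: 2022-01-12 14:16:54 Published: 2022-01-11

# Condensing a Sequence into One Informative Frame for Video Recognition

Condensing a Sequence to One Informative Frame for Video Recognition ( http://arxiv.org/abs/2201.04022v1 )

License: see the linked page

Zhaofan Qiu and Ting Yao and Yan Shu and Chong-Wah Ngo and Tao Mei

(Reference translation) Video is complex because of large variations in motion and rich fine-grained visual detail. Abstracting useful information from such information-intensive media requires exhaustive computing resources. This paper studies a two-step alternative that first condenses a video sequence into an informative "frame" and then applies an off-the-shelf image recognition system to the synthetic frame. A valid question is how to define "useful information" and distill it from a video sequence into a single synthetic frame. This paper presents a novel Informative Frame Synthesis (IFS) architecture that incorporates three objective tasks (appearance reconstruction, video categorization, and motion estimation) and two regularizers (adversarial learning and color consistency). Each task gives the synthetic frame one ability, and each regularizer enhances its visual quality. By jointly learning frame synthesis end to end, the generated frame is expected to encapsulate the spatio-temporal information useful for video analysis. Extensive experiments are conducted on the large-scale Kinetics dataset. Compared with baseline methods that map a video sequence to a single image, IFS shows superior performance. More remarkably, IFS consistently shows clear improvements on image-based 2D networks and clip-based 3D networks, and achieves performance comparable to state-of-the-art methods at lower computational cost.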

Video is complex due to large variations in motion and rich content in fine-grained visual details. Abstracting useful information from such information-intensive media requires exhaustive computing resources. This paper studies a two-step alternative that first condenses the video sequence to an informative "frame" and then exploits off-the-shelf image recognition system on the synthetic frame. A valid question is how to define "useful information" and then distill it from a video sequence down to one synthetic frame. This paper presents a novel Informative Frame Synthesis (IFS) architecture that incorporates three objective tasks, i.e., appearance reconstruction, video categorization, motion estimation, and two regularizers, i.e., adversarial learning, color consistency. Each task equips the synthetic frame with one ability, while each regularizer enhances its visual quality. With these, by jointly learning the frame synthesis in an end-to-end manner, the generated frame is expected to encapsulate the required spatio-temporal information useful for video analysis. Extensive experiments are conducted on the large-scale Kinetics dataset. When comparing to baseline methods that map video sequence to a single image, IFS shows superior performance. More remarkably, IFS consistently demonstrates evident improvements on image-based 2D networks and clip-based 3D networks, and achieves comparable performance with the state-of-the-art methods with less computational cost.

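Translated: 2022-01-12 14:16:32 Published: 2022-01-11

# Boosting Video Representation Learning with Multi-Faceted Integration

Boosting Video Representation Learning with Multi-Faceted Integration ( http://arxiv.org/abs/2201.04023v1 )

License: see the linked page

Zhaofan Qiu and Ting Yao and Chong-Wah Ngo and Xiao-Ping Zhang and Dong Wu and Tao Mei

(Reference translation) Video content is multifaceted, consisting of objects, scenes, interactions, and actions. Existing datasets mostly label only one of these facets for model training, which yields video representations biased toward a single facet depending on the training dataset. How to learn a video representation from multifaceted labels, and whether multifaceted information helps video representation learning, has not yet been studied. In this paper, we propose a new learning framework, MUlti-Faceted Integration (MUFI), which aggregates facets from different datasets to learn a representation that reflects the full spectrum of video content. MUFI formulates the problem as visual-semantic embedding learning, which explicitly maps the video representation into a rich semantic embedding space and jointly optimizes the representation from two perspectives. One exploits the intra-facet supervision between each video and its own label descriptions; the other predicts the "semantic representation" of each video from the facets of other datasets as inter-facet supervision. We show that learning a 3D CNN via the MUFI framework on a union of four large-scale video datasets and two image datasets leads to superior video representation capability. The 3D CNN pre-trained with MUFI also shows clear improvements over other approaches on downstream video applications. MUFI achieves 98.1%/80.9% on UCF101/HMDB51 and 101.5% in CIDEr-D score for video captioning.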

Video content is multifaceted, consisting of objects, scenes, interactions or actions. The existing datasets mostly label only one of the facets for model training, resulting in the video representation that biases to only one facet depending on the training dataset. There is no study yet on how to learn a video representation from multifaceted labels, and whether multifaceted information is helpful for video representation learning. In this paper, we propose a new learning framework, MUlti-Faceted Integration (MUFI), to aggregate facets from different datasets for learning a representation that could reflect the full spectrum of video content. Technically, MUFI formulates the problem as visual-semantic embedding learning, which explicitly maps video representation into a rich semantic embedding space, and jointly optimizes video representation from two perspectives. One is to capitalize on the intra-facet supervision between each video and its own label descriptions, and the second predicts the "semantic representation" of each video from the facets of other datasets as the inter-facet supervision. Extensive experiments demonstrate that learning 3D CNN via our MUFI framework on a union of four large-scale video datasets plus two image datasets leads to superior capability of video representation. The pre-learnt 3D CNN with MUFI also shows clear improvements over other approaches on several downstream video applications. More remarkably, MUFI achieves 98.1%/80.9% on UCF101/HMDB51 for action recognition and 101.5% in terms of CIDEr-D score on MSVD for video captioning.

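Translated: 2022-01-12 14:16:08 Published: 2022-01-11

# Representing Videos as Discriminative Sub-graphs for Action Recognition

Representing Videos as Discriminative Sub-graphs for Action Recognition ( http://arxiv.org/abs/2201.04027v1 )

License: see the linked page

Dong Li and Zhaofan Qiu and Yingwei Pan and Ting Yao and Houqiang Li and Tao Mei

(Reference translation) Human actions typically have combinatorial structures or patterns, namely subjects, objects, and the spatio-temporal interactions between them. Discovering such structures is a rewarding way to reason about the dynamics of interactions and to recognize actions. In this paper, we propose a new design of sub-graphs to represent and encode the discriminative patterns of each action in videos. Specifically, we present the MUlti-scale Sub-graph LEarning (MUSLE) framework, which builds space-time graphs and clusters them into compact sub-graphs at each scale with respect to the number of nodes. Technically, MUSLE generates 3D bounding boxes, i.e., tubelets, in each video clip as graph nodes and uses dense connectivity between tubelets as graph edges. For each action category, we perform online clustering that decomposes the graph into sub-graphs at each scale by learning a Gaussian Mixture Layer, and we select discriminative sub-graphs as action prototypes for recognition. Extensive experiments are conducted on the Something-Something V1 & V2 and Kinetics-400 datasets, and superior results are reported compared with state-of-the-art methods. In addition, MUSLE achieves the best reported accuracy to date of 65.0% on the Something-Something V2 validation set.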

Human actions are typically of combinatorial structures or patterns, i.e., subjects, objects, plus the spatio-temporal interactions in between. Discovering such structures is therefore a rewarding way to reason about the dynamics of interactions and recognize the actions. In this paper, we introduce a new design of sub-graphs to represent and encode the discriminative patterns of each action in the videos. Specifically, we present the MUlti-scale Sub-graph LEarning (MUSLE) framework, which builds space-time graphs and clusters the graphs into compact sub-graphs on each scale with respect to the number of nodes. Technically, MUSLE produces 3D bounding boxes, i.e., tubelets, in each video clip as graph nodes and takes dense connectivity as graph edges between tubelets. For each action category, we execute online clustering to decompose the graph into sub-graphs on each scale through learning a Gaussian Mixture Layer and select the discriminative sub-graphs as action prototypes for recognition. Extensive experiments are conducted on both Something-Something V1 & V2 and Kinetics-400 datasets, and superior results are reported when comparing to state-of-the-art methods. More remarkably, our MUSLE achieves the best reported accuracy to date of 65.0% on the Something-Something V2 validation set.

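Translated: 2022-01-12 14:15:41 Published: 2022-01-11

# Motion-Focused Contrastive Learning of Video Representations

Motion-Focused Contrastive Learning of Video Representations ( http://arxiv.org/abs/2201.04029v1 )

License: see the linked page

Rui Li and Yiheng Zhang and Zhaofan Qiu and Ting Yao and Dong Liu and Tao Mei

(Reference translation) Motion, the most distinctive phenomenon in video and the one that involves change over time, has been essential to the development of video representation learning. In this paper, we ask how important motion is, particularly for self-supervised video representation learning. To this end, we compose a duet that exploits motion for both data augmentation and feature learning within the contrastive learning regime. Specifically, we propose a Motion-focused Contrastive Learning (MCL) method that takes this duet as its foundation. On one hand, MCL uses the optical flow of each frame in a video to sample tubelets (i.e., sequences of temporally associated frame patches) temporally and spatially as data augmentations. On the other hand, MCL aligns the gradient maps of the convolutional layers with optical flow maps from spatial, temporal, and spatio-temporal perspectives, in order to ground motion information in feature learning. Extensive experiments with an R(2+1)D backbone show the effectiveness of MCL. On UCF101, a linear classifier trained on the representations learned by MCL achieves 81.91% top-1 accuracy, outperforming ImageNet supervised pre-training by 6.78%. On Kinetics-400, MCL achieves 66.62% top-1 accuracy under the linear protocol. Code is available at https://github.com/YihengZhang-CV/MCL-Motion-Focused-Contrastive-Learning.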

Motion, as the most distinct phenomenon in a video to involve the changes over time, has been unique and critical to the development of video representation learning. In this paper, we ask the question: how important is the motion particularly for self-supervised video representation learning. To this end, we compose a duet of exploiting the motion for data augmentation and feature learning in the regime of contrastive learning. Specifically, we present a Motion-focused Contrastive Learning (MCL) method that regards such duet as the foundation. On one hand, MCL capitalizes on optical flow of each frame in a video to temporally and spatially sample the tubelets (i.e., sequences of associated frame patches across time) as data augmentations. On the other hand, MCL further aligns gradient maps of the convolutional layers to optical flow maps from spatial, temporal and spatio-temporal perspectives, in order to ground motion information in feature learning. Extensive experiments conducted on R(2+1)D backbone demonstrate the effectiveness of our MCL. On UCF101, the linear classifier trained on the representations learnt by MCL achieves 81.91% top-1 accuracy, outperforming ImageNet supervised pre-training by 6.78%. On Kinetics-400, MCL achieves 66.62% top-1 accuracy under the linear protocol. Code is available at https://github.com/YihengZhang-CV/MCL-Motion-Focused-Contrastive-Learning.

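Translated: 2022-01-12 14:15:18 Published: 2022-01-11

# (Reference translation) On the Efficacy of Co-Attention Transformer Layers in Visual Question Answering

On the Efficacy of Co-Attention Transformer Layers in Visual Question Answering ( http://arxiv.org/abs/2201.03965v1 )

License: CC BY 4.0

Ankur Sikarwar and Gabriel Kreiman

(Reference translation) In recent years, multi-modal transformers have shown significant progress on vision-language tasks such as visual question answering (VQA), outperforming earlier architectures by a considerable margin. This improvement in VQA is often attributed to the rich interactions between the vision and language streams. In this work, we investigate the efficacy of co-attention transformer layers in helping the network focus on relevant regions while answering a question. We generate visual attention maps from the question-conditioned image attention scores in these co-attention layers. We evaluate how the following critical components affect the visual attention of a state-of-the-art VQA model: (i) the number of object region proposals, (ii) the part-of-speech (POS) tags of the question, (iii) question semantics, (iv) the number of co-attention layers, and (v) answer accuracy. We compare the neural network's attention maps with human attention maps both qualitatively and quantitatively. Our results indicate that co-attention transformer modules are crucial for attending to the relevant regions of the image given a question. Importantly, it is not the semantic meaning of the question that drives visual attention but specific keywords in the question. Our work sheds light on the function and interpretation of co-attention transformer layers, highlights gaps in current networks, and can guide the development of future VQA models and networks that process visual and language streams simultaneously.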

In recent years, multi-modal transformers have shown significant progress in Vision-Language tasks, such as Visual Question Answering (VQA), outperforming previous architectures by a considerable margin. This improvement in VQA is often attributed to the rich interactions between vision and language streams. In this work, we investigate the efficacy of co-attention transformer layers in helping the network focus on relevant regions while answering the question. We generate visual attention maps using the question-conditioned image attention scores in these co-attention layers. We evaluate the effect of the following critical components on visual attention of a state-of-the-art VQA model: (i) number of object region proposals, (ii) question part of speech (POS) tags, (iii) question semantics, (iv) number of co-attention layers, and (v) answer accuracy. We compare the neural network attention maps against human attention maps both qualitatively and quantitatively. Our findings indicate that co-attention transformer modules are crucial in attending to relevant regions of the image given a question. Importantly, we observe that the semantic meaning of the question is not what drives visual attention, but specific keywords in the question do. Our work sheds light on the function and interpretation of co-attention transformer layers, highlights gaps in current networks, and can guide the development of future VQA models and networks that simultaneously process visual and language streams.
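
A question-conditioned attention map over image regions can be sketched with plain scaled dot-product attention, where question tokens act as queries and region features as keys. The toy embeddings and the averaging over question tokens are assumptions for illustration; the model studied in the paper may aggregate heads, layers, and tokens differently.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def question_to_image_attention(question_vecs, region_vecs):
    """For each question token (query), attention weights over image regions (keys)."""
    d = len(region_vecs[0])
    scores = []
    for q in question_vecs:
        logits = [sum(a * b for a, b in zip(q, r)) / math.sqrt(d) for r in region_vecs]
        scores.append(softmax(logits))
    return scores

def attention_map(scores):
    """Average over question tokens to get one weight per region."""
    n_regions = len(scores[0])
    return [sum(row[j] for row in scores) / len(scores) for j in range(n_regions)]

question = [[1.0, 0.0], [0.5, 0.5]]             # two toy token embeddings
regions = [[2.0, 0.0], [0.0, 2.0], [0.1, 0.1]]  # three toy region features
amap = attention_map(question_to_image_attention(question, regions))
print([round(a, 3) for a in amap])
```

Comparing such a map with a human attention map then reduces to comparing two distributions over regions, for instance with a rank correlation.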

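Translated: 2022-01-12 14:13:41 Published: 2022-01-11

# Sentiment Analysis with Deep Learning Models: A Comparative Study on a Decade of Sinhala-Language Facebook Data

Sentiment Analysis with Deep Learning Models: A Comparative Study on a Decade of Sinhala Language Facebook Data ( http://arxiv.org/abs/2201.03941v1 )

License: see the linked page

Gihan Weeraprameshwara, Vihanga Jayawickrama, Nisansa de Silva, Yudhanjaya Wijeratne

(Reference translation) The relationship between Facebook posts and the corresponding reaction feature is an interesting subject to explore and understand. To this end, we test state-of-the-art Sinhala sentiment analysis models on a dataset containing a decade of Sinhala posts with millions of reactions. To establish benchmarks and identify the best model for Sinhala sentiment analysis, we also test other deep learning models suited to sentiment analysis on the same dataset configuration. In this study, we report that a 3-layer bidirectional LSTM model achieves an F1 score of 84.58% for Sinhala sentiment analysis, surpassing the current state-of-the-art model, Capsule B, which reaches an F1 score of 82.04%. Furthermore, since all the deep learning models show F1 scores above 75%, we conclude that it is safe to claim that Facebook reactions are suitable for predicting the sentiment of a text.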

The relationship between Facebook posts and the corresponding reaction feature is an interesting subject to explore and understand. To achieve this end, we test state-of-the-art Sinhala sentiment analysis models against a data set containing a decade's worth of Sinhala posts with millions of reactions. For the purpose of establishing benchmarks and with the goal of identifying the best model for Sinhala sentiment analysis, we also test, on the same data set configuration, other deep learning models catered for sentiment analysis. In this study we report that the 3-layer Bidirectional LSTM model achieves an F1 score of 84.58% for Sinhala sentiment analysis, surpassing the current state-of-the-art model, Capsule B, which only manages an F1 score of 82.04%. Further, since all the deep learning models show F1 scores above 75%, we conclude that it is safe to claim that Facebook reactions are suitable to predict the sentiment of a text.
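
For readers comparing the reported scores, here is a short sketch of how an F1 score is computed from confusion counts. The counts are made up, and the abstract does not say whether the reported F1 is macro-averaged, weighted, or per-class, so `macro_f1` is shown only as one possible convention.

```python
def f1_score(tp, fp, fn):
    """F1 is the harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def macro_f1(confusion_counts):
    """Average per-class F1; `confusion_counts` maps label -> (tp, fp, fn)."""
    scores = [f1_score(*c) for c in confusion_counts.values()]
    return sum(scores) / len(scores)

counts = {"positive": (80, 10, 20), "negative": (70, 20, 10)}  # made-up counts
print(round(f1_score(80, 10, 20), 4), round(macro_f1(counts), 4))
```

Because F1 depends on the averaging convention and on class balance, two models are only comparable when they are scored on the same data set configuration, which is the setup the abstract describes.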

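Translated: 2022-01-12 13:56:37 Published: 2022-01-11

# Winning Solutions and Post-Challenge Analyses of the ChaLearn AutoDL Challenge 2019

Winning solutions and post-challenge analyses of the ChaLearn AutoDL challenge 2019 ( http://arxiv.org/abs/2201.03801v1 )

License: see the linked page

Zhengying Liu, Adrien Pavao, Zhen Xu, Sergio Escalera, Fabio Ferreira, Isabelle Guyon, Sirui Hong, Frank Hutter, Rongrong Ji, Julio C. S. Jacques Junior, Ge Li, Marius Lindauer, Zhipeng Luo, Meysam Madadi, Thomas Nierhoff, Kangning Niu, Chunguang Pan, Danny Stoll, Sebastien Treguer, Jin Wang, Peng Wang, Chenglin Wu, Youcheng Xiong, Arber Zela, Yang Zhang

(Reference translation) This paper reports the results and post-challenge analyses of ChaLearn's AutoDL challenge series, which helped sort out a profusion of AutoML solutions for deep learning (DL) that had been introduced in a variety of settings but lacked fair comparisons. All input data modalities (time series, images, videos, text, tabular) were formatted as tensors, and all tasks were multi-label classification problems. Code submissions were executed on hidden tasks with limited time and computational resources, which favored solutions that produce results quickly. In this setting, DL methods dominated, although popular neural architecture search (NAS) was impractical. Solutions relied on fine-tuned pre-trained networks with architectures matched to the data modality. Post-challenge tests did not reveal improvements beyond the imposed time limit. Although no component is particularly original or novel, a high-level modular organization emerged, featuring a "meta-learner", "data ingestor", "model selector", "model/learner", and "evaluator". This modularity enabled ablation studies, which revealed the importance of (off-platform) meta-learning, ensembling, and efficient data management. Experiments on heterogeneous module combinations further confirm the (local) optimality of the winning solutions. The legacy of the challenge includes a lasting benchmark (http://autodl.chalearn.org), the winners' open-source code, and a free "AutoDL self-service".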

This paper reports the results and post-challenge analyses of ChaLearn's AutoDL challenge series, which helped sort out a profusion of AutoML solutions for Deep Learning (DL) that had been introduced in a variety of settings, but lacked fair comparisons. All input data modalities (time series, images, videos, text, tabular) were formatted as tensors and all tasks were multi-label classification problems. Code submissions were executed on hidden tasks, with limited time and computational resources, pushing solutions that get results quickly. In this setting, DL methods dominated, though popular Neural Architecture Search (NAS) was impractical. Solutions relied on fine-tuned pre-trained networks, with architectures matching data modality. Post-challenge tests did not reveal improvements beyond the imposed time limit. While no component is particularly original or novel, a high-level modular organization emerged featuring a "meta-learner", "data ingestor", "model selector", "model/learner", and "evaluator". This modularity enabled ablation studies, which revealed the importance of (off-platform) meta-learning, ensembling, and efficient data management. Experiments on heterogeneous module combinations further confirm the (local) optimality of the winning solutions. Our challenge legacy includes an ever-lasting benchmark (http://autodl.chalearn.org), the open-sourced code of the winners, and a free "AutoDL self-service".

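Translated: 2022-01-12 13:55:32 Published: 2022-01-11

# Learning What to Remember

Learning what to remember ( http://arxiv.org/abs/2201.03806v1 )

License: see the linked page

Robi Bhattacharjee and Gaurav Mahajan

(Reference translation) We consider a lifelong learning scenario in which a learner faces a never-ending, arbitrary stream of facts and must decide which ones to retain in its limited memory. We introduce a mathematical model based on the online learning framework, in which the learner measures itself against a collection of experts that are also memory-constrained and that reflect different policies for what to remember. Interspersed in the stream of facts are occasional questions, and on each of these the learner incurs a loss if it has not remembered the corresponding fact. The goal is to do almost as well as the best expert in hindsight while using roughly the same amount of memory. We identify difficulties with using the multiplicative weights update algorithm in this memory-constrained scenario and design an alternative scheme whose regret guarantees are close to the best possible.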

We consider a lifelong learning scenario in which a learner faces a neverending and arbitrary stream of facts and has to decide which ones to retain in its limited memory. We introduce a mathematical model based on the online learning framework, in which the learner measures itself against a collection of experts that are also memory-constrained and that reflect different policies for what to remember. Interspersed with the stream of facts are occasional questions, and on each of these the learner incurs a loss if it has not remembered the corresponding fact. Its goal is to do almost as well as the best expert in hindsight, while using roughly the same amount of memory. We identify difficulties with using the multiplicative weights update algorithm in this memory-constrained scenario, and design an alternative scheme whose regret guarantees are close to the best possible.
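
For reference, the textbook multiplicative weights update (Hedge) algorithm that the abstract mentions can be sketched as follows. This version ignores the memory constraint entirely and is not the alternative scheme the paper designs; the loss sequence and the learning rate `eta` are made up.

```python
import math

def hedge(loss_rounds, eta=0.5):
    """Run multiplicative weights over experts; returns the final weights and
    the learner's expected cumulative loss. Losses are assumed to lie in [0, 1]."""
    n = len(loss_rounds[0])
    weights = [1.0] * n
    learner_loss = 0.0
    for losses in loss_rounds:
        total = sum(weights)
        probs = [w / total for w in weights]
        learner_loss += sum(p * l for p, l in zip(probs, losses))
        # Shrink each expert's weight exponentially in its loss.
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    return weights, learner_loss

# Three hypothetical experts; a loss of 1 means the expert had forgotten the queried fact.
rounds = [[1, 0, 1], [1, 0, 0], [0, 0, 1], [1, 1, 1], [1, 0, 1]] * 4
weights, learner_loss = hedge(rounds)
best_expert_loss = min(sum(r[i] for r in rounds) for i in range(3))
print(weights.index(max(weights)), round(learner_loss, 2), best_expert_loss)
```

The standard analysis bounds the regret by ln(N)/eta + eta*T/8 for N experts and T rounds, but the update needs every expert's loss in every round. That requirement sits uneasily with a learner that can only hold roughly one expert's worth of memory.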

翻訳日:2022-01-12 13:53:29 公開日:2022-01-11

# Uni-EDEN:マルチグラニュラービジョンランゲージ事前学習によるユニバーサルエンコーダデコーダネットワーク

Uni-EDEN: Universal Encoder-Decoder Network by Multi-Granular Vision-Language Pre-training ( http://arxiv.org/abs/2201.04026v1 )

ライセンス: Link先を確認

Yehao Li and Jiahao Fan and Yingwei Pan and Ting Yao and Weiyao Lin and Tao Mei

(参考訳) 視覚言語プレトレーニングは、リッチリソースプレトレーニングタスクから限定リソースダウンストリームタスクにマルチモーダル知識を転送する、新興かつ迅速な研究トピックである。単一のジェネリックエンコーダを主に学習する既存の研究とは異なり、視覚言語認識(例えば、視覚的質問応答)と生成(例えば、画像キャプション)の両方を容易にする訓練済みのユニバーサルエンコーダ・デコーダネットワーク(Uni-EDEN)を提示する。 Uni-EDENは2ストリームトランスフォーマーベースの構造で、各モダリティの表現を個別に学習するオブジェクトと文エンコーダと、モーダル間相互作用による多モーダル推論と文生成を可能にする文デコーダの3つのモジュールで構成される。画像の言語表現は, 単純から包括的, 個々のラベル, フレーズ, 自然文まで, この階層のさまざまな粒度にまたがる可能性があることを考慮し, 多粒度視覚言語プロキシタスク(MOC), マスケ領域フレーズ生成(MRPG), イメージセンテンスマッチング(ISM), マスケ領域生成(MSG)を事前訓練する。このように、Uni-EDENにはマルチモーダル表現抽出と言語モデリングの両方の能力がある。広汎な実験は、Uni-EDENを4つの視覚言語知覚と下流タスクに微調整することで、説得力のある一般化性を示す。

Vision-language pre-training has been an emerging and fast-developing research topic, which transfers multi-modal knowledge from rich-resource pre-training tasks to limited-resource downstream tasks. Unlike existing works that predominantly learn a single generic encoder, we present a pre-trainable Universal Encoder-DEcoder Network (Uni-EDEN) to facilitate both vision-language perception (e.g., visual question answering) and generation (e.g., image captioning). Uni-EDEN is a two-stream Transformer-based structure, consisting of three modules: object and sentence encoders that separately learn the representations of each modality, and a sentence decoder that enables both multi-modal reasoning and sentence generation via inter-modal interaction. Considering that the linguistic representations of each image can span different granularities in this hierarchy including, from simple to comprehensive, an individual label, a phrase, and a natural sentence, we pre-train Uni-EDEN through multi-granular vision-language proxy tasks: Masked Object Classification (MOC), Masked Region Phrase Generation (MRPG), Image-Sentence Matching (ISM), and Masked Sentence Generation (MSG). In this way, Uni-EDEN is endowed with the power of both multi-modal representation extraction and language modeling. Extensive experiments demonstrate the compelling generalizability of Uni-EDEN by fine-tuning it to four vision-language perception and generation downstream tasks.

翻訳日:2022-01-12 13:53:15 公開日:2022-01-11

# (参考訳) 信号デノイジングのためのクロスバリデーションフレームワークとそのトレンドフィルタリング、Dyadic CARTなどへの応用

A Cross Validation Framework for Signal Denoising with Applications to Trend Filtering, Dyadic CART and Beyond ( http://arxiv.org/abs/2201.02654v2 )

ライセンス: CC BY 4.0

Anamitra Chaudhuri and Sabyasachi Chatterjee

(参考訳) 本稿では,信号復調のための一般的なクロス検証フレームワークを定式化する。一般的なフレームワークは、トレンドフィルタリングやdyadic cartのような非パラメトリック回帰法に適用される。得られたクロス検証されたバージョンは、最適に調整されたアナログで知られているように、ほぼ同じ収束率に達することが示される。トレンドフィルタリングやDyadic CARTのクロスバリデーションバージョンに関する以前の理論的分析は存在しなかった。フレームワークの汎用性を説明するために, 2つの基本推定器の相互検証版, 高次元線形回帰のためのラッソ, 行列推定のための特異値閾値付けを提案する。我々の一般的なフレームワークはChatterjee と Jafarov (2015) のアイデアにインスパイアされており、チューニングパラメータを使用する幅広い推定手法に適用できる可能性がある。

This paper formulates a general cross validation framework for signal denoising. The general framework is then applied to nonparametric regression methods such as Trend Filtering and Dyadic CART. The resulting cross validated versions are then shown to attain nearly the same rates of convergence as are known for the optimally tuned analogues. No previous theoretical analyses existed for cross validated versions of Trend Filtering or Dyadic CART. To illustrate the generality of the framework, we also propose and study cross validated versions of two fundamental estimators: lasso for high dimensional linear regression and singular value thresholding for matrix estimation. Our general framework is inspired by the ideas in Chatterjee and Jafarov (2015) and is potentially applicable to a wide range of estimation methods which use tuning parameters.

翻訳日:2022-01-12 13:26:38 公開日:2022-01-11

# 深層マルチタスク学習のためのユニタリスカラー化の擁護

In Defense of the Unitary Scalarization for Deep Multi-Task Learning ( http://arxiv.org/abs/2201.04122v1 )

ライセンス: Link先を確認

Vitaly Kurin, Alessandro De Palma, Ilya Kostrikov, Shimon Whiteson, M. Pawan Kumar

(参考訳) 最近のマルチタスク学習研究は、訓練が単にタスク損失の総和を最小化するユニタリスカラー化に反対している。アドホックなマルチタスク最適化アルゴリズムが提案され、マルチタスク設定の難しさに関する様々な仮説に着想を得た。これらのオプティマイザの大部分は、タスク毎の勾配を必要とし、メモリ、ランタイム、実装のオーバーヘッドを大きく導入する。本稿では,多くの専用マルチタスクオプティマイザを正規化の形式として解釈できることを示す理論的解析を行う。さらに,単タスク学習の標準正規化と安定化技術とを組み合わせることで,教師付き学習と強化学習の両方において,複雑なマルチタスクオプティマイザの性能が一致するか,あるいは向上することを示す。我々は,本研究の結果から,近年の地域研究に対する批判的な再評価が求められていると信じている。

Recent multi-task learning research argues against unitary scalarization, where training simply minimizes the sum of the task losses. Several ad-hoc multi-task optimization algorithms have instead been proposed, inspired by various hypotheses about what makes multi-task settings difficult. The majority of these optimizers require per-task gradients, and introduce significant memory, runtime, and implementation overhead. We present a theoretical analysis suggesting that many specialized multi-task optimizers can be interpreted as forms of regularization. Moreover, we show that, when coupled with standard regularization and stabilization techniques from single-task learning, unitary scalarization matches or improves upon the performance of complex multi-task optimizers in both supervised and reinforcement learning settings. We believe our results call for a critical reevaluation of recent research in the area.
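Unitary scalarization as defined in the abstract (training on the plain sum of the task losses) can be illustrated with a toy example. The sketch below assumes a single shared scalar parameter and quadratic task losses; the function names, the losses, and the step size are hypothetical and are not taken from the paper.

```python
def unitary_scalarization_step(theta, task_grads, lr):
    """One gradient step on the sum of the task losses (every task has weight 1)."""
    total_grad = sum(grad(theta) for grad in task_grads)
    return theta - lr * total_grad

def train(targets, steps=200, lr=0.1):
    # Toy tasks: task i has loss (theta - targets[i])**2,
    # so its gradient is 2 * (theta - targets[i]).
    task_grads = [lambda th, t=t: 2.0 * (th - t) for t in targets]
    theta = 0.0
    for _ in range(steps):
        theta = unitary_scalarization_step(theta, task_grads, lr)
    return theta

# With two tasks pulling toward 1.0 and 3.0, the summed loss is minimized at 2.0.
theta_star = train([1.0, 3.0])
```

Only the summed gradient is needed at each step, so no per-task gradients have to be stored or compared, which is the overhead the abstract attributes to specialized multi-task optimizers.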

翻訳日:2022-01-12 13:26:07 公開日:2022-01-11

# 事前知識が放射線レポート生成を増強する

Prior Knowledge Enhances Radiology Report Generation ( http://arxiv.org/abs/2201.03761v1 )

ライセンス: Link先を確認

Song Wang, Liyan Tang, Mingquan Lin, George Shih, Ying Ding, Yifan Peng

(参考訳) 放射線医学報告生成は, 放射線科医の作業負荷を軽減するため, コンピュータ支援診断を作成することを目的としており, 近年注目を集めている。しかし, 従来の深層学習手法では, 医学的所見間の相互影響を無視する傾向があり, 報告の質を損なうボトルネックとなる。本稿では,情報的知識グラフを用いて,医学的発見の関連性について検討し,その先行知識を放射線学的報告生成に取り入れ,報告の質を向上させることを提案する。実験の結果, ROUGE-Lを0.384$\pm$0.007, CIDErを0.340$\pm$0.011とするIU X線データセットにおいて, 提案手法の優れた性能を示した。 CIDErとROUGE-Lは平均1.6%の改善(それぞれ2.0%と1.5%の改善)を達成した。実験により, 先行知識が正確な放射線学レポート生成に性能向上をもたらす可能性が示唆された。コードはhttps://github.com/bionlplab/report_generation_amia2022で公開します。

Radiology report generation aims to produce computer-aided diagnoses to alleviate the workload of radiologists and has drawn increasing attention recently. However, previous deep learning methods tend to neglect the mutual influences between medical findings, which can be the bottleneck that limits the quality of generated reports. In this work, we propose to mine and represent the associations among medical findings in an informative knowledge graph and incorporate this prior knowledge with radiology report generation to help improve the quality of generated reports. Experiment results demonstrate the superior performance of our proposed method on the IU X-ray dataset with a ROUGE-L of 0.384$\pm$0.007 and CIDEr of 0.340$\pm$0.011. Compared with previous works, our model achieves an average of 1.6% improvement (2.0% and 1.5% improvements in CIDEr and ROUGE-L, respectively). The experiments suggest that prior knowledge can bring performance gains to accurate radiology report generation. We will make the code publicly available at https://github.com/bionlplab/report_generation_amia2022.
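The ROUGE-L score reported above is built on the longest common subsequence (LCS) between a generated report and a reference report. The following is a minimal sketch of an LCS-based F1 score, assuming whitespace tokenization and an equal weighting of precision and recall; evaluation toolkits differ in tokenization and in that weighting, so this will not necessarily reproduce the paper's numbers. The example sentences are invented.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of token lists a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def rouge_l_f1(candidate, reference):
    """LCS-based F1 between two whitespace-tokenized strings."""
    c, r = candidate.split(), reference.split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)

# Hypothetical report snippets; the LCS is "no acute disease" (3 tokens).
score = rouge_l_f1("no acute disease", "no acute cardiopulmonary disease")
```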

翻訳日:2022-01-12 13:25:32 公開日:2022-01-11

# セマンティクスセグメンテーションのためのピラミッド融合トランスフォーマ

Pyramid Fusion Transformer for Semantic Segmentation ( http://arxiv.org/abs/2201.04019v1 )

ライセンス: Link先を確認

Zipeng Qin, Jianbo Liu, Xiaolin Zhang, Maoqing Tian, Aojun Zhou, Shuai Yi, Hongsheng Li

(参考訳) 最近提案されたMaskFormer \cite{maskformer}は、セマンティックセグメンテーションのタスクについて、新たな視点を与えている。本質的には、カテゴリセグメントに対応するペア確率とマスクを生成し、セグメンテーションマップの推論中にそれらを組み合わせます。したがって、セグメンテーションの品質は、クエリが画像内のカテゴリとその空間的位置に関するセマンティック情報をいかにうまくキャプチャできるかに依存する。本研究では,シングルスケール機能上のマスク分類デコーダは,信頼性の高い確率やマスクを抽出できるほど有効ではないことを見出した。特徴ピラミッド全体にわたって豊富な意味情報を求めるため,マルチスケール機能上にマスク毎のセマンティクスセグメンテーションを実現するトランスフォーマティブベースのピラミッド融合トランスフォーマを提案する。計算オーバーヘッドを過大に発生させることなく、異なる解像度の画像特徴を効率的に活用するために、PFTは、マルチスケールのマルチスケールトランスフォーマーデコーダを用いて補完情報を交換する。広範な実験評価とアブレーションを行い,その効果を実証した。特に、MaskFormerと比較して、ResNet-101cでCOCO-Stuff 10Kデータセットを3.2mIoU改善しました。さらに、ADE20K検証セットでは、Swin-BのバックボーンとMaskFormerのバックボーンと、シングルスケールとマルチスケールの両方でずっと大きなSwin-Lのバックボーンが一致し、それぞれ54.1 mIoUと55.3 mIoUを達成した。 Swin-Lのバックボーンを用いてADE20K検証セット上で56.0 mIoUのシングルスケール結果と57.2のマルチスケール結果を得る。

The recently proposed MaskFormer gives a refreshed perspective on the task of semantic segmentation: it shifts from the popular pixel-level classification paradigm to a mask-level classification method. In essence, it generates paired probabilities and masks corresponding to category segments and combines them during inference for the segmentation maps. The segmentation quality thus relies on how well the queries can capture the semantic information for categories and their spatial locations within the images. In our study, we find that a per-mask classification decoder on top of a single-scale feature is not effective enough to extract reliable probabilities or masks. To mine for rich semantic information across the feature pyramid, we propose a transformer-based Pyramid Fusion Transformer (PFT) for per-mask semantic segmentation on top of multi-scale features. To efficiently utilize image features of different resolutions without incurring too much computational overhead, PFT uses a multi-scale transformer decoder with cross-scale inter-query attention to exchange complementary information. Extensive experimental evaluations and ablations demonstrate the efficacy of our framework. In particular, we achieve a 3.2 mIoU improvement on the COCO-Stuff 10K dataset with ResNet-101c compared to MaskFormer. Besides, on the ADE20K validation set, our result with a Swin-B backbone matches that of MaskFormer with a much larger Swin-L backbone in both single-scale and multi-scale inference, achieving 54.1 mIoU and 55.3 mIoU respectively. Using a Swin-L backbone, we achieve a 56.0 mIoU single-scale result and a 57.2 mIoU multi-scale result on the ADE20K validation set, obtaining state-of-the-art performance on the dataset.
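The inference step mentioned in the abstract, in which each query's class probabilities are paired with its mask and the pairs are combined into a segmentation map, can be sketched in plain Python. This is a minimal illustration that assumes soft masks in [0, 1] and per-query class probabilities given as nested lists; the name `combine_masks` and the toy inputs are hypothetical and are not taken from the paper's code.

```python
def combine_masks(class_probs, masks):
    """Combine per-query class probabilities and soft masks into a label map.

    class_probs[q][c]: probability that query q belongs to class c.
    masks[q][y][x]:    soft mask value in [0, 1] for query q at pixel (y, x).
    Returns labels[y][x], the arg-max class at each pixel.
    """
    num_queries = len(masks)
    num_classes = len(class_probs[0])
    height, width = len(masks[0]), len(masks[0][0])
    labels = []
    for y in range(height):
        row = []
        for x in range(width):
            # Score of class c at this pixel: sum over queries of p_q(c) * m_q(y, x).
            scores = [
                sum(class_probs[q][c] * masks[q][y][x] for q in range(num_queries))
                for c in range(num_classes)
            ]
            row.append(max(range(num_classes), key=scores.__getitem__))
        labels.append(row)
    return labels

# Two queries, two classes, a 1x2 image: query 0 covers the left pixel, query 1 the right.
label_map = combine_masks([[0.9, 0.1], [0.2, 0.8]], [[[1.0, 0.0]], [[0.0, 1.0]]])
```

Because every pixel score is a product of a class probability and a mask value, an unreliable probability or an unreliable mask from any query degrades the map, which is why the abstract ties segmentation quality to how well the queries capture categories and locations.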

翻訳日:2022-01-12 13:25:10 公開日:2022-01-11

# (参考訳) クライオ電子顕微鏡における体積再構成のための深層生成モデリング

Deep Generative Modeling for Volume Reconstruction in Cryo-Electron Microscopy ( http://arxiv.org/abs/2201.02867v2 )

ライセンス: CC BY 4.0

Claire Donnat, Axel Levy, Frederic Poitevin, Nina Miolane

(参考訳) 低温電子顕微鏡(cryo-EM)による生体分子の高分解能イメージングの最近の進歩は、分子体積の再構築のための新しい扉を開放し、生物学、化学、薬理学研究のさらなる進歩を約束している。重要な道のりにもかかわらず、Cryo-EMデータ分析における大きな課題は、物理学者、構造生物学者、計算機科学者、統計学者、応用数学者からの洞察を必要とする、自然界における厳密で複雑な学際的な課題のままである。一方, 生成モデルとエンドツーエンドの教師なし深層学習技術を組み合わせた次世代のボリューム再構成アルゴリズムでは, シミュレーションデータに対して有望な結果が得られた。そこで本稿では,このような手法の普及と課題の学際的性質を踏まえ,高分解能cryo-emボリューム再構成のための深部生成モデリングの最近の進歩を批判的に検討する。本日のレビューは (i)これらの新しい方法を比較して対比する一方で (ii)cryo-emの特定の背景を持たない5つの分野の科学者に親しみやすい用語を用いて、視点から提示すること。このレビューは、Creo-EMボリューム再構成のための深部生成モデルの数学的および計算的課題の紹介と、このクラスのアルゴリズム間で共有されるベースライン方法論の概要から始まる。これらの異なるモデルを通して共通のスレッドウィービングを確立し、これらの最先端のアルゴリズムを実践的に比較し、それらが依存する仮定とともに、それらの相対的な強みと弱みを強調します。これにより、将来の研究のための現在の方法や道のボトルネックを特定できます。

Recent breakthroughs in high resolution imaging of biomolecules in solution with cryo-electron microscopy (cryo-EM) have unlocked new doors for the reconstruction of molecular volumes, thereby promising further advances in biology, chemistry, and pharmacological research amongst others. Despite significant headway, the immense challenges in cryo-EM data analysis remain legion and intricately inter-disciplinary in nature, requiring insights from physicists, structural biologists, computer scientists, statisticians, and applied mathematicians. Meanwhile, recent next-generation volume reconstruction algorithms that combine generative modeling with end-to-end unsupervised deep learning techniques have shown promising results on simulated data, but still face considerable hurdles when applied to experimental cryo-EM images. In light of the proliferation of such methods and given the interdisciplinary nature of the task, we propose here a critical review of recent advances in the field of deep generative modeling for high resolution cryo-EM volume reconstruction. The present review aims to (i) compare and contrast these new methods, while (ii) presenting them from a perspective and using terminology familiar to scientists in each of the five aforementioned fields with no specific background in cryo-EM. The review begins with an introduction to the mathematical and computational challenges of deep generative models for cryo-EM volume reconstruction, along with an overview of the baseline methodology shared across this class of algorithms. Having established the common thread weaving through these different models, we provide a practical comparison of these state-of-the-art algorithms, highlighting their relative strengths and weaknesses, along with the assumptions that they rely on. This allows us to identify bottlenecks in current methods and avenues for future research.

翻訳日:2022-01-12 13:22:54 公開日:2022-01-11

# (参考訳) 新型コロナウイルス(covid-19)パンデミックにおけるバイオメディカル記事のゼロショットと少数ショットの分類

Zero-Shot and Few-Shot Classification of Biomedical Articles in Context of the COVID-19 Pandemic ( http://arxiv.org/abs/2201.03017v2 )

ライセンス: CC BY 4.0

Simon Lupart, Benoit Favre, Vassilina Nikoulina, Salah Ait-Mokhtar

(参考訳) mesh (medical subject headings) は国立医学図書館によって作成され、生物医学領域の出版物の細かなインデックス化に使われる大きなシソーラスである。新型コロナウイルス(COVID-19)パンデミックの文脈では、MeSH記述子は対応するトピックに関する記事に関連して現れている。ゼロショット分類は、メッシュカテゴリの論文の流れをタイムリーにラベリングするのに適切な応答である。本研究では,MeSHで利用可能なリッチな意味情報によってBioBERT表現が向上し,ゼロショット/フェーショットタスクに適合する可能性が示唆された。本稿では,MeSHの項定義と論文の要約が有効であるか否かを判断し,マルチタスク学習を活用して,Seq2seqタスクによって表現のMeSH階層を誘導する。結果は、MedLineとLitCovidデータセットのベースラインを確立し、結果の表現がMeSHに存在する階層的関係を伝達していることを示す。

MeSH (Medical Subject Headings) is a large thesaurus created by the National Library of Medicine and used for fine-grained indexing of publications in the biomedical domain. In the context of the COVID-19 pandemic, MeSH descriptors have emerged in relation to articles published on the corresponding topic. Zero-shot classification is an adequate response for timely labeling of the stream of papers with MeSH categories. In this work, we hypothesise that rich semantic information available in MeSH has potential to improve BioBERT representations and make them more suitable for zero-shot/few-shot tasks. We frame the problem as determining if MeSH term definitions, concatenated with paper abstracts are valid instances or not, and leverage multi-task learning to induce the MeSH hierarchy in the representations thanks to a seq2seq task. Results establish a baseline on the MedLine and LitCovid datasets, and probing shows that the resulting representations convey the hierarchical relations present in MeSH.

翻訳日:2022-01-12 12:43:07 公開日:2022-01-11

# (参考訳) 階層的多粒度分類のための階層的残差ネットワーク強化ラベル関係グラフ

Label Relation Graphs Enhanced Hierarchical Residual Network for Hierarchical Multi-Granularity Classification ( http://arxiv.org/abs/2201.03194v2 )

ライセンス: CC BY 4.0

Jingzhou Chen, Peng Wang, Jian Liu, Yuntao Qian

(参考訳) 階層的多粒度分類(HMC)は、各オブジェクトに階層的多粒度ラベルを割り当て、["Albatross", "Laysan Albatross"]のようなラベル階層を粗いレベルから細かいレベルまで符号化することに焦点を当てる。しかしながら、細粒度の定義は主観的であり、画像品質が識別に影響する可能性がある。したがって、サンプルは階層の任意のレベル、例えば ["Albatross"] や ["Albatross", "Laysan Albatross"] で観察することができ、粗いカテゴリで識別される例は、従来のHMCの設定では無視されることが多い。本稿では,オブジェクトを階層の任意のレベルにラベル付けするHMC問題について検討する。提案手法の基本設計は,(1) 様々なレベルにラベル付けされた物体の学習は階層的な知識をレベル間で伝達し,(2) 下位クラスは上位レベルのスーパークラスに関連する属性を継承する,という2つの動機から導かれる。提案する組合せ損失は、ツリー階層で定義された関連ラベルから情報を集約することにより、観測された基底真理ラベルの限界確率を最大化する。観測されたラベルが葉のレベルであれば、組合せ損失はさらに多種クロスエントロピー損失を課し、細粒度の分類損失の重みを増加させる。本研究では,階層的特徴の相互作用を考慮した階層的残差ネットワーク(hrn)を提案する。 3つの一般的なデータセットを用いた実験は、最新のHMCアプローチや、ラベル階層を利用したきめ細かな視覚分類(FGVC)手法と比較して、我々のアプローチの有効性を実証している。

Hierarchical multi-granularity classification (HMC) assigns hierarchical multi-granularity labels to each object and focuses on encoding the label hierarchy, e.g., ["Albatross", "Laysan Albatross"] from coarse-to-fine levels. However, the definition of what is fine-grained is subjective, and the image quality may affect the identification. Thus, samples could be observed at any level of the hierarchy, e.g., ["Albatross"] or ["Albatross", "Laysan Albatross"], and examples discerned at coarse categories are often neglected in the conventional setting of HMC. In this paper, we study the HMC problem in which objects are labeled at any level of the hierarchy. The essential designs of the proposed method are derived from two motivations: (1) learning with objects labeled at various levels should transfer hierarchical knowledge between levels; (2) lower-level classes should inherit attributes related to upper-level superclasses. The proposed combinatorial loss maximizes the marginal probability of the observed ground truth label by aggregating information from related labels defined in the tree hierarchy. If the observed label is at the leaf level, the combinatorial loss further imposes the multi-class cross-entropy loss to increase the weight of fine-grained classification loss. Considering the hierarchical feature interaction, we propose a hierarchical residual network (HRN), in which granularity-specific features from parent levels acting as residual connections are added to features of children levels. Experiments on three commonly used datasets demonstrate the effectiveness of our approach compared to the state-of-the-art HMC approaches and fine-grained visual classification (FGVC) methods exploiting the label hierarchy.
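The idea of maximizing the marginal probability of an observed label at any level of the tree can be illustrated with a simplified sketch. It assumes a softmax over leaf classes and defines the probability of a coarse label as the sum of the probabilities of the leaves beneath it; the paper's actual combinatorial loss may be formulated differently, and apart from "Albatross" and "Laysan Albatross" every name below is hypothetical.

```python
import math

def marginal_nll(leaf_logits, leaves_under_label):
    """Negative log of the summed softmax probability of the leaves under the observed label."""
    m = max(leaf_logits)
    exps = [math.exp(v - m) for v in leaf_logits]  # shifted for numerical stability
    z = sum(exps)
    p = sum(exps[i] for i in leaves_under_label) / z
    return -math.log(p)

# Hypothetical leaf order: 0 = "Laysan Albatross", 1 = another albatross species,
# 2 = a leaf class outside the "Albatross" subtree.
hierarchy = {"Albatross": [0, 1], "Laysan Albatross": [0]}
logits = [0.0, 0.0, 0.0]

coarse_loss = marginal_nll(logits, hierarchy["Albatross"])        # sample labeled only at the coarse level
fine_loss = marginal_nll(logits, hierarchy["Laysan Albatross"])   # sample labeled at the leaf level
```

A sample labeled only as "Albatross" still produces a training signal, since it pushes probability mass toward all leaves in that subtree without committing to one of them.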

翻訳日:2022-01-12 12:31:09 公開日:2022-01-11

# ブートストラップによる異種グラフニューラルネットワークのクロスビュー自己監督学習

Cross-view Self-Supervised Learning on Heterogeneous Graph Neural Network via Bootstrapping ( http://arxiv.org/abs/2201.03340v2 )

ライセンス: Link先を確認

Minjae Park

(参考訳) 不均一グラフニューラルネットワークは、優れた能力を持つ異種グラフの情報を表現することができる。近年,グラフの独特な表現を対照的な学習方法で学習する自己教師型学習法が研究されている。ラベルがない場合、この学習方法は大きな可能性を秘めている。しかし、対照的な学習は正と負のペアに大きく依存しており、異種グラフから高品質なペアを生成することは困難である。本稿では,BYOL(ブートストラップ)と呼ばれる自己教師型学習における最近の革新に則って,多数のペアを生成することなく優れた表現を生成できる手法を提案する。さらに、ネットワークスキーマとメタパスビューという2つの視点から異種グラフを見ることができるという事実に注目して、グラフ内の高レベル表現をキャプチャして表現する。提案モデルは,様々な実世界のデータセットにおいて,他の手法よりも最先端の性能を示した。

Heterogeneous graph neural networks can represent the information of heterogeneous graphs with excellent ability. Recently, self-supervised learning has been studied as a way to learn the unique representation of a graph through contrastive learning. In the absence of labels, such learning methods show great potential. However, contrastive learning relies heavily on positive and negative pairs, and generating high-quality pairs from heterogeneous graphs is difficult. In this paper, in line with a recent innovation in self-supervised learning called BYOL or bootstrapping, we introduce a method that can generate good representations without generating a large number of pairs. In addition, noting that heterogeneous graphs can be viewed from two perspectives, the network schema view and the meta-path view, the method captures and expresses high-level representations in the graphs. The proposed model outperformed other methods, showing state-of-the-art performance on various real-world datasets.
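In BYOL-style bootstrapping, a target network is not trained by gradient descent; its parameters track an exponential moving average (EMA) of the online network's parameters, which is what removes the need for negative pairs. The sketch below shows only that update rule on flat parameter lists; the decay `tau` and the function name are assumptions, and the abstract does not state how the paper applies the update to its two graph views.

```python
def ema_update(target, online, tau=0.99):
    """Move each target parameter toward the online one: t <- tau * t + (1 - tau) * o."""
    return [tau * t + (1.0 - tau) * o for t, o in zip(target, online)]

# Hypothetical two-parameter networks: the target drifts slowly toward the online values.
target_params = [0.0, 1.0]
online_params = [1.0, 1.0]
for _ in range(3):
    target_params = ema_update(target_params, online_params, tau=0.9)
```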

翻訳日:2022-01-12 12:12:00 公開日:2022-01-11

# GBRS: Pawlakラフセットと近隣ラフセットの統一モデル

GBRS: An Unified Model of Pawlak Rough Set and Neighborhood Rough Set ( http://arxiv.org/abs/2201.03349v2 )

ライセンス: Link先を確認

Shuyin Xia, Cheng Wang, GuoYing Wang, XinBo Gao, Elisabeth Giem, JianHang Yu

(参考訳) Pawlakラフ集合と近傍ラフ集合は、最も一般的な2つのラフ集合理論モデルである。Pawlakラフ集合は同値類を用いて知識を表現できるが、連続データを処理できない。一方、近傍ラフ集合は連続データを処理できるが、同値類を用いて知識を表現する能力を失う。そこで本稿では、グラニュラーボール計算に基づくグラニュラーボールラフ集合を提案する。グラニュラーボールラフ集合はPawlakラフ集合と近傍ラフ集合を同時に表現でき、両者の統一的な表現を実現する。これにより、グラニュラーボールラフ集合は連続データを扱えるだけでなく、同値類を知識表現に用いることもできる。さらに、グラニュラーボールラフ集合の実装アルゴリズムを提案する。ベンチマークデータセットを用いた実験の結果、グラニュラーボール計算の頑健性と適応性の組み合わせにより、グラニュラーボールラフ集合の学習精度は、Pawlakラフ集合や従来の近傍ラフ集合と比較して大幅に向上した。また、グラニュラーボールラフ集合は、9つの一般的または最先端の特徴選択手法よりも優れている。

The Pawlak rough set and the neighborhood rough set are the two most common rough set theoretical models. The Pawlak rough set can use equivalence classes to represent knowledge, but it cannot process continuous data; the neighborhood rough set can process continuous data, but it loses the ability to use equivalence classes to represent knowledge. To this end, this paper presents a granular-ball rough set based on granular-ball computing. The granular-ball rough set can simultaneously represent the Pawlak rough set and the neighborhood rough set, so as to realize a unified representation of the two. As a result, the granular-ball rough set can not only deal with continuous data but also use equivalence classes for knowledge representation. In addition, we propose an implementation algorithm for granular-ball rough sets. The experimental results on benchmark datasets demonstrate that, due to the combination of the robustness and adaptability of granular-ball computing, the learning accuracy of the granular-ball rough set is greatly improved compared with the Pawlak rough set and the traditional neighborhood rough set. The granular-ball rough set also outperforms nine popular or state-of-the-art feature selection methods.
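As background for the equivalence-class representation mentioned above, the classical Pawlak lower and upper approximations on discrete data can be sketched as follows. This shows only the standard Pawlak construction, not the granular-ball model proposed in the paper; the toy table and all names are hypothetical.

```python
from collections import defaultdict

def equivalence_classes(objects, attributes):
    """Group object ids whose values agree on every listed attribute."""
    classes = defaultdict(set)
    for obj_id, row in objects.items():
        classes[tuple(row[a] for a in attributes)].add(obj_id)
    return list(classes.values())

def approximations(objects, attributes, target):
    """Return the (lower, upper) Pawlak approximations of the set `target`."""
    lower, upper = set(), set()
    for cls in equivalence_classes(objects, attributes):
        if cls <= target:   # class lies entirely inside the target: certainly in it
            lower |= cls
        if cls & target:    # class overlaps the target: possibly in it
            upper |= cls
    return lower, upper

# Hypothetical table: objects 1 and 2 are indiscernible by color.
table = {1: {"color": "red"}, 2: {"color": "red"}, 3: {"color": "blue"}}
lower, upper = approximations(table, ["color"], target={1, 3})
```

Grouping by exact attribute values only makes sense for discrete data, since two continuous measurements are almost never exactly equal; this is the limitation of the Pawlak model that the abstract describes.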

翻訳日:2022-01-12 12:11:45 公開日:2022-01-11

# コンピュータビジョンによるUAV作物画像からの農業プラントカタログ作成とデータフレームワークの構築

Agricultural Plant Cataloging and Establishment of a Data Framework from UAV-based Crop Images by Computer Vision ( http://arxiv.org/abs/2201.02885v2 )

ライセンス: Link先を確認

Maurice Günder, Facundo R. Ispizua Yamati, Jana Kierdorf, Ribana Roscher, Anne-Katrin Mahlein, Christian Bauckhage

(参考訳) 近代農業におけるUAVに基づく画像検索は、大量の空間的に参照された作物の画像データを収集することを可能にする。しかし、大規模な実験では、UAV画像は複雑な天蓋構造に多量の作物を含むことに苦しむ。特に時間的効果の観察においては、複数の画像上の個々の植物の認識と関連する情報の抽出が複雑になる。本研究は,理解可能なコンピュータビジョン手法に基づいて,uavの作物画像の時間的・空間的識別と個別化を自動化するためのハンズオンワークフローを提案する。実世界の2つのデータセット上でワークフローを評価する。 1つのデータセットは、成長サイクル全体を通してサトウキビの葉の斑点(真菌病)を観察するために記録されている。もう1つは、カリフラワー植物の収穫予測に関するものである。植物カタログは、複数のタイムポイントで見られる単一の植物画像の抽出に利用される。これにより、大規模な時空間画像データセットを収集し、さまざまなデータレイヤを含むさらなる機械学習モデルをトレーニングすることができる。提案手法は農業におけるUAVデータの分析と解釈を大幅に改善する。参照データによる検証により,より複雑な深層学習に基づく認識手法と類似した精度を示す。私たちのワークフローは、特に大規模なデータセットに対して、植物のカタログ作成と画像抽出のトレーニングを自動化できます。

UAV-based image retrieval in modern agriculture enables gathering large amounts of spatially referenced crop image data. In large-scale experiments, however, UAV images contain a very large number of crops in a complex canopy architecture. Especially for the observation of temporal effects, this tremendously complicates the recognition of individual plants over several images and the extraction of relevant information. In this work, we present a hands-on workflow for the automated temporal and spatial identification and individualization of crop images from UAVs, abbreviated as "cataloging", based on comprehensible computer vision methods. We evaluate the workflow on two real-world datasets. One dataset is recorded for observation of Cercospora leaf spot - a fungal disease - in sugar beet over an entire growing cycle. The other one deals with harvest prediction of cauliflower plants. The plant catalog is utilized for the extraction of single plant images seen over multiple time points. This yields a large-scale spatio-temporal image dataset that in turn can be used to train further machine learning models including various data layers. The presented approach significantly improves the analysis and interpretation of UAV data in agriculture. By validation with some reference data, our method shows an accuracy that is similar to more complex deep learning-based recognition techniques. Our workflow is able to automate plant cataloging and training image extraction, especially for large datasets.

翻訳日:2022-01-12 12:11:26 公開日:2022-01-11

# MaskMTL:深層マルチタスク学習によるマスク付き顔画像の属性予測

MaskMTL: Attribute prediction in masked facial images with deep multitask learning ( http://arxiv.org/abs/2201.03002v2 )

ライセンス: Link先を確認

Prerana Mukherjee, Vinay Kaushik, Ronak Gupta, Ritika Jha, Daneshwari Kankanwadi, and Brejesh Lall

(参考訳) 目印の自由な顔画像の属性を予測することは、マスクの使用によって顔が目立たなくなるとさらに複雑になる課題である。身元確認や個人情報へのセキュアなログインを利用するスマートアクセス制御ゲートは、生体認証特性として顔を利用することができる。特に、Covid-19パンデミックは、衛生的および接触のない身元確認の重要性をますます証明している。このような場合、マスクの使用はより避けられないものとなり、属性予測は、コミュニティの広がりからターゲットの脆弱なグループを分離したり、共同環境での社会的距離を確保するのに役立つ。マスクの形状,大きさ,テクスチャの異なるマスクを効率的にオーバーレイすることで,マスクの装着による変動を効果的にモデル化する。本稿では,マスク付き顔画像から多種多様な属性を同時推定する深層マルチタスク学習(MTL)手法を提案する。ベンチマーク顔属性UTKFaceデータセットの実験結果から,提案手法が他の競合技術に取って代わることを示す。ソースコードはhttps://github.com/ritikajha/attribute-prediction-in-masked-face-images-with-deep-multitask-learning で入手できる。

Predicting attributes in landmark-free facial images is itself a challenging task, which gets further complicated when the face is occluded by a mask. Smart access control gates that use identity verification, or secure login to personal electronic gadgets, may use the face as a biometric trait. In particular, the Covid-19 pandemic increasingly validates the importance of hygienic and contactless identity verification. In such cases, the use of masks becomes unavoidable, and performing attribute prediction helps in segregating the target vulnerable groups from community spread or ensuring social distancing for them in a collaborative environment. We create a masked face dataset by efficiently overlaying masks of different shapes, sizes, and textures to effectively model the variability generated by wearing a mask. This paper presents a deep Multi-Task Learning (MTL) approach to jointly estimate various heterogeneous attributes from a single masked facial image. Experimental results on the benchmark face attribute dataset UTKFace demonstrate that the proposed approach outperforms other competing techniques. The source code is available at https://github.com/ritikajha/Attribute-prediction-in-masked-facial-images-with-deep-multitask-learning

翻訳日:2022-01-12 12:11:06 公開日:2022-01-11

PDF登録状況（公開日: 20220111）