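Fugu-MT: Translations of arXiv Papers

This site provides Japanese translations of arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body is not CC-licensed, or that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is done with Fugu-Machine Translator.

The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from the use of this site (including all information and translations).

Papers with a publication date of 2022-12-12.

Title	Authors	Abstract	Publication date / translation date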
# Faster Coherent Quantum Algorithms for Phase, Energy, and Amplitude Estimation ( http://arxiv.org/abs/2103.09717v4 ) License: see the linked page	Patrick Rall	We consider performing phase estimation under the following conditions: we are given only one copy of the input state, the input state does not have to be an eigenstate of the unitary, and the state must not be measured. Most quantum estimation algorithms make assumptions that make them unsuitable for this 'coherent' setting, leaving only the textbook approach. We present novel algorithms for phase, energy, and amplitude estimation that are both conceptually and computationally simpler than the textbook method, featuring both a smaller query complexity and ancilla footprint. They do not require a quantum Fourier transform, and they do not require a quantum sorting network to compute the median of several estimates. Instead, they use block-encoding techniques to compute the estimate one bit at a time, performing all amplification via singular value transformation. These improved subroutines accelerate the performance of quantum Metropolis sampling and quantum Bayesian inference.	Translated: 2023-04-07 21:09:28 Published: 2022-12-12
# Towards the continuum limit of a $(1+1)$d quantum link Schwinger model ( http://arxiv.org/abs/2104.00025v2 ) License: see the linked page	Torsten V. Zache, Maarten Van Damme, Jad C. Halimeh, Philipp Hauke, Debasish Banerjee	The solution of gauge theories is one of the most promising applications of quantum technologies. Here, we discuss the approach to the continuum limit for $U(1)$ gauge theories regularized via finite-dimensional Hilbert spaces of quantum spin-$S$ operators, known as quantum link models. For quantum electrodynamics (QED) in one spatial dimension, we numerically demonstrate the continuum limit by extrapolating the ground state energy, the scalar, and the vector meson masses to large spin lengths $S$, large volume $N$, and vanishing lattice spacing $a$. By exactly solving Gauss' law for arbitrary $S$, we obtain a generalized PXP spin model and count the physical Hilbert space dimension analytically. This allows us to quantify the required resources for reliable extrapolations to the continuum limit on quantum devices. We use a functional integral approach to relate the model with large values of half-integer spins to the physics at topological angle $\Theta=\pi$. Our findings indicate that quantum devices will in the foreseeable future be able to quantitatively probe the QED regime with quantum link models.	Translated: 2023-04-06 00:24:51 Published: 2022-12-12
# Cost-efficient QFA Algorithm for Quantum Computers ( http://arxiv.org/abs/2107.02262v2 ) License: see the linked page	Özlem Salehi, Abuzer Yakaryılmaz	The study of quantum finite automata (QFAs) is one of the possible approaches in exploring quantum computers with finite memory. Despite being one of the most restricted models, the Moore-Crutchfield quantum finite automaton (MCQFA) is proven to be exponentially more succinct than classical finite automata models in recognizing certain languages such as $\mathtt{MOD}_p = \{ a^{j} \mid j \equiv 0 \mod p\}$, where $p$ is a prime number. In this paper, we present a modified MCQFA algorithm for the language $\mathtt{MOD}_p$, the operators of which are selected based on the basis gates on the available real quantum computers. As a consequence, we obtain shorter quantum programs using fewer basis gates compared to the implementation of the original algorithm given in the literature.	Translated: 2023-03-23 08:38:41 Published: 2022-12-12
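To make the $\mathtt{MOD}_p$ language in the entry above concrete, here is a minimal sketch of the textbook one-qubit rotation idea, not the modified algorithm of the paper. It assumes a single qubit that starts in $|0\rangle$ and is rotated by the angle $2\pi k/p$ for each input symbol; the function names and the parameter `k` are hypothetical and chosen only for illustration.

```python
import math

def mcqfa_accept_probability(j: int, p: int, k: int = 1) -> float:
    """Probability of measuring |0> after reading a^j (assumed rotation automaton)."""
    theta = 2 * math.pi * k / p       # rotation angle applied per input symbol
    return math.cos(j * theta) ** 2   # squared amplitude on |0> after j rotations

def in_mod_p(j: int, p: int) -> bool:
    """Classical membership test for MOD_p = { a^j : j = 0 mod p }."""
    return j % p == 0
```

Members of the language are accepted with probability 1 up to floating-point error. A single value of `k` can leave some non-members with acceptance probability close to 1, which is why constructions in the literature combine several rotation angles; that step is omitted here.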
# Formal Verification of Quantum Programs: Theory, Tools and Challenges ( http://arxiv.org/abs/2110.01320v2 ) License: see the linked page	Marco Lewis, Sadegh Soudjani, and Paolo Zuliani	Over the past 27 years, quantum computing has seen a huge rise in interest from both academia and industry. At the current rate, quantum computers are growing in size rapidly, backed up by the increase of research in the field. Significant efforts are being made to improve the reliability of quantum hardware and to develop suitable software to program quantum computers. In contrast, the verification of quantum programs has received relatively less attention. Verifying programs is especially important in the quantum setting due to how difficult it is to program complex algorithms correctly on resource-constrained and error-prone quantum hardware. Research into creating verification frameworks for quantum programs has seen recent development, with a variety of tools implemented using a collection of theoretical ideas. This survey aims to be a short introduction to the area of formal verification of quantum programs, bringing together theory and tools developed to date. Further, this survey examines some of the challenges that the field may face in the future, namely the development of complex quantum algorithms.	Translated: 2023-03-12 14:19:13 Published: 2022-12-12
# The Jaynes-Cummings model and its descendants ( http://arxiv.org/abs/2202.00330v2 ) License: see the linked page	Jonas Larson and Th. K. Mavrogordatos	The Jaynes-Cummings (JC) model has been at the forefront of quantum optics for almost six decades to date, providing one of the simplest yet intricately nonlinear formulations of light-matter interaction in modern physics. Laying most of the emphasis on the omnipresence of the model across a range of disciplines, this monograph brings up the fundamental generality of its formalism, looking at a wide gamut of applications in specific physical systems among several realms, including atomic physics, quantum optics, solid-state physics and quantum information science. When bringing the various pieces together to assemble our narrative, we have primarily targeted researchers in quantum physics and quantum optics. The monograph also comprises an accessible introduction for graduate students engaged with non-equilibrium quantum phase transitions, quantum computing and simulation, and quantum many-body physics. In that framework, we aim to reveal the common ground between physics and applications scattered across literature and different technological advancements. The exposition guides the reader through a vibrant field interlacing quantum optics and condensed-matter physics. All sections are devoted to the strong interconnection between theory and experiment, historically linked to the development of the various modern research directions stemming from JC physics. This is accompanied by a comprehensive list of references to the key publications that have shaped its evolution since the early 1960s. Finally, we have endeavored to keep the presentation of such a multi-sided material as concise as possible, interspersing continuous text with various illustrations alongside an economical use of mathematical expressions.	Translated: 2023-02-27 03:16:46 Published: 2022-12-12
# Quantum many-body scars in bipartite Rydberg arrays originate from hidden projector embedding ( http://arxiv.org/abs/2203.00658v4 ) License: see the linked page	Keita Omiya and Markus Müller	We study the nature of the ergodicity-breaking "quantum many-body scar" states that appear in the PXP model describing constrained Rabi oscillations. For a wide class of bipartite lattices of Rydberg atoms, we reveal that the nearly energy-equidistant tower of these states arises from the Hamiltonian's close proximity to a generalized projector-embedding form, a structure common to many models hosting quantum many-body scars. We construct a non-Hermitian, but strictly local extension of the PXP model hosting exact quantum scars, and show how various Hermitian scar-stabilizing extensions from the literature can be naturally understood within this framework. The exact scar states are obtained analytically as large spin states of explicitly constructed pseudospins. The quasi-periodic motion ensuing from the Néel state is finally shown to be the projection onto the Rydberg-constrained subspace of the precession of the large pseudospin.	Translated: 2023-02-23 10:15:08 Published: 2022-12-12
# Critical Casimir Effect: Exact Results ( http://arxiv.org/abs/2203.15050v2 ) License: see the linked page	D. M. Dantchev and S. Dietrich	In any medium there are fluctuations due to temperature or due to the quantum nature of its constituents. If a material body is immersed into such a medium, its shape and the properties of its constituents modify the properties of the surrounding medium and its fluctuations. If in the same medium there is a second body then -- in addition to all direct interactions between them -- the modifications due to the first body influence the modifications due to the second body. This mutual influence results in a force between these bodies. If the excitations of the medium, which mediate the effective interaction between the bodies, are massless, this force is long-ranged and nowadays known as a Casimir force. If the fluctuating medium consists of the confined electromagnetic field in vacuum, one speaks of the quantum mechanical Casimir effect. In the case that the order parameter of material fields fluctuates -- such as differences of number densities or concentrations -- and that the corresponding fluctuations of the order parameter are long-ranged, one speaks of the critical Casimir effect. This holds, e.g., in the case of systems which undergo a second-order phase transition and which are thermodynamically located near the corresponding critical point, or for systems with a continuous symmetry exhibiting Goldstone mode excitations. Here we review the currently available exact results concerning the critical Casimir effect in systems encompassing the one-dimensional Ising, XY, and Heisenberg models, the two-dimensional Ising model, the Gaussian and the spherical models, as well as the mean field results for the Ising and the XY model. Special attention is paid to the influence of the boundary conditions on the behavior of the Casimir force.	Translated: 2023-02-20 11:40:47 Published: 2022-12-12
# Logical Fallacy Detection ( http://arxiv.org/abs/2202.13758v3 ) License: see the linked page	Zhijing Jin, Abhinav Lalwani, Tejas Vaidhya, Xiaoyu Shen, Yiwen Ding, Zhiheng Lyu, Mrinmaya Sachan, Rada Mihalcea, Bernhard Schölkopf	Reasoning is central to human intelligence. However, fallacious arguments are common, and some exacerbate problems such as spreading misinformation about climate change. In this paper, we propose the task of logical fallacy detection, and provide a new dataset (Logic) of logical fallacies generally found in text, together with an additional challenge set for detecting logical fallacies in climate change claims (LogicClimate). Detecting logical fallacies is a hard problem as the model must understand the underlying logical structure of the argument. We find that existing pretrained large language models perform poorly on this task. In contrast, we show that a simple structure-aware classifier outperforms the best language model by 5.46% on Logic and 4.51% on LogicClimate. We encourage future work to explore this task as (a) it can serve as a new reasoning challenge for language models, and (b) it can have potential applications in tackling the spread of misinformation. Our dataset and code are available at https://github.com/causalNLP/logical-fallacy	Translated: 2023-02-19 15:18:44 Published: 2022-12-12
# Improving Expert Predictions with Prediction Sets ( http://arxiv.org/abs/2201.12006v4 ) License: see the linked page	Eleni Straitouri, Lequn Wang, Nastaran Okati, and Manuel Gomez Rodriguez	Automated decision support systems promise to help human experts solve tasks more efficiently and accurately. However, existing systems typically require experts to understand when to cede agency to the system or when to exercise their own agency. Moreover, if the experts develop a misplaced trust in the system, their performance may worsen. In this work, we lift the above requirement and develop automated decision support systems that, by design, do not require experts to understand when each of their recommendations is accurate to improve their performance. To this end, we focus on multiclass classification tasks and consider an automated decision support system that, for each data sample, uses a classifier to recommend a subset of labels to a human expert. We first show that, by looking at the design of such a system from the perspective of conformal prediction, we can ensure that the probability that the recommended subset of labels contains the true label matches almost exactly a target probability value with high probability. Then, we develop an efficient and near-optimal search method to find the target probability value under which the expert benefits the most from using our system. Experiments on synthetic and real data demonstrate that our system can help the experts make more accurate predictions and is robust to the accuracy of the classifier it relies on.	Translated: 2023-02-19 14:32:23 Published: 2022-12-12
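The label subsets described in the entry above can be illustrated with standard split conformal prediction. The following is a minimal sketch under stated assumptions, not the system from the paper: it assumes the nonconformity score of a label is one minus the classifier's probability for it, and that `cal_scores` holds the scores of the true labels on a held-out calibration set. The function names are hypothetical, and the search for the best target probability is not shown.

```python
import math

def conformal_threshold(cal_scores, alpha):
    """Score threshold targeting at least 1 - alpha marginal coverage."""
    n = len(cal_scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-indexed order statistic to use
    if rank > n:
        return float("inf")  # too few calibration points: every label is kept
    return sorted(cal_scores)[rank - 1]

def prediction_set(probs, threshold):
    """Labels whose score 1 - p(label) does not exceed the threshold."""
    return [label for label, p in enumerate(probs) if 1.0 - p <= threshold]
```

A smaller `alpha` raises the threshold and therefore yields larger recommended subsets, which is the trade-off a target probability value controls.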
# Novel Architecture to Create and Maintain Personal Blockchains ( http://arxiv.org/abs/2212.14671v1 ) License: see the linked page	Collin Connors and Dilip Sarkar	Blockchain has been touted as a revolutionary technology. However, despite the excitement, blockchain has not been adopted in many fields. Many are hesitant to adopt blockchain technology due to privacy concerns, barriers to use, or lack of practical use cases. In this work, we outline a potential blockchain use case for tracking financial transactions across multiple financial institutions. We show the downsides of traditional centralized approaches and that blockchain approaches fail to give all the privacy and accessibility required for this use case. Thus we propose a novel blockchain architecture to support our use case. This novel architecture combines the ease of use of public blockchains with the privacy of private blockchains by allowing users to create personal blockchains. We believe this novel personal blockchain architecture will lead to more blockchain adoption, particularly in use cases handling private data.	Translated: 2023-02-19 13:23:02 Published: 2022-12-12
# Integrated Educational Management Tool for Adamson University ( http://arxiv.org/abs/2212.08039v1 ) License: see the linked page	Anabella C. Doctor	This study focused on the development of a web-based integrated academic information system that can aid Adamson University faculty to become more effective and efficient in giving costless examinations, in giving student grades, in avoiding redundancy of data and efforts, and in providing accessible and reliable information about examinations and grades. The developed system automates the processes of examination and student grading. To achieve the goal of the study, the researcher followed the phases of the software development life cycle, aiming to produce high-quality software output that meets or even exceeds the expectations of Adamson University faculty and administration. The developed system was tested in Adamson University and evaluated using the ISO 9126 software product evaluation criteria by respondents who include IT experts and end-users, with a descriptive rating of excellent and a mean average of 4.76, which indicates that the system can be a useful tool for managing an educational institution's examinations and student grading. Integrated Educational Management Tool for Adamson University is a system that was successfully constructed using open source technology in developing web sites. The system has been successfully tested for functionality, reliability, usability, efficiency, and portability of the website, with results that revealed that the developed application supports the educational institution's examination and student grading system for efficiency, reliability, and accessibility. Future studies and integration of item analysis, table of specification, and enhancement of sub modules of the system are recommended, as well as making available offline class records and exams with online auto synchronization of data processes.	Translated: 2023-02-19 13:05:02 Published: 2022-12-12
# Estimating Geographic Spillover Effects of COVID-19 Policies From Large-Scale Mobility Networks ( http://arxiv.org/abs/2212.06224v1 ) License: see the linked page	Serina Chang, Damir Vrabac, Jure Leskovec, Johan Ugander	Many policies in the US are determined locally, e.g., at the county-level. Local policy regimes provide flexibility between regions, but may become less effective in the presence of geographic spillovers, where populations circumvent local restrictions by traveling to less restricted regions nearby. Due to the endogenous nature of policymaking, there have been few opportunities to reliably estimate causal spillover effects or evaluate their impact on local policies. In this work, we identify a novel setting and develop a suitable methodology that allow us to make unconfounded estimates of spillover effects of local policies. Focusing on California's Blueprint for a Safer Economy, we leverage how county-level mobility restrictions were deterministically set by public COVID-19 severity statistics, enabling a regression discontinuity design framework to estimate spillovers between counties. We estimate these effects using a mobility network with billions of timestamped edges and find significant spillover movement, with larger effects in retail, eating places, and gyms. Contrasting local and global policy regimes, our spillover estimates suggest that county-level restrictions are only 54% as effective as statewide restrictions at reducing mobility. However, an intermediate strategy of macro-county restrictions -- where we optimize county partitions by solving a minimum k-cut problem on a graph weighted by our spillover estimates -- can recover over 90% of statewide mobility reductions, while maintaining substantial flexibility between counties.	Translated: 2023-02-19 12:58:55 Published: 2022-12-12
# LAMBRETTA: Learning to Rank for Twitter Soft Moderation ( http://arxiv.org/abs/2212.05926v1 ) License: see the linked page	Pujan Paudel, Jeremy Blackburn, Emiliano De Cristofaro, Savvas Zannettou, and Gianluca Stringhini	To curb the problem of false information, social media platforms like Twitter started adding warning labels to content discussing debunked narratives, with the goal of providing more context to their audiences. Unfortunately, these labels are not applied uniformly and leave large amounts of false content unmoderated. This paper presents LAMBRETTA, a system that automatically identifies tweets that are candidates for soft moderation using Learning To Rank (LTR). We run LAMBRETTA on Twitter data to moderate false claims related to the 2020 US Election and find that it flags over 20 times more tweets than Twitter, with only 3.93% false positives and 18.81% false negatives, outperforming alternative state-of-the-art methods based on keyword extraction and semantic search. Overall, LAMBRETTA assists human moderators in identifying and flagging false information on social media.	Translated: 2023-02-19 12:57:50 Published: 2022-12-12
# Wikipedia's Balancing Act: A Tool for Collective Intelligence or Mass Surveillance? ( http://arxiv.org/abs/2212.05828v1 ) License: see the linked page	Simon Liu	Wikipedia has evolved beyond its original function as an online encyclopedia in an increasingly complex data-driven society. The social platform is met with a balancing act between collective intelligence and mass surveillance; processes need to be developed to protect individuals and the community from government mass surveillance without sacrificing the important contributions made through prohibited anonymous communication software. Case studies are provided from NSA government surveillance practices, the anti-SOPA legislation movement, and research that covers Wikipedia's involvement with participatory journalism, disinformation, self-censorship, and the use of Tor. This paper proposes that a common ground can be developed between individuals, public and private institutions through future research in socio-cultural anthropology and policy frameworks around data retention and government accountability. Wikipedia is used as an example within the US intelligence community as a complex organisation that can adapt to changes through its iterative nature, which draws insight into how policy frameworks can be future-proofed. Finally, this paper is a wake-up call to individuals, private institutions, and governments to remain vigilant about the storage and use of personal information as a result of contributing to online communities.	Translated: 2023-02-19 12:57:35 Published: 2022-12-12
# Economic Systems in Metaverse: Basics, State of the Art, and Challenges ( http://arxiv.org/abs/2212.05803v1 ) License: see the linked page	Huawei Huang, Qinnan Zhang, Taotao Li, Qinglin Yang, Zhaokang Yin, Junhao Wu, Zehui Xiong, Jianming Zhu, Jiajing Wu, and Zibin Zheng	Economic systems play pivotal roles in the metaverse. However, we have not yet found an overview that systematically introduces economic systems for the metaverse. Therefore, we review the state-of-the-art solutions, architectures, and systems related to economic systems. When investigating those state-of-the-art studies, we keep two questions in mind: (1) what is the framework of economic systems in the context of the metaverse, and (2) what activities would economic systems engage in the metaverse? This article aims to disclose insights into the economic systems that work for both the current and the future metaverse. To have a clear overview of the economic-system framework, we mainly discuss the connections among three fundamental elements in the metaverse, i.e., digital creation, digital assets, and the digital trading market. After that, we elaborate on each topic of the proposed economic-system framework. Those topics include incentive mechanisms, monetary systems, digital wallets, decentralized finance (DeFi) activities, and cross-platform interoperability for the metaverse. For each topic, we mainly discuss three questions: a) the rationale of this topic, b) why the metaverse needs this topic, and c) how this topic will evolve in the metaverse. Through this overview, we hope readers can better understand what economic systems the metaverse needs, and the insights behind the economic activities in the metaverse.	Translated: 2023-02-19 12:57:15 Published: 2022-12-12
# Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models ( http://arxiv.org/abs/2212.03860v3 ) License: see the linked page	Gowthami Somepalli, Vasu Singla, Micah Goldblum, Jonas Geiping, Tom Goldstein	Cutting-edge diffusion models produce images with high quality and customizability, enabling them to be used for commercial art and graphic design purposes. But do diffusion models create unique works of art, or are they replicating content directly from their training sets? In this work, we study image retrieval frameworks that enable us to compare generated images with training samples and detect when content has been replicated. Applying our frameworks to diffusion models trained on multiple datasets including Oxford flowers, Celeb-A, ImageNet, and LAION, we discuss how factors such as training set size impact rates of content replication. We also identify cases where diffusion models, including the popular Stable Diffusion model, blatantly copy from their training data.	Translated: 2023-02-19 12:54:27 Published: 2022-12-12
# Beyond Incompatibility: Trade-offs between Mutually Exclusive Fairness Criteria in Machine Learning and Law ( http://arxiv.org/abs/2212.00469v3 ) License: see the linked page	Meike Zehlike, Alex Loosley, Håkan Jonsson, Emil Wiedemann, and Philipp Hacker	Trustworthy AI is becoming ever more important in both machine learning and legal domains. One important consequence is that decision makers must seek to guarantee a 'fair', i.e., non-discriminatory, algorithmic decision procedure. However, there are several competing notions of algorithmic fairness that have been shown to be mutually incompatible under realistic factual assumptions. This concerns, for example, the widely used fairness measures of 'calibration within groups' and 'balance for the positive/negative class'. In this paper, we present a novel algorithm (FAir Interpolation Method: FAIM) for continuously interpolating between these three fairness criteria. Thus, an initially unfair prediction can be remedied to, at least partially, meet a desired, weighted combination of the respective fairness conditions. We demonstrate the effectiveness of our algorithm when applied to synthetic data, the COMPAS data set, and a new, real-world data set from the e-commerce sector. Finally, we discuss to what extent FAIM can be harnessed to comply with conflicting legal obligations. The analysis suggests that it may operationalize duties in traditional legal fields, such as credit scoring and criminal justice proceedings, but also for the latest AI regulations put forth in the EU, like the recently enacted Digital Markets Act.	Translated: 2023-02-19 12:45:32 Published: 2022-12-12
# 法律インフォームス・コード:人間と人工知能をアライメントするための法情報学のアプローチ Law Informs Code: A Legal Informatics Approach to Aligning Artificial Intelligence with Humans ( http://arxiv.org/abs/2209.13020v12 ) ライセンス: Link先を確認	John J. Nay	(参考訳) 私たちは現在、AIの振る舞いを確実に導く方法で、人間の目標と社会的価値を特定できません。法的な解釈と法的な解釈は、不透明な人間の価値を妥当な指令に変換する計算エンジンを形成する。ローインフォメーション・コード(law informs code)は、aiに法的知識と推論を組み込んだ研究課題である。法的な契約の当事者が将来の関係のあらゆる潜在的な事態を予測できないのと同様に、議会は提案された法案が適用される全ての状況を予測することができない。法理論と実践は、これらの仕様問題に対処するための一連のツールを開発した。例えば、法的な基準により、人間は共通の理解を発達させ、新しい状況に適応することができる。法律のより散在的な使用(例えば、認可の脅威による悪行の抑止として)とは対照的に、人間の目標の伝達方法や社会の価値観の表現として活用され、法律はコードを知らせる。本稿では,法的プロセス(法制定法,法解釈法,契約起草法,法標準の適用法,法的推論法等)が生み出すデータがどのように,本質的にあいまいな人間の目標の堅牢な仕様を促進するかを述べる。これにより、人間-AIアライメントとAIの局所的有用性が向上する。社会AIアライメントに向けて,多エージェントアライメントの応用哲学としての法を理解するための枠組みを提案する。法律は歴史的に有望な政治権力の反映であり、したがって市民選好の完全な集積ではないが、適切に解析すれば、その蒸留は利用可能な社会的価値の最も正当な計算的理解を提供する。法律が最終的に強力なAIに通知すると、法律を改善するための熟考的な政治プロセスがさらに意味を成す。 We are currently unable to specify human goals and societal values in a way that reliably directs AI behavior. Law-making and legal interpretation form a computational engine that converts opaque human values into legible directives. "Law Informs Code" is the research agenda embedding legal knowledge and reasoning in AI. Similar to how parties to a legal contract cannot foresee every potential contingency of their future relationship, and legislators cannot predict all the circumstances under which their proposed bills will be applied, we cannot ex ante specify rules that provably direct good AI behavior. Legal theory and practice have developed arrays of tools to address these specification problems. For instance, legal standards allow humans to develop shared understandings and adapt them to novel situations. In contrast to more prosaic uses of the law (e.g., as a deterrent of bad behavior through the threat of sanction), leveraged as an expression of how humans communicate their goals, and what society values, Law Informs Code. 
We describe how data generated by legal processes (methods of law-making, statutory interpretation, contract drafting, applications of legal standards, legal reasoning, etc.) can facilitate the robust specification of inherently vague human goals. This increases human-AI alignment and the local usefulness of AI. Toward society-AI alignment, we present a framework for understanding law as the applied philosophy of multi-agent alignment. Although law is partly a reflection of historically contingent political power - and thus not a perfect aggregation of citizen preferences - if properly parsed, its distillation offers the most legitimate computational comprehension of societal values available. If law eventually informs powerful AI, engaging in the deliberative political process to improve law takes on even more meaning.	翻訳日:2023-02-19 11:24:07 公開日:2022-12-12
# 公平性リプログラミング Fairness Reprogramming ( http://arxiv.org/abs/2209.10222v4 ) ライセンス: Link先を確認	Guanhua Zhang, Yihua Zhang, Yang Zhang, Wenqi Fan, Qing Li, Sijia Liu, Shiyu Chang	(参考訳) 機械学習(ML)の公正性を促進する最近の進歩にもかかわらず、既存の主流のアプローチは、公正性基準を満たすために、ニューラルネットワークの全重みを再トレーニングまたは微調整する必要がある。しかし、大規模な学習済みモデルでは、計算コストやストレージコスト、データ効率の低さ、モデルプライバシの問題などにより、これは実現不可能であることが多い。本稿では,モデル再プログラミング手法を組み込んだ新しい汎用的フェアネス学習パラダイム,FairReprogramを提案する。具体的には、FairReprogramはモデルの変更ができないケースを考え、min-maxの定式化の下でフェアネス基準に向けて調整されるfairness triggerと呼ばれる一連の摂動を入力に付加する。さらに,公平性トリガーを用いて公平性目標を達成できる理由と条件を説明する情報理論の枠組みについても紹介する。本研究では,固定MLモデルの出力予測において,フェアネストリガが,正しい人口統計情報を利用して予測を行うのを妨げる偽の人口統計情報を提供することによって,効果的に人口統計バイアスを隠蔽できることを,理論的にも実験的にも示す。 NLP と CV のデータセットを広範囲に実験した結果,広く使われている2つのフェアネス基準の下で,再トレーニングに基づく手法よりもはるかに少ないデータ依存度で,より高い公平性の向上を達成できることがわかった。コードはhttps://github.com/UCSB-NLP-Chang/Fairness-Reprogramming.gitで公開されている。 Despite a surge of recent advances in promoting machine learning (ML) fairness, the existing mainstream approaches mostly require retraining or finetuning the entire weights of the neural network to meet the fairness criteria. However, this is often infeasible in practice for those large-scale trained models due to large computational and storage costs, low data efficiency, and model privacy issues. In this paper, we propose a new generic fairness learning paradigm, called FairReprogram, which incorporates the model reprogramming technique. Specifically, FairReprogram considers the case where models cannot be changed and appends to the input a set of perturbations, called the fairness trigger, which is tuned towards the fairness criteria under a min-max formulation. We further introduce an information-theoretic framework that explains why and under what conditions fairness goals can be achieved using the fairness trigger. 
We show both theoretically and empirically that the fairness trigger can effectively obscure demographic biases in the output prediction of fixed ML models by providing false demographic information that hinders the model from utilizing the correct demographic information to make the prediction. Extensive experiments on both NLP and CV datasets demonstrate that our method can achieve better fairness improvements than retraining-based methods with far less data dependency under two widely-used fairness criteria. Codes are available at https://github.com/UCSB-NLP-Chang/Fairness-Reprogramming.git.	翻訳日:2023-02-19 11:17:55 公開日:2022-12-12
# 量子シミュレーションの今後数十年 The Coming Decades of Quantum Simulation ( http://arxiv.org/abs/2204.08905v2 ) ライセンス: Link先を確認	Joana Fraxanet, Tymoteusz Salamon and Maciej Lewenstein	(参考訳) 現代の量子技術は、誤り訂正を伴うフォールトトレラント量子コンピューティングにおいて大きな困難に直面しており、代わりに様々な量子シミュレーション(ノイズ中間スケール量子、NISQ)デバイス、アナログおよびデジタル量子シミュレータ、および量子アニールに焦点を当てている。このようなシステムには、物理的システムの量子力学を必ずしもシミュレートすることなく、巨大で制御可能な、堅牢で絡み合った、重畳状態を生成するという明確なニーズと要求がある。これは特に、デコヒーレンスの制御を可能にし、これらの状態を量子通信(例えば、より安全で速い方法で情報の効率的な転送を実現する)、量子計測、センシング、診断(例えば、光場の位相シフトを正確に測定したり、量子物質を診断するために)に使用できる。この章では、今後数十年にわたって量子シミュレーターの黄金の未来を展望する。 Contemporary quantum technologies face major difficulties in fault tolerant quantum computing with error correction, and focus instead on various shades of quantum simulation (Noisy Intermediate Scale Quantum, NISQ) devices, analogue and digital quantum simulators and quantum annealers. There is a clear need and quest for such systems that, without necessarily simulating quantum dynamics of some physical systems, can generate massive, controllable, robust, entangled, and superposition states. This will, in particular, allow the control of decoherence, enabling the use of these states for quantum communications (e.g. to achieve efficient transfer of information in a safer and quicker way), quantum metrology, sensing and diagnostics (e.g. to precisely measure phase shifts of light fields, or to diagnose quantum materials). In this Chapter we present a vision of the golden future of quantum simulators in the decades to come.	翻訳日:2023-02-16 08:46:46 公開日:2022-12-12
# 古典的情報原則の統一と拡張 Unification and Extension of Classic Information Principles ( http://arxiv.org/abs/2207.07577v5 ) ライセンス: Link先を確認	Jianfeng Xu	(参考訳) 情報理論の普遍的な枠組みを定式化することは有益である。本研究は, 客観的情報理論(oit)のセクシュタプルモデルが, 4つの基本的な仮定と情報を議論するのに十分かつ必要な条件であることを示す。これはOITで定義された各計量に対して示され、古典情報理論や一般的に用いられる原理に該当する例がある。さらに、原子情報は識別不能な基本情報として定義され、原子情報の組み合わせに対して体積加算性が証明される。これにより、単一の量子キャリアが持てる情報ボリュームが導出され、質量、エネルギー、時間に関する情報ボリュームに関する定理が証明される。これらの取り組みは、OITが様々な古典的な情報原理を統一し、情報、物質、エネルギーの量的関係を正確に明らかにできる新しい情報理論であることを示している。 To formulate a universal framework of information theory is beneficial. This study proves that the sextuple model of the objective information theory (OIT) is a sufficient and necessary condition for discussing information with four basic postulations. It is demonstrated for each metric defined in the OIT, there is a corresponding example in classical information theories or commonly used principles. Furthermore, atomic information is defined as the indivisible elementary information and the volume additivity is proven for combinations of atomic information. Consequently, the information volume that a single quantum carrier can carry is derived and a theorem relating information volume to mass, energy, and time is proved. All these efforts illustrate that the OIT is a novel information theory that can unify a variety of classical information principles and even accurately reveal the quantitative relationship between information, matter and energy.	翻訳日:2023-02-09 20:38:27 公開日:2022-12-12
# 不確実性関係について非古典的とは何か? What is nonclassical about uncertainty relations? ( http://arxiv.org/abs/2207.11779v2 ) ライセンス: Link先を確認	Lorenzo Catani, Matthew Leifer, Giovanni Scala, David Schmid and Robert W. Spekkens	(参考訳) 不確実性関係は、単一の状態における異なる測定結果が共同で予測できる程度に制限を表現している。量子論における非自明な不確実性関係の存在は、古典的世界観からの離脱を伴う方法であると考えられている。しかし、この観点は、非自明な不確実性関係を示すが、一般化された非文脈のオントロジモデルを認めた古典的世界観と一致する操作理論が存在するという事実により、弱められている。これにより、不確実性関係のどの側面が実現できないかという疑問が提起され、真の非古典性の証拠となる。ここでは、二元系アウトカム観測の予測可能性(例えば、パウリxとパウリz観測可能性の量子論における測定)のトレードオフを記述する不確実性関係を考える。特定の対称性特性を満たす理論のクラスに対して、この予測可能性トレードオフの関数形式は、線形曲線の下にある非文脈性によって制約されることを示す。量子ビット量子論は関連する対称性を持つため、その予測可能性トレードオフが円の部分を記述するという事実は、この非コンテキスト境界の違反であり、したがって不確かさ関係の関数形式が文脈性を見極める一例である。また、選択された演算ホイル群を量子論に含意し、3つの測度への一般化を考える。 Uncertainty relations express limits on the extent to which the outcomes of distinct measurements on a single state can be made jointly predictable. The existence of nontrivial uncertainty relations in quantum theory is generally considered to be a way in which it entails a departure from the classical worldview. However, this perspective is undermined by the fact that there exist operational theories which exhibit nontrivial uncertainty relations but which are consistent with the classical worldview insofar as they admit of a generalized-noncontextual ontological model. This prompts the question of what aspects of uncertainty relations, if any, cannot be realized in this way and so constitute evidence of genuine nonclassicality. We here consider uncertainty relations describing the tradeoff between the predictability of a pair of binary-outcome measurements (e.g., measurements of Pauli X and Pauli Z observables in quantum theory). We show that, for a class of theories satisfying a particular symmetry property, the functional form of this predictability tradeoff is constrained by noncontextuality to be below a linear curve. 
Because qubit quantum theory has the relevant symmetry property, the fact that its predictability tradeoff describes a section of a circle is a violation of this noncontextual bound, and therefore constitutes an example of how the functional form of an uncertainty relation can witness contextuality. We also deduce the implications for a selected group of operational foils to quantum theory and consider the generalization to three measurements.	翻訳日:2023-02-03 22:05:42 公開日:2022-12-12
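The contrast between a circular and a linear tradeoff can be checked numerically for a single qubit. The sketch below assumes pure states with Bloch vectors in the X-Z plane, for which the expectation values are <X> = sin(theta) and <Z> = cos(theta). It also assumes that the linear bound has the form |<X>| + |<Z>| <= 1; that specific form is an assumption made here for illustration, since the abstract only states that the bound is linear. The function names are hypothetical.

```python
import math

def pauli_expectations(theta):
    """<X> and <Z> for the pure qubit state cos(theta/2)|0> + sin(theta/2)|1>."""
    return math.sin(theta), math.cos(theta)

def circle_tradeoff(x, z):
    """Left-hand side of the circular relation <X>^2 + <Z>^2 <= 1."""
    return x * x + z * z

def linear_tradeoff(x, z):
    """Left-hand side of the assumed linear relation |<X>| + |<Z>| <= 1."""
    return abs(x) + abs(z)

# State halfway between the Z and X eigenstates on the Bloch circle.
x, z = pauli_expectations(math.pi / 4)
circle_value = circle_tradeoff(x, z)   # saturates the circle (equals 1 up to rounding)
linear_value = linear_tradeoff(x, z)   # sqrt(2), which exceeds 1
```

Under these assumptions the state saturates the circular relation while exceeding the linear one, which is the kind of gap the entry above describes as a witness of contextuality.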
# 単光子検出器を用いた単一サイクルTHzフィールドの電気光学サンプリング Electro-Optical Sampling of Single-Cycle THz Fields with Single-Photon Detectors ( http://arxiv.org/abs/2208.02103v2 ) ライセンス: Link先を確認	Taylor Shields, Adetunmise C. Dada, Lennart Hirsch, Seungjin Yoon, Jonathan M. R. Weaver, Daniele Faccio, Lucia Caspani, Marco Peccianti, Matteo Clerici	(参考訳) 超短パルスプローブを用いたテラヘルツ磁場の電気光学サンプリングは、thz放射の電界を直接測定するための確立された手法である。この技術は通常、THz誘起複屈折による光位相シフトを記録するために平衡検出に依存する。したがって、電気光学サンプリングの感度はプローブパルスのショットノイズによって制限され、例えばハイゼンベルク限界位相推定のためにNOON状態を用いる量子メトロジーアプローチによって改善が達成される。光プローブとして単一光子検出器と弱い圧縮真空場を用いたTHz電気光学サンプリング実験について報告する。本手法は、位相同期単光子検出器を用いたプローブ状態の統計特性に制限された磁場感度を実現し、量子増幅型THzセンシングをターゲットとしたさらなる研究の道を開く。 Electro-optical sampling of Terahertz fields with ultrashort pulsed probes is a well-established approach for directly measuring the electric field of THz radiation. This technique usually relies on balanced detection to record the optical phase shift brought by THz-induced birefringence. The sensitivity of electro-optical sampling is, therefore, limited by the shot noise of the probe pulse, and improvements could be achieved using quantum metrology approaches using, e.g., NOON states for Heisenberg-limited phase estimation. We report on our experiments on THz electro-optical sampling using single-photon detectors and a weak squeezed vacuum field as the optical probe. Our approach achieves field sensitivity limited by the probe state statistical properties using phase-locked single-photon detectors and paves the way for further studies targeting quantum-enhanced THz sensing.	翻訳日:2023-02-02 10:05:49 公開日:2022-12-12
# 非平衡フォノン凝縮と相転移の完全量子論 Full quantum theory of nonequilibrium phonon condensation and phase transition ( http://arxiv.org/abs/2209.05086v3 ) ライセンス: Link先を確認	Xuanhua Wang, Jin Wang	(参考訳) Fr\"ohlich凝縮は室温の非平衡現象であり、多くの物理系や生物系で起こることが期待されている。理論上は半世紀前に予測されたが、そのような凝縮の性質はいまだ解明されていない。このレターでは、Wu-Austin Hamiltonian から Fr\"ohlich 凝縮の完全な量子論を導出し、非平衡性および非線形性によって誘導される二次相転移が、デコリレーション近似の有無にかかわらず大$D$極限に現れるという解析的証明を初めて提示する。この臨界挙動は、外部ソースを古典的に扱う場合には観察できない。相転移は凝縮フォノンの統計分布における大きなゆらぎを伴い、ゆらぎを特徴づけるマンデル-Q因子は過大な外部エネルギー入力の極限で負となることを示す。冷却原子の平衡 BEC とは対照的に、Fr\"ohlich 凝縮体は、ポンプが粒子の数を設定する役割を担い、媒体が温度を設定する役割を担っている非平衡駆動の結果である。したがって、BECは固定したポンプのもとで媒体温度を下げる(平衡の場合)か、固定した媒体温度のもとでポンプを増加させる(非平衡の場合)かのどちらかによって生じる。 Fr\"ohlich condensation is a room-temperature nonequilibrium phenomenon which is expected to occur in many physical and biological systems. Though predicted theoretically a half century ago, the nature of such condensation remains elusive. In this Letter, we derive a full quantum theory of Fr\"ohlich condensation from the Wu-Austin Hamiltonian and present for the first time an analytical proof that a second-order phase transition induced by nonequilibrium and nonlinearity emerges in the large-$D$ limit with and without decorrelation approximation. This critical behavior cannot be witnessed if external sources are treated classically. We show that the phase transition is accompanied by large fluctuations in the statistical distribution of condensate phonons and that the Mandel-Q factor which characterizes fluctuations becomes negative in the limit of excessive external energy input. In contrast with the cold atom equilibrium BEC, the Fr\"ohlich condensate is a result of the nonequilibrium driving where the pump plays a role of setting the number of particles, and the medium plays a role of setting the temperature. Hence, BEC can either arise by reducing the medium temperature at fixed pump (equilibrium case), or by increasing the pump at fixed medium temperature (nonequilibrium case).	
翻訳日:2023-01-26 22:20:07 公開日:2022-12-12
# 等次元コンパクト多様体上のベレジン型量子化 Berezin-type quantization on even-dimensional compact manifolds ( http://arxiv.org/abs/2210.08814v2 ) ライセンス: Link先を確認	Rukmini Dey and Kohinoor Ghosh	(参考訳) 本稿では、コンパクトな偶次元多様体 $M^{2d}$ 上でベレジン型量子化が達成できることを示し、残余が ${\mathbb R}^{2d}$ に微分同型であるような低い次元の骨格 $M_0$ を取り除き、${\mathbb C}^d$ と同一視して ${\mathbb C}P^d$ に埋め込む。局所ポアソン構造とベレジン型量子化は${\mathbb C}P^d$から誘導される。我々は、再生カーネルを持つヒルベルト空間を持つ。ヒルベルト空間上の有界線型作用素の記号は、測度 0 の集合の外側の対応原理を満たす星積を持つ。この構成は微分同相に依存する。しかし、x=m \setminus m_0$ が複素構造を持ち、x \setminus x_0$, (x_0$ a set of measure zero or empty) から ${\mathbb c}^d \setminus n_0$, (ここで $n_0$ は測度 0 か空か) への双ホモ同型を仮定する。前述したように、${\mathbb C}^d \setminus N_0$ in ${\mathbb C}^d$ そして${\mathbb C}P^d$に埋め込み、${\mathbb C}P^d$から誘導されるベレジン型量子化を持つ。別の双ホモ同型を用いると、一方の再生核が他方の再生核に写像し、等価な量子化を持つように考慮された2つのヒルベルト空間の写像が存在する。同様の構成として、任意の複素多様体を考え、局所座標を用いて${\mathbb c}p^d$ から量子化を誘導する。コンパクト複素多様体上の大域ベレジン量子化を定義する可能性について検討する。プルバックトプリッツ作用素の定義を与え、測度 0 の集合を取り除いた後にコンパクトな偶次元多様体のトプリッツ量子化を示す。次に、コンパクトな滑らかな多様体上のプルバックコヒーレント状態(英語版)(pullback coherent states)の簡単な構成を与える。 In this article we show that a Berezin-type quantization can be achieved on a compact even dimensional manifold $M^{2d}$ by removing a skeleton $M_0$ of lower dimension such that what remains is diffeomorphic to ${\mathbb R}^{2d}$ which we identify with ${\mathbb C}^d$ and embed in ${\mathbb C}P^d$. A local Poisson structure and Berezin-type quantization are induced from ${\mathbb C}P^d$. We have a Hilbert space with a reproducing kernel. The symbols of bounded linear operators on the Hilbert space have a star product which satisfies the correspondence principle outside a set of measure zero. This construction depends on the diffeomorphism. However, suppose $X= M \setminus M_0$ has a complex structure and we have from $X \setminus X_0$, ($X_0$ a set of measure zero or empty) a biholomorphism from it to ${\mathbb C}^d \setminus N_0$, (where $N_0$ is of measure zero or empty). 
As before we embed ${\mathbb C}^d \setminus N_0$ in ${\mathbb C}^d$ and then into ${\mathbb C}P^d$ and we have a Berezin-type quantization induced from ${\mathbb C}P^d$. If we use another biholomorphism, we have a map of the two Hilbert spaces under consideration such that the reproducing kernel of one maps to the reproducing kernel of the other and we have an equivalent quantization. We have a similar construction where we consider an arbitrary complex manifold and use local coordinates to induce the quantization from ${\mathbb C}P^d$. We study the possibility of defining a global Berezin quantization on compact complex manifolds. We give a definition of pullback Toeplitz operators and exhibit Toeplitz quantization of compact even dimensional manifolds after removing a set of measure zero. Next we give a simple construction of pullback coherent states on compact smooth manifolds which are simplified versions of those defined in an earlier work by the authors.	翻訳日:2023-01-22 07:17:37 公開日:2022-12-12
# 時間・周波数分解核共鳴散乱スペクトル Unraveling Time- and Frequency-Resolved Nuclear Resonant Scattering Spectra ( http://arxiv.org/abs/2210.09848v2 ) ライセンス: Link先を確認	Lukas Wolff and J\"org Evers	(参考訳) 極めて狭い線幅と例外的なコヒーレンス特性のため、m\"ossbauer 核は硬x線のエネルギーで量子光学、分光、動力学に有望なプラットフォームを形成する。さらなる進歩の鍵となる要件は、より強力な計測とデータ分析技術の開発である。 1つのアプローチとして、時間分解スペクトルまたは周波数分解スペクトルを別々に測定する確立されたアプローチと比較して、最近の実験では時間分解スペクトルと周波数分解スペクトルの測定が採用されている。これらの実験では、周波数依存性を調整可能な単線核レファレンス吸収器を用いて実装する。本稿では,周波数領域における時間分解核共鳴散乱スペクトルの分光法と解析法を開発した。提案手法は, 時間軸に沿った実験的にアクセス可能な強度のフーリエ変換に基づいて, 複素数値周波数相関(FFC)スペクトルを導出する。これらのFFCスペクトルは、特に単純な構造を示し、異なる散乱寄与を阻害するだけでなく、ターゲット応答の核標的特性と複雑な値の核共鳴部分に直接アクセスすることを可能にする。第2部では,提案方式の基準吸収器から共振的に散乱したX線の追加位相制御の可能性について検討する。このような制御は特定の散乱経路への選択的アクセスを提供し、パラメータ空間を特定の周波数や時間制限に制限することなく、それぞれの分離解析を可能にする。すべての結果は、m\"ossbauer原子核の薄い層を含む薄膜x線キャビティの核前方散乱や反射の関連例で示される。 Owing to their extremely narrow line-widths and exceptional coherence properties, M\"ossbauer nuclei form a promising platform for quantum optics, spectroscopy and dynamics at energies of hard x-rays. A key requirement for further progress is the development of more powerful measurement and data analysis techniques. As one approach, recent experiments have employed time- and frequency-resolved measurements, as compared to the established approaches of measuring time-resolved or frequency-resolved spectra separately. In these experiments, the frequency-dependence is implemented using a tunable single-line nuclear reference absorber. Here, we develop spectroscopy and analysis techniques for such time- and frequency-resolved Nuclear Resonant Scattering spectra in the frequency-frequency domain. Our approach is based on a Fourier-transform of the experimentally accessible intensities along the time axis, which results in complex-valued frequency-frequency correlation (FFC) spectra. 
We show that these FFC spectra not only exhibit a particularly simple structure, disentangling the different scattering contributions, but also allow one to directly access nuclear target properties and the complex-valued nuclear resonant part of the target response. In a second part, we explore the potential of an additional phase control of the x-rays resonantly scattered off of the reference absorber for our scheme. Such control provides selective access to specific scattering pathways, allowing for their separate analysis without the need to constrain the parameter space to certain frequency or time limits. All results are illustrated with pertinent examples in Nuclear Forward Scattering and in reflection off of thin-film x-ray cavities containing thin layers of M\"ossbauer nuclei.	翻訳日:2023-01-22 04:28:11 公開日:2022-12-12
# 単分子磁石における$\pi$-Radicalの空間分解電子スピン共鳴 Spatially Resolving Electron Spin Resonance of $\pi$-Radical in Single-molecule Magnet ( http://arxiv.org/abs/2210.10235v2 ) ライセンス: Link先を確認	Ryo Kawaguchi, Katsushi Hashimoto, Toshiyuki Kakudate, Keiichi Katoh, Masahiro Yamashita, Tadahiro Komeda	(参考訳) 磁気分子のスピントロニクスは科学的に注目されている。量子情報処理の量子ビットに特に重点が置かれている。単分子磁石Bis(phthalocyaninato (Pc)) Tb(III) (TbPc2) は、Pc配位子の非局在化された {\pi}-ラジカル電子スピンが、局所化されたTbスピン量子ビットの読み出しと中間化において重要な役割を果たす最もよく検討された例の1つである。走査型トンネル顕微鏡(STM)に実装した電子スピン共鳴(ESR)技術を用いて,Cu(100)基板から分離した単一TbPc2分子の局所ESRを2つの単層NaCl膜で測定し,その放射スピンを同定した。我々は,S = 1/2スピンで期待される共振条件下で,リガンド位置でESR信号を検出する。その結果, ラジカル電子はリガンド内で非局在化され, 化学環境の影響を受けやすい分子内結合を示すことがわかった。 The spintronic properties of magnetic molecules have attracted significant scientific attention. Special emphasis has been placed on the qubit for quantum information processing. The single molecule magnet, bis(phthalocyaninato (Pc)) Tb(III) (TbPc2), is one of the best examined cases in which the delocalized {\pi}-radical electron spin of the Pc ligand plays the key role in reading and intermediating the localized Tb spin qubits. We utilized the electron spin resonance (ESR) technique implemented on scanning tunneling microscope (STM) and use it to measure local ESR of single TbPc2 molecule decoupled from the Cu(100) substrate by 2 monolayers NaCl film to identify the {\pi}-radical spin. We detected the ESR signal at the ligand positions at the resonance condition expected for the S = 1/2 spin. The results reveal that the {\pi}-radical electron is delocalized within the ligands and exhibits intramolecular coupling susceptible to the chemical environment.	翻訳日:2023-01-22 01:53:03 公開日:2022-12-12
# メソスコピックフロケット系の熱化の普遍クラス Universality classes of thermalization for mesoscopic Floquet systems ( http://arxiv.org/abs/2210.13444v2 ) ライセンス: Link先を確認	Alan Morningstar, David A. Huse, Vedika Khemani	(参考訳) 周期的に駆動されるメソスコピックまたは中間スケールの量子カオス系で発生する熱化の異なる相を同定する。その際、新しいフロッケ熱アンサンブルである「ラダーアンサンブル」を同定し、それは長年駆動系に適した平衡アンサンブルであると仮定されてきた「特徴のない無限温度」状態とは定性的に区別される。我々が見つけた段階は大きく分類され (i)そのシステムが非可逆的に$\omega$のエネルギーをドライブ、すなわちフロッケの熱化と交換するか否か (ii)フロッケ加熱を行う系における最終的な平衡を記述するフロッケ熱アンサンブル。これらの位相はメソスコピック系における振舞いのレギュレーションを表すものであるが、駆動周波数$\omega$がシステムサイズ$N$にスケールアップする特定の大系極限において鋭く定義される: 弱い$N$依存の$\omega(N) \sim \log N$から、$\omega(N) \sim \sqrt{N}$から$\omega(N) \sim N$までの周波数スケーリングを調べる。本研究では,フロッケ熱化が崩壊する遷移は広い駆動周波数で起こることを示すとともに,フロッケ熱化を行わない系は,フロッケ帯のレア共鳴の有無によって区別されることを示す。熱化相図はフロケット系の数値的研究と中間スケール量子シミュレータの実験的研究に関係し、どちらもN$と$\omega$の間のスケールの清浄な分離を欠く有限サイズのシステムに限定されている。我々の研究の顕著な予測は、単純な初期状態からの実験的に観測可能なクエンチプロトコルが、異なる温度での状態のグローバルな重ね合わせである新しいタイプのシュロディンガー・キャット状態へのフロケ熱化を示すことができるということである。 We identify several distinct phases of thermalization that can occur in periodically driven mesoscopic or intermediate-scale quantum chaotic systems. In doing so, we also identify a new Floquet thermal ensemble, the ``ladder ensemble", that is qualitatively distinct from the ``featureless infinite-temperature" state that has long been assumed to be the appropriate equilibrium ensemble for driven systems. The phases we find can be coarsely classified by (i) whether or not the system irreversibly exchanges energy of order $\omega$ with the drive, i.e., Floquet thermalizes, and (ii) the Floquet thermal ensemble describing the final equilibrium in systems that do Floquet thermalize. 
These phases are representative of regimes of behavior in mesoscopic systems, but they are sharply defined in a particular large-system limit where the drive frequency $\omega$ scales up with system size $N$ as the $N\to\infty$ limit is taken: we examine frequency scalings ranging from a weakly $N$-dependent $\omega(N) \sim \log N$, to stronger scalings ranging from $\omega(N) \sim \sqrt{N}$ to $\omega(N) \sim N$. We show that the transition where Floquet thermalization breaks down happens at an extensive drive frequency and, beyond that, systems that do not Floquet thermalize are distinguished based on the presence or absence of rare resonances across Floquet zones. We produce a thermalization phase diagram that is relevant for numerical studies of Floquet systems and experimental studies on intermediate-scale quantum simulators, both of which are limited to finite-size systems that lack a clean separation of scales between $N$ and $\omega$. A striking prediction of our work is that certain experimentally observable quench protocols from simple initial states can show Floquet thermalization to a novel type of Schrodinger-cat state that is a global superposition of states at distinct temperatures.	翻訳日:2023-01-21 18:46:18 公開日:2022-12-12
# スピン液体ハミルトニアンの量子シミュレーターにおける分数統計の探索 Probing fractional statistics in quantum simulators of spin liquid Hamiltonians ( http://arxiv.org/abs/2211.09784v2 ) ライセンス: Link先を確認	Shiyu Zhou, Maria Zelenayova, Oliver Hart, Claudio Chamon, Claudio Castelnovo	(参考訳) プログラマブル量子デバイスの最近の進歩は、トポロジカル量子スピン液体相の実現と研究にそれらを使うことの興味深い可能性をもたらした。この新しくエキサイティングな方向性は、このようなエキゾチックで非常に絡み合ったフェーズの存在を探究し、決定する方法に関する重要な研究課題をもたらす。最も有望なツールの1つは、トポロジカルな励起の挙動、特にその分数統計の研究である。本研究では、これを達成するための一般的な経路を示し、組合せゲージ対称性の助けを借りて実装された$\mathbb{Z}_2$トポロジカルスピン液体の特定の場合について説明する。我々は,準粒子干渉法を用いて分数統計量のシグネチャを研究するための便利なアーキテクチャを設計し,その頑健性を評価するとともに,雑音のある量子プログラマブルデバイスで一般的に普及する効果を強調する。我々が探している署名は、システム内の量子コヒーレンスと量子干渉効果に重大な影響を与えているため、これらのデバイスの「量子性」を明確にテストするのに役立つ。 Recent advances in programmable quantum devices brought to the fore the intriguing possibility of using them to realise and investigate topological quantum spin liquid phases. This new and exciting direction brings about important research questions on how to probe and determine the presence of such exotic, highly entangled phases. One of the most promising tools is investigating the behaviour of the topological excitations, and in particular their fractional statistics. In this work we put forward a generic route to achieve this, and we illustrate it in the specific case of $\mathbb{Z}_2$ topological spin liquids implemented with the aid of combinatorial gauge symmetry. We design a convenient architecture to study signatures of fractional statistics via quasiparticle interferometry, and we assess its robustness to diagonal and off-diagonal disorder, as well as to dephasing -- effects that are generally pervasive in noisy quantum programmable devices. A useful counterpart of our scheme is that it provides a clear test of the `quantumness' of these devices, since the signatures that we are looking for crucially hinge on quantum coherence and quantum interference effects in the system.	翻訳日:2023-01-19 06:41:42 公開日:2022-12-12
# 部分絡み状態からの絡み替えにおける局所予測可能性とコヒーレンス対分散絡み合い Local predictability and coherence versus distributed entanglement in entanglement swapping from partially entangled pure states ( http://arxiv.org/abs/2211.07539v2 ) ライセンス: Link先を確認	Jonas Maziero, Marcos L. W. Basso, Lucas C. C\'eleri	(参考訳) $P(\rho_{A})^{2} + C(\rho_{A})^{2} + E(|\Psi\rangle_{AB})^{2}=1$ のような完全相補性関係は、二部純粋状態の局所予測可能性 $P$、局所コヒーレンス $C$、およびエンタングルメント $E$ を制約する。局所コヒーレンスがゼロである部分エンタングル純粋状態の特定のクラスで最初に準備された量子ビット対に対して、これらの関係は Ref. [Phys. Lett. A, 451, 128414 (2022)] で用いられ、エンタングルメントスワッピングプロトコル(ESP)のベル基底測定後の状態の最大エンタングル成分の確率と、測定前状態の局所予測可能性との操作的な関係が与えられた。本稿では、この結果を一般的な純粋初期状態に拡張し、ESPにおける $P$、$C$ と分散エンタングルメントの関係を確立する。我々は、IBMの量子コンピュータを用いて、これらの一般的な理論結果のいくつかの事例を実験的に検証する。 Complete complementarity relations, as e.g. $P(\rho_{A})^{2} + C(\rho_{A})^{2} + E(|\Psi\rangle_{AB})^{2}=1$, constrain the local predictability, $P$, and local coherence, $C$, and the entanglement, $E$, of bipartite pure states. For pairs of qubits prepared initially in a particular class of partially entangled pure states with null local coherence, these relations were used in Ref. [Phys. Lett. A, 451, 128414 (2022)] to provide an operational connection between local predictability of the pre-measurement states with the probability of the maximally entangled components of the states after the Bell-basis measurement of the entanglement swapping protocol (ESP). In this article, we extend this result for general pure initial states establishing the relation between $P$, $C$ and the distributed entanglement in the ESP. We use IBM's quantum computers to verify experimentally some instances of these general theoretical results.	翻訳日:2023-01-18 06:51:01 公開日:2022-12-12
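For two qubits, a relation of the form P^2 + C^2 + E^2 = 1 can be verified directly. The sketch below assumes one common choice of measures, which may differ from the ones used in the paper: predictability P = |rho00 - rho11|, coherence C = 2|rho01| (the l1-norm coherence), and entanglement E = 2 sqrt(det rho_A) (the concurrence of the pure state), all computed from the reduced state rho_A of qubit A. The helper names are hypothetical.

```python
import math
import random

def random_two_qubit_state(rng):
    """Random normalized amplitudes (a00, a01, a10, a11) of a two-qubit pure state."""
    amps = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(4)]
    norm = math.sqrt(sum(abs(a) ** 2 for a in amps))
    return [a / norm for a in amps]

def reduced_state_A(amps):
    """Entries (rho00, rho11, rho01) of the reduced density matrix of qubit A."""
    a00, a01, a10, a11 = amps
    rho00 = abs(a00) ** 2 + abs(a01) ** 2
    rho11 = abs(a10) ** 2 + abs(a11) ** 2
    rho01 = a00 * a10.conjugate() + a01 * a11.conjugate()
    return rho00, rho11, rho01

def complementarity_terms(amps):
    """Return (P, C, E) under the measures assumed in the lead-in."""
    rho00, rho11, rho01 = reduced_state_A(amps)
    predictability = abs(rho00 - rho11)
    coherence = 2 * abs(rho01)
    # Clamp tiny negative rounding errors before taking the square root.
    det = max(rho00 * rho11 - abs(rho01) ** 2, 0.0)
    entanglement = 2 * math.sqrt(det)
    return predictability, coherence, entanglement

rng = random.Random(0)
totals = []
for _ in range(5):
    p, c, e = complementarity_terms(random_two_qubit_state(rng))
    totals.append(p * p + c * c + e * e)  # each entry should equal 1 up to rounding
```

With these definitions the sum reduces algebraically to (rho00 + rho11)^2, which is 1 for any normalized state, so the check holds for every pure two-qubit state and not only for the sampled ones.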
# 量子畳み込みニューラルネットワークを用いた物体の量子相のモデル独立学習 Model-Independent Learning of Quantum Phases of Matter with Quantum Convolutional Neural Networks ( http://arxiv.org/abs/2211.11786v2 ) ライセンス: Link先を確認	Yu-Jie Liu, Adam Smith, Michael Knap, and Frank Pollmann	(参考訳) 量子畳み込みニューラルネットワーク(QCNN)は、物質ギャップ量子相の分類器として導入されている。本稿では,位相保存摂動下で変化する順序パラメータを検出するために,qcnnを訓練するためのモデル非依存プロトコルを提案する。量子位相の定点波動関数でトレーニングシーケンスを開始し、システムの対称性を尊重する変換不変ノイズを加えて、短い長さスケールで固定点構造を隠蔽する。本稿では、QCNNを1次元の時間反転対称性で保護された位相上で訓練し、自明で対称性を破り、対称性を保護した位相秩序を示す複数の時間反転対称性モデル上でテストする。 QCNNは3つのフェーズすべてを特定し、位相境界の位置を正確に予測する順序パラメータのセットを発見する。提案プロトコルは,プログラム可能な量子プロセッサ上での量子位相分類器のハードウェア効率トレーニングへの道を開くものである。 Quantum convolutional neural networks (QCNNs) have been introduced as classifiers for gapped quantum phases of matter. Here, we propose a model-independent protocol for training QCNNs to discover order parameters that are unchanged under phase-preserving perturbations. We initiate the training sequence with the fixed-point wavefunctions of the quantum phase and then add translation-invariant noise that respects the symmetries of the system to mask the fixed-point structure on short length scales. We illustrate this approach by training the QCNN on phases protected by time-reversal symmetry in one dimension, and test it on several time-reversal symmetric models exhibiting trivial, symmetry-breaking, and symmetry-protected topological order. The QCNN discovers a set of order parameters that identifies all three phases and accurately predicts the location of the phase boundary. The proposed protocol paves the way towards hardware-efficient training of quantum phase classifiers on a programmable quantum processor.	翻訳日:2023-01-17 23:17:56 公開日:2022-12-12
# エッジからバルクへ:空間的非局所量子ビットのキャビティ誘起変位 From edge to bulk: Cavity induced displacement of topological non-local qubits ( http://arxiv.org/abs/2211.14145v2 ) ライセンス: Link先を確認	F. P. M. M\'endez-C\'ordoba, F. J. Rodr\'iguez, C. Tejedor, L. Quiroga	(参考訳) マヨラナフェルミオンの連結性を調整するための位相鎖への選択的キャビティカップリングの能力について検討した。非局所マヨラナフェルミオンペアリングに関連するトポロジカルキュービット(TQ)が、特定の物理的部位との光-物質相互作用への選択的アクセスを通じて、トポロジカルチェーンの端からバルクへ移動可能であることを示す。特に, 鎖状キャビティ結合ジオメトリーの基底状態特性に関する総合的DMRG研究を行い, 強い結合状態における解析的知見を検証した。この新しい種類のマヨラナフェルミオン相関生成プロセスは、新しいキャビティ光子特性を持つ。また,キャビティ・マッター結合強度の急冷後の時間変化を考慮し,高い非自明性物質(majorana)相関の発展はキャビティ内に測定可能な非古典光子インプリントを欠くことを示した。これにより、任意の長さのトポロジカル連鎖におけるTQ非局所相関を動的に生成する新たな方法が提供される。 We investigate the ability of selective cavity coupling to a topological chain for tailoring the connectivity of Majorana fermions. We show how topological qubits (TQs), associated with non-local Majorana fermion pairing, can be moved from the edge to the bulk of a topological chain through selective access to light-matter interaction with specific physical sites. In particular, we present a comprehensive DMRG study of ground-state features in different chain-cavity coupling geometries and validate analytical insights in the strong coupling regime. This new kind of Majorana fermion correlation generation process comes with new cavity photon features. Moreover, by considering the time evolution after a sudden quench of the cavity-matter coupling strength, we show that the development of high non-trivial matter (Majorana) correlations leaves off measurable non-classical photon imprints in the cavity. New ways to dynamically generate TQ nonlocal correlations in topological chains of arbitrary length are thus provided, opening alternative routes to controllable long-range entanglement in hybrid photonic solid-state systems.	翻訳日:2023-01-17 20:41:34 公開日:2022-12-12
# 機械学習による適切な回転フレームの構築 Machine-learning-assisted construction of appropriate rotating frame ( http://arxiv.org/abs/2211.15269v3 ) ライセンス: Link先を確認	Yoshihiro Michishita	(参考訳) ニューラルネットワークによる機械学習は、自然言語処理、画像認識、ゲーム勝利、さらには物理学の問題など、さまざまなタスクのための、ますます強力なツールになりつつある。数値計算への機械学習の適用と実験的な検出の支援については,多くの研究があるが,解析手法の発見に機械学習を適用する方法はあまり研究されていない。本稿では,機械学習を用いて解析手法を見つける手法を提案する。本研究では,時間周期ハミルトニアンをニューラルネットワークに入力するだけで,フロッケマグヌス展開を‘導出’することができることを実証し,周期駆動系の適切な回転フレームを導出する。また,本手法は,他のシステムにおける理論的枠組みの発見にも適用可能であると論じる。 Machine learning with neural networks is now becoming a more and more powerful tool for various tasks, such as natural language processing, image recognition, winning the game, and even for the issues of physics. Although there are many studies on the application of machine learning to numerical calculation and the assistance of experimental detection, the methods of applying machine learning to find the analytical method are poorly studied. In this letter, we propose methods to use machine learning to find the analytical methods. We demonstrate that the recurrent neural networks can ``derive'' the Floquet-Magnus expansion just by inputting the time-periodic Hamiltonian to the neural networks, and derive the appropriate rotating frame in the periodically-driven system. We also argue that this method is also applicable to finding other theoretical frameworks in other systems.	翻訳日:2023-01-17 15:09:33 公開日:2022-12-12
# Learning Optimal Phase-Shifts of Holographic Metasurface Transceivers ( http://arxiv.org/abs/2301.03371v1 ) License: see linked page	Debamita Ghosh, Manjesh K. Hanawal, Nikola Zlatanov	Holographic metasurface transceivers (HMTs) are an emerging technology for enhancing the coverage and rate of wireless communication systems. However, acquiring accurate channel state information in HMT-assisted wireless communication systems is critical for achieving these goals. In this paper, we propose an algorithm for learning the optimal phase-shifts at an HMT for the far-field channel model. Our proposed algorithm exploits the structure of the channel gains in the far-field regions and learns the optimal phase-shifts in the presence of noise in the received signals. We prove that the probability that the optimal phase-shifts estimated by our proposed algorithm deviate from the true values decays exponentially in the number of pilot signals. Extensive numerical simulations validate the theoretical guarantees and also demonstrate significant gains as compared to the state-of-the-art policies.	Translated: 2023-01-15 23:24:22 Published: 2022-12-12
# The Thermomajorization Polytope and Its Degeneracies ( http://arxiv.org/abs/2212.04305v2 ) License: see linked page	Frederik vom Ende, Emanuel Malvetti	It is well known that the future thermal cone -- which is the set of all states thermomajorized by a given initial state -- forms a convex polytope in the quasi-classical realm, and that one can explicitly write down a map which relates the permutations to the extreme points of this polytope. Given any such extreme point we review a formula for a Gibbs-stochastic matrix that maps the initial state to said extremal state, and we uncover the simple underlying structure. This allows us to draw a connection to the theory of transportation polytopes, which leads to the notions of "well-structured" and "stable" Gibbs states. While the former relates to the number of extremal states being maximal, the latter characterizes when thermomajorization is a partial order in the quasi-classical realm; this corresponds to the impossibility of cyclic state transfers. Moreover, we give simple criteria for degeneracy of the polytope, that is, for checking whether the extreme point map maps two different permutations to the same state.	Translated: 2023-01-09 18:50:30 Published: 2022-12-12
# Lower Bounding Ground-State Energies of Local Hamiltonians Through the Renormalization Group ( http://arxiv.org/abs/2212.03014v2 ) License: see linked page	Ilya Kull, Norbert Schuch, Ben Dive, Miguel Navascués	Given a renormalization scheme, we show how to formulate a tractable convex relaxation of the set of feasible local density matrices of a many-body quantum system. The relaxation is obtained by introducing a hierarchy of constraints between the reduced states of ever-growing sets of lattice sites. The coarse-graining maps of the underlying renormalization procedure serve to eliminate a vast number of those constraints, such that the remaining ones can be enforced with reasonable computational means. This can be used to obtain rigorous lower bounds on the ground state energy of arbitrary local Hamiltonians, by performing a linear optimization over the resulting convex relaxation of reduced quantum states. The quality of the bounds crucially depends on the particular renormalization scheme, which must be tailored to the target Hamiltonian. We apply our method to 1D translation-invariant spin models, obtaining energy bounds comparable to those attained by optimizing over locally translation-invariant states of $n\gtrsim 100$ spins. Beyond this demonstration, the general method can be applied to a wide range of other problems, such as spin systems in higher spatial dimensions, electronic structure problems, and various other many-body optimization problems, such as entanglement and nonlocality detection.	Translated: 2023-01-09 17:48:21 Published: 2022-12-12
# Constructing Nearby Commuting Matrices for Reducible Representations of $su(2)$ with an Application to Ogata's Theorem ( http://arxiv.org/abs/2212.06012v1 ) License: see linked page	David Herrera (Rutgers University)	Resolving a conjecture of von Neumann, Ogata's theorem in arXiv:1111.5933 showed the highly nontrivial result that arbitrarily many matrices corresponding to macroscopic observables with $N$ sites and a fixed site dimension $d$ are asymptotically nearby commuting observables as $N \to \infty$. In this paper, we develop a method to construct nearby commuting matrices for normalized highly reducible representations of $su(2)$ whose multiplicities of irreducible subrepresentations exhibit a certain monotonically decreasing behavior. We then provide a constructive proof of Ogata's theorem for site dimension $d=2$ with explicit estimates for how close the nearby observables are. Moreover, motivated by the application to time-reversal symmetry explored in arXiv:1012.3494, our construction has the property that real macroscopic observables are asymptotically nearby real commuting observables.	Translated: 2023-01-09 16:28:53 Published: 2022-12-12
# An operator growth hypothesis for open quantum systems ( http://arxiv.org/abs/2212.06180v1 ) License: see linked page	Budhaditya Bhattacharjee, Xiangyu Cao, Pratik Nandy, Tanay Pathak	Extending the formalism of Phys. Rev. X 9, 041017, we aim to provide an operator growth hypothesis in certain open quantum systems. Our results are based on the study of the dissipative $q$-body Sachdev-Ye-Kitaev (SYK$_q$) model, governed by Markovian dynamics. We introduce a notion of "operator size concentration" which allows a diagrammatic and combinatorial proof of the asymptotic linear behavior of the two sets of Lanczos coefficients ($a_n$ and $b_n$) in the large $q$ limit. Our results agree with the semi-analytics at finite $q$ in the large $N$ limit, and with the numerical Arnoldi iteration at finite $q$ and finite $N$. As a result, Krylov complexity exhibits exponential growth followed by saturation at a time that grows logarithmically with the inverse dissipation strength. The growth of complexity is suppressed compared to the closed system results, yet it upper bounds the growth of the normalized out-of-time-ordered correlator (OTOC). We conjecture this to be generic for any dissipative (open) quantum system, and it may generalize the chaos bound in such cases. We also provide a plausible explanation of the results from the dual gravitational side.	Translated: 2023-01-09 15:53:55 Published: 2022-12-12
# Nonequilibrium Full Counting Statistics and Symmetry-Resolved Entanglement from Space-Time Duality ( http://arxiv.org/abs/2212.06188v1 ) License: see linked page	Bruno Bertini, Pasquale Calabrese, Mario Collura, Katja Klobas, Colin Rylands	Due to its probabilistic nature, a measurement process in quantum mechanics produces a distribution of possible outcomes. This distribution - or its Fourier transform known as full counting statistics (FCS) - contains much more information than, say, the mean value of the measured observable, and accessing it is sometimes the only way to obtain relevant information about the system. In fact, the FCS is the limit of an even more general family of observables - the charged moments - that characterise how quantum entanglement is split in different symmetry sectors in the presence of a global symmetry. Here we consider the evolution of the FCS and of the charged moments of a U(1) charge truncated to a finite region after a global quantum quench. For large scales these quantities take a simple large-deviation form, showing two different regimes as functions of time: while for times much larger than the size of the region they approach a stationary value set by the local equilibrium state, for times shorter than the region size they show a non-trivial dependence on time. We show that, whenever the initial state is also U(1) symmetric, the leading order in time of FCS and charged moments in the out-of-equilibrium regime can be determined by means of a space-time duality. Namely, it coincides with the stationary value in the system where the roles of time and space are exchanged. We use this observation to find some general properties of FCS and charged moments out-of-equilibrium, and to derive an exact expression for these quantities in interacting integrable models. We test this expression against exact results in the Rule 54 quantum cellular automaton and exact numerics in the XXZ spin-1/2 chain.	Translated: 2023-01-09 15:53:32 Published: 2022-12-12
# Thrifty shadow estimation: re-using quantum circuits and bounding tails ( http://arxiv.org/abs/2212.06240v1 ) License: see linked page	Jonas Helsen, Michael Walter	Randomized shadow estimation is a recent protocol that allows estimating exponentially many expectation values of a quantum state from "classical shadows", obtained by applying random quantum circuits and computational basis measurements. In this paper we study the statistical efficiency of this approach in light of near-term quantum computing. In particular, we propose and analyze a more practically implementable variant of the protocol, thrifty shadow estimation, in which quantum circuits are reused many times instead of having to be freshly generated for each measurement (as in the original protocol). We show that the effect of this reuse strongly depends on the family of quantum circuits that is chosen. In particular, it is maximally effective when sampling Haar random unitaries, and maximally ineffective when sampling Clifford circuits (even though the Clifford group forms a three-design). To interpolate between these two extremes, we provide an efficiently simulable family of quantum circuits inspired by a recent construction of approximate t-designs. Finally, we consider tail bounds for shadow estimation and discuss when median-of-means estimation can be replaced with standard mean estimation.	Translated: 2023-01-09 15:53:03 Published: 2022-12-12
# Crossover from exciton polarons to trions in doped two-dimensional semiconductors at finite temperature ( http://arxiv.org/abs/2212.05635v1 ) License: see linked page	A. Tiene, B. C. Mulkerin, J. Levinsen, M. M. Parish, F. M. Marchetti	We study systematically the role of temperature in the optical response of doped two-dimensional semiconductors. By making use of a finite-temperature Fermi-polaron theory, we reveal a crossover from a quantum-degenerate regime with well-defined polaron quasiparticles to an incoherent regime at high temperature or low doping where the lowest energy "attractive" polaron quasiparticle is destroyed, becoming subsumed into a broad trion-hole continuum. We demonstrate that the crossover is accompanied by significant qualitative changes in both absorption and photoluminescence. In particular, with increasing temperature (or decreasing doping), the emission profile of the attractive branch evolves from a symmetric Lorentzian to an asymmetric peak with an exponential tail involving trions and recoil electrons at finite momentum. We discuss the effect of temperature on the coupling to light for structures embedded into a microcavity, and we show that there can exist well-defined polariton quasiparticles even when the exciton-polaron quasiparticle has been destroyed, where the transition from weak to strong light-matter coupling can be explained in terms of the polaron linewidths and spectral weights.	Translated: 2023-01-09 15:45:42 Published: 2022-12-12
# Finite temperature dynamical quantum phase transition in a non-Hermitian system ( http://arxiv.org/abs/2212.05839v1 ) License: see linked page	Debashish Mondal, Tanay Nag	We investigate the interplay between non-Hermiticity and finite temperature in the context of mixed state dynamical quantum phase transition (MSDQPT). We consider a $p$-wave superconductor model, encompassing complex hopping and non-Hermiticity that can lead to gapless phases in addition to gapped phases, to examine the MSDQPT and winding number via the intra-phase quench. We find that the MSDQPT is always present irrespective of the gap structure of the underlying phase; however, the profile of Fisher zeros changes between the above phases. Such occurrences of MSDQPT are in contrast to the zero-temperature case where DQPT does not take place for the gapped phase. Surprisingly, the half-integer jumps in winding number at zero temperature are washed away for finite temperature in the gapless phase. We study the evolution of the minimum time required by the system to experience MSDQPT with the inverse temperature such that gapped and gapless phases can be differentiated. Our study indicates that the minimum time shows monotonic (non-monotonic) behavior for the gapped (gapless) phase.	Translated: 2023-01-09 15:45:16 Published: 2022-12-12
# Fast nuclear-spin gates and electrons-nuclei entanglement of neutral atoms in weak magnetic fields ( http://arxiv.org/abs/2212.05876v1 ) License: see linked page	Xiao-Feng Shi	We present fast Rydberg-mediated entanglement involving nuclear spins of divalent atoms with $^{171}$Yb as an example. First, we show a nuclear-spin controlled phase gate of an arbitrary phase realizable either with two laser pulses when assisted by Stark shifts, or with three pulses. Second, we propose to create a state $(\lvert\text{cc}\rangle_{\text{e}} \otimes \lvert\Phi\rangle_{\text{n}} + \lvert\Phi\rangle_{\text{e}} \otimes \lvert\Psi\rangle_{\text{n}} )/\sqrt{2}$ entangled between the electrons (e) and nuclear spins (n) of two atoms, where $\lvert\Phi\rangle$ and $\lvert\Psi\rangle$ are two orthogonal Bell states and $\lvert \text{c}\rangle_{\text{e}}$ denotes the clock state. For want of a better term, it is called a Super Bell State because it mimics a "large" Bell state incorporating three "smaller" Bell states. Third, we show a protocol to create a three-atom state $(\sqrt{3}\lvert\text{ccc}\rangle_{\text{e}} \otimes \lvert\Lambda\rangle_{\text{n}} + \lvert \text{W}\rangle_{\text{e}} \otimes \lvert \text{GHZ}\rangle_{\text{n}} )/2$, where $\lvert\Lambda\rangle_{\text{n}}$ is a nuclear-spin state, $\lvert \text{W}\rangle_{\text{e}}$ is a W state in the ground-clock state space, and $\lvert \text{GHZ}\rangle_{\text{n}}$ is the Greenberger-Horne-Zeilinger (GHZ) state in the nuclear-spin state space. The four protocols possess high intrinsic fidelities, do not require single-site Rydberg addressing, and can be executed with large $\Omega_{\text{m}}$ in a weak, Gauss-scale magnetic field because they involve Rydberg excitation of both nuclear-spin qubit states in each atom. The latter two protocols can enable measurement-based preparation of Bell, hyperentangled, and GHZ states.	Translated: 2023-01-09 15:44:58 Published: 2022-12-12
# SyReC Synthesizer: An MQT tool for synthesis of reversible circuits ( http://arxiv.org/abs/2212.05903v1 ) License: see linked page	Smaran Adarsh, Lukas Burgholzer, Tanmay Manjunath, Robert Wille	Reversible circuits form the backbone for many promising emerging technologies such as quantum computing, low power/adiabatic design, encoder/decoder devices, and several other applications. In recent years, the scalable synthesis of such circuits has gained significant attention. In this work, we present the SyReC Synthesizer, a synthesis tool for reversible circuits based on the hardware description language SyReC. SyReC allows reversible functionality to be described at a high level of abstraction. The provided SyReC Synthesizer then realizes this functionality in a push-button fashion. Corresponding options allow for a trade-off between the number of needed circuit signals/lines (relevant, e.g., for quantum computing, in which every circuit line corresponds to a qubit) and the number of needed gates (corresponding to the circuit's costs). Furthermore, the tool can simulate the resulting circuit and determine its gate costs. The SyReC Synthesizer is available as an open-source software package at https://github.com/cda-tum/syrec as part of the Munich Quantum Toolkit (MQT).	Translated: 2023-01-09 15:43:28 Published: 2022-12-12
# Holographic Quantum Scars ( http://arxiv.org/abs/2212.05962v1 ) License: see linked page	Diego Liska, Vladimir Gritsev, Ward Vleeshouwers, Jiří Minář	We discuss a construction of quantum many-body scars in the context of holography. We consider two-dimensional conformal field theories and use their dynamical symmetries, naturally realized through the Virasoro algebra, to construct scarred states. By studying their Loschmidt amplitude, we evaluate the states' periodic properties. A geometrical interpretation allows us to compute the expectation value of the stress tensor and entanglement entropy of these scarred states. We show that their holographic dual is related by a diffeomorphism to empty AdS, even for energies above the black hole threshold. We also demonstrate that expectation values in the scarred states are generally non-thermal and that their entanglement entropy grows with the energy as $\log(E)$ in contrast to $\sqrt{E}$ for the typical (bulk) states. Furthermore, we identify fixed points on the CFT plane associated with divergent or vanishing entanglement entropy in the limit where the scarred states have infinite energy.	Translated: 2023-01-09 15:43:00 Published: 2022-12-12
# Biorthogonal Renormalization ( http://arxiv.org/abs/2212.06004v1 ) License: see linked page	Elisabet Edvardsson, J Lukas K König, Marcus Stålhammar	The biorthogonal formalism extends conventional quantum mechanics to the non-Hermitian realm. It has, however, been pointed out that the biorthogonal inner product changes with the scaling of the eigenvectors, an ambiguity whose physical significance is still being debated. Here, we revisit this issue and argue when this choice of normalization is of physical importance. We illustrate in which settings quantities such as expectation values and transition probabilities depend on the scaling of eigenvectors, and in which settings the biorthogonal formalism remains unambiguous. To resolve the apparent scaling ambiguity, we introduce an inner product independent of the gauge choice of basis and show that its corresponding mathematical structure is consistent with quantum mechanics. Using this formalism, we identify a deeper problem relating to the physicality of Hilbert space representations, which we illustrate using the position basis. Apart from increasing the understanding of the mathematical foundations upon which many physical results rely, our findings also pave the way towards consistent comparisons between systems described by non-Hermitian Hamiltonians.	Translated: 2023-01-09 15:42:40 Published: 2022-12-12
# Quantum Coherence of Critical Unstable Two-Level Systems ( http://arxiv.org/abs/2212.06031v1 ) License: see linked page	Dimitrios Karamitros, Thomas McKelvey, Apostolos Pilaftsis	We study in detail the dynamics of unstable two-level quantum systems by adopting the Bloch-sphere formalism of qubits. By employing the Bloch-vector representation for such unstable qubit systems, we identify a novel class of critical scenarios in which the so-called energy-level and decay-width vectors, ${\bf E}$ and ${\bf\Gamma}$, are orthogonal to one another, and the parameter $r = \|{\bf \Gamma}\|/(2\|{\bf E}\|)$ is less than 1. Most remarkably, we find that critical unstable qubit systems exhibit atypical behaviours like coherence-decoherence oscillations when analysed in an appropriately defined co-decaying frame of the system. In the same frame, a unit Bloch vector ${\bf b}$ describing a pure critical qubit will sweep out unequal areas during equal intervals of time, while rotating about the vector ${\bf E}$. These phenomena emerge beyond the usual oscillatory pattern due to the energy-level difference of the two-level quantum system. Interestingly enough, we observe that these new features will persist even for quasi-critical scenarios, in which the vectors ${\bf E}$ and ${\bf\Gamma}$ are not perfectly orthogonal to each other. Applications of our results to quantum information and to unstable meson-antimeson and other systems are discussed.	Translated: 2023-01-09 15:42:22 Published: 2022-12-12
# Frustrated magnets without geometrical frustration in bosonic flux ladders ( http://arxiv.org/abs/2212.06112v1 ) License: see linked page	Luca Barbiero, Josep Cabedo, Maciej Lewenstein, Leticia Tarruell, Alessio Celi	We propose a scheme to realize the frustrated spin-1/2 quantum XX model with ultracold bosonic atoms in optical lattices. Our approach is based on a square ladder of magnetic flux close to $\pi$ with one real and one synthetic spin dimension. Although this system does not have geometrical frustration, we show that at low energies it maps into an effective triangular ladder with staggered fluxes for specific values of the synthetic tunneling. We numerically investigate its rich phase diagram and show that it contains bond-ordered-wave and chiral superfluid phases. Our scheme gives access to minimal instances of frustrated magnets without the need for real geometrical frustration, in a setup of minimal experimental complexity.	Translated: 2023-01-09 15:41:56 Published: 2022-12-12
# Controlling Purity, Indistinguishability and Quantum Yield of Incoherently Pumped Two-Level System by Spectral Filters ( http://arxiv.org/abs/2212.06233v1 ) License: see linked page	Ivan V. Panyukov, Vladislav Yu. Shishkov, Evgeny S. Andrianov	Dephasing processes significantly impact the performance of deterministic single-photon sources. Dephasing broadens the spectral line and suppresses the indistinguishability of the emitted photons, which is undesirable for many applications, primarily for quantum computing. We consider light emitted by a two-level system with a pulsed incoherent pump in the presence of a spectral filter. The spectral filter allows control of the second-order autocorrelation function, indistinguishability, and quantum yield. We show that narrow spectral filters can increase the indistinguishability of the emitted light while undermining the quantum yield. The influence of the spectral filter on the second-order correlation function depends on the duration of the pump. When the pumping pulse is long compared to the lifetime of the two-level system, narrow spectral filters lead to a rapid increase in the second-order autocorrelation function. In this limit, the statistics of the light from the two-level system inherit the statistics of the incoherent pump. For a short pump pulse, it is possible to preserve single-photon properties to some degree for a sub-lifetime width of the spectral filter. Moreover, when the light emitted by the single-photon source is used to control a quantum system, e.g., a cavity, the single-photon properties of the light manifest themselves differently, depending on the response time of the quantum system. In particular, in the case of a long response time, a spectral filter with sub-lifetime width can provide a near-zero second-order autocorrelation function.	Translated: 2023-01-09 15:08:29 Published: 2022-12-12
# Vortex-ring quantum droplets in a radially-periodic potential ( http://arxiv.org/abs/2212.05838v1 ) License: see linked page	Bin Liu, Yi xi Chen, Ao wei Yang, Xiao yan Cai, Yan Liu, Zhi huan Luo, Xi zhou Qin, Xun da Jiang, Yong yao Li, Boris A. Malomed	We establish stability and characteristics of two-dimensional (2D) vortex ring-shaped quantum droplets (QDs) formed by binary Bose-Einstein condensates (BECs). The system is modeled by the Gross-Pitaevskii (GP) equation with the cubic term multiplied by a logarithmic factor (as produced by the Lee-Huang-Yang correction to the mean-field theory) and a potential which is a periodic function of the radial coordinate. Narrow vortex rings with high values of the topological charge, trapped in particular circular troughs of the radial potential, are produced. These results suggest an experimentally relevant method for the creation of vortical QDs (thus far, only zero-vorticity ones have been reported). The 2D GP equation for the narrow rings is approximately reduced to the 1D form, which makes it possible to study the modulational stability of the rings against azimuthal perturbations. Full stability areas are delineated for these modes. The trapping capacity of the circular troughs is identified for the vortex rings with different winding numbers (WNs). Stable compound states in the form of mutually nested concentric multiple rings are constructed too, including ones with opposite signs of the WNs. Other robust compound states combine a modulationally stable narrow ring in one circular potential trough and an azimuthal soliton performing orbital motion in an adjacent one. The results may be used to design a device employing coexisting ring-shaped modes with different WNs for data storage.	Translated: 2023-01-09 14:57:23 Published: 2022-12-12
# フェムト秒レーザーライティングによる2量子ビット量子フォトニックプロセッサ Two-qubit quantum photonic processor manufactured by femtosecond laser writing ( http://arxiv.org/abs/2212.05931v1 ) ライセンス: Link先を確認	N.N. Skryabin, I.V. Kondratyev, I.V. Dyakonov, O.V. Borzenkova, S.P. Kulik, and S.S. Straupe	(参考訳) フェムト秒レーザーライティング技術を用いて作製した2量子ビットフォトニック量子プロセッサを実験的に実装した。我々はフェムト秒レーザーライティングを用いて、精密な単一量子ビットと2量子ビット演算を実装した低損失再構成可能なフォトニックチップを作成する。シングルキュービットゲートと2キュービットゲートの性能はフルプロセストモグラフィーによって特徴付けられる。変動量子固有解法アルゴリズムを用いてH2分子の基底状態エネルギーを決定するためのプロセッサの例を示した。フェムト秒レーザーによる小型量子フォトニックプロセッサの高性能化の可能性について検討した。 We present an experimental implementation of a two-qubit photonic quantum processor fabricated using femtosecond laser writing technology. We employ femtosecond laser writing to create a low-loss reconfigurable photonic chip implementing precise single-qubit and two-qubit operations. The performance of single-qubit and two-qubit gates is characterized by full process tomography. An exemplary application of the processor to determining the ground state energy of an H2 molecule using the variational quantum eigensolver algorithm is demonstrated. Our results highlight the potential of femtosecond laser writing technology to deliver high quality small-scale quantum photonic processors.	翻訳日:2023-01-09 14:56:57 公開日:2022-12-12
# 確率量子シミュレーションにおける重要度サンプリング Importance sampling for stochastic quantum simulations ( http://arxiv.org/abs/2212.05952v1 ) ライセンス: Link先を確認	Oriel Kiss, Michele Grossi and Alessandro Roggero	(参考訳) 複雑な量子システムのシミュレーションは、デジタル量子コンピュータにとって有望なタスクである。しかし、一般的な製品公式の深さはハミルトニアンのサムマン数に比例するので、短期的およびフォールトトレラントなデバイスで実装することは困難である。効率的な解は、ハミルトニアンから係数の大きさに応じてサンプリングしてランダム積公式を構築する qdrift として知られる確率的コンパイルプロトコルによって与えられる。本研究では,qdriftプロトコルをサンプリングの重要性で統一し,バイアスと統計変動の両方を制御しながら任意の分布からサンプルすることが可能である。サンプリング段階における個別のシミュレーションコストを考慮することにより、同じ精度でシミュレーションコストを削減することができることを示す。さらに,本研究では, 対象の精度に対して, サンプル数, 実験数, 時間ステップを選択する方法を示す, 偏差と分散の厳密な境界を計算した最近の研究を取り入れた。これらの結果は、複合チャネルの使用の有無に関わらず、qdriftプロトコルをより効率的に実装することにつながる。理論的結果は格子核実効場理論で行った数値シミュレーションによって確認される。 Simulating complex quantum systems is a promising task for digital quantum computers. However, the depth of popular product formulas scales with the number of summands in the Hamiltonian, which can therefore be challenging to implement on near-term as well as fault-tolerant devices. An efficient solution is given by the stochastic compilation protocol known as qDrift, which builds random product formulas by sampling from the Hamiltonian according to the magnitude of their coefficients. In this work, we unify the qDrift protocol with importance sampling, allowing us to sample from arbitrary distributions while controlling both the bias as well as the statistical fluctuations. We show that the simulation cost can be reduced while achieving the same accuracy by considering the individual simulation cost during the sampling stage. Moreover, we incorporate recent work on composite channel and compute rigorous bounds on the bias and variance showing how to choose the number of samples, experiments, and time steps for a given target accuracy. These results lead to a more efficient implementation of the qDrift protocol, both with and without the use of composite channels. 
Theoretical results are confirmed by numerical simulations performed on a lattice nuclear effective field theory.	翻訳日:2023-01-09 14:06:17 公開日:2022-12-12
# 一般時間非依存ハミルトニアンの連続変数のダイナミクスに基づく量子性証明 Dynamics-based quantumness certification of continuous variables with generic time-independent Hamiltonians ( http://arxiv.org/abs/2212.06017v1 ) ライセンス: Link先を確認	Lin Htoo Zaw and Valerio Scarani	(参考訳) 量子性のダイナミクスに基づく証明は、その力学が知られているという仮定の下で、連続変数状態の非古典的な性質を目撃するアプローチである。単一系に対する非古典性の他のテストとは異なり、シーケンシャルな測定は不要である。このプロトコルのファミリーは調和力学のために導入された。本研究では,一般時間非依存ハミルトニアンの下で進化する一自由度に対する力学に基づく証明を構築する方法を示す。低エネルギーの限界(ケレル非線形性、振り子、モースポテンシャル)でほぼ調和しているものや(無限井戸の粒子)ではないものなど、いくつかの例が明示的に研究されている。 Dynamics-based certification of quantumness is an approach to witnessing the nonclassical character of some continuous-variable states, under the assumption that their dynamics is known. Contrary to other tests of nonclassicality for single systems, it does not require sequential measurements. This family of protocols was introduced for harmonic dynamics. In this work, we show how to construct dynamics-based certification for one degree of freedom evolving under a generic time-independent Hamiltonian. Several examples are explicitly studied: some that are approximately harmonic in the limits of low energy (Kerr nonlinearities, the pendulum, and the Morse potential) and one that is not (the particle in an infinite well).	翻訳日:2023-01-09 14:05:58 公開日:2022-12-12
# ガウス状態の光子数モーメントと累積 Photon-number moments and cumulants of Gaussian states ( http://arxiv.org/abs/2212.06067v1 ) ライセンス: Link先を確認	Yanic Cardin, Nicolás Quesada	(参考訳) 光子数ベースで測定した場合,ガウス状態のモーメントと累積に対する閉形式表現を開発する。ガウス状態の光子数モーメントをループハフニアンで表現し、グラフの隣接を表す$(0,1)$行列に適用すると、その完全マッチングの数を数える。次に、これらの式を用いて、累積の観点でモード間の真の光子数相関を計算する。単一モードのガウス状態がゼロの全ての入力において、一様損失の干渉計が供給されると、奇数次累積は全てゼロとなる。最後に,$K$個の同一状態が$\ell$モード干渉計に供給されるガウスボソンサンプリング装置において,累積の分布を4次まで異なる入力状態に対して研究するために導出した式を用いる。本研究は, 累積物の種類, 圧縮, 損失, スクワッド, 熱の関数として, および非真空入力数の関数として, 累積物の依存性を解析した。熱状態は他の古典的状態(例えばスカッシュ状態)よりも、損失状態や無損失状態の光子数累積状態の模倣においてずっと悪い結果をもたらすことが判明した。 We develop a closed-form expression for the moments and cumulants of Gaussian states when measured in the photon-number basis. We express the photon-number moments of a Gaussian state in terms of the loop Hafnian, a function that when applied to $(0,1)$-matrices representing the adjacency of a graph, counts the number of its perfect matchings. We then use these expressions to calculate genuine photon-number correlations between modes in terms of cumulants. We show that when a uniformly lossy interferometer is fed in every input with identical single-mode Gaussian states with zero displacement, all the odd-order cumulants but the first one are zero. Finally, we employ the expressions we derive to study the distribution of cumulants up to the fourth order for different input states in a Gaussian boson sampling setup where $K$ identical states are fed into an $\ell$-mode interferometer. We analyze the dependence of the cumulants as a function of the type of state, squeezed, lossy squeezed, squashed, or thermal, and as a function of the number of non-vacuum inputs. We find that thermal states perform much worse than other classical states, such as squashed states, at mimicking the photon-number cumulants of lossy or lossless squeezed states.	翻訳日:2023-01-09 14:05:45 公開日:2022-12-12
# 複数のチャネルを通して貯水池に結合したオープン量子系の効率的なシミュレーション Efficient simulation of open quantum systems coupled to a reservoir through multiple channels ( http://arxiv.org/abs/2212.06099v1 ) ライセンス: Link先を確認	Kai T. Liu, Jiaxi Wu, Peng Zhang, and David N. Beratan	(参考訳) 複数のチャネルを通して貯水池に結合したオープン量子系のシミュレーションは依然として大きな課題である。この種の開量子系は、例えば分子振動に結合した励起状態の放射のない崩壊を考えると生じる。相互作用図では連鎖マッピング戦略を用いて、複数の相互作用チャネルを介して調和浴に線形に結合した系を研究する。相互作用図では、素浴ハミルトニアンはユニタリ変換によって除去され(系-バス相互作用は残っており)、連鎖写像は浴モードを新しい基底に変換する。変換ハミルトニアンは時間依存の局所系-バス結合を含む。開量子系は、新しい基礎において限られた数の(変換された)浴モードに結合される。したがって、システムバス相互作用によって生じる絡み合いは局所的であり、行列積状態で効率的な動的シミュレーションを可能にする。このアプローチは一般化スピンボソンハミルトニアンを用いて一重項分裂をシミュレートする。電子状態は、対角および対角の両方の振動浴に結合される。このアプローチは、連鎖写像スキームをマルチチャネル系-バスカップリングの場合に一般化し、行列積状態を用いたこのクラス開量子システムの効率的なシミュレーションを可能にする。 The simulation of open quantum systems coupled to a reservoir through multiple channels remains a substantial challenge. This kind of open quantum system arises when considering the radiationless decay of excited states that are coupled to molecular vibrations, for example. We use the chain mapping strategy in the interaction picture to study systems linearly coupled to a harmonic bath through multiple interaction channels. In the interaction picture, the bare bath Hamiltonian is removed by a unitary transformation (the system-bath interactions remain), and a chain mapping transforms the bath modes to a new basis. The transformed Hamiltonian contains time-dependent local system-bath couplings. The open quantum system is coupled to a limited number of (transformed) bath modes in the new basis. As such, the entanglement generated by the system-bath interactions is local, making efficient dynamical simulations possible with matrix product states. We use this approach to simulate singlet fission, using a generalized spin-boson Hamiltonian. The electronic states are coupled to a vibrational bath both diagonally and off-diagonally. 
This approach generalizes the chain mapping scheme to the case of multi-channel system-bath couplings, enabling the efficient simulation of this class of open quantum systems using matrix product states.	翻訳日:2023-01-09 14:05:19 公開日:2022-12-12
# 複雑なネットワークの量子シミュレーションについて On the quantum simulation of complex networks ( http://arxiv.org/abs/2212.06126v1 ) ライセンス: Link先を確認	Duarte Magano and João Moutinho and Bruno Coutinho	(参考訳) 量子ウォークは、量子コンピュータでグラフ問題にアプローチするための自然なフレームワークを提供し、マークされたノードの探索や欠落したリンクの予測といったタスクに対して、従来のものよりもスピードアップを示す。連続時間量子ウォークアルゴリズムは、ハミルトニアンがグラフの隣接行列によって与えられる量子システムのダイナミクスをシミュレートできると仮定する。グラフが行スパースで効率よく行計算可能であれば、これを効率的にシミュレートできることが知られている。これは多くのアプリケーションに十分であるが、このタイプのアルゴリズムが実世界の複雑なネットワークを研究するための適用性を制限する。言い換えれば、複雑なネットワークは通常、すべてのノード間の平均接続が非常に小さいとしても、行スパースではない。本研究では、量子シミュレーションの最先端結果を、少数のハブを含むグラフに拡張するが、それ以外はスパースである。私たちの結果は、量子コンピューティングのネットワーク科学への新しい応用に繋がるかもしれません。 Quantum walks provide a natural framework to approach graph problems with quantum computers, exhibiting speedups over their classical counterparts for tasks such as the search for marked nodes or the prediction of missing links. Continuous-time quantum walk algorithms assume that we can simulate the dynamics of quantum systems where the Hamiltonian is given by the adjacency matrix of the graph. It is known that such dynamics can be simulated efficiently if the underlying graph is row-sparse and efficiently row-computable. While this is sufficient for many applications, it limits the applicability of this class of algorithms to the study of real-world complex networks, which, among other properties, are characterized by the existence of a few densely connected nodes, called hubs. In other words, complex networks are typically not row-sparse, even though the average connectivity over all nodes can be very small. In this work, we extend the state-of-the-art results on quantum simulation to graphs that contain a small number of hubs, but that are otherwise sparse. Hopefully, our results may lead to new applications of quantum computing to network science.	翻訳日:2023-01-09 14:05:00 公開日:2022-12-12
# マルチノード超電導量子コンピュータのアーキテクチャ Architectures for Multinode Superconducting Quantum Computers ( http://arxiv.org/abs/2212.06167v1 ) ライセンス: Link先を確認	James Ang, Gabriella Carini, Yanzhu Chen, Isaac Chuang, Michael Austin DeMarco, Sophia E. Economou, Alec Eickbusch, Andrei Faraon, Kai-Mei Fu, Steven M. Girvin, Michael Hatridge, Andrew Houck, Paul Hilaire, Kevin Krsulich, Ang Li, Chenxu Liu, Yuan Liu, Margaret Martonosi, David C. McKay, James Misewich, Mark Ritter, Robert J. Schoelkopf, Samuel A. Stein, Sara Sussman, Hong X. Tang, Wei Tang, Teague Tomesh, Norm M. Tubman, Chen Wang, Nathan Wiebe, Yong-Xin Yao, Dillon C. Yost, Yiyu Zhou	(参考訳) 量子技術をスケールする多くの提案は、ノードと呼ばれる個々の量子プロセッサが結合して1つの大きなマルチノード量子コンピュータ(MNQC)を形成するモジュラーまたは分散設計に依存している。 MNQCを構築するためのスケーラブルな方法の1つは、光配線を持つ超伝導量子システムである。しかし、これらのマシンの制限要因はノード間ゲートであり、これは2～3桁のノイズがあり、局所的な操作よりも遅い。ノード間ゲートの制限を克服するには、絡み合い生成の改善、絡み合い蒸留の使用、最適化されたソフトウェアとコンパイラなど、さまざまなテクニックが必要である。本稿では,ノード間リンク,絡み込み蒸留,局所アーキテクチャといったハードウェアモデルを用いて,MNQCの全体的な性能を定量化するために,コデザインにインスパイアされたアプローチを用いる。マイクロ波-光リンクを有する超伝導MNQCでは, エンタングルメント生成と蒸留のトレードオフが発見され, 性能低下を脅かす。我々は、このトレードオフをナビゲートする方法を示し、コンパイラがローカルゲートとインターノードゲートを最適化する方法を示し、ノイズの多い量子リンクが純粋に古典的なリンクよりも優れている場合について議論する。これらの結果から,MNQCのハードウェアやソフトウェアの改良の可能性を示す初期のMNQCの実現のロードマップや,絡み込み生成や量子メモリの進歩から,分散量子位相推定などの専用アルゴリズムに至るまで,ランドスケープを評価するための基準を概説する。光配線を有する超伝導デバイスに焦点をあてる一方で、我々のアプローチはMNQC実装全体にわたって一般的である。 Many proposals to scale quantum technology rely on modular or distributed designs where individual quantum processors, called nodes, are linked together to form one large multinode quantum computer (MNQC). One scalable method to construct an MNQC is using superconducting quantum systems with optical interconnects. However, a limiting factor of these machines will be internode gates, which may be two to three orders of magnitude noisier and slower than local operations. 
Surmounting the limitations of internode gates will require a range of techniques, including improvements in entanglement generation, the use of entanglement distillation, and optimized software and compilers, and it remains unclear how improvements to these components interact to affect overall system performance, what performance from each is required, or even how to quantify the performance of each. In this paper, we employ a `co-design' inspired approach to quantify overall MNQC performance in terms of hardware models of internode links, entanglement distillation, and local architecture. In the case of superconducting MNQCs with microwave-to-optical links, we uncover a tradeoff between entanglement generation and distillation that threatens to degrade performance. We show how to navigate this tradeoff, lay out how compilers should optimize between local and internode gates, and discuss when noisy quantum links have an advantage over purely classical links. Using these results, we introduce a roadmap for the realization of early MNQCs which illustrates potential improvements to the hardware and software of MNQCs and outlines criteria for evaluating the landscape, from progress in entanglement generation and quantum memory to dedicated algorithms such as distributed quantum phase estimation. While we focus on superconducting devices with optical interconnects, our approach is general across MNQC implementations.	翻訳日:2023-01-09 14:04:43 公開日:2022-12-12
# ランダム量子回路を用いたランダム化ベンチマークの一般保証 General guarantees for randomized benchmarking with random quantum circuits ( http://arxiv.org/abs/2212.06181v1 ) ライセンス: Link先を確認	Markus Heinrich, Martin Kliesch, Ingo Roth	(参考訳) 多くの変種において、ランダム化ベンチマーク(RB)は量子コンピュータにおけるゲート実装の品質を評価するために広く用いられている手法である。厳密な理論的な理解と一般的な保証がRBプロトコルの関数化と解釈のために存在する: 精査下のゲートがコンパクト群からランダムに一様に描かれる。対照的に、実際に魅力的でスケーラブルなrbプロトコルの多くは、あるゲート集合からランダムに引き出される局所ゲートを持つランダム量子回路を実装している。実際には、これらの一様でないRBプロトコルに対して、実験的に妥当な仮定の下での一般的な保証が欠落している。本研究では,フィルタRBと呼ぶランダム回路に対して,大規模なRBプロトコルの保証を導出する。代表的な例として、線形クロスエントロピーベンチマーク、文字ベンチマーク、ポーリノイズトモグラフィ、同時rbの変種がある。近年のランダム回路に関する結果をもとに,線形深さのランダム量子回路を用いて,関連する多くのフィルタ付きrbスキームを実現できることを示した。さらに,フィルタRBの一般試料複雑性境界を導出する。高次クロストーク対応プロトコルを含むいくつかの関連グループにおいて,フィルタ付きrbはサンプル効率が高いことを示す。非一様フィルタRBの理論は、原則として、非一様およびアナログ量子シミュレータのための新しいプロトコルを設計できるほど柔軟である。 In its many variants, randomized benchmarking (RB) is a broadly used technique for assessing the quality of gate implementations on quantum computers. A detailed theoretical understanding and general guarantees exist for the functioning and interpretation of RB protocols if the gates under scrutiny are drawn uniformly at random from a compact group. In contrast, many practically attractive and scalable RB protocols implement random quantum circuits with local gates randomly drawn from some gate-set. Despite their abundance in practice, for those non-uniform RB protocols, general guarantees under experimentally plausible assumptions are missing. In this work, we derive such guarantees for a large class of RB protocols for random circuits that we refer to as filtered RB. Prominent examples include linear cross-entropy benchmarking, character benchmarking, Pauli-noise tomography and variants of simultaneous RB. Building upon recent results for random circuits, we show that many relevant filtered RB schemes can be realized with random quantum circuits in linear depth, and we provide explicit small constants for common instances. 
We further derive general sample complexity bounds for filtered RB. We show filtered RB to be sample-efficient for several relevant groups, including protocols addressing higher-order cross-talk. Our theory for non-uniform filtered RB is, in principle, flexible enough to design new protocols for non-universal and analog quantum simulators.	翻訳日:2023-01-09 14:04:15 公開日:2022-12-12
# 有限分割単発読み出しによる読み出しと初期化の忠実性について On readout and initialisation fidelity by finite demolition single shot readout ( http://arxiv.org/abs/2212.06271v1 ) ライセンス: Link先を確認	Majid Zahedian, Max Keller, Minsik Kwon, Javid Javadzade, Jonas Meinel, Vadim Vorobyov and Jörg Wrachtrup	(参考訳) 理想的な射影量子測定は、可観測作用素の固有状態の1つ($|\phi_\alpha\rangle$)に系状態が崩壊する。しかし、射影計測の実験的な実現は理想的ではない。装置の古典的なノイズを克服するために必要な測定時間の間、システム状態はしばしば(軽く)摂動し、初期化の忠実さを損なう。本稿では,単発読み出しによって実行されるシステムの初期化忠実度を分析する解析モデルを提案する。ダイヤモンド, 電荷状態, 核スピン, 低温電子スピンの読み出しにおけるNV色中心の光子計数に基づく読み出しのパラメータを最適化する手法を考案した。我々の研究は、単発読み出しがポストセレクションやリアルタイム制御による初期化に使用されるとき、量子ビットの初期化忠実性の正確な記述に関連している。 Ideal projective quantum measurement makes the system state collapse in one of the observable operator eigenstates $|\phi_\alpha\rangle$, making it a powerful tool for preparing the system in the desired pure state. Nevertheless, experimental realisations of projective measurement are not ideal. During the measurement time needed to overcome the classical noise of the apparatus, the system state is often (slightly) perturbed, which compromises the fidelity of initialisation. In this paper, we propose an analytical model to analyse the initialisation fidelity of the system performed by the single-shot readout. We derive a method to optimise parameters for the three most used cases of photon counting based readouts for NV colour centre in diamond, charge state, nuclear spin and low temperature electron spin readout. Our work is of relevance for the accurate description of initialisation fidelity of the quantum bit when the single-shot readout is used for initialisation via post-selection or real-time control.	翻訳日:2023-01-09 14:03:52 公開日:2022-12-12
# 対称量子センサの量子誤差補正 Quantum error correction on symmetric quantum sensors ( http://arxiv.org/abs/2212.06285v1 ) ライセンス: Link先を確認	Yingkai Ouyang and Gavin K. Brennen	(参考訳) 集合角運動量の対称状態は、準備が容易で、個々のアドレナビリティを必要とせずに制御できるため、量子センサーのマルチキュービットプローブ状態のよい候補である。ここでは,古典場の大きさを対称プローブ状態を用いて推定するための量子誤差補正プロトコルを提案する。これを達成するために、まず対称部分空間上の量子誤差補正の一般理論を考案する。この理論は対称群の表現論に基づいて、任意の置換不変コード上の修正可能な誤りを訂正できる効率的なアルゴリズムを構築することができる。これらのアルゴリズムは、全角運動量、量子シュール変換または論理状態テレポーテーション、幾何学パルスゲートの測定を含む。削除誤差に対しては,幾何学的パルスゲートに基づく単純な量子誤差補正アルゴリズムを提案する。第2に、除去誤差の線形率にもかかわらず機能する対称プローブ状態に対する簡単な量子センシング手法を考案し、その漸近的性能を解析する。提案手法では,信号が蓄積している間,プローブ状態をコード空間に繰り返し投影する。信号の蓄積に要する時間が一定であれば,ノイズのない設定で可能な限り近い精度で位相推定を行うことができる。第3に,アルゴリズムの短期的実装を行う。 Symmetric states of collective angular momentum are good candidates for multi-qubit probe states in quantum sensors because they are easy to prepare and can be controlled without requiring individual addressability. Here, we give quantum error correction protocols for estimating the magnitude of classical fields using symmetric probe states. To achieve this, we first develop a general theory for quantum error correction on the symmetric subspace. This theory, based on the representation theory of the symmetric group, allows us to construct efficient algorithms that can correct any correctible error on any permutation-invariant code. These algorithms involve measurements of total angular momentum, quantum Schur transforms or logical state teleportations, and geometric pulse gates. For deletion errors, we give a simpler quantum error correction algorithm based primarily on geometric pulse gates. Second, we devise a simple quantum sensing scheme on symmetric probe states that works in spite of a linear rate of deletion errors, and analyze its asymptotic performance. In our scheme, we repeatedly project the probe state onto the codespace while the signal accumulates. 
When the time spent to accumulate the signal is constant, our scheme can do phase estimation with precision that approaches the best possible in the noiseless setting. Third, we give near-term implementations of our algorithms.	翻訳日:2023-01-09 14:03:35 公開日:2022-12-12
# 単一光子メモリ計測-デバイス非依存量子セキュアな直接通信 Single-photon-memory measurement-device-independent quantum secure direct communication ( http://arxiv.org/abs/2212.05661v1 ) ライセンス: Link先を確認	Xiang-Jie Li, Dong Pan, Gui-Lu Long, and Lajos Hanzo	(参考訳) 量子セキュアダイレクト通信(QSDC)は、量子チャネルを使用して情報を確実かつ安全に送信する。実用検出器によるセキュリティの抜け穴を取り除くため,測定デバイス非依存(MDI)QSDCプロトコルが提案されている。しかし、ブロックベースの量子状態の伝送はmdi-qsdcで活用されており、執筆時点ではまだ使用できない実用的な量子メモリを必要とする。この障害を回避するため,高速な量子メモリを不要とする単一光子メモリMDI QSDCプロトコル(SPMQC)を提案する。提案プロトコルの性能は,現実的な実験パラメータを考慮したシミュレーションにより特徴づけられ,現在の技術に依拠してspmqcを実装することが可能であることが示されている。 Quantum secure direct communication (QSDC) uses the quantum channel to transmit information reliably and securely. In order to eliminate the security loopholes resulting from practical detectors, the measurement-device-independent (MDI) QSDC protocol has been proposed. However, block-based transmission of quantum states is utilized in MDI-QSDC, which requires practical quantum memory that is still unavailable at the time of writing. For circumventing this impediment, we propose a single-photon-memory MDI QSDC protocol (SPMQC) for dispensing with high-performance quantum memory. The performance of the proposed protocol is characterized by simulations considering realistic experimental parameters, and the results show that it is feasible to implement SPMQC by relying on present-day technology.	翻訳日:2023-01-09 13:56:10 公開日:2022-12-12
# 通勤ゲートのSWAPゲート挿入における初期写像問題に対するSATアプローチ A SAT approach to the initial mapping problem in SWAP gate insertion for commuting gates ( http://arxiv.org/abs/2212.05666v1 ) ライセンス: Link先を確認	Atsushi Matsuo, Shigeru Yamashita, Daniel J. Egger	(参考訳) ほとんどの量子回路は、量子ハードウェア上で量子ビット接続に制限のあるSWAPゲート挿入を必要とする。 2ビットゲートを交換するブロックに対する有望なSWAPゲート挿入方法は、結合マップ上で同時に実行可能なSWAPゲートの層を適用した所定のスワップ戦略である。スワップ戦略に対する優れた初期マッピングは、必要なスワップゲートの数を減らす。しかし、量子近似最適化アルゴリズム(QAOA)やイジン・ハミルトニアンのトロッター化シミュレーションのように、回路が通勤ゲートで構成されている場合でも、よい初期写像を見つけることは難しい問題である。そこで本研究では,スワップ戦略を応用したコンミューティングゲートをハードウェアにトランスパイアした回路の初期マッピングをsatで求める手法を提案する。この手法は500ノードのランダムな3正則グラフに対するゲート数を65%削減する。さらに,SATの定式化とクラスタリングアルゴリズムを組み合わせたヒューリスティックな手法を提案する。このアプローチは、1000ノードのランダムな3正則グラフの自明な初期マッピングとランダムな初期マッピングの両方と比較して、スワップ層数を25%削減する。良い初期写像は、数百の量子ビットを持つノイズの多い量子ハードウェア上で、スパース問題に適用されたQAOAやIsing Hamiltonianシミュレーションのような量子アルゴリズムの研究を可能にする。 Most quantum circuits require SWAP gate insertion to run on quantum hardware with limited qubit connectivity. A promising SWAP gate insertion method for blocks of commuting two-qubit gates is a predetermined swap strategy which applies layers of SWAP gates simultaneously executable on the coupling map. A good initial mapping for the swap strategy reduces the number of required swap gates. However, even when a circuit consists of commuting gates, e.g., as in the Quantum Approximate Optimization Algorithm (QAOA) or trotterized simulations of Ising Hamiltonians, finding a good initial mapping is a hard problem. We present a SAT-based approach to find good initial mappings for circuits with commuting gates transpiled to the hardware with swap strategies. Our method achieves a 65% reduction in gate count for random three-regular graphs with 500 nodes. In addition, we present a heuristic approach that combines the SAT formulation with a clustering algorithm to reduce large problems to a manageable size. 
This approach reduces the number of swap layers by 25% compared to both a trivial and random initial mapping for a random three-regular graph with 1000 nodes. Good initial mappings will therefore enable the study of quantum algorithms, such as QAOA and Ising Hamiltonian simulation applied to sparse problems, on noisy quantum hardware with several hundreds of qubits.	翻訳日:2023-01-09 13:55:58 公開日:2022-12-12
# マイクロ波被覆弱非調和超伝導量子ビットの非摂動的再正規化の解消 Resolving non-perturbative renormalization of a microwave-dressed weakly anharmonic superconducting qubit ( http://arxiv.org/abs/2212.05847v1 ) ライセンス: Link先を確認	Byoung-moo Ann, Sercan Deve, and Gary A. Steele	(参考訳) マイクロ波駆動は超伝導量子ビット(scqs)のユビキタスな技術であるが、従来の摂動理論と回転波近似に基づく服装状態の記述は強い駆動限界のダイナミクスを完全に捉えることはできない。量子技術において関心が高まっているトランズモンに適用可能な、これらの近似を超えた包括的な実験研究は、残念ながら稀である。本研究では,マイクロ波装填トランスモンを広範囲の駆動パラメータで検討する。我々は,従来の近似を破ることなく,Rabi周波数,エネルギー緩和時間,およびリードアウト共振器との結合速度の有意な再正規化を見出した。また、2状態モデルを超えた簡潔な非フロケ理論を確立し、近似を劇的に最小化し、実験を良好に定量化する。この研究は、時間周期駆動システムの基本的な理解を拡大し、弱い非調和量子ビットのダイナミクスを正確に推定する上で重要な役割を担います。さらに,マルチレベルシステムではより複雑である適切なフロッケモードの選択などの追加作業を回避できるため,非フロッケアプローチは理論的解析に有益である。 Microwave driving is a ubiquitous technique for superconducting qubits (SCQs), but the dressed states description based on the conventionally used perturbation theory and rotating wave approximation cannot fully capture the dynamics in the strong driving limit. Comprehensive experimental work beyond these approximations that is applicable to transmons, which are receiving rising interest in quantum technologies, is unfortunately rare. In this work, we investigate a microwave-dressed transmon over a wide range of driving parameters. We find significant renormalization of Rabi frequencies, energy relaxation times, and the coupling rates with a readout resonator, all of which are not quantified without breaking the conventional approximations. We also establish a concise non-Floquet theory beyond the two-state model while dramatically minimizing the approximations, which excellently quantifies the experiments. This work expands our fundamental understanding of time-periodically driven systems and will have an important role in accurately estimating the dynamics of weakly anharmonic qubits. Furthermore, our non-Floquet approach is beneficial for theoretical analysis since one can avoid additional efforts such as the choice of proper Floquet modes, which is more complicated for multi-level systems.	翻訳日:2023-01-09 13:54:59 公開日:2022-12-12
# 位置表現におけるBialynicki-BirulaとLandau-Peierls Fock空間の電磁場量子化の同型性 Isomorphism between the Bialynicki-Birula and the Landau-Peierls Fock space quantization of the electromagnetic field in position representation ( http://arxiv.org/abs/2212.05849v1 ) ライセンス: Link先を確認	Maxime Federico and Hans Rudolf Jauslin	(参考訳) まず, 位置空間表現における電磁場の量子化について, クーロンゲージにおけるlandau-peierlsアプローチと, リーマン・シルバーシュタインベクトルに基づくbialynicki-birulaアプローチの2つの主要なアプローチを用いて概説する。古典的ハミルトニアン構造から始まる枠組みと、正確に定義された対応原理によってボソニックフォック空間に量子モデルを構築する枠組みの両方を記述する。 2つの近似が完全同値であることを示す。これは、フォック空間の間に同型となるユニタリ写像が存在することを示すことによって定式化される。物理的に測定可能な全ての量はスカラー積で表現できるので、2つの量子化が全く同じ物理的性質をもたらすことを意味する。さらに、同型は時間進化において保存されていることを示す。等価性を示すために、ヘリシティと周波数演算子の概念を用いる。これら2つの演算子の組み合わせは、これらの2つの量子化法を正確な方法でリンクできる定式化を提供する。また、ハミルトニアンにおける負の固有値の存在を回避できるbialynicki-birula量子化の構成は、電子と陽電子のディラック方程式の例に類似しており、マクスウェル方程式の正準変数の別の選択を通して行うことができることを示した。 We first present a summary of the quantization of the electromagnetic field in position space representation, using two main approaches: the Landau-Peierls approach in the Coulomb gauge and the Bialynicki-Birula approach, based on the Riemann-Silberstein vector. We describe both in a framework that starts with a classical Hamiltonian structure and builds the quantum model in a bosonic Fock space by a precisely defined principle of correspondence. We show that the two approaches are completely equivalent. This is formulated by showing that there is a unitary map between the Fock spaces that makes them isomorphic. Since all the physically measurable quantities can be expressed in terms of scalar products, this implies that the two quantizations lead to exactly the same physical properties. We show furthermore that the isomorphism is preserved in the time evolutions. To show the equivalence, we use the concepts of helicity and frequency operators. The combination of these two operators provides a formulation that allows one to make the link between these two methods of quantization in a precise way. 
We also show that the construction in the Bialynicki-Birula quantization that avoids the presence of negative eigenvalues in the Hamiltonian, in analogy with the one for the Dirac equation for electrons and positrons, can be performed through an alternative choice of the canonical variables for Maxwell's equations.	翻訳日:2023-01-09 13:54:39 公開日:2022-12-12
# 単光子干渉計のサブ0.1度位相ロック Sub-0.1 degree phase locking of a single-photon interferometer ( http://arxiv.org/abs/2212.05852v1 ) ライセンス: Link先を確認	Vojtěch Švarc, Martina Nováková, Michal Dudka, Miroslav Ježek	(参考訳) 単光子mach-zehnder干渉計の位相精度を15時間で0.05度に安定化した。位相をロックするために、量子信号とは異なる波長の補助基準光を用いる。開発した位相ロックは、無視可能なクロストークと量子信号の任意の位相に対して連続的に動作する。さらに、その性能は基準の強度変動とは無関係である。この手法は、量子干渉量論ネットワークの大部分で使用できるため、量子通信や量子メトロロジーにおける位相感応応用を大幅に改善することができる。 We report a single-photon Mach-Zehnder interferometer stabilized to a phase precision of 0.05 degrees over 15 hours. To lock the phase, we employ an auxiliary reference light at a different wavelength than the quantum signal. The developed phase locking operates continuously, with negligible crosstalk, and for an arbitrary phase of the quantum signal. Moreover, its performance is independent of intensity fluctuations of the reference. Since the presented method can be used in a vast majority of quantum interferometric networks it can significantly improve phase-sensitive applications in quantum communication and quantum metrology.	翻訳日:2023-01-09 13:54:13 公開日:2022-12-12
# TriNet:完全あるいはゆっくり崩壊した自己教師付き学習の安定化 TriNet: stabilizing self-supervised learning from complete or slow collapse ( http://arxiv.org/abs/2301.00656v1 ) ライセンス: Link先を確認	Lixin Cao, Jun Wang, Ben Yang, Dan Su, Dong Yu	(参考訳) 自己教師付き学習(SSL)モデルは、突然の情報崩壊や遅い次元崩壊という課題に直面している。本稿では,崩壊を防止し,事前学習を安定化するための新しい三分岐アーキテクチャTriNetを提案する。提案手法は事前学習を顕著に安定化・高速化し,下流のベンチマークASRタスクにおいて,最先端(SOTA)のData2vecと比較して5.32%の相対単語誤り率低減(WERR)を実現する。コードはhttps://github.com/tencent-ailab/でリリースします。 Self-supervised learning (SSL) models confront challenges of abrupt informational collapse or slow dimensional collapse. We propose TriNet, which introduces a novel triple-branch architecture for preventing collapse and stabilizing the pre-training. Our experimental results show that the proposed method notably stabilizes and accelerates pre-training and achieves a relative word error rate reduction (WERR) of 5.32% compared to the state-of-the-art (SOTA) Data2vec for a downstream benchmark ASR task. We will release our code at https://github.com/tencent-ailab/.	翻訳日:2023-01-09 13:47:41 公開日:2022-12-12
# 自律走行車における分散的協調認識--未知の価値を学習する Decentralized cooperative perception for autonomous vehicles: Learning to value the unknown ( http://arxiv.org/abs/2301.01250v1 ) ライセンス: Link先を確認	Maxime Chaveroche, Franck Davoine, Véronique Cherfaoui	(参考訳) 最近、自動運転車による事故と十分な情報不足が目撃されている。この問題に取り組む一つの方法は、異なる視点の認識、すなわち協調的な知覚から恩恵を受けることである。そこで我々は,エージェントが周囲にもっと知りたくなるような特定の領域を求めることで,完全な認識を求める活動を行っている分散的なコラボレーション,すなわちピア・ツー・ピアを提案する。究極的には、移動対象に関する知識の最大化と、他者から受信される情報総量の最小化とのトレードオフを最適化し、通信コストとメッセージ処理時間を制限したい。そこで本研究では,送信側でフィルタリングを行う代わりに,未知の車両を自我車に要求するだけで,通常の通信パラダイムを逆転する通信方針を学習する方法を提案する。深層強化学習(drl)アルゴリズムのベースとして3つの異なる生成モデルをテストし,それらを放送ポリシーとランダムに選択するポリシーと比較した。特に,局部予測可能なvae (lp-vae) を提案する。これはスタンドアロンモデルとdrlの文脈の両方において,最先端のモデルよりも優れた予測状態を生成する。運転シミュレータCARLAで実験を行った。我々の最良のモデルは、平均して補完情報の25%を獲得し、エゴ車両の知覚野の約5%しか要求していない。このトレードオフは、報酬関数の解釈可能なハイパーパラメータを通じて調整可能です。 Recently, we have witnessed accidents involving autonomous vehicles and their lack of sufficient information. One way to tackle this issue is to benefit from the perception of different viewpoints, namely cooperative perception. We propose here a decentralized collaboration, i.e. peer-to-peer, in which the agents are active in their quest for full perception by asking for specific areas in their surroundings on which they would like to know more. Ultimately, we want to optimize a trade-off between the maximization of knowledge about moving objects and the minimization of the total volume of information received from others, to limit communication costs and message processing time. For this, we propose a way to learn a communication policy that reverses the usual communication paradigm by only requesting from other vehicles what is unknown to the ego-vehicle, instead of filtering on the sender side. We tested three different generative models to be taken as base for a Deep Reinforcement Learning (DRL) algorithm, and compared them to a broadcasting policy and a policy randomly selecting areas. 
In particular, we propose Locally Predictable VAE (LP-VAE), which appears to be producing better belief states for predictions than state-of-the-art models, both as a standalone model and in the context of DRL. Experiments were conducted in the driving simulator CARLA. Our best models reached on average a gain of 25% of the total complementary information, while only requesting about 5% of the ego-vehicle's perceptual field. This trade-off is adjustable through the interpretable hyperparameters of our reward function.	翻訳日:2023-01-09 13:47:06 公開日:2022-12-12
# Page Layout Analysis of Text-heavy Historical Documents: a Comparison of Textual and Visual Approaches ( http://arxiv.org/abs/2212.13924v1 ) License: see link	Najem-Meyer Sven, Romanello Matteo	Page layout analysis is a fundamental step in document processing that segments a page into regions of interest. With highly complex layouts and mixed scripts, scholarly commentaries are text-heavy documents which remain challenging for state-of-the-art models. Their layout varies considerably across editions, and their most important regions are mainly defined by semantic rather than graphical characteristics such as position or appearance. This setting calls for a comparison between textual, visual and hybrid approaches. We therefore assess the performance of two transformers (LayoutLMv3 and RoBERTa) and an object-detection network (YOLOv5). Although results show a clear advantage in favor of the latter, we also list several caveats to this finding. In addition to our experiments, we release a dataset of ca. 300 annotated pages sampled from 19th century commentaries.	Translated: 2023-01-01 14:26:25 Published: 2022-12-12
# Siamese Sleep Transformer For Robust Sleep Stage Scoring With Self-knowledge Distillation and Selective Batch Sampling ( http://arxiv.org/abs/2212.13919v1 ) License: see link	Heon-Gyu Kwak, Young-Seok Kweon, Gi-Hwan Shin	In this paper, we propose a Siamese sleep transformer (SST) that effectively extracts features from single-channel raw electroencephalogram signals for robust sleep stage scoring. Despite the significant advances in sleep stage scoring in the last few years, most work has focused mainly on improving model performance. However, other problems still exist: the bias of labels in datasets and the instability of model performance under repeated training. To alleviate these problems, we propose the SST, a novel sleep stage scoring model with a selective batch sampling strategy and self-knowledge distillation. To evaluate how robust the model was to the bias of labels, we used different datasets for training and testing: the Sleep Heart Health Study and the Sleep-EDF datasets. In this condition, the SST showed competitive performance in sleep stage scoring. In addition, we demonstrated the effectiveness of the selective batch sampling strategy with a reduction of the standard deviation of performance across repeated training runs. These results suggest that SST extracted effective learning features against the bias of labels in datasets, and the selective batch sampling strategy worked for the model robustness in training.	Translated: 2023-01-01 14:26:11 Published: 2022-12-12
# Tool flank wear prediction using high-frequency machine data from industrial edge device ( http://arxiv.org/abs/2212.13905v1 ) License: see link	D. Bilgili (1), G. Kecibas (1 and 2), C. Besirova (1 and 2), M. R. Chehrehzad (2), G. Burun (3), T. Pehlivan (1), U. Uresin (1), E. Emekli (1), I. Lazoglu (2) ((1) Ford Otosan R&D Center, Istanbul, Turkey, (2) Koç University, Manufacturing and Automation Research Center, Istanbul, Turkey, (3) Tubitak BILGEM Information Technologies Institute, Kocaeli, Turkey)	Tool flank wear monitoring can minimize machining downtime costs while increasing productivity and product quality. In some industrial applications, only a limited level of tool wear is allowed to attain necessary tolerances. It may become challenging to monitor a limited level of tool wear in the data collected from the machine because other components, such as the flexible vibrations of the machine, dominate the measurement signals. In this study, a tool wear monitoring technique to predict limited levels of tool wear from the spindle motor current and dynamometer measurements is presented. High-frequency spindle motor current data is collected with an industrial edge device while the cutting forces and torque are measured with a rotary dynamometer in drilling tests for a selected number of holes. Feature engineering is conducted to identify the statistical features of the measurement signals that are most sensitive to small changes in tool wear. A neural network based on the long short-term memory (LSTM) architecture is developed to predict tool flank wear from the measured spindle motor current and dynamometer signals. It is demonstrated that the proposed technique predicts tool flank wear with good accuracy and high computational efficiency. The proposed technique can easily be implemented in an industrial edge device as a real-time predictive maintenance application to minimize the costs due to manufacturing downtime and tool underuse or overuse.	Translated: 2023-01-01 14:25:13 Published: 2022-12-12
# Machine Learning for Detecting Malware in PE Files ( http://arxiv.org/abs/2212.13988v1 ) License: see link	Collin Connors and Dilip Sarkar	The increasing amount of sophisticated malware poses a major cybersecurity threat. Portable executable (PE) files are a common vector for such malware. In this work, we review and evaluate machine learning-based PE malware detection techniques. Using a large benchmark dataset, we evaluate features of PE files using the most common machine learning techniques to detect malware.	Translated: 2023-01-01 14:24:52 Published: 2022-12-12
# Distributed Unconstrained Optimization with Time-varying Cost Functions ( http://arxiv.org/abs/2212.09472v1 ) License: see link	Amir-Salar Esteki and Solmaz S. Kia	In this paper, we propose a novel solution for the distributed unconstrained optimization problem where the total cost is the summation of time-varying local cost functions of a group of networked agents. The objective is to track the optimal trajectory that minimizes the total cost at each time instant. Our approach consists of two-stage dynamics, where the first stage samples the first and second derivatives of the local costs periodically to construct an estimate of the descent direction towards the optimal trajectory, and the second stage uses this estimate and a consensus term to drive local states towards the time-varying solution while reaching consensus. The first part is carried out by the implementation of a weighted average consensus algorithm in the discrete-time framework and the second part is performed with continuous-time dynamics. Using Lyapunov stability analysis, an upper bound on the gradient of the total cost is obtained, which is asymptotically reached. This bound is characterized by the properties of the local costs. To demonstrate the performance of the proposed method, a numerical example is presented that studies the tuning of the algorithm's parameters and their effects on the convergence of local states to the optimal trajectory.	Translated: 2022-12-25 02:52:46 Published: 2022-12-12
# Evaluation of Synthetic Datasets for Conversational Recommender Systems ( http://arxiv.org/abs/2212.08167v1 ) License: see link	Harsh Lara, Manoj Tiwari	For researchers leveraging Large-Language Models (LLMs) in the generation of training datasets, especially for conversational recommender systems, the absence of robust evaluation frameworks has been a long-standing problem. The efficiency brought about by LLMs in the data generation phase is impeded during the evaluation of the generated data, since it generally requires human raters to ensure that the data generated is of high quality and has sufficient diversity. Since the quality of training data is critical for downstream applications, it is important to develop metrics that evaluate the quality holistically and identify biases. In this paper, we present a framework that takes a multi-faceted approach towards evaluating datasets produced by generative models and discuss the advantages and limitations of various evaluation methods.	Translated: 2022-12-25 02:45:48 Published: 2022-12-12
# Mind the Retrosynthesis Gap: Bridging the divide between Single-step and Multi-step Retrosynthesis Prediction ( http://arxiv.org/abs/2212.11809v1 ) License: see link	Alan Kai Hassen, Paula Torren-Peraire, Samuel Genheden, Jonas Verhoeven, Mike Preuss, Igor Tetko	Retrosynthesis is the task of breaking down a chemical compound recursively step-by-step into molecular precursors until a set of commercially available molecules is found. Consequently, the goal is to provide a valid synthesis route for a molecule. As more single-step models develop, we see increasing accuracy in the prediction of molecular disconnections, potentially improving the creation of synthetic paths. Multi-step approaches repeatedly apply the chemical information stored in single-step retrosynthesis models. However, this connection is not reflected in contemporary research, which fixes either the single-step model or the multi-step algorithm in the process. In this work, we establish a bridge between both tasks by benchmarking the performance and transfer of different single-step retrosynthesis models to the multi-step domain by leveraging two common search algorithms, Monte Carlo Tree Search and Retro. We show that models designed for single-step retrosynthesis, when extended to multi-step, can have a tremendous impact on the route finding capabilities of current multi-step methods, improving performance by up to +30% compared to the most widely used model. Furthermore, we observe no clear link between contemporary single-step and multi-step evaluation metrics, showing that single-step models need to be developed and tested for the multi-step domain and not as an isolated task to find synthesis routes for molecules of interest.	Translated: 2022-12-25 02:44:18 Published: 2022-12-12
# Nervus: A Comprehensive Deep Learning Classification, Regression, and Prognostication Tool for both Medical Image and Clinical Data Analysis ( http://arxiv.org/abs/2212.11113v1 ) License: see link	Toshimasa Matsumoto, Shannon L Walston, Yukio Miki, Daiju Ueda	The goal of our research is to create a comprehensive and flexible library that is easy to use for medical imaging research, and capable of handling grayscale images, multiple inputs (both images and tabular data), and multi-label tasks. We have named it Nervus. Based on the PyTorch library, which is suitable for AI for research purposes, we created a four-part model to handle comprehensive inputs and outputs. Nervus consists of four parts. First is the dataloader, then the feature extractor, the feature mixer, and finally the classifier. The dataloader preprocesses the input data, the feature extractor extracts the features between the training data and ground truth labels, the feature mixer mixes the features of the extractors, and the classifier classifies the input data from the feature mixer based on the task. We have created Nervus, which is a comprehensive and flexible model library that is easy to use for medical imaging research and can handle grayscale images, multiple inputs, and multi-label tasks. This will be helpful for researchers in the field of radiology.	Translated: 2022-12-25 02:43:54 Published: 2022-12-12
# Robust and Explainable Identification of Logical Fallacies in Natural Language Arguments ( http://arxiv.org/abs/2212.07425v1 ) License: see link	Zhivar Sourati, Vishnu Priya Prasanna Venkatesh, Darshan Deshpande, Himanshu Rawlani, Filip Ilievski, Hông-Ân Sandlin, Alain Mermoud	The spread of misinformation, propaganda, and flawed argumentation has been amplified in the Internet era. Given the volume of data and the subtlety of identifying violations of argumentation norms, supporting information analytics tasks, like content moderation, with trustworthy methods that can identify logical fallacies is essential. In this paper, we formalize prior theoretical work on logical fallacies into a comprehensive three-stage evaluation framework of detection, coarse-grained, and fine-grained classification. We adapt existing evaluation datasets for each stage of the evaluation. We devise three families of robust and explainable methods based on prototype reasoning, instance-based reasoning, and knowledge injection. The methods are designed to combine language models with background knowledge and explainable mechanisms. Moreover, we address data sparsity with strategies for data augmentation and curriculum learning. Our three-stage framework natively consolidates prior datasets and methods from existing tasks, like propaganda detection, serving as an overarching evaluation testbed. We extensively evaluate these methods on our datasets, focusing on their robustness and explainability. Our results provide insight into the strengths and weaknesses of the methods on different components and fallacy classes, indicating that fallacy identification is a challenging task that may require specialized forms of reasoning to capture various classes. We share our open-source code and data on GitHub to support further work on logical fallacy identification.	Translated: 2022-12-16 15:48:02 Published: 2022-12-12
# Moving Metric Detection and Alerting System at eBay ( http://arxiv.org/abs/2004.02360v2 ) License: see link	Zezhong Zhang, Keyu Nie and Ted Tao Yuan	At eBay, there are thousands of product health metrics for different domain teams to monitor. We built a two-phase alerting system to notify users with actionable alerts based on anomaly detection and alert retrieval. In the first phase, we developed an efficient anomaly detection algorithm, called Moving Metric Detector (MMD), to identify potential alerts among metrics with distribution-agnostic criteria. In the second alert retrieval phase, we built additional logic with feedback to select valid actionable alerts with a point-wise ranking model and business rules. Compared with other trend and seasonality decomposition methods, our decomposer is faster and better at detecting anomalies in unsupervised cases. Our two-phase approach dramatically improves alert precision and avoids alert spamming in eBay production.	Translated: 2022-12-16 06:00:58 Published: 2022-12-12
# Data Leakage via Access Patterns of Sparse Features in Deep Learning-based Recommendation Systems ( http://arxiv.org/abs/2212.06264v1 ) License: see link	Hanieh Hashemi, Wenjie Xiong, Liu Ke, Kiwan Maeng, Murali Annavaram, G. Edward Suh, Hsien-Hsin S. Lee	Online personalized recommendation services are generally hosted in the cloud where users query the cloud-based model to receive recommended input such as merchandise of interest or news feed. State-of-the-art recommendation models rely on sparse and dense features to represent users' profile information and the items they interact with. Although sparse features account for 99% of the total model size, not enough attention has been paid to the potential information leakage through sparse features. These sparse features are employed to track users' behavior, e.g., their click history, object interactions, etc., potentially carrying each user's private information. Sparse features are represented as learned embedding vectors that are stored in large tables, and personalized recommendation is performed by using a specific user's sparse feature to index through the tables. Even with recently proposed methods that hide the computation happening in the cloud, an attacker in the cloud may still be able to track the access patterns to the embedding tables. This paper explores the private information that may be learned by tracking a recommendation model's sparse feature access patterns. We first characterize the types of attacks that can be carried out on sparse features in recommendation models in an untrusted cloud, followed by a demonstration of how each of these attacks leads to extracting users' private information or tracking users by their behavior over time.	Translated: 2022-12-14 15:51:44 Published: 2022-12-12
# Agnostic Learning for Packing Machine Stoppage Prediction in Smart Factories ( http://arxiv.org/abs/2212.06288v1 ) License: see link	Gabriel Filios, Ioannis Katsidimas, Sotiris Nikoletseas, Stefanos H. Panagiotou, Theofanis P. Raptis	The cyber-physical convergence is opening up new business opportunities for industrial operators. The need for deep integration of the cyber and the physical worlds establishes a rich business agenda towards consolidating new system and network engineering approaches. This revolution would not be possible without rich and heterogeneous sources of data, as well as the ability to exploit them intelligently, mainly because data will serve as a fundamental resource to promote Industry 4.0. One of the most fruitful research and practice areas emerging from this data-rich, cyber-physical, smart factory environment is the data-driven process monitoring field, which applies machine learning methodologies to enable predictive maintenance applications. In this paper, we examine popular time series forecasting techniques as well as supervised machine learning algorithms in the applied context of Industry 4.0, by transforming and preprocessing the historical industrial dataset of a packing machine's operational state recordings (real data coming from the production line of a manufacturing plant from the food and beverage domain). In our methodology, we use only a single signal concerning the machine's operational status to make our predictions, without considering other operational variables or fault and warning signals, hence its characterization as "agnostic". In this respect, the results demonstrate that the adopted methods achieve quite promising performance on three targeted use cases.	Translated: 2022-12-14 15:51:18 Published: 2022-12-12
# Learning Disturbances Online for Risk-Aware Control: Risk-Aware Flight with Less Than One Minute of Data ( http://arxiv.org/abs/2212.06253v1 ) License: see link	Prithvi Akella, Skylar X. Wei, Joel W. Burdick, and Aaron D. Ames	Recent advances in safety-critical risk-aware control are predicated on a priori knowledge of the disturbances a system might face. This paper proposes a method to efficiently learn these disturbances online, in a risk-aware context. First, we introduce the concept of a Surface-at-Risk, a risk measure for stochastic processes that extends Value-at-Risk, a commonly utilized risk measure in the risk-aware controls community. Second, we model the norm of the state discrepancy between the model and the true system evolution as a scalar-valued stochastic process and determine an upper bound to its Surface-at-Risk via Gaussian Process Regression. Third, we provide theoretical results on the accuracy of our fitted surface subject to mild assumptions that are verifiable with respect to the data sets collected during system operation. Finally, we experimentally verify our procedure by augmenting a drone's controller and highlight performance increases achieved via our risk-aware approach after collecting less than a minute of operating data.	Translated: 2022-12-14 15:48:41 Published: 2022-12-12
# Forecasting Soil Moisture Using Domain Inspired Temporal Graph Convolution Neural Networks To Guide Sustainable Crop Management ( http://arxiv.org/abs/2212.06565v1 ) License: see link	Muneeza Azmat, Malvern Madondo, Kelsey Dipietro, Raya Horesh, Arun Bawa, Michael Jacobs, Raghavan Srinivasan, Fearghal O'Donncha	Climate change, population growth, and water scarcity present unprecedented challenges for agriculture. This project aims to forecast soil moisture using domain knowledge and machine learning for crop management decisions that enable sustainable farming. Traditional methods for predicting hydrological response features require significant computational time and expertise. Recent work has implemented machine learning models as a tool for forecasting hydrological response features, but these models neglect a crucial component of traditional hydrological modeling: spatially close units can have vastly different hydrological responses. In traditional hydrological modeling, units with similar hydrological properties are grouped together and share model parameters regardless of their spatial proximity. Inspired by this domain knowledge, we have constructed a novel domain-inspired temporal graph convolution neural network. Our approach involves clustering units based on time-varying hydrological properties, constructing graph topologies for each cluster, and forecasting soil moisture using graph convolutions and a gated recurrent neural network. We have trained, validated, and tested our method on field-scale time series data consisting of approximately 99,000 hydrological response units spanning 40 years in a case study in the northeastern United States. Comparison with existing models illustrates the effectiveness of using domain-inspired clustering with time series graph neural networks. The framework is being deployed as part of a pro bono social impact program. The trained models are being deployed on small-holding farms in central Texas.	Translated: 2022-12-14 15:41:41 Published: 2022-12-12
# ROAD: Learning an Implicit Recursive Octree Auto-Decoder to Efficiently Encode 3D Shapes ( http://arxiv.org/abs/2212.06193v1 ) License: see link	Sergey Zakharov, Rares Ambrus, Katherine Liu, Adrien Gaidon	Compact and accurate representations of 3D shapes are central to many perception and robotics tasks. State-of-the-art learning-based methods can reconstruct single objects but scale poorly to large datasets. We present a novel recursive implicit representation to efficiently and accurately encode large datasets of complex 3D shapes by recursively traversing an implicit octree in latent space. Our implicit Recursive Octree Auto-Decoder (ROAD) learns a hierarchically structured latent space enabling state-of-the-art reconstruction results at a compression ratio above 99%. We also propose an efficient curriculum learning scheme that naturally exploits the coarse-to-fine properties of the underlying octree spatial representation. We explore the scaling law relating latent space dimension, dataset size, and reconstruction accuracy, showing that increasing the latent space dimension is enough to scale to large shape datasets. Finally, we show that our learned latent space encodes a coarse-to-fine hierarchical structure yielding reusable latents across different levels of detail, and we provide qualitative evidence of generalization to novel shapes outside the training set.	Translated: 2022-12-14 15:40:25 Published: 2022-12-12
# Quantum Phase Recognition using Quantum Tensor Networks ( http://arxiv.org/abs/2212.06207v1 ) License: see link	Shweta Sahoo, Utkarsh Azad and Harjinder Singh	Machine learning (ML) has recently facilitated many advances in solving problems related to many-body physical systems. Given the intrinsic quantum nature of these problems, it is natural to speculate that quantum-enhanced machine learning will enable us to unveil even greater details than we currently have. With this motivation, this paper examines a quantum machine learning approach based on a shallow variational ansatz inspired by tensor networks for supervised learning tasks. In particular, we first look at standard image classification tasks using the Fashion-MNIST dataset and study the effect of repeating tensor network layers on the ansatz's expressibility and performance. Finally, we use this strategy to tackle the problem of quantum phase recognition for the transverse-field Ising and Heisenberg spin models in one and two dimensions, where we were able to reach $\geq 98\%$ test-set accuracies with both multi-scale entanglement renormalization ansatz (MERA) and tree tensor network (TTN) inspired parametrized quantum circuits.	Translated: 2022-12-14 15:32:09 Published: 2022-12-12
# PathFusion: Path-consistent Lidar-Camera Deep Feature Fusion ( http://arxiv.org/abs/2212.06244v1 ) License: see link	Lemeng Wu, Dilin Wang, Meng Li, Yunyang Xiong, Raghuraman Krishnamoorthi, Qiang Liu, Vikas Chandra	Fusing camera with LiDAR is a promising technique to improve the accuracy of 3D detection due to their complementary physical properties. While most existing methods focus on fusing camera features directly with raw LiDAR point clouds or shallow 3D features, it is observed that direct deep 3D feature fusion achieves inferior accuracy due to feature misalignment. The misalignment that originates from the feature aggregation across large receptive fields becomes increasingly severe for deep network stages. In this paper, we propose PathFusion to enable path-consistent LiDAR-camera deep feature fusion. PathFusion introduces a path consistency loss between shallow and deep features, which encourages the 2D backbone and its fusion path to transform 2D features in a way that is semantically aligned with the transform of the 3D backbone. We apply PathFusion to the prior-art fusion baseline, Focals Conv, and consistently observe more than 1.2% mAP improvements on the nuScenes test split, both with and without test-time augmentations. Moreover, PathFusion also improves KITTI AP3D (R11) by more than 0.6% on the moderate level.	Translated: 2022-12-14 14:49:53 Published: 2022-12-12
# scanents3d: 3dシーンにおける visio-linguistic model の改良 ScanEnts3D: Exploiting Phrase-to-3D-Object Correspondences for Improved Visio-Linguistic Models in 3D Scenes ( http://arxiv.org/abs/2212.06250v1 ) ライセンス: Link先を確認	Ahmed Abdelreheem, Kyle Olszewski, Hsin-Ying Lee, Peter Wonka, Panos Achlioptas	(参考訳) ScanRefer [16]とReferIt3D [3]の2つの人気のあるデータセットは、自然言語を現実世界の3Dデータに結びつける。本稿では,参照文で言及されるすべてのオブジェクトと,その基礎となるインスタンスを3dシーン内で関連付けることで,上記2つを拡張した大規模かつ補完的なデータセットをキュレートする。特に、3d(scanents3d)データセットのスキャンエンティティは、84kの自然参照文にまたがる369kオブジェクト間の明示的な対応を提供し、705の現実世界のシーンをカバーします。重要なのは、この新しいデータセットから学習できる直感的な損失を組み込むことで、Nr3DとScanReferのベンチマークでそれぞれ4.3%と5.0%の改善を含む、最近導入されたいくつかのニューラルリスニングアーキテクチャのパフォーマンスを大幅に改善できることである。さらに,nr3dベンチマークにおけるsitaの13.2cider点の改善を含む3dニューラル話者のトレーニングにより,言語生成タスクの競合ベースラインと最近の手法を実験し,ニューラルリスナーと同様に3dニューラル話者もscanents3dで明らかに有益であることを示す。本研究は,ScanEnts3Dを学習することで,新たに収集したアノテーションをテスト時に提供することなく,より効率的かつ解釈可能な3Dアーキテクチャを実現することができるという結論を強く支持する。プロジェクトのwebページはhttps://scanents3d.github.io/。 The two popular datasets ScanRefer [16] and ReferIt3D [3] connect natural language to real-world 3D data. In this paper, we curate a large-scale and complementary dataset extending both the aforementioned ones by associating all objects mentioned in a referential sentence to their underlying instances inside a 3D scene. Specifically, our Scan Entities in 3D (ScanEnts3D) dataset provides explicit correspondences between 369k objects across 84k natural referential sentences, covering 705 real-world scenes. Crucially, we show that by incorporating intuitive losses that enable learning from this novel dataset, we can significantly improve the performance of several recently introduced neural listening architectures, including improving the SoTA in both the Nr3D and ScanRefer benchmarks by 4.3% and 5.0%, respectively. 
Moreover, we experiment with competitive baselines and recent methods for the task of language generation and show that, as with neural listeners, 3D neural speakers can also noticeably benefit by training with ScanEnts3D, including improving the SoTA by 13.2 CIDEr points on the Nr3D benchmark. Overall, our carefully conducted experimental studies strongly support the conclusion that, by learning on ScanEnts3D, commonly used visio-linguistic 3D architectures can become more efficient and interpretable in their generalization without needing to provide these newly collected annotations at test time. The project's webpage is https://scanents3d.github.io/ .	翻訳日:2022-12-14 14:49:31 公開日:2022-12-12
# 光コヒーレンス断層画像を用いた糖尿病網膜症自動評価法 An Ensemble Method to Automatically Grade Diabetic Retinopathy with Optical Coherence Tomography Angiography Images ( http://arxiv.org/abs/2212.06265v1 ) ライセンス: Link先を確認	Yuhan Zheng, Fuping Wu, Bartłomiej W. Papież	(参考訳) 糖尿病網膜症(DR)は糖尿病の合併症であり、世界人口における視覚障害の主な原因の1つである。 DRの早期発現は、通常非常に軽度で検出が難しいため、眼球スクリーニングによる正確な診断は、後段の視力喪失を防ぐために臨床的に重要である。本研究では,糖尿病網膜症解析チャレンジ(DRAC)2022から入手可能なUW-OCTA画像を用いて,DRを自動的に評価するアンサンブル手法を提案する。まず、最先端の分類ネットワーク、すなわちResNet, DenseNet, EfficientNet, VGGを採用し、利用可能なデータセットの異なる分割でUW-OCTA画像を評価するよう訓練する。最終的に、25モデルを取得し、そのうち上位16モデルを選択してアンサンブルし、最終的な予測を生成する。また、学習過程において、マルチタスク学習戦略についても検討し、モデル性能を改善するために補助的な分類タスクである画像品質評価を追加する。最終アンサンブルモデルでは,内部テストデータセットでは0.9346の2次重み付きカッパ(QWK)と0.9766のエリアアンダーカーブ(AUC),DRACチャレンジテストデータセットでは0.839のQWKと0.8978のAUCを達成した。 Diabetic retinopathy (DR) is a complication of diabetes, and one of the major causes of vision impairment in the global population. As the early-stage manifestation of DR is usually very mild and hard to detect, an accurate diagnosis via eye-screening is clinically important to prevent vision loss at later stages. In this work, we propose an ensemble method to automatically grade DR using ultra-wide optical coherence tomography angiography (UW-OCTA) images available from Diabetic Retinopathy Analysis Challenge (DRAC) 2022. First, we adopt the state-of-the-art classification networks, i.e., ResNet, DenseNet, EfficientNet, and VGG, and train them to grade UW-OCTA images with different splits of the available dataset. Ultimately, we obtain 25 models, of which the top 16 models are selected and ensembled to generate the final predictions. During the training process, we also investigate the multi-task learning strategy, and add an auxiliary classification task, the Image Quality Assessment, to improve the model performance. 
Our final ensemble model achieved a quadratic weighted kappa (QWK) of 0.9346 and an Area Under Curve (AUC) of 0.9766 on the internal testing dataset, and the QWK of 0.839 and the AUC of 0.8978 on the DRAC challenge testing dataset.	翻訳日:2022-12-14 14:49:01 公開日:2022-12-12
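The diabetic retinopathy entry above reports a quadratic weighted kappa (QWK). As a minimal sketch of how that metric is commonly defined for integer grades (this is not the authors' evaluation code; the function name and arguments are hypothetical, and it assumes labels in `0..num_classes-1` with at least two distinct grades present):

```python
def quadratic_weighted_kappa(y_true, y_pred, num_classes):
    """Cohen's kappa with quadratic disagreement weights (illustrative sketch)."""
    n = len(y_true)
    # Observed confusion matrix: observed[i][j] counts true grade i predicted as j.
    observed = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    row_totals = [sum(observed[i]) for i in range(num_classes)]
    col_totals = [sum(observed[i][j] for i in range(num_classes))
                  for j in range(num_classes)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            # Quadratic weight: larger penalty for predictions far from the true grade.
            weight = (i - j) ** 2 / (num_classes - 1) ** 2
            weighted_observed += weight * observed[i][j]
            # Expected count if true and predicted grades were independent.
            weighted_expected += weight * row_totals[i] * col_totals[j] / n
    # Undefined (division by zero) if every label falls in a single class.
    return 1.0 - weighted_observed / weighted_expected


perfect = quadratic_weighted_kappa([0, 1, 2, 0, 1, 2], [0, 1, 2, 0, 1, 2], 3)
reversed_grades = quadratic_weighted_kappa([0, 1, 2], [2, 1, 0], 3)
```

Perfect agreement gives a value of 1, and chance-level agreement gives a value near 0, so the reported 0.9346 and 0.839 sit close to the top of the scale.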
# ビデオオブジェクトセグメンテーションにおける「オブジェクト」の分解 Breaking the "Object" in Video Object Segmentation ( http://arxiv.org/abs/2212.06200v1 ) ライセンス: Link先を確認	Pavel Tokmakov, Jie Li, Adrien Gaidon	(参考訳) 物体の外観は、それが変形するときに浮かび上がることがある。卵が折れたり、紙が破れてしまうと、その色、形、テクスチャが劇的に変化し、アイデンティティ自体を除いてオリジナルのものはほとんど保存されない。しかし、この重要な現象は既存のvos(video object segmentation)ベンチマークにはほとんど及ばない。本研究では,ビデオオブジェクトセグメンテーションのための新しいデータセットを変換(VOST)下で収集することで,そのギャップを埋める。 700以上の高解像度ビデオで構成され、さまざまな環境で撮影され、平均20秒の長さで、インスタンスマスクでラベル付けされている。これらのビデオは、複雑なオブジェクト変換に焦点を合わせ、その完全な時間的範囲を捉えるために、注意深いマルチステップのアプローチが採用されている。次に、最先端のVOS手法を広く評価し、多くの重要な発見を行う。特に,本課題に適用された場合,既存の手法は困難であり,その主な限界は静的な外観上の過度な信頼にあることを示す。これにより、時空間情報のモデリングを改善することにより、その能力を改善するトップパフォーマンスベースラインのいくつかの変更を提案する動機付けとなります。しかし、より広範に、より堅牢なビデオオブジェクト表現の学習に関する議論を刺激することを期待している。 The appearance of an object can be fleeting when it transforms. As eggs are broken or paper is torn, their color, shape and texture can change dramatically, preserving virtually nothing of the original except for the identity itself. Yet, this important phenomenon is largely absent from existing video object segmentation (VOS) benchmarks. In this work, we close the gap by collecting a new dataset for Video Object Segmentation under Transformations (VOST). It consists of more than 700 high-resolution videos, captured in diverse environments, which are 20 seconds long on average and densely labeled with instance masks. A careful, multi-step approach is adopted to ensure that these videos focus on complex object transformations, capturing their full temporal extent. We then extensively evaluate state-of-the-art VOS methods and make a number of important discoveries. In particular, we show that existing methods struggle when applied to this novel task and that their main limitation lies in over-reliance on static appearance cues. This motivates us to propose a few modifications for the top-performing baseline that improve its capabilities by better modeling spatio-temporal information. 
But more broadly, the hope is to stimulate discussion on learning more robust video object representations.	翻訳日:2022-12-14 14:37:56 公開日:2022-12-12
# 2つの正しいオブジェクト認識: 視覚的合理的な理由 Doubly Right Object Recognition: A Why Prompt for Visual Rationales ( http://arxiv.org/abs/2212.06202v1 ) ライセンス: Link先を確認	Chengzhi Mao, Revant Teotia, Amrutha Sundar, Sachit Menon, Junfeng Yang, Xin Wang, Carl Vondrick	(参考訳) 多くの視覚認識モデルは、それらが強い性能を得る指標である分類精度のみに基づいて評価される。本稿では,コンピュータビジョンモデルが予測に正しい根拠を与えることができるかどうかを考察する。そこで、モデルが正しいラベルと正しい根拠の両方を同時に生成することを指標が要求する「doubly right」オブジェクト認識ベンチマークを提案する。CLIPのような最先端の視覚モデルは、カテゴリ予測に対して不正確な根拠を与えることが多い。しかし、適切に設計されたデータセットを通じて言語モデルから視覚表現へ根拠を転移することにより、大規模な視覚表現を適応させて正しい根拠を生成する「なぜプロンプト」を学習できることを示す。可視化と実証実験により、我々のプロンプトはdoubly rightオブジェクト認識の性能を大幅に向上させ、未知のタスクやデータセットへのゼロショット転移も改善することを示す。 Many visual recognition models are evaluated only on their classification accuracy, a metric for which they obtain strong performance. In this paper, we investigate whether computer vision models can also provide correct rationales for their predictions. We propose a "doubly right" object recognition benchmark, where the metric requires the model to simultaneously produce both the right labels and the right rationales. We find that state-of-the-art visual models, such as CLIP, often provide incorrect rationales for their categorical predictions. However, by transferring the rationales from language models into visual representations through a tailored dataset, we show that we can learn a "why prompt," which adapts large visual representations to produce correct rationales. Visualizations and empirical experiments show that our prompts significantly improve performance on doubly right object recognition, in addition to zero-shot transfer to unseen tasks and datasets.	翻訳日:2022-12-14 14:37:35 公開日:2022-12-12
# 文脈記述可能なビデオ表現: 人間の知覚に基づく理解 Contextual Explainable Video Representation: Human Perception-based Understanding ( http://arxiv.org/abs/2212.06206v1 ) ライセンス: Link先を確認	Khoa Vo, Kashu Yamazaki, Phong X. Nguyen, Phat Nguyen, Khoa Luu, Ngan Le	(参考訳) 映像理解は、行動検出、行動認識、ビデオキャプション、ビデオ検索など、空間的情報と時間的情報の両方を理解するための多くの興味深いタスクを含む、強烈な研究の対象となっている。ビデオ理解における最も困難な問題の1つは特徴抽出(例えば、制約のないビデオの長く複雑な時間構造のために与えられたビデオから文脈的視覚表現を抽出する)を扱うことである。事前学習されたバックボーンネットワークをブラックボックスとして視覚的表現を抽出する既存のアプローチとは異なり、本手法は説明可能なメカニズムで最も文脈的な情報を抽出することを目的としている。私たちが観察したように、人間は通常、アクタ、関連するオブジェクト、および周囲の環境という3つの主要な要因の相互作用を通してビデオを知覚する。したがって,それぞれの要因を抽出し,それらの関係をモデル化する,文脈的に説明可能な映像表現抽出を設計することが極めて重要である。本稿では,人間の知覚過程をアクタ,物体,環境のモデリングに組み込む手法について述べる。映像理解における人間の知覚に基づく文脈表現の有効性を説明するために,映像段落キャプションと時間的行動検出を選択する。ソースコードはhttps://github.com/UARK-AICV/Video_Representationで公開されている。 Video understanding is a growing field and a subject of intense research, which includes many interesting tasks for understanding both spatial and temporal information, e.g., action detection, action recognition, video captioning, and video retrieval. One of the most challenging problems in video understanding is dealing with feature extraction, i.e., extracting contextual visual representations from a given untrimmed video, owing to the long and complicated temporal structure of unconstrained videos. Different from existing approaches, which apply a pre-trained backbone network as a black box to extract visual representations, our approach aims to extract the most contextual information with an explainable mechanism. As we observed, humans typically perceive a video through the interactions between three main factors, i.e., the actors, the relevant objects, and the surrounding environment. Therefore, it is crucial to design a contextual explainable video representation extraction that can capture each of these factors and model the relationships between them. 
In this paper, we discuss approaches that incorporate the human perception process into modeling actors, objects, and the environment. We choose video paragraph captioning and temporal action detection to illustrate the effectiveness of human perception-based contextual representation in video understanding. Source code is publicly available at https://github.com/UARK-AICV/Video_Representation.	翻訳日:2022-12-14 14:37:18 公開日:2022-12-12
# 新しい脆弱歩行者データセットにおける深部物体検出器の比較 Comparison Of Deep Object Detectors On A New Vulnerable Pedestrian Dataset ( http://arxiv.org/abs/2212.06218v1 ) ライセンス: Link先を確認	Devansh Sharma, Tihitina Hade, Qing Tian	(参考訳) 歩行者の安全は自動運転の主要な関心事である。今日の歩行者データセットにおける脆弱なグループの表現不足は、脆弱な道路ユーザのデータセットに対する緊急の必要性を示している。本稿では、まず、BG Vulnerable Pedestrian(BGVP)データセットという、脆弱な歩行者検出データセットを導入し、バランスの取れたモデルの訓練を支援し、脆弱な歩行者検出の有効性を高める研究を促す。データセットには、障害のない子供、障害のない高齢者、障害あり、非脆弱性の4つのクラスが含まれている。このデータセットはパブリックドメインから収集された画像と手動で注釈付けされたバウンディングボックスで構成されている。さらに,提案したデータセットを用いて,YOLOv4,YOLOv5,YOLOX,Faster R-CNN,EfficientDetという,5つの最先端オブジェクト検出モデルのトレーニングとテストを行った。その結果,YOLOXとYOLOv4はデータセット上で最高の成績を示し,mAP 0.5ではYOLOv4が0.7999,YOLOXが0.7779を記録し,mAP 0.5:0.95ではYOLOXがYOLOv4を3.8%上回った。一般的に、5つの検出器はいずれも「障害あり」クラスをよく予測し、「障害のない高齢者」クラスでは性能が低い。 YOLOX はクラスごとの mAP (0.5:0.95) で他の検出器を常に上回り、障害のない子供、障害のない高齢者、非脆弱性、障害ありの各クラスでそれぞれ 0.5644, 0.5242, 0.4781, 0.6796 を得る。私たちのデータセットとコードはhttps://github.com/devvansh1997/BGVPで利用可能です。 Pedestrian safety is one primary concern in autonomous driving. The under-representation of vulnerable groups in today's pedestrian datasets points to an urgent need for a dataset of vulnerable road users. In this paper, we first introduce a new vulnerable pedestrian detection dataset, the BG Vulnerable Pedestrian (BGVP) dataset, to help train well-rounded models and thus encourage research that increases the efficacy of vulnerable pedestrian detection. The dataset includes four classes, i.e., Children Without Disability, Elderly Without Disability, With Disability, and Non-Vulnerable. This dataset consists of images collected from the public domain and manually-annotated bounding boxes. In addition, on the proposed dataset, we have trained and tested five state-of-the-art object detection models, i.e., YOLOv4, YOLOv5, YOLOX, Faster R-CNN, and EfficientDet. 
Our results indicate that YOLOX and YOLOv4 perform the best on our dataset, YOLOv4 scoring 0.7999 and YOLOX scoring 0.7779 on the mAP 0.5 metric, while YOLOX outperforms YOLOv4 by 3.8 percent on the mAP 0.5:0.95 metric. Generally speaking, all five detectors do well predicting the With Disability class and perform poorly in the Elderly Without Disability class. YOLOX consistently outperforms all other detectors on the mAP (0.5:0.95) per class metric, obtaining 0.5644, 0.5242, 0.4781, and 0.6796 for Children Without Disability, Elderly Without Disability, Non-vulnerable, and With Disability, respectively. Our dataset and codes are available at https://github.com/devvansh1997/BGVP.	翻訳日:2022-12-14 14:36:56 公開日:2022-12-12
# 生データから視覚と聴覚の表現を共同学習する Jointly Learning Visual and Auditory Speech Representations from Raw Data ( http://arxiv.org/abs/2212.06246v1 ) ライセンス: Link先を確認	Alexandros Haliassos, Pingchuan Ma, Rodrigo Mira, Stavros Petridis, Maja Pantic	(参考訳) 視覚と聴覚の表現を協調的に学習する自己教師型マルチモーダルアプローチであるRAVEnを提案する。事前学習の目的は,マスキング入力を符号化し,ゆるやかに変化する運動量エンコーダによって生成された文脈的目標を予測することである。映像と音声の相違により、我々の設計は2つのモダリティのプリテキストタスクに関して非対称である:聴覚ストリームは視覚的目標と聴覚的目標の両方を予測するが、視覚ストリームは聴覚的目標のみを予測する。我々は,1つの事前学習段階から得られる視覚的および聴覚的エンコーダを微調整し,エンコーダを協調的に訓練する際の,低・高リソースなラベル付きデータ設定の強い結果を観察した。特に、RAVEnは、LRS3上の視覚音声認識(VSR)に関する全ての自己教師あり手法を超越し、RAVEnと自己訓練を組み合わせることで、わずか30時間のラベル付きデータを使用して、90,000時間の非公開データに基づいてトレーニングされた最近の半教師あり手法よりも優れています。同時に、LRS3の低リソース設定において、聴覚音声認識(およびVSR)で最先端の結果を達成している。本研究は,手作りの特徴に頼らずに,生の映像や音声から強力な音声表現を学習できることを示す。コードとモデルは公開されます。 We present RAVEn, a self-supervised multi-modal approach to jointly learn visual and auditory speech representations. Our pre-training objective involves encoding masked inputs, and then predicting contextualised targets generated by slowly-evolving momentum encoders. Driven by the inherent differences between video and audio, our design is asymmetric w.r.t. the two modalities' pretext tasks: whereas the auditory stream predicts both the visual and auditory targets, the visual one predicts only the auditory targets. We observe strong results in low- and high-resource labelled data settings when fine-tuning the visual and auditory encoders resulting from a single pre-training stage, in which the encoders are jointly trained. Notably, RAVEn surpasses all self-supervised methods on visual speech recognition (VSR) on LRS3, and combining RAVEn with self-training using only 30 hours of labelled data even outperforms a recent semi-supervised method trained on 90,000 hours of non-public data. At the same time, we achieve state-of-the-art results in the LRS3 low-resource setting for auditory speech recognition (as well as for VSR). 
Our findings point to the viability of learning powerful speech representations entirely from raw video and audio, i.e., without relying on handcrafted features. Code and models will be made public.	翻訳日:2022-12-14 14:21:21 公開日:2022-12-12
# 変異を利用したゲノムデータを用いたニューラルネットワークの解釈可能性評価 Utilizing Mutations to Evaluate Interpretability of Neural Networks on Genomic Data ( http://arxiv.org/abs/2212.06151v1 ) ライセンス: Link先を確認	Utku Ozbulak, Solha Kang, Jasper Zuallaert, Stephen Depuydt, Joris Vankerschaver	(参考訳) 深層ニューラルネットワーク(DNN)はゲノムデータに関わる多くの問題に対して最先端の結果を達成しているが、DNNに意思決定プロセスを説明することは、ブラックボックスの性質のために大きな課題となっている。 DNNに予測の推論を説明する1つの方法は、最も予測に寄与する入力の部分を強調すると仮定される帰属法である。多くの帰属法の存在とそれらの方法の忠実度に関する定量的な結果の欠如を踏まえ、列ベースタスクに対する帰属法の選択は質的に行われている。本研究では,点突然変異を利用した計算手法を提案することにより,最も忠実な帰属法を特定するための一歩を踏み出した。 7つの一般的な帰属法について定量的な結果が得られ,LRPは翻訳開始に最も適しており,LRPは翻訳の2つの重要な生物学的特徴であるコザック配列の整合性および早期停止コドンの有害な影響を同定している。 Even though deep neural networks (DNNs) achieve state-of-the-art results for a number of problems involving genomic data, getting DNNs to explain their decision-making process has been a major challenge due to their black-box nature. One way to get DNNs to explain their reasoning for prediction is via attribution methods which are assumed to highlight the parts of the input that contribute to the prediction the most. Given the existence of numerous attribution methods and a lack of quantitative results on the fidelity of those methods, selection of an attribution method for sequence-based tasks has been mostly done qualitatively. In this work, we take a step towards identifying the most faithful attribution method by proposing a computational approach that utilizes point mutations. Providing quantitative results on seven popular attribution methods, we find Layerwise Relevance Propagation (LRP) to be the most appropriate one for translation initiation, with LRP identifying two important biological features for translation: the integrity of Kozak sequence as well as the detrimental effects of premature stop codons.	翻訳日:2022-12-14 14:11:13 公開日:2022-12-12
# 近似探索データ解析の強化 Reinforced Approximate Exploratory Data Analysis ( http://arxiv.org/abs/2212.06225v1 ) ライセンス: Link先を確認	Shaddy Garg, Subrata Mitra, Tong Yu, Yash Gadhia, Arjun Kashettiwar	(参考訳) 探索的データ分析(exploratory data analytics、EDA)は、アナリストがそれに続くクエリを選択して、過去のクエリとそれに対応する結果に基づいて興味深い洞察を導き出す、逐次的な意思決定プロセスである。データ処理システムは、低レイテンシで結果を生成するために、しばしばサンプルでクエリを実行する。異なるダウンサンプリング戦略は、データの異なる統計を保存し、異なる大きさの遅延減少を持つ。サンプリング戦略の最適選択は分析フローの特定の文脈と分析者の隠れた意図に依存することが多い。本稿では,対話型データ探索におけるサンプリングの影響を,近似誤差を導入する際に初めて検討する。本稿では, サンプル選択を最適化し, 分析および洞察フローの持続性を維持するための, 深層強化学習(DRL)に基づくフレームワークを提案する。 3つの実データセットを用いて評価した結果,本手法は,ベースライン法と比較して,相互作用遅延を改善しつつ,元の洞察生成フローを維持可能であることがわかった。 Exploratory data analytics (EDA) is a sequential decision-making process in which analysts choose subsequent queries that might lead to interesting insights, based on the previous queries and their corresponding results. Data processing systems often execute the queries on samples to produce results with low latency. Different downsampling strategies preserve different statistics of the data and yield latency reductions of different magnitudes. The optimum choice of sampling strategy often depends on the particular context of the analysis flow and the hidden intent of the analyst. In this paper, we are the first to consider the impact of sampling in interactive data exploration settings, where it introduces approximation errors. We propose a Deep Reinforcement Learning (DRL) based framework which can optimize the sample selection in order to keep the analysis and insight generation flow intact. Evaluations with 3 real datasets show that our technique can preserve the original insight generation flow while improving the interaction latency, compared to baseline methods.	翻訳日:2022-12-14 14:10:53 公開日:2022-12-12
# テキストマイニングとソーシャルメディア分析に基づく地震影響解析 Earthquake Impact Analysis Based on Text Mining and Social Media Analytics ( http://arxiv.org/abs/2212.06765v1 ) ライセンス: Link先を確認	Zhe Zheng, Hong-Zheng Shi, Yu-Cheng Zhou, Xin-Zheng Lu, Jia-Rui Lin	(参考訳) 地震は広い範囲に深く影響し、緊急救助活動は災害の範囲や範囲に関するソーシャルメディアの情報から恩恵を受ける可能性がある。そこで本研究では,早期地震影響解析のためのソーシャルメディアデータを収集・分析するためのテキストマイニング手法を提案する。まず、災害関連マイクロブログをクローラ技術に基づくSinaマイクロブログから収集する。そして、データをクリーニングした後、(1)ホットワード分析、(2)マイクロブログ数の動向、(3)世論感情の傾向、(4)地震影響分析のためのキーワードおよび規則に基づくテキスト分類を含む一連の分析を行う。最後に,中国におけるマグニチュードと震源深度が同じ2つの最近の地震を解析し,その影響を比較した。その結果, 世論の傾向分析と世論の傾向は, 早期に地震の社会的影響を推定し, 意思決定・救助管理に有効であることが示唆された。 Earthquakes have a deep impact on wide areas, and emergency rescue operations may benefit from social media information about the scope and extent of the disaster. Therefore, this work presents a text mining-based approach to collect and analyze social media data for early earthquake impact analysis. First, disaster-related microblogs are collected from the Sina microblog based on crawler technology. Then, after data cleaning, a series of analyses are conducted, including (1) the hot words analysis, (2) the trend of the number of microblogs, (3) the trend of public opinion sentiment, and (4) a keyword and rule-based text classification for earthquake impact analysis. Finally, two recent earthquakes with the same magnitude and focal depth in China are analyzed to compare their impacts. The results show that the public opinion trend analysis and the trend of public opinion sentiment can estimate the earthquake's social impact at an early stage, which will be helpful to decision-making and rescue management.	翻訳日:2022-12-14 14:01:09 公開日:2022-12-12
# 自己回帰バンディット Autoregressive Bandits ( http://arxiv.org/abs/2212.06251v1 ) ライセンス: Link先を確認	Francesco Bacchiocchi, Gianmarco Genalti, Davide Maran, Marco Mussi, Marcello Restelli, Nicola Gatti, Alberto Maria Metelli	(参考訳) 自己回帰的なプロセスは、株式市場、売上予測、天気予報、広告、価格など、様々な現実世界のシナリオで自然に発生する。このような文脈でシーケンシャルな意思決定問題に対処する場合、連続的な観測間の時間的依存は最適決定ポリシーに収束するために適切に考慮すべきである。そこで本研究では,観測される報酬が次数$k$の自己回帰過程に従い、そのパラメータがエージェントが有限個の$n$個のアクションの中から選択するアクションに依存する、自己回帰的バンディット(Autoregressive Bandits, ARBs)という新しいオンライン学習設定を提案する。次に、楽観的な後悔最小化アルゴリズムAutoRegressive Upper Confidence Bounds(AR-UCB)を考案し、その後悔が$\widetilde{\mathcal{O}} \left( \frac{(k+1)^{3/2}\sqrt{nT}}{(1-\Gamma)^2} \right)$のオーダーであることを示す。ここで$T$は最適化の地平線、$\Gamma < 1$はシステムの安定性の指標である。最後に,提案手法の利点を示す一般および特定目的のバンディットベースラインと比較し,いくつかの合成および1つの実世界環境での数値検証を行う。 Autoregressive processes naturally arise in a large variety of real-world scenarios, including, e.g., stock markets, sales forecasting, weather prediction, advertising, and pricing. When addressing a sequential decision-making problem in such a context, the temporal dependence between consecutive observations should be properly accounted for in order to converge to the optimal decision policy. In this work, we propose a novel online learning setting, named Autoregressive Bandits (ARBs), in which the observed reward follows an autoregressive process of order $k$, whose parameters depend on the action the agent chooses, within a finite set of $n$ actions. Then, we devise an optimistic regret minimization algorithm, AutoRegressive Upper Confidence Bounds (AR-UCB), that suffers regret of order $\widetilde{\mathcal{O}} \left( \frac{(k+1)^{3/2}\sqrt{nT}}{(1-\Gamma)^2} \right)$, where $T$ is the optimization horizon and $\Gamma < 1$ is an index of the stability of the system. Finally, we present a numerical validation in several synthetic settings and one real-world setting, in comparison with general and specific purpose bandit baselines, showing the advantages of the proposed approach.	翻訳日:2022-12-14 13:53:59 公開日:2022-12-12
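To make the reward model in the Autoregressive Bandits entry above concrete, here is a rough sketch of an order-$k$ autoregressive reward whose coefficients depend on the chosen action. The abstract does not give the exact form, so the linear recursion, the Gaussian noise, and every name here (`ar_reward`, `action_params`) are illustrative assumptions rather than the paper's definition, and this is the environment only, not AR-UCB.

```python
import random


def ar_reward(history, params, noise_std=0.0, rng=None):
    """One step of an assumed AR(k) reward model (hypothetical form).

    history: past rewards, most recent last.
    params:  (gamma_0, [gamma_1, ..., gamma_k]) for the action just chosen.
    """
    gamma_0, coeffs = params
    k = len(coeffs)
    # Align gamma_1 with x_{t-1}, gamma_2 with x_{t-2}, and so on.
    recent = history[-k:][::-1] if k > 0 else []
    mean = gamma_0 + sum(g * x for g, x in zip(coeffs, recent))
    noise = rng.gauss(0.0, noise_std) if (rng is not None and noise_std > 0) else 0.0
    return mean + noise


# Two hypothetical actions with different AR(2) parameters.
action_params = {
    0: (0.5, [0.5, 0.25]),
    1: (1.0, [0.1, 0.1]),
}

noiseless = ar_reward([1.0, 2.0], action_params[0])

# Simulate a short trajectory under a fixed (arbitrary) action sequence.
rng = random.Random(0)
history = [0.0, 0.0]
for action in [0, 0, 1, 1, 0]:
    history.append(ar_reward(history, action_params[action], noise_std=0.1, rng=rng))
```

Because each reward feeds into later ones, an action's value depends on the recent reward history as well as on the action itself, which is the temporal dependence the abstract says must be accounted for.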
# ディープラーニングのための合成画像データ Synthetic Image Data for Deep Learning ( http://arxiv.org/abs/2212.06232v1 ) ライセンス: Link先を確認	Jason W. Anderson, Marcin Ziolkowski, Ken Kennedy, Amy W. Apon	(参考訳) 3dモデルからレンダリングされた現実的な合成画像データは、画像セットの拡張と画像分類のセマンティクスセグメンテーションモデルのトレーニングに使用できる。本研究では,実車の生産3次元CADモデルに基づく大規模合成データセットを,高品質な物理ベースレンダリングとドメインランダム化により効率的に作成する方法について検討する。このデータセットを用いて、u-netおよびdouble-u-netモデルを用いた合成拡張の有効性を定量化する。この領域では, 合成画像は, 限られた実データ集合を増強する有効な手法であることがわかった。純合成画像上で訓練されたモデルでは,実際の検証画像上では平均IoUが極めて低かった。また,合成データセットに非常に少量の実画像を追加すると精度が大幅に向上し,合成画像で拡張されたデータセットで学習したモデルの方が実画像単独で訓練したモデルよりも精度が高かった。最後に, インクリメンタルトレーニングやモデル特殊化の恩恵を受けるユースケースでは, 合成画像のベースモデルを事前訓練することで, 転送学習のトレーニングコストが大幅に削減され, モデルトレーニングの最大90%をフロントロードできることがわかった。 Realistic synthetic image data rendered from 3D models can be used to augment image sets and train image classification semantic segmentation models. In this work, we explore how high quality physically-based rendering and domain randomization can efficiently create a large synthetic dataset based on production 3D CAD models of a real vehicle. We use this dataset to quantify the effectiveness of synthetic augmentation using U-net and Double-U-net models. We found that, for this domain, synthetic images were an effective technique for augmenting limited sets of real training data. We observed that models trained on purely synthetic images had a very low mean prediction IoU on real validation images. We also observed that adding even very small amounts of real images to a synthetic dataset greatly improved accuracy, and that models trained on datasets augmented with synthetic images were more accurate than those trained on real images alone. Finally, we found that in use cases that benefit from incremental training or model specialization, pretraining a base model on synthetic images provided a sizeable reduction in the training cost of transfer learning, allowing up to 90\% of the model training to be front-loaded.	翻訳日:2022-12-14 13:52:24 公開日:2022-12-12
# テスト時間適応とトレーニング時間一般化:キーポイント推定を用いたヒトインスタンス分割のケーススタディ Test-time Adaptation vs. Training-time Generalization: A Case Study in Human Instance Segmentation using Keypoints Estimation ( http://arxiv.org/abs/2212.06242v1 ) ライセンス: Link先を確認	Kambiz Azarian, Debasmit Das, Hyojin Park, Fatih Porikli	(参考訳) キーポイント推定を用いて,与えられたテスト画像のヒトインスタンスセグメンテーションマスク品質を改善する問題を考える。 2つのアプローチを比較します。第1のアプローチはテスト時間適応(TTA)法であり、単一のラベルのないテスト画像を用いてセグメント化ネットワークの重みをテスト時間修正できる。このアプローチでは、ラベル付きソースデータセットへのテスト時間アクセスを前提としません。具体的には、キーポイント推定値を擬似ラベルとして使用し、バックボーン重みを調整するためにバックプロパゲートする。第2のアプローチは、トレーニング時一般化(TTG)手法であり、ラベル付きソースデータセットへのオフラインアクセスを許可するが、重みのテスト時修正は許可しない。さらに、対象領域に関する画像や知識が利用できるとは想定していません。 TTG法は,キーポイントヘッドが生成したバックボーンの特徴を増強し,アグリゲートベクトルをマスクヘッドに供給することで構成する。包括的アブリケーションを通じて、両アプローチを評価し、TTAゲインを制限するいくつかの要因を特定する。特に、大きなドメインシフトがなければ、TTAは損傷し、TTGはパフォーマンスがわずかに向上することを示し、一方、大きなドメインシフトでは、TTAゲインはより小さく、使用したヒューリスティックに依存し、TTGゲインはより大きく、アーキテクチャ上の選択に対して堅牢であることを示す。 We consider the problem of improving the human instance segmentation mask quality for a given test image using keypoints estimation. We compare two alternative approaches. The first approach is a test-time adaptation (TTA) method, where we allow test-time modification of the segmentation network's weights using a single unlabeled test image. In this approach, we do not assume test-time access to the labeled source dataset. More specifically, our TTA method consists of using the keypoints estimates as pseudo labels and backpropagating them to adjust the backbone weights. The second approach is a training-time generalization (TTG) method, where we permit offline access to the labeled source dataset but not the test-time modification of weights. Furthermore, we do not assume the availability of any images from or knowledge about the target domain. Our TTG method consists of augmenting the backbone features with those generated by the keypoints head and feeding the aggregate vector to the mask head. 
Through a comprehensive set of ablations, we evaluate both approaches and identify several factors limiting the TTA gains. In particular, we show that in the absence of a significant domain shift, TTA may hurt and TTG shows only a small gain in performance, whereas for a large domain shift, TTA gains are smaller and dependent on the heuristics used, while TTG gains are larger and robust to architectural choices.	翻訳日:2022-12-14 13:52:04 公開日:2022-12-12
# スプリアス相関を修正するには、適切な埋め込み抽出子だけでよい You Only Need a Good Embeddings Extractor to Fix Spurious Correlations ( http://arxiv.org/abs/2212.06254v1 ) ライセンス: Link先を確認	Raghav Mehta, Vítor Albiero, Li Chen, Ivan Evtimov, Tamar Glaser, Zhiheng Li, Tal Hassner	(参考訳) トレーニングデータのスプリアス相関は、モデルがショートカットとして使用することを学ぶと、しばしば堅牢性の問題を引き起こす。例えば、オブジェクトが牛であるかどうかを予測する場合、モデルはその緑の背景に依存することを学べるので、砂浜の背景の牛ではうまくいかない。この問題を軽減する方法に関する最先端の測定のための標準データセットは、Waterbirdsである。ベストメソッド(Group Distributionally Robust Optimization - GroupDRO)は、現在、89%の最悪グループ精度を達成しており、生画像のスクラッチからの標準トレーニングは72%しか得られない。 GroupDROは、サブグループラベルを使ってエンドツーエンドでモデルをトレーニングする必要がある。本稿では,大規模な事前学習済み視覚モデル抽出器からの埋め込みを単純に使用し,その上に線形分類器を訓練することにより,トレーニングセットのサブグループ情報を用いることなく,最大90%の精度が得られることを示す。事前学習モデルと事前学習データセットに関する実験により,事前学習モデルのキャパシティと事前学習データセットのサイズが重要であることを示す。我々の実験では、高容量の視覚変換器は高容量の畳み込みニューラルネットワークよりも優れた性能を示し、より大きな事前学習データセットは、スプリアス相関データセット上でより良い最悪グループ精度をもたらす。 Spurious correlations in training data often lead to robustness issues since models learn to use them as shortcuts. For example, when predicting whether an object is a cow, a model might learn to rely on its green background, so it would do poorly on a cow on a sandy background. A standard dataset for measuring the state of the art of methods that mitigate this problem is Waterbirds. The best method (Group Distributionally Robust Optimization - GroupDRO) currently achieves 89% worst-group accuracy, whereas standard training from scratch on raw images reaches only 72%. GroupDRO requires training a model in an end-to-end manner with subgroup labels. In this paper, we show that we can achieve up to 90% accuracy without using any sub-group information in the training set by simply using embeddings from a large pre-trained vision model extractor and training a linear classifier on top of it. With experiments on a wide range of pre-trained models and pre-training datasets, we show that the capacity of the pre-training model and the size of the pre-training dataset matter. 
Our experiments reveal that high-capacity vision transformers perform better than high-capacity convolutional neural networks, and that a larger pre-training dataset leads to better worst-group accuracy on the spurious correlation dataset.	翻訳日:2022-12-14 13:51:38 公開日:2022-12-12
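The spurious-correlation entry above is scored by worst-group accuracy. A minimal sketch of that metric follows, assuming each example carries a group identifier (for instance a label and background combination); the function and variable names are hypothetical and this is not the authors' evaluation code.

```python
from collections import defaultdict


def worst_group_accuracy(y_true, y_pred, groups):
    """Return (lowest per-group accuracy, dict of accuracy per group)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for label, prediction, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(label == prediction)
    per_group = {g: correct[g] / total[g] for g in total}
    # The metric is the accuracy of the group the model handles worst.
    return min(per_group.values()), per_group


# Toy example: group "a" is half right, group "b" is fully right.
worst, per_group = worst_group_accuracy(
    y_true=[1, 1, 0, 0],
    y_pred=[1, 0, 0, 0],
    groups=["a", "a", "b", "b"],
)
```

In this toy example the overall accuracy is 0.75 while the worst-group accuracy is 0.5, which is why the metric exposes reliance on shortcuts that average accuracy can hide.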
# モデル拡張によるデータセット蒸留の促進 Accelerating Dataset Distillation via Model Augmentation ( http://arxiv.org/abs/2212.06152v1 ) ライセンス: Link先を確認	Lei Zhang, Jie Zhang, Bowen Lei, Subhabrata Mukherjee, Xiang Pan, Bo Zhao, Caiwen Ding, Yao Li, Dongkuan Xu	(参考訳) 新たな分野であるデータセット蒸留(DD)は、大規模データからはるかに小さく高品質な合成データセットを生成することを目的としている。勾配マッチングに基づく既存のDD手法は、先行性能を達成するが、数千のランダム初期化モデルの間でデータセットを継続的に最適化する必要があるため、非常に計算集約的である。本稿では,多種多様なモデルを用いた合成データの学習が一般化性能の向上につながると仮定する。そこで本稿では, 学習コストを大幅に削減しつつ情報量の多い合成集合を学習するために, early-stage modelsの利用とweight perturbationという2つのモデル拡張(model augmentation)手法を提案する。実験の結果,提案手法は最大20$\times$の高速化と,最先端のベースライン法と同等の性能を達成できた。 Dataset Distillation (DD), a newly emerging field, aims at generating much smaller and high-quality synthetic datasets from large ones. Existing DD methods based on gradient matching achieve leading performance; however, they are extremely computationally intensive as they require continuously optimizing a dataset among thousands of randomly initialized models. In this paper, we assume that training the synthetic data with diverse models leads to better generalization performance. Thus we propose two model augmentation techniques, i.e., using early-stage models and weight perturbation, to learn an informative synthetic set with significantly reduced training cost. Extensive experiments demonstrate that our method achieves up to 20$\times$ speedup and performance on par with state-of-the-art baseline methods.	翻訳日:2022-12-14 13:43:49 公開日:2022-12-12
# ブラインドドメイン遷移によるゼロショットモーター健全性モニタリング Zero-Shot Motor Health Monitoring by Blind Domain Transition ( http://arxiv.org/abs/2212.06154v1 ) ライセンス: Link先を確認	Serkan Kiranyaz, Ozer Can Devecioglu, Amir Alhams, Sadok Sassi, Turker Ince, Osama Abdeljaber, Onur Avci, and Moncef Gabbouj	(参考訳) モーターの健全性の連続的な長期モニタリングは、ベアリング障害などの異常の早期発見に不可欠である(最大51%のモーター障害はベアリング障害に起因する)。障害検出のための多くの手法が提案されているが、そのほとんどは訓練に正常(健全)と異常(故障)のデータを必要とする。同一マシンのラベル付きデータに基づいて訓練された最近のDeep Learning (DL) 手法であっても、分類精度は1つまたは少数の条件が変更された場合に著しく低下する。さらに、その性能は著しく低下するか、全く異なる健全な信号パターンを持つ別のマシンでテストした場合に完全に失敗する可能性がある。そこで本研究では, 作業条件, センサパラメータ, 故障特性に関わらず, 新たな(ターゲット)マシンの故障を検知できるゼロショット軸受故障検出手法を提案する。この目的を達成するために、1D Operational Generative Adversarial Network(Op-GAN)は、まずソースマシンの正常振動信号と故障振動信号の遷移を、様々な条件、センサパラメータ、および故障タイプで特徴付ける。そして、ターゲットマシンでは、潜在的な故障信号を生成し、その実際の健全信号と合成された故障信号に対して、コンパクトで軽量な1D Self-ONN故障検出器を訓練して、発生時にリアルタイムに実故障状態を検出することができる。提案手法を検証するために、異なる条件とセンサ位置で動作する2つの異なるモータを用いて、新しいベンチマークデータセットを作成する。実験の結果, 本手法は, タイプ, 重大度, 位置に関わらず, 2台の対象機で平均89%, 95%のリコール率を達成するベアリング障害を正確に検出できることがわかった。 Continuous long-term monitoring of motor health is crucial for the early detection of abnormalities such as bearing faults (up to 51% of motor failures are attributed to bearing faults). Despite numerous methodologies proposed for bearing fault detection, most of them require normal (healthy) and abnormal (faulty) data for training. Even with the recent deep learning (DL) methodologies trained on the labeled data from the same machine, the classification accuracy significantly deteriorates when one or a few conditions are altered. Furthermore, their performance suffers significantly or may entirely fail when they are tested on another machine with entirely different healthy and faulty signal patterns. To address this need, in this pilot study, we propose a zero-shot bearing fault detection method that can detect any fault on a new (target) machine regardless of the working conditions, sensor parameters, or fault characteristics. 
To accomplish this objective, a 1D Operational Generative Adversarial Network (Op-GAN) first characterizes the transition between normal and fault vibration signals of (a) source machine(s) under various conditions, sensor parameters, and fault types. Then, for a target machine, the potential faulty signals can be generated, and a compact and lightweight 1D Self-ONN fault detector can be trained over its actual healthy and synthesized faulty signals to detect the real faulty condition in real time whenever it occurs. To validate the proposed approach, a new benchmark dataset is created using two different motors working under different conditions and sensor locations. Experimental results demonstrate that this novel approach can accurately detect any bearing fault, regardless of its type, severity, and location, achieving an average recall rate of around 89% and 95% on two target machines.	翻訳日:2022-12-14 13:43:31 公開日:2022-12-12
# 可変再生型保守政策イテレーション Variance-Reduced Conservative Policy Iteration ( http://arxiv.org/abs/2212.06283v1 ) ライセンス: Link先を確認	Naman Agarwal, Brian Bullins, Karan Singh	(参考訳) 政策空間上の実証的リスク最小化問題の列に強化学習を還元するサンプル複雑性について検討する。このような還元に基づくアルゴリズムは、ポリシー勾配アルゴリズムのパラメータ空間とは対照的に関数空間の局所収束を示すため、ポリシークラスの非線型あるいは不連続なパラメータ化の影響を受けない。我々は、$O(\varepsilon^{-4})$から$O(\varepsilon^{-3})$へ、$\varepsilon$-functional local optimumを生成する際のサンプル複雑さを改善する保守政策イテレーションの分散還元変種を提案する。状態被覆とポリシー完全性の仮定の下で、アルゴリズムは$O(\varepsilon^{-2})$倍をサンプリングした後、$\varepsilon$-globalOptimityを享受し、以前に確立された$O(\varepsilon^{-3})$サンプル要件を改善した。 We study the sample complexity of reducing reinforcement learning to a sequence of empirical risk minimization problems over the policy space. Such reductions-based algorithms exhibit local convergence in the function space, as opposed to the parameter space for policy gradient algorithms, and thus are unaffected by the possibly non-linear or discontinuous parameterization of the policy class. We propose a variance-reduced variant of Conservative Policy Iteration that improves the sample complexity of producing a $\varepsilon$-functional local optimum from $O(\varepsilon^{-4})$ to $O(\varepsilon^{-3})$. Under state-coverage and policy-completeness assumptions, the algorithm enjoys $\varepsilon$-global optimality after sampling $O(\varepsilon^{-2})$ times, improving upon the previously established $O(\varepsilon^{-3})$ sample requirement.	翻訳日:2022-12-14 13:43:00 公開日:2022-12-12
# 単語・文レベルでの軽度注意を用いた臨床ノートによる死亡予測モデル Mortality Prediction Models with Clinical Notes Using Sparse Attention at the Word and Sentence Levels ( http://arxiv.org/abs/2212.06267v1 ) ライセンス: Link先を確認	Miguel Rios, Ameen Abu-Hanna	(参考訳) 集中治療院死亡予測には様々な臨床応用がある。ニューラル予測モデル、特に臨床ノートに注目する場合には、既存のモデルの改善が期待されている。しかし、受け入れるにはこれらのモデルは高性能で透明でなければならない。本研究は, 臨床神経予測モデルにおいて, 識別・校正の観点から異なる注意機構について検討する。具体的には, 臨床ノートによる病院内死亡予測タスクにおける集中的注意重みの代替として, 軽度注意力について検討した。我々は注意機構を次のように評価する。一文中の単語に対する局所的な自己注意、及び二文にまたがるトランスフォーマーアーキテクチャによるグローバルな自己注意 sparse機構アプローチは,公開データセットを用いた予測性能の観点で,局所的自己着想に対する密接なアプローチよりも優れており,事前定義された関連する指示語に対する注意が高まることを実証する。しかし、文レベルのパフォーマンスは、影響力のある指示語を含む文をまとめてドロップする傾向があるため、悪化する。 Intensive Care in-hospital mortality prediction has various clinical applications. Neural prediction models, especially when capitalising on clinical notes, have been put forward as improvement on currently existing models. However, to be acceptable these models should be performant and transparent. This work studies different attention mechanisms for clinical neural prediction models in terms of their discrimination and calibration. Specifically, we investigate sparse attention as an alternative to dense attention weights in the task of in-hospital mortality prediction from clinical notes. We evaluate the attention mechanisms based on: i) local self-attention over words in a sentence, and ii) global self-attention with a transformer architecture across sentences. We demonstrate that the sparse mechanism approach outperforms the dense one for the local self-attention in terms of predictive performance with a publicly available dataset, and puts higher attention to prespecified relevant directive words. The performance at the sentence level, however, deteriorates as sentences including the influential directive words tend to be dropped all together.	翻訳日:2022-12-14 13:35:48 公開日:2022-12-12
# nnU-Netの効率よいベイズ不確かさ推定 Efficient Bayesian Uncertainty Estimation for nnU-Net ( http://arxiv.org/abs/2212.06278v1 ) ライセンス: Link先を確認	Yidong Zhao, Changchun Yang, Artur Schweidtmann, Qian Tao	(参考訳) 自己構成のnnU-Netは、幅広い医療画像セグメンテーションの課題において、主要なパフォーマンスを達成した。選択のモデルとして広く考えられており、医用画像セグメンテーションの強力なベースラインとなっている。しかし、その異常な性能にもかかわらず、nnU-Netはその失敗の可能性を示すための不確実性の尺度を提供していない。これは、データが不均一でnnU-Netが注意なく失敗する、大規模な画像分割アプリケーションで問題となる可能性がある。本研究では,医療画像分割におけるnnU-Netの不確実性を推定する新しい手法を提案する。ベイズ不確実性推定のための重み空間の後方サンプリングに有効な手法を提案する。モンテカルロ・ドロップアウトや平均場ベイズニューラルネットワークのような従来のベースライン手法とは異なり,提案手法は変分アーキテクチャを必要とせず,元のnnU-Netアーキテクチャをそのまま維持し,優れた性能と使いやすさを維持する。さらに,マルチモーダル後部モデルにより,元のnnU-Netよりもセグメンテーション性能を向上する。本法を心臓mriの公開 acdc および m&m データセットに適用し,様々な基準法における不確実性推定の改善を実証した。提案手法は, セグメンテーション精度と品質管理の両面で, 医用画像セグメンテーションのためのnnu-netをさらに強化する。 The self-configuring nnU-Net has achieved leading performance in a large range of medical image segmentation challenges. It is widely considered as the model of choice and a strong baseline for medical image segmentation. However, despite its extraordinary performance, nnU-Net does not supply a measure of uncertainty to indicate its possible failure. This can be problematic for large-scale image segmentation applications, where data are heterogeneous and nnU-Net may fail without notice. In this work, we introduce a novel method to estimate nnU-Net uncertainty for medical image segmentation. We propose a highly effective scheme for posterior sampling of weight space for Bayesian uncertainty estimation. Different from previous baseline methods such as Monte Carlo Dropout and mean-field Bayesian Neural Networks, our proposed method does not require a variational architecture and keeps the original nnU-Net architecture intact, thereby preserving its excellent performance and ease of use. Additionally, we boost the segmentation performance over the original nnU-Net via marginalizing multi-modal posterior models. 
We applied our method on the public ACDC and M&M datasets of cardiac MRI and demonstrated improved uncertainty estimation over a range of baseline methods. The proposed method further strengthens nnU-Net for medical image segmentation in terms of both segmentation accuracy and quality control.	翻訳日:2022-12-14 13:35:25 公開日:2022-12-12
# 適応型ヒューマン・イン・ザ・ループ法による添加物製造プロセスのエミッション検出とコンピュータビジョンを用いたアクティブラーニング An adaptive human-in-the-loop approach to emission detection of Additive Manufacturing processes and active learning with computer vision ( http://arxiv.org/abs/2212.06153v1 ) ライセンス: Link先を確認	Xiao Liu and Alan F. Smeaton and Alessandra Mileo	(参考訳) 3Dプリンティング(3D-printing)としても知られるAM(Additive Manufacturing)におけるin-situモニタリングとプロセス制御の最近の進歩は、製造される部品のビルドプロセス中に大量の排出データを収集することを可能にする。このデータは、3Dプリントされた部品の3Dおよび2D表現への入力として使用できる。しかし、分析と使用、およびこのデータのキャラクタリゼーションは依然として手作業のままである。本研究の目的は,AMプロセス中に発生する排出データを自動的に検査・注釈する機械学習技術を用いた適応型ヒューマン・イン・ザ・ループ手法を提案することである。第一に,畳み込みニューラルネットワーク(cnns)を用いてその場監視によって収集された放射データを自動検査し,分類し,第二に,開発された分類モデルにアクティブラーニング技術を適用することで,放射データのラベリングプロセスを高速化するヒューマン・イン・ザ・ループ機構を構築する。 CNNベースのアプローチは転送学習と微調整に依存しており、他の産業画像パターンに適用できる。提案手法の適応性は,不確実なサンプリング戦略により,ヒトの専門家に注釈を提示するサンプルの自動選択によって実現される。 Recent developments in in-situ monitoring and process control in Additive Manufacturing (AM), also known as 3D-printing, allows the collection of large amounts of emission data during the build process of the parts being manufactured. This data can be used as input into 3D and 2D representations of the 3D-printed parts. However the analysis and use, as well as the characterization of this data still remains a manual process. The aim of this paper is to propose an adaptive human-in-the-loop approach using Machine Learning techniques that automatically inspect and annotate the emissions data generated during the AM process. More specifically, this paper will look at two scenarios: firstly, using convolutional neural networks (CNNs) to automatically inspect and classify emission data collected by in-situ monitoring and secondly, applying Active Learning techniques to the developed classification model to construct a human-in-the-loop mechanism in order to accelerate the labeling process of the emission data. 
The CNN-based approach relies on transfer learning and fine-tuning, which makes the approach applicable to other industrial image patterns. The adaptive nature of the approach is enabled by an uncertainty sampling strategy that automatically selects the samples to be presented to human experts for annotation.	翻訳日:2022-12-14 13:27:06 公開日:2022-12-12
# クラスエンコーディングパターンを見つけるためのAIモデル利用計測 AI Model Utilization Measurements For Finding Class Encoding Patterns ( http://arxiv.org/abs/2212.06576v1 ) ライセンス: Link先を確認	Peter Bajcsy and Antonio Cardone and Chenyi Ling and Philippe Dessauw and Michael Majurski and Tim Blattner and Derek Juba and Walid Keyrouz	(参考訳) この仕事は問題に対処する (a)訓練された人工知能(AI)モデル及びモデルの利用率を設計する b) これらの測定に基づいて,AIモデルにトレーニングデータをエンコードする方法を説明する。この問題は、自動運転車の交通標識の分類にAIモデルを使用するなど、セキュリティおよび安全クリティカルなアプリケーションにおけるAIモデルの説明可能性の欠如によって動機付けられている。計算グラフ(AIモデル)、サブグラフ、グラフノードのレベルにおける、交通標識の活用に基づくクラスエンコーディングにおけるAIモデル利用測定と理解パターンの理論的基盤を導入することで、この問題に対処する。概念的には、すべての可能な出力(テンソル状態)の空間におけるユニークな出力の数と分布に基づいて、AIモデルのグラフノード(計算単位)で利用が定義される。本研究では,有害およびクリーンなaiモデルを含むaiモデルから利用率の測定値を抽出する。クリーンなAIモデルとは対照的に、有毒なAIモデルは、そのようなトリガーの存在下で正しいクラスラベルを他のラベルに変更するために、体系的、物理的に実現可能なトラフィックサイン修正(トリガー)を含むトラフィックサインイメージで訓練された。このようなクリーンで有毒なAIモデルのクラスエンコーディングを分析し、トロイの木馬の注入と検出に影響を及ぼす。 This work addresses the problems of (a) designing utilization measurements of trained artificial intelligence (AI) models and (b) explaining how training data are encoded in AI models based on those measurements. The problems are motivated by the lack of explainability of AI models in security and safety critical applications, such as the use of AI models for classification of traffic signs in self-driving cars. We approach the problems by introducing theoretical underpinnings of AI model utilization measurement and understanding patterns in utilization-based class encodings of traffic signs at the level of computation graphs (AI models), subgraphs, and graph nodes. Conceptually, utilization is defined at each graph node (computation unit) of an AI model based on the number and distribution of unique outputs in the space of all possible outputs (tensor-states). In this work, utilization measurements are extracted from AI models, which include poisoned and clean AI models. 
In contrast to clean AI models, the poisoned AI models were trained with traffic sign images containing systematic, physically realizable traffic sign modifications (i.e., triggers) to change a correct class label to another label in the presence of such a trigger. We analyze class encodings of such clean and poisoned AI models, and conclude with implications for trojan injection and detection.	翻訳日:2022-12-14 13:25:59 公開日:2022-12-12
# 分散確率的マルチプレイヤーマルチアーム歩行バンディット Decentralized Stochastic Multi-Player Multi-Armed Walking Bandits ( http://arxiv.org/abs/2212.06279v1 ) ライセンス: Link先を確認	Guojun Xiong, Jian Li	(参考訳) マルチプレイヤーのマルチアームバンディットは、認知無線システムへの応用による、ますます関連する意思決定問題である。この問題のほとんどの研究は、プレイヤーがすべての腕に \textit{full access} を持ち、同じ腕を引っ張るときに報酬を受け取らない設定にのみ焦点をあてている。したがって、すべてのプレイヤーは累積報酬の最大化を目標として同じバンディット問題を解決する。しかし、これらの設定は多くの現実世界のアプリケーションにおいて重要な要素を無視しており、プレイヤーは \textit{a dynamic local subset of arms} に \textit{limited access} を持つ(つまり、腕は 'walking' で、プレイヤーにはアクセスできないことがある)。そこで本稿では,上記のモデリング問題に対処するために,多人数マルチプレイヤー歩行バンディットモデルを提案する。現在の目標は、報酬を最大化することだが、プレイヤーはローカルのサブセットからのみ腕を引くことができ、他のプレイヤーが同じ腕を引かなければ完全な報酬を得られる。我々は,探索・爆発のトレードオフに対処するためにuper confidence bound(ucb)を採用し,衝突を適切に処理するために分散最適化技術を採用する。そこで本研究では,これら2つの手法を慎重に統合することにより,後悔をほぼ最適に保証し,競争経験的性能を得るために容易に実装できる分散アルゴリズムを提案する。 Multi-player multi-armed bandit is an increasingly relevant decision-making problem, motivated by applications to cognitive radio systems. Most research for this problem focuses exclusively on the settings that players have \textit{full access} to all arms and receive no reward when pulling the same arm. Hence all players solve the same bandit problem with the goal of maximizing their cumulative reward. However, these settings neglect several important factors in many real-world applications, where players have \textit{limited access} to \textit{a dynamic local subset of arms} (i.e., an arm could sometimes be ``walking'' and not accessible to the player). To this end, this paper proposes a \textit{multi-player multi-armed walking bandits} model, aiming to address aforementioned modeling issues. The goal now is to maximize the reward, however, players can only pull arms from the local subset and only collect a full reward if no other players pull the same arm. We adopt Upper Confidence Bound (UCB) to deal with the exploration-exploitation tradeoff and employ distributed optimization techniques to properly handle collisions. 
By carefully integrating these two techniques, we propose a decentralized algorithm that has a near-optimal guarantee on the regret and can be easily implemented to obtain competitive empirical performance.	翻訳日:2022-12-14 13:16:26 公開日:2022-12-12
# 深部強化学習による包括再生エネルギーを用いたハイブリッドエネルギー貯蔵システムの最適計画 Optimal Planning of Hybrid Energy Storage Systems using Curtailed Renewable Energy through Deep Reinforcement Learning ( http://arxiv.org/abs/2212.05662v1 ) ライセンス: Link先を確認	Dongju Kang, Doeun Kang, Sumin Hwangbo, Haider Niaz, Won Bo Lee, J. Jay Liu, Jonggeol Na	(参考訳) エネルギー管理システム(EMS)は、継続的に成長する再生可能エネルギーを活用するためにますます重要になっている。エネルギー利害関係者の効率を最大化するために、電池やグリーン水素などのエネルギー貯蔵システム(ESS)のプロムリングを行う必要がある。しかし、異なる戦略間の活用を計画する最適な意思決定は、大規模問題の複雑さと不確実性に直面している。そこで本研究では,再生可能エネルギーの不確実性を考慮したリアルタイムな ESS 計画を実現するために,ポリシーベースアルゴリズムを用いた高度強化学習手法を提案する。定量的な性能比較により、DRLエージェントは広い動作と観測空間であってもシナリオベース確率最適化(SO)アルゴリズムよりも優れていた。 DRLの不確実性拒絶能力により, 再生可能エネルギーの大幅な不確実性の下で, 純利益と安定システムの最大化を図り, 頑健な性能を確認できた。 DRLエージェントの動作を状態に応じて視覚的に評価するためのアクションマッピングを行った。対応する結果は、drlエージェントが人間の専門家のやり方を学習することを確認し、提案手法の信頼性の高い適用を示唆した。 Energy management systems (EMS) are becoming increasingly important in order to utilize the continuously growing curtailed renewable energy. Promising energy storage systems (ESS), such as batteries and green hydrogen should be employed to maximize the efficiency of energy stakeholders. However, optimal decision-making, i.e., planning the leveraging between different strategies, is confronted with the complexity and uncertainties of large-scale problems. Here, we propose a sophisticated deep reinforcement learning (DRL) methodology with a policy-based algorithm to realize the real-time optimal ESS planning under the curtailed renewable energy uncertainty. A quantitative performance comparison proved that the DRL agent outperforms the scenario-based stochastic optimization (SO) algorithm, even with a wide action and observation space. Owing to the uncertainty rejection capability of the DRL, we could confirm a robust performance, under a large uncertainty of the curtailed renewable energy, with a maximizing net profit and stable system. Action-mapping was performed for visually assessing the action taken by the DRL agent according to the state. 
The corresponding results confirmed that the DRL agent learns to act as a human expert would, suggesting that the proposed methodology can be applied reliably.	翻訳日:2022-12-13 18:43:20 公開日:2022-12-12
# 高分解能微分方程式による加速現象の再検討 Revisiting the acceleration phenomenon via high-resolution differential equations ( http://arxiv.org/abs/2212.05700v1 ) ライセンス: Link先を確認	Shuo Chen, Bin Shi, Ya-xiang Yuan	(参考訳) ネステロフの加速勾配降下(NAG)は、一階アルゴリズムの歴史におけるマイルストーンの1つである。高分解能微分方程式の枠組みが[Shi et al., 2022]で提案されるまでは、加速現象のメカニズムは勾配補正項によるものであることが判明しなかった。収束率に関する高分解能微分方程式の枠組みの理解を深めるために,本論文では,リアプノフ解析と位相空間表現の手法に基づいて,$\mu$-strongly convex関数のNAGを引き続き検討する。まず、勾配補正スキームから証明を再検討する。 [Chen et al., 2022] と同様、単純計算は証明を極端に単純化し、ステップサイズを小さな修正で$s=1/L$に拡大する。一方、リアプノフ関数の構成法は原則的である。また,暗黙の速度計画からNAGについても検討した。速度の反復性の違いから、リアプノフ関数は追加項を使わずに暗黙速度スキームから構築され、反復差分の計算がより簡単になることがわかった。 NAGの暗黙的速度スキームからの高分解能微分方程式フレームワークは最適であり、勾配補正スキームよりも優れている。 Nesterov's accelerated gradient descent (NAG) is one of the milestones in the history of first-order algorithms. It was not successfully uncovered until the high-resolution differential equation framework was proposed in [Shi et al., 2022] that the mechanism behind the acceleration phenomenon is due to the gradient correction term. To deepen our understanding of the high-resolution differential equation framework on the convergence rate, we continue to investigate NAG for the $\mu$-strongly convex function based on the techniques of Lyapunov analysis and phase-space representation in this paper. First, we revisit the proof from the gradient-correction scheme. Similar to [Chen et al., 2022], the straightforward calculation simplifies the proof extremely and enlarges the step size to $s=1/L$ with minor modification. Meanwhile, the way of constructing Lyapunov functions is principled. Furthermore, we also investigate NAG from the implicit-velocity scheme. Due to the difference in the velocity iterates, we find that the Lyapunov function is constructed from the implicit-velocity scheme without the additional term and the calculation of iterative difference becomes simpler. 
Together with the optimal step size obtained, the high-resolution differential equation framework from the implicit-velocity scheme of NAG is perfect and outperforms the gradient-correction scheme.	翻訳日:2022-12-13 18:43:00 公開日:2022-12-12
# ロバストなリカレントニューラルネットワークによる開放水中の船体運動の同定と性能保証 -- 技術報告 Robust Recurrent Neural Network to Identify Ship Motion in Open Water with Performance Guarantees -- Technical Report ( http://arxiv.org/abs/2212.05781v1 ) ライセンス: Link先を確認	Daniel Frank, Decky Aspandi Latif, Michael Muehlebach, Steffen Staab	(参考訳) リカレントニューラルネットワークは、単に入出力測定から未知の非線形システムのダイナミクスを学習することができる。しかし、結果のモデルは入出力マッピングの安定性を保証するものではない。本研究では,非線形乱れを伴う線形時間不変系として,リカレントニューラルネットワークを表現する。パラメータに制約を導入することで、有限利得安定性と増分有限利得安定性を保証できる。この識別法を用いて,開放水中を移動する4自由度船の動きを学習し,無拘束パラメータを用いた他の純粋学習型アプローチと比較する。本解析により,制約付き再帰型ニューラルネットワークは,テストセットの予測精度は低いが,分布外集合において同等の結果を得られ,安定性条件を尊重することを示した。 Recurrent neural networks are capable of learning the dynamics of an unknown nonlinear system purely from input-output measurements. However, the resulting models do not provide any stability guarantees on the input-output mapping. In this work, we represent a recurrent neural network as a linear time-invariant system with nonlinear disturbances. By introducing constraints on the parameters, we can guarantee finite gain stability and incremental finite gain stability. We apply this identification method to learn the motion of a four-degrees-of-freedom ship that is moving in open water and compare it against other purely learning-based approaches with unconstrained parameters. Our analysis shows that the constrained recurrent neural network has a lower prediction accuracy on the test set, but it achieves comparable results on an out-of-distribution set and respects stability conditions.	翻訳日:2022-12-13 18:42:37 公開日:2022-12-12
# 多変量駆動型ディリクレホークスプロセス Multivariate Powered Dirichlet Hawkes Process ( http://arxiv.org/abs/2212.05995v1 ) ライセンス: Link先を確認	Gaël Poux-Médard, Julien Velcin, Sabine Loudcher	(参考訳) 文書の公開時間は、その意味的内容に関する関連情報を運ぶ。 Dirichlet-Hawkesプロセスは、テキスト情報と出版ダイナミクスを共同でモデル化するために提案されている。このアプローチは、最近のいくつかの作品で成功して使われており、特定の困難な問題 -- 典型的には、短いテキストや絡み合った出版ダイナミックスのために。しかし、現在の形式では、複雑な出版ダイナミクスは許可されていない。特に、推測された話題は互いに独立している -- 金融に関する出版は、例えば、政治に関する出版物には影響しないと仮定されている。本研究では,この仮定を緩和する多変量Powered Dirichlet-Hawkesプロセス(MPDHP)を開発した。様々な話題に関する出版物が互いに影響を与えている。相互作用するトピックから生じる技術的課題の詳細と克服。我々は,様々な合成データセット上でMPDHPを体系的に評価し,そのアプリケーションドメインと制限を定義する。最後に,Redditデータを用いたMPDHPのユースケースを開発した。この記事の最後には、興味のある読者がMPDHPの使用方法と使用時期、そうでないタイミングを知ることができる。 The publication time of a document carries relevant information about its semantic content. The Dirichlet-Hawkes process has been proposed to jointly model textual information and publication dynamics. This approach has been used with success in several recent works, and extended to tackle specific challenging problems --typically for short texts or entangled publication dynamics. However, the prior in its current form does not allow for complex publication dynamics. In particular, inferred topics are independent from each other --a publication about finance is assumed to have no influence on publications about politics, for instance. In this work, we develop the Multivariate Powered Dirichlet-Hawkes Process (MPDHP), that alleviates this assumption. Publications about various topics can now influence each other. We detail and overcome the technical challenges that arise from considering interacting topics. We conduct a systematic evaluation of MPDHP on a range of synthetic datasets to define its application domain and limitations. Finally, we develop a use case of the MPDHP on Reddit data. At the end of this article, the interested reader will know how and when to use MPDHP, and when not to.	翻訳日:2022-12-13 18:42:23 公開日:2022-12-12
# Dirichlet-Survival Process:トピック依存拡散ネットワークのスケーラブル推論 Dirichlet-Survival Process: Scalable Inference of Topic-Dependent Diffusion Networks ( http://arxiv.org/abs/2212.05996v1 ) ライセンス: Link先を確認	Gaël Poux-Médard, Julien Velcin, Sabine Loudcher	(参考訳) ネットワーク上の情報拡散は、文書の内容、他の出版物に対する出版時期、ネットワークにおけるスプレッダーの位置の3つの特徴を考慮し、効率的にモデル化することができる。以前の作品のほとんどは、それらのうち2つを共同でモデル化するか、あるいは非常にパラメトリックなアプローチに依存している。近年のDirichlet-Pointプロセス文献に基づいて,非パラメトリックな非教師付きフレームワークでこれらすべての機能を共同で考慮したHouston(Hidden Online User-Topic Network)モデルを紹介する。動的トピック依存の拡散ネットワークを,そのトピックとともに連続的に推定する。これは教師なしであり、入力データとして \textit{(time of publication, information's content, spreading entity)} の形をした三重項のラベルのないストリームを考える。オンライン推論は、データセットのサイズに線形にスケールする逐次モンテカルロアルゴリズムを用いて行われる。このアプローチは、クラスタリカバリとサブネットワーク推論タスクの両方において、既存のベースラインよりも連続的に改善します。 Information spread on networks can be efficiently modeled by considering three features: documents' content, time of publication relative to other publications, and position of the spreader in the network. Most previous works model up to two of those jointly, or rely on heavily parametric approaches. Building on recent Dirichlet-Point processes literature, we introduce the Houston (Hidden Online User-Topic Network) model, that jointly considers all those features in a non-parametric unsupervised framework. It infers dynamic topic-dependent underlying diffusion networks in a continuous-time setting along with said topics. It is unsupervised; it considers an unlabeled stream of triplets shaped as \textit{(time of publication, information's content, spreading entity)} as input data. Online inference is conducted using a sequential Monte-Carlo algorithm that scales linearly with the size of the dataset. Our approach yields consequent improvements over existing baselines on both cluster recovery and subnetworks inference tasks.	翻訳日:2022-12-13 18:42:03 公開日:2022-12-12
# ユニットコミット問題に対する強化学習と木探索手法 Reinforcement Learning and Tree Search Methods for the Unit Commitment Problem ( http://arxiv.org/abs/2212.06001v1 ) ライセンス: Link先を確認	Patrick de Mars	(参考訳) 需要を満たす世代単位の運用スケジュールを決定する単位コミットメント(UC)問題は、電力系統の運用における基本的な課題である。混合整数プログラミングを用いた既存のUC法は確率的システムには適していない。不確実性をより厳密に考慮するアプローチは、回転予備の必要量を減らし、高い効率で発電所を稼働させ、より多くの可変再生可能エネルギーを統合することで、運用コストを大幅に削減することができる。 uc問題を解決する有望なアプローチは強化学習(rl)であり、人工知能における長年にわたる大きな課題を克服するために用いられてきた最適な意思決定のための方法論である。この論文は、UC問題へのRLの適用を探求し、不確実性の下での堅牢性、複数の問題インスタンスにわたる一般化可能性、以前研究されたよりも大規模な電力システムへのスケーリングといった課題に対処する。これらの課題に対処するため,モデルフリーRLとモデルベース計画を組み合わせた新しい手法であるガイドツリー探索を開発した。 UC問題はマルコフ決定プロセスとして定式化され、イギリスの電力システムからRLエージェントを訓練するための実データに基づくオープンソース環境を開発する。最大100個のジェネレータの問題では、誘導木探索は決定論的UC法と競合し、運用コストを最大1.4 %削減する。 rlの利点は、発電機の故障に対するロバスト性、風力の削減、炭素価格といった電力系統運用者にとって重要な考慮事項を取り入れるために、このフレームワークを簡単に拡張できることである。ジェネレータの停止を考慮した場合、従来の$N-x$予約基準を用いた手法と比較して、誘導木探索は運用コストの2\%以上を節約する。 The unit commitment (UC) problem, which determines operating schedules of generation units to meet demand, is a fundamental task in power systems operation. Existing UC methods using mixed-integer programming are not well-suited to highly stochastic systems. Approaches which more rigorously account for uncertainty could yield large reductions in operating costs by reducing spinning reserve requirements; operating power stations at higher efficiencies; and integrating greater volumes of variable renewables. A promising approach to solving the UC problem is reinforcement learning (RL), a methodology for optimal decision-making which has been used to conquer long-standing grand challenges in artificial intelligence. This thesis explores the application of RL to the UC problem and addresses challenges including robustness under uncertainty; generalisability across multiple problem instances; and scaling to larger power systems than previously studied. To tackle these issues, we develop guided tree search, a novel methodology combining model-free RL and model-based planning. 
The UC problem is formalised as a Markov decision process and we develop an open-source environment based on real data from Great Britain's power system to train RL agents. In problems of up to 100 generators, guided tree search is shown to be competitive with deterministic UC methods, reducing operating costs by up to 1.4\%. An advantage of RL is that the framework can be easily extended to incorporate considerations important to power systems operators such as robustness to generator failure, wind curtailment or carbon prices. When generator outages are considered, guided tree search saves over 2\% in operating costs as compared with methods using conventional $N-x$ reserve criteria.	翻訳日:2022-12-13 18:41:43 公開日:2022-12-12
# 量子多体状態のハードウェア効率学習 Hardware-efficient learning of quantum many-body states ( http://arxiv.org/abs/2212.06084v1 ) ライセンス: Link先を確認	Katherine Van Kirk, Jordan Cotler, Hsin-Yuan Huang, Mikhail D. Lukin	(参考訳) 高絡み合いの多粒子系の効率的なキャラクタリゼーションは量子科学において顕著な課題である。近年の進歩は、量子多体系の多くの特性を学ぶのに、無作為な数の測定が十分であることを示している。しかし、そのような測定の実行には個々の粒子を完全に制御する必要があるため、多くの実験プラットフォームでは利用できない。本研究では,各粒子が同一大域に分布し,追加のアンシラ粒子が存在しない場合を含む,個々の粒子を制御できるようなシステムにおいて,量子多体状態を学ぶための厳密で効率的なアルゴリズムを提案する。我々は,U(1)格子ゲージ理論におけるエネルギー密度推定アルゴリズムの有効性を数値的に実証し,非常に限られた測定能力を用いて位相順を分類する。 Efficient characterization of highly entangled multi-particle systems is an outstanding challenge in quantum science. Recent developments have shown that a modest number of randomized measurements suffices to learn many properties of a quantum many-body system. However, implementing such measurements requires complete control over individual particles, which is unavailable in many experimental platforms. In this work, we present rigorous and efficient algorithms for learning quantum many-body states in systems with any degree of control over individual particles, including when every particle is subject to the same global field and no additional ancilla particles are available. We numerically demonstrate the effectiveness of our algorithms for estimating energy densities in a U(1) lattice gauge theory and classifying topological order using very limited measurement capabilities.	翻訳日:2022-12-13 18:41:16 公開日:2022-12-12
# 与えられた平均と分散を持つ任意の分布間の全変動距離に対する下限 Lower Bounds for the Total Variation Distance Between Arbitrary Distributions with Given Means and Variances ( http://arxiv.org/abs/2212.05820v1 ) ライセンス: Link先を確認	Tomohiro Nishiyama	(参考訳) 与えられた平均と分散(共分散行列)を持つ実 d-空間上の任意の二つの確率測度に対して、その全変動距離に対して下限を与える。 For two arbitrary probability measures on real d-space with given means and variances (covariance matrices), we provide lower bounds for their total variation distance.	翻訳日:2022-12-13 18:41:02 公開日:2022-12-12
# 経路レベルでの細胞内局在予測のためのグラフアルゴリズム Graph algorithms for predicting subcellular localization at the pathway level ( http://arxiv.org/abs/2212.05991v1 ) ライセンス: Link先を確認	Chris S. Magnano, Anthony Gitter	(参考訳) タンパク質の細胞内局在は正常な細胞プロセスや疾患において重要な因子である。多くのタンパク質の局在化資源は静的として扱うが、タンパク質の局在化は生物学的文脈に強く影響される。生物学的経路は、特定の生物学的文脈を表すグラフであり、大規模データから推測できる。生物経路における全ての相互作用の局所化をエッジラベルタスクとして予測するグラフアルゴリズムを開発した。我々は,グラフニューラルネットワーク,確率的グラフィカルモデル,識別分類器など様々なモデルを比較し,キュレーションされた経路データベースからの局所化アノテーションを予測する。また, ウイルス感染によるヒト線維芽細胞の局在を予測し, 生物学的経路を構築するケーススタディも実施した。経路ローカライゼーション予測は,大規模生物学的データの解析に公開可能なローカライゼーションデータを統合するための有望なアプローチである。 Protein subcellular localization is an important factor in normal cellular processes and disease. While many protein localization resources treat it as static, protein localization is dynamic and heavily influenced by biological context. Biological pathways are graphs that represent a specific biological context and can be inferred from large-scale data. We develop graph algorithms to predict the localization of all interactions in a biological pathway as an edge-labeling task. We compare a variety of models including graph neural networks, probabilistic graphical models, and discriminative classifiers for predicting localization annotations from curated pathway databases. We also perform a case study where we construct biological pathways and predict localizations of human fibroblasts undergoing viral infection. Pathway localization prediction is a promising approach for integrating publicly available localization data into the analysis of large-scale biological data.	翻訳日:2022-12-13 18:25:58 公開日:2022-12-12
# ラベル差分プライバシーによる回帰 Regression with Label Differential Privacy ( http://arxiv.org/abs/2212.06074v1 ) ライセンス: Link先を確認	Badih Ghazi, Pritish Kamath, Ravi Kumar, Ethan Leeman, Pasin Manurangsi, Avinash Varadarajan, Chiyuan Zhang	(参考訳) ラベル差分プライバシー(DP)を保証した回帰モデルの学習課題について検討する。ラベル値のグローバルな事前分布に基づいて, 与えられた回帰損失関数の下で最適なラベルDPランダム化機構を導出する。最適機構が ``randomized response on bins'' の形をとることを証明し、最適なbin値を求める効率的なアルゴリズムを提案する。アルゴリズムの有効性を示すいくつかのデータセットについて,徹底的な実験評価を行った。 We study the task of training regression models with the guarantee of label differential privacy (DP). Based on a global prior distribution on label values, which could be obtained privately, we derive a label DP randomization mechanism that is optimal under a given regression loss function. We prove that the optimal mechanism takes the form of a ``randomized response on bins'', and propose an efficient algorithm for finding the optimal bin values. We carry out a thorough experimental evaluation on several datasets demonstrating the efficacy of our algorithm.	翻訳日:2022-12-13 18:25:45 公開日:2022-12-12
# フランクウルフによる多次元ホークス過程の高速学習 Fast Learning of Multidimensional Hawkes Processes via Frank-Wolfe ( http://arxiv.org/abs/2212.06081v1 ) ライセンス: Link先を確認	Renbo Zhao, Niccolò Dalmasso, Mohsen Ghassemi, Vamsi K. Potluru, Tucker Balch, Manuela Veloso	(参考訳) シーケンシャルなイベントデータのモデリングと生成に関して、Hawkesプロセスは最近ツールの最前線に現れている。多次元ホークスプロセスは、異なる種類の事象間の自己および相互励起の両方をモデル化し、財務、疫学、パーソナライズドレコメンデーションなど様々な分野でうまく適用されている。本研究では,Frank-Wolfeアルゴリズムを多次元ホークス過程の学習に適用する。実験結果から,本手法は,他の1次手法よりもパラメータ推定の精度が優れており,実行時間が大幅に高速であることがわかった。 Hawkes processes have recently risen to the forefront of tools when it comes to modeling and generating sequential events data. Multidimensional Hawkes processes model both the self and cross-excitation between different types of events and have been applied successfully in various domains such as finance, epidemiology and personalized recommendations, among others. In this work we present an adaptation of the Frank-Wolfe algorithm for learning multidimensional Hawkes processes. Experimental results show that our approach has better or on par accuracy in terms of parameter estimation than other first order methods, while enjoying a significantly faster runtime.	翻訳日:2022-12-13 18:25:38 公開日:2022-12-12
# ハンドブレアテ:手のひらからの呼吸異常の非接触モニタリング Hand-breathe: Non-Contact Monitoring of Breathing Abnormalities from Hand Palm ( http://arxiv.org/abs/2212.06089v1 ) ライセンス: Link先を確認	Kawish Pervez, Waqas Aman, M. Mahboob Ur Rahman, M. Wasim Nawaz, Qammer H. Abbasi	(参考訳) ポストコビッド19の世界では、無線周波数(RF)ベースの非接触手法、例えばソフトウェア定義無線(SDR)ベースの手法が、人間のバイタルをインテリジェントにリモートセンシングするための候補として浮上し、コビッド19のような伝染性ウイルスを封じ込めている。そこで本研究では,usrp(universal software radio peripherals)ベースのsdrと古典的機械学習(ml)法を用いた非接触型呼吸異常監視法を提案する。提案手法では,送信アンテナと受信アンテナの間にあるテーブルに手を置くとともに,直交周波数分割多重化(ofdm)信号が手を通過する。その後、受信機はチャネル周波数応答(基本的、微細な無線チャネル状態情報)を抽出し、様々なMLアルゴリズムに供給し、最終的に異なる呼吸異常を分類する。すべての分類器のうち、線形svm分類器は最大精度88.1\%であった。 ML分類器を教師付きで訓練するために,実験室環境における4被験者のリアルタイム実験によりデータ収集を行った。ラベル生成の目的で、被験者の呼吸は正常、速、低呼吸の3つのクラスに分類された。さらに,提案手法(手のみをRF信号に曝す方法)に加えて,最先端手法(完全胸部をRF放射に曝す方法)の実装と試験を行った。両手法の性能比較の結果,提案手法の精度はわずかに劣るが,本手法ではRF照射による被曝が最小限に抑えられるというトレードオフが示された。 In post-covid19 world, radio frequency (RF)-based non-contact methods, e.g., software-defined radios (SDR)-based methods have emerged as promising candidates for intelligent remote sensing of human vitals, and could help in containment of contagious viruses like covid19. To this end, this work utilizes the universal software radio peripherals (USRP)-based SDRs along with classical machine learning (ML) methods to design a non-contact method to monitor different breathing abnormalities. Under our proposed method, a subject rests his/her hand on a table in between the transmit and receive antennas, while an orthogonal frequency division multiplexing (OFDM) signal passes through the hand. Subsequently, the receiver extracts the channel frequency response (basically, fine-grained wireless channel state information), and feeds it to various ML algorithms which eventually classify between different breathing abnormalities. Among all classifiers, linear SVM classifier resulted in a maximum accuracy of 88.1\%. 
To train the ML classifiers in a supervised manner, data was collected by doing real-time experiments on 4 subjects in a lab environment. For label generation purpose, the breathing of the subjects was classified into three classes: normal, fast, and slow breathing. Furthermore, in addition to our proposed method (where only a hand is exposed to RF signals), we also implemented and tested the state-of-the-art method (where full chest is exposed to RF radiation). The performance comparison of the two methods reveals a trade-off, i.e., the accuracy of our proposed method is slightly inferior but our method results in minimal body exposure to RF radiation, compared to the benchmark method.	翻訳日:2022-12-13 18:25:25 公開日:2022-12-12
# どこから始めるか? 簡単なスキルを複雑な環境に移す Where To Start? Transferring Simple Skills to Complex Environments ( http://arxiv.org/abs/2212.06111v1 ) ライセンス: Link先を確認	Vitalis Vosylius, Edward Johns	(参考訳) ロボット学習は、把持など、ロボットに簡単なスキルを教える多くの方法を提供する。しかし、これらのスキルは通常、開けた散らかっていない環境で訓練されているため、より複雑で散らかった環境では望ましくない衝突を引き起こす可能性が高い。そこで本研究では,環境のグラフ表現に基づくアフォーダンスモデルを提案する。このモデルは展開時に最適化され,スキルを衝突なく実行できるように,スキルを開始するための適切なロボット構成を見つける。提案手法は,事前に獲得したスキルを,シミュレーションと実環境の両方で,把持作業と配置作業のいずれについても,未知の散らかった制約のある環境に一般化できることを実証する。 Robot learning provides a number of ways to teach robots simple skills, such as grasping. However, these skills are usually trained in open, clutter-free environments, and therefore would likely cause undesirable collisions in more complex, cluttered environments. In this work, we introduce an affordance model based on a graph representation of an environment, which is optimised during deployment to find suitable robot configurations to start a skill from, such that the skill can be executed without any collisions. We demonstrate that our method can generalise a priori acquired skills to previously unseen cluttered and constrained environments, in simulation and in the real world, for both a grasping and a placing task.	翻訳日:2022-12-13 18:24:54 公開日:2022-12-12
# 自動運転への適用を伴う強化学習セキュリティに関する調査研究 A Survey on Reinforcement Learning Security with Application to Autonomous Driving ( http://arxiv.org/abs/2212.06123v1 ) ライセンス: Link先を確認	Ambra Demontis, Maura Pintor, Luca Demetrio, Kathrin Grosse, Hsiao-Ying Lin, Chengfang Fang, Battista Biggio, Fabio Roli	(参考訳) 強化学習は、機械が自身の経験から学ぶことを可能にする。今日では、強化学習アルゴリズムが効果的で信頼できる方策を学習することを妨げるか、または訓練されたエージェントに間違った判断をさせるように慎重に作られた攻撃に対して脆弱であるにもかかわらず、自動運転のような安全クリティカルなアプリケーションで使用されている。強化学習のセキュリティに関する文献は急速に増加しており、この分野に光を当てるためにいくつかのサーベイが提案されている。しかし、それらの分類は、手元にあるシステムの種類に応じて適切な防御を選択するには不十分である。本サーベイでは,異なる視点を考慮することでこの制限を克服するだけでなく,強化学習アルゴリズムが自動運転の文脈で使用される場合の最先端の攻撃と防御の適用可能性についても論じる。 Reinforcement learning allows machines to learn from their own experience. Nowadays, it is used in safety-critical applications, such as autonomous driving, despite being vulnerable to attacks carefully crafted either to prevent the reinforcement learning algorithm from learning an effective and reliable policy or to induce the trained agent to make a wrong decision. The literature about the security of reinforcement learning is rapidly growing, and some surveys have been proposed to shed light on this field. However, their categorizations are insufficient for choosing an appropriate defense given the kind of system at hand. In our survey, we not only overcome this limitation by considering a different perspective, but also discuss the applicability of state-of-the-art attacks and defenses when reinforcement learning algorithms are used in the context of autonomous driving.	翻訳日:2022-12-13 18:24:42 公開日:2022-12-12
# 精神状態に基づくパーソナライズ型睡眠誘導システムの開発 Development of Personalized Sleep Induction System based on Mental States ( http://arxiv.org/abs/2212.05669v1 ) ライセンス: Link先を確認	Young-Seok Kweon, Gi-Hwan Shin, Heon-Gyu Kwak	(参考訳) 睡眠は認知、運動、感情面のパフォーマンスの低下や様々な病気を防ぐために不可欠な行動である。しかし、人々が眠りたいときに眠りに落ちることは容易ではない。睡眠を妨げる要因には、新型コロナウイルス(COVID-19)の状況、外からの騒音、夜間の光など様々なものがある。我々は,脳波と聴覚刺激を用いた,精神状態に基づくパーソナライズされた睡眠誘導システムの開発を目指している。本システムは,脳波と,ピッツバーグ睡眠質指標およびブルネル気分尺度の結果を用いて,ユーザの精神状態を解析する。精神状態に応じて、このシステムは、ホワイトノイズ、繰り返しビープ音、雨音、バイノーラルビート、シャム音の5つの聴覚刺激の中から睡眠誘導音を再生する。最後に、睡眠誘導システムは被験者の睡眠段階を94.7%で分類し、被験者がノンレム(非急速眼球運動)睡眠を示した場合に聴覚刺激を停止した。本システムにより20名中18名が入眠した。 Sleep is an essential behavior to prevent the decrement of cognitive, motor, and emotional performance and various diseases. However, it is not easy to fall asleep when people want to sleep. There are various sleep-disturbing factors such as the COVID-19 situation, noise from outside, and light during the night. We aim to develop a personalized sleep induction system based on mental states using electroencephalogram and auditory stimulation. Our system analyzes users' mental states using an electroencephalogram and results of the Pittsburgh sleep quality index and Brunel mood scale. According to mental states, the system plays a sleep-induction sound chosen from five auditory stimuli: white noise, repetitive beep sounds, rainy sound, binaural beat, and sham sound. Finally, the sleep-inducing system classified the sleep stage of participants with 94.7 percent and stopped auditory stimulation if participants showed non-rapid eye movement sleep. With our system, 18 of 20 participants fell asleep.	翻訳日:2022-12-13 18:16:45 公開日:2022-12-12
# Wasserstein分布ロバスト最適化による一般化と正規化について On Generalization and Regularization via Wasserstein Distributionally Robust Optimization ( http://arxiv.org/abs/2212.05716v1 ) ライセンス: Link先を確認	Qinyu Wu, Jonathan Yu-Meng Li, Tiantian Mao	(参考訳) wasserstein distributionally robust optimization (dro) は、運用研究や機械学習アプリケーションで成功し、サンプル外のパフォーマンスに有利なソリューションを得るための強力な手段となった。この成功の2つの説得力ある説明は、ワッサーシュタイン DRO から導かれる一般化境界と、ワッサースタイン DRO と機械学習によく適用される正規化スキームの等価性である。一般化境界と正規化の同値性に関する既存の結果は、wasserstein球が特定の型であり、決定基準が期待される関数の特定の形を取るような設定に限定されている。本稿では,アフィン決定規則を伴うwasserstein dro問題に焦点をあてることで,wasserstein球が一般型であり,決定基準がリスクの一般的な尺度,すなわち非線形分布となるような,かなり広い設定で一般化境界と正規化の等価性を得ることができることを示す。これにより、wasserstein droを使用していない多くの重要な分類、回帰、リスク最小化アプリケーションに対応することができる。我々の結果は、一般化境界は次元性の呪いに苦しめられず、正規化の同値性は正確であるという点で強い。副産物として、我々の正規化結果は、正規化定式化によって効率的に解けるワッサーシュタイン DRO モデルのクラスを大きく広げた。 Wasserstein distributionally robust optimization (DRO) has found success in operations research and machine learning applications as a powerful means to obtain solutions with favourable out-of-sample performances. Two compelling explanations for the success are the generalization bounds derived from Wasserstein DRO and the equivalency between Wasserstein DRO and the regularization scheme commonly applied in machine learning. Existing results on generalization bounds and the equivalency to regularization are largely limited to the setting where the Wasserstein ball is of a certain type and the decision criterion takes certain forms of an expected function. In this paper, we show that by focusing on Wasserstein DRO problems with affine decision rules, it is possible to obtain generalization bounds and the equivalency to regularization in a significantly broader setting where the Wasserstein ball can be of a general type and the decision criterion can be a general measure of risk, i.e., nonlinear in distributions. 
This allows for accommodating many important classification, regression, and risk minimization applications that have not been addressed to date using Wasserstein DRO. Our results are strong in that the generalization bounds do not suffer from the curse of dimensionality and the equivalency to regularization is exact. As a byproduct, our regularization results broaden considerably the class of Wasserstein DRO models that can be solved efficiently via regularization formulations.	翻訳日:2022-12-13 18:16:29 公開日:2022-12-12
# 安全クリティカルなタスクに向けたモデルフリー強化学習の評価 Evaluating Model-free Reinforcement Learning toward Safety-critical Tasks ( http://arxiv.org/abs/2212.05727v1 ) ライセンス: Link先を確認	Linrui Zhang and Qin Zhang and Li Shen and Bo Yuan and Xueqian Wang and Dacheng Tao	(参考訳) 自律エージェントを含む多くの現実世界のアプリケーションでは、安全性が最優先される。安全クリティカルなタスクに焦点を絞った強化学習(RL)手法は多数存在するが、複雑で未知のダイナミクスの下で各決定ステップにおいて安全性制約を遵守するアルゴリズムの高品質な評価は依然として不足している。本稿では,この領域における先行研究を,状態単位で安全な(state-wise safe)RLの観点から再考し,それぞれ射影ベース,リカバリベース,最適化ベースのアプローチとして分類する。さらに,安全最適化と安全射影を組み合わせた共同手法であるUnrolling Safety Layer (USL)を提案する。この新手法はディープアンローリングアーキテクチャを通じてハード制約を明示的に課し、報酬改善と制約満足度の間のトレードオフをうまく扱える構造上の利点を持つ。この領域のさらなる研究を容易にするために、我々は、関連するアルゴリズムを統一パイプラインで再現し、安全クリティカルなタスク向けの既製インターフェースと評価ユーティリティを提供するツールキットであるSafeRL-Kitに組み込む。次に、ロボット制御から自動運転までの6つのベンチマークで、対象アルゴリズムの比較研究を行う。実験結果は,タスク依存の手作業による調整なしにゼロコストリターンの方策を学習する際の,各アルゴリズムの適用性と堅牢性について知見を与える。プロジェクトページは https://sites.google.com/view/saferlkit で閲覧できる。 Safety comes first in many real-world applications involving autonomous agents. Despite a large number of reinforcement learning (RL) methods focusing on safety-critical tasks, there is still a lack of high-quality evaluation of those algorithms that adhere to safety constraints at each decision step under complex and unknown dynamics. In this paper, we revisit prior work in this scope from the perspective of state-wise safe RL and categorize them as projection-based, recovery-based, and optimization-based approaches, respectively. Furthermore, we propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection. This novel technique explicitly enforces hard constraints via the deep unrolling architecture and enjoys structural advantages in navigating the trade-off between reward improvement and constraint satisfaction. To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit, a toolkit that provides off-the-shelf interfaces and evaluation utilities for safety-critical tasks. 
We then perform a comparative study of the involved algorithms on six benchmarks ranging from robotic control to autonomous driving. The empirical results provide an insight into their applicability and robustness in learning zero-cost-return policies without task-dependent handcrafting. The project page is available at https://sites.google.com/view/saferlkit.	翻訳日:2022-12-13 18:16:03 公開日:2022-12-12
# クリックスルー率予測における埋め込みの適応的低精度訓練 Adaptive Low-Precision Training for Embeddings in Click-Through Rate Prediction ( http://arxiv.org/abs/2212.05735v1 ) ライセンス: Link先を確認	Shiwei Li, Huifeng Guo, Lu Hou, Wei Zhang, Xing Tang, Ruiming Tang, Rui Zhang, Ruixuan Li	(参考訳) 埋め込みテーブルは通常、クリックスルー率(CTR)予測モデルにおいて巨大である。 CTRモデルを効率よく、経済的に訓練・展開するには、トレーニング段階で埋め込みテーブルを圧縮する必要がある。この目的のために,学習段階から埋め込みを圧縮する新しい量子化訓練パラダイムを定式化し,低精度訓練(LPT)と呼ぶ。また,その収束に関する理論的解析を行う。その結果, LPTにおいて確率的重み量子化は, 決定論的重み量子化よりも収束速度が速く, 収束誤差も小さいことがわかった。さらに, 精度劣化を軽減するために, 勾配降下を通じてステップサイズ(すなわち量子化分解能)を学習する適応型低精度トレーニング(ALPT)を提案する。 2つの実世界のデータセットでの実験により我々の解析が裏付けられ、ALPTが特に極めて低いビット幅で予測精度を大幅に改善できることが示された。 CTRモデルにおいて初めて,予測精度を犠牲にすることなく8ビット埋め込みのトレーニングに成功した。 ALPTのコードは公開されている。 Embedding tables are usually huge in click-through rate (CTR) prediction models. To train and deploy the CTR models efficiently and economically, it is necessary to compress their embedding tables at the training stage. To this end, we formulate a novel quantization training paradigm to compress the embeddings from the training stage, termed low-precision training (LPT). Also, we provide theoretical analysis on its convergence. The results show that stochastic weight quantization has a faster convergence rate and a smaller convergence error than deterministic weight quantization in LPT. Further, to reduce the accuracy degradation, we propose adaptive low-precision training (ALPT) that learns the step size (i.e., the quantization resolution) through gradient descent. Experiments on two real-world datasets confirm our analysis and show that ALPT can significantly improve the prediction accuracy, especially at extremely low bit widths. For the first time in CTR models, we successfully train 8-bit embeddings without sacrificing prediction accuracy. The code of ALPT is publicly available.	翻訳日:2022-12-13 18:15:41 公開日:2022-12-12
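The ALPT entry above contrasts deterministic and stochastic weight quantization with a step size. The following is a minimal sketch of that distinction, assuming a uniform signed integer grid; the function names, the 8-bit default, and the clamping convention are choices made for this example and are not taken from the paper. The learned step size of ALPT is not modelled here: `step` is simply a fixed argument.

```python
import math
import random

def quantize_deterministic(w, step, bits=8):
    """Round w/step to the nearest integer, then clamp to the signed range."""
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    q = math.floor(w / step + 0.5)
    return max(lo, min(hi, q))

def quantize_stochastic(w, step, bits=8, rng=random):
    """Round w/step up with probability equal to its fractional part."""
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    scaled = w / step
    base = math.floor(scaled)
    q = base + (1 if rng.random() < scaled - base else 0)
    return max(lo, min(hi, q))

def dequantize(q, step):
    """Map an integer code back to a real value."""
    return q * step
```

Inside the clamping range, stochastic rounding is unbiased in expectation (the average of many dequantized samples approaches the original weight), whereas deterministic rounding always returns the same nearest grid point.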
# シャッフルとバッチクリッピングによるDP-SGDの一般化 Generalizing DP-SGD with Shuffling and Batching Clipping ( http://arxiv.org/abs/2212.05796v1 ) ライセンス: Link先を確認	Marten van Dijk, Phuong Ha Nguyen, Toan N. Nguyen and Lam M. Nguyen	(参考訳) 古典的な差分プライベートDP-SGDは、ランダムサブサンプリングによる個別クリッピングを実装しており、ミニバッチSGDアプローチを強制する。我々は DP-SGDを超える一般的な差分プライベートアルゴリズムフレームワークを提供する。このフレームワークは、任意の一階最適化手法(古典的なSGDや運動量に基づくSGDアプローチなど)をバッチクリッピングと組み合わせて利用できる。バッチクリッピングは、(個別クリッピングのように)クリッピングされた勾配を合計するのではなく、計算された勾配の集約をクリッピングする。このフレームワークはまた、シャッフルのようなランダムサブサンプリング以外のサンプリング技術も許容する。我々のDP分析は$f$-DPアプローチに従い、グループプライバシの分析も可能にする新しい証明手法を導入する。特に、$E$ エポックの学習とサイズ $g$ のグループに対して、シャッフル付きバッチクリッピングが $\sqrt{g E}$ のDP依存性を持つことを示す。これは、以前予想されていた$g$に対する線形依存性よりもはるかに優れており、また $E$ エポック内のラウンド総数に対する平方根依存性(一般に $\sqrt{E}$ よりはるかに大きい)という従来の予想よりもはるかに優れている。 Classical differentially private DP-SGD implements individual clipping with random subsampling, which forces a mini-batch SGD approach. We provide a general differentially private algorithmic framework that goes beyond DP-SGD and allows any possible first-order optimizers (e.g., classical SGD and momentum based SGD approaches) in combination with batch clipping, which clips an aggregate of computed gradients rather than summing clipped gradients (as is done in individual clipping). The framework also admits sampling techniques beyond random subsampling such as shuffling. Our DP analysis follows the $f$-DP approach and introduces a new proof technique which allows us to also analyse group privacy. In particular, for $E$ epochs of work and groups of size $g$, we show a $\sqrt{g E}$ DP dependency for batch clipping with shuffling. This is much better than the previously anticipated linear dependency in $g$ and is much better than the previously expected square root dependency on the total number of rounds within $E$ epochs which is generally much more than $\sqrt{E}$.	翻訳日:2022-12-13 18:15:24 公開日:2022-12-12
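The entry above distinguishes individual clipping (sum of clipped gradients) from batch clipping (clip of the aggregated gradient). A small sketch of just that difference follows, assuming L2-norm clipping with a threshold `c`; the helper names are invented for this example, and the noise addition and privacy accounting that a real differentially private method requires are deliberately left out.

```python
import math

def clip(v, c):
    """Scale vector v so that its L2 norm is at most c."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= c:
        return list(v)
    return [x * c / norm for x in v]

def vec_sum(vectors):
    """Coordinate-wise sum of a list of equal-length vectors."""
    return [sum(col) for col in zip(*vectors)]

def individual_clipping(grads, c):
    """Clip every per-example gradient, then sum the clipped gradients."""
    return vec_sum([clip(g, c) for g in grads])

def batch_clipping(grads, c):
    """Sum the gradients first, then clip the aggregate once."""
    return clip(vec_sum(grads), c)

# Two hypothetical per-example gradients, clipping threshold 1.0.
grads = [[3.0, 4.0], [-3.0, 4.0]]
ind = individual_clipping(grads, 1.0)
bat = batch_clipping(grads, 1.0)
```

With individual clipping the norm of the result can grow with the batch size, while batch clipping bounds the norm of the whole aggregate by `c`, which is why only the aggregate needs to be computed.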
# HACA3:マルチサイトMR画像調和のための統一的アプローチ HACA3: A Unified Approach for Multi-site MR Image Harmonization ( http://arxiv.org/abs/2212.06065v1 ) ライセンス: Link先を確認	Lianrui Zuo, Yihao Liu, Yuan Xue, Blake E. Dewey, Murat Bilgel, Ellen M. Mowry, Scott D. Newsome, Peter A. Calabresi, Susan M. Resnick, Jerry L. Prince, Aaron Carass	(参考訳) 標準化の欠如は磁気共鳴(MR)イメージングにおいて顕著な問題である。これはしばしば、ハードウェアと取得パラメータの違いによる望ましくないコントラスト変動を引き起こす。近年,望ましくないコントラスト変動を補うために,disentanglement(要素分離)を伴う画像合成によるMRハーモニゼーションが提案されている。既存の方法の成功にもかかわらず、私たちは3つの大きな改善ができると主張する。第一に、既存のほとんどの手法は、同一被験者のマルチコントラストMR画像が同じ解剖学的構造を共有するという仮定に基づいて構築されている。異なるMRコントラストは異なる解剖学的特徴を強調するように特化されているため、この仮定は疑わしい。第二に、これらの方法はトレーニングのために固定されたMRコントラストの組(例えば、T1強調画像とT2強調画像の両方が利用可能でなければならない)を必要とすることが多く、適用範囲が制限される。第三に、既存の手法は一般的にイメージングアーティファクトに敏感である。本稿では,これらの3つの問題に対処するために,新しい手法 Harmonization with Attention-based Contrast, Anatomy, and Artifact Awareness (HACA3)を提案する。まず,HACA3がMRコントラスト間の解剖学的差異を尊重できるようにする解剖学的融合モジュールを提案する。 HACA3はまた、イメージングアーティファクトにも堅牢であり、MRコントラストの任意の組に対して学習および適用することができる。実験により、HACA3は複数の画像品質指標の下で最先端のパフォーマンスを達成することが示された。また,磁場強度,スキャナプラットフォーム,取得プロトコルの異なる21のサイトから取得した多様なMRデータセットを用いて,下流タスクにおけるHACA3の適用性を示す。 The lack of standardization is a prominent issue in magnetic resonance (MR) imaging. This often causes undesired contrast variations due to differences in hardware and acquisition parameters. In recent years, MR harmonization using image synthesis with disentanglement has been proposed to compensate for the undesired contrast variations. Despite the success of existing methods, we argue that three major improvements can be made. First, most existing methods are built upon the assumption that multi-contrast MR images of the same subject share the same anatomy. This assumption is questionable since different MR contrasts are specialized to highlight different anatomical features. Second, these methods often require a fixed set of MR contrasts for training (e.g., both T1-weighted and T2-weighted images must be available), which limits their applicability. Third, existing methods generally are sensitive to imaging artifacts. 
In this paper, we present a novel approach, Harmonization with Attention-based Contrast, Anatomy, and Artifact Awareness (HACA3), to address these three issues. We first propose an anatomy fusion module that enables HACA3 to respect the anatomical differences between MR contrasts. HACA3 is also robust to imaging artifacts and can be trained and applied to any set of MR contrasts. Experiments show that HACA3 achieves state-of-the-art performance under multiple image quality metrics. We also demonstrate the applicability of HACA3 on downstream tasks with diverse MR datasets acquired from 21 sites with different field strengths, scanner platforms, and acquisition protocols.	翻訳日:2022-12-13 18:08:59 公開日:2022-12-12
# マルチプレイヤー不完全情報ゲームにおけるベイジアン対戦モデル Bayesian Opponent Modeling in Multiplayer Imperfect-Information Games ( http://arxiv.org/abs/2212.06027v1 ) ライセンス: Link先を確認	Sam Ganzfried, Kevin A. Wang, Max Chiswick	(参考訳) 多くの現実世界の設定エージェントは、様々な戦略を利用できる複数の反対エージェントと戦略的に相互作用する。このような設定のためにエージェントを設計する標準的なアプローチは、nash均衡のような関連するゲーム理論的な解の概念を計算または近似し、所定の戦略に従うことである。しかし、このような戦略は、相手のプレーの観察を無視し、悪用できる欠点を示す可能性がある。本稿では,マルチプレイヤー不完全情報ゲームにおいて,繰り返しのインタラクションを通じて対戦者のプレーを観察する対戦者モデリング手法を提案する。我々は,3人プレイのクーンポーカーにおいて,多種多様な実敵と正確なナッシュ均衡戦略に対して実験を行い,このアルゴリズムがナッシュ均衡戦略を含む全てのエージェントを著しく上回ることを示す。 In many real-world settings agents engage in strategic interactions with multiple opposing agents who can employ a wide variety of strategies. The standard approach for designing agents for such settings is to compute or approximate a relevant game-theoretic solution concept such as Nash equilibrium and then follow the prescribed strategy. However, such a strategy ignores any observations of opponents' play, which may indicate shortcomings that can be exploited. We present an approach for opponent modeling in multiplayer imperfect-information games where we collect observations of opponents' play through repeated interactions. We run experiments against a wide variety of real opponents and exact Nash equilibrium strategies in three-player Kuhn poker and show that our algorithm significantly outperforms all of the agents, including the exact Nash equilibrium strategies.	翻訳日:2022-12-13 18:08:34 公開日:2022-12-12
# 近臨界レーザープラズマからの電子スペクトルに対する可逆ニューラルネットワークの受容率:比較 Acceptance Rates of Invertible Neural Networks on Electron Spectra from Near-Critical Laser-Plasmas: A Comparison ( http://arxiv.org/abs/2212.05836v1 ) ライセンス: Link先を確認	Thomas Miethlinger, Nico Hoffmann, Thomas Kluge	(参考訳) 超高強度超短レーザーパルスと近臨界および過臨界プラズマの相互作用は直接観測できず、実験的にアクセス可能な量(観測可能量)は、しばしば、基礎となるプラズマダイナミクスに関する情報を間接的に与えるだけである。さらに、観測可能量が提供する情報は不完全であり、逆問題は非常に曖昧である。したがって、プラズマダイナミクスと実験パラメータを推論するためには、観測が与えられたときのパラメータの完全な分布を考慮する必要があり、そのためにはモデルが柔軟で、順過程で失われた情報を考慮できなければならない。 Invertible Neural Networks (INNs) は、順過程と逆過程の両方を効率的にモデル化し、特定の測定値が与えられたときの完全な条件付き事後分布を提供するように設計されている。本研究では,合成電子スペクトル上でINNと標準的な統計手法のベンチマークを行う。まず,受容率に関する実験結果を示し,受容率が最大10倍に向上することを示す。さらに,この受容率の増加が,INNに同程度の高速化をもたらすことを示す。最後に, INNを活用し,高い精度を維持しつつ短い実行時間が期待できる複合アルゴリズムを提案する。 While the interaction of ultra-intense ultra-short laser pulses with near- and overcritical plasmas cannot be directly observed, experimentally accessible quantities (observables) often only indirectly give information about the underlying plasma dynamics. Furthermore, the information provided by observables is incomplete, making the inverse problem highly ambiguous. Therefore, in order to infer plasma dynamics as well as experimental parameters, the full distribution over parameters given an observation needs to be considered, requiring that models are flexible and account for the information lost in the forward process. Invertible Neural Networks (INNs) have been designed to efficiently model both the forward and inverse process, providing the full conditional posterior given a specific measurement. In this work, we benchmark INNs and standard statistical methods on synthetic electron spectra. First, we provide experimental results with respect to the acceptance rate, where our results show increases in acceptance rates up to a factor of 10. Additionally, we show that this increased acceptance rate also results in an increased speed-up for INNs to the same extent. 
Lastly, we propose a composite algorithm that utilizes INNs and promises low runtimes while preserving high accuracy.	翻訳日:2022-12-13 18:08:21 公開日:2022-12-12
# CTT-Net:白内障術後視力予測のための多視点クロストークン・トランスフォーマー CTT-Net: A Multi-view Cross-token Transformer for Cataract Postoperative Visual Acuity Prediction ( http://arxiv.org/abs/2212.05794v1 ) ライセンス: Link先を確認	Jinhong Wang, Jingwen Wang, Tingting Chen, Wenhao Zheng, Zhe Xu, Xingdi Wu, Wen Xu, Haochao Ying, Danny Chen, and Jian Wu	(参考訳) 手術は、視力(VA)障害のある白内障患者に対する唯一の有効な治療である。臨床的には、白内障手術の必要性を評価するために、多視点光コヒーレンス断層撮影(OCT)画像を分析して術前に術後VAを正確に予測することが強く求められている。残念ながら, 眼底の状態が複雑なため, 術後VAの判定は医療専門家にとって依然として困難である。近年,この問題に対する深層学習手法が開発されている。有効ではあるが、これらの手法は、多視点OCT画像間の潜在的な関係を効率的に探索しない、臨床事前知識(例えば、術前のVA値)の重要な役割を無視する、参照を欠いた回帰ベースのメトリクスのみを使用するなど、いくつかの問題に直面している。本稿では,多視点OCT画像と術前VAの両方を分析し,術後VAを予測する新しいクロストークン・トランスフォーマーネットワーク(CTT-Net)を提案する。 OCT画像の多視点特徴を効果的に融合するために,冗長・不要な注意フローを制限できるクロストークン注意を開発した。さらに,術前VA値を用いて,術後VA予測のためにより多くの情報を提供し,ビュー間の融合を促進する。加えて,モデル性能を向上させ,VAの回復をより十分に評価するために補助的な分類損失を設計し,回帰メトリクスのみを用いることによる制限を回避する。 CTT-Netを評価するために,共同病院から収集した多視点OCT画像データセットを構築した。広範な実験により、さまざまなメトリクスにおいて既存の方法と比較したモデルの有効性が検証された。コードは、https://github.com/wjh892521292/Cataract OCTで入手できる。 Surgery is the only viable treatment for cataract patients with visual acuity (VA) impairment. Clinically, to assess the necessity of cataract surgery, accurately predicting postoperative VA before surgery by analyzing multi-view optical coherence tomography (OCT) images is crucially needed. Unfortunately, due to complicated fundus conditions, determining postoperative VA remains difficult for medical experts. Deep learning methods for this problem were developed in recent years. Although effective, these methods still face several issues, such as not efficiently exploring potential relations between multi-view OCT images, neglecting the key role of clinical prior knowledge (e.g., preoperative VA value), and using only regression-based metrics which are lacking reference. In this paper, we propose a novel Cross-token Transformer Network (CTT-Net) for postoperative VA prediction by analyzing both the multi-view OCT images and preoperative VA. 
To effectively fuse multi-view features of OCT images, we develop cross-token attention that could restrict redundant/unnecessary attention flow. Further, we utilize the preoperative VA value to provide more information for postoperative VA prediction and facilitate fusion between views. Moreover, we design an auxiliary classification loss to improve model performance and assess VA recovery more sufficiently, avoiding the limitation by only using the regression metrics. To evaluate CTT-Net, we build a multi-view OCT image dataset collected from our collaborative hospital. A set of extensive experiments validate the effectiveness of our model compared to existing methods in various metrics. Code is available at: https://github.com/wjh892521292/Cataract OCT.	翻訳日:2022-12-13 17:59:16 公開日:2022-12-12
# Z-SSMNet:bpMRIにおける前立腺癌の検出と診断のためのゾーン認識型自己教師付きメッシュネットワーク Z-SSMNet: A Zonal-aware Self-Supervised Mesh Network for Prostate Cancer Detection and Diagnosis in bpMRI ( http://arxiv.org/abs/2212.05808v1 ) ライセンス: Link先を確認	Yuan Yuan, Euijoon Ahn, Dagan Feng, Mohamad Khadra, Jinman Kim	(参考訳) 前立腺癌(PCa)は、男性において最も多いがんの1つであり、世界中の多くの人々が臨床的に重要なPCa(csPCa)によって死亡している。マルチパラメトリックMRI (mpMRI) と比較して非侵襲的, 費用対効果が高く, より効率的であるbi-parametric MRI (bpMRI) におけるcsPCaの早期診断は、PCaの精密医療に寄与しうる。人工知能(AI)アルゴリズムの急速な発展は、csPCaの診断と理解を支援する意思決定支援システムの提供において、前例のない改善を可能にしている。しかし、ディープラーニング技術に基づく既存の最先端AIアルゴリズムは、しばしば2D画像に限定され、3Dボリューム画像のスライス間相関を捉えられない。 3D畳み込みニューラルネットワーク(CNN)の使用は、この制限を部分的に克服するが、画像の異方性に適応しないため、意味表現が準最適となり、汎化性能も低い。さらに、bpMRIのラベル付きデータの量が限られ、ラベル付けも難しいため、既存のCNNは比較的小さなデータセット上に構築されており、性能が劣っている。上記の制約に対処するため,複数の2D,2.5D,3DのCNNを適応的に融合させてbpMRIにおける疎なスライス間情報と密なスライス内情報の表現を効果的にバランスさせる,新しいZonal-aware Self-supervised Mesh Network(Z-SSMNet)を提案する。さらに自己教師付き学習(SSL)技術を導入し,ラベルなしデータを用いてネットワークを事前学習し,一般化可能な画像特徴を学習する。加えて,csPCaの診断精度を向上させるために,ゾーン特異的なドメイン知識を理解するようネットワークに制約を課した。 PI-CAIチャレンジデータセットの実験により、提案手法がbpMRIにおけるcsPCaの検出および診断においてより良い性能を達成することを示す。 Prostate cancer (PCa) is one of the most prevalent cancers in men and many people around the world die from clinically significant PCa (csPCa). Early diagnosis of csPCa in bi-parametric MRI (bpMRI), which is non-invasive, cost-effective, and more efficient compared to multiparametric MRI (mpMRI), can contribute to precision care for PCa. The rapid rise in artificial intelligence (AI) algorithms is enabling unprecedented improvements in providing decision support systems that can aid in csPCa diagnosis and understanding. However, existing state-of-the-art AI algorithms, which are based on deep learning technology, are often limited to 2D images and thus fail to capture inter-slice correlations in 3D volumetric images. The use of 3D convolutional neural networks (CNNs) partly overcomes this limitation, but it does not adapt to the anisotropy of images, resulting in sub-optimal semantic representation and poor generalization. 
Furthermore, due to the limited amount of labelled bpMRI data and the difficulty of labelling, existing CNNs are built on relatively small datasets, leading to poor performance. To address the limitations identified above, we propose a new Zonal-aware Self-supervised Mesh Network (Z-SSMNet) that adaptively fuses multiple 2D, 2.5D and 3D CNNs to effectively balance representation for sparse inter-slice information and dense intra-slice information in bpMRI. A self-supervised learning (SSL) technique is further introduced to pre-train our network using unlabelled data to learn the generalizable image features. Furthermore, we constrained our network to understand the zonal specific domain knowledge to improve the diagnosis precision of csPCa. Experiments on the PI-CAI Challenge dataset demonstrate our proposed method achieves better performance for csPCa detection and diagnosis in bpMRI.	翻訳日:2022-12-13 17:58:49 公開日:2022-12-12
# KonX: クロスリゾリューション画像の品質評価 KonX: Cross-Resolution Image Quality Assessment ( http://arxiv.org/abs/2212.05813v1 ) ライセンス: Link先を確認	Oliver Wiedemann and Vlad Hosu and Shaolin Su and Dietmar Saupe	(参考訳) スケール不変性は多くのコンピュータビジョンサブフィールドにおいてオープンな問題である。例えば、オブジェクトラベルはスケールにわたって一定であり続けるべきですが、モデル予測は多くのケースでばらつきます。この問題は、プレゼンテーションの規模で基調ラベルが変化するタスクでは難しくなる。画質アセスメント(iqa)では、ダウンサンプリングはぼやけや圧縮アーティファクトなどの障害を弱め、主観的な研究で誘発される印象に正の影響を与える。したがって、知覚画像の品質を正確に予測するためには、クロスレゾリューションIQA法はモデル不整合によって引き起こされる分解能依存誤差と、基底真実における知覚ラベルシフトを考慮しなければならない。本報告では,この2つの問題を分離して検討する手法として,新しいクロスレゾリューションIQAデータベースであるKonXについて述べる。本稿は以下のとおりである。 1. KonX を用いて, 表示解像度の変化によるラベルシフトの実証的証拠を提供する。 2. 客観的 iqa 手法にはスケールバイアスがあり,予測性能が低下することを示す。 3) 従来のIQAモデルよりも性能が向上するマルチスケール・マルチカラムDNNアーキテクチャを提案する。そこで我々は,画像品質評価における新たな研究課題を提起し,解決する。 Scale-invariance is an open problem in many computer vision subfields. For example, object labels should remain constant across scales, yet model predictions diverge in many cases. This problem gets harder for tasks where the ground-truth labels change with the presentation scale. In image quality assessment (IQA), downsampling attenuates impairments, e.g., blurs or compression artifacts, which can positively affect the impression evoked in subjective studies. To accurately predict perceptual image quality, cross-resolution IQA methods must therefore account for resolution-dependent errors induced by model inadequacies as well as for the perceptual label shifts in the ground truth. We present the first study of its kind that disentangles and examines the two issues separately via KonX, a novel, carefully crafted cross-resolution IQA database. This paper contributes the following: 1. Through KonX, we provide empirical evidence of label shifts caused by changes in the presentation resolution. 2. We show that objective IQA methods have a scale bias, which reduces their predictive performance. 3. 
We propose a multi-scale and multi-column DNN architecture that improves performance over previous state-of-the-art IQA models for this task, including recent transformers. We thus both raise and address a novel research problem in image quality assessment.	翻訳日:2022-12-13 17:58:11 公開日:2022-12-12
# NFResNet:デブラーリングのためのマルチスケールおよびU字型ネットワーク NFResNet: Multi-scale and U-shaped Networks for Deblurring ( http://arxiv.org/abs/2212.05909v1 ) ライセンス: Link先を確認	Tanish Mittal, Preyansh Agrawal, Esha Pahwa, Aarya Makwana	(参考訳) マルチスケールおよびU字型ネットワークは、デブラーリングを含む様々な画像復元問題に広く利用されている。幅広い応用を念頭に置いて,これらのアーキテクチャの比較と画像のデブラーリングに対する効果について述べる。また、NFResblockと呼ばれる新しいブロックも導入する。これは高速フーリエ変換層と、一連の修正された非線形活性化を用いない(Non-Linear Activation Free)ブロックからなる。これらのアーキテクチャと追加に基づき,それぞれマルチスケールアーキテクチャとU-Netアーキテクチャを改良したNFResnetとNFResnet+を導入する。また、これらのアーキテクチャをトレーニングするために、Charbonnier Loss、Edge Loss、Frequency Reconstruction Lossという3つの異なる損失関数を使用する。本稿では,各成分のアブレーション研究とともに,Deep Video Deblurringデータセットに関する広範囲な実験を示す。提案アーキテクチャは,ピーク信号対雑音比 (PSNR) と構造類似度指数 (SSIM) の値を大きく向上させる。 Multi-Scale and U-shaped Networks are widely used in various image restoration problems, including deblurring. Keeping in mind the wide range of applications, we present a comparison of these architectures and their effects on image deblurring. We also introduce a new block called NFResblock. It consists of a Fast Fourier Transformation layer and a series of modified Non-Linear Activation Free Blocks. Based on these architectures and additions, we introduce NFResnet and NFResnet+, which are modified multi-scale and U-Net architectures, respectively. We also use three different loss functions to train these architectures: Charbonnier Loss, Edge Loss, and Frequency Reconstruction Loss. Extensive experiments on the Deep Video Deblurring dataset, along with ablation studies for each component, have been presented in this paper. The proposed architectures achieve a considerable increase in Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM) value.	翻訳日:2022-12-13 17:57:34 公開日:2022-12-12
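Of the three losses named in the entry above, the Charbonnier loss has a simple closed form that is easy to show. The sketch below uses the common definition sqrt((x - y)^2 + eps^2) averaged over elements; the value `eps=1e-3`, the plain-list interface, and the function name are assumptions for illustration, and the entry does not state which variant or constant the authors used.

```python
import math

def charbonnier_loss(pred, target, eps=1e-3):
    """Mean Charbonnier penalty between two equal-length sequences.

    Behaves like |pred - target| for large errors and stays smooth
    (differentiable) near zero because of the eps term.
    """
    total = sum(math.sqrt((p - t) ** 2 + eps ** 2) for p, t in zip(pred, target))
    return total / len(pred)
```

For identical inputs the loss equals `eps` rather than zero, and for large differences it is almost exactly the mean absolute error.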
# 不均質効果を評価するハイブリッド打ち切り分位点回帰フォレスト Hybrid Censored Quantile Regression Forest to Assess the Heterogeneous Effects ( http://arxiv.org/abs/2212.05672v1 ) ライセンス: Link先を確認	Huichen Zhu, Yifei Sun, Ying Wei	(参考訳) 多くの応用において、打ち切りのある応答変数に対する不均質な治療効果は第一の関心事であり、異なる分位点(例えば中央値)で効果を評価することは自然である。多数の潜在的な効果修飾因子、治療効果の未知の構造、および右側打ち切りの存在は重大な課題をもたらす。本稿では,高次元変数によって変化する不均質効果を評価するために,Hybrid Censored Quantile Regression Forest(HCQRF)と呼ばれるハイブリッド・フォレスト・アプローチを開発した。このハイブリッド推定手法は、ランダムフォレストと打ち切り分位点回帰の利点を生かしている。打ち切りを処理するための質量再配分(redistribution-of-mass)重みと、高次元の効果関数を扱うためにフォレストから導出される適応的最近傍重みからなる、二重重み付き推定手順を提案する。また,治療効果関数に対する変数の影響を測定するために,変数重要度分解を提案する。広範なシミュレーション研究によりHCQRFの有効性と安定性が示された。また,シミュレーションの結果から,変数重要度分解の有効性が示された。大腸癌の臨床試験にHCQRFを適用し,治療効果の洞察に富む推定と有意義な変数重要度の結果を得た。変数重要度の結果は、分解の必要性も裏付けている。 In many applications, heterogeneous treatment effects on a censored response variable are of primary interest, and it is natural to evaluate the effects at different quantiles (e.g., median). The large number of potential effect modifiers, the unknown structure of the treatment effects, and the presence of right censoring pose significant challenges. In this paper, we develop a hybrid forest approach called Hybrid Censored Quantile Regression Forest (HCQRF) to assess the heterogeneous effects varying with high-dimensional variables. The hybrid estimation approach takes advantage of the random forests and the censored quantile regression. We propose a doubly-weighted estimation procedure that consists of a redistribution-of-mass weight to handle censoring and an adaptive nearest neighbor weight derived from the forest to handle high-dimensional effect functions. We propose a variable importance decomposition to measure the impact of a variable on the treatment effect function. Extensive simulation studies demonstrate the efficacy and stability of HCQRF. The result of the simulation study also convinces us of the effectiveness of the variable importance decomposition. We apply HCQRF to a clinical trial of colorectal cancer. 
We achieve insightful estimations of the treatment effect and meaningful variable importance results. The result of the variable importance also confirms the necessity of the decomposition.	翻訳日:2022-12-13 17:51:09 公開日:2022-12-12
# Bottleneck特徴を用いたテキスト注釈のない直接音声間翻訳 Direct Speech-to-speech Translation without Textual Annotation using Bottleneck Features ( http://arxiv.org/abs/2212.05805v1 ) ライセンス: Link先を確認	Junhui Zhang, Junjie Pan, Xiang Yin, Zejun Ma	(参考訳) 音声から音声への翻訳は、ある言語の発話を別の言語の発話へ直接翻訳するものであり、同時通訳のようなタスクにおいて大きな可能性を持つ。最先端のモデルは、通常、音素シーケンス予測のための補助モジュールを含み、トレーニングデータセットのテキストアノテーションを必要とする。我々は,テキストの注釈や内容情報なしで学習できる音声から音声への直接翻訳モデルを提案する。モデルに補助的な音素予測タスクを導入する代わりに,システムの翻訳性能を確保するために,モデルの中間学習目標としてボトルネック特徴を使用することを提案する。 Mandarin-Cantonese音声翻訳の実験は,提案手法の実現可能性を実証し,その性能は翻訳品質と合成品質の点でカスケードシステムに匹敵しうる。 Speech-to-speech translation directly translates a speech utterance to another between different languages, and has great potential in tasks such as simultaneous interpretation. State-of-the-art models usually contain an auxiliary module for phoneme sequence prediction, and this requires textual annotation of the training dataset. We propose a direct speech-to-speech translation model which can be trained without any textual annotation or content information. Instead of introducing an auxiliary phoneme prediction task in the model, we propose to use bottleneck features as intermediate training objectives for our model to ensure the translation performance of the system. Experiments on Mandarin-Cantonese speech translation demonstrate the feasibility of the proposed approach and the performance can match a cascaded system with respect to translation and synthesis qualities.	翻訳日:2022-12-13 17:50:50 公開日:2022-12-12
# Androidアプリケーションのための事前学習BERTモデル A Pre-Trained BERT Model for Android Applications ( http://arxiv.org/abs/2212.05976v1 ) ライセンス: Link先を確認	Tiezhu Sun (1), Kevin Allix (1), Kisub Kim (2), Xin Zhou (2), Dongsun Kim (3), David Lo (2), Tegawend\'e F. Bissyand\'e (1) and Jacques Klein (1) ((1) University of Luxembourg, (2) Singapore Management University, (3) Kyungpook National University)	(参考訳) 機械学習(ML)のおかげで、ますます多くのソフトウェアエンジニアリングタスクの自動化が可能になっている。 MLのソフトウェアアーティファクトへの適用における基本的な構成要素の1つは、これらのアーティファクト(例えば、ソースコードや実行可能なコード)を学習に適した形式に表現することである。多くの研究は表現学習を活用し、ML自体に適切な表現を自動設計する仕事を委譲している。しかし、Androidの問題の文脈では、既存のモデルは粗い粒度のアプリケーションレベル(apk2vecなど)に制限されるか、特定の下流タスク(smali2vecなど)で実行される。私たちの研究は、これらの2つの制限を緩和するために、バイトコードの効率的でタスクに依存しない、きめ細かい普遍的な表現を調査する新しい研究の一部です。このような表現は、様々な低レベル下流タスク(例えば、クラスレベルで)に関連する情報をキャプチャすることを目的としている。我々は自然言語処理の分野に触発され、普遍表現の問題は、文に関する抽象的な意味情報を様々なタスクで再利用することを目的として、BERTのようなユニバーサル言語モデルを構築することで解決された。我々は,Androidアプリケーションで使用される主要なバイナリフォーマットであるDEXバイトコードのチャンクを表現するために,BERTライクな言語モデルであるDexBERTを提案する。 DexBERT が DEX 言語をモデル化できるかどうかを実証的に評価し、2 つのクラスレベルのソフトウェアエンジニアリングタスクでモデルの適合性を評価する。また、サイズが大きく異なるアプリへのキャタリングの問題に対処する戦略を実験し、その手法を用いて与えられたタスクに関連する情報を調査する一例を示した。 The automation of an increasingly large number of software engineering tasks is becoming possible thanks to Machine Learning (ML). One foundational building block in the application of ML to software artifacts is the representation of these artifacts (e.g., source code or executable code) into a form that is suitable for learning. Many studies have leveraged representation learning, delegating to ML itself the job of automatically devising suitable representations. Yet, in the context of Android problems, existing models are either limited to coarse-grained whole-app level (e.g., apk2vec) or conducted for one specific downstream task (e.g., smali2vec). Our work is part of a new line of research that investigates effective, task-agnostic, and fine-grained universal representations of bytecode to mitigate both of these two limitations. 
Such representations aim to capture information relevant to various low-level downstream tasks (e.g., at the class-level). We are inspired by the field of Natural Language Processing, where the problem of universal representation was addressed by building Universal Language Models, such as BERT, whose goal is to capture abstract semantic information about sentences, in a way that is reusable for a variety of tasks. We propose DexBERT, a BERT-like Language Model dedicated to representing chunks of DEX bytecode, the main binary format used in Android applications. We empirically assess whether DexBERT is able to model the DEX language and evaluate the suitability of our model in two distinct class-level software engineering tasks: Malicious Code Localization and Defect Prediction. We also experiment with strategies to deal with the problem of catering to apps having vastly different sizes, and we demonstrate one example of using our technique to investigate what information is relevant to a given task.	翻訳日:2022-12-13 17:40:18 公開日:2022-12-12
# 評価対象は誰か? AIに基づく攻撃コードジェネレータの自動評価基準について Who Evaluates the Evaluators? On Automatic Metrics for Assessing AI-based Offensive Code Generators ( http://arxiv.org/abs/2212.06008v1 ) ライセンス: Link先を確認	Cristina Improta, Pietro Liguori, Roberto Natella, Bojan Cukic, and Domenico Cotroneo	(参考訳) AIベースのコードジェネレータは、ディープニューラルネットワーク(Neural Machine Translation, NMT)を使用して、自然言語による記述から始まるプログラムを自動記述する新しいソリューションである。特にコードジェネレータは、概念実証攻撃を生成することによって倫理的ハッキングや攻撃的なセキュリティテストに使用されている。残念ながら、コードジェネレータの評価にはいくつかの問題がある。現在のプラクティスは自動メトリクスを使用しており、それによって生成されたコードのテキスト的類似性を接地真実参照で計算する。しかし、どのメトリクスを使うべきか、どのメトリクスが特定のコンテキストに最も適しているかは明らかではない。この実用的な経験報告は、攻撃的なコードジェネレータの出力類似度を大量に分析する。攻撃的アセンブリとPythonコードを含む2つのデータセットを英語で記述した2つのNMTモデルに適用した。自動測定値からの見積もりを人的評価と比較し,その強みと限界に関する実践的洞察を提供する。 AI-based code generators are an emerging solution for automatically writing programs starting from descriptions in natural language, by using deep neural networks (Neural Machine Translation, NMT). In particular, code generators have been used for ethical hacking and offensive security testing by generating proof-of-concept attacks. Unfortunately, the evaluation of code generators still faces several issues. The current practice uses automatic metrics, which compute the textual similarity of generated code with ground-truth references. However, it is not clear what metric to use, and which metric is most suitable for specific contexts. This practical experience report analyzes a large set of output similarity metrics on offensive code generators. We apply the metrics on two state-of-the-art NMT models using two datasets containing offensive assembly and Python code with their descriptions in the English language. We compare the estimates from the automatic metrics with human evaluation and provide practical insights into their strengths and limitations.	翻訳日:2022-12-13 17:39:47 公開日:2022-12-12
# テンソル分解による融合グラフ再構成による時空間交通モデリング Spatial-temporal traffic modeling with a fusion graph reconstructed by tensor decomposition ( http://arxiv.org/abs/2212.05653v1 ) ライセンス: Link先を確認	Qin Li, Xuan Yang, Yong Wang, Yuankai Wu, Deqiang He	(参考訳) 正確な時空間交通流予測は、交通管理者が制御手段とドライバーが最適な経路を選択するのを助けるために不可欠である。近年,グラフ畳み込みネットワーク (GCN) は空間的時間的依存関係を捕捉する強力な能力のため,交通流予測に広く利用されている。時空間グラフ隣接行列の設計はGCNの成功の鍵であり、まだ未解決の問題である。本稿では, テンソル分解による連接行列の再構成を提案し, 交通流予測手法を提案する。まず,空間-時間融合グラフ隣接行列を3方向隣接テンソルに再構成する。次に,タッカー分解による隣接テンソルの再構成を行い,より情報的かつグローバルな時空間依存性を符号化した。最後に、局所化された空間-時間相関学習のための空間-時間同期グラフ畳み込みモジュールと、グローバル相関学習のための拡張畳み込みモジュールとを組み立てて、道路網の包括的空間-時間依存性を集約学習する。 4つのオープンアクセスデータセットによる実験結果から,提案モデルが予測性能と計算コストにおいて最先端の手法より優れていることが示された。 Accurate spatial-temporal traffic flow forecasting is essential for helping traffic managers to take control measures and drivers to choose the optimal travel routes. Recently, graph convolutional networks (GCNs) have been widely used in traffic flow prediction owing to their powerful ability to capture spatial-temporal dependencies. The design of the spatial-temporal graph adjacency matrix is a key to the success of GCNs, and it is still an open question. This paper proposes reconstructing the binary adjacency matrix via tensor decomposition, and a traffic flow forecasting method is proposed. First, we reformulate the spatial-temporal fusion graph adjacency matrix into a three-way adjacency tensor. Then, we reconstructed the adjacency tensor via Tucker decomposition, wherein more informative and global spatial-temporal dependencies are encoded. Finally, a Spatial-temporal Synchronous Graph Convolutional module for localized spatial-temporal correlations learning and a Dilated Convolution module for global correlations learning are assembled to aggregate and learn the comprehensive spatial-temporal dependencies of the road network. 
Experimental results on four open-access datasets demonstrate that the proposed model outperforms state-of-the-art approaches in terms of the prediction performance and computational cost.	翻訳日:2022-12-13 17:33:07 公開日:2022-12-12
# 深部グラフ拡散情報マックスを用いたCOVID-19パンデミック時の人体移動モデリング Human Mobility Modeling During the COVID-19 Pandemic via Deep Graph Diffusion Infomax ( http://arxiv.org/abs/2212.05707v1 ) ライセンス: Link先を確認	Yang Liu, Yu Rong, Zhuoning Guo, Nuo Chen, Tingyang Xu, Fugee Tsung, Jia Li	(参考訳) 社会的集会制限などの非薬剤的介入(npis)は、人々の接触を減らすことで新型コロナウイルスの感染を遅らせる効果を示している。政策立案者を支援するために、まずマクロ指標(例えば1日平均移動距離)を介して人間の移動をモデル化し、NPIの有効性について研究した。本研究は,モビリティ・モデリングに焦点をあて,マイクロの観点から,新型コロナウイルスの感染者が訪れる場所を予測することを目的としている。 NPIは一般的に経済的・社会的損失を引き起こすため、このようなミクロな視点予測は政府にとって、それらを設計し評価する上で有用である。しかし、現実の状況では、厳格なプライバシーデータ保護規則は厳しいデータ空間の問題(ケースや位置情報の制限など)を引き起こす。これらの課題に対処するため、マイクロ視点モビリティモデリングを幾何学グラフ上で条件付きで、拡散と位置の関連性スコアを計算するために定式化する。本研究では,ddi(deep graph diffusion infomax)というモデルを提案する。このモデルでは,図形グラフ,拡散集合,位置集合などの変数を共同でモデル化し,covid-19予測の研究を容易にするために,covid-19症例の幾何グラフと位置履歴を含む2つのベンチマークを提案する。 2つのベンチマークの大規模な実験により、DGDIは他の競合する手法よりも大幅に優れていることが示された。 Non-Pharmaceutical Interventions (NPIs), such as social gathering restrictions, have shown effectiveness to slow the transmission of COVID-19 by reducing the contact of people. To support policy-makers, multiple studies have first modeled human mobility via macro indicators (e.g., average daily travel distance) and then studied the effectiveness of NPIs. In this work, we focus on mobility modeling and, from a micro perspective, aim to predict locations that will be visited by COVID-19 cases. Since NPIs generally cause economic and societal loss, such a micro perspective prediction benefits governments when they design and evaluate them. However, in real-world situations, strict privacy data protection regulations result in severe data sparsity problems (i.e., limited case and location information). To address these challenges, we formulate the micro perspective mobility modeling into computing the relevance score between a diffusion and a location, conditional on a geometric graph. 
We propose a model named Deep Graph Diffusion Infomax (DGDI), which jointly models variables including a geometric graph, a set of diffusions and a set of locations. To facilitate the research of COVID-19 prediction, we present two benchmarks that contain geometric graphs and location histories of COVID-19 cases. Extensive experiments on the two benchmarks show that DGDI significantly outperforms other competing methods.	翻訳日:2022-12-13 17:32:47 公開日:2022-12-12
# GT-CausIn:交通予測の新しい因果関係 GT-CausIn: a novel causal-based insight for traffic prediction ( http://arxiv.org/abs/2212.05782v1 ) ライセンス: Link先を確認	Ting Gao, Rodrigo Kappes Marques, Lei Yu	(参考訳) 交通予測は時空間予測の重要な応用である。様々な手法の中で、グラフニューラルネットワークはこれまでに最も有望な結果を達成しており、グラフノード間の関係を学習することが重要な課題となっている。しかし、これらの関係がノード-ノード方式で学習されると、改善空間は非常に制限される。この課題は(1)異なる局間の不明瞭な時間的依存関係、(2)ノードレベルを超えて変数を定義することの難しさ、(3)学習された関係を検証するための既製の方法がないことに起因している。これらの課題に対処するために、トラフィック内の因果関係を発見するための正当なトラフィック因果変数を定義し、統計ツールやケース分析で慎重にチェックする。次に,事前学習された因果情報をグラフ拡散層と時間畳み込みネットワーク(tcn)層に統合した,因果的洞察(gt-causin)に基づくグラフ空間-時間的ネットワークモデルを提案する。 PEMS-BAYとMETR-LAの2つの実世界のトラフィックデータセットで実験を行い、GT-CausInは中期および長期予測において最先端モデルよりも大幅に優れていることを示した。 Traffic forecasting is an important application of spatiotemporal series prediction. Among different methods, graph neural networks have so far achieved the most promising results, which makes learning the relations between graph nodes a crucial task. However, the room for improvement is very limited when these relations are learned in a node-to-node manner. The challenge stems from (1) obscure temporal dependencies between different stations, (2) difficulties in defining variables beyond the node level, and (3) the lack of a ready-made method to validate the learned relations. To confront these challenges, we define legitimate traffic causal variables to discover the causal relations inside the traffic network, which are carefully checked with statistical tools and case analysis. We then present a novel model named Graph Spatial-Temporal Network Based on Causal Insight (GT-CausIn), where prior learned causal information is integrated with graph diffusion layers and temporal convolutional network (TCN) layers. Experiments are carried out on two real-world traffic datasets, PEMS-BAY and METR-LA, and show that GT-CausIn significantly outperforms the state-of-the-art models on mid-term and long-term prediction.	翻訳日:2022-12-13 17:32:23 公開日:2022-12-12
# 重み付けによる非定常データの学習 Learning on non-stationary data with re-weighting ( http://arxiv.org/abs/2212.05908v1 ) ライセンス: Link先を確認	Nishant Jain, Pradeep Shenoy	(参考訳) 多くの現実世界の学習シナリオは、時間の経過とともにデータ分布が徐々に変化するスローコンセプトドリフトの課題に直面している。そこで本研究では,予測精度を最適化するために,時間に敏感なデータ重み付けを学習する問題を提案する。本稿では,データの変化の複数の時間スケールをキャプチャできる時間的重み付け関数のクラスと,インスタンス固有の特性を提案する。両レベルの最適化基準と関連するメタ学習アルゴリズムを定式化し、これらの重みを学習する。特に,本定式化では,トレーニングインスタンスの関数として重みを出力する補助ネットワークを訓練し,インスタンス重みをコンパクトに表現する。 9年間にわたって拡散した39m画像の大規模な実世界データセット上で,時間的重み付け方式を検証する。我々の広範な実験は、データセットにおけるインスタンスベースの時間的重み付けの必要性を実証し、古典的なバッチ学習アプローチに対する大幅な改善を実現する。さらに,提案手法はストリーミング環境への一般化が容易であり,近年の連続学習手法に比べ,大幅な向上を示す。 Many real-world learning scenarios face the challenge of slow concept drift, where data distributions change gradually over time. In this setting, we pose the problem of learning temporally sensitive importance weights for training data, in order to optimize predictive accuracy. We propose a class of temporal reweighting functions that can capture multiple timescales of change in the data, as well as instance-specific characteristics. We formulate a bi-level optimization criterion, and an associated meta-learning algorithm, by which these weights can be learned. In particular, our formulation trains an auxiliary network to output weights as a function of training instances, thereby compactly representing the instance weights. We validate our temporal reweighting scheme on a large real-world dataset of 39M images spread over a 9 year period. Our extensive experiments demonstrate the necessity of instance-based temporal reweighting in the dataset, and achieve significant improvements to classical batch-learning approaches. Further, our proposal easily generalizes to a streaming setting and shows significant gains compared to recent continual learning methods.	翻訳日:2022-12-13 17:32:00 公開日:2022-12-12
# ロバストなメタラーニングアプローチによる選択的分類 Selective classification using a robust meta-learning approach ( http://arxiv.org/abs/2212.05987v1 ) ライセンス: Link先を確認	Nishant Jain and Pradeep Shenoy	(参考訳) 選択的分類は、モデルが高精度に分類できるテストサンプルのサブセットを識別することを含み、自動医療診断などのアプリケーションにとって重要である。この不確実なサンプルを特定する能力は、より正確な分類器を構築することを目的として、訓練用分類器にも有用である。インスタンスの関数として重要な重み付けを出力するために、1つの補助メタネットワークを訓練することで、これらの二重の役割を統一する。この尺度は、訓練データの再重み付けやテスト時に選択的な分類のためのテストインスタンスのランク付けに使用される。第2のキーとなるのは,メタネットワークのトレーニングのためのドロップアウト分散(ランダムウェイトドロップアウト時の分類器出力のばらつき)を最小化するメタオブジェクトである。学習データにおける分類器損失を最小化し,分離したメタトレーニングデータセット上でのメタ損失を最小化するネスト化目標を用いて,そのメタネットワークと共に分類器を訓練する。例えば、現実世界の糖尿病網膜症データセットでは、最大1.9%のaucと2%の精度で、選択的分類の最先端を上回っています。最後に、我々のメタラーニングフレームワークは、教師なし分散最小化メタオブジェクトを考慮し、教師なしドメイン適応に自然に拡張する。我々は、教師なしドメイン適応を用いた網膜症データセットのドメインシフト設定において、他のベースラインよりも3.4%/3.3%精度とAUCの累積絶対ゲインを示す。 Selective classification involves identifying the subset of test samples that a model can classify with high accuracy, and is important for applications such as automated medical diagnosis. We argue that this capability of identifying uncertain samples is valuable for training classifiers as well, with the aim of building more accurate classifiers. We unify these dual roles by training a single auxiliary meta-network to output an importance weight as a function of the instance. This measure is used at train time to reweight training data, and at test-time to rank test instances for selective classification. A second, key component of our proposal is the meta-objective of minimizing dropout variance (the variance of classifier output when subjected to random weight dropout) for training the metanetwork. We train the classifier together with its metanetwork using a nested objective of minimizing classifier loss on training data and meta-loss on a separate meta-training dataset. We outperform current state-of-the-art on selective classification by substantial margins--for instance, upto 1.9% AUC and 2% accuracy on a real-world diabetic retinopathy dataset. 
Finally, our meta-learning framework extends naturally to unsupervised domain adaptation, given our unsupervised variance minimization meta-objective. We show cumulative absolute gains of 3.4% / 3.3% accuracy and AUC over the other baselines in domain shift settings on the Retinopathy dataset using unsupervised domain adaptation.	翻訳日:2022-12-13 17:31:43 公開日:2022-12-12
# PERFEX:信頼できるAIシステムのための分類器のパフォーマンス説明 PERFEX: Classifier Performance Explanations for Trustworthy AI Systems ( http://arxiv.org/abs/2212.06045v1 ) ライセンス: Link先を確認	Erwin Walraven, Ajaya Adhikari, Cor J. Veenman	(参考訳) 実世界の意思決定支援システムに展開する場合、分類モデルの説明性は不可欠である。説明は、予測をユーザに実行可能にし、システムの能力と限界を知らせるべきである。しかし、既存の説明方法は通常、個々の予測についてのみ説明を提供する。分類器が意思決定者をサポートすることができる条件に関する情報は利用できないが、例えば、システムがクラスを区別できない場合の情報は非常に有用である。開発フェーズでは新機能の検索やモデルの組み合わせをサポートし、運用フェーズではシステムを使用しないなどの判断において意思決定者をサポートする。本稿では,PERFEX (PERFormance Explainer) と呼ばれる学習ベース分類器の品質を説明する手法を提案する。本手法は,ベース分類器がエラー度が高いか低いか,その他の分類性能指標を持つ条件下での予測と説明が可能なメタツリー学習アルゴリズムからなる。いくつかの分類器とデータセットを用いてPERFEXを評価し,都市移動データを用いたケーススタディを行った。 PERFEXは, 基本分類器がクラスを区別できない場合でも, コンパクトな性能説明を行いながら, メタ予測性能が高いことが判明した。 Explainability of a classification model is crucial when deployed in real-world decision support systems. Explanations make predictions actionable to the user and should inform about the capabilities and limitations of the system. Existing explanation methods, however, typically only provide explanations for individual predictions. Information about conditions under which the classifier is able to support the decision maker is not available, while for instance information about when the system is not able to differentiate classes can be very helpful. In the development phase it can support the search for new features or combining models, and in the operational phase it supports decision makers in deciding e.g. not to use the system. This paper presents a method to explain the qualities of a trained base classifier, called PERFormance EXplainer (PERFEX). Our method consists of a meta tree learning algorithm that is able to predict and explain under which conditions the base classifier has a high or low error or any other classification performance metric. We evaluate PERFEX using several classifiers and datasets, including a case study with urban mobility data. 
It turns out that PERFEX typically has high meta prediction performance even if the base classifier is hardly able to differentiate classes, while giving compact performance explanations.	翻訳日:2022-12-13 17:31:17 公開日:2022-12-12
# クロスドメイン自動項抽出用センシングトランス Ensembling Transformers for Cross-domain Automatic Term Extraction ( http://arxiv.org/abs/2212.05696v1 ) ライセンス: Link先を確認	Hanh Thi Hong Tran, Matej Martinc, Andraz Pelicon, Antoine Doucet, and Senja Pollak	(参考訳) 自動用語抽出は、ドメイン言語理解といくつかの自然言語処理下流タスクにおいて重要な役割を果たす。本稿では,多言語クロスドメイン環境における用語抽出に向けたトランスフォーマーに基づく事前学習言語モデルの予測能力の比較検討を行う。単言語モデルが単語と多語を抽出できる能力を評価するだけでなく、異なる言語モデルの項出力集合の交点または結合を行うことで、単言語モデルと多言語モデルのアンサンブルを実験する。本研究は,4つの専門ドメイン(故障,風力エネルギー,浮力,心不全)と3つの言語(英語,フランス語,オランダ語)をカバーするACTERコーパスと,さらに4つの追加ドメイン(バイオメカニクス,化学,獣医学,言語学)をカバーするRSDO5スロベニアコーパスについて行った。その結果、単言語モデルを採用する戦略は、単語抽出タスクが名前付きエンティティ項の抽出を除外した場合、オランダ語とフランス語を除くすべての言語について、多言語モデルを活用した関連作業から最先端のアプローチを上回っていることがわかった。さらに,2つの最高性能モデルの出力を組み合わせることで,大幅な改善を実現している。 Automatic term extraction plays an essential role in domain language understanding and several natural language processing downstream tasks. In this paper, we propose a comparative study on the predictive power of Transformers-based pretrained language models toward term extraction in a multi-language cross-domain setting. Besides evaluating the ability of monolingual models to extract single- and multi-word terms, we also experiment with ensembles of mono- and multilingual models by conducting the intersection or union on the term output sets of different language models. Our experiments have been conducted on the ACTER corpus covering four specialized domains (Corruption, Wind energy, Equitation, and Heart failure) and three languages (English, French, and Dutch), and on the RSDO5 Slovenian corpus covering four additional domains (Biomechanics, Chemistry, Veterinary, and Linguistics). The results show that the strategy of employing monolingual models outperforms the state-of-the-art approaches from the related work leveraging multilingual models, regarding all the languages except Dutch and French if the term extraction task excludes the extraction of named entity terms. 
Furthermore, by combining the outputs of the two best performing models, we achieve significant improvements.	翻訳日:2022-12-13 17:15:53 公開日:2022-12-12
# 複数種類の文脈の統合による効果的なシードガイド付き話題発見 Effective Seed-Guided Topic Discovery by Integrating Multiple Types of Contexts ( http://arxiv.org/abs/2212.06002v1 ) ライセンス: Link先を確認	Yu Zhang, Yunyi Zhang, Martin Michalski, Yucheng Jiang, Yu Meng, Jiawei Han	(参考訳) テキストコーパスから完全に教師されていない方法でコヒーレントなトピックをマイニングする代わりに、シード誘導されたトピック発見手法は、ユーザが提供するシードワードを利用して、ユニークでコヒーレントなトピックを抽出する。単語とシードのセマンティックな相関関係をモデル化するために、既存のシード誘導アプローチでは、文書レベルの単語共起、スライディングウィンドウベースのローカルコンテキスト、事前訓練された言語モデルによってもたらされる汎用言語知識など、さまざまな種類のコンテキスト信号を利用する。本研究は,各文脈情報の価値と限界を実例的に分析・示すものであるが,3種類の文脈(局所的な文脈から学習した単語埋め込み,一般ドメイン学習から得られた事前学習された言語モデル表現,およびシード情報に基づいて検索した話題表現文)を組み合わせることで,品質トピックの発見に相互に補完することができる。本稿では,3種類のコンテキストから共同で学習し,アンサンブルランキングプロセスを通じてコンテキスト信号を徐々に融合する反復的フレームワークSeedTopicMineを提案する。さまざまなシードセットと複数のデータセットに基づいて、SeedTopicMineは、既存のシード誘導トピック発見アプローチよりも一貫性と正確なトピックを一貫して生成する。 Instead of mining coherent topics from a given text corpus in a completely unsupervised manner, seed-guided topic discovery methods leverage user-provided seed words to extract distinctive and coherent topics so that the mined topics can better cater to the user's interest. To model the semantic correlation between words and seeds for discovering topic-indicative terms, existing seed-guided approaches utilize different types of context signals, such as document-level word co-occurrences, sliding window-based local contexts, and generic linguistic knowledge brought by pre-trained language models. In this work, we analyze and show empirically that each type of context information has its value and limitation in modeling word semantics under seed guidance, but combining three types of contexts (i.e., word embeddings learned from local contexts, pre-trained language model representations obtained from general-domain training, and topic-indicative sentences retrieved based on seed information) allows them to complement each other for discovering quality topics. 
We propose an iterative framework, SeedTopicMine, which jointly learns from the three types of contexts and gradually fuses their context signals via an ensemble ranking process. Under various sets of seeds and on multiple datasets, SeedTopicMine consistently yields more coherent and accurate topics than existing seed-guided topic discovery approaches.	翻訳日:2022-12-13 17:15:15 公開日:2022-12-12
# ゼロショット検索のためのクロスエンコーダの防御 In Defense of Cross-Encoders for Zero-Shot Retrieval ( http://arxiv.org/abs/2212.06121v1 ) ライセンス: Link先を確認	Guilherme Rosa and Luiz Bonifacio and Vitor Jeronymo and Hugo Abonizio and Marzieh Fadaee and Roberto Lotufo and Rodrigo Nogueira	(参考訳) バイエンコーダとクロスエンコーダは多くの最先端の検索パイプラインで広く使われている。本研究では,これら2つのアーキテクチャの一般化能力を,ドメイン内シナリオとドメイン外シナリオの両方において,幅広いパラメータ数で検討する。クロスエンコーダのパラメータ数と初期クエリ文書間相互作用は,検索モデルの一般化能力において重要な役割を果たす。実験の結果, モデルサイズの増加はドメイン内テストセットの限界ゲインをもたらすが, ファインチューニング中に見つからなかった新しいドメインでは, はるかに大きなゲインが得られることがわかった。さらに、クロスエンコーダは、複数のタスクにおいて、ほぼ同様のサイズのbiエンコーダよりも優れていることを示す。 BEIRベンチマークでは、我々の最大のクロスエンコーダは最先端のバイエンコーダを4つ以上の平均点で上回っている。最後に,bi-encoderを第1ステージレトリバーとして使用すると,ドメイン外のタスクにおいてbm25のような単純なレトリバーに比べ,何も得られないことを示す。コードはhttps://github.com/guilhermemr04/scaling-zero-shot-retrieval.gitで入手できる。 Bi-encoders and cross-encoders are widely used in many state-of-the-art retrieval pipelines. In this work we study the generalization ability of these two types of architectures on a wide range of parameter count on both in-domain and out-of-domain scenarios. We find that the number of parameters and early query-document interactions of cross-encoders play a significant role in the generalization ability of retrieval models. Our experiments show that increasing model size results in marginal gains on in-domain test sets, but much larger gains in new domains never seen during fine-tuning. Furthermore, we show that cross-encoders largely outperform bi-encoders of similar size in several tasks. In the BEIR benchmark, our largest cross-encoder surpasses a state-of-the-art bi-encoder by more than 4 average points. Finally, we show that using bi-encoders as first-stage retrievers provides no gains in comparison to a simpler retriever such as BM25 on out-of-domain tasks. The code is available at https://github.com/guilhermemr04/scaling-zero-shot-retrieval.git	翻訳日:2022-12-13 17:14:37 公開日:2022-12-12
# プラグアンドプレイ拡散モデルに向けて Towards Practical Plug-and-Play Diffusion Models ( http://arxiv.org/abs/2212.05973v1 ) ライセンス: Link先を確認	Hyojun Go, Yunsung Lee, Jin-Young Kim, Seunghyun Lee, Myeongho Jeong, Hyun Seung Lee, and Seungtaek Choi	(参考訳) 拡散に基づく生成モデルは画像生成において顕著な成功を収めた。彼らのガイダンスの定式化により、外部モデルは拡散モデルを微調整することなく様々なタスクの生成プロセスをプラグ・アンド・プレイで制御できる。しかし、市販の市販オフザシェルフモデルのガイダンスへの直接的利用は、ノイズの多い入力における性能が低かったために失敗する。そのため、既存のプラクティスは、ラベル付きデータがノイズで破損したガイダンスモデルを微調整することです。本稿では,(1)非常に多様なノイズを持つ入力に対して実行することは単一モデルでは難しい,(2)ラベル付きデータセットの収集は様々なタスクのスケールアップを妨げる,という2つの側面に限界がある,と主張する。この制約に対処するために,各専門家が特定のノイズ範囲に特化している複数の専門家を活用し,対応するタイミングで逆処理を誘導する新しい戦略を提案する。しかし,複数ネットワークの管理やラベル付きデータの利用が不可能なため,パラメータ効率の高い微調整とデータフリーな知識伝達を利用した実践的プラグアンドプレイ(PPAP)フレームワークを提案する。我々はImageNetクラス条件生成実験を徹底的に実施し、小さなトレーニング可能なパラメータとラベル付きデータで拡散を導出できることを示す。最後に、画像分類器、深度推定器、セマンティックセグメンテーションモデルが、我々のフレームワークを通じて、プラグイン・アンド・プレイ方式でGLIDEをガイドできることを示す。 Diffusion-based generative models have achieved remarkable success in image generation. Their guidance formulation allows an external model to plug-and-play control the generation process for various tasks without fine-tuning the diffusion model. However, the direct use of publicly available off-the-shelf models for guidance fails due to their poor performance on noisy inputs. For that, the existing practice is to fine-tune the guidance models with labeled data corrupted with noises. In this paper, we argue that this practice has limitations in two aspects: (1) performing on inputs with extremely various noises is too hard for a single model; (2) collecting labeled datasets hinders scaling up for various tasks. To tackle the limitations, we propose a novel strategy that leverages multiple experts where each expert is specialized in a particular noise range and guides the reverse process at its corresponding timesteps. 
However, as it is infeasible to manage multiple networks and utilize labeled data, we present a practical guidance framework termed Practical Plug-And-Play (PPAP), which leverages parameter-efficient fine-tuning and data-free knowledge transfer. We exhaustively conduct ImageNet class conditional generation experiments to show that our method can successfully guide diffusion with small trainable parameters and no labeled data. Finally, we show that image classifiers, depth estimators, and semantic segmentation models can guide publicly available GLIDE through our framework in a plug-and-play manner.	翻訳日:2022-12-13 17:07:33 公開日:2022-12-12
# rgbd拡散モデルを用いたインクリメンタルビューインペインティングによる生成シーン合成 Generative Scene Synthesis via Incremental View Inpainting using RGBD Diffusion Models ( http://arxiv.org/abs/2212.05993v1 ) ライセンス: Link先を確認	Jiabao Lei, Jiapeng Tang, Kui Jia	(参考訳) 我々は,rgbd ビュー観測のスパース集合から基盤となるシーン幾何と色を復元する課題に対処した。本研究では,カメラの軌跡に沿って新たなrgbdビューを逐次生成する新しいソリューションを提案する。より具体的には、新しいrgbdビューのレンダリングに使用される中間面メッシュを維持し、その後、塗装されたネットワークによって完成させ、レンダリングされたrgbdビューは、後に部分面としてバックプロジェクションされ、中間メッシュに補完される。中間メッシュとカメラプロジェクションの使用は、多視点不整合の屈折問題を解くのに役立つ。我々は,従来2次元生成モデリングに用いられてきた汎用的なrgbd拡散モデルとして,rgbdインパインティングネットワークを実際に実装した。我々は,sparse rgbd入力からの3次元シーン合成のタスクに対するアプローチを評価した。プロジェクトページ: https://jblei.site/project-pages/rgbd-diffusion.html We address the challenge of recovering an underlying scene geometry and colors from a sparse set of RGBD view observations. In this work, we present a new solution that sequentially generates novel RGBD views along a camera trajectory, and the scene geometry is simply the fusion result of these views. More specifically, we maintain an intermediate surface mesh used for rendering new RGBD views, which subsequently becomes complete by an inpainting network; each rendered RGBD view is later back-projected as a partial surface and is supplemented into the intermediate mesh. The use of intermediate mesh and camera projection helps solve the refractory problem of multi-view inconsistency. We practically implement the RGBD inpainting network as a versatile RGBD diffusion model, which is previously used for 2D generative modeling; we make a modification to its reverse diffusion process to enable our use. We evaluate our approach on the task of 3D scene synthesis from sparse RGBD inputs; extensive experiments on the ScanNet dataset demonstrate the superiority of our approach over existing ones. Project page: https://jblei.site/project-pages/rgbd-diffusion.html	翻訳日:2022-12-13 17:07:07 公開日:2022-12-12
# 自動口内感情認識の改良の試み An Approach for Improving Automatic Mouth Emotion Recognition ( http://arxiv.org/abs/2212.06009v1 ) ライセンス: Link先を確認	Giulio Biondi, Valentina Franzoni, Osvaldo Gervasi, Damiano Perri	(参考訳) 本研究は,コンボリューションニューラルネット(cnn)を介して,感情を認識し,リアルタイムフィードバックを生成するために,コミュニケーションスキルの問題(筋肉の浪費,脳卒中,自閉症,より簡単には痛みなど)を伴う健康障害を持つ人を支援することを目的とした,口内検出による感情自動認識手法の提案とテストを行う。ソフトウェアシステムは、取得した画像に顔が存在するかどうかを識別する計算を開始し、次に口の位置を探し、対応する特徴を抽出する。両方のタスクはhaar機能ベースの分類器を使用して実行され、高速実行と有望なパフォーマンスが保証される。これまでの作業が,単一ユーザに対するパーソナライズされたトレーニングのための視覚的なマイクロ表現に重点を置いていたならば,この戦略は,汎用的な顔データセットでもシステムをトレーニングすることを目的としています。 This study proposes and tests a technique for automated emotion recognition through mouth detection via Convolutional Neural Networks (CNNs), meant to support people with health disorders that impair communication skills (e.g., muscle wasting, stroke, autism, or, more simply, pain) by recognizing emotions and generating real-time feedback or data that feeds supporting systems. The software system first determines whether a face is present in the acquired image, then locates the mouth and extracts the corresponding features. Both tasks are carried out using Haar feature-based classifiers, which guarantee fast execution and promising performance. Whereas our previous works focused on visual micro-expressions for personalized training on a single user, this strategy aims to train the system on generalized face data sets as well.	翻訳日:2022-12-13 17:06:47 公開日:2022-12-12
# humpty dumpty再構成:オープンセット動作認識のための多機能グラフオートエンコーダ Reconstructing Humpty Dumpty: Multi-feature Graph Autoencoder for Open Set Action Recognition ( http://arxiv.org/abs/2212.06023v1 ) ライセンス: Link先を確認	Dawei Du, Ameya Shringi, Anthony Hoogs, Christopher Funk	(参考訳) ほとんどのアクション認識データセットとアルゴリズムは、すべてのテストサンプルが既知のクラスのインスタンスであるクローズドワールドを想定している。開集合問題では、テストサンプルは既知のクラスまたは未知のクラスから引き出すことができる。既存のオープンセット動作認識法は、通常、分類スコアや特徴距離のポストホック分析を追加することによって拡張された閉セット法に基づいており、すべてのビデオクリップ要素間の関係を捉えない。本手法は,未知のクラスを組み戻すのが難しく,既知のクラスのビデオよりも高い再構成誤差を有するため,映像の新規性を決定するために再構成誤差を用いる。我々は,オープンセット動作認識問題に対する我々の解決策を,その再構築能力から「ハンプティダンプティ」と呼んでいる。 humpty dumptyは、新しいグラフベースのオートエンコーダで、クリップのコンテクストとセマンティクスの関係を考慮し、再構成を改善する。より大きなリコンストラクションエラーは、アクションが再構築できない可能性、すなわち、ハンプティダンプティを再び戻せない可能性の増加をもたらし、そのアクションがこれまで見たことがなく、新規/未知であることを示している。 HMDB-51とUCF-101を含む2つの一般公開された行動認識データセットで大規模な実験を行い、オープンセットの行動認識の最先端性能を示す。 Most action recognition datasets and algorithms assume a closed world, where all test samples are instances of the known classes. In open set problems, test samples may be drawn from either known or unknown classes. Existing open set action recognition methods are typically based on extending closed set methods by adding post hoc analysis of classification scores or feature distances and do not capture the relations among all the video clip elements. Our approach uses the reconstruction error to determine the novelty of the video since unknown classes are harder to put back together and thus have a higher reconstruction error than videos from known classes. We refer to our solution to the open set action recognition problem as "Humpty Dumpty", due to its reconstruction abilities. Humpty Dumpty is a novel graph-based autoencoder that accounts for contextual and semantic relations among the clip pieces for improved reconstruction. 
A larger reconstruction error indicates an increased likelihood that the action cannot be reconstructed, i.e., that Humpty Dumpty cannot be put back together again, and hence that the action has never been seen before and is novel/unknown. Extensive experiments on two publicly available action recognition datasets, HMDB-51 and UCF-101, show state-of-the-art performance for open set action recognition.	翻訳日:2022-12-13 17:06:30 公開日:2022-12-12
# 効率的なTransformerによる映像予測 Video Prediction by Efficient Transformers ( http://arxiv.org/abs/2212.06026v1 ) ライセンス: Link先を確認	Xi Ye, Guillaume-Alexandre Bilodeau	(参考訳) ビデオ予測は、幅広いアプリケーションを持つコンピュータビジョンの課題である。そこで本研究では,ビデオ予測のためのトランスフォーマーモデルについて紹介する。まず, 標準Transformerの複雑さを低減するため, 効率的な局所空間分離注意機構を提案する。そして、新しい効率的なTransformerに基づいて、完全自己回帰モデル、部分自己回帰モデル、非自己回帰モデルを開発した。部分自己回帰モデルは完全な自己回帰モデルと同様の性能を持つが、より高速な推論速度を持つ。非自己回帰モデルは、高速な推論速度を達成するだけでなく、自己回帰モデルの品質劣化問題を緩和するだけでなく、学習のために追加のパラメータと損失関数を必要とする。そこで本研究では,提案する3種類の映像予測手法を総合的に検討した。実験により,提案するビデオ予測モデルは,より複雑な畳み込み型lstmモデルと競合することが示された。ソースコードはhttps://github.com/XiYe20/VPTRで入手できる。 Video prediction is a challenging computer vision task that has a wide range of applications. In this work, we present a new family of Transformer-based models for video prediction. First, an efficient local spatial-temporal separation attention mechanism is proposed to reduce the complexity of standard Transformers. Then, a full autoregressive model, a partial autoregressive model and a non-autoregressive model are developed based on the new efficient Transformer. The partial autoregressive model performs similarly to the full autoregressive model but has a faster inference speed. The non-autoregressive model not only achieves a faster inference speed but also mitigates the quality degradation problem of its autoregressive counterparts; however, it requires additional parameters and an additional loss function for learning. Given the same attention mechanism, we conducted a comprehensive study to compare the three proposed video prediction variants. Experiments show that the proposed video prediction models are competitive with more complex state-of-the-art convolutional-LSTM based models. The source code is available at https://github.com/XiYe20/VPTR.	翻訳日:2022-12-13 17:06:07 公開日:2022-12-12
# Robust Perception through Equivariance ( http://arxiv.org/abs/2212.06079v1 ) License: see linked page	Chengzhi Mao, Lingyu Zhang, Abhishek Joshi, Junfeng Yang, Hao Wang, Carl Vondrick	Deep networks for computer vision are not reliable when they encounter adversarial examples. In this paper, we introduce a framework that uses the dense intrinsic constraints in natural images to robustify inference. By introducing constraints at inference time, we can shift the burden of robustness from training to the inference algorithm, thereby allowing the model to adjust dynamically to each individual image's unique and potentially novel characteristics at inference time. Among different constraints, we find that equivariance-based constraints are most effective, because they allow dense constraints in the feature space without overly constraining the representation at a fine-grained level. Our theoretical results validate the importance of having such dense constraints at inference time. Our empirical experiments show that restoring feature equivariance at inference time defends against worst-case adversarial perturbations. The method obtains improved adversarial robustness on four datasets (ImageNet, Cityscapes, PASCAL VOC, and MS-COCO) on image recognition, semantic segmentation, and instance segmentation tasks. The project page is available at equi4robust.cs.columbia.edu.	Translated: 2022-12-13 17:05:54 Published: 2022-12-12
# Resolving Semantic Confusions for Improved Zero-Shot Detection ( http://arxiv.org/abs/2212.06097v1 ) License: see linked page	Sandipan Sarma, Sushil Kumar, Arijit Sur	Zero-shot detection (ZSD) is a challenging task where we aim to recognize and localize objects simultaneously, even when our model has not been trained with visual samples of a few target ("unseen") classes. Recently, methods employing generative models like GANs have shown some of the best results, where unseen-class samples are generated based on their semantics by a GAN trained on seen-class data, enabling vanilla object detectors to recognize unseen objects. However, the problem of semantic confusion still remains, where the model is sometimes unable to distinguish between semantically similar classes. In this work, we propose to train a generative model incorporating a triplet loss that acknowledges the degree of dissimilarity between classes and reflects it in the generated samples. Moreover, a cyclic-consistency loss is also enforced to ensure that generated visual samples of a class correspond closely to their own semantics. Extensive experiments on two benchmark ZSD datasets, MSCOCO and PASCAL-VOC, demonstrate significant gains over the current ZSD methods, reducing semantic confusion and improving detection for the unseen classes.	Translated: 2022-12-13 17:05:31 Published: 2022-12-12
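To make the triplet loss mentioned in "Resolving Semantic Confusions for Improved Zero-Shot Detection" more concrete, here is a minimal sketch of the standard margin-based form. The abstract does not give the paper's exact formulation, so the function names, the Euclidean distance, and the fixed `margin` parameter are all assumptions made for illustration. A dissimilarity-aware variant could, for instance, pass a larger margin for class pairs that are semantically further apart.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss (illustrative, not the paper's exact loss).

    The loss is zero once the negative is at least `margin` further
    from the anchor than the positive is.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# The negative is already far away, so this triplet contributes no loss.
easy = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
# Swapping positive and negative produces a large penalty.
hard = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])
```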
# Siamese Neural Networks for Skin Cancer Classification and New Class Detection using Clinical and Dermoscopic Image Datasets ( http://arxiv.org/abs/2212.06130v1 ) License: see linked page	Michael Luke Battle, Amir Atapour-Abarghouei, Andrew Stephen McGough	Skin cancer is the most common malignancy in the world. Automated skin cancer detection would significantly improve early detection rates and prevent deaths. To help with this aim, a number of datasets have been released which can be used to train Deep Learning systems; these have produced impressive results for classification. However, such systems only work for the classes they are trained on and are incapable of identifying skin lesions from previously unseen classes, making them unsuitable for clinical use. We could look to massively increase the datasets by including all possible skin lesions, though this would always leave out some classes. Instead, we evaluate Siamese Neural Networks (SNNs), which not only allow us to classify images of skin lesions, but also allow us to identify those images which are different from the trained classes, so that we can determine that an image is not an example of our training classes. We evaluate SNNs on both dermoscopic and clinical images of skin lesions. We obtain top-1 classification accuracy levels of 74.33% and 85.61% on clinical and dermoscopic datasets, respectively. Although this is slightly lower than the state-of-the-art results, the SNN approach has the advantage that it can detect out-of-class examples. Our results highlight the potential of an SNN approach as well as pathways towards future clinical deployment.	Translated: 2022-12-13 17:05:08 Published: 2022-12-12
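The out-of-class detection idea in the Siamese network abstract above can be sketched as a distance test in embedding space. This is only a sketch under stated assumptions: the abstract does not say how the authors make the decision, so the per-class reference embeddings, the Euclidean distance, the `threshold` value, and the name `classify_or_reject` are hypothetical.

```python
import math


def classify_or_reject(embedding, class_refs, threshold):
    """Return (label, distance); label is None when no class is close enough.

    `class_refs` maps a class label to one reference embedding, which is
    an illustrative simplification of a trained Siamese network.
    """
    best_label, best_dist = None, math.inf
    for label, ref in class_refs.items():
        d = math.dist(embedding, ref)
        if d < best_dist:
            best_label, best_dist = label, d
    if best_dist > threshold:
        return None, best_dist  # treated as an unseen class
    return best_label, best_dist


refs = {"nevus": (0.0, 0.0), "melanoma": (4.0, 0.0)}
known = classify_or_reject((0.5, 0.0), refs, threshold=1.0)
novel = classify_or_reject((2.0, 5.0), refs, threshold=1.0)
```

Here `known` is assigned to the nearest class, while `novel` is far from every reference and is rejected.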
# Rodin: A Generative Model for Sculpting 3D Digital Avatars Using Diffusion ( http://arxiv.org/abs/2212.06135v1 ) License: see linked page	Tengfei Wang, Bo Zhang, Ting Zhang, Shuyang Gu, Jianmin Bao, Tadas Baltrusaitis, Jingjing Shen, Dong Chen, Fang Wen, Qifeng Chen, Baining Guo	This paper presents a 3D generative model that uses diffusion models to automatically generate 3D digital avatars represented as neural radiance fields. A significant challenge in generating such avatars is that the memory and processing costs in 3D are prohibitive for producing the rich details required for high-quality avatars. To tackle this problem, we propose the roll-out diffusion network (Rodin), which represents a neural radiance field as multiple 2D feature maps and rolls out these maps into a single 2D feature plane within which we perform 3D-aware diffusion. The Rodin model brings the much-needed computational efficiency while preserving the integrity of diffusion in 3D by using 3D-aware convolution that attends to projected features in the 2D feature plane according to their original relationship in 3D. We also use latent conditioning to orchestrate the feature generation for global coherence, leading to high-fidelity avatars and enabling their semantic editing based on text prompts. Finally, we use hierarchical synthesis to further enhance details. The 3D avatars generated by our model compare favorably with those produced by existing generative techniques. We can generate highly detailed avatars with realistic hairstyles and facial hair like beards. We also demonstrate 3D avatar generation from image or text as well as text-guided editability.	Translated: 2022-12-13 17:04:49 Published: 2022-12-12
# NMS Strikes Back ( http://arxiv.org/abs/2212.06137v1 ) License: see linked page	Jeffrey Ouyang-Zhang, Jang Hyun Cho, Xingyi Zhou, Philipp Krähenbühl	Detection Transformer (DETR) directly transforms queries to unique objects by using one-to-one bipartite matching during training and enables end-to-end object detection. Recently, these models have surpassed traditional detectors on COCO with undeniable elegance. However, they differ from traditional detectors in multiple designs, including model architecture and training schedules, and thus the effectiveness of one-to-one matching is not fully understood. In this work, we conduct a strict comparison between the one-to-one Hungarian matching in DETRs and the one-to-many label assignments in traditional detectors with non-maximum suppression (NMS). Surprisingly, we observe that one-to-many assignments with NMS consistently outperform standard one-to-one matching under the same setting, with a significant gain of up to 2.5 mAP. Our detector that trains Deformable-DETR with traditional IoU-based label assignment achieved 50.2 COCO mAP within 12 epochs (1x schedule) with a ResNet50 backbone, outperforming all existing traditional or transformer-based detectors in this setting. On multiple datasets, schedules, and architectures, we consistently show that bipartite matching is unnecessary for performant detection transformers. Furthermore, we attribute the success of detection transformers to their expressive transformer architecture. Code is available at https://github.com/jozhang97/DETA.	Translated: 2022-12-13 17:04:28 Published: 2022-12-12
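For readers unfamiliar with the NMS post-processing step that "NMS Strikes Back" relies on, the following is a minimal pure-Python sketch of greedy non-maximum suppression over axis-aligned boxes. It illustrates the general technique, not the DETA codebase: the `(x1, y1, x2, y2)` box layout, the function names, and the default threshold of 0.5 are assumptions.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

In this toy input the second box overlaps the first with an IoU of about 0.68, so it is suppressed, while the distant third box survives.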
# Transfer Learning using Spectral Convolutional Autoencoders on Semi-Regular Surface Meshes ( http://arxiv.org/abs/2212.05810v1 ) License: see linked page	Sara Hahner, Felix Kerkhoff, Jochen Garcke	The underlying dynamics and patterns of 3D surface meshes deforming over time can be discovered by unsupervised learning, especially autoencoders, which calculate low-dimensional embeddings of the surfaces. To study the deformation patterns of unseen shapes by transfer learning, we want to train an autoencoder that can analyze new surface meshes without training a new network. Here, most state-of-the-art autoencoders cannot handle meshes of different connectivity and therefore have limited to no generalization capacities to new meshes. Also, reconstruction errors strongly increase in comparison to the errors for the training shapes. To address this, we propose a novel spectral CoSMA (Convolutional Semi-Regular Mesh Autoencoder) network. This patch-based approach is combined with surface-aware training. It reconstructs surfaces not presented during training and generalizes the deformation behavior of the surfaces' patches. The novel approach reconstructs unseen meshes from different datasets in superior quality compared to state-of-the-art autoencoders that have been trained on these shapes. Our transfer learning errors on unseen shapes are 40% lower than those from models learned directly on the data. Furthermore, baseline autoencoders detect deformation patterns of unseen mesh sequences only for the whole shape. In contrast, due to the employed regional patches and stable reconstruction quality, we can localize where on the surfaces these deformation patterns manifest.	Translated: 2022-12-13 16:58:27 Published: 2022-12-12
# Optimizing ship detection efficiency in SAR images ( http://arxiv.org/abs/2212.05843v1 ) License: see linked page	Arthur Van Meerbeeck, Jordy Van Landeghem, Ruben Cartuyvels, Marie-Francine Moens	The detection and prevention of illegal fishing is critical to maintaining a healthy and functional ecosystem. Recent research on ship detection in satellite imagery has focused exclusively on performance improvements, disregarding detection efficiency. However, the speed and compute cost of vessel detection are essential for a timely intervention to prevent illegal fishing. Therefore, we investigated optimization methods that lower detection time and cost with minimal performance loss. We trained an object detection model based on a convolutional neural network (CNN) using a dataset of satellite images. Then, we designed two efficiency optimizations that can be applied to the base CNN or any other base model. The optimizations consist of a fast, cheap classification model and a statistical algorithm. The integration of the optimizations with the object detection model leads to a trade-off between speed and performance. We studied the trade-off using metrics that give different weight to execution time and performance. We show that by using a classification model the average precision of the detection model can be approximated to 99.5% in 44% of the time or to 92.7% in 25% of the time.	Translated: 2022-12-13 16:58:08 Published: 2022-12-12
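One plausible reading of the "fast, cheap classification model" in the ship detection abstract above is a two-stage cascade in which an inexpensive classifier decides whether the expensive detector needs to run on a given image tile. The sketch below shows that pattern only; the abstract does not describe the actual integration, so `cascade_detect`, the tile representation, the stub models, and the threshold are all hypothetical.

```python
def cascade_detect(tiles, cheap_classifier, detector, threshold=0.5):
    """Run `detector` only on tiles the cheap classifier flags as promising.

    Returns a dict of tile index -> detections, plus the number of
    detector calls, which is where the time saving comes from.
    """
    detections = {}
    detector_calls = 0
    for idx, tile in enumerate(tiles):
        if cheap_classifier(tile) >= threshold:
            detections[idx] = detector(tile)
            detector_calls += 1
    return detections, detector_calls


# Stub models for illustration: tiles are flat lists of pixel intensities.
def mean_brightness(tile):
    return sum(tile) / len(tile)


def bright_pixel_positions(tile):
    return [i for i, value in enumerate(tile) if value > 0.8]


tiles = [[0.0, 0.1, 0.0, 0.1], [0.9, 0.9, 0.2, 0.4], [0.1, 0.0, 0.0, 0.0]]
found, calls = cascade_detect(tiles, mean_brightness, bright_pixel_positions)
```

Only one of the three tiles reaches the detector here, which mirrors the speed versus performance trade-off the abstract discusses: a stricter threshold saves more time but risks skipping tiles that contain ships.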
# CbwLoss: Constrained Bidirectional Weighted Loss for Self-supervised Learning of Depth and Pose ( http://arxiv.org/abs/2212.05845v1 ) License: see linked page	Fei Wang, Jun Cheng, Penglei Liu	Photometric differences are widely used as supervision signals to train neural networks for estimating depth and camera pose from unlabeled monocular videos. However, this approach is detrimental for model optimization because occlusions and moving objects in a scene violate the underlying static scenario assumption. In addition, pixels in textureless regions or less discriminative pixels hinder model training. To solve these problems, in this paper, we first deal with moving objects and occlusions by utilizing the difference of the flow fields and depth structure generated by affine transformation and view synthesis, respectively. Second, we mitigate the effect of textureless regions on model optimization by measuring differences between features with more semantic and contextual information without adding networks. In addition, although the bidirectionality component is used in each sub-objective function, each pair of images is reasoned about only once, which helps reduce overhead. Extensive experiments and visual analysis demonstrate the effectiveness of the proposed method, which outperforms existing state-of-the-art self-supervised methods under the same conditions and without introducing additional auxiliary information.	Translated: 2022-12-13 16:57:54 Published: 2022-12-12
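The photometric supervision signal that the CbwLoss abstract starts from can be illustrated with a mean absolute difference between a target frame and a frame synthesized from a neighbouring view. This is a generic sketch, not the CbwLoss objective itself: the nested-list image format, the optional `valid_mask` for excluding occluded or moving pixels, and the function name are assumptions.

```python
def photometric_l1(target, synthesized, valid_mask=None):
    """Mean absolute per-pixel difference between two grayscale images.

    Images are lists of rows. If `valid_mask` is given, pixels whose mask
    entry is falsy (e.g. occluded or moving) are left out of the average.
    """
    total, count = 0.0, 0
    for r, (t_row, s_row) in enumerate(zip(target, synthesized)):
        for c, (t, s) in enumerate(zip(t_row, s_row)):
            if valid_mask is None or valid_mask[r][c]:
                total += abs(t - s)
                count += 1
    return total / count if count else 0.0


target = [[0.0, 1.0], [0.5, 0.5]]
synth = [[0.0, 0.0], [0.5, 1.0]]
full_loss = photometric_l1(target, synth)
# Ignore the top-right pixel, as if it belonged to a moving object.
masked_loss = photometric_l1(target, synth, valid_mask=[[1, 0], [1, 1]])
```

Masking the pixel with the largest error lowers the loss, which is the basic reason why handling occlusions and moving objects matters for this kind of self-supervision.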
# BeautyREC: Robust, Efficient, and Content-preserving Makeup Transfer ( http://arxiv.org/abs/2212.05855v1 ) License: see linked page	Qixin Yan, Chunle Guo, Jixin Zhao, Yuekun Dai, Chen Change Loy, Chongyi Li	In this work, we propose a Robust, Efficient, and Component-specific makeup transfer method (abbreviated as BeautyREC). In a departure from prior methods that leverage global attention, simply concatenate features, or implicitly manipulate features in latent space, we propose a component-specific correspondence to directly transfer the makeup style of a reference image to the corresponding components (e.g., skin, lips, eyes) of a source image, enabling elaborate and accurate local makeup transfer. In addition, the long-range visual dependencies of the Transformer are introduced for effective global makeup transfer. Instead of the commonly used cycle structure, which is complex and unstable, we employ a content consistency loss coupled with a content encoder to implement efficient single-path makeup transfer. The key insights of this study are modeling component-specific correspondence for local makeup transfer, capturing long-range dependencies for global makeup transfer, and enabling efficient makeup transfer via a single-path structure. We also contribute BeautyFace, a makeup transfer dataset to supplement existing datasets. This dataset contains 3,000 faces, covering more diverse makeup styles, face poses, and races. Each face has an annotated parsing map. Extensive experiments demonstrate the effectiveness of our method against state-of-the-art methods. Besides, our method is appealing because it has only 1M parameters while outperforming the state-of-the-art methods (BeautyGAN: 8.43M, PSGAN: 12.62M, SCGAN: 15.30M, CPM: 9.24M, SSAT: 10.48M).	Translated: 2022-12-13 16:57:34 Published: 2022-12-12
# CountingMOT: Joint Counting, Detection and Re-Identification for Multiple Object Tracking ( http://arxiv.org/abs/2212.05861v1 ) License: see linked page	Weihong Ren, Bowen Chen, Yuhang Shi, Weibo Jiang, Honghai Liu	The recent trend in multiple object tracking (MOT) is jointly solving detection and tracking, where object detection and appearance feature (or motion) are learned simultaneously. Despite competitive performance, in crowded scenes, joint detection and tracking usually fail to find accurate object associations due to missed or false detections. In this paper, we jointly model counting, detection and re-identification in an end-to-end framework, named CountingMOT, tailored for crowded scenes. By imposing mutual object-count constraints between detection and counting, CountingMOT tries to find a balance between object detection and crowd density map estimation, which can help it to recover missed detections or reject false detections. Our approach is an attempt to bridge the gap between object detection, counting, and re-identification. This is in contrast to prior MOT methods that either ignore the crowd density and thus are prone to failure in crowded scenes, or depend on local correlations to build a graphical relationship for matching targets. The proposed MOT tracker can perform online and real-time tracking, and achieves state-of-the-art results on the public benchmarks MOT16 (MOTA of 77.6%), MOT17 (MOTA of 78.0%) and MOT20 (MOTA of 70.2%).	Translated: 2022-12-13 16:57:03 Published: 2022-12-12
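The MOTA figures quoted for CountingMOT follow the usual CLEAR MOT convention, in which missed detections, false positives, and identity switches are summed and normalized by the number of ground-truth objects. A minimal sketch of that formula is shown below; the function and argument names are illustrative, and the example counts are made up rather than taken from the paper.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """Multiple Object Tracking Accuracy under the CLEAR MOT definition.

    All counts are summed over every frame of the sequence. The result
    can be negative when errors outnumber ground-truth objects.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


# Made-up counts: 20 errors over 100 ground-truth objects.
score = mota(false_negatives=10, false_positives=5, id_switches=5, num_gt=100)
```

Multiplying `score` by 100 gives the percentage form used in benchmark tables.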
# Diff-Font: Diffusion Model for Robust One-Shot Font Generation ( http://arxiv.org/abs/2212.05895v1 ) License: see linked page	Haibin He, Xinyuan Chen, Chaoyue Wang, Juhua Liu, Bo Du, Dacheng Tao, Yu Qiao	Font generation is a difficult and time-consuming task, especially in languages that use ideograms with complicated structures and a large number of characters, such as Chinese. To solve this problem, few-shot font generation and even one-shot font generation have attracted a lot of attention. However, most existing font generation methods may still suffer from (i) the large cross-font gap challenge; (ii) the subtle cross-font variation problem; and (iii) incorrect generation of complicated characters. In this paper, we propose a novel one-shot font generation method based on a diffusion model, named Diff-Font, which can be stably trained on large datasets. The proposed model aims to generate the entire font library given only one sample as the reference. Specifically, a large stroke-wise dataset is constructed, and a stroke-wise diffusion model is proposed to preserve the structure and the completion of each generated character. To the best of our knowledge, the proposed Diff-Font is the first work to develop diffusion models for the font generation task. The well-trained Diff-Font is not only robust to font gap and font variation, but also achieves promising performance on difficult character generation. Compared to previous font generation methods, our model reaches state-of-the-art performance both qualitatively and quantitatively.	Translated: 2022-12-13 16:56:38 Published: 2022-12-12
# MultiAct: Long-Term 3D Human Motion Generation from Multiple Action Labels ( http://arxiv.org/abs/2212.05897v1 ) License: see linked page	Taeryung Lee, Gyeongsik Moon, Kyoung Mu Lee	We tackle the problem of generating long-term 3D human motion from multiple action labels. The two main previous approaches, action-conditioned and motion-conditioned methods, have limitations in solving this problem. The action-conditioned methods generate a sequence of motion from a single action. Hence, they cannot generate long-term motions composed of multiple actions and transitions between actions. Meanwhile, the motion-conditioned methods generate future motions from initial motion. The generated future motions only depend on the past, so they are not controllable by the user's desired actions. We present MultiAct, the first framework to generate long-term 3D human motion from multiple action labels. MultiAct takes account of both action and motion conditions with a unified recurrent generation system. It repetitively takes the previous motion and action label; then, it generates a smooth transition and the motion of the given action. As a result, MultiAct produces realistic long-term motion controlled by the given sequence of multiple action labels. The code will be released.	Translated: 2022-12-13 16:56:17 Published: 2022-12-12
# Deep learning-based Subtyping of Atypical and Normal Mitoses using a Hierarchical Anchor-Free Object Detector ( http://arxiv.org/abs/2212.05900v1 ) License: see linked page	Marc Aubreville, Jonathan Ganz, Jonas Ammeling, Taryn A. Donovan, Rutger H. J. Fick, Katharina Breininger, Christof A. Bertram	Mitotic activity is key for the assessment of malignancy in many tumors. Moreover, it has been demonstrated that the proportion of abnormal mitosis to normal mitosis is of prognostic significance. Atypical mitotic figures (MF) can be identified morphologically as having segregation abnormalities of the chromatids. In this work, we perform, for the first time, automatic subtyping of mitotic figures into normal and atypical categories according to characteristic morphological appearances of the different phases of mitosis. Using the publicly available MIDOG21 and TUPAC16 breast cancer mitosis datasets, two experts blindly subtyped mitotic figures into five morphological categories. Further, we set up a state-of-the-art object detection pipeline extending the anchor-free FCOS approach with a gated hierarchical subclassification branch. Our labeling experiment indicated that subtyping of mitotic figures is a challenging task and prone to inter-rater disagreement, which we found in 24.89% of MF. Using the more diverse MIDOG21 dataset for training and TUPAC16 for testing, we reached a mean overall average precision score of 0.552, a ROC AUC score of 0.833 for atypical/normal MF and a mean class-averaged ROC AUC score of 0.977 for discriminating the different phases of cells undergoing mitosis.	Translated: 2022-12-13 16:56:04 Published: 2022-12-12
# SRoUDA: Meta Self-training for Robust Unsupervised Domain Adaptation ( http://arxiv.org/abs/2212.05917v1 ) License: see linked page	Wanqing Zhu, Jia-Li Yin, Bo-Hao Chen, Ximeng Liu	As acquiring manual labels on data could be costly, unsupervised domain adaptation (UDA), which transfers knowledge learned from a label-rich dataset to the unlabeled target dataset, is gaining increasing popularity. While extensive studies have been devoted to improving the model accuracy on the target domain, the important issue of model robustness is neglected. To make things worse, conventional adversarial training (AT) methods for improving model robustness are inapplicable in the UDA scenario since they train models on adversarial examples that are generated by a supervised loss function. In this paper, we present a new meta self-training pipeline, named SRoUDA, for improving the adversarial robustness of UDA models. Based on the self-training paradigm, SRoUDA starts with pre-training a source model by applying a UDA baseline on source labeled data and target unlabeled data with a developed random masked augmentation (RMA), and then alternates between adversarial target model training on pseudo-labeled target data and finetuning the source model by a meta step. While self-training allows the direct incorporation of AT in UDA, the meta step in SRoUDA further helps in mitigating error propagation from noisy pseudo labels. Extensive experiments on various benchmark datasets demonstrate the state-of-the-art performance of SRoUDA, where it achieves significant model robustness improvement without harming clean accuracy. Code is available at https://github.com/Vision.	Translated: 2022-12-13 16:55:39 Published: 2022-12-12
# Is ProtoPNet Really Explainable? Evaluating and Improving the Interpretability of Prototypes ( http://arxiv.org/abs/2212.05946v1 ) License: see linked page	Qihan Huang, Mengqi Xue, Haofei Zhang, Jie Song, Mingli Song	ProtoPNet and its follow-up variants (ProtoPNets) have attracted broad research interest for their intrinsic interpretability from prototypes and comparable accuracy to non-interpretable counterparts. However, it has been recently found that the interpretability of prototypes can be corrupted due to the semantic gap between similarity in latent space and that in input space. In this work, we make the first attempt to quantitatively evaluate the interpretability of prototype-based explanations, rather than relying solely on qualitative evaluations of a few visualization examples, which can easily be misleading because of cherry-picking. To this end, we propose two evaluation metrics, termed consistency score and stability score, to evaluate the explanation consistency across images and the explanation robustness against perturbations, both of which are essential for explanations put into practice. Furthermore, we propose a shallow-deep feature alignment (SDFA) module and a score aggregation (SA) module to improve the interpretability of prototypes. We conduct systematic evaluation experiments and substantial discussions to uncover the interpretability of existing ProtoPNets. Experiments demonstrate that our method achieves significantly superior performance to the state of the art, under both the conventional qualitative evaluations and the proposed quantitative evaluations, in both accuracy and interpretability. Codes are available at https://github.com/hqhQAQ/EvalProtoPNet.	Translated: 2022-12-13 16:55:07 Published: 2022-12-12
# マスクオートエンコーダはトランスフォーマーデータハングリーの効果的な解法である Masked autoencoders is an effective solution to transformer data-hungry ( http://arxiv.org/abs/2212.05677v1 ) ライセンス: Link先を確認	Jiawei Mao, Honggu Zhou, Xuesong Yin, Yuanqi Chang, Binling Nie, Rui Xu	(参考訳) ビジョントランスフォーマー(ViT)は、いくつかのビジョンタスクにおいて、そのグローバルモデリング能力で畳み込みニューラルネットワーク(CNN)を上回っている。しかし、ViTには畳み込みに固有の誘導バイアスがないため、トレーニングには大量のデータが必要である。これにより、ViTは医学や科学のような小さなデータセット上でCNNと同等に動作しない。マスク付きオートエンコーダ(MAE)はトランスフォーマーを画像そのものに集中させることで、ViTのデータ・ハングリー問題をある程度緩和できることを実験的に発見した。しかし、現在のMAEモデルは複雑すぎるため、小さなデータセットで過剰フィッティング問題が発生する。これにより、小さなデータセットでトレーニングされたMAEと高度なCNNモデルの間にはまだギャップが残る。そこで、MAEにおけるデコーダの複雑さを低減させる方法について検討し、小さなデータセットでそれに適したアーキテクチャ構成を見出した。さらに,位置予測タスクと対比学習タスクも設計し,MAEに局所化と不変性の特性を導入した。対照的な学習タスクは、モデルがハイレベルなビジュアル情報を学習できるだけでなく、MAEのクラストークンのトレーニングも可能にする。これは、ほとんどのMAE改良研究が考慮していない点である。大規模な実験により,本手法は,現在普及しているマスク画像モデリング(MIM)や小型データセット向けのビジョントランスフォーマーと比較して,標準の小型データセットと少数サンプルの医療データセットで最先端の性能を示した。コードとモデルは https://github.com/Talented-Q/SDMAE で公開されている。 Vision Transformers (ViTs) outperform convolutional neural networks (CNNs) in several vision tasks with their global modeling capabilities. However, ViT lacks the inductive bias inherent to convolution, so it requires a large amount of data for training. As a result, ViT does not perform as well as CNNs on small datasets such as those in medicine and science. We experimentally found that masked autoencoders (MAE) can make the transformer focus more on the image itself, thus alleviating the data-hungry issue of ViT to some extent. Yet the current MAE model is too complex, resulting in over-fitting problems on small datasets. A gap therefore remains between MAEs trained on small datasets and advanced CNN models. Therefore, we investigated how to reduce the decoder complexity in MAE and found a more suitable architectural configuration for it with small datasets. Besides, we additionally designed a location prediction task and a contrastive learning task to introduce localization and invariance characteristics for MAE. 
Our contrastive learning task not only enables the model to learn high-level visual information but also allows the training of MAE's class token. This is something that most MAE improvement efforts do not consider. Extensive experiments show that our method achieves state-of-the-art performance on standard small datasets as well as medical datasets with few samples, compared to the currently popular masked image modeling (MIM) methods and vision transformers for small datasets. The code and models are available at https://github.com/Talented-Q/SDMAE.	翻訳日:2022-12-13 16:49:40 公開日:2022-12-12
# ポイントクラウド登録のための解空間切断を用いた進化的マルチタスク Evolutionary Multitasking with Solution Space Cutting for Point Cloud Registration ( http://arxiv.org/abs/2212.05679v1 ) ライセンス: Link先を確認	Wu Yue, Peiran Gong, Maoguo Gong, Hangqi Ding, Zedong Tang, Yibo Liu, Wenping Ma, Qiguang Miao	(参考訳) ポイントクラウド登録(PCR)はコンピュータビジョンにおいて人気のある研究トピックである。近年,対象関数設計における初期ポーズに対する頑健さと柔軟性から,進化的手法による登録法が注目されている。しかし、ほとんどの登録法は局所最適にうまく対応できず、成功率を調査することはめったになく、これは局所最適に陥らない可能性を示し、アルゴリズムの実用性に密接に関係している。進化的マルチタスク最適化(EMTO)は、関連するタスク間の知識伝達を通じて探索能力を向上するパラダイムである。この概念に着想を得た本研究では,マルチタスク構成を解空間切断の考え方に基づくEMTOによる新規な登録アルゴリズムを提案する。具体的には, カットスペースを探索するタスクは, 局所最適から逃れ, 登録率を向上する上で, 複雑な関数ランドスケープを伴うタスクを支援する。不要な計算コストを削減するため,スパース・トゥ・ダンス戦略を提案する。また,様々なオーバーラップ率に頑健な新しい適合関数と,計算コストの課題特異的指標を導入する。オブジェクトスケールおよびシーンスケールの登録データセットに対する7つの進化した登録手法と4つの従来の登録手法と比較して,提案手法の精度および局所最適処理における優れた性能を示す実験結果が得られた。 Point cloud registration (PCR) is a popular research topic in computer vision. Recently, the registration method in an evolutionary way has received continuous attention because of its robustness to the initial pose and flexibility in objective function design. However, most evolving registration methods cannot tackle the local optimum well and they have rarely investigated the success ratio, which implies the probability of not falling into local optima and is closely related to the practicality of the algorithm. Evolutionary multi-task optimization (EMTO) is a widely used paradigm, which can boost exploration capability through knowledge transfer among related tasks. Inspired by this concept, this study proposes a novel evolving registration algorithm via EMTO, where the multi-task configuration is based on the idea of solution space cutting. Concretely, one task searching in cut space assists another task with complex function landscape in escaping from local optima and enhancing successful registration ratio. To reduce unnecessary computational cost, a sparse-to-dense strategy is proposed. 
In addition, a novel fitness function robust to various overlap rates as well as a problem-specific metric of computational cost is introduced. Compared with 7 evolving registration approaches and 4 traditional registration approaches on the object-scale and scene-scale registration datasets, experimental results demonstrate that the proposed method has superior performances in terms of precision and tackling local optima.	翻訳日:2022-12-13 16:49:12 公開日:2022-12-12
# CircleNet:ロバストペデストリアン検出のためのReciprocating Feature Adaptation CircleNet: Reciprocating Feature Adaptation for Robust Pedestrian Detection ( http://arxiv.org/abs/2212.05691v1 ) ライセンス: Link先を確認	Tianliang Zhang, Zhenjun Han, Huijuan Xu, Baochang Zhang, Qixiang Ye	(参考訳) 野生での歩行者検出は、特にシーンが歩行者の著しい閉塞や解像度が低い場合に問題となっている。既存のメソッドは、許容できるパフォーマンスを維持しながら、これらの難しいケースに適応できない。本稿では,circlenetと呼ばれる新しい特徴学習モデルを提案する。このモデルでは,低分解能とオクルードされた物体を人間が観察する過程を模倣して,特徴適応を実現する。 circlenetは機能ピラミッドのセットとして実装され、機能融合を改善するために重み共有パス拡張を使用する。複数のトップダウンおよびボトムアップパスを使用して、特徴適応と反復オブジェクト検出を相互に行う。 CircleNetの特徴適応能力を最大限に活用するために、各サイクルにおける様々な解像度と異なる閉塞レベルの歩行者インスタンスの検出に焦点を当てたインスタンス分解訓練戦略を設計する。具体的には、CircleNetは機能アンサンブルを、エンドツーエンドでハードネガティブなブーストという考え方で実装している。カルテックとシティパーソンの2つの歩行者検出データセットの実験では、circlenetは、通常のインスタンスでの良好なパフォーマンスを維持しつつ、かなりのマージンでオクルードと低解像度の歩行者のパフォーマンスを改善している。 Pedestrian detection in the wild remains a challenging problem especially when the scene contains significant occlusion and/or low resolution of the pedestrians to be detected. Existing methods are unable to adapt to these difficult cases while maintaining acceptable performance. In this paper we propose a novel feature learning model, referred to as CircleNet, to achieve feature adaptation by mimicking the process humans looking at low resolution and occluded objects: focusing on it again, at a finer scale, if the object can not be identified clearly for the first time. CircleNet is implemented as a set of feature pyramids and uses weight sharing path augmentation for better feature fusion. It targets at reciprocating feature adaptation and iterative object detection using multiple top-down and bottom-up pathways. To take full advantage of the feature adaptation capability in CircleNet, we design an instance decomposition training strategy to focus on detecting pedestrian instances of various resolutions and different occlusion levels in each cycle. Specifically, CircleNet implements feature ensemble with the idea of hard negative boosting in an end-to-end manner. 
Experiments on two pedestrian detection datasets, Caltech and CityPersons, show that CircleNet improves the performance of occluded and low-resolution pedestrians with significant margins while maintaining good performance on normal instances.	翻訳日:2022-12-13 16:48:51 公開日:2022-12-12
# 検出選択アルゴリズム : 物体検出のためのポスト処理を行う確率ベース最適化手法 Detection Selection Algorithm: A Likelihood based Optimization Method to Perform Post Processing for Object Detection ( http://arxiv.org/abs/2212.05706v1 ) ライセンス: Link先を確認	Angzhi Fan, Benjamin Ticknor and Yali Amit	(参考訳) 物体検出では、非最大抑圧(NMS)のような後処理法が広く用いられている。 NMSは偽陽性の検出回数を大幅に減らすことができるが、目標値の低いいくつかの検出を維持できる可能性がある。画像中のオブジェクトとそのラベルの正確な数を求めるため,NMSや関連手法の後に使用される検出選択アルゴリズム(DSA)と呼ばれるポスト処理手法を提案する。 DSAは検出されたバウンディングボックスのサブセットを優雅に選択し、オブジェクトの閉塞を考慮した画像全体の解釈を最も高い確率で行う完全なオブジェクト再構成を行う。アルゴリズムは4つの要素からなる。まず、オブジェクト間の閉塞関係を得るために、より高速なR-CNNに閉塞分岐を追加する。第2に,我々がデコーダと呼ぶ訓練済み生成ネットワークの潜在変数の最適化に基づいて,その可視部分から物体全体の外観を再構築できる単一再構成アルゴリズムを開発した。第3に, 咬合順序を考慮した仮説的解釈により, 全物体の同時再構成を行う全再構成アルゴリズムを提案する。最後に,リストから検出を漸進的に追加または削除し,対応する解釈の可能性を最大化する欲望アルゴリズムを提案する。 NMS や Soft-NMS を用いた DSA は NMS や Soft-NMS よりも優れた結果が得られる。 In object detection, post-processing methods like Non-maximum Suppression (NMS) are widely used. NMS can substantially reduce the number of false positive detections but may still keep some detections with low objectness scores. In order to find the exact number of objects and their labels in the image, we propose a post processing method called Detection Selection Algorithm (DSA) which is used after NMS or related methods. DSA greedily selects a subset of detected bounding boxes, together with full object reconstructions that give the interpretation of the whole image with highest likelihood, taking into account object occlusions. The algorithm consists of four components. First, we add an occlusion branch to Faster R-CNN to obtain occlusion relationships between objects. Second, we develop a single reconstruction algorithm which can reconstruct the whole appearance of an object given its visible part, based on the optimization of latent variables of a trained generative network which we call the decoder. 
Third, we propose a whole reconstruction algorithm which generates the joint reconstruction of all objects in a hypothesized interpretation, taking into account occlusion ordering. Finally, we propose a greedy algorithm that incrementally adds or removes detections from a list to maximize the likelihood of the corresponding interpretation. DSA with NMS or Soft-NMS can achieve better results than NMS or Soft-NMS themselves, as is illustrated in our experiments on synthetic images with multiple 3D objects.	翻訳日:2022-12-13 16:48:33 公開日:2022-12-12
# ホットコールドブロック:新しいウェアラブルデザインで赤外線センサーを騙す HOTCOLD Block: Fooling Thermal Infrared Detectors with a Novel Wearable Design ( http://arxiv.org/abs/2212.05709v1 ) ライセンス: Link先を確認	Hui Wei, Zhixiang Wang, Xuemei Jia, Yinqiang Zheng, Hao Tang, Shin'ichi Satoh, Zheng Wang	(参考訳) 熱赤外画像に対する敵対的な攻撃は、関連する応用のリスクを露呈する。これらのシステムのセキュリティを見積もることは、現実世界に安全にデプロイするには不可欠である。多くの場合、物理的空間における攻撃を実現するには、精巧な特別な摂動が必要である。これらの解はしばしば非実用的で人目を引く。物理的に実用的でステルス的な敵対的攻撃の必要性に対処するために、ウェアラブルなウォーミングペーストと冷却ペーストを利用して人を隠す、赤外線検出器に対する新しい物理的攻撃であるHotCold Blockを導入する。これらの容易に利用できる温度制御材料を体に取り付けることで、HotCold Blockは人間の目から効率的に逃れる。さらに、複雑なテクスチャと構造特徴を持つ敵対的パッチを構築する既存の方法とは異なり、HotCold Blockは、純粋な色ブロックによる攻撃を可能にし、サイズ、形状、位置が攻撃性能に与える影響を探索するSSP指向の敵対的最適化アルゴリズムを使用している。ディジタル環境と物理環境の両方における広範な実験結果は、提案するHotCold Blockの性能を示している。コードは https://github.com/weihui1308/HOTCOLDBlock で入手できる。 Adversarial attacks on thermal infrared imaging expose the risk of related applications. Estimating the security of these systems is essential for safely deploying them in the real world. In many cases, realizing the attacks in the physical space requires elaborate special perturbations. These solutions are often impractical and attention-grabbing. To address the need for a physically practical and stealthy adversarial attack, we introduce HotCold Block, a novel physical attack for infrared detectors that hides persons using wearable Warming Paste and Cooling Paste. By attaching these readily available temperature-controlled materials to the body, HotCold Block evades human eyes efficiently. Moreover, unlike existing methods that build adversarial patches with complex texture and structure features, HotCold Block utilizes an SSP-oriented adversarial optimization algorithm that enables attacks with pure color blocks and explores the influence of size, shape, and position on attack performance. 
Extensive experimental results in both digital and physical environments demonstrate the performance of our proposed HotCold Block. Code is available at https://github.com/weihui1308/HOTCOLDBlock.	翻訳日:2022-12-13 16:48:09 公開日:2022-12-12
# Occluded Pedestrian Detectionのための特徴校正ネットワーク Feature Calibration Network for Occluded Pedestrian Detection ( http://arxiv.org/abs/2212.05717v1 ) ライセンス: Link先を確認	Tianliang Zhang, Qixiang Ye, Baochang Zhang, Jianzhuang Liu, Xiaopeng Zhang, Qi Tian	(参考訳) 野生での歩行者検出は、特に深刻な閉塞を含むシーンでは難しい問題である。本稿では,様々な閉塞下で歩行者を適応的に検出する特徴校正ネットワーク(FC-Net)と呼ばれる,ディープラーニングフレームワークにおける特徴学習手法を提案する。 FC-Netは、歩行者の可視部分が検出に選択的かつ決定的であることの観察に基づいており、セルフアクティベーション(SA)モジュールと特徴校正(FC)モジュールを備えたセルフペース機能学習フレームワークとして実装されている。 FC-Netは、新しい自己活性化方式で、目に見える部分をハイライトし、歩行者の隠された部分を抑える特徴を学習する。 SAモジュールは、余分なパラメータを伴わずに、分類器の重みを再利用することによって歩行者の活性化マップを推定し、その結果、特徴のセマンティクスを強化するための極めてパーシモニーモデルとなり、FCモジュールは、画素単位でも地域的にも適応的な歩行者表現のための畳み込み特性を校正する。 CityPersonsとCaltechのデータセットの実験では、FC-Netは閉塞歩行者の検知性能を最大10%改善し、非閉塞歩行者の優れた性能を維持している。 Pedestrian detection in the wild remains a challenging problem especially for scenes containing serious occlusion. In this paper, we propose a novel feature learning method in the deep learning framework, referred to as Feature Calibration Network (FC-Net), to adaptively detect pedestrians under various occlusions. FC-Net is based on the observation that the visible parts of pedestrians are selective and decisive for detection, and is implemented as a self-paced feature learning framework with a self-activation (SA) module and a feature calibration (FC) module. In a new self-activated manner, FC-Net learns features which highlight the visible parts and suppress the occluded parts of pedestrians. The SA module estimates pedestrian activation maps by reusing classifier weights, without any additional parameter involved, therefore resulting in an extremely parsimony model to reinforce the semantics of features, while the FC module calibrates the convolutional features for adaptive pedestrian representation in both pixel-wise and region-based ways. 
Experiments on CityPersons and Caltech datasets demonstrate that FC-Net improves detection performance on occluded pedestrians up to 10% while maintaining excellent performance on non-occluded instances.	翻訳日:2022-12-13 16:47:48 公開日:2022-12-12
# 画像アライメントのための変換テンソル・テンソル生成物によるテンソル因子化 Tensor Factorization via Transformed Tensor-Tensor Product for Image Alignment ( http://arxiv.org/abs/2212.05719v1 ) ライセンス: Link先を確認	Sijia Xia, Duo Qiu, and Xiongjun Zhang	(参考訳) 本稿では,観測された画像が未知の領域変換によって変形し,付加ガウス雑音とスパースノイズによって同時に劣化する線形相関画像アライメントのバッチ問題について検討する。これらの画像を3階テンソルの正面スライスとして積み重ねることで、変換テンソルテンソル積によるテンソル分解法を用いて、基底テンソルの低ランク性を探索し、任意のユニタリ変換の下で変換テンソルテンソル積を介して2つの小さなテンソルの積に分解する。変換テンソル-テンソル積の主な利点は、その計算複雑性が変換テンソル核ノルムに基づく既存の文献よりも低いことである。さらに、テンソル$\ell_p$$(0<p<1)$ノルムはスパースノイズの空間性を特徴づけるために使用され、テンソルのフロベニウスノルムは加法ガウスノイズをモデル化するために用いられる。一般化されたGauss-Newtonアルゴリズムは、ドメイン変換を線形化して得られたモデルを解くために設計され、対応するサブプロブレムを解くために近位Gauss-Seidelアルゴリズムが開発された。さらに、近位ガウス-セイデルアルゴリズムの収束が確立され、その収束率はクルディカ-$\l$ojasiewicz の性質に基づいて解析される。実世界の画像データセットに関する広範囲な数値実験を行い,精度と計算時間の両方において,提案手法の優れた性能を示す。 In this paper, we study the problem of a batch of linearly correlated image alignment, where the observed images are deformed by some unknown domain transformations, and corrupted by additive Gaussian noise and sparse noise simultaneously. By stacking these images as the frontal slices of a third-order tensor, we propose to utilize the tensor factorization method via transformed tensor-tensor product to explore the low-rankness of the underlying tensor, which is factorized into the product of two smaller tensors via transformed tensor-tensor product under any unitary transformation. The main advantage of transformed tensor-tensor product is that its computational complexity is lower compared with the existing literature based on transformed tensor nuclear norm. Moreover, the tensor $\ell_p$ $(0<p<1)$ norm is employed to characterize the sparsity of sparse noise and the tensor Frobenius norm is adopted to model additive Gaussian noise. A generalized Gauss-Newton algorithm is designed to solve the resulting model by linearizing the domain transformations and a proximal Gauss-Seidel algorithm is developed to solve the corresponding subproblem. 
Furthermore, the convergence of the proximal Gauss-Seidel algorithm is established, whose convergence rate is also analyzed based on the Kurdyka-Łojasiewicz property. Extensive numerical experiments on real-world image datasets are carried out to demonstrate the superior performance of the proposed method as compared to several state-of-the-art methods in both accuracy and computational time.	翻訳日:2022-12-13 16:47:25 公開日:2022-12-12
# hdnet:群衆数を階層的に分離したネットワーク HDNet: A Hierarchically Decoupled Network for Crowd Counting ( http://arxiv.org/abs/2212.05722v1 ) ライセンス: Link先を確認	Chenliang Gu, Changan Wang, Bin-Bin Gao, Jun Liu, Tianliang Zhang	(参考訳) 近年,密度分布の適合性が高いため,密度マップ回帰に基づく手法が群集計数において優勢である。しかし、背景雑音と大きな密度変化が主な原因で、さらなる改善は飽和する傾向にある。本稿では,上記の2つの問題を解決するための階層的分離ネットワーク(hdnet)を提案する。具体的には、背景分類サブタスクを密度マップ予測タスクから分解し、密度デカップリングモジュール(DDM)に割り当てられ、その高い識別能力を利用する。残りのフォアグラウンド予測サブタスクでは、ddmによって複数の密度特異的サブタスクに階層的に分解され、フォアグラウンド密度推定モジュール(fdem)で回帰ベースの専門家によって解決される。提案手法は,これらのタスク固有の専門家の最適化を緩和するために仮説空間を効果的に削減するが,これらのサブタスクの高相関は無視される。そこで我々は,機能インタラクション,勾配インタラクション,スケールインタラクションという,フレームワーク全体を統一するための3種類のインタラクション戦略を導入する。上記の精神と統合されたHDNetは、いくつかの人気のあるカウントベンチマークで最先端のパフォーマンスを達成する。 Recently, density map regression-based methods have dominated in crowd counting owing to their excellent fitting ability on density distribution. However, further improvement tends to saturate mainly because of the confusing background noise and the large density variation. In this paper, we propose a Hierarchically Decoupled Network (HDNet) to solve the above two problems within a unified framework. Specifically, a background classification sub-task is decomposed from the density map prediction task, which is then assigned to a Density Decoupling Module (DDM) to exploit its highly discriminative ability. For the remaining foreground prediction sub-task, it is further hierarchically decomposed to several density-specific sub-tasks by the DDM, which are then solved by the regression-based experts in a Foreground Density Estimation Module (FDEM). Although the proposed strategy effectively reduces the hypothesis space so as to relieve the optimization for those task-specific experts, the high correlation of these sub-tasks are ignored. Therefore, we introduce three types of interaction strategies to unify the whole framework, which are Feature Interaction, Gradient Interaction, and Scale Interaction. 
Integrating the above ideas, HDNet achieves state-of-the-art performance on several popular counting benchmarks.	翻訳日:2022-12-13 16:46:56 公開日:2022-12-12
# roiformer: 自己教師付き単眼深度推定のための意味認識領域変換器 ROIFormer: Semantic-Aware Region of Interest Transformer for Efficient Self-Supervised Monocular Depth Estimation ( http://arxiv.org/abs/2212.05729v1 ) ライセンス: Link先を確認	Daitao Xing, Jinglin Shen, Chiuman Ho and Anthony Tzes	(参考訳) 相互に適合するクロスドメインの探索は、正確な自己監督深度推定への大きな可能性を示している。本研究では,深度情報と意味情報の融合について再検討し,幾何認識表現強調のための効率的な局所適応注意法を提案する。グローバルな接続を構築したり、制約なく特徴空間に注意を向ける代わりに、学習可能な関心領域内に空間的相互作用を縛り付ける。特に,意味情報からの幾何学的手がかりを利用して局所適応境界ボックスを学習し,教師なし特徴集約を導く。局所領域は注意空間から最も無関係な参照ポイントを妨げ、より選択的な特徴学習とより速い収束をもたらす。我々は自然にパラダイムを多面的・階層的な方法で拡張し、異なる意味レベルでの情報蒸留を可能にし、詳細な深度推定のための特徴識別能力を向上させる。 KITTIデータセットの大規模な実験により,提案手法は自己教師付き単眼深度推定タスクにおける新しい最先端技術を確立し,従来のトランスフォーマーモデルに対するアプローチの有効性を示す。 The exploration of mutual-benefit cross-domains has shown great potential toward accurate self-supervised depth estimation. In this work, we revisit feature fusion between depth and semantic information and propose an efficient local adaptive attention method for geometric aware representation enhancement. Instead of building global connections or deforming attention across the feature space without restraint, we bound the spatial interaction within a learnable region of interest. In particular, we leverage geometric cues from semantic information to learn local adaptive bounding boxes to guide unsupervised feature aggregation. The local areas preclude most irrelevant reference points from attention space, yielding more selective feature learning and faster convergence. We naturally extend the paradigm into a multi-head and hierarchic way to enable the information distillation in different semantic levels and improve the feature discriminative ability for fine-grained depth estimation. Extensive experiments on the KITTI dataset show that our proposed method establishes a new state-of-the-art in self-supervised monocular depth estimation task, demonstrating the effectiveness of our approach over former Transformer variants.	翻訳日:2022-12-13 16:46:37 公開日:2022-12-12
# bev-mae:アウトドア・ポイント・クラウド・プレトレーニングのためのバードズ・アイ・ビューマスク付きオートエンコーダ BEV-MAE: Bird's Eye View Masked Autoencoders for Outdoor Point Cloud Pre-training ( http://arxiv.org/abs/2212.05758v1 ) ライセンス: Link先を確認	Zhiwei Lin, Yongtao Wang	(参考訳) 現在の屋外LiDARに基づく3Dオブジェクト検出法は、主にスクラッチの訓練パラダイムを採用している。残念ながら、このパラダイムは大規模なラベル付きデータに大きく依存しており、そのコレクションは高価で時間を要する可能性がある。自己教師付き事前学習は、この広範な注釈付きデータへの依存を緩和するための効果的かつ望ましい方法である。近年,マスキングモデリングは,ポイントクラウドのための自己教師あり学習手法として成功している。しかし、現在は主に合成データや屋内データセットに焦点を当てている。大規模で希少な屋外点雲に適用すると、良好な結果が得られない。本稿では,アウトドア・ポイント・クラウド上での3次元物体検出のための簡易マスク型オートエンコーダプリトレーニングフレームワークbev-maeを提案する。具体的には、まず、BEV視点で3Dエンコーダ学習特徴表現を誘導し、事前学習中に複雑なデコーダ設計を避けるために、鳥の目視(BEV)誘導マスキング戦略を提案する。さらに,マスキングポイントクラウド入力の微調整による3次元エンコーダの一貫した受容フィールドサイズを維持するために,学習可能なポイントトークンを導入する。最後に、3次元エンコーダが物体検出に不可欠な位置情報を学習できるようにするために, 遠方物体の点雲がより疎いという, 屋外点雲の性質に基づき, 点密度予測を提案する。実験結果から,BEV-MAEは,多種多様な3次元物体検出器を用いたWaymoとnuSceneの両方で,最先端の自己監督結果が得られることがわかった。さらに、事前トレーニング中のトレーニングコストはわずか20%のデータと7%で、最先端のメソッドの提案と同等のパフォーマンスを達成している。ソースコードと事前トレーニングされたモデルが公開される予定だ。 Current outdoor LiDAR-based 3D object detection methods mainly adopt the training-from-scratch paradigm. Unfortunately, this paradigm heavily relies on large-scale labeled data, whose collection can be expensive and time-consuming. Self-supervised pre-training is an effective and desirable way to alleviate this dependence on extensive annotated data. Recently, masked modeling has become a successful self-supervised learning approach for point clouds. However, current works mainly focus on synthetic or indoor datasets. When applied to large-scale and sparse outdoor point clouds, they fail to yield satisfactory results. In this work, we present BEV-MAE, a simple masked autoencoder pre-training framework for 3D object detection on outdoor point clouds. Specifically, we first propose a bird's eye view (BEV) guided masking strategy to guide the 3D encoder learning feature representation in a BEV perspective and avoid complex decoder design during pre-training. 
Besides, we introduce a learnable point token to maintain a consistent receptive field size of the 3D encoder with fine-tuning for masked point cloud inputs. Finally, based on the property of outdoor point clouds, i.e., the point clouds of distant objects are more sparse, we propose point density prediction to enable the 3D encoder to learn location information, which is essential for object detection. Experimental results show that BEV-MAE achieves new state-of-the-art self-supervised results on both Waymo and nuScenes with diverse 3D object detectors. Furthermore, with only 20% data and 7% training cost during pre-training, BEV-MAE achieves comparable performance with the state-of-the-art method ProposalContrast. The source code and pre-trained models will be made publicly available.	翻訳日:2022-12-13 16:46:17 公開日:2022-12-12
# 動作認識のための3次元変形注意を用いたクロスモーダル学習 Cross-Modal Learning with 3D Deformable Attention for Action Recognition ( http://arxiv.org/abs/2212.05638v1 ) ライセンス: Link先を確認	Sangwon Kim and Dasom Ahn and Byoung Chul Ko	(参考訳) 視覚に基づく行動認識における重要な課題は、時空間的特徴を2つ以上の不均一なモダリティを1つの特徴に埋め込むことである。本研究では,適応時空間受容場とクロスモーダル学習方式を用いた行動認識のための新しい3次元変形型トランスを提案する。 3次元変形可能な変圧器は、3次元変形性、局所的な関節ストライド、時間的ストライドアテンションの3つのアテンションモジュールから構成される。 2つのクロスモーダルトークンは、3D変形可能なアテンションモジュールに入力され、反射時空間相関を持つクロスアテンショントークンを生成する。局所的なストライドアテンションは、注意を空間的に組み合わせ、トークンをポーズさせる。時間的ストライドアテンションは、アテンションモジュール内の入力トークン数を時間的に減少させ、すべてのトークンを同時に使用せずに時間的表現学習をサポートする。変形可能な変換器は、L回繰り返して、最後のクロスモーダルトークンを組み合わせて分類する。提案した3DデフォルマブルトランスはNTU60, NTU120, FineGYM, Penn Actionのデータセットでテストされ, 事前学習プロセスなしでも, 先行訓練された最先端手法よりも優れた結果が得られた。また、空間的関節および時間的ストライド注意による行動認識における重要な関節と相関を可視化することにより、行動認識のための説明可能なポテンシャルを達成する可能性を示す。 An important challenge in vision-based action recognition is the embedding of spatiotemporal features with two or more heterogeneous modalities into a single feature. In this study, we propose a new 3D deformable transformer for action recognition with adaptive spatiotemporal receptive fields and a cross-modal learning scheme. The 3D deformable transformer consists of three attention modules: 3D deformability, local joint stride, and temporal stride attention. The two cross-modal tokens are input into the 3D deformable attention module to create a cross-attention token with a reflected spatiotemporal correlation. Local joint stride attention is applied to spatially combine attention and pose tokens. Temporal stride attention temporally reduces the number of input tokens in the attention module and supports temporal expression learning without the simultaneous use of all tokens. The deformable transformer iterates L times and combines the last cross-modal token for classification. 
The proposed 3D deformable transformer was tested on the NTU60, NTU120, FineGYM, and Penn Action datasets, and showed results better than or similar to pre-trained state-of-the-art methods even without a pre-training process. In addition, by visualizing important joints and correlations during action recognition through spatial joint and temporal stride attention, we show the potential for explainable action recognition.	翻訳日:2022-12-13 16:37:33 公開日:2022-12-12
# 悪意のあるメディアデータと戦う: タンパ検出とディープフェイク検出に関する調査 Fighting Malicious Media Data: A Survey on Tampering Detection and Deepfake Detection ( http://arxiv.org/abs/2212.05667v1 ) ライセンス: Link先を確認	Junke Wang, Zhenxin Li, Chao Zhang, Jingjing Chen, Zuxuan Wu, Larry S. Davis, Yu-Gang Jiang	(参考訳) オンラインメディアデータは、画像やビデオの形で、主流のコミュニケーションチャネルになりつつある。しかし、近年の深層学習の進歩、特に深層生成モデルでは、視覚的に説得力のある画像や動画を低コストで制作するための扉が開かれており、デジタル情報の信頼性に深刻な脅威をもたらすだけでなく、社会的影響も深刻である。これはメディア改ざん検出の研究の関心の高まり、すなわち、メディアデータが悪意ある操作を受けているかどうかをディープラーニング技術を用いて調べることである。対象画像の内容によっては、メディア偽造は画像改ざんとディープフェイク技術に分けられる。前者は通常、通常の画像の視覚的要素を移動または消去するが、後者は表情や人間の顔の同一性も操作する。したがって、防御手段には、多種多様な特性を有する画像改ざん検出およびディープフェイク検出が含まれる。本稿では,現在のメディア改ざん検出手法の包括的レビューを行い,今後の研究に向けて,この分野の課題と動向について考察する。 Online media data, in the forms of images and videos, are becoming mainstream communication channels. However, recent advances in deep learning, particularly deep generative models, open the doors for producing perceptually convincing images and videos at a low cost, which not only poses a serious threat to the trustworthiness of digital information but also has severe societal implications. This motivates a growing interest of research in media tampering detection, i.e., using deep learning techniques to examine whether media data have been maliciously manipulated. Depending on the content of the targeted images, media forgery could be divided into image tampering and Deepfake techniques. The former typically moves or erases the visual elements in ordinary images, while the latter manipulates the expressions and even the identity of human faces. Accordingly, the means of defense include image tampering detection and Deepfake detection, which share a wide variety of properties. In this paper, we provide a comprehensive review of the current media tampering detection approaches, and discuss the challenges and trends in this field for future research.	翻訳日:2022-12-13 16:37:07 公開日:2022-12-12
# T5Score: 世代評価メトリクスの識別的微調整 T5Score: Discriminative Fine-tuning of Generative Evaluation Metrics ( http://arxiv.org/abs/2212.05726v1 ) ライセンス: Link先を確認	Yiwei Qin, Weizhe Yuan, Graham Neubig, Pengfei Liu	(参考訳) 現代のテキスト評価のための埋め込みベースのメトリクスは、一般的に2つのパラダイムの1つに該当する: 教師付き人間のアノテーションに従ってどの出力が高品質かを直接予測するために訓練された差別的メトリクスと、生成モデルの確率に基づいてテキストを評価するために訓練された生成的メトリクスである。判別メトリクスは良いアウトプットと悪いアウトプットを区別する問題を直接最適化することができ、生成メトリクスは豊富な生のテキストを使ってトレーニングすることができる。本稿では,現在利用可能なデータからの教師なし信号と教師なし信号の両方を用いて,両世界の長所を組み合わせたフレームワークを提案する。このアイデアを,mT5をバックボーンとするトレーニング信号を使用するメトリックであるT5Scoreをトレーニングすることで,運用する。 5つのデータセット、19の言語、280のシステムで既存のメトリクスと比較し、本手法の有用性を実証した。 T5Scoreは、セグメントレベルの既存のトップスコアメトリクスに対して、すべてのデータセットで最高のパフォーマンスを達成する。コードとモデルはhttps://github.com/qinyiwei/t5scoreでリリースします。 Modern embedding-based metrics for evaluation of generated text generally fall into one of two paradigms: discriminative metrics that are trained to directly predict which outputs are of higher quality according to supervised human annotations, and generative metrics that are trained to evaluate text based on the probabilities of a generative model. Both have their advantages; discriminative metrics are able to directly optimize for the problem of distinguishing between good and bad outputs, while generative metrics can be trained using abundant raw text. In this paper, we present a framework that combines the best of both worlds, using both supervised and unsupervised signals from whatever data we have available. We operationalize this idea by training T5Score, a metric that uses these training signals with mT5 as the backbone. We perform an extensive empirical comparison with other existing metrics on 5 datasets, 19 languages and 280 systems, demonstrating the utility of our method. Experimental results show that: T5Score achieves the best performance on all datasets against existing top-scoring metrics at the segment level. We release our code and models at https://github.com/qinyiwei/T5Score.	翻訳日:2022-12-13 16:12:36 公開日:2022-12-12
# 効果的な多言語微調整方法の探索--要約の事例研究 Searching for Effective Multilingual Fine-Tuning Methods: A Case Study in Summarization ( http://arxiv.org/abs/2212.05740v1 ) ライセンス: Link先を確認	Yiwei Qin, Graham Neubig, Pengfei Liu	(参考訳) 近年,学習済み言語モデルを下流タスクに適応させるためのチューニング戦略が多数提案されている。本稿では,多言語学習のための様々なチューニング戦略,特にテキスト要約の文脈において,広範な経験的評価を行う。具体的には、多言語調律戦略(合計5つのモデル)の3つのファミリーの相対的な利点を調べ、45以上の言語を要約するために経験的に評価する。実験により,XL-Sumデータセット上に新たな最先端技術を構築しただけでなく,多言語チューニング戦略の設計に関する今後の研究のヒントとなる一連の観測結果も得られた。 Recently, a large number of tuning strategies have been proposed to adapt pre-trained language models to downstream tasks. In this paper, we perform an extensive empirical evaluation of various tuning strategies for multilingual learning, particularly in the context of text summarization. Specifically, we explore the relative advantages of three families of multilingual tuning strategies (a total of five models) and empirically evaluate them for summarization over 45 languages. Experimentally, we not only established a new state-of-the-art on the XL-Sum dataset but also derive a series of observations that hopefully can provide hints for future research on the design of multilingual tuning strategies.	翻訳日:2022-12-13 16:12:14 公開日:2022-12-12
# プログラミングのための自然言語処理に関する調査 A Survey on Natural Language Processing for Programming ( http://arxiv.org/abs/2212.05773v1 ) ライセンス: Link先を確認	Qingfu Zhu, Xianzhen Luo, Fang Liu, Cuiyun Gao, Wanxiang Che	(参考訳) NLP技術を用いてプログラミングを支援するプログラミングのための自然言語処理は,近年爆発的な進歩を遂げている。しかし、全スペクトルから関連する作品を体系的にレビューする文献はない。本稿では,初期の演目モデルから最新の競争レベルモデルまで,既存の研究を包括的に調査する。この論文のもう1つの利点はテクニックカテゴリの完全性であり、将来の作品の配置と比較を簡単に行うことができる。 Natural language processing for programming, which aims to use NLP techniques to assist programming, has experienced an explosion in recent years. However, there is no literature that systematically reviews related work from the full spectrum. In this paper, we comprehensively investigate existing work, ranging from early deductive models to the latest competition-level models. Another advantage of this paper is the completeness of the technique category, which provides easy access to locating and comparing future works.	翻訳日:2022-12-13 16:12:01 公開日:2022-12-12
# 連携学習による異種自然言語処理タスクの協調 Collaborating Heterogeneous Natural Language Processing Tasks via Federated Learning ( http://arxiv.org/abs/2212.05789v1 ) ライセンス: Link先を確認	Chenhe Dong, Yuexiang Xie, Bolin Ding, Ying Shen, Yaliang Li	(参考訳) 個人のプライベートテキストデータに対するプライバシーの懸念が高まり、近年のフェデレートラーニング(FL)の発展が促進されている。しかし、NLPにおけるFLの適用に関する既存の研究は、参加者を異種・プライベートな学習目標に合わせるのに適していない。本研究では、異種NLPタスクを持つクライアントがFLコースを構築し、相互に有用な知識を学習できるようにするAssign-Then-Contrast(ATC)フレームワークを提案することにより、NLPにおけるFLの適用範囲をさらに広げる。具体的には、クライアントは、アサイントレーニングステージと呼ばれる独自の学習目標を使用するのではなく、サーバが割り当てた統一タスクで最初にローカルトレーニングを行うように提案する。その後、Contrastのトレーニング段階において、クライアントは異なるローカル学習目標でトレーニングを行い、一貫性と有用なモデル更新に貢献する他のクライアントと知識を交換する。本研究では,自然言語理解(NLU)タスクと自然言語生成(NLG)タスクを対象とする6つの広義のデータセットについて広範な実験を行った。ソースコードは \url{https://github.com/alibaba/FederatedScope/tree/federatedscope/nlp/hetero_tasks} で公開されている。 The increasing privacy concerns on personal private text data promote the development of federated learning (FL) in recent years. However, the existing studies on applying FL in NLP are not suitable to coordinate participants with heterogeneous or private learning objectives. In this study, we further broaden the application scope of FL in NLP by proposing an Assign-Then-Contrast (denoted as ATC) framework, which enables clients with heterogeneous NLP tasks to construct an FL course and learn useful knowledge from each other. Specifically, the clients are suggested to first perform local training with the unified tasks assigned by the server rather than using their own learning objectives, which is called the Assign training stage. After that, in the Contrast training stage, clients train with different local learning objectives and exchange knowledge with other clients who contribute consistent and useful model updates. 
We conduct extensive experiments on six widely used datasets covering both Natural Language Understanding (NLU) and Natural Language Generation (NLG) tasks, and the proposed ATC framework achieves significant improvements compared with various baseline methods. The source code is available at https://github.com/alibaba/FederatedScope/tree/master/federatedscope/nlp/hetero_tasks.	翻訳日:2022-12-13 16:11:54 公開日:2022-12-12
# P-Transformer: ドキュメントからドキュメントへのニューラルマシン翻訳の改善を目指す P-Transformer: Towards Better Document-to-Document Neural Machine Translation ( http://arxiv.org/abs/2212.05830v1 ) ライセンス: Link先を確認	Yachao Li, Junhui Li, Jing Jiang, Shimin Tao, Hao Yang and Min Zhang	(参考訳) Transformerを用いてdocument-to-document (Doc2Doc) ニューラル機械翻訳 (NMT) モデルをゼロから直接学習すると、特に小規模データセットでは通常収束しない。専用のプロービングタスクにより、 1) 絶対位置情報と相対位置情報の両方が上位エンコーダ層に到達すると徐々に弱まるか、あるいは消えてしまうこと、 2)エンコーダ出力における絶対位置情報の消滅がDoc2Doc NMTの学習失敗を引き起こすことを示す。この問題を緩和するため,自己注意とクロスアテンションの両方で絶対位置情報と相対位置情報を強化する位置認識Transformer (P-Transformer) を提案する。具体的には,絶対的な位置情報,すなわち位置埋め込みを,単純かつ効果的な加算操作を通じて,自己注意とクロスアテンションの両方においてクエリキーペアに統合する。さらに,相対的な位置エンコーディングを自己注意に組み込む。提案するP-Transformerは正弦波位置符号化を利用しており,タスク固有の位置埋め込み,セグメント埋め込み,アテンション機構を必要としない。 P-Transformerを用いてDoc2Doc NMTモデルを構築し、ソース文書を取り込み、sequence-to-sequence(seq2seq)方式でターゲット文書を完全に生成する。さらに、P-Transformer は seq2seq ベースの document-to-sentence (Doc2Sent) および sentence-to-sentence (Sent2Sent) 翻訳に適用することができる。 Doc2Doc NMTの広範な実験結果によると、P-Transformerは、小規模、中規模、大規模をカバーする7つの言語ペアの広く使われている9つのドキュメントレベルのデータセットで強いベースラインを大幅に上回り、新たな最先端性能を達成した。談話現象に関する実験により、私たちのDoc2Doc NMTモデルはBLEUと談話コヒーレンスの両方の翻訳品質を改善した。コードをGitHubで公開しています。 Directly training a document-to-document (Doc2Doc) neural machine translation (NMT) model via Transformer from scratch, especially on small datasets, usually fails to converge. Our dedicated probing tasks show that 1) both the absolute position and relative position information are gradually weakened or even vanish once they reach the upper encoder layers, and 2) the vanishing of absolute position information in encoder output causes the training failure of Doc2Doc NMT. To alleviate this problem, we propose a position-aware Transformer (P-Transformer) to enhance both the absolute and relative position information in both self-attention and cross-attention. Specifically, we integrate absolute positional information, i.e., position embeddings, into the query-key pairs both in self-attention and cross-attention through a simple yet effective addition operation. 
Moreover, we also integrate relative position encoding in self-attention. The proposed P-Transformer utilizes sinusoidal position encoding and does not require any task-specified position embedding, segment embedding, or attention mechanism. Through the above methods, we build a Doc2Doc NMT model with P-Transformer, which ingests the source document and completely generates the target document in a sequence-to-sequence (seq2seq) way. In addition, P-Transformer can be applied to seq2seq-based document-to-sentence (Doc2Sent) and sentence-to-sentence (Sent2Sent) translation. Extensive experimental results of Doc2Doc NMT show that P-Transformer significantly outperforms strong baselines on 9 widely used document-level datasets in 7 language pairs, covering small, medium, and large scales, and achieves a new state-of-the-art. Experimentation on discourse phenomena shows that our Doc2Doc NMT models improve the translation quality in both BLEU and discourse coherence. We make our code available on GitHub.	翻訳日:2022-12-13 16:11:31 公開日:2022-12-12
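The addition operation described in the P-Transformer abstract can be illustrated with a minimal sketch. It assumes the standard sinusoidal encoding and a plain scaled dot-product score; the function names `sinusoidal_encoding` and `position_aware_scores` are hypothetical, and this is not the authors' implementation (no learned projections, no relative position term).

```python
import math


def sinusoidal_encoding(position, dim):
    """Standard sinusoidal position encoding for one position (dim must be even)."""
    enc = []
    for i in range(0, dim, 2):
        angle = position / (10000 ** (i / dim))
        enc.extend([math.sin(angle), math.cos(angle)])
    return enc


def position_aware_scores(queries, keys):
    """Scaled dot-product scores after adding position encodings to queries and keys.

    Assumption: queries[i] sits at position i and keys[j] at position j.
    """
    dim = len(queries[0])
    q_pos = [[q + p for q, p in zip(vec, sinusoidal_encoding(i, dim))]
             for i, vec in enumerate(queries)]
    k_pos = [[k + p for k, p in zip(vec, sinusoidal_encoding(j, dim))]
             for j, vec in enumerate(keys)]
    scale = math.sqrt(dim)
    return [[sum(a * b for a, b in zip(q, k)) / scale for k in k_pos]
            for q in q_pos]


# With all-zero content vectors the score depends on position alone,
# so a token scores highest against its own position.
zeros = [[0.0] * 4 for _ in range(3)]
scores = position_aware_scores(zeros, zeros)
```

In this toy setting the position signal is visible directly in the attention scores, which is the property the abstract says fades in the upper encoder layers of a vanilla Transformer.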
# 「これは最も破壊的な技術だと思う」:Twitter Dataを用いたChatGPTアーリーアダプターの感性を探る "I think this is the most disruptive technology": Exploring Sentiments of ChatGPT Early Adopters using Twitter Data ( http://arxiv.org/abs/2212.05856v1 ) ライセンス: Link先を確認	Mubin Ul Haque, Isuru Dharmadasa, Zarrin Tasnim Sworna, Roshan Namal Rajapakse, and Hussain Ahmad	(参考訳) 大規模な言語モデルは最近、さまざまなタスクで素晴らしいパフォーマンスを発揮したことで、大きな注目を集めている。 openaiが開発したchatgptは、大規模な事前学習された言語モデルの実装のひとつで、アーリーアダプターの間で人気を集めている。このようなアーリーアダプターの感情を理解することは、テクノロジーの潜在的な成功や失敗、そしてその強みや弱点についての洞察を提供することができるため重要である。本稿では,初期のChatGPTユーザからの10,732ツイートを用いた混合手法による研究を行う。まず、トピックモデリングを使用してメイントピックを特定し、各トピックの詳細な質的感情分析を行います。この結果から,早期採用者の大多数は,ソフトウェア開発への混乱,エンタテイメント,創造性の行使といったトピックに関して,圧倒的に肯定的な感情を表明していることがわかった。 chatgptを誤用する可能性などの問題、特に教育面への影響などに関する懸念を表明したのはごく一部のユーザーだけだった。本研究は,各トピックの具体例を提示し,研究者とユーザ双方の懸念に対処する上での意義を詳述する。 Large language models have recently attracted significant attention due to their impressive performance on a variety of tasks. ChatGPT developed by OpenAI is one such implementation of a large, pre-trained language model that has gained immense popularity among early adopters, where certain users go to the extent of characterizing it as a disruptive technology in many domains. Understanding such early adopters' sentiments is important because it can provide insights into the potential success or failure of the technology, as well as its strengths and weaknesses. In this paper, we conduct a mixed-method study using 10,732 tweets from early ChatGPT users. We first use topic modelling to identify the main topics and then perform an in-depth qualitative sentiment analysis of each topic. Our results show that the majority of the early adopters have expressed overwhelmingly positive sentiments related to topics such as Disruptions to software development, Entertainment and exercising creativity. 
Only a limited percentage of users expressed concerns about issues such as the potential for misuse of ChatGPT, especially regarding topics such as Impact on educational aspects. We discuss these findings by providing specific examples for each topic and then detail implications related to addressing these concerns for both researchers and users.	翻訳日:2022-12-13 16:10:57 公開日:2022-12-12
# 極多ラベル長文変換器モデルを用いたICDの自動符号化 Automated ICD Coding using Extreme Multi-label Long Text Transformer-based Models ( http://arxiv.org/abs/2212.05857v1 ) ライセンス: Link先を確認	Leibo Liu, Oscar Perez-Concha, Anthony Nguyen, Vicki Bennett, Louisa Jorm	(参考訳) 背景:多くの自然言語処理タスクで事前訓練されたトランスフォーマーモデルの成功により、国際疾病分類(icd)コーディングタスクへの使用が積極的に検討されている。本研究では,3種類のトランスフォーマーモデルについて検討し,自動ICD符号化タスクによって生じる極端なラベルセットと長いテキスト分類課題に対処することを目的とした。方法: Transformer-based model PLM-ICDは、ICD符号化ベンチマークデータセットMIMIC-III上で、現在の最先端(SOTA)性能を達成した。さらに最適化するために、ベースラインモデルに選ばれました。また,XR-Transformerモデルの新たな適応であるXR-LATをMIMIC-IIIデータセット上でトレーニングした。 XR-LATは、ラベルに関する注意、知識伝達、動的負のサンプリング機構を備えた、事前定義された階層コードツリー上の再帰的に訓練されたモデルチェーンである。結果: より長い総数およびチャンクシーケンス長で訓練したPLM-ICDモデルは, 現行のSOTA PLM-ICDモデルより有意に優れ, マイクロF1スコアは60.8%であった。 XR-Transformerモデルは、一般的なドメインではSOTAだが、すべてのメトリクスでうまく機能しなかった。 XR-LATベースの最良のモデルでは、現在のSOTA PLM-ICDモデルと競合する結果が得られ、マクロAUCは2.1%向上した。結論:我々の最適化PLM-ICDモデルはMIMIC-IIIデータセット上でのICDの自動符号化のための新しいSOTAモデルであり,新しいXR-LATモデルは以前のSOTA PLM-ICDモデルと競合する。 Background: Encouraged by the success of pretrained Transformer models in many natural language processing tasks, their use for International Classification of Diseases (ICD) coding tasks is now actively being explored. In this study, we investigate three types of Transformer-based models, aiming to address the extreme label set and long text classification challenges that are posed by automated ICD coding tasks. Methods: The Transformer-based model PLM-ICD achieved the current state-of-the-art (SOTA) performance on the ICD coding benchmark dataset MIMIC-III. It was chosen as our baseline model to be further optimised. XR-Transformer, the new SOTA model in the general extreme multi-label text classification domain, and XR-LAT, a novel adaptation of the XR-Transformer model, were also trained on the MIMIC-III dataset. XR-LAT is a recursively trained model chain on a predefined hierarchical code tree with label-wise attention, knowledge transferring and dynamic negative sampling mechanisms. 
Results: Our optimised PLM-ICD model, which was trained with longer total and chunk sequence lengths, significantly outperformed the current SOTA PLM-ICD model, and achieved the highest micro-F1 score of 60.8%. The XR-Transformer model, although SOTA in the general domain, did not perform well across all metrics. The best XR-LAT based model obtained results that were competitive with the current SOTA PLM-ICD model, including improving the macro-AUC by 2.1%. Conclusion: Our optimised PLM-ICD model is the new SOTA model for automated ICD coding on the MIMIC-III dataset, while our novel XR-LAT model performs competitively with the previous SOTA PLM-ICD model.	翻訳日:2022-12-13 16:10:37 公開日:2022-12-12
# Carpet-Bombing パッチ:通常の要求なしにディープネットワークを攻撃 Carpet-bombing patch: attacking a deep network without usual requirements ( http://arxiv.org/abs/2212.05827v1 ) ライセンス: Link先を確認	Pol Labarbarie, Adrien Chan-Hon-Tong, Stéphane Herbin and Milad Leyli-Abadi	(参考訳) ディープネットワークは回避攻撃の脆弱性を示したが、そのような攻撃は通常非現実的な要件がある。最近の文献では、これらの要件の削除の可能性について論じている。本論文は, ほとんど要件を必要としないカーペットボーミングパッチ攻撃を導入することで, この文献に貢献する。特徴表現をターゲットとして、このパッチアタックはネットワークタスクを知る必要はない。この攻撃は、基盤となるタスクがそれぞれ分類、検出、セマンティックセグメンテーションであることを知らないまま、ImageNetの精度、Pascal VOCのmAP、CityscapesのIoUを低下させる。この攻撃によって引き起こされる潜在的な安全性の問題以外にも、カーペットボーミング攻撃の影響は、ディープネットワーク層のダイナミクスに関する興味深い特性を浮き彫りにしている。 Although deep networks have shown vulnerability to evasion attacks, such attacks usually have unrealistic requirements. Recent literature has discussed the possibility of removing some of these requirements. This paper contributes to this literature by introducing a carpet-bombing patch attack which has almost no requirements. Targeting the feature representations, this patch attack does not require knowing the network task. This attack decreases accuracy on ImageNet, mAP on Pascal VOC, and IoU on Cityscapes without being aware that the underlying tasks involve classification, detection or semantic segmentation, respectively. Beyond the potential safety issues raised by this attack, the impact of the carpet-bombing attack highlights some interesting properties of deep network layer dynamics.	翻訳日:2022-12-13 16:03:56 公開日:2022-12-12
# where to go: 都市規模のオンライン配車サービスにおける深層強化学習によるエージェントガイダンス Where to go: Agent Guidance with Deep Reinforcement Learning in A City-Scale Online Ride-Hailing Service ( http://arxiv.org/abs/2212.05742v1 ) ライセンス: Link先を確認	Jiyao Li, Vicki H. Allan	(参考訳) オンライン配車サービスは世界中で普及している交通システムとなっている。本稿では,オンライン配車サービスにおいて,供給と需要のバランスがとれるように,都市周辺の空きタクシーをどのように誘導するかという課題について検討する。我々は、オンライン配車サービスの複数のパフォーマンス指標を考慮した新しい報酬スキームをデザインする。また,様々な場所で不要な動作をマスキングし,エージェントがより高速かつ効率的に学習できるように,deep-q-network with action mask (am-dqn) という新しい深層強化学習法を提案する。シカゴの都市規模データセットを用いて大規模な実験を行った。いくつかの一般的なヒューリスティックおよび学習法は、比較のベースラインとして実装されている。実験の結果, AM-DQNは, 平均故障率, 顧客の平均待ち時間, 空きタクシーの平均アイドル検索時間に関して, 全手法で最高の性能を発揮することがわかった。 Online ride-hailing services have become a prevalent transportation system across the world. In this paper, we study a challenging problem of how to direct vacant taxis around a city such that supplies and demands can be balanced in online ride-hailing services. We design a new reward scheme that considers multiple performance metrics of online ride-hailing services. We also propose a novel deep reinforcement learning method named Deep-Q-Network with Action Mask (AM-DQN) masking off unnecessary actions in various locations such that agents can learn much faster and more efficiently. We conduct extensive experiments using a city-scale dataset from Chicago. Several popular heuristic and learning methods are also implemented as baselines for comparison. The results of the experiments show that the AM-DQN attains the best performances of all methods with respect to average failure rate, average waiting time for customers, and average idle search time for vacant taxis.	翻訳日:2022-12-13 16:02:58 公開日:2022-12-12
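The action-mask idea in the AM-DQN abstract can be sketched as follows. This is only an illustration under stated assumptions: the grid layout, the five-action set, and the names `grid_action_mask` and `masked_greedy_action` are hypothetical and are not taken from the paper.

```python
def grid_action_mask(row, col, n_rows, n_cols):
    """Validity mask for the actions (stay, north, south, west, east).

    Assumption: the city is a bounded grid and moves that leave it are invalid.
    """
    return [True, row > 0, row < n_rows - 1, col > 0, col < n_cols - 1]


def masked_greedy_action(q_values, valid_mask):
    """Return the index of the best Q-value among the actions marked valid."""
    if not any(valid_mask):
        raise ValueError("no valid action available")
    # Invalid actions are pushed to -inf so they can never be selected.
    masked = [q if ok else float("-inf") for q, ok in zip(q_values, valid_mask)]
    return max(range(len(masked)), key=masked.__getitem__)


# A taxi in the top-left corner of a 3x3 grid cannot go north or west,
# even though "north" has the largest raw Q-value here.
q = [0.1, 0.9, 0.3, 0.2, 0.5]
mask = grid_action_mask(0, 0, 3, 3)
action = masked_greedy_action(q, mask)
```

Masking at selection time means the agent never spends exploration steps on moves that cannot be executed, which is one plausible reading of why such a mask speeds up learning.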
# 因果推論と機械学習におけるインストゥルメンタル変数--調査 Instrumental Variables in Causal Inference and Machine Learning: A Survey ( http://arxiv.org/abs/2212.05778v1 ) ライセンス: Link先を確認	Anpeng Wu, Kun Kuang, Ruoxuan Xiong, Fei Wu	(参考訳) 因果推論は、データに基づく変数間の因果関係に関する結論を導き出すために、仮定、研究設計、見積もり戦略を使用するプロセスである。これにより、複雑なシステムで作業しているメカニズムをよりよく理解し、より情報的な決定を下すことができる。多くの環境では、治療と結果変数の両方に影響を及ぼすすべての交絡因子を十分に観察することはできず、因果効果の推定が複雑になる。この問題に対処するために、因果推論と機械学習の両方における文献の増加は、インストゥルメンタル変数(IV)の使用を提案する。本論文は,因果推論と機械学習の両方において,IV法とその応用を体系的かつ包括的に導入し,議論する最初の試みである。まず、IV の形式的定義を提案し、異なる仮定の下での IV 回帰法の同定問題について議論する。第2に,提案手法の焦点に応じて,既存のIV手法を3つのストリーム(IVを用いた二段階最小二乗法,IVを用いた制御関数法,IVの評価)に分類する。各ストリームに対して、古典的な因果推論手法と、機械学習文学における最近の発展について述べる。次に,実世界のシナリオにおけるIV手法の様々な応用を紹介し,利用可能なデータセットとアルゴリズムの要約を提供する。最後に,本論文を要約し,オープンな問題について議論し,将来的なIV法研究の方向性を提案する。また、この調査でレビューされたIV手法のツールキットをhttps://github.com/causal-machine-learning-lab/mlivで開発する。 Causal inference is the process of using assumptions, study designs, and estimation strategies to draw conclusions about the causal relationships between variables based on data. This allows researchers to better understand the underlying mechanisms at work in complex systems and make more informed decisions. In many settings, we may not fully observe all the confounders that affect both the treatment and outcome variables, complicating the estimation of causal effects. To address this problem, a growing literature in both causal inference and machine learning proposes to use Instrumental Variables (IV). This paper serves as the first effort to systematically and comprehensively introduce and discuss the IV methods and their applications in both causal inference and machine learning. First, we provide the formal definition of IVs and discuss the identification problem of IV regression methods under different assumptions. Second, we categorize the existing work on IV methods into three streams according to the focus of the proposed methods, including two-stage least squares with IVs, control function with IVs, and evaluation of IVs. 
For each stream, we present both the classical causal inference methods, and recent developments in the machine learning literature. Then, we introduce a variety of applications of IV methods in real-world scenarios and provide a summary of the available datasets and algorithms. Finally, we summarize the literature, discuss the open problems and suggest promising future research directions for IV methods and their applications. We also develop a toolkit of IVs methods reviewed in this survey at https://github.com/causal-machine-learning-lab/mliv.	翻訳日:2022-12-13 15:54:45 公開日:2022-12-12
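As an illustration of the first stream named in the survey abstract, here is a minimal two-stage least squares sketch for one instrument and one treatment. It is a textbook-style toy with hypothetical function names and hand-built data, and it does not use the API of the toolkit linked above.

```python
def ols_slope_intercept(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def two_stage_least_squares(instrument, treatment, outcome):
    """Estimate the effect of treatment on outcome using a single instrument."""
    # Stage 1: keep only the part of the treatment explained by the instrument.
    a, b = ols_slope_intercept(instrument, treatment)
    treatment_hat = [a * z + b for z in instrument]
    # Stage 2: regress the outcome on that predicted treatment.
    effect, _ = ols_slope_intercept(treatment_hat, outcome)
    return effect


# Toy data (an assumption for illustration): u is an unobserved confounder that is
# uncorrelated with the instrument z; the true causal effect of t on y is 2.
z = [0.0, 0.0, 1.0, 1.0]
u = [1.0, -1.0, 1.0, -1.0]
t = [zi + ui for zi, ui in zip(z, u)]
y = [2.0 * ti + 3.0 * ui for ti, ui in zip(t, u)]

naive_effect, _ = ols_slope_intercept(t, y)
iv_effect = two_stage_least_squares(z, t, y)
```

On this data the naive regression of `y` on `t` absorbs the confounder and overstates the effect, while the two-stage estimate recovers the coefficient used to generate the outcome.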
# 説明可能なパフォーマンス Explainable Performance ( http://arxiv.org/abs/2212.05866v1 ) ライセンス: Link先を確認	Hué Sullivan, Hurlin Christophe, Pérignon Christophe and Saurin Sébastien	(参考訳) 本稿では,モデルの予測的または経済的性能に対する入力特徴の具体的な寄与を測定するために,XPER(eXplainable PERformance)手法を提案する。我々の方法論にはいくつかの利点がある。第一に、モデル非依存とパフォーマンス指標非依存の両方です。第2に、XPERはShapley値に基づいて理論的に確立されている。第3に、Shapley値の分解に固有のベンチマークの解釈は、私たちのコンテキストにおいて有意義である。第4に、XPERはモデルの再見積を必要としないため、モデル仕様のエラーに悩まされない。 5つ目は、モデルレベルでも、個々のレベルでも実装できます。オートローンに基づくアプリケーションでは、驚くほど少数の機能によってパフォーマンスが説明できることがわかった。 XPERの分解はメトリクス間でかなり安定していますが、いくつかの機能はメトリクス間でサインを切り替えます。また,モデル予測の説明とモデル性能の説明が2つの異なる課題であることを示す。 We introduce the XPER (eXplainable PERformance) methodology to measure the specific contribution of the input features to the predictive or economic performance of a model. Our methodology offers several advantages. First, it is both model-agnostic and performance metric-agnostic. Second, XPER is theoretically founded as it is based on Shapley values. Third, the interpretation of the benchmark, which is inherent in any Shapley value decomposition, is meaningful in our context. Fourth, XPER is not plagued by model specification error, as it does not require re-estimating the model. Fifth, it can be implemented either at the model level or at the individual level. In an application based on auto loans, we find that performance can be explained by a surprisingly small number of features. XPER decompositions are rather stable across metrics, yet some feature contributions switch sign across metrics. Our analysis also shows that explaining model forecasts and model performance are two distinct tasks.	翻訳日:2022-12-13 15:53:25 publication date:2022-12-12
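The Shapley-value foundation mentioned in the XPER abstract can be illustrated with a generic exact Shapley decomposition of a performance number. This is a sketch only: `toy_performance` and its figures are invented for illustration, and the way XPER itself defines the value of a feature subset and its benchmark is not reproduced here.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley values for a set function `value` over feature subsets."""
    n = len(features)
    contributions = {}
    for feature in features:
        others = [g for g in features if g != feature]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                # Marginal gain in performance from adding this feature.
                total += weight * (value(coalition | {feature}) - value(coalition))
        contributions[feature] = total
    return contributions


def toy_performance(active):
    """Hypothetical performance metric as a function of the features in use."""
    score = 0.5  # benchmark performance with no feature information
    if "income" in active:
        score += 0.2
    if "age" in active:
        score += 0.1
    if "income" in active and "age" in active:
        score += 0.1  # interaction gain, split evenly by the Shapley weights
    return score


phi = shapley_values(["income", "age"], toy_performance)
```

The contributions sum to the gap between the full-model performance and the benchmark, which is the efficiency property that makes a Shapley decomposition of a performance metric additive.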
# 線形マルコフ決定過程に対するほぼミニマックス最適な強化学習 Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes ( http://arxiv.org/abs/2212.06132v1 ) ライセンス: Link先を確認	Jiafan He and Heyang Zhao and Dongruo Zhou and Quanquan Gu	(参考訳) 線形関数近似による強化学習(RL)について検討した。与えられた特徴マッピングの線形関数として遷移ダイナミクスをパラメータ化できるエピソード型の時間非均質線形マルコフ決定過程(線形MDP)に対して、ほぼミニマックス最適なリグレット $\tilde O(d\sqrt{H^3K})$(ここで$d$は特徴マッピングの次元、$H$は計画の地平線、$K$はエピソード数)を達成する、初の計算効率の良いアルゴリズムを提案する。本アルゴリズムは,注意深く設計された重みを用いる重み付き線形回帰スキームに基づいている。この重みは新しい分散推定器に依存しており,その推定器は(1)最適値関数の分散を直接推定し,(2)エピソード数に対して単調に減少して推定精度を高め,(3)推定値関数クラスの複雑性を制御するために,値関数推定器の更新にレアスイッチングポリシを用いる。本研究は,線形MDPを用いた最適RLに対する完全な回答を提供するとともに,開発したアルゴリズムと理論的ツールは独立した興味の対象となりうる。 We study reinforcement learning (RL) with linear function approximation. For episodic time-inhomogeneous linear Markov decision processes (linear MDPs) whose transition dynamics can be parameterized as a linear function of a given feature mapping, we propose the first computationally efficient algorithm that achieves the nearly minimax optimal regret $\tilde O(d\sqrt{H^3K})$, where $d$ is the dimension of the feature mapping, $H$ is the planning horizon, and $K$ is the number of episodes. Our algorithm is based on a weighted linear regression scheme with a carefully designed weight, which depends on a new variance estimator that (1) directly estimates the variance of the optimal value function, (2) monotonically decreases with respect to the number of episodes to ensure a better estimation accuracy, and (3) uses a rare-switching policy to update the value function estimator to control the complexity of the estimated value function class. Our work provides a complete answer to optimal RL with linear MDPs, and the developed algorithm and theoretical tools may be of independent interest.	翻訳日:2022-12-13 15:53:11 公開日:2022-12-12
# 機械学習におけるドメイン知識統合のロードマップ A Roadmap to Domain Knowledge Integration in Machine Learning ( http://arxiv.org/abs/2212.05712v1 ) ライセンス: Link先を確認	Himel Das Gupta, Victor S. Sheng	(参考訳) 近年,人工知能のさまざまな面でモデルの性能を高めるために,多くの機械学習アルゴリズムが開発されている。しかし、問題は不適切なデータとリソースのために続く。機械学習モデルに知識を統合することで、これらの障害をある程度克服することができる。知識を組み込むことは、様々な形態の知識表現のために複雑な作業である。本稿では,これらの異なる形態の知識統合と,特定の機械学習タスクにおけるその性能について概説する。 Many machine learning algorithms have been developed in recent years to enhance the performance of a model in different aspects of artificial intelligence. But the problem persists due to inadequate data and resources. Integrating knowledge in a machine learning model can help to overcome these obstacles up to a certain degree. Incorporating knowledge is a complex task though because of various forms of knowledge representation. In this paper, we will give a brief overview of these different forms of knowledge integration and their performance in certain machine learning tasks.	翻訳日:2022-12-13 15:51:57 公開日:2022-12-12
# Visuo-Motorコントロールの事前学習について:学習ベースラインの再検討 On Pre-Training for Visuo-Motor Control: Revisiting a Learning-from-Scratch Baseline ( http://arxiv.org/abs/2212.05749v1 ) ライセンス: Link先を確認	Nicklas Hansen and Zhecheng Yuan and Yanjie Ze and Tongzhou Mu and Aravind Rajeswaran and Hao Su and Huazhe Xu and Xiaolong Wang	(参考訳) データ拡張と浅いconvnetを用いた,visuo-motor制御のための,単純なスクラッチベースラインを再検討する。このベースラインは、大規模な視覚データセットで訓練された凍結視覚表現を利用する最近の手法と競合する性能を持つ。 We revisit a simple Learning-from-Scratch baseline for visuo-motor control that uses data augmentation and a shallow ConvNet. We find that this baseline has competitive performance with recent methods that leverage frozen visual representations trained on large-scale vision datasets.	翻訳日:2022-12-13 15:51:50 公開日:2022-12-12
# ニューラルアセット:インタラクティブ環境のためのボリュームオブジェクトキャプチャとレンダリング Neural Assets: Volumetric Object Capture and Rendering for Interactive Environments ( http://arxiv.org/abs/2212.06125v1 ) ライセンス: Link先を確認	Aljaž Božič, Denis Gladkov, Luke Doukakis and Christoph Lassner	(参考訳) リアルなバーチャルアセットを作ることは、時間を要するプロセスだ。通常、アーティストがオブジェクトをデザインし、その外観の微調整に多くの労力を費やす。表面下散乱のような複雑な詳細や特定の効果は、リアルタイムBRDFでは表現しきれず、特定の物体の外観を完全に捉えることは不可能である。近年のニューラルレンダリングの進歩に触発されて,日常環境における実世界の物体を忠実かつ高速に捉える手法を提案する。我々は,半透明な物体部品などの容積効果を復元し,フォトリアリスティックなオブジェクトの外観を保存するために,新しいニューラル表現を用いる。レンダリング品質を損なうことなくリアルタイムレンダリングをサポートするために,我々のモデルは,特徴グリッドと,インタラクティブなフレームレートを持つ効率的なシェーダコードに変換される小さなMLPデコーダを使用する。これにより、提案されたニューラルアセットと既存のメッシュ環境とオブジェクトのシームレスな統合が可能になる。標準的なシェーダーコードを用いるため、レンダリングは既存の多くのハードウェアやソフトウェアシステムで可搬性がある。 Creating realistic virtual assets is a time-consuming process: it usually involves an artist designing the object, then spending a lot of effort on tweaking its appearance. Intricate details and certain effects, such as subsurface scattering, elude representation using real-time BRDFs, making it impossible to fully capture the appearance of certain objects. Inspired by the recent progress of neural rendering, we propose an approach for capturing real-world objects in everyday environments faithfully and fast. We use a novel neural representation to reconstruct volumetric effects, such as translucent object parts, and preserve photorealistic object appearance. To support real-time rendering without compromising rendering quality, our model uses a grid of features and a small MLP decoder that is transpiled into efficient shader code with interactive framerates. This leads to a seamless integration of the proposed neural assets with existing mesh environments and objects. Thanks to the use of standard shader code, rendering is portable across many existing hardware and software systems.	翻訳日:2022-12-13 15:45:52 公開日:2022-12-12
# PyPop7: 人口ベースのブラックボックス最適化のためのピュアPythonライブラリ PyPop7: A Pure-Python Library for Population-Based Black-Box Optimization ( http://arxiv.org/abs/2212.05652v1 ) ライセンス: Link先を確認	Qiqi Duan, Guochen Zhou, Chang Shao, Zhuowei Wang, Mingyang Feng, Yijun Yang, Qi Zhao, Yuhui Shi	(参考訳) 本稿では,black-box optimization(bbo)用のpure-pythonオープンソースライブラリpypop7を提案する。 It provides a unified and modular interface for more than 60 versions and variants of different black-box optimization algorithms, particularly population-based optimizers, which can be classified into 12 popular families: Evolution Strategies (ES), Natural Evolution Strategies (NES), Estimation of Distribution Algorithms (EDA), Cross-Entropy Method (CEM), Differential Evolution (DE), Particle Swarm Optimizer (PSO), Cooperative Coevolution (CC), Simulated Annealing (SA), Genetic Algorithms (GA), Evolutionary Programming (EP), Pattern Search (PS), and Random Search (RS). また、多くの例や興味深いチュートリアル、本格的なAPIドキュメントも提供している。この新しいライブラリを通じて、オプティマイザのベンチマークと実際のアプリケーション、特に大規模BBOの促進のための、よく設計されたプラットフォームを提供することを期待する。ソースコードとドキュメントはhttps://github.com/Evolutionary-Intelligence/pypopとhttps://pypop.readthedocs.io/en/latestで公開されている。 In this paper, we present a pure-Python open-source library, called PyPop7, for black-box optimization (BBO). It provides a unified and modular interface for more than 60 versions and variants of different black-box optimization algorithms, particularly population-based optimizers, which can be classified into 12 popular families: Evolution Strategies (ES), Natural Evolution Strategies (NES), Estimation of Distribution Algorithms (EDA), Cross-Entropy Method (CEM), Differential Evolution (DE), Particle Swarm Optimizer (PSO), Cooperative Coevolution (CC), Simulated Annealing (SA), Genetic Algorithms (GA), Evolutionary Programming (EP), Pattern Search (PS), and Random Search (RS). It also provides many examples, interesting tutorials, and full-fledged API documentations. 
Through this new library, we expect to provide a well-designed platform for benchmarking optimizers and to promote their real-world applications, especially for large-scale BBO. Its source code and documentation are available at https://github.com/Evolutionary-Intelligence/pypop and https://pypop.readthedocs.io/en/latest, respectively.	翻訳日:2022-12-13 15:45:09 公開日:2022-12-12
# MoDem: デモによる視覚モデルに基づく強化学習の促進 MoDem: Accelerating Visual Model-Based Reinforcement Learning with Demonstrations ( http://arxiv.org/abs/2212.05698v1 ) ライセンス: Link先を確認	Nicklas Hansen, Yixin Lin, Hao Su, Xiaolong Wang, Vikash Kumar, Aravind Rajeswaran	(参考訳) サンプル効率の低さは、現実世界のアプリケーション、特にビジュオモーター制御のためのディープ強化学習(RL)アルゴリズムの展開において、引き続き主要な課題である。モデルベースのRLは、世界モデルを同時に学習し、計画と政策改善に合成ロールアウトを使用することで、非常にサンプル効率が良い可能性がある。しかし、実際には、モデルに基づくRLを用いたサンプル効率学習は探索課題によってボトルネックとなる。本研究では,ごく少数のデモを活用するだけでモデルベースRLのサンプル効率を劇的に向上させることができることを示す。ただし、インタラクションデータセットにデモを追加するだけでは十分ではありません。モデル学習においてデモンストレーションを活用する上で重要な要素(ポリシ事前トレーニング,ターゲット探索,デモデータのオーバーサンプリング)を特定し,これらがモデルベースRLフレームワークの3つのフェーズを構成する。我々は,3つの複雑なビジュオモータ制御領域を実験的に研究し,この手法が低データ領域(100Kのインタラクションステップ,5つのデモ)において,従来のアプローチと比較してスパース報酬タスクの完了に150%-250%多く成功することを確認した。コードとビデオは、https://nicklashansen.github.io/modemrl で入手できる。 Poor sample efficiency continues to be the primary challenge for deployment of deep Reinforcement Learning (RL) algorithms for real-world applications, and in particular for visuo-motor control. Model-based RL has the potential to be highly sample efficient by concurrently learning a world model and using synthetic rollouts for planning and policy improvement. However, in practice, sample-efficient learning with model-based RL is bottlenecked by the exploration challenge. In this work, we find that leveraging just a handful of demonstrations can dramatically improve the sample efficiency of model-based RL. Simply appending demonstrations to the interaction dataset, however, does not suffice. We identify key ingredients for leveraging demonstrations in model learning -- policy pretraining, targeted exploration, and oversampling of demonstration data -- which form the three phases of our model-based RL framework. We empirically study three complex visuo-motor control domains and find that our method is 150%-250% more successful in completing sparse reward tasks compared to prior approaches in the low data regime (100K interaction steps, 5 demonstrations). 
Code and videos are available at: https://nicklashansen.github.io/modemrl	翻訳日:2022-12-13 15:44:00 公開日:2022-12-12
# CACTI: スケーラブルなマルチタスクマルチシーン視覚模倣学習フレームワーク CACTI: A Framework for Scalable Multi-Task Multi-Scene Visual Imitation Learning ( http://arxiv.org/abs/2212.05711v1 ) ライセンス: Link先を確認	Zhao Mandi, Homanga Bharadhwaj, Vincent Moens, Shuran Song, Aravind Rajeswaran, Vikash Kumar	(参考訳) 多くのスキルと未発見のシナリオへの一般化が可能なロボットの開発には、大規模で多様なデータセットの効率的な収集と、収集されたデータに対する高容量ポリシーのトレーニングという2つの面での進歩が必要だ。大規模なデータセットはコンピュータビジョンや自然言語処理といった他の分野の進歩を加速させているが、ロボット工学のような物理システムでは、同等のスケールのデータを集めることが特に難しい。本研究では,キッチン環境におけるマルチタスク・マルチシーンのロボット操作という観点から,このギャップを解消し,ロボット学習のスケールアップを実現するフレームワークを提案する。 CACTIという名前のフレームワークは,データ収集,データ拡張,視覚表現学習,模倣ポリシートレーニングの4つの段階を別々に扱う。 CACTIフレームワークでは、拡張段階の一部として画像生成に最先端モデルを適用する利点と、圧縮段階における事前訓練された領域外視覚表現を使用することによるトレーニング効率の大幅な向上を強調した。実験では 1) 実際のロボットのセットアップにおいて、CACTIは、キッチンオブジェクトを含む10の操作作業が可能な単一ポリシーの効率的な訓練を可能にし、邪魔対象のレイアウトの変化に頑健である。 2) シミュレーションキッチン環境では,CACTIは18のセマンティックタスクに対して,タスクごとに最大50のレイアウトバリエーションで単一のポリシをトレーニングする。シミュレーションタスクベンチマークと、実環境とシミュレーション環境の両方の拡張データセットがリリースされ、将来の研究が促進される。 Developing robots that are capable of many skills and generalization to unseen scenarios requires progress on two fronts: efficient collection of large and diverse datasets, and training of high-capacity policies on the collected data. While large datasets have propelled progress in other fields like computer vision and natural language processing, collecting data of comparable scale is particularly challenging for physical systems like robotics. In this work, we propose a framework to bridge this gap and better scale up robot learning, under the lens of multi-task, multi-scene robot manipulation in kitchen environments. Our framework, named CACTI, has four stages that separately handle data collection, data augmentation, visual representation learning, and imitation policy training. 
In the CACTI framework, we highlight the benefit of adapting state-of-the-art models for image generation as part of the augmentation stage, and the significant improvement of training efficiency by using pretrained out-of-domain visual representations at the compression stage. Experimentally, we demonstrate that 1) on a real robot setup, CACTI enables efficient training of a single policy capable of 10 manipulation tasks involving kitchen objects, and robust to varying layouts of distractor objects; 2) in a simulated kitchen environment, CACTI trains a single policy on 18 semantic tasks across up to 50 layout variations per task. The simulation task benchmark and augmented datasets in both real and simulated environments will be released to facilitate future research.	翻訳日:2022-12-13 15:43:41 公開日:2022-12-12
# 自己校正インタフェースの対話的紹介 Interactive introduction to self-calibrating interfaces ( http://arxiv.org/abs/2212.05766v1 ) ライセンス: Link先を確認	Jonathan Grizou	(参考訳) 本論文は,自己校正インタフェースパラダイムを直感的に理解することを目的とする。このパラダイムでは、オンザフライで好みに合わせてインターフェイスを使用する方法を選択することができる。そこで我々は,PIN入力タスクを導入し,事前校正されたインタフェースから自己校正インターフェースへ移行し,ボタンからの入力モダリティの複雑さを増大させ,地図上のポイント,スケッチ,最後に音声語への変換を行う。本論文は, 主張を裏付ける仮説と実験結果を備えた従来型の研究論文ではない。この研究を裏付ける研究はすでに行われており, 後段で広く言及されている。代わりに私たちの目標は、イラストやインタラクティブなデモ、ビデオなどをサポートする小さな論理的なステップで、興味をそそる対話パラダイムを身につけ、学習を強化することです。この論文は、あらゆる背景の好奇心の楽しみのために設計され、平易な英語で書かれており、事前の知識は必要ない。すべてのデモはopenvault.jgrizou.comでオンラインで公開されている。 This interactive paper aims to provide an intuitive understanding of the self-calibrating interface paradigm. Under this paradigm, you can choose how to use an interface which can adapt to your preferences on the fly. We introduce a PIN entering task and gradually release constraints, moving from a pre-calibrated interface to a self-calibrating interface while increasing the complexity of input modalities from buttons, to points on a map, to sketches, and finally to spoken words. This is not a traditional research paper with a hypothesis and experimental results to support claims; the research supporting this work has already been done and we refer to it extensively in the later sections. Instead, our aim is to walk you through an intriguing interaction paradigm in small logical steps with supporting illustrations, interactive demonstrations, and videos to reinforce your learning. We designed this paper for the enjoyment of curious minds of any background; it is written in plain English and no prior knowledge is necessary. All demos are available online at openvault.jgrizou.com and linked individually in the paper.	翻訳日:2022-12-13 15:43:17 公開日:2022-12-12
# 非線形コンテキスト帯域とマルコフ決定過程に対する不確かさ重み付き破壊ロバストアルゴリズム Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes ( http://arxiv.org/abs/2212.05949v1 ) ライセンス: Link先を確認	Chenlu Ye, Wei Xiong, Quanquan Gu, Tong Zhang	(参考訳) 敵の汚職に伴う強化学習(RL)問題への大きな関心と進展にもかかわらず、現在の作業は線形設定に限られるか、望ましくない$\tilde{O}(\sqrt{T}\zeta)$ regret boundにつながり、$T$はラウンド数、$\zeta$は総汚職数である。本稿では,一般関数近似を用いた文脈的帯域幅を考慮し,$\tilde{O}(\sqrt{T}+\zeta)$の後悔を実現するための計算効率の良いアルゴリズムを提案する。提案アルゴリズムは、最近開発された線形文脈帯域からの不確実性重み付き最小二乗回帰と、一般関数クラスに対する新しい重み付き推定器に依存する。線形構造に大きく依存する既存の解析とは対照的に,重み付き不確実性の総和を制御する新しい手法を開発し,最終的な後悔境界を確立する。次に、このアルゴリズムをエピソディックmdp設定に一般化し、一般関数近似のシナリオにおいて、まず汚職レベル$\zeta$に対する加法依存を達成する。特に、我々のアルゴリズムは、すべての汚職レベルと未知の$\zeta$のケースにおいて、パフォーマンスの低いバウンダリにほぼ一致するか、既存のメソッドを改善している。 Despite the significant interest and progress in reinforcement learning (RL) problems with adversarial corruption, current works are either confined to the linear setting or lead to an undesired $\tilde{O}(\sqrt{T}\zeta)$ regret bound, where $T$ is the number of rounds and $\zeta$ is the total amount of corruption. In this paper, we consider the contextual bandit with general function approximation and propose a computationally efficient algorithm to achieve a regret of $\tilde{O}(\sqrt{T}+\zeta)$. The proposed algorithm relies on the recently developed uncertainty-weighted least-squares regression from linear contextual bandit \citep{he2022nearly} and a new weighted estimator of uncertainty for the general function class. In contrast to the existing analysis that heavily relies on the linear structure, we develop a novel technique to control the sum of weighted uncertainty, thus establishing the final regret bounds. We then generalize our algorithm to the episodic MDP setting and first achieve an additive dependence on the corruption level $\zeta$ in the scenario of general function approximation. 
Notably, our algorithms achieve regret bounds that either nearly match the performance lower bound or improve on existing methods for all corruption levels, in both the known and unknown $\zeta$ cases.	翻訳日:2022-12-13 15:37:09 公開日:2022-12-12
# VO$Q$L:非線形関数近似を用いたモデルフリーRLの最適回帰に向けて VO$Q$L: Towards Optimal Regret in Model-free RL with Nonlinear Function Approximation ( http://arxiv.org/abs/2212.06069v1 ) ライセンス: Link先を確認	Alekh Agarwal, Yujia Jin, Tong Zhang	(参考訳) 一般関数近似とスパース報酬による時間的不均一なエピソード強化学習(RL)について検討した。我々は,分散重み付き楽観的$q$-learning (vo$q$l) という新しいアルゴリズムを$q$-learningに基づいて設計し,その後悔を完全性と回帰関数クラスに対する有界eluder次元に限定した。特別な場合として、VO$Q$L は$\tilde{O}(d\sqrt{HT}+d^6H^{5})$ regret over $T$ episodes for a horizon $H$ MDP under (d$-dimensional) linear function approximation という漸近的に最適である。本アルゴリズムは, 重み付き回帰に基づく上限と下限を最適値関数に組み込んで, 改良された後悔を得る。このアルゴリズムは関数クラス上の回帰オラクルによって計算的に効率的であり、線形MDPに対して計算可能で統計的に最適なアプローチとなる。 We study time-inhomogeneous episodic reinforcement learning (RL) under general function approximation and sparse rewards. We design a new algorithm, Variance-weighted Optimistic $Q$-Learning (VO$Q$L), based on $Q$-learning and bound its regret assuming completeness and bounded Eluder dimension for the regression function class. As a special case, VO$Q$L achieves $\tilde{O}(d\sqrt{HT}+d^6H^{5})$ regret over $T$ episodes for a horizon $H$ MDP under ($d$-dimensional) linear function approximation, which is asymptotically optimal. Our algorithm incorporates weighted regression-based upper and lower bounds on the optimal value function to obtain this improved regret. The algorithm is computationally efficient given a regression oracle over the function class, making this the first computationally tractable and statistically optimal approach for linear MDPs.	翻訳日:2022-12-13 15:36:41 公開日:2022-12-12
# ALSO:運転推定による自動車ライダー自己監督 ALSO: Automotive Lidar Self-supervision by Occupancy estimation ( http://arxiv.org/abs/2212.05867v1 ) ライセンス: Link先を確認	Alexandre Boulch, Corentin Sautier, Bj\"orn Michele, Gilles Puy, Renaud Marlet	(参考訳) 本稿では,ポイントクラウド上で動作する深層知覚モデルのバックボーンを事前学習する新しい自己教師あり手法を提案する。中心となる考え方は、3Dポイントがサンプリングされる表面の再構成であるプリテキストタスクでモデルを訓練し、基礎となる潜在ベクトルを知覚ヘッドへの入力として使用することである。直感的には、もしネットワークがシーン表面を再構築できるなら、わずかな入力ポイントのみを与えられた場合、おそらく、実際の知覚タスクを促進するために使用できる意味情報の断片をキャプチャする。この原理は非常に単純な定式化であり、実装が容易であり、多種多様な3dセンサーや、セマンティックセグメンテーションやオブジェクト検出を行うディープネットワークにも広く適用できる。実際、ほとんどの対照的な学習アプローチとは対照的に、単一のストリームパイプラインをサポートし、限られたリソースでのトレーニングを可能にする。セマンティクスセグメンテーションとオブジェクト検出の両面で,異なる種類のライダーを含む様々な自律運転データセットについて広範な実験を行った。その結果,既存の手法と比較して,アノテーションなしで有用な表現を学習する手法の有効性が示された。コードは \href{https://github.com/valeoai/also}{github.com/valeoai/also} で入手できる。 We propose a new self-supervised method for pre-training the backbone of deep perception models operating on point clouds. The core idea is to train the model on a pretext task which is the reconstruction of the surface on which the 3D points are sampled, and to use the underlying latent vectors as input to the perception head. The intuition is that if the network is able to reconstruct the scene surface, given only sparse input points, then it probably also captures some fragments of semantic information, that can be used to boost an actual perception task. This principle has a very simple formulation, which makes it both easy to implement and widely applicable to a large range of 3D sensors and deep networks performing semantic segmentation or object detection. In fact, it supports a single-stream pipeline, as opposed to most contrastive learning approaches, allowing training on limited resources. We conducted extensive experiments on various autonomous driving datasets, involving very different kinds of lidars, for both semantic segmentation and object detection. 
The results show that, compared to existing approaches, our method is effective at learning useful representations without any annotation. Code is available at https://github.com/valeoai/ALSO.	翻訳日:2022-12-13 15:35:28 公開日:2022-12-12
# CLIP Itselfは強力なファインタナーで、ImageNetのViT-BとViT-Lで85.7%と88.0%のTop-1の精度を達成した CLIP Itself is a Strong Fine-tuner: Achieving 85.7% and 88.0% Top-1 Accuracy with ViT-B and ViT-L on ImageNet ( http://arxiv.org/abs/2212.06138v1 ) ライセンス: Link先を確認	Xiaoyi Dong and Jianmin Bao and Ting Zhang and Dongdong Chen and Shuyang Gu and Weiming Zhang and Lu Yuan and Dong Chen and Fang Wen and Nenghai Yu	(参考訳) 近年の研究では、CLIPはゼロショット推論に成功しているが、微調整性能は不十分である。本稿では,超パラメータ選択による微調整性能の影響について検討する。各種重要パラメータについて検討し,分類タスクにおける微調整CLIPの影響を包括的研究により実証的に評価した。 CLIPの微調整性能はかなり過小評価されている。大規模教師付き事前トレーニングアプローチや,Masked Image Modelingの予測ターゲットとしてCLIPを使用する最新の研究と比較して,CLIP自体が微調整において優れているか,少なくとも競争的であることを示す。具体的には、CLIP ViT-Base/16とCLIP ViT-Large/14は、ImageNet-1Kデータセット上のTop-1精度を85.7%、88.0%微調整することができる。これらの観察は、CLIPは微調整には適さないという従来の結論に挑戦し、最近提案されたCLIPに基づく改善を再考する動機となった。当社のコードは、 \url{https://github.com/LightDXY/FT-CLIP}で公開します。 Recent studies have shown that CLIP has achieved remarkable success in performing zero-shot inference while its fine-tuning performance is not satisfactory. In this paper, we identify that fine-tuning performance is significantly impacted by hyper-parameter choices. We examine various key hyper-parameters and empirically evaluate their impact in fine-tuning CLIP for classification tasks through a comprehensive study. We find that the fine-tuning performance of CLIP is substantially underestimated. Equipped with hyper-parameter refinement, we demonstrate CLIP itself is better or at least competitive in fine-tuning compared with large-scale supervised pre-training approaches or latest works that use CLIP as prediction targets in Masked Image Modeling. Specifically, CLIP ViT-Base/16 and CLIP ViT-Large/14 can achieve 85.7%,88.0% finetuning Top-1 accuracy on the ImageNet-1K dataset . These observations challenge the conventional conclusion that CLIP is not suitable for fine-tuning, and motivate us to rethink recently proposed improvements based on CLIP. 
We will release our code publicly at https://github.com/LightDXY/FT-CLIP.	翻訳日:2022-12-13 15:35:06 公開日:2022-12-12
# AECにおける自動ルールチェックのためのテキストマイニングに基づく特許分析 Text Mining-Based Patent Analysis for Automated Rule Checking in AEC ( http://arxiv.org/abs/2212.05891v1 ) ライセンス: Link先を確認	Zhe Zheng, Bo-Rui Kang, Qi-Tian Yuan, Yu-Cheng Zhou, Xin-Zheng Lu, Jia-Rui Lin	(参考訳) アーキテクチャ、エンジニアリング、建設(aec)業界におけるコンプライアンスチェックプロセスの効率性を促進することが期待されている自動ルールチェック(arc)が注目されている。 ARCアプリケーションのホットスポットに光を当て、そのトレンドを予測することは、関連する研究とイノベーションの推進に役立つ。そこで本研究では,derwent innovations index database (dii) とchina national knowledge infrastructure (cnki) のデータベースから特許をデータソースとし,(1)特許の定量的特徴(すなわち,年次分配分析),(2)潜在ディリクレ割当(lda)を用いたアークトピックの同定,(3)snaによるアークトピックの共起分析を含む3段階の分析を行った。その結果,中国と英語の特許研究のホットスポットと傾向が異なっていた。本研究の貢献は,(1)複数のテキストマイニング手法(sna,lda)を統合した総合的特許分析へのアプローチ,(2)特許分析に基づいてarcの応用ホットスポットと開発動向をレビューする,(3)arcの技術開発とイノベーションのための標識を提供する,の3つの側面を有する。 Automated rule checking (ARC), which is expected to promote the efficiency of the compliance checking process in the architecture, engineering, and construction (AEC) industry, is gaining increasing attention. Throwing light on the ARC application hotspots and forecasting its trends are useful to the related research and drive innovations. Therefore, this study takes the patents from the database of the Derwent Innovations Index database (DII) and China national knowledge infrastructure (CNKI) as data sources and then carried out a three-step analysis including (1) quantitative characteristics (i.e., annual distribution analysis) of patents, (2) identification of ARC topics using a latent Dirichlet allocation (LDA) and, (3) SNA-based co-occurrence analysis of ARC topics. The results show that the research hotspots and trends of Chinese and English patents are different. 
The contributions of this study are threefold: (1) an approach to comprehensive patent analysis that integrates multiple text mining methods (i.e., SNA and LDA) is introduced; (2) the application hotspots and development trends of ARC are reviewed based on patent analysis; and (3) a signpost for technological development and innovation of ARC is provided.	翻訳日:2022-12-13 15:34:46 公開日:2022-12-12
# ソースコード変換器のパラメータ効率向上 Parameter-Efficient Finetuning of Transformers for Source Code ( http://arxiv.org/abs/2212.05901v1 ) ライセンス: Link先を確認	Shamil Ayupov and Nadezhda Chirkova	(参考訳) 事前訓練されたトランスフォーマーは、様々なコード処理タスクで最先端のパフォーマンスを達成するが、デプロイするには大きすぎる可能性がある。ソフトウェア開発ツールは、事前訓練されたモデルの単一インスタンスを使用する可能性がある様々な目的のためにモジュールを組み込むことが多いため、事前訓練されたコードのモデルに対してパラメータ効率の良い微調整を利用する必要があると思われる。本研究では,NLPタスクで最初にテストされたアダプタとLoRAの2つのアプローチを4つのコード処理タスクでテストする。効率的な微調整アプローチは、標準的な、コード理解タスクの完全な微調整と同等あるいは高いパフォーマンスを達成できますが、コード生成タスクの完全な微調整を過小評価しています。これらの結果は、NLP以外の領域で効率的な微調整アプローチをテストすることの重要性を浮き彫りにし、ソースコードの効率的な微調整における将来の研究を動機付けている。 Pretrained Transformers achieve state-of-the-art performance in various code-processing tasks but may be too large to be deployed. As software development tools often incorporate modules for various purposes which may potentially use a single instance of the pretrained model, it appears relevant to utilize parameter-efficient fine-tuning for the pretrained models of code. In this work, we test two widely used approaches, adapters and LoRA, which were initially tested on NLP tasks, on four code-processing tasks. We find that though the efficient fine-tuning approaches may achieve comparable or higher performance than the standard, full, fine-tuning in code understanding tasks, they underperform full fine-tuning in code-generative tasks. These results underline the importance of testing efficient fine-tuning approaches on other domains than NLP and motivate future research in efficient fine-tuning for source code.	翻訳日:2022-12-13 15:34:20 公開日:2022-12-12
# MegaCRN:時空間モデリングのためのメタグラフ畳み込みリカレントネットワーク MegaCRN: Meta-Graph Convolutional Recurrent Network for Spatio-Temporal Modeling ( http://arxiv.org/abs/2212.05989v1 ) ライセンス: Link先を確認	Renhe Jiang, Zhaonan Wang, Jiawei Yong, Puneet Jeph, Quanjun Chen, Yasumasa Kobayashi, Xuan Song, Toyotaro Suzumura, Shintaro Fukushima	(参考訳) 多変量時系列予測の標準タスクとしての時空間モデリングは、AIコミュニティにおいて重要な研究トピックとなっている。グラフストリームに暗示される不均一性と非定常性に対処するため,時空間データに対する新しいグラフ構造学習機構として時空間メタグラフ学習を提案する。具体的には,このアイデアをMeta-Graph Convolutional Recurrent Network(MegaCRN)に実装し,Meta-ノードバンクを利用したMeta-Graph LearnerをGCRNエンコーダに接続する。我々は,2つのベンチマークデータセット(METR-LAとPEMS-BAY)と,非定常現象のばらつきを含む大規模時空間データセットの総合的な評価を行う。私たちのモデルは3つのデータセット(27% MAE と 34% RMSE)すべてにおいて最先端を上回りました。さらに,一連の質的評価を通じて,異なるパターンを持つ位置と時間スロットを明示的に区別し,異常な状況に対してロバストに適応できることを実証する。コードとデータセットはhttps://github.com/deepkashiwa20/MegaCRNで入手できる。 Spatio-temporal modeling as a canonical task of multivariate time series forecasting has been a significant research topic in the AI community. To address the underlying heterogeneity and non-stationarity implied in the graph streams, in this study, we propose Spatio-Temporal Meta-Graph Learning as a novel Graph Structure Learning mechanism on spatio-temporal data. Specifically, we implement this idea into Meta-Graph Convolutional Recurrent Network (MegaCRN) by plugging the Meta-Graph Learner powered by a Meta-Node Bank into GCRN encoder-decoder. We conduct a comprehensive evaluation on two benchmark datasets (METR-LA and PEMS-BAY) and a large-scale spatio-temporal dataset that contains a variety of non-stationary phenomena. Our model outperformed the state of the art to a large degree on all three datasets (over 27% MAE and 34% RMSE). Besides, through a series of qualitative evaluations, we demonstrate that our model can explicitly disentangle locations and time slots with different patterns and be robustly adaptive to different anomalous situations. Codes and datasets are available at https://github.com/deepkashiwa20/MegaCRN.	翻訳日:2022-12-13 15:27:14 公開日:2022-12-12
# 確率的帰納的説明の計算について On Computing Probabilistic Abductive Explanations ( http://arxiv.org/abs/2212.05990v1 ) ライセンス: Link先を確認	Yacine Izza, Xuanxiang Huang, Alexey Ignatiev, Nina Narodytska, Martin C. Cooper and Joao Marques-Silva	(参考訳) 最も広く研究されているAI(XAI)アプローチは正しくない。これはよく知られたモデルに依存しない説明手法のケースであり、従順写像に基づくアプローチのケースでもある。一つの解決策は、不明瞭さの欠点を示さない本質的な解釈可能性を考えることである。不運なことに、本質的な解釈性は説明の冗長性を示すことができる。形式的説明可能性(英語版)はこれらの非厳密なアプローチの代替であり、その一例がPI説明である。残念なことに、PI-Explanationsは重要な欠点も示しており、最も目に見えるのはおそらくそのサイズである。近年,PI-Explanationsの厳密な厳密な厳密さは,いわゆる関連する集合を計算することによって,より小さな説明サイズで取り除けることが観察されている。ある正の {\delta} が与えられたとき、S の特徴が固定されたとき、対象のクラスを得る確率が {\delta} を超えるとき、S の集合 S は {\delta}-関連である。しかし、非常に単純な分類器であっても、関連する特徴集合の計算の複雑さは禁じられ、回路ベースの分類器ではNPPP完全である。従来の否定的な結果とは対照的に,決定木(DT),ネイブベイズ分類器(NBC),命題言語から得られたいくつかの分類器群など,広く使われている分類器の集合を計算するための実践的アプローチを検討する。さらに,本論文では,これらの分類器の族に対して,関連する集合の計算が容易であることを示す。さらに,検討した分類器群に対して,関連特徴の簡潔な集合が得られることを確認した。 The most widely studied explainable AI (XAI) approaches are unsound. This is the case with well-known model-agnostic explanation approaches, and it is also the case with approaches based on saliency maps. One solution is to consider intrinsic interpretability, which does not exhibit the drawback of unsoundness. Unfortunately, intrinsic interpretability can display unwieldy explanation redundancy. Formal explainability represents the alternative to these non-rigorous approaches, with one example being PI-explanations. Unfortunately, PI-explanations also exhibit important drawbacks, the most visible of which is arguably their size. Recently, it has been observed that the (absolute) rigor of PI-explanations can be traded off for a smaller explanation size, by computing the so-called relevant sets. Given some positive {\delta}, a set S of features is {\delta}-relevant if, when the features in S are fixed, the probability of getting the target class exceeds {\delta}. 
However, even for very simple classifiers, the complexity of computing relevant sets of features is prohibitive, with the decision problem being $\mathrm{NP}^{\mathrm{PP}}$-complete for circuit-based classifiers. In contrast with earlier negative results, this paper investigates practical approaches for computing relevant sets for a number of widely used classifiers that include Decision Trees (DTs), Naive Bayes Classifiers (NBCs), and several families of classifiers obtained from propositional languages. Moreover, the paper shows that, in practice, and for these families of classifiers, relevant sets are easy to compute. Furthermore, the experiments confirm that succinct sets of relevant features can be obtained for the families of classifiers considered.	翻訳日:2022-12-13 15:26:52 公開日:2022-12-12
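The definition of a $\delta$-relevant set in the abstract above can be made concrete with a small brute-force sketch. This is an illustration only, not the paper's algorithm: it assumes binary features that are independent and uniformly distributed, it enumerates all completions (exponential in the number of free features), and the names `toy_classifier`, `class_probability`, and `is_delta_relevant` are hypothetical.

```python
from itertools import product


def toy_classifier(x):
    """Hypothetical classifier: predicts 1 when x[0] is set and x[1] or x[2] is set."""
    return int(x[0] == 1 and (x[1] == 1 or x[2] == 1))


def class_probability(classifier, instance, fixed):
    """Probability that the prediction for `instance` is preserved when the
    features whose indices are in `fixed` keep their values and every other
    feature is drawn uniformly from {0, 1} (an assumption of this sketch)."""
    target = classifier(instance)
    free = [i for i in range(len(instance)) if i not in fixed]
    hits = 0
    for values in product((0, 1), repeat=len(free)):
        x = list(instance)
        for i, v in zip(free, values):
            x[i] = v
        hits += classifier(x) == target
    return hits / 2 ** len(free)


def is_delta_relevant(classifier, instance, fixed, delta):
    """True if fixing the features in `fixed` keeps the target class with
    probability exceeding `delta` (strict, following the abstract's wording)."""
    return class_probability(classifier, instance, fixed) > delta
```

For the instance `(1, 1, 0)`, fixing features 0 and 1 guarantees the prediction, whereas fixing feature 0 alone preserves it with probability 0.75. With `delta = 0.7`, the smaller set `{0}` is therefore already relevant, which illustrates the trade-off between rigor and explanation size that the abstract describes.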
# 変圧器層のニューラルネットワークによる解釈 A Neural ODE Interpretation of Transformer Layers ( http://arxiv.org/abs/2212.06011v1 ) ライセンス: Link先を確認	Yaofeng Desmond Zhong and Tongtao Zhang and Amit Chakraborty and Biswadip Dey	(参考訳) マルチヘッドアテンションとマルチレイヤパーセプトロン(MLP)レイヤの交互パターンを使用するトランスフォーマーレイヤは、さまざまな機械学習問題に対して効果的なツールを提供する。変圧器層は勾配の解消の問題を避けるために残差接続を用いるため、微分方程式の数値積分と見なすことができる。この拡張抽象化では、この接続の上に構築し、トランス層の内部構造を変更することを提案する。提案モデルでは,マルチヘッドアテンションサブレイヤとMLPサブレイヤを並列に配置する。この簡単な修正により,複数のタスクにおけるトランスフォーマーネットワークの性能が向上することを示す。さらに,画像分類タスクにおいて,高度な統合スキームを持つニューラルodeソルバを用いることにより,さらに性能が向上することを示す。 Transformer layers, which use an alternating pattern of multi-head attention and multi-layer perceptron (MLP) layers, provide an effective tool for a variety of machine learning problems. As the transformer layers use residual connections to avoid the problem of vanishing gradients, they can be viewed as the numerical integration of a differential equation. In this extended abstract, we build upon this connection and propose a modification of the internal architecture of a transformer layer. The proposed model places the multi-head attention sublayer and the MLP sublayer parallel to each other. Our experiments show that this simple modification improves the performance of transformer networks in multiple tasks. Moreover, for the image classification task, we show that using neural ODE solvers with a sophisticated integration scheme further improves performance.	翻訳日:2022-12-13 15:26:22 公開日:2022-12-12
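The ODE reading described in the abstract above can be sketched as follows. The notation is assumed here rather than taken from the paper: $F$ denotes the multi-head attention sublayer, $G$ the MLP sublayer, $x_l$ the input to layer $l$, the step size is taken to be 1, and layer normalization is omitted.

$$
\begin{aligned}
\text{sequential (standard):}\quad & \tilde{x}_l = x_l + F(x_l), \qquad x_{l+1} = \tilde{x}_l + G(\tilde{x}_l) \\
\text{parallel (proposed):}\quad & x_{l+1} = x_l + F(x_l) + G(x_l)
\end{aligned}
$$

Under these assumptions, both updates can be read as discretizations of $\dot{x}(t) = F(x(t)) + G(x(t))$. The parallel form is a single forward Euler step, while the sequential form corresponds to a splitting scheme that takes an Euler step for $F$ followed by one for $G$. Swapping the Euler step for a higher-order integrator is presumably what the abstract means by a more sophisticated integration scheme.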
# 多次元自己注意に基づく生活推定のためのアプローチ Multi-Dimensional Self Attention based Approach for Remaining Useful Life Estimation ( http://arxiv.org/abs/2212.05772v1 ) ライセンス: Link先を確認	Zhi Lai, Mengjuan Liu, Yunzhu Pan, Dajiang Chen	(参考訳) Remaining Useful Life (RUL) は、予後・健康管理(PHM)において重要な役割を担っている。従来の機械の健康維持システムはしばしばコストがかかり、事前の専門知識が必要であり、高度に複雑で変化する産業シナリオに適合することは困難である。産業機器へのセンサーの普及に伴い、これらの機器を相互接続するための産業用モノのインターネット(iiot)の構築は、デジタル工場の発展において不可解なトレンドとなっている。 IIoTが収集したリアルタイムな運用データを用いて、推定されたRULをRUL予測アルゴリズムにより取得し、PHMシステムはデバイスに対する前向きなメンテナンス対策を開発することにより、メンテナンスコストを低減し、運用中の障害時間を短縮することができる。本稿では,IIoTシナリオにおけるマルチセンサデバイスのための生活予測モデルについて検討する。本シナリオでは,主流rul予測モデルを調査し,rul予測モデリングの基本ステップを要約した。そこで本論文では,RUL推定のためのデータ駆動手法を提案する。複数のセンサから出力される多次元時系列データを融合するために、マルチヘッド注意機構を使用し、特徴に対する注意が特徴とシーケンスに対する注意の相互作用を捉え、時間ステップの重みを学習する。そして、時系列の特徴を学習するためにLong Short-Term Memory Networkを適用する。提案モデルを2つのベンチマークデータセット(c-mapssとphm08)で評価し,結果が最先端モデルを上回ることを示した。さらに, マルチヘッドアテンション機構の解釈可能性により, 提案モデルはエンジン劣化の予備的な説明を与えることができる。したがって、このアプローチはIIoTシナリオの予測メンテナンスを約束する。 Remaining Useful Life (RUL) estimation plays a critical role in Prognostics and Health Management (PHM). Traditional machine health maintenance systems are often costly, requiring sufficient prior expertise, and are difficult to fit into highly complex and changing industrial scenarios. With the widespread deployment of sensors on industrial equipment, building the Industrial Internet of Things (IIoT) to interconnect these devices has become an inexorable trend in the development of the digital factory. Using the device's real-time operational data collected by IIoT to get the estimated RUL through the RUL prediction algorithm, the PHM system can develop proactive maintenance measures for the device, thus, reducing maintenance costs and decreasing failure times during operation. This paper carries out research into the remaining useful life prediction model for multi-sensor devices in the IIoT scenario. 
We investigated the mainstream RUL prediction models and summarized the basic steps of RUL prediction modeling in this scenario. On this basis, a data-driven approach for RUL estimation is proposed in this paper. It employs a Multi-Head Attention Mechanism to fuse the multi-dimensional time-series data output from multiple sensors, in which attention on features is used to capture the interactions between features and attention on sequences is used to learn the weights of time steps. Then, a Long Short-Term Memory network is applied to learn the features of the time series. We evaluate the proposed model on two benchmark datasets (C-MAPSS and PHM08), and the results demonstrate that it outperforms state-of-the-art models. Moreover, through the interpretability of the multi-head attention mechanism, the proposed model can provide a preliminary explanation of engine degradation. Therefore, this approach is promising for predictive maintenance in IIoT scenarios.	翻訳日:2022-12-13 15:25:52 公開日:2022-12-12
# gwrboost:空間変動関係の定量的定量化のための地理的重み付け勾配促進法 GWRBoost:A geographically weighted gradient boosting method for explainable quantification of spatially-varying relationships ( http://arxiv.org/abs/2212.05814v1 ) ライセンス: Link先を確認	Han Wang, Zhou Huang, Ganmin Yin, Yi Bao, Xiao Zhou, Yong Gao	(参考訳) 地理的重み付け回帰(GWR)は、地理的文脈における従属変数と独立変数の関係の空間的変動を推定するための重要なツールである。しかし、gwrモデルを構成する古典的な線形回帰は、特にかなりの体積と複雑な非線形データにおいて不適合になりがちであり、比較性能が劣るという問題に苦しんでいる。それでも、決定木やサポートベクトルマシンのような先進的なモデルでは、より効率的に複雑なデータから特徴を学習できるが、局所的な関係の空間的変動について説明可能な定量化はできない。上記の問題に対処するため, 局所的な加法モデルと勾配強化最適化法を適用し, 地理的に位置する変数間の空間的に変化する関係について, 説明可能な定量化能力を保持するGWRBoostを提案する。さらに,提案モデルに対する赤池情報スコアの計算方法を定式化し,従来のGWRアルゴリズムとの比較分析を行う。シミュレーション実験と実験ケーススタディを適用して, GWRBoostの性能と実用性を実証した。その結果,提案モデルはパラメータ推定精度が18.3\%,accが67.3\%,適合性が67.3\%低減できることがわかった。 The geographically weighted regression (GWR) is an essential tool for estimating the spatial variation of relationships between dependent and independent variables in geographical contexts. However, GWR suffers from the problem that classical linear regressions, which compose the GWR model, are more prone to be underfitting, especially for significant volume and complex nonlinear data, causing inferior comparative performance. Nevertheless, some advanced models, such as the decision tree and the support vector machine, can learn features from complex data more effectively while they cannot provide explainable quantification for the spatial variation of localized relationships. To address the above issues, we propose a geographically gradient boosting weighted regression model, GWRBoost, that applies the localized additive model and gradient boosting optimization method to alleviate underfitting problems and retains explainable quantification capability for spatially-varying relationships between geographically located variables. Furthermore, we formulate the computation method of the Akaike information score for the proposed model to conduct the comparative analysis with the classic GWR algorithm. 
Simulation experiments and an empirical case study demonstrate the performance and practical value of GWRBoost. The results show that the proposed model reduces the RMSE by 18.3% in parameter estimation accuracy and the AICc by 67.3% in goodness of fit.	翻訳日:2022-12-13 15:25:24 公開日:2022-12-12
# 継続KD:継続最適化レンズによる知識蒸留の改善 Continuation KD: Improved Knowledge Distillation through the Lens of Continuation Optimization ( http://arxiv.org/abs/2212.05998v1 ) ライセンス: Link先を確認	Aref Jafari, Ivan Kobyzev, Mehdi Rezagholizadeh, Pascal Poupart, Ali Ghodsi	(参考訳) 知識蒸留(KD)は、より大規模なモデル(教師)から知識を伝達することで、小さなモデルの(学生)一般化を改善するために自然言語理解(NLU)タスクに広く用いられている。 kdメソッドは多くの設定で最先端のパフォーマンスを達成しているが、性能を制限するいくつかの問題に苦しんでいる。文献では,教師と学生のネットワーク間の容量ギャップがKDを非効率にすることを示した。さらに、既存のKD技術は教師の出力のノイズを軽減するものではない:教師の騒々しい振る舞いをモデル化することで、生徒がより有用な特徴を学ぶのを邪魔することができる。本稿では,これらの問題に対処し,従来の手法と比較して訓練を容易にする新しいKD手法を提案する。継続最適化にヒントを得て,この目標のスムーズなバージョンから始めることで,非凸KD目標を最適化する訓練手順を設計し,トレーニングが進むにつれてさらに複雑化する。提案手法(Continuation-KD)は,NLU(GLUEベンチマーク)およびコンピュータビジョンタスク(CIFAR-10およびCIFAR-100)上の各種コンパクトアーキテクチャにおける最先端性能を実現する。 Knowledge Distillation (KD) has been extensively used for natural language understanding (NLU) tasks to improve a small model's (a student) generalization by transferring the knowledge from a larger model (a teacher). Although KD methods achieve state-of-the-art performance in numerous settings, they suffer from several problems limiting their performance. It is shown in the literature that the capacity gap between the teacher and the student networks can make KD ineffective. Additionally, existing KD techniques do not mitigate the noise in the teacher's output: modeling the noisy behaviour of the teacher can distract the student from learning more useful features. We propose a new KD method that addresses these problems and facilitates the training compared to previous techniques. Inspired by continuation optimization, we design a training procedure that optimizes the highly non-convex KD objective by starting with the smoothed version of this objective and making it more complex as the training proceeds. Our method (Continuation-KD) achieves state-of-the-art performance across various compact architectures on NLU (GLUE benchmark) and computer vision tasks (CIFAR-10 and CIFAR-100).	翻訳日:2022-12-13 15:19:30 公開日:2022-12-12
# リモートセンシングにおける画像テキスト検索のためのスケール・semantic joint decoupling network Scale-Semantic Joint Decoupling Network for Image-text Retrieval in Remote Sensing ( http://arxiv.org/abs/2212.05752v1 ) ライセンス: Link先を確認	Chengyu Zheng, Ning song, Ruoyu Zhang, Lei Huang, Zhiqiang Wei, Jie Nie (corresponding author)	(参考訳) リモートセンシングにおける画像テキスト検索は、データ分析と応用のための柔軟な情報を提供することを目的としている。近年、最先端の手法は「スケールデカップリング」と「セマンティックデカップリング」の戦略に特化して表現能力をさらに強化している。しかしながら、これらの以前のアプローチは、スケールやセマンティクスの分離に焦点をあてるが、これらの2つのアイデアを結合モデルにマージすることを無視し、クロスモーダル検索モデルの性能を極端に制限している。そこで,本稿では,リモートセンシング画像テキスト検索のための新しいスケール・セマンティクス・ジョイント・デカップリング・ネットワーク(ssjdn)を提案する。具体的には、Salience Feature extract (SFE) とSalience-Guided Suppression (SGS) のユニットを利用した双方向スケールデカップリング(BSD) モジュールを設計し、潜在的な特徴を適応的に抽出し、異なるスケールの手がかりを得るために、他のスケールでの煩雑な特徴を抑圧する。さらに,分類セマンティック・デカップリング(LSD)モジュールを,カテゴリセマンティック・ラベルを事前知識として活用して,重要なセマンティック関連情報を示す画像やテキストを監督する。最後に,stl(semantic-guided triple loss)の設計を行った。stlは損失関数を調整する定数を適応的に生成し,同じ意味画像とテキストにマッチする確率を改善し,検索モデルの収束時間を短縮する。提案するssjdnは,4つのベンチマークリモートセンシングデータセットで実施した数値実験で最先端のアプローチを上回っている。 Image-text retrieval in remote sensing aims to provide flexible information for data analysis and application. In recent years, state-of-the-art methods are dedicated to ``scale decoupling'' and ``semantic decoupling'' strategies to further enhance the capability of representation. However, these previous approaches focus on either the disentangling scale or semantics but ignore merging these two ideas in a union model, which extremely limits the performance of cross-modal retrieval models. To address these issues, we propose a novel Scale-Semantic Joint Decoupling Network (SSJDN) for remote sensing image-text retrieval. Specifically, we design the Bidirectional Scale Decoupling (BSD) module, which exploits Salience Feature Extraction (SFE) and Salience-Guided Suppression (SGS) units to adaptively extract potential features and suppress cumbersome features at other scales in a bidirectional pattern to yield different scale clues. 
Besides, we design the Label-supervised Semantic Decoupling (LSD) module by leveraging the category semantic labels as prior knowledge to supervise images and texts probing significant semantic-related information. Finally, we design a Semantic-guided Triple Loss (STL), which adaptively generates a constant to adjust the loss function to improve the probability of matching the same semantic image and text and shorten the convergence time of the retrieval model. Our proposed SSJDN outperforms state-of-the-art approaches in numerical experiments conducted on four benchmark remote sensing datasets.	翻訳日:2022-12-13 15:18:06 公開日:2022-12-12
# 教師なし異常定位のためのマルチスケール特徴模倣 Multi-scale Feature Imitation for Unsupervised Anomaly Localization ( http://arxiv.org/abs/2212.05786v1 ) ライセンス: Link先を確認	Chao Hu, Shengxin Lai	(参考訳) 非教師付き異常局在化タスクは、異常サンプルトレーニングの欠如、複数のタイプの異常の検出、複数の異常領域の比率の対応といった課題に直面している。これらの問題を解決するために,教師と学生の個別の特徴模倣ネットワーク構造と,画像と特徴ピラミッドを組み合わせたマルチスケール処理戦略を提案する。ネットワーク構造を単純化するために,勾配勾配勾配最適化に基づくネットワークモジュール重要探索手法を提案する。実験結果から,提案アルゴリズムは実工業製品検出データセット上の特徴モデリング異常な局所化手法よりも,同期間に優れた性能を示した。マルチスケール戦略は、ベンチマーク手法と比較して効果的に効果を改善できる。 The unsupervised anomaly localization task faces the challenge of missing anomaly sample training, detecting multiple types of anomalies, and dealing with the proportion of the area of multiple anomalies. A separate teacher-student feature imitation network structure and a multi-scale processing strategy combining an image and feature pyramid are proposed to solve these problems. A network module importance search method based on gradient descent optimization is proposed to simplify the network structure. The experimental results show that the proposed algorithm performs better than the feature modeling anomaly localization method on the real industrial product detection dataset in the same period. The multi-scale strategy can effectively improve the effect compared with the benchmark method.	翻訳日:2022-12-13 15:17:32 公開日:2022-12-12
# マルチビューカメラを用いた3次元キャラクタアニメーションのためのマーカーレスボディモーションキャプチャ Markerless Body Motion Capturing for 3D Character Animation based on Multi-view Cameras ( http://arxiv.org/abs/2212.05788v1 ) ライセンス: Link先を確認	Jinbao Wang, Ke Lu, Jian Xue	(参考訳) 本稿では,マーカーレス人体モーションキャプチャによる3次元3次元キャラクタアニメーション生成のための新しいアプリケーションシステムを提案する。システム全体のパイプラインは以下の5段階からなる。 1) 複数のカメラを用いたモーションデータのキャプチャ 2) 2次元(2次元)人体関節の検出 3)3次元関節の推定 4)骨変換行列の計算、及び 5)キャラクターアニメーションの生成。本研究の目的は,通常のカメラで撮影した多視点画像を用いて,3次元のスケルトンとアニメーションを生成することである。 3次元視覚に基づく3次元骨格再構築の計算複雑性は、フレーム単位のモーションキャプチャを実現するために必要なように低減されている。実験の結果,本システムは人間の行動を効果的かつ効率的に捉え,リアルタイムに3Dアニメキャラクターをアニメーション化することができることがわかった。 This paper proposes a novel application system for the generation of three-dimensional (3D) character animation driven by markerless human body motion capturing. The entire pipeline of the system consists of five stages: 1) the capturing of motion data using multiple cameras, 2) detection of the two-dimensional (2D) human body joints, 3) estimation of the 3D joints, 4) calculation of bone transformation matrices, and 5) generation of character animation. The main objective of this study is to generate a 3D skeleton and animation for 3D characters using multi-view images captured by ordinary cameras. The computational complexity of the 3D skeleton reconstruction based on 3D vision has been reduced as needed to achieve frame-by-frame motion capturing. The experimental results reveal that our system can effectively and efficiently capture human actions and use them to animate 3D cartoon characters in real-time.	翻訳日:2022-12-13 15:17:24 公開日:2022-12-12
# 異なるタイプの知識グラフに対する推論:静的、時間的、マルチモーダル Reasoning over Different Types of Knowledge Graphs: Static, Temporal and Multi-Modal ( http://arxiv.org/abs/2212.05767v1 ) ライセンス: Link先を確認	Ke Liang, Lingyuan Meng, Meng Liu, Yue Liu, Wenxuan Tu, Siwei Wang, Sihang Zhou, Xinwang Liu, Fuchun Sun	(参考訳) 知識グラフ推論(KGR)は,知識グラフに基づくマイニング論理則に基づいて,既存の事実から新たな事実を推論することを目的として,急速に発展する研究方向となっている。質問応答やレコメンデーションシステムなど、多くのAIアプリケーションでKGを使うことに大きなメリットがあることが証明されている。グラフ型によると、既存のkgrモデルは、静的モデル、時間モデル、マルチモーダルモデルという3つのカテゴリに大まかに分類できる。この領域の初期の研究は主に静的KGRに焦点を当てており、推論タスクに直接一般知識グラフ埋め込みモデルを適用する傾向がある。しかし、これらのモデルは、帰納的静的KGR、時間的KGR、マルチモーダルKGRのようなより複雑で実用的なタスクには適していない。この目的のために、最近複数の研究が開発されているが、調査論文やオープンソースリポジトリは、この重要な方向へのモデルを包括的に要約し、議論している。このギャップを埋めるために、静的から時間的、そしてマルチモーダルなKGをトレースする知識グラフの調査を行う。具体的には、KGRモデルの予備項、要約、典型的なデータセットを導入し、議論する。さらに,課題と可能性についても論じる。対応するオープンソースリポジトリはGitHubで共有されている。 Knowledge graph reasoning (KGR), aiming to deduce new facts from existing facts based on mined logic rules underlying knowledge graphs (KGs), has become a fast-growing research direction. It has been proven to significantly benefit the usage of KGs in many AI applications, such as question answering and recommendation systems, etc. According to the graph types, the existing KGR models can be roughly divided into three categories, \textit{i.e.,} static models, temporal models, and multi-modal models. The early works in this domain mainly focus on static KGR and tend to directly apply general knowledge graph embedding models to the reasoning task. However, these models are not suitable for more complex but practical tasks, such as inductive static KGR, temporal KGR, and multi-modal KGR. To this end, multiple works have been developed recently, but no survey papers and open-source repositories comprehensively summarize and discuss models in this important direction. To fill the gap, we conduct a survey for knowledge graph reasoning tracing from static to temporal and then to multi-modal KGs. 
Concretely, the preliminaries, summaries of KGR models, and typical datasets are introduced and discussed in turn. Moreover, we discuss the challenges and potential opportunities. The corresponding open-source repository is shared on GitHub: https://github.com/LIANGKE23/Awesome-Knowledge-Graph-Reasoning.	翻訳日:2022-12-13 15:17:11 公開日:2022-12-12
# BigText-QA: 大規模ハイブリッド知識グラフに関する質問応答 BigText-QA: Question Answering over a Large-Scale Hybrid Knowledge Graph ( http://arxiv.org/abs/2212.05798v1 ) ライセンス: Link先を確認	Jingjing Xu, Maria Biryukov, Martin Theobald, Vinu Ellampallil Venugopal	(参考訳) 特に自然言語的質問や手がかりの中で発生する複数のエンティティ間のきめ細かい関係を解釈する場合、テキスト的リソースに関する複雑な質問に答えることは難しい問題である。 YAGO、DBpedia、Freebase、Wikidataなどの知識ベース(KB)は、この文脈で広く使われており、質問応答(QA)アプリケーションでは過去10年間に広く受け入れられてきた。現在のKBは構造化知識の簡潔な表現を提供するが、自然言語ソースが提供する情報だけでなく、定式化や意味的なニュアンスも欠如している。我々は,BigText-QAを用いて,構造化知識と非構造化知識の両方を統一的なグラフィカル表現で整理した,より冗長な知識グラフ(KG)に基づいて,質問に答えられる統合QAシステムを開発することを目的とする。これにより、BigText-QAは、構造化された背景KB(YAGOやWikidataなど)にマッピングされた名前付きエンティティの標準セットと、高度に多様化したリレーショナルパラフレーズとリッチなコンテキスト情報を提供するオープンな文節のセットを組み合わせることができる。 Answering complex questions over textual resources remains a challenging problem – especially when interpreting the fine-grained relationships among multiple entities that occur within a natural-language question or clue. Curated knowledge bases (KBs), such as YAGO, DBpedia, Freebase and Wikidata, have been widely used in this context and gained great acceptance for question-answering (QA) applications in the past decade. While current KBs offer a concise representation of structured knowledge, they lack the variety of formulations and semantic nuances as well as the context of information provided by the natural-language sources. With BigText-QA, we aim to develop an integrated QA system which is able to answer questions based on a more redundant form of a knowledge graph (KG) that organizes both structured and unstructured (i.e., "hybrid") knowledge in a unified graphical representation. BigText-QA thereby is able to combine the best of both worlds – a canonical set of named entities, mapped to a structured background KB (such as YAGO or Wikidata), as well as an open set of textual clauses providing highly diversified relational paraphrases with rich context information.	翻訳日:2022-12-13 15:10:45 公開日:2022-12-12
# RPN: 言語理解のためのディープラーニングにおける単語ベクトルレベルデータ拡張アルゴリズム RPN: A Word Vector Level Data Augmentation Algorithm in Deep Learning for Language Understanding ( http://arxiv.org/abs/2212.05961v1 ) ライセンス: Link先を確認	Zhengqing Yuan, Zhuanzhe Zhao, Yongming Liu, Xiaolong Zhang, Xuecong Hou and Yue Wang	(参考訳) 本稿では, RPN (Random Position Noise) アルゴリズムと呼ばれる、自然言語理解タスクのための新しいデータ拡張アルゴリズムを提案する。現在のテキスト拡張手法は比較的少なく、すべての文レベルの自然言語理解タスクに適用できる手法はほとんどない。RPNは原文に対する従来の拡張を単語ベクトルレベルに適用する。 RPNアルゴリズムは、一部の単語ベクトルの1つまたは複数の次元を置換する。その結果、RPNはサンプルにある程度の摂動を導入することができ、異なるタスクに対して摂動の範囲を調整することができる。拡張されたサンプルはモデルのトレーニングに使用され、モデルがより堅牢になる。その後の実験で、トレーニングや微調整にRPNを加えると、TweetEval、CoLA、SST-2を含む8つの自然言語処理タスクすべてで安定した向上が得られ、他のデータ拡張アルゴリズムよりも大きな改善が得られた。RPNアルゴリズムは言語理解のための全ての文レベルのタスクに適用でき、単語埋め込み層を持つあらゆるディープラーニングモデルで使用できる。 This paper presents a new data augmentation algorithm for natural language understanding tasks, called the RPN (Random Position Noise) algorithm. Current text augmentation methods are relatively scarce, and few of the extant methods apply to all sentence-level natural language understanding tasks. RPN applies the traditional augmentation on the original text to the word vector level. The RPN algorithm makes a substitution in one or several dimensions of some word vectors. As a result, the RPN can introduce a certain degree of perturbation to the sample and can adjust the range of perturbation on different tasks. The augmented samples are then used to train the model, which makes the model more robust. In subsequent experiments, we found that adding RPN to the training or fine-tuning of a model resulted in a stable boost on all 8 natural language processing tasks, including the TweetEval, CoLA, and SST-2 datasets, and more significant improvements than other data augmentation algorithms. The RPN algorithm applies to all sentence-level tasks for language understanding and can be used in any deep learning model with a word embedding layer.	翻訳日:2022-12-13 15:10:22 公開日:2022-12-12
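The RPN abstract above describes substituting values in one or several dimensions of some word vectors. A minimal sketch of that idea follows, under stated assumptions: the function name `rpn_augment`, the parameters `word_rate`, `dims_per_word` and `noise_range`, and the choice of uniform noise are all hypothetical illustrations and are not taken from the paper.

```python
import random


def rpn_augment(word_vectors, word_rate=0.3, dims_per_word=1,
                noise_range=0.1, seed=None):
    """Return a perturbed copy of a sentence given as a list of word vectors.

    Hypothetical sketch: each word is selected with probability `word_rate`;
    for a selected word, `dims_per_word` randomly chosen dimensions are
    replaced by uniform noise drawn from [-noise_range, noise_range].
    """
    rng = random.Random(seed)
    # Copy every vector so the original sample is left untouched.
    augmented = [list(vec) for vec in word_vectors]
    for vec in augmented:
        if rng.random() >= word_rate:
            continue  # this word vector is kept as-is
        k = min(dims_per_word, len(vec))
        # Pick k distinct positions and substitute their values.
        for dim in rng.sample(range(len(vec)), k):
            vec[dim] = rng.uniform(-noise_range, noise_range)
    return augmented


if __name__ == "__main__":
    sentence = [[1.0, 1.0, 1.0, 1.0] for _ in range(3)]
    print(rpn_augment(sentence, word_rate=1.0, dims_per_word=2, seed=0))
```

In this sketch `noise_range` plays the role of the adjustable perturbation range mentioned in the abstract; how the authors actually draw the substituted values is not stated there.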
# アンタングル型シーケンス対シーケンス学習を用いた実世界の合成一般化 Real-World Compositional Generalization with Disentangled Sequence-to-Sequence Learning ( http://arxiv.org/abs/2212.05982v1 ) ライセンス: Link先を確認	Hao Zheng and Mirella Lapata	(参考訳) 合成一般化は、現在のニューラルネットワークが苦戦している人間の言語学習の基本的なメカニズムである。最近提案されたDisentangled sequence-to-sequence model (Dangle) は、復号ステップごとに特化した符号化を学習することで、有望な一般化能力を示す。このモデルに2つの重要な変更を加え、より分離された(disentangled)表現を促すとともに計算とメモリの効率を改善し、より現実的な設定で合成一般化に取り組めるようにする。具体的には、各時間ステップでソースのキーと値を適応的に再エンコードするのではなく、それらの表現を分離し、キーのみを一定間隔で定期的に再エンコードする。我々の新しいアーキテクチャは、既存のタスクやデータセットにわたってより優れた一般化性能をもたらし、さらにトレーニングセットとの関係で自然に発生する構成パターンを検出して作成した新しい機械翻訳ベンチマークでも優れた性能を示す。この方法論は人工的な課題よりも現実の要求をうまくエミュレートすることを示す。 Compositional generalization is a basic mechanism in human language learning, which current neural networks struggle with. A recently proposed Disentangled sequence-to-sequence model (Dangle) shows promising generalization capability by learning specialized encodings for each decoding step. We introduce two key modifications to this model which encourage more disentangled representations and improve its compute and memory efficiency, allowing us to tackle compositional generalization in a more realistic setting. Specifically, instead of adaptively re-encoding source keys and values at each time step, we disentangle their representations and only re-encode keys periodically, at some interval. Our new architecture leads to better generalization performance across existing tasks and datasets, and a new machine translation benchmark which we create by detecting naturally occurring compositional patterns in relation to a training set. We show this methodology better emulates real-world requirements than artificial challenges.	翻訳日:2022-12-13 15:10:04 公開日:2022-12-12
# Promptingはプログラミング - 大規模言語モデルのためのクエリ言語 Prompting Is Programming: A Query Language For Large Language Models ( http://arxiv.org/abs/2212.06094v1 ) ライセンス: Link先を確認	Luca Beurer-Kellner, Marc Fischer, Martin Vechev	(参考訳) 大規模言語モデルは、質問応答やコード生成など、幅広いタスクにおいて優れたパフォーマンスを示している。高いレベルでは、入力が与えられると、言語モデルを使用して、統計的に尤もらしい方法でシーケンスを自動補完することができる。これに基づいて、ユーザはこれらのモデルを言語命令や例で促し、さまざまな下流タスクを実装する。高度なプロンプト手法は、言語モデル、ユーザ、計算機などの外部ツール間のインタラクションを伴うことさえある。しかし、最先端の性能を得たり、特定のタスクに言語モデルを適応させたりするためには、タスクやモデルに固有の複雑なプログラムを実装する必要があり、それでもアドホックなインタラクションが必要になる場合がある。そこで我々は,LMP(Language Model Programming)という新しいアイデアを提案する。 LMPは、言語モデルのプロンプトを、純粋なテキストプロンプトから、テキストプロンプトとスクリプティングの直感的な組み合わせへと一般化する。加えて、LMPは言語モデルの出力に対して制約を指定できる。これにより、言語モデルの内部を抽象化し、ハイレベルなセマンティクスを提供しながら、多くのタスクに簡単に適応できる。 LMPを有効にするために、LMQL(Language Model Query Languageの略)を実装し、LMPプロンプトからの制約と制御フローを活用して、基礎となる言語モデルへの高価な呼び出し数を最小限に抑える効率的な推論手順を生成する。 LMQLは、直感的に幅広い最先端のプロンプトメソッドをキャプチャすることができ、特に既存のハイレベルAPIで実装するのが困難なインタラクティブなフローを容易にします。評価の結果,複数のダウンストリームタスクで精度を維持または向上させつつ,必要な計算量,あるいは従量課金APIの場合はコストを大幅に削減できることが示された(13～85%のコスト削減)。 Large language models have demonstrated outstanding performance on a wide range of tasks such as question answering and code generation. On a high level, given an input, a language model can be used to automatically complete the sequence in a statistically-likely way. Based on this, users prompt these models with language instructions or examples, to implement a variety of downstream tasks. Advanced prompting methods can even imply interaction between the language model, a user, and external tools such as calculators. However, to obtain state-of-the-art performance or adapt language models for specific tasks, complex task- and model-specific programs have to be implemented, which may still require ad-hoc interaction. Based on this, we present the novel idea of Language Model Programming (LMP). LMP generalizes language model prompting from pure text prompts to an intuitive combination of text prompting and scripting. Additionally, LMP allows constraints to be specified over the language model output. This enables easy adaptation to many tasks, while abstracting language model internals and providing high-level semantics. To enable LMP, we implement LMQL (short for Language Model Query Language), which leverages the constraints and control flow from an LMP prompt to generate an efficient inference procedure that minimizes the number of expensive calls to the underlying language model. We show that LMQL can capture a wide range of state-of-the-art prompting methods in an intuitive way, especially facilitating interactive flows that are challenging to implement with existing high-level APIs. Our evaluation shows that we retain or increase the accuracy on several downstream tasks, while also significantly reducing the required amount of computation or cost in the case of pay-to-use APIs (13-85% cost savings).	翻訳日:2022-12-13 15:09:49 公開日:2022-12-12
# ビデオグラウンデッド対話における情報理論的テキスト幻覚低減 Information-Theoretic Text Hallucination Reduction for Video-grounded Dialogue ( http://arxiv.org/abs/2212.05765v1 ) ライセンス: Link先を確認	Sunjae Yoon, Eunseop Yoon, Hee Suk Yoon, Junyeong Kim, Chang D. Yoo	(参考訳) ビデオグラウンドド・ダイアログ(VGD)は、与えられたビデオと対話コンテキストに関する質問に対して、回答文をデコードすることを目的としている。最近のマルチモーダル推論による回答文生成の成功にもかかわらず、既存の対話システムは依然として、質問を理解せずに入力テキストを無差別にコピーしてしまうテキスト幻覚問題に苦しんでいる。これは、データセット内の回答文が通常入力テキストの単語を含むという事実からスプリアスな相関を学習するためであり、VGDシステムは、コピーした単語が正解テキストと重なることを期待して、入力テキストからの単語のコピーに過度に依存する。そこで我々は,提案する情報理論的テキスト幻覚測定手法から導出したテキスト幻覚正規化(THR)損失を組み込んだTHAM(Text Hallucination Mitigating)フレームワークを設計する。 THAMを現在の対話システムに適用すると、VGDベンチマーク(AVSD@DSTC7とAVSD@DSTC8)で有効性が検証され、解釈可能性も向上する。 Video-grounded Dialogue (VGD) aims to decode an answer sentence to a question regarding a given video and dialogue context. Despite the recent success of multi-modal reasoning to generate answer sentences, existing dialogue systems still suffer from a text hallucination problem, which denotes indiscriminate text-copying from input texts without an understanding of the question. This is due to learning spurious correlations from the fact that answer sentences in the dataset usually include the words of input texts, thus the VGD system excessively relies on copying words from input texts in the hope that those words overlap with ground-truth texts. Hence, we design the Text Hallucination Mitigating (THAM) framework, which incorporates Text Hallucination Regularization (THR) loss derived from the proposed information-theoretic text hallucination measurement approach. Applying THAM with current dialogue systems validates the effectiveness on VGD benchmarks (i.e., AVSD@DSTC7 and AVSD@DSTC8) and shows enhanced interpretability.	翻訳日:2022-12-13 15:08:47 公開日:2022-12-12
# インド語における記事要約のためのディープラーニング手法の実装 Implementing Deep Learning-Based Approaches for Article Summarization in Indian Languages ( http://arxiv.org/abs/2212.05702v1 ) ライセンス: Link先を確認	Rahul Tangsali, Aabha Pingle, Aditya Vyawahare, Isha Joshi, Raviraj Joshi	(参考訳) 低リソースのインドの言語に対するテキスト要約の研究は、関連するデータセットの利用可能性が限られているために制限されてきた。本稿では、ILSUM 2022のIndic language summarizationデータセットで使用したさまざまなディープラーニングアプローチの概要を示す。 ILSUM 2022データセットは、インド英語、ヒンディー語、グジャラート語で書かれたニュース記事と、それらの正解要約で構成されている。我々の研究では、様々な事前訓練されたseq2seqモデルを探索し、ILSUM 2022データセットでそれらを微調整する。我々の場合、英語では微調整したSoTAのPEGASUSモデル、ヒンディー語では拡張データを用いて微調整したIndicBARTモデル、グジャラート語では微調整したPEGASUSモデルと翻訳マッピングに基づくアプローチの組み合わせが最も良い結果を示した。得られた推論結果のスコアは、ROUGE-1, ROUGE-2, ROUGE-4を評価指標として評価した。 The research on text summarization for low-resource Indian languages has been limited due to the limited availability of relevant datasets. This paper presents a summary of various deep-learning approaches used for the ILSUM 2022 Indic language summarization datasets. The ILSUM 2022 dataset consists of news articles written in Indian English, Hindi, and Gujarati, respectively, and their ground-truth summarizations. In our work, we explore different pre-trained seq2seq models and fine-tune those with the ILSUM 2022 datasets. In our case, the fine-tuned SoTA PEGASUS model worked the best for English, the fine-tuned IndicBART model with augmented data for Hindi, and again fine-tuned PEGASUS model along with a translation mapping-based approach for Gujarati. Our scores on the obtained inferences were evaluated using ROUGE-1, ROUGE-2, and ROUGE-4 as the evaluation metrics.	翻訳日:2022-12-13 15:08:28 公開日:2022-12-12
# 未ラベルデータを用いた変圧器モデルによるドイツ語の顧客フィードバックの関連性・極性分類のドメイン適応 Domain Adaptation of Transformer-Based Models using Unlabeled Data for Relevance and Polarity Classification of German Customer Feedback ( http://arxiv.org/abs/2212.05764v1 ) ライセンス: Link先を確認	Ahmad Idrissi-Yaghir, Henning Schäfer, Nadja Bauer, Christoph M. Friedrich	(参考訳) 顧客からのフィードバックを理解することは、企業が問題を特定し、製品やサービスを改善するために必要なことになりつつある。テキスト分類と感情分析は、さまざまな機械学習アプローチとディープラーニングアプローチを用いて、これらのデータを分析する上で大きな役割を果たす。この作業では、ドイツ語の顧客フィードバックデータセットを扱う際に、さまざまなトランスフォーマーベースのモデルを使用して、これらのモデルがいかに効率的かを調べる。さらに、これらの事前学習モデルを未ラベルデータを用いて特定の領域に適応させることで、既製の事前学習モデルよりも優れた結果が得られるかどうかを分析する。モデルを評価するために、GermEval 2017の2つの下流タスクを検討する。実験の結果,トランスフォーマーベースのモデルは,fastTextベースラインに比べて大幅な改善を達成でき,公開されているスコアや先行モデルを上回ることが示された。サブタスクの関連性分類において、最良のモデルは、第1のテストセットで96.1 %、第2のテストセットで95.9 %のマイクロ平均$F1$スコアを達成し、サブタスクの極性分類ではそれぞれ85.1 %と85.3 %を達成した。 Understanding customer feedback is becoming a necessity for companies to identify problems and improve their products and services. Text classification and sentiment analysis can play a major role in analyzing this data by using a variety of machine and deep learning approaches. In this work, different transformer-based models are utilized to explore how efficient these models are when working with a German customer feedback dataset. In addition, these pre-trained models are further analyzed to determine if adapting them to a specific domain using unlabeled data can yield better results than off-the-shelf pre-trained models. To evaluate the models, two downstream tasks from the GermEval 2017 are considered. The experimental results show that transformer-based models can reach significant improvements compared to a fastText baseline and outperform the published scores and previous models. For the subtask Relevance Classification, the best models achieve a micro-averaged $F1$-Score of 96.1 % on the first test set and 95.9 % on the second one, and a score of 85.1 % and 85.3 % for the subtask Polarity Classification.	翻訳日:2022-12-13 15:08:10 公開日:2022-12-12
# 確率重み平均化による事前学習言語モデルの一般化の改善 Improving Generalization of Pre-trained Language Models via Stochastic Weight Averaging ( http://arxiv.org/abs/2212.05956v1 ) ライセンス: Link先を確認	Peng Lu, Ivan Kobyzev, Mehdi Rezagholizadeh, Ahmad Rashid, Ali Ghodsi, Philippe Langlais	(参考訳) 知識蒸留(KD)は、下流タスクにおけるコンパクトな事前学習言語モデル(PLM)の一般化を改善するための一般的な手法である。しかし、このような方法は、新しいデータセットごとに別の教師モデルをトレーニングする追加の負担を課す。あるいは、より優れた一般化に向けて、コンパクトモデルの最適化手順の改善に直接取り組むことができる。近年の研究では、局所的な最小値の平坦性はより良い一般化とよく相関している。本研究では,より平坦な最小値への収束を促す手法であるSWA(Stochastic Weight Averaging)を微調整PLMに適用する。我々は、様々なNLPタスク(テキスト分類、質問応答、生成)と異なるモデルアーキテクチャについて広範な実験を行い、追加の計算コストなしで一般化を改善することを示す。さらに, この単純な最適化手法は, コンパクトモデルに対する最先端KD法よりも優れていることを示す。 Knowledge Distillation (KD) is a commonly used technique for improving the generalization of compact Pre-trained Language Models (PLMs) on downstream tasks. However, such methods impose the additional burden of training a separate teacher model for every new dataset. Alternatively, one may directly work on the improvement of the optimization procedure of the compact model toward better generalization. Recent works observe that the flatness of the local minimum correlates well with better generalization. In this work, we adapt Stochastic Weight Averaging (SWA), a method encouraging convergence to a flatter minimum, to fine-tuning PLMs. We conduct extensive experiments on various NLP tasks (text classification, question answering, and generation) and different model architectures and demonstrate that our adaptation improves the generalization without extra computation cost. Moreover, we observe that this simple optimization technique is able to outperform the state-of-the-art KD methods for compact models.	翻訳日:2022-12-13 15:07:47 公開日:2022-12-12
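The SWA abstract above relies on averaging weights collected along the optimization trajectory. The following is a minimal sketch of that generic averaging step only, assuming each checkpoint is a plain dict of scalar parameters; the names `update_running_average`, `stochastic_weight_average` and `start` are hypothetical, and the authors' specific adaptation to fine-tuning PLMs is not described in the abstract and is not reproduced here.

```python
def update_running_average(avg_weights, new_weights, n_averaged):
    """Fold one more checkpoint into a running mean of `n_averaged` checkpoints."""
    return {
        name: (avg_weights[name] * n_averaged + new_weights[name]) / (n_averaged + 1)
        for name in new_weights
    }


def stochastic_weight_average(checkpoints, start=0):
    """Average the checkpoints from index `start` onward.

    `checkpoints` is a list of {parameter_name: value} dicts saved during
    fine-tuning (hypothetical layout); earlier checkpoints are skipped so
    that only the later part of the trajectory contributes.
    """
    selected = checkpoints[start:]
    if not selected:
        raise ValueError("no checkpoints to average")
    avg = dict(selected[0])
    for n_averaged, weights in enumerate(selected[1:], start=1):
        avg = update_running_average(avg, weights, n_averaged)
    return avg


if __name__ == "__main__":
    saved = [{"w": 0.0, "b": 10.0}, {"w": 1.0, "b": 2.0}, {"w": 3.0, "b": 4.0}]
    print(stochastic_weight_average(saved, start=1))
```

The running-mean form is used so that only one extra copy of the weights has to be kept, which is one way to read the "no extra computation cost" remark; with tensors instead of floats the same update applies element-wise.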
# Few-shotシナリオにおけるフェデレートNLP Federated NLP in Few-shot Scenarios ( http://arxiv.org/abs/2212.05974v1 ) ライセンス: Link先を確認	Dongqi Cai, Shangguang Wang, Yaozong Wu, Felix Xiaozhu Lin, Mengwei Xu	(参考訳) 自然言語処理(NLP)には豊富なモバイルアプリケーションがある。様々な言語理解タスクをサポートするため、基盤となるNLPモデルは、しばしばプライバシーを保護する連合学習(FL)の設定で微調整される。このプロセスは現在、モバイルクライアントから少なくとも数十万のラベル付きトレーニングサンプルに依存しているが、モバイルユーザは自分のデータをラベル付けする意思や知識を欠いていることが多い。このようなデータラベルの不足は、few-shotシナリオとして知られており、モバイルNLPアプリケーションの主要な障害となっている。この研究は、few-shotシナリオにおけるフェデレーションNLP(FedFSL)を初めて調査する。擬似ラベリングとプロンプト学習のアルゴリズム的進歩を取り入れることにより、トレーニングデータの0.05%(100未満)しかラベル付けされておらず、残りがラベルなしである場合にも競争力のある精度を提供する訓練パイプラインをまず構築する。このワークフローを具体化するために、新しい設計によって高い実行コストに対処するシステムFFNLPをさらに提案する。 (1)疑似ラベルを学習の進捗に見合った速度でトレーニングワークフローに注入するカリキュラムペーシング、(2)最も学習しやすいデータを選択し、そのデータに対してのみ疑似ラベルを生成する仕組みである表現多様性、(3)モデルのトレーニング深さと層容量のコプランニング。これらの設計により、トレーニング遅延、クライアントエネルギー、ネットワークトラフィックがそれぞれ最大46.0$\times$、41.2$\times$、3000.0$\times$削減される。 FFNLPはアルゴリズム/システムの共同設計を通じて、ほとんどのトレーニングサンプルがラベル付けされていない困難な設定にもFLを適用できることを示した。 Natural language processing (NLP) sees rich mobile applications. To support various language understanding tasks, a foundation NLP model is often fine-tuned in a federated, privacy-preserving setting (FL). This process currently relies on at least hundreds of thousands of labeled training samples from mobile clients; yet mobile users often lack willingness or knowledge to label their data. Such an inadequacy of data labels is known as a few-shot scenario; it becomes the key blocker for mobile NLP applications. For the first time, this work investigates federated NLP in the few-shot scenario (FedFSL). By retrofitting algorithmic advances of pseudo labeling and prompt learning, we first establish a training pipeline that delivers competitive accuracy when only 0.05% (fewer than 100) of the training data is labeled and the remaining is unlabeled. To instantiate the workflow, we further present a system FFNLP, addressing the high execution cost with novel designs: (1) curriculum pacing, which injects pseudo labels to the training workflow at a rate commensurate to the learning progress; (2) representational diversity, a mechanism for selecting the most learnable data, only for which pseudo labels will be generated; (3) co-planning of a model's training depth and layer capacity. Together, these designs reduce the training delay, client energy, and network traffic by up to 46.0$\times$, 41.2$\times$ and 3000.0$\times$, respectively. Through algorithm/system co-design, FFNLP demonstrates that FL can apply to challenging settings where most training samples are unlabeled.	翻訳日:2022-12-13 15:07:31 公開日:2022-12-12
# 空間的関係を学習する畳み込みニューラルネットワークの能力を明らかにする新しい特徴スクランブルアプローチ A novel feature-scrambling approach reveals the capacity of convolutional neural networks to learn spatial relations ( http://arxiv.org/abs/2212.06021v1 ) ライセンス: Link先を確認	Amr Farahat, Felix Effenberger, Martin Vinck	(参考訳) 畳み込みニューラルネットワーク(cnns)は、オブジェクト認識を解く最も成功したコンピュータビジョンシステムの一つである。さらに、CNNは人間の脳における視覚的表現の性質を理解するために大きな応用がある。しかし、CNNが実際にどのように決断を下すのか、内部表現の性質や認識戦略が人間とどのように異なるのかは、まだ理解されていない。具体的には、cnnが主に物体の表面の規則性に依存しているのか、それとも人間に似た特徴の空間的配置を活用できるのかという議論がある。本稿では,cnnがオブジェクトの分類に特徴の空間的配置(すなわちオブジェクト部分)を使用するかどうかを明示的に検証する新しい特徴スクランブル手法を開発した。我々は,この手法を,CNNの有効受容フィールドサイズを体系的に操作すると同時に,最小認識可能な構成(MIRC)解析と組み合わせる。従来の文献とは対照的に,CNNが比較的長距離空間関係をオブジェクト分類に利用できることを示す。さらに、cnnが空間的関係を使用する範囲は、テクスチャやスケッチといったデータセットに大きく依存する。実際、CNNは異種データセット(ImageNet)内の異なるクラスに対して異なる戦略を使用しており、CNNは連続的な分類戦略を持っていることを示唆している。最後に,cnnは粒度の中間レベルまでのみ特徴の空間的配置を学習できることを示し,大域的な形状特徴よりも中間的な特徴が物体分類における感度と特異性の最適なトレードオフをもたらすことを示唆する。これらの結果は、cnn表現の性質と、それらがオブジェクト分類の特徴の空間的配置に依存する範囲に関する新しい洞察を与える。 Convolutional neural networks (CNNs) are one of the most successful computer vision systems to solve object recognition. Furthermore, CNNs have major applications in understanding the nature of visual representations in the human brain. Yet it remains poorly understood how CNNs actually make their decisions, what the nature of their internal representations is, and how their recognition strategies differ from humans. Specifically, there is a major debate about the question of whether CNNs primarily rely on surface regularities of objects, or whether they are capable of exploiting the spatial arrangement of features, similar to humans. Here, we develop a novel feature-scrambling approach to explicitly test whether CNNs use the spatial arrangement of features (i.e. object parts) to classify objects. We combine this approach with a systematic manipulation of effective receptive field sizes of CNNs as well as minimal recognizable configurations (MIRCs) analysis. 
In contrast to much previous literature, we provide evidence that CNNs are in fact capable of using relatively long-range spatial relationships for object classification. Moreover, the extent to which CNNs use spatial relationships depends heavily on the dataset, e.g. texture vs. sketch. In fact, CNNs even use different strategies for different classes within heterogeneous datasets (ImageNet), suggesting CNNs have a continuous spectrum of classification strategies. Finally, we show that CNNs learn the spatial arrangement of features only up to an intermediate level of granularity, which suggests that intermediate rather than global shape features provide the optimal trade-off between sensitivity and specificity in object classification. These results provide novel insights into the nature of CNN representations and the extent to which they rely on the spatial arrangement of features for object classification.	翻訳日:2022-12-13 15:01:02 公開日:2022-12-12
# ステアブルCNNのための暗黙的ニューラル畳み込みカーネル Implicit Neural Convolutional Kernels for Steerable CNNs ( http://arxiv.org/abs/2212.06096v1 ) ライセンス: Link先を確認	Maksim Zhdanov, Nico Hoffmann and Gabriele Cesa	(参考訳) ステアブル畳み込みニューラルネットワーク(Steerable CNN)は、並進、および反射や回転など原点を保存する群$G$に属するその他の変換に対して同変なニューラルネットワークを構築するための一般的なフレームワークを提供する。それらは、カーネル空間に課される群固有の同変性制約を解析的に解いて得られる、$G$-steerableカーネルによる標準的な畳み込みに依存する。解が特定の群$G$に合わせて作られるため、カーネル基底の実装は他の対称性変換には一般化されず、群同変モデルの開発を複雑にする。本稿では,多層パーセプトロン(MLP)による暗黙的ニューラル表現を用いて,$G$-steerableカーネルをパラメータ化することを提案する。結果として得られるフレームワークは、Steerable CNNをシンプルかつ柔軟に実装する方法を提供し、$G$同変MLPを構築できる任意の群$G$に一般化する。本手法を点群データ(ModelNet-40)と分子データ(QM9)に適用し, 標準的なSteerable CNNと比較して性能が著しく向上することを示す。 Steerable convolutional neural networks (CNNs) provide a general framework for building neural networks equivariant to translations and other transformations belonging to an origin-preserving group $G$, such as reflections and rotations. They rely on standard convolutions with $G$-steerable kernels obtained by analytically solving the group-specific equivariance constraint imposed onto the kernel space. As the solution is tailored to a particular group $G$, the implementation of a kernel basis does not generalize to other symmetry transformations, which complicates the development of group equivariant models. We propose using implicit neural representation via multi-layer perceptrons (MLPs) to parameterize $G$-steerable kernels. The resulting framework offers a simple and flexible way to implement Steerable CNNs and generalizes to any group $G$ for which a $G$-equivariant MLP can be built. We apply our method to point cloud (ModelNet-40) and molecular data (QM9) and demonstrate a significant improvement in performance compared to standard Steerable CNNs.	翻訳日:2022-12-13 15:00:34 公開日:2022-12-12
# 質問応答のためのモーメントコントラスト事前学習 Momentum Contrastive Pre-training for Question Answering ( http://arxiv.org/abs/2212.05762v1 ) ライセンス: Link先を確認	Minda Hu, Muzhi Li, Yasheng Wang and Irwin King	(参考訳) 既存の抽出質問回答(QA)の事前学習手法は、構文構造において自然質問とは異なるクローゼのようなクエリを生成する。そこで本研究では,抽出QAのための新しいMomentum Contrastive pRe-training fOr queStion anSwering(MCROSS)法を提案する。具体的には、MCROSSはモーメントコントラスト学習フレームワークを導入し、クローゼのような解答確率と自然な問合せのサンプルペアを一致させる。したがって、事前訓練されたモデルは、クローゼのようなサンプルで学んだ知識を自然の疑問に答えることができる。 3つのベンチマークQAデータセットによる実験結果から,本手法は教師付きシナリオとゼロショットシナリオの両方のベースラインと比較して顕著な改善が得られた。 Existing pre-training methods for extractive Question Answering (QA) generate cloze-like queries different from natural questions in syntax structure, which could overfit pre-trained models to simple keyword matching. In order to address this problem, we propose a novel Momentum Contrastive pRe-training fOr queStion anSwering (MCROSS) method for extractive QA. Specifically, MCROSS introduces a momentum contrastive learning framework to align the answer probability between cloze-like and natural query-passage sample pairs. Hence, the pre-trained models can better transfer the knowledge learned in cloze-like samples to answering natural questions. Experimental results on three benchmarking QA datasets show that our method achieves noticeable improvement compared with all baselines in both supervised and zero-shot scenarios.	翻訳日:2022-12-13 14:58:25 公開日:2022-12-12
# ResNetのソリューション構築による解釈について On an Interpretation of ResNets via Solution Constructions ( http://arxiv.org/abs/2212.05663v1 ) ライセンス: Link先を確認	Changcun Huang	(参考訳) 本稿では,resnetアーキテクチャの一般的な解釈が与えられ,性能メカニズムが説明できる,ゲートネットワーク制御と深層分類の原理による,マルチカテゴリ分類のためのresnetの典型的な解法について述べる。その解釈の一般性をさらに実証するために、さらに多くの解を用いる。 ResNetsの普遍近似能力が証明された。 This paper first constructs a typical solution of ResNets for multi-category classifications by the principle of gate-network controls and deep-layer classifications, from which a general interpretation of the ResNet architecture is given and the performance mechanism is explained. We then use more solutions to further demonstrate the generality of that interpretation. The universal-approximation capability of ResNets is proved.	翻訳日:2022-12-13 14:51:00 公開日:2022-12-12
# 次項目推薦のためのハンケル行列表現によるテンソル型逐次学習 Tensor-based Sequential Learning via Hankel Matrix Representation for Next Item Recommendations ( http://arxiv.org/abs/2212.05720v1 ) ライセンス: Link先を確認	Evgeny Frolov and Ivan Oseledets	(参考訳) 自己注意型トランスフォーマーモデルは、最近、次の項目推奨タスクを非常に効率的に解くことが示されている。学習された注意重みは、ユーザの行動のシーケンシャルなダイナミクスを捉え、うまく一般化する。学習パラメータ空間の特別な構造に動機付けられ、それに代わるより軽量なアプローチでそれを模倣できるかどうかを疑問視する。学習プロセス内のシーケンシャルデータに関する構造的知識を生かしたテンソル分解に基づく新しいモデルを開発する。我々は,特別なハンケル行列表現に基づいて,自己アテンションネットワークの特性をどのように再現できるかを示す。結果として得られるモデルは、浅い線形アーキテクチャを持ち、そのニューラルアーキテクチャと比較する。 Self-attentive transformer models have recently been shown to solve the next item recommendation task very efficiently. The learned attention weights capture sequential dynamics in user behavior and generalize well. Motivated by the special structure of learned parameter space, we question if it is possible to mimic it with an alternative and more lightweight approach. We develop a new tensor factorization-based model that ingrains the structural knowledge about sequential data within the learning process. We demonstrate how certain properties of a self-attention network can be reproduced with our approach based on special Hankel matrix representation. The resulting model has a shallow linear architecture and compares competitively to its neural counterpart.	翻訳日:2022-12-13 14:50:53 公開日:2022-12-12
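The abstract above builds on a Hankel matrix representation of sequential data. As background only, here is a minimal sketch of how a generic Hankel matrix is formed from one interaction sequence (entry `H[i][j] = sequence[i + j]`); the function name `hankel_matrix` and the `window` parameter are hypothetical, and the "special" representation and tensor factorization used by the authors are not specified in the abstract.

```python
def hankel_matrix(sequence, window):
    """Build a Hankel matrix whose rows are sliding windows of `sequence`.

    Entry [i][j] equals sequence[i + j], so every anti-diagonal is constant.
    """
    n_rows = len(sequence) - window + 1
    if window < 1 or n_rows < 1:
        raise ValueError("window must be between 1 and len(sequence)")
    return [list(sequence[i:i + window]) for i in range(n_rows)]


if __name__ == "__main__":
    # A toy sequence of item ids consumed by one user, oldest first.
    items = [11, 42, 7, 19, 3]
    for row in hankel_matrix(items, window=3):
        print(row)
```

Each row is a shifted view of the same history, which is one way such a layout can expose positional structure to a factorization model without attention weights.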
# REAP: 大規模で現実的な敵対的パッチベンチマーク REAP: A Large-Scale Realistic Adversarial Patch Benchmark ( http://arxiv.org/abs/2212.05680v1 ) ライセンス: Link先を確認	Nabeel Hingun, Chawin Sitawarin, Jerry Li, David Wagner	(参考訳) 機械学習モデルは敵対的摂動に影響を受けやすいことが知られている。有名な攻撃のひとつがadversarial patch(敵対的パッチ)で、特別に作られたパターンを持つステッカーであり、貼り付けられたオブジェクトをモデルに誤って予測させる。この攻撃は、自動運転車のようなカメラに依存するサイバー物理システムに重大な脅威をもたらす。この問題の重要性にもかかわらず、この設定での研究は困難であった。現実世界での攻撃や防御の評価は非常にコストがかかる一方、合成データは非現実的だからである。本研究では,実際の画像に対して、実環境の条件下でパッチ攻撃を評価できるデジタルベンチマークであるREAP(REalistic Adversarial Patch)ベンチマークを提案する。 Mapillary Vistasデータセット上に構築されたベンチマークには、14,000以上の交通標識が含まれている。それぞれの標識には幾何変換と照明変換のペアが付与されており、これを用いてデジタル的に生成されたパッチを標識に現実的に適用できる。本ベンチマークを用いて,現実的な条件下での敵対的パッチ攻撃の初の大規模評価を行った。実験の結果, 敵対的パッチ攻撃は従来考えられていたよりも脅威が小さい可能性があり, より単純なデジタルシミュレーションでの攻撃成功率は実際の有効性を予測しないことが示唆された。私たちはベンチマークをhttps://github.com/wagner-group/reap-benchmarkで公開しています。 Machine learning models are known to be susceptible to adversarial perturbation. One famous attack is the adversarial patch, a sticker with a particularly crafted pattern that makes the model incorrectly predict the object it is placed on. This attack presents a critical threat to cyber-physical systems that rely on cameras such as autonomous cars. Despite the significance of the problem, conducting research in this setting has been difficult; evaluating attacks and defenses in the real world is exceptionally costly while synthetic data are unrealistic. In this work, we propose the REAP (REalistic Adversarial Patch) benchmark, a digital benchmark that allows the user to evaluate patch attacks on real images, and under real-world conditions. Built on top of the Mapillary Vistas dataset, our benchmark contains over 14,000 traffic signs. Each sign is augmented with a pair of geometric and lighting transformations, which can be used to apply a digitally generated patch realistically onto the sign. Using our benchmark, we perform the first large-scale assessments of adversarial patch attacks under realistic conditions. Our experiments suggest that adversarial patch attacks may present a smaller threat than previously believed and that the success rate of an attack on simpler digital simulations is not predictive of its actual effectiveness in practice. We release our benchmark publicly at https://github.com/wagner-group/reap-benchmark.	翻訳日:2022-12-13 14:49:58 公開日:2022-12-12
# DeepCut: グラフニューラルネットワーククラスタリングによる教師なしセグメンテーション DeepCut: Unsupervised Segmentation using Graph Neural Networks Clustering ( http://arxiv.org/abs/2212.05853v1 ) ライセンス: Link先を確認	Amit Aflalo, Shai Bagon, Tamar Kashti, Yonina Eldar	(参考訳) 画像分割はコンピュータビジョンの基本課題である。教師ありメソッドをトレーニングするためのデータアノテーションは労働集約的であり、教師なしメソッドの動機となっている。既存のアプローチの一部は、事前訓練されたネットワークから深い特徴を抽出し、グラフを構築して古典的なクラスタリング手法(例えば、$k$-meansや正規化カット)を後処理の段階として適用する。これらの手法は特徴量に符号化された高次元情報をペアワイズのスカラー親和性に還元する。本研究では、従来のクラスタリングアルゴリズムを、同じクラスタリング目的関数を達成するように訓練された軽量グラフニューラルネットワーク(GNN)に置き換える。ただし、既存のアプローチとは対照的に、GNNにはローカルな画像特徴間のペアワイズ親和性だけでなく、生の特徴自体も与える。生の特徴とクラスタリング目標の間のこの接続を維持することで、追加の後処理ステップを必要とせずに、部分的なセマンティクスセグメンテーションを暗黙的に実行することができる。画像セグメンテーションGNNを学習するための自己教師付き損失関数として,古典的クラスタリングの目的を定式化する方法を示す。さらに、相関クラスタリング(CC)の目的を使ってクラスタ数を定義せずにクラスタリングを行う($k$-lessクラスタリング)。提案手法をオブジェクトのローカライゼーション,セグメンテーション,セマンティクス部分セグメンテーションのタスクに適用し,複数のベンチマークにおいて最先端の性能を上回る。 Image segmentation is a fundamental task in computer vision. Data annotation for training supervised methods can be labor-intensive, motivating unsupervised methods. Some existing approaches extract deep features from pre-trained networks and build a graph to apply classical clustering methods (e.g., $k$-means and normalized-cuts) as a post-processing stage. These techniques reduce the high-dimensional information encoded in the features to pair-wise scalar affinities. In this work, we replace classical clustering algorithms with a lightweight Graph Neural Network (GNN) trained to achieve the same clustering objective function. However, in contrast to existing approaches, we feed the GNN not only the pair-wise affinities between local image features but also the raw features themselves. Maintaining this connection between the raw feature and the clustering goal allows part semantic segmentation to be performed implicitly, without requiring additional post-processing steps. We demonstrate how classical clustering objectives can be formulated as self-supervised loss functions for training our image segmentation GNN. Additionally, we use the Correlation-Clustering (CC) objective to perform clustering without defining the number of clusters ($k$-less clustering). We apply the proposed method for object localization, segmentation, and semantic part segmentation tasks, surpassing state-of-the-art performance on multiple benchmarks.	翻訳日:2022-12-13 14:49:37 公開日:2022-12-12
# 安定なアーティスト:拡散ラテント・スペースでセマンティックを操る The Stable Artist: Steering Semantics in Diffusion Latent Space ( http://arxiv.org/abs/2212.06013v1 ) ライセンス: Link先を確認	Manuel Brack, Patrick Schramowski, Felix Friedrich, Dominik Hintersdorf, Kristian Kersting	(参考訳) テキストコンディショニングによる大規模生成拡散モデルは最近、テキストのみから高精細な画像を生成するという素晴らしい性能で多くの注目を集めている。しかし、高品質な結果を得ることはほとんど不可能である。それに対して、テキスト誘導画像生成では、ユーザは、想定された画像を反復的に彫るために、入力にわずかな変更を多く行う。しかし、入力プロンプトのわずかな変更は、しばしば全く異なる画像が生成されることにつながるため、アーティストの制御はその粒度に制限される。フレキシビリティを実現するため,画像生成プロセスのきめ細かい制御が可能な画像編集手法であるStable Artistを提案する。主要なコンポーネントはセマンティックガイダンス(SEGA)であり、セマンティックな方向の変数数に沿って拡散過程を制御している。これにより、画像の微妙な編集、構成やスタイルの変化、芸術的概念全体の最適化が可能になる。さらに、SEGAは潜在空間を探索することで、モデルによって学習された概念、例えば「炭素放出」のような複雑な概念の表現についての洞察を得ることができる。いくつかのタスクで安定したアーティストを示し、高品質な画像編集と構成を示す。 Large, text-conditioned generative diffusion models have recently gained a lot of attention for their impressive performance in generating high-fidelity images from text alone. However, achieving high-quality results is almost unfeasible in a one-shot fashion. On the contrary, text-guided image generation involves the user making many slight changes to inputs in order to iteratively carve out the envisioned image. However, slight changes to the input prompt often lead to entirely different images being generated, and thus the control of the artist is limited in its granularity. To provide flexibility, we present the Stable Artist, an image editing approach enabling fine-grained control of the image generation process. The main component is semantic guidance (SEGA) which steers the diffusion process along variable numbers of semantic directions. This allows for subtle edits to images, changes in composition and style, as well as optimization of the overall artistic conception. Furthermore, SEGA enables probing of latent spaces to gain insights into the representation of concepts learned by the model, even complex ones such as 'carbon emission'. 
We demonstrate the Stable Artist on several tasks, showcasing high-quality image editing and composition.	翻訳日:2022-12-13 14:49:13 公開日:2022-12-12
# ORCa:放射界カメラとしての光沢のある物体 ORCa: Glossy Objects as Radiance Field Cameras ( http://arxiv.org/abs/2212.04531v2 ) ライセンス: Link先を確認	Kushagra Tiwary, Akshat Dave, Nikhil Behari, Tzofi Klinghoffer, Ashok Veeraraghavan, Ramesh Raskar	(参考訳) 光沢のある物体の反射は、周囲の環境に関する貴重な情報と隠れた情報を含んでいる。これらの物体をカメラに変換することで、カメラの視野外の画像化や、人間の目に映る反射のような一見不可能な視界から、エキサイティングな応用を解き放つことができる。しかし, 反射は物体形状, 材料特性, 3次元環境, 観測者の観察方向などと密接に依存するため, この課題は困難である。本手法は,未知の幾何学を持つ光沢のある物体を放射場カメラに変換し,物体の視点から世界を撮影する。私たちの重要な洞察は、オブジェクトの表面を、オブジェクトが見える5d環境放射フィールドの2d投影としてキャストされた反射をキャプチャする仮想センサーに変換することです。本研究では, 環境放射界の復元により, 被写体から周囲への深度と放射率の推定が可能であり, また, 現場に存在する光沢のある物体にのみ直接視認できる新規なビューのレンダリングも可能であり, 観察者ではないことを示す。さらに、放射場を用いて、シーン内の近接物体によって引き起こされる閉塞体の周囲を画像化することができる。本手法はオブジェクトの多視点画像に基づいてエンドツーエンドに学習し,オブジェクト形状,拡散放射率,および5次元環境放射率場を共同で推定する。 Reflections on glossy objects contain valuable and hidden information about the surrounding environment. By converting these objects into cameras, we can unlock exciting applications, including imaging beyond the camera's field-of-view and from seemingly impossible vantage points, e.g. from reflections on the human eye. However, this task is challenging because reflections depend jointly on object geometry, material properties, the 3D environment, and the observer viewing direction. Our approach converts glossy objects with unknown geometry into radiance-field cameras to image the world from the object's perspective. Our key insight is to convert the object surface into a virtual sensor that captures cast reflections as a 2D projection of the 5D environment radiance field visible to the object. We show that recovering the environment radiance fields enables depth and radiance estimation from the object to its surroundings in addition to beyond field-of-view novel-view synthesis, i.e. rendering of novel views that are only directly-visible to the glossy object present in the scene, but not the observer. Moreover, using the radiance field we can image around occluders caused by close-by objects in the scene. 
Our method is trained end-to-end on multi-view images of the object and jointly estimates object geometry, diffuse radiance, and the 5D environment radiance field.	翻訳日:2022-12-13 12:40:32 公開日:2022-12-12
# コンテンツモデレーションと映画コンテンツ評価のための深層アーキテクチャ Deep Architectures for Content Moderation and Movie Content Rating ( http://arxiv.org/abs/2212.04533v2 ) ライセンス: Link先を確認	Fatih Cagatay Akyon, Alptekin Temizel	(参考訳) コンテンツに基づくビデオの評価は、ビデオ年齢カテゴリーを分類するための重要なステップである。映画コンテンツレーティングとテレビ番組レーティングは、専門家委員会が設立した2つの最も一般的なレーティングシステムである。しかし、委員会によるシーン・フィルムコンテンツの手作業によるレビュー・評価は面倒な作業であり、オンラインビデオコンテンツの増大がますます困難になっている。そのため、コンピュータビジョンに基づく映像コンテンツ分析技術を用いて評価プロセスを自動化することが望ましい。本稿では,アクション認識,マルチモーダル学習,映画ジャンル分類,コンテンツモデレーションと映画コンテンツ評価の文脈におけるセンシティブなコンテンツ検出について要約する。プロジェクトページはhttps://github.com/fcakyon/content-moderation-deep-learningにある。 Rating a video based on its content is an important step for classifying video age categories. Movie content rating and TV show rating are the two most common rating systems established by professional committees. However, manually reviewing and evaluating scene/film content by a committee is a tedious work and it becomes increasingly difficult with the ever-growing amount of online video content. As such, a desirable solution is to use computer vision based video content analysis techniques to automate the evaluation process. In this paper, related works are summarized for action recognition, multi-modal learning, movie genre classification, and sensitive content detection in the context of content moderation and movie content rating. The project page is available at https://github.com/fcakyon/content-moderation-deep-learning.	翻訳日:2022-12-13 12:40:07 公開日:2022-12-12
# 顔生成における一対多対応の記憶 Memories are One-to-Many Mapping Alleviators in Talking Face Generation ( http://arxiv.org/abs/2212.05005v2 ) ライセンス: Link先を確認	Anni Tang, Tianyu He, Xu Tan, Jun Ling, Runnan Li, Sheng Zhao, Li Song, Jiang Bian	(参考訳) 対話顔生成は、入力音声によって駆動される対象者の写実的映像像を生成することを目的としている。入力音声から出力映像への1対1マッピング(例えば、1つの音声コンテンツが複数の可視性を持つ)の性質から、以前の作品のように決定論的なマッピングを学ぶことはトレーニングのあいまいさをもたらし、その結果は劣る。この1対多マッピングは、部分的には2段階のフレームワーク(すなわち、音声対表現モデルとニューラルレンダリングモデル)によって緩和されるが、十分な情報(感情、しわなど)が得られないので、まだ不十分である。本稿では,不足している情報を暗黙記憶で補完するmemfaceと,それぞれ2段階の感覚に従う明示記憶を提案する。より具体的には、暗黙記憶は、音声表現共有空間における高レベルセマンティクスを捉えるのに、暗黙記憶は、ピクセルレベルの詳細を合成するために、ニューラルレンダリングモデルで使用される。実験の結果,提案するmemfaceは,複数のシナリオにまたがる最先端の成果を一貫して,かつ著しく上回ることがわかった。 Talking face generation aims at generating photo-realistic video portraits of a target person driven by input audio. Due to its nature of one-to-many mapping from the input audio to the output video (e.g., one speech content may have multiple feasible visual appearances), learning a deterministic mapping like previous works brings ambiguity during training, and thus causes inferior visual results. Although this one-to-many mapping could be alleviated in part by a two-stage framework (i.e., an audio-to-expression model followed by a neural-rendering model), it is still insufficient since the prediction is produced without enough information (e.g., emotions, wrinkles, etc.). In this paper, we propose MemFace to complement the missing information with an implicit memory and an explicit memory that follow the sense of the two stages respectively. More specifically, the implicit memory is employed in the audio-to-expression model to capture high-level semantics in the audio-expression shared space, while the explicit memory is employed in the neural-rendering model to help synthesize pixel-level details. Our experimental results show that our proposed MemFace surpasses all the state-of-the-art results across multiple scenarios consistently and significantly.	翻訳日:2022-12-13 12:39:54 公開日:2022-12-12
# 機械学習フレームワーク:医療施設における競争的知性とキードライバーの市場シェア傾向の同定 Machine Learning Framework: Competitive Intelligence and Key Drivers Identification of Market Share Trends Among Healthcare Facilities ( http://arxiv.org/abs/2212.04810v2 ) ライセンス: Link先を確認	Anudeep Appe, Bhanu Poluparthi, Lakshmi Kasivajjula, Udai Mv, Sobha Bagadi, Punya Modi, Aditya Singh, Hemanth Gunupudi	(参考訳) 医療戦略策定におけるデータ駆動決定の必要性は急速に増加している。医療提供者施設や病院(ここからは施設と呼ぶ)に影響を与える要因を特定するための信頼性の高いフレームワークが重要視されている。このパイロット研究の目的は、ストラテジストが医療サービスの品質向上に影響を及ぼす施設の市場シェアを改善するために重要な決定を策定することを支援する、データ駆動機械学習(data driven machine learning) - 回帰フレームワークの開発である。米国(米国)のヘルスケアビジネスが研究対象に選ばれ、ワシントン州の主要施設60施設にまたがるデータと、約3年間の歴史的データについて検討されている。現在の分析において、市場シェアは、潜在的な競争相手の施設群間の合計の出会いに対する施設の出会いの割合として表される。本研究は,市場シェアを評価・予測するための,競争相手識別と回帰アプローチの2段階的アプローチを提案する。マーケットシェアに影響を与える機能の相対的重要性を定量化するために、モデル非依存技術であるSHAPを利用する。提案手法は,既存分析における競合相手のプールを同定し,DAG(Directed Acyclic Graphs)と特徴レベルのワードベクトルを開発し,施設レベルで重要な連結成分を評価する。この技術は、経験的手法のバイアスを最小限に抑えるデータ駆動によって堅牢である。施設間の競争相手を特定したポストは、市場シェアを予測するための回帰モデルを開発した。施設レベルでの特徴の相対的定量化のために、SHAPをモデル非依存の説明器として組み込んだ。これは、市場シェアに影響を与える各施設の属性を特定しランク付けするのに役立った。 The need for data-driven decisions in healthcare strategy formulation is rapidly increasing. A reliable framework that helps identify the factors affecting the Market Share of a Healthcare Provider Facility or Hospital (hereafter, Facility) is of key importance. This pilot study aims to develop a data-driven machine learning regression framework that helps strategists make key decisions to improve a Facility's Market Share, which in turn improves the quality of healthcare services. The US healthcare business is chosen for the study, which considers data from 60 key Facilities in Washington State covering about 3 years of history. In the current analysis, Market Share is defined as the ratio of a facility's encounters to the total encounters among the group of potential competitor facilities.
The current study proposes a novel two-pronged approach: competitor identification, followed by regression to evaluate and predict market share. A model-agnostic technique, SHAP, is used to quantify the relative importance of the features that affect market share. To identify the pool of competitors, the proposed method builds Directed Acyclic Graphs (DAGs) and feature-level word vectors, and evaluates the key connected components at the facility level. This technique is robust because it is data-driven, which minimizes the bias of empirical techniques. After identifying the set of competitors among facilities, a regression model is developed to predict market share. To quantify the relative contribution of features at the facility level, SHAP is incorporated as a model-agnostic explainer. This helps identify and rank the attributes at each facility that affect its market share.	翻訳日:2022-12-13 12:39:15 公開日:2022-12-12
# ソーシャルレコメンデータシステムのためのグラフニューラルネットワークに関する調査 A Survey of Graph Neural Networks for Social Recommender Systems ( http://arxiv.org/abs/2212.04481v2 ) ライセンス: Link先を確認	Kartik Sharma and Yeon-Chang Lee and Sivagami Nambi and Aditya Salian and Shlok Shah and Sang-Wook Kim and Srijan Kumar	(参考訳) ソーシャルリコメンデーションシステム(social recommender systems, social recommender)は,アイテムレコメンデーションを生成するタスクとして,ユーザ間インタラクションとユーザ間ソーシャルリレーションを同時に活用する。さらに、社会関係の活用は、同性や社会的影響によるユーザの嗜好を理解する上で、明らかに有効である。そのため、SocialRSはますます注目を集めている。特に、グラフニューラルネットワーク(GNN)の進歩により、近年多くのGNNベースのSocialRS手法が開発されている。そこで我々はGNNベースのSocialRSに関する文献を包括的かつ体系的にレビューする。本調査では,PRISMAフレームワークに従って2151の論文を注釈付けし,まずGNNベースのSocialRSに関する80の論文を同定した。 1)入力分類学は入力型表記の5つのグループと入力型表記の7つのグループを含み、(2)アーキテクチャ分類学はGNNエンコーダの8つのグループとデコーダの2つのグループと損失関数表記の12つのグループを含む。我々は,GNNに基づくSocialRS手法を分類学のいくつかのカテゴリに分類し,その詳細を説明する。さらに、GNNベースのSocialRS手法を評価するために広く使われているベンチマークデータセットとメトリクスを要約する。最後に,今後の研究の方向性を示すことで,この調査を結論づける。 Social recommender systems (SocialRS) simultaneously leverage user-to-item interactions as well as user-to-user social relations for the task of generating item recommendations to users. Additionally exploiting social relations is clearly effective in understanding users' tastes due to the effects of homophily and social influence. For this reason, SocialRS has increasingly attracted attention. In particular, with the advance of Graph Neural Networks (GNN), many GNN-based SocialRS methods have been developed recently. Therefore, we conduct a comprehensive and systematic review of the literature on GNN-based SocialRS. In this survey, we first identify 80 papers on GNN-based SocialRS after annotating 2151 papers by following the PRISMA framework (Preferred Reporting Items for Systematic Reviews and Meta-Analysis). 
Then, we comprehensively review them in terms of their inputs and architectures to propose a novel taxonomy: (1) input taxonomy includes 5 groups of input type notations and 7 groups of input representation notations; (2) architecture taxonomy includes 8 groups of GNN encoder, 2 groups of decoder, and 12 groups of loss function notations. We classify the GNN-based SocialRS methods into several categories as per the taxonomy and describe their details. Furthermore, we summarize the benchmark datasets and metrics widely used to evaluate the GNN-based SocialRS methods. Finally, we conclude this survey by presenting some future research directions.	翻訳日:2022-12-13 12:38:51 公開日:2022-12-12
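
The healthcare market-share entry above defines Market Share as a facility's encounters divided by the total encounters across its pool of potential competitors. A minimal sketch of that ratio follows. The function name `market_share`, the facility names, and the encounter counts are hypothetical, and the sketch assumes the competitor pool includes the facility itself.

```python
def market_share(encounters, facility):
    """Share of `facility` among all encounters in its competitor pool.

    `encounters` maps facility name -> encounter count for the pool
    (assumed here to include the facility itself).
    """
    total = sum(encounters.values())
    if total == 0:
        raise ValueError("competitor pool has no encounters")
    return encounters[facility] / total


# Hypothetical pool of three competing facilities (made-up counts).
pool = {"facility_a": 300, "facility_b": 500, "facility_c": 200}
share_a = market_share(pool, "facility_a")
print(f"{share_a:.2f}")
```

Because every share is computed against the same pool total, the shares of all facilities in a pool sum to 1, which is a quick sanity check on any competitor grouping.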

# 位相・エネルギー・振幅推定のための高速コヒーレント量子アルゴリズム

Faster Coherent Quantum Algorithms for Phase, Energy, and Amplitude Estimation ( http://arxiv.org/abs/2103.09717v4 )

ライセンス: Link先を確認

Patrick Rall

(参考訳) 本稿では、次の条件下での位相推定を考える。入力状態のコピーは1つだけ与えられ、入力状態はユニタリの固有状態である必要はなく、状態を測定してはならない。ほとんどの量子推定アルゴリズムは、この「コヒーレント」な設定に適さない仮定を置いており、残るのは教科書的なアプローチのみである。本稿では、教科書的な手法よりも概念的にも計算的にも単純な、位相、エネルギー、振幅推定のための新しいアルゴリズムを提案する。これらはクエリ複雑性とアンシラのフットプリントの両方が小さい。量子フーリエ変換は不要であり、複数の推定値の中央値を計算するための量子ソートネットワークも必要としない。その代わりに、ブロックエンコーディング技術を用いて推定値を1ビットずつ計算し、すべての増幅を特異値変換によって行う。これらの改良されたサブルーチンは、量子メトロポリスサンプリングと量子ベイズ推論の性能を加速する。

We consider performing phase estimation under the following conditions: we are given only one copy of the input state, the input state does not have to be an eigenstate of the unitary, and the state must not be measured. Most quantum estimation algorithms make assumptions that make them unsuitable for this 'coherent' setting, leaving only the textbook approach. We present novel algorithms for phase, energy, and amplitude estimation that are both conceptually and computationally simpler than the textbook method, featuring both a smaller query complexity and ancilla footprint. They do not require a quantum Fourier transform, and they do not require a quantum sorting network to compute the median of several estimates. Instead, they use block-encoding techniques to compute the estimate one bit at a time, performing all amplification via singular value transformation. These improved subroutines accelerate the performance of quantum Metropolis sampling and quantum Bayesian inference.

翻訳日:2023-04-07 21:09:28 公開日:2022-12-12

# $(1+1)$d量子リンクシュウィンガーモデルの連続限界に向けて

Towards the continuum limit of a $(1+1)$d quantum link Schwinger model ( http://arxiv.org/abs/2104.00025v2 )

ライセンス: Link先を確認

Torsten V. Zache, Maarten Van Damme, Jad C. Halimeh, Philipp Hauke, Debasish Banerjee

(参考訳) ゲージ理論の解は、量子技術の最も有望な応用の1つである。ここでは、量子リンクモデルとして知られる、量子スピン-$S$作用素の有限次元ヒルベルト空間を介して正則化された$U(1)$ゲージ理論の連続極限へのアプローチについて議論する。1つの空間次元における量子電磁力学(QED)に対して、基底状態エネルギー、スカラー中間子質量、ベクトル中間子質量を大きなスピン長$S$、大体積$N$、ゼロに近づく格子間隔$a$に外挿することで連続極限を数値的に示す。任意の$S$に対してガウスの法則を厳密に解くことにより、一般化されたPXPスピンモデルを得て、物理的ヒルベルト空間の次元を解析的に数える。これにより、量子デバイス上で連続極限への信頼性の高い外挿を行うために必要なリソースを定量化できる。関数積分法を用いて、大きな半整数スピンを持つモデルを位相角 $\Theta=\pi$ における物理と関連付ける。この結果から、近い将来、量子デバイスが量子リンクモデルを用いてQED領域を定量的に探究できることが示唆される。

The solution of gauge theories is one of the most promising applications of quantum technologies. Here, we discuss the approach to the continuum limit for $U(1)$ gauge theories regularized via finite-dimensional Hilbert spaces of quantum spin-$S$ operators, known as quantum link models. For quantum electrodynamics (QED) in one spatial dimension, we numerically demonstrate the continuum limit by extrapolating the ground state energy, the scalar, and the vector meson masses to large spin lengths $S$, large volume $N$, and vanishing lattice spacing $a$. By exactly solving Gauss' law for arbitrary $S$, we obtain a generalized PXP spin model and count the physical Hilbert space dimension analytically. This allows us to quantify the required resources for reliable extrapolations to the continuum limit on quantum devices. We use a functional integral approach to relate the model with large values of half-integer spins to the physics at topological angle $\Theta=\pi$. Our findings indicate that quantum devices will in the foreseeable future be able to quantitatively probe the QED regime with quantum link models.

翻訳日:2023-04-06 00:24:51 公開日:2022-12-12

# 量子コンピュータにおけるコスト効率QFAアルゴリズム

Cost-efficient QFA Algorithm for Quantum Computers ( http://arxiv.org/abs/2107.02262v2 )

ライセンス: Link先を確認

Özlem Salehi, Abuzer Yakaryılmaz

(参考訳) 量子有限オートマトン(QFA)の研究は、有限メモリを持つ量子コンピュータを探究する有力なアプローチの1つである。最も制限されたモデルの1つであるにもかかわらず、ムーア・クラッチフィールド量子有限オートマトン (MCQFA) は、$\mathtt{MOD}_p = \{ a^{j} \mid j \equiv 0 \mod p\}$($p$は素数)のような特定の言語を認識する際に、古典的有限オートマトンモデルよりも指数関数的に簡潔であることが証明されている。本稿では、言語$\mathtt{MOD}_p$に対する改良MCQFAアルゴリズムを提案する。その演算子は、利用可能な実量子コンピュータの基底ゲートに基づいて選択される。その結果、文献にある元のアルゴリズムの実装と比較して、より少ない基底ゲートを用いた短い量子プログラムが得られる。

The study of quantum finite automata (QFAs) is one of the possible approaches in exploring quantum computers with finite memory. Despite being one of the most restricted models, Moore-Crutchfield quantum finite automaton (MCQFA) is proven to be exponentially more succinct than classical finite automata models in recognizing certain languages such as $\mathtt{MOD}_p = \{ a^{j} \mid j \equiv 0 \mod p\}$, where $p$ is a prime number. In this paper, we present a modified MCQFA algorithm for the language $\mathtt{MOD}_p$, the operators of which are selected based on the basis gates on the available real quantum computers. As a consequence, we obtain shorter quantum programs using fewer basis gates compared to the implementation of the original algorithm given in the literature.

翻訳日:2023-03-23 08:38:41 公開日:2022-12-12
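
To make the $\mathtt{MOD}_p$ example concrete, here is a minimal classical simulation of the textbook single-qubit rotation idea: each input symbol `a` rotates a real two-dimensional state by $2\pi/p$, and the word is accepted with the probability of measuring the start state again. This is a sketch of the standard construction only, not the modified algorithm from the paper, and the function name `mod_p_accept_probability` is an assumption made for illustration.

```python
import math


def mod_p_accept_probability(p, j):
    """Acceptance probability of the word a^j for a one-qubit rotation automaton.

    The state starts at (1, 0); every symbol applies a rotation by 2*pi/p.
    After j symbols the amplitude on the start state is cos(2*pi*j/p).
    """
    theta = 2 * math.pi / p
    amp0, amp1 = 1.0, 0.0
    for _ in range(j):
        amp0, amp1 = (
            math.cos(theta) * amp0 - math.sin(theta) * amp1,
            math.sin(theta) * amp0 + math.cos(theta) * amp1,
        )
    return amp0 ** 2


p = 7
for j in (0, 3, 7, 14):
    print(j, round(mod_p_accept_probability(p, j), 6))
```

Words whose length is a multiple of $p$ are accepted with probability 1 (up to floating-point error), while other lengths are accepted with probability strictly below 1 when $p$ is an odd prime. A single rotation angle leaves that error large for some lengths, so constructions in the literature combine several rotation angles to reduce it.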

# 量子プログラムの形式的検証:理論,ツール,課題

Formal Verification of Quantum Programs: Theory, Tools and Challenges ( http://arxiv.org/abs/2110.01320v2 )

ライセンス: Link先を確認

Marco Lewis and Sadegh Soudjani and Paolo Zuliani

(参考訳) 過去27年間で、量子コンピューティングは学界と産業の両方から大きな関心を集めている。現在の速度では、量子コンピュータはこの分野の研究の増加によって急速に成長している。量子ハードウェアの信頼性の向上と、量子コンピュータのプログラムに適したソフトウェアの開発に、多大な努力が払われている。対照的に、量子プログラムの検証はあまり注目されていない。プログラムの検証は、リソース制約やエラーが発生しやすい量子ハードウェア上で複雑なアルゴリズムを正しくプログラムすることの難しさから、量子環境において特に重要である。量子プログラムの検証フレームワークを作成する研究は近年、理論的なアイデアの集合を用いて様々なツールが実装されている。この調査は、量子プログラムの形式的検証分野への短い導入であり、これまで開発された理論とツールをまとめることを目的としている。さらに、この調査では、この分野が将来直面するであろういくつかの課題、すなわち複雑な量子アルゴリズムの開発について調べる。

Over the past 27 years, quantum computing has seen a huge rise in interest from both academia and industry. At the current rate, quantum computers are growing in size rapidly backed up by the increase of research in the field. Significant efforts are being made to improve the reliability of quantum hardware and to develop suitable software to program quantum computers. In contrast, the verification of quantum programs has received relatively less attention. Verifying programs is especially important in the quantum setting due to how difficult it is to program complex algorithms correctly on resource-constrained and error-prone quantum hardware. Research into creating verification frameworks for quantum programs has seen recent development, with a variety of tools implemented using a collection of theoretical ideas. This survey aims to be a short introduction into the area of formal verification of quantum programs, bringing together theory and tools developed to date. Further, this survey examines some of the challenges that the field may face in the future, namely the development of complex quantum algorithms.

翻訳日:2023-03-12 14:19:13 公開日:2022-12-12

# Jaynes-Cummingsモデルとその子孫

The Jaynes-Cummings model and its descendants ( http://arxiv.org/abs/2202.00330v2 )

ライセンス: Link先を確認

Jonas Larson and Th. K. Mavrogordatos

(参考訳) Jaynes-Cummings (JC) モデルは、現在まで約60年間量子光学の最前線にあり、現代の物理学において最も単純だが複雑な非線形な光物質相互作用の定式化の1つとなっている。このモノグラフは、様々な分野にわたるモデルの全義性に重点を置いており、原子物理学、量子光学、固体物理学、量子情報科学を含むいくつかの領域における特定の物理系における幅広い応用を考察して、その形式主義の基本的な一般化をもたらす。物語を組み立てるために様々な部品を組み立てるとき、我々は主に量子物理学と量子光学の研究者をターゲットにしてきた。このモノグラフはまた、非平衡量子相転移、量子コンピューティングとシミュレーション、および量子多体物理学に携わる大学院生向けのアクセス可能な導入を含んでいる。この枠組みでは、物理学と応用の共通基盤を文献に散らばり、様々な技術進歩を明らかにすることを目的としている。この展示は、量子光学と凝縮物質物理学をインターレースする活気のある場を通して読者を導く。全てのセクションは理論と実験の強い相互関係に費やされており、歴史的にjc物理学を起源とする様々な現代の研究方向の発展と結びついている。これは1960年代初めからその進化を形作った主要な出版物への包括的な参照リストを伴っている。最後に,このような多面的素材の提示を可能な限り簡潔に維持し,数学的表現の経済的利用とともに,様々な図形で連続的なテキストを散在させてきた。

The Jaynes-Cummings (JC) model has been at the forefront of quantum optics for almost six decades to date, providing one of the simplest yet intricately nonlinear formulations of light-matter interaction in modern physics. Laying most of the emphasis on the omnipresence of the model across a range of disciplines, this monograph brings up the fundamental generality of its formalism, looking at a wide gamut of applications in specific physical systems among several realms, including atomic physics, quantum optics, solid-state physics and quantum information science. When bringing the various pieces together to assemble our narrative, we have primarily targeted researchers in quantum physics and quantum optics. The monograph also comprises an accessible introduction for graduate students engaged with non-equilibrium quantum phase transitions, quantum computing and simulation, and quantum many-body physics. In that framework, we aim to reveal the common ground between physics and applications scattered across literature and different technological advancements. The exposition guides the reader through a vibrant field interlacing quantum optics and condensed-matter physics. All sections are devoted to the strong interconnection between theory and experiment, historically linked to the development of the various modern research directions stemming from JC physics. This is accompanied by a comprehensive list of references to the key publications that have shaped its evolution since the early 1960s. Finally, we have endeavored to keep the presentation of such a multi-sided material as concise as possible, interspersing continuous text with various illustrations alongside an economical use of mathematical expressions.

翻訳日:2023-02-27 03:16:46 公開日:2022-12-12

# 隠れたプロジェクター埋め込みから生じるバイパルタイトRydbergアレイの量子多体傷

Quantum many-body scars in bipartite Rydberg arrays originate from hidden projector embedding ( http://arxiv.org/abs/2203.00658v4 )

ライセンス: Link先を確認

Keita Omiya and Markus Müller

(参考訳) 拘束されたラビ振動を記述するPXPモデルに現れる、エルゴード性を破る「量子多体傷跡」状態の性質について検討する。Rydberg原子の広いクラスの二部格子について、これらの状態のほぼ等間隔なエネルギーの塔は、ハミルトニアンが一般化された射影埋め込み形式(量子多体傷跡をホストする多くのモデルに共通する構造)に近いことから生じることを明らかにする。厳密な量子傷跡をホストする、非エルミートだが厳密に局所的なPXPモデルの拡張を構築し、文献にある様々なエルミートな傷跡安定化拡張がこの枠組みの中でどのように自然に理解できるかを示す。厳密な傷跡状態は、明示的に構築された擬スピンの大スピン状態として解析的に得られる。最後に、Néel状態から生じる準周期運動は、大きな擬スピンの歳差運動をRydberg拘束部分空間へ射影したものであることを示す。

We study the nature of the ergodicity-breaking "quantum many-body scar" states that appear in the PXP model describing constrained Rabi oscillations. For a wide class of bipartite lattices of Rydberg atoms, we reveal that the nearly energy-equidistant tower of these states arises from the Hamiltonian's close proximity to a generalized projector-embedding form, a structure common to many models hosting quantum many-body scars. We construct a non-Hermitian, but strictly local extension of the PXP model hosting exact quantum scars, and show how various Hermitian scar-stabilizing extensions from the literature can be naturally understood within this framework. The exact scar states are obtained analytically as large spin states of explicitly constructed pseudospins. The quasi-periodic motion ensuing from the Néel state is finally shown to be the projection onto the Rydberg-constrained subspace of the precession of the large pseudospin.

翻訳日:2023-02-23 10:15:08 公開日:2022-12-12

# 臨界カシミール効果:厳密な結果

Critical Casimir Effect: Exact Results ( http://arxiv.org/abs/2203.15050v2 )

ライセンス: Link先を確認

D. M. Dantchev and S. Dietrich

(参考訳) 任意の媒体では、温度やその構成成分の量子的性質による変動がある。物質体がそのような媒体に浸漬された場合、その形状とその構成成分の性質は、周囲の媒体の特性とその変動を変化させる。同じ媒体に第2の体がある場合(これら間の直接の相互作用に加えて)、第1の体による変化は第2の体による変化に影響を及ぼす。この相互の影響は、両者の間に力をもたらす。物質間の効果的な相互作用を媒介する媒体の励起が質量を持たない場合、この力は長距離で現在カシミール力として知られている。変動媒体が真空中の閉じ込められた電磁場からなる場合、量子力学的カシミール効果について話す。物質場の秩序パラメータが数密度や濃度の差のように変動し、対応する秩序パラメータの変動が長距離である場合、臨界カシミール効果を語る。これは例えば、二階相転移を実行し、対応する臨界点の近くに熱力学的に位置する系や、ゴールドストーンモード励起を示す連続対称性を持つ系などである。ここでは、一次元イジングモデル、XY、ハイゼンベルクモデル、二次元イジングモデル、ガウスモデル、球面モデル、およびイジングモデルとXYモデルの平均場結果を含む系のカシミール効果に関する現在利用可能な正確な結果について述べる。境界条件がカシミール力の挙動に及ぼす影響には特に注意が必要である。

In any medium there are fluctuations due to temperature or due to the quantum nature of its constituents. If a material body is immersed into such a medium, its shape and the properties of its constituents modify the properties of the surrounding medium and its fluctuations. If in the same medium there is a second body then -- in addition to all direct interactions between them -- the modifications due to the first body influence the modifications due to the second body. This mutual influence results in a force between these bodies. If the excitations of the medium, which mediate the effective interaction between the bodies, are massless, this force is long-ranged and nowadays known as a Casimir force. If the fluctuating medium consists of the confined electromagnetic field in vacuum, one speaks of the quantum mechanical Casimir effect. In the case that the order parameter of material fields fluctuates - such as differences of number densities or concentrations - and that the corresponding fluctuations of the order parameter are long-ranged, one speaks of the critical Casimir effect. This holds, e.g., in the case of systems which undergo a second-order phase transition and which are thermodynamically located near the corresponding critical point, or for systems with a continuous symmetry exhibiting Goldstone mode excitations. Here we review the currently available exact results concerning the critical Casimir effect in systems encompassing the one-dimensional Ising, XY, and Heisenberg models, the two-dimensional Ising model, the Gaussian and the spherical models, as well as the mean field results for the Ising and the XY model. Special attention is paid to the influence of the boundary conditions on the behavior of the Casimir force.

翻訳日:2023-02-20 11:40:47 公開日:2022-12-12

# 論理的誤謬検出

Logical Fallacy Detection ( http://arxiv.org/abs/2202.13758v3 )

ライセンス: Link先を確認

Zhijing Jin, Abhinav Lalwani, Tejas Vaidhya, Xiaoyu Shen, Yiwen Ding, Zhiheng Lyu, Mrinmaya Sachan, Rada Mihalcea, Bernhard Schölkopf

(参考訳) 推論は人間の知性の中心である。しかし、誤った議論は一般的であり、その一部は気候変動に関する誤情報の拡散などの問題を悪化させる。本稿では、論理的誤謬検出というタスクを提案し、テキストに一般的に見られる論理的誤謬の新たなデータセット(Logic)と、気候変動に関する主張における論理的誤謬を検出するための追加のチャレンジセット(LogicClimate)を提供する。モデルは議論の根底にある論理構造を理解しなければならないため、論理的誤謬の検出は難しい問題である。既存の事前学習済み大規模言語モデルは、このタスクで性能が低いことが分かった。対照的に、単純な構造認識型分類器は、最良の言語モデルをLogicで5.46%、LogicClimateで4.51%上回ることを示す。今後の研究でこのタスクが探究されることを期待する。このタスクは (a) 言語モデルの新たな推論課題として機能し、 (b) 誤情報の拡散への対処に応用できる可能性があるためである。データセットとコードは https://github.com/causalNLP/logical-fallacy で利用可能である。

Reasoning is central to human intelligence. However, fallacious arguments are common, and some exacerbate problems such as spreading misinformation about climate change. In this paper, we propose the task of logical fallacy detection, and provide a new dataset (Logic) of logical fallacies generally found in text, together with an additional challenge set for detecting logical fallacies in climate change claims (LogicClimate). Detecting logical fallacies is a hard problem as the model must understand the underlying logical structure of the argument. We find that existing pretrained large language models perform poorly on this task. In contrast, we show that a simple structure-aware classifier outperforms the best language model by 5.46% on Logic and 4.51% on LogicClimate. We encourage future work to explore this task as (a) it can serve as a new reasoning challenge for language models, and (b) it can have potential applications in tackling the spread of misinformation. Our dataset and code are available at https://github.com/causalNLP/logical-fallacy

翻訳日:2023-02-19 15:18:44 公開日:2022-12-12

# 予測セットによるエキスパート予測の改善

Improving Expert Predictions with Prediction Sets ( http://arxiv.org/abs/2201.12006v4 )

ライセンス: Link先を確認

Eleni Straitouri and Lequn Wang and Nastaran Okati and Manuel Gomez Rodriguez

(参考訳) 自動意思決定支援システムは、人間の専門家がより効率的に正確にタスクを解決できるようにする。しかし、既存のシステムは一般に専門家に、いつエージェンシーをシステムに割譲するか、いつ独自のエージェンシーを行使するかを理解する必要がある。さらに、専門家がシステムに対する誤った信頼を育むと、パフォーマンスが悪化する可能性がある。本研究では、上記の要件を解除し、設計上、それぞれの推奨がパフォーマンスを改善するのにいつ正確なのかを専門家が理解する必要のない、自動意思決定支援システムを開発する。この目的のために,マルチクラス分類タスクに着目し,各データサンプルに対して分類器を使用してラベルのサブセットを人間エキスパートに推薦する自動決定支援システムを検討する。まず、共形予測の観点から、そのようなシステムの設計を考えることで、ラベルの推奨されたサブセットが真のラベルを含む確率が、ほぼ正確にターゲット確率値と高い確率と一致することを保証できることを示す。そこで,本研究では,本システムの利用が最も有利な目標確率値を求めるための,効率的で近似的な探索手法を開発した。合成データと実データを用いた実験により,本システムはより正確な予測を行うことができ,それに依存する分類器の精度にロバストであることが証明された。

Automated decision support systems promise to help human experts solve tasks more efficiently and accurately. However, existing systems typically require experts to understand when to cede agency to the system or when to exercise their own agency. Moreover, if the experts develop a misplaced trust in the system, their performance may worsen. In this work, we lift the above requirement and develop automated decision support systems that, by design, do not require experts to understand when each of their recommendations is accurate to improve their performance. To this end, we focus on multiclass classification tasks and consider an automated decision support system that, for each data sample, uses a classifier to recommend a subset of labels to a human expert. We first show that, by looking at the design of such a system from the perspective of conformal prediction, we can ensure that the probability that the recommended subset of labels contains the true label matches almost exactly a target probability value with high probability. Then, we develop an efficient and near-optimal search method to find the target probability value under which the expert benefits the most from using our system. Experiments on synthetic and real data demonstrate that our system can help the experts make more accurate predictions and is robust to the accuracy of the classifier it relies on.

翻訳日:2023-02-19 14:32:23 公開日:2022-12-12
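
The conformal-prediction idea mentioned in the abstract above can be illustrated with a generic split-conformal sketch: calibrate a threshold on held-out nonconformity scores, then recommend every label whose score falls under it. This is not the authors' system or their search over target probabilities. The function names, the score choice (one minus the classifier's probability for a label), and the numbers are all assumptions made for illustration.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Threshold from calibration scores for a target coverage of 1 - alpha.

    `cal_scores` are nonconformity scores of the TRUE labels on a held-out
    calibration set (here assumed to be 1 - predicted probability).
    """
    n = len(cal_scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    if rank > n:
        return float("inf")  # too few calibration points: include everything
    return sorted(cal_scores)[rank - 1]


def prediction_set(probs, threshold):
    """Labels whose nonconformity score 1 - p stays within the threshold."""
    return [label for label, p in enumerate(probs) if 1.0 - p <= threshold]


# Hypothetical calibration scores and classifier outputs.
cal_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
threshold = conformal_threshold(cal_scores, alpha=0.5)
print(threshold)
print(prediction_set([0.6, 0.3, 0.1], threshold))
print(prediction_set([0.5, 0.5, 0.0], threshold))
```

A smaller `alpha` raises the threshold and yields larger recommended subsets, which is the trade-off between coverage and how much the subset narrows the expert's choice.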

# パーソナルブロックチェーンの作成とメンテナンスのための新しいアーキテクチャ

Novel Architecture to Create and Maintain Personal Blockchains ( http://arxiv.org/abs/2212.14671v1 )

ライセンス: Link先を確認

Collin Connors and Dilip Sarkar

(参考訳) ブロックチェーンは革命的技術だと言われている。しかし、興奮にもかかわらず、ブロックチェーンは多くの分野で採用されていない。多くの人は、プライバシの懸念や使用障壁、実用的なユースケースの欠如などを理由に、ブロックチェーン技術の採用をためらっている。本研究では、複数の金融機関間での金融取引を追跡するブロックチェーンのユースケースについて概説する。従来の集中型アプローチの欠点と、ブロックチェーンアプローチが、このユースケースに必要なすべてのプライバシとアクセシビリティを提供していないことを示しています。したがって、ユースケースをサポートする新しいブロックチェーンアーキテクチャを提案する。この新しいアーキテクチャは、パブリックブロックチェーンの使いやすさとプライベートブロックチェーンのプライバシを組み合わせることで、ユーザがパーソナルブロックチェーンを作成できるようにする。この新たなパーソナルブロックチェーンアーキテクチャは、特にプライベートデータを扱うユースケースにおいて、より多くのブロックチェーン採用につながると考えています。

Blockchain has been touted as a revolutionary technology. However, despite the excitement, blockchain has not been adopted in many fields. Many are hesitant to adopt blockchain technology due to privacy concerns, barriers to use, or lack of practical use cases. In this work, we outline a potential blockchain use case for tracking financial transactions across multiple financial institutions. We show the downsides of traditional centralized approaches and that blockchain approaches fail to give all the privacy and accessibility required for this use case. Thus we propose a novel blockchain architecture to support our use case. This novel architecture combines the ease of use of public blockchains with the privacy of private blockchains by allowing users to create personal blockchains. We believe this novel personal blockchain architecture will lead to more blockchain adoption, particularly in use cases handling private data.

翻訳日:2023-02-19 13:23:02 公開日:2022-12-12

# アダムソン大学のための統合教育管理ツール

Integrated Educational Management Tool for Adamson University ( http://arxiv.org/abs/2212.08039v1 )

ライセンス: Link先を確認

Anabella C. Doctor

(参考訳) 本研究は、アダムソン大学の教員が費用のかからない試験の実施、学生の成績付け、データや作業の重複回避、試験や成績に関するアクセス可能で信頼性の高い情報の提供をより効果的かつ効率的に行えるよう支援する、ウェブベースの統合学術情報システムの開発に焦点を当てた。開発したシステムは、試験と学生の成績評価のプロセスを自動化する。研究の目的を達成するため、研究者はソフトウェア開発ライフサイクルの各段階に従い、アダムソン大学の教員と管理部門の期待を満たす、あるいは上回る高品質なソフトウェアの作成を目指した。開発したシステムはアダムソン大学でテストされ、IT専門家とエンドユーザを含む回答者によってISO 9126ソフトウェア製品評価基準を用いて評価された結果、平均4.76で「優秀」という記述的評価を得た。これは、本システムが教育機関の試験と学生の成績評価を管理する有用なツールになり得ることを示している。アダムソン大学統合教育管理ツールは、ウェブサイト開発のためのオープンソース技術を用いて構築に成功したシステムである。本システムは、機能性、信頼性、ユーザビリティ、効率性、移植性についてのテストに合格し、その結果、開発したアプリケーションが教育機関の試験および学生成績評価システムを効率性、信頼性、アクセシビリティの面で支援することが明らかになった。項目分析、仕様表の統合、システムのサブモジュールの強化に関する今後の研究に加え、オフラインでのクラス記録と試験を、データのオンライン自動同期とともに利用可能にすることが推奨される。

This study focused on the development of a web-based integrated academic information system that can help Adamson University faculty become more effective and efficient in giving costless examinations, assigning student grades, avoiding redundancy of data and effort, and providing accessible and reliable information about examinations and grades. The developed system automates the processes of examination and student grading. To achieve the goal of the study, the researcher followed the phases of the software development life cycle, aiming to produce high-quality software that meets or even exceeds the expectations of Adamson University's faculty and administration. The developed system was tested at Adamson University and evaluated against the ISO 9126 software product evaluation criteria by respondents including IT experts and end users. It received a descriptive rating of excellent, with a mean of 4.76, which indicates that the system can be a useful tool for managing an educational institution's examinations and student grading. Integrated Educational Management Tool for Adamson University is a system that was successfully built using open-source web development technology. The system was successfully tested for functionality, reliability, usability, efficiency, and portability, and the results showed that the application supports the institution's examination and student grading system in terms of efficiency, reliability, and accessibility. Future studies on integrating item analysis and tables of specification and on enhancing the system's submodules are recommended, as is making class records and exams available offline with automatic online synchronization of data.

翻訳日:2023-02-19 13:05:02 公開日:2022-12-12

# 大規模モビリティネットワークによるcovid-19政策の地理的流出効果の推定

Estimating Geographic Spillover Effects of COVID-19 Policies From Large-Scale Mobility Networks ( http://arxiv.org/abs/2212.06224v1 )

ライセンス: Link先を確認

Serina Chang, Damir Vrabac, Jure Leskovec, Johan Ugander

(参考訳) 米国における多くの政策は、例えば郡レベルなど、地域ごとに決定される。地域単位の政策体制は地域間の柔軟性をもたらすが、住民が近隣の制限の緩い地域へ移動することで地域の制限を回避する地理的スピルオーバーがある場合、効果が低下する可能性がある。政策決定は内生的であるため、因果的なスピルオーバー効果を確実に推定したり、地域政策への影響を評価したりする機会はこれまでほとんどなかった。本研究では、地域政策のスピルオーバー効果を交絡なく推定できる新しい設定を特定し、適切な方法論を開発する。カリフォルニア州の「Blueprint for a Safer Economy」に着目し、郡レベルの移動制限が公開されたCOVID-19の深刻度統計によって決定論的に設定されたことを利用して、回帰不連続デザインの枠組みで郡間のスピルオーバーを推定する。数十億のタイムスタンプ付きエッジを持つモビリティネットワークを用いてこれらの効果を推定し、有意なスピルオーバー移動を見出した。その効果は小売店、飲食店、ジムでより大きい。地域的な政策体制と全域的な政策体制を比較すると、我々のスピルオーバー推定値は、郡レベルの制限が移動の削減において州全体の制限の54%の効果しかないことを示唆している。しかし、スピルオーバー推定値で重み付けしたグラフ上の最小kカット問題を解いて郡の分割を最適化する、マクロ郡単位の制限という中間的な戦略は、郡間の柔軟性を十分に保ちながら、州全体の制限による移動削減の90%以上を回復できる。

Many policies in the US are determined locally, e.g., at the county-level. Local policy regimes provide flexibility between regions, but may become less effective in the presence of geographic spillovers, where populations circumvent local restrictions by traveling to less restricted regions nearby. Due to the endogenous nature of policymaking, there have been few opportunities to reliably estimate causal spillover effects or evaluate their impact on local policies. In this work, we identify a novel setting and develop a suitable methodology that allow us to make unconfounded estimates of spillover effects of local policies. Focusing on California's Blueprint for a Safer Economy, we leverage how county-level mobility restrictions were deterministically set by public COVID-19 severity statistics, enabling a regression discontinuity design framework to estimate spillovers between counties. We estimate these effects using a mobility network with billions of timestamped edges and find significant spillover movement, with larger effects in retail, eating places, and gyms. Contrasting local and global policy regimes, our spillover estimates suggest that county-level restrictions are only 54% as effective as statewide restrictions at reducing mobility. However, an intermediate strategy of macro-county restrictions -- where we optimize county partitions by solving a minimum k-cut problem on a graph weighted by our spillover estimates -- can recover over 90% of statewide mobility reductions, while maintaining substantial flexibility between counties.

翻訳日:2023-02-19 12:58:55 公開日:2022-12-12

# LAMBRETTA: Twitterのソフトモデレーションでランク付けを学ぶ

LAMBRETTA: Learning to Rank for Twitter Soft Moderation ( http://arxiv.org/abs/2212.05926v1 )

ライセンス: Link先を確認

Pujan Paudel, Jeremy Blackburn, Emiliano De Cristofaro, Savvas Zannettou, and Gianluca Stringhini

(参考訳) 偽情報の問題を抑えるため、Twitterのようなソーシャルメディアプラットフォームは、利用者により多くの文脈を提供することを目的として、誤りが指摘された言説を扱うコンテンツに警告ラベルを付け始めた。残念ながら、これらのラベルは一様には適用されておらず、大量の偽コンテンツがモデレーションされないまま残っている。本稿では、Learning To Rank (LTR) を用いて、ソフトモデレーションの候補となるツイートを自動的に識別するシステム LAMBRETTA を提案する。2020年の米国選挙に関連する虚偽の主張をモデレーションするために LAMBRETTA をTwitterのデータに適用したところ、Twitterの20倍以上のツイートにフラグを付け、偽陽性は3.93%、偽陰性は18.81%にとどまり、キーワード抽出やセマンティック検索に基づく最先端の代替手法を上回った。全体として、LAMBRETTA はソーシャルメディア上の偽情報を識別してフラグを付ける人間のモデレーターを支援する。

To curb the problem of false information, social media platforms like Twitter started adding warning labels to content discussing debunked narratives, with the goal of providing more context to their audiences. Unfortunately, these labels are not applied uniformly and leave large amounts of false content unmoderated. This paper presents LAMBRETTA, a system that automatically identifies tweets that are candidates for soft moderation using Learning To Rank (LTR). We run LAMBRETTA on Twitter data to moderate false claims related to the 2020 US Election and find that it flags over 20 times more tweets than Twitter, with only 3.93% false positives and 18.81% false negatives, outperforming alternative state-of-the-art methods based on keyword extraction and semantic search. Overall, LAMBRETTA assists human moderators in identifying and flagging false information on social media.

翻訳日:2023-02-19 12:57:50 公開日:2022-12-12

# Wikipediaのバランス法: 集合的知性や大量監視のためのツール?

Wikipedia's Balancing Act: A Tool for Collective Intelligence or Mass Surveillance? ( http://arxiv.org/abs/2212.05828v1 )

ライセンス: Link先を確認

Simon Liu

(参考訳) Wikipediaは、ますます複雑なデータ駆動社会におけるオンライン百科事典としての本来の機能を超えて進化してきた。社会的プラットフォームは、集団情報と集団監視のバランスをとる行為に満ちており、匿名通信ソフトウェアによる重要な貢献を犠牲にすることなく、個人やコミュニティを政府による大量監視から守るためのプロセスを開発する必要がある。ケーススタディは、NSA政府の監視慣行、反SOPA法運動、ウィキペディアの参加ジャーナリズムへの関与、偽情報、自己検閲、Torの使用に関する研究から提供される。本稿では,データ保持と政府説明責任に関する社会文化人類学と政策枠組みの今後の研究を通じて,個人,公共機関,民間機関間の共通基盤を整備することを提案する。ウィキペディアは、その反復的な性質を通じて変化に適応できる複雑な組織として、米国の諜報機関の例として使われており、今後の政策フレームワークの保護方法に関する洞察を引き出している。最後に本稿は,個人,民間機関,政府に対して,オンラインコミュニティへの貢献の結果,個人情報の保存と利用に対する警戒を継続するよう呼びかけるものである。

Wikipedia has evolved beyond its original function as an online encyclopedia in an increasingly complex data-driven society. The social platform is met with a balancing act between collective intelligence and mass surveillance; processes need to be developed to protect individuals and the community from government mass surveillance without sacrificing the important contributions made through prohibited anonymous communication software. Case studies are provided from NSA government surveillance practices, the anti-SOPA legislation movement, and research that covers Wikipedia's involvement with participatory journalism, disinformation, self-censorship, and the use of Tor. This paper proposes that a common ground can be developed between individuals, public and private institutions through future research in socio-cultural anthropology and policy frameworks around data retention and government accountability. Wikipedia is used as an example within the US intelligence community as a complex organisation that can adapt to changes through its iterative nature, which draws insight into how policy frameworks can be future-proofed. Finally, this paper is a wake-up call to individuals, private institutions, and governments to remain vigilant about the storage and use of personal information as a result of contributing to online communities.

翻訳日:2023-02-19 12:57:35 公開日:2022-12-12

# メタバースにおける経済システム:基礎,最先端,課題

Economic Systems in Metaverse: Basics, State of the Art, and Challenges ( http://arxiv.org/abs/2212.05803v1 )

ライセンス: Link先を確認

Huawei Huang, Qinnan Zhang, Taotao Li, Qinglin Yang, Zhaokang Yin, Junhao Wu, Zehui Xiong, Jianming Zhu, Jiajing Wu, and Zibin Zheng

(参考訳) 経済システムはメタバースにおいて重要な役割を果たす。しかし,メタバースの経済システムを体系的に導入する概観はまだ見つかっていない。そこで本稿では,経済システムに関する最先端のソリューション,アーキテクチャ,システムについて概観する。 1) メタバースの文脈における経済システムの枠組みとは何か,(2) 経済システムはメタバースにどのような影響を及ぼすのか? 本稿では、現在と将来のメタバースの両方で機能する経済システムに関する洞察を明らかにすることを目的とする。経済体制の枠組みを概観するために,我々はメタバース,すなわちデジタル創造,デジタル資産,デジタル取引市場における3つの基本的な要素の関連性について論じる。その後,提案する経済システム枠組みの各トピックについて詳述する。これらのトピックには、インセンティブメカニズム、金融システム、デジタルウォレット、分散金融(defi)アクティビティ、メタバースのクロスプラットフォーム相互運用性などがある。各トピックについて、主に3つの質問を取り上げます。 a) この話題の理論的根拠 b) メタバースがなぜこの話題を必要とするのか c) このトピックがメタバースでどのように進化するか。この概観を通じて、読者はメタバースが必要とする経済システムと、メタバースにおける経済活動の背後にある洞察をよりよく理解できるようにしたい。

Economic systems play pivotal roles in the metaverse. However, we have not yet found an overview that systematically introduces economic systems for the metaverse. Therefore, we review the state-of-the-art solutions, architectures, and systems related to economic systems. When investigating those state-of-the-art studies, we keep two questions in our mind: (1) what is the framework of economic systems in the context of the metaverse, and (2) what activities would economic systems engage in the metaverse? This article aims to disclose insights into the economic systems that work for both the current and the future metaverse. To have a clear overview of the economic-system framework, we mainly discuss the connections among three fundamental elements in the metaverse, i.e., digital creation, digital assets, and the digital trading market. After that, we elaborate on each topic of the proposed economic-system framework. Those topics include incentive mechanisms, monetary systems, digital wallets, decentralized finance (DeFi) activities, and cross-platform interoperability for the metaverse. For each topic, we mainly discuss three questions: a) the rationale of this topic, b) why the metaverse needs this topic, and c) how this topic will evolve in the metaverse. Through this overview, we wish readers can better understand what economic systems the metaverse needs, and the insights behind the economic activities in the metaverse.

翻訳日:2023-02-19 12:57:15 公開日:2022-12-12

# 拡散アートかデジタル偽造か? 拡散モデルにおけるデータレプリケーションの検討

Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models ( http://arxiv.org/abs/2212.03860v3 )

ライセンス: Link先を確認

Gowthami Somepalli, Vasu Singla, Micah Goldblum, Jonas Geiping, Tom Goldstein

(参考訳) 最先端の拡散モデルは高品質でカスタマイズ可能な画像を生成し、商業アートやグラフィックデザインの用途に利用できる。しかし、拡散モデルは独自の芸術作品を作り出しているのか、それともトレーニングセットからコンテンツを直接複製しているのか。本研究では、生成画像とトレーニングサンプルを比較し、コンテンツが複製された場合にそれを検出できる画像検索フレームワークを検討する。このフレームワークを Oxford flowers、Celeb-A、ImageNet、LAION など複数のデータセットで学習した拡散モデルに適用し、トレーニングセットのサイズなどの要因がコンテンツ複製の頻度にどう影響するかを議論する。また、広く使われている Stable Diffusion モデルを含む拡散モデルが、トレーニングデータをあからさまにコピーしている事例を特定する。

Cutting-edge diffusion models produce images with high quality and customizability, enabling them to be used for commercial art and graphic design purposes. But do diffusion models create unique works of art, or are they replicating content directly from their training sets? In this work, we study image retrieval frameworks that enable us to compare generated images with training samples and detect when content has been replicated. Applying our frameworks to diffusion models trained on multiple datasets including Oxford flowers, Celeb-A, ImageNet, and LAION, we discuss how factors such as training set size impact rates of content replication. We also identify cases where diffusion models, including the popular Stable Diffusion model, blatantly copy from their training data.

翻訳日:2023-02-19 12:54:27 公開日:2022-12-12

# 非互換性を超えた: 機械学習と法における相互排他的公正基準のトレードオフ

Beyond Incompatibility: Trade-offs between Mutually Exclusive Fairness Criteria in Machine Learning and Law ( http://arxiv.org/abs/2212.00469v3 )

ライセンス: Link先を確認

Meike Zehlike and Alex Loosley and Håkan Jonsson and Emil Wiedemann and Philipp Hacker

(参考訳) 信頼できるAIは、マシンラーニングと法的なドメインの両方において、ますます重要になっている。重要な結果の1つは、意思決定者が「公正」、すなわち非差別的、アルゴリズム的決定手順を保証しようとすることである。しかし、現実的な事実的仮定の下で相互に相容れないことが示されているアルゴリズム的公正性のいくつかの競合する概念がある。この懸念は、例えば、「グループ内の校正」と「正負のクラスに対する均衡」の広く使われている公平度尺度である。本稿では,これら3つのフェアネス基準を補間する新しいアルゴリズム(FAir Interpolation Method: FAIM)を提案する。これにより、少なくとも部分的には、各フェアネス条件の所望の重み付けの組み合わせを満たすように、初期不公平な予測を是正することができる。我々は,合成データ,CompASデータセット,電子商取引部門による新たな実世界のデータセットに適用した場合のアルゴリズムの有効性を実証する。最後に、FAIMが相反する法的義務を満たすためにどの程度活用できるかについて議論する。この分析は、信用スコアリングや刑事司法手続のような従来の法分野における業務を運用する可能性だけでなく、最近制定されたデジタル市場法のように、EUで実施された最新のAI規制についても運用する可能性があることを示唆している。

Trustworthy AI is becoming ever more important in both machine learning and legal domains. One important consequence is that decision makers must seek to guarantee a 'fair', i.e., non-discriminatory, algorithmic decision procedure. However, there are several competing notions of algorithmic fairness that have been shown to be mutually incompatible under realistic factual assumptions. This concerns, for example, the widely used fairness measures of 'calibration within groups' and 'balance for the positive/negative class'. In this paper, we present a novel algorithm (FAir Interpolation Method: FAIM) for continuously interpolating between these three fairness criteria. Thus, an initially unfair prediction can be remedied to, at least partially, meet a desired, weighted combination of the respective fairness conditions. We demonstrate the effectiveness of our algorithm when applied to synthetic data, the COMPAS data set, and a new, real-world data set from the e-commerce sector. Finally, we discuss to what extent FAIM can be harnessed to comply with conflicting legal obligations. The analysis suggests that it may operationalize duties in traditional legal fields, such as credit scoring and criminal justice proceedings, but also for the latest AI regulations put forth in the EU, like the recently enacted Digital Markets Act.

翻訳日:2023-02-19 12:45:32 公開日:2022-12-12

# 法律インフォームス・コード:人間と人工知能をアライメントするための法情報学のアプローチ

Law Informs Code: A Legal Informatics Approach to Aligning Artificial Intelligence with Humans ( http://arxiv.org/abs/2209.13020v12 )

ライセンス: Link先を確認

John J. Nay

(参考訳) 私たちは現在、AIの振る舞いを確実に導く方法で、人間の目標と社会的価値を特定できません。法的な解釈と法的な解釈は、不透明な人間の価値を妥当な指令に変換する計算エンジンを形成する。ローインフォメーション・コード(law informs code)は、aiに法的知識と推論を組み込んだ研究課題である。法的な契約の当事者が将来の関係のあらゆる潜在的な事態を予測できないのと同様に、議会は提案された法案が適用される全ての状況を予測することができない。法理論と実践は、これらの仕様問題に対処するための一連のツールを開発した。例えば、法的な基準により、人間は共通の理解を発達させ、新しい状況に適応することができる。法律のより散在的な使用(例えば、認可の脅威による悪行の抑止として)とは対照的に、人間の目標の伝達方法や社会の価値観の表現として活用され、法律はコードを知らせる。本稿では,法的プロセス(法制定法,法解釈法,契約起草法,法標準の適用法,法的推論法等)が生み出すデータがどのように,本質的にあいまいな人間の目標の堅牢な仕様を促進するかを述べる。これにより、人間-AIアライメントとAIの局所的有用性が向上する。社会AIアライメントに向けて,多エージェントアライメントの応用哲学としての法を理解するための枠組みを提案する。法律は歴史的に有望な政治権力の反映であり、したがって市民選好の完全な集積ではないが、適切に解析すれば、その蒸留は利用可能な社会的価値の最も正当な計算的理解を提供する。法律が最終的に強力なAIに通知すると、法律を改善するための熟考的な政治プロセスがさらに意味を成す。

We are currently unable to specify human goals and societal values in a way that reliably directs AI behavior. Law-making and legal interpretation form a computational engine that converts opaque human values into legible directives. "Law Informs Code" is the research agenda embedding legal knowledge and reasoning in AI. Similar to how parties to a legal contract cannot foresee every potential contingency of their future relationship, and legislators cannot predict all the circumstances under which their proposed bills will be applied, we cannot ex ante specify rules that provably direct good AI behavior. Legal theory and practice have developed arrays of tools to address these specification problems. For instance, legal standards allow humans to develop shared understandings and adapt them to novel situations. In contrast to more prosaic uses of the law (e.g., as a deterrent of bad behavior through the threat of sanction), leveraged as an expression of how humans communicate their goals, and what society values, Law Informs Code. We describe how data generated by legal processes (methods of law-making, statutory interpretation, contract drafting, applications of legal standards, legal reasoning, etc.) can facilitate the robust specification of inherently vague human goals. This increases human-AI alignment and the local usefulness of AI. Toward society-AI alignment, we present a framework for understanding law as the applied philosophy of multi-agent alignment. Although law is partly a reflection of historically contingent political power - and thus not a perfect aggregation of citizen preferences - if properly parsed, its distillation offers the most legitimate computational comprehension of societal values available. If law eventually informs powerful AI, engaging in the deliberative political process to improve law takes on even more meaning.

翻訳日:2023-02-19 11:24:07 公開日:2022-12-12

# 公平性リプログラミング

Fairness Reprogramming ( http://arxiv.org/abs/2209.10222v4 )

ライセンス: Link先を確認

Guanhua Zhang, Yihua Zhang, Yang Zhang, Wenqi Fan, Qing Li, Sijia Liu, Shiyu Chang

(参考訳) 機械学習(ML)の公平性を促進する研究は近年急速に進んでいるが、既存の主流のアプローチの多くは、公平性基準を満たすためにニューラルネットワークの重み全体を再学習または微調整する必要がある。しかし、大規模な学習済みモデルでは、計算コストやストレージコストの大きさ、データ効率の低さ、モデルのプライバシーの問題により、これは実際には実現不可能であることが多い。本稿では、モデルリプログラミング手法を取り入れた新しい汎用的な公平性学習パラダイム FairReprogram を提案する。具体的には、FairReprogram はモデルを変更できない場合を考え、fairness trigger と呼ばれる一連の摂動を入力に付加する。この摂動は min-max 定式化の下で公平性基準に向けて調整される。さらに、fairness trigger によって公平性の目標を達成できる理由と条件を説明する情報理論的枠組みを導入する。fairness trigger は偽の人口統計情報を与えることで、モデルが正しい人口統計情報を予測に利用することを妨げ、固定されたMLモデルの出力予測における人口統計バイアスを効果的に覆い隠せることを、理論と実験の両面から示す。NLP と CV の両方のデータセットでの広範な実験により、広く使われている2つの公平性基準の下で、本手法は再学習ベースの手法よりもはるかに少ないデータ依存で、より優れた公平性の改善を達成できることが示された。コードは https://github.com/UCSB-NLP-Chang/Fairness-Reprogramming.git で公開されている。

Despite a surge of recent advances in promoting machine learning (ML) fairness, the existing mainstream approaches mostly require retraining or finetuning the entire weights of the neural network to meet the fairness criteria. However, this is often infeasible in practice for those large-scale trained models due to large computational and storage costs, low data efficiency, and model privacy issues. In this paper, we propose a new generic fairness learning paradigm, called FairReprogram, which incorporates the model reprogramming technique. Specifically, FairReprogram considers the case where models cannot be changed and appends to the input a set of perturbations, called the fairness trigger, which is tuned towards the fairness criteria under a min-max formulation. We further introduce an information-theoretic framework that explains why and under what conditions fairness goals can be achieved using the fairness trigger. We show both theoretically and empirically that the fairness trigger can effectively obscure demographic biases in the output prediction of fixed ML models by providing false demographic information that hinders the model from utilizing the correct demographic information to make the prediction. Extensive experiments on both NLP and CV datasets demonstrate that our method can achieve better fairness improvements than retraining-based methods with far less data dependency under two widely-used fairness criteria. Code is available at https://github.com/UCSB-NLP-Chang/Fairness-Reprogramming.git.

翻訳日:2023-02-19 11:17:55 公開日:2022-12-12

# 量子シミュレーションの今後数十年

The Coming Decades of Quantum Simulation ( http://arxiv.org/abs/2204.08905v2 )

ライセンス: Link先を確認

Joana Fraxanet, Tymoteusz Salamon and Maciej Lewenstein

(参考訳) 現代の量子技術は、誤り訂正を伴うフォールトトレラント量子コンピューティングにおいて大きな困難に直面しており、代わりに様々な量子シミュレーション(ノイズ中間スケール量子、NISQ)デバイス、アナログおよびデジタル量子シミュレータ、および量子アニールに焦点を当てている。このようなシステムには、物理的システムの量子力学を必ずしもシミュレートすることなく、巨大で制御可能な、堅牢で絡み合った、重畳状態を生成するという明確なニーズと要求がある。これは特に、デコヒーレンスの制御を可能にし、これらの状態を量子通信(例えば、より安全で速い方法で情報の効率的な転送を実現する)、量子計測、センシング、診断(例えば、光場の位相シフトを正確に測定したり、量子物質を診断するために)に使用できる。この章では、今後数十年にわたって量子シミュレーターの黄金の未来を展望する。

Contemporary quantum technologies face major difficulties in fault tolerant quantum computing with error correction, and focus instead on various shades of quantum simulation (Noisy Intermediate Scale Quantum, NISQ) devices, analogue and digital quantum simulators and quantum annealers. There is a clear need and quest for such systems that, without necessarily simulating quantum dynamics of some physical systems, can generate massive, controllable, robust, entangled, and superposition states. This will, in particular, allow the control of decoherence, enabling the use of these states for quantum communications (e.g. to achieve efficient transfer of information in a safer and quicker way), quantum metrology, sensing and diagnostics (e.g. to precisely measure phase shifts of light fields, or to diagnose quantum materials). In this Chapter we present a vision of the golden future of quantum simulators in the decades to come.

翻訳日:2023-02-16 08:46:46 公開日:2022-12-12

# 古典的情報原則の統一と拡張

Unification and Extension of Classic Information Principles ( http://arxiv.org/abs/2207.07577v5 )

ライセンス: Link先を確認

Jianfeng Xu

(参考訳) 情報理論の普遍的な枠組みを定式化することは有益である。本研究は, 客観的情報理論(oit)のセクシュタプルモデルが, 4つの基本的な仮定と情報を議論するのに十分かつ必要な条件であることを示す。これはOITで定義された各計量に対して示され、古典情報理論や一般的に用いられる原理に該当する例がある。さらに、原子情報は識別不能な基本情報として定義され、原子情報の組み合わせに対して体積加算性が証明される。これにより、単一の量子キャリアが持てる情報ボリュームが導出され、質量、エネルギー、時間に関する情報ボリュームに関する定理が証明される。これらの取り組みは、OITが様々な古典的な情報原理を統一し、情報、物質、エネルギーの量的関係を正確に明らかにできる新しい情報理論であることを示している。

To formulate a universal framework of information theory is beneficial. This study proves that the sextuple model of the objective information theory (OIT) is a sufficient and necessary condition for discussing information with four basic postulations. It is demonstrated that for each metric defined in the OIT, there is a corresponding example in classical information theories or commonly used principles. Furthermore, atomic information is defined as the indivisible elementary information and the volume additivity is proven for combinations of atomic information. Consequently, the information volume that a single quantum carrier can carry is derived and a theorem relating information volume to mass, energy, and time is proved. All these efforts illustrate that the OIT is a novel information theory that can unify a variety of classical information principles and even accurately reveal the quantitative relationship between information, matter and energy.

翻訳日:2023-02-09 20:38:27 公開日:2022-12-12

# 不確定性関係の何が非古典的なのか?

What is nonclassical about uncertainty relations? ( http://arxiv.org/abs/2207.11779v2 )

ライセンス: Link先を確認

Lorenzo Catani, Matthew Leifer, Giovanni Scala, David Schmid and Robert W. Spekkens

(参考訳) 不確定性関係は、単一の状態に対する異なる測定の結果をどの程度まで同時に予測可能にできるかの限界を表す。量子論に非自明な不確定性関係が存在することは、一般に、量子論が古典的世界観から逸脱する点の一つと考えられている。しかしこの見方は、非自明な不確定性関係を示しながらも、一般化された非文脈的な存在論的モデルを許すという意味で古典的世界観と整合する操作的理論が存在するという事実によって揺らいでいる。このことから、不確定性関係のどの側面が（もしあるとすれば）このような形では実現できず、真の非古典性の証拠となるのかという問いが生じる。ここでは、二値の結果を持つ一対の測定（例えば量子論におけるパウリ X とパウリ Z の観測量の測定）の予測可能性の間のトレードオフを記述する不確定性関係を考える。特定の対称性を満たす理論のクラスに対して、この予測可能性のトレードオフの関数形は、非文脈性によって線形曲線より下に制約されることを示す。量子ビットの量子論はこの対称性を持つため、その予測可能性のトレードオフが円の一部を描くという事実はこの非文脈的限界の破れであり、不確定性関係の関数形が文脈性の証拠となりうる一例である。また、量子論の比較対象となるいくつかの操作的理論に対する帰結を導き、3つの測定への一般化も考察する。

Uncertainty relations express limits on the extent to which the outcomes of distinct measurements on a single state can be made jointly predictable. The existence of nontrivial uncertainty relations in quantum theory is generally considered to be a way in which it entails a departure from the classical worldview. However, this perspective is undermined by the fact that there exist operational theories which exhibit nontrivial uncertainty relations but which are consistent with the classical worldview insofar as they admit of a generalized-noncontextual ontological model. This prompts the question of what aspects of uncertainty relations, if any, cannot be realized in this way and so constitute evidence of genuine nonclassicality. We here consider uncertainty relations describing the tradeoff between the predictability of a pair of binary-outcome measurements (e.g., measurements of Pauli X and Pauli Z observables in quantum theory). We show that, for a class of theories satisfying a particular symmetry property, the functional form of this predictability tradeoff is constrained by noncontextuality to be below a linear curve. Because qubit quantum theory has the relevant symmetry property, the fact that its predictability tradeoff describes a section of a circle is a violation of this noncontextual bound, and therefore constitutes an example of how the functional form of an uncertainty relation can witness contextuality. We also deduce the implications for a selected group of operational foils to quantum theory and consider the generalization to three measurements.

翻訳日:2023-02-03 22:05:42 公開日:2022-12-12
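The contrast drawn in the abstract above, a circular tradeoff for a qubit against a linear noncontextual bound, can be checked numerically. The sketch below is only illustrative: it assumes the quantum tradeoff is written as $\langle X\rangle^2 + \langle Z\rangle^2 \le 1$ and that the linear bound takes the form $|\langle X\rangle| + |\langle Z\rangle| \le 1$ (the abstract says only "a linear curve", so this exact form is an assumption). The function names are hypothetical.

```python
import math

def pauli_expectations(theta):
    """Return (<X>, <Z>) for the pure qubit state with Bloch vector (sin theta, 0, cos theta)."""
    return math.sin(theta), math.cos(theta)

def circle_value(x, z):
    """Left-hand side of the quantum tradeoff <X>^2 + <Z>^2 <= 1."""
    return x * x + z * z

def linear_value(x, z):
    """Left-hand side of the ASSUMED linear bound |<X>| + |<Z>| <= 1."""
    return abs(x) + abs(z)

# Sweep pure states between the Z eigenstate (theta = 0) and the X eigenstate (theta = pi/2).
for k in range(5):
    theta = k * math.pi / 8
    x, z = pauli_expectations(theta)
    print(f"theta={theta:.3f}  circle={circle_value(x, z):.3f}  linear={linear_value(x, z):.3f}")
```

Every pure state in this plane sits on the circle, while the states strictly between the two eigenstates exceed the assumed linear bound. That gap is the kind of violation the abstract describes.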

# 単光子検出器を用いた単一サイクルTHzフィールドの電気光学サンプリング

Electro-Optical Sampling of Single-Cycle THz Fields with Single-Photon Detectors ( http://arxiv.org/abs/2208.02103v2 )

ライセンス: Link先を確認

Taylor Shields, Adetunmise C. Dada, Lennart Hirsch, Seungjin Yoon, Jonathan M. R. Weaver, Daniele Faccio, Lucia Caspani, Marco Peccianti, Matteo Clerici

(参考訳) 超短パルスプローブを用いたテラヘルツ電場の電気光学サンプリングは、THz放射の電場を直接測定するための確立された手法である。この手法は通常、THz誘起複屈折による光位相シフトを記録するためにバランス検出に依存する。そのため、電気光学サンプリングの感度はプローブパルスのショットノイズによって制限されるが、例えばハイゼンベルク限界の位相推定のためにNOON状態を用いるといった量子計測のアプローチによって改善できる可能性がある。本稿では、単一光子検出器と、光プローブとしての弱いスクイーズド真空場を用いたTHz電気光学サンプリングの実験について報告する。本手法は、位相同期した単一光子検出器を用いて、プローブ状態の統計的性質によって制限される電場感度を達成し、量子増強型THzセンシングを目指すさらなる研究への道を開く。

Electro-optical sampling of Terahertz fields with ultrashort pulsed probes is a well-established approach for directly measuring the electric field of THz radiation. This technique usually relies on balanced detection to record the optical phase shift brought by THz-induced birefringence. The sensitivity of electro-optical sampling is, therefore, limited by the shot noise of the probe pulse, and improvements could be achieved using quantum metrology approaches using, e.g., NOON states for Heisenberg-limited phase estimation. We report on our experiments on THz electro-optical sampling using single-photon detectors and a weak squeezed vacuum field as the optical probe. Our approach achieves field sensitivity limited by the probe state statistical properties using phase-locked single-photon detectors and paves the way for further studies targeting quantum-enhanced THz sensing.

翻訳日:2023-02-02 10:05:49 公開日:2022-12-12

# 非平衡フォノン凝縮と相転移の完全量子論

Full quantum theory of nonequilibrium phonon condensation and phase transition ( http://arxiv.org/abs/2209.05086v3 )

ライセンス: Link先を確認

Xuanhua Wang, Jin Wang

(参考訳) フレーリッヒ凝縮 (Fröhlich condensation) は室温で起こる非平衡現象であり、多くの物理系や生物系で生じると期待されている。理論的には半世紀前に予言されたが、この凝縮の性質はいまだ解明されていない。このレターでは、Wu-Austin ハミルトニアンからフレーリッヒ凝縮の完全な量子論を導出し、非平衡性と非線形性によって誘起される二次相転移が、脱相関近似の有無にかかわらず大きな $D$ の極限で現れることの解析的証明を初めて与える。この臨界挙動は、外部源を古典的に扱うと観測できない。相転移には凝縮フォノンの統計分布の大きなゆらぎが伴い、ゆらぎを特徴づけるマンデルの Q 因子は外部エネルギー入力が過大な極限で負になることを示す。冷却原子の平衡 BEC とは対照的に、フレーリッヒ凝縮体は非平衡駆動の結果であり、ポンプが粒子数を、媒質が温度を定める役割を担う。したがって BEC は、ポンプを固定して媒質の温度を下げる（平衡の場合）か、媒質の温度を固定してポンプを強める（非平衡の場合）かのいずれかによって生じうる。

Fröhlich condensation is a room-temperature nonequilibrium phenomenon which is expected to occur in many physical and biological systems. Though predicted theoretically half a century ago, the nature of such condensation remains elusive. In this Letter, we derive a full quantum theory of Fröhlich condensation from the Wu-Austin Hamiltonian and present for the first time an analytical proof that a second-order phase transition induced by nonequilibrium and nonlinearity emerges in the large-$D$ limit with and without decorrelation approximation. This critical behavior cannot be witnessed if external sources are treated classically. We show that the phase transition is accompanied by large fluctuations in the statistical distribution of condensate phonons and that the Mandel-Q factor which characterizes fluctuations becomes negative in the limit of excessive external energy input. In contrast with the cold atom equilibrium BEC, the Fröhlich condensate is a result of the nonequilibrium driving where the pump plays a role of setting the number of particles, and the medium plays a role of setting the temperature. Hence, BEC can arise either by reducing the medium temperature at fixed pump (equilibrium case), or by increasing the pump at fixed medium temperature (nonequilibrium case).

翻訳日:2023-01-26 22:20:07 公開日:2022-12-12
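The abstract above uses the sign of the Mandel Q factor to characterize fluctuations. As a reminder of what that sign means, here is a minimal sketch using the standard definition $Q = (\mathrm{Var}(n) - \langle n\rangle)/\langle n\rangle$ for a number distribution $p_n$. The distributions below are generic textbook examples, not the phonon statistics derived in the paper, and the helper names are hypothetical.

```python
import math

def mandel_q(probabilities):
    """Mandel Q = (Var(n) - <n>) / <n> for a number distribution p[n], n = 0, 1, 2, ..."""
    mean = sum(n * p for n, p in enumerate(probabilities))
    second_moment = sum(n * n * p for n, p in enumerate(probabilities))
    variance = second_moment - mean ** 2
    return (variance - mean) / mean

def poisson(mean, n_max):
    """Poisson distribution truncated at n_max (the tail is negligible for n_max >> mean)."""
    return [math.exp(-mean) * mean ** n / math.factorial(n) for n in range(n_max + 1)]

def fock(n):
    """Number state: all probability on exactly n quanta."""
    return [0.0] * n + [1.0]

q_poisson = mandel_q(poisson(2.0, 60))  # Poissonian statistics: Q is zero up to rounding
q_fock = mandel_q(fock(3))              # sub-Poissonian statistics: Q is negative
print(q_poisson, q_fock)
```

A negative Q therefore signals number fluctuations narrower than those of a Poisson distribution with the same mean.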

# 偶数次元コンパクト多様体上のベレジン型量子化

Berezin-type quantization on even-dimensional compact manifolds ( http://arxiv.org/abs/2210.08814v2 )

ライセンス: Link先を確認

Rukmini Dey and Kohinoor Ghosh

(参考訳) 本稿では、コンパクトな偶数次元多様体 $M^{2d}$ 上で、低次元の骨格 $M_0$ を取り除いて残りが ${\mathbb R}^{2d}$ に微分同相となるようにし、これを ${\mathbb C}^d$ と同一視して ${\mathbb C}P^d$ に埋め込むことで、ベレジン型量子化が実現できることを示す。局所ポアソン構造とベレジン型量子化は ${\mathbb C}P^d$ から誘導される。再生核を持つヒルベルト空間が得られる。このヒルベルト空間上の有界線型作用素のシンボルはスター積を持ち、測度 0 の集合の外側で対応原理を満たす。この構成は微分同相写像に依存する。しかし、$X = M \setminus M_0$ が複素構造を持ち、$X \setminus X_0$（$X_0$ は測度 0 の集合または空集合）から ${\mathbb C}^d \setminus N_0$（$N_0$ は測度 0 の集合または空集合）への双正則写像があるとする。前と同様に ${\mathbb C}^d \setminus N_0$ を ${\mathbb C}^d$ に、さらに ${\mathbb C}P^d$ に埋め込むと、${\mathbb C}P^d$ から誘導されるベレジン型量子化が得られる。別の双正則写像を用いた場合、対応する2つのヒルベルト空間の間に、一方の再生核を他方の再生核に写す写像が存在し、同値な量子化が得られる。同様の構成として、任意の複素多様体を考え、局所座標を用いて ${\mathbb C}P^d$ から量子化を誘導する。コンパクト複素多様体上で大域的なベレジン量子化を定義する可能性を検討する。引き戻しテプリッツ作用素の定義を与え、測度 0 の集合を取り除いたコンパクト偶数次元多様体のテプリッツ量子化を示す。次に、コンパクトな滑らかな多様体上の引き戻しコヒーレント状態の簡単な構成を与える。これは著者らの以前の研究で定義されたものの簡略版である。

In this article we show that a Berezin-type quantization can be achieved on a compact even-dimensional manifold $M^{2d}$ by removing a skeleton $M_0$ of lower dimension such that what remains is diffeomorphic to ${\mathbb R}^{2d}$, which we identify with ${\mathbb C}^d$ and embed in ${\mathbb C}P^d$. A local Poisson structure and Berezin-type quantization are induced from ${\mathbb C}P^d$. We have a Hilbert space with a reproducing kernel. The symbols of bounded linear operators on the Hilbert space have a star product which satisfies the correspondence principle outside a set of measure zero. This construction depends on the diffeomorphism. However, suppose $X = M \setminus M_0$ has a complex structure and we have a biholomorphism from $X \setminus X_0$ (with $X_0$ of measure zero or empty) to ${\mathbb C}^d \setminus N_0$ (with $N_0$ of measure zero or empty). As before we embed ${\mathbb C}^d \setminus N_0$ in ${\mathbb C}^d$ and then into ${\mathbb C}P^d$ and we have a Berezin-type quantization induced from ${\mathbb C}P^d$. If we use another biholomorphism, we have a map of the two Hilbert spaces under consideration such that the reproducing kernel of one maps to the reproducing kernel of the other and we have an equivalent quantization. We have a similar construction where we consider an arbitrary complex manifold and use local coordinates to induce the quantization from ${\mathbb C}P^d$. We study the possibility of defining a global Berezin quantization on compact complex manifolds. We give a definition of pullback Toeplitz operators and exhibit Toeplitz quantization of compact even-dimensional manifolds after removing a set of measure zero. Next we give a simple construction of pullback coherent states on compact smooth manifolds which are simplified versions of those defined in an earlier work by the authors.

翻訳日:2023-01-22 07:17:37 公開日:2022-12-12

# 時間・周波数分解核共鳴散乱スペクトル

Unraveling Time- and Frequency-Resolved Nuclear Resonant Scattering Spectra ( http://arxiv.org/abs/2210.09848v2 )

ライセンス: Link先を確認

Lukas Wolff and Jörg Evers

(参考訳) 極めて狭い線幅と卓越したコヒーレンス特性を持つメスバウアー核は、硬X線のエネルギー領域における量子光学、分光、ダイナミクスのための有望なプラットフォームである。さらなる進展のための重要な要件は、より強力な測定手法とデータ解析手法の開発である。その一つのアプローチとして、時間分解スペクトルと周波数分解スペクトルを別々に測定する確立された手法に対し、最近の実験では時間と周波数を同時に分解する測定が用いられている。これらの実験では、周波数依存性は調整可能な単一線の核参照吸収体を用いて実現される。本稿では、このような時間・周波数分解核共鳴散乱スペクトルに対する、周波数-周波数領域での分光法と解析手法を開発する。提案手法は、実験で得られる強度を時間軸に沿ってフーリエ変換することに基づいており、その結果として複素数値の周波数-周波数相関 (FFC) スペクトルが得られる。これらの FFC スペクトルは、異なる散乱寄与を分離した特に単純な構造を示すだけでなく、核ターゲットの特性や、ターゲット応答の複素数値の核共鳴部分に直接アクセスできることを示す。第2部では、参照吸収体で共鳴散乱されたX線に対する追加の位相制御が本手法にもたらす可能性を探る。このような制御により特定の散乱経路に選択的にアクセスでき、パラメータ空間を特定の周波数や時間の範囲に制限することなく、それぞれを個別に解析できる。すべての結果は、核前方散乱、およびメスバウアー核の薄い層を含む薄膜X線共振器での反射という関連する例で示される。

Owing to their extremely narrow line-widths and exceptional coherence properties, Mössbauer nuclei form a promising platform for quantum optics, spectroscopy and dynamics at energies of hard x-rays. A key requirement for further progress is the development of more powerful measurement and data analysis techniques. As one approach, recent experiments have employed time- and frequency-resolved measurements, as compared to the established approaches of measuring time-resolved or frequency-resolved spectra separately. In these experiments, the frequency-dependence is implemented using a tunable single-line nuclear reference absorber. Here, we develop spectroscopy and analysis techniques for such time- and frequency-resolved Nuclear Resonant Scattering spectra in the frequency-frequency domain. Our approach is based on a Fourier-transform of the experimentally accessible intensities along the time axis, which results in complex-valued frequency-frequency correlation (FFC) spectra. We show that these FFC spectra not only exhibit a particularly simple structure, disentangling the different scattering contributions, but also allow one to directly access nuclear target properties and the complex-valued nuclear resonant part of the target response. In a second part, we explore the potential of an additional phase control of the x-rays resonantly scattered off of the reference absorber for our scheme. Such control provides selective access to specific scattering pathways, allowing for their separate analysis without the need to constrain the parameter space to certain frequency or time limits. All results are illustrated with pertinent examples in Nuclear Forward Scattering and in reflection off of thin-film x-ray cavities containing thin layers of Mössbauer nuclei.

翻訳日:2023-01-22 04:28:11 公開日:2022-12-12
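The core step named in the abstract above, a Fourier transform of the measured intensity along the time axis for each setting of the reference absorber, can be sketched on synthetic data. Everything about the data here is an assumption made for illustration: the toy intensity (an exponential decay with a beat at the reference detuning), the grid size `N_TIME`, the decay rate `GAMMA`, and all function names are hypothetical and are not taken from the paper.

```python
import cmath
import math

N_TIME = 64    # number of time bins (hypothetical)
GAMMA = 0.01   # decay rate per time bin (hypothetical)

def toy_intensity(detuning_bin):
    """Toy time trace: exponential decay with a beat at the reference detuning."""
    omega = 2 * math.pi * detuning_bin / N_TIME
    return [math.exp(-GAMMA * t) * (1 + math.cos(omega * t)) for t in range(N_TIME)]

def fourier_along_time(trace):
    """Plain discrete Fourier transform of one time trace; the result is complex-valued."""
    n = len(trace)
    return [
        sum(x * cmath.exp(-2j * math.pi * m * t / n) for t, x in enumerate(trace))
        for m in range(n)
    ]

def ffc_map(detuning_bins):
    """One complex spectrum per detuning: keys are detunings, values run over Fourier frequency."""
    return {d: fourier_along_time(toy_intensity(d)) for d in detuning_bins}

def dominant_bin(spectrum):
    """Strongest positive-frequency bin, ignoring the zero-frequency component."""
    half = len(spectrum) // 2
    return max(range(1, half), key=lambda m: abs(spectrum[m]))

spectra = ffc_map(range(3, 11))
peaks = {d: dominant_bin(s) for d, s in spectra.items()}
print(peaks)
```

In this toy model the dominant Fourier frequency tracks the detuning, so the map shows a diagonal line in the frequency-frequency plane. Real spectra would carry the target response as well, which this sketch does not attempt to model.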

# 単分子磁石における$\pi$-Radicalの空間分解電子スピン共鳴

Spatially Resolving Electron Spin Resonance of $\pi$-Radical in Single-molecule Magnet ( http://arxiv.org/abs/2210.10235v2 )

ライセンス: Link先を確認

Ryo Kawaguchi, Katsushi Hashimoto, Toshiyuki Kakudate, Keiichi Katoh, Masahiro Yamashita, Tadahiro Komeda

(参考訳) 磁性分子のスピントロニクス特性は科学的に大きな注目を集めており、特に量子情報処理のための量子ビットに重点が置かれている。単分子磁石ビス(フタロシアニナト)Tb(III) (TbPc2) は最もよく調べられた例の一つであり、Pc配位子の非局在化した $\pi$ ラジカル電子スピンが、局在したTbスピン量子ビットの読み出しと媒介において重要な役割を果たす。走査型トンネル顕微鏡 (STM) に実装した電子スピン共鳴 (ESR) 技術を用いて、2単層のNaCl膜によってCu(100)基板から切り離された単一TbPc2分子の局所ESRを測定し、$\pi$ ラジカルスピンを同定した。S = 1/2 スピンに期待される共鳴条件において、配位子の位置でESR信号を検出した。この結果から、$\pi$ ラジカル電子は配位子内に非局在化しており、化学的環境の影響を受けやすい分子内結合を示すことが明らかになった。

The spintronic properties of magnetic molecules have attracted significant scientific attention. Special emphasis has been placed on the qubit for quantum information processing. The single-molecule magnet bis(phthalocyaninato (Pc)) Tb(III) (TbPc2) is one of the best examined cases, in which the delocalized $\pi$-radical electron spin of the Pc ligand plays the key role in reading and intermediating the localized Tb spin qubits. We used the electron spin resonance (ESR) technique implemented on a scanning tunneling microscope (STM) to measure the local ESR of a single TbPc2 molecule decoupled from the Cu(100) substrate by a two-monolayer NaCl film, in order to identify the $\pi$-radical spin. We detected the ESR signal at the ligand positions at the resonance condition expected for the S = 1/2 spin. The results reveal that the $\pi$-radical electron is delocalized within the ligands and exhibits intramolecular coupling susceptible to the chemical environment.

翻訳日:2023-01-22 01:53:03 公開日:2022-12-12

# メソスコピックフロケット系の熱化の普遍クラス

Universality classes of thermalization for mesoscopic Floquet systems ( http://arxiv.org/abs/2210.13444v2 )

ライセンス: Link先を確認

Alan Morningstar, David A. Huse, Vedika Khemani

(参考訳) 周期的に駆動されるメソスコピックまたは中間スケールの量子カオス系で発生する熱化の異なる相を同定する。その際、新しいフロッケ熱アンサンブルである「ラダーアンサンブル」を同定し、それは長年駆動系に適した平衡アンサンブルであると仮定されてきた「特徴のない無限温度」状態とは定性的に区別される。我々が見つけた段階は大きく分類され (i)そのシステムが非可逆的に$\omega$のエネルギーをドライブ、すなわちフロッケの熱化と交換するか否か (ii)フロッケ加熱を行う系における最終的な平衡を記述するフロッケ熱アンサンブル。これらの位相はメソスコピック系における振舞いのレギュレーションを表すものであるが、駆動周波数$\omega$がシステムサイズ$N$にスケールアップする特定の大系極限において鋭く定義される: 弱い$N$依存の$\omega(N) \sim \log N$から、$\omega(N) \sim \sqrt{N}$から$\omega(N) \sim N$までの周波数スケーリングを調べる。本研究では,フロッケ熱化が崩壊する遷移は広い駆動周波数で起こることを示すとともに,フロッケ熱化を行わない系は,フロッケ帯のレア共鳴の有無によって区別されることを示す。熱化相図はフロケット系の数値的研究と中間スケール量子シミュレータの実験的研究に関係し、どちらもN$と$\omega$の間のスケールの清浄な分離を欠く有限サイズのシステムに限定されている。我々の研究の顕著な予測は、単純な初期状態からの実験的に観測可能なクエンチプロトコルが、異なる温度での状態のグローバルな重ね合わせである新しいタイプのシュロディンガー・キャット状態へのフロケ熱化を示すことができるということである。

We identify several distinct phases of thermalization that can occur in periodically driven mesoscopic or intermediate-scale quantum chaotic systems. In doing so, we also identify a new Floquet thermal ensemble, the "ladder ensemble", that is qualitatively distinct from the "featureless infinite-temperature" state that has long been assumed to be the appropriate equilibrium ensemble for driven systems. The phases we find can be coarsely classified by (i) whether or not the system irreversibly exchanges energy of order $\omega$ with the drive, i.e., Floquet thermalizes, and (ii) the Floquet thermal ensemble describing the final equilibrium in systems that do Floquet thermalize. These phases are representative of regimes of behavior in mesoscopic systems, but they are sharply defined in a particular large-system limit where the drive frequency $\omega$ scales up with system size $N$ as the $N\to\infty$ limit is taken: we examine frequency scalings ranging from a weakly $N$-dependent $\omega(N) \sim \log N$, to stronger scalings ranging from $\omega(N) \sim \sqrt{N}$ to $\omega(N) \sim N$. We show that the transition where Floquet thermalization breaks down happens at an extensive drive frequency and, beyond that, systems that do not Floquet thermalize are distinguished based on the presence or absence of rare resonances across Floquet zones. We produce a thermalization phase diagram that is relevant for numerical studies of Floquet systems and experimental studies on intermediate-scale quantum simulators, both of which are limited to finite-size systems that lack a clean separation of scales between $N$ and $\omega$. A striking prediction of our work is that certain experimentally observable quench protocols from simple initial states can show Floquet thermalization to a novel type of Schrödinger-cat state that is a global superposition of states at distinct temperatures.

翻訳日:2023-01-21 18:46:18 公開日:2022-12-12

# スピン液体ハミルトニアンの量子シミュレーターにおける分数統計の探索

Probing fractional statistics in quantum simulators of spin liquid Hamiltonians ( http://arxiv.org/abs/2211.09784v2 )

ライセンス: Link先を確認

Shiyu Zhou, Maria Zelenayova, Oliver Hart, Claudio Chamon, Claudio Castelnovo

(参考訳) プログラマブル量子デバイスの最近の進歩は、トポロジカル量子スピン液体相の実現と研究にそれらを使うことの興味深い可能性をもたらした。この新しくエキサイティングな方向性は、このようなエキゾチックで非常に絡み合ったフェーズの存在を探究し、決定する方法に関する重要な研究課題をもたらす。最も有望なツールの1つは、トポロジカルな励起の挙動、特にその分数統計の研究である。本研究では、これを達成するための一般的な経路を示し、組合せゲージ対称性の助けを借りて実装された$\mathbb{Z}_2$トポロジカルスピン液体の特定の場合について説明する。我々は,準粒子干渉法を用いて分数統計量のシグネチャを研究するための便利なアーキテクチャを設計し,その頑健性を評価するとともに,雑音のある量子プログラマブルデバイスで一般的に普及する効果を強調する。我々が探している署名は、システム内の量子コヒーレンスと量子干渉効果に重大な影響を与えているため、これらのデバイスの「量子性」を明確にテストするのに役立つ。

Recent advances in programmable quantum devices brought to the fore the intriguing possibility of using them to realise and investigate topological quantum spin liquid phases. This new and exciting direction brings about important research questions on how to probe and determine the presence of such exotic, highly entangled phases. One of the most promising tools is investigating the behaviour of the topological excitations, and in particular their fractional statistics. In this work we put forward a generic route to achieve this, and we illustrate it in the specific case of $\mathbb{Z}_2$ topological spin liquids implemented with the aid of combinatorial gauge symmetry. We design a convenient architecture to study signatures of fractional statistics via quasiparticle interferometry, and we assess its robustness to diagonal and off-diagonal disorder, as well as to dephasing -- effects that are generally pervasive in noisy quantum programmable devices. A useful counterpart of our scheme is that it provides a clear test of the `quantumness' of these devices, since the signatures that we are looking for crucially hinge on quantum coherence and quantum interference effects in the system.

翻訳日:2023-01-19 06:41:42 公開日:2022-12-12

# 部分絡み状態からの絡み替えにおける局所予測可能性とコヒーレンス対分散絡み合い

Local predictability and coherence versus distributed entanglement in entanglement swapping from partially entangled pure states ( http://arxiv.org/abs/2211.07539v2 )

ライセンス: Link先を確認

Jonas Maziero, Marcos L. W. Basso, Lucas C. Céleri

(参考訳) 例えば $P(\rho_{A})^{2} + C(\rho_{A})^{2} + E(|\Psi\rangle_{AB})^{2}=1$ のような完全相補性関係は、二部純粋状態の局所予測可能性 $P$、局所コヒーレンス $C$、およびエンタングルメント $E$ を制約する。局所コヒーレンスがゼロである部分的にエンタングルした純粋状態の特定のクラスに初期準備された量子ビット対に対して、これらの関係は文献 [Phys. Lett. A, 451, 128414 (2022)] で用いられ、エンタングルメントスワッピングプロトコル(ESP)におけるベル基底測定後の状態の最大エンタングル成分の確率と、測定前状態の局所予測可能性との間の操作的関係が与えられた。本稿では、この結果を一般の純粋初期状態に拡張し、ESPにおける $P$、$C$ と分配されるエンタングルメントとの関係を確立する。我々は、IBMの量子コンピュータを用いて、これらの一般的な理論結果のいくつかの事例を実験的に検証する。

Complete complementarity relations, such as $P(\rho_{A})^{2} + C(\rho_{A})^{2} + E(|\Psi\rangle_{AB})^{2}=1$, constrain the local predictability, $P$, the local coherence, $C$, and the entanglement, $E$, of bipartite pure states. For pairs of qubits prepared initially in a particular class of partially entangled pure states with null local coherence, these relations were used in Ref. [Phys. Lett. A, 451, 128414 (2022)] to provide an operational connection between the local predictability of the pre-measurement states and the probability of the maximally entangled components of the states after the Bell-basis measurement of the entanglement swapping protocol (ESP). In this article, we extend this result to general pure initial states, establishing the relation between $P$, $C$, and the distributed entanglement in the ESP. We use IBM's quantum computers to verify experimentally some instances of these general theoretical results.
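
The relation quoted in this abstract can be checked numerically for two qubits. The sketch below is only an illustration under assumed definitions that the abstract does not spell out: $P = |\rho_{00} - \rho_{11}|$, $C = 2|\rho_{01}|$ for the reduced state of qubit A, and $E$ taken as the concurrence of the pure state. The function name `complementarity_terms` is hypothetical.

```python
import numpy as np

def complementarity_terms(psi):
    """Return (P, C, E) for qubit A of a two-qubit pure state.

    Assumed definitions (not taken from the abstract):
      P = |rho_00 - rho_11|   (local predictability)
      C = 2 |rho_01|          (local coherence)
      E = 2 |det M|           (concurrence of the pure state)
    where M[a, b] is the amplitude of |a>_A |b>_B.
    """
    psi = np.asarray(psi, dtype=complex)
    psi = psi / np.linalg.norm(psi)      # normalise the state vector
    m = psi.reshape(2, 2)                # amplitudes as a 2x2 matrix
    rho_a = m @ m.conj().T               # partial trace over qubit B
    p = abs(rho_a[0, 0] - rho_a[1, 1])
    c = 2.0 * abs(rho_a[0, 1])
    e = 2.0 * abs(np.linalg.det(m))
    return p, c, e

# Random pure state: the three squared terms should sum to 1.
rng = np.random.default_rng(0)
psi = rng.normal(size=4) + 1j * rng.normal(size=4)
p, c, e = complementarity_terms(psi)
print(p**2 + c**2 + e**2)
```

With these definitions the sum equals $(\rho_{00} + \rho_{11})^2 = 1$ for any normalised two-qubit pure state, which is why the check holds to machine precision.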

翻訳日:2023-01-18 06:51:01 公開日:2022-12-12

# 量子畳み込みニューラルネットワークを用いた物質の量子相のモデル非依存学習

Model-Independent Learning of Quantum Phases of Matter with Quantum Convolutional Neural Networks ( http://arxiv.org/abs/2211.11786v2 )

ライセンス: Link先を確認

Yu-Jie Liu, Adam Smith, Michael Knap, and Frank Pollmann

(参考訳) 量子畳み込みニューラルネットワーク(QCNN)は、ギャップのある量子物質相の分類器として導入されている。本稿では、相を保つ摂動の下で変化しない秩序パラメータを発見するようにQCNNを訓練するための、モデル非依存のプロトコルを提案する。量子相の固定点波動関数から訓練を開始し、系の対称性を尊重する並進不変なノイズを加えて、短い長さスケールでの固定点構造を隠す。このアプローチを、1次元において時間反転対称性で保護された相でQCNNを訓練し、自明な秩序、対称性の破れた秩序、および対称性に保護されたトポロジカル秩序を示す複数の時間反転対称モデルでテストすることで例示する。QCNNは、3つの相すべてを識別し、相境界の位置を正確に予測する秩序パラメータの組を発見する。提案プロトコルは、プログラム可能な量子プロセッサ上での量子相分類器のハードウェア効率の良い訓練への道を開くものである。

Quantum convolutional neural networks (QCNNs) have been introduced as classifiers for gapped quantum phases of matter. Here, we propose a model-independent protocol for training QCNNs to discover order parameters that are unchanged under phase-preserving perturbations. We initiate the training sequence with the fixed-point wavefunctions of the quantum phase and then add translation-invariant noise that respects the symmetries of the system to mask the fixed-point structure on short length scales. We illustrate this approach by training the QCNN on phases protected by time-reversal symmetry in one dimension, and test it on several time-reversal symmetric models exhibiting trivial, symmetry-breaking, and symmetry-protected topological order. The QCNN discovers a set of order parameters that identifies all three phases and accurately predicts the location of the phase boundary. The proposed protocol paves the way towards hardware-efficient training of quantum phase classifiers on a programmable quantum processor.

翻訳日:2023-01-17 23:17:56 公開日:2022-12-12

# エッジからバルクへ:トポロジカル非局所量子ビットのキャビティ誘起変位

From edge to bulk: Cavity induced displacement of topological non-local qubits ( http://arxiv.org/abs/2211.14145v2 )

ライセンス: Link先を確認

F. P. M. Méndez-Córdoba, F. J. Rodríguez, C. Tejedor, L. Quiroga

(参考訳) マヨラナフェルミオンの連結性を調整するための位相鎖への選択的キャビティカップリングの能力について検討した。非局所マヨラナフェルミオンペアリングに関連するトポロジカルキュービット(TQ)が、特定の物理的部位との光-物質相互作用への選択的アクセスを通じて、トポロジカルチェーンの端からバルクへ移動可能であることを示す。特に, 鎖状キャビティ結合ジオメトリーの基底状態特性に関する総合的DMRG研究を行い, 強い結合状態における解析的知見を検証した。この新しい種類のマヨラナフェルミオン相関生成プロセスは、新しいキャビティ光子特性を持つ。また,キャビティ・マッター結合強度の急冷後の時間変化を考慮し,高い非自明性物質(majorana)相関の発展はキャビティ内に測定可能な非古典光子インプリントを欠くことを示した。これにより、任意の長さのトポロジカル連鎖におけるTQ非局所相関を動的に生成する新たな方法が提供される。

We investigate the ability of selective cavity coupling to a topological chain for tailoring the connectivity of Majorana fermions. We show how topological qubits (TQs), associated with non-local Majorana fermion pairing, can be moved from the edge to the bulk of a topological chain through selective access to light-matter interaction with specific physical sites. In particular, we present a comprehensive DMRG study of ground-state features in different chain-cavity coupling geometries and validate analytical insights in the strong coupling regime. This new kind of Majorana fermion correlation generation process comes with new cavity photon features. Moreover, by considering the time evolution after a sudden quench of the cavity-matter coupling strength, we show that the development of high non-trivial matter (Majorana) correlations leaves off measurable non-classical photon imprints in the cavity. New ways to dynamically generate TQ nonlocal correlations in topological chains of arbitrary length are thus provided, opening alternative routes to controllable long-range entanglement in hybrid photonic solid-state systems.

翻訳日:2023-01-17 20:41:34 公開日:2022-12-12

# 機械学習による適切な回転フレームの構築

Machine-learning-assisted construction of appropriate rotating frame ( http://arxiv.org/abs/2211.15269v3 )

ライセンス: Link先を確認

Yoshihiro Michishita

(参考訳) ニューラルネットワークによる機械学習は、自然言語処理、画像認識、ゲーム勝利、さらには物理学の問題など、さまざまなタスクのための、ますます強力なツールになりつつある。数値計算への機械学習の適用と実験的な検出の支援については,多くの研究があるが,解析手法の発見に機械学習を適用する方法はあまり研究されていない。本稿では,機械学習を用いて解析手法を見つける手法を提案する。本研究では,時間周期ハミルトニアンをニューラルネットワークに入力するだけで,フロッケマグヌス展開を‘導出’することができることを実証し,周期駆動系の適切な回転フレームを導出する。また,本手法は,他のシステムにおける理論的枠組みの発見にも適用可能であると論じる。

Machine learning with neural networks is becoming an increasingly powerful tool for various tasks, such as natural language processing, image recognition, game playing, and even problems in physics. Although there are many studies on the application of machine learning to numerical calculation and to assisting experimental detection, methods for applying machine learning to find analytical methods are poorly studied. In this letter, we propose methods that use machine learning to find analytical methods. We demonstrate that recurrent neural networks can "derive" the Floquet-Magnus expansion just from the time-periodic Hamiltonian given as input, and can derive the appropriate rotating frame in a periodically driven system. We also argue that this method is applicable to finding other theoretical frameworks in other systems.

翻訳日:2023-01-17 15:09:33 公開日:2022-12-12

# ホログラフィックメタサーフェストランスシーバーの最適位相シフトの学習

Learning Optimal Phase-Shifts of Holographic Metasurface Transceivers ( http://arxiv.org/abs/2301.03371v1 )

ライセンス: Link先を確認

Debamita Ghosh, Manjesh K. Hanawal, Nikola Zlatanov

(参考訳) ホログラフィー・トランスシーバー(HMT)は,無線通信システムのカバレッジと速度を高める新しい技術である。しかし,HMT支援無線通信システムにおける正確なチャネル状態情報の取得は,これらの目標達成に不可欠である。本論文では,遠距離チャネルモデルのためのHMTにおける最適位相シフトの学習アルゴリズムを提案する。提案手法は遠方界領域におけるチャネルゲインの構造を活用し,受信信号における雑音の存在下での最適位相シフトを学習する。提案アルゴリズムにより推定された最適位相シフトが真の値から逸脱する確率はパイロット信号数で指数関数的に減衰することを示す。大規模な数値シミュレーションは、理論上の保証を検証し、最先端の政策と比較して大きな効果を示す。

Holographic metasurface transceivers (HMTs) are an emerging technology for enhancing the coverage and rate of wireless communication systems. However, acquiring accurate channel state information in HMT-assisted wireless communication systems is critical for achieving these goals. In this paper, we propose an algorithm for learning the optimal phase-shifts at an HMT for the far-field channel model. Our proposed algorithm exploits the structure of the channel gains in the far-field regions and learns the optimal phase-shifts in the presence of noise in the received signals. We prove that the probability that the optimal phase-shifts estimated by our proposed algorithm deviate from the true values decays exponentially in the number of pilot signals. Extensive numerical simulations validate the theoretical guarantees and also demonstrate significant gains compared to state-of-the-art policies.

翻訳日:2023-01-15 23:24:22 公開日:2022-12-12

# 熱メジャライゼーション多面体とその縮退

The Thermomajorization Polytope and Its Degeneracies ( http://arxiv.org/abs/2212.04305v2 )

ライセンス: Link先を確認

Frederik vom Ende, Emanuel Malvetti

(参考訳) 将来熱錐、すなわち与えられた初期状態によって熱メジャライズされるすべての状態の集合が、準古典的領域において凸多面体を形成すること、また置換をこの多面体の端点に対応づける写像を明示的に書き下せることはよく知られている。そのような端点が与えられたとき、初期状態をその端点状態に写すギブス確率行列の式をレビューし、その背後にある単純な構造を明らかにする。これにより輸送多面体の理論とのつながりが得られ、「well-structured」および「stable」なギブス状態の概念が導かれる。前者は端点状態の数が最大であることに関係し、後者は準古典的領域において熱メジャライゼーションが半順序となる場合を特徴付ける。これは巡回的な状態移行が不可能であることに対応する。さらに、多面体の縮退、すなわち端点写像が2つの異なる置換を同じ状態に写すかどうかを確認するための簡単な基準を与える。

It is well known that the future thermal cone -- which is the set of all states thermomajorized by a given initial state -- forms a convex polytope in the quasi-classical realm, and that one can explicitly write down a map which relates the permutations to the extreme points of this polytope. Given any such extreme point we review a formula for a Gibbs-stochastic matrix that maps the initial state to said extremal state, and we uncover the simple underlying structure. This allows us to draw a connection to the theory of transportation polytopes, which leads to the notions of ``well-structured'' and ``stable'' Gibbs states. While the former relates to the number of extremal states being maximal, the latter characterizes when thermomajorization is a partial order in the quasi-classical realm; this corresponds to the impossibility of cyclic state transfers. Moreover, we give simple criteria for degeneracy of the polytope, that is, for checking whether the extreme point map maps two different permutations to the same state.

翻訳日:2023-01-09 18:50:30 公開日:2022-12-12

# 繰り込み群による局所ハミルトニアンの基底状態エネルギーの下界

Lower Bounding Ground-State Energies of Local Hamiltonians Through the Renormalization Group ( http://arxiv.org/abs/2212.03014v2 )

ライセンス: Link先を確認

Ilya Kull, Norbert Schuch, Ben Dive, Miguel Navascués

(参考訳) 繰り込みスキームが与えられたとき、多体量子系の実現可能な局所密度行列の集合に対する扱いやすい凸緩和を定式化する方法を示す。この緩和は、増大していく格子サイト集合の縮約状態の間に制約の階層を導入することで得られる。基礎となる繰り込み手順の粗視化写像によってこれらの制約の大部分が取り除かれ、残った制約は妥当な計算資源で課すことができる。これを用いると、得られた縮約量子状態の凸緩和上で線形最適化を行うことにより、任意の局所ハミルトニアンの基底状態エネルギーの厳密な下界が得られる。下界の質は繰り込みスキームの選び方に決定的に依存し、スキームは対象のハミルトニアンに合わせて調整しなければならない。本手法を1次元の並進不変スピンモデルに適用し、$n\gtrsim 100$ スピンの局所並進不変状態上で最適化して得られるものに匹敵するエネルギー下界を得る。この実証にとどまらず、この一般的な方法は、高次元のスピン系、電子構造問題、さらにエンタングルメントや非局所性の検出といった様々な多体最適化問題など、幅広い問題に適用できる。

Given a renormalization scheme, we show how to formulate a tractable convex relaxation of the set of feasible local density matrices of a many-body quantum system. The relaxation is obtained by introducing a hierarchy of constraints between the reduced states of ever-growing sets of lattice sites. The coarse-graining maps of the underlying renormalization procedure serve to eliminate a vast number of those constraints, such that the remaining ones can be enforced with reasonable computational means. This can be used to obtain rigorous lower bounds on the ground state energy of arbitrary local Hamiltonians, by performing a linear optimization over the resulting convex relaxation of reduced quantum states. The quality of the bounds crucially depends on the particular renormalization scheme, which must be tailored to the target Hamiltonian. We apply our method to 1D translation-invariant spin models, obtaining energy bounds comparable to those attained by optimizing over locally translation-invariant states of $n\gtrsim 100$ spins. Beyond this demonstration, the general method can be applied to a wide range of other problems, such as spin systems in higher spatial dimensions, electronic structure problems, and various other many-body optimization problems, such as entanglement and nonlocality detection.

翻訳日:2023-01-09 17:48:21 公開日:2022-12-12

# $su(2)$ の可約表現に対する近傍可換行列の構成と緒方の定理への応用

Constructing Nearby Commuting Matrices for Reducible Representations of $su(2)$ with an Application to Ogata's Theorem ( http://arxiv.org/abs/2212.06012v1 )

ライセンス: Link先を確認

David Herrera (Rutgers University)

(参考訳) フォン・ノイマンの予想を解決した arXiv:1111.5933 の緒方の定理は、サイト数 $N$、固定されたサイト次元 $d$ の巨視的可観測量に対応する任意個数の行列が、$N \to \infty$ で漸近的に可換な可観測量の近傍にあるという、きわめて非自明な結果を示した。本論文では、既約部分表現の多重度がある種の単調減少の振る舞いを示す、正規化された高度に可約な $su(2)$ の表現に対して、近傍の可換行列を構成する手法を開発する。次に、サイト次元 $d=2$ の場合の緒方の定理の構成的証明を、近傍の可観測量がどれほど近いかの明示的な評価とともに与える。さらに、arXiv:1012.3494 で探究された時間反転対称性への応用を動機として、我々の構成は、実の巨視的可観測量が漸近的に実の可換可観測量の近傍にあるという性質を持つ。

Resolving a conjecture of von Neumann, Ogata's theorem in arXiv:1111.5933 showed the highly nontrivial result that arbitrarily many matrices corresponding to macroscopic observables with $N$ sites and a fixed site dimension $d$ are asymptotically nearby commuting observables as $N \to \infty$. In this paper, we develop a method to construct nearby commuting matrices for normalized highly reducible representations of $su(2)$ whose multiplicities of irreducible subrepresentations exhibit a certain monotonically decreasing behavior. We then provide a constructive proof of Ogata's theorem for site dimension $d=2$ with explicit estimates for how close the nearby observables are. Moreover, motivated by the application to time-reversal symmetry explored in arXiv:1012.3494, our construction has the property that real macroscopic observables are asymptotically nearby real commuting observables.

翻訳日:2023-01-09 16:28:53 公開日:2022-12-12

# 開量子系に対する作用素成長仮説

An operator growth hypothesis for open quantum systems ( http://arxiv.org/abs/2212.06180v1 )

ライセンス: Link先を確認

Budhaditya Bhattacharjee, Xiangyu Cao, Pratik Nandy, Tanay Pathak

(参考訳) Phys. Rev. X 9, 041017 の形式を拡張し、ある種の開量子系における作用素成長仮説を与えることを目指す。我々の結果は、マルコフ的ダイナミクスに支配される散逸的 $q$ 体 Sachdev-Ye-Kitaev(SYK$_q$)モデルの研究に基づく。「作用素サイズ集中」の概念を導入し、これにより大きな $q$ の極限において、2組のランチョス係数($a_n$ および $b_n$)の漸近的な線形挙動を図式的かつ組合せ論的に証明できる。我々の結果は、大きな $N$ の極限における有限 $q$ の半解析的結果、および有限 $q$・有限 $N$ における数値的アーノルディ反復と整合する。結果として、クリロフ複雑性は指数関数的に成長した後、散逸強度の逆数に対して対数的に増大する時刻で飽和する。複雑性の成長は閉じた系の結果と比べて抑制されるが、それでも正規化された非時間順序相関関数(OTOC)の成長の上界を与える。我々は、これが任意の散逸的(開)量子系に対して一般的に成り立つと予想し、そのような場合にカオス境界を一般化しうると考える。また、双対な重力側からの結果のもっともらしい説明も与える。

Extending the formalism of Phys. Rev. X 9, 041017, we aim to provide an operator growth hypothesis in certain open quantum systems. Our results are based on the study of the dissipative $q$-body Sachdev-Ye-Kitaev (SYK$_q$) model, governed by Markovian dynamics. We introduce a notion of "operator size concentration" which allows a diagrammatic and combinatorial proof of the asymptotic linear behavior of the two sets of Lanczos coefficients ($a_n$ and $b_n$) in the large $q$ limit. Our results agree with the semi-analytics at finite $q$ in the large $N$ limit, and with the numerical Arnoldi iteration at finite $q$ and finite $N$. As a result, Krylov complexity exhibits exponential growth followed by a saturation at a time that grows logarithmically with the inverse dissipation strength. The growth of complexity is suppressed compared to the closed system results, yet it upper bounds the growth of the normalized out-of-time-ordered correlator (OTOC). We conjecture this to be generic for any dissipative (open) quantum system, and it may generalize the chaos bound in such cases. We also provide a plausible explanation of the results from the dual gravitational side.
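
To make the Lanczos coefficients mentioned in this abstract concrete, here is a minimal sketch of the standard closed-system operator Lanczos recursion. It is not the open-system Arnoldi or bi-Lanczos procedure the abstract refers to. It assumes a Hermitian Hamiltonian, the infinite-temperature inner product $(A|B) = \mathrm{Tr}(A^\dagger B)/d$, and computes only the $b_n$. The function name `lanczos_coefficients` and its parameters are hypothetical.

```python
import numpy as np

def lanczos_coefficients(h, op, n_max=10, tol=1e-12):
    """Lanczos coefficients b_n of `op` under the Liouvillian L = [h, .].

    Closed-system illustration only; inner product (A|B) = Tr(A^dagger B)/d.
    The recursion stops early once the Krylov space is exhausted (b_n < tol).
    """
    h = np.asarray(h, dtype=complex)
    op = np.asarray(op, dtype=complex)
    d = h.shape[0]

    def norm(a):
        return np.sqrt(np.trace(a.conj().T @ a).real / d)

    prev = np.zeros_like(op)             # O_{n-1}, starts as the zero operator
    cur = op / norm(op)                  # O_0, normalised initial operator
    b_prev = 0.0
    coefficients = []
    for _ in range(n_max):
        # A_{n+1} = L O_n - b_n O_{n-1}
        nxt = (h @ cur - cur @ h) - b_prev * prev
        b = norm(nxt)
        if b < tol:                      # Krylov space exhausted
            break
        coefficients.append(b)
        prev, cur, b_prev = cur, nxt / b, b
    return coefficients

# Single qubit: H = sigma_x, initial operator sigma_z.
sigma_x = np.array([[0, 1], [1, 0]])
sigma_z = np.array([[1, 0], [0, -1]])
print(lanczos_coefficients(sigma_x, sigma_z))
```

In the single-qubit example the Krylov space is two-dimensional, so the recursion stops after a single coefficient. In a chaotic many-body system one would instead look at how $b_n$ grows with $n$.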

翻訳日:2023-01-09 15:53:55 公開日:2022-12-12

# 時空双対による非平衡全数統計と対称性解の絡み合い

Nonequilibrium Full Counting Statistics and Symmetry-Resolved Entanglement from Space-Time Duality ( http://arxiv.org/abs/2212.06188v1 )

ライセンス: Link先を確認

Bruno Bertini, Pasquale Calabrese, Mario Collura, Katja Klobas, Colin Rylands

(参考訳) その確率的性質から、量子力学における測定過程は可能な結果の分布を生成する。この分布、またはフルカウント統計(FCS)として知られるフーリエ変換は、測定された可観測値の平均値よりもはるかに多くの情報を含み、それにアクセスすることがシステムに関する関連情報を得る唯一の方法である。実際、FCSは、大域対称性の存在下で量子エンタングルメントが異なる対称性セクターにどのように分割されるかを特徴付ける、より一般的な観測可能な族(荷電モーメント)の極限である。ここでは、FCSとU(1)電荷の電荷モーメントの進化を、大域的量子クエンチの後に有限領域に切り替わったものとみなす。大規模な場合、これらの量は2つの異なる状態が時間の関数として示される単純な大偏差形式をとる: 領域のサイズよりもはるかに大きい場合、局所平衡状態によって設定された定常値に近づくが、領域サイズよりも短い場合、時間に対する自明な依存を示す。初期状態が U(1) 対称であるとき、FCS の時間における先頭の順序と非平衡状態における荷電モーメントは時空双対性によって決定できることを示す。すなわち、時間と空間の役割が交換されるシステムの定常値と一致する。この観察を用いてfcsと荷電モーメントの一般性を見いだし、相互作用する可積分モデルにおいてそれらの量の正確な表現を導出する。我々は、この式を規則54量子セルオートマトンとxxzスピン1/2鎖の正確な数値の正確な結果に対してテストする。

Due to its probabilistic nature, a measurement process in quantum mechanics produces a distribution of possible outcomes. This distribution, or its Fourier transform known as full counting statistics (FCS), contains much more information than, say, the mean value of the measured observable, and accessing it is sometimes the only way to obtain relevant information about the system. In fact, the FCS is the limit of an even more general family of observables, the charged moments, that characterise how quantum entanglement is split in different symmetry sectors in the presence of a global symmetry. Here we consider the evolution of the FCS and of the charged moments of a U(1) charge truncated to a finite region after a global quantum quench. For large scales these quantities take a simple large-deviation form, showing two different regimes as functions of time: while for times much larger than the size of the region they approach a stationary value set by the local equilibrium state, for times shorter than region size they show a non-trivial dependence on time. We show that, whenever the initial state is also U(1) symmetric, the leading order in time of FCS and charged moments in the out-of-equilibrium regime can be determined by means of a space-time duality. Namely, it coincides with the stationary value in the system where the roles of time and space are exchanged. We use this observation to find some general properties of FCS and charged moments out-of-equilibrium, and to derive an exact expression for these quantities in interacting integrable models. We test this expression against exact results in the Rule 54 quantum cellular automaton and exact numerics in the XXZ spin-1/2 chain.

翻訳日:2023-01-09 15:53:32 公開日:2022-12-12

# 倹約的シャドウ推定 : 量子回路の再利用と裾の上界

Thrifty shadow estimation: re-using quantum circuits and bounding tails ( http://arxiv.org/abs/2212.06240v1 )

ライセンス: Link先を確認

Jonas Helsen and Michael Walter

(参考訳) ランダム化シャドウ推定は、ランダム量子回路の適用と計算基底での測定によって得られる「古典シャドウ」から、量子状態の指数関数的に多くの期待値を推定できる最近のプロトコルである。本稿では、近未来の量子コンピューティングを念頭に、このアプローチの統計的効率を調べる。特に、より実装しやすい変種である倹約的シャドウ推定(thrifty shadow estimation)を提案し解析する。この変種では、(元のプロトコルのように)測定ごとに量子回路を新たに生成する代わりに、同じ回路を何度も再利用する。この再利用の効果は、選択する量子回路の族に強く依存することを示す。特に、ハールランダムなユニタリをサンプリングする場合には最大限に有効であり、クリフォード回路をサンプリングする場合には(クリフォード群が3-デザインを成すにもかかわらず)最大限に非有効である。この2つの両極端の間を補間するために、近似 $t$-デザインの最近の構成に着想を得た、効率的にシミュレート可能な量子回路の族を与える。最後に、シャドウ推定の裾の上界を考察し、median-of-means(平均の中央値)推定をどのような場合に標準的な平均推定で置き換えられるかを議論する。

Randomized shadow estimation is a recent protocol that allows estimating exponentially many expectation values of a quantum state from ``classical shadows'', obtained by applying random quantum circuits and computational basis measurements. In this paper we study the statistical efficiency of this approach in light of near-term quantum computing. In particular, we propose and analyze a more practically-implementable variant of the protocol, thrifty shadow estimation, in which quantum circuits are reused many times instead of having to be freshly generated for each measurement (as in the original protocol). We show that the effect of this reuse strongly depends on the family of quantum circuits that is chosen. In particular, it is maximally effective when sampling Haar random unitaries, and maximally ineffective when sampling Clifford circuits (even though the Clifford group forms a three-design). To interpolate between these two extremes, we provide an efficiently simulable family of quantum circuits inspired by a recent construction of approximate t-designs. Finally we consider tail bounds for shadow estimation and discuss when median-of-means estimation can be replaced with standard mean estimation.
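
The abstract's last sentence contrasts median-of-means estimation with the standard mean. The sketch below shows the generic median-of-means estimator on plain numbers, to make the contrast concrete. It is not the shadow-estimation protocol itself, and the function name `median_of_means` and its grouping rule are illustrative assumptions.

```python
import numpy as np

def median_of_means(samples, n_groups):
    """Split `samples` into `n_groups` consecutive groups, average each group,
    and return the median of the group averages."""
    samples = np.asarray(samples, dtype=float)
    groups = np.array_split(samples, n_groups)
    return float(np.median([g.mean() for g in groups]))

# One heavy outlier shifts the plain mean but not the median of means.
data = [1, 1, 1, 1, 1, 1, 1, 1, 1000]
print(np.mean(data))             # 112.0
print(median_of_means(data, 3))  # 1.0
```

The plain mean uses every sample equally, so a single outlier moves it far from the bulk of the data. The median of group means discards the contaminated group, at the cost of lower statistical efficiency when the data are well behaved.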

翻訳日:2023-01-09 15:53:03 公開日:2022-12-12

# 有限温度のドープされた2次元半導体における励起子ポーラロンからトリオンへのクロスオーバー

Crossover from exciton polarons to trions in doped two-dimensional semiconductors at finite temperature ( http://arxiv.org/abs/2212.05635v1 )

ライセンス: Link先を確認

A. Tiene, B. C. Mulkerin, J. Levinsen, M. M. Parish, and F. M. Marchetti

(参考訳) ドープ2次元半導体の光応答における温度の役割を体系的に研究した。有限温度フェルミ・ポーラロン理論を用いることで、よく定義されたポラロン準粒子を持つ量子縮退状態から、最低エネルギーの「引力」ポラロン準粒子が破壊される高温または低ドーピングにおける非コヒーレントな状態へのクロスオーバーを明らかにした。クロスオーバーは吸収とフォトルミネッセンスの両方において有意な質的変化を伴っていることを示す。特に、温度の上昇(またはドーピングの減少)とともに、魅力的な枝の放出プロファイルは、有限運動量でトリオンと反動電子を含む指数的テールを持つ対称ローレンツ型から非対称ピークへと進化する。マイクロキャビティ内に埋もれた構造物の光とのカップリングに対する温度の影響を考察し、エキシトン-ポーラロン準粒子が破壊されても十分に定義された偏光子準粒子が存在することを示し、弱光から強光間カップリングへの遷移をポーラロン線幅とスペクトル重みの観点から説明できることを示した。

We study systematically the role of temperature in the optical response of doped two-dimensional semiconductors. By making use of a finite-temperature Fermi-polaron theory, we reveal a crossover from a quantum-degenerate regime with well-defined polaron quasiparticles to an incoherent regime at high temperature or low doping where the lowest energy "attractive" polaron quasiparticle is destroyed, becoming subsumed into a broad trion-hole continuum. We demonstrate that the crossover is accompanied by significant qualitative changes in both absorption and photoluminescence. In particular, with increasing temperature (or decreasing doping), the emission profile of the attractive branch evolves from a symmetric Lorentzian to an asymmetric peak with an exponential tail involving trions and recoil electrons at finite momentum. We discuss the effect of temperature on the coupling to light for structures embedded into a microcavity, and we show that there can exist well-defined polariton quasiparticles even when the exciton-polaron quasiparticle has been destroyed, where the transition from weak to strong light-matter coupling can be explained in terms of the polaron linewidths and spectral weights.

翻訳日:2023-01-09 15:45:42 公開日:2022-12-12

# 非エルミート系における有限温度動的量子相転移

Finite temperature dynamical quantum phase transition in a non-Hermitian system ( http://arxiv.org/abs/2212.05839v1 )

ライセンス: Link先を確認

Debashish Mondal, Tanay Nag

(参考訳) 混合状態動的量子相転移(MSDQPT)の文脈において、非エルミート性と有限温度との相互作用を調べる。複素ホッピングと非エルミート性を含み、ギャップのある相に加えてギャップレス相も生じうる $p$ 波超伝導体モデルを考え、相内クエンチを通じてMSDQPTと巻き付き数を調べる。MSDQPTは基礎となる相のギャップ構造によらず常に存在するが、フィッシャー零点の分布はこれらの相の間で変化することがわかった。このようなMSDQPTの発生は、ギャップのある相ではDQPTが起こらないゼロ温度の場合とは対照的である。驚くべきことに、ゼロ温度における巻き付き数の半整数ジャンプは、ギャップレス相では有限温度において洗い流される。系がMSDQPTを経験するのに必要な最小時間が逆温度とともにどう変化するかを調べ、これによりギャップのある相とギャップレス相を区別できることを示す。本研究は、この最小時間がギャップのある(ギャップレス)相で単調(非単調)な振る舞いを示すことを示している。

We investigate the interplay between the non-Hermiticity and finite temperature in the context of mixed state dynamical quantum phase transition (MSDQPT). We consider a $p$-wave superconductor model, encompassing complex hopping and non-Hermiticity that can lead to gapless phases in addition to gapped phases, to examine the MSDQPT and winding number via the intra-phase quench. We find that the MSDQPT is always present irrespective of the gap structure of the underlying phase, however, the profile of Fisher zeros changes between the above phases. Such occurrences of MSDQPT are in contrast to the zero-temperature case where DQPT does not take place for the gapped phase. Surprisingly, the half-integer jumps in winding number at zero-temperature are washed away for finite temperature in the gapless phase. We study the evolution of the minimum time required by the system to experience MSDQPT with the inverse temperature such that gapped and gapless phases can be differentiated. Our study indicates that the minimum time shows monotonic (non-monotonic) behavior for the gapped (gapless) phase.

翻訳日:2023-01-09 15:45:16 公開日:2022-12-12

# 弱磁場中における中性原子の高速核スピンゲートと電子核絡み

Fast nuclear-spin gates and electrons-nuclei entanglement of neutral atoms in weak magnetic fields ( http://arxiv.org/abs/2212.05876v1 )

ライセンス: Link先を確認

Xiao-Feng Shi

(参考訳) $^{171}$Ybを例として、2価原子の核スピンを含む高速なリドベルグ媒介エンタングルメントを示す。第1に、シュタルクシフトを補助に用いれば2つのレーザーパルスで、そうでなければ3つのパルスで実現可能な、任意位相の核スピン制御位相ゲートを示す。第2に、2つの原子の電子(e)と核スピン(n)の間でエンタングルした状態 $(\lvert\text{cc}\rangle_{\text{e}} \otimes \lvert\Phi\rangle_{\text{n}} + \lvert\Phi\rangle_{\text{e}} \otimes \lvert\Psi\rangle_{\text{n}} )/\sqrt{2}$ の生成を提案する。ここで $\lvert\Phi\rangle$ と $\lvert\Psi\rangle$ は互いに直交する2つのベル状態、$\lvert \text{c}\rangle_{\text{e}}$ は時計状態を表す。より適切な用語がないため、3つの「小さな」ベル状態を組み込んだ「大きな」ベル状態を模したものとして、これをスーパーベル状態と呼ぶ。第3に、3原子状態 $(\sqrt{3}\lvert\text{ccc}\rangle_{\text{e}} \otimes \lvert\Lambda\rangle_{\text{n}} + \lvert \text{W}\rangle_{\text{e}} \otimes \lvert \text{GHZ}\rangle_{\text{n}} )/2$ を生成するプロトコルを示す。ここで $\lvert\Lambda\rangle_{\text{n}}$ は核スピン状態、$\lvert \text{W}\rangle_{\text{e}}$ は基底状態と時計状態が張る空間におけるW状態、$\lvert \text{GHZ}\rangle_{\text{n}}$ は核スピン状態空間におけるグリーンバーガー・ホーン・ツァイリンガー(GHZ)状態である。4つのプロトコルは高い固有忠実度を持ち、単一サイトのリドベルグアドレッシングを必要とせず、各原子の2つの核スピン量子ビット状態の両方をリドベルグ励起するため、ガウス程度の弱い磁場中で大きな $\Omega_{\text{m}}$ を用いて実行できる。後者の2つのプロトコルは、ベル状態、ハイパーエンタングル状態、およびGHZ状態の測定に基づく準備を可能にする。

We present fast Rydberg-mediated entanglement involving nuclear spins of divalent atoms with $^{171}$Yb as an example. First, we show a nuclear-spin controlled phase gate of an arbitrary phase realizable either with two laser pulses when assisted by Stark shifts, or with three pulses. Second, we propose to create a state $(\lvert\text{cc}\rangle_{\text{e}} \otimes \lvert\Phi\rangle_{\text{n}} + \lvert\Phi\rangle_{\text{e}} \otimes \lvert\Psi\rangle_{\text{n}} )/\sqrt{2}$ entangled between the electrons (e) and nuclear spins (n) of two atoms, where $\lvert\Phi\rangle$ and $\lvert\Psi\rangle$ are two orthogonal Bell states and $\lvert \text{c}\rangle_{\text{e}}$ denotes the clock state. For want of a better term, it is called a Super Bell State for it mimics a "large" Bell state incorporating three "smaller" Bell states. Third, we show a protocol to create a three-atom state $(\sqrt{3}\lvert\text{ccc}\rangle_{\text{e}} \otimes \lvert\Lambda\rangle_{\text{n}} + \lvert \text{W}\rangle_{\text{e}} \otimes \lvert \text{GHZ}\rangle_{\text{n}} )/2$, where $\lvert\Lambda\rangle_{\text{n}}$ is a nuclear-spin state, $\lvert \text{W}\rangle_{\text{e}}$ is a W state in the ground-clock state space, and $\lvert \text{GHZ}\rangle_{\text{n}}$ is the Greenberger-Horne-Zeilinger (GHZ) state in the nuclear-spin state space. The four protocols possess high intrinsic fidelities, do not require single-site Rydberg addressing, and can be executed with large $\Omega_{\text{m}}$ in a weak, Gauss-scale magnetic field for they involve Rydberg excitation of both nuclear-spin qubit states in each atom. The latter two protocols can enable measurement-based preparation of Bell, hyperentangled, and GHZ states.

翻訳日:2023-01-09 15:44:58 公開日:2022-12-12

# SyReC Synthesizer:可逆回路合成のためのMQTツール

SyReC Synthesizer: An MQT tool for synthesis of reversible circuits ( http://arxiv.org/abs/2212.05903v1 )

ライセンス: Link先を確認

Smaran Adarsh, Lukas Burgholzer, Tanmay Manjunath and Robert Wille

(参考訳) 可逆回路は、量子コンピューティング、低消費電力/断熱設計、エンコーダ/デコーダデバイスなど、多くの有望な新興技術のバックボーンを形成する。近年,このような回路のスケーラブルな合成が注目されている。本稿では,ハードウェア記述言語SyReCに基づく可逆回路の合成ツールであるSyReC Synthesizerを提案する。 SyReCは高レベルの抽象化で可逆的な機能を記述することができる。提供されるSyReC Synthesizerはプッシュボタン方式でこの機能を実現する。対応するオプションは、必要な回路信号/線数(例えば、全ての回路ラインがキュービットに対応する量子コンピューティング)と必要なゲート(回路のコストに応じた)との間のトレードオフを可能にする。さらに、このツールは結果の回路をシミュレートし、ゲートコストを決定できる。 SyReC Synthesizerは、ミュンヘン量子ツールキット(MQT)の一部としてhttps://github.com/cda-tum/syrecでオープンソースソフトウェアパッケージとして利用可能である。

Reversible circuits form the backbone of many promising emerging technologies such as quantum computing, low-power/adiabatic design, encoder/decoder devices, and several other applications. In recent years, the scalable synthesis of such circuits has gained significant attention. In this work, we present the SyReC Synthesizer, a synthesis tool for reversible circuits based on the hardware description language SyReC. SyReC allows one to describe reversible functionality at a high level of abstraction. The provided SyReC Synthesizer then realizes this functionality in a push-button fashion. Corresponding options allow for a trade-off between the number of needed circuit signals/lines (relevant, e.g., for quantum computing, in which every circuit line corresponds to a qubit) and the number of needed gates (corresponding to the circuit's costs). Furthermore, the tool can simulate the resulting circuit and determine its gate costs. The SyReC Synthesizer is available as an open-source software package at https://github.com/cda-tum/syrec as part of the Munich Quantum Toolkit (MQT).

翻訳日:2023-01-09 15:43:28 公開日:2022-12-12

# ホログラフィー量子スカー

Holographic Quantum Scars ( http://arxiv.org/abs/2212.05962v1 )

ライセンス: Link先を確認

Diego Liska, Vladimir Gritsev, Ward Vleeshouwers, Jiří Minář

(参考訳) ホログラフィーの文脈における量子多体スカーの構成について論じる。二次元共形場理論を考察し、ヴィラソロ代数を通じて自然に実現されるその力学的対称性を用いてスカー状態を構築する。 Loschmidt振幅の研究により、状態の周期的特性を評価する。幾何学的解釈により、これらのスカー状態における応力テンソルの期待値とエンタングルメントエントロピーを計算することができる。これらのホログラフィック双対は、ブラックホールしきい値以上のエネルギーであっても、空のAdSと微分同相によって関連していることを示す。また、スカー状態における期待値は一般に非熱的であり、典型的な(バルク)状態に対する $\sqrt{E}$ とは対照的に、そのエンタングルメントエントロピーがエネルギーとともに $\log(E)$ で増大することを示す。さらに、スカー状態が無限エネルギーを持つ極限において、発散あるいは消滅するエンタングルメントエントロピーに関連するCFT平面上の固定点を同定する。

We discuss a construction of quantum many-body scars in the context of holography. We consider two-dimensional conformal field theories and use their dynamical symmetries, naturally realized through the Virasoro algebra, to construct scarred states. By studying their Loschmidt amplitude, we evaluate the states' periodic properties. A geometrical interpretation allows us to compute the expectation value of the stress tensor and entanglement entropy of these scarred states. We show that their holographic dual is related by a diffeomorphism to empty AdS, even for energies above the black hole threshold. We also demonstrate that expectation values in the scarred states are generally non-thermal and that their entanglement entropy grows with the energy as $\log(E)$ in contrast to $\sqrt{E}$ for the typical (bulk) states. Furthermore, we identify fixed points on the CFT plane associated with divergent or vanishing entanglement entropy in the limit where the scarred states have infinite energy.

翻訳日:2023-01-09 15:43:00 公開日:2022-12-12

# 双直交再正規化

Biorthogonal Renormalization ( http://arxiv.org/abs/2212.06004v1 )

ライセンス: Link先を確認

Elisabet Edvardsson, J Lukas K König, Marcus Stålhammar

(参考訳) 双直交形式は、従来の量子力学を非エルミート領域に拡張する。しかし、双直交内積は固有ベクトルのスケーリングによって変化することが指摘されており、この曖昧さの物理的意義はまだ議論されている。ここでは、この問題を再検討し、この正規化の選択がどのような場合に物理的に重要であるかを論じる。期待値や遷移確率などの量が固有ベクトルのスケーリングに依存する設定と、双直交形式が曖昧さなく成り立つ設定を示す。見かけ上のスケーリングの曖昧さを解決するため、基底のゲージ選択に依存しない内積を導入し、それに対応する数学的構造が量子力学と整合することを示す。この形式を用いて、ヒルベルト空間表現の物理性に関するより深い問題を特定し、位置基底を用いて説明する。多くの物理的結果が依拠する数学的基礎の理解を深めるだけでなく、この発見は非エルミートハミルトニアンによって記述される系の間の一貫した比較への道を開く。

The biorthogonal formalism extends conventional quantum mechanics to the non-Hermitian realm. It has, however, been pointed out that the biorthogonal inner product changes with the scaling of the eigenvectors, an ambiguity whose physical significance is still being debated. Here, we revisit this issue and argue when this choice of normalization is of physical importance. We illustrate in which settings quantities such as expectation values and transition probabilities depend on the scaling of eigenvectors, and in which settings the biorthogonal formalism remains unambiguous. To resolve the apparent scaling ambiguity, we introduce an inner product independent of the gauge choice of basis and show that its corresponding mathematical structure is consistent with quantum mechanics. Using this formalism, we identify a deeper problem relating to the physicality of Hilbert space representations, which we illustrate using the position basis. Apart from increasing the understanding of the mathematical foundations upon which many physical results rely, our findings also pave the way towards consistent comparisons between systems described by non-Hermitian Hamiltonians.

翻訳日:2023-01-09 15:42:40 公開日:2022-12-12

# 臨界不安定二層系の量子コヒーレンス

Quantum Coherence of Critical Unstable Two-Level Systems ( http://arxiv.org/abs/2212.06031v1 )

ライセンス: Link先を確認

Dimitrios Karamitros, Thomas McKelvey, Apostolos Pilaftsis

(参考訳) 量子ビットのブロッホ球形式を用いて不安定な2準位量子系の力学を詳細に研究する。このような不安定な量子ビット系のブロッホベクトル表現を用いることで、いわゆるエネルギー準位ベクトル ${\bf E}$ と崩壊幅ベクトル ${\bf\Gamma}$ が互いに直交し、パラメータ $r = |{\bf \Gamma}|/(2|{\bf E}|)$ が 1 未満となるような、新しい臨界シナリオのクラスを特定する。最も注目すべきことに、臨界不安定な量子ビット系は、適切に定義された共崩壊フレーム(co-decaying frame)で解析すると、コヒーレンス・デコヒーレンス振動のような非定型的な振る舞いを示す。同じフレームで、純粋な臨界量子ビットを記述する単位ブロッホベクトル ${\bf b}$ は、ベクトル ${\bf E}$ の周りを回転しながら、等しい時間間隔で等しくない面積を掃く。これらの現象は、2準位量子系のエネルギー準位差による通常の振動パターンを超えて現れる。興味深いことに、これらの新しい特徴は、ベクトル ${\bf E}$ と ${\bf\Gamma}$ が互いに完全には直交しない準臨界シナリオでも持続する。量子情報、および不安定な中間子-反中間子系やその他の系への本結果の応用について論じる。

We study in detail the dynamics of unstable two-level quantum systems by adopting the Bloch-sphere formalism of qubits. By employing the Bloch-vector representation for such unstable qubit systems, we identify a novel class of critical scenarios in which the so-called energy-level and decay-width vectors, ${\bf E}$ and ${\bf\Gamma}$, are orthogonal to one another, and the parameter $r = |{\bf \Gamma}|/(2|{\bf E}|)$ is less than 1. Most remarkably, we find that critical unstable qubit systems exhibit atypical behaviours like coherence--decoherence oscillations when analysed in an appropriately defined co-decaying frame of the system. In the same frame, a unit Bloch vector ${\bf b}$ describing a pure critical qubit will sweep out unequal areas during equal intervals of time, while rotating about the vector ${\bf E}$. These phenomena emerge beyond the usual oscillatory pattern due to the energy-level difference of the two-level quantum system. Interestingly enough, we observe that these new features will persist even for quasi-critical scenarios, in which the vectors ${\bf E}$ and ${\bf\Gamma}$ are not perfectly orthogonal to each other. Applications of our results to quantum information and to unstable meson--antimeson and other systems are discussed.
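
The criticality condition stated above (orthogonal ${\bf E}$ and ${\bf\Gamma}$ with $r = |{\bf \Gamma}|/(2|{\bf E}|) < 1$) can be checked numerically with a few lines of Python. This is only a sketch of that stated condition: the function name `classify`, the tolerance `tol`, and the example vectors are assumptions, and nothing here reproduces the paper's dynamics.

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def classify(E, Gamma, tol=1e-9):
    """Return (r, cos_angle, is_critical) for 3-vectors E and Gamma.

    r         = |Gamma| / (2 |E|)
    cos_angle = cosine of the angle between E and Gamma
    A scenario is called critical here when E is orthogonal to Gamma
    (within the assumed tolerance `tol`) and r < 1.
    """
    r = norm(Gamma) / (2.0 * norm(E))
    cos_angle = dot(E, Gamma) / (norm(E) * norm(Gamma))
    return r, cos_angle, (abs(cos_angle) < tol and r < 1.0)

# Hypothetical example vectors in arbitrary units.
print(classify((1.0, 0.0, 0.0), (0.0, 1.0, 0.0)))   # orthogonal, r = 0.5
print(classify((1.0, 0.0, 0.0), (0.0, 3.0, 0.0)))   # orthogonal, but r = 1.5
```

A quasi-critical scenario in the sense of the abstract would correspond to a small but non-zero `cos_angle`; how small is a modelling choice that this sketch does not fix.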

翻訳日:2023-01-09 15:42:22 公開日:2022-12-12

# ボソニックフラックスラダーにおける幾何学的フラストレーションのないフラストレーションマグネット

Frustrated magnets without geometrical frustration in bosonic flux ladders ( http://arxiv.org/abs/2212.06112v1 )

ライセンス: Link先を確認

Luca Barbiero, Josep Cabedo, Maciej Lewenstein, Leticia Tarruell, Alessio Celi

(参考訳) 光格子中の超低温ボゾン原子を用いたフラストレーションスピン1/2量子XXモデルの実現手法を提案する。我々のアプローチは、1つの実次元と1つの合成スピン次元を持つ$\pi$に近い磁束の正方形のラダーに基づいている。このシステムは幾何学的なフラストレーションを持たないが、低エネルギーでは合成トンネルの特定の値にスタガー付きフラックスを持つ有効三角形のはしごにマッピングされる。本研究では, その豊富な相図を数値的に検討し, 結合秩序波およびキラル超流動相を含むことを示す。本手法は, 実際の幾何学的フラストレーションを必要とせずに, 最小のフラストレーションマグネットのインスタンスにアクセスし, 実験的な複雑さを最小化する。

We propose a scheme to realize the frustrated spin-1/2 quantum XX model with ultracold bosonic atoms in optical lattices. Our approach is based on a square ladder of magnetic flux close to $\pi$ with one real and one synthetic spin dimension. Although this system does not have geometrical frustration, we show that at low energies it maps into an effective triangular ladder with staggered fluxes for specific values of the synthetic tunneling. We numerically investigate its rich phase diagram and show that it contains bond-ordered-wave and chiral superfluid phases. Our scheme gives access to minimal instances of frustrated magnets without the need for real geometrical frustration, in a setup of minimal experimental complexity.

翻訳日:2023-01-09 15:41:56 公開日:2022-12-12

# スペクトルフィルタによる非コヒーレント励起2準位系の純度、識別不能性、量子収率の制御

Controlling Purity, Indistinguishability and Quantum Yield of Incoherently Pumped Two-Level System by Spectral Filters ( http://arxiv.org/abs/2212.06233v1 )

ライセンス: Link先を確認

Ivan V. Panyukov, Vladislav Yu. Shishkov, Evgeny S. Andrianov

(参考訳) 位相緩和過程は決定論的単一光子源の性能に大きな影響を及ぼす。位相緩和はスペクトル線を広げ、放出された光子の識別不能性を低下させる。これは多くの応用、特に量子コンピューティングでは望ましくない。パルス非コヒーレントポンプを用いた2準位系により放射される光をスペクトルフィルタの存在下で検討する。スペクトルフィルタは、2次自己相関関数、識別不能性、および量子収率の制御を可能にする。狭いスペクトルフィルタは、量子収率を損なう一方で、放出される光の識別不能性を高めることができることを示す。スペクトルフィルタが2次相関関数に及ぼす影響はポンプの持続時間に依存する。ポンプパルスが2準位系の寿命と比較して長い場合、狭いスペクトルフィルタは2次自己相関関数を急速に増加させる。この極限において、2準位系からの光の統計は、非コヒーレントポンプの統計を継承する。ポンプパルスの持続時間が短い場合、スペクトルフィルタのサブライフタイム幅に対して、単一光子特性をある程度保持することができる。さらに、単一光子源によって放出される光が、例えば共振器のような量子系を制御するために使用されるとき、光の単一光子特性は、量子系の応答時間に応じて異なる形で現れる。特に、応答時間が長い場合、サブライフタイム幅のスペクトルフィルタは、ほぼゼロの2次自己相関関数を与えることができる。

Dephasing processes significantly impact the performance of deterministic single-photon sources. Dephasing broadens the spectral line and suppresses the indistinguishability of the emitted photons, which is undesirable for many applications, primarily for quantum computing. We consider a light emitted by a two-level system with a pulsed incoherent pump in the presence of the spectral filter. The spectral filter allows control of the second-order autocorrelation function, indistinguishability, and quantum yield. We show that narrow spectral filters can increase the indistinguishability of the emitted light while undermining the quantum yield. The influence of the spectral filter on the second-order correlation function depends on the duration of the pump. When the pumping pulse is long compared to the lifetime of the two-level system, the narrow spectral filters lead to a rapid increase in the second-order autocorrelation function. In this limit, the statistics of the light from the two-level system inherit the statistics of the incoherent pump. In the case of the short duration of the pump pulse, it is possible to preserve single-photon properties to some degree for the sub-lifetime width of the spectral filter. Moreover, when the light emitted by the single-photon source is used to control a quantum system, e.g., cavity, the single-photon properties of the light manifest themselves differently, depending on the response time of the quantum system. In particular, in the case of long response time, the spectral filter with sub-lifetime width can provide the near-zero second-order autocorrelation function.

翻訳日:2023-01-09 15:08:29 公開日:2022-12-12

# 動径方向に周期的なポテンシャルにおける渦輪量子液滴

Vortex-ring quantum droplets in a radially-periodic potential ( http://arxiv.org/abs/2212.05838v1 )

ライセンス: Link先を確認

Bin Liu, Yi xi Chen, Ao wei Yang, Xiao yan Cai, Yan Liu, Zhi huan Luo, Xi zhou Qin, Xun da Jiang, Yong yao Li, and Boris A. Malomed

(参考訳) 二成分ボース・アインシュタイン凝縮体(BEC)により形成される2次元(2D)渦輪状量子液滴(QD)の安定性と特性を確立する。この系は、3次項に対数因子(平均場理論に対するリー・ファン・ヤン補正によって生じる)が乗じられ、動径座標の周期関数であるポテンシャルを含むGross-Pitaevskii(GP)方程式でモデル化される。動径ポテンシャルの特定の円形の谷に閉じ込められた、位相電荷の大きな狭い渦輪が生成される。これらの結果は、渦度を持つQDを生成するための実験的に妥当な方法を示唆している(これまではゼロ渦度のものしか報告されていない)。狭い環に対する2次元GP方程式は近似的に1次元形式に還元され、方位角方向の摂動に対する環の変調安定性を調べることができる。これらのモードについて、完全な安定領域が描き出される。巻き数(WN)の異なる渦輪に対して、円形の谷の捕捉容量を同定する。互いに入れ子になった同心多重環の形の安定な複合状態も構築され、これにはWNの符号が逆のものも含まれる。他のロバストな複合状態は、ある円形ポテンシャルの谷にある変調安定な狭い環と、隣接する谷で軌道運動を行う方位角ソリトンとを組み合わせたものである。この結果は、異なるWNを持つ共存するリング状モードをデータストレージに用いるデバイスの設計に利用できる。

We establish stability and characteristics of two-dimensional (2D) vortex ring-shaped quantum droplets (QDs) formed by binary Bose-Einstein condensates (BECs). The system is modeled by the Gross-Pitaevskii (GP) equation with the cubic term multiplied by a logarithmic factor (as produced by the Lee-Huang-Yang correction to the mean-field theory) and a potential which is a periodic function of the radial coordinate. Narrow vortex rings with high values of the topological charge, trapped in particular circular troughs of the radial potential, are produced. These results suggest an experimentally relevant method for the creation of vortical QDs (thus far, only zero-vorticity ones have been reported). The 2D GP equation for the narrow rings is approximately reduced to the 1D form, which makes it possible to study the modulational stability of the rings against azimuthal perturbations. Full stability areas are delineated for these modes. The trapping capacity of the circular troughs is identified for the vortex rings with different winding numbers (WNs). Stable compound states in the form of mutually nested concentric multiple rings are constructed too, including ones with opposite signs of the WNs. Other robust compound states combine a modulationally stable narrow ring in one circular potential trough and an azimuthal soliton performing orbital motion in an adjacent one. The results may be used to design a device employing coexisting ring-shaped modes with different WNs for data storage.

翻訳日:2023-01-09 14:57:23 公開日:2022-12-12

# フェムト秒レーザーライティングによる2量子ビット量子フォトニックプロセッサ

Two-qubit quantum photonic processor manufactured by femtosecond laser writing ( http://arxiv.org/abs/2212.05931v1 )

ライセンス: Link先を確認

N.N. Skryabin, I.V. Kondratyev, I.V. Dyakonov, O.V. Borzenkova, S.P. Kulik, and S.S. Straupe

(参考訳) フェムト秒レーザー描画技術を用いて作製した2量子ビットフォトニック量子プロセッサの実験的実装を報告する。フェムト秒レーザー描画を用いて、精密な単一量子ビット演算と2量子ビット演算を実装する低損失で再構成可能なフォトニックチップを作製する。単一量子ビットゲートと2量子ビットゲートの性能は、完全なプロセストモグラフィーによって評価される。応用例として、変分量子固有値ソルバーアルゴリズムを用いたH2分子の基底状態エネルギーの決定を実演する。我々の結果は、フェムト秒レーザー描画技術が高品質な小規模量子フォトニックプロセッサを実現する可能性を示している。

We present an experimental implementation of a two-qubit photonic quantum processor fabricated using femtosecond laser writing technology. We employ femtosecond laser writing to create a low-loss reconfigurable photonic chip implementing precise single-qubit and two-qubit operations. The performance of single-qubit and two-qubit gates is characterized by full process tomography. An exemplary application of the processor to determining the ground state energy of an H2 molecule using the variational quantum eigensolver algorithm is demonstrated. Our results highlight the potential of femtosecond laser writing technology to deliver high quality small-scale quantum photonic processors.

翻訳日:2023-01-09 14:56:57 公開日:2022-12-12

# 確率量子シミュレーションにおける重要度サンプリング

Importance sampling for stochastic quantum simulations ( http://arxiv.org/abs/2212.05952v1 )

ライセンス: Link先を確認

Oriel Kiss, Michele Grossi and Alessandro Roggero

(参考訳) 複雑な量子システムのシミュレーションは、デジタル量子コンピュータにとって有望なタスクである。しかし、一般的な積公式の深さはハミルトニアンの項数に比例するため、短期的デバイスでもフォールトトレラントなデバイスでも実装が困難になりうる。効率的な解は、係数の大きさに応じてハミルトニアンから項をサンプリングしてランダム積公式を構築する、qDriftとして知られる確率的コンパイルプロトコルによって与えられる。本研究では、qDriftプロトコルを重要度サンプリングと統合し、バイアスと統計的ゆらぎの両方を制御しながら任意の分布からサンプリングすることを可能にする。サンプリング段階で個々の項のシミュレーションコストを考慮することにより、同じ精度を保ちながらシミュレーションコストを削減できることを示す。さらに、複合チャネルに関する最近の研究を取り入れ、バイアスと分散の厳密な境界を計算し、目標精度に対してサンプル数、実験数、時間ステップをどのように選ぶかを示す。これらの結果は、複合チャネルの使用の有無に関わらず、qDriftプロトコルのより効率的な実装につながる。理論的結果は、格子上の核有効場理論で行った数値シミュレーションによって確認される。

Simulating complex quantum systems is a promising task for digital quantum computers. However, the depth of popular product formulas scales with the number of summands in the Hamiltonian, which can therefore be challenging to implement on near-term as well as fault-tolerant devices. An efficient solution is given by the stochastic compilation protocol known as qDrift, which builds random product formulas by sampling from the Hamiltonian according to the magnitude of their coefficients. In this work, we unify the qDrift protocol with importance sampling, allowing us to sample from arbitrary distributions while controlling both the bias as well as the statistical fluctuations. We show that the simulation cost can be reduced while achieving the same accuracy by considering the individual simulation cost during the sampling stage. Moreover, we incorporate recent work on composite channel and compute rigorous bounds on the bias and variance showing how to choose the number of samples, experiments, and time steps for a given target accuracy. These results lead to a more efficient implementation of the qDrift protocol, both with and without the use of composite channels. Theoretical results are confirmed by numerical simulations performed on a lattice nuclear effective field theory.
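
The sampling step described above can be sketched in Python as follows. With the default distribution $p_j = |h_j|/\lambda$, $\lambda = \sum_j |h_j|$, this is the usual qDrift sampling rule, in which every sampled term is evolved for the same angle $\lambda t/N$. For a different sampling distribution $q_j$, the sketch rescales the angle to $h_j t/(N q_j)$ so that the average generator still equals $Ht$ to first order. That rescaling is an assumption made for illustration and is not claimed to be the exact scheme of the paper; the function names and example coefficients are also hypothetical.

```python
import math
import random

def sample_sequence(coeffs, t, n_steps, probs=None, seed=0):
    """Sample a random product formula for H = sum_j coeffs[j] * H_j.

    Returns a list of (term_index, angle) pairs; each pair stands for one
    factor exp(-i * angle * H_j). If `probs` is None, terms are drawn with
    probability |h_j| / lambda (plain qDrift sampling).
    """
    lam = sum(abs(h) for h in coeffs)
    if probs is None:
        probs = [abs(h) / lam for h in coeffs]
    rng = random.Random(seed)
    indices = rng.choices(range(len(coeffs)), weights=probs, k=n_steps)
    # Reweight each angle by 1 / (n_steps * q_j) to compensate for the sampling bias.
    return [(j, coeffs[j] * t / (n_steps * probs[j])) for j in indices]

def expected_total_angles(coeffs, t, n_steps, probs):
    """Expected accumulated angle per term over a sequence: n_steps * q_j * angle_j."""
    return [n_steps * q * (h * t / (n_steps * q)) for h, q in zip(coeffs, probs)]

# Hypothetical three-term Hamiltonian.
coeffs = [0.5, -1.0, 0.25]
seq = sample_sequence(coeffs, t=1.0, n_steps=100)
print(seq[:3])
print(expected_total_angles(coeffs, 1.0, 100, [0.2, 0.5, 0.3]))
```

The second function only illustrates that the reweighting leaves the mean unchanged for any strictly positive distribution; the choice of distribution then affects the variance and the per-sample cost, which is the trade-off the abstract refers to.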

翻訳日:2023-01-09 14:06:17 公開日:2022-12-12

# 一般時間非依存ハミルトニアンの連続変数のダイナミクスに基づく量子性証明

Dynamics-based quantumness certification of continuous variables with generic time-independent Hamiltonians ( http://arxiv.org/abs/2212.06017v1 )

ライセンス: Link先を確認

Lin Htoo Zaw and Valerio Scarani

(参考訳) ダイナミクスに基づく量子性の証明は、力学が既知であるという仮定の下で、一部の連続変数状態の非古典的な性質を検証するアプローチである。単一系に対する他の非古典性テストとは異なり、逐次測定を必要としない。このプロトコル群は調和力学に対して導入された。本研究では、一般的な時間非依存ハミルトニアンの下で発展する1自由度に対して、ダイナミクスに基づく証明を構築する方法を示す。低エネルギーの極限で近似的に調和的なもの(カー非線形性、振り子、モースポテンシャル)と、そうでないもの(無限井戸中の粒子)を含む、いくつかの例を明示的に調べる。

Dynamics-based certification of quantumness is an approach to witnessing the nonclassical character of some continuous-variable states, under the assumption that their dynamics is known. Contrary to other tests of nonclassicality for single systems, it does not require sequential measurements. This family of protocols was introduced for harmonic dynamics. In this work, we show how to construct dynamics-based certification for one degree of freedom evolving under a generic time-independent Hamiltonian. Several examples are explicitly studied: some that are approximately harmonic in the limits of low energy (Kerr nonlinearities, the pendulum, and the Morse potential) and one that is not (the particle in an infinite well).

翻訳日:2023-01-09 14:05:58 公開日:2022-12-12

# ガウス状態の光子数モーメントとキュムラント

Photon-number moments and cumulants of Gaussian states ( http://arxiv.org/abs/2212.06067v1 )

ライセンス: Link先を確認

Yanic Cardin, Nicolás Quesada

(参考訳) 光子数基底で測定した場合のガウス状態のモーメントとキュムラントに対する閉形式表現を導出する。ガウス状態の光子数モーメントをループハフニアンで表現する。ループハフニアンは、グラフの隣接関係を表す $(0,1)$ 行列に適用すると、その完全マッチングの数を数える関数である。次に、これらの式を用いて、キュムラントの観点からモード間の真の光子数相関を計算する。一様な損失を持つ干渉計のすべての入力に、変位がゼロの同一の単一モードガウス状態を入力すると、1次を除くすべての奇数次キュムラントがゼロになることを示す。最後に、$K$ 個の同一状態が $\ell$ モード干渉計に入力されるガウスボソンサンプリング装置において、導出した式を用いて、さまざまな入力状態に対する4次までのキュムラントの分布を調べる。キュムラントの依存性を、状態の種類(スクイーズド状態、損失のあるスクイーズド状態、スカッシュト状態、熱的状態)の関数として、また非真空入力数の関数として解析する。熱的状態は、損失のある、または損失のないスクイーズド状態の光子数キュムラントを模倣する点で、スカッシュト状態などの他の古典的状態よりもはるかに劣ることがわかった。

We develop a closed-form expression for the moments and cumulants of Gaussian states when measured in the photon-number basis. We express the photon-number moments of a Gaussian state in terms of the loop Hafnian, a function that when applied to $(0,1)$-matrices representing the adjacency of a graph, counts the number of its perfect matchings. We then use these expressions to calculate genuine photon-number correlations between modes in terms of cumulants. We show that when a uniformly lossy interferometer is fed in every input with identical single-mode Gaussian states with zero displacement, all the odd-order cumulants but the first one are zero. Finally, we employ the expressions we derive to study the distribution of cumulants up to the fourth order for different input states in a Gaussian boson sampling setup where $K$ identical states are fed into an $\ell$-mode interferometer. We analyze the dependence of the cumulants as a function of the type of state, squeezed, lossy squeezed, squashed, or thermal, and as a function of the number of non-vacuum inputs. We find that thermal states perform much worse than other classical states, such as squashed states, at mimicking the photon-number cumulants of lossy or lossless squeezed states.
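
The statement that the hafnian of a $(0,1)$ adjacency matrix counts perfect matchings can be checked on small graphs with the short Python sketch below. It implements the ordinary hafnian by brute-force recursion, which coincides with the loop hafnian when the diagonal is zero (no self-loops). The function name and the example graphs are chosen for illustration, and the exponential-time recursion is only suitable for tiny matrices.

```python
def hafnian(A):
    """Hafnian of a symmetric matrix A (list of lists), by recursive expansion.

    Vertex 0 is paired with every other vertex j in turn, and the hafnian
    of the matrix with rows/columns 0 and j removed is added, weighted by A[0][j].
    """
    n = len(A)
    if n == 0:
        return 1          # the empty graph has exactly one (empty) matching
    if n % 2 == 1:
        return 0          # an odd number of vertices admits no perfect matching
    total = 0
    for j in range(1, n):
        if A[0][j] == 0:
            continue
        rest = [k for k in range(1, n) if k != j]
        sub = [[A[r][c] for c in rest] for r in rest]
        total += A[0][j] * hafnian(sub)
    return total

def complete_graph(n):
    return [[0 if i == j else 1 for j in range(n)] for i in range(n)]

# 4-cycle with edges 0-1, 1-2, 2-3, 3-0.
cycle4 = [[0, 1, 0, 1],
          [1, 0, 1, 0],
          [0, 1, 0, 1],
          [1, 0, 1, 0]]

print(hafnian(complete_graph(4)))  # 3 perfect matchings
print(hafnian(cycle4))             # 2 perfect matchings
print(hafnian(complete_graph(6)))  # 15 perfect matchings
```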

翻訳日:2023-01-09 14:05:45 公開日:2022-12-12

# 複数のチャネルを通してリザーバに結合した開放量子系の効率的なシミュレーション

Efficient simulation of open quantum systems coupled to a reservoir through multiple channels ( http://arxiv.org/abs/2212.06099v1 )

ライセンス: Link先を確認

Kai T. Liu, Jiaxi Wu, Peng Zhang, and David N. Beratan

(参考訳) 複数のチャネルを通してリザーバに結合した開放量子系のシミュレーションは依然として大きな課題である。この種の開放量子系は、例えば分子振動に結合した励起状態の無放射崩壊を考えるときに生じる。相互作用描像において連鎖マッピング戦略を用いて、複数の相互作用チャネルを介して調和浴に線形に結合した系を研究する。相互作用描像では、裸の浴ハミルトニアンはユニタリ変換によって除去され(系-浴相互作用は残る)、連鎖マッピングは浴モードを新しい基底に変換する。変換されたハミルトニアンは時間依存の局所的な系-浴結合を含む。開放量子系は、新しい基底において限られた数の(変換された)浴モードに結合される。したがって、系-浴相互作用によって生じるエンタングルメントは局所的であり、行列積状態による効率的な動力学シミュレーションが可能になる。このアプローチを用いて、一般化スピンボソンハミルトニアンにより一重項分裂をシミュレートする。電子状態は、振動浴に対角的にも非対角的にも結合している。このアプローチは、連鎖マッピングスキームを多チャネルの系-浴結合の場合に一般化し、このクラスの開放量子系を行列積状態を用いて効率的にシミュレートすることを可能にする。

The simulation of open quantum systems coupled to a reservoir through multiple channels remains a substantial challenge. This kind of open quantum system arises when considering the radiationless decay of excited states that are coupled to molecular vibrations, for example. We use the chain mapping strategy in the interaction picture to study systems linearly coupled to a harmonic bath through multiple interaction channels. In the interaction picture, the bare bath Hamiltonian is removed by a unitary transformation (the system-bath interactions remain), and a chain mapping transforms the bath modes to a new basis. The transformed Hamiltonian contains time-dependent local system-bath couplings. The open quantum system is coupled to a limited number of (transformed) bath modes in the new basis. As such, the entanglement generated by the system-bath interactions is local, making efficient dynamical simulations possible with matrix product states. We use this approach to simulate singlet fission, using a generalized spin-boson Hamiltonian. The electronic states are coupled to a vibrational bath both diagonally and off-diagonally. This approach generalizes the chain mapping scheme to the case of multi-channel system-bath couplings, enabling the efficient simulation of this class of open quantum systems using matrix product states.

翻訳日:2023-01-09 14:05:19 公開日:2022-12-12

# 複雑なネットワークの量子シミュレーションについて

On the quantum simulation of complex networks ( http://arxiv.org/abs/2212.06126v1 )

ライセンス: Link先を確認

Duarte Magano and João Moutinho and Bruno Coutinho

(参考訳) 量子ウォークは、量子コンピュータでグラフ問題に取り組むための自然な枠組みを提供し、マークされたノードの探索や欠落したリンクの予測といったタスクにおいて、古典的な手法に対する高速化を示す。連続時間量子ウォークアルゴリズムは、ハミルトニアンがグラフの隣接行列で与えられる量子系のダイナミクスをシミュレートできることを仮定する。基礎となるグラフが行スパースで、かつ行を効率的に計算できるならば、これを効率的にシミュレートできることが知られている。これは多くの応用には十分であるが、この種のアルゴリズムを実世界の複雑ネットワークの研究に適用することを制限する。複雑ネットワークは、とりわけ、ハブと呼ばれる少数の密に結合したノードの存在によって特徴づけられるからである。言い換えれば、複雑ネットワークは、全ノードにわたる平均結合数が非常に小さいとしても、通常は行スパースではない。本研究では、量子シミュレーションの最先端の結果を、少数のハブを含むがそれ以外はスパースなグラフに拡張する。我々の結果が、量子コンピューティングのネットワーク科学への新しい応用につながることを期待する。

Quantum walks provide a natural framework to approach graph problems with quantum computers, exhibiting speedups over their classical counterparts for tasks such as the search for marked nodes or the prediction of missing links. Continuous-time quantum walk algorithms assume that we can simulate the dynamics of quantum systems where the Hamiltonian is given by the adjacency matrix of the graph. It is known that such can be simulated efficiently if the underlying graph is row-sparse and efficiently row-computable. While this is sufficient for many applications, it limits the applicability for this class of algorithms to study real world complex networks, which, among other properties, are characterized by the existence of a few densely connected nodes, called hubs. In other words, complex networks are typically not row-sparse, even though the average connectivity over all nodes can be very small. In this work, we extend the state-of-the-art results on quantum simulation to graphs that contain a small number of hubs, but that are otherwise sparse. Hopefully, our results may lead to new applications of quantum computing to network science.
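
The graph class described above, "a small number of hubs, but otherwise sparse", can be illustrated classically with a short Python sketch that separates high-degree nodes from the rest and checks that the remainder is row-sparse. This is not the quantum simulation method of the paper; the function name `split_hubs`, the degree threshold `max_degree`, and the example graph are assumptions for illustration.

```python
def split_hubs(adjacency, max_degree):
    """Split a graph into hub nodes and the sparse remainder.

    `adjacency` maps each node to the set of its neighbours.
    Nodes with more than `max_degree` neighbours are treated as hubs;
    the returned `sparse` graph is the subgraph induced on non-hub nodes.
    """
    hubs = {v for v, nbrs in adjacency.items() if len(nbrs) > max_degree}
    sparse = {
        v: {u for u in nbrs if u not in hubs}
        for v, nbrs in adjacency.items()
        if v not in hubs
    }
    return hubs, sparse

# Hypothetical network: node 0 is a hub attached to a short path 1-2-3 and three leaves.
graph = {
    0: {1, 2, 3, 4, 5, 6},
    1: {0, 2},
    2: {0, 1, 3},
    3: {0, 2},
    4: {0},
    5: {0},
    6: {0},
}

hubs, sparse = split_hubs(graph, max_degree=3)
print(hubs)                                        # the single hub node
print(max(len(nbrs) for nbrs in sparse.values()))  # largest degree left after removing hubs
```

In this example the average degree is small, yet the row of the adjacency matrix belonging to node 0 is dense, which is exactly the situation in which plain row-sparsity fails.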

翻訳日:2023-01-09 14:05:00 公開日:2022-12-12

# マルチノード超電導量子コンピュータのアーキテクチャ

Architectures for Multinode Superconducting Quantum Computers ( http://arxiv.org/abs/2212.06167v1 )

ライセンス: Link先を確認

James Ang, Gabriella Carini, Yanzhu Chen, Isaac Chuang, Michael Austin DeMarco, Sophia E. Economou, Alec Eickbusch, Andrei Faraon, Kai-Mei Fu, Steven M. Girvin, Michael Hatridge, Andrew Houck, Paul Hilaire, Kevin Krsulich, Ang Li, Chenxu Liu, Yuan Liu, Margaret Martonosi, David C. McKay, James Misewich, Mark Ritter, Robert J. Schoelkopf, Samuel A. Stein, Sara Sussman, Hong X. Tang, Wei Tang, Teague Tomesh, Norm M. Tubman, Chen Wang, Nathan Wiebe, Yong-Xin Yao, Dillon C. Yost, Yiyu Zhou

(参考訳) 量子技術をスケールさせる多くの提案は、ノードと呼ばれる個々の量子プロセッサを結合して1つの大きなマルチノード量子コンピュータ(MNQC)を形成する、モジュラーまたは分散型の設計に依存している。 MNQCを構築するためのスケーラブルな方法の1つは、光インターコネクトを備えた超伝導量子システムを用いることである。しかし、これらのマシンの制限要因はノード間ゲートであり、局所的な操作よりも2～3桁ノイズが大きく、かつ遅い可能性がある。ノード間ゲートの制限を克服するには、エンタングルメント生成の改善、エンタングルメント蒸留の利用、最適化されたソフトウェアとコンパイラなど、さまざまな技術が必要である。しかし、これらの構成要素の改善がどのように相互作用してシステム全体の性能に影響するのか、それぞれにどの程度の性能が求められるのか、さらにはそれぞれの性能をどう定量化するのかは、依然として明らかでない。本稿では、コデザインに着想を得たアプローチを用いて、ノード間リンク、エンタングルメント蒸留、局所アーキテクチャのハードウェアモデルの観点からMNQCの全体的な性能を定量化する。マイクロ波-光リンクを持つ超伝導MNQCの場合、性能を低下させるおそれのある、エンタングルメント生成と蒸留の間のトレードオフを明らかにする。このトレードオフにどう対処するかを示し、コンパイラが局所ゲートとノード間ゲートの間でどのように最適化すべきかを述べ、ノイズのある量子リンクが純粋に古典的なリンクよりも有利になるのはどのような場合かを論じる。これらの結果を用いて、初期のMNQC実現に向けたロードマップを提示する。このロードマップは、MNQCのハードウェアとソフトウェアの潜在的な改善点を示し、エンタングルメント生成や量子メモリの進歩から、分散量子位相推定などの専用アルゴリズムに至るまで、全体像を評価するための基準を概説する。本稿は光インターコネクトを備えた超伝導デバイスに焦点を当てているが、我々のアプローチはMNQCの実装全般に適用できる。

Many proposals to scale quantum technology rely on modular or distributed designs where individual quantum processors, called nodes, are linked together to form one large multinode quantum computer (MNQC). One scalable method to construct an MNQC is using superconducting quantum systems with optical interconnects. However, a limiting factor of these machines will be internode gates, which may be two to three orders of magnitude noisier and slower than local operations. Surmounting the limitations of internode gates will require a range of techniques, including improvements in entanglement generation, the use of entanglement distillation, and optimized software and compilers, and it remains unclear how improvements to these components interact to affect overall system performance, what performance from each is required, or even how to quantify the performance of each. In this paper, we employ a `co-design' inspired approach to quantify overall MNQC performance in terms of hardware models of internode links, entanglement distillation, and local architecture. In the case of superconducting MNQCs with microwave-to-optical links, we uncover a tradeoff between entanglement generation and distillation that threatens to degrade performance. We show how to navigate this tradeoff, lay out how compilers should optimize between local and internode gates, and discuss when noisy quantum links have an advantage over purely classical links. Using these results, we introduce a roadmap for the realization of early MNQCs which illustrates potential improvements to the hardware and software of MNQCs and outlines criteria for evaluating the landscape, from progress in entanglement generation and quantum memory to dedicated algorithms such as distributed quantum phase estimation. While we focus on superconducting devices with optical interconnects, our approach is general across MNQC implementations.

翻訳日:2023-01-09 14:04:43 公開日:2022-12-12

# ランダム量子回路を用いたランダム化ベンチマークの一般保証

General guarantees for randomized benchmarking with random quantum circuits ( http://arxiv.org/abs/2212.06181v1 )

ライセンス: Link先を確認

Markus Heinrich, Martin Kliesch, Ingo Roth

(参考訳) ランダム化ベンチマーク(RB)は、多くの変種を含め、量子コンピュータにおけるゲート実装の品質を評価するために広く用いられている手法である。検証対象のゲートがコンパクト群から一様ランダムに選ばれる場合には、RBプロトコルの動作と解釈について、詳細な理論的理解と一般的な保証が存在する。これに対し、実用上魅力的でスケーラブルなRBプロトコルの多くは、あるゲートセットからランダムに選ばれる局所ゲートからなるランダム量子回路を実装している。実際に広く使われているにもかかわらず、これらの非一様なRBプロトコルに対しては、実験的に妥当な仮定の下での一般的な保証が欠けている。本研究では、フィルタRBと呼ぶ、ランダム回路に対する広いクラスのRBプロトコルについて、そのような保証を導出する。代表的な例として、線形クロスエントロピーベンチマーク、指標(character)ベンチマーク、パウリノイズトモグラフィ、同時RBの変種がある。ランダム回路に関する最近の結果に基づき、多くの関連するフィルタRBスキームが線形深さのランダム量子回路で実現できることを示し、一般的な事例に対して明示的な小さい定数を与える。さらに、フィルタRBの一般的なサンプル複雑性の上界を導出する。高次のクロストークを扱うプロトコルを含むいくつかの関連する群において、フィルタRBはサンプル効率が高いことを示す。非一様フィルタRBに関する我々の理論は、原理的には、非ユニバーサルおよびアナログ量子シミュレータのための新しいプロトコルを設計できるほど柔軟である。

In its many variants, randomized benchmarking (RB) is a broadly used technique for assessing the quality of gate implementations on quantum computers. A detailed theoretical understanding and general guarantees exist for the functioning and interpretation of RB protocols if the gates under scrutiny are drawn uniformly at random from a compact group. In contrast, many practically attractive and scalable RB protocols implement random quantum circuits with local gates randomly drawn from some gate-set. Despite their abundance in practice, for those non-uniform RB protocols, general guarantees under experimentally plausible assumptions are missing. In this work, we derive such guarantees for a large class of RB protocols for random circuits that we refer to as filtered RB. Prominent examples include linear cross-entropy benchmarking, character benchmarking, Pauli-noise tomography and variants of simultaneous RB. Building upon recent results for random circuits, we show that many relevant filtered RB schemes can be realized with random quantum circuits in linear depth, and we provide explicit small constants for common instances. We further derive general sample complexity bounds for filtered RB. We show filtered RB to be sample-efficient for several relevant groups, including protocols addressing higher-order cross-talk. Our theory for non-uniform filtered RB is, in principle, flexible enough to design new protocols for non-universal and analog quantum simulators.

翻訳日:2023-01-09 14:04:15 公開日:2022-12-12

# 有限破壊の単発読み出しによる読み出しと初期化の忠実度について

On readout and initialisation fidelity by finite demolition single shot readout ( http://arxiv.org/abs/2212.06271v1 )

ライセンス: Link先を確認

Majid Zahedian, Max Keller, Minsik Kwon, Javid Javadzade, Jonas Meinel, Vadim Vorobyov and Jörg Wrachtrup

(参考訳) 理想的な射影量子測定は、系の状態を観測量演算子の固有状態 $|\phi_\alpha\rangle$ のいずれかに収縮させるため、系を所望の純粋状態に準備するための強力な手段となる。しかし、射影測定の実験的な実現は理想的ではない。装置の古典的なノイズを克服するために必要な測定時間の間に、系の状態はしばしば(わずかに)乱され、初期化の忠実度が損なわれる。本稿では、単発読み出しによって行われる系の初期化忠実度を解析するための解析モデルを提案する。ダイヤモンド中のNV色中心に対する光子計数に基づく読み出しのうち、最もよく使われる3つの場合、すなわち電荷状態、核スピン、低温電子スピンの読み出しについて、パラメータを最適化する手法を導出する。我々の研究は、単発読み出しがポストセレクションやリアルタイム制御による初期化に用いられる場合の、量子ビットの初期化忠実度の正確な記述に関連している。

Ideal projective quantum measurement makes the system state collapse in one of the observable operator eigenstates $|\phi_\alpha\rangle$, making it a powerful tool for preparing the system in the desired pure state. Nevertheless, experimental realisations of projective measurement are not ideal. During the measurement time needed to overcome the classical noise of the apparatus, the system state is often (slightly) perturbed, which compromises the fidelity of initialisation. In this paper, we propose an analytical model to analyse the initialisation fidelity of the system performed by the single-shot readout. We derive a method to optimise parameters for the three most used cases of photon counting based readouts for NV colour centre in diamond, charge state, nuclear spin and low temperature electron spin readout. Our work is of relevance for the accurate description of initialisation fidelity of the quantum bit when the single-shot readout is used for initialisation via post-selection or real-time control.

翻訳日:2023-01-09 14:03:52 公開日:2022-12-12

# 対称量子センサの量子誤差補正

Quantum error correction on symmetric quantum sensors ( http://arxiv.org/abs/2212.06285v1 )

ライセンス: Link先を確認

Yingkai Ouyang and Gavin K. Brennen

(参考訳) 集団角運動量の対称状態は、準備が容易で、個別アドレス指定を必要とせずに制御できるため、量子センサーの多量子ビットプローブ状態のよい候補である。ここでは、対称プローブ状態を用いて古典場の大きさを推定するための量子誤り訂正プロトコルを与える。これを達成するために、まず対称部分空間上の量子誤り訂正の一般理論を構築する。この理論は対称群の表現論に基づいており、任意の置換不変符号上の訂正可能な任意の誤りを訂正できる効率的なアルゴリズムの構成を可能にする。これらのアルゴリズムは、全角運動量の測定、量子シュール変換または論理状態テレポーテーション、および幾何学的パルスゲートを含む。削除誤りに対しては、主に幾何学的パルスゲートに基づく、より単純な量子誤り訂正アルゴリズムを与える。第二に、削除誤りが線形の割合で生じても機能する、対称プローブ状態上の単純な量子センシング手法を考案し、その漸近的性能を解析する。提案手法では、信号が蓄積している間、プローブ状態を符号空間に繰り返し射影する。信号の蓄積に費やす時間が一定であれば、ノイズのない設定で可能な最良の精度に近づく精度で位相推定を行うことができる。第三に、アルゴリズムの短期的な実装を与える。

Symmetric states of collective angular momentum are good candidates for multi-qubit probe states in quantum sensors because they are easy to prepare and can be controlled without requiring individual addressability. Here, we give quantum error correction protocols for estimating the magnitude of classical fields using symmetric probe states. To achieve this, we first develop a general theory for quantum error correction on the symmetric subspace. This theory, based on the representation theory of the symmetric group, allows us to construct efficient algorithms that can correct any correctable error on any permutation-invariant code. These algorithms involve measurements of total angular momentum, quantum Schur transforms or logical state teleportations, and geometric pulse gates. For deletion errors, we give a simpler quantum error correction algorithm based primarily on geometric pulse gates. Second, we devise a simple quantum sensing scheme on symmetric probe states that works in spite of a linear rate of deletion errors, and analyze its asymptotic performance. In our scheme, we repeatedly project the probe state onto the codespace while the signal accumulates. When the time spent to accumulate the signal is constant, our scheme can do phase estimation with precision that approaches the best possible in the noiseless setting. Third, we give near-term implementations of our algorithms.

翻訳日:2023-01-09 14:03:35 公開日:2022-12-12

# 単一光子メモリ計測-デバイス非依存量子セキュアな直接通信

Single-photon-memory measurement-device-independent quantum secure direct communication ( http://arxiv.org/abs/2212.05661v1 )

ライセンス: Link先を確認

Xiang-Jie Li, Dong Pan, Gui-Lu Long, and Lajos Hanzo

(参考訳) 量子セキュアダイレクト通信(QSDC)は、量子チャネルを使用して情報を確実かつ安全に送信する。実用検出器によるセキュリティの抜け穴を取り除くため,測定デバイス非依存(MDI)QSDCプロトコルが提案されている。しかし、ブロックベースの量子状態の伝送はmdi-qsdcで活用されており、執筆時点ではまだ使用できない実用的な量子メモリを必要とする。この障害を回避するため,高速な量子メモリを不要とする単一光子メモリMDI QSDCプロトコル(SPMQC)を提案する。提案プロトコルの性能は,現実的な実験パラメータを考慮したシミュレーションにより特徴づけられ,現在の技術に依拠してspmqcを実装することが可能であることが示されている。

Quantum secure direct communication (QSDC) uses the quantum channel to transmit information reliably and securely. In order to eliminate the security loopholes resulting from practical detectors, the measurement-device-independent (MDI) QSDC protocol has been proposed. However, block-based transmission of quantum states is utilized in MDI-QSDC, which requires practical quantum memory that is still unavailable at the time of writing. For circumventing this impediment, we propose a single-photon-memory MDI QSDC protocol (SPMQC) for dispensing with high-performance quantum memory. The performance of the proposed protocol is characterized by simulations considering realistic experimental parameters, and the results show that it is feasible to implement SPMQC by relying on present-day technology.

翻訳日:2023-01-09 13:56:10 公開日:2022-12-12

# 可換ゲートのSWAPゲート挿入における初期マッピング問題に対するSATアプローチ

A SAT approach to the initial mapping problem in SWAP gate insertion for commuting gates ( http://arxiv.org/abs/2212.05666v1 )

ライセンス: Link先を確認

Atsushi Matsuo, Shigeru Yamashita, Daniel J. Egger

(参考訳) ほとんどの量子回路は、量子ビット接続に制限のある量子ハードウェア上で実行するためにSWAPゲート挿入を必要とする。可換な2量子ビットゲートのブロックに対する有望なSWAPゲート挿入方法は、結合マップ上で同時に実行可能なSWAPゲートの層を適用する所定のスワップ戦略である。スワップ戦略に対する優れた初期マッピングは、必要なSWAPゲートの数を減らす。しかし、量子近似最適化アルゴリズム(QAOA)やイジング・ハミルトニアンのトロッター化シミュレーションのように、回路が可換ゲートで構成されている場合でも、よい初期マッピングを見つけることは難しい問題である。そこで本研究では,スワップ戦略を用いてハードウェアにトランスパイルされる可換ゲート回路に対して,良い初期マッピングをSATで求める手法を提案する。この手法は500ノードのランダムな3正則グラフに対するゲート数を65%削減する。さらに,SATの定式化とクラスタリングアルゴリズムを組み合わせて大規模な問題を扱いやすいサイズに縮小するヒューリスティックな手法を提案する。このアプローチは、1000ノードのランダムな3正則グラフに対して、自明な初期マッピングとランダムな初期マッピングの両方と比較して、スワップ層数を25%削減する。良い初期マッピングは、数百の量子ビットを持つノイズの多い量子ハードウェア上で、スパース問題に適用されたQAOAやイジング・ハミルトニアンシミュレーションのような量子アルゴリズムの研究を可能にする。

Most quantum circuits require SWAP gate insertion to run on quantum hardware with limited qubit connectivity. A promising SWAP gate insertion method for blocks of commuting two-qubit gates is a predetermined swap strategy which applies layers of SWAP gates simultaneously executable on the coupling map. A good initial mapping for the swap strategy reduces the number of required swap gates. However, even when a circuit consists of commuting gates, e.g., as in the Quantum Approximate Optimization Algorithm (QAOA) or trotterized simulations of Ising Hamiltonians, finding a good initial mapping is a hard problem. We present a SAT-based approach to find good initial mappings for circuits with commuting gates transpiled to the hardware with swap strategies. Our method achieves a 65% reduction in gate count for random three-regular graphs with 500 nodes. In addition, we present a heuristic approach that combines the SAT formulation with a clustering algorithm to reduce large problems to a manageable size. This approach reduces the number of swap layers by 25% compared to both a trivial and random initial mapping for a random three-regular graph with 1000 nodes. Good initial mappings will therefore enable the study of quantum algorithms, such as QAOA and Ising Hamiltonian simulation applied to sparse problems, on noisy quantum hardware with several hundreds of qubits.

翻訳日:2023-01-09 13:55:58 公開日:2022-12-12

# マイクロ波被覆弱非調和超伝導量子ビットの非摂動的再正規化の解消

Resolving non-perturbative renormalization of a microwave-dressed weakly anharmonic superconducting qubit ( http://arxiv.org/abs/2212.05847v1 )

ライセンス: Link先を確認

Byoung-moo Ann, Sercan Deve, and Gary A. Steele

(参考訳) マイクロ波駆動は超伝導量子ビット(scqs)のユビキタスな技術であるが、従来の摂動理論と回転波近似に基づく服装状態の記述は強い駆動限界のダイナミクスを完全に捉えることはできない。これらの近似を超越した包括的な実験的な研究は、残念ながら量子技術への関心が高まり、稀である。本研究では,マイクロ波装填トランスモンを広範囲の駆動パラメータで検討する。我々は,従来の近似を破ることなく,Rabi周波数,エネルギー緩和時間,およびリードアウト共振器との結合速度の有意な再正規化を見出した。また、2状態モデルを超えた簡潔な非フロケ理論を確立し、近似を劇的に最小化し、実験を良好に定量化する。この研究は、時間周期駆動システムの基本的な理解を拡大し、弱い非調和量子ビットのダイナミクスを正確に推定する上で重要な役割を担います。さらに,マルチレベルシステムではより複雑である適切なフロッケモードの選択などの追加作業を回避できるため,非フロッケアプローチは理論的解析に有益である。

Microwave driving is a ubiquitous technique for superconducting qubits (SCQs), but the dressed-state description based on the conventionally used perturbation theory and rotating wave approximation cannot fully capture the dynamics in the strong driving limit. Comprehensive experimental work beyond these approximations that is applicable to transmons, which are receiving rising interest in quantum technologies, is unfortunately rare. In this work, we investigate a microwave-dressed transmon over a wide range of driving parameters. We find significant renormalization of Rabi frequencies, energy relaxation times, and the coupling rates with a readout resonator, none of which can be quantified without breaking the conventional approximations. We also establish a concise non-Floquet theory beyond the two-state model while dramatically minimizing the approximations, which excellently quantifies the experiments. This work expands our fundamental understanding of time-periodically driven systems and will have an important role in accurately estimating the dynamics of weakly anharmonic qubits. Furthermore, our non-Floquet approach is beneficial for theoretical analysis since one can avoid additional efforts such as the choice of proper Floquet modes, which is more complicated for multi-level systems.

翻訳日:2023-01-09 13:54:59 公開日:2022-12-12

# 位置表現におけるBialynicki-BirulaとLandau-Peierls Fock空間の電磁場量子化の同型性

Isomorphism between the Bialynicki-Birula and the Landau-Peierls Fock space quantization of the electromagnetic field in position representation ( http://arxiv.org/abs/2212.05849v1 )

ライセンス: Link先を確認

Maxime Federico and Hans Rudolf Jauslin

(参考訳) まず, 位置空間表現における電磁場の量子化について, クーロンゲージにおけるlandau-peierlsアプローチと, リーマン・シルバーシュタインベクトルに基づくbialynicki-birulaアプローチの2つの主要なアプローチを用いて概説する。古典的ハミルトニアン構造から始まる枠組みと、正確に定義された対応原理によってボソニックフォック空間に量子モデルを構築する枠組みの両方を記述する。 2つの近似が完全同値であることを示す。これは、フォック空間の間に同型となるユニタリ写像が存在することを示すことによって定式化される。物理的に測定可能な全ての量はスカラー積で表現できるので、2つの量子化が全く同じ物理的性質をもたらすことを意味する。さらに、同型は時間進化において保存されていることを示す。等価性を示すために、ヘリシティと周波数演算子の概念を用いる。これら2つの演算子の組み合わせは、これらの2つの量子化法を正確な方法でリンクできる定式化を提供する。また、ハミルトニアンにおける負の固有値の存在を回避できるbialynicki-birula量子化の構成は、電子と陽電子のディラック方程式の例に類似しており、マクスウェル方程式の正準変数の別の選択を通して行うことができることを示した。

We first present a summary of the quantization of the electromagnetic field in position space representation, using two main approaches: the Landau-Peierls approach in the Coulomb gauge and the Bialynicki-Birula approach, based on the Riemann-Silberstein vector. We describe both in a framework that starts with a classical Hamiltonian structure and builds the quantum model in a bosonic Fock space by a precisely defined principle of correspondence. We show that the two approaches are completely equivalent. This is formulated by showing that there is a unitary map between the Fock spaces that makes them isomorphic. Since all the physically measurable quantities can be expressed in terms of scalar products, this implies that the two quantizations lead to exactly the same physical properties. We show furthermore that the isomorphism is preserved in the time evolutions. To show the equivalence, we use the concepts of helicity and frequency operators. The combination of these two operators provides a formulation that allows one to make the link between these two methods of quantization in a precise way. We also show that the construction in the Bialynicki-Birula quantization that avoids the presence of negative eigenvalues in the Hamiltonian, in analogy with the one for the Dirac equation for electrons and positrons, can be performed through an alternative choice of the canonical variables for Maxwell's equations.

翻訳日:2023-01-09 13:54:39 公開日:2022-12-12

# 単光子干渉計のサブ0.1度位相ロック

Sub-0.1 degree phase locking of a single-photon interferometer ( http://arxiv.org/abs/2212.05852v1 )

ライセンス: Link先を確認

Vojtěch Švarc, Martina Nováková, Michal Dudka, Miroslav Ježek

(参考訳) 単光子mach-zehnder干渉計の位相精度を15時間で0.05度に安定化した。位相をロックするために、量子信号とは異なる波長の補助基準光を用いる。開発した位相ロックは、無視可能なクロストークと量子信号の任意の位相に対して連続的に動作する。さらに、その性能は基準の強度変動とは無関係である。この手法は、量子干渉量論ネットワークの大部分で使用できるため、量子通信や量子メトロロジーにおける位相感応応用を大幅に改善することができる。

We report a single-photon Mach-Zehnder interferometer stabilized to a phase precision of 0.05 degrees over 15 hours. To lock the phase, we employ an auxiliary reference light at a different wavelength than the quantum signal. The developed phase locking operates continuously, with negligible crosstalk, and for an arbitrary phase of the quantum signal. Moreover, its performance is independent of intensity fluctuations of the reference. Since the presented method can be used in a vast majority of quantum interferometric networks, it can significantly improve phase-sensitive applications in quantum communication and quantum metrology.

翻訳日:2023-01-09 13:54:13 公開日:2022-12-12

# trinet:完全あるいはゆっくり崩壊した自己教師付き学習の安定化

TriNet: stabilizing self-supervised learning from complete or slow collapse ( http://arxiv.org/abs/2301.00656v1 )

ライセンス: Link先を確認

Lixin Cao, Jun Wang, Ben Yang, Dan Su, Dong Yu

(参考訳) 自己教師付き学習(SSL)モデルは、突然の情報崩壊や遅い次元崩壊という課題に直面している。本稿では,崩壊を防止し,事前学習を安定化するための新しい三分岐アーキテクチャTriNetを提案する。提案手法は,下流のベンチマークASRタスクにおいて,最先端(SOTA)のData2vecと比較して事前学習を顕著に安定化・高速化し,5.32%の相対単語誤り率低減(WERR)を達成する。コードはhttps://github.com/tencent-ailab/でリリースします。

Self-supervised learning (SSL) models confront challenges of abrupt informational collapse or slow dimensional collapse. We propose TriNet, which introduces a novel triple-branch architecture for preventing collapse and stabilizing the pre-training. Our experimental results show that the proposed method notably stabilizes and accelerates pre-training and achieves a relative word error rate reduction (WERR) of 5.32% compared to the state-of-the-art (SOTA) Data2vec for a downstream benchmark ASR task. We will release our code at https://github.com/tencent-ailab/.

翻訳日:2023-01-09 13:47:41 公開日:2022-12-12

# 自律走行車における分散的協調認識--未知の価値を学習する

Decentralized cooperative perception for autonomous vehicles: Learning to value the unknown ( http://arxiv.org/abs/2301.01250v1 )

ライセンス: Link先を確認

Maxime Chaveroche, Franck Davoine, Véronique Cherfaoui

(参考訳) 最近、自動運転車による事故と十分な情報不足が目撃されている。この問題に取り組む一つの方法は、異なる視点の認識、すなわち協調的な知覚から恩恵を受けることである。そこで我々は,エージェントが周囲にもっと知りたくなるような特定の領域を求めることで,完全な認識を求める活動を行っている分散的なコラボレーション,すなわちピア・ツー・ピアを提案する。究極的には、移動対象に関する知識の最大化と、他者から受信される情報総量の最小化とのトレードオフを最適化し、通信コストとメッセージ処理時間を制限したい。そこで本研究では,送信側でフィルタリングを行う代わりに,未知の車両を自我車に要求するだけで,通常の通信パラダイムを逆転する通信方針を学習する方法を提案する。深層強化学習(drl)アルゴリズムのベースとして3つの異なる生成モデルをテストし,それらを放送ポリシーとランダムに選択するポリシーと比較した。特に,局部予測可能なvae (lp-vae) を提案する。これはスタンドアロンモデルとdrlの文脈の両方において,最先端のモデルよりも優れた予測状態を生成する。運転シミュレータCARLAで実験を行った。我々の最良のモデルは、平均して補完情報の25%を獲得し、エゴ車両の知覚野の約5%しか要求していない。このトレードオフは、報酬関数の解釈可能なハイパーパラメータを通じて調整可能です。

Recently, we have witnessed accidents involving autonomous vehicles and their lack of sufficient information. One way to tackle this issue is to benefit from the perception of different viewpoints, namely cooperative perception. We propose here a decentralized collaboration, i.e. peer-to-peer, in which the agents are active in their quest for full perception by asking for specific areas in their surroundings on which they would like to know more. Ultimately, we want to optimize a trade-off between the maximization of knowledge about moving objects and the minimization of the total volume of information received from others, to limit communication costs and message processing time. For this, we propose a way to learn a communication policy that reverses the usual communication paradigm by only requesting from other vehicles what is unknown to the ego-vehicle, instead of filtering on the sender side. We tested three different generative models to be taken as base for a Deep Reinforcement Learning (DRL) algorithm, and compared them to a broadcasting policy and a policy randomly selecting areas. In particular, we propose Locally Predictable VAE (LP-VAE), which appears to be producing better belief states for predictions than state-of-the-art models, both as a standalone model and in the context of DRL. Experiments were conducted in the driving simulator CARLA. Our best models reached on average a gain of 25% of the total complementary information, while only requesting about 5% of the ego-vehicle's perceptual field. This trade-off is adjustable through the interpretable hyperparameters of our reward function.

翻訳日:2023-01-09 13:47:06 公開日:2022-12-12

# テキストに富む歴史的文書のページレイアウト分析--テキストと視覚的アプローチの比較-

Page Layout Analysis of Text-heavy Historical Documents: a Comparison of Textual and Visual Approaches ( http://arxiv.org/abs/2212.13924v1 )

ライセンス: Link先を確認

Najem-Meyer Sven, Romanello Matteo

(参考訳) ページレイアウト分析は、ページを関心のある領域に分割するドキュメント処理の基本的なステップである。非常に複雑なレイアウトと混在する文字体系を持つ学術的な注釈書は、テキストに富んだ文書であり、最先端のモデルでも依然として困難である。そのレイアウトは版によって大きく異なり、最も重要な領域は主に位置や外観といったグラフィカルな特徴ではなく、意味的に定義される。この設定は、テキスト、ビジュアル、およびハイブリッドのアプローチの比較を要求する。そこで我々は2つのトランスフォーマー(LayoutLMv3とRoBERTa)と物体検出ネットワーク(YOLOv5)の性能を評価する。結果は後者の明確な優位性を示すが、この発見に対する注意点もいくつか挙げる。実験に加えて、19世紀の注釈書から採取した約300ページの注釈付きデータセットを公開する。

Page layout analysis is a fundamental step in document processing that segments a page into regions of interest. With highly complex layouts and mixed scripts, scholarly commentaries are text-heavy documents which remain challenging for state-of-the-art models. Their layout varies considerably across editions, and their most important regions are mainly defined by semantic rather than graphical characteristics such as position or appearance. This setting calls for a comparison between textual, visual and hybrid approaches. We therefore assess the performance of two transformers (LayoutLMv3 and RoBERTa) and an object-detection network (YOLOv5). Although the results show a clear advantage in favor of the latter, we also list several caveats to this finding. In addition to our experiments, we release a dataset of ca. 300 annotated pages sampled from 19th century commentaries.

翻訳日:2023-01-01 14:26:25 公開日:2022-12-12

# 自己知識蒸留と選択的バッチサンプリングを用いたロバストな睡眠ステージスコアリングのためのシャム睡眠トランスフォーマー

Siamese Sleep Transformer For Robust Sleep Stage Scoring With Self-knowledge Distillation and Selective Batch Sampling ( http://arxiv.org/abs/2212.13919v1 )

ライセンス: Link先を確認

Heon-Gyu Kwak, Young-Seok Kweon, Gi-Hwan Shin

(参考訳) 本稿では,単一チャネルの生脳波信号から特徴を効果的に抽出し,ロバストな睡眠ステージスコアリングを行うシアム睡眠トランスフォーマ(sst)を提案する。過去数年間の睡眠ステージスコアの大幅な進歩にもかかわらず、そのほとんどはモデルパフォーマンスの増大に重点を置いていた。しかしながら、データセット内のラベルのバイアスや、繰り返しトレーニングによるモデルパフォーマンスの不安定さなど、他の問題も存在する。そこで本研究では,選択的なバッチサンプリング戦略と自己認識蒸留による新しい睡眠ステージスコアリングモデルであるsstを提案する。このモデルがラベルのバイアスに対してどれほど堅牢かを評価するために、私たちは、トレーニングとテストのために異なるデータセット、すなわち睡眠心健康調査と睡眠-EDFデータセットを使用しました。この条件下では、SSTは睡眠ステージスコアにおいて競争性能を示した。また, 繰り返し訓練による性能の標準偏差を低減し, 選択的バッチサンプリング戦略の有効性を実証した。これらの結果から,sstはデータセット内のラベルのバイアスに対して効果的な学習特徴を抽出でき,選択的なバッチサンプリング戦略はモデルのロバスト性に有効であった。

In this paper, we propose a Siamese sleep transformer (SST) that effectively extracts features from single-channel raw electroencephalogram signals for robust sleep stage scoring. Despite the significant advances in sleep stage scoring in the last few years, most of them mainly focused on the increment of model performance. However, other problems still exist: the bias of labels in datasets and the instability of model performance by repetitive training. To alleviate these problems, we propose the SST, a novel sleep stage scoring model with a selective batch sampling strategy and self-knowledge distillation. To evaluate how robust the model was to the bias of labels, we used different datasets for training and testing: the sleep heart health study and the Sleep-EDF datasets. In this condition, the SST showed competitive performance in sleep stage scoring. In addition, we demonstrated the effectiveness of the selective batch sampling strategy with a reduction of the standard deviation of performance by repetitive training. These results could show that SST extracted effective learning features against the bias of labels in datasets, and the selective batch sampling strategy worked for the model robustness in training.

翻訳日:2023-01-01 14:26:11 公開日:2022-12-12

# インダストリアルエッジ装置からの高周波機械データを用いた工具側面摩耗予測

Tool flank wear prediction using high-frequency machine data from industrial edge device ( http://arxiv.org/abs/2212.13905v1 )

ライセンス: Link先を確認

D. Bilgili (1), G. Kecibas (1 and 2), C. Besirova (1 and 2), M. R. Chehrehzad (2), G. Burun (3), T. Pehlivan (1), U. Uresin (1), E. Emekli (1), I. Lazoglu (2) ((1) Ford Otosan R&D Center, Istanbul, Turkey, (2) Koç University, Manufacturing and Automation Research Center, Istanbul, Turkey, (3) Tubitak BILGEM Information Technologies Institute, Kocaeli, Turkey)

(参考訳) ツールサイドの摩耗監視は、生産性と製品品質を高めながら、加工のダウンタイムコストを最小限に抑えることができる。一部の工業用途では、必要な許容度を達成するために、限られたレベルの工具着用しか認められない。機械のフレキシブルな振動などの他のコンポーネントが測定信号を支配しているため、機械から収集されたデータの限られたレベルのツール摩耗を監視することは困難になるかもしれない。本研究では,スピンドルモータ電流とダイナモメータの測定値から工具摩耗の限られたレベルを予測するための工具摩耗モニタリング手法を提案する。産業用エッジ装置で高周波スピンドルモータ電流データを収集し、選択された多数の穴の掘削試験において、回転ダイナモメーターで切削力とトルクを測定する。ツール摩耗の小さな変化に最も敏感な計測信号の統計的特徴を特定するために,特徴工学を行った。計測されたスピンドルモータ電流とダイナモメータ信号からツール側面の摩耗を予測するために、long short-term memory(lstm)アーキテクチャに基づくニューラルネットワークを開発した。提案手法は精度が高く計算効率も高いツールサイド摩耗を予測できることが実証された。提案手法は産業用エッジデバイスにリアルタイムの予測保守アプリケーションとして容易に実装でき、製造ダウンタイムやツール使用過多によるコストを最小限に抑えることができる。

Tool flank wear monitoring can minimize machining downtime costs while increasing productivity and product quality. In some industrial applications, only a limited level of tool wear is allowed to attain necessary tolerances. It may become challenging to monitor a limited level of tool wear in the data collected from the machine due to the other components, such as the flexible vibrations of the machine, dominating the measurement signals. In this study, a tool wear monitoring technique to predict limited levels of tool wear from the spindle motor current and dynamometer measurements is presented. High-frequency spindle motor current data is collected with an industrial edge device while the cutting forces and torque are measured with a rotary dynamometer in drilling tests for a selected number of holes. Feature engineering is conducted to identify the statistical features of the measurement signals that are most sensitive to small changes in tool wear. A neural network based on the long short-term memory (LSTM) architecture is developed to predict tool flank wear from the measured spindle motor current and dynamometer signals. It is demonstrated that the proposed technique predicts tool flank wear with good accuracy and high computational efficiency. The proposed technique can easily be implemented in an industrial edge device as a real-time predictive maintenance application to minimize the costs due to manufacturing downtime and tool underuse or overuse.

翻訳日:2023-01-01 14:25:13 公開日:2022-12-12

# PEファイル中のマルウェア検出のための機械学習

Machine Learning for Detecting Malware in PE Files ( http://arxiv.org/abs/2212.13988v1 )

ライセンス: Link先を確認

Collin Connors and Dilip Sarkar

(参考訳) 高度なマルウェアの増加は、サイバーセキュリティの大きな脅威となる。ポータブル実行可能ファイル(PEファイル)はそのようなマルウェアの一般的なベクトルである。本研究では,機械学習を用いたPEマルウェア検出手法のレビューと評価を行う。大規模ベンチマークデータセットを用いて,マルウェア検出に最も一般的な機械学習手法を用いてpeファイルの特徴を評価する。

The increasing number of sophisticated malware poses a major cybersecurity threat. Portable executable (PE) files are a common vector for such malware. In this work we review and evaluate machine learning-based PE malware detection techniques. Using a large benchmark dataset, we evaluate features of PE files using the most common machine learning techniques to detect malware.

翻訳日:2023-01-01 14:24:52 公開日:2022-12-12

# 時間変動コスト関数を用いた分散制約なし最適化

Distributed Unconstrained Optimization with Time-varying Cost Functions ( http://arxiv.org/abs/2212.09472v1 )

ライセンス: Link先を確認

Amir-Salar Esteki and Solmaz S. Kia

(参考訳) 本稿では,グループネットワークエージェントの時間変動局所コスト関数の総和を総コストとする分散制約なし最適化問題に対する新しい解を提案する。目的は、各時点の総コストを最小限に抑える最適な軌道を追跡することである。提案手法は,2段階のダイナミックスから成り,第1段階と第2段階の局所的コストの導関数を定期的にサンプリングして最適軌道への降下方向の推定を行い,第2段階では,この推定とコンセンサス項を用いて局所状態の時間変化解への誘導を行う。第1部は離散時間フレームワークにおける重み付き平均コンセンサスアルゴリズムの実装により実行され、第2部は連続時間ダイナミクスで実行される。リアプノフ安定性解析を用いて、漸近的に到達した総コストの勾配上の上限を求める。この境界は地域費用の特性によって特徴づけられる。提案手法の性能を示すために,アルゴリズムのパラメータをチューニングした数値実験を行い,局所状態の最適軌道への収束に対する効果について検討した。

In this paper, we propose a novel solution for the distributed unconstrained optimization problem where the total cost is the summation of time-varying local cost functions of a group of networked agents. The objective is to track the optimal trajectory that minimizes the total cost at each time instant. Our approach consists of a two-stage dynamics, where the first one samples the first and second derivatives of the local costs periodically to construct an estimate of the descent direction towards the optimal trajectory, and the second one uses this estimate and a consensus term to drive local states towards the time-varying solution while reaching consensus. The first part is carried out by the implementation of a weighted average consensus algorithm in the discrete-time framework and the second part is performed with a continuous-time dynamics. Using the Lyapunov stability analysis, an upper bound on the gradient of the total cost is obtained which is asymptotically reached. This bound is characterized by the properties of the local costs. To demonstrate the performance of the proposed method, a numerical example is conducted that studies tuning the algorithm's parameters and their effects on the convergence of local states to the optimal trajectory.
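
The abstract above mentions a discrete-time weighted average consensus step. As a rough illustration only, and not the authors' two-stage method, the sketch below runs plain (unweighted) average consensus on a hypothetical four-agent ring. The function name, step size, graph, and initial values are all assumptions made for this example.

```python
def average_consensus(values, neighbors, step=0.3, iterations=200):
    """Plain discrete-time average consensus on an undirected graph.

    Each agent repeatedly moves toward its neighbors' states. For a
    connected, symmetric graph and step < 1 / max_degree, every state
    converges to the average of the initial values.
    """
    x = list(values)
    for _ in range(iterations):
        # Synchronous update: all agents use the states from the previous round.
        x = [
            xi + step * sum(x[j] - xi for j in neighbors[i])
            for i, xi in enumerate(x)
        ]
    return x


# Hypothetical ring of four agents (each agent has two neighbors).
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
initial = [1.0, 5.0, 3.0, 7.0]
final = average_consensus(initial, neighbors)
print(final)  # every entry is close to the initial average, 4.0
```

The step size 0.3 is below 1 / max_degree = 0.5 for this ring, which is the usual sufficient condition for this simple update to converge.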

翻訳日:2022-12-25 02:52:46 公開日:2022-12-12

# 会話レコメンダシステムのための合成データセットの評価

Evaluation of Synthetic Datasets for Conversational Recommender Systems ( http://arxiv.org/abs/2212.08167v1 )

ライセンス: Link先を確認

Harsh Lara, Manoj Tiwari

(参考訳) 大規模言語モデル(llms)をトレーニングデータセット、特に会話型レコメンデーションシステムの生成に活用する研究者にとって、堅牢な評価フレームワークの欠如は長年の問題だった。データ生成段階でllmsによってもたらされる効率は、一般的には、生成されたデータが高品質で十分な多様性を有することを保証するために、人手が要求されるため、生成データの評価の過程で阻害される。ダウンストリームアプリケーションでは,トレーニングデータの質が重要となるため,品質を水平的に評価し,バイアスを識別する指標を開発することが重要である。本稿では,生成モデルによって生成されたデータセットを評価するための多面的アプローチを用いて,様々な評価手法の利点と限界について議論する。

For researchers leveraging Large Language Models (LLMs) in the generation of training datasets, especially for conversational recommender systems, the absence of robust evaluation frameworks has been a long-standing problem. The efficiency brought about by LLMs in the data generation phase is impeded during the process of evaluation of the generated data, since it generally requires human raters to ensure that the data generated is of high quality and has sufficient diversity. Since the quality of training data is critical for downstream applications, it is important to develop metrics that evaluate the quality holistically and identify biases. In this paper, we present a framework that takes a multi-faceted approach towards evaluating datasets produced by generative models and discuss the advantages and limitations of various evaluation methods.

翻訳日:2022-12-25 02:45:48 公開日:2022-12-12

# 逆合成ギャップに注意: 単段階と多段階の逆合成予測の橋渡し

Mind the Retrosynthesis Gap: Bridging the divide between Single-step and Multi-step Retrosynthesis Prediction ( http://arxiv.org/abs/2212.11809v1 )

ライセンス: Link先を確認

Alan Kai Hassen, Paula Torren-Peraire, Samuel Genheden, Jonas Verhoeven, Mike Preuss, Igor Tetko

(参考訳) 再合成は、商業的に利用可能な分子の集合が見つかるまで、再帰的に分子前駆体に分解する作業である。したがって、分子の有効な合成経路を提供することが目的である。単段階モデルが発展するにつれて、分子切断の予測精度が高まり、合成経路の生成が改善される可能性がある。多段階のアプローチは、単段階のレトロシンセシスモデルに格納された化学情報を繰り返し適用する。しかし、この接続は現代の研究には反映されず、プロセス内のシングルステップモデルまたはマルチステップアルゴリズムを固定する。本研究では,2つの共通探索アルゴリズムであるモンテカルロ木探索法とレトロ*法を用いて,異なる単一ステップのレトロシンセシスモデルの性能と転送のベンチマークを行い,両タスク間の橋渡しを確立する。複数のステップに拡張された単一ステップの逆合成を設計したモデルは、現在のマルチステップ手法の経路探索能力に大きな影響を及ぼし、最も広く使われているモデルと比較して最大30%性能が向上することを示した。さらに,同時代の単段階評価指標と多段階評価指標との間には明確な相関は見られず,多段階領域に対して単段階モデルを開発し,テストする必要があることを示す。

Retrosynthesis is the task of breaking down a chemical compound recursively step-by-step into molecular precursors until a set of commercially available molecules is found. Consequently, the goal is to provide a valid synthesis route for a molecule. As more single-step models develop, we see increasing accuracy in the prediction of molecular disconnections, potentially improving the creation of synthetic paths. Multi-step approaches repeatedly apply the chemical information stored in single-step retrosynthesis models. However, this connection is not reflected in contemporary research, fixing either the single-step model or the multi-step algorithm in the process. In this work, we establish a bridge between both tasks by benchmarking the performance and transfer of different single-step retrosynthesis models to the multi-step domain by leveraging two common search algorithms, Monte Carlo Tree Search and Retro*. We show that models designed for single-step retrosynthesis, when extended to multi-step, can have a tremendous impact on the route finding capabilities of current multi-step methods, improving performance by up to +30% compared to the most widely used model. Furthermore, we observe no clear link between contemporary single-step and multi-step evaluation metrics, showing that single-step models need to be developed and tested for the multi-step domain and not as an isolated task to find synthesis routes for molecules of interest.

翻訳日:2022-12-25 02:44:18 公開日:2022-12-12

# nervus: 医学画像と臨床データ分析の両方のための総合的なディープラーニング分類、回帰、予後予測ツール

Nervus: A Comprehensive Deep Learning Classification, Regression, and Prognostication Tool for both Medical Image and Clinical Data Analysis ( http://arxiv.org/abs/2212.11113v1 )

ライセンス: Link先を確認

Toshimasa Matsumoto, Shannon L Walston, Yukio Miki, Daiju Ueda

(参考訳) 本研究の目的は、医用画像研究に使いやすく、グレースケール画像、複数の入力(画像と表データの両方)、マルチラベルタスクを処理できる総合的で柔軟なライブラリを作ることである。 nervusと名付けました。研究目的にAIに適したPyTorchライブラリをベースとして、包括的な入力と出力を処理する4部モデルを作成しました。 nervusは4つの部分からなる。まずはデータローダ、次に特徴抽出器、機能ミキサー、そして最後に分類器です。データローダは入力データを前処理し、特徴抽出器はトレーニングデータとグランド真実ラベルとの間の特徴を抽出し、特徴混合器は抽出器の特徴を混合し、分類器はタスクに基づいて特徴混合器から入力データを分類する。我々はNervusを開発した。Nervusは包括的で柔軟なモデルライブラリで、グレースケール画像、マルチインプット、マルチラベルタスクを処理できる医療画像研究に簡単に利用できる。これは、放射線学の分野の研究者にとって役立つだろう。

The goal of our research is to create a comprehensive and flexible library that is easy to use for medical imaging research, and capable of handling grayscale images, multiple inputs (both images and tabular data), and multi-label tasks. We have named it Nervus. Based on the PyTorch library, which is suitable for AI for research purposes, we created a four-part model to handle comprehensive inputs and outputs. Nervus consists of four parts. First is the dataloader, then the feature extractor, the feature mixer, and finally the classifier. The dataloader preprocesses the input data, the feature extractor extracts the features between the training data and ground truth labels, the feature mixer mixes the features of the extractors, and the classifier classifies the input data from the feature mixer based on the task. We have created Nervus, which is a comprehensive and flexible model library that is easy to use for medical imaging research which can handle grayscale images, multi-inputs and multi-label tasks. This will be helpful for researchers in the field of radiology.

翻訳日:2022-12-25 02:43:54 公開日:2022-12-12

# 自然言語の議論における論理的誤謬のロバストかつ説明可能な同定

Robust and Explainable Identification of Logical Fallacies in Natural Language Arguments ( http://arxiv.org/abs/2212.07425v1 )

ライセンス: Link先を確認

Zhivar Sourati, Vishnu Priya Prasanna Venkatesh, Darshan Deshpande, Himanshu Rawlani, Filip Ilievski, Hông-Ân Sandlin, Alain Mermoud

(参考訳) 偽情報、プロパガンダ、欠陥のある議論の拡散はインターネット時代に増幅されている。データの量と議論規範の違反を識別する微妙さを考えると、コンテンツモデレーションのような情報分析タスクをサポートし、論理的誤りを識別する信頼できる方法が不可欠である。本稿では,従来の論理的誤りに関する理論的研究を,検出,粗粒度,きめ細かい分類の総合的な3段階評価フレームワークに定式化する。既存の評価データセットを評価の各段階に適用する。プロトタイプ推論,インスタンスベース推論,知識注入に基づくロバストで説明可能な3つの手法を考案した。この手法は言語モデルと背景知識と説明可能なメカニズムを組み合わせるために設計されている。さらに,データ拡張とカリキュラム学習の戦略により,データの分散性に対処する。当社の3段階フレームワークは,プロパガンダ検出などの既存のタスクから,事前データセットとメソッドをネイティブに統合し,総合的な評価テストベッドとして機能します。これらの手法をデータセット上で広範囲に評価し,堅牢性と説明可能性に注目した。本研究は,異なる構成要素と誤認クラスにおける手法の強みと弱みについて考察し,誤認同定は様々なクラスを捉えるのに特別な推論を必要とする困難な課題であることを示す。私たちはオープンソースコードとデータをgithubで共有し、論理的な誤った識別に関するさらなる作業を支援しています。

The spread of misinformation, propaganda, and flawed argumentation has been amplified in the Internet era. Given the volume of data and the subtlety of identifying violations of argumentation norms, supporting information analytics tasks, like content moderation, with trustworthy methods that can identify logical fallacies is essential. In this paper, we formalize prior theoretical work on logical fallacies into a comprehensive three-stage evaluation framework of detection, coarse-grained, and fine-grained classification. We adapt existing evaluation datasets for each stage of the evaluation. We devise three families of robust and explainable methods based on prototype reasoning, instance-based reasoning, and knowledge injection. The methods are designed to combine language models with background knowledge and explainable mechanisms. Moreover, we address data sparsity with strategies for data augmentation and curriculum learning. Our three-stage framework natively consolidates prior datasets and methods from existing tasks, like propaganda detection, serving as an overarching evaluation testbed. We extensively evaluate these methods on our datasets, focusing on their robustness and explainability. Our results provide insight into the strengths and weaknesses of the methods on different components and fallacy classes, indicating that fallacy identification is a challenging task that may require specialized forms of reasoning to capture various classes. We share our open-source code and data on GitHub to support further work on logical fallacy identification.

翻訳日:2022-12-16 15:48:02 公開日:2022-12-12

# eBayにおける移動メトリック検出とアラーティングシステム

Moving Metric Detection and Alerting System at eBay ( http://arxiv.org/abs/2004.02360v2 )

ライセンス: Link先を確認

Zezhong Zhang, Keyu Nie and Ted Tao Yuan

(参考訳) ebayでは、さまざまなドメインチームが監視する何千もの製品健康指標があります。異常検出と警告検索に基づいて,動作可能な警告をユーザに通知する2段階警告システムを構築した。第1フェーズでは,分布非依存な基準を持つメトリクス間の潜在的な警告を識別するために,移動メトリック検出(mmd)と呼ばれる効率的な異常検出アルゴリズムを開発した。第2の警告検索フェーズでは、ポイントワイドランキングモデルとビジネスルールで有効な警告を選択するためのフィードバック付きロジックを構築しました。他の傾向や季節分解法と比較すると,非監督症例の異常検出がより高速かつ良好である。当社の2段階アプローチは、警告精度を劇的に改善し、ebayプロダクションにおける警告スパムを回避する。

At eBay, there are thousands of product health metrics for different domain teams to monitor. We built a two-phase alerting system to notify users with actionable alerts based on anomaly detection and alert retrieval. In the first phase, we developed an efficient anomaly detection algorithm, called Moving Metric Detector (MMD), to identify potential alerts among metrics with distribution-agnostic criteria. In the second alert retrieval phase, we built additional logic with feedback to select valid actionable alerts with a point-wise ranking model and business rules. Compared with other trend and seasonality decomposition methods, our decomposer is faster and better at detecting anomalies in unsupervised cases. Our two-phase approach dramatically improves alert precision and avoids alert spamming in eBay production.
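
The abstract does not spell out how the Moving Metric Detector works, so the following is not that algorithm. It is a generic sketch of what a first-phase, distribution-agnostic check could look like, assuming a rolling median and median absolute deviation (MAD) instead of a mean and standard deviation. The function name, window length, threshold, and sample data are hypothetical.

```python
from statistics import median


def rolling_mad_alerts(series, window=7, threshold=5.0):
    """Flag indices whose value deviates strongly from the recent median.

    The deviation is measured in units of the median absolute deviation
    (MAD) of the preceding `window` points, so no particular distribution
    is assumed for the metric.
    """
    alerts = []
    for t in range(window, len(series)):
        history = series[t - window:t]
        center = median(history)
        mad = median(abs(v - center) for v in history)
        scale = mad if mad > 0 else 1e-9  # guard against a flat history
        if abs(series[t] - center) / scale > threshold:
            alerts.append(t)
    return alerts


# Hypothetical daily metric with a single spike at index 9.
metric = [100, 102, 99, 101, 100, 103, 98, 101, 100, 160, 102, 99]
alerts = rolling_mad_alerts(metric)
print(alerts)  # [9]
```

Because the median and MAD are robust to outliers, the spike itself does not inflate the baseline for the points that follow it. A second phase such as the alert retrieval step described above would then rank or filter these candidates.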

翻訳日:2022-12-16 06:00:58 公開日:2022-12-12

# 深層学習に基づく推薦システムにおけるスパース特徴のアクセスパターンによるデータ漏洩

Data Leakage via Access Patterns of Sparse Features in Deep Learning-based Recommendation Systems ( http://arxiv.org/abs/2212.06264v1 )

ライセンス: Link先を確認

Hanieh Hashemi, Wenjie Xiong, Liu Ke, Kiwan Maeng, Murali Annavaram, G. Edward Suh, Hsien-Hsin S. Lee

(参考訳) オンラインパーソナライズドレコメンデーションサービスは一般にクラウドにホストされ、ユーザーはクラウドベースのモデルに問い合わせて商品やニュースフィードなどの推奨入力を受け取る。最先端のレコメンデーションモデルは、ユーザのプロファイル情報とそれらが対話するアイテムを表現するために、疎密な機能に依存しています。スパース機能はモデル全体の99%を占めるが、スパース機能による潜在的な情報漏洩には十分な注意が払われていなかった。これらのスパース機能は、クリック履歴やオブジェクトのインタラクションなど、ユーザの振る舞いを追跡するために使われ、各ユーザのプライベート情報を運ぶ可能性がある。スパース機能は大きなテーブルに格納された学習された埋め込みベクターとして表現され、パーソナライズされたレコメンデーションは、特定のユーザのスパース機能を使用してテーブルをインデックス化する。クラウドで発生する計算を隠蔽する最近提案された方法であっても、クラウドの攻撃者は埋め込みテーブルへのアクセスパターンを追跡することができる。本稿では,レコメンデーションモデルのスパース機能アクセスパターンを追跡することで学習できるプライベート情報について検討する。まず,信頼できないクラウド内のレコメンデーションモデルのスパース機能上で実施可能な攻撃の種類を特徴付け,次に,これらの攻撃がユーザの個人情報の抽出や,ユーザの行動の追跡にどのようにつながるかをデモする。

Online personalized recommendation services are generally hosted in the cloud where users query the cloud-based model to receive recommended input such as merchandise of interest or news feed. State-of-the-art recommendation models rely on sparse and dense features to represent users' profile information and the items they interact with. Although sparse features account for 99% of the total model size, there was not enough attention paid to the potential information leakage through sparse features. These sparse features are employed to track users' behavior, e.g., their click history, object interactions, etc., potentially carrying each user's private information. Sparse features are represented as learned embedding vectors that are stored in large tables, and personalized recommendation is performed by using a specific user's sparse feature to index through the tables. Even with recently-proposed methods that hide the computation happening in the cloud, an attacker in the cloud may still be able to track the access patterns to the embedding tables. This paper explores the private information that may be learned by tracking a recommendation model's sparse feature access patterns. We first characterize the types of attacks that can be carried out on sparse features in recommendation models in an untrusted cloud, followed by a demonstration of how each of these attacks leads to extracting users' private information or tracking users by their behavior over time.
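
To make the threat model above concrete, here is a toy sketch of why embedding-table access patterns can leak information even when the embedding values themselves are hidden. It is not one of the attacks from the paper. The class name, the item-to-category mapping, and the click history are hypothetical and serve only as illustration.

```python
from collections import Counter


class ObservedEmbeddingTable:
    """Embedding table whose row accesses are visible to an observer.

    The observer never reads the embedding values; it only records which
    row indices are touched during a lookup.
    """

    def __init__(self, num_rows, dim):
        self.rows = [[0.0] * dim for _ in range(num_rows)]
        self.access_log = []  # what an attacker in the cloud could record

    def lookup(self, index):
        self.access_log.append(index)
        return self.rows[index]


# Hypothetical public knowledge: which item ID belongs to which category.
ITEM_CATEGORY = {0: "sports", 1: "sports", 2: "health", 3: "finance"}

table = ObservedEmbeddingTable(num_rows=4, dim=8)

# A user's sparse feature (click history) is used to index the table.
for item_id in [2, 0, 2, 2, 3]:
    table.lookup(item_id)

# The observer infers the user's dominant interest from indices alone.
category_counts = Counter(ITEM_CATEGORY[i] for i in table.access_log)
inferred_interest = category_counts.most_common(1)[0][0]
print(inferred_interest)  # health
```

The point of the sketch is that the row index is itself the sparse feature value, so protecting only the computation or the embedding contents leaves the index stream exposed.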

翻訳日:2022-12-14 15:51:44 公開日:2022-12-12
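
As a rough illustration of the leakage channel described in the abstract above, the sketch below models an embedding table whose row accesses are visible to an observer. The class `ObservableEmbeddingTable` and its fields are hypothetical names invented for this example and are not taken from the paper. The only point is that the accessed row indices are the user's sparse feature values, even when the vectors and all later computation stay hidden.

```python
import random


class ObservableEmbeddingTable:
    """Toy embedding table that records which rows are read (hypothetical)."""

    def __init__(self, num_rows, dim, seed=0):
        rng = random.Random(seed)
        # Learned embedding vectors would live here; random values stand in.
        self.rows = [[rng.random() for _ in range(dim)] for _ in range(num_rows)]
        self.access_log = []  # what an observer of memory accesses would see

    def lookup(self, sparse_ids):
        """Return the sum-pooled embedding for a user's sparse feature ids."""
        self.access_log.append(list(sparse_ids))
        dim = len(self.rows[0])
        pooled = [0.0] * dim
        for idx in sparse_ids:
            for j in range(dim):
                pooled[j] += self.rows[idx][j]
        return pooled


table = ObservableEmbeddingTable(num_rows=1000, dim=4)
clicked_items = [17, 256, 901]      # a user's private click history (made up)
pooled = table.lookup(clicked_items)  # the result itself could be hidden
leaked = table.access_log[-1]         # the row indices are still exposed
```

In this toy setting `leaked` equals the click history exactly, which is why access-pattern protection is a separate problem from protecting the computation.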

# スマートファクトリにおける包装機停止予測のためのアグノスティック学習

Agnostic Learning for Packing Machine Stoppage Prediction in Smart Factories ( http://arxiv.org/abs/2212.06288v1 )

ライセンス: Link先を確認

Gabriel Filios, Ioannis Katsidimas, Sotiris Nikoletseas, Stefanos H. Panagiotou, Theofanis P. Raptis

(参考訳) サイバー物理的収束は、産業運営者にとって新たなビジネスチャンスを開いている。サイバーと物理世界の深い統合の必要性は、新しいシステムとネットワークエンジニアリングのアプローチを統合するための豊富なビジネスアジェンダを確立する。この革命は、豊かで異質なデータソースと、そのインテリジェントな利用能力がなければ不可能であり、主にデータが産業4.0を推進するための基本的な資源となるためである。このデータ豊かでサイバー物理的でスマートな工場環境から生まれてくる最も実りある研究と実践分野の1つは、データ駆動型のプロセス監視分野である。本稿では,パッキングマシンの運転状態記録(食品・飲料領域から製造プラントの生産ラインから得られる実データ)の歴史的産業データセットを変換・前処理することにより,産業4.0の応用コンテキストにおいて,一般時系列予測手法と機械学習アルゴリズムについて検討する。提案手法では,機械の動作状態に関する1つの信号のみを使用して予測を行い,他の動作変数や故障信号や警告信号を考慮せずに予測を行う。この点において,本手法は3つのユースケースに対して極めて有望な性能を達成できることを示す。

The cyber-physical convergence is opening up new business opportunities for industrial operators. The need for deep integration of the cyber and the physical worlds establishes a rich business agenda towards consolidating new system and network engineering approaches. This revolution would not be possible without the rich and heterogeneous sources of data, as well as the ability of their intelligent exploitation, mainly due to the fact that data will serve as a fundamental resource to promote Industry 4.0. One of the most fruitful research and practice areas emerging from this data-rich, cyber-physical, smart factory environment is the data-driven process monitoring field, which applies machine learning methodologies to enable predictive maintenance applications. In this paper, we examine popular time series forecasting techniques as well as supervised machine learning algorithms in the applied context of Industry 4.0, by transforming and preprocessing the historical industrial dataset of a packing machine's operational state recordings (real data coming from the production line of a manufacturing plant from the food and beverage domain). In our methodology, we use only a single signal concerning the machine's operational status to make our predictions, without considering other operational variables or fault and warning signals, hence its characterization as ``agnostic''. In this respect, the results demonstrate that the adopted methods achieve a quite promising performance on three targeted use cases.

翻訳日:2022-12-14 15:51:18 公開日:2022-12-12
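
The abstract above says that predictions use only a single operational-state signal. One plausible way to turn such a signal into a supervised learning problem is a sliding window, sketched below. The 0/1 encoding, the function name `make_windows`, and the window and horizon sizes are assumptions made for illustration; the paper's actual preprocessing is not described in the abstract.

```python
def make_windows(states, window, horizon):
    """Turn one operational-state series into (features, label) pairs.

    states  : list of ints, 1 = running, 0 = stopped (assumed encoding)
    window  : number of past samples used as features
    horizon : number of future samples inspected for a stoppage
    """
    samples = []
    for t in range(window, len(states) - horizon + 1):
        features = states[t - window:t]
        future = states[t:t + horizon]
        label = 1 if 0 in future else 0  # 1 = a stoppage occurs within the horizon
        samples.append((features, label))
    return samples


signal = [1, 1, 1, 1, 0, 0, 1, 1, 1, 1]
dataset = make_windows(signal, window=3, horizon=2)
labels = [label for _, label in dataset]
```

Any classifier or forecaster can then be trained on `dataset` without access to other machine variables, which matches the "agnostic" framing.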

# リスクアウェア制御のための外乱のオンライン学習: 1分未満のデータによるリスクアウェア飛行

Learning Disturbances Online for Risk-Aware Control: Risk-Aware Flight with Less Than One Minute of Data ( http://arxiv.org/abs/2212.06253v1 )

ライセンス: Link先を確認

Prithvi Akella, Skylar X. Wei, Joel W. Burdick, and Aaron D. Ames

(参考訳) 安全クリティカルなリスク認識制御の最近の進歩は、システムが直面しうる外乱に関する事前知識を前提としている。本稿では,リスクを考慮した文脈でこれらの外乱をオンラインで効率的に学習する手法を提案する。まず、リスク認識制御コミュニティで一般的に使用されるリスク尺度であるValue-at-Riskを拡張した、確率過程のリスク尺度であるSurface-at-Riskの概念を紹介する。第二に、モデルと真のシステム進化の間の状態差のノルムをスカラー値の確率過程としてモデル化し、ガウス過程回帰を通じてそのSurface-at-Riskの上界を決定する。第三に,システム動作中に収集したデータセットに対して検証可能な軽度の仮定の下で,当てはめた曲面の精度に関する理論的結果を示す。最後に,ドローンの制御器を拡張して本手法を実験的に検証し,1分未満の運用データを収集した後にリスク認識アプローチによって得られる性能向上を示す。

Recent advances in safety-critical risk-aware control are predicated on a priori knowledge of the disturbances a system might face. This paper proposes a method to efficiently learn these disturbances online, in a risk-aware context. First, we introduce the concept of a Surface-at-Risk, a risk measure for stochastic processes that extends Value-at-Risk -- a commonly utilized risk measure in the risk-aware controls community. Second, we model the norm of the state discrepancy between the model and the true system evolution as a scalar-valued stochastic process and determine an upper bound to its Surface-at-Risk via Gaussian Process Regression. Third, we provide theoretical results on the accuracy of our fitted surface subject to mild assumptions that are verifiable with respect to the data sets collected during system operation. Finally, we experimentally verify our procedure by augmenting a drone's controller and highlight performance increases achieved via our risk-aware approach after collecting less than a minute of operating data.

翻訳日:2022-12-14 15:48:41 公開日:2022-12-12
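
Surface-at-Risk is described above as an extension of Value-at-Risk. For readers unfamiliar with the latter, the sketch below computes an empirical Value-at-Risk under one common convention: the smallest value z such that at least a (1 - epsilon) fraction of samples is at most z. The function name, the sample data, and the choice of convention are assumptions for illustration; the paper's own definitions and its Gaussian-process bound are not reproduced here.

```python
import math


def empirical_value_at_risk(samples, epsilon):
    """Smallest sample value z with at least a (1 - epsilon) fraction of samples <= z."""
    if not samples or not 0.0 < epsilon < 1.0:
        raise ValueError("need samples and 0 < epsilon < 1")
    ordered = sorted(samples)
    # Number of samples that must lie at or below the returned value.
    # The small tolerance guards against floating-point round-up.
    needed = math.ceil((1.0 - epsilon) * len(ordered) - 1e-12)
    return ordered[max(needed, 1) - 1]


# Hypothetical norms of the gap between predicted and measured states.
discrepancies = [0.02, 0.05, 0.01, 0.30, 0.04, 0.03, 0.06, 0.08]
var_75 = empirical_value_at_risk(discrepancies, epsilon=0.25)
var_50 = empirical_value_at_risk(discrepancies, epsilon=0.5)
```

Computing such a quantile at each time step of a stochastic process gives a curve over time, which is the intuition behind treating risk as a surface rather than a single number.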

# ドメインインスパイアされた時間グラフ畳み込みニューラルネットワークによる土壌水分の予測と持続的作物管理

Forecasting Soil Moisture Using Domain Inspired Temporal Graph Convolution Neural Networks To Guide Sustainable Crop Management ( http://arxiv.org/abs/2212.06565v1 )

ライセンス: Link先を確認

Muneeza Azmat, Malvern Madondo, Kelsey Dipietro, Raya Horesh, Arun Bawa, Michael Jacobs, Raghavan Srinivasan, Fearghal O'Donncha

(参考訳) 気候変動、人口増加、水不足は農業にとって前例のない課題である。本研究の目的は、持続可能な農業を可能にする作物管理決定のための、ドメイン知識と機械学習を用いた土壌水分の予測である。水文反応の特徴を予測する従来の方法は、計算時間と専門知識を必要とする。最近の研究は、水文応答特性を予測するツールとして機械学習モデルを実装しているが、これらのモデルは、空間的に近接したユニットが全く異なる水文応答を持つことのできる従来の水文モデリングの重要な構成要素を無視している。従来の水文モデルでは、類似した水文特性を持つ単位をまとめて、その空間的近接に関係なくモデルパラメータを共有する。このドメイン知識に触発されて、新しいドメインにインスパイアされた時間グラフ畳み込みニューラルネットワークを構築した。本手法は,時間変動水理特性に基づくクラスタリングユニット,各クラスタのグラフトポロジの構築,およびグラフ畳み込みとゲートリカレントニューラルネットワークを用いた土壌水分の予測を含む。我々は,米国北東部のケーススタディにおいて,40年間にわたる約99,000個の水文応答ユニットからなるフィールドスケール時系列データを訓練し,検証し,検証した。既存のモデルとの比較は、時系列グラフニューラルネットワークを用いたドメインインスパイアされたクラスタリングの有効性を示している。このフレームワークは、pro bono social impactプログラムの一部としてデプロイされている。訓練されたモデルはテキサス中部の小規模農場に配備されている。

Climate change, population growth, and water scarcity present unprecedented challenges for agriculture. This project aims to forecast soil moisture using domain knowledge and machine learning for crop management decisions that enable sustainable farming. Traditional methods for predicting hydrological response features require significant computational time and expertise. Recent work has implemented machine learning models as a tool for forecasting hydrological response features, but these models neglect a crucial aspect of traditional hydrological modeling: spatially close units can have vastly different hydrological responses. In traditional hydrological modeling, units with similar hydrological properties are grouped together and share model parameters regardless of their spatial proximity. Inspired by this domain knowledge, we have constructed a novel domain-inspired temporal graph convolution neural network. Our approach involves clustering units based on time-varying hydrological properties, constructing graph topologies for each cluster, and forecasting soil moisture using graph convolutions and a gated recurrent neural network. We have trained, validated, and tested our method on field-scale time series data consisting of approximately 99,000 hydrological response units spanning 40 years in a case study in the northeastern United States. Comparison with existing models illustrates the effectiveness of using domain-inspired clustering with time series graph neural networks. The framework is being deployed as part of a pro bono social impact program. The trained models are being deployed on small-holding farms in central Texas.

翻訳日:2022-12-14 15:41:41 公開日:2022-12-12

# ROAD: 3次元形状を効率的にエンコードする暗黙的再帰オクツリーオートデコーダの学習

ROAD: Learning an Implicit Recursive Octree Auto-Decoder to Efficiently Encode 3D Shapes ( http://arxiv.org/abs/2212.06193v1 )

ライセンス: Link先を確認

Sergey Zakharov, Rares Ambrus, Katherine Liu, Adrien Gaidon

(参考訳) 3次元形状のコンパクトで正確な表現は多くの知覚やロボット工学のタスクの中心である。最先端の学習ベースの手法は、単一のオブジェクトを再構築できるが、大きなデータセットにはスケールしない。本稿では,暗黙のオクツリーを潜在空間で再帰的にトラバースすることで,複雑な3次元形状の大規模データセットを効率よく正確に符号化する新しい暗黙表現を提案する。暗黙的再帰的Octree Auto-Decoder (ROAD) は階層的に構造化された潜在空間を学習し、圧縮比99%以上で最先端の復元結果を実現する。また,基礎となるoctree空間表現の粗さを自然に活用する効率的なカリキュラム学習手法を提案する。本研究では, 潜在空間次元, データセットサイズ, 再構成精度に関するスケーリング則を考察し, 潜在空間次元の増加は大規模形状データセットにスケールするのに十分であることを示した。最後に,学習した潜在性空間は,異なる詳細レベルにわたって再利用可能な潜在性をもたらす粗粒度から細粒度までの階層構造を符号化し,トレーニングセット外の新しい形状への一般化の質的証拠を提供する。

Compact and accurate representations of 3D shapes are central to many perception and robotics tasks. State-of-the-art learning-based methods can reconstruct single objects but scale poorly to large datasets. We present a novel recursive implicit representation to efficiently and accurately encode large datasets of complex 3D shapes by recursively traversing an implicit octree in latent space. Our implicit Recursive Octree Auto-Decoder (ROAD) learns a hierarchically structured latent space enabling state-of-the-art reconstruction results at a compression ratio above 99%. We also propose an efficient curriculum learning scheme that naturally exploits the coarse-to-fine properties of the underlying octree spatial representation. We explore the scaling law relating latent space dimension, dataset size, and reconstruction accuracy, showing that increasing the latent space dimension is enough to scale to large shape datasets. Finally, we show that our learned latent space encodes a coarse-to-fine hierarchical structure yielding reusable latents across different levels of details, and we provide qualitative evidence of generalization to novel shapes outside the training set.

翻訳日:2022-12-14 15:40:25 公開日:2022-12-12
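
To make the coarse-to-fine octree idea in the abstract above more concrete, here is a minimal sketch of a conventional, explicit sparse octree built by recursive subdivision. This is not ROAD itself: ROAD traverses an implicit octree in latent space with a learned decoder, whereas the code below only shows the underlying recursion on raw points. All names (`build_octree`, `count_leaves`) are hypothetical.

```python
def build_octree(points, center, half_size, max_depth, depth=0):
    """Recursively subdivide a cube, creating children only for occupied octants."""
    if not points:
        return None
    node = {"center": center, "half_size": half_size, "children": []}
    if depth == max_depth:
        return node  # finest level reached: this is a leaf
    # Assign each point to exactly one octant by comparing with the cell center.
    buckets = {}
    for p in points:
        octant = tuple(1 if p[i] >= center[i] else -1 for i in range(3))
        buckets.setdefault(octant, []).append(p)
    quarter = half_size / 2.0
    for octant, subset in sorted(buckets.items()):
        child_center = tuple(center[i] + octant[i] * quarter for i in range(3))
        node["children"].append(
            build_octree(subset, child_center, quarter, max_depth, depth + 1))
    return node


def count_leaves(node):
    """Count cells at the finest occupied level."""
    if not node["children"]:
        return 1
    return sum(count_leaves(child) for child in node["children"])


surface_points = [(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (-0.5, 0.5, -0.5)]
root = build_octree(surface_points, center=(0.0, 0.0, 0.0), half_size=1.0, max_depth=2)
occupied_leaves = count_leaves(root)
```

A dense grid at depth 2 would hold 64 cells, while this sparse tree keeps only the occupied ones, which hints at why hierarchical representations compress well.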

# 量子テンソルネットワークを用いた量子位相認識

Quantum Phase Recognition using Quantum Tensor Networks ( http://arxiv.org/abs/2212.06207v1 )

ライセンス: Link先を確認

Shweta Sahoo, Utkarsh Azad and Harjinder Singh

(参考訳) 機械学習(ML)は、最近、多体物理システムに関連する問題の解決に多くの進歩をもたらした。これらの問題の本質的な量子的性質を考えると、量子強化機械学習によって、現在よりもさらに詳細が明らかにできると推測するのは自然なことである。本稿では,教師付き学習タスクのためのテンソルネットワークに触発された浅い変分アンザッツに基づく量子機械学習手法について検討する。特に,まずFashion-MNISTデータセットを用いた標準的な画像分類タスクを検討し,テンソルネットワーク層の繰り返しがアンザッツの表現力と性能に与える影響を調べる。最後に、この戦略を用いて、1次元および2次元の横磁場Isingモデルとハイゼンベルクスピンモデルに対する量子位相認識の問題に取り組み、マルチスケールエンタングルメント再正規化アンザッツ(MERA)とツリーテンソルネットワーク(TTN)に着想を得たパラメータ化量子回路の両方で$\geq 98\%$のテストセット精度を達成した。

Machine learning (ML) has recently facilitated many advances in solving problems related to many-body physical systems. Given the intrinsic quantum nature of these problems, it is natural to speculate that quantum-enhanced machine learning will enable us to unveil even greater details than we currently have. With this motivation, this paper examines a quantum machine learning approach based on shallow variational ansatz inspired by tensor networks for supervised learning tasks. In particular, we first look at the standard image classification tasks using the Fashion-MNIST dataset and study the effect of repeating tensor network layers on ansatz's expressibility and performance. Finally, we use this strategy to tackle the problem of quantum phase recognition for the transverse-field Ising and Heisenberg spin models in one and two dimensions, where we were able to reach $\geq 98\%$ test-set accuracies with both multi-scale entanglement renormalization ansatz (MERA) and tree tensor network (TTN) inspired parametrized quantum circuits.

翻訳日:2022-12-14 15:32:09 公開日:2022-12-12

# PathFusion:パスに一貫性のあるLidar-Camera Deep Feature Fusion

PathFusion: Path-consistent Lidar-Camera Deep Feature Fusion ( http://arxiv.org/abs/2212.06244v1 )

ライセンス: Link先を確認

Lemeng Wu, Dilin Wang, Meng Li, Yunyang Xiong, Raghuraman Krishnamoorthi, Qiang Liu, Vikas Chandra

(参考訳) カメラとLiDARの融合は、物理特性の相補性により3次元検出の精度を向上させる有望な技術である。既存のほとんどの手法は、カメラ特徴を生のLiDAR点群や浅い3次元特徴と直接融合させることに重点を置いているが、深い3次元特徴を直接融合すると、特徴の不整合により精度が劣ることが観察されている。大きな受容野にまたがる特徴集約に起因するこの不整合は、ネットワークの深い段階ほど深刻になる。本稿では,経路一貫性のあるLiDARとカメラの深層特徴融合を可能にするPathFusionを提案する。PathFusionは浅い特徴と深い特徴の間に経路一貫性損失を導入し、2Dバックボーンとその融合経路が、3Dバックボーンの変換と意味的に整合する形で2D特徴を変換するよう促す。従来の融合ベースラインであるFocals ConvにPathFusionを適用したところ、nuScenesテスト分割において、テスト時拡張の有無にかかわらず一貫して1.2\%以上のmAP向上が観察された。さらにPathFusionは、KITTI AP3D(R11)をmoderateレベルで0.6%以上改善する。

Fusing camera with LiDAR is a promising technique to improve the accuracy of 3D detection due to the complementary physical properties. While most existing methods focus on fusing camera features directly with raw LiDAR point clouds or shallow 3D features, it is observed that direct deep 3D feature fusion achieves inferior accuracy due to feature misalignment. The misalignment that originates from the feature aggregation across large receptive fields becomes increasingly severe for deep network stages. In this paper, we propose PathFusion to enable path-consistent LiDAR-camera deep feature fusion. PathFusion introduces a path consistency loss between shallow and deep features, which encourages the 2D backbone and its fusion path to transform 2D features in a way that is semantically aligned with the transform of the 3D backbone. We apply PathFusion to the prior-art fusion baseline, Focals Conv, and observe more than 1.2\% mAP improvements on the nuScenes test split consistently with and without testing-time augmentations. Moreover, PathFusion also improves KITTI AP3D (R11) by more than 0.6% on moderate level.

翻訳日:2022-12-14 14:49:53 公開日:2022-12-12

# ScanEnts3D: 3Dシーンにおける視覚言語モデル改善のためのフレーズと3Dオブジェクトの対応の活用

ScanEnts3D: Exploiting Phrase-to-3D-Object Correspondences for Improved Visio-Linguistic Models in 3D Scenes ( http://arxiv.org/abs/2212.06250v1 )

ライセンス: Link先を確認

Ahmed Abdelreheem, Kyle Olszewski, Hsin-Ying Lee, Peter Wonka, Panos Achlioptas

(参考訳) ScanRefer [16]とReferIt3D [3]の2つの人気のあるデータセットは、自然言語を現実世界の3Dデータに結びつける。本稿では,参照文で言及されるすべてのオブジェクトを3Dシーン内の対応するインスタンスと関連付けることで,上記2つを拡張する大規模かつ補完的なデータセットを構築する。具体的には、Scan Entities in 3D (ScanEnts3D)データセットは、705の現実世界のシーンをカバーする84kの自然な参照文にわたって、369kのオブジェクトとの明示的な対応を提供する。重要なのは、この新しいデータセットからの学習を可能にする直感的な損失を組み込むことで、最近導入されたいくつかのニューラルリスニングアーキテクチャの性能を大幅に改善できることであり、Nr3DとScanReferのベンチマークでSoTAをそれぞれ4.3%と5.0%改善した。さらに,言語生成タスクについて競合ベースラインと最近の手法で実験し,ニューラルリスナーと同様に3DニューラルスピーカーもScanEnts3Dでの学習から明確な恩恵を受けることを示す。これにはNr3DベンチマークにおけるSoTAの13.2 CIDErポイントの改善が含まれる。全体として,本研究の実験結果は,ScanEnts3Dで学習することで,一般的に使われる視覚言語3Dアーキテクチャが,新たに収集したアノテーションをテスト時に提供することなく,より効率的かつ解釈可能な形で一般化できるという結論を強く支持する。プロジェクトのwebページは https://scanents3d.github.io/ 。

The two popular datasets ScanRefer [16] and ReferIt3D [3] connect natural language to real-world 3D data. In this paper, we curate a large-scale and complementary dataset extending both the aforementioned ones by associating all objects mentioned in a referential sentence to their underlying instances inside a 3D scene. Specifically, our Scan Entities in 3D (ScanEnts3D) dataset provides explicit correspondences between 369k objects across 84k natural referential sentences, covering 705 real-world scenes. Crucially, we show that by incorporating intuitive losses that enable learning from this novel dataset, we can significantly improve the performance of several recently introduced neural listening architectures, including improving the SoTA in both the Nr3D and ScanRefer benchmarks by 4.3% and 5.0%, respectively. Moreover, we experiment with competitive baselines and recent methods for the task of language generation and show that, as with neural listeners, 3D neural speakers can also noticeably benefit by training with ScanEnts3D, including improving the SoTA by 13.2 CIDEr points on the Nr3D benchmark. Overall, our carefully conducted experimental studies strongly support the conclusion that, by learning on ScanEnts3D, commonly used visio-linguistic 3D architectures can become more efficient and interpretable in their generalization without needing to provide these newly collected annotations at test time. The project's webpage is https://scanents3d.github.io/ .

翻訳日:2022-12-14 14:49:31 公開日:2022-12-12

# 光コヒーレンス断層画像を用いた糖尿病網膜症自動評価法

An Ensemble Method to Automatically Grade Diabetic Retinopathy with Optical Coherence Tomography Angiography Images ( http://arxiv.org/abs/2212.06265v1 )

ライセンス: Link先を確認

Yuhan Zheng, Fuping Wu, Bart{\l}omiej W. Papie\.z

(参考訳) 糖尿病網膜症(DR)は糖尿病の合併症であり、世界人口における視覚障害の主な原因の1つである。DRの早期の徴候は通常非常に軽度で検出が難しいため、眼科スクリーニングによる正確な診断は、後期の視力喪失を防ぐために臨床的に重要である。本研究では,Diabetic Retinopathy Analysis Challenge (DRAC) 2022で提供される超広角光コヒーレンストモグラフィ血管造影(UW-OCTA)画像を用いて,DRを自動的にグレーディングするアンサンブル手法を提案する。まず、最先端の分類ネットワークであるResNet、DenseNet、EfficientNet、VGGを採用し、利用可能なデータセットの異なる分割を用いてUW-OCTA画像をグレーディングするよう訓練する。最終的に25のモデルを取得し、そのうち上位16モデルを選択してアンサンブルし、最終的な予測を生成する。また、学習過程においてマルチタスク学習戦略についても検討し、モデル性能を改善するために補助的な分類タスクである画像品質評価を追加する。最終的なアンサンブルモデルは、内部テストデータセットで0.9346の二次重み付きカッパ(QWK)と0.9766のArea Under Curve(AUC)を、DRACチャレンジのテストデータセットで0.839のQWKと0.8978のAUCを達成した。

Diabetic retinopathy (DR) is a complication of diabetes, and one of the major causes of vision impairment in the global population. As the early-stage manifestation of DR is usually very mild and hard to detect, an accurate diagnosis via eye-screening is clinically important to prevent vision loss at later stages. In this work, we propose an ensemble method to automatically grade DR using ultra-wide optical coherence tomography angiography (UW-OCTA) images available from Diabetic Retinopathy Analysis Challenge (DRAC) 2022. First, we adopt the state-of-the-art classification networks, i.e., ResNet, DenseNet, EfficientNet, and VGG, and train them to grade UW-OCTA images with different splits of the available dataset. Ultimately, we obtain 25 models, of which, the top 16 models are selected and ensembled to generate the final predictions. During the training process, we also investigate the multi-task learning strategy, and add an auxiliary classification task, the Image Quality Assessment, to improve the model performance. Our final ensemble model achieved a quadratic weighted kappa (QWK) of 0.9346 and an Area Under Curve (AUC) of 0.9766 on the internal testing dataset, and the QWK of 0.839 and the AUC of 0.8978 on the DRAC challenge testing dataset.

翻訳日:2022-12-14 14:49:01 公開日:2022-12-12
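
The results above are reported as quadratic weighted kappa (QWK) for an ensemble of classifiers. The sketch below shows how QWK is computed from integer grades and how several models' class probabilities could be combined. Probability averaging is an assumption chosen for illustration, since the abstract does not say how the 16 models are ensembled, and the function names are hypothetical.

```python
def quadratic_weighted_kappa(y_true, y_pred, num_classes):
    """QWK between two lists of integer grades in range(num_classes)."""
    n = len(y_true)
    observed = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(num_classes))
                 for j in range(num_classes)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            weight = (i - j) ** 2 / (num_classes - 1) ** 2  # quadratic penalty
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * hist_true[i] * hist_pred[j] / n
    if weighted_expected == 0:
        return 1.0  # degenerate case (one grade everywhere); a convention, not a rule
    return 1.0 - weighted_observed / weighted_expected


def ensemble_predict(prob_lists):
    """Average per-class probabilities from several models and take the argmax."""
    num_classes = len(prob_lists[0])
    averaged = [sum(p[c] for p in prob_lists) / len(prob_lists)
                for c in range(num_classes)]
    return max(range(num_classes), key=lambda c: averaged[c])


perfect = quadratic_weighted_kappa([0, 1, 2], [0, 1, 2], num_classes=3)
reversed_grades = quadratic_weighted_kappa([0, 1, 2], [2, 1, 0], num_classes=3)
grade = ensemble_predict([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.6, 0.3]])
```

Because the penalty grows with the squared distance between grades, QWK punishes a prediction two grades off more heavily than one that is off by a single grade, which suits ordinal labels such as DR severity.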

# ビデオオブジェクトセグメンテーションにおける「オブジェクト」の分解

Breaking the "Object" in Video Object Segmentation ( http://arxiv.org/abs/2212.06200v1 )

ライセンス: Link先を確認

Pavel Tokmakov, Jie Li, Adrien Gaidon

(参考訳) 物体の外観は、それが変化するときには移ろいやすい。卵が割れたり紙が破れたりすると、その色、形、テクスチャが劇的に変化し、アイデンティティそのものを除いて元の姿はほとんど残らない。しかし、この重要な現象は既存のビデオオブジェクトセグメンテーション(VOS)ベンチマークにはほとんど含まれていない。本研究では,Video Object Segmentation under Transformations (VOST)のための新しいデータセットを収集することで,そのギャップを埋める。このデータセットは700本以上の高解像度ビデオで構成され、多様な環境で撮影され、平均20秒の長さで、インスタンスマスクが密にラベル付けされている。これらのビデオが複雑なオブジェクト変換に焦点を合わせ、その時間的範囲全体を捉えるように、注意深い多段階のアプローチを採用している。次に、最先端のVOS手法を広く評価し、多くの重要な発見を得る。特に,既存の手法はこの新しいタスクに適用すると苦戦し,その主な限界は静的な外観手がかりへの過度な依存にあることを示す。これを動機として、時空間情報のモデリングを改善することで能力を高める、最高性能のベースラインに対するいくつかの修正を提案する。より広くは、より堅牢なビデオオブジェクト表現の学習に関する議論を刺激することを期待している。

The appearance of an object can be fleeting when it transforms. As eggs are broken or paper is torn, their color, shape and texture can change dramatically, preserving virtually nothing of the original except for the identity itself. Yet, this important phenomenon is largely absent from existing video object segmentation (VOS) benchmarks. In this work, we close the gap by collecting a new dataset for Video Object Segmentation under Transformations (VOST). It consists of more than 700 high-resolution videos, captured in diverse environments, which are 20 seconds long on average and densely labeled with instance masks. A careful, multi-step approach is adopted to ensure that these videos focus on complex object transformations, capturing their full temporal extent. We then extensively evaluate state-of-the-art VOS methods and make a number of important discoveries. In particular, we show that existing methods struggle when applied to this novel task and that their main limitation lies in over-reliance on static appearance cues. This motivates us to propose a few modifications for the top-performing baseline that improve its capabilities by better modeling spatio-temporal information. But more broadly, the hope is to stimulate discussion on learning more robust video object representations.

翻訳日:2022-12-14 14:37:56 公開日:2022-12-12

# 二重に正しい物体認識: 視覚的根拠のためのWhyプロンプト

Doubly Right Object Recognition: A Why Prompt for Visual Rationales ( http://arxiv.org/abs/2212.06202v1 )

ライセンス: Link先を確認

Chengzhi Mao, Revant Teotia, Amrutha Sundar, Sachit Menon, Junfeng Yang, Xin Wang, Carl Vondrick

(参考訳) 多くの視覚認識モデルは分類精度のみで評価されており、この指標では高い性能を得ている。本稿では,コンピュータビジョンモデルが予測に対して正しい根拠も提示できるかどうかを検討する。我々は「二重に正しい」物体認識ベンチマークを提案する。この指標では、モデルが正しいラベルと正しい根拠の両方を同時に生成することが求められる。CLIPのような最先端の視覚モデルは、カテゴリ予測に対して誤った根拠を与えることが多い。しかし,専用に構築したデータセットを通じて言語モデルの根拠を視覚表現に転移することで,大規模な視覚表現を適応させて正しい根拠を生成する「whyプロンプト」を学習できることを示す。可視化と実証実験により,我々のプロンプトは二重に正しい物体認識の性能を大幅に向上させ,未知のタスクやデータセットへのゼロショット転移も改善することが示された。

Many visual recognition models are evaluated only on their classification accuracy, a metric for which they obtain strong performance. In this paper, we investigate whether computer vision models can also provide correct rationales for their predictions. We propose a ``doubly right'' object recognition benchmark, where the metric requires the model to simultaneously produce both the right labels as well as the right rationales. We find that state-of-the-art visual models, such as CLIP, often provide incorrect rationales for their categorical predictions. However, by transferring the rationales from language models into visual representations through a tailored dataset, we show that we can learn a ``why prompt,'' which adapts large visual representations to produce correct rationales. Visualizations and empirical experiments show that our prompts significantly improve performance on doubly right object recognition, in addition to zero-shot transfer to unseen tasks and datasets.

翻訳日:2022-12-14 14:37:35 公開日:2022-12-12

# 文脈的に説明可能なビデオ表現: 人間の知覚に基づく理解

Contextual Explainable Video Representation: Human Perception-based Understanding ( http://arxiv.org/abs/2212.06206v1 )

ライセンス: Link先を確認

Khoa Vo, Kashu Yamazaki, Phong X. Nguyen, Phat Nguyen, Khoa Luu, Ngan Le

(参考訳) 映像理解は、行動検出、行動認識、ビデオキャプション、ビデオ検索など、空間的情報と時間的情報の両方を理解するための多くの興味深いタスクを含む、強烈な研究の対象となっている。ビデオ理解における最も困難な問題の1つは特徴抽出(例えば、制約のないビデオの長く複雑な時間構造のために与えられたビデオから文脈的視覚表現を抽出する)を扱うことである。事前学習されたバックボーンネットワークをブラックボックスとして視覚的表現を抽出する既存のアプローチとは異なり、本手法は説明可能なメカニズムで最も文脈的な情報を抽出することを目的としている。私たちが観察したように、人間は通常、アクタ、関連するオブジェクト、および周囲の環境という3つの主要な要因の相互作用を通してビデオを知覚する。したがって,それぞれの要因を抽出し,それらの関係をモデル化する,文脈的に説明可能な映像表現抽出を設計することが極めて重要である。本稿では,人間の知覚過程をアクタ,物体,環境のモデリングに組み込む手法について述べる。映像理解における人間の知覚に基づく文脈表現の有効性を説明するために,映像段落キャプションと時間的行動検出を選択する。ソースコードはhttps://github.com/UARK-AICV/Video_Representationで公開されている。

Video understanding is a growing field and a subject of intense research, which includes many interesting tasks for understanding both spatial and temporal information, e.g., action detection, action recognition, video captioning, and video retrieval. One of the most challenging problems in video understanding is feature extraction, i.e., extracting contextual visual representations from a given untrimmed video, owing to the long and complicated temporal structure of unconstrained videos. Different from existing approaches, which apply a pre-trained backbone network as a black box to extract visual representations, our approach aims to extract the most contextual information with an explainable mechanism. As we observed, humans typically perceive a video through the interactions between three main factors, i.e., the actors, the relevant objects, and the surrounding environment. Therefore, it is crucial to design a contextual explainable video representation extraction that can capture each of these factors and model the relationships between them. In this paper, we discuss approaches that incorporate the human perception process into modeling actors, objects, and the environment. We choose video paragraph captioning and temporal action detection to illustrate the effectiveness of human perception-based contextual representation in video understanding. Source code is publicly available at https://github.com/UARK-AICV/Video_Representation.

翻訳日:2022-12-14 14:37:18 公開日:2022-12-12

# 新しい脆弱歩行者データセットにおける深部物体検出器の比較

Comparison Of Deep Object Detectors On A New Vulnerable Pedestrian Dataset ( http://arxiv.org/abs/2212.06218v1 )

ライセンス: Link先を確認

Devansh Sharma, Tihitina Hade, Qing Tian

(参考訳) 歩行者の安全は自動運転における主要な関心事の一つである。今日の歩行者データセットでは脆弱なグループが十分に表現されておらず、脆弱な道路利用者のデータセットが緊急に必要とされている。本稿ではまず、バランスの取れたモデルの訓練を助け、脆弱な歩行者検出の有効性を高める研究を促すために、新しい脆弱歩行者検出データセットであるBG Vulnerable Pedestrian(BGVP)データセットを導入する。データセットには、障害のない子供(Children Without Disability)、障害のない高齢者(Elderly without Disability)、障害あり(With Disability)、非脆弱(Non-Vulnerable)の4つのクラスが含まれている。このデータセットはパブリックドメインから収集された画像と手動で注釈付けされたバウンディングボックスで構成されている。さらに,提案したデータセットを用いて,YOLOv4,YOLOv5,YOLOX,Faster R-CNN,EfficientDetという5つの最先端オブジェクト検出モデルのトレーニングとテストを行った。その結果,YOLOXとYOLOv4がデータセット上で最高の成績を示し,mAP 0.5の指標ではYOLOv4が0.7999,YOLOXが0.7779を記録した一方,mAP 0.5:0.95の指標ではYOLOXがYOLOv4を3.8%上回った。一般的に、5つの検出器はいずれも「障害あり」クラスをよく予測し、「障害のない高齢者」クラスでは性能が低い。YOLOXはクラス別のmAP(0.5:0.95)で他のすべての検出器を一貫して上回り、障害のない子供、障害のない高齢者、非脆弱、障害ありの各クラスでそれぞれ0.5644、0.5242、0.4781、0.6796を得た。私たちのデータセットとコードはhttps://github.com/devvansh1997/BGVPで利用可能です。

Pedestrian safety is one primary concern in autonomous driving. The under-representation of vulnerable groups in today's pedestrian datasets points to an urgent need for a dataset of vulnerable road users. In this paper, we first introduce a new vulnerable pedestrian detection dataset, BG Vulnerable Pedestrian (BGVP) dataset to help train well-rounded models and thus induce research to increase the efficacy of vulnerable pedestrian detection. The dataset includes four classes, i.e., Children Without Disability, Elderly without Disability, With Disability, and Non-Vulnerable. This dataset consists of images collected from the public domain and manually-annotated bounding boxes. In addition, on the proposed dataset, we have trained and tested five state-of-the-art object detection models, i.e., YOLOv4, YOLOv5, YOLOX, Faster R-CNN, and EfficientDet. Our results indicate that YOLOX and YOLOv4 perform the best on our dataset, YOLOv4 scoring 0.7999 and YOLOX scoring 0.7779 on the mAP 0.5 metric, while YOLOX outperforms YOLOv4 by 3.8 percent on the mAP 0.5:0.95 metric. Generally speaking, all five detectors do well predicting the With Disability class and perform poorly in the Elderly Without Disability class. YOLOX consistently outperforms all other detectors on the mAP (0.5:0.95) per class metric, obtaining 0.5644, 0.5242, 0.4781, and 0.6796 for Children Without Disability, Elderly Without Disability, Non-vulnerable, and With Disability, respectively. Our dataset and codes are available at https://github.com/devvansh1997/BGVP.

翻訳日:2022-12-14 14:36:56 公開日:2022-12-12
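
The detector comparison above is stated in terms of mAP at IoU 0.5 and mAP at IoU 0.5:0.95. Both metrics rest on the intersection-over-union between a predicted and a ground-truth box, sketched below. The `(x_min, y_min, x_max, y_max)` box layout and the helper names are assumptions for this example; full mAP additionally requires ranking detections by confidence and averaging precision over recall, which is omitted.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def is_match(predicted, ground_truth, threshold=0.5):
    """A detection counts as correct at a given threshold if its IoU reaches it."""
    return iou(predicted, ground_truth) >= threshold


overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))        # intersection 1, union 7
tight = is_match((0, 0, 10, 10), (1, 1, 10, 10))  # IoU 0.81
loose = is_match((0, 0, 2, 2), (1, 1, 3, 3))      # IoU about 0.14
```

mAP 0.5:0.95 averages over thresholds from 0.5 to 0.95, so it rewards tighter boxes than mAP 0.5, which is consistent with two detectors ranking differently on the two metrics.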

# 生データから視覚と聴覚の表現を共同学習する

Jointly Learning Visual and Auditory Speech Representations from Raw Data ( http://arxiv.org/abs/2212.06246v1 )

ライセンス: Link先を確認

Alexandros Haliassos, Pingchuan Ma, Rodrigo Mira, Stavros Petridis, Maja Pantic

(参考訳) 視覚と聴覚の音声表現を共同で学習する自己教師型マルチモーダルアプローチであるRAVEnを提案する。事前学習の目的は,マスクされた入力を符号化し,ゆっくりと変化するモメンタムエンコーダによって生成された文脈化されたターゲットを予測することである。映像と音声の本質的な違いから、我々の設計は2つのモダリティのプリテキストタスクに関して非対称である。すなわち、聴覚ストリームは視覚ターゲットと聴覚ターゲットの両方を予測するが、視覚ストリームは聴覚ターゲットのみを予測する。エンコーダを共同で訓練する単一の事前学習段階から得られた視覚および聴覚エンコーダを微調整すると、低リソースおよび高リソースのラベル付きデータ設定の両方で強い結果が得られた。特に、RAVEnはLRS3上の視覚音声認識(VSR)においてすべての自己教師型手法を上回り、RAVEnとわずか30時間のラベル付きデータを用いた自己訓練を組み合わせると、90,000時間の非公開データで訓練された最近の半教師型手法さえも上回る。同時に、LRS3の低リソース設定において、聴覚音声認識(およびVSR)で最先端の結果を達成している。本研究は,手作りの特徴に頼らずに,生の映像と音声のみから強力な音声表現を学習できることを示している。コードとモデルは公開予定である。

We present RAVEn, a self-supervised multi-modal approach to jointly learn visual and auditory speech representations. Our pre-training objective involves encoding masked inputs, and then predicting contextualised targets generated by slowly-evolving momentum encoders. Driven by the inherent differences between video and audio, our design is asymmetric w.r.t. the two modalities' pretext tasks: Whereas the auditory stream predicts both the visual and auditory targets, the visual one predicts only the auditory targets. We observe strong results in low- and high-resource labelled data settings when fine-tuning the visual and auditory encoders resulting from a single pre-training stage, in which the encoders are jointly trained. Notably, RAVEn surpasses all self-supervised methods on visual speech recognition (VSR) on LRS3, and combining RAVEn with self-training using only 30 hours of labelled data even outperforms a recent semi-supervised method trained on 90,000 hours of non-public data. At the same time, we achieve state-of-the-art results in the LRS3 low-resource setting for auditory speech recognition (as well as for VSR). Our findings point to the viability of learning powerful speech representations entirely from raw video and audio, i.e., without relying on handcrafted features. Code and models will be made public.

翻訳日:2022-12-14 14:21:21 公開日:2022-12-12

# 変異を利用したゲノムデータを用いたニューラルネットワークの解釈可能性評価

Utilizing Mutations to Evaluate Interpretability of Neural Networks on Genomic Data ( http://arxiv.org/abs/2212.06151v1 )

ライセンス: Link先を確認

Utku Ozbulak, Solha Kang, Jasper Zuallaert, Stephen Depuydt, Joris Vankerschaver

(参考訳) 深層ニューラルネットワーク(DNN)はゲノムデータに関わる多くの問題に対して最先端の結果を達成しているが、DNNに意思決定プロセスを説明することは、ブラックボックスの性質のために大きな課題となっている。 DNNに予測の推論を説明する1つの方法は、最も予測に寄与する入力の部分を強調すると仮定される帰属法である。多くの帰属法の存在とそれらの方法の忠実度に関する定量的な結果の欠如を踏まえ、列ベースタスクに対する帰属法の選択は質的に行われている。本研究では,点突然変異を利用した計算手法を提案することにより,最も忠実な帰属法を特定するための一歩を踏み出した。 7つの一般的な帰属法について定量的な結果が得られ,LRPは翻訳開始に最も適しており,LRPは翻訳の2つの重要な生物学的特徴であるコザック配列の整合性および早期停止コドンの有害な影響を同定している。

Even though deep neural networks (DNNs) achieve state-of-the-art results for a number of problems involving genomic data, getting DNNs to explain their decision-making process has been a major challenge due to their black-box nature. One way to get DNNs to explain their reasoning for prediction is via attribution methods which are assumed to highlight the parts of the input that contribute to the prediction the most. Given the existence of numerous attribution methods and a lack of quantitative results on the fidelity of those methods, selection of an attribution method for sequence-based tasks has been mostly done qualitatively. In this work, we take a step towards identifying the most faithful attribution method by proposing a computational approach that utilizes point mutations. Providing quantitative results on seven popular attribution methods, we find Layerwise Relevance Propagation (LRP) to be the most appropriate one for translation initiation, with LRP identifying two important biological features for translation: the integrity of Kozak sequence as well as the detrimental effects of premature stop codons.

翻訳日:2022-12-14 14:11:13 公開日:2022-12-12
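
The abstract above proposes point mutations as a way to score attribution methods. A toy version of that idea is sketched below: mutate the positions an attribution map ranks highest and measure how much the model output drops. The "model" here is a stand-in that only checks for a start codon, the attribution vectors are hand-written, and the scoring rule is an assumption; none of it is the authors' actual protocol.

```python
BASES = "ACGT"


def toy_model(seq):
    """Stand-in for a trained network: 1.0 if the start codon ATG is present."""
    return 1.0 if "ATG" in seq else 0.0


def point_mutate(seq, position):
    """Replace one base with the next base in a fixed A->C->G->T->A cycle."""
    new_base = BASES[(BASES.index(seq[position]) + 1) % 4]
    return seq[:position] + new_base + seq[position + 1:]


def mutation_drop(model, seq, attributions, k=1):
    """Mutate the k highest-attributed positions and return the output drop."""
    ranked = sorted(range(len(seq)), key=lambda i: attributions[i], reverse=True)
    mutated = seq
    for position in ranked[:k]:
        mutated = point_mutate(mutated, position)
    return model(seq) - model(mutated)


sequence = "CCATGCC"
faithful_map = [0, 0, 1, 1, 1, 0, 0]    # highlights the ATG codon
unfaithful_map = [1, 0, 0, 0, 0, 0, 1]  # highlights the flanks instead
drop_faithful = mutation_drop(toy_model, sequence, faithful_map)
drop_unfaithful = mutation_drop(toy_model, sequence, unfaithful_map)
```

An attribution map that points at truly important positions should produce the larger drop, which gives a quantitative handle on faithfulness.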

# 近似探索データ解析の強化

Reinforced Approximate Exploratory Data Analysis ( http://arxiv.org/abs/2212.06225v1 )

ライセンス: Link先を確認

Shaddy Garg, Subrata Mitra, Tong Yu, Yash Gadhia, Arjun Kashettiwar

(参考訳) 探索的データ分析(exploratory data analytics、eda)は、アナリストがそれに続くクエリを選択して、過去のクエリとそれに対応する結果に基づいて興味深い洞察を導き出す、逐次的な意思決定プロセスである。データ処理システムは、低レイテンシで結果を生成するために、しばしばサンプルでクエリを実行する。異なるダウンサンプリング戦略は、データの異なる統計を保存し、異なる大きさの遅延減少を持つ。サンプリング戦略の最適選択は分析フローの特定の文脈と分析者の隠れた意図に依存することが多い。本稿では,対話型データ探索におけるサンプリングの影響を,近似誤差を導入する際に初めて検討する。本稿では, サンプル選択を最適化し, 分析および洞察フローの持続性を維持するための, 深層強化学習(DRL)に基づくフレームワークを提案する。 3つの実データセットを用いて評価した結果,本手法は,ベースライン法と比較して,相互作用遅延を改善しつつ,元の洞察生成フローを維持可能であることがわかった。

Exploratory data analytics (EDA) is a sequential decision-making process where analysts choose subsequent queries that might lead to some interesting insights based on the previous queries and corresponding results. Data processing systems often execute the queries on samples to produce results with low latency. Different downsampling strategies preserve different statistics of the data and yield different magnitudes of latency reduction. The optimum choice of sampling strategy often depends on the particular context of the analysis flow and the hidden intent of the analyst. In this paper, we are the first to consider the impact of sampling in interactive data exploration settings, given that samples introduce approximation errors. We propose a Deep Reinforcement Learning (DRL) based framework which can optimize the sample selection in order to keep the analysis and insight generation flow intact. Evaluations with 3 real datasets show that our technique can preserve the original insight generation flow while improving the interaction latency, compared to baseline methods.

翻訳日:2022-12-14 14:10:53 公開日:2022-12-12
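
The claim above that different downsampling strategies preserve different statistics can be seen with two simple samplers. In the sketch below, uniform sampling keeps overall proportions on average, while stratified sampling guarantees that every group survives. The data, the column names, and both helper functions are invented for illustration, and the paper's reinforcement-learning policy for choosing between samples is not shown.

```python
import random


def uniform_sample(rows, fraction, seed=0):
    """Draw a simple random sample of about `fraction` of the rows."""
    rng = random.Random(seed)
    size = max(1, int(len(rows) * fraction))
    return rng.sample(rows, size)


def stratified_sample(rows, key, fraction, seed=0):
    """Sample each group separately, keeping at least one row per group."""
    rng = random.Random(seed)
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    sample = []
    for _, members in sorted(groups.items()):
        size = max(1, int(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample


# 990 rows from a common group and 10 rows from a rare one (made-up data).
rows = ([{"region": "north", "sales": 10} for _ in range(990)]
        + [{"region": "south", "sales": 500} for _ in range(10)])
uniform = uniform_sample(rows, fraction=0.01)
stratified = stratified_sample(rows, key="region", fraction=0.01)
regions_kept = {row["region"] for row in stratified}
```

A group-by query on `stratified` always shows both regions, whereas a 1% uniform sample may miss the rare one entirely, so the better sample depends on what the analyst asks next.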

# テキストマイニングとソーシャルメディア分析に基づく地震影響解析

Earthquake Impact Analysis Based on Text Mining and Social Media Analytics ( http://arxiv.org/abs/2212.06765v1 )

ライセンス: Link先を確認

Zhe Zheng, Hong-Zheng Shi, Yu-Cheng Zhou, Xin-Zheng Lu, Jia-Rui Lin

(参考訳) 地震は広い範囲に深く影響し、緊急救助活動は災害の範囲や範囲に関するソーシャルメディアの情報から恩恵を受ける可能性がある。そこで本研究では,早期地震影響解析のためのソーシャルメディアデータを収集・分析するためのテキストマイニング手法を提案する。まず、災害関連マイクロブログをクローラ技術に基づくSinaマイクロブログから収集する。そして、データをクリーニングした後、(1)ホットワード分析、(2)マイクロブログ数の動向、(3)世論感情の傾向、(4)地震影響分析のためのキーワードおよび規則に基づくテキスト分類を含む一連の分析を行う。最後に,中国におけるマグニチュードと震源深度が同じ2つの最近の地震を解析し,その影響を比較した。その結果, 世論の傾向分析と世論の傾向は, 早期に地震の社会的影響を推定し, 意思決定・救助管理に有効であることが示唆された。

Earthquakes have a deep impact on wide areas, and emergency rescue operations may benefit from social media information about the scope and extent of the disaster. Therefore, this work presents a text mining-based approach to collect and analyze social media data for early earthquake impact analysis. First, disaster-related microblogs are collected from the Sina microblog based on crawler technology. Then, after data cleaning, a series of analyses is conducted, including (1) hot word analysis, (2) the trend of the number of microblogs, (3) the trend of public opinion sentiment, and (4) a keyword and rule-based text classification for earthquake impact analysis. Finally, two recent earthquakes with the same magnitude and focal depth in China are analyzed to compare their impacts. The results show that the public opinion trend analysis and the trend of public opinion sentiment can estimate the earthquake's social impact at an early stage, which will be helpful to decision-making and rescue management.

翻訳日:2022-12-14 14:01:09 公開日:2022-12-12

# 自己回帰バンド

Autoregressive Bandits ( http://arxiv.org/abs/2212.06251v1 )

ライセンス: Link先を確認

Francesco Bacchiocchi, Gianmarco Genalti, Davide Maran, Marco Mussi, Marcello Restelli, Nicola Gatti, Alberto Maria Metelli

(参考訳) 自己回帰的なプロセスは、株式市場、売り予測、天気予報、広告、価格など、様々な現実世界のシナリオで自然に発生する。このような文脈でシーケンシャルな意思決定問題に対処する場合、連続的な観測間の時間的依存は最適決定ポリシーに収束するために適切に考慮すべきである。そこで本研究では,エージェントが選択する動作にパラメータが依存するk$の自己回帰プロセスに従って,有限セットのn$アクション内で,観察された報酬が従う,自己回帰的バンディット(autoregressive bandits, arbs)という,新しいオンライン学習設定を提案する。次に、楽観的な後悔の最小化アルゴリズム(ar-ucb)を考案し、$\widetilde{\mathcal{o}} \left( \frac{(k+1)^{3/2}\sqrt{nt}}{(1-\gamma)^2} \right)$であり、最適化の地平線に$t$、システムの安定性の指標に$\gamma < 1$を与える。最後に,提案手法の利点を示す一般および特定目的のバンディットベースラインと比較し,いくつかの合成および1つの実世界環境での数値検証を行う。

Autoregressive processes naturally arise in a large variety of real-world scenarios, including stock markets, sales forecasting, weather prediction, advertising, and pricing. When addressing a sequential decision-making problem in such a context, the temporal dependence between consecutive observations should be properly accounted for in order to converge to the optimal decision policy. In this work, we propose a novel online learning setting, named Autoregressive Bandits (ARBs), in which the observed reward follows an autoregressive process of order $k$, whose parameters depend on the action the agent chooses within a finite set of $n$ actions. Then, we devise an optimistic regret minimization algorithm, AutoRegressive Upper Confidence Bounds (AR-UCB), that suffers regret of order $\widetilde{\mathcal{O}} \left( \frac{(k+1)^{3/2}\sqrt{nT}}{(1-\Gamma)^2} \right)$, where $T$ is the optimization horizon and $\Gamma < 1$ is an index of the stability of the system. Finally, we present a numerical validation in several synthetic settings and one real-world setting, in comparison with general and specific purpose bandit baselines, showing the advantages of the proposed approach.
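
The reward model described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' code: the function name `simulate_arb`, the parameter layout of `gammas` (offset in column 0, autoregressive weights in columns 1..k), and the Gaussian noise are all hypothetical choices made for illustration.

```python
import numpy as np

def simulate_arb(gammas, actions, k, noise_std=0.1, seed=0):
    """Simulate rewards x_t = g[a_t, 0] + sum_i g[a_t, i] * x_{t-i} + noise.

    gammas: array of shape (n_actions, k + 1). Column 0 is an offset and
    columns 1..k weight the k most recent rewards (hypothetical layout).
    """
    gammas = np.asarray(gammas, dtype=float)
    rng = np.random.default_rng(seed)
    history = np.zeros(k)  # most recent reward first
    rewards = []
    for a in actions:
        # The chosen action selects which AR(k) parameters drive the reward.
        x = gammas[a, 0] + gammas[a, 1:] @ history + rng.normal(0.0, noise_std)
        rewards.append(x)
        history = np.concatenate(([x], history[:-1]))
    return np.array(rewards)
```

Because each reward depends on the previous `k` rewards, an action affects future rewards as well as the current one, which is the temporal dependence the abstract says must be accounted for.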

翻訳日:2022-12-14 13:53:59 公開日:2022-12-12

# ディープラーニングのための合成画像データ

Synthetic Image Data for Deep Learning ( http://arxiv.org/abs/2212.06232v1 )

ライセンス: Link先を確認

Jason W. Anderson, Marcin Ziolkowski, Ken Kennedy, Amy W. Apon

(参考訳) 3dモデルからレンダリングされた現実的な合成画像データは、画像セットの拡張と画像分類のセマンティクスセグメンテーションモデルのトレーニングに使用できる。本研究では,実車の生産3次元CADモデルに基づく大規模合成データセットを,高品質な物理ベースレンダリングとドメインランダム化により効率的に作成する方法について検討する。このデータセットを用いて、u-netおよびdouble-u-netモデルを用いた合成拡張の有効性を定量化する。この領域では, 合成画像は, 限られた実データ集合を増強する有効な手法であることがわかった。純合成画像上で訓練されたモデルでは,実際の検証画像上では平均IoUが極めて低かった。また,合成データセットに非常に少量の実画像を追加すると精度が大幅に向上し,合成画像で拡張されたデータセットで学習したモデルの方が実画像単独で訓練したモデルよりも精度が高かった。最後に, インクリメンタルトレーニングやモデル特殊化の恩恵を受けるユースケースでは, 合成画像のベースモデルを事前訓練することで, 転送学習のトレーニングコストが大幅に削減され, モデルトレーニングの最大90%をフロントロードできることがわかった。

Realistic synthetic image data rendered from 3D models can be used to augment image sets and train image classification and semantic segmentation models. In this work, we explore how high-quality physically-based rendering and domain randomization can efficiently create a large synthetic dataset based on production 3D CAD models of a real vehicle. We use this dataset to quantify the effectiveness of synthetic augmentation using U-net and Double-U-net models. We found that, for this domain, synthetic images were an effective technique for augmenting limited sets of real training data. We observed that models trained on purely synthetic images had a very low mean prediction IoU on real validation images. We also observed that adding even very small amounts of real images to a synthetic dataset greatly improved accuracy, and that models trained on datasets augmented with synthetic images were more accurate than those trained on real images alone. Finally, we found that in use cases that benefit from incremental training or model specialization, pretraining a base model on synthetic images provided a sizeable reduction in the training cost of transfer learning, allowing up to 90% of the model training to be front-loaded.
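
The mean IoU metric mentioned in the abstract can be computed as in the following sketch. The function name `mean_iou` and the choice to skip classes that are absent from both masks are assumptions made for illustration; the paper may average differently.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes for integer label masks."""
    pred, target = np.asarray(pred), np.asarray(target)
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both masks; skip it
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))
```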

翻訳日:2022-12-14 13:52:24 公開日:2022-12-12

# テスト時間適応とトレーニング時間一般化:キーポイント推定を用いたヒトインスタンス分割のケーススタディ

Test-time Adaptation vs. Training-time Generalization: A Case Study in Human Instance Segmentation using Keypoints Estimation ( http://arxiv.org/abs/2212.06242v1 )

ライセンス: Link先を確認

Kambiz Azarian, Debasmit Das, Hyojin Park, Fatih Porikli

(参考訳) キーポイント推定を用いて,与えられたテスト画像のヒトインスタンスセグメンテーションマスク品質を改善する問題を考える。 2つのアプローチを比較します。第1のアプローチはテスト時間適応(TTA)法であり、単一のラベルのないテスト画像を用いてセグメント化ネットワークの重みをテスト時間修正できる。このアプローチでは、ラベル付きソースデータセットへのテスト時間アクセスを前提としません。具体的には、キーポイント推定値を擬似ラベルとして使用し、バックボーン重みを調整するためにバックプロパゲートする。第2のアプローチは、トレーニング時一般化(TTG)手法であり、ラベル付きソースデータセットへのオフラインアクセスを許可するが、重みのテスト時修正は許可しない。さらに、対象領域に関する画像や知識が利用できるとは想定していません。 TTG法は,キーポイントヘッドが生成したバックボーンの特徴を増強し,アグリゲートベクトルをマスクヘッドに供給することで構成する。包括的アブリケーションを通じて、両アプローチを評価し、TTAゲインを制限するいくつかの要因を特定する。特に、大きなドメインシフトがなければ、TTAは損傷し、TTGはパフォーマンスがわずかに向上することを示し、一方、大きなドメインシフトでは、TTAゲインはより小さく、使用したヒューリスティックに依存し、TTGゲインはより大きく、アーキテクチャ上の選択に対して堅牢であることを示す。

We consider the problem of improving the human instance segmentation mask quality for a given test image using keypoints estimation. We compare two alternative approaches. The first approach is a test-time adaptation (TTA) method, where we allow test-time modification of the segmentation network's weights using a single unlabeled test image. In this approach, we do not assume test-time access to the labeled source dataset. More specifically, our TTA method consists of using the keypoints estimates as pseudo labels and backpropagating them to adjust the backbone weights. The second approach is a training-time generalization (TTG) method, where we permit offline access to the labeled source dataset but not the test-time modification of weights. Furthermore, we do not assume the availability of any images from or knowledge about the target domain. Our TTG method consists of augmenting the backbone features with those generated by the keypoints head and feeding the aggregate vector to the mask head. Through a comprehensive set of ablations, we evaluate both approaches and identify several factors limiting the TTA gains. In particular, we show that in the absence of a significant domain shift, TTA may hurt and TTG shows only a small gain in performance, whereas for a large domain shift, TTA gains are smaller and dependent on the heuristics used, while TTG gains are larger and robust to architectural choices.

翻訳日:2022-12-14 13:52:04 公開日:2022-12-12

# スプリアス相関を修正するには、適切な埋め込み抽出子だけでよい

You Only Need a Good Embeddings Extractor to Fix Spurious Correlations ( http://arxiv.org/abs/2212.06254v1 )

ライセンス: Link先を確認

Raghav Mehta, Vítor Albiero, Li Chen, Ivan Evtimov, Tamar Glaser, Zhiheng Li, Tal Hassner

(参考訳) トレーニングデータのスプリアス相関は、モデルがショートカットとして使用することを学ぶと、しばしば堅牢性の問題を引き起こす。例えば、オブジェクトが牛であるかどうかを予測する場合、モデルはその緑の背景に依存することを学べるので、砂浜の背景の牛ではうまくいかない。この問題を軽減する方法に関する最先端の測定のための標準データセットは、waterbirdsである。ベストメソッド(Group Distributionally Robust Optimization - GroupDRO)は、現在、89 %最悪のグループ精度を達成しており、生画像のスクラッチからの標準トレーニングは72 %しか得られない。 GroupDROは、サブグループラベルを使ってエンドツーエンドでモデルをトレーニングする必要がある。本稿では,大規模な視覚モデル抽出器からの埋め込みを単純に使用し,その上に線形分類器を訓練することにより,トレーニングセットのサブグループ情報を用いることなく,最大90%の精度が得られることを示す。事前学習モデルと事前学習データセットに関する実験により,事前学習モデルのキャパシティと事前学習データセットのサイズが重要であることを示す。我々の実験では、高容量の視覚変換器は高容量の畳み込みニューラルネットワークよりも優れた性能を示し、より大きな事前学習データセットは、素早い相関データセット上で最悪のグループ精度をもたらす。

Spurious correlations in training data often lead to robustness issues since models learn to use them as shortcuts. For example, when predicting whether an object is a cow, a model might learn to rely on its green background, so it would do poorly on a cow on a sandy background. A standard dataset for measuring the state of the art among methods that mitigate this problem is Waterbirds. The best method (Group Distributionally Robust Optimization - GroupDRO) currently achieves 89% worst-group accuracy, whereas standard training from scratch on raw images only reaches 72%. GroupDRO requires training a model in an end-to-end manner with subgroup labels. In this paper, we show that we can achieve up to 90% accuracy without using any sub-group information in the training set by simply using embeddings from a large pre-trained vision model extractor and training a linear classifier on top of it. With experiments on a wide range of pre-trained models and pre-training datasets, we show that the capacity of the pre-training model and the size of the pre-training dataset matter. Our experiments reveal that high-capacity vision transformers perform better than high-capacity convolutional neural networks, and that a larger pre-training dataset leads to better worst-group accuracy on the spurious correlation dataset.
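
Worst-group accuracy, the metric quoted in the abstract, is the minimum of the per-group accuracies. A minimal sketch follows; the function name `worst_group_accuracy` and the integer group encoding are assumptions for illustration. Note that group labels are used here only for evaluation, not for training.

```python
import numpy as np

def worst_group_accuracy(y_true, y_pred, groups):
    """Return the lowest accuracy over all groups present in `groups`."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    accuracies = [
        np.mean(y_pred[groups == g] == y_true[groups == g])
        for g in np.unique(groups)
    ]
    return float(min(accuracies))
```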

翻訳日:2022-12-14 13:51:38 公開日:2022-12-12

# モデル拡張によるデータセット蒸留の促進

Accelerating Dataset Distillation via Model Augmentation ( http://arxiv.org/abs/2212.06152v1 )

ライセンス: Link先を確認

Lei Zhang, Jie Zhang, Bowen Lei, Subhabrata Mukherjee, Xiang Pan, Bo Zhao, Caiwen Ding, Yao Li, Dongkuan Xu

(参考訳) 新たな分野であるデータセット蒸留(DD)は、大規模データからはるかに小さく高品質な合成データセットを生成することを目的としている。勾配マッチングに基づく既存のDD手法は、先行性能を達成するが、数千のランダム初期化モデルの間でデータセットを継続的に最適化する必要があるため、非常に計算集約的である。本稿では,多種多様なモデルを用いた合成データの学習が一般化性能の向上につながると仮定する。そこで本稿では, 学習コストを大幅に削減した情報合成集合を学習するために, \textbf{early-stage model} と \textbf{weight perturbation} の2つの手法を提案する。実験の結果,提案手法は20$\times$ の高速化と,最先端のベースライン法と同等の性能を達成できた。

Dataset Distillation (DD), a newly emerging field, aims at generating much smaller and high-quality synthetic datasets from large ones. Existing DD methods based on gradient matching achieve leading performance; however, they are extremely computationally intensive as they require continuously optimizing a dataset among thousands of randomly initialized models. In this paper, we assume that training the synthetic data with diverse models leads to better generalization performance. Thus we propose two model augmentation techniques, i.e., using early-stage models and weight perturbation, to learn an informative synthetic set with significantly reduced training cost. Extensive experiments demonstrate that our method achieves up to 20$\times$ speedup and performance on par with state-of-the-art baseline methods.

翻訳日:2022-12-14 13:43:49 公開日:2022-12-12

# ブラインドドメイン遷移によるゼロショット運動健康モニタリング

Zero-Shot Motor Health Monitoring by Blind Domain Transition ( http://arxiv.org/abs/2212.06154v1 )

ライセンス: Link先を確認

Serkan Kiranyaz, Ozer Can Devecioglu, Amir Alhams, Sadok Sassi, Turker Ince, Osama Abdeljaber, Onur Avci, and Moncef Gabbouj

(参考訳) 運動の健康状態の連続的モニタリングは、ベアリング障害などの異常の早期発見に不可欠である(最大51%のモーター障害はベアリング障害に起因する)。障害検出のための多くの手法が提案されているが、そのほとんどは正常(健康)と異常(正常)のデータを必要とする。ラベル付きデータに基づいて訓練された最近のDeep Learning (DL) 手法であっても、分類精度は1つまたは少数の条件が変更された場合に著しく低下する。さらに、その性能は著しく低下するか、全く異なる健全な信号パターンを持つ別のマシンでテストした場合に完全に失敗する可能性がある。そこで本研究では, 作業条件, センサパラメータ, 故障特性に関わらず, 新たな(ターゲット)マシンの故障を検知できるゼロショット軸受故障検出手法を提案する。この目的を達成するために、第1の操作生成逆ネットワーク(op-gan)は、(a)ソースマシンの正常振動信号と故障振動信号の遷移を、様々な条件、センサパラメータ、および故障タイプで特徴付ける。そして、ターゲットマシンでは、潜在的な故障信号を生成し、その実際の健全で合成された故障信号に対して、コンパクトで軽量な1d自己オン故障検出器を訓練して、発生時にリアルタイムに実故障状態を検出することができる。提案手法を検証するために、異なる条件とセンサ位置で動作する2つの異なるモータを用いて、新しいベンチマークデータセットを作成する。実験の結果, 本手法は, タイプ, 重大度, 位置に関わらず, 2台の対象機で平均89%, 95%のリコール率を達成するベアリング障害を正確に検出できることがわかった。

Continuous long-term monitoring of motor health is crucial for the early detection of abnormalities such as bearing faults (up to 51% of motor failures are attributed to bearing faults). Despite numerous methodologies proposed for bearing fault detection, most of them require normal (healthy) and abnormal (faulty) data for training. Even with the recent deep learning (DL) methodologies trained on the labeled data from the same machine, the classification accuracy significantly deteriorates when one or a few conditions are altered. Furthermore, their performance suffers significantly or may entirely fail when they are tested on another machine with entirely different healthy and faulty signal patterns. To address this need, in this pilot study, we propose a zero-shot bearing fault detection method that can detect any fault on a new (target) machine regardless of the working conditions, sensor parameters, or fault characteristics. To accomplish this objective, a 1D Operational Generative Adversarial Network (Op-GAN) first characterizes the transition between normal and fault vibration signals of (a) source machine(s) under various conditions, sensor parameters, and fault types. Then, for a target machine, the potential faulty signals can be generated, and a compact and lightweight 1D Self-ONN fault detector can be trained on its actual healthy and synthesized faulty signals to detect the real faulty condition in real time whenever it occurs. To validate the proposed approach, a new benchmark dataset is created using two different motors working under different conditions and sensor locations. Experimental results demonstrate that this novel approach can accurately detect any bearing fault, achieving an average recall rate of around 89% and 95% on two target machines regardless of the fault's type, severity, and location.

翻訳日:2022-12-14 13:43:31 公開日:2022-12-12

# 可変再生型保守政策イテレーション

Variance-Reduced Conservative Policy Iteration ( http://arxiv.org/abs/2212.06283v1 )

ライセンス: Link先を確認

Naman Agarwal, Brian Bullins, Karan Singh

(参考訳) 政策空間上の実証的リスク最小化問題の列に強化学習を還元するサンプル複雑性について検討する。このような還元に基づくアルゴリズムは、ポリシー勾配アルゴリズムのパラメータ空間とは対照的に関数空間の局所収束を示すため、ポリシークラスの非線型あるいは不連続なパラメータ化の影響を受けない。我々は、$O(\varepsilon^{-4})$から$O(\varepsilon^{-3})$へ、$\varepsilon$-functional local optimumを生成する際のサンプル複雑さを改善する保守政策イテレーションの分散還元変種を提案する。状態被覆とポリシー完全性の仮定の下で、アルゴリズムは$O(\varepsilon^{-2})$倍をサンプリングした後、$\varepsilon$-globalOptimityを享受し、以前に確立された$O(\varepsilon^{-3})$サンプル要件を改善した。

We study the sample complexity of reducing reinforcement learning to a sequence of empirical risk minimization problems over the policy space. Such reduction-based algorithms exhibit local convergence in the function space, as opposed to the parameter space for policy gradient algorithms, and thus are unaffected by the possibly non-linear or discontinuous parameterization of the policy class. We propose a variance-reduced variant of Conservative Policy Iteration that improves the sample complexity of producing an $\varepsilon$-functional local optimum from $O(\varepsilon^{-4})$ to $O(\varepsilon^{-3})$. Under state-coverage and policy-completeness assumptions, the algorithm enjoys $\varepsilon$-global optimality after sampling $O(\varepsilon^{-2})$ times, improving upon the previously established $O(\varepsilon^{-3})$ sample requirement.

翻訳日:2022-12-14 13:43:00 公開日:2022-12-12

# 単語・文レベルでの軽度注意を用いた臨床ノートによる死亡予測モデル

Mortality Prediction Models with Clinical Notes Using Sparse Attention at the Word and Sentence Levels ( http://arxiv.org/abs/2212.06267v1 )

ライセンス: Link先を確認

Miguel Rios, Ameen Abu-Hanna

(参考訳) 集中治療院死亡予測には様々な臨床応用がある。ニューラル予測モデル、特に臨床ノートに注目する場合には、既存のモデルの改善が期待されている。しかし、受け入れるにはこれらのモデルは高性能で透明でなければならない。本研究は, 臨床神経予測モデルにおいて, 識別・校正の観点から異なる注意機構について検討する。具体的には, 臨床ノートによる病院内死亡予測タスクにおける集中的注意重みの代替として, 軽度注意力について検討した。我々は注意機構を次のように評価する。一文中の単語に対する局所的な自己注意、及び二文にまたがるトランスフォーマーアーキテクチャによるグローバルな自己注意 sparse機構アプローチは,公開データセットを用いた予測性能の観点で,局所的自己着想に対する密接なアプローチよりも優れており,事前定義された関連する指示語に対する注意が高まることを実証する。しかし、文レベルのパフォーマンスは、影響力のある指示語を含む文をまとめてドロップする傾向があるため、悪化する。

Intensive Care in-hospital mortality prediction has various clinical applications. Neural prediction models, especially when capitalising on clinical notes, have been put forward as an improvement on currently existing models. However, to be acceptable these models should be performant and transparent. This work studies different attention mechanisms for clinical neural prediction models in terms of their discrimination and calibration. Specifically, we investigate sparse attention as an alternative to dense attention weights in the task of in-hospital mortality prediction from clinical notes. We evaluate the attention mechanisms based on: i) local self-attention over words in a sentence, and ii) global self-attention with a transformer architecture across sentences. We demonstrate that the sparse mechanism approach outperforms the dense one for the local self-attention in terms of predictive performance with a publicly available dataset, and pays more attention to prespecified relevant directive words. The performance at the sentence level, however, deteriorates as sentences including the influential directive words tend to be dropped altogether.

翻訳日:2022-12-14 13:35:48 公開日:2022-12-12

# nnU-Netの効率よいベイズ不確かさ推定

Efficient Bayesian Uncertainty Estimation for nnU-Net ( http://arxiv.org/abs/2212.06278v1 )

ライセンス: Link先を確認

Yidong Zhao, Changchun Yang, Artur Schweidtmann, Qian Tao

(参考訳) 自己構成のnnU-Netは、幅広い医療画像セグメンテーションの課題において、主要なパフォーマンスを達成した。選択のモデルとして広く考えられており、医用画像セグメンテーションの強力なベースラインとなっている。しかし、その異常な性能にもかかわらず、nnU-Netはその失敗の可能性を示すための不確実性の尺度を提供していない。これは、データが不均一でnnU-Netが注意なく失敗する、大規模な画像分割アプリケーションで問題となる可能性がある。本研究では,医療画像分割におけるnnU-Netの不確実性を推定する新しい手法を提案する。ベイズ不確実性推定のための重み空間の後方サンプリングに有効な手法を提案する。モンテカルロ・ドロップアウトや平均場ベイズニューラルネットワークのような従来のベースライン手法とは異なり,提案手法は変分アーキテクチャを必要とせず,元のnnU-Netアーキテクチャをそのまま維持し,優れた性能と使いやすさを維持する。さらに,マルチモーダル後部モデルにより,元のnnU-Netよりもセグメンテーション性能を向上する。本法を心臓mriの公開 acdc および m&m データセットに適用し,様々な基準法における不確実性推定の改善を実証した。提案手法は, セグメンテーション精度と品質管理の両面で, 医用画像セグメンテーションのためのnnu-netをさらに強化する。

The self-configuring nnU-Net has achieved leading performance in a large range of medical image segmentation challenges. It is widely considered as the model of choice and a strong baseline for medical image segmentation. However, despite its extraordinary performance, nnU-Net does not supply a measure of uncertainty to indicate its possible failure. This can be problematic for large-scale image segmentation applications, where data are heterogeneous and nnU-Net may fail without notice. In this work, we introduce a novel method to estimate nnU-Net uncertainty for medical image segmentation. We propose a highly effective scheme for posterior sampling of weight space for Bayesian uncertainty estimation. Different from previous baseline methods such as Monte Carlo Dropout and mean-field Bayesian Neural Networks, our proposed method does not require a variational architecture and keeps the original nnU-Net architecture intact, thereby preserving its excellent performance and ease of use. Additionally, we boost the segmentation performance over the original nnU-Net via marginalizing multi-modal posterior models. We applied our method on the public ACDC and M&M datasets of cardiac MRI and demonstrated improved uncertainty estimation over a range of baseline methods. The proposed method further strengthens nnU-Net for medical image segmentation in terms of both segmentation accuracy and quality control.
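
One common way to turn several posterior samples into an uncertainty map is to average their softmax outputs and take the entropy of the average. The sketch below illustrates that general idea only; it is not the sampling scheme proposed in the paper, and the function name `predictive_entropy` and the array layout are assumptions.

```python
import numpy as np

def predictive_entropy(sample_probs, eps=1e-12):
    """Entropy of the sample-averaged class probabilities.

    sample_probs: array of shape (n_samples, n_classes, ...) holding the
    softmax output of each sampled model (hypothetical layout).
    Returns an array with the trailing (spatial) shape.
    """
    mean_probs = np.mean(np.asarray(sample_probs, dtype=float), axis=0)
    return -np.sum(mean_probs * np.log(mean_probs + eps), axis=0)
```

Pixels on which the sampled models disagree get a high entropy, which is the kind of signal that could flag a likely segmentation failure for quality control.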

翻訳日:2022-12-14 13:35:25 公開日:2022-12-12

# 適応型ヒューマン・イン・ザ・ループ法による添加物製造プロセスのエミッション検出とコンピュータビジョンを用いたアクティブラーニング

An adaptive human-in-the-loop approach to emission detection of Additive Manufacturing processes and active learning with computer vision ( http://arxiv.org/abs/2212.06153v1 )

ライセンス: Link先を確認

Xiao Liu and Alan F. Smeaton and Alessandra Mileo

(参考訳) 3Dプリンティング(3D-printing)としても知られるAM(Additive Manufacturing)におけるin-situモニタリングとプロセス制御の最近の進歩は、製造される部品のビルドプロセス中に大量の排出データを収集することを可能にする。このデータは、3Dプリントされた部品の3Dおよび2D表現への入力として使用できる。しかし、分析と使用、およびこのデータのキャラクタリゼーションは依然として手作業のままである。本研究の目的は,AMプロセス中に発生する排出データを自動的に検査・注釈する機械学習技術を用いた適応型ヒューマン・イン・ザ・ループ手法を提案することである。第一に,畳み込みニューラルネットワーク(cnns)を用いてその場監視によって収集された放射データを自動検査し,分類し,第二に,開発された分類モデルにアクティブラーニング技術を適用することで,放射データのラベリングプロセスを高速化するヒューマン・イン・ザ・ループ機構を構築する。 CNNベースのアプローチは転送学習と微調整に依存しており、他の産業画像パターンに適用できる。提案手法の適応性は,不確実なサンプリング戦略により,ヒトの専門家に注釈を提示するサンプルの自動選択によって実現される。

Recent developments in in-situ monitoring and process control in Additive Manufacturing (AM), also known as 3D-printing, allow the collection of large amounts of emission data during the build process of the parts being manufactured. This data can be used as input into 3D and 2D representations of the 3D-printed parts. However, the analysis, use, and characterization of this data still remain a manual process. The aim of this paper is to propose an adaptive human-in-the-loop approach using Machine Learning techniques that automatically inspect and annotate the emission data generated during the AM process. More specifically, this paper looks at two scenarios: first, using convolutional neural networks (CNNs) to automatically inspect and classify emission data collected by in-situ monitoring and, second, applying Active Learning techniques to the developed classification model to construct a human-in-the-loop mechanism in order to accelerate the labeling process of the emission data. The CNN-based approach relies on transfer learning and fine-tuning, which makes the approach applicable to other industrial image patterns. The adaptive nature of the approach is enabled by an uncertainty sampling strategy that automatically selects the samples to be presented to human experts for annotation.
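
Uncertainty sampling can be sketched with the least-confidence criterion: the samples whose highest predicted class probability is lowest are sent to the human annotator first. The abstract does not say which uncertainty criterion is used, so least confidence is an assumption here, as is the function name `least_confident`.

```python
import numpy as np

def least_confident(probs, budget):
    """Indices of the `budget` samples whose top class probability is lowest.

    probs: array of shape (n_samples, n_classes) of classifier probabilities.
    """
    probs = np.asarray(probs, dtype=float)
    confidence = probs.max(axis=1)  # confidence in the predicted class
    return np.argsort(confidence, kind="stable")[:budget]
```

In an active learning loop, the returned samples would be labeled by an expert, added to the training set, and the classifier fine-tuned before the next selection round.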

翻訳日:2022-12-14 13:27:06 公開日:2022-12-12

# クラスエンコーディングパターンを見つけるためのAIモデル利用計測

AI Model Utilization Measurements For Finding Class Encoding Patterns ( http://arxiv.org/abs/2212.06576v1 )

ライセンス: Link先を確認

Peter Bajcsy and Antonio Cardone and Chenyi Ling and Philippe Dessauw and Michael Majurski and Tim Blattner and Derek Juba and Walid Keyrouz

(参考訳) この仕事は問題に対処する (a)訓練された人工知能(AI)モデル及びモデルの利用率を設計する b) これらの測定に基づいて,AIモデルにトレーニングデータをエンコードする方法を説明する。この問題は、自動運転車の交通標識の分類にAIモデルを使用するなど、セキュリティおよび安全クリティカルなアプリケーションにおけるAIモデルの説明可能性の欠如によって動機付けられている。計算グラフ(AIモデル)、サブグラフ、グラフノードのレベルにおける、交通標識の活用に基づくクラスエンコーディングにおけるAIモデル利用測定と理解パターンの理論的基盤を導入することで、この問題に対処する。概念的には、すべての可能な出力(テンソル状態)の空間におけるユニークな出力の数と分布に基づいて、AIモデルのグラフノード(計算単位)で利用が定義される。本研究では,有害およびクリーンなaiモデルを含むaiモデルから利用率の測定値を抽出する。クリーンなAIモデルとは対照的に、有毒なAIモデルは、そのようなトリガーの存在下で正しいクラスラベルを他のラベルに変更するために、体系的、物理的に実現可能なトラフィックサイン修正(トリガー)を含むトラフィックサインイメージで訓練された。このようなクリーンで有毒なAIモデルのクラスエンコーディングを分析し、トロイの木馬の注入と検出に影響を及ぼす。

This work addresses the problems of (a) designing utilization measurements of trained artificial intelligence (AI) models and (b) explaining how training data are encoded in AI models based on those measurements. The problems are motivated by the lack of explainability of AI models in security- and safety-critical applications, such as the use of AI models for classification of traffic signs in self-driving cars. We approach the problems by introducing theoretical underpinnings of AI model utilization measurement and understanding patterns in utilization-based class encodings of traffic signs at the level of computation graphs (AI models), subgraphs, and graph nodes. Conceptually, utilization is defined at each graph node (computation unit) of an AI model based on the number and distribution of unique outputs in the space of all possible outputs (tensor-states). In this work, utilization measurements are extracted from AI models, which include poisoned and clean AI models. In contrast to clean AI models, the poisoned AI models were trained with traffic sign images containing systematic, physically realizable traffic sign modifications (i.e., triggers) to change a correct class label to another label in the presence of such a trigger. We analyze class encodings of such clean and poisoned AI models, and conclude with implications for trojan injection and detection.

翻訳日:2022-12-14 13:25:59 公開日:2022-12-12

# 分散確率的マルチプレイヤーマルチアーム歩行バンディット

Decentralized Stochastic Multi-Player Multi-Armed Walking Bandits ( http://arxiv.org/abs/2212.06279v1 )

ライセンス: Link先を確認

Guojun Xiong, Jian Li

(参考訳) マルチプレイヤーのマルチアームバンディットは、認知無線システムへの応用による、ますます関連する意思決定問題である。この問題のほとんどの研究は、プレイヤーがすべての腕に \textit{full access} を持ち、同じ腕を引っ張るときに報酬を受け取らない設定にのみ焦点をあてている。したがって、すべてのプレイヤーは累積報酬の最大化を目標として同じバンディット問題を解決する。しかし、これらの設定は多くの現実世界のアプリケーションにおいて重要な要素を無視しており、プレイヤーは \textit{a dynamic local subset of arms} に \textit{limited access} を持つ(つまり、腕は 'walking' で、プレイヤーにはアクセスできないことがある)。そこで本稿では,上記のモデリング問題に対処するために,多人数マルチプレイヤー歩行バンディットモデルを提案する。現在の目標は、報酬を最大化することだが、プレイヤーはローカルのサブセットからのみ腕を引くことができ、他のプレイヤーが同じ腕を引かなければ完全な報酬を得られる。我々は,探索・爆発のトレードオフに対処するためにuper confidence bound(ucb)を採用し,衝突を適切に処理するために分散最適化技術を採用する。そこで本研究では,これら2つの手法を慎重に統合することにより,後悔をほぼ最適に保証し,競争経験的性能を得るために容易に実装できる分散アルゴリズムを提案する。

Multi-player multi-armed bandit is an increasingly relevant decision-making problem, motivated by applications to cognitive radio systems. Most research on this problem focuses exclusively on settings in which players have full access to all arms and receive no reward when pulling the same arm. Hence all players solve the same bandit problem with the goal of maximizing their cumulative reward. However, these settings neglect several important factors in many real-world applications, where players have limited access to a dynamic local subset of arms (i.e., an arm could sometimes be "walking" and not accessible to the player). To this end, this paper proposes a multi-player multi-armed walking bandits model, aiming to address the aforementioned modeling issues. The goal now is to maximize the reward; however, players can only pull arms from the local subset and only collect a full reward if no other players pull the same arm. We adopt Upper Confidence Bound (UCB) to deal with the exploration-exploitation tradeoff and employ distributed optimization techniques to properly handle collisions. By carefully integrating these two techniques, we propose a decentralized algorithm that has a near-optimal guarantee on the regret and can be easily implemented to obtain competitive empirical performance.
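
The UCB part of the approach, restricted to the arms a player can currently reach, might look like the sketch below. It uses the classic UCB1 index as a stand-in and leaves out the collision handling entirely, so it is not the authors' algorithm; the function name `ucb_pick` and its arguments are hypothetical.

```python
import math

def ucb_pick(means, counts, t, accessible):
    """Return the accessible arm with the largest UCB1 index at round t.

    means[i] and counts[i] are the empirical mean reward and pull count of
    arm i; `accessible` lists the arms in the player's current local subset.
    """
    best_arm, best_index = None, -math.inf
    for arm in accessible:
        if counts[arm] == 0:
            return arm  # try every accessible arm at least once
        index = means[arm] + math.sqrt(2.0 * math.log(t) / counts[arm])
        if index > best_index:
            best_arm, best_index = arm, index
    return best_arm
```

Since the accessible subset changes over time, the globally best arm may be out of reach in a given round, and the player then falls back to the best index among the arms it can pull.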

翻訳日:2022-12-14 13:16:26 公開日:2022-12-12

# 深部強化学習による包括再生エネルギーを用いたハイブリッドエネルギー貯蔵システムの最適計画

Optimal Planning of Hybrid Energy Storage Systems using Curtailed Renewable Energy through Deep Reinforcement Learning ( http://arxiv.org/abs/2212.05662v1 )

ライセンス: Link先を確認

Dongju Kang, Doeun Kang, Sumin Hwangbo, Haider Niaz, Won Bo Lee, J. Jay Liu, Jonggeol Na

(参考訳) エネルギー管理システム(EMS)は、継続的に成長する再生可能エネルギーを活用するためにますます重要になっている。エネルギー利害関係者の効率を最大化するために、電池やグリーン水素などのエネルギー貯蔵システム(ESS)のプロムリングを行う必要がある。しかし、異なる戦略間の活用を計画する最適な意思決定は、大規模問題の複雑さと不確実性に直面している。そこで本研究では,再生可能エネルギーの不確実性を考慮したリアルタイムな ESS 計画を実現するために,ポリシーベースアルゴリズムを用いた高度強化学習手法を提案する。定量的な性能比較により、DRLエージェントは広い動作と観測空間であってもシナリオベース確率最適化(SO)アルゴリズムよりも優れていた。 DRLの不確実性拒絶能力により, 再生可能エネルギーの大幅な不確実性の下で, 純利益と安定システムの最大化を図り, 頑健な性能を確認できた。 DRLエージェントの動作を状態に応じて視覚的に評価するためのアクションマッピングを行った。対応する結果は、drlエージェントが人間の専門家のやり方を学習することを確認し、提案手法の信頼性の高い適用を示唆した。

Energy management systems (EMS) are becoming increasingly important in order to utilize the continuously growing curtailed renewable energy. Promising energy storage systems (ESS), such as batteries and green hydrogen, should be employed to maximize the efficiency of energy stakeholders. However, optimal decision-making, i.e., planning the leveraging between different strategies, is confronted with the complexity and uncertainties of large-scale problems. Here, we propose a sophisticated deep reinforcement learning (DRL) methodology with a policy-based algorithm to realize real-time optimal ESS planning under the curtailed renewable energy uncertainty. A quantitative performance comparison showed that the DRL agent outperforms the scenario-based stochastic optimization (SO) algorithm, even with a wide action and observation space. Owing to the uncertainty rejection capability of the DRL, we could confirm robust performance under a large uncertainty of the curtailed renewable energy, with maximized net profit and a stable system. Action-mapping was performed to visually assess the action taken by the DRL agent according to the state. The corresponding results confirmed that the DRL agent learns to act as a human expert would, suggesting reliable application of the proposed methodology.

翻訳日:2022-12-13 18:43:20 公開日:2022-12-12

# 高分解能微分方程式による加速現象の再検討

Revisiting the acceleration phenomenon via high-resolution differential equations ( http://arxiv.org/abs/2212.05700v1 )

ライセンス: Link先を確認

Shuo Chen, Bin Shi, Ya-xiang Yuan

(参考訳) ネステロフの加速勾配降下(NAG)は、一階アルゴリズムの歴史におけるマイルストーンの1つである。高分解能微分方程式の枠組みが[Shi et al., 2022]で提案されるまでは、加速現象のメカニズムは勾配補正項によるものであることが判明しなかった。収束率に関する高分解能微分方程式の枠組みの理解を深めるために,本論文では,リアプノフ解析と位相空間表現の手法に基づいて,$\mu$-strongly convex関数のnagを引き続き検討する。まず、勾配補正スキームから証明を再検討する。 Chen et al., 2022] と同様、単純計算は証明を極端に単純化し、ステップサイズを小さな修正で$s=1/L$に拡大する。一方、リャプノフ函数の構成法は原則的である。また,暗黙の速度計画からNAGについても検討した。速度の反復性の違いから、リアプノフ関数は追加項を使わずに暗黙速度スキームから構築され、反復差分の計算がより簡単になることがわかった。 NAGの暗黙的速度スキームからの高分解能微分方程式フレームワークは最適であり、勾配補正スキームよりも優れている。

Nesterov's accelerated gradient descent (NAG) is one of the milestones in the history of first-order algorithms. Only when the high-resolution differential equation framework was proposed in [Shi et al., 2022] was it uncovered that the mechanism behind the acceleration phenomenon is the gradient correction term. To deepen our understanding of the high-resolution differential equation framework on the convergence rate, we continue to investigate NAG for the $\mu$-strongly convex function based on the techniques of Lyapunov analysis and phase-space representation in this paper. First, we revisit the proof from the gradient-correction scheme. Similar to [Chen et al., 2022], a straightforward calculation greatly simplifies the proof and, with minor modification, enlarges the step size to $s=1/L$. Meanwhile, the way of constructing Lyapunov functions is principled. Furthermore, we also investigate NAG from the implicit-velocity scheme. Due to the difference in the velocity iterates, we find that the Lyapunov function is constructed from the implicit-velocity scheme without the additional term and the calculation of iterative difference becomes simpler. Together with the optimal step size obtained, the high-resolution differential equation framework from the implicit-velocity scheme of NAG is perfect and outperforms the gradient-correction scheme.
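
For readers who want to see the iteration being analyzed, the following sketch runs the standard NAG scheme for a $\mu$-strongly convex, $L$-smooth function with step size $s=1/L$. It is the textbook discrete algorithm, not the paper's Lyapunov analysis; the function name `nag_sc` and its signature are assumptions for illustration.

```python
import numpy as np

def nag_sc(grad, x0, mu, L, steps):
    """NAG for a mu-strongly convex, L-smooth objective with step s = 1/L.

    Iterates y_{k+1} = x_k - s * grad(x_k) and
             x_{k+1} = y_{k+1} + beta * (y_{k+1} - y_k),
    with momentum beta = (1 - sqrt(mu * s)) / (1 + sqrt(mu * s)).
    """
    s = 1.0 / L
    beta = (1.0 - np.sqrt(mu * s)) / (1.0 + np.sqrt(mu * s))
    x = np.asarray(x0, dtype=float)
    y = x.copy()
    for _ in range(steps):
        y_next = x - s * grad(x)          # gradient step
        x = y_next + beta * (y_next - y)  # momentum (extrapolation) step
        y = y_next
    return y
```

As a usage example, for $f(x) = \frac{1}{2} x^\top \mathrm{diag}(1, 10)\, x$ one can pass `grad=lambda x: np.array([1.0, 10.0]) * x` with `mu=1.0` and `L=10.0`; the iterates then approach the minimizer at the origin.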

翻訳日:2022-12-13 18:43:00 公開日:2022-12-12

# ロバストなリカレントニューラルネットワークによる開放水中の船体運動の同定と性能保証 -- 技術報告

Robust Recurrent Neural Network to Identify Ship Motion in Open Water with Performance Guarantees -- Technical Report ( http://arxiv.org/abs/2212.05781v1 )

ライセンス: Link先を確認

Daniel Frank, Decky Aspandi Latif, Michael Muehlebach, Steffen Staab

(参考訳) リカレントニューラルネットワークは、単に入出力測定から未知の非線形システムのダイナミクスを学習することができる。しかし、結果のモデルは入出力マッピングの安定性を保証するものではない。本研究では,非線形乱れを伴う線形時間不変系として,リカレントニューラルネットワークを表現する。パラメータに制約を導入することで、有限利得安定性と増分有限利得安定性を保証できる。この識別法を用いて,開放水中を移動する4自由度船の動きを学習し,無拘束パラメータを用いた他の純粋学習型アプローチと比較する。本解析により,制約付き再帰型ニューラルネットワークは,テストセットの予測精度は低いが,分布外集合において同等の結果を得られ,安定性条件を尊重することを示した。

Recurrent neural networks are capable of learning the dynamics of an unknown nonlinear system purely from input-output measurements. However, the resulting models do not provide any stability guarantees on the input-output mapping. In this work, we represent a recurrent neural network as a linear time-invariant system with nonlinear disturbances. By introducing constraints on the parameters, we can guarantee finite gain stability and incremental finite gain stability. We apply this identification method to learn the motion of a four-degrees-of-freedom ship that is moving in open water and compare it against other purely learning-based approaches with unconstrained parameters. Our analysis shows that the constrained recurrent neural network has a lower prediction accuracy on the test set, but it achieves comparable results on an out-of-distribution set and respects stability conditions.

翻訳日:2022-12-13 18:42:37 公開日:2022-12-12

# 多変量駆動型ディリクレホークスプロセス

Multivariate Powered Dirichlet Hawkes Process ( http://arxiv.org/abs/2212.05995v1 )

ライセンス: Link先を確認

Ga\"el Poux-M\'edard, Julien Velcin, Sabine Loudcher

(参考訳) 文書の公開時間は、その意味的内容に関する関連情報を運ぶ。 Dirichlet-Hawkesプロセスは、テキスト情報と出版ダイナミクスを共同でモデル化するために提案されている。このアプローチは、最近のいくつかの作品で成功して使われており、特定の困難な問題 -- 典型的には、短いテキストや絡み合った出版ダイナミックスのために。しかし、現在の形式では、複雑な出版ダイナミクスは許可されていない。特に、推測された話題は互いに独立している -- 金融に関する出版は、例えば、政治に関する出版物には影響しないと仮定されている。本研究では,この仮定を緩和する多変量dirichlet-hawkesプロセス(mpdhp)を開発した。様々な話題に関する出版物が互いに影響を与えている。相互作用するトピックから生じる技術的課題の詳細と克服。我々は,様々な合成データセット上でmpdhpを体系的に評価し,そのアプリケーションドメインと制限を定義する。最後に,redditデータを用いたmpdhpのユースケースを開発した。この記事の最後には、興味のある読者がMPDHPの使用方法と使用時期、そうでないタイミングを知ることができる。

The publication time of a document carries relevant information about its semantic content. The Dirichlet-Hawkes process has been proposed to jointly model textual information and publication dynamics. This approach has been used with success in several recent works, and extended to tackle specific challenging problems, typically short texts or entangled publication dynamics. However, the prior in its current form does not allow for complex publication dynamics. In particular, inferred topics are independent of each other: a publication about finance is assumed to have no influence on publications about politics, for instance. In this work, we develop the Multivariate Powered Dirichlet-Hawkes Process (MPDHP), which relaxes this assumption. Publications about various topics can now influence each other. We detail and overcome the technical challenges that arise from considering interacting topics. We conduct a systematic evaluation of MPDHP on a range of synthetic datasets to define its application domain and limitations. Finally, we develop a use case of the MPDHP on Reddit data. At the end of this article, the interested reader will know how and when to use MPDHP, and when not to.
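
To make the idea of topics influencing each other concrete, the sketch below evaluates a generic multivariate Hawkes intensity with an exponential kernel. It is not the MPDHP model from the paper: the function name `hawkes_intensity`, the exponential kernel, and the parameters `mu`, `alpha`, and `beta` are assumptions chosen for illustration only.

```python
import math

def hawkes_intensity(t, events, mu, alpha, beta):
    """Intensity of each topic at time t (illustrative, not the paper's model).

    events: list of (time, topic) pairs observed so far
    mu[i]: baseline rate of topic i
    alpha[i][j]: how strongly a past event of topic j excites topic i
    beta: decay rate of the assumed exponential kernel
    """
    lam = list(mu)
    for s, j in events:
        if s < t:  # only strictly earlier events contribute
            decay = math.exp(-beta * (t - s))
            for i in range(len(mu)):
                lam[i] += alpha[i][j] * decay
    return lam

# Hypothetical data: one event on topic 0 at t=0, one on topic 1 at t=1.
events = [(0.0, 0), (1.0, 1)]
mu = [0.1, 0.1]
independent = [[0.5, 0.0], [0.0, 0.5]]  # topics only excite themselves
interacting = [[0.5, 0.3], [0.3, 0.5]]  # off-diagonal terms couple the topics

lam_independent = hawkes_intensity(2.0, events, mu, independent, 1.0)
lam_interacting = hawkes_intensity(2.0, events, mu, interacting, 1.0)
```

With a diagonal `alpha`, an event on one topic never changes the other topic's intensity; the off-diagonal entries are what let a finance publication raise the rate of politics publications in this toy setting.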

翻訳日:2022-12-13 18:42:23 公開日:2022-12-12

# Dirichlet-Survival Process:トピック依存拡散ネットワークのスケーラブル推論

Dirichlet-Survival Process: Scalable Inference of Topic-Dependent Diffusion Networks ( http://arxiv.org/abs/2212.05996v1 )

ライセンス: Link先を確認

Ga\"el Poux-M\'edard, Julien Velcin, Sabine Loudcher

(参考訳) ネットワーク上の情報拡散は、文書の内容、他の出版物に対する出版時期、ネットワークにおけるスプレッダーの位置の3つの特徴を考慮し、効率的にモデル化することができる。以前の作品のほとんどは、それらのうち最大2つを共同でモデル化するか、あるいは非常にパラメトリックなアプローチに依存している。近年のDirichlet-Pointプロセス文献に基づいて,非パラメトリックな非教師付きフレームワークでこれらすべての機能を共同で考慮したHouston (Hidden Online User-Topic Network) モデルを紹介する。動的トピック依存の拡散ネットワークを,そのトピックとともに連続時間設定で推定する。これは教師なしであり、入力データとして \textit{(time of publication, information's content, spreading entity)} の形をした三重項のラベルのないストリームを考える。オンライン推論は、データセットのサイズに線形にスケールする逐次モンテカルロアルゴリズムを用いて行われる。このアプローチは、クラスタリカバリとサブネットワーク推論タスクの両方において、既存のベースラインを大幅に改善する。

Information spread on networks can be efficiently modeled by considering three features: documents' content, time of publication relative to other publications, and position of the spreader in the network. Most previous works model up to two of those jointly, or rely on heavily parametric approaches. Building on the recent Dirichlet-Point process literature, we introduce the Houston (Hidden Online User-Topic Network) model, which jointly considers all those features in a non-parametric unsupervised framework. It infers dynamic topic-dependent underlying diffusion networks in a continuous-time setting along with said topics. It is unsupervised; it considers an unlabeled stream of triplets shaped as \textit{(time of publication, information's content, spreading entity)} as input data. Online inference is conducted using a sequential Monte-Carlo algorithm that scales linearly with the size of the dataset. Our approach yields substantial improvements over existing baselines on both cluster recovery and subnetwork inference tasks.

翻訳日:2022-12-13 18:42:03 公開日:2022-12-12

# ユニットコミット問題に対する強化学習と木探索手法

Reinforcement Learning and Tree Search Methods for the Unit Commitment Problem ( http://arxiv.org/abs/2212.06001v1 )

ライセンス: Link先を確認

Patrick de Mars

(参考訳) 需要を満たす世代単位の運用スケジュールを決定する単位コミットメント(UC)問題は、電力系統の運用における基本的な課題である。混合整数プログラミングを用いた既存のUC法は確率的システムには適していない。不確実性をより厳密に考慮するアプローチは、回転予備の必要量を減らし、高い効率で発電所を稼働させ、より多くの可変再生可能エネルギーを統合することで、運用コストを大幅に削減することができる。 uc問題を解決する有望なアプローチは強化学習(rl)であり、人工知能における長年にわたる大きな課題を克服するために用いられてきた最適な意思決定のための方法論である。この論文は、UC問題へのRLの適用を探求し、不確実性の下での堅牢性、複数の問題インスタンスにわたる一般化可能性、以前研究されたよりも大規模な電力システムへのスケーリングといった課題に対処する。これらの課題に対処するため,モデルフリーRLとモデルベース計画を組み合わせた新しい手法であるガイドツリー探索を開発した。 UC問題はマルコフ決定プロセスとして定式化され、イギリスの電力システムからRLエージェントを訓練するための実データに基づくオープンソース環境を開発する。最大100個のジェネレータの問題では、誘導木探索は決定論的UC法と競合し、運用コストを最大1.4 %削減する。 rlの利点は、発電機の故障に対するロバスト性、風力の削減、炭素価格といった電力系統運用者にとって重要な考慮事項を取り入れるために、このフレームワークを簡単に拡張できることである。ジェネレータの停止を考慮した場合、従来の$N-x$予約基準を用いた手法と比較して、誘導木探索は運用コストの2\%以上を節約する。

The unit commitment (UC) problem, which determines operating schedules of generation units to meet demand, is a fundamental task in power systems operation. Existing UC methods using mixed-integer programming are not well-suited to highly stochastic systems. Approaches which more rigorously account for uncertainty could yield large reductions in operating costs by reducing spinning reserve requirements; operating power stations at higher efficiencies; and integrating greater volumes of variable renewables. A promising approach to solving the UC problem is reinforcement learning (RL), a methodology for optimal decision-making which has been used to conquer long-standing grand challenges in artificial intelligence. This thesis explores the application of RL to the UC problem and addresses challenges including robustness under uncertainty; generalisability across multiple problem instances; and scaling to larger power systems than previously studied. To tackle these issues, we develop guided tree search, a novel methodology combining model-free RL and model-based planning. The UC problem is formalised as a Markov decision process and we develop an open-source environment based on real data from Great Britain's power system to train RL agents. In problems of up to 100 generators, guided tree search is shown to be competitive with deterministic UC methods, reducing operating costs by up to 1.4\%. An advantage of RL is that the framework can be easily extended to incorporate considerations important to power systems operators such as robustness to generator failure, wind curtailment or carbon prices. When generator outages are considered, guided tree search saves over 2\% in operating costs as compared with methods using conventional $N-x$ reserve criteria.

翻訳日:2022-12-13 18:41:43 公開日:2022-12-12

# 量子多体状態のハードウェア効率学習

Hardware-efficient learning of quantum many-body states ( http://arxiv.org/abs/2212.06084v1 )

ライセンス: Link先を確認

Katherine Van Kirk, Jordan Cotler, Hsin-Yuan Huang, Mikhail D. Lukin

(参考訳) 高絡み合いの多粒子系の効率的なキャラクタリゼーションは量子科学において顕著な課題である。近年の進歩は、量子多体系の多くの特性を学ぶのに、無作為な数の測定が十分であることを示している。しかし、そのような測定の実行には個々の粒子を完全に制御する必要があるため、多くの実験プラットフォームでは利用できない。本研究では,各粒子が同一大域に分布し,追加のアンシラ粒子が存在しない場合を含む,個々の粒子を制御できるようなシステムにおいて,量子多体状態を学ぶための厳密で効率的なアルゴリズムを提案する。我々は,U(1)格子ゲージ理論におけるエネルギー密度推定アルゴリズムの有効性を数値的に実証し,非常に限られた測定能力を用いて位相順を分類する。

Efficient characterization of highly entangled multi-particle systems is an outstanding challenge in quantum science. Recent developments have shown that a modest number of randomized measurements suffices to learn many properties of a quantum many-body system. However, implementing such measurements requires complete control over individual particles, which is unavailable in many experimental platforms. In this work, we present rigorous and efficient algorithms for learning quantum many-body states in systems with any degree of control over individual particles, including when every particle is subject to the same global field and no additional ancilla particles are available. We numerically demonstrate the effectiveness of our algorithms for estimating energy densities in a U(1) lattice gauge theory and classifying topological order using very limited measurement capabilities.

翻訳日:2022-12-13 18:41:16 公開日:2022-12-12

# 与えられた平均と分散を持つ任意の分布間の全変動距離に対する下限

Lower Bounds for the Total Variation Distance Between Arbitrary Distributions with Given Means and Variances ( http://arxiv.org/abs/2212.05820v1 )

ライセンス: Link先を確認

Tomohiro Nishiyama

(参考訳) 与えられた平均と分散(共分散行列)を持つ実 d-空間上の任意の二つの確率測度に対して、その全変動距離に対して下限を与える。

For two arbitrary probability measures on real d-space with given means and variances (covariance matrices), we provide lower bounds for their total variation distance.

翻訳日:2022-12-13 18:41:02 公開日:2022-12-12

# 経路レベルでの細胞内局在予測のためのグラフアルゴリズム

Graph algorithms for predicting subcellular localization at the pathway level ( http://arxiv.org/abs/2212.05991v1 )

ライセンス: Link先を確認

Chris S. Magnano, Anthony Gitter

(参考訳) タンパク質の細胞内局在は正常な細胞プロセスや疾患において重要な因子である。多くのタンパク質の局在化資源は静的として扱うが、タンパク質の局在化は生物学的文脈に強く影響される。生物学的経路は、特定の生物学的文脈を表すグラフであり、大規模データから推測できる。生物経路における全ての相互作用の局所化をエッジラベルタスクとして予測するグラフアルゴリズムを開発した。我々は,グラフニューラルネットワーク,確率的グラフィカルモデル,識別分類器など様々なモデルを比較し,キュレーションされた経路データベースからの局所化アノテーションを予測する。また, ウイルス感染によるヒト線維芽細胞の局在を予測し, 生物学的経路を構築するケーススタディも実施した。経路ローカライゼーション予測は,大規模生物学的データの解析に公開可能なローカライゼーションデータを統合するための有望なアプローチである。

Protein subcellular localization is an important factor in normal cellular processes and disease. While many protein localization resources treat it as static, protein localization is dynamic and heavily influenced by biological context. Biological pathways are graphs that represent a specific biological context and can be inferred from large-scale data. We develop graph algorithms to predict the localization of all interactions in a biological pathway as an edge-labeling task. We compare a variety of models including graph neural networks, probabilistic graphical models, and discriminative classifiers for predicting localization annotations from curated pathway databases. We also perform a case study where we construct biological pathways and predict localizations of human fibroblasts undergoing viral infection. Pathway localization prediction is a promising approach for integrating publicly available localization data into the analysis of large-scale biological data.

翻訳日:2022-12-13 18:25:58 公開日:2022-12-12

# ラベル差分プライバシーによる回帰

Regression with Label Differential Privacy ( http://arxiv.org/abs/2212.06074v1 )

ライセンス: Link先を確認

Badih Ghazi, Pritish Kamath, Ravi Kumar, Ethan Leeman, Pasin Manurangsi, Avinash Varadarajan, Chiyuan Zhang

(参考訳) ラベル差分プライバシー(DP)を保証した回帰モデルの学習課題について検討する。ラベル値のグローバルな事前分布に基づいて, 与えられた回帰損失関数の下で最適なラベルDPランダム化機構を導出する。最適機構が「randomized response on bins」の形をとることを証明し、最適なbin値を求める効率的なアルゴリズムを提案する。アルゴリズムの有効性を示すいくつかのデータセットについて,徹底的な実験評価を行った。

We study the task of training regression models with the guarantee of label differential privacy (DP). Based on a global prior distribution on label values, which could be obtained privately, we derive a label DP randomization mechanism that is optimal under a given regression loss function. We prove that the optimal mechanism takes the form of a "randomized response on bins", and propose an efficient algorithm for finding the optimal bin values. We carry out a thorough experimental evaluation on several datasets demonstrating the efficacy of our algorithm.
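
The phrase "randomized response on bins" can be illustrated with a minimal sketch: snap the label to the nearest bin value, then apply standard k-ary randomized response over the bins. The bin values below are hypothetical and fixed by hand; the paper's contribution is an algorithm for choosing them optimally, which this sketch does not attempt. The function name `rr_on_bins` is likewise an assumption.

```python
import math
import random

def rr_on_bins(label, bins, epsilon, rng=random):
    """Snap `label` to the nearest bin, then apply k-ary randomized response.

    bins: candidate output values (hypothetical, hand-picked here)
    epsilon: label-DP privacy parameter
    """
    k = len(bins)
    true_idx = min(range(k), key=lambda i: abs(bins[i] - label))
    if k == 1:
        return bins[0]
    # Keep the true bin with probability e^eps / (e^eps + k - 1).
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p_keep:
        return bins[true_idx]
    # Otherwise report one of the other k - 1 bins uniformly at random.
    other = rng.randrange(k - 1)
    if other >= true_idx:
        other += 1
    return bins[other]

# Usage with made-up bins on a label in [0, 1].
noisy_label = rr_on_bins(0.26, [0.0, 0.5, 1.0], epsilon=1.0, rng=random.Random(0))
```

Every output value has probability either `p_keep` or `(1 - p_keep) / (k - 1)` regardless of the input label, so the ratio between any two inputs is at most `e^epsilon`, which is the label-DP guarantee for this simple mechanism.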

翻訳日:2022-12-13 18:25:45 公開日:2022-12-12

# フランクウルフによる多次元ホークス過程の高速学習

Fast Learning of Multidimensional Hawkes Processes via Frank-Wolfe ( http://arxiv.org/abs/2212.06081v1 )

ライセンス: Link先を確認

Renbo Zhao, Niccol\`o Dalmasso, Mohsen Ghassemi, Vamsi K. Potluru, Tucker Balch, Manuela Veloso

(参考訳) シーケンシャルなイベントデータのモデリングと生成に関して、Hawkesプロセスは最近ツールの最前線に現れている。多次元ホークスプロセスは、異なる種類の事象間の自己および相互励起の両方をモデル化し、財務、疫学、パーソナライズドレコメンデーションなど様々な分野でうまく適用されている。本研究では,Frank-Wolfeアルゴリズムを多次元ホークス過程の学習に適用する。実験結果から,本手法は,他の1次手法よりもパラメータ推定の精度が優れており,実行時間が大幅に高速であることがわかった。

Hawkes processes have recently risen to the forefront of tools for modeling and generating sequential event data. Multidimensional Hawkes processes model both the self- and cross-excitation between different types of events and have been applied successfully in various domains such as finance, epidemiology and personalized recommendations, among others. In this work we present an adaptation of the Frank-Wolfe algorithm for learning multidimensional Hawkes processes. Experimental results show that our approach achieves parameter estimation accuracy better than or on par with other first-order methods, while enjoying a significantly faster runtime.
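
For readers unfamiliar with Frank-Wolfe, the following is a minimal sketch of the generic algorithm on the probability simplex with a toy quadratic objective. It is not the paper's adaptation to Hawkes likelihoods; the function name, the objective, and the constraint set are assumptions chosen to keep the example self-contained.

```python
def frank_wolfe_simplex(grad, x0, n_iters=2000):
    """Minimise a smooth convex function over the probability simplex."""
    x = list(x0)
    for k in range(n_iters):
        g = grad(x)
        # Linear minimisation oracle: on the simplex the minimiser is a vertex,
        # namely the coordinate with the smallest gradient entry.
        i_star = min(range(len(x)), key=lambda i: g[i])
        gamma = 2.0 / (k + 2.0)  # standard open-loop step size
        # Convex combination of the current point and the chosen vertex.
        x = [(1.0 - gamma) * xi for xi in x]
        x[i_star] += gamma
    return x

# Toy objective (hypothetical): f(x) = 0.5 * ||x - target||^2.
target = [0.2, 0.3, 0.5]

def grad(x):
    return [xi - ti for xi, ti in zip(x, target)]

x_hat = frank_wolfe_simplex(grad, [1.0, 0.0, 0.0])
```

Each iterate is a convex combination of simplex vertices, so the constraint holds without any projection step; avoiding projections is the usual motivation for preferring Frank-Wolfe over projected gradient methods.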

翻訳日:2022-12-13 18:25:38 公開日:2022-12-12

# ハンドブレアテ:手のひらからの呼吸異常の非接触モニタリング

Hand-breathe: Non-Contact Monitoring of Breathing Abnormalities from Hand Palm ( http://arxiv.org/abs/2212.06089v1 )

ライセンス: Link先を確認

Kawish Pervez, Waqas Aman, M. Mahboob Ur Rahman, M. Wasim Nawaz, Qammer H. Abbasi

(参考訳) ポストコビッド19の世界では、無線周波数(RF)ベースの非接触手法、例えばソフトウェア定義無線(SDR)ベースの手法が、人間のバイタルをインテリジェントにリモートセンシングするための候補として浮上し、コビッド19のような伝染性ウイルスを封じ込めている。そこで本研究では,usrp(universal software radio peripherals)ベースのsdrと古典的機械学習(ml)法を用いた非接触型呼吸異常監視法を提案する。提案手法では,送信アンテナと受信アンテナの間にあるテーブルに手を置くとともに,直交周波数分割多重化(ofdm)信号が手を通過する。その後、受信機はチャネル周波数応答(基本的、微細な無線チャネル状態情報)を抽出し、様々なMLアルゴリズムに供給し、最終的に異なる呼吸異常を分類する。すべての分類器のうち、線形svm分類器は最大精度88.1\%であった。 ML分類器を教師付きで訓練するために,実験室環境における4被験者のリアルタイム実験によりデータ収集を行った。ラベル生成の目的で、被験者の呼吸は正常、速、低呼吸の3つのクラスに分類された。さらに,提案手法(手のみをRF信号に曝す方法)に加えて,最先端手法(完全胸部をRF放射に曝す方法)の実装と試験を行った。両手法の性能比較の結果,提案手法の精度はわずかに劣るが,本手法ではRF照射による被曝が最小限に抑えられるというトレードオフが示された。

In the post-COVID-19 world, radio frequency (RF)-based non-contact methods, e.g., software-defined radio (SDR)-based methods, have emerged as promising candidates for intelligent remote sensing of human vitals, and could help in the containment of contagious viruses like COVID-19. To this end, this work utilizes universal software radio peripheral (USRP)-based SDRs along with classical machine learning (ML) methods to design a non-contact method to monitor different breathing abnormalities. Under our proposed method, a subject rests his/her hand on a table in between the transmit and receive antennas, while an orthogonal frequency division multiplexing (OFDM) signal passes through the hand. Subsequently, the receiver extracts the channel frequency response (basically, fine-grained wireless channel state information), and feeds it to various ML algorithms which eventually classify between different breathing abnormalities. Among all classifiers, the linear SVM classifier achieved the maximum accuracy of 88.1\%. To train the ML classifiers in a supervised manner, data was collected by doing real-time experiments on 4 subjects in a lab environment. For label generation, the breathing of the subjects was classified into three classes: normal, fast, and slow breathing. Furthermore, in addition to our proposed method (where only a hand is exposed to RF signals), we also implemented and tested the state-of-the-art method (where the full chest is exposed to RF radiation). The performance comparison of the two methods reveals a trade-off, i.e., the accuracy of our proposed method is slightly inferior but our method results in minimal body exposure to RF radiation, compared to the benchmark method.

翻訳日:2022-12-13 18:25:25 公開日:2022-12-12

# どこから始めるか? 簡単なスキルを複雑な環境に移す

Where To Start? Transferring Simple Skills to Complex Environments ( http://arxiv.org/abs/2212.06111v1 )

ライセンス: Link先を確認

Vitalis Vosylius, Edward Johns

(参考訳) ロボット学習は、把持など、ロボットに簡単なスキルを教える多くの方法を提供する。しかし、これらのスキルは通常、オープンで散らかった環境で訓練されているため、より複雑で散らかった環境では望ましくない衝突を引き起こす可能性がある。そこで本研究では,環境のグラフ表現に基づくアプライアンスモデルを提案する。これは,展開中に最適化され,スキルを開始するための適切なロボット構成を見つける。提案手法は,事前取得したスキルをシミュレーションや実環境において,把握作業と配置作業の両方において,これまで見つからなかった制約のある環境に一般化できることを実証する。

Robot learning provides a number of ways to teach robots simple skills, such as grasping. However, these skills are usually trained in open, clutter-free environments, and therefore would likely cause undesirable collisions in more complex, cluttered environments. In this work, we introduce an affordance model based on a graph representation of an environment, which is optimised during deployment to find suitable robot configurations to start a skill from, such that the skill can be executed without any collisions. We demonstrate that our method can generalise a priori acquired skills to previously unseen cluttered and constrained environments, in simulation and in the real world, for both a grasping and a placing task.

翻訳日:2022-12-13 18:24:54 公開日:2022-12-12

# 自律運転への適用による強化学習セキュリティに関する調査研究

A Survey on Reinforcement Learning Security with Application to Autonomous Driving ( http://arxiv.org/abs/2212.06123v1 )

ライセンス: Link先を確認

Ambra Demontis, Maura Pintor, Luca Demetrio, Kathrin Grosse, Hsiao-Ying Lin, Chengfang Fang, Battista Biggio, Fabio Roli

(参考訳) 強化学習は、機械が自身の経験から学ぶことを可能にする。今日では、強化学習アルゴリズムが効果的で信頼できる方針を学習することを防ぐか、または訓練されたエージェントに間違った判断をさせるために慎重に作られた攻撃に対して脆弱であるにもかかわらず、自動運転のような安全クリティカルなアプリケーションで使用されている。強化学習の安全性に関する文献は急速に増加しており、この分野に光を当てるためにいくつかの調査が提案されている。しかし、それらの分類は、手元にあるシステムの種類に応じて適切な防御を選択するには不十分である。我々は,この制限を異なる視点から克服するだけでなく,強化学習アルゴリズムが自動運転の文脈で使用される場合の最先端攻撃と防御の適用可能性についても論じる。

Reinforcement learning allows machines to learn from their own experience. Nowadays, it is used in safety-critical applications, such as autonomous driving, despite being vulnerable to attacks carefully crafted either to prevent the reinforcement learning algorithm from learning an effective and reliable policy, or to induce the trained agent to make a wrong decision. The literature about the security of reinforcement learning is rapidly growing, and some surveys have been proposed to shed light on this field. However, their categorizations are insufficient for choosing an appropriate defense given the kind of system at hand. In our survey, we not only overcome this limitation by considering a different perspective, but also discuss the applicability of state-of-the-art attacks and defenses when reinforcement learning algorithms are used in the context of autonomous driving.

翻訳日:2022-12-13 18:24:42 公開日:2022-12-12

# 精神状態に基づくパーソナライズ型睡眠誘導システムの開発

Development of Personalized Sleep Induction System based on Mental States ( http://arxiv.org/abs/2212.05669v1 )

ライセンス: Link先を確認

Young-Seok Kweon, Gi-Hwan Shin, Heon-Gyu Kwak

(参考訳) 睡眠は認知、運動、感情的なパフォーマンスの低下や様々な病気を防ぐために不可欠な行動である。しかし、人々が眠りたいときに眠りに落ちることは容易ではない。睡眠障害には、新型コロナウイルス(covid-19)の状況、外からの騒音、夜間の光など様々な要因がある。脳波と聴覚刺激を用いた精神状態に基づくパーソナライズされた睡眠誘導システムの構築を目指している。本システムは,脳波とピッツバーグ睡眠状態指標とブルネル気分尺度の結果を用いて,ユーザの精神状態を解析する。精神状態によると、このシステムは、ホワイトノイズ、繰り返しビープ音、雨音、バイノーラルビート、シェーム音の5つの聴覚刺激の間で睡眠誘導音を演奏する。最後に、睡眠誘発システムは、非刺激性眼球運動睡眠の場合、94.7%の被験者の睡眠ステージを分類し、聴覚刺激を停止した。当システムでは20名のうち18名が眠る。

Sleep is an essential behavior to prevent declines in cognitive, motor, and emotional performance and various diseases. However, it is not easy to fall asleep when people want to sleep. There are various sleep-disturbing factors such as the COVID-19 situation, noise from outside, and light during the night. We aim to develop a personalized sleep induction system based on mental states using electroencephalogram and auditory stimulation. Our system analyzes users' mental states using an electroencephalogram and results of the Pittsburgh sleep quality index and Brunel mood scale. According to mental states, the system plays a sleep induction sound chosen from five auditory stimuli: white noise, repetitive beep sounds, rain sound, binaural beat, and sham sound. Finally, the sleep-inducing system classified the sleep stage of participants with 94.7 percent accuracy and stopped auditory stimulation if participants showed non-rapid eye movement sleep. Our system helped 18 of 20 participants fall asleep.

翻訳日:2022-12-13 18:16:45 公開日:2022-12-12

# Wasserstein分布ロバスト最適化による一般化と正規化について

On Generalization and Regularization via Wasserstein Distributionally Robust Optimization ( http://arxiv.org/abs/2212.05716v1 )

ライセンス: Link先を確認

Qinyu Wu, Jonathan Yu-Meng Li, Tiantian Mao

(参考訳) wasserstein distributionally robust optimization (dro) は、運用研究や機械学習アプリケーションで成功し、サンプル外のパフォーマンスに有利なソリューションを得るための強力な手段となった。この成功の2つの説得力ある説明は、ワッサーシュタイン DRO から導かれる一般化境界と、ワッサースタイン DRO と機械学習によく適用される正規化スキームの等価性である。一般化境界と正規化の同値性に関する既存の結果は、wasserstein球が特定の型であり、決定基準が期待される関数の特定の形を取るような設定に限定されている。本稿では,アフィン決定規則を伴うwasserstein dro問題に焦点をあてることで,wasserstein球が一般型であり,決定基準がリスクの一般的な尺度,すなわち非線形分布となるような,かなり広い設定で一般化境界と正規化の等価性を得ることができることを示す。これにより、wasserstein droを使用していない多くの重要な分類、回帰、リスク最小化アプリケーションに対応することができる。我々の結果は、一般化境界は次元性の呪いに苦しめられず、正規化の同値性は正確であるという点で強い。副産物として、我々の正規化結果は、正規化定式化によって効率的に解けるワッサーシュタイン DRO モデルのクラスを大きく広げた。

Wasserstein distributionally robust optimization (DRO) has found success in operations research and machine learning applications as a powerful means to obtain solutions with favourable out-of-sample performances. Two compelling explanations for the success are the generalization bounds derived from Wasserstein DRO and the equivalency between Wasserstein DRO and the regularization scheme commonly applied in machine learning. Existing results on generalization bounds and the equivalency to regularization are largely limited to the setting where the Wasserstein ball is of a certain type and the decision criterion takes certain forms of an expected function. In this paper, we show that by focusing on Wasserstein DRO problems with affine decision rules, it is possible to obtain generalization bounds and the equivalency to regularization in a significantly broader setting where the Wasserstein ball can be of a general type and the decision criterion can be a general measure of risk, i.e., nonlinear in distributions. This allows for accommodating many important classification, regression, and risk minimization applications that have not been addressed to date using Wasserstein DRO. Our results are strong in that the generalization bounds do not suffer from the curse of dimensionality and the equivalency to regularization is exact. As a byproduct, our regularization results broaden considerably the class of Wasserstein DRO models that can be solved efficiently via regularization formulations.

翻訳日:2022-12-13 18:16:29 公開日:2022-12-12

# 安全課題に対するモデルフリー強化学習の評価

Evaluating Model-free Reinforcement Learning toward Safety-critical Tasks ( http://arxiv.org/abs/2212.05727v1 )

ライセンス: Link先を確認

Linrui Zhang and Qin Zhang and Li Shen and Bo Yuan and Xueqian Wang and Dacheng Tao

(参考訳) 安全性は、自律エージェントを含む多くの現実世界のアプリケーションで最初に提供される。安全クリティカルなタスクに焦点を絞った強化学習(RL)手法は多数存在するが、複雑な未知のダイナミクスの下で各決定ステップにおける安全性制約に準拠するアルゴリズムの高品質な評価は依然として不足している。本稿では,この領域における先行研究を,状態的に安全なRLの観点から再考し,それぞれプロジェクションベース,リカバリベース,最適化ベースのアプローチとして分類する。さらに,安全最適化と安全予測を組み合わせた共同手法であるUnrolling Safety Layer (USL)を提案する。この新手法はディープアンロールアーキテクチャを通じて明示的に厳しい制約を強制し、報酬改善と制約満足度の間のトレードオフをナビゲートする構造上の利点を享受する。この領域のさらなる研究を容易にするために、我々は、関連するアルゴリズムを統一パイプラインで再現し、それらをSafeRL-Kitに組み込む。次に、ロボット制御から自律運転までの6つのベンチマークで、関連するアルゴリズムの比較研究を行う。実験結果から,タスク依存の手工法を使わずにゼロコスト・リターン政策を学習する際の適用性と堅牢性について考察した。プロジェクトページは https://sites.google.com/view/saferlkit で閲覧できる。

Safety comes first in many real-world applications involving autonomous agents. Despite a large number of reinforcement learning (RL) methods focusing on safety-critical tasks, there is still a lack of high-quality evaluation of those algorithms that adhere to safety constraints at each decision step under complex and unknown dynamics. In this paper, we revisit prior work in this scope from the perspective of state-wise safe RL and categorize them as projection-based, recovery-based, and optimization-based approaches, respectively. Furthermore, we propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection. This novel technique explicitly enforces hard constraints via the deep unrolling architecture and enjoys structural advantages in navigating the trade-off between reward improvement and constraint satisfaction. To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit, a toolkit that provides off-the-shelf interfaces and evaluation utilities for safety-critical tasks. We then perform a comparative study of the involved algorithms on six benchmarks ranging from robotic control to autonomous driving. The empirical results provide insight into their applicability and robustness in learning zero-cost-return policies without task-dependent handcrafting. The project page is available at https://sites.google.com/view/saferlkit.

翻訳日:2022-12-13 18:16:03 公開日:2022-12-12

# クリックスルー速度予測における埋め込みの適応的低精度訓練

Adaptive Low-Precision Training for Embeddings in Click-Through Rate Prediction ( http://arxiv.org/abs/2212.05735v1 )

ライセンス: Link先を確認

Shiwei Li, Huifeng Guo, Lu Hou, Wei Zhang, Xing Tang, Ruiming Tang, Rui Zhang, Ruixuan Li

(参考訳) 埋め込みテーブルは通常、クリックスルーレート(CTR)予測モデルにおいて巨大である。 CTRモデルを効率よく、経済的に訓練・展開するには、トレーニング段階での埋め込みテーブルを圧縮する必要がある。この目的のために,新しい量子化訓練パラダイムを定式化し,学習段階からの埋め込みを圧縮し,低精度訓練(lpt)と呼ぶ。また,その収束に関する理論的解析を行う。その結果, 確率的重み量子化は, LPTにおける決定論的重み量子化よりも収束速度が速く, 収束誤差も小さいことがわかった。さらに, 精度劣化を軽減するために, 勾配降下を通じてステップサイズ(すなわち量子化分解能)を学習する適応型低精度トレーニング(alpt)を提案する。 2つの実世界のデータセットの実験により、ALPTが予測精度、特に極低ビット幅で著しく向上できることが確認された。 CTRモデルでは,予測精度を犠牲にすることなく8ビット埋め込みのトレーニングに成功した。 ALPTのコードは公開されている。

Embedding tables are usually huge in click-through rate (CTR) prediction models. To train and deploy the CTR models efficiently and economically, it is necessary to compress their embedding tables at the training stage. To this end, we formulate a novel quantization training paradigm to compress the embeddings from the training stage, termed low-precision training (LPT). Also, we provide theoretical analysis on its convergence. The results show that stochastic weight quantization has a faster convergence rate and a smaller convergence error than deterministic weight quantization in LPT. Further, to reduce the accuracy degradation, we propose adaptive low-precision training (ALPT) that learns the step size (i.e., the quantization resolution) through gradient descent. Experiments on two real-world datasets confirm our analysis and show that ALPT can significantly improve the prediction accuracy, especially at extremely low bit widths. For the first time in CTR models, we successfully train 8-bit embeddings without sacrificing prediction accuracy. The code of ALPT is publicly available.
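
The contrast between deterministic and stochastic weight quantization can be sketched as follows. This is a generic illustration of rounding to a fixed-resolution integer grid, not the ALPT implementation: the function name `quantize` and its parameters are assumptions, and the step size is held fixed here, whereas ALPT learns it through gradient descent.

```python
import math
import random

def quantize(w, step, bits=8, stochastic=True, rng=random):
    """Quantise a weight to a signed `bits`-bit grid with resolution `step`."""
    q_min, q_max = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    scaled = w / step
    if stochastic:
        low = math.floor(scaled)
        # Round up with probability equal to the fractional part, so the
        # expected value equals `scaled` (before clipping).
        q = low + (1 if rng.random() < scaled - low else 0)
    else:
        q = math.floor(scaled + 0.5)  # round to nearest
    q = max(q_min, min(q_max, q))  # clip to the representable range
    return q * step

# Usage with a hypothetical weight and step size.
rng = random.Random(0)
samples = [quantize(0.26, 0.1, rng=rng) for _ in range(20000)]
mean_stochastic = sum(samples) / len(samples)
nearest = quantize(0.26, 0.1, stochastic=False)
```

Stochastic rounding is unbiased on average, while round-to-nearest always maps the same weight to the same grid point; this difference in bias is the kind of property a convergence comparison between the two schemes would turn on.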

翻訳日:2022-12-13 18:15:41 公開日:2022-12-12

# シャッフルとバッチクリッピングによるDP-SGDの一般化

Generalizing DP-SGD with Shuffling and Batching Clipping ( http://arxiv.org/abs/2212.05796v1 )

ライセンス: Link先を確認

Marten van Dijk, Phuong Ha Nguyen, Toan N. Nguyen and Lam M. Nguyen

(参考訳) 古典的な差分プライベートDP-SGDは、ランダムサブサンプリングによる個々のクリッピングを実装し、ミニバッチSGDアプローチを強制する。 DP-SGDを超越した一般的な差分プライベートアルゴリズムフレームワークを提供し、バッチクリッピングと組み合わせて任意の一階最適化(古典的なSGDや運動量に基づくSGDアプローチなど)を可能とし、クリッピングされた勾配を(個々のクリッピングで行うように)合計するのではなく、計算された勾配の集約をクリッピングする。このフレームワークはまた、シャッフルのようなランダムなサブサンプリング以外のサンプリング技術も認めている。我々のDP分析は$f$-DPアプローチに従い、グループプライバシの分析を可能にする新しい証明手法を導入する。特に、$E$ エポックの作業とサイズ $g$ のグループに対して、シャッフル付きバッチクリッピング用の $\sqrt{g E}$ のDP依存性を示す。これは、以前予想されていた$g$への線形依存性よりもはるかに優れており、一般に $\sqrt{E}$ よりはるかに大きい、$E$ エポック内のラウンド総数に対する従来予想の平方根依存性よりもはるかに優れている。

Classical differentially private DP-SGD implements individual clipping with random subsampling, which forces a mini-batch SGD approach. We provide a general differentially private algorithmic framework that goes beyond DP-SGD and allows any first-order optimizer (e.g., classical SGD and momentum-based SGD approaches) in combination with batch clipping, which clips an aggregate of computed gradients rather than summing clipped gradients (as is done in individual clipping). The framework also admits sampling techniques beyond random subsampling such as shuffling. Our DP analysis follows the $f$-DP approach and introduces a new proof technique which allows us to also analyse group privacy. In particular, for $E$ epochs of work and groups of size $g$, we show a $\sqrt{g E}$ DP dependency for batch clipping with shuffling. This is much better than the previously anticipated linear dependency in $g$ and is much better than the previously expected square root dependency on the total number of rounds within $E$ epochs which is generally much more than $\sqrt{E}$.
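
The difference between individual clipping and batch clipping can be sketched in a few lines. This is an illustration under assumptions, not the paper's framework: the function names are made up, the aggregate is taken to be a plain sum, and the Gaussian noise that a differentially private method adds after clipping is omitted.

```python
import math

def clip(v, c):
    """Scale vector v so that its L2 norm is at most c."""
    norm = math.sqrt(sum(x * x for x in v))
    scale = min(1.0, c / norm) if norm > 0 else 1.0
    return [x * scale for x in v]

def individual_clipping(grads, c):
    """DP-SGD style: clip every per-example gradient, then sum."""
    clipped = [clip(g, c) for g in grads]
    return [sum(col) for col in zip(*clipped)]

def batch_clipping(grads, c):
    """Clip the aggregate (assumed here to be the sum) of the gradients once."""
    total = [sum(col) for col in zip(*grads)]
    return clip(total, c)

# Two hypothetical per-example gradients and clipping constant c = 1.
grads = [[3.0, 4.0], [0.0, 1.0]]
summed_clipped = individual_clipping(grads, 1.0)
clipped_sum = batch_clipping(grads, 1.0)
```

Individual clipping needs every per-example gradient separately, which ties it to mini-batch SGD; batch clipping only touches the aggregate, which is why it can wrap other first-order update rules.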

翻訳日:2022-12-13 18:15:24 公開日:2022-12-12

# HACA3:マルチサイトMR画像調和のための統一的アプローチ

HACA3: A Unified Approach for Multi-site MR Image Harmonization ( http://arxiv.org/abs/2212.06065v1 )

ライセンス: Link先を確認

Lianrui Zuo, Yihao Liu, Yuan Xue, Blake E. Dewey, Murat Bilgel, Ellen M. Mowry, Scott D. Newsome, Peter A. Calabresi, Susan M. Resnick, Jerry L. Prince, Aaron Carass

(参考訳) 標準化の欠如は磁気共鳴(MR)イメージングにおいて顕著な問題である。これはしばしば、ハードウェアと取得パラメータの違いによる望ましくないコントラスト変動を引き起こす。近年,非所望のコントラスト変動を補うために,画像合成によるMRハーモニゼーションが提案されている。既存の方法の成功にもかかわらず、私たちは3つの大きな改善ができると主張している。第一に、既存のほとんどの手法は、同一対象のマルチコントラストMR画像が同じ解剖学を共有するという仮定に基づいて構築されている。異なるMRコントラストは異なる解剖学的特徴を強調するために特別であるため、この仮定は疑わしい。第二に、これらの方法はトレーニングのために固定されたMRコントラスト(例えば、T1強調画像とT2強調画像の両方が利用可能でなければならない)を必要とすることが多い。第3に、既存の手法は一般的にイメージングアーティファクトに敏感である。本稿では,これらの3つの問題に対処するために,注意に基づくコントラスト,解剖,アーティファクト意識(HACA3)を用いた調和方式を提案する。まず,HACA3がMRコントラスト間の解剖学的差異を尊重できるようにする解剖学的融合モジュールを提案する。 HACA3はまた、イメージングアーティファクトにも堅牢であり、MRコントラストの任意のセットにトレーニングおよび適用することができる。実験により、HACA3は複数の画像品質指標の下で最先端のパフォーマンスを達成することが示された。また,フィールド強度の異なる21のサイト,スキャナプラットフォーム,取得プロトコルから取得したMRデータセットを用いて,下流タスクにおけるHACA3の適用性を示す。

The lack of standardization is a prominent issue in magnetic resonance (MR) imaging. This often causes undesired contrast variations due to differences in hardware and acquisition parameters. In recent years, MR harmonization using image synthesis with disentanglement has been proposed to compensate for the undesired contrast variations. Despite the success of existing methods, we argue that three major improvements can be made. First, most existing methods are built upon the assumption that multi-contrast MR images of the same subject share the same anatomy. This assumption is questionable since different MR contrasts are specialized to highlight different anatomical features. Second, these methods often require a fixed set of MR contrasts for training (e.g., both T1-weighted and T2-weighted images must be available), which limits their applicability. Third, existing methods generally are sensitive to imaging artifacts. In this paper, we present a novel approach, Harmonization with Attention-based Contrast, Anatomy, and Artifact Awareness (HACA3), to address these three issues. We first propose an anatomy fusion module that enables HACA3 to respect the anatomical differences between MR contrasts. HACA3 is also robust to imaging artifacts and can be trained and applied to any set of MR contrasts. Experiments show that HACA3 achieves state-of-the-art performance under multiple image quality metrics. We also demonstrate the applicability of HACA3 on downstream tasks with diverse MR datasets acquired from 21 sites with different field strengths, scanner platforms, and acquisition protocols.

翻訳日:2022-12-13 18:08:59 公開日:2022-12-12

# マルチプレイヤー不完全情報ゲームにおけるベイジアン対戦モデル

Bayesian Opponent Modeling in Multiplayer Imperfect-Information Games ( http://arxiv.org/abs/2212.06027v1 )

ライセンス: Link先を確認

Sam Ganzfried, Kevin A. Wang, Max Chiswick

(参考訳) 多くの現実世界の設定エージェントは、様々な戦略を利用できる複数の反対エージェントと戦略的に相互作用する。このような設定のためにエージェントを設計する標準的なアプローチは、nash均衡のような関連するゲーム理論的な解の概念を計算または近似し、所定の戦略に従うことである。しかし、このような戦略は、相手のプレーの観察を無視し、悪用できる欠点を示す可能性がある。本稿では,マルチプレイヤー不完全情報ゲームにおいて,繰り返しのインタラクションを通じて対戦者のプレーを観察する対戦者モデリング手法を提案する。我々は,3人プレイのクーンポーカーにおいて,多種多様な実敵と正確なナッシュ均衡戦略に対して実験を行い,このアルゴリズムがナッシュ均衡戦略を含む全てのエージェントを著しく上回ることを示す。

In many real-world settings, agents engage in strategic interactions with multiple opposing agents who can employ a wide variety of strategies. The standard approach for designing agents for such settings is to compute or approximate a relevant game-theoretic solution concept such as Nash equilibrium and then follow the prescribed strategy. However, such a strategy ignores any observations of opponents' play, which may indicate shortcomings that can be exploited. We present an approach for opponent modeling in multiplayer imperfect-information games where we collect observations of opponents' play through repeated interactions. We run experiments against a wide variety of real opponents and exact Nash equilibrium strategies in three-player Kuhn poker and show that our algorithm significantly outperforms all of the agents, including the exact Nash equilibrium strategies.
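
One common way to turn observations of an opponent's play into a Bayesian model is a Dirichlet prior over action probabilities at a decision point, updated with observed action counts. The sketch below shows only that update; whether the paper uses this exact prior is an assumption, and the function name and the numbers are hypothetical.

```python
def dirichlet_posterior_mean(prior_counts, observed_counts):
    """Posterior mean of an opponent's action probabilities at one decision point.

    prior_counts: Dirichlet pseudo-counts, one per action (assumed prior)
    observed_counts: how often each action has been observed so far
    """
    posterior = [p + o for p, o in zip(prior_counts, observed_counts)]
    total = sum(posterior)
    return [c / total for c in posterior]

# Uniform prior over (bet, check); the opponent was seen betting 3 times.
estimate = dirichlet_posterior_mean([1.0, 1.0], [3, 0])
```

An agent could then best-respond to the estimated frequencies instead of playing a fixed equilibrium strategy, which is the general idea behind exploiting observed shortcomings.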

翻訳日:2022-12-13 18:08:34 公開日:2022-12-12

# 近臨界レーザープラズマからの電子スペクトルに対する可逆ニューラルネットワークの受容速度:比較

Acceptance Rates of Invertible Neural Networks on Electron Spectra from Near-Critical Laser-Plasmas: A Comparison ( http://arxiv.org/abs/2212.05836v1 )

ライセンス: Link先を確認

Thomas Miethlinger, Nico Hoffmann, Thomas Kluge

(参考訳) 超高強度・超短レーザーパルスと近臨界および過臨界プラズマの相互作用は直接観測できないが、実験的にアクセス可能な量(観測可能量)は、しばしば、基礎となるプラズマダイナミクスに関する情報を間接的に与えるだけである。さらに、観測可能量が提供する情報は不完全であり、逆問題は非常に曖昧である。したがって、プラズマ力学と実験パラメータを推論するためには、観測が与えられたときのパラメータの完全な分布を考慮する必要があり、モデルが柔軟であり、前方プロセスで失われた情報を考慮しなければならない。 Invertible Neural Networks (INNs) は、順方向と逆方向のプロセスの両方を効率的にモデル化し、特定の測定値に対する完全な条件付き事後分布を提供するように設計されている。本研究では,合成電子スペクトルのINNと標準統計手法のベンチマークを行う。まず,受入率について実験を行い,受入率が最大10倍に向上することを示す。さらに,この受入率の増加は,INNのスピードアップを同じ程度に向上させることを示す。最後に, INNを活用し,高い精度を維持しつつ低ランタイムを約束する複合アルゴリズムを提案する。

While the interaction of ultra-intense ultra-short laser pulses with near- and overcritical plasmas cannot be directly observed, experimentally accessible quantities (observables) often only indirectly give information about the underlying plasma dynamics. Furthermore, the information provided by observables is incomplete, making the inverse problem highly ambiguous. Therefore, in order to infer plasma dynamics as well as experimental parameters, the full distribution over parameters given an observation needs to be considered, requiring that models are flexible and account for the information lost in the forward process. Invertible Neural Networks (INNs) have been designed to efficiently model both the forward and inverse process, providing the full conditional posterior given a specific measurement. In this work, we benchmark INNs and standard statistical methods on synthetic electron spectra. First, we provide experimental results with respect to the acceptance rate, where our results show increases in acceptance rates up to a factor of 10. Additionally, we show that this increased acceptance rate also results in an increased speed-up for INNs to the same extent. Lastly, we propose a composite algorithm that utilizes INNs and promises low runtimes while preserving high accuracy.

翻訳日:2022-12-13 18:08:21 公開日:2022-12-12

# CTT-Net:白内障術後視力予測用多視点クロストーケン変換器

CTT-Net: A Multi-view Cross-token Transformer for Cataract Postoperative Visual Acuity Prediction ( http://arxiv.org/abs/2212.05794v1 )

ライセンス: Link先を確認

Jinhong Wang, Jingwen Wang, Tingting Chen, Wenhao Zheng, Zhe Xu, Xingdi Wu, Wen Xu, Haochao Ying, Danny Chen, and Jian Wu

(参考訳) 手術は白内障の視力障害(VA)に対して唯一有効な治療である。臨床的には、白内障手術の必要性を評価するために、多視点光コヒーレンス断層撮影(OCT)画像を分析して術後VAを正確に予測する必要がある。残念ながら, 術後VAの判定は, 医療専門家にとって依然として困難である。近年,この問題の深層学習手法が開発されている。有効ではあるが、マルチビューのOCT画像間の潜在的な関係を効率的に探索しない、臨床事前知識(例えば、術前のVA値)の重要な役割を無視する、参照を欠いた回帰ベースのメトリクスのみを使用するなど、いくつかの問題に直面している。本稿では,多視点OCT画像と術前VAの両方を分析し,術後VA予測のための新しいクロストークントランスフォーマーネットワーク(CTT-Net)を提案する。 OCT画像の多視点特徴を効果的に融合するために,冗長・不要な注意フローを制限できるクロストークン注意手法を開発した。さらに,術前VA値を用いて,術後VA予測のための情報を提供し,ビュー間の融合を容易にする。さらに,モデル性能を向上させるために補助的な分類損失を設計し,回帰メトリクスのみを用いることによる制限を回避し,VAの回復を十分に評価する。 CTT-Netを評価するために,共同病院から収集した多視点OCT画像データセットを構築した。広範な実験のセットは、さまざまなメトリクスの既存の方法と比較して、モデルの有効性を検証するものです。コードは、https://github.com/wjh892521292/Cataract OCTで入手できる。

Surgery is the only viable treatment for cataract patients with visual acuity (VA) impairment. Clinically, to assess the necessity of cataract surgery, accurately predicting postoperative VA before surgery by analyzing multi-view optical coherence tomography (OCT) images is crucially needed. Unfortunately, due to complicated fundus conditions, determining postoperative VA remains difficult for medical experts. Deep learning methods for this problem were developed in recent years. Although effective, these methods still face several issues, such as not efficiently exploring potential relations between multi-view OCT images, neglecting the key role of clinical prior knowledge (e.g., preoperative VA value), and using only regression-based metrics, which lack a reference. In this paper, we propose a novel Cross-token Transformer Network (CTT-Net) for postoperative VA prediction by analyzing both the multi-view OCT images and preoperative VA. To effectively fuse multi-view features of OCT images, we develop cross-token attention that can restrict redundant/unnecessary attention flow. Further, we utilize the preoperative VA value to provide more information for postoperative VA prediction and facilitate fusion between views. Moreover, we design an auxiliary classification loss to improve model performance and assess VA recovery more sufficiently, avoiding the limitation of using only regression metrics. To evaluate CTT-Net, we build a multi-view OCT image dataset collected from our collaborative hospital. A set of extensive experiments validates the effectiveness of our model compared to existing methods in various metrics. Code is available at: https://github.com/wjh892521292/Cataract OCT.

翻訳日:2022-12-13 17:59:16 公開日:2022-12-12

# Z-SSMNet:bpMRIにおける前立腺癌の検出と診断のためのゾナル・アウェア・セルフスーパービジョンメッシュネットワーク

Z-SSMNet: A Zonal-aware Self-Supervised Mesh Network for Prostate Cancer Detection and Diagnosis in bpMRI ( http://arxiv.org/abs/2212.05808v1 )

ライセンス: Link先を確認

Yuan Yuan, Euijoon Ahn, Dagan Feng, Mohamad Khadra, Jinman Kim

(参考訳) 前立腺癌(PCa)は、男性において最も多いがんの1つであり、世界中の多くの人々が臨床的に重要なPCa(csPCa)によって死亡している。マルチパラメトリックMRI (mpMRI) と比較して非侵襲的, 費用対効果が高く, より効率的であるbi-parametric MRI (bpMRI) におけるcsPCaの早期診断はPCaの精密なケアに寄与する。人工知能(AI)アルゴリズムの急速な増加は、csPCaの診断と理解に役立つ意思決定支援システムの提供において、前例のない改善を可能にしている。しかし、ディープラーニング技術に基づく既存の最先端AIアルゴリズムは、しばしば2D画像に制限され、3Dボリューム画像のスライス間相関を捉えることができない。 3D畳み込みニューラルネットワーク(CNN)の使用は、この制限を部分的に克服するが、画像の異方性に適応しないため、準最適なセマンティック表現や一般化の不足をもたらす。さらに、bpMRIのラベル付きデータの量の制限とラベル付けの難しさにより、既存のCNNは比較的小さなデータセット上に構築されており、性能が劣っている。上記の制約に対処するため,複数の2D,2.5D,3DのCNNを適応的に融合させてbpMRIにおける疎なスライス間情報と密なスライス内情報の表現を効果的にバランスさせる,Z-SSMNetを提案する。自己教師付き学習(SSL)技術がさらに導入され,非ラベルデータを用いてネットワークを事前学習し,一般化可能な画像特徴を学習する。さらに,csPCaの診断精度を向上させるために,ゾーン特異的ドメイン知識を理解するようネットワークを制約した。 PI-CAIチャレンジデータセットの実験により、bpMRIにおけるcsPCa検出および診断の性能が向上することを示す。

Prostate cancer (PCa) is one of the most prevalent cancers in men and many people around the world die from clinically significant PCa (csPCa). Early diagnosis of csPCa in bi-parametric MRI (bpMRI), which is non-invasive, cost-effective, and more efficient compared to multiparametric MRI (mpMRI), can contribute to precision care for PCa. The rapid rise in artificial intelligence (AI) algorithms is enabling unprecedented improvements in providing decision support systems that can aid in csPCa diagnosis and understanding. However, existing state-of-the-art AI algorithms, which are based on deep learning technology, are often limited to 2D images and fail to capture inter-slice correlations in 3D volumetric images. The use of 3D convolutional neural networks (CNNs) partly overcomes this limitation, but it does not adapt to the anisotropy of images, resulting in sub-optimal semantic representation and poor generalization. Furthermore, due to the limited amount of labelled bpMRI data and the difficulty of labelling, existing CNNs are built on relatively small datasets, leading to poor performance. To address the limitations identified above, we propose a new Zonal-aware Self-supervised Mesh Network (Z-SSMNet) that adaptively fuses multiple 2D, 2.5D and 3D CNNs to effectively balance representation for sparse inter-slice information and dense intra-slice information in bpMRI. A self-supervised learning (SSL) technique is further introduced to pre-train our network using unlabelled data to learn generalizable image features. Furthermore, we constrain our network to understand zonal-specific domain knowledge to improve the diagnosis precision of csPCa. Experiments on the PI-CAI Challenge dataset demonstrate that our proposed method achieves better performance for csPCa detection and diagnosis in bpMRI.

翻訳日:2022-12-13 17:58:49 公開日:2022-12-12

# KonX: クロスリゾリューション画像の品質評価

KonX: Cross-Resolution Image Quality Assessment ( http://arxiv.org/abs/2212.05813v1 )

ライセンス: Link先を確認

Oliver Wiedemann and Vlad Hosu and Shaolin Su and Dietmar Saupe

(参考訳) スケール不変性は多くのコンピュータビジョンサブフィールドにおいてオープンな問題である。例えば、オブジェクトラベルはスケールにわたって一定であり続けるべきですが、モデル予測は多くのケースでばらつきます。この問題は、プレゼンテーションの規模で基調ラベルが変化するタスクでは難しくなる。画質アセスメント(IQA)では、ダウンサンプリングはぼやけや圧縮アーティファクトなどの障害を弱め、主観的な研究で誘発される印象に正の影響を与える。したがって、知覚画像の品質を正確に予測するためには、クロスレゾリューションIQA法はモデル不整合によって引き起こされる分解能依存誤差と、基底真実における知覚ラベルシフトを考慮しなければならない。本報告では,この2つの問題を分離して検討する手法として,新しいクロスレゾリューションIQAデータベースであるKonXについて述べる。本稿は以下のとおりである。 1. KonX を用いて, 表示解像度の変化によるラベルシフトの実証的証拠を提供する。 2. 客観的IQA手法にはスケールバイアスがあり,予測性能が低下することを示す。 3. 従来のIQAモデルよりも性能が向上するマルチスケール・マルチカラムDNNアーキテクチャを提案する。そこで我々は,画像品質評価における新たな研究課題を提起し,解決する。

Scale-invariance is an open problem in many computer vision subfields. For example, object labels should remain constant across scales, yet model predictions diverge in many cases. This problem gets harder for tasks where the ground-truth labels change with the presentation scale. In image quality assessment (IQA), downsampling attenuates impairments, e.g., blurs or compression artifacts, which can positively affect the impression evoked in subjective studies. To accurately predict perceptual image quality, cross-resolution IQA methods must therefore account for resolution-dependent errors induced by model inadequacies as well as for the perceptual label shifts in the ground truth. We present the first study of its kind that disentangles and examines the two issues separately via KonX, a novel, carefully crafted cross-resolution IQA database. This paper contributes the following: 1. Through KonX, we provide empirical evidence of label shifts caused by changes in the presentation resolution. 2. We show that objective IQA methods have a scale bias, which reduces their predictive performance. 3. We propose a multi-scale and multi-column DNN architecture that improves performance over previous state-of-the-art IQA models for this task, including recent transformers. We thus both raise and address a novel research problem in image quality assessment.

翻訳日:2022-12-13 17:58:11 公開日:2022-12-12

# NFResNet:デブロアリングのためのマルチスケールおよびU字型ネットワーク

NFResNet: Multi-scale and U-shaped Networks for Deblurring ( http://arxiv.org/abs/2212.05909v1 )

ライセンス: Link先を確認

Tanish Mittal, Preyansh Agrawal, Esha Pahwa, Aarya Makwana

(参考訳) マルチスケールおよびU字型ネットワークは、デブロアリングを含む様々な画像復元問題に広く利用されている。幅広い応用を念頭に置いて,これらのアーキテクチャの比較と画像のデブロアリングに対する影響について述べる。また、NFResblockと呼ばれる新しいブロックも導入する。高速フーリエ変換層と一連の修正された非線形活性化自由ブロックからなる。これらのアーキテクチャと追加に基づき,NFResnetとNFResnet+を導入し,それぞれマルチスケールアーキテクチャとU-Netアーキテクチャを改良した。また、これらのアーキテクチャをトレーニングするために、Charbonnier Loss、Edge Loss、Frequency Reconstruction Lossという3つの異なる損失関数を使用します。本稿では,各成分のアブレーション研究とともに,Deep Video Deblurringデータセットに関する広範囲な実験を行った。提案手法は,ピーク信号対雑音比 (PSNR) と構造類似度指数 (SSIM) の値を大きく向上させる。

Multi-Scale and U-shaped Networks are widely used in various image restoration problems, including deblurring. Keeping in mind the wide range of applications, we present a comparison of these architectures and their effects on image deblurring. We also introduce a new block called NFResblock. It consists of a Fast Fourier Transformation layer and a series of modified Non-Linear Activation Free Blocks. Based on these architectures and additions, we introduce NFResnet and NFResnet+, which are modified multi-scale and U-Net architectures, respectively. We also use three different loss functions to train these architectures: Charbonnier Loss, Edge Loss, and Frequency Reconstruction Loss. Extensive experiments on the Deep Video Deblurring dataset, along with ablation studies for each component, have been presented in this paper. The proposed architectures achieve a considerable increase in Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM) value.
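
Of the three losses named above, the Charbonnier loss has a simple standard form, a smooth approximation of the L1 loss: the mean of sqrt((x - y)^2 + eps^2). The sketch below is a plain-Python illustration on flat lists of pixel values. The default `eps` is an assumption, since the abstract gives neither the constant nor the relative weighting of the three losses.

```python
import math


def charbonnier_loss(pred, target, eps=1e-3):
    """Mean Charbonnier penalty sqrt((p - t)^2 + eps^2) over paired values.

    `eps` is an assumed smoothing constant, not a value from the paper.
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equally long")
    total = sum(math.sqrt((p - t) ** 2 + eps ** 2) for p, t in zip(pred, target))
    return total / len(pred)


# Toy example: a restored pixel row compared with the sharp reference row.
restored = [0.10, 0.50, 0.90]
sharp = [0.10, 0.40, 1.00]
print(charbonnier_loss(restored, sharp))
```

Near zero error the penalty is smooth, unlike plain L1, and for large errors it grows linearly, so it is less sensitive to outliers than L2.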

翻訳日:2022-12-13 17:57:34 公開日:2022-12-12

# 不均質効果を評価するハイブリッド打ち切り分位点回帰フォレスト

Hybrid Censored Quantile Regression Forest to Assess the Heterogeneous Effects ( http://arxiv.org/abs/2212.05672v1 )

ライセンス: Link先を確認

Huichen Zhu, Yifei Sun, Ying Wei

(参考訳) 多くの応用において、検閲された応答変数に対する不均一な処理効果は第一の関心事であり、異なる量的効果(例えば中央値)を評価することは自然である。多数の潜在的な効果修飾剤、治療効果の未知の構造、および右検閲の存在は重大な課題をもたらす。本稿では,高次元変数による不均質効果を評価するために,ハイブリッド・セザード・クォンタイル回帰林(hcqrf)と呼ばれるハイブリッド・フォレスト・アプローチを開発した。ハイブリッド推定手法は、ランダム林と検閲された分位量回帰の利点を生かしている。高次元効果関数を扱うために,森林由来の検閲を処理し,適応的に最寄りの重みを推定する2重重重み付き推定手法を提案する。本稿では,治療効果関数に対する変数の影響を測定するために,変数重要度分解を提案する。広範なシミュレーション研究によりhcqrfの有効性と安定性が示された。また,シミュレーションの結果から,変数の重要度分解の有効性が示唆された。大腸癌の臨床治験にHCQRFを適用した。治療効果と有意義な変数重要度を洞察的に推定する。変数の重要性の結果として、分解の必要性も確認される。

In many applications, heterogeneous treatment effects on a censored response variable are of primary interest, and it is natural to evaluate the effects at different quantiles (e.g., median). The large number of potential effect modifiers, the unknown structure of the treatment effects, and the presence of right censoring pose significant challenges. In this paper, we develop a hybrid forest approach called Hybrid Censored Quantile Regression Forest (HCQRF) to assess the heterogeneous effects varying with high-dimensional variables. The hybrid estimation approach takes advantage of the random forests and the censored quantile regression. We propose a doubly-weighted estimation procedure that consists of a redistribution-of-mass weight to handle censoring and an adaptive nearest neighbor weight derived from the forest to handle high-dimensional effect functions. We propose a variable importance decomposition to measure the impact of a variable on the treatment effect function. Extensive simulation studies demonstrate the efficacy and stability of HCQRF. The result of the simulation study also convinces us of the effectiveness of the variable importance decomposition. We apply HCQRF to a clinical trial of colorectal cancer. We achieve insightful estimations of the treatment effect and meaningful variable importance results. The result of the variable importance also confirms the necessity of the decomposition.

翻訳日:2022-12-13 17:51:09 公開日:2022-12-12

# Bottleneck特徴を用いたテキスト注釈のない直接音声音声翻訳

Direct Speech-to-speech Translation without Textual Annotation using Bottleneck Features ( http://arxiv.org/abs/2212.05805v1 )

ライセンス: Link先を確認

Junhui Zhang, Junjie Pan, Xiang Yin, Zejun Ma

(参考訳) 音声から音声への翻訳は、異なる言語間での発話を直接翻訳し、同時解釈のようなタスクにおいて大きな可能性を持つ。 State-of-artモデルは、通常、音素シーケンス予測のための補助モジュールを含み、トレーニングデータセットのテキストアノテーションを必要とする。テキストの注釈や内容情報なしで学習できる音声から音声への直接翻訳モデルを提案する。モデルに補助音素予測タスクを導入する代わりに,システムの翻訳性能を保証するために,モデルの中間学習目標としてボトルネック機能を使用することを提案する。 Mandarin-Cantonese音声翻訳の実験は,提案手法の有効性を実証し,その性能は翻訳品質と合成品質の点でカスケードシステムと一致させることができる。

Speech-to-speech translation directly translates a speech utterance in one language into another, and has great potential in tasks such as simultaneous interpretation. State-of-the-art models usually contain an auxiliary module for phoneme sequence prediction, and this requires textual annotation of the training dataset. We propose a direct speech-to-speech translation model which can be trained without any textual annotation or content information. Instead of introducing an auxiliary phoneme prediction task in the model, we propose to use bottleneck features as intermediate training objectives for our model to ensure the translation performance of the system. Experiments on Mandarin-Cantonese speech translation demonstrate the feasibility of the proposed approach, and the performance can match a cascaded system in terms of translation and synthesis quality.

翻訳日:2022-12-13 17:50:50 公開日:2022-12-12

# Androidアプリケーションのための事前学習BERTモデル

A Pre-Trained BERT Model for Android Applications ( http://arxiv.org/abs/2212.05976v1 )

ライセンス: Link先を確認

Tiezhu Sun (1), Kevin Allix (1), Kisub Kim (2), Xin Zhou (2), Dongsun Kim (3), David Lo (2), Tegawendé F. Bissyandé (1) and Jacques Klein (1) ((1) University of Luxembourg, (2) Singapore Management University, (3) Kyungpook National University)

(参考訳) 機械学習(ML)のおかげで、ますます多くのソフトウェアエンジニアリングタスクの自動化が可能になっている。 MLのソフトウェアアーティファクトへの適用における基本的な構成要素の1つは、これらのアーティファクト(例えば、ソースコードや実行可能なコード)を学習に適した形式に表現することである。多くの研究は表現学習を活用し、ML自体に適切な表現を自動設計する仕事を委譲している。しかし、Androidの問題の文脈では、既存のモデルは粗い粒度のアプリケーションレベル(apk2vecなど)に制限されるか、特定の下流タスク(smali2vecなど)で実行される。私たちの研究は、これらの2つの制限を緩和するために、バイトコードの効率的でタスクに依存しない、きめ細かい普遍的な表現を調査する新しい研究の一部です。このような表現は、様々な低レベル下流タスク(例えば、クラスレベルで)に関連する情報をキャプチャすることを目的としている。我々は自然言語処理の分野に触発され、普遍表現の問題は、文に関する抽象的な意味情報を様々なタスクで再利用することを目的として、BERTのようなユニバーサル言語モデルを構築することで解決された。我々は,Androidアプリケーションで使用される主要なバイナリフォーマットであるDEXバイトコードのチャンクを表現するために,BERTライクな言語モデルであるDexBERTを提案する。 DexBERT が DEX 言語をモデル化できるかどうかを実証的に評価し、2 つのクラスレベルのソフトウェアエンジニアリングタスクでモデルの適合性を評価する。また、サイズが大きく異なるアプリへのキャタリングの問題に対処する戦略を実験し、その手法を用いて与えられたタスクに関連する情報を調査する一例を示した。

The automation of an increasingly large number of software engineering tasks is becoming possible thanks to Machine Learning (ML). One foundational building block in the application of ML to software artifacts is the representation of these artifacts (e.g., source code or executable code) into a form that is suitable for learning. Many studies have leveraged representation learning, delegating to ML itself the job of automatically devising suitable representations. Yet, in the context of Android problems, existing models are either limited to coarse-grained whole-app level (e.g., apk2vec) or conducted for one specific downstream task (e.g., smali2vec). Our work is part of a new line of research that investigates effective, task-agnostic, and fine-grained universal representations of bytecode to mitigate both of these two limitations. Such representations aim to capture information relevant to various low-level downstream tasks (e.g., at the class-level). We are inspired by the field of Natural Language Processing, where the problem of universal representation was addressed by building Universal Language Models, such as BERT, whose goal is to capture abstract semantic information about sentences, in a way that is reusable for a variety of tasks. We propose DexBERT, a BERT-like Language Model dedicated to representing chunks of DEX bytecode, the main binary format used in Android applications. We empirically assess whether DexBERT is able to model the DEX language and evaluate the suitability of our model in two distinct class-level software engineering tasks: Malicious Code Localization and Defect Prediction. We also experiment with strategies to deal with the problem of catering to apps having vastly different sizes, and we demonstrate one example of using our technique to investigate what information is relevant to a given task.

翻訳日:2022-12-13 17:40:18 公開日:2022-12-12

# 評価者を評価するのは誰か? AIに基づく攻撃コードジェネレータの自動評価基準について

Who Evaluates the Evaluators? On Automatic Metrics for Assessing AI-based Offensive Code Generators ( http://arxiv.org/abs/2212.06008v1 )

ライセンス: Link先を確認

Cristina Improta, Pietro Liguori, Roberto Natella, Bojan Cukic, and Domenico Cotroneo

(参考訳) AIベースのコードジェネレータは、ディープニューラルネットワーク(Neural Machine Translation, NMT)を使用して、自然言語による記述から始まるプログラムを自動記述する新しいソリューションである。特にコードジェネレータは、概念実証攻撃を生成することによって倫理的ハッキングや攻撃的なセキュリティテストに使用されている。残念ながら、コードジェネレータの評価にはいくつかの問題がある。現在のプラクティスは自動メトリクスを使用しており、それによって生成されたコードのテキスト的類似性を接地真実参照で計算する。しかし、どのメトリクスを使うべきか、どのメトリクスが特定のコンテキストに最も適しているかは明らかではない。この実用的な経験報告は、攻撃的なコードジェネレータの出力類似度を大量に分析する。攻撃的アセンブリとPythonコードを含む2つのデータセットを英語で記述した2つのNMTモデルに適用した。自動測定値からの見積もりを人的評価と比較し,その強みと限界に関する実践的洞察を提供する。

AI-based code generators are an emerging solution for automatically writing programs starting from descriptions in natural language, by using deep neural networks (Neural Machine Translation, NMT). In particular, code generators have been used for ethical hacking and offensive security testing by generating proof-of-concept attacks. Unfortunately, the evaluation of code generators still faces several issues. The current practice uses automatic metrics, which compute the textual similarity of generated code with ground-truth references. However, it is not clear what metric to use, and which metric is most suitable for specific contexts. This practical experience report analyzes a large set of output similarity metrics on offensive code generators. We apply the metrics on two state-of-the-art NMT models using two datasets containing offensive assembly and Python code with their descriptions in the English language. We compare the estimates from the automatic metrics with human evaluation and provide practical insights into their strengths and limitations.
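
To make "textual similarity of generated code with ground-truth references" concrete, the sketch below computes two simple output-similarity scores using only the Python standard library: exact match and a token-level similarity ratio from `difflib.SequenceMatcher`. These are illustrative stand-ins and are not claimed to be the metrics analyzed in the paper. The function names and the assembly snippets are hypothetical.

```python
from difflib import SequenceMatcher


def exact_match(generated, reference):
    """True if the generated code equals the reference after trimming whitespace."""
    return generated.strip() == reference.strip()


def token_similarity(generated, reference):
    """Similarity in [0, 1] over whitespace-separated tokens (2 * matches / total)."""
    return SequenceMatcher(None, generated.split(), reference.split()).ratio()


# Hypothetical model output and ground-truth reference.
reference = "xor eax, eax"
generated = "xor ebx, ebx"
print(exact_match(generated, reference))
print(token_similarity(generated, reference))
```

The example also shows why such scores need validation against human evaluation: two snippets can share most tokens and still behave differently, or differ textually and be functionally equivalent.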

翻訳日:2022-12-13 17:39:47 公開日:2022-12-12

# テンソル分解による融合グラフ再構成による時空間交通モデリング

Spatial-temporal traffic modeling with a fusion graph reconstructed by tensor decomposition ( http://arxiv.org/abs/2212.05653v1 )

ライセンス: Link先を確認

Qin Li, Xuan Yang, Yong Wang, Yuankai Wu, Deqiang He

(参考訳) 正確な時空間交通流予測は、交通管理者が制御手段とドライバーが最適な経路を選択するのを助けるために不可欠である。近年,グラフ畳み込みネットワーク (GCN) は空間的時間的依存関係を捕捉する強力な能力のため,交通流予測に広く利用されている。時空間グラフ隣接行列の設計はGCNの成功の鍵であり、まだ未解決の問題である。本稿では, テンソル分解による連接行列の再構成を提案し, 交通流予測手法を提案する。まず,空間-時間融合グラフ隣接行列を3方向隣接テンソルに再構成する。次に,タッカー分解による隣接テンソルの再構成を行い,より情報的かつグローバルな時空間依存性を符号化した。最後に、局所化された空間-時間相関学習のための空間-時間同期グラフ畳み込みモジュールと、グローバル相関学習のための拡張畳み込みモジュールとを組み立てて、道路網の包括的空間-時間依存性を集約学習する。 4つのオープンアクセスデータセットによる実験結果から,提案モデルが予測性能と計算コストにおいて最先端の手法より優れていることが示された。

Accurate spatial-temporal traffic flow forecasting is essential for helping traffic managers to take control measures and drivers to choose the optimal travel routes. Recently, graph convolutional networks (GCNs) have been widely used in traffic flow prediction owing to their powerful ability to capture spatial-temporal dependencies. The design of the spatial-temporal graph adjacency matrix is key to the success of GCNs, and it is still an open question. This paper proposes reconstructing the binary adjacency matrix via tensor decomposition and builds a traffic flow forecasting method on the reconstructed graph. First, we reformulate the spatial-temporal fusion graph adjacency matrix into a three-way adjacency tensor. Then, we reconstruct the adjacency tensor via Tucker decomposition, wherein more informative and global spatial-temporal dependencies are encoded. Finally, a Spatial-temporal Synchronous Graph Convolutional module for localized spatial-temporal correlations learning and a Dilated Convolution module for global correlations learning are assembled to aggregate and learn the comprehensive spatial-temporal dependencies of the road network. Experimental results on four open-access datasets demonstrate that the proposed model outperforms state-of-the-art approaches in terms of the prediction performance and computational cost.

翻訳日:2022-12-13 17:33:07 公開日:2022-12-12

# 深部グラフ拡散情報マックスを用いたCOVID-19パンデミック時の人体移動モデリング

Human Mobility Modeling During the COVID-19 Pandemic via Deep Graph Diffusion Infomax ( http://arxiv.org/abs/2212.05707v1 )

ライセンス: Link先を確認

Yang Liu, Yu Rong, Zhuoning Guo, Nuo Chen, Tingyang Xu, Fugee Tsung, Jia Li

(参考訳) 社会的集会制限などの非薬剤的介入(NPIs)は、人々の接触を減らすことで新型コロナウイルスの感染を遅らせる効果を示している。政策立案者を支援するために、複数の研究がまずマクロ指標(例えば1日平均移動距離)を介して人間の移動をモデル化し、NPIの有効性について研究した。本研究は,モビリティ・モデリングに焦点をあて,ミクロの観点から,新型コロナウイルスの感染者が訪れる場所を予測することを目的としている。 NPIは一般的に経済的・社会的損失を引き起こすため、このようなミクロな視点予測は政府にとって、それらを設計し評価する上で有用である。しかし、現実の状況では、厳格なプライバシーデータ保護規則は厳しいデータスパース性の問題(すなわち、症例や位置情報が限られること)を引き起こす。これらの課題に対処するため、ミクロ視点のモビリティモデリングを、幾何グラフを条件として、拡散と位置の関連性スコアを計算する問題として定式化する。本研究では,DGDI(Deep Graph Diffusion Infomax)というモデルを提案する。このモデルでは,幾何グラフ,拡散集合,位置集合などの変数を共同でモデル化する。COVID-19予測の研究を容易にするために,COVID-19症例の幾何グラフと位置履歴を含む2つのベンチマークを提案する。 2つのベンチマークの大規模な実験により、DGDIは他の競合する手法よりも大幅に優れていることが示された。

Non-Pharmaceutical Interventions (NPIs), such as social gathering restrictions, have been shown to slow the transmission of COVID-19 by reducing contact between people. To support policy-makers, multiple studies have first modeled human mobility via macro indicators (e.g., average daily travel distance) and then studied the effectiveness of NPIs. In this work, we focus on mobility modeling and, from a micro perspective, aim to predict locations that will be visited by COVID-19 cases. Since NPIs generally cause economic and societal loss, such a micro perspective prediction benefits governments when they design and evaluate them. However, in real-world situations, strict privacy data protection regulations result in severe data sparsity problems (i.e., limited case and location information). To address these challenges, we formulate the micro perspective mobility modeling into computing the relevance score between a diffusion and a location, conditional on a geometric graph. We propose a model named Deep Graph Diffusion Infomax (DGDI), which jointly models variables including a geometric graph, a set of diffusions and a set of locations. To facilitate the research of COVID-19 prediction, we present two benchmarks that contain geometric graphs and location histories of COVID-19 cases. Extensive experiments on the two benchmarks show that DGDI significantly outperforms other competing methods.

翻訳日:2022-12-13 17:32:47 公開日:2022-12-12

# GT-CausIn:交通予測の新しい因果関係

GT-CausIn: a novel causal-based insight for traffic prediction ( http://arxiv.org/abs/2212.05782v1 )

ライセンス: Link先を確認

Ting Gao, Rodrigo Kappes Marques, Lei Yu

(参考訳) 交通予測は時空間予測の重要な応用である。様々な手法の中で、グラフニューラルネットワークはこれまでに最も有望な結果を達成しており、グラフノード間の関係を学習することが重要な課題となっている。しかし、これらの関係がノード-ノード方式で学習されると、改善空間は非常に制限される。この課題は(1)異なる局間の不明瞭な時間的依存関係、(2)ノードレベルを超えて変数を定義することの難しさ、(3)学習された関係を検証するための既製の方法に起因している。これらの課題に対処するために、トラフィック内の因果関係を発見するための正当なトラフィック因果変数を定義し、統計ツールやケース分析で慎重にチェックする。次に,事前学習された因果情報をグラフ拡散層と時間畳み込みネットワーク(tcn)層に統合した,因果的洞察(gt-causin)に基づくグラフ空間-時間的ネットワークモデルを提案する。 PEMS-BAYとMETR-LAの2つの実世界のトラフィックデータセットで実験を行い、GT-CausInは中期および長期予測において最先端モデルよりも大幅に優れていることを示した。

Traffic forecasting is an important application of spatiotemporal series prediction. Among different methods, graph neural networks have so far achieved the most promising results, so learning relations between graph nodes becomes a crucial task. However, the room for improvement is very limited when these relations are learned in a node-to-node manner. The challenge stems from (1) obscure temporal dependencies between different stations, (2) difficulties in defining variables beyond the node level, and (3) no ready-made method to validate the learned relations. To confront these challenges, we define legitimate traffic causal variables to discover the causal relation inside the traffic network, which is carefully checked with statistical tools and case analysis. We then present a novel model named Graph Spatial-Temporal Network Based on Causal Insight (GT-CausIn), where prior learned causal information is integrated with graph diffusion layers and temporal convolutional network (TCN) layers. Experiments are carried out on two real-world traffic datasets, PEMS-BAY and METR-LA, which show that GT-CausIn significantly outperforms the state-of-the-art models on mid-term and long-term prediction.

翻訳日:2022-12-13 17:32:23 公開日:2022-12-12

# 重み付けによる非定常データの学習

Learning on non-stationary data with re-weighting ( http://arxiv.org/abs/2212.05908v1 )

ライセンス: Link先を確認

Nishant Jain, Pradeep Shenoy

(参考訳) 多くの現実世界の学習シナリオは、時間の経過とともにデータ分布が徐々に変化するスローコンセプトドリフトの課題に直面している。そこで本研究では,予測精度を最適化するために,時間に敏感なデータ重み付けを学習する問題を提案する。本稿では,データの変化の複数の時間スケールをキャプチャできる時間的重み付け関数のクラスと,インスタンス固有の特性を提案する。両レベルの最適化基準と関連するメタ学習アルゴリズムを定式化し、これらの重みを学習する。特に,本定式化では,トレーニングインスタンスの関数として重みを出力する補助ネットワークを訓練し,インスタンス重みをコンパクトに表現する。 9年間にわたって拡散した39m画像の大規模な実世界データセット上で,時間的重み付け方式を検証する。我々の広範な実験は、データセットにおけるインスタンスベースの時間的重み付けの必要性を実証し、古典的なバッチ学習アプローチに対する大幅な改善を実現する。さらに,提案手法はストリーミング環境への一般化が容易であり,近年の連続学習手法に比べ,大幅な向上を示す。

Many real-world learning scenarios face the challenge of slow concept drift, where data distributions change gradually over time. In this setting, we pose the problem of learning temporally sensitive importance weights for training data, in order to optimize predictive accuracy. We propose a class of temporal reweighting functions that can capture multiple timescales of change in the data, as well as instance-specific characteristics. We formulate a bi-level optimization criterion, and an associated meta-learning algorithm, by which these weights can be learned. In particular, our formulation trains an auxiliary network to output weights as a function of training instances, thereby compactly representing the instance weights. We validate our temporal reweighting scheme on a large real-world dataset of 39M images spread over a 9-year period. Our extensive experiments demonstrate the necessity of instance-based temporal reweighting in the dataset, and achieve significant improvements over classical batch-learning approaches. Further, our proposal easily generalizes to a streaming setting and shows significant gains compared to recent continual learning methods.
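
The abstract does not give the functional form of the temporal reweighting functions. One simple family that can express several timescales at once is a mixture of exponential decays over the age of a training example. The sketch below assumes that form purely for illustration. The function name `temporal_weight`, the timescales, and the mixture coefficients are hypothetical, and in the paper's setting such quantities would be learned rather than fixed by hand.

```python
import math


def temporal_weight(age, timescales, mix):
    """Importance weight for an example that is `age` time units old.

    Assumed form: sum_k mix[k] * exp(-age / timescales[k]).
    """
    if len(timescales) != len(mix):
        raise ValueError("timescales and mix must have the same length")
    return sum(m * math.exp(-age / tau) for m, tau in zip(mix, timescales))


# Hypothetical setting: a fast-decaying and a slow-decaying component, in years.
timescales = [1.0, 10.0]
mix = [0.5, 0.5]
for age in (0.0, 1.0, 5.0):
    print(age, temporal_weight(age, timescales, mix))
```

With these values, recent examples get the highest weight, while the slow component keeps older examples from being discarded entirely.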

翻訳日:2022-12-13 17:32:00 公開日:2022-12-12

# ロバストなメタラーニングアプローチによる選択的分類

Selective classification using a robust meta-learning approach ( http://arxiv.org/abs/2212.05987v1 )

ライセンス: Link先を確認

Nishant Jain and Pradeep Shenoy

(参考訳) 選択的分類は、モデルが高精度に分類できるテストサンプルのサブセットを識別することを含み、自動医療診断などのアプリケーションにとって重要である。この不確実なサンプルを特定する能力は、より正確な分類器を構築することを目的として、訓練用分類器にも有用である。インスタンスの関数として重要な重み付けを出力するために、1つの補助メタネットワークを訓練することで、これらの二重の役割を統一する。この尺度は、訓練データの再重み付けやテスト時に選択的な分類のためのテストインスタンスのランク付けに使用される。第2のキーとなるのは,メタネットワークのトレーニングのためのドロップアウト分散(ランダムウェイトドロップアウト時の分類器出力のばらつき)を最小化するメタオブジェクトである。学習データにおける分類器損失を最小化し,分離したメタトレーニングデータセット上でのメタ損失を最小化するネスト化目標を用いて,そのメタネットワークと共に分類器を訓練する。例えば、現実世界の糖尿病網膜症データセットでは、最大1.9%のaucと2%の精度で、選択的分類の最先端を上回っています。最後に、我々のメタラーニングフレームワークは、教師なし分散最小化メタオブジェクトを考慮し、教師なしドメイン適応に自然に拡張する。我々は、教師なしドメイン適応を用いた網膜症データセットのドメインシフト設定において、他のベースラインよりも3.4%/3.3%精度とAUCの累積絶対ゲインを示す。

Selective classification involves identifying the subset of test samples that a model can classify with high accuracy, and is important for applications such as automated medical diagnosis. We argue that this capability of identifying uncertain samples is valuable for training classifiers as well, with the aim of building more accurate classifiers. We unify these dual roles by training a single auxiliary meta-network to output an importance weight as a function of the instance. This measure is used at train time to reweight training data, and at test time to rank test instances for selective classification. A second, key component of our proposal is the meta-objective of minimizing dropout variance (the variance of classifier output when subjected to random weight dropout) for training the meta-network. We train the classifier together with its meta-network using a nested objective of minimizing classifier loss on training data and meta-loss on a separate meta-training dataset. We outperform the current state of the art on selective classification by substantial margins; for instance, by up to 1.9% AUC and 2% accuracy on a real-world diabetic retinopathy dataset. Finally, our meta-learning framework extends naturally to unsupervised domain adaptation, given our unsupervised variance minimization meta-objective. We show cumulative absolute gains of 3.4% / 3.3% accuracy and AUC over the other baselines in domain shift settings on the Retinopathy dataset using unsupervised domain adaptation.
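
The test-time use described above, ranking instances by a score and classifying only the most trusted ones, can be illustrated with a small accuracy-at-coverage computation. This is a generic sketch of selective classification evaluation, not the paper's meta-network. The scores below are made-up placeholders for whatever importance weight or confidence a model outputs, and `selective_accuracy` is a hypothetical helper name.

```python
def selective_accuracy(scores, correct, coverage):
    """Accuracy on the `coverage` fraction of instances with the highest scores."""
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    if len(scores) != len(correct) or not scores:
        raise ValueError("scores and correct must be non-empty and equally long")
    # Rank instances from most to least trusted.
    ranked = sorted(zip(scores, correct), key=lambda pair: pair[0], reverse=True)
    kept = ranked[: max(1, round(coverage * len(ranked)))]
    return sum(1 for _, ok in kept if ok) / len(kept)


# Placeholder scores and per-instance correctness of a classifier's predictions.
scores = [0.9, 0.8, 0.3, 0.1]
correct = [True, True, False, True]
print(selective_accuracy(scores, correct, coverage=0.5))
print(selective_accuracy(scores, correct, coverage=1.0))
```

A useful score makes accuracy rise as coverage shrinks, because the instances the model abstains on are the ones it would most likely get wrong.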

翻訳日:2022-12-13 17:31:43 公開日:2022-12-12

# PERFEX:信頼できるAIシステムのための分類器のパフォーマンス説明

PERFEX: Classifier Performance Explanations for Trustworthy AI Systems ( http://arxiv.org/abs/2212.06045v1 )

ライセンス: Link先を確認

Erwin Walraven, Ajaya Adhikari, Cor J. Veenman

(参考訳) 実世界の意思決定支援システムに展開する場合、分類モデルの説明性は不可欠である。説明は、予測をユーザに実行可能にし、システムの能力と限界を知らせるべきである。しかし、既存の説明方法は通常、個々の予測についてのみ説明を提供する。分類器が意思決定者をサポートすることができる条件に関する情報は利用できないが、例えば、システムがクラスを区別できない場合の情報は非常に有用である。開発フェーズでは新機能の検索やモデルの組み合わせをサポートし、運用フェーズではシステムを使用しないなどの判断において意思決定者をサポートする。本稿では,PERFEX (PERFormance Explainer) と呼ばれる学習ベース分類器の品質を説明する手法を提案する。本手法は,ベース分類器がエラー度が高いか低いか,その他の分類性能指標を持つ条件下での予測と説明が可能なメタツリー学習アルゴリズムからなる。いくつかの分類器とデータセットを用いてPERFEXを評価し,都市移動データを用いたケーススタディを行った。 PERFEXは, 基本分類器がクラスを区別できない場合でも, コンパクトな性能説明を行いながら, メタ予測性能が高いことが判明した。

Explainability of a classification model is crucial when deployed in real-world decision support systems. Explanations make predictions actionable to the user and should inform about the capabilities and limitations of the system. Existing explanation methods, however, typically only provide explanations for individual predictions. Information about conditions under which the classifier is able to support the decision maker is not available, while for instance information about when the system is not able to differentiate classes can be very helpful. In the development phase it can support the search for new features or combining models, and in the operational phase it supports decision makers in deciding e.g. not to use the system. This paper presents a method to explain the qualities of a trained base classifier, called PERFormance EXplainer (PERFEX). Our method consists of a meta tree learning algorithm that is able to predict and explain under which conditions the base classifier has a high or low error or any other classification performance metric. We evaluate PERFEX using several classifiers and datasets, including a case study with urban mobility data. It turns out that PERFEX typically has high meta prediction performance even if the base classifier is hardly able to differentiate classes, while giving compact performance explanations.

翻訳日:2022-12-13 17:31:17 公開日:2022-12-12

# クロスドメイン自動用語抽出のためのTransformerのアンサンブル

Ensembling Transformers for Cross-domain Automatic Term Extraction ( http://arxiv.org/abs/2212.05696v1 )

ライセンス: Link先を確認

Hanh Thi Hong Tran, Matej Martinc, Andraz Pelicon, Antoine Doucet, and Senja Pollak

(参考訳) 自動用語抽出は、ドメイン言語の理解といくつかの自然言語処理の下流タスクにおいて重要な役割を果たす。本稿では,多言語・クロスドメイン環境における用語抽出について、Transformerに基づく事前学習言語モデルの予測能力を比較検討する。単言語モデルが単一語および複数語の用語を抽出する能力を評価するだけでなく、異なる言語モデルの用語出力集合の積集合または和集合をとることで、単言語モデルと多言語モデルのアンサンブルも実験する。実験は,4つの専門ドメイン(汚職,風力エネルギー,馬術,心不全)と3つの言語(英語,フランス語,オランダ語)をカバーするACTERコーパスと,さらに4つのドメイン(バイオメカニクス,化学,獣医学,言語学)をカバーするスロベニア語のRSDO5コーパスで行った。その結果、用語抽出タスクから固有表現の用語を除外した場合のオランダ語とフランス語を除くすべての言語について、単言語モデルを採用する戦略が、多言語モデルを活用した関連研究の最先端アプローチを上回ることが分かった。さらに,最も性能の高い2つのモデルの出力を組み合わせることで,有意な改善を達成した。

Automatic term extraction plays an essential role in domain language understanding and several natural language processing downstream tasks. In this paper, we propose a comparative study on the predictive power of Transformers-based pretrained language models toward term extraction in a multi-language cross-domain setting. Besides evaluating the ability of monolingual models to extract single- and multi-word terms, we also experiment with ensembles of mono- and multilingual models by conducting the intersection or union on the term output sets of different language models. Our experiments have been conducted on the ACTER corpus covering four specialized domains (Corruption, Wind energy, Equitation, and Heart failure) and three languages (English, French, and Dutch), and on the RSDO5 Slovenian corpus covering four additional domains (Biomechanics, Chemistry, Veterinary, and Linguistics). The results show that the strategy of employing monolingual models outperforms the state-of-the-art approaches from the related work leveraging multilingual models, regarding all the languages except Dutch and French if the term extraction task excludes the extraction of named entity terms. Furthermore, by combining the outputs of the two best performing models, we achieve significant improvements.

翻訳日:2022-12-13 17:15:53 公開日:2022-12-12
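
The set-based ensembling mentioned in the abstract above (intersection or union over the term sets produced by different models) can be illustrated with a minimal sketch. The function names, the lowercase normalization, and the toy inputs are assumptions made for illustration; they are not taken from the paper's code.

```python
def ensemble_terms(predictions, mode="union"):
    """Combine per-model term lists into one set by union or intersection.

    `predictions` is a list with one iterable of term strings per model.
    Lowercasing and stripping is an assumed normalization step.
    """
    term_sets = [{term.lower().strip() for term in terms} for terms in predictions]
    if not term_sets:
        return set()
    if mode == "union":
        return set().union(*term_sets)
    if mode == "intersection":
        return set.intersection(*term_sets)
    raise ValueError(f"unknown mode: {mode!r}")


def precision_recall_f1(predicted, gold):
    """Score a predicted term set against a gold-standard term set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1
```

Union tends to favor recall and intersection tends to favor precision, which is why comparing both against a gold list is a natural first experiment.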

# 複数種類の文脈の統合による効果的なシードガイド付き話題発見

Effective Seed-Guided Topic Discovery by Integrating Multiple Types of Contexts ( http://arxiv.org/abs/2212.06002v1 )

ライセンス: Link先を確認

Yu Zhang, Yunyi Zhang, Martin Michalski, Yucheng Jiang, Yu Meng, Jiawei Han

(参考訳) 与えられたテキストコーパスから完全に教師なしで一貫性のあるトピックを抽出する代わりに、シード誘導型のトピック発見手法は、ユーザが与えたシード語を利用して特徴的で一貫性のあるトピックを抽出し、得られるトピックがユーザの関心によりよく応えられるようにする。単語とシードの意味的な相関をモデル化してトピックを示す用語を発見するために、既存のシード誘導型アプローチは、文書レベルの単語共起、スライディングウィンドウに基づく局所文脈、事前学習言語モデルがもたらす一般的な言語知識など、さまざまな種類の文脈信号を利用する。本研究では,シード誘導下での単語の意味のモデル化において、各種類の文脈情報にはそれぞれ価値と限界があるが,3種類の文脈(局所文脈から学習した単語埋め込み,一般ドメインでの学習から得られる事前学習言語モデルの表現,シード情報に基づいて検索したトピックを示す文)を組み合わせることで、高品質なトピックの発見に向けて互いを補完できることを、分析し実験的に示す。本稿では,3種類の文脈から共同で学習し,アンサンブルランキングの過程を通じて文脈信号を段階的に融合する反復的フレームワークSeedTopicMineを提案する。さまざまなシード集合と複数のデータセットにおいて、SeedTopicMineは、既存のシード誘導型トピック発見アプローチよりも一貫性が高く正確なトピックを安定して生成する。

Instead of mining coherent topics from a given text corpus in a completely unsupervised manner, seed-guided topic discovery methods leverage user-provided seed words to extract distinctive and coherent topics so that the mined topics can better cater to the user's interest. To model the semantic correlation between words and seeds for discovering topic-indicative terms, existing seed-guided approaches utilize different types of context signals, such as document-level word co-occurrences, sliding window-based local contexts, and generic linguistic knowledge brought by pre-trained language models. In this work, we analyze and show empirically that each type of context information has its value and limitation in modeling word semantics under seed guidance, but combining three types of contexts (i.e., word embeddings learned from local contexts, pre-trained language model representations obtained from general-domain training, and topic-indicative sentences retrieved based on seed information) allows them to complement each other for discovering quality topics. We propose an iterative framework, SeedTopicMine, which jointly learns from the three types of contexts and gradually fuses their context signals via an ensemble ranking process. Under various sets of seeds and on multiple datasets, SeedTopicMine consistently yields more coherent and accurate topics than existing seed-guided topic discovery approaches.

翻訳日:2022-12-13 17:15:15 公開日:2022-12-12

# ゼロショット検索におけるクロスエンコーダの擁護

In Defense of Cross-Encoders for Zero-Shot Retrieval ( http://arxiv.org/abs/2212.06121v1 )

ライセンス: Link先を確認

Guilherme Rosa and Luiz Bonifacio and Vitor Jeronymo and Hugo Abonizio and Marzieh Fadaee and Roberto Lotufo and Rodrigo Nogueira

(参考訳) バイエンコーダとクロスエンコーダは、多くの最先端の検索パイプラインで広く使われている。本研究では,これら2種類のアーキテクチャの汎化能力を,ドメイン内とドメイン外の両方のシナリオにおいて,幅広いパラメータ数で調べる。クロスエンコーダのパラメータ数と早期のクエリ・文書間相互作用が,検索モデルの汎化能力において重要な役割を果たすことが分かった。実験の結果,モデルサイズを大きくしてもドメイン内テストセットではわずかな改善しか得られないが,ファインチューニング中に一度も見ていない新しいドメインでは,はるかに大きな改善が得られる。さらに、クロスエンコーダは、いくつかのタスクにおいて、同程度のサイズのバイエンコーダを大きく上回ることを示す。BEIRベンチマークでは、我々の最大のクロスエンコーダが最先端のバイエンコーダを平均で4ポイント以上上回る。最後に,バイエンコーダを第1段階の検索器として使用しても,ドメイン外タスクではBM25のようなより単純な検索器と比べて改善が得られないことを示す。コードはhttps://github.com/guilhermemr04/scaling-zero-shot-retrieval.gitで入手できる。

Bi-encoders and cross-encoders are widely used in many state-of-the-art retrieval pipelines. In this work we study the generalization ability of these two types of architectures on a wide range of parameter count on both in-domain and out-of-domain scenarios. We find that the number of parameters and early query-document interactions of cross-encoders play a significant role in the generalization ability of retrieval models. Our experiments show that increasing model size results in marginal gains on in-domain test sets, but much larger gains in new domains never seen during fine-tuning. Furthermore, we show that cross-encoders largely outperform bi-encoders of similar size in several tasks. In the BEIR benchmark, our largest cross-encoder surpasses a state-of-the-art bi-encoder by more than 4 average points. Finally, we show that using bi-encoders as first-stage retrievers provides no gains in comparison to a simpler retriever such as BM25 on out-of-domain tasks. The code is available at https://github.com/guilhermemr04/scaling-zero-shot-retrieval.git

翻訳日:2022-12-13 17:14:37 公開日:2022-12-12

# 実用的なプラグアンドプレイ拡散モデルに向けて

Towards Practical Plug-and-Play Diffusion Models ( http://arxiv.org/abs/2212.05973v1 )

ライセンス: Link先を確認

Hyojun Go, Yunsung Lee, Jin-Young Kim, Seunghyun Lee, Myeongho Jeong, Hyun Seung Lee, and Seungtaek Choi

(参考訳) 拡散に基づく生成モデルは画像生成において顕著な成功を収めた。そのガイダンスの定式化により、拡散モデルを微調整することなく、外部モデルがさまざまなタスクの生成過程をプラグアンドプレイで制御できる。しかし、公開されている既製(off-the-shelf)モデルをそのままガイダンスに用いると、ノイズの多い入力に対する性能が低いために失敗する。そのため、ノイズで劣化させたラベル付きデータでガイダンスモデルを微調整するのが既存の慣行である。本稿では,この慣行には2つの面で限界があると主張する。(1)ノイズの程度が極めて多様な入力を単一のモデルで扱うのは難しすぎる。(2)ラベル付きデータセットの収集が、さまざまなタスクへのスケールアップを妨げる。これらの限界に対処するために,各エキスパートが特定のノイズ範囲に特化し,対応するタイムステップで逆過程をガイドする、複数のエキスパートを活用した新しい戦略を提案する。ただし,複数のネットワークを管理しラベル付きデータを利用することは現実的でないため,パラメータ効率の高い微調整とデータフリーな知識転移を活用する実用的なガイダンスフレームワークPractical Plug-And-Play (PPAP)を提示する。ImageNetのクラス条件付き生成実験を徹底的に行い、少数の学習可能パラメータのみで、ラベル付きデータなしに拡散をガイドできることを示す。最後に、画像分類器、深度推定器、セマンティックセグメンテーションモデルが、我々のフレームワークを通じて、公開されているGLIDEをプラグアンドプレイ方式でガイドできることを示す。

Diffusion-based generative models have achieved remarkable success in image generation. Their guidance formulation allows an external model to plug-and-play control the generation process for various tasks without fine-tuning the diffusion model. However, the direct use of publicly available off-the-shelf models for guidance fails due to their poor performance on noisy inputs. For that, the existing practice is to fine-tune the guidance models with labeled data corrupted with noises. In this paper, we argue that this practice has limitations in two aspects: (1) performing on inputs with extremely various noises is too hard for a single model; (2) collecting labeled datasets hinders scaling up for various tasks. To tackle the limitations, we propose a novel strategy that leverages multiple experts where each expert is specialized in a particular noise range and guides the reverse process at its corresponding timesteps. However, as it is infeasible to manage multiple networks and utilize labeled data, we present a practical guidance framework termed Practical Plug-And-Play (PPAP), which leverages parameter-efficient fine-tuning and data-free knowledge transfer. We exhaustively conduct ImageNet class conditional generation experiments to show that our method can successfully guide diffusion with small trainable parameters and no labeled data. Finally, we show that image classifiers, depth estimators, and semantic segmentation models can guide publicly available GLIDE through our framework in a plug-and-play manner.

翻訳日:2022-12-13 17:07:33 公開日:2022-12-12
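
The multi-expert idea in the abstract above (each expert handles a particular noise range and guides the reverse process at the corresponding timesteps) can be sketched as a simple timestep router. The equal-width split of the timestep axis, the function name `make_expert_router`, and the use of plain labels in place of real guidance networks are all assumptions for illustration; the abstract does not say how the ranges are chosen.

```python
from bisect import bisect_right


def make_expert_router(num_timesteps, experts):
    """Return a function mapping a diffusion timestep to one expert.

    The range [0, num_timesteps) is cut into len(experts) contiguous,
    near-equal segments (an assumed partition). experts[0] serves the
    smallest timesteps, experts[-1] the largest.
    """
    count = len(experts)
    if count == 0 or num_timesteps < count:
        raise ValueError("need at least one expert and one timestep per expert")
    # boundaries[i] is the first timestep handled by experts[i + 1].
    boundaries = [round(num_timesteps * (i + 1) / count) for i in range(count - 1)]

    def route(timestep):
        if not 0 <= timestep < num_timesteps:
            raise ValueError(f"timestep {timestep} outside [0, {num_timesteps})")
        return experts[bisect_right(boundaries, timestep)]

    return route
```

In a real sampler the returned expert would be a guidance model whose gradient is applied at that step; here the router only selects which one to call.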

# RGBD拡散モデルを用いたインクリメンタルビューインペインティングによる生成的シーン合成

Generative Scene Synthesis via Incremental View Inpainting using RGBD Diffusion Models ( http://arxiv.org/abs/2212.05993v1 )

ライセンス: Link先を確認

Jiabao Lei, Jiapeng Tang, Kui Jia

(参考訳) 我々は,RGBDビュー観測の疎な集合から、基礎となるシーンの形状と色を復元する課題に取り組む。本研究では,カメラの軌跡に沿って新たなRGBDビューを逐次生成する新しい解法を提示する。シーンの形状は、単にこれらのビューを融合した結果である。より具体的には、新しいRGBDビューのレンダリングに用いる中間的な表面メッシュを保持し、レンダリングされたビューはその後インペインティングネットワークによって補完される。レンダリングされた各RGBDビューは、後に部分的な表面として逆投影され、中間メッシュに追加される。中間メッシュとカメラ投影の利用は、多視点の不整合という難題の解決に役立つ。RGBDインペインティングネットワークは、従来2次元の生成モデリングに用いられてきた汎用的なRGBD拡散モデルとして実装し、本用途に使えるようその逆拡散過程に修正を加えた。疎なRGBD入力からの3次元シーン合成のタスクで本手法を評価し、ScanNetデータセットでの広範な実験により、既存手法に対する優位性を示す。プロジェクトページ: https://jblei.site/project-pages/rgbd-diffusion.html

We address the challenge of recovering an underlying scene geometry and colors from a sparse set of RGBD view observations. In this work, we present a new solution that sequentially generates novel RGBD views along a camera trajectory, and the scene geometry is simply the fusion result of these views. More specifically, we maintain an intermediate surface mesh used for rendering new RGBD views, which are subsequently completed by an inpainting network; each rendered RGBD view is later back-projected as a partial surface and is supplemented into the intermediate mesh. The use of intermediate mesh and camera projection helps solve the refractory problem of multi-view inconsistency. We practically implement the RGBD inpainting network as a versatile RGBD diffusion model, a kind of model previously used for 2D generative modeling; we make a modification to its reverse diffusion process to enable our use. We evaluate our approach on the task of 3D scene synthesis from sparse RGBD inputs; extensive experiments on the ScanNet dataset demonstrate the superiority of our approach over existing ones. Project page: https://jblei.site/project-pages/rgbd-diffusion.html

翻訳日:2022-12-13 17:07:07 公開日:2022-12-12

# 口元に基づく自動感情認識を改善するアプローチ

An Approach for Improving Automatic Mouth Emotion Recognition ( http://arxiv.org/abs/2212.06009v1 )

ライセンス: Link先を確認

Giulio Biondi, Valentina Franzoni, Osvaldo Gervasi, Damiano Perri

(参考訳) 本研究は,畳み込みニューラルネットワーク(CNN)を用いた口の検出による自動感情認識手法を提案し、検証する。この手法は、コミュニケーション能力に問題を伴う健康障害(筋萎縮、脳卒中、自閉症、あるいはより単純には痛みなど)を持つ人を支援し、感情を認識してリアルタイムのフィードバックや支援システムに与えるデータを生成することを目的としている。ソフトウェアシステムは、まず取得した画像に顔が存在するかどうかを判定し、次に口の位置を探して対応する特徴を抽出する。どちらのタスクもHaar特徴ベースの分類器で実行され、高速な実行と有望な性能が得られる。これまでの研究が単一ユーザ向けの個別化された学習のための視覚的な微表情に焦点を当てていたのに対し、本戦略は、一般化された顔データセットでもシステムを学習させることを目指す。

The study proposes and tests a technique for automated emotion recognition through mouth detection via Convolutional Neural Networks (CNN), meant to be applied for supporting people with health disorders with communication skills issues (e.g. muscle wasting, stroke, autism, or, more simply, pain) in order to recognize emotions and generate real-time feedback, or data feeding supporting systems. The software system starts the computation identifying if a face is present on the acquired image, then it looks for the mouth location and extracts the corresponding features. Both tasks are carried out using Haar Feature-based Classifiers, which guarantee fast execution and promising performance. If our previous works focused on visual micro-expressions for personalized training on a single user, this strategy aims to train the system also on generalized faces data sets.

翻訳日:2022-12-13 17:06:47 公開日:2022-12-12

# Humpty Dumptyの再構成:オープンセット行動認識のための多特徴グラフオートエンコーダ

Reconstructing Humpty Dumpty: Multi-feature Graph Autoencoder for Open Set Action Recognition ( http://arxiv.org/abs/2212.06023v1 )

ライセンス: Link先を確認

Dawei Du, Ameya Shringi, Anthony Hoogs, Christopher Funk

(参考訳) ほとんどの行動認識データセットとアルゴリズムは、すべてのテストサンプルが既知クラスのインスタンスである閉じた世界を仮定している。オープンセット問題では、テストサンプルは既知クラスと未知クラスのどちらからも引き出されうる。既存のオープンセット行動認識手法は、通常、クローズドセット手法に分類スコアや特徴距離の事後分析を加えて拡張したものであり、ビデオクリップのすべての要素間の関係を捉えない。未知クラスは組み立て直すのが難しく、既知クラスのビデオよりも再構成誤差が高くなるため、本手法は再構成誤差を用いてビデオの新規性を判定する。我々は,オープンセット行動認識問題に対する我々の解法を,その再構成能力にちなんで「Humpty Dumpty」と呼ぶ。Humpty Dumptyは、クリップの断片間の文脈的・意味的関係を考慮して再構成を改善する、新しいグラフベースのオートエンコーダである。再構成誤差が大きいほど、その行動を再構成できない、すなわちHumpty Dumptyを元に戻せない可能性が高くなり、その行動がこれまで見たことのない新規/未知のものであることを示す。HMDB-51とUCF-101を含む2つの公開行動認識データセットで広範な実験を行い、オープンセット行動認識における最先端の性能を示す。

Most action recognition datasets and algorithms assume a closed world, where all test samples are instances of the known classes. In open set problems, test samples may be drawn from either known or unknown classes. Existing open set action recognition methods are typically based on extending closed set methods by adding post hoc analysis of classification scores or feature distances and do not capture the relations among all the video clip elements. Our approach uses the reconstruction error to determine the novelty of the video since unknown classes are harder to put back together and thus have a higher reconstruction error than videos from known classes. We refer to our solution to the open set action recognition problem as "Humpty Dumpty", due to its reconstruction abilities. Humpty Dumpty is a novel graph-based autoencoder that accounts for contextual and semantic relations among the clip pieces for improved reconstruction. A larger reconstruction error leads to an increased likelihood that the action can not be reconstructed, i.e., can not put Humpty Dumpty back together again, indicating that the action has never been seen before and is novel/unknown. Extensive experiments are performed on two publicly available action recognition datasets including HMDB-51 and UCF-101, showing the state-of-the-art performance for open set action recognition.

翻訳日:2022-12-13 17:06:30 公開日:2022-12-12
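
The decision rule described in the abstract above (a larger reconstruction error suggests an unknown action) can be sketched independently of any particular autoencoder. The mean-squared-error measure, the quantile-based threshold calibration, and all function names below are assumptions chosen for illustration; the abstract does not specify how the error is computed or thresholded.

```python
def reconstruction_error(original, reconstructed):
    """Mean squared error between two equal-length feature vectors."""
    if len(original) != len(reconstructed):
        raise ValueError("vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def calibrate_threshold(known_errors, quantile=0.95):
    """Pick a threshold from errors measured on known-class validation clips.

    This is a simple empirical quantile (an assumed calibration strategy).
    """
    if not known_errors:
        raise ValueError("need at least one known-class error")
    ordered = sorted(known_errors)
    index = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[index]


def is_novel(error, threshold):
    """Flag a clip as unknown when its error exceeds the threshold."""
    return error > threshold
```

The autoencoder itself would supply `reconstructed`; only the thresholding logic is shown here.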

# 効率的なTransformerによる映像予測

Video Prediction by Efficient Transformers ( http://arxiv.org/abs/2212.06026v1 )

ライセンス: Link先を確認

Xi Ye, Guillaume-Alexandre Bilodeau

(参考訳) ビデオ予測は、幅広い応用を持つ難しいコンピュータビジョンタスクである。本研究では,ビデオ予測のための新しいTransformerベースのモデル群を提示する。まず,標準的なTransformerの計算量を削減するため,効率的な局所時空間分離アテンション機構を提案する。次に、この新しい効率的なTransformerに基づいて、完全自己回帰モデル、部分自己回帰モデル、非自己回帰モデルを開発する。部分自己回帰モデルは完全自己回帰モデルと同程度の性能を持ちながら、推論がより高速である。非自己回帰モデルは、より高速な推論を達成するだけでなく、自己回帰モデルの品質劣化問題も緩和するが、学習のために追加のパラメータと損失関数を必要とする。同じアテンション機構のもとで,提案する3種類のビデオ予測モデルを比較する包括的な検討を行った。実験により,提案するビデオ予測モデルは,より複雑な最先端の畳み込みLSTMベースのモデルと競合する性能を持つことが示された。ソースコードはhttps://github.com/XiYe20/VPTRで入手できる。

Video prediction is a challenging computer vision task that has a wide range of applications. In this work, we present a new family of Transformer-based models for video prediction. Firstly, an efficient local spatial-temporal separation attention mechanism is proposed to reduce the complexity of standard Transformers. Then, a full autoregressive model, a partial autoregressive model and a non-autoregressive model are developed based on the new efficient Transformer. The partial autoregressive model has a similar performance with the full autoregressive model but a faster inference speed. The non-autoregressive model not only achieves a faster inference speed but also mitigates the quality degradation problem of the autoregressive counterparts, but it requires additional parameters and loss function for learning. Given the same attention mechanism, we conducted a comprehensive study to compare the proposed three video prediction variants. Experiments show that the proposed video prediction models are competitive with more complex state-of-the-art convolutional-LSTM based models. The source code is available at https://github.com/XiYe20/VPTR.

翻訳日:2022-12-13 17:06:07 公開日:2022-12-12

# 同変性によるロバストな知覚

Robust Perception through Equivariance ( http://arxiv.org/abs/2212.06079v1 )

ライセンス: Link先を確認

Chengzhi Mao, Lingyu Zhang, Abhishek Joshi, Junfeng Yang, Hao Wang, Carl Vondrick

(参考訳) コンピュータビジョンのためのディープネットワークは、敵対的サンプルに遭遇すると信頼できなくなる。本稿では,自然画像に内在する密な制約を用いて推論をロバスト化する枠組みを導入する。推論時に制約を導入することで、ロバスト性の負担を学習から推論アルゴリズムへ移すことができ、それによりモデルは、各画像に固有の、場合によっては新規な特徴に対して推論時に動的に適応できる。さまざまな制約の中で、同変性に基づく制約が最も効果的であることが分かった。これは、表現を細かいレベルで過度に制約することなく、特徴空間における密な制約を可能にするためである。理論的結果は、推論時にこのような密な制約を持つことの重要性を裏付ける。実験の結果、推論時に特徴の同変性を回復することで、最悪ケースの敵対的摂動を防御できることが示された。本手法は,画像認識,セマンティックセグメンテーション,インスタンスセグメンテーションのタスクにおいて、4つのデータセット(ImageNet,Cityscapes,PASCAL VOC,MS-COCO)で敵対的ロバスト性を向上させる。プロジェクトページは equi4robust.cs.columbia.edu で公開されている。

Deep networks for computer vision are not reliable when they encounter adversarial examples. In this paper, we introduce a framework that uses the dense intrinsic constraints in natural images to robustify inference. By introducing constraints at inference time, we can shift the burden of robustness from training to the inference algorithm, thereby allowing the model to adjust dynamically to each individual image's unique and potentially novel characteristics at inference time. Among different constraints, we find that equivariance-based constraints are most effective, because they allow dense constraints in the feature space without overly constraining the representation at a fine-grained level. Our theoretical results validate the importance of having such dense constraints at inference time. Our empirical experiments show that restoring feature equivariance at inference time defends against worst-case adversarial perturbations. The method obtains improved adversarial robustness on four datasets (ImageNet, Cityscapes, PASCAL VOC, and MS-COCO) on image recognition, semantic segmentation, and instance segmentation tasks. Project page is available at equi4robust.cs.columbia.edu.

翻訳日:2022-12-13 17:05:54 公開日:2022-12-12

# ゼロショット検出における意味的混乱の解消

Resolving Semantic Confusions for Improved Zero-Shot Detection ( http://arxiv.org/abs/2212.06097v1 )

ライセンス: Link先を確認

Sandipan Sarma, Sushil Kumar, Arijit Sur

(参考訳) ゼロショット検出(ZSD)は、いくつかのターゲット(「unseen」)クラスの視覚サンプルでモデルを学習していない場合でも、物体の認識と位置特定を同時に行うことを目指す難しいタスクである。近年、GANのような生成モデルを用いる手法が最良クラスの結果を示している。これらの手法では、既知(seen)クラスのデータで学習したGANが、未知クラスのサンプルをその意味情報に基づいて生成し、通常の物体検出器が未知の物体を認識できるようにする。しかし、意味的混乱の問題は依然として残っており、モデルが意味的に類似したクラスを区別できないことがある。本研究では,クラス間の非類似度の程度を考慮し、それを生成サンプルに反映するトリプレット損失を取り入れた生成モデルを学習することを提案する。さらに、あるクラスの生成された視覚サンプルがそのクラス自身の意味情報に強く対応することを保証するために、サイクル一貫性損失も課す。MSCOCOとPASCAL-VOCの2つのベンチマークZSDデータセットでの広範な実験は、現在のZSD手法に対する大幅な改善を示し、意味的混乱を低減して未知クラスの検出を改善する。

Zero-shot detection (ZSD) is a challenging task where we aim to recognize and localize objects simultaneously, even when our model has not been trained with visual samples of a few target ("unseen") classes. Recently, methods employing generative models like GANs have shown some of the best results, where unseen-class samples are generated based on their semantics by a GAN trained on seen-class data, enabling vanilla object detectors to recognize unseen objects. However, the problem of semantic confusion still remains, where the model is sometimes unable to distinguish between semantically-similar classes. In this work, we propose to train a generative model incorporating a triplet loss that acknowledges the degree of dissimilarity between classes and reflects them in the generated samples. Moreover, a cyclic-consistency loss is also enforced to ensure that generated visual samples of a class highly correspond to their own semantics. Extensive experiments on two benchmark ZSD datasets - MSCOCO and PASCAL-VOC - demonstrate significant gains over the current ZSD methods, reducing semantic confusion and improving detection for the unseen classes.

翻訳日:2022-12-13 17:05:31 公開日:2022-12-12
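
For readers unfamiliar with the triplet loss named in the abstract above, the standard form is sketched below. This is the generic textbook formulation with Euclidean distance and a fixed margin. The paper's variant is described as reflecting the degree of dissimilarity between classes, and how it does so is not stated in the abstract, so the fixed `margin` parameter here is an assumption.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is farther from the anchor than
    the positive by at least `margin`.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)
```

One way to make such a loss class-aware would be to scale the margin by a distance between class embeddings, but that is a hypothetical extension and not something the abstract confirms.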

# 臨床画像およびダーモスコピー画像データセットを用いた、皮膚がん分類と新規クラス検出のためのシャムニューラルネットワーク

Siamese Neural Networks for Skin Cancer Classification and New Class Detection using Clinical and Dermoscopic Image Datasets ( http://arxiv.org/abs/2212.06130v1 )

ライセンス: Link先を確認

Michael Luke Battle, Amir Atapour-Abarghouei, Andrew Stephen McGough

(参考訳) 皮膚がんは世界で最も一般的な悪性腫瘍である。皮膚がんの自動検出は、早期発見率を大幅に改善し、死亡を防ぐだろう。この目的に役立てるため、ディープラーニングシステムの学習に使えるデータセットが多数公開されており、分類において優れた結果が得られている。しかし、これらのシステムは学習したクラスに対してのみ機能し、未知のクラスの皮膚病変を識別できないため、臨床使用には適さない。あらゆる皮膚病変を含めてデータセットを大幅に拡大することも考えられるが、それでも必ず一部のクラスが漏れる。代わりに、我々はシャムニューラルネットワーク(SNN)を評価する。SNNは皮膚病変の画像を分類できるだけでなく、学習したクラスとは異なる画像を識別でき、ある画像が学習クラスの例ではないと判定できる。皮膚病変のダーモスコピー画像と臨床画像の両方でSNNを評価した。臨床データセットとダーモスコピーデータセットで、それぞれ74.33%と85.61%のtop-1分類精度を得た。これは最先端の結果よりわずかに低いが、SNNアプローチにはクラス外の例を検出できるという利点がある。本研究の結果は、SNNアプローチの可能性と、今後の臨床展開への道筋を示すものである。

Skin cancer is the most common malignancy in the world. Automated skin cancer detection would significantly improve early detection rates and prevent deaths. To help with this aim, a number of datasets have been released which can be used to train Deep Learning systems - these have produced impressive results for classification. However, this only works for the classes they are trained on whilst they are incapable of identifying skin lesions from previously unseen classes, making them unconducive for clinical use. We could look to massively increase the datasets by including all possible skin lesions, though this would always leave out some classes. Instead, we evaluate Siamese Neural Networks (SNNs), which not only allows us to classify images of skin lesions, but also allow us to identify those images which are different from the trained classes - allowing us to determine that an image is not an example of our training classes. We evaluate SNNs on both dermoscopic and clinical images of skin lesions. We obtain top-1 classification accuracy levels of 74.33% and 85.61% on clinical and dermoscopic datasets, respectively. Although this is slightly lower than the state-of-the-art results, the SNN approach has the advantage that it can detect out-of-class examples. Our results highlight the potential of an SNN approach as well as pathways towards future clinical deployment.

翻訳日:2022-12-13 17:05:08 公開日:2022-12-12
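
A distance-based embedding model can both classify and reject, which is the property the abstract above highlights. The sketch below shows one common way to do this: compare an embedding against one prototype per known class and reject when even the nearest prototype is too far away. The prototype representation, the `max_distance` threshold, and the class names in the usage example are assumptions for illustration; the abstract does not describe the paper's actual decision procedure.

```python
import math


def classify_or_reject(embedding, class_prototypes, max_distance):
    """Return (label, distance) for the nearest prototype, or (None, distance)
    when the nearest prototype is farther than `max_distance`.

    `class_prototypes` maps a class label to a reference embedding, for
    example the mean embedding of that class's training images (assumed).
    """
    best_label, best_distance = None, math.inf
    for label, prototype in class_prototypes.items():
        distance = math.dist(embedding, prototype)
        if distance < best_distance:
            best_label, best_distance = label, distance
    if best_distance > max_distance:
        return None, best_distance  # treated as an out-of-class example
    return best_label, best_distance
```

For example, with prototypes `{"nevus": (0.0, 0.0), "melanoma": (3.0, 4.0)}` and `max_distance=2.0`, an embedding at `(0.0, 1.0)` is assigned to `"nevus"`, while one at `(10.0, 10.0)` is rejected.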

# Rodin:拡散を利用した3Dデジタルアバターの創成モデル

Rodin: A Generative Model for Sculpting 3D Digital Avatars Using Diffusion ( http://arxiv.org/abs/2212.06135v1 )

ライセンス: Link先を確認

Tengfei Wang, Bo Zhang, Ting Zhang, Shuyang Gu, Jianmin Bao, Tadas Baltrusaitis, Jingjing Shen, Dong Chen, Fang Wen, Qifeng Chen, Baining Guo

(参考訳) 本稿では,拡散モデルを用いて、ニューラルラディアンスフィールドとして表現された3Dデジタルアバターを自動生成する3D生成モデルを提示する。このようなアバターの生成における大きな課題は、3Dでのメモリと処理のコストが、高品質なアバターに必要な豊富なディテールを生成するには高すぎることである。この問題に取り組むために,ニューラルラディアンスフィールドを複数の2D特徴マップとして表現し、それらのマップを単一の2D特徴平面に展開(ロールアウト)して、その中で3Dを考慮した拡散を行うロールアウト拡散ネットワーク(Rodin)を提案する。Rodinモデルは、2D特徴平面に投影された特徴を3Dにおける元の関係に従って参照する3D-aware畳み込みを用いることで、3Dにおける拡散の完全性を保ちつつ、必要とされていた計算効率をもたらす。また,潜在条件付けを用いて大域的な一貫性のために特徴生成を調整し、高忠実度のアバターを実現するとともに、テキストプロンプトに基づく意味的な編集を可能にする。最後に,階層的合成を用いてディテールをさらに向上させる。我々のモデルが生成する3Dアバターは、既存の生成技術によるものと比べて良好な品質を示す。リアルな髪型や、あごひげのような顔の毛を持つ、非常に精細なアバターを生成できる。また,画像やテキストからの3Dアバター生成と、テキストガイドによる編集性も示す。

This paper presents a 3D generative model that uses diffusion models to automatically generate 3D digital avatars represented as neural radiance fields. A significant challenge in generating such avatars is that the memory and processing costs in 3D are prohibitive for producing the rich details required for high-quality avatars. To tackle this problem we propose the roll-out diffusion network (Rodin), which represents a neural radiance field as multiple 2D feature maps and rolls out these maps into a single 2D feature plane within which we perform 3D-aware diffusion. The Rodin model brings the much-needed computational efficiency while preserving the integrity of diffusion in 3D by using 3D-aware convolution that attends to projected features in the 2D feature plane according to their original relationship in 3D. We also use latent conditioning to orchestrate the feature generation for global coherence, leading to high-fidelity avatars and enabling their semantic editing based on text prompts. Finally, we use hierarchical synthesis to further enhance details. The 3D avatars generated by our model compare favorably with those produced by existing generative techniques. We can generate highly detailed avatars with realistic hairstyles and facial hair like beards. We also demonstrate 3D avatar generation from image or text as well as text-guided editability.

翻訳日:2022-12-13 17:04:49 公開日:2022-12-12

# NMSの逆襲

NMS Strikes Back ( http://arxiv.org/abs/2212.06137v1 )

ライセンス: Link先を確認

Jeffrey Ouyang-Zhang, Jang Hyun Cho, Xingyi Zhou, Philipp Krähenbühl

(参考訳) Detection Transformer (DETR)は、学習時に1対1の二部マッチングを用いてクエリを一意の物体へ直接変換し、エンドツーエンドの物体検出を可能にする。近年、これらのモデルは疑いようのないエレガントさをもってCOCOで従来の検出器を上回った。しかし、モデルアーキテクチャや学習スケジュールを含む複数の設計が従来の検出器とは異なるため、1対1マッチングの有効性は十分に理解されていない。本研究では,DETRにおける1対1のハンガリアンマッチングと,NMS(non-maximum suppression)を用いる従来の検出器における1対多のラベル割り当てとを厳密に比較する。意外なことに、NMSを伴う1対多の割り当ては、同じ設定のもとで標準的な1対1マッチングを一貫して上回り、最大2.5 mAPの大きな改善が得られる。従来のIoUベースのラベル割り当てでDeformable-DETRを学習する我々の検出器は、ResNet50バックボーンで12エポック(1xスケジュール)以内に50.2 COCO mAPを達成し、この設定における既存の従来型およびTransformerベースの検出器のすべてを上回った。複数のデータセット、スケジュール、アーキテクチャにおいて、高性能な検出Transformerに二部マッチングは不要であることを一貫して示す。さらに,検出Transformerの成功は、その表現力の高いTransformerアーキテクチャによるものと考える。コードはhttps://github.com/jozhang97/DETAで入手できる。

Detection Transformer (DETR) directly transforms queries to unique objects by using one-to-one bipartite matching during training and enables end-to-end object detection. Recently, these models have surpassed traditional detectors on COCO with undeniable elegance. However, they differ from traditional detectors in multiple designs, including model architecture and training schedules, and thus the effectiveness of one-to-one matching is not fully understood. In this work, we conduct a strict comparison between the one-to-one Hungarian matching in DETRs and the one-to-many label assignments in traditional detectors with non-maximum suppression (NMS). Surprisingly, we observe one-to-many assignments with NMS consistently outperform standard one-to-one matching under the same setting, with a significant gain of up to 2.5 mAP. Our detector that trains Deformable-DETR with traditional IoU-based label assignment achieved 50.2 COCO mAP within 12 epochs (1x schedule) with ResNet50 backbone, outperforming all existing traditional or transformer-based detectors in this setting. On multiple datasets, schedules, and architectures, we consistently show bipartite matching is unnecessary for performant detection transformers. Furthermore, we attribute the success of detection transformers to their expressive transformer architecture. Code is available at https://github.com/jozhang97/DETA.

翻訳日:2022-12-13 17:04:28 公開日:2022-12-12
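
Since the abstract above contrasts bipartite matching with NMS-based post-processing, a minimal sketch of greedy non-maximum suppression may help. This is the generic textbook algorithm, not code from the DETA repository; the `(x1, y1, x2, y2)` box format and the default threshold of 0.5 are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score and keep a box only if
    it does not overlap an already kept box by more than the threshold.

    Returns the indices of the kept boxes, highest score first.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept
```

Because several predictions may be assigned to the same object under one-to-many training, a duplicate-removal step of this kind is what makes such assignments usable at inference time.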

# 半正則表面メッシュ上のスペクトル畳み込みオートエンコーダを用いた転送学習

Transfer Learning using Spectral Convolutional Autoencoders on Semi-Regular Surface Meshes ( http://arxiv.org/abs/2212.05810v1 )

ライセンス: Link先を確認

Sara Hahner, Felix Kerkhoff, Jochen Garcke

(参考訳) 時間とともに変形する3次元表面メッシュの基礎にあるダイナミクスとパターンは、教師なし学習、特に表面の低次元埋め込みを計算するオートエンコーダによって発見できる。転移学習によって未知の形状の変形パターンを調べるために,新たなネットワークを学習することなく新しい表面メッシュを解析できるオートエンコーダを学習したい。しかし、最先端のオートエンコーダの多くは接続性の異なるメッシュを扱えないため、新しいメッシュへの汎化能力が限られているか、まったくない。また、再構成誤差も学習形状での誤差に比べて大きく増加する。これに対処するため,新しいスペクトルCoSMA(Convolutional Semi-Regular Mesh Autoencoder)ネットワークを提案する。このパッチベースのアプローチを、表面を考慮した学習と組み合わせる。本手法は学習時に提示されなかった表面を再構成し、表面のパッチの変形挙動を汎化する。この新しいアプローチは、さまざまなデータセットの未知のメッシュを、それらの形状で学習した最先端オートエンコーダよりも優れた品質で再構成する。未知の形状に対する転移学習の誤差は、そのデータで直接学習したモデルの誤差より40%低い。さらに、ベースラインのオートエンコーダは、未知のメッシュ系列の変形パターンを形状全体についてしか検出できない。対照的に、採用した領域パッチと安定した再構成品質により、これらの変形パターンが表面上のどこに現れるかを特定できる。

The underlying dynamics and patterns of 3D surface meshes deforming over time can be discovered by unsupervised learning, especially autoencoders, which calculate low-dimensional embeddings of the surfaces. To study the deformation patterns of unseen shapes by transfer learning, we want to train an autoencoder that can analyze new surface meshes without training a new network. Here, most state-of-the-art autoencoders cannot handle meshes of different connectivity and therefore have limited to no generalization capacities to new meshes. Also, reconstruction errors strongly increase in comparison to the errors for the training shapes. To address this, we propose a novel spectral CoSMA (Convolutional Semi-Regular Mesh Autoencoder) network. This patch-based approach is combined with a surface-aware training. It reconstructs surfaces not presented during training and generalizes the deformation behavior of the surfaces' patches. The novel approach reconstructs unseen meshes from different datasets in superior quality compared to state-of-the-art autoencoders that have been trained on these shapes. Our transfer learning errors on unseen shapes are 40% lower than those from models learned directly on the data. Furthermore, baseline autoencoders detect deformation patterns of unseen mesh sequences only for the whole shape. In contrast, due to the employed regional patches and stable reconstruction quality, we can localize where on the surfaces these deformation patterns manifest.

翻訳日:2022-12-13 16:58:27 公開日:2022-12-12

# SAR画像における船舶検出効率の最適化

Optimizing ship detection efficiency in SAR images ( http://arxiv.org/abs/2212.05843v1 )

ライセンス: Link先を確認

Arthur Van Meerbeeck, Jordy Van Landeghem, Ruben Cartuyvels, Marie-Francine Moens

(参考訳) 違法漁業の検出と防止は、健全で機能的な生態系を維持するために重要である。衛星画像における船舶検出に関する最近の研究は、もっぱら性能向上に注力し、検出効率を軽視してきた。しかし、違法漁業を防ぐための適時の介入には、船舶検出の速度と計算コストが不可欠である。そこで本研究では,性能低下を最小限に抑えつつ検出時間とコストを下げる最適化手法を検討した。衛星画像のデータセットを用いて,畳み込みニューラルネットワーク(CNN)に基づく物体検出モデルを学習した。次に,ベースのCNNや他の任意のベースモデルに適用できる2つの効率最適化を設計した。これらの最適化は、高速で安価な分類モデルと統計的アルゴリズムからなる。最適化を物体検出モデルに統合すると、速度と性能のトレードオフが生じる。このトレードオフを、実行時間と性能に異なる重みを与える指標を用いて調べた。分類モデルを用いることで,検出モデルの平均適合率(average precision)を、44%の時間で99.5%まで、25%の時間で92.7%まで近似できることを示す。

The detection and prevention of illegal fishing is critical to maintaining a healthy and functional ecosystem. Recent research on ship detection in satellite imagery has focused exclusively on performance improvements, disregarding detection efficiency. However, the speed and compute cost of vessel detection are essential for a timely intervention to prevent illegal fishing. Therefore, we investigated optimization methods that lower detection time and cost with minimal performance loss. We trained an object detection model based on a convolutional neural network (CNN) using a dataset of satellite images. Then, we designed two efficiency optimizations that can be applied to the base CNN or any other base model. The optimizations consist of a fast, cheap classification model and a statistical algorithm. The integration of the optimizations with the object detection model leads to a trade-off between speed and performance. We studied the trade-off using metrics that give different weight to execution time and performance. We show that by using a classification model the average precision of the detection model can be approximated to 99.5% in 44% of the time or to 92.7% in 25% of the time.

翻訳日:2022-12-13 16:58:08 公開日:2022-12-12
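
One plausible way to combine a cheap classifier with an expensive detector, as the abstract above suggests, is a cascade in which the detector runs only on image tiles the classifier flags. The abstract does not state that this is exactly how the paper integrates the two models, so the cascade structure, the tiling, the threshold, and both callables below are assumptions for illustration.

```python
def cascaded_detect(tiles, ship_score, detect, threshold=0.5):
    """Run `detect` only on tiles whose cheap `ship_score` reaches `threshold`.

    `ship_score(tile)` stands in for a fast classifier returning a score in
    [0, 1], and `detect(tile)` for a slow detector returning a list of
    detections. Returns a dict of per-tile detections and the number of
    detector calls, which serves as a rough proxy for compute cost.
    """
    detections = {}
    detector_calls = 0
    for index, tile in enumerate(tiles):
        if ship_score(tile) >= threshold:
            detections[index] = detect(tile)
            detector_calls += 1
        else:
            detections[index] = []  # skipped: assumed to contain no ship
    return detections, detector_calls
```

Raising the threshold skips more tiles and saves time at the risk of missed ships, which mirrors the speed and performance trade-off the abstract reports.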

# CbwLoss: 深さと姿勢の自己教師型学習のための制約付き双方向重み付き損失

CbwLoss: Constrained Bidirectional Weighted Loss for Self-supervised Learning of Depth and Pose ( http://arxiv.org/abs/2212.05845v1 )

ライセンス: Link先を確認

Fei Wang, Jun Cheng, Penglei Liu

(参考訳) 測光誤差は、ラベルのない単眼ビデオから深度とカメラ姿勢を推定するニューラルネットワークを学習するための教師信号として広く使われている。しかし、シーン内のオクルージョンや移動物体は前提となる静的シーンの仮定に反するため、この手法はモデルの最適化に悪影響を及ぼす。さらに、テクスチャのない領域の画素や識別性の低い画素もモデルの学習を妨げる。これらの問題を解決するために,本稿ではまず、アフィン変換とビュー合成によってそれぞれ生成されるフロー場と深度構造の差を利用して、移動物体とオクルージョンを扱う。次に,ネットワークを追加することなく,より多くの意味情報と文脈情報を持つ特徴間の差を測ることで,テクスチャのない領域がモデルの最適化に与える影響を緩和する。さらに、各部分目的関数で双方向の成分を用いるものの、一対の画像に対する推論は一度だけで済み、オーバーヘッドの削減に役立つ。広範な実験と視覚的分析により提案手法の有効性を示し、同一条件下で,かつ追加の補助情報を導入することなく,既存の最先端の自己教師あり手法を上回ることを確認した。

Photometric differences are widely used as supervision signals to train neural networks for estimating depth and camera pose from unlabeled monocular videos. However, this approach is detrimental for model optimization because occlusions and moving objects in a scene violate the underlying static scenario assumption. In addition, pixels in textureless regions or less discriminative pixels hinder model training. To solve these problems, in this paper, we first deal with moving objects and occlusions by utilizing the difference of the flow fields and depth structure generated by affine transformation and view synthesis, respectively. Secondly, we mitigate the effect of textureless regions on model optimization by measuring differences between features with more semantic and contextual information without adding networks. In addition, although the bidirectionality component is used in each sub-objective function, a pair of images is reasoned about only once, which helps reduce overhead. Extensive experiments and visual analysis demonstrate the effectiveness of the proposed method, which outperforms existing state-of-the-art self-supervised methods under the same conditions and without introducing additional auxiliary information.

翻訳日:2022-12-13 16:57:54 公開日:2022-12-12

# BeautyREC:ロバスト、効率的、およびコンテンツ保存メイクアップ転送

BeautyREC: Robust, Efficient, and Content-preserving Makeup Transfer ( http://arxiv.org/abs/2212.05855v1 )

ライセンス: Link先を確認

Qixin Yan and Chunle Guo and Jixin Zhao and Yuekun Dai and Chen Change Loy and Chongyi Li

(参考訳) 本稿では,Robust,Efficient and Component-specific makeup transfer method (BeautyREC)を提案する。グローバル注意を活用し、単に特徴を結合したり、潜在空間で特徴を暗黙的に操作したりする先行手法からのユニークな脱却として、参照画像のメイクアップスタイルを、ソース画像の対応するコンポーネント(例えば、肌、唇、目)に直接転送し、精巧で正確な局所メイクアップ転送を行うコンポーネント固有対応を提案する。補助として、Transformerの長距離視覚依存性を導入して、効果的なグローバルメイク転送を実現する。複雑で不安定なサイクル構造の代わりに、コンテントエンコーダと組み合わされたコンテンツ一貫性損失を用いて、効率的なシングルパスメイクアップ転送を実現する。本研究の主な知見は, 局所メイク転送のためのコンポーネント固有対応のモデル化, グローバルメイク転送のための長距離依存関係の取得, シングルパス構造による効率的なメイク転送の実現である。既存のデータセットを補完するメークアップ転送データセットであるBeautyFaceも提供しています。このデータセットには3000の顔が含まれ、より多様なメイクスタイル、顔のポーズ、人種をカバーしている。各顔にはアノテーション付きパースマップがある。本手法の最先端手法に対する有効性を示す実験を行った。また,本手法は1Mパラメータのみで,最先端手法(BeautyGAN: 8.43M, PSGAN: 12.62M, SCGAN: 15.30M, CPM: 9.24M, SSAT: 10.48M)より優れている。

In this work, we propose a Robust, Efficient, and Component-specific makeup transfer method (abbreviated as BeautyREC). In a departure from prior methods that leverage global attention, simply concatenate features, or implicitly manipulate features in latent space, we propose a component-specific correspondence that directly transfers the makeup style of a reference image to the corresponding components (e.g., skin, lips, eyes) of a source image, enabling elaborate and accurate local makeup transfer. As an auxiliary, the long-range visual dependencies of the Transformer are introduced for effective global makeup transfer. Instead of the commonly used cycle structure, which is complex and unstable, we employ a content consistency loss coupled with a content encoder to implement efficient single-path makeup transfer. The key insights of this study are modeling component-specific correspondence for local makeup transfer, capturing long-range dependencies for global makeup transfer, and enabling efficient makeup transfer via a single-path structure. We also contribute BeautyFace, a makeup transfer dataset that supplements existing datasets. This dataset contains 3,000 faces, covering more diverse makeup styles, face poses, and races. Each face has an annotated parsing map. Extensive experiments demonstrate the effectiveness of our method against state-of-the-art methods. Moreover, our method is appealing because it has only 1M parameters yet outperforms state-of-the-art methods (BeautyGAN: 8.43M, PSGAN: 12.62M, SCGAN: 15.30M, CPM: 9.24M, SSAT: 10.48M).

翻訳日:2022-12-13 16:57:34 公開日:2022-12-12

# CountingMOT:複数物体追跡のための共同カウント、検出、再同定

CountingMOT: Joint Counting, Detection and Re-Identification for Multiple Object Tracking ( http://arxiv.org/abs/2212.05861v1 )

ライセンス: Link先を確認

Weihong Ren, Bowen Chen, Yuhang Shi, Weibo Jiang and Honghai Liu

(参考訳) マルチオブジェクトトラッキング(mot)の最近のトレンドは、オブジェクト検出と出現機能(あるいは動き)を同時に学習する検出と追跡を共同で解決している。競争性能にもかかわらず、混雑したシーンでは、共同検出と追跡は通常、ミスや誤検出のために正確なオブジェクト関連を見つけることができない。本稿では,混み合うシーンに適したエンドツーエンドフレームワークであるCountingMOTのカウント,検出,再識別を共同でモデル化する。検出とカウントの間にオブジェクトカウントの制約を課すことで、countingmotはオブジェクト検出とクラウド密度マップ推定のバランスを見つけようとする。私たちのアプローチは、オブジェクトの検出、カウント、再同定のギャップを埋める試みです。これは、群衆密度を無視して、混み合ったシーンで失敗する傾向にある以前のMOT手法とは対照的である。提案手法は,mot16(mota:77.6),mot17(mota:78.0%),mot20(mota:70.2%)の公開ベンチマークにおいて,オンラインおよびリアルタイムのトラッキングを行うことができる。

The recent trend in multiple object tracking (MOT) is to solve detection and tracking jointly, learning object detection and appearance (or motion) features simultaneously. Despite competitive performance, joint detection and tracking usually fails to find accurate object associations in crowded scenes because of missed or false detections. In this paper, we jointly model counting, detection, and re-identification in an end-to-end framework, named CountingMOT, tailored for crowded scenes. By imposing mutual object-count constraints between detection and counting, CountingMOT tries to find a balance between object detection and crowd density map estimation, which can help it recover missed detections or reject false detections. Our approach is an attempt to bridge the gap between object detection, counting, and re-identification. This is in contrast to prior MOT methods that either ignore the crowd density, and are thus prone to failure in crowded scenes, or depend on local correlations to build a graphical relationship for matching targets. The proposed MOT tracker can perform online and real-time tracking, and achieves state-of-the-art results on the public benchmarks MOT16 (MOTA of 77.6), MOT17 (MOTA of 78.0%) and MOT20 (MOTA of 70.2%).

翻訳日:2022-12-13 16:57:03 公開日:2022-12-12
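The "mutual object-count constraint" mentioned in the abstract above can be illustrated with a simple consistency check: the integral of a crowd density map and the number of confident detections should agree. The abstract does not give the actual loss, so the following is only a hypothetical proxy; the function name, inputs, and the score threshold are assumptions made for illustration.

```python
from typing import Sequence


def count_consistency(
    density_map: Sequence[Sequence[float]],
    detection_scores: Sequence[float],
    score_threshold: float = 0.5,
) -> float:
    """Absolute gap between the density-map count and the detection count.

    A density map integrates (sums) to the estimated number of objects, so a
    large gap suggests missed detections or false positives.
    """
    density_count = sum(sum(row) for row in density_map)
    detection_count = sum(1 for s in detection_scores if s >= score_threshold)
    return abs(density_count - detection_count)
```

A gap near zero means the two heads agree on how many objects are present; a positive gap could be used as a penalty during training or as a diagnostic at inference time.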

# Diff-Font:ロバストワンショットフォント生成のための拡散モデル

Diff-Font: Diffusion Model for Robust One-Shot Font Generation ( http://arxiv.org/abs/2212.05895v1 )

ライセンス: Link先を確認

Haibin He, Xinyuan Chen, Chaoyue Wang, Juhua Liu, Bo Du, Dacheng Tao, Yu Qiao

(参考訳) フォント生成は困難で時間を要する作業であり、特に中国語など多数の文字を持つ複雑な構造を持つイデオグラムを用いた言語では特に困難である。この問題を解決するために、少数ショットフォント生成やワンショットフォント生成さえも注目されている。しかし、既存のフォント生成メソッドの多くは、まだ苦しむ可能性がある。 (i)大規模なクロスファントギャップチャレンジ (二)微妙なクロスファント変動問題、及び (三)複雑な文字を誤って生成すること。本稿では,大きなデータセット上で安定的に学習できる拡散モデルに基づく新しいワンショットフォント生成法diff-fontを提案する。提案モデルは,フォントライブラリ全体を生成することを目的として,参照として1つのサンプルのみを与える。具体的には、大きなストロークワイドデータセットを構築し、各生成された文字の構造と完了を保存するためのストロークワイド拡散モデルを提案する。我々の知る限りでは、フォント生成タスクを処理する拡散モデルを開発した最初のDiff-Fontが提案されている。十分に訓練されたdiff-fontはフォントギャップやフォントのバリエーションに頑健なだけでなく、難しい文字生成でも有望な性能を達成している。従来のフォント生成手法と比較して,本モデルは質的かつ定量的に,最先端の性能に達する。

Font generation is a difficult and time-consuming task, especially for languages that use ideograms with complicated structures and a large number of characters, such as Chinese. To solve this problem, few-shot and even one-shot font generation have attracted a lot of attention. However, most existing font generation methods may still suffer from (i) the large cross-font gap challenge; (ii) the subtle cross-font variation problem; and (iii) incorrect generation of complicated characters. In this paper, we propose a novel one-shot font generation method based on a diffusion model, named Diff-Font, which can be stably trained on large datasets. The proposed model aims to generate the entire font library given only one sample as the reference. Specifically, a large stroke-wise dataset is constructed, and a stroke-wise diffusion model is proposed to preserve the structure and completeness of each generated character. To the best of our knowledge, Diff-Font is the first work to develop diffusion models for the font generation task. The well-trained Diff-Font is not only robust to font gap and font variation, but also achieves promising performance on difficult character generation. Compared to previous font generation methods, our model reaches state-of-the-art performance both qualitatively and quantitatively.

翻訳日:2022-12-13 16:56:38 公開日:2022-12-12

# multiact: 複数のアクションラベルからの長期的3次元動作生成

MultiAct: Long-Term 3D Human Motion Generation from Multiple Action Labels ( http://arxiv.org/abs/2212.05897v1 )

ライセンス: Link先を確認

Taeryung Lee, Gyeongsik Moon, and Kyoung Mu Lee

(参考訳) 複数のアクションラベルから長期の3次元人間の動きを生成する問題に取り組む。アクションとモーションコンディショニングの2つの主要なアプローチには、この問題を解決するための制限がある。アクション条件付きメソッドは、単一のアクションから一連の動きを生成する。したがって、複数のアクションとアクション間の遷移からなる長期的な動作は生成できない。一方、モーションコンディショニング方式は、初期動作から将来の動きを生成する。生成された将来の動作は過去のみに依存するため、ユーザの望ましいアクションによって制御できない。複数のアクションラベルから長期的3次元動作を生成する最初のフレームワークであるmultiactを提案する。 MultiActは、動作条件と動作条件の両方を統一されたリカレント生成システムで考慮する。前の動作とアクションラベルを繰り返すと、その動作のスムーズな遷移と動きが生成される。その結果、MultiActは複数のアクションラベルの与えられたシーケンスによって制御される現実的な長期動作を生成する。コードはリリースされます。

We tackle the problem of generating long-term 3D human motion from multiple action labels. The two main previous approaches, action-conditioned and motion-conditioned methods, are limited in solving this problem. Action-conditioned methods generate a sequence of motion from a single action. Hence, they cannot generate long-term motions composed of multiple actions and transitions between actions. Meanwhile, motion-conditioned methods generate future motions from an initial motion. The generated future motions depend only on the past, so they are not controllable by the user's desired actions. We present MultiAct, the first framework to generate long-term 3D human motion from multiple action labels. MultiAct takes both action and motion conditions into account with a unified recurrent generation system. It repeatedly takes the previous motion and an action label, then generates a smooth transition and the motion of the given action. As a result, MultiAct produces realistic long-term motion controlled by the given sequence of multiple action labels. The code will be released.

翻訳日:2022-12-13 16:56:17 公開日:2022-12-12
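The recurrent generation scheme described in the abstract above (take the previous motion and the next action label, emit a transition plus the motion for that action, repeat) can be sketched as a plain loop. This is an illustration of the control flow only; `generate_long_motion` and the `generate_step` callable stand in for the learned model and are hypothetical names, not the authors' API.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Frame = object  # placeholder for one pose frame


def generate_long_motion(
    action_labels: Sequence[str],
    generate_step: Callable[[List[Frame], str], Tuple[List[Frame], List[Frame]]],
    initial_motion: Optional[List[Frame]] = None,
) -> List[Frame]:
    """Chain per-action segments into one long motion.

    `generate_step(previous_motion, label)` is assumed to return
    (transition_frames, action_frames) conditioned on both inputs.
    """
    motion: List[Frame] = list(initial_motion or [])
    previous: List[Frame] = list(initial_motion or [])
    for label in action_labels:
        transition, segment = generate_step(previous, label)
        motion.extend(transition)  # smooth hand-over from the previous action
        motion.extend(segment)     # motion for the current action label
        previous = segment         # condition the next step on what was just generated
    return motion
```

Because each step sees only the previous segment and the next label, the sequence length is not bounded by the model's training window, which is what makes arbitrarily long multi-action motion possible in this kind of design.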

# 階層型アンカーフリー物体検出器を用いた深層学習に基づく非定型および正常ミトースのサブタイプ

Deep learning-based Subtyping of Atypical and Normal Mitoses using a Hierarchical Anchor-Free Object Detector ( http://arxiv.org/abs/2212.05900v1 )

ライセンス: Link先を確認

Marc Aubreville, Jonathan Ganz, Jonas Ammeling, Taryn A. Donovan, Rutger H. J. Fick, Katharina Breininger, Christof A. Bertram

(参考訳) 多くの腫瘍の悪性度評価には, 分裂活性が重要である。また,正常ミトーシスに対する異常ミトーシスの割合は予後に重要な意味があることが示されている。非定型的有糸分裂図形(MF)は、形態学的に、クロマチドの分離異常を有すると同定できる。本研究は,ミトーシスの異なる相の特徴的な形態的出現に応じて,初めて,ミトーシス図形の正常および非定型分類への自動サブタイピングを行う。 MIDOG21とTUPAC16の乳がん有糸分裂データセットを用いて、2人の専門家が盲目的に有糸分裂像を5つの形態分類に分類した。さらに,アンカーフリーなfcosアプローチを拡張した最先端オブジェクト検出パイプラインを,階層的サブクラス化ブランチで構築した。我々のラベル付け実験は、有糸分裂体のサブタイプは難しい課題であり、MFの24.89%で見られるラター間不一致の傾向を示した。より多様な訓練用midog21データセットと試験用tupac16を用いて、平均平均精度スコア0.552、非定型/正常mf用roc aucスコア0.833、mitosisの異なる相の細胞を識別するための平均クラス平均roc-aucスコア0.977に達した。

Mitotic activity is key for the assessment of malignancy in many tumors. Moreover, it has been demonstrated that the proportion of abnormal mitosis to normal mitosis is of prognostic significance. Atypical mitotic figures (MF) can be identified morphologically as having segregation abnormalities of the chromatids. In this work, we perform, for the first time, automatic subtyping of mitotic figures into normal and atypical categories according to characteristic morphological appearances of the different phases of mitosis. Using the publicly available MIDOG21 and TUPAC16 breast cancer mitosis datasets, two experts blindly subtyped mitotic figures into five morphological categories. Further, we set up a state-of-the-art object detection pipeline extending the anchor-free FCOS approach with a gated hierarchical subclassification branch. Our labeling experiment indicated that subtyping of mitotic figures is a challenging task and prone to inter-rater disagreement, which we found in 24.89% of MF. Using the more diverse MIDOG21 dataset for training and TUPAC16 for testing, we reached a mean overall average precision score of 0.552, a ROC AUC score of 0.833 for atypical/normal MF and a mean class-averaged ROC-AUC score of 0.977 for discriminating the different phases of cells undergoing mitosis.

翻訳日:2022-12-13 16:56:04 公開日:2022-12-12

# SRouDA:ロバストな教師なしドメイン適応のためのメタセルフトレーニング

SRoUDA: Meta Self-training for Robust Unsupervised Domain Adaptation ( http://arxiv.org/abs/2212.05917v1 )

ライセンス: Link先を確認

Wanqing Zhu, Jia-Li Yin, Bo-Hao Chen, Ximeng Liu

(参考訳) データの手動ラベルの取得にはコストがかかる可能性があるため、リッチラベルデータセットから未ラベルのターゲットデータセットに学習した知識を転送するunsupervised domain adaptation(UDA)が人気を集めている。対象領域におけるモデル精度の向上に多くの研究が費やされているが、モデル堅牢性の重要な問題は無視されている。さらに悪いことに、モデルロバスト性を改善するための従来の対戦訓練(AT)手法は、教師付き損失関数によって生成される敵の例に基づいてモデルを訓練するため、UDAシナリオでは適用できない。本稿では,UDAモデルの対角的堅牢性を改善するために,SRoUDAというメタ自己学習パイプラインを提案する。自己学習パラダイムに基づいて、SRoUDAは、ソースラベル付きデータにUDAベースラインを適用し、開発したランダムマスク拡張(RMA)でラベルなしデータをタラゲットすることでソースモデルを事前トレーニングし、その後、擬似ラベル付きターゲットデータに基づく敵ターゲットモデルトレーニングと、メタステップでソースモデルを微調整する。自己学習は、UDAにATを直接組み込むことを可能にするが、SRoUDAのメタステップは、ノイズの多い擬似ラベルからのエラー伝播を緩和するのに役立つ。さまざまなベンチマークデータセットに対する大規模な実験は、SRoUDAの最先端性能を示し、クリーンな精度を損なうことなく、重要なモデルロバスト性の改善を実現する。コードはhttps://github.com/Vision.comで入手できる。

As acquiring manual labels for data can be costly, unsupervised domain adaptation (UDA), which transfers knowledge learned from a label-rich dataset to an unlabeled target dataset, is gaining popularity. While extensive studies have been devoted to improving model accuracy on the target domain, the important issue of model robustness has been neglected. To make things worse, conventional adversarial training (AT) methods for improving model robustness are inapplicable in the UDA scenario, since they train models on adversarial examples that are generated with a supervised loss function. In this paper, we present a new meta self-training pipeline, named SRoUDA, for improving the adversarial robustness of UDA models. Based on the self-training paradigm, SRoUDA starts by pre-training a source model, applying a UDA baseline to labeled source data and unlabeled target data with a newly developed random masked augmentation (RMA), and then alternates between adversarial training of the target model on pseudo-labeled target data and fine-tuning of the source model by a meta step. While self-training allows the direct incorporation of AT in UDA, the meta step in SRoUDA further helps mitigate error propagation from noisy pseudo labels. Extensive experiments on various benchmark datasets demonstrate the state-of-the-art performance of SRoUDA, which achieves significant improvements in model robustness without harming clean accuracy. Code is available at https://github.com/Vision.

翻訳日:2022-12-13 16:55:39 公開日:2022-12-12

# ProtoPNetは本当に説明可能であるか? プロトタイプの解釈可能性の評価と改善

Is ProtoPNet Really Explainable? Evaluating and Improving the Interpretability of Prototypes ( http://arxiv.org/abs/2212.05946v1 )

ライセンス: Link先を確認

Qihan Huang, Mengqi Xue, Haofei Zhang, Jie Song, Mingli Song

(参考訳) ProtoPNetとその追従型(ProtoPNets)は、プロトタイプから固有の解釈可能性と非解釈不可能な解釈に匹敵する精度で、幅広い研究の関心を集めている。しかし,最近になって,潜在空間における類似性と入力空間における類似性の関係から,プロトタイプの解釈性が損なわれることが判明した。本研究は,サクラの摘み取りによって容易に誤解されるような可視化例による質的評価に留まらず,プロトタイプに基づく説明の解釈性を定量的に評価する最初の試みである。そこで本研究では,2つの評価指標,すなわち一貫性スコアと安定性スコアを提案し,説明一貫性クロスイメージと摂動に対する説明堅牢性を評価する。さらに,プロトタイプの解釈性を向上させるために,浅層深度特徴アライメント(SDFA)モジュールとスコアアグリゲーション(SA)モジュールを提案する。我々は,既存のプロトネットの解釈可能性を明らかにするために,体系的な評価実験を行い,実質的な議論を行う。実験により,従来の定性評価と定量的評価の両面において,精度と解釈性の両方において,本手法は最先端技術よりも優れた性能を示すことが示された。コードはhttps://github.com/hqhQAQ/EvalProtoPNetで入手できる。

ProtoPNet and its follow-up variants (ProtoPNets) have attracted broad research interest for their intrinsic interpretability from prototypes and their accuracy, which is comparable to that of non-interpretable counterparts. However, it has recently been found that the interpretability of prototypes can be corrupted by the semantic gap between similarity in latent space and similarity in input space. In this work, we make the first attempt to quantitatively evaluate the interpretability of prototype-based explanations, rather than relying solely on qualitative evaluation with a few visualization examples, which can easily be misleading because of cherry-picking. To this end, we propose two evaluation metrics, termed consistency score and stability score, to evaluate explanation consistency across images and explanation robustness against perturbations, both of which are essential for explanations to be used in practice. Furthermore, we propose a shallow-deep feature alignment (SDFA) module and a score aggregation (SA) module to improve the interpretability of prototypes. We conduct systematic evaluation experiments and substantial discussions to uncover the interpretability of existing ProtoPNets. Experiments demonstrate that our method achieves significantly superior performance to the state of the art, under both the conventional qualitative evaluations and the proposed quantitative evaluations, in both accuracy and interpretability. Code is available at https://github.com/hqhQAQ/EvalProtoPNet.

翻訳日:2022-12-13 16:55:07 公開日:2022-12-12
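The abstract above describes the stability score only as a measure of explanation robustness against perturbations and does not give its definition. One simple proxy, offered purely as an assumption and not as the paper's metric, is the fraction of images for which a prototype's most activated location stays the same after the input is perturbed. All names below are hypothetical.

```python
from typing import List, Sequence, Tuple

Map2D = Sequence[Sequence[float]]


def argmax_2d(activation: Map2D) -> Tuple[int, int]:
    """Return the (row, col) of the largest value in a 2D activation map."""
    best = max(
        (value, (r, c))
        for r, row in enumerate(activation)
        for c, value in enumerate(row)
    )
    return best[1]


def stability_proxy(clean_maps: List[Map2D], perturbed_maps: List[Map2D]) -> float:
    """Share of images whose peak activation location survives the perturbation."""
    same = sum(
        1
        for clean, perturbed in zip(clean_maps, perturbed_maps)
        if argmax_2d(clean) == argmax_2d(perturbed)
    )
    return same / len(clean_maps)
```

A value of 1.0 means the prototype points at the same location in every image before and after perturbation; lower values indicate explanations that shift under noise.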

# マスクオートエンコーダはトランスフォーマーデータハングリーの効果的な解法である

Masked autoencoders is an effective solution to transformer data-hungry ( http://arxiv.org/abs/2212.05677v1 )

ライセンス: Link先を確認

Jiawei Mao, Honggu Zhou, Xuesong Yin, Yuanqi Chang, Binling Nie, Rui Xu

(参考訳) ビジョントランスフォーマー(ViT)は、いくつかのビジョンタスクにおいて、そのグローバルモデリング能力で畳み込みニューラルネットワーク(CNN)を上回っている。しかし、ViTには畳み込みに固有の誘導バイアスがないため、トレーニングには大量のデータが必要である。これにより、ViTは医学や科学のような小さなデータセット上でCNNと同等に動作しない。マスク付きオートエンコーダ(mae)はトランスフォーマーを画像そのものに集中させることで、vitのデータ・ハングリー問題をある程度緩和できることを実験的に発見した。しかし、現在のmaeモデルは複雑すぎるため、小さなデータセットに過剰フィッティング問題が発生する。これにより、小さなデータセットでトレーニングされたMAEと高度なCNNモデルのギャップが生じる。そこで、maeにおけるデコーダの複雑さを低減させる方法について検討し、小さなデータセットでそれに適したアーキテクチャ構成を見出した。さらに,位置予測タスクと対比学習タスクも設計し,maeの局所化と不分散特性を導入した。対照的な学習タスクは、モデルがハイレベルなビジュアル情報を学習できるだけでなく、maeのクラストークンのトレーニングも可能にします。ほとんどのMAE改善努力は考慮していません。大規模な実験により,本手法は,現在普及しているマスク画像モデリング(MIM)や小型データセットのビジョントランスフォーマーと比較して,標準の小型データセットと医療データセットの最先端性能を示すとともに,そのコードとモデルはhttps://github.com/Talented-Q/SDMAEで公開されている。

Vision Transformers (ViTs) outperform convolutional neural networks (CNNs) in several vision tasks thanks to their global modeling capabilities. However, ViT lacks the inductive bias inherent to convolution, so it requires a large amount of data for training. As a result, ViT does not perform as well as CNNs on small datasets such as those in medicine and science. We found experimentally that masked autoencoders (MAE) can make the transformer focus more on the image itself, thus alleviating the data-hungry issue of ViT to some extent. Yet the current MAE model is too complex, which causes over-fitting on small datasets. Consequently, a gap still remains between MAEs trained on small datasets and advanced CNN models. Therefore, we investigated how to reduce the decoder complexity in MAE and found an architectural configuration better suited to small datasets. In addition, we designed a location prediction task and a contrastive learning task to introduce localization and invariance characteristics into MAE. Our contrastive learning task not only enables the model to learn high-level visual information but also allows the training of MAE's class token, something that most MAE improvement efforts do not consider. Extensive experiments show that our method achieves state-of-the-art performance on standard small datasets, as well as on medical datasets with few samples, compared to currently popular masked image modeling (MIM) methods and vision transformers for small datasets. The code and models are available at https://github.com/Talented-Q/SDMAE.

翻訳日:2022-12-13 16:49:40 公開日:2022-12-12
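For readers unfamiliar with masked autoencoders, the core preprocessing step is random patch masking: most image patches are hidden and only the visible ones are fed to the encoder. The sketch below illustrates that step in plain Python. The 0.75 default follows the mask ratio commonly used with MAE and is an assumption here, as is the function name; the abstract above does not state which ratio this paper uses.

```python
import random
from typing import List, Optional, Tuple


def random_patch_mask(
    num_patches: int,
    mask_ratio: float = 0.75,
    seed: Optional[int] = None,
) -> Tuple[List[int], List[int]]:
    """Split patch indices into (visible, masked) sets uniformly at random."""
    rng = random.Random(seed)
    num_masked = int(num_patches * mask_ratio)
    order = list(range(num_patches))
    rng.shuffle(order)
    masked = sorted(order[:num_masked])   # patches the decoder must reconstruct
    visible = sorted(order[num_masked:])  # patches the encoder actually sees
    return visible, masked
```

For a 4x4 grid of patches, `random_patch_mask(16)` keeps 4 visible patches and masks 12, so the encoder processes only a quarter of the tokens.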

# ポイントクラウド登録のための解空間切断を用いた進化的マルチタスク

Evolutionary Multitasking with Solution Space Cutting for Point Cloud Registration ( http://arxiv.org/abs/2212.05679v1 )

ライセンス: Link先を確認

Wu Yue, Peiran Gong, Maoguo Gong, Hangqi Ding, Zedong Tang, Yibo Liu, Wenping Ma, Qiguang Miao

(参考訳) ポイントクラウド登録(PCR)はコンピュータビジョンにおいて人気のある研究トピックである。近年,対象関数設計における初期ポーズに対する頑健さと柔軟性から,進化的手法による登録法が注目されている。しかし、ほとんどの登録法は局所最適にうまく対応できず、成功率を調査することはめったになく、これは局所最適に陥らない可能性を示し、アルゴリズムの実用性に密接に関係している。進化的マルチタスク最適化(EMTO)は、関連するタスク間の知識伝達を通じて探索能力を向上するパラダイムである。この概念に着想を得た本研究では,マルチタスク構成を解空間切断の考え方に基づくEMTOによる新規な登録アルゴリズムを提案する。具体的には, カットスペースを探索するタスクは, 局所最適から逃れ, 登録率を向上する上で, 複雑な関数ランドスケープを伴うタスクを支援する。不要な計算コストを削減するため,スパース・トゥ・ダンス戦略を提案する。また,様々なオーバーラップ率に頑健な新しい適合関数と,計算コストの課題特異的指標を導入する。オブジェクトスケールおよびシーンスケールの登録データセットに対する7つの進化した登録手法と4つの従来の登録手法と比較して,提案手法の精度および局所最適処理における優れた性能を示す実験結果が得られた。

Point cloud registration (PCR) is a popular research topic in computer vision. Recently, evolutionary registration methods have received continuous attention because of their robustness to the initial pose and their flexibility in objective function design. However, most evolutionary registration methods cannot handle local optima well, and they have rarely investigated the success ratio, which reflects the probability of not falling into local optima and is closely related to the practicality of the algorithm. Evolutionary multi-task optimization (EMTO) is a widely used paradigm that can boost exploration capability through knowledge transfer among related tasks. Inspired by this concept, this study proposes a novel evolutionary registration algorithm via EMTO, where the multi-task configuration is based on the idea of solution space cutting. Concretely, one task that searches in the cut space assists another task with a complex function landscape in escaping from local optima and improving the successful registration ratio. To reduce unnecessary computational cost, a sparse-to-dense strategy is proposed. In addition, a novel fitness function that is robust to various overlap rates and a problem-specific metric of computational cost are introduced. In comparisons with 7 evolutionary registration approaches and 4 traditional registration approaches on object-scale and scene-scale registration datasets, experimental results demonstrate that the proposed method has superior performance in terms of precision and handling local optima.

翻訳日:2022-12-13 16:49:12 公開日:2022-12-12

# CircleNet:ロバストペデストリアン検出のためのReciprocating Feature Adaptation

CircleNet: Reciprocating Feature Adaptation for Robust Pedestrian Detection ( http://arxiv.org/abs/2212.05691v1 )

ライセンス: Link先を確認

Tianliang Zhang, Zhenjun Han, Huijuan Xu, Baochang Zhang, Qixiang Ye

(参考訳) 野生での歩行者検出は、特にシーンが歩行者の著しい閉塞や解像度が低い場合に問題となっている。既存のメソッドは、許容できるパフォーマンスを維持しながら、これらの難しいケースに適応できない。本稿では,circlenetと呼ばれる新しい特徴学習モデルを提案する。このモデルでは,低分解能とオクルードされた物体を人間が観察する過程を模倣して,特徴適応を実現する。 circlenetは機能ピラミッドのセットとして実装され、機能融合を改善するために重み共有パス拡張を使用する。複数のトップダウンおよびボトムアップパスを使用して、特徴適応と反復オブジェクト検出を相互に行う。 CircleNetの特徴適応能力を最大限に活用するために、各サイクルにおける様々な解像度と異なる閉塞レベルの歩行者インスタンスの検出に焦点を当てたインスタンス分解訓練戦略を設計する。具体的には、CircleNetは機能アンサンブルを、エンドツーエンドでハードネガティブなブーストという考え方で実装している。カルテックとシティパーソンの2つの歩行者検出データセットの実験では、circlenetは、通常のインスタンスでの良好なパフォーマンスを維持しつつ、かなりのマージンでオクルードと低解像度の歩行者のパフォーマンスを改善している。

Pedestrian detection in the wild remains a challenging problem, especially when the scene contains significant occlusion and/or the pedestrians to be detected have low resolution. Existing methods are unable to adapt to these difficult cases while maintaining acceptable performance. In this paper, we propose a novel feature learning model, referred to as CircleNet, that achieves feature adaptation by mimicking how humans look at low-resolution and occluded objects: focusing on an object again, at a finer scale, if it cannot be identified clearly the first time. CircleNet is implemented as a set of feature pyramids and uses weight-sharing path augmentation for better feature fusion. It targets reciprocating feature adaptation and iterative object detection using multiple top-down and bottom-up pathways. To take full advantage of the feature adaptation capability of CircleNet, we design an instance decomposition training strategy that focuses on detecting pedestrian instances of various resolutions and different occlusion levels in each cycle. Specifically, CircleNet implements feature ensembling with the idea of hard negative boosting in an end-to-end manner. Experiments on two pedestrian detection datasets, Caltech and CityPersons, show that CircleNet improves performance on occluded and low-resolution pedestrians by significant margins while maintaining good performance on normal instances.

翻訳日:2022-12-13 16:48:51 公開日:2022-12-12

# 検出選択アルゴリズム : 物体検出のためのポスト処理を行う確率ベース最適化手法

Detection Selection Algorithm: A Likelihood based Optimization Method to Perform Post Processing for Object Detection ( http://arxiv.org/abs/2212.05706v1 )

ライセンス: Link先を確認

Angzhi Fan, Benjamin Ticknor and Yali Amit

(参考訳) 物体検出では、非最大抑圧(NMS)のような後処理法が広く用いられている。 NMSは偽陽性の検出回数を大幅に減らすことができるが、目標値の低いいくつかの検出を維持できる可能性がある。画像中のオブジェクトとそのラベルの正確な数を求めるため,NMSや関連手法の後に使用される検出選択アルゴリズム(DSA)と呼ばれるポスト処理手法を提案する。 DSAは検出されたバウンディングボックスのサブセットを優雅に選択し、オブジェクトの閉塞を考慮した画像全体の解釈を最も高い確率で行う完全なオブジェクト再構成を行う。アルゴリズムは4つの要素からなる。まず、オブジェクト間の閉塞関係を得るために、より高速なR-CNNに閉塞分岐を追加する。第2に,我々がデコーダと呼ぶ訓練済み生成ネットワークの潜在変数の最適化に基づいて,その可視部分から物体全体の外観を再構築できる単一再構成アルゴリズムを開発した。第3に, 咬合順序を考慮した仮説的解釈により, 全物体の同時再構成を行う全再構成アルゴリズムを提案する。最後に,リストから検出を漸進的に追加または削除し,対応する解釈の可能性を最大化する欲望アルゴリズムを提案する。 NMS や Soft-NMS を用いた DSA は NMS や Soft-NMS よりも優れた結果が得られる。

In object detection, post-processing methods like Non-maximum Suppression (NMS) are widely used. NMS can substantially reduce the number of false positive detections but may still keep some detections with low objectness scores. In order to find the exact number of objects and their labels in the image, we propose a post-processing method called Detection Selection Algorithm (DSA), which is used after NMS or related methods. DSA greedily selects a subset of detected bounding boxes, together with full object reconstructions, that gives the interpretation of the whole image with the highest likelihood, taking into account object occlusions. The algorithm consists of four components. First, we add an occlusion branch to Faster R-CNN to obtain occlusion relationships between objects. Second, we develop a single reconstruction algorithm that can reconstruct the whole appearance of an object given its visible part, based on the optimization of latent variables of a trained generative network, which we call the decoder. Third, we propose a whole reconstruction algorithm that generates the joint reconstruction of all objects in a hypothesized interpretation, taking into account occlusion ordering. Finally, we propose a greedy algorithm that incrementally adds or removes detections from a list to maximize the likelihood of the corresponding interpretation. DSA with NMS or Soft-NMS can achieve better results than NMS or Soft-NMS alone, as illustrated in our experiments on synthetic images with multiple 3D objects.

翻訳日:2022-12-13 16:48:33 公開日:2022-12-12
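Since DSA in the abstract above is applied after Non-maximum Suppression, it may help to recall what standard greedy NMS does: keep the highest-scoring box, drop every remaining box that overlaps it too much, and repeat. The sketch below is the textbook algorithm in plain Python, not the DSA method itself; the `(x1, y1, x2, y2)` box format and the 0.5 IoU threshold are assumptions.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

Note that NMS decides purely from overlap and score, so an isolated low-score false positive survives; that is the gap a likelihood-based selection step such as DSA is meant to close.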

# ホットコールドブロック:新しいウェアラブルデザインで赤外線センサーを騙す

HOTCOLD Block: Fooling Thermal Infrared Detectors with a Novel Wearable Design ( http://arxiv.org/abs/2212.05709v1 )

ライセンス: Link先を確認

Hui Wei, Zhixiang Wang, Xuemei Jia, Yinqiang Zheng, Hao Tang, Shin'ichi Satoh, Zheng Wang

(参考訳) 熱赤外画像に対する敵対的な攻撃は、関連する応用のリスクを露呈する。これらのシステムのセキュリティを見積もることは、現実世界に安全にデプロイするには不可欠です。多くの場合、物理的空間における攻撃を実現するには、精巧な特別な摂動が必要である。これらの解はしばしば非実用的で人目を引く。物理的に実用的でステルス的な敵攻撃の必要性に対処するために、ウェアラブルウォーミングペーストと冷却ペーストを利用する人を隠蔽する赤外線検出器の新しい物理的攻撃である HotCold Block を導入する。これらの容易に利用できる温度制御材料を体に取り付けることで、HotCold Block は人間の目から効率的に逃れる。さらに、複雑なテクスチャと構造特徴を持つ逆パッチを構築する既存の方法とは異なり、HotCold Block は、純粋な色ブロックによる攻撃を可能にし、サイズ、形状、位置が攻撃性能に与える影響を探索するSSP指向の逆最適化アルゴリズムを使用している。ディジタル環境と物理環境の両方における広範な実験結果は、提案する HotCold Block の性能を示している。コードは https://github.com/weihui1308/HOTCOLDBlock で入手できる。

Adversarial attacks on thermal infrared imaging expose the risk of related applications. Estimating the security of these systems is essential for safely deploying them in the real world. In many cases, realizing the attacks in the physical space requires elaborate special perturbations. These solutions are often impractical and attention-grabbing. To address the need for a physically practical and stealthy adversarial attack, we introduce HotCold Block, a novel physical attack on infrared detectors that hides persons using wearable Warming Paste and Cooling Paste. By attaching these readily available temperature-controlled materials to the body, HotCold Block evades human eyes efficiently. Moreover, unlike existing methods that build adversarial patches with complex texture and structure features, HotCold Block uses an SSP-oriented adversarial optimization algorithm that enables attacks with pure color blocks and explores the influence of size, shape, and position on attack performance. Extensive experimental results in both digital and physical environments demonstrate the performance of the proposed HotCold Block. Code is available: https://github.com/weihui1308/HOTCOLDBlock.

翻訳日:2022-12-13 16:48:09 公開日:2022-12-12

# Occluded Pedestrian Detectionのための特徴校正ネットワーク

Feature Calibration Network for Occluded Pedestrian Detection ( http://arxiv.org/abs/2212.05717v1 )

ライセンス: Link先を確認

Tianliang Zhang, Qixiang Ye, Baochang Zhang, Jianzhuang Liu, Xiaopeng Zhang, Qi Tian

(参考訳) 野生での歩行者検出は、特に深刻な閉塞を含むシーンでは難しい問題である。本稿では,様々な閉塞下で歩行者を適応的に検出する特徴校正ネットワーク(FC-Net)と呼ばれる,ディープラーニングフレームワークにおける特徴学習手法を提案する。 FC-Netは、歩行者の可視部分が検出に選択的かつ決定的であることの観察に基づいており、セルフアクティベーション(SA)モジュールと特徴校正(FC)モジュールを備えたセルフペース機能学習フレームワークとして実装されている。 FC-Netは、新しい自己活性化方式で、目に見える部分をハイライトし、歩行者の隠された部分を抑える特徴を学習する。 SAモジュールは、余分なパラメータを伴わずに、分類器の重みを再利用することによって歩行者の活性化マップを推定し、その結果、特徴のセマンティクスを強化するための極めてパーシモニーモデルとなり、FCモジュールは、画素単位でも地域的にも適応的な歩行者表現のための畳み込み特性を校正する。 CityPersonsとCaltechのデータセットの実験では、FC-Netは閉塞歩行者の検知性能を最大10%改善し、非閉塞歩行者の優れた性能を維持している。

Pedestrian detection in the wild remains a challenging problem, especially for scenes containing serious occlusion. In this paper, we propose a novel feature learning method in the deep learning framework, referred to as Feature Calibration Network (FC-Net), to adaptively detect pedestrians under various occlusions. FC-Net is based on the observation that the visible parts of pedestrians are selective and decisive for detection, and is implemented as a self-paced feature learning framework with a self-activation (SA) module and a feature calibration (FC) module. In a new self-activated manner, FC-Net learns features that highlight the visible parts and suppress the occluded parts of pedestrians. The SA module estimates pedestrian activation maps by reusing classifier weights, without any additional parameters, which results in an extremely parsimonious model for reinforcing the semantics of features, while the FC module calibrates the convolutional features for adaptive pedestrian representation in both pixel-wise and region-based ways. Experiments on the CityPersons and Caltech datasets demonstrate that FC-Net improves detection performance on occluded pedestrians by up to 10% while maintaining excellent performance on non-occluded instances.

翻訳日:2022-12-13 16:47:48 公開日:2022-12-12
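Estimating an activation map "by reusing classifier weights", as the abstract above puts it, resembles the class activation mapping idea: weight each feature channel by the classifier weight for the target class and sum over channels. The sketch below shows that computation on nested lists. It is an assumed reading of the self-activation step, not the FC-Net implementation, and the function and argument names are hypothetical.

```python
from typing import List, Sequence

Channel = Sequence[Sequence[float]]  # one H x W feature channel


def activation_map(features: Sequence[Channel], class_weights: Sequence[float]) -> List[List[float]]:
    """Weighted sum of feature channels using existing classifier weights.

    `features` has shape (C, H, W); `class_weights` has length C. No new
    parameters are introduced, only the classifier's weights are reused.
    """
    height = len(features[0])
    width = len(features[0][0])
    return [
        [
            sum(w * channel[y][x] for w, channel in zip(class_weights, features))
            for x in range(width)
        ]
        for y in range(height)
    ]
```

Locations with high values are those the classifier already considers pedestrian-like (typically visible body parts), so the map could then be used to reweight features pixel by pixel.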

# 画像アライメントのための変換テンソル・テンソル生成物によるテンソル因子化

Tensor Factorization via Transformed Tensor-Tensor Product for Image Alignment ( http://arxiv.org/abs/2212.05719v1 )

ライセンス: Link先を確認

Sijia Xia, Duo Qiu, and Xiongjun Zhang

(参考訳) 本稿では,観測された画像が未知の領域変換によって変形し,付加ガウス雑音とスパースノイズによって同時に劣化する線形相関画像アライメントのバッチ問題について検討する。これらの画像を3階テンソルの正面スライスとして積み重ねることで、変換テンソルテンソル積によるテンソル分解法を用いて、基底テンソルの低ランク性を探索し、任意のユニタリ変換の下で変換テンソルテンソル積を介して2つの小さなテンソルの積に分解する。変換テンソル-テンソル積の主な利点は、その計算複雑性が変換テンソル核ノルムに基づく既存の文献よりも低いことである。さらに、テンソル$\ell_p$$(0<p<1)$ノルムはスパースノイズの空間性を特徴づけるために使用され、テンソルのフロベニウスノルムは加法ガウスノイズをモデル化するために用いられる。一般化されたGauss-Newtonアルゴリズムは、ドメイン変換を線形化して得られたモデルを解くために設計され、対応するサブプロブレムを解くために近位Gauss-Seidelアルゴリズムが開発された。さらに、近位ガウス-セイデルアルゴリズムの収束が確立され、その収束率はクルディカ-$\l$ojasiewicz の性質に基づいて解析される。実世界の画像データセットに関する広範囲な数値実験を行い,精度と計算時間の両方において,提案手法の優れた性能を示す。

In this paper, we study the problem of aligning a batch of linearly correlated images, where the observed images are deformed by unknown domain transformations and simultaneously corrupted by additive Gaussian noise and sparse noise. By stacking these images as the frontal slices of a third-order tensor, we propose a tensor factorization method based on the transformed tensor-tensor product to exploit the low-rankness of the underlying tensor, which is factorized into the product of two smaller tensors via the transformed tensor-tensor product under any unitary transformation. The main advantage of the transformed tensor-tensor product factorization is that its computational complexity is lower than that of existing methods based on the transformed tensor nuclear norm. Moreover, the tensor $\ell_p$ $(0<p<1)$ norm is employed to characterize the sparsity of the sparse noise, and the tensor Frobenius norm is adopted to model the additive Gaussian noise. A generalized Gauss-Newton algorithm is designed to solve the resulting model by linearizing the domain transformations, and a proximal Gauss-Seidel algorithm is developed to solve the corresponding subproblem. Furthermore, the convergence of the proximal Gauss-Seidel algorithm is established, and its convergence rate is analyzed based on the Kurdyka-Łojasiewicz property. Extensive numerical experiments on real-world image datasets are carried out to demonstrate the superior performance of the proposed method compared to several state-of-the-art methods in both accuracy and computational time.

翻訳日:2022-12-13 16:47:25 公開日:2022-12-12
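For reference, the transformed tensor-tensor product that the abstract above relies on is usually defined slice by slice in a transform domain. The notation below is an assumption, since the abstract fixes no symbols: $\mathcal{A} \in \mathbb{C}^{n_1 \times r \times n_3}$, $\mathcal{B} \in \mathbb{C}^{r \times n_2 \times n_3}$, $\mathbf{U} \in \mathbb{C}^{n_3 \times n_3}$ is a unitary matrix, $\times_3$ is the mode-3 product, and a superscript $(i)$ denotes the $i$-th frontal slice.

```latex
\hat{\mathcal{A}} = \mathcal{A} \times_3 \mathbf{U}, \qquad
\hat{\mathcal{B}} = \mathcal{B} \times_3 \mathbf{U}, \\
\hat{\mathcal{C}}^{(i)} = \hat{\mathcal{A}}^{(i)} \, \hat{\mathcal{B}}^{(i)}, \quad i = 1, \dots, n_3, \\
\mathcal{A} \diamond_{\mathbf{U}} \mathcal{B} = \hat{\mathcal{C}} \times_3 \mathbf{U}^{H}.
```

When $\mathbf{U}$ is the discrete Fourier transform matrix this reduces to the classical t-product. Factorizing an $n_1 \times n_2 \times n_3$ tensor as $\mathcal{A} \diamond_{\mathbf{U}} \mathcal{B}$ with small $r$ replaces the slice-wise SVDs required by nuclear-norm approaches with $n_3$ small matrix products, which is consistent with the lower computational complexity the abstract claims.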

# hdnet:群衆数を階層的に分離したネットワーク

HDNet: A Hierarchically Decoupled Network for Crowd Counting ( http://arxiv.org/abs/2212.05722v1 )

ライセンス: Link先を確認

Chenliang Gu, Changan Wang, Bin-Bin Gao, Jun Liu, Tianliang Zhang

(参考訳) 近年、密度分布への適合能力が高いことから、密度マップ回帰に基づく手法が群衆カウントにおいて主流となっている。しかし、紛らわしい背景ノイズと大きな密度変動が主な原因で、さらなる改善は飽和する傾向にある。本稿では、上記の2つの問題を統一的なフレームワーク内で解決する階層的分離ネットワーク(HDNet)を提案する。具体的には、背景分類サブタスクを密度マップ予測タスクから切り出し、密度デカップリングモジュール(DDM)に割り当てて、その高い識別能力を活用する。残りの前景予測サブタスクは、DDMによってさらに複数の密度別サブタスクへ階層的に分解され、前景密度推定モジュール(FDEM)内の回帰ベースのエキスパートによって解かれる。提案する戦略は仮説空間を効果的に縮小し、タスク固有のエキスパートの最適化を容易にするが、これらのサブタスク間の高い相関は無視されてしまう。そこで、フレームワーク全体を統一するために、特徴インタラクション、勾配インタラクション、スケールインタラクションという3種類のインタラクション戦略を導入する。これらを統合したHDNetは、いくつかの代表的なカウントベンチマークで最先端の性能を達成する。

Recently, density map regression-based methods have dominated crowd counting owing to their excellent fitting ability on density distributions. However, further improvement tends to saturate, mainly because of confusing background noise and large density variation. In this paper, we propose a Hierarchically Decoupled Network (HDNet) to solve the above two problems within a unified framework. Specifically, a background classification sub-task is decomposed from the density map prediction task and assigned to a Density Decoupling Module (DDM) to exploit its highly discriminative ability. The remaining foreground prediction sub-task is further hierarchically decomposed into several density-specific sub-tasks by the DDM, which are then solved by the regression-based experts in a Foreground Density Estimation Module (FDEM). Although the proposed strategy effectively reduces the hypothesis space so as to ease the optimization of those task-specific experts, the high correlation between these sub-tasks is ignored. Therefore, we introduce three types of interaction strategies to unify the whole framework: Feature Interaction, Gradient Interaction, and Scale Interaction. Integrating these components, HDNet achieves state-of-the-art performance on several popular counting benchmarks.

翻訳日:2022-12-13 16:46:56 公開日:2022-12-12

# ROIFormer: 効率的な自己教師付き単眼深度推定のための意味認識型関心領域トランスフォーマー

ROIFormer: Semantic-Aware Region of Interest Transformer for Efficient Self-Supervised Monocular Depth Estimation ( http://arxiv.org/abs/2212.05729v1 )

ライセンス: Link先を確認

Daitao Xing, Jinglin Shen, Chiuman Ho and Anthony Tzes

(参考訳) 相互に適合するクロスドメインの探索は、正確な自己監督深度推定への大きな可能性を示している。本研究では,深度情報と意味情報の融合について再検討し,幾何認識表現強調のための効率的な局所適応注意法を提案する。グローバルな接続を構築したり、制約なく特徴空間に注意を向ける代わりに、学習可能な関心領域内に空間的相互作用を縛り付ける。特に,意味情報からの幾何学的手がかりを利用して局所適応境界ボックスを学習し,教師なし特徴集約を導く。局所領域は注意空間から最も無関係な参照ポイントを妨げ、より選択的な特徴学習とより速い収束をもたらす。我々は自然にパラダイムを多面的・階層的な方法で拡張し、異なる意味レベルでの情報蒸留を可能にし、詳細な深度推定のための特徴識別能力を向上させる。 KITTIデータセットの大規模な実験により,提案手法は自己教師付き単眼深度推定タスクにおける新しい最先端技術を確立し,従来のトランスフォーマーモデルに対するアプローチの有効性を示す。

The exploration of mutual-benefit cross-domains has shown great potential toward accurate self-supervised depth estimation. In this work, we revisit feature fusion between depth and semantic information and propose an efficient local adaptive attention method for geometric aware representation enhancement. Instead of building global connections or deforming attention across the feature space without restraint, we bound the spatial interaction within a learnable region of interest. In particular, we leverage geometric cues from semantic information to learn local adaptive bounding boxes to guide unsupervised feature aggregation. The local areas preclude most irrelevant reference points from attention space, yielding more selective feature learning and faster convergence. We naturally extend the paradigm into a multi-head and hierarchic way to enable the information distillation in different semantic levels and improve the feature discriminative ability for fine-grained depth estimation. Extensive experiments on the KITTI dataset show that our proposed method establishes a new state-of-the-art in self-supervised monocular depth estimation task, demonstrating the effectiveness of our approach over former Transformer variants.

翻訳日:2022-12-13 16:46:37 公開日:2022-12-12

# BEV-MAE: 屋外点群の事前学習のための鳥瞰図マスク付きオートエンコーダ

BEV-MAE: Bird's Eye View Masked Autoencoders for Outdoor Point Cloud Pre-training ( http://arxiv.org/abs/2212.05758v1 )

ライセンス: Link先を確認

Zhiwei Lin, Yongtao Wang

(参考訳) 現在の屋外LiDARに基づく3Dオブジェクト検出法は、主にスクラッチから学習するパラダイムを採用している。残念ながら、このパラダイムは大規模なラベル付きデータに大きく依存しており、その収集は高価で時間がかかる可能性がある。自己教師付き事前学習は、大量の注釈付きデータへの依存を緩和するための効果的かつ望ましい方法である。近年、マスクモデリングは点群に対する自己教師付き学習手法として成功を収めている。しかし、現在の研究は主に合成データや屋内データセットに焦点を当てており、大規模で疎な屋外点群に適用すると満足のいく結果が得られない。本稿では、屋外点群上での3次元物体検出のための簡素なマスク付きオートエンコーダ事前学習フレームワークBEV-MAEを提案する。具体的には、まず、3DエンコーダがBEV視点で特徴表現を学習するよう誘導し、事前学習中の複雑なデコーダ設計を避けるために、鳥瞰図(BEV)誘導マスキング戦略を提案する。さらに、マスクされた点群入力に対する3Dエンコーダの受容野サイズをファインチューニング時と一致させるために、学習可能なポイントトークンを導入する。最後に、遠方の物体の点群ほど疎になるという屋外点群の性質に基づき、物体検出に不可欠な位置情報を3Dエンコーダが学習できるようにする点密度予測を提案する。実験の結果、BEV-MAEは多様な3D物体検出器を用いて、WaymoとnuScenesの両方で最先端の自己教師付き学習の結果を達成した。さらに、事前学習時にわずか20%のデータと7%の学習コストで、最先端手法であるProposalContrastと同等の性能を達成している。ソースコードと事前学習済みモデルは公開される予定である。

Current outdoor LiDAR-based 3D object detection methods mainly adopt the training-from-scratch paradigm. Unfortunately, this paradigm heavily relies on large-scale labeled data, whose collection can be expensive and time-consuming. Self-supervised pre-training is an effective and desirable way to alleviate this dependence on extensive annotated data. Recently, masked modeling has become a successful self-supervised learning approach for point clouds. However, current works mainly focus on synthetic or indoor datasets. When applied to large-scale and sparse outdoor point clouds, they fail to yield satisfactory results. In this work, we present BEV-MAE, a simple masked autoencoder pre-training framework for 3D object detection on outdoor point clouds. Specifically, we first propose a bird's eye view (BEV) guided masking strategy to guide the 3D encoder learning feature representation in a BEV perspective and avoid complex decoder design during pre-training. Besides, we introduce a learnable point token to maintain a consistent receptive field size of the 3D encoder with fine-tuning for masked point cloud inputs. Finally, based on the property of outdoor point clouds, i.e., the point clouds of distant objects are more sparse, we propose point density prediction to enable the 3D encoder to learn location information, which is essential for object detection. Experimental results show that BEV-MAE achieves new state-of-the-art self-supervised results on both Waymo and nuScenes with diverse 3D object detectors. Furthermore, with only 20% data and 7% training cost during pre-training, BEV-MAE achieves comparable performance with the state-of-the-art method ProposalContrast. The source code and pre-trained models will be made publicly available.

翻訳日:2022-12-13 16:46:17 公開日:2022-12-12

# 動作認識のための3次元変形注意を用いたクロスモーダル学習

Cross-Modal Learning with 3D Deformable Attention for Action Recognition ( http://arxiv.org/abs/2212.05638v1 )

ライセンス: Link先を確認

Sangwon Kim and Dasom Ahn and Byoung Chul Ko

(参考訳) 視覚に基づく行動認識における重要な課題は、2つ以上の異種モダリティを持つ時空間的特徴を単一の特徴に埋め込むことである。本研究では、適応的な時空間受容野とクロスモーダル学習方式を備えた、行動認識のための新しい3D変形可能トランスフォーマーを提案する。この3D変形可能トランスフォーマーは、3D変形可能アテンション、局所関節ストライドアテンション、時間ストライドアテンションの3つのアテンションモジュールから構成される。2つのクロスモーダルトークンを3D変形可能アテンションモジュールに入力し、時空間相関を反映したクロスアテンショントークンを生成する。局所関節ストライドアテンションは、アテンショントークンとポーズトークンを空間的に結合するために適用される。時間ストライドアテンションは、アテンションモジュールへの入力トークン数を時間方向に削減し、すべてのトークンを同時に使用することなく時間的表現の学習を支援する。変形可能トランスフォーマーはL回繰り返され、最後のクロスモーダルトークンを結合して分類を行う。提案した3D変形可能トランスフォーマーをNTU60、NTU120、FineGYM、Penn Actionの各データセットで評価したところ、事前学習なしでも、事前学習済みの最先端手法と同等以上の結果を示した。さらに、空間的な関節ストライドアテンションと時間ストライドアテンションを通じて行動認識時の重要な関節と相関を可視化することで、説明可能な行動認識を実現できる可能性を示す。

An important challenge in vision-based action recognition is the embedding of spatiotemporal features with two or more heterogeneous modalities into a single feature. In this study, we propose a new 3D deformable transformer for action recognition with adaptive spatiotemporal receptive fields and a cross-modal learning scheme. The 3D deformable transformer consists of three attention modules: 3D deformability, local joint stride, and temporal stride attention. The two cross-modal tokens are input into the 3D deformable attention module to create a cross-attention token with a reflected spatiotemporal correlation. Local joint stride attention is applied to spatially combine attention and pose tokens. Temporal stride attention temporally reduces the number of input tokens in the attention module and supports temporal expression learning without the simultaneous use of all tokens. The deformable transformer iterates L times and combines the last cross-modal token for classification. The proposed 3D deformable transformer was tested on the NTU60, NTU120, FineGYM, and Penn Action datasets, and showed results better than or similar to pre-trained state-of-the-art methods even without a pre-training process. In addition, by visualizing important joints and correlations during action recognition through spatial joint and temporal stride attention, the possibility of achieving an explainable potential for action recognition is presented.

翻訳日:2022-12-13 16:37:33 公開日:2022-12-12

# 悪意のあるメディアデータと戦う: タンパ検出とディープフェイク検出に関する調査

Fighting Malicious Media Data: A Survey on Tampering Detection and Deepfake Detection ( http://arxiv.org/abs/2212.05667v1 )

ライセンス: Link先を確認

Junke Wang, Zhenxin Li, Chao Zhang, Jingjing Chen, Zuxuan Wu, Larry S. Davis, Yu-Gang Jiang

(参考訳) 画像やビデオの形をとるオンラインメディアデータは、主流のコミュニケーションチャネルになりつつある。しかし、近年のディープラーニング、特に深層生成モデルの進歩により、知覚的に説得力のある画像や動画を低コストで作成する道が開かれた。これはデジタル情報の信頼性に深刻な脅威をもたらすだけでなく、社会的にも深刻な影響を及ぼす。このことが、メディア改ざん検出、すなわちディープラーニング技術を用いてメディアデータが悪意をもって操作されたかどうかを調べる研究への関心の高まりを促している。対象となる画像の内容に応じて、メディア偽造は画像改ざんとディープフェイク技術に分けられる。前者は通常の画像内の視覚的要素を移動または消去するのが一般的であり、後者は人間の顔の表情、さらにはアイデンティティまでも操作する。したがって、防御手段には画像改ざん検出とディープフェイク検出があり、両者は多くの性質を共有している。本稿では、現在のメディア改ざん検出手法を包括的にレビューし、今後の研究に向けて、この分野の課題と動向について議論する。

Online media data, in the forms of images and videos, are becoming mainstream communication channels. However, recent advances in deep learning, particularly deep generative models, open the doors for producing perceptually convincing images and videos at a low cost, which not only poses a serious threat to the trustworthiness of digital information but also has severe societal implications. This motivates a growing interest of research in media tampering detection, i.e., using deep learning techniques to examine whether media data have been maliciously manipulated. Depending on the content of the targeted images, media forgery could be divided into image tampering and Deepfake techniques. The former typically moves or erases the visual elements in ordinary images, while the latter manipulates the expressions and even the identity of human faces. Accordingly, the means of defense include image tampering detection and Deepfake detection, which share a wide variety of properties. In this paper, we provide a comprehensive review of the current media tampering detection approaches, and discuss the challenges and trends in this field for future research.

翻訳日:2022-12-13 16:37:07 公開日:2022-12-12

# T5Score: 世代評価メトリクスの識別的微調整

T5Score: Discriminative Fine-tuning of Generative Evaluation Metrics ( http://arxiv.org/abs/2212.05726v1 )

ライセンス: Link先を確認

Yiwei Qin, Weizhe Yuan, Graham Neubig, Pengfei Liu

(参考訳) 生成テキストを評価するための現代の埋め込みベースのメトリクスは、一般に次の2つのパラダイムのいずれかに分類される。教師ありの人手アノテーションに従ってどの出力がより高品質かを直接予測するよう訓練された識別的メトリクスと、生成モデルの確率に基づいてテキストを評価するよう訓練された生成的メトリクスである。どちらにも利点があり、識別的メトリクスは良い出力と悪い出力を区別する問題を直接最適化でき、生成的メトリクスは豊富な生テキストを使って訓練できる。本稿では、利用可能なあらゆるデータから得られる教師あり信号と教師なし信号の両方を用いて、両者の長所を組み合わせたフレームワークを提案する。mT5をバックボーンとし、これらの訓練信号を用いるメトリクスであるT5Scoreを訓練することで、このアイデアを具体化する。5つのデータセット、19の言語、280のシステムで既存のメトリクスとの広範な実証的比較を行い、本手法の有用性を示す。実験の結果、T5Scoreはセグメントレベルにおいて、既存の最高スコアのメトリクスに対し、すべてのデータセットで最高の性能を達成した。コードとモデルは https://github.com/qinyiwei/T5Score で公開している。

Modern embedding-based metrics for evaluation of generated text generally fall into one of two paradigms: discriminative metrics that are trained to directly predict which outputs are of higher quality according to supervised human annotations, and generative metrics that are trained to evaluate text based on the probabilities of a generative model. Both have their advantages; discriminative metrics are able to directly optimize for the problem of distinguishing between good and bad outputs, while generative metrics can be trained using abundant raw text. In this paper, we present a framework that combines the best of both worlds, using both supervised and unsupervised signals from whatever data we have available. We operationalize this idea by training T5Score, a metric that uses these training signals with mT5 as the backbone. We perform an extensive empirical comparison with other existing metrics on 5 datasets, 19 languages and 280 systems, demonstrating the utility of our method. Experimental results show that: T5Score achieves the best performance on all datasets against existing top-scoring metrics at the segment level. We release our code and models at https://github.com/qinyiwei/T5Score.

翻訳日:2022-12-13 16:12:36 公開日:2022-12-12

# 効果的な多言語微調整方法の探索--要約の事例研究

Searching for Effective Multilingual Fine-Tuning Methods: A Case Study in Summarization ( http://arxiv.org/abs/2212.05740v1 )

ライセンス: Link先を確認

Yiwei Qin, Graham Neubig, Pengfei Liu

(参考訳) 近年,学習済み言語モデルを下流タスクに適応させるためのチューニング戦略が多数提案されている。本稿では,多言語学習のための様々なチューニング戦略,特にテキスト要約の文脈において,広範な経験的評価を行う。具体的には、多言語調律戦略(合計5つのモデル)の3つのファミリーの相対的な利点を調べ、45以上の言語を要約するために経験的に評価する。実験により,XL-Sumデータセット上に新たな最先端技術を構築しただけでなく,多言語チューニング戦略の設計に関する今後の研究のヒントとなる一連の観測結果も得られた。

Recently, a large number of tuning strategies have been proposed to adapt pre-trained language models to downstream tasks. In this paper, we perform an extensive empirical evaluation of various tuning strategies for multilingual learning, particularly in the context of text summarization. Specifically, we explore the relative advantages of three families of multilingual tuning strategies (a total of five models) and empirically evaluate them for summarization over 45 languages. Experimentally, we not only established a new state-of-the-art on the XL-Sum dataset but also derive a series of observations that hopefully can provide hints for future research on the design of multilingual tuning strategies.

翻訳日:2022-12-13 16:12:14 公開日:2022-12-12

# プログラミングのための自然言語処理に関する調査

A Survey on Natural Language Processing for Programming ( http://arxiv.org/abs/2212.05773v1 )

ライセンス: Link先を確認

Qingfu Zhu, Xianzhen Luo, Fang Liu, Cuiyun Gao, Wanxiang Che

(参考訳) NLP技術を用いてプログラミングを支援することを目指す「プログラミングのための自然言語処理」は、近年爆発的な進展を遂げている。しかし、関連研究を全体にわたって体系的にレビューした文献は存在しない。本稿では、初期の演繹的モデルから最新の競技レベルのモデルまで、既存の研究を包括的に調査する。本稿のもう1つの利点は技術カテゴリの網羅性であり、これにより今後の研究の位置づけや比較が容易になる。

Natural language processing for programming, which aims to use NLP techniques to assist programming, has experienced an explosion in recent years. However, there is no literature that systematically reviews related work from the full spectrum. In this paper, we comprehensively investigate existing work, ranging from early deductive models to the latest competition-level models. Another advantage of this paper is the completeness of the technique category, which provides easy access to locating and comparing future works.

翻訳日:2022-12-13 16:12:01 公開日:2022-12-12

# 連携学習による異種自然言語処理タスクの協調

Collaborating Heterogeneous Natural Language Processing Tasks via Federated Learning ( http://arxiv.org/abs/2212.05789v1 )

ライセンス: Link先を確認

Chenhe Dong, Yuexiang Xie, Bolin Ding, Ying Shen, Yaliang Li

(参考訳) 個人のプライベートなテキストデータに対するプライバシーの懸念の高まりが、近年のフェデレートラーニング(FL)の発展を後押ししている。しかし、NLPへのFLの適用に関する既存の研究は、異種の、あるいは非公開の学習目標を持つ参加者を協調させるのには適していない。本研究では、異種NLPタスクを持つクライアントがFLコースを構築し、互いに有用な知識を学習できるようにするAssign-Then-Contrast(ATC)フレームワークを提案することで、NLPにおけるFLの適用範囲をさらに広げる。具体的には、クライアントはまず、自身の学習目標を用いるのではなく、サーバが割り当てた統一タスクでローカルトレーニングを行う。これをAssignトレーニングステージと呼ぶ。その後のContrastトレーニングステージでは、クライアントはそれぞれ異なるローカル学習目標でトレーニングを行い、一貫性のある有用なモデル更新に貢献する他のクライアントと知識を交換する。自然言語理解(NLU)タスクと自然言語生成(NLG)タスクの両方を含む、広く使われている6つのデータセットで広範な実験を行った結果、提案するATCフレームワークは様々なベースライン手法と比較して大幅な改善を達成した。ソースコードは \url{https://github.com/alibaba/FederatedScope/tree/master/federatedscope/nlp/hetero_tasks} で公開されている。

The increasing privacy concerns on personal private text data promote the development of federated learning (FL) in recent years. However, the existing studies on applying FL in NLP are not suitable to coordinate participants with heterogeneous or private learning objectives. In this study, we further broaden the application scope of FL in NLP by proposing an Assign-Then-Contrast (denoted as ATC) framework, which enables clients with heterogeneous NLP tasks to construct an FL course and learn useful knowledge from each other. Specifically, the clients are suggested to first perform local training with the unified tasks assigned by the server rather than using their own learning objectives, which is called the Assign training stage. After that, in the Contrast training stage, clients train with different local learning objectives and exchange knowledge with other clients who contribute consistent and useful model updates. We conduct extensive experiments on six widely-used datasets covering both Natural Language Understanding (NLU) and Natural Language Generation (NLG) tasks, and the proposed ATC framework achieves significant improvements compared with various baseline methods. The source code is available at \url{https://github.com/alibaba/FederatedScope/tree/master/federatedscope/nlp/hetero_tasks}.

翻訳日:2022-12-13 16:11:54 公開日:2022-12-12

# P-Transformer: ドキュメントからドキュメントへのニューラルマシン翻訳の改善を目指す

P-Transformer: Towards Better Document-to-Document Neural Machine Translation ( http://arxiv.org/abs/2212.05830v1 )

ライセンス: Link先を確認

Yachao Li, Junhui Li, Jing Jiang, Shimin Tao, Hao Yang and Min Zhang

(参考訳) Transformerを用いた文書から文書へ(Doc2Doc)のニューラルマシン翻訳(NMT)をスクラッチから直接訓練すると、特に小規模なデータセットでは、通常は収束に失敗する。我々の専用のプロービングタスクによると、1) 絶対位置情報と相対位置情報はどちらも、上位のエンコーダ層に到達するにつれて徐々に弱まり、場合によっては消失する。2) エンコーダ出力における絶対位置情報の消失が、Doc2Doc NMTの訓練失敗を引き起こす。この問題を緩和するため、自己注意とクロスアテンションの両方で絶対位置情報と相対位置情報を強化する位置認識Transformer(P-Transformer)を提案する。具体的には、絶対位置情報、すなわち位置埋め込みを、単純かつ効果的な加算操作によって、自己注意とクロスアテンションの両方のクエリ・キーペアに統合する。さらに、相対位置エンコーディングを自己注意に組み込む。提案するP-Transformerは正弦波位置エンコーディングを利用し、タスク固有の位置埋め込み、セグメント埋め込み、アテンション機構を一切必要としない。以上の手法により、P-Transformerを用いたDoc2Doc NMTモデルを構築する。このモデルはソース文書を取り込み、sequence-to-sequence(seq2seq)方式でターゲット文書全体を生成する。加えて、P-Transformerはseq2seqベースの文書から文へ(Doc2Sent)および文から文へ(Sent2Sent)の翻訳にも適用できる。Doc2Doc NMTの広範な実験結果によると、P-Transformerは、小規模、中規模、大規模をカバーする7つの言語ペアの広く使われている9つの文書レベルデータセットで強力なベースラインを大幅に上回り、新たな最先端を達成した。談話現象に関する実験では、我々のDoc2Doc NMTモデルがBLEUと談話の一貫性の両面で翻訳品質を向上させることが示された。コードはGithubで公開している。

Directly training a document-to-document (Doc2Doc) neural machine translation (NMT) via Transformer from scratch, especially on small datasets usually fails to converge. Our dedicated probing tasks show that 1) both the absolute position and relative position information gets gradually weakened or even vanished once it reaches the upper encoder layers, and 2) the vanishing of absolute position information in encoder output causes the training failure of Doc2Doc NMT. To alleviate this problem, we propose a position-aware Transformer (P-Transformer) to enhance both the absolute and relative position information in both self-attention and cross-attention. Specifically, we integrate absolute positional information, i.e., position embeddings, into the query-key pairs both in self-attention and cross-attention through a simple yet effective addition operation. Moreover, we also integrate relative position encoding in self-attention. The proposed P-Transformer utilizes sinusoidal position encoding and does not require any task-specified position embedding, segment embedding, or attention mechanism. Through the above methods, we build a Doc2Doc NMT model with P-Transformer, which ingests the source document and completely generates the target document in a sequence-to-sequence (seq2seq) way. In addition, P-Transformer can be applied to seq2seq-based document-to-sentence (Doc2Sent) and sentence-to-sentence (Sent2Sent) translation. Extensive experimental results of Doc2Doc NMT show that P-Transformer significantly outperforms strong baselines on widely-used 9 document-level datasets in 7 language pairs, covering small-, middle-, and large-scales, and achieves a new state-of-the-art. Experimentation on discourse phenomena shows that our Doc2Doc NMT models improve the translation quality in both BLEU and discourse coherence. We make our code available on Github.

翻訳日:2022-12-13 16:11:31 公開日:2022-12-12
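
The P-Transformer abstract above mentions sinusoidal position encoding and adding absolute position information to the query-key pairs. The sketch below illustrates that idea only in the most generic form: it uses the standard sinusoidal formula from the original Transformer, and the helper names (`sinusoidal_position_encoding`, `add_positions`, `attention_scores`) and the all-zero content vectors are assumptions made for illustration, not the authors' implementation.

```python
import math

def sinusoidal_position_encoding(num_positions, dim):
    """Standard sinusoidal encoding: even dims use sin, odd dims use cos."""
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(dim):
            # Each (sin, cos) pair of dimensions shares one frequency.
            angle = pos / (10000 ** (2 * (i // 2) / dim))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

def add_positions(vectors, encoding):
    """Add the position encoding element-wise to query or key vectors."""
    return [[v + p for v, p in zip(vec, enc)]
            for vec, enc in zip(vectors, encoding)]

def attention_scores(queries, keys):
    """Unnormalised dot-product scores between every query and every key."""
    return [[sum(q * k for q, k in zip(qv, kv)) for kv in keys]
            for qv in queries]

pe = sinusoidal_position_encoding(num_positions=4, dim=8)
# Hypothetical content vectors, all zero so only position information remains.
queries = [[0.0] * 8 for _ in range(4)]
keys = [[0.0] * 8 for _ in range(4)]
scores = attention_scores(add_positions(queries, pe), add_positions(keys, pe))
```

With zero content vectors, each score depends only on the distance between the two positions, so nearby positions score higher than distant ones. This is one way to see how position information added to queries and keys can reach the attention weights directly.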

# 「これは最も破壊的な技術だと思う」:Twitter Dataを用いたChatGPTアーリーアダプターの感性を探る

"I think this is the most disruptive technology": Exploring Sentiments of ChatGPT Early Adopters using Twitter Data ( http://arxiv.org/abs/2212.05856v1 )

ライセンス: Link先を確認

Mubin Ul Haque, Isuru Dharmadasa, Zarrin Tasnim Sworna, Roshan Namal Rajapakse, and Hussain Ahmad

(参考訳) 大規模言語モデルは、さまざまなタスクにおける優れた性能により、近年大きな注目を集めている。OpenAIが開発したChatGPTは、大規模な事前学習済み言語モデルの実装の1つで、アーリーアダプターの間で絶大な人気を得ており、一部のユーザーは多くの分野における破壊的技術とまで評している。こうしたアーリーアダプターの感情を理解することは、この技術の潜在的な成功や失敗、強みや弱みについての洞察が得られるため重要である。本稿では、初期のChatGPTユーザーによる10,732件のツイートを用いて、混合研究法による調査を行う。まずトピックモデリングを用いて主要なトピックを特定し、次に各トピックについて詳細な質的感情分析を行う。その結果、アーリーアダプターの大多数は、「ソフトウェア開発への破壊的影響」「エンターテインメントと創造性の発揮」といったトピックに関して、圧倒的に肯定的な感情を表明していることがわかった。ChatGPTが悪用される可能性などの問題について、特に「教育面への影響」といったトピックに関して懸念を表明したユーザーは、ごく一部にとどまった。これらの知見について、各トピックの具体例を示しながら議論し、研究者とユーザーの双方にとって、これらの懸念への対処に関わる示唆を詳述する。

Large language models have recently attracted significant attention due to their impressive performance on a variety of tasks. ChatGPT developed by OpenAI is one such implementation of a large, pre-trained language model that has gained immense popularity among early adopters, where certain users go to the extent of characterizing it as a disruptive technology in many domains. Understanding such early adopters' sentiments is important because it can provide insights into the potential success or failure of the technology, as well as its strengths and weaknesses. In this paper, we conduct a mixed-method study using 10,732 tweets from early ChatGPT users. We first use topic modelling to identify the main topics and then perform an in-depth qualitative sentiment analysis of each topic. Our results show that the majority of the early adopters have expressed overwhelmingly positive sentiments related to topics such as Disruptions to software development, Entertainment and exercising creativity. Only a limited percentage of users expressed concerns about issues such as the potential for misuse of ChatGPT, especially regarding topics such as Impact on educational aspects. We discuss these findings by providing specific examples for each topic and then detail implications related to addressing these concerns for both researchers and users.

翻訳日:2022-12-13 16:10:57 公開日:2022-12-12

# 極多ラベル長文変換器モデルを用いたICDの自動符号化

Automated ICD Coding using Extreme Multi-label Long Text Transformer-based Models ( http://arxiv.org/abs/2212.05857v1 )

ライセンス: Link先を確認

Leibo Liu, Oscar Perez-Concha, Anthony Nguyen, Vicki Bennett, Louisa Jorm

(参考訳) 背景:多くの自然言語処理タスクで事前訓練されたトランスフォーマーモデルの成功により、国際疾病分類(icd)コーディングタスクへの使用が積極的に検討されている。本研究では,3種類のトランスフォーマーモデルについて検討し,自動ICD符号化タスクによって生じる極端なラベルセットと長いテキスト分類課題に対処することを目的とした。方法: Transformer-based model PLM-ICDは、ICD符号化ベンチマークデータセットMIMIC-III上で、現在の最先端(SOTA)性能を達成した。さらに最適化するために、ベースラインモデルに選ばれました。また,XR-Transformerモデルの新たな適応であるXR-LATをMIMIC-IIIデータセット上でトレーニングした。 XR-LATは、ラベルに関する注意、知識伝達、動的負のサンプリング機構を備えた、事前定義された階層コードツリー上の再帰的に訓練されたモデルチェーンである。結果: より長い総数およびチャンクシーケンス長で訓練したPLM-ICDモデルは, 現行のSOTA PLM-ICDモデルより有意に優れ, マイクロF1スコアは60.8%であった。 XR-Transformerモデルは、一般的なドメインではSOTAだが、すべてのメトリクスでうまく機能しなかった。 XR-LATベースの最良のモデルでは、現在のSOTA PLM-ICDモデルと競合する結果が得られ、マクロAUCは2.1%向上した。結論:我々の最適化PLM-ICDモデルはMIMIC-IIIデータセット上でのICDの自動符号化のための新しいSOTAモデルであり,新しいXR-LATモデルは以前のSOTA PLM-ICDモデルと競合する。

Background: Encouraged by the success of pretrained Transformer models in many natural language processing tasks, their use for International Classification of Diseases (ICD) coding tasks is now actively being explored. In this study, we investigate three types of Transformer-based models, aiming to address the extreme label set and long text classification challenges that are posed by automated ICD coding tasks. Methods: The Transformer-based model PLM-ICD achieved the current state-of-the-art (SOTA) performance on the ICD coding benchmark dataset MIMIC-III. It was chosen as our baseline model to be further optimised. XR-Transformer, the new SOTA model in the general extreme multi-label text classification domain, and XR-LAT, a novel adaptation of the XR-Transformer model, were also trained on the MIMIC-III dataset. XR-LAT is a recursively trained model chain on a predefined hierarchical code tree with label-wise attention, knowledge transferring and dynamic negative sampling mechanisms. Results: Our optimised PLM-ICD model, which was trained with longer total and chunk sequence lengths, significantly outperformed the current SOTA PLM-ICD model, and achieved the highest micro-F1 score of 60.8%. The XR-Transformer model, although SOTA in the general domain, did not perform well across all metrics. The best XR-LAT based model obtained results that were competitive with the current SOTA PLM-ICD model, including improving the macro-AUC by 2.1%. Conclusion: Our optimised PLM-ICD model is the new SOTA model for automated ICD coding on the MIMIC-III dataset, while our novel XR-LAT model performs competitively with the previous SOTA PLM-ICD model.

翻訳日:2022-12-13 16:10:37 公開日:2022-12-12

# Carpet-Bombing パッチ:通常の要求なしにディープネットワークを攻撃

Carpet-bombing patch: attacking a deep network without usual requirements ( http://arxiv.org/abs/2212.05827v1 )

ライセンス: Link先を確認

Pol Labarbarie, Adrien Chan-Hon-Tong, Stéphane Herbin and Milad Leyli-Abadi

(参考訳) ディープネットワークは回避攻撃に対する脆弱性を示しているが、そのような攻撃には通常、非現実的な要件がある。最近の文献では、これらの要件の一部を取り除けるかどうかが議論されている。本稿は、要件をほとんど必要としないカーペットボミングパッチ攻撃を導入することで、この議論に貢献する。このパッチ攻撃は特徴表現を標的とするため、ネットワークのタスクを知る必要がない。基礎となるタスクがそれぞれ分類、検出、セマンティックセグメンテーションであることを知らないまま、ImageNetでの精度、Pascal VOCでのmAP、CityscapesでのIoUを低下させる。この攻撃が提起する潜在的な安全性の問題に加えて、カーペットボミング攻撃の影響は、ディープネットワークの層のダイナミクスに関する興味深い性質を浮き彫りにしている。

Although deep networks have shown vulnerability to evasion attacks, such attacks usually have unrealistic requirements. Recent literature has discussed whether some of these requirements can be removed. This paper contributes to this literature by introducing a carpet-bombing patch attack that has almost no requirements. Because it targets the feature representations, this patch attack does not require knowing the network task. The attack decreases accuracy on ImageNet, mAP on Pascal VOC, and IoU on Cityscapes without being aware that the underlying tasks involve classification, detection, or semantic segmentation, respectively. Beyond the potential safety issues raised by this attack, the impact of the carpet-bombing attack highlights some interesting properties of deep network layer dynamics.

翻訳日:2022-12-13 16:03:56 公開日:2022-12-12

# Where to go: 都市規模のオンライン配車サービスにおける深層強化学習によるエージェントガイダンス

Where to go: Agent Guidance with Deep Reinforcement Learning in A City-Scale Online Ride-Hailing Service ( http://arxiv.org/abs/2212.05742v1 )

ライセンス: Link先を確認

Jiyao Li, Vicki H. Allan

(参考訳) オンライン配車サービスは世界中で普及している交通システムとなっている。本稿では,オンライン配車サービスにおいて,供給と需要のバランスがとれるように,都市周辺の空きタクシーをどのように誘導するかという課題について検討する。我々は、オンライン配車サービスの複数のパフォーマンス指標を考慮した新しい報酬スキームをデザインする。また,様々な場所で不要な動作をマスキングし,エージェントがより高速かつ効率的に学習できるように,deep-q-network with action mask (am-dqn) という新しい深層強化学習法を提案する。シカゴの都市規模データセットを用いて大規模な実験を行った。いくつかの一般的なヒューリスティックおよび学習法は、比較のベースラインとして実装されている。実験の結果, AM-DQNは, 平均故障率, 顧客の平均待ち時間, 空きタクシーの平均アイドル検索時間に関して, 全手法で最高の性能を発揮することがわかった。

Online ride-hailing services have become a prevalent transportation system across the world. In this paper, we study a challenging problem of how to direct vacant taxis around a city such that supplies and demands can be balanced in online ride-hailing services. We design a new reward scheme that considers multiple performance metrics of online ride-hailing services. We also propose a novel deep reinforcement learning method named Deep-Q-Network with Action Mask (AM-DQN) masking off unnecessary actions in various locations such that agents can learn much faster and more efficiently. We conduct extensive experiments using a city-scale dataset from Chicago. Several popular heuristic and learning methods are also implemented as baselines for comparison. The results of the experiments show that the AM-DQN attains the best performances of all methods with respect to average failure rate, average waiting time for customers, and average idle search time for vacant taxis.

翻訳日:2022-12-13 16:02:58 公開日:2022-12-12
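
The AM-DQN abstract above describes masking off unnecessary actions at each location. The abstract does not say how the mask is built or applied, so the following is only a minimal sketch of the generic action-masking idea: invalid actions get a Q-value of negative infinity before the greedy choice, and the bootstrap target looks only at valid next actions. The function names, the five-action layout and the numbers are hypothetical.

```python
import math

def masked_greedy_action(q_values, valid_mask):
    """Pick the highest-Q action among those the mask allows."""
    if not any(valid_mask):
        raise ValueError("at least one action must be valid")
    # Invalid actions can never win the argmax once set to -inf.
    masked = [q if ok else -math.inf for q, ok in zip(q_values, valid_mask)]
    return max(range(len(masked)), key=masked.__getitem__)

def masked_td_target(reward, next_q_values, next_valid_mask, gamma=0.99):
    """One-step Q-learning target that bootstraps only from valid next actions."""
    best_next = max(q for q, ok in zip(next_q_values, next_valid_mask) if ok)
    return reward + gamma * best_next

# Hypothetical taxi in the north-west corner of a grid.
# Actions: 0 = stay, 1 = north, 2 = south, 3 = east, 4 = west.
q = [0.2, 0.9, 0.1, 0.5, 0.7]
mask = [True, False, True, True, False]   # north and west leave the map
action = masked_greedy_action(q, mask)
target = masked_td_target(reward=1.0, next_q_values=q,
                          next_valid_mask=mask, gamma=0.9)
```

Here the unmasked argmax would be "north", which is not available in the corner cell; the mask makes the agent choose "east" instead, so no experience is spent on moves that cannot be executed.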

# 因果推論と機械学習における操作変数: サーベイ

Instrumental Variables in Causal Inference and Machine Learning: A Survey ( http://arxiv.org/abs/2212.05778v1 )

ライセンス: Link先を確認

Anpeng Wu, Kun Kuang, Ruoxuan Xiong, Fei Wu

(参考訳) 因果推論は、仮定、研究デザイン、推定戦略を用いて、データに基づき変数間の因果関係に関する結論を導き出すプロセスである。これにより研究者は、複雑なシステムで働く根本的なメカニズムをよりよく理解し、より情報に基づいた意思決定を行うことができる。多くの状況では、処置変数と結果変数の両方に影響を与える交絡因子をすべて観測できるとは限らず、これが因果効果の推定を難しくする。この問題に対処するため、因果推論と機械学習の両分野で、操作変数(IV)の利用を提案する文献が増えている。本論文は、因果推論と機械学習の両方におけるIV法とその応用を、体系的かつ包括的に紹介し議論する初めての試みである。まず、IVの形式的定義を示し、異なる仮定の下でのIV回帰法の識別問題について議論する。次に、提案手法の焦点に応じて、既存のIV法の研究を、IVを用いた二段階最小二乗法、IVを用いた制御関数法、IVの評価の3つの流れに分類する。それぞれの流れについて、古典的な因果推論手法と、機械学習の文献における最近の発展の両方を紹介する。続いて、実世界のシナリオにおけるIV法の様々な応用を紹介し、利用可能なデータセットとアルゴリズムの概要を示す。最後に、文献を総括し、未解決問題について議論し、IV法とその応用に関する有望な今後の研究方向を提案する。また、本サーベイで取り上げたIV法のツールキットを https://github.com/causal-machine-learning-lab/mliv で開発している。

Causal inference is the process of using assumptions, study designs, and estimation strategies to draw conclusions about the causal relationships between variables based on data. This allows researchers to better understand the underlying mechanisms at work in complex systems and make more informed decisions. In many settings, we may not fully observe all the confounders that affect both the treatment and outcome variables, complicating the estimation of causal effects. To address this problem, a growing literature in both causal inference and machine learning proposes to use Instrumental Variables (IV). This paper serves as the first effort to systematically and comprehensively introduce and discuss the IV methods and their applications in both causal inference and machine learning. First, we provide the formal definition of IVs and discuss the identification problem of IV regression methods under different assumptions. Second, we categorize the existing work on IV methods into three streams according to the focus on the proposed methods, including two-stage least squares with IVs, control function with IVs, and evaluation of IVs. For each stream, we present both the classical causal inference methods, and recent developments in the machine learning literature. Then, we introduce a variety of applications of IV methods in real-world scenarios and provide a summary of the available datasets and algorithms. Finally, we summarize the literature, discuss the open problems and suggest promising future research directions for IV methods and their applications. We also develop a toolkit of IVs methods reviewed in this survey at https://github.com/causal-machine-learning-lab/mliv.

翻訳日:2022-12-13 15:54:45 公開日:2022-12-12
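
Two-stage least squares, the first stream named in the survey abstract above, can be illustrated with a tiny hand-built example. This is a sketch of the textbook single-instrument case, not code from the survey's toolkit: the helper names and the four data points are assumptions, chosen so that the unobserved confounder `u` is exactly uncorrelated with the instrument `z` in the sample.

```python
def ols(x, y):
    """Return (intercept, slope) of the least-squares line y ~ a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def two_stage_least_squares(z, x, y):
    """Stage 1: regress x on z. Stage 2: regress y on the fitted x."""
    a1, b1 = ols(z, x)
    x_hat = [a1 + b1 * zi for zi in z]
    _, beta = ols(x_hat, y)
    return beta

# Hypothetical data: u confounds x and y but is uncorrelated with z.
z = [0.0, 0.0, 1.0, 1.0]                        # instrument
u = [-1.0, 1.0, -1.0, 1.0]                      # unobserved confounder
x = [2 * zi + ui for zi, ui in zip(z, u)]       # treatment
y = [3 * xi + 5 * ui for xi, ui in zip(x, u)]   # outcome, true effect = 3

naive_slope = ols(x, y)[1]
iv_slope = two_stage_least_squares(z, x, y)
```

In this constructed sample the naive regression of `y` on `x` absorbs the confounder and overstates the effect, while the second-stage slope recovers the coefficient used to generate the data. Real data would of course add noise and require checking the instrument's validity.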

# 説明可能なパフォーマンス

Explainable Performance ( http://arxiv.org/abs/2212.05866v1 )

ライセンス: Link先を確認

Hué Sullivan, Hurlin Christophe, Pérignon Christophe and Saurin Sébastien

(参考訳) 本稿では,モデルの予測的・経済的性能に対する入力特徴の具体的な寄与を測定するために,XPER手法を提案する。我々の方法論にはいくつかの利点がある。第一に、モデル非依存とパフォーマンス指標非依存の両方です。第2に、XPERはShapley値に基づいて理論的に確立されている。第3に、Shapley値の分解に固有のベンチマークの解釈は、私たちのコンテキストにおいて有意義である。第4に、XPERはモデルの再見積を必要としないため、モデル仕様のエラーに悩まされない。 5つ目は、モデルレベルでも、個々のレベルでも実装できます。オートローンに基づくアプリケーションでは、驚くほど少数の機能によってパフォーマンスが説明できることがわかった。 XPERの分解はメトリクス間でかなり安定していますが、いくつかの機能はメトリクス間でサインを切り替えます。また,モデル予測とモデル性能が2つの異なる課題であることを示す。

We introduce the XPER (eXplainable PERformance) methodology to measure the specific contribution of the input features to the predictive or economic performance of a model. Our methodology offers several advantages. First, it is both model-agnostic and performance metric-agnostic. Second, XPER is theoretically founded as it is based on Shapley values. Third, the interpretation of the benchmark, which is inherent in any Shapley value decomposition, is meaningful in our context. Fourth, XPER is not plagued by model specification error, as it does not require re-estimating the model. Fifth, it can be implemented either at the model level or at the individual level. In an application based on auto loans, we find that performance can be explained by a surprisingly small number of features. XPER decompositions are rather stable across metrics, yet some feature contributions switch sign across metrics. Our analysis also shows that explaining model forecasts and model performance are two distinct tasks.

翻訳日:2022-12-13 15:53:25 公開日:2022-12-12

# 線形マルコフ決定過程に対する最短最適強化学習

Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes ( http://arxiv.org/abs/2212.06132v1 )

ライセンス: Link先を確認

Jiafan He and Heyang Zhao and Dongruo Zhou and Quanquan Gu

(参考訳) 線形関数近似による強化学習(RL)について検討した。与えられた特徴マッピングの線形関数として遷移ダイナミクスをパラメータ化できるエピソディック時間不均質線形マルコフ決定プロセス(線形MDP)に対して、ほぼミニマックス最適な後悔である$\tilde O(d\sqrt{H^3K})$(ここで$d$は特徴マッピングの次元、$H$は計画の地平線、$K$はエピソード数)を達成する初の計算効率の良いアルゴリズムを提案する。本アルゴリズムは,注意深く設計された重みを持つ重み付き線形回帰スキームに基づいており,その重みは,(1)最適値関数の分散を直接推定し,(2)推定精度を高めるためにエピソード数に対して単調に減少し,(3)推定値関数クラスの複雑性を制御するために,値関数推定器の更新にレアスイッチングポリシを用いる新しい分散推定器に依存する。本研究は,線形MDPを用いた最適RLに対する完全な回答を提供するとともに,開発したアルゴリズムと理論的ツールは独立した興味の対象となりうる。

We study reinforcement learning (RL) with linear function approximation. For episodic time-inhomogeneous linear Markov decision processes (linear MDPs) whose transition dynamic can be parameterized as a linear function of a given feature mapping, we propose the first computationally efficient algorithm that achieves the nearly minimax optimal regret $\tilde O(d\sqrt{H^3K})$, where $d$ is the dimension of the feature mapping, $H$ is the planning horizon, and $K$ is the number of episodes. Our algorithm is based on a weighted linear regression scheme with a carefully designed weight, which depends on a new variance estimator that (1) directly estimates the variance of the \emph{optimal} value function, (2) monotonically decreases with respect to the number of episodes to ensure a better estimation accuracy, and (3) uses a rare-switching policy to update the value function estimator to control the complexity of the estimated value function class. Our work provides a complete answer to optimal RL with linear MDPs, and the developed algorithm and theoretical tools may be of independent interest.
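The weighted linear regression at the core of this abstract can be sketched generically as inverse-variance weighted ridge regression. This is only a minimal illustration under assumed names (`weighted_ridge`, `lam`) and synthetic data; the paper's actual weights, variance estimator, and rare-switching update are not reproduced here.

```python
import numpy as np


def weighted_ridge(features, targets, variances, lam=1.0):
    """Solve min_w sum_k (phi_k^T w - y_k)^2 / sigma_k^2 + lam * ||w||^2."""
    inv_var = 1.0 / np.asarray(variances, dtype=float)
    # Weighted Gram matrix: sum_k phi_k phi_k^T / sigma_k^2 + lam * I
    gram = (features * inv_var[:, None]).T @ features + lam * np.eye(features.shape[1])
    rhs = features.T @ (inv_var * targets)
    return np.linalg.solve(gram, rhs)


# Synthetic data (assumption): noisier samples receive smaller weights.
rng = np.random.default_rng(0)
phi = rng.normal(size=(200, 3))
true_w = np.array([1.0, -2.0, 0.5])
sigma2 = rng.uniform(0.1, 2.0, size=200)
y = phi @ true_w + rng.normal(size=200) * np.sqrt(sigma2)

w_hat = weighted_ridge(phi, y, sigma2, lam=1e-3)
print(w_hat)
```

With all variances equal to one and `lam=0`, the estimator reduces to ordinary least squares, which is a convenient sanity check.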

翻訳日:2022-12-13 15:53:11 公開日:2022-12-12

# 機械学習におけるドメイン知識統合のロードマップ

A Roadmap to Domain Knowledge Integration in Machine Learning ( http://arxiv.org/abs/2212.05712v1 )

ライセンス: Link先を確認

Himel Das Gupta, Victor S. Sheng

(参考訳) 近年,人工知能のさまざまな面でモデルの性能を高めるために,多くの機械学習アルゴリズムが開発されている。しかし、問題は不適切なデータとリソースのために続く。機械学習モデルに知識を統合することで、これらの障害をある程度克服することができる。知識を組み込むことは、様々な形態の知識表現のために複雑な作業である。本稿では,これらの異なる形態の知識統合と,特定の機械学習タスクにおけるその性能について概説する。

Many machine learning algorithms have been developed in recent years to enhance the performance of models in different areas of artificial intelligence. However, problems persist due to inadequate data and resources. Integrating knowledge into a machine learning model can help to overcome these obstacles to a certain degree. Incorporating knowledge is nevertheless a complex task because knowledge can be represented in various forms. In this paper, we give a brief overview of these different forms of knowledge integration and their performance in certain machine learning tasks.

翻訳日:2022-12-13 15:51:57 公開日:2022-12-12

# Visuo-Motorコントロールの事前学習について:学習ベースラインの再検討

On Pre-Training for Visuo-Motor Control: Revisiting a Learning-from-Scratch Baseline ( http://arxiv.org/abs/2212.05749v1 )

ライセンス: Link先を確認

Nicklas Hansen and Zhecheng Yuan and Yanjie Ze and Tongzhou Mu and Aravind Rajeswaran and Hao Su and Huazhe Xu and Xiaolong Wang

(参考訳) データ拡張と浅いconvnetを用いた,visuo-motor制御のための,単純なスクラッチベースラインを再検討する。このベースラインは、大規模な視覚データセットで訓練された凍結視覚表現を利用する最近の手法と競合する性能を持つ。

We revisit a simple Learning-from-Scratch baseline for visuo-motor control that uses data augmentation and a shallow ConvNet. We find that this baseline has competitive performance with recent methods that leverage frozen visual representations trained on large-scale vision datasets.

翻訳日:2022-12-13 15:51:50 公開日:2022-12-12

# ニューラルアセット:インタラクティブ環境のためのボリュームオブジェクトキャプチャとレンダリング

Neural Assets: Volumetric Object Capture and Rendering for Interactive Environments ( http://arxiv.org/abs/2212.06125v1 )

ライセンス: Link先を確認

Aljaž Božič, Denis Gladkov, Luke Doukakis and Christoph Lassner

(参考訳) リアルなバーチャルアセットを作ることは、時間を要するプロセスだ。通常、アーティストがオブジェクトをデザインし、その外観の微調整に多くの労力を費やす。表面下散乱のような複雑な詳細や特定の効果は、リアルタイムBRDFでは表現しきれず、特定の物体の外観を完全に捉えることは不可能である。近年のニューラルレンダリングの進歩に触発されて,日常環境における実世界の物体を忠実かつ高速に捉える手法を提案する。我々は,半透明な物体部品などの容積効果を復元し,光写実性オブジェクトの外観を保存するために,新しいニューラル表現を用いる。レンダリング品質を損なうことなくリアルタイムレンダリングをサポートするために,我々のモデルは,特徴グリッドと,インタラクティブなフレームレートを持つ効率的なシェーダコードに変換される小さなMLPデコーダを使用する。これにより、提案されたニューラルアセットと既存のメッシュ環境とオブジェクトのシームレスな統合が可能になる。標準的なシェーダーコードを用いるため、レンダリングは既存の多くのハードウェアやソフトウェアシステムで可搬性がある。

Creating realistic virtual assets is a time-consuming process: it usually involves an artist designing the object, then spending a lot of effort on tweaking its appearance. Intricate details and certain effects, such as subsurface scattering, elude representation using real-time BRDFs, making it impossible to fully capture the appearance of certain objects. Inspired by the recent progress of neural rendering, we propose an approach for capturing real-world objects in everyday environments faithfully and fast. We use a novel neural representation to reconstruct volumetric effects, such as translucent object parts, and preserve photorealistic object appearance. To support real-time rendering without compromising rendering quality, our model uses a grid of features and a small MLP decoder that is transpiled into efficient shader code with interactive framerates. This leads to a seamless integration of the proposed neural assets with existing mesh environments and objects. Thanks to the use of standard shader code, rendering is portable across many existing hardware and software systems.

翻訳日:2022-12-13 15:45:52 公開日:2022-12-12

# PyPop7: 人口ベースのブラックボックス最適化のためのピュアPythonライブラリ

PyPop7: A Pure-Python Library for Population-Based Black-Box Optimization ( http://arxiv.org/abs/2212.05652v1 )

ライセンス: Link先を確認

Qiqi Duan, Guochen Zhou, Chang Shao, Zhuowei Wang, Mingyang Feng, Yijun Yang, Qi Zhao, Yuhui Shi

(参考訳) 本稿では,black-box optimization(bbo)用のpure-pythonオープンソースライブラリpypop7を提案する。 It provides a unified and modular interface for more than 60 versions and variants of different black-box optimization algorithms, particularly population-based optimizers, which can be classified into 12 popular families: Evolution Strategies (ES), Natural Evolution Strategies (NES), Estimation of Distribution Algorithms (EDA), Cross-Entropy Method (CEM), Differential Evolution (DE), Particle Swarm Optimizer (PSO), Cooperative Coevolution (CC), Simulated Annealing (SA), Genetic Algorithms (GA), Evolutionary Programming (EP), Pattern Search (PS), and Random Search (RS). また、多くの例や興味深いチュートリアル、本格的なAPIドキュメントも提供している。この新しいライブラリを通じて、オプティマイザのベンチマークと実際のアプリケーション、特に大規模BBOの促進のための、よく設計されたプラットフォームを提供することを期待する。ソースコードとドキュメントはhttps://github.com/Evolutionary-Intelligence/pypopとhttps://pypop.readthedocs.io/en/latestで公開されている。

In this paper, we present a pure-Python open-source library, called PyPop7, for black-box optimization (BBO). It provides a unified and modular interface for more than 60 versions and variants of different black-box optimization algorithms, particularly population-based optimizers, which can be classified into 12 popular families: Evolution Strategies (ES), Natural Evolution Strategies (NES), Estimation of Distribution Algorithms (EDA), Cross-Entropy Method (CEM), Differential Evolution (DE), Particle Swarm Optimizer (PSO), Cooperative Coevolution (CC), Simulated Annealing (SA), Genetic Algorithms (GA), Evolutionary Programming (EP), Pattern Search (PS), and Random Search (RS). It also provides many examples, interesting tutorials, and full-fledged API documentation. Through this new library, we expect to provide a well-designed platform for benchmarking optimizers and to promote their real-world applications, especially for large-scale BBO. Its source code and documentation are available at https://github.com/Evolutionary-Intelligence/pypop and https://pypop.readthedocs.io/en/latest, respectively.

翻訳日:2022-12-13 15:45:09 公開日:2022-12-12

# MoDem: デモによる視覚モデルに基づく強化学習の促進

MoDem: Accelerating Visual Model-Based Reinforcement Learning with Demonstrations ( http://arxiv.org/abs/2212.05698v1 )

ライセンス: Link先を確認

Nicklas Hansen, Yixin Lin, Hao Su, Xiaolong Wang, Vikash Kumar, Aravind Rajeswaran

(参考訳) サンプル効率の低さは、現実世界のアプリケーション、特にビジュオモーター制御のためのディープ強化学習(RL)アルゴリズムの展開において、引き続き主要な課題である。モデルベースのRLは、世界モデルを同時に学習し、計画と政策改善に合成ロールアウトを使用することで、非常にサンプル効率が良い可能性がある。しかし、実際には、モデルに基づくRLを用いたサンプル効率学習は探索課題によってボトルネックとなる。本研究では,ほんの一握りのデモを活用するだけで,モデルベースRLのサンプル効率を劇的に向上させることができることを示す。ただし、インタラクションデータセットにデモを追加するだけでは十分ではない。モデル学習においてデモンストレーションを活用する上で重要な要素として,ポリシ事前トレーニング,ターゲット探索,デモデータのオーバーサンプリングを特定し,これらがモデルベースのRLフレームワークの3つのフェーズを形成する。我々は,3つの複雑なビジュオモータ制御領域を実験的に研究し,この手法が低データ方式(100Kのインタラクションステップ,5つのデモ)の従来のアプローチと比較して,スパース報酬タスクの完了に150%-250%成功していることを確認した。コードとビデオは、https://nicklashansen.github.io/modemrl で入手できる。

Poor sample efficiency continues to be the primary challenge for deployment of deep Reinforcement Learning (RL) algorithms for real-world applications, and in particular for visuo-motor control. Model-based RL has the potential to be highly sample efficient by concurrently learning a world model and using synthetic rollouts for planning and policy improvement. However, in practice, sample-efficient learning with model-based RL is bottlenecked by the exploration challenge. In this work, we find that leveraging just a handful of demonstrations can dramatically improve the sample-efficiency of model-based RL. Simply appending demonstrations to the interaction dataset, however, does not suffice. We identify key ingredients for leveraging demonstrations in model learning -- policy pretraining, targeted exploration, and oversampling of demonstration data -- which form the three phases of our model-based RL framework. We empirically study three complex visuo-motor control domains and find that our method is 150%-250% more successful in completing sparse reward tasks compared to prior approaches in the low data regime (100K interaction steps, 5 demonstrations). Code and videos are available at: https://nicklashansen.github.io/modemrl

翻訳日:2022-12-13 15:44:00 公開日:2022-12-12

# CACTI: スケーラブルなマルチタスクマルチステージ視覚模倣学習フレームワーク

CACTI: A Framework for Scalable Multi-Task Multi-Scene Visual Imitation Learning ( http://arxiv.org/abs/2212.05711v1 )

ライセンス: Link先を確認

Zhao Mandi, Homanga Bharadhwaj, Vincent Moens, Shuran Song, Aravind Rajeswaran, Vikash Kumar

(参考訳) 多くのスキルと未発見のシナリオへの一般化が可能なロボットの開発には、大規模で多様なデータセットの効率的な収集と、収集されたデータに対する高容量ポリシーのトレーニングという2つの面での進歩が必要だ。大規模なデータセットはコンピュータビジョンや自然言語処理といった他の分野の進歩を加速させているが、ロボット工学のような物理システムでは、同等のスケールのデータを集めることが特に難しい。本研究では,このギャップを解消し,キッチン環境におけるマルチタスクマルチセンシングロボット操作のレンズとして,ロボット学習のスケールアップを実現するフレームワークを提案する。 CACTIという名前のフレームワークは,データ収集,データ拡張,視覚表現学習,模倣ポリシートレーニングの4つの段階を別々に扱う。 CACTIフレームワークでは、画像生成に最先端モデルを適用する利点と、圧縮段階における事前訓練された領域外視覚表現を使用することによるトレーニング効率の大幅な向上を強調した。実験では 1) 実際のロボットのセットアップにおいて、CACTIは、キッチンオブジェクトを含む10の操作作業が可能な単一ポリシーの効率的な訓練を可能にし、邪魔対象のレイアウトに頑健である。 2) シミュレーションキッチン環境では,CACTIは18のセマンティックタスクに対して,最大50のレイアウトバリエーションで単一のポリシをトレーニングする。シミュレーションタスクベンチマークと、実環境とシミュレーション環境の両方のデータセットがリリースされ、将来の研究が促進される。

Developing robots that are capable of many skills and generalization to unseen scenarios requires progress on two fronts: efficient collection of large and diverse datasets, and training of high-capacity policies on the collected data. While large datasets have propelled progress in other fields like computer vision and natural language processing, collecting data of comparable scale is particularly challenging for physical systems like robotics. In this work, we propose a framework to bridge this gap and better scale up robot learning, under the lens of multi-task, multi-scene robot manipulation in kitchen environments. Our framework, named CACTI, has four stages that separately handle data collection, data augmentation, visual representation learning, and imitation policy training. In the CACTI framework, we highlight the benefit of adapting state-of-the-art models for image generation as part of the augmentation stage, and the significant improvement of training efficiency by using pretrained out-of-domain visual representations at the compression stage. Experimentally, we demonstrate that 1) on a real robot setup, CACTI enables efficient training of a single policy capable of 10 manipulation tasks involving kitchen objects, and robust to varying layouts of distractor objects; 2) in a simulated kitchen environment, CACTI trains a single policy on 18 semantic tasks across up to 50 layout variations per task. The simulation task benchmark and augmented datasets in both real and simulated environments will be released to facilitate future research.

翻訳日:2022-12-13 15:43:41 公開日:2022-12-12

# 自己校正インタフェースの対話的紹介

Interactive introduction to self-calibrating interfaces ( http://arxiv.org/abs/2212.05766v1 )

ライセンス: Link先を確認

Jonathan Grizou

(参考訳) 本論文は,自己管理インタフェースパラダイムを直感的に理解することを目的とする。このパラダイムでは、オンザフライで好みに合わせてインターフェイスを使用する方法を選択することができる。そこで我々は,PIN入力タスクを導入し,事前校正されたインタフェースから自己校正インターフェースへ移行し,ボタンからの入力モダリティの複雑さを増大させ,地図上のポイント,スケッチ,最後に音声語への変換を行う。本研究は, 従来の研究論文ではなく, 主張を裏付ける仮説と実験結果であり, この研究を裏付ける研究はすでにすでに行われており, 後段で広く言及されている。代わりに私たちの目標は、イラストやインタラクティブなデモ、ビデオなどをサポートする小さな論理的なステップで、興味をそそる対話パラダイムを身につけ、学習を強化することです。この論文は、あらゆる背景の好奇心の楽しみのために設計され、平易な英語で書かれており、事前の知識は必要ない。すべてのデモはopenvault.jgrizou.comでオンラインで公開されている。

This interactive paper aims to provide an intuitive understanding of the self-calibrating interface paradigm. Under this paradigm, you can choose how to use an interface, which adapts to your preferences on the fly. We introduce a PIN entering task and gradually release constraints, moving from a pre-calibrated interface to a self-calibrating interface while increasing the complexity of input modalities from buttons, to points on a map, to sketches, and finally to spoken words. This is not a traditional research paper with a hypothesis and experimental results to support claims; the research supporting this work has already been done and we refer to it extensively in the later sections. Instead, our aim is to walk you through an intriguing interaction paradigm in small logical steps with supporting illustrations, interactive demonstrations, and videos to reinforce your learning. We designed this paper for the enjoyment of curious minds of any background; it is written in plain English and no prior knowledge is necessary. All demos are available online at openvault.jgrizou.com and linked individually in the paper.

翻訳日:2022-12-13 15:43:17 公開日:2022-12-12

# 非線形コンテキスト帯域とマルコフ決定過程に対する不確かさ重み付き破壊ロバストアルゴリズム

Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes ( http://arxiv.org/abs/2212.05949v1 )

ライセンス: Link先を確認

Chenlu Ye, Wei Xiong, Quanquan Gu, Tong Zhang

(参考訳) 敵の汚職に伴う強化学習(RL)問題への大きな関心と進展にもかかわらず、現在の作業は線形設定に限られるか、望ましくない$\tilde{O}(\sqrt{T}\zeta)$ regret boundにつながり、$T$はラウンド数、$\zeta$は総汚職数である。本稿では,一般関数近似を用いた文脈的帯域幅を考慮し,$\tilde{O}(\sqrt{T}+\zeta)$の後悔を実現するための計算効率の良いアルゴリズムを提案する。提案アルゴリズムは、最近開発された線形文脈帯域からの不確実性重み付き最小二乗回帰と、一般関数クラスに対する新しい重み付き推定器に依存する。線形構造に大きく依存する既存の解析とは対照的に,重み付き不確実性の総和を制御する新しい手法を開発し,最終的な後悔境界を確立する。次に、このアルゴリズムをエピソディックmdp設定に一般化し、一般関数近似のシナリオにおいて、まず汚職レベル$\zeta$に対する加法依存を達成する。特に、我々のアルゴリズムは、すべての汚職レベルと未知の$\zeta$のケースにおいて、パフォーマンスの低いバウンダリにほぼ一致するか、既存のメソッドを改善している。

Despite the significant interest and progress in reinforcement learning (RL) problems with adversarial corruption, current works are either confined to the linear setting or lead to an undesired $\tilde{O}(\sqrt{T}\zeta)$ regret bound, where $T$ is the number of rounds and $\zeta$ is the total amount of corruption. In this paper, we consider the contextual bandit with general function approximation and propose a computationally efficient algorithm to achieve a regret of $\tilde{O}(\sqrt{T}+\zeta)$. The proposed algorithm relies on the recently developed uncertainty-weighted least-squares regression from linear contextual bandits \citep{he2022nearly} and a new weighted estimator of uncertainty for the general function class. In contrast to the existing analysis, which heavily relies on the linear structure, we develop a novel technique to control the sum of weighted uncertainty, thus establishing the final regret bounds. We then generalize our algorithm to the episodic MDP setting and are the first to achieve an additive dependence on the corruption level $\zeta$ in the scenario of general function approximation. Notably, our algorithms achieve regret bounds that either nearly match the lower bound or improve on existing methods for all corruption levels, in both the known and unknown $\zeta$ cases.

翻訳日:2022-12-13 15:37:09 公開日:2022-12-12

# VO$Q$L:非線形関数近似を用いたモデルフリーRLの最適回帰に向けて

VO$Q$L: Towards Optimal Regret in Model-free RL with Nonlinear Function Approximation ( http://arxiv.org/abs/2212.06069v1 )

ライセンス: Link先を確認

Alekh Agarwal, Yujia Jin, Tong Zhang

(参考訳) 一般関数近似とスパース報酬による時間的不均一なエピソード強化学習(RL)について検討した。我々は,$Q$-learningに基づく新しいアルゴリズムである分散重み付き楽観的$Q$-learning (VO$Q$L) を設計し,完全性と回帰関数クラスの有界なEluder次元を仮定してその後悔を上界した。特別な場合として、VO$Q$L は ($d$次元の) 線形関数近似の下で、地平線 $H$ の MDP に対して $T$ エピソードで $\tilde{O}(d\sqrt{HT}+d^6H^{5})$ の後悔を達成し、これは漸近的に最適である。本アルゴリズムは, 重み付き回帰に基づく上限と下限を最適値関数に組み込んで, 改良された後悔を得る。このアルゴリズムは関数クラス上の回帰オラクルが与えられれば計算的に効率的であり、線形MDPに対して計算可能で統計的に最適な初のアプローチとなる。

We study time-inhomogeneous episodic reinforcement learning (RL) under general function approximation and sparse rewards. We design a new algorithm, Variance-weighted Optimistic $Q$-Learning (VO$Q$L), based on $Q$-learning and bound its regret assuming completeness and bounded Eluder dimension for the regression function class. As a special case, VO$Q$L achieves $\tilde{O}(d\sqrt{HT}+d^6H^{5})$ regret over $T$ episodes for a horizon $H$ MDP under ($d$-dimensional) linear function approximation, which is asymptotically optimal. Our algorithm incorporates weighted regression-based upper and lower bounds on the optimal value function to obtain this improved regret. The algorithm is computationally efficient given a regression oracle over the function class, making this the first computationally tractable and statistically optimal approach for linear MDPs.

翻訳日:2022-12-13 15:36:41 公開日:2022-12-12

# ALSO:運転推定による自動車ライダー自己監督

ALSO: Automotive Lidar Self-supervision by Occupancy estimation ( http://arxiv.org/abs/2212.05867v1 )

ライセンス: Link先を確認

Alexandre Boulch, Corentin Sautier, Björn Michele, Gilles Puy, Renaud Marlet

(参考訳) 本稿では,ポイントクラウド上で動作する深層知覚モデルのバックボーンを事前学習する新しい自己教師あり手法を提案する。中心となる考え方は、3Dポイントがサンプリングされる表面の再構成であるプリテキストタスクでモデルを訓練し、基礎となる潜在ベクトルを知覚ヘッドへの入力として使用することである。直感的には、もしネットワークがシーン表面を再構築できるなら、わずかな入力ポイントのみを与えられた場合、おそらく、実際の知覚タスクを促進するために使用できる意味情報の断片をキャプチャする。この原理は非常に単純な定式化であり、実装が容易であり、多種多様な3dセンサーや、セマンティックセグメンテーションやオブジェクト検出を行うディープネットワークにも広く適用できる。実際、ほとんどの対照的な学習アプローチとは対照的に、単一のストリームパイプラインをサポートし、限られたリソースでのトレーニングを可能にする。セマンティクスセグメンテーションとオブジェクト検出の両面で,異なる種類のライダーを含む様々な自律運転データセットについて広範な実験を行った。その結果,既存の手法と比較して,アノテーションなしで有用な表現を学習する手法の有効性が示された。コードは \href{https://github.com/valeoai/also}{github.com/valeoai/also} で入手できる。

We propose a new self-supervised method for pre-training the backbone of deep perception models operating on point clouds. The core idea is to train the model on a pretext task which is the reconstruction of the surface on which the 3D points are sampled, and to use the underlying latent vectors as input to the perception head. The intuition is that if the network is able to reconstruct the scene surface, given only sparse input points, then it probably also captures some fragments of semantic information, that can be used to boost an actual perception task. This principle has a very simple formulation, which makes it both easy to implement and widely applicable to a large range of 3D sensors and deep networks performing semantic segmentation or object detection. In fact, it supports a single-stream pipeline, as opposed to most contrastive learning approaches, allowing training on limited resources. We conducted extensive experiments on various autonomous driving datasets, involving very different kinds of lidars, for both semantic segmentation and object detection. The results show the effectiveness of our method to learn useful representations without any annotation, compared to existing approaches. Code is available at https://github.com/valeoai/ALSO

翻訳日:2022-12-13 15:35:28 公開日:2022-12-12

# CLIP Itselfは強力なファインタナーで、ImageNetのViT-BとViT-Lで85.7%と88.0%のTop-1の精度を達成した

CLIP Itself is a Strong Fine-tuner: Achieving 85.7% and 88.0% Top-1 Accuracy with ViT-B and ViT-L on ImageNet ( http://arxiv.org/abs/2212.06138v1 )

ライセンス: Link先を確認

Xiaoyi Dong and Jianmin Bao and Ting Zhang and Dongdong Chen and Shuyang Gu and Weiming Zhang and Lu Yuan and Dong Chen and Fang Wen and Nenghai Yu

(参考訳) 近年の研究では、CLIPはゼロショット推論に成功しているが、微調整性能は不十分である。本稿では,超パラメータ選択による微調整性能の影響について検討する。各種重要パラメータについて検討し,分類タスクにおける微調整CLIPの影響を包括的研究により実証的に評価した。 CLIPの微調整性能はかなり過小評価されている。大規模教師付き事前トレーニングアプローチや,Masked Image Modelingの予測ターゲットとしてCLIPを使用する最新の研究と比較して,CLIP自体が微調整において優れているか,少なくとも競争的であることを示す。具体的には、CLIP ViT-Base/16とCLIP ViT-Large/14は、ImageNet-1Kデータセット上のTop-1精度を85.7%、88.0%微調整することができる。これらの観察は、CLIPは微調整には適さないという従来の結論に挑戦し、最近提案されたCLIPに基づく改善を再考する動機となった。当社のコードは、 \url{https://github.com/LightDXY/FT-CLIP}で公開します。

Recent studies have shown that CLIP has achieved remarkable success in performing zero-shot inference while its fine-tuning performance is not satisfactory. In this paper, we identify that fine-tuning performance is significantly impacted by hyper-parameter choices. We examine various key hyper-parameters and empirically evaluate their impact in fine-tuning CLIP for classification tasks through a comprehensive study. We find that the fine-tuning performance of CLIP is substantially underestimated. Equipped with hyper-parameter refinement, we demonstrate that CLIP itself is better or at least competitive in fine-tuning compared with large-scale supervised pre-training approaches or the latest works that use CLIP as prediction targets in Masked Image Modeling. Specifically, CLIP ViT-Base/16 and CLIP ViT-Large/14 can achieve 85.7% and 88.0% fine-tuning Top-1 accuracy, respectively, on the ImageNet-1K dataset. These observations challenge the conventional conclusion that CLIP is not suitable for fine-tuning, and motivate us to rethink recently proposed improvements based on CLIP. We will release our code publicly at https://github.com/LightDXY/FT-CLIP.

翻訳日:2022-12-13 15:35:06 公開日:2022-12-12

# AECにおける自動ルールチェックのためのテキストマイニングに基づく特許分析

Text Mining-Based Patent Analysis for Automated Rule Checking in AEC ( http://arxiv.org/abs/2212.05891v1 )

ライセンス: Link先を確認

Zhe Zheng, Bo-Rui Kang, Qi-Tian Yuan, Yu-Cheng Zhou, Xin-Zheng Lu, Jia-Rui Lin

(参考訳) アーキテクチャ、エンジニアリング、建設(aec)業界におけるコンプライアンスチェックプロセスの効率性を促進することが期待されている自動ルールチェック(arc)が注目されている。 ARCアプリケーションのホットスポットに光を当て、そのトレンドを予測することは、関連する研究とイノベーションの推進に役立つ。そこで本研究では,derwent innovations index database (dii) とchina national knowledge infrastructure (cnki) のデータベースから特許をデータソースとし,(1)特許の定量的特徴(すなわち,年次分配分析),(2)潜在ディリクレ割当(lda)を用いたアークトピックの同定,(3)snaによるアークトピックの共起分析を含む3段階の分析を行った。その結果,中国と英語の特許研究のホットスポットと傾向が異なっていた。本研究の貢献は,(1)複数のテキストマイニング手法(sna,lda)を統合した総合的特許分析へのアプローチ,(2)特許分析に基づいてarcの応用ホットスポットと開発動向をレビューする,(3)arcの技術開発とイノベーションのための標識を提供する,の3つの側面を有する。

Automated rule checking (ARC), which is expected to promote the efficiency of the compliance checking process in the architecture, engineering, and construction (AEC) industry, is gaining increasing attention. Shedding light on ARC application hotspots and forecasting its trends are useful for related research and for driving innovation. Therefore, this study takes patents from the Derwent Innovations Index (DII) database and the China National Knowledge Infrastructure (CNKI) database as data sources and carries out a three-step analysis: (1) quantitative characteristics of the patents (i.e., annual distribution analysis), (2) identification of ARC topics using latent Dirichlet allocation (LDA), and (3) SNA-based co-occurrence analysis of ARC topics. The results show that the research hotspots and trends of Chinese and English patents are different. This study makes three contributions: (1) it introduces an approach to comprehensive patent analysis that integrates multiple text mining methods (i.e., SNA and LDA); (2) it reviews the application hotspots and development trends of ARC based on patent analysis; and (3) it provides a signpost for the technological development and innovation of ARC.

翻訳日:2022-12-13 15:34:46 公開日:2022-12-12

# ソースコード変換器のパラメータ効率向上

Parameter-Efficient Finetuning of Transformers for Source Code ( http://arxiv.org/abs/2212.05901v1 )

ライセンス: Link先を確認

Shamil Ayupov and Nadezhda Chirkova

(参考訳) 事前訓練されたトランスフォーマーは、様々なコード処理タスクで最先端のパフォーマンスを達成するが、デプロイするには大きすぎる可能性がある。ソフトウェア開発ツールは、事前訓練されたモデルの単一インスタンスを使用する可能性がある様々な目的のためにモジュールを組み込むことが多いため、事前訓練されたコードのモデルに対してパラメータ効率の良い微調整を利用する必要があると思われる。本研究では,NLPタスクで最初にテストされたアダプタとLoRAの2つのアプローチを4つのコード処理タスクでテストする。効率的な微調整アプローチは、標準的な、コード理解タスクの完全な微調整と同等あるいは高いパフォーマンスを達成できますが、コード生成タスクの完全な微調整を過小評価しています。これらの結果は、NLP以外の領域で効率的な微調整アプローチをテストすることの重要性を浮き彫りにし、ソースコードの効率的な微調整における将来の研究を動機付けている。

Pretrained Transformers achieve state-of-the-art performance in various code-processing tasks but may be too large to be deployed. As software development tools often incorporate modules for various purposes which may potentially use a single instance of the pretrained model, it appears relevant to utilize parameter-efficient fine-tuning for the pretrained models of code. In this work, we test two widely used approaches, adapters and LoRA, which were initially tested on NLP tasks, on four code-processing tasks. We find that though the efficient fine-tuning approaches may achieve comparable or higher performance than the standard, full, fine-tuning in code understanding tasks, they underperform full fine-tuning in code-generative tasks. These results underline the importance of testing efficient fine-tuning approaches on other domains than NLP and motivate future research in efficient fine-tuning for source code.
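For readers unfamiliar with LoRA, one of the two approaches tested here, a minimal NumPy sketch of the idea follows: the pretrained weight stays frozen and only a low-rank update is trained. The class name `LoRALinear`, the layer sizes, and the hyper-parameters are illustrative assumptions, not the configuration used in the paper.

```python
import numpy as np


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight  # frozen pretrained weight, never updated
        self.A = rng.normal(scale=0.01, size=(rank, in_dim))  # trainable
        self.B = np.zeros((out_dim, rank))  # trainable, zero-init: update starts at 0
        self.scale = alpha / rank

    def __call__(self, x):
        # Equivalent to x @ (W + scale * B @ A).T without forming the full matrix.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_params(self):
        return self.A.size + self.B.size


W = np.random.default_rng(1).normal(size=(64, 128))  # stand-in for a pretrained layer
layer = LoRALinear(W, rank=4)
x = np.ones((2, 128))

print(layer.trainable_params(), "trainable vs", W.size, "frozen parameters")
```

Because `B` starts at zero, the adapted layer initially reproduces the pretrained layer exactly, and only `A` and `B` would receive gradient updates during fine-tuning.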

翻訳日:2022-12-13 15:34:20 公開日:2022-12-12

# MegaCRN:時空間モデリングのためのメタグラフ畳み込みリカレントネットワーク

MegaCRN: Meta-Graph Convolutional Recurrent Network for Spatio-Temporal Modeling ( http://arxiv.org/abs/2212.05989v1 )

ライセンス: Link先を確認

Renhe Jiang, Zhaonan Wang, Jiawei Yong, Puneet Jeph, Quanjun Chen, Yasumasa Kobayashi, Xuan Song, Toyotaro Suzumura, Shintaro Fukushima

(参考訳) 多変量時系列予測の標準タスクとしての時空間モデリングは、AIコミュニティにおいて重要な研究トピックとなっている。グラフストリームに暗示される不均一性と非定常性に対処するため,時空間データに対する新しいグラフ構造学習機構として時空間メタグラフ学習を提案する。具体的には,このアイデアをMeta-Graph Convolutional Recurrent Network(MegaCRN)に実装し,Meta-ノードバンクを利用したMeta-Graph LearnerをGCRNエンコーダに接続する。我々は,2つのベンチマークデータセット(METR-LAとPEMS-BAY)と,非定常現象のばらつきを含む大規模時空間データセットの総合的な評価を行う。私たちのモデルは3つのデータセット(27% mae と 34% rmse)すべてにおいて最先端を上回りました。さらに,一連の質的評価を通じて,異なるパターンを持つ位置と時間スロットを明示的に区別し,異常な状況に対してロバストに適応できることを実証する。コードとデータセットはhttps://github.com/deepkashiwa20/megacrnで入手できる。

Spatio-temporal modeling, as a canonical task of multivariate time series forecasting, has been a significant research topic in the AI community. To address the underlying heterogeneity and non-stationarity implied in the graph streams, in this study, we propose Spatio-Temporal Meta-Graph Learning as a novel Graph Structure Learning mechanism on spatio-temporal data. Specifically, we implement this idea in the Meta-Graph Convolutional Recurrent Network (MegaCRN) by plugging the Meta-Graph Learner, powered by a Meta-Node Bank, into a GCRN encoder-decoder. We conduct a comprehensive evaluation on two benchmark datasets (METR-LA and PEMS-BAY) and a large-scale spatio-temporal dataset that contains a variety of non-stationary phenomena. Our model outperforms the state of the art by a large margin on all three datasets (over 27% MAE and 34% RMSE). In addition, through a series of qualitative evaluations, we demonstrate that our model can explicitly disentangle locations and time slots with different patterns and adapt robustly to different anomalous situations. Code and datasets are available at https://github.com/deepkashiwa20/MegaCRN.

翻訳日:2022-12-13 15:27:14 公開日:2022-12-12

# 確率的帰納的説明の計算について

On Computing Probabilistic Abductive Explanations ( http://arxiv.org/abs/2212.05990v1 )

ライセンス: Link先を確認

Yacine Izza, Xuanxiang Huang, Alexey Ignatiev, Nina Narodytska, Martin C. Cooper and Joao Marques-Silva

(参考訳) 最も広く研究されているAI(XAI)アプローチは正しくない。これはよく知られたモデルに依存しない説明手法のケースであり、従順写像に基づくアプローチのケースでもある。一つの解決策は、不明瞭さの欠点を示さない本質的な解釈可能性を考えることである。不運なことに、本質的な解釈性は説明の冗長性を示すことができる。形式的説明可能性(英語版)はこれらの非厳密なアプローチの代替であり、その一例がPI説明である。残念なことに、PI-Explanationsは重要な欠点も示しており、最も目に見えるのはおそらくそのサイズである。近年,PI-Explanationsの厳密な厳密な厳密さは,いわゆる関連する集合を計算することによって,より小さな説明サイズで取り除けることが観察されている。ある正の {\delta} が与えられたとき、S の特徴が固定されたとき、対象のクラスを得る確率が {\delta} を超えるとき、S の集合 S は {\delta}-関連である。しかし、非常に単純な分類器であっても、関連する特徴集合の計算の複雑さは禁じられ、回路ベースの分類器ではNPPP完全である。従来の否定的な結果とは対照的に,決定木(DT),ネイブベイズ分類器(NBC),命題言語から得られたいくつかの分類器群など,広く使われている分類器の集合を計算するための実践的アプローチを検討する。さらに,本論文では,これらの分類器の族に対して,関連する集合の計算が容易であることを示す。さらに,検討した分類器群に対して,関連特徴の簡潔な集合が得られることを確認した。

The most widely studied explainable AI (XAI) approaches are unsound. This is the case with well-known model-agnostic explanation approaches, and it is also the case with approaches based on saliency maps. One solution is to consider intrinsic interpretability, which does not exhibit the drawback of unsoundness. Unfortunately, intrinsic interpretability can display unwieldy explanation redundancy. Formal explainability represents the alternative to these non-rigorous approaches, with one example being PI-explanations. Unfortunately, PI-explanations also exhibit important drawbacks, the most visible of which is arguably their size. Recently, it has been observed that the (absolute) rigor of PI-explanations can be traded off for a smaller explanation size, by computing the so-called relevant sets. Given some positive $\delta$, a set $S$ of features is $\delta$-relevant if, when the features in $S$ are fixed, the probability of getting the target class exceeds $\delta$. However, even for very simple classifiers, the complexity of computing relevant sets of features is prohibitive, with the decision problem being $\mathrm{NP}^{\mathrm{PP}}$-complete for circuit-based classifiers. In contrast with earlier negative results, this paper investigates practical approaches for computing relevant sets for a number of widely used classifiers that include Decision Trees (DTs), Naive Bayes Classifiers (NBCs), and several families of classifiers obtained from propositional languages. Moreover, the paper shows that, in practice, and for these families of classifiers, relevant sets are easy to compute. Furthermore, the experiments confirm that succinct sets of relevant features can be obtained for the families of classifiers considered.
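The definition of a relevant set can be made concrete with a brute-force check on a tiny classifier. This is only a sketch under stated assumptions: binary features, a uniform distribution over the features that are left free, and a hypothetical `majority` classifier. The enumeration is exponential in the number of free features, which is exactly why the paper looks for practical algorithms on specific classifier families.

```python
from itertools import product


def is_delta_relevant(classifier, instance, fixed, delta):
    """Return True if fixing the features in `fixed` to their values in `instance`
    yields the instance's predicted class with probability > delta, assuming the
    remaining binary features are uniformly distributed."""
    target = classifier(instance)
    free = [i for i in range(len(instance)) if i not in fixed]
    hits = 0
    for values in product([0, 1], repeat=len(free)):
        point = list(instance)
        for i, v in zip(free, values):
            point[i] = v
        hits += classifier(tuple(point)) == target
    return hits / 2 ** len(free) > delta


# Hypothetical toy classifier: predicts 1 when at least two of three features are set.
def majority(x):
    return int(sum(x) >= 2)


instance = (1, 1, 0)
print(is_delta_relevant(majority, instance, {0}, 0.7))     # 3 of 4 completions agree
print(is_delta_relevant(majority, instance, {0, 1}, 0.9))  # both completions agree
```

Fixing only feature 0 keeps the prediction in three of the four completions, so that set is relevant for a threshold of 0.7 but not for 0.9, while fixing features 0 and 1 guarantees the prediction.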

翻訳日:2022-12-13 15:26:52 公開日:2022-12-12

# 変圧器層のニューラルネットワークによる解釈

A Neural ODE Interpretation of Transformer Layers ( http://arxiv.org/abs/2212.06011v1 )

ライセンス: Link先を確認

Yaofeng Desmond Zhong and Tongtao Zhang and Amit Chakraborty and Biswadip Dey

(参考訳) マルチヘッドアテンションとマルチレイヤパーセプトロン(MLP)レイヤの交互パターンを使用するトランスフォーマーレイヤは、さまざまな機械学習問題に対して効果的なツールを提供する。変圧器層は勾配の解消の問題を避けるために残差接続を用いるため、微分方程式の数値積分と見なすことができる。この拡張抽象化では、この接続の上に構築し、トランス層の内部構造を変更することを提案する。提案モデルでは,マルチヘッドアテンションサブレイヤとMLPサブレイヤを並列に配置する。この簡単な修正により,複数のタスクにおけるトランスフォーマーネットワークの性能が向上することを示す。さらに,画像分類タスクにおいて,高度な統合スキームを持つニューラルodeソルバを用いることにより,さらに性能が向上することを示す。

Transformer layers, which use an alternating pattern of multi-head attention and multi-layer perceptron (MLP) layers, provide an effective tool for a variety of machine learning problems. As the transformer layers use residual connections to avoid the problem of vanishing gradients, they can be viewed as the numerical integration of a differential equation. In this extended abstract, we build upon this connection and propose a modification of the internal architecture of a transformer layer. The proposed model places the multi-head attention sublayer and the MLP sublayer parallel to each other. Our experiments show that this simple modification improves the performance of transformer networks in multiple tasks. Moreover, for the image classification task, we show that using neural ODE solvers with a sophisticated integration scheme further improves performance.
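The connection between residual layers and numerical integration can be written out as follows. This is a simplified sketch that omits layer normalization, and the symbols $x_l$, $\mathrm{Attn}$ and $\mathrm{MLP}$ are notation assumed here rather than taken from the paper.

```latex
% Standard (sequential) layer: two residual updates applied one after the other.
\begin{align}
  \tilde{x}_l &= x_l + \mathrm{Attn}(x_l), \\
  x_{l+1}     &= \tilde{x}_l + \mathrm{MLP}(\tilde{x}_l).
\end{align}
% Parallel layer: both sublayers read the same input x_l.
\begin{equation}
  x_{l+1} = x_l + \mathrm{Attn}(x_l) + \mathrm{MLP}(x_l).
\end{equation}
% The parallel update is one forward-Euler step with unit step size for the ODE
\begin{equation}
  \frac{\mathrm{d}x}{\mathrm{d}t} = \mathrm{Attn}(x) + \mathrm{MLP}(x).
\end{equation}
```

Seen this way, replacing the forward-Euler step with a higher-order integration scheme is a natural extension, which is consistent with the abstract's remark about more sophisticated neural ODE solvers.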

翻訳日:2022-12-13 15:26:22 公開日:2022-12-12

# 多次元自己注意に基づく生活推定のためのアプローチ

Multi-Dimensional Self Attention based Approach for Remaining Useful Life Estimation ( http://arxiv.org/abs/2212.05772v1 )

ライセンス: Link先を確認

Zhi Lai, Mengjuan Liu, Yunzhu Pan, Dajiang Chen

(参考訳) Remaining Useful Life (RUL) は、予後・健康管理(PHM)において重要な役割を担っている。従来の機械の健康維持システムはしばしばコストがかかり、事前の専門知識が必要であり、高度に複雑で変化する産業シナリオに適合することは困難である。産業機器へのセンサーの普及に伴い、これらの機器を相互接続するための産業用モノのインターネット(iiot)の構築は、デジタル工場の発展において不可解なトレンドとなっている。 IIoTが収集したリアルタイムな運用データを用いて、推定されたRULをRUL予測アルゴリズムにより取得し、PHMシステムはデバイスに対する前向きなメンテナンス対策を開発することにより、メンテナンスコストを低減し、運用中の障害時間を短縮することができる。本稿では,IIoTシナリオにおけるマルチセンサデバイスのための生活予測モデルについて検討する。本シナリオでは,主流rul予測モデルを調査し,rul予測モデリングの基本ステップを要約した。そこで本論文では,RUL推定のためのデータ駆動手法を提案する。複数のセンサから出力される多次元時系列データを融合するために、マルチヘッド注意機構を使用し、特徴に対する注意が特徴とシーケンスに対する注意の相互作用を捉え、時間ステップの重みを学習する。そして、時系列の特徴を学習するためにLong Short-Term Memory Networkを適用する。提案モデルを2つのベンチマークデータセット(c-mapssとphm08)で評価し,結果が最先端モデルを上回ることを示した。さらに, マルチヘッドアテンション機構の解釈可能性により, 提案モデルはエンジン劣化の予備的な説明を与えることができる。したがって、このアプローチはIIoTシナリオの予測メンテナンスを約束する。

Remaining Useful Life (RUL) estimation plays a critical role in Prognostics and Health Management (PHM). Traditional machine health maintenance systems are often costly, requiring sufficient prior expertise, and are difficult to fit into highly complex and changing industrial scenarios. With the widespread deployment of sensors on industrial equipment, building the Industrial Internet of Things (IIoT) to interconnect these devices has become an inexorable trend in the development of the digital factory. Using the device's real-time operational data collected by IIoT to get the estimated RUL through the RUL prediction algorithm, the PHM system can develop proactive maintenance measures for the device, thus reducing maintenance costs and decreasing failure times during operation. This paper carries out research into the remaining useful life prediction model for multi-sensor devices in the IIoT scenario. We investigated the mainstream RUL prediction models and summarized the basic steps of RUL prediction modeling in this scenario. On this basis, a data-driven approach for RUL estimation is proposed in this paper. It employs a Multi-Head Attention Mechanism to fuse the multi-dimensional time-series data output from multiple sensors, in which the attention on features is used to capture the interactions between features and attention on sequences is used to learn the weights of time steps. Then, the Long Short-Term Memory Network is applied to learn the features of time series. We evaluate the proposed model on two benchmark datasets (C-MAPSS and PHM08), and the results demonstrate that it outperforms the state-of-the-art models. Moreover, through the interpretability of the multi-head attention mechanism, the proposed model can provide a preliminary explanation of engine degradation. Therefore, this approach is promising for predictive maintenance in IIoT scenarios.

翻訳日:2022-12-13 15:25:52 公開日:2022-12-12

# gwrboost:空間変動関係の定量的定量化のための地理的重み付け勾配促進法

GWRBoost: A geographically weighted gradient boosting method for explainable quantification of spatially-varying relationships ( http://arxiv.org/abs/2212.05814v1 )

ライセンス: Link先を確認

Han Wang, Zhou Huang, Ganmin Yin, Yi Bao, Xiao Zhou, Yong Gao

(参考訳) 地理的重み付け回帰(GWR)は、地理的文脈における従属変数と独立変数の関係の空間的変動を推定するための重要なツールである。しかし、gwrモデルを構成する古典的な線形回帰は、特にかなりの体積と複雑な非線形データにおいて不適合になりがちであり、比較性能が劣るという問題に苦しんでいる。それでも、決定木やサポートベクトルマシンのような先進的なモデルでは、より効率的に複雑なデータから特徴を学習できるが、局所的な関係の空間的変動について説明可能な定量化はできない。上記の問題に対処するため, 局所的な加法モデルと勾配強化最適化法を適用し, 地理的に位置する変数間の空間的に変化する関係について, 説明可能な定量化能力を保持するGWRBoostを提案する。さらに,提案モデルに対する赤池情報スコアの計算方法を定式化し,従来のGWRアルゴリズムとの比較分析を行う。シミュレーション実験と実験ケーススタディを適用して, GWRBoostの性能と実用性を実証した。その結果,提案モデルはパラメータ推定精度が18.3\%,accが67.3\%,適合性が67.3\%低減できることがわかった。

The geographically weighted regression (GWR) is an essential tool for estimating the spatial variation of relationships between dependent and independent variables in geographical contexts. However, GWR suffers from the problem that the classical linear regressions that compose the GWR model are prone to underfitting, especially on large-volume and complex nonlinear data, which leads to inferior comparative performance. Some advanced models, such as the decision tree and the support vector machine, can learn features from complex data more effectively, but they cannot provide explainable quantification of the spatial variation of localized relationships. To address the above issues, we propose a geographically gradient boosting weighted regression model, GWRBoost, that applies the localized additive model and gradient boosting optimization method to alleviate underfitting problems and retains explainable quantification capability for spatially-varying relationships between geographically located variables. Furthermore, we formulate the computation method of the Akaike information score for the proposed model to conduct the comparative analysis with the classic GWR algorithm. Simulation experiments and the empirical case study are applied to prove the efficient performance and practical value of GWRBoost. The results show that our proposed model can reduce the RMSE by 18.3% in parameter estimation accuracy and AICc by 67.3% in the goodness of fit.
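For orientation, the local fitting step behind classic GWR (not GWRBoost itself) can be sketched as a weighted least-squares fit in which observations closer to the target location receive larger weights. The Gaussian kernel, the single-predictor model, and the name `local_fit` are assumptions chosen for this illustration.

```python
import math

def local_fit(target, coords, x, y, bandwidth):
    """Weighted least squares for y ~ a + b*x, with Gaussian kernel weights
    centred on `target` (kernel choice is an assumption of this sketch).
    Returns (intercept, slope) estimated for that location."""
    w = [math.exp(-0.5 * (math.dist(target, c) / bandwidth) ** 2) for c in coords]
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return ym - slope * xm, slope
```

Calling `local_fit` once per location yields a map of coefficients, which is the spatially varying relationship the abstract refers to. A small bandwidth tracks local changes closely, while a large one approaches a single global regression.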

翻訳日:2022-12-13 15:25:24 公開日:2022-12-12

# 継続KD:継続最適化レンズによる知識蒸留の改善

Continuation KD: Improved Knowledge Distillation through the Lens of Continuation Optimization ( http://arxiv.org/abs/2212.05998v1 )

ライセンス: Link先を確認

Aref Jafari, Ivan Kobyzev, Mehdi Rezagholizadeh, Pascal Poupart, Ali Ghodsi

(参考訳) 知識蒸留(KD)は、より大規模なモデル(教師)から知識を伝達することで、小さなモデルの(学生)一般化を改善するために自然言語理解(NLU)タスクに広く用いられている。 kdメソッドは多くの設定で最先端のパフォーマンスを達成しているが、性能を制限するいくつかの問題に苦しんでいる。文献では,教師と学生のネットワーク間の容量ギャップがKDを非効率にすることを示した。さらに、既存のKD技術は教師の出力のノイズを軽減するものではない:教師の騒々しい振る舞いをモデル化することで、生徒がより有用な特徴を学ぶのを邪魔することができる。本稿では,これらの問題に対処し,従来の手法と比較して訓練を容易にする新しいKD手法を提案する。継続最適化にヒントを得て,この目標のスムーズなバージョンから始めることで,非凸KD目標を最適化する訓練手順を設計し,トレーニングが進むにつれてさらに複雑化する。提案手法(Continuation-KD)は,NLU(GLUEベンチマーク)およびコンピュータビジョンタスク(CIFAR-10およびCIFAR-100)上の各種コンパクトアーキテクチャにおける最先端性能を実現する。

Knowledge Distillation (KD) has been extensively used for natural language understanding (NLU) tasks to improve a small model's (a student) generalization by transferring the knowledge from a larger model (a teacher). Although KD methods achieve state-of-the-art performance in numerous settings, they suffer from several problems limiting their performance. It is shown in the literature that the capacity gap between the teacher and the student networks can make KD ineffective. Additionally, existing KD techniques do not mitigate the noise in the teacher's output: modeling the noisy behaviour of the teacher can distract the student from learning more useful features. We propose a new KD method that addresses these problems and facilitates the training compared to previous techniques. Inspired by continuation optimization, we design a training procedure that optimizes the highly non-convex KD objective by starting with the smoothed version of this objective and making it more complex as the training proceeds. Our method (Continuation-KD) achieves state-of-the-art performance across various compact architectures on NLU (GLUE benchmark) and computer vision tasks (CIFAR-10 and CIFAR-100).

翻訳日:2022-12-13 15:19:30 公開日:2022-12-12

# リモートセンシングにおける画像テキスト検索のためのスケール・semantic joint decoupling network

Scale-Semantic Joint Decoupling Network for Image-text Retrieval in Remote Sensing ( http://arxiv.org/abs/2212.05752v1 )

ライセンス: Link先を確認

Chengyu Zheng, Ning song, Ruoyu Zhang, Lei Huang, Zhiqiang Wei, Jie Nie (corresponding author)

(参考訳) リモートセンシングにおける画像テキスト検索は、データ分析と応用のための柔軟な情報を提供することを目的としている。近年、最先端の手法は「スケールデカップリング」と「セマンティックデカップリング」の戦略に特化して表現能力をさらに強化している。しかしながら、これらの以前のアプローチは、スケールやセマンティクスの分離に焦点をあてるが、これらの2つのアイデアを結合モデルにマージすることを無視し、クロスモーダル検索モデルの性能を極端に制限している。そこで,本稿では,リモートセンシング画像テキスト検索のための新しいスケール・セマンティクス・ジョイント・デカップリング・ネットワーク(ssjdn)を提案する。具体的には、Salience Feature extract (SFE) とSalience-Guided Suppression (SGS) のユニットを利用した双方向スケールデカップリング(BSD) モジュールを設計し、潜在的な特徴を適応的に抽出し、異なるスケールの手がかりを得るために、他のスケールでの煩雑な特徴を抑圧する。さらに,分類セマンティック・デカップリング(LSD)モジュールを,カテゴリセマンティック・ラベルを事前知識として活用して,重要なセマンティック関連情報を示す画像やテキストを監督する。最後に,stl(semantic-guided triple loss)の設計を行った。stlは損失関数を調整する定数を適応的に生成し,同じ意味画像とテキストにマッチする確率を改善し,検索モデルの収束時間を短縮する。提案するssjdnは,4つのベンチマークリモートセンシングデータセットで実施した数値実験で最先端のアプローチを上回っている。

Image-text retrieval in remote sensing aims to provide flexible information for data analysis and application. In recent years, state-of-the-art methods have been dedicated to "scale decoupling" and "semantic decoupling" strategies to further enhance the capability of representation. However, these previous approaches focus on disentangling either scale or semantics and neglect merging the two ideas in a single model, which severely limits the performance of cross-modal retrieval models. To address these issues, we propose a novel Scale-Semantic Joint Decoupling Network (SSJDN) for remote sensing image-text retrieval. Specifically, we design the Bidirectional Scale Decoupling (BSD) module, which exploits Salience Feature Extraction (SFE) and Salience-Guided Suppression (SGS) units to adaptively extract potential features and suppress cumbersome features at other scales in a bidirectional pattern to yield different scale clues. Besides, we design the Label-supervised Semantic Decoupling (LSD) module by leveraging the category semantic labels as prior knowledge to supervise images and texts probing significant semantic-related information. Finally, we design a Semantic-guided Triple Loss (STL), which adaptively generates a constant to adjust the loss function to improve the probability of matching the same semantic image and text and shorten the convergence time of the retrieval model. Our proposed SSJDN outperforms state-of-the-art approaches in numerical experiments conducted on four benchmark remote sensing datasets.

翻訳日:2022-12-13 15:18:06 公開日:2022-12-12

# 教師なし異常定位のためのマルチスケール特徴模倣

Multi-scale Feature Imitation for Unsupervised Anomaly Localization ( http://arxiv.org/abs/2212.05786v1 )

ライセンス: Link先を確認

Chao Hu, Shengxin Lai

(参考訳) 非教師付き異常局在化タスクは、異常サンプルトレーニングの欠如、複数のタイプの異常の検出、複数の異常領域の比率の対応といった課題に直面している。これらの問題を解決するために,教師と学生の個別の特徴模倣ネットワーク構造と,画像と特徴ピラミッドを組み合わせたマルチスケール処理戦略を提案する。ネットワーク構造を単純化するために,勾配勾配勾配最適化に基づくネットワークモジュール重要探索手法を提案する。実験結果から,提案アルゴリズムは実工業製品検出データセット上の特徴モデリング異常な局所化手法よりも,同期間に優れた性能を示した。マルチスケール戦略は、ベンチマーク手法と比較して効果的に効果を改善できる。

The unsupervised anomaly localization task faces the challenges of having no anomalous samples for training, detecting multiple types of anomalies, and dealing with the varying area proportions of multiple anomalies. A separate teacher-student feature imitation network structure and a multi-scale processing strategy combining an image and feature pyramid are proposed to solve these problems. A network module importance search method based on gradient descent optimization is proposed to simplify the network structure. The experimental results show that the proposed algorithm performs better than contemporaneous feature-modeling anomaly localization methods on a real industrial product detection dataset. The multi-scale strategy effectively improves the results compared with the benchmark method.

翻訳日:2022-12-13 15:17:32 公開日:2022-12-12

# マルチビューカメラを用いた3次元キャラクタアニメーションのためのマーカーレスボディモーションキャプチャ

Markerless Body Motion Capturing for 3D Character Animation based on Multi-view Cameras ( http://arxiv.org/abs/2212.05788v1 )

ライセンス: Link先を確認

Jinbao Wang, Ke Lu, Jian Xue

(参考訳) 本稿では,マーカーレス人体モーションキャプチャによる3次元3次元キャラクタアニメーション生成のための新しいアプリケーションシステムを提案する。システム全体のパイプラインは以下の5段階からなる。 1) 複数のカメラを用いたモーションデータのキャプチャ 2) 2次元(2次元)人体関節の検出 3)3次元関節の推定 4)骨変換行列の計算、及び 5)キャラクターアニメーションの生成。本研究の目的は,通常のカメラで撮影した多視点画像を用いて,3次元のスケルトンとアニメーションを生成することである。 3次元視覚に基づく3次元骨格再構築の計算複雑性は、フレーム単位のモーションキャプチャを実現するために必要なように低減されている。実験の結果,本システムは人間の行動を効果的かつ効率的に捉え,リアルタイムに3Dアニメキャラクターをアニメーション化することができることがわかった。

This paper proposes a novel application system for the generation of three-dimensional (3D) character animation driven by markerless human body motion capturing. The entire pipeline of the system consists of five stages: 1) the capturing of motion data using multiple cameras, 2) detection of the two-dimensional (2D) human body joints, 3) estimation of the 3D joints, 4) calculation of bone transformation matrices, and 5) generation of character animation. The main objective of this study is to generate a 3D skeleton and animation for 3D characters using multi-view images captured by ordinary cameras. The computational complexity of the 3D skeleton reconstruction based on 3D vision has been reduced as needed to achieve frame-by-frame motion capturing. The experimental results reveal that our system can effectively and efficiently capture human actions and use them to animate 3D cartoon characters in real-time.

翻訳日:2022-12-13 15:17:24 公開日:2022-12-12

# 異なるタイプの知識グラフに対する推論:静的、時間的、マルチモーダル

Reasoning over Different Types of Knowledge Graphs: Static, Temporal and Multi-Modal ( http://arxiv.org/abs/2212.05767v1 )

ライセンス: Link先を確認

Ke Liang, Lingyuan Meng, Meng Liu, Yue Liu, Wenxuan Tu, Siwei Wang, Sihang Zhou, Xinwang Liu, Fuchun Sun

(参考訳) 知識グラフ推論(KGR)は,知識グラフに基づくマイニング論理則に基づいて,既存の事実から新たな事実を推論することを目的として,急速に発展する研究方向となっている。質問応答やレコメンデーションシステムなど、多くのAIアプリケーションでKGを使うことに大きなメリットがあることが証明されている。グラフ型によると、既存のkgrモデルは、静的モデル、時間モデル、マルチモーダルモデルという3つのカテゴリに大まかに分類できる。この領域の初期の研究は主に静的KGRに焦点を当てており、推論タスクに直接一般知識グラフ埋め込みモデルを適用する傾向がある。しかし、これらのモデルは、帰納的静的KGR、時間的KGR、マルチモーダルKGRのようなより複雑で実用的なタスクには適していない。この目的のために、最近複数の研究が開発されているが、調査論文やオープンソースリポジトリは、この重要な方向へのモデルを包括的に要約し、議論している。このギャップを埋めるために、静的から時間的、そしてマルチモーダルなKGをトレースする知識グラフの調査を行う。具体的には、KGRモデルの予備項、要約、典型的なデータセットを導入し、議論する。さらに,課題と可能性についても論じる。対応するオープンソースリポジトリはGitHubで共有されている。

Knowledge graph reasoning (KGR), aiming to deduce new facts from existing facts based on mined logic rules underlying knowledge graphs (KGs), has become a fast-growing research direction. It has been proven to significantly benefit the usage of KGs in many AI applications, such as question answering and recommendation systems. According to the graph types, the existing KGR models can be roughly divided into three categories, i.e., static models, temporal models, and multi-modal models. The early works in this domain mainly focus on static KGR and tend to directly apply general knowledge graph embedding models to the reasoning task. However, these models are not suitable for more complex but practical tasks, such as inductive static KGR, temporal KGR, and multi-modal KGR. To this end, multiple works have been developed recently, but no survey paper or open-source repository comprehensively summarizes and discusses models in this important direction. To fill the gap, we conduct a survey for knowledge graph reasoning tracing from static to temporal and then to multi-modal KGs. Concretely, the preliminaries, summaries of KGR models, and typical datasets are introduced and discussed in turn. Moreover, we discuss the challenges and potential opportunities. The corresponding open-source repository is shared on GitHub: https://github.com/LIANGKE23/Awesome-Knowledge-Graph-Reasoning.

翻訳日:2022-12-13 15:17:11 公開日:2022-12-12

# BigText-QA: 大規模ハイブリッド知識グラフに関する質問応答

BigText-QA: Question Answering over a Large-Scale Hybrid Knowledge Graph ( http://arxiv.org/abs/2212.05798v1 )

ライセンス: Link先を確認

Jingjing Xu, Maria Biryukov, Martin Theobald, Vinu Ellampallil Venugopal

(参考訳) 特に自然言語的質問や手がかりの中で発生する複数のエンティティ間のきめ細かい関係を解釈する場合、テキスト的リソースに関する複雑な質問に答えることは難しい問題である。 YAGO、DBpedia、Freebase、Wikidataなどの知識ベース(KB)は、この文脈で広く使われており、質問応答(QA)アプリケーションでは過去10年間に広く受け入れられてきた。現在のKBは構造化知識の簡潔な表現を提供するが、自然言語ソースが提供する情報だけでなく、定式化や意味的なニュアンスも欠如している。我々は,BigText-QAを用いて,構造化知識と非構造化知識の両方を統一的なグラフィカル表現で整理した,より冗長な知識グラフ(KG)に基づいて,質問に答えられる統合QAシステムを開発することを目的とする。これにより、BigText-QAは、構造化された背景KB(YAGOやWikidataなど)にマッピングされた名前付きエンティティの標準セットと、高度に多様化したリレーショナルパラフレーズとリッチなコンテキスト情報を提供するオープンな文節のセットを組み合わせることができる。

Answering complex questions over textual resources remains a challenging problem, especially when interpreting the fine-grained relationships among multiple entities that occur within a natural-language question or clue. Curated knowledge bases (KBs), such as YAGO, DBpedia, Freebase and Wikidata, have been widely used in this context and gained great acceptance for question-answering (QA) applications in the past decade. While current KBs offer a concise representation of structured knowledge, they lack the variety of formulations and semantic nuances as well as the context of information provided by the natural-language sources. With BigText-QA, we aim to develop an integrated QA system which is able to answer questions based on a more redundant form of a knowledge graph (KG) that organizes both structured and unstructured (i.e., "hybrid") knowledge in a unified graphical representation. BigText-QA thereby is able to combine the best of both worlds: a canonical set of named entities, mapped to a structured background KB (such as YAGO or Wikidata), as well as an open set of textual clauses providing highly diversified relational paraphrases with rich context information.

翻訳日:2022-12-13 15:10:45 公開日:2022-12-12

# rpn: 言語理解のためのディープラーニングにおける単語ベクトルレベルデータ拡張アルゴリズム

RPN: A Word Vector Level Data Augmentation Algorithm in Deep Learning for Language Understanding ( http://arxiv.org/abs/2212.05961v1 )

ライセンス: Link先を確認

Zhengqing Yuan, Zhuanzhe Zhao, Yongming Liu, Xiaolong Zhang, Xuecong Hou and Yue Wang

(参考訳) 本稿では, RPN:Random Position Noise Algorithmと呼ばれる自然理解タスクのための新しいデータ拡張アルゴリズムを提案する。全ての文レベルのタスクに対して自然言語理解タスクに適用できる手法はほとんどなく、RPNは原文の従来の拡張を単語ベクトルレベルに適用する。 RPNアルゴリズムは、あるワードベクトルの1つまたは複数の次元に置換する。その結果、RPNはサンプルにある程度の摂動を導入することができ、異なるタスクに対する摂動の範囲を調整することができる。拡張されたサンプルはモデルのトレーニングに使用され、モデルがより堅牢になる。その後の実験で、トレーニングや微調整モデルにrpnを加えると、tweeteval、cola、sst-2を含む8つの自然言語処理タスクが安定的に向上し、他のデータ拡張アルゴリズムよりも大幅に改善されたことが分かり、rpnアルゴリズムは言語理解のための全ての文レベルのタスクに適用され、単語埋め込み層を持つあらゆるディープラーニングモデルで使用される。

This paper presents a new data augmentation algorithm for natural language understanding tasks, called RPN (Random Position Noise). Because current text augmentation methods are relatively scarce, few of the extant methods apply to all sentence-level natural language understanding tasks. RPN moves the traditional augmentation of the original text to the word vector level. The RPN algorithm makes a substitution in one or several dimensions of some word vectors. As a result, RPN can introduce a certain degree of perturbation to the sample and can adjust the range of perturbation on different tasks. The augmented samples are then used to train the model, which makes the model more robust. In subsequent experiments, we found that adding RPN to the training or fine-tuning of a model resulted in a stable boost on all 8 natural language processing tasks, including the TweetEval, CoLA, and SST-2 datasets, and in more significant improvements than other data augmentation algorithms. The RPN algorithm applies to all sentence-level tasks for language understanding and can be used in any deep learning model with a word embedding layer.
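The word-vector-level substitution described above could look roughly like the sketch below. The abstract does not say which values are substituted or how words and dimensions are chosen, so the Gaussian replacement value, the per-word selection probability `word_prob`, and the function name `random_position_noise` are all assumptions made for illustration, not the authors' implementation.

```python
import random

def random_position_noise(vectors, word_prob=0.3, n_dims=1, scale=0.1, seed=None):
    """Return a perturbed copy of `vectors` (a list of equal-length lists).

    Each word vector is selected with probability `word_prob`; in a selected
    vector, `n_dims` randomly chosen dimensions are overwritten with noise.
    """
    rng = random.Random(seed)
    augmented = [list(v) for v in vectors]  # leave the input untouched
    for vec in augmented:
        if rng.random() < word_prob:
            for d in rng.sample(range(len(vec)), n_dims):
                vec[d] = rng.gauss(0.0, scale)  # assumed replacement value
    return augmented
```

In this sketch, `word_prob`, `n_dims`, and `scale` play the role of the adjustable perturbation range that the abstract mentions. In a real model the function would sit between the embedding layer and the encoder during training only.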

翻訳日:2022-12-13 15:10:22 公開日:2022-12-12

# アンタングル型シーケンス対シーケンス学習を用いた実世界の合成一般化

Real-World Compositional Generalization with Disentangled Sequence-to-Sequence Learning ( http://arxiv.org/abs/2212.05982v1 )

ライセンス: Link先を確認

Hao Zheng and Mirella Lapata

(参考訳) 合成一般化は、現在のニューラルネットワークが苦戦している人間の言語学習の基本的なメカニズムである。最近提案されたDunangled sequence-to-sequence model (Dangle) は、復号ステップごとに特別な符号化を学習することで、有望な一般化能力を示す。このモデルに2つの重要な変更を加え、より不整合表現を奨励し、計算とメモリ効率を改善し、より現実的な構成一般化に取り組みます。具体的には、各時間ステップでソースキーと値を適応的に再エンコードするのではなく、それらの表現を分離し、一定間隔で定期的にキーを再エンコードする。我々の新しいアーキテクチャは、既存のタスクやデータセット間でのより優れた一般化性能と、トレーニングセットに関連して自然に発生する構成パターンを検出して作成する新しい機械翻訳ベンチマークをもたらす。この手法は人工的な課題よりも現実の要求をうまくエミュレートする。

Compositional generalization is a basic mechanism in human language learning, which current neural networks struggle with. A recently proposed Disentangled sequence-to-sequence model (Dangle) shows promising generalization capability by learning specialized encodings for each decoding step. We introduce two key modifications to this model which encourage more disentangled representations and improve its compute and memory efficiency, allowing us to tackle compositional generalization in a more realistic setting. Specifically, instead of adaptively re-encoding source keys and values at each time step, we disentangle their representations and only re-encode keys periodically, at some interval. Our new architecture leads to better generalization performance across existing tasks and datasets, and a new machine translation benchmark which we create by detecting naturally occurring compositional patterns in relation to a training set. We show this methodology better emulates real-world requirements than artificial challenges.

翻訳日:2022-12-13 15:10:04 公開日:2022-12-12

# Promptingはプログラミング - 大規模言語モデルのためのクエリ言語

Prompting Is Programming: A Query Language For Large Language Models ( http://arxiv.org/abs/2212.06094v1 )

ライセンス: Link先を確認

Luca Beurer-Kellner, Marc Fischer, Martin Vechev

(参考訳) 大規模言語モデルは、質問応答やコード生成など、幅広いタスクにおいて優れたパフォーマンスを示している。高いレベルでは、入力が与えられると、言語モデルを使用して、統計的に類似した方法でシーケンスを自動補完することができる。これに基づいて、ユーザはこれらのモデルを言語命令や例で促し、さまざまな下流タスクを実装する。高度なプロンプト手法は、言語モデル、ユーザ、計算機などの外部ツール間のインタラクションを暗示することができる。しかし、特定のタスクに対する最新のパフォーマンスや適応言語モデルを得るためには、複雑なタスクとモデル固有のプログラムを実装する必要がある。そこで我々は,LMP(Language Model Programming)という新しいアイデアを提案する。 LMPは、純粋テキストプロンプトから直感的にテキストプロンプトとスクリプティングを組み合わせた言語モデルを一般化する。加えて、LMPは言語モデルの出力に対して制約を指定できる。これにより、言語モデルの内部を抽象化し、ハイレベルなセマンティクスを提供しながら、多くのタスクに簡単に適応できる。 lmpを有効にするために、lmql(language model query languageの略)を実装し、lmpプロンプトからの制約と制御フローを活用して、基礎となる言語モデルへの高価な呼び出し数を最小限に抑える効率的な推論手順を生成する。 LMQLは、直感的に幅広い最先端のプロンプトメソッドをキャプチャすることができ、特に既存のハイレベルAPIで実装するのが困難なインタラクティブなフローを容易にします。評価の結果,複数のダウンストリームタスクの精度を維持したり向上させたりしながら,従量課金API(13～85%のコスト削減)の場合に必要となる計算量やコストを大幅に削減した。

Large language models have demonstrated outstanding performance on a wide range of tasks such as question answering and code generation. On a high level, given an input, a language model can be used to automatically complete the sequence in a statistically-likely way. Based on this, users prompt these models with language instructions or examples, to implement a variety of downstream tasks. Advanced prompting methods can even imply interaction between the language model, a user, and external tools such as calculators. However, to obtain state-of-the-art performance or adapt language models for specific tasks, complex task- and model-specific programs have to be implemented, which may still require ad-hoc interaction. Based on this, we present the novel idea of Language Model Programming (LMP). LMP generalizes language model prompting from pure text prompts to an intuitive combination of text prompting and scripting. Additionally, LMP allows constraints to be specified over the language model output. This enables easy adaption to many tasks, while abstracting language model internals and providing high-level semantics. To enable LMP, we implement LMQL (short for Language Model Query Language), which leverages the constraints and control flow from an LMP prompt to generate an efficient inference procedure that minimizes the number of expensive calls to the underlying language model. We show that LMQL can capture a wide range of state-of-the-art prompting methods in an intuitive way, especially facilitating interactive flows that are challenging to implement with existing high-level APIs. Our evaluation shows that we retain or increase the accuracy on several downstream tasks, while also significantly reducing the required amount of computation or cost in the case of pay-to-use APIs (13-85% cost savings).

翻訳日:2022-12-13 15:09:49 公開日:2022-12-12

# ビデオグラウンデッド対話における情報理論的テキスト幻覚低減

Information-Theoretic Text Hallucination Reduction for Video-grounded Dialogue ( http://arxiv.org/abs/2212.05765v1 )

ライセンス: Link先を確認

Sunjae Yoon, Eunseop Yoon, Hee Suk Yoon, Junyeong Kim, Chang D. Yoo

(参考訳) ビデオグラウンドド・ダイアログ(VGD)は、与えられたビデオと対話コンテキストに関する質問に対して、回答文をデコードすることを目的としている。最近のマルチモーダル推論による回答文生成の成功にもかかわらず、既存の対話システムは依然として、質問を理解せずに入力テキストからのテキストコピーを区別しないテキスト幻覚問題に苦しんでいる。これは、データセット内の回答文が通常入力テキストの単語を含むという事実から、スプリアスな相関を学習するためであり、vgdシステムは入力テキストから単語を過度にコピーし、それらの単語が接頭辞のテキストと重なり合うことを期待している。そこで我々は,提案した情報理論テキスト幻覚測定手法から得られたテキスト幻覚正規化(THR)損失を組み込んだTHAM(Text Hallucination Mitigating)フレームワークを設計する。 THAMを現在の対話システムに適用すると、VGDベンチマーク(AVSD@DSTC7とAVSD@DSTC8)の有効性が検証され、高い解釈可能性を示す。

Video-grounded Dialogue (VGD) aims to decode an answer sentence to a question regarding a given video and dialogue context. Despite the recent success of multi-modal reasoning to generate answer sentences, existing dialogue systems still suffer from a text hallucination problem, which denotes indiscriminate text-copying from input texts without an understanding of the question. This is due to learning spurious correlations from the fact that answer sentences in the dataset usually include the words of input texts; thus the VGD system relies excessively on copying words from input texts in the hope that those words overlap with ground-truth texts. Hence, we design the Text Hallucination Mitigating (THAM) framework, which incorporates a Text Hallucination Regularization (THR) loss derived from the proposed information-theoretic text hallucination measurement approach. Applying THAM to current dialogue systems validates its effectiveness on VGD benchmarks (i.e., AVSD@DSTC7 and AVSD@DSTC8) and shows enhanced interpretability.

翻訳日:2022-12-13 15:08:47 公開日:2022-12-12

# インド語における記事要約のためのディープラーニング手法の実装

Implementing Deep Learning-Based Approaches for Article Summarization in Indian Languages ( http://arxiv.org/abs/2212.05702v1 )

ライセンス: Link先を確認

Rahul Tangsali, Aabha Pingle, Aditya Vyawahare, Isha Joshi, Raviraj Joshi

(参考訳) 低リソースのインドの言語に対するテキスト要約の研究は、関連するデータセットが利用可能であることから制限されている。本稿では、ilsum 2022のindic language summarizationデータセットで使用されるさまざまなディープラーニングアプローチの概要を示す。 ISUM 2022データセットは、それぞれインド英語、ヒンディー語、グジャラティ語で書かれたニュース記事と、それらの基礎的な要約で構成されている。我々の研究では、様々な事前訓練されたSeq2seqモデルを探索し、ILSUM 2022データセットでそれらを微調整する。我々の場合、細調整された SoTA PEGASUS モデルは英語、細調整された IndicBART モデル、ヒンディー語のための拡張データ、そして再び細調整された PEGASUS モデル、そしてGujarati のための翻訳マッピングに基づくアプローチで機能した。評価指標としてROUGE-1, ROUGE-2, ROUGE-4を用いた。

The research on text summarization for low-resource Indian languages has been limited by the scarce availability of relevant datasets. This paper presents a summary of various deep-learning approaches used for the ILSUM 2022 Indic language summarization datasets. The ILSUM 2022 dataset consists of news articles written in Indian English, Hindi, and Gujarati, together with their ground-truth summarizations. In our work, we explore different pre-trained seq2seq models and fine-tune them on the ILSUM 2022 datasets. In our case, the fine-tuned SoTA PEGASUS model worked best for English, the fine-tuned IndicBART model with augmented data worked best for Hindi, and a fine-tuned PEGASUS model combined with a translation mapping-based approach worked best for Gujarati. Our scores on the obtained inferences were evaluated using ROUGE-1, ROUGE-2, and ROUGE-4 as the evaluation metrics.
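Since the abstract reports ROUGE-N scores, a simplified sketch of how such a score is computed may help. It uses whitespace tokenization and clipped n-gram counts. Official ROUGE tooling typically adds tokenization and stemming options that are omitted here, so scores from this sketch are not comparable with published numbers. The function names are hypothetical.

```python
from collections import Counter

def ngrams(tokens, n):
    # Multiset of n-grams in a token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, reference, n=1):
    """Return (precision, recall, f1) from clipped n-gram overlap."""
    cand, ref = ngrams(candidate.split(), n), ngrams(reference.split(), n)
    overlap = sum((cand & ref).values())  # Counter intersection clips counts
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1
```

For the candidate "the cat sat" against the reference "the cat sat on the mat", every candidate unigram appears in the reference, so precision is 1.0 while recall is 0.5.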

翻訳日:2022-12-13 15:08:28 公開日:2022-12-12

# 未ラベルデータを用いた変圧器モデルによるドイツの顧客フィードバックの関連性・極性分類のドメイン適応

Domain Adaptation of Transformer-Based Models using Unlabeled Data for Relevance and Polarity Classification of German Customer Feedback ( http://arxiv.org/abs/2212.05764v1 )

ライセンス: Link先を確認

Ahmad Idrissi-Yaghir, Henning Schäfer, Nadja Bauer, Christoph M. Friedrich

(参考訳) 顧客からのフィードバックを理解することは、企業が問題を特定し、製品やサービスを改善するために必要なことです。テキスト分類と感情分析は、さまざまな機械学習アプローチとディープラーニングアプローチを用いて、これらのデータを分析する上で大きな役割を果たす。この作業では、ドイツの顧客フィードバックデータセットを扱う際に、さまざまなトランスフォーマーベースのモデルを使用して、これらのモデルがいかに効率的かを調べる。さらに、これらの事前学習モデルは、未ラベルデータを用いて特定の領域に適応させることで、既学習モデルよりも優れた結果が得られるかどうかを更に分析する。モデルを評価するために、GermEval 2017の2つの下流タスクが検討されている。実験の結果,トランスフォーマティブベースモデルは,fasttextベースラインに比べて大幅に改善され,公開スコアや先行モデルよりも優れていた。サブタスク関連分類において、最良モデルは、第1のテストセットで96.1 %、第2テストセットで95.9 %、サブタスク極性分類で85.1 %、85.3 %のマイクロ平均値である。

Understanding customer feedback is becoming a necessity for companies to identify problems and improve their products and services. Text classification and sentiment analysis can play a major role in analyzing this data by using a variety of machine and deep learning approaches. In this work, different transformer-based models are utilized to explore how efficient these models are when working with a German customer feedback dataset. In addition, these pre-trained models are further analyzed to determine if adapting them to a specific domain using unlabeled data can yield better results than off-the-shelf pre-trained models. To evaluate the models, two downstream tasks from the GermEval 2017 are considered. The experimental results show that transformer-based models can reach significant improvements compared to a fastText baseline and outperform the published scores and previous models. For the subtask Relevance Classification, the best models achieve a micro-averaged $F_1$ score of 96.1% on the first test set and 95.9% on the second one, and a score of 85.1% and 85.3% for the subtask Polarity Classification.
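The micro-averaged F1 score quoted above pools true positives, false positives, and false negatives over all classes before computing a single score. The following is a minimal sketch for single-label classification; the function name and the example labels are hypothetical.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP and FN over all labels, then compute F1."""
    labels = set(y_true) | set(y_pred)
    tp = fp = fn = 0
    for label in labels:
        for t, p in zip(y_true, y_pred):
            tp += (p == label and t == label)
            fp += (p == label and t != label)
            fn += (p != label and t == label)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0
```

When every example carries exactly one label, each misclassification counts once as a false positive and once as a false negative, so the micro-averaged F1 equals plain accuracy.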

翻訳日:2022-12-13 15:08:10 公開日:2022-12-12

# 確率重み平均化による事前学習言語モデルの一般化の改善

Improving Generalization of Pre-trained Language Models via Stochastic Weight Averaging ( http://arxiv.org/abs/2212.05956v1 )

ライセンス: Link先を確認

Peng Lu, Ivan Kobyzev, Mehdi Rezagholizadeh, Ahmad Rashid, Ali Ghodsi, Philippe Langlais

(参考訳) 知識蒸留(KD)は、下流タスクにおけるコンパクトな事前学習言語モデル(PLM)の一般化を改善するための一般的な手法である。しかし、このような方法は、新しいデータセットごとに別の教師モデルをトレーニングする追加の負担を課す。あるいは、より優れた一般化に向けて、コンパクトモデルの最適化手順の改善に直接取り組むことができる。近年の研究では、局所的な最小値の平坦性はより良い一般化とよく相関している。本研究では,より平坦な最小値への収束を促す手法であるSWA(Stochastic Weight Averaging)を微調整PLMに適用する。我々は、様々なNLPタスク(テキスト分類、質問応答、生成)と異なるモデルアーキテクチャについて広範な実験を行い、追加の計算コストなしで一般化を改善することを示す。さらに, この単純な最適化手法は, コンパクトモデルに対する最先端KD法よりも優れていることを示す。

Knowledge Distillation (KD) is a commonly used technique for improving the generalization of compact Pre-trained Language Models (PLMs) on downstream tasks. However, such methods impose the additional burden of training a separate teacher model for every new dataset. Alternatively, one may directly work on the improvement of the optimization procedure of the compact model toward better generalization. Recent works observe that the flatness of the local minimum correlates well with better generalization. In this work, we adapt Stochastic Weight Averaging (SWA), a method encouraging convergence to a flatter minimum, to fine-tuning PLMs. We conduct extensive experiments on various NLP tasks (text classification, question answering, and generation) and different model architectures and demonstrate that our adaptation improves the generalization without extra computation cost. Moreover, we observe that this simple optimization technique is able to outperform the state-of-the-art KD methods for compact models.
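The core of plain SWA is a running average over weight snapshots taken along the training trajectory. The sketch below shows only that averaging step on flat lists of floats. It does not reproduce the paper's adaptation to fine-tuning, and the choice of when snapshots are collected (for example, once per epoch after a warm-up) is left to the caller. The function names are hypothetical.

```python
def update_swa(swa_weights, new_weights, n_averaged):
    """Fold one more checkpoint into the running average of the weights.

    `n_averaged` is the number of checkpoints already included in `swa_weights`.
    """
    return [s + (w - s) / (n_averaged + 1) for s, w in zip(swa_weights, new_weights)]

def stochastic_weight_average(checkpoints):
    # Equal-weight average of all checkpoints, computed incrementally.
    swa = list(checkpoints[0])
    for n, weights in enumerate(checkpoints[1:], start=1):
        swa = update_swa(swa, weights, n)
    return swa
```

Because the update is incremental, only one extra copy of the weights needs to be kept in memory, which is consistent with the abstract's point that the method adds no extra computation cost.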

翻訳日:2022-12-13 15:07:47 公開日:2022-12-12

# ファウショットシナリオにおけるフェデレートNLP

Federated NLP in Few-shot Scenarios ( http://arxiv.org/abs/2212.05974v1 )

ライセンス: Link先を確認

Dongqi Cai, Shangguang Wang, Yaozong Wu, Felix Xiaozhu Lin, Mengwei Xu

(参考訳) 自然言語処理(NLP)はリッチなモバイルアプリケーションである。様々な言語理解タスクをサポートするため、基盤となるnlpモデルは、しばしば連合したプライバシー保護設定(fl)で微調整される。このプロセスは現在、モバイルクライアントから少なくとも数十万のラベル付きトレーニングサンプルに依存しているが、モバイルユーザは自分のデータをラベル付けする意思や知識を欠いていることが多い。このようなデータラベルの不十分さは、数ショットのシナリオとして知られており、モバイルNLPアプリケーションのキーブロッカーとなっている。この研究は、数ショットシナリオ(FedFSL)におけるフェデレーションNLPを初めて調査する。擬似ラベリングと即時学習のアルゴリズム的進歩を再現することにより、トレーニングデータの0.05%(100未満)しかラベル付けされず、残りがラベル付けされていない場合に競争精度を提供する訓練パイプラインを最初に構築する。ワークフローをインスタンス化するために,新しい設計で高い実行コストに対応するシステムFFNLPを提案する。 1)疑似ラベルをトレーニングワークフローに、学習の進捗に合致するレートで注入するカリキュラムペーシング、(2)最も学習可能なデータを選択するためのメカニズムである表現多様性、(3)モデルのトレーニング深さと層容量のコプランニング。これらの設計により、トレーニング遅延、クライアントエネルギー、ネットワークトラフィックがそれぞれ46.0$\times$、41.2$\times$、3000.0$\times$となる。 FFNLPはアルゴリズム/システムの共同設計を通じて、ほとんどのトレーニングサンプルがラベル付けされていない困難な設定にFLを適用することができることを示した。

Natural language processing (NLP) sees rich mobile applications. To support various language understanding tasks, a foundation NLP model is often fine-tuned in a federated, privacy-preserving setting (FL). This process currently relies on at least hundreds of thousands of labeled training samples from mobile clients; yet mobile users often lack willingness or knowledge to label their data. Such an inadequacy of data labels is known as a few-shot scenario; it becomes the key blocker for mobile NLP applications. For the first time, this work investigates federated NLP in the few-shot scenario (FedFSL). By retrofitting algorithmic advances of pseudo labeling and prompt learning, we first establish a training pipeline that delivers competitive accuracy when only 0.05% (fewer than 100) of the training data is labeled and the remaining is unlabeled. To instantiate the workflow, we further present a system FFNLP, addressing the high execution cost with novel designs. (1) Curriculum pacing, which injects pseudo labels to the training workflow at a rate commensurate to the learning progress; (2) Representational diversity, a mechanism for selecting the most learnable data, only for which pseudo labels will be generated; (3) Co-planning of a model's training depth and layer capacity. Together, these designs reduce the training delay, client energy, and network traffic by up to 46.0$\times$, 41.2$\times$ and 3000.0$\times$, respectively. Through algorithm/system co-design, FFNLP demonstrates that FL can apply to challenging settings where most training samples are unlabeled.

翻訳日:2022-12-13 15:07:31 公開日:2022-12-12

# 空間的関係を学習する畳み込みニューラルネットワークの能力を明らかにする新しい特徴スクランブルアプローチ

A novel feature-scrambling approach reveals the capacity of convolutional neural networks to learn spatial relations ( http://arxiv.org/abs/2212.06021v1 )

ライセンス: Link先を確認

Amr Farahat, Felix Effenberger, Martin Vinck

(参考訳) 畳み込みニューラルネットワーク(cnns)は、オブジェクト認識を解く最も成功したコンピュータビジョンシステムの一つである。さらに、CNNは人間の脳における視覚的表現の性質を理解するために大きな応用がある。しかし、CNNが実際にどのように決断を下すのか、内部表現の性質や認識戦略が人間とどのように異なるのかは、まだ理解されていない。具体的には、cnnが主に物体の表面の規則性に依存しているのか、それとも人間に似た特徴の空間的配置を活用できるのかという議論がある。本稿では,cnnがオブジェクトの分類に特徴の空間的配置(すなわちオブジェクト部分)を使用するかどうかを明示的に検証する新しい特徴スクランブル手法を開発した。我々は,この手法を,CNNの有効受容フィールドサイズを体系的に操作すると同時に,最小認識可能な構成(MIRC)解析と組み合わせる。従来の文献とは対照的に,CNNが比較的長距離空間関係をオブジェクト分類に利用できることを示す。さらに、cnnが空間的関係を使用する範囲は、テクスチャやスケッチといったデータセットに大きく依存する。実際、CNNは異種データセット(ImageNet)内の異なるクラスに対して異なる戦略を使用しており、CNNは連続的な分類戦略を持っていることを示唆している。最後に,cnnは粒度の中間レベルまでのみ特徴の空間的配置を学習できることを示し,大域的な形状特徴よりも中間的な特徴が物体分類における感度と特異性の最適なトレードオフをもたらすことを示唆する。これらの結果は、cnn表現の性質と、それらがオブジェクト分類の特徴の空間的配置に依存する範囲に関する新しい洞察を与える。

Convolutional neural networks (CNNs) are one of the most successful computer vision systems to solve object recognition. Furthermore, CNNs have major applications in understanding the nature of visual representations in the human brain. Yet it remains poorly understood how CNNs actually make their decisions, what the nature of their internal representations is, and how their recognition strategies differ from humans. Specifically, there is a major debate about the question of whether CNNs primarily rely on surface regularities of objects, or whether they are capable of exploiting the spatial arrangement of features, similar to humans. Here, we develop a novel feature-scrambling approach to explicitly test whether CNNs use the spatial arrangement of features (i.e. object parts) to classify objects. We combine this approach with a systematic manipulation of effective receptive field sizes of CNNs as well as minimal recognizable configurations (MIRCs) analysis. In contrast to much previous literature, we provide evidence that CNNs are in fact capable of using relatively long-range spatial relationships for object classification. Moreover, the extent to which CNNs use spatial relationships depends heavily on the dataset, e.g. texture vs. sketch. In fact, CNNs even use different strategies for different classes within heterogeneous datasets (ImageNet), suggesting CNNs have a continuous spectrum of classification strategies. Finally, we show that CNNs learn the spatial arrangement of features only up to an intermediate level of granularity, which suggests that intermediate rather than global shape features provide the optimal trade-off between sensitivity and specificity in object classification. These results provide novel insights into the nature of CNN representations and the extent to which they rely on the spatial arrangement of features for object classification.
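As a rough illustration of what "scrambling" a spatial arrangement means, the sketch below shuffles square tiles of a 2D grid while leaving the content of each tile intact. This is a toy under stated assumptions, not the authors' procedure: the abstract does not say at which layer, tile size, or shuffling scheme the scrambling is applied, and the function name `scramble_patches` and its parameters are hypothetical.

```python
import random


def scramble_patches(grid, patch, seed=0):
    """Shuffle patch x patch tiles of a 2D grid (hypothetical helper).

    The values inside each tile keep their relative positions; only the
    placement of whole tiles changes, which destroys spatial relations
    at scales larger than `patch`.
    """
    h, w = len(grid), len(grid[0])
    if h % patch != 0 or w % patch != 0:
        raise ValueError("grid must divide evenly into tiles")
    # Top-left corner of every tile, in row-major order.
    corners = [(r, c) for r in range(0, h, patch) for c in range(0, w, patch)]
    shuffled = corners[:]
    random.Random(seed).shuffle(shuffled)
    out = [[None] * w for _ in range(h)]
    # Copy the tile at each shuffled source corner into the destination corner.
    for (dr, dc), (sr, sc) in zip(corners, shuffled):
        for i in range(patch):
            for j in range(patch):
                out[dr + i][dc + j] = grid[sr + i][sc + j]
    return out
```

Increasing `patch` preserves progressively longer-range structure, so comparing a classifier's accuracy across tile sizes gives one way to probe how far its spatial dependencies reach.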

翻訳日:2022-12-13 15:01:02 公開日:2022-12-12

# ステアブルCNNのための暗黙的ニューラル畳み込みカーネル

Implicit Neural Convolutional Kernels for Steerable CNNs ( http://arxiv.org/abs/2212.06096v1 )

ライセンス: Link先を確認

Maksim Zhdanov, Nico Hoffmann and Gabriele Cesa

(参考訳) ステアブル畳み込みニューラルネットワーク(Steerable CNN)は、並進、および鏡映や回転などの原点保存群$G$に属する他の変換に対して同変なニューラルネットワークを構築するための一般的なフレームワークを提供する。これらは、カーネル空間に課される群固有の同変性制約を解析的に解いて得られる$G$-steerableカーネルによる標準的な畳み込みに依存する。解は特定の群$G$に合わせて導かれるため、カーネル基底の実装は他の対称性変換には一般化できず、群同変モデルの開発を複雑にしている。本稿では、多層パーセプトロン(MLP)による暗黙的ニューラル表現を用いて$G$-steerableカーネルをパラメータ化することを提案する。結果として得られるフレームワークは、Steerable CNNをシンプルかつ柔軟に実装する方法を提供し、$G$同変なMLPを構築できる任意の群$G$に一般化できる。本手法を点群データ(ModelNet-40)と分子データ(QM9)に適用し、標準的なSteerable CNNと比較して性能が大きく向上することを示す。

Steerable convolutional neural networks (CNNs) provide a general framework for building neural networks equivariant to translations and other transformations belonging to an origin-preserving group $G$, such as reflections and rotations. They rely on standard convolutions with $G$-steerable kernels obtained by analytically solving the group-specific equivariance constraint imposed onto the kernel space. As the solution is tailored to a particular group $G$, the implementation of a kernel basis does not generalize to other symmetry transformations, which complicates the development of group equivariant models. We propose using implicit neural representation via multi-layer perceptrons (MLPs) to parameterize $G$-steerable kernels. The resulting framework offers a simple and flexible way to implement Steerable CNNs and generalizes to any group $G$ for which a $G$-equivariant MLP can be built. We apply our method to point cloud (ModelNet-40) and molecular data (QM9) and demonstrate a significant improvement in performance compared to standard Steerable CNNs.

翻訳日:2022-12-13 15:00:34 公開日:2022-12-12

# 質問応答のためのモーメントコントラスト事前学習

Momentum Contrastive Pre-training for Question Answering ( http://arxiv.org/abs/2212.05762v1 )

ライセンス: Link先を確認

Minda Hu, Muzhi Li, Yasheng Wang and Irwin King

(参考訳) 抽出型質問応答(QA)の既存の事前学習手法は、構文構造が自然な質問とは異なるクローズ形式のクエリを生成するため、事前学習モデルが単純なキーワードマッチングに過適合するおそれがある。この問題に対処するため、抽出型QAのための新しい手法Momentum Contrastive pRe-training fOr queStion anSwering(MCROSS)を提案する。具体的には、MCROSSはモーメンタム対照学習フレームワークを導入し、クローズ形式のクエリとパッセージのペアと、自然なクエリとパッセージのペアとの間で解答確率を整合させる。これにより、事前学習モデルは、クローズ形式のサンプルで学んだ知識を自然な質問への回答によりよく転移できる。 3つのベンチマークQAデータセットでの実験結果から、本手法は教師ありシナリオとゼロショットシナリオの両方で、すべてのベースラインと比較して顕著な改善を達成することが示された。

Existing pre-training methods for extractive Question Answering (QA) generate cloze-like queries different from natural questions in syntax structure, which could overfit pre-trained models to simple keyword matching. In order to address this problem, we propose a novel Momentum Contrastive pRe-training fOr queStion anSwering (MCROSS) method for extractive QA. Specifically, MCROSS introduces a momentum contrastive learning framework to align the answer probability between cloze-like and natural query-passage sample pairs. Hence, the pre-trained models can better transfer the knowledge learned in cloze-like samples to answering natural questions. Experimental results on three benchmarking QA datasets show that our method achieves noticeable improvement compared with all baselines in both supervised and zero-shot scenarios.
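The abstract does not spell out how the momentum component works. In the commonly used momentum-contrast setup, a slowly moving "key" encoder tracks the trained "query" encoder through an exponential moving average, and the sketch below assumes MCROSS follows that general pattern. The function name, the flat parameter lists, and the default `m=0.999` are illustrative assumptions, not details taken from the paper.

```python
def momentum_update(key_params, query_params, m=0.999):
    """Exponential-moving-average update of key-encoder parameters.

    Assumed form: key <- m * key + (1 - m) * query, applied elementwise.
    Parameters are plain lists of floats here to keep the sketch
    dependency-free; a real model would apply this per tensor.
    """
    if len(key_params) != len(query_params):
        raise ValueError("parameter lists must have the same length")
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]
```

With `m` close to 1 the key encoder changes slowly, which keeps the targets it produces stable while the query encoder is being optimized.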

翻訳日:2022-12-13 14:58:25 公開日:2022-12-12

# ResNetのソリューション構築による解釈について

On an Interpretation of ResNets via Solution Constructions ( http://arxiv.org/abs/2212.05663v1 )

ライセンス: Link先を確認

Changcun Huang

(参考訳) 本稿ではまず、ゲートネットワーク制御と深層分類の原理により、多カテゴリ分類のためのResNetの典型的な解を構成し、そこからResNetアーキテクチャの一般的な解釈を与え、その性能メカニズムを説明する。次に、さらに多くの解を用いて、その解釈の一般性を示す。 ResNetの普遍近似能力が証明される。

This paper first constructs a typical solution of ResNets for multi-category classifications by the principle of gate-network controls and deep-layer classifications, from which a general interpretation of the ResNet architecture is given and the performance mechanism is explained. We then use more solutions to further demonstrate the generality of that interpretation. The universal-approximation capability of ResNets is proved.

翻訳日:2022-12-13 14:51:00 公開日:2022-12-12

# 次項目推薦のためのハンケル行列表現によるテンソル型逐次学習

Tensor-based Sequential Learning via Hankel Matrix Representation for Next Item Recommendations ( http://arxiv.org/abs/2212.05720v1 )

ライセンス: Link先を確認

Evgeny Frolov and Ivan Oseledets

(参考訳) 自己注意型トランスフォーマーモデルは、次の項目推薦タスクを非常に効率的に解けることが最近示されている。学習された注意重みは、ユーザ行動のシーケンシャルなダイナミクスを捉え、うまく一般化する。学習されたパラメータ空間の特殊な構造に着想を得て、より軽量な代替アプローチでそれを模倣できるかどうかを問う。シーケンシャルデータに関する構造的知識を学習プロセスに組み込む、テンソル分解に基づく新しいモデルを開発する。特殊なハンケル行列表現に基づく本アプローチによって、自己注意ネットワークのいくつかの性質を再現できることを示す。結果として得られるモデルは浅い線形アーキテクチャを持ち、ニューラルネットワークによる対応モデルに匹敵する性能を示す。

Self-attentive transformer models have recently been shown to solve the next item recommendation task very efficiently. The learned attention weights capture sequential dynamics in user behavior and generalize well. Motivated by the special structure of learned parameter space, we question if it is possible to mimic it with an alternative and more lightweight approach. We develop a new tensor factorization-based model that ingrains the structural knowledge about sequential data within the learning process. We demonstrate how certain properties of a self-attention network can be reproduced with our approach based on special Hankel matrix representation. The resulting model has a shallow linear architecture and compares competitively to its neural counterpart.
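For readers unfamiliar with the term, a Hankel matrix built from a sequence has constant anti-diagonals: entry (i, j) is the sequence element at position i + j, so each row is the sequence shifted by one step. The sketch below only illustrates that definition. How the paper embeds such matrices into its tensor factorization is not described in the abstract, and the helper name `hankel` and its `rows` parameter are assumptions made for the example.

```python
def hankel(seq, rows):
    """Build a Hankel matrix from a sequence (illustrative helper).

    H[i][j] = seq[i + j], so every anti-diagonal holds a single value
    and consecutive rows are sliding windows over the sequence.
    """
    cols = len(seq) - rows + 1
    if rows < 1 or cols < 1:
        raise ValueError("rows must be between 1 and len(seq)")
    return [[seq[i + j] for j in range(cols)] for i in range(rows)]


# A user's interaction history of five item ids, windowed into 3 rows.
print(hankel([1, 2, 3, 4, 5], 3))  # [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
```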

翻訳日:2022-12-13 14:50:53 公開日:2022-12-12

# REAP: 大規模で現実的な敵対的パッチベンチマーク

REAP: A Large-Scale Realistic Adversarial Patch Benchmark ( http://arxiv.org/abs/2212.05680v1 )

ライセンス: Link先を確認

Nabeel Hingun, Chawin Sitawarin, Jerry Li, David Wagner

(参考訳) 機械学習モデルは敵の摂動に影響を受けやすいことが知られている。有名な攻撃のひとつがadversarial patchで、特にデザインされたパターンを持つステッカーで、モデルがオブジェクトを誤って予測します。この攻撃は、自動運転車のようなカメラに依存するサイバー物理システムに重大な脅威をもたらす。現実の世界における攻撃や防御の評価は、合成データが非現実的であるのに対して、非常にコストがかかる。本研究では,実際の画像に対するパッチ攻撃や実環境下でのパッチ攻撃を評価するデジタルベンチマークであるREAP(Realistic Adversarial Patch)ベンチマークを提案する。 mapillary vistasデータセット上に構築されたベンチマークには、14,000以上のトラフィックサインが含まれています。それぞれのサインは幾何変換と照明変換で拡張され、デジタル的に生成されたパッチをリアルにサインに応用することができる。本ベンチマークを用いて,現実的な条件下での敵パッチ攻撃の大規模評価を行った。実験の結果, 敵のパッチ攻撃は従来考えられていたよりも脅威が少なく, 単純なデジタルシミュレーションに対する攻撃の成功率は実際の効果を予測できないことが示唆された。私たちはベンチマークをhttps://github.com/wagner-group/reap-benchmarkで公開しています。

Machine learning models are known to be susceptible to adversarial perturbation. One famous attack is the adversarial patch, a sticker with a particularly crafted pattern that makes the model incorrectly predict the object it is placed on. This attack presents a critical threat to cyber-physical systems that rely on cameras such as autonomous cars. Despite the significance of the problem, conducting research in this setting has been difficult; evaluating attacks and defenses in the real world is exceptionally costly while synthetic data are unrealistic. In this work, we propose the REAP (REalistic Adversarial Patch) benchmark, a digital benchmark that allows the user to evaluate patch attacks on real images, and under real-world conditions. Built on top of the Mapillary Vistas dataset, our benchmark contains over 14,000 traffic signs. Each sign is augmented with a pair of geometric and lighting transformations, which can be used to apply a digitally generated patch realistically onto the sign. Using our benchmark, we perform the first large-scale assessments of adversarial patch attacks under realistic conditions. Our experiments suggest that adversarial patch attacks may present a smaller threat than previously believed and that the success rate of an attack on simpler digital simulations is not predictive of its actual effectiveness in practice. We release our benchmark publicly at https://github.com/wagner-group/reap-benchmark.

翻訳日:2022-12-13 14:49:58 公開日:2022-12-12

# DeepCut: グラフニューラルネットワーククラスタリングによる教師なしセグメンテーション

DeepCut: Unsupervised Segmentation using Graph Neural Networks Clustering ( http://arxiv.org/abs/2212.05853v1 )

ライセンス: Link先を確認

Amit Aflalo, Shai Bagon, Tamar Kashti, Yonina Eldar

(参考訳) 画像分割はコンピュータビジョンの基本課題である。教師あり手法をトレーニングするためのデータアノテーションは労働集約的であり、これが教師なし手法の動機となっている。既存のアプローチの一部では、事前訓練されたネットワークから深い特徴を抽出し、グラフを構築して古典的なクラスタリング手法(例えば、$k$-meansや正規化カット)を後処理の段階として適用する。これらの手法は特徴量に符号化された高次元情報をペアワイズのスカラー親和性に還元する。本研究では、従来のクラスタリングアルゴリズムを、同じクラスタリング目的関数を達成するように訓練された軽量グラフニューラルネットワーク(GNN)に置き換える。ただし、既存のアプローチとは対照的に、GNNにはローカルな画像特徴間のペアワイズ親和性だけでなく、生の特徴自体も入力する。生の特徴とクラスタリング目標の間のこの接続を維持することで、追加の後処理ステップを必要とせずに、セマンティックパートセグメンテーションを暗黙的に実行できる。画像セグメンテーションGNNを学習するための自己教師付き損失関数として、古典的クラスタリングの目的をどのように定式化できるかを示す。さらに、相関クラスタリング(CC)の目的を使って、クラスタ数を定義せずにクラスタリングを行う($k$-lessクラスタリング)。提案手法をオブジェクトのローカライゼーション、セグメンテーション、セマンティックパートセグメンテーションの各タスクに適用し、複数のベンチマークで最先端の性能を上回った。

Image segmentation is a fundamental task in computer vision. Data annotation for training supervised methods can be labor-intensive, motivating unsupervised methods. Some existing approaches extract deep features from pre-trained networks and build a graph to apply classical clustering methods (e.g., $k$-means and normalized-cuts) as a post-processing stage. These techniques reduce the high-dimensional information encoded in the features to pair-wise scalar affinities. In this work, we replace classical clustering algorithms with a lightweight Graph Neural Network (GNN) trained to achieve the same clustering objective function. However, in contrast to existing approaches, we feed the GNN not only the pair-wise affinities between local image features but also the raw features themselves. Maintaining this connection between the raw feature and the clustering goal allows to perform part semantic segmentation implicitly, without requiring additional post-processing steps. We demonstrate how classical clustering objectives can be formulated as self-supervised loss functions for training our image segmentation GNN. Additionally, we use the Correlation-Clustering (CC) objective to perform clustering without defining the number of clusters ($k$-less clustering). We apply the proposed method for object localization, segmentation, and semantic part segmentation tasks, surpassing state-of-the-art performance on multiple benchmarks.
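To make the "classical clustering objective" concrete, the sketch below evaluates the standard two-way normalized-cut value, cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V), for a hard partition of a small affinity graph. It is a plain scoring function under the textbook definition, not the differentiable loss the paper trains its GNN with, and the name `normalized_cut` and the 0/1 label encoding are assumptions for illustration.

```python
def normalized_cut(weights, labels):
    """Two-way normalized-cut value of a hard partition.

    weights: symmetric affinity matrix as nested lists.
    labels:  list of 0/1 cluster assignments, one per node.
    Lower values mean weak links across the cut relative to each
    cluster's total connectivity.
    """
    n = len(weights)
    cut = 0.0
    assoc = [0.0, 0.0]  # total edge weight leaving each cluster's nodes
    for i in range(n):
        for j in range(n):
            assoc[labels[i]] += weights[i][j]
            if labels[i] == 0 and labels[j] == 1:
                cut += weights[i][j]
    if assoc[0] == 0.0 or assoc[1] == 0.0:
        raise ValueError("each cluster needs nonzero total affinity")
    return cut / assoc[0] + cut / assoc[1]


# Two tightly linked pairs (0-1 and 2-3) joined by weak edges.
W = [
    [0.0, 1.0, 0.1, 0.0],
    [1.0, 0.0, 0.0, 0.1],
    [0.1, 0.0, 0.0, 1.0],
    [0.0, 0.1, 1.0, 0.0],
]
good = normalized_cut(W, [0, 0, 1, 1])  # cuts only the weak edges
bad = normalized_cut(W, [0, 1, 0, 1])   # cuts the strong edges
print(good < bad)  # True
```

Relaxing the hard 0/1 labels to soft assignment probabilities is what typically turns a score like this into a trainable loss.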

翻訳日:2022-12-13 14:49:37 公開日:2022-12-12

# Stable Artist: 拡散潜在空間におけるセマンティクスの操作

The Stable Artist: Steering Semantics in Diffusion Latent Space ( http://arxiv.org/abs/2212.06013v1 )

ライセンス: Link先を確認

Manuel Brack, Patrick Schramowski, Felix Friedrich, Dominik Hintersdorf, Kristian Kersting

(参考訳) テキスト条件付きの大規模な生成拡散モデルは、テキストのみから高忠実度の画像を生成する優れた性能により、最近大きな注目を集めている。しかし、一度の生成で高品質な結果を得ることはほとんど不可能である。むしろ、テキスト誘導画像生成では、ユーザが入力にわずかな変更を何度も加えながら、思い描いた画像を反復的に作り上げていく。ところが、入力プロンプトのわずかな変更によって全く異なる画像が生成されることが多く、アーティストが制御できる粒度は限られる。柔軟性を提供するため、画像生成プロセスのきめ細かい制御を可能にする画像編集手法であるStable Artistを提案する。主要なコンポーネントはセマンティックガイダンス(SEGA)であり、可変個のセマンティックな方向に沿って拡散過程を誘導する。これにより、画像の微妙な編集、構成やスタイルの変更、芸術的構想全体の最適化が可能になる。さらに、SEGAは潜在空間の探索を可能にし、モデルが学習した概念の表現、さらには「炭素排出」のような複雑な概念の表現についても洞察を得ることができる。いくつかのタスクでStable Artistを実演し、高品質な画像編集と構成を示す。

Large, text-conditioned generative diffusion models have recently gained a lot of attention for their impressive performance in generating high-fidelity images from text alone. However, achieving high-quality results is almost unfeasible in a one-shot fashion. On the contrary, text-guided image generation involves the user making many slight changes to inputs in order to iteratively carve out the envisioned image. However, slight changes to the input prompt often lead to entirely different images being generated, and thus the control of the artist is limited in its granularity. To provide flexibility, we present the Stable Artist, an image editing approach enabling fine-grained control of the image generation process. The main component is semantic guidance (SEGA) which steers the diffusion process along variable numbers of semantic directions. This allows for subtle edits to images, changes in composition and style, as well as optimization of the overall artistic conception. Furthermore, SEGA enables probing of latent spaces to gain insights into the representation of concepts learned by the model, even complex ones such as 'carbon emission'. We demonstrate the Stable Artist on several tasks, showcasing high-quality image editing and composition.

翻訳日:2022-12-13 14:49:13 公開日:2022-12-12

# ORCa:放射界カメラとしての光沢のある物体

ORCa: Glossy Objects as Radiance Field Cameras ( http://arxiv.org/abs/2212.04531v2 )

ライセンス: Link先を確認

Kushagra Tiwary, Akshat Dave, Nikhil Behari, Tzofi Klinghoffer, Ashok Veeraraghavan, Ramesh Raskar

(参考訳) 光沢のある物体の反射は、周囲の環境に関する貴重な情報と隠れた情報を含んでいる。これらの物体をカメラに変換することで、カメラの視野外の画像化や、人間の目に映る反射のような一見不可能な視界から、エキサイティングな応用を解き放つことができる。しかし, 反射は物体形状, 材料特性, 3次元環境, 観測者の観察方向などと密接に依存するため, この課題は困難である。本手法は,未知の幾何学を持つ光沢のある物体を放射場カメラに変換し,物体の視点から世界を撮影する。私たちの重要な洞察は、オブジェクトの表面を、オブジェクトが見える5d環境放射フィールドの2d投影としてキャストされた反射をキャプチャする仮想センサーに変換することです。本研究では, 環境放射界の復元により, 被写体から周囲への深度と放射率の推定が可能であり, また, 現場に存在する光沢のある物体にのみ直接視認できる新規なビューのレンダリングも可能であり, 観察者ではないことを示す。さらに、放射場を用いて、シーン内の近接物体によって引き起こされる閉塞体の周囲を画像化することができる。本手法はオブジェクトの多視点画像に基づいてエンドツーエンドに学習し,オブジェクト形状,拡散放射率,および5次元環境放射率場を共同で推定する。

Reflections on glossy objects contain valuable and hidden information about the surrounding environment. By converting these objects into cameras, we can unlock exciting applications, including imaging beyond the camera's field-of-view and from seemingly impossible vantage points, e.g. from reflections on the human eye. However, this task is challenging because reflections depend jointly on object geometry, material properties, the 3D environment, and the observer viewing direction. Our approach converts glossy objects with unknown geometry into radiance-field cameras to image the world from the object's perspective. Our key insight is to convert the object surface into a virtual sensor that captures cast reflections as a 2D projection of the 5D environment radiance field visible to the object. We show that recovering the environment radiance fields enables depth and radiance estimation from the object to its surroundings in addition to beyond field-of-view novel-view synthesis, i.e. rendering of novel views that are only directly-visible to the glossy object present in the scene, but not the observer. Moreover, using the radiance field we can image around occluders caused by close-by objects in the scene. Our method is trained end-to-end on multi-view images of the object and jointly estimates object geometry, diffuse radiance, and the 5D environment radiance field.

翻訳日:2022-12-13 12:40:32 公開日:2022-12-12

# コンテンツモデレーションと映画コンテンツ評価のための深層アーキテクチャ

Deep Architectures for Content Moderation and Movie Content Rating ( http://arxiv.org/abs/2212.04533v2 )

ライセンス: Link先を確認

Fatih Cagatay Akyon, Alptekin Temizel

(参考訳) コンテンツに基づくビデオの評価は、ビデオ年齢カテゴリーを分類するための重要なステップである。映画コンテンツレーティングとテレビ番組レーティングは、専門家委員会が設立した2つの最も一般的なレーティングシステムである。しかし、委員会によるシーン・フィルムコンテンツの手作業によるレビュー・評価は面倒な作業であり、オンラインビデオコンテンツの増大がますます困難になっている。そのため、コンピュータビジョンに基づく映像コンテンツ分析技術を用いて評価プロセスを自動化することが望ましい。本稿では,アクション認識,マルチモーダル学習,映画ジャンル分類,コンテンツモデレーションと映画コンテンツ評価の文脈におけるセンシティブなコンテンツ検出について要約する。プロジェクトページはhttps://github.com/fcakyon/content-moderation-deep-learningにある。

Rating a video based on its content is an important step for classifying video age categories. Movie content rating and TV show rating are the two most common rating systems established by professional committees. However, manually reviewing and evaluating scene/film content by a committee is a tedious work and it becomes increasingly difficult with the ever-growing amount of online video content. As such, a desirable solution is to use computer vision based video content analysis techniques to automate the evaluation process. In this paper, related works are summarized for action recognition, multi-modal learning, movie genre classification, and sensitive content detection in the context of content moderation and movie content rating. The project page is available at https://github.com/fcakyon/content-moderation-deep-learning.

翻訳日:2022-12-13 12:40:07 公開日:2022-12-12

# 顔生成における一対多対応の記憶

Memories are One-to-Many Mapping Alleviators in Talking Face Generation ( http://arxiv.org/abs/2212.05005v2 )

ライセンス: Link先を確認

Anni Tang, Tianyu He, Xu Tan, Jun Ling, Runnan Li, Sheng Zhao, Li Song, Jiang Bian

(参考訳) トーキングフェイス生成は、入力音声によって駆動される対象人物の写実的なビデオポートレートを生成することを目的としている。入力音声から出力映像への1対多マッピング(例えば、1つの音声内容に対して複数の妥当な見た目が存在しうる)という性質から、従来研究のように決定論的なマッピングを学習すると訓練時に曖昧さが生じ、視覚的結果が劣化する。この1対多マッピングは、2段階のフレームワーク(すなわち、音声対表情モデルとそれに続くニューラルレンダリングモデル)によって部分的に緩和できるが、十分な情報(感情、しわなど)なしに予測が行われるため、まだ不十分である。本稿では、2つの段階それぞれの役割に沿った暗黙記憶と明示記憶によって不足情報を補うMemFaceを提案する。より具体的には、暗黙記憶は音声対表情モデルで用いられ、音声と表情の共有空間における高レベルのセマンティクスを捉える。一方、明示記憶はニューラルレンダリングモデルで用いられ、ピクセルレベルの詳細の合成を助ける。実験の結果、提案するMemFaceは、複数のシナリオにわたって最先端の結果を一貫して、かつ有意に上回ることがわかった。

Talking face generation aims at generating photo-realistic video portraits of a target person driven by input audio. Due to its nature of one-to-many mapping from the input audio to the output video (e.g., one speech content may have multiple feasible visual appearances), learning a deterministic mapping like previous works brings ambiguity during training, and thus causes inferior visual results. Although this one-to-many mapping could be alleviated in part by a two-stage framework (i.e., an audio-to-expression model followed by a neural-rendering model), it is still insufficient since the prediction is produced without enough information (e.g., emotions, wrinkles, etc.). In this paper, we propose MemFace to complement the missing information with an implicit memory and an explicit memory that follow the sense of the two stages respectively. More specifically, the implicit memory is employed in the audio-to-expression model to capture high-level semantics in the audio-expression shared space, while the explicit memory is employed in the neural-rendering model to help synthesize pixel-level details. Our experimental results show that our proposed MemFace surpasses all the state-of-the-art results across multiple scenarios consistently and significantly.

翻訳日:2022-12-13 12:39:54 公開日:2022-12-12

# 機械学習フレームワーク:医療施設における競争的知性とキードライバーの市場シェア傾向の同定

Machine Learning Framework: Competitive Intelligence and Key Drivers Identification of Market Share Trends Among Healthcare Facilities ( http://arxiv.org/abs/2212.04810v2 )

ライセンス: Link先を確認

Anudeep Appe, Bhanu Poluparthi, Lakshmi Kasivajjula, Udai Mv, Sobha Bagadi, Punya Modi, Aditya Singh, Hemanth Gunupudi

(参考訳) 医療戦略策定におけるデータ駆動決定の必要性は急速に増加している。医療提供者施設や病院(ここからは施設と呼ぶ)に影響を与える要因を特定するための信頼性の高いフレームワークが重要視されている。このパイロット研究の目的は、ストラテジストが医療サービスの品質向上に影響を及ぼす施設の市場シェアを改善するために重要な決定を策定することを支援する、データ駆動機械学習(data driven machine learning) - 回帰フレームワークの開発である。米国(米国)のヘルスケアビジネスが研究対象に選ばれ、ワシントン州の主要施設60施設にまたがるデータと、約3年間の歴史的データについて検討されている。現在の分析において、市場シェアは、潜在的な競争相手の施設群間の合計の出会いに対する施設の出会いの割合として表される。本研究は,市場シェアを評価・予測するための,競争相手識別と回帰アプローチの2段階的アプローチを提案する。マーケットシェアに影響を与える機能の相対的重要性を定量化するために、モデル非依存技術であるSHAPを利用する。提案手法は,既存分析における競合相手のプールを同定し,DAG(Directed Acyclic Graphs)と特徴レベルのワードベクトルを開発し,施設レベルで重要な連結成分を評価する。この技術は、経験的手法のバイアスを最小限に抑えるデータ駆動によって堅牢である。施設間の競争相手を特定したポストは、市場シェアを予測するための回帰モデルを開発した。施設レベルでの特徴の相対的定量化のために、shap a をモデル非依存の説明器として組み込んだ。これは、市場シェアに影響を与える各施設の属性を特定しランク付けするのに役立った。

The necessity of data-driven decisions in healthcare strategy formulation is rapidly increasing. A reliable framework that helps identify factors impacting the Market Share of a Healthcare Provider Facility or a Hospital (from here on termed a Facility) is of key importance. This pilot study aims at developing a data-driven Machine Learning - Regression framework which aids strategists in formulating key decisions to improve the Facility's Market Share, which in turn helps improve the quality of healthcare services. The US (United States) healthcare business is chosen for the study, and data spanning 60 key Facilities in Washington State and about 3 years of history is considered. In the current analysis, Market Share is defined as the ratio of facility encounters to the total encounters among the group of potential competitor facilities. The current study proposes a novel two-pronged approach of competitor identification and regression to evaluate and predict market share, respectively. A model-agnostic technique, SHAP, is leveraged to quantify the relative importance of features impacting the market share. The proposed method to identify the pool of competitors in the current analysis develops Directed Acyclic Graphs (DAGs) and feature-level word vectors, and evaluates the key connected components at facility level. This technique is robust since it is data-driven, which minimizes the bias from empirical techniques. After identifying the set of competitors among facilities, a regression model is developed to predict the Market Share. For relative quantification of features at a facility level, SHAP, a model-agnostic explainer, is incorporated. This helps to identify and rank the attributes at each facility which impact the market share.
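The market-share definition in the abstract is a simple ratio, which the sketch below works through. One point is an assumption: the abstract does not say whether the facility's own encounters count toward the competitor-group total, and this example assumes they do, so that shares within a group sum to 1. The function name and the facility labels are hypothetical.

```python
def market_share(encounters, facility, competitors):
    """Share of encounters a facility captures within its competitor pool.

    encounters:  dict mapping facility name -> encounter count.
    competitors: iterable of competing facility names.
    Assumption: the facility itself is part of the pool's total.
    """
    pool = set(competitors) | {facility}
    total = sum(encounters[name] for name in pool)
    if total == 0:
        raise ValueError("pool has no encounters")
    return encounters[facility] / total


counts = {"A": 200, "B": 300, "C": 500}
print(market_share(counts, "A", ["B", "C"]))  # 0.2
```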

翻訳日:2022-12-13 12:39:15 公開日:2022-12-12

# ソーシャルレコメンデータシステムのためのグラフニューラルネットワークに関する調査

A Survey of Graph Neural Networks for Social Recommender Systems ( http://arxiv.org/abs/2212.04481v2 )

ライセンス: Link先を確認

Kartik Sharma and Yeon-Chang Lee and Sivagami Nambi and Aditya Salian and Shlok Shah and Sang-Wook Kim and Srijan Kumar

(参考訳) ソーシャルリコメンデーションシステム(social recommender systems, social recommender)は,アイテムレコメンデーションを生成するタスクとして,ユーザ間インタラクションとユーザ間ソーシャルリレーションを同時に活用する。さらに、社会関係の活用は、同性や社会的影響によるユーザの嗜好を理解する上で、明らかに有効である。そのため、SocialRSはますます注目を集めている。特に、グラフニューラルネットワーク(GNN)の進歩により、近年多くのGNNベースのSocialRS手法が開発されている。そこで我々はGNNベースのSocialRSに関する文献を包括的かつ体系的にレビューする。本調査では,PRISMAフレームワークに従って2151の論文を注釈付けし,まずGNNベースのSocialRSに関する80の論文を同定した。 1)入力分類学は入力型表記の5つのグループと入力型表記の7つのグループを含み、(2)アーキテクチャ分類学はGNNエンコーダの8つのグループとデコーダの2つのグループと損失関数表記の12つのグループを含む。我々は,GNNに基づくSocialRS手法を分類学のいくつかのカテゴリに分類し,その詳細を説明する。さらに、GNNベースのSocialRS手法を評価するために広く使われているベンチマークデータセットとメトリクスを要約する。最後に,今後の研究の方向性を示すことで,この調査を結論づける。

Social recommender systems (SocialRS) simultaneously leverage user-to-item interactions as well as user-to-user social relations for the task of generating item recommendations to users. Additionally exploiting social relations is clearly effective in understanding users' tastes due to the effects of homophily and social influence. For this reason, SocialRS has increasingly attracted attention. In particular, with the advance of Graph Neural Networks (GNN), many GNN-based SocialRS methods have been developed recently. Therefore, we conduct a comprehensive and systematic review of the literature on GNN-based SocialRS. In this survey, we first identify 80 papers on GNN-based SocialRS after annotating 2151 papers by following the PRISMA framework (Preferred Reporting Items for Systematic Reviews and Meta-Analysis). Then, we comprehensively review them in terms of their inputs and architectures to propose a novel taxonomy: (1) input taxonomy includes 5 groups of input type notations and 7 groups of input representation notations; (2) architecture taxonomy includes 8 groups of GNN encoder, 2 groups of decoder, and 12 groups of loss function notations. We classify the GNN-based SocialRS methods into several categories as per the taxonomy and describe their details. Furthermore, we summarize the benchmark datasets and metrics widely used to evaluate the GNN-based SocialRS methods. Finally, we conclude this survey by presenting some future research directions.

翻訳日:2022-12-13 12:38:51 公開日:2022-12-12

PDF登録状況（公開日: 20221212）