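Fugu-MT: arXiv paper translations

This site provides Japanese translations of arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body text is not CC-licensed, or that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is performed with Fugu-Machine Translator.

The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility whatsoever for any consequences arising from the use of this site (including all information and translations).

Papers with a publication date of 2022-11-28.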

Title	Authors	Abstract	Publication date / translation date
# 有限アーベル群に対する内部量子参照フレーム Internal quantum reference frames for finite Abelian groups ( http://arxiv.org/abs/2107.07545v2 ) ライセンス: Link先を確認	Philipp A. Hoehn, Marius Krumm, Markus P. Mueller	(参考訳) 内部量子システムを参照フレームとして用いることは、外的相対性が利用できない場合、量子重力、ゲージ理論、量子基礎において重要な概念である。本研究では、基礎となる構成空間が有限アベリア群である場合に、そのような量子参照フレーム(QRF)の包括的かつ自己完結的な処理を施し、これまでの作業を大幅に拡張する(Quantum 5, 530 (2021))。このセットアップの単純さは、完全に厳密な量子情報理論解析を認め、概念的および構造的問題の多くを探索するのに十分な構造を維持しながら、より複雑な設定に関係している。これを利用して、量子情報理論法による制約量子化の重要構造を導出し、QRF共分散に対する異なるアプローチの関係を明らかにする。特に、我々は「物理的ヒルベルト空間」("perspective-neutral" アプローチの領域)を、状態の浄化のフレームに依存しない記述を許容する極大部分空間として特徴づける。次に,qrfsに対する「可視的ニュートラル」アプローチと「良性」アプローチの運動学的同値性と,驚くべき動的不等価性を示す。前者は任意の部分系関係の間の遷移を生成するユニタリを認めているが、後者は対称保存を必要とするときにそのようなダイナミクスを認めない。実験では, 相互作用する粒子の例として, ダイナミックスを「サブシステムの1つと相対的に」表現する方法を例示する。 Employing internal quantum systems as reference frames is a crucial concept in quantum gravity, gauge theories and quantum foundations whenever external relata are unavailable. In this work, we give a comprehensive and self-contained treatment of such quantum reference frames (QRFs) for the case when the underlying configuration space is a finite Abelian group, significantly extending our previous work (Quantum 5, 530 (2021)). The simplicity of this setup admits a fully rigorous quantum information-theoretic analysis, while maintaining sufficient structure for exploring many of the conceptual and structural questions also pertinent to more complicated setups. We exploit this to derive several important structures of constraint quantization with quantum information-theoretic methods and to reveal the relation between different approaches to QRF covariance. In particular, we characterize the "physical Hilbert space" -- the arena of the "perspective-neutral" approach -- as the maximal subspace that admits frame-independent descriptions of purifications of states. 
We then demonstrate the kinematical equivalence and, surprisingly, the dynamical inequivalence of the "perspective-neutral" and the "alignability" approaches to QRFs. While the former admits unitaries generating transitions between arbitrary subsystem relations, the latter, remarkably, admits no such dynamics when requiring symmetry-preservation. We illustrate these findings by example of interacting discrete particles, including how dynamics can be described "relative to one of the subsystems".	翻訳日:2023-03-22 05:03:05 公開日:2022-11-28
# 文法の進化における方向性力の検出:EEBO, COHA, Google Booksを用いた英語完全語の事例研究 Detecting directional forces in the evolution of grammar: A case study of the English perfect using EEBO, COHA, and Google Books ( http://arxiv.org/abs/2110.08567v2 ) ライセンス: Link先を確認	Shimpei Okuda, Michio Hosaka, and Kazutoshi Sasahara	(参考訳) 言語には進化を通じて現れた様々な特徴がある。現代の英語文法では、完全は \textit{have}+PP (past participle) で形成されるが、初期の英語では \textit{be}+PP 形式も存在した。副動詞BEは,いくつかの特別な症例を除いて,進化を通じてHAVEに置き換えられたことが広く認識されている。しかし、この進化が自然選択やランダムドリフトによって引き起こされたのかはいまだ不明である。本稿では、EEBO(Early English Books Online)、COHA(Corpus of Historical American English)、Google Books(Google Books)の3つの大規模データソースを組み合わせて、英語完全性の進化における方向性について検討した。非翻訳動詞の多くは、ディープニューラルネットワークに基づくモデルによって「選択」に分類された \textit{be}+pp から \textit{have}+pp へ明らかな遷移を示した。これらの結果は、英語の完全性がランダムなドリフトではなく自然選択を通じて進化し、文法の文化的進化に対する洞察を与えることを示唆している。 Languages have diverse characteristics that have emerged through evolution. In modern English grammar, the perfect is formed with \textit{have}+PP (past participle), but in earlier English the \textit{be}+PP form also existed. It is widely recognised that the auxiliary verb BE was replaced by HAVE throughout evolution, except for some special cases. However, whether this evolution was caused by natural selection or random drift is still unclear. Here we examined directional forces in the evolution of the English perfect by combining three large-scale data sources: EEBO (Early English Books Online), COHA (Corpus of Historical American English), and Google Books. We found that most intransitive verbs exhibited an apparent transition from \textit{be}+PP to \textit{have}+PP, most of which were classified as `selection' by a deep neural network-based model. These results suggest that the English perfect could have evolved through natural selection rather than random drift, and provide insights into the cultural evolution of grammar.	翻訳日:2023-03-11 08:03:36 公開日:2022-11-28
# 量子次数探索の成功確率について On the success probability of quantum order finding ( http://arxiv.org/abs/2201.07791v2 ) ライセンス: Link先を確認	Martin Ekerå	(参考訳) Shor の順序探索アルゴリズムが 1 回のランで $r$ を回収することに成功すれば,その確率は低い値であることが証明される。このバウンドは、アルゴリズムの古典的な後処理部分で2つの制限された検索を行うことで、量子部分を再実行したり、Shorと比較して指数長を増加させることなく、$r$で高い成功確率を保証できることを意味する。漸近的に、$r$が無限大の傾向にあるように、単一のランで$r$を回復する確率は1つになる。適度な$r$の場合、例えば$1 - 10^{-4}$を超える高い成功確率が保証される。行程として、オーダーフィンディングアルゴリズムの単一実行において任意の整数$N$を完全に分解する確率について類似した結果を示す。 We prove a lower bound on the probability of Shor's order-finding algorithm successfully recovering the order $r$ in a single run. The bound implies that by performing two limited searches in the classical post-processing part of the algorithm, a high success probability can be guaranteed, for any $r$, without re-running the quantum part or increasing the exponent length compared to Shor. Asymptotically, in the limit as $r$ tends to infinity, the probability of successfully recovering $r$ in a single run tends to one. Already for moderate $r$, a high success probability exceeding e.g. $1 - 10^{-4}$ can be guaranteed. As corollaries, we prove analogous results for the probability of completely factoring any integer $N$ in a single run of the order-finding algorithm.	翻訳日:2023-02-28 10:13:00 公開日:2022-11-28
# ガウス回路を用いたゴッテマン・キタエフ精密状態の効率的なシミュレーション Efficient simulation of Gottesman-Kitaev-Preskill states with Gaussian circuits ( http://arxiv.org/abs/2203.11182v3 ) ライセンス: Link先を確認	Cameron Calcluth, Alessandro Ferraro, Giulia Ferrini	(参考訳) ゴッテマン・キタエフ・プレスキル状態(GKP)の古典的シミュラビリティを,任意の変位,大規模なシンプレクティック操作,ホモダイン測定と組み合わせて検討した。これらのタイプの回路では、準確率分布の非負性性に基づく連続変数の定理も、ゴッテマン・クニルの定理のような離散変数の定理も、シミュラビリティを評価するために用いられる。まず、任意の旋回と大きな回転の後に、位置ベースで1つのGKP状態を測定することに対応する確率密度関数を評価する方法を開発した。この方法は解析数論の手法を用いて変換されたヤコビテータ関数を評価することを含む。この結果を用いて、古典的に効率的にシミュレート可能であり、GKP符号化クリフォード群に含まれない2つの大きなマルチモード回路を同定する。その結果、従来より効率的にシミュラブルな回路の集合が拡張された。 We study the classical simulatability of Gottesman-Kitaev-Preskill (GKP) states in combination with arbitrary displacements, a large set of symplectic operations and homodyne measurements. For these types of circuits, neither continuous-variable theorems based on the non-negativity of quasi-probability distributions nor discrete-variable theorems such as the Gottesman-Knill theorem can be employed to assess the simulatability. We first develop a method to evaluate the probability density function corresponding to measuring a single GKP state in the position basis following arbitrary squeezing and a large set of rotations. This method involves evaluating a transformed Jacobi theta function using techniques from analytic number theory. We then use this result to identify two large classes of multimode circuits which are classically efficiently simulatable and are not contained by the GKP encoded Clifford group. Our results extend the set of circuits previously known to be classically efficiently simulatable.	翻訳日:2023-02-21 04:59:03 公開日:2022-11-28
# 人工知能による差別を防ぐために機密データを使用する:GDPRは新しい例外が必要か? Using sensitive data to prevent discrimination by artificial intelligence: Does the GDPR need a new exception? ( http://arxiv.org/abs/2206.03262v3 ) ライセンス: Link先を確認	Marvin van Bekkum, Frederik Zuiderveen Borgesius	(参考訳) 組織は人工知能を使用して、さまざまな理由から人々に関する意思決定を行い、例えば、多くの求人アプリケーションから最高の候補を選ぶことができる。しかし、AIシステムは意思決定に使用すると差別効果がある。説明として、AIシステムは特定の民族を持つ人々の適用を拒否する可能性があるが、組織はそのような民族差別を計画しなかった。しかし、ヨーロッパでは、AIシステムが誤って民族によって差別されているかどうかを評価しようとすると、組織は問題に直面します。原則として、GDPRは、民族、宗教、性的嗜好に関するデータを含む特定の「特定のデータのカテゴリ」の使用を禁止している(しばしば「感受性データ」と呼ばれる)。欧州委員会AI法の提案には、組織がAIシステムの監査に特別なカテゴリのデータを使用することを可能にする条項が含まれている。本稿では,個人データの特殊カテゴリに関するGDPRのルールが,AIによる差別の防止を妨げているかどうかを問う。 GDPRは多くの状況において特別なカテゴリーデータの使用を禁止していると論じる。我々はまた、AIシステムによる差別を防止するために、GDPRが個人データの特別なカテゴリの使用を禁止することの例外を作成することに対する議論と反対の議論をマップアップする。この論文は欧州の法律について論じているが、この論文は欧州以外でも関係があり、世界の多くの政策立案者がプライバシーと非差別政策の緊張を和らげている。 Organisations can use artificial intelligence to make decisions about people for a variety of reasons, for instance, to select the best candidates from many job applications. However, AI systems can have discriminatory effects when used for decision-making. To illustrate, an AI system could reject applications of people with a certain ethnicity, while the organisation did not plan such ethnicity discrimination. But in Europe, an organisation runs into a problem when it wants to assess whether its AI system accidentally discriminates based on ethnicity: the organisation may not know the applicants' ethnicity. In principle, the GDPR bans the use of certain 'special categories of data' (sometimes called 'sensitive data'), which include data on ethnicity, religion, and sexual preference. The proposal for an AI Act of the European Commission includes a provision that would enable organisations to use special categories of data for auditing their AI systems. This paper asks whether the GDPR's rules on special categories of personal data hinder the prevention of AI-driven discrimination. 
We argue that the GDPR does prohibit such use of special category data in many circumstances. We also map out the arguments for and against creating an exception to the GDPR's ban on using special categories of personal data, to enable preventing discrimination by AI systems. The paper discusses European law, but the paper can be relevant outside Europe too, as many policymakers in the world grapple with the tension between privacy and non-discrimination policy.	翻訳日:2023-02-19 17:35:10 公開日:2022-11-28
# 自然言語処理によるデジタル資産価格の予測:調査 Predicting Digital Asset Prices using Natural Language Processing: a survey ( http://arxiv.org/abs/2212.00726v1 ) ライセンス: Link先を確認	Trang Tran	(参考訳) ブロックチェーン技術は、人々がどのように自分の資産を保存し、取引するかについての考え方を変えました。ブロックチェーン技術の革新の1つは分散化(decentralization)である。つまり、資産担保発行者や銀行といった従来の金融仲介業者はプロセス中に排除される。ブロックチェーン技術はさまざまな業界で利用されているが、最も顕著なアプリケーションは暗号通貨であり、Bitcoinが最初に提案されている。 2021年のピーク時には、Bitcoinの時価総額はかつて1兆ドルを超えた。仮想通貨市場のオープンな性質は、投資価格が非常に変動し、その変動が予測不可能であるため、小売投資家と機関投資家の両方に様々な課題と懸念をもたらす。特に機械学習や自然言語処理の台頭は、暗号通貨の価格行動の監視と予測に光を当てている。本稿では,ビットコインやイーサリアムなどのデジタル資産の価格予測と動作分析に機械学習と自然言語処理を適用した最近の取り組みをレビューし,分析することを目的とする。 Blockchain technology has changed how people think about how they used to store and trade their assets, as it introduced us to a whole new way to transact: using digital currencies. One of the major innovations of blockchain technology is decentralization, meaning that traditional financial intermediaries, such as asset-backed security issuers and banks, are eliminated in the process. Even though blockchain technology has been utilized in a wide range of industries, its most prominent application is still cryptocurrencies, with Bitcoin being the first proposed. At its peak in 2021, the market cap for Bitcoin once surpassed 1 trillion US dollars. The open nature of the crypto market poses various challenges and concerns for both potential retail investors and institutional investors, as the price of the investment is highly volatile, and its fluctuations are unpredictable. The rise of Machine Learning, and Natural Language Processing, in particular, has shed some light on monitoring and predicting the price behaviors of cryptocurrencies. This paper aims to review and analyze the recent efforts in applying Machine Learning and Natural Language Processing methods to predict the prices and analyze the behaviors of digital assets such as Bitcoin and Ethereum.	翻訳日:2023-02-19 12:47:00 公開日:2022-11-28
# Minimax AUC Fairness: Provable Convergence を用いた効率的なアルゴリズム Minimax AUC Fairness: Efficient Algorithm with Provable Convergence ( http://arxiv.org/abs/2208.10451v2 ) ライセンス: Link先を確認	Zhenhuan Yang, Yan Lok Ko, Kush R. Varshney, Yiming Ying	(参考訳) 一連の意思決定における機械学習モデルの使用は、社会的不平等を悪化させ、特に人種や性別によって定義された限界グループのメンバーに異質な影響をもたらす。 ROC曲線(AUC)の下の領域は、機械学習におけるスコアリング関数の性能を評価するために広く使われているが、他のパフォーマンス指標よりもアルゴリズム的公正さで研究されている。 AUC の双対の性質のため、AUC に基づく群フェアネス計量を定義することはペア独立であり、 \emph{intra-group} と \emph{inter-group} AUC の両方を含むこともある。重要なことは、AUCの1つのカテゴリだけを考えると、AUC最適化の不公平さを軽減するには不十分である。本稿では,実用性を維持しつつグループ内およびグループ間aucsを組み込んだミニマックス学習・バイアス緩和フレームワークを提案する。このrawlsianフレームワークに基づいて,効率的な確率最適化アルゴリズムを設計し,最小群レベル auc への収束を証明する。我々は,ミニマックスフレームワークと提案アルゴリズムの有効性を検証するために,合成データセットと実世界のデータセットの数値実験を行った。 The use of machine learning models in consequential decision making often exacerbates societal inequity, in particular yielding disparate impact on members of marginalized groups defined by race and gender. The area under the ROC curve (AUC) is widely used to evaluate the performance of a scoring function in machine learning, but is studied in algorithmic fairness less than other performance metrics. Due to the pairwise nature of the AUC, defining an AUC-based group fairness metric is pairwise-dependent and may involve both \emph{intra-group} and \emph{inter-group} AUCs. Importantly, considering only one category of AUCs is not sufficient to mitigate unfairness in AUC optimization. In this paper, we propose a minimax learning and bias mitigation framework that incorporates both intra-group and inter-group AUCs while maintaining utility. Based on this Rawlsian framework, we design an efficient stochastic optimization algorithm and prove its convergence to the minimum group-level AUC. We conduct numerical experiments on both synthetic and real-world datasets to validate the effectiveness of the minimax framework and the proposed optimization algorithm.	翻訳日:2023-02-19 10:39:07 公開日:2022-11-28
# 光の多モードスクイーズ状態を用いた量子密度符号化ネットワーク Quantum Dense Coding Network using Multimode Squeezed States of Light ( http://arxiv.org/abs/2204.14147v2 ) ライセンス: Link先を確認	Ayan Patra, Rivu Gupta, Saptarshi Roy, Tamoghna Das, Aditi Sen De	(参考訳) 本稿では,複数送信機と連続変数システムを用いた単一受信機を備えた多モード高密度符号化ネットワークのフレームワークを提案する。このプロトコルは任意の数のモードに対してスケーラブルであり、符号化は変位である一方、復号にはビームスプリッターの列によって互いに結合されたモードのホモダインの計測が伴うため、現在利用可能なリソースを持つ実験室で実装される可能性を示す。 3モード状態と4モード状態の共有を含む2と3つの送信者の場合の符号化容量の閉形式表現を計算する。符号化動作後、送信者のモードが受信機に転送されると、高密度符号化容量は固定平均エネルギー伝送の制約により算出される。いずれの場合も,3モードおよび4モード状態のパラダイムクラスを用いて,プロトコルの量子的優位性を示す。量子アドバンテージは、送信者から受信者への送信が許されるエネルギー量の増加に伴って増加する。 We present a framework of a multimode dense coding network with multiple senders and a single receiver using continuous variable systems. The protocol is scalable to arbitrary numbers of modes with the encoding being displacements while the decoding involves homodyne measurements of the modes after they are combined in a pairwise manner by a sequence of beam splitters, thereby exhibiting its potentiality to implement in laboratories with currently available resources. We compute the closed form expression of the dense coding capacity for the cases of two and three senders that involve sharing of three- and four-mode states respectively. The dense coding capacity is calculated with the constraint of fixed average energy transmission when the modes of the sender are transferred to the receiver after the encoding operation. In both the cases, we demonstrate the quantum advantage of the protocol using paradigmatic classes of three- and four-mode states. The quantum advantage increases with the increase in the amount of energy that is allowed to be transmitted from the senders to the receiver.	翻訳日:2023-02-15 04:00:13 公開日:2022-11-28
# 非相互作用非エルミート$n$-パーティイト強結合格子に対する一般リーブの定理 Generalized Lieb's theorem for noninteracting non-Hermitian $n$-partite tight-binding lattices ( http://arxiv.org/abs/2205.04174v2 ) ライセンス: Link先を確認	A. M. Marques and R. G. Dias	(参考訳) エルミート二成分モデルはカイラル対称性の存在とリーブの定理によって特徴づけられ、2つの部分格子の間の点の不均衡からモデルのゼロエネルギー平坦なバンドの数を導出する。本稿では、任意の数の部分束が一方向および巡回的に連結された非エルミート模型のクラスを導入し、それらのモデルの零エネルギー平坦バンドの数をリーブの定理の一般化バージョンから見出すことができ、各部分束と最小次元の部分束との間の不均衡を含む非相互作用的密結合モデルへの応用について述べる。さらに、これらのモデルは、特定の時計やパラフェルミオン系の文脈で見られるタイプの一般化されたキラル対称性に従うことも示される。主な成果は単純な玩具モデルで示され、ここで紹介されるモデルの異なるプラットフォームにおける実現の可能性について論じる。 Hermitian bipartite models are characterized by the presence of chiral symmetry and by Lieb's theorem, which derives the number of zero-energy flat bands of the model from the imbalance of sites between its two sublattices. Here, we introduce a class of non-Hermitian models with an arbitrary number of sublattices connected in a unidirectional and cyclical way and show that the number of zero-energy flat bands of these models can be found from a generalized version of Lieb's theorem, in what regards its application to noninteracting tight-binding models, involving the imbalance between each sublattice and the sublattice of lowest dimension. Furthermore, these models are also shown to obey a generalized chiral symmetry, of the type found in the context of certain clock or parafermionic systems. The main results are illustrated with a simple toy model, and possible realizations in different platforms of the models introduced here are discussed.	翻訳日:2023-02-13 20:37:53 公開日:2022-11-28
# ブラックホールデコヒール量子重ね合わせ Black Holes Decohere Quantum Superpositions ( http://arxiv.org/abs/2205.06279v2 ) ライセンス: Link先を確認	Daine L. Danielson, Gautam Satishchandran, and Robert M. Wald	(参考訳) 質量体を空間的に分離された状態の量子重ね合わせにすると、天体の近傍にブラックホールが存在するだけで、最終的に重畳のコヒーレンスが破壊されることを示す。これは、事実上、天体の重力場がブラックホールに柔らかい重力を放射し、ブラックホールが重ね合わせに関する「どの経路」情報を取得することができるためである。同様の効果は、荷電された天体の量子重ね合わせにも起こる。このような量子重ね合わせのデコヒーレンス時間を推定する。ブラックホールが最終的に量子的重ね合わせを解き放つという事実は、重力の量子論におけるブラックホールの性質を理解する上での基本的な重要性であると考えられている。 We show that if a massive body is put in a quantum superposition of spatially separated states, the mere presence of a black hole in the vicinity of the body will eventually destroy the coherence of the superposition. This occurs because, in effect, the gravitational field of the body radiates soft gravitons into the black hole, allowing the black hole to acquire "which path" information about the superposition. A similar effect occurs for quantum superpositions of electrically charged bodies. We provide estimates of the decoherence time for such quantum superpositions. We believe that the fact that a black hole will eventually decohere any quantum superposition may be of fundamental significance for our understanding of the nature of black holes in a quantum theory of gravity.	翻訳日:2023-02-13 09:29:16 公開日:2022-11-28
# Pegg-BarnettとPaul量子相の形式的関係 Formal relation between Pegg-Barnett and Paul quantum phase frameworks ( http://arxiv.org/abs/2205.09481v2 ) ライセンス: Link先を確認	Tomasz Linowski, Konrad Schlichtholz, Łukasz Rudnicki	(参考訳) エルミート量子位相演算子を定義する問題は、量子力学そのものと同じくらい古い。長年にわたり、抽象演算子形式から位相空間法まで、多くの解が提案された。本研究では、ポール形式主義における位相の確率分布が、後者と量子制限増幅チャネルを組み合わせることで、ペッグ・バーネット形式主義から完全に従うことを証明し、最も顕著な2つのアプローチの間に明確な接続を行う。その結果,Paul フレームワークは Pegg-Barnett アプローチの半古典的限界と見なされる可能性が示唆された。 The problem of defining a Hermitian quantum phase operator is nearly as old as quantum mechanics itself. Throughout the years, a number of solutions have been proposed, ranging from abstract operator formalisms to phase-space methods. In this work, we make an explicit connection between two of the most prominent approaches, by proving that the probability distribution of phase in the Paul formalism follows exactly from the Pegg-Barnett formalism by combining the latter with the quantum limited amplifier channel. Our findings suggest that the Paul framework may be viewed as a semi-classical limit of the Pegg-Barnett approach.	翻訳日:2023-02-12 15:52:59 公開日:2022-11-28
# 古典最適化ハミルトンシミュレーション Classically optimized Hamiltonian simulation ( http://arxiv.org/abs/2205.11427v3 ) ライセンス: Link先を確認	Conor Mc Keever, Michael Lubasch	(参考訳) ハミルトンシミュレーションは量子コンピュータが量子優位を達成するための有望な応用である。本稿では,量子回路を最適化するためのテンソルネットワーク法に基づく古典的アルゴリズムを提案する。トロッター積公式と比較して、古典的に最適化された回路は桁違いに精度が高く、シミュレーション時間も大幅に拡張できることを示す。 Hamiltonian simulation is a promising application for quantum computers to achieve a quantum advantage. We present classical algorithms based on tensor network methods to optimize quantum circuits for this task. We show that, compared to Trotter product formulas, the classically optimized circuits can be orders of magnitude more accurate and significantly extend the total simulation time.	翻訳日:2023-02-12 00:30:45 公開日:2022-11-28
# 熱力学における量子コヒーレンスの役割 Role of Quantum Coherence in Thermodynamics ( http://arxiv.org/abs/2205.13612v3 ) ライセンス: Link先を確認	Gilad Gour	(参考訳) 時間変換共変進化下での量子系の相互変換性を決定するために必要な条件を見つけ、それを用いて単発状態と漸近状態の両方において量子熱力学の問題を解く。量子熱力学の資源理論は可逆ではないことがよく知られているが、PRL 111, 250404 (2013) では、エネルギー準位に対するコヒーレント重ね合わせのsublinear amount of coherent superpositionが利用可能であると主張した。ここでは、エネルギー準線形量のコヒーレンスを自由とすれば、熱水性に関する資源理論は自明になることを示す。代わりに、エネルギーの亜線型量を考えることで、純粋な状態の場合、熱力学の理論は可逆になることを示す。混合状態の場合に対する同じ主張の証明はまだ不足している。 We find necessary and sufficient conditions to determine the inter-convertibility of quantum systems under time-translation covariant evolution, and use them to solve several problems in quantum thermodynamics both in the single-shot and asymptotic regimes. It is well known that the resource theory of quantum athermality is not reversible, but in PRL 111, 250404 (2013) it was claimed that the theory becomes reversible "provided a sublinear amount of coherent superposition over energy levels is available". Here we show that if a sublinear amount of coherence among energy levels were considered free, then the resource theory of athermality would become trivial. Instead, we show that by considering a sublinear amount of energy to be free, the theory of athermality becomes reversible for the pure-state case. A proof of the same claim for the mixed-state case is still lacking.	翻訳日:2023-02-11 16:30:23 公開日:2022-11-28
# 量子状態の不安定構造に基づく局所近似によるブラインド量子データ圧縮のレート低減 Rate Reduction of Blind Quantum Data Compression with Local Approximations Based on Unstable Structure of Quantum States ( http://arxiv.org/abs/2206.03501v2 ) ライセンス: Link先を確認	Kohdai Kuroiwa and Debbie Leung	(参考訳) 本稿では,有限局所近似を用いたデータ圧縮タスクであるブラインド量子データ圧縮のための新しいプロトコルを提案する。ブラインドデータ圧縮の速度は、近似が小さくても近似に影響を受けやすい。この不安定性は近似に対する量子状態の構造の感度に由来するため、近似の存在下でのブラインド圧縮の解析は難解である。本稿では, 圧縮速度を実質的に低減するために, 不安定性を利用したプロトコルを構築した。本プロトコルは, 具体例において, 顕著な削減率を示す。さらに,本手法を対角状態に適用し,この特別な場合において2種類の近似法を提案する。数値実験を行い、これらの2つの近似法のうちの1つが他方よりもかなり優れていることを観察する。そこで本研究では,ブラインド量子データ圧縮の近似速度トレードオフのさらなる検討に向けて,近似値を用いたブラインド量子データ圧縮の一般研究に向けて第一歩を踏み出した。 In this paper, we propose a new protocol for a data compression task, blind quantum data compression, with finite local approximations. The rate of blind data compression is susceptible to approximations even when the approximations are diminutive. This instability originates from the sensitivity of a structure of quantum states against approximations, which makes the analysis of blind compression in the presence of approximations intractable. In this paper, we constructed a protocol that takes advantage of the instability to reduce the compression rate substantially. Our protocol shows a significant reduction in rate for specific examples we examined. Moreover, we apply our methods to diagonal states, and propose two types of approximation methods in this special case. We perform numerical experiments and observe that one of these two approximation methods performs significantly better than the other. Thus, our analysis makes a first step toward general investigation of blind quantum data compression with the allowance of approximations towards further investigation of approximation-rate trade-off of blind quantum data compression.	翻訳日:2023-02-10 06:36:14 公開日:2022-11-28
# 量子化学のための量子学習マシン(qlm)のオープンソースの変分量子固有ソルバ拡張 Open Source Variational Quantum Eigensolver Extension of the Quantum Learning Machine (QLM) for Quantum Chemistry ( http://arxiv.org/abs/2206.08798v4 ) ライセンス: Link先を確認	Mohammad Haidar, Marko J. Ran\v{c}i\'c, Thomas Ayral, Yvon Maday, Jean-Philip Piquemal	(参考訳) 量子化学 (qc) は量子コンピューティングの最も有望な応用の一つである。しかし、現在の量子処理ユニット(QPU)は依然として大きなエラーにさらされている。したがって、ノイズの多い中間スケール量子(NISQ)ハードウェアは、量子ビット数と回路深さの点で制限される。変分量子固有解法(VQE)のような特定のアルゴリズムは、そのような問題を克服することができる。本稿では,オープンVQE(Open-VQE)と呼ばれる新しいオープンソースQCパッケージについて紹介する。 VQEアルゴリズムの開発とテストを容易にする。 atos quantum learning machine (qlm)は、量子コンピューティングプログラムを書き、最適化し、シミュレートできる一般的な量子プログラミングフレームワークである。私たちは、open-vqeとともに、新しいオープンソースモジュールであるmyqlm-fermion(qc開発で重要な重要なqlm再資源を含む)を紹介します(fermionic second quantization toolsなど)。 Open-VQEパッケージはQLMをQCに拡張します。 (i)一般的に使用されるuccsd ans{\"a}tz以外の異なる種類の励起を生成する関数 (ii) 単純なクラス構造ピソン符号で書かれた"adaptive derivative assembled pseudo-Trotter method"(ADAPT-VQE)の新たな実装。他の主要な量子プログラミングフレームワークとの相互運用性は、myqlmのおかげで保証されている。 open-vqe/myqlm-fermion量子シミュレータを組み合わせることで、変分量子アルゴリズムの実装、テスト、開発が容易になり、現在の量子コンピュータでqc計算を実行するための最良の妥協を選択し、大きな分子をテストできる。 4から24までの量子ビット数に関連する分子の広範なベンチマークを提供する。 Quantum Chemistry (QC) is one of the most promising applications of Quantum Computing. However, present quantum processing units (QPUs) are still subject to large errors. Therefore, noisy intermediate-scale quantum (NISQ) hardware is limited in terms of qubits counts and circuit depths. Specific algorithms such as Variational Quantum Eigensolvers (VQEs) can potentially overcome such issues. We introduce here a novel open-source QC package, denoted Open-VQE, providing tools for using and developing chemically-inspired adaptive methods derived from Unitary Coupled Cluster (UCC). It facilitates the development and testing of VQE algorithms. It is able to use the Atos Quantum Learning Machine (QLM), a general quantum programming framework enabling to write, optimize and simulate quantum computing programs. 
Along with Open-VQE, we introduce myQLM-Fermion, a new open-source module that includes the key QLM resources important for QC developments (fermionic second-quantization tools, etc.). The Open-VQE package therefore extends QLM to QC by providing: (i) functions to generate different types of excitations beyond the commonly used UCCSD ansatz; (ii) a new implementation of the "adaptive derivative assembled pseudo-Trotter method" (ADAPT-VQE), written in simple class-structured Python code. Interoperability with other major quantum programming frameworks is ensured thanks to myQLM, which allows users to easily build their own code and execute it on existing QPUs. The combined Open-VQE/myQLM-Fermion quantum simulator facilitates the implementation, testing and development of variational quantum algorithms, helping to choose the best compromise for running QC computations on present quantum computers while offering the possibility to test large molecules. We provide extensive benchmarks for several molecules, with qubit counts ranging from 4 up to 24.	翻訳日:2023-02-09 02:00:15 公開日:2022-11-28
# シュール・ワイル双対性を示す任意の対称性群上の量子状態平均化の双対性 Duality of averaging of quantum states over arbitrary symmetry groups revealing Schur-Weyl duality ( http://arxiv.org/abs/2208.07689v2 ) ライセンス: Link先を確認	Marcin Markiewicz and Janusz Przewocki	(参考訳) 量子情報理論において、多部量子状態上のユニタリ群の集合的作用に対する一様平均化は、その状態がサブシステムの置換作用素に相当する形式に投影されるという、確立された事実である。したがって、置換作用素と同値な状態は集合ユニタリノイズによって影響を受けない。自明な観察により、置換作用素上の一様平均化は、ユニタリ群の集団作用の1つに相当するブロック対角構造を持つ形式に状態を投影することを示している。私たちは、平均値の双対性というこの性質の名前を紹介します。この双対性の背後にある数学的理由は、多部量子系のテンソル積状態空間上のユニタリ群の集合作用と置換演算の作用が行列代数として扱われるときに相互可換であるという事実である。そのような行列代数のペアは双対還元対として知られている。この研究において、有限次元量子系の場合、平均化の双対性は、群作用のカルタン分解上のイテレーテッド積分によって平均化演算が定義される限り、コンパクトであるか否かにかかわらず、双対還元対である対称群の任意の対に対して成り立つことを示す。結果は非常に一般的なものであるが、特殊線形行列と置換演算の集団作用からなる双対簡約対の具体的例に着目し、これは非一元的slocc(stochastic local operations and classical communication)演算上の多元量子状態の平均化に対応している。この文脈では、特定の不変部分空間へのポストセレクションにおいて、SLOCC平均化の場合において、集合単位平均化から知られているノイズレスサブシステムが持続することを示す。 It is a well-established fact in quantum information theory, that uniform averaging over the collective action of a unitary group on a multipartite quantum state projects the state to a form equivalent to a permutation operator of the subsystems. Hence states equivalent to permutation operators are untouched by collective unitary noise. A trivial observation shows that uniform averaging over permutation operators projects the state into a form with block-diagonal structure equivalent to the one of the collective action of the unitary group. We introduce a name for this property: duality of averaging. The mathematical reason behind this duality is the fact that the collective action of the unitary group on the tensor product state space of a multipartite quantum system and the action of the permutation operations are mutual commutants when treated as matrix algebras. Such pairs of matrix algebras are known as dual reductive pairs. 
In this work we show that, in the case of finite-dimensional quantum systems, such duality of averaging holds for any pair of symmetry groups forming a dual reductive pair, regardless of whether they are compact or not, as long as the averaging operation is defined via an iterated integral over the Cartan decomposition of the group action. Although our result is very general, we focus much attention on the concrete example of a dual reductive pair consisting of the collective action of special linear matrices and permutation operations, which physically corresponds to averaging multipartite quantum states over non-unitary SLOCC-type (Stochastic Local Operations and Classical Communication) operations. In this context we show that noiseless subsystems known from collective unitary averaging persist in the case of SLOCC averaging in a conditional way: upon postselection to specific invariant subspaces.	翻訳日:2023-01-30 22:54:16 公開日:2022-11-28
# ミスマッチ基底測定による単一ビットレジームにおける実用量子鍵分布の安全性の簡易かつ厳密な証明法 Simple and Rigorous Proof Method for the Security of Practical Quantum Key Distribution in the Single-Qubit Regime Using Mismatched Basis Measurements ( http://arxiv.org/abs/2208.13754v2 ) ライセンス: Link先を確認	Michel Boyer, Gilles Brassard, Nicolas Godbout, Rotem Liss, Stéphane Virally	(参考訳) 量子鍵分配(QKD)プロトコルは、2つのパーティが秘密の共有鍵を生成できるようにすることを目的としている。理論上、多くのQKDプロトコルは無条件で安全であることが証明されているが、実験的なQKD実装の実際のセキュリティ分析は、通常、可能なすべての抜け穴を考慮していない。本稿では、まず、単一量子ビットロスレスシステムにおいて、離散変数QKD(測定デバイスに依存しないQKDにも適用できる)の実用的実装に対して、セキュアなキーレートの計算方法を提案する。我々は,本手法がQKDの実践的実現を解析,ベンチマーク,標準化するための標準ツールの1つになることを願っている。 Quantum key distribution (QKD) protocols aim at allowing two parties to generate a secret shared key. While many QKD protocols have been proven unconditionally secure in theory, practical security analyses of experimental QKD implementations typically do not take into account all possible loopholes, and practical devices are still not fully characterized for obtaining tight and realistic key rates. We present a simple method of computing secure key rates for any practical implementation of discrete-variable QKD (which can also apply to measurement-device-independent QKD), initially in the single-qubit lossless regime, and we rigorously prove its unconditional security against any possible attack. We hope our method becomes one of the standard tools used for analysing, benchmarking, and standardizing all practical realizations of QKD.	翻訳日:2023-01-28 14:33:46 公開日:2022-11-28
# 分散ボース-アインシュタイン凝縮体の量子バックアクション限界 Quantum Back-action Limits in Dispersively Measured Bose-Einstein Condensates ( http://arxiv.org/abs/2209.04400v2 ) ライセンス: Link先を確認	Emine Altuntas and Ian B. Spielman	(参考訳) 量子力学の基本的な理論は、測定がシステムの波動関数を、観測者がいなくても測定結果と最も一致するものに変化させることである。弱測定はシステムの限られた情報のみを生成し、結果としてシステムの状態を最小限に変化させる。ここでは、原子ボース・アインシュタイン凝縮における量子バックアクションと遠方共振レーザービームとの相互作用を理論的に実験的に特徴付ける。この過程を,環境が散乱光を測定する量子軌道法を用いて理論的に記述し,理想的な光検出機構に基づく測定モデルを提案する。ラムゼー干渉計のコントラストの観点で導波関数の変化を実験的に定量化し,測定過程に伴う寄生効果を制御した。観測されたバックアクションは、我々の測定モデルとよく一致しており、量子ガスの真の量子バックアクション制限測定を可能にする。 A fundamental tenet of quantum mechanics is that measurements change a system's wavefunction to that most consistent with the measurement outcome, even if no observer is present. Weak measurements produce only limited information about the system, and as a result only minimally change the system's state. Here, we theoretically and experimentally characterize quantum back-action in atomic Bose-Einstein condensates interacting with a far-from resonant laser beam. We theoretically describe this process using a quantum trajectories approach where the environment measures the scattered light and present a measurement model based on an ideal photodetection mechanism. We experimentally quantify the resulting wavefunction change in terms of the contrast of a Ramsey interferometer and control parasitic effects associated with the measurement process. The observed back-action is in good agreement with our measurement model, enabling true quantum back-action limited measurements of quantum gases.	翻訳日:2023-01-27 05:21:36 公開日:2022-11-28
# 超伝導量子ビットによる量子コンピューティングの将来 The Future of Quantum Computing with Superconducting Qubits ( http://arxiv.org/abs/2209.06841v2 ) ライセンス: Link先を確認	Sergey Bravyi, Oliver Dial, Jay M. Gambetta, Dario Gil, and Zaira Nazario	(参考訳) 史上初めて、量子処理ユニット(QPU)の出現とともに、コンピューティングパラダイムにおいて分岐点が見られます。超多項式スピードアップによる計算の可能性を抽出し、量子アルゴリズムを実現するには、量子誤り訂正技術の大幅な進歩が必要になる可能性が高い。一方、短期的に計算上の優位性を達成するためには、回路編み技術による複数のQPUの組み合わせ、エラー抑制と緩和による解の質の向上、漸近的なスピードアップによる量子アルゴリズムのヒューリスティックバージョンに焦点を当てることが考えられる。そのためには、量子コンピューティングハードウェアの性能を改善し、ソフトウェアは量子中心のスーパーコンピュータと呼ばれる新しいアーキテクチャを形成するために、量子プロセッサと古典プロセッサをシームレスに統合する必要がある。長期的には、2Dトポロジ以上の量子ビット接続を利用してより効率的な量子エラー訂正コードを実現するハードウェアや、QPUのスケーリングとワークロードの並列化のためのモジュールアーキテクチャ、ユーザの目に見えない技術的複雑さを実現し、ユビキタスで摩擦のない量子コンピューティングの目標を実現するソフトウェアなどがあります。 For the first time in history, we are seeing a branching point in computing paradigms with the emergence of quantum processing units (QPUs). Extracting the full potential of computation and realizing quantum algorithms with a super-polynomial speedup will most likely require major advances in quantum error correction technology. Meanwhile, achieving a computational advantage in the near term may be possible by combining multiple QPUs through circuit knitting techniques, improving the quality of solutions through error suppression and mitigation, and focusing on heuristic versions of quantum algorithms with asymptotic speedups. For this to happen, the performance of quantum computing hardware needs to improve and software needs to seamlessly integrate quantum and classical processors together to form a new architecture that we are calling quantum-centric supercomputing. Long term, we see hardware that exploits qubit connectivity in higher than 2D topologies to realize more efficient quantum error correcting codes, modular architectures for scaling QPUs and parallelizing workloads, and software that evolves to make the intricacies of the technology invisible to the users and realize the goal of ubiquitous, frictionless quantum computing.	
翻訳日:2023-01-26 16:51:57 公開日:2022-11-28
# 脱局在質量の再結合における重力子放出条件 Conditions for graviton emission in the recombination of a delocalized mass ( http://arxiv.org/abs/2209.10355v2 ) ライセンス: Link先を確認	Alessandro Pesci	(参考訳) 既知のゲダンケン実験では、非局在化質量は再結合され、それによって引き起こされる重力場は別の(距離のある)粒子によって探される。これにより、重力場が重なり合った位置と絡み合うと生じる相補性と因果関係の間の緊張関係を探究することができる。提案された解決法は四極子モーメントからのグラビトン放出である:因果的に分離されたソースとプローブに対して、どの経路を許容するのに十分なモーメントが異なるとき、それらはまた再結合におけるグラビトン放出を暗示する(明示的に計算する必要はない)。ここでは、非局在化粒子(プローブとゲダンケンの実験の鍛造)に焦点を当て、重力子放出の条件(質量、分離、再結合時間)を探索する。これにより、再結合における四極子モーメントのバリエーションは、非局在状態のエネルギー期待値(後者の場合のモーメント変動 $\sim m \, d^2$, with $m$ mass, $d$ separation)に置き換わる場合に比べて、体が絡み合っている場合と比べ、一般的に大きく増加することが分かる。また、グラビトン放出の制限組換え時間は、$\sqrt{m}$の代わりに$m$として増加する。この場合、プランク質量はしきい値質量(重く、非局在化された物体)として作用し、その下に重力子放出は生じないが、再結合の速度は速い。もしこれがディオシとペンローズの崩壊モデル(基本形)で予想される崩壊時間と比較された場合、再結合による(四極子)グラビトン放出は不可能であることが分かる。実際、m$が排出を許容できるほど大きくなれば、重ね合わせが再結合するのに十分な期間崩壊に耐えるには大きすぎる。 In a known gedanken experiment a delocalized mass is recombined while the gravitational field sourced by it is probed by another (distant) particle. This allows to explore a possible tension between complementarity and causality, arising if the gravitational field entangles with the superposed locations. A proposed resolution is graviton emission from quadrupole moments: when, for causally disconnected source and probe, the moments differ enough to allow which path, they also imply graviton emission in the recombination (no need to explicitly compute them). Here we focus on the delocalized particle (forgetting about the probe and the gedanken experiment) and explore the conditions (in terms of mass, separation, recombination time) for graviton emission. Doing this we find that the variations of quadrupole moments in the recombination are generically greatly enhanced if the field is entangled compared to if it is sourced instead by the energy expectation value on the delocalized state (moment variation $\sim m \, d^2$ in the latter case, with $m$ mass, $d$ separation). 
Also, we get a limit recombination time for graviton emission growing as $m$ in place of $\sqrt{m}$. In this the Planck mass acts as threshold mass (huge, for delocalized objects): no graviton emission is possible below it, however fast the recombination occurs. If this is compared with the decay times foreseen in the collapse models of Diosi and Penrose (in their basic form), one finds that no (quadrupole) graviton emission from recombination is possible in them. Indeed, right when $m$ becomes large enough to allow for emission it becomes too large for the superposition to survive collapse long enough to recombine.	翻訳日:2023-01-25 20:46:54 公開日:2022-11-28
# 可変超伝導量子ビットを用いた量子センシング:最適化と高速化 Quantum sensing with tunable superconducting qubits: optimization and speed-up ( http://arxiv.org/abs/2211.08344v3 ) ライセンス: Link先を確認	Sergey Danilin, Nicholas Nugent, Martin Weides	(参考訳) センシングとメトロロジーは、より正確なデータセットの必要性を常に満たし、研究者が理論モデルの妥当性についてより信頼できる結論を出すことによって、基礎科学や応用において重要な役割を果たす。センサーはユビキタスです。これらは重力イメージング、地質学、ナビゲーション、セキュリティ、タイムキーピング、分光、化学、磁気測定、医療、医療など幅広い分野のアプリケーションで使われている。量子技術の現在の進歩は、必然的に新しい能力を持つセンサーとしての量子システムの使用を探求するきっかけとなった。本稿では、波長可変トランスモン量子ビットセンサを用いたキタエフ位相推定アルゴリズムによる外部磁束の量子エンハンスセンシングの最適化について述べる。最大量子ビット遷移周波数の異なるセンサに対して最適なフラックス偏差点を提供する。所定の設計に対してデコヒーレンス率の推定を行う。センシングに2-$と3-$qubitのエンタングル状態を使用することは、単一のqubitケースとシミュレーションで比較される。フラックスセンシング精度は10^{-8}\cdot\Phi_0$に達し、時間とともに$\sim\ 1/t$とスケールする。 Sensing and metrology play an important role in fundamental science and applications by fulfilling the ever-present need for more precise data sets and by allowing researchers to make more reliable conclusions on the validity of theoretical models. Sensors are ubiquitous. They are used in applications across a diverse range of fields including gravity imaging, geology, navigation, security, timekeeping, spectroscopy, chemistry, magnetometry, healthcare, and medicine. Current progress in quantum technologies has inevitably triggered the exploration of the use of quantum systems as sensors with new and improved capabilities. This article describes the optimization of the quantum-enhanced sensing of external magnetic fluxes with a Kitaev phase estimation algorithm based on a sensor with tunable transmon qubits. It provides the optimal flux biasing point for sensors with different maximal qubit transition frequencies. An estimation of decoherence rates is made for a given design. The use of $2-$ and $3-$qubit entangled states for sensing are compared in simulation with the single qubit case. The flux sensing accuracy reaches $10^{-8}\cdot\Phi_0$ and scales with time as $\sim\ 1/t$ which proves the speed-up of sensing with high ultimate accuracy.	
翻訳日:2023-01-19 12:29:20 公開日:2022-11-28
# 未決定ダイソン・シュウィンガー方程式 Underdetermined Dyson-Schwinger equations ( http://arxiv.org/abs/2211.13026v2 ) ライセンス: Link先を確認	Carl M. Bender, Christos Karapoulitidis and S.P. Klevansky	(参考訳) 本稿では、ダイソン・シュウィンガー方程式(ds)を量子場理論における計算ツールとしての有効性について検討する。 DS方程式は、場の理論の連結グリーン函数$G_n$によって正確に満たされる結合方程式の無限列である。これらの方程式は、より高次のグリーン函数に結合し、それらが切り離された場合、結果として生じる有限な方程式体系は過小評価される。未決定系を解く最も単純な方法は、すべての高次グリーン関数を 0 に設定し、最初の数個のグリーン関数に対して得られた決定系を解くことである。得られた$g_1$ または $g_2$ so は、解決可能なモデルの正確な結果と比較でき、高次切り換えの精度が向上するかどうかを確認することができる。 hermitian $\phi^4$ と $\phi^6$ と non-hermitian $i\phi^3$, $-\phi^4$, $i\phi^5$ の5つのモデルが研究されている。切断されたds方程式は、緩やかに制限値に収束する近似値の列を与えるが、この制限値は常に正確な値と数パーセント異なる。平均場的近似に基づくより洗練されたトランケーションスキームは、この恐ろしい計算問題を解決しない。 This paper examines the effectiveness of the Dyson-Schwinger (DS) equations as a calculational tool in quantum field theory. The DS equations are an infinite sequence of coupled equations that are satisfied exactly by the connected Green's functions $G_n$ of the field theory. These equations link lower to higher Green's functions and, if they are truncated, the resulting finite system of equations is underdetermined. The simplest way to solve the underdetermined system is to set all higher Green's function(s) to zero and then to solve the resulting determined system for the first few Green's functions. The $G_1$ or $G_2$ so obtained can be compared with exact results in solvable models to see if the accuracy improves for high-order truncations. Five $D=0$ models are studied: Hermitian $\phi^4$ and $\phi^6$ and non-Hermitian $i\phi^3$, $-\phi^4$, and $i\phi^5$ theories. The truncated DS equations give a sequence of approximants that converge slowly to a limiting value but this limiting value always {\it differs} from the exact value by a few percent. More sophisticated truncation schemes based on mean-field-like approximations do not fix this formidable calculational problem.	翻訳日:2023-01-19 01:32:08 公開日:2022-11-28
# 増幅ファイバリンクにおける絡み合い支援通信のスケーリング Scaling of Entanglement-Assisted Communication in Amplified Fiber Links ( http://arxiv.org/abs/2211.13296v2 ) ライセンス: Link先を確認	Simon Sekavčnik and Janis Nötzel	(参考訳) 量子通信技術はいくつかの高度な戦略を提供する。しかし、その実践的利用はしばしばよく理解されていない。本稿では,増幅ファイバリンクにおける理論的な通信容量スケーリングについて概説する。ファイバが十分な帯域幅と空間モードを提供する場合に,事前共有されたエンタングルメントによる支援が任意のキャパシティを提供するシナリオを提示する。従来の古典的手法や非支援量子技術に対する将来的な容量の優位性は、潜在的に無限大である。我々はこの理論的な観察を、ファイバ開発の現在の動向に関連して論じる。 Quantum communication technology offers several advanced strategies. However, their practical use is often still not well understood. In this work we outline the theoretical communication capacity scaling in amplified fiber links. We present a scenario in which the assistance via pre-shared entanglement offers an arbitrary capacity, given that enough bandwidth and spatial modes are provided by the fiber. The future capacity advantage over conventional classical techniques as well as non-assisted quantum techniques is potentially infinite. We discuss this theoretical observation in connection to current trends in fiber development.	翻訳日:2023-01-19 01:12:50 公開日:2022-11-28
# 量子オットーサイクルによるエナンチオマー検出 Enantiomer detection via Quantum Otto cycle ( http://arxiv.org/abs/2211.06888v2 ) ライセンス: Link先を確認	Mohsen Izadyari and M. Tahir Naseem and Özgür E. Müstecaplıoğlu	(参考訳) エナンチオマーは、右手型と左手型の配座で存在するキラル分子である。エナンチオマー検出の光学的手法は、左手型分子と右手型分子の識別に広く用いられている。しかし、エナンチオマーのスペクトルが同一であるため、エナンチオマーの検出は非常に困難な課題である。本稿では, 熱力学的プロセスを利用したエナンチオマー検出の可能性を検討する。特に、周期的な光遷移を持つ3準位系によって記述されるキラル分子を作業媒体とする量子オットーサイクルを用いる。 3準位系の各エネルギー遷移は、外部レーザー駆動と結合される。ドライブの全体位相をサイクルの制御パラメータとすると、左手型分子は熱エンジンとして機能し、右手型分子は熱加速器として機能することがわかった。さらに、レーザー駆動の離調を制御パラメータとすると、左手型と右手型のどちらの分子も熱エンジンとして機能する。しかし、両方の場合で抽出される仕事と効率が定量的に大きく異なるため、分子は依然として区別できる。したがって、オットーサイクルの仕事分布を評価することにより、左手型と右手型の分子を区別することができる。 Enantiomers are chiral molecules that exist in right-handed and left-handed conformations. Optical techniques for enantiomer detection are widely employed to discriminate between left- and right-handed molecules. However, identical spectra of enantiomers make enantiomer detection a very challenging task. Here, we investigate the possibility of exploiting thermodynamic processes for enantiomer detection. In particular, we employ a quantum Otto cycle, in which a chiral molecule described by a three-level system with cyclic optical transitions is considered a working medium. Each energy transition of the three-level system is coupled with an external laser drive. We find that the left-handed molecule works as a heat engine, while the right-handed molecule works as a thermal accelerator where the overall phase of the drives is considered as the cycle's control parameter. In addition, both left- and right-handed molecules work as heat engines by considering laser drives' detuning as the control parameter. However, the molecules can still be distinguished because both cases' extracted work and efficiency are quantitatively very different. Accordingly, left and right-handed molecules can be distinguished by evaluating the work distribution in the Otto cycle.	翻訳日:2023-01-18 07:27:05 公開日:2022-11-28
# 光子計数ベルテストのEberhard極限と量子鍵分布におけるその有用性 Eberhard limit for photon-counting Bell tests and its utility in quantum key distribution ( http://arxiv.org/abs/2211.15033v1 ) ライセンス: Link先を確認	Thomas McDermott, Morteza Moradi, Antoni Mikos-Nuszkiewicz, Magdalena Stobińska	(参考訳) 抜け穴のないベルテストは、任意の抜け穴がプロトコルのセキュリティを損なう可能性があるため、デバイス非依存の量子鍵分布(QKD)を実行したい場合には不可欠である。エバーハルトによる重要な研究は、弱くエンタングルした2量子ビット状態が最大エンタングル状態よりも検出ループホールに対してはるかに高い耐性を持ち、2/3を超える検出効率でループホールを閉じられることを示した。ここでは、2モード圧縮真空や一般化ホランド・バーネット状態のような高次元多光子状態の非局所性を証明できる光子計数CHSHベル試験についても、この限界が成り立つことを示す。実際、これらのテストはある意味で普遍的であり、2つのモードが光子数でよく相関している限り、任意の多光子二成分状態に対して実現可能な検出ループホールフリーのテストを可能にするという証拠を示す。さらに、典型的な2入力2出力のベルシナリオを超えて、エバーハルト限界に合致する光子計数CGLMP不等式も存在することを示し、よりエキゾチックなループホールフリーのベル試験への道を開く。最後に,非最大エンタングル状態のこの高い損失耐性を利用することで,光子計数テストに基づくQKDプロトコルの鍵レートと損失耐性を向上できることを示す。 Loophole-free Bell tests are essential if one wishes to perform device-independent quantum key distribution (QKD), since any loophole could be used by a potential adversary to undermine the security of the protocol. Crucial work by Eberhard demonstrated that weakly entangled two-qubit states have a far greater resistance to the detection loophole than maximally entangled states, allowing one to close the loophole with detection efficiency greater than 2/3. Here we demonstrate that this same limit holds for photon-counting CHSH Bell tests which can demonstrate non-locality for higher dimensional multiphoton states such as two-mode squeezed vacuum and generalized Holland-Burnett states. In fact, we show evidence that these tests are in some sense universal, allowing feasible detection loophole-free tests for any multiphoton bipartite state, as long as the two modes are well correlated in photon number. Additionally, by going beyond the typical two-input two-output Bell scenario, we show that there are also photon-counting CGLMP inequalities which can also match the Eberhard limit, paving the way for more exotic loophole-free Bell tests. 
Finally we show that by exploiting this increased loss tolerance of non maximally entangled states, one can increase the key rates and loss tolerances of QKD protocols based on photon-counting tests.	翻訳日:2023-01-17 15:19:02 公開日:2022-11-28
# 開量子多体系におけるデコヒーレンス過程の準粒子:インコヒーレントン Quasiparticles of Decoherence Processes in Open Quantum Many-Body Systems: Incoherentons ( http://arxiv.org/abs/2211.14991v1 ) ライセンス: Link先を確認	Taiki Haga, Masaya Nakagawa, Ryusuke Hamazaki, Masahito Ueda	(参考訳) 開量子系の緩和ダイナミクスは、系のコヒーレントハミルトン力学と環境との相互作用による散逸力学との競合によって決定される。したがって、コヒーレント体制から非コヒーレント体制への移行を理解することは基本的な関心事である。ヒッヘルト非認識準粒子(インコヒーレントン)は、開量子多体系の力学を支配するリウヴィリア超作用素の固有モデムにおけるコヒーレント-非コヒーレント遷移を記述する。ここで、インコヒーレントンは、系の密度行列を表す補助ラダー系において、鎖間結合状態として定義される。リウヴィリアン固有モードは、関連するインコヒーレントンの数を反映する異なる減衰率を持つ群に分類される。また、固有モードの異なるグループを分離するスペクトルギャップ(量子コヒーレンスギャップ)も導入します。我々は, 劣化を受ける格子ボソンモデルにおけるインコヒーレントンの存在を実証し, インコヒーレントンが分解されると量子コヒーレンスギャップが閉じることを示し, 指数的崩壊による非コヒーレント緩和からコヒーレント振動緩和への動的遷移を示す。さらに, 量子多体系のデコヒーレンスダイナミクスが, インコヒーレントンの生成, 局在, 拡散の観点でどのように理解できるかを考察する。 The relaxation dynamics of an open quantum system is determined by the competition between the coherent Hamiltonian dynamics of a system and the dissipative dynamics due to interactions with environments. It is therefore of fundamental interest to understand the transition from the coherent to incoherent regimes. We find that hitherto unrecognized quasiparticles -- incoherentons -- describe this coherent-to-incoherent transition in eigenmodes of a Liouvillian superoperator that governs the dynamics of an open quantum many-body system. Here, an incoherenton is defined as an interchain bound state in an auxiliary ladder system that represents the density matrix of a system. The Liouvillian eigenmodes are classified into groups with different decay rates that reflect the number of incoherentons involved therein. We also introduce a spectral gap -- quantum coherence gap -- that separates the different groups of eigenmodes. 
We demonstrate the existence of incoherentons in a lattice boson model subject to dephasing, and show that the quantum coherence gap closes when incoherentons are deconfined, which signals a dynamical transition from incoherent relaxation with exponential decay to coherent oscillatory relaxation. Furthermore, we discuss how the decoherence dynamics of quantum many-body systems can be understood in terms of the generation, localization, and diffusion of incoherentons.	翻訳日:2023-01-17 15:18:36 公開日:2022-11-28
# 量子交互演算子アンザッツによる最小完全被覆問題の解法 Quantum Alternating Operator Ansatz for Solving the Minimum Exact Cover Problem ( http://arxiv.org/abs/2211.15266v1 ) ライセンス: Link先を確認	Sha-Sha Wang, Hai-Ling Liu, Su-Juan Qin, Fei Gao, and Qiao-Yan Wen	(参考訳) 最小完全被覆(MEC)は一般的な組合せ最適化問題であり、テールアサインメントや車両ルーティングに広く応用されている。本稿では,MEC 問題を解くために量子交互演算子 ansatz (QAOA+) を用いる。詳しくは、自明な実現可能解を得るために、まずMECを2つの目的関数を持つ制約付き最適化問題に変換する。次に,線形重み付き和法を用いて上記の制約付き最適化問題を解き,対応するターゲットハミルトニアンを構成する。最後に,本アルゴリズムの性能向上のために,6,8,10量子ビットのインスタンスに対するシミュレーションでパラメータ固定方式を採用する。数値計算の結果,アルゴリズムのレベル$p$が低い場合でも,高い確率で解が得られることがわかった。さらに、シングルキュービット回転ゲート$R_Z$を除去して量子回路を最適化する。$p$レベルの最適化回路では量子ゲートの数が$np$だけ減少することがわかった。さらに、$p$レベル最適化回路は$p$個のパラメータしか必要とせず、$2p$個のパラメータを持つオリジナル回路と同様の実験的な効果を実現できる。 The minimum exact cover (MEC) is a common combinatorial optimization problem, with wide applications in tail-assignment and vehicle routing. In this paper, we adopt the quantum alternating operator ansatz (QAOA+) to solve the MEC problem. In detail, to obtain a trivial feasible solution, we first transform MEC into a constrained optimization problem with two objective functions. Then, we adopt the linear weighted sum method to solve the above constrained optimization problem and construct the corresponding target Hamiltonian. Finally, to improve the performance of this algorithm, we adopt a parameter-fixing strategy in simulations, where the experimental instances have 6, 8, and 10 qubits. The numerical results show that the solution can be obtained with high probability when the level $p$ of the algorithm is low. Besides, we optimize the quantum circuit by removing single-qubit rotation gates $R_Z$. We find that the number of quantum gates is reduced by $np$ for the $p$-level optimized circuit. Furthermore, the $p$-level optimized circuit only needs $p$ parameters, and can achieve an experimental effect similar to that of the original circuit with $2p$ parameters.	翻訳日:2023-01-17 15:09:18 公開日:2022-11-28
# 動くUnruh-deWitt検出器の量子相関とコヒーレンス Quantum Correlations and Coherence in a Moving Unruh-deWitt Detector ( http://arxiv.org/abs/2211.15263v1 ) ライセンス: Link先を確認	S. Bhuvaneswari, R. Muthuganesan and R. Radha	(参考訳) 本稿では,スカラー場に結合した2つの加速するUnruh-deWitt検出器の3 + 1次元ミンコフスキー時空における量子相関とコヒーレンスについて検討する。無限加速度の極限でエンタングルメントは完全に破壊されるが、局所量子不確かさとコヒーレンスのl1ノルムはゼロにならないことを示す。さらに,初期状態の異なる選択に対して,ウンルー温度と検出器のエネルギー間隔が量子相関に果たす役割についても注目する。 In this paper, we investigate the quantum correlations and coherence of two accelerating Unruh-deWitt detectors coupled to a scalar field in 3 + 1 Minkowski space-time. We show that the entanglement is completely destroyed in the limit of infinite acceleration while the local quantum uncertainty and l1-norm of coherence remain nonzero. In addition, we also highlight the role of Unruh temperature and energy spacing of detectors on quantum correlations for different choices of initial states.	翻訳日:2023-01-17 15:09:00 公開日:2022-11-28
# ランダムイジングモデルのためのディープラーニング最適量子アニールスケジュール Deep learning optimal quantum annealing schedules for random Ising models ( http://arxiv.org/abs/2211.15209v1 ) ライセンス: Link先を確認	Pratibha Raghupati Hegde, Gianluca Passarelli, Giovanni Cantele, and Procolo Lucignano	(参考訳) 量子アドバンテージへの競争における重要なステップは、アドホックアニーリングスケジュールを用いた量子アニーリングの最適化である。この分野の最近の進展に触発されて、我々は、正規グラフ上の(ランダム)重み付きMax-Cutの最適アニーリングスケジュールの探索を自動化するために、長・短期記憶(LSTM)ニューラルネットワークを用いることを提案する。局所断熱的なアニーリング経路を用いてネットワークを訓練することにより、未知のインスタンスや、訓練に用いたものより大きなグラフに対しても、最適なアニーリングスケジュールを予測できる。 A crucial step in the race towards quantum advantage is optimizing quantum annealing using ad-hoc annealing schedules. Motivated by recent progress in the field, we propose to employ long short-term memory (LSTM) neural networks to automate the search for optimal annealing schedules for (random) weighted Max-Cut on regular graphs. By training our network using locally adiabatic annealing paths, we are able to predict optimal annealing schedules for unseen instances and even larger graphs than those used for training.	翻訳日:2023-01-17 15:08:52 公開日:2022-11-28
# 量子状態の数学的モデリングにおけるMajorana表現 Majorana Representation in Mathematical Modeling of Quantum States ( http://arxiv.org/abs/2211.15113v1 ) ライセンス: Link先を確認	Farhod Shokir	(参考訳) 本稿では、Majorana法を用いてスピン$S=j\hbar$の量子系の状態の数学的モデリングを行う。一般の場合$j \geq 0.5$における配向状態の相関関数の式を得る。 In this paper, the Majorana method is used for mathematical modeling of the states of quantum systems with spin $S=j\hbar$. An expression for the correlation functions of oriented states in the general case $j \geq 0.5$ is obtained.	翻訳日:2023-01-17 15:08:11 公開日:2022-11-28
# 散逸キラル分子の放射線に対するエナンチオ選択的スイッチ Enantioselective switch on radiations of dissipative chiral molecules ( http://arxiv.org/abs/2211.15112v1 ) ライセンス: Link先を確認	Chong Ye, Xiaowei Mu, Yifan Sun, Libin Fu, and Xiangdong Zhang	(参考訳) エナンチオ検出は自然科学において重要かつ困難な課題である。今日では、キラル分子の脱コヒーレンス非環状三レベルモデルに基づく光学的エナンチオデッション法は、分子応答におけるエナンチオ選択性の究極の限界に達することができる。したがって、従来のキロプティカル法よりも効率的である。しかしながら、脱コヒーレンスは避けられず、これらの高度な光学的手法のエナンチオ選択性を著しく低減することができるため、弱い脱コヒーレンス領域ではうまく機能する。本稿では,散逸性キラル分子の放射線に対するエナンチオ選択的スイッチを提案し,全てのデコヒーレンス領域において新しいエナンチオ検出法を開発した。提案方式では, 選択したエナンチオマーに対して放射線を照射し, 消散性三レベルモデルに基づいて電磁界をよく設計し, ミラー画像に対して同時に消光する。キラル混合物のエナンチオマー過剰は、2つのエナンチオマーの放射がそれぞれオフになっている2つのケースでその放出を比較することにより決定される。対応するエナンチオ選択性は、全てのデコヒーレンス領域において究極の限界に達し、エナンチオ検出における他のキロプティカル手法よりもスキームのアドバンテージを提供する。本研究は, すべての脱コヒーレンス領域において, より効率的なエナンチオディッション技術を開発するための出発点となる可能性がある。 Enantiodetection is an important and challenging task across natural science. Nowadays, some chiroptical methods of enantiodetection based on decoherence-free cyclic three-level models of chiral molecules can reach the ultimate limit of the enantioselectivities in the molecular responses. They are thus more efficient than traditional chiroptical methods. However, decoherence is inevitable and can severely reduce enantioselectivities in these advanced chiroptical methods, so they only work well in the weak decoherence region. Here, we propose an enantioselective switch on the radiation of dissipative chiral molecules and develop a novel chiroptical method of enantiodetection working well in all decoherence regions. In our scheme, radiation is turned on for the selected enantiomer and simultaneously turned off for its mirror image by designing the electromagnetic fields well based on dissipative cyclic three-level models. The enantiomeric excess of a chiral mixture is determined by comparing its emissions in two cases, where the radiations of two enantiomers are turned off respectively. 
The corresponding enantioselectivities reach the ultimate limit in all decoherence regions, offering our scheme advantages over other chiroptical methods in enantiodetection. Our work potentially constitutes the starting point for developing more efficient chiroptical techniques for enantiodection in all decoherence regions.	翻訳日:2023-01-17 15:08:07 公開日:2022-11-28
# チャネル識別のための利益のある絡み合い Profitable entanglement for channel discrimination ( http://arxiv.org/abs/2211.15108v1 ) ライセンス: Link先を確認	Samad Khabbazi Oskouei, Stefano Mancini, Milajiguli Rexiti	(参考訳) 本研究では,2つの一般量子ビットチャネルの識別における側絡の有用性について検討し,それが拡張する条件(および成功確率が向上しない条件)を決定する。これは、まず、完全正およびトレース保存されたキュービット線型写像の集合において極端であるチャネルの問題を解析し、次にそのような集合の内部にあるチャネルについて構成的に行われる。 We investigate the usefulness of side entanglement in discriminating between two generic qubit channels and determine exact conditions under which it does enhance (as well as conditions under which it does not) the success probability. This is done in a constructive way by first analyzing the problem for channels that are extremal in the set of completely positive and trace-preserving qubit linear maps and then for channels that are inside such a set.	翻訳日:2023-01-17 15:07:40 公開日:2022-11-28
# 4光子GHZ状態のオンチップ高忠実度生成 High-fidelity generation of four-photon GHZ states on-chip ( http://arxiv.org/abs/2211.15626v1 ) ライセンス: Link先を確認	Mathias Pont, Giacomo Corrielli, Andreas Fyrillas, Iris Agresti, Gonzalo Carvacho, Nicolas Maring, Pierre-Emmanuel Emeriau, Francesco Ceccarelli, Ricardo Albiero, Paulo H. D. Ferreira, Niccolo Somaschi, Jean Senellart, Isabelle Sagnes, Martina Morassi, Aristide Lemaitre, Pascale Senellart, Fabio Sciarrino, Marco Liscidini, Nadia Belabas, Roberto Osellame	(参考訳) 相互に絡み合った多光子状態は、全光学量子技術の核心にある。自由空間装置を用いた量子光の発生において顕著な進展が報告されているが、将来の拡張性には高忠実・高レートなオンチップエンタングルメント生成が不可欠である。本研究では,明るい量子ドットベースの単一光子源を用いて,低損失の再構成可能なガラスフォトニック回路による4光子グリーンバーガー・ホーン・ツァイリンガー(GHZ)状態の高忠実度生成を実証する。完全な量子状態トモグラフィーを用いて生成状態の密度行列を再構成し、ターゲットである$|{\text{GHZ}_4}\rangle$に対する実験的忠実度$\mathcal{F}_{\text{GHZ}_4}=(86.0\pm0.4)\,\%$と純度$\mathcal{P}_{\text{GHZ}_4}=(76.3\pm0.6)\,\%$を得る。生成した状態の絡み合いは、39標準偏差を超えるベル型不等式の破れによる半デバイス非依存のアプローチで認証される。最後に、4者間の量子秘密共有プロトコルをチップ上で実行し、レギュレータが3者の相手と最大1978ビットのシフト鍵を共有し、量子ビット誤り率$10.87\,\%$を達成する。これらの結果は、チップ上の絡み合い生成のためのガラスフォトニック回路と組み合わされた量子ドット技術が、中間スケールの量子計算と通信に有効な経路を提供することを示している。 Mutually entangled multi-photon states are at the heart of all-optical quantum technologies. While impressive progress has been reported in the generation of such quantum light states using free-space apparatus, high-fidelity high-rate on-chip entanglement generation is crucial for future scalability. In this work, we use a bright quantum-dot based single-photon source to demonstrate the high-fidelity generation of 4-photon Greenberger-Horne-Zeilinger (GHZ) states with a low-loss reconfigurable glass photonic circuit. We reconstruct the density matrix of the generated states using full quantum-state tomography reaching an experimental fidelity to the target $|{\text{GHZ}_4}\rangle$ of $\mathcal{F}_{\text{GHZ}_4}=(86.0\pm0.4)\,\%$, and a purity of $\mathcal{P}_{\text{GHZ}_4}=(76.3\pm0.6)\,\%$. 
The entanglement of the generated states is certified with a semi device-independent approach through the violation of a Bell-like inequality by more than 39 standard deviations. Finally, we carry out a four-partite quantum secret sharing protocol on-chip where a regulator shares with three interlocutors a sifted key with up to 1978 bits, achieving a qubit-error rate of $10.87\,\%$. These results establish that the quantum-dot technology combined with glass photonic circuitry for entanglement generation on chip offers a viable path for intermediate scale quantum computation and communication.	翻訳日:2023-01-17 15:00:45 公開日:2022-11-28
# 周期駆動型散逸双極子系のカスケードダイナミクス Cascaded dynamics of a periodically driven dissipative dipolar system ( http://arxiv.org/abs/2211.15592v1 ) ライセンス: Link先を確認	Saptarshi Saha and Rangeet Bhattacharyya	(参考訳) 最近の実験では、双極子系の周期駆動が長寿命の予熱状態をもたらすことが示されている。これらのシステムは環境に弱い結合を持ち、熱化の時間スケールよりもはるかに短い時間スケールで予熱状態に達する。このようなほぼ閉ざされた系は、以前にフロッケ形式を用いて分析され、予熱プレートの出現を示している。これらのシステムを記述するために、変動制御量子マスター方程式(FRQME)を用いる。システム-環境結合に加えて、FRQMEはシステム内の様々な局所的相互作用からの散逸効果を捉えた。調査の結果,システムの最終安定状態へのカスケード的な旅が明らかになった。カスケードは、準保存量の集合によって特徴づけられる熱前状態または捕縛状態の集合を含む。これらの熱前状態は、緩和時間スケールよりもずっと短い時間スケールで現れる。また,予熱台地の存在が止まる限界の存在を発見,報告する。 Recent experiments show that periodic drives on dipolar systems lead to long-lived prethermal states. These systems are weakly coupled to the environment and reach prethermal states in a timescale much shorter than the timescale for thermalization. Such nearly-closed systems have previously been analyzed using Floquet formalism, which shows the emergence of a prethermal plateau. We use a fluctuation-regulated quantum master equation (FRQME) to describe these systems. In addition to the system-environment coupling, FRQME successfully captures the dissipative effect from the various local interactions in the system. Our investigation reveals a cascaded journey of the system to a final steady state. The cascade involves a set of prethermal or arrested states characterized by a set of quasi-conserved quantities. We show that these prethermal states emerge in a timescale much shorter than the relaxation timescale. We also find and report the existence of a critical limit beyond which the prethermal plateau ceases to exist.	翻訳日:2023-01-17 15:00:12 公開日:2022-11-28
# 量子会議キー合意の基本的限界を克服する Overcoming fundamental bounds on quantum conference key agreement ( http://arxiv.org/abs/2211.15559v1 ) ライセンス: Link先を確認	Giacomo Carrara, Gláucia Murta and Federico Grasselli	(参考訳) ツインフィールド量子鍵分布(TF-QKD)は、中間測定ステーションで弱いコヒーレントパルス(WCP)を干渉することにより、2つの離れたパーティが共有秘密鍵を確立することを可能にする。これにより、TF-QKDは従来のQKDスキームよりも遠くまで到達でき、二部構成のプライベートキャパシティ上のリピータレスバウンドを打破できる唯一のスキームとなる。ここでは、TF-QKDを多人数シナリオに一般化する。具体的には,WCPと線形光学しか使用せず,マルチパーティデコイステート方式でセキュリティを証明する,実用的な会議鍵合意(CKA)プロトコルを提案する。本プロトコルは,任意の数の参加者が単一光子干渉によって秘密の会議鍵を確立することを可能にし,リピータを使わずに量子ネットワークで会議鍵を確立できる速度の最近の限界を克服する。 Twin-Field Quantum Key Distribution (TF-QKD) enables two distant parties to establish a shared secret key, by interfering weak coherent pulses (WCPs) in an intermediate measuring station. This allows TF-QKD to reach greater distances than traditional QKD schemes and makes it the only scheme capable of beating the repeaterless bound on the bipartite private capacity. Here, we generalize TF-QKD to the multipartite scenario. Specifically, we propose a practical conference key agreement (CKA) protocol that only uses WCPs and linear optics and prove its security with a multiparty decoy-state method. Our protocol allows an arbitrary number of parties to establish a secret conference key by single-photon interference, enabling it to overcome recent bounds on the rate at which conference keys can be established in quantum networks without a repeater.	翻訳日:2023-01-17 15:00:01 公開日:2022-11-28
# ギブス多様体 Gibbs Manifolds ( http://arxiv.org/abs/2211.15490v1 ) ライセンス: Link先を確認	Dmitrii Pavlov, Bernd Sturmfels and Simon Telen	(参考訳) ギブス多様体は指数写像の下で対称行列のアフィン空間の像である。これらは最適化、統計学、量子物理学などの応用で生まれ、トーリック幾何学の遍在的な役割を拡張する。ギブス代数多様体(Gibbs variety)は、ギブス多様体上で消えるすべての多項式の零点集合である。これらの多項式を計算し、ギブス代数多様体が低次元であることを示す。我々の理論は、行列ペンシルや量子最適輸送など、幅広いシナリオに適用される。 Gibbs manifolds are images of affine spaces of symmetric matrices under the exponential map. They arise in applications such as optimization, statistics and quantum physics, where they extend the ubiquitous role of toric geometry. The Gibbs variety is the zero locus of all polynomials that vanish on the Gibbs manifold. We compute these polynomials and show that the Gibbs variety is low-dimensional. Our theory is applied to a wide range of scenarios, including matrix pencils and quantum optimal transport.	翻訳日:2023-01-17 14:59:45 公開日:2022-11-28
# アクティブボリューム:非ローカル接続の少ない効率的なフォールトトレラント量子コンピュータのためのアーキテクチャ Active volume: An architecture for efficient fault-tolerant quantum computers with limited non-local connections ( http://arxiv.org/abs/2211.15465v1 ) ライセンス: Link先を確認	Daniel Litinski and Naomi Nickerson	(参考訳) 表面符号に基づくフォールトトレラント量子コンピュータの既存の汎用アーキテクチャでは、量子計算のコストは回路体積、すなわち非クリフォードゲート数で乗算された量子ビット数によって決定される。我々は,非2d-ローカル接続を用いたアーキテクチャを導入し,そのコストはキュービット数でスケールせず,論理演算数でのみスケールする。各論理演算は関連するアクティブボリュームを持ち、量子計算のコストを全ての演算のアクティブボリュームの和として定量化することができる。数千の論理量子ビットを持つ量子計算では、アクティブ体積は回路体積よりも桁違いに小さい。重要なことに、アーキテクチャはN論理量子ビット間の全接続を必要としない。代わりに、各論理キュービットは O(log N) の他のサイトと接続される。例えば、同じ数の論理量子ビットを用いることで、2048ビットのファクタリングアルゴリズムが、非ローカル接続のない汎用アーキテクチャよりも44倍高速に実行可能であることを示す。フォトニック量子ビットでは、長距離接続が可能であり、フォトニックコンポーネントが融合ベースのアクティブボリューム量子コンピュータの構築にどのように使われるかを示す。 In existing general-purpose architectures for surface-code-based fault-tolerant quantum computers, the cost of a quantum computation is determined by the circuit volume, i.e., the number of qubits multiplied by the number of non-Clifford gates. We introduce an architecture using non-2D-local connections in which the cost does not scale with the number of qubits, and instead only with the number of logical operations. Each logical operation has an associated active volume, such that the cost of a quantum computation can be quantified as a sum of active volumes of all operations. For quantum computations with thousands of logical qubits, the active volume can be orders of magnitude lower than the circuit volume. Importantly, the architecture does not require all-to-all connectivity between N logical qubits. Instead, each logical qubit is connected to O(log N) other sites. As an example, we show that, using the same number of logical qubits, a 2048-bit factoring algorithm can be executed 44 times faster than on a general-purpose architecture without non-local connections. 
With photonic qubits, long-range connections are available and we show how photonic components can be used to construct a fusion-based active-volume quantum computer.	翻訳日:2023-01-17 14:59:39 公開日:2022-11-28
# 鳥類コンパスのラジカルペアダイナミクスの量子シミュレーション Quantum Simulation of the Radical Pair Dynamics of the Avian Compass ( http://arxiv.org/abs/2211.15427v1 ) ライセンス: Link先を確認	Yiteng Zhang, Zixuan Hu, Yuchen Wang, and Sabre Kais	(参考訳) 量子回路上でのオープン量子ダイナミクスのシミュレーションは、近年、様々な量子アルゴリズムの開発と実証によって、幅広い関心を集めている。これらのうち、ユニタリダイレーションに基づく量子アルゴリズムの特定の設計は、一般および複雑な物理系をシミュレートすることができる。本稿では,この量子アルゴリズムを鳥のコンパスにおけるラジカル対機構のダイナミクスに応用する。このアプリケーションはIBM QASM量子シミュレータで実証される。この研究は、鳥のコンパスにおけるラジカルペア機構をシミュレートする量子アルゴリズムの最初の応用であり、これは量子アルゴリズムの一般性を実証するだけでなく、鳥類のコンパスを量子コンピューティングデバイスで研究する新たな機会を開く。 The simulation of open quantum dynamics on quantum circuits has attracted wide interest recently, with a variety of quantum algorithms developed and demonstrated. Among these, one particular design of a unitary-dilation-based quantum algorithm is capable of simulating general and complex physical systems. In this paper, we apply this quantum algorithm to simulating the dynamics of the radical pair mechanism in the avian compass. This application is demonstrated on the IBM QASM quantum simulator. This work is the first application of any quantum algorithm to simulating the radical pair mechanism in the avian compass, which not only demonstrates the generality of the quantum algorithm, but also opens new opportunities for studying the avian compass with quantum computing devices.	翻訳日:2023-01-17 14:59:20 公開日:2022-11-28
# 非エルミート量子系に対する半古典的フシミ分布 Semiclassical Husimi distributions for non-Hermitian quantum systems ( http://arxiv.org/abs/2211.15336v1 ) ライセンス: Link先を確認	Joesph Hall, Simon Malzard, and Eva-Maria Graefe	(参考訳) 非エルミート量子系におけるシュールベクトルの半古典位相空間密度を構築する。各schurベクトルは単一のプランクセルに関連付けられる。シュール状態は位相空間上の古典的ノルムの風景(非エルミート系の特徴である寿命の古典的表現)に従って組織される。この構成の一般性を示すために、混合的およびカオス的古典力学の条件下でのPT対称キックローターを非常に非自明な例に適用する。 We construct a semiclassical phase-space density of Schur vectors in non-Hermitian quantum systems. Each Schur vector is associated to a single Planck cell. The Schur states are organised according to a classical norm landscape on phase space - a classical manifestation of the lifetimes which are characteristic of non-Hermitian systems. To demonstrate the generality of this construction we apply it to a highly non-trivial example, a PT-symmetric kicked rotor in the regimes of mixed and chaotic classical dynamics.	翻訳日:2023-01-17 14:59:03 公開日:2022-11-28
# 3次元ブラックホールシミュレータによるAdS/CFT対応 AdS/CFT Correspondence with a 3D Black Hole Simulator ( http://arxiv.org/abs/2211.15305v1 ) ライセンス: Link先を確認	Aydin Deger and Jiannis K. Pachos	(参考訳) AdS/CFT対応は高エネルギー・凝縮物質物理学においても洞察に富んでいる。この対応の応用は、反ド・ジッター(AdS)ブラックホールの絡み合いエントロピーと低次元共形場理論(CFT)の双対性である。この対応を明確に示すために、3次元ブラックホール幾何がディラック場に与える影響を非均一トンネル結合を持つフェルミオンの正方格子を用いてシミュレートする。 3次元BTZブラックホール水平線をシミュレーションし、AdS空間の宇宙定数に依存する中心電荷を持つ対応する2次元CFTと一致する領域法挙動を数値的に得る。様々な3dブラックホールプロファイルの体系的な数値的研究は、全ての3dブラックホールが同じcftで表現できるエントロピーな振る舞いを与えることを示唆している。 The AdS/CFT correspondence has been insightful for high-energy and condensed matter physics alike. An application of this correspondence is the duality between the entanglement entropy of Anti-de Sitter (AdS) black holes and lower-dimensional conformal field theories (CFT). To explicitly demonstrate this correspondence we simulate the effect a 3D black hole geometry has on Dirac fields by employing a square lattice of fermions with inhomogeneous tunnelling couplings. Simulating a 3D BTZ black hole horizon, we numerically obtain an area law behaviour that is in agreement with the corresponding 2D CFT with a central charge that depends on the cosmological constant of the AdS space. A systematic numerical investigation of various 3D black hole profiles suggests that all 3D black holes give an entropic behaviour that can be represented by the same CFT.	翻訳日:2023-01-17 14:58:55 公開日:2022-11-28
# エージェントネットワークを利用した量子秘密集約 Quantum secret aggregation utilizing a network of agents ( http://arxiv.org/abs/2211.15758v1 ) ライセンス: Link先を確認	Michael Ampatzis and Theodore Andronikos	(参考訳) この研究では、スパイのネットワークが空間内の異なる場所に分散されており、各スパイがそれ単独では不完全な大きな秘密の小さな一部を持っていると仮定したとき、これらの部分的な秘密をスパイマスターに安全に送信し、組み合わせて大きな秘密を明らかにすることができるか、という課題について考察する。我々はこれを量子秘密集約問題と呼び、Aliceがスパイマスターの役割を担う量子ゲームという形で、この問題に完全な一般性で対処するプロトコルを提案する。我々のプロトコルは、アリスとすべてのスパイに対称に分配される最大エンタングルGHZタプルの使用に依存している。エージェントからスパイマスターへの小さな部分的な秘密の安全な伝達を可能にするのは、エンタングルメントの力である。追加のボーナスとして、エンタングルメントはプロトコルのセキュリティを保証し、悪名高い盗聴者イヴが大きな秘密を盗むことを統計的に起こりにくくする。 In this work we consider the following problem: given a network of spies, all distributed in different locations in space, and assuming that each spy possesses a small, but incomplete by itself part of a big secret, is it possible to securely transmit all these partial secrets to the spymaster, so that they can be combined together in order to reveal the big secret? We refer to it as the Quantum Secret Aggregation problem, and we propose a protocol, in the form of a quantum game, with Alice taking over the role of the spymaster, that addresses this problem in complete generality. Our protocol relies on the use of maximally entangled GHZ tuples, which are symmetrically distributed among Alice and all her spies. It is the power of entanglement that makes possible the secure transmission of the small partial secrets from the agents to the spymaster. As an additional bonus, entanglement guarantees the security of the protocol, by making it statistically improbable for the notorious eavesdropper Eve to steal the big secret.	翻訳日:2023-01-17 14:52:43 公開日:2022-11-28
# 中性原子量子アーキテクチャにおける使用ベースマイグレーションによるランタイムオーバーヘッドの削減 Reducing Runtime Overhead via Use-Based Migration in Neutral Atom Quantum Architectures ( http://arxiv.org/abs/2211.15757v1 ) ライセンス: Link先を確認	Andrew Litteken (1), Jonathan M. Baker (1), Frederic T. Chong (1) ((1) University of Chicago)	(参考訳) 中性原子はスケーラブルな量子コンピューティングアーキテクチャにとって有望な選択である。長距離通信やネイティブマルチビットゲートといった特徴は、通信コストと運用回数の削減を提供する。しかし、量子ビットとして用いられる閉じ込められた原子は、計算過程や環境要因の悪さにより失われる。失われた計算キュービットの値は回復できず、配列の再ロードと計算の再実行が必要となり、回路の実行数が大幅に増加する。ソフトウェア緩和戦略は存在するが、元のマッピングされた回路の位置を緩やかに使い果たし、アーキテクチャ全体に量子ビットのクラスタを分散させ、成功の確率を低下させる。私たちは、隣接するハードウェア量子ビットだけでなく、すべての到達可能な量子ビットを見つける戦略を開発することによって、柔軟性を高めます。第二に、アーキテクチャを別々のセクションに分割し、失われた原子のない各セクションで回路を実行する。アーキテクチャが十分に大きい場合は、アーキテクチャ全体をリロードすることなく回路をリセットする。これにより、アーキテクチャの30%を利用する回路で再ロードする前に有効ショット数を2倍に増やすことができる。また、これらのセクションを使用して回路の実行を並列化し、30キュービットの回路で全体の実行時間を50%削減する。これらの手法は、失われた計算空間の有害な効果と戦うための動的な新しい戦略のセットに寄与する。 Neutral atoms are a promising choice for scalable quantum computing architectures. Features such as long distance interactions and native multiqubit gates offer reductions in communication costs and operation count. However, the trapped atoms used as qubits can be lost over the course of computation and due to adverse environmental factors. The value of a lost computation qubit cannot be recovered and requires the reloading of the array and rerunning of the computation, greatly increasing the number of runs of a circuit. Software mitigation strategies exist but exhaust the original mapped locations of the circuit slowly and create more spread out clusters of qubits across the architecture, decreasing the probability of success. We increase flexibility by developing strategies that find all reachable qubits, rather than only adjacent hardware qubits. Second, we divide the architecture into separate sections, and run the circuit in each section, free of lost atoms. Provided the architecture is large enough, this resets the circuit without having to reload the entire architecture. 
This increases the number of effective shots before reloading by a factor of two for a circuit that utilizes 30% of the architecture. We also explore using these sections to parallelize execution of circuits, reducing the overall runtime by a total of 50% for a 30-qubit circuit. These techniques contribute to a dynamic new set of strategies to combat the detrimental effects of lost computational space.	翻訳日:2023-01-17 14:52:23 公開日:2022-11-28
# Hu-Paz-Zhangマスター方程式の係数の解析的評価:オーミックスペクトル密度、零温度、整合性チェック Analytical evaluation of the coefficients of the Hu-Paz-Zhang master equation: Ohmic spectral density, zero temperature, and consistency check ( http://arxiv.org/abs/2211.15722v1 ) ライセンス: Link先を確認	G. Homa, J. Z. Bernád, A. Csordás	(参考訳) ローレンツ・ドルーデ型オーミックスペクトル密度を持つゼロ温度の量子調和振動子に対するHu, Paz, Zhangの厳密なマスター方程式について検討した。このマスター方程式は量子ブラウン運動の研究において重要な役割を果たし、様々な応用において弱結合極限のような近似の対象となる。本稿では,リンドブラッド形式を持たないこの非マルコフマスター方程式の係数を解析的に評価し,弱結合極限の整合性,定常密度作用素の正値性,モデルのパラメータの境界などについて検討する。 We investigate the exact master equation of Hu, Paz, and Zhang for a quantum harmonic oscillator at zero temperature with a Lorentz-Drude type ohmic spectral density. This master equation plays an important role in the study of quantum Brownian motion and in various applications it is subject to approximations, like the weak coupling limit. In this paper, we give an analytical evaluation of the coefficients of this non-Markovian master equation without Lindblad form, which allows us to investigate consistencies of the weak coupling limit, the positivity of the stationary density operator, and the boundaries of the model's parameters.	翻訳日:2023-01-17 14:52:04 公開日:2022-11-28
# ライドバーグ配位原子を用いた光学格子中の数体アナログ量子シミュレーション Few-body analogue quantum simulation with Rydberg-dressed atoms in optical lattices ( http://arxiv.org/abs/2211.15708v1 ) ライセンス: Link先を確認	Daniel Malz and J. Ignacio Cirac	(参考訳) 光学格子内の超低温原子を用いたほとんどの実験は接触相互作用を持つため、強い相互作用の効果を観測するために1サイトあたりの約1原子の高密度で作用する。強い範囲の相互作用は、ほとんど相互作用しない粒子の物理学を探求する道を開くライドバーグドレッシングによって生成される。結晶の単位セルではなく、光学格子の部位を離散化された空間と解釈することができる。これにより、慣れ親しんだアーキテクチャで全く新しいタイプの問題を研究することができる。相互作用のスケーリング法則が異なるものの、量子化学で見られる問題に似た問題を実現する可能性について検討する。数値シミュレーションにより, 単純な擬似原子と-分子は, 最先端実験において高い忠実度で生成できることを示した。 Most experiments with ultracold atoms in optical lattices have contact interactions, and therefore operate at high densities of around one atom per site to observe the effect of strong interactions. Strong ranged interactions can be generated via Rydberg dressing, which opens the path to explore the physics of few interacting particles. Rather than the unit cells of a crystal, the sites of the optical lattice can now be interpreted as discretized space. This allows studying completely new types of problems in a familiar architecture. We investigate the possibility of realizing problems akin to those found in quantum chemistry, although with a different scaling law in the interactions. Through numerical simulation, we show that simple pseudo-atoms and -molecules could be prepared with high fidelity in state-of-the-art experiments.	翻訳日:2023-01-17 14:51:53 公開日:2022-11-28
# 宇宙デブリのための量子重力センサ Quantum Gravitational Sensor for Space Debris ( http://arxiv.org/abs/2211.15695v1 ) ライセンス: Link先を確認	Meng-Zhi Wu, Marko Toroš, Sougato Bose, Anupam Mazumdar	(参考訳) 物質波干渉計は、等価原理や重力の量子性をテストするなど、重力実験の基本的な応用がある。さらに、物質波干渉計を量子センサとして使用して、外部の巨大な移動物体による局所重力加速度を測定することで、技術応用に役立てることができる。本稿では,外部移動物体からの重力勾配信号を記述するための3次元モデルを構築し,Stern-Gerlach セットアップに基づく物質波干渉計による達成可能な感度を理論的に検討する。応用として、メソスコピック干渉(MIMAC)と重力波検出法(New J. Phys. 22 083012 (2020))について検討し、周波数空間解析を用いて重力勾配に対する感度を定量化する。我々は,地球近傍の物体と衛星近傍の宇宙デブリを考察し,その距離,速度,方向の関数として物体の最小検出可能な質量を推定する。小惑星、惑星運動、および太陽系の原始ブラックホールから重力勾配を感知する要件を推定することで、結論付けます。 Matter-wave interferometers have fundamental applications for gravity experiments such as testing the equivalence principle and the quantum nature of gravity. In addition, matter-wave interferometers can be used as quantum sensors to measure the local gravitational acceleration caused by external massive moving objects, thus lending itself for technological applications. In this paper, we will establish a three dimensional model to describe the gravity gradient signal from an external moving object, and theoretically investigate the achievable sensitivities using the matter-wave interferometer based on the Stern-Gerlach set-up. As an application we will consider the Mesoscopic Interference for Metric and Curvature (MIMAC) and Gravitational wave detection scheme [New J. Phys. 22, 083012 (2020)] and quantify its sensitivity to gravity gradients using frequency-space analysis. We will consider objects near Earth-based experiments and space debris in proximity of satellites and estimate the minimum detectable mass of the object as a function of their distance, velocity, and orientation. We will conclude by estimating the requirements to sense gravity gradients from asteroids, planetary motion and from outer solar system primordial black holes.	翻訳日:2023-01-17 14:51:41 公開日:2022-11-28
# 量子微分同相写像は不定因果順序を確定させることはできない Quantum diffeomorphisms cannot make indefinite causal order definite ( http://arxiv.org/abs/2211.15685v1 ) ライセンス: Link先を確認	Anne-Catherine de la Hamette, Viktoria Kabel, Marios Christodoulou, and Časlav Brukner	(参考訳) 不定因果順序の研究は、近年、理論的にも実験的にも急速に進展している。古典的には、時間的に分離された2つの事象 A と B の因果順序は、B の前の A か A の前の B のどちらかに固定されるが、量子論ではもはやそうではない。そこでは、因果順序の重ね合わせに遭遇しうる。位置、運動量、その他の性質の重ね合わせが参照フレームや座標系の選択に依存しうることを明らかにした量子参照フレームに関する最近の研究に照らして、これが因果順序の重ね合わせにも当てはまるかどうかという疑問が生じる。ここでは、量子微分同相に関するこの問題に対して否定的な答えを与える。まず、2つの事象間の因果順序を、世界線の一致と第3の粒子の固有時間を用いて曖昧さなく定義する。そして、そのように定義された因果順序の重ね合わせは、最も一般的な座標変換のクラス(各分岐における量子制御された独立な微分同相)を通しても確定させることができないことを示す。最後に,この結果に基づいて,不定因果順序に関する情報理論的視点と重力的視点を結びつける。 The study of indefinite causal order has seen rapid development, both theoretically and experimentally, in recent years. While classically the causal order of two timelike separated events A and B is fixed - either A before B or B before A - this is no longer true in quantum theory. There, it is possible to encounter superpositions of causal orders. In light of recent work on quantum reference frames, which reveals that the superposition of locations, momenta, and other properties can depend on the choice of reference frame or coordinate system, the question arises whether this also holds true for superpositions of causal orders. Here, we provide a negative answer to this question for quantum diffeomorphisms. First, we provide an unambiguous definition of causal order between two events in terms of worldline coincidences and the proper time of a third particle. Then, we show that superpositions of causal order defined as such cannot be rendered definite even through the most general class of coordinate transformations - quantum-controlled, independent diffeomorphisms in each branch. Finally, based on our results, we connect the information theoretic and gravitational perspectives on indefinite causal order.	翻訳日:2023-01-17 14:50:53 公開日:2022-11-28
# ユニタリ量子過程のアンシラフリー証明 Ancilla-free certification of unitary quantum processes ( http://arxiv.org/abs/2211.15647v1 ) ライセンス: Link先を確認	Wei Xie	(参考訳) 我々は,ユニタリ量子プロセスのための効率的な量子認証アルゴリズムを,アンシラを使わずに研究する。以前の研究では、未知のユニタリ$u$が既知のユニタリ$v$と同一か、または、未知のユニタリ$v$を固定次元で、o(\varepsilon^{-2})$で、choi状態が使われ、高次元のアンシラシステムが必要であるかを区別できることを示した。 2つのケースを1つのユニタリの$o(\varepsilon^{-1})$で区別するアルゴリズムを与える。 We study efficient quantum certification algorithms for unitary quantum process using no ancilla. Previous study showed that one can distinguish whether an unknown unitary $U$ is equal to or $\varepsilon$-far from a known or unknown unitary $V$ in fixed dimension with $O(\varepsilon^{-2})$ uses of the unitary, in which the Choi state is used and thus a high dimensional ancilla system is always needed. We give an algorithm that distinguishes the two cases with $O(\varepsilon^{-1})$ uses of the unitary, using fewer or no ancilla, outperforming previous relevant results.	翻訳日:2023-01-17 14:50:03 公開日:2022-11-28
# 回路オプトメカニクスによる機械運動の高速フィードバック制御 Fast feedback control of mechanical motion using circuit optomechanics ( http://arxiv.org/abs/2211.15645v1 ) ライセンス: Link先を確認	Cheng Wang, Louise Banniard, Laure Mercier de Lépinay, and Mika A. Sillanpää	(参考訳) アクティブフィードバックループを利用する計測ベースの制御は、技術における標準ツールである。フィードバック制御は、様々な量子系における純粋な量子状態の準備と安定化に使用できることから、量子技術や関連する基礎研究においても有用かつ基本的なツールとして現れている。通常、量子領域をはるかに上回る熱雑音を示すマイクロメカニカル振動子の重心運動のフィードバック冷却は特に活発に研究されており、近年では光学的測定により基底状態冷却が可能であることが示されている。ここでは,電気機械システムにおける測定に基づくフィードバック動作を実現し,増幅器の付加雑音による制限のもとで,機械的熱雑音を3量子まで冷却する。直観に反して,フィードバックなしではシステムが不安定となるブルーオプトメカニカルサイドバンドでポンプした場合にも,顕著な冷却が得られる。 Measurement-based control, utilizing an active feedback loop, is a standard tool in technology. Feedback control is also emerging as a useful and fundamental tool in quantum technology and in related fundamental studies, where it can be used to prepare and stabilize pure quantum states in various quantum systems. Feedback-cooling of center-of-mass micromechanical oscillators, which typically exhibit a high thermal noise far above the quantum regime, has been particularly actively studied and has recently been shown to allow for ground-state cooling using optical measurements. Here, we realize measurement-based feedback operations in an electromechanical system, cooling the mechanical thermal noise down to 3 quanta, limited by added amplifier noise. Counter-intuitively, we also obtain significant cooling when the system is pumped at the blue optomechanical sideband, where the system is unstable without feedback.	翻訳日:2023-01-17 14:49:47 公開日:2022-11-28
# 量子力学的トンネルの電磁アナログ Electromagnetic Analogs of Quantum Mechanical Tunnelling ( http://arxiv.org/abs/2211.16369v1 ) ライセンス: Link先を確認	Jeanne Riga and Rebecca Seviour	(参考訳) 本稿では、類似のマクロ電磁システムを用いた量子力学エミッションモデルのための検証検証法(v&v)の基礎となる理論的枠組みを提案する。転送行列を用いた量子力学と電磁磁性の対応を導出し、原子論的量子トンネルシミュレーションを固定するために使用される電磁アナログを記述する。最後に、量子力学系と電磁系を比較して、いくつかの単純で分析的に可溶な例を示し、この枠組みに基づいて将来のV&V研究の概要を述べる。 In this paper, we introduce the theoretical framework underlying our proposed methodology of verification and validation (V&V) for quantum mechanical emission models using analogous macroscopic electromagnetic systems. We derive the correspondence between quantum mechanics and electromagnetism using the transfer matrix approach, and describe the electromagnetic analog that will be used to anchor the atomistic quantum tunneling simulations. Finally, we illustrate this correspondence by comparing the quantum mechanical and electromagnetic systems for some simple, analytically soluble examples and outline future V&V work based on the framework presented here.	翻訳日:2023-01-17 14:43:18 公開日:2022-11-28
# 2モードスクイーズ状態と原子ノイズレス増幅器を用いた量子リピータ Quantum Repeater using Two-Mode Squeezed States and Atomic Noiseless Amplifiers ( http://arxiv.org/abs/2211.16343v1 ) ライセンス: Link先を確認	Anders J. E. Bjerrum and Jonatan B. Brask and Jonas S. Neergaard-Nielsen and Ulrik L. Andersen	(参考訳) 本研究では, 固体量子ビットのコレクションを用いた無ノイズ増幅法を用いて, 光子損失を受ける2モード圧縮真空状態の保存・精製について理論的に検討する。提案手法は、状態を共有する2つの当事者間の絡み合いを確率的に増大させるために用いられる。提案する増幅ステップは、量子ハサミの集合の構造に類似している。しかし、この研究において増幅ステップは、光モードから量子メモリとして機能する固体量子ビットの集合への状態移動によって実現される。我々は,エンタングル多量子ビットレジスタの生成と,長距離量子鍵分布のための量子リピータの構成という2つの異なる応用について検討する。 We perform a theoretical investigation into how a two-mode squeezed vacuum state, that has undergone photon loss, can be stored and purified using noiseless amplification with a collection of solid-state qubits. The proposed method may be used to probabilistically increase the entanglement between the two parties sharing the state. The proposed amplification step is similar in structure to a set of quantum scissors. However, in this work the amplification step is realized by a state transfer from an optical mode to a set of solid-state qubits, which act as a quantum memory. We explore two different applications, the generation of entangled many-qubit registers, and the construction of quantum repeaters for long-distance quantum key distribution.	翻訳日:2023-01-17 14:43:01 公開日:2022-11-28
# 完全グラフ上のマックスカット問題に対する再帰的量子近似最適化アルゴリズム Recursive Quantum Approximate Optimization Algorithm for the MAX-CUT problem on Complete graphs ( http://arxiv.org/abs/2211.15832v1 ) ライセンス: Link先を確認	Eunok Bae and Soojoon Lee	(参考訳) 量子近似最適化アルゴリズムは、MAX-CUT問題のような組合せ最適化問題を解くために設計されたハイブリッド量子古典的変分アルゴリズムである。近い将来の量子応用の可能性にもかかわらず、量子近似最適化アルゴリズムは、任意の定数レベル $p$ において、マックスカット問題を解くための特定のインスタンスに制限があることが知られている。近年、量子近似最適化アルゴリズムの非局所バージョンである再帰的量子近似最適化アルゴリズムが、これらの制限を克服するために提案されている。しかし、再帰的量子近似最適化アルゴリズムは、特定のインスタンスに対する元の量子近似最適化アルゴリズムよりも優れているという、主に数値的な証拠によって示されている。本研究では、再帰的量子近似最適化アルゴリズムが、近似比に関する完全グラフに対するMAX-CUT問題を解くために、元のアルゴリズムよりも競争力があることを解析的に証明する。 Quantum approximate optimization algorithms are hybrid quantum-classical variational algorithms designed to approximately solve combinatorial optimization problems such as the MAX-CUT problem. In spite of its potential for near-term quantum applications, it has been known that quantum approximate optimization algorithms have limitations for certain instances to solve the MAX-CUT problem, at any constant level $p$. Recently, the recursive quantum approximate optimization algorithm, which is a non-local version of quantum approximate optimization algorithm, has been proposed to overcome these limitations. However, it has been shown mostly by numerical evidence that the recursive quantum approximate optimization algorithm outperforms the original quantum approximate optimization algorithm for specific instances. In this work, we analytically prove that the recursive quantum approximate optimization algorithm is more competitive than the original one to solve the MAX-CUT problem for complete graphs with respect to the approximation ratio.	翻訳日:2023-01-17 14:42:48 公開日:2022-11-28
# 双曲格子上の準次元粒子のY-cubeモデルとフラクタル構造 Y-cube model and fractal structure of subdimensional particles on hyperbolic lattices ( http://arxiv.org/abs/2211.15829v1 ) ライセンス: Link先を確認	Han Yan, Kevin Slagle, Andriy H. Nevidomskyy	(参考訳) 通常の位相量子相とは異なり、フラクトン秩序は基礎となる格子幾何学に密接に依存する。本研究では,双曲平面のスタックである$H_2\times S^1$空間に埋め込まれた格子上で,Y-cubeモデルと呼ばれるX-cubeモデルの一般化を研究する。 Y-cube という名前は、X-cube のX字型頂点作用素に対応する作用素がY字型であることに由来する。ある種の双曲格子テッセレーションに対して、Y-cubeモデルは、格子のフラクタル型部分集合上でのみ動くことのできる、新しい種類の準次元粒子であるツリーオン(treeon)を持つ。このような励起は双曲幾何学にのみ現れ、平坦な空間ではツリーオンはラインオンまたはプレーンオンとなる。興味深いことに、ある種の双曲テッセレーションの場合、フラクトンは膜演算子(X-cubeモデルのように)または双曲平面内のフラクタル型演算子によって生成できる。 Unlike ordinary topological quantum phases, fracton orders are intimately dependent on the underlying lattice geometry. In this work, we study a generalization of the X-cube model, dubbed the Y-cube model, on lattices embedded in $H_2\times S^1$ space, i.e., a stack of hyperbolic planes. The name `Y-cube' comes from the Y-shape of the analog of the X-cube's X-shaped vertex operator. We demonstrate that for certain hyperbolic lattice tessellations, the Y-cube model hosts a new kind of subdimensional particle, treeons, which can only move on a fractal-shaped subset of the lattice. Such an excitation only appears on hyperbolic geometries; on flat spaces a treeon becomes either a lineon or a planeon. Intriguingly, we find that for certain hyperbolic tessellations, a fracton can be created by a membrane operator (as in the X-cube model) or by a fractal-shaped operator within the hyperbolic plane.	翻訳日:2023-01-17 14:42:34 公開日:2022-11-28
# wse$_2$単一光子エミッタを用いた弾性表面波キャビティ光力学 Surface Acoustic Wave Cavity Optomechanics with WSe$_2$ Single Photon Emitters ( http://arxiv.org/abs/2211.15811v1 ) ライセンス: Link先を確認	Sahil D. Patel, Kamyar Parto, Michael Choquer, Sammy Umezawa, Landon Hellman, Daniella Polishchuk, Galan Moody	(参考訳) 表面音響波 (SAWs) は、超伝導量子ビット、スピン、量子エミッタなど、マイクロ波から光周波数にまたがる様々な固体量子システムと共存する汎用的なツールである。ここでは, 超伝導電子回路により駆動される平板状ニオブ酸リチウム共振器上の2次元材料, 特に単層wse$_2$を用いたsaw共振器の光学特性を示す。定常フォトルミネッセンス分光法と時間分解単光子計数法を用いて、変調された2Dエミッタの時間ダイナミクスを異なるSAWキャビティモードに結合させ、30meV/%の変形ポテンシャル結合とエネルギーレベルの分裂を示す。我々はSAWからの大きな異方性ひずみを利用して、ナノ秒の時間スケールでの励起微細構造分割を変調し、2次元材料からオンデマンドに絡み合った光子対生成への応用を見出すことができる。 SAWと2D量子エミッタによるキャビティ光学は、音速、光学、超伝導電子量子システムを組み合わせた多機能統合プラットフォームにおいて、コンパクトセンサーと量子電気光学の機会を提供する。 Surface acoustic waves (SAWs) are a versatile tool for coherently interfacing with a variety of solid-state quantum systems spanning microwave to optical frequencies, including superconducting qubits, spins, and quantum emitters. Here, we demonstrate SAW cavity optomechanics with quantum emitters in 2D materials, specifically monolayer WSe$_2$, on a planar lithium niobate SAW resonator driven by superconducting electronics. Using steady-state photoluminescence spectroscopy and time-resolved single-photon counting, we map the temporal dynamics of modulated 2D emitters under coupling to different SAW cavity modes, showing energy-level splitting consistent with deformation potential coupling of 30 meV/%. We leverage the large anisotropic strain from the SAW to modulate the excitonic fine-structure splitting on a nanosecond timescale, which may find applications for on-demand entangled photon-pair generation from 2D materials. Cavity optomechanics with SAWs and 2D quantum emitters provides opportunities for compact sensors and quantum electro-optomechanics in a multi-functional integrated platform that combines phononic, optical, and superconducting electronic quantum systems.	翻訳日:2023-01-17 14:42:14 公開日:2022-11-28
# 連続波マルチパスイメージングフローサイトメトリー Continuous wave multi-pass imaging flow cytometry ( http://arxiv.org/abs/2211.15791v1 ) ライセンス: Link先を確認	Yonatan Israel, Joshua L. Reynolds, Brannon B. Klopfer, Mark A. Kasevich	(参考訳) 本稿では,ラベルフリーイメージングフローサイトメトリーの広視野マルチパス実装を提案する。本手法は, 最大4パスのヒト赤血球アンサンブルの高速フローイメージングを行い, コントラストと信号対雑音のx4強調を示す。本手法は, 測定感度の量子限界に近づき, 弱い吸収状態の試料に最適な撮像範囲を拡大することを示す。これにより、限られた照明強度で動的サンプルを撮像する実用的なシナリオにおいて、最適な撮像感度とスループットが得られ、現在利用可能な量子光源で達成されている感度を上回っている。 We present a wide-field multi-pass implementation of label-free imaging flow cytometry. Our technique is shown for high-speed flow imaging of ensembles of human red blood cells with up to four passes, demonstrating x4 enhancement in contrast and signal-to-noise. We show that our technique approaches close to the quantum limit of measurement sensitivity, extending the range of optimal imaging to samples in the weakly absorbing regime. This allows for near optimal imaging sensitivity and throughput in a practical scenario of imaging a dynamic sample under limited illumination intensity, surpassing the sensitivity achieved with currently available quantum light sources.	翻訳日:2023-01-17 14:41:52 公開日:2022-11-28
# 軌道自由関連密度関数型パウリポテンシャルからの原子殻構造 Atomic shell structure from an orbital-free-related density-functional-theory Pauli potential ( http://arxiv.org/abs/2211.15764v1 ) ライセンス: Link先を確認	Russell B. Thompson	(参考訳) 高分子自己整合体場理論技術は、孤立原子に対する放射電子密度と全結合エネルギーを見つけるために用いられる。量子粒子は4次元熱空間における環-ポリマー構造を持つガウス糸としてモデル化され、平均場近似におけるエドワーズ/フローリー-ハギンズ相互作用を用いて熱空間に実装された古典的排除体積に基づいてパウリポテンシャルが仮定される。その他の近似として、電子-電子自己相互作用のフェルミ-アマルディ補正、問題の次元性を減らす球面平均近似、相関の無視がある。ポリマースケーリング理論は、パウリポテンシャルの排除された体積形式が、一様極限における既知のトーマス・フェルミエネルギー密度に還元されることを示すために用いられる。周期表の最初の18要素について、放射基底関数を持つ双線型フーリエ展開を用いて自己整合方程式を解く。放射状電子密度は正しい殻構造を示し、既知の結合エネルギーと比較して全体の結合エネルギーの誤差は最も軽い元素では9%以下であり、窒素よりも重い原子では3%以下である。より一般的には、静的な非相対論的量子力学による予測の等価性を達成するためには、古典的な統計力学において2つの仮定しか必要とされないことが示唆されている。熱空間におけるこれら2つの仮定は、3次元空間におけるハイゼンベルクの不確実性原理とパウリ排他原理と同一となることが示されている。 Polymer self-consistent field theory techniques are used to find radial electron densities and total binding energies for isolated atoms. Quantum particles are modelled as Gaussian threads with ring-polymer architecture in a four dimensional thermal-space, and a Pauli potential is postulated based on classical excluded volume implemented in the thermal-space using Edwards/Flory-Huggins interactions in a mean-field approximation. Other approximations include a Fermi-Amaldi correction for electron-electron self-interactions, a spherical averaging approximation to reduce the dimensionality of the problem, and the neglect of correlations. Polymer scaling theory is used to show that the excluded volume form of Pauli potential reduces to the known Thomas-Fermi energy density in the uniform limit. Self-consistent equations are solved using a bilinear Fourier expansion, with radial basis functions, for the first eighteen elements of the periodic table. Radial electron densities show correct shell structure, and the errors on the total binding energies compared to known binding energies are less than 9% for the lightest elements and drop to 3% or less for atoms heavier than nitrogen. 
More generally, it is suggested that only two postulates are needed within classical statistical mechanics to achieve equivalency of predictions with static, non-relativistic quantum mechanics: First, quantum particles are modelled as Gaussian threads in four dimensional thermal-space and, second, pairs of threads (allowing for spin) are subject to classical excluded volume in the thermal-space. It is shown that these two postulates in thermal-space become the same as the Heisenberg uncertainty principle and the Pauli exclusion principle in three dimensional space.	翻訳日:2023-01-17 14:41:30 公開日:2022-11-28
# 強い(多光子)コヒーレント状態を用いた実験から単一光子弱値を得る Obtaining a Single-Photon Weak Value from Experiments using a Strong (Many-Photon) Coherent State ( http://arxiv.org/abs/2211.15761v1 ) ライセンス: Link先を確認	Howard M. Wiseman, Aephraim M. Steinberg, Matin Hallaji	(参考訳) 一般的な弱値実験は、1つの状態に単一粒子を準備し、別の状態の占有数を弱く測定し、第3の状態に粒子を見つけること(「クリック」)で事後選択する。ほとんどの弱値実験は光子を用いて行われているが、単一光子の伝令付き生成は困難でレートも低い。ここでは、上記の弱値は強い(多光子)コヒーレント状態を用いて測定でき、しかもアバランシェフォトダイオードのような「クリック」検出器だけで済むことを示す。クリックありの弱値からクリックなしの弱値を単純に減算し、クリック確率の単純な関数で結果をスケールすればよい。 A common type of weak-value experiment prepares a single particle in one state, weakly measures the occupation number of another state, and post-selects on finding the particle in a third state (a `click'). Most weak-value experiments have been done with photons, but the heralded preparation of a single photon is difficult and slow of rate. Here we show that the weak value mentioned above can be measured using strong (many-photon) coherent states, while still needing only a `click' detector such as an avalanche photodiode. One simply subtracts the no-click weak value from the click weak value, and scales the answer by a simple function of the click probability.	翻訳日:2023-01-17 14:40:58 公開日:2022-11-28
# 必要か? 職種に必要な技能のランク付け Is it Required? Ranking the Skills Required for a Job-Title ( http://arxiv.org/abs/2212.08553v1 ) ライセンス: Link先を確認	Sarthak Anand, Jens-Joris Decorte, Niels Lowie	(参考訳) 本稿では,ある職種に対して必要なスキルをランク付けする手法について述べる。我々の分析によると、同様の職種では重要/関連スキルが頻繁に現れる。本稿では,Language-agnostic BERT Sentence Encoder (LaBSE)モデルをトレーニングし,弱い監督力を用いてスキルの重要性を予測する。モデルはスキルの重要性を学び、他の言語でうまく機能することを示す。さらに,スキルの逆文書頻度因子が,特殊スキルをいかに促進するかを示す。 In this paper, we describe our method for ranking the skills required for a given job title. Our analysis shows that important/relevant skills appear more frequently in similar job titles. We train a Language-agnostic BERT Sentence Encoder (LaBSE) model to predict the importance of the skills using weak supervision. We show the model can learn the importance of skills and perform well in other languages. Furthermore, we show how the Inverse Document Frequency factor of skill boosts the specialised skills.	翻訳日:2022-12-25 03:21:39 公開日:2022-11-28
# 潜伏配列構造モデルによる抗菌ペプチド発見の加速 Accelerating Antimicrobial Peptide Discovery with Latent Sequence-Structure Model ( http://arxiv.org/abs/2212.09450v1 ) ライセンス: Link先を確認	Danqing Wang, Zeyu Wen, Fei Ye, Hao Zhou, Lei Li	(参考訳) 抗菌ペプチド (amp) は広スペクトル抗生物質および薬剤耐性感染症の治療において有望な治療法である。近年、AMP発見を加速する深層生成モデルを導入している研究者が増えている。しかし、近年の研究は主に、AMPの生物学的機能において重要な配列属性と構造情報の無視に焦点を当てている。本稿では,マルチスケールVQ-VAEを用いたAMP(LSSAMP)の潜在シーケンス構造モデルを提案する。潜伏空間でサンプリングすることにより、LSSAMPは理想的な配列属性と二次構造を持つペプチドを同時に生成することができる。実験の結果,LSSAMPにより産生されるペプチドはAMPの確率が高く,21の候補のうち2つは優れた抗菌活性を有することが確認された。我々のモデルは、生物実験のフォローアップのための高品質なAMP候補を作成し、AMP発見全体を加速するのに役立つ。 Antimicrobial peptide (AMP) is a promising therapy in the treatment of broad-spectrum antibiotics and drug-resistant infections. Recently, an increasing number of researchers have been introducing deep generative models to accelerate AMP discovery. However, current studies mainly focus on sequence attributes and ignore structure information, which is important in AMP biological functions. In this paper, we propose a latent sequence-structure model for AMPs (LSSAMP) with multi-scale VQ-VAE to incorporate secondary structures. By sampling in the latent space, LSSAMP can simultaneously generate peptides with ideal sequence attributes and secondary structures. Experimental results show that the peptides generated by LSSAMP have a high probability of AMP, and two of the 21 candidates have been verified to have good antimicrobial activity. Our model will be released to help create high-quality AMP candidates for follow-up biological experiments and accelerate the whole AMP discovery.	翻訳日:2022-12-25 03:20:41 公開日:2022-11-28
# syn-qg: 質問生成のための構文と浅い意味規則 Syn-QG: Syntactic and Shallow Semantic Rules for Question Generation ( http://arxiv.org/abs/2004.08694v5 ) ライセンス: Link先を確認	Kaustubh D. Dhole and Christopher D. Manning	(参考訳) 質問生成(qg)は、基本的には単純な構文変換であるが、意味論の多くの側面は、どの質問が形式に良いかに影響する。この観察は、普遍的な依存関係、浅いセマンティックパーシング、語彙資源、および宣言文を質問対に変換するカスタムルールを活用する透明な統語規則であるSynQGを開発することで実現される。 propbank の引数記述と verbnet 状態述語を用いて、浅い意味的コンテンツを取り込んで記述的性質の質問を生成し、既存のシステムよりも推論的かつ意味的にリッチな質問を生成する。文法的不正確な質問を排除し,構文の流布性を改善するために,これらの構文規則のアウトプットを逆翻訳する。クラウドソースによる評価の結果,我々のシステムは従来のQGシステムよりも文法的・関連性の高い質問を多く生成でき,バックトランスレーションは無関係な質問を生成するためのわずかなコストで文法性を劇的に向上させることがわかった。 Question Generation (QG) is fundamentally a simple syntactic transformation; however, many aspects of semantics influence what questions are good to form. We implement this observation by developing SynQG, a set of transparent syntactic rules leveraging universal dependencies, shallow semantic parsing, lexical resources, and custom rules which transform declarative sentences into question-answer pairs. We utilize PropBank argument descriptions and VerbNet state predicates to incorporate shallow semantic content, which helps generate questions of a descriptive nature and produce inferential and semantically richer questions than existing systems. In order to improve syntactic fluency and eliminate grammatically incorrect questions, we employ back-translation over the output of these syntactic rules. A set of crowd-sourced evaluations shows that our system can generate a larger number of highly grammatical and relevant questions than previous QG systems and that back-translation drastically improves grammaticality at a slight cost of generating irrelevant questions.	翻訳日:2022-12-12 05:00:04 公開日:2022-11-28
# AIの定義とこの定義を満たすプログラム The AI Definition and a Program Which Satisfies this Definition ( http://arxiv.org/abs/2212.03184v1 ) ライセンス: Link先を確認	Dimiter Dobrev	(参考訳) 我々はエージェントのすべてのポリシーを検討し、その1つが最良の実行方針であることを証明します。このポリシーは計算可能ではないが、計算可能なポリシーはその近傍に存在する。私たちはAIを、最高のパフォーマンスポリシーに十分近い計算可能なポリシーとして定義します。エージェントの最高の実行ポリシーを定義する前に、世界を記述するための言語が必要です。 AIの定義を満たすプログラムを開発するためにも、この言語を使用します。プログラムはまず、選択した言語で記述することで世界を理解する。プログラムは、将来を予測するために記述を使用し、可能な限り最良の行動を選択する。このプログラムは非常に非効率で実用的には使用できないが、世界の記述のための言語と未来を予測するアルゴリズムの両方を精製することで改善することができる。これにより、AI定義の効率的かつ一貫性のあるプログラムが得られる。 We will consider all policies of the agent and will prove that one of them is the best performing policy. While that policy is not computable, computable policies do exist in its proximity. We will define AI as a computable policy which is sufficiently proximal to the best performing policy. Before we can define the agent's best performing policy, we need a language for description of the world. We will also use this language to develop a program which satisfies the AI definition. The program will first understand the world by describing it in the selected language. The program will then use the description in order to predict the future and select the best possible move. While this program is extremely inefficient and practically unusable, it can be improved by refining both the language for description of the world and the algorithm used to predict the future. This can yield a program which is both efficient and consistent with the AI definition.	翻訳日:2022-12-11 12:52:01 公開日:2022-11-28
# RSSベース測位における事前情報の有効利用について On the Effective Usage of Priors in RSS-based Localization ( http://arxiv.org/abs/2212.00728v1 ) ライセンス: Link先を確認	Çağkan Yapar, Fabian Jaensch, Ron Levie, Giuseppe Caire	(参考訳) 本稿では,密集した都市環境における測位問題について考察する。このような環境では、建物のような障害物が存在するため、測位対象の受信機(Rx)と衛星との間に見通し内(LOS: line-of-sight)リンクが得られる可能性が低く、Global Navigation Satellite Systemsは十分な精度を提供できない。したがって、見通し外(NLOS: non-line-of-sight)条件下でも確実に動作可能な他の技術を利用する必要がある。我々は最近,受信信号強度(RSS)フィンガープリントと畳み込みニューラルネットワークに基づくアルゴリズムLocUNetを提案し,広く採用されているk近傍法(kNN)アルゴリズムおよび最先端の到着時刻(ToA)測距ベース手法に対して,最先端の測位性能を実証した。本研究ではまず,LocUNetがRx位置の事前分布,あるいはRxと送信機(Tx)の関連付けの傾向をトレーニングデータから学習する能力を持つことを認識し,その高い性能をこれらに帰する。逆に, 確率的アプローチに基づく古典的手法も, こうした事前情報を適切に組み込むことで大きな恩恵を受けられることを示す。また,理論的に最適な定式化と比較することにより,LocUNetが多くの設定で最適に近い性能を持つことを数値的に示す。 In this paper, we study the localization problem in dense urban settings. In such environments, Global Navigation Satellite Systems fail to provide good accuracy because obstacles such as buildings make line-of-sight (LOS) links between the receiver (Rx) to be located and the satellites unlikely. Thus, one has to resort to other technologies, which can reliably operate under non-line-of-sight (NLOS) conditions. Recently, we proposed a Received Signal Strength (RSS) fingerprint and convolutional neural network-based algorithm, LocUNet, and demonstrated its state-of-the-art localization performance with respect to the widely adopted k-nearest neighbors (kNN) algorithm, and to state-of-the-art time of arrival (ToA) ranging-based methods. In the current work, we first recognize LocUNet's ability to learn the underlying prior distribution of the Rx position or Rx and transmitter (Tx) association preferences from the training data, and attribute its high performance to these. Conversely, we demonstrate that classical methods based on a probabilistic approach can greatly benefit from an appropriate incorporation of such prior information.
Our studies also demonstrate numerically that LocUNet performs close to optimally in many settings, by comparing it with the theoretically optimal formulations.	翻訳日:2022-12-02 17:57:33 公開日:2022-11-28
# 脳波を用いた脳-コンピュータインタフェースにおける逆アーチファクト検出 Adversarial Artifact Detection in EEG-Based Brain-Computer Interfaces ( http://arxiv.org/abs/2212.00727v1 ) ライセンス: Link先を確認	Xiaoqing Chen and Dongrui Wu	(参考訳) 機械学習は脳波(EEG)ベースの脳-コンピュータインターフェース(BCI)において大きな成功を収めた。既存のbci研究のほとんどは精度の向上に重点を置いていたが、セキュリティを考慮に入れていたものはほとんどなかった。しかし最近の研究では、脳波に基づくBCIは敵の攻撃に弱いことが示されており、入力に小さな摂動が加えられると誤分類が起こる可能性がある。敵の例の検出は,この現象の理解と防御の両方に不可欠である。本稿では,脳波によるBCIの逆検出を初めて検討する。 3つの畳み込みニューラルネットワークを用いた2つの脳波データセットの実験を行い、複数の検出手法の性能を検証する。ホワイトボックス攻撃とブラックボックス攻撃の両方が検出可能であり,前者の方が検出が容易であることを示した。 Machine learning has achieved great success in electroencephalogram (EEG) based brain-computer interfaces (BCIs). Most existing BCI research focused on improving its accuracy, but few had considered its security. Recent studies, however, have shown that EEG-based BCIs are vulnerable to adversarial attacks, where small perturbations added to the input can cause misclassification. Detection of adversarial examples is crucial to both the understanding of this phenomenon and the defense. This paper, for the first time, explores adversarial detection in EEG-based BCIs. Experiments on two EEG datasets using three convolutional neural networks were performed to verify the performances of multiple detection approaches. We showed that both white-box and black-box attacks can be detected, and the former are easier to detect.	翻訳日:2022-12-02 15:36:51 公開日:2022-11-28
# 構造に基づく薬物設計のための強化遺伝的アルゴリズム Reinforced Genetic Algorithm for Structure-based Drug Design ( http://arxiv.org/abs/2211.16508v1 ) ライセンス: Link先を確認	Tianfan Fu, Wenhao Gao, Connor W. Coley, Jimeng Sun	(参考訳) SBDD(Structure-based drug design)は、疾患関連タンパク質(ターゲット)に強く結合する分子(配位子)を見つけることで、薬物候補を見つけることを目的としている。近年,タンパク質ポケットに3次元分子設計を適用してSBDDを解く手法が注目されているが,確率的モデルとしての定式化は不満足な最適化性能をもたらすことが多い。一方、遺伝的アルゴリズム(GA)のような従来の組合せ最適化手法は、様々な分子最適化タスクにおいて最先端の性能を示す。しかし、彼らはタンパク質標的構造を利用して設計手順を知らせるのではなく、ランダムウォークのような探索に依存しており、同様の結合物理学にもかかわらず、不安定な性能と異なるタスク間の知識伝達を起こさない。より安定で効率的なsbddを実現するために、神経モデルを用いて、利益の出る設計ステップを優先順位付けし、ランダムウォーク動作を抑制する強化遺伝的アルゴリズム(rga)を提案する。ニューラルモデルは、ターゲットとリガンドの3d構造を入力とし、異なるターゲットからの共有結合物理学の知識を利用して、最適化中に微調整される。各種疾患ターゲットに対する結合親和性を最適化する実験的な研究を行い、RGAがドッキングスコアにおいてベースラインより優れ、ランダム初期化に対してより堅牢であることを示す。アブレーション研究では、異なる目標に対するトレーニングが、結合プロセスの共有基盤物理を活用することで、パフォーマンスを向上させることも示している。コードはhttps://github.com/futianfan/reinforced-genetic-algorithmで入手できる。 Structure-based drug design (SBDD) aims to discover drug candidates by finding molecules (ligands) that bind tightly to a disease-related protein (targets), which is the primary approach to computer-aided drug discovery. Recently, applying deep generative models for three-dimensional (3D) molecular design conditioned on protein pockets to solve SBDD has attracted much attention, but their formulation as probabilistic modeling often leads to unsatisfactory optimization performance. On the other hand, traditional combinatorial optimization methods such as genetic algorithms (GA) have demonstrated state-of-the-art performance in various molecular optimization tasks. However, they do not utilize protein target structure to inform design steps but rely on a random-walk-like exploration, which leads to unstable performance and no knowledge transfer between different tasks despite the similar binding physics. 
To achieve a more stable and efficient SBDD, we propose Reinforced Genetic Algorithm (RGA) that uses neural models to prioritize the profitable design steps and suppress random-walk behavior. The neural models take the 3D structure of the targets and ligands as inputs and are pre-trained using native complex structures to utilize the knowledge of the shared binding physics from different targets and then fine-tuned during optimization. We conduct thorough empirical studies on optimizing binding affinity to various disease targets and show that RGA outperforms the baselines in terms of docking scores and is more robust to random initializations. The ablation study also indicates that the training on different targets helps improve performance by leveraging the shared underlying physics of the binding processes. The code is available at https://github.com/futianfan/reinforced-genetic-algorithm.	翻訳日:2022-12-01 18:10:41 公開日:2022-11-28
# PACによる統計的アルゴリズムの検証 PAC Verification of Statistical Algorithms ( http://arxiv.org/abs/2211.17096v1 ) ライセンス: Link先を確認	Saachi Mutreja, Jonathan Shafer	(参考訳) Goldwasser et al. (2021) は最近、PAC検証の設定を提案した。そこでは、アグノスティックPAC学習の目標を満たすとされる仮説(機械学習モデル)を対話的証明を用いて検証する。本稿では,この概念をさらに様々な方法で展開する。まず、VC次元 $d$ の仮説クラスに対するPAC検証には $\Omega(\sqrt{d})$ 個の i.i.d. サンプルが必要であるという下界を証明する。第二に、$\mathbb{R}$ 上の区間の和集合のPAC検証のためのプロトコルを提案する。これは同じタスクに対して彼らが提案したプロトコルを改善し、我々の下界と一致する。第三に,彼らの定義を一般の統計的アルゴリズムの検証へと自然に一般化する。これはアグノスティックPAC学習を超えて、より多様な実用的アルゴリズムに適用できる。提案した定義の例として,我々の最終結果は,クエリに関する組合せ的制約を満たす統計的クエリアルゴリズムの検証のためのプロトコルである。 Goldwasser et al. (2021) recently proposed the setting of PAC verification, where a hypothesis (machine learning model) that purportedly satisfies the agnostic PAC learning objective is verified using an interactive proof. In this paper we develop this notion further in a number of ways. First, we prove a lower bound for PAC verification of $\Omega(\sqrt{d})$ i.i.d. samples for hypothesis classes of VC dimension $d$. Second, we present a protocol for PAC verification of unions of intervals over $\mathbb{R}$ that improves upon their proposed protocol for that task, and matches our lower bound. Third, we introduce a natural generalization of their definition to verification of general statistical algorithms, which is applicable to a wider variety of practical algorithms beyond agnostic PAC learning. Showcasing our proposed definition, our final result is a protocol for the verification of statistical query algorithms that satisfy a combinatorial constraint on their queries.	翻訳日:2022-12-01 16:05:26 公開日:2022-11-28
# マルチタスク学習のためのセミソフトタスククラスタリング Semisoft Task Clustering for Multi-Task Learning ( http://arxiv.org/abs/2211.17204v1 ) ライセンス: Link先を確認	Yuzhao Zhang, Yifan Sun	(参考訳) マルチタスク学習(MTL)は、複数の関連する予測タスクの性能を向上させることを目的としている。未知の係数を著しく低減する柔軟性と能力のため、タスククラスタリングに基づくMTLアプローチは注目されている。そこで我々は,データの半ソフトクラスタリングの考え方に動機づけられ,純粋タスクと混合タスクの両方のタスククラスタ構造を同時に明らかにし,関連する機能を選択する半ソフトタスククラスタリング手法を提案する。このアプローチの背後にある主な前提は、各クラスタにはいくつかの純粋なタスクがあり、それぞれの混合タスクは異なるクラスタ内の純粋なタスクの線形結合によって表現できるということです。結果として生じる非凸制約最適化問題を解決するために,効率的な3ステップアルゴリズムを設計する。合成および実世界のデータセットに基づく実験結果は,提案手法の有効性と有効性を検証する。最後に,提案手法をロバストなタスククラスタリング問題に拡張する。 Multi-task learning (MTL) aims to improve the performance of multiple related prediction tasks by leveraging useful information from them. Due to their flexibility and ability to reduce unknown coefficients substantially, the task-clustering-based MTL approaches have attracted considerable attention. Motivated by the idea of semisoft clustering of data, we propose a semisoft task clustering approach, which can simultaneously reveal the task cluster structure for both pure and mixed tasks as well as select the relevant features. The main assumption behind our approach is that each cluster has some pure tasks, and each mixed task can be represented by a linear combination of pure tasks in different clusters. To solve the resulting non-convex constrained optimization problem, we design an efficient three-step algorithm. The experimental results based on synthetic and real-world datasets validate the effectiveness and efficiency of the proposed approach. Finally, we extend the proposed approach to a robust task clustering problem.	翻訳日:2022-12-01 16:05:10 公開日:2022-11-28
# 音声認識と名前付きエンティティ認識を用いた顧客会話からのキーエンティティの扱いと抽出 Handling and extracting key entities from customer conversations using Speech recognition and Named Entity recognition ( http://arxiv.org/abs/2211.17107v1 ) ライセンス: Link先を確認	Sharvi Endait, Ruturaj Ghatage, Prof. DD Kadam	(参考訳) eコマースが急速に発展する現代のテクノロジー時代において、顧客の要求や詳細をビジネス上の会話から理解することが非常に重要である。これは顧客の維持と満足にとって非常に重要です。これらの会話から重要な洞察を抽出することは、製品の開発や問題解決において非常に重要です。顧客のフィードバック、反応、製品の重要な詳細を理解することは不可欠であり、名前付きエンティティ認識(NER)を使用して行われる。エンティティを抽出するために、最適な音声-テキストモデルを用いて会話をテキストに変換する。このモデルは、会話をテキストに変換する2段階のネットワークである。そして、NER BERTトランスモデルを用いて、ロバストな手法を用いて適切なエンティティを抽出する。これによって、彼らが直面している問題が発生した場合、顧客エクスペリエンスの充実に役立つでしょう。顧客が問題に直面したら、電話して苦情を登録します。モデルがこの会話から重要な特徴を抽出し、問題を調べるのに必要となる。これらの機能には、注文番号や正確な問題などの詳細が含まれている。これらすべては会話から直接抽出され、また会話を行う労力を減らすことになる。 In this modern era of technology with e-commerce developing at a rapid pace, it is very important to understand customer requirements and details from a business conversation. It is very crucial for customer retention and satisfaction. Extracting key insights from these conversations is very important when it comes to developing their product or solving their issue. Understanding customer feedback, responses, and important details of the product are essential and it would be done using Named entity recognition (NER). For extracting the entities we would be converting the conversations to text using the optimal speech-to-text model. The model would be a two-stage network in which the conversation is converted to text. Then, suitable entities are extracted using robust techniques using a NER BERT transformer model. This will aid in the enrichment of customer experience when there is an issue which is faced by them. If a customer faces a problem he will call and register his complaint. The model will then extract the key features from this conversation which will be necessary to look into the problem. These features would include details like the order number, and the exact problem. 
All these would be extracted directly from the conversation and this would reduce the effort of going through the conversation again.	翻訳日:2022-12-01 15:47:21 公開日:2022-11-28
# スコアベース拡散モデルにおける判別器指導による精錬生成過程 Refining Generative Process with Discriminator Guidance in Score-based Diffusion Models ( http://arxiv.org/abs/2211.17091v1 ) ライセンス: Link先を確認	Dongjun Kim, Yeongmin Kim, Wanmo Kang, Il-Chul Moon	(参考訳) 拡散モデルの成功は様々な領域で目撃されているが、生成過程の変動についての研究はごくわずかである。本稿では、スコアチェックポイントが同じであれば、元の生成プロセスよりも逆プロセスに近い新しい生成プロセスを提案する。具体的には、生成過程を実データと生成データとの間の補助判別器で調整する。これにより、判別器による調整された生成プロセスは、元のプロセスよりも現実的なサンプルを生成する。実験では,CIFAR-10では1.74,CelebAでは1.33,FFHQでは1.88の新たなSOTA FIDが得られた。 While the success of diffusion models has been witnessed in various domains, only a few works have investigated the variation of the generative process. In this paper, we introduce a new generative process that is closer to the reverse process than the original generative process, given the identical score checkpoint. Specifically, we adjust the generative process with the auxiliary discriminator between the real data and the generated data. Consequently, the adjusted generative process with the discriminator generates more realistic samples than the original process. In experiments, we achieve new SOTA FIDs of 1.74 on CIFAR-10, 1.33 on CelebA, and 1.88 on FFHQ in the unconditional generation.	翻訳日:2022-12-01 15:27:27 公開日:2022-11-28
# 線形関数近似を用いたリーダー・フォロワーMDPにおける証明可能に効率的なモデルフリーRL Provably Efficient Model-free RL in Leader-Follower MDP with Linear Function Approximation ( http://arxiv.org/abs/2211.15792v1 ) ライセンス: Link先を確認	Arnob Ghosh	(参考訳) エピソードの各ステップでエージェント(リーダー)が行動し、次に別のエージェント(フォロワー)が行動するマルチエージェント・エピソードMDPの設定を考える。状態の遷移と報酬は、リーダーとフォロワーの共同行動ペアに依存する。このような相互作用は、スマートグリッド、メカニズム設計、セキュリティ、政策立案など、多くの分野に応用できる。我々は、バンディットフィードバック設定の下で、証明可能な性能保証を持つ両プレイヤーの方策をいかに学習するかに関心がある。リーダーとフォロワーの両方が非近視眼的(non-myopic)、すなわちエピソード全体を通じて報酬を最大化しようとする設定に焦点を当て、多くのRLアプリケーションで非常に一般的な連続状態空間をモデル化できる線形MDPを考える。我々はモデルフリーのRLアルゴリズムを提案し、リーダーとフォロワーの両方に対して $\tilde{\mathcal{O}}(\sqrt{d^3H^3T})$ のリグレット境界が達成できることを示す。ここで $d$ は特徴マッピングの次元、$H$ はエピソードの長さ、$T$ はバンディットフィードバック情報設定下のステップの総数である。したがって、状態の数が無限になった場合でも結果が成り立つ。このアルゴリズムはLSVI-UCBアルゴリズムの新しい適応に基づいている。具体的には、(最良応答としての)標準的な貪欲方策を、リーダーとフォロワーの両方についてソフトマックス方策に置き換える。これは価値関数に対する一様な集中不等式を確立する上で鍵となる。我々の知る限り、これは関数近似を用いた、非近視眼的フォロワーを持つマルコフゲームに対する最初の劣線形リグレット境界の保証である。 We consider a multi-agent episodic MDP setup where an agent (leader) takes an action at each step of the episode, followed by another agent (follower). The state evolution and rewards depend on the joint action pair of the leader and the follower. Such interactions find applications in many domains such as smart grids, mechanism design, security, and policymaking. We are interested in how to learn policies for both players with provable performance guarantees under a bandit feedback setting. We focus on a setup where both the leader and the follower are non-myopic, i.e., they both seek to maximize their rewards over the entire episode, and we consider a linear MDP, which can model the continuous state spaces that are very common in many RL applications.
We propose a model-free RL algorithm and show that $\tilde{\mathcal{O}}(\sqrt{d^3H^3T})$ regret bounds can be achieved for both the leader and the follower, where $d$ is the dimension of the feature mapping, $H$ is the length of the episode, and $T$ is the total number of steps under the bandit feedback information setup. Thus, our result holds even when the number of states becomes infinite. The algorithm relies on a novel adaptation of the LSVI-UCB algorithm. Specifically, we replace the standard greedy policy (as the best response) with the soft-max policy for both the leader and the follower. This turns out to be key in establishing a uniform concentration bound for the value functions. To the best of our knowledge, this is the first sub-linear regret bound guarantee for Markov games with non-myopic followers and function approximation.	翻訳日:2022-11-30 18:16:39 公開日:2022-11-28
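The replacement of a greedy policy by a soft-max policy described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the temperature parameter, and the toy action values are hypothetical, and nothing here reproduces the paper's LSVI-UCB adaptation. The sketch shows one well-known property, namely that soft-max probabilities change smoothly with the action values, whereas a greedy choice can flip under an arbitrarily small perturbation. Whether this is the exact mechanism behind the paper's concentration argument is not stated in the abstract.

```python
import math


def softmax_policy(q_values, temperature=1.0):
    """Action probabilities proportional to exp(Q / temperature)."""
    m = max(q_values)  # subtract the maximum for numerical stability
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def greedy_policy(q_values):
    """Put all probability on the first action with the highest value."""
    best = max(range(len(q_values)), key=q_values.__getitem__)
    return [1.0 if a == best else 0.0 for a in range(len(q_values))]


# Two hypothetical action-value vectors that differ by a tiny perturbation.
q_a = [1.00, 1.01, 0.50]
q_b = [1.01, 1.00, 0.50]

soft_a, soft_b = softmax_policy(q_a), softmax_policy(q_b)
greedy_a, greedy_b = greedy_policy(q_a), greedy_policy(q_b)

# Largest change in any action probability under the soft-max policy.
soft_gap = max(abs(x - y) for x, y in zip(soft_a, soft_b))
print(soft_gap, greedy_a, greedy_b)
```

Here the greedy policy switches entirely from the second action to the first, while every soft-max probability moves by less than 0.01.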
# クラスター化による新旧代謝産物の予測経路 Predicting pathways for old and new metabolites through clustering ( http://arxiv.org/abs/2211.15720v1 ) ライセンス: Link先を確認	Thiru Siddharth, Nathan Lewis	(参考訳) 多様な代謝経路は、エネルギーを収穫し、バイオマス成分を合成し、微小環境と相互作用する分子を生産し、毒素を中和する、すべての生物にとって基本である。新しい代謝産物や経路の発見が続いているが、新しい代謝産物の経路の予測は困難である。新しい代謝産物の経路を解明するのに膨大な時間を要するため、HMDBによると代謝産物の60%しか経路に割り当てられていない。本稿では,代謝物構造に基づく経路同定手法を提案する。 SMILESアノテーションから201の特徴を抽出し,PubMed抽象とHMDBから新たな代謝産物を同定した。両特徴群にクラスタリングアルゴリズムを適用した結果,代謝産物間の相関を定量化し,既知の代謝産物の92%をそれぞれの経路に正確に関連付けた。したがって、このアプローチは新しい代謝産物の代謝経路を予測するのに有用である。 The diverse metabolic pathways are fundamental to all living organisms, as they harvest energy, synthesize biomass components, produce molecules to interact with the microenvironment, and neutralize toxins. While discovery of new metabolites and pathways continues, the prediction of pathways for new metabolites can be challenging. It can take vast amounts of time to elucidate pathways for new metabolites; thus, according to HMDB only 60% of metabolites get assigned to pathways. Here, we present an approach to identify pathways based on metabolite structure. We extracted 201 features from SMILES annotations, and identified new metabolites from PubMed abstracts and HMDB. After applying clustering algorithms to both groups of features, we quantified correlations between metabolites, and found the clusters accurately linked 92% of known metabolites to their respective pathways. Thus, this approach could be valuable for predicting metabolic pathways for new metabolites.	翻訳日:2022-11-30 18:10:33 公開日:2022-11-28
# 条件付き生成型adversarial networkを用いたIACTデータ解析のための生成画像の統計的特性制御 Using a Conditional Generative Adversarial Network to Control the Statistical Characteristics of Generated Images for IACT Data Analysis ( http://arxiv.org/abs/2211.15807v1 ) ライセンス: Link先を確認	Julia Dubenskaya, Alexander Kryukov, Andrey Demichev, Stanislav Polyakov, Elizaveta Gres, Anna Vlaskina	(参考訳) 生成逆ネットワークは天文学領域における画像生成に有望なツールである。特に興味深いのは条件付き生成対向ネットワーク(cGAN)で、画像のいくつかの特性の値に応じて画像を複数のクラスに分割し、新しい画像を生成する際に必要なクラスを指定することができる。解像型大気チェレンコフ望遠鏡(IACT)の画像の場合、重要な特性は全画像ピクセルの総輝度(画像サイズ)であり、これは一次粒子のエネルギーと直接相関している。我々は,TAIGA-IACT実験で得られた画像と類似した画像を生成するために,cGANを用いた。トレーニングセットとして,TAIGAモンテカルロシミュレーションソフトウェアを用いて生成した2次元画像の集合を用いた。トレーニングセットを人工的に10クラスに分割し,画像をサイズ順に並べ,各クラスに同じ数の画像が収まるようにクラスの境界を定義した。これらのクラスはネットワークのトレーニングに使われた。本稿は,各クラスについて,生成した画像のサイズ分布が正規分布に近く,その平均値が対応するクラスのほぼ中間に位置することを示す。また,生成した画像に対して,全クラスにわたる分布を合計した総画像サイズ分布がトレーニングセットの原分布に近いことを示す。得られた結果は、IACTsが撮影したものと同様のリアルな合成画像のより正確な生成に役立つ。 Generative adversarial networks are a promising tool for image generation in the astronomy domain. Of particular interest are conditional generative adversarial networks (cGANs), which make it possible to divide images into several classes according to the value of some property of the image, and then specify the required class when generating new images. In the case of images from Imaging Atmospheric Cherenkov Telescopes (IACTs), an important property is the total brightness of all image pixels (image size), which is in direct correlation with the energy of primary particles. We used a cGAN technique to generate images similar to those obtained in the TAIGA-IACT experiment. As a training set, we used a set of two-dimensional images generated using the TAIGA Monte Carlo simulation software. We artificially divided the training set into 10 classes, sorting images by size and defining the boundaries of the classes so that the same number of images fall into each class. These classes were used while training our network.
The paper shows that for each class, the size distribution of the generated images is close to normal with the mean value located approximately in the middle of the corresponding class. We also show that for the generated images, the total image size distribution obtained by summing the distributions over all classes is close to the original distribution of the training set. The results obtained will be useful for more accurate generation of realistic synthetic images similar to the ones taken by IACTs.	翻訳日:2022-11-30 18:10:19 公開日:2022-11-28
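The class construction described in the abstract above (sort images by size, then choose boundaries so that each of the 10 classes holds the same number of images) amounts to equal-frequency binning. A minimal sketch follows, assuming synthetic log-normal values as a stand-in for image sizes; the function names and the data are hypothetical and do not come from the TAIGA simulation software.

```python
import bisect
import random


def equal_count_class_edges(sizes, n_classes=10):
    """Interior class boundaries such that each class holds (nearly) the same number of items."""
    ordered = sorted(sizes)
    n = len(ordered)
    return [ordered[(k * n) // n_classes] for k in range(1, n_classes)]


def assign_class(size, edges):
    """Class index = number of boundaries that are less than or equal to the size."""
    return bisect.bisect_right(edges, size)


# Hypothetical stand-in for image sizes (total pixel brightness); not real IACT data.
rng = random.Random(0)
sizes = [rng.lognormvariate(5.0, 1.0) for _ in range(1000)]

edges = equal_count_class_edges(sizes)
labels = [assign_class(s, edges) for s in sizes]
counts = [labels.count(k) for k in range(10)]
print(counts)
```

With 1000 distinct values, every class receives exactly 100 items. The resulting labels could then serve as the conditioning input of a cGAN, although the network itself is outside the scope of this sketch.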
# ニューラルネットワーク:星間媒質の化学問題を解決する Neural networks: solving the chemistry of the interstellar medium ( http://arxiv.org/abs/2211.15688v1 ) ライセンス: Link先を確認	Lorenzo Branca, Andrea Pallottini	(参考訳) 非平衡化学は、星間媒質(ISM)の研究、特に分子雲ひいては星の形成の研究において重要な過程である。しかし、一般に反応の数が多いこと(>40)、進化の時間スケールが短いこと(ISMの力学的時間の約 $10^4$ 分の1)、関連する常微分方程式系(ODE)に特徴的な非線形性と剛性のため、天体物理学シミュレーションに組み込むことが計算上最も難しいタスクの1つである。この概念実証研究では、物理インフォームドニューラルネットワーク(PINN)が、分子水素の生成まで(9種46反応)を含む硬い熱化学系に対して、従来のODE時間積分器の有効な代替となることを示す。広範囲の密度($-2< \log n/{\rm cm}^{-3}< 3$)と温度($1 < \log T/{\rm K}< 5$)で異なる化学ネットワークをテストした結果、基本的なアーキテクチャでは単純化された化学系でしか十分な収束が得られず、急激な化学的・熱的変動を適切に捉えるにはDeep Galerkin法が必要であることがわかった。トレーニング後($\sim 10^3$ GPUhr)のPINNは、解の強い非線形性をよく再現し(誤差 $\lesssim 10\%$)、従来のODEソルバに対して最大 $\sim 200$ 倍の高速化を実現できる。さらに、従来のソルバでは初期の $n$ と $T$ によって完了時間が約 $\sim 30\%$ 変動するのに対し、PINN法では変動は無視できる。高速化と負荷分散の潜在的な改善は、PINNを用いたシミュレーションが天体物理学や宇宙論の問題における複雑な化学計算を解くうえで非常に魅力的な方法であることを示唆している。 Non-equilibrium chemistry is a key process in the study of the InterStellar Medium (ISM), in particular the formation of molecular clouds and thus stars. However, computationally it is among the most difficult tasks to include in astrophysical simulations, because of the typically high (>40) number of reactions, the short evolutionary timescales (about $10^4$ times less than the ISM dynamical time) and the characteristic non-linearity and stiffness of the associated Ordinary Differential Equations system (ODEs). In this proof of concept work, we show that Physics Informed Neural Networks (PINN) are a viable alternative to traditional ODE time integrators for stiff thermo-chemical systems, i.e. up to molecular hydrogen formation (9 species and 46 reactions).
Testing different chemical networks in a wide range of densities ($-2< \log n/{\rm cm}^{-3}< 3$) and temperatures ($1 < \log T/{\rm K}< 5$), we find that a basic architecture can give a comfortable convergence only for simplified chemical systems: to properly capture the sudden chemical and thermal variations a Deep Galerkin Method is needed. Once trained ($\sim 10^3$ GPUhr), the PINN reproduces the strongly non-linear nature of the solutions well (errors $\lesssim 10\%$) and can give speed-ups up to a factor of $\sim 200$ with respect to traditional ODE solvers. Further, the latter have completion times that vary by about $\sim 30\%$ for different initial $n$ and $T$, while the PINN method gives negligible variations. Both the speed-up and the potential improvement in load balancing imply that PINN-powered simulations are a very attractive way to solve complex chemical calculations in astrophysical and cosmological problems.	翻訳日:2022-11-30 17:44:02 公開日:2022-11-28
# CWD: 未知のクラウドワークロードを検出する機械学習ベースのアプローチ CWD: A Machine Learning based Approach to Detect Unknown Cloud Workloads ( http://arxiv.org/abs/2211.15739v1 ) ライセンス: Link先を確認	Mohammad Hossain, Derssie Mebratu, Niranjan Hasabnis, Jun Jin, Gaurav Chaudhary, Noah Shen	(参考訳) 現代のクラウドデータセンターのワークロードはますます複雑になりつつある。クラウドサービスプロバイダ(csp)は、オンデマンドサービスをリアルタイムにサポートしています。クラウド環境とクラウドワークロードの複雑さが増す中、IntelやAMDといったハードウェアベンダは、CPUプラットフォームにクラウド固有のワークロード加速機能を導入している。これらの機能は一般的に人気があり、一般的に使用されているクラウドワークロードをターゲットにしている。それにもかかわらず、顧客固有のワークロード(未知のワークロード)は、その特性が共通のワークロード(既知のワークロード)とは異なる場合、基盤となるプラットフォームの可能性に気付かない可能性がある。基盤となるプラットフォームの全可能性を実現するこの問題を解決するために、クラウド環境で実行されるワークロードを特徴付け、プロファイル化し、予測する機械学習技術を開発した。本手法の実験的評価は良好な予測性能を示す。また,モデルの性能をスタンドアロンで解析する手法も開発している。 Workloads in modern cloud data centers are becoming increasingly complex. The number of workloads running in cloud data centers has been growing exponentially for the last few years, and cloud service providers (CSP) have been supporting on-demand services in real-time. Realizing the growing complexity of cloud environment and cloud workloads, hardware vendors such as Intel and AMD are increasingly introducing cloud-specific workload acceleration features in their CPU platforms. These features are typically targeted towards popular and commonly-used cloud workloads. Nonetheless, uncommon, customer-specific workloads (unknown workloads), if their characteristics are different from common workloads (known workloads), may not realize the potential of the underlying platform. To address this problem of realizing the full potential of the underlying platform, we develop a machine learning based technique to characterize, profile and predict workloads running in the cloud environment. Experimental evaluation of our technique demonstrates good prediction performance. We also develop techniques to analyze the performance of the model in a standalone manner.	翻訳日:2022-11-30 17:43:25 公開日:2022-11-28
# 信頼度対応グラフニューラルネットワークによる信頼性評価 Confidence-Aware Graph Neural Networks for Learning Reliability Assessment Commitments ( http://arxiv.org/abs/2211.15755v1 ) ライセンス: Link先を確認	Seonho Park, Wenbo Chen, Dahye Han, Mathieu Tanneau, and Pascal Van Hentenryck	(参考訳) 信頼度評価コミットメント(RAC)最適化は, 再生可能世代の増加と予測誤差の増加により, グリッド運用においてますます重要になっている。独立系演算子(isos)はまた、より細かい時間的粒度、より長い時間的地平線、そしてさらなる経済的および信頼性の利益のために確率的定式化を使用することを目標としている。本論文の目的は, rac定式化の範囲拡大に伴う計算上の課題を解決することである。 RACLEARN は,(1) グラフニューラルネットワーク (GNN) を用いて生成元のコミットメントとアクティブラインの制約を予測し,(2) 信頼値を各コミットメント予測に関連付け,(3) 信頼性の高い予測のサブセットを選択し,(4) 実現可能性のために修正し,(5) 実現可能な予測とアクティブな制約を含む最先端の最適化アルゴリズムをシードする,と提示する。ミドルコンチネント・インディペンデント・システム・オペレーター(MISO)と実際の送信ネットワーク(8965の送信線、6708のバス、1890の発電機、6262の負荷ユニット)が使用する正確なRAC定式化実験の結果、RACLEARNフレームワークは、解品質が2～4の要因でRAC最適化を高速化できることが示されている。 Reliability Assessment Commitment (RAC) Optimization is increasingly important in grid operations due to larger shares of renewable generations in the generation mix and increased prediction errors. Independent System Operators (ISOs) also aim at using finer time granularities, longer time horizons, and possibly stochastic formulations for additional economic and reliability benefits. The goal of this paper is to address the computational challenges arising in extending the scope of RAC formulations. It presents RACLEARN that (1) uses Graph Neural Networks (GNN) to predict generator commitments and active line constraints, (2) associates a confidence value to each commitment prediction, (3) selects a subset of the high-confidence predictions, which are (4) repaired for feasibility, and (5) seeds a state-of-the-art optimization algorithm with the feasible predictions and the active constraints. 
Experimental results on exact RAC formulations used by the Midcontinent Independent System Operator (MISO) and an actual transmission network (8965 transmission lines, 6708 buses, 1890 generators, and 6262 load units) show that the RACLEARN framework can speed up RAC optimization by factors ranging from 2 to 4 with negligible loss in solution quality.	翻訳日:2022-11-30 17:43:10 公開日:2022-11-28
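Steps (2) and (3) of the RACLEARN pipeline described above (attach a confidence value to each commitment prediction, then keep only the high-confidence subset) can be sketched as follows. Everything specific here is an assumption made for illustration: the abstract does not say how confidence is computed, so the sketch uses the distance of a predicted on-probability from 0.5, and the function name, the threshold, and the toy predictions are hypothetical. The GNN, the feasibility repair, and the optimization solver are not modelled.

```python
def select_confident(predictions, threshold=0.95):
    """Keep only the commitment decisions whose confidence reaches the threshold.

    predictions maps (generator, hour) to a predicted probability that the
    generator is committed (on) in that hour.
    """
    fixed = {}
    for key, p_on in predictions.items():
        confidence = max(p_on, 1.0 - p_on)  # assumed confidence measure
        if confidence >= threshold:
            fixed[key] = 1 if p_on >= 0.5 else 0
    return fixed


# Hypothetical predictions for two generators; not data from MISO.
predictions = {
    ("g1", 0): 0.99,  # confidently on
    ("g1", 1): 0.60,  # uncertain, left for the optimizer to decide
    ("g2", 0): 0.02,  # confidently off
}

fixed = select_confident(predictions)
print(fixed)
```

Decisions that fall below the threshold stay free, so a downstream solver would still determine them; only the confident ones are passed on as fixed values.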
# 群数データに対する階層ベイズモデルの効率的な推定のための近似ギブズサンプリング Approximate Gibbs Sampler for Efficient Inference of Hierarchical Bayesian Models for Grouped Count Data ( http://arxiv.org/abs/2211.15771v1 ) ライセンス: Link先を確認	Jin-Zhu Yu, Hiba Baroud	(参考訳) 階層型ベイズ・ポアソン回帰モデル (HBPRMs) は予測値とカウント応答変数の関係の柔軟なモデリング手法を提供する。大規模データセットへのhbprmの適用には、ランダムサンプリングに基づく多くのモデルパラメータを推測する計算コストが高いため、効率的な推論アルゴリズムが必要である。マルコフ・チェイン・モンテカルロ (MCMC) アルゴリズムはベイジアン推論に広く用いられているが、このタイプのアルゴリズムを用いたサンプリングは、大規模なデータと時間に敏感な意思決定を行うアプリケーションには時間を要する。この制限を克服するため,推定精度を維持しつつHBPRMを効率的に学習するための近似ギブスサンプリング器(AGS)を開発した。提案したサンプリング器では,データ確率はガウス分布と近似して係数の条件付き後部が閉形式解を持つ。実データと合成データを用いた数値実験は,特に大規模データセットにおいて,最先端サンプリングアルゴリズムと比較してAGSの優れた性能を示す。 Hierarchical Bayesian Poisson regression models (HBPRMs) provide a flexible modeling approach of the relationship between predictors and count response variables. The applications of HBPRMs to large-scale datasets require efficient inference algorithms due to the high computational cost of inferring many model parameters based on random sampling. Although Markov Chain Monte Carlo (MCMC) algorithms have been widely used for Bayesian inference, sampling using this class of algorithms is time-consuming for applications with large-scale data and time-sensitive decision-making, partially due to the non-conjugacy of many models. To overcome this limitation, this research develops an approximate Gibbs sampler (AGS) to efficiently learn the HBPRMs while maintaining the inference accuracy. In the proposed sampler, the data likelihood is approximated with Gaussian distribution such that the conditional posterior of the coefficients has a closed-form solution. Numerical experiments using real and synthetic datasets with small and large counts demonstrate the superior performance of AGS in comparison to the state-of-the-art sampling algorithm, especially for large datasets.	翻訳日:2022-11-30 17:42:40 公開日:2022-11-28
# sgva-clip: 画像分類のための視覚言語モデルのセマンティック誘導視覚適応 SgVA-CLIP: Semantic-guided Visual Adapting of Vision-Language Models for Few-shot Image Classification ( http://arxiv.org/abs/2211.16191v1 ) ライセンス: Link先を確認	Fang Peng, Xiaoshan Yang, Changsheng Xu	(参考訳) 少数ショット学習では大きな進歩があったが、既存の少数ショット学習法のほとんどは、実世界のアプリケーションにおける一般化能力を制限するために、大量のベースクラスのサンプルで事前学習を監督する必要がある。近年、大規模な自己教師型視覚言語モデル(例えばCLIP)は、伝達可能な視覚表現学習のための新しいパラダイムを提供している。しかしながら、事前訓練されたvlpは、言語文によって記述が難しいが、少ないショット分類で効果的な分類法を学ぶために重要である詳細な視覚情報を無視する可能性がある。そこで本研究では,視覚固有のコントラスト損失,クロスモーダルコントラスト損失,暗黙の知識蒸留を包括的に利用することにより,視覚言語事前学習モデルを拡張し,識別的タスク特有の視覚特徴を創り出すための新しいフレームワークであるsemantic-guided visual adapting (sgva)を提案する。暗黙的知識蒸留は、細粒度のクロスモーダル知識を視覚アダプターの更新を導くために設計されている。 13のデータセットに関する最先端の成果は、適応したビジュアル機能がクロスモーダル機能を補完し、少数ショットの画像分類を改善することを証明している。 Although significant progress has been made in few-shot learning, most of existing few-shot learning methods require supervised pre-training on a large amount of samples of base classes, which limits their generalization ability in real world application. Recently, large-scale self-supervised vision-language models (e.g., CLIP) have provided a new paradigm for transferable visual representation learning. However, the pre-trained VLPs may neglect detailed visual information that is difficult to describe by language sentences, but important for learning an effective classifier in few-shot classification. To address the above problem, we propose a new framework, named Semantic-guided Visual Adapting (SgVA), which can effectively extend vision-language pre-trained models to produce discriminative task-specific visual features by comprehensively using a vision-specific contrastive loss, a cross-modal contrastive loss, and an implicit knowledge distillation. The implicit knowledge distillation is designed to transfer the fine-grained cross-modal knowledge to guide the updating of the vision adapter. 
State-of-the-art results on 13 datasets demonstrate that the adapted visual features can well complement the cross-modal features to improve few-shot image classification.	翻訳日:2022-11-30 17:24:24 公開日:2022-11-28
# 回転に注意:3D形状のための一様バックドアパターン Be Careful with Rotation: A Uniform Backdoor Pattern for 3D Shape ( http://arxiv.org/abs/2211.16192v1 ) ライセンス: Link先を確認	Linkun Fan, Fazhi He, Qing Guo, Wei Tang, Xiaolin Hong, Bing Li	(参考訳) コスト削減のために、多くのディープニューラルネットワーク(DNN)はインターネットからダウンロードされたサードパーティのデータセットでトレーニングされており、これにより攻撃者はDNNにバックドアを埋め込むことができる。 2Dドメインでは、異なる画像フォーマットの固有の構造は似ている。したがって、ある画像フォーマット用に設計されたバックドア攻撃は、他のフォーマットにも適用できる。しかし、3Dの世界では、異なる3Dデータ構造の間に大きな違いがある。その結果、ある特定の3Dデータ構造用に設計されたバックドアパターンは、同じ3Dシーンの他のデータ構造では無効になる。そこで本稿では、異種の3Dデータ構造に適応可能な一様バックドアパターン NRBdoor (Noisy Rotation Backdoor) を設計する。具体的には、まず単位回転から始めて、ノイズ生成と選択プロセスにより最適なパターンを探索する。回転は一般的な操作であり、点対のミスマッチと実世界の3Dシーンにおけるセンサキャリブレーション誤差の両方によって通常ノイズを含むため、提案するNRBdoorは自然かつ知覚不能である。 3Dメッシュとポイントクラウドでの大規模な実験により、提案するNRBdoorは、無視できる程度の形状変化で最先端の性能を達成することが示された。 For saving cost, many deep neural networks (DNNs) are trained on third-party datasets downloaded from the internet, which enables attackers to implant backdoors into DNNs. In the 2D domain, inherent structures of different image formats are similar. Hence, a backdoor attack designed for one image format will also suit others. However, when it comes to the 3D world, there is a huge disparity among different 3D data structures. As a result, a backdoor pattern designed for one certain 3D data structure will be ineffective for other data structures of the same 3D scene. Therefore, this paper designs a uniform backdoor pattern: NRBdoor (Noisy Rotation Backdoor) which is able to adapt for heterogeneous 3D data structures. Specifically, we start from the unit rotation and then search for the optimal pattern by noise generation and selection process. The proposed NRBdoor is natural and imperceptible, since rotation is a common operation which usually contains noise due to both the mismatch between a pair of points and the sensor calibration error for real-world 3D scene. Extensive experiments on 3D mesh and point cloud show that the proposed NRBdoor achieves state-of-the-art performance, with negligible shape variation.	
翻訳日:2022-11-30 17:23:59 公開日:2022-11-28
# 懸命にではなく、賢く訓練する: 少量データからの深層腹部CTレジストレーションの学習 Train smarter, not harder: learning deep abdominal CT registration on scarce data ( http://arxiv.org/abs/2211.15717v1 ) ライセンス: Link先を確認	Javier Pérez de Frutos, André Pedersen, Egidijus Pelanis, David Bouget, Shanmugapriya Survarachakan, Thomas Langø, Ole-Jakob Elle, Frank Lindseth	(参考訳) 目的: 本研究の目的は、腹部画像における畳み込みニューラルネットワークに基づく画像間レジストレーション(位置合わせ)を改善するための訓練戦略を検討することである。方法: 異なる訓練戦略、損失関数、転移学習スキームを検討した。さらに、動的な損失重み付けを可能にする損失層に加えて、人工的な訓練画像対をオンザフライで生成する拡張層も提案した。結果: 訓練段階でセグメンテーションを用いてレジストレーションを誘導することは、深層学習に基づく画像レジストレーションに有益であることが判明した。脳MRIデータセットで事前訓練したモデルを腹部CTデータセットに微調整することで、後者の性能がさらに向上し、満足な性能を得るために大規模データセットを用意する必要がなくなった。動的損失重み付けも性能をわずかに改善し、いずれも推論時の実行時間には影響しなかった。結論: 単純な概念を用いて、一般的に使用される深層画像レジストレーションアーキテクチャであるVoxelMorphの性能を改善した。今後の研究では、我々のフレームワークDDMRをさまざまなデータセットで検証し、その価値をさらに評価する必要がある。 Purpose: This study aims to explore training strategies to improve convolutional neural network-based image-to-image registration for abdominal imaging. Methods: Different training strategies, loss functions, and transfer learning schemes were considered. Furthermore, an augmentation layer which generates artificial training image pairs on-the-fly was proposed, in addition to a loss layer that enables dynamic loss weighting. Results: Guiding registration using segmentations in the training step proved beneficial for deep-learning-based image registration. Finetuning the pretrained model from the brain MRI dataset to the abdominal CT dataset further improved performance on the latter application, removing the need for a large dataset to yield satisfactory performance. Dynamic loss weighting also marginally improved performance, all without impacting inference runtime. Conclusion: Using simple concepts, we improved the performance of a commonly used deep image registration architecture, VoxelMorph. In future work, our framework, DDMR, should be validated on different datasets to further assess its value.	翻訳日:2022-11-30 17:15:25 公開日:2022-11-28
# 表形式データにおける新しいクラスの発見 Découvrir de nouvelles classes dans des données tabulaires ( http://arxiv.org/abs/2211.16352v1 ) ライセンス: Link先を確認	Colin Troisemaine, Joachim Flocon-Cholet, Stéphane Gosselin, Sandrine Vaton, Alexandre Reiffers-Masson, Vincent Lemaire	(参考訳) Novel Class Discovery (NCD) では、既知ではあるが異なるクラスからなるラベル付き集合が与えられたとき、ラベルなし集合から新しいクラスを見つけることが目的である。 NCDは最近コミュニティから注目を集めているが、表形式データは非常に一般的なデータ表現であるにもかかわらず、異種混合の表形式データ向けのフレームワークはまだ提案されていない。本稿では、表形式データから新しいクラスを発見するための新しい手法であるTabularNCDを提案する。異種の変数を含む表形式データにおいて、既知のクラスから知識を抽出し、新しいクラスの発見プロセスを導く方法を示す。このプロセスの一部は擬似ラベルを定義する新しい方法によって行われ、マルチタスク学習における最近の知見に従って共同目的関数を最適化する。本手法は、NCDが画像だけでなく異種混合の表形式データにも適用可能であることを示す。 In Novel Class Discovery (NCD), the goal is to find new classes in an unlabeled set given a labeled set of known but different classes. While NCD has recently gained attention from the community, no framework has yet been proposed for heterogeneous tabular data, despite being a very common representation of data. In this paper, we propose TabularNCD, a new method for discovering novel classes in tabular data. We show a way to extract knowledge from already known classes to guide the discovery process of novel classes in the context of tabular data which contains heterogeneous variables. A part of this process is done by a new method for defining pseudo labels, and we follow recent findings in Multi-Task Learning to optimize a joint objective function. Our method demonstrates that NCD is not only applicable to images but also to heterogeneous tabular data.	翻訳日:2022-11-30 17:07:21 公開日:2022-11-28
# ディープラーニングフレームワークにおけるライブラリの利用と依存に関する実証的研究 An Empirical Study of Library Usage and Dependency in Deep Learning Frameworks ( http://arxiv.org/abs/2211.15733v1 ) ライセンス: Link先を確認	Mohamed Raed El aoun, Lionel Nganyewou Tidjon, Ben Rombaut, Foutse Khomh, Ahmed E. Hassan	(参考訳) ディープラーニング(dl)の最近の進歩は、最先端のディープニューラルネットワーク(DNN)の開発とデプロイにおいて機械学習(ml)実践者を支援するために、pytorch、Caffe、TensorFlowなどのいくつかのdlソフトウェアライブラリがリリースされたが、テストやデータ処理などのdlライブラリの制限に適切に対処することはできない。本稿では、最も頻繁なdlライブラリの組み合わせの質的かつ定量的な分析、mlワークフロー全体にわたるdlライブラリ依存性の分布、および一連のレコメンデーションの定式化について述べる。 (i)より最適化されたアクセラレーターのためのハードウェアビルダー (ii) より洗練された将来のリリースのためのライブラリビルダー。本研究は1,484のオープンソースdlプロジェクトに基づいており,46,110人のコントリビューターが評価に基づいて選出されている。まず,深層学習ライブラリの利用が増加傾向にあった。第2に,ディープラーニングライブラリの利用パターンをいくつか紹介する。さらに、dlライブラリと最も頻繁なコンビネーション間の依存関係を特定し、pytorchとscikit-learn、kerasとtensorflowが18%と14%のプロジェクトでもっとも頻繁なコンビネーションであることが分かりました。開発者は同じプロジェクトで2、3のdlライブラリを使用し、同じ関数と同じファイルの両方で異なる複数のdlライブラリを使用する傾向がある。開発者は、さまざまなディープラーニングライブラリの使用パターンを示し、より少ない引数と直接的な目標を持つ単純な関数を好む。最後に, 研究者, ライブラリメンテナ, ハードウェアベンダに対して, 調査結果の意義について述べる。 Recent advances in deep learning (dl) have led to the release of several dl software libraries such as pytorch, Caffe, and TensorFlow, in order to assist machine learning (ml) practitioners in developing and deploying state-of-the-art deep neural networks (DNN), but they are not able to properly cope with limitations in the dl libraries such as testing or data processing. In this paper, we present a qualitative and quantitative analysis of the most frequent dl libraries combination, the distribution of dl library dependencies across the ml workflow, and formulate a set of recommendations to (i) hardware builders for more optimized accelerators and (ii) library builder for more refined future releases. Our study is based on 1,484 open-source dl projects with 46,110 contributors selected based on their reputation. First, we found an increasing trend in the usage of deep learning libraries. Second, we highlight several usage patterns of deep learning libraries. 
In addition, we identify dependencies between dl libraries and the most frequent combinations, where we discover that pytorch with Scikit-learn and Keras with TensorFlow are the most frequent combinations, in 18% and 14% of the projects, respectively. Developers use two or three dl libraries in the same project and tend to use multiple different dl libraries in both the same function and the same file. Developers show patterns in using various deep-learning libraries and prefer simple functions with fewer arguments and straightforward goals. Finally, we present the implications of our findings for researchers, library maintainers, and hardware vendors.	翻訳日:2022-11-30 17:06:50 公開日:2022-11-28
# 特徴とセマンティクスの二重一致による深い半教師付き学習 Deep Semi-supervised Learning with Double-Contrast of Features and Semantics ( http://arxiv.org/abs/2211.15671v1 ) ライセンス: Link先を確認	Quan Feng, Jiayu Yao, Zhison Pan, Guojun Zhou	(参考訳) 近年、インテリジェントトランスポートシステム(ITS)の分野は、大量のアノテーションデータによって大きな成功を収めている。しかし、これらの注釈付きデータを取得するには、実際のコストがかかる必要がある。したがって、より現実的な戦略は、少量のラベル付きデータと大量のラベルなしデータで半教師付き学習(SSL)を活用することである。典型的には、意味整合性規則化と特徴抽出と分類を分離する2段階学習法が有効であることが証明されている。それにもかかわらず、意味的一貫性の正規化のみに限定された表現学習は、異なる意味論を持つサンプルの表現の分離や判別性を保証するものではない。以上の欠点に対処するため,本論文では,正と負の強化サンプルペアのセマンティクス/特徴を対比することにより,効果的なタスク固有の識別特徴を抽出する,意味と特徴の両立を両立する深層半教師付き学習手法を提案する。さらに,情報理論を用いて意味論と特徴の二重コントラストの合理性を説明し,slack相互情報をより単純な方法でコントラスト損失を説明する。最後に,本手法の有効性をベンチマークデータセットで検証した。 In recent years, the field of intelligent transportation systems (ITS) has achieved remarkable success, which is mainly due to the large amount of available annotation data. However, obtaining these annotated data has to afford expensive costs in reality. Therefore, a more realistic strategy is to leverage semi-supervised learning (SSL) with a small amount of labeled data and a large amount of unlabeled data. Typically, semantic consistency regularization and the two-stage learning methods of decoupling feature extraction and classification have been proven effective. Nevertheless, representation learning only limited to semantic consistency regularization may not guarantee the separation or discriminability of representations of samples with different semantics; due to the inherent limitations of the two-stage learning methods, the extracted features may not match the specific downstream tasks. In order to deal with the above drawbacks, this paper proposes an end-to-end deep semi-supervised learning double contrast of semantic and feature, which extracts effective tasks specific discriminative features by contrasting the semantics/features of positive and negative augmented samples pairs. 
Moreover, we leverage information theory to explain the rationality of double contrast of semantics and features and slack mutual information to contrastive loss in a simpler way. Finally, the effectiveness of our method is verified in benchmark datasets.	翻訳日:2022-11-30 16:59:39 公開日:2022-11-28
# PyTorch Adapt PyTorch Adapt ( http://arxiv.org/abs/2211.15673v1 ) ライセンス: Link先を確認	Kevin Musgrave, Serge Belongie, Ser-Nam Lim	(参考訳) PyTorch Adaptは、ドメイン適応のためのライブラリである。ドメイン適応とは、既存のモデルを新しいドメインで動作するよう転用する機械学習アルゴリズムの一種である。これはフル機能のツールキットであり、ユーザは数行のコードで完全な訓練/テストパイプラインを作成できる。モジュール式でもあるため、ユーザは必要な部分だけをインポートでき、フレームワークにロックインされる心配がない。このライブラリを特徴づける点の1つはカスタマイズ性である。特に、合成可能で遅延評価されるフックのシステムのおかげで、複雑な訓練アルゴリズムを容易に修正し組み合わせることができる。本技術報告では、これらの特徴とライブラリ全体の設計について詳しく説明する。コードは https://www.github.com/KevinMusgrave/pytorch-adapt で入手できる。 PyTorch Adapt is a library for domain adaptation, a type of machine learning algorithm that re-purposes existing models to work in new domains. It is a fully-featured toolkit, allowing users to create a complete train/test pipeline in a few lines of code. It is also modular, so users can import just the parts they need, and not worry about being locked into a framework. One defining feature of this library is its customizability. In particular, complex training algorithms can be easily modified and combined, thanks to a system of composable, lazily-evaluated hooks. In this technical report, we explain in detail these features and the overall design of the library. Code is available at https://www.github.com/KevinMusgrave/pytorch-adapt	翻訳日:2022-11-30 16:59:16 公開日:2022-11-28
# eXplainable Machine LearningとKelly Indexによるサッカーの試合結果の予測 Predicting Football Match Outcomes with eXplainable Machine Learning and the Kelly Index ( http://arxiv.org/abs/2211.15734v1 ) ライセンス: Link先を確認	Yiming Ren and Teo Susnjak	(参考訳) 本研究では、サッカーの試合結果を予測するための機械学習手法を開発した。本研究の新規性は、Kelly Indexを利用して、まず試合を予測難易度の異なるカテゴリに分類する点にある。このアプローチの有効性を判断するため、試合の各カテゴリに対して、幅広いアルゴリズム群を用いた分類モデルを開発した。あわせて、Eloベースの変数を含む、これまで検討されていなかった一連の特徴量を設計した。データセットは2019-2021シーズンのプレミアリーグの試合データに由来する。その結果、予測問題をサブタスクに分解するプロセスは効果的で、先行研究と競合する結果が得られ、アンサンブルベースの手法が最も効果的であった。また、ブックメーカーのオッズに対してベンチマークすることで有効性を評価するための投資戦略も考案した。Kelly Indexと予測モデルの事前定義された信頼しきい値を組み合わせることでリスクを最小化する手法を開発した。実験の結果、予測モデルが高い信頼度を示す予測しやすい試合に主眼を置く保守的なアプローチに従えば、提案戦略は利益を返しうることがわかった。 In this work, a machine learning approach is developed for predicting the outcomes of football matches. The novelty of this research lies in the utilisation of the Kelly Index to first classify matches into categories where each one denotes the different levels of predictive difficulty. Classification models using a wide suite of algorithms were developed for each category of matches in order to determine the efficacy of the approach. In conjunction with this, a set of previously unexplored features were engineered, including Elo-based variables. The dataset originated from the Premier League match data covering the 2019-2021 seasons. The findings indicate that the process of decomposing the predictive problem into sub-tasks was effective and produced competitive results with prior works, while the ensemble-based methods were the most effective. The paper also devised an investment strategy in order to evaluate its effectiveness by benchmarking against bookmaker odds. An approach was developed that minimises risk by combining the Kelly Index with the predefined confidence thresholds of the predictive models. 
The experiments found that the proposed strategy can return a profit when following a conservative approach that focuses primarily on easy-to-predict matches where the predictive models display a high confidence level.	翻訳日:2022-11-30 16:59:05 公開日:2022-11-28
# プライバシー規制緩和下におけるニューラルネットベースの差分プライベート表形式訓練データ合成器の実用性回復不能性について On the Utility Recovery Incapability of Neural Net-based Differential Private Tabular Training Data Synthesizer under Privacy Deregulation ( http://arxiv.org/abs/2211.15809v1 ) ライセンス: Link先を確認	Yucong Liu, Chi-Hua Wang, Guang Cheng	(参考訳) 生成モデルのプライバシーと実用性のトレードオフを監査する手順を考案することは、実用上重要でありながら未解決の問題である。既存の研究は、合成データで訓練し実データでテストする(train on synthetic, test on real)パラダイムにおける実用性の低下という観点から、プライバシー制約の副作用を調査することに集中している。我々は、プライバシー規制緩和が合成訓練データの実用性に与える副作用を観察することで、プライバシーと実用性のトレードオフに関する理解を次の段階へ進める。驚くべきことに、DP-CTGANとPATE-CTGANはプライバシー規制緩和下で実用性を回復できないことを発見し、その実用への懸念が生じた。主なメッセージは、プライバシー規制緩和が必ずしも実用性の回復を意味しないということである。 Devising procedures for auditing generative model privacy-utility tradeoff is an important yet unresolved problem in practice. Existing work concentrates on investigating the privacy constraint side effect in terms of utility degradation of the train on synthetic, test on real paradigm of synthetic data training. We push such understanding on privacy-utility tradeoff to the next level by observing the privacy deregulation side effect on synthetic training data utility. Surprisingly, we discover the Utility Recovery Incapability of DP-CTGAN and PATE-CTGAN under privacy deregulation, raising concerns on their practical applications. The main message is Privacy Deregulation does NOT always imply Utility Recovery.	翻訳日:2022-11-30 16:58:46 公開日:2022-11-28
# 動的応力予測のための物理インフォームドニューラルネットワーク Physics Informed Neural Network for Dynamic Stress Prediction ( http://arxiv.org/abs/2211.16190v1 ) ライセンス: Link先を確認	Hamed Bolandi, Gautam Sreekumar, Xuyang Li, Nizar Lajnef, Vishnu Naresh Boddeti	(参考訳) 構造物の破壊は、地震や風などの壊滅的な事象によって引き起こされることが多い。そのため、破壊的な事象の最中に動的応力分布をリアルタイムで予測することが重要である。有限要素モデル(FEM)のような現在利用可能な高忠実度手法は、本質的に複雑性が高いという問題を抱えている。そこで、精度を維持しつつ計算コストを削減するため、偏微分方程式(PDE)ソルバを用いた有限要素シミュレーションに基づいて応力分布の全系列を予測する、物理インフォームドニューラルネットワーク(PINN)であるPINN-Stressモデルを提案する。自動微分を用いて、深層ニューラルネットワークの損失関数にPDEを埋め込み、測定値とPDEからの情報を取り込む。 PINN-Stressモデルは、ほぼリアルタイムで応力分布の系列を予測でき、PINNを用いないモデルよりも良く汎化できる。 Structural failures are often caused by catastrophic events such as earthquakes and winds. As a result, it is crucial to predict dynamic stress distributions during highly disruptive events in real time. Currently available high-fidelity methods, such as Finite Element Models (FEMs), suffer from their inherent high complexity. Therefore, to reduce computational cost while maintaining accuracy, a Physics Informed Neural Network (PINN), PINN-Stress model, is proposed to predict the entire sequence of stress distribution based on Finite Element simulations using a partial differential equation (PDE) solver. Using automatic differentiation, we embed a PDE into a deep neural network's loss function to incorporate information from measurements and PDEs. The PINN-Stress model can predict the sequence of stress distribution in almost real-time and can generalize better than the model without PINN.	翻訳日:2022-11-30 16:57:21 公開日:2022-11-28
# RGBシーケンスからの手動3次元オブジェクトスキャン In-Hand 3D Object Scanning from an RGB Sequence ( http://arxiv.org/abs/2211.16193v1 ) ライセンス: Link先を確認	Shreyas Hampali, Tomas Hodan, Luan Tran, Lingni Ma, Cem Keskin, Vincent Lepetit	(参考訳) カラー画像のシーケンスから未知の物体を3次元的に走査する手法を提案する。本研究では, 物体表面を多視点画像から再構成する手法として, 物体の形状と外観の両方を捉えるニューラル暗示表面表現に頼っている。多くのNeRF方式とは対照的に、カメラ対象の相対的なポーズが知られているとは仮定せず、オブジェクト形状とポーズ軌道の両方を同時に最適化する。全ての形状とポーズパラメータのグローバルな最適化は、ポーズの粗い初期化なしに失敗しがちであるので、最適化が成功する可能性のある、慎重に選択された重複セグメントにシーケンスを分割することから始まる漸進的なアプローチを提案する。物体形状を漸進的に再構成し,各セグメント内で独立して物体のポーズを追跡し,その後,重なり合うフレームで推定されるポーズを調整して全てのセグメントをマージする。最後に,全セグメントに対してグローバルな最適化を行い,完全な再構築を実現する。提案手法は,テクスチャと難解なテクスチャレス物体の形状と色を再現し,外観のみに依存する古典的手法よりも優れており,その性能は既知のカメラのポーズを仮定する最近の手法に近いことを示す。 We propose a method for in-hand 3D scanning of an unknown object from a sequence of color images. We cast the problem as reconstructing the object surface from un-posed multi-view images and rely on a neural implicit surface representation that captures both the geometry and the appearance of the object. By contrast with most NeRF-based methods, we do not assume that the camera-object relative poses are known and instead simultaneously optimize both the object shape and the pose trajectory. As global optimization over all the shape and pose parameters is prone to fail without coarse-level initialization of the poses, we propose an incremental approach which starts by splitting the sequence into carefully selected overlapping segments within which the optimization is likely to succeed. We incrementally reconstruct the object shape and track the object poses independently within each segment, and later merge all the segments by aligning poses estimated at the overlapping frames. Finally, we perform a global optimization over all the aligned segments to achieve full reconstruction. 
We experimentally show that the proposed method is able to reconstruct the shape and color of both textured and challenging texture-less objects, outperforms classical methods that rely only on appearance features, and its performance is close to recent methods that assume known camera poses.	翻訳日:2022-11-30 16:41:05 公開日:2022-11-28
# SLAN: クロスモーダル理解のためのセルフロケータ支援ネットワーク SLAN: Self-Locator Aided Network for Cross-Modal Understanding ( http://arxiv.org/abs/2211.16208v1 ) ライセンス: Link先を確認	Jiang-Tian Zhai, Qi Zhang, Tong Wu, Xing-Yu Chen, Jiang-Jiang Liu, Bo Ren, Ming-Ming Cheng	(参考訳) 視覚と言語の間のきめ細かい相互作用を学ぶことで、VisionLanguageタスクをより正確に理解できます。しかし、セマンティックアライメントのためのテキストに従ってキー画像領域を抽出することは依然として困難である。既存のほとんどの作品は、凍結検知器で得られたテキスト診断や冗長な領域によって制限されているか、あるいは事前の検出器へのわずかな接地(金)データに大きく依存しているため、さらにスケールできない。これらの問題を解決するために,ゴールドデータなしでクロスモーダル理解タスクを行うセルフロケータ支援ネットワーク (slan, self-locator aided network) を提案する。 SLANは、異なるテキストで条件付けられた関心領域をローカライズするための領域フィルタと領域アダプタで構成される。クロスモーダル情報を集約することにより、領域フィルタはキー領域を選択し、領域適応子はテキストガイダンスで座標を更新する。詳細な領域単語アライメントにより、SLANは多くの下流タスクに簡単に一般化できる。 5つのクロスモーダル理解タスク(例えば、coco画像からテキストへの変換とテキストから画像への検索において85.7%と69.2%)において、かなり競争力のある結果が得られる。 SLANはまた、2つのローカライゼーションタスクに強いゼロショットと微調整の転送可能性を示す。 Learning fine-grained interplay between vision and language allows to a more accurate understanding for VisionLanguage tasks. However, it remains challenging to extract key image regions according to the texts for semantic alignments. Most existing works are either limited by textagnostic and redundant regions obtained with the frozen detectors, or failing to scale further due to its heavy reliance on scarce grounding (gold) data to pre-train detectors. To solve these problems, we propose Self-Locator Aided Network (SLAN) for cross-modal understanding tasks without any extra gold data. SLAN consists of a region filter and a region adaptor to localize regions of interest conditioned on different texts. By aggregating cross-modal information, the region filter selects key regions and the region adaptor updates their coordinates with text guidance. With detailed region-word alignments, SLAN can be easily generalized to many downstream tasks. It achieves fairly competitive results on five cross-modal understanding tasks (e.g., 85.7% and 69.2% on COCO image-to-text and text-to-image retrieval, surpassing previous SOTA methods). 
SLAN also demonstrates strong zero-shot and fine-tuned transferability to two localization tasks.	翻訳日:2022-11-30 16:40:19 公開日:2022-11-28
# pids: 3次元点雲のコネクテッドポイントインタラクション・ディメンション探索 PIDS: Joint Point Interaction-Dimension Search for 3D Point Cloud ( http://arxiv.org/abs/2211.15759v1 ) ライセンス: Link先を確認	Tunhou Zhang, Mingyuan Ma, Feng Yan, Hai Li, Yiran Chen	(参考訳) 点の相互作用と次元は、階層的3dモデルを提供する点作用素を設計する上で重要な軸である。しかし、この2つの軸は異質であり、完全な探査は困難である。既存のワークスクラフトポイント演算子を1軸下に置き、3Dモデルのすべての部分でクラフトスクラフト演算子を再利用する。これは、3次元点雲の様々な幾何学的・密度を活用し、点相互作用と次元をより良く結合する機会を見下ろす。本研究では,点間相互作用と点次元を共同で探索し,点クラウドデータのセマンティックセグメンテーションを提供する新しいパラダイムであるPIDSを確立する。我々は多目的点相互作用と点次元を共同で検討する大規模な探索空間を確立する。これは様々な幾何学・密度を考慮した点演算子をサポートする。ヘテロジニアスな検索コンポーネントを持つ拡張された検索空間は、候補モデルのより優れたランキングを求める。そこで我々は,予測器をベースとしたニューラルアーキテクチャ探索(NAS)を活用して探索空間の探索を改良し,それ以前の特徴に基づいて,一意のエンコーディングを異種検索コンポーネントに割り当てることで予測品質を向上させる。本研究では,2つのセマンティックセグメンテーション・ベンチマークを用いてPIDSが作成したネットワークを徹底的に評価し,SemanticKITTIとS3DISの3Dモデルに対して約1%のmIOU改善を示した。 The interaction and dimension of points are two important axes in designing point operators to serve hierarchical 3D models. Yet, these two axes are heterogeneous and challenging to fully explore. Existing works craft point operator under a single axis and reuse the crafted operator in all parts of 3D models. This overlooks the opportunity to better combine point interactions and dimensions by exploiting varying geometry/density of 3D point clouds. In this work, we establish PIDS, a novel paradigm to jointly explore point interactions and point dimensions to serve semantic segmentation on point cloud data. We establish a large search space to jointly consider versatile point interactions and point dimensions. This supports point operators with various geometry/density considerations. The enlarged search space with heterogeneous search components calls for a better ranking of candidate models. To achieve this, we improve the search space exploration by leveraging predictor-based Neural Architecture Search (NAS), and enhance the quality of prediction by assigning unique encoding to heterogeneous search components based on their priors. 
We thoroughly evaluate the networks crafted by PIDS on two semantic segmentation benchmarks, showing ~1% mIOU improvement on SemanticKITTI and S3DIS over state-of-the-art 3D models.	翻訳日:2022-11-30 16:23:53 公開日:2022-11-28
# 3次元シーンインスタンスセグメンテーションのためのスーパーポイントトランスフォーマー Superpoint Transformer for 3D Scene Instance Segmentation ( http://arxiv.org/abs/2211.15766v1 ) ライセンス: Link先を確認	Jiahao Sun, Chunmei Qing, Junpeng Tan, Xiangmin Xu	(参考訳) 既存のほとんどのメソッドは、3Dオブジェクト検出や3Dセマンティックセマンティックセマンティックセマンティクスに使用されるモデルを拡張して3Dインスタンスセマンティクスを実現する。しかし、これらの非ストレートフォワード法には2つの欠点がある。 1) 境界ボックスや不十分な意味予測は、3dインスタンスのセグメンテーションフレームワーク全体のパフォーマンスを制限する。 2) 既存の手法では, 集約に要する時間を要する。そこで本研究では,SPFormer という名称の Superpoint Transformer に基づく,エンドツーエンドの3Dインスタンスセグメンテーション手法を提案する。ポイントクラウドから潜在的な機能をスーパーポイントにグループ化し、オブジェクト検出やセマンティクスセグメンテーションの結果に頼ることなく、クエリベクトルを通じてインスタンスを直接予測する。このフレームワークの重要なステップは、スーパーポイントのクロスアテンション機構を通じてインスタンス情報をキャプチャし、インスタンスのスーパーポイントマスクを生成することができるトランスフォーマーを備えた新しいクエリデコーダである。スーパーポイントマスクに基づく2部マッチングにより、spformerは中間集約ステップなしでネットワークトレーニングを実行でき、ネットワークを高速化できる。 ScanNetv2 と S3DIS ベンチマークの広範囲な実験により,提案手法は簡潔で効率的であることが確認された。特にSPFormerは、mAPの点でScanNetv2の隠れテストセットを4.3%上回り、高速な推論速度(247ms/フレーム)を同時に維持する。コードはhttps://github.com/sunjiahao 1999/SPFormerで入手できる。 Most existing methods realize 3D instance segmentation by extending those models used for 3D object detection or 3D semantic segmentation. However, these non-straightforward methods suffer from two drawbacks: 1) Imprecise bounding boxes or unsatisfactory semantic predictions limit the performance of the overall 3D instance segmentation framework. 2) Existing method requires a time-consuming intermediate step of aggregation. To address these issues, this paper proposes a novel end-to-end 3D instance segmentation method based on Superpoint Transformer, named as SPFormer. It groups potential features from point clouds into superpoints, and directly predicts instances through query vectors without relying on the results of object detection or semantic segmentation. The key step in this framework is a novel query decoder with transformers that can capture the instance information through the superpoint cross-attention mechanism and generate the superpoint masks of the instances. 
Through bipartite matching based on superpoint masks, SPFormer can implement the network training without the intermediate aggregation step, which accelerates the network. Extensive experiments on ScanNetv2 and S3DIS benchmarks verify that our method is concise yet efficient. Notably, SPFormer exceeds compared state-of-the-art methods by 4.3% on ScanNetv2 hidden test set in terms of mAP and keeps fast inference speed (247ms per frame) simultaneously. Code is available at https://github.com/sunjiahao1999/SPFormer.	翻訳日:2022-11-30 16:23:29 公開日:2022-11-28
# リモートセンシングにおける画像とラベル解像度のミスマッチ処理 Handling Image and Label Resolution Mismatch in Remote Sensing ( http://arxiv.org/abs/2211.15790v1 ) ライセンス: Link先を確認	Scott Workman, Armin Hadzic, M. Usman Rafique	(参考訳) セマンティックセグメンテーションは視覚文学において深く研究されてきたが、リモートセンシング領域ではユニークな課題が残っている。そのような課題の1つは、地上サンプル距離の違いによるオーバーヘッド画像と地上ラベルソースとの解像度ミスマッチの処理方法である。この問題を説明するために、我々は新しいデータセットを導入し、既存の戦略に固有の弱点を示すために使用します。代わりに、(アップサンプリングなしで)低解像度ラベルを使用して監督されるが、学習プロセスを導くために、高解像度ラベルの例示セットを利用する方法を提案する。本手法は,高分解能アノテーションを必要とせず,領域集約,逆学習,自己教師付き事前学習を組み込んだ細粒度予測手法である。大規模な実験は、我々のアプローチの現実的な適用性を実証している。 Though semantic segmentation has been heavily explored in vision literature, unique challenges remain in the remote sensing domain. One such challenge is how to handle resolution mismatch between overhead imagery and ground-truth label sources, due to differences in ground sample distance. To illustrate this problem, we introduce a new dataset and use it to showcase weaknesses inherent in existing strategies that naively upsample the target label to match the image resolution. Instead, we present a method that is supervised using low-resolution labels (without upsampling), but takes advantage of an exemplar set of high-resolution labels to guide the learning process. Our method incorporates region aggregation, adversarial learning, and self-supervised pretraining to generate fine-grained predictions, without requiring high-resolution annotations. Extensive experiments demonstrate the real-world applicability of our approach.	翻訳日:2022-11-30 16:23:04 公開日:2022-11-28
# 半正定値計画法によるk-meansクラスタリングへのsketch-and-solveアプローチ Sketch-and-solve approaches to k-means clustering by semidefinite programming ( http://arxiv.org/abs/2211.15744v1 ) ライセンス: Link先を確認	Charles Clum, Dustin G. Mixon, Soledad Villar, Kaiying Xie	(参考訳) 本稿では、k-meansクラスタリングのPeng-Wei半正定値緩和を高速化するsketch-and-solveアプローチを提案する。データが適切に分離されている場合、k-meansの最適クラスタリングを特定する。そうでない場合、我々のアプローチは最適k-means値に対する高信頼な下界を与える。この下界はデータ駆動であり、データやその生成方法について何も仮定しない。コードと、この手法を用いてk-means++で得られたクラスタリング解の近似最適性を証明する広範な数値実験を提供する。 We introduce a sketch-and-solve approach to speed up the Peng-Wei semidefinite relaxation of k-means clustering. When the data is appropriately separated we identify the k-means optimal clustering. Otherwise, our approach provides a high-confidence lower bound on the optimal k-means value. This lower bound is data-driven; it does not make any assumption on the data nor how it is generated. We provide code and an extensive set of numerical experiments where we use this approach to certify approximate optimality of clustering solutions obtained by k-means++.	翻訳日:2022-11-30 16:14:21 公開日:2022-11-28
# 抽象的解釈による議論的マルチエージェントにおける意味構造保存に向けて Towards Preserving Semantic Structure in Argumentative Multi-Agent via Abstract Interpretation ( http://arxiv.org/abs/2211.15782v1 ) ライセンス: Link先を確認	Minal Suresh Patil	(参考訳) 近年の20年間で、知識表現、推論、マルチエージェントシステムの分野で議論が注目されている。しかし、動的マルチエージェントシステムの議論は、表現複雑性と計算コストの犠牲となるエージェントによって生成される重要な議論の問題に直面する。本研究では,システム内の意味的フロー構造を保ちながら,複数の議論が同一位置を様々な視点から守ろうとしているモデルチェックの観点から,抽象化の概念を検討することを目的としている。 Over the recent twenty years, argumentation has received considerable attention in the fields of knowledge representation, reasoning, and multi-agent systems. However, argumentation in dynamic multi-agent systems encounters the problem of significant arguments generated by agents, which comes at the expense of representational complexity and computational cost. In this work, we aim to investigate the notion of abstraction from the model-checking perspective, where several arguments are trying to defend the same position from various points of view, thereby reducing the size of the argumentation framework whilst preserving the semantic flow structure in the system.	翻訳日:2022-11-30 16:14:12 公開日:2022-11-28
# H3WB: Human3.6M 3D全身データセットとベンチマーク H3WB: Human3.6M 3D WholeBody Dataset and Benchmark ( http://arxiv.org/abs/2211.15692v1 ) ライセンス: Link先を確認	Yue Zhu, Nermin Samet, David Picard	(参考訳) 3D人体全身ポーズ推定は、顔、手、体、足を含む人体全体の正確な3Dキーポイントを特定することを目的としている。完全にアノテーションされた大規模な3D全身データセットがないため、一般的なアプローチは、特定の身体部位専用のデータセットで複数のディープネットワークを個別に訓練し、推論時にそれらを組み合わせることである。このアプローチは、使用する各データセットのバイアスが異なるため、訓練と推論のパイプラインが複雑になる。また、共通のベンチマークがないため、異なる手法を比較することが難しい。これらの問題に対処するため、COCO Wholebodyレイアウトを用いてHuman3.6Mデータセットに全身アノテーションを提供するHuman3.6M 3D WholeBody (H3WB)を導入する。 H3WBは、10万枚の画像に133個の全身キーポイントアノテーションを付与した大規模データセットであり、我々の新しいマルチビューパイプラインによって実現した。 H3WBとともに3つのタスクを提案する: i) 完全な2D全身ポーズからの3D全身ポーズリフティング、ii) 不完全な2D全身ポーズからの3D全身ポーズリフティング、iii) 単一のRGB画像からの3D全身ポーズ推定。また、これらのタスクに対する一般的な手法のベースラインをいくつか報告する。データセットは https://github.com/wholebody3d/wholebody3d で公開されている。 3D human whole-body pose estimation aims to localize precise 3D keypoints on the entire human body, including the face, hands, body, and feet. Due to the lack of a large-scale fully annotated 3D whole-body dataset, a common approach has been to train several deep networks separately on datasets dedicated to specific body parts, and combine them during inference. This approach suffers from complex training and inference pipelines because of the different biases in each dataset used. It also lacks a common benchmark which makes it difficult to compare different methods. To address these issues, we introduce Human3.6M 3D WholeBody (H3WB) which provides whole-body annotations for the Human3.6M dataset using the COCO Wholebody layout. H3WB is a large scale dataset with 133 whole-body keypoint annotations on 100K images, made possible by our new multi-view pipeline. Along with H3WB, we propose 3 tasks: i) 3D whole-body pose lifting from 2D complete whole-body pose, ii) 3D whole-body pose lifting from 2D incomplete whole-body pose, iii) 3D whole-body pose estimation from a single RGB image. We also report several baselines from popular methods for these tasks. 
The dataset is publicly available at https://github.com/wholebody3d/wholebody3d.	翻訳日:2022-11-30 16:12:19 公開日:2022-11-28
# 拡散モデルによる後学習量子化 Post-training Quantization on Diffusion Models ( http://arxiv.org/abs/2211.15736v1 ) ライセンス: Link先を確認	Yuzhang Shang, Zhihang Yuan, Bin Xie, Bingzhe Wu, Yan Yan	(参考訳) denoising diffusion (score-based) 生成モデルは最近、現実的で多様なデータを生成することで大きな成果を上げている。これらの手法は、データをノイズに変換する前方拡散プロセスと、ノイズからデータをサンプリングする後方デノナイジングプロセスを定義する。残念ながら、現在のデノナイジング拡散モデルの生成プロセスは、面倒なニューラルネットワークに依存する長い反復的なノイズ推定のため、明らかに遅い。これは拡散モデルが特にエッジデバイスに広く展開されることを防ぐ。従来の研究は、短いが効果的なサンプリング軌道を見つけることによって拡散モデル(DM)の生成を加速した。しかし、各イテレーションで重ネットワークによるノイズ推定のコストを見落としている。本研究では,雑音推定ネットワークの圧縮の観点から生成を高速化する。 DMの再トレーニングの難しさから,主流のトレーニング対応圧縮パラダイムを除外し,DMアクセラレーションにPTQを導入している。しかし、ノイズ推定ネットワークの出力分布は時間とともに変化するため、従来のPTQ手法は単一ステップのシナリオ用に設計されているため、DMではフェールする。 DM固有のPTQ法を考案するために、定量化演算、キャリブレーションデータセット、キャリブレーションメトリックの3つの側面で、DM上のPTQを探索する。本手法を定式化するために全包括的調査から得られたいくつかの観測結果の要約と利用,特にDMの多段階構造を対象とする。実験では,完全な精度dmsを8ビットモデルへ直接定量化し,その性能を無訓練で維持・改善することができる。重要なことに,本手法はDDIMなどの他の高速サンプリング手法のプラグアンドプレイモジュールとして機能する。 Denoising diffusion (score-based) generative models have recently achieved significant accomplishments in generating realistic and diverse data. These approaches define a forward diffusion process for transforming data into noise and a backward denoising process for sampling data from noise. Unfortunately, the generation process of current denoising diffusion models is notoriously slow due to the lengthy iterative noise estimations, which rely on cumbersome neural networks. It prevents the diffusion models from being widely deployed, especially on edge devices. Previous works accelerate the generation process of diffusion model (DM) via finding shorter yet effective sampling trajectories. However, they overlook the cost of noise estimation with a heavy network in every iteration. In this work, we accelerate generation from the perspective of compressing the noise estimation network. Due to the difficulty of retraining DMs, we exclude mainstream training-aware compression paradigms and introduce post-training quantization (PTQ) into DM acceleration. 
However, the output distributions of noise estimation networks change with the time step, causing previous PTQ methods to fail on DMs because they are designed for single-time-step scenarios. To devise a DM-specific PTQ method, we explore PTQ on DMs in three aspects: quantized operations, calibration dataset, and calibration metric. We summarize and use several observations derived from all-inclusive investigations to formulate our method, which especially targets the unique multi-time-step structure of DMs. Experimentally, our method can directly quantize full-precision DMs into 8-bit models while maintaining or even improving their performance in a training-free manner. Importantly, our method can serve as a plug-and-play module for other fast-sampling methods, e.g., DDIM.	翻訳日:2022-11-30 16:11:57 公開日:2022-11-28
# CoNAL: 大規模言語モデルによるアウトリーチの予測 CoNAL: Anticipating Outliers with Large Language Models ( http://arxiv.org/abs/2211.15718v1 ) ライセンス: Link先を確認	Albert Xu, Xiang Ren, and Robin Jia	(参考訳) 多くのタスク設定において、テキスト分類モデルは、正しく予測できない新しいクラスの例に遭遇する可能性が高い。モデルが低信頼の例に固執する選択的予測は可能な解決策を提供するが、既存のモデルはしばしばoodの例に過度に自信を持っている。この過度な自信を補うために,新しいクラスを代表するOOD例を生成する2段階の手法であるContrastive Novelty-Augmented Learning (CoNAL)を導入し,その信頼性を低下させる訓練を行った。まず、大きな言語モデルを2回促すことでoodの例を生成します。関連する新しいラベルを列挙するように促し、タスクフォーマットにマッチする各新規クラスから例を生成します。第2に,生成されたood例に対する信頼度をトレーニング例よりも低くする,新たなコントラスト目標で分類器をトレーニングする。 CoNALで訓練すると、分類器は4つのNLPデータセットに対して平均2.3%のAUACと5.5%のAUROCでOODのサンプルを検出し、吸収する能力を向上させる。 In many task settings, text classification models are likely to encounter examples from novel classes on which they cannot predict correctly. Selective prediction, in which models abstain on low-confidence examples, provides a possible solution, but existing models are often overly confident on OOD examples. To remedy this overconfidence, we introduce Contrastive Novelty-Augmented Learning (CoNAL), a two-step method that generates OOD examples representative of novel classes, then trains to decrease confidence on them. First, we generate OOD examples by prompting a large language model twice: we prompt it to enumerate relevant novel labels, then generate examples from each novel class matching the task format. Second, we train our classifier with a novel contrastive objective that encourages lower confidence on generated OOD examples than training examples. When trained with CoNAL, classifiers improve in their ability to detect and abstain on OOD examples over prior methods by an average of 2.3% AUAC and 5.5% AUROC across 4 NLP datasets, with no cost to in-distribution accuracy.	翻訳日:2022-11-30 15:55:05 公開日:2022-11-28
# 言語学習項目のための制御言語生成 Controlled Language Generation for Language Learning Items ( http://arxiv.org/abs/2211.15731v1 ) ライセンス: Link先を確認	Kevin Stowe, Debanjan Ghosh, Mengxuan Zhao	(参考訳) この研究は、英語学習アプリケーションのためのアイテムを迅速に生成するために、自然言語生成(nlg)を採用することを目的としている。本研究は,言語学習に関連する要素の項目を制御する新しい手法である,習熟度が異なる多様な文と文法テストのための引数構造を考案した。ヒトの評価では、全てのモデル(4点満点中3.4以上)で高い文法スコアを示し、高度な熟練度モデルのベースラインよりも高い長さ(24%)と複雑さ(9%)を示す。その結果,個々のユーザに対して多様でカスタマイズされたコンテンツを保証するためのコントロールを追加して,強力なパフォーマンスを実現することができた。 This work aims to employ natural language generation (NLG) to rapidly generate items for English language learning applications: this requires both language models capable of generating fluent, high-quality English and control over the generated output so that it matches the requirements of the relevant items. We experiment with deep pretrained models for this task, developing novel methods for controlling items for factors relevant in language learning: diverse sentences for different proficiency levels and argument structure to test grammar. Human evaluation demonstrates high grammaticality scores for all models (3.4 and above out of 4), and higher length (24%) and complexity (9%) over the baseline for the advanced proficiency model. Our results show that we can achieve strong performance while adding control to ensure diverse, tailored content for individual users.	翻訳日:2022-11-30 15:54:44 公開日:2022-11-28
# 創発的言語の語彙エントロピーを数学的にモデル化する Mathematically Modeling the Lexicon Entropy of Emergent Language ( http://arxiv.org/abs/2211.15783v1 ) ライセンス: Link先を確認	Brendon Boldt, David Mortensen	(参考訳) 深層学習に基づく創発言語システムにおける語彙エントロピーの数学的モデルとして確率過程FiLexを定式化する。モデルを数学的に定義することで、直接かつ決定的にテスト可能な明確な予測を生成することができる。本研究は,FiLexがハイパーパラメータ(トレーニングステップ,レキシコンサイズ,学習速度,ロールアウトバッファサイズ,Gumbel-Softmax温度)と,20の環境-ハイパーパラメータの組み合わせのうち20の創発言語エントロピーの正確な相関を予測できる4つの環境を実証的に検証した。さらに, 実験により, 異なる環境が過度パラメータとエントロピーの関係を多様に示し, 精度の高い粒度の予測を行うモデルの必要性が示された。 We formulate a stochastic process, FiLex, as a mathematical model of lexicon entropy in deep learning-based emergent language systems. Defining a model mathematically allows it to generate clear predictions which can be directly and decisively tested. We empirically verify across four different environments that FiLex predicts the correct correlation between hyperparameters (training steps, lexicon size, learning rate, rollout buffer size, and Gumbel-Softmax temperature) and the emergent language's entropy in 20 out of 20 environment-hyperparameter combinations. Furthermore, our experiments reveal that different environments show diverse relationships between their hyperparameters and entropy which demonstrates the need for a model which can make well-defined predictions at a precise level of granularity.	翻訳日:2022-11-30 15:54:30 公開日:2022-11-28
# ディープラーニング駆動のエッジビデオ分析:調査 Deep Learning-Driven Edge Video Analytics: A Survey ( http://arxiv.org/abs/2211.15751v1 ) ライセンス: Link先を確認	Renjie Xu, Saiedeh Razavi and Rong Zheng	(参考訳) ビデオは、デジタル情報のグローバルな爆発の鍵を握る存在であり、人間社会に多大な利益をもたらす。政府や企業は、例えば、警察、緊急管理、交通制御、セキュリティ監視など、様々な用途に無数のカメラを配備しており、いずれもビデオ分析(VA)によって促進されている。この傾向は、オブジェクト分類、検出、追跡のためのより正確なモデルを可能にするディープラーニング(DL)の急速な進歩によって引き起こされる。一方、インターネットに接続されたデバイスの普及に伴い、大量のデータが毎日生成され、クラウドを圧倒する。ワークロードとサービスをネットワークコアからネットワークエッジに移行する、新たなパラダイムであるエッジコンピューティングは、有望なソリューションとして広く認識されている。新たな交差点であるedge video analytics(eva)は、広く注目を集め始めている。それにもかかわらず、この話題に関する調査はごくわずかである。 EVAの最新の進歩を収集・要約するための専用会場がコミュニティから強く望まれている。さらに、EVAの基本概念(定義、アーキテクチャなど)は曖昧であり、この領域の急速な発展のためにこれらの調査によって無視されている。これらの概念のコンセンサスを促進するためには、徹底的な明確化が必要である。これらのギャップを埋めるために、EVAに関する最近の取り組みを包括的に調査する。本稿では,まずエッジコンピューティングの基礎を概観し,続いてvaの概要について述べる。次にEVAシステムとその実現技術について述べる。さらに,EVAシステムの開発において,今後の研究者を支援するためのフレームワークやデータセットも紹介する。最後に,既存の課題と今後の研究方向性について考察する。この調査は、読者がVAとエッジコンピューティングの関係を理解し、EVAに関する新しいアイデアを喚起するのに役立ちます。 Video, as a key driver in the global explosion of digital information, can create tremendous benefits for human society. Governments and enterprises are deploying innumerable cameras for a variety of applications, e.g., law enforcement, emergency management, traffic control, and security surveillance, all facilitated by video analytics (VA). This trend is spurred by the rapid advancement of deep learning (DL), which enables more precise models for object classification, detection, and tracking. Meanwhile, with the proliferation of Internet-connected devices, massive amounts of data are generated daily, overwhelming the cloud. Edge computing, an emerging paradigm that moves workloads and services from the network core to the network edge, has been widely recognized as a promising solution. The resulting new intersection, edge video analytics (EVA), begins to attract widespread attention. Nevertheless, only a few loosely-related surveys exist on this topic. 
A dedicated venue for collecting and summarizing the latest advances of EVA is highly desired by the community. Besides, the basic concepts of EVA (e.g., definition, architectures, etc.) are ambiguous and neglected by these surveys due to the rapid development of this domain. A thorough clarification is needed to facilitate a consensus on these concepts. To fill in these gaps, we conduct a comprehensive survey of the recent efforts on EVA. In this paper, we first review the fundamentals of edge computing, followed by an overview of VA. The EVA system and its enabling techniques are discussed next. In addition, we introduce prevalent frameworks and datasets to aid future researchers in the development of EVA systems. Finally, we discuss existing challenges and foresee future research directions. We believe this survey will help readers comprehend the relationship between VA and edge computing, and spark new ideas on EVA.	翻訳日:2022-11-30 15:46:16 公開日:2022-11-28
# 深層学習2段階アプローチによる新型コロナウイルスの分類 COVID-19 Classification Using Deep Learning Two-Stage Approach ( http://arxiv.org/abs/2211.15817v1 ) ライセンス: Link先を確認	Mostapha Alsaidi, Ali Saleem Altaher, Muhammad Tanveer Jan, Ahmed Altaher, Zahra Salekshahrezaee	(参考訳) 本稿では,事前学習済みの畳み込みニューラルネットワーク(VGG16とVGG19)の微調整と,開発されたCNNモデルのエンドツーエンドトレーニングを併用して,X線画像を新型コロナウイルス,正常,不透明,肺炎の4つのクラスに分類した。 20,000以上のX線スキャンを含むデータセットがKaggleから取得され、この実験で使用された。 2段階の分類アプローチをワンショット分類アプローチと比較するために実施した。我々の仮説は、2段階のモデルが単発モデルよりも優れたパフォーマンスを達成できるというものだった。しかし結果はこの仮説に反し, VGG16はワンショットアプローチで5分割の訓練により95%の精度を達成した。今後は、2段階分類モデルのCovid-TSCのより堅牢な実装に注力する予定である。主な改善点は、covid-19データセット上でvgg16モデルが微調整されたstage-1の出力からstage-2の入力へのデータフローを可能にすることだ。 In this paper, deep-learning-based approaches, namely fine-tuning of pretrained convolutional neural networks (VGG16 and VGG19) and end-to-end training of a developed CNN model, are used to classify X-ray images into four classes: COVID-19, normal, opacity, and pneumonia. A dataset containing more than 20,000 X-ray scans was retrieved from Kaggle and used in this experiment. A two-stage classification approach was implemented for comparison with the one-shot classification approach. Our hypothesis was that a two-stage model would achieve better performance than a one-shot model. Our results show otherwise: VGG16 achieved 95% accuracy using the one-shot approach over 5 folds of training. Future work will focus on a more robust implementation of the two-stage classification model Covid-TSC. The main improvement will be allowing data to flow from the output of stage-1 to the input of stage-2, where stage-1 and stage-2 models are VGG16 models fine-tuned on the Covid-19 dataset.	翻訳日:2022-11-30 15:45:52 公開日:2022-11-28
# 整数プログラミングによる加速非負テンソル補完 Accelerated Nonnegative Tensor Completion via Integer Programming ( http://arxiv.org/abs/2211.15770v1 ) ライセンス: Link先を確認	Wenhao Pan, Anil Aswani and Chen Chen	(参考訳) テンソル完備化の問題には、医療、コンピュータビジョン、その他の領域への応用がある。しかし、従来のテンソル完備化へのアプローチは、多項式時間計算を持つが、情報理論速度よりも指数関数的に多くのサンプルを必要とするか、より少ないサンプルを使用するが、既知の実用的なアルゴリズムが存在しないNPハード問題を解く必要があるという緊張に直面している。整数計画に基づく最近のアプローチは、非負のテンソル完全化に対するこの緊張を解消する。情報理論的なサンプル複雑性率を達成し、大域的最適に収束するためには、線形(数値的寛容)数のオラクルステップを必要とするBlended Conditional Gradientsアルゴリズムをデプロイする。このアプローチのトレードオフは、最悪の場合、oracleのステップは整数線形プログラムの解決を必要とすることである。この理論的な制限にもかかわらず、数値実験により、このアルゴリズムは、ある場合において、パーソナルコンピュータ上で実行中に最大1億のエントリをスケール可能であることが示された。本稿の目標は,解決可能なインスタンスの広さと規模を拡大することを目的として,このアルゴリズムをさらに強化することである。我々はアルゴリズムと同じ理論的保証を維持することができるが、より高速な計算を提供するいくつかの変種を探索する。我々は、異なるデータ構造、勾配降下ステップの加速、Blended Pairwise Conditional Gradientsアルゴリズムの利用について検討する。提案手法は, アルゴリズム設計の選択において, 様々なトレードオフを探索するために, 数値実験を行うものである。 The problem of tensor completion has applications in healthcare, computer vision, and other domains. However, past approaches to tensor completion have faced a tension in that they either have polynomial-time computation but require exponentially more samples than the information-theoretic rate, or they use fewer samples but require solving NP-hard problems for which there are no known practical algorithms. A recent approach, based on integer programming, resolves this tension for nonnegative tensor completion. It achieves the information-theoretic sample complexity rate and deploys the Blended Conditional Gradients algorithm, which requires a linear (in numerical tolerance) number of oracle steps to converge to the global optimum. The tradeoff in this approach is that, in the worst case, the oracle step requires solving an integer linear program. Despite this theoretical limitation, numerical experiments show that this algorithm can, on certain instances, scale up to 100 million entries while running on a personal computer. 
The goal of this paper is to further enhance this algorithm, with the intention to expand both the breadth and scale of instances that can be solved. We explore several variants that can maintain the same theoretical guarantees as the algorithm, but offer potentially faster computation. We consider different data structures, acceleration of gradient descent steps, and the use of the Blended Pairwise Conditional Gradients algorithm. We describe the original approach and these variants, and conduct numerical experiments in order to explore various tradeoffs in these algorithmic design choices.	翻訳日:2022-11-30 15:35:54 公開日:2022-11-28
# 対話型学習(IGL)を用いた個人化リワード学習 Personalized Reward Learning with Interaction-Grounded Learning (IGL) ( http://arxiv.org/abs/2211.15823v1 ) ライセンス: Link先を確認	Jessica Maghakian, Paul Mineiro, Kishan Panaganti, Mark Rucker, Akanksha Saran, Cheng Tan	(参考訳) 数え切れないほどのコンテンツ提供の時代に、レコメンダシステムはユーザーにパーソナライズされたコンテンツ提案を提供することで、情報の過負荷を軽減する。明示的なユーザフィードバックが不足しているため、現代のレコメンデータシステムは一般的に、すべてのユーザに対する暗黙的なフィードバック信号の固定的な組み合わせを最適化する。しかし、このアプローチは、それを強調している仕事の体を無視している。 (i)暗黙の信号は、ユーザの満足感からアクティブな嫌悪まで、様々な方法で使用することができる。 (ii)異なるユーザーが異なる方法で好みを伝える。多様なユーザ・コミュニケーション・モダリティの学習表現の課題に対処するために,近年のインタラクション・グラウンドド・ラーニング(IGL)パラダイムを適用することを提案する。固定された人間設計の報酬関数を取るのではなく、IGLは異なるユーザーに対してパーソナライズされた報酬関数を学習し、潜在ユーザの満足度を直接最適化することができる。シミュレーションおよび実世界の生産トレースを用いた実験により,IGLの成功例を示す。 In an era of countless content offerings, recommender systems alleviate information overload by providing users with personalized content suggestions. Due to the scarcity of explicit user feedback, modern recommender systems typically optimize for the same fixed combination of implicit feedback signals across all users. However, this approach disregards a growing body of work highlighting that (i) implicit signals can be used by users in diverse ways, signaling anything from satisfaction to active dislike, and (ii) different users communicate preferences in different ways. We propose applying the recent Interaction Grounded Learning (IGL) paradigm to address the challenge of learning representations of diverse user communication modalities. Rather than taking a fixed, human-designed reward function, IGL is able to learn personalized reward functions for different users and then optimize directly for the latent user satisfaction. We demonstrate the success of IGL with experiments using simulations as well as with real-world production traces.	翻訳日:2022-11-30 15:27:28 公開日:2022-11-28
# SuS-X: 視覚言語モデルの訓練自由名専用転送 SuS-X: Training-Free Name-Only Transfer of Vision-Language Models ( http://arxiv.org/abs/2211.16198v1 ) ライセンス: Link先を確認	Vishaal Udandarao, Ankush Gupta, Samuel Albanie	(参考訳) Contrastive Language-Image Pre-Training (CLIP) は、大規模な視覚言語モデルを訓練するための単純かつ効果的な方法として登場した。 CLIPは、さまざまな下流タスクに対する印象的なゼロショットの分類と検索を示す。しかし、その潜在能力を最大限活用するためには、微調整が必要であるようだ。クリップモデル全体の微調整はリソース集約的で不安定です。さらに、このような微調整を回避しようとする最近の手法では、ターゲット分布からの画像にアクセスする必要がある。本稿では,異なるアプローチを追求し,ダウンストリームタスクに関する知識が下流のターゲットカテゴリの名前のみを含む,トレーニングフリーな"名前のみの転送"の仕組みを検討する。本稿では,SuSとTIP-Xという2つの重要なビルディングブロックで構成されるSuS-Xを提案する。 SuS-Xは19のベンチマークデータセットで最先端のゼロショット分類結果を達成する。また,TIP-Xをトレーニング不要な複数ショット設定で有効性を示すとともに,トレーニング不要なベースラインの強化に対して,最先端の結果が得られた。コードはhttps://github.com/vishaal27/SuS-Xで入手できる。 Contrastive Language-Image Pre-training (CLIP) has emerged as a simple yet effective way to train large-scale vision-language models. CLIP demonstrates impressive zero-shot classification and retrieval on diverse downstream tasks. However, to leverage its full potential, fine-tuning still appears to be necessary. Fine-tuning the entire CLIP model can be resource-intensive and unstable. Moreover, recent methods that aim to circumvent this need for fine-tuning still require access to images from the target distribution. In this paper, we pursue a different approach and explore the regime of training-free "name-only transfer" in which the only knowledge we possess about the downstream task comprises the names of downstream target categories. We propose a novel method, SuS-X, consisting of two key building blocks -- SuS and TIP-X, that requires neither intensive fine-tuning nor costly labelled data. SuS-X achieves state-of-the-art zero-shot classification results on 19 benchmark datasets. We further show the utility of TIP-X in the training-free few-shot setting, where we again achieve state-of-the-art results over strong training-free baselines. Code is available at https://github.com/vishaal27/SuS-X.	翻訳日:2022-11-30 15:20:00 公開日:2022-11-28
# マルチヘッド蒸留による分散学習 Decentralized Learning with Multi-Headed Distillation ( http://arxiv.org/abs/2211.15774v1 ) ライセンス: Link先を確認	Andrey Zhmoginov and Mark Sandler and Nolan Miller and Gus Kristiansen and Max Vladymyrov	(参考訳) プライベートデータによる分散学習は、機械学習の中心的な問題である。本研究では,非iidデータを持つ複数のエージェントが,データ共有や重み付け,重み付けの更新を必要とせずに相互に学習できる,新たな蒸留型分散学習手法を提案する。提案手法は通信効率が高く,ラベルのない公開データセットを活用し,クライアント毎に複数の補助ヘッドを使用することで,異種データの場合のトレーニング効率を大幅に向上する。このアプローチにより、個々のモデルがプライベートタスクのパフォーマンスを保ち、向上すると同時に、グローバル集約されたデータ分散のパフォーマンスを劇的に改善することができる。我々は,データとモデルアーキテクチャの不均一性と,基礎となる通信グラフトポロジが学習効率に与える影響について検討し,エージェントが単独で学習するよりも性能を著しく向上できることを示す。 Decentralized learning with private data is a central problem in machine learning. We propose a novel distillation-based decentralized learning technique that allows multiple agents with private non-iid data to learn from each other, without having to share their data, weights or weight updates. Our approach is communication efficient, utilizes an unlabeled public dataset and uses multiple auxiliary heads for each client, greatly improving training efficiency in the case of heterogeneous data. This approach allows individual models to preserve and enhance performance on their private tasks while also dramatically improving their performance on the global aggregated data distribution. We study the effects of data and model architecture heterogeneity and the impact of the underlying communication graph topology on learning efficiency and show that our agents can significantly improve their performance compared to learning in isolation.	翻訳日:2022-11-30 15:19:42 公開日:2022-11-28
# 悪性オーバーフィッティング:補間は証明可能な形で不変性を妨げうる Malign Overfitting: Interpolation Can Provably Preclude Invariance ( http://arxiv.org/abs/2211.15724v1 ) ライセンス: Link先を確認	Yoav Wald, Gal Yona, Uri Shalit, Yair Carmon	(参考訳) 学習された分類器は、公平性、堅牢性、分散の一般化を促進するためのある種の不変性を持つべきである。しかし、近年の複数の研究により、共通不分散誘導正規化器は、分類器がトレーニングデータに完全に適合する(つまり補間する)過剰パラメータ化方式では有効ではないことが実証されている。これは、補間にもかかわらずモデルがうまく一般化する「良性過適合(benign overfitting)」現象が、堅牢性や公正性が望ましい設定にまで好ましくないことを示唆している。この研究では、これらの観測を理論的に正当化します。最も単純な設定であっても、任意の補間学習規則(任意にマージンが小さい)がこれらの不変性特性を満たさないことを証明します。そして、同じ設定で、証明可能な不変な非補間分類器をうまく学習するアルゴリズムを提案し、解析する。シミュレーションデータと水鳥データセットに関する理論的観察を検証する。 Learned classifiers should often possess certain invariance properties meant to encourage fairness, robustness, or out-of-distribution generalization. However, multiple recent works empirically demonstrate that common invariance-inducing regularizers are ineffective in the over-parameterized regime, in which classifiers perfectly fit (i.e., interpolate) the training data. This suggests that the phenomenon of "benign overfitting," in which models generalize well despite interpolating, might not favorably extend to settings in which robustness or fairness are desirable. In this work we provide a theoretical justification for these observations. We prove that -- even in the simplest of settings -- any interpolating learning rule (with arbitrarily small margin) will not satisfy these invariance properties. We then propose and analyze an algorithm that -- in the same setting -- successfully learns a non-interpolating classifier that is provably invariant. We validate our theoretical observations on simulated data and the Waterbirds dataset.	翻訳日:2022-11-30 15:10:02 公開日:2022-11-28
# 対人ロバスト性が精度差に及ぼす影響の理解 Understanding the Impact of Adversarial Robustness on Accuracy Disparity ( http://arxiv.org/abs/2211.15762v1 ) ライセンス: Link先を確認	Yuzheng Hu, Fan Wu, Hongyang Zhang, Han Zhao	(参考訳) 敵対的ロバスト性は標準的な精度に反する可能性があり、異なるクラスにさらに異なる影響を与える可能性があることは、長い間実証されてきたが、そのような観察がどの程度の程度で、クラスの不均衡が内部でどのように役割を果たすのかについては、未解決の問題である。本稿では,ガウス混合モデルの下で線形分類器を詳しく検討することにより,この精度格差の問題を解明しようとする。本研究は, 対向ロバスト性の影響を, 全クラスにおける標準精度を低下させる固有の効果と, 標準トレーニングと比較して精度の相違を増大させるクラス不均衡比の2つに分解する。さらに、我々のモデルを安定分布の一般族に拡張する。対向ロバスト性の制約は、バランスの取れたクラス設定における標準精度を常に低下させるが、クラス不均衡比は、安定分布の重みのため、ガウスの場合と比較して、精度の相違において根本的に異なる役割を果たす。合成データセットと実世界のデータセットの両方で実験を行う。実験結果は、理論的な知見を裏付けるだけでなく、実世界のデータセットよりも非線形モデルにも影響が及ぶ可能性を示唆している。 While it has long been empirically observed that adversarial robustness may be at odds with standard accuracy and may have further disparate impacts on different classes, it remains an open question to what extent such observations hold and how the class imbalance plays a role within. In this paper, we attempt to understand this question of accuracy disparity by taking a closer look at linear classifiers under a Gaussian mixture model. We decompose the impact of adversarial robustness into two parts: an inherent effect that will degrade the standard accuracy on all classes, and the other caused by the class imbalance ratio, which will increase the accuracy disparity compared to standard training. Furthermore, we also extend our model to the general family of stable distributions. We demonstrate that while the constraint of adversarial robustness consistently degrades the standard accuracy in the balanced class setting, the class imbalance ratio plays a fundamentally different role in accuracy disparity compared to the Gaussian case, due to the heavy tail of the stable distribution. We additionally perform experiments on both synthetic and real-world datasets. 
The empirical results not only corroborate our theoretical findings, but also suggest that the implications may extend to nonlinear models over real-world datasets.	翻訳日:2022-11-30 15:09:46 公開日:2022-11-28
# Ollivier's Ricci Curvature を用いたオーバースムーシングとオーバースキャッシングの再検討 Revisiting Over-smoothing and Over-squashing using Ollivier's Ricci Curvature ( http://arxiv.org/abs/2211.15779v1 ) ライセンス: Link先を確認	Khang Nguyen and Tan Nguyen and Nhat Ho and Khuong Nguyen and Hieu Nong and Vinh Nguyen	(参考訳) グラフニューラルネットワーク(GNN)は、オーバースムーシングとオーバースキャッシングの問題に本質的に感受性があることが示されている。これらの問題は、GNNが遠隔情報を考慮した複雑なグラフ相互作用をモデル化することを禁じている。本研究は,局所グラフ幾何学とこれら2つの問題の発生との間の鍵となる関係を明らかにし,Ollivier's Ricci曲率を用いて局所的に研究するための統一的な枠組みを提供する。この理論に基づき, 過剰なスムーシングと過剰なスケーシングの問題を緩和するために, 多数の原理的手法が提案されている。 Graph Neural Networks (GNNs) have been demonstrated to be inherently susceptible to the problems of over-smoothing and over-squashing. These issues hinder GNNs from modeling complex graph interactions by limiting how effectively they take distant information into account. Our study reveals the key connection between the local graph geometry and the occurrence of both of these issues, thereby providing a unified framework for studying them at a local scale using Ollivier's Ricci curvature. Based on our theory, a number of principled methods are proposed to alleviate the over-smoothing and over-squashing issues.	翻訳日:2022-11-30 15:09:22 公開日:2022-11-28
# clas: 中央潜在アクションスペースによるマルチロボット操作のコーディネート CLAS: Coordinating Multi-Robot Manipulation with Central Latent Action Spaces ( http://arxiv.org/abs/2211.15824v1 ) ライセンス: Link先を確認	Elie Aljalbout and Maximilian Karl and Patrick van der Smagt	(参考訳) マルチロボット操作タスクは、動的に独立した部分に分割することができる様々な制御エンティティを含む。そのような現実世界のタスクの典型的な例はデュアルアーム操作である。このようなタスクを強化学習でナビゲート的に解くことは、アクションと状態空間の次元とともに成長するサンプルの複雑さと探索要求のため、しばしば実現不可能である。代わりに、マルチエージェントシステムのような環境を扱い、エージェントが全体を制御するようにしたいと考えています。しかし、アクションの生成を分散化するには、タスクの中心となる情報に制限されたチャネルを通じてエージェント間の調整が必要である。本稿では,異なるエージェント間で共有される学習された潜在行動空間を通じて,マルチロボット操作を協調する手法を提案する。シミュレーションによるマルチロボット操作タスクにおいて,本手法を検証し,サンプル効率と学習性能の観点から,従来のベースラインよりも改善することを示す。 Multi-robot manipulation tasks involve various control entities that can be separated into dynamically independent parts. A typical example of such real-world tasks is dual-arm manipulation. Learning to naively solve such tasks with reinforcement learning is often unfeasible due to the sample complexity and exploration requirements growing with the dimensionality of the action and state spaces. Instead, we would like to handle such environments as multi-agent systems and have several agents control parts of the whole. However, decentralizing the generation of actions requires coordination across agents through a channel limited to information central to the task. This paper proposes an approach to coordinating multi-robot manipulation through learned latent action spaces that are shared across different agents. We validate our method in simulated multi-robot manipulation tasks and demonstrate improvement over previous baselines in terms of sample efficiency and learning performance.	翻訳日:2022-11-30 15:01:59 公開日:2022-11-28
# マーケティングにおける資源配分問題に対する直接的不均一因果学習 Direct Heterogeneous Causal Learning for Resource Allocation Problems in Marketing ( http://arxiv.org/abs/2211.15728v1 ) ライセンス: Link先を確認	Hao Zhou, Shaoming Li, Guibin Jiang, Jiaqi Zheng and Dong Wang	(参考訳) マーケティングは、ユーザのエンゲージメントを高め、プラットフォーム収益を改善するための重要なメカニズムであり、不均一な因果学習は、より効果的な戦略の開発に役立つ。マーケティングにおける意思決定問題は資源配分問題として定式化され、数十年にわたって研究されてきた。既存の作業は通常、解法を2つの完全に分離された段階、すなわち機械学習(ML)とオペレーションリサーチ(OR)に分割する。しかし、MLにおける予測パラメータの誤差は尊重されず、ORにおける一連の複雑な数学的操作は累積誤差の増加につながる。本質的に、予測パラメータの精度向上は、デカップリング設計による副作用のため、最終解に正の相関を持たない可能性がある。本稿では,資源割当問題を解決し,副作用を緩和するための新しい手法を提案する。我々の重要な直感は、MLとOR間のブリッジを確立するための決定因子を導入し、決定因子のソートや比較操作のみを実行することで、OR内で直接解を得ることができることです。さらに,決定要因に対して直接的不均質因果学習を行うようにカスタマイズした損失関数を設計し,損失が収束した場合の偏りのない推定を行う。ケーススタディでは,2次処理代入問題と複数処理による予算配分問題という,マーケティングにおける重要な2つの問題にアプローチを適用した。大規模シミュレーションとオンラインa/bテストの両方で,我々のアプローチが最先端の手法に比べて大幅に改善できることが示されている。 Marketing is an important mechanism to increase user engagement and improve platform revenue, and heterogeneous causal learning can help develop more effective strategies. Most decision-making problems in marketing can be formulated as resource allocation problems and have been studied for decades. Existing works usually divide the solution procedure into two fully decoupled stages, i.e., machine learning (ML) and operation research (OR) -- the first stage predicts the model parameters and they are fed to the optimization in the second stage. However, the error of the predicted parameters in ML cannot be respected and a series of complex mathematical operations in OR lead to the increased accumulative errors. Essentially, the improved precision on the prediction parameters may not have a positive correlation on the final solution due to the side-effect from the decoupled design. In this paper, we propose a novel approach for solving resource allocation problems to mitigate the side-effects. 
Our key intuition is that we introduce the decision factor to establish a bridge between ML and OR such that the solution can be directly obtained in OR by only performing the sorting or comparison operations on the decision factor. Furthermore, we design a customized loss function that can conduct direct heterogeneous causal learning on the decision factor, an unbiased estimation of which can be guaranteed when the loss converges. As a case study, we apply our approach to two crucial problems in marketing: the binary treatment assignment problem and the budget allocation problem with multiple treatments. Both large-scale simulations and online A/B Tests demonstrate that our approach achieves significant improvement compared with state-of-the-art.	翻訳日:2022-11-30 15:01:26 公開日:2022-11-28
# videofact: 注意、シーンコンテキスト、法医学的トレースを用いたビデオ偽造の検出 VideoFACT: Detecting Video Forgeries Using Attention, Scene Context, and Forensic Traces ( http://arxiv.org/abs/2211.15775v1 ) ライセンス: Link先を確認	Tai D. Nguyen, Shengbang Fang, Matthew C. Stamm	(参考訳) フェイクビデオは重要な誤報の脅威だ。既存の法医学的ネットワークは画像偽造に強いパフォーマンスを示しているが、最近のAdobe VideoShamデータセットの報告によると、これらのネットワークはビデオ内の偽のコンテンツを識別できない。本稿では,多種多様なビデオの偽造や操作を検知・ローカライズできる新しいネットワークを提案する。既存のネットワークがビデオ解析時に直面する課題を克服するため,本ネットワークは,操作によって残される痕跡を捕捉する法医学的埋め込みと,局所的なシーン内容に対する法医学的トレースの条件付き依存関係を利用するコンテキスト埋め込み,深層でトランスフォーマーベースの注意機構による空間的注意の両方を利用する。いくつかの新しいビデオフォージェリーデータセットを作成し、これらを公開データとともに使用して、ネットワークのパフォーマンスを実験的に評価する。これらの結果から,提案するネットワークは,訓練中に遭遇しないものを含む多様なビデオ偽造を識別できることがわかった。さらに,本研究の結果は,画像鑑定ネットワークがビデオ中の偽コンテンツをほとんど特定できないという最近の知見を裏付けるものである。 Fake videos represent an important misinformation threat. While existing forensic networks have demonstrated strong performance on image forgeries, recent results reported on the Adobe VideoSham dataset show that these networks fail to identify fake content in videos. In this paper, we propose a new network that is able to detect and localize a wide variety of video forgeries and manipulations. To overcome challenges that existing networks face when analyzing videos, our network utilizes both forensic embeddings to capture traces left by manipulation, context embeddings to exploit forensic traces' conditional dependencies upon local scene content, and spatial attention provided by a deep, transformer-based attention mechanism. We create several new video forgery datasets and use these, along with publicly available data, to experimentally evaluate our network's performance. These results show that our proposed network is able to identify a diverse set of video forgeries, including those not encountered during training. Furthermore, our results reinforce recent findings that image forensic networks largely fail to identify fake content in videos.	翻訳日:2022-11-30 14:52:32 公開日:2022-11-28
# 地理空間探索のためのビジュアルアクティブ検索フレームワーク A Visual Active Search Framework for Geospatial Exploration ( http://arxiv.org/abs/2211.15788v1 ) ライセンス: Link先を確認	Anindya Sarkar, Michael Lanier, Scott Alfeld, Roman Garnett, Nathan Jacobs, Yevgeniy Vorobeychik	(参考訳) 多くの問題は航空画像による地理空間探索の一種と見なすことができ、例えば、密猟活動の検出から人身売買まで多岐にわたる。本研究では,視覚的能動探索(VAS)フレームワークを用いて,広い領域のイメージを入力とし,対象対象物のできるだけ多くの例を特定することを目的とする。これはクエリの限られたシーケンスを通じて行われ、それぞれが与えられた領域にサンプルが存在するかどうかを検証する。本稿では,完全注釈付き検索タスクの集合を学習データとして活用し,検索方針を学習し,入力画像の特徴と能動検索状態の自然な表現を組み合わせる,vasのための強化学習手法を提案する。さらに,VASタスクのテスト時間分布を完全に反映していない場合の判定時のポリシー改善のためのドメイン適応手法を提案する。複数の衛星画像データセットに関する広範囲な実験を通じて,提案手法が複数の強力なベースラインを上回ることを示した。コードとデータは公開されます。 Many problems can be viewed as forms of geospatial search aided by aerial imagery, with examples ranging from detecting poaching activity to human trafficking. We model this class of problems in a visual active search (VAS) framework, which takes as input an image of a broad area, and aims to identify as many examples of a target object as possible. It does this through a limited sequence of queries, each of which verifies whether an example is present in a given region. We propose a reinforcement learning approach for VAS that leverages a collection of fully annotated search tasks as training data to learn a search policy, and combines features of the input image with a natural representation of active search state. Additionally, we propose domain adaptation techniques to improve the policy at decision time when training data is not fully reflective of the test-time distribution of VAS tasks. Through extensive experiments on several satellite imagery datasets, we show that the proposed approach significantly outperforms several strong baselines. Code and data will be made public.	翻訳日:2022-11-30 14:52:10 公開日:2022-11-28
# 微分可能な辞書を用いた信号混合の確率論的モデル化 Probabilistic Modelling of Signal Mixtures with Differentiable Dictionaries ( http://arxiv.org/abs/2211.15439v1 ) ライセンス: Link先を確認	Lukáš Samuel Marták, Rainer Kelz, Gerhard Widmer	(参考訳) 我々は,事前情報を(半)教師付き非負の行列分解に組み込む新しい手法を導入し,これを微分可能な辞書探索と呼ぶ。これは、非線形ソースが線形に混合される混合の一般的な、柔軟で原理的なモデリングを可能にする。音声分解タスクにおけるその動作を解析し、そのモデリング能力に関する広範囲かつ高度に制御された研究を行う。 We introduce a novel way to incorporate prior information into (semi-)supervised non-negative matrix factorization, which we call differentiable dictionary search. It enables general, highly flexible and principled modelling of mixtures where non-linear sources are linearly mixed. We study its behavior on an audio decomposition task, and conduct an extensive, highly controlled study of its modelling capabilities.	翻訳日:2022-11-29 23:03:12 公開日:2022-11-28
# 大規模アレー通信における近接場チャネル推定--モデルに基づくディープラーニングアプローチ Near-Field Channel Estimation for Extremely Large-Scale Array Communications: A model-based deep learning approach ( http://arxiv.org/abs/2211.15440v1 ) ライセンス: Link先を確認	Xiangyu Zhang and Zening Wang and Haiyang Zhang and Luxi Yang	(参考訳) 大規模MIMO(XL-MIMO)が将来無線通信の有望な技術として評価されている。 XL-MIMOの展開は、特に高周波帯において、従来の遠方界ではなく、近距離域にユーザーを配置させる。本稿では,XL-MIMO通信の近距離無線チャネルを推定するためのモデルに基づく効率的なディープラーニングアルゴリズムを提案する。特に,XL-MIMO近距離チャネル推定タスクを空間グリッド型スペーシング辞書を用いて圧縮センシング問題として定式化し,学習反復収縮・保持アルゴリズム(LISTA)を適用して結果の問題を解決する。近接場特性のため、空間グリッドに基づくスパース化辞書は、低いチャネル推定精度と重い計算負荷をもたらす可能性がある。この問題に対処するために、スペーサー辞書をニューラルネットワーク層として定式化し、LISTAニューラルネットワークに組み込む新しいスペーサー辞書学習LISTA(SDL-LISTA)アルゴリズムを提案する。その結果,提案手法は非学習ベンチマーク方式よりも優れており,sdl-listaは10倍の原子削減でlistaよりも優れた性能が得られることがわかった。 Extremely large-scale massive MIMO (XL-MIMO) has been reviewed as a promising technology for future wireless communications. The deployment of XL-MIMO, especially at high-frequency bands, leads to users being located in the near-field region instead of the conventional far-field. This letter proposes efficient model-based deep learning algorithms for estimating the near-field wireless channel of XL-MIMO communications. In particular, we first formulate the XL-MIMO near-field channel estimation task as a compressed sensing problem using the spatial gridding-based sparsifying dictionary, and then solve the resulting problem by applying the Learning Iterative Shrinkage and Thresholding Algorithm (LISTA). Due to the near-field characteristic, the spatial gridding-based sparsifying dictionary may result in low channel estimation accuracy and a heavy computational burden. To address this issue, we further propose a new sparsifying dictionary learning-LISTA (SDL-LISTA) algorithm that formulates the sparsifying dictionary as a neural network layer and embeds it into LISTA neural network. 
The numerical results show that our proposed algorithms outperform non-learning benchmark schemes, and that SDL-LISTA achieves better performance than LISTA with a tenfold reduction in the number of atoms.	翻訳日:2022-11-29 23:03:07 公開日:2022-11-28
# 微分可能な辞書探索:音源分離のための線形混合と深部非線形モデルの統合 Differentiable Dictionary Search: Integrating Linear Mixing with Deep Non-Linear Modelling for Audio Source Separation ( http://arxiv.org/abs/2211.15524v1 ) ライセンス: Link先を確認	Luk\'a\v{s} Samuel Mart\'ak, Rainer Kelz, Gerhard Widmer	(参考訳) 本稿では,微分可能な辞書検索 (DDS) の名称で最近定式化した信号分解法の改良について述べる。 DDSの基本的な考え方は、正規化フローと呼ばれる強力な非可逆密度推定器のクラスを利用して、辞書をNMFのような線形分解法でモデル化し、辞書要素の空間と関連する確率空間の間のビジェクションを効果的に生成し、推定密度で導かれる辞書空間を通して微分可能な探索を可能にすることである。最初の定式化は、いくつかの実用的な制限のある概念実証であり、我々は、この手法の計算複雑性と信号分解能力の両方を改善するために、拡張性を高めるためのいくつかのステップを示す。実験的な評価のためのテストベッドとして,個々のピアノ音符に起因した音源に信号が分解されるフレームレベルピアノの書き起こしのタスクを選択する。音源の非線形モデリングの改善による影響を明らかにするため,提案手法の変種を線形オーバーコンプリートNMFベースラインと比較した。実験の結果、追加の制約がなくても、2つの関連する評価尺度により、モデルがより疎弱で正確な分解を生じていることが示される。 This paper describes several improvements to a new method for signal decomposition that we recently formulated under the name of Differentiable Dictionary Search (DDS). The fundamental idea of DDS is to exploit a class of powerful deep invertible density estimators called normalizing flows, to model the dictionary in a linear decomposition method such as NMF, effectively creating a bijection between the space of dictionary elements and the associated probability space, allowing a differentiable search through the dictionary space, guided by the estimated densities. As the initial formulation was a proof of concept with some practical limitations, we will present several steps towards making it scalable, hoping to improve both the computational complexity of the method and its signal decomposition capabilities. As a testbed for experimental evaluation, we choose the task of frame-level piano transcription, where the signal is to be decomposed into sources whose activity is attributed to individual piano notes. To highlight the impact of improved non-linear modelling of sources, we compare variants of our method to a linear overcomplete NMF baseline. 
Experimental results will show that even in the absence of additional constraints, our models produce increasingly sparse and precise decompositions, according to two pertinent evaluation measures.	翻訳日:2022-11-29 23:02:45 公開日:2022-11-28
# データ拡張による機械学習による外惑星検出 Exoplanet Detection by Machine Learning with Data Augmentation ( http://arxiv.org/abs/2211.15577v1 ) ライセンス: Link先を確認	Koray Aydoğan	(参考訳) 近年、深層学習は、Kepler \cite{borucki2010kepler} \cite{koch2010kepler} やNASAのTransiting Exoplanet Survey Satellite (TESS) \cite{ricker2010transiting} のような衛星からの光曲線データを用いて、太陽系外惑星検出パイプラインの一部を自動化する大きな可能性を持つことが実証されている。残念ながら、利用可能なデータセットが小さいため、強力なネットワークアーキテクチャから期待されるパフォーマンスレベルを実現するのは難しい。本稿では,外惑星を識別するニューラルネットワークを訓練するために,光曲線データに対するデータ拡張手法の利用について検討する。使用する拡張手法は2つのクラスに分けられる: 単純な手法(例:加法的雑音による拡張)と学習ベースの手法(例: まずGAN \cite{goodfellow2020generative} を訓練して新しい例を生成する)。我々は、データ拡張が外惑星検出問題におけるモデル性能を向上させる可能性を実証し、より多くのデータが利用可能になるにつれて、生成モデルに基づく拡張の利用を推奨する。 It has recently been demonstrated that deep learning has significant potential to automate parts of the exoplanet detection pipeline using light curve data from satellites such as Kepler \cite{borucki2010kepler} \cite{koch2010kepler} and NASA's Transiting Exoplanet Survey Satellite (TESS) \cite{ricker2010transiting}. Unfortunately, the smallness of the available datasets makes it difficult to realize the level of performance one expects from powerful network architectures. In this paper, we investigate the use of data augmentation techniques on light curve data to train neural networks to identify exoplanets. The augmentation techniques used are of two classes: simple (e.g. additive noise augmentation) and learning-based (e.g. first training a GAN \cite{goodfellow2020generative} to generate new examples). We demonstrate that data augmentation has the potential to improve model performance for the exoplanet detection problem, and recommend the use of augmentation based on generative models as more data becomes available.	翻訳日:2022-11-29 23:02:23 公開日:2022-11-28
# 量子およびハイブリッドアルゴリズムを用いたシミュレーションおよび物理量子処理ユニットのベンチマーク Benchmarking simulated and physical quantum processing units using quantum and hybrid algorithms ( http://arxiv.org/abs/2211.15631v1 ) ライセンス: Link先を確認	Mohammad Kordzanganeh, Markus Buchberger, Maxim Povolotskii, Wilhelm Fischer, Andrii Kurkin, Wilfrid Somogyi, Asel Sagingalieva, Markus Pflitsch, Alexey Melnikov	(参考訳) 強力なハードウェアサービスとソフトウェアライブラリは、量子アルゴリズムを迅速に設計、テスト、実行するための必須のツールである。これらのプラットフォームのパフォーマンスがキュービット数でどのようにスケールするかに関する堅牢な大規模研究は、業界問題に対する量子ソリューションを提供する上で鍵となる。このような評価は、物理量子処理ユニットの可用性と価格のために難しい。この作業は、特殊な高性能シミュレーションおよび物理量子処理ユニットの代表的なサンプルのランタイムと精度をベンチマークする。その結果、QMwareクラウドコンピューティングサービスは、27キュービット未満のアルゴリズムの次の最速オプションと比較して、量子回路の実行ランタイムを最大78%削減できることがわかった。 AWS SV1シミュレータは、SV1で利用可能な最大34キュービットまでの大きな回路に対して、ランタイム上のアドバンテージを提供する。この制限を超えて、QMwareは40キュービットの回路を実行する機能を提供する。 RigettiのAspen-M2のような物理量子デバイスは、30以上の回路に対して指数的ランタイムの利点を提供することができる。しかし、物理的量子処理ユニットの高コストは、実用化への深刻な障壁となっている。さらに、試験された4つの量子デバイスのうち、IonQのHarmonyのみが4ビット以上の高忠実性を達成する。この研究は、実用的な量子アルゴリズムを実行するための利用可能なソフトウェアとハードウェアの最適な組み合わせを理解する方法を示している。 Powerful hardware services and software libraries are vital tools for quickly and affordably designing, testing, and executing quantum algorithms. A robust large-scale study of how the performance of these platforms scales with the number of qubits is key to providing quantum solutions to challenging industry problems. Such an evaluation is difficult owing to the availability and price of physical quantum processing units. This work benchmarks the runtime and accuracy for a representative sample of specialized high-performance simulated and physical quantum processing units. Results show the QMware cloud computing service can reduce the runtime for executing a quantum circuit by up to 78% compared to the next fastest option for algorithms with fewer than 27 qubits. The AWS SV1 simulator offers a runtime advantage for larger circuits, up to the maximum 34 qubits available with SV1. 
Beyond this limit, QMware provides the ability to execute circuits as large as 40 qubits. Physical quantum devices, such as Rigetti's Aspen-M2, can provide an exponential runtime advantage for circuits with more than 30 qubits. However, the high financial cost of physical quantum processing units presents a serious barrier to practical use. Moreover, of the four quantum devices tested, only IonQ's Harmony achieves high fidelity with more than four qubits. This study paves the way to understanding the optimal combination of available software and hardware for executing practical quantum algorithms.	翻訳日:2022-11-29 23:01:59 公開日:2022-11-28
# RAMP:分散ディープラーニングシステムのためのフラットナノ秒光ネットワークとMPI操作 RAMP: A Flat Nanosecond Optical Network and MPI Operations for Distributed Deep Learning Systems ( http://arxiv.org/abs/2211.15226v1 ) ライセンス: Link先を確認	Alessandro Ottino, Joshua Benjamin, Georgios Zervas	(参考訳) 分散ディープラーニング(DDL)システムはネットワーク性能に強く依存する。現在の電子パケット交換(EPS)ネットワークアーキテクチャと技術は、可変径トポロジー、低いバイセクション帯域幅、オーバーサブスクリプションに苦しんでおり、これらは通信や集団操作の完了時間に影響する。我々は,ナノ秒で再構成可能な,エクサスケールに近いフルバイセクション帯域幅,全対全,シングルホップの全光ネットワークアーキテクチャであるRAMPを導入する。RAMPは大規模な分散並列コンピューティングシステム(ノードあたり12.8~Tbps,最大65,536ノード)をサポートする。光回路スイッチング(OCS)ネットワーク上でMPI集団操作をスケジュール不要かつ競合のない方法で実行するための,独自のRAMP-x MPI戦略とネットワークトランスコーダを初めて提案する。 RAMPは、現実的なEPSおよびOCSの同等品と比較して、すべてのMPI操作の完了時間を7.6-171$\times$高速化する。また、MegatronとDLRMのトレーニング時間をそれぞれ1.3-16$\times$と7.8-58$\times$削減し、エネルギー消費とコストをそれぞれ42-53$\times$と3.3-12.4$\times$改善できる。 Distributed deep learning (DDL) systems strongly depend on network performance. Current electronic packet switched (EPS) network architectures and technologies suffer from variable diameter topologies, low bisection bandwidth and over-subscription, which affect the completion time of communication and collective operations. We introduce a near-exascale, full-bisection bandwidth, all-to-all, single-hop, all-optical network architecture with nanosecond reconfiguration called RAMP, which supports large-scale distributed and parallel computing systems (12.8~Tbps per node for up to 65,536 nodes). For the first time, a custom RAMP-x MPI strategy and a network transcoder are proposed to run MPI collective operations across the optical circuit switched (OCS) network in a schedule-less and contention-less manner. RAMP achieves 7.6-171$\times$ speed-up in completion time across all MPI operations compared to realistic EPS and OCS counterparts. It can also deliver a 1.3-16$\times$ and 7.8-58$\times$ reduction in Megatron and DLRM training time respectively, while offering 42-53$\times$ and 3.3-12.4$\times$ improvement in energy consumption and cost respectively.	翻訳日:2022-11-29 23:01:40 公開日:2022-11-28
# 機械学習によるPDEバックステッピングオブザーバの高速化 Machine Learning Accelerated PDE Backstepping Observers ( http://arxiv.org/abs/2211.15044v1 ) ライセンス: Link先を確認	Yuanyuan Shi, Zongyi Li, Huan Yu, Drew Steeves, Anima Anandkumar, Miroslav Krstic	(参考訳) 状態推定は、予測からフィードバックコントローラの未測定状態の置換まで、さまざまなタスクにおいて重要である。 PDEのバックステッピングをベースとした観測者など、実証的かつ迅速に収束する観測者によるPDEのリアルタイム状態推定は、計算コストが高く、多くの場合禁止される。精度を保ちながらより高速な学習手法を用いてPDEオブザーバ計算を高速化するフレームワークを提案する。特に、最近開発されたフーリエニューラル演算子(FNO)を用いて、初期観測値と境界測定値から状態推定値への関数マッピングを学習する。特定の収束率を保証した前設計の観測者に対してバックステッピングオブザーバゲインを用いることで,fno による計算効率の向上を評価する数値実験を行う。まず, 反応拡散(パラボリック)PDEに対して, その状態が指数的収束率で推定される場合, および, パラボリックPDEに対して, 正確な所定時間推定を行う場合, および, 交通流密度と速度をモデル化する一階双曲PDEを結合した一階双曲PDEについて, 状態推定を行う。これらのPDEのシミュレーションデータセットで訓練されたML加速オブザーバは、古典的手法と比較して計算速度が最大で3桁向上する。これは、リアルタイム状態推定と制御のためのml加速オブザーバの魅力を示す。 State estimation is important for a variety of tasks, from forecasting to substituting for unmeasured states in feedback controllers. Performing real-time state estimation for PDEs using provably and rapidly converging observers, such as those based on PDE backstepping, is computationally expensive and in many cases prohibitive. We propose a framework for accelerating PDE observer computations using learning-based approaches that are much faster while maintaining accuracy. In particular, we employ the recently-developed Fourier Neural Operator (FNO) to learn the functional mapping from the initial observer state and boundary measurements to the state estimate. By employing backstepping observer gains for previously-designed observers with particular convergence rate guarantees, we provide numerical experiments that evaluate the increased computational efficiency gained with FNO. 
We consider state estimation for three benchmark PDE examples motivated by applications: first, for a reaction-diffusion (parabolic) PDE whose state is estimated with an exponential rate of convergence; second, for a parabolic PDE with exact prescribed-time estimation; and, third, for a pair of coupled first-order hyperbolic PDEs that model traffic flow density and velocity. The ML-accelerated observers trained on simulation data sets for these PDEs achieve up to three orders of magnitude improvement in computational speed compared to classical methods. This demonstrates the attractiveness of the ML-accelerated observers for real-time state estimation and control.	翻訳日:2022-11-29 22:55:27 公開日:2022-11-28
# 深部平衡学習を用いた軽量・適応FDD大規模MIMO CSIフィードバック Lightweight and Adaptive FDD Massive MIMO CSI Feedback with Deep Equilibrium Learning ( http://arxiv.org/abs/2211.15079v1 ) ライセンス: Link先を確認	Yifan Ma, Wentao Yu, Xianghao Yu, Jun Zhang, Shenghui Song, Khaled B. Letaief	(参考訳) 周波数分割複信(FDD)大規模MIMO(massive multiple-input multiple-output)システムでは、ダウンリンクチャネル状態情報(CSI)をユーザから基地局(BS)に送り返す必要があり、これが過大なフィードバックオーバーヘッドを引き起こす。本稿では,深層平衡モデルを用いた軽量かつ適応的な深層学習に基づくCSIフィードバック方式を提案する。複数の明示的な層を積み重ねる既存のディープラーニングベースのアプローチとは異なり、無限深層ニューラルネットワークの過程を模倣する暗黙の平衡ブロックを提案する。特に、暗黙の平衡ブロックは固定点反復によって定義され、各イテレーションの訓練可能なパラメータは共有され、結果として軽量モデルとなる。さらに、ユーザの計算能力に応じて前方イテレーションの数を調整でき、オンラインの精度と効率のトレードオフを実現できる。シミュレーションの結果,提案手法は既存のベンチマークに匹敵する性能を、はるかに低い複雑さで達成し、実行時に精度・効率のトレードオフが可能であることが示された。 In frequency-division duplexing (FDD) massive multiple-input multiple-output (MIMO) systems, downlink channel state information (CSI) needs to be sent from users back to the base station (BS), which causes prohibitive feedback overhead. In this paper, we propose a lightweight and adaptive deep learning-based CSI feedback scheme by capitalizing on deep equilibrium models. Different from existing deep learning-based approaches that stack multiple explicit layers, we propose an implicit equilibrium block to mimic the process of an infinite-depth neural network. In particular, the implicit equilibrium block is defined by a fixed-point iteration and the trainable parameters in each iteration are shared, which results in a lightweight model. Furthermore, the number of forward iterations can be adjusted according to the users' computational capability, achieving an online accuracy-efficiency trade-off. Simulation results show that the proposed method achieves performance comparable to the existing benchmarks with much-reduced complexity, and permits an accuracy-efficiency trade-off at runtime.	翻訳日:2022-11-29 22:55:01 公開日:2022-11-28
# 対向機械学習における非局所周波のガンマ収束 Gamma-convergence of a nonlocal perimeter arising in adversarial machine learning ( http://arxiv.org/abs/2211.15223v1 ) ライセンス: Link先を確認	Leon Bungert, Kerrek Stinson	(参考訳) 本稿では,ミンコフスキー型非局所周囲を局所異方性周囲に収束させるガンマコンバージェンスを証明する。非局所モデルは、二分分類における逆訓練の正規化効果を記述する。エネルギーは本質的に2つの分布間の相互作用に依存し、関連するクラスの確率をモデル化する。我々は、分布の典型的な厳密な規則性仮定を克服し、それらは$bv$ 密度を持つと仮定するだけである。コンパクト性から生じる自然トポロジーにおいて, 2つの密度の異方性関数によって決定される重み付き周囲にガンマ収束が証明される。局所的であるにもかかわらず、この鋭いインターフェイス制限は、対向摂動に関する分類安定性を反映している。さらに, 関連する全変動のガンマコンバージェンスを推定し, 逆訓練の漸近性について検討し, 非局所周囲におけるグラフ離散化のガンマコンバージェンスを証明する。 In this paper we prove Gamma-convergence of a nonlocal perimeter of Minkowski type to a local anisotropic perimeter. The nonlocal model describes the regularizing effect of adversarial training in binary classifications. The energy essentially depends on the interaction between two distributions modelling likelihoods for the associated classes. We overcome typical strict regularity assumptions for the distributions by only assuming that they have bounded $BV$ densities. In the natural topology coming from compactness, we prove Gamma-convergence to a weighted perimeter with weight determined by an anisotropic function of the two densities. Despite being local, this sharp interface limit reflects classification stability with respect to adversarial perturbations. We further apply our results to deduce Gamma-convergence of the associated total variations, to study the asymptotics of adversarial training, and to prove Gamma-convergence of graph discretizations for the nonlocal perimeter.	翻訳日:2022-11-29 22:54:43 公開日:2022-11-28
# 1次元畳み込みニューラルネットワークのリプシッツ定数推定 Lipschitz constant estimation for 1D convolutional neural networks ( http://arxiv.org/abs/2211.15253v1 ) ライセンス: Link先を確認	Patricia Pauli and Dennis Gramlich and Frank Allg\"ower	(参考訳) 本研究では,1次元畳み込みニューラルネットワーク(CNN)のリプシッツ定数推定法を提案する。特に,畳み込み,プーリング,および完全連結層の分散特性を,非線形活性化関数とプーリング演算に漸進的2次制約を適用して解析する。これらの写像の連結のリプシッツ定数は、分離性理論から導かれる半定値のプログラムを解いて推定される。提案手法を極力効率的にするために,これらの有限インパルス応答フィルタを状態空間の因果力学系として実現し,状態空間実現のための分散解析を行うために,畳み込み層の構造を考慮に入れた。我々が提示した例は、我々のリプシッツ境界が正確性と拡張性の観点から有利であることを示している。 In this work, we propose a dissipativity-based method for Lipschitz constant estimation of 1D convolutional neural networks (CNNs). In particular, we analyze the dissipativity properties of convolutional, pooling, and fully connected layers making use of incremental quadratic constraints for nonlinear activation functions and pooling operations. The Lipschitz constant of the concatenation of these mappings is then estimated by solving a semidefinite program which we derive from dissipativity theory. To make our method as efficient as possible, we take the structure of convolutional layers into account realizing these finite impulse response filters as causal dynamical systems in state space and carrying out the dissipativity analysis for the state space realizations. The examples we provide show that our Lipschitz bounds are advantageous in terms of accuracy and scalability.	翻訳日:2022-11-29 22:54:29 公開日:2022-11-28
# 弱連結ネットワークシステムにおけるコヒーレントクラスタの学習 Learning Coherent Clusters in Weakly-Connected Network Systems ( http://arxiv.org/abs/2211.15301v1 ) ライセンス: Link先を確認	Hancheng Min and Enrique Mallada	(参考訳) 本稿では,密結合コンポーネントを用いた大規模動的ネットワークのための構造保存モデル還元手法を提案する。まず、コヒーレント群は、ネットワークフィードバックをモデル化したグラフラプラシア行列上のスペクトルクラスタリングアルゴリズムによって同定される。次に、各ノードが各コヒーレントグループの集合ダイナミクスを表すように縮小されたネットワークを構築し、還元されたネットワークがグループ間の動的結合をキャプチャする。重み付き確率ブロックモデルからネットワークグラフをランダムに生成する場合、近似誤差の上限を与える。最後に, 数値実験は理論的な知見と一致し, 検証する。 We propose a structure-preserving model-reduction methodology for large-scale dynamic networks with tightly-connected components. First, the coherent groups are identified by a spectral clustering algorithm on the graph Laplacian matrix that models the network feedback. Then, a reduced network is built, where each node represents the aggregate dynamics of each coherent group, and the reduced network captures the dynamic coupling between the groups. We provide an upper bound on the approximation error when the network graph is randomly generated from a weighted stochastic block model. Finally, numerical experiments align with and validate our theoretical findings.	翻訳日:2022-11-29 22:54:17 公開日:2022-11-28
# 計算流体力学における機械学習の新興動向 Emerging trends in machine learning for computational fluid dynamics ( http://arxiv.org/abs/2211.15145v1 ) ライセンス: Link先を確認	Ricardo Vinuesa and Steve Brunton	(参考訳) 機械学習(ML)の科学コミュニティからの新たな関心は、多くの新しい研究分野を開いている。ここでは、計算流体力学(CFD)の分野を改善する機会を提供するMLの新たなトレンドに焦点を当てる。特に,すでに利益を示しているMLとCFDの相乗効果について論じるとともに,現在開発中であり,今後数年で重要な利益をもたらす可能性のある領域も評価する。我々は、これらの新興アプローチに対する慎重な楽観主義のバランスのとれた視点を強調することも重要であると信じている。 The renewed interest from the scientific community in machine learning (ML) is opening many new areas of research. Here we focus on how novel trends in ML are providing opportunities to improve the field of computational fluid dynamics (CFD). In particular, we discuss synergies between ML and CFD that have already shown benefits, and we also assess areas that are under development and may produce important benefits in the coming years. We believe that it is also important to emphasize a balanced perspective of cautious optimism for these emerging approaches.	翻訳日:2022-11-29 20:45:57 公開日:2022-11-28
# AquaFeL-PSO:マルチモーダルPSOとフェデレーション学習に基づく自律型表面車両を用いた水資源モニタリングシステム AquaFeL-PSO: A Monitoring System for Water Resources using Autonomous Surface Vehicles based on Multimodal PSO and Federated Learning ( http://arxiv.org/abs/2211.15217v1 ) ライセンス: Link先を確認	Micaela Jara Ten Kathen, Princy Johnson, Isabel Jurado Flores, Daniel Gutiérrez Reina	(参考訳) 水資源の保存、モニタリング、管理は、ここ数十年で大きな課題となっている。水資源は、水の汚染レベルを知るために常に監視されなければならない。この目的を達成するため,本論文では,水質センサを備えた自律型表面車両を用い,マルチモーダル粒子群最適化と,ガウス過程をサロゲートモデルとするフェデレーション学習手法に基づく水監視システムであるAquaFeL-PSOアルゴリズムを提案する。提案するモニタリングシステムは,探索フェーズと活用フェーズの2つのフェーズを有する。探索フェーズでは、車両は水資源の表面を調べ、水質センサによって取得されたデータにより、第1の水質モデルが中央サーバで推定される。活用フェーズでは, 探索フェーズで推定したモデルを用いて, 領域をアクションゾーンに分割し, 汚染ゾーンをよりよく活用する。水資源の最終的な水質モデルを得るため、両方のフェーズで得られたモデルが組み合わされる。その結果,提案する経路プランナーは,汚染ゾーンの水質モデルの取得において他の経路プランナーと比較して14$\%$改善し,水資源全体においては400$\%$優れたモデルが得られ,汚染ピークの検出においても4,000$\%$の改善が得られた。また,フェデレート学習技術を適用した結果が,集中型システムの結果と非常によく似ていることも証明された。 The preservation, monitoring, and control of water resources have been a major challenge in recent decades. Water resources must be constantly monitored to know the contamination levels of water. To meet this objective, this paper proposes a water monitoring system using autonomous surface vehicles, equipped with water quality sensors, based on a multimodal particle swarm optimization, and the federated learning technique, with Gaussian process as a surrogate model, the AquaFeL-PSO algorithm. The proposed monitoring system has two phases, the exploration phase and the exploitation phase. In the exploration phase, the vehicles examine the surface of the water resource, and with the data acquired by the water quality sensors, a first water quality model is estimated in the central server. In the exploitation phase, the area is divided into action zones using the model estimated in the exploration phase for a better exploitation of the contamination zones. To obtain the final water quality model of the water resource, the models obtained in both phases are combined. 
The results demonstrate the efficiency of the proposed path planner in obtaining water quality models of the pollution zones, with a 14$\%$ improvement over the other path planners compared, and of the entire water resource, with a 400$\%$ better model, as well as in detecting pollution peaks, where the improvement in this case study is 4,000$\%$. It was also proven that the results obtained by applying the federated learning technique are very similar to the results of a centralized system.	翻訳日:2022-11-29 20:45:49 公開日:2022-11-28
# 確率的シュテッフェンセン法 Stochastic Steffensen method ( http://arxiv.org/abs/2211.15310v1 ) ライセンス: Link先を確認	Minda Zhao, Zehua Lai, and Lek-Heng Lim	(参考訳) 一階法、すなわち、第一導関数のみが許される場合、二次収束することは可能であるか。単変数の損失関数の場合、答えは yes である -- Steffensen 法は第二導関数を避け、ニュートン法のように二次収束する。最適なステップサイズを組み込むことで、収束次数を2次から$1+\sqrt{2} \approx 2.414$まで押し上げることもできる。このような高い収束次数は決定論的アルゴリズムには無意味なオーバーキルであるが、ランダム化は必ず収束速度を損なうため、巨大なサイズの問題に対してアルゴリズムがランダム化される場合には有益になる。 Steffensen法にインスパイアされた2つの適応学習率を導入する。確率的最適化設定での使用を意図しており、バッチサイズ以外にハイパーパラメータチューニングは不要である。広範な実験により、既存のいくつかの一階法と比べて優れていることがわかった。二次目的に制限された場合、確率的シュテッフェンセン法はランダム化されたカッツマルツ法に還元される(これはSGD や SLBFGS には当てはまらない)。したがって,我々の手法はランダム化カッツマルツ法の任意の目的関数への一般化とみなすこともできる。 Is it possible for a first-order method, i.e., only first derivatives allowed, to be quadratically convergent? For univariate loss functions, the answer is yes -- the Steffensen method avoids second derivatives and is still quadratically convergent like Newton's method. By incorporating an optimal step size we can even push its convergence order beyond quadratic to $1+\sqrt{2} \approx 2.414$. While such high convergence orders are a pointless overkill for a deterministic algorithm, they become rewarding when the algorithm is randomized for problems of massive sizes, as randomization invariably compromises convergence speed. We will introduce two adaptive learning rates inspired by the Steffensen method, intended for use in a stochastic optimization setting and requiring no hyperparameter tuning aside from batch size. Extensive experiments show that they compare favorably with several existing first-order methods. When restricted to a quadratic objective, our stochastic Steffensen methods reduce to the randomized Kaczmarz method -- note that this is not true for SGD or SLBFGS -- and thus we may also view our methods as a generalization of randomized Kaczmarz to arbitrary objectives.	翻訳日:2022-11-29 20:45:23 公開日:2022-11-28
# Beyond S-curves: 技術予測のためのリカレントニューラルネットワーク Beyond S-curves: Recurrent Neural Networks for Technology Forecasting ( http://arxiv.org/abs/2211.15334v1 ) ライセンス: Link先を確認	Alexander Glavackij, Dimitri Percia David, Alain Mermoud, Angelika Romanou, Karl Aberer	(参考訳) 技術的ランドスケープのかなりの多様性と複雑さのため、正確なモデルを構築して予測することは困難な取り組みである。多くの複雑なシステムに高い頻度で現れることから、S曲線は以前の研究で一般的な予測手法である。しかし、その予測性能は他の技術予測手法と直接比較されていない。さらに、予測精度の向上を主張する時系列予測の最近の発展は、技術開発データにはまだ適用されていない。本研究は,S曲線の予測性能をベースラインと比較し,機械学習と時系列予測の最近の進歩を用いたautoencoderアプローチを開発することにより,両研究のギャップに対処する。 S曲線予測は、概ね単純なARIMAベースラインに匹敵する平均絶対パーセント誤差(MAPE)を示す。しかし、新興技術の少数派にとっては、MAPEは2桁増大する。我々のオートエンコーダアプローチは、2番目に良い結果に対してMAPEを平均13.5%改善する。確立された技術については、他のアプローチと同じ精度で予測する。しかし、特に新興技術の予測に強く、平均MAPEが次に良い結果より18%低い。以上の結果から,技術予測にはS曲線よりも単純なARIMAモデルの方が好ましいことが示唆された。より正確な予測を求める実践者は、提示されたautoencoderアプローチを選択する必要がある。 Because of the considerable heterogeneity and complexity of the technological landscape, building accurate models to forecast is a challenging endeavor. Due to their high prevalence in many complex systems, S-curves are a popular forecasting approach in previous work. However, their forecasting performance has not been directly compared to other technology forecasting approaches. Additionally, recent developments in time series forecasting that claim to improve forecasting accuracy are yet to be applied to technological development data. This work addresses both research gaps by comparing the forecasting performance of S-curves to a baseline and by developing an autoencoder approach that employs recent advances in machine learning and time series forecasting. S-curve forecasts largely exhibit a mean absolute percentage error (MAPE) comparable to a simple ARIMA baseline. However, for a minority of emerging technologies, the MAPE increases by two orders of magnitude. Our autoencoder approach improves the MAPE by 13.5% on average over the second-best result. It forecasts established technologies with the same accuracy as the other approaches. 
However, it is especially strong at forecasting emerging technologies with a mean MAPE 18% lower than the next best result. Our results imply that a simple ARIMA model is preferable over the S-curve for technology forecasting. Practitioners looking for more accurate forecasts should opt for the presented autoencoder approach.	翻訳日:2022-11-29 20:45:02 公開日:2022-11-28
# beyond cage: 学習された自律ネットワーク防衛政策の一般化を調査 Beyond CAGE: Investigating Generalization of Learned Autonomous Network Defense Policies ( http://arxiv.org/abs/2211.15557v1 ) ライセンス: Link先を確認	Melody Wolk, Andy Applebaum, Camron Denver, Patrick Dwyer, Marina Moskowitz, Harold Nguyen, Nicole Nichols, Nicole Park, Paul Rachwalski, Frank Rau, Adrian Webster	(参考訳) 強化学習(RL)の進歩は、ネットワーク防御のインテリジェントな自動化に新たな方向性をもたらした。しかし、これらの進歩の多くは、自分たちのアプリケーションをネットワークセキュリティに上回っているか、現実の世界でそれを実装する際の課題を考慮していない。これらの問題を理解するために,本研究では,高忠実度ネットワークシミュレータを用いた自律型ネットワークディフェンサエージェント構築のための公開競争であるCAGE Challengeの第2版で実施されたいくつかのRLアプローチを評価する。我々のアプローチはすべて、アルゴリズムのPPO(Proximal Policy Optimization)ファミリに基づいており、階層的RL、アクションマスキング、カスタムトレーニング、アンサンブルRLを含んでいる。アンサンブルRL技術は,我々の他のモデルより優れ,競争において第2位である。実環境への適用性を理解するため,未知のネットワークや未知の攻撃戦略に対して,各手法の一般化能力を評価する。目に見えない環境では, 環境変化のタイプによって劣化が変化するなど, 全てのアプローチが悪化する。未知の攻撃戦略に対して、新しい戦略はトレーニングしたモデルよりも効率的ではありませんでしたが、我々のモデルは全体的なパフォーマンスを低下させました。これらの結果は、現実世界における自律的ネットワーク防衛のための有望な研究方向を強調する。 Advancements in reinforcement learning (RL) have inspired new directions in intelligent automation of network defense. However, many of these advancements have either outpaced their application to network security or have not considered the challenges associated with implementing them in the real-world. To understand these problems, this work evaluates several RL approaches implemented in the second edition of the CAGE Challenge, a public competition to build an autonomous network defender agent in a high-fidelity network simulator. Our approaches all build on the Proximal Policy Optimization (PPO) family of algorithms, and include hierarchical RL, action masking, custom training, and ensemble RL. We find that the ensemble RL technique performs strongest, outperforming our other models and taking second place in the competition. To understand applicability to real environments we evaluate each method's ability to generalize to unseen networks and against an unknown attack strategy. 
In unseen environments, all of our approaches perform worse, with degradation varied based on the type of environmental change. Against an unknown attacker strategy, we found that our models had reduced overall performance even though the new strategy was less efficient than the ones our models trained on. Together, these results highlight promising research directions for autonomous network defense in the real world.	翻訳日:2022-11-29 20:44:32 公開日:2022-11-28
# 乳癌データ統合のためのグラフニューラルネットワーク Graph Neural Networks for Breast Cancer Data Integration ( http://arxiv.org/abs/2211.15561v1 ) ライセンス: Link先を確認	Teodora Reu	(参考訳) METABRIC (Molecular Taxonomy of Breast Cancer International Consortium) などの国際イニシアチブは、様々ながんの進化を通じて進行中の分子過程を特定するために、複数の多ゲノミクスおよび臨床データセットを収集している。多くの機械学習と統計モデルは、これらのタイプのデータを独立して分析するために設計、訓練されてきたが、そのような異なる形状のソース情報ストリームの統合は、広く研究されていない。これらのデータセットをよりうまく統合し、最終的にがん検出タスクに活用できる有意義な表現を生成することで、患者に適切な治療を与えることができる。そこで我々は,ガンデータモダリティをグラフとして統合し,次いでグラフニューラルネットワークを教師なし環境で適用して,組み合わせたデータから低次元の埋め込みを生成し,最終的に癌サブタイプの分類モデルに新しい表現を与えて評価する,という3つのステップからなる新しい学習パイプラインを提案する。グラフ構築アルゴリズムは、METABRICは患者のモダリティ間の関係を記憶していないため、それらが生成した埋め込みの品質に与える影響について議論している。また、グラフニューラルネットワーク、変分グラフオートエンコーダ、ディープグラフ情報マックスといった低遅延空間表現を生成するために使用されるモデルも提示する。並列に、パイプラインを合成データセット上でテストし、ホモフィリーレベルなどの基礎となるデータの特徴が、人工データにおける51\%から98\%の精度、METABRICにおける13\%と80\%の精度に大きく影響を与えることを示した。このプロジェクトは、がんデータ理解を改善する可能性があり、正規データセットからグラフ型データへの移行を促進する。 International initiatives such as METABRIC (Molecular Taxonomy of Breast Cancer International Consortium) have collected several multigenomic and clinical data sets to identify the undergoing molecular processes taking place throughout the evolution of various cancers. Numerous Machine Learning and statistical models have been designed and trained to analyze these types of data independently, however, the integration of such differently shaped and sourced information streams has not been extensively studied. To better integrate these data sets and generate meaningful representations that can ultimately be leveraged for cancer detection tasks could lead to giving well-suited treatments to patients. 
Hence, we propose a novel learning pipeline comprising three steps: the integration of cancer data modalities as graphs, followed by the application of Graph Neural Networks in an unsupervised setting to generate lower-dimensional embeddings from the combined data, and finally feeding the new representations to a cancer sub-type classification model for evaluation. The graph construction algorithms are described in-depth as METABRIC does not store relationships between the patient modalities, with a discussion of their influence over the quality of the generated embeddings. We also present the models used to generate the lower-latent space representations: Graph Neural Networks, Variational Graph Autoencoders and Deep Graph Infomax. In parallel, the pipeline is tested on a synthetic dataset to demonstrate that the characteristics of the underlying data, such as homophily levels, greatly influence the performance of the pipeline, which ranges from 51\% to 98\% accuracy on artificial data, and from 13\% to 80\% on METABRIC. This project has the potential to improve cancer data understanding and encourages the transition of regular data sets to graph-shaped data.	翻訳日:2022-11-29 20:44:11 公開日:2022-11-28
# Action-GPT: 改良および一般化されたゼロショットアクション生成のための大規模言語モデルを活用する Action-GPT: Leveraging Large-scale Language Models for Improved and Generalized Zero Shot Action Generation ( http://arxiv.org/abs/2211.15603v1 ) ライセンス: Link先を確認	Sai Shashank Kalakonda, Shubh Maheshwari, Ravi Kiran Sarvadevabhatla	(参考訳) 本稿では,大規模言語モデル(LLM)をテキストベースのアクション生成モデルに組み込むためのプラグインおよびプレイフレームワークであるAction-GPTを紹介する。現在のモーションキャプチャデータセットにおけるアクションフレーズは、最小限の情報とポイント情報を含む。 LLMのプロンプトを慎重に作成することにより、アクションのよりリッチできめ細かい記述を生成する。動作句の代わりにこれらの詳細記述を利用することで,テキストと動き空間のアライメントが向上することを示す。本実験は,最近のテキスト・ツー・モーションモデルによる合成運動の質の質的,定量的な改善を示す。コード、事前トレーニングされたモデル、サンプルビデオはhttps://actiongpt.github.ioで入手できる。 We introduce Action-GPT, a plug and play framework for incorporating Large Language Models (LLMs) into text-based action generation models. Action phrases in current motion capture datasets contain minimal and to-the-point information. By carefully crafting prompts for LLMs, we generate richer and fine-grained descriptions of the action. We show that utilizing these detailed descriptions instead of the original action phrases leads to better alignment of text and motion spaces. Our experiments show qualitative and quantitative improvement in the quality of synthesized motions produced by recent text-to-motion models. Code, pretrained models and sample videos will be made available at https://actiongpt.github.io	翻訳日:2022-11-29 20:43:41 公開日:2022-11-28
# 2つは1つより優れている:補完的な製品推奨のためのデュアル埋め込み Two Is Better Than One: Dual Embeddings for Complementary Product Recommendations ( http://arxiv.org/abs/2211.14982v1 ) ライセンス: Link先を確認	Giorgi Kvernadze, Putu Ayu G. Sudyanti, Nishan Subedi, Mohammad Hajiaghayi	(参考訳) 近年,大規模なシステムに容易に統合でき,近隣の検索をリアルタイムに行えるため,埋め込みベースの製品レコメンデーションが人気を集めている。この領域における多くの研究は、主に類似の項目の推薦に焦点を当てている。一方,相補的項目推薦の研究は,まだ未検討のままである。類似の項目を,有用性の観点から交換可能な項目と定義し,異なる目的に適合するが,相互に使用する場合には互換性を持つ項目として補完的項目を定義した。本稿では,製品に対する二重埋め込み表現を活用し,補完的項目を見つけるための新しい手法を提案する。本研究では,NLP におけるスキップグラム陰性サンプリング (SGNS) モデルにおける関連性の概念が,共購入データを用いてアイテム表現を訓練する際の相補性の概念に有効であることを示す。実際のシナリオでは,購入データの分散が大きな課題となるため,包括範囲を拡大するために合成サンプルを用いたモデルをさらに強化する。これにより、画像、テキスト、クリックなどの豊富なデータモダリティを活用することで、共購入データを共有しない項目に対して補完的なレコメンデーションを提供することができる。我々は,大手オンライン小売企業において,実世界のデータに対するレコメンデーションのカバレッジと品質を向上させるためのアプローチの有効性を確立した。さらに,SGNS訓練におけるタスク特化ハイパーパラメータチューニングの重要性を示す。我々のモデルは実装が簡単であり、あらゆるeコマースウェブサイトで補完的なアイテムレコメンデーションを生成するための優れた候補となる。 Embedding based product recommendations have gained popularity in recent years due to its ability to easily integrate to large-scale systems and allowing nearest neighbor searches in real-time. The bulk of studies in this area has predominantly been focused on similar item recommendations. Research on complementary item recommendations, on the other hand, still remains considerably under-explored. We define similar items as items that are interchangeable in terms of their utility and complementary items as items that serve different purposes, yet are compatible when used with one another. In this paper, we apply a novel approach to finding complementary items by leveraging dual embedding representations for products. We demonstrate that the notion of relatedness discovered in NLP for skip-gram negative sampling (SGNS) models translates effectively to the concept of complementarity when training item representations using co-purchase data. 
Since sparsity of purchase data is a major challenge in real-world scenarios, we further augment the model using synthetic samples to extend coverage. This allows the model to provide complementary recommendations for items that do not share co-purchase data by leveraging other abundantly available data modalities such as images, text, clicks, etc. We establish the effectiveness of our approach in improving both coverage and quality of recommendations on real-world data for a major online retail company. We further show the importance of task-specific hyperparameter tuning in training SGNS. Our model is effective yet simple to implement, making it a great candidate for generating complementary item recommendations on any e-commerce website.	翻訳日:2022-11-29 20:35:28 公開日:2022-11-28
# 疎高次元線形回帰に対する適応的最短解導除法 An adaptive shortest-solution guided decimation approach to sparse high-dimensional linear regression ( http://arxiv.org/abs/2211.15057v1 ) ライセンス: Link先を確認	Xue Yu, Yifan Sun, Haijun Zhou	(参考訳) 高次元線形回帰モデルは高次元データの統計モデルとしては最も一般的なものであるが、偏差係数のスパースセットを達成することは極めて難しい課題である。本稿では, 最短解導出デシメーションアルゴリズムから適応し, assdと呼ばれる, スパース高次元線形回帰モデルを構築するための単純ヒューリスティックなアルゴリズムを提案する。このアルゴリズムは再帰的減算線形方程式の最小二乗解の指導の下で回帰係数の支持を構築し、早期停止基準と二段階しきい値法を適用してこの支持を洗練する。以上の結果から,ASSDはLASSO,ベクトル近似メッセージパッシング,および他の2つの代表的グリージーアルゴリズムよりも解の精度と堅牢性に優れていた。 ASSDは、実世界の応用で遭遇する高度に相関した測定行列を持つ線形回帰問題に特に適している。 High-dimensional linear regression model is the most popular statistical model for high-dimensional data, but it is quite a challenging task to achieve a sparse set of regression coefficients. In this paper, we propose a simple heuristic algorithm to construct sparse high-dimensional linear regression models, which is adapted from the shortest solution-guided decimation algorithm and is referred to as ASSD. This algorithm constructs the support of regression coefficients under the guidance of the least-squares solution of the recursively decimated linear equations, and it applies an early-stopping criterion and a second-stage thresholding procedure to refine this support. Our extensive numerical results demonstrate that ASSD outperforms LASSO, vector approximate message passing, and two other representative greedy algorithms in solution accuracy and robustness. ASSD is especially suitable for linear regression problems with highly correlated measurement matrices encountered in real-world applications.	翻訳日:2022-11-29 20:35:03 公開日:2022-11-28
# lone sampler:コーディネートローカル近傍サンプリングによるグラフノード埋め込み LoNe Sampler: Graph node embeddings by coordinated local neighborhood sampling ( http://arxiv.org/abs/2211.15114v1 ) ライセンス: Link先を確認	Konstantin Kutzkov	(参考訳) 局所グラフ近傍サンプリングは、ノード表現学習のアルゴリズムの中心にある基本的な計算問題である。グラフノードを周辺ノードの属性のような離散的な特徴で表現する離散ノード埋め込みを学習するためのアルゴリズムがいくつか提案されている。離散埋め込みは、連続的なword2vecライクなノード埋め込みと比較して、いくつかの利点を提供している: 計算の容易さ、拡張性、解釈性。我々は,局所的な近傍サンプリングにより離散ノード埋め込みを生成するアルゴリズムのスイートであるlone samplerを提案する。まず、我々のアルゴリズムは理論的性質を厳密に理解した。第2に,カーネルモデルのトレーニングのためのグラム行列の高価な計算を避けるために,近似的ベクトル写像を生成する方法を示す。ベンチマークデータセットの実験は理論的な結果を確認し、提案手法の利点を実証する。 Local graph neighborhood sampling is a fundamental computational problem that is at the heart of algorithms for node representation learning. Several works have presented algorithms for learning discrete node embeddings where graph nodes are represented by discrete features such as attributes of neighborhood nodes. Discrete embeddings offer several advantages compared to continuous word2vec-like node embeddings: ease of computation, scalability, and interpretability. We present LoNe Sampler, a suite of algorithms for generating discrete node embeddings by Local Neighborhood Sampling, and address two shortcomings of previous work. First, our algorithms have rigorously understood theoretical properties. Second, we show how to generate approximate explicit vector maps that avoid the expensive computation of a Gram matrix for the training of a kernel model. Experiments on benchmark datasets confirm the theoretical findings and demonstrate the advantages of the proposed methods.	翻訳日:2022-11-29 20:34:46 公開日:2022-11-28
# より高速な$k$-means++アルゴリズム A Faster $k$-means++ Algorithm ( http://arxiv.org/abs/2211.15118v1 ) ライセンス: Link先を確認	Jiehao Liang, Somdeb Sarkhel, Zhao Song, Chenbo Yin, Danyang Zhuo	(参考訳) K-means++は、k-meansクラスタリングアルゴリズムの初期クラスタセンターを選択するための重要なアルゴリズムである。本研究では,$k$-means++問題をほぼ最適な実行時間で解く新しいアルゴリズムを提案する。 $\mathbb{R}^d$ 上の $n$ 個のデータポイントが与えられると、現在の最先端のアルゴリズムは$\widetilde{O}(k)$回の反復で動作し、各反復は$\widetilde{O}(nd k)$の時間を要する。従って、全体の実行時間は$\widetilde{O}(n d k^2)$である。我々は,合計で$\widetilde{O}(nd + nk^2)$ の時間しかかからない新しいアルゴリズム \textsc{FastKmeans++} を提案する。 K-means++ is an important algorithm to choose initial cluster centers for the k-means clustering algorithm. In this work, we present a new algorithm that can solve the $k$-means++ problem with near-optimal running time. Given $n$ data points in $\mathbb{R}^d$, the current state-of-the-art algorithm runs in $\widetilde{O}(k)$ iterations, and each iteration takes $\widetilde{O}(nd k)$ time. The overall running time is thus $\widetilde{O}(n d k^2)$. We propose a new algorithm \textsc{FastKmeans++} that takes only $\widetilde{O}(nd + nk^2)$ time in total.	翻訳日:2022-11-29 20:34:32 公開日:2022-11-28
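For orientation, the seeding problem that the abstract above refers to can be sketched with the classical D^2-sampling procedure. This is a minimal illustration of standard k-means++ seeding only; it is not the paper's \textsc{FastKmeans++} and makes no attempt to match its running time. The names `sq_dist` and `kmeanspp_seed` are hypothetical and chosen for this sketch.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length point tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seed(points, k, rng=None):
    """Pick k initial centers with classical D^2 sampling (hypothetical helper)."""
    rng = rng or random.Random()
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    # The first center is chosen uniformly at random.
    centers = [rng.choice(points)]
    # d2[i] is the squared distance from points[i] to its nearest chosen center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a center; fall back to a uniform choice.
            nxt = rng.choice(points)
        else:
            # Sample a point with probability proportional to d2.
            nxt = rng.choices(points, weights=d2, k=1)[0]
        centers.append(nxt)
        # Only the newly added center can shrink a point's nearest distance.
        d2 = [min(old, sq_dist(p, nxt)) for old, p in zip(d2, points)]
    return centers
```

Passing a seeded `random.Random` instance makes the selection reproducible, which is convenient when comparing initializations.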
# SuperFusion:Long-Range HD Map生成と予測のためのマルチレベルLiDAR-Camera Fusion SuperFusion: Multilevel LiDAR-Camera Fusion for Long-Range HD Map Generation and Prediction ( http://arxiv.org/abs/2211.15656v1 ) ライセンス: Link先を確認	Hao Dong, Xianjing Zhang, Xuan Jiang, Jun Zhang, Jintao Xu, Rui Ai, Weihao Gu, Huimin Lu, Juho Kannala and Xieyuanli Chen	(参考訳) 環境の高精細(HD)セマンティックマップ生成は自律運転の重要な構成要素である。既存の手法は、LiDARやカメラなど、様々なセンサーモードを融合することにより、このタスクにおいて優れたパフォーマンスを実現している。しかし、現在の研究は生のデータやネットワークの特徴レベルの融合に基づいており、短距離のHDマップ生成のみを考慮し、現実的な自動運転アプリケーションへのデプロイを制限している。本稿では,30m以内の短距離でHDマップを構築する作業と,下流経路計画と制御タスクが必要とする90mまでの長距離HDマップの予測に焦点をあてて,自動運転の滑らかさと安全性を向上させる。そこで本研究では,LiDARとカメラデータのマルチレベル融合を利用したSuperFusionというネットワークを提案する。我々は、nuScenesデータセットと自己記録データセットでSuperFusionをベンチマークし、最先端のベースライン手法を大差で上回ることを示す。さらに,長距離HDマップの予測評価のための新しい指標を提案し,生成したHDマップを下流経路計画タスクに適用する。その結果,提案手法で予測した長距離HDマップを用いることで,自動運転車の経路計画を改善することが可能となった。コードは https://github.com/haomo-ai/SuperFusion で公開予定である。 High-definition (HD) semantic map generation of the environment is an essential component of autonomous driving. Existing methods have achieved good performance in this task by fusing different sensor modalities, such as LiDAR and camera. However, current works are based on raw data or network feature-level fusion and only consider short-range HD map generation, limiting their deployment to realistic autonomous driving applications. In this paper, we focus on the task of building HD maps at short range, i.e., within 30 m, and also predicting long-range HD maps up to 90 m, which is required by downstream path planning and control tasks to improve the smoothness and safety of autonomous driving. To this end, we propose a novel network named SuperFusion, exploiting the fusion of LiDAR and camera data at multiple levels. We benchmark our SuperFusion on the nuScenes dataset and a self-recorded dataset and show that it outperforms the state-of-the-art baseline methods by large margins. 
Furthermore, we propose a new metric to evaluate long-range HD map prediction and apply the generated HD map to a downstream path planning task. The results show that using the long-range HD maps predicted by our method improves path planning for autonomous vehicles. The code will be available at https://github.com/haomo-ai/SuperFusion.	翻訳日:2022-11-29 20:28:19 公開日:2022-11-28
# 変更点検出のためのオンラインカーネルCUSUM Online Kernel CUSUM for Change-Point Detection ( http://arxiv.org/abs/2211.15070v1 ) ライセンス: Link先を確認	Song Wei, Yao Xie	(参考訳) 我々は、異なるウィンドウサイズを持つ並列カーネル統計セットからなるオンラインカーネル累積Sum(CUSUM)手順を開発し、未知の変化点位置を考慮に入れた。 Shewhartチャート形式に対応する既存のスライディングウィンドウベースのカーネル変更点検出手順と比較して,提案手法は小さな変更に対してより敏感である。さらに、オンライン処理において一定の計算量とメモリの複雑さを達成するために重要となる検出統計の再帰的計算を、計算のボトルネックとなるグラム行列全体を計算し記憶する必要がないように提示する。本研究では,2つの基本性能指標,平均走行長(ARL)と予測検出遅延(EDD)を正確に解析する。さらに、任意のウィンドウサイズを $\log ({\rm arl})$ の順に定め、oracle のプロシージャと比較してほとんど電力損失がないようにし、これは window-limited generalized likelihood ratio (glr) 手順の古典的な結果と類似している。提案手法の理論的結果と競合性能を検証するため, 広範な数値実験を行った。 We develop an online kernel Cumulative Sum (CUSUM) procedure, which consists of a parallel set of kernel statistics with different window sizes to account for the unknown change-point location. Compared with many existing sliding window-based kernel change-point detection procedures, which correspond to the Shewhart chart-type procedure, the proposed procedure is more sensitive to small changes. We further present a recursive computation of detection statistics, which is crucial for online procedures to achieve a constant computational and memory complexity, such that we do not need to calculate and remember the entire Gram matrix, which can be a computational bottleneck otherwise. We obtain precise analytic approximations of the two fundamental performance metrics, the Average Run Length (ARL) and Expected Detection Delay (EDD). Furthermore, we establish the optimal window size on the order of $\log ({\rm ARL})$ such that there is nearly no power loss compared with an oracle procedure, which is analogous to the classic result for window-limited Generalized Likelihood Ratio (GLR) procedure. We present extensive numerical experiments to validate our theoretical results and the competitive performance of the proposed method.	翻訳日:2022-11-29 20:26:52 公開日:2022-11-28
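As background for the CUSUM procedure named in the abstract above, the classical one-sided scalar CUSUM recursion can be sketched as follows. This is the textbook recursion, shown only to illustrate the recursive, constant-memory update idea; it is not the kernel-based statistic developed in the paper. The function name and the `drift` and `threshold` parameters are hypothetical.

```python
def cusum_alarm(xs, drift, threshold):
    """Return the first index where the one-sided CUSUM statistic exceeds
    threshold, or None if no alarm is raised (hypothetical helper)."""
    s = 0.0
    for t, x in enumerate(xs):
        # Accumulate evidence above the drift level; reset to zero when it goes negative.
        s = max(0.0, s + x - drift)
        if s > threshold:
            return t
    return None
```

Only the running statistic `s` is kept between samples, so memory use does not grow with the length of the stream.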
# Deep Learning Inverse Technique を用いたニアフィルタSAR画像復元 : 予備的検討 Near-filed SAR Image Restoration with Deep Learning Inverse Technique: A Preliminary Study ( http://arxiv.org/abs/2211.14990v1 ) ライセンス: Link先を確認	Xu Zhan, Xiaoling Zhang, Wensi Zhang, Jun Shi, Shunjun Wei, Tianjiao Zeng	(参考訳) 比較的大きな開口角と広い伝送帯域と組み合わせて、近距離場合成開口レーダー(SAR)は、ターゲットの散乱分布ホットスポットの高解像度画像を提供する。一方、撮像結果は、サイドローブ、クラッタ、ノイズから必然的に劣化し、ターゲットの情報検索を妨げる。イメージを復元するために、現在の手法では、例えば、点拡散関数(PSF)は空間的に一貫したものであり、ターゲットはスパース点散乱器などで構成されている。これにより、特に複雑なターゲットに対して、ターゲット形状の限定的な復元性能が得られる。これらの課題に対処するために,本研究における近年の有望な深層学習逆テクニックによる復元に関する予備的研究を行った。本研究では,分解モデルを,近接場sarのシステム応答を考慮した空間変数複素畳み込みモデルに再構成する。それに合わせて、モデルベースのディープラーニングネットワークは、イメージを復元するように設計されている。複数の複雑なターゲットモデルからのシミュレーション劣化画像データセットを構築し,ネットワークの検証を行った。全ての画像は電磁シミュレーションツールを用いて定式化される。データセットの実験は、その有効性を明らかにする。現在の手法と比較して、目標形状とエネルギー推定に関して優れた性能が得られる。 Benefiting from a relatively larger aperture's angle, and in combination with a wide transmitting bandwidth, near-field synthetic aperture radar (SAR) provides a high-resolution image of a target's scattering distribution-hot spots. Meanwhile, imaging result suffers inevitable degradation from sidelobes, clutters, and noises, hindering the information retrieval of the target. To restore the image, current methods make simplified assumptions; for example, the point spread function (PSF) is spatially consistent, the target consists of sparse point scatters, etc. Thus, they achieve limited restoration performance in terms of the target's shape, especially for complex targets. To address these issues, a preliminary study is conducted on restoration with the recent promising deep learning inverse technique in this work. We reformulate the degradation model into a spatially variable complex-convolution model, where the near-field SAR's system response is considered. Adhering to it, a model-based deep learning network is designed to restore the image. 
A simulated degraded image dataset from multiple complex target models is constructed to validate the network. All the images are generated using an electromagnetic simulation tool. Experiments on the dataset demonstrate the network's effectiveness. Compared with current methods, superior performance is achieved regarding the target's shape and energy estimation.	翻訳日:2022-11-29 20:19:31 公開日:2022-11-28
# 断層sarイメージングのための多次元特徴量埋め込みモデルデータ駆動ネットワーク A Model-data-driven Network Embedding Multidimensional Features for Tomographic SAR Imaging ( http://arxiv.org/abs/2211.15002v1 ) ライセンス: Link先を確認	Yu Ren, Xiaoling Zhang, Xu Zhan, Jun Shi, Shunjun Wei, Tianjiao Zeng	(参考訳) ディープラーニング(DL)ベースのトモグラフィSARイメージングアルゴリズムの研究が徐々に進んでいる。典型的には、展開ネットワークを用いて古典的圧縮センシング法(CS)の反復計算を模倣し、各範囲方位単位を個別に処理する。しかし、この方法で有効活用されるのは1次元の特徴のみである。隣接する分解単位間の相関を直接無視する。そこで本研究では,多次元特徴量に基づくtomosarイメージングを実現するための新しいモデルデータ駆動ネットワークを提案する。ディープ・アンフォールディング法により、2次元ディープ・アンフォールディング・イメージング・ネットワークを構築する。そこで我々は,画像シーンの多次元的特徴を効果的に向上するために,畳み込みエンコーダ・デコーダ構造を2つの2次元処理モジュールに追加する。一方,提案する多機能イメージングネットワークをトレーニングするために,建物シミュレーションデータからなるトモSARシミュレーションデータセットを構築した。実験はモデルの有効性を検証する。従来のCS-based FISTA法とDL-based gamma-Net法と比較して,提案手法は良好な画像精度を有しつつ,完全性を向上させる。 Deep learning (DL)-based tomographic SAR imaging algorithms are gradually being studied. Typically, they use an unfolding network to mimic the iterative calculation of the classical compressive sensing (CS)-based methods and process each range-azimuth unit individually. However, only one-dimensional features are effectively utilized in this way. The correlation between adjacent resolution units is ignored directly. To address that, we propose a new model-data-driven network to achieve tomoSAR imaging based on multi-dimensional features. Guided by the deep unfolding methodology, a two-dimensional deep unfolding imaging network is constructed. On the basis of it, we add two 2D processing modules, both convolutional encoder-decoder structures, to enhance multi-dimensional features of the imaging scene effectively. Meanwhile, to train the proposed multifeature-based imaging network, we construct a tomoSAR simulation dataset consisting entirely of simulation data of buildings. Experiments verify the effectiveness of the model. 
Compared with the conventional CS-based FISTA method and DL-based gamma-Net method, the result of our proposed method has better performance on completeness while having decent imaging accuracy.	翻訳日:2022-11-29 20:19:12 公開日:2022-11-28
# 可逆ニューラルネットワークによる非知覚的敵攻撃 Imperceptible Adversarial Attack via Invertible Neural Networks ( http://arxiv.org/abs/2211.15030v1 ) ライセンス: Link先を確認	Zihan Chen, Ziyue Wang, Junjie Huang, Wentao Zhao, Xiao Liu, Dejian Guan	(参考訳) 補助的な勾配情報を利用した摂動の追加や、良性画像の既存詳細の破棄は、敵対的サンプルを生成するための2つの一般的なアプローチである。視覚的に知覚できないことは敵対的サンプルの望ましい特性であるが、従来の敵対的攻撃は、いまだに追跡可能な敵対的摂動を生み出している。本稿では,可逆ニューラルネットワークによる敵対的攻撃(AdvINN)という新しい手法を提案し,堅牢で知覚不可能な敵対的サンプルを生成する。具体的には、AdvINNは可逆ニューラルネットワークの情報保存特性を十分に活用し、ターゲットクラスのクラス固有の意味情報を同時に追加し、元のクラスの識別情報をドロップすることで、敵対的サンプルを生成する。 CIFAR-10, CIFAR-100, ImageNet-1Kの大規模な実験により, 提案したAdvINN法は, 最先端の手法よりも知覚されにくい敵対的画像を生成することができ, また, 他の攻撃に比べ, 高い信頼度でより堅牢な敵対的サンプルが得られることを示した。 Adding perturbations by utilizing auxiliary gradient information or discarding existing details of the benign images are two common approaches for generating adversarial examples. Though visual imperceptibility is the desired property of adversarial examples, conventional adversarial attacks still generate traceable adversarial perturbations. In this paper, we introduce a novel Adversarial Attack via Invertible Neural Networks (AdvINN) method to produce robust and imperceptible adversarial examples. Specifically, AdvINN fully takes advantage of the information preservation property of Invertible Neural Networks and thereby generates adversarial examples by simultaneously adding class-specific semantic information of the target class and dropping discriminant information of the original class. Extensive experiments on CIFAR-10, CIFAR-100, and ImageNet-1K demonstrate that the proposed AdvINN method can produce more imperceptible adversarial images than the state-of-the-art methods and that AdvINN yields more robust adversarial examples with high confidence compared to other adversarial attacks.	翻訳日:2022-11-29 20:18:51 公開日:2022-11-28
# Renmin University of China at TRECVID 2022: 特徴融合と否定理解によるビデオ検索の改善 Renmin University of China at TRECVID 2022: Improving Video Search by Feature Fusion and Negation Understanding ( http://arxiv.org/abs/2211.15039v1 ) ライセンス: Link先を確認	Xirong Li, Aozhu Chen, Ziyue Wang, Fan Hu, Kaibin Tian, Xinru Chen, Chengbo Dong	(参考訳) TRECVID 2022 Ad-hoc Video Search (AVS) 実験を要約する。提案手法は,視覚とテキストの多様な特徴を結合するLightweight Attentional Feature Fusion (LAFF) と,否定的手がかりを含む問合せに対する双方向否定学習 (BNL) という2つの新しい手法を用いて構築した。特にLAFFは、早期と後期の両方で特徴融合を行い、テキストとビデオの両方で多様な(既製の)特徴を利用する。マルチヘッド自己注意と比較して、LAFFはよりコンパクトだがより効果的である。注意重みはより少ない特徴の選択にも利用でき、検索性能はほとんど保存されている。 BNLは、与えられたトレーニングビデオとそのオリジナルの記述と部分的に否定された記述からなる三重項あたりの双方向制約損失を最小化することにより、否定対応のビデオ検索モデルを訓練する。ビデオ特徴抽出には事前学習済みのCLIP,BLIP,BEiT,ResNeXt-101,irCSNを用いる。テキスト特徴に関しては、bag-of-words、word2vec、CLIP、BLIPを採用している。トレーニングデータは,前回の参加でも使用したMSR-VTT,TGIF,VATEXからなる。さらに,事前学習のためにV3C1コレクションを自動キャプションする。 TRECVIDベンチマークの2022年版は、再びRUCMMチームにとって実りある参加となった。私たちのベストランは、infAPが0.262で、チーム別で2位にランクインした。 We summarize our TRECVID 2022 Ad-hoc Video Search (AVS) experiments. Our solution is built with two new techniques, namely Lightweight Attentional Feature Fusion (LAFF) for combining diverse visual / textual features and Bidirectional Negation Learning (BNL) for addressing queries that contain negation cues. In particular, LAFF performs feature fusion at both early and late stages and at both text and video ends to exploit diverse (off-the-shelf) features. Compared to multi-head self-attention, LAFF is much more compact yet more effective. Its attentional weights can also be used for selecting fewer features, with the retrieval performance mostly preserved. BNL trains a negation-aware video retrieval model by minimizing a bidirectionally constrained loss per triplet, where a triplet consists of a given training video, its original description and a partially negated description. For video feature extraction, we use pre-trained CLIP, BLIP, BEiT, ResNeXt-101 and irCSN. 
As for text features, we adopt bag-of-words, word2vec, CLIP and BLIP. Our training data consists of MSR-VTT, TGIF and VATEX, which were used in our previous participation. In addition, we automatically caption the V3C1 collection for pre-training. The 2022 edition of the TRECVID benchmark has again been a fruitful participation for the RUCMM team. Our best run, with an infAP of 0.262, ranks second teamwise.	翻訳日:2022-11-29 20:18:32 公開日:2022-11-28
# Nested U-Net Architectureによる低磁場MRI超解像 Synthetic Low-Field MRI Super-Resolution Via Nested U-Net Architecture ( http://arxiv.org/abs/2211.15047v1 ) ライセンス: Link先を確認	Aryan Kalluvila, Neha Koonjoo, Danyal Bhutto, Marcio Rockenbach, Matthew S. Rosen	(参考訳) 低磁場MRIスキャナーは、高磁場MRIスキャナーのポータブルで安価な代替品を提供することで、医療画像に革命をもたらす力を持っている。しかし、そのようなスキャナーは通常、ハイフィールドスキャナーよりもかなりノイズが多く、品質も低い。本研究の目的は、低磁場MRIスキャンのSNRと画像品質を改善し、診断能力を向上することである。この問題に対処するため,Nested U-Net ニューラルネットワークアーキテクチャの超解法アルゴリズムを提案し,提案手法を平均PSNR78.83,SSIM0.9551で上回った。 t1-mixデータセットと呼ばれる主要な重み付けmri画像データセットから人工的なノイズダウンサンプリング合成データを用いてネットワークをテストした。ある放射線技師はlikertスケール(1-5)で25枚の画像を記録し、全体の画像品質、解剖学的構造、および我々のアーキテクチャや他の出版作品(sr densenet、generator block、srcnnなど)の診断信頼度を評価しました。また、NLMSE(Natural log mean squared error)と呼ばれる新しいタイプの損失関数も導入する。結論として、Nested U-Netアーキテクチャを用いて、合成低磁場MRIに適用した単一画像超解像のためのより正確なディープラーニング手法を提案する。 Low-field (LF) MRI scanners have the power to revolutionize medical imaging by providing a portable and cheaper alternative to high-field MRI scanners. However, such scanners are usually significantly noisier and lower quality than their high-field counterparts. The aim of this paper is to improve the SNR and overall image quality of low-field MRI scans to improve diagnostic capability. To address this issue, we propose a Nested U-Net neural network architecture super-resolution algorithm that outperforms previously suggested deep learning methods with an average PSNR of 78.83 and SSIM of 0.9551. We tested our network on artificial noisy downsampled synthetic data from a major T1 weighted MRI image dataset called the T1-mix dataset. One board-certified radiologist scored 25 images on the Likert scale (1-5) assessing overall image quality, anatomical structure, and diagnostic confidence across our architecture and other published works (SR DenseNet, Generator Block, SRCNN, etc.). We also introduce a new type of loss function called natural log mean squared error (NLMSE). 
In conclusion, we present a more accurate deep learning method for single image super-resolution applied to synthetic low-field MRI via a Nested U-Net architecture.	翻訳日:2022-11-29 20:18:09 公開日:2022-11-28
# PlasmoID:薄い血液スミアにおけるインドネシアのマラリア原虫検出とセグメンテーションのためのデータセット PlasmoID: A dataset for Indonesian malaria parasite detection and segmentation in thin blood smear ( http://arxiv.org/abs/2211.15105v1 ) ライセンス: Link先を確認	Hanung Adi Nugroho, Rizki Nurfauzi, E. Elsa Herdiana Murhandarwati, Purwono Purwono	(参考訳) インドネシアは東南アジアで最多のマラリア患者数で2番目に高い国である。ディープラーニングアプローチに基づくマラリア寄生虫セマンティックセグメンテーションの異なる手法は、従来の方法の限界を減らす代替手段である。しかし,大型寄生虫が優勢であり,小寄生虫が抑制されるため,セマンティクスセグメンテーション技術の主な問題点が浮かび上がっている。加えて、データの量と分散は、モデルを確立する上で重要な影響である。本研究では2つの貢献を行う。まず,薄い血液スミアのマラリア寄生虫691点を含む559点の顕微鏡画像を収集した。データセットはPlasmoIDと名付けられ、ほとんどのデータはインドネシアの田舎から来ている。 PlasmoIDは寄生虫の検出とセグメンテーションの目的にも真実を提供する。第二に,rcnnの高速化とセマンティクスセグメンテーション手法を組み合わせたマラリア寄生虫のセグメンテーションと検出手法を提案する。提案手法はPlasmoIDデータセット上で評価されている。 UNet、ResFCN-18、DeepLabV3、DeepLabV3plus、ResUNet-18といったセマンティックセグメンテーション技術の研究と比較されている。その結果,本手法はマラリア寄生虫のセグメンテーションと検出を,本来のセグメンテーション手法と比較して改善できることがわかった。 Indonesia holds the second-highest-ranking country for the highest number of malaria cases in Southeast Asia. A different malaria parasite semantic segmentation technique based on a deep learning approach is an alternative to reduce the limitations of traditional methods. However, the main problem of the semantic segmentation technique is raised since large parasites are dominant, and the tiny parasites are suppressed. In addition, the amount and variance of data are important influences in establishing their models. In this study, we conduct two contributions. First, we collect 559 microscopic images containing 691 malaria parasites of thin blood smears. The dataset is named PlasmoID, and most data comes from rural Indonesia. PlasmoID also provides ground truth for parasite detection and segmentation purposes. Second, this study proposes a malaria parasite segmentation and detection scheme by combining Faster RCNN and a semantic segmentation technique. The proposed scheme has been evaluated on the PlasmoID dataset. 
It has been compared with recent semantic segmentation techniques, namely UNet, ResFCN-18, DeepLabV3, DeepLabV3plus and ResUNet-18. The results show that our proposed scheme improves malaria parasite segmentation and detection performance compared to the original semantic segmentation techniques.	翻訳日:2022-11-29 20:17:48 公開日:2022-11-28
# グローバルセンシング品質最大化に向けて:カメラネットワークの構成最適化スキーム Toward Global Sensing Quality Maximization: A Configuration Optimization Scheme for Camera Networks ( http://arxiv.org/abs/2211.15166v1 ) ライセンス: Link先を確認	Xuechao Zhang, Xuda Ding, Yi Ren, Yu Zheng, Chongrong Fang and Jianping He	(参考訳) ターゲットの集合を監視するカメラネットワークの性能は、カメラの構成に大きく依存する。本稿では,パラメータ化カメラネットワークモデルの再構成戦略について検討し,複数のターゲットの知覚特性をグローバルかつ同時に最適化できることを示す。まず,画像中の単位長オブジェクトが占有する画素数を,カメラのパラメータ(内在的,外在的,歪的係数など)によって決定される物体の知覚品質の指標として用いることを提案する。そして、カメラネットワークによる目標のセンシング品質を測定する単一の量を形成する。この量はさらに最適化問題の目的関数として機能し、最適なカメラ構成を得る。提案手法の有効性を広範囲なシミュレーションと実験により検証し, apriltag検出タスクの性能向上を明らかにした。この作業のためのコードと関連するユーティリティは、https://github.com/sszxc/MultiCam-Simulationで公開されている。 The performance of a camera network monitoring a set of targets depends crucially on the configuration of the cameras. In this paper, we investigate the reconfiguration strategy for the parameterized camera network model, with which the sensing qualities of the multiple targets can be optimized globally and simultaneously. We first propose to use the number of pixels occupied by a unit-length object in image as a metric of the sensing quality of the object, which is determined by the parameters of the camera, such as intrinsic, extrinsic, and distortional coefficients. Then, we form a single quantity that measures the sensing quality of the targets by the camera network. This quantity further serves as the objective function of our optimization problem to obtain the optimal camera configuration. We verify the effectiveness of our approach through extensive simulations and experiments, and the results reveal its improved performance on the AprilTag detection tasks. Codes and related utilities for this work are open-sourced and available at https://github.com/sszxc/MultiCam-Simulation.	翻訳日:2022-11-29 20:17:28 公開日:2022-11-28
# AD診断と予後のための集団人工知能に基づくディープグレーディング Deep Grading based on Collective Artificial Intelligence for AD Diagnosis and Prognosis ( http://arxiv.org/abs/2211.15192v1 ) ライセンス: Link先を確認	Huy-Dung Nguyen, Micha\"el Cl\'ement, Boris Mansencal, and Pierrick Coup\'e	(参考訳) アルツハイマー病の正確な診断と予後は、新しい治療法の開発と関連するコストの削減に不可欠である。近年,畳み込みニューラルネットワークの進歩に伴い,この2つのタスクを構造MRIを用いて自動化する方法が提案されている。しかし、これらの手法はしばしば解釈可能性や一般化の欠如に苦しめられ、性能の面で制限されることがある。本稿では,これらの制約を克服する新しい深層フレームワークを提案する。私たちの枠組みは2つの段階からなる。最初の段階では,意味のある特徴を抽出するディープグレーディングモデルを提案する。ドメインシフトに対するこれらの特徴の堅牢性を高めるため、トレーニングと評価のための革新的な集合人工知能戦略を導入する。第2段階では、ADシグネチャをよりよくキャプチャするために、グラフ畳み込みニューラルネットワークを使用します。本研究は2074年を対象とし,AD診断と予後の両面で異なるデータセットにおける最先端の手法と比較した。 Accurate diagnosis and prognosis of Alzheimer's disease are crucial to develop new therapies and reduce the associated costs. Recently, with the advances of convolutional neural networks, methods have been proposed to automate these two tasks using structural MRI. However, these methods often suffer from lack of interpretability, generalization, and can be limited in terms of performance. In this paper, we propose a novel deep framework designed to overcome these limitations. Our framework consists of two stages. In the first stage, we propose a deep grading model to extract meaningful features. To enhance the robustness of these features against domain shift, we introduce an innovative collective artificial intelligence strategy for training and evaluating steps. In the second stage, we use a graph convolutional neural network to better capture AD signatures. Our experiments based on 2074 subjects show the competitive performance of our deep framework compared to state-of-the-art methods on different datasets for both AD diagnosis and prognosis.	翻訳日:2022-11-29 20:17:13 公開日:2022-11-28
# 深い優先順位を持つ調整不要なプラグアンドプレイハイパースペクトル画像デコンボリューション Tuning-free Plug-and-Play Hyperspectral Image Deconvolution with Deep Priors ( http://arxiv.org/abs/2211.15307v1 ) ライセンス: Link先を確認	Xiuheng Wang, Jie Chen, C\'edric Richard	(参考訳) デコンボリューション(deconvolution)は、取得装置が生成するハイパースペクトル画像~(hsi)のぼやけやノイズを軽減するために広く用いられる戦略である。この問題は通常、不適切な逆問題を解くことで解決される。適切な画像プリエントを調べることでデコンボリューション性能が向上するが、強力な正規化器を手作りし、正規化パラメータを設定することは自明ではない。本稿では,これらの問題に対処するため,hsiデコンボリューションのためのチューニングフリープラグアンドプレイ(pnp)アルゴリズムを提案する。具体的には、乗算器の交互方向法(ADMM)を用いて最適化問題を2つの反復部分確率に分解する。フレキシブルブラインド3dデノイジングネットワーク(b3ddn)は、より深い事前学習と、異なるノイズレベルでデノイジングサブ問題を解くために設計されている。次に、3次元残留白度の測定を行い、二次部分問題を解く際のペナルティパラメータと停止基準を調整する。実地データと実地データの両方における実験結果から,提案手法の優位性が示された。 Deconvolution is a widely used strategy to mitigate the blurring and noisy degradation of hyperspectral images~(HSI) generated by the acquisition devices. This issue is usually addressed by solving an ill-posed inverse problem. While investigating proper image priors can enhance the deconvolution performance, it is not trivial to handcraft a powerful regularizer and to set the regularization parameters. To address these issues, in this paper we introduce a tuning-free Plug-and-Play (PnP) algorithm for HSI deconvolution. Specifically, we use the alternating direction method of multipliers (ADMM) to decompose the optimization problem into two iterative sub-problems. A flexible blind 3D denoising network (B3DDN) is designed to learn deep priors and to solve the denoising sub-problem with different noise levels. A measure of 3D residual whiteness is then investigated to adjust the penalty parameters when solving the quadratic sub-problems, as well as a stopping criterion. Experimental results on both simulated and real-world data with ground-truth demonstrate the superiority of the proposed method.	翻訳日:2022-11-29 20:17:01 公開日:2022-11-28
# 脳MRIにおける教師なし異常検出の表現特性の検討 A Study of Representational Properties of Unsupervised Anomaly Detection in Brain MRI ( http://arxiv.org/abs/2211.15527v1 ) ライセンス: Link先を確認	Ayantika Das, Arun Palla, Keerthi Ram, Mohanasankar Sivaprakasam	(参考訳) MRIにおける異常検出は画像診断や診断において高い臨床的価値がある。異常検出の教師なし手法は、再構成や潜伏埋め込みに基づく興味深い定式化を提供し、分解に関連する特性を観察する方法を提供する。我々は4つの既存のモデリング手法を調査し、簡単なデータサイエンスツールを用いて経験的観察を報告し、脳構造MRIの場合を考慮して、非教師なしの異常検出の課題に最も関係がある因子化の観点から結果を求める。本研究は, 因子化関連特性を示す異常検出アルゴリズムが, 正規データと異常データとを区別する特徴量を持つことを示唆する。我々は、複数の異常および正常なデータセットで観測を検証した。 Anomaly detection in MRI is of high clinical value in imaging and diagnosis. Unsupervised methods for anomaly detection provide interesting formulations based on reconstruction or latent embedding, offering a way to observe properties related to factorization. We study four existing modeling methods, and report our empirical observations using simple data science tools, to seek outcomes from the perspective of factorization as it would be most relevant to the task of unsupervised anomaly detection, considering the case of brain structural MRI. Our study indicates that anomaly detection algorithms that exhibit factorization related properties are well capacitated with delineatory capabilities to distinguish between normal and anomaly data. We have validated our observations in multiple anomaly and normal datasets.	翻訳日:2022-11-29 20:16:39 公開日:2022-11-28
# データ増補とハイブリッド畳み込みネットワークを用いた前庭神経節状神経節形成のための非ペア化クロスモダリティセグメンテーションフレームワーク An Unpaired Cross-modality Segmentation Framework Using Data Augmentation and Hybrid Convolutional Networks for Segmenting Vestibular Schwannoma and Cochlea ( http://arxiv.org/abs/2211.14986v1 ) ライセンス: Link先を確認	Yuzhou Zhuang, Hong Liu, Enmin Song, Coskun Cetinkaya, and Chih-Cheng Hung	(参考訳) CrossMoDAの課題は、ラベル付き造影T1スキャンを利用して、ラベル付き高分解能T2スキャンで前庭神経腫瘍(VS)腫瘍とコチェリー領域を自動的に分離することである。 2022年版では、セグメンテーションタスクを多施設スキャンで拡張している。本研究では,データ拡張とハイブリッド畳み込みネットワークを用いた非ペア型クロスモダリティセグメンテーションフレームワークを提案する。多施設スキャンにおける不均一分布と様々な画像サイズを考慮し、各スキャンの強度を-1から1に拡大するためにmin-max正規化を適用し、ボクセルサイズ再サンプリングと中心刈りを用いて訓練を行う。我々は,意味情報を効果的に学習し,現実的な対象領域スキャンを生成するための2つのデータ拡張手法を採用した。本研究では,CUTとCycleGANを用いて,教師付きセグメンテーショントレーニングのための詳細と外観の異なる2つの現実的なT2ボリュームを生成する。オンラインデータ拡張のために,vs腫瘍信号の不均一性をシミュレートするランダム腫瘍信号低減法を考案する。さらに,多次元畳み込みを伴う高度なハイブリッド畳み込みネットワークを用いて,異方性スキャンにおいてvs腫瘍と人工内耳領域の正確なボリュームセグメンテーションのために,スパース間スライス情報と高密度内スライス情報を適応的に学習する。クロスモダ2022バリデーションデータセットでは有望な結果を示し,vs腫瘍領域では平均dsc値が72.47%,76.48%,asd値が3.42mmと0.53mmであった。 The crossMoDA challenge aims to automatically segment the vestibular schwannoma (VS) tumor and cochlea regions of unlabeled high-resolution T2 scans by leveraging labeled contrast-enhanced T1 scans. The 2022 edition extends the segmentation task by including multi-institutional scans. In this work, we proposed an unpaired cross-modality segmentation framework using data augmentation and hybrid convolutional networks. Considering heterogeneous distributions and various image sizes for multi-institutional scans, we apply the min-max normalization for scaling the intensities of all scans between -1 and 1, and use the voxel size resampling and center cropping to obtain fixed-size sub-volumes for training. We adopt two data augmentation methods for effectively learning the semantic information and generating realistic target domain scans: generative and online data augmentation. 
For generative data augmentation, we use CUT and CycleGAN to generate two groups of realistic T2 volumes with different details and appearances for supervised segmentation training. For online data augmentation, we design a random tumor signal reducing method for simulating the heterogeneity of VS tumor signals. Furthermore, we utilize an advanced hybrid convolutional network with multi-dimensional convolutions to adaptively learn sparse inter-slice information and dense intra-slice information for accurate volumetric segmentation of VS tumor and cochlea regions in anisotropic scans. On the crossMoDA2022 validation dataset, our method produces promising results and achieves the mean DSC values of 72.47% and 76.48% and ASSD values of 3.42 mm and 0.53 mm for VS tumor and cochlea regions, respectively.	翻訳日:2022-11-29 20:08:13 公開日:2022-11-28
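The min-max normalization step mentioned in the abstract above (scaling scan intensities to the range from -1 to 1) can be illustrated with a small sketch. This is a generic formulation under the assumption that each scan is scaled by its own minimum and maximum; the function name `minmax_scale` and its handling of constant inputs are hypothetical choices, not details taken from the paper.

```python
def minmax_scale(values, lo=-1.0, hi=1.0):
    """Linearly rescale a sequence of intensities to [lo, hi] (hypothetical helper)."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        # A constant input has no range; map every value to the midpoint (an assumption).
        return [(lo + hi) / 2.0 for _ in values]
    span = vmax - vmin
    # The minimum maps to lo, the maximum to hi, everything else linearly in between.
    return [lo + (hi - lo) * (v - vmin) / span for v in values]
```

For example, intensities 0, 50 and 100 map to -1, 0 and 1 respectively.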
# 多認識タスク指向フレームワークを用いた3次元レーダイメージング逆問題の解法 Solving 3D Radar Imaging Inverse Problems with a Multi-cognition Task-oriented Framework ( http://arxiv.org/abs/2211.14989v1 ) ライセンス: Link先を確認	Xu Zhan, Xiaoling Zhang, Mou Wang, Jun Shi, Shunjun Wei, Tianjiao Zeng	(参考訳) 本研究は3次元レーダ画像逆問題に焦点を当てる。現在の方法では,タスク依存情報検索の損失を被った未分化の結果が得られており,タスク固有の要求を十分に満たさない。例えば、偏った散乱エネルギーはスクリーンイメージングでは許容されるが、散乱診断では許容されない。この問題に対処するため,我々は新しいタスク指向イメージングフレームワークを提案する。撮像原理は、タスクの要求を得るために分析フェーズを通してタスク指向である。画像モデルは、要求を埋め込んで満たすために正規化された多認識である。本手法は,認識間のカップリングを近似法と可変スプリッティング法で個別に解く汎用的に設計されている。例として、散乱診断、パーソンスクリーンイメージング、パーセルスクリーニングイメージングなどがある。 2つのシステムからのデータに対する実験は、提案されたフレームワークがタスク依存情報検索において現在のフレームワークよりも優れていることを示している。 This work focuses on 3D Radar imaging inverse problems. Current methods obtain undifferentiated results that suffer task-dependent information retrieval loss and thus do not meet the task's specific demands well. For example, biased scattering energy may be acceptable for screen imaging but not for scattering diagnosis. To address this issue, we propose a new task-oriented imaging framework. The imaging principle is task-oriented through an analysis phase to obtain the task's demands. The imaging model is multi-cognition regularized to embed and fulfill demands. The imaging method is designed to be generalized, where couplings between cognitions are decoupled and solved individually with approximation and variable-splitting techniques. Tasks including scattering diagnosis, person screen imaging, and parcel screening imaging are given as examples. Experiments on data from two systems indicate that the proposed framework outperforms the current ones in task-dependent information retrieval.	翻訳日:2022-11-29 20:07:40 公開日:2022-11-28
# ロボット運動学:運動、運動学、力学 Robot Kinematics: Motion, Kinematics and Dynamics ( http://arxiv.org/abs/2211.15093v1 ) ライセンス: Link先を確認	Jiawei Zhang	(参考訳) この記事は、“Robot Basics: Representation, Rotation and Velocity”と題された前回の記事のフォローアップチュートリアル記事である。本稿では,本論文のトピックについてより深く理解するために,ロボット基礎に関する以前のチュートリアル記事を読むことを勧める。具体的には,ロボット運動,前方運動学,逆運動学,ロボット力学など,ロボットキネマティクスに関するより高度な話題について紹介する。前回の記事で紹介されたトピック、用語、表記について、この記事では再び導入することなく直接使用します。また、前回の記事と同様、本記事でも数学と公式が多用される(読者は今後の数学爆弾の準備が整っていることを願う)。この記事を読んでから、読者はロボットの動き、運動学、ダイナミクスについてより深く理解できるようになるだろう。ロボット制御に関するより先進的な話題については、読者向けの以下のチュートリアル記事で紹介する。 This is a follow-up tutorial article of our previous article entitled "Robot Basics: Representation, Rotation and Velocity". For better understanding of the topics covered in this articles, we recommend the readers to first read our previous tutorial article on robot basics. Specifically, in this article, we will cover some more advanced topics on robot kinematics, including robot motion, forward kinematics, inverse kinematics, and robot dynamics. For the topics, terminologies and notations introduced in the previous article, we will use them directly without re-introducing them again in this article. Also similar to the previous article, math and formulas will also be heavily used in this article as well (hope the readers are well prepared for the upcoming math bomb). After reading this article, readers should be able to have a deeper understanding about how robot motion, kinematics and dynamics. As to some more advanced topics about robot control, we will introduce them in the following tutorial articles for readers instead.	翻訳日:2022-11-29 20:01:39 公開日:2022-11-28
# 時空間物体モデリングによるプロアクティブロボット支援 Proactive Robot Assistance via Spatio-Temporal Object Modeling ( http://arxiv.org/abs/2211.15501v1 ) ライセンス: Link先を確認	Maithili Patel, Sonia Chernova	(参考訳) プロアクティブなロボット支援により、ロボットは明示的に求められることなく、ユーザのニーズを予測し、それに応えることができる。我々はプロアクティブな支援を、ロボットが日常のユーザルーチンに伴う物体移動の時間的パターンを予測し、環境をユーザのニーズに適応させるように物体を配置して先回りして支援する問題として定式化する。本稿では,物体配置の時系列から物体ダイナミクスの統一的な時空間予測モデルを学習するために,生成グラフニューラルネットワークを導入する。また,5つのシミュレートされた家庭で50日以上にわたり、日常生活動作に関連する家庭内オブジェクトを追跡するHousehold Object Movements from Everyday Routines(HOMER)データセットを提供する。提案モデルは,物体移動の予測において主要なベースラインを上回り,ユーザが使用する物体について、位置を正しく予測できる物体が11.1%多く、位置を誤って予測する物体が11.5%少ない。 Proactive robot assistance enables a robot to anticipate and provide for a user's needs without being explicitly asked. We formulate proactive assistance as the problem of the robot anticipating temporal patterns of object movements associated with everyday user routines, and proactively assisting the user by placing objects to adapt the environment to their needs. We introduce a generative graph neural network to learn a unified spatio-temporal predictive model of object dynamics from temporal sequences of object arrangements. We additionally contribute the Household Object Movements from Everyday Routines (HOMER) dataset, which tracks household objects associated with human activities of daily living across 50+ days for five simulated households. Our model outperforms the leading baseline in predicting object movement, correctly predicting locations for 11.1% more objects and wrongly predicting locations for 11.5% fewer objects used by the human user.	翻訳日:2022-11-29 20:01:23 公開日:2022-11-28
# トレーニング不足でグラフニューラルネットワークを改良:訓練されていないGNNのチケットを見つける You Can Have Better Graph Neural Networks by Not Training Weights at All: Finding Untrained GNNs Tickets ( http://arxiv.org/abs/2211.15335v1 ) ライセンス: Link先を確認	Tianjin Huang, Tianlong Chen, Meng Fang, Vlado Menkovski, Jiaxu Zhao, Lu Yin, Yulong Pei, Decebal Constantin Mocanu, Zhangyang Wang, Mykola Pechenizkiy, Shiwei Liu	(参考訳) 近年の研究では、ネットワークの重みを最適化することなく、完全に訓練された高密度ネットワークの性能に匹敵する、ランダムに初期化された畳み込みニューラルネットワーク(CNN)にサブネットワークが存在することが顕著に示されている。しかし、グラフニューラルネットワーク(GNN)におけるそのような訓練されていないサブネットワークの存在は、いまだに謎のままである。本稿では,未学習のGNNを探索する第一種探索を行う。 sparsityをコアツールとして、初期化時に \textit{untrained sparse subnetworks} を見つけることができ、これは \textit{fully trained dense} gnnのパフォーマンスにマッチする。このことに加えて、未学習のサブネットワークがGNNのオーバースムース化問題を大幅に軽減し、ベルやホイッスルを使わずにより深いGNNを可能にする強力なツールとなることを示す。また,そのようなスパースな未学習サブネットワークは,分布外検出や入力摂動のロバスト性において,優れた性能を有することが観察された。提案手法は,Open Graph Benchmark (OGB) など,広く使用されているGNNアーキテクチャを用いて評価する。 Recent works have impressively demonstrated that there exists a subnetwork in randomly initialized convolutional neural networks (CNNs) that can match the performance of the fully trained dense networks at initialization, without any optimization of the weights of the network (i.e., untrained networks). However, the presence of such untrained subnetworks in graph neural networks (GNNs) still remains mysterious. In this paper we carry out the first-of-its-kind exploration of discovering matching untrained GNNs. With sparsity as the core tool, we can find \textit{untrained sparse subnetworks} at the initialization, that can match the performance of \textit{fully trained dense} GNNs. Besides this already encouraging finding of comparable performance, we show that the found untrained subnetworks can substantially mitigate the GNN over-smoothing problem, hence becoming a powerful tool to enable deeper GNNs without bells and whistles. 
We also observe that such sparse untrained subnetworks have appealing performance in out-of-distribution detection and robustness of input perturbations. We evaluate our method across widely-used GNN architectures on various popular datasets including the Open Graph Benchmark (OGB).	翻訳日:2022-11-29 19:53:17 公開日:2022-11-28
# 強化学習におけるゼロショット転送のためのハイパーネットワーク Hypernetworks for Zero-shot Transfer in Reinforcement Learning ( http://arxiv.org/abs/2211.15457v1 ) ライセンス: Link先を確認	Sahand Rezaei-Shoshtari, Charlotte Morissette, Francois Robert Hogan, Gregory Dudek, David Meger	(参考訳) 本稿では,新しいTDベースのトレーニング目標と準最適RLソリューションの集合から得られたデータを用いて,未知のタスク条件にまたがる行動を生成するために,ハイパーネットワークを訓練する。この作業は、メタRL、コンテキストRL、トランスファーラーニングに関連するもので、特にテスト時のゼロショットパフォーマンスに焦点を当てており、タスクパラメータ(コンテキストとしても知られる)の知識によって実現されている。我々の技術的アプローチは、各RLアルゴリズムをMDP仕様から準最適値関数とポリシーへのマッピングとして捉え、MDPのパラメータを考慮し、準最適値関数とポリシーを生成できるハイパーネットワークで近似することに基づいている。特定の条件下では、このマッピングを教師付き学習問題とみなすことができる。我々は,DeepMind Control Suiteの一連の連続制御タスクにおいて,新たな報酬と遷移ダイナミクスへのゼロショット転送の有効性を実証的に評価した。提案手法は,マルチタスクおよびメタRLアプローチによるベースラインの大幅な改善を示す。 In this paper, hypernetworks are trained to generate behaviors across a range of unseen task conditions, via a novel TD-based training objective and data from a set of near-optimal RL solutions for training tasks. This work relates to meta RL, contextual RL, and transfer learning, with a particular focus on zero-shot performance at test time, enabled by knowledge of the task parameters (also known as context). Our technical approach is based upon viewing each RL algorithm as a mapping from the MDP specifics to the near-optimal value function and policy and seek to approximate it with a hypernetwork that can generate near-optimal value functions and policies, given the parameters of the MDP. We show that, under certain conditions, this mapping can be considered as a supervised learning problem. We empirically evaluate the effectiveness of our method for zero-shot transfer to new reward and transition dynamics on a series of continuous control tasks from DeepMind Control Suite. Our method demonstrates significant improvements over baselines from multitask and meta RL approaches.	翻訳日:2022-11-29 19:52:54 公開日:2022-11-28
# ベイズ逆強化学習による実演十分性の自律的評価 Autonomous Assessment of Demonstration Sufficiency via Bayesian Inverse Reinforcement Learning ( http://arxiv.org/abs/2211.15542v1 ) ライセンス: Link先を確認	Tu Trinh, Daniel S. Brown	(参考訳) 本稿では,実演から学習するAIエージェントにおける実演の十分性を判定する問題、すなわち、所望の性能水準を保証するのに十分な実演を専門家から受け取ったかどうかをAIエージェントがどのように自己評価できるかについて考察する。この問題を解決するために,ベイズ逆強化学習とバリュー・アット・リスクに基づく新たな自己評価手法を提案する。これにより、実演から学習するエージェントは自身の性能に対する高信頼境界を計算し、その境界を用いて実演の数が十分になった時点を判定できる。我々は,(1)専門家の観測されない報酬関数に対するリグレットを測る正規化期待値差,(2)ベースライン方策に対する改善,という2つの十分性の定義を提案し,評価する。両指標の高信頼境界を定式化する方法を示す。我々はシミュレーションで本手法を評価し、専門家の性能に匹敵すること、あるいは所望の安全閾値内でベースライン方策の性能を上回ることを高い信頼度で保証するのに十分なトレーニングデータを受け取ったかどうかを正確に評価できるAIシステムが開発可能であることを示す。 In this paper we examine the problem of determining demonstration sufficiency for AI agents that learn from demonstrations: how can an AI agent self-assess whether it has received enough demonstrations from an expert to ensure a desired level of performance? To address this problem we propose a novel self-assessment approach based on Bayesian inverse reinforcement learning and value-at-risk to enable agents that learn from demonstrations to compute high-confidence bounds on their performance and use these bounds to determine when they have a sufficient number of demonstrations. We propose and evaluate two definitions of sufficiency: (1) normalized expected value difference, which measures regret with respect to the expert's unobserved reward function, and (2) improvement over a baseline policy. We demonstrate how to formulate high-confidence bounds on both of these metrics. We evaluate our approach in simulation and demonstrate the feasibility of developing an AI system that can accurately evaluate whether it has received sufficient training data to guarantee, with high confidence, that it can match an expert's performance or surpass the performance of a baseline policy within some desired safety threshold.	翻訳日:2022-11-29 19:52:25 公開日:2022-11-28
# Machine Learning for Health シンポジウム 2022 -- 拡張要約トラック Machine Learning for Health symposium 2022 -- Extended Abstract track ( http://arxiv.org/abs/2211.15564v1 ) ライセンス: Link先を確認	Antonio Parziale, Monica Agrawal, Shalmali Joshi, Irene Y. Chen, Shengpu Tang, Luis Oala, Adarsh Subbaswamy	(参考訳) 2022年11月28日に米国ルイジアナ州ニューオーリンズにおいてオンラインと対面の両方で開催された、第2回 Machine Learning for Health シンポジウム(ML4H 2022)で発表された拡張要約のコレクション。Machine Learning for Health(ML4H)は、理論的な研究と応用的な研究の両方を含む、健康のための機械学習研究の発表の場として長く続いている。 ML4H 2022は、技術的に成熟した厳密な研究のフルレングスの投稿を対象とするプロシーディングストラックと、成熟度は低いが革新的な研究を議論のために受け付ける拡張要約トラックの2つの投稿トラックを設けた。 ML4Hシンポジウムに投稿された全ての原稿は、二重盲検の査読プロセスを経た。このコレクションに含まれる拡張要約は、健康と生物医学の関連問題に焦点を当てた革新的な機械学習研究を記述している。 A collection of the extended abstracts that were presented at the 2nd Machine Learning for Health symposium (ML4H 2022), which was held both virtually and in person on November 28, 2022, in New Orleans, Louisiana, USA. Machine Learning for Health (ML4H) is a longstanding venue for research into machine learning for health, including both theoretical works and applied works. ML4H 2022 featured two submission tracks: a proceedings track, which encompassed full-length submissions of technically mature and rigorous work, and an extended abstract track, which would accept less mature, but innovative research for discussion. All the manuscripts submitted to ML4H Symposium underwent a double-blind peer-review process. Extended abstracts included in this collection describe innovative machine learning research focused on relevant problems in health and biomedicine.	翻訳日:2022-11-29 19:51:59 公開日:2022-11-28
# カスケード障害から相互依存型インフラストラクチャネットワークを再構築するベイズ的アプローチ A Bayesian Approach to Reconstructing Interdependent Infrastructure Networks from Cascading Failures ( http://arxiv.org/abs/2211.15590v1 ) ライセンス: Link先を確認	Yu Wang, Jin-Zhu Yu, Hiba Baroud	(参考訳) 複雑な相互依存ネットワークの挙動を分析するには、ネットワークトポロジとネットワーク間の相互依存リンクに関する完全な情報が必要である。重要なインフラストラクチャシステムのような多くのアプリケーションにとって、ネットワーク相互依存を理解することは、カスケード障害を予測し、破壊の計画を立てるのに不可欠である。しかしながら、プライバシやセキュリティ上の懸念から、個々のネットワークのトポロジに関するデータは一般には利用できないことが多い。さらに、相互依存リンクは、しばしばカスケード障害の結果、破壊の余波でのみ明らかにされる。本稿では,カスケード故障の観測から相互依存型インフラストラクチャネットワークのトポロジを再構築するスケーラブルな非パラメトリックベイズ手法を提案する。インフラストラクチャ依存の提案と組み合わされたメトロポリス・ハスティングスアルゴリズムは、可能なグラフをサンプリングする効率を高めるために用いられる。相互依存型ネットワークの合成システムを再構築した結果,提案手法は精度と計算時間の両方で既存手法よりも優れていた。さらに本手法を用いて, シェルビー郡のガス水ネットワークやイタリアにおける電力水ネットワークの相互依存システムなど, 相互依存型インフラネットワークの1つのシステムと2つの実世界のシステムのトポロジを再構築し, アプローチの適用性を実証する。 Analyzing the behavior of complex interdependent networks requires complete information about the network topology and the interdependent links across networks. For many applications such as critical infrastructure systems, understanding network interdependencies is crucial to anticipate cascading failures and plan for disruptions. However, data on the topology of individual networks are often publicly unavailable due to privacy and security concerns. Additionally, interdependent links are often only revealed in the aftermath of a disruption as a result of cascading failures. We propose a scalable nonparametric Bayesian approach to reconstruct the topology of interdependent infrastructure networks from observations of cascading failures. Metropolis-Hastings algorithm coupled with the infrastructure-dependent proposal are employed to increase the efficiency of sampling possible graphs. Results of reconstructing a synthetic system of interdependent infrastructure networks demonstrate that the proposed approach outperforms existing methods in both accuracy and computational time. 
We further apply this approach to reconstruct the topology of one synthetic and two real-world systems of interdependent infrastructure networks, including gas-power-water networks in Shelby County, TN, USA, and an interdependent system of power-water networks in Italy, to demonstrate the general applicability of the approach.	翻訳日:2022-11-29 19:51:44 公開日:2022-11-28
# オフラインマルチエージェント強化学習における良い軌道からの学習 Learning From Good Trajectories in Offline Multi-Agent Reinforcement Learning ( http://arxiv.org/abs/2211.15612v1 ) ライセンス: Link先を確認	Qi Tian, Kun Kuang, Furui Liu, Baoxiang Wang	(参考訳) オフラインマルチエージェント強化学習(marl: offline multi-agent reinforcement learning)は、事前収集されたデータセットから効果的なマルチエージェントポリシーを学ぶことを目的としている。しかし、実際には、複数エージェントのジョイントトラジェクタを生成する個々の行動ポリシーは、通常、そのパフォーマンスのレベルが異なる。例えば、エージェントはランダムポリシーであり、他のエージェントはメディアポリシーである。グローバルな報酬を伴う協調ゲームでは、既存のオフラインMARLによって学習されたエージェントが、しばしばこのランダムなポリシーを継承し、チーム全体のパフォーマンスを危うくする。本稿では,エージェントワイドトラジェクトリの多様性を明確に考慮したオフラインMARLを調査し,この問題に対処するための共有個人トラジェクトリ(SIT)と呼ばれる新しいフレームワークを提案する。具体的には、注目ベースの報酬分解ネットワークは、異なるキー値記憶機構を介して各エージェントにオフラインでクレジットを割り当てる。これらの分解クレジットは、オフラインデータセットを個別の軌道と優先順位付けされた体験リプレイに再構築するために使用され、その後エージェントは良い軌道を共有し、グラフアテンションネットワーク(gat)ベースの批評家と保守的にポリシーを訓練することができる。離散制御(starcraft iiおよびmulti-agent particle environment)と連続制御(multi-agent mujoco)の両方において,本手法を評価した。提案手法は,複雑なオフラインマルチエージェントデータセットにおいて,特に個々のトラクタ間のデータ品質の差が大きい場合に,より優れた結果が得られることを示す。 Offline multi-agent reinforcement learning (MARL) aims to learn effective multi-agent policies from pre-collected datasets, which is an important step toward the deployment of multi-agent systems in real-world applications. However, in practice, each individual behavior policy that generates multi-agent joint trajectories usually has a different level of how well it performs. e.g., an agent is a random policy while other agents are medium policies. In the cooperative game with global reward, one agent learned by existing offline MARL often inherits this random policy, jeopardizing the performance of the entire team. In this paper, we investigate offline MARL with explicit consideration on the diversity of agent-wise trajectories and propose a novel framework called Shared Individual Trajectories (SIT) to address this problem. Specifically, an attention-based reward decomposition network assigns the credit to each agent through a differentiable key-value memory mechanism in an offline manner. 
These decomposed credits are then used to reconstruct the joint offline datasets into prioritized experience replay with individual trajectories, after which agents can share their good trajectories and conservatively train their policies with a graph attention network (GAT) based critic. We evaluate our method in both discrete control (i.e., StarCraft II and multi-agent particle environment) and continuous control (i.e., multi-agent MuJoCo). The results indicate that our method achieves significantly better results in complex and mixed offline multi-agent datasets, especially when the difference of data quality between individual trajectories is large.	翻訳日:2022-11-29 19:51:22 公開日:2022-11-28
# 離散化線形回帰とマルチクラス支持ベクトルによる大気汚染予測手法 Discretized Linear Regression and Multiclass Support Vector Based Air Pollution Forecasting Technique ( http://arxiv.org/abs/2211.15095v1 ) ライセンス: Link先を確認	Dhanalakshmi M and Radha V	(参考訳) 大気汚染は、発展途上国における伝統的なエネルギー資源の未管理利用から生じる重要な問題である。したがって、リスクを最小限に抑えるために、創発的な大気汚染予測手法が不可欠である。そこで本研究では,クラウドコンピューティング環境における大気汚染の監視と制御を行うIoT(Internet of Things)システムを提案する。リニア回帰・マルチクラスサポートベクトル (LR-MSV) IoTベースの大気汚染予測手法を提案し, 大気質データと大気質指数測定をモニタリングし, 効果的に制御する方法について検討した。インドのデータセットにおける空気品質データを用いた広範囲な実験により、確立された最先端手法を用いてベンチマークを行った場合、提案手法の優れた性能が明らかになった。 LR-MSV法により得られた結果は, 大気汚染予測時間と誤差率を, 他の最先端手法による結果と比較することにより, 大気汚染予測精度を著しく向上させることを示した。 Air pollution is a vital issue emerging from the uncontrolled utilization of traditional energy sources as far as developing countries are concerned. Hence, ingenious air pollution forecasting methods are indispensable to minimize the risk. To that end, this paper proposes an Internet of Things (IoT) enabled system for monitoring and controlling air pollution in the cloud computing environment. A method called Linear Regression and Multiclass Support Vector (LR-MSV) IoT-based Air Pollution Forecast is proposed to monitor the air quality data and the air quality index measurement to pave the way for controlling effectively. Extensive experiments carried out on the air quality data in the India dataset have revealed the outstanding performance of the proposed LR-MSV method when benchmarked with well-established state-of-the-art methods. The results obtained by the LR-MSV method witness a significant increase in air pollution forecasting accuracy by reducing the air pollution forecasting time and error rate compared with the results produced by the other state-of-the-art methods	翻訳日:2022-11-29 19:44:02 公開日:2022-11-28
# 間隔準メトリック埋め込みによる非対称距離の表現の改善 Improved Representation of Asymmetrical Distances with Interval Quasimetric Embeddings ( http://arxiv.org/abs/2211.15120v1 ) ライセンス: Link先を確認	Tongzhou Wang, Phillip Isola	(参考訳) 非対称距離構造(準距離構造)は、私たちの生活においてユビキタスであり、機械学習応用においてより注目を集めている。このような準計量構造をモデル表現に取り入れることで、強化学習(RL)や因果関係学習など、多くの課題が改善されることが示されている。本研究では,そのような準メトリックモデルにおいて4つの望ましい性質を示し,それに対してどのように先行作用が失敗するかを示す。 4つの基準を全て満たすために, IQE (Interval Quasimetric Embedding) を提案する。 3つの準メトリック学習実験において、iqeは強い近似と一般化能力を示し、従来の方法よりも優れた性能と効率をもたらす。 Project Page: https://www.tongzhouwang.info/interval_quasimetric_embedding Quasimetric Learning Code Package: https://www.github.com/quasimetric-learning/torch-quasimetric Asymmetrical distance structures (quasimetrics) are ubiquitous in our lives and are gaining more attention in machine learning applications. Imposing such quasimetric structures in model representations has been shown to improve many tasks, including reinforcement learning (RL) and causal relation learning. In this work, we present four desirable properties in such quasimetric models, and show how prior works fail at them. We propose Interval Quasimetric Embedding (IQE), which is designed to satisfy all four criteria. On three quasimetric learning experiments, IQEs show strong approximation and generalization abilities, leading to better performance and improved efficiency over prior methods. Project Page: https://www.tongzhouwang.info/interval_quasimetric_embedding Quasimetric Learning Code Package: https://www.github.com/quasimetric-learning/torch-quasimetric	翻訳日:2022-11-29 19:43:46 公開日:2022-11-28
# 多様なマルチタスクデータに対するオフラインQ学習はスケールし、かつ汎化する Offline Q-Learning on Diverse Multi-Task Data Both Scales And Generalizes ( http://arxiv.org/abs/2211.15144v1 ) ライセンス: Link先を確認	Aviral Kumar, Rishabh Agarwal, Xinyang Geng, George Tucker, Sergey Levine	(参考訳) オフライン強化学習(RL)の可能性は、大規模で異種のデータセットで訓練された高容量モデルが、視覚やNLPにおける同様の進歩と同じように、広く汎化するエージェントにつながり得ることにある。しかし、最近の研究は、オフラインRL手法はモデル容量のスケールアップに固有の課題に直面すると主張している。これらの研究から得られた知見をもとに,我々は従来の設計上の選択を再検討し、ResNet、クロスエントロピーに基づく分布的バックアップ、特徴正規化という適切な選択を行えば、オフラインQ学習アルゴリズムがモデル容量に応じてスケールする強力な性能を示すことを見出した。マルチタスクのAtariをスケーリングと汎化のテストベッドとして用い、最大8000万パラメータのネットワークで40ゲームに対する単一のポリシーを人間に近い性能で訓練し、モデル性能が容量に応じて良好にスケールすることを確認した。先行研究とは対照的に、大規模(4億遷移)だが大きく準最適なデータセット(人間レベルの51%の性能)のみで訓練した場合でも、データセットの性能を超えて外挿する。リターン条件付きの教師ありアプローチと比較して、オフラインQ学習はモデル容量に対して同様にスケールし、特にデータセットが準最適な場合により高い性能を示す。最後に、多様なデータセットを用いたオフラインQ学習は、新しいゲームへの迅速な転移や、訓練ゲームの新たなバリエーションにおける高速なオンライン学習を促進する強力な表現を学習するのに十分であり、既存の最先端の表現学習アプローチを上回ることを示す。 The potential of offline reinforcement learning (RL) is that high-capacity models trained on large, heterogeneous datasets can lead to agents that generalize broadly, analogously to similar advances in vision and NLP. However, recent works argue that offline RL methods encounter unique challenges to scaling up model capacity. Drawing on the learnings from these works, we re-examine previous design choices and find that with appropriate choices: ResNets, cross-entropy based distributional backups, and feature normalization, offline Q-learning algorithms exhibit strong performance that scales with model capacity. Using multi-task Atari as a testbed for scaling and generalization, we train a single policy on 40 games with near-human performance using up-to 80 million parameter networks, finding that model performance scales favorably with capacity. 
In contrast to prior work, we extrapolate beyond dataset performance even when trained entirely on a large (400M transitions) but highly suboptimal dataset (51% human-level performance). Compared to return-conditioned supervised approaches, offline Q-learning scales similarly with model capacity and has better performance, especially when the dataset is suboptimal. Finally, we show that offline Q-learning with a diverse dataset is sufficient to learn powerful representations that facilitate rapid transfer to novel games and fast online learning on new variations of a training game, improving over existing state-of-the-art representation learning approaches.	翻訳日:2022-11-29 19:43:31 公開日:2022-11-28
# データ駆動型多項ランダムフォレスト Data-driven multinomial random forest ( http://arxiv.org/abs/2211.15154v1 ) ライセンス: Link先を確認	JunHao Chen	(参考訳) 本稿では,ランダムフォレスト(RF)変種に対する従来の弱一致性の証明法を強一致性の証明法へと強化し,RF変種のデータ駆動の度合いを高めることで,より優れた理論的性質と実験性能を得る。さらに,多項ランダムフォレスト(MRF)に基づくデータ駆動型多項ランダムフォレスト(DMRF)を提案する。DMRFは強一致性を満たし、MRFよりも計算量が低く、性能はMRFと同等以上である。我々の知る限り、DMRFアルゴリズムは計算量が低く、優れた性能を持つRFの変種である。 In this paper, we strengthen the previous weak consistency proof method of random forest variants into a strong consistency proof method, and strengthen the data-driven degree of RF variants, so as to obtain better theoretical properties and experimental performance. In addition, we also propose a data-driven multinomial random forest (DMRF) based on the multinomial random forest (MRF), which meets the strong consistency and has lower complexity than MRF, and the effect is equal to or better than MRF. As far as we know, DMRF algorithm is a variant of RF with low algorithm complexity and excellent performance.	翻訳日:2022-11-29 19:43:01 公開日:2022-11-28
# インクリメンタルフーリエニューラルオペレータ Incremental Fourier Neural Operator ( http://arxiv.org/abs/2211.15188v1 ) ライセンス: Link先を確認	Jiawei Zhao, Robert Joseph George, Yifei Zhang, Zongyi Li, Anima Anandkumar	(参考訳) 近年、ニューラルネットワークは偏微分方程式(PDE)を解く優れた能力を示している。中でもフーリエニューラル演算子(FNO)は、乱流のような強い非線形問題に対する解作用素の学習で成功を収めている。 FNOは離散化不変であり、低解像度のデータで訓練して高解像度の問題に汎化できる。この性質は、情報の伝播に限られた数の周波数モードだけを選択するFNOの低域通過フィルタと関係している。しかし、異なるPDEに対して適切な数の周波数モードと訓練解像度を選択することは依然として課題である。周波数モードが少なすぎ、データの解像度が低いと汎化が損なわれる一方、周波数モードが多すぎ、データの解像度が高いと計算コストが高くなり、過学習につながる。そこで本研究では,訓練中に周波数モードとデータ解像度の両方を漸進的に増やすインクリメンタルフーリエニューラル演算子(IFNO)を提案する。 IFNOは,標準のFNOに比べて計算コストを35%削減しつつ,より優れた汎化(テスト時のL2損失を約15%削減)を達成することを示す。さらに,IFNOはFNOにおける暗黙的正則化の挙動に従うことが観察され,これがその優れた汎化能力を説明する。 Recently, neural networks have proven their impressive ability to solve partial differential equations (PDEs). Among them, Fourier neural operator (FNO) has shown success in learning solution operators for highly non-linear problems such as turbulence flow. FNO is discretization-invariant, where it can be trained on low-resolution data and generalizes to problems with high-resolution. This property is related to the low-pass filters in FNO, where only a limited number of frequency modes are selected to propagate information. However, it is still a challenge to select an appropriate number of frequency modes and training resolution for different PDEs. Too few frequency modes and low-resolution data hurt generalization, while too many frequency modes and high-resolution data are computationally expensive and lead to over-fitting. To this end, we propose Incremental Fourier Neural Operator (IFNO), which augments both the frequency modes and data resolution incrementally during training. We show that IFNO achieves better generalization (around 15% reduction on testing L2 loss) while reducing the computational cost by 35%, compared to the standard FNO. In addition, we observe that IFNO follows the behavior of implicit regularization in FNO, which explains its excellent generalization ability.	翻訳日:2022-11-29 19:42:50 公開日:2022-11-28
# CIM:スパース逆連続制御のための制約付き固有モチベーション CIM: Constrained Intrinsic Motivation for Sparse-Reward Continuous Control ( http://arxiv.org/abs/2211.15205v1 ) ライセンス: Link先を確認	Xiang Zheng, Xingjun Ma, Cong Wang	(参考訳) 内在的動機付けは、希薄な報酬または欠如した報酬で強化学習タスクを解決するための有望な探索技術である。固有のモチベーションを実装するには2つの技術的課題があります。 1)効率的な探査を促進するための適切な本質的目標の設計方法 2)本質的な目的と外生的な目的を組み合わせて、より良い解決策を見つける方法。現在の文献では、本質的な目的はすべてタスクに依存しない方法で設計され、単純な追加(あるいは報酬のない事前訓練に自身で使用する)によって外生的な目的と組み合わせられている。本研究では、これらの設計が典型的なスパース逆連続制御タスクで失敗することを示す。そこで本研究では,制約付き本質的目標を構築するために,容易に達成可能なタスクプリエントを活用するための制約付き本質的モチベーション(cim)を提案し,同時に,本質的目標と外生的目標を同時最大化フレームワークで適応的にバランスさせるラグランジアン法を活用した。我々は、複数のスパース逆連続制御タスクにおいて、CIM手法が最先端手法よりも性能とサンプル効率を大幅に向上させることを示す。さらに、CIMの重要なテクニックを既存のメソッドにプラグインしてパフォーマンスを向上させることも可能です。 Intrinsic motivation is a promising exploration technique for solving reinforcement learning tasks with sparse or absent extrinsic rewards. There exist two technical challenges in implementing intrinsic motivation: 1) how to design a proper intrinsic objective to facilitate efficient exploration; and 2) how to combine the intrinsic objective with the extrinsic objective to help find better solutions. In the current literature, the intrinsic objectives are all designed in a task-agnostic manner and combined with the extrinsic objective via simple addition (or used by itself for reward-free pre-training). In this work, we show that these designs would fail in typical sparse-reward continuous control tasks. To address the problem, we propose Constrained Intrinsic Motivation (CIM) to leverage readily attainable task priors to construct a constrained intrinsic objective, and at the same time, exploit the Lagrangian method to adaptively balance the intrinsic and extrinsic objectives via a simultaneous-maximization framework. We empirically show, on multiple sparse-reward continuous control tasks, that our CIM approach achieves greatly improved performance and sample efficiency over state-of-the-art methods. 
Moreover, the key techniques of our CIM can also be plugged into existing methods to boost their performances.	翻訳日:2022-11-29 19:42:30 公開日:2022-11-28
# Chroma-VAE: 生成型分類器によるショートカット学習の軽減 Chroma-VAE: Mitigating Shortcut Learning with Generative Classifiers ( http://arxiv.org/abs/2211.15231v1 ) ライセンス: Link先を確認	Wanqian Yang, Polina Kirichenko, Micah Goldblum, Andrew Gordon Wilson	(参考訳) 深層ニューラルネットワークは、学習をショートカットし、基本的な意味構造を発見することなく、トレーニング損失の少ないために単純な特徴を使用する。先行する信念とは対照的に,生成モデルだけでは,識別的アプローチよりも総合的な表現を回復する動機があるにもかかわらず,近距離学習を防止するには不十分であることを示す。しかし,ショートカットを最小限の情報で優先的に符号化することは,生成モデルがショートカット学習の軽減に有効であることを示す。特にChroma-VAEは、VAE分類器を初期訓練して小さな潜在部分空間でショートカットを分離し、二次分類器を補完的、ショートカットのない潜在部分空間で訓練する2段階のアプローチを提案する。ベンチマークや実世界のショートカット学習におけるクロマVAEの有効性の実証に加えて, 生成型分類器の潜時空間を操作して, 特定の相関関係を分離・解釈する可能性を強調した。 Deep neural networks are susceptible to shortcut learning, using simple features to achieve low training loss without discovering essential semantic structure. Contrary to prior belief, we show that generative models alone are not sufficient to prevent shortcut learning, despite an incentive to recover a more comprehensive representation of the data than discriminative approaches. However, we observe that shortcuts are preferentially encoded with minimal information, a fact that generative models can exploit to mitigate shortcut learning. In particular, we propose Chroma-VAE, a two-pronged approach where a VAE classifier is initially trained to isolate the shortcut in a small latent subspace, allowing a secondary classifier to be trained on the complementary, shortcut-free latent subspace. In addition to demonstrating the efficacy of Chroma-VAE on benchmark and real-world shortcut learning tasks, our work highlights the potential for manipulating the latent space of generative classifiers to isolate or interpret specific correlations.	翻訳日:2022-11-29 19:42:06 公開日:2022-11-28
# GADMSL:マルチスケールサブ構造学習による分散ネットワーク上のグラフ異常検出 GADMSL: Graph Anomaly Detection on Attributed Networks via Multi-scale Substructure Learning ( http://arxiv.org/abs/2211.15255v1 ) ライセンス: Link先を確認	Duan Jingcan, Wang Siwei, Liu Xinwang, Zhou Haifang, Hu Jingtao, Jin Hu	(参考訳) 近年,グラフ異常検出がデータマイニングや機械学習コミュニティで注目を集めている。既存の属性異常とは別に、グラフ異常検出は主要な属性と異なる疑わしい位相異常ノードもキャプチャする。グラフに基づく大規模な検出手法が提案されているが、そのほとんどはノードレベルの比較に重点を置いている。より異なる近傍構造を持つノードは、異常である可能性がより疑わしい。局所的なサブストラクチャー検出能力を高めるために,マルチスケールサブストラクチャー学習(GADMSL, Multi-scale Substructure Learning)によるグラフ異常検出フレームワークを提案する。従来のアルゴリズムとは異なり、内部類似度が密結合領域において比較的低い異常な部分構造を捉えることができる。具体的には,ネットワーク内の高密度な部分構造を疑わしい部分として見つけるための領域提案モジュールを採用する。内部ノード埋め込みの類似性は検出された部分構造の異常度を示している。一般に、埋め込み類似度の低いことは、部分構造が位相異常を含む高い確率を意味する。さらに,ノード属性の埋め込み性を向上するために,属性異常を観測するグラフコントラスト学習方式を導入する。このようにして、GADMSLはトポロジーと属性の異常の両方を検出することができる。最終的に、ベンチマークデータセットの広範な実験により、gadmslは最先端のネットワーク異常検出アルゴリズムに比べて検出性能(最大7.30%のaucと17.46%のauprc向上)が大幅に向上することが示された。 Recently, graph anomaly detection has attracted increasing attention in data mining and machine learning communities. Apart from existing attribute anomalies, graph anomaly detection also captures suspicious topological-abnormal nodes that differ from the major counterparts. Although massive graph-based detection approaches have been proposed, most of them focus on node-level comparison while pay insufficient attention on the surrounding topology structures. Nodes with more dissimilar neighborhood substructures have more suspicious to be abnormal. To enhance the local substructure detection ability, we propose a novel Graph Anomaly Detection framework via Multi-scale Substructure Learning (GADMSL for abbreviation). Unlike previous algorithms, we manage to capture anomalous substructures where the inner similarities are relatively low in dense-connected regions. Specifically, we adopt a region proposal module to find high-density substructures in the network as suspicious regions. 
Their inner-node embedding similarities indicate the anomaly degree of the detected substructures. Generally, a lower degree of embedding similarities means a higher probability that the substructure contains topology anomalies. To distill better embeddings of node attributes, we further introduce a graph contrastive learning scheme, which observes attribute anomalies in the meantime. In this way, GADMSL can detect both topology and attribute anomalies. Ultimately, extensive experiments on benchmark datasets show that GADMSL greatly improves detection performance (up to 7.30% AUC and 17.46% AUPRC gains) compared to state-of-the-art attributed networks anomaly detection algorithms.	翻訳日:2022-11-29 19:41:46 公開日:2022-11-28
# ラベルノイズに頑健なニューラルネットワークの確立 Establishment of Neural Networks Robust to Label Noise ( http://arxiv.org/abs/2211.15279v1 ) ライセンス: Link先を確認	Pengwei Yang, Angel Teng and Jack Mangos	(参考訳) ラベルノイズはディープラーニングモデルのトレーニングにおいて重要な障害である。これは画像分類モデルの性能に大きな影響を与える可能性があり、特にディープニューラルネットワークはノイズのあるラベルを記憶する傾向が強いため影響を受けやすい。本稿では,関連ラベルノイズ手法の基本概念について検討した。遷移行列推定器が作成され、実際の遷移行列に対する効果が実証されている。さらに,2つの畳み込みニューラルネットワーク分類器のラベル雑音耐性をLeNetとAlexNetの設計を用いて検討した。 2つのFashionMNISTデータセットは、両方のモデルの堅牢性を明らかにしている。我々は、時間と計算資源の制約により複雑な畳み込みニューラルネットワークモデルを正しく調整できないため、遷移行列ノイズ補正が堅牢性向上に与える影響を効率的に示すことができない。今後の研究において、ニューラルネットワークモデルを微調整し、推定遷移モデルの精度を探求する追加の努力が必要である。 Label noise is a significant obstacle in deep learning model training. It can have a considerable impact on the performance of image classification models, particularly deep neural networks, which are especially susceptible because they have a strong propensity to memorise noisy labels. In this paper, we have examined the fundamental concept underlying related label noise approaches. A transition matrix estimator has been created, and its effectiveness against the actual transition matrix has been demonstrated. In addition, we examined the label noise robustness of two convolutional neural network classifiers with LeNet and AlexNet designs. The two FashionMNIST datasets have revealed the robustness of both models. We are not efficiently able to demonstrate the influence of the transition matrix noise correction on robustness enhancements due to our inability to correctly tune the complex convolutional neural network model due to time and computing resource constraints. There is a need for additional effort to fine-tune the neural network model and explore the precision of the estimated transition model in future research.	翻訳日:2022-11-29 19:41:21 公開日:2022-11-28
# Flow: 動的ルーティングによる個人化フェデレーション学習 Flow: Per-Instance Personalized Federated Learning Through Dynamic Routing ( http://arxiv.org/abs/2211.15281v1 ) ライセンス: Link先を確認	Kunjal Panchal, Sunav Choudhary, Hui Guan	(参考訳) フェデレートラーニング(FL)におけるパーソナライゼーションは、クライアントごとに協調的に訓練されたグローバルモデルを変更することを目的としている。 FLにおけるパーソナライズへの現在のアプローチは、粗い粒度、すなわち、クライアントのすべての入力インスタンスは同じパーソナライズされたモデルを使っている。これは、いくつかのインスタンスがより正確なグローバルモデルによって扱われているという事実を無視している。この課題に対処するために、この研究は、きめ細かいステートレスパーソナライズされたFLアプローチであるFlowを提案する。 Flowは、入力インスタンスがローカルパラメータを好むかどうかを判断するルーティングメカニズムを学習することで、動的パーソナライズされたモデルを生成する。このようにflowは、クライアント毎のパーソナライズを活用して、各クライアントのアキュラビリティを向上させることに加えて、インスタンス毎のルーティングを導入する。さらに、Flowはステートレスであるため、クライアントがFLラウンド全体でパーソナライズされた状態を維持する必要がなくなる。これにより、Flowは大規模FL設定で実用的になり、新しく加入したクライアントと親しみやすくなります。 Stackoverflow、Reddit、EMNISTデータセットの評価は、FLに対する最先端の非個人化とクライアント毎のパーソナライズされたアプローチよりも、Flowの予測精度が優れていることを示している。 Personalization in Federated Learning (FL) aims to modify a collaboratively trained global model according to each client. Current approaches to personalization in FL are at a coarse granularity, i.e. all the input instances of a client use the same personalized model. This ignores the fact that some instances are more accurately handled by the global model due to better generalizability. To address this challenge, this work proposes Flow, a fine-grained stateless personalized FL approach. Flow creates dynamic personalized models by learning a routing mechanism that determines whether an input instance prefers the local parameters or its global counterpart. Thus, Flow introduces per-instance routing in addition to leveraging per-client personalization to improve accuracies at each client. Further, Flow is stateless which makes it unnecessary for a client to retain its personalized state across FL rounds. This makes Flow practical for large-scale FL settings and friendly to newly joined clients. 
Evaluations on Stackoverflow, Reddit, and EMNIST datasets demonstrate the superiority in prediction accuracy of Flow over state-of-the-art non-personalized and only per-client personalized approaches to FL.	翻訳日:2022-11-29 19:41:08 公開日:2022-11-28
# qlammp:自動化市場構築プロトコルの料金を最適化するq-learningエージェント QLAMMP: A Q-Learning Agent for Optimizing Fees on Automated Market Making Protocols ( http://arxiv.org/abs/2211.14977v1 ) ライセンス: Link先を確認	Dev Churiwala, Bhaskar Krishnamachari	(参考訳) AMM(Automated Market Makers)は、分散金融(DeFi)分野の不可欠な部分として自らを固めている。 AMMは、集中取引所を必要とせずに資産を取引できる取引所の一種である。多数の分散交換(DEX)の基礎を形成し、オンチェーントークンの迅速かつ効率的な交換を支援する。現在の一般的なdexはすべて静的プロトコルであり、固定パラメータが料金と曲率を制御している。この特徴は、トレーダーが不利な市場の動きによって引き起こされる高い滑り込み状態の間、遠ざかってしまう可能性がある。本稿では,AMMプロトコル上で収集した料金を最適化するRLフレームワークを提案する。特に,マーケットメイキングプロトコルのQラーニングエージェント(QLAMMP)を開発し,与えられたAMMプロトコルの最適料金率と係数を学習し,様々な市場条件下で収集された期待手数料を最大化する。 QLAMMPは、すべてのシミュレートされたテスト条件下で、その静的な性能よりも一貫して優れています。 Automated Market Makers (AMMs) have cemented themselves as an integral part of the decentralized finance (DeFi) space. AMMs are a type of exchange that allows users to trade assets without the need for a centralized exchange. They form the foundation for numerous decentralized exchanges (DEXs), which help facilitate the quick and efficient exchange of on-chain tokens. All present-day popular DEXs are static protocols, with fixed parameters controlling the fee and the curvature - they suffer from invariance and cannot adapt to quickly changing market conditions. This characteristic may cause traders to stay away during high slippage conditions brought about by intractable market movements. We propose an RL framework to optimize the fees collected on an AMM protocol. In particular, we develop a Q-Learning Agent for Market Making Protocols (QLAMMP) that learns the optimal fee rates and leverage coefficients for a given AMM protocol and maximizes the expected fee collected under a range of different market conditions. We show that QLAMMP is consistently able to outperform its static counterparts under all the simulated test conditions.	翻訳日:2022-11-29 19:32:34 公開日:2022-11-28
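For readers unfamiliar with Q-learning, the following is a minimal sketch of the standard tabular update rule that an agent such as the one in the QLAMMP entry above could build on. The state and action encoding (market-condition buckets and discrete fee levels), the reward, and the hyperparameters are illustrative assumptions; the abstract does not specify the authors' design.

```python
def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q-value.

    q_table[s][a] holds the current estimate for taking action a in state s.
    alpha is the learning rate and gamma the discount factor.
    """
    best_next = max(q_table[next_state])       # greedy value of the next state
    td_target = reward + gamma * best_next     # bootstrapped target
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]


# Hypothetical setup: 2 market-condition states, 3 candidate fee levels.
q = [[0.0, 0.0, 0.0] for _ in range(2)]
# Assumed reward: fees collected after choosing fee level 1 in state 0.
new_value = q_update(q, state=0, action=1, reward=1.0, next_state=1)
```

With an all-zero table, the first update moves the chosen entry a fraction `alpha` of the way toward the observed reward.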
# 最適スパース回帰木 Optimal Sparse Regression Trees ( http://arxiv.org/abs/2211.14980v1 ) ライセンス: Link先を確認	Rui Zhang, Rui Xin, Margo Seltzer, Cynthia Rudin	(参考訳) 回帰木はAIモデルの最も古い形式の1つであり、その予測は電卓なしで行うことができる。回帰木に関する大規模な文献の中で、問題の計算の難しさから、完全証明可能な最適化への取り組みはほとんどなかった。本研究は,確率的最適スパース回帰木の構築に対する動的プログラミングとバウンドのアプローチを提案する。ラベル集合上の1次元におけるk-平均クラスタリングアルゴリズムの最適解に基づく新しい下界を利用する。数秒で最適なスパースツリーを見つけることがしばしば可能で、大量のサンプルと高い相関性のある機能を含む、挑戦的なデータセットでさえあります。 Regression trees are one of the oldest forms of AI models, and their predictions can be made without a calculator, which makes them broadly useful, particularly for high-stakes applications. Within the large literature on regression trees, there has been little effort towards full provable optimization, mainly due to the computational hardness of the problem. This work proposes a dynamic-programming-with-bounds approach to the construction of provably-optimal sparse regression trees. We leverage a novel lower bound based on an optimal solution to the k-Means clustering algorithm in 1-dimension over the set of labels. We are often able to find optimal sparse trees in seconds, even for challenging datasets that involve large numbers of samples and highly-correlated features.	翻訳日:2022-11-29 19:32:15 公開日:2022-11-28
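The regression-tree entry above mentions a bound built on an optimal solution to k-means clustering in one dimension. How that solution enters the bound is not described in the abstract, so the sketch below covers only the 1-D subproblem: because optimal 1-D clusters are contiguous in sorted order, a simple dynamic program finds the minimum sum of squared errors. The function name is hypothetical, and this O(k·n²) version favours clarity over speed.

```python
def kmeans_1d_cost(values, k):
    """Return the minimum sum of squared errors for k clusters in 1-D."""
    if k <= 0 or not values:
        raise ValueError("need at least one value and k >= 1")
    xs = sorted(values)
    n = len(xs)
    k = min(k, n)  # more clusters than points cannot reduce the cost further

    # Prefix sums let us compute the cost of any contiguous block in O(1).
    prefix, prefix_sq = [0.0], [0.0]
    for x in xs:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)

    def sse(i, j):
        """Squared error of xs[i:j] around its own mean (requires j > i)."""
        total = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - total * total / (j - i)

    inf = float("inf")
    # best[c][j] = minimum cost of splitting the first j points into c clusters
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            best[c][j] = min(best[c - 1][i] + sse(i, j) for i in range(c - 1, j))
    return best[k][n]


cost_two = kmeans_1d_cost([1, 2, 10, 11], 2)  # clusters {1, 2} and {10, 11}
cost_one = kmeans_1d_cost([1, 2, 10, 11], 1)  # a single cluster around mean 6
```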
# dgi:gnnの簡単かつ効率的な推論 DGI: Easy and Efficient Inference for GNNs ( http://arxiv.org/abs/2211.15082v1 ) ライセンス: Link先を確認	Peiqi Yin, Xiao Yan, Jinjing Zhou, Qiang Fu, Zhenkun Cai, James Cheng, Bo Tang, Minjie Wang	(参考訳) グラフニューラルネットワーク(GNN)を訓練するために多くのシステムが開発されているが、効率的なモデル推論と評価は未解決のままである。例えば、広く採用されているノードワイドアプローチを使用して、モデル評価は、隣の爆発によるエンドツーエンドのトレーニングプロセスにおいて、最大94%の時間を占めることができる。一方、層ワイド推論は、各層に1ホップの隣り合わせしか必要としないように、各層で推論層を実行することによって、隣の爆発問題を回避している。しかし、計算のためにGNNモデルをレイヤーに手動で分解し、ワークロードをデバイスメモリに適合させるためには、バッチに分割する必要があるため、レイヤワイズ推論を実装するにはかなりのエンジニアリング作業が必要である。本稿では,GNNモデルの学習コードを階層的実行のために自動的に翻訳する,簡易かつ効率的なGNNモデル推論システムであるDeep Graph Inference(DGI)を開発する。 DGIはさまざまなGNNモデルとさまざまな種類の推論要求に対して汎用的であり、CPUメモリに収まらない大きなグラフ上でのコア外実行をサポートする。実験の結果、DGIは異なるデータセットとハードウェア設定で階層的推論を一貫して上回り、スピードアップは1000倍以上であることがわかった。 While many systems have been developed to train Graph Neural Networks (GNNs), efficient model inference and evaluation remain to be addressed. For instance, using the widely adopted node-wise approach, model evaluation can account for up to 94% of the time in the end-to-end training process due to neighbor explosion, which means that a node accesses its multi-hop neighbors. On the other hand, layer-wise inference avoids the neighbor explosion problem by conducting inference layer by layer such that the nodes only need their one-hop neighbors in each layer. However, implementing layer-wise inference requires substantial engineering efforts because users need to manually decompose a GNN model into layers for computation and split workload into batches to fit into device memory. In this paper, we develop Deep Graph Inference (DGI) -- a system for easy and efficient GNN model inference, which automatically translates the training code of a GNN model for layer-wise execution. DGI is general for various GNN models and different kinds of inference requests, and supports out-of-core execution on large graphs that cannot fit in CPU memory. 
Experimental results show that DGI consistently outperforms layer-wise inference across different datasets and hardware settings, and the speedup can be over 1,000x.	翻訳日:2022-11-29 19:32:05 公開日:2022-11-28
# バイオメディシンにおける会話探索と応用に関する研究 A Survey on Conversational Search and Applications in Biomedicine ( http://arxiv.org/abs/2211.15328v1 ) ライセンス: Link先を確認	Naga Sai Krishna Adatrao, Gowtham Reddy Gadireddy, Jiho Noh	(参考訳) 本稿では,ユーザが情報検索タスクの対話を行う情報検索手法を強化するためのアプローチである会話検索(convsearch)の抜本的な実行方法を提案する。本研究では,ConvSearchシステムにおけるヒューマン・インタラクティブな特徴に着目し,アクション・モジュール,おそらく検索システム,質問応答システム,レコメンダシステムの動作に注目した。動作モジュールとともに、知識ベース、自然言語処理、対話管理システムにおける様々なConvSearch研究問題をラベル付けした。さらに,convsearchの枠組みを分類し,臨床社会技術活用のためのバイオメディカル・医療分野への応用をめざした。最後に,特にバイオメディシンにおけるconvsearchの課題と課題について論じる。我々の主な目的は、さまざまな分野からConvSearchコンポーネントを統合して統合したビジョンを提供することであり、医療システムにおける情報検索のプロセスに役立てることである。 This paper aims to provide a radical rundown on Conversation Search (ConvSearch), an approach to enhance the information retrieval method where users engage in a dialogue for the information-seeking tasks. In this survey, we predominantly focused on the human interactive characteristics of the ConvSearch systems, highlighting the operations of the action modules, likely the Retrieval system, Question-Answering, and Recommender system. We labeled various ConvSearch research problems in knowledge bases, natural language processing, and dialogue management systems along with the action modules. We further categorized the framework to ConvSearch and the application is directed toward biomedical and healthcare fields for the utilization of clinical social technology. Finally, we conclude by talking through the challenges and issues of ConvSearch, particularly in Bio-Medicine. Our main aim is to provide an integrated and unified vision of the ConvSearch components from different fields, which benefit the information-seeking process in healthcare systems.	翻訳日:2022-11-29 19:16:55 公開日:2022-11-28
# Fast-SNARF:人工神経の高速変形器 Fast-SNARF: A Fast Deformer for Articulated Neural Fields ( http://arxiv.org/abs/2211.15601v1 ) ライセンス: Link先を確認	Xu Chen, Tianjian Jiang, Jie Song, Max Rietmann, Andreas Geiger, Michael J. Black, Otmar Hilliges	(参考訳) ニューラルフィールドは3次元再構成と剛体シーンの新しいビュー合成の領域に革命をもたらした。このような手法を人体などの関節オブジェクトに適用する上で重要な課題は、残りのポーズ(標準空間)と変形した空間の間の3D位置の変形をモデル化することである。本研究では, 反復的ルート探索により, 正準空間とポーズ空間の正確な対応を求める, ニューラルフィールドのための新しい調音モジュールfast-snarfを提案する。 Fast-SNARFは、これまでの作業であるSNARFの代替機能であり、計算効率は大幅に向上した。我々は,SNARFに対するアルゴリズムおよび実装の改善に寄与し,150\times$の高速化を実現した。これらの改善には、voxelベースの対応検索、線形ブレンドスキン機能の事前計算、CUDAカーネルによる効率的なソフトウェア実装が含まれる。高速SNARFは、対応のない変形した観察(例えば3Dメッシュ)に対して、形状とスキンの重量の効率的かつ同時最適化を可能にする。変形マップの学習は多くの人間のアバター法において重要な要素であり、Fast-SNARFは計算効率の良い解を提供するので、この研究は3次元仮想人間の実現に向けた重要な一歩であると信じている。 Neural fields have revolutionized the area of 3D reconstruction and novel view synthesis of rigid scenes. A key challenge in making such methods applicable to articulated objects, such as the human body, is to model the deformation of 3D locations between the rest pose (a canonical space) and the deformed space. We propose a new articulation module for neural fields, Fast-SNARF, which finds accurate correspondences between canonical space and posed space via iterative root finding. Fast-SNARF is a drop-in replacement in functionality to our previous work, SNARF, while significantly improving its computational efficiency. We contribute several algorithmic and implementation improvements over SNARF, yielding a speed-up of $150\times$. These improvements include voxel-based correspondence search, pre-computing the linear blend skinning function, and an efficient software implementation with CUDA kernels. Fast-SNARF enables efficient and simultaneous optimization of shape and skinning weights given deformed observations without correspondences (e.g. 3D meshes). 
Because learning of deformation maps is a crucial component in many 3D human avatar methods and since Fast-SNARF provides a computationally efficient solution, we believe that this work represents a significant step towards the practical creation of 3D virtual humans.	翻訳日:2022-11-29 19:08:31 公開日:2022-11-28
# 多レベル不均質学習による効率的なミラー検出 Efficient Mirror Detection via Multi-level Heterogeneous Learning ( http://arxiv.org/abs/2211.15644v1 ) ライセンス: Link先を確認	Ruozhen He and Jiaying Lin and Rynson W.H. Lau	(参考訳) 超効率的なミラー検出ネットワークであるhetnet (multi-level heterogeneous network) を提案する。現在のミラー検出手法は効率よりも性能に重点を置いており、リアルタイムアプリケーション(ドローンなど)を制限する。それらの効率性の欠如は、異なるレベルの同質な加群を採用するという共通の設計によって引き起こされる。対照的に、hetnetはまず低レベルの理解(例えば、強度コントラスト)を通じて潜在的なミラー領域を検出し、その後高レベルの理解(例えば、コンテキストの不連続)と組み合わせて予測を確定する。正確かつ効率的なミラー検出を行うため、hetnetは鏡を検出するために異なる段階で特定の情報を取得する効果的なアーキテクチャに従う。さらに,HetNetをベースとしたマルチオリエンテーション強度に基づくコントラスト付きモジュール (MIC) とリフレクションセマンティック論理モジュール (RSL) を提案し,低レベルの理解によるミラー領域の予測と,高レベルの理解によるシナリオにおけるセマンティックロジックの解析を行う。最先端の手法と比較すると、hetnetは664%高速で動作し、maeでは8.9%、iouでは3.1%、ミラー検出ベンチマークでは2つのf-measureで2.0%である。 We present HetNet (Multi-level Heterogeneous Network), a highly efficient mirror detection network. Current mirror detection methods focus more on performance than efficiency, limiting real-time applications (such as drones). Their lack of efficiency is caused by the common design of adopting homogeneous modules at different levels, which ignores the difference between different levels of features. In contrast, HetNet detects potential mirror regions initially through low-level understandings (e.g., intensity contrasts) and then combines with high-level understandings (contextual discontinuity for instance) to finalize the predictions. To perform accurate yet efficient mirror detection, HetNet follows an effective architecture that obtains specific information at different stages to detect mirrors. We further propose a multi-orientation intensity-based contrasted module (MIC) and a reflection semantic logical module (RSL), equipped on HetNet, to predict potential mirror regions by low-level understandings and analyze semantic logic in scenarios by high-level understandings, respectively. 
Compared to the state-of-the-art method, HetNet runs 664% faster and draws an average performance gain of 8.9% on MAE, 3.1% on IoU, and 2.0% on F-measure on two mirror detection benchmarks.	翻訳日:2022-11-29 19:08:11 公開日:2022-11-28
# OpenScene:オープン語彙による3Dシーン理解 OpenScene: 3D Scene Understanding with Open Vocabularies ( http://arxiv.org/abs/2211.15654v1 ) ライセンス: Link先を確認	Songyou Peng, Kyle Genova, Chiyu "Max" Jiang, Andrea Tagliasacchi, Marc Pollefeys, Thomas Funkhouser	(参考訳) 従来の3Dシーン理解アプローチは、単一のタスクのためにモデルをトレーニングするためのラベル付き3Dデータセットに依存している。私たちは,CLIP機能空間にテキストと画像ピクセルを埋め込んだ3次元シーンポイントの高密度特徴をモデルが予測する代替手法OpenSceneを提案する。このゼロショットアプローチは、タスク非依存のトレーニングとオープン語彙クエリを可能にする。例えば、SOTAゼロショット3Dセマンティックセグメンテーションを実行するには、まず3Dポイント毎にCLIP機能を推論し、後に任意のクラスラベルの埋め込みと類似性に基づいてそれらを分類する。さらに興味深いのは、これまでにないオープン語彙のシーン理解アプリケーションスイートを可能にすることだ。例えば、任意のテキストクエリを入力すると、シーンのどの部分が一致しているかを示すヒートマップが表示される。我々のアプローチは、複雑な3Dシーンにおいて、オブジェクト、材料、余剰、活動、ルームタイプを特定するのに効果的であり、いずれもラベル付き3Dデータなしでトレーニングされた単一のモデルを使用する。 Traditional 3D scene understanding approaches rely on labeled 3D datasets to train a model for a single task with supervision. We propose OpenScene, an alternative approach where a model predicts dense features for 3D scene points that are co-embedded with text and image pixels in CLIP feature space. This zero-shot approach enables task-agnostic training and open-vocabulary queries. For example, to perform SOTA zero-shot 3D semantic segmentation it first infers CLIP features for every 3D point and later classifies them based on similarities to embeddings of arbitrary class labels. More interestingly, it enables a suite of open-vocabulary scene understanding applications that have never been done before. For example, it allows a user to enter an arbitrary text query and then see a heat map indicating which parts of a scene match. Our approach is effective at identifying objects, materials, affordances, activities, and room types in complex 3D scenes, all using a single model trained without any labeled 3D data.	翻訳日:2022-11-29 19:07:42 公開日:2022-11-28
# ドット接続:2レベルクエリを用いたフロアプラン再構築 Connecting the Dots: Floorplan Reconstruction Using Two-Level Queries ( http://arxiv.org/abs/2211.15658v1 ) ライセンス: Link先を確認	Yuanwen Yue, Theodora Kontogianni, Konrad Schindler, Francis Engelmann	(参考訳) 3次元スキャンによる2次元フロアプラン再構成について述べる。既存のアプローチは通常、ヒューリスティックに設計されたマルチステージパイプラインを使用する。代わりに、フロアプラン再構築を単一段階構造予測タスクとして定式化し、可変サイズの多角形の集合を見つけ、これは順序付けられた頂点の可変長列である。そこで本研究では,複数の部屋の多角形を並列に,手作り中間段を使わずに総合的に生成する新しい変圧器アーキテクチャを開発した。モデルには、多角形と角形の2レベルクエリと、ネットワークをエンドツーエンドでトレーニング可能にする多角形マッチングが含まれている。提案手法は,Structured3DとSceneCADという2つの挑戦的データセットに対して,従来の手法よりもはるかに高速な推論を実現する。さらに、セマンティックルームタイプやドアや窓のようなアーキテクチャ要素などの追加情報を予測するために簡単に拡張できる。私たちのコードとモデルは、https://github.com/ywyue/RoomFormer.comで利用可能になります。 We address 2D floorplan reconstruction from 3D scans. Existing approaches typically employ heuristically designed multi-stage pipelines. Instead, we formulate floorplan reconstruction as a single-stage structured prediction task: find a variable-size set of polygons, which in turn are variable-length sequences of ordered vertices. To solve it we develop a novel Transformer architecture that generates polygons of multiple rooms in parallel, in a holistic manner without hand-crafted intermediate stages. The model features two-level queries for polygons and corners, and includes polygon matching to make the network end-to-end trainable. Our method achieves a new state-of-the-art for two challenging datasets, Structured3D and SceneCAD, along with significantly faster inference than previous methods. Moreover, it can readily be extended to predict additional information, i.e., semantic room types and architectural elements like doors and windows. Our code and models will be available at: https://github.com/ywyue/RoomFormer.	翻訳日:2022-11-29 19:07:24 公開日:2022-11-28
# Satlas: リモートセンシング画像理解のための大規模マルチタスクデータセット Satlas: A Large-Scale, Multi-Task Dataset for Remote Sensing Image Understanding ( http://arxiv.org/abs/2211.15660v1 ) ライセンス: Link先を確認	Favyen Bastani and Piper Wolters and Ritwik Gupta and Joe Ferdinando and Aniruddha Kembhavi	(参考訳) リモートセンシング画像は、森林伐採の追跡、違法漁業、都市の拡張、自然災害など、様々な環境・地球モニタリング作業に有用である。地球は極めて多様で、リモートセンシング画像における潜在的タスクの量は膨大であり、特徴の大きさは数kmから数十cm程度である。しかしながら、汎用的なコンピュータビジョン手法を作成することは、多くのタスクのためにこれらの多様な特徴をキャプチャする大規模なデータセットが欠如していることによる課題である。本稿では,上述したすべてのアプリケーションと,137のカテゴリと7つのラベルモダリティを持つ290mのラベルを含むスケールを特徴とする,リモートセンシングデータセットとベンチマークであるsatlasを提案する。我々は8つのベースラインと提案手法をsatlas上で評価し,リモートセンシングに特有の研究課題に対して,非常に異なる種類のセンサからのイメージからなる画像時系列の処理や,長距離空間コンテキストの活用など,改善の余地があることを見出した。また,satlasでの事前トレーニングでは,ラベル付き例が少なく,下流タスクのパフォーマンスが大幅に向上し,imagenetでは平均精度が16%向上し,次回のベストベースラインでは5%向上した。 Remote sensing images are useful for a wide variety of environmental and earth monitoring tasks, including tracking deforestation, illegal fishing, urban expansion, and natural disasters. The earth is extremely diverse -- the amount of potential tasks in remote sensing images is massive, and the sizes of features range from several kilometers to just tens of centimeters. However, creating generalizable computer vision methods is a challenge in part due to the lack of a large-scale dataset that captures these diverse features for many tasks. In this paper, we present Satlas, a remote sensing dataset and benchmark that is large in both breadth, featuring all of the aforementioned applications and more, as well as scale, comprising 290M labels under 137 categories and seven label modalities. We evaluate eight baselines and a proposed method on Satlas, and find that there is substantial room for improvement in addressing research challenges specific to remote sensing, including processing image time series that consist of images from very different types of sensors, and taking advantage of long-range spatial context. 
We also find that pre-training on Satlas substantially improves performance on downstream tasks with few labeled examples, increasing average accuracy by 16% over ImageNet and 5% over the next best baseline.	翻訳日:2022-11-29 19:07:07 公開日:2022-11-28
# Pseudo-multi-view Optimization による高忠実度3D GANインバージョン High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization ( http://arxiv.org/abs/2211.15662v1 ) ライセンス: Link先を確認	Jiaxin Xie, Hao Ouyang, Jingtan Piao, Chenyang Lei, Qifeng Chen	(参考訳) 入力画像の特定の詳細を保存しながら、フォトリアリスティックな新規ビューを合成できる高忠実な3D生成逆ネットワーク(GAN)インバージョンフレームワークを提案する。高忠実度3D GANインバージョンは、3Dインバージョンにおける幾何学的・テクスチャ的トレードオフのため本質的に困難である。この課題を解決するために,視覚分析を用いた擬似マルチビュー推定に基づく新しいパイプラインを提案する。目に見える部分の原文のテクスチャを保ち、隠された部分の生成前文を利用する。広範な実験により,本手法は分散テクスチャを有する画像においても,最先端手法よりも有利な再構成と新しいビュー合成品質を実現することが示された。提案するパイプラインでは、反転した潜在コードと3d対応テクスチャによるイメージ属性編集も可能である。提案手法は,1枚の画像から高忠実度3Dレンダリングを可能にし,AI生成3Dコンテンツの様々な応用に期待できる。 We present a high-fidelity 3D generative adversarial network (GAN) inversion framework that can synthesize photo-realistic novel views while preserving specific details of the input image. High-fidelity 3D GAN inversion is inherently challenging due to the geometry-texture trade-off in 3D inversion, where overfitting to a single view input image often damages the estimated geometry during the latent optimization. To solve this challenge, we propose a novel pipeline that builds on the pseudo-multi-view estimation with visibility analysis. We keep the original textures for the visible parts and utilize generative priors for the occluded parts. Extensive experiments show that our approach achieves advantageous reconstruction and novel view synthesis quality over state-of-the-art methods, even for images with out-of-distribution textures. The proposed pipeline also enables image attribute editing with the inverted latent code and 3D-aware texture modification. Our approach enables high-fidelity 3D rendering from a single image, which is promising for various applications of AI-generated 3D content.	翻訳日:2022-11-29 19:06:44 公開日:2022-11-28
# ハンドオブジェクトインタラクション画像生成 Hand-Object Interaction Image Generation ( http://arxiv.org/abs/2211.15663v1 ) ライセンス: Link先を確認	Hezhen Hu, Weilun Wang, Wengang Zhou, Houqiang Li	(参考訳) 本研究では,与えられた手,対象,およびそのインタラクション状態の下でのハンドオブジェクトイメージの条件付き生成を目的とした,新たなタスクであるハンドオブジェクトインタラクション画像生成に焦点をあてる。このタスクは、AR/VRゲームやオンラインショッピングなど、多くの潜在的なアプリケーションシナリオにおいて、挑戦的で研究に値するものです。この問題に対処するために,表現型モデル認識ハンドオブジェクト表現を利用した新しいHOGANフレームワークを提案し,その固有のトポロジを活用して統一表面空間を構築する。この空間では、相互作用中の複雑な自己と相互の閉塞を明示的に考慮する。最終的な画像合成では,手と対象の異なる特性を検討し,対象画像の分割合成を行う。評価のために,生成画像の忠実性と構造保存の両方にアクセスするための包括的なプロトコルを構築する。 HO3Dv3とDexYCBという2つの大規模データセットに対する大規模な実験は、我々のフレームワークの有効性と優位性を定量的かつ質的に実証している。プロジェクトページはhttps://play-with-hoi-generation.github.io/で閲覧できる。 In this work, we are dedicated to a new task, i.e., hand-object interaction image generation, which aims to conditionally generate the hand-object image under the given hand, object and their interaction status. This task is challenging and research-worthy in many potential application scenarios, such as AR/VR games and online shopping, etc. To address this problem, we propose a novel HOGAN framework, which utilizes the expressive model-aware hand-object representation and leverages its inherent topology to build the unified surface space. In this space, we explicitly consider the complex self- and mutual occlusion during interaction. During final image synthesis, we consider different characteristics of hand and object and generate the target image in a split-and-combine manner. For evaluation, we build a comprehensive protocol to access both the fidelity and structure preservation of the generated image. Extensive experiments on two large-scale datasets, i.e., HO3Dv3 and DexYCB, demonstrate the effectiveness and superiority of our framework both quantitatively and qualitatively. The project page is available at https://play-with-hoi-generation.github.io/.	翻訳日:2022-11-29 19:06:24 公開日:2022-11-28
# RankDNN: 少しの学習でランク付けを学ぶ RankDNN: Learning to Rank for Few-shot Learning ( http://arxiv.org/abs/2211.15320v1 ) ライセンス: Link先を確認	Qianyu Guo, Hongtong Gong, Xujun Wei, Yanwei Fu, Weifeng Ge, Yizhou Yu, Wenqiang Zhang	(参考訳) 本稿では、画像検索の関連性ランキングをバイナリランキング関係分類として活用する、新しい数ショット学習パイプラインを提案する。画像分類と比較して、ランキング関係分類は標本効率が高く、領域非依存である。さらに、少数の学習に対する新しい視点を提供し、最先端の手法を補完する。我々のディープニューラルネットワークのコアコンポーネントは単純なMLPで、2つのベクトルクロネッカー積の差分として符号化された画像三重項を入力として、バイナリ関連ランキングを出力する。提案された rankmlp は最先端の機能抽出器の上に構築することができ、我々のディープニューラルネットワーク全体を ranking deep neural network または rankdnn と呼ぶ。一方 RankDNN は他の後処理手法と柔軟に融合することができる。メタテスト中、RandDNNは、クエリサンプルと類似度に応じてサポートイメージをランク付けし、各クエリサンプルは、隣人のクラスラベルを割り当てる。実験により、rankdnnは様々なバックボーンに基づくベースラインのパフォーマンスを効果的に改善できることが示され、miniimagenet、tieredimagenet、caltech-ucsd birds、cifar-fsを含む複数のマイナショット学習ベンチマークで、以前の最先端アルゴリズムを上回っている。さらに、クロスドメインチャレンジに関する実験では、rankdnnの優れた転送性が実証されている。 This paper introduces a new few-shot learning pipeline that casts relevance ranking for image retrieval as binary ranking relation classification. In comparison to image classification, ranking relation classification is sample efficient and domain agnostic. Besides, it provides a new perspective on few-shot learning and is complementary to state-of-the-art methods. The core component of our deep neural network is a simple MLP, which takes as input an image triplet encoded as the difference between two vector-Kronecker products, and outputs a binary relevance ranking order. The proposed RankMLP can be built on top of any state-of-the-art feature extractors, and our entire deep neural network is called the ranking deep neural network, or RankDNN. Meanwhile, RankDNN can be flexibly fused with other post-processing methods. During the meta test, RankDNN ranks support images according to their similarity with the query samples, and each query sample is assigned the class label of its nearest neighbor. 
Experiments demonstrate that RankDNN can effectively improve the performance of its baselines based on a variety of backbones and it outperforms previous state-of-the-art algorithms on multiple few-shot learning benchmarks, including miniImageNet, tieredImageNet, Caltech-UCSD Birds, and CIFAR-FS. Furthermore, experiments on the cross-domain challenge demonstrate the superior transferability of RankDNN. The code is available at: https://github.com/guoqianyu-alberta/RankDNN.	翻訳日:2022-11-29 18:59:17 公開日:2022-11-28
# 学ぶこと:人間と機械を継続的に教育する方法 Learning to Learn: How to Continuously Teach Humans and Machines ( http://arxiv.org/abs/2211.15470v1 ) ライセンス: Link先を確認	Parantak Singh, You Li, Ankur Sikarwar, Weixian Lei, Daniel Gao, Morgan Bruce Talbot, Ying Sun, Mike Zheng Shou, Gabriel Kreiman, Mengmi Zhang	(参考訳) 我々の教育システムは一連のカリキュラムで構成されている。例えば、学校で数学を学ぶとき、加算から乗算へ、そして後に積分へ順に学習する。人間または機械に教えるためのカリキュラムを記述することは、初期のタスクから後のタスクへのポジティブな知識伝達を最大化し、初期のタスクの忘れを最小化するという基本的な目標を共有します。そこで我々は,アルゴリズムが連続的なデータストリームから一度に1つのクラスを学習しなければならないクラスインクリメンタルセッティングにおいて,カリキュラムが既存の継続学習アルゴリズムに与える影響を網羅的に調査した。我々は,可能な級数(カリキュラム)の幅の広い範囲において,キュリキュラは情報の保持に影響を与え,この効果は確率性の産物ではないことを見出した。さらに, 自動カリキュラム設計への取り組みとして, クラス間特徴類似度に基づいて, 効果的なカリキュラムを設計・ランク付けする手法を提案する。実測値と実測値との比較を行い,両者の間に有意な重複が認められた。カリキュラム設計者の研究を支援するために,人間心理物理学実験を行い,物体認識における新しい連続学習ベンチマークを作成した。我々は人間と機械の効果的なカリキュラムにおける合意度を評価した。驚いたことに、我々のカリキュラムデザイナーは、人間の学習に有効な最適なカリキュラムセットを予測できた。カリキュラムデザインには、タイムリーな学生のフィードバックや複数のモダリティによる学習など、多くの考慮事項がある。私たちの研究は、人間や機械に継続的な学習を教えることの課題に取り組むための、コミュニティの標準フレームワークを設定する最初の試みである。 Our education system comprises a series of curricula. For example, when we learn mathematics at school, we learn in order from addition, to multiplication, and later to integration. Delineating a curriculum for teaching either a human or a machine shares the underlying goal of maximizing the positive knowledge transfer from early to later tasks and minimizing forgetting of the early tasks. Here, we exhaustively surveyed the effect of curricula on existing continual learning algorithms in the class-incremental setting, where algorithms must learn classes one at a time from a continuous stream of data. We observed that across a breadth of possible class orders (curricula), curricula influence the retention of information and that this effect is not just a product of stochasticity. Further, as a primary effort toward automated curriculum design, we proposed a method capable of designing and ranking effective curricula based on inter-class feature similarities. 
We compared the predicted curricula against empirically determined effectual curricula and observed significant overlaps between the two. To support the study of a curriculum designer, we conducted a series of human psychophysics experiments and contributed a new Continual Learning benchmark in object recognition. We assessed the degree of agreement in effective curricula between humans and machines. Surprisingly, our curriculum designer successfully predicts an optimal set of curricula that is effective for human learning. There are many considerations in curriculum design, such as timely student feedback and learning with multiple modalities. Our study is the first attempt to set a standard framework for the community to tackle the problem of teaching humans and machines to learn to learn continuously.	翻訳日:2022-11-29 18:58:35 公開日:2022-11-28
# エッジスパース埋め込みを用いた教師なしスーパーピクセル生成 Unsupervised Superpixel Generation using Edge-Sparse Embedding ( http://arxiv.org/abs/2211.15474v1 ) ライセンス: Link先を確認	Jakob Geusen, Gustav Bredell, Tianfei Zhou, Ender Konukoglu	(参考訳) ピクセルの類似性に基づいて、画像をスーパーピクセルに分割することで、色や空間的位置などの特徴から、データの複雑さを大幅に削減し、その後の画像処理タスクを改善することができる。教師なしスーパーピクセル生成の初期アルゴリズムは、任意のものよりも重要なエッジを優先することなく、局所的なキューにのみ依存していた。一方で、教師なし深層学習に基づく最近の手法では、スーパーピクセルエッジの付着とコンパクト性の間のトレードオフを適切に解決できなかったり、生成されたスーパーピクセル数を制御できなかったりしている。非畳み込み画像デコーダでは、強い空間相関を持つランダムな画像を入力として使用することにより、期待されるコントラスト数を削減し、再構成された画像にスムーズで接続されたエッジを強制することができる。デコーダの最後の隠れ層から断片的なスムースアクティベーションマップに追加の空間情報をエンコードしてエッジスパース画素埋め込みを生成し、標準クラスタリングアルゴリズムを用いて高品質のスーパーピクセルを抽出する。提案手法はbsds500,pascal-context,顕微鏡データセットにおいて最先端の性能を実現する。 Partitioning an image into superpixels based on the similarity of pixels with respect to features such as colour or spatial location can significantly reduce data complexity and improve subsequent image processing tasks. Initial algorithms for unsupervised superpixel generation solely relied on local cues without prioritizing significant edges over arbitrary ones. On the other hand, more recent methods based on unsupervised deep learning either fail to properly address the trade-off between superpixel edge adherence and compactness or lack control over the generated number of superpixels. By using random images with strong spatial correlation as input, i.e., blurred noise images, in a non-convolutional image decoder, we can reduce the expected number of contrasts and enforce smooth, connected edges in the reconstructed image. We generate edge-sparse pixel embeddings by encoding additional spatial information into the piece-wise smooth activation maps from the decoder's last hidden layer and use a standard clustering algorithm to extract high quality superpixels. Our proposed method reaches state-of-the-art performance on the BSDS500, PASCAL-Context and a microscopy dataset.	翻訳日:2022-11-29 18:58:10 公開日:2022-11-28
# 推定時間における時間優先性を利用した物体検出におけるオブジェクトの永続性 Object Permanence in Object Detection Leveraging Temporal Priors at Inference Time ( http://arxiv.org/abs/2211.15505v1 ) ライセンス: Link先を確認	Michael F\"urst, Priyash Bhugra, Ren\'e Schuster, Didier Stricker	(参考訳) オブジェクト永続性(object permanence)は、オブジェクトが物理的世界で突然消滅しないという概念である。人間はこの概念を若いころに理解し、それが一時的に隠されているにもかかわらず、他の人がいることを知っている。現在、ニューラルネットワークはこの課題に苦戦している。そこで本研究では,粒子フィルタからインスピレーションを得た2段階検出手法を提案する。基本的には,従来のフレームの予測を,現在のフレームを推定時に追加提案として使用する。実験では、計算オーバーヘッドが少なく、最大10.3 mAPで検出性能を向上させるフィードバックループを確認する。本手法は,重閉塞下においても安定かつ信頼性の高い2段階検出装置の拡張に適している。さらに、既存のモデルを再トレーニングすることなく、このメソッドを適用できることは、現実世界のタスクで幅広いアプリケーションを実現する。 Object permanence is the concept that objects do not suddenly disappear in the physical world. Humans understand this concept at young ages and know that another person is still there, even though it is temporarily occluded. Neural networks currently often struggle with this challenge. Thus, we introduce explicit object permanence into two stage detection approaches drawing inspiration from particle filters. At the core, our detector uses the predictions of previous frames as additional proposals for the current one at inference time. Experiments confirm the feedback loop improving detection performance by a up to 10.3 mAP with little computational overhead. Our approach is suited to extend two-stage detectors for stabilized and reliable detections even under heavy occlusion. Additionally, the ability to apply our method without retraining an existing model promises wide application in real-world tasks.	翻訳日:2022-11-29 18:57:08 公開日:2022-11-28
# DQ-DETR: フレーズ抽出とグラウンド化のためのデュアルクエリ検出変換器 DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding ( http://arxiv.org/abs/2211.15516v1 ) ライセンス: Link先を確認	Shilong Liu, Yaoyuan Liang, Feng Li, Shijia Huang, Hao Zhang, Hang Su, Jun Zhu, Lei Zhang	(参考訳) 本稿では,句抽出と接地(PEG)の両方を考慮した視覚的接地の問題について検討する。以前のフレーズ-既知の設定とは対照的に、PEGはテキストからフレーズを抽出し、画像からオブジェクトを同時に見つけ出すモデルを必要とする。句抽出を1Dテキストセグメンテーション問題と見なすことができるため、PEGを二重検出問題として定式化し、オブジェクト予測とフレーズマスク予測のための画像とテキストの異なる特徴を探索するDQ-DETRモデルを提案する。各2つのクエリは、異なるコンテンツ部分ではなく、共有位置部分を持つように設計されている。このような設計は(単一のクエリ設計とは対照的に)画像とテキスト間のモダリティアライメントの難しさを効果的に軽減し、トランスフォーマーデコーダにフレーズマスクによる注意を活用させ、パフォーマンスを向上させる。 PEGの性能を評価するため,物体検出におけるAP測定値に類似した新しい測定基準CMAP(クロスモーダル平均精度)を提案する。新しいメトリックは、フレーズグラウンドで多ボックスから一フレーズのケースでRecall@1の曖昧さを克服する。その結果、PEGが事前訓練したDQ-DETRは、ResNet-101バックボーンを持つ全てのビジュアルグラウンドベンチマークに対して、新しい最先端の結果を確立する。例えば、RefCOCO testAとtestBのリコールレートで91.04\%$と83.51\%$をResNet-101バックボーンで達成している。コードは \url{https://github.com/IDEA-Research/DQ-DETR} で利用可能になる。 In this paper, we study the problem of visual grounding by considering both phrase extraction and grounding (PEG). In contrast to the previous phrase-known-at-test setting, PEG requires a model to extract phrases from text and locate objects from images simultaneously, which is a more practical setting in real applications. As phrase extraction can be regarded as a $1$D text segmentation problem, we formulate PEG as a dual detection problem and propose a novel DQ-DETR model, which introduces dual queries to probe different features from image and text for object prediction and phrase mask prediction. Each pair of dual queries is designed to have shared positional parts but different content parts. Such a design effectively alleviates the difficulty of modality alignment between image and text (in contrast to a single query design) and empowers Transformer decoder to leverage phrase mask-guided attention to improve performance. 
To evaluate the performance of PEG, we also propose a new metric, CMAP (cross-modal average precision), analogous to the AP metric in object detection. The new metric overcomes the ambiguity of Recall@1 in many-box-to-one-phrase cases in phrase grounding. As a result, our PEG pre-trained DQ-DETR establishes new state-of-the-art results on all visual grounding benchmarks with a ResNet-101 backbone. For example, it achieves $91.04\%$ and $83.51\%$ in terms of recall rate on RefCOCO testA and testB with a ResNet-101 backbone. Code will be available at \url{https://github.com/IDEA-Research/DQ-DETR}.	翻訳日:2022-11-29 18:56:57 公開日:2022-11-28
# マルチターゲットマルチカメラ車両追跡のためのグラフ畳み込みネットワーク Graph Convolutional Network for Multi-Target Multi-Camera Vehicle Tracking ( http://arxiv.org/abs/2211.15538v1 ) ライセンス: Link先を確認	Elena Luna, Juan Carlos San Miguel, José María Martínez, and Marcos Escudero-Viñolo	(参考訳) このレターはマルチターゲットマルチカメラ車両追跡のタスクに焦点を当てている。グラフ畳み込みネットワークを訓練することにより,シングルカメラの軌跡をマルチカメラのグローバル軌跡に関連付けることを提案する。当社のアプローチは,グローバルソリューションを提供するすべてのカメラを同時に処理すると同時に,大規模カメラの非同期化にも堅牢です。さらに,クラス不均衡に対処する新たな損失関数を設計する。提案手法は,比較手法と異なり,アドホックな手動アノテーションやしきい値を必要としない,より優れた一般化を示す。 This letter focuses on the task of Multi-Target Multi-Camera vehicle tracking. We propose to associate single-camera trajectories into multi-camera global trajectories by training a Graph Convolutional Network. Our approach simultaneously processes all cameras, providing a global solution, and it is also robust to large desynchronization between cameras. Furthermore, we design a new loss function to deal with class imbalance. Our proposal outperforms related work, showing better generalization without requiring ad-hoc manual annotations or thresholds, unlike the compared approaches.	翻訳日:2022-11-29 18:56:28 公開日:2022-11-28
# 幾何学的アライメントに基づくリアルタイムFewshotポートレートスティル化 Realtime Fewshot Portrait Stylization Based On Geometric Alignment ( http://arxiv.org/abs/2211.15549v1 ) ライセンス: Link先を確認	Xinrui Wang, Zhuoru Li, Xiao Zhou, Yusuke Iwasawa, Yutaka Matsuo	(参考訳) 本稿では,リアルタイムモバイルアプリケーション用にデザインしたポートレートスタイライゼーション手法を提案する。従来の学習に基づくスタイライゼーション手法では、ポートレートドメインとスタイルドメインの間の幾何学的および意味的なギャップに苦しむため、ポートレートイメージに正しく転送されるスタイル情報が妨げられ、スタイライゼーションの品質が低下する。本稿では,人間の顔属性の幾何学的前置に基づいて,幾何学的アライメントを用いてこの問題に取り組むことを提案する。まず,TPS(Thin-Plate-Spline)をジェネレータネットワーク内の特徴マップに加え,画素空間のスタイル画像に直接適用し,同一のランドマークと整列したポートレートスタイルの画像ペアを生成し,二つの領域間の幾何学的ギャップを埋める。第2に、敵対的学習は、ポートレートイメージのテクスチャと色をスタイルドメインにマッピングする。最後に、幾何認識サイクル一貫性はコンテンツとアイデンティティ情報を不変に保存し、変形不変制約はアーティファクトと歪みを抑制する。定性的かつ定量的な比較により,提案手法は既存手法よりも優れており,実験により,モバイル端末上でのリアルタイム(40FPS以上)の限られたスタイルの例(100以下)で学習できることを示した。アブレーション研究はフレームワークの各コンポーネントの有効性を示す。 This paper presents a portrait stylization method designed for real-time mobile applications with limited style examples available. Previous learning-based stylization methods suffer from the geometric and semantic gaps between the portrait domain and the style domain, which prevent style information from being correctly transferred to portrait images and lead to poor stylization quality. Based on the geometric prior of human facial attributes, we propose to utilize geometric alignment to tackle this issue. Firstly, we apply Thin-Plate-Spline (TPS) on feature maps in the generator network and also directly to style images in pixel space, generating aligned portrait-style image pairs with identical landmarks, which closes the geometric gap between the two domains. Secondly, adversarial learning maps the textures and colors of portrait images to the style domain. Finally, geometry-aware cycle consistency preserves the content and identity information unchanged, and a deformation-invariant constraint suppresses artifacts and distortions. 
Qualitative and quantitative comparisons validate that our method outperforms existing methods, and experiments prove that it can be trained with limited style examples (100 or fewer) and run in real time (more than 40 FPS) on mobile devices. An ablation study demonstrates the effectiveness of each component in the framework.	翻訳日:2022-11-29 18:56:20 公開日:2022-11-28
# VLTinT:コヒーレントビデオパラグラフキャプションのための視覚言語変換器 VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning ( http://arxiv.org/abs/2211.15103v1 ) ライセンス: Link先を確認	Kashu Yamazaki, Khoa Vo, Sang Truong, Bhiksha Raj, Ngan Le	(参考訳) ビデオパラグラフキャプションは、コヒーレントなストーリーテリングにおいて、複数の時間的イベントロケーションを持つ未トリミングビデオのマルチセンテンス記述を作成することを目的としている。視覚と言語による相互影響の下で視覚成分(例えば、人間、動物)と非視覚成分(例えば、行動、関係)に分解してシーンを効果的に理解する人間の知覚過程に従い、まず視覚言語(vl)特徴を提案する。提案したVL機能では、シーンを3つのモードでモデル化する。 (i)グローバルな視覚環境 (ii) 局所視覚メインエージェント (iii)言語シーン要素。次に,ビデオ内およびイベント間コンテンツの意味的コヒーレンスを同時に捉えるために,自己回帰トランスフォーマ(tint)を導入する。最後に,字幕のセマンティクスに適合する学習型埋め込み機能を保証するために,新たなVLコントラスト損失関数を提案する。 ActivityNet CaptionsとYouCookIIデータセットに関する包括的な実験と大規模なアブレーション研究は、提案されたVisual-Linguistic Transformer-in-Transformer (VLTinT)が、精度と多様性に関する最先端の手法よりも優れていることを示している。 Video paragraph captioning aims to generate a multi-sentence description of an untrimmed video with several temporal event locations in coherent storytelling. Following the human perception process, where the scene is effectively understood by decomposing it into visual (e.g. human, animal) and non-visual components (e.g. action, relations) under the mutual influence of vision and language, we first propose a visual-linguistic (VL) feature. In the proposed VL feature, the scene is modeled by three modalities including (i) a global visual environment; (ii) local visual main agents; (iii) linguistic scene elements. We then introduce an autoregressive Transformer-in-Transformer (TinT) to simultaneously capture the semantic coherence of intra- and inter-event contents within a video. Finally, we present a new VL contrastive loss function to guarantee that the learnt embedding features match the caption semantics. Comprehensive experiments and extensive ablation studies on the ActivityNet Captions and YouCookII datasets show that the proposed Visual-Linguistic Transformer-in-Transformer (VLTinT) outperforms prior state-of-the-art methods on accuracy and diversity.	
翻訳日:2022-11-29 18:50:41 公開日:2022-11-28
# 潜在距離学習を用いた半教師付きバイナリ分類 Semi-supervised binary classification with latent distance learning ( http://arxiv.org/abs/2211.15153v1 ) ライセンス: Link先を確認	Imam Mustafa Kamal and Hyerim Bae	(参考訳) バイナリ分類(BC)は、バイオメディカル診断における健康・不健康な物体の識別や、製造検査における欠陥・非欠陥製品など、現実的な問題においてユビキタスな実践課題である。それでも、この問題を効果的に解決するために、完全に注釈付きデータが必要であり、ドメインの専門家による収集は退屈で高価な手順である。 BCとは対照的に、確率的データ拡張技術に大きく依存するいくつかの重要な半教師付き学習技術が、マルチクラス分類の解決のために考案された。本研究では, 正と負のサンプルを厳密に区別する重要な特徴を省略できるため, 確率的データ拡張手法は典型的な bc 問題の解法には適さないことを示す。そこで本研究では,ランダムなkペア間学習機構を持つラベルを用いて,bc問題を解くための新しい学習表現を提案する。まず、いくつかのラベル付きサンプルを利用することで、エンコーダネットワークは、角空間における正と負のサンプルの投影を学習し、クラス間距離とクラス内距離を最大化し、最小化する。第2に、分類器は、角空間とラベル付きサンプルに基づいて生成されたオンザフライラベルを用いて正と負のサンプルを判別し、bcタスクを解決する。大規模な実験は4つのBCデータセットを用いて実施された。ラベルが少なく、データ拡張技術がないため、提案手法は最先端の半教師あり自己教師あり学習法より優れていた。さらに,10%のラベル付けにより,完全教師付き設定と比較して,半教師付き分類器が競争精度を得ることができた。 Binary classification (BC) is a practical task that is ubiquitous in real-world problems, such as distinguishing healthy and unhealthy objects in biomedical diagnostics and defective and non-defective products in manufacturing inspections. Nonetheless, fully annotated data are commonly required to effectively solve this problem, and their collection by domain experts is a tedious and expensive procedure. In contrast to BC, several significant semi-supervised learning techniques that heavily rely on stochastic data augmentation techniques have been devised for solving multi-class classification. In this study, we demonstrate that the stochastic data augmentation technique is less suitable for solving typical BC problems because it can omit crucial features that strictly distinguish between positive and negative samples. To address this issue, we propose a new learning representation to solve the BC problem using a few labels with a random k-pair cross-distance learning mechanism. 
First, by harnessing a few labeled samples, the encoder network learns the projection of positive and negative samples in angular spaces to maximize and minimize their inter-class and intra-class distances, respectively. Second, the classifier learns to discriminate between positive and negative samples using on-the-fly labels generated based on the angular space and labeled samples to solve BC tasks. Extensive experiments were conducted using four real-world publicly available BC datasets. With few labels and without any data augmentation techniques, the proposed method outperformed state-of-the-art semi-supervised and self-supervised learning methods. Moreover, with 10% labeling, our semi-supervised classifier could obtain competitive accuracy compared with a fully supervised setting.	翻訳日:2022-11-29 18:50:19 公開日:2022-11-28
# ロバストモデル非依存メタラーニングにおけるショット数の再考 Rethinking the Number of Shots in Robust Model-Agnostic Meta-Learning ( http://arxiv.org/abs/2211.15180v1 ) ライセンス: Link先を確認	Xiaoyue Duan, Guoliang Kang, Runqi Wang, Shumin Han, Song Xue, Tian Wang, Baochang Zhang	(参考訳) モデルに依存しないロバストなメタラーニング(maml)は、通常、少数の例しか持たない新しいクラスに素早く適応するかもしれないメタモデルのトレーニングに採用され、一方で敵の攻撃に対して頑健である。従来のロバストMAMLの解決策は、メタトレーニング段階におけるロバストネスプロモーティング正規化の導入である。このような規則化により、従来の頑健なMAML手法は、トレーニングショットの数とテストショットの数とを一致させ、最適な適応性能を達成するための典型的なMAML手法に従う。しかし、ロバスト性は大幅に改善されるが、従来の方法はクリーンな精度を犠牲にしている。本稿では,MAMLにロバストネス・プロモーティング・正規化を導入することで,クリーンなサンプル特徴の固有次元が減少し,クリーンな表現能力が低下することを示す。これは、従来の堅牢なMAMLメソッドのクリーンな精度が著しく低下する理由を説明できるかもしれない。この観察に基づいて,ロバスト性向上正規化に起因する内在的次元の損失を軽減するため,訓練ショット数の増加という単純な戦略を提案する。本手法は単純ではあるが,ロバスト性を損なうことなくMAMLのクリーンな精度を著しく向上させ,ロバストで高精度なモデルを生成する。広範な実験により,本手法は精度とロバスト性とのトレードオフをよりよく達成する上で,先行技術よりも優れていることが示された。また,本手法は,メタトレーニング中の微調整ステップ数に対する感度が低く,トレーニング効率を向上させるための微調整ステップ数が少なくなることも確認した。 Robust Model-Agnostic Meta-Learning (MAML) is usually adopted to train a meta-model which may fast adapt to novel classes with only a few exemplars and meanwhile remain robust to adversarial attacks. The conventional solution for robust MAML is to introduce robustness-promoting regularization during meta-training stage. With such a regularization, previous robust MAML methods simply follow the typical MAML practice that the number of training shots should match with the number of test shots to achieve an optimal adaptation performance. However, although the robustness can be largely improved, previous methods sacrifice clean accuracy a lot. In this paper, we observe that introducing robustness-promoting regularization into MAML reduces the intrinsic dimension of clean sample features, which results in a lower capacity of clean representations. This may explain why the clean accuracy of previous robust MAML methods drops severely. 
Based on this observation, we propose a simple strategy, i.e., increasing the number of training shots, to mitigate the loss of intrinsic dimension caused by robustness-promoting regularization. Though simple, our method remarkably improves the clean accuracy of MAML without much loss of robustness, producing a robust yet accurate model. Extensive experiments demonstrate that our method outperforms prior art in achieving a better trade-off between accuracy and robustness. In addition, we observe that our method is less sensitive to the number of fine-tuning steps during meta-training, which allows for a reduced number of fine-tuning steps to improve training efficiency.	翻訳日:2022-11-29 18:49:52 公開日:2022-11-28
# mixfairface:mixfairアダプタによる顔認識による究極の公平性の実現 MixFairFace: Towards Ultimate Fairness via MixFair Adapter in Face Recognition ( http://arxiv.org/abs/2211.15181v1 ) ライセンス: Link先を確認	Fu-En Wang, Chien-Yi Wang, Min Sun, Shang-Hong Lai	(参考訳) 顔認識では大きな進歩があったが、顔認識システムにはまだ人口統計バイアスが存在する。例えば、ある人口集団の顔認識性能が他の集団よりも低い場合が普通である。本稿では,顔認識モデルの公平性を改善するためのmixfairfaceフレームワークを提案する。まず、一般的に使用される属性ベースの公正度メトリクスは、顔認識には適さないと主張する。顔認識システムは、すべての人が近いパフォーマンスを持っている間のみ公平であると考えられる。そこで我々は,異なるアプローチの公平性を評価するための新しい評価プロトコルを提案する。人種や性別といった機密性の高い属性ラベルを必要とする従来のアプローチとは異なり、顔表現におけるアイデンティティバイアス、すなわち、機密属性ラベルを必要とせず、異なるアイデンティティ間のパフォーマンスの不一貫性に対処することを目的としている。そこで本研究では,トレーニングサンプルの同一性バイアスを判定し低減するためのmixfairアダプタを提案する。広範な実験により,当社のmixfairfaceアプローチが,すべてのベンチマークデータセットで最先端のフェアネス性能を実現することを実証した。 Although significant progress has been made in face recognition, demographic bias still exists in face recognition systems. For instance, the face recognition performance for a certain demographic group is often lower than for the others. In this paper, we propose the MixFairFace framework to improve the fairness of face recognition models. First of all, we argue that the commonly used attribute-based fairness metric is not appropriate for face recognition. A face recognition system can only be considered fair when it performs similarly for every person. Hence, we propose a new evaluation protocol to fairly evaluate the fairness performance of different approaches. Unlike previous approaches that require sensitive attribute labels such as race and gender for reducing the demographic bias, we aim at addressing the identity bias in face representation, i.e., the performance inconsistency between different identities, without the need for sensitive attribute labels. To this end, we propose the MixFair Adapter to determine and reduce the identity bias of training samples. Our extensive experiments demonstrate that our MixFairFace approach achieves state-of-the-art fairness performance on all benchmark datasets.	
翻訳日:2022-11-29 18:49:24 公開日:2022-11-28
# 共分散埋め込み型サービスとしてのメトリックラーニング Metric Learning as a Service with Covariance Embedding ( http://arxiv.org/abs/2211.15197v1 ) ライセンス: Link先を確認	Imam Mustafa Kamal, Hyerim Bae, Ling Liu	(参考訳) ディープラーニングの出現により、メトリック学習は、情報検索、オブジェクト認識、レコメンデーションシステムなど、複雑で大規模なデータセットを扱う多くの機械学習タスクで大きな人気を得ている。メトリック学習は、クラス間の類似性を最大化し、最小化する。しかし、既存のモデルは、主に分離可能な埋め込み空間を得るための距離測度に依存し、クラス間の関係を無視しながらクラス内類似性を暗黙的に最大化する。高性能なディープラーニングアプリケーションのためのサービスとしてメトリック学習を有効にするためには、クラス間の関係を賢く扱い、より高度で意味のある埋め込み空間表現を得る必要がある。本稿では,埋め込み空間におけるデータポイント間の線形関係の方向を示すために共分散を組み込んだサービス手法として,新しい計量学習を提案する。従来の計量学習とは異なり、我々の共分散埋め込み強化アプローチは、サービスとしてのメートル法学習が、類似または異種の測度を計算するためにより表現力があり、正、負、中立の関係を捉えることができる。自然, バイオメディカル, 顔画像など, さまざまなベンチマークデータセットを用いて実施した大規模な実験により, 共分散埋め込み最適化サービスとしてのモデルが, 既存のモデルよりも高品質で分離性が高く, 表現力に富んだ埋め込み表現を得ることができることを示した。 With the emergence of deep learning, metric learning has gained significant popularity in numerous machine learning tasks dealing with complex and large-scale datasets, such as information retrieval, object recognition and recommendation systems. Metric learning aims to maximize and minimize inter- and intra-class similarities. However, existing models mainly rely on distance measures to obtain a separable embedding space and implicitly maximize the intra-class similarity while neglecting the inter-class relationship. We argue that to enable metric learning as a service for high-performance deep learning applications, we should also wisely deal with inter-class relationships to obtain a more advanced and meaningful embedding space representation. In this paper, a novel metric learning is presented as a service methodology that incorporates covariance to signify the direction of the linear relationship between data points in an embedding space. Unlike conventional metric learning, our covariance-embedding-enhanced approach enables metric learning as a service to be more expressive for computing similar or dissimilar measures and can capture positive, negative, or neutral relationships. 
Extensive experiments conducted using various benchmark datasets, including natural, biomedical, and facial images, demonstrate that the proposed model as a service with covariance-embedding optimizations can obtain higher-quality, more separable, and more expressive embedding representations than existing models.	翻訳日:2022-11-29 18:49:06 公開日:2022-11-28
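As a rough illustration of the idea in the covariance-embedding abstract above, the sign of a covariance can label the relationship between two embedding vectors as positive, negative, or neutral, which a pure distance measure does not express. This is only a sketch under stated assumptions: treating the coordinates of two embedding vectors as paired samples, the names `covariance` and `relationship`, and the `tolerance` value are all introduced here and are not taken from the paper.

```python
from typing import List


def covariance(x: List[float], y: List[float]) -> float:
    """Sample covariance between two equally long embedding vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)


def relationship(x: List[float], y: List[float], tolerance: float = 1e-6) -> str:
    """Map the sign of the covariance to a relationship label (hypothetical helper)."""
    c = covariance(x, y)
    if c > tolerance:
        return "positive"
    if c < -tolerance:
        return "negative"
    return "neutral"


anchor = [1.0, 2.0, 3.0, 4.0]
print(relationship(anchor, [2.0, 4.0, 6.0, 8.0]))  # positive
print(relationship(anchor, [4.0, 3.0, 2.0, 1.0]))  # negative
print(relationship(anchor, [5.0, 5.0, 5.0, 5.0]))  # neutral
```

A Euclidean distance would only report that the second and third vectors are "far" from the anchor; the signed covariance additionally distinguishes an opposing trend from an unrelated one.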
# Meet-in-the-middle: クロスレゾリューション顔認識のためのマルチスケールアップサンプリングとマッチング Meet-in-the-middle: Multi-scale upsampling and matching for cross-resolution face recognition ( http://arxiv.org/abs/2211.15225v1 ) ライセンス: Link先を確認	Klemen Grm, Berk Kemal Özata, Vitomir Štruc, Hazım Kemal Ekenel	(参考訳) 本稿では,プロのポートレート写真からの高解像度顔画像と,セキュリティカメラからの低画質監視画像との間の大きな領域ギャップに対処することを目的とする。このような異なる情報源間のアイデンティティマッチングを確立することは、古典的な顔認証シナリオであり、現代の顔認識技術では難しい問題である。そこで本研究では,顔の超解像,解像度マッチング,マルチスケールテンプレート蓄積を組み合わせ,低品質ソースを含む長距離監視映像から顔を確実に認識する手法を提案する。提案手法は、実際の監視画像のターゲットデータセットのトレーニングや微調整を必要としない。広範な実験により,提案手法はscfaceデータセットに微調整された既存手法よりも優れることを示した。 In this paper, we aim to address the large domain gap between high-resolution face images, e.g., from professional portrait photography, and low-quality surveillance images, e.g., from security cameras. Establishing an identity match between disparate sources like this is a classical surveillance face identification scenario, which continues to be a challenging problem for modern face recognition techniques. To that end, we propose a method that combines face super-resolution, resolution matching, and multi-scale template accumulation to reliably recognize faces from long-range surveillance footage, including from low-quality sources. The proposed approach does not require training or fine-tuning on the target dataset of real surveillance images. Extensive experiments show that our proposed method is able to outperform even existing methods fine-tuned to the SCFace dataset.	翻訳日:2022-11-29 18:48:41 公開日:2022-11-28
# 医療画像の領域適応のための周波数領域と空間領域の領域ギャップ低減 Reducing Domain Gap in Frequency and Spatial domain for Cross-modality Domain Adaptation on Medical Image Segmentation ( http://arxiv.org/abs/2211.15235v1 ) ライセンス: Link先を確認	Shaolei Liu, Siqi Yin, Linhao Qu, Manning Wang	(参考訳) unsupervised domain adaptation(uda)は、ソースドメインでトレーニングされたモデルを学び、ラベルなしのターゲットドメインでうまく機能することを目的としている。医用画像セグメンテーションの分野では、既存のUDA手法は、複雑なトレーニングプロセスのため効果の低い異なる画像モダリティ間の領域ギャップに対処するために、敵対的な学習に依存している。本稿では, 周波数及び空間領域移動Uner Multi-Teacher蒸留フレームワークに基づく, 単純かつ効果的なUDA手法を提案する。周波数領域では、まず、ドメイン不変およびドメイン可変な周波数成分(DIFsとDVFs)を識別するための非サブスタンプコントゥール変換を導入し、次に、ソース領域画像のDVFをターゲット領域画像に置き換えてドメインギャップを狭めるとともに、DIFを変更しない。空間領域において,領域変動画像スタイルバイアスを低減するために,バッチモーメント更新に基づくヒストグラムマッチング戦略を提案する。 2つのクロスモーダル医療画像セグメンテーションデータセット(心,腹部)を用いた実験により,提案手法は最先端手法と比較して優れた性能を示した。 Unsupervised domain adaptation (UDA) aims to learn a model that is trained on a source domain and performs well on an unlabeled target domain. In the field of medical image segmentation, most existing UDA methods depend on adversarial learning to address the domain gap between different image modalities, which is ineffective due to its complicated training process. In this paper, we propose a simple yet effective UDA method based on frequency and spatial domain transfer under a multi-teacher distillation framework. In the frequency domain, we first introduce the non-subsampled contourlet transform for identifying domain-invariant and domain-variant frequency components (DIFs and DVFs), and then keep the DIFs unchanged while replacing the DVFs of the source domain images with those of the target domain images to narrow the domain gap. In the spatial domain, we propose a batch momentum update-based histogram matching strategy to reduce the domain-variant image style bias. Experiments on two cross-modality medical image segmentation datasets (cardiac, abdominal) show that our proposed method achieves superior performance compared to state-of-the-art methods.	翻訳日:2022-11-29 18:48:27 公開日:2022-11-28
# DeepAngle:ディープラーニングを用いた断層画像の接触角の高速計算 DeepAngle: Fast calculation of contact angles in tomography images using deep learning ( http://arxiv.org/abs/2211.15243v1 ) ライセンス: Link先を確認	Arash Rabbani, Chenhao Sun, Masoud Babaei, Vahid J. Niasar, Ryan T. Armstrong, Peyman Mostaghimi	(参考訳) deepangleは、多孔質材料のトモグラフィ画像における異なる位相の接触角を決定する機械学習ベースの手法である。 3次元の角度の測定は、角度平面に垂直な表面で行う必要があり、画像ボクセルの離散化された空間を扱う際には不正確になる可能性がある。計算集約的な解は、適応可能な格子を用いて全ての曲面の相関とベクトル化を行い、次に所望の平面内の角度を測定することである。そこで本研究では,画像から直接界面角度を推定する深層学習による迅速かつ低コストな手法を提案する。 DeepAngleは直接測定技術に対して合成画像と現実画像の両方でテストされ、計算コストを20倍に下げながらr-2乗を5～16%改善した。この高速な手法は,大規模トモグラフィーデータや時間分解画像の処理に特に応用できる。開発コードとデータセットはGitHubのオープンリポジトリ(https://www.github.com/ArashRabbani/DeepAngle)で入手できる。 DeepAngle is a machine learning-based method to determine the contact angles of different phases in the tomography images of porous materials. Measurement of angles in 3D needs to be done within the surface perpendicular to the angle planes, and it can become inaccurate when dealing with the discretized space of the image voxels. A computationally intensive solution is to correlate and vectorize all surfaces using an adaptable grid, and then measure the angles within the desired planes. In contrast, the present study provides a rapid and low-cost technique powered by deep learning to estimate the interfacial angles directly from images. DeepAngle is tested on both synthetic and realistic images against the direct measurement technique and is found to improve the r-squared by 5 to 16% while lowering the computational cost by a factor of 20. This rapid method is especially applicable to processing large tomography data and time-resolved images, which is computationally intensive. The developed code and the dataset are available at an open repository on GitHub (https://www.github.com/ArashRabbani/DeepAngle).	翻訳日:2022-11-29 18:48:03 公開日:2022-11-28
# 顔画像品質評価におけるバイアスの評価 Assessing Bias in Face Image Quality Assessment ( http://arxiv.org/abs/2211.15265v1 ) ライセンス: Link先を確認	Žiga Babnik and Vitomir Štruc	(参考訳) 顔画像品質評価(FIQA)は、サンプル品質に関する追加情報を提供することで、顔認識(FR)の性能を向上させる。 FIQA法は, 顔認識におけるサンプルの有用性を推定しようとするため, 基礎となる顔認識システムの影響を強く受けていると仮定することは妥当である。現代の顔認識システムはよく機能することが知られているが、いくつかの研究では、そのようなシステムはしばしば人口統計バイアスを伴う問題を示すことが知られている。したがって、このような問題はFIQA技術にも存在している可能性が高い。本稿では, FIQAアプローチに関連する人口統計学的バイアスについて検討するため, 様々な品質評価手法(汎用画像品質評価, 教師なし顔品質評価, 教師なし顔品質評価)と3種類の最先端FRモデルを含む総合的研究を行った。 The Balanced Faces in the Wild (BFW) データセットの解析により、考慮されたすべてのテクニックは、セックスよりも人種のバリエーションによって影響を受けていることが示された。汎用的な画像品質評価手法は,2つの要因に比較して偏見が低いが,監督的および教師なしの顔画像品質評価法はともに,(性別の)白人を好む傾向のある強い偏見を示す。さらに、人種的に偏りの少ない手法は、全体的な成績が悪化することがわかった。このことは、FIQA法における観測バイアスが、基礎となる顔認識システムとかなりの関係があることを示唆している。 Face image quality assessment (FIQA) attempts to improve face recognition (FR) performance by providing additional information about sample quality. Because FIQA methods attempt to estimate the utility of a sample for face recognition, it is reasonable to assume that these methods are heavily influenced by the underlying face recognition system. Although modern face recognition systems are known to perform well, several studies have found that such systems often exhibit problems with demographic bias. It is therefore likely that such problems are also present with FIQA techniques. To investigate the demographic biases associated with FIQA approaches, this paper presents a comprehensive study involving a variety of quality assessment methods (general-purpose image quality assessment, supervised face quality assessment, and unsupervised face quality assessment methods) and three diverse state-of-the-art FR models. Our analysis on the Balanced Faces in the Wild (BFW) dataset shows that all techniques considered are affected more by variations in race than sex. 
While the general-purpose image quality assessment methods appear to be less biased with respect to the two demographic factors considered, the supervised and unsupervised face image quality assessment methods both show strong bias with a tendency to favor white individuals (of either sex). In addition, we found that methods that are less racially biased perform worse overall. This suggests that the observed bias in FIQA methods is to a significant extent related to the underlying face recognition system.	翻訳日:2022-11-29 18:47:43 公開日:2022-11-28
# スペクトル反射率分解による非ランバート型多スペクトル光量ステレオ NeuralMPS: Non-Lambertian Multispectral Photometric Stereo via Spectral Reflectance Decomposition ( http://arxiv.org/abs/2211.15311v1 ) ライセンス: Link先を確認	Jipeng Lv, Heng Guo, Guanying Chen, Jinxiu Liang and Boxin Shi	(参考訳) マルチスペクトラルフォトメトリックステレオ(mps)は、マルチスペクトラル照明下で撮影された単発マルチスペクトラル画像からシーンの表面正常を回復することを目的としている。既存のMPS法ではランベルト反射率モデルを用いて問題を抽出できるが、現実の表面への応用は大幅に制限される。本稿では,一般の非ランベルトスペクトル反射率の下でmps問題を解決するために,ニューラルネットワークであるneuralmpsを提案する。具体的には,スペクトル反射率分解(srd)モデルを用いて,スペクトル反射率を幾何成分とスペクトル成分に分解する。この分解により、均一な材料を持つ表面のMPS問題は、未知の光強度を持つ従来の測光ステレオ(CPS)と等価であることを示す。このように、NeuralMPSは、よく研究された非ランベルト的なCPS手法を活用することで、非ランベルト的なMPS問題の難しさを軽減する。合成シーンと実世界のシーンの両方で実験を行い,本手法の有効性を実証した。 Multispectral photometric stereo (MPS) aims at recovering the surface normal of a scene from a single-shot multispectral image captured under multispectral illuminations. Existing MPS methods adopt the Lambertian reflectance model to make the problem tractable, but it greatly limits their application to real-world surfaces. In this paper, we propose a deep neural network named NeuralMPS to solve the MPS problem under general non-Lambertian spectral reflectances. Specifically, we present a spectral reflectance decomposition (SRD) model to disentangle the spectral reflectance into geometric components and spectral components. With this decomposition, we show that the MPS problem for surfaces with a uniform material is equivalent to the conventional photometric stereo (CPS) with unknown light intensities. In this way, NeuralMPS reduces the difficulty of the non-Lambertian MPS problem by leveraging the well-studied non-Lambertian CPS methods. Experiments on both synthetic and real-world scenes demonstrate the effectiveness of our method.	翻訳日:2022-11-29 18:47:19 公開日:2022-11-28
# CLIP2GAN: GANの潜在空間でテキストをブリッジする CLIP2GAN: Towards Bridging Text with the Latent Space of GANs ( http://arxiv.org/abs/2211.15045v1 ) ライセンス: Link先を確認	Yixuan Wang, Wengang Zhou, Jianmin Bao, Weilun Wang, Li Li, Houqiang Li	(参考訳) 本稿では,CLIPモデルとStyleGANを活用して,テキスト誘導画像生成に特化して,CLIP2GANという新しいフレームワークを提案する。 CLIP2GANのキーとなる考え方は、CLIPの出力特徴埋め込み空間とStyleGANの入力潜在空間をブリッジすることであり、マッピングネットワークを導入して実現している。トレーニング段階では、画像をクリップでエンコードし、出力機能を潜在コードにマップし、さらに画像の再構築に使用する。このように、マッピングネットワークは自己教師付き学習方法で最適化される。推論段階では、CLIPは画像とテキストの両方を共有機能埋め込みスペースに埋め込むことができるため、トレーニングアーキテクチャにおけるCLIPイメージエンコーダをCLIPテキストエンコーダに置き換えると同時に、以下のマッピングネットワークとStyleGANモデルを保持する。その結果、テキスト記述を柔軟に入力して画像を生成することができる。さらに、地図化されたCLIP画像機能に属性のマッピングされたテキスト機能を追加するだけで、画像に対する属性を効果的に編集できる。提案したCLIP2GANは,従来の方法に比べて優れた性能を示した。 In this work, we are dedicated to text-guided image generation and propose a novel framework, i.e., CLIP2GAN, by leveraging CLIP model and StyleGAN. The key idea of our CLIP2GAN is to bridge the output feature embedding space of CLIP and the input latent space of StyleGAN, which is realized by introducing a mapping network. In the training stage, we encode an image with CLIP and map the output feature to a latent code, which is further used to reconstruct the image. In this way, the mapping network is optimized in a self-supervised learning way. In the inference stage, since CLIP can embed both image and text into a shared feature embedding space, we replace CLIP image encoder in the training architecture with CLIP text encoder, while keeping the following mapping network as well as StyleGAN model. As a result, we can flexibly input a text description to generate an image. Moreover, by simply adding mapped text features of an attribute to a mapped CLIP image feature, we can effectively edit the attribute to the image. Extensive experiments demonstrate the superior performance of our proposed CLIP2GAN compared to previous methods.	翻訳日:2022-11-29 18:41:37 公開日:2022-11-28
# Mix and Localize: 音源のミキサー内局在化 Mix and Localize: Localizing Sound Sources in Mixtures ( http://arxiv.org/abs/2211.15058v1 ) ライセンス: Link先を確認	Xixi Hu, Ziyang Chen, Andrew Owens	(参考訳) 本稿では,複数の音源を同時に可視化する手法を提案する。このタスクは、音の混合を個々のソースにグループ化し、それらを視覚信号に関連付けるモデルを必要とする。本手法は,Jabriらのランダムウォークにヒントを得た定式化を用いて,両課題を同時に解決する。我々は、画像と分離された音がノードに対応するグラフを作成し、ランダムウォーカーに異なるモードから高い戻り確率でノード間の遷移を訓練する。この歩行の遷移確率は、モデルによって学習された視聴覚類似度指標によって決定される。実験では,複数の音の局所化に成功し,他の自己監視手法よりも優れていることを示す。プロジェクトサイト: https://hxixixh.github.io/mix-and-localize We present a method for simultaneously localizing multiple sound sources within a visual scene. This task requires a model to both group a sound mixture into individual sources, and to associate them with a visual signal. Our method jointly solves both tasks at once, using a formulation inspired by the contrastive random walk of Jabri et al. We create a graph in which images and separated sounds correspond to nodes, and train a random walker to transition between nodes from different modalities with high return probability. The transition probabilities for this walk are determined by an audio-visual similarity metric that is learned by our model. We show through experiments with musical instruments and human speech that our model can successfully localize multiple sounds, outperforming other self-supervised methods. Project site: https://hxixixh.github.io/mix-and-localize	翻訳日:2022-11-29 18:41:15 公開日:2022-11-28
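The random-walk objective described in the Mix and Localize abstract above can be illustrated with a small sketch: similarity scores become transition probabilities through a softmax, and the training signal rewards walks that leave a sound node, visit the image nodes, and return to the same sound. This is a simplified illustration and not the authors' code: the two-step audio → image → audio cycle, the `temperature` value, and the names `softmax_rows` and `return_probabilities` are assumptions made here.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax_rows(scores: Matrix, temperature: float = 0.1) -> Matrix:
    """Turn each row of similarity scores into transition probabilities."""
    result = []
    for row in scores:
        peak = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp((s - peak) / temperature) for s in row]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result


def return_probabilities(similarity: Matrix, temperature: float = 0.1) -> List[float]:
    """Probability that a walk audio -> image -> audio ends where it started.

    similarity[i][j] is a (hypothetical) learned score between separated
    sound i and image node j.
    """
    audio_to_image = softmax_rows(similarity, temperature)
    transposed = [list(col) for col in zip(*similarity)]
    image_to_audio = softmax_rows(transposed, temperature)
    n_images = len(transposed)
    return [
        sum(audio_to_image[i][j] * image_to_audio[j][i] for j in range(n_images))
        for i in range(len(similarity))
    ]


# Two separated sounds, three image nodes; sound 0 matches node 0, sound 1 matches node 2.
sim = [[0.9, 0.1, 0.0],
       [0.0, 0.2, 0.8]]
probs = return_probabilities(sim)
# Negative log return probability, averaged over sounds, as a cycle-consistency loss.
loss = -sum(math.log(p) for p in probs) / len(probs)
```

Minimizing a loss of this form pushes each sound toward image nodes that no other sound claims, which is one way to read the "high return probability" requirement in the abstract.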
# 低ショットカテゴリ一般化のための複数視点からの高密度オブジェクト記述子学習 Learning Dense Object Descriptors from Multiple Views for Low-shot Category Generalization ( http://arxiv.org/abs/2211.15059v1 ) ライセンス: Link先を確認	Stefan Stojanov, Anh Thai, Zixuan Huang, James M. Rehg	(参考訳) コンピュータビジョンのディープラーニング時代の特徴は、オブジェクト認識やセマンティックセグメンテーション、光学フロー推定、そして3dシーンの新しいビュー合成まで、タスクの特徴表現を訓練するために大規模なラベル付きデータセットをうまく利用することである。本研究では,カテゴリラベルを必要とせず,低ショットカテゴリ認識のための密な判別対象表現を学習することを目的とする。そこで本稿では,対象インスタンスの複数ビューからカテゴリや意味的オブジェクト部分ラベルを使わずにトレーニング可能な,ディープオブジェクトパッチエンコーディング(dope)を提案する。 dopeを訓練するには,被写体の視野間のピクセルレベル対応を得るために,被写体深度,前景マスク,既知のカメラへのアクセスを想定し,これを用いて自己教師あり学習タスクを定式化し,識別対象パッチを学習する。 DOPEは, 局所的マッチングを用いて, 新規カテゴリーの低ショット分類に利用でき, 教師付き学習ベースラインや自己教師型学習ベースラインと競合する。コードとデータはhttps://github.com/rehg-lab/dope_selfsup。 A hallmark of the deep learning era for computer vision is the successful use of large-scale labeled datasets to train feature representations for tasks ranging from object recognition and semantic segmentation to optical flow estimation and novel view synthesis of 3D scenes. In this work, we aim to learn dense discriminative object representations for low-shot category recognition without requiring any category labels. To this end, we propose Deep Object Patch Encodings (DOPE), which can be trained from multiple views of object instances without any category or semantic object part labels. To train DOPE, we assume access to sparse depths, foreground masks and known cameras, to obtain pixel-level correspondences between views of an object, and use this to formulate a self-supervised learning task to learn discriminative object patches. We find that DOPE can directly be used for low-shot classification of novel categories using local-part matching, and is competitive with and outperforms supervised and self-supervised learning baselines. Code and data available at https://github.com/rehg-lab/dope_selfsup.	翻訳日:2022-11-29 18:41:01 公開日:2022-11-28
# インタラクティブなビジュアル特徴検索 Interactive Visual Feature Search ( http://arxiv.org/abs/2211.15060v1 ) ライセンス: Link先を確認	Devon Ulrich and Ruth Fong	(参考訳) 畳み込みニューラルネットワーク(CNN)の動作を説明するために、多くの可視化技術が作成されているが、それらは主に限られた情報を伝える静的な図で構成されている。インタラクティブなビジュアライゼーションはより豊富な洞察を提供し、より簡単にモデルの振る舞いを探索することができるが、一般的には再利用可能なものではなく、特定のモデルに特有のものである。我々は,任意のcnnに一般化可能で,研究者のワークフローに容易に組み込むことのできる,インタラクティブなインタラクティブ可視化であるvisual feature searchを紹介する。このツールを使うと、ユーザーは画像領域をハイライトし、最もよく似たCNN機能を持つデータセットから画像を検索できる。キャッシュベースの効率的な検索実装で、大きなイメージデータセットの検索をサポートする。本手法は, 教師付き, 自己監督型, および人間編集型cnnを用いた実験により, モデル行動の異なる側面を解明する方法を示す。また、ポータブルなPythonライブラリといくつかのIPythonノートブックもリリースしています。私たちのコードはhttps://github.com/lookingglasslab/VisualFeatureSearchで参照できます。 Many visualization techniques have been created to help explain the behavior of convolutional neural networks (CNNs), but they largely consist of static diagrams that convey limited information. Interactive visualizations can provide more rich insights and allow users to more easily explore a model's behavior; however, they are typically not easily reusable and are specific to a particular model. We introduce Visual Feature Search, a novel interactive visualization that is generalizable to any CNN and can easily be incorporated into a researcher's workflow. Our tool allows a user to highlight an image region and search for images from a given dataset with the most similar CNN features. It supports searching through large image datasets with an efficient cache-based search implementation. We demonstrate how our tool elucidates different aspects of model behavior by performing experiments on supervised, self-supervised, and human-edited CNNs. We also release a portable Python library and several IPython notebooks to enable researchers to easily use our tool in their own experiments. Our code can be found at https://github.com/lookingglasslab/VisualFeatureSearch.	翻訳日:2022-11-29 18:40:27 公開日:2022-11-28
# 単眼ビデオからの高忠実度顔面アバター再構成 High-fidelity Facial Avatar Reconstruction from Monocular Video with Generative Priors ( http://arxiv.org/abs/2211.15064v1 ) ライセンス: Link先を確認	Yunpeng Bai, Yanbo Fan, Xuan Wang, Yong Zhang, Jingxiang Sun, Chun Yuan, Ying Shan	(参考訳) 単眼映像からの高忠実な顔のアバター再構成は、コンピュータグラフィックスとコンピュータビジョンにおいて重要な研究課題である。近年,Neural Radiance Field (NeRF) は目覚しいビューレンダリング結果を示しており,顔アバターの再構成も検討されている。しかし、単眼ビデオにおける複雑な顔のダイナミクスと3D情報の欠如は、忠実な顔の再構築に重大な課題をもたらす。そこで本研究では,3次元認識を用いた顔アバター再構成手法を提案する。動的モデリングのための条件付き変形場に依存する既存の作品とは異なり、3d-ganの潜在空間における局所および低次元部分空間として定式化されたパーソナライズされた生成前置法を学習することを提案する。そこで本稿では,特定の人物の顔画像の小さなセットに基づいて,パーソナライズされた生成前を効率的に構築する方法を提案する。学習後、新しいビューによるフォトリアリスティックなレンダリングが可能となり、潜在空間でナビゲーションを行うことで、顔再現を実現することができる。提案手法は,RGB画像,3DMM係数,オーディオなど,異なる駆動信号に適用可能である。既存の作品と比較して優れた新規視点合成結果と忠実に対面再現性能が得られる。 High-fidelity facial avatar reconstruction from a monocular video is a significant research problem in computer graphics and computer vision. Recently, Neural Radiance Field (NeRF) has shown impressive novel view rendering results and has been considered for facial avatar reconstruction. However, the complex facial dynamics and missing 3D information in monocular videos raise significant challenges for faithful facial reconstruction. In this work, we propose a new method for NeRF-based facial avatar reconstruction that utilizes 3D-aware generative prior. Different from existing works that depend on a conditional deformation field for dynamic modeling, we propose to learn a personalized generative prior, which is formulated as a local and low dimensional subspace in the latent space of 3D-GAN. We propose an efficient method to construct the personalized generative prior based on a small set of facial images of a given individual. After learning, it allows for photo-realistic rendering with novel views and the face reenactment can be realized by performing navigation in the latent space. 
Our proposed method is applicable to different driving signals, including RGB images, 3DMM coefficients, and audio. Compared with existing works, we obtain superior novel view synthesis results and faithful face reenactment performance.	翻訳日:2022-11-29 18:40:01 公開日:2022-11-28
# クラス不均衡セマンティックセグメンテーションのための半監督信頼度に基づくコントラスト識別 Semi-Supervised Confidence-Level-based Contrastive Discrimination for Class-Imbalanced Semantic Segmentation ( http://arxiv.org/abs/2211.15066v1 ) ライセンス: Link先を確認	Kangcheng Liu	(参考訳) データ・ハングリー課題を克服するために,クラス不均衡意味セグメンテーションタスクのための半教師ありコントラスト学習フレームワークを提案する。まず、モデルを半教師付きで動作させるため、信頼度に基づくコントラスト学習を提案し、インスタンス識別を明示的に達成し、低信頼度低品質特徴を高信頼度特徴と整合させる。さらに,クラックセグメンテーションと道路成分抽出におけるクラス不均衡の問題に取り組むため,画素レベル意味セグメンテーションにおける従来のクロスエントロピー損失に代わるデータ不均衡損失を提案した。最後に,セマンティクスセグメンテーション性能を向上させるための,効果的な多段融合ネットワークアーキテクチャを提案する。実産業用ひび割れセグメント化と道路セグメント化に関する広範囲実験により,提案手法の有効性が示された。提案手法は3.5%のラベル付きデータでも十分なセグメンテーション結果が得られる。 To overcome the data-hungry challenge, we have proposed a semi-supervised contrastive learning framework for the task of class-imbalanced semantic segmentation. First and foremost, to make the model operate in a semi-supervised manner, we proposed the confidence-level-based contrastive learning to achieve instance discrimination in an explicit manner, and make the low-confidence low-quality features align with the high-confidence counterparts. Moreover, to tackle the problem of class imbalance in crack segmentation and road components extraction, we proposed the data imbalance loss to replace the traditional cross entropy loss in pixel-level semantic segmentation. Finally, we have also proposed an effective multi-stage fusion network architecture to improve semantic segmentation performance. Extensive experiments on the real industrial crack segmentation and the road segmentation demonstrate the superior effectiveness of the proposed framework. Our proposed method can provide satisfactory segmentation results with even merely 3.5% labeled data.	翻訳日:2022-11-29 18:39:41 公開日:2022-11-28
# FeatureBooster: 軽量ニューラルネットワークによる機能記述の強化 FeatureBooster: Boosting Feature Descriptors with a Lightweight Neural Network ( http://arxiv.org/abs/2211.15069v1 ) ライセンス: Link先を確認	Xinjiang Wang, Zeyu Liu, Yu Hu, Wei Xi, Wenxian Yu, Danping Zou	(参考訳) 同じ画像内のキーポイントの記述子を改善するための軽量ネットワークを導入する。このネットワークは、元の記述子とキーポイントの幾何学的性質を入力とし、MLPベースのセルフブートステージとTransformerベースのクロスブートステージを使用して記述子を強化する。拡張ディスクリプタは、実数値かバイナリかのいずれかである。提案するネットワークは,手作り(orb, sift)と最先端学習に基づく記述子(superpoint, 等)の両方を増強し,画像マッチング, 視覚定位, 運動からの構造タスクで評価する。その結果、特に大きな照明変化や繰り返しパターンなどの困難な場合において、各タスクの性能が著しく向上することが示された。提案手法では,デスクトップgpuでは3.2ms,組込みgpuでは27msしか必要とせず,実用的なシステムに適用するには十分高速である。 We introduce a lightweight network to improve descriptors of keypoints within the same image. The network takes the original descriptors and the geometric properties of keypoints as the input, and uses an MLP-based self-boosting stage and a Transformer-based cross-boosting stage to enhance the descriptors. The enhanced descriptors can be either real-valued or binary ones. We use the proposed network to boost both hand-crafted (ORB, SIFT) and the state-of-the-art learning-based descriptors (SuperPoint, ALIKE) and evaluate them on image matching, visual localization, and structure-from-motion tasks. The results show that our method significantly improves the performance of each task, particularly in challenging cases such as large illumination changes or repetitive patterns. Our method requires only 3.2ms on desktop GPU and 27ms on embedded GPU to process 2000 features, which is fast enough to be applied to a practical system.	翻訳日:2022-11-29 18:39:24 公開日:2022-11-28
# 条件付きバッチ正規化のマルチモーダル学習における落とし穴 Pitfalls of Conditional Batch Normalization for Contextual Multi-Modal Learning ( http://arxiv.org/abs/2211.15071v1 ) ライセンス: Link先を確認	Ivaxi Sheth, Aamer Abdul Rahman, Mohammad Havaei, Samira Ebrahimi Kahou	(参考訳) 人間は感覚器官を通して複数のモダリティから学ぶ技術を完成させた。単一のモダリティにおける驚くべき予測性能にもかかわらず、ニューラルネットワークは複数のモダリティに関して人間のレベルの精度に到達できない。これは、それぞれの様相の構造が変化するため、特に難しい課題である。条件付きバッチ正規化(CBN)は、文脈的特徴を学習して深層学習タスクを支援するために提案される一般的な手法である。この技術は、畳み込みニューラルネットワークのアフィン変換を学習することにより、補助データを用いて表現力を向上させる。 CBN層を用いた性能向上にもかかわらず,我々はCBNによる補助データの導入によって得られた視覚的特徴が劣化していることを明らかにした。我々は,様々なデータセットに対するCBNネットワークの脆さを評価するための総合的な実験を行い,視覚的特徴のみからの学習が一般化に優れていることを示唆した。鳥類分類のための自然画像のcbnモデルと癌分類のための組織像を評価した。我々は,CBNネットワークが鳥類分類データセットの視覚的特徴や組織学的データセットの視覚的特徴をほとんど学習していないことを観察した。 CBNは補助データとラベル間のショートカット学習を促進する可能性がある。 Humans have perfected the art of learning from multiple modalities through sensory organs. Despite their impressive predictive performance on a single modality, neural networks cannot reach human level accuracy with respect to multiple modalities. This is a particularly challenging task due to variations in the structure of respective modalities. Conditional Batch Normalization (CBN) is a popular method that was proposed to learn contextual features to aid deep learning tasks. This technique uses auxiliary data to improve representational power by learning affine transformations for convolutional neural networks. Despite the boost in performance observed by using CBN layers, our work reveals that the visual features learned by introducing auxiliary data via CBN deteriorates. We perform comprehensive experiments to evaluate the brittleness of CBN networks to various datasets, suggesting that learning from visual features alone could often be superior for generalization. We evaluate CBN models on natural images for bird classification and histology images for cancer type classification. 
We observe that the CBN network learns close to no visual features on the bird classification dataset and partial visual features on the histology dataset. Our extensive experiments reveal that CBN may encourage shortcut learning between the auxiliary data and labels.	翻訳日:2022-11-29 18:39:06 公開日:2022-11-28
# クラス適応型ネットワーク校正 Class Adaptive Network Calibration ( http://arxiv.org/abs/2211.15088v1 ) ライセンス: Link先を確認	Bingyuan Liu, Jérôme Rony, Adrian Galdran, Jose Dolz, Ismail Ben Ayed	(参考訳) 最近の研究では、従来の精度以上のキャリブレーションは、現代のディープニューラルネットワークのトレーニングにも考慮すべきであることが示されている。学習中の誤校正に対処するために,各項の相対的寄与を制御するハイパーパラメータを用いて,学習目標の一部として異なるペナルティ関数を探索した手法がある。しかしながら、これらの手法には2つの大きな欠点がある。 1) スカラーバランスの重みは,すべてのクラスにおいて同じであり,クラス間の内在的困難や不均衡に対処する能力を妨げる。 2) バランスウェイトは適応戦略を使わずに固定され, 精度とキャリブレーションの最良の妥協点に達するのを防ぎ, 各アプリケーションに対してハイパーパラメーター探索が必要となる。そこで本研究では,深層ネットワークを校正するクラス適応ラベル平滑化(CALS)を提案する。提案手法は,制約付き最適化における確立された手法である一般拡張ラグランジアンアプローチに基づいているが,大規模クラス適応型トレーニングのための修正がいくつか導入されている。標準およびロングテール画像分類、意味セグメンテーション、テキスト分類を含む様々なベンチマークにおける総合的評価と多重比較は、提案手法の優位性を示している。コードはhttps://github.com/by-liu/CALSで公開されている。 Recent studies have revealed that, beyond conventional accuracy, calibration should also be considered for training modern deep neural networks. To address miscalibration during learning, some methods have explored different penalty functions as part of the learning objective, alongside a standard classification loss, with a hyper-parameter controlling the relative contribution of each term. Nevertheless, these methods share two major drawbacks: 1) the scalar balancing weight is the same for all classes, hindering the ability to address different intrinsic difficulties or imbalance among classes; and 2) the balancing weight is usually fixed without an adaptive strategy, which may prevent from reaching the best compromise between accuracy and calibration, and requires hyper-parameter search for each application. We propose Class Adaptive Label Smoothing (CALS) for calibrating deep networks, which allows to learn class-wise multipliers during training, yielding a powerful alternative to common label smoothing penalties. 
Our method builds on a general Augmented Lagrangian approach, a well-established technique in constrained optimization, but we introduce several modifications to tailor it for large-scale, class-adaptive training. Comprehensive evaluation and multiple comparisons on a variety of benchmarks, including standard and long-tailed image classification, semantic segmentation, and text classification, demonstrate the superiority of the proposed method. The code is available at https://github.com/by-liu/CALS.	翻訳日:2022-11-29 18:38:48 公開日:2022-11-28
# MGFN:弱スーパービジョンビデオ異常検出のためのマグニチュードコントラストGlance-and-Focusネットワーク MGFN: Magnitude-Contrastive Glance-and-Focus Network for Weakly-Supervised Video Anomaly Detection ( http://arxiv.org/abs/2211.15098v1 ) ライセンス: Link先を確認	Yingxian Chen, Zhengzhe Liu, Baoheng Zhang, Wilton Fok, Xiaojuan Qi, Yik-Chung Wu	(参考訳) 監視ビデオにおける異常検出の微妙な監視は難しい課題だ。長編ビデオに異常をローカライズする能力に欠ける既存の作品以外にも,空間的時間的情報を効率的に統合して正確な異常検出を行う新しい視点・フォーカスネットワークを提案する。さらに,異常度を表すために特徴量を用いた既存手法では,シーンの変動の影響を無視することが一般的であり,その結果,シーン間の特徴量の不整合による準最適性能が得られた。この問題に対処するため,異常検出のための特徴量の識別性を高めるために,特徴増幅機構とマグニチュードコントラスト損失を提案する。 UCF-Crime と XD-Violence の2つの大規模ベンチマークの実験結果から,本手法は最先端の手法よりも優れていることが示された。 Weakly supervised detection of anomalies in surveillance videos is a challenging task. Going beyond existing works that have deficient capabilities to localize anomalies in long videos, we propose a novel glance and focus network to effectively integrate spatial-temporal information for accurate anomaly detection. In addition, we empirically found that existing approaches that use feature magnitudes to represent the degree of anomalies typically ignore the effects of scene variations, and hence result in sub-optimal performance due to the inconsistency of feature magnitudes across scenes. To address this issue, we propose the Feature Amplification Mechanism and a Magnitude Contrastive Loss to enhance the discriminativeness of feature magnitudes for detecting anomalies. Experimental results on two large-scale benchmarks UCF-Crime and XD-Violence manifest that our method outperforms state-of-the-art approaches.	翻訳日:2022-11-29 18:38:23 公開日:2022-11-28
# デュアル情報強化マルチビュー分散グラフクラスタリング Dual Information Enhanced Multi-view Attributed Graph Clustering ( http://arxiv.org/abs/2211.14987v1 ) ライセンス: Link先を確認	Jia-Qi Lin, Man-Sheng Chen, Xi-Ran Zhu, Chang-Dong Wang, Haizhang Zhang	(参考訳) マルチビュー属性グラフクラスタリングは、属性特徴と、異なるビューからの隣接行列に基づいて、マルチビューデータを分割する重要なアプローチである。有望なクラスタリング性能を達成したグラフニューラルネットワーク(GNN)の利用が試みられている。それにもかかわらず、複数のビューに埋め込まれた固有の特定の情報に注意を払う人は少ない。一方、低レベルの表現から潜在高レベルの表現を回復することができないため、ダウンストリームクラスタリングのパフォーマンスが大幅に制限される。本稿では,これらのギャップを埋めるために,新しい2重情報強化多視点グラフクラスタリング(diagc)法を提案する。具体的には,複数視点からのコンセンサスと特定情報の探索を解消するsir(specific information reconstruction)モジュールを導入することで,gcnがより本質的な低レベル表現をキャプチャできるようにする。さらに、相互情報最大化(MIM)モジュールは、潜在高レベル表現と低レベル表現との合意を最大化し、自己監督クラスタリング(SC)モジュールの助けを借りて、高レベル表現が望ましいクラスタリング構造を満たすことを可能にする。いくつかの実世界のベンチマーク実験では、提案手法の有効性を最先端のベースラインと比較した。 Multi-view attributed graph clustering is an important approach to partition multi-view data based on the attribute feature and adjacent matrices from different views. Some attempts have been made in utilizing Graph Neural Network (GNN), which have achieved promising clustering performance. Despite this, few of them pay attention to the inherent specific information embedded in multiple views. Meanwhile, they are incapable of recovering the latent high-level representation from the low-level ones, greatly limiting the downstream clustering performance. To fill these gaps, a novel Dual Information enhanced multi-view Attributed Graph Clustering (DIAGC) method is proposed in this paper. Specifically, the proposed method introduces the Specific Information Reconstruction (SIR) module to disentangle the explorations of the consensus and specific information from multiple views, which enables GCN to capture the more essential low-level representations. 
Besides, the Mutual Information Maximization (MIM) module maximizes the agreement between the latent high-level representation and low-level ones, and enables the high-level representation to satisfy the desired clustering structure with the help of the Self-supervised Clustering (SC) module. Extensive experiments on several real-world benchmarks demonstrate the effectiveness of the proposed DIAGC method compared with the state-of-the-art baselines.	翻訳日:2022-11-29 17:56:11 公開日:2022-11-28
# Shoupa: パーキンソン病の早期診断のためのAIシステム Shoupa: An AI System for Early Diagnosis of Parkinson's Disease ( http://arxiv.org/abs/2211.15234v1 ) ライセンス: Link先を確認	Jingwei Li, Ruitian Wu, Tzu-liang Huang, Zian Pan, Ming-chun Huang	(参考訳) パーキンソン病(PD)は進行性の神経系疾患であり、高齢者を中心に580万人以上が罹患している。症状の複雑さと他の神経疾患との類似性のため、早期発見には神経科医やPD専門医の関与が必要だが、ほとんどの高齢者はそうした診察を受けられない。そこで我々は、スマートモバイルデバイスとAI技術を統合する。本稿では、運動症状と非運動症状の両方を評価する複数のタスクを組み合わせた、PD早期検出システムの枠組みを紹介する。開発したモデルにより、ユーザーが非臨床環境でPDを適時に検出し、最も重い症状を把握できるよう支援する。この結果は、PDのリハビリテーション指導や他の神経疾患の検出にさらに活用されることが期待される。 Parkinson's Disease (PD) is a progressive nervous system disorder that has affected more than 5.8 million people, especially the elderly. Due to the complexity of its symptoms and its similarity to other neurological disorders, early detection requires neurologists or PD specialists to be involved, which is not accessible to most old people. Therefore, we integrate smart mobile devices with AI technologies. In this paper, we introduce the framework of our developed PD early detection system which combines different tasks evaluating both motor and non-motor symptoms. With the developed model, we help users detect PD punctually in non-clinical settings and figure out their most severe symptoms. The results are expected to be further used for PD rehabilitation guidance and detection of other neurological disorders.	翻訳日:2022-11-29 17:55:51 公開日:2022-11-28
# 文化的に無知なAIモデルの神話 The Myth of Culturally Agnostic AI Models ( http://arxiv.org/abs/2211.15271v1 ) ライセンス: Link先を確認	Eva Cetinic	(参考訳) 本稿では,経験的文化研究の目的として,大規模視覚言語モデルの可能性について考察する。 dall-e 2とstable diffusionという2つの一般的なテキストから画像への合成モデルからの出力の比較分析に注目し,文化に無依存なaiモデルに対する努力の長所と短所について考察した。本稿では、リスク緩和と文化的特異性とのトレードオフを示す出力の記憶とバイアスの例と、文化的非依存モデルの開発における全体的な不可能性について論じる。 The paper discusses the potential of large vision-language models as objects of interest for empirical cultural studies. Focusing on the comparative analysis of outputs from two popular text-to-image synthesis models, DALL-E 2 and Stable Diffusion, the paper tries to tackle the pros and cons of striving towards culturally agnostic vs. culturally specific AI models. The paper discusses several examples of memorization and bias in generated outputs which showcase the trade-off between risk mitigation and cultural specificity, as well as the overall impossibility of developing culturally agnostic models.	翻訳日:2022-11-29 17:55:37 公開日:2022-11-28
# 会話からの低リソース個人属性予測 Low-resource Personal Attribute Prediction from Conversation ( http://arxiv.org/abs/2211.15324v1 ) ライセンス: Link先を確認	Yinan Liu and Hu Chen and Wei Shen and Jiaoyan Chen	(参考訳) 個人知識ベース(pkbs)は、パーソナライズドレコメンデーションやwebベースのチャットボットなど、幅広いアプリケーションにとって重要である。 PKBを構築する上で重要な課題は、ユーザの会話データから個人属性の知識を抽出することである。会話システムや個人属性,これらのユーザの発話のユーザ数を考えると,ユーザ毎の個人属性値のランク付けを予測することが目的である。従来の研究では、ラベル付き発話や外部データなどのリソースの相対的な数に依存することが多いが、ラベル付き発話に埋め込まれた属性知識は未利用であり、難解な個人属性を予測する能力は未だに不十分である。さらに,この課題を直接解決するために,いくつかのテキスト分類手法が利用可能であることが判明した。しかし、これらの難しい個人的属性に対してうまく機能しない。本稿では,ラベル付き発話や外部データを使用しない低リソース環境下で,発話から豊富な個人的属性知識を活用し,会話から個人的属性を予測する新しい枠組みを提案する。 PEARLは、更新された事前属性知識を用いて、両項意味情報と単語共起情報をシームレスに結合し、両項トピックモデルのギブスサンプリングプロセスを反復的に洗練する。広範な実験結果から,pearlは2つのデータセット上での会話による個人属性予測のタスクだけでなく,より一般的な弱い教師付きテキスト分類タスクを1つのデータセット上で超えていることがわかった。 Personal knowledge bases (PKBs) are crucial for a broad range of applications such as personalized recommendation and Web-based chatbots. A critical challenge to build PKBs is extracting personal attribute knowledge from users' conversation data. Given some users of a conversational system, a personal attribute and these users' utterances, our goal is to predict the ranking of the given personal attribute values for each user. Previous studies often rely on a relative number of resources such as labeled utterances and external data, yet the attribute knowledge embedded in unlabeled utterances is underutilized and their performance of predicting some difficult personal attributes is still unsatisfactory. In addition, it is found that some text classification methods could be employed to resolve this task directly. However, they also perform not well over those difficult personal attributes. In this paper, we propose a novel framework PEARL to predict personal attributes from conversations by leveraging the abundant personal attribute knowledge from utterances under a low-resource setting in which no labeled utterances or external data are utilized. 
PEARL combines the biterm semantic information with the word co-occurrence information seamlessly via employing the updated prior attribute knowledge to refine the biterm topic model's Gibbs sampling process in an iterative manner. The extensive experimental results show that PEARL outperforms all the baseline methods not only on the task of personal attribute prediction from conversations over two data sets, but also on the more general weakly supervised text classification task over one data set.	翻訳日:2022-11-29 17:55:26 公開日:2022-11-28
# 資源制約付きゴールPOMDPにおけるシールド Shielding in Resource-Constrained Goal POMDPs ( http://arxiv.org/abs/2211.15349v1 ) ライセンス: Link先を確認	Michal Ajdarów, Šimon Brlej, Petr Novotný	(参考訳) 正しく動作するために特定の資源(例えば、電池に蓄えられた電力)の供給を必要とするエージェントをモデル化する、部分観測マルコフ決定過程(POMDP)を考察する。資源はエージェントの行動によって消費され、特定の状態でのみ補充される。エージェントは、資源の枯渇を防ぎながら、ある目標に到達するための期待コストを最小化することを目指す。我々はこの問題を資源制約付きゴール最適化(RSGO)と呼ぶ。RSGO問題に対して2段階のアプローチをとる。まず、形式手法を用いて、与えられたシナリオに対するシールドを計算するアルゴリズムを設計する。シールドとは、エージェントを監視し、最終的に資源の枯渇につながりうる行動の使用を防ぐ手続きである。次に、POMDP計画のためのヒューリスティック探索アルゴリズムであるPOMCPをシールドで拡張し、RSGO問題を解くアルゴリズムを得る。このアルゴリズムを実装し、文献のベンチマークへの適用性を示す実験を示す。 We consider partially observable Markov decision processes (POMDPs) modeling an agent that needs a supply of a certain resource (e.g., electricity stored in batteries) to operate correctly. The resource is consumed by the agent's actions and can be replenished only in certain states. The agent aims to minimize the expected cost of reaching some goal while preventing resource exhaustion, a problem we call resource-constrained goal optimization (RSGO). We take a two-step approach to the RSGO problem. First, using formal methods techniques, we design an algorithm computing a shield for a given scenario: a procedure that observes the agent and prevents it from using actions that might eventually lead to resource exhaustion. Second, we augment the POMCP heuristic search algorithm for POMDP planning with our shields to obtain an algorithm solving the RSGO problem. We implement our algorithm and present experiments showing its applicability to benchmarks from the literature.	翻訳日:2022-11-29 17:55:02 公開日:2022-11-28
# マニューバ識別チャレンジによるAIを用いたマニューバ識別 AI Enabled Maneuver Identification via the Maneuver Identification Challenge ( http://arxiv.org/abs/2211.15552v1 ) ライセンス: Link先を確認	Kaira Samuel, Matthew LaRosa, Kyle McAlpin, Morgan Schaefer, Brandon Swenson, Devin Wasilefsky, Yan Wu, Dan Zhao, Jeremy Kepner	(参考訳) 人工知能(AI)は、パイロット訓練生に操縦(マニューバ)の質に関する実用的なフィードバックを提供し、低コストのシミュレーターで初期段階の訓練生が教官なしで飛行に慣れることを可能にすることで、空軍のパイロット訓練を改善する大きな可能性を秘めている。歴史的に、データ、問題記述、サンプルコードで構成されるAIチャレンジは、AIのブレークスルーを促進するうえで重要であった。空軍省・マサチューセッツ工科大学AIアクセラレーター(DAF-MIT AI Accelerator)は、実際の空軍フライトシミュレーターのデータを用いて、そのようなAIチャレンジを開発した。Maneuver IDチャレンジでは、Pilot Training Next(PTN)で実際の空軍の学生パイロットから収集された、数千件のバーチャルリアリティーシミュレーター飛行記録をまとめた。このデータセットはManeuver-ID.mit.eduで公開されており、USAFの飛行訓練データとしては初めての一般公開である。このデータセットを用いて、我々は「良い」シミュレーターデータと「悪い」シミュレーターデータを分離し、マニューバを分類・特徴付けるために様々なAI手法を適用した。これらのデータ、アルゴリズム、ソフトウェアは、飛行シミュレーター訓練のためのAIエコシステムを実現するために、他の研究者が発展させるためのモデル性能のベースラインとして公開されている。 Artificial intelligence (AI) has enormous potential to improve Air Force pilot training by providing actionable feedback to pilot trainees on the quality of their maneuvers and enabling instructor-less flying familiarization for early-stage trainees in low-cost simulators. Historically, AI challenges consisting of data, problem descriptions, and example code have been critical to fueling AI breakthroughs. The Department of the Air Force-Massachusetts Institute of Technology AI Accelerator (DAF-MIT AI Accelerator) developed such an AI challenge using real-world Air Force flight simulator data. The Maneuver ID challenge assembled thousands of virtual reality simulator flight recordings collected by actual Air Force student pilots at Pilot Training Next (PTN). This dataset has been publicly released at Maneuver-ID.mit.edu and represents the first of its kind public release of USAF flight training data. Using this dataset, we have applied a variety of AI methods to separate "good" vs "bad" simulator data and categorize and characterize maneuvers. 
These data, algorithms, and software are being released as baselines of model performance for others to build upon to enable the AI ecosystem for flight simulator training.	翻訳日:2022-11-29 17:54:37 公開日:2022-11-28
# ニューロシンボリック時空間推論 Neuro-Symbolic Spatio-Temporal Reasoning ( http://arxiv.org/abs/2211.15566v1 ) ライセンス: Link先を確認	Jae Hee Lee, Michael Sioutis, Kyra Ahrens, Marjan Alirezaie, Matthias Kerzel, Stefan Wermter	(参考訳) 空間と時間に関する知識は、物理的世界の問題を解決するために必要である: 物理的世界に位置し、オブジェクトと相互作用するaiエージェントは、しばしばオブジェクト間の位置と関係について判断する必要がある。しかし時空間的知識は物理的世界との相互作用を超えて必要であり、しばしばアナロジーやメタファー(例えば「私たちの頭の上に掛かっている脅威」)を通して抽象的な概念の世界に移される。空間的および時間的推論はユビキタスであるため、これをAIシステムに統合するためのさまざまな試みがなされている。知識表現の分野では、空間的および時間的推論は、オブジェクトとリレーションシップのモデリングと、オブジェクトとリレーションシップに関するステートメントを検証するための推論方法の開発に大きく制限されている。一方、ニューラルネットワーク研究者は、限られた推論能力を持つデータから空間関係を学習するモデルを教えようとした。これら2つのアプローチ間のギャップを相互に有益な方法で橋渡しすることで、自然言語処理、視覚的質問応答、セマンティックイメージのセグメンテーションなど、多くの複雑な実世界の問題に対処できます。本章では、ニューロシンボリックAIの観点から、この統合問題を考察する。具体的には,空間的および時間的知識に基づく論理的推論と機械学習の相乗効果を提案する。いくつかの成功したアプリケーション、残る課題、そしてこの方向に関連する評価データセットを記述することが、この貢献の主要なトピックである。 Knowledge about space and time is necessary to solve problems in the physical world: An AI agent situated in the physical world and interacting with objects often needs to reason about positions of and relations between objects; and as soon as the agent plans its actions to solve a task, it needs to consider the temporal aspect (e.g., what actions to perform over time). Spatio-temporal knowledge, however, is required beyond interacting with the physical world, and is also often transferred to the abstract world of concepts through analogies and metaphors (e.g., "a threat that is hanging over our heads"). As spatial and temporal reasoning is ubiquitous, different attempts have been made to integrate this into AI systems. In the area of knowledge representation, spatial and temporal reasoning has been largely limited to modeling objects and relations and developing reasoning methods to verify statements about objects and relations. On the other hand, neural network researchers have tried to teach models to learn spatial relations from data with limited reasoning capabilities. 
Bridging the gap between these two approaches in a mutually beneficial way could allow us to tackle many complex real-world problems, such as natural language processing, visual question answering, and semantic image segmentation. In this chapter, we view this integration problem from the perspective of Neuro-Symbolic AI. Specifically, we propose a synergy between logical reasoning and machine learning that will be grounded on spatial and temporal knowledge. Describing some successful applications, remaining challenges, and evaluation datasets pertaining to this direction is the main topic of this contribution.	翻訳日:2022-11-29 17:54:07 公開日:2022-11-28
# 医療対話における情報の自動抽出 : エキスパートシステムとラベリングへの注意 Automatically Extracting Information in Medical Dialogue: Expert System And Attention for Labelling ( http://arxiv.org/abs/2211.15544v1 ) ライセンス: Link先を確認	Xinshi Wang, Daniel Tang	(参考訳) 現代の医療において,医療対話情報抽出はますます大きな問題になりつつある。電子カルテ(EMR)から重要な情報を大量に抽出することは困難である。これまで研究者は、emrから特徴を検索するための注意に基づくモデルを提案したが、その限界は医療対話の異なるカテゴリを認識することができないことを反映していた。本稿では,新しいモデルであるExpert System and Attention for Labelling (ESAL)を提案する。我々は、専門家と事前訓練されたBERTの混合を用いて、異なるカテゴリのセマンティクスを検索し、モデルがそれらの違いを融合できるようにする。実験では, ESALを公開データセットに適用し, 実験結果から, ESALは医療情報分類の性能を大幅に向上したことが示された。 Medical dialogue information extraction is becoming an increasingly significant problem in modern medical care. It is difficult to extract key information from electronic medical records (EMRs) due to their large numbers. Previously, researchers proposed attention-based models for retrieving features from EMRs, but their limitations were reflected in their inability to recognize different categories in medical dialogues. In this paper, we propose a novel model, Expert System and Attention for Labelling (ESAL). We use mixture of experts and pre-trained BERT to retrieve the semantics of different categories, enabling the model to fuse the differences between them. In our experiment, ESAL was applied to a public dataset and the experimental results indicated that ESAL significantly improved the performance of Medical Information Classification.	翻訳日:2022-11-29 17:47:01 公開日:2022-11-28
# Unfair ToS Clause Detection に対する攻撃:Universal Adversarial Trigger を用いたケーススタディ Attack on Unfair ToS Clause Detection: A Case Study using Universal Adversarial Triggers ( http://arxiv.org/abs/2211.15556v1 ) ライセンス: Link先を確認	Shanshan Xu and Irina Broda and Rashid Haddad and Marco Negrini and Matthias Grabmair	(参考訳) 近年の研究では、自然言語処理技術が利用規約(ToS)内の不公正な条項を自動検出することで消費者保護を支援できることが示されている。本研究は、TransformerベースのToS分析システムが敵対的攻撃に対して脆弱であることを示す。我々は、普遍的な敵対的トリガーを用いて不公正条項検出器を攻撃する実験を行う。実験により、テキストへのわずかな摂動で検出性能が大幅に低下することが示された。さらに、トリガーの検出可能性を測定するため、参加者から回答の正確さと回答時間の両方を収集し、詳細な人手評価を行う。その結果、トリガーの自然さが読者を欺くうえで依然として鍵であることがわかった。 Recent work has demonstrated that natural language processing techniques can support consumer protection by automatically detecting unfair clauses in the Terms of Service (ToS) Agreement. This work demonstrates that transformer-based ToS analysis systems are vulnerable to adversarial attacks. We conduct experiments attacking an unfair-clause detector with universal adversarial triggers. Experiments show that a minor perturbation of the text can considerably reduce the detection performance. Moreover, to measure the detectability of the triggers, we conduct a detailed human evaluation study by collecting both answer accuracy and response time from the participants. The results show that the naturalness of the triggers remains key to tricking readers.	翻訳日:2022-11-29 17:46:49 公開日:2022-11-28
# スウェーデン語の基礎的な読解力評価のための質問応答ペアの自動生成 Automatically generating question-answer pairs for assessing basic reading comprehension in Swedish ( http://arxiv.org/abs/2211.15568v1 ) ライセンス: Link先を確認	Dmytro Kalpakchi and Johan Boye	(参考訳) 本稿では、Quinductor法を用いてスウェーデン語テキストから自動生成した読解問題の品質を評価する。Quinductorは、軽量でデータ駆動型だがニューラルネットワークを用いない自動質問生成(QG)手法である。評価の結果、Quinductorは実用的なQG手法であり、ニューラルネットワークに基づくQG手法に対する強力なベースラインとなることがわかった。 This paper presents an evaluation of the quality of automatically generated reading comprehension questions from Swedish text, using the Quinductor method. This method is a light-weight, data-driven but non-neural method for automatic question generation (QG). The evaluation shows that Quinductor is a viable QG method that can provide a strong baseline for neural-network-based QG methods.	翻訳日:2022-11-29 17:46:39 公開日:2022-11-28
# 大きな変化を伴う動的コミュニティ検出のための高次知識伝達 Higher-order Knowledge Transfer for Dynamic Community Detection with Great Changes ( http://arxiv.org/abs/2211.15043v1 ) ライセンス: Link先を確認	Huixin Ma, Kai Wu, Handing Wang, Jing Liu	(参考訳) ネットワーク構造は現実の時間とともに進化し,動的ネットワークにおけるコミュニティの変化の発見は,課題を提起する重要な研究課題である。既存のほとんどのメソッドは、ネットワークに大きな変化は起こらないと仮定している。しかし、通常、現実世界には大きな変化がある。ネットワークの大幅な変更により、コミュニティ検出アルゴリズムは以前のスナップショットから貴重な情報を得るのが難しくなり、次のステップでは負の転送が行われる。本稿では、過去のスナップショットから高次知識を統合することで、大幅な変化を伴う動的なコミュニティ検出に焦点を当てた。さらに、検索効率を向上させるために、スナップショットの隣接行列の類似性を検出することにより、一階知識と高階知識を判定する高階知識転送戦略を考案する。このように、我々の提案は、過去のコミュニティ検出結果の利点をよりよく保ち、それらを次のタスクに移すことができる。我々は4つの実世界のネットワークで実験を行い、大きな変更や小さな変更を加えたネットワークを含む。低相似性データセットにおける実験結果は、ネットワークが著しく変化しても一階の知識よりも高階の知識の方が価値があることを示し、高相似性データセットを扱う場合でも利点を保っていることを示している。我々の提案は、大きな変化を伴う他の動的最適化問題を導くこともできる。 Network structure evolves with time in the real world, and the discovery of changing communities in dynamic networks is an important research topic that poses challenging tasks. Most existing methods assume that no significant change in the network occurs; namely, the difference between adjacent snapshots is slight. However, great change exists in the real world usually. The great change in the network will result in the community detection algorithms are difficulty obtaining valuable information from the previous snapshot, leading to negative transfer for the next time steps. This paper focuses on dynamic community detection with substantial changes by integrating higher-order knowledge from the previous snapshots to aid the subsequent snapshots. Moreover, to improve search efficiency, a higher-order knowledge transfer strategy is designed to determine first-order and higher-order knowledge by detecting the similarity of the adjacency matrix of snapshots. In this way, our proposal can better keep the advantages of previous community detection results and transfer them to the next task. 
We conduct the experiments on four real-world networks, including the networks with great or minor changes. Experimental results in the low-similarity datasets demonstrate that higher-order knowledge is more valuable than first-order knowledge when the network changes significantly and keeps the advantage even if handling the high-similarity datasets. Our proposal can also guide other dynamic optimization problems with great changes.	翻訳日:2022-11-29 17:46:33 公開日:2022-11-28
# SN Pシステムとその構成グラフの性質 Properties of SN P system and its Configuration Graph ( http://arxiv.org/abs/2211.15159v1 ) ライセンス: Link先を確認	Henry N. Adorna	(参考訳) SN Pシステムとその変種については、文献でいくつかの研究が報告されている。多くの場合、その結果は各種の変種の万能性と、これらの変種が生成・認識する言語のクラスを示すものである。SN Pシステムの状態とはその構成である。構成の到達可能性に関する我々の以前の結果を、SN Pシステムの基本状態方程式と呼ぶ。本稿では、主にこの基本状態方程式に依存する、遅延のないSN Pシステムの振る舞い特性と構造特性について予備的な検討を行う。また、遅延のないSN Pシステム $\Pi$ の構成グラフ $CG_{\Pi}$ という考え方を導入し、$CG_{\Pi}$ に関して $\Pi$ の振る舞い特性を特徴付ける。遅延のないSN Pシステム $\Pi$ の行列 $M_{\Pi}$ は、$\Pi$ の構造特性を特徴付けるために用いられる。 Several studies have been reported in the literature about SN P system and its variants. Often, the results provide universality of various variants and the classes of languages that these variants generate and recognize. The state of SN P system is its configuration. We refer to our previous result on reachability of configuration as the Fundamental state equation for SN P system. This paper provides a preliminary investigation on the behavioral and structural properties of SN P system without delay that depend primarily on this fundamental state equation. Also, we introduce the idea of configuration graph $CG_{\Pi}$ of an SN P system $\Pi$ without delay to characterize behavioral properties of $\Pi$ with respect to $CG_{\Pi}$. The matrix $M_{\Pi}$ of an SN P system $\Pi$ without delay is used to characterize structural properties of $\Pi$.	翻訳日:2022-11-29 17:46:10 公開日:2022-11-28
# プロンプトベース学習によるキーポイントマッピングへの議論 Arguments to Key Points Mapping with Prompt-based Learning ( http://arxiv.org/abs/2211.14995v1 ) ライセンス: Link先を確認	Ahnaf Mozib Samin, Behrooz Nikandish, Jingyan Chen	(参考訳) 大量の情報を効率的に処理し、消化することは、現代社会における長期的な需要である。キーポイント(必須情報を取り込む短いテキスト要約とフィルタリング冗長性)を多くの引数/オピニオンにマップするソリューションが最近提供されている(bar-haim et al., 2020)。本稿では,引数対キーポイントマッピングタスクの全体像を補完するために,主に2つのアプローチを提案する。最初のアプローチは、事前学習言語モデル(plm)の微調整にプロンプトエンジニアリングを組み込むことである。第2のアプローチは、PLMにおけるプロンプトベースの学習を利用して中間テキストを生成し、元の引数キーポイントペアと組み合わせて、クラス化子に入力として入力し、それらをマッピングする。さらに,実験をクロス/イン・ドメインに拡張し,詳細な分析を行う。私たちの評価では一より直接的な方法による即効的な工学の使用(アプローチ1)は、有望な結果をもたらし、性能を改善することができる。二アプローチ2は、PLMの否定問題により、アプローチ1より著しく悪化する。 Handling and digesting a huge amount of information in an efficient manner has been a long-term demand in modern society. Some solutions to map key points (short textual summaries capturing essential information and filtering redundancies) to a large number of arguments/opinions have been provided recently (Bar-Haim et al., 2020). To complement the full picture of the argument-to-keypoint mapping task, we mainly propose two approaches in this paper. The first approach is to incorporate prompt engineering for fine-tuning the pre-trained language models (PLMs). The second approach utilizes prompt-based learning in PLMs to generate intermediary texts, which are then combined with the original argument-keypoint pairs and fed as inputs to a classifier, thereby mapping them. Furthermore, we extend the experiments to cross/in-domain to conduct an in-depth analysis. In our evaluation, we find that i) using prompt engineering in a more direct way (Approach 1) can yield promising results and improve the performance; ii) Approach 2 performs considerably worse than Approach 1 due to the negation issue of the PLM.	翻訳日:2022-11-29 17:38:37 公開日:2022-11-28
# stage: アスペクト感情三重項抽出のためのスパンタグとグリーディ推論法 STAGE: Span Tagging and Greedy Inference Scheme for Aspect Sentiment Triplet Extraction ( http://arxiv.org/abs/2211.15003v1 ) ライセンス: Link先を確認	Shuo Liang, Wei Wei, Xian-Ling Mao, Yuanyuan Fu, Rui Fang, Dangyang Chen	(参考訳) Aspect Sentiment Triplet extract (ASTE) は感情分析研究において新たな課題となり、ある文からアスペクト項とその対応する意見項とその関連する感情極性を抽出することを目指している。近年、異なるタグ付けスキームを持つ多くのニューラルネットワークベースのモデルが提案されているが、ほとんどすべてのモデルには制限がある。 1) 各単語が1つの役割(アスペクト項や意見項など)にのみ関連しているという事前仮定 2) 単語レベルの相互作用と各意見/アスペクトを独立した単語の集合として扱う。したがって、複数の役割に関連する単語や複数の単語を持つアスペクト/オピニオン項など、複雑なasteタスクではパフォーマンスが低下する。そこで我々は,Span TAgging と Greedy infErence (STAGE) という新たなアプローチを提案し,複数の単語から構成され,同時に異なる役割を演じることができる。そこで本稿では,ASTEタスクを多クラススパン分類問題として定式化する。具体的には、スパンレベルの情報と制約、すなわちスパンタグスキームとグリーディ推論戦略の2つのコンポーネントを探索することで、より正確なアスペクト感情三重項抽出を生成する。前者のタグは、新しく定義されたタグセットに基づいて、可能な候補すべてにまたがる。後者は、候補感情スニペットから最大長のアスペクト/オピニオン項を取得し、感情三重項を出力する。さらに,このステージに基づく簡易かつ効果的なモデルを提案する。これは4つの広く使用されているデータセットにおいて,最先端を大きなマージンで上回っている。さらに,STAGE を他のペア/トリップレット抽出タスクに簡単に一般化することができ,提案方式の STAGE の優位性を示す。 Aspect Sentiment Triplet Extraction (ASTE) has become an emerging task in sentiment analysis research, aiming to extract triplets of the aspect term, its corresponding opinion term, and its associated sentiment polarity from a given sentence. Recently, many neural networks based models with different tagging schemes have been proposed, but almost all of them have their limitations: heavily relying on 1) prior assumption that each word is only associated with a single role (e.g., aspect term, or opinion term, etc. ) and 2) word-level interactions and treating each opinion/aspect as a set of independent words. Hence, they perform poorly on the complex ASTE task, such as a word associated with multiple roles or an aspect/opinion term with multiple words. Hence, we propose a novel approach, Span TAgging and Greedy infErence (STAGE), to extract sentiment triplets in span-level, where each span may consist of multiple words and play different roles simultaneously. 
To this end, this paper formulates the ASTE task as a multi-class span classification problem. Specifically, STAGE generates more accurate aspect sentiment triplet extractions by exploring span-level information and constraints, and consists of two components, namely, a span tagging scheme and a greedy inference strategy. The former tags all possible candidate spans based on a newly defined tagging set. The latter retrieves the aspect/opinion term with the maximum length from the candidate sentiment snippet to output sentiment triplets. Furthermore, we propose a simple but effective model based on STAGE, which outperforms the state of the art by a large margin on four widely used datasets. Moreover, STAGE can be easily generalized to other pair/triplet extraction tasks, which also demonstrates the superiority of the proposed scheme.	翻訳日:2022-11-29 17:38:20 公開日:2022-11-28
# WMT22チャット翻訳タスクのためのBJTU-WeChatのシステム BJTU-WeChat's Systems for the WMT22 Chat Translation Task ( http://arxiv.org/abs/2211.15009v1 ) ライセンス: Link先を確認	Yunlong Liang, Fandong Meng, Jinan Xu, Yufeng Chen, Jie Zhou	(参考訳) 本稿では,WMT'22チャット翻訳タスクに対して,北京地東大学とWeChat AIを共同で提案する。 Transformerに基づいて、いくつかの有効な変種を適用する。実験では,事前学習型微調整パラダイムを用いた。最初の事前学習段階では、データフィルタリングと合成データ生成(バックトランスレーション、フォワードトランスレーション、知識蒸留)を用いる。第2のファインチューニング段階では、話者対応のドメイン内データ生成、話者適応、プロンプトベースコンテキストモデリング、ターゲットデノイング微調整、自己圧縮型モデルアンサンブルについて検討する。本システムは0.810と0.946のCOMETスコアを得る。英語とドイツ語のCOMETスコアは、全ての応募の中で最高である。 This paper introduces the joint submission of the Beijing Jiaotong University and WeChat AI to the WMT'22 chat translation task for English-German. Based on the Transformer, we apply several effective variants. In our experiments, we utilize the pre-training-then-fine-tuning paradigm. In the first pre-training stage, we employ data filtering and synthetic data generation (i.e., back-translation, forward-translation, and knowledge distillation). In the second fine-tuning stage, we investigate speaker-aware in-domain data generation, speaker adaptation, prompt-based context modeling, target denoising fine-tuning, and boosted self-COMET-based model ensemble. Our systems achieve 0.810 and 0.946 COMET scores. The COMET scores of English-German and German-English are the highest among all submissions.	翻訳日:2022-11-29 17:37:48 公開日:2022-11-28
# 夏:WMT22バイオメディカル翻訳タスクのためのWeChatニューラル機械翻訳システム Summer: WeChat Neural Machine Translation Systems for the WMT22 Biomedical Translation Task ( http://arxiv.org/abs/2211.15022v1 ) ライセンス: Link先を確認	Ernan Li, Fandong Meng and Jie Zhou	(参考訳) 本稿では,WeChatのWMT 2022への参加について紹介する。我々のシステムはトランスフォーマに基づいており、いくつかの異なるトランスフォーマ構造を使用して翻訳の質を向上させる。実験では,データフィルタリング,データ生成,トランスフォーマーのいくつかの変種,微調整,モデルアンサンブルを用いた。われわれの中国の$\to$EnglishシステムはSummerと名付けられ、全応募中で最も高いBLEUスコアを達成している。 This paper introduces WeChat's participation in WMT 2022 shared biomedical translation task on Chinese to English. Our systems are based on the Transformer, and use several different Transformer structures to improve the quality of translation. In our experiments, we employ data filtering, data generation, several variants of Transformer, fine-tuning and model ensemble. Our Chinese$\to$English system, named Summer, achieves the highest BLEU score among all submissions.	翻訳日:2022-11-29 17:37:35 公開日:2022-11-28
# DiffusionBERT: 拡散モデルによる生成的マスク言語モデルの改善 DiffusionBERT: Improving Generative Masked Language Models with Diffusion Models ( http://arxiv.org/abs/2211.15029v1 ) ライセンス: Link先を確認	Zhengfu He, Tianxiang Sun, Kuanning Wang, Xuanjing Huang, Xipeng Qiu	(参考訳) 離散拡散モデルに基づく新しい生成マスク付き言語モデルであるDiffusionBERTを提案する。拡散モデルと多くの事前訓練された言語モデルは共通の訓練目標、すなわち2つの強力なモデルを組み合わせ、両方の世界の最高のものを楽しむことができる。一方、拡散モデルは、生成品質を改善するための有望なトレーニング戦略を提供する。一方、事前訓練された言語モデル(例えばBERT)は収束を加速する優れた初期化として使用できる。我々は,離散拡散過程の逆過程を吸収状態で学習し,それを改善するためにいくつかの設計を解明するためにBERTを訓練する。まず,各ステップに付加される雑音の度合いを,各トークンの情報に基づいて制御する前方拡散プロセスのための新しいノイズスケジュールを提案する。次に,時間ステップをBERTに組み込む設計について検討する。非条件テキスト生成の実験では、DiffusionBERTはテキストの既存の拡散モデル(例えば、D3PMとDiffusion-LM)や、パープレキシティとBLEUスコアの点で、以前の生成的マスキング言語モデルよりも大幅に改善されている。 We present DiffusionBERT, a new generative masked language model based on discrete diffusion models. Diffusion models and many pre-trained language models have a shared training objective, i.e., denoising, making it possible to combine the two powerful models and enjoy the best of both worlds. On the one hand, diffusion models offer a promising training strategy that helps improve the generation quality. On the other hand, pre-trained denoising language models (e.g., BERT) can be used as a good initialization that accelerates convergence. We explore training BERT to learn the reverse process of a discrete diffusion process with an absorbing state and elucidate several designs to improve it. First, we propose a new noise schedule for the forward diffusion process that controls the degree of noise added at each step based on the information of each token. Second, we investigate several designs of incorporating the time step into BERT. Experiments on unconditional text generation demonstrate that DiffusionBERT achieves significant improvement over existing diffusion models for text (e.g., D3PM and Diffusion-LM) and previous generative masked language models in terms of perplexity and BLEU score.	翻訳日:2022-11-29 17:37:27 公開日:2022-11-28
# songrewriter: コントロール可能なコンテンツとrhymeスキームを備えた中国の歌の書き直しシステム SongRewriter: A Chinese Song Rewriting System with Controllable Content and Rhyme Scheme ( http://arxiv.org/abs/2211.15037v1 ) ライセンス: Link先を確認	Yusen Sun, Liangyou Li, Qun Liu and Dit-Yan Yeung	(参考訳) 近年,歌詞生成は顕著な進歩を遂げているが,互換性のある旋律を作成せずには歌詞を演奏できないため,実用的応用は限られている。そこで本研究では,生成した歌詞が既存の旋律のリズムと適合し,歌えるように,既存の歌の歌詞を書き換える歌書き換えシステムを提案することで,この実用的ギャップを解消する。特に,メロディ構成の事前知識を必要とせず,ユーザを支援する制御可能な中国語歌詞生成・編集システムであるsongrewriterを提案する。システムはランダム化されたマルチレベルマスキング戦略によって訓練され、完全に新しい歌詞を生成したり、いくつかの断片を編集するための統一モデルを生成する。生成プロセスの制御能力を向上させるために、コンテンツの語彙選択を制御するキーワードプロンプトを更に取り入れ、フレキシブルエンドおよび内部リズムスキームを実現するための新しい復号制約と母音モデリングタスクを提案する。先行韻律はラップ歌詞を主目的とするが,新たに3つの韻律評価指標を提案する。自動評価と人間評価の両方により,提案モデルが,内容と韻律品質の両方において,最先端モデルよりも優れた性能を示す。 MindSpore Liteツールで実装されたコードとモデルが利用可能になります。 Although lyrics generation has achieved significant progress in recent years, it has limited practical applications because the generated lyrics cannot be performed without composing compatible melodies. In this work, we bridge this practical gap by proposing a song rewriting system which rewrites the lyrics of an existing song such that the generated lyrics are compatible with the rhythm of the existing melody and thus singable. In particular, we propose SongRewriter, a controllable Chinese lyric generation and editing system which assists users without prior knowledge of melody composition. The system is trained by a randomized multi-level masking strategy which produces a unified model for generating entirely new lyrics or editing a few fragments. To improve the controllabiliy of the generation process, we further incorporate a keyword prompt to control the lexical choices of the content and propose novel decoding constraints and a vowel modeling task to enable flexible end and internal rhyme schemes. While prior rhyming metrics are mainly for rap lyrics, we propose three novel rhyming evaluation metrics for song lyrics. 
Both automatic and human evaluations show that the proposed model performs better than the state-of-the-art models in both content and rhyming quality. Our code and models, implemented with the MindSpore Lite tool, will be made available.	翻訳日:2022-11-29 17:37:08 公開日:2022-11-28
# 超大語彙を持つ大規模事前学習モデル:ヘブライ語のBERTモデルの対比分析と、その全てを上回る新しいモデル Large Pre-Trained Models with Extra-Large Vocabularies: A Contrastive Analysis of Hebrew BERT Models and a New One to Outperform Them All ( http://arxiv.org/abs/2211.15199v1 ) ライセンス: Link先を確認	Eylon Guetta, Avi Shmidman, Shaltiel Shmidman, Cheyn Shmuel Shmidman, Joshua Guedalia, Moshe Koppel, Dan Bareket, Amit Seker, Reut Tsarfaty	(参考訳) 我々は,従来のヘブライ語plmよりもはるかに大きな語彙(128k項目)を用いた現代ヘブライ語のための新しい事前学習言語モデル(plm)を提案する。我々は,従来のヘブライ語 PLM (mBERT, heBERT, AlephBERT) に対して,このモデルを対照的に解析し,より大きな語彙がタスク性能に与える影響を評価する。実験の結果、より大きな語彙は分割を減らし、分割を減らすことは、異なるタスクをまたいだモデルの性能向上に役立つことがわかった。すべての新しいモデルにおいて、Morphological Segmentation、POS Tagging、Full Morphological Analysis、NER、Sentiment Analysisを含むすべてのHebrewベンチマークで新しいSOTAを実現している。その後、レイヤ数やトレーニングデータだけでなく、その語彙の観点からも大きなplmを提唱します。制限のない使用のために、新しいモデルを公開しています。 We present a new pre-trained language model (PLM) for modern Hebrew, termed AlephBERTGimmel, which employs a much larger vocabulary (128K items) than standard Hebrew PLMs before. We perform a contrastive analysis of this model against all previous Hebrew PLMs (mBERT, heBERT, AlephBERT) and assess the effects of larger vocabularies on task performance. Our experiments show that larger vocabularies lead to fewer splits, and that reducing splits is better for model performance, across different tasks. All in all this new model achieves new SOTA on all available Hebrew benchmarks, including Morphological Segmentation, POS Tagging, Full Morphological Analysis, NER, and Sentiment Analysis. Subsequently we advocate for PLMs that are larger not only in terms of number of layers or training data, but also in terms of their vocabulary. We release the new model publicly for unrestricted use.	翻訳日:2022-11-29 17:36:48 公開日:2022-11-28
# HERDPhobia:ナイジェリアのフラーニに対するヘイトスピーチのデータセット HERDPhobia: A Dataset for Hate Speech against Fulani in Nigeria ( http://arxiv.org/abs/2211.15262v1 ) ライセンス: Link先を確認	Saminu Mohammad Aliyu, Gregory Maksha Wajiga, Muhammad Murtala, Shamsuddeen Hassan Muhammad, Idris Abdulmumin, Ibrahim Said Ahmad	(参考訳) ソーシャルメディアプラットフォームは、ユーザーが問題や自分が望むものについて自由に意見を共有できるようにする。しかし、憎しみや虐待的なコンテンツを広めるのも容易だ。フラーニ族はこの不幸な現象の犠牲者となっている。本稿では,ナイジェリアのフラーニ牧草地における最初の注釈付きヘイトスピーチデータセットであるHERDPhobiaについて,英語,ナイジェリア・ピジン,ハウサの3言語で紹介する。我々は,事前学習した言語モデルを用いて,ツイートを憎悪か非憎悪かのいずれかに分類するベンチマーク実験を行う。我々の実験によると、XML-Tモデルは99.83%の重み付きF1でより良いパフォーマンスを提供する。さらなる研究のために、データセットをhttps://github.com/hausanlp/HERDPhobiaでリリースしました。 Social media platforms allow users to freely share their opinions about issues or anything they feel like. However, they also make it easier to spread hateful and abusive content. The Fulani ethnic group has been the victim of this unfortunate phenomenon. This paper introduces HERDPhobia, the first annotated hate speech dataset on Fulani herders in Nigeria, in three languages: English, Nigerian-Pidgin, and Hausa. We present a benchmark experiment using pre-trained language models to classify the tweets as either hateful or non-hateful. Our experiment shows that the XML-T model provides better performance with 99.83% weighted F1. We released the dataset at https://github.com/hausanlp/HERDPhobia for further research.	翻訳日:2022-11-29 17:36:29 公開日:2022-11-28
# 2パスカスケードエンコーダASRモデルにおけるE2Eセグメンテーション E2E Segmentation in a Two-Pass Cascaded Encoder ASR Model ( http://arxiv.org/abs/2211.15432v1 ) ライセンス: Link先を確認	W. Ronny Huang, Shuo-Yiin Chang, Tara N. Sainath, Yanzhang He, David Rybach, Robert David, Rohit Prabhavalkar, Cyril Allauzen, Cal Peyser, Trevor D. Strohman	(参考訳) 2パスのカスケードエンコーダASRとニューラルセグメンタを1つのモデルに統合することを検討する。重要な課題は、セグメンタ(デコーダと同期してリアルタイムに実行される)が、推論中にユーザの認識したレイテンシや削除エラーを発生させることなく、第2パス(リアルタイムに900msの後方で動作する)をファイナライズできるようにすることである。本稿では,ニューラルセグメンタを1stパスデコーダと統合して終端信号(EOS)をリアルタイムに出力する設計を提案する。 EOS信号は、非因果性第2パスのファイナライズに使用される。第2パスをファイナライズする方法を試作し,新しいダミーフレームインジェクション戦略により,高品質な第2パスと低ファイナライズ遅延を同時に実現できることを確認した。実世界の長文キャプションタスク(YouTube)では、2.4%の相対的なWERと140ミリ秒のEOSレイテンシを、同じカスケードエンコーダを持つベースラインのVADベースのセグメンタで達成している。 We explore unifying a neural segmenter with two-pass cascaded encoder ASR into a single model. A key challenge is allowing the segmenter (which runs in real-time, synchronously with the decoder) to finalize the 2nd pass (which runs 900 ms behind real-time) without introducing user-perceived latency or deletion errors during inference. We propose a design where the neural segmenter is integrated with the causal 1st pass decoder to emit an end-of-segment (EOS) signal in real-time. The EOS signal is then used to finalize the non-causal 2nd pass. We experiment with different ways to finalize the 2nd pass, and find that a novel dummy frame injection strategy allows for simultaneous high quality 2nd pass results and low finalization latency. On a real-world long-form captioning task (YouTube), we achieve 2.4% relative WER and 140 ms EOS latency gains over a baseline VAD-based segmenter with the same cascaded encoder.	翻訳日:2022-11-29 17:36:04 公開日:2022-11-28
# adatask:マルチタスク学習のためのタスク認識適応学習率アプローチ AdaTask: A Task-aware Adaptive Learning Rate Approach to Multi-task Learning ( http://arxiv.org/abs/2211.15055v1 ) ライセンス: Link先を確認	Enneng Yang, Junwei Pan, Ximei Wang, Haibin Yu, Li Shen, Xihua Chen, Lei Xiao, Jie Jiang, Guibing Guo	(参考訳) マルチタスク学習(MTL)モデルは、コンピュータビジョン、自然言語処理、レコメンダシステムにおいて印象的な結果を示している。多くのアプローチが提案されているが、それぞれのパラメータでどのように異なるタスクをバランスさせるかはまだ不明である。本稿では,このパラメータ上の各タスクの総更新によって,パラメータのタスク支配度を測定することを提案する。具体的には、対応するタスクからパラメータの2乗更新(au)を指数関数的に減少させる平均値で総更新を計算する。この新しいメトリックに基づいて、既存のmtlメソッドの多くのパラメータ、特に高い共有層におけるパラメータが、1つまたは複数のタスクで支配されていることを観測する。 AUの優位は、主に1つまたは複数のタスクからの累積勾配の優位性に起因する。そこで本研究では,適応学習率のアプローチにおいて,各パラメータに対する各タスクの学習率を<emph{accumulative gradients}>と分離するタスク単位適応学習率アプローチ adatask を提案する。コンピュータビジョンとレコメンダシステムMTLデータセットに関する総合的な実験は、AdaTaskが支配的なタスクのパフォーマンスを大幅に改善し、SOTAの平均タスク性能が向上することを示した。合成データと実世界のデータセットの両方の分析は、共有層ごとにadatask balanceパラメータをよく示している。 Multi-task learning (MTL) models have demonstrated impressive results in computer vision, natural language processing, and recommender systems. Even though many approaches have been proposed, how well these approaches balance different tasks on each parameter still remains unclear. In this paper, we propose to measure the task dominance degree of a parameter by the total updates of each task on this parameter. Specifically, we compute the total updates by the exponentially decaying Average of the squared Updates (AU) on a parameter from the corresponding task.Based on this novel metric, we observe that many parameters in existing MTL methods, especially those in the higher shared layers, are still dominated by one or several tasks. The dominance of AU is mainly due to the dominance of accumulative gradients from one or several tasks. 
Motivated by this, we propose a Task-wise Adaptive learning rate approach, AdaTask in short, to separate the accumulative gradients and hence the learning rate of each task for each parameter in adaptive learning rate approaches (e.g., AdaGrad, RMSProp, and Adam). Comprehensive experiments on computer vision and recommender system MTL datasets demonstrate that AdaTask significantly improves the performance of dominated tasks, resulting in SOTA average task-wise performance. Analysis on both synthetic and real-world datasets shows that AdaTask balances parameters in every shared layer well.	翻訳日:2022-11-29 17:21:09 公開日:2022-11-28
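A minimal sketch of the AdaTask idea in the abstract above, assuming an RMSProp-style optimizer: instead of one running average of squared gradients per shared parameter, each task keeps its own accumulator, so each task's gradient is scaled by its own history. The function names, hyperparameter defaults, and the exact update rule are illustrative assumptions, not the authors' implementation.

```python
import math


def adatask_rmsprop_step(theta, task_grads, accumulators,
                         lr=0.01, beta=0.9, eps=1e-8):
    """One update of shared parameters with a separate accumulator per task.

    theta        -- list of floats (shared parameters)
    task_grads   -- one gradient list per task, each the same length as theta
    accumulators -- one running average of squared gradients per task
                    (updated in place)
    """
    new_theta = list(theta)
    for k, grad in enumerate(task_grads):
        acc = accumulators[k]
        for i, g in enumerate(grad):
            # Task k only ever sees its own squared-gradient history.
            acc[i] = beta * acc[i] + (1.0 - beta) * g * g
            new_theta[i] -= lr * g / (math.sqrt(acc[i]) + eps)
    return new_theta


def shared_rmsprop_step(theta, task_grads, accumulator,
                        lr=0.01, beta=0.9, eps=1e-8):
    """Baseline for comparison: one accumulator shared by all tasks."""
    new_theta = list(theta)
    for i in range(len(theta)):
        g = sum(grad[i] for grad in task_grads)  # summed multi-task gradient
        accumulator[i] = beta * accumulator[i] + (1.0 - beta) * g * g
        new_theta[i] -= lr * g / (math.sqrt(accumulator[i]) + eps)
    return new_theta
```

With two tasks whose gradients on one parameter are 10.0 and 0.1, the task-wise step moves the parameter by about the same amount for each task, whereas in the shared baseline the small-gradient task contributes roughly 1% of the step.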
# 畳み込みネットワークを用いたDolphin Whistlesの自動検出と伝達学習 Automated Detection of Dolphin Whistles with Convolutional Networks and Transfer Learning ( http://arxiv.org/abs/2211.15406v1 ) ライセンス: Link先を確認	Burla Nur Korkmaz, Roee Diamant, Gil Danino, Alberto Testolin	(参考訳) 海洋環境の効率的な保全と絶滅危惧種の野生生物管理は、環境モニタリングのための効率的で正確でスケーラブルなソリューションの実装を必要とする。エコ音響学は、環境音の非侵襲的長期サンプリングの利点を提供し、生物多様性調査の基準ツールとなる可能性がある。しかし、音響データの分析と解釈は、しばしば大量の人間の監督を必要とする時間を要するプロセスである。この問題は、ディープラーニング研究の進歩により、最近目覚ましいパフォーマンスを達成した音声信号分析の現代的技術を活用することで解決されるかもしれない。本稿では,畳み込み型ニューラルネットワークが水中の音声記録からイルカの口笛を識別することで,従来の自動手法よりもはるかに優れていることを示す。提案システムでは,環境雑音の存在下でも信号を検出することができると同時に,偽陽性や偽陰性の発生可能性も一貫して低減できる。本研究は,海洋生態系の自動モニタリングを改善するための人工知能技術の導入をさらに支援する。 Effective conservation of maritime environments and wildlife management of endangered species require the implementation of efficient, accurate and scalable solutions for environmental monitoring. Ecoacoustics offers the advantages of non-invasive, long-duration sampling of environmental sounds and has the potential to become the reference tool for biodiversity surveying. However, the analysis and interpretation of acoustic data is a time-consuming process that often requires a great amount of human supervision. This issue might be tackled by exploiting modern techniques for automatic audio signal analysis, which have recently achieved impressive performance thanks to the advances in deep learning research. In this paper we show that convolutional neural networks can indeed significantly outperform traditional automatic methods in a challenging detection task: identification of dolphin whistles from underwater audio recordings. The proposed system can detect signals even in the presence of ambient noise, at the same time consistently reducing the likelihood of producing false positives and false negatives. Our results further support the adoption of artificial intelligence technology to improve the automatic monitoring of marine ecosystems.	翻訳日:2022-11-29 17:20:14 公開日:2022-11-28
# 合成主成分設計:合成制御による高速共変量バランス Synthetic Principal Component Design: Fast Covariate Balancing with Synthetic Controls ( http://arxiv.org/abs/2211.15241v1 ) ライセンス: Link先を確認	Yiping Lu, Jiajin Li, Lexing Ying, Jose Blanchet	(参考訳) 実験の最適設計は一般にNP-ハード組合せ最適化問題を解くことである。本稿では,グローバルに収束し,効率的な最適化アルゴリズムを開発することを目的とする。具体的には、前処理結果データが利用可能で、合成制御推定器が呼び出される設定を考える。平均処理効果は、処理単位の重み付き平均結果と、観察データから重みが学習される制御単位の差によって推定される。この設定下では、最適実験設計問題はいわゆる phase synchronization 問題に還元できることを驚くほど観察した。スペクトル初期化を用いた一般化電力法の正規化変種を用いてこの問題を解決する。理論的には、あるデータ生成プロセスから前処理データをサンプリングする場合、実験設計のための最初の大域的最適性保証を確立する。実験では,米国労働統計局とアバディ・ダイモンド・ハインミューラー・カリフォルニア喫煙データの両方において,本手法の有効性を実証する実験を行った。根平均二乗誤差の観点からは、このアルゴリズムはランダムな設計を大きなマージンで超えている。 The optimal design of experiments typically involves solving an NP-hard combinatorial optimization problem. In this paper, we aim to develop a globally convergent and practically efficient optimization algorithm. Specifically, we consider a setting where the pre-treatment outcome data is available and the synthetic control estimator is invoked. The average treatment effect is estimated via the difference between the weighted average outcomes of the treated and control units, where the weights are learned from the observed data. Under this setting, we surprisingly observed that the optimal experimental design problem could be reduced to a so-called phase synchronization problem. We solve this problem via a normalized variant of the generalized power method with spectral initialization. On the theoretical side, we establish the first global optimality guarantee for experiment design when pre-treatment data is sampled from certain data-generating processes. Empirically, we conduct extensive experiments to demonstrate the effectiveness of our method on both the US Bureau of Labor Statistics and the Abadie-Diamond-Hainmueller California Smoking Data. In terms of the root mean square error, our algorithm surpasses the random design by a large margin.	翻訳日:2022-11-29 17:11:02 公開日:2022-11-28
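A rough sketch of a generalized power method with spectral initialization, as named in the abstract above. The abstract does not spell out the exact problem form, so this sketch assumes the simplest variant: recover labels x in {+1, -1}^n that maximize x^T C x for a symmetric matrix C (for instance, treated versus control assignments). All function names are hypothetical, the normalization used in the paper is not reproduced, and the power iteration assumes the leading eigenvalue of C dominates in magnitude.

```python
import math


def matvec(matrix, vector):
    """Dense matrix-vector product on plain lists."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def sign(value):
    """Map a real number to +1 or -1 (zero is sent to +1 by convention)."""
    return 1 if value >= 0 else -1


def spectral_init(matrix, iterations=100):
    """Approximate the leading eigenvector by power iteration, then round."""
    vector = [1.0] * len(matrix)  # fixed start keeps the sketch deterministic
    for _ in range(iterations):
        product = matvec(matrix, vector)
        norm = math.sqrt(sum(p * p for p in product))
        if norm == 0.0:
            break
        vector = [p / norm for p in product]
    return [sign(v) for v in vector]


def generalized_power_method(matrix, iterations=50):
    """Iterate x <- sign(C x) from the spectral start until a fixed point."""
    x = spectral_init(matrix)
    for _ in range(iterations):
        new_x = [sign(v) for v in matvec(matrix, x)]
        if new_x == x:
            break
        x = new_x
    return x
```

For a noiseless input C = z z^T the iteration returns z up to a global sign flip, which is the usual ambiguity in synchronization problems.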
# サインコヒーレンシによる個別処理規則のメタ分析 Meta-analysis of individualized treatment rules via sign-coherency ( http://arxiv.org/abs/2211.15476v1 ) ライセンス: Link先を確認	Jay Jojo Cheng, Jared D. Huling, Guanhua Chen	(参考訳) 患者の基本特性に合わせた治療は、副作用を減少させながら患者の予後を改善する可能性を秘めている。個別化処理ルール(ITR)の学習には、複数のデータセット(サイト)の集約が必要となることが多いが、現在のITR方法論では、サイト間の不均一性を考慮していないため、各サイトへのデプロイ時にモデルの一般化性が損なわれる可能性がある。そこで本研究では,ITRの個人レベルでのメタ分析手法を開発し,地域固有のITRを共同で学習すると同時に,科学的に動機付けられた指向性原理を通じて特徴記号コヒーレンシに関する情報を借用する。また,itr学習問題に適応した情報基準を用いて,モデルチューニングのための適応手順を開発した。提案手法を数値実験により検討し,多地点間不均一性の異なるレベル下での性能を把握し,その手法を適用して電子健康記録の多施設データベース上でITRを推定する。この研究は、ITR(Aラーニング、重み付け学習)をマルチサイト設定に推定するためのいくつかの一般的な方法論を拡張した。 Medical treatments tailored to a patient's baseline characteristics hold the potential of improving patient outcomes while reducing negative side effects. Learning individualized treatment rules (ITRs) often requires aggregation of multiple datasets(sites); however, current ITR methodology does not take between-site heterogeneity into account, which can hurt model generalizability when deploying back to each site. To address this problem, we develop a method for individual-level meta-analysis of ITRs, which jointly learns site-specific ITRs while borrowing information about feature sign-coherency via a scientifically-motivated directionality principle. We also develop an adaptive procedure for model tuning, using information criteria tailored to the ITR learning problem. We study the proposed methods through numerical experiments to understand their performance under different levels of between-site heterogeneity and apply the methodology to estimate ITRs in a large multi-center database of electronic health records. This work extends several popular methodologies for estimating ITRs (A-learning, weighted learning) to the multiple-sites setting.	翻訳日:2022-11-29 17:10:30 公開日:2022-11-28
# テキスト-SQLモデルのセキュリティ脆弱性について On the Security Vulnerabilities of Text-to-SQL Models ( http://arxiv.org/abs/2211.15363v1 ) ライセンス: Link先を確認	Xutan Peng, Yipeng Zhang, Jingfeng Yang, Mark Stevenson	(参考訳) 最近の研究によると、テキスト処理アルゴリズムは多くのタスクに効果があるものの、意図的な攻撃に対して脆弱である可能性がある。しかし、このような弱点が直接セキュリティの脅威に繋がるかどうかはまだ未定だ。このギャップを埋めるため、データベースの自然言語インターフェースを構築するテクニックであるText-to-SQLの脆弱性テストを実施しました。実証的な結果として、2つの商用ブラックボックス(Baidu-UNIT と Codex で動作する Ai2sql)の Text-to-SQL モジュールが悪意のあるコードを生成するために操作可能であることを示しました。これは、NLPモデルが野生の攻撃ベクトルとして利用される危険性の初めての実証である。さらに、4つのオープンソースフレームワークを含む実験により、単純なバックドア攻撃がテキストからSQLシステムで100%の成功率を達成できることを確認した。これらの知見を報告し,実践的な防衛策を提案することにより,ソフトウェアセキュリティ問題の特定と修復にNLPコミュニティから直ちに注意を喚起する。 Recent studies show that, despite being effective on numerous tasks, text processing algorithms may be vulnerable to deliberate attacks. However, the question of whether such weaknesses can directly lead to security threats is still under-explored. To bridge this gap, we conducted vulnerability tests on Text-to-SQL, a technique that builds natural language interfaces for databases. Empirically, we showed that the Text-to-SQL modules of two commercial black boxes (Baidu-UNIT and Codex-powered Ai2sql) can be manipulated to produce malicious code, potentially leading to data breaches and Denial of Service. This is the first demonstration of the danger of NLP models being exploited as attack vectors in the wild. Moreover, experiments involving four open-source frameworks verified that simple backdoor attacks can achieve a 100% success rate on Text-to-SQL systems with almost no prediction performance impact. By reporting these findings and suggesting practical defences, we call for immediate attention from the NLP community to the identification and remediation of software security issues.	翻訳日:2022-11-29 17:10:10 公開日:2022-11-28
# 可変需要に適応した自律経路・ピックアップ問題に対するマルチエージェント強化学習 Multiagent Reinforcement Learning for Autonomous Routing and Pickup Problem with Adaptation to Variable Demand ( http://arxiv.org/abs/2211.14983v1 ) ライセンス: Link先を確認	Daniel Garces, Sushmita Bhattacharya, Stephanie Gil, Dimitri Bertsekas	(参考訳) 都市地図上で確率的に現れる要求の処理を行う車両群に対して,ルーティング/ピックアップポリシを生成するための学習フレームワークを導出する。私たちは政策に焦点を合わせ 1)車両間の連携を生じさせ、従量化の待ち時間を短縮する。 2)非明快で、未定の今後の要望を考慮し、 3) 基盤となる需要分布の変化に対応できる。特に、オンピーク時間とオフピーク時間のような都市環境における実際の需要条件の変動に対応することに関心があります。私たちはこれを組み合わせて達成し (i)オンラインプレイ、近似ポリシー反復ステップによるロールアウト手法の性能を向上させるルックアヘッド最適化方法、及び (ii)基盤となる需要モデルの変化に適応できるオフライン近似スキーム。特に,wassersteinambiguity集合のq-valid半径を用いて妥当性の領域を定量化することにより,学習したポリシーを異なる需要分布に適応させることができる。本研究では,現在の要求が元の有効領域外にある場合に,トレーニング済みのオフライン近似を切り替える機構を提案する。この場合、wasserstein距離の観点で現在の需要に近い歴史的な需要モデルに基づいてトレーニングされたオフラインアーキテクチャを使うように提案する。我々は、サンフランシスコのダウンタウンにおける実際の納税要求に対するルーティングとピックアップのポリシーを、オンピーク時間とオフピーク時間の間で高いばらつきで学習し、需要分布の実際の変動に対応する方法の能力を実証した。その結果,本手法は,運用研究の古典的手法に基づくベンチマークと同様に,ロールアウトに基づく強化学習よりも優れることがわかった。 We derive a learning framework to generate routing/pickup policies for a fleet of vehicles tasked with servicing stochastically appearing requests on a city map. We focus on policies that 1) give rise to coordination amongst the vehicles, thereby reducing wait times for servicing requests, 2) are non-myopic, considering a-priori unknown potential future requests, and 3) can adapt to changes in the underlying demand distribution. Specifically, we are interested in adapting to fluctuations of actual demand conditions in urban environments, such as on-peak vs. off-peak hours. We achieve this through a combination of (i) online play, a lookahead optimization method that improves the performance of rollout methods via an approximate policy iteration step, and (ii) an offline approximation scheme that allows for adapting to changes in the underlying demand model. 
In particular, we achieve adaptivity of our learned policy to different demand distributions by quantifying a region of validity using the q-valid radius of a Wasserstein Ambiguity Set. We propose a mechanism for switching the originally trained offline approximation when the current demand is outside the original validity region. In this case, we propose to use an offline architecture, trained on a historical demand model that is closer to the current demand in terms of Wasserstein distance. We learn routing and pickup policies over real taxicab requests in downtown San Francisco with high variability between on-peak and off-peak hours, demonstrating the ability of our method to adapt to real fluctuation in demand distributions. Our numerical results demonstrate that our method outperforms rollout-based reinforcement learning, as well as several benchmarks based on classical methods from the field of operations research.	翻訳日:2022-11-29 17:02:42 公開日:2022-11-28
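The switching mechanism described above can be sketched as follows, under strong simplifying assumptions: demand is summarized by equally sized one-dimensional samples (the paper works with requests on a city map), the Wasserstein-1 distance is computed from sorted samples, and `radius` stands in for the validity radius. The names `wasserstein_1d` and `choose_model` are hypothetical and not taken from the paper.

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equally sized 1-D empirical samples."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes samples of equal size")
    # For equal-size 1-D samples, the optimal coupling matches sorted values.
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def choose_model(current_demand, models, active, radius):
    """Keep the active offline model while demand stays inside its validity
    region; otherwise switch to the historical model closest to current demand.

    models -- dict mapping a model name to the demand sample it was trained on
    active -- name of the model currently in use
    radius -- assumed validity radius around the active model's demand
    """
    if wasserstein_1d(current_demand, models[active]) <= radius:
        return active
    return min(models,
               key=lambda name: wasserstein_1d(current_demand, models[name]))
```

For example, with an off-peak model trained on `[1, 2, 3, 4]` and an on-peak model trained on `[6, 7, 8, 9]`, a current sample of `[5, 6, 7, 8]` lies outside a radius of 1.5 around the off-peak data, so the selector switches to the on-peak model.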
# 企業の金融リスク分析に関する包括的調査 : 問題、方法、スポットライト、応用 A Comprehensive Survey on Enterprise Financial Risk Analysis: Problems, Methods, Spotlights and Applications ( http://arxiv.org/abs/2211.14997v1 ) ライセンス: Link先を確認	Yu Zhao, Huaming Du	(参考訳) 企業金融リスク分析は、企業の将来的な金融リスクを予測することを目的としており、広く適用されているため、企業金融リスク分析は金融の中核的な研究課題である。リスク管理に関する貴重な調査はすでにいくつかあるが、これらの調査は比較的孤立したアプローチを導入し、近年の企業金融リスク分析の進歩を欠いている。企業金融リスク分析の急速な拡大、特にコンピュータ科学とビッグデータの観点からは、関連する研究を包括的にレビューすることは必要かつ困難である。本調査は、既存の企業金融リスク研究を統合・体系化し、また、企業金融リスク分析のメカニズムと戦略を包括的に要約・解釈し、読者が現在の研究状況や考え方をよりよく理解する上で役立てることを目的とする。本論文は,1968年から2022年までの50年間の企業リスク分析モデリングに関する300以上の論文の体系的文献レビューを提供する。まず,企業リスクの形式的定義と関連する概念について紹介する。次に,リスクタイプの観点から代表作を分類し,リスク分析の3つの側面を要約した。最後に、企業財務リスクをモデル化するための分析手法を比較した。本研究の目的は,企業リスクコミュニケーションのメカニズムと企業ガバナンス,金融機関,政府規制への影響を十分に理解することを目的とした,現在の最先端の研究と,企業リスクをモデル化するための今後の方向性を明らかにすることである。 Enterprise financial risk analysis aims at predicting the enterprises' future financial risk.Due to the wide application, enterprise financial risk analysis has always been a core research issue in finance. Although there are already some valuable and impressive surveys on risk management, these surveys introduce approaches in a relatively isolated way and lack the recent advances in enterprise financial risk analysis. Due to the rapid expansion of the enterprise financial risk analysis, especially from the computer science and big data perspective, it is both necessary and challenging to comprehensively review the relevant studies. This survey attempts to connect and systematize the existing enterprise financial risk researches, as well as to summarize and interpret the mechanisms and the strategies of enterprise financial risk analysis in a comprehensive way, which may help readers have a better understanding of the current research status and ideas. This paper provides a systematic literature review of over 300 articles published on enterprise risk analysis modelling over a 50-year period, 1968 to 2022. 
We first introduce the formal definition of enterprise risk as well as the related concepts. Then, we categorize the representative works in terms of risk type and summarize the three aspects of risk analysis. Finally, we compare the analysis methods used to model enterprise financial risk. Our goal is to clarify current cutting-edge research and its possible future directions for modeling enterprise risk, aiming to fully understand the mechanisms of enterprise risk communication and influence and their application to corporate governance, financial institutions, and government regulation.	翻訳日:2022-11-29 17:02:16 公開日:2022-11-28
# 移動ロボットによる物体操作のための集団知能 Collective Intelligence for Object Manipulation with Mobile Robots ( http://arxiv.org/abs/2211.15136v1 ) ライセンス: Link先を確認	So Kuroki, Tatsuya Matsushima, Jumpei Arima, Yutaka Matsuo, Shixiang Shane Gu, Yujin Tang	(参考訳) 自然システムは多くの場合、自己組織化と変化への適応を可能にする集団的知性を示すが、ほとんどの人工的なシステムでは同等なものが欠落している。移動ロボットを用いた協調物体操作において,そのようなシステムの可能性を検討する。従来の研究では、制限された設定で問題に対する潜在的な解決策を示すが、計算と学習が困難である。さらに重要なことに、これらのシステムは環境の変化に直面するときに適応する能力を持たない。本研究では,グラデーションに基づくソフトボディシミュレータから得られたプランナーを注意に基づくニューラルネットワークに蒸留することで,マルチロボット操作システムがベースラインよりも優れた性能を実現できることを示す。さらに,本システムでは,トレーニング中に見えない構成に一般化し,外乱や環境変化を適用した場合のタスク完了に適応できる。 While natural systems often present collective intelligence that allows them to self-organize and adapt to changes, the equivalent is missing in most artificial systems. We explore the possibility of such a system in the context of cooperative object manipulation using mobile robots. Although conventional works demonstrate potential solutions for the problem in restricted settings, they have computational and learning difficulties. More importantly, these systems do not possess the ability to adapt when facing environmental changes. In this work, we show that by distilling a planner derived from a gradient-based soft-body physics simulator into an attention-based neural network, our multi-robot manipulation system can achieve better performance than baselines. In addition, our system also generalizes to unseen configurations during training and is able to adapt toward task completions when external turbulence and environmental changes are applied.	翻訳日:2022-11-29 17:01:52 公開日:2022-11-28
# ST-Curriculum Dropoutを用いた時空間グラフモデリング Easy Begun is Half Done: Spatial-Temporal Graph Modeling with ST-Curriculum Dropout ( http://arxiv.org/abs/2211.15182v1 ) ライセンス: Link先を確認	Hongjun Wang, Jiyuan Chen, Tong Pan, Zipei Fan, Boyuan Zhang, Renhe Jiang, Lingyu Zhang, Yi Xie, Zhongyi Wang, Xuan Song	(参考訳) 交通速度予測やタクシー需要予測といった空間的時間的(ST)グラフモデリングは、ディープラーニング分野において重要なタスクである。しかし、グラフ内のノードの場合、それらのSTパターンはSTデータの異種性のため、モデリングの難しさが大きく異なる。我々は、ノードを簡単なものから複雑なものへと有意義な順序でモデルに公開することで、従来のトレーニング手順よりもパフォーマンスが向上すると主張している。このアイデアはカリキュラム学習にルーツがあり、トレーニングの初期段階ではモデルがノイズや難しいサンプルに敏感であることが示唆されている。本稿では,空間時間グラフモデリングのための新規で実装が容易な戦略ST-Curriculum Dropoutを提案する。具体的には,高レベルな機能空間における各ノードの学習難易度を評価し,難しいノードをドロップアウトすることで,モデルが最初は基本的なST関係のみを処理し,その後徐々に難しいものへ移行することを保証する。我々の戦略は、訓練可能なパラメータを加味せずに任意の標準的ディープラーニングアーキテクチャに適用でき、訓練が進むにつれてST関係の難易度を制御することによって、より優れたデータ表現を捉えることができ、より高度な一般化が得られることを示すために、幅広いデータセットに関する広範な実験を行うことができる。 Spatial-temporal (ST) graph modeling, such as traffic speed forecasting and taxi demand prediction, is an important task in the deep learning area. However, for the nodes in a graph, their ST patterns can vary greatly in difficulty for modeling, owing to the heterogeneous nature of ST data. We argue that unveiling the nodes to the model in a meaningful order, from easy to complex, can provide performance improvements over traditional training procedures. The idea has its root in Curriculum Learning, which suggests that in the early stage of training, models can be sensitive to noise and difficult samples. In this paper, we propose ST-Curriculum Dropout, a novel and easy-to-implement strategy for spatial-temporal graph modeling. Specifically, we evaluate the learning difficulty of each node in high-level feature space and drop those difficult ones out to ensure the model only needs to handle fundamental ST relations at the beginning, before gradually moving to hard ones. 
Our strategy can be applied to any canonical deep learning architecture without extra trainable parameters, and extensive experiments on a wide range of datasets are conducted to illustrate that, by controlling the difficulty level of ST relations as the training progresses, the model is able to capture better representation of the data and thus yields better generalization.	翻訳日:2022-11-29 17:01:39 公開日:2022-11-28
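The easy-to-hard idea in the ST-Curriculum Dropout abstract above can be illustrated with a minimal Python sketch. It assumes that a per-node difficulty score is already available; the linear schedule and the names `keep_fraction` and `curriculum_mask` are hypothetical choices for illustration and are not taken from the paper.

```python
def keep_fraction(epoch, total_epochs, start=0.5):
    """Hypothetical linear schedule: fraction of nodes kept grows from `start` to 1.0."""
    if total_epochs <= 1:
        return 1.0
    progress = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    return start + (1.0 - start) * progress


def curriculum_mask(difficulty, epoch, total_epochs, start=0.5):
    """Return a 0/1 mask that keeps the easiest nodes and drops the hardest ones."""
    n = len(difficulty)
    n_keep = max(1, round(keep_fraction(epoch, total_epochs, start) * n))
    # Node indices sorted from easiest (lowest score) to hardest (assumed convention).
    order = sorted(range(n), key=lambda i: difficulty[i])
    kept = set(order[:n_keep])
    return [1 if i in kept else 0 for i in range(n)]
```

With four nodes and three epochs, the mask would start by keeping the two easiest nodes and end by keeping all of them; how difficulty is actually measured is left to the paper.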
# 連続エピソード制御 Continuous Episodic Control ( http://arxiv.org/abs/2211.15183v1 ) ライセンス: Link先を確認	Zhao Yang, Thomas M. Moerland, Mike Preuss, Aske Plaat	(参考訳) 非パラメトリックエピソディックメモリは、強化学習タスクでハイリワード体験を素早くラッチするのに使うことができる。パラメトリック深層強化学習法とは対照的に、これらの手法は解を一度だけ発見し、繰り返し解くだけでよい。しかしながら、エピソディック制御解は離散テーブルに格納されており、このアプローチは離散作用空間問題にのみ適用されている。そこで本研究では,連続行動空間の問題における逐次決定のための非パラメトリックエピソードメモリアルゴリズムであるContinuous Episodic Control (CEC)を提案する。いくつかのスパース・リワード連続制御環境において,提案手法は現状のモデルレスRLやメモリ拡張RLアルゴリズムよりも高速に学習でき,長期性能も良好である。要するに、CECは継続的制御タスクにおける学習の高速なアプローチであり、ハイブリッドアプローチにおけるパラメトリックRLメソッドへの有用な追加である。 Non-parametric episodic memory can be used to quickly latch onto high-reward experience in reinforcement learning tasks. In contrast to parametric deep reinforcement learning approaches, these methods only need to discover the solution once, and may then repeatedly solve the task. However, episodic control solutions are stored in discrete tables, and this approach has so far only been applied to discrete action space problems. Therefore, this paper introduces Continuous Episodic Control (CEC), a novel non-parametric episodic memory algorithm for sequential decision making in problems with a continuous action space. Results on several sparse-reward continuous control environments show that our proposed method learns faster than state-of-the-art model-free RL and memory-augmented RL algorithms, while maintaining good long-run performance as well. In short, CEC can be a fast approach for learning in continuous control tasks, and a useful addition to parametric RL methods in a hybrid approach as well.	翻訳日:2022-11-29 17:01:16 公開日:2022-11-28
# 5G基地局交通予測のためのフェデレートラーニング Federated Learning for 5G Base Station Traffic Forecasting ( http://arxiv.org/abs/2211.15220v1 ) ライセンス: Link先を確認	Vasileios Perifanis, Nikolaos Pavlidis, Remous-Aris Koutsiamanis, Pavlos S. Efraimidis	(参考訳) モバイルトラフィック予測は、5gモバイルネットワークがスマートで効率的なインフラ計画と管理を可能にするために非常に重要である。ただし、利用可能なデータは基地局のログ情報に限られている。したがって、異なる当事者に対する新たな観察に一般化できる高品質な予測を生成するための訓練方法が求められている。従来のアプローチでは、異なるベースステーションから測定値を収集し、中央のエンティティに送信し、受信したデータを使用して機械学習操作を実行する必要がある。ローカルな観察を広めることで、プライバシ、機密性、パフォーマンス上の懸念が高まり、マシンラーニング技術の適用性が損なわれる。この問題に対処するために,様々な分散学習手法が提案されているが,交通予測への応用はまだ検討されていない。本研究は, 時系列予測のための原基地局集約LTEデータに適用したフェデレーション学習の有効性について検討する。非iidデータのフェデレーション設定でトレーニングされた5つの異なるニューラルネットワークアーキテクチャを用いて、ワンステップ予測を評価する。提示されたアルゴリズムは、5gおよびbeyond challengeのグローバルフェデレーショントラフィック予測に提出された。その結果,フェデレート設定に適応した学習アーキテクチャは,集中型設定と等価な予測誤差を達成し,ベースステーションでの事前処理技術は高い予測精度をもたらすが,最先端のアグリゲータは単純なアプローチを上回らないことがわかった。 Mobile traffic prediction is of great importance on the path of enabling 5G mobile networks to perform smart and efficient infrastructure planning and management. However, available data are limited to base station logging information. Hence, training methods for generating high-quality predictions that can generalize to new observations on different parties are in demand. Traditional approaches require collecting measurements from different base stations and sending them to a central entity, followed by performing machine learning operations using the received data. The dissemination of local observations raises privacy, confidentiality, and performance concerns, hindering the applicability of machine learning techniques. Various distributed learning methods have been proposed to address this issue, but their application to traffic prediction has yet to be explored. In this work, we study the effectiveness of federated learning applied to raw base station aggregated LTE data for time-series forecasting. 
We evaluate one-step predictions using 5 different neural network architectures trained with a federated setting on non-iid data. The presented algorithms have been submitted to the Global Federated Traffic Prediction for 5G and Beyond Challenge. Our results show that the learning architectures adapted to the federated setting achieve equivalent prediction error to the centralized setting, pre-processing techniques on base stations lead to higher forecasting accuracy, while state-of-the-art aggregators do not outperform simple approaches.	翻訳日:2022-11-29 17:01:00 公開日:2022-11-28
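The federated learning abstract above does not name its "simple" aggregators. A common baseline is sample-weighted parameter averaging (often called FedAvg), sketched below as an assumption rather than as the authors' setup. The function name `fedavg` and the flat-list parameter format are illustrative only.

```python
def fedavg(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local sample counts."""
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("client_sizes must sum to a positive number")
    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        for j in range(dim):
            # Each base station contributes in proportion to its data volume.
            averaged[j] += weights[j] * size / total
    return averaged
```

Only the parameter vectors and sample counts leave each base station in this sketch; the raw measurements stay local, which is the privacy argument the abstract makes.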
# 外科的場面理解のためのクラスインクリメンタルコントラスト学習を用いたタスク対応非同期マルチタスクモデル Task-Aware Asynchronous Multi-Task Model with Class Incremental Contrastive Learning for Surgical Scene Understanding ( http://arxiv.org/abs/2211.15327v1 ) ライセンス: Link先を確認	Lalithkumar Seenivasan, Mobarakol Islam, Mengya Xu, Chwee Ming Lim and Hongliang Ren	(参考訳) 目的: ロボット手術における術中指導, 意思決定, 術後分析において, ツール・組織間相互作用認識と自動レポート生成による手術シーン理解が重要な役割を担っている。しかし,患者間および患者内変動を伴う異なる手術間のドメインシフトと,新しい器具の外観は,モデル予測の性能を低下させる。さらに、複数のモデルからの出力が必要であり、計算コストが高く、リアルタイムのパフォーマンスに影響する可能性がある。方法論: 領域シフト問題に対処する多タスク学習(MTL)モデルが手術報告生成およびツールと組織間の相互作用予測のために提案される。モデルは、共有特徴抽出器、キャプションのためのメッシュトランスフォーマー分岐、ツール・組織間インタラクション予測のためのグラフ注意分岐から構成される。共有特徴抽出器は、クラスインクリメンタルコントラスト学習(CICL)を用いて、ターゲット領域における強度シフトと新しいクラス外観に取り組む。我々は,Laplacian of Gaussian (LoG) に基づくカリキュラム学習を共有分岐とタスク固有分岐の両方に組み込み,モデル学習を強化する。タスク対応非同期MTL最適化手法を導入し,共有重みを微調整し,両タスクを最適に収束させる。結果:タスク認識最適化と微調整技術を用いて訓練したMTLモデルは,目標領域上の両方のタスクに対するバランス性能(シーンキャプションのBLEUスコア0.4049,インタラクション検出の精度0.3508)を報告し,ドメイン適応における単一タスクモデルとオンパーで実行した。結論: 提案するマルチタスクモデルは, ドメインシフトに適応し, 対象領域に新しい器具を取り入れ, ツール・組織間インタラクション検出とレポート生成を単一タスクモデルと同等に行うことができた。 Purpose: Surgery scene understanding with tool-tissue interaction recognition and automatic report generation can play an important role in intra-operative guidance, decision-making and postoperative analysis in robotic surgery. However, domain shifts between different surgeries with inter and intra-patient variation and novel instruments' appearance degrade the performance of model prediction. Moreover, it requires output from multiple models, which can be computationally expensive and affect real-time performance. Methodology: A multi-task learning (MTL) model is proposed for surgical report generation and tool-tissue interaction prediction that deals with domain shift problems. The model consists of a shared feature extractor, a mesh-transformer branch for captioning and a graph attention branch for tool-tissue interaction prediction. 
The shared feature extractor employs class incremental contrastive learning (CICL) to tackle intensity shift and novel class appearance in the target domain. We design Laplacian of Gaussian (LoG) based curriculum learning into both shared and task-specific branches to enhance model learning. We incorporate a task-aware asynchronous MTL optimization technique to fine-tune the shared weights and converge both tasks optimally. Results: The proposed MTL model trained using task-aware optimization and fine-tuning techniques reported a balanced performance (BLEU score of 0.4049 for scene captioning and accuracy of 0.3508 for interaction detection) for both tasks on the target domain and performed on-par with single-task models in domain adaptation. Conclusion: The proposed multi-task model was able to adapt to domain shifts, incorporate novel instruments in the target domain, and perform tool-tissue interaction detection and report generation on par with single-task models.	翻訳日:2022-11-29 16:43:41 公開日:2022-11-28
# スパイキングニューラルPシステムの行列表現:再検討 Matrix representations of spiking neural P systems: Revisited ( http://arxiv.org/abs/2211.15156v1 ) ライセンス: Link先を確認	Henry N. Adorna	(参考訳) 2010年、遅延のないSN Pシステムの行列表現が提示され、遅延のあるSN Pシステムの場合、2017年に行列表現が提案された。これらの表現は、コンピュータソフトウェアとハードウェア技術を用いたSN Pシステムの一連のシミュレーションをもたらした。本研究では,これらの表現を再検討し,SN Pシステムの計算の挙動について考察する。構成の到達可能性の概念は、遅延のあるSN Pシステムと遅延のないSN Pシステムの両方において考慮される。 遅延のあるSN Pシステムの場合、次の構成のより良い計算法を提案する。 In 2010, a matrix representation of SN P systems without delay was presented, while for SN P systems with delay a matrix representation was suggested in 2017. These representations brought about a series of simulations of SN P systems using computer software and hardware technology. In this work, we revisit these representations and provide some observations on the behavior of the computations of SN P systems. The concept of reachability of configurations is considered in both SN P systems with and without delays. A better computation of the next configuration is proposed in the case of SN P systems with delay.	翻訳日:2022-11-29 16:43:08 公開日:2022-11-28
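For SN P systems without delay, the matrix representation is commonly stated as the update C_{k+1} = C_k + s_k · M, where C_k is the configuration vector (spikes per neuron), s_k is a 0/1 spiking vector (one entry per rule), and M is the spiking transition matrix. The Python sketch below illustrates only that update. The two-neuron system and the name `next_configuration` are hypothetical, and the sketch does not check whether the chosen rules are actually applicable in the given configuration.

```python
def next_configuration(config, spiking_vector, transition_matrix):
    """Compute C_{k+1} = C_k + s_k * M for row vectors C_k and s_k."""
    n_rules = len(transition_matrix)
    n_neurons = len(config)
    if len(spiking_vector) != n_rules:
        raise ValueError("spiking vector needs one entry per rule")
    result = list(config)
    for i in range(n_rules):
        for j in range(n_neurons):
            # Row i of M holds the net spike change per neuron when rule i fires.
            result[j] += spiking_vector[i] * transition_matrix[i][j]
    return result


# Hypothetical 2-neuron system: rule 0 lives in neuron 0 (consumes one spike,
# sends one to neuron 1); rule 1 lives in neuron 1 (consumes one spike and
# sends it to the environment, so no neuron gains a spike).
M = [[-1, 1],
     [0, -1]]
```

The delayed case treated in the paper needs extra bookkeeping for neurons that are closed while a rule is pending, which this sketch leaves out.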
# アンサンブルスタック構築のためのブースティングアプローチ A Boosting Approach to Constructing an Ensemble Stack ( http://arxiv.org/abs/2211.15621v1 ) ライセンス: Link先を確認	Zhilei Zhou and Ziyu Qiu and Brad Niblett and Andrew Johnston and Jeffrey Schwartzentruber and Nur Zincir-Heywood and Malcolm Heywood	(参考訳) 分類のための進化的アンサンブル学習へのアプローチが提案され、プログラムのスタックを構築するためにブースティングが使用される。 boostingのそれぞれのアプリケーションは、単一のチャンピオンと残りのデータセット、すなわち、これまで正しく分類されていなかったトレーニングレコードを識別する。次のプログラムは残留物に対してのみ訓練され、最大アンサンブルサイズまたはそれ以上の残留物が残るまで反復される。残留データセットに対するトレーニングは、トレーニングコストを積極的に削減する。アンサンブルをスタックとしてデプロイすることは、予測を行うのに1つの分類器だけが必要であることを意味するため、解釈性も向上する。ベンチマーク研究は、最先端の進化的アンサンブル学習アルゴリズムの予測精度と競争性を示すとともに、桁違いに単純なソリューションを提供する。高濃度データセットによるさらなるベンチマークにより,提案手法はXGBoostよりも正確かつ効率的であることが示唆された。 An approach to evolutionary ensemble learning for classification is proposed in which boosting is used to construct a stack of programs. Each application of boosting identifies a single champion and a residual dataset, i.e. the training records that thus far were not correctly classified. The next program is only trained against the residual, with the process iterating until some maximum ensemble size or no further residual remains. Training against a residual dataset actively reduces the cost of training. Deploying the ensemble as a stack also means that only one classifier might be necessary to make a prediction, so improving interpretability. Benchmarking studies are conducted to illustrate competitiveness with the prediction accuracy of current state-of-the-art evolutionary ensemble learning algorithms, while providing solutions that are orders of magnitude simpler. Further benchmarking with a high cardinality dataset indicates that the proposed method is also more accurate and efficient than XGBoost.	翻訳日:2022-11-29 16:42:34 公開日:2022-11-28
# 局所解釈可能なモデル非依存な説明による画像分類のための深層畳み込みニューラルネットワークの説明 Explaining Deep Convolutional Neural Networks for Image Classification by Evolving Local Interpretable Model-agnostic Explanations ( http://arxiv.org/abs/2211.15143v1 ) ライセンス: Link先を確認	Bin Wang, Wenbin Pei, Bing Xue, Mengjie Zhang	(参考訳) 深層畳み込みニューラルネットワークはその有効性を証明し、画像分類の最も有力な方法として認識されている。しかし、深層畳み込みニューラルネットワークの深刻な欠点は説明可能性の低下である。残念ながら、多くの現実世界のアプリケーションでは、ユーザーは予測を信頼すべきかどうかを決定する際に、深い畳み込みニューラルネットワークの予測の背後にある根拠を理解する必要がある。この問題を解決するために,局所的な説明を自動的に進化させ,ユーザが予測の合理性を評価するのに役立つ新しい遺伝的アルゴリズムに基づく手法を提案する。さらに,提案手法はモデルに依存しない,すなわち深い畳み込みニューラルネットワークモデルを説明するために利用できる。実験では、ResNetがサンプルモデルとして使用され、ImageNetデータセットがベンチマークデータセットとして選択される。 densenet と mobilenet はさらに説明され,提案手法のモデル非依存な特性を示す。 ImageNetからランダムに選択された4つの画像上の進化した局所的説明は、進化した局所的説明が人間によって容易に認識されることを示す。さらに、進化した説明は、サンプル画像の有意義な解釈可能な特徴をうまく捉えることで、4つの画像の全ての深部畳み込みニューラルネットワークの予測をうまく説明することができる。実験の30回の実行に基づくさらなる分析により、進化した局所的な説明は、予測を行う際の深層畳み込みニューラルネットワークモデルの確率/確信を向上させることができることが示された。提案手法は,lime (state-of-the-art method) の10倍以上の速度で局所的な説明が得られる。 Deep convolutional neural networks have proven their effectiveness, and have been acknowledged as the most dominant method for image classification. However, a severe drawback of deep convolutional neural networks is poor explainability. Unfortunately, in many real-world applications, users need to understand the rationale behind the predictions of deep convolutional neural networks when determining whether they should trust the predictions or not. To resolve this issue, a novel genetic algorithm-based method is proposed for the first time to automatically evolve local explanations that can assist users to assess the rationality of the predictions. Furthermore, the proposed method is model-agnostic, i.e., it can be utilised to explain any deep convolutional neural network models. In the experiments, ResNet is used as an example model to be explained, and the ImageNet dataset is selected as the benchmark dataset. 
DenseNet and MobileNet are further explained to demonstrate the model-agnostic characteristic of the proposed method. The evolved local explanations on four images, randomly selected from ImageNet, are presented, which show that the evolved local explanations are straightforward to be recognised by humans. Moreover, the evolved explanations can explain the predictions of deep convolutional neural networks on all four images very well by successfully capturing meaningful interpretable features of the sample images. Further analysis based on the 30 runs of the experiments exhibits that the evolved local explanations can also improve the probabilities/confidences of the deep convolutional neural network models in making the predictions. The proposed method can obtain local explanations within one minute, which is more than ten times faster than LIME (the state-of-the-art method).	翻訳日:2022-11-29 16:35:28 公開日:2022-11-28
# 誤りレベル解析を用いた画像鑑定のためのSOTA画像分類ディープラーニング法による画像検出 Forged Image Detection using SOTA Image Classification Deep Learning Methods for Image Forensics with Error Level Analysis ( http://arxiv.org/abs/2211.15196v1 ) ライセンス: Link先を確認	Raunak Joshi, Abhishek Gupta, Nandan Kanvinde, Pandharinath Ghonge	(参考訳) コンピュータビジョンの領域における進歩は、深層学習機構を用いてもたらされている。 Image Forensicsはコンピュータビジョンアプリケーションの主要な分野の1つである。画像の偽造は画像鑑識のサブカテゴリであり、エラーレベル分析を使用して検出することができる。このようなイメージを入力として使うと、畳み込みニューラルネットワークのバリエーションを利用して、バイナリ分類の問題になってしまう可能性がある。本稿では,casia itde v.2データセットによる誤りレベル解析に基づく最先端画像分類モデルを用いて転送学習を行う。アルゴリズムは vgg-19, inception-v3, resnet-152-v2, xceptionnet, efficientnet-v2l である。 The advancement in the area of computer vision has been brought using deep learning mechanisms. Image Forensics is one of the major areas of computer vision application. Forgery of images is sub-category of image forensics and can be detected using Error Level Analysis. Using such images as an input, this can turn out to be a binary classification problem which can be leveraged using variations of convolutional neural networks. In this paper we perform transfer learning with state-of-the-art image classification models over error level analysis induced CASIA ITDE v.2 dataset. The algorithms used are VGG-19, Inception-V3, ResNet-152-V2, XceptionNet and EfficientNet-V2L with their respective methodologies and results.	翻訳日:2022-11-29 16:35:03 公開日:2022-11-28
# マスクの裏にあるもの:画像間問題における不確かさを推定する What's Behind the Mask: Estimating Uncertainty in Image-to-Image Problems ( http://arxiv.org/abs/2211.15211v1 ) ライセンス: Link先を確認	Gilad Kutiel, Regev Cohen, Michael Elad, Daniel Freedman	(参考訳) イメージ・ツー・イメージ・ネットワークの不確実性を推定することは重要な課題であり、特にそのようなネットワークが生物学的・医学的な画像領域にますます展開されている。本稿では,マスキングに基づくこの問題に対する新しいアプローチを提案する。既存の画像画像ネットワークを前提として,マスク再構成画像とマスク真の画像との距離が一定の閾値未満であることを保証するマスクを高い確率で計算する。したがって、マスクは再構成された画像のより特定の領域を特定する。我々のアプローチは、基礎となるイメージ・ツー・イメージ・ネットワークとは無関係であり、トレーニングには入力(劣化)、再構成、真のイメージの3倍しか必要としない。さらに,本手法は距離測定値と無関係である。結果として、L_p$スタイルの距離やLPIPSのような知覚距離を使うことができる。我々の理論的な保証は共形校正手順に由来する。我々は,画像のカラー化,画像補完,超解像度タスクにおける不確実性に対するマスクベースアプローチを評価し,それぞれに高品質な性能を示す。 Estimating uncertainty in image-to-image networks is an important task, particularly as such networks are being increasingly deployed in the biological and medical imaging realms. In this paper, we introduce a new approach to this problem based on masking. Given an existing image-to-image network, our approach computes a mask such that the distance between the masked reconstructed image and the masked true image is guaranteed to be less than a specified threshold, with high probability. The mask thus identifies the more certain regions of the reconstructed image. Our approach is agnostic to the underlying image-to-image network, and only requires triples of the input (degraded), reconstructed and true images for training. Furthermore, our method is agnostic to the distance metric used. As a result, one can use $L_p$-style distances or perceptual distances like LPIPS, which contrasts with interval-based approaches to uncertainty. Our theoretical guarantees derive from a conformal calibration procedure. We evaluate our mask-based approach to uncertainty on image colorization, image completion, and super-resolution tasks, demonstrating high quality performance on each.	翻訳日:2022-11-29 16:34:51 公開日:2022-11-28
# 忘れずにプログレッシブな学習 Progressive Learning without Forgetting ( http://arxiv.org/abs/2211.15215v1 ) ライセンス: Link先を確認	Tao Feng, Hangjie Yuan, Mang Wang, Ziyuan Huang, Ang Bian, Jianzhou Zhang	(参考訳) 得られた知識を忘れずにタスクの変更やシーケンシャルな経験から学ぶことは、ニューラルネットワークにとって難しい問題である。本研究では,従来のデータを含まない連続学習(CL)のパラダイムにおいて,2つの課題に焦点をあてる。 (i)モデルがそれまでの知識を学習する段階的な知識空間によって引き起こされる破滅的な記憶の蓄積 (ii)新しい課題の学習における安定性と可塑性のバランスをとるための無制御の綱引き力学。これらの問題に対処するため、我々はPLwF(Progressive Learning without Forgetting)と、オプティマイザの信用割当制度を提示する。 PLwFは、従来のタスクからモデル関数を導入し、各タスクに関する最も信頼性の高い知識と異なるタスクの分布情報を含む知識空間を構築する。広範囲なアブレーション実験は、PLwFとクレジット割り当ての有効性を示す。他のCL法と比較して,生データに頼らずとも,優れた結果が得られている。 Learning from changing tasks and sequential experience without forgetting the obtained knowledge is a challenging problem for artificial neural networks. In this work, we focus on two challenging problems in the paradigm of Continual Learning (CL) without involving any old data: (i) the accumulation of catastrophic forgetting caused by the gradually fading knowledge space from which the model learns the previous knowledge; (ii) the uncontrolled tug-of-war dynamics to balance the stability and plasticity during the learning of new tasks. In order to tackle these problems, we present Progressive Learning without Forgetting (PLwF) and a credit assignment regime in the optimizer. PLwF densely introduces model functions from previous tasks to construct a knowledge space such that it contains the most reliable knowledge on each task and the distribution information of different tasks, while credit assignment controls the tug-of-war dynamics by removing gradient conflict through projection. Extensive ablative experiments demonstrate the effectiveness of PLwF and credit assignment. In comparison with other CL methods, we report notably better results even without relying on any raw data.	翻訳日:2022-11-29 16:34:33 公開日:2022-11-28
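The PLwF abstract above says only that credit assignment removes gradient conflict "through projection". One common form of such a projection, used in gradient-surgery methods for multi-task learning, is sketched below in Python. It is an illustration under that assumption, not necessarily the paper's rule, and `project_conflict` is a hypothetical name.

```python
def project_conflict(grad_new, grad_old):
    """Remove from grad_new the component that opposes grad_old."""
    dot = sum(a * b for a, b in zip(grad_new, grad_old))
    norm_sq = sum(b * b for b in grad_old)
    if dot >= 0 or norm_sq == 0:
        return list(grad_new)  # no conflict, leave the gradient unchanged
    scale = dot / norm_sq
    # Subtract the projection onto grad_old so the result is orthogonal to it.
    return [a - scale * b for a, b in zip(grad_new, grad_old)]
```

After projection the new-task gradient no longer points against the old-task gradient, which is one way to read the "tug-of-war" between plasticity and stability.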
# 画像分類における故障検出のための評価実践を振り返って A Call to Reflect on Evaluation Practices for Failure Detection in Image Classification ( http://arxiv.org/abs/2211.15259v1 ) ライセンス: Link先を確認	Paul F. Jaeger, Carsten T. Lüth, Lukas Klein and Till J. Bungert	(参考訳) 機械学習に基づく意思決定システムの実環境における信頼性の高い適用は、現在この分野で調査されている大きな課題の1つだ。確立されたアプローチの大部分は、信頼スコアを割り当てることで誤った予測を検出することを目的としている。この信頼性は、モデルの予測の不確かさを定量化したり、明示的なスコアリング関数を学習したり、入力がトレーニング分布と一致しているかを評価することによって得られる。興味深いことに、これらのアプローチはいずれも実生活のアプリケーション上で分類器の故障を検出するという同じ目標に対処すると述べているが、現在では個々の評価プロトコルを持つ大半が分離した研究分野を構成しており、関連する手法のかなりの部分を除外するか、関連する障害源の大部分を無視する。本研究では,これらの不整合に起因する現在の落とし穴を系統的に明らかにし,障害検出の全体的かつ現実的な評価のための要件を導出する。この統一的な視点の関連性を示すために,本研究では,関連するすべての手法と障害源に関して信頼度スコアリング関数のベンチマークを初めて可能にする大規模実証研究を行う。単純なソフトマックス応答ベースラインが総合的に最良の手法であると判明したことは、信頼度スコアリングに関する公開研究が豊富にある中で、現在の評価の劇的な欠点を浮き彫りにしている。コードとトレーニングされたモデルはhttps://github.com/IML-DKFZ/fd-shiftsにある。 Reliable application of machine learning-based decision systems in the wild is one of the major challenges currently investigated by the field. A large portion of established approaches aims to detect erroneous predictions by means of assigning confidence scores. This confidence may be obtained by either quantifying the model's predictive uncertainty, learning explicit scoring functions, or assessing whether the input is in line with the training distribution. Curiously, while these approaches all claim to address the same eventual goal of detecting failures of a classifier upon real-life application, they currently constitute largely separated research fields with individual evaluation protocols, which either exclude a substantial part of relevant methods or ignore large parts of relevant failure sources. In this work, we systematically reveal current pitfalls caused by these inconsistencies and derive requirements for a holistic and realistic evaluation of failure detection. 
To demonstrate the relevance of this unified perspective, we present a large-scale empirical study that, for the first time, enables benchmarking confidence scoring functions w.r.t. all relevant methods and failure sources. The revelation of a simple softmax response baseline as the overall best performing method underlines the drastic shortcomings of current evaluation in the abundance of publicized research on confidence scoring. Code and trained models are at https://github.com/IML-DKFZ/fd-shifts.	翻訳日:2022-11-29 16:34:16 公開日:2022-11-28
# パーシステンスバーコードの誘導マッチングによる位相的忠実な画像分割 Topologically faithful image segmentation via induced matching of persistence barcodes ( http://arxiv.org/abs/2211.15272v1 ) ライセンス: Link先を確認	Nico Stucki, Johannes C. Paetzold, Suprosanna Shit, Bjoern Menze, Ulrich Bauer	(参考訳) 画像のセグメンテーションは、ニューラルネットワークが多くの技術分野において膨大な応用を見出す研究分野である。セグメンテーションネットワークを訓練する最も一般的なアプローチは、多くのセグメンテーションタスクで不十分な目的であるピクセルオーバーラップを最適化する損失関数を用いる。近年、それらの限界は、セグメント構造の正しいトポロジーを回復することを目的としたトポロジー認識法への関心を高めた。しかし、これまでのアプローチでは、地上の真実と予測のトポロジ的特徴の空間的整合性は得られていない。本研究では,教師付き画像セグメンテーションのためのトポロジカルかつ特徴的に正確な計量と損失関数を提案し,これをベッチマッチングと呼ぶ。セグメント化設定におけるバーコード間の空間的整合性を保証する方法を示す。さらに,画像のベッチマッチングを計算するための効率的なアルゴリズムを提案する。ベッチマッチング誤差はセグメンテーションの位相的正しさを評価するための解釈可能な指標であり,既定のベッチ数誤差よりも感度が高いことを示す。さらに、ベッチマッチング損失の微分性は、損失関数としての使用を可能にする。ボリューム性能を維持しながら、6つの多様なデータセットにわたるセグメンテーションネットワークのトポロジ的パフォーマンスを改善する。私たちのコードはhttps://github.com/nstucki/betti-matchingで利用可能です。 Image segmentation is a largely researched field where neural networks find vast applications in many facets of technology. Some of the most popular approaches to train segmentation networks employ loss functions optimizing pixel-overlap, an objective that is insufficient for many segmentation tasks. In recent years, their limitations fueled a growing interest in topology-aware methods, which aim to recover the correct topology of the segmented structures. However, so far, none of the existing approaches achieve a spatially correct matching between the topological features of ground truth and prediction. In this work, we propose the first topologically and feature-wise accurate metric and loss function for supervised image segmentation, which we term Betti matching. We show how induced matchings guarantee the spatially correct matching between barcodes in a segmentation setting. Furthermore, we propose an efficient algorithm to compute the Betti matching of images. 
We show that the Betti matching error is an interpretable metric to evaluate the topological correctness of segmentations, which is more sensitive than the well-established Betti number error. Moreover, the differentiability of the Betti matching loss enables its use as a loss function. It improves the topological performance of segmentation networks across six diverse datasets while preserving the volumetric performance. Our code is available at https://github.com/nstucki/Betti-matching.	翻訳日:2022-11-29 16:33:54 公開日:2022-11-28
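The Betti number error that the abstract above uses as its point of comparison can be illustrated in dimension 0, where it reduces to a difference in connected-component counts. The Python sketch below assumes binary 2D masks and 4-connectivity; both are illustrative choices, and the names `betti0` and `betti0_error` are hypothetical. It is not the Betti matching algorithm from the paper.

```python
def betti0(mask):
    """Count 4-connected foreground components in a binary 2D grid."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    components = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                components += 1
                stack = [(r, c)]
                seen[r][c] = True
                # Flood fill to mark every pixel of this component.
                while stack:
                    y, x = stack.pop()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return components


def betti0_error(prediction, ground_truth):
    """Absolute difference in the number of connected components."""
    return abs(betti0(prediction) - betti0(ground_truth))
```

Because this error compares counts only, a prediction whose components sit in the wrong places can still score zero. That is the kind of insensitivity a spatially aware matching is meant to address.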
# 衛星画像生成のための条件付きプログレッシブ・ジェネレーティブ・アドバイサル・ネットワーク Conditional Progressive Generative Adversarial Network for satellite image generation ( http://arxiv.org/abs/2211.15303v1 ) ライセンス: Link先を確認	Renato Cardoso, Sofia Vallecorsa, Edoardo Nemni	(参考訳) 画像生成と画像補完は、欠落したピクセルを現実的に置き換えることができる機械学習アルゴリズムのおかげで、急速に進化している。しかし,高解像度画像を高精細度で生成することは重要な計算課題である。本研究では、3つの隅のうち1つが欠けている画像の完成として画像生成タスクを定式化する。そして、このアプローチを拡張して、同じレベルのディテールで大きなイメージを反復的に構築します。我々の目標は、衛星画像データセットに典型的な高解像度のサンプルを生成するためのスケーラブルな手法を得ることである。本稿では,wassersteinオートエンコーダによって潜在ベクトルにエンコードされた3つの初期隣接タイルを入力として,画像中の欠落タイルを生成する条件付きプログレッシブ生成逆ネットワーク(gan)を提案する。我々は,国連衛星センター(unosat)が洪水検知ツールの訓練に使用する画像セットに着目し,合成画像の品質を現実的な設定で検証する。 Image generation and image completion are rapidly evolving fields, thanks to machine learning algorithms that are able to realistically replace missing pixels. However, generating large high resolution images, with a large level of details, presents important computational challenges. In this work, we formulate the image generation task as completion of an image where one out of three corners is missing. We then extend this approach to iteratively build larger images with the same level of detail. Our goal is to obtain a scalable methodology to generate high resolution samples typically found in satellite imagery data sets. We introduce a conditional progressive Generative Adversarial Networks (GAN), that generates the missing tile in an image, using as input three initial adjacent tiles encoded in a latent vector by a Wasserstein auto-encoder. We focus on a set of images used by the United Nations Satellite Centre (UNOSAT) to train flood detection tools, and validate the quality of synthetic images in a realistic setup.	翻訳日:2022-11-29 16:33:33 公開日:2022-11-28
# 良いヘルパーはあなたの周りにある:注意駆動マスク画像モデリング Good helper is around you: Attention-driven Masked Image Modeling ( http://arxiv.org/abs/2211.15362v1 ) ライセンス: Link先を確認	Jie Gui, Zhengqi Liu, Hao Luo	(参考訳) マスク付き画像モデリング(MIM)は,過去1年間,自己教師型学習において大きな可能性を秘めてきた。 MIMは、ユニバーサルバックボーン・ビジョン・トランスフォーマーから恩恵を受け、画像のパッチの一部を隠蔽し、欠落したピクセルを回復しようとすることで、自己監督された視覚表現を学習する。これまでのほとんどの作業では、画像のパッチをランダムにマスクし、視覚表現学習に有用な意味情報を弱めている。一方、バックボーンの大きさが大きいため、以前のほとんどの作品は事前トレーニングに多くの時間を費やしなければならない。本稿では,上記の2つの問題を解くことができるAttention-driven Masking and Throwing Strategy (AMT)を提案する。まず,教師付き手法を使わずに,学習過程中に画像の意味情報を自動取得するために自己照査機構を利用する。マスキング戦略は、その情報を選択的にマスキング領域に誘導することができ、表現学習に役立つ。さらに,冗長なパッチスロー戦略を提案し,学習をより効率的にする。マスク画像モデリング用プラグアンドプレイモジュールとして、AMTは、CIFAR-10/100, STL-10, Tiny ImageNet, ImageNet-1K上のMAEの線形探索精度を$2.9\% \sim 5.9\%$改善し、MAEとSimMIMの微調整精度に関して改善された性能を得る。さらに、この設計は下流検出およびセグメント化タスクにおいて優れた性能を達成する。 It has been witnessed that masked image modeling (MIM) has shown a huge potential in self-supervised learning in the past year. Benefiting from the universal backbone vision transformer, MIM learns self-supervised visual representations through masking a part of patches of the image while attempting to recover the missing pixels. Most previous works mask patches of the image randomly, which underutilizes the semantic information that is beneficial to visual representation learning. On the other hand, due to the large size of the backbone, most previous works have to spend much time on pre-training. In this paper, we propose Attention-driven Masking and Throwing Strategy (AMT), which could solve both problems above. We first leverage the self-attention mechanism to obtain the semantic information of the image during the training process automatically without using any supervised methods. Masking strategy can be guided by that information to mask areas selectively, which is helpful for representation learning. Moreover, a redundant patch throwing strategy is proposed, which makes learning more efficient. 
As a plug-and-play module for masked image modeling, AMT improves the linear probing accuracy of MAE by $2.9\% \sim 5.9\%$ on CIFAR-10/100, STL-10, Tiny ImageNet, and ImageNet-1K, and obtains an improved performance with respect to fine-tuning accuracy of MAE and SimMIM. Moreover, this design also achieves superior performance on downstream detection and segmentation tasks.	翻訳日:2022-11-29 16:33:15 公開日:2022-11-28
# ディープラーニングオプティマイザの探索-第1および第2次方法- A survey of deep learning optimizers-first and second order methods ( http://arxiv.org/abs/2211.15596v1 ) ライセンス: Link先を確認	Rohan V Kashyap	(参考訳) 深層学習最適化は、サドル点、局所小数点、ヘッセンおよび限られた計算資源の不調和などの固有の困難により、しばしば困難であると見なされる重み空間における高次元損失関数の最小化を伴う。本稿では,深層学習における12の標準最適化手法の包括的レビューを行い,最適化文献から数値最適化の困難さを理論的に評価する。 Deep Learning optimization involves minimizing a high-dimensional loss function in the weight space which is often perceived as difficult due to its inherent difficulties such as saddle points, local minima, ill-conditioning of the Hessian and limited compute resources. In this paper, we provide a comprehensive review of 12 standard optimization methods successfully used in deep learning research and a theoretical assessment of the difficulties in numerical optimization from the optimization literature.	翻訳日:2022-11-29 16:32:25 公開日:2022-11-28
# FaiREE:Finite-Sample と Distribution-free Guarantee による公平な分類 FaiREE: Fair Classification with Finite-Sample and Distribution-Free Guarantee ( http://arxiv.org/abs/2211.15072v1 ) ライセンス: Link先を確認	Puheng Li, James Zou, Linjun Zhang	(参考訳) アルゴリズム的公平性は、機械学習研究においてますます重要な役割を果たす。いくつかのグループフェアネスの概念とアルゴリズムが提案されている。しかし、既存の公平な分類方法の公平性保証は、主に特定のデータ分布の仮定に依存し、多くの場合大きなサンプルサイズを必要とするため、実際によくあるようにサンプル数が少ない場合には公平性に違反する可能性がある。本稿では,有限サンプルと分布フリーな理論保証で群フェアネス制約を満たすフェア分類アルゴリズムであるFaiREEを提案する。 FaiREEは、グループフェアネスの概念(例えば、機会の平等、平等化オッド、デモグラフィックパリティなど)を満たし、最適な精度を達成するように適応することができる。これらの理論的保証は、合成データと実データの両方の実験によってさらに支持される。 FaiREEは最先端のアルゴリズムよりも優れた性能を示した。 Algorithmic fairness plays an increasingly critical role in machine learning research. Several group fairness notions and algorithms have been proposed. However, the fairness guarantee of existing fair classification methods mainly depends on specific data distributional assumptions, often requiring large sample sizes, and fairness could be violated when there is a modest number of samples, which is often the case in practice. In this paper, we propose FaiREE, a fair classification algorithm that can satisfy group fairness constraints with finite-sample and distribution-free theoretical guarantees. FaiREE can be adapted to satisfy various group fairness notions (e.g., Equality of Opportunity, Equalized Odds, Demographic Parity, etc.) and achieve the optimal accuracy. These theoretical guarantees are further supported by experiments on both synthetic and real data. FaiREE is shown to have favorable performance over state-of-the-art algorithms.	翻訳日:2022-11-29 16:15:58 公開日:2022-11-28
# グラフ上のガウス過程のためのトランスダクティブカーネル Transductive Kernels for Gaussian Processes on Graphs ( http://arxiv.org/abs/2211.15322v1 ) ライセンス: Link先を確認	Yin-Cong Zhi, Felix L. Opolka, Yin Cheng Ng, Pietro Liò, Xiaowen Dong	(参考訳) グラフ上のカーネルは、ノードレベルの問題に対する選択肢が限られている。そこで本研究では,ノード特徴データ付きグラフ用カーネルを,半教師付き学習用として提案する。カーネルは、グラフと特徴データを2つのヒルベルト空間として扱うことで正規化フレームワークから派生する。また、グラフ上のカーネルベースのモデルが私たちの設計の例であることも示しています。この方法で定義されたカーネルは、トランスダクティブ特性を持ち、より少ないトレーニングポイントで学習する能力が向上し、高度に非ユークリッドなデータの処理性が向上する。グラフ全体の分布がラベルのパターンを知らせることができる合成データを用いて,これらの利点を実証する。最後に、カーネル内のグラフラプラシアンの柔軟な多項式を利用することで、モデルは様々なレベルのホモフィリーグラフ上の半教師付き分類でも効果的に機能する。 Kernels on graphs have had limited options for node-level problems. To address this, we present a novel, generalized kernel for graphs with node feature data for semi-supervised learning. The kernel is derived from a regularization framework by treating the graph and feature data as two Hilbert spaces. We also show how numerous kernel-based models on graphs are instances of our design. A kernel defined this way has transductive properties, and this leads to improved ability to learn on fewer training points, as well as better handling of highly non-Euclidean data. We demonstrate these advantages using synthetic data where the distribution of the whole graph can inform the pattern of the labels. Finally, by utilizing a flexible polynomial of the graph Laplacian within the kernel, the model also performed effectively in semi-supervised classification on graphs of various levels of homophily.	翻訳日:2022-11-29 16:15:43 公開日:2022-11-28
# 観測データを用いた因果深い強化学習 Causal Deep Reinforcement Learning using Observational Data ( http://arxiv.org/abs/2211.15355v1 ) ライセンス: Link先を確認	Wenxuan Zhu, Chao Yu, Qiang Zhang	(参考訳) 深層強化学習(DRL)は、自動運転車や医療分野など、現実の世界では高価で倫理的ではない多くの介入データを収集する必要がある。オフライン強化学習は、現実世界で利用可能な膨大な観測データを活用することでこの問題を軽減することを約束している。しかし、観測データは、データを生成する行動ポリシーが観測されていない確率変数(つまり共同設立者)に依存する場合、学習エージェントを望ましくない結果へと誤解させる可能性がある。本稿では,この問題に対処するため,DRLにおける2つの分離手法を提案する。提案手法はまず,因果推論手法に基づいて異なるサンプルの重要度を算出し,その不偏性を確保するためにオフラインデータセットを再重み付けあるいは再サンプリングすることにより,損失関数に対する異なるサンプルの影響を調整する。これらの解離法は、これらのアルゴリズムの損失関数によって弱条件を満たすことができることを条件として、ソフトアクター批判や深部Q-ラーニングのような既存のモデルフリーDRLアルゴリズムと柔軟に組み合わせることができる。本手法の有効性を実証し,実験的に検証する。 Deep reinforcement learning (DRL) requires the collection of plenty of interventional data, which is sometimes expensive and even unethical in the real world, such as in the autonomous driving and the medical field. Offline reinforcement learning promises to alleviate this issue by exploiting the vast amount of observational data available in the real world. However, observational data may mislead the learning agent to undesirable outcomes if the behavior policy that generates the data depends on unobserved random variables (i.e., confounders). In this paper, we propose two deconfounding methods in DRL to address this problem. The methods first calculate the importance degree of different samples based on the causal inference technique, and then adjust the impact of different samples on the loss function by reweighting or resampling the offline dataset to ensure its unbiasedness. These deconfounding methods can be flexibly combined with the existing model-free DRL algorithms such as soft actor-critic and deep Q-learning, provided that a weak condition can be satisfied by the loss functions of these algorithms. We prove the effectiveness of our deconfounding methods and validate them experimentally.	翻訳日:2022-11-29 16:15:30 公開日:2022-11-28
# 未知測定ノイズを持つ物理形ニューラルネットワーク Physics-informed neural networks with unknown measurement noise ( http://arxiv.org/abs/2211.15498v1 ) ライセンス: Link先を確認	Philipp Pilar, Niklas Wahlstr\"om	(参考訳) 物理インフォームドニューラルネットワーク(PINN)は、解の発見と偏微分方程式のパラメータの同定の両方に対する柔軟なアプローチである。ほとんどの作業はノイズのないデータや、ガウス雑音によって汚染されたデータを想定している。標準の pinn フレームワークが非ガウスノイズの場合に分解されることを示す。本稿では,この基本的な問題を解決する方法を提供し,エネルギーベースモデル(EBM)を協調訓練して,正しい雑音分布を学習することを提案する。複数の例を用いて,提案手法の性能改善について述べる。 Physics-informed neural networks (PINNs) constitute a flexible approach to both finding solutions and identifying parameters of partial differential equations. Most works on the topic assume noiseless data, or data contaminated by weak Gaussian noise. We show that the standard PINN framework breaks down in case of non-Gaussian noise. We give a way of resolving this fundamental issue and we propose to jointly train an energy-based model (EBM) to learn the correct noise distribution. We illustrate the improved performance of our approach using multiple examples.	翻訳日:2022-11-29 16:15:11 公開日:2022-11-28
# 分散を超えて:"純粋"相関を持つ分布に対するテスト時間ラベルシフト適応 Beyond Invariance: Test-Time Label-Shift Adaptation for Distributions with "Spurious" Correlations ( http://arxiv.org/abs/2211.15646v1 ) ライセンス: Link先を確認	Qingyao Sun (University of Chicago), Kevin Murphy (Google Brain), Sayna Ebrahimi (Google Cloud), Alexander D'Amour (Google Brain)	(参考訳) 厳密な相関、あるいはモデルをデプロイ可能なドメイン間で変化する相関は、機械学習モデルの現実的な応用に重大な課題をもたらす。しかし、そのような相関は常に「純然たる」とは限らない;多くの場合、それらは入力のみから抽出できる以上の予測のための貴重な事前情報を提供する。本稿では,非分散によるスプリアス相関を解消しようとする近年のアプローチとは対照的に,スプリアス相関現象を利用したテスト時間適応法を提案する。クラスラベル $y$ とニュアサンス係数 $z$ の間の限界依存性をモデル化する事前分布 $p(y, z)$ がドメイン間で変化する可能性があるが、フィーチャ $p(\mathbf{x}\|y, z)$ の生成モデルは一定である。これはラベルシフトの仮定の拡張版であり、そこではラベルには$z$というニュアンス要素も含まれている。この観測に基づいて、ソース分布上で$p(y, z\|\mathbf{x})$を予測できるように分類器を訓練し、対象領域からの未ラベルのサンプルを用いて、限界分布$p(y, z)$の変化に対応するテストタイムラベルシフト補正を実装する。我々はこの手法をTTLSA(Test-Time Label-Shift Adaptation)と呼ぶ。我々は、CheXpertの胸部X線データセットと色付きMNISTデータセットの2つの異なる画像データセットに適用し、従来の分布の変化に不変な分類器を訓練する手法よりも、下流結果が優れていることを示す。コード再現実験はhttps://github.com/nalzok/test-time-label-shiftで利用可能である。 Spurious correlations, or correlations that change across domains where a model can be deployed, present significant challenges to real-world applications of machine learning models. However, such correlations are not always "spurious"; often, they provide valuable prior information for a prediction beyond what can be extracted from the input alone. Here, we present a test-time adaptation method that exploits the spurious correlation phenomenon, in contrast to recent approaches that attempt to eliminate spurious correlations through invariance. We consider situations where the prior distribution $p(y, z)$, which models the marginal dependence between the class label $y$ and the nuisance factors $z$, may change across domains, but the generative model for features $p(\mathbf{x}\|y, z)$ is constant. We note that this is an expanded version of the label shift assumption, where the labels now also include the nuisance factors $z$. 
Based on this observation, we train a classifier to predict $p(y, z\|\mathbf{x})$ on the source distribution, and implement a test-time label shift correction that adapts to changes in the marginal distribution $p(y, z)$ using unlabeled samples from the target domain. We call our method "Test-Time Label-Shift Adaptation" or TTLSA. We apply our method to two different image datasets -- the CheXpert chest X-ray dataset and the colored MNIST dataset -- and show that it gives better downstream results than methods that try to train classifiers which are invariant to the changes in prior distribution. Code reproducing experiments is available at https://github.com/nalzok/test-time-label-shift .	翻訳日:2022-11-29 16:15:03 公開日:2022-11-28
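A minimal sketch of the label-shift correction idea in the TTLSA entry above, under stated assumptions. If the feature model $p(\mathbf{x}\|y, z)$ is shared across domains, Bayes' rule gives a target posterior proportional to the source posterior times the prior ratio $p_t(y, z) / p_s(y, z)$. The function names `adapt_posterior` and `estimate_target_prior` are hypothetical, the joint label $(y, z)$ is flattened into a single class index, and the EM-style prior estimate is one standard choice for unlabeled target data, not necessarily the authors' exact procedure (the linked repository is the reference for that).

```python
def adapt_posterior(post, source_prior, target_prior):
    """Reweight one source posterior over joint classes (y, z) by the prior ratio."""
    weights = [p * t / s for p, s, t in zip(post, source_prior, target_prior)]
    total = sum(weights)
    return [w / total for w in weights]


def estimate_target_prior(posteriors, source_prior, n_iter=100):
    """EM-style estimate of the target prior from unlabeled target posteriors.

    posteriors: list of source-classifier outputs p_s(y, z | x), one per sample.
    Assumes every entry of source_prior is strictly positive.
    """
    num_classes = len(source_prior)
    target_prior = list(source_prior)  # start from the source prior
    for _ in range(n_iter):
        # E-step: adapt every posterior with the current prior estimate.
        adapted = [adapt_posterior(p, source_prior, target_prior) for p in posteriors]
        # M-step: the new prior is the average adapted posterior.
        target_prior = [
            sum(a[j] for a in adapted) / len(adapted) for j in range(num_classes)
        ]
    return target_prior
```

At test time one would call `estimate_target_prior` on a batch of unlabeled target outputs, then pass each output through `adapt_posterior` with the estimated prior before marginalizing over $z$ to predict $y$.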
# 時系列予測のための階層誘導モデル選択 Hierarchy-guided Model Selection for Time Series Forecasting ( http://arxiv.org/abs/2211.15092v1 ) ライセンス: Link先を確認	Arindam Jati, Vijay Ekambaram, Shaonli Pal, Brian Quanz, Wesley M. Gifford, Pavithra Harsha, Stuart Siegel, Sumanta Mukherjee, Chandra Narayanaswami	(参考訳) 時系列予測モデルの一般化は、モデル選択の質に依存する。時間的クロスバリデーション(TCV)は予測タスクにおいてモデル選択を行う標準的な手法である。 TCVは、トレーニング時系列を列車および検証ウィンドウに順次分割し、予測モデルのハイパーパラメータ最適化(HPO)を行い、最高の検証性能でモデルを選択する。 TCVを用いたモデル選択は、テストデータの分布が検証データと異なる場合、テスト性能が低下することが多い。本稿では,時系列データセットに関連するデータ階層を利用した新しいモデル選択法H-Proを提案する。一般的に、階層の上位レベルの集約されたデータは、よりスパースで(時には)断続的なボトムレベルのデータと比較して予測可能性と一貫性が向上する。 H-Proは、階層内の上位レベルの教師モデルの集合から得られたテストプロキシ予測に基づいて、最低レベルの学生モデルのHPOを実行する。教師のプロキシ予測の整合性は、最低レベルでより良い生徒モデルを選択するのに役立つ。提案手法の有効性を検証するため,複数のデータセットについて広範な実験を行った。 H-Proは、既成の予測モデルとともに、M5ポイント予測競争の勝利モデルを含む既存の最先端予測手法を上回っている。 Generalizability of time series forecasting models depends on the quality of model selection. Temporal cross validation (TCV) is a standard technique to perform model selection in forecasting tasks. TCV sequentially partitions the training time series into train and validation windows, and performs hyperparameter optimization (HPO) of the forecast model to select the model with the best validation performance. Model selection with TCV often leads to poor test performance when the test data distribution differs from that of the validation data. We propose a novel model selection method, H-Pro, which exploits the data hierarchy often associated with a time series dataset. Generally, the aggregated data at the higher levels of the hierarchy show better predictability and more consistency compared to the bottom-level data which is more sparse and (sometimes) intermittent. H-Pro performs the HPO of the lowest-level student model based on the test proxy forecasts obtained from a set of teacher models at higher levels in the hierarchy. The consistency of the teachers' proxy forecasts helps select better student models at the lowest level. 
We perform extensive empirical studies on multiple datasets to validate the efficacy of the proposed method. H-Pro, combined with off-the-shelf forecasting models, outperforms existing state-of-the-art forecasting methods, including the winning models of the M5 point-forecasting competition.	翻訳日:2022-11-29 15:58:25 公開日:2022-11-28
# GraphPNAS:ディープグラフ生成モデルによる優れたニューラルネットワークの分布学習 GraphPNAS: Learning Distribution of Good Neural Architectures via Deep Graph Generative Models ( http://arxiv.org/abs/2211.15155v1 ) ライセンス: Link先を確認	Muchen Li, Jeffrey Yunfan Liu, Leonid Sigal, Renjie Liao	(参考訳) ニューラルアーキテクチャは自然に計算グラフと見なすことができる。本稿では,この視点に動機づけられ,ランダムグラフモデル学習のレンズを通してニューラルネットワーク探索(nas)について検討する。単一最良アーキテクチャ,すなわち点推定に重点を置いている既存のNAS手法とは対照的に,優れたアーキテクチャの分布を学習するグラフ生成モデルであるGraphPNASを提案する。 GraphPNASはグラフニューラルネットワーク(GNN)に基づいて、優れたニューラルネットワークのトポロジとオペレータ間の関係をよりよく捉えます。さらに, グラフ生成器は, 一般的なrnn生成器やランダム探索法よりも柔軟で効率的な学習可能な確率的探索法をもたらす。最後に、NASのための効率的な強化学習定式化により、発電機を学習する。 GraphPNASの有効性を評価するため,TinyImageNet上でのRandWire,CIFAR10上でのENAS,NAS-Bench-101/201など,3つの検索空間で広範囲にわたる実験を行った。 RandWireの複雑さは他の文献の検索空間よりもはるかに大きい。提案するグラフジェネレータは,RNNベースよりも一貫して優れており,最先端のNAS手法よりも優れた,あるいは同等のパフォーマンスが得られることを示す。 Neural architectures can be naturally viewed as computational graphs. Motivated by this perspective, we, in this paper, study neural architecture search (NAS) through the lens of learning random graph models. In contrast to existing NAS methods which largely focus on searching for a single best architecture, i.e, point estimation, we propose GraphPNAS a deep graph generative model that learns a distribution of well-performing architectures. Relying on graph neural networks (GNNs), our GraphPNAS can better capture topologies of good neural architectures and relations between operators therein. Moreover, our graph generator leads to a learnable probabilistic search method that is more flexible and efficient than the commonly used RNN generator and random search methods. Finally, we learn our generator via an efficient reinforcement learning formulation for NAS. To assess the effectiveness of our GraphPNAS, we conduct extensive experiments on three search spaces, including the challenging RandWire on TinyImageNet, ENAS on CIFAR10, and NAS-Bench-101/201. The complexity of RandWire is significantly larger than other search spaces in the literature. 
We show that our proposed graph generator consistently outperforms the RNN-based one and achieves performance better than or comparable to state-of-the-art NAS methods.	翻訳日:2022-11-29 15:58:04 公開日:2022-11-28
# マルチビュー探索最大化による視覚制御 Tackling Visual Control via Multi-View Exploration Maximization ( http://arxiv.org/abs/2211.15233v1 ) ライセンス: Link先を確認	Mingqi Yuan, Xin Jin, Bo Li, Wenjun Zeng	(参考訳) 本稿では,複雑なビジュアル制御タスクに取り組むためのMEM(マルチビュー探索最大化)を提案する。我々の知る限りでは、MEMは多視点表現学習と本質的な報酬駆動による強化学習(RL)を組み合わせた最初のアプローチである。より具体的には、MEMはまずマルチビュー観察の具体的かつ共有的な情報を抽出し、学習した機能でRLを実行する前に高品質な機能を形成する。さらに、MEMは多視点特徴をエントロピー最大化に基づく固有報酬に変換することにより探索を促進する。その結果、MEMはRLエージェントの試料効率と一般化能力を著しく向上させ、高次元の観測とスパース報酬空間を持つ現実の問題を解くのに役立てることができる。我々は,DeepMind Control Suite と Procgen の様々なタスクにおける MEM の評価を行った。広範なシミュレーション結果から、MEMは優れたパフォーマンスを達成でき、単純なアーキテクチャと高い効率でベンチマークスキームを上回ることが示される。 We present MEM: Multi-view Exploration Maximization for tackling complex visual control tasks. To the best of our knowledge, MEM is the first approach that combines multi-view representation learning and intrinsic reward-driven exploration in reinforcement learning (RL). More specifically, MEM first extracts the specific and shared information of multi-view observations to form high-quality features before performing RL on the learned features, enabling the agent to fully comprehend the environment and yield better actions. Furthermore, MEM transforms the multi-view features into intrinsic rewards based on entropy maximization to encourage exploration. As a result, MEM can significantly promote the sample-efficiency and generalization ability of the RL agent, facilitating solving real-world problems with high-dimensional observations and sparse-reward space. We evaluate MEM on various tasks from DeepMind Control Suite and Procgen games. Extensive simulation results demonstrate that MEM can achieve superior performance and outperform the benchmarking schemes with a simple architecture and higher efficiency.	翻訳日:2022-11-29 15:57:41 公開日:2022-11-28
# 医療意思決定における因果介入のベイズ的ネットワークモデル:文献レビューとソフトウェア評価 Bayesian Network Models of Causal Interventions in Healthcare Decision Making: Literature Review and Software Evaluation ( http://arxiv.org/abs/2211.15258v1 ) ライセンス: Link先を確認	Artem Velikzhanin, Benjie Wang and Marta Kwiatkowska	(参考訳) 本報告は,医療における意思決定を支援するベイズネットワークモデルを特定するための体系的文献探索の結果をまとめたものである。検索手法を説明した後、wang b, lyle c, kwiatkowska m (2021) で開発された因果的介入分析ソフトウェアツールを用いて分析に適した公開モデルとデータセットを識別するために、選択された研究論文を簡潔にレビューする。最後に,ソフトウェアをモデル選択に適用する実験的な評価を行い,予備的な結果を報告する。 This report summarises the outcomes of a systematic literature search to identify Bayesian network models used to support decision making in healthcare. After describing the search methodology, the selected research papers are briefly reviewed, with the view to identify publicly available models and datasets that are well suited to analysis using the causal interventional analysis software tool developed in Wang B, Lyle C, Kwiatkowska M (2021). Finally, an experimental evaluation of applying the software on a selection of models is carried out and preliminary results are reported.	翻訳日:2022-11-29 15:57:23 公開日:2022-11-28
# 時変低ランク自己回帰による時空間データからの動的パターンの発見 Discovering Dynamic Patterns from Spatiotemporal Data with Time-Varying Low-Rank Autoregression ( http://arxiv.org/abs/2211.15482v1 ) ライセンス: Link先を確認	Xinyu Chen and Chengyuan Zhang and Xiaoxu Chen and Nicolas Saunier and Lijun Sun	(参考訳) 本稿では,時空間データ解析における広範な実践的関心,すなわち時空間データからの解釈可能な動的パターンの発見について述べる。この目的に向けて, 係数行列が低ランクテンソル因子分解によってパラメータ化される時間変化低減ランクベクトル自己回帰(var)モデルを開発した。テンソル因子化構造を利用して,モデル圧縮とパターン発見を同時に行うことができる。特に,提案モデルでは時空間データに基づく非定常性と時間変化システムの挙動を特徴付けることができる。提案モデルを評価するために, 流体力学, 海面温度, USA表面温度, NYCタクシートリップなど, 様々な非線形力学系を表す様々な時空間データを用いて実験を行った。実験結果は,時空間データをモデル化し,提案モデルを用いて空間的・時空間的パターンを特徴付ける効果を示す。空間的文脈では、空間的パターンを自動的に抽出し、空間的モードによって直感的に特徴付けることができる。時間的文脈において、複雑な時変系の挙動は、提案されたモデルの時間的モードによって明らかにされる。したがって,本モデルは実世界の動的システムにおける複雑な時空間データを理解するための洞察に富んだ基礎を築いた。適応データセットとPythonの実装はhttps://github.com/xinychen/varsで公開されている。 The problem of broad practical interest in spatiotemporal data analysis, i.e., discovering interpretable dynamic patterns from spatiotemporal data, is studied in this paper. Towards this end, we develop a time-varying reduced-rank vector autoregression (VAR) model whose coefficient matrices are parameterized by low-rank tensor factorization. Benefiting from the tensor factorization structure, the proposed model can simultaneously achieve model compression and pattern discovery. In particular, the proposed model allows one to characterize nonstationarity and time-varying system behaviors underlying spatiotemporal data. To evaluate the proposed model, extensive experiments are conducted on various spatiotemporal data representing different nonlinear dynamical systems, including fluid dynamics, sea surface temperature, USA surface temperature, and NYC taxi trips. Experimental results demonstrate the effectiveness of modeling spatiotemporal data and characterizing spatial/temporal patterns with the proposed model. In the spatial context, the spatial patterns can be automatically extracted and intuitively characterized by the spatial modes. 
In the temporal context, the complex time-varying system behaviors can be revealed by the temporal modes in the proposed model. Thus, our model lays an insightful foundation for understanding complex spatiotemporal data in real-world dynamical systems. The adapted datasets and Python implementation are publicly available at https://github.com/xinychen/vars.	翻訳日:2022-11-29 15:56:51 公開日:2022-11-28
# 学習ブルームフィルタにおける分類器選択の臨界解析 A Critical Analysis of Classifier Selection in Learned Bloom Filters ( http://arxiv.org/abs/2211.15565v1 ) ライセンス: Link先を確認	Dario Malchiodi, Davide Raimondi, Giacomo Fumagalli, Raffaele Giancarlo, Marco Frasca	(参考訳) 学習されたブルームフィルタ、すなわち、機械学習技術を介してデータから誘導されるモデルと、近似された集合メンバシップ問題の解決は、特に空間占有に焦点を当てた標準的なブルームフィルタの性能向上を目的として最近導入された。古典的な場合とは異なり、フィルタを構築するために使用されるデータの「複雑さ」は、その性能に大きな影響を与える可能性がある。そこで本研究では,与えられた分類複雑性のデータセット上で,与えられた学習ブルームフィルタの性能評価を行うための,私たちの知識を最大限活用するための,最初の深度解析を提案する。実際、我々はソフトウェアがサポートする新しい手法を提案し、学習されたブルームフィルタの設計、解析、実装を行い、そのマルチクリトリア性(すなわち、空間効率、偽陽性率、拒絶時間を含む制約)に特定の制約を課す。提案手法と支援ソフトウェアが有効で有用であることを示す実験により,データ複雑性の異なる問題に対して,2つの分類器だけが望ましい特性を持つことが判明し,文献にはこれまで検討されていない。また,学習されたブルームフィルタのサンドウィッチ化が,データ複雑性や分類器の性能変動に対して最も頑健であることも実験的に示した。このソフトウェアは、新たに学習されたbloomフィルタの提案をテストするために簡単に利用できる。 Learned Bloom Filters, i.e., models induced from data via machine learning techniques and solving the approximate set membership problem, have recently been introduced with the aim of enhancing the performance of standard Bloom Filters, with special focus on space occupancy. Unlike in the classical case, the "complexity" of the data used to build the filter might heavily impact on its performance. Therefore, here we propose the first in-depth analysis, to the best of our knowledge, for the performance assessment of a given Learned Bloom Filter, in conjunction with a given classifier, on a dataset of a given classification complexity. Indeed, we propose a novel methodology, supported by software, for designing, analyzing and implementing Learned Bloom Filters in function of specific constraints on their multi-criteria nature (that is, constraints involving space efficiency, false positive rate, and reject time). 
Our experiments show that the proposed methodology and the supporting software are valid and useful: we find that only two classifiers have desirable properties in relation to problems with different data complexity, and, interestingly, neither of them has been considered so far in the literature. We also show experimentally that the Sandwiched variant of Learned Bloom Filters is the most robust to variability in data complexity and classifier performance, and that it usually has smaller reject times. The software can be readily used to test new Learned Bloom Filter proposals, which can be compared with the best ones identified here.	翻訳日:2022-11-29 15:56:30 公開日:2022-11-28
# 強化学習における知識伝達のための不適用行動学習 Inapplicable Actions Learning for Knowledge Transfer in Reinforcement Learning ( http://arxiv.org/abs/2211.15589v1 ) ライセンス: Link先を確認	Leo Ardon, Alberto Pozanco, Daniel Borrajo, Sumitra Ganesh	(参考訳) 強化学習(rl)アルゴリズムは、利用可能なアクションがたくさんある環境ではスケールが悪く、最適なポリシーを学ぶために多数のサンプルを必要とすることが知られている。あらゆる可能な状態において同じ固定されたアクション空間を考える伝統的なアプローチは、エージェントが、その報酬を最大化するためにも、$\textit{inapplicable actions}$のような無関係なアクション(つまり、与えられた状態において実行された環境に影響を与えないアクション)を無視しなければならないことを意味する。この情報を知ることで、ポリシー分布から適用不可能なアクションを隠蔽し、最適なポリシーを見つけるためのアクションのみを探索することで、RLアルゴリズムのサンプルの複雑さを低減することができる。これは通常、RLアルゴリズムに手作りのドメインロジックを追加してアドホックな方法で行われる。本稿では,この知識をアルゴリズムに導入するためのより体系的な手法を提案する。私たち (i) エージェントに対して知識を手動で指定する方法を標準化すること。 (II)政策と協調してこれらの国家依存的行動制約を自律的に学習する新しい枠組みを提案する。本研究では,学習不可能な動作が,無関係な動作を隠蔽する信頼性の高い信号を提供することにより,アルゴリズムのサンプル効率を大幅に向上することを示す。さらに,取得した知識の伝達性により,学習プロセスを効率化するために他のタスクで再利用できることを実証する。 Reinforcement Learning (RL) algorithms are known to scale poorly to environments with many available actions, requiring numerous samples to learn an optimal policy. The traditional approach of considering the same fixed action space in every possible state implies that the agent must understand, while also learning to maximize its reward, to ignore irrelevant actions such as $\textit{inapplicable actions}$ (i.e. actions that have no effect on the environment when performed in a given state). Knowing this information can help reduce the sample complexity of RL algorithms by masking the inapplicable actions from the policy distribution to only explore actions relevant to finding an optimal policy. This is typically done in an ad-hoc manner with hand-crafted domain logic added to the RL algorithm. In this paper, we propose a more systematic approach to introduce this knowledge into the algorithm. We (i) standardize the way knowledge can be manually specified to the agent; and (ii) present a new framework to autonomously learn these state-dependent action constraints jointly with the policy. 
We show experimentally that learning inapplicable actions greatly improves the sample efficiency of the algorithm by providing a reliable signal to mask out irrelevant actions. Moreover, we demonstrate that thanks to the transferability of the knowledge acquired, it can be reused in other tasks to make the learning process more efficient.	翻訳日:2022-11-29 15:56:06 公開日:2022-11-28
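A minimal sketch of the masking step mentioned in the entry above ("masking the inapplicable actions from the policy distribution"), assuming the applicability mask for the current state is already given. The paper's contribution is learning that mask jointly with the policy, which this example does not attempt. The name `masked_softmax` is hypothetical and the example uses only the standard library.

```python
import math


def masked_softmax(logits, applicable):
    """Softmax over action logits that gives inapplicable actions probability 0.

    logits: list of floats, one per action.
    applicable: list of bools, True where the action is applicable in this state.
    """
    if not any(applicable):
        raise ValueError("at least one action must be applicable")
    # Inapplicable actions get a logit of -inf, so exp(...) below is exactly 0.
    masked = [l if ok else -math.inf for l, ok in zip(logits, applicable)]
    top = max(masked)  # subtract the maximum for numerical stability
    exps = [math.exp(l - top) for l in masked]
    total = sum(exps)
    return [e / total for e in exps]
```

Because masked actions receive exactly zero probability, the agent never samples them, and the remaining probability mass is renormalized over the applicable actions only.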
# 小口径バイオメディカルデータに対する特徴選択による重み予測ネットワーク Weight Predictor Network with Feature Selection for Small Sample Tabular Biomedical Data ( http://arxiv.org/abs/2211.15616v1 ) ライセンス: Link先を確認	Andrei Margeloiu, Nikola Simidjievski, Pietro Lio, Mateja Jamnik	(参考訳) タブラルバイオメディカルデータはしばしば高次元であるが、非常に少数のサンプルを持つ。最近の研究は、よく規則化された単純なニューラルネットワークが、グラフデータ上のより洗練されたアーキテクチャよりも優れていることを示したが、それでも多くの潜在的に無関係な機能を持つ小さなデータセットに過度に適合する傾向にある。これらの問題に対処するために,ニューラルネットワークを高次元および小サンプルデータから学習するための重み予測器ネットワーク(WPFS)を提案し,学習可能なパラメータの数を削減し,同時に特徴選択を行う。分類ネットワークに加えて、WPFSは2つの小さな補助ネットワークを使用して分類モデルの第一層の重みを出力する。我々は9つの実世界のバイオメディカルデータセットを評価し、wpfが他の標準よりも優れており、表データに適用するより最近の方法であることを示す。さらに,提案する特徴選択機構について検討し,学習課題に対する有用な洞察を提供しながら,性能の向上を示す。 Tabular biomedical data is often high-dimensional but with a very small number of samples. Although recent work showed that well-regularised simple neural networks could outperform more sophisticated architectures on tabular data, they are still prone to overfitting on tiny datasets with many potentially irrelevant features. To combat these issues, we propose Weight Predictor Network with Feature Selection (WPFS) for learning neural networks from high-dimensional and small sample data by reducing the number of learnable parameters and simultaneously performing feature selection. In addition to the classification network, WPFS uses two small auxiliary networks that together output the weights of the first layer of the classification model. We evaluate on nine real-world biomedical datasets and demonstrate that WPFS outperforms other standard as well as more recent methods typically applied to tabular data. Furthermore, we investigate the proposed feature selection mechanism and show that it improves performance while providing useful insights into the learning task.	翻訳日:2022-11-29 15:55:43 公開日:2022-11-28
# 条件付き生成モデリングは意思決定に必要なすべてか? Is Conditional Generative Modeling all you need for Decision-Making? ( http://arxiv.org/abs/2211.15657v1 ) ライセンス: Link先を確認	Anurag Ajay, Yilun Du, Abhi Gupta, Joshua Tenenbaum, Tommi Jaakkola, Pulkit Agrawal	(参考訳) 近年の条件生成モデルの改良により,言語記述だけで高品質な画像を生成することが可能になった。これらの手法が逐次意思決定の問題に直接対処できるかどうかを検討する。我々は、強化学習(RL)のレンズを通してではなく、条件付き生成モデルを通して意思決定を行う。驚いたことに、私たちの定式化は、標準ベンチマークで既存のオフラインRLアプローチを上回り得るポリシーにつながります。ポリシーを戻り条件拡散モデルとしてモデル化することで、動的プログラミングの必要性を回避し、それから従来のオフラインrlで発生する多くの複雑さを排除する方法を説明します。さらに,条件拡散モデルとしてのポリシーモデリングの利点を,制約とスキルの2つの条件変数を考慮に入れて実証する。トレーニング中の単一の制約やスキルの条件付けは、複数の制約を満たすか、あるいはスキルの組み合わせを示すテスト時の振る舞いにつながります。条件付き生成モデリングは意思決定のための強力なツールであることを示す。 Recent improvements in conditional generative modeling have made it possible to generate high-quality images from language descriptions alone. We investigate whether these methods can directly address the problem of sequential decision-making. We view decision-making not through the lens of reinforcement learning (RL), but rather through conditional generative modeling. To our surprise, we find that our formulation leads to policies that can outperform existing offline RL approaches across standard benchmarks. By modeling a policy as a return-conditional diffusion model, we illustrate how we may circumvent the need for dynamic programming and subsequently eliminate many of the complexities that come with traditional offline RL. We further demonstrate the advantages of modeling policies as conditional diffusion models by considering two other conditioning variables: constraints and skills. Conditioning on a single constraint or skill during training leads to behaviors at test-time that can satisfy several constraints together or demonstrate a composition of skills. Our results illustrate that conditional generative modeling is a powerful tool for decision-making.	翻訳日:2022-11-29 15:55:27 公開日:2022-11-28
# AcceRL: 深層強化学習のための政策加速フレームワーク AcceRL: Policy Acceleration Framework for Deep Reinforcement Learning ( http://arxiv.org/abs/2211.15023v1 ) ライセンス: Link先を確認	Hongjie Zhang	(参考訳) 深層強化学習はその超意思決定能力で様々な分野で大きな成功を収めた。しかし、政策学習プロセスは大量の訓練時間を必要とし、エネルギー消費を引き起こす。ニューラルネットワークの冗長性に触発されて,ニューラルネットワーク圧縮に基づく軽量並列学習フレームワーク AcceRL を提案する。具体的には、さまざまなニューラルネットワーク圧縮手法を柔軟に組み合わせて、経験収集を高速化する。全体としてAcceRLはアクタ、学習者、圧縮機、補正器、モニターの5つのコンポーネントで構成されている。アクターはコンプレッサーを使用して学習者のポリシーネットワークを圧縮し、環境と対話する。そして生成されたエクスペリエンスは、V-trace、Retraceなどのオフポリシーメソッドを用いて補正器によって変換される。そして、修正された経験を学習者に与えてポリシー学習を行う。これは、複数のニューラルネットワーク圧縮技術を組み込んだ最初の汎用強化学習フレームワークであると考えています。Gym環境で行われた大規模な実験では、AcceRLは従来の方法と比較してアクターの時間コストを約2.0Xから4.13Xに削減している。さらに、AcceRLは従来の方法と比較してトレーニング全体の時間を29.8%から40.3%削減し、同じポリシー品質を維持している。 Deep reinforcement learning has achieved great success in various fields with its super decision-making ability. However, the policy learning process requires a large amount of training time, causing energy consumption. Inspired by the redundancy of neural networks, we propose a lightweight parallel training framework based on neural network compression, AcceRL, to accelerate the policy learning while ensuring policy quality. Specifically, AcceRL speeds up the experience collection by flexibly combining various neural network compression methods. Overall, AcceRL consists of five components, namely Actor, Learner, Compressor, Corrector, and Monitor. The Actor uses the Compressor to compress the Learner's policy network to interact with the environment. The generated experiences are then transformed by the Corrector with off-policy methods such as V-trace and Retrace. The corrected experiences are then fed to the Learner for policy learning. We believe this is the first general reinforcement learning framework that incorporates multiple neural network compression techniques. Extensive experiments conducted in gym show that AcceRL reduces the time cost of the actor by about 2.0 X to 4.13 X compared to the traditional methods. 
Furthermore, AcceRL reduces the overall training time by about 29.8% to 40.3% compared to the traditional methods while maintaining the same policy quality.	翻訳日:2022-11-29 15:47:46 公開日:2022-11-28
# 質的制約付き強化学習:停電確率を制約する強化学習フレームワーク Quantile Constrained Reinforcement Learning: A Reinforcement Learning Framework Constraining Outage Probability ( http://arxiv.org/abs/2211.15034v1 ) ライセンス: Link先を確認	Whiyoung Jung, Myungsik Cho, Jongeui Park, Youngchul Sung	(参考訳) 制約強化学習(restricted reinforcement learning, rl)は、与えられた制約を満たしながら、期待累積回帰を最大化する最適方針を見つけることを目的とした、rlの領域である。以前の制約付きrlワークのほとんどは、期待累積和コストを制約として考慮している。しかし、この制約による最適化は、累積和コストが所定の閾値を超えるような停止事象の目標確率を保証できない。本稿では,停止制約を満たすために必要な十分条件である累積和コスト分布の量子化を制約する,quantile restricteded rl(qcrl)という枠組みを提案する。これは、ポリシー勾配定理を量子論に適用する問題に取り組み、量子論の勾配を近似するための理論的結果を提供する最初の研究である。導出した理論結果とラグランジュ乗算器の手法に基づき、量子量制限ポリシー最適化(qcpo)と呼ばれる制約付きrlアルゴリズムを構築した。我々は,大偏差原理(LDP)を用いた分布RLを用いて,QCPOの実装における累積和コストの定量値とテール確率を推定する。実装されたアルゴリズムは、トレーニング期間後の停止確率制約を満たす。 Constrained reinforcement learning (RL) is an area of RL whose objective is to find an optimal policy that maximizes expected cumulative return while satisfying a given constraint. Most of the previous constrained RL works consider expected cumulative sum cost as the constraint. However, optimization with this constraint cannot guarantee a target probability of outage event that the cumulative sum cost exceeds a given threshold. This paper proposes a framework, named Quantile Constrained RL (QCRL), to constrain the quantile of the distribution of the cumulative sum cost that is a necessary and sufficient condition to satisfy the outage constraint. This is the first work that tackles the issue of applying the policy gradient theorem to the quantile and provides theoretical results for approximating the gradient of the quantile. Based on the derived theoretical results and the technique of the Lagrange multiplier, we construct a constrained RL algorithm named Quantile Constrained Policy Optimization (QCPO). We use distributional RL with the Large Deviation Principle (LDP) to estimate quantiles and tail probability of the cumulative sum cost for the implementation of QCPO. 
The implemented algorithm satisfies the outage probability constraint after the training period.	翻訳日:2022-11-29 15:47:28 公開日:2022-11-28
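A small illustration of the claim in the entry above that a quantile constraint is equivalent to an outage constraint, while an expected-cost constraint is not. For a threshold $d$ and outage level $\epsilon$, $P(C > d) \le \epsilon$ holds exactly when the $(1-\epsilon)$-quantile of $C$ is at most $d$. The sketch works on an empirical sample of cumulative costs. All function names and the sample data are made up for illustration and are not taken from the paper.

```python
def outage_probability(costs, threshold):
    """Empirical probability that the cumulative cost exceeds the threshold."""
    return sum(c > threshold for c in costs) / len(costs)


def upper_quantile(costs, eps):
    """Smallest sample value x with empirical P(C > x) <= eps.

    This is the empirical (1 - eps)-quantile; it assumes 0 <= eps < 1.
    """
    for x in sorted(costs):
        if outage_probability(costs, x) <= eps:
            return x
    raise ValueError("eps must be non-negative")


def satisfies_outage_constraint(costs, threshold, eps):
    """Check the outage constraint through the equivalent quantile constraint."""
    return upper_quantile(costs, eps) <= threshold


# Hypothetical sample: cheap on most episodes, very costly on a few.
sample_costs = [0.0] * 8 + [50.0] * 2
mean_cost = sum(sample_costs) / len(sample_costs)
# The mean meets a threshold of 10, yet the cost exceeds 10 in 2 of 10 episodes.
mean_ok = mean_cost <= 10.0
outage_ok = satisfies_outage_constraint(sample_costs, 10.0, eps=0.1)
```

In this sample the expected-cost constraint is met while the outage constraint at $\epsilon = 0.1$ is not, which is the gap the quantile formulation is meant to close.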
# オフライン強化学習のための状態認識近位悲観的アルゴリズム State-Aware Proximal Pessimistic Algorithms for Offline Reinforcement Learning ( http://arxiv.org/abs/2211.15065v1 ) ライセンス: Link先を確認	Chen Chen, Hongyao Tang, Yi Ma, Chao Wang, Qianli Shen, Dong Li, Jianye Hao	(参考訳) ペシミズムはオフライン強化学習(RL)において非常に重要である。オフラインRLアルゴリズムの幅広いカテゴリは、明示的または暗黙的な振舞い規則化によって悲観主義を満たす。しかし、そのほとんどは、オフライン状態の分布が学習方針とどのように異なるかという影響を無視して、行動規則化として政策の分岐のみを考慮する。この問題を考慮し、オフラインRLのための原理的アルゴリズムフレームワークである State-Aware Proximal Pessimism (SA-PP) を提案する。 SA-PPの鍵となる考え方は、学習ポリシーとオフラインデータセット間の定常状態分布比の割引を利用して、状態ワイドな振る舞い規則化の度合いを調整し、悲観性をより適切な方法で実装できるようにすることである。まず, 従来のアルゴリズムよりもSA-PPの方が優れていることの理論的正当性を示し, 幅広い設定において, SA-PPが下位最適上界を生成することを示す。さらに、DualDICEの助けを借りて、SA-PPを代表CQLアルゴリズム上に構築し、割引された定常状態分布比を推定することで、SA-CQLと呼ばれる新しいアルゴリズムを提案する。標準のオフラインRLベンチマークに対する大規模な実験は、SA-CQLがベンチマークの大部分で一般的なベースラインを上回っ、最も高い平均リターンを達成したことを示している。 Pessimism is of great importance in offline reinforcement learning (RL). One broad category of offline RL algorithms fulfills pessimism by explicit or implicit behavior regularization. However, most of them only consider policy divergence as behavior regularization, ignoring the effect of how the offline state distribution differs from that of the learning policy, which may lead to under-pessimism for some states and over-pessimism for others. Taking account of this problem, we propose a principled algorithmic framework for offline RL, called State-Aware Proximal Pessimism (SA-PP). The key idea of SA-PP is leveraging discounted stationary state distribution ratios between the learning policy and the offline dataset to modulate the degree of behavior regularization in a state-wise manner, so that pessimism can be implemented in a more appropriate way. We first provide theoretical justifications on the superiority of SA-PP over previous algorithms, demonstrating that SA-PP produces a lower suboptimality upper bound in a broad range of settings. 
Furthermore, we propose a new algorithm named State-Aware Conservative Q-Learning (SA-CQL), built by applying SA-PP to the representative CQL algorithm, with DualDICE used to estimate the discounted stationary state distribution ratios. Extensive experiments on standard offline RL benchmarks show that SA-CQL outperforms popular baselines on a large portion of the benchmarks and attains the highest average return.	翻訳日:2022-11-29 15:47:08 公開日:2022-11-28
# 事前データのない設計への学習:ディープラーニングと木探索を用いた汎用設計戦略の発見 Learning to design without prior data: Discovering generalizable design strategies using deep learning and tree search ( http://arxiv.org/abs/2211.15068v1 ) ライセンス: Link先を確認	Ayush Raina, Jonathan Cagan, Christopher McComb	(参考訳) 独自に設計できるAIエージェントの構築は1980年代から目標とされてきた。近年、ディープラーニングは大規模データから学習する能力を示し、データ駆動設計の大幅な進歩を可能にしている。しかし、事前のデータから学ぶことは、以前解決した問題を解決することのみを制限し、データ駆動学習を既存のソリューションに偏らせる。設計エージェントの最終的な目標は、問題空間における一般的な設計動作を、これまで見たことのないまま学習する能力である。本稿では,この目標を達成するための自己学習エージェントフレームワークを提案する。このフレームワークは,木探索が問題空間を探索する新しい木探索アルゴリズムと深いポリシーネットワークを統合し,深いポリシーネットワークは自己生成した経験を活用して探索をさらに誘導する。このフレームワークは、まず、先行データなしで高性能な生成戦略を発見する能力を示し、次に、未知の境界条件をまたいだ生成戦略のゼロショット一般化を示す。本研究は,2つのエンジニアリング設計問題の複数バージョンを再訓練せずに解くことにより,フレームワークの有効性と汎用性を評価する。本稿では,任意の問題空間における自己学習型ハイパフォーマンス・一般化可能な問題解決行動の方法論を提案し,専門家データ,既存ソリューション,問題固有学習の必要性を回避した。 Building an AI agent that can design on its own has been a goal since the 1980s. Recently, deep learning has shown the ability to learn from large-scale data, enabling significant advances in data-driven design. However, learning over prior data limits us only to solve problems that have been solved before and biases data-driven learning towards existing solutions. The ultimate goal for a design agent is the ability to learn generalizable design behavior in a problem space without having seen it before. We introduce a self-learning agent framework in this work that achieves this goal. This framework integrates a deep policy network with a novel tree search algorithm, where the tree search explores the problem space, and the deep policy network leverages self-generated experience to guide the search further. This framework first demonstrates an ability to discover high-performing generative strategies without any prior data, and second, it illustrates a zero-shot generalization of generative strategies across various unseen boundary conditions. 
This work evaluates the effectiveness and versatility of the framework by solving multiple versions of two engineering design problems without retraining. Overall, this paper presents a methodology to self-learn high-performing and generalizable problem-solving behavior in an arbitrary problem space, circumventing the need for expert data, existing solutions, and problem-specific learning.	翻訳日:2022-11-29 15:46:37 公開日:2022-11-28
# flip initial features: 半教師付きノード分類のためのニューラルネットワークの一般化 Flip Initial Features: Generalization of Neural Networks for Semi-supervised Node Classification ( http://arxiv.org/abs/2211.15081v1 ) ライセンス: Link先を確認	Yoonhyuk Choi, Chong-Kwon Kim	(参考訳) グラフニューラルネットワーク(GNN)は、半教師付き設定下で広く利用されている。以前の研究は主に、好気性グラフと好気性グラフの両方をよく一般化するための適切なグラフフィルタ(例えば集約スキーム)を見つけることに重点を置いてきた。これらのアプローチは必須かつ効果的ではあるが、単語の袋表現に内在する初期ノードの特徴のスパースに苦しむ。半教師付き学習では、トレーニングサンプルがグラフフィルタ(超平面)の全次元をカバーできない場合があり、これは第1のプロジェクター行列における特定の次元の過度な適合を生じさせる。この問題に対処するために、我々は単純で新しい戦略を提案し、初期特徴と超平面を同時に反転させて追加空間を作成する。オリジナルとフリップスペースの両方でのトレーニングは、学習可能なパラメータの正確な更新を提供することができる。我々の知る限りでは、これはGNNのオーバーフィッティング問題を効果的に緩和する最初の試みである。実世界のデータセットに対する大規模な実験により、提案手法はノード分類精度を最大40.2%改善することを示した。 Graph neural networks (GNNs) have been widely used under semi-supervised settings. Prior studies have mainly focused on finding appropriate graph filters (e.g., aggregation schemes) to generalize well for both homophilic and heterophilic graphs. Even though these approaches are essential and effective, they still suffer from the sparsity in initial node features inherent in the bag-of-words representation. Common in semi-supervised learning where the training samples often fail to cover the entire dimensions of graph filters (hyperplanes), this can precipitate over-fitting of specific dimensions in the first projection matrix. To deal with this problem, we suggest a simple and novel strategy; create additional space by flipping the initial features and hyperplane simultaneously. Training in both the original and in the flip space can provide precise updates of learnable parameters. To the best of our knowledge, this is the first attempt that effectively moderates the overfitting problem in GNN. Extensive experiments on real-world datasets demonstrate that the proposed technique improves the node classification accuracy up to 40.2 %	翻訳日:2022-11-29 15:46:15 公開日:2022-11-28
# si-gat:ソナー画像分類のための改良グラフアテンションネットワークに基づく手法 SI-GAT: A method based on improved Graph Attention Network for sonar image classification ( http://arxiv.org/abs/2211.15133v1 ) ライセンス: Link先を確認	Can Lei and Huigang Wang and Juan Lei	(参考訳) 深層学習に基づく既存のソナー画像分類法は、局所像の特徴のみを考慮してユークリッド空間でしばしば分析される。そこで本稿では,複数種類の撮像ソナーに適用可能な改良型グラフアテンションネットワーク (gat) に基づくソナー分類法を提案する。本手法は,非ユークリッド空間におけるソナー特性を表す色近距離と空間近距離の連成計算に基づいてノード間の相関関係を定量化し,KNN(K-Nearest Neighbor)アルゴリズムを用いて,注目係数行列と結合してSI-GATの鍵部分を構成するグラフアテンション機構の近傍範囲と隣接行列を決定する。このSI-GATは、実データの検証を通じてユークリッド空間に基づくCNN(Convolutional Neural Network)手法よりも優れている。 The existing sonar image classification methods based on deep learning are often analyzed in Euclidean space, only considering the local image features. For this reason, this paper presents a sonar classification method based on improved Graph Attention Network (GAT), namely SI-GAT, which is applicable to multiple types imaging sonar. This method quantifies the correlation relationship between nodes based on the joint calculation of color proximity and spatial proximity that represent the sonar characteristics in non-Euclidean space, then the KNN (K-Nearest Neighbor) algorithm is used to determine the neighborhood range and adjacency matrix in the graph attention mechanism, which are jointly considered with the attention coefficient matrix to construct the key part of the SI-GAT. This SI-GAT is superior to several CNN (Convolutional Neural Network) methods based on Euclidean space through validation of real data.	翻訳日:2022-11-29 15:41:04 公開日:2022-11-28
# 順序距離学習のための角三角形距離 Angular triangle distance for ordinal metric learning ( http://arxiv.org/abs/2211.15200v1 ) ライセンス: Link先を確認	Imam Mustafa Kamal and Hyerim Bae	(参考訳) deep metric learning(dml)は、タスク固有の距離やデータの類似性を自動的に構築することを目的としている。いくつかの重要なメトリックラーニング手法が提案されている。それでも、低次元空間における元のデータの順序的性質の保存は保証されない。通常のデータは、バイオメディカルケースにおける症状の重症度、製造における生産品質、企業における格付けレベル、顔認識における老化レベルなど、現実世界の問題においてユビキタスである。本研究では,新しい三角形距離 (ATD) と順序三重項ネットワーク (OTD) を提案し,順序データに対する高精度で有意義な埋め込み空間表現を求める。 ATDは角空間におけるデータの順序関係を投影し、OTDはその順序関係を学習する。また、新しい距離測度が数学的に距離計量特性を満たすことを示した。提案手法は,生体情報,顔画像,手指画像などの順序的性質を持つ実世界データを用いて評価した。その結果,提案手法は順序性だけでなく,既存のDMLモデルよりも正確であることがわかった。さらに,提案手法は,最先端の順序数学習法よりも優れていることを示す。 Deep metric learning (DML) aims to automatically construct task-specific distances or similarities of data, resulting in a low-dimensional representation. Several significant metric-learning methods have been proposed. Nonetheless, no approach guarantees the preservation of the ordinal nature of the original data in a low-dimensional space. Ordinal data are ubiquitous in real-world problems, such as the severity of symptoms in biomedical cases, production quality in manufacturing, rating level in businesses, and aging level in face recognition. This study proposes a novel angular triangle distance (ATD) and ordinal triplet network (OTD) to obtain an accurate and meaningful embedding space representation for ordinal data. The ATD projects the ordinal relation of data in the angular space, whereas the OTD learns its ordinal projection. We also demonstrated that our new distance measure satisfies the distance metric properties mathematically. The proposed method was assessed using real-world data with an ordinal nature, such as biomedical, facial, and hand-gestured images. Extensive experiments have been conducted, and the results show that our proposed method not only semantically preserves the ordinal nature but is also more accurate than existing DML models. 
Moreover, we also demonstrate that our proposed method outperforms the state-of-the-art ordinal metric learning method.	翻訳日:2022-11-29 15:40:47 公開日:2022-11-28
# MicroAST: 超高分解能任意型トランスファーを目指して MicroAST: Towards Super-Fast Ultra-Resolution Arbitrary Style Transfer ( http://arxiv.org/abs/2211.15313v1 ) ライセンス: Link先を確認	Zhizhong Wang, Lei Zhao, Zhiwen Zuo, Ailin Li, Haibo Chen, Wei Xing, Dongming Lu	(参考訳) 任意スタイル転送(AST)は、任意の芸術スタイルをコンテンツイメージに転送する。最近の急速な進歩にもかかわらず、既存のastメソッドは、リソースが限られている超高解像度(4kなど)で実行できないか、遅すぎるため、さらなるアプリケーションを妨げる。本稿では,MicroASTと呼ばれる単純で軽量なモデルを学ぶことで,このジレンマに対処する。鍵となる洞察は、推論時に面倒な事前訓練されたDeep Convolutional Neural Networks(例えばVGG)の使用を完全に放棄することである。代わりに、2つのマイクロエンコーダ(コンテンツエンコーダとスタイルエンコーダ)と1つのマイクロデコーダを設計する。コンテンツエンコーダは、コンテンツ画像の主構造を抽出することを目的とする。スタイルエンコーダは、変調器と組み合わせて、このスタイル画像を学習可能なデュアル変調信号に符号化し、デコーダの中間特徴と畳み込みフィルタの両方を変調し、より洗練され柔軟なスタイル信号を注入してスタイル化を導く。さらに、より明瞭で代表的なスタイル信号を抽出するスタイルエンコーダの能力を高めるために、我々のモデルに新しいスタイル信号のコントラストロスを導入する。この技術と比較すると、私たちのMicroASTは視覚的に優れた結果をもたらすだけでなく、5-73倍小さく、6-18倍速く、初めて超高速(0.5秒)のASTを4K超解像度で実現しました。コードはhttps://github.com/EndyWon/MicroASTで入手できる。 Arbitrary style transfer (AST) transfers arbitrary artistic styles onto content images. Despite the recent rapid progress, existing AST methods are either incapable or too slow to run at ultra-resolutions (e.g., 4K) with limited resources, which heavily hinders their further applications. In this paper, we tackle this dilemma by learning a straightforward and lightweight model, dubbed MicroAST. The key insight is to completely abandon the use of cumbersome pre-trained Deep Convolutional Neural Networks (e.g., VGG) at inference. Instead, we design two micro encoders (content and style encoders) and one micro decoder for style transfer. The content encoder aims at extracting the main structure of the content image. The style encoder, coupled with a modulator, encodes the style image into learnable dual-modulation signals that modulate both intermediate features and convolutional filters of the decoder, thus injecting more sophisticated and flexible style signals to guide the stylizations. 
In addition, to boost the ability of the style encoder to extract more distinct and representative style signals, we also introduce a new style signal contrastive loss in our model. Compared to the state of the art, our MicroAST not only produces visually superior results but also is 5-73 times smaller and 6-18 times faster, for the first time enabling super-fast (about 0.5 seconds) AST at 4K ultra-resolutions. Code is available at https://github.com/EndyWon/MicroAST.	翻訳日:2022-11-29 15:40:25 公開日:2022-11-28
# 知覚、基礎、理性、行動:汎用視覚表現のためのベンチマーク Perceive, Ground, Reason, and Act: A Benchmark for General-purpose Visual Representation ( http://arxiv.org/abs/2211.15402v1 ) ライセンス: Link先を確認	Jiangyong Huang, William Yicheng Zhu, Baoxiong Jia, Zan Wang, Xiaojian Ma, Qing Li, Siyuan Huang	(参考訳) 現在のコンピュータビジョンモデルは、人間の視覚システムとは異なり、汎用的な視覚理解をまだ達成できていない。一般的なビジョンモデルを作成する既存の取り組みは、評価されたタスクの範囲に制限があり、それらを全体的に実行する包括的なフレームワークを提供していません。我々は,4つの機能ドメインを持つ視覚認知能力の全スペクトルを包括的に網羅した,汎用視覚理解評価(General-purpose Visual Understanding Evaluation, G-VUE)を提案する。 4つのドメインは、3D再構成から視覚的推論や操作まで、11の注意深くキュレートされたタスクに具体化されている。ベンチマークとともに、11タスクの任意の視覚表現を評価するための一般的なエンコーダ・デコーダフレームワークを提供する。我々は,(1)トランスフォーマーベースの視覚バックボーンが,G-VUE上でCNNベースのバックボーンよりも優れており,(2)視覚言語による事前学習による視覚表現が視覚タスクを横断する視覚のみの事前学習よりも優れていることを確認する。 G-VUEでは,より汎用的な視覚表現を得ることで,汎用視覚システム構築に向けた研究のモチベーションを高めるための総合的評価基準を提供する。 Current computer vision models, unlike the human visual system, cannot yet achieve general-purpose visual understanding. Existing efforts to create a general vision model are limited in the scope of assessed tasks and offer no overarching framework to perform them holistically. We present a new comprehensive benchmark, General-purpose Visual Understanding Evaluation (G-VUE), covering the full spectrum of visual cognitive abilities with four functional domains: Perceive, Ground, Reason, and Act. The four domains are embodied in 11 carefully curated tasks, from 3D reconstruction to visual reasoning and manipulation. Along with the benchmark, we provide a general encoder-decoder framework to allow for the evaluation of arbitrary visual representation on all 11 tasks. We evaluate various pre-trained visual representations with our framework and observe that (1) Transformer-based visual backbone generally outperforms CNN-based backbone on G-VUE, (2) visual representations from vision-language pre-training are superior to those with vision-only pre-training across visual tasks. 
With G-VUE, we provide a holistic evaluation standard to motivate research toward building general-purpose visual systems via obtaining more general-purpose visual representations.	翻訳日:2022-11-29 15:39:57 公開日:2022-11-28
# FsaNet: セマンティックセグメンテーションのための周波数自己注意 FsaNet: Frequency Self-attention for Semantic Segmentation ( http://arxiv.org/abs/2211.15595v1 ) ライセンス: Link先を確認	Fengyu Zhang, Ashkan Panahi, Guangjun Gao	(参考訳) 画像のスペクトル特性を考慮し,線形速度まで計算複雑性を低減した新しい自己注意機構を提案する。オブジェクト内の類似性を促進しつつエッジの保存性を向上させるため,周波数帯域の異なる個別化プロセスを提案する。特に, プロセスが低周波成分上のみである場合について検討する。アブレーション研究により,低周波自己注意は,ネットワークを再トレーニングすることなく,全周波に対して非常に近い,あるいは良好な性能が得られることを示した。そこで我々は,FsaNetと呼ぶCNNネットワークのヘッドに,新しいプラグアンドプレイモジュールを設計し,組み込む。周波数自己注意は 1)低周波係数を入力とする。 2) 線形構造を持つ空間領域自己注意と数学的に等価になりうる。 3) トークンマッピング($1\times1$ 畳み込み)ステージとトークンの混合ステージを同時に単純化する。周波数自己注意は、通常の自己注意と比べてメモリが $87.29\% \sim 90.04\%$、FLOPsが $96.13\% \sim 98.07\%$、実行時間が $97.56\% \sim 98.18\%$ 少ないことを示す。他のResNet101ベースのセルフアテンションネットワークと比較して、FsaNetはCityscapesのテストデータセットで最先端の新たな結果($83.0\%$ mIoU)を達成し、ADE20kとVOCaugでも競合する結果を得た。 Considering the spectral properties of images, we propose a new self-attention mechanism with highly reduced computational complexity, up to a linear rate. To better preserve edges while promoting similarity within objects, we propose individualized processes over different frequency bands. In particular, we study a case where the process is merely over low-frequency components. By ablation study, we show that low frequency self-attention can achieve very close or better performance relative to full frequency even without retraining the network. Accordingly, we design and embed novel plug-and-play modules to the head of a CNN network that we refer to as FsaNet. The frequency self-attention 1) takes low frequency coefficients as input, 2) can be mathematically equivalent to spatial domain self-attention with linear structures, 3) simplifies token mapping ($1\times1$ convolution) stage and token mixing stage simultaneously. We show that the frequency self-attention requires $87.29\% \sim 90.04\%$ less memory, $96.13\% \sim 98.07\%$ fewer FLOPs, and $97.56\% \sim 98.18\%$ less run time than the regular self-attention. 
Compared to other ResNet101-based self-attention networks, FsaNet achieves a new state-of-the-art result ($83.0\%$ mIoU) on the Cityscapes test dataset and competitive results on ADE20k and VOCaug.	翻訳日:2022-11-29 15:39:33 公開日:2022-11-28
# エッジ強化グラフアライメントネットワークとワードペア関係タグを用いた共同マルチモーダルエンティティ-リレーション抽出 Joint Multimodal Entity-Relation Extraction Based on Edge-enhanced Graph Alignment Network and Word-pair Relation Tagging ( http://arxiv.org/abs/2211.15028v1 ) ライセンス: Link先を確認	Li Yuan, Yi Cai, Jin Wang, Qing Li	(参考訳) マルチモーダル認識(MNER)とマルチモーダル関係抽出(MRE)は、マルチモーダル知識グラフ構築タスクにおける2つの基本的なサブタスクである。しかし、既存のメソッドは通常2つのタスクを独立に処理し、両者の双方向インタラクションを無視する。本稿では,MNERとMREをJMERE(Joint Multimodal entity-relation extract task)として共同実行することを提案する。さらに、現在のmnerおよびmreモデルは、視覚およびテキストグラフにおける視覚オブジェクトとテキストエンティティの整合のみを考慮し、エンティティ-エンティティ関係とオブジェクト-オブジェクト関係を無視する。上記の課題に対処するため、JMEREタスクのためのエッジ強化グラフアライメントネットワークとワードペア関係タグ付け(EEGA)を提案する。具体的には、まず、MNERとMREの双方向相互作用を利用して単語対関係タグを設計し、エラー伝搬を回避する。次に,クロスグラフのノードとエッジをアライメントすることにより,jmereタスクを強化するためのエッジエンハンスグラフアライメントネットワークを提案する。従来の手法と比較して,エッジ情報を利用してオブジェクトとエンティティのアライメントを補助し,エンティティ-エンティティ関係とオブジェクト-オブジェクト関係の相関関係を求めることができる。本モデルの有効性を示す実験を行った。 Multimodal named entity recognition (MNER) and multimodal relation extraction (MRE) are two fundamental subtasks in the multimodal knowledge graph construction task. However, the existing methods usually handle two tasks independently, which ignores the bidirectional interaction between them. This paper is the first to propose jointly performing MNER and MRE as a joint multimodal entity-relation extraction task (JMERE). Besides, the current MNER and MRE models only consider aligning the visual objects with textual entities in visual and textual graphs but ignore the entity-entity relationships and object-object relationships. To address the above challenges, we propose an edge-enhanced graph alignment network and a word-pair relation tagging (EEGA) for JMERE task. Specifically, we first design a word-pair relation tagging to exploit the bidirectional interaction between MNER and MRE and avoid the error propagation. Then, we propose an edge-enhanced graph alignment network to enhance the JMERE task by aligning nodes and edges in the cross-graph. 
Compared with previous methods, the proposed method can leverage the edge information to assist alignment between objects and entities and find the correlations between entity-entity relationships and object-object relationships. Experiments are conducted to show the effectiveness of our model.	翻訳日:2022-11-29 15:39:07 公開日:2022-11-28
# G^3: Guidebook Grounding によるジオロケーション G^3: Geolocation via Guidebook Grounding ( http://arxiv.org/abs/2211.15521v1 ) ライセンス: Link先を確認	Grace Luo, Giscard Biamby, Trevor Darrell, Daniel Fried, Anna Rohrbach	(参考訳) 画像が撮影された場所を予測するタスクである位置情報を,言語がいかに改善できるかを示す。そこで本研究では,人間が位置情報に用いている視覚的特徴を,人間の手書きガイドブックから明らかに把握する。多様な場所のストリートビュー画像のデータセットと、人気のあるインタラクティブなジオロケーションゲームであるGeoGuessrのテキストガイドブックを用いた、ガイドブックグラウンドによるジオロケーションのタスクを提案する。本手法は,ガイドブックから自動的に抽出された手がかりに注目することで,各画像の国を予測する。国レベルの擬似ラベルによる注目が最高のパフォーマンスを達成する。本手法は,最先端画像のみの位置情報法を実質的に上回り,Top-1精度が5%以上向上した。データセットとコードは https://github.com/g-luo/geolocation_via_guidebook_grounding にある。 We demonstrate how language can improve geolocation: the task of predicting the location where an image was taken. Here we study explicit knowledge from human-written guidebooks that describe the salient and class-discriminative visual features humans use for geolocation. We propose the task of Geolocation via Guidebook Grounding that uses a dataset of StreetView images from a diverse set of locations and an associated textual guidebook for GeoGuessr, a popular interactive geolocation game. Our approach predicts a country for each image by attending over the clues automatically extracted from the guidebook. Supervising attention with country-level pseudo labels achieves the best performance. Our approach substantially outperforms a state-of-the-art image-only geolocation method, with an improvement of over 5% in Top-1 accuracy. Our dataset and code can be found at https://github.com/g-luo/geolocation_via_guidebook_grounding.	翻訳日:2022-11-29 15:22:02 公開日:2022-11-28
# 多様な嗜好を持つヒトの合意を見つけるための微調整言語モデル Fine-tuning language models to find agreement among humans with diverse preferences ( http://arxiv.org/abs/2211.15006v1 ) ライセンス: Link先を確認	Michiel A. Bakker and Martin J. Chadwick and Hannah R. Sheahan and Michael Henry Tessler and Lucy Campbell-Gillingham and Jan Balaguer and Nat McAleese and Amelia Glaese and John Aslanides and Matthew M. Botvinick and Christopher Summerfield	(参考訳) 大規模言語モデリング(LLM)における最近の研究は、出力をプロトタイプユーザの好みに合わせるために微調整を用いている。この研究は、人間の嗜好が個人間で静的で均質であると仮定し、単一の"ジェネリック"なユーザーとの整合がより一般的な整合性を与える。ここでは、人間の嗜好の不均一性を受け入れて、異なる課題を考える: 多様な視点を持つ人々が合意を見つけるのに、マシンはどのように役立つのか? 我々は700億のパラメータLLMを微調整し、多様な意見を持つグループに対して、期待される承認を最大化する声明を生成する。人間の参加者は、道徳的問題や政治的問題(例えば、「富裕層に税金を課すべきか?」など)に関する数千の質問について意見書を提出し、LLMが生成した合意と品質に関する合意書を評価する。次に、報酬モデルは個々の選好を予測するために訓練され、異なる集約(社会福祉)機能に従って定義されたグループ全体へのアピールの観点からコンセンサスステートメントを定量化しランク付けすることができる。このモデルでは, LLM(>70%)よりも人間の方が好まれるコンセンサス文を生成し, 最終ランク付けステップに欠ける厳密な微調整ベースラインを著しく上回っている。さらに、ベストモデルのコンセンサスステートメントは、最高の人間生成の意見(>65%)よりも好まれます。グループメンバーのサブセットからのみ合意文を静かに構築すると、除外されたメンバは反対する傾向があり、個々のコントリビューションに対する合意の感受性が明らかになる。これらの結果は、人間のグループ同士の価値観の整合を支援するためにLLMを使うことの可能性を強調している。 Recent work in large language modeling (LLMs) has used fine-tuning to align outputs with the preferences of a prototypical user. This work assumes that human preferences are static and homogeneous across individuals, so that aligning to a single "generic" user will confer more general alignment. Here, we embrace the heterogeneity of human preferences to consider a different challenge: how might a machine help people with diverse views find agreement? We fine-tune a 70 billion parameter LLM to generate statements that maximize the expected approval for a group of people with potentially diverse opinions. Human participants provide written opinions on thousands of questions touching on moral and political issues (e.g., "should we raise taxes on the rich?"), and rate the LLM's generated candidate consensus statements for agreement and quality. 
A reward model is then trained to predict individual preferences, enabling it to quantify and rank consensus statements in terms of their appeal to the overall group, defined according to different aggregation (social welfare) functions. The model produces consensus statements that are preferred by human users over those from prompted LLMs (>70%) and significantly outperforms a tight fine-tuned baseline that lacks the final ranking step. Further, our best model's consensus statements are preferred over the best human-generated opinions (>65%). We find that when we silently constructed consensus statements from only a subset of group members, those who were excluded were more likely to dissent, revealing the sensitivity of the consensus to individual contributions. These results highlight the potential to use LLMs to help groups of humans align their values with one another.	翻訳日:2022-11-29 15:21:21 公開日:2022-11-28
# カテゴリーデータに対する連続拡散 Continuous diffusion for categorical data ( http://arxiv.org/abs/2211.15089v1 ) ライセンス: Link先を確認	Sander Dieleman, Laurent Sartran, Arman Roshannai, Nikolay Savinov, Yaroslav Ganin, Pierre H. Richemond, Arnaud Doucet, Robin Strudel, Chris Dyer, Conor Durkan, Curtis Hawthorne, Rémi Leblond, Will Grathwohl, Jonas Adler	(参考訳) 拡散モデルは、反復的洗練による知覚信号(画像や音など)の生成のパラダイムとして急速に発展してきた。彼らの成功は、基礎となる物理現象が連続しているという事実にかかっている。言語のような本質的に離散的で分類的なデータに対して、様々な拡散にインスパイアされた代替案が提案されている。しかし、拡散モデルの連続的な性質は多くの利点をもたらしており、この研究ではそれを保存しようと努力する。時間空間と入力空間の両方で連続的な拡散モデルを用いて分類データをモデル化するCDCDを提案する。いくつかの言語モデリングタスクにおいて有効性を示す。 Diffusion models have quickly become the go-to paradigm for generative modelling of perceptual signals (such as images and sound) through iterative refinement. Their success hinges on the fact that the underlying physical phenomena are continuous. For inherently discrete and categorical data such as language, various diffusion-inspired alternatives have been proposed. However, the continuous nature of diffusion models conveys many benefits, and in this work we endeavour to preserve it. We propose CDCD, a framework for modelling categorical data with diffusion models that are continuous both in time and input space. We demonstrate its efficacy on several language modelling tasks.	翻訳日:2022-11-29 15:20:50 公開日:2022-11-28
# 事前学習言語モデルにおける科学的・創造的アナロジー Scientific and Creative Analogies in Pretrained Language Models ( http://arxiv.org/abs/2211.15268v1 ) ライセンス: Link先を確認	Tamara Czinczoll, Helen Yannakoudakis, Pushkar Mishra, Ekaterina Shutova	(参考訳) 本稿では,BERT や GPT-2 などの大規模事前学習言語モデルにおけるアナログの符号化について検討する。既存の類似データセットは、典型的には類似関係の限られた集合に焦点をあて、類似が持つ2つの領域の類似度が高い。より現実的な設定として、異種ドメイン間の複数の属性と関係構造の体系的なマッピングを含む新しいアナログデータセットであるScientific and Creative Analogy dataset(SCAN)を紹介する。このデータセットを用いて、広く使われている事前学習言語モデル(LM)の類似推論機能をテストする。現状のLMはこれらの複雑なアナロジータスクにおいて低性能を実現し、アナロジー理解によってもたらされる課題を浮き彫りにする。 This paper examines the encoding of analogy in large-scale pretrained language models, such as BERT and GPT-2. Existing analogy datasets typically focus on a limited set of analogical relations, with a high similarity of the two domains between which the analogy holds. As a more realistic setup, we introduce the Scientific and Creative Analogy dataset (SCAN), a novel analogy dataset containing systematic mappings of multiple attributes and relational structures across dissimilar domains. Using this dataset, we test the analogical reasoning capabilities of several widely-used pretrained language models (LMs). We find that state-of-the-art LMs achieve low performance on these complex analogy tasks, highlighting the challenges still posed by analogy understanding.	翻訳日:2022-11-29 15:20:42 公開日:2022-11-28
# eコマースサイトにおける感情分析と意見マイニング Sentiment analysis and opinion mining on E-commerce site ( http://arxiv.org/abs/2211.15536v1 ) ライセンス: Link先を確認	Fatema Tuz Zohra Anny and Oahidul Islam	(参考訳) 感情分析や意見マイニングは、NLP(Natural Language Processing)というフレーズを説明するのに役立つ。近年では感性分析が最も重要な話題となっている。本研究の目的は,感情分析における感情極性分類の課題を解決することである。全体的プロセスの説明とともに、感情的反対を分類する幅広い手法が提示される。分析の結果,文レベルの分類とレビューレベルの分類の両方が行われる。最後に,今後の感情分析研究の計画について述べる。 Sentiment analysis or opinion mining help to illustrate the phrase NLP (Natural Language Processing). Sentiment analysis has been the most significant topic in recent years. The goal of this study is to solve the sentiment polarity classification challenges in sentiment analysis. A broad technique for categorizing sentiment opposition is presented, along with comprehensive process explanations. With the results of the analysis, both sentence-level classification and review-level categorization are conducted. Finally, we discuss our plans for future sentiment analysis research.	翻訳日:2022-11-29 15:20:11 公開日:2022-11-28
# 常識推論のためのGPT-Neo-理論と実用的なレンズ GPT-Neo for commonsense reasoning-a theoretical and practical lens ( http://arxiv.org/abs/2211.15593v1 ) ライセンス: Link先を確認	Rohan Kashyap, Vivek Kashyap, Narendra C.P	(参考訳) 最近の研究は、GPT-2、GPT-3、GPT-neoのような大規模一方向言語モデルを事前訓練し、次いで下流タスクの微調整で大幅に向上した。本稿では,コモンセンス推論タスクにおけるGPT-neo 13億パラメータモデルの性能評価を行う。 6つのコモンセンス推論ベンチマークタスクのモデル性能を評価し,これらのタスクの精度スコアを報告する。適切なハイパーパラメータを用いて微調整を行うと、これらのうち3つのタスクで競合するスコアを得るが、データセットのサイズが著しく小さくなると苦労する。これらのタスクのいくつかにおける低モデルのパフォーマンスは、これらのデータセットに固有の難しさを示唆している。また,モデルの性能をよりよく理解するために,可視化を用いて結果を検証し,多数の推論テストを実施しました。最後に,様々な手法を用いて徹底的なロバストネステストを行い,多数の設定条件下でモデル性能を測定した。これらの結果から, GPT-3 1750億パラメータモデルよりも小さい言語モデルを探索し, 自然言語理解を必要とするタスクを遂行できる可能性が示唆された。 Recent work has demonstrated substantial gains in pre-training large-scale unidirectional language models such as the GPT-2, GPT-3, and GPT-neo, followed by fine-tuning on a downstream task. In this paper, we evaluate the performance of the GPT-neo 1.3 billion model for commonsense reasoning tasks. We assess the model performance on six commonsense reasoning benchmark tasks and report the accuracy scores for these tasks. When fine-tuned using the right set of hyperparameters, we obtain competitive scores on three of these tasks but struggle when the dataset size is significantly smaller. The low model performance on a few of these tasks suggests the inherent difficulty of these datasets, since the model fails to establish coherent patterns given their limited training samples. We also investigate and substantiate our results using visualization and conduct numerous inference tests to understand the model performance better. Finally, we conduct thorough robustness tests using various methods to gauge the model performance under numerous settings. These findings suggest a promising path for exploring smaller language models than the GPT-3 175 billion model to perform tasks requiring natural language understanding.	翻訳日:2022-11-29 15:20:05 公開日:2022-11-28
# コンテキスト内学習はどのような学習アルゴリズムか? 線形モデルによる研究 What learning algorithm is in-context learning? Investigations with linear models ( http://arxiv.org/abs/2211.15661v1 ) ライセンス: Link先を確認	Ekin Akyürek, Dale Schuurmans, Jacob Andreas, Tengyu Ma, Denny Zhou	(参考訳) ニューラルシーケンスモデル、特にトランスフォーマーは、文脈内学習において顕著な能力を示す。追加のパラメータ更新なしに、入力に提示されたラベル付き例 $(x, f(x))$ のシーケンスから新しい予測器を構築することができる。本稿では,トランスフォーマーをベースとしたインコンテキスト学習者が,より小さなモデルをアクティベーションに符号化することで,標準的な学習アルゴリズムを暗黙的に実装する仮説について検討する。線形回帰を原型問題として用いることで,この仮説の証拠を3つ提示する。まず, 勾配降下と閉形式リッジ回帰に基づく線形モデルのための学習アルゴリズムをトランスフォーマーが実装できることを示す。第2に, 学習者は, 勾配降下, リッジ回帰, および完全最小二乗回帰によって計算された予測器と密接に一致し, トランスフォーマーの深さやデータセットのノイズの変化に応じて異なる予測器の間を遷移し, 広い幅と深さのベイズ推定器に収束することを示した。第3に,学習者の後期層が重みベクトルやモーメント行列を非線形にエンコードする,文脈内学習者がアルゴリズム的特徴をこれらの予測器と共有する,予備的証拠を示す。これらの結果は,文脈内学習がアルゴリズム的に理解可能であり,(少なくとも線形の場合)学習者が標準推定アルゴリズムを再発見できることを示唆している。コードと参照実装は https://github.com/ekinakyurek/google-research/blob/master/incontext で公開されている。 Neural sequence models, especially transformers, exhibit a remarkable capacity for in-context learning. They can construct new predictors from sequences of labeled examples $(x, f(x))$ presented in the input without further parameter updates. We investigate the hypothesis that transformer-based in-context learners implement standard learning algorithms implicitly, by encoding smaller models in their activations, and updating these implicit models as new examples appear in the context. Using linear regression as a prototypical problem, we offer three sources of evidence for this hypothesis. First, we prove by construction that transformers can implement learning algorithms for linear models based on gradient descent and closed-form ridge regression. 
Second, we show that trained in-context learners closely match the predictors computed by gradient descent, ridge regression, and exact least-squares regression, transitioning between different predictors as transformer depth and dataset noise vary, and converging to Bayesian estimators for large widths and depths. Third, we present preliminary evidence that in-context learners share algorithmic features with these predictors: learners' late layers non-linearly encode weight vectors and moment matrices. These results suggest that in-context learning is understandable in algorithmic terms, and that (at least in the linear case) learners may rediscover standard estimation algorithms. Code and reference implementations are released at https://github.com/ekinakyurek/google-research/blob/master/incontext.	翻訳日:2022-11-29 15:19:45 公開日:2022-11-28
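As a rough illustration of two of the reference predictors named in the abstract above (closed-form ridge regression and gradient descent on the same regularized objective), the following minimal sketch fits both on a small synthetic linear-regression problem and compares the resulting weights. It is not the authors' code and says nothing about how a transformer would encode these algorithms; the function names, the synthetic data, the regularization strength `lam`, the learning rate, and the step count are all assumptions chosen for the example.

```python
import random


def ridge_closed_form(X, y, lam):
    """Solve (X^T X + lam * I) w = X^T y by Gauss-Jordan elimination."""
    d = len(X[0])
    # Augmented matrix [X^T X + lam * I | X^T y].
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(d)] + [sum(row[i] * t for row, t in zip(X, y))]
         for i in range(d)]
    for col in range(d):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(d):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][d] for i in range(d)]


def ridge_gradient_descent(X, y, lam, lr=0.01, steps=5000):
    """Minimise sum_i (w . x_i - y_i)^2 + lam * ||w||^2 by plain gradient descent."""
    d = len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        residuals = [sum(wi * xi for wi, xi in zip(w, row)) - t
                     for row, t in zip(X, y)]
        grad = [2.0 * (sum(r * row[j] for r, row in zip(residuals, X)) + lam * w[j])
                for j in range(d)]
        w = [wj - lr * g for wj, g in zip(w, grad)]
    return w


# Hypothetical synthetic task: 20 noisy examples of a 3-dimensional linear function.
rng = random.Random(0)
true_w = [1.5, -2.0, 0.5]
X = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(20)]
y = [sum(w * x for w, x in zip(true_w, row)) + rng.gauss(0.0, 0.1) for row in X]
lam = 0.5

w_closed = ridge_closed_form(X, y, lam)
w_gd = ridge_gradient_descent(X, y, lam)
print("closed form     :", w_closed)
print("gradient descent:", w_gd)
```

With enough steps and a small enough learning rate, gradient descent on the regularized squared loss converges to the closed-form ridge solution, so the two printed weight vectors should agree closely. Stopping gradient descent early would give a different predictor, which is one way distinct reference predictors can arise from the same data.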
# グロスに基づく有意義な手話機械翻訳に関する一考察 Considerations for meaningful sign language machine translation based on glosses ( http://arxiv.org/abs/2211.15464v1 ) ライセンス: Link先を確認	Mathias Müller, Zifan Jiang, Amit Moryossef, Annette Rios, Sarah Ebling	(参考訳) 自然言語処理(NLP)の研究(Yin et al., 2021)では,手話の自動処理が普及している。特に機械翻訳(MT)では、グロスに基づく手話翻訳が顕著なアプローチである。本稿では,ニューラルグロス翻訳に関する最近の研究について概説する。一般的なグロスの制限や特定のデータセットの制限は、透過的な方法では議論されず、評価の共通標準が存在しないことがわかった。これらの課題に対処するため,グロス翻訳研究の具体的な提言を行った。提案では,グロスに基づくアプローチの本質的な限界に対する認識,現実的なデータセット,より強固なベースライン,説得力のある評価を提唱する。 Automatic sign language processing is gaining popularity in Natural Language Processing (NLP) research (Yin et al., 2021). In machine translation (MT) in particular, sign language translation based on glosses is a prominent approach. In this paper, we review recent works on neural gloss translation. We find that limitations of glosses in general and limitations of specific datasets are not discussed in a transparent manner and that there is no common standard for evaluation. To address these issues, we put forward concrete recommendations for future research on gloss translation. Our suggestions advocate awareness of the inherent limitations of gloss-based approaches, realistic datasets, stronger baselines and convincing evaluation.	翻訳日:2022-11-29 15:12:24 公開日:2022-11-28
# 言語間移動のためのフラストレーションやすいラベル投影法 Frustratingly Easy Label Projection for Cross-lingual Transfer ( http://arxiv.org/abs/2211.15613v1 ) ライセンス: Link先を確認	Yang Chen, Chao Jiang, Alan Ritter, Wei Xu	(参考訳) 訓練データを多くの言語に翻訳することは、言語間転送を改善するための実用的な解決策として現れてきた。情報抽出や質問応答などのスパンレベルのアノテーションを含むタスクには、注釈付きスパンを翻訳されたテキストにマッピングするために追加のラベル投影ステップが必要である。近年, ラベル付きスパンの周囲に特別なマーカーを挿入することにより, 翻訳と投影を共同で行うための簡易なマーク翻訳手法が試みられている。しかし、我々の知る限り、この手法が単語アライメントに基づく従来のアノテーション投影とどのように比較されるかについては、実証的な分析は行われていない。本稿では,42言語および3つのタスク(QA,NER,イベント抽出)にまたがる広範な実証的研究を行い,両手法の有効性と限界を評価し,文献における重要なギャップを埋める。実験結果から,我々はEasyProjectと呼ぶマーク-then-translateの最適化版を多くの言語に適用しやすく,驚くほどうまく動作し,より複雑な単語アライメント方式よりも優れていることがわかった。エンドタスクのパフォーマンスに影響を与えるいくつかの重要な要因を分析し、EasyProjectが翻訳後のラベルスパン境界を正確に保存できることを示す。すべてのコードとデータを公開します。 Translating training data into many languages has emerged as a practical solution for improving cross-lingual transfer. For tasks that involve span-level annotations, such as information extraction or question answering, an additional label projection step is required to map annotated spans onto the translated texts. Recently, a few efforts have utilized a simple mark-then-translate method to jointly perform translation and projection by inserting special markers around the labeled spans in the original sentence. However, as far as we are aware, no empirical analysis has been conducted on how this approach compares to traditional annotation projection based on word alignment. In this paper, we present an extensive empirical study across 42 languages and three tasks (QA, NER, and Event Extraction) to evaluate the effectiveness and limitations of both methods, filling an important gap in the literature. Experimental results show that our optimized version of mark-then-translate, which we call EasyProject, is easily applied to many languages and works surprisingly well, outperforming the more complex word alignment-based methods. 
We analyze several key factors that affect end-task performance, and show EasyProject works well because it can accurately preserve label span boundaries after translation. We will publicly release all our code and data.	翻訳日:2022-11-29 15:12:13 公開日:2022-11-28
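The mark-then-translate idea described in the abstract above can be sketched in a few lines: wrap each labeled span in markers, translate the marked sentence, then strip the markers from the translation to recover the projected spans. The sketch below is only an illustration under stated assumptions and is not the EasyProject implementation. The `<m0>...</m0>` marker format, the helper names, and the `toy_translate` function (a stand-in word lexicon in place of a real translation system) are all hypothetical.

```python
import re


def mark_spans(text, spans):
    """Wrap each (start, end, label) character span in numbered markers."""
    out, cursor = [], 0
    for idx, (start, end, _label) in enumerate(sorted(spans)):
        out.append(text[cursor:start])
        out.append(f"<m{idx}>{text[start:end]}</m{idx}>")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


def project_spans(marked_text, labels):
    """Strip markers and return (clean_text, [(start, end, label), ...])."""
    pattern = re.compile(r"<m(\d+)>(.*?)</m\1>")
    pieces, spans, cursor, length = [], [], 0, 0
    for match in pattern.finditer(marked_text):
        before = marked_text[cursor:match.start()]
        pieces.append(before)
        length += len(before)
        inner = match.group(2)
        # The marker index selects the label of the original span.
        spans.append((length, length + len(inner), labels[int(match.group(1))]))
        pieces.append(inner)
        length += len(inner)
        cursor = match.end()
    pieces.append(marked_text[cursor:])
    return "".join(pieces), spans


# Hypothetical stand-in for a translation system: a tiny word lexicon.
TOY_LEXICON = {"works": "arbeitet"}


def toy_translate(text):
    """Replace known words and leave everything else, including markers, untouched."""
    return re.sub(r"[A-Za-z]+",
                  lambda m: TOY_LEXICON.get(m.group(0), m.group(0)), text)


source = "Alan Ritter works in Atlanta"
source_spans = [(0, 11, "PER"), (21, 28, "LOC")]
labels = [label for _, _, label in sorted(source_spans)]

marked = mark_spans(source, source_spans)
translated, projected = project_spans(toy_translate(marked), labels)
print(translated)
print([(translated[s:e], label) for s, e, label in projected])
```

Numbering the markers keeps the span-to-label mapping intact even if a real translation system reorders the spans. Whether a given system preserves such markers at all is an empirical question, which is the kind of factor the study above evaluates.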
# データセットの数え方を超えて:多言語データセットの構築と必要なリソースの調査 Beyond Counting Datasets: A Survey of Multilingual Dataset Construction and Necessary Resources ( http://arxiv.org/abs/2211.15649v1 ) ライセンス: Link先を確認	Xinyan Velocity Yu, Akari Asai, Trina Chatterjee, Junjie Hu and Eunsol Choi	(参考訳) NLPコミュニティは一般的に言語間の資源格差を認識しているが、そのような格差の程度と種類を定量化する研究は欠如している。データセットの品質が変化するにつれて、データセットの数に基づいてリソースの可用性を推定する以前の調査は誤解を招く可能性がある。より包括的な言語資源図を提供するため、156個の公開NLPデータセットの特徴について検討する。それらは、入力テキストやラベルソース、それらを構築するのに使用されるツール、彼らが何を勉強するか、彼らが対処するタスクと彼らの作成に対するモチベーションを含む、手動で作成する方法を注釈します。言語間の質的なNLPリソースギャップを定量化した後、低リソース言語におけるデータ収集を改善する方法について論じる。言語に習熟したNLP研究者と言語ごとの群衆労働者を調査したところ、その推定可用性はデータセットの可用性と相関していることがわかった。クラウドソーシング実験を通じて,メカニカルトルコプラットフォーム上で高品質な多言語データを収集するための戦略を同定する。今後の多言語データ開発のためのNLPコミュニティと個人研究者に対してマクロおよびマイクロレベルの提案を行うことで、結論付ける。 While the NLP community is generally aware of resource disparities among languages, we lack research that quantifies the extent and types of such disparity. Prior surveys estimating the availability of resources based on the number of datasets can be misleading as dataset quality varies: many datasets are automatically induced or translated from English data. To provide a more comprehensive picture of language resources, we examine the characteristics of 156 publicly available NLP datasets. We manually annotate how they are created, including input text and label sources and tools used to build them, and what they study, tasks they address and motivations for their creation. After quantifying the qualitative NLP resource gap across languages, we discuss how to improve data collection in low-resource languages. We survey language-proficient NLP researchers and crowd workers per language, finding that their estimated availability correlates with dataset availability. Through crowdsourcing experiments, we identify strategies for collecting high-quality multilingual data on the Mechanical Turk platform. 
We conclude by making macro and micro-level suggestions to the NLP community and individual researchers for future multilingual data development.	翻訳日:2022-11-29 15:11:51 公開日:2022-11-28
# 論争的な刺激を伴う表現的ジオメトリの識別:ベイズの実験設計と相違性判定への応用 Distinguishing representational geometries with controversial stimuli: Bayesian experimental design and its application to face dissimilarity judgments ( http://arxiv.org/abs/2211.15053v1 ) ライセンス: Link先を確認	Tal Golan, Wenxuan Guo, Heiko H. Sch\"utt, Nikolaus Kriegeskorte	(参考訳) ニューラルネットワーク層における複雑な刺激の表現と人間の脳の表現や行動判断を比較することで、モデル開発を導くことができる。しかし、定性的に異なるニューラルネットワークモデルでさえ、典型的な刺激セットの同様の表現的ジオメトリを予測することが多い。本稿では,表現モデル間の適応のための刺激セットを効率的に合成するためのベイズ実験設計手法を提案する。本稿では,行動相違判定のニューラルネットワークモデルの識別に本手法を適用した。その結果,3次元顔モデルグラフィックスレンダラを倒すように訓練されたニューラルネットワークは,識別,分類,自動エンコーディングを訓練した同じアーキテクチャよりも人間的指向性が高いことがわかった。提案した刺激合成の目的は,モデル比較のための表現類似性解析により解析する実験の設計に適用できる。 Comparing representations of complex stimuli in neural network layers to human brain representations or behavioral judgments can guide model development. However, even qualitatively distinct neural network models often predict similar representational geometries of typical stimulus sets. We propose a Bayesian experimental design approach to synthesizing stimulus sets for adjudicating among representational models efficiently. We apply our method to discriminate among candidate neural network models of behavioral face dissimilarity judgments. Our results indicate that a neural network trained to invert a 3D-face-model graphics renderer is more human-aligned than the same architecture trained on identification, classification, or autoencoding. Our proposed stimulus synthesis objective is generally applicable to designing experiments to be analyzed by representational similarity analysis for model comparison.	翻訳日:2022-11-29 15:04:11 公開日:2022-11-28
# CycleGAN拡張による地域降雨予報 Regional Precipitation Nowcasting Based on CycleGAN Extension ( http://arxiv.org/abs/2211.15046v1 ) ライセンス: Link先を確認	Jaeho Choi, Yura Kim, Kwang-Ho Kim, Sung-Hwa Jung, Ikhyun Cho	(参考訳) 通常、集中豪雨は2022年8月8日に韓国中部を襲った。多くの低地が水没し、交通と生活はひどく麻痺した。わずか数時間の暴風雨による致命的な被害であった。この出来事は、より信頼性の高い地域降水ノキャスティング方法の必要性を思い出させた。本稿では,サイクル一貫性のある対向ネットワーク (CycleGAN) を時系列領域に導入し,それを拡張し,地域降水流の信頼性モデルを提案する。提案モデルは,現在から10分後に複合複合表面降雨(HSR)データを生成する。また,提案モデルでは,トレーニング時間段階の段階的拡張により,最大2時間の信頼性予測を行う。既存の複雑な放送方法とは異なり、提案モデルはリカレントニューラルネットワーク(RNN)を使用しず、サイクルのシーケンシャルトレーニングを通じて時間的因果性を確保する。 RNNに基づく畳み込み長短期記憶(ConvLSTM)よりも優れた降水量推定法を提案する。さらに,実際の量的降水予測(QPF)モデルの一つであるラグランジアン外挿法による降水流のマギルアルゴリズムであるMAPLEに対する質的,定量的比較によるアプローチの優位性を示した。 Unusually, intensive heavy rain hit the central region of Korea on August 8, 2022. Many low-lying areas were submerged, so traffic and life were severely paralyzed. It was the critical damage caused by torrential rain for just a few hours. This event reminded us of the need for a more reliable regional precipitation nowcasting method. In this paper, we bring cycle-consistent adversarial networks (CycleGAN) into the time-series domain and extend it to propose a reliable model for regional precipitation nowcasting. The proposed model generates composite hybrid surface rainfall (HSR) data after 10 minutes from the present time. Also, the proposed model provides a reliable prediction of up to 2 hours with a gradual extension of the training time steps. Unlike the existing complex nowcasting methods, the proposed model does not use recurrent neural networks (RNNs) and secures temporal causality via sequential training in the cycle. Our precipitation nowcasting method outperforms convolutional long short-term memory (ConvLSTM) based on RNNs. 
Additionally, we demonstrate the superiority of our approach through qualitative and quantitative comparisons against MAPLE, the McGill Algorithm for Precipitation nowcasting by Lagrangian Extrapolation, one of the quantitative precipitation forecast (QPF) models used in practice.	翻訳日:2022-11-29 14:55:43 公開日:2022-11-28
# 光タッチによるトランスフォーマーの多視点幾何学教育 A Light Touch Approach to Teaching Transformers Multi-view Geometry ( http://arxiv.org/abs/2211.15107v1 ) ライセンス: Link先を確認	Yash Bhalgat, Joao F. Henriques, Andrew Zisserman	(参考訳) トランスフォーマーは強力な視覚的学習者であり、多くの場合、手動で特定された事前情報がないためである。この柔軟性は、3次元形状と視点のほぼ無限のバリエーション(柔軟性が必要)と射影幾何学の正確な性質(剛性の法則に従えば)のため、多視点幾何学に関わるタスクにおいて問題となる。この混乱を解決するために,視覚トランスフォーマーに多視点幾何学を学ぶように誘導する「ライトタッチ」アプローチを提案する。我々は、エピポーラ線を用いてトランスフォーマーのクロスアテンションマップを誘導し、エピポーラ線外の注意値をペナルティ化し、それらの線に沿って高い注意を喚起する。従来の方法とは異なり、テスト時にカメラのポーズ情報を必要としない。検索画像と検索画像の視点の違いが大きいため,標準的なトランスフォーマーネットワークが苦労する,ポーズ不変オブジェクトインスタンス検索に注目する。提案手法は,テスト時にポーズ情報を必要とせず,オブジェクト検索における最先端の手法よりも優れている。 Transformers are powerful visual learners, in large part due to their conspicuous lack of manually-specified priors. This flexibility can be problematic in tasks that involve multiple-view geometry, due to the near-infinite possible variations in 3D shapes and viewpoints (requiring flexibility), and the precise nature of projective geometry (obeying rigid laws). To resolve this conundrum, we propose a "light touch" approach, guiding visual Transformers to learn multiple-view geometry but allowing them to break free when needed. We achieve this by using epipolar lines to guide the Transformer's cross-attention maps, penalizing attention values outside the epipolar lines and encouraging higher attention along these lines since they contain geometrically plausible matches. Unlike previous methods, our proposal does not require any camera pose information at test-time. We focus on pose-invariant object instance retrieval, where standard Transformer networks struggle, due to the large differences in viewpoint between query and retrieved images. Experimentally, our method outperforms state-of-the-art approaches at object retrieval, without needing pose information at test-time.	翻訳日:2022-11-29 14:55:21 公開日:2022-11-28
# マルチモーダル医療データ分析のためのヘテロジニアスグラフ学習 Heterogeneous Graph Learning for Multi-modal Medical Data Analysis ( http://arxiv.org/abs/2211.15158v1 ) ライセンス: Link先を確認	Sein Kim, Namkyeong Lee, Junseok Lee, Dongmin Hyun and Chanyoung Park	(参考訳) 患者の定期的な臨床訪問は、画像データだけでなく、患者に関する臨床情報を含む非画像データ、すなわち、自然界において医療データはマルチモーダルである。このような異質な形態は、同じ患者に対して異なる視点と相補的な視点を提供し、適切な組み合わせによってより正確な臨床判断をもたらす。しかしながら、その重要性にもかかわらず、マルチモーダル医療データを統一フレームワークに効果的に融合する方法は比較的注目されていない。本稿では,マルチモーダル医療データを融合するためのHetMed (Heterogeneous Graph Learning for Multi-modal Medical Data Analysis) というグラフベースの効果的なフレームワークを提案する。具体的には,複数種類の非画像特徴を組み込んだマルチプレックスネットワークを構築し,患者間の複雑な関係を体系的に捉えることにより,より正確な臨床判断を行う。様々な実世界のデータセットに対する大規模な実験は、HetMedの優位性と実用性を示している。 HetMedのソースコードはhttps://github.com/Sein-Kim/Multimodal-Medicalで入手できる。 Routine clinical visits of a patient produce not only image data, but also non-image data containing clinical information regarding the patient, i.e., medical data is multi-modal in nature. Such heterogeneous modalities offer different and complementary perspectives on the same patient, resulting in more accurate clinical decisions when they are properly combined. However, despite its significance, how to effectively fuse the multi-modal medical data into a unified framework has received relatively little attention. In this paper, we propose an effective graph-based framework called HetMed (Heterogeneous Graph Learning for Multi-modal Medical Data Analysis) for fusing the multi-modal medical data. Specifically, we construct a multiplex network that incorporates multiple types of non-image features of patients to capture the complex relationship between patients in a systematic way, which leads to more accurate clinical decisions. Extensive experiments on various real-world datasets demonstrate the superiority and practicality of HetMed. The source code for HetMed is available at https://github.com/Sein-Kim/Multimodal-Medical.	翻訳日:2022-11-29 14:55:00 公開日:2022-11-28
# ブリッジモード接続による文脈適応型ディープニューラルネットワーク Context-Adaptive Deep Neural Networks via Bridge-Mode Connectivity ( http://arxiv.org/abs/2211.15436v1 ) ライセンス: Link先を確認	Nathan Drenkow, Alvin Tan, Chace Ashcraft, Kiran Karra	(参考訳) 安全クリティカルなアプリケーションにおける機械学習モデルのデプロイは、このようなモデルがさまざまな状況でうまく機能することを期待している(例えば、街路標識を分類するためのビジョンモデルは、様々な照明/天候条件下で農村部、都市、高速道路で機能するべきである)。しかし、これらのワンサイズモデルは通常、平均ケースパフォーマンスに最適化されており、名目上の条件では高いパフォーマンスを達成することを奨励するが、難しい状況や稀な状況では予期せぬ振る舞いに露呈する。そこで本研究では,文脈依存型モデルを学習するための新しい手法を提案する。ブリッジモード接続 (bmc) (garipov et al., 2018) を拡張して,モデルの無限アンサンブルを連続的なコンテキストの尺度上でトレーニングし,対応する評価コンテキストに特別に調整したモデルパラメータをサンプリングする。本研究では,リスクプロファイルの変化,ロングテール画像の統計・出現,コンテキスト依存分布シフトなど,画像分類タスクにおけるコンテキスト定義について検討する。これらの各ケースに対してbmc最適化の新たな拡張を開発し,各シナリオにおけるモデル性能をコンテキストにうまく調整できることを実験により実証した。 The deployment of machine learning models in safety-critical applications comes with the expectation that such models will perform well over a range of contexts (e.g., a vision model for classifying street signs should work in rural, city, and highway settings under varying lighting/weather conditions). However, these one-size-fits-all models are typically optimized for average case performance, encouraging them to achieve high performance in nominal conditions but exposing them to unexpected behavior in challenging or rare contexts. To address this concern, we develop a new method for training context-dependent models. We extend Bridge-Mode Connectivity (BMC) (Garipov et al., 2018) to train an infinite ensemble of models over a continuous measure of context such that we can sample model parameters specifically tuned to the corresponding evaluation context. We explore the definition of context in image classification tasks through multiple lenses including changes in the risk profile, long-tail image statistics/appearance, and context-dependent distribution shift. 
We develop novel extensions of the BMC optimization for each of these cases and our experiments demonstrate that model performance can be successfully tuned to context in each scenario.	翻訳日:2022-11-29 14:53:50 公開日:2022-11-28
# 構成性向上のための相互排他性訓練と原始増強 Mutual Exclusivity Training and Primitive Augmentation to Induce Compositionality ( http://arxiv.org/abs/2211.15578v1 ) ライセンス: Link先を確認	Yichen Jiang, Xiang Zhou, Mohit Bansal	(参考訳) 最近のデータセットは、標準的なシーケンス対シーケンスモデルにおける体系的な一般化能力の欠如を明らかにしている。本研究では,Seq2seqモデルの振る舞いを分析し,相互排他バイアスの欠如(すなわち,すでに対象配列にマッピングされたソースシーケンスが他のターゲットシーケンスにマッピングされる可能性が低い)と,構造を内容から切り離すのではなく,全体例を記憶する傾向という2つの要因を同定する。我々は,これら2つの課題にそれぞれ対処するための2つの手法を提案している: 相互排他的訓練は,新奇な例に対面した場合にモデルが現れるのを防止し,類似性に基づく損失による未発見の例を発生させる;prim2primxデータ拡張は,すべての構文関数の引数を自動的に多様化し,暗記化を防止し,テストセットデータを露呈することなく構成的帰納的バイアスを与える。これら2つの手法を組み合わせることで,SCAN と COGS の2つの広く使用されている構成性データセット上で,標準シーケンス列列モデル (LSTM と Transformer ) を用いた経験的改善が得られた。最後に,改善点と残る課題を特徴とする分析を行い,本手法の詳細なアブレーションを行う。私たちのコードはhttps://github.com/owenzx/met-primaugで利用可能です。 Recent datasets expose the lack of the systematic generalization ability in standard sequence-to-sequence models. In this work, we analyze this behavior of seq2seq models and identify two contributing factors: a lack of mutual exclusivity bias (i.e., a source sequence already mapped to a target sequence is less likely to be mapped to other target sequences), and the tendency to memorize whole examples rather than separating structures from contents. We propose two techniques to address these two issues respectively: Mutual Exclusivity Training that prevents the model from producing seen generations when facing novel, unseen examples via an unlikelihood-based loss; and prim2primX data augmentation that automatically diversifies the arguments of every syntactic function to prevent memorizing and provide a compositional inductive bias without exposing test-set data. Combining these two techniques, we show substantial empirical improvements using standard sequence-to-sequence models (LSTMs and Transformers) on two widely-used compositionality datasets: SCAN and COGS. Finally, we provide analysis characterizing the improvements as well as the remaining challenges, and provide detailed ablations of our method. 
Our code is available at https://github.com/owenzx/met-primaug	翻訳日:2022-11-29 14:47:37 公開日:2022-11-28
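The unlikelihood-based loss mentioned in the abstract above can be illustrated with a minimal sketch. This is a generic unlikelihood term, `-log(1 - p)`, applied to tokens the model should not produce; it is not necessarily the exact objective the authors use, and the function name `unlikelihood_loss` and the toy probabilities are assumptions made here for illustration.

```python
import math

def unlikelihood_loss(probs, negative_ids, eps=1e-12):
    """Generic unlikelihood term: sum of -log(1 - p) over tokens
    that should NOT be generated (hypothetical helper, not the paper's code)."""
    return -sum(math.log(max(1.0 - probs[i], eps)) for i in negative_ids)

# Toy next-token distribution over a 3-token vocabulary (illustrative values).
probs = [0.7, 0.2, 0.1]

# Discouraging an already unlikely token costs little ...
low = unlikelihood_loss(probs, [2])
# ... while discouraging the dominant token costs much more.
high = unlikelihood_loss(probs, [0])
```

The penalty grows as the model assigns more probability to a discouraged output, which is the property such a loss relies on when pushing probability mass away from previously seen target sequences.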
# パラメータ効率の良いファインチューニングの有効性について On the Effectiveness of Parameter-Efficient Fine-Tuning ( http://arxiv.org/abs/2211.15583v1 ) ライセンス: Link先を確認	Zihao Fu, Haoran Yang, Anthony Man-Cho So, Wai Lam, Lidong Bing, Nigel Collier	(参考訳) 微調整事前学習モデルは、幅広いNLPタスクに有効であることが広く証明されている。しかし、モデル全体を微調整することはパラメータ非効率であり、常にタスクごとに全く新しいモデルを生成する。現在、多くの研究がパラメータのごく一部だけを微調整し、多くのパラメータを異なるタスクで共有することを提案している。これらの手法は驚くほど優れた性能を達成し、対応する完全微調整のものよりも安定であることが示される。しかし、そのような方法はまだよく分かっていない。パラメータの空間性は、どのように有望なパフォーマンスをもたらすのか? なぜモデルは完全に調整されたモデルよりも安定しているのか? チューニング可能なパラメータの選び方? 本稿では,既存の手法をまずランダムアプローチ,ルールベースアプローチ,投射ベースアプローチに分類し,どのパラメータをチューニングするかを選択する。そして,全ての手法が実際に微調整されたモデルに分散していることを示し,新しい理論解析を行う。安定性の上限を制御して元のモデルに正規化を実際に与えていることを示す。このような安定性は、最近の多くの研究で実証的に観察されたより優れた一般化能力をもたらす。我々の理論が根拠としているスパーシティの有効性にもかかわらず、チューニング可能なパラメータを選択する方法は依然として未解決の問題である。調整可能なパラメータをよりよく選択するために,解析的に解ける最適化関数を用いて元の問題を近似する新しい二階近似法(SAM)を提案する。可変パラメータは近似関数を直接最適化することによって決定される。実験結果から,提案するsamモデルは,強いベースラインモデルよりも優れており,理論解析も検証できることがわかった。 Fine-tuning pre-trained models has been ubiquitously proven to be effective in a wide range of NLP tasks. However, fine-tuning the whole model is parameter inefficient as it always yields an entirely new model for each task. Currently, many research works propose to only fine-tune a small portion of the parameters while keeping most of the parameters shared across different tasks. These methods achieve surprisingly good performance and are shown to be more stable than their corresponding fully fine-tuned counterparts. However, such kind of methods is still not well understood. Some natural questions arise: How does the parameter sparsity lead to promising performance? Why is the model more stable than the fully fine-tuned models? How to choose the tunable parameters? In this paper, we first categorize the existing methods into random approaches, rule-based approaches, and projection-based approaches based on how they choose which parameters to tune. 
Then, we show that all of these methods are in fact sparse fine-tuned models and conduct a novel theoretical analysis of them. We show that the sparsity actually imposes a regularization on the original model by controlling the upper bound of the stability. Such stability leads to better generalization capability, which has been empirically observed in many recent research works. Although our theory grounds the effectiveness of sparsity, how to choose the tunable parameters remains an open problem. To better choose the tunable parameters, we propose a novel Second-order Approximation Method (SAM), which approximates the original problem with an analytically solvable optimization function. The tunable parameters are determined by directly optimizing the approximation function. The experimental results show that our proposed SAM model outperforms many strong baseline models and also verify our theoretical analysis.	翻訳日:2022-11-29 14:47:04 公開日:2022-11-28
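As a rough illustration of what "sparse fine-tuning" means in the abstract above, the sketch below updates only a randomly chosen subset of parameters and leaves the rest at their pre-trained values. It corresponds loosely to the "random approaches" category and is not the SAM method; the helper names `random_mask` and `sparse_update`, the learning rate, and the toy values are assumptions made here for illustration.

```python
import random

def random_mask(n_params, fraction, seed=0):
    """Pick a random subset of parameter indices to tune (hypothetical helper)."""
    rng = random.Random(seed)
    k = max(1, int(n_params * fraction))
    chosen = set(rng.sample(range(n_params), k))
    return [i in chosen for i in range(n_params)]

def sparse_update(params, grads, mask, lr=0.1):
    """One gradient step applied only where the mask is True;
    all other parameters keep their pre-trained values."""
    return [p - lr * g if m else p for p, g, m in zip(params, grads, mask)]

params = [1.0] * 10   # stand-in for pre-trained weights
grads = [0.5] * 10    # stand-in for task gradients
mask = random_mask(len(params), fraction=0.2)
new_params = sparse_update(params, grads, mask)

# Count how many parameters actually moved.
changed = sum(1 for p, q in zip(params, new_params) if p != q)
```

Because the untouched parameters stay identical to the pre-trained model, they can be shared across tasks, and only the masked entries need to be stored per task.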
# グローバル・ローカル構造をもつマルチタスク帯域における表現学習のサンプル複雑さについて On the Sample Complexity of Representation Learning in Multi-task Bandits with Global and Local structure ( http://arxiv.org/abs/2211.15129v1 ) ライセンス: Link先を確認	Alessio Russo, Alexandre Proutiere	(参考訳) マルチタスクバンディット問題に対する最適アーム学習のサンプル複雑性について検討した。アームはタスク間で共有されるもの(表現と呼ぶもの)とタスク固有のもの(予測子と呼ばれるもの)の2つのコンポーネントで構成されています。目的は、最適表現がすべてのタスクに共通であると仮定して、各タスクの最適な(表現、予測)ペアを学ぶことである。このフレームワークでは、効率的な学習アルゴリズムはタスク間で知識を転送する必要がある。各ラウンドにおいて、学習者はタスクとアームの両方を積極的に選択し、対応する報酬を観察する。我々は、任意の$(\delta_g,\delta_h)$-pacアルゴリズムで満たされるインスタンス固有のサンプル複雑性下限を導出する(そのようなアルゴリズムは、最良表現を少なくとも1-\delta_g$、確率が少なくとも1-\delta_h$のタスクの最適予測器として識別する)。我々は,サンプル複雑性が下限に近づくアルゴリズムosrl-scを考案し,最大で$h(g\log(1/\delta_g)+x\log(1/\delta_h))$,$x,g,h$ をそれぞれタスク数,表現数,予測値として拡張する。比較として、このスケーリングは$hgx\log(1/\delta)$でスケールする古典的なベストアーム識別アルゴリズムよりもはるかに優れている。 We investigate the sample complexity of learning the optimal arm for multi-task bandit problems. Arms consist of two components: one that is shared across tasks (that we call representation) and one that is task-specific (that we call predictor). The objective is to learn the optimal (representation, predictor)-pair for each task, under the assumption that the optimal representation is common to all tasks. Within this framework, efficient learning algorithms should transfer knowledge across tasks. We consider the best-arm identification problem for a fixed confidence, where, in each round, the learner actively selects both a task, and an arm, and observes the corresponding reward. We derive instance-specific sample complexity lower bounds satisfied by any $(\delta_G,\delta_H)$-PAC algorithm (such an algorithm identifies the best representation with probability at least $1-\delta_G$, and the best predictor for a task with probability at least $1-\delta_H$). 
We devise an algorithm OSRL-SC whose sample complexity approaches the lower bound, and scales at most as $H(G\log(1/\delta_G)+ X\log(1/\delta_H))$, with $X,G,H$ being, respectively, the number of tasks, representations and predictors. By comparison, this scaling is significantly better than the classical best-arm identification algorithm that scales as $HGX\log(1/\delta)$.	翻訳日:2022-11-29 14:46:18 公開日:2022-11-28
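To make the scaling comparison in the abstract above concrete, the following minimal sketch evaluates both expressions for one illustrative setting. The function names and the values of `X`, `G`, `H` and the confidence levels are assumptions chosen here for illustration; constants hidden by the scaling statements are ignored, so the numbers indicate relative growth only.

```python
import math

def osrl_sc_scaling(X, G, H, delta_G, delta_H):
    """H * (G * log(1/delta_G) + X * log(1/delta_H)), as stated in the abstract."""
    return H * (G * math.log(1 / delta_G) + X * math.log(1 / delta_H))

def classical_scaling(X, G, H, delta):
    """H * G * X * log(1/delta), the classical best-arm identification scaling."""
    return H * G * X * math.log(1 / delta)

# Illustrative values: 50 tasks, 10 representations, 5 predictors.
X, G, H = 50, 10, 5
delta = 0.05  # same confidence level used for both delta_G and delta_H

a = osrl_sc_scaling(X, G, H, delta, delta)
b = classical_scaling(X, G, H, delta)

# With equal confidence levels the ratio reduces to G * X / (G + X).
ratio = b / a
```

With equal confidence levels the logarithmic factor cancels, so the gap between the two scalings is `G * X / (G + X)`, which grows as the number of tasks and representations grows.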
# ロングテールクロスモーダルハッシュ Long-tail Cross Modal Hashing ( http://arxiv.org/abs/2211.15162v1 ) ライセンス: Link先を確認	Zijun Gao, Jun Wang, Guoxian Yu, Zhongmin Yan, Carlotta Domeniconi, Jinglin Zhang	(参考訳) 既存のクロスモーダルハッシュ法(CMH)は主にバランスのあるデータのために設計されているが、ロングテール分布を持つ不均衡なデータは現実世界でより一般的である。いくつかのロングテールハッシュ法が提案されているが、ラベルとマルチモーダルデータの個性・共通性情報との間の複雑な相互作用のため、マルチモーダルデータには適応できない。さらに、CMH法は主に多モードデータの共通性を発掘してハッシュコードを学習するため、各モダリティの個性によって符号化された末尾ラベルをオーバーライドする可能性がある。本稿では,不均衡なマルチモーダルデータを扱うLtCMH(Long-tail CMH)を提案する。 LtCMHはまず、各モダリティの個性間の依存性を最小化し、これらのモダリティの共通性を高めることで、異なるモダリティの個性と共通性をマイニングするオートエンコーダを採用する。次に、個性と共通性を各モダリティから抽出した直接特徴と動的に組み合わせて、テールラベルの表現を豊かにするメタ特徴を生成し、メタ特徴を二値化してハッシュコードを生成する。 LtCMHは、ロングテールデータセットの最先端ベースラインを著しく上回り、バランスの取れたラベルを持つデータセットでもより良い(あるいは同等の)性能を達成する。 Existing Cross Modal Hashing (CMH) methods are mainly designed for balanced data, while imbalanced data with a long-tail distribution is more common in the real world. Several long-tail hashing methods have been proposed, but they cannot adapt to multi-modal data, due to the complex interplay between labels and the individuality and commonality information of multi-modal data. Furthermore, CMH methods mostly mine the commonality of multi-modal data to learn hash codes, which may override tail labels encoded by the individuality of respective modalities. In this paper, we propose LtCMH (Long-tail CMH) to handle imbalanced multi-modal data. LtCMH first adopts auto-encoders to mine the individuality and commonality of different modalities by minimizing the dependency between the individuality of respective modalities and by enhancing the commonality of these modalities. Then it dynamically combines the individuality and commonality with direct features extracted from respective modalities to create meta features that enrich the representation of tail labels, and binarizes the meta features to generate hash codes. 
LtCMH significantly outperforms state-of-the-art baselines on long-tail datasets and achieves better (or comparable) performance on datasets with balanced labels.	翻訳日:2022-11-29 14:45:48 公開日:2022-11-28
# YOLOv5モデルの海洋環境における微小物体検出への応用 Application of the YOLOv5 Model for the Detection of Microobjects in the Marine Environment ( http://arxiv.org/abs/2211.15218v1 ) ライセンス: Link先を確認	Aleksandr N. Grekov (1)(2), Yurii E. Shishkin, Sergei S. Peliushenko, Aleksandr S. Mavrin, ((1) Institute of Natural and Technical Systems, (2) Sevastopol State University)	(参考訳) 海洋環境における微小物体の自動検出と認識の問題を解決するためのYOLOv5機械学習モデルの有効性について検討した。マイクロプランクトンとマイクロプラスチックのサンプルを作成し,画像認識ニューラルネットワークを訓練するために,分類済み画像のデータベースを収集した。訓練されたネットワークを用いて、写真やビデオ画像中の微小物体をリアルタイムで見つける実験結果を示す。実験により, 海洋環境における微小物体の検出問題の解法において, 提案モデルが手動認識に匹敵する高い効率性を持つことが示された。 The efficiency of using the YOLOv5 machine learning model for solving the problem of automatic detection and recognition of micro-objects in the marine environment is studied. Samples of microplankton and microplastics were prepared, from which a database of classified images was collected for training an image recognition neural network. The results of experiments using a trained network to find micro-objects in photo and video images in real time are presented. Experimental studies have shown high efficiency, comparable to manual recognition, of the proposed model in solving problems of detecting micro-objects in the marine environment.	翻訳日:2022-11-29 14:37:47 公開日:2022-11-28
# ビデオキャプションにおける周波数拡散に対する精細セマンティックエンハンスメント Refined Semantic Enhancement towards Frequency Diffusion for Video Captioning ( http://arxiv.org/abs/2211.15076v1 ) ライセンス: Link先を確認	Xian Zhong, Zipeng Li, Shuqin Chen, Kui Jiang, Chen Chen and Mang Ye	(参考訳) ビデオキャプションは、与えられたビデオを正確に記述する自然言語文を生成することを目的としている。既存の手法では、エンコードフェーズでよりリッチな視覚的表現を探索したり、復号能力を向上させることで良好な生成が得られる。しかし、長い尾の問題はこれらの低周波トークンに対する試みを妨げ、これは稀に起こるが重要な意味論を持ち、詳細な生成において重要な役割を果たす。本稿では,不適切なトークンの言語表現を常に知覚するキャプションモデルである周波数拡散(rsfd)に対する新しい洗練された意味的拡張法を提案する。具体的には、低周波トークンの意味を理解するために、周波数対応拡散(FAD)モジュールを提案する。このようにして、トークンの吸収を不十分に促進してキャプションを洗練する。 fadに基づき、拡散過程によって引き起こされる高周波トークンの情報損失を補償するために、分散セマンティックスーパーバイザ(dss)モジュールを設計し、低周波トークンのセマンティクスをさらに強調し、ロングテール問題を軽減する。 RSFDは、MSR-VTTとMSVDという2つのベンチマークデータセット上で最先端の手法よりも優れており、低周波トークンセマンティクスの強化が競合する生成効果が得られることを示している。コードはhttps://github.com/lzp870/RSFDで入手できる。 Video captioning aims to generate natural language sentences that describe the given video accurately. Existing methods obtain favorable generation by exploring richer visual representations in encode phase or improving the decoding ability. However, the long-tailed problem hinders these attempts at low-frequency tokens, which rarely occur but carry critical semantics, playing a vital role in the detailed generation. In this paper, we introduce a novel Refined Semantic enhancement method towards Frequency Diffusion (RSFD), a captioning model that constantly perceives the linguistic representation of the infrequent tokens. Concretely, a Frequency-Aware Diffusion (FAD) module is proposed to comprehend the semantics of low-frequency tokens to break through generation limitations. In this way, the caption is refined by promoting the absorption of tokens with insufficient occurrence. Based on FAD, we design a Divergent Semantic Supervisor (DSS) module to compensate for the information loss of high-frequency tokens brought by the diffusion process, where the semantics of low-frequency tokens is further emphasized to alleviate the long-tailed problem. 
Extensive experiments indicate that RSFD outperforms the state-of-the-art methods on two benchmark datasets, i.e., MSR-VTT and MSVD, demonstrating that enhancing the semantics of low-frequency tokens can yield competitive generation quality. Code is available at https://github.com/lzp870/RSFD.	翻訳日:2022-11-29 14:36:40 公開日:2022-11-28
# Decoupled Prototypeal Networkによる一般化カテゴリー探索 Generalized Category Discovery with Decoupled Prototypical Network ( http://arxiv.org/abs/2211.15115v1 ) ライセンス: Link先を確認	Wenbin An, Feng Tian, Qinghua Zheng, Wei Ding, QianYing Wang, Ping Chen	(参考訳) Generalized Category Discovery (GCD)は、既知のカテゴリのみをラベル付けした別のデータセットに基づいて、ラベルなしデータの集合から既知のカテゴリと新しいカテゴリの両方を認識することを目的としている。既知のカテゴリと新しいカテゴリの違いを考慮せずに、現在の手法はそれらを結合的に学習し、モデルの一般化と識別能力を損なう。さらに,これらのモデルがラベル付きデータからラベルなしデータへ,カテゴリ固有の知識を明示的に伝達することを防止し,高レベルのセマンティック情報やモデル性能を損なうことができる。上記の制約を緩和するために,Decoupled Prototypeal Network (DPN) と呼ばれる新しいモデルを提案する。カテゴリプロトタイプの両部マッチング問題を定式化することにより、DPNは、既知のカテゴリと新しいカテゴリを分離して、異なるトレーニング目標を効果的に達成するだけでなく、ラベル付きおよびラベルなしデータの既知のカテゴリを整列させて、カテゴリ固有の知識を明示的に伝達し、ハイレベルなセマンティクスを捉えることができる。さらに、DPNは、SPL(Semantic-aware Prototypeal Learning)によって、既知のカテゴリと新しいカテゴリの両方のより差別的な特徴を学習することができる。意味のある意味情報を取得することに加えて、SPLは意味重み付けされたソフトアロケーションによって硬い擬似ラベルのノイズを軽減することもできる。大規模な実験により、DPNは複数のベンチマークデータセットのすべての評価指標に対して、最先端のモデルよりも大きなマージンで優れていることが示された。コードとデータはhttps://github.com/lackel/dpnで入手できる。 Generalized Category Discovery (GCD) aims to recognize both known and novel categories from a set of unlabeled data, based on another dataset labeled with only known categories. Without considering differences between known and novel categories, current methods learn about them in a coupled manner, which can hurt model's generalization and discriminative ability. Furthermore, the coupled training approach prevents these models transferring category-specific knowledge explicitly from labeled data to unlabeled data, which can lose high-level semantic information and impair model performance. To mitigate above limitations, we present a novel model called Decoupled Prototypical Network (DPN). 
By formulating a bipartite matching problem for category prototypes, DPN can not only decouple known and novel categories to achieve different training targets effectively, but also align known categories in labeled and unlabeled data to transfer category-specific knowledge explicitly and capture high-level semantics. Furthermore, DPN can learn more discriminative features for both known and novel categories through our proposed Semantic-aware Prototypical Learning (SPL). Besides capturing meaningful semantic information, SPL can also alleviate the noise of hard pseudo labels through semantic-weighted soft assignment. Extensive experiments show that DPN outperforms state-of-the-art models by a large margin on all evaluation metrics across multiple benchmark datasets. Code and data are available at https://github.com/Lackel/DPN.	翻訳日:2022-11-29 14:36:15 公開日:2022-11-28
# 教師付き言語モデルのFew-Shotシナリオにおける距離メトリック学習損失関数 Distance Metric Learning Loss Functions in Few-Shot Scenarios of Supervised Language Models Fine-Tuning ( http://arxiv.org/abs/2211.15195v1 ) ライセンス: Link先を確認	Witold Sosnowski, Karolina Seweryn, Anna Wróblewska, Piotr Gawrysiak	(参考訳) 本稿では,分類タスクにおける言語モデルの教師付き微調整に対する距離メトリック学習(DML)損失関数の影響について分析する。 SentEval Transfer Tasksの既知のデータセットを実験した。実験により,DML損失関数の適用により,少数ショットのシナリオにおいてRoBERTa-largeモデルの下流分類タスクの性能が向上することが示された。SoftTriple損失を用いて微調整したモデルは、標準のカテゴリカルクロスエントロピー損失関数を用いたモデルよりも約2.89ポイント(トレーニングデータセットに応じて0.04から13.48ポイント)優れた結果が得られる。さらに,モデルの信頼性を評価し,結果を説明するために,説明可能性技術を用いた総合的な分析を行った。 This paper presents an analysis of the influence of Distance Metric Learning (DML) loss functions on the supervised fine-tuning of language models for classification tasks. We experimented with known datasets from SentEval Transfer Tasks. Our experiments show that applying a DML loss function can increase performance on downstream classification tasks of RoBERTa-large models in few-shot scenarios. Models fine-tuned with SoftTriple loss can achieve better results than models with a standard categorical cross-entropy loss function by about 2.89 percentage points (from 0.04 to 13.48 percentage points, depending on the training dataset). Additionally, we carried out a comprehensive analysis with explainability techniques to assess the models' reliability and explain their results.	翻訳日:2022-11-29 14:35:34 公開日:2022-11-28
# 数発自然言語分類のための距離メトリック学習の再検討 Revisiting Distance Metric Learning for Few-Shot Natural Language Classification ( http://arxiv.org/abs/2211.15202v1 ) ライセンス: Link先を確認	Witold Sosnowski, Anna Wróblewska, Karolina Seweryn, Piotr Gawrysiak	(参考訳) 距離メトリック学習(DML)は近年,画像処理において注目されている。本稿では,少数ショット学習の設定下で,自然言語処理(NLP)分類タスクにおける教師付き微調整言語モデルへの影響を分析した。我々は、既知のSentEval Transfer Tasksデータセット上でRoBERTa言語モデルを訓練する際のDML損失関数について検討した。また、モデル推論中にプロキシベースのDML損失を利用する可能性についても分析した。体系的な実験により,少数ショット学習の条件下では,特にプロキシに基づくDML損失が,教師付き言語モデルの微調整と推論に正の影響を与えうることが示された。 CCE(カテゴリカルクロスエントロピー損失)とProxyAnchor Lossの組み合わせで調整されたモデルは、平均して最高のパフォーマンスを示し、CCEのみのモデルを約3.27ポイント、トレーニングデータセットによっては最大10.38ポイント上回る。 Distance Metric Learning (DML) has attracted much attention in image processing in recent years. This paper analyzes its impact on supervised fine-tuning language models for Natural Language Processing (NLP) classification tasks under few-shot learning settings. We investigated several DML loss functions in training RoBERTa language models on known SentEval Transfer Tasks datasets. We also analyzed the possibility of using proxy-based DML losses during model inference. Our systematic experiments have shown that under few-shot learning settings, particularly proxy-based DML losses can positively affect the fine-tuning and inference of a supervised language model. Models tuned with a combination of CCE (categorical cross-entropy loss) and ProxyAnchor Loss have, on average, the best performance and outperform models with only CCE by about 3.27 percentage points -- up to 10.38 percentage points depending on the training dataset.	翻訳日:2022-11-29 14:35:21 公開日:2022-11-28
# 逆向き知識蒸留による雷速映像異常検出 Lightning Fast Video Anomaly Detection via Adversarial Knowledge Distillation ( http://arxiv.org/abs/2211.15597v1 ) ライセンス: Link先を確認	Nicolae-Catalin Ristea, Florinel-Alin Croitoru, Dana Dascalescu, Radu Tudor Ionescu, Fahad Shahbaz Khan, Mubarak Shah	(参考訳) 本稿では,複数の高精度な対象レベルの教師モデルから知識を抽出し,異常検出を学習する,ビデオ中の異常検出のための非常に高速なフレームレベルモデルを提案する。学生の忠実度を向上させるために,教師の低分解能な異常マップを,標準と対角蒸留を併用して蒸留し,各教師に対して,目標と生成した異常マップを区別する対角ディミネータを導入する。我々は,3つのベンチマーク (avenue, shanghaitech, ucsd ped2) について実験を行い,提案手法が最速の競合手法よりも7倍以上高速で,オブジェクト中心モデルよりも28～62倍高速であることを示した。また,従来の1480fpsの低速化により,速度と精度のトレードオフが最良であることを示す。さらに、アーキテクチャ設計の選択を正当化するための包括的なアブレーション研究を実施します。 We propose a very fast frame-level model for anomaly detection in video, which learns to detect anomalies by distilling knowledge from multiple highly accurate object-level teacher models. To improve the fidelity of our student, we distill the low-resolution anomaly maps of the teachers by jointly applying standard and adversarial distillation, introducing an adversarial discriminator for each teacher to distinguish between target and generated anomaly maps. We conduct experiments on three benchmarks (Avenue, ShanghaiTech, UCSD Ped2), showing that our method is over 7 times faster than the fastest competing method, and between 28 and 62 times faster than object-centric models, while obtaining comparable results to recent methods. Our evaluation also indicates that our model achieves the best trade-off between speed and accuracy, due to its previously unheard-of speed of 1480 FPS. In addition, we carry out a comprehensive ablation study to justify our architectural design choices.	翻訳日:2022-11-29 14:28:03 公開日:2022-11-28
# 逆可解性とセキュリティ : フェデレーション学習への応用 Inverse Solvability and Security with Applications to Federated Learning ( http://arxiv.org/abs/2211.14115v2 ) ライセンス: Link先を確認	Tomasz Piotrowski, Matthias Frey, Renato L.G. Cavalcante, Rafail Ismailov	(参考訳) 本稿では,一般線形フォワードモデルにおける逆可解性と安全性の概念を紹介し,連体学習で用いられるモデルに適用する方法を示す。本稿では,本論文で定義した逆可解性とセキュリティが異なるようなモデルの例を示す。また,フェデレート学習の繰り返しに参加する多数のユーザが,解答可能性とセキュリティを高めるためにどのように活用できるかを示す。最後に、非線形ケースを含む提示概念の拡張について論じる。 We introduce the concepts of inverse solvability and security for a generic linear forward model and demonstrate how they can be applied to models used in federated learning. We provide examples of such models which differ in the resulting inverse solvability and security as defined in this paper. We also show how the large number of users participating in a given iteration of federated learning can be leveraged to increase both solvability and security. Finally, we discuss possible extensions of the presented concepts including the nonlinear case.	翻訳日:2022-11-29 14:19:24 公開日:2022-11-28
# 因子モデルにおける二重強近傍 Doubly robust nearest neighbors in factor models ( http://arxiv.org/abs/2211.14297v2 ) ライセンス: Link先を確認	Raaz Dwivedi, Katherine Tian, Sabina Tomkins, Predrag Klasnja, Susan Murphy, Devavrat Shah	(参考訳) 本稿では,複数のユニットが複数の時点に複数の処理を割り当てるパネルデータ設定において,各ユニットが一定の確率でサンプリングされた非事実的推論のための改良型を提案する。我々はこの推定器を2倍に頑健な近接推定器と呼び、各単位に対応する平均パラメータにバインドされた高い確率の非漸近誤差を与える。私たちの保証は、二重ロバストな推定器は、これらの設定のために事前の作業で分析された近隣の推定器と比較して、エラーの(ほぼ)クアドドラティックな改善を提供することを示している。 In this technical note, we introduce an improved variant of nearest neighbors for counterfactual inference in panel data settings where multiple units are assigned multiple treatments over multiple time points, each sampled with constant probabilities. We call this estimator a doubly robust nearest neighbor estimator and provide a high probability non-asymptotic error bound for the mean parameter corresponding to each unit at each time. Our guarantee shows that the doubly robust estimator provides a (near-)quadratic improvement in the error compared to nearest neighbor estimators analyzed in prior work for these settings.	翻訳日:2022-11-29 14:19:17 公開日:2022-11-28
# 欧州のAI責任指令 -- ハーフハードアプローチの批判と今後の教訓 The European AI Liability Directives -- Critique of a Half-Hearted Approach and Lessons for the Future ( http://arxiv.org/abs/2211.13960v2 ) ライセンス: Link先を確認	Philipp Hacker	(参考訳) aiシステムの最適責任フレームワークは、世界中で未解決の問題のままである。欧州委員会は2022年9月に、新たなai責任指令と製品責任指令の改訂という2つの提案を前進させた。それらは、EUにおけるAI規制の最終的かつ待望の基盤となっている。重要なことに、責任提案とEUのAI法は本質的に相互運用されており、後者は被災者の個人的権利を一切含んでおらず、前者はAI開発と展開に関する特定の実質的な規則を欠いている。総合すると、これらの行為は、米国や他の国に大きな影響を与えるai規制においてブリュッセル効果を引き起こす可能性がある。この論文は3つの新しい貢献をする。まず、欧州委員会の提案を詳細に検討し、正しい方向に進む一方で、最終的にはハーフハーフハーフのアプローチを表現している。もし前向きに制定されたら、EUにおけるAIの責任は、主に証拠メカニズムの開示と、欠陥、欠陥、因果関係に関する狭義の予測にかかっている。第二に、この記事は修正を提案するが、これは論文の最後にAnnexで収集される。第3に、AIがもたらす重要なリスクの分析に基づいて、最終部では、EU以降におけるAIの責任と規制の将来への道のりを図示している。これには、AI責任のための包括的なフレームワーク、イノベーションをサポートするための条項、非差別/アルゴリズムフェアネスの拡張、説明可能なAI、持続可能性が含まれる。我々は、AI法における持続可能性影響評価と、債務制度における持続可能な設計欠陥を通じて、持続可能なAI規制を飛躍的に開始することを提案する。このようにして、この法律は公正なAIとXAIだけでなく、持続可能なAI(SAI)にも役立ちます。 The optimal liability framework for AI systems remains an unsolved problem across the globe. In a much-anticipated move, the European Commission advanced two proposals outlining the European approach to AI liability in September 2022: a novel AI Liability Directive and a revision of the Product Liability Directive. They constitute the final, and much-anticipated, cornerstone of AI regulation in the EU. Crucially, the liability proposals and the EU AI Act are inherently intertwined: the latter does not contain any individual rights of affected persons, and the former lack specific, substantive rules on AI development and deployment. Taken together, these acts may well trigger a Brussels effect in AI regulation, with significant consequences for the US and other countries. This paper makes three novel contributions. 
First, it examines in detail the Commission proposals and shows that, while making steps in the right direction, they ultimately represent a half-hearted approach: if enacted as foreseen, AI liability in the EU will primarily rest on disclosure of evidence mechanisms and a set of narrowly defined presumptions concerning fault, defectiveness and causality. Hence, second, the article suggests amendments, which are collected in an Annex at the end of the paper. Third, based on an analysis of the key risks AI poses, the final part of the paper maps out a road for the future of AI liability and regulation, in the EU and beyond. This includes: a comprehensive framework for AI liability; provisions to support innovation; an extension to non-discrimination/algorithmic fairness, as well as explainable AI; and sustainability. I propose to jump-start sustainable AI regulation via sustainability impact assessments in the AI Act and sustainable design defects in the liability regime. In this way, the law may help spur not only fair AI and XAI, but potentially also sustainable AI (SAI).	翻訳日:2022-11-29 14:19:10 公開日:2022-11-28
# MIAD: 教師なし異常検出のための保守検査データセット MIAD: A Maintenance Inspection Dataset for Unsupervised Anomaly Detection ( http://arxiv.org/abs/2211.13968v2 ) ライセンス: Link先を確認	Tianpeng Bao, Jiadong Chen, Wei Li, Xiang Wang, Jingjing Fei, Liwei Wu, Rui Zhao, Ye Zheng	(参考訳) 視覚異常検出は,製造工程中の製品の欠陥を見つけるための製造検査だけでなく,特に屋外の最適作業条件を維持するためのメンテナンス検査においても重要な役割を担っている。欠陥サンプルの不足により,近年,教師なし異常検出が注目されている。しかし, 監視不能な異常検出のための既存のデータセットは製造検査に偏り, 様々なカメラ視点, 乱雑な背景, 長期作業後の物体表面の劣化など, 外部制御されていない環境下での保守検査を考慮しない。各種の屋外産業シナリオにおいて,100K以上の高分解能カラー画像を含むMIADデータセットの総合的な保守検査に焦点をあてた。このデータセットは3Dグラフィックソフトウェアによって生成され、表面および論理異常の両方をピクセル精度の基底真理でカバーしている。非教師付き異常検出のための代表アルゴリズムの広範囲な評価を行い、MIADとそれに対応する実験結果が屋外教師なし異常検出タスクにおける研究コミュニティに刺激を与えると期待する。価値と関連する今後の作業は、私たちの新しいデータセットから生み出すことができます。 Visual anomaly detection plays a crucial role in not only manufacturing inspection to find defects of products during manufacturing processes, but also maintenance inspection to keep equipment in optimum working condition particularly outdoors. Due to the scarcity of the defective samples, unsupervised anomaly detection has attracted great attention in recent years. However, existing datasets for unsupervised anomaly detection are biased towards manufacturing inspection, not considering maintenance inspection which is usually conducted under outdoor uncontrolled environment such as varying camera viewpoints, messy background and degradation of object surface after long-term working. We focus on outdoor maintenance inspection and contribute a comprehensive Maintenance Inspection Anomaly Detection (MIAD) dataset which contains more than 100K high-resolution color images in various outdoor industrial scenarios. This dataset is generated by a 3D graphics software and covers both surface and logical anomalies with pixel-precise ground truth. 
Extensive evaluations of representative algorithms for unsupervised anomaly detection are conducted, and we expect MIAD and corresponding experimental results can inspire research community in outdoor unsupervised anomaly detection tasks. Worthwhile and related future work can be spawned from our new dataset.	翻訳日:2022-11-29 14:18:27 公開日:2022-11-28
# ILSGAN: 教師なし前地上セグメンテーションのための独立層合成 ILSGAN: Independent Layer Synthesis for Unsupervised Foreground-Background Segmentation ( http://arxiv.org/abs/2211.13974v2 ) ライセンス: Link先を確認	Qiran Zou, Yu Yang, Wing Yin Cheung, Chang Liu, Xiangyang Ji	(参考訳) 非教師なしフォアグラウンド・バックグラウンド・セグメンテーションは、乱雑な背景から、特に層状GAN(Generative Adversarial Network)アプローチによって、非常に有望な対象を抽出することを目的としている。しかしながら、人間のアノテーションがなければ、それらは一般に「情報漏洩」と呼ばれる非無視的な意味と視覚的混乱を伴う前景層と背景層を生成する傾向があり、それによって生成されたセグメンテーションマスクが顕著に劣化する。この問題を軽減するために,独立層合成GAN (ILSGAN) と呼ばれる,単純かつ効果的な明示的な層独立性モデリング手法を提案する。具体的には、前景と背景の可視領域間の相互情報の最小化を目標とし、層間独立を促進する。理論的および実験的分析により、明示的な層独立性モデリングは情報漏洩を抑制するために重要であり、セグメンテーション性能の向上に寄与する。また,我々のilsganは,複雑な実世界のデータに対して,最先端の生成品質とセグメンテーション性能を実現している。 Unsupervised foreground-background segmentation aims at extracting salient objects from cluttered backgrounds, where Generative Adversarial Network (GAN) approaches, especially layered GANs, show great promise. However, without human annotations, they are typically prone to produce foreground and background layers with non-negligible semantic and visual confusion, dubbed "information leakage", resulting in notable degeneration of the generated segmentation mask. To alleviate this issue, we propose a simple-yet-effective explicit layer independence modeling approach, termed Independent Layer Synthesis GAN (ILSGAN), pursuing independent foreground-background layer generation by encouraging their discrepancy. Specifically, it targets minimizing the mutual information between visible and invisible regions of the foreground and background to spur interlayer independence. Through in-depth theoretical and experimental analyses, we justify that explicit layer independence modeling is critical to suppressing information leakage and contributes to impressive segmentation performance gains. Also, our ILSGAN achieves strong state-of-the-art generation quality and segmentation performance on complex real-world data.	翻訳日:2022-11-29 14:18:01 公開日:2022-11-28
# 形勢を逆転する: ML評価のためのバイアス付き・不均衡・動的な表形式データセット Turning the Tables: Biased, Imbalanced, Dynamic Tabular Datasets for ML Evaluation ( http://arxiv.org/abs/2211.13358v2 ) ライセンス: Link先を確認	Sérgio Jesus, José Pombal, Duarte Alves, André Cruz, Pedro Saleiro, Rita P. Ribeiro, João Gama, Pedro Bizarro	(参考訳) 現実的なデータセットで新しい手法を評価することは、ML研究の発展と実務者への普及において重要な役割を果たす。近年、コンピュータビジョンやNLPタスク向けに公開される非構造化データリソースが大幅に増加している。しかし、多くのハイステークスな領域で広く使われている表形式データは後れを取っている。このギャップを埋めるために、プライバシーを保護した大規模で現実的な表形式データセット群として初めて一般公開されるBank Account Fraud (BAF)を紹介する。このスイートは、匿名化された実世界の銀行口座開設不正検出データセットに、最先端の表形式データ生成技術を適用して生成した。この設定には、時間的ダイナミクスや著しいクラス不均衡など、実世界のアプリケーションでよく見られる課題が伴う。さらに、実務者がML手法の性能と公平性の両方をストレステストできるように、BAFの各データセットバリアントには特定の種類のデータバイアスが含まれている。このリソースにより、新規および既存の手法を評価するための、より現実的で完全かつ堅牢なテストベッドを研究コミュニティに提供することを目指す。 Evaluating new techniques on realistic datasets plays a crucial role in the development of ML research and its broader adoption by practitioners. In recent years, there has been a significant increase of publicly available unstructured data resources for computer vision and NLP tasks. However, tabular data -- which is prevalent in many high-stakes domains -- has been lagging behind. To bridge this gap, we present Bank Account Fraud (BAF), the first publicly available privacy-preserving, large-scale, realistic suite of tabular datasets. The suite was generated by applying state-of-the-art tabular data generation techniques on an anonymized, real-world bank account opening fraud detection dataset. This setting carries a set of challenges that are commonplace in real-world applications, including temporal dynamics and significant class imbalance. Additionally, to allow practitioners to stress test both performance and fairness of ML methods, each dataset variant of BAF contains specific types of data bias. With this resource, we aim to provide the research community with a more realistic, complete, and robust test bed to evaluate novel and existing methods.	翻訳日:2022-11-29 14:17:36 公開日:2022-11-28
# 最適目的推定を用いた知覚指向単一画像超解法 Perception-Oriented Single Image Super-Resolution using Optimal Objective Estimation ( http://arxiv.org/abs/2211.13676v2 ) ライセンス: Link先を確認	Seung Ho Park, Young Su Moon, Nam Ik Cho	(参考訳) 知覚的および敵対的損失で訓練されたシングルイメージスーパーレゾリューション(sisr)ネットワークは、l1やl2のような歪み指向損失で訓練されたネットワークと比較して高いコントラスト出力を提供する。しかし, 画像の局所的な多様な形状を正確に復元するには, 単一の知覚損失を用いることが不十分であり, 望ましくない人工物や不自然な細部が生じることが示されている。このため, 知覚, 対角, 歪み損失などの様々な損失の組み合わせが試みられているが, 最適な組み合わせを見つけることは困難である。そこで本稿では,高分解能出力の全体領域において,各領域に最適な目標を適用したSISRフレームワークを提案する。具体的には、所定の低解像度(LR)入力に対して最適な客観的マップを推定する予測モデルと、対応するSR出力を生成するために対象対象マップを適用する生成モデルとからなる。生成モデルは,本提案した目的の集合を表す対象軌道上で訓練され,単一のネットワークが,軌道上の複合的な損失に対応する様々なSR結果を学ぶことができる。予測モデルは、一対のLR画像と、対象軌道から探索された対応する最適目的写像を用いて訓練される。 5つのベンチマーク実験の結果,提案手法はLPIPS, DISTS, PSNR, SSIM測定値において,最先端の認識駆動SR法よりも優れていた。また,視覚効果は,知覚指向の再構成における手法の優位性を示す。コードとモデルはhttps://github.com/seungho-snu/srooeで入手できる。 Single-image super-resolution (SISR) networks trained with perceptual and adversarial losses provide high-contrast outputs compared to those of networks trained with distortion-oriented losses, such as L1 or L2. However, it has been shown that using a single perceptual loss is insufficient for accurately restoring locally varying diverse shapes in images, often generating undesirable artifacts or unnatural details. For this reason, combinations of various losses, such as perceptual, adversarial, and distortion losses, have been attempted, yet it remains challenging to find optimal combinations. Hence, in this paper, we propose a new SISR framework that applies optimal objectives for each region to generate plausible results in overall areas of high-resolution outputs. Specifically, the framework comprises two models: a predictive model that infers an optimal objective map for a given low-resolution (LR) input and a generative model that applies a target objective map to produce the corresponding SR output. 
The generative model is trained over our proposed objective trajectory representing a set of essential objectives, which enables the single network to learn various SR results corresponding to combined losses on the trajectory. The predictive model is trained using pairs of LR images and corresponding optimal objective maps searched from the objective trajectory. Experimental results on five benchmarks show that the proposed method outperforms state-of-the-art perception-driven SR methods in LPIPS, DISTS, PSNR, and SSIM metrics. The visual results also demonstrate the superiority of our method in perception-oriented reconstruction. The code and models are available at https://github.com/seungho-snu/SROOE.	翻訳日:2022-11-29 14:17:19 公開日:2022-11-28
# エッジコンピューティングにおける分散CNN推論高速化の設計と試作 Design and Prototyping Distributed CNN Inference Acceleration in Edge Computing ( http://arxiv.org/abs/2211.13778v2 ) ライセンス: Link先を確認	Zhongtian Dong, Nan Li, Alexandros Iosifidis, Qi Zhang	(参考訳) ディープラーニングを使用した時間クリティカルなIoTアプリケーションにとって、分散コンピューティングによる推論アクセラレーションは、厳しい期限を満たすための有望なアプローチである。本稿では,3つのラズベリーPi 4を用いた新しい分散推論加速法HALPの動作プロトタイプを実装した。 HALPはエッジコンピューティングにおけるエッジデバイス(ED)間のシームレスなコラボレーションを設計することで推論を加速する。セグメント分割に基づくタスク分割比を最適化することにより,協調ed間の通信と計算の並列化を最大化する。実験の結果,分散推論HALPはVGG-16の1.7倍の推論加速を達成することがわかった。次に,分散推論と従来のニューラルネットワークモデル圧縮を組み合わせることで,mobilenet-v1の縮小ハイパーパラメータを設定する。このように、推論をさらに加速することができるが、推測精度損失のコストがかかる。レイテンシと精度のバランスをとるために,遅延制約の中で最高の精度のモデルを選択するための動的モデル選択を提案する。分散推論halpを用いたモデル選択により,従来のスタンドアロン計算に比べてサービス信頼性が著しく向上することが示された。 For time-critical IoT applications using deep learning, inference acceleration through distributed computing is a promising approach to meet a stringent deadline. In this paper, we implement a working prototype of a new distributed inference acceleration method HALP using three raspberry Pi 4. HALP accelerates inference by designing a seamless collaboration among edge devices (EDs) in Edge Computing. We maximize the parallelization between communication and computation among the collaborative EDs by optimizing the task partitioning ratio based on the segment-based partitioning. Experimental results show that the distributed inference HALP achieves 1.7x inference acceleration for VGG-16. Then, we combine distributed inference with conventional neural network model compression by setting up different shrinking hyperparameters for MobileNet-V1. In this way, we can further accelerate inference but at the cost of inference accuracy loss. To strike a balance between latency and accuracy, we propose dynamic model selection to select a model which provides the highest accuracy within the latency constraint. 
It is shown that the model selection with distributed inference HALP can significantly improve service reliability compared to the conventional stand-alone computation.	翻訳日:2022-11-29 14:16:51 公開日:2022-11-28
# 深部グラフ表現学習を用いたエンド・ツー・エンド風車ウェイクモデリング End-to-end Wind Turbine Wake Modelling with Deep Graph Representation Learning ( http://arxiv.org/abs/2211.13649v2 ) ライセンス: Link先を確認	Siyi Li, Mingrui Zhang, Matthew D. Piggott	(参考訳) 風力タービンのウェイクモデリングは、正確な資源評価、レイアウトの最適化、風力発電所の運用管理において重要な役割を担っている。本研究では,グラフニューラルネットワークと呼ばれる最先端グラフ表現学習法に基づいて,風車ウェイク表現のためのサロゲートモデルを提案する。提案したエンドツーエンドディープラーニングモデルは、非構造メッシュ上で直接動作し、高忠実度データに対して検証され、様々な入口条件やタービンヨー角度に対して高精度な3次元流れ場予測を行う能力を示している。ここで用いられる特定のグラフニューラルネットワークモデルは、目に見えないデータにうまく一般化し、一般的なグラフニューラルネットワークと比較して過度なスムーシングに敏感でないことを示す。実世界の風力発電所に基づくケーススタディでは,提案手法による大規模発電予測の可能性をさらに実証する。さらに,提案するグラフニューラルネットワークフレームワークは柔軟かつ高度に汎用的であり,非構造メッシュ上の任意の定常数値流体力学シミュレーションに適用可能である。 Wind turbine wake modelling is of crucial importance to accurate resource assessment, to layout optimisation, and to the operational control of wind farms. This work proposes a surrogate model for the representation of wind turbine wakes based on a state-of-the-art graph representation learning method termed a graph neural network. The proposed end-to-end deep learning model operates directly on unstructured meshes and has been validated against high-fidelity data, demonstrating its ability to rapidly make accurate 3D flow field predictions for various inlet conditions and turbine yaw angles. The specific graph neural network model employed here is shown to generalise well to unseen data and is less sensitive to over-smoothing compared to common graph neural networks. A case study based upon a real world wind farm further demonstrates the capability of the proposed approach to predict farm scale power generation. Moreover, the proposed graph neural network framework is flexible and highly generic and as formulated here can be applied to any steady state computational fluid dynamics simulations on unstructured meshes.	翻訳日:2022-11-29 14:16:35 公開日:2022-11-28
# 時間臨界IoTアプリケーションのためのロバストエッジインテリジェンスを実現するセマンティック通信 Semantic Communication Enabling Robust Edge Intelligence for Time-Critical IoT Applications ( http://arxiv.org/abs/2211.13787v2 ) ライセンス: Link先を確認	Andrea Cavagna, Nan Li, Alexandros Iosifidis, Qi Zhang	(参考訳) 本稿では、時間クリティカルなIoTアプリケーションのためのセマンティック通信を用いて、堅牢なエッジインテリジェンスを設計することを目的とする。画像DCT係数が推定精度に与える影響を系統的に解析し、まず最も有意義なタスクデータを送信し、オフロードのためのチャネル非依存の有効性符号化を提案する。このスキームは利用可能な全ての通信リソースをうまく活用し、伝送遅延と推論精度のバランスを取ることができる。次に、畳み込みニューラルネットワーク(CNN)トレーニングのための新しい画像拡張プロセスを実装し、元のCNNモデルをロバストCNNモデルに変換することにより、有効デコーディングを設計する。提案手法を用いて,Robust MobileNet-v2 と Robust ResNet-50 を生成する。提案するエッジインテリジェンスフレームワークは,提案する有効性エンコーディングと有効性復号で構成される。実験の結果,ロバストなcnnモデルを用いたデコードの有効性は,チャネルエラーや通信資源の制限による様々な画像歪みに対して一貫して向上することがわかった。セマンティクス通信を用いたエッジインテリジェンスフレームワークは、レイテンシとデータレートの制約、特に超厳密な期限と低いデータレート下での従来のアプローチを大きく上回っている。 This paper aims to design robust Edge Intelligence using semantic communication for time-critical IoT applications. We systematically analyze the effect of image DCT coefficients on inference accuracy and propose the channel-agnostic effectiveness encoding for offloading by transmitting the most meaningful task data first. This scheme can well utilize all available communication resource and strike a balance between transmission latency and inference accuracy. Then, we design an effectiveness decoding by implementing a novel image augmentation process for convolutional neural network (CNN) training, through which an original CNN model is transformed into a Robust CNN model. We use the proposed training method to generate Robust MobileNet-v2 and Robust ResNet-50. The proposed Edge Intelligence framework consists of the proposed effectiveness encoding and effectiveness decoding. The experimental results show that the effectiveness decoding using the Robust CNN models perform consistently better under various image distortions caused by channel errors or limited communication resource. 
The proposed Edge Intelligence framework using semantic communication significantly outperforms the conventional approach under latency and data rate constraints, in particular, under ultra stringent deadlines and low data rate.	翻訳日:2022-11-29 14:16:20 公開日:2022-11-28
# ネットワーク支援による潜在空間進化を用いた辞書攻撃向け2次元・3次元マスターフェイスの生成 Generating 2D and 3D Master Faces for Dictionary Attacks with a Network-Assisted Latent Space Evolution ( http://arxiv.org/abs/2211.13964v2 ) ライセンス: Link先を確認	Tomer Friedlander, Ron Shmelkin, Lior Wolf	(参考訳) マスターフェイスとは、母集団の高い割合に対して顔ベースの本人認証を通過する顔画像である。これらの顔を使えば、ユーザ情報に一切アクセスすることなく、高い成功確率で任意のユーザになりすますことができる。本研究では、StyleGAN顔生成器の潜在埋め込み空間で進化的アルゴリズムを用い、2次元および3次元の顔認証モデルに対してこれらの顔を最適化する。2次元の顔認証については、複数の進化戦略を比較し、適応度評価を増やすことなく有望なサンプルへ探索を導くニューラルネットワークを用いる新しいアプローチを提案する。その結果、6つの主要な深層顔認識システムに対して、10個未満のマスターフェイスでLFWまたはRFWデータセットのアイデンティティのかなりの割合をカバーできることが示された。3次元では、2次元のStyleGAN2生成器で顔を生成し、深層3次元顔再構成ネットワークで3次元構造を予測する。2つの異なる3次元顔認識システムを用いた場合、40%から50%のカバレッジが得られた。さらに、2次元モデルと3次元モデルの両方に同時に高いなりすまし率で一致する、2次元RGBと3次元のマスターフェイスのペアの生成を示す。 A master face is a face image that passes face-based identity authentication for a high percentage of the population. These faces can be used to impersonate, with a high probability of success, any user, without having access to any user information. We optimize these faces for 2D and 3D face verification models, by using an evolutionary algorithm in the latent embedding space of the StyleGAN face generator. For 2D face verification, multiple evolutionary strategies are compared, and we propose a novel approach that employs a neural network to direct the search toward promising samples, without adding fitness evaluations. The results we present demonstrate that it is possible to obtain a considerable coverage of the identities in the LFW or RFW datasets with less than 10 master faces, for six leading deep face recognition systems. In 3D, we generate faces using the 2D StyleGAN2 generator and predict a 3D structure using a deep 3D face reconstruction network. When employing two different 3D face recognition systems, we are able to obtain a coverage of 40%-50%. Additionally, we present the generation of paired 2D RGB and 3D master faces, which simultaneously match 2D and 3D models with high impersonation rates.	翻訳日:2022-11-29 14:07:49 公開日:2022-11-28
# 第1回海洋コンピュータビジョンワークショップ(MaCVi)2023:チャレンジ結果 1st Workshop on Maritime Computer Vision (MaCVi) 2023: Challenge Results ( http://arxiv.org/abs/2211.13508v2 ) ライセンス: Link先を確認	Benjamin Kiefer, Matej Kristan, Janez Perš, Lojze Žust, Fabio Poiesi, Fabio Augusto de Alcantara Andrade, Alexandre Bernardino, Matthew Dawkins, Jenni Raitoharju, Yitong Quan, Adem Atmaca, Timon Höfer, Qiming Zhang, Yufei Xu, Jing Zhang, Dacheng Tao, Lars Sommer, Raphael Spraul, Hangyue Zhao, Hongpu Zhang, Yanyun Zhao, Jan Lukas Augustin, Eui-ik Jeon, Impyeong Lee, Luca Zedda, Andrea Loddo, Cecilia Di Ruberto, Sagar Verma, Siddharth Gupta, Shishir Muralidhara, Niharika Hegde, Daitao Xing, Nikolaos Evangeliou, Anthony Tzes, Vojtěch Bartl, Jakub Špaňhel, Adam Herout, Neelanjan Bhowmik, Toby P. Breckon, Shivanand Kundargi, Tejas Anvekar, Chaitra Desai, Ramesh Ashok Tabib, Uma Mudengudi, Arpita Vats, Yang Song, Delong Liu, Yonglin Li, Shuman Li, Chenhao Tan, Long Lan, Vladimir Somers, Christophe De Vleeschouwer, Alexandre Alahi, Hsiang-Wei Huang, Cheng-Yen Yang, Jenq-Neng Hwang, Pyong-Kun Kim, Kwangju Kim, Kyoungoh Lee, Shuai Jiang, Haiwen Li, Zheng Ziqiang, Tuan-Anh Vu, Hai Nguyen-Truong, Sai-Kit Yeung, Zhuang Jia, Sophia Yang, Chih-Chung Hsu, Xiu-Yu Hou, Yu-An Jhang, Simon Yang, Mau-Tsuen Yang	(参考訳) 1$^{\text{st}}$ Workshop on Maritime Computer Vision (MaCVi) 2023は、無人航空機(UAV)と無人水上艇(USV)のための海上コンピュータビジョンに焦点を当て、この分野で次のサブチャレンジを開催した。(i) UAVによる海上物体検出、(ii) UAVによる海上物体追跡、(iii) USVによる海上障害物セグメンテーション、(iv) USVによる海上障害物検出。サブチャレンジはSeaDronesSeeとMODSのベンチマークに基づいている。本報告では、各サブチャレンジの主な知見を要約し、クラスと映像を追加して従来のベンチマークを拡張した新たなベンチマークSeaDronesSee Object Detection v2を紹介する。統計的および定性的な分析を行い、130件を超える応募の中で高い性能を示した手法の傾向を評価する。各手法は付録にまとめられている。データセット、評価コード、リーダーボードは https://seadronessee.cs.uni-tuebingen.de/macvi で公開されている。 The 1$^{\text{st}}$ Workshop on Maritime Computer Vision (MaCVi) 2023 focused on maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicle (USV), and organized several 
subchallenges in this domain: (i) UAV-based Maritime Object Detection, (ii) UAV-based Maritime Object Tracking, (iii) USV-based Maritime Obstacle Segmentation and (iv) USV-based Maritime Obstacle Detection. The subchallenges were based on the SeaDronesSee and MODS benchmarks. This report summarizes the main findings of the individual subchallenges and introduces a new benchmark, called SeaDronesSee Object Detection v2, which extends the previous benchmark by including more classes and footage. We provide statistical and qualitative analyses, and assess trends in the best-performing methodologies of over 130 submissions. The methods are summarized in the appendix. The datasets, evaluation code and the leaderboard are publicly available at https://seadronessee.cs.uni-tuebingen.de/macvi.	翻訳日:2022-11-29 14:07:29 公開日:2022-11-28


# 有限アーベル群に対する内部量子参照フレーム

Internal quantum reference frames for finite Abelian groups ( http://arxiv.org/abs/2107.07545v2 )

ライセンス: Link先を確認

Philipp A. Hoehn, Marius Krumm, Markus P. Mueller

(参考訳) 内部の量子系を参照フレームとして用いることは、外部の関係項(relata)が利用できない場合に、量子重力、ゲージ理論、量子基礎論において重要な概念となる。本研究では、基礎となる配位空間が有限アーベル群である場合について、そのような量子参照フレーム(QRF)を包括的かつ自己完結的に扱い、我々の先行研究(Quantum 5, 530 (2021))を大幅に拡張する。この設定は単純であるため完全に厳密な量子情報理論的解析が可能であり、同時に、より複雑な設定にも関わる概念的・構造的な問題の多くを探究するのに十分な構造を保っている。これを利用して、制約量子化のいくつかの重要な構造を量子情報理論の手法で導出し、QRF共変性に対する異なるアプローチ間の関係を明らかにする。特に、「物理的ヒルベルト空間」("perspective-neutral"アプローチの舞台)を、状態の純粋化についてフレームに依存しない記述を許す最大の部分空間として特徴づける。次に、QRFに対する"perspective-neutral"アプローチと"alignability"アプローチが運動学的には同値であるが、驚くべきことに動力学的には同値でないことを示す。前者は任意の部分系間の関係の間の遷移を生成するユニタリを許すが、後者は対称性の保存を要求すると、注目すべきことにそのようなダイナミクスを許さない。これらの結果を、相互作用する離散粒子の例で説明し、ダイナミクスを「部分系の1つに対して相対的に」記述する方法も示す。

Employing internal quantum systems as reference frames is a crucial concept in quantum gravity, gauge theories and quantum foundations whenever external relata are unavailable. In this work, we give a comprehensive and self-contained treatment of such quantum reference frames (QRFs) for the case when the underlying configuration space is a finite Abelian group, significantly extending our previous work (Quantum 5, 530 (2021)). The simplicity of this setup admits a fully rigorous quantum information-theoretic analysis, while maintaining sufficient structure for exploring many of the conceptual and structural questions also pertinent to more complicated setups. We exploit this to derive several important structures of constraint quantization with quantum information-theoretic methods and to reveal the relation between different approaches to QRF covariance. In particular, we characterize the "physical Hilbert space" -- the arena of the "perspective-neutral" approach -- as the maximal subspace that admits frame-independent descriptions of purifications of states. We then demonstrate the kinematical equivalence and, surprising, dynamical inequivalence of the "perspective-neutral" and the "alignability" approach to QRFs. While the former admits unitaries generating transitions between arbitrary subsystem relations, the latter, remarkably, admits no such dynamics when requiring symmetry-preservation. We illustrate these findings by example of interacting discrete particles, including how dynamics can be described "relative to one of the subsystems".

翻訳日:2023-03-22 05:03:05 公開日:2022-11-28

# 文法の進化における方向性のある力の検出:EEBO, COHA, Google Booksを用いた英語完了形の事例研究

Detecting directional forces in the evolution of grammar: A case study of the English perfect using EEBO, COHA, and Google Books ( http://arxiv.org/abs/2110.08567v2 )

ライセンス: Link先を確認

Shimpei Okuda, Michio Hosaka, and Kazutoshi Sasahara

(参考訳) 言語は進化を通じて現れた多様な特徴を持つ。現代英語の文法では、完了形は \textit{have}+PP(過去分詞)で形成されるが、初期の英語には \textit{be}+PP の形式も存在した。助動詞BEは、いくつかの特別な場合を除き、進化の過程でHAVEに置き換えられたと広く認識されている。しかし、この進化が自然選択によるものかランダムドリフトによるものかは、いまだ明らかでない。本研究では、EEBO(Early English Books Online)、COHA(Corpus of Historical American English)、Google Booksという3つの大規模データソースを組み合わせて、英語完了形の進化における方向性のある力を調べた。その結果、ほとんどの自動詞は \textit{be}+PP から \textit{have}+PP への明らかな移行を示し、その大部分はディープニューラルネットワークに基づくモデルによって「選択」に分類された。これらの結果は、英語完了形がランダムドリフトではなく自然選択を通じて進化した可能性を示唆し、文法の文化的進化に対する洞察を与える。

Languages have diverse characteristics that have emerged through evolution. In modern English grammar, the perfect is formed with \textit{have}+PP (past participle), but in earlier English the \textit{be}+PP form also existed. It is widely recognised that the auxiliary verb BE was replaced by HAVE throughout evolution, except for some special cases. However, whether this evolution was caused by natural selection or random drift is still unclear. Here we examined directional forces in the evolution of the English perfect by combining three large-scale data sources: EEBO (Early English Books Online), COHA (Corpus of Historical American English), and Google Books. We found that most intransitive verbs exhibited an apparent transition from \textit{be}+PP to \textit{have}+PP, most of which were classified as `selection' by a deep neural network-based model. These results suggest that the English perfect could have evolved through natural selection rather than random drift, and provide insights into the cultural evolution of grammar.

翻訳日:2023-03-11 08:03:36 公開日:2022-11-28

# 量子位数発見の成功確率について

On the success probability of quantum order finding ( http://arxiv.org/abs/2201.07791v2 )

ライセンス: Link先を確認

Martin Ekerå

(参考訳) Shorの位数発見アルゴリズムが1回の実行で位数 $r$ の復元に成功する確率の下界を証明する。この下界から、アルゴリズムの古典的な後処理部分で2つの限定的な探索を行うことで、量子部分を再実行したり、Shorと比べて指数の長さを増やしたりすることなく、任意の $r$ に対して高い成功確率を保証できることが従う。漸近的には、$r$ が無限大に向かう極限で、1回の実行で $r$ の復元に成功する確率は1に収束する。中程度の大きさの $r$ でも、例えば $1 - 10^{-4}$ を超える高い成功確率を保証できる。系として、位数発見アルゴリズムの1回の実行で任意の整数 $N$ を完全に素因数分解する確率についても同様の結果を証明する。

We prove a lower bound on the probability of Shor's order-finding algorithm successfully recovering the order $r$ in a single run. The bound implies that by performing two limited searches in the classical post-processing part of the algorithm, a high success probability can be guaranteed, for any $r$, without re-running the quantum part or increasing the exponent length compared to Shor. Asymptotically, in the limit as $r$ tends to infinity, the probability of successfully recovering $r$ in a single run tends to one. Already for moderate $r$, a high success probability exceeding e.g. $1 - 10^{-4}$ can be guaranteed. As corollaries, we prove analogous results for the probability of completely factoring any integer $N$ in a single run of the order-finding algorithm.

翻訳日:2023-02-28 10:13:00 公開日:2022-11-28

# ガウス回路を伴うGottesman-Kitaev-Preskill状態の効率的なシミュレーション

Efficient simulation of Gottesman-Kitaev-Preskill states with Gaussian circuits ( http://arxiv.org/abs/2203.11182v3 )

ライセンス: Link先を確認

Cameron Calcluth, Alessandro Ferraro, Giulia Ferrini

(参考訳) Gottesman-Kitaev-Preskill(GKP)状態を、任意の変位、シンプレクティック操作の大きな集合、およびホモダイン測定と組み合わせた場合の古典的シミュレーション可能性を調べる。この種の回路では、準確率分布の非負性に基づく連続変数の定理も、Gottesman-Knillの定理のような離散変数の定理も、シミュレーション可能性の評価には使えない。まず、任意のスクイージングと大きな集合に属する回転を施した後に、単一のGKP状態を位置基底で測定することに対応する確率密度関数を評価する方法を開発する。この方法では、解析的整数論の手法を用いて、変換されたヤコビのテータ関数を評価する。次にこの結果を用いて、古典的に効率よくシミュレーション可能であり、かつGKP符号化されたクリフォード群には含まれない、マルチモード回路の2つの大きなクラスを同定する。本結果は、古典的に効率よくシミュレーション可能であることがこれまでに知られていた回路の集合を拡張する。

We study the classical simulatability of Gottesman-Kitaev-Preskill (GKP) states in combination with arbitrary displacements, a large set of symplectic operations and homodyne measurements. For these types of circuits, neither continuous-variable theorems based on the non-negativity of quasi-probability distributions nor discrete-variable theorems such as the Gottesman-Knill theorem can be employed to assess the simulatability. We first develop a method to evaluate the probability density function corresponding to measuring a single GKP state in the position basis following arbitrary squeezing and a large set of rotations. This method involves evaluating a transformed Jacobi theta function using techniques from analytic number theory. We then use this result to identify two large classes of multimode circuits which are classically efficiently simulatable and are not contained by the GKP encoded Clifford group. Our results extend the set of circuits previously known to be classically efficiently simulatable.

翻訳日:2023-02-21 04:59:03 公開日:2022-11-28

# 人工知能による差別を防ぐために機密データを使用する:GDPRは新しい例外が必要か?

Using sensitive data to prevent discrimination by artificial intelligence: Does the GDPR need a new exception? ( http://arxiv.org/abs/2206.03262v3 )

ライセンス: Link先を確認

Marvin van Bekkum, Frederik Zuiderveen Borgesius

(参考訳) 組織は、多数の求人応募から最適な候補者を選ぶなど、さまざまな理由で人に関する意思決定に人工知能を使うことができる。しかし、AIシステムを意思決定に使うと差別的な効果が生じうる。例えば、組織がそのような民族差別を意図していなくても、AIシステムが特定の民族の人々の応募を拒否する可能性がある。しかしヨーロッパでは、AIシステムが意図せず民族に基づく差別をしていないかを評価しようとすると、組織は問題に直面する。組織は応募者の民族を知らない可能性があるからである。原則として、GDPRは、民族、宗教、性的嗜好に関するデータを含む特定の「特別な種類のデータ」(「センシティブデータ」と呼ばれることもある)の使用を禁止している。欧州委員会のAI法案には、組織がAIシステムの監査に特別な種類のデータを使用できるようにする条項が含まれている。本稿では、特別な種類の個人データに関するGDPRの規則が、AIによる差別の防止を妨げているかどうかを問う。我々は、GDPRが多くの状況でそのような特別な種類のデータの使用を実際に禁止していると論じる。また、AIシステムによる差別の防止を可能にするために、特別な種類の個人データの使用に対するGDPRの禁止に例外を設けることへの賛成論と反対論を整理する。本稿は欧州法を論じているが、世界の多くの政策立案者がプライバシーと差別禁止政策の間の緊張に取り組んでいるため、欧州以外でも意義がある。

Organisations can use artificial intelligence to make decisions about people for a variety of reasons, for instance, to select the best candidates from many job applications. However, AI systems can have discriminatory effects when used for decision-making. To illustrate, an AI system could reject applications of people with a certain ethnicity, while the organisation did not plan such ethnicity discrimination. But in Europe, an organisation runs into a problem when it wants to assess whether its AI system accidentally discriminates based on ethnicity: the organisation may not know the applicants' ethnicity. In principle, the GDPR bans the use of certain 'special categories of data' (sometimes called 'sensitive data'), which include data on ethnicity, religion, and sexual preference. The proposal for an AI Act of the European Commission includes a provision that would enable organisations to use special categories of data for auditing their AI systems. This paper asks whether the GDPR's rules on special categories of personal data hinder the prevention of AI-driven discrimination. We argue that the GDPR does prohibit such use of special category data in many circumstances. We also map out the arguments for and against creating an exception to the GDPR's ban on using special categories of personal data, to enable preventing discrimination by AI systems. The paper discusses European law, but the paper can be relevant outside Europe too, as many policymakers in the world grapple with the tension between privacy and non-discrimination policy.

翻訳日:2023-02-19 17:35:10 公開日:2022-11-28

# 自然言語処理によるデジタル資産価格の予測:調査

Predicting Digital Asset Prices using Natural Language Processing: a survey ( http://arxiv.org/abs/2212.00726v1 )

ライセンス: Link先を確認

Trang Tran

(参考訳) ブロックチェーン技術は、人々がどのように自分の資産を保存し、取引するかについての考え方を変えました。ブロックチェーン技術の革新の1つは分散化(decentralization)である。つまり、資産担保発行者や銀行といった従来の金融仲介業者はプロセス中に排除される。ブロックチェーン技術はさまざまな業界で利用されているが、最も顕著なアプリケーションは暗号通貨であり、Bitcoinが最初に提案されている。 2021年のピーク時には、Bitcoinの時価総額はかつて1兆ドルを超えた。仮想通貨市場のオープンな性質は、投資価格が非常に変動し、その変動が予測不可能であるため、小売投資家と機関投資家の両方に様々な課題と懸念をもたらす。特に機械学習や自然言語処理の台頭は、暗号通貨の価格行動の監視と予測に光を当てている。本稿では,ビットコインやイーサリアムなどのデジタル資産の価格予測と動作分析に機械学習と自然言語処理を適用した最近の取り組みをレビューし,分析することを目的とする。

Blockchain technology has changed how people think about storing and trading their assets by introducing a whole new way to transact: digital currencies. One of the major innovations of blockchain technology is decentralization, meaning that traditional financial intermediaries, such as asset-backed security issuers and banks, are eliminated from the process. Even though blockchain technology has been used in a wide range of industries, its most prominent application is still cryptocurrencies, with Bitcoin being the first proposed. At its peak in 2021, the market cap of Bitcoin surpassed 1 trillion US dollars. The open nature of the crypto market poses various challenges and concerns for both potential retail investors and institutional investors, as the price of the investment is highly volatile and its fluctuations are unpredictable. The rise of Machine Learning, and of Natural Language Processing in particular, has shed some light on monitoring and predicting the price behavior of cryptocurrencies. This paper aims to review and analyze recent efforts to apply Machine Learning and Natural Language Processing methods to predict the prices and analyze the behavior of digital assets such as Bitcoin and Ethereum.

翻訳日:2023-02-19 12:47:00 公開日:2022-11-28

# Minimax AUC Fairness: Provable Convergence を用いた効率的なアルゴリズム

Minimax AUC Fairness: Efficient Algorithm with Provable Convergence ( http://arxiv.org/abs/2208.10451v2 )

ライセンス: Link先を確認

Zhenhuan Yang, Yan Lok Ko, Kush R. Varshney, Yiming Ying

(参考訳) 重大な結果を伴う意思決定における機械学習モデルの使用は、しばしば社会的不平等を悪化させ、特に人種や性別によって定義される社会的に周縁化されたグループのメンバーに差別的な影響をもたらす。 ROC曲線下面積(AUC)は、機械学習におけるスコアリング関数の性能評価に広く使われているが、アルゴリズム的公正性の文脈では他の性能指標ほど研究されていない。 AUC はペアワイズな性質を持つため、AUC に基づくグループ公正性指標の定義はペアに依存し、グループ内(intra-group)AUC とグループ間(inter-group)AUC の両方を含みうる。重要なのは、一方のカテゴリの AUC だけを考慮しても、AUC 最適化における不公平さを軽減するには不十分だという点である。本稿では、有用性を維持しつつグループ内およびグループ間 AUC の両方を取り入れた、ミニマックス学習・バイアス緩和フレームワークを提案する。このロールズ的フレームワークに基づいて、効率的な確率的最適化アルゴリズムを設計し、最小のグループレベル AUC への収束を証明する。ミニマックスフレームワークと提案した最適化アルゴリズムの有効性を検証するため、合成データセットと実世界のデータセットの両方で数値実験を行う。

The use of machine learning models in consequential decision making often exacerbates societal inequity, in particular yielding disparate impact on members of marginalized groups defined by race and gender. The area under the ROC curve (AUC) is widely used to evaluate the performance of a scoring function in machine learning, but is studied in algorithmic fairness less than other performance metrics. Due to the pairwise nature of the AUC, defining an AUC-based group fairness metric is pairwise-dependent and may involve both intra-group and inter-group AUCs. Importantly, considering only one category of AUCs is not sufficient to mitigate unfairness in AUC optimization. In this paper, we propose a minimax learning and bias mitigation framework that incorporates both intra-group and inter-group AUCs while maintaining utility. Based on this Rawlsian framework, we design an efficient stochastic optimization algorithm and prove its convergence to the minimum group-level AUC. We conduct numerical experiments on both synthetic and real-world datasets to validate the effectiveness of the minimax framework and the proposed optimization algorithm.
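The intra-group and inter-group AUCs mentioned in the abstract can be illustrated with a minimal sketch. It only evaluates the group-level AUCs of a fixed scoring function and reports the worst one; it is not the authors' stochastic optimization algorithm. The function names (`pairwise_auc`, `group_aucs`, `worst_group_auc`) and the toy data are assumptions made up for this illustration.

```python
from itertools import product


def pairwise_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def group_aucs(scores, labels, groups):
    """Map (group of positives, group of negatives) -> AUC.

    Keys with two equal groups are intra-group AUCs; the rest are inter-group AUCs.
    """
    records = list(zip(scores, labels, groups))
    names = sorted(set(groups))
    result = {}
    for g_pos, g_neg in product(names, names):
        pos = [s for s, y, g in records if y == 1 and g == g_pos]
        neg = [s for s, y, g in records if y == 0 and g == g_neg]
        result[(g_pos, g_neg)] = pairwise_auc(pos, neg)
    return result


def worst_group_auc(scores, labels, groups):
    """Return the (group pair, AUC) with the smallest AUC.

    A minimax objective would try to raise this value (hypothetical helper).
    """
    return min(group_aucs(scores, labels, groups).items(), key=lambda kv: kv[1])


# Toy data (invented for illustration): group "A" is ranked perfectly, group "B" is not.
scores = [0.9, 0.8, 0.3, 0.2, 0.6, 0.4, 0.7, 0.1]
labels = [1, 1, 0, 0, 1, 1, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
pair, auc = worst_group_auc(scores, labels, groups)
print(pair, auc)
```

In this toy data set the overall ranking looks reasonable, yet the intra-group AUC of group "B" is the weakest entry, which is the kind of disparity a single pooled AUC would hide.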

翻訳日:2023-02-19 10:39:07 公開日:2022-11-28

# 光の多モードスクイーズ状態を用いた量子密度符号化ネットワーク

Quantum Dense Coding Network using Multimode Squeezed States of Light ( http://arxiv.org/abs/2204.14147v2 )

ライセンス: Link先を確認

Ayan Patra, Rivu Gupta, Saptarshi Roy, Tamoghna Das, Aditi Sen De

(参考訳) 本稿では,複数送信機と連続変数システムを用いた単一受信機を備えた多モード高密度符号化ネットワークのフレームワークを提案する。このプロトコルは任意の数のモードに対してスケーラブルであり、符号化は変位である一方、復号にはビームスプリッターの列によって互いに結合されたモードのホモダインの計測が伴うため、現在利用可能なリソースを持つ実験室で実装される可能性を示す。 3モード状態と4モード状態の共有を含む2と3つの送信者の場合の符号化容量の閉形式表現を計算する。符号化動作後、送信者のモードが受信機に転送されると、高密度符号化容量は固定平均エネルギー伝送の制約により算出される。いずれの場合も,3モードおよび4モード状態のパラダイムクラスを用いて,プロトコルの量子的優位性を示す。量子アドバンテージは、送信者から受信者への送信が許されるエネルギー量の増加に伴って増加する。

We present a framework for a multimode dense coding network with multiple senders and a single receiver using continuous-variable systems. The protocol is scalable to an arbitrary number of modes: the encoding consists of displacements, while the decoding involves homodyne measurements of the modes after they are combined pairwise by a sequence of beam splitters, which suggests that it can be implemented in laboratories with currently available resources. We compute closed-form expressions for the dense coding capacity for the cases of two and three senders, which involve sharing three- and four-mode states, respectively. The dense coding capacity is calculated under the constraint of fixed average energy transmission when the modes of the sender are transferred to the receiver after the encoding operation. In both cases, we demonstrate the quantum advantage of the protocol using paradigmatic classes of three- and four-mode states. The quantum advantage increases with the amount of energy that is allowed to be transmitted from the senders to the receiver.

翻訳日:2023-02-15 04:00:13 公開日:2022-11-28

# 非相互作用非エルミート$n$-パーティイト強結合格子に対する一般リーブの定理

Generalized Lieb's theorem for noninteracting non-Hermitian $n$-partite tight-binding lattices ( http://arxiv.org/abs/2205.04174v2 )

ライセンス: Link先を確認

A. M. Marques and R. G. Dias

(参考訳) エルミート二成分モデルはカイラル対称性の存在とリーブの定理によって特徴づけられ、2つの部分格子の間の点の不均衡からモデルのゼロエネルギー平坦なバンドの数を導出する。本稿では、任意の数の部分束が一方向および巡回的に連結された非エルミート模型のクラスを導入し、それらのモデルの零エネルギー平坦バンドの数をリーブの定理の一般化バージョンから見出すことができ、各部分束と最小次元の部分束との間の不均衡を含む非相互作用的密結合モデルへの応用について述べる。さらに、これらのモデルは、特定の時計やパラフェルミオン系の文脈で見られるタイプの一般化されたキラル対称性に従うことも示される。主な成果は単純な玩具モデルで示され、ここで紹介されるモデルの異なるプラットフォームにおける実現の可能性について論じる。

Hermitian bipartite models are characterized by the presence of chiral symmetry and by Lieb's theorem, which derives the number of zero-energy flat bands of the model from the imbalance of sites between its two sublattices. Here, we introduce a class of non-Hermitian models with an arbitrary number of sublattices connected in a unidirectional and cyclical way and show that the number of zero-energy flat bands of these models can be found from a generalized version of Lieb's theorem (as it applies to noninteracting tight-binding models), which involves the imbalance between each sublattice and the sublattice of lowest dimension. Furthermore, these models are also shown to obey a generalized chiral symmetry of the type found in the context of certain clock or parafermionic systems. The main results are illustrated with a simple toy model, and possible realizations of the models introduced here in different platforms are discussed.
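The standard Hermitian, bipartite statement that the abstract starts from can be checked on small finite clusters with a short sketch. It counts zero-energy states as the nullity of the hopping Hamiltonian, which is at least the sublattice imbalance |N_A - N_B|. This is only the familiar two-sublattice case with real hoppings, not the paper's non-Hermitian n-partite generalization, and the helper names (`matrix_rank`, `bipartite_hamiltonian`, `zero_mode_count`) are assumptions introduced here.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix via exact Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current pivot row with a nonzero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def bipartite_hamiltonian(t):
    """Build H = [[0, T], [T^T, 0]] from a real N_A x N_B hopping matrix T."""
    n_a, n_b = len(t), len(t[0])
    h = [[0] * (n_a + n_b) for _ in range(n_a + n_b)]
    for i in range(n_a):
        for j in range(n_b):
            h[i][n_a + j] = t[i][j]
            h[n_a + j][i] = t[i][j]
    return h


def zero_mode_count(h):
    """Number of zero-energy states = dimension of the kernel of H."""
    return len(h) - matrix_rank(h)


# Star cluster: one A site hopping to three B sites, so |N_A - N_B| = 2.
star = bipartite_hamiltonian([[1, 1, 1]])
print(zero_mode_count(star))
```

Because H only connects A sites to B sites, its rank is twice the rank of T, so the nullity is N_A + N_B - 2 rank(T), which cannot drop below |N_A - N_B|. In a periodic lattice the same counting per unit cell yields flat bands rather than isolated states.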

翻訳日:2023-02-13 20:37:53 公開日:2022-11-28

# ブラックホールデコヒール量子重ね合わせ

Black Holes Decohere Quantum Superpositions ( http://arxiv.org/abs/2205.06279v2 )

ライセンス: Link先を確認

Daine L. Danielson, Gautam Satishchandran, and Robert M. Wald

(参考訳) 質量体を空間的に分離された状態の量子重ね合わせにすると、天体の近傍にブラックホールが存在するだけで、最終的に重畳のコヒーレンスが破壊されることを示す。これは、事実上、天体の重力場がブラックホールに柔らかい重力を放射し、ブラックホールが重ね合わせに関する「どの経路」情報を取得することができるためである。同様の効果は、荷電された天体の量子重ね合わせにも起こる。このような量子重ね合わせのデコヒーレンス時間を推定する。ブラックホールが最終的に量子的重ね合わせを解き放つという事実は、重力の量子論におけるブラックホールの性質を理解する上での基本的な重要性であると考えられている。

We show that if a massive body is put in a quantum superposition of spatially separated states, the mere presence of a black hole in the vicinity of the body will eventually destroy the coherence of the superposition. This occurs because, in effect, the gravitational field of the body radiates soft gravitons into the black hole, allowing the black hole to acquire "which path" information about the superposition. A similar effect occurs for quantum superpositions of electrically charged bodies. We provide estimates of the decoherence time for such quantum superpositions. We believe that the fact that a black hole will eventually decohere any quantum superposition may be of fundamental significance for our understanding of the nature of black holes in a quantum theory of gravity.

翻訳日:2023-02-13 09:29:16 公開日:2022-11-28

# Pegg-BarnettとPaul量子相の形式的関係

Formal relation between Pegg-Barnett and Paul quantum phase frameworks ( http://arxiv.org/abs/2205.09481v2 )

ライセンス: Link先を確認

Tomasz Linowski, Konrad Schlichtholz, Łukasz Rudnicki

(参考訳) エルミート量子位相演算子を定義する問題は、量子力学そのものと同じくらい古い。長年にわたり、抽象演算子形式から位相空間法まで、多くの解が提案された。本研究では、ポール形式主義における位相の確率分布が、後者と量子制限増幅チャネルを組み合わせることで、ペッグ・バーネット形式主義から完全に従うことを証明し、最も顕著な2つのアプローチの間に明確な接続を行う。その結果,Paul フレームワークは Pegg-Barnett アプローチの半古典的限界と見なされる可能性が示唆された。

The problem of defining a Hermitian quantum phase operator is nearly as old as quantum mechanics itself. Over the years, a number of solutions have been proposed, ranging from abstract operator formalisms to phase-space methods. In this work, we make an explicit connection between two of the most prominent approaches by proving that the probability distribution of phase in the Paul formalism follows exactly from the Pegg-Barnett formalism when the latter is combined with the quantum-limited amplifier channel. Our findings suggest that the Paul framework may be viewed as a semi-classical limit of the Pegg-Barnett approach.

翻訳日:2023-02-12 15:52:59 公開日:2022-11-28

# 古典最適化ハミルトンシミュレーション

Classically optimized Hamiltonian simulation ( http://arxiv.org/abs/2205.11427v3 )

ライセンス: Link先を確認

Conor Mc Keever, Michael Lubasch

(参考訳) ハミルトンシミュレーションは量子コンピュータが量子優位を達成するための有望な応用である。本稿では,量子回路を最適化するためのテンソルネットワーク法に基づく古典的アルゴリズムを提案する。トロッター積公式と比較して、古典的に最適化された回路は桁違いに精度が高く、シミュレーション時間も大幅に拡張できることを示す。

Hamiltonian simulation is a promising application for quantum computers to achieve a quantum advantage. We present classical algorithms based on tensor network methods to optimize quantum circuits for this task. We show that, compared to Trotter product formulas, the classically optimized circuits can be orders of magnitude more accurate and significantly extend the total simulation time.
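The Trotter product formula that serves as the baseline in this abstract can be illustrated with a minimal single-qubit sketch. It compares the first-order product formula for H = X + Z against the exact evolution and shows the error shrinking as the number of steps grows. The choice of Hamiltonian, the function names, and the error measure are assumptions made for this illustration; the tensor-network circuit optimization described in the paper is not reproduced here.

```python
import cmath
import math


def matmul(a, b):
    """Product of two 2x2 complex matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def exp_x(theta):
    """exp(-i * theta * X) for the Pauli matrix X."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -1j * s], [-1j * s, c]]


def exp_z(theta):
    """exp(-i * theta * Z) for the Pauli matrix Z."""
    return [[cmath.exp(-1j * theta), 0.0], [0.0, cmath.exp(1j * theta)]]


def exact_evolution(t):
    """exp(-i * t * (X + Z)), in closed form because (X + Z)^2 = 2 * I."""
    w = math.sqrt(2.0)
    c, s = math.cos(w * t), math.sin(w * t) / w
    return [[c - 1j * s, -1j * s], [-1j * s, c + 1j * s]]


def trotter_evolution(t, n_steps):
    """First-order product formula: (exp(-i dt X) exp(-i dt Z))^n_steps."""
    dt = t / n_steps
    step = matmul(exp_x(dt), exp_z(dt))
    u = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n_steps):
        u = matmul(step, u)
    return u


def frobenius_error(a, b):
    """Frobenius norm of the difference between two 2x2 matrices."""
    return math.sqrt(sum(abs(a[i][j] - b[i][j]) ** 2
                         for i in range(2) for j in range(2)))


for n in (10, 100):
    err = frobenius_error(trotter_evolution(1.0, n), exact_evolution(1.0))
    print(n, err)
```

The error of the first-order formula falls roughly in proportion to 1/n, so reaching high accuracy by Trotter steps alone requires deep circuits; that cost is what makes alternatives to plain Trotterization attractive.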

翻訳日:2023-02-12 00:30:45 公開日:2022-11-28

# 熱力学における量子コヒーレンスの役割

Role of Quantum Coherence in Thermodynamics ( http://arxiv.org/abs/2205.13612v3 )

ライセンス: Link先を確認

Gilad Gour

(参考訳) 時間変換共変進化下での量子系の相互変換性を決定するために必要な条件を見つけ、それを用いて単発状態と漸近状態の両方において量子熱力学の問題を解く。量子熱力学の資源理論は可逆ではないことがよく知られているが、prl 111, 250404 (2013) では、エネルギー準位に対するコヒーレント重ね合わせのsublinear amount of coherent superpositionが利用可能であると主張した。ここでは、エネルギー準線形量のコヒーレンスを自由とすれば、熱水性に関する資源理論は自明になることを示す。代わりに、エネルギーの亜線型量を考えることで、純粋な状態の場合、熱力学の理論は可逆になることを示す。混合状態の場合に対する同じ主張の証明はまだ不足している。

We find necessary and sufficient conditions to determine the inter-convertibility of quantum systems under time-translation covariant evolution, and use them to solve several problems in quantum thermodynamics in both the single-shot and asymptotic regimes. It is well known that the resource theory of quantum athermality is not reversible, but in PRL 111, 250404 (2013) it was claimed that the theory becomes reversible "provided a sublinear amount of coherent superposition over energy levels is available". Here we show that if a sublinear amount of coherence among energy levels were considered free, then the resource theory of athermality would become trivial. Instead, we show that by considering a sublinear amount of energy to be free, the theory of athermality becomes reversible for the pure-state case. A proof of the same claim for the mixed-state case is still lacking.

翻訳日:2023-02-11 16:30:23 公開日:2022-11-28

# 量子状態の不安定構造に基づく局所近似によるブラインド量子データ圧縮のレート低減

Rate Reduction of Blind Quantum Data Compression with Local Approximations Based on Unstable Structure of Quantum States ( http://arxiv.org/abs/2206.03501v2 )

ライセンス: Link先を確認

Kohdai Kuroiwa and Debbie Leung

(参考訳) 本稿では,有限局所近似を用いたデータ圧縮タスクであるブラインド量子データ圧縮のための新しいプロトコルを提案する。ブラインドデータ圧縮の速度は、近似が小さくても近似に影響を受けやすい。この不安定性は近似に対する量子状態の構造の感度に由来するため、近似の存在下でのブラインド圧縮の解析は難解である。本稿では, 圧縮速度を実質的に低減するために, 不安定性を利用したプロトコルを構築した。本プロトコルは, 具体例において, 顕著な削減率を示す。さらに,本手法を対角状態に適用し,この特別な場合において2種類の近似法を提案する。数値実験を行い、これらの2つの近似法のうちの1つが他方よりもかなり優れていることを観察する。そこで本研究では,ブラインド量子データ圧縮の近似速度トレードオフのさらなる検討に向けて,近似値を用いたブラインド量子データ圧縮の一般研究に向けて第一歩を踏み出した。

In this paper, we propose a new protocol for a data compression task, blind quantum data compression, with finite local approximations. The rate of blind data compression is sensitive to approximations even when the approximations are small. This instability originates from the sensitivity of the structure of quantum states to approximations, which makes the analysis of blind compression in the presence of approximations intractable. In this paper, we construct a protocol that takes advantage of this instability to reduce the compression rate substantially. Our protocol shows a significant reduction in rate for the specific examples we examined. Moreover, we apply our methods to diagonal states and propose two types of approximation methods for this special case. We perform numerical experiments and observe that one of these two approximation methods performs significantly better than the other. Thus, our analysis takes a first step toward a general investigation of blind quantum data compression when approximations are allowed, and toward further study of its approximation-rate trade-off.

翻訳日:2023-02-10 06:36:14 公開日:2022-11-28

# 量子化学のための量子学習マシン(qlm)のオープンソースの変分量子固有ソルバ拡張

Open Source Variational Quantum Eigensolver Extension of the Quantum Learning Machine (QLM) for Quantum Chemistry ( http://arxiv.org/abs/2206.08798v4 )

ライセンス: Link先を確認

Mohammad Haidar, Marko J. Rančić, Thomas Ayral, Yvon Maday, Jean-Philip Piquemal

(参考訳) 量子化学 (qc) は量子コンピューティングの最も有望な応用の一つである。しかし、現在の量子処理ユニット(QPU)は依然として大きなエラーにさらされている。したがって、ノイズの多い中間スケール量子(NISQ)ハードウェアは、量子ビット数と回路深さの点で制限される。変分量子固有解法(VQE)のような特定のアルゴリズムは、そのような問題を克服することができる。本稿では,オープンVQE(Open-VQE)と呼ばれる新しいオープンソースQCパッケージについて紹介する。 VQEアルゴリズムの開発とテストを容易にする。 atos quantum learning machine (qlm)は、量子コンピューティングプログラムを書き、最適化し、シミュレートできる一般的な量子プログラミングフレームワークである。私たちは、open-vqeとともに、新しいオープンソースモジュールであるmyqlm-fermion(qc開発で重要な重要なqlm再資源を含む)を紹介します(fermionic second quantization toolsなど)。 Open-VQEパッケージはQLMをQCに拡張します。 (i)一般的に使用されるuccsd ans{\"a}tz以外の異なる種類の励起を生成する関数 (ii) 単純なクラス構造ピソン符号で書かれた"adaptive derivative assembled pseudo-Trotter method"(ADAPT-VQE)の新たな実装。他の主要な量子プログラミングフレームワークとの相互運用性は、myqlmのおかげで保証されている。 open-vqe/myqlm-fermion量子シミュレータを組み合わせることで、変分量子アルゴリズムの実装、テスト、開発が容易になり、現在の量子コンピュータでqc計算を実行するための最良の妥協を選択し、大きな分子をテストできる。 4から24までの量子ビット数に関連する分子の広範なベンチマークを提供する。

Quantum Chemistry (QC) is one of the most promising applications of Quantum Computing. However, present quantum processing units (QPUs) are still subject to large errors. Therefore, noisy intermediate-scale quantum (NISQ) hardware is limited in terms of qubit counts and circuit depths. Specific algorithms such as Variational Quantum Eigensolvers (VQEs) can potentially overcome such issues. We introduce here a novel open-source QC package, denoted Open-VQE, providing tools for using and developing chemically-inspired adaptive methods derived from Unitary Coupled Cluster (UCC). It facilitates the development and testing of VQE algorithms. It is able to use the Atos Quantum Learning Machine (QLM), a general quantum programming framework for writing, optimizing and simulating quantum computing programs. Along with Open-VQE, we introduce myQLM-Fermion, a new open-source module that includes the key QLM resources that are important for QC developments (fermionic second quantization tools, etc.). The Open-VQE package therefore extends QLM to QC by providing: (i) functions to generate the different types of excitations beyond the commonly used UCCSD ansatz; (ii) a new implementation of the "adaptive derivative assembled pseudo-Trotter method" (ADAPT-VQE), written as simple class-structured Python code. Interoperability with other major quantum programming frameworks is ensured thanks to myQLM, which allows users to easily build their own code and execute it on existing QPUs. The combined Open-VQE/myQLM-Fermion quantum simulator facilitates the implementation, testing and development of variational quantum algorithms, helping to choose the best compromise for running QC computations on present quantum computers while offering the possibility of testing large molecules. We provide extensive benchmarks for several molecules associated with qubit counts ranging from 4 up to 24.

翻訳日:2023-02-09 02:00:15 公開日:2022-11-28

# シュール・ワイル双対性を示す任意の対称性群上の量子状態平均化の双対性

Duality of averaging of quantum states over arbitrary symmetry groups revealing Schur-Weyl duality ( http://arxiv.org/abs/2208.07689v2 )

ライセンス: Link先を確認

Marcin Markiewicz and Janusz Przewocki

(参考訳) 量子情報理論において、多部量子状態上のユニタリ群の集合的作用に対する一様平均化は、その状態がサブシステムの置換作用素に相当する形式に投影されるという、確立された事実である。したがって、置換作用素と同値な状態は集合ユニタリノイズによって影響を受けない。自明な観察により、置換作用素上の一様平均化は、ユニタリ群の集団作用の1つに相当するブロック対角構造を持つ形式に状態を投影することを示している。私たちは、平均値の双対性というこの性質の名前を紹介します。この双対性の背後にある数学的理由は、多部量子系のテンソル積状態空間上のユニタリ群の集合作用と置換演算の作用が行列代数として扱われるときに相互可換であるという事実である。そのような行列代数のペアは双対還元対として知られている。この研究において、有限次元量子系の場合、平均化の双対性は、群作用のカルタン分解上のイテレーテッド積分によって平均化演算が定義される限り、コンパクトであるか否かにかかわらず、双対還元対である対称群の任意の対に対して成り立つことを示す。結果は非常に一般的なものであるが、特殊線形行列と置換演算の集団作用からなる双対簡約対の具体的例に着目し、これは非一元的slocc(stochastic local operations and classical communication)演算上の多元量子状態の平均化に対応している。この文脈では、特定の不変部分空間へのポストセレクションにおいて、SLOCC平均化の場合において、集合単位平均化から知られているノイズレスサブシステムが持続することを示す。

It is a well-established fact in quantum information theory, that uniform averaging over the collective action of a unitary group on a multipartite quantum state projects the state to a form equivalent to a permutation operator of the subsystems. Hence states equivalent to permutation operators are untouched by collective unitary noise. A trivial observation shows that uniform averaging over permutation operators projects the state into a form with block-diagonal structure equivalent to the one of the collective action of the unitary group. We introduce a name for this property: duality of averaging. The mathematical reason behind this duality is the fact that the collective action of the unitary group on the tensor product state space of a multipartite quantum system and the action of the permutation operations are mutual commutants when treated as matrix algebras. Such pairs of matrix algebras are known as dual reductive pairs. In this work we show, that in the case of finite dimensional quantum systems such duality of averaging holds for any pairs of symmetry groups being dual reductive pairs, regardless of whether they are compact or not, as long as the averaging operation is defined via iterated integral over the Cartan decomposition of the group action. Although our result is very general, we focus much attention on the concrete example of a dual reductive pair consisting of collective action of special linear matrices and permutation operations, which physically corresponds to averaging multipartite quantum states over non-unitary SLOCC-type (Stochastic Local Operations and Classical Communication) operations. In this context we show, that noiseless subsystems known from collective unitary averaging persist in the case of SLOCC averaging in a conditional way: upon postselection to specific invariant subspaces.

翻訳日:2023-01-30 22:54:16 公開日:2022-11-28

# ミスマッチ基底測定による単一ビットレジームにおける実用量子鍵分布の安全性の簡易かつ厳密な証明法

Simple and Rigorous Proof Method for the Security of Practical Quantum Key Distribution in the Single-Qubit Regime Using Mismatched Basis Measurements ( http://arxiv.org/abs/2208.13754v2 )

ライセンス: Link先を確認

Michel Boyer, Gilles Brassard, Nicolas Godbout, Rotem Liss, Stéphane Virally

(参考訳) 量子鍵分配(QKD)プロトコルは、2つのパーティが秘密の共有鍵を生成できるようにすることを目的としている。理論上、多くのQKDプロトコルは無条件で安全であることが証明されているが、実験的なQKD実装の実際のセキュリティ分析は、通常、可能なすべての抜け穴を考慮していない。本稿では、まず、単一量子ビットロスレスシステムにおいて、離散変数QKD(測定デバイスに依存しないQKDにも適用できる)の実用的実装に対して、セキュアなキーレートの計算方法を提案する。我々は,本手法がQKDの実践的実現を解析,ベンチマーク,標準化するための標準ツールの1つになることを願っている。

Quantum key distribution (QKD) protocols aim at allowing two parties to generate a secret shared key. While many QKD protocols have been proven unconditionally secure in theory, practical security analyses of experimental QKD implementations typically do not take into account all possible loopholes, and practical devices are still not fully characterized for obtaining tight and realistic key rates. We present a simple method of computing secure key rates for any practical implementation of discrete-variable QKD (which can also apply to measurement-device-independent QKD), initially in the single-qubit lossless regime, and we rigorously prove its unconditional security against any possible attack. We hope our method becomes one of the standard tools used for analysing, benchmarking, and standardizing all practical realizations of QKD.

翻訳日:2023-01-28 14:33:46 公開日:2022-11-28

# 分散ボース-アインシュタイン凝縮体の量子バックアクション限界

Quantum Back-action Limits in Dispersively Measured Bose-Einstein Condensates ( http://arxiv.org/abs/2209.04400v2 )

ライセンス: Link先を確認

Emine Altuntas and Ian B. Spielman

(参考訳) 量子力学の基本的な理論は、測定がシステムの波動関数を、観測者がいなくても測定結果と最も一致するものに変化させることである。弱測定はシステムの限られた情報のみを生成し、結果としてシステムの状態を最小限に変化させる。ここでは、原子ボース・アインシュタイン凝縮における量子バックアクションと遠方共振レーザービームとの相互作用を理論的に実験的に特徴付ける。この過程を,環境が散乱光を測定する量子軌道法を用いて理論的に記述し,理想的な光検出機構に基づく測定モデルを提案する。ラムゼー干渉計のコントラストの観点で導波関数の変化を実験的に定量化し,測定過程に伴う寄生効果を制御した。観測されたバックアクションは、我々の測定モデルとよく一致しており、量子ガスの真の量子バックアクション制限測定を可能にする。

A fundamental tenet of quantum mechanics is that measurements change a system's wavefunction to that most consistent with the measurement outcome, even if no observer is present. Weak measurements produce only limited information about the system, and as a result only minimally change the system's state. Here, we theoretically and experimentally characterize quantum back-action in atomic Bose-Einstein condensates interacting with a far-from resonant laser beam. We theoretically describe this process using a quantum trajectories approach where the environment measures the scattered light and present a measurement model based on an ideal photodetection mechanism. We experimentally quantify the resulting wavefunction change in terms of the contrast of a Ramsey interferometer and control parasitic effects associated with the measurement process. The observed back-action is in good agreement with our measurement model, enabling true quantum back-action limited measurements of quantum gases.

翻訳日:2023-01-27 05:21:36 公開日:2022-11-28

# 超伝導量子ビットによる量子コンピューティングの将来

The Future of Quantum Computing with Superconducting Qubits ( http://arxiv.org/abs/2209.06841v2 )

ライセンス: Link先を確認

Sergey Bravyi, Oliver Dial, Jay M. Gambetta, Dario Gil, and Zaira Nazario

(参考訳) 史上初めて、量子処理ユニット(QPU)の出現とともに、コンピューティングパラダイムにおいて分岐点が見られます。超多項式スピードアップによる計算の可能性を抽出し、量子アルゴリズムを実現するには、量子誤り訂正技術の大幅な進歩が必要になる可能性が高い。一方、短期的に計算上の優位性を達成するためには、回路編み技術による複数のQPUの組み合わせ、エラー抑制と緩和による解の質の向上、漸近的なスピードアップによる量子アルゴリズムのヒューリスティックバージョンに焦点を当てることが考えられる。そのためには、量子コンピューティングハードウェアの性能を改善し、ソフトウェアは量子中心のスーパーコンピュータと呼ばれる新しいアーキテクチャを形成するために、量子プロセッサと古典プロセッサをシームレスに統合する必要がある。長期的には、2Dトポロジ以上の量子ビット接続を利用してより効率的な量子エラー訂正コードを実現するハードウェアや、QPUのスケーリングとワークロードの並列化のためのモジュールアーキテクチャ、ユーザの目に見えない技術的複雑さを実現し、ユビキタスで摩擦のない量子コンピューティングの目標を実現するソフトウェアなどがあります。

For the first time in history, we are seeing a branching point in computing paradigms with the emergence of quantum processing units (QPUs). Extracting the full potential of computation and realizing quantum algorithms with a super-polynomial speedup will most likely require major advances in quantum error correction technology. Meanwhile, achieving a computational advantage in the near term may be possible by combining multiple QPUs through circuit knitting techniques, improving the quality of solutions through error suppression and mitigation, and focusing on heuristic versions of quantum algorithms with asymptotic speedups. For this to happen, the performance of quantum computing hardware needs to improve and software needs to seamlessly integrate quantum and classical processors together to form a new architecture that we are calling quantum-centric supercomputing. Long term, we see hardware that exploits qubit connectivity in higher than 2D topologies to realize more efficient quantum error correcting codes, modular architectures for scaling QPUs and parallelizing workloads, and software that evolves to make the intricacies of the technology invisible to the users and realize the goal of ubiquitous, frictionless quantum computing.

翻訳日:2023-01-26 16:51:57 公開日:2022-11-28

# 脱局在質量の再結合における重力子放出条件

Conditions for graviton emission in the recombination of a delocalized mass ( http://arxiv.org/abs/2209.10355v2 )

ライセンス: Link先を確認

Alessandro Pesci

(参考訳) よく知られた思考実験では、非局在化した質量が再結合される一方で、それが生み出す重力場が別の(遠方の)粒子によって探られる。これにより、重力場が重ね合わされた位置ともつれる場合に生じうる、相補性と因果性の間の緊張関係を探ることができる。提案されている解決策は四重極モーメントからの重力子放出である。因果的に切り離されたソースとプローブに対して、モーメントがどの経路かを識別できるほど異なるとき、それは同時に再結合における重力子放出も意味する(明示的に計算する必要はない)。ここでは、非局在化した粒子に焦点を当て(プローブと思考実験のことは忘れて)、重力子放出の条件(質量、分離、再結合時間)を探る。その結果、再結合における四重極モーメントの変化は、場が非局在状態上のエネルギー期待値を源とする場合(この場合のモーメント変化は $\sim m \, d^2$、$m$ は質量、$d$ は分離)に比べて、場がもつれている場合には一般に大きく増強されることが分かる。また、重力子放出の限界再結合時間は $\sqrt{m}$ ではなく $m$ に比例して増大する。ここでプランク質量はしきい値質量(非局在化した物体にとっては巨大)として働き、それ未満では再結合がどれほど速くても重力子放出は起こりえない。これを Diosi と Penrose の崩壊モデル(基本形)で予想される崩壊時間と比較すると、それらのモデルでは再結合による(四重極)重力子放出は不可能であることが分かる。実際、$m$ が放出を許すほど大きくなったちょうどそのとき、重ね合わせが再結合するまで崩壊に耐えるには大きすぎるようになる。

In a known gedanken experiment, a delocalized mass is recombined while the gravitational field sourced by it is probed by another (distant) particle. This allows one to explore a possible tension between complementarity and causality, arising if the gravitational field entangles with the superposed locations. A proposed resolution is graviton emission from quadrupole moments: when, for causally disconnected source and probe, the moments differ enough to allow which-path discrimination, they also imply graviton emission in the recombination (no need to explicitly compute them). Here we focus on the delocalized particle (forgetting about the probe and the gedanken experiment) and explore the conditions (in terms of mass, separation, and recombination time) for graviton emission. Doing this, we find that the variations of quadrupole moments in the recombination are generically greatly enhanced if the field is entangled, compared to when it is sourced instead by the energy expectation value on the delocalized state (moment variation $\sim m \, d^2$ in the latter case, with $m$ the mass and $d$ the separation). Also, we get a limit recombination time for graviton emission growing as $m$ in place of $\sqrt{m}$. In this, the Planck mass acts as a threshold mass (huge, for delocalized objects): no graviton emission is possible below it, however fast the recombination occurs. If this is compared with the decay times foreseen in the collapse models of Diosi and Penrose (in their basic form), one finds that no (quadrupole) graviton emission from recombination is possible in them. Indeed, right when $m$ becomes large enough to allow for emission, it becomes too large for the superposition to survive collapse long enough to recombine.

翻訳日:2023-01-25 20:46:54 公開日:2022-11-28

# 可変超伝導量子ビットを用いた量子センシング:最適化と高速化

Quantum sensing with tunable superconducting qubits: optimization and speed-up ( http://arxiv.org/abs/2211.08344v3 )

ライセンス: Link先を確認

Sergey Danilin, Nicholas Nugent, Martin Weides

(参考訳) センシングと計測学は、より高精度なデータへの絶え間ない要求に応え、研究者が理論モデルの妥当性についてより信頼性の高い結論を出せるようにすることで、基礎科学と応用において重要な役割を果たす。センサーは至るところに存在し、重力イメージング、地質学、ナビゲーション、セキュリティ、計時、分光、化学、磁気測定、ヘルスケア、医療など幅広い分野の応用で使われている。量子技術の現在の進歩は、新しく改善された能力を持つセンサーとして量子系を利用する探究を必然的に促した。本稿では、可変トランスモン量子ビットを用いたセンサーに基づくキタエフ位相推定アルゴリズムによる、外部磁束の量子増強センシングの最適化について述べる。最大量子ビット遷移周波数の異なるセンサーに対して最適なフラックスバイアス点を与える。所定の設計に対してデコヒーレンス率の推定を行う。センシングに2量子ビットおよび3量子ビットのエンタングル状態を用いる場合を、単一量子ビットの場合とシミュレーションで比較する。フラックスセンシング精度は $10^{-8}\cdot\Phi_0$ に達し、時間とともに $\sim 1/t$ でスケールし、高い最終精度でのセンシングの高速化が示される。

Sensing and metrology play an important role in fundamental science and applications by fulfilling the ever-present need for more precise data sets and by allowing researchers to draw more reliable conclusions on the validity of theoretical models. Sensors are ubiquitous. They are used in applications across a diverse range of fields including gravity imaging, geology, navigation, security, timekeeping, spectroscopy, chemistry, magnetometry, healthcare, and medicine. Current progress in quantum technologies has inevitably triggered the exploration of the use of quantum systems as sensors with new and improved capabilities. This article describes the optimization of the quantum-enhanced sensing of external magnetic fluxes with a Kitaev phase estimation algorithm based on a sensor with tunable transmon qubits. It provides the optimal flux biasing point for sensors with different maximal qubit transition frequencies. An estimation of decoherence rates is made for a given design. The use of 2- and 3-qubit entangled states for sensing is compared in simulation with the single-qubit case. The flux sensing accuracy reaches $10^{-8}\cdot\Phi_0$ and scales with time as $\sim 1/t$, which proves the speed-up of sensing with high ultimate accuracy.

翻訳日:2023-01-19 12:29:20 公開日:2022-11-28

# 未決定ダイソン・シュウィンガー方程式

Underdetermined Dyson-Schwinger equations ( http://arxiv.org/abs/2211.13026v2 )

ライセンス: Link先を確認

Carl M. Bender, Christos Karapoulitidis and S.P. Klevansky

(参考訳) 本稿では、ダイソン・シュウィンガー方程式(ds)を量子場理論における計算ツールとしての有効性について検討する。 DS方程式は、場の理論の連結グリーン函数$G_n$によって正確に満たされる結合方程式の無限列である。これらの方程式は、より高次のグリーン函数に結合し、それらが切り離された場合、結果として生じる有限な方程式体系は過小評価される。未決定系を解く最も単純な方法は、すべての高次グリーン関数を 0 に設定し、最初の数個のグリーン関数に対して得られた決定系を解くことである。得られた$g_1$ または $g_2$ so は、解決可能なモデルの正確な結果と比較でき、高次切り換えの精度が向上するかどうかを確認することができる。 hermitian $\phi^4$ と $\phi^6$ と non-hermitian $i\phi^3$, $-\phi^4$, $i\phi^5$ の5つのモデルが研究されている。切断されたds方程式は、緩やかに制限値に収束する近似値の列を与えるが、この制限値は常に正確な値と数パーセント異なる。平均場的近似に基づくより洗練されたトランケーションスキームは、この恐ろしい計算問題を解決しない。

This paper examines the effectiveness of the Dyson-Schwinger (DS) equations as a calculational tool in quantum field theory. The DS equations are an infinite sequence of coupled equations that are satisfied exactly by the connected Green's functions $G_n$ of the field theory. These equations link lower to higher Green's functions and, if they are truncated, the resulting finite system of equations is underdetermined. The simplest way to solve the underdetermined system is to set all higher Green's function(s) to zero and then to solve the resulting determined system for the first few Green's functions. The $G_1$ or $G_2$ so obtained can be compared with exact results in solvable models to see if the accuracy improves for high-order truncations. Five $D=0$ models are studied: Hermitian $\phi^4$ and $\phi^6$ and non-Hermitian $i\phi^3$, $-\phi^4$, and $i\phi^5$ theories. The truncated DS equations give a sequence of approximants that converge slowly to a limiting value but this limiting value always differs from the exact value by a few percent. More sophisticated truncation schemes based on mean-field-like approximations do not fix this formidable calculational problem.
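The truncation procedure described above can be made concrete with a small worked sketch for the Hermitian $\phi^4$ model in $D=0$. It assumes the normalization $Z[J] = \int d\phi\, e^{-\phi^4/4 + J\phi}$ with no mass term, which the abstract does not specify. Under that assumption, integrating a total derivative gives $Z''' = J Z$; writing $Z = e^{W}$ with $G_n = W^{(n)}(0)$, differentiating once more, and using that the odd Green's functions vanish at $J = 0$ yields $G_4 + 3 G_2^2 = 1$. Setting $G_4 = 0$ is the lowest truncation. The function names below are assumptions introduced for this illustration.

```python
import math


def exact_g2():
    """Exact <phi^2> for the weight exp(-phi^4/4): 2*Gamma(3/4)/Gamma(1/4)."""
    return 2.0 * math.gamma(0.75) / math.gamma(0.25)


def truncated_g2():
    """Lowest truncation: solve G_4 + 3*G_2**2 = 1 with G_4 set to zero."""
    return 1.0 / math.sqrt(3.0)


def numerical_g2(half_width=6.0, n_points=20000):
    """Midpoint-rule check of <phi^2> as a ratio of two one-dimensional integrals."""
    h = 2.0 * half_width / n_points
    num = 0.0
    den = 0.0
    for k in range(n_points):
        phi = -half_width + (k + 0.5) * h
        weight = math.exp(-phi ** 4 / 4.0)
        num += phi * phi * weight
        den += weight
    return num / den


exact = exact_g2()
approx = truncated_g2()
print(exact, approx, (exact - approx) / exact)
```

Under these assumptions the lowest truncation underestimates $G_2$ by roughly 15%. The abstract's claim concerns what happens as the truncation order is raised, which this single-step sketch does not explore.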

翻訳日:2023-01-19 01:32:08 公開日:2022-11-28

# 増幅ファイバリンクにおける絡み合い支援通信のスケーリング

Scaling of Entanglement-Assisted Communication in Amplified Fiber Links ( http://arxiv.org/abs/2211.13296v2 )

ライセンス: Link先を確認

Simon Sekavčnik and Janis Nötzel

(参考訳) 量子通信技術はいくつかの高度な戦略を提供する。しかし、その実践的利用はしばしばよく理解されていない。本稿では,増幅ファイバリンクにおける理論的な通信容量スケーリングについて概説する。本稿では,十分な帯域幅と空間モードがファイバによって提供され,事前共有されたエンタングルメントによる支援が任意のキャパシティを提供するシナリオを提案する。従来の古典的手法や非支援量子技術に対する将来的な能力の優位性は、潜在的に無限大である。我々はこの理論的な観察を、繊維開発の現状に関連して論じる。

Quantum communication technology offers several advanced strategies. However, their practical use is often still not well understood. In this work, we outline the theoretical scaling of communication capacity in amplified fiber links. We present a scenario in which assistance via pre-shared entanglement offers an arbitrary capacity, provided the fiber supplies enough bandwidth and spatial modes. The future capacity advantage over conventional classical techniques as well as non-assisted quantum techniques is potentially infinite. We discuss this theoretical observation in connection with current trends in fiber development.

翻訳日:2023-01-19 01:12:50 公開日:2022-11-28

# 量子オットーサイクルによるエナンチオマー検出

Enantiomer detection via Quantum Otto cycle ( http://arxiv.org/abs/2211.06888v2 )

ライセンス: Link先を確認

Mohsen Izadyari and M. Tahir Naseem and Özgür E. Müstecaplıoğlu

(参考訳) エナンチオマーは、左右の配座に存在するキラル分子である。エナンチオマーの検出の光学的手法は、左利き分子と右利き分子の識別に広く用いられている。しかし、同一のエナンチオマーのスペクトルはエナンチオマーの検出を非常に困難な課題にしている。本稿では, 熱力学的プロセスを利用したエナンチオマー検出の可能性を検討する。特に、周期的な光遷移を持つ3段階系によって記述されるキラル分子を加工媒体とする量子オットーサイクルを用いる。 3レベルシステムの各エネルギー遷移は、外部レーザー駆動と結合される。左利きの分子は熱エンジンとして機能し、右利きの分子は熱加速器として機能し、ドライブの全体位相はサイクルの制御パラメータとして考慮される。さらに、レーザー駆動の変形を制御パラメータとして考慮し、左右どちらの分子も熱エンジンとして機能する。しかし、両方の症例の抽出された作業と効率が定量的に異なるため、分子は依然として区別できる。したがって、オットーサイクルの作業分布を評価することにより、左右の分子を区別することができる。

Enantiomers are chiral molecules that exist in right-handed and left-handed conformations. Optical techniques for enantiomer detection are widely employed to discriminate between left- and right-handed molecules. However, the identical spectra of enantiomers make enantiomer detection a very challenging task. Here, we investigate the possibility of exploiting thermodynamic processes for enantiomer detection. In particular, we employ a quantum Otto cycle in which a chiral molecule, described by a three-level system with cyclic optical transitions, serves as the working medium. Each energy transition of the three-level system is coupled with an external laser drive. We find that the left-handed molecule works as a heat engine, while the right-handed molecule works as a thermal accelerator, when the overall phase of the drives is taken as the cycle's control parameter. In addition, both left- and right-handed molecules work as heat engines when the detuning of the laser drives is taken as the control parameter. However, the molecules can still be distinguished because the extracted work and efficiency are quantitatively very different in the two cases. Accordingly, left- and right-handed molecules can be distinguished by evaluating the work distribution in the Otto cycle.

翻訳日:2023-01-18 07:27:05 公開日:2022-11-28

# 光子計数ベルテストのEberhard極限と量子鍵分布におけるその有用性

Eberhard limit for photon-counting Bell tests and its utility in quantum key distribution ( http://arxiv.org/abs/2211.15033v1 )

ライセンス: Link先を確認

Thomas McDermott, Morteza Moradi, Antoni Mikos-Nuszkiewicz, Magdalena Stobińska

(参考訳) 抜け穴のないベルテストは、任意の抜け穴がプロトコルのセキュリティを損なう可能性があるため、デバイス非依存の量子鍵分布(qkd)を実行したい場合には不可欠である。エバーハルトによる地殻調査では、弱絡み2量子状態は最大絡み2/3以上の検出効率でループホールを閉じることができるため、検出ループホールに対する抵抗が最大絡み2/3よりもはるかに大きいことを示した。ここでは、2モード圧縮真空や一般化ホランド・バーネット状態のような高次元多光子状態の非局在性を証明できる光子計数CHSHベル試験について、この制限が成り立つことを示した。実際、これらのテストは何らかの意味で普遍的であり、2つのモードが光子数でよく相関している限り、任意の多光子二成分状態に対して可能な検出ホールフリーなテストを可能にする証拠を示す。さらに、典型的な2入出力ベルのシナリオを超えて、エバーハルト限界に合致する光子計数CGLMPの不等式も存在し、よりエキゾチックなループホールのないベル試験への道を開いた。最後に,非エンタングル状態の損失許容度を増大させることで,光子計数テストに基づくqkdプロトコルの鍵レートと損失許容度を向上できることを示す。

Loophole-free Bell tests are essential if one wishes to perform device-independent quantum key distribution (QKD), since any loophole could be used by a potential adversary to undermine the security of the protocol. Crucial work by Eberhard demonstrated that weakly entangled two-qubit states have a far greater resistance to the detection loophole than maximally entangled states, allowing one to close the loophole with detection efficiency greater than 2/3. Here we demonstrate that this same limit holds for photon-counting CHSH Bell tests, which can demonstrate non-locality for higher-dimensional multiphoton states such as two-mode squeezed vacuum and generalized Holland-Burnett states. In fact, we show evidence that these tests are in some sense universal, allowing feasible detection-loophole-free tests for any multiphoton bipartite state, as long as the two modes are well correlated in photon number. Additionally, by going beyond the typical two-input two-output Bell scenario, we show that there are also photon-counting CGLMP inequalities which can match the Eberhard limit, paving the way for more exotic loophole-free Bell tests. Finally, we show that by exploiting this increased loss tolerance of non-maximally entangled states, one can increase the key rates and loss tolerances of QKD protocols based on photon-counting tests.
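
For comparison with the 2/3 limit quoted above, the efficiency threshold for a maximally entangled two-qubit state follows from a short calculation. The sketch below is an illustration under stated assumptions, not the method of the paper: it uses the Clauser-Horne form of the inequality, a single symmetric detector efficiency `eta`, no background counts, and undetected events that are simply not counted. Coincidence probabilities then scale as `eta**2` and single-count probabilities as `eta`. All names are hypothetical.

```python
import math

# Eberhard limit quoted in the abstract for weakly entangled states.
EBERHARD_LIMIT = 2.0 / 3.0


def ch_value(eta: float) -> float:
    """Clauser-Horne expression for a maximally entangled state at efficiency eta.

    CH inequality: P(a1,b1) + P(a1,b2) + P(a2,b1) - P(a2,b2) - P(a1) - P(b1) <= 0.
    With ideal detectors the best coincidence combination is (1 + sqrt(2)) / 2
    and the two single-count probabilities sum to 1.
    """
    coincidences = (1.0 + math.sqrt(2.0)) / 2.0
    singles = 1.0
    return eta**2 * coincidences - eta * singles


def threshold_max_entangled() -> float:
    """Smallest efficiency at which ch_value becomes positive (a violation)."""
    return 2.0 / (1.0 + math.sqrt(2.0))
```

Under these assumptions the threshold is $2(\sqrt{2}-1) \approx 0.828$, noticeably higher than 2/3. Reaching 2/3 requires optimising over non-maximally entangled states and measurement settings, which this sketch does not attempt.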

翻訳日:2023-01-17 15:19:02 公開日:2022-11-28

# 開量子多体系におけるデコヒーレンス過程の準粒子:インコヒーレントン

Quasiparticles of Decoherence Processes in Open Quantum Many-Body Systems: Incoherentons ( http://arxiv.org/abs/2211.14991v1 )

ライセンス: Link先を確認

Taiki Haga, Masaya Nakagawa, Ryusuke Hamazaki, Masahito Ueda

(参考訳) 開量子系の緩和ダイナミクスは、系のコヒーレントハミルトン力学と環境との相互作用による散逸力学との競合によって決定される。したがって、コヒーレント体制から非コヒーレント体制への移行を理解することは基本的な関心事である。ヒッヘルト非認識準粒子(インコヒーレントン)は、開量子多体系の力学を支配するリウヴィリア超作用素の固有モデムにおけるコヒーレント-非コヒーレント遷移を記述する。ここで、インコヒーレントンは、系の密度行列を表す補助ラダー系において、鎖間結合状態として定義される。リウヴィリアン固有モードは、関連するインコヒーレントンの数を反映する異なる減衰率を持つ群に分類される。また、固有モードの異なるグループを分離するスペクトルギャップ(量子コヒーレンスギャップ)も導入します。我々は, 劣化を受ける格子ボソンモデルにおけるインコヒーレントンの存在を実証し, インコヒーレントンが分解されると量子コヒーレンスギャップが閉じることを示し, 指数的崩壊による非コヒーレント緩和からコヒーレント振動緩和への動的遷移を示す。さらに, 量子多体系のデコヒーレンスダイナミクスが, インコヒーレントンの生成, 局在, 拡散の観点でどのように理解できるかを考察する。

The relaxation dynamics of an open quantum system is determined by the competition between the coherent Hamiltonian dynamics of a system and the dissipative dynamics due to interactions with environments. It is therefore of fundamental interest to understand the transition from the coherent to incoherent regimes. We find that hitherto unrecognized quasiparticles -- incoherentons -- describe this coherent-to-incoherent transition in eigenmodes of a Liouvillian superoperator that governs the dynamics of an open quantum many-body system. Here, an incoherenton is defined as an interchain bound state in an auxiliary ladder system that represents the density matrix of a system. The Liouvillian eigenmodes are classified into groups with different decay rates that reflect the number of incoherentons involved therein. We also introduce a spectral gap -- quantum coherence gap -- that separates the different groups of eigenmodes. We demonstrate the existence of incoherentons in a lattice boson model subject to dephasing, and show that the quantum coherence gap closes when incoherentons are deconfined, which signals a dynamical transition from incoherent relaxation with exponential decay to coherent oscillatory relaxation. Furthermore, we discuss how the decoherence dynamics of quantum many-body systems can be understood in terms of the generation, localization, and diffusion of incoherentons.

翻訳日:2023-01-17 15:18:36 公開日:2022-11-28

# 量子交互演算子アンザッツによる最小被覆問題の解法

Quantum Alternating Operator Ansatz for Solving the Minimum Exact Cover Problem ( http://arxiv.org/abs/2211.15266v1 )

ライセンス: Link先を確認

Sha-Sha Wang, Hai-Ling Liu, Su-Juan Qin, Fei Gao, and Qiao-Yan Wen

(参考訳) 最小完全被覆(MEC)は一般的な組合せ最適化問題であり、テールアサインメントや車両ルーティングに広く応用されている。本稿では,MEC 問題を解くために量子交互演算子 ansatz (QAOA+) を用いる。詳しくは、自明な実現可能な解を得るために、まずMECを2つの目的関数を持つ制約付き最適化問題に変換する。そこで,線形重み付き和法を用いて上記の制約付き最適化問題を解き,対応する対象ハミルトニアンを構成する。最後に,本アルゴリズムの性能向上のために,実験例が6,8,10キュービットである場合のシミュレーションにパラメータ固定方式を採用する。数値計算の結果,アルゴリズムのレベル$p$が低い場合,高い確率で解が得られることがわかった。さらに、シングルキュービット回転ゲート$r_z$を除去して量子回路を最適化する。量子ゲートの数は$np$ for $p$レベルの最適化回路で減少することがわかった。さらに、$p$レベル最適化回路は$p$パラメータしか必要とせず、$p$パラメータを持つオリジナル回路と同様の実験的な効果を実現できる。

The minimum exact cover (MEC) is a common combinatorial optimization problem, with wide applications in tail assignment and vehicle routing. In this paper, we adopt the quantum alternating operator ansatz (QAOA+) to solve the MEC problem. In detail, to obtain a trivial feasible solution, we first transform MEC into a constrained optimization problem with two objective functions. Then, we adopt the linear weighted sum method to solve the above constrained optimization problem and construct the corresponding target Hamiltonian. Finally, to improve the performance of this algorithm, we adopt a parameter-fixing strategy in simulations of instances with 6, 8, and 10 qubits. The numerical results show that the solution can be obtained with high probability when the level $p$ of the algorithm is low. Besides, we optimize the quantum circuit by removing single-qubit rotation gates $R_Z$. We find that the number of quantum gates is reduced by $np$ for a $p$-level optimized circuit. Furthermore, the $p$-level optimized circuit needs only $p$ parameters and achieves an experimental effect similar to that of the original circuit with $2p$ parameters.

翻訳日:2023-01-17 15:09:18 公開日:2022-11-28

# 動くUnruh-deWitt検出器の量子相関とコヒーレンス

Quantum Correlations and Coherence in a Moving Unruh-deWitt Detector ( http://arxiv.org/abs/2211.15263v1 )

ライセンス: Link先を確認

S. Bhuvaneswari, R. Muthuganesan and R. Radha

(参考訳) 本稿では,スカラー場に結合した2つの加速型unruh-dewitt検出器の3 + 1 minkowski時空における量子相関とコヒーレンスについて検討する。エンタングルメントは無限加速度の限界で完全に破壊されるが、局所量子の不確かさとコヒーレンスのl1ノルムはゼロではない。さらに,初期状態の異なる選択に対する量子相関に対する検出器の非ルール温度とエネルギー間隔の役割についても注目する。

In this paper, we investigate the quantum correlations and coherence of two accelerating Unruh-deWitt detectors coupled to a scalar field in (3+1)-dimensional Minkowski space-time. We show that the entanglement is completely destroyed in the limit of infinite acceleration, while the local quantum uncertainty and the $l_1$-norm of coherence remain nonzero. In addition, we highlight the role of the Unruh temperature and the energy spacing of the detectors in the quantum correlations for different choices of initial states.

翻訳日:2023-01-17 15:09:00 公開日:2022-11-28

# ランダムイジングモデルのためのディープラーニング最適量子アニールスケジュール

Deep learning optimal quantum annealing schedules for random Ising models ( http://arxiv.org/abs/2211.15209v1 )

ライセンス: Link先を確認

Pratibha Raghupati Hegde, Gianluca Passarelli, Giovanni Cantele, and Procolo Lucignano

(参考訳) 量子アドバンテージへの競争における重要なステップは、アドホックアニーリングスケジュールを用いた量子アニーリングの最適化である。この分野の最近の進展に触発されて、我々は、正規グラフ上の(ランダム)重み付きMax-Cutの最適焼鈍スケジュールの探索を自動化するために、長期記憶(LSTM)ニューラルネットワークを提案する。局所断熱焼鈍経路を用いてネットワークを訓練することにより,未発見のインスタンスやより大きなグラフに対する最適焼鈍スケジュールを,トレーニングに使用するものよりも予測することができる。

A crucial step in the race towards quantum advantage is optimizing quantum annealing using ad-hoc annealing schedules. Motivated by recent progress in the field, we propose to employ long short-term memory (LSTM) neural networks to automate the search for optimal annealing schedules for (random) weighted Max-Cut on regular graphs. By training our network using locally adiabatic annealing paths, we are able to predict optimal annealing schedules for unseen instances and even for graphs larger than those used for training.

翻訳日:2023-01-17 15:08:52 公開日:2022-11-28

# 量子状態の数学的モデリングにおけるMajorana表現

Majorana Representation in Mathematical Modeling of Quantum States ( http://arxiv.org/abs/2211.15113v1 )

ライセンス: Link先を確認

Farhod Shokir

(参考訳) 本稿では、Majorana法を用いてスピン数S=j\^hの量子系の状態の数学的モデリングを行う。一般の場合j>=0.5における配向状態の相関関数の式を得る。

In this paper, the Majorana method is used for the mathematical modeling of the states of quantum systems with spin number $S = j\hbar$. An expression for the correlation functions of oriented states in the general case $j \geq 1/2$ is obtained.

翻訳日:2023-01-17 15:08:11 公開日:2022-11-28

# 散逸キラル分子の放射線に対するエナンチオ選択的スイッチ

Enantioselective switch on radiations of dissipative chiral molecules ( http://arxiv.org/abs/2211.15112v1 )

ライセンス: Link先を確認

Chong Ye, Xiaowei Mu, Yifan Sun, Libin Fu, and Xiangdong Zhang

(参考訳) エナンチオ検出は自然科学において重要かつ困難な課題である。今日では、キラル分子の脱コヒーレンス非環状三レベルモデルに基づく光学的エナンチオデッション法は、分子応答におけるエナンチオ選択性の究極の限界に達することができる。したがって、従来のキロプティカル法よりも効率的である。しかしながら、脱コヒーレンスは避けられず、これらの高度な光学的手法のエナンチオ選択性を著しく低減することができるため、弱い脱コヒーレンス領域ではうまく機能する。本稿では,散逸性キラル分子の放射線に対するエナンチオ選択的スイッチを提案し,全てのデコヒーレンス領域において新しいエナンチオ検出法を開発した。提案方式では, 選択したエナンチオマーに対して放射線を照射し, 消散性三レベルモデルに基づいて電磁界をよく設計し, ミラー画像に対して同時に消光する。キラル混合物のエナンチオマー過剰は、2つのエナンチオマーの放射がそれぞれオフになっている2つのケースでその放出を比較することにより決定される。対応するエナンチオ選択性は、全てのデコヒーレンス領域において究極の限界に達し、エナンチオ検出における他のキロプティカル手法よりもスキームのアドバンテージを提供する。本研究は, すべての脱コヒーレンス領域において, より効率的なエナンチオディッション技術を開発するための出発点となる可能性がある。

Enantiodetection is an important and challenging task across natural science. Nowadays, some chiroptical methods of enantiodetection based on decoherence-free cyclic three-level models of chiral molecules can reach the ultimate limit of the enantioselectivities in the molecular responses. They are thus more efficient than traditional chiroptical methods. However, decoherence is inevitable and can severely reduce enantioselectivities in these advanced chiroptical methods, so they only work well in the weak decoherence region. Here, we propose an enantioselective switch on the radiation of dissipative chiral molecules and develop a novel chiroptical method of enantiodetection that works well in all decoherence regions. In our scheme, radiation is turned on for the selected enantiomer and simultaneously turned off for its mirror image by suitably designing the electromagnetic fields based on dissipative cyclic three-level models. The enantiomeric excess of a chiral mixture is determined by comparing its emissions in two cases, in which the radiation of one or the other enantiomer is turned off. The corresponding enantioselectivities reach the ultimate limit in all decoherence regions, giving our scheme an advantage over other chiroptical methods of enantiodetection. Our work potentially constitutes the starting point for developing more efficient chiroptical techniques for enantiodetection in all decoherence regions.

翻訳日:2023-01-17 15:08:07 公開日:2022-11-28

# チャネル識別のための利益のある絡み合い

Profitable entanglement for channel discrimination ( http://arxiv.org/abs/2211.15108v1 )

ライセンス: Link先を確認

Samad Khabbazi Oskouei, Stefano Mancini, Milajiguli Rexiti

(参考訳) 本研究では,2つの一般量子ビットチャネルの識別における側絡の有用性について検討し,それが拡張する条件(および成功確率が向上しない条件)を決定する。これは、まず、完全正およびトレース保存されたキュービット線型写像の集合において極端であるチャネルの問題を解析し、次にそのような集合の内部にあるチャネルについて構成的に行われる。

We investigate the usefulness of side entanglement in discriminating between two generic qubit channels and determine exact conditions under which it does enhance (as well as conditions under which it does not) the success probability. This is done in a constructive way by first analyzing the problem for channels that are extremal in the set of completely positive and trace-preserving qubit linear maps and then for channels that are inside such a set.

翻訳日:2023-01-17 15:07:40 公開日:2022-11-28

# 4光子GHZオンチップ状態の高信頼化

High-fidelity generation of four-photon GHZ states on-chip ( http://arxiv.org/abs/2211.15626v1 )

ライセンス: Link先を確認

Mathias Pont, Giacomo Corrielli, Andreas Fyrillas, Iris Agresti, Gonzalo Carvacho, Nicolas Maring, Pierre-Emmanuel Emeriau, Francesco Ceccarelli, Ricardo Albiero, Paulo H. D. Ferreira, Niccolo Somaschi, Jean Senellart, Isabelle Sagnes, Martina Morassi, Aristide Lemaitre, Pascale Senellart, Fabio Sciarrino, Marco Liscidini, Nadia Belabas, Roberto Osellame

(参考訳) 相互に絡み合った多光子状態は、全光学量子技術の核心にある。自由空間装置を用いた量子光の発生において顕著な進展が報告されているが、将来の拡張性には高忠実なオンチップエンタングルメント生成が不可欠である。本研究では,4光子グリーンバーグ・ホルン・ザイリンガー(GHZ)状態の高忠実度発生を低損失再構成ガラスフォトニック回路で実証するために,明るい量子ドットベースの単一光子源を用いる。我々は、生成状態の密度行列を、ターゲットである$|{\text{GHZ}_4}\rangle$ of $\mathcal{F}_{\text{GHZ}_4} (86.0\pm0.4)\,\%$と、$\mathcal{P}_{\text{GHZ}_4}=(76.3\pm0.6)\,\%$に到達した完全量子状態トモグラフィーを用いて再構成する。生成した状態の絡み合いは、39以上の標準偏差によるベルのような不等式違反による半デバイス非依存のアプローチで認証される。最後に、我々は4つのパーティの量子秘密共有プロトコルをチップ上で実行し、3つのインターロケータと最大1978ビットのsiftedキーを共有し、キュービットエラー率10.87\,\%$を達成する。これらの結果は、チップ上の絡み合い生成のためのガラスフォトニック回路と組み合わされた量子ドット技術が、中間スケールの量子計算と通信に有効な経路を提供することを示している。

Mutually entangled multi-photon states are at the heart of all-optical quantum technologies. While impressive progress has been reported in the generation of such quantum light states using free-space apparatus, high-fidelity high-rate on-chip entanglement generation is crucial for future scalability. In this work, we use a bright quantum-dot-based single-photon source to demonstrate the high-fidelity generation of 4-photon Greenberger-Horne-Zeilinger (GHZ) states with a low-loss reconfigurable glass photonic circuit. We reconstruct the density matrix of the generated states using full quantum-state tomography, reaching an experimental fidelity to the target $|{\text{GHZ}_4}\rangle$ of $\mathcal{F}_{\text{GHZ}_4} = (86.0\pm0.4)\,\%$, and a purity of $\mathcal{P}_{\text{GHZ}_4}=(76.3\pm0.6)\,\%$. The entanglement of the generated states is certified with a semi-device-independent approach through the violation of a Bell-like inequality by more than 39 standard deviations. Finally, we carry out a four-partite quantum secret sharing protocol on-chip where a regulator shares with three interlocutors a sifted key with up to 1978 bits, achieving a qubit-error rate of $10.87\,\%$. These results establish that the quantum-dot technology combined with glass photonic circuitry for entanglement generation on chip offers a viable path for intermediate-scale quantum computation and communication.
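
To make the two figures of merit concrete, the sketch below evaluates them on a toy state. It assumes the usual definitions, fidelity $\langle\text{GHZ}|\rho|\text{GHZ}\rangle$ for a pure target and purity $\mathrm{Tr}\,\rho^2$, and it uses a white-noise mixture $v\,|\text{GHZ}\rangle\langle\text{GHZ}| + (1-v)\,I/16$ as a stand-in for an imperfect state. The noise model, the parameter `visibility`, and all function names are assumptions for illustration; they do not describe the tomography or the noise in the experiment.

```python
def ghz_density_matrix(n_qubits: int) -> list[list[float]]:
    """Density matrix of (|0...0> + |1...1>) / sqrt(2) in the computational basis."""
    dim = 2 ** n_qubits
    rho = [[0.0] * dim for _ in range(dim)]
    # Only the four corner entries are non-zero, each equal to 1/2.
    for i in (0, dim - 1):
        for j in (0, dim - 1):
            rho[i][j] = 0.5
    return rho


def add_white_noise(rho: list[list[float]], visibility: float) -> list[list[float]]:
    """Mix rho with the maximally mixed state (hypothetical noise model)."""
    dim = len(rho)
    return [
        [
            visibility * rho[i][j] + (1.0 - visibility) * (1.0 / dim if i == j else 0.0)
            for j in range(dim)
        ]
        for i in range(dim)
    ]


def fidelity_with_ghz(rho: list[list[float]]) -> float:
    """<GHZ|rho|GHZ>, which only involves the four corner entries of rho."""
    last = len(rho) - 1
    return 0.5 * (rho[0][0] + rho[0][last] + rho[last][0] + rho[last][last])


def purity(rho: list[list[float]]) -> float:
    """Tr(rho^2), written as the sum over i, j of rho[i][j] * rho[j][i]."""
    dim = len(rho)
    return sum(rho[i][j] * rho[j][i] for i in range(dim) for j in range(dim))
```

For example, `add_white_noise(ghz_density_matrix(4), 0.8)` has fidelity $0.8 + 0.2/16$ and purity $0.8^2 + (1 - 0.8^2)/16$ in this model. The model is only a convenient reference and is not expected to reproduce the measured pair of values.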

翻訳日:2023-01-17 15:00:45 公開日:2022-11-28

# 周期駆動型散逸双極子系のカスケードダイナミクス

Cascaded dynamics of a periodically driven dissipative dipolar system ( http://arxiv.org/abs/2211.15592v1 )

ライセンス: Link先を確認

Saptarshi Saha and Rangeet Bhattacharyya

(参考訳) 最近の実験では、双極子系の周期駆動が長寿命の予熱状態をもたらすことが示されている。これらのシステムは環境に弱い結合を持ち、熱化の時間スケールよりもはるかに短い時間スケールで予熱状態に達する。このようなほぼ閉ざされた系は、以前にフロッケ形式を用いて分析され、予熱プレートの出現を示している。これらのシステムを記述するために、変動制御量子マスター方程式(FRQME)を用いる。システム-環境結合に加えて、FRQMEはシステム内の様々な局所的相互作用からの散逸効果を捉えた。調査の結果,システムの最終安定状態へのカスケード的な旅が明らかになった。カスケードは、準保存量の集合によって特徴づけられる熱前状態または捕縛状態の集合を含む。これらの熱前状態は、緩和時間スケールよりもずっと短い時間スケールで現れる。また,予熱台地の存在が止まる限界の存在を発見,報告する。

Recent experiments show that periodic drives on dipolar systems lead to long-lived prethermal states. These systems are weakly coupled to the environment and reach prethermal states in a timescale much shorter than the timescale for thermalization. Such nearly-closed systems have previously been analyzed using Floquet formalism, which shows the emergence of a prethermal plateau. We use a fluctuation-regulated quantum master equation (FRQME) to describe these systems. In addition to the system-environment coupling, FRQME successfully captures the dissipative effect from the various local interactions in the system. Our investigation reveals a cascaded journey of the system to a final steady state. The cascade involves a set of prethermal or arrested states characterized by a set of quasi-conserved quantities. We show that these prethermal states emerge in a timescale much shorter than the relaxation timescale. We also find and report the existence of a critical limit beyond which the prethermal plateau ceases to exist.

翻訳日:2023-01-17 15:00:12 公開日:2022-11-28

# 量子会議キー合意の基本的限界を克服する

Overcoming fundamental bounds on quantum conference key agreement ( http://arxiv.org/abs/2211.15559v1 )

ライセンス: Link先を確認

Giacomo Carrara, Gláucia Murta and Federico Grasselli

(参考訳) ツインフィールド量子鍵分布(TF-QKD)は、中間測定ステーションで弱いコヒーレントパルス(WCP)を干渉することにより、2つの離れたパーティが共有秘密鍵を確立することを可能にする。これにより、TF-QKDは従来のQKDスキームよりも遠くまで到達でき、二部構成のプライベートキャパシティ上のリピータレスバウンドを打破できる唯一のスキームとなる。ここでは、TF-QKDを多人数シナリオに一般化する。具体的には,WCPと線形光学しか使用せず,マルチパーティデコイステート方式でセキュリティを証明する,実用的な会議鍵契約(CKA)プロトコルを提案する。本プロトコルは,任意の数の参加者が単一光子干渉によって秘密の会議鍵を確立することを可能にし,リピータを使わずに量子ネットワークで会議鍵を確立できる速度の最近の限界を克服する。

Twin-Field Quantum Key Distribution (TF-QKD) enables two distant parties to establish a shared secret key, by interfering weak coherent pulses (WCPs) in an intermediate measuring station. This allows TF-QKD to reach greater distances than traditional QKD schemes and makes it the only scheme capable of beating the repeaterless bound on the bipartite private capacity. Here, we generalize TF-QKD to the multipartite scenario. Specifically, we propose a practical conference key agreement (CKA) protocol that only uses WCPs and linear optics and prove its security with a multiparty decoy-state method. Our protocol allows an arbitrary number of parties to establish a secret conference key by single-photon interference, enabling it to overcome recent bounds on the rate at which conference keys can be established in quantum networks without a repeater.

翻訳日:2023-01-17 15:00:01 公開日:2022-11-28

# ギブス多様体

Gibbs Manifolds ( http://arxiv.org/abs/2211.15490v1 )

ライセンス: Link先を確認

Dmitrii Pavlov, Bernd Sturmfels and Simon Telen

(参考訳) ギブス多様体は指数写像の下で対称行列のアフィン空間の像である。これらは最適化、統計学、量子物理学などの応用で生まれ、トーリック幾何学のユビキタスな役割を伸ばす。ギブス多様体は、ギブス多様体上で消えるすべての多項式の零点である。これらの多項式を計算し、ギブス多様体が低次元であることを示す。我々の理論は、行列鉛筆や量子最適輸送など、幅広いシナリオに適用されている。

Gibbs manifolds are images of affine spaces of symmetric matrices under the exponential map. They arise in applications such as optimization, statistics and quantum physics, where they extend the ubiquitous role of toric geometry. The Gibbs variety is the zero locus of all polynomials that vanish on the Gibbs manifold. We compute these polynomials and show that the Gibbs variety is low-dimensional. Our theory is applied to a wide range of scenarios, including matrix pencils and quantum optimal transport.
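
The definition in the first sentence can be made concrete with a small numerical sketch: a point of the Gibbs manifold is $\exp(A_0 + x_1 A_1 + \dots + x_k A_k)$ for symmetric matrices $A_i$. The code below is an illustration only. The helper names, the choice of matrices, and the use of a scaled Taylor series for the matrix exponential are assumptions made here, and the defining polynomials of the Gibbs variety are not computed.

```python
def mat_mul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_exp(a, squarings=10, terms=20):
    """Matrix exponential via scaling and squaring with a truncated Taylor series."""
    n = len(a)
    scale = 2.0 ** squarings
    small = [[a[i][j] / scale for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term now holds small**k / k!
        term = [[x / k for x in row] for row in mat_mul(term, small)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        # Undo the scaling: exp(a) = exp(a / 2**s) ** (2**s).
        result = mat_mul(result, result)
    return result


def gibbs_point(a0, directions, coords):
    """exp(a0 + sum_i coords[i] * directions[i]) for symmetric input matrices."""
    n = len(a0)
    a = [
        [a0[i][j] + sum(c * d[i][j] for c, d in zip(coords, directions)) for j in range(n)]
        for i in range(n)
    ]
    return mat_exp(a)
```

For instance, `gibbs_point([[0, 1], [1, 0]], [[[1, 0], [0, -1]]], [0.5])` returns a symmetric positive definite $2\times 2$ matrix with determinant 1, since the exponent has trace zero.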

翻訳日:2023-01-17 14:59:45 公開日:2022-11-28

# アクティブボリューム:非ローカル接続の少ない効率的なフォールトトレラント量子コンピュータのためのアーキテクチャ

Active volume: An architecture for efficient fault-tolerant quantum computers with limited non-local connections ( http://arxiv.org/abs/2211.15465v1 )

ライセンス: Link先を確認

Daniel Litinski and Naomi Nickerson

(参考訳) 表面符号に基づくフォールトトレラント量子コンピュータの既存の汎用アーキテクチャでは、量子計算のコストは回路体積、すなわち非クリフォードゲート数で乗算された量子ビット数によって決定される。我々は,非2d-ローカル接続を用いたアーキテクチャを導入し,そのコストはキュービット数でスケールせず,論理演算数でのみスケールする。各論理演算は関連するアクティブボリュームを持ち、量子計算のコストを全ての演算のアクティブボリュームの和として定量化することができる。数千の論理量子ビットを持つ量子計算では、アクティブ体積は回路体積よりも桁違いに小さい。重要なことに、アーキテクチャはN論理量子ビット間の全接続を必要としない。代わりに、各論理キュービットは O(log N) の他のサイトと接続される。例えば、同じ数の論理量子ビットを用いることで、2048ビットのファクタリングアルゴリズムが、非ローカル接続のない汎用アーキテクチャよりも44倍高速に実行可能であることを示す。フォトニック量子ビットでは、長距離接続が可能であり、フォトニックコンポーネントが融合ベースのアクティブボリューム量子コンピュータの構築にどのように使われるかを示す。

In existing general-purpose architectures for surface-code-based fault-tolerant quantum computers, the cost of a quantum computation is determined by the circuit volume, i.e., the number of qubits multiplied by the number of non-Clifford gates. We introduce an architecture using non-2D-local connections in which the cost does not scale with the number of qubits, and instead only with the number of logical operations. Each logical operation has an associated active volume, such that the cost of a quantum computation can be quantified as a sum of active volumes of all operations. For quantum computations with thousands of logical qubits, the active volume can be orders of magnitude lower than the circuit volume. Importantly, the architecture does not require all-to-all connectivity between N logical qubits. Instead, each logical qubit is connected to O(log N) other sites. As an example, we show that, using the same number of logical qubits, a 2048-bit factoring algorithm can be executed 44 times faster than on a general-purpose architecture without non-local connections. With photonic qubits, long-range connections are available and we show how photonic components can be used to construct a fusion-based active-volume quantum computer.

翻訳日:2023-01-17 14:59:39 公開日:2022-11-28

# 鳥類コンパスのラジカルペアダイナミクスの量子シミュレーション

Quantum Simulation of the Radical Pair Dynamics of the Avian Compass ( http://arxiv.org/abs/2211.15427v1 )

ライセンス: Link先を確認

Yiteng Zhang, Zixuan Hu, Yuchen Wang, and Sabre Kais

(参考訳) 量子回路上でのオープン量子ダイナミクスのシミュレーションは、近年、様々な量子アルゴリズムの開発と実証によって、幅広い関心を集めている。これらのうち、ユニタリディレーションに基づく量子アルゴリズムの特定の設計は、一般および複雑な物理系をシミュレートすることができる。本稿では,この量子アルゴリズムを鳥のコンパスにおけるラジカル対機構のダイナミクスに応用する。このアプリケーションはIBM QASM量子シミュレータで実証される。この研究は、鳥のコンパスにおけるラジカルペア機構をシミュレートする量子アルゴリズムの最初の応用であり、これは量子アルゴリズムの一般化を実証するだけでなく、鳥類のコンパスを量子コンピューティングデバイスで研究する新たな機会を開く。

The simulation of open quantum dynamics on quantum circuits has recently attracted wide interest, with a variety of quantum algorithms developed and demonstrated. Among these, one particular design of a unitary-dilation-based quantum algorithm is capable of simulating general and complex physical systems. In this paper, we apply this quantum algorithm to simulating the dynamics of the radical pair mechanism in the avian compass. This application is demonstrated on the IBM QASM quantum simulator. This work is the first application of any quantum algorithm to simulating the radical pair mechanism in the avian compass, which not only demonstrates the generality of the quantum algorithm, but also opens new opportunities for studying the avian compass with quantum computing devices.

翻訳日:2023-01-17 14:59:20 公開日:2022-11-28

# 非エルミート量子系に対する半古典的フシミ分布

Semiclassical Husimi distributions for non-Hermitian quantum systems ( http://arxiv.org/abs/2211.15336v1 )

ライセンス: Link先を確認

Joseph Hall, Simon Malzard, and Eva-Maria Graefe

(参考訳) 非エルミート量子系におけるシュールベクトルの半古典位相空間密度を構築する。各schurベクトルは単一のプランクセルに関連付けられる。シュール状態は位相空間上の古典的ノルムの風景(非エルミート系の特徴である寿命の古典的表現)に従って組織される。この構成の一般性を示すために、混合的およびカオス的古典力学の条件下でのPT対称キックローターを非常に非自明な例に適用する。

We construct a semiclassical phase-space density of Schur vectors in non-Hermitian quantum systems. Each Schur vector is associated to a single Planck cell. The Schur states are organised according to a classical norm landscape on phase space - a classical manifestation of the lifetimes which are characteristic of non-Hermitian systems. To demonstrate the generality of this construction we apply it to a highly non-trivial example, a PT-symmetric kicked rotor in the regimes of mixed and chaotic classical dynamics.

翻訳日:2023-01-17 14:59:03 公開日:2022-11-28

# 3次元ブラックホールシミュレータによるAdS/CFT対応

AdS/CFT Correspondence with a 3D Black Hole Simulator ( http://arxiv.org/abs/2211.15305v1 )

ライセンス: Link先を確認

Aydin Deger and Jiannis K. Pachos

(参考訳) AdS/CFT対応は高エネルギー・凝縮物質物理学においても洞察に富んでいる。この対応の応用は、反ド・ジッター(AdS)ブラックホールの絡み合いエントロピーと低次元共形場理論(CFT)の双対性である。この対応を明確に示すために、3次元ブラックホール幾何がディラック場に与える影響を非均一トンネル結合を持つフェルミオンの正方格子を用いてシミュレートする。 3次元BTZブラックホール水平線をシミュレーションし、AdS空間の宇宙定数に依存する中心電荷を持つ対応する2次元CFTと一致する領域法挙動を数値的に得る。様々な3dブラックホールプロファイルの体系的な数値的研究は、全ての3dブラックホールが同じcftで表現できるエントロピーな振る舞いを与えることを示唆している。

The AdS/CFT correspondence has been insightful for high-energy and condensed matter physics alike. An application of this correspondence is the duality between the entanglement entropy of Anti-de Sitter (AdS) black holes and lower-dimensional conformal field theories (CFT). To explicitly demonstrate this correspondence we simulate the effect a 3D black hole geometry has on Dirac fields by employing a square lattice of fermions with inhomogeneous tunnelling couplings. Simulating a 3D BTZ black hole horizon, we numerically obtain an area law behaviour that is in agreement with the corresponding 2D CFT with a central charge that depends on the cosmological constant of the AdS space. A systematic numerical investigation of various 3D black hole profiles suggests that all 3D black holes give an entropic behaviour that can be represented by the same CFT.

翻訳日:2023-01-17 14:58:55 公開日:2022-11-28

# エージェントネットワークを利用した量子秘密集約

Quantum secret aggregation utilizing a network of agents ( http://arxiv.org/abs/2211.15758v1 )

ライセンス: Link先を確認

Michael Ampatzis and Theodore Andronikos

(参考訳) この研究では、スパイのネットワークが宇宙の異なる場所に分散されていること、そして各スパイが小さなが不完全な大秘密の一部を持っていると仮定すると、これらの部分的な秘密をスパイマスターに安全に送信し、大きな秘密を明らかにするために組み合わせることができるか、という課題について考察する。我々はこれを量子秘密集約問題と呼び、aliceがスパイマスターの役割を引き継いだ量子ゲームという形で、この問題に完全一般性で対処するプロトコルを提案する。我々のプロトコルは、アリスと彼女のスパイに対称に分布する最大絡み合ったghzタプルの使用に依存している。エージェントからスパイマスターへの小さな部分的な秘密の安全な伝達を可能にするのは、絡み合いの力である。追加のボーナスとして、アンタグルメントはプロトコルのセキュリティを保証し、悪名高い盗賊イヴが大きな秘密を盗むことは統計的に不可能である。

In this work we consider the following problem: given a network of spies, all distributed in different locations in space, and assuming that each spy possesses a small part of a big secret that is incomplete by itself, is it possible to securely transmit all these partial secrets to the spymaster, so that they can be combined in order to reveal the big secret? We refer to it as the Quantum Secret Aggregation problem, and we propose a protocol, in the form of a quantum game, with Alice taking over the role of the spymaster, that addresses this problem in complete generality. Our protocol relies on the use of maximally entangled GHZ tuples, which are symmetrically distributed among Alice and all her spies. It is the power of entanglement that makes possible the secure transmission of the small partial secrets from the agents to the spymaster. As an additional bonus, entanglement guarantees the security of the protocol, by making it statistically improbable for the notorious eavesdropper Eve to steal the big secret.

翻訳日:2023-01-17 14:52:43 公開日:2022-11-28

# 中性原子量子アーキテクチャにおける使用ベースマイグレーションによるランタイムオーバーヘッドの削減

Reducing Runtime Overhead via Use-Based Migration in Neutral Atom Quantum Architectures ( http://arxiv.org/abs/2211.15757v1 )

ライセンス: Link先を確認

Andrew Litteken (1), Jonathan M. Baker (1), Frederic T. Chong (1) ((1) University of Chicago)

(参考訳) 中性原子はスケーラブルな量子コンピューティングアーキテクチャにとって有望な選択である。長距離通信やネイティブマルチビットゲートといった特徴は、通信コストと運用回数の削減を提供する。しかし、量子ビットとして用いられる閉じ込められた原子は、計算過程や環境要因の悪さにより失われる。失われた計算キュービットの値は回復できず、配列の再ロードと計算の再実行が必要となり、回路の実行数が大幅に増加する。ソフトウェア緩和戦略は存在するが、元のマッピングされた回路の位置を緩やかに使い果たし、アーキテクチャ全体にクビットのクラスタを分散させ、成功の確率を低下させる。私たちは、すべての到達可能な量子ビットを見つける戦略を開発することによって、柔軟性を高めます。第二に、アーキテクチャを別々のセクションに分割し、失われた原子のない各セクションで回路を実行する。アーキテクチャが十分に大きい場合は、アーキテクチャ全体をリロードすることなく回路をリセットする。これにより、アーキテクチャの30%を利用する回路で再ロードする前に有効ショット数を2倍に増やすことができる。また、これらのセクションを使用して回路の実行を並列化し、30キュービットの回路で全体の実行時間を50%削減する。これらの手法は、失われた計算空間の有害な効果と戦うための動的な新しい戦略のセットに寄与する。

Neutral atoms are a promising choice for scalable quantum computing architectures. Features such as long-distance interactions and native multiqubit gates offer reductions in communication costs and operation count. However, the trapped atoms used as qubits can be lost over the course of computation and due to adverse environmental factors. The value of a lost computation qubit cannot be recovered, so the array must be reloaded and the computation rerun, greatly increasing the number of runs of a circuit. Software mitigation strategies exist, but they slowly exhaust the original mapped locations of the circuit and create more spread-out clusters of qubits across the architecture, decreasing the probability of success. We increase flexibility by developing strategies that find all reachable qubits, rather than only adjacent hardware qubits. Second, we divide the architecture into separate sections, and run the circuit in each section, free of lost atoms. Provided the architecture is large enough, this resets the circuit without having to reload the entire architecture. This increases the number of effective shots before reloading by a factor of two for a circuit that utilizes 30% of the architecture. We also explore using these sections to parallelize execution of circuits, reducing the overall runtime by a total of 50% for a 30-qubit circuit. These techniques contribute to a dynamic new set of strategies to combat the detrimental effects of lost computational space.

翻訳日:2023-01-17 14:52:23 公開日:2022-11-28

# Hu-Paz-Zhangマスター方程式の係数の解析的評価:オーミックスペクトル密度、零温度、整合性チェック

Analytical evaluation of the coefficients of the Hu-Paz-Zhang master equation: Ohmic spectral density, zero temperature, and consistency check ( http://arxiv.org/abs/2211.15722v1 )

ライセンス: Link先を確認

G. Homa, J. Z. Bernád, A. Csordás

(参考訳) ローレンツドロード型オーミックスペクトル密度を持つゼロ温度の量子高調波発振器に対するhu,paz,zhangの厳密なマスター方程式について検討した。このマスター方程式は量子ブラウン運動の研究において重要な役割を果たし、様々な応用において弱いカップリング極限のような近似の対象となる。本稿では,この非マルコフマスター方程式の係数をリンドブラッド形式を用いずに解析的に評価し,弱結合限界,定常密度作用素の正値,モデルのパラメータの境界などについて検討する。

We investigate the exact master equation of Hu, Paz, and Zhang for a quantum harmonic oscillator at zero temperature with a Lorentz-Drude type ohmic spectral density. This master equation plays an important role in the study of quantum Brownian motion and in various applications it is subject to approximations, like the weak coupling limit. In this paper, we give an analytical evaluation of the coefficients of this non-Markovian master equation without Lindblad form, which allows us to investigate consistencies of the weak coupling limit, the positivity of the stationary density operator, and the boundaries of the model's parameters.

翻訳日:2023-01-17 14:52:04 公開日:2022-11-28

# ライドバーグ配位原子を用いた光学格子中の数体アナログ量子シミュレーション

Few-body analogue quantum simulation with Rydberg-dressed atoms in optical lattices ( http://arxiv.org/abs/2211.15708v1 )

ライセンス: Link先を確認

Daniel Malz and J. Ignacio Cirac

Most experiments with ultracold atoms in optical lattices have contact interactions, and therefore operate at high densities of around one atom per site to observe the effect of strong interactions. Strong ranged interactions can be generated via Rydberg dressing, which opens the path to explore the physics of few interacting particles. Rather than the unit cells of a crystal, the sites of the optical lattice can now be interpreted as discretized space. This allows studying completely new types of problems in a familiar architecture. We investigate the possibility of realizing problems akin to those found in quantum chemistry, although with a different scaling law in the interactions. Through numerical simulation, we show that simple pseudo-atoms and pseudo-molecules could be prepared with high fidelity in state-of-the-art experiments.

Translated: 2023-01-17 14:51:53; published: 2022-11-28

# Quantum Gravitational Sensor for Space Debris

http://arxiv.org/abs/2211.15695v1

License: see the linked page

Meng-Zhi Wu, Marko Toroš, Sougato Bose, Anupam Mazumdar

Matter-wave interferometers have fundamental applications for gravity experiments such as testing the equivalence principle and the quantum nature of gravity. In addition, matter-wave interferometers can be used as quantum sensors to measure the local gravitational acceleration caused by external massive moving objects, thus lending themselves to technological applications. In this paper, we will establish a three-dimensional model to describe the gravity gradient signal from an external moving object, and theoretically investigate the achievable sensitivities using the matter-wave interferometer based on the Stern-Gerlach set-up. As an application we will consider the Mesoscopic Interference for Metric and Curvature (MIMAC) and Gravitational wave detection scheme [New J. Phys. 22, 083012 (2020)] and quantify its sensitivity to gravity gradients using frequency-space analysis. We will consider objects near Earth-based experiments and space debris in proximity of satellites and estimate the minimum detectable mass of the object as a function of its distance, velocity, and orientation. We will conclude by estimating the requirements to sense gravity gradients from asteroids, planetary motion and from outer solar system primordial black holes.

Translated: 2023-01-17 14:51:41; published: 2022-11-28

# Quantum diffeomorphisms cannot make indefinite causal order definite

http://arxiv.org/abs/2211.15685v1

License: see the linked page

Anne-Catherine de la Hamette, Viktoria Kabel, Marios Christodoulou, and Časlav Brukner

The study of indefinite causal order has seen rapid development, both theoretically and experimentally, in recent years. While classically the causal order of two timelike separated events A and B is fixed - either A before B or B before A - this is no longer true in quantum theory. There, it is possible to encounter superpositions of causal orders. In light of recent work on quantum reference frames, which reveals that the superposition of locations, momenta, and other properties can depend on the choice of reference frame or coordinate system, the question arises whether this also holds true for superpositions of causal orders. Here, we provide a negative answer to this question for quantum diffeomorphisms. First, we provide an unambiguous definition of causal order between two events in terms of worldline coincidences and the proper time of a third particle. Then, we show that superpositions of causal order defined as such cannot be rendered definite even through the most general class of coordinate transformations - quantum-controlled, independent diffeomorphisms in each branch. Finally, based on our results, we connect the information-theoretic and gravitational perspectives on indefinite causal order.

Translated: 2023-01-17 14:50:53; published: 2022-11-28

# Ancilla-free certification of unitary quantum processes

http://arxiv.org/abs/2211.15647v1

License: see the linked page

Wei Xie

We study efficient quantum certification algorithms for unitary quantum processes using no ancilla. A previous study showed that one can distinguish whether an unknown unitary $U$ is equal to or $\varepsilon$-far from a known or unknown unitary $V$ in fixed dimension with $O(\varepsilon^{-2})$ uses of the unitary, in which the Choi state is used and thus a high-dimensional ancilla system is always needed. We give an algorithm that distinguishes the two cases with $O(\varepsilon^{-1})$ uses of the unitary, using fewer or no ancilla, outperforming previous relevant results.

Translated: 2023-01-17 14:50:03; published: 2022-11-28

# Fast feedback control of mechanical motion using circuit optomechanics

http://arxiv.org/abs/2211.15645v1

License: see the linked page

Cheng Wang, Louise Banniard, Laure Mercier de Lépinay, and Mika A. Sillanpää

Measurement-based control, utilizing an active feedback loop, is a standard tool in technology. Feedback control is also emerging as a useful and fundamental tool in quantum technology and in related fundamental studies, where it can be used to prepare and stabilize pure quantum states in various quantum systems. Feedback-cooling of center-of-mass micromechanical oscillators, which typically exhibit a high thermal noise far above the quantum regime, has been particularly actively studied and has recently been shown to allow for ground-state cooling using optical measurements. Here, we realize measurement-based feedback operations in an electromechanical system, cooling the mechanical thermal noise down to 3 quanta, limited by added amplifier noise. Counter-intuitively, we also obtain significant cooling when the system is pumped at the blue optomechanical sideband, where the system is unstable without feedback.

Translated: 2023-01-17 14:49:47; published: 2022-11-28

# Electromagnetic Analogs of Quantum Mechanical Tunnelling

http://arxiv.org/abs/2211.16369v1

License: see the linked page

Jeanne Riga and Rebecca Seviour

In this paper, we introduce the theoretical framework underlying our proposed methodology of verification and validation (V&V) for quantum mechanical emission models using analogous macroscopic electromagnetic systems. We derive the correspondence between quantum mechanics and electromagnetism using the transfer matrix approach, and describe the electromagnetic analog that will be used to anchor the atomistic quantum tunneling simulations. Finally, we illustrate this correspondence by comparing the quantum mechanical and electromagnetic systems for some simple, analytically soluble examples and outline future V&V work based on the framework presented here.

Translated: 2023-01-17 14:43:18; published: 2022-11-28
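
As a rough illustration of the transfer-matrix approach named in the abstract above, the following Python sketch propagates (ψ, dψ/dx) across a single rectangular barrier and compares the resulting transmission with the textbook closed form. Nothing here is taken from the paper: the units (ħ = m = 1), the function names, and the sample parameters are assumptions made for illustration. For a normally incident wave in a layered non-magnetic dielectric, the same 2×2 propagation applies with the local wavenumber `q` replaced by nω/c.

```python
import cmath
import math


def barrier_matrix(energy, v0, width):
    """Propagate (psi, dpsi/dx) across a region of constant potential v0.

    Assumes energy != v0 (q would be zero and the matrix needs a limit).
    """
    q = cmath.sqrt(2.0 * (energy - v0))  # purely imaginary when energy < v0
    c, s = cmath.cos(q * width), cmath.sin(q * width)
    return ((c, s / q), (-q * s, c))


def transmission(energy, v0, width):
    """Transmission probability, with the same wavenumber k on both sides."""
    k = math.sqrt(2.0 * energy)
    (a, b), (c, d) = barrier_matrix(energy, v0, width)
    # Matching e^{ikx} + r e^{-ikx} on the left to t e^{ikx} on the right
    # gives |t| = |2ik / (ik(a + d) + k^2 b - c)|, using det M = 1.
    amplitude = 2j * k / (1j * k * (a + d) + k * k * b - c)
    return abs(amplitude) ** 2


def transmission_closed_form(energy, v0, width):
    """Textbook tunnelling formula, valid for 0 < energy < v0."""
    kappa = math.sqrt(2.0 * (v0 - energy))
    ratio = (v0 * math.sinh(kappa * width)) ** 2 / (4.0 * energy * (v0 - energy))
    return 1.0 / (1.0 + ratio)


if __name__ == "__main__":
    print(transmission(0.5, 1.0, 2.0), transmission_closed_form(0.5, 1.0, 2.0))
```

Because `cmath` handles the imaginary wavenumber inside the barrier, the same function covers both the tunnelling case (energy below `v0`) and the above-barrier case; a stack of layers would be handled by multiplying the individual matrices.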

# Quantum Repeater using Two-Mode Squeezed States and Atomic Noiseless Amplifiers

http://arxiv.org/abs/2211.16343v1

License: see the linked page

Anders J. E. Bjerrum and Jonatan B. Brask and Jonas S. Neergaard-Nielsen and Ulrik L. Andersen

We perform a theoretical investigation into how a two-mode squeezed vacuum state that has undergone photon loss can be stored and purified using noiseless amplification with a collection of solid-state qubits. The proposed method may be used to probabilistically increase the entanglement between the two parties sharing the state. The proposed amplification step is similar in structure to a set of quantum scissors. However, in this work the amplification step is realized by a state transfer from an optical mode to a set of solid-state qubits, which act as a quantum memory. We explore two different applications: the generation of entangled many-qubit registers, and the construction of quantum repeaters for long-distance quantum key distribution.

Translated: 2023-01-17 14:43:01; published: 2022-11-28

# Recursive Quantum Approximate Optimization Algorithm for the MAX-CUT problem on Complete graphs

http://arxiv.org/abs/2211.15832v1

License: see the linked page

Eunok Bae and Soojoon Lee

Quantum approximate optimization algorithms are hybrid quantum-classical variational algorithms designed to approximately solve combinatorial optimization problems such as the MAX-CUT problem. In spite of their potential for near-term quantum applications, quantum approximate optimization algorithms are known to have limitations on certain instances of the MAX-CUT problem at any constant level $p$. Recently, the recursive quantum approximate optimization algorithm, which is a non-local version of the quantum approximate optimization algorithm, has been proposed to overcome these limitations. However, it has been shown mostly by numerical evidence that the recursive quantum approximate optimization algorithm outperforms the original quantum approximate optimization algorithm for specific instances. In this work, we analytically prove that the recursive quantum approximate optimization algorithm is more competitive than the original one to solve the MAX-CUT problem for complete graphs with respect to the approximation ratio.

Translated: 2023-01-17 14:42:48; published: 2022-11-28
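
To make the approximation ratio in the abstract above concrete: on the complete graph K_n with unit edge weights, the optimal cut is ⌊n/2⌋·⌈n/2⌉, which a brute-force search confirms for small n. The sketch below is purely classical and does not implement QAOA or its recursive variant; the function names are illustrative assumptions.

```python
from itertools import combinations, product


def cut_value(assignment, edges):
    """Number of edges whose endpoints fall on opposite sides of the cut."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])


def max_cut_complete(n):
    """Brute-force MAX-CUT of K_n (exponential in n, so small n only)."""
    edges = list(combinations(range(n), 2))
    return max(cut_value(bits, edges) for bits in product((0, 1), repeat=n))


def approximation_ratio(assignment, n):
    """Cut achieved by `assignment` divided by the optimal cut of K_n."""
    edges = list(combinations(range(n), 2))
    optimum = (n // 2) * ((n + 1) // 2)
    return cut_value(assignment, edges) / optimum
```

An algorithm's approximation ratio on K_n is then the value returned by `approximation_ratio` for the bit string it outputs (or the expectation of that value over its output distribution).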

# Y-cube model and fractal structure of subdimensional particles on hyperbolic lattices

http://arxiv.org/abs/2211.15829v1

License: see the linked page

Han Yan, Kevin Slagle, Andriy H. Nevidomskyy

Unlike ordinary topological quantum phases, fracton orders are intimately dependent on the underlying lattice geometry. In this work, we study a generalization of the X-cube model, dubbed the Y-cube model, on lattices embedded in $H_2\times S^1$ space, i.e., a stack of hyperbolic planes. The name 'Y-cube' comes from the Y-shape of the analog of the X-cube's X-shaped vertex operator. We demonstrate that for certain hyperbolic lattice tessellations, the Y-cube model hosts a new kind of subdimensional particle, treeons, which can only move on a fractal-shaped subset of the lattice. Such an excitation only appears on hyperbolic geometries; on flat spaces a treeon becomes either a lineon or a planeon. Intriguingly, we find that for certain hyperbolic tessellations, a fracton can be created by a membrane operator (as in the X-cube model) or by a fractal-shaped operator within the hyperbolic plane.

Translated: 2023-01-17 14:42:34; published: 2022-11-28

# Surface Acoustic Wave Cavity Optomechanics with WSe$_2$ Single Photon Emitters

http://arxiv.org/abs/2211.15811v1

License: see the linked page

Sahil D. Patel, Kamyar Parto, Michael Choquer, Sammy Umezawa, Landon Hellman, Daniella Polishchuk, Galan Moody

Surface acoustic waves (SAWs) are a versatile tool for coherently interfacing with a variety of solid-state quantum systems spanning microwave to optical frequencies, including superconducting qubits, spins, and quantum emitters. Here, we demonstrate SAW cavity optomechanics with quantum emitters in 2D materials, specifically monolayer WSe$_2$, on a planar lithium niobate SAW resonator driven by superconducting electronics. Using steady-state photoluminescence spectroscopy and time-resolved single-photon counting, we map the temporal dynamics of modulated 2D emitters under coupling to different SAW cavity modes, showing energy-level splitting consistent with deformation potential coupling of 30 meV/%. We leverage the large anisotropic strain from the SAW to modulate the excitonic fine-structure splitting on a nanosecond timescale, which may find applications for on-demand entangled photon-pair generation from 2D materials. Cavity optomechanics with SAWs and 2D quantum emitters provides opportunities for compact sensors and quantum electro-optomechanics in a multi-functional integrated platform that combines phononic, optical, and superconducting electronic quantum systems.

Translated: 2023-01-17 14:42:14; published: 2022-11-28

# Continuous wave multi-pass imaging flow cytometry

http://arxiv.org/abs/2211.15791v1

License: see the linked page

Yonatan Israel, Joshua L. Reynolds, Brannon B. Klopfer, Mark A. Kasevich

We present a wide-field multi-pass implementation of label-free imaging flow cytometry. Our technique is shown for high-speed flow imaging of ensembles of human red blood cells with up to four passes, demonstrating a 4x enhancement in contrast and signal-to-noise. We show that our technique comes close to the quantum limit of measurement sensitivity, extending the range of optimal imaging to samples in the weakly absorbing regime. This allows for near-optimal imaging sensitivity and throughput in a practical scenario of imaging a dynamic sample under limited illumination intensity, surpassing the sensitivity achieved with currently available quantum light sources.

Translated: 2023-01-17 14:41:52; published: 2022-11-28

# Atomic shell structure from an orbital-free-related density-functional-theory Pauli potential

http://arxiv.org/abs/2211.15764v1

License: see the linked page

Russell B. Thompson

Polymer self-consistent field theory techniques are used to find radial electron densities and total binding energies for isolated atoms. Quantum particles are modelled as Gaussian threads with ring-polymer architecture in a four-dimensional thermal-space, and a Pauli potential is postulated based on classical excluded volume implemented in the thermal-space using Edwards/Flory-Huggins interactions in a mean-field approximation. Other approximations include a Fermi-Amaldi correction for electron-electron self-interactions, a spherical averaging approximation to reduce the dimensionality of the problem, and the neglect of correlations. Polymer scaling theory is used to show that the excluded volume form of Pauli potential reduces to the known Thomas-Fermi energy density in the uniform limit. Self-consistent equations are solved using a bilinear Fourier expansion, with radial basis functions, for the first eighteen elements of the periodic table. Radial electron densities show correct shell structure, and the errors on the total binding energies compared to known binding energies are less than 9% for the lightest elements and drop to 3% or less for atoms heavier than nitrogen. More generally, it is suggested that only two postulates are needed within classical statistical mechanics to achieve equivalency of predictions with static, non-relativistic quantum mechanics: First, quantum particles are modelled as Gaussian threads in four-dimensional thermal-space and, second, pairs of threads (allowing for spin) are subject to classical excluded volume in the thermal-space. It is shown that these two postulates in thermal-space become the same as the Heisenberg uncertainty principle and the Pauli exclusion principle in three-dimensional space.

Translated: 2023-01-17 14:41:30; published: 2022-11-28

# Obtaining a Single-Photon Weak Value from Experiments using a Strong (Many-Photon) Coherent State

http://arxiv.org/abs/2211.15761v1

License: see the linked page

Howard M. Wiseman, Aephraim M. Steinberg, Matin Hallaji

A common type of weak-value experiment prepares a single particle in one state, weakly measures the occupation number of another state, and post-selects on finding the particle in a third state (a 'click'). Most weak-value experiments have been done with photons, but the heralded preparation of a single photon is difficult and has a low rate. Here we show that the weak value mentioned above can be measured using strong (many-photon) coherent states, while still needing only a 'click' detector such as an avalanche photodiode. One simply subtracts the no-click weak value from the click weak value, and scales the answer by a simple function of the click probability.

Translated: 2023-01-17 14:40:58; published: 2022-11-28

# Is it Required? Ranking the Skills Required for a Job-Title

http://arxiv.org/abs/2212.08553v1

License: see the linked page

Sarthak Anand, Jens-Joris Decorte, Niels Lowie

In this paper, we describe our method for ranking the skills required for a given job title. Our analysis shows that important/relevant skills appear more frequently in similar job titles. We train a Language-agnostic BERT Sentence Encoder (LaBSE) model to predict the importance of the skills using weak supervision. We show that the model can learn the importance of skills and performs well in other languages. Furthermore, we show how the Inverse Document Frequency factor of a skill boosts specialised skills.

Translated: 2022-12-25 03:21:39; published: 2022-11-28
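
The Inverse Document Frequency idea mentioned in the abstract above can be sketched in a few lines of Python. This is not the authors' method (they train a LaBSE model with weak supervision); it only illustrates, under the assumption that each job title is treated as one document, how IDF weighting pushes skills that occur in few titles ahead of skills that occur everywhere. The toy data and function names are invented for illustration.

```python
import math
from collections import Counter


def skill_idf(skills_by_title):
    """IDF of each skill, treating every job title as one document."""
    n_titles = len(skills_by_title)
    doc_freq = Counter(
        skill for skills in skills_by_title.values() for skill in set(skills)
    )
    return {skill: math.log(n_titles / df) for skill, df in doc_freq.items()}


def rank_skills(title, skills_by_title):
    """Rank a title's skills by (count within the title) x IDF, highest first."""
    idf = skill_idf(skills_by_title)
    counts = Counter(skills_by_title[title])
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(counts, key=lambda s: (-counts[s] * idf[s], s))


# Toy data invented for illustration only.
TOY = {
    "data scientist": ["python", "statistics", "communication", "python"],
    "backend engineer": ["python", "databases", "communication"],
    "nurse": ["patient care", "communication"],
}
```

In this toy data "communication" appears under every title, so its IDF is zero and it ranks last for each title, while "statistics", which appears under only one title, is ranked first for "data scientist".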

# Accelerating Antimicrobial Peptide Discovery with Latent Sequence-Structure Model

http://arxiv.org/abs/2212.09450v1

License: see the linked page

Danqing Wang, Zeyu Wen, Fei Ye, Hao Zhou, Lei Li

Antimicrobial peptide (AMP) is a promising therapy in the treatment of broad-spectrum antibiotics and drug-resistant infections. Recently, an increasing number of researchers have been introducing deep generative models to accelerate AMP discovery. However, current studies mainly focus on sequence attributes and ignore structure information, which is important in AMP biological functions. In this paper, we propose a latent sequence-structure model for AMPs (LSSAMP) with multi-scale VQ-VAE to incorporate secondary structures. By sampling in the latent space, LSSAMP can simultaneously generate peptides with ideal sequence attributes and secondary structures. Experimental results show that the peptides generated by LSSAMP have a high probability of AMP, and two of the 21 candidates have been verified to have good antimicrobial activity. Our model will be released to help create high-quality AMP candidates for follow-up biological experiments and accelerate the whole AMP discovery.

Translated: 2022-12-25 03:20:41; published: 2022-11-28

# Syn-QG: Syntactic and Shallow Semantic Rules for Question Generation

http://arxiv.org/abs/2004.08694v5

License: see the linked page

Kaustubh D. Dhole and Christopher D. Manning

Question Generation (QG) is fundamentally a simple syntactic transformation; however, many aspects of semantics influence what questions are good to form. We implement this observation by developing SynQG, a set of transparent syntactic rules leveraging universal dependencies, shallow semantic parsing, lexical resources, and custom rules which transform declarative sentences into question-answer pairs. We utilize PropBank argument descriptions and VerbNet state predicates to incorporate shallow semantic content, which helps generate questions of a descriptive nature and produce inferential and semantically richer questions than existing systems. In order to improve syntactic fluency and eliminate grammatically incorrect questions, we employ back-translation over the output of these syntactic rules. A set of crowd-sourced evaluations shows that our system can generate a larger number of highly grammatical and relevant questions than previous QG systems and that back-translation drastically improves grammaticality at a slight cost of generating irrelevant questions.

Translated: 2022-12-12 05:00:04; published: 2022-11-28

# The AI Definition and a Program Which Satisfies this Definition

http://arxiv.org/abs/2212.03184v1

License: see the linked page

Dimiter Dobrev

We will consider all policies of the agent and will prove that one of them is the best performing policy. While that policy is not computable, computable policies do exist in its proximity. We will define AI as a computable policy which is sufficiently proximal to the best performing policy. Before we can define the agent's best performing policy, we need a language for description of the world. We will also use this language to develop a program which satisfies the AI definition. The program will first understand the world by describing it in the selected language. The program will then use the description in order to predict the future and select the best possible move. While this program is extremely inefficient and practically unusable, it can be improved by refining both the language for description of the world and the algorithm used to predict the future. This can yield a program which is both efficient and consistent with the AI definition.

Translated: 2022-12-11 12:52:01; published: 2022-11-28

# On the Effective Usage of Priors in RSS-based Localization

http://arxiv.org/abs/2212.00728v1

License: see the linked page

Çağkan Yapar, Fabian Jaensch, Ron Levie, Giuseppe Caire

In this paper, we study the localization problem in dense urban settings. In such environments, Global Navigation Satellite Systems fail to provide good accuracy because obstacles such as buildings make line-of-sight (LOS) links between the receiver (Rx) to be located and the satellites unlikely. Thus, one has to resort to other technologies, which can reliably operate under non-line-of-sight (NLOS) conditions. Recently, we proposed a Received Signal Strength (RSS) fingerprint and convolutional neural network-based algorithm, LocUNet, and demonstrated its state-of-the-art localization performance with respect to the widely adopted k-nearest neighbors (kNN) algorithm, and to state-of-the-art time of arrival (ToA) ranging-based methods. In the current work, we first recognize LocUNet's ability to learn the underlying prior distribution of the Rx position or Rx and transmitter (Tx) association preferences from the training data, and attribute its high performance to these. Conversely, we demonstrate that classical methods based on a probabilistic approach can greatly benefit from an appropriate incorporation of such prior information. Our studies also numerically prove LocUNet's close-to-optimal performance in many settings, by comparing it with the theoretically optimal formulations.

Translated: 2022-12-02 17:57:33; published: 2022-11-28
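
The kNN fingerprinting baseline named in the abstract above, and the idea of feeding prior knowledge of the receiver position into a classical estimator, can be sketched as follows. This is a hypothetical minimal version, not the paper's formulation or LocUNet: the function name, the way the prior enters (as a weight on each neighbour's position), and the toy fingerprints are all assumptions made for illustration.

```python
import math


def knn_localize(rss, fingerprints, k=3, prior=None):
    """Estimate a 2-D position from an RSS vector with kNN fingerprinting.

    fingerprints: list of (rss_vector, (x, y)) pairs measured at known positions.
    prior: optional function (x, y) -> non-negative weight; uniform if omitted.
    """
    nearest = sorted(fingerprints, key=lambda fp: math.dist(rss, fp[0]))[:k]
    weights = [1.0 if prior is None else prior(*pos) for _, pos in nearest]
    total = sum(weights)
    if total == 0:
        raise ValueError("prior assigns zero weight to all k nearest fingerprints")
    x = sum(w * pos[0] for w, (_, pos) in zip(weights, nearest)) / total
    y = sum(w * pos[1] for w, (_, pos) in zip(weights, nearest)) / total
    return x, y


# Toy fingerprints (RSS in dBm from two transmitters), invented for illustration.
FINGERPRINTS = [
    ((-50.0, -70.0), (0.0, 0.0)),
    ((-52.0, -68.0), (2.0, 0.0)),
    ((-70.0, -50.0), (10.0, 10.0)),
    ((-90.0, -90.0), (20.0, 20.0)),
]
```

With a uniform prior, `knn_localize((-51.0, -69.0), FINGERPRINTS, k=2)` averages the two closest reference positions; passing a `prior` that rules out part of the map (for example, positions inside buildings) shifts the estimate toward the positions the prior allows.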

# Adversarial Artifact Detection in EEG-Based Brain-Computer Interfaces

http://arxiv.org/abs/2212.00727v1

License: see the linked page

Xiaoqing Chen and Dongrui Wu

Machine learning has achieved great success in electroencephalogram (EEG) based brain-computer interfaces (BCIs). Most existing BCI research focused on improving its accuracy, but few had considered its security. Recent studies, however, have shown that EEG-based BCIs are vulnerable to adversarial attacks, where small perturbations added to the input can cause misclassification. Detection of adversarial examples is crucial to both the understanding of this phenomenon and the defense. This paper, for the first time, explores adversarial detection in EEG-based BCIs. Experiments on two EEG datasets using three convolutional neural networks were performed to verify the performances of multiple detection approaches. We showed that both white-box and black-box attacks can be detected, and the former are easier to detect.

Translated: 2022-12-02 15:36:51; published: 2022-11-28

# Reinforced Genetic Algorithm for Structure-based Drug Design

http://arxiv.org/abs/2211.16508v1

License: see the linked page

Tianfan Fu, Wenhao Gao, Connor W. Coley, Jimeng Sun

Structure-based drug design (SBDD) aims to discover drug candidates by finding molecules (ligands) that bind tightly to a disease-related protein (target), which is the primary approach to computer-aided drug discovery. Recently, applying deep generative models for three-dimensional (3D) molecular design conditioned on protein pockets to solve SBDD has attracted much attention, but their formulation as probabilistic modeling often leads to unsatisfactory optimization performance. On the other hand, traditional combinatorial optimization methods such as genetic algorithms (GA) have demonstrated state-of-the-art performance in various molecular optimization tasks. However, they do not utilize protein target structure to inform design steps but rely on a random-walk-like exploration, which leads to unstable performance and no knowledge transfer between different tasks despite the similar binding physics. To achieve a more stable and efficient SBDD, we propose Reinforced Genetic Algorithm (RGA) that uses neural models to prioritize the profitable design steps and suppress random-walk behavior. The neural models take the 3D structure of the targets and ligands as inputs and are pre-trained using native complex structures to utilize the knowledge of the shared binding physics from different targets and then fine-tuned during optimization. We conduct thorough empirical studies on optimizing binding affinity to various disease targets and show that RGA outperforms the baselines in terms of docking scores and is more robust to random initializations. The ablation study also indicates that the training on different targets helps improve performance by leveraging the shared underlying physics of the binding processes. The code is available at https://github.com/futianfan/reinforced-genetic-algorithm.

Translated: 2022-12-01 18:10:41; published: 2022-11-28

# PACによる統計的アルゴリズムの検証

PAC Verification of Statistical Algorithms ( http://arxiv.org/abs/2211.17096v1 )

ライセンス: Link先を確認

Saachi Mutreja, Jonathan Shafer

(参考訳) Goldwasser et al. (2021) は最近、PAC検証の設定を提案した。そこでは、非依存的な(agnostic)PAC学習目標を満たすとされる仮説(機械学習モデル)を対話的証明を用いて検証する。本稿では,この概念をさらに様々な方法で展開する。まず、VC 次元 $d$ の仮説クラスに対するPAC 検証について、$\Omega(\sqrt{d})$ 個の i.i.d. サンプルという下界を証明する。第二に、$\mathbb{R}$ 上の区間の和集合のPAC検証について、彼らの提案したプロトコルを改善し、我々の下界に一致するプロトコルを提案する。第3に,彼らの定義の自然な一般化を一般統計アルゴリズムの検証に導入する。これは非依存的PAC学習を超えた、より広範な実用的アルゴリズムに適用できる。提案した定義を示す例として,我々の最終結果は,クエリに関する組合せ制約を満たす統計的クエリアルゴリズムの検証のためのプロトコルである。

Goldwasser et al. (2021) recently proposed the setting of PAC verification, where a hypothesis (machine learning model) that purportedly satisfies the agnostic PAC learning objective is verified using an interactive proof. In this paper we develop this notion further in a number of ways. First, we prove a lower bound for PAC verification of $\Omega(\sqrt{d})$ i.i.d. samples for hypothesis classes of VC dimension $d$. Second, we present a protocol for PAC verification of unions of intervals over $\mathbb{R}$ that improves upon their proposed protocol for that task, and matches our lower bound. Third, we introduce a natural generalization of their definition to verification of general statistical algorithms, which is applicable to a wider variety of practical algorithms beyond agnostic PAC learning. Showcasing our proposed definition, our final result is a protocol for the verification of statistical query algorithms that satisfy a combinatorial constraint on their queries.

翻訳日:2022-12-01 16:05:26 公開日:2022-11-28

# マルチタスク学習のためのセミソフトタスククラスタリング

Semisoft Task Clustering for Multi-Task Learning ( http://arxiv.org/abs/2211.17204v1 )

ライセンス: Link先を確認

Yuzhao Zhang, Yifan Sun

(参考訳) マルチタスク学習(MTL)は、複数の関連する予測タスクの性能を向上させることを目的としている。未知の係数を著しく低減する柔軟性と能力のため、タスククラスタリングに基づくMTLアプローチは注目されている。そこで我々は,データの半ソフトクラスタリングの考え方に動機づけられ,純粋タスクと混合タスクの両方のタスククラスタ構造を同時に明らかにし,関連する機能を選択する半ソフトタスククラスタリング手法を提案する。このアプローチの背後にある主な前提は、各クラスタにはいくつかの純粋なタスクがあり、それぞれの混合タスクは異なるクラスタ内の純粋なタスクの線形結合によって表現できるということです。結果として生じる非凸制約最適化問題を解決するために,効率的な3ステップアルゴリズムを設計する。合成および実世界のデータセットに基づく実験結果は,提案手法の有効性と有効性を検証する。最後に,提案手法をロバストなタスククラスタリング問題に拡張する。

Multi-task learning (MTL) aims to improve the performance of multiple related prediction tasks by leveraging useful information from them. Due to their flexibility and ability to reduce unknown coefficients substantially, the task-clustering-based MTL approaches have attracted considerable attention. Motivated by the idea of semisoft clustering of data, we propose a semisoft task clustering approach, which can simultaneously reveal the task cluster structure for both pure and mixed tasks as well as select the relevant features. The main assumption behind our approach is that each cluster has some pure tasks, and each mixed task can be represented by a linear combination of pure tasks in different clusters. To solve the resulting non-convex constrained optimization problem, we design an efficient three-step algorithm. The experimental results based on synthetic and real-world datasets validate the effectiveness and efficiency of the proposed approach. Finally, we extend the proposed approach to a robust task clustering problem.
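The main assumption above, that each mixed task is a linear combination of pure tasks from different clusters, can be illustrated with a minimal sketch. The function name `mix_tasks`, the plain-list representation of coefficient vectors, and the example weights are hypothetical and chosen only for illustration; this is not the paper's three-step algorithm.

```python
def mix_tasks(pure_coefficients, weights):
    """Combine pure-task coefficient vectors into one mixed-task vector.

    pure_coefficients: list of equal-length coefficient vectors, one per
        pure task (hypothetical representation).
    weights: one mixing weight per pure task.
    """
    if len(pure_coefficients) != len(weights):
        raise ValueError("need exactly one weight per pure task")
    dim = len(pure_coefficients[0])
    # Each coordinate of the mixed task is the weighted sum of the
    # corresponding coordinates of the pure tasks.
    return [
        sum(w * coef[j] for w, coef in zip(weights, pure_coefficients))
        for j in range(dim)
    ]


if __name__ == "__main__":
    pure = [[1.0, 0.0], [0.0, 2.0]]      # two pure tasks, two features
    print(mix_tasks(pure, [0.25, 0.75]))  # a task mixing both clusters
```

A pure task corresponds to a weight vector with a single 1 and zeros elsewhere, which returns that cluster's coefficients unchanged.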

翻訳日:2022-12-01 16:05:10 公開日:2022-11-28

# 音声認識と名前付きエンティティ認識を用いた顧客会話からのキーエンティティの扱いと抽出

Handling and extracting key entities from customer conversations using Speech recognition and Named Entity recognition ( http://arxiv.org/abs/2211.17107v1 )

ライセンス: Link先を確認

Sharvi Endait, Ruturaj Ghatage, Prof. DD Kadam

(参考訳) eコマースが急速に発展する現代のテクノロジー時代において、顧客の要求や詳細をビジネス上の会話から理解することが非常に重要である。これは顧客の維持と満足にとって非常に重要です。これらの会話から重要な洞察を抽出することは、製品の開発や問題解決において非常に重要です。顧客のフィードバック、反応、製品の重要な詳細を理解することは不可欠であり、名前付きエンティティ認識(NER)を使用して行われる。エンティティを抽出するために、最適な音声-テキストモデルを用いて会話をテキストに変換する。このモデルは、会話をテキストに変換する2段階のネットワークである。そして、NER BERTトランスモデルを用いて、ロバストな手法を用いて適切なエンティティを抽出する。これによって、彼らが直面している問題が発生した場合、顧客エクスペリエンスの充実に役立つでしょう。顧客が問題に直面したら、電話して苦情を登録します。モデルがこの会話から重要な特徴を抽出し、問題を調べるのに必要となる。これらの機能には、注文番号や正確な問題などの詳細が含まれている。これらすべては会話から直接抽出され、また会話を行う労力を減らすことになる。

In this modern era of technology with e-commerce developing at a rapid pace, it is very important to understand customer requirements and details from a business conversation. It is very crucial for customer retention and satisfaction. Extracting key insights from these conversations is very important when it comes to developing their product or solving their issue. Understanding customer feedback, responses, and important details of the product are essential and it would be done using Named entity recognition (NER). For extracting the entities we would be converting the conversations to text using the optimal speech-to-text model. The model would be a two-stage network in which the conversation is converted to text. Then, suitable entities are extracted using robust techniques using a NER BERT transformer model. This will aid in the enrichment of customer experience when there is an issue which is faced by them. If a customer faces a problem he will call and register his complaint. The model will then extract the key features from this conversation which will be necessary to look into the problem. These features would include details like the order number, and the exact problem. All these would be extracted directly from the conversation and this would reduce the effort of going through the conversation again.

翻訳日:2022-12-01 15:47:21 公開日:2022-11-28

# スコアベース拡散モデルにおける判別器指導による精錬生成過程

Refining Generative Process with Discriminator Guidance in Score-based Diffusion Models ( http://arxiv.org/abs/2211.17091v1 )

ライセンス: Link先を確認

Dongjun Kim, Yeongmin Kim, Wanmo Kang, Il-Chul Moon

(参考訳) 拡散モデルの成功は様々な領域で目撃されているが、生成過程の変動についての研究はごくわずかである。本稿では、スコアチェックポイントが同じであれば、元の生成プロセスよりも逆プロセスに近い新しい生成プロセスを提案する。具体的には、生成過程を実データと生成データとの間の補助判別器で調整する。これにより、判別器による調整された生成プロセスは、元のプロセスよりも現実的なサンプルを生成する。実験では,CIFAR-10では1.74,CelebAでは1.33,FFHQでは1.88の新たなSOTA FIDが得られた。

While the success of diffusion models has been witnessed in various domains, only a few works have investigated the variation of the generative process. In this paper, we introduce a new generative process that is closer to the reverse process than the original generative process, given the identical score checkpoint. Specifically, we adjust the generative process with the auxiliary discriminator between the real data and the generated data. Consequently, the adjusted generative process with the discriminator generates more realistic samples than the original process. In experiments, we achieve new SOTA FIDs of 1.74 on CIFAR-10, 1.33 on CelebA, and 1.88 on FFHQ in the unconditional generation.

翻訳日:2022-12-01 15:27:27 公開日:2022-11-28

# 線形関数近似を用いたリーダフォロワーMDPにおけるモデルフリーRLの確率論的評価

Provably Efficient Model-free RL in Leader-Follower MDP with Linear Function Approximation ( http://arxiv.org/abs/2211.15792v1 )

ライセンス: Link先を確認

Arnob Ghosh

(参考訳) エピソードの各ステップでエージェント(リーダー)が行動し、次に別のエージェント(フォロワー)が続くマルチエージェント・エピソードMDPのセットアップを考える。状態の進化と報酬は、リーダーとフォロワーの合同行動ペアに依存する。このようなインタラクションは、スマートグリッド、メカニズム設計、セキュリティ、ポリシー作成など、多くの分野のアプリケーションを見つけることができる。バンディットフィードバック設定の下で、証明可能なパフォーマンス保証を持つ両プレイヤーのポリシーを学ぶ方法に興味があります。我々は,リーダとフォロワーの両方が非ミオピック(non-myopic)である,すなわち,エピソード全体を通じて報酬を最大化しようとする設定に焦点を当て,多くのRLアプリケーションで非常に一般的な連続的な状態空間をモデル化可能な線形MDPを検討する。我々は、モデルフリーのRLアルゴリズムを提案し、リーダーとフォロワーの両方に対して $\tilde{\mathcal{O}}(\sqrt{d^3H^3T})$ のリグレット境界が達成できることを示す。ここで $d$ は特徴マッピングの次元、$H$ はエピソードの長さ、$T$ はバンディットフィードバック情報の設定下のステップの総数である。したがって、状態の数が無限になった場合でも結果が成り立つ。このアルゴリズムはLSVI-UCBアルゴリズムの新しい適応に依存している。具体的には、(最良応答としての)標準の貪欲方策を、リーダーとフォロワーの両方についてソフトマックス方策に置き換える。これは値関数に対して一様濃度境界を確立する上で鍵となる。我々の知る限りでは、これは関数近似を持つ非ミオピックフォロワを持つマルコフゲームに対する最初の劣線形リグレット境界の保証である。

We consider a multi-agent episodic MDP setup where an agent (leader) takes an action at each step of the episode, followed by another agent (follower). The state evolution and rewards depend on the joint action pair of the leader and the follower. Such interactions find applications in many domains such as smart grids, mechanism design, security, and policymaking. We are interested in how to learn policies for both players with provable performance guarantees under a bandit feedback setting. We focus on a setup where both the leader and the follower are non-myopic, i.e., they both seek to maximize their rewards over the entire episode, and consider a linear MDP, which can model the continuous state spaces that are common in many RL applications. We propose a model-free RL algorithm and show that $\tilde{\mathcal{O}}(\sqrt{d^3H^3T})$ regret bounds can be achieved for both the leader and the follower, where $d$ is the dimension of the feature mapping, $H$ is the length of the episode, and $T$ is the total number of steps under the bandit feedback information setup. Thus, our result holds even when the number of states becomes infinite. The algorithm relies on a novel adaptation of the LSVI-UCB algorithm. Specifically, we replace the standard greedy policy (as the best response) with the soft-max policy for both the leader and the follower. This turns out to be key in establishing a uniform concentration bound for the value functions. To the best of our knowledge, this is the first sub-linear regret bound guarantee for Markov games with non-myopic followers with function approximation.
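The replacement of a greedy policy by a soft-max policy can be sketched generically as follows. This is only an illustration of the two policy types over a list of estimated action values; the function names and the `temperature` parameter are assumptions made here, and nothing below reproduces the paper's LSVI-UCB adaptation or its regret analysis.

```python
import math


def greedy_policy(q_values):
    """Put all probability on the action with the largest estimated value."""
    best = max(range(len(q_values)), key=lambda a: q_values[a])
    return [1.0 if a == best else 0.0 for a in range(len(q_values))]


def softmax_policy(q_values, temperature=1.0):
    """Return action probabilities proportional to exp(Q / temperature).

    `temperature` is a hypothetical knob: small values approach the
    greedy policy, large values approach the uniform policy.
    """
    m = max(q_values)  # subtract the maximum for numerical stability
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    q = [1.0, 3.0, 2.0]
    print(greedy_policy(q))
    print(softmax_policy(q))
```

Unlike the greedy choice, the soft-max probabilities change smoothly when the value estimates change slightly, which is the kind of property a uniform concentration argument can exploit.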

翻訳日:2022-11-30 18:16:39 公開日:2022-11-28

# クラスター化による新旧代謝産物の予測経路

Predicting pathways for old and new metabolites through clustering ( http://arxiv.org/abs/2211.15720v1 )

ライセンス: Link先を確認

Thiru Siddharth, Nathan Lewis

(参考訳) 多様な代謝経路は、エネルギーを収穫し、バイオマス成分を合成し、微小環境と相互作用する分子を生産し、毒素を中和する、すべての生物にとって基本である。新しい代謝産物や経路の発見が続いているが、新しい代謝産物の経路の予測は困難である。新しい代謝産物の経路を解明するのに膨大な時間を要するため、HMDBによると代謝産物の60%しか経路に割り当てられていない。本稿では,代謝物構造に基づく経路同定手法を提案する。 SMILESアノテーションから201の特徴を抽出し,PubMed抽象とHMDBから新たな代謝産物を同定した。両特徴群にクラスタリングアルゴリズムを適用した結果,代謝産物間の相関を定量化し,既知の代謝産物の92%をそれぞれの経路に正確に関連付けた。したがって、このアプローチは新しい代謝産物の代謝経路を予測するのに有用である。

The diverse metabolic pathways are fundamental to all living organisms, as they harvest energy, synthesize biomass components, produce molecules to interact with the microenvironment, and neutralize toxins. While discovery of new metabolites and pathways continues, the prediction of pathways for new metabolites can be challenging. It can take vast amounts of time to elucidate pathways for new metabolites; thus, according to HMDB only 60% of metabolites get assigned to pathways. Here, we present an approach to identify pathways based on metabolite structure. We extracted 201 features from SMILES annotations, and identified new metabolites from PubMed abstracts and HMDB. After applying clustering algorithms to both groups of features, we quantified correlations between metabolites, and found the clusters accurately linked 92% of known metabolites to their respective pathways. Thus, this approach could be valuable for predicting metabolic pathways for new metabolites.

翻訳日:2022-11-30 18:10:33 公開日:2022-11-28

# 条件付き生成型adversarial networkを用いたiactデータ解析のための生成画像の統計的特性制御

Using a Conditional Generative Adversarial Network to Control the Statistical Characteristics of Generated Images for IACT Data Analysis ( http://arxiv.org/abs/2211.15807v1 )

ライセンス: Link先を確認

Julia Dubenskaya, Alexander Kryukov, Andrey Demichev, Stanislav Polyakov, Elizaveta Gres, Anna Vlaskina

(参考訳) 生成逆ネットワークは天文学領域における画像生成に有望なツールである。特に興味深いのは条件付き生成対向ネットワーク(cGAN)で、画像のいくつかの特性の値に応じて画像を複数のクラスに分割し、新しい画像を生成する際に必要なクラスを指定することができる。大気圧チェレンコフ望遠鏡(IACT)の画像の場合、重要な特性はすべての画像ピクセルの明るさ(画像サイズ)であり、これは一次粒子のエネルギーと直接相関している。我々は,TAIGA-IACT実験で得られた画像と類似した画像を生成するために,cGANを用いた。トレーニングセットとして,TAIGAモンテカルロシミュレーションソフトウェアを用いて生成した2次元画像の集合を用いた。トレーニングセットを10クラスに人工的に分割し,画像のサイズを分類し,同じ数の画像が各クラスに収まるようにクラスの境界を定義する。これらのクラスはネットワークのトレーニングに使われました。本稿は,各クラスについて,生成した画像のサイズ分布が正規に近いことを示し,その平均値が対応するクラスのほぼ中間に位置することを示す。また,生成した画像に対して,全クラスにわたる分布を合計した総画像サイズ分布がトレーニングセットの原分布に近いことを示す。得られた結果は、IACTsが撮影したものと同様のリアルな合成画像のより正確な生成に役立ちます。

Generative adversarial networks are a promising tool for image generation in the astronomy domain. Of particular interest are conditional generative adversarial networks (cGANs), which allow you to divide images into several classes according to the value of some property of the image, and then specify the required class when generating new images. In the case of images from Imaging Atmospheric Cherenkov Telescopes (IACTs), an important property is the total brightness of all image pixels (image size), which is in direct correlation with the energy of primary particles. We used a cGAN technique to generate images similar to those obtained in the TAIGA-IACT experiment. As a training set, we used a set of two-dimensional images generated using the TAIGA Monte Carlo simulation software. We artificially divided the training set into 10 classes, sorting images by size and defining the boundaries of the classes so that the same number of images fall into each class. These classes were used while training our network. The paper shows that for each class, the size distribution of the generated images is close to normal with the mean value located approximately in the middle of the corresponding class. We also show that for the generated images, the total image size distribution obtained by summing the distributions over all classes is close to the original distribution of the training set. The results obtained will be useful for more accurate generation of realistic synthetic images similar to the ones taken by IACTs.
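The class construction described above (sort images by size, then place boundaries so that each class receives the same number of images) amounts to equal-frequency binning. A minimal sketch, assuming the image sizes are available as a plain list of numbers; the function names are hypothetical and this is not the authors' code:

```python
import bisect


def equal_frequency_boundaries(sizes, n_classes=10):
    """Return n_classes - 1 boundaries that split `sizes` into equal-count classes."""
    ordered = sorted(sizes)
    n = len(ordered)
    # Boundary k is the smallest size that belongs to class k.
    return [ordered[(k * n) // n_classes] for k in range(1, n_classes)]


def assign_class(size, boundaries):
    """Return the class index (0-based) of an image with the given size."""
    # Number of boundaries that are <= size.
    return bisect.bisect_right(boundaries, size)


if __name__ == "__main__":
    sizes = [float(i) for i in range(100)]  # stand-in for summed pixel brightness
    bounds = equal_frequency_boundaries(sizes)
    counts = [0] * 10
    for s in sizes:
        counts[assign_class(s, bounds)] += 1
    print(bounds)
    print(counts)
```

When many images share exactly the same size, the class counts can differ slightly, since tied values always land in the same class.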

翻訳日:2022-11-30 18:10:19 公開日:2022-11-28

# ニューラルネットワーク:星間媒質の化学問題を解決する

Neural networks: solving the chemistry of the interstellar medium ( http://arxiv.org/abs/2211.15688v1 )

ライセンス: Link先を確認

Lorenzo Branca, Andrea Pallottini

(参考訳) 非平衡化学は、星間媒質(ISM)の研究、特に分子雲の形成とそれに伴う星の形成において重要な過程である。しかし、反応の数が一般に多いこと(>40)、進化の時間スケールが短いこと(ISMの動的時間の約 $10^4$ 分の1)、関連する常微分方程式系(ODE)に特有の非線形性と剛性のため、天体物理学シミュレーションに含めることが計算上最も難しいタスクの1つである。この概念実証研究では、物理インフォームドニューラルネットワーク(PINN)が、硬い熱化学系、すなわち水素分子生成まで(9種46反応)の系に対して、従来のODE時間積分器の有効な代替となることを示す。広範囲の密度($-2< \log n/{\rm cm}^{-3}< 3$)と温度($1 < \log T/{\rm K}< 5$)で異なる化学ネットワークをテストすると、基本的なアーキテクチャは単純化された化学システムに対してのみ十分な収束を与え、突然の化学的および熱的変動を適切に捉えるにはDeep Galerkin法が必要であることがわかった。トレーニング後($\sim 10^3$ GPUhr)、PINNは解の強い非線形性をよく再現し(誤差 $\lesssim 10\%$)、従来のODEソルバに対して最大 $\sim 200$ 倍の高速化を実現できる。さらに、後者は初期の $n$ と $T$ の違いによって完了時間が約 $\sim 30\%$ 変動するのに対し、PINN法の変動は無視できる。高速化とロードバランシングの潜在的な改善の両方から、PINNを用いたシミュレーションは天体物理学や宇宙論の問題における複雑な化学計算を解く非常に魅力的な方法であることが示唆される。

Non-equilibrium chemistry is a key process in the study of the InterStellar Medium (ISM), in particular the formation of molecular clouds and thus stars. However, computationally it is among the most difficult tasks to include in astrophysical simulations, because of the typically high (>40) number of reactions, the short evolutionary timescales (about $10^4$ times less than the ISM dynamical time) and the characteristic non-linearity and stiffness of the associated Ordinary Differential Equations system (ODEs). In this proof of concept work, we show that Physics Informed Neural Networks (PINN) are a viable alternative to traditional ODE time integrators for stiff thermo-chemical systems, i.e. up to molecular hydrogen formation (9 species and 46 reactions). Testing different chemical networks in a wide range of densities ($-2< \log n/{\rm cm}^{-3}< 3$) and temperatures ($1 < \log T/{\rm K}< 5$), we find that a basic architecture can give a comfortable convergence only for simplified chemical systems: to properly capture the sudden chemical and thermal variations a Deep Galerkin Method is needed. Once trained ($\sim 10^3$ GPUhr), the PINN well reproduces the strong non-linear nature of the solutions (errors $\lesssim 10\%$) and can give speed-ups up to a factor of $\sim 200$ with respect to traditional ODE solvers. Further, the latter have completion times that vary by about $\sim 30\%$ for different initial $n$ and $T$, while the PINN method gives negligible variations. Both the speed-up and the potential improvement in load balancing imply that PINN-powered simulations are a very palatable way to solve complex chemical calculation in astrophysical and cosmological problems.

翻訳日:2022-11-30 17:44:02 公開日:2022-11-28

# CWD: 未知のクラウドワークロードを検出する機械学習ベースのアプローチ

CWD: A Machine Learning based Approach to Detect Unknown Cloud Workloads ( http://arxiv.org/abs/2211.15739v1 )

ライセンス: Link先を確認

Mohammad Hossain, Derssie Mebratu, Niranjan Hasabnis, Jun Jin, Gaurav Chaudhary, Noah Shen

(参考訳) 現代のクラウドデータセンターのワークロードはますます複雑になりつつある。クラウドサービスプロバイダ(csp)は、オンデマンドサービスをリアルタイムにサポートしています。クラウド環境とクラウドワークロードの複雑さが増す中、IntelやAMDといったハードウェアベンダは、CPUプラットフォームにクラウド固有のワークロード加速機能を導入している。これらの機能は一般的に人気があり、一般的に使用されているクラウドワークロードをターゲットにしている。それにもかかわらず、顧客固有のワークロード(未知のワークロード)は、その特性が共通のワークロード(既知のワークロード)とは異なる場合、基盤となるプラットフォームの可能性に気付かない可能性がある。基盤となるプラットフォームの全可能性を実現するこの問題を解決するために、クラウド環境で実行されるワークロードを特徴付け、プロファイル化し、予測する機械学習技術を開発した。本手法の実験的評価は良好な予測性能を示す。また,モデルの性能をスタンドアロンで解析する手法も開発している。

Workloads in modern cloud data centers are becoming increasingly complex. The number of workloads running in cloud data centers has been growing exponentially for the last few years, and cloud service providers (CSP) have been supporting on-demand services in real-time. Realizing the growing complexity of cloud environment and cloud workloads, hardware vendors such as Intel and AMD are increasingly introducing cloud-specific workload acceleration features in their CPU platforms. These features are typically targeted towards popular and commonly-used cloud workloads. Nonetheless, uncommon, customer-specific workloads (unknown workloads), if their characteristics are different from common workloads (known workloads), may not realize the potential of the underlying platform. To address this problem of realizing the full potential of the underlying platform, we develop a machine learning based technique to characterize, profile and predict workloads running in the cloud environment. Experimental evaluation of our technique demonstrates good prediction performance. We also develop techniques to analyze the performance of the model in a standalone manner.

翻訳日:2022-11-30 17:43:25 公開日:2022-11-28

# 信頼度対応グラフニューラルネットワークによる信頼性評価

Confidence-Aware Graph Neural Networks for Learning Reliability Assessment Commitments ( http://arxiv.org/abs/2211.15755v1 )

ライセンス: Link先を確認

Seonho Park, Wenbo Chen, Dahye Han, Mathieu Tanneau, and Pascal Van Hentenryck

(参考訳) 信頼度評価コミットメント(RAC)最適化は, 再生可能世代の増加と予測誤差の増加により, グリッド運用においてますます重要になっている。独立系演算子(isos)はまた、より細かい時間的粒度、より長い時間的地平線、そしてさらなる経済的および信頼性の利益のために確率的定式化を使用することを目標としている。本論文の目的は, rac定式化の範囲拡大に伴う計算上の課題を解決することである。 RACLEARN は,(1) グラフニューラルネットワーク (GNN) を用いて生成元のコミットメントとアクティブラインの制約を予測し,(2) 信頼値を各コミットメント予測に関連付け,(3) 信頼性の高い予測のサブセットを選択し,(4) 実現可能性のために修正し,(5) 実現可能な予測とアクティブな制約を含む最先端の最適化アルゴリズムをシードする,と提示する。ミドルコンチネント・インディペンデント・システム・オペレーター(MISO)と実際の送信ネットワーク(8965の送信線、6708のバス、1890の発電機、6262の負荷ユニット)が使用する正確なRAC定式化実験の結果、RACLEARNフレームワークは、解品質が2～4の要因でRAC最適化を高速化できることが示されている。

Reliability Assessment Commitment (RAC) Optimization is increasingly important in grid operations due to larger shares of renewable generations in the generation mix and increased prediction errors. Independent System Operators (ISOs) also aim at using finer time granularities, longer time horizons, and possibly stochastic formulations for additional economic and reliability benefits. The goal of this paper is to address the computational challenges arising in extending the scope of RAC formulations. It presents RACLEARN that (1) uses Graph Neural Networks (GNN) to predict generator commitments and active line constraints, (2) associates a confidence value to each commitment prediction, (3) selects a subset of the high-confidence predictions, which are (4) repaired for feasibility, and (5) seeds a state-of-the-art optimization algorithm with the feasible predictions and the active constraints. Experimental results on exact RAC formulations used by the Midcontinent Independent System Operator (MISO) and an actual transmission network (8965 transmission lines, 6708 buses, 1890 generators, and 6262 load units) show that the RACLEARN framework can speed up RAC optimization by factors ranging from 2 to 4 with negligible loss in solution quality.
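Steps (2) and (3) above, attaching a confidence value to each commitment prediction and keeping only the high-confidence subset, can be sketched as a simple thresholding rule. The abstract does not say how confidence is computed, so the measure used here (`max(p, 1 - p)` for a predicted on-probability `p`), the threshold value, and all names are assumptions for illustration only.

```python
def select_confident_commitments(probabilities, threshold=0.95):
    """Keep only the commitment predictions whose confidence meets the threshold.

    probabilities: dict mapping a (generator, hour) key to the predicted
        probability that the generator is committed (hypothetical format).
    Returns a dict mapping the selected keys to a fixed 0/1 commitment;
    the remaining decisions are left for the optimizer to determine.
    """
    fixed = {}
    for key, p in probabilities.items():
        confidence = max(p, 1.0 - p)  # assumed confidence measure
        if confidence >= threshold:
            fixed[key] = 1 if p >= 0.5 else 0
    return fixed


if __name__ == "__main__":
    predictions = {("g1", 0): 0.99, ("g1", 1): 0.60, ("g2", 0): 0.02}
    print(select_confident_commitments(predictions))
```

In a pipeline like the one described, the fixed values would then go through a feasibility repair before seeding the optimization, which this sketch does not attempt.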

翻訳日:2022-11-30 17:43:10 公開日:2022-11-28

# 群数データに対する階層ベイズモデルの効率的な推定のための近似ギブズサンプリング

Approximate Gibbs Sampler for Efficient Inference of Hierarchical Bayesian Models for Grouped Count Data ( http://arxiv.org/abs/2211.15771v1 )

ライセンス: Link先を確認

Jin-Zhu Yu, Hiba Baroud

(参考訳) 階層型ベイズ・ポアソン回帰モデル (HBPRMs) は予測値とカウント応答変数の関係の柔軟なモデリング手法を提供する。大規模データセットへのhbprmの適用には、ランダムサンプリングに基づく多くのモデルパラメータを推測する計算コストが高いため、効率的な推論アルゴリズムが必要である。マルコフ・チェイン・モンテカルロ (MCMC) アルゴリズムはベイジアン推論に広く用いられているが、このタイプのアルゴリズムを用いたサンプリングは、大規模なデータと時間に敏感な意思決定を行うアプリケーションには時間を要する。この制限を克服するため,推定精度を維持しつつHBPRMを効率的に学習するための近似ギブスサンプリング器(AGS)を開発した。提案したサンプリング器では,データ確率はガウス分布と近似して係数の条件付き後部が閉形式解を持つ。実データと合成データを用いた数値実験は,特に大規模データセットにおいて,最先端サンプリングアルゴリズムと比較してAGSの優れた性能を示す。

Hierarchical Bayesian Poisson regression models (HBPRMs) provide a flexible modeling approach of the relationship between predictors and count response variables. The applications of HBPRMs to large-scale datasets require efficient inference algorithms due to the high computational cost of inferring many model parameters based on random sampling. Although Markov Chain Monte Carlo (MCMC) algorithms have been widely used for Bayesian inference, sampling using this class of algorithms is time-consuming for applications with large-scale data and time-sensitive decision-making, partially due to the non-conjugacy of many models. To overcome this limitation, this research develops an approximate Gibbs sampler (AGS) to efficiently learn the HBPRMs while maintaining the inference accuracy. In the proposed sampler, the data likelihood is approximated with Gaussian distribution such that the conditional posterior of the coefficients has a closed-form solution. Numerical experiments using real and synthetic datasets with small and large counts demonstrate the superior performance of AGS in comparison to the state-of-the-art sampling algorithm, especially for large datasets.

翻訳日:2022-11-30 17:42:40 公開日:2022-11-28

# sgva-clip: 画像分類のための視覚言語モデルのセマンティック誘導視覚適応

SgVA-CLIP: Semantic-guided Visual Adapting of Vision-Language Models for Few-shot Image Classification ( http://arxiv.org/abs/2211.16191v1 )

ライセンス: Link先を確認

Fang Peng, Xiaoshan Yang, Changsheng Xu

(参考訳) 少数ショット学習では大きな進歩があったが、既存の少数ショット学習法のほとんどは、実世界のアプリケーションにおける一般化能力を制限するために、大量のベースクラスのサンプルで事前学習を監督する必要がある。近年、大規模な自己教師型視覚言語モデル(例えばCLIP)は、伝達可能な視覚表現学習のための新しいパラダイムを提供している。しかしながら、事前訓練されたvlpは、言語文によって記述が難しいが、少ないショット分類で効果的な分類法を学ぶために重要である詳細な視覚情報を無視する可能性がある。そこで本研究では,視覚固有のコントラスト損失,クロスモーダルコントラスト損失,暗黙の知識蒸留を包括的に利用することにより,視覚言語事前学習モデルを拡張し,識別的タスク特有の視覚特徴を創り出すための新しいフレームワークであるsemantic-guided visual adapting (sgva)を提案する。暗黙的知識蒸留は、細粒度のクロスモーダル知識を視覚アダプターの更新を導くために設計されている。 13のデータセットに関する最先端の成果は、適応したビジュアル機能がクロスモーダル機能を補完し、少数ショットの画像分類を改善することを証明している。

Although significant progress has been made in few-shot learning, most of existing few-shot learning methods require supervised pre-training on a large amount of samples of base classes, which limits their generalization ability in real world application. Recently, large-scale self-supervised vision-language models (e.g., CLIP) have provided a new paradigm for transferable visual representation learning. However, the pre-trained VLPs may neglect detailed visual information that is difficult to describe by language sentences, but important for learning an effective classifier in few-shot classification. To address the above problem, we propose a new framework, named Semantic-guided Visual Adapting (SgVA), which can effectively extend vision-language pre-trained models to produce discriminative task-specific visual features by comprehensively using a vision-specific contrastive loss, a cross-modal contrastive loss, and an implicit knowledge distillation. The implicit knowledge distillation is designed to transfer the fine-grained cross-modal knowledge to guide the updating of the vision adapter. State-of-the-art results on 13 datasets demonstrate that the adapted visual features can well complement the cross-modal features to improve few-shot image classification.

翻訳日:2022-11-30 17:24:24 公開日:2022-11-28

# 回転に注意:3d形状のための一様バックドアパターン

Be Careful with Rotation: A Uniform Backdoor Pattern for 3D Shape ( http://arxiv.org/abs/2211.16192v1 )

ライセンス: Link先を確認

Linkun Fan, Fazhi He, Qing Guo, Wei Tang, Xiaolin Hong, Bing Li

(参考訳) コスト削減のために、多くのディープニューラルネットワーク(DNN)は、インターネットからダウンロードされたサードパーティのデータセットでトレーニングされており、攻撃者がDNNにバックドアを埋め込むことを可能にしている。 2Dドメインでは、異なる画像フォーマットの固有の構造が似ている。したがって、あるイメージフォーマット用に設計されたバックドアアタックは、他のフォーマットにも適用できる。しかし、3Dの世界では、異なる3Dデータ構造の間に大きな違いがあります。その結果、ある特定の3dデータ構造用に設計されたバックドアパターンは、同じ3dシーンの他のデータ構造では無効になる。そこで本稿では, 不均一な3次元データ構造に適応可能な NRBdoor (Noisy Rotation Backdoor) という一様バックドアパターンを設計する。具体的には、まず単位回転から始めて、ノイズ生成と選択プロセスにより最適なパターンを探索する。回転は一般的な操作であり,実世界の3Dシーンでは一対の点のミスマッチとセンサキャリブレーション誤差により通常ノイズを含むため,提案する NRBdoor は自然かつ知覚不能である。 3Dメッシュとポイントクラウドの大規模な実験により、提案した NRBdoor は、無視可能な形状変化で最先端のパフォーマンスを達成することが示された。

To save cost, many deep neural networks (DNNs) are trained on third-party datasets downloaded from the internet, which enables attackers to implant backdoors into DNNs. In the 2D domain, the inherent structures of different image formats are similar. Hence, a backdoor attack designed for one image format will also suit others. However, when it comes to the 3D world, there is a huge disparity among different 3D data structures. As a result, a backdoor pattern designed for one 3D data structure will be ineffective for other data structures of the same 3D scene. Therefore, this paper designs a uniform backdoor pattern, NRBdoor (Noisy Rotation Backdoor), which is able to adapt to heterogeneous 3D data structures. Specifically, we start from the unit rotation and then search for the optimal pattern through a noise generation and selection process. The proposed NRBdoor is natural and imperceptible, since rotation is a common operation that usually contains noise due to both the mismatch between a pair of points and the sensor calibration error in real-world 3D scenes. Extensive experiments on 3D meshes and point clouds show that the proposed NRBdoor achieves state-of-the-art performance with negligible shape variation.

翻訳日:2022-11-30 17:23:59 公開日:2022-11-28

# より賢く、難しくない: 不足データから深部腹部ctの登録を学ぶ

Train smarter, not harder: learning deep abdominal CT registration on scarce data ( http://arxiv.org/abs/2211.15717v1 )

ライセンス: Link先を確認

Javier Pérez de Frutos, André Pedersen, Egidijus Pelanis, David Bouget, Shanmugapriya Survarachakan, Thomas Langø, Ole-Jakob Elle, Frank Lindseth

(参考訳) 目的:本研究の目的は,腹部画像の畳み込みニューラルネットワークに基づく画像から画像への登録を改善するための訓練戦略を検討することである。方法: 異なる訓練戦略, 損失関数, 転校学習スキームを検討した。さらに, 動的損失重み付けが可能な損失層に加えて, 実機で人工訓練画像対を生成する拡張層も提案した。結果: 訓練段階におけるセグメンテーションを用いた登録指導は, 深層学習に基づく画像登録に有用であることが判明した。脳MRIデータセットから腹部CTデータセットに事前トレーニングされたモデルを微調整することで、後者のアプリケーションのパフォーマンスがさらに向上した。動的損失重み付けは、推論ランタイムに影響を与えることなく、パフォーマンスをわずかに改善した。結論: 単純な概念を用いて, 一般的に使用される深層画像登録アーキテクチャvoxelmorphの性能を改善した。今後の作業では、DDMRというフレームワークをさまざまなデータセットで検証して、その価値をさらに評価する必要があります。

Purpose: This study aims to explore training strategies to improve convolutional neural network-based image-to-image registration for abdominal imaging. Methods: Different training strategies, loss functions, and transfer learning schemes were considered. Furthermore, an augmentation layer which generates artificial training image pairs on-the-fly was proposed, in addition to a loss layer that enables dynamic loss weighting. Results: Guiding registration using segmentations in the training step proved beneficial for deep-learning-based image registration. Finetuning the pretrained model from the brain MRI dataset to the abdominal CT dataset further improved performance on the latter application, removing the need for a large dataset to yield satisfactory performance. Dynamic loss weighting also marginally improved performance, all without impacting inference runtime. Conclusion: Using simple concepts, we improved the performance of a commonly used deep image registration architecture, VoxelMorph. In future work, our framework, DDMR, should be validated on different datasets to further assess its value.

翻訳日:2022-11-30 17:15:25 公開日:2022-11-28

# Découvrir de nouvelles classes dans des données tabulaires

Découvrir de nouvelles classes dans des données tabulaires ( http://arxiv.org/abs/2211.16352v1 )

ライセンス: Link先を確認

Colin Troisemaine, Joachim Flocon-Cholet, Stéphane Gosselin, Sandrine Vaton, Alexandre Reiffers-Masson, Vincent Lemaire

(参考訳) novel class discovery (ncd) では、既知のが異なるクラスのラベル付き集合が与えられたラベルなしのセットで新しいクラスを見つけることが目的である。 NCDは最近、コミュニティから注目を集めているが、非常に一般的なデータ表現であるにもかかわらず、不均一な表形式データのためのフレームワークはまだ提案されていない。本稿では,表データの新しいクラスを発見するための新しい手法であるTabularNCDを提案する。異種変数を含む表データのコンテキストにおいて,すでに知られているクラスから知識を抽出し,新しいクラスの発見プロセスを導く方法を示す。このプロセスの一部は、擬似ラベルを定義する新しい方法によって行われ、マルチタスク学習における最近の知見に従い、共同目的関数を最適化する。本手法は,NCDが画像だけでなく,不均一な表データにも適用可能であることを示す。

In Novel Class Discovery (NCD), the goal is to find new classes in an unlabeled set given a labeled set of known but different classes. While NCD has recently gained attention from the community, no framework has yet been proposed for heterogeneous tabular data, despite being a very common representation of data. In this paper, we propose TabularNCD, a new method for discovering novel classes in tabular data. We show a way to extract knowledge from already known classes to guide the discovery process of novel classes in the context of tabular data which contains heterogeneous variables. A part of this process is done by a new method for defining pseudo labels, and we follow recent findings in Multi-Task Learning to optimize a joint objective function. Our method demonstrates that NCD is not only applicable to images but also to heterogeneous tabular data.

翻訳日:2022-11-30 17:07:21 公開日:2022-11-28

# ディープラーニングフレームワークにおけるライブラリの利用と依存に関する実証的研究

An Empirical Study of Library Usage and Dependency in Deep Learning Frameworks ( http://arxiv.org/abs/2211.15733v1 )

ライセンス: Link先を確認

Mohamed Raed El aoun, Lionel Nganyewou Tidjon, Ben Rombaut, Foutse Khomh, Ahmed E. Hassan

(参考訳) ディープラーニング(dl)の最近の進歩は、最先端のディープニューラルネットワーク(DNN)の開発とデプロイにおいて機械学習(ml)実践者を支援するために、pytorch、Caffe、TensorFlowなどのいくつかのdlソフトウェアライブラリがリリースされたが、テストやデータ処理などのdlライブラリの制限に適切に対処することはできない。本稿では、最も頻繁なdlライブラリの組み合わせの質的かつ定量的な分析、mlワークフロー全体にわたるdlライブラリ依存性の分布、および一連のレコメンデーションの定式化について述べる。 (i)より最適化されたアクセラレーターのためのハードウェアビルダー (ii) より洗練された将来のリリースのためのライブラリビルダー。本研究は1,484のオープンソースdlプロジェクトに基づいており,46,110人のコントリビューターが評価に基づいて選出されている。まず,深層学習ライブラリの利用が増加傾向にあった。第2に,ディープラーニングライブラリの利用パターンをいくつか紹介する。さらに、dlライブラリと最も頻繁なコンビネーション間の依存関係を特定し、pytorchとscikit-learn、kerasとtensorflowが18%と14%のプロジェクトでもっとも頻繁なコンビネーションであることが分かりました。開発者は同じプロジェクトで2、3のdlライブラリを使用し、同じ関数と同じファイルの両方で異なる複数のdlライブラリを使用する傾向がある。開発者は、さまざまなディープラーニングライブラリの使用パターンを示し、より少ない引数と直接的な目標を持つ単純な関数を好む。最後に, 研究者, ライブラリメンテナ, ハードウェアベンダに対して, 調査結果の意義について述べる。

Recent advances in deep learning (DL) have led to the release of several DL software libraries such as PyTorch, Caffe, and TensorFlow, in order to assist machine learning (ML) practitioners in developing and deploying state-of-the-art deep neural networks (DNNs), but they are not able to properly cope with limitations in the DL libraries such as testing or data processing. In this paper, we present a qualitative and quantitative analysis of the most frequent DL library combinations and the distribution of DL library dependencies across the ML workflow, and formulate a set of recommendations to (i) hardware builders for more optimized accelerators and (ii) library builders for more refined future releases. Our study is based on 1,484 open-source DL projects with 46,110 contributors, selected based on their reputation. First, we found an increasing trend in the usage of deep learning libraries. Second, we highlight several usage patterns of deep learning libraries. In addition, we identify dependencies between DL libraries and the most frequent combinations, where we discover that PyTorch with Scikit-learn and Keras with TensorFlow are the most frequent combinations, in 18% and 14% of the projects, respectively. Developers use two or three DL libraries in the same project and tend to use multiple different DL libraries in both the same function and the same file. Developers show patterns in using various deep learning libraries and prefer simple functions with fewer arguments and straightforward goals. Finally, we present the implications of our findings for researchers, library maintainers, and hardware vendors.

翻訳日:2022-11-30 17:06:50 公開日:2022-11-28

# 特徴とセマンティクスの二重コントラストによる深層半教師付き学習

Deep Semi-supervised Learning with Double-Contrast of Features and Semantics ( http://arxiv.org/abs/2211.15671v1 )

ライセンス: Link先を確認

Quan Feng, Jiayu Yao, Zhison Pan, Guojun Zhou

(参考訳) 近年、高度道路交通システム(ITS)の分野は、利用可能な大量のアノテーションデータのおかげで大きな成功を収めている。しかし、こうした注釈付きデータの取得には、現実には高いコストがかかる。したがって、より現実的な戦略は、少量のラベル付きデータと大量のラベルなしデータで半教師付き学習(SSL)を活用することである。典型的には、意味的一貫性正則化や、特徴抽出と分類を分離する2段階学習法が有効であることが示されている。それにもかかわらず、意味的一貫性正則化のみに限定された表現学習は、異なる意味を持つサンプルの表現の分離性や判別性を保証しない可能性がある。また、2段階学習法に固有の制約により、抽出された特徴が特定の下流タスクに適合しない可能性がある。これらの欠点に対処するため、本論文では、正と負の拡張サンプルペアのセマンティクスと特徴を対比することにより、効果的なタスク固有の識別的特徴を抽出する、セマンティクスと特徴の二重コントラストによるエンドツーエンドの深層半教師付き学習手法を提案する。さらに、情報理論を用いてセマンティクスと特徴の二重コントラストの合理性を説明し、相互情報量をより単純な形でコントラスト損失へと緩和する。最後に、本手法の有効性をベンチマークデータセットで検証した。

In recent years, the field of intelligent transportation systems (ITS) has achieved remarkable success, which is mainly due to the large amount of available annotation data. However, obtaining these annotated data has to afford expensive costs in reality. Therefore, a more realistic strategy is to leverage semi-supervised learning (SSL) with a small amount of labeled data and a large amount of unlabeled data. Typically, semantic consistency regularization and the two-stage learning methods of decoupling feature extraction and classification have been proven effective. Nevertheless, representation learning only limited to semantic consistency regularization may not guarantee the separation or discriminability of representations of samples with different semantics; due to the inherent limitations of the two-stage learning methods, the extracted features may not match the specific downstream tasks. In order to deal with the above drawbacks, this paper proposes an end-to-end deep semi-supervised learning double contrast of semantic and feature, which extracts effective tasks specific discriminative features by contrasting the semantics/features of positive and negative augmented samples pairs. Moreover, we leverage information theory to explain the rationality of double contrast of semantics and features and slack mutual information to contrastive loss in a simpler way. Finally, the effectiveness of our method is verified in benchmark datasets.

翻訳日:2022-11-30 16:59:39 公開日:2022-11-28

# PyTorch Adapt

PyTorch Adapt ( http://arxiv.org/abs/2211.15673v1 )

ライセンス: Link先を確認

Kevin Musgrave, Serge Belongie, Ser-Nam Lim

(参考訳) PyTorch Adaptは、既存のモデルを新しいドメインで動作するよう転用する機械学習アルゴリズムの一種である、ドメイン適応のためのライブラリである。これはフル機能のツールキットであり、ユーザは数行のコードで完全な学習/テストパイプラインを作成できる。モジュール性もあるので、ユーザは必要なパーツだけをインポートでき、フレームワークにロックインされることを心配する必要はない。このライブラリを特徴づける点の1つはカスタマイズ性である。特に複雑な学習アルゴリズムは、合成可能で遅延評価されるフックのシステムのおかげで、容易に修正および組み合わせが可能である。本技術報告では、これらの特徴とライブラリ全体の設計について詳しく説明する。コードはhttps://www.github.com/KevinMusgrave/pytorch-adaptで入手できる。

PyTorch Adapt is a library for domain adaptation, a type of machine learning algorithm that re-purposes existing models to work in new domains. It is a fully-featured toolkit, allowing users to create a complete train/test pipeline in a few lines of code. It is also modular, so users can import just the parts they need, and not worry about being locked into a framework. One defining feature of this library is its customizability. In particular, complex training algorithms can be easily modified and combined, thanks to a system of composable, lazily-evaluated hooks. In this technical report, we explain in detail these features and the overall design of the library. Code is available at https://www.github.com/KevinMusgrave/pytorch-adapt

翻訳日:2022-11-30 16:59:16 公開日:2022-11-28

# eXplainable Machine LearningとKelly Indexによるフットボールの試合結果の予測

Predicting Football Match Outcomes with eXplainable Machine Learning and the Kelly Index ( http://arxiv.org/abs/2211.15734v1 )

ライセンス: Link先を確認

Yiming Ren and Teo Susnjak

(参考訳) 本研究では、サッカーの試合結果を予測するための機械学習手法を開発した。この研究の新規性は、Kelly Indexを利用して、まず試合を予測難易度の異なるカテゴリに分類する点にある。このアプローチの有効性を判断するために、試合の各カテゴリに対して、幅広いアルゴリズム群を用いた分類モデルを開発した。これと合わせて、Eloベースの変数を含む、これまで検討されていなかった一連の特徴量を設計した。データセットは2019-2021シーズンのプレミアリーグの試合データに由来する。その結果、予測問題をサブタスクに分解するプロセスが効果的であり、先行研究と競合する結果が得られること、またアンサンブルベースの手法が最も効果的であることが示された。さらに、ブックメーカーのオッズをベンチマークとして有効性を評価するための投資戦略も考案した。Kelly Indexと予測モデルの事前定義された信頼度しきい値を組み合わせることでリスクを最小化する手法を開発した。実験の結果、提案した戦略は、予測モデルが高い信頼度を示す予測しやすい試合に主眼を置く保守的なアプローチに従えば、利益を生み得ることが分かった。

In this work, a machine learning approach is developed for predicting the outcomes of football matches. The novelty of this research lies in the utilisation of the Kelly Index to first classify matches into categories where each one denotes the different levels of predictive difficulty. Classification models using a wide suite of algorithms were developed for each category of matches in order to determine the efficacy of the approach. In conjunction to this, a set of previously unexplored features were engineering including Elo-based variables. The dataset originated from the Premier League match data covering the 2019-2021 seasons. The findings indicate that the process of decomposing the predictive problem into sub-tasks was effective and produced competitive results with prior works, while the ensemble-based methods were the most effective. The paper also devised an investment strategy in order to evaluate its effectiveness by benchmarking against bookmaker odds. An approach was developed that minimises risk by combining the Kelly Index with the predefined confidence thresholds of the predictive models. The experiments found that the proposed strategy can return a profit when following a conservative approach that focuses primarily on easy-to-predict matches where the predictive models display a high confidence level.

翻訳日:2022-11-30 16:59:05 公開日:2022-11-28

# プライバシ規制緩和下におけるニューラルネットベースの差分プライベート表形式学習データ合成器の有用性回復不能性について

On the Utility Recovery Incapability of Neural Net-based Differential Private Tabular Training Data Synthesizer under Privacy Deregulation ( http://arxiv.org/abs/2211.15809v1 )

ライセンス: Link先を確認

Yucong Liu, Chi-Hua Wang, Guang Cheng

(参考訳) 生成モデルのプライバシと有用性のトレードオフを監査する手順を考案することは、実用上重要でありながら未解決の問題である。既存の研究は、合成データによる学習の「合成データで学習し実データでテストする」パラダイムにおける有用性の低下という観点から、プライバシ制約の副作用を調査することに集中している。我々は、プライバシ規制緩和が合成学習データの有用性に与える副作用を観察することによって、プライバシと有用性のトレードオフに関する理解を次のレベルに押し上げる。驚くべきことに、プライバシ規制緩和下でのDP-CTGANとPATE-CTGANの有用性回復不能(Utility Recovery Incapability)が発見され、これらの実用的な応用に懸念が生じた。主なメッセージは、プライバシ規制緩和は必ずしも有用性の回復を意味しない、ということである。

Devising procedures for auditing generative model privacy-utility tradeoff is an important yet unresolved problem in practice. Existing works concentrates on investigating the privacy constraint side effect in terms of utility degradation of the train on synthetic, test on real paradigm of synthetic data training. We push such understanding on privacy-utility tradeoff to next level by observing the privacy deregulation side effect on synthetic training data utility. Surprisingly, we discover the Utility Recovery Incapability of DP-CTGAN and PATE-CTGAN under privacy deregulation, raising concerns on their practical applications. The main message is Privacy Deregulation does NOT always imply Utility Recovery.

翻訳日:2022-11-30 16:58:46 公開日:2022-11-28

# 動的応力予測のための物理インフォームドニューラルネットワーク

Physics Informed Neural Network for Dynamic Stress Prediction ( http://arxiv.org/abs/2211.16190v1 )

ライセンス: Link先を確認

Hamed Bolandi, Gautam Sreekumar, Xuyang Li, Nizar Lajnef, Vishnu Naresh Boddeti

(参考訳) 構造物の破壊は、地震や風などの壊滅的な事象によって引き起こされることが多い。そのため、破壊的な事象の最中に動的応力分布をリアルタイムに予測することが重要である。有限要素モデル(FEM)のような現在利用可能な高忠実度手法は、固有の高い複雑性に悩まされている。そこで、精度を維持しつつ計算コストを削減するため、偏微分方程式(PDE)ソルバを用いた有限要素シミュレーションに基づいて応力分布の全系列を予測する、物理インフォームドニューラルネットワーク(Physics Informed Neural Network, PINN)であるPINN-Stressモデルを提案する。自動微分を用いて深層ニューラルネットワークの損失関数にPDEを埋め込み、測定値とPDEの両方からの情報を取り込む。PINN-Stressモデルは、ほぼリアルタイムで応力分布の系列を予測でき、PINNを用いないモデルよりもよく汎化できる。

Structural failures are often caused by catastrophic events such as earthquakes and winds. As a result, it is crucial to predict dynamic stress distributions during highly disruptive events in real time. Currently available high-fidelity methods, such as Finite Element Models (FEMs), suffer from their inherent high complexity. Therefore, to reduce computational cost while maintaining accuracy, a Physics Informed Neural Network (PINN), PINN-Stress model, is proposed to predict the entire sequence of stress distribution based on Finite Element simulations using a partial differential equation (PDE) solver. Using automatic differentiation, we embed a PDE into a deep neural network's loss function to incorporate information from measurements and PDEs. The PINN-Stress model can predict the sequence of stress distribution in almost real-time and can generalize better than the model without PINN.

翻訳日:2022-11-30 16:57:21 公開日:2022-11-28

# RGBシーケンスからの手持ち3次元オブジェクトスキャン

In-Hand 3D Object Scanning from an RGB Sequence ( http://arxiv.org/abs/2211.16193v1 )

ライセンス: Link先を確認

Shreyas Hampali, Tomas Hodan, Luan Tran, Lingni Ma, Cem Keskin, Vincent Lepetit

(参考訳) カラー画像のシーケンスから、手に持った未知の物体を3次元スキャンする手法を提案する。この問題を、カメラ姿勢が未知の多視点画像から物体表面を再構成する問題として定式化し、物体の形状と外観の両方を捉えるニューラル暗黙的表面表現を利用する。多くのNeRFベースの手法とは対照的に、カメラと物体の相対姿勢が既知であるとは仮定せず、物体形状と姿勢軌跡の両方を同時に最適化する。すべての形状パラメータと姿勢パラメータに対する大域的最適化は、姿勢の粗い初期化なしでは失敗しやすいため、最適化が成功しやすいように慎重に選択された重複セグメントへとシーケンスを分割することから始める漸進的なアプローチを提案する。各セグメント内で独立に、物体形状を漸進的に再構成しつつ物体姿勢を追跡し、その後、重複するフレームで推定された姿勢を位置合わせすることで全セグメントを統合する。最後に、位置合わせされた全セグメントに対して大域的最適化を行い、完全な再構成を実現する。提案手法は、テクスチャのある物体と困難なテクスチャレス物体の両方の形状と色を再構成でき、外観特徴のみに依存する古典的手法を上回り、カメラ姿勢が既知であると仮定する最近の手法に近い性能を示すことを実験的に示す。

We propose a method for in-hand 3D scanning of an unknown object from a sequence of color images. We cast the problem as reconstructing the object surface from un-posed multi-view images and rely on a neural implicit surface representation that captures both the geometry and the appearance of the object. By contrast with most NeRF-based methods, we do not assume that the camera-object relative poses are known and instead simultaneously optimize both the object shape and the pose trajectory. As global optimization over all the shape and pose parameters is prone to fail without coarse-level initialization of the poses, we propose an incremental approach which starts by splitting the sequence into carefully selected overlapping segments within which the optimization is likely to succeed. We incrementally reconstruct the object shape and track the object poses independently within each segment, and later merge all the segments by aligning poses estimated at the overlapping frames. Finally, we perform a global optimization over all the aligned segments to achieve full reconstruction. We experimentally show that the proposed method is able to reconstruct the shape and color of both textured and challenging texture-less objects, outperforms classical methods that rely only on appearance features, and its performance is close to recent methods that assume known camera poses.

翻訳日:2022-11-30 16:41:05 公開日:2022-11-28

# SLAN: クロスモーダル理解のためのセルフロケータ支援ネットワーク

SLAN: Self-Locator Aided Network for Cross-Modal Understanding ( http://arxiv.org/abs/2211.16208v1 )

ライセンス: Link先を確認

Jiang-Tian Zhai, Qi Zhang, Tong Wu, Xing-Yu Chen, Jiang-Jiang Liu, Bo Ren, Ming-Ming Cheng

(参考訳) 視覚と言語の間のきめ細かい相互作用を学習することで、視覚言語(Vision-Language)タスクをより正確に理解できる。しかし、セマンティックアライメントのためにテキストに応じて重要な画像領域を抽出することは依然として困難である。既存研究の多くは、凍結された検出器で得られるテキスト非依存で冗長な領域によって制限されるか、検出器の事前学習を希少なグラウンディング(ゴールド)データに大きく依存するためにそれ以上スケールできない。これらの問題を解決するために、追加のゴールドデータなしでクロスモーダル理解タスクを行うSelf-Locator Aided Network(SLAN)を提案する。SLANは、異なるテキストで条件付けられた関心領域をローカライズするための領域フィルタと領域アダプタで構成される。クロスモーダル情報を集約することにより、領域フィルタは重要領域を選択し、領域アダプタはテキストのガイダンスでその座標を更新する。詳細な領域と単語のアライメントにより、SLANは多くの下流タスクに容易に一般化できる。5つのクロスモーダル理解タスクにおいてかなり競争力のある結果を達成する(例えば、COCOの画像からテキスト、テキストから画像への検索でそれぞれ85.7%と69.2%を達成し、従来のSOTA手法を上回る)。SLANはまた、2つのローカライゼーションタスクに対して、ゼロショットおよびファインチューニングでの強い転移性を示す。

Learning fine-grained interplay between vision and language allows a more accurate understanding for Vision-Language tasks. However, it remains challenging to extract key image regions according to the texts for semantic alignments. Most existing works are either limited by text-agnostic and redundant regions obtained with frozen detectors, or fail to scale further due to their heavy reliance on scarce grounding (gold) data to pre-train detectors. To solve these problems, we propose Self-Locator Aided Network (SLAN) for cross-modal understanding tasks without any extra gold data. SLAN consists of a region filter and a region adaptor to localize regions of interest conditioned on different texts. By aggregating cross-modal information, the region filter selects key regions and the region adaptor updates their coordinates with text guidance. With detailed region-word alignments, SLAN can be easily generalized to many downstream tasks. It achieves fairly competitive results on five cross-modal understanding tasks (e.g., 85.7% and 69.2% on COCO image-to-text and text-to-image retrieval, surpassing previous SOTA methods). SLAN also demonstrates strong zero-shot and fine-tuned transferability to two localization tasks.

翻訳日:2022-11-30 16:40:19 公開日:2022-11-28

# PIDS: 3次元点群のための点インタラクションと次元の同時探索

PIDS: Joint Point Interaction-Dimension Search for 3D Point Cloud ( http://arxiv.org/abs/2211.15759v1 )

ライセンス: Link先を確認

Tunhou Zhang, Mingyuan Ma, Feng Yan, Hai Li, Yiran Chen

(参考訳) 点のインタラクションと次元は、階層的な3Dモデルを構成する点演算子を設計する上で重要な2つの軸である。しかし、この2つの軸は異質であり、完全に探索することは困難である。既存の研究は、単一の軸の下で点演算子を設計し、設計した演算子を3Dモデルのすべての部分で再利用している。これは、3次元点群のさまざまな幾何や密度を活用して、点インタラクションと次元をより良く組み合わせる機会を見落としている。本研究では、点インタラクションと点次元を同時に探索し、点群データのセマンティックセグメンテーションに役立てる新しいパラダイムであるPIDSを確立する。多様な点インタラクションと点次元を同時に考慮する大規模な探索空間を構築する。これにより、さまざまな幾何や密度を考慮した点演算子がサポートされる。異質な探索コンポーネントを含む拡大された探索空間では、候補モデルのより良いランキングが求められる。これを実現するために、予測器ベースのニューラルアーキテクチャ探索(NAS)を活用して探索空間の探索を改善し、異質な探索コンポーネントにその事前知識に基づいて固有のエンコーディングを割り当てることで予測の品質を向上させる。PIDSが設計したネットワークを2つのセマンティックセグメンテーションベンチマークで徹底的に評価し、SemanticKITTIとS3DISにおいて最先端の3Dモデルに対して約1%のmIOU改善を示した。

The interaction and dimension of points are two important axes in designing point operators to serve hierarchical 3D models. Yet, these two axes are heterogeneous and challenging to fully explore. Existing works craft point operator under a single axis and reuse the crafted operator in all parts of 3D models. This overlooks the opportunity to better combine point interactions and dimensions by exploiting varying geometry/density of 3D point clouds. In this work, we establish PIDS, a novel paradigm to jointly explore point interactions and point dimensions to serve semantic segmentation on point cloud data. We establish a large search space to jointly consider versatile point interactions and point dimensions. This supports point operators with various geometry/density considerations. The enlarged search space with heterogeneous search components calls for a better ranking of candidate models. To achieve this, we improve the search space exploration by leveraging predictor-based Neural Architecture Search (NAS), and enhance the quality of prediction by assigning unique encoding to heterogeneous search components based on their priors. We thoroughly evaluate the networks crafted by PIDS on two semantic segmentation benchmarks, showing ~1% mIOU improvement on SemanticKITTI and S3DIS over state-of-the-art 3D models.

翻訳日:2022-11-30 16:23:53 公開日:2022-11-28

# 3次元シーンインスタンスセグメンテーションのためのスーパーポイントトランスフォーマー

Superpoint Transformer for 3D Scene Instance Segmentation ( http://arxiv.org/abs/2211.15766v1 )

ライセンス: Link先を確認

Jiahao Sun, Chunmei Qing, Junpeng Tan, Xiangmin Xu

(参考訳) 既存のほとんどの手法は、3Dオブジェクト検出や3Dセマンティックセグメンテーションに使用されるモデルを拡張することで3Dインスタンスセグメンテーションを実現している。しかし、これらの間接的な手法には2つの欠点がある。 1) 不正確なバウンディングボックスや不十分なセマンティック予測が、3Dインスタンスセグメンテーションのフレームワーク全体の性能を制限する。 2) 既存の手法は、時間のかかる中間的な集約ステップを必要とする。これらの問題に対処するため、本論文では、Superpoint Transformerに基づく新しいエンドツーエンドの3Dインスタンスセグメンテーション手法であるSPFormerを提案する。この手法は、点群から得られる潜在的な特徴をスーパーポイントにグループ化し、オブジェクト検出やセマンティックセグメンテーションの結果に頼ることなく、クエリベクトルを通じてインスタンスを直接予測する。このフレームワークの鍵となるのは、スーパーポイントのクロスアテンション機構を通じてインスタンス情報を捉え、インスタンスのスーパーポイントマスクを生成できる、トランスフォーマーを備えた新しいクエリデコーダである。スーパーポイントマスクに基づく二部マッチングにより、SPFormerは中間の集約ステップなしでネットワークの学習を行うことができ、ネットワークが高速化される。ScanNetv2とS3DISベンチマークでの広範な実験により、提案手法が簡潔でありながら効率的であることが確認された。特にSPFormerは、ScanNetv2の隠しテストセットにおいてmAPで比較対象の最先端手法を4.3%上回り、同時に高速な推論速度(1フレームあたり247ms)を維持する。コードはhttps://github.com/sunjiahao1999/SPFormerで入手できる。

Most existing methods realize 3D instance segmentation by extending those models used for 3D object detection or 3D semantic segmentation. However, these non-straightforward methods suffer from two drawbacks: 1) Imprecise bounding boxes or unsatisfactory semantic predictions limit the performance of the overall 3D instance segmentation framework. 2) Existing method requires a time-consuming intermediate step of aggregation. To address these issues, this paper proposes a novel end-to-end 3D instance segmentation method based on Superpoint Transformer, named as SPFormer. It groups potential features from point clouds into superpoints, and directly predicts instances through query vectors without relying on the results of object detection or semantic segmentation. The key step in this framework is a novel query decoder with transformers that can capture the instance information through the superpoint cross-attention mechanism and generate the superpoint masks of the instances. Through bipartite matching based on superpoint masks, SPFormer can implement the network training without the intermediate aggregation step, which accelerates the network. Extensive experiments on ScanNetv2 and S3DIS benchmarks verify that our method is concise yet efficient. Notably, SPFormer exceeds compared state-of-the-art methods by 4.3% on ScanNetv2 hidden test set in terms of mAP and keeps fast inference speed (247ms per frame) simultaneously. Code is available at https://github.com/sunjiahao1999/SPFormer.

翻訳日:2022-11-30 16:23:29 公開日:2022-11-28

# リモートセンシングにおける画像とラベル解像度のミスマッチ処理

Handling Image and Label Resolution Mismatch in Remote Sensing ( http://arxiv.org/abs/2211.15790v1 )

ライセンス: Link先を確認

Scott Workman, Armin Hadzic, M. Usman Rafique

(参考訳) セマンティックセグメンテーションはビジョン分野の文献で盛んに研究されてきたが、リモートセンシング領域には固有の課題が残っている。そのような課題の1つは、地上サンプル距離の違いに起因する、上空画像と正解ラベルのソースとの間の解像度のミスマッチをどう扱うかである。この問題を説明するために、新しいデータセットを導入し、画像の解像度に合わせて対象ラベルを単純にアップサンプリングする既存の戦略に固有の弱点を示す。その代わりに、(アップサンプリングなしの)低解像度ラベルで教師あり学習を行いつつ、学習プロセスを導くために高解像度ラベルの例示セットを活用する手法を提案する。本手法は、領域集約、敵対的学習、自己教師あり事前学習を組み合わせることで、高解像度アノテーションを必要とせずに細粒度の予測を生成する。広範な実験により、我々のアプローチの実世界での適用可能性が実証された。

Though semantic segmentation has been heavily explored in vision literature, unique challenges remain in the remote sensing domain. One such challenge is how to handle resolution mismatch between overhead imagery and ground-truth label sources, due to differences in ground sample distance. To illustrate this problem, we introduce a new dataset and use it to showcase weaknesses inherent in existing strategies that naively upsample the target label to match the image resolution. Instead, we present a method that is supervised using low-resolution labels (without upsampling), but takes advantage of an exemplar set of high-resolution labels to guide the learning process. Our method incorporates region aggregation, adversarial learning, and self-supervised pretraining to generate fine-grained predictions, without requiring high-resolution annotations. Extensive experiments demonstrate the real-world applicability of our approach.

翻訳日:2022-11-30 16:23:04 公開日:2022-11-28

# 半正定値計画法によるk平均クラスタリングへのスケッチ・アンド・ソルブ手法

Sketch-and-solve approaches to k-means clustering by semidefinite programming ( http://arxiv.org/abs/2211.15744v1 )

ライセンス: Link先を確認

Charles Clum, Dustin G. Mixon, Soledad Villar, Kaiying Xie

(参考訳) 本稿では、k平均クラスタリングのPeng-Wei半正定値緩和を高速化するスケッチ・アンド・ソルブ(sketch-and-solve)アプローチを提案する。データが適切に分離されている場合、k平均法の最適クラスタリングを特定する。そうでない場合、本アプローチは最適なk平均値に対する高信頼の下界を与える。この下界はデータ駆動であり、データやその生成方法について何も仮定しない。コードを提供するとともに、k-means++で得られたクラスタリング解の近似最適性を本アプローチで証明する広範な数値実験を示す。

We introduce a sketch-and-solve approach to speed up the Peng-Wei semidefinite relaxation of k-means clustering. When the data is appropriately separated we identify the k-means optimal clustering. Otherwise, our approach provides a high-confidence lower bound on the optimal k-means value. This lower bound is data-driven; it does not make any assumption on the data nor how it is generated. We provide code and an extensive set of numerical experiments where we use this approach to certify approximate optimality of clustering solutions obtained by k-means++.

翻訳日:2022-11-30 16:14:21 公開日:2022-11-28

# 抽象的解釈による議論的マルチエージェントにおける意味構造保存に向けて

Towards Preserving Semantic Structure in Argumentative Multi-Agent via Abstract Interpretation ( http://arxiv.org/abs/2211.15782v1 )

ライセンス: Link先を確認

Minal Suresh Patil

(参考訳) 過去20年間、議論(argumentation)は知識表現、推論、マルチエージェントシステムの分野でかなりの注目を集めてきた。しかし、動的マルチエージェントシステムにおける議論は、エージェントが相当数の論証を生成するという問題に直面し、表現の複雑性と計算コストが犠牲になる。本研究では、複数の論証が同一の立場をさまざまな視点から擁護しようとしている状況において、モデル検査の観点から抽象化の概念を検討し、それによってシステム内の意味的フロー構造を保ちながら議論フレームワークのサイズを削減することを目指す。

Over the recent twenty years, argumentation has received considerable attention in the fields of knowledge representation, reasoning, and multi-agent systems. However, argumentation in dynamic multi-agent systems encounters the problem of significant arguments generated by agents, which comes at the expense of representational complexity and computational cost. In this work, we aim to investigate the notion of abstraction from the model-checking perspective, where several arguments are trying to defend the same position from various points of view, thereby reducing the size of the argumentation framework whilst preserving the semantic flow structure in the system.

翻訳日:2022-11-30 16:14:12 公開日:2022-11-28

# H3WB: Human3.6M 3D全身データセットとベンチマーク

H3WB: Human3.6M 3D WholeBody Dataset and Benchmark ( http://arxiv.org/abs/2211.15692v1 )

ライセンス: Link先を確認

Yue Zhu, Nermin Samet, David Picard

(参考訳) 3D人体全身ポーズ推定は、顔、手、体、足を含む人体全体の正確な3Dキーポイントを特定することを目的としている。大規模で完全にアノテーションされた3D全身データセットが存在しないため、特定の身体部位専用のデータセットで複数のディープネットワークを個別に学習し、推論時にそれらを組み合わせるアプローチが一般的であった。このアプローチは、使用するデータセットごとにバイアスが異なるため、学習と推論のパイプラインが複雑になるという問題を抱えている。また、共通のベンチマークがないため、異なる手法の比較が難しい。これらの問題に対処するために、COCO Wholebodyレイアウトを用いてHuman3.6Mデータセットに全身アノテーションを提供するHuman3.6M 3D WholeBody(H3WB)を導入する。H3WBは、10万枚の画像に133個の全身キーポイントアノテーションを付与した大規模データセットであり、我々の新しいマルチビューパイプラインによって実現された。H3WBとともに、次の3つのタスクを提案する。 i) 完全な2D全身ポーズからの3D全身ポーズリフティング、 ii) 不完全な2D全身ポーズからの3D全身ポーズリフティング、 iii) 単一のRGB画像からの3D全身ポーズ推定。また、これらのタスクに対する一般的な手法によるベースラインをいくつか報告する。データセットは \url{https://github.com/wholebody3d/wholebody3d} で公開されている。

3D human whole-body pose estimation aims to localize precise 3D keypoints on the entire human body, including the face, hands, body, and feet. Due to the lack of a large-scale fully annotated 3D whole-body dataset, a common approach has been to train several deep networks separately on datasets dedicated to specific body parts, and combine them during inference. This approach suffers from complex training and inference pipelines because of the different biases in each dataset used. It also lacks a common benchmark which makes it difficult to compare different methods. To address these issues, we introduce Human3.6M 3D WholeBody (H3WB) which provides whole-body annotations for the Human3.6M dataset using the COCO Wholebody layout. H3WB is a large scale dataset with 133 whole-body keypoint annotations on 100K images, made possible by our new multi-view pipeline. Along with H3WB, we propose 3 tasks: i) 3D whole-body pose lifting from 2D complete whole-body pose, ii) 3D whole-body pose lifting from 2D incomplete whole-body pose, iii) 3D whole-body pose estimation from a single RGB image. We also report several baselines from popular methods for these tasks. The dataset is publicly available at \url{https://github.com/wholebody3d/wholebody3d}.

翻訳日:2022-11-30 16:12:19 公開日:2022-11-28

# 拡散モデルに対する学習後量子化

Post-training Quantization on Diffusion Models ( http://arxiv.org/abs/2211.15736v1 )

ライセンス: Link先を確認

Yuzhang Shang, Zhihang Yuan, Bin Xie, Bingzhe Wu, Yan Yan

(参考訳) デノイジング拡散(スコアベース)生成モデルは最近、現実的で多様なデータの生成において大きな成果を上げている。これらの手法は、データをノイズに変換する前方拡散過程と、ノイズからデータをサンプリングする後方デノイジング過程を定義する。残念ながら、現在のデノイジング拡散モデルの生成過程は、重いニューラルネットワークに依存する長い反復的なノイズ推定のために、非常に遅いことで知られている。これは、拡散モデルが特にエッジデバイスに広く展開されることを妨げている。従来の研究は、より短く効果的なサンプリング軌道を見つけることで拡散モデル(DM)の生成過程を加速してきた。しかし、各反復における重いネットワークによるノイズ推定のコストを見落としている。本研究では、ノイズ推定ネットワークを圧縮するという観点から生成を高速化する。DMの再学習が困難であることから、主流である学習を伴う圧縮パラダイムを除外し、学習後量子化(PTQ)をDMの高速化に導入する。しかし、ノイズ推定ネットワークの出力分布は時間ステップとともに変化するため、単一時間ステップのシナリオ向けに設計された従来のPTQ手法はDMでは失敗する。DM固有のPTQ手法を考案するために、量子化する演算、キャリブレーションデータセット、キャリブレーション指標の3つの側面からDM上のPTQを探索する。包括的な調査から得られたいくつかの観察を要約して活用し、特にDMに固有の多時間ステップ構造を対象とする手法を定式化する。実験では、本手法は全精度のDMを学習なしで8ビットモデルへ直接量子化でき、その性能を維持あるいは向上させることができる。重要なことに、本手法はDDIMなどの他の高速サンプリング手法に対するプラグアンドプレイモジュールとして機能する。

Denoising diffusion (score-based) generative models have recently achieved significant accomplishments in generating realistic and diverse data. These approaches define a forward diffusion process for transforming data into noise and a backward denoising process for sampling data from noise. Unfortunately, the generation process of current denoising diffusion models is notoriously slow due to the lengthy iterative noise estimations, which rely on cumbersome neural networks. It prevents the diffusion models from being widely deployed, especially on edge devices. Previous works accelerate the generation process of diffusion model (DM) via finding shorter yet effective sampling trajectories. However, they overlook the cost of noise estimation with a heavy network in every iteration. In this work, we accelerate generation from the perspective of compressing the noise estimation network. Due to the difficulty of retraining DMs, we exclude mainstream training-aware compression paradigms and introduce post-training quantization (PTQ) into DM acceleration. However, the output distributions of noise estimation networks change with time-step, making previous PTQ methods fail in DMs since they are designed for single-time step scenarios. To devise a DM-specific PTQ method, we explore PTQ on DM in three aspects: quantized operations, calibration dataset, and calibration metric. We summarize and use several observations derived from all-inclusive investigations to formulate our method, which especially targets the unique multi-time-step structure of DMs. Experimentally, our method can directly quantize full-precision DMs into 8-bit models while maintaining or even improving their performance in a training-free manner. Importantly, our method can serve as a plug-and-play module on other fast-sampling methods, e.g., DDIM.
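
As a rough illustration of the calibration issue described in this abstract, the following minimal Python sketch applies symmetric uniform 8-bit quantization to activations collected at different time-steps. It is not the paper's method: the function names, the max-abs calibration rule, and the toy activation values are all assumptions made for illustration. It only shows why a scale calibrated on a single time-step can clip values seen at another time-step, while a scale calibrated on data pooled across time-steps covers both ranges.

```python
def calibrate_scale(values, num_bits=8):
    """Symmetric max-abs calibration: the largest magnitude maps to the top integer level."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values, scale, num_bits=8):
    """Round to the nearest integer level and clip to the signed num_bits range."""
    qmax = 2 ** (num_bits - 1) - 1
    qmin = -qmax - 1
    return [max(qmin, min(qmax, round(v / scale))) for v in values]


def dequantize(q_values, scale):
    """Map integer levels back to approximate real values."""
    return [q * scale for q in q_values]


# Toy activations of a noise-estimation network at two time-steps (made-up numbers).
activations_by_step = {
    0: [0.10, -0.20, 0.05],
    999: [2.00, -3.00, 1.50],
}

# Calibrating on one time-step only: values from the other time-step saturate.
scale_single = calibrate_scale(activations_by_step[0])
clipped = quantize(activations_by_step[999], scale_single)

# Calibrating on data pooled across time-steps covers both ranges.
pooled = [v for acts in activations_by_step.values() for v in acts]
scale_pooled = calibrate_scale(pooled)
recovered = dequantize(quantize(pooled, scale_pooled), scale_pooled)
```

With the pooled scale, every value is reconstructed to within half a quantization step, at the price of coarser resolution for the small-magnitude time-step. That trade-off is one reason the choice of calibration data matters.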

翻訳日:2022-11-30 16:11:57 公開日:2022-11-28

# CoNAL: 大規模言語モデルによる外れ値の予見

CoNAL: Anticipating Outliers with Large Language Models ( http://arxiv.org/abs/2211.15718v1 )

ライセンス: Link先を確認

Albert Xu, Xiang Ren, and Robin Jia

(参考訳) 多くのタスク設定において、テキスト分類モデルは、正しく予測できない新規クラスの例に遭遇する可能性が高い。モデルが低信頼度の例に対して予測を棄権する選択的予測は1つの解決策となり得るが、既存のモデルは分布外(OOD)の例に対して過度に自信を持つことが多い。この過信を是正するために、新規クラスを代表するOOD例を生成し、それらに対する信頼度を下げるように学習する2段階の手法であるContrastive Novelty-Augmented Learning(CoNAL)を導入する。第1に、大規模言語モデルに2回プロンプトを与えることでOOD例を生成する。すなわち、関連する新規ラベルを列挙させ、次にタスクの形式に合致する例を各新規クラスから生成させる。第2に、生成されたOOD例に対する信頼度が学習例に対する信頼度よりも低くなるよう促す、新しいコントラスト目的関数で分類器を学習する。CoNALで学習した分類器は、4つのNLPデータセットの平均で、従来手法に比べてAUACで2.3%、AUROCで5.5%、OOD例を検出して棄権する能力が向上し、分布内の精度は損なわれない。

In many task settings, text classification models are likely to encounter examples from novel classes on which they cannot predict correctly. Selective prediction, in which models abstain on low-confidence examples, provides a possible solution, but existing models are often overly confident on OOD examples. To remedy this overconfidence, we introduce Contrastive Novelty-Augmented Learning (CoNAL), a two-step method that generates OOD examples representative of novel classes, then trains to decrease confidence on them. First, we generate OOD examples by prompting a large language model twice: we prompt it to enumerate relevant novel labels, then generate examples from each novel class matching the task format. Second, we train our classifier with a novel contrastive objective that encourages lower confidence on generated OOD examples than training examples. When trained with CoNAL, classifiers improve in their ability to detect and abstain on OOD examples over prior methods by an average of 2.3% AUAC and 5.5% AUROC across 4 NLP datasets, with no cost to in-distribution accuracy.
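
Selective prediction, as mentioned in this abstract, can be sketched as a confidence threshold on a classifier's softmax output. The snippet below is a generic illustration and not CoNAL itself: `predict_or_abstain`, the threshold of 0.7, and the example logits are hypothetical choices.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_or_abstain(logits, threshold=0.7):
    """Return (label, confidence); label is None when the model abstains."""
    probs = softmax(logits)
    confidence = max(probs)
    if confidence < threshold:
        return None, confidence  # abstain on low-confidence inputs
    return probs.index(confidence), confidence


# A peaked distribution leads to a prediction; a flat one leads to abstention.
confident_label, confident_score = predict_or_abstain([4.0, 0.0, 0.0])
uncertain_label, uncertain_score = predict_or_abstain([0.1, 0.0, 0.05])
```

A threshold like this only helps if out-of-distribution inputs actually receive low confidence, which is the property the abstract says existing models often lack.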

翻訳日:2022-11-30 15:55:05 公開日:2022-11-28

# 言語学習項目のための制御言語生成

Controlled Language Generation for Language Learning Items ( http://arxiv.org/abs/2211.15731v1 )

ライセンス: Link先を確認

Kevin Stowe, Debanjan Ghosh, Mengxuan Zhao

(参考訳) 本研究は、自然言語生成(NLG)を用いて、英語学習アプリケーション向けの項目(アイテム)を迅速に生成することを目的とする。これには、流暢で高品質な英語を生成できる言語モデルと、関連する項目の要件に合うように生成の出力を制御することの両方が必要となる。このタスクに対して事前学習済みの深層モデルを用いて実験を行い、言語学習に関連する要素、すなわち習熟度レベルに応じた多様な文と、文法をテストするための項構造について、項目を制御する新しい手法を開発した。人手評価では、すべてのモデルで高い文法性スコア(4点満点中3.4以上)を示し、上級習熟度モデルではベースラインよりも長さ(24%)と複雑さ(9%)が高かった。これらの結果は、個々のユーザに合わせた多様なコンテンツを保証するための制御を追加しながら、高い性能を達成できることを示している。

This work aims to employ natural language generation (NLG) to rapidly generate items for English language learning applications: this requires both language models capable of generating fluent, high-quality English, and to control the output of the generation to match the requirements of the relevant items. We experiment with deep pretrained models for this task, developing novel methods for controlling items for factors relevant in language learning: diverse sentences for different proficiency levels and argument structure to test grammar. Human evaluation demonstrates high grammatically scores for all models (3.4 and above out of 4), and higher length (24%) and complexity (9%) over the baseline for the advanced proficiency model. Our results show that we can achieve strong performance while adding additional control to ensure diverse, tailored content for individual users.

翻訳日:2022-11-30 15:54:44 公開日:2022-11-28

# 創発的言語の語彙エントロピーを数学的にモデル化する

Mathematically Modeling the Lexicon Entropy of Emergent Language ( http://arxiv.org/abs/2211.15783v1 )

ライセンス: Link先を確認

Brendon Boldt, David Mortensen

(参考訳) 深層学習に基づく創発言語システムにおける語彙エントロピーの数学的モデルとして、確率過程FiLexを定式化する。モデルを数学的に定義することで、直接かつ決定的に検証可能な明確な予測を生成できる。4つの異なる環境において、FiLexがハイパーパラメータ(学習ステップ数、語彙サイズ、学習率、ロールアウトバッファサイズ、Gumbel-Softmax温度)と創発言語のエントロピーとの間の相関を、20通りの環境とハイパーパラメータの組み合わせのうち20通りすべてで正しく予測することを実証的に検証した。さらに、実験により、環境が異なればハイパーパラメータとエントロピーの関係も多様であることが明らかになり、精密な粒度で明確な予測を行えるモデルの必要性が示された。

We formulate a stochastic process, FiLex, as a mathematical model of lexicon entropy in deep learning-based emergent language systems. Defining a model mathematically allows it to generate clear predictions which can be directly and decisively tested. We empirically verify across four different environments that FiLex predicts the correct correlation between hyperparameters (training steps, lexicon size, learning rate, rollout buffer size, and Gumbel-Softmax temperature) and the emergent language's entropy in 20 out of 20 environment-hyperparameter combinations. Furthermore, our experiments reveal that different environments show diverse relationships between their hyperparameters and entropy which demonstrates the need for a model which can make well-defined predictions at a precise level of granularity.

翻訳日:2022-11-30 15:54:30 公開日:2022-11-28

# ディープラーニング駆動のエッジビデオ分析:調査

Deep Learning-Driven Edge Video Analytics: A Survey ( http://arxiv.org/abs/2211.15751v1 )

ライセンス: Link先を確認

Renjie Xu, Saiedeh Razavi and Rong Zheng

(参考訳) ビデオは、デジタル情報のグローバルな爆発の鍵を握る存在であり、人間社会に多大な利益をもたらす。政府や企業は、例えば、警察、緊急管理、交通制御、セキュリティ監視など、様々な用途に無数のカメラを配備しており、いずれもビデオ分析(VA)によって促進されている。この傾向は、オブジェクト分類、検出、追跡のためのより正確なモデルを可能にするディープラーニング(DL)の急速な進歩によって引き起こされる。一方、インターネットに接続されたデバイスの普及に伴い、大量のデータが毎日生成され、クラウドを圧倒する。ワークロードとサービスをネットワークコアからネットワークエッジに移行する、新たなパラダイムであるエッジコンピューティングは、有望なソリューションとして広く認識されている。新たな交差点であるedge video analytics(eva)は、広く注目を集め始めている。それにもかかわらず、この話題に関する調査はごくわずかである。 EVAの最新の進歩を収集・要約するための専用会場がコミュニティから強く望まれている。さらに、EVAの基本概念(定義、アーキテクチャなど)は曖昧であり、この領域の急速な発展のためにこれらの調査によって無視されている。これらの概念のコンセンサスを促進するためには、徹底的な明確化が必要である。これらのギャップを埋めるために、EVAに関する最近の取り組みを包括的に調査する。本稿では,まずエッジコンピューティングの基礎を概観し,続いてvaの概要について述べる。次にEVAシステムとその実現技術について述べる。さらに,EVAシステムの開発において,今後の研究者を支援するためのフレームワークやデータセットも紹介する。最後に,既存の課題と今後の研究方向性について考察する。この調査は、読者がVAとエッジコンピューティングの関係を理解し、EVAに関する新しいアイデアを喚起するのに役立ちます。

Video, as a key driver in the global explosion of digital information, can create tremendous benefits for human society. Governments and enterprises are deploying innumerable cameras for a variety of applications, e.g., law enforcement, emergency management, traffic control, and security surveillance, all facilitated by video analytics (VA). This trend is spurred by the rapid advancement of deep learning (DL), which enables more precise models for object classification, detection, and tracking. Meanwhile, with the proliferation of Internet-connected devices, massive amounts of data are generated daily, overwhelming the cloud. Edge computing, an emerging paradigm that moves workloads and services from the network core to the network edge, has been widely recognized as a promising solution. The resulting new intersection, edge video analytics (EVA), begins to attract widespread attention. Nevertheless, only a few loosely-related surveys exist on this topic. A dedicated venue for collecting and summarizing the latest advances of EVA is highly desired by the community. Besides, the basic concepts of EVA (e.g., definition, architectures, etc.) are ambiguous and neglected by these surveys due to the rapid development of this domain. A thorough clarification is needed to facilitate a consensus on these concepts. To fill in these gaps, we conduct a comprehensive survey of the recent efforts on EVA. In this paper, we first review the fundamentals of edge computing, followed by an overview of VA. The EVA system and its enabling techniques are discussed next. In addition, we introduce prevalent frameworks and datasets to aid future researchers in the development of EVA systems. Finally, we discuss existing challenges and foresee future research directions. We believe this survey will help readers comprehend the relationship between VA and edge computing, and spark new ideas on EVA.

翻訳日:2022-11-30 15:46:16 公開日:2022-11-28

# 深層学習2段階アプローチによる新型コロナウイルスの分類

COVID-19 Classification Using Deep Learning Two-Stage Approach ( http://arxiv.org/abs/2211.15817v1 )

ライセンス: Link先を確認

Mostapha Alsaidi, Ali Saleem Altaher, Muhammad Tanveer Jan, Ahmed Altaher, Zahra Salekshahrezaee

(参考訳) 本稿では,事前学習済みの畳み込みニューラルネットワーク(VGG16とVGG19)の微調整と,独自に開発したCNNモデルのエンドツーエンドトレーニングという深層学習ベースのアプローチを用いて,X線画像を新型コロナウイルス,正常,不透明,肺炎の4つのクラスに分類した。 20,000以上のX線スキャンを含むデータセットをKaggleから取得し,この実験に使用した。 2段階の分類アプローチを実装し,ワンショット分類アプローチと比較した。我々の仮説は,2段階のモデルがワンショットモデルよりも優れた性能を達成できるというものだった。しかし結果はその逆で,VGG16はワンショットアプローチにより5-foldの訓練で95%の精度を達成した。今後は,2段階分類モデルCovid-TSCのより堅牢な実装に注力する予定である。主な改善点は,stage-1の出力からstage-2の入力へデータを流せるようにすることであり,stage-1とstage-2のモデルはいずれもCovid-19データセットで微調整されたVGG16モデルである。

In this paper, deep-learning-based approaches, namely fine-tuning of pretrained convolutional neural networks (VGG16 and VGG19) and end-to-end training of a newly developed CNN model, are used to classify X-ray images into four classes: COVID-19, normal, opacity, and pneumonia. A dataset containing more than 20,000 X-ray scans was retrieved from Kaggle and used in this experiment. A two-stage classification approach was implemented and compared with the one-shot classification approach. Our hypothesis was that a two-stage model would achieve better performance than a one-shot model. Our results show otherwise: VGG16 achieved 95% accuracy with the one-shot approach over 5-fold training. Future work will focus on a more robust implementation of the two-stage classification model, Covid-TSC. The main improvement will be allowing data to flow from the output of stage-1 to the input of stage-2, where the stage-1 and stage-2 models are VGG16 models fine-tuned on the Covid-19 dataset.

翻訳日:2022-11-30 15:45:52 公開日:2022-11-28

# 整数プログラミングによる加速非負テンソル補完

Accelerated Nonnegative Tensor Completion via Integer Programming ( http://arxiv.org/abs/2211.15770v1 )

ライセンス: Link先を確認

Wenhao Pan, Anil Aswani and Chen Chen

(参考訳) テンソル完備化の問題には、医療、コンピュータビジョン、その他の領域への応用がある。しかし、従来のテンソル完備化へのアプローチは、多項式時間計算を持つが、情報理論速度よりも指数関数的に多くのサンプルを必要とするか、より少ないサンプルを使用するが、既知の実用的なアルゴリズムが存在しないNPハード問題を解く必要があるという緊張に直面している。整数計画に基づく最近のアプローチは、非負のテンソル完全化に対するこの緊張を解消する。情報理論的なサンプル複雑性率を達成し、大域的最適に収束するためには、線形(数値的寛容)数のオラクルステップを必要とするBlended Conditional Gradientsアルゴリズムをデプロイする。このアプローチのトレードオフは、最悪の場合、oracleのステップは整数線形プログラムの解決を必要とすることである。この理論的な制限にもかかわらず、数値実験により、このアルゴリズムは、ある場合において、パーソナルコンピュータ上で実行中に最大1億のエントリをスケール可能であることが示された。本稿の目標は,解決可能なインスタンスの広さと規模を拡大することを目的として,このアルゴリズムをさらに強化することである。我々はアルゴリズムと同じ理論的保証を維持することができるが、より高速な計算を提供するいくつかの変種を探索する。我々は、異なるデータ構造、勾配降下ステップの加速、Blended Pairwise Conditional Gradientsアルゴリズムの利用について検討する。提案手法は, アルゴリズム設計の選択において, 様々なトレードオフを探索するために, 数値実験を行うものである。

The problem of tensor completion has applications in healthcare, computer vision, and other domains. However, past approaches to tensor completion have faced a tension in that they either have polynomial-time computation but require exponentially more samples than the information-theoretic rate, or they use fewer samples but require solving NP-hard problems for which there are no known practical algorithms. A recent approach, based on integer programming, resolves this tension for nonnegative tensor completion. It achieves the information-theoretic sample complexity rate and deploys the Blended Conditional Gradients algorithm, which requires a linear (in numerical tolerance) number of oracle steps to converge to the global optimum. The tradeoff in this approach is that, in the worst case, the oracle step requires solving an integer linear program. Despite this theoretical limitation, numerical experiments show that this algorithm can, on certain instances, scale up to 100 million entries while running on a personal computer. The goal of this paper is to further enhance this algorithm, with the intention to expand both the breadth and scale of instances that can be solved. We explore several variants that can maintain the same theoretical guarantees as the algorithm, but offer potentially faster computation. We consider different data structures, acceleration of gradient descent steps, and the use of the Blended Pairwise Conditional Gradients algorithm. We describe the original approach and these variants, and conduct numerical experiments in order to explore various tradeoffs in these algorithmic design choices.
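
The conditional-gradient idea mentioned in the abstract can be illustrated on a toy problem. The following is a minimal sketch of plain Frank-Wolfe, not the Blended Conditional Gradients variant and not the integer-programming oracle the paper relies on. It assumes a hypothetical objective (squared distance to a target vector) over the probability simplex, where the linear minimisation oracle reduces to picking the coordinate with the smallest gradient entry. The function name `frank_wolfe_simplex` is illustrative only.

```python
def frank_wolfe_simplex(target, n_iter=2000):
    """Minimise ||x - target||^2 over the probability simplex (toy example)."""
    n = len(target)
    x = [1.0 / n] * n  # start at the centre of the simplex
    for k in range(n_iter):
        # Gradient of the squared distance at the current iterate.
        grad = [2.0 * (xi - ti) for xi, ti in zip(x, target)]
        # Linear minimisation oracle: over the simplex, the minimiser of a
        # linear function is the vertex with the smallest gradient entry.
        best = min(range(n), key=lambda i: grad[i])
        gamma = 2.0 / (k + 2.0)  # standard open-loop step size
        # Convex combination of the current iterate and the chosen vertex.
        x = [(1.0 - gamma) * xi for xi in x]
        x[best] += gamma
    return x


if __name__ == "__main__":
    print(frank_wolfe_simplex([0.2, 0.3, 0.5]))
```

Each iterate stays feasible by construction because it is a convex combination of feasible points, which is why the method only ever needs the oracle and never a projection. In the paper's setting the oracle step is the expensive part, since in the worst case it requires solving an integer linear program.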

翻訳日:2022-11-30 15:35:54 公開日:2022-11-28

# 対話型学習(IGL)を用いた個人化リワード学習

Personalized Reward Learning with Interaction-Grounded Learning (IGL) ( http://arxiv.org/abs/2211.15823v1 )

ライセンス: Link先を確認

Jessica Maghakian, Paul Mineiro, Kishan Panaganti, Mark Rucker, Akanksha Saran, Cheng Tan

(参考訳) 数え切れないほどのコンテンツ提供の時代に、レコメンダシステムはユーザーにパーソナライズされたコンテンツ提案を提供することで、情報の過負荷を軽減する。明示的なユーザフィードバックが不足しているため、現代のレコメンデータシステムは一般的に、すべてのユーザに対する暗黙的なフィードバック信号の固定的な組み合わせを最適化する。しかし、このアプローチは、それを強調している仕事の体を無視している。 (i)暗黙の信号は、ユーザの満足感からアクティブな嫌悪まで、様々な方法で使用することができる。 (ii)異なるユーザーが異なる方法で好みを伝える。多様なユーザ・コミュニケーション・モダリティの学習表現の課題に対処するために,近年のインタラクション・グラウンドド・ラーニング(IGL)パラダイムを適用することを提案する。固定された人間設計の報酬関数を取るのではなく、IGLは異なるユーザーに対してパーソナライズされた報酬関数を学習し、潜在ユーザの満足度を直接最適化することができる。シミュレーションおよび実世界の生産トレースを用いた実験により,IGLの成功例を示す。

In an era of countless content offerings, recommender systems alleviate information overload by providing users with personalized content suggestions. Due to the scarcity of explicit user feedback, modern recommender systems typically optimize for the same fixed combination of implicit feedback signals across all users. However, this approach disregards a growing body of work highlighting that (i) implicit signals can be used by users in diverse ways, signaling anything from satisfaction to active dislike, and (ii) different users communicate preferences in different ways. We propose applying the recent Interaction Grounded Learning (IGL) paradigm to address the challenge of learning representations of diverse user communication modalities. Rather than taking a fixed, human-designed reward function, IGL is able to learn personalized reward functions for different users and then optimize directly for the latent user satisfaction. We demonstrate the success of IGL with experiments using simulations as well as with real-world production traces.

翻訳日:2022-11-30 15:27:28 公開日:2022-11-28

# SuS-X: 視覚言語モデルの訓練自由名専用転送

SuS-X: Training-Free Name-Only Transfer of Vision-Language Models ( http://arxiv.org/abs/2211.16198v1 )

ライセンス: Link先を確認

Vishaal Udandarao, Ankush Gupta, Samuel Albanie

(参考訳) Contrastive Language-Image Pre-Training (CLIP) は、大規模な視覚言語モデルを訓練するための単純かつ効果的な方法として登場した。 CLIPは、さまざまな下流タスクに対する印象的なゼロショットの分類と検索を示す。しかし、その潜在能力を最大限活用するためには、微調整が必要であるようだ。クリップモデル全体の微調整はリソース集約的で不安定です。さらに、このような微調整を回避しようとする最近の手法では、ターゲット分布からの画像にアクセスする必要がある。本稿では,異なるアプローチを追求し,ダウンストリームタスクに関する知識が下流のターゲットカテゴリの名前のみを含む,トレーニングフリーな"名前のみの転送"の仕組みを検討する。本稿では,SuSとTIP-Xという2つの重要なビルディングブロックで構成されるSuS-Xを提案する。 SuS-Xは19のベンチマークデータセットで最先端のゼロショット分類結果を達成する。また,TIP-Xをトレーニング不要な複数ショット設定で有効性を示すとともに,トレーニング不要なベースラインの強化に対して,最先端の結果が得られた。コードはhttps://github.com/vishaal27/SuS-Xで入手できる。

Contrastive Language-Image Pre-training (CLIP) has emerged as a simple yet effective way to train large-scale vision-language models. CLIP demonstrates impressive zero-shot classification and retrieval on diverse downstream tasks. However, to leverage its full potential, fine-tuning still appears to be necessary. Fine-tuning the entire CLIP model can be resource-intensive and unstable. Moreover, recent methods that aim to circumvent this need for fine-tuning still require access to images from the target distribution. In this paper, we pursue a different approach and explore the regime of training-free "name-only transfer" in which the only knowledge we possess about the downstream task comprises the names of downstream target categories. We propose a novel method, SuS-X, consisting of two key building blocks -- SuS and TIP-X, that requires neither intensive fine-tuning nor costly labelled data. SuS-X achieves state-of-the-art zero-shot classification results on 19 benchmark datasets. We further show the utility of TIP-X in the training-free few-shot setting, where we again achieve state-of-the-art results over strong training-free baselines. Code is available at https://github.com/vishaal27/SuS-X.

翻訳日:2022-11-30 15:20:00 公開日:2022-11-28

# マルチヘッド蒸留による分散学習

Decentralized Learning with Multi-Headed Distillation ( http://arxiv.org/abs/2211.15774v1 )

ライセンス: Link先を確認

Andrey Zhmoginov and Mark Sandler and Nolan Miller and Gus Kristiansen and Max Vladymyrov

(参考訳) プライベートデータによる分散学習は、機械学習の中心的な問題である。本研究では,非iidデータを持つ複数のエージェントが,データ共有や重み付け,重み付けの更新を必要とせずに相互に学習できる,新たな蒸留型分散学習手法を提案する。提案手法は通信効率が高く,ラベルのない公開データセットを活用し,クライアント毎に複数の補助ヘッドを使用することで,異種データの場合のトレーニング効率を大幅に向上する。このアプローチにより、個々のモデルがプライベートタスクのパフォーマンスを保ち、向上すると同時に、グローバル集約されたデータ分散のパフォーマンスを劇的に改善することができる。我々は,データとモデルアーキテクチャの不均一性と,基礎となる通信グラフトポロジが学習効率に与える影響について検討し,エージェントが単独で学習するよりも性能を著しく向上できることを示す。

Decentralized learning with private data is a central problem in machine learning. We propose a novel distillation-based decentralized learning technique that allows multiple agents with private non-iid data to learn from each other, without having to share their data, weights or weight updates. Our approach is communication efficient, utilizes an unlabeled public dataset and uses multiple auxiliary heads for each client, greatly improving training efficiency in the case of heterogeneous data. This approach allows individual models to preserve and enhance performance on their private tasks while also dramatically improving their performance on the global aggregated data distribution. We study the effects of data and model architecture heterogeneity and the impact of the underlying communication graph topology on learning efficiency and show that our agents can significantly improve their performance compared to learning in isolation.

翻訳日:2022-11-30 15:19:42 公開日:2022-11-28

# 悪性オーバーフィッティング: 補間は不変性を証明可能に妨げうる

Malign Overfitting: Interpolation Can Provably Preclude Invariance ( http://arxiv.org/abs/2211.15724v1 )

ライセンス: Link先を確認

Yoav Wald, Gal Yona, Uri Shalit, Yair Carmon

(参考訳) 学習された分類器は、公平性、堅牢性、分散の一般化を促進するためのある種の不変性を持つべきである。しかし、近年の複数の研究により、共通不分散誘導正規化器は、分類器がトレーニングデータに完全に適合する(つまり補間する)過剰パラメータ化方式では有効ではないことが実証されている。これは、補間にもかかわらずモデルがうまく一般化する「良心過剰」現象が、堅牢性や公正性が望ましい設定にまで好ましくないことを示唆している。この研究では、これらの観測を理論的に正当化します。最も単純な設定であっても、任意の補間学習規則(任意にマージンが小さい)がこれらの不変性特性を満たさないことを証明します。そして、同じ設定で、証明可能な不変な非補間分類器をうまく学習するアルゴリズムを提案し、解析する。シミュレーションデータと水鳥データセットに関する理論的観察を検証する。

Learned classifiers should often possess certain invariance properties meant to encourage fairness, robustness, or out-of-distribution generalization. However, multiple recent works empirically demonstrate that common invariance-inducing regularizers are ineffective in the over-parameterized regime, in which classifiers perfectly fit (i.e., interpolate) the training data. This suggests that the phenomenon of "benign overfitting," in which models generalize well despite interpolating, might not favorably extend to settings in which robustness or fairness are desirable. In this work we provide a theoretical justification for these observations. We prove that -- even in the simplest of settings -- any interpolating learning rule (with arbitrarily small margin) will not satisfy these invariance properties. We then propose and analyze an algorithm that -- in the same setting -- successfully learns a non-interpolating classifier that is provably invariant. We validate our theoretical observations on simulated data and the Waterbirds dataset.

翻訳日:2022-11-30 15:10:02 公開日:2022-11-28

# 対人ロバスト性が精度差に及ぼす影響の理解

Understanding the Impact of Adversarial Robustness on Accuracy Disparity ( http://arxiv.org/abs/2211.15762v1 )

ライセンス: Link先を確認

Yuzheng Hu, Fan Wu, Hongyang Zhang, Han Zhao

(参考訳) 敵対的ロバスト性は標準的な精度に反する可能性があり、異なるクラスにさらに異なる影響を与える可能性があることは、長い間実証されてきたが、そのような観察がどの程度の程度で、クラスの不均衡が内部でどのように役割を果たすのかについては、未解決の問題である。本稿では,ガウス混合モデルの下で線形分類器を詳しく検討することにより,この精度格差の問題を解明しようとする。本研究は, 対向ロバスト性の影響を, 全クラスにおける標準精度を低下させる固有の効果と, 標準トレーニングと比較して精度の相違を増大させるクラス不均衡比の2つに分解する。さらに、我々のモデルを安定分布の一般族に拡張する。対向ロバスト性の制約は、バランスの取れたクラス設定における標準精度を常に低下させるが、クラス不均衡比は、安定分布の重みのため、ガウスの場合と比較して、精度の相違において根本的に異なる役割を果たす。合成データセットと実世界のデータセットの両方で実験を行う。実験結果は、理論的な知見を裏付けるだけでなく、実世界のデータセットよりも非線形モデルにも影響が及ぶ可能性を示唆している。

While it has long been empirically observed that adversarial robustness may be at odds with standard accuracy and may have further disparate impacts on different classes, it remains an open question to what extent such observations hold and how the class imbalance plays a role within. In this paper, we attempt to understand this question of accuracy disparity by taking a closer look at linear classifiers under a Gaussian mixture model. We decompose the impact of adversarial robustness into two parts: an inherent effect that will degrade the standard accuracy on all classes, and the other caused by the class imbalance ratio, which will increase the accuracy disparity compared to standard training. Furthermore, we also extend our model to the general family of stable distributions. We demonstrate that while the constraint of adversarial robustness consistently degrades the standard accuracy in the balanced class setting, the class imbalance ratio plays a fundamentally different role in accuracy disparity compared to the Gaussian case, due to the heavy tail of the stable distribution. We additionally perform experiments on both synthetic and real-world datasets. The empirical results not only corroborate our theoretical findings, but also suggest that the implications may extend to nonlinear models over real-world datasets.
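
To make the notion of accuracy disparity concrete, here is a minimal sketch under simplifying assumptions that are not taken from the paper: class +1 is drawn from N(+mu, sigma^2 I), class -1 from N(-mu, sigma^2 I), the classifier is sign(w.x + b), and the adversary may shift each input by at most `eps` in the l-infinity norm. Under those assumptions the per-class accuracies have a closed form in terms of the normal CDF. The function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def class_accuracies(w, b, mu, sigma, eps=0.0):
    """Return (accuracy on class +1, accuracy on class -1).

    With eps > 0 these are robust accuracies: the worst-case l-infinity
    perturbation lowers the margin by eps * ||w||_1 for both classes.
    """
    dot = sum(wi * mi for wi, mi in zip(w, mu))
    l2 = math.sqrt(sum(wi * wi for wi in w))
    l1 = sum(abs(wi) for wi in w)
    acc_pos = normal_cdf((dot + b - eps * l1) / (sigma * l2))
    acc_neg = normal_cdf((dot - b - eps * l1) / (sigma * l2))
    return acc_pos, acc_neg


if __name__ == "__main__":
    # A bias that favours class +1 creates a gap between the two classes.
    pos, neg = class_accuracies([1.0], 0.5, [1.0], 1.0, eps=0.25)
    print(pos, neg, abs(pos - neg))
```

In this toy setting a nonzero bias `b`, such as one induced by class imbalance, raises accuracy on one class and lowers it on the other, while `eps` lowers both. The paper's analysis of how the two effects interact is more general than this sketch.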

翻訳日:2022-11-30 15:09:46 公開日:2022-11-28

# Ollivier's Ricci Curvature を用いたオーバースムーシングとオーバースキャッシングの再検討

Revisiting Over-smoothing and Over-squashing using Ollivier's Ricci Curvature ( http://arxiv.org/abs/2211.15779v1 )

ライセンス: Link先を確認

Khang Nguyen and Tan Nguyen and Nhat Ho and Khuong Nguyen and Hieu Nong and Vinh Nguyen

(参考訳) グラフニューラルネットワーク(GNN)は、オーバースムーシングとオーバースキャッシングの問題に本質的に感受性があることが示されている。これらの問題は、GNNが遠隔情報を考慮した複雑なグラフ相互作用をモデル化することを禁じている。本研究は,局所グラフ幾何学とこれら2つの問題の発生との間の鍵となる関係を明らかにし,Ollivier's Ricci曲率を用いて局所的に研究するための統一的な枠組みを提供する。この理論に基づき, 過剰なスムーシングと過剰なスケーシングの問題を緩和するために, 多数の原理的手法が提案されている。

Graph Neural Networks (GNNs) have been shown to be inherently susceptible to the problems of over-smoothing and over-squashing. These issues limit the ability of GNNs to model complex graph interactions by reducing their effectiveness at taking distant information into account. Our study reveals the key connection between the local graph geometry and the occurrence of both of these issues, thereby providing a unified framework for studying them at a local scale using Ollivier's Ricci curvature. Based on our theory, a number of principled methods are proposed to alleviate the over-smoothing and over-squashing issues.
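
For readers unfamiliar with the curvature notion named above, its standard definition is compact. The notation here is an assumption made for illustration and is not taken from the paper: $x$ and $y$ are distinct nodes at graph distance $d(x, y)$, $m_x$ and $m_y$ are probability measures supported on their respective neighbourhoods, and $W_1$ is the Wasserstein-1 (earth mover's) distance.

```latex
\kappa(x, y) = 1 - \frac{W_1(m_x, m_y)}{d(x, y)}
```

The exact choice of $m_x$ (for example, uniform over the neighbours of $x$, possibly with some mass kept at $x$ itself) varies between works, so the paper should be consulted for the convention it uses.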

翻訳日:2022-11-30 15:09:22 公開日:2022-11-28

# CLAS: 中央潜在アクションスペースによるマルチロボット操作のコーディネート

CLAS: Coordinating Multi-Robot Manipulation with Central Latent Action Spaces ( http://arxiv.org/abs/2211.15824v1 )

ライセンス: Link先を確認

Elie Aljalbout and Maximilian Karl and Patrick van der Smagt

(参考訳) マルチロボット操作タスクは、動的に独立した部分に分割することができる様々な制御エンティティを含む。そのような現実世界のタスクの典型的な例はデュアルアーム操作である。このようなタスクを強化学習でナビゲート的に解くことは、アクションと状態空間の次元とともに成長するサンプルの複雑さと探索要求のため、しばしば実現不可能である。代わりに、マルチエージェントシステムのような環境を扱い、エージェントが全体を制御するようにしたいと考えています。しかし、アクションの生成を分散化するには、タスクの中心となる情報に制限されたチャネルを通じてエージェント間の調整が必要である。本稿では,異なるエージェント間で共有される学習された潜在行動空間を通じて,マルチロボット操作を協調する手法を提案する。シミュレーションによるマルチロボット操作タスクにおいて,本手法を検証し,サンプル効率と学習性能の観点から,従来のベースラインよりも改善することを示す。

Multi-robot manipulation tasks involve various control entities that can be separated into dynamically independent parts. A typical example of such real-world tasks is dual-arm manipulation. Naively learning to solve such tasks with reinforcement learning is often infeasible because the sample complexity and exploration requirements grow with the dimensionality of the action and state spaces. Instead, we would like to handle such environments as multi-agent systems and have several agents control parts of the whole. However, decentralizing the generation of actions requires coordination across agents through a channel limited to information central to the task. This paper proposes an approach to coordinating multi-robot manipulation through learned latent action spaces that are shared across different agents. We validate our method in simulated multi-robot manipulation tasks and demonstrate improvement over previous baselines in terms of sample efficiency and learning performance.

翻訳日:2022-11-30 15:01:59 公開日:2022-11-28

# マーケティングにおける資源配分問題に対する直接的不均一因果学習

Direct Heterogeneous Causal Learning for Resource Allocation Problems in Marketing ( http://arxiv.org/abs/2211.15728v1 )

ライセンス: Link先を確認

Hao Zhou, Shaoming Li, Guibin Jiang, Jiaqi Zheng and Dong Wang

(参考訳) マーケティングは、ユーザのエンゲージメントを高め、プラットフォーム収益を改善するための重要なメカニズムであり、不均一な因果学習は、より効果的な戦略の開発に役立つ。マーケティングにおける意思決定問題は資源配分問題として定式化され、数十年にわたって研究されてきた。既存の作業は通常、解法を2つの完全に分離された段階、すなわち機械学習(ML)とオペレーションリサーチ(OR)に分割する。しかし、MLにおける予測パラメータの誤差は尊重されず、ORにおける一連の複雑な数学的操作は累積誤差の増加につながる。本質的に、予測パラメータの精度向上は、デカップリング設計による副作用のため、最終解に正の相関を持たない可能性がある。本稿では,資源割当問題を解決し,副作用を緩和するための新しい手法を提案する。我々の重要な直感は、MLとOR間のブリッジを確立するための決定因子を導入し、決定因子のソートや比較操作のみを実行することで、OR内で直接解を得ることができることです。さらに,決定要因に対して直接的不均質因果学習を行うようにカスタマイズした損失関数を設計し,損失が収束した場合の偏りのない推定を行う。ケーススタディでは,2次処理代入問題と複数処理による予算配分問題という,マーケティングにおける重要な2つの問題にアプローチを適用した。大規模シミュレーションとオンラインa/bテストの両方で,我々のアプローチが最先端の手法に比べて大幅に改善できることが示されている。

Marketing is an important mechanism to increase user engagement and improve platform revenue, and heterogeneous causal learning can help develop more effective strategies. Most decision-making problems in marketing can be formulated as resource allocation problems and have been studied for decades. Existing works usually divide the solution procedure into two fully decoupled stages, i.e., machine learning (ML) and operations research (OR) -- the first stage predicts the model parameters, which are then fed to the optimization in the second stage. However, the error of the parameters predicted by ML is not accounted for, and the series of complex mathematical operations in OR increases the accumulated error. Essentially, improved precision of the predicted parameters may not translate into a better final solution because of the side effects of the decoupled design. In this paper, we propose a novel approach for solving resource allocation problems that mitigates these side effects. Our key intuition is to introduce a decision factor that establishes a bridge between ML and OR, such that the solution can be obtained directly in OR by performing only sorting or comparison operations on the decision factor. Furthermore, we design a customized loss function that can conduct direct heterogeneous causal learning on the decision factor, an unbiased estimation of which can be guaranteed when the loss converges. As a case study, we apply our approach to two crucial problems in marketing: the binary treatment assignment problem and the budget allocation problem with multiple treatments. Both large-scale simulations and online A/B tests demonstrate that our approach achieves significant improvement compared with the state of the art.
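
The claim that a solution can be obtained by sorting alone can be illustrated with a small greedy sketch. Everything specific here is an assumption made for illustration: the paper learns its decision factor with a customized loss, whereas this example simply uses a hypothetical uplift-per-cost ratio, and the tuple layout and function name are invented.

```python
def allocate_budget(candidates, budget):
    """Greedy binary treatment assignment under a budget.

    candidates: list of (user_id, expected_uplift, cost) with cost > 0.
    The ratio expected_uplift / cost stands in for a decision factor.
    """
    ranked = sorted(candidates, key=lambda c: c[1] / c[2], reverse=True)
    chosen, spent = [], 0.0
    for user_id, uplift, cost in ranked:
        if uplift <= 0:
            break  # everything after this point has a non-positive ratio
        if spent + cost <= budget:
            chosen.append(user_id)
            spent += cost
    return chosen, spent


if __name__ == "__main__":
    users = [("a", 3.0, 1.0), ("b", 4.0, 4.0), ("c", 2.0, 1.0), ("d", -1.0, 1.0)]
    print(allocate_budget(users, budget=3.0))
```

Once a per-user score is available, the optimisation step reduces to a sort and a single pass, so any error in the score affects the decision directly rather than being compounded by a separate solver.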

翻訳日:2022-11-30 15:01:26 公開日:2022-11-28

# VideoFACT: 注意、シーンコンテキスト、法医学的トレースを用いたビデオ偽造の検出

VideoFACT: Detecting Video Forgeries Using Attention, Scene Context, and Forensic Traces ( http://arxiv.org/abs/2211.15775v1 )

ライセンス: Link先を確認

Tai D. Nguyen, Shengbang Fang, Matthew C. Stamm

(参考訳) フェイクビデオは重要な誤報の脅威だ。既存の法医学的ネットワークは画像偽造に強いパフォーマンスを示しているが、最近のAdobe VideoShamデータセットの報告によると、これらのネットワークはビデオ内の偽のコンテンツを識別できない。本稿では,多種多様なビデオの偽造や操作を検知・ローカライズできる新しいネットワークを提案する。既存のネットワークがビデオ解析時に直面する課題を克服するため,本ネットワークは,操作によって残される痕跡を捕捉する法医学的埋め込みと,局所的なシーン内容に対する法医学的トレースの条件付き依存関係を利用するコンテキスト埋め込み,深層でトランスフォーマーベースの注意機構による空間的注意の両方を利用する。いくつかの新しいビデオフォージェリーデータセットを作成し、これらを公開データとともに使用して、ネットワークのパフォーマンスを実験的に評価する。これらの結果から,提案するネットワークは,訓練中に遭遇しないものを含む多様なビデオ偽造を識別できることがわかった。さらに,本研究の結果は,画像鑑定ネットワークがビデオ中の偽コンテンツをほとんど特定できないという最近の知見を裏付けるものである。

Fake videos represent an important misinformation threat. While existing forensic networks have demonstrated strong performance on image forgeries, recent results reported on the Adobe VideoSham dataset show that these networks fail to identify fake content in videos. In this paper, we propose a new network that is able to detect and localize a wide variety of video forgeries and manipulations. To overcome challenges that existing networks face when analyzing videos, our network utilizes forensic embeddings to capture traces left by manipulation, context embeddings to exploit forensic traces' conditional dependencies upon local scene content, and spatial attention provided by a deep, transformer-based attention mechanism. We create several new video forgery datasets and use these, along with publicly available data, to experimentally evaluate our network's performance. These results show that our proposed network is able to identify a diverse set of video forgeries, including those not encountered during training. Furthermore, our results reinforce recent findings that image forensic networks largely fail to identify fake content in videos.

翻訳日:2022-11-30 14:52:32 公開日:2022-11-28

# 地理空間探索のためのビジュアルアクティブ検索フレームワーク

A Visual Active Search Framework for Geospatial Exploration ( http://arxiv.org/abs/2211.15788v1 )

ライセンス: Link先を確認

Anindya Sarkar, Michael Lanier, Scott Alfeld, Roman Garnett, Nathan Jacobs, Yevgeniy Vorobeychik

(参考訳) 多くの問題は航空画像による地理空間探索の一種と見なすことができ、例えば、密猟活動の検出から人身売買まで多岐にわたる。本研究では,視覚的能動探索(VAS)フレームワークを用いて,広い領域のイメージを入力とし,対象対象物のできるだけ多くの例を特定することを目的とする。これはクエリの限られたシーケンスを通じて行われ、それぞれが与えられた領域にサンプルが存在するかどうかを検証する。本稿では,完全注釈付き検索タスクの集合を学習データとして活用し,検索方針を学習し,入力画像の特徴と能動検索状態の自然な表現を組み合わせる,vasのための強化学習手法を提案する。さらに,VASタスクのテスト時間分布を完全に反映していない場合の判定時のポリシー改善のためのドメイン適応手法を提案する。複数の衛星画像データセットに関する広範囲な実験を通じて,提案手法が複数の強力なベースラインを上回ることを示した。コードとデータは公開されます。

Many problems can be viewed as forms of geospatial search aided by aerial imagery, with examples ranging from detecting poaching activity to human trafficking. We model this class of problems in a visual active search (VAS) framework, which takes as input an image of a broad area, and aims to identify as many examples of a target object as possible. It does this through a limited sequence of queries, each of which verifies whether an example is present in a given region. We propose a reinforcement learning approach for VAS that leverages a collection of fully annotated search tasks as training data to learn a search policy, and combines features of the input image with a natural representation of active search state. Additionally, we propose domain adaptation techniques to improve the policy at decision time when training data is not fully reflective of the test-time distribution of VAS tasks. Through extensive experiments on several satellite imagery datasets, we show that the proposed approach significantly outperforms several strong baselines. Code and data will be made public.

翻訳日:2022-11-30 14:52:10 公開日:2022-11-28

# 微分可能な辞書を用いた信号混合の確率論的モデル化

Probabilistic Modelling of Signal Mixtures with Differentiable Dictionaries ( http://arxiv.org/abs/2211.15439v1 )

ライセンス: Link先を確認

Lukáš Samuel Marták, Rainer Kelz, Gerhard Widmer

(参考訳) 我々は,事前情報を(半)教師付き非負の行列分解に組み込む新しい手法を導入し,これを微分可能な辞書探索と呼ぶ。これは、非線形ソースが線形に混合される混合の一般的な、柔軟で原理的なモデリングを可能にする。音声分解タスクにおけるその動作を解析し、そのモデリング能力に関する広範囲かつ高度に制御された研究を行う。

We introduce a novel way to incorporate prior information into (semi-) supervised non-negative matrix factorization, which we call differentiable dictionary search. It enables general, highly flexible and principled modelling of mixtures where non-linear sources are linearly mixed. We study its behavior on an audio decomposition task, and conduct an extensive, highly controlled study of its modelling capabilities.

翻訳日:2022-11-29 23:03:12 公開日:2022-11-28

# 大規模アレー通信における近接場チャネル推定--モデルに基づくディープラーニングアプローチ

Near-Field Channel Estimation for Extremely Large-Scale Array Communications: A model-based deep learning approach ( http://arxiv.org/abs/2211.15440v1 )

ライセンス: Link先を確認

Xiangyu Zhang and Zening Wang and Haiyang Zhang and Luxi Yang

(参考訳) 大規模MIMO(XL-MIMO)が将来無線通信の有望な技術として評価されている。 XL-MIMOの展開は、特に高周波帯において、従来の遠方界ではなく、近距離域にユーザーを配置させる。本稿では,XL-MIMO通信の近距離無線チャネルを推定するためのモデルに基づく効率的なディープラーニングアルゴリズムを提案する。特に,XL-MIMO近距離チャネル推定タスクを空間グリッド型スペーシング辞書を用いて圧縮センシング問題として定式化し,学習反復収縮・保持アルゴリズム(LISTA)を適用して結果の問題を解決する。近接場特性のため、空間グリッドに基づくスパース化辞書は、低いチャネル推定精度と重い計算負荷をもたらす可能性がある。この問題に対処するために、スペーサー辞書をニューラルネットワーク層として定式化し、LISTAニューラルネットワークに組み込む新しいスペーサー辞書学習LISTA(SDL-LISTA)アルゴリズムを提案する。その結果,提案手法は非学習ベンチマーク方式よりも優れており,sdl-listaは10倍の原子削減でlistaよりも優れた性能が得られることがわかった。

Extremely large-scale massive MIMO (XL-MIMO) has been regarded as a promising technology for future wireless communications. The deployment of XL-MIMO, especially at high-frequency bands, leads to users being located in the near-field region instead of the conventional far-field region. This letter proposes efficient model-based deep learning algorithms for estimating the near-field wireless channel of XL-MIMO communications. In particular, we first formulate the XL-MIMO near-field channel estimation task as a compressed sensing problem using the spatial gridding-based sparsifying dictionary, and then solve the resulting problem by applying the Learning Iterative Shrinkage and Thresholding Algorithm (LISTA). Due to the near-field characteristic, the spatial gridding-based sparsifying dictionary may result in low channel estimation accuracy and a heavy computational burden. To address this issue, we further propose a new sparsifying dictionary learning-LISTA (SDL-LISTA) algorithm that formulates the sparsifying dictionary as a neural network layer and embeds it into the LISTA neural network. The numerical results show that our proposed algorithms outperform non-learning benchmark schemes, and SDL-LISTA achieves better performance than LISTA with a tenfold reduction in the number of atoms.
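
LISTA is built by unrolling the classical iterative shrinkage-thresholding algorithm (ISTA) and replacing its fixed matrices and threshold with learned, per-layer parameters. The sketch below shows only the classical, non-learned ISTA iteration for the objective 0.5 * ||y - A x||^2 + lam * ||x||_1, written in plain Python on real-valued lists. It is an illustration under those assumptions, not the authors' channel estimator, which works with complex-valued near-field dictionaries.

```python
def soft_threshold(v, t):
    """Shrink v towards zero by t (the proximal operator of t * |v|)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista(A, y, lam, step, n_iter=100):
    """Classical ISTA for 0.5 * ||y - A x||^2 + lam * ||x||_1.

    A is a list of rows; step should not exceed 1 / L, where L is the
    largest eigenvalue of A^T A.
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(n_iter):
        # Residual y - A x.
        residual = [y[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        # Gradient step: x + step * A^T residual.
        moved = [x[j] + step * sum(A[i][j] * residual[i] for i in range(m))
                 for j in range(n)]
        # Shrinkage step enforces sparsity.
        x = [soft_threshold(v, step * lam) for v in moved]
    return x


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(ista(identity, [1.0, 0.05], lam=0.1, step=1.0, n_iter=5))
```

In a learned variant, each loop iteration becomes a network layer whose weights are trained from data. According to the abstract, the SDL-LISTA proposal goes one step further and treats the dictionary itself as a trainable layer.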

翻訳日:2022-11-29 23:03:07 公開日:2022-11-28

# 微分可能な辞書探索:音源分離のための線形混合と深部非線形モデルの統合

Differentiable Dictionary Search: Integrating Linear Mixing with Deep Non-Linear Modelling for Audio Source Separation ( http://arxiv.org/abs/2211.15524v1 )

ライセンス: Link先を確認

Lukáš Samuel Marták, Rainer Kelz, Gerhard Widmer

(参考訳) 本稿では,微分可能な辞書検索 (DDS) の名称で最近定式化した信号分解法の改良について述べる。 DDSの基本的な考え方は、正規化フローと呼ばれる強力な非可逆密度推定器のクラスを利用して、辞書をNMFのような線形分解法でモデル化し、辞書要素の空間と関連する確率空間の間のビジェクションを効果的に生成し、推定密度で導かれる辞書空間を通して微分可能な探索を可能にすることである。最初の定式化は、いくつかの実用的な制限のある概念実証であり、我々は、この手法の計算複雑性と信号分解能力の両方を改善するために、拡張性を高めるためのいくつかのステップを示す。実験的な評価のためのテストベッドとして,個々のピアノ音符に起因した音源に信号が分解されるフレームレベルピアノの書き起こしのタスクを選択する。音源の非線形モデリングの改善による影響を明らかにするため,提案手法の変種を線形オーバーコンプリートNMFベースラインと比較した。実験の結果、追加の制約がなくても、2つの関連する評価尺度により、モデルがより疎弱で正確な分解を生じていることが示される。

This paper describes several improvements to a new method for signal decomposition that we recently formulated under the name of Differentiable Dictionary Search (DDS). The fundamental idea of DDS is to exploit a class of powerful deep invertible density estimators called normalizing flows, to model the dictionary in a linear decomposition method such as NMF, effectively creating a bijection between the space of dictionary elements and the associated probability space, allowing a differentiable search through the dictionary space, guided by the estimated densities. As the initial formulation was a proof of concept with some practical limitations, we will present several steps towards making it scalable, hoping to improve both the computational complexity of the method and its signal decomposition capabilities. As a testbed for experimental evaluation, we choose the task of frame-level piano transcription, where the signal is to be decomposed into sources whose activity is attributed to individual piano notes. To highlight the impact of improved non-linear modelling of sources, we compare variants of our method to a linear overcomplete NMF baseline. Experimental results will show that even in the absence of additional constraints, our models produce increasingly sparse and precise decompositions, according to two pertinent evaluation measures.

翻訳日:2022-11-29 23:02:45 公開日:2022-11-28

# データ拡張による機械学習による外惑星検出

Exoplanet Detection by Machine Learning with Data Augmentation ( http://arxiv.org/abs/2211.15577v1 )

ライセンス: Link先を確認

Koray Aydoğan

(参考訳) 近年、深層学習はkepler \cite{borucki2010kepler} \cite{koch2010kepler} やnasaのtransiting exoplanet survey satellite (tess) \cite{ricker2010transiting} のような衛星からの光曲線データを用いて、太陽系外惑星検出パイプラインの一部を自動化する重要な可能性を実証されている。残念ながら、利用可能なデータセットの小さいため、強力なネットワークアーキテクチャから期待されるパフォーマンスレベルを実現するのは難しい。本稿では,外惑星を識別するために,ニューラルネットワークを訓練するための光曲線データに対するデータ拡張手法について検討する。 Augmentation Technique は2つのクラスから構成される: 単純(例:加法的雑音増大)と学習ベース(例: GAN \cite{goodfellow2020generative} を訓練して新しい例を生成する)。我々は、データ拡張が外惑星検出問題におけるモデル性能を向上させる可能性を実証し、より多くのデータが利用可能になるにつれて、生成モデルに基づく拡張の利用を推奨する。

It has recently been demonstrated that deep learning has significant potential to automate parts of the exoplanet detection pipeline using light curve data from satellites such as Kepler \cite{borucki2010kepler} \cite{koch2010kepler} and NASA's Transiting Exoplanet Survey Satellite (TESS) \cite{ricker2010transiting}. Unfortunately, the small size of the available datasets makes it difficult to realize the level of performance one expects from powerful network architectures. In this paper, we investigate the use of data augmentation techniques on light curve data to train neural networks to identify exoplanets. The augmentation techniques used are of two classes: simple (e.g., additive noise augmentation) and learning-based (e.g., first training a GAN \cite{goodfellow2020generative} to generate new examples). We demonstrate that data augmentation has the potential to improve model performance for the exoplanet detection problem, and recommend the use of augmentation based on generative models as more data becomes available.
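
The "simple" class of augmentation mentioned in the abstract can be sketched in a few lines. This is a minimal illustration assuming a light curve is a plain list of flux values and that independent Gaussian noise is an acceptable perturbation. The function name, the noise level and the toy transit shape are hypothetical and not taken from the paper.

```python
import random


def augment_with_noise(flux, sigma, n_copies, seed=0):
    """Return n_copies noisy versions of a light curve.

    Each copy adds independent Gaussian noise with standard deviation
    sigma to every flux sample; the seed makes the output reproducible.
    """
    rng = random.Random(seed)
    return [[f + rng.gauss(0.0, sigma) for f in flux] for _ in range(n_copies)]


if __name__ == "__main__":
    # A toy transit: flux dips briefly while the planet crosses the star.
    light_curve = [1.0, 0.99, 0.95, 0.99, 1.0]
    for copy in augment_with_noise(light_curve, sigma=0.005, n_copies=3):
        print([round(v, 4) for v in copy])
```

The noise level should stay well below the transit depth, otherwise the augmented copies may no longer carry the label of the original curve.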

翻訳日:2022-11-29 23:02:23 公開日:2022-11-28

# 量子およびハイブリッドアルゴリズムを用いたシミュレーションおよび物理量子処理ユニットのベンチマーク

Benchmarking simulated and physical quantum processing units using quantum and hybrid algorithms ( http://arxiv.org/abs/2211.15631v1 )

ライセンス: Link先を確認

Mohammad Kordzanganeh, Markus Buchberger, Maxim Povolotskii, Wilhelm Fischer, Andrii Kurkin, Wilfrid Somogyi, Asel Sagingalieva, Markus Pflitsch, Alexey Melnikov

(参考訳) 強力なハードウェアサービスとソフトウェアライブラリは、量子アルゴリズムを迅速に設計、テスト、実行するための必須のツールである。これらのプラットフォームのパフォーマンスがキュービット数でどのようにスケールするかに関する堅牢な大規模研究は、業界問題に対する量子ソリューションを提供する上で鍵となる。このような評価は、物理量子処理ユニットの可用性と価格のために難しい。この作業は、特殊な高性能シミュレーションおよび物理量子処理ユニットの代表的なサンプルのランタイムと精度をベンチマークする。その結果、QMwareクラウドコンピューティングサービスは、27キュービット未満のアルゴリズムの次の最速オプションと比較して、量子回路の実行ランタイムを最大78%削減できることがわかった。 AWS SV1シミュレータは、SV1で利用可能な最大34キュービットまでの大きな回路に対して、ランタイム上のアドバンテージを提供する。この制限を超えて、QMwareは40キュービットの回路を実行する機能を提供する。 RigettiのAspen-M2のような物理量子デバイスは、30以上の回路に対して指数的ランタイムの利点を提供することができる。しかし、物理的量子処理ユニットの高コストは、実用化への深刻な障壁となっている。さらに、試験された4つの量子デバイスのうち、IonQのHarmonyのみが4ビット以上の高忠実性を達成する。この研究は、実用的な量子アルゴリズムを実行するための利用可能なソフトウェアとハードウェアの最適な組み合わせを理解する方法を示している。

Powerful hardware services and software libraries are vital tools for quickly and affordably designing, testing, and executing quantum algorithms. A robust large-scale study of how the performance of these platforms scales with the number of qubits is key to providing quantum solutions to challenging industry problems. Such an evaluation is difficult owing to the availability and price of physical quantum processing units. This work benchmarks the runtime and accuracy for a representative sample of specialized high-performance simulated and physical quantum processing units. Results show the QMware cloud computing service can reduce the runtime for executing a quantum circuit by up to 78% compared to the next fastest option for algorithms with fewer than 27 qubits. The AWS SV1 simulator offers a runtime advantage for larger circuits, up to the maximum 34 qubits available with SV1. Beyond this limit, QMware provides the ability to execute circuits as large as 40 qubits. Physical quantum devices, such as Rigetti's Aspen-M2, can provide an exponential runtime advantage for circuits with more than 30 qubits. However, the high financial cost of physical quantum processing units presents a serious barrier to practical use. Moreover, of the four quantum devices tested, only IonQ's Harmony achieves high fidelity with more than four qubits. This study paves the way to understanding the optimal combination of available software and hardware for executing practical quantum algorithms.

翻訳日:2022-11-29 23:01:59 公開日:2022-11-28

# RAMP:分散ディープラーニングシステムのためのフラットナノ秒光ネットワークとMPI操作

RAMP: A Flat Nanosecond Optical Network and MPI Operations for Distributed Deep Learning Systems ( http://arxiv.org/abs/2211.15226v1 )

ライセンス: Link先を確認

Alessandro Ottino, Joshua Benjamin, Georgios Zervas

(参考訳) 分散ディープラーニング(DDL)システムはネットワーク性能に強く依存する。現在の電子パケット交換(EPS)ネットワークアーキテクチャと技術は、可変径トポロジー、低いバイセクション帯域幅、オーバーサブスクリプションに悩まされており、通信や集団操作の完了時間に影響する。我々は,大規模分散並列コンピューティングシステム(1ノードあたり12.8 Tbps,最大65,536ノード)をサポートする,ナノ秒再構成を備えたニアエクサスケール,フルバイセクション帯域幅,全対全,シングルホップの全光ネットワークアーキテクチャRAMPを導入する。光回路スイッチング(OCS)ネットワーク上でMPI集団操作をスケジュール不要かつ競合のない方法で実行するための,独自のRAMP-x MPI戦略とネットワークトランスコーダを初めて提案する。 RAMPは,現実的なEPSおよびOCSの対応構成と比較して,すべてのMPI操作の完了時間を7.6-171$\times$高速化する。また、MegatronとDLRMのトレーニング時間をそれぞれ1.3-16$\times$と7.8-58$\times$短縮し、エネルギー消費とコストをそれぞれ42-53$\times$と3.3-12.4$\times$改善できる。

Distributed deep learning (DDL) systems strongly depend on network performance. Current electronic packet switched (EPS) network architectures and technologies suffer from variable diameter topologies, low-bisection bandwidth and over-subscription affecting completion time of communication and collective operations. We introduce a near-exascale, full-bisection bandwidth, all-to-all, single-hop, all-optical network architecture with nanosecond reconfiguration called RAMP, which supports large-scale distributed and parallel computing systems (12.8~Tbps per node for up to 65,536 nodes). For the first time, a custom RAMP-x MPI strategy and a network transcoder are proposed to run MPI collective operations across the optical circuit switched (OCS) network in a schedule-less and contention-less manner. RAMP achieves 7.6-171$\times$ speed-up in completion time across all MPI operations compared to realistic EPS and OCS counterparts. It can also deliver a 1.3-16$\times$ and 7.8-58$\times$ reduction in Megatron and DLRM training time respectively, while offering 42-53$\times$ and 3.3-12.4$\times$ improvement in energy consumption and cost respectively.

翻訳日:2022-11-29 23:01:40 公開日:2022-11-28

# 機械学習によるPDEバックステッピングオブザーバの高速化

Machine Learning Accelerated PDE Backstepping Observers ( http://arxiv.org/abs/2211.15044v1 )

ライセンス: Link先を確認

Yuanyuan Shi, Zongyi Li, Huan Yu, Drew Steeves, Anima Anandkumar, Miroslav Krstic

(参考訳) 状態推定は、予測からフィードバックコントローラの未測定状態の置換まで、さまざまなタスクにおいて重要である。 PDEのバックステッピングをベースとした観測者など、実証的かつ迅速に収束する観測者によるPDEのリアルタイム状態推定は、計算コストが高く、多くの場合禁止される。精度を保ちながらより高速な学習手法を用いてPDEオブザーバ計算を高速化するフレームワークを提案する。特に、最近開発されたフーリエニューラル演算子(FNO)を用いて、初期観測値と境界測定値から状態推定値への関数マッピングを学習する。特定の収束率を保証した前設計の観測者に対してバックステッピングオブザーバゲインを用いることで,fno による計算効率の向上を評価する数値実験を行う。まず, 反応拡散(パラボリック)PDEに対して, その状態が指数的収束率で推定される場合, および, パラボリックPDEに対して, 正確な所定時間推定を行う場合, および, 交通流密度と速度をモデル化する一階双曲PDEを結合した一階双曲PDEについて, 状態推定を行う。これらのPDEのシミュレーションデータセットで訓練されたML加速オブザーバは、古典的手法と比較して計算速度が最大で3桁向上する。これは、リアルタイム状態推定と制御のためのml加速オブザーバの魅力を示す。

State estimation is important for a variety of tasks, from forecasting to substituting for unmeasured states in feedback controllers. Performing real-time state estimation for PDEs using provably and rapidly converging observers, such as those based on PDE backstepping, is computationally expensive and in many cases prohibitive. We propose a framework for accelerating PDE observer computations using learning-based approaches that are much faster while maintaining accuracy. In particular, we employ the recently-developed Fourier Neural Operator (FNO) to learn the functional mapping from the initial observer state and boundary measurements to the state estimate. By employing backstepping observer gains for previously-designed observers with particular convergence rate guarantees, we provide numerical experiments that evaluate the increased computational efficiency gained with FNO. We consider the state estimation for three benchmark PDE examples motivated by applications: first, for a reaction-diffusion (parabolic) PDE whose state is estimated with an exponential rate of convergence; second, for a parabolic PDE with exact prescribed-time estimation; and, third, for a pair of coupled first-order hyperbolic PDEs that model traffic flow density and velocity. The ML-accelerated observers trained on simulation data sets for these PDEs achieve up to three orders of magnitude improvement in computational speed compared to classical methods. This demonstrates the attractiveness of the ML-accelerated observers for real-time state estimation and control.

翻訳日:2022-11-29 22:55:27 公開日:2022-11-28

# 深部平衡学習を用いた軽量・適応FDD質量型MIMO CSIフィードバック

Lightweight and Adaptive FDD Massive MIMO CSI Feedback with Deep Equilibrium Learning ( http://arxiv.org/abs/2211.15079v1 )

ライセンス: Link先を確認

Yifan Ma, Wentao Yu, Xianghao Yu, Jun Zhang, Shenghui Song, Khaled B. Letaief

(参考訳) 周波数分割複信(FDD)大規模MIMO(multiple-input multiple-output)システムでは、ダウンリンクチャネル状態情報(CSI)をユーザから基地局(BS)に送り返す必要があり、これが膨大なフィードバックオーバーヘッドを引き起こす。本稿では,深層平衡モデルを用いた軽量かつ適応的な深層学習に基づくCSIフィードバック方式を提案する。複数の明示的な層を積み重ねる既存のディープラーニングベースのアプローチとは異なり、無限深層ニューラルネットワークの過程を模倣する暗黙の平衡ブロックを提案する。特に、暗黙の平衡ブロックは固定点反復によって定義され、各イテレーションの訓練可能なパラメータは共有され、結果として軽量モデルとなる。さらに、ユーザの計算能力に応じて前方イテレーションの数を調整でき、オンラインの精度と効率のトレードオフを実現できる。シミュレーションの結果,提案手法は既存のベンチマークに匹敵する性能を,大幅に低い複雑さで達成し,実行時に精度・効率のトレードオフが可能であることが示された。

In frequency-division duplexing (FDD) massive multiple-input multiple-output (MIMO) systems, downlink channel state information (CSI) needs to be sent from users back to the base station (BS), which causes prohibitive feedback overhead. In this paper, we propose a lightweight and adaptive deep learning-based CSI feedback scheme by capitalizing on deep equilibrium models. Unlike existing deep learning-based approaches that stack multiple explicit layers, we propose an implicit equilibrium block to mimic the process of an infinite-depth neural network. In particular, the implicit equilibrium block is defined by a fixed-point iteration and the trainable parameters in each iteration are shared, which results in a lightweight model. Furthermore, the number of forward iterations can be adjusted according to the users' computational capability, achieving an online accuracy-efficiency trade-off. Simulation results show that the proposed method achieves performance comparable to the existing benchmarks with much-reduced complexity and permits an accuracy-efficiency trade-off at runtime.
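The weight-shared fixed-point iteration described in the abstract can be sketched in a few lines. This is only a toy illustration, not the paper's CSI feedback network: the map `tanh(weight * z + x)`, the scalar `weight`, and the names `equilibrium_block` and `residual` are assumptions chosen so that the iteration is a contraction.

```python
import math

def equilibrium_block(x, weight, num_iters):
    """Apply the same map z -> tanh(weight * z + x) num_iters times.

    The single parameter `weight` is reused in every iteration, so the
    parameter count does not grow with the number of iterations.
    """
    z = [0.0] * len(x)
    for _ in range(num_iters):
        z = [math.tanh(weight * zi + xi) for zi, xi in zip(z, x)]
    return z

def residual(z, x, weight):
    """Largest change one more iteration would make (0 at an exact fixed point)."""
    return max(abs(math.tanh(weight * zi + xi) - zi) for zi, xi in zip(z, x))

x = [0.3, -1.2, 2.0]   # hypothetical input features
weight = 0.5           # |weight| < 1 makes the map a contraction

shallow = equilibrium_block(x, weight, num_iters=3)    # cheaper, less accurate
deep = equilibrium_block(x, weight, num_iters=20)      # costlier, closer to equilibrium

print(residual(shallow, x, weight), residual(deep, x, weight))
```

Choosing `num_iters` at call time is what gives the accuracy-efficiency trade-off: a device with less compute stops earlier and accepts a larger residual.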

翻訳日:2022-11-29 22:55:01 公開日:2022-11-28

# 対向機械学習における非局所周波のガンマ収束

Gamma-convergence of a nonlocal perimeter arising in adversarial machine learning ( http://arxiv.org/abs/2211.15223v1 )

ライセンス: Link先を確認

Leon Bungert, Kerrek Stinson

(参考訳) 本稿では,ミンコフスキー型非局所周囲を局所異方性周囲に収束させるガンマコンバージェンスを証明する。非局所モデルは、二分分類における逆訓練の正規化効果を記述する。エネルギーは本質的に2つの分布間の相互作用に依存し、関連するクラスの確率をモデル化する。我々は、分布の典型的な厳密な規則性仮定を克服し、それらは$bv$ 密度を持つと仮定するだけである。コンパクト性から生じる自然トポロジーにおいて, 2つの密度の異方性関数によって決定される重み付き周囲にガンマ収束が証明される。局所的であるにもかかわらず、この鋭いインターフェイス制限は、対向摂動に関する分類安定性を反映している。さらに, 関連する全変動のガンマコンバージェンスを推定し, 逆訓練の漸近性について検討し, 非局所周囲におけるグラフ離散化のガンマコンバージェンスを証明する。

In this paper we prove Gamma-convergence of a nonlocal perimeter of Minkowski type to a local anisotropic perimeter. The nonlocal model describes the regularizing effect of adversarial training in binary classifications. The energy essentially depends on the interaction between two distributions modelling likelihoods for the associated classes. We overcome typical strict regularity assumptions for the distributions by only assuming that they have bounded $BV$ densities. In the natural topology coming from compactness, we prove Gamma-convergence to a weighted perimeter with weight determined by an anisotropic function of the two densities. Despite being local, this sharp interface limit reflects classification stability with respect to adversarial perturbations. We further apply our results to deduce Gamma-convergence of the associated total variations, to study the asymptotics of adversarial training, and to prove Gamma-convergence of graph discretizations for the nonlocal perimeter.

翻訳日:2022-11-29 22:54:43 公開日:2022-11-28

# 1次元畳み込みニューラルネットワークのリプシッツ定数推定

Lipschitz constant estimation for 1D convolutional neural networks ( http://arxiv.org/abs/2211.15253v1 )

ライセンス: Link先を確認

Patricia Pauli and Dennis Gramlich and Frank Allgöwer

(参考訳) 本研究では,1次元畳み込みニューラルネットワーク(CNN)のリプシッツ定数推定法を提案する。特に,畳み込み,プーリング,および完全連結層の分散特性を,非線形活性化関数とプーリング演算に漸進的2次制約を適用して解析する。これらの写像の連結のリプシッツ定数は、分離性理論から導かれる半定値のプログラムを解いて推定される。提案手法を極力効率的にするために,これらの有限インパルス応答フィルタを状態空間の因果力学系として実現し,状態空間実現のための分散解析を行うために,畳み込み層の構造を考慮に入れた。我々が提示した例は、我々のリプシッツ境界が正確性と拡張性の観点から有利であることを示している。

In this work, we propose a dissipativity-based method for Lipschitz constant estimation of 1D convolutional neural networks (CNNs). In particular, we analyze the dissipativity properties of convolutional, pooling, and fully connected layers, making use of incremental quadratic constraints for nonlinear activation functions and pooling operations. The Lipschitz constant of the concatenation of these mappings is then estimated by solving a semidefinite program which we derive from dissipativity theory. To make our method as efficient as possible, we take the structure of convolutional layers into account, realizing these finite impulse response filters as causal dynamical systems in state space and carrying out the dissipativity analysis for the state space realizations. The examples we provide show that our Lipschitz bounds are advantageous in terms of accuracy and scalability.

翻訳日:2022-11-29 22:54:29 公開日:2022-11-28

# 弱連結ネットワークシステムにおけるコヒーレントクラスタの学習

Learning Coherent Clusters in Weakly-Connected Network Systems ( http://arxiv.org/abs/2211.15301v1 )

ライセンス: Link先を確認

Hancheng Min and Enrique Mallada

(参考訳) 本稿では,密結合コンポーネントを用いた大規模動的ネットワークのための構造保存モデル還元手法を提案する。まず、コヒーレント群は、ネットワークフィードバックをモデル化したグラフラプラシア行列上のスペクトルクラスタリングアルゴリズムによって同定される。次に、各ノードが各コヒーレントグループの集合ダイナミクスを表すように縮小されたネットワークを構築し、還元されたネットワークがグループ間の動的結合をキャプチャする。重み付き確率ブロックモデルからネットワークグラフをランダムに生成する場合、近似誤差の上限を与える。最後に, 数値実験は理論的な知見と一致し, 検証する。

We propose a structure-preserving model-reduction methodology for large-scale dynamic networks with tightly-connected components. First, the coherent groups are identified by a spectral clustering algorithm on the graph Laplacian matrix that models the network feedback. Then, a reduced network is built, where each node represents the aggregate dynamics of each coherent group, and the reduced network captures the dynamic coupling between the groups. We provide an upper bound on the approximation error when the network graph is randomly generated from a weighted stochastic block model. Finally, numerical experiments align with and validate our theoretical findings.
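To make the first step concrete, here is a minimal sketch of spectral clustering for the special case of two groups, assuming a connected, undirected weighted graph. It is not the authors' algorithm: the function name `fiedler_partition`, the power-iteration approach, and the six-node example graph are illustrative assumptions. The nodes are split by the sign of the Fiedler vector, the eigenvector of the Laplacian $L = D - W$ with the second-smallest eigenvalue.

```python
import math

def fiedler_partition(weights, num_iters=200):
    """Split a connected weighted undirected graph into two groups.

    Runs power iteration on (shift * I - L), with L = D - W, while projecting
    out the constant vector, so the iterate converges to the Fiedler vector.
    """
    n = len(weights)
    degree = [sum(row) for row in weights]
    shift = 2.0 * max(degree)  # upper bound on the largest eigenvalue of L
    v = [float(i) for i in range(n)]
    for _ in range(num_iters):
        mean = sum(v) / n
        v = [x - mean for x in v]  # remove the component along the all-ones vector
        v = [(shift - degree[i]) * v[i]
             + sum(weights[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return [0 if x < 0 else 1 for x in v]

# Hypothetical example: two tightly connected triangles joined by one weak edge.
W = [
    [0, 1, 1, 0,   0, 0],
    [1, 0, 1, 0,   0, 0],
    [1, 1, 0, 0.1, 0, 0],
    [0, 0, 0.1, 0, 1, 1],
    [0, 0, 0, 1,   0, 1],
    [0, 0, 0, 1,   1, 0],
]
labels = fiedler_partition(W)
print(labels)
```

Each resulting group would then be collapsed into a single node of the reduced network; more than two groups would require several eigenvectors and a clustering step such as k-means on their rows.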

翻訳日:2022-11-29 22:54:17 公開日:2022-11-28

# 計算流体力学における機械学習の新興動向

Emerging trends in machine learning for computational fluid dynamics ( http://arxiv.org/abs/2211.15145v1 )

ライセンス: Link先を確認

Ricardo Vinuesa and Steve Brunton

(参考訳) 機械学習(ml)の科学コミュニティからの新たな関心は、多くの新しい研究分野を開いている。ここでは、計算流体力学(CFD)の分野を改善する機会を提供するMLの新たなトレンドに焦点を当てる。特に,すでに利益を示しているMLとCFDの相乗効果について論じるとともに,現在開発中であり,今後数年で重要な利益をもたらす可能性のある領域も評価する。我々は、これらの新興アプローチに対する慎重な楽観主義のバランスのとれた視点を強調することも重要であると信じている。

The renewed interest from the scientific community in machine learning (ML) is opening many new areas of research. Here we focus on how novel trends in ML are providing opportunities to improve the field of computational fluid dynamics (CFD). In particular, we discuss synergies between ML and CFD that have already shown benefits, and we also assess areas that are under development and may produce important benefits in the coming years. We believe that it is also important to emphasize a balanced perspective of cautious optimism for these emerging approaches.

翻訳日:2022-11-29 20:45:57 公開日:2022-11-28

# aquafel-pso:マルチモーダルpsoとフェデレーション学習に基づく自律型表面車両を用いた水資源モニタリングシステム

AquaFeL-PSO: A Monitoring System for Water Resources using Autonomous Surface Vehicles based on Multimodal PSO and Federated Learning ( http://arxiv.org/abs/2211.15217v1 )

ライセンス: Link先を確認

Micaela Jara Ten Kathen, Princy Johnson, Isabel Jurado Flores, Daniel Gutiérrez Reina

(参考訳) 水資源の保存、モニタリング、管理は、ここ数十年で大きな課題となっている。水資源は、水の汚染レベルを知るために常に監視されなければならない。この目的のため,本論文では,水質センサを備えた自律型表面車両を用い,マルチモーダル粒子群最適化と,ガウス過程をサロゲートモデルとするフェデレーション学習手法に基づく水監視システムであるAquaFeL-PSOアルゴリズムを提案する。提案するモニタリングシステムは,探索フェーズと活用フェーズの2つのフェーズを有する。探索フェーズでは、車両は水資源の表面を調べ、水質センサによって取得されたデータにより、第1の水質モデルが中央サーバで推定される。活用フェーズでは, 探索フェーズで推定したモデルを用いて, 領域をアクションゾーンに分割し, 汚染ゾーンをよりよく活用する。水資源の最終的な水質モデルを得るため、両方のフェーズで得られたモデルが組み合わされる。その結果,提案する経路プランナーは,汚染ゾーンの水質モデルにおいて他の経路プランナーと比較して14$\%$改善し,水資源全体においては400$\%$優れたモデルが得られ,汚染ピークの検出においても4,000$\%$の改善が得られた。また,フェデレート学習技術を適用した結果が,集中型システムの結果と非常によく似ていることも証明された。

The preservation, monitoring, and control of water resources have been a major challenge in recent decades. Water resources must be constantly monitored to know the contamination levels of water. To meet this objective, this paper proposes a water monitoring system using autonomous surface vehicles, equipped with water quality sensors, based on a multimodal particle swarm optimization, and the federated learning technique, with Gaussian process as a surrogate model, the AquaFeL-PSO algorithm. The proposed monitoring system has two phases, the exploration phase and the exploitation phase. In the exploration phase, the vehicles examine the surface of the water resource, and with the data acquired by the water quality sensors, a first water quality model is estimated in the central server. In the exploitation phase, the area is divided into action zones using the model estimated in the exploration phase for a better exploitation of the contamination zones. To obtain the final water quality model of the water resource, the models obtained in both phases are combined. The results demonstrate the efficiency of the proposed path planner: it improves the water quality models of the pollution zones by 14$\%$ over the other path planners compared, yields a 400$\%$ better model of the entire water resource, and improves the detection of pollution peaks by 4,000$\%$ in this case study. It was also proven that the results obtained by applying the federated learning technique are very similar to the results of a centralized system.

翻訳日:2022-11-29 20:45:49 公開日:2022-11-28

# 確率的シュテッフェンセン法

Stochastic Steffensen method ( http://arxiv.org/abs/2211.15310v1 )

ライセンス: Link先を確認

Minda Zhao, Zehua Lai, and Lek-Heng Lim

(参考訳) 一階法、すなわち、第一導関数のみが許される場合、二次収束することは可能であるか。一変数損失関数の場合、答えは yes である -- Steffensen 法は第二導関数を避け、ニュートン法のように二次収束する。最適なステップサイズを組み込むことで、収束次数を2次から$1+\sqrt{2} \approx 2.414$まで押し上げることもできる。このような高い収束次数は決定論的アルゴリズムでは無意味なオーバーキルであるが、ランダム化は必ず収束速度を損なうため、巨大なサイズの問題に対してアルゴリズムをランダム化する場合には有益となる。 Steffensen法にインスパイアされた2つの適応学習率を導入する。確率的最適化設定での使用を意図しており、バッチサイズ以外にハイパーパラメータチューニングは不要である。広範な実験により、既存のいくつかの一階法と比較して優れていることがわかった。二次目的に制限された場合、確率的シュテッフェンセン法はランダム化されたカッツマルツ法に還元される(これはSGD や SLBFGS には当てはまらない)。

Is it possible for a first-order method, i.e., only first derivatives allowed, to be quadratically convergent? For univariate loss functions, the answer is yes -- the Steffensen method avoids second derivatives and is still quadratically convergent like Newton's method. By incorporating an optimal step size we can even push its convergence order beyond quadratic to $1+\sqrt{2} \approx 2.414$. While such high convergence orders are a pointless overkill for a deterministic algorithm, they become rewarding when the algorithm is randomized for problems of massive sizes, as randomization invariably compromises convergence speed. We will introduce two adaptive learning rates inspired by the Steffensen method, which are intended for use in a stochastic optimization setting and require no hyperparameter tuning aside from batch size. Extensive experiments show that they compare favorably with several existing first-order methods. When restricted to a quadratic objective, our stochastic Steffensen methods reduce to the randomized Kaczmarz method -- note that this is not true for SGD or SLBFGS -- and thus we may also view our methods as a generalization of randomized Kaczmarz to arbitrary objectives.
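For readers unfamiliar with the classical method, the following is a minimal sketch of the deterministic, univariate Steffensen iteration applied to the derivative of a loss. It is not the stochastic method or the adaptive learning rates of the paper; the function name `steffensen`, the tolerance, and the example loss are assumptions made for illustration. The update $x \leftarrow x - g(x)^2 / (g(x + g(x)) - g(x))$ replaces the second derivative in Newton's method with a finite difference whose step is $g(x)$ itself.

```python
import math

def steffensen(g, x0, tol=1e-12, max_iter=50):
    """Find a root of g using only evaluations of g (no derivative of g)."""
    x = x0
    for _ in range(max_iter):
        gx = g(x)
        if abs(gx) < tol:
            break
        # g(x + gx) - gx, divided by gx, approximates g'(x).
        denom = g(x + gx) - gx
        if denom == 0:
            break
        x = x - gx * gx / denom
    return x

# Hypothetical univariate loss L(x) = (x - 2)**2 + exp(x); minimize via L'(x) = 0.
def grad(x):
    return 2.0 * (x - 2.0) + math.exp(x)

x_star = steffensen(grad, x0=0.0)
print(x_star, grad(x_star))
```

Only first derivatives of the loss are evaluated, two per iteration, which is the sense in which the method is "first-order" yet converges quadratically near a simple root.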

翻訳日:2022-11-29 20:45:23 公開日:2022-11-28

# beyond s-curves: 技術予測のためのリカレントニューラルネットワーク

Beyond S-curves: Recurrent Neural Networks for Technology Forecasting ( http://arxiv.org/abs/2211.15334v1 )

ライセンス: Link先を確認

Alexander Glavackij, Dimitri Percia David, Alain Mermoud, Angelika Romanou, Karl Aberer

(参考訳) 技術的ランドスケープのかなりの多様性と複雑さのため、正確なモデルを構築して予測することは困難な取り組みである。多くの複雑なシステムにおいて高い頻度でS曲線は以前の研究で一般的な予測手法である。しかし、その予測性能は他の技術予測手法と直接比較されていない。さらに、予測精度の向上を主張する時系列予測の最近の発展は、技術開発データにはまだ適用されていない。本研究は,s曲線の予測性能をベースラインと比較し,機械学習と時系列予測の最近の進歩を用いたautencoderアプローチを開発することにより,両研究のギャップに対処する。 S曲線予測は、単純なARIMAベースラインに匹敵する平均パーセンテージ誤差(MAPE)を示す。しかし、新興技術の少数派にとっては、MAPEは2等級に増大する。我々のオートエンコーダアプローチは、2番目に高い結果に対して平均13.5%改善する。他のアプローチと同じ精度で確立された技術を予測する。しかし、特に新興技術の予測は、平均MAPEが次の最良の結果より18%低いことが強くなっている。以上の結果から,S曲線よりも単純なARIMAモデルの方が好ましいことが示唆された。より正確な予測を求める実践者は、提示されたautoencoderアプローチを選択する必要がある。

Because of the considerable heterogeneity and complexity of the technological landscape, building accurate forecasting models is a challenging endeavor. Due to their high prevalence in many complex systems, S-curves are a popular forecasting approach in previous work. However, their forecasting performance has not been directly compared to other technology forecasting approaches. Additionally, recent developments in time series forecasting that claim to improve forecasting accuracy are yet to be applied to technological development data. This work addresses both research gaps by comparing the forecasting performance of S-curves to a baseline and by developing an autoencoder approach that employs recent advances in machine learning and time series forecasting. S-curve forecasts largely exhibit a mean absolute percentage error (MAPE) comparable to a simple ARIMA baseline. However, for a minority of emerging technologies, the MAPE increases by two orders of magnitude. Our autoencoder approach improves the MAPE by 13.5% on average over the second-best result. It forecasts established technologies with the same accuracy as the other approaches. However, it is especially strong at forecasting emerging technologies with a mean MAPE 18% lower than the next best result. Our results imply that a simple ARIMA model is preferable over the S-curve for technology forecasting. Practitioners looking for more accurate forecasts should opt for the presented autoencoder approach.
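Since the comparison rests on MAPE, a short sketch of the metric may help. It uses the standard definition, $\mathrm{MAPE} = \frac{100}{n}\sum_{t}\left|\frac{a_t - f_t}{a_t}\right|$; the function name `mape` and the sample numbers are hypothetical and not taken from the paper.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Assumes equal-length, non-empty sequences and no zero in `actual`.
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equally long")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)

# Hypothetical yearly counts for a technology and one model's forecasts.
actual = [100.0, 200.0]
forecast = [110.0, 180.0]
print(mape(actual, forecast))
```

Because each error is divided by the actual value, small actual values (typical for emerging technologies) can inflate MAPE sharply, which is one plausible reading of the large errors reported for that group.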

翻訳日:2022-11-29 20:45:02 公開日:2022-11-28

# beyond cage: 学習された自律ネットワーク防衛政策の一般化を調査

Beyond CAGE: Investigating Generalization of Learned Autonomous Network Defense Policies ( http://arxiv.org/abs/2211.15557v1 )

ライセンス: Link先を確認

Melody Wolk, Andy Applebaum, Camron Denver, Patrick Dwyer, Marina Moskowitz, Harold Nguyen, Nicole Nichols, Nicole Park, Paul Rachwalski, Frank Rau, Adrian Webster

(参考訳) 強化学習(RL)の進歩は、ネットワーク防御のインテリジェントな自動化に新たな方向性をもたらした。しかし、これらの進歩の多くは、自分たちのアプリケーションをネットワークセキュリティに上回っているか、現実の世界でそれを実装する際の課題を考慮していない。これらの問題を理解するために,本研究では,高忠実度ネットワークシミュレータを用いた自律型ネットワークディフェンサエージェント構築のための公開競争であるCAGE Challengeの第2版で実施されたいくつかのRLアプローチを評価する。我々のアプローチはすべて、アルゴリズムのPPO(Proximal Policy Optimization)ファミリに基づいており、階層的RL、アクションマスキング、カスタムトレーニング、アンサンブルRLを含んでいる。アンサンブルRL技術は,我々の他のモデルより優れ,競争において第2位である。実環境への適用性を理解するため,未知のネットワークや未知の攻撃戦略に対して,各手法の一般化能力を評価する。目に見えない環境では, 環境変化のタイプによって劣化が変化するなど, 全てのアプローチが悪化する。未知の攻撃戦略に対して、新しい戦略はトレーニングしたモデルよりも効率的ではありませんでしたが、我々のモデルは全体的なパフォーマンスを低下させました。これらの結果は、現実世界における自律的ネットワーク防衛のための有望な研究方向を強調する。

Advancements in reinforcement learning (RL) have inspired new directions in intelligent automation of network defense. However, many of these advancements have either outpaced their application to network security or have not considered the challenges associated with implementing them in the real world. To understand these problems, this work evaluates several RL approaches implemented in the second edition of the CAGE Challenge, a public competition to build an autonomous network defender agent in a high-fidelity network simulator. Our approaches all build on the Proximal Policy Optimization (PPO) family of algorithms, and include hierarchical RL, action masking, custom training, and ensemble RL. We find that the ensemble RL technique performs strongest, outperforming our other models and taking second place in the competition. To understand applicability to real environments we evaluate each method's ability to generalize to unseen networks and against an unknown attack strategy. In unseen environments, all of our approaches perform worse, with degradation varying with the type of environmental change. Against an unknown attacker strategy, we found that our models had reduced overall performance even though the new strategy was less efficient than the ones our models trained on. Together, these results highlight promising research directions for autonomous network defense in the real world.

翻訳日:2022-11-29 20:44:32 公開日:2022-11-28

# 乳癌データ統合のためのグラフニューラルネットワーク

Graph Neural Networks for Breast Cancer Data Integration ( http://arxiv.org/abs/2211.15561v1 )

ライセンス: Link先を確認

Teodora Reu

(参考訳) METABRIC (Molecular Taxonomy of Breast Cancer International Consortium) などの国際イニシアチブは、様々ながんの進化を通じて進行中の分子過程を特定するために、複数の多ゲノミクスおよび臨床データセットを収集している。多くの機械学習と統計モデルは、これらのタイプのデータを独立して分析するために設計、訓練されてきたが、そのような異なる形状のソース情報ストリームの統合は、広く研究されていない。これらのデータセットをよりうまく統合し、最終的にがん検出タスクに活用できる有意義な表現を生成することで、患者に適切な治療を与えることができる。そこで我々は,ガンデータモダリティをグラフとして統合し,次いでグラフニューラルネットワークを教師なし環境で適用して,組み合わせたデータから低次元の埋め込みを生成し,最終的に癌サブタイプの分類モデルに新しい表現を与えて評価する,という3つのステップからなる新しい学習パイプラインを提案する。グラフ構築アルゴリズムは、METABRICは患者のモダリティ間の関係を記憶していないため、それらが生成した埋め込みの品質に与える影響について議論している。また、グラフニューラルネットワーク、変分グラフオートエンコーダ、ディープグラフ情報マックスといった低遅延空間表現を生成するために使用されるモデルも提示する。並列に、パイプラインを合成データセット上でテストし、ホモフィリーレベルなどの基礎となるデータの特徴が、人工データにおける51\%から98\%の精度、METABRICにおける13\%と80\%の精度に大きく影響を与えることを示した。このプロジェクトは、がんデータ理解を改善する可能性があり、正規データセットからグラフ型データへの移行を促進する。

International initiatives such as METABRIC (Molecular Taxonomy of Breast Cancer International Consortium) have collected several multigenomic and clinical data sets to identify the molecular processes taking place throughout the evolution of various cancers. Numerous Machine Learning and statistical models have been designed and trained to analyze these types of data independently; however, the integration of such differently shaped and sourced information streams has not been extensively studied. Better integrating these data sets and generating meaningful representations that can ultimately be leveraged for cancer detection tasks could lead to well-suited treatments for patients. Hence, we propose a novel learning pipeline comprising three steps: the integration of cancer data modalities as graphs, followed by the application of Graph Neural Networks in an unsupervised setting to generate lower-dimensional embeddings from the combined data, and finally feeding the new representations to a cancer sub-type classification model for evaluation. The graph construction algorithms are described in depth, as METABRIC does not store relationships between the patient modalities, with a discussion of their influence over the quality of the generated embeddings. We also present the models used to generate the lower-dimensional latent space representations: Graph Neural Networks, Variational Graph Autoencoders and Deep Graph Infomax. In parallel, the pipeline is tested on a synthetic dataset to demonstrate that the characteristics of the underlying data, such as homophily levels, greatly influence the performance of the pipeline, which ranges from 51\% to 98\% accuracy on artificial data, and from 13\% to 80\% on METABRIC. This project has the potential to improve cancer data understanding and encourages the transition of regular data sets to graph-shaped data.

翻訳日:2022-11-29 20:44:11 公開日:2022-11-28

# Action-GPT: 改良および一般化されたゼロショットアクション生成のための大規模言語モデルを活用する

Action-GPT: Leveraging Large-scale Language Models for Improved and Generalized Zero Shot Action Generation ( http://arxiv.org/abs/2211.15603v1 )

ライセンス: Link先を確認

Sai Shashank Kalakonda, Shubh Maheshwari, Ravi Kiran Sarvadevabhatla

(参考訳) 本稿では,大規模言語モデル(LLM)をテキストベースのアクション生成モデルに組み込むためのプラグインおよびプレイフレームワークであるAction-GPTを紹介する。現在のモーションキャプチャデータセットにおけるアクションフレーズは、最小限の情報とポイント情報を含む。 LLMのプロンプトを慎重に作成することにより、アクションのよりリッチできめ細かい記述を生成する。動作句の代わりにこれらの詳細記述を利用することで,テキストと動き空間のアライメントが向上することを示す。本実験は,最近のテキスト・ツー・モーションモデルによる合成運動の質の質的,定量的な改善を示す。コード、事前トレーニングされたモデル、サンプルビデオはhttps://actiongpt.github.ioで入手できる。

We introduce Action-GPT, a plug and play framework for incorporating Large Language Models (LLMs) into text-based action generation models. Action phrases in current motion capture datasets contain minimal and to-the-point information. By carefully crafting prompts for LLMs, we generate richer and fine-grained descriptions of the action. We show that utilizing these detailed descriptions instead of the original action phrases leads to better alignment of text and motion spaces. Our experiments show qualitative and quantitative improvement in the quality of synthesized motions produced by recent text-to-motion models. Code, pretrained models and sample videos will be made available at https://actiongpt.github.io

翻訳日:2022-11-29 20:43:41 公開日:2022-11-28

# 2つは1つより優れている:補完的な製品推奨のためのデュアル埋め込み

Two Is Better Than One: Dual Embeddings for Complementary Product Recommendations ( http://arxiv.org/abs/2211.14982v1 )

ライセンス: Link先を確認

Giorgi Kvernadze, Putu Ayu G. Sudyanti, Nishan Subedi, Mohammad Hajiaghayi

(参考訳) 近年,大規模なシステムに容易に統合でき,近隣の検索をリアルタイムに行えるため,埋め込みベースの製品レコメンデーションが人気を集めている。この領域における多くの研究は、主に類似の項目の推薦に焦点を当てている。一方,相補的項目推薦の研究は,まだ未検討のままである。類似の項目を,有用性の観点から交換可能な項目と定義し,異なる目的に適合するが,相互に使用する場合には互換性を持つ項目として補完的項目を定義した。本稿では,製品に対する二重埋め込み表現を活用し,補完的項目を見つけるための新しい手法を提案する。本研究では,NLP におけるスキップグラム陰性サンプリング (SGNS) モデルにおける関連性の概念が,共購入データを用いてアイテム表現を訓練する際の相補性の概念に有効であることを示す。実際のシナリオでは,購入データの分散が大きな課題となるため,包括範囲を拡大するために合成サンプルを用いたモデルをさらに強化する。これにより、画像、テキスト、クリックなどの豊富なデータモダリティを活用することで、共購入データを共有しない項目に対して補完的なレコメンデーションを提供することができる。我々は,大手オンライン小売企業において,実世界のデータに対するレコメンデーションのカバレッジと品質を向上させるためのアプローチの有効性を確立した。さらに,SGNS訓練におけるタスク特化ハイパーパラメータチューニングの重要性を示す。我々のモデルは実装が簡単であり、あらゆるeコマースウェブサイトで補完的なアイテムレコメンデーションを生成するための優れた候補となる。

Embedding-based product recommendations have gained popularity in recent years due to their ability to integrate easily into large-scale systems and to allow nearest neighbor searches in real time. The bulk of studies in this area has predominantly been focused on similar item recommendations. Research on complementary item recommendations, on the other hand, still remains considerably under-explored. We define similar items as items that are interchangeable in terms of their utility and complementary items as items that serve different purposes, yet are compatible when used with one another. In this paper, we apply a novel approach to finding complementary items by leveraging dual embedding representations for products. We demonstrate that the notion of relatedness discovered in NLP for skip-gram negative sampling (SGNS) models translates effectively to the concept of complementarity when training item representations using co-purchase data. Since sparsity of purchase data is a major challenge in real-world scenarios, we further augment the model using synthetic samples to extend coverage. This allows the model to provide complementary recommendations for items that do not share co-purchase data by leveraging other abundantly available data modalities such as images, text, clicks etc. We establish the effectiveness of our approach in improving both coverage and quality of recommendations on real world data for a major online retail company. We further show the importance of task-specific hyperparameter tuning in training SGNS. Our model is effective yet simple to implement, making it a great candidate for generating complementary item recommendations at any e-commerce website.

翻訳日:2022-11-29 20:35:28 公開日:2022-11-28

# 疎高次元線形回帰に対する適応的最短解導除法

An adaptive shortest-solution guided decimation approach to sparse high-dimensional linear regression ( http://arxiv.org/abs/2211.15057v1 )

ライセンス: Link先を確認

Xue Yu, Yifan Sun, Haijun Zhou

(参考訳) 高次元線形回帰モデルは高次元データの統計モデルとしては最も一般的なものであるが、偏差係数のスパースセットを達成することは極めて難しい課題である。本稿では, 最短解導出デシメーションアルゴリズムから適応し, assdと呼ばれる, スパース高次元線形回帰モデルを構築するための単純ヒューリスティックなアルゴリズムを提案する。このアルゴリズムは再帰的減算線形方程式の最小二乗解の指導の下で回帰係数の支持を構築し、早期停止基準と二段階しきい値法を適用してこの支持を洗練する。以上の結果から,ASSDはLASSO,ベクトル近似メッセージパッシング,および他の2つの代表的グリージーアルゴリズムよりも解の精度と堅牢性に優れていた。 ASSDは、実世界の応用で遭遇する高度に相関した測定行列を持つ線形回帰問題に特に適している。

The high-dimensional linear regression model is the most popular statistical model for high-dimensional data, but achieving a sparse set of regression coefficients is quite a challenging task. In this paper, we propose a simple heuristic algorithm to construct sparse high-dimensional linear regression models, which is adapted from the shortest-solution guided decimation algorithm and is referred to as ASSD. This algorithm constructs the support of regression coefficients under the guidance of the least-squares solution of the recursively decimated linear equations, and it applies an early-stopping criterion and a second-stage thresholding procedure to refine this support. Our extensive numerical results demonstrate that ASSD outperforms LASSO, vector approximate message passing, and two other representative greedy algorithms in solution accuracy and robustness. ASSD is especially suitable for linear regression problems with highly correlated measurement matrices encountered in real-world applications.

翻訳日:2022-11-29 20:35:03 公開日:2022-11-28

# lone sampler:コーディネートローカル近傍サンプリングによるグラフノード埋め込み

LoNe Sampler: Graph node embeddings by coordinated local neighborhood sampling ( http://arxiv.org/abs/2211.15114v1 )

ライセンス: Link先を確認

Konstantin Kutzkov

(参考訳) 局所グラフ近傍サンプリングは、ノード表現学習のアルゴリズムの中心にある基本的な計算問題である。グラフノードを周辺ノードの属性のような離散的な特徴で表現する離散ノード埋め込みを学習するためのアルゴリズムがいくつか提案されている。離散埋め込みは、連続的なword2vecライクなノード埋め込みと比較して、いくつかの利点を提供している: 計算の容易さ、拡張性、解釈性。我々は,局所的な近傍サンプリングにより離散ノード埋め込みを生成するアルゴリズムのスイートであるlone samplerを提案する。まず、我々のアルゴリズムは理論的性質を厳密に理解した。第2に,カーネルモデルのトレーニングのためのグラム行列の高価な計算を避けるために,近似的ベクトル写像を生成する方法を示す。ベンチマークデータセットの実験は理論的な結果を確認し、提案手法の利点を実証する。

Local graph neighborhood sampling is a fundamental computational problem that is at the heart of algorithms for node representation learning. Several works have presented algorithms for learning discrete node embeddings where graph nodes are represented by discrete features such as attributes of neighborhood nodes. Discrete embeddings offer several advantages compared to continuous word2vec-like node embeddings: ease of computation, scalability, and interpretability. We present LoNe Sampler, a suite of algorithms for generating discrete node embeddings by Local Neighborhood Sampling, and address two shortcomings of previous work. First, our algorithms have rigorously understood theoretical properties. Second, we show how to generate approximate explicit vector maps that avoid the expensive computation of a Gram matrix for the training of a kernel model. Experiments on benchmark datasets confirm the theoretical findings and demonstrate the advantages of the proposed methods.

翻訳日:2022-11-29 20:34:46 公開日:2022-11-28

# より高速な$k$-means++アルゴリズム

A Faster $k$-means++ Algorithm ( http://arxiv.org/abs/2211.15118v1 )

ライセンス: Link先を確認

Jiehao Liang, Somdeb Sarkhel, Zhao Song, Chenbo Yin, Danyang Zhuo

(参考訳) K-means++は、k-meansクラスタリングアルゴリズムの初期クラスタセンターを選択するための重要なアルゴリズムである。そこで本研究では,$k$-means++問題をほぼ最適な実行時間で解く新しいアルゴリズムを提案する。 $n$個のデータポイントが$\mathbb{R}^d$で与えられると、現在の最先端のアルゴリズムは$\widetilde{O}(k)$回の反復で動作し、各イテレーションは$\widetilde{O}(nd k)$の時間を要する。従って、全体の実行時間は$\widetilde{O}(n d k^2)$である。我々は,合計で$\widetilde{O}(nd + nk^2)$ の時間しかかからない新しいアルゴリズム \textsc{FastKmeans++} を提案する。

K-means++ is an important algorithm to choose initial cluster centers for the k-means clustering algorithm. In this work, we present a new algorithm that can solve the $k$-means++ problem with near-optimal running time. Given $n$ data points in $\mathbb{R}^d$, the current state-of-the-art algorithm runs in $\widetilde{O}(k)$ iterations, and each iteration takes $\widetilde{O}(nd k)$ time. The overall running time is thus $\widetilde{O}(n d k^2)$. We propose a new algorithm \textsc{FastKmeans++} that takes only $\widetilde{O}(nd + nk^2)$ time in total.
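As background, the classical k-means++ seeding step (often called $D^2$ sampling) can be sketched as follows. This is the standard baseline procedure, not the \textsc{FastKmeans++} algorithm proposed in the paper; the function names `sq_dist` and `kmeanspp_seed`, the fixed seed, and the sample points are assumptions for illustration.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeanspp_seed(points, k, seed=0):
    """Pick k initial centers: each new center is sampled with probability
    proportional to its squared distance from the nearest chosen center."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    # d2[i] is the squared distance from points[i] to its nearest center so far.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a center; fall back to uniform sampling.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=d2, k=1)[0])
        d2 = [min(old, sq_dist(p, centers[-1])) for old, p in zip(d2, points)]
    return centers

# Hypothetical data: three well-separated pairs of points in the plane.
points = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0), (21, 0)]
centers = kmeanspp_seed(points, k=3)
print(centers)
```

Each round of this naive version costs $O(nd)$ for the distance update, so seeding $k$ centers costs $O(ndk)$; reducing the dependence on $d$ and $k$ is the kind of saving the abstract is concerned with.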

翻訳日:2022-11-29 20:34:32 公開日:2022-11-28

# SuperFusion:Long-Range HD Map生成と予測のためのマルチレベルLiDAR-Camera Fusion

SuperFusion: Multilevel LiDAR-Camera Fusion for Long-Range HD Map Generation and Prediction ( http://arxiv.org/abs/2211.15656v1 )

ライセンス: Link先を確認

Hao Dong, Xianjing Zhang, Xuan Jiang, Jun Zhang, Jintao Xu, Rui Ai, Weihao Gu, Huimin Lu, Juho Kannala and Xieyuanli Chen

(参考訳) 環境の高精細(HD)セマンティックマップ生成は自律運転の重要な構成要素である。既存の手法は、LiDARやカメラなど、様々なセンサーモードを融合することにより、このタスクにおいて優れたパフォーマンスを実現している。しかし、現在の作業は生のデータやネットワークの機能レベルの融合に基づいており、短距離のHDマップ生成のみを考慮し、現実的な自動運転アプリケーションへのデプロイを制限している。本稿では,30m以内の短距離でHDマップを構築する作業と,下流経路計画と制御タスクが必要とする90mまでの長距離HDマップの予測に焦点をあてて,自動運転の滑らかさと安全性を向上させる。そこで本研究では,LiDARとカメラデータの複数レベルでの融合を利用したSuperFusionというネットワークを提案する。我々は、nuScenesデータセットと自己記録データセットでSuperFusionをベンチマークし、最先端のベースラインメソッドを大きなマージンで上回ることを示す。さらに,長距離HDマップの予測評価のための新しい指標を提案し,生成したHDマップを下流経路計画タスクに適用する。その結果,提案手法で予測した長距離HDマップを用いることで,自動運転車の経路計画を改善することが可能となった。コードはhttps://github.com/haomo-ai/SuperFusionで公開予定である。

High-definition (HD) semantic map generation of the environment is an essential component of autonomous driving. Existing methods have achieved good performance in this task by fusing different sensor modalities, such as LiDAR and camera. However, current works are based on raw-data or network feature-level fusion and only consider short-range HD map generation, limiting their deployment in realistic autonomous driving applications. In this paper, we focus on both building HD maps at short range, i.e., within 30 m, and predicting long-range HD maps up to 90 m, which is required by downstream path planning and control tasks to improve the smoothness and safety of autonomous driving. To this end, we propose a novel network named SuperFusion, exploiting the fusion of LiDAR and camera data at multiple levels. We benchmark SuperFusion on the nuScenes dataset and a self-recorded dataset and show that it outperforms the state-of-the-art baseline methods by large margins. Furthermore, we propose a new metric to evaluate long-range HD map prediction and apply the generated HD map to a downstream path planning task. The results show that using the long-range HD maps predicted by our method improves path planning for autonomous vehicles. The code will be available at https://github.com/haomo-ai/SuperFusion.

翻訳日:2022-11-29 20:28:19 公開日:2022-11-28

# 変更点検出のためのオンラインカーネルCUSUM

Online Kernel CUSUM for Change-Point Detection ( http://arxiv.org/abs/2211.15070v1 )

ライセンス: Link先を確認

Song Wei, Yao Xie

(参考訳) 我々は、異なるウィンドウサイズを持つ並列カーネル統計セットからなるオンラインカーネル累積和(CUSUM)手順を開発し、未知の変化点位置を考慮に入れた。 Shewhartチャート形式に対応する既存のスライディングウィンドウベースのカーネル変化点検出手順と比較して,提案手法は小さな変化に対してより敏感である。さらに、オンライン処理において一定の計算量とメモリの複雑さを達成するために重要となる検出統計の再帰的計算を、計算のボトルネックとなるグラム行列全体を計算し記憶する必要がないように提示する。本研究では,2つの基本性能指標,平均走行長(ARL)と期待検出遅延(EDD)の正確な解析的近似を得る。さらに、最適なウィンドウサイズを $\log ({\rm ARL})$ のオーダーに定め、オラクル手順と比較してほとんど検出力の損失がないようにし、これは window-limited Generalized Likelihood Ratio (GLR) 手順の古典的な結果と類似している。提案手法の理論的結果と競合性能を検証するため, 広範な数値実験を行った。

We develop an online kernel Cumulative Sum (CUSUM) procedure, which consists of a parallel set of kernel statistics with different window sizes to account for the unknown change-point location. Compared with many existing sliding window-based kernel change-point detection procedures, which correspond to Shewhart chart-type procedures, the proposed procedure is more sensitive to small changes. We further present a recursive computation of the detection statistics, which is crucial for online procedures to achieve constant computational and memory complexity: we do not need to compute and store the entire Gram matrix, which would otherwise be a computational bottleneck. We obtain precise analytic approximations of the two fundamental performance metrics, the Average Run Length (ARL) and the Expected Detection Delay (EDD). Furthermore, we establish the optimal window size on the order of $\log ({\rm ARL})$ such that there is nearly no power loss compared with an oracle procedure, which is analogous to the classic result for the window-limited Generalized Likelihood Ratio (GLR) procedure. We present extensive numerical experiments to validate our theoretical results and the competitive performance of the proposed method.
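To illustrate why a recursive statistic gives constant per-sample cost, here is a minimal sketch of the classic one-sided CUSUM recursion for an upward mean shift. It is the textbook parametric recursion, not the kernel statistic of the paper; the function name and the `drift` and `threshold` parameters are assumptions made for this sketch.

```python
def cusum_first_alarm(samples, drift, threshold):
    """Return the index of the first alarm, or None if no alarm is raised.

    Classic one-sided CUSUM for an upward mean shift:
        S_t = max(0, S_{t-1} + x_t - drift)
    Only the scalar S_t is carried between samples, so the memory use
    and per-sample work are constant.
    """
    s = 0.0
    for t, x in enumerate(samples):
        s = max(0.0, s + x - drift)
        if s > threshold:
            return t
    return None
```

The statistic resets to zero whenever the accumulated evidence turns negative, which is what lets it react to a change at an unknown time without storing past samples.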

翻訳日:2022-11-29 20:26:52 公開日:2022-11-28

# Deep Learning Inverse Technique を用いたニアフィルタSAR画像復元 : 予備的検討

Near-filed SAR Image Restoration with Deep Learning Inverse Technique: A Preliminary Study ( http://arxiv.org/abs/2211.14990v1 )

ライセンス: Link先を確認

Xu Zhan, Xiaoling Zhang, Wensi Zhang, Jun Shi, Shunjun Wei, Tianjiao Zeng

(参考訳) 比較的大きな開口角と広い伝送帯域と組み合わせて、近距離場合成開口レーダー(SAR)は、ターゲットの散乱分布ホットスポットの高解像度画像を提供する。一方、撮像結果は、サイドローブ、クラッタ、ノイズから必然的に劣化し、ターゲットの情報検索を妨げる。イメージを復元するために、現在の手法では、例えば、点拡散関数(PSF)は空間的に一貫したものであり、ターゲットはスパース点散乱器などで構成されている。これにより、特に複雑なターゲットに対して、ターゲット形状の限定的な復元性能が得られる。これらの課題に対処するために,本研究における近年の有望な深層学習逆テクニックによる復元に関する予備的研究を行った。本研究では,分解モデルを,近接場sarのシステム応答を考慮した空間変数複素畳み込みモデルに再構成する。それに合わせて、モデルベースのディープラーニングネットワークは、イメージを復元するように設計されている。複数の複雑なターゲットモデルからのシミュレーション劣化画像データセットを構築し,ネットワークの検証を行った。全ての画像は電磁シミュレーションツールを用いて定式化される。データセットの実験は、その有効性を明らかにする。現在の手法と比較して、目標形状とエネルギー推定に関して優れた性能が得られる。

Benefiting from a relatively large aperture angle combined with a wide transmitting bandwidth, near-field synthetic aperture radar (SAR) provides a high-resolution image of a target's scattering distribution (hot spots). Meanwhile, the imaging result suffers inevitable degradation from sidelobes, clutter, and noise, hindering information retrieval of the target. To restore the image, current methods make simplifying assumptions; for example, that the point spread function (PSF) is spatially consistent, that the target consists of sparse point scatterers, etc. Thus, they achieve limited restoration performance in terms of the target's shape, especially for complex targets. To address these issues, this work conducts a preliminary study on restoration with the recent, promising deep learning inverse technique. We reformulate the degradation model into a spatially variable complex-convolution model, where the near-field SAR's system response is considered. Based on this model, a model-based deep learning network is designed to restore the image. A simulated degraded image dataset from multiple complex target models is constructed to validate the network. All the images are generated using an electromagnetic simulation tool. Experiments on the dataset demonstrate the network's effectiveness. Compared with current methods, superior performance is achieved regarding the target's shape and energy estimation.

翻訳日:2022-11-29 20:19:31 公開日:2022-11-28

# 断層sarイメージングのための多次元特徴量埋め込みモデルデータ駆動ネットワーク

A Model-data-driven Network Embedding Multidimensional Features for Tomographic SAR Imaging ( http://arxiv.org/abs/2211.15002v1 )

ライセンス: Link先を確認

Yu Ren, Xiaoling Zhang, Xu Zhan, Jun Shi, Shunjun Wei, Tianjiao Zeng

(参考訳) ディープラーニング(DL)ベースのトモグラフィSARイメージングアルゴリズムの研究が徐々に進んでいる。典型的には、展開ネットワークを用いて古典的圧縮センシング法(CS)の反復計算を模倣し、各範囲方位単位を個別に処理する。しかし、この方法で有効活用されるのは1次元の特徴のみである。隣接する分解単位間の相関を直接無視する。そこで本研究では,多次元特徴量に基づくtomosarイメージングを実現するための新しいモデルデータ駆動ネットワークを提案する。ディープ・アンフォールディング法により、2次元ディープ・アンフォールディング・イメージング・ネットワークを構築する。そこで我々は,画像シーンの多次元的特徴を効果的に向上するために,畳み込みエンコーダ・デコーダ構造を2つの2次元処理モジュールに追加する。一方,提案する多機能イメージングネットワークをトレーニングするために,建物シミュレーションデータからなるトモSARシミュレーションデータセットを構築した。実験はモデルの有効性を検証する。従来のCS-based FISTA法とDL-based gamma-Net法と比較して,提案手法は良好な画像精度を有しつつ,完全性を向上させる。

Deep learning (DL)-based tomographic SAR imaging algorithms are gradually being studied. Typically, they use an unfolding network to mimic the iterative calculation of classical compressive sensing (CS)-based methods and process each range-azimuth unit individually. However, only one-dimensional features are effectively utilized in this way, and the correlation between adjacent resolution units is ignored. To address this, we propose a new model-data-driven network to achieve tomoSAR imaging based on multi-dimensional features. Guided by the deep unfolding methodology, a two-dimensional deep unfolding imaging network is constructed. On this basis, we add two 2D processing modules, both convolutional encoder-decoder structures, to effectively enhance the multi-dimensional features of the imaging scene. Meanwhile, to train the proposed multifeature-based imaging network, we construct a tomoSAR simulation dataset consisting entirely of simulated data of buildings. Experiments verify the effectiveness of the model. Compared with the conventional CS-based FISTA method and the DL-based gamma-Net method, our proposed method achieves better completeness while maintaining decent imaging accuracy.

翻訳日:2022-11-29 20:19:12 公開日:2022-11-28

# 可逆ニューラルネットワークによる非知覚的敵攻撃

Imperceptible Adversarial Attack via Invertible Neural Networks ( http://arxiv.org/abs/2211.15030v1 )

ライセンス: Link先を確認

Zihan Chen, Ziyue Wang, Junjie Huang, Wentao Zhao, Xiao Liu, Dejian Guan

(参考訳) 補助的な勾配情報を利用した摂動の追加や、良性画像の既存詳細の破棄は、敵対的サンプルを生成するための2つの一般的なアプローチである。視覚的な知覚不可能性は、敵対的サンプルの望ましい特性であるが、従来の敵対的攻撃は、いまだに追跡可能な敵対的摂動を生み出している。本稿では,可逆ニューラルネットワークを用いた新たな敵対的攻撃手法(AdvINN)を提案する。具体的には、AdvINNは可逆ニューラルネットワークの情報保存特性を十分に活用し、ターゲットクラスのクラス固有の意味情報を追加すると同時に、元のクラスの識別情報をドロップすることで、敵対的サンプルを生成する。 CIFAR-10, CIFAR-100, ImageNet-1Kの大規模な実験により, 提案したAdvINN法は, 最先端の手法よりも知覚されにくい敵対的画像を生成することができ, また, 他の攻撃に比べ, より堅牢な敵対的サンプルが得られることを示した。

Adding perturbations by utilizing auxiliary gradient information or discarding existing details of the benign images are two common approaches for generating adversarial examples. Though visual imperceptibility is a desired property of adversarial examples, conventional adversarial attacks still generate traceable adversarial perturbations. In this paper, we introduce a novel Adversarial Attack via Invertible Neural Networks (AdvINN) method to produce robust and imperceptible adversarial examples. Specifically, AdvINN takes full advantage of the information preservation property of Invertible Neural Networks and thereby generates adversarial examples by simultaneously adding class-specific semantic information of the target class and dropping discriminant information of the original class. Extensive experiments on CIFAR-10, CIFAR-100, and ImageNet-1K demonstrate that the proposed AdvINN method can produce more imperceptible adversarial images than the state-of-the-art methods, and that AdvINN yields more robust adversarial examples with high confidence compared to other adversarial attacks.

翻訳日:2022-11-29 20:18:51 公開日:2022-11-28

# renmin university of china at trecvid 2022: 特徴融合と否定理解によるビデオ検索の改善

Renmin University of China at TRECVID 2022: Improving Video Search by Feature Fusion and Negation Understanding ( http://arxiv.org/abs/2211.15039v1 )

ライセンス: Link先を確認

Xirong Li, Aozhu Chen, Ziyue Wang, Fan Hu, Kaibin Tian, Xinru Chen, Chengbo Dong

(参考訳) TRECVID 2022 Ad-hoc Video Search (AVS) 実験を要約する。提案手法は,視覚とテキストの多様な特徴を結合するLightweight Attentional Feature Fusion (LAFF) と,否定的手がかりを含む問合せに対するBidirectional Negation Learning (BNL) という2つの新しい手法を用いて構築した。特にLAFFは、早期と後期の両方で特徴融合を行い、テキストとビデオの両方で多様な(既製の)特徴を利用する。マルチヘッド自己注意と比較して、LAFFはよりコンパクトだがより効果的である。注意重みはより少ない特徴の選択にも利用でき、検索性能はほとんど保存されている。 BNLは、与えられたトレーニングビデオとそのオリジナルの記述と部分的に否定された記述からなる三重項あたりの双方向制約損失を最小化することにより、否定対応のビデオ検索モデルを訓練する。ビデオ特徴抽出には事前学習済みのCLIP,BLIP,BEiT,ResNeXt-101,irCSNを用いる。テキスト特徴に関しては、bag-of-words、 word2vec、CLIP、BLIPを採用している。トレーニングデータは,前回の参加でも使用したMSR-VTT,TGIF,VATEXからなる。さらに,事前学習のためにV3C1コレクションを自動キャプションする。 TRECVIDベンチマークの2022年版は、再びRUCMMチームにとって実りある参加となった。私たちのベストランは、infAPが0.262で、チーム別で2位にランクインした。

We summarize our TRECVID 2022 Ad-hoc Video Search (AVS) experiments. Our solution is built with two new techniques, namely Lightweight Attentional Feature Fusion (LAFF) for combining diverse visual / textual features and Bidirectional Negation Learning (BNL) for addressing queries that contain negation cues. In particular, LAFF performs feature fusion at both early and late stages and at both text and video ends to exploit diverse (off-the-shelf) features. Compared to multi-head self-attention, LAFF is much more compact yet more effective. Its attentional weights can also be used for selecting fewer features, with the retrieval performance mostly preserved. BNL trains a negation-aware video retrieval model by minimizing a bidirectionally constrained loss per triplet, where a triplet consists of a given training video, its original description and a partially negated description. For video feature extraction, we use pre-trained CLIP, BLIP, BEiT, ResNeXt-101 and irCSN. As for text features, we adopt bag-of-words, word2vec, CLIP and BLIP. Our training data consists of MSR-VTT, TGIF and VATEX, which were used in our previous participation. In addition, we automatically caption the V3C1 collection for pre-training. The 2022 edition of the TRECVID benchmark has again been a fruitful participation for the RUCMM team. Our best run, with an infAP of 0.262, ranks second teamwise.

翻訳日:2022-11-29 20:18:32 公開日:2022-11-28

# Nested U-Net Architectureによる低磁場MRI超解像

Synthetic Low-Field MRI Super-Resolution Via Nested U-Net Architecture ( http://arxiv.org/abs/2211.15047v1 )

ライセンス: Link先を確認

Aryan Kalluvila, Neha Koonjoo, Danyal Bhutto, Marcio Rockenbach, Matthew S. Rosen

(参考訳) 低磁場MRIスキャナーは、高磁場MRIスキャナーのポータブルで安価な代替品を提供することで、医療画像に革命をもたらす力を持っている。しかし、そのようなスキャナーは通常、ハイフィールドスキャナーよりもかなりノイズが多く、品質も低い。本研究の目的は、低磁場MRIスキャンのSNRと画像品質を改善し、診断能力を向上することである。この問題に対処するため,Nested U-Net ニューラルネットワークアーキテクチャの超解法アルゴリズムを提案し,提案手法を平均PSNR78.83,SSIM0.9551で上回った。 t1-mixデータセットと呼ばれる主要な重み付けmri画像データセットから人工的なノイズダウンサンプリング合成データを用いてネットワークをテストした。ある放射線技師はlikertスケール(1-5)で25枚の画像を記録し、全体の画像品質、解剖学的構造、および我々のアーキテクチャや他の出版作品(sr densenet、generator block、srcnnなど)の診断信頼度を評価しました。また、NLMSE(Natural log mean squared error)と呼ばれる新しいタイプの損失関数も導入する。結論として、Nested U-Netアーキテクチャを用いて、合成低磁場MRIに適用した単一画像超解像のためのより正確なディープラーニング手法を提案する。

Low-field (LF) MRI scanners have the power to revolutionize medical imaging by providing a portable and cheaper alternative to high-field MRI scanners. However, such scanners are usually significantly noisier and lower in quality than their high-field counterparts. The aim of this paper is to improve the SNR and overall image quality of low-field MRI scans to improve diagnostic capability. To address this issue, we propose a super-resolution algorithm based on the Nested U-Net neural network architecture that outperforms previously suggested deep learning methods, with an average PSNR of 78.83 and SSIM of 0.9551. We tested our network on artificially noised, downsampled synthetic data from a major T1-weighted MRI image dataset called the T1-mix dataset. One board-certified radiologist scored 25 images on a Likert scale (1-5), assessing overall image quality, anatomical structure, and diagnostic confidence across our architecture and other published works (SR DenseNet, Generator Block, SRCNN, etc.). We also introduce a new type of loss function called natural log mean squared error (NLMSE). In conclusion, we present a more accurate deep learning method for single image super-resolution applied to synthetic low-field MRI via a Nested U-Net architecture.
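As a reminder of how the PSNR figure quoted above is defined, the standard formula is $10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$. The sketch below is a minimal illustration on flat lists of pixel values; the abstract does not state the peak value or image layout used, so `max_value` and the function name are assumptions made for this sketch.

```python
import math


def psnr(reference, estimate, max_value):
    """Peak signal-to-noise ratio in dB between two equal-length signals.

    `max_value` is the largest possible pixel value (an assumption here;
    the abstract does not state which peak value was used).
    """
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Because the value depends on `max_value`, PSNR numbers are comparable only between methods evaluated with the same intensity scaling.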

翻訳日:2022-11-29 20:18:09 公開日:2022-11-28

# PlasmoID:薄い血液スミアにおけるインドネシアのマラリア原虫検出とセグメンテーションのためのデータセット

PlasmoID: A dataset for Indonesian malaria parasite detection and segmentation in thin blood smear ( http://arxiv.org/abs/2211.15105v1 )

ライセンス: Link先を確認

Hanung Adi Nugroho, Rizki Nurfauzi, E. Elsa Herdiana Murhandarwati, Purwono Purwono

(参考訳) インドネシアは東南アジアで最多のマラリア患者数で2番目に高い国である。ディープラーニングアプローチに基づくマラリア寄生虫セマンティックセグメンテーションの異なる手法は、従来の方法の限界を減らす代替手段である。しかし,大型寄生虫が優勢であり,小寄生虫が抑制されるため,セマンティクスセグメンテーション技術の主な問題点が浮かび上がっている。加えて、データの量と分散は、モデルを確立する上で重要な影響である。本研究では2つの貢献を行う。まず,薄い血液スミアのマラリア寄生虫691点を含む559点の顕微鏡画像を収集した。データセットはPlasmoIDと名付けられ、ほとんどのデータはインドネシアの田舎から来ている。 PlasmoIDは寄生虫の検出とセグメンテーションの目的にも真実を提供する。第二に,rcnnの高速化とセマンティクスセグメンテーション手法を組み合わせたマラリア寄生虫のセグメンテーションと検出手法を提案する。提案手法はPlasmoIDデータセット上で評価されている。 UNet、ResFCN-18、DeepLabV3、DeepLabV3plus、ResUNet-18といったセマンティックセグメンテーション技術の研究と比較されている。その結果,本手法はマラリア寄生虫のセグメンテーションと検出を,本来のセグメンテーション手法と比較して改善できることがわかった。

Indonesia has the second-highest number of malaria cases in Southeast Asia. Various malaria parasite semantic segmentation techniques based on deep learning are an alternative for reducing the limitations of traditional methods. However, the main problem of semantic segmentation techniques arises because large parasites dominate and tiny parasites are suppressed. In addition, the amount and variance of data strongly influence the resulting models. In this study, we make two contributions. First, we collect 559 microscopic images of thin blood smears containing 691 malaria parasites. The dataset is named PlasmoID, and most of the data comes from rural Indonesia. PlasmoID also provides ground truth for parasite detection and segmentation purposes. Second, this study proposes a malaria parasite segmentation and detection scheme that combines Faster RCNN with a semantic segmentation technique. The proposed scheme has been evaluated on the PlasmoID dataset. It has been compared with recent semantic segmentation techniques, namely UNet, ResFCN-18, DeepLabV3, DeepLabV3plus and ResUNet-18. The results show that our proposed scheme improves malaria parasite segmentation and detection performance compared to the original semantic segmentation techniques.

翻訳日:2022-11-29 20:17:48 公開日:2022-11-28

# グローバルセンシング品質最大化に向けて:カメラネットワークの構成最適化スキーム

Toward Global Sensing Quality Maximization: A Configuration Optimization Scheme for Camera Networks ( http://arxiv.org/abs/2211.15166v1 )

ライセンス: Link先を確認

Xuechao Zhang, Xuda Ding, Yi Ren, Yu Zheng, Chongrong Fang and Jianping He

(参考訳) ターゲットの集合を監視するカメラネットワークの性能は、カメラの構成に大きく依存する。本稿では,パラメータ化カメラネットワークモデルの再構成戦略について検討し,複数のターゲットの知覚特性をグローバルかつ同時に最適化できることを示す。まず,画像中の単位長オブジェクトが占有する画素数を,カメラのパラメータ(内在的,外在的,歪的係数など)によって決定される物体の知覚品質の指標として用いることを提案する。そして、カメラネットワークによる目標のセンシング品質を測定する単一の量を形成する。この量はさらに最適化問題の目的関数として機能し、最適なカメラ構成を得る。提案手法の有効性を広範囲なシミュレーションと実験により検証し, apriltag検出タスクの性能向上を明らかにした。この作業のためのコードと関連するユーティリティは、https://github.com/sszxc/MultiCam-Simulationで公開されている。

The performance of a camera network monitoring a set of targets depends crucially on the configuration of the cameras. In this paper, we investigate the reconfiguration strategy for the parameterized camera network model, with which the sensing qualities of multiple targets can be optimized globally and simultaneously. We first propose to use the number of pixels occupied by a unit-length object in the image as a metric of the sensing quality of the object, which is determined by the parameters of the camera, such as its intrinsic, extrinsic, and distortion coefficients. Then, we form a single quantity that measures the sensing quality of the targets by the camera network. This quantity further serves as the objective function of our optimization problem to obtain the optimal camera configuration. We verify the effectiveness of our approach through extensive simulations and experiments, and the results reveal its improved performance on AprilTag detection tasks. Code and related utilities for this work are open-sourced and available at https://github.com/sszxc/MultiCam-Simulation.
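The pixel-count metric described above can be illustrated with an ideal pinhole camera. The sketch below assumes no lens distortion and a camera frame aligned with the world frame (identity extrinsics), which is simpler than the parameterized model in the paper; the function names and the intrinsics `fx`, `fy`, `cx`, `cy` are assumptions made for this sketch.

```python
import math


def project(point, fx, fy, cx, cy):
    """Project a 3D point (camera frame) with an ideal pinhole model."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera")
    return (fx * x / z + cx, fy * y / z + cy)


def pixels_spanned(p_start, p_end, fx, fy, cx, cy):
    """Number of pixels covered by the segment p_start -> p_end in the image."""
    u0, v0 = project(p_start, fx, fy, cx, cy)
    u1, v1 = project(p_end, fx, fy, cx, cy)
    return math.hypot(u1 - u0, v1 - v0)
```

For a unit-length segment parallel to the image plane at depth $Z$, this reduces to $f/Z$ pixels, so doubling the distance halves the metric.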

翻訳日:2022-11-29 20:17:28 公開日:2022-11-28

# AD診断と予後のための集団人工知能に基づくディープグレーディング

Deep Grading based on Collective Artificial Intelligence for AD Diagnosis and Prognosis ( http://arxiv.org/abs/2211.15192v1 )

ライセンス: Link先を確認

Huy-Dung Nguyen, Michaël Clément, Boris Mansencal, and Pierrick Coupé

(参考訳) アルツハイマー病の正確な診断と予後は、新しい治療法の開発と関連するコストの削減に不可欠である。近年,畳み込みニューラルネットワークの進歩に伴い,この2つのタスクを構造MRIを用いて自動化する方法が提案されている。しかし、これらの手法はしばしば解釈可能性や一般化の欠如に苦しめられ、性能の面で制限されることがある。本稿では,これらの制約を克服する新しい深層フレームワークを提案する。私たちの枠組みは2つの段階からなる。最初の段階では,意味のある特徴を抽出するディープグレーディングモデルを提案する。ドメインシフトに対するこれらの特徴の堅牢性を高めるため、トレーニングと評価のための革新的な集合人工知能戦略を導入する。第2段階では、ADシグネチャをよりよくキャプチャするために、グラフ畳み込みニューラルネットワークを使用する。2074名の被験者に基づく実験により,AD診断と予後の両方について,異なるデータセットにおいて最先端の手法と比較して競争力のある性能が示された。

Accurate diagnosis and prognosis of Alzheimer's disease are crucial to develop new therapies and reduce the associated costs. Recently, with the advances of convolutional neural networks, methods have been proposed to automate these two tasks using structural MRI. However, these methods often suffer from a lack of interpretability and generalization, and can be limited in terms of performance. In this paper, we propose a novel deep framework designed to overcome these limitations. Our framework consists of two stages. In the first stage, we propose a deep grading model to extract meaningful features. To enhance the robustness of these features against domain shift, we introduce an innovative collective artificial intelligence strategy for the training and evaluation steps. In the second stage, we use a graph convolutional neural network to better capture AD signatures. Our experiments based on 2074 subjects show the competitive performance of our deep framework compared to state-of-the-art methods on different datasets for both AD diagnosis and prognosis.

翻訳日:2022-11-29 20:17:13 公開日:2022-11-28

# 深い優先順位を持つ調整不要なプラグアンドプレイハイパースペクトル画像デコンボリューション

Tuning-free Plug-and-Play Hyperspectral Image Deconvolution with Deep Priors ( http://arxiv.org/abs/2211.15307v1 )

ライセンス: Link先を確認

Xiuheng Wang, Jie Chen, Cédric Richard

(参考訳) デコンボリューション(deconvolution)は、取得装置が生成するハイパースペクトル画像(HSI)のぼやけやノイズを軽減するために広く用いられる戦略である。この問題は通常、不良設定逆問題を解くことで解決される。適切な画像プリエントを調べることでデコンボリューション性能が向上するが、強力な正則化項を手作りし、正則化パラメータを設定することは自明ではない。本稿では,これらの問題に対処するため,HSIデコンボリューションのためのチューニングフリープラグアンドプレイ(PnP)アルゴリズムを提案する。具体的には、交互方向乗数法(ADMM)を用いて最適化問題を2つの反復的な部分問題に分解する。フレキシブルブラインド3Dデノイジングネットワーク(B3DDN)は、深い事前分布を学習し、異なるノイズレベルでデノイジング部分問題を解くために設計されている。次に、3次元残差白色度の尺度を検討し、二次部分問題を解く際のペナルティパラメータと停止基準を調整する。シミュレーションデータと正解付きの実データの両方における実験結果から,提案手法の優位性が示された。

Deconvolution is a widely used strategy to mitigate the blur and noise degradation of hyperspectral images (HSI) generated by the acquisition devices. This issue is usually addressed by solving an ill-posed inverse problem. While investigating proper image priors can enhance the deconvolution performance, it is not trivial to handcraft a powerful regularizer and to set the regularization parameters. To address these issues, in this paper we introduce a tuning-free Plug-and-Play (PnP) algorithm for HSI deconvolution. Specifically, we use the alternating direction method of multipliers (ADMM) to decompose the optimization problem into two iterative sub-problems. A flexible blind 3D denoising network (B3DDN) is designed to learn deep priors and to solve the denoising sub-problem with different noise levels. A measure of 3D residual whiteness is then investigated to adjust the penalty parameters when solving the quadratic sub-problems, as well as a stopping criterion. Experimental results on both simulated and real-world data with ground truth demonstrate the superiority of the proposed method.
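The two-sub-problem structure of Plug-and-Play ADMM can be sketched on a toy 1D problem. The sketch below assumes an elementwise degradation $y = h \odot x + \text{noise}$ (the form a circular blur takes after diagonalization in the Fourier domain), a fixed penalty `rho`, and a 3-tap moving average standing in for the learned denoiser. None of this reproduces B3DDN or the whiteness-based parameter tuning of the paper, and all names are assumptions made for this sketch.

```python
def box_denoise(v):
    """Stand-in denoiser: 3-tap moving average with replicated edges."""
    n = len(v)
    return [(v[max(i - 1, 0)] + v[i] + v[min(i + 1, n - 1)]) / 3.0
            for i in range(n)]


def pnp_admm(y, h, rho=1.0, iters=50, denoise=box_denoise):
    """Plug-and-play ADMM for y = h * x + noise with elementwise h.

    rho is a fixed, hand-picked penalty here (hypothetical choice);
    the paper instead adjusts it automatically.
    """
    z = list(y)
    u = [0.0] * len(y)
    for _ in range(iters):
        # Quadratic sub-problem, solved in closed form per element:
        # argmin_x 0.5*(h*x - y)^2 + 0.5*rho*(x - z + u)^2
        x = [(hi * yi + rho * (zi - ui)) / (hi * hi + rho)
             for hi, yi, zi, ui in zip(h, y, z, u)]
        # Denoising sub-problem: the prior enters only through the denoiser.
        z = denoise([xi + ui for xi, ui in zip(x, u)])
        # Scaled dual variable update.
        u = [ui + xi - zi for ui, xi, zi in zip(u, x, z)]
    return z
```

Swapping `denoise` for a stronger denoiser changes the implicit prior without touching the quadratic step, which is the appeal of the plug-and-play formulation.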

翻訳日:2022-11-29 20:17:01 公開日:2022-11-28

# 脳MRIにおける教師なし異常検出の表現特性の検討

A Study of Representational Properties of Unsupervised Anomaly Detection in Brain MRI ( http://arxiv.org/abs/2211.15527v1 )

ライセンス: Link先を確認

Ayantika Das, Arun Palla, Keerthi Ram, Mohanasankar Sivaprakasam

(参考訳) MRIにおける異常検出は画像診断や診断において高い臨床的価値がある。異常検出の教師なし手法は、再構成や潜伏埋め込みに基づく興味深い定式化を提供し、分解に関連する特性を観察する方法を提供する。我々は4つの既存のモデリング手法を調査し、簡単なデータサイエンスツールを用いて経験的観察を報告し、脳構造MRIの場合を考慮して、非教師なしの異常検出の課題に最も関係がある因子化の観点から結果を求める。本研究は, 因子化関連特性を示す異常検出アルゴリズムが, 正規データと異常データとを区別する特徴量を持つことを示唆する。我々は、複数の異常および正常なデータセットで観測を検証した。

Anomaly detection in MRI is of high clinical value in imaging and diagnosis. Unsupervised methods for anomaly detection provide interesting formulations based on reconstruction or latent embedding, offering a way to observe properties related to factorization. We study four existing modeling methods and report our empirical observations using simple data science tools, seeking outcomes from the perspective of factorization as it is most relevant to the task of unsupervised anomaly detection, considering the case of brain structural MRI. Our study indicates that anomaly detection algorithms that exhibit factorization-related properties are well equipped to distinguish between normal and anomalous data. We have validated our observations on multiple anomaly and normal datasets.

翻訳日:2022-11-29 20:16:39 公開日:2022-11-28

# データ増補とハイブリッド畳み込みネットワークを用いた前庭神経節状神経節形成のための非ペア化クロスモダリティセグメンテーションフレームワーク

An Unpaired Cross-modality Segmentation Framework Using Data Augmentation and Hybrid Convolutional Networks for Segmenting Vestibular Schwannoma and Cochlea ( http://arxiv.org/abs/2211.14986v1 )

ライセンス: Link先を確認

Yuzhou Zhuang, Hong Liu, Enmin Song, Coskun Cetinkaya, and Chih-Cheng Hung

(参考訳) CrossMoDAの課題は、ラベル付き造影T1スキャンを利用して、ラベル付き高分解能T2スキャンで前庭神経腫瘍(VS)腫瘍とコチェリー領域を自動的に分離することである。 2022年版では、セグメンテーションタスクを多施設スキャンで拡張している。本研究では,データ拡張とハイブリッド畳み込みネットワークを用いた非ペア型クロスモダリティセグメンテーションフレームワークを提案する。多施設スキャンにおける不均一分布と様々な画像サイズを考慮し、各スキャンの強度を-1から1に拡大するためにmin-max正規化を適用し、ボクセルサイズ再サンプリングと中心刈りを用いて訓練を行う。我々は,意味情報を効果的に学習し,現実的な対象領域スキャンを生成するための2つのデータ拡張手法を採用した。本研究では,CUTとCycleGANを用いて,教師付きセグメンテーショントレーニングのための詳細と外観の異なる2つの現実的なT2ボリュームを生成する。オンラインデータ拡張のために,vs腫瘍信号の不均一性をシミュレートするランダム腫瘍信号低減法を考案する。さらに,多次元畳み込みを伴う高度なハイブリッド畳み込みネットワークを用いて,異方性スキャンにおいてvs腫瘍と人工内耳領域の正確なボリュームセグメンテーションのために,スパース間スライス情報と高密度内スライス情報を適応的に学習する。クロスモダ2022バリデーションデータセットでは有望な結果を示し,vs腫瘍領域では平均dsc値が72.47%,76.48%,asd値が3.42mmと0.53mmであった。

The crossMoDA challenge aims to automatically segment the vestibular schwannoma (VS) tumor and cochlea regions of unlabeled high-resolution T2 scans by leveraging labeled contrast-enhanced T1 scans. The 2022 edition extends the segmentation task by including multi-institutional scans. In this work, we propose an unpaired cross-modality segmentation framework using data augmentation and hybrid convolutional networks. Considering heterogeneous distributions and various image sizes for multi-institutional scans, we apply min-max normalization to scale the intensities of all scans between -1 and 1, and use voxel size resampling and center cropping to obtain fixed-size sub-volumes for training. We adopt two data augmentation methods for effectively learning the semantic information and generating realistic target domain scans: generative and online data augmentation. For generative data augmentation, we use CUT and CycleGAN to generate two groups of realistic T2 volumes with different details and appearances for supervised segmentation training. For online data augmentation, we design a random tumor signal reducing method for simulating the heterogeneity of VS tumor signals. Furthermore, we utilize an advanced hybrid convolutional network with multi-dimensional convolutions to adaptively learn sparse inter-slice information and dense intra-slice information for accurate volumetric segmentation of VS tumor and cochlea regions in anisotropic scans. On the crossMoDA2022 validation dataset, our method produces promising results and achieves mean DSC values of 72.47% and 76.48% and ASSD values of 3.42 mm and 0.53 mm for VS tumor and cochlea regions, respectively.
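Two of the building blocks named above, min-max scaling to $[-1, 1]$ and the Dice similarity coefficient (DSC), are simple enough to sketch on flat lists. The function names and the handling of constant scans and empty masks are assumptions made for this sketch, not details taken from the paper.

```python
def minmax_to_unit_range(values):
    """Linearly rescale intensities so min -> -1 and max -> 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant scan: map everything to 0 (a convention chosen here).
        return [0.0 for _ in values]
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


def dice(pred, target):
    """Dice similarity coefficient between two binary masks."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    size = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Two empty masks are treated as a perfect match (a convention chosen here).
    return 1.0 if size == 0 else 2.0 * inter / size
```

Per-scan min-max scaling removes differences in absolute intensity range between institutions, but it is sensitive to outlier voxels because it depends only on the two extreme values.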

翻訳日:2022-11-29 20:08:13 公開日:2022-11-28

# 多認識タスク指向フレームワークを用いた3次元レーダイメージング逆問題の解法

Solving 3D Radar Imaging Inverse Problems with a Multi-cognition Task-oriented Framework ( http://arxiv.org/abs/2211.14989v1 )

ライセンス: Link先を確認

Xu Zhan, Xiaoling Zhang, Mou Wang, Jun Shi, Shunjun Wei, Tianjiao Zeng

(参考訳) 本研究は3次元レーダ画像逆問題に焦点を当てる。現在の方法では,タスク依存情報検索の損失を被った未分化の結果が得られており,タスク固有の要求を十分に満たさない。例えば、偏光散乱エネルギーはスクリーンイメージングでは許容されるが、散乱診断では許容されない。この問題に対処するため,我々は新しいタスク指向イメージングフレームワークを提案する。撮像原理は、タスクの要求を得るために分析フェーズを通してタスク指向である。画像モデルは、要求を埋め込んで満たすために正規化された多認識である。本手法は,認識間のカップリングを近似法と可変スプリッティング法で個別に解く汎用的に設計されている。例として、散乱診断、パーソンスクリーンイメージング、パーセルスクリーニングイメージングなどがある。 2つのシステムからのデータに対する実験は、提案されたフレームワークがタスク依存情報検索において現在のフレームワークよりも優れていることを示している。

This work focuses on 3D radar imaging inverse problems. Current methods obtain undifferentiated results that suffer task-dependent information retrieval loss and thus do not meet the task's specific demands well. For example, biased scattering energy may be acceptable for screen imaging but not for scattering diagnosis. To address this issue, we propose a new task-oriented imaging framework. The imaging principle is task-oriented, using an analysis phase to obtain the task's demands. The imaging model is multi-cognition regularized to embed and fulfill these demands. The imaging method is designed to be generalized, where couplings between cognitions are decoupled and solved individually with approximation and variable-splitting techniques. Tasks including scattering diagnosis, person screen imaging, and parcel screening imaging are given as examples. Experiments on data from two systems indicate that the proposed framework outperforms the current ones in task-dependent information retrieval.

翻訳日:2022-11-29 20:07:40 公開日:2022-11-28

# ロボット運動学:運動、運動学、力学

Robot Kinematics: Motion, Kinematics and Dynamics ( http://arxiv.org/abs/2211.15093v1 )

ライセンス: Link先を確認

Jiawei Zhang

(参考訳) この記事は、“Robot Basics: Representation, Rotation and Velocity”と題された前回の記事のフォローアップチュートリアル記事である。本稿では,本論文のトピックについてより深く理解するために,ロボット基礎に関する以前のチュートリアル記事を読むことを勧める。具体的には,ロボット運動,前方運動学,逆運動学,ロボット力学など,ロボットキネマティクスに関するより高度な話題について紹介する。前回の記事で紹介されたトピック、用語、表記について、この記事では再び導入することなく直接使用します。また、前回の記事と同様、本記事でも数学と公式が多用される(読者は今後の数学爆弾の準備が整っていることを願う)。この記事を読んでから、読者はロボットの動き、運動学、ダイナミクスについてより深く理解できるようになるだろう。ロボット制御に関するより先進的な話題については、読者向けの以下のチュートリアル記事で紹介する。

This is a follow-up tutorial article to our previous article entitled "Robot Basics: Representation, Rotation and Velocity". For a better understanding of the topics covered in this article, we recommend that readers first read our previous tutorial article on robot basics. Specifically, in this article, we will cover some more advanced topics on robot kinematics, including robot motion, forward kinematics, inverse kinematics, and robot dynamics. The topics, terminologies and notations introduced in the previous article will be used directly without being re-introduced in this article. As in the previous article, math and formulas will be used heavily in this article (we hope the readers are well prepared for the upcoming math bomb). After reading this article, readers should have a deeper understanding of robot motion, kinematics and dynamics. More advanced topics on robot control will be introduced in the following tutorial articles.
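As a concrete instance of the forward and inverse kinematics mentioned above, consider a planar arm with two revolute joints. This is a standard textbook example rather than material taken from the tutorial itself; the function names, the link lengths `l1` and `l2`, and the choice of one of the two inverse solutions are assumptions made for this sketch.

```python
import math


def forward_kinematics_2link(theta1, theta2, l1, l2):
    """End-effector position of a planar 2-link arm (angles in radians)."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y


def inverse_kinematics_2link(x, y, l1, l2):
    """One joint solution (theta2 >= 0) that reaches the target (x, y)."""
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if abs(c2) > 1.0 + 1e-12:
        raise ValueError("target is out of reach")
    c2 = max(-1.0, min(1.0, c2))  # guard against rounding just outside [-1, 1]
    theta2 = math.acos(c2)
    theta1 = math.atan2(y, x) - math.atan2(l2 * math.sin(theta2),
                                           l1 + l2 * math.cos(theta2))
    return theta1, theta2
```

Forward kinematics is a direct evaluation, while inverse kinematics may have zero, one, or two solutions for this arm; the sketch returns only the one with a non-negative second joint angle.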

翻訳日:2022-11-29 20:01:39 公開日:2022-11-28

# 時空間物体モデリングによるプロアクティブロボット支援

Proactive Robot Assistance via Spatio-Temporal Object Modeling ( http://arxiv.org/abs/2211.15501v1 )

ライセンス: Link先を確認

Maithili Patel, Sonia Chernova

(参考訳) アクティブなロボット支援により、ロボットは明示的に尋ねられることなく、ユーザのニーズを予測し、提供することができる。ロボットが日常のユーザルーチンに付随する物体の動きの時間的パターンを予測する問題として、積極的な支援を定式化し、そのニーズに適応するためのオブジェクトを配置することで、ユーザの積極的な支援を行う。本稿では,物体配置の時間系列から物体ダイナミクスの時空間予測モデルを学ぶために,生成グラフニューラルネットワークを提案する。また,50日以上の生活行動に関連する家庭内オブジェクトを5つのシミュレートされた家庭で追跡するHouse Object Movements from Everyday Routines(HOMER)データセットを寄贈した。提案モデルは,物体移動の予測において主要なベースラインを上回り,11.1%以上の物体の位置を正確に予測し,11.5%の利用者が使用する物体の位置を誤って予測する。

Proactive robot assistance enables a robot to anticipate and provide for a user's needs without being explicitly asked. We formulate proactive assistance as the problem of the robot anticipating temporal patterns of object movements associated with everyday user routines, and proactively assisting the user by placing objects to adapt the environment to their needs. We introduce a generative graph neural network to learn a unified spatio-temporal predictive model of object dynamics from temporal sequences of object arrangements. We additionally contribute the Household Object Movements from Everyday Routines (HOMER) dataset, which tracks household objects associated with human activities of daily living across 50+ days for five simulated households. Our model outperforms the leading baseline in predicting object movement, correctly predicting locations for 11.1% more objects and wrongly predicting locations for 11.5% fewer objects used by the human user.

翻訳日:2022-11-29 20:01:23 公開日:2022-11-28

# トレーニング不足でグラフニューラルネットワークを改良:訓練されていないGNNのチケットを見つける

You Can Have Better Graph Neural Networks by Not Training Weights at All: Finding Untrained GNNs Tickets ( http://arxiv.org/abs/2211.15335v1 )

ライセンス: Link先を確認

Tianjin Huang, Tianlong Chen, Meng Fang, Vlado Menkovski, Jiaxu Zhao, Lu Yin, Yulong Pei, Decebal Constantin Mocanu, Zhangyang Wang, Mykola Pechenizkiy, Shiwei Liu

(参考訳) 近年の研究では、ランダムに初期化された畳み込みニューラルネットワーク(CNN)の中に、ネットワークの重みを一切最適化することなく(すなわち未訓練のまま)、完全に訓練された高密度ネットワークの性能に匹敵するサブネットワークが存在することが印象的に示されている。しかし、グラフニューラルネットワーク(GNN)におけるそのような未訓練サブネットワークの存在は、いまだ明らかになっていない。本稿では,性能が匹敵する未訓練GNNの発見を目指す初めての探索を行う。スパース性を中核的なツールとして、完全に訓練された高密度GNNの性能に匹敵する未訓練のスパースなサブネットワークを初期化時に見つけることができる。この同等性能という有望な発見に加えて、発見された未訓練サブネットワークがGNNの過平滑化(over-smoothing)問題を大幅に軽減し、余計な工夫なしにより深いGNNを可能にする強力なツールとなることを示す。また,そのようなスパースな未訓練サブネットワークは,分布外検出や入力摂動に対するロバスト性において,魅力的な性能を有することが観察された。提案手法は,Open Graph Benchmark (OGB) を含む様々な一般的なデータセット上で,広く使用されているGNNアーキテクチャを用いて評価する。

Recent works have impressively demonstrated that there exists a subnetwork in randomly initialized convolutional neural networks (CNNs) that can match the performance of the fully trained dense networks at initialization, without any optimization of the weights of the network (i.e., untrained networks). However, the presence of such untrained subnetworks in graph neural networks (GNNs) still remains mysterious. In this paper we carry out the first-of-its-kind exploration of discovering matching untrained GNNs. With sparsity as the core tool, we can find untrained sparse subnetworks at initialization that can match the performance of fully trained dense GNNs. Besides this already encouraging finding of comparable performance, we show that the found untrained subnetworks can substantially mitigate the GNN over-smoothing problem, hence becoming a powerful tool to enable deeper GNNs without bells and whistles. We also observe that such sparse untrained subnetworks have appealing performance in out-of-distribution detection and robustness to input perturbations. We evaluate our method across widely-used GNN architectures on various popular datasets including the Open Graph Benchmark (OGB).

翻訳日:2022-11-29 19:53:17 公開日:2022-11-28

# 強化学習におけるゼロショット転送のためのハイパーネットワーク

Hypernetworks for Zero-shot Transfer in Reinforcement Learning ( http://arxiv.org/abs/2211.15457v1 )

ライセンス: Link先を確認

Sahand Rezaei-Shoshtari, Charlotte Morissette, Francois Robert Hogan, Gregory Dudek, David Meger

(参考訳) 本稿では,新しいTDベースのトレーニング目標と準最適RLソリューションの集合から得られたデータを用いて,未知のタスク条件にまたがる行動を生成するために,ハイパーネットワークを訓練する。この作業は、メタRL、コンテキストRL、トランスファーラーニングに関連するもので、特にテスト時のゼロショットパフォーマンスに焦点を当てており、タスクパラメータ(コンテキストとしても知られる)の知識によって実現されている。我々の技術的アプローチは、各RLアルゴリズムをMDP仕様から準最適値関数とポリシーへのマッピングとして捉え、MDPのパラメータを考慮し、準最適値関数とポリシーを生成できるハイパーネットワークで近似することに基づいている。特定の条件下では、このマッピングを教師付き学習問題とみなすことができる。我々は,DeepMind Control Suiteの一連の連続制御タスクにおいて,新たな報酬と遷移ダイナミクスへのゼロショット転送の有効性を実証的に評価した。提案手法は,マルチタスクおよびメタRLアプローチによるベースラインの大幅な改善を示す。

In this paper, hypernetworks are trained to generate behaviors across a range of unseen task conditions, via a novel TD-based training objective and data from a set of near-optimal RL solutions for training tasks. This work relates to meta RL, contextual RL, and transfer learning, with a particular focus on zero-shot performance at test time, enabled by knowledge of the task parameters (also known as context). Our technical approach is based upon viewing each RL algorithm as a mapping from the MDP specifics to the near-optimal value function and policy, and seeking to approximate it with a hypernetwork that can generate near-optimal value functions and policies, given the parameters of the MDP. We show that, under certain conditions, this mapping can be considered as a supervised learning problem. We empirically evaluate the effectiveness of our method for zero-shot transfer to new reward and transition dynamics on a series of continuous control tasks from DeepMind Control Suite. Our method demonstrates significant improvements over baselines from multitask and meta RL approaches.

翻訳日:2022-11-29 19:52:54 公開日:2022-11-28

# ベイズ逆強化学習による実演の十分性の自律的評価

Autonomous Assessment of Demonstration Sufficiency via Bayesian Inverse Reinforcement Learning ( http://arxiv.org/abs/2211.15542v1 )

ライセンス: Link先を確認

Tu Trinh, Daniel S. Brown

(参考訳) 本稿では,実演から学習するAIエージェントにおいて実演の十分性を判定する問題,すなわち,所望の性能水準を保証するのに十分な実演を専門家から受け取ったかどうかをAIエージェントがどのように自己評価できるかを検討する。この問題に対処するために,ベイズ逆強化学習とバリュー・アット・リスクに基づく新たな自己評価手法を提案する。これにより,実演から学習するエージェントは自身の性能に関する高信頼境界を計算し,その境界を用いて実演の数が十分かどうかを判定できる。我々は,(1)専門家の観測されない報酬関数に対するリグレットを測る正規化期待値差,(2)ベースラインポリシーに対する改善,という十分性の2つの定義を提案し,評価する。両指標の高信頼境界を定式化する方法を示す。シミュレーションで本手法を評価し,所望の安全閾値内で専門家の性能に匹敵する,あるいはベースラインポリシーの性能を上回ることを高い信頼度で保証するのに十分な訓練データを受け取ったかどうかを正確に評価できるAIシステムの開発が実現可能であることを示す。

In this paper we examine the problem of determining demonstration sufficiency for AI agents that learn from demonstrations: how can an AI agent self-assess whether it has received enough demonstrations from an expert to ensure a desired level of performance? To address this problem we propose a novel self-assessment approach based on Bayesian inverse reinforcement learning and value-at-risk to enable agents that learn from demonstrations to compute high-confidence bounds on their performance and use these bounds to determine when they have a sufficient number of demonstrations. We propose and evaluate two definitions of sufficiency: (1) normalized expected value difference, which measures regret with respect to the expert's unobserved reward function, and (2) improvement over a baseline policy. We demonstrate how to formulate high-confidence bounds on both of these metrics. We evaluate our approach in simulation and demonstrate the feasibility of developing an AI system that can accurately evaluate whether it has received sufficient training data to guarantee, with high confidence, that it can match an expert's performance or surpass the performance of a baseline policy within some desired safety threshold.
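The sufficiency test described in this abstract can be illustrated with a minimal sketch. It assumes that regret values (for example, normalized expected value differences) have already been computed under reward functions sampled from a Bayesian IRL posterior; the function names, the empirical-quantile estimator, and the 0.95 default are illustrative assumptions, not details taken from the paper.

```python
import math


def value_at_risk(samples, alpha=0.95):
    """Empirical alpha-quantile (value-at-risk) of a list of loss samples."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Smallest order statistic whose empirical CDF is at least alpha.
    index = max(0, math.ceil(alpha * len(ordered)) - 1)
    return ordered[index]


def has_sufficient_demonstrations(regret_samples, threshold, alpha=0.95):
    """Hypothetical stopping rule: the alpha-VaR of regret is within the threshold."""
    return value_at_risk(regret_samples, alpha) <= threshold
```

Under these assumptions, an agent would keep requesting demonstrations until `has_sufficient_demonstrations` returns `True` for its chosen threshold.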

翻訳日:2022-11-29 19:52:25 公開日:2022-11-28

# Machine Learning for Health シンポジウム2022 -- 拡張アブストラクトトラック

Machine Learning for Health symposium 2022 -- Extended Abstract track ( http://arxiv.org/abs/2211.15564v1 )

ライセンス: Link先を確認

Antonio Parziale, Monica Agrawal, Shalmali Joshi, Irene Y. Chen, Shengpu Tang, Luis Oala, Adarsh Subbaswamy

(参考訳) 本コレクションは,2022年11月28日に米国ルイジアナ州ニューオーリンズにおいてオンラインと対面の両方で開催された第2回 Machine Learning for Health シンポジウム (ML4H 2022) で発表された拡張アブストラクトを収録したものである。Machine Learning for Health (ML4H) は、理論的な研究と応用的な研究の両方を含む、健康のための機械学習研究の長年にわたる発表の場である。 ML4H 2022は、技術的に成熟した厳密な研究のフルレングスの投稿を対象とするプロシーディングストラックと、成熟度は低いが革新的な研究を議論のために受け入れる拡張アブストラクトトラックの2つの投稿トラックを設けた。 ML4Hシンポジウムに投稿された全ての原稿は、二重盲検の査読を受けた。本コレクションに含まれる拡張アブストラクトは、健康と生物医学における重要な問題に焦点を当てた革新的な機械学習研究を記述している。

A collection of the extended abstracts that were presented at the 2nd Machine Learning for Health symposium (ML4H 2022), which was held both virtually and in person on November 28, 2022, in New Orleans, Louisiana, USA. Machine Learning for Health (ML4H) is a longstanding venue for research into machine learning for health, including both theoretical works and applied works. ML4H 2022 featured two submission tracks: a proceedings track, which encompassed full-length submissions of technically mature and rigorous work, and an extended abstract track, which would accept less mature, but innovative research for discussion. All the manuscripts submitted to ML4H Symposium underwent a double-blind peer-review process. Extended abstracts included in this collection describe innovative machine learning research focused on relevant problems in health and biomedicine.

翻訳日:2022-11-29 19:51:59 公開日:2022-11-28

# カスケード障害から相互依存型インフラストラクチャネットワークを再構築するベイズ的アプローチ

A Bayesian Approach to Reconstructing Interdependent Infrastructure Networks from Cascading Failures ( http://arxiv.org/abs/2211.15590v1 )

ライセンス: Link先を確認

Yu Wang, Jin-Zhu Yu, Hiba Baroud

(参考訳) 複雑な相互依存ネットワークの挙動を分析するには、ネットワークトポロジとネットワーク間の相互依存リンクに関する完全な情報が必要である。重要インフラシステムのような多くのアプリケーションにとって、ネットワークの相互依存性を理解することは、カスケード障害を予測し、途絶に備えた計画を立てるのに不可欠である。しかしながら、プライバシやセキュリティ上の懸念から、個々のネットワークのトポロジに関するデータは公開されていないことが多い。さらに、相互依存リンクは、カスケード障害の結果として、途絶の後になって初めて明らかになることが多い。本稿では,カスケード障害の観測から相互依存型インフラストラクチャネットワークのトポロジを再構築するスケーラブルなノンパラメトリックベイズ手法を提案する。可能なグラフをサンプリングする効率を高めるために,インフラストラクチャに依存した提案分布と組み合わせたメトロポリス・ヘイスティングスアルゴリズムを用いる。相互依存型インフラストラクチャネットワークの合成システムを再構築した結果,提案手法は精度と計算時間の両方で既存手法よりも優れていた。さらに本手法を用いて,米国テネシー州シェルビー郡のガス・電力・水道ネットワークやイタリアの電力・水道ネットワークの相互依存システムなど,1つの合成システムと2つの実世界の相互依存型インフラストラクチャネットワークシステムのトポロジを再構築し,アプローチの一般的な適用性を実証する。

Analyzing the behavior of complex interdependent networks requires complete information about the network topology and the interdependent links across networks. For many applications such as critical infrastructure systems, understanding network interdependencies is crucial to anticipate cascading failures and plan for disruptions. However, data on the topology of individual networks are often publicly unavailable due to privacy and security concerns. Additionally, interdependent links are often only revealed in the aftermath of a disruption as a result of cascading failures. We propose a scalable nonparametric Bayesian approach to reconstruct the topology of interdependent infrastructure networks from observations of cascading failures. The Metropolis-Hastings algorithm, coupled with an infrastructure-dependent proposal, is employed to increase the efficiency of sampling possible graphs. Results of reconstructing a synthetic system of interdependent infrastructure networks demonstrate that the proposed approach outperforms existing methods in both accuracy and computational time. We further apply this approach to reconstruct the topology of one synthetic and two real-world systems of interdependent infrastructure networks, including gas-power-water networks in Shelby County, TN, USA, and an interdependent system of power-water networks in Italy, to demonstrate the general applicability of the approach.

翻訳日:2022-11-29 19:51:44 公開日:2022-11-28

# オフラインマルチエージェント強化学習における良い軌道からの学習

Learning From Good Trajectories in Offline Multi-Agent Reinforcement Learning ( http://arxiv.org/abs/2211.15612v1 )

ライセンス: Link先を確認

Qi Tian, Kun Kuang, Furui Liu, Baoxiang Wang

(参考訳) オフラインマルチエージェント強化学習(MARL: offline multi-agent reinforcement learning)は、事前収集されたデータセットから効果的なマルチエージェントポリシーを学ぶことを目的としており、マルチエージェントシステムを実世界のアプリケーションに展開するための重要な一歩である。しかし、実際には、マルチエージェントのジョイントトラジェクトリを生成する個々の行動ポリシーは、通常、性能のレベルが異なる。例えば、あるエージェントはランダムポリシーであり、他のエージェントは中程度の性能のポリシーである。グローバルな報酬を伴う協調ゲームでは、既存のオフラインMARLによって学習されたエージェントが、しばしばこのランダムなポリシーを継承し、チーム全体のパフォーマンスを危うくする。本稿では,エージェントごとのトラジェクトリの多様性を明示的に考慮したオフラインMARLを調査し,この問題に対処するための共有個別トラジェクトリ(SIT: Shared Individual Trajectories)と呼ばれる新しいフレームワークを提案する。具体的には、注意機構に基づく報酬分解ネットワークが、微分可能なキー・バリューメモリ機構を介して各エージェントにオフラインでクレジットを割り当てる。これらの分解されたクレジットは、ジョイントオフラインデータセットを個別トラジェクトリからなる優先度付き経験再生へと再構築するために使用され、その後エージェントは良いトラジェクトリを共有し、グラフアテンションネットワーク(GAT)ベースのクリティックを用いて保守的にポリシーを訓練することができる。離散制御(StarCraft IIおよびmulti-agent particle environment)と連続制御(multi-agent MuJoCo)の両方において,本手法を評価した。提案手法は,複雑で混合されたオフラインマルチエージェントデータセットにおいて,特に個々のトラジェクトリ間のデータ品質の差が大きい場合に,有意に優れた結果が得られることを示す。

Offline multi-agent reinforcement learning (MARL) aims to learn effective multi-agent policies from pre-collected datasets, which is an important step toward the deployment of multi-agent systems in real-world applications. However, in practice, the individual behavior policies that generate multi-agent joint trajectories usually differ in how well they perform; e.g., one agent follows a random policy while the other agents follow medium policies. In a cooperative game with a global reward, an agent learned by existing offline MARL methods often inherits this random policy, jeopardizing the performance of the entire team. In this paper, we investigate offline MARL with explicit consideration of the diversity of agent-wise trajectories and propose a novel framework called Shared Individual Trajectories (SIT) to address this problem. Specifically, an attention-based reward decomposition network assigns credit to each agent through a differentiable key-value memory mechanism in an offline manner. These decomposed credits are then used to reconstruct the joint offline datasets into prioritized experience replay with individual trajectories, after which agents can share their good trajectories and conservatively train their policies with a graph attention network (GAT) based critic. We evaluate our method in both discrete control (i.e., StarCraft II and the multi-agent particle environment) and continuous control (i.e., multi-agent MuJoCo). The results indicate that our method achieves significantly better results in complex and mixed offline multi-agent datasets, especially when the difference in data quality between individual trajectories is large.

翻訳日:2022-11-29 19:51:22 公開日:2022-11-28

# 離散化線形回帰とマルチクラスサポートベクトルによる大気汚染予測手法

Discretized Linear Regression and Multiclass Support Vector Based Air Pollution Forecasting Technique ( http://arxiv.org/abs/2211.15095v1 )

ライセンス: Link先を確認

Dhanalakshmi M and Radha V

(参考訳) 大気汚染は、発展途上国において従来型エネルギー源の無秩序な利用から生じる重大な問題である。したがって、リスクを最小限に抑えるために、巧妙な大気汚染予測手法が不可欠である。そこで本研究では,クラウドコンピューティング環境における大気汚染の監視と制御を行うIoT(Internet of Things)対応システムを提案する。大気質データと大気質指数の測定値を監視し,効果的な制御への道を開くために,線形回帰・マルチクラスサポートベクトル (LR-MSV) に基づくIoTベースの大気汚染予測手法を提案する。インドのデータセットにおける大気質データを用いた広範な実験により、確立された最先端手法と比較した場合の提案するLR-MSV法の優れた性能が明らかになった。 LR-MSV法により得られた結果は, 他の最先端手法による結果と比較して, 大気汚染の予測時間と誤差率を削減することで予測精度が大幅に向上することを示している。

Air pollution is a vital issue emerging from the uncontrolled utilization of traditional energy sources as far as developing countries are concerned. Hence, ingenious air pollution forecasting methods are indispensable to minimize the risk. To that end, this paper proposes an Internet of Things (IoT) enabled system for monitoring and controlling air pollution in the cloud computing environment. A method called Linear Regression and Multiclass Support Vector (LR-MSV) IoT-based Air Pollution Forecast is proposed to monitor the air quality data and the air quality index measurement, paving the way for effective control. Extensive experiments carried out on the air quality data in the India dataset have revealed the outstanding performance of the proposed LR-MSV method when benchmarked against well-established state-of-the-art methods. The results obtained by the LR-MSV method show a significant increase in air pollution forecasting accuracy by reducing the air pollution forecasting time and error rate compared with the results produced by the other state-of-the-art methods.

翻訳日:2022-11-29 19:44:02 公開日:2022-11-28

# 間隔準メトリック埋め込みによる非対称距離の表現の改善

Improved Representation of Asymmetrical Distances with Interval Quasimetric Embeddings ( http://arxiv.org/abs/2211.15120v1 )

ライセンス: Link先を確認

Tongzhou Wang, Phillip Isola

(参考訳) 非対称距離構造(準メトリック)は、私たちの生活の至るところに存在し、機械学習応用においてますます注目を集めている。このような準メトリック構造をモデル表現に課すことで、強化学習(RL)や因果関係学習など、多くのタスクが改善されることが示されている。本研究では,そのような準メトリックモデルにおいて望ましい4つの性質を示し,先行研究がそれらをどのように満たせていないかを示す。 4つの基準を全て満たすように設計された IQE (Interval Quasimetric Embedding) を提案する。 3つの準メトリック学習実験において、IQEは強い近似能力と汎化能力を示し、従来の方法よりも優れた性能と効率をもたらす。 Project Page: https://www.tongzhouwang.info/interval_quasimetric_embedding Quasimetric Learning Code Package: https://www.github.com/quasimetric-learning/torch-quasimetric

Asymmetrical distance structures (quasimetrics) are ubiquitous in our lives and are gaining more attention in machine learning applications. Imposing such quasimetric structures in model representations has been shown to improve many tasks, including reinforcement learning (RL) and causal relation learning. In this work, we present four desirable properties in such quasimetric models, and show how prior works fail at them. We propose Interval Quasimetric Embedding (IQE), which is designed to satisfy all four criteria. On three quasimetric learning experiments, IQEs show strong approximation and generalization abilities, leading to better performance and improved efficiency over prior methods. Project Page: https://www.tongzhouwang.info/interval_quasimetric_embedding Quasimetric Learning Code Package: https://www.github.com/quasimetric-learning/torch-quasimetric
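To make the notion of an asymmetrical distance concrete, here is a small sketch of one simple quasimetric on real vectors. It is not the IQE construction from the paper; the function name and the "uphill cost" form are illustrative assumptions chosen because they satisfy d(x, x) = 0 and the triangle inequality while being asymmetric.

```python
def one_sided_distance(x, y):
    """Hypothetical quasimetric: d(x, y) = sum_i max(0, y_i - x_i).

    Moving "uphill" in a coordinate costs the height gained, while moving
    "downhill" is free, so d(x, y) generally differs from d(y, x).
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(max(0.0, yi - xi) for xi, yi in zip(x, y))
```

For instance, going from (0, 0) to (1, 2) costs 3 under this definition, while the reverse direction costs 0, which is the kind of asymmetry that a symmetric metric embedding cannot represent.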

翻訳日:2022-11-29 19:43:46 公開日:2022-11-28

# 多様なマルチタスクデータ上のオフラインQ学習はスケールし、かつ汎化する

Offline Q-Learning on Diverse Multi-Task Data Both Scales And Generalizes ( http://arxiv.org/abs/2211.15144v1 )

ライセンス: Link先を確認

Aviral Kumar, Rishabh Agarwal, Xinyang Geng, George Tucker, Sergey Levine

(参考訳) オフライン強化学習(RL)に期待されているのは、視覚やNLPにおける同様の進歩と同じように、大規模で異種混合のデータセットで訓練された高容量モデルが、広く汎化するエージェントにつながることである。しかし、最近の研究は、オフラインRL手法はモデル容量のスケールアップにおいて固有の課題に直面すると主張している。これらの研究から得られた知見をもとに,従来の設計上の選択を再検討し,ResNet,クロスエントロピーに基づく分布的バックアップ,特徴正規化という適切な選択を行えば,オフラインQ学習アルゴリズムがモデル容量に応じてスケールする強力な性能を示すことを見出した。マルチタスクのAtariをスケーリングと汎化のテストベッドとして使用し、最大8000万パラメータのネットワークを用いて40ゲームに対する単一のポリシーを人間に近い性能で訓練し、モデル性能が容量に応じて良好にスケールすることを発見した。先行研究とは対照的に、大規模(4億遷移)だが非常に準最適なデータセット(人間レベルの51%の性能)のみで訓練した場合でも、データセットの性能を超えて外挿する。リターン条件付き教師ありアプローチと比較して、オフラインQ学習はモデル容量に対して同様にスケールし、特にデータセットが準最適な場合に、より優れた性能を示す。最後に、多様なデータセットを用いたオフラインQ学習は、新しいゲームへの迅速な転移や訓練ゲームの新たなバリエーションにおける高速なオンライン学習を促進する強力な表現を学習するのに十分であり、既存の最先端の表現学習アプローチよりも優れていることを示す。

The potential of offline reinforcement learning (RL) is that high-capacity models trained on large, heterogeneous datasets can lead to agents that generalize broadly, analogously to similar advances in vision and NLP. However, recent works argue that offline RL methods encounter unique challenges to scaling up model capacity. Drawing on the lessons from these works, we re-examine previous design choices and find that with appropriate choices (ResNets, cross-entropy based distributional backups, and feature normalization), offline Q-learning algorithms exhibit strong performance that scales with model capacity. Using multi-task Atari as a testbed for scaling and generalization, we train a single policy on 40 games with near-human performance using networks of up to 80 million parameters, finding that model performance scales favorably with capacity. In contrast to prior work, we extrapolate beyond dataset performance even when trained entirely on a large (400M transitions) but highly suboptimal dataset (51% human-level performance). Compared to return-conditioned supervised approaches, offline Q-learning scales similarly with model capacity and has better performance, especially when the dataset is suboptimal. Finally, we show that offline Q-learning with a diverse dataset is sufficient to learn powerful representations that facilitate rapid transfer to novel games and fast online learning on new variations of a training game, improving over existing state-of-the-art representation learning approaches.

翻訳日:2022-11-29 19:43:31 公開日:2022-11-28

# データ駆動型多項ランダムフォレスト

Data-driven multinomial random forest ( http://arxiv.org/abs/2211.15154v1 )

ライセンス: Link先を確認

JunHao Chen

(参考訳) 本稿では,ランダムフォレスト(RF)変種に対する従来の弱一致性の証明手法を強一致性の証明手法へと強化し,RF変種のデータ駆動の度合いを高めることで,より優れた理論的性質と実験性能を得る。また,多項ランダムフォレスト(MRF: multinomial random forest)に基づくデータ駆動型多項ランダムフォレスト(DMRF)を提案する。DMRFは強一致性を満たし,MRFよりも計算複雑度が低く,その効果はMRFと同等かそれ以上である。我々の知る限り、DMRFアルゴリズムはアルゴリズムの複雑さが低く、優れた性能を持つRFの変種である。

In this paper, we strengthen the previous weak-consistency proof method for random forest (RF) variants into a strong-consistency proof method, and make RF variants more data-driven, so as to obtain better theoretical properties and experimental performance. In addition, we propose a data-driven multinomial random forest (DMRF) based on the multinomial random forest (MRF); it satisfies strong consistency, has lower complexity than MRF, and performs as well as or better than MRF. As far as we know, the DMRF algorithm is an RF variant with low algorithmic complexity and excellent performance.

翻訳日:2022-11-29 19:43:01 公開日:2022-11-28

# インクリメンタルフーリエニューラルオペレータ

Incremental Fourier Neural Operator ( http://arxiv.org/abs/2211.15188v1 )

ライセンス: Link先を確認

Jiawei Zhao, Robert Joseph George, Yifei Zhang, Zongyi Li, Anima Anandkumar

(参考訳) 近年、ニューラルネットワークは偏微分方程式(PDE)を解く優れた能力を示している。中でもフーリエニューラル演算子(FNO)は、乱流のような高度に非線形な問題に対する解作用素の学習で成功を収めている。 FNOは離散化不変であり、低解像度のデータで訓練し、高解像度の問題に汎化することができる。この特性は、情報伝播のために限られた数の周波数モードのみを選択するFNOのローパスフィルタと関連している。しかし、異なるPDEに対して適切な数の周波数モードと訓練解像度を選択することは依然として課題である。周波数モードが少なすぎたり低解像度のデータを用いたりすると汎化が損なわれる一方、周波数モードが多すぎたり高解像度のデータを用いたりすると計算コストが高く、過適合を招く。そこで本研究では,訓練中に周波数モードとデータ解像度の両方を漸進的に拡張するインクリメンタルフーリエニューラル演算子(IFNO)を提案する。 IFNOは,標準FNOに比べて計算コストを35%削減しつつ,より優れた汎化(テスト時のL2損失を約15%削減)を実現することを示す。さらに,IFNOはFNOにおける暗黙的正則化の挙動に従うことが観察され,これがその優れた汎化能力を説明する。

Recently, neural networks have proven their impressive ability to solve partial differential equations (PDEs). Among them, Fourier neural operator (FNO) has shown success in learning solution operators for highly non-linear problems such as turbulent flow. FNO is discretization-invariant: it can be trained on low-resolution data and generalize to high-resolution problems. This property is related to the low-pass filters in FNO, where only a limited number of frequency modes are selected to propagate information. However, it is still a challenge to select an appropriate number of frequency modes and training resolution for different PDEs. Too few frequency modes and low-resolution data hurt generalization, while too many frequency modes and high-resolution data are computationally expensive and lead to over-fitting. To this end, we propose Incremental Fourier Neural Operator (IFNO), which augments both the frequency modes and data resolution incrementally during training. We show that IFNO achieves better generalization (around 15% reduction on testing L2 loss) while reducing the computational cost by 35%, compared to the standard FNO. In addition, we observe that IFNO follows the behavior of implicit regularization in FNO, which explains its excellent generalization ability.

翻訳日:2022-11-29 19:42:50 公開日:2022-11-28

# CIM:スパース報酬連続制御のための制約付き内発的動機付け

CIM: Constrained Intrinsic Motivation for Sparse-Reward Continuous Control ( http://arxiv.org/abs/2211.15205v1 )

ライセンス: Link先を確認

Xiang Zheng, Xingjun Ma, Cong Wang

(参考訳) 内発的動機付けは、外的報酬が疎である、あるいは存在しない強化学習タスクを解くための有望な探索技術である。内発的動機付けを実装するには2つの技術的課題がある。 1)効率的な探索を促進する適切な内発的目的をどのように設計するか, 2)より良い解を見つけるために内発的目的と外的目的をどのように組み合わせるか,である。現在の文献では、内発的目的はすべてタスクに依存しない方法で設計され、単純な加算によって外的目的と組み合わされている(あるいは報酬なしの事前訓練に単独で使用されている)。本研究では、これらの設計が典型的なスパース報酬連続制御タスクで失敗することを示す。この問題に対処するために,容易に得られるタスクの事前知識を活用して制約付き内発的目的を構築し,同時に,ラグランジュ法を利用して同時最大化フレームワークにより内発的目的と外的目的のバランスを適応的に調整する,制約付き内発的動機付け(CIM: Constrained Intrinsic Motivation)を提案する。我々は、複数のスパース報酬連続制御タスクにおいて、CIM手法が最先端手法よりも性能とサンプル効率を大幅に向上させることを実証的に示す。さらに、CIMの主要な技術は既存の手法に組み込んで性能を向上させることも可能である。

Intrinsic motivation is a promising exploration technique for solving reinforcement learning tasks with sparse or absent extrinsic rewards. There exist two technical challenges in implementing intrinsic motivation: 1) how to design a proper intrinsic objective to facilitate efficient exploration; and 2) how to combine the intrinsic objective with the extrinsic objective to help find better solutions. In the current literature, the intrinsic objectives are all designed in a task-agnostic manner and combined with the extrinsic objective via simple addition (or used by itself for reward-free pre-training). In this work, we show that these designs would fail in typical sparse-reward continuous control tasks. To address the problem, we propose Constrained Intrinsic Motivation (CIM) to leverage readily attainable task priors to construct a constrained intrinsic objective, and at the same time, exploit the Lagrangian method to adaptively balance the intrinsic and extrinsic objectives via a simultaneous-maximization framework. We empirically show, on multiple sparse-reward continuous control tasks, that our CIM approach achieves greatly improved performance and sample efficiency over state-of-the-art methods. Moreover, the key techniques of our CIM can also be plugged into existing methods to boost their performances.

翻訳日:2022-11-29 19:42:30 公開日:2022-11-28

# Chroma-VAE: 生成型分類器によるショートカット学習の軽減

Chroma-VAE: Mitigating Shortcut Learning with Generative Classifiers ( http://arxiv.org/abs/2211.15231v1 )

ライセンス: Link先を確認

Wanqian Yang, Polina Kirichenko, Micah Goldblum, Andrew Gordon Wilson

(参考訳) 深層ニューラルネットワークはショートカット学習に陥りやすく、本質的な意味構造を発見することなく、単純な特徴を用いて低い訓練損失を達成してしまう。従来の考えとは対照的に,生成モデルは識別的アプローチよりも包括的なデータ表現を回復する動機を持つにもかかわらず,それだけではショートカット学習を防ぐのに不十分であることを示す。しかし,ショートカットは最小限の情報で優先的に符号化されることが観察され,生成モデルはこの事実を利用してショートカット学習を軽減できる。特に,Chroma-VAEを提案する。これは,まずVAE分類器を訓練して小さな潜在部分空間にショートカットを分離し,その補空間であるショートカットのない潜在部分空間上で二次分類器を訓練できるようにする2本立てのアプローチである。ベンチマークおよび実世界のショートカット学習タスクにおけるChroma-VAEの有効性を実証することに加えて, 本研究は,生成型分類器の潜在空間を操作して特定の相関関係を分離または解釈する可能性を浮き彫りにする。

Deep neural networks are susceptible to shortcut learning, using simple features to achieve low training loss without discovering essential semantic structure. Contrary to prior belief, we show that generative models alone are not sufficient to prevent shortcut learning, despite an incentive to recover a more comprehensive representation of the data than discriminative approaches. However, we observe that shortcuts are preferentially encoded with minimal information, a fact that generative models can exploit to mitigate shortcut learning. In particular, we propose Chroma-VAE, a two-pronged approach where a VAE classifier is initially trained to isolate the shortcut in a small latent subspace, allowing a secondary classifier to be trained on the complementary, shortcut-free latent subspace. In addition to demonstrating the efficacy of Chroma-VAE on benchmark and real-world shortcut learning tasks, our work highlights the potential for manipulating the latent space of generative classifiers to isolate or interpret specific correlations.

翻訳日:2022-11-29 19:42:06 公開日:2022-11-28

# GADMSL:マルチスケール部分構造学習による属性付きネットワーク上のグラフ異常検出

GADMSL: Graph Anomaly Detection on Attributed Networks via Multi-scale Substructure Learning ( http://arxiv.org/abs/2211.15255v1 )

ライセンス: Link先を確認

Duan Jingcan, Wang Siwei, Liu Xinwang, Zhou Haifang, Hu Jingtao, Jin Hu

(参考訳) 近年,グラフ異常検出がデータマイニングや機械学習のコミュニティで注目を集めている。既存の属性異常とは別に、グラフ異常検出は、大多数のノードとは異なる疑わしいトポロジ異常ノードも捉える。多数のグラフベースの検出手法が提案されているが、そのほとんどはノードレベルの比較に重点を置いており、周囲のトポロジ構造には十分な注意を払っていない。近傍の部分構造の相違が大きいノードほど、異常である疑いが強い。局所的な部分構造の検出能力を高めるために,マルチスケール部分構造学習によるグラフ異常検出フレームワーク(GADMSL: Graph Anomaly Detection via Multi-scale Substructure Learning)を提案する。従来のアルゴリズムとは異なり、密結合領域において内部類似度が比較的低い異常な部分構造を捉えることができる。具体的には,領域提案モジュールを採用し,ネットワーク内の高密度な部分構造を疑わしい領域として見つける。その内部ノードの埋め込みの類似度が、検出された部分構造の異常度を示す。一般に、埋め込みの類似度が低いほど、部分構造がトポロジ異常を含む確率が高いことを意味する。さらに,ノード属性のより良い埋め込みを抽出するために,グラフコントラスト学習方式を導入し,同時に属性異常も観測する。このようにして、GADMSLはトポロジ異常と属性異常の両方を検出することができる。最終的に、ベンチマークデータセットでの広範な実験により、GADMSLは最先端の属性付きネットワーク異常検出アルゴリズムに比べて検出性能が大幅に向上する(最大でAUCが7.30%、AUPRCが17.46%向上)ことが示された。

Recently, graph anomaly detection has attracted increasing attention in data mining and machine learning communities. Apart from existing attribute anomalies, graph anomaly detection also captures suspicious topological-abnormal nodes that differ from the majority. Although massive graph-based detection approaches have been proposed, most of them focus on node-level comparison while paying insufficient attention to the surrounding topology structures. Nodes with more dissimilar neighborhood substructures are more likely to be abnormal. To enhance the local substructure detection ability, we propose a novel Graph Anomaly Detection framework via Multi-scale Substructure Learning (GADMSL for abbreviation). Unlike previous algorithms, we manage to capture anomalous substructures where the inner similarities are relatively low in densely connected regions. Specifically, we adopt a region proposal module to find high-density substructures in the network as suspicious regions. Their inner-node embedding similarities indicate the anomaly degree of the detected substructures. Generally, a lower degree of embedding similarities means a higher probability that the substructure contains topology anomalies. To distill better embeddings of node attributes, we further introduce a graph contrastive learning scheme, which observes attribute anomalies in the meantime. In this way, GADMSL can detect both topology and attribute anomalies. Ultimately, extensive experiments on benchmark datasets show that GADMSL greatly improves detection performance (up to 7.30% AUC and 17.46% AUPRC gains) compared to state-of-the-art attributed-network anomaly detection algorithms.

翻訳日:2022-11-29 19:41:46 公開日:2022-11-28

# ラベルノイズに頑健なニューラルネットワークの確立

Establishment of Neural Networks Robust to Label Noise ( http://arxiv.org/abs/2211.15279v1 )

ライセンス: Link先を確認

Pengwei Yang, Angel Teng and Jack Mangos

(参考訳) ラベルノイズはディープラーニングモデルの訓練における重大な障害である。これは画像分類モデルの性能に大きな影響を与える可能性があり、特にディープニューラルネットワークはノイズのあるラベルを記憶する傾向が強いため、影響を受けやすい。本稿では,関連するラベルノイズ手法の基礎となる基本概念について検討した。遷移行列推定器を作成し、実際の遷移行列に対するその有効性を示した。さらに,LeNetおよびAlexNetの設計を用いた2つの畳み込みニューラルネットワーク分類器のラベルノイズに対する頑健性を検討した。 2つのFashion-MNISTデータセットにより、両モデルの頑健性が明らかになった。時間と計算資源の制約により複雑な畳み込みニューラルネットワークモデルを適切に調整できなかったため、遷移行列によるノイズ補正が頑健性の向上に与える影響を十分に示すことはできなかった。今後の研究では、ニューラルネットワークモデルを微調整し、推定遷移モデルの精度を調べるためのさらなる取り組みが必要である。

Label noise is a significant obstacle in deep learning model training. It can have a considerable impact on the performance of image classification models, particularly deep neural networks, which are especially susceptible because they have a strong propensity to memorise noisy labels. In this paper, we have examined the fundamental concept underlying related label noise approaches. A transition matrix estimator has been created, and its effectiveness against the actual transition matrix has been demonstrated. In addition, we examined the label noise robustness of two convolutional neural network classifiers with LeNet and AlexNet designs. The two Fashion-MNIST datasets have revealed the robustness of both models. Because time and computing resource constraints prevented us from correctly tuning the complex convolutional neural network model, we were not able to demonstrate effectively the influence of the transition matrix noise correction on robustness enhancements. Additional effort is needed in future research to fine-tune the neural network model and to explore the precision of the estimated transition model.
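The abstract does not say how the transition matrix enters training, so the following is only a sketch of one common use, often called forward correction: the clean class posterior predicted by a classifier is pushed through the transition matrix so that the loss can be computed against the noisy labels. The function name and the row convention (`transition[i][j]` is the probability that clean class `i` is observed as noisy class `j`) are assumptions made for illustration.

```python
def noisy_posterior(clean_probs, transition):
    """Map clean class probabilities to noisy-label probabilities.

    transition[i][j] = P(noisy label = j | clean label = i); each row sums to 1.
    """
    n = len(clean_probs)
    if len(transition) != n or any(len(row) != n for row in transition):
        raise ValueError("transition must be an n x n matrix")
    return [
        sum(clean_probs[i] * transition[i][j] for i in range(n))
        for j in range(n)
    ]
```

With a two-class matrix such as `[[0.8, 0.2], [0.3, 0.7]]`, a classifier that is certain about class 0 is mapped to noisy-label probabilities of 0.8 and 0.2, which is what the noisy training labels would be expected to show.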

翻訳日:2022-11-29 19:41:21 公開日:2022-11-28

# Flow: 動的ルーティングによるインスタンス単位の個人化フェデレーション学習

Flow: Per-Instance Personalized Federated Learning Through Dynamic Routing ( http://arxiv.org/abs/2211.15281v1 )

ライセンス: Link先を確認

Kunjal Panchal, Sunav Choudhary, Hui Guan

(参考訳) フェデレートラーニング(FL)におけるパーソナライゼーションは、協調的に訓練されたグローバルモデルをクライアントごとに修正することを目的としている。 FLにおけるパーソナライズへの現在のアプローチは粒度が粗く、あるクライアントのすべての入力インスタンスが同じパーソナライズされたモデルを使用する。これは、一部のインスタンスは汎化性能の高いグローバルモデルのほうがより正確に扱えるという事実を無視している。この課題に対処するために、本研究は、きめ細かいステートレスなパーソナライズFLアプローチであるFlowを提案する。 Flowは、入力インスタンスがローカルパラメータとグローバルパラメータのどちらを選好するかを判断するルーティング機構を学習することで、動的なパーソナライズモデルを生成する。このようにFlowは、クライアント単位のパーソナライズを活用することに加えて、インスタンス単位のルーティングを導入し、各クライアントにおける精度を向上させる。さらに、Flowはステートレスであるため、クライアントがFLラウンドをまたいでパーソナライズされた状態を保持する必要がない。これにより、Flowは大規模なFL設定で実用的になり、新たに参加したクライアントにも適用しやすくなる。 Stackoverflow、Reddit、EMNISTデータセットでの評価は、FLに対する最先端の非パーソナライズ手法およびクライアント単位のみのパーソナライズ手法よりも、Flowの予測精度が優れていることを示している。

Personalization in Federated Learning (FL) aims to modify a collaboratively trained global model according to each client. Current approaches to personalization in FL are at a coarse granularity, i.e. all the input instances of a client use the same personalized model. This ignores the fact that some instances are more accurately handled by the global model due to better generalizability. To address this challenge, this work proposes Flow, a fine-grained stateless personalized FL approach. Flow creates dynamic personalized models by learning a routing mechanism that determines whether an input instance prefers the local parameters or its global counterpart. Thus, Flow introduces per-instance routing in addition to leveraging per-client personalization to improve accuracies at each client. Further, Flow is stateless which makes it unnecessary for a client to retain its personalized state across FL rounds. This makes Flow practical for large-scale FL settings and friendly to newly joined clients. Evaluations on Stackoverflow, Reddit, and EMNIST datasets demonstrate the superiority in prediction accuracy of Flow over state-of-the-art non-personalized and only per-client personalized approaches to FL.

翻訳日:2022-11-29 19:41:08 公開日:2022-11-28

# QLAMMP:自動マーケットメイキングプロトコルの手数料を最適化するQ学習エージェント

QLAMMP: A Q-Learning Agent for Optimizing Fees on Automated Market Making Protocols ( http://arxiv.org/abs/2211.14977v1 )

ライセンス: Link先を確認

Dev Churiwala, Bhaskar Krishnamachari

(参考訳) AMM(Automated Market Makers)は、分散型金融(DeFi)分野の不可欠な部分としての地位を確立している。 AMMは、中央集権型取引所を必要とせずにユーザが資産を取引できる取引所の一種である。多数の分散型取引所(DEX)の基礎を形成し、オンチェーントークンの迅速かつ効率的な交換を支援する。現在普及しているDEXはすべて静的プロトコルであり、手数料と曲率を固定パラメータで制御しているため、不変であるがゆえに、急速に変化する市場環境に適応できない。この特性により、手に負えない市場の動きによって生じる高スリッページの状況では、トレーダーが離れてしまう可能性がある。本稿では,AMMプロトコル上で徴収される手数料を最適化するRLフレームワークを提案する。特に,マーケットメイキングプロトコルのためのQ学習エージェント(QLAMMP)を開発し,与えられたAMMプロトコルの最適な手数料率とレバレッジ係数を学習し,様々な市場条件下で徴収される期待手数料を最大化する。 QLAMMPは、シミュレートしたすべてのテスト条件下で、静的な対応プロトコルを一貫して上回ることを示す。

Automated Market Makers (AMMs) have cemented themselves as an integral part of the decentralized finance (DeFi) space. AMMs are a type of exchange that allows users to trade assets without the need for a centralized exchange. They form the foundation for numerous decentralized exchanges (DEXs), which help facilitate the quick and efficient exchange of on-chain tokens. All present-day popular DEXs are static protocols, with fixed parameters controlling the fee and the curvature - they suffer from invariance and cannot adapt to quickly changing market conditions. This characteristic may cause traders to stay away during high slippage conditions brought about by intractable market movements. We propose an RL framework to optimize the fees collected on an AMM protocol. In particular, we develop a Q-Learning Agent for Market Making Protocols (QLAMMP) that learns the optimal fee rates and leverage coefficients for a given AMM protocol and maximizes the expected fee collected under a range of different market conditions. We show that QLAMMP is consistently able to outperform its static counterparts under all the simulated test conditions.
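
For readers unfamiliar with Q-learning, the following is a minimal tabular sketch of the generic update rule applied to a discrete set of fee rates. It is not the QLAMMP agent: the state labels, the fee grid, the reward, and the hyperparameters `alpha`, `gamma`, and `epsilon` are hypothetical placeholders.

```python
import random


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Standard Q-learning update on a dict-based Q-table.

    q maps (state, action) pairs to values; missing entries count as 0.
    """
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


def choose_action(q, state, actions, epsilon, rng):
    """Epsilon-greedy choice among candidate fee rates."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))


# Hypothetical usage: two fee levels, reward = fee collected in one step.
fees = [0.001, 0.003]
table = {}
q_update(table, "calm", 0.003, 1.0, "calm", fees, alpha=0.5)
picked = choose_action(table, "calm", fees, epsilon=0.0, rng=random.Random(0))
```

In this toy run the greedy policy picks the fee level whose table entry was just increased; a real agent would need a market simulator to supply states and rewards.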

翻訳日:2022-11-29 19:32:34 公開日:2022-11-28

# 最適スパース回帰木

Optimal Sparse Regression Trees ( http://arxiv.org/abs/2211.14980v1 )

ライセンス: Link先を確認

Rui Zhang, Rui Xin, Margo Seltzer, Cynthia Rudin

(参考訳) 回帰木はAIモデルの最も古い形式の1つであり、その予測は電卓なしで行うことができる。回帰木に関する大規模な文献の中で、問題の計算の難しさから、完全証明可能な最適化への取り組みはほとんどなかった。本研究は,確率的最適スパース回帰木の構築に対する動的プログラミングとバウンドのアプローチを提案する。ラベル集合上の1次元におけるk-平均クラスタリングアルゴリズムの最適解に基づく新しい下界を利用する。数秒で最適なスパースツリーを見つけることがしばしば可能で、大量のサンプルと高い相関性のある機能を含む、挑戦的なデータセットでさえあります。

Regression trees are one of the oldest forms of AI models, and their predictions can be made without a calculator, which makes them broadly useful, particularly for high-stakes applications. Within the large literature on regression trees, there has been little effort towards full provable optimization, mainly due to the computational hardness of the problem. This work proposes a dynamic-programming-with-bounds approach to the construction of provably-optimal sparse regression trees. We leverage a novel lower bound based on an optimal solution to the k-Means clustering algorithm in 1-dimension over the set of labels. We are often able to find optimal sparse trees in seconds, even for challenging datasets that involve large numbers of samples and highly-correlated features.
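
The lower bound mentioned above relies on solving k-Means exactly in one dimension, which is possible because optimal 1D clusters are contiguous in sorted order. The sketch below is a plain O(k n^2) dynamic program for that subproblem. It is an illustration only, not the authors' code, and the function name `kmeans_1d_cost` is an assumption.

```python
def kmeans_1d_cost(values, k):
    """Minimal within-cluster sum of squared errors for 1D k-Means.

    Uses the fact that optimal 1D clusters are contiguous runs of the
    sorted values, so a dynamic program over split points is exact.
    """
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        return 0.0
    k = min(k, n)

    # Prefix sums give the squared error of any contiguous run in O(1).
    prefix = [0.0]
    prefix_sq = [0.0]
    for x in xs:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)

    def sse(i, j):
        """Squared error of xs[i:j] around its mean (requires j > i)."""
        s = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)

    inf = float("inf")
    # best[c][j]: minimal cost of splitting xs[:j] into c clusters.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            best[c][j] = min(best[c - 1][i] + sse(i, j) for i in range(c - 1, j))
    return best[k][n]
```

Because a tree with k leaves partitions the labels into at most k groups, each predicted by its mean, the returned cost cannot exceed the squared error of any such tree on the same labels, which is what makes it usable as a bound.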

翻訳日:2022-11-29 19:32:15 公開日:2022-11-28

# DGI: GNNの簡単かつ効率的な推論

DGI: Easy and Efficient Inference for GNNs ( http://arxiv.org/abs/2211.15082v1 )

ライセンス: Link先を確認

Peiqi Yin, Xiao Yan, Jinjing Zhou, Qiang Fu, Zhenkun Cai, James Cheng, Bo Tang, Minjie Wang

(参考訳) グラフニューラルネットワーク(GNN)を訓練するために多くのシステムが開発されているが、効率的なモデル推論と評価は未解決のままである。例えば、広く採用されているノードワイドアプローチを使用して、モデル評価は、隣の爆発によるエンドツーエンドのトレーニングプロセスにおいて、最大94%の時間を占めることができる。一方、層ワイド推論は、各層に1ホップの隣り合わせしか必要としないように、各層で推論層を実行することによって、隣の爆発問題を回避している。しかし、計算のためにGNNモデルをレイヤーに手動で分解し、ワークロードをデバイスメモリに適合させるためには、バッチに分割する必要があるため、レイヤワイズ推論を実装するにはかなりのエンジニアリング作業が必要である。本稿では,GNNモデルの学習コードを階層的実行のために自動的に翻訳する,簡易かつ効率的なGNNモデル推論システムであるDeep Graph Inference(DGI)を開発する。 DGIはさまざまなGNNモデルとさまざまな種類の推論要求に対して汎用的であり、CPUメモリに収まらない大きなグラフ上でのコア外実行をサポートする。実験の結果、DGIは異なるデータセットとハードウェア設定で階層的推論を一貫して上回り、スピードアップは1000倍以上であることがわかった。

While many systems have been developed to train Graph Neural Networks (GNNs), efficient model inference and evaluation remain to be addressed. For instance, using the widely adopted node-wise approach, model evaluation can account for up to 94% of the time in the end-to-end training process due to neighbor explosion, which means that a node accesses its multi-hop neighbors. On the other hand, layer-wise inference avoids the neighbor explosion problem by conducting inference layer by layer such that the nodes only need their one-hop neighbors in each layer. However, implementing layer-wise inference requires substantial engineering efforts because users need to manually decompose a GNN model into layers for computation and split workload into batches to fit into device memory. In this paper, we develop Deep Graph Inference (DGI) -- a system for easy and efficient GNN model inference, which automatically translates the training code of a GNN model for layer-wise execution. DGI is general for various GNN models and different kinds of inference requests, and supports out-of-core execution on large graphs that cannot fit in CPU memory. Experimental results show that DGI consistently outperforms layer-wise inference across different datasets and hardware settings, and the speedup can be over 1,000x.
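
The contrast between node-wise and layer-wise inference can be made concrete with a toy example. The sketch below computes each layer for all nodes, batch by batch, so that every node only reads its one-hop neighbors from the previous layer's output. It does not use DGI or any GNN library: the mean-aggregation layer and the names `mean_aggregate_layer` and `layerwise_inference` are assumptions for illustration.

```python
def mean_aggregate_layer(features, neighbors, batch_size):
    """One toy GNN layer: average a node's feature with its one-hop neighbors.

    Nodes are processed in batches to mimic fitting work into device memory.
    """
    nodes = sorted(features)
    out = {}
    for start in range(0, len(nodes), batch_size):
        for v in nodes[start:start + batch_size]:
            group = [v] + list(neighbors.get(v, []))
            dim = len(features[v])
            out[v] = [
                sum(features[u][d] for u in group) / len(group)
                for d in range(dim)
            ]
    return out


def layerwise_inference(features, neighbors, num_layers, batch_size=2):
    """Run the whole graph through layer 1, then layer 2, and so on."""
    hidden = features
    for _ in range(num_layers):
        hidden = mean_aggregate_layer(hidden, neighbors, batch_size)
    return hidden


# Hypothetical path graph 0 - 1 - 2 with scalar features.
graph = {0: [1], 1: [0, 2], 2: [1]}
feats = {0: [0.0], 1: [3.0], 2: [6.0]}
result = layerwise_inference(feats, graph, num_layers=2)
```

Each layer touches every edge once, whereas a node-wise scheme would re-expand the multi-hop neighborhood of every target node separately.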

翻訳日:2022-11-29 19:32:05 公開日:2022-11-28

# バイオメディシンにおける会話探索と応用に関する研究

A Survey on Conversational Search and Applications in Biomedicine ( http://arxiv.org/abs/2211.15328v1 )

ライセンス: Link先を確認

Naga Sai Krishna Adatrao, Gowtham Reddy Gadireddy, Jiho Noh

(参考訳) 本稿では,ユーザが情報検索タスクの対話を行う情報検索手法を強化するためのアプローチである会話検索(convsearch)の抜本的な実行方法を提案する。本研究では,ConvSearchシステムにおけるヒューマン・インタラクティブな特徴に着目し,アクション・モジュール,おそらく検索システム,質問応答システム,レコメンダシステムの動作に注目した。動作モジュールとともに、知識ベース、自然言語処理、対話管理システムにおける様々なConvSearch研究問題をラベル付けした。さらに,convsearchの枠組みを分類し,臨床社会技術活用のためのバイオメディカル・医療分野への応用をめざした。最後に,特にバイオメディシンにおけるconvsearchの課題と課題について論じる。我々の主な目的は、さまざまな分野からConvSearchコンポーネントを統合して統合したビジョンを提供することであり、医療システムにおける情報検索のプロセスに役立てることである。

This paper aims to provide a radical rundown on Conversation Search (ConvSearch), an approach to enhance the information retrieval method where users engage in a dialogue for the information-seeking tasks. In this survey, we predominantly focused on the human interactive characteristics of the ConvSearch systems, highlighting the operations of the action modules, likely the Retrieval system, Question-Answering, and Recommender system. We labeled various ConvSearch research problems in knowledge bases, natural language processing, and dialogue management systems along with the action modules. We further categorized the framework to ConvSearch and the application is directed toward biomedical and healthcare fields for the utilization of clinical social technology. Finally, we conclude by talking through the challenges and issues of ConvSearch, particularly in Bio-Medicine. Our main aim is to provide an integrated and unified vision of the ConvSearch components from different fields, which benefit the information-seeking process in healthcare systems.

翻訳日:2022-11-29 19:16:55 公開日:2022-11-28

# Fast-SNARF: 関節付きニューラルフィールドのための高速デフォーマ

Fast-SNARF: A Fast Deformer for Articulated Neural Fields ( http://arxiv.org/abs/2211.15601v1 )

ライセンス: Link先を確認

Xu Chen, Tianjian Jiang, Jie Song, Max Rietmann, Andreas Geiger, Michael J. Black, Otmar Hilliges

(参考訳) ニューラルフィールドは3次元再構成と剛体シーンの新しいビュー合成の領域に革命をもたらした。このような手法を人体などの関節オブジェクトに適用する上で重要な課題は、レストポーズ(正準空間)と変形した空間の間の3D位置の変形をモデル化することである。本研究では, 反復的ルート探索により, 正準空間とポーズ空間の正確な対応を求める, ニューラルフィールドのための新しい調音モジュールFast-SNARFを提案する。 Fast-SNARFは、我々の先行研究であるSNARFと機能面で置き換え可能であり、計算効率は大幅に向上した。我々は,SNARFに対するアルゴリズムおよび実装の改善に寄与し,$150\times$の高速化を実現した。これらの改善には、voxelベースの対応検索、線形ブレンドスキニング関数の事前計算、CUDAカーネルによる効率的なソフトウェア実装が含まれる。Fast-SNARFは、対応のない変形した観察(例えば3Dメッシュ)に対して、形状とスキニング重みの効率的かつ同時最適化を可能にする。変形マップの学習は多くの人間のアバター法において重要な要素であり、Fast-SNARFは計算効率の良い解を提供するので、この研究は3次元仮想人間の実現に向けた重要な一歩であると信じている。

Neural fields have revolutionized the area of 3D reconstruction and novel view synthesis of rigid scenes. A key challenge in making such methods applicable to articulated objects, such as the human body, is to model the deformation of 3D locations between the rest pose (a canonical space) and the deformed space. We propose a new articulation module for neural fields, Fast-SNARF, which finds accurate correspondences between canonical space and posed space via iterative root finding. Fast-SNARF is a drop-in replacement in functionality to our previous work, SNARF, while significantly improving its computational efficiency. We contribute several algorithmic and implementation improvements over SNARF, yielding a speed-up of $150\times$. These improvements include voxel-based correspondence search, pre-computing the linear blend skinning function, and an efficient software implementation with CUDA kernels. Fast-SNARF enables efficient and simultaneous optimization of shape and skinning weights given deformed observations without correspondences (e.g. 3D meshes). Because learning of deformation maps is a crucial component in many 3D human avatar methods and since Fast-SNARF provides a computationally efficient solution, we believe that this work represents a significant step towards the practical creation of 3D virtual humans.

翻訳日:2022-11-29 19:08:31 公開日:2022-11-28

# 多レベル不均質学習による効率的なミラー検出

Efficient Mirror Detection via Multi-level Heterogeneous Learning ( http://arxiv.org/abs/2211.15644v1 )

ライセンス: Link先を確認

Ruozhen He and Jiaying Lin and Rynson W.H. Lau

(参考訳) 高効率なミラー検出ネットワークであるHetNet (Multi-level Heterogeneous Network) を提案する。現在のミラー検出手法は効率よりも性能に重点を置いており、リアルタイムアプリケーション(ドローンなど)を制限する。それらの効率性の欠如は、異なるレベルで同質なモジュールを採用するという共通の設計によって引き起こされる。対照的に、HetNetはまず低レベルの理解(例えば、強度コントラスト)を通じて潜在的なミラー領域を検出し、その後高レベルの理解(例えば、コンテキストの不連続)と組み合わせて予測を確定する。正確かつ効率的なミラー検出を行うため、HetNetは鏡を検出するために異なる段階で特定の情報を取得する効果的なアーキテクチャに従う。さらに,HetNetに搭載するマルチオリエンテーション強度ベースコントラストモジュール (MIC) とリフレクションセマンティック論理モジュール (RSL) を提案し,それぞれ低レベルの理解によるミラー領域の予測と,高レベルの理解によるシナリオにおけるセマンティックロジックの解析を行う。最先端の手法と比較すると、HetNetは664%高速で動作し、2つのミラー検出ベンチマークにおいてMAEで8.9%、IoUで3.1%、F-measureで2.0%の平均性能向上を達成する。

We present HetNet (Multi-level Heterogeneous Network), a highly efficient mirror detection network. Current mirror detection methods focus more on performance than efficiency, limiting the real-time applications (such as drones). Their lack of efficiency is caused by the common design of adopting homogeneous modules at different levels, which ignores the difference between different levels of features. In contrast, HetNet detects potential mirror regions initially through low-level understandings (e.g., intensity contrasts) and then combines with high-level understandings (contextual discontinuity for instance) to finalize the predictions. To perform accurate yet efficient mirror detection, HetNet follows an effective architecture that obtains specific information at different stages to detect mirrors. We further propose a multi-orientation intensity-based contrasted module (MIC) and a reflection semantic logical module (RSL), equipped on HetNet, to predict potential mirror regions by low-level understandings and analyze semantic logic in scenarios by high-level understandings, respectively. Compared to the state-of-the-art method, HetNet runs 664% faster and draws an average performance gain of 8.9% on MAE, 3.1% on IoU, and 2.0% on F-measure on two mirror detection benchmarks.

翻訳日:2022-11-29 19:08:11 公開日:2022-11-28

# OpenScene:オープン語彙による3Dシーン理解

OpenScene: 3D Scene Understanding with Open Vocabularies ( http://arxiv.org/abs/2211.15654v1 )

ライセンス: Link先を確認

Songyou Peng, Kyle Genova, Chiyu "Max" Jiang, Andrea Tagliasacchi, Marc Pollefeys, Thomas Funkhouser

(参考訳) 従来の3Dシーン理解アプローチは、単一のタスクのためにモデルをトレーニングするためのラベル付き3Dデータセットに依存している。私たちは,CLIP機能空間にテキストと画像ピクセルを埋め込んだ3次元シーンポイントの高密度特徴をモデルが予測する代替手法OpenSceneを提案する。このゼロショットアプローチは、タスク非依存のトレーニングとオープン語彙クエリを可能にする。例えば、SOTAゼロショット3Dセマンティックセグメンテーションを実行するには、まず3Dポイント毎にCLIP機能を推論し、後に任意のクラスラベルの埋め込みと類似性に基づいてそれらを分類する。さらに興味深いのは、これまでにないオープン語彙のシーン理解アプリケーションスイートを可能にすることだ。例えば、任意のテキストクエリを入力すると、シーンのどの部分が一致しているかを示すヒートマップが表示される。我々のアプローチは、複雑な3Dシーンにおいて、オブジェクト、材料、余剰、活動、ルームタイプを特定するのに効果的であり、いずれもラベル付き3Dデータなしでトレーニングされた単一のモデルを使用する。

Traditional 3D scene understanding approaches rely on labeled 3D datasets to train a model for a single task with supervision. We propose OpenScene, an alternative approach where a model predicts dense features for 3D scene points that are co-embedded with text and image pixels in CLIP feature space. This zero-shot approach enables task-agnostic training and open-vocabulary queries. For example, to perform SOTA zero-shot 3D semantic segmentation it first infers CLIP features for every 3D point and later classifies them based on similarities to embeddings of arbitrary class labels. More interestingly, it enables a suite of open-vocabulary scene understanding applications that have never been done before. For example, it allows a user to enter an arbitrary text query and then see a heat map indicating which parts of a scene match. Our approach is effective at identifying objects, materials, affordances, activities, and room types in complex 3D scenes, all using a single model trained without any labeled 3D data.
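
The zero-shot classification step described above (compare each point's feature with the embeddings of arbitrary class labels) can be sketched as follows. This is a toy illustration, not OpenScene code: the two-dimensional vectors stand in for real CLIP features, and `classify_points` and `cosine` are hypothetical helper names.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify_points(point_features, label_embeddings):
    """Assign each point the label whose embedding is most similar.

    point_features: list of per-point feature vectors.
    label_embeddings: dict mapping a label name to its text embedding.
    """
    labels = []
    for feature in point_features:
        best = max(
            label_embeddings,
            key=lambda name: cosine(feature, label_embeddings[name]),
        )
        labels.append(best)
    return labels


# Hypothetical 2D stand-ins for text and point embeddings.
text_emb = {"chair": [1.0, 0.0], "table": [0.0, 1.0]}
points = [[0.9, 0.1], [0.2, 0.8]]
predicted = classify_points(points, text_emb)
```

Because the label set is just a dictionary of embeddings, new class names can be added at query time without retraining, which is the sense in which the approach is open-vocabulary. The same similarity scores, kept per point instead of reduced to an argmax, would give the heat map for a free-text query.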

翻訳日:2022-11-29 19:07:42 公開日:2022-11-28

# ドット接続:2レベルクエリを用いたフロアプラン再構築

Connecting the Dots: Floorplan Reconstruction Using Two-Level Queries ( http://arxiv.org/abs/2211.15658v1 )

ライセンス: Link先を確認

Yuanwen Yue, Theodora Kontogianni, Konrad Schindler, Francis Engelmann

(参考訳) 3次元スキャンによる2次元フロアプラン再構成について述べる。既存のアプローチは通常、ヒューリスティックに設計されたマルチステージパイプラインを使用する。代わりに、フロアプラン再構築を単一段階構造予測タスクとして定式化し、可変サイズの多角形の集合を見つけ、これは順序付けられた頂点の可変長列である。そこで本研究では,複数の部屋の多角形を並列に,手作り中間段を使わずに総合的に生成する新しい変圧器アーキテクチャを開発した。モデルには、多角形と角形の2レベルクエリと、ネットワークをエンドツーエンドでトレーニング可能にする多角形マッチングが含まれている。提案手法は,Structured3DとSceneCADという2つの挑戦的データセットに対して新たな最先端性能を達成し、従来の手法よりもはるかに高速な推論を実現する。さらに、セマンティックルームタイプやドアや窓のようなアーキテクチャ要素などの追加情報を予測するために簡単に拡張できる。私たちのコードとモデルは、https://github.com/ywyue/RoomFormer で利用可能になります。

We address 2D floorplan reconstruction from 3D scans. Existing approaches typically employ heuristically designed multi-stage pipelines. Instead, we formulate floorplan reconstruction as a single-stage structured prediction task: find a variable-size set of polygons, which in turn are variable-length sequences of ordered vertices. To solve it we develop a novel Transformer architecture that generates polygons of multiple rooms in parallel, in a holistic manner without hand-crafted intermediate stages. The model features two-level queries for polygons and corners, and includes polygon matching to make the network end-to-end trainable. Our method achieves a new state-of-the-art for two challenging datasets, Structured3D and SceneCAD, along with significantly faster inference than previous methods. Moreover, it can readily be extended to predict additional information, i.e., semantic room types and architectural elements like doors and windows. Our code and models will be available at: https://github.com/ywyue/RoomFormer.

翻訳日:2022-11-29 19:07:24 公開日:2022-11-28

# Satlas: リモートセンシング画像理解のための大規模マルチタスクデータセット

Satlas: A Large-Scale, Multi-Task Dataset for Remote Sensing Image Understanding ( http://arxiv.org/abs/2211.15660v1 )

ライセンス: Link先を確認

Favyen Bastani and Piper Wolters and Ritwik Gupta and Joe Ferdinando and Aniruddha Kembhavi

(参考訳) リモートセンシング画像は、森林伐採の追跡、違法漁業、都市の拡張、自然災害など、様々な環境・地球モニタリング作業に有用である。地球は極めて多様で、リモートセンシング画像における潜在的タスクの量は膨大であり、特徴の大きさは数kmから数十cm程度である。しかしながら、汎用的なコンピュータビジョン手法を作成することは、多くのタスクのためにこれらの多様な特徴をキャプチャする大規模なデータセットが欠如していることによる課題である。本稿では,上述したすべてのアプリケーションと,137のカテゴリと7つのラベルモダリティを持つ290mのラベルを含むスケールを特徴とする,リモートセンシングデータセットとベンチマークであるsatlasを提案する。我々は8つのベースラインと提案手法をsatlas上で評価し,リモートセンシングに特有の研究課題に対して,非常に異なる種類のセンサからのイメージからなる画像時系列の処理や,長距離空間コンテキストの活用など,改善の余地があることを見出した。また,satlasでの事前トレーニングでは,ラベル付き例が少なく,下流タスクのパフォーマンスが大幅に向上し,imagenetでは平均精度が16%向上し,次回のベストベースラインでは5%向上した。

Remote sensing images are useful for a wide variety of environmental and earth monitoring tasks, including tracking deforestation, illegal fishing, urban expansion, and natural disasters. The earth is extremely diverse -- the amount of potential tasks in remote sensing images is massive, and the sizes of features range from several kilometers to just tens of centimeters. However, creating generalizable computer vision methods is a challenge in part due to the lack of a large-scale dataset that captures these diverse features for many tasks. In this paper, we present Satlas, a remote sensing dataset and benchmark that is large in both breadth, featuring all of the aforementioned applications and more, as well as scale, comprising 290M labels under 137 categories and seven label modalities. We evaluate eight baselines and a proposed method on Satlas, and find that there is substantial room for improvement in addressing research challenges specific to remote sensing, including processing image time series that consist of images from very different types of sensors, and taking advantage of long-range spatial context. We also find that pre-training on Satlas substantially improves performance on downstream tasks with few labeled examples, increasing average accuracy by 16% over ImageNet and 5% over the next best baseline.

翻訳日:2022-11-29 19:07:07 公開日:2022-11-28

# Pseudo-multi-view Optimization による高忠実度3D GANインバージョン

High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization ( http://arxiv.org/abs/2211.15662v1 )

ライセンス: Link先を確認

Jiaxin Xie, Hao Ouyang, Jingtan Piao, Chenyang Lei, Qifeng Chen

(参考訳) 入力画像の特定の詳細を保存しながら、フォトリアリスティックな新規ビューを合成できる高忠実な3D生成逆ネットワーク(GAN)インバージョンフレームワークを提案する。高忠実度3D GANインバージョンは、3Dインバージョンにおける幾何学的・テクスチャ的トレードオフのため本質的に困難である。この課題を解決するために,視覚分析を用いた擬似マルチビュー推定に基づく新しいパイプラインを提案する。目に見える部分の原文のテクスチャを保ち、隠された部分の生成前文を利用する。広範な実験により,本手法は分散テクスチャを有する画像においても,最先端手法よりも有利な再構成と新しいビュー合成品質を実現することが示された。提案するパイプラインでは、反転した潜在コードと3d対応テクスチャによるイメージ属性編集も可能である。提案手法は,1枚の画像から高忠実度3Dレンダリングを可能にし,AI生成3Dコンテンツの様々な応用に期待できる。

We present a high-fidelity 3D generative adversarial network (GAN) inversion framework that can synthesize photo-realistic novel views while preserving specific details of the input image. High-fidelity 3D GAN inversion is inherently challenging due to the geometry-texture trade-off in 3D inversion, where overfitting to a single view input image often damages the estimated geometry during the latent optimization. To solve this challenge, we propose a novel pipeline that builds on the pseudo-multi-view estimation with visibility analysis. We keep the original textures for the visible parts and utilize generative priors for the occluded parts. Extensive experiments show that our approach achieves advantageous reconstruction and novel view synthesis quality over state-of-the-art methods, even for images with out-of-distribution textures. The proposed pipeline also enables image attribute editing with the inverted latent code and 3D-aware texture modification. Our approach enables high-fidelity 3D rendering from a single image, which is promising for various applications of AI-generated 3D content.

翻訳日:2022-11-29 19:06:44 公開日:2022-11-28

# ハンドオブジェクトインタラクション画像生成

Hand-Object Interaction Image Generation ( http://arxiv.org/abs/2211.15663v1 )

ライセンス: Link先を確認

Hezhen Hu, Weilun Wang, Wengang Zhou, Houqiang Li

(参考訳) 本研究では,与えられた手,対象,およびそのインタラクション状態の下でのハンドオブジェクトイメージの条件付き生成を目的とした,新たなタスクであるハンドオブジェクトインタラクション画像生成に焦点をあてる。このタスクは、AR/VRゲームやオンラインショッピングなど、多くの潜在的なアプリケーションシナリオにおいて、挑戦的で研究に値するものです。この問題に対処するために,表現型モデル認識ハンドオブジェクト表現を利用した新しいHOGANフレームワークを提案し,その固有のトポロジを活用して統一表面空間を構築する。この空間では、相互作用中の複雑な自己と相互の閉塞を明示的に考慮する。最終的な画像合成では,手と対象の異なる特性を検討し,対象画像の分割合成を行う。評価のために,生成画像の忠実性と構造保存の両方を評価するための包括的なプロトコルを構築する。 HO3Dv3とDexYCBという2つの大規模データセットに対する大規模な実験は、我々のフレームワークの有効性と優位性を定量的かつ質的に実証している。プロジェクトページはhttps://play-with-hoi-generation.github.io/で閲覧できる。

In this work, we are dedicated to a new task, i.e., hand-object interaction image generation, which aims to conditionally generate the hand-object image under the given hand, object and their interaction status. This task is challenging and research-worthy in many potential application scenarios, such as AR/VR games and online shopping. To address this problem, we propose a novel HOGAN framework, which utilizes the expressive model-aware hand-object representation and leverages its inherent topology to build the unified surface space. In this space, we explicitly consider the complex self- and mutual occlusion during interaction. During final image synthesis, we consider different characteristics of hand and object and generate the target image in a split-and-combine manner. For evaluation, we build a comprehensive protocol to assess both the fidelity and structure preservation of the generated image. Extensive experiments on two large-scale datasets, i.e., HO3Dv3 and DexYCB, demonstrate the effectiveness and superiority of our framework both quantitatively and qualitatively. The project page is available at https://play-with-hoi-generation.github.io/.

翻訳日:2022-11-29 19:06:24 公開日:2022-11-28

# RankDNN: Few-shot学習のためのランク学習

RankDNN: Learning to Rank for Few-shot Learning ( http://arxiv.org/abs/2211.15320v1 )

ライセンス: Link先を確認

Qianyu Guo, Hongtong Gong, Xujun Wei, Yanwei Fu, Weifeng Ge, Yizhou Yu, Wenqiang Zhang

(参考訳) 本稿では、画像検索の関連性ランキングをバイナリランキング関係分類として活用する、新しい数ショット学習パイプラインを提案する。画像分類と比較して、ランキング関係分類は標本効率が高く、領域非依存である。さらに、少数ショット学習に対する新しい視点を提供し、最先端の手法を補完する。我々のディープニューラルネットワークのコアコンポーネントは単純なMLPで、2つのベクトルクロネッカー積の差分として符号化された画像三重項を入力として、バイナリ関連ランキングを出力する。提案されたRankMLPは最先端の機能抽出器の上に構築することができ、我々のディープニューラルネットワーク全体をranking deep neural network、またはRankDNNと呼ぶ。一方 RankDNN は他の後処理手法と柔軟に融合することができる。メタテスト中、RankDNNは、クエリサンプルとの類似度に応じてサポートイメージをランク付けし、各クエリサンプルには最近傍のクラスラベルが割り当てられる。実験により、RankDNNは様々なバックボーンに基づくベースラインのパフォーマンスを効果的に改善できることが示され、miniImageNet、tieredImageNet、Caltech-UCSD Birds、CIFAR-FSを含む複数の少数ショット学習ベンチマークで、以前の最先端アルゴリズムを上回っている。さらに、クロスドメインチャレンジに関する実験では、RankDNNの優れた転送性が実証されている。

This paper introduces a new few-shot learning pipeline that casts relevance ranking for image retrieval as binary ranking relation classification. In comparison to image classification, ranking relation classification is sample efficient and domain agnostic. Besides, it provides a new perspective on few-shot learning and is complementary to state-of-the-art methods. The core component of our deep neural network is a simple MLP, which takes as input an image triplet encoded as the difference between two vector-Kronecker products, and outputs a binary relevance ranking order. The proposed RankMLP can be built on top of any state-of-the-art feature extractors, and our entire deep neural network is called the ranking deep neural network, or RankDNN. Meanwhile, RankDNN can be flexibly fused with other post-processing methods. During the meta test, RankDNN ranks support images according to their similarity with the query samples, and each query sample is assigned the class label of its nearest neighbor. Experiments demonstrate that RankDNN can effectively improve the performance of its baselines based on a variety of backbones and it outperforms previous state-of-the-art algorithms on multiple few-shot learning benchmarks, including miniImageNet, tieredImageNet, Caltech-UCSD Birds, and CIFAR-FS. Furthermore, experiments on the cross-domain challenge demonstrate the superior transferability of RankDNN. The code is available at: https://github.com/guoqianyu-alberta/RankDNN.

翻訳日:2022-11-29 18:59:17 公開日:2022-11-28

# 学ぶこと:人間と機械を継続的に教育する方法

Learning to Learn: How to Continuously Teach Humans and Machines ( http://arxiv.org/abs/2211.15470v1 )

ライセンス: Link先を確認

Parantak Singh, You Li, Ankur Sikarwar, Weixian Lei, Daniel Gao, Morgan Bruce Talbot, Ying Sun, Mike Zheng Shou, Gabriel Kreiman, Mengmi Zhang

(参考訳) 我々の教育システムは一連のカリキュラムで構成されている。例えば、学校で数学を学ぶとき、加算から乗算へ、そして後に積分へ順に学習する。人間または機械に教えるためのカリキュラムを記述することは、初期のタスクから後のタスクへのポジティブな知識伝達を最大化し、初期のタスクの忘れを最小化するという基本的な目標を共有します。そこで我々は,アルゴリズムが連続的なデータストリームから一度に1つのクラスを学習しなければならないクラスインクリメンタルセッティングにおいて,カリキュラムが既存の継続学習アルゴリズムに与える影響を網羅的に調査した。我々は,可能な級数(カリキュラム)の幅の広い範囲において,キュリキュラは情報の保持に影響を与え,この効果は確率性の産物ではないことを見出した。さらに, 自動カリキュラム設計への取り組みとして, クラス間特徴類似度に基づいて, 効果的なカリキュラムを設計・ランク付けする手法を提案する。実測値と実測値との比較を行い,両者の間に有意な重複が認められた。カリキュラム設計者の研究を支援するために,人間心理物理学実験を行い,物体認識における新しい連続学習ベンチマークを作成した。我々は人間と機械の効果的なカリキュラムにおける合意度を評価した。驚いたことに、我々のカリキュラムデザイナーは、人間の学習に有効な最適なカリキュラムセットを予測できた。カリキュラムデザインには、タイムリーな学生のフィードバックや複数のモダリティによる学習など、多くの考慮事項がある。私たちの研究は、人間や機械に継続的な学習を教えることの課題に取り組むための、コミュニティの標準フレームワークを設定する最初の試みである。

Our education system comprises a series of curricula. For example, when we learn mathematics at school, we learn in order from addition, to multiplication, and later to integration. Delineating a curriculum for teaching either a human or a machine shares the underlying goal of maximizing the positive knowledge transfer from early to later tasks and minimizing forgetting of the early tasks. Here, we exhaustively surveyed the effect of curricula on existing continual learning algorithms in the class-incremental setting, where algorithms must learn classes one at a time from a continuous stream of data. We observed that across a breadth of possible class orders (curricula), curricula influence the retention of information and that this effect is not just a product of stochasticity. Further, as a primary effort toward automated curriculum design, we proposed a method capable of designing and ranking effective curricula based on inter-class feature similarities. We compared the predicted curricula against empirically determined effectual curricula and observed significant overlaps between the two. To support the study of a curriculum designer, we conducted a series of human psychophysics experiments and contributed a new Continual Learning benchmark in object recognition. We assessed the degree of agreement in effective curricula between humans and machines. Surprisingly, our curriculum designer successfully predicts an optimal set of curricula that is effective for human learning. There are many considerations in curriculum design, such as timely student feedback and learning with multiple modalities. Our study is the first attempt to set a standard framework for the community to tackle the problem of teaching humans and machines to learn to learn continuously.

翻訳日:2022-11-29 18:58:35 公開日:2022-11-28

# エッジスパース埋め込みを用いた教師なしスーパーピクセル生成

Unsupervised Superpixel Generation using Edge-Sparse Embedding ( http://arxiv.org/abs/2211.15474v1 )

ライセンス: Link先を確認

Jakob Geusen, Gustav Bredell, Tianfei Zhou, Ender Konukoglu

(参考訳) ピクセルの類似性に基づいて、画像をスーパーピクセルに分割することで、色や空間的位置などの特徴から、データの複雑さを大幅に削減し、その後の画像処理タスクを改善することができる。教師なしスーパーピクセル生成の初期アルゴリズムは、任意のものよりも重要なエッジを優先することなく、局所的なキューにのみ依存していた。一方で、教師なし深層学習に基づく最近の手法では、スーパーピクセルエッジの付着とコンパクト性の間のトレードオフを適切に解決できなかったり、生成されたスーパーピクセル数を制御できなかったりしている。非畳み込み画像デコーダでは、強い空間相関を持つランダムな画像を入力として使用することにより、期待されるコントラスト数を削減し、再構成された画像にスムーズで接続されたエッジを強制することができる。デコーダの最後の隠れ層から断片的なスムースアクティベーションマップに追加の空間情報をエンコードしてエッジスパース画素埋め込みを生成し、標準クラスタリングアルゴリズムを用いて高品質のスーパーピクセルを抽出する。提案手法はbsds500,pascal-context,顕微鏡データセットにおいて最先端の性能を実現する。

Partitioning an image into superpixels based on the similarity of pixels with respect to features such as colour or spatial location can significantly reduce data complexity and improve subsequent image processing tasks. Initial algorithms for unsupervised superpixel generation solely relied on local cues without prioritizing significant edges over arbitrary ones. On the other hand, more recent methods based on unsupervised deep learning either fail to properly address the trade-off between superpixel edge adherence and compactness or lack control over the generated number of superpixels. By using random images with strong spatial correlation as input, i.e., blurred noise images, in a non-convolutional image decoder we can reduce the expected number of contrasts and enforce smooth, connected edges in the reconstructed image. We generate edge-sparse pixel embeddings by encoding additional spatial information into the piece-wise smooth activation maps from the decoder's last hidden layer and use a standard clustering algorithm to extract high quality superpixels. Our proposed method reaches state-of-the-art performance on the BSDS500, PASCAL-Context and a microscopy dataset.

翻訳日:2022-11-29 18:58:10 公開日:2022-11-28

# 推定時間における時間優先性を利用した物体検出におけるオブジェクトの永続性

Object Permanence in Object Detection Leveraging Temporal Priors at Inference Time ( http://arxiv.org/abs/2211.15505v1 )

ライセンス: Link先を確認

Michael Fürst, Priyash Bhugra, René Schuster, Didier Stricker

(参考訳) オブジェクト永続性(object permanence)は、オブジェクトが物理的世界で突然消滅しないという概念である。人間はこの概念を若いころに理解し、それが一時的に隠されているにもかかわらず、他の人がいることを知っている。現在、ニューラルネットワークはこの課題に苦戦している。そこで本研究では,粒子フィルタからインスピレーションを得た2段階検出手法を提案する。基本的には,従来のフレームの予測を,現在のフレームを推定時に追加提案として使用する。実験では、計算オーバーヘッドが少なく、最大10.3 mAPで検出性能を向上させるフィードバックループを確認する。本手法は,重閉塞下においても安定かつ信頼性の高い2段階検出装置の拡張に適している。さらに、既存のモデルを再トレーニングすることなく、このメソッドを適用できることは、現実世界のタスクで幅広いアプリケーションを実現する。

Object permanence is the concept that objects do not suddenly disappear in the physical world. Humans understand this concept at young ages and know that another person is still there, even though it is temporarily occluded. Neural networks currently often struggle with this challenge. Thus, we introduce explicit object permanence into two-stage detection approaches, drawing inspiration from particle filters. At the core, our detector uses the predictions of previous frames as additional proposals for the current one at inference time. Experiments confirm that the feedback loop improves detection performance by up to 10.3 mAP with little computational overhead. Our approach is suited to extend two-stage detectors for stabilized and reliable detections even under heavy occlusion. Additionally, the ability to apply our method without retraining an existing model promises wide application in real-world tasks.

翻訳日:2022-11-29 18:57:08 公開日:2022-11-28

# DQ-DETR: フレーズ抽出とグラウンド化のためのデュアルクエリ検出変換器

DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding ( http://arxiv.org/abs/2211.15516v1 )

ライセンス: Link先を確認

Shilong Liu, Yaoyuan Liang, Feng Li, Shijia Huang, Hao Zhang, Hang Su, Jun Zhu, Lei Zhang

(参考訳) 本稿では,句抽出と接地(PEG)の両方を考慮した視覚的接地の問題について検討する。以前のフレーズ-既知の設定とは対照的に、PEGはテキストからフレーズを抽出し、画像からオブジェクトを同時に見つけ出すモデルを必要とする。句抽出を1Dテキストセグメンテーション問題と見なすことができるため、PEGを二重検出問題として定式化し、オブジェクト予測とフレーズマスク予測のための画像とテキストの異なる特徴を探索するDQ-DETRモデルを提案する。各2つのクエリは、共有の位置部分と、異なるコンテンツ部分を持つように設計されている。このような設計は(単一のクエリ設計とは対照的に)画像とテキスト間のモダリティアライメントの難しさを効果的に軽減し、トランスフォーマーデコーダにフレーズマスクによる注意を活用させ、パフォーマンスを向上させる。 PEGの性能を評価するため,物体検出におけるAP測定値に類似した新しい測定基準CMAP(クロスモーダル平均精度)を提案する。新しいメトリックは、フレーズグラウンドで多ボックスから一フレーズのケースでRecall@1の曖昧さを克服する。その結果、PEGで事前訓練したDQ-DETRは、ResNet-101バックボーンを持つ全てのビジュアルグラウンドベンチマークに対して、新しい最先端の結果を確立する。例えば、RefCOCO testAとtestBのリコールレートで91.04%と83.51%をResNet-101バックボーンで達成している。コードは https://github.com/IDEA-Research/DQ-DETR で利用可能になる。

In this paper, we study the problem of visual grounding by considering both phrase extraction and grounding (PEG). In contrast to the previous phrase-known-at-test setting, PEG requires a model to extract phrases from text and locate objects from images simultaneously, which is a more practical setting in real applications. As phrase extraction can be regarded as a 1D text segmentation problem, we formulate PEG as a dual detection problem and propose a novel DQ-DETR model, which introduces dual queries to probe different features from image and text for object prediction and phrase mask prediction. Each pair of dual queries is designed to have shared positional parts but different content parts. Such a design effectively alleviates the difficulty of modality alignment between image and text (in contrast to a single query design) and empowers Transformer decoder to leverage phrase mask-guided attention to improve performance. To evaluate the performance of PEG, we also propose a new metric CMAP (cross-modal average precision), analogous to the AP metric in object detection. The new metric overcomes the ambiguity of Recall@1 in many-box-to-one-phrase cases in phrase grounding. As a result, our PEG pre-trained DQ-DETR establishes new state-of-the-art results on all visual grounding benchmarks with a ResNet-101 backbone. For example, it achieves 91.04% and 83.51% in terms of recall rate on RefCOCO testA and testB with a ResNet-101 backbone. Code will be available at https://github.com/IDEA-Research/DQ-DETR.

翻訳日:2022-11-29 18:56:57 公開日:2022-11-28

# マルチターゲットマルチカメラ車両追跡のためのグラフ畳み込みネットワーク

Graph Convolutional Network for Multi-Target Multi-Camera Vehicle Tracking ( http://arxiv.org/abs/2211.15538v1 )

ライセンス: Link先を確認

Elena Luna, Juan Carlos San Miguel, José María Martínez, and Marcos Escudero-Viñolo

(参考訳) このレターはマルチターゲットマルチカメラ車両追跡のタスクに焦点を当てている。グラフ畳み込みネットワークを訓練することにより,シングルカメラの軌跡をマルチカメラのグローバル軌跡に関連付けることを提案する。当社のアプローチは,グローバルソリューションを提供するすべてのカメラを同時に処理すると同時に,大規模カメラの非同期化にも堅牢です。さらに,クラス不均衡に対処する新たな損失関数を設計する。提案手法は,比較手法と異なり,アドホックな手動アノテーションやしきい値を必要としない,より優れた一般化を示す。

This letter focuses on the task of Multi-Target Multi-Camera vehicle tracking. We propose to associate single-camera trajectories into multi-camera global trajectories by training a Graph Convolutional Network. Our approach simultaneously processes all cameras, providing a global solution, and it is also robust to large desynchronization between cameras. Furthermore, we design a new loss function to deal with class imbalance. Our proposal outperforms the related work, showing better generalization without requiring ad-hoc manual annotations or thresholds, unlike the compared approaches.

翻訳日:2022-11-29 18:56:28 公開日:2022-11-28

# 幾何学的アライメントに基づくリアルタイムFewshotポートレートスティル化

Realtime Fewshot Portrait Stylization Based On Geometric Alignment ( http://arxiv.org/abs/2211.15549v1 )

ライセンス: Link先を確認

Xinrui Wang, Zhuoru Li, Xiao Zhou, Yusuke Iwasawa, Yutaka Matsuo

(参考訳) 本稿では,リアルタイムモバイルアプリケーション用にデザインしたポートレートスタイライゼーション手法を提案する。従来の学習に基づくスタイライゼーション手法では、ポートレートドメインとスタイルドメインの間の幾何学的および意味的なギャップに苦しむため、ポートレートイメージに正しく転送されるスタイル情報が妨げられ、スタイライゼーションの品質が低下する。本稿では,人間の顔属性の幾何学的前置に基づいて,幾何学的アライメントを用いてこの問題に取り組むことを提案する。まず,TPS(Thin-Plate-Spline)をジェネレータネットワーク内の特徴マップに加え,画素空間のスタイル画像に直接適用し,同一のランドマークと整列したポートレートスタイルの画像ペアを生成し,二つの領域間の幾何学的ギャップを埋める。第2に、敵対的学習は、ポートレートイメージのテクスチャと色をスタイルドメインにマッピングする。最後に、幾何認識サイクル一貫性はコンテンツとアイデンティティ情報を不変に保存し、変形不変制約はアーティファクトと歪みを抑制する。定性的かつ定量的な比較により,提案手法は既存手法よりも優れており,実験により,モバイル端末上でのリアルタイム(40FPS以上)の限られたスタイルの例(100以下)で学習できることを示した。アブレーション研究はフレームワークの各コンポーネントの有効性を示す。

This paper presents a portrait stylization method designed for real-time mobile applications with limited style examples available. Previous learning-based stylization methods suffer from the geometric and semantic gaps between the portrait domain and the style domain, which prevent the style information from being correctly transferred to the portrait images and lead to poor stylization quality. Based on the geometric prior of human facial attributes, we propose to utilize geometric alignment to tackle this issue. Firstly, we apply Thin-Plate-Spline (TPS) on feature maps in the generator network and also directly to style images in pixel space, generating aligned portrait-style image pairs with identical landmarks, which closes the geometric gaps between the two domains. Secondly, adversarial learning maps the textures and colors of portrait images to the style domain. Finally, geometry-aware cycle consistency preserves the content and identity information unchanged, and a deformation-invariant constraint suppresses artifacts and distortions. Qualitative and quantitative comparisons validate that our method outperforms existing methods, and experiments show that our method can be trained with limited style examples (100 or fewer) and run in real time (more than 40 FPS) on mobile devices. An ablation study demonstrates the effectiveness of each component in the framework.

翻訳日:2022-11-29 18:56:20 公開日:2022-11-28

# VLTinT:コヒーレントビデオパラグラフキャプションのための視覚言語変換器

VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning ( http://arxiv.org/abs/2211.15103v1 )

ライセンス: Link先を確認

Kashu Yamazaki, Khoa Vo, Sang Truong, Bhiksha Raj, Ngan Le

(参考訳) ビデオパラグラフキャプションは、コヒーレントなストーリーテリングにおいて、複数の時間的イベントロケーションを持つ未トリミングビデオのマルチセンテンス記述を作成することを目的としている。視覚と言語による相互影響の下で視覚成分(例えば、人間、動物)と非視覚成分(例えば、行動、関係)に分解してシーンを効果的に理解する人間の知覚過程に従い、まず視覚言語(vl)特徴を提案する。提案したVL機能では、シーンを3つのモードでモデル化する。 (i)グローバルな視覚環境 (ii) 局所視覚メインエージェント (三)言語シーン要素。次に,ビデオ内およびイベント間コンテンツの意味的コヒーレンスを同時に捉えるために,自己回帰トランスフォーマ(tint)を導入する。最後に,字幕のセマンティクスに適合する学習型埋め込み機能を保証するために,新たなVLコントラスト損失関数を提案する。 ActivityNet CaptionsとYouCookIIデータセットに関する包括的な実験と大規模なアブレーション研究は、提案されたVisual-Linguistic Transformer-in-Transform (VLTinT)が、精度と多様性に関する最先端の手法よりも優れていることを示している。

Video paragraph captioning aims to generate a multi-sentence description of an untrimmed video with several temporal event locations in coherent storytelling. Following the human perception process, where the scene is effectively understood by decomposing it into visual (e.g. human, animal) and non-visual components (e.g. action, relations) under the mutual influence of vision and language, we first propose a visual-linguistic (VL) feature. In the proposed VL feature, the scene is modeled by three modalities including (i) a global visual environment; (ii) local visual main agents; (iii) linguistic scene elements. We then introduce an autoregressive Transformer-in-Transformer (TinT) to simultaneously capture the semantic coherence of intra- and inter-event contents within a video. Finally, we present a new VL contrastive loss function to guarantee that the learnt embedding features are matched with the caption semantics. Comprehensive experiments and extensive ablation studies on the ActivityNet Captions and YouCookII datasets show that the proposed Visual-Linguistic Transformer-in-Transformer (VLTinT) outperforms prior state-of-the-art methods on accuracy and diversity.

翻訳日:2022-11-29 18:50:41 公開日:2022-11-28

# 潜在距離学習を用いた半教師付きバイナリ分類

Semi-supervised binary classification with latent distance learning ( http://arxiv.org/abs/2211.15153v1 )

ライセンス: Link先を確認

Imam Mustafa Kamal and Hyerim Bae

(参考訳) バイナリ分類(BC)は、バイオメディカル診断における健康・不健康な物体の識別や、製造検査における欠陥・非欠陥製品など、現実的な問題においてユビキタスな実践課題である。それでも、この問題を効果的に解決するために、完全に注釈付きデータが必要であり、ドメインの専門家による収集は退屈で高価な手順である。 BCとは対照的に、確率的データ拡張技術に大きく依存するいくつかの重要な半教師付き学習技術が、マルチクラス分類の解決のために考案された。本研究では, 正と負のサンプルを厳密に区別する重要な特徴を省略できるため, 確率的データ拡張手法は典型的な bc 問題の解法には適さないことを示す。そこで本研究では,ランダムなkペア間学習機構を持つラベルを用いて,bc問題を解くための新しい学習表現を提案する。まず、いくつかのラベル付きサンプルを利用することで、エンコーダネットワークは、角空間における正と負のサンプルの投影を学習し、クラス間距離とクラス内距離を最大化し、最小化する。第2に、分類器は、角空間とラベル付きサンプルに基づいて生成されたオンザフライラベルを用いて正と負のサンプルを判別し、bcタスクを解決する。大規模な実験は4つのBCデータセットを用いて実施された。ラベルが少なく、データ拡張技術がないため、提案手法は最先端の半教師あり自己教師あり学習法より優れていた。さらに,10%のラベル付けにより,完全教師付き設定と比較して,半教師付き分類器が競争精度を得ることができた。

Binary classification (BC) is a practical task that is ubiquitous in real-world problems, such as distinguishing healthy and unhealthy objects in biomedical diagnostics and defective and non-defective products in manufacturing inspections. Nonetheless, fully annotated data are commonly required to effectively solve this problem, and their collection by domain experts is a tedious and expensive procedure. In contrast to BC, several significant semi-supervised learning techniques that heavily rely on stochastic data augmentation techniques have been devised for solving multi-class classification. In this study, we demonstrate that the stochastic data augmentation technique is less suitable for solving typical BC problems because it can omit crucial features that strictly distinguish between positive and negative samples. To address this issue, we propose a new learning representation to solve the BC problem using a few labels with a random k-pair cross-distance learning mechanism. First, by harnessing a few labeled samples, the encoder network learns the projection of positive and negative samples in angular spaces to maximize and minimize their inter-class and intra-class distances, respectively. Second, the classifier learns to discriminate between positive and negative samples using on-the-fly labels generated based on the angular space and labeled samples to solve BC tasks. Extensive experiments were conducted using four real-world publicly available BC datasets. With few labels and without any data augmentation techniques, the proposed method outperformed state-of-the-art semi-supervised and self-supervised learning methods. Moreover, with 10% labeling, our semi-supervised classifier could obtain competitive accuracy compared with a fully supervised setting.
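
The angular-space objective described in this abstract (large inter-class distance, small intra-class distance) can be illustrated with a minimal sketch. It assumes that "angular distance" means the angle between two embedding vectors; the function names and the toy vectors are hypothetical and are not taken from the paper, whose encoder and k-pair sampling are not reproduced here.

```python
import math
from itertools import combinations, product


def angular_distance(u, v):
    """Angle in radians between two non-zero embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos)


def mean_class_distances(positives, negatives):
    """Return (mean intra-class, mean inter-class) angular distance.

    Assumes each class holds at least two embeddings.
    """
    intra_pairs = list(combinations(positives, 2)) + list(combinations(negatives, 2))
    inter_pairs = list(product(positives, negatives))
    intra = sum(angular_distance(u, v) for u, v in intra_pairs) / len(intra_pairs)
    inter = sum(angular_distance(u, v) for u, v in inter_pairs) / len(inter_pairs)
    return intra, inter


# Toy embeddings (hypothetical): a well-separated encoder gives intra < inter.
positives = [[1.0, 0.0], [1.0, 0.1]]
negatives = [[0.0, 1.0], [0.1, 1.0]]
intra, inter = mean_class_distances(positives, negatives)
```

Under this reading, training would push `inter` up and `intra` down; how the paper combines the two terms into a loss is not stated in the abstract.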

翻訳日:2022-11-29 18:50:19 公開日:2022-11-28

# ロバストモデル非依存メタラーニングにおけるショット数の再考

Rethinking the Number of Shots in Robust Model-Agnostic Meta-Learning ( http://arxiv.org/abs/2211.15180v1 )

ライセンス: Link先を確認

Xiaoyue Duan, Guoliang Kang, Runqi Wang, Shumin Han, Song Xue, Tian Wang, Baochang Zhang

(参考訳) モデルに依存しないロバストなメタラーニング(maml)は、通常、少数の例しか持たない新しいクラスに素早く適応するかもしれないメタモデルのトレーニングに採用され、一方で敵の攻撃に対して頑健である。従来のロバストMAMLの解決策は、メタトレーニング段階におけるロバストネスプロモーティング正規化の導入である。このような規則化により、従来の頑健なMAML手法は、トレーニングショットの数とテストショットの数とを一致させ、最適な適応性能を達成するための典型的なMAML手法に従う。しかし、ロバスト性は大幅に改善されるが、従来の方法はクリーンな精度を犠牲にしている。本稿では,MAMLにロバストネス・プロモーティング・正規化を導入することで,クリーンなサンプル特徴の固有次元が減少し,クリーンな表現能力が低下することを示す。これは、従来の堅牢なMAMLメソッドのクリーンな精度が著しく低下する理由を説明できるかもしれない。この観察に基づいて,ロバスト性向上正規化に起因する内在的次元の損失を軽減するため,訓練ショット数の増加という単純な戦略を提案する。本手法は単純ではあるが,ロバスト性を損なうことなくMAMLのクリーンな精度を著しく向上させ,ロバストで高精度なモデルを生成する。広範な実験により,本手法は精度とロバスト性とのトレードオフをよりよく達成する上で,先行技術よりも優れていることが示された。また,本手法は,メタトレーニング中の微調整ステップ数に対する感度が低く,トレーニング効率を向上させるための微調整ステップ数が少なくなることも確認した。

Robust Model-Agnostic Meta-Learning (MAML) is usually adopted to train a meta-model that can quickly adapt to novel classes with only a few exemplars while remaining robust to adversarial attacks. The conventional solution for robust MAML is to introduce robustness-promoting regularization during the meta-training stage. With such a regularization, previous robust MAML methods simply follow the typical MAML practice that the number of training shots should match the number of test shots to achieve optimal adaptation performance. However, although the robustness can be largely improved, previous methods sacrifice a great deal of clean accuracy. In this paper, we observe that introducing robustness-promoting regularization into MAML reduces the intrinsic dimension of clean sample features, which results in a lower capacity of clean representations. This may explain why the clean accuracy of previous robust MAML methods drops severely. Based on this observation, we propose a simple strategy, i.e., increasing the number of training shots, to mitigate the loss of intrinsic dimension caused by robustness-promoting regularization. Though simple, our method remarkably improves the clean accuracy of MAML without much loss of robustness, producing a robust yet accurate model. Extensive experiments demonstrate that our method outperforms prior art in achieving a better trade-off between accuracy and robustness. Besides, we observe that our method is less sensitive to the number of fine-tuning steps during meta-training, which allows for a reduced number of fine-tuning steps to improve training efficiency.

翻訳日:2022-11-29 18:49:52 公開日:2022-11-28

# mixfairface:mixfairアダプタによる顔認識による究極の公平性の実現

MixFairFace: Towards Ultimate Fairness via MixFair Adapter in Face Recognition ( http://arxiv.org/abs/2211.15181v1 )

ライセンス: Link先を確認

Fu-En Wang, Chien-Yi Wang, Min Sun, Shang-Hong Lai

(参考訳) 顔認識では大きな進歩があったが、顔認識システムにはまだ人口統計バイアスが存在する。例えば、ある人口集団の顔認識性能が他の集団よりも低い場合が普通である。本稿では,顔認識モデルの公平性を改善するためのmixfairfaceフレームワークを提案する。まず、一般的に使用される属性ベースの公正度メトリクスは、顔認識には適さないと主張する。顔認識システムは、すべての人が近いパフォーマンスを持っている間のみ公平であると考えられる。そこで我々は,異なるアプローチの公平性を評価するための新しい評価プロトコルを提案する。人種や性別といった機密性の高い属性ラベルを必要とする従来のアプローチとは異なり、顔表現におけるアイデンティティバイアス、すなわち、機密属性ラベルを必要とせず、異なるアイデンティティ間のパフォーマンスの不一貫性に対処することを目的としている。そこで本研究では,トレーニングサンプルの同一性バイアスを判定し低減するためのmixfairアダプタを提案する。広範な実験により,当社のmixfairfaceアプローチが,すべてのベンチマークデータセットで最先端のフェアネス性能を実現することを実証した。

Although significant progress has been made in face recognition, demographic bias still exists in face recognition systems. For instance, the face recognition performance for a certain demographic group is often lower than for the others. In this paper, we propose the MixFairFace framework to improve the fairness of face recognition models. First of all, we argue that the commonly used attribute-based fairness metric is not appropriate for face recognition. A face recognition system can only be considered fair when every person receives similar performance. Hence, we propose a new evaluation protocol to fairly evaluate the fairness performance of different approaches. Different from previous approaches that require sensitive attribute labels such as race and gender for reducing the demographic bias, we aim at addressing the identity bias in face representation, i.e., the performance inconsistency between different identities, without the need for sensitive attribute labels. To this end, we propose the MixFair Adapter to determine and reduce the identity bias of training samples. Our extensive experiments demonstrate that our MixFairFace approach achieves state-of-the-art fairness performance on all benchmark datasets.

翻訳日:2022-11-29 18:49:24 公開日:2022-11-28

# 共分散埋め込み型サービスとしてのメトリックラーニング

Metric Learning as a Service with Covariance Embedding ( http://arxiv.org/abs/2211.15197v1 )

ライセンス: Link先を確認

Imam Mustafa Kamal, Hyerim Bae, Ling Liu

(参考訳) ディープラーニングの出現により、メトリック学習は、情報検索、オブジェクト認識、レコメンデーションシステムなど、複雑で大規模なデータセットを扱う多くの機械学習タスクで大きな人気を得ている。メトリック学習は、クラス間の類似性を最大化し、最小化する。しかし、既存のモデルは、主に分離可能な埋め込み空間を得るための距離測度に依存し、クラス間の関係を無視しながらクラス内類似性を暗黙的に最大化する。高性能なディープラーニングアプリケーションのためのサービスとしてメトリック学習を有効にするためには、クラス間の関係を賢く扱い、より高度で意味のある埋め込み空間表現を得る必要がある。本稿では,埋め込み空間におけるデータポイント間の線形関係の方向を示すために共分散を組み込んだサービス手法として,新しい計量学習を提案する。従来の計量学習とは異なり、我々の共分散埋め込み強化アプローチは、サービスとしてのメートル法学習が、類似または異種の測度を計算するためにより表現力があり、正、負、中立の関係を捉えることができる。自然, バイオメディカル, 顔画像など, さまざまなベンチマークデータセットを用いて実施した大規模な実験により, 共分散埋め込み最適化サービスとしてのモデルが, 既存のモデルよりも高品質で分離性が高く, 表現力に富んだ埋め込み表現を得ることができることを示した。

With the emergence of deep learning, metric learning has gained significant popularity in numerous machine learning tasks dealing with complex and large-scale datasets, such as information retrieval, object recognition and recommendation systems. Metric learning aims to maximize and minimize inter- and intra-class similarities. However, existing models mainly rely on distance measures to obtain a separable embedding space and implicitly maximize the intra-class similarity while neglecting the inter-class relationship. We argue that to enable metric learning as a service for high-performance deep learning applications, we should also wisely deal with inter-class relationships to obtain a more advanced and meaningful embedding space representation. In this paper, a novel metric learning is presented as a service methodology that incorporates covariance to signify the direction of the linear relationship between data points in an embedding space. Unlike conventional metric learning, our covariance-embedding-enhanced approach enables metric learning as a service to be more expressive for computing similar or dissimilar measures and can capture positive, negative, or neutral relationships. Extensive experiments conducted using various benchmark datasets, including natural, biomedical, and facial images, demonstrate that the proposed model as a service with covariance-embedding optimizations can obtain higher-quality, more separable, and more expressive embedding representations than existing models.
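
As a rough illustration of how covariance can signal the direction of a linear relationship between two points in an embedding space, the sketch below computes the sample covariance across the coordinates of two embedding vectors and labels the relation positive, negative, or neutral. This is only one plausible reading of the abstract: the function names, the tolerance, and the choice of covariance over coordinates are assumptions, not the paper's formulation.

```python
from statistics import fmean


def embedding_covariance(u, v):
    """Sample covariance between the coordinates of two embedding vectors."""
    if len(u) != len(v) or len(u) < 2:
        raise ValueError("vectors must have equal length of at least 2")
    mean_u, mean_v = fmean(u), fmean(v)
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / (len(u) - 1)


def relation(u, v, tol=1e-6):
    """Classify the sign of the covariance (tol is a hypothetical threshold)."""
    cov = embedding_covariance(u, v)
    if cov > tol:
        return "positive"
    if cov < -tol:
        return "negative"
    return "neutral"


same_direction = relation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
opposite_direction = relation([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
no_relation = relation([1.0, 2.0, 3.0], [5.0, 5.0, 5.0])
```

Unlike a plain distance, which is always non-negative, the signed covariance distinguishes embeddings that vary together from those that vary in opposition.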

翻訳日:2022-11-29 18:49:06 公開日:2022-11-28

# Meet-in-the-middle: クロスレゾリューション顔認識のためのマルチスケールアップサンプリングとマッチング

Meet-in-the-middle: Multi-scale upsampling and matching for cross-resolution face recognition ( http://arxiv.org/abs/2211.15225v1 )

ライセンス: Link先を確認

Klemen Grm, Berk Kemal Özata, Vitomir Štruc, Hazım Kemal Ekenel

(参考訳) 本稿では,プロのポートレート写真からの高解像度顔画像と,セキュリティカメラからの低画質監視画像との間の大きな領域ギャップに対処することを目的とする。このような異なる情報源間のアイデンティティマッチングを確立することは、古典的な顔認証シナリオであり、現代の顔認識技術では難しい問題である。そこで本研究では,顔の超解像,解像度マッチング,マルチスケールテンプレート蓄積を組み合わせ,低品質ソースを含む長距離監視映像から顔を確実に認識する手法を提案する。提案手法は、実際の監視画像のターゲットデータセットのトレーニングや微調整を必要としない。広範な実験により,提案手法はscfaceデータセットに微調整された既存手法よりも優れることを示した。

In this paper, we aim to address the large domain gap between high-resolution face images, e.g., from professional portrait photography, and low-quality surveillance images, e.g., from security cameras. Establishing an identity match between disparate sources like this is a classical surveillance face identification scenario, which continues to be a challenging problem for modern face recognition techniques. To that end, we propose a method that combines face super-resolution, resolution matching, and multi-scale template accumulation to reliably recognize faces from long-range surveillance footage, including from low quality sources. The proposed approach does not require training or fine-tuning on the target dataset of real surveillance images. Extensive experiments show that our proposed method is able to outperform even existing methods fine-tuned to the SCFace dataset.

翻訳日:2022-11-29 18:48:41 公開日:2022-11-28

# 医療画像の領域適応のための周波数領域と空間領域の領域ギャップ低減

Reducing Domain Gap in Frequency and Spatial domain for Cross-modality Domain Adaptation on Medical Image Segmentation ( http://arxiv.org/abs/2211.15235v1 )

ライセンス: Link先を確認

Shaolei Liu, Siqi Yin, Linhao Qu, Manning Wang

(参考訳) unsupervised domain adaptation(uda)は、ソースドメインでトレーニングされたモデルを学び、ラベルなしのターゲットドメインでうまく機能することを目的としている。医用画像セグメンテーションの分野では、既存のUDA手法は、複雑なトレーニングプロセスのため効果の低い異なる画像モダリティ間の領域ギャップに対処するために、敵対的な学習に依存している。本稿では, 周波数及び空間領域移動Uner Multi-Teacher蒸留フレームワークに基づく, 単純かつ効果的なUDA手法を提案する。周波数領域では、まず、ドメイン不変かつドメイン不変な周波数成分(DIFsとDVFs)を識別するための非サブスタンプコントゥール変換を導入し、次に、ソース領域画像のDVFをターゲット領域画像に置き換えてドメインギャップを狭めるとともに、DIFを変更しない。空間領域において,領域変動画像スタイルバイアスを低減するために,バッチモーメント更新に基づくヒストグラムマッチング戦略を提案する。 2つのクロスモーダル医療画像セグメンテーションデータセット(心,腹部)を用いた実験により,提案手法は最先端手法と比較して優れた性能を示した。

Unsupervised domain adaptation (UDA) aims to learn a model that is trained on a source domain and performs well on an unlabeled target domain. In the medical image segmentation field, most existing UDA methods depend on adversarial learning to address the domain gap between different image modalities, which is ineffective due to its complicated training process. In this paper, we propose a simple yet effective UDA method based on frequency and spatial domain transfer under a multi-teacher distillation framework. In the frequency domain, we first introduce the non-subsampled contourlet transform for identifying domain-invariant and domain-variant frequency components (DIFs and DVFs), and then keep the DIFs unchanged while replacing the DVFs of the source domain images with those of the target domain images to narrow the domain gap. In the spatial domain, we propose a batch momentum update-based histogram matching strategy to reduce the domain-variant image style bias. Experiments on two cross-modality medical image segmentation datasets (cardiac, abdominal) show that our proposed method achieves superior performance compared to state-of-the-art methods.
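
The spatial-domain step mentioned in this abstract builds on histogram matching, which the following minimal sketch illustrates on flat lists of intensities. It shows only plain quantile-based matching; the batch momentum update is omitted because the abstract does not specify it, and the function name and toy values are hypothetical.

```python
import math
from bisect import bisect_right


def match_histogram(source, reference):
    """Map each source intensity to the reference intensity at the same quantile."""
    src_sorted = sorted(source)
    ref_sorted = sorted(reference)
    n_src, n_ref = len(src_sorted), len(ref_sorted)
    matched = []
    for value in source:
        # Empirical CDF of the source at this value, in (0, 1].
        quantile = bisect_right(src_sorted, value) / n_src
        # Index of the reference sample sitting at the same quantile.
        index = min(n_ref - 1, max(0, math.ceil(quantile * n_ref) - 1))
        matched.append(ref_sorted[index])
    return matched


# Source-domain intensities take on the target-domain intensity distribution
# while keeping their relative order.
restyled = match_histogram([30, 10, 40, 20], [1, 2, 3, 4])
```

In an image setting the same mapping would be applied per channel to flattened pixel arrays, so the source image keeps its structure while adopting the target intensity statistics.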

翻訳日:2022-11-29 18:48:27 公開日:2022-11-28

# DeepAngle:ディープラーニングを用いた断層画像の接触角の高速計算

DeepAngle: Fast calculation of contact angles in tomography images using deep learning ( http://arxiv.org/abs/2211.15243v1 )

ライセンス: Link先を確認

Arash Rabbani, Chenhao Sun, Masoud Babaei, Vahid J. Niasar, Ryan T. Armstrong, Peyman Mostaghimi

(参考訳) deepangleは、多孔質材料のトモグラフィ画像における異なる位相の接触角を決定する機械学習ベースの手法である。 3次元の角度の測定は、角度平面に垂直な表面で行う必要があり、画像ボクセルの離散化された空間を扱う際には不正確になる可能性がある。計算集約的な解は、適応可能な格子を用いて全ての曲面の相関とベクトル化を行い、次に所望の平面内の角度を測定することである。そこで本研究では,画像から直接界面角度を推定する深層学習による迅速かつ低コストな手法を提案する。 DeepAngleは直接測定技術に対して合成画像と現実画像の両方でテストされ、計算コストを20倍に下げながらr-2乗を5～16%改善した。この高速な手法は,大規模トモグラフィーデータや時間分解画像の処理に特に応用できる。開発コードとデータセットはGitHubのオープンリポジトリ(https://www.github.com/ArashRabbani/DeepAngle)で入手できる。

DeepAngle is a machine learning-based method to determine the contact angles of different phases in the tomography images of porous materials. Measurement of angles in 3D needs to be done within the surface perpendicular to the angle planes, and it can become inaccurate when dealing with the discretized space of the image voxels. A computationally intensive solution is to correlate and vectorize all surfaces using an adaptable grid, and then measure the angles within the desired planes. In contrast, the present study provides a rapid and low-cost technique powered by deep learning to estimate the interfacial angles directly from images. DeepAngle is tested on both synthetic and realistic images against the direct measurement technique and found to improve the r-squared by 5 to 16% while lowering the computational cost by a factor of 20. This rapid method is especially applicable to large tomography data and time-resolved images, which are computationally intensive to process. The developed code and the dataset are available at an open repository on GitHub (https://www.github.com/ArashRabbani/DeepAngle).

翻訳日:2022-11-29 18:48:03 公開日:2022-11-28

# 顔画像品質評価におけるバイアスの評価

Assessing Bias in Face Image Quality Assessment ( http://arxiv.org/abs/2211.15265v1 )

ライセンス: Link先を確認

Žiga Babnik and Vitomir Štruc

(参考訳) 顔画像品質評価(FIQA)は、サンプル品質に関する追加情報を提供することで、顔認識(FR)の性能を向上させる。 FIQA法は, 顔認識におけるサンプルの有用性を推定しようとするため, 基礎となる顔認識システムの影響を強く受けていると仮定することは妥当である。現代の顔認識システムはよく機能することが知られているが、いくつかの研究では、そのようなシステムはしばしば人口統計バイアスを伴う問題を示すことが知られている。したがって、このような問題はFIQA技術にも存在している可能性が高い。本稿では, FIQAアプローチに関連する人口統計学的バイアスについて検討するため, 様々な品質評価手法(汎用画像品質評価, 教師なし顔品質評価, 教師なし顔品質評価)と3種類の最先端FRモデルを含む総合的研究を行った。 The Balanced Faces in the Wild (BFW) データセットの解析により、考慮されたすべてのテクニックは、セックスよりも人種のバリエーションによって影響を受けていることが示された。汎用的な画像品質評価手法は,2つの要因に比較して偏見が低いが,監督的および教師なしの顔画像品質評価法はともに,(性別の)白人を好む傾向のある強い偏見を示す。さらに、人種的に偏りの少ない手法は、全体的な成績が悪化することがわかった。このことは、FIQA法における観測バイアスが、基礎となる顔認識システムとかなりの関係があることを示唆している。

Face image quality assessment (FIQA) attempts to improve face recognition (FR) performance by providing additional information about sample quality. Because FIQA methods attempt to estimate the utility of a sample for face recognition, it is reasonable to assume that these methods are heavily influenced by the underlying face recognition system. Although modern face recognition systems are known to perform well, several studies have found that such systems often exhibit problems with demographic bias. It is therefore likely that such problems are also present with FIQA techniques. To investigate the demographic biases associated with FIQA approaches, this paper presents a comprehensive study involving a variety of quality assessment methods (general-purpose image quality assessment, supervised face quality assessment, and unsupervised face quality assessment methods) and three diverse state-of-the-art FR models. Our analysis on the Balanced Faces in the Wild (BFW) dataset shows that all techniques considered are affected more by variations in race than sex. While the general-purpose image quality assessment methods appear to be less biased with respect to the two demographic factors considered, the supervised and unsupervised face image quality assessment methods both show strong bias with a tendency to favor white individuals (of either sex). In addition, we found that methods that are less racially biased perform worse overall. This suggests that the observed bias in FIQA methods is to a significant extent related to the underlying face recognition system.

翻訳日:2022-11-29 18:47:43 公開日:2022-11-28

# スペクトル反射率分解による非ランバート型多スペクトル光量ステレオ

NeuralMPS: Non-Lambertian Multispectral Photometric Stereo via Spectral Reflectance Decomposition ( http://arxiv.org/abs/2211.15311v1 )

ライセンス: Link先を確認

Jipeng Lv, Heng Guo, Guanying Chen, Jinxiu Liang and Boxin Shi

(参考訳) マルチスペクトラルフォトメトリックステレオ(mps)は、マルチスペクトラル照明下で撮影された単発マルチスペクトラル画像からシーンの表面正常を回復することを目的としている。既存のMPS法ではランベルト反射率モデルを用いて問題を抽出できるが、現実の表面への応用は大幅に制限される。本稿では,一般の非ランベルトスペクトル反射率の下でmps問題を解決するために,ニューラルネットワークであるneuralmpsを提案する。具体的には,スペクトル反射率分解(srd)モデルを用いて,スペクトル反射率を幾何成分とスペクトル成分に分解する。この分解により、均一な材料を持つ表面のMPS問題は、未知の光強度を持つ従来の測光ステレオ(CPS)と等価であることを示す。このように、NeuralMPSは、よく研究された非ランベルト的なCPS手法を活用することで、非ランベルト的なMPS問題の難しさを軽減する。合成シーンと実世界のシーンの両方で実験を行い,本手法の有効性を実証した。

Multispectral photometric stereo (MPS) aims at recovering the surface normal of a scene from a single-shot multispectral image captured under multispectral illuminations. Existing MPS methods adopt the Lambertian reflectance model to make the problem tractable, but it greatly limits their application to real-world surfaces. In this paper, we propose a deep neural network named NeuralMPS to solve the MPS problem under general non-Lambertian spectral reflectances. Specifically, we present a spectral reflectance decomposition (SRD) model to disentangle the spectral reflectance into geometric components and spectral components. With this decomposition, we show that the MPS problem for surfaces with a uniform material is equivalent to the conventional photometric stereo (CPS) with unknown light intensities. In this way, NeuralMPS reduces the difficulty of the non-Lambertian MPS problem by leveraging the well-studied non-Lambertian CPS methods. Experiments on both synthetic and real-world scenes demonstrate the effectiveness of our method.

翻訳日:2022-11-29 18:47:19 公開日:2022-11-28

# CLIP2GAN: GANの潜在空間でテキストをブリッジする

CLIP2GAN: Towards Bridging Text with the Latent Space of GANs ( http://arxiv.org/abs/2211.15045v1 )

ライセンス: Link先を確認

Yixuan Wang, Wengang Zhou, Jianmin Bao, Weilun Wang, Li Li, Houqiang Li

(参考訳) 本稿では,CLIPモデルとStyleGANを活用して,テキスト誘導画像生成に特化して,CLIP2GANという新しいフレームワークを提案する。 CLIP2GANのキーとなる考え方は、CLIPの出力特徴埋め込み空間とStyleGANの入力潜在空間をブリッジすることであり、マッピングネットワークを導入して実現している。トレーニング段階では、画像をクリップでエンコードし、出力機能を潜在コードにマップし、さらに画像の再構築に使用する。このように、マッピングネットワークは自己教師付き学習方法で最適化される。推論段階では、CLIPは画像とテキストの両方を共有機能埋め込みスペースに埋め込むことができるため、トレーニングアーキテクチャにおけるCLIPイメージエンコーダをCLIPテキストエンコーダに置き換えると同時に、以下のマッピングネットワークとStyleGANモデルを保持する。その結果、テキスト記述を柔軟に入力して画像を生成することができる。さらに、地図化されたCLIP画像機能に属性のマッピングされたテキスト機能を追加するだけで、画像に対する属性を効果的に編集できる。提案したCLIP2GANは,従来の方法に比べて優れた性能を示した。

In this work, we are dedicated to text-guided image generation and propose a novel framework, i.e., CLIP2GAN, by leveraging CLIP model and StyleGAN. The key idea of our CLIP2GAN is to bridge the output feature embedding space of CLIP and the input latent space of StyleGAN, which is realized by introducing a mapping network. In the training stage, we encode an image with CLIP and map the output feature to a latent code, which is further used to reconstruct the image. In this way, the mapping network is optimized in a self-supervised learning way. In the inference stage, since CLIP can embed both image and text into a shared feature embedding space, we replace CLIP image encoder in the training architecture with CLIP text encoder, while keeping the following mapping network as well as StyleGAN model. As a result, we can flexibly input a text description to generate an image. Moreover, by simply adding mapped text features of an attribute to a mapped CLIP image feature, we can effectively edit the attribute to the image. Extensive experiments demonstrate the superior performance of our proposed CLIP2GAN compared to previous methods.

翻訳日:2022-11-29 18:41:37 公開日:2022-11-28

# Mix and Localize: 音源のミキサー内局在化

Mix and Localize: Localizing Sound Sources in Mixtures ( http://arxiv.org/abs/2211.15058v1 )

ライセンス: Link先を確認

Xixi Hu, Ziyang Chen, Andrew Owens

(参考訳) 本稿では,複数の音源を同時に可視化する手法を提案する。このタスクは、音の混合を個々のソースにグループ化し、それらを視覚信号に関連付けるモデルを必要とする。本手法は,Jabriらのランダムウォークにヒントを得た定式化を用いて,両課題を同時に解決する。我々は、画像と分離された音がノードに対応するグラフを作成し、ランダムウォーカーに異なるモードから高い戻り確率でノード間の遷移を訓練する。この歩行の遷移確率は、モデルによって学習された視聴覚類似度指標によって決定される。実験では,複数の音の局所化に成功し,他の自己監視手法よりも優れていることを示す。プロジェクトサイト: https://hxixixh.github.io/mix-and-localize

We present a method for simultaneously localizing multiple sound sources within a visual scene. This task requires a model to both group a sound mixture into individual sources, and to associate them with a visual signal. Our method jointly solves both tasks at once, using a formulation inspired by the contrastive random walk of Jabri et al. We create a graph in which images and separated sounds correspond to nodes, and train a random walker to transition between nodes from different modalities with high return probability. The transition probabilities for this walk are determined by an audio-visual similarity metric that is learned by our model. We show through experiments with musical instruments and human speech that our model can successfully localize multiple sounds, outperforming other self-supervised methods. Project site: https://hxixixh.github.io/mix-and-localize
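
The random-walk formulation in this abstract can be sketched on a toy bipartite graph of image nodes and sound nodes. The sketch assumes that transition probabilities are a temperature-scaled softmax over a similarity matrix and that training minimizes the negative log of the probability of returning to the start node after an image-to-sound-to-image walk; the function names, the temperature value, and the similarity matrices are hypothetical, and the learned audio-visual similarity model itself is not reproduced.

```python
import math


def softmax(row, temperature=0.1):
    """Numerically stable softmax with a temperature."""
    peak = max(row)
    exps = [math.exp((x - peak) / temperature) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def transition(similarity, temperature=0.1):
    """Row-stochastic transition matrix built from a similarity matrix."""
    return [softmax(row, temperature) for row in similarity]


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def return_probabilities(similarity, temperature=0.1):
    """Probability that a walk image_i -> any sound -> image ends back at image_i."""
    image_to_sound = transition(similarity, temperature)
    sound_to_image = transition(transpose(similarity), temperature)
    return [
        sum(image_to_sound[i][k] * sound_to_image[k][i] for k in range(len(sound_to_image)))
        for i in range(len(similarity))
    ]


def cycle_loss(similarity, temperature=0.1):
    """Mean negative log return probability; lower means better alignment."""
    probabilities = return_probabilities(similarity, temperature)
    return -sum(math.log(p) for p in probabilities) / len(probabilities)


# Rows are image nodes, columns are separated-sound nodes (toy values).
aligned = [[1.0, 0.0], [0.0, 1.0]]
uninformative = [[0.5, 0.5], [0.5, 0.5]]
loss_aligned = cycle_loss(aligned)
loss_uninformative = cycle_loss(uninformative)
```

With the aligned similarities the walker almost always returns to its starting image, so the loss is close to zero, whereas uninformative similarities give a return probability of 0.5 per node.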

翻訳日:2022-11-29 18:41:15 公開日:2022-11-28

# 低ショットカテゴリ一般化のための複数視点からの高密度オブジェクト記述子学習

Learning Dense Object Descriptors from Multiple Views for Low-shot Category Generalization ( http://arxiv.org/abs/2211.15059v1 )

ライセンス: Link先を確認

Stefan Stojanov, Anh Thai, Zixuan Huang, James M. Rehg

(参考訳) コンピュータビジョンのディープラーニング時代の特徴は、オブジェクト認識やセマンティックセグメンテーション、光学フロー推定、そして3dシーンの新しいビュー合成まで、タスクの特徴表現を訓練するために大規模なラベル付きデータセットをうまく利用することである。本研究では,カテゴリラベルを必要とせず,低ショットカテゴリ認識のための密な判別対象表現を学習することを目的とする。そこで本稿では,対象インスタンスの複数ビューからカテゴリや意味的オブジェクト部分ラベルを使わずにトレーニング可能な,ディープオブジェクトパッチエンコーディング(dope)を提案する。 dopeを訓練するには,被写体の視野間のピクセルレベル対応を得るために,被写体深度,前景マスク,既知のカメラへのアクセスを想定し,これを用いて自己教師あり学習タスクを定式化し,識別対象パッチを学習する。 DOPEは, 局所的マッチングを用いて, 新規カテゴリーの低ショット分類に利用でき, 教師付き学習ベースラインや自己教師型学習ベースラインと競合する。コードとデータはhttps://github.com/rehg-lab/dope_selfsup。

A hallmark of the deep learning era for computer vision is the successful use of large-scale labeled datasets to train feature representations for tasks ranging from object recognition and semantic segmentation to optical flow estimation and novel view synthesis of 3D scenes. In this work, we aim to learn dense discriminative object representations for low-shot category recognition without requiring any category labels. To this end, we propose Deep Object Patch Encodings (DOPE), which can be trained from multiple views of object instances without any category or semantic object part labels. To train DOPE, we assume access to sparse depths, foreground masks and known cameras, to obtain pixel-level correspondences between views of an object, and use this to formulate a self-supervised learning task to learn discriminative object patches. We find that DOPE can directly be used for low-shot classification of novel categories using local-part matching, and is competitive with and outperforms supervised and self-supervised learning baselines. Code and data available at https://github.com/rehg-lab/dope_selfsup.

翻訳日:2022-11-29 18:41:01 公開日:2022-11-28

# インタラクティブなビジュアル特徴検索

Interactive Visual Feature Search ( http://arxiv.org/abs/2211.15060v1 )

ライセンス: Link先を確認

Devon Ulrich and Ruth Fong

(参考訳) 畳み込みニューラルネットワーク(CNN)の動作を説明するために、多くの可視化技術が作成されているが、それらは主に限られた情報を伝える静的な図で構成されている。インタラクティブなビジュアライゼーションはより豊富な洞察を提供し、より簡単にモデルの振る舞いを探索することができるが、一般的には再利用可能なものではなく、特定のモデルに特有のものである。我々は,任意のcnnに一般化可能で,研究者のワークフローに容易に組み込むことのできる,インタラクティブなインタラクティブ可視化であるvisual feature searchを紹介する。このツールを使うと、ユーザーは画像領域をハイライトし、最もよく似たCNN機能を持つデータセットから画像を検索できる。キャッシュベースの効率的な検索実装で、大きなイメージデータセットの検索をサポートする。本手法は, 教師付き, 自己監督型, および人間編集型cnnを用いた実験により, モデル行動の異なる側面を解明する方法を示す。また、ポータブルなPythonライブラリといくつかのIPythonノートブックもリリースしています。私たちのコードはhttps://github.com/lookingglasslab/VisualFeatureSearchで参照できます。

Many visualization techniques have been created to help explain the behavior of convolutional neural networks (CNNs), but they largely consist of static diagrams that convey limited information. Interactive visualizations can provide richer insights and allow users to more easily explore a model's behavior; however, they are typically not easily reusable and are specific to a particular model. We introduce Visual Feature Search, a novel interactive visualization that is generalizable to any CNN and can easily be incorporated into a researcher's workflow. Our tool allows a user to highlight an image region and search for images from a given dataset with the most similar CNN features. It supports searching through large image datasets with an efficient cache-based search implementation. We demonstrate how our tool elucidates different aspects of model behavior by performing experiments on supervised, self-supervised, and human-edited CNNs. We also release a portable Python library and several IPython notebooks to enable researchers to easily use our tool in their own experiments. Our code can be found at https://github.com/lookingglasslab/VisualFeatureSearch.

翻訳日:2022-11-29 18:40:27 公開日:2022-11-28

# 単眼ビデオからの高忠実度顔面アバター再構成

High-fidelity Facial Avatar Reconstruction from Monocular Video with Generative Priors ( http://arxiv.org/abs/2211.15064v1 )

ライセンス: Link先を確認

Yunpeng Bai, Yanbo Fan, Xuan Wang, Yong Zhang, Jingxiang Sun, Chun Yuan, Ying Shan

(参考訳) 単眼映像からの高忠実な顔のアバター再構成は、コンピュータグラフィックスとコンピュータビジョンにおいて重要な研究課題である。近年,Neural Radiance Field (NeRF) は目覚しいビューレンダリング結果を示しており,顔アバターの再構成も検討されている。しかし、単眼ビデオにおける複雑な顔のダイナミクスと3D情報の欠如は、忠実な顔の再構築に重大な課題をもたらす。そこで本研究では,3次元認識を用いた顔アバター再構成手法を提案する。動的モデリングのための条件付き変形場に依存する既存の作品とは異なり、3d-ganの潜在空間における局所および低次元部分空間として定式化されたパーソナライズされた生成前置法を学習することを提案する。そこで本稿では,特定の人物の顔画像の小さなセットに基づいて,パーソナライズされた生成前を効率的に構築する方法を提案する。学習後、新しいビューによるフォトリアリスティックなレンダリングが可能となり、潜在空間でナビゲーションを行うことで、顔再現を実現することができる。提案手法は,RGB画像,3DMM係数,オーディオなど,異なる駆動信号に適用可能である。既存の作品と比較して優れた新規視点合成結果と忠実に対面再現性能が得られる。

High-fidelity facial avatar reconstruction from a monocular video is a significant research problem in computer graphics and computer vision. Recently, Neural Radiance Field (NeRF) has shown impressive novel view rendering results and has been considered for facial avatar reconstruction. However, the complex facial dynamics and missing 3D information in monocular videos raise significant challenges for faithful facial reconstruction. In this work, we propose a new method for NeRF-based facial avatar reconstruction that utilizes a 3D-aware generative prior. Different from existing works that depend on a conditional deformation field for dynamic modeling, we propose to learn a personalized generative prior, which is formulated as a local and low-dimensional subspace in the latent space of a 3D-GAN. We propose an efficient method to construct the personalized generative prior based on a small set of facial images of a given individual. After learning, it allows for photo-realistic rendering with novel views, and face reenactment can be realized by navigating the latent space. Our proposed method is applicable to different driving signals, including RGB images, 3DMM coefficients, and audio. Compared with existing works, we obtain superior novel view synthesis results and faithful face reenactment performance.

翻訳日:2022-11-29 18:40:01 公開日:2022-11-28

# クラス不均衡セマンティックセグメンテーションのための半監督信頼度に基づくコントラスト識別

Semi-Supervised Confidence-Level-based Contrastive Discrimination for Class-Imbalanced Semantic Segmentation ( http://arxiv.org/abs/2211.15066v1 )

ライセンス: Link先を確認

Kangcheng Liu

(参考訳) データ・ハングリー課題を克服するために,クラス不均衡意味セグメンテーションタスクのための半教師ありコントラスト学習フレームワークを提案する。まず、モデルを半教師付きで動作させるため、信頼度に基づくコントラスト学習を提案し、インスタンス識別を明示的に達成し、低信頼度低品質特徴を高信頼度特徴と整合させる。さらに,クラックセグメンテーションと道路成分抽出におけるクラス不均衡の問題に取り組むため,画素レベル意味セグメンテーションにおける従来のクロスエントロピー損失に代わるデータ不均衡損失を提案した。最後に,セマンティクスセグメンテーション性能を向上させるための,効果的な多段融合ネットワークアーキテクチャを提案する。実産業用ひび割れセグメント化と道路セグメント化に関する広範囲実験により,提案手法の有効性が示された。提案手法は3.5%のラベル付きデータでも十分なセグメンテーション結果が得られる。

To overcome the data-hungry challenge, we have proposed a semi-supervised contrastive learning framework for the task of class-imbalanced semantic segmentation. First and foremost, to make the model operate in a semi-supervised manner, we proposed the confidence-level-based contrastive learning to achieve instance discrimination in an explicit manner, and make the low-confidence low-quality features align with the high-confidence counterparts. Moreover, to tackle the problem of class imbalance in crack segmentation and road components extraction, we proposed the data imbalance loss to replace the traditional cross entropy loss in pixel-level semantic segmentation. Finally, we have also proposed an effective multi-stage fusion network architecture to improve semantic segmentation performance. Extensive experiments on the real industrial crack segmentation and the road segmentation demonstrate the superior effectiveness of the proposed framework. Our proposed method can provide satisfactory segmentation results with even merely 3.5% labeled data.

翻訳日:2022-11-29 18:39:41 公開日:2022-11-28

# FeatureBooster: 軽量ニューラルネットワークによる機能記述の強化

FeatureBooster: Boosting Feature Descriptors with a Lightweight Neural Network ( http://arxiv.org/abs/2211.15069v1 )

ライセンス: Link先を確認

Xinjiang Wang, Zeyu Liu, Yu Hu, Wei Xi, Wenxian Yu, Danping Zou

(参考訳) 同じ画像内のキーポイントの記述子を改善するための軽量ネットワークを導入する。このネットワークは、元の記述子とキーポイントの幾何学的性質を入力とし、MLPベースのセルフブートステージとTransformerベースのクロスブートステージを使用して記述子を強化する。拡張ディスクリプタは、実数値かバイナリかのいずれかである。提案するネットワークは,手作り(orb, sift)と最先端学習に基づく記述子(superpoint, 等)の両方を増強し,画像マッチング, 視覚定位, 運動からの構造タスクで評価する。その結果、特に大きな照明変化や繰り返しパターンなどの困難な場合において、各タスクの性能が著しく向上することが示された。提案手法では,デスクトップgpuでは3.2ms,組込みgpuでは27msしか必要とせず,実用的なシステムに適用するには十分高速である。

We introduce a lightweight network to improve descriptors of keypoints within the same image. The network takes the original descriptors and the geometric properties of keypoints as the input, and uses an MLP-based self-boosting stage and a Transformer-based cross-boosting stage to enhance the descriptors. The enhanced descriptors can be either real-valued or binary ones. We use the proposed network to boost both hand-crafted (ORB, SIFT) and the state-of-the-art learning-based descriptors (SuperPoint, ALIKE) and evaluate them on image matching, visual localization, and structure-from-motion tasks. The results show that our method significantly improves the performance of each task, particularly in challenging cases such as large illumination changes or repetitive patterns. Our method requires only 3.2ms on desktop GPU and 27ms on embedded GPU to process 2000 features, which is fast enough to be applied to a practical system.

翻訳日:2022-11-29 18:39:24 公開日:2022-11-28

# 条件付きバッチ正規化のマルチモーダル学習における落とし穴

Pitfalls of Conditional Batch Normalization for Contextual Multi-Modal Learning ( http://arxiv.org/abs/2211.15071v1 )

ライセンス: Link先を確認

Ivaxi Sheth, Aamer Abdul Rahman, Mohammad Havaei, Samira Ebrahimi Kahou

(参考訳) 人間は感覚器官を通して複数のモダリティから学ぶ技術を完成させた。単一のモダリティにおける驚くべき予測性能にもかかわらず、ニューラルネットワークは複数のモダリティに関して人間のレベルの精度に到達できない。これは、それぞれの様相の構造が変化するため、特に難しい課題である。条件付きバッチ正規化(CBN)は、文脈的特徴を学習して深層学習タスクを支援するために提案される一般的な手法である。この技術は、畳み込みニューラルネットワークのアフィン変換を学習することにより、補助データを用いて表現力を向上させる。 CBN層を用いた性能向上にもかかわらず,我々はCBNによる補助データの導入によって得られた視覚的特徴が劣化していることを明らかにした。我々は,様々なデータセットに対するCBNネットワークの脆さを評価するための総合的な実験を行い,視覚的特徴のみからの学習が一般化に優れていることを示唆した。鳥類分類のための自然画像のcbnモデルと癌分類のための組織像を評価した。我々は,CBNネットワークが鳥類分類データセットの視覚的特徴や組織学的データセットの視覚的特徴をほとんど学習していないことを観察した。 CBNは補助データとラベル間のショートカット学習を促進する可能性がある。

Humans have perfected the art of learning from multiple modalities through sensory organs. Despite their impressive predictive performance on a single modality, neural networks cannot reach human-level accuracy with respect to multiple modalities. This is a particularly challenging task due to variations in the structure of respective modalities. Conditional Batch Normalization (CBN) is a popular method that was proposed to learn contextual features to aid deep learning tasks. This technique uses auxiliary data to improve representational power by learning affine transformations for convolutional neural networks. Despite the boost in performance observed by using CBN layers, our work reveals that the visual features learned by introducing auxiliary data via CBN deteriorate. We perform comprehensive experiments to evaluate the brittleness of CBN networks to various datasets, suggesting that learning from visual features alone could often be superior for generalization. We evaluate CBN models on natural images for bird classification and histology images for cancer type classification. We observe that the CBN network learns close to no visual features on the bird classification dataset and partial visual features on the histology dataset. Our extensive experiments reveal that CBN may encourage shortcut learning between the auxiliary data and labels.

翻訳日:2022-11-29 18:39:06 公開日:2022-11-28

# クラス適応型ネットワーク校正

Class Adaptive Network Calibration ( http://arxiv.org/abs/2211.15088v1 )

ライセンス: Link先を確認

Bingyuan Liu, Jérôme Rony, Adrian Galdran, Jose Dolz, Ismail Ben Ayed

(参考訳) 最近の研究では、従来の精度以上のキャリブレーションは、現代のディープニューラルネットワークのトレーニングにも考慮すべきであることが示されている。学習中の誤校正に対処するために,各項の相対的寄与を制御するハイパーパラメータを用いて,学習目標の一部として異なるペナルティ関数を探索した手法がある。しかしながら、これらの手法には2つの大きな欠点がある。 1) スカラーバランスの重みは,すべてのクラスにおいて同じであり,クラス間の内在的困難や不均衡に対処する能力を妨げる。 2) バランスウェイトは適応戦略を使わずに固定され, 精度とキャリブレーションの最良の妥協点に達するのを防ぎ, 各アプリケーションに対してハイパーパラメーター探索が必要となる。そこで本研究では,深層ネットワークを校正するクラス適応ラベル平滑化(cals)を提案する。提案手法は,制約付き最適化における確立された手法である一般拡張ラグランジアンアプローチに基づいているが,大規模クラス適応型トレーニングのための修正がいくつか導入されている。標準およびロングテール画像分類、意味セグメンテーション、テキスト分類を含む様々なベンチマークにおける総合的評価と多重比較は、提案手法の優位性を示している。コードはhttps://github.com/by-liu/CALSで公開されている。

Recent studies have revealed that, beyond conventional accuracy, calibration should also be considered for training modern deep neural networks. To address miscalibration during learning, some methods have explored different penalty functions as part of the learning objective, alongside a standard classification loss, with a hyper-parameter controlling the relative contribution of each term. Nevertheless, these methods share two major drawbacks: 1) the scalar balancing weight is the same for all classes, hindering the ability to address different intrinsic difficulties or imbalance among classes; and 2) the balancing weight is usually fixed without an adaptive strategy, which may prevent reaching the best compromise between accuracy and calibration, and requires hyper-parameter search for each application. We propose Class Adaptive Label Smoothing (CALS) for calibrating deep networks, which allows learning class-wise multipliers during training, yielding a powerful alternative to common label smoothing penalties. Our method builds on a general Augmented Lagrangian approach, a well-established technique in constrained optimization, but we introduce several modifications to tailor it for large-scale, class-adaptive training. Comprehensive evaluation and multiple comparisons on a variety of benchmarks, including standard and long-tailed image classification, semantic segmentation, and text classification, demonstrate the superiority of the proposed method. The code is available at https://github.com/by-liu/CALS.

翻訳日:2022-11-29 18:38:48 公開日:2022-11-28

# MGFN:弱スーパービジョンビデオ異常検出のためのマグニチュードコントラストGlance-and-Focusネットワーク

MGFN: Magnitude-Contrastive Glance-and-Focus Network for Weakly-Supervised Video Anomaly Detection ( http://arxiv.org/abs/2211.15098v1 )

ライセンス: Link先を確認

Yingxian Chen, Zhengzhe Liu, Baoheng Zhang, Wilton Fok, Xiaojuan Qi, Yik-Chung Wu

(参考訳) 監視ビデオにおける異常検出の微妙な監視は難しい課題だ。長編ビデオに異常をローカライズする能力に欠ける既存の作品以外にも,空間的時間的情報を効率的に統合して正確な異常検出を行う新しい視点・フォーカスネットワークを提案する。さらに,異常度を表すために特徴量を用いた既存手法では,シーンの変動の影響を無視することが一般的であり,その結果,シーン間の特徴量の不整合による準最適性能が得られた。この問題に対処するため,異常検出のための特徴量の識別性を高めるために,特徴増幅機構とマグニチュードコントラスト損失を提案する。 UCF-Crime と XD-Violence の2つの大規模ベンチマークの実験結果から,本手法は最先端の手法よりも優れていることが示された。

Weakly supervised detection of anomalies in surveillance videos is a challenging task. Going beyond existing works that have deficient capabilities to localize anomalies in long videos, we propose a novel glance and focus network to effectively integrate spatial-temporal information for accurate anomaly detection. In addition, we empirically found that existing approaches that use feature magnitudes to represent the degree of anomalies typically ignore the effects of scene variations, and hence result in sub-optimal performance due to the inconsistency of feature magnitudes across scenes. To address this issue, we propose the Feature Amplification Mechanism and a Magnitude Contrastive Loss to enhance the discriminativeness of feature magnitudes for detecting anomalies. Experimental results on two large-scale benchmarks UCF-Crime and XD-Violence manifest that our method outperforms state-of-the-art approaches.

翻訳日:2022-11-29 18:38:23 公開日:2022-11-28

# デュアル情報強化マルチビュー属性付きグラフクラスタリング

Dual Information Enhanced Multi-view Attributed Graph Clustering ( http://arxiv.org/abs/2211.14987v1 )

ライセンス: Link先を確認

Jia-Qi Lin, Man-Sheng Chen, Xi-Ran Zhu, Chang-Dong Wang, Haizhang Zhang

(参考訳) マルチビュー属性グラフクラスタリングは、属性特徴と、異なるビューからの隣接行列に基づいて、マルチビューデータを分割する重要なアプローチである。有望なクラスタリング性能を達成したグラフニューラルネットワーク(GNN)の利用が試みられている。それにもかかわらず、複数のビューに埋め込まれた固有の特定の情報に注意を払う人は少ない。一方、低レベルの表現から潜在高レベルの表現を回復することができないため、ダウンストリームクラスタリングのパフォーマンスが大幅に制限される。本稿では,これらのギャップを埋めるために,新しい2重情報強化多視点グラフクラスタリング(diagc)法を提案する。具体的には,複数視点からのコンセンサスと特定情報の探索を解消するsir(specific information reconstruction)モジュールを導入することで,gcnがより本質的な低レベル表現をキャプチャできるようにする。さらに、相互情報最大化(MIM)モジュールは、潜在高レベル表現と低レベル表現との合意を最大化し、自己監督クラスタリング(SC)モジュールの助けを借りて、高レベル表現が望ましいクラスタリング構造を満たすことを可能にする。いくつかの実世界のベンチマーク実験では、提案手法の有効性を最先端のベースラインと比較した。

Multi-view attributed graph clustering is an important approach to partition multi-view data based on the attribute feature and adjacent matrices from different views. Some attempts have been made in utilizing Graph Neural Network (GNN), which have achieved promising clustering performance. Despite this, few of them pay attention to the inherent specific information embedded in multiple views. Meanwhile, they are incapable of recovering the latent high-level representation from the low-level ones, greatly limiting the downstream clustering performance. To fill these gaps, a novel Dual Information enhanced multi-view Attributed Graph Clustering (DIAGC) method is proposed in this paper. Specifically, the proposed method introduces the Specific Information Reconstruction (SIR) module to disentangle the explorations of the consensus and specific information from multiple views, which enables GCN to capture the more essential low-level representations. Besides, the Mutual Information Maximization (MIM) module maximizes the agreement between the latent high-level representation and low-level ones, and enables the high-level representation to satisfy the desired clustering structure with the help of the Self-supervised Clustering (SC) module. Extensive experiments on several real-world benchmarks demonstrate the effectiveness of the proposed DIAGC method compared with the state-of-the-art baselines.

翻訳日:2022-11-29 17:56:11 公開日:2022-11-28

# Shoupa: パーキンソン病の早期診断のためのAIシステム

Shoupa: An AI System for Early Diagnosis of Parkinson's Disease ( http://arxiv.org/abs/2211.15234v1 )

ライセンス: Link先を確認

Jingwei Li, Ruitian Wu, Tzu-liang Huang, Zian Pan, Ming-chun Huang

(参考訳) パーキンソン病(英: Parkinson's Disease、PD)は、進行性神経系疾患であり、580万人以上、特に高齢者に影響を与えた。症状の複雑さと他の神経疾患との類似性のため、早期発見には神経科医やPDスペシャリストが関与する必要があるが、ほとんどの高齢者にはアクセスできない。そこで我々は、スマートモバイルデバイスとAI技術を統合する。本稿では,運動症状と非運動症状の両方を評価する異なるタスクを組み合わせるpd早期検出システムの枠組みを提案する。開発したモデルを用いて,非クリニカルな条件下でPDを一時的に検出し,最も重篤な症状を明らかにする。この結果は、PDリハビリテーション指導や他の神経疾患の検出にさらに使われることが期待される。

Parkinson's Disease (PD) is a progressive nervous system disorder that has affected more than 5.8 million people, especially the elderly. Due to the complexity of its symptoms and its similarity to other neurological disorders, early detection requires neurologists or PD specialists to be involved, which is not accessible to most old people. Therefore, we integrate smart mobile devices with AI technologies. In this paper, we introduce the framework of our developed PD early detection system which combines different tasks evaluating both motor and non-motor symptoms. With the developed model, we help users detect PD punctually in non-clinical settings and figure out their most severe symptoms. The results are expected to be further used for PD rehabilitation guidance and detection of other neurological disorders.

翻訳日:2022-11-29 17:55:51 公開日:2022-11-28

# 文化に依存しないAIモデルという神話

The Myth of Culturally Agnostic AI Models ( http://arxiv.org/abs/2211.15271v1 )

ライセンス: Link先を確認

Eva Cetinic

(参考訳) 本稿では,経験的文化研究の目的として,大規模視覚言語モデルの可能性について考察する。 dall-e 2とstable diffusionという2つの一般的なテキストから画像への合成モデルからの出力の比較分析に注目し,文化に無依存なaiモデルに対する努力の長所と短所について考察した。本稿では、リスク緩和と文化的特異性とのトレードオフを示す出力の記憶とバイアスの例と、文化的非依存モデルの開発における全体的な不可能性について論じる。

The paper discusses the potential of large vision-language models as objects of interest for empirical cultural studies. Focusing on the comparative analysis of outputs from two popular text-to-image synthesis models, DALL-E 2 and Stable Diffusion, the paper tries to tackle the pros and cons of striving towards culturally agnostic vs. culturally specific AI models. The paper discusses several examples of memorization and bias in generated outputs which showcase the trade-off between risk mitigation and cultural specificity, as well as the overall impossibility of developing culturally agnostic models.

翻訳日:2022-11-29 17:55:37 公開日:2022-11-28

# 会話からの低リソース個人属性予測

Low-resource Personal Attribute Prediction from Conversation ( http://arxiv.org/abs/2211.15324v1 )

ライセンス: Link先を確認

Yinan Liu and Hu Chen and Wei Shen and Jiaoyan Chen

(参考訳) 個人知識ベース(pkbs)は、パーソナライズドレコメンデーションやwebベースのチャットボットなど、幅広いアプリケーションにとって重要である。 PKBを構築する上で重要な課題は、ユーザの会話データから個人属性の知識を抽出することである。会話システムや個人属性,これらのユーザの発話のユーザ数を考えると,ユーザ毎の個人属性値のランク付けを予測することが目的である。従来の研究では、ラベル付き発話や外部データなどのリソースの相対的な数に依存することが多いが、ラベル付き発話に埋め込まれた属性知識は未利用であり、難解な個人属性を予測する能力は未だに不十分である。さらに,この課題を直接解決するために,いくつかのテキスト分類手法が利用可能であることが判明した。しかし、これらの難しい個人的属性に対してうまく機能しない。本稿では,ラベル付き発話や外部データを使用しない低リソース環境下で,発話から豊富な個人的属性知識を活用し,会話から個人的属性を予測する新しい枠組みを提案する。 PEARLは、更新された事前属性知識を用いて、両項意味情報と単語共起情報をシームレスに結合し、両項トピックモデルのギブスサンプリングプロセスを反復的に洗練する。広範な実験結果から,pearlは2つのデータセット上での会話による個人属性予測のタスクだけでなく,より一般的な弱い教師付きテキスト分類タスクを1つのデータセット上で超えていることがわかった。

Personal knowledge bases (PKBs) are crucial for a broad range of applications such as personalized recommendation and Web-based chatbots. A critical challenge in building PKBs is extracting personal attribute knowledge from users' conversation data. Given some users of a conversational system, a personal attribute and these users' utterances, our goal is to predict the ranking of the given personal attribute values for each user. Previous studies often rely on a relative number of resources such as labeled utterances and external data, yet the attribute knowledge embedded in unlabeled utterances is underutilized and their performance in predicting some difficult personal attributes is still unsatisfactory. In addition, it is found that some text classification methods could be employed to resolve this task directly. However, they also do not perform well on those difficult personal attributes. In this paper, we propose a novel framework PEARL to predict personal attributes from conversations by leveraging the abundant personal attribute knowledge from utterances under a low-resource setting in which no labeled utterances or external data are utilized. PEARL combines the biterm semantic information with the word co-occurrence information seamlessly via employing the updated prior attribute knowledge to refine the biterm topic model's Gibbs sampling process in an iterative manner. The extensive experimental results show that PEARL outperforms all the baseline methods not only on the task of personal attribute prediction from conversations over two data sets, but also on the more general weakly supervised text classification task over one data set.

翻訳日:2022-11-29 17:55:26 公開日:2022-11-28

# 資源制約付きゴールPOMDPにおけるシールド

Shielding in Resource-Constrained Goal POMDPs ( http://arxiv.org/abs/2211.15349v1 )

ライセンス: Link先を確認

Michal Ajdarów, Šimon Brlej, Petr Novotný

(参考訳) 我々は,特定の資源(例えば,電池に蓄えられた電力)の供給を必要とするエージェントを正しく動作させるためにモデル化する部分可観測マルコフ決定プロセス(POMDP)を検討する。資源はエージェントの行動によって消費され、特定の状態でのみ補充される。エージェントは、リソースの枯渇を防止しながら、ある目標を達成するための期待されるコストを最小限にすることを目的としており、この問題を資源制約付き目標最適化(RSGO)と呼ぶ。 RSGO問題に対して2段階のアプローチをとる。まず,形式的手法を用いて,与えられたシナリオに対してシールドを計算するアルゴリズムを設計する。シールドとは、エージェントを監視し、最終的に資源の枯渇につながりうる行動の使用を防ぐ手続きである。第2に, RSGO問題を解くアルゴリズムを得るために, シールドを用いたPOMDP計画のためのPOMCPヒューリスティック探索アルゴリズムを拡張した。本アルゴリズムを実装し,そのベンチマークへの適用性を示す実験を行った。

We consider partially observable Markov decision processes (POMDPs) modeling an agent that needs a supply of a certain resource (e.g., electricity stored in batteries) to operate correctly. The resource is consumed by agent's actions and can be replenished only in certain states. The agent aims to minimize the expected cost of reaching some goal while preventing resource exhaustion, a problem we call resource-constrained goal optimization (RSGO). We take a two-step approach to the RSGO problem. First, using formal methods techniques, we design an algorithm computing a shield for a given scenario: a procedure that observes the agent and prevents it from using actions that might eventually lead to resource exhaustion. Second, we augment the POMCP heuristic search algorithm for POMDP planning with our shields to obtain an algorithm solving the RSGO problem. We implement our algorithm and present experiments showing its applicability to benchmarks from the literature.
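
To make the idea of a shield concrete, here is a minimal Python sketch. It is not the paper's algorithm: it assumes a fully observable model with known integer action costs, whereas the paper works with POMDPs. The names `safe_levels`, `allowed_actions`, and the `transitions` layout are hypothetical. The sketch computes, for each state, the least resource level from which the agent can always get back to a usable reload state, and then blocks any action that could leave the agent below that level.

```python
import math


def _exit_need(transitions, need, state):
    """Least level at which some action can be taken safely from `state`."""
    return min(
        (cost + max(need[t] for t in successors)
         for cost, successors in transitions[state].values()),
        default=math.inf,
    )


def safe_levels(transitions, reloads, capacity):
    """Map each state to the minimum resource level that avoids exhaustion.

    transitions[state][action] = (cost, [possible successor states]).
    Entering a reload state refills the resource to `capacity`.
    Reload states that cannot be left safely even with a full
    resource are discarded and the levels are recomputed.
    """
    reloads = set(reloads)
    while True:
        need = {s: 0 if s in reloads else math.inf for s in transitions}
        changed = True
        while changed:  # iterate the worst-case cost to a reload state to a fixed point
            changed = False
            for s in transitions:
                if s in reloads:
                    continue
                best = _exit_need(transitions, need, s)
                if best < need[s]:
                    need[s] = best
                    changed = True
        unusable = {s for s in reloads
                    if _exit_need(transitions, need, s) > capacity}
        if not unusable:
            return need
        reloads -= unusable


def allowed_actions(transitions, need, state, level):
    """The shield: keep only actions that are safe for every possible successor."""
    return sorted(
        action
        for action, (cost, successors) in transitions[state].items()
        if cost <= level and all(need[t] <= level - cost for t in successors)
    )


# Toy model (hypothetical): "base" is the only reload state.
TOY = {
    "base": {"go": (1, ["a"])},
    "a": {"back": (1, ["base"]), "far": (2, ["pit"]), "to_b": (1, ["b"])},
    "b": {"home": (3, ["base"])},
    "pit": {"wait": (1, ["pit"])},
}
NEED = safe_levels(TOY, {"base"}, capacity=5)
print(allowed_actions(TOY, NEED, "a", level=2))  # ['back']
print(allowed_actions(TOY, NEED, "a", level=4))  # ['back', 'to_b']
```

A planner such as POMCP could call `allowed_actions` before expanding a node so that unsafe actions are never sampled. How the paper integrates shields into the search under partial observability is not specified in the abstract.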

翻訳日:2022-11-29 17:55:02 公開日:2022-11-28

# マニューバ識別チャレンジによるAI対応マニューバ識別

AI Enabled Maneuver Identification via the Maneuver Identification Challenge ( http://arxiv.org/abs/2211.15552v1 )

ライセンス: Link先を確認

Kaira Samuel, Matthew LaRosa, Kyle McAlpin, Morgan Schaefer, Brandon Swenson, Devin Wasilefsky, Yan Wu, Dan Zhao, Jeremy Kepner

(参考訳) 人工知能(AI)は、パイロット訓練の質に関する実用的なフィードバックを提供することで、空軍のパイロット訓練を改善する大きな可能性を秘めている。歴史的に、データ、問題記述、サンプルコードで構成されるAIの課題は、AIのブレークスルーを促進するために重要だった。空軍・マサチューセッツ工科大学aiアクセラレーター(daf-mit ai accelerator)は、実世界の航空シミュレータデータを用いたaiチャレンジを開発した。 Maneuver IDチャレンジでは、パイロット訓練会(PTN)で実際に空軍の学生パイロットが収集した何千ものバーチャルリアリティーシミュレーター飛行記録が集められた。このデータセットはManeuver-ID.mit.eduで公開され、USAFの飛行訓練データの最初の公開である。このデータセットを用いて、我々は「良い」と「悪い」シミュレーターデータを分離し、操作の分類と特徴付けに様々なAI手法を適用した。これらのデータ、アルゴリズム、ソフトウェアは、飛行シミュレータトレーニングのためのAIエコシステムを実現するために、他の人が構築するモデルパフォーマンスのベースラインとしてリリースされている。

Artificial intelligence (AI) has enormous potential to improve Air Force pilot training by providing actionable feedback to pilot trainees on the quality of their maneuvers and enabling instructor-less flying familiarization for early-stage trainees in low-cost simulators. Historically, AI challenges consisting of data, problem descriptions, and example code have been critical to fueling AI breakthroughs. The Department of the Air Force-Massachusetts Institute of Technology AI Accelerator (DAF-MIT AI Accelerator) developed such an AI challenge using real-world Air Force flight simulator data. The Maneuver ID challenge assembled thousands of virtual reality simulator flight recordings collected by actual Air Force student pilots at Pilot Training Next (PTN). This dataset has been publicly released at Maneuver-ID.mit.edu and represents the first of its kind public release of USAF flight training data. Using this dataset, we have applied a variety of AI methods to separate "good" vs "bad" simulator data and categorize and characterize maneuvers. These data, algorithms, and software are being released as baselines of model performance for others to build upon to enable the AI ecosystem for flight simulator training.

翻訳日:2022-11-29 17:54:37 公開日:2022-11-28

# ニューロシンボリック時空間推論

Neuro-Symbolic Spatio-Temporal Reasoning ( http://arxiv.org/abs/2211.15566v1 )

ライセンス: Link先を確認

Jae Hee Lee, Michael Sioutis, Kyra Ahrens, Marjan Alirezaie, Matthias Kerzel, Stefan Wermter

(参考訳) 空間と時間に関する知識は、物理的世界の問題を解決するために必要である: 物理的世界に位置し、オブジェクトと相互作用するaiエージェントは、しばしばオブジェクト間の位置と関係について判断する必要がある。しかし時空間的知識は物理的世界との相互作用を超えて必要であり、しばしばアナロジーやメタファー(例えば「私たちの頭の上に掛かっている脅威」)を通して抽象的な概念の世界に移される。空間的および時間的推論はユビキタスであるため、これをAIシステムに統合するためのさまざまな試みがなされている。知識表現の分野では、空間的および時間的推論は、オブジェクトとリレーションシップのモデリングと、オブジェクトとリレーションシップに関するステートメントを検証するための推論方法の開発に大きく制限されている。一方、ニューラルネットワーク研究者は、限られた推論能力を持つデータから空間関係を学習するモデルを教えようとした。これら2つのアプローチ間のギャップを相互に有益な方法で橋渡しすることで、自然言語処理、視覚的質問応答、セマンティックイメージのセグメンテーションなど、多くの複雑な実世界の問題に対処できます。本章では、ニューロシンボリックAIの観点から、この統合問題を考察する。具体的には,空間的および時間的知識に基づく論理的推論と機械学習の相乗効果を提案する。いくつかの成功したアプリケーション、残る課題、そしてこの方向に関連する評価データセットを記述することが、この貢献の主要なトピックである。

Knowledge about space and time is necessary to solve problems in the physical world: An AI agent situated in the physical world and interacting with objects often needs to reason about positions of and relations between objects; and as soon as the agent plans its actions to solve a task, it needs to consider the temporal aspect (e.g., what actions to perform over time). Spatio-temporal knowledge, however, is required beyond interacting with the physical world, and is also often transferred to the abstract world of concepts through analogies and metaphors (e.g., "a threat that is hanging over our heads"). As spatial and temporal reasoning is ubiquitous, different attempts have been made to integrate this into AI systems. In the area of knowledge representation, spatial and temporal reasoning has been largely limited to modeling objects and relations and developing reasoning methods to verify statements about objects and relations. On the other hand, neural network researchers have tried to teach models to learn spatial relations from data with limited reasoning capabilities. Bridging the gap between these two approaches in a mutually beneficial way could allow us to tackle many complex real-world problems, such as natural language processing, visual question answering, and semantic image segmentation. In this chapter, we view this integration problem from the perspective of Neuro-Symbolic AI. Specifically, we propose a synergy between logical reasoning and machine learning that will be grounded on spatial and temporal knowledge. Describing some successful applications, remaining challenges, and evaluation datasets pertaining to this direction is the main topic of this contribution.

翻訳日:2022-11-29 17:54:07 公開日:2022-11-28

# 医療対話における情報の自動抽出 : エキスパートシステムとラベリングへの注意

Automatically Extracting Information in Medical Dialogue: Expert System And Attention for Labelling ( http://arxiv.org/abs/2211.15544v1 )

ライセンス: Link先を確認

Xinshi Wang, Daniel Tang

(参考訳) 現代の医療において,医療対話情報抽出はますます大きな問題になりつつある。電子カルテ(EMR)から重要な情報を大量に抽出することは困難である。これまで研究者は、emrから特徴を検索するための注意に基づくモデルを提案したが、その限界は医療対話の異なるカテゴリを認識することができないことを反映していた。本稿では,新しいモデルであるExpert System and Attention for Labelling (ESAL)を提案する。我々は、専門家と事前訓練されたBERTの混合を用いて、異なるカテゴリのセマンティクスを検索し、モデルがそれらの違いを融合できるようにする。実験では, ESALを公開データセットに適用し, 実験結果から, ESALは医療情報分類の性能を大幅に向上したことが示された。

Medical dialogue information extraction is becoming an increasingly significant problem in modern medical care. It is difficult to extract key information from electronic medical records (EMRs) due to their large numbers. Previously, researchers proposed attention-based models for retrieving features from EMRs, but their limitations were reflected in their inability to recognize different categories in medical dialogues. In this paper, we propose a novel model, Expert System and Attention for Labelling (ESAL). We use mixture of experts and pre-trained BERT to retrieve the semantics of different categories, enabling the model to fuse the differences between them. In our experiment, ESAL was applied to a public dataset and the experimental results indicated that ESAL significantly improved the performance of Medical Information Classification.

翻訳日:2022-11-29 17:47:01 公開日:2022-11-28

# Unfair ToS Clause Detection に対する攻撃:Universal Adversarial Trigger を用いたケーススタディ

Attack on Unfair ToS Clause Detection: A Case Study using Universal Adversarial Triggers ( http://arxiv.org/abs/2211.15556v1 )

ライセンス: Link先を確認

Shanshan Xu and Irina Broda and Rashid Haddad and Marco Negrini and Matthias Grabmair

(参考訳) 近年の研究では、自然言語処理技術がサービス規約(tos)の不正な条項を自動的に検出することで消費者保護を支援することが示されている。この研究は、トランスフォーマーベースのToS分析システムが敵攻撃に対して脆弱であることを示す。我々は,普遍的な敵トリガーを持つ不公平なクラーズ検出器を攻撃実験を行う。実験により、テキストのわずかな摂動は検出性能を著しく低下させることが示された。さらに,トリガの検出可能性を測定するため,回答の精度と応答時間の両方を参加者から収集し,詳細な人的評価研究を行う。その結果、トリガーの自然さが読者を騙す鍵であることがわかった。

Recent work has demonstrated that natural language processing techniques can support consumer protection by automatically detecting unfair clauses in the Terms of Service (ToS) Agreement. This work demonstrates that transformer-based ToS analysis systems are vulnerable to adversarial attacks. We conduct experiments attacking an unfair-clause detector with universal adversarial triggers. Experiments show that a minor perturbation of the text can considerably reduce the detection performance. Moreover, to measure the detectability of the triggers, we conduct a detailed human evaluation study by collecting both answer accuracy and response time from the participants. The results show that the naturalness of the triggers remains key to tricking readers.

翻訳日:2022-11-29 17:46:49 公開日:2022-11-28

# スウェーデンにおける基本読解理解のための質問応答ペアの自動生成

Automatically generating question-answer pairs for assessing basic reading comprehension in Swedish ( http://arxiv.org/abs/2211.15568v1 )

ライセンス: Link先を確認

Dmytro Kalpakchi and Johan Boye

(参考訳) 本稿では,クインダクタ法を用いて,スウェーデン語テキストから自動生成した読解質問の品質評価を行う。本手法は、自動質問生成(qg)のための軽量でデータ駆動だが非ニューラルな手法である。評価の結果,Quinductorはニューラルネットワークに基づくQG手法の強力なベースラインを提供する,実行可能なQG手法であることがわかった。

This paper presents an evaluation of the quality of automatically generated reading comprehension questions from Swedish text, using the Quinductor method. This method is a light-weight, data-driven but non-neural method for automatic question generation (QG). The evaluation shows that Quinductor is a viable QG method that can provide a strong baseline for neural-network-based QG methods.

翻訳日:2022-11-29 17:46:39 公開日:2022-11-28

# 大きな変化を伴う動的コミュニティ検出のための高次知識伝達

Higher-order Knowledge Transfer for Dynamic Community Detection with Great Changes ( http://arxiv.org/abs/2211.15043v1 )

ライセンス: Link先を確認

Huixin Ma, Kai Wu, Handing Wang, Jing Liu

(参考訳) ネットワーク構造は現実の時間とともに進化し,動的ネットワークにおけるコミュニティの変化の発見は,課題を提起する重要な研究課題である。既存のほとんどのメソッドは、ネットワークに大きな変化は起こらないと仮定している。しかし、通常、現実世界には大きな変化がある。ネットワークの大幅な変更により、コミュニティ検出アルゴリズムは以前のスナップショットから貴重な情報を得るのが難しくなり、次のステップでは負の転送が行われる。本稿では、過去のスナップショットから高次知識を統合することで、大幅な変化を伴う動的なコミュニティ検出に焦点を当てた。さらに、検索効率を向上させるために、スナップショットの隣接行列の類似性を検出することにより、一階知識と高階知識を判定する高階知識転送戦略を考案する。このように、我々の提案は、過去のコミュニティ検出結果の利点をよりよく保ち、それらを次のタスクに移すことができる。我々は4つの実世界のネットワークで実験を行い、大きな変更や小さな変更を加えたネットワークを含む。低相似性データセットにおける実験結果は、ネットワークが著しく変化しても一階の知識よりも高階の知識の方が価値があることを示し、高相似性データセットを扱う場合でも利点を保っていることを示している。我々の提案は、大きな変化を伴う他の動的最適化問題を導くこともできる。

Network structure evolves with time in the real world, and the discovery of changing communities in dynamic networks is an important and challenging research topic. Most existing methods assume that no significant change in the network occurs; namely, the difference between adjacent snapshots is slight. However, great changes often occur in the real world. A great change in the network makes it difficult for community detection algorithms to obtain valuable information from the previous snapshot, leading to negative transfer in the next time steps. This paper focuses on dynamic community detection with substantial changes by integrating higher-order knowledge from the previous snapshots to aid the subsequent snapshots. Moreover, to improve search efficiency, a higher-order knowledge transfer strategy is designed to determine first-order and higher-order knowledge by detecting the similarity of the adjacency matrices of snapshots. In this way, our proposal can better keep the advantages of previous community detection results and transfer them to the next task. We conduct experiments on four real-world networks, including networks with great or minor changes. Experimental results on the low-similarity datasets demonstrate that higher-order knowledge is more valuable than first-order knowledge when the network changes significantly, and that it keeps its advantage even when handling the high-similarity datasets. Our proposal can also guide other dynamic optimization problems with great changes.
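
The similarity-based choice between first-order and higher-order knowledge can be illustrated with a short Python sketch. This is one reading of the abstract, not the authors' method. The Jaccard similarity over edge sets, the `threshold` value, and the function names are all assumptions made for illustration. Each snapshot is represented as a set of undirected edges `(u, v)` with `u < v`, which carries the same information as a 0/1 adjacency matrix.

```python
def edge_jaccard(edges_a, edges_b):
    """Jaccard similarity between two snapshots given as edge sets."""
    a, b = set(edges_a), set(edges_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pick_source_snapshot(history, current, threshold=0.5):
    """Choose which earlier snapshot to transfer community knowledge from.

    Returns (index, kind). If the immediately preceding snapshot is similar
    enough, use it ("first-order"). Otherwise fall back to the most similar
    snapshot in the whole history ("higher-order" unless that is still the
    preceding one).
    """
    if not history:
        return None, "none"
    last = len(history) - 1
    if edge_jaccard(history[last], current) >= threshold:
        return last, "first-order"
    best = max(range(len(history)),
               key=lambda i: edge_jaccard(history[i], current))
    return best, "first-order" if best == last else "higher-order"


# Toy history (hypothetical): the network changed a lot at step 1,
# then moved back toward its step-0 structure.
HISTORY = [
    {(1, 2), (2, 3), (3, 4)},
    {(1, 5), (5, 6)},
]
print(pick_source_snapshot(HISTORY, {(1, 2), (2, 3), (3, 4), (4, 5)}))
# (0, 'higher-order')
print(pick_source_snapshot(HISTORY, {(1, 5), (5, 6), (6, 7)}))
# (1, 'first-order')
```

In the first call the preceding snapshot shares no edges with the current one, so transferring from it would risk the negative transfer the abstract describes, and the older snapshot is chosen instead.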

翻訳日:2022-11-29 17:46:33 公開日:2022-11-28

# SN Pシステムとその構成グラフの特性

Properties of SN P system and its Configuration Graph ( http://arxiv.org/abs/2211.15159v1 )

ライセンス: Link先を確認

Henry N. Adorna

(参考訳) SN Pシステムとその変種に関する研究が文献でいくつか報告されている。多くの場合、結果は様々な変種の普遍性と、これらの変種が生成し認識する言語のクラスを示している。SN Pシステムの状態はその構成である。構成の到達可能性に関する前回の結果を、SN Pシステムに対する「基本状態方程式」と呼ぶ。本稿では、遅延のないSN Pシステムの動作特性と構造特性のうち、主にこの基本状態方程式に依存するものについて予備的な検討を行う。また、遅延のないSN Pシステム $\Pi$ の構成グラフ $CG_{\Pi}$ の概念を導入し、$CG_{\Pi}$ に関して $\Pi$ の動作特性を特徴付ける。遅延のないSN Pシステム $\Pi$ の行列 $M_{\Pi}$ は、$\Pi$ の構造特性を特徴付けるために使われる。

Several studies have been reported in the literature about SN P systems and their variants. Often, the results provide universality of various variants and the classes of languages that these variants generate and recognize. The state of an SN P system is its configuration. We refer to our previous result on reachability of configurations as the Fundamental State Equation for SN P systems. This paper provides a preliminary investigation of the behavioral and structural properties of SN P systems without delay that depend primarily on this fundamental state equation. Also, we introduce the idea of the configuration graph $CG_{\Pi}$ of an SN P system $\Pi$ without delay to characterize behavioral properties of $\Pi$ with respect to $CG_{\Pi}$. The matrix $M_{\Pi}$ of an SN P system $\Pi$ without delay is used to characterize structural properties of $\Pi$.

翻訳日:2022-11-29 17:46:10 公開日:2022-11-28

# プロンプトベース学習によるキーポイントマッピングへの議論

Arguments to Key Points Mapping with Prompt-based Learning ( http://arxiv.org/abs/2211.14995v1 )

ライセンス: Link先を確認

Ahnaf Mozib Samin, Behrooz Nikandish, Jingyan Chen

(参考訳) 大量の情報を効率的に処理し、消化することは、現代社会における長期的な需要である。キーポイント(必須情報を取り込む短いテキスト要約とフィルタリング冗長性)を多くの引数/オピニオンにマップするソリューションが最近提供されている(bar-haim et al., 2020)。本稿では,引数対キーポイントマッピングタスクの全体像を補完するために,主に2つのアプローチを提案する。最初のアプローチは、事前学習言語モデル(plm)の微調整にプロンプトエンジニアリングを組み込むことである。第2のアプローチは、PLMにおけるプロンプトベースの学習を利用して中間テキストを生成し、元の引数キーポイントペアと組み合わせて、クラス化子に入力として入力し、それらをマッピングする。さらに,実験をクロス/イン・ドメインに拡張し,詳細な分析を行う。私たちの評価では一より直接的な方法による即効的な工学の使用(アプローチ1)は、有望な結果をもたらし、性能を改善することができる。二アプローチ2は、PLMの否定問題により、アプローチ1より著しく悪化する。

Handling and digesting a huge amount of information in an efficient manner has been a long-term demand in modern society. Some solutions to map key points (short textual summaries capturing essential information and filtering redundancies) to a large number of arguments/opinions have been provided recently (Bar-Haim et al., 2020). To complement the full picture of the argument-to-keypoint mapping task, we mainly propose two approaches in this paper. The first approach is to incorporate prompt engineering for fine-tuning the pre-trained language models (PLMs). The second approach utilizes prompt-based learning in PLMs to generate intermediary texts, which are then combined with the original argument-keypoint pairs and fed as inputs to a classifier, thereby mapping them. Furthermore, we extend the experiments to cross/in-domain to conduct an in-depth analysis. In our evaluation, we find that i) using prompt engineering in a more direct way (Approach 1) can yield promising results and improve the performance; ii) Approach 2 performs considerably worse than Approach 1 due to the negation issue of the PLM.

翻訳日:2022-11-29 17:38:37 公開日:2022-11-28

# stage: アスペクト感情三重項抽出のためのスパンタグとグリーディ推論法

STAGE: Span Tagging and Greedy Inference Scheme for Aspect Sentiment Triplet Extraction ( http://arxiv.org/abs/2211.15003v1 )

ライセンス: Link先を確認

Shuo Liang, Wei Wei, Xian-Ling Mao, Yuanyuan Fu, Rui Fang, Dangyang Chen

(参考訳) Aspect Sentiment Triplet extract (ASTE) は感情分析研究において新たな課題となり、ある文からアスペクト項とその対応する意見項とその関連する感情極性を抽出することを目指している。近年、異なるタグ付けスキームを持つ多くのニューラルネットワークベースのモデルが提案されているが、ほとんどすべてのモデルには制限がある。 1) 各単語が1つの役割(アスペクト項や意見項など)にのみ関連しているという事前仮定 2) 単語レベルの相互作用と各意見/アスペクトを独立した単語の集合として扱う。したがって、複数の役割に関連する単語や複数の単語を持つアスペクト/オピニオン項など、複雑なasteタスクではパフォーマンスが低下する。そこで我々は,Span TAgging と Greedy infErence (STAGE) という新たなアプローチを提案し,複数の単語から構成され,同時に異なる役割を演じることができる。そこで本稿では,ASTEタスクを多クラススパン分類問題として定式化する。具体的には、スパンレベルの情報と制約、すなわちスパンタグスキームとグリーディ推論戦略の2つのコンポーネントを探索することで、より正確なアスペクト感情三重項抽出を生成する。前者のタグは、新しく定義されたタグセットに基づいて、可能な候補すべてにまたがる。後者は、候補感情スニペットから最大長のアスペクト/オピニオン項を取得し、感情三重項を出力する。さらに,このステージに基づく簡易かつ効果的なモデルを提案する。これは4つの広く使用されているデータセットにおいて,最先端を大きなマージンで上回っている。さらに,STAGE を他のペア/トリップレット抽出タスクに簡単に一般化することができ,提案方式の STAGE の優位性を示す。

Aspect Sentiment Triplet Extraction (ASTE) has become an emerging task in sentiment analysis research, aiming to extract triplets of the aspect term, its corresponding opinion term, and its associated sentiment polarity from a given sentence. Recently, many neural-network-based models with different tagging schemes have been proposed, but almost all of them have limitations: they rely heavily on 1) the prior assumption that each word is associated with only a single role (e.g., aspect term or opinion term) and 2) word-level interactions that treat each opinion/aspect as a set of independent words. Hence, they perform poorly on complex ASTE cases, such as a word associated with multiple roles or an aspect/opinion term with multiple words. We therefore propose a novel approach, Span TAgging and Greedy infErence (STAGE), to extract sentiment triplets at the span level, where each span may consist of multiple words and play different roles simultaneously. To this end, this paper formulates the ASTE task as a multi-class span classification problem. Specifically, STAGE generates more accurate aspect sentiment triplet extractions by exploring span-level information and constraints, and consists of two components: a span tagging scheme and a greedy inference strategy. The former tags all possible candidate spans based on a newly defined tagging set. The latter retrieves the aspect/opinion term with the maximum length from the candidate sentiment snippet to output sentiment triplets. Furthermore, we propose a simple but effective model based on STAGE, which outperforms the state of the art by a large margin on four widely used datasets. Moreover, STAGE can be easily generalized to other pair/triplet extraction tasks, which also demonstrates the superiority of the proposed scheme.

翻訳日:2022-11-29 17:38:20 公開日:2022-11-28

# WMT22チャット翻訳タスクのためのBJTU-WeChatのシステム

BJTU-WeChat's Systems for the WMT22 Chat Translation Task ( http://arxiv.org/abs/2211.15009v1 )

ライセンス: Link先を確認

Yunlong Liang, Fandong Meng, Jinan Xu, Yufeng Chen, Jie Zhou

(参考訳) 本稿では,WMT'22チャット翻訳タスク(英語-ドイツ語)に対する北京交通大学とWeChat AIの共同提出システムを紹介する。 Transformerに基づいて、いくつかの有効な変種を適用する。実験では,事前学習型微調整パラダイムを用いた。最初の事前学習段階では、データフィルタリングと合成データ生成(バックトランスレーション、フォワードトランスレーション、知識蒸留)を用いる。第2のファインチューニング段階では、話者対応のドメイン内データ生成、話者適応、プロンプトベースコンテキストモデリング、ターゲットデノイング微調整、ブースト型のself-COMETベースのモデルアンサンブルについて検討する。本システムは0.810と0.946のCOMETスコアを得る。英語-ドイツ語とドイツ語-英語のCOMETスコアは、全ての応募の中で最高である。

This paper introduces the joint submission of the Beijing Jiaotong University and WeChat AI to the WMT'22 chat translation task for English-German. Based on the Transformer, we apply several effective variants. In our experiments, we utilize the pre-training-then-fine-tuning paradigm. In the first pre-training stage, we employ data filtering and synthetic data generation (i.e., back-translation, forward-translation, and knowledge distillation). In the second fine-tuning stage, we investigate speaker-aware in-domain data generation, speaker adaptation, prompt-based context modeling, target denoising fine-tuning, and boosted self-COMET-based model ensemble. Our systems achieve 0.810 and 0.946 COMET scores. The COMET scores of English-German and German-English are the highest among all submissions.

翻訳日:2022-11-29 17:37:48 公開日:2022-11-28

# 夏:WMT22バイオメディカル翻訳タスクのためのWeChatニューラル機械翻訳システム

Summer: WeChat Neural Machine Translation Systems for the WMT22 Biomedical Translation Task ( http://arxiv.org/abs/2211.15022v1 )

ライセンス: Link先を確認

Ernan Li, Fandong Meng and Jie Zhou

(参考訳) 本稿では,WeChatのWMT 2022への参加について紹介する。我々のシステムはトランスフォーマに基づいており、いくつかの異なるトランスフォーマ構造を使用して翻訳の質を向上させる。実験では,データフィルタリング,データ生成,トランスフォーマーのいくつかの変種,微調整,モデルアンサンブルを用いた。われわれの中国の$\to$EnglishシステムはSummerと名付けられ、全応募中で最も高いBLEUスコアを達成している。

This paper introduces WeChat's participation in WMT 2022 shared biomedical translation task on Chinese to English. Our systems are based on the Transformer, and use several different Transformer structures to improve the quality of translation. In our experiments, we employ data filtering, data generation, several variants of Transformer, fine-tuning and model ensemble. Our Chinese$\to$English system, named Summer, achieves the highest BLEU score among all submissions.

翻訳日:2022-11-29 17:37:35 公開日:2022-11-28

# DiffusionBERT: 拡散モデルによる生成的マスク言語モデルの改善

DiffusionBERT: Improving Generative Masked Language Models with Diffusion Models ( http://arxiv.org/abs/2211.15029v1 )

ライセンス: Link先を確認

Zhengfu He, Tianxiang Sun, Kuanning Wang, Xuanjing Huang, Xipeng Qiu

(参考訳) 離散拡散モデルに基づく新しい生成マスク付き言語モデルであるDiffusionBERTを提案する。拡散モデルと多くの事前訓練された言語モデルは共通の訓練目標、すなわち2つの強力なモデルを組み合わせ、両方の世界の最高のものを楽しむことができる。一方、拡散モデルは、生成品質を改善するための有望なトレーニング戦略を提供する。一方、事前訓練された言語モデル(例えばBERT)は収束を加速する優れた初期化として使用できる。我々は,離散拡散過程の逆過程を吸収状態で学習し,それを改善するためにいくつかの設計を解明するためにBERTを訓練する。まず,各ステップに付加される雑音の度合いを,各トークンの情報に基づいて制御する前方拡散プロセスのための新しいノイズスケジュールを提案する。次に,時間ステップをBERTに組み込む設計について検討する。非条件テキスト生成の実験では、DiffusionBERTはテキストの既存の拡散モデル(例えば、D3PMとDiffusion-LM)や、パープレキシティとBLEUスコアの点で、以前の生成的マスキング言語モデルよりも大幅に改善されている。

We present DiffusionBERT, a new generative masked language model based on discrete diffusion models. Diffusion models and many pre-trained language models have a shared training objective, i.e., denoising, making it possible to combine the two powerful models and enjoy the best of both worlds. On the one hand, diffusion models offer a promising training strategy that helps improve the generation quality. On the other hand, pre-trained denoising language models (e.g., BERT) can be used as a good initialization that accelerates convergence. We explore training BERT to learn the reverse process of a discrete diffusion process with an absorbing state and elucidate several designs to improve it. First, we propose a new noise schedule for the forward diffusion process that controls the degree of noise added at each step based on the information of each token. Second, we investigate several designs of incorporating the time step into BERT. Experiments on unconditional text generation demonstrate that DiffusionBERT achieves significant improvement over existing diffusion models for text (e.g., D3PM and Diffusion-LM) and previous generative masked language models in terms of perplexity and BLEU score.
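
To make the "discrete diffusion process with an absorbing state" concrete, here is a minimal sketch of the forward (noising) direction, in which tokens are independently replaced by a `[MASK]` token with a probability that grows with the time step. The linear schedule `t / num_steps` and the function names are assumptions for illustration; this is a uniform schedule, not the token-information-based schedule that the abstract proposes.

```python
import random

MASK = "[MASK]"  # the absorbing state: once a token is masked it stays masked


def mask_probability(t, num_steps):
    """Assumed linear schedule: chance that a token is absorbed by step t."""
    return t / num_steps


def forward_diffuse(tokens, t, num_steps, rng):
    """Sample x_t from x_0 by independently masking each token."""
    p = mask_probability(t, num_steps)
    return [MASK if rng.random() < p else tok for tok in tokens]


rng = random.Random(0)
sentence = ["the", "cat", "sat", "on", "the", "mat"]
half_noised = forward_diffuse(sentence, 5, 10, rng)  # about half the tokens masked
```

A denoising model such as BERT would then be trained to predict the original tokens at the masked positions, which is the reverse process the abstract refers to.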

翻訳日:2022-11-29 17:37:27 公開日:2022-11-28

# songrewriter: コントロール可能なコンテンツとrhymeスキームを備えた中国の歌の書き直しシステム

SongRewriter: A Chinese Song Rewriting System with Controllable Content and Rhyme Scheme ( http://arxiv.org/abs/2211.15037v1 )

ライセンス: Link先を確認

Yusen Sun, Liangyou Li, Qun Liu and Dit-Yan Yeung

(参考訳) 近年,歌詞生成は顕著な進歩を遂げているが,互換性のある旋律を作成せずには歌詞を演奏できないため,実用的応用は限られている。そこで本研究では,生成した歌詞が既存の旋律のリズムと適合し,歌えるように,既存の歌の歌詞を書き換える歌書き換えシステムを提案することで,この実用的ギャップを解消する。特に,メロディ構成の事前知識を必要とせず,ユーザを支援する制御可能な中国語歌詞生成・編集システムであるsongrewriterを提案する。システムはランダム化されたマルチレベルマスキング戦略によって訓練され、完全に新しい歌詞を生成したり、いくつかの断片を編集するための統一モデルを生成する。生成プロセスの制御能力を向上させるために、コンテンツの語彙選択を制御するキーワードプロンプトを更に取り入れ、フレキシブルエンドおよび内部リズムスキームを実現するための新しい復号制約と母音モデリングタスクを提案する。先行韻律はラップ歌詞を主目的とするが,新たに3つの韻律評価指標を提案する。自動評価と人間評価の両方により,提案モデルが,内容と韻律品質の両方において,最先端モデルよりも優れた性能を示す。 MindSpore Liteツールで実装されたコードとモデルが利用可能になります。

Although lyrics generation has achieved significant progress in recent years, it has limited practical applications because the generated lyrics cannot be performed without composing compatible melodies. In this work, we bridge this practical gap by proposing a song rewriting system which rewrites the lyrics of an existing song such that the generated lyrics are compatible with the rhythm of the existing melody and thus singable. In particular, we propose SongRewriter, a controllable Chinese lyric generation and editing system which assists users without prior knowledge of melody composition. The system is trained by a randomized multi-level masking strategy which produces a unified model for generating entirely new lyrics or editing a few fragments. To improve the controllability of the generation process, we further incorporate a keyword prompt to control the lexical choices of the content and propose novel decoding constraints and a vowel modeling task to enable flexible end and internal rhyme schemes. While prior rhyming metrics are mainly for rap lyrics, we propose three novel rhyming evaluation metrics for song lyrics. Both automatic and human evaluations show that the proposed model performs better than the state-of-the-art models in both content and rhyming quality. Our code and models, implemented with the MindSpore Lite tool, will be made available.

翻訳日:2022-11-29 17:37:08 公開日:2022-11-28

# 超大語彙を持つ大規模事前学習モデル:ヘブライ語のBERTモデルの対比分析と、その全てを上回る新しいモデル

Large Pre-Trained Models with Extra-Large Vocabularies: A Contrastive Analysis of Hebrew BERT Models and a New One to Outperform Them All ( http://arxiv.org/abs/2211.15199v1 )

ライセンス: Link先を確認

Eylon Guetta, Avi Shmidman, Shaltiel Shmidman, Cheyn Shmuel Shmidman, Joshua Guedalia, Moshe Koppel, Dan Bareket, Amit Seker, Reut Tsarfaty

(参考訳) 我々は,従来のヘブライ語plmよりもはるかに大きな語彙(128k項目)を用いた現代ヘブライ語のための新しい事前学習言語モデル(plm)を提案する。我々は,従来のヘブライ語 PLM (mBERT, heBERT, AlephBERT) に対して,このモデルを対照的に解析し,より大きな語彙がタスク性能に与える影響を評価する。実験の結果、より大きな語彙は分割を減らし、分割を減らすことは、異なるタスクをまたいだモデルの性能向上に役立つことがわかった。すべての新しいモデルにおいて、Morphological Segmentation、POS Tagging、Full Morphological Analysis、NER、Sentiment Analysisを含むすべてのHebrewベンチマークで新しいSOTAを実現している。その後、レイヤ数やトレーニングデータだけでなく、その語彙の観点からも大きなplmを提唱します。制限のない使用のために、新しいモデルを公開しています。

We present a new pre-trained language model (PLM) for modern Hebrew, termed AlephBERTGimmel, which employs a much larger vocabulary (128K items) than previous standard Hebrew PLMs. We perform a contrastive analysis of this model against all previous Hebrew PLMs (mBERT, heBERT, AlephBERT) and assess the effects of larger vocabularies on task performance. Our experiments show that larger vocabularies lead to fewer splits, and that reducing splits is better for model performance across different tasks. All in all, this new model achieves new SOTA on all available Hebrew benchmarks, including Morphological Segmentation, POS Tagging, Full Morphological Analysis, NER, and Sentiment Analysis. Consequently, we advocate for PLMs that are larger not only in terms of number of layers or training data, but also in terms of their vocabulary. We release the new model publicly for unrestricted use.

翻訳日:2022-11-29 17:36:48 公開日:2022-11-28

# HERDPhobia:ナイジェリアのフラーニに対するヘイトスピーチのデータセット

HERDPhobia: A Dataset for Hate Speech against Fulani in Nigeria ( http://arxiv.org/abs/2211.15262v1 )

ライセンス: Link先を確認

Saminu Mohammad Aliyu, Gregory Maksha Wajiga, Muhammad Murtala, Shamsuddeen Hassan Muhammad, Idris Abdulmumin, Ibrahim Said Ahmad

(参考訳) ソーシャルメディアプラットフォームは、ユーザーが問題や自分が望むものについて自由に意見を共有できるようにする。しかし、憎しみや虐待的なコンテンツを広めるのも容易だ。フラーニ族はこの不幸な現象の犠牲者となっている。本稿では,ナイジェリアのフラーニ牧草地における最初の注釈付きヘイトスピーチデータセットであるHERDPhobiaについて,英語,ナイジェリア・ピジン,ハウサの3言語で紹介する。我々は,事前学習した言語モデルを用いて,ツイートを憎悪か非憎悪かのいずれかに分類するベンチマーク実験を行う。我々の実験によると、XML-Tモデルは99.83%の重み付きF1でより良いパフォーマンスを提供する。さらなる研究のために、データセットをhttps://github.com/hausanlp/HERDPhobiaでリリースしました。

Social media platforms allow users to freely share their opinions about issues or anything they feel like. However, they also make it easier to spread hate and abusive content. The Fulani ethnic group has been the victim of this unfortunate phenomenon. This paper introduces HERDPhobia, the first annotated hate speech dataset on Fulani herders in Nigeria, in three languages: English, Nigerian-Pidgin, and Hausa. We present a benchmark experiment using pre-trained language models to classify the tweets as either hateful or non-hateful. Our experiment shows that the XML-T model provides better performance with 99.83% weighted F1. We released the dataset at https://github.com/hausanlp/HERDPhobia for further research.

翻訳日:2022-11-29 17:36:29 公開日:2022-11-28

# 2パスカスケードエンコーダASRモデルにおけるE2Eセグメンテーション

E2E Segmentation in a Two-Pass Cascaded Encoder ASR Model ( http://arxiv.org/abs/2211.15432v1 )

ライセンス: Link先を確認

W. Ronny Huang, Shuo-Yiin Chang, Tara N. Sainath, Yanzhang He, David Rybach, Robert David, Rohit Prabhavalkar, Cyril Allauzen, Cal Peyser, Trevor D. Strohman

(参考訳) 2パスのカスケードエンコーダASRとニューラルセグメンタを1つのモデルに統合することを検討する。重要な課題は、セグメンタ(デコーダと同期してリアルタイムに実行される)が、推論中にユーザの認識したレイテンシや削除エラーを発生させることなく、第2パス(リアルタイムに900msの後方で動作する)をファイナライズできるようにすることである。本稿では,ニューラルセグメンタを1stパスデコーダと統合して終端信号(EOS)をリアルタイムに出力する設計を提案する。 EOS信号は、非因果性第2パスのファイナライズに使用される。第2パスをファイナライズする方法を試作し,新しいダミーフレームインジェクション戦略により,高品質な第2パスと低ファイナライズ遅延を同時に実現できることを確認した。実世界の長文キャプションタスク(YouTube)では、2.4%の相対的なWERと140ミリ秒のEOSレイテンシを、同じカスケードエンコーダを持つベースラインのVADベースのセグメンタで達成している。

We explore unifying a neural segmenter with two-pass cascaded encoder ASR into a single model. A key challenge is allowing the segmenter (which runs in real-time, synchronously with the decoder) to finalize the 2nd pass (which runs 900 ms behind real-time) without introducing user-perceived latency or deletion errors during inference. We propose a design where the neural segmenter is integrated with the causal 1st pass decoder to emit an end-of-segment (EOS) signal in real-time. The EOS signal is then used to finalize the non-causal 2nd pass. We experiment with different ways to finalize the 2nd pass, and find that a novel dummy frame injection strategy allows for simultaneous high quality 2nd pass results and low finalization latency. On a real-world long-form captioning task (YouTube), we achieve 2.4% relative WER and 140 ms EOS latency gains over a baseline VAD-based segmenter with the same cascaded encoder.

翻訳日:2022-11-29 17:36:04 公開日:2022-11-28

# adatask:マルチタスク学習のためのタスク認識適応学習率アプローチ

AdaTask: A Task-aware Adaptive Learning Rate Approach to Multi-task Learning ( http://arxiv.org/abs/2211.15055v1 )

ライセンス: Link先を確認

Enneng Yang, Junwei Pan, Ximei Wang, Haibin Yu, Li Shen, Xihua Chen, Lei Xiao, Jie Jiang, Guibing Guo

(参考訳) マルチタスク学習(MTL)モデルは、コンピュータビジョン、自然言語処理、レコメンダシステムにおいて印象的な結果を示している。多くのアプローチが提案されているが、それらが各パラメータ上で異なるタスクをどの程度うまくバランスさせているかは、まだ不明である。本稿では,パラメータに対する各タスクの総更新量によって,そのパラメータのタスク支配度を測定することを提案する。具体的には、対応するタスクによるパラメータの2乗更新の指数減衰平均(AU)で総更新量を計算する。この新しい指標に基づいて、既存のMTL手法の多くのパラメータ、特に高位の共有層のパラメータが、依然として1つまたは少数のタスクに支配されていることを観測する。AUの支配は、主に1つまたは少数のタスクからの累積勾配の支配に起因する。そこで本研究では,適応学習率手法(AdaGrad、RMSProp、Adamなど)において、累積勾配、ひいては各パラメータに対する各タスクの学習率を分離する、タスク単位の適応学習率アプローチAdaTaskを提案する。コンピュータビジョンとレコメンダシステムのMTLデータセットに関する包括的な実験は、AdaTaskが支配される側のタスクの性能を大幅に改善し、SOTAの平均タスク性能を達成することを示した。合成データと実世界のデータセットの両方の分析は、AdaTaskがすべての共有層でパラメータをうまくバランスさせることを示している。

Multi-task learning (MTL) models have demonstrated impressive results in computer vision, natural language processing, and recommender systems. Even though many approaches have been proposed, how well these approaches balance different tasks on each parameter remains unclear. In this paper, we propose to measure the task dominance degree of a parameter by the total updates of each task on this parameter. Specifically, we compute the total updates by the exponentially decaying Average of the squared Updates (AU) on a parameter from the corresponding task. Based on this novel metric, we observe that many parameters in existing MTL methods, especially those in the higher shared layers, are still dominated by one or several tasks. The dominance of AU is mainly due to the dominance of accumulative gradients from one or several tasks. Motivated by this, we propose a Task-wise Adaptive learning rate approach, AdaTask for short, which separates the accumulative gradients, and hence the learning rate, of each task for each parameter in adaptive learning rate approaches (e.g., AdaGrad, RMSProp, and Adam). Comprehensive experiments on computer vision and recommender system MTL datasets demonstrate that AdaTask significantly improves the performance of dominated tasks, resulting in SOTA average task-wise performance. Analysis on both synthetic and real-world datasets shows that AdaTask balances parameters well in every shared layer.
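
The idea of separating accumulated gradients per task can be sketched with an RMSProp-style update on a single shared scalar parameter. This is only an illustration under stated assumptions: the function names, hyperparameters, and the choice of RMSProp as the base optimizer are hypothetical, and the paper's actual update rules may differ in detail.

```python
import math


def rmsprop_shared_step(param, task_grads, state, lr=0.01, decay=0.9, eps=1e-8):
    """Baseline: a single accumulator for the summed gradient of all tasks."""
    g = sum(task_grads)
    state["v"] = decay * state["v"] + (1 - decay) * g * g
    return param - lr * g / (math.sqrt(state["v"]) + eps)


def rmsprop_taskwise_step(param, task_grads, state, lr=0.01, decay=0.9, eps=1e-8):
    """Task-wise variant: one accumulator per task.

    Each task's gradient is normalised by its own running average of squared
    gradients, so a task with small gradients is not drowned out by a task
    with large ones.
    """
    update = 0.0
    for k, g in enumerate(task_grads):
        state["v"][k] = decay * state["v"][k] + (1 - decay) * g * g
        update += lr * g / (math.sqrt(state["v"][k]) + eps)
    return param - update


# Two tasks whose gradients on the shared parameter differ by a factor of 100.
grads = [10.0, 0.1]
shared = rmsprop_shared_step(0.0, grads, {"v": 0.0})
taskwise = rmsprop_taskwise_step(0.0, grads, {"v": [0.0, 0.0]})
```

In the shared version the step is driven almost entirely by the first task, whereas in the task-wise version both tasks contribute a step of nearly the same size after normalisation.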

翻訳日:2022-11-29 17:21:09 公開日:2022-11-28

# 畳み込みネットワークを用いたDolphin Whistlesの自動検出と伝達学習

Automated Detection of Dolphin Whistles with Convolutional Networks and Transfer Learning ( http://arxiv.org/abs/2211.15406v1 )

ライセンス: Link先を確認

Burla Nur Korkmaz, Roee Diamant, Gil Danino, Alberto Testolin

(参考訳) 海洋環境の効率的な保全と絶滅危惧種の野生生物管理は、環境モニタリングのための効率的で正確でスケーラブルなソリューションの実装を必要とする。エコ音響学は、環境音の非侵襲的長期サンプリングの利点を提供し、生物多様性調査の基準ツールとなる可能性がある。しかし、音響データの分析と解釈は、しばしば大量の人間の監督を必要とする時間を要するプロセスである。この問題は、ディープラーニング研究の進歩により、最近目覚ましいパフォーマンスを達成した音声信号分析の現代的技術を活用することで解決されるかもしれない。本稿では,畳み込み型ニューラルネットワークが水中の音声記録からイルカの口笛を識別することで,従来の自動手法よりもはるかに優れていることを示す。提案システムでは,環境雑音の存在下でも信号を検出することができると同時に,偽陽性や偽陰性の発生可能性も一貫して低減できる。本研究は,海洋生態系の自動モニタリングを改善するための人工知能技術の導入をさらに支援する。

Effective conservation of maritime environments and wildlife management of endangered species require the implementation of efficient, accurate and scalable solutions for environmental monitoring. Ecoacoustics offers the advantages of non-invasive, long-duration sampling of environmental sounds and has the potential to become the reference tool for biodiversity surveying. However, the analysis and interpretation of acoustic data is a time-consuming process that often requires a great amount of human supervision. This issue might be tackled by exploiting modern techniques for automatic audio signal analysis, which have recently achieved impressive performance thanks to the advances in deep learning research. In this paper we show that convolutional neural networks can indeed significantly outperform traditional automatic methods in a challenging detection task: identification of dolphin whistles from underwater audio recordings. The proposed system can detect signals even in the presence of ambient noise, at the same time consistently reducing the likelihood of producing false positives and false negatives. Our results further support the adoption of artificial intelligence technology to improve the automatic monitoring of marine ecosystems.

翻訳日:2022-11-29 17:20:14 公開日:2022-11-28

# 合成主成分設計:合成制御による高速共変量バランス

Synthetic Principal Component Design: Fast Covariate Balancing with Synthetic Controls ( http://arxiv.org/abs/2211.15241v1 )

ライセンス: Link先を確認

Yiping Lu, Jiajin Li, Lexing Ying, Jose Blanchet

(参考訳) 実験の最適設計は一般にNP-ハード組合せ最適化問題を解くことである。本稿では,グローバルに収束し,効率的な最適化アルゴリズムを開発することを目的とする。具体的には、前処理結果データが利用可能で、合成制御推定器が呼び出される設定を考える。平均処理効果は、処理単位の重み付き平均結果と、観察データから重みが学習される制御単位の差によって推定される。この設定下では、最適実験設計問題はいわゆる \textit{phase sync}問題に還元できることを驚くほど観察した。スペクトル初期化を用いた一般化電力法の正規化変種を用いてこの問題を解決する。理論的には、あるデータ生成プロセスから前処理データをサンプリングする場合、実験設計のための最初の大域的最適性保証を確立する。実験では,米国労働統計局とアバディ・ダイモンド・ハインミューラー・カリフォルニア喫煙データの両方において,本手法の有効性を実証する実験を行った。根平均二乗誤差の観点からは、このアルゴリズムはランダムな設計を大きなマージンで超えている。

The optimal design of experiments typically involves solving an NP-hard combinatorial optimization problem. In this paper, we aim to develop a globally convergent and practically efficient optimization algorithm. Specifically, we consider a setting where the pre-treatment outcome data is available and the synthetic control estimator is invoked. The average treatment effect is estimated via the difference between the weighted average outcomes of the treated and control units, where the weights are learned from the observed data. Under this setting, we observe, surprisingly, that the optimal experimental design problem can be reduced to a so-called phase synchronization problem. We solve this problem via a normalized variant of the generalized power method with spectral initialization. On the theoretical side, we establish the first global optimality guarantee for experiment design when pre-treatment data is sampled from certain data-generating processes. Empirically, we conduct extensive experiments to demonstrate the effectiveness of our method on both the US Bureau of Labor Statistics and the Abadie-Diamond-Hainmueller California Smoking Data. In terms of the root mean square error, our algorithm surpasses the random design by a large margin.
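
For readers unfamiliar with the generalized power method, the following sketch shows a plain (not normalized) version for a ±1 synchronization problem: maximize $x^\top A x$ over $x \in \{-1, +1\}^n$ by starting from the sign pattern of the leading eigenvector and repeatedly applying $x \leftarrow \mathrm{sign}(A x)$. The matrix `A`, the function names, and the pure-Python power iteration are assumptions for illustration; how the paper builds its matrix from pre-treatment data, and its normalized variant, are not reproduced here.

```python
def matvec(a, x):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def sign(v):
    """Entrywise sign with ties broken towards +1."""
    return [1 if vi >= 0 else -1 for vi in v]


def leading_eigenvector(a, iters=200):
    """Power iteration; assumes the largest eigenvalue dominates in magnitude."""
    x = [1.0 + 0.1 * i for i in range(len(a))]  # deterministic, non-uniform start
    for _ in range(iters):
        y = matvec(a, x)
        norm = sum(v * v for v in y) ** 0.5
        if norm == 0:
            break
        x = [v / norm for v in y]
    return x


def generalized_power_method(a, iters=50):
    """Spectral initialization followed by x <- sign(A x) until a fixed point."""
    x = sign(leading_eigenvector(a))
    for _ in range(iters):
        nxt = sign(matvec(a, x))
        if nxt == x:
            break
        x = nxt
    return x


# Noise-free toy instance: A = z z^T for a hidden assignment z.
z = [1, -1, 1, -1]
A = [[zi * zj for zj in z] for zi in z]
assignment = generalized_power_method(A)
```

The recovered vector is only determined up to a global sign flip, which in an experimental-design reading would correspond to swapping the treated and control groups.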

翻訳日:2022-11-29 17:11:02 公開日:2022-11-28

# サインコヒーレンシによる個別処理規則のメタ分析

Meta-analysis of individualized treatment rules via sign-coherency ( http://arxiv.org/abs/2211.15476v1 )

ライセンス: Link先を確認

Jay Jojo Cheng, Jared D. Huling, Guanhua Chen

(参考訳) 患者の基本特性に合わせた治療は、副作用を減少させながら患者の予後を改善する可能性を秘めている。個別化処理ルール(ITR)の学習には、複数のデータセット(サイト)の集約が必要となることが多いが、現在のITR方法論では、サイト間の不均一性を考慮していないため、各サイトへのデプロイ時にモデルの一般化性が損なわれる可能性がある。そこで本研究では,ITRの個人レベルでのメタ分析手法を開発し,地域固有のITRを共同で学習すると同時に,科学的に動機付けられた指向性原理を通じて特徴記号コヒーレンシに関する情報を借用する。また,itr学習問題に適応した情報基準を用いて,モデルチューニングのための適応手順を開発した。提案手法を数値実験により検討し,多地点間不均一性の異なるレベル下での性能を把握し,その手法を適用して電子健康記録の多施設データベース上でITRを推定する。この研究は、ITR(Aラーニング、重み付け学習)をマルチサイト設定に推定するためのいくつかの一般的な方法論を拡張した。

Medical treatments tailored to a patient's baseline characteristics hold the potential of improving patient outcomes while reducing negative side effects. Learning individualized treatment rules (ITRs) often requires aggregation of multiple datasets (sites); however, current ITR methodology does not take between-site heterogeneity into account, which can hurt model generalizability when deploying back to each site. To address this problem, we develop a method for individual-level meta-analysis of ITRs, which jointly learns site-specific ITRs while borrowing information about feature sign-coherency via a scientifically-motivated directionality principle. We also develop an adaptive procedure for model tuning, using information criteria tailored to the ITR learning problem. We study the proposed methods through numerical experiments to understand their performance under different levels of between-site heterogeneity and apply the methodology to estimate ITRs in a large multi-center database of electronic health records. This work extends several popular methodologies for estimating ITRs (A-learning, weighted learning) to the multiple-sites setting.

翻訳日:2022-11-29 17:10:30 公開日:2022-11-28

# テキスト-SQLモデルのセキュリティ脆弱性について

On the Security Vulnerabilities of Text-to-SQL Models ( http://arxiv.org/abs/2211.15363v1 )

ライセンス: Link先を確認

Xutan Peng, Yipeng Zhang, Jingfeng Yang, Mark Stevenson

(参考訳) 最近の研究によると、テキスト処理アルゴリズムは多くのタスクに効果があるものの、意図的な攻撃に対して脆弱である可能性がある。しかし、このような弱点が直接セキュリティの脅威に繋がるかどうかはまだ未定だ。このギャップを埋めるため、データベースの自然言語インターフェースを構築するテクニックであるText-to-SQLの脆弱性テストを実施しました。実証的な結果として、2つの商用ブラックボックス(Baidu-UNIT と Codex で動作する Ai2sql)の Text-to-SQL モジュールが悪意のあるコードを生成するために操作可能であることを示しました。これは、NLPモデルが野生の攻撃ベクトルとして利用される危険性の初めての実証である。さらに、4つのオープンソースフレームワークを含む実験により、単純なバックドア攻撃がテキストからSQLシステムで100%の成功率を達成できることを確認した。これらの知見を報告し,実践的な防衛策を提案することにより,ソフトウェアセキュリティ問題の特定と修復にNLPコミュニティから直ちに注意を喚起する。

Recent studies show that, despite being effective on numerous tasks, text processing algorithms may be vulnerable to deliberate attacks. However, the question of whether such weaknesses can directly lead to security threats is still under-explored. To bridge this gap, we conducted vulnerability tests on Text-to-SQL, a technique that builds natural language interfaces for databases. Empirically, we showed that the Text-to-SQL modules of two commercial black boxes (Baidu-UNIT and Codex-powered Ai2sql) can be manipulated to produce malicious code, potentially leading to data breaches and Denial of Service. This is the first demonstration of the danger of NLP models being exploited as attack vectors in the wild. Moreover, experiments involving four open-source frameworks verified that simple backdoor attacks can achieve a 100% success rate on Text-to-SQL systems with almost no prediction performance impact. By reporting these findings and suggesting practical defences, we call for immediate attention from the NLP community to the identification and remediation of software security issues.

翻訳日:2022-11-29 17:10:10 公開日:2022-11-28

# 可変需要に適応した自律経路・ピックアップ問題に対するマルチエージェント強化学習

Multiagent Reinforcement Learning for Autonomous Routing and Pickup Problem with Adaptation to Variable Demand ( http://arxiv.org/abs/2211.14983v1 )

ライセンス: Link先を確認

Daniel Garces, Sushmita Bhattacharya, Stephanie Gil, Dimitri Bertsekas

(参考訳) 都市地図上で確率的に現れる要求の処理を行う車両群に対して,ルーティング/ピックアップポリシを生成するための学習フレームワークを導出する。私たちは政策に焦点を合わせ 1)車両間の連携を生じさせ、従量化の待ち時間を短縮する。 2)非明快で、未定の今後の要望を考慮し、 3) 基盤となる需要分布の変化に対応できる。特に、オンピーク時間とオフピーク時間のような都市環境における実際の需要条件の変動に対応することに関心があります。私たちはこれを組み合わせて達成し (i)オンラインプレイ、近似ポリシー反復ステップによるロールアウト手法の性能を向上させるルックアヘッド最適化方法、及び (ii)基盤となる需要モデルの変化に適応できるオフライン近似スキーム。特に,wassersteinambiguity集合のq-valid半径を用いて妥当性の領域を定量化することにより,学習したポリシーを異なる需要分布に適応させることができる。本研究では,現在の要求が元の有効領域外にある場合に,トレーニング済みのオフライン近似を切り替える機構を提案する。この場合、wasserstein距離の観点で現在の需要に近い歴史的な需要モデルに基づいてトレーニングされたオフラインアーキテクチャを使うように提案する。我々は、サンフランシスコのダウンタウンにおける実際の納税要求に対するルーティングとピックアップのポリシーを、オンピーク時間とオフピーク時間の間で高いばらつきで学習し、需要分布の実際の変動に対応する方法の能力を実証した。その結果,本手法は,運用研究の古典的手法に基づくベンチマークと同様に,ロールアウトに基づく強化学習よりも優れることがわかった。

We derive a learning framework to generate routing/pickup policies for a fleet of vehicles tasked with servicing stochastically appearing requests on a city map. We focus on policies that 1) give rise to coordination amongst the vehicles, thereby reducing wait times for servicing requests, 2) are non-myopic, considering a-priori unknown potential future requests, and 3) can adapt to changes in the underlying demand distribution. Specifically, we are interested in adapting to fluctuations of actual demand conditions in urban environments, such as on-peak vs. off-peak hours. We achieve this through a combination of (i) online play, a lookahead optimization method that improves the performance of rollout methods via an approximate policy iteration step, and (ii) an offline approximation scheme that allows for adapting to changes in the underlying demand model. In particular, we achieve adaptivity of our learned policy to different demand distributions by quantifying a region of validity using the q-valid radius of a Wasserstein Ambiguity Set. We propose a mechanism for switching the originally trained offline approximation when the current demand is outside the original validity region. In this case, we propose to use an offline architecture, trained on a historical demand model that is closer to the current demand in terms of Wasserstein distance. We learn routing and pickup policies over real taxicab requests in downtown San Francisco with high variability between on-peak and off-peak hours, demonstrating the ability of our method to adapt to real fluctuation in demand distributions. Our numerical results demonstrate that our method outperforms rollout-based reinforcement learning, as well as several benchmarks based on classical methods from the field of operations research.
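
The switching mechanism described in this abstract can be sketched in one dimension: keep the currently active offline model while the observed demand stays within a Wasserstein radius of the demand it was trained on, and otherwise switch to the historical model whose demand is closest. The 1-D samples (for example, requests per time slot), the model names, and the radius value are assumptions for illustration; the paper works with demand distributions over a city map and a specific notion of validity radius.

```python
def wasserstein_1d(xs, ys):
    """W1 distance between two equal-size 1-D empirical samples.

    For equal-size samples on the real line this equals the mean absolute
    difference between the sorted values.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def select_offline_model(current, active, models, radius):
    """Return the name of the offline model to use for the current demand.

    models -- dict mapping a model name to the demand sample it was trained on
    radius -- assumed validity radius around the active model's demand
    """
    if wasserstein_1d(current, models[active]) <= radius:
        return active  # still inside the region of validity
    return min(models, key=lambda name: wasserstein_1d(current, models[name]))


# Hypothetical historical demand samples (requests per time slot).
models = {"off_peak": [1, 2, 2, 3], "on_peak": [8, 9, 10, 11]}
quiet = select_offline_model([1, 2, 3, 3], "off_peak", models, radius=2.0)
busy = select_offline_model([7, 9, 9, 12], "off_peak", models, radius=2.0)
```

With these toy numbers the quiet sample stays with the off-peak model, while the busy sample falls outside the radius and triggers a switch to the on-peak model.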

翻訳日:2022-11-29 17:02:42 公開日:2022-11-28

# 企業の金融リスク分析に関する包括的調査 : 問題、方法、スポットライト、応用

A Comprehensive Survey on Enterprise Financial Risk Analysis: Problems, Methods, Spotlights and Applications ( http://arxiv.org/abs/2211.14997v1 )

ライセンス: Link先を確認

Yu Zhao, Huaming Du

(参考訳) 企業金融リスク分析は、企業の将来的な金融リスクを予測することを目的としており、広く適用されているため、企業金融リスク分析は金融の中核的な研究課題である。リスク管理に関する貴重な調査はすでにいくつかあるが、これらの調査は比較的孤立したアプローチを導入し、近年の企業金融リスク分析の進歩を欠いている。企業金融リスク分析の急速な拡大、特にコンピュータ科学とビッグデータの観点からは、関連する研究を包括的にレビューすることは必要かつ困難である。本調査は、既存の企業金融リスク研究を統合・体系化し、また、企業金融リスク分析のメカニズムと戦略を包括的に要約・解釈し、読者が現在の研究状況や考え方をよりよく理解する上で役立てることを目的とする。本論文は,1968年から2022年までの50年間の企業リスク分析モデリングに関する300以上の論文の体系的文献レビューを提供する。まず,企業リスクの形式的定義と関連する概念について紹介する。次に,リスクタイプの観点から代表作を分類し,リスク分析の3つの側面を要約した。最後に、企業財務リスクをモデル化するための分析手法を比較した。本研究の目的は,企業リスクコミュニケーションのメカニズムと企業ガバナンス,金融機関,政府規制への影響を十分に理解することを目的とした,現在の最先端の研究と,企業リスクをモデル化するための今後の方向性を明らかにすることである。

Enterprise financial risk analysis aims at predicting an enterprise's future financial risk. Because of its wide application, enterprise financial risk analysis has always been a core research issue in finance. Although there are already some valuable and impressive surveys on risk management, these surveys introduce approaches in a relatively isolated way and lack the recent advances in enterprise financial risk analysis. Given the rapid expansion of enterprise financial risk analysis, especially from the computer science and big data perspectives, it is both necessary and challenging to review the relevant studies comprehensively. This survey attempts to connect and systematize existing enterprise financial risk research, and to summarize and interpret the mechanisms and strategies of enterprise financial risk analysis in a comprehensive way, which may help readers better understand the current research status and ideas. This paper provides a systematic literature review of over 300 articles on enterprise risk analysis modelling published over a 50-year period, 1968 to 2022. We first introduce the formal definition of enterprise risk and the related concepts. Then, we categorize the representative works by risk type and summarize the three aspects of risk analysis. Finally, we compare the analysis methods used to model enterprise financial risk. Our goal is to clarify current cutting-edge research on modelling enterprise risk and its possible future directions, aiming to fully understand the mechanisms of enterprise risk communication and influence and their application to corporate governance, financial institutions, and government regulation.

翻訳日:2022-11-29 17:02:16 公開日:2022-11-28

# 移動ロボットによる物体操作のための集団知能

Collective Intelligence for Object Manipulation with Mobile Robots ( http://arxiv.org/abs/2211.15136v1 )

ライセンス: Link先を確認

So Kuroki, Tatsuya Matsushima, Jumpei Arima, Yutaka Matsuo, Shixiang Shane Gu, Yujin Tang

(参考訳) 自然システムは多くの場合、自己組織化と変化への適応を可能にする集団的知性を示すが、ほとんどの人工的なシステムでは同等なものが欠落している。移動ロボットを用いた協調物体操作において,そのようなシステムの可能性を検討する。従来の研究では、制限された設定で問題に対する潜在的な解決策を示すが、計算と学習が困難である。さらに重要なことに、これらのシステムは環境の変化に直面するときに適応する能力を持たない。本研究では,グラデーションに基づくソフトボディシミュレータから得られたプランナーを注意に基づくニューラルネットワークに蒸留することで,マルチロボット操作システムがベースラインよりも優れた性能を実現できることを示す。さらに,本システムでは,トレーニング中に見えない構成に一般化し,外乱や環境変化を適用した場合のタスク完了に適応できる。

While natural systems often present collective intelligence that allows them to self-organize and adapt to changes, the equivalent is missing in most artificial systems. We explore the possibility of such a system in the context of cooperative object manipulation using mobile robots. Although conventional works demonstrate potential solutions for the problem in restricted settings, they have computational and learning difficulties. More importantly, these systems do not possess the ability to adapt when facing environmental changes. In this work, we show that by distilling a planner derived from a gradient-based soft-body physics simulator into an attention-based neural network, our multi-robot manipulation system can achieve better performance than baselines. In addition, our system also generalizes to configurations unseen during training and is able to adapt toward task completion when external turbulence and environmental changes are applied.

翻訳日:2022-11-29 17:01:52 公開日:2022-11-28

# ST-Curriculum Dropoutを用いた時空間グラフモデリング

Easy Begun is Half Done: Spatial-Temporal Graph Modeling with ST-Curriculum Dropout ( http://arxiv.org/abs/2211.15182v1 )

ライセンス: Link先を確認

Hongjun Wang, Jiyuan Chen, Tong Pan, Zipei Fan, Boyuan Zhang, Renhe Jiang, Lingyu Zhang, Yi Xie, Zhongyi Wang, Xuan Song

(参考訳) 交通速度予測やタクシー需要予測といった空間的時間的(st)グラフモデリングは、ディープラーニング分野において重要なタスクである。しかし、グラフ内のノードの場合、それらのSTパターンはSTデータの異種性に依拠し、モデリングの困難さに大きく依存する。我々は、ノードをモデルに有意義な順序で公開することで、従来のトレーニング手順よりもパフォーマンスが向上すると主張している。このアイデアはカリキュラム学習のルーツであり、初期のトレーニングモデルではノイズや難しいサンプルに敏感であることが示唆されている。本稿では,空間時間グラフモデリングのための新しい実装戦略ST-Curriculum Dropoutを提案する。具体的には,高レベルな機能空間における各ノードの学習難易度を評価し,それらの難易度を取り除き,モデルが最初から基本的なst関係のみを処理することを保証する。我々の戦略は、訓練可能なパラメータを加味せずに任意の標準的ディープラーニングアーキテクチャに適用でき、訓練が進むにつれてST関係の難易度を制御することによって、より優れたデータ表現を捉えることができ、より高度な一般化が得られることを示すために、幅広いデータセットに関する広範な実験を行うことができる。

Spatial-temporal (ST) graph modeling, such as traffic speed forecasting and taxi demand prediction, is an important task in the deep learning area. However, for the nodes in a graph, the ST patterns can vary greatly in modeling difficulty, owing to the heterogeneous nature of ST data. We argue that unveiling the nodes to the model in a meaningful order, from easy to complex, can provide performance improvements over the traditional training procedure. The idea has its roots in Curriculum Learning, which suggests that in the early stage of training, models can be sensitive to noise and difficult samples. In this paper, we propose ST-Curriculum Dropout, a novel and easy-to-implement strategy for spatial-temporal graph modeling. Specifically, we evaluate the learning difficulty of each node in a high-level feature space and drop the difficult ones out to ensure the model only needs to handle fundamental ST relations at the beginning, before gradually moving to hard ones. Our strategy can be applied to any canonical deep learning architecture without extra trainable parameters, and extensive experiments on a wide range of datasets are conducted to illustrate that, by controlling the difficulty level of ST relations as the training progresses, the model is able to capture a better representation of the data and thus yields better generalization.
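
The easy-to-hard schedule described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `curriculum_keep_mask`, the linear schedule and the `start_fraction` parameter are assumptions, and the per-node difficulty scores are taken as given rather than computed in a high-level feature space.

```python
import math


def curriculum_keep_mask(difficulty, epoch, total_epochs, start_fraction=0.5):
    """Return one boolean per node: True if the node is kept at this epoch.

    Hypothetical schedule: the kept fraction grows linearly from
    `start_fraction` at epoch 0 to 1.0 at the last epoch, and the
    hardest nodes are the ones dropped.
    """
    if not difficulty:
        return []
    if total_epochs > 1:
        progress = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    else:
        progress = 1.0
    keep_fraction = start_fraction + (1.0 - start_fraction) * progress
    n_keep = max(1, math.ceil(keep_fraction * len(difficulty)))
    # Rank node indices from easiest to hardest and keep the easiest n_keep.
    order = sorted(range(len(difficulty)), key=lambda i: difficulty[i])
    kept = set(order[:n_keep])
    return [i in kept for i in range(len(difficulty))]


if __name__ == "__main__":
    scores = [0.9, 0.1, 0.5, 0.3]  # made-up difficulty scores
    for epoch in range(5):
        print(epoch, curriculum_keep_mask(scores, epoch, total_epochs=5))
```

In a training loop, such a mask would typically be applied to the per-node loss so that dropped nodes contribute no gradient early on.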

翻訳日:2022-11-29 17:01:39 公開日:2022-11-28

# 連続エピソード制御

Continuous Episodic Control ( http://arxiv.org/abs/2211.15183v1 )

ライセンス: Link先を確認

Zhao Yang, Thomas M. Moerland, Mike Preuss, Aske Plaat

(参考訳) 非パラメトリックエピソディックメモリは、強化学習タスクでハイリワード体験を素早くラッチするのに使うことができる。パラメトリック深層強化学習法とは対照的に、これらの手法は解を一度だけ発見し、繰り返し解くだけでよい。しかしながら、エピソディック制御解は離散テーブルに格納されており、このアプローチは離散作用空間問題にのみ適用されている。そこで本研究では,連続行動空間の問題における逐次決定のための非パラメトリックエピソードメモリアルゴリズムであるContinuous Episodic Control (CEC)を提案する。いくつかのスパース・リワード連続制御環境において,提案手法は現状のモデルレスRLやメモリ拡張RLアルゴリズムよりも高速に学習でき,長期性能も良好である。要するに、CECは継続的制御タスクにおける学習の高速なアプローチであり、ハイブリッドアプローチにおけるパラメトリックRLメソッドへの有用な追加である。

Non-parametric episodic memory can be used to quickly latch onto high-reward experience in reinforcement learning tasks. In contrast to parametric deep reinforcement learning approaches, these methods only need to discover the solution once, and may then repeatedly solve the task. However, episodic control solutions are stored in discrete tables, and this approach has so far only been applied to discrete action space problems. Therefore, this paper introduces Continuous Episodic Control (CEC), a novel non-parametric episodic memory algorithm for sequential decision making in problems with a continuous action space. Results on several sparse-reward continuous control environments show that our proposed method learns faster than state-of-the-art model-free RL and memory-augmented RL algorithms, while maintaining good long-run performance as well. In short, CEC can be a fast approach for learning in continuous control tasks, and a useful addition to parametric RL methods in a hybrid approach as well.
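
To make the idea of a non-parametric episodic memory for continuous actions concrete, here is a minimal sketch. It is not the CEC algorithm itself: the class name `EpisodicMemory`, the nearest-neighbour lookup and the "pick the neighbour with the highest stored return" rule are all assumptions chosen for illustration.

```python
import math


class EpisodicMemory:
    """Hypothetical memory of (state, action, episodic return) triples."""

    def __init__(self, k=3):
        self.k = k
        self.entries = []

    def add(self, state, action, episodic_return):
        # Store the experience as-is; nothing is fitted or trained.
        self.entries.append((tuple(state), tuple(action), float(episodic_return)))

    def act(self, state, default_action):
        """Return the stored action with the best return among the k nearest states."""
        if not self.entries:
            return tuple(default_action)
        nearest = sorted(self.entries, key=lambda e: math.dist(e[0], state))[: self.k]
        best = max(nearest, key=lambda e: e[2])
        return best[1]


if __name__ == "__main__":
    memory = EpisodicMemory(k=2)
    memory.add((0.0, 0.0), (0.1,), 1.0)
    memory.add((1.0, 1.0), (0.9,), 5.0)
    memory.add((10.0, 10.0), (-1.0,), 100.0)
    print(memory.act((0.2, 0.1), default_action=(0.0,)))
```

Because actions are stored as real-valued tuples rather than table indices, the lookup works for a continuous action space; exploration around the retrieved action is left out of this sketch.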

翻訳日:2022-11-29 17:01:16 公開日:2022-11-28

# 5G基地局交通予測のためのフェデレートラーニング

Federated Learning for 5G Base Station Traffic Forecasting ( http://arxiv.org/abs/2211.15220v1 )

ライセンス: Link先を確認

Vasileios Perifanis, Nikolaos Pavlidis, Remous-Aris Koutsiamanis, Pavlos S. Efraimidis

(参考訳) モバイルトラフィック予測は、5gモバイルネットワークがスマートで効率的なインフラ計画と管理を可能にするために非常に重要である。ただし、利用可能なデータは基地局のログ情報に限られている。したがって、異なる当事者に対する新たな観察に一般化できる高品質な予測を生成するための訓練方法が求められている。従来のアプローチでは、異なるベースステーションから測定値を収集し、中央のエンティティに送信し、受信したデータを使用して機械学習操作を実行する必要がある。ローカルな観察を広めることで、プライバシ、機密性、パフォーマンス上の懸念が高まり、マシンラーニング技術の適用性が損なわれる。この問題に対処するために,様々な分散学習手法が提案されているが,交通予測への応用はまだ検討されていない。本研究は, 時系列予測のための原基地局集約LTEデータに適用したフェデレーション学習の有効性について検討する。非iidデータのフェデレーション設定でトレーニングされた5つの異なるニューラルネットワークアーキテクチャを用いて、ワンステップ予測を評価する。提示されたアルゴリズムは、5gおよびbeyond challengeのグローバルフェデレーショントラフィック予測に提出された。その結果,フェデレート設定に適応した学習アーキテクチャは,集中型設定と等価な予測誤差を達成し,ベースステーションでの事前処理技術は高い予測精度をもたらすが,最先端のアグリゲータは単純なアプローチを上回らないことがわかった。

Mobile traffic prediction is of great importance on the path of enabling 5G mobile networks to perform smart and efficient infrastructure planning and management. However, available data are limited to base station logging information. Hence, training methods for generating high-quality predictions that can generalize to new observations on different parties are in demand. Traditional approaches require collecting measurements from different base stations and sending them to a central entity, followed by performing machine learning operations using the received data. The dissemination of local observations raises privacy, confidentiality, and performance concerns, hindering the applicability of machine learning techniques. Various distributed learning methods have been proposed to address this issue, but their application to traffic prediction has yet to be explored. In this work, we study the effectiveness of federated learning applied to raw base station aggregated LTE data for time-series forecasting. We evaluate one-step predictions using 5 different neural network architectures trained with a federated setting on non-iid data. The presented algorithms have been submitted to the Global Federated Traffic Prediction for 5G and Beyond Challenge. Our results show that the learning architectures adapted to the federated setting achieve equivalent prediction error to the centralized setting, pre-processing techniques on base stations lead to higher forecasting accuracy, while state-of-the-art aggregators do not outperform simple approaches.
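
The abstract does not name the "simple approaches" to aggregation. One widely used simple aggregator is a sample-size-weighted average of client parameters (often called FedAvg); the sketch below assumes that choice, and the function name `federated_average` and the flat-list parameter format are illustrative only.

```python
def federated_average(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local sample counts.

    client_weights: list of equal-length lists of floats (one per base station).
    client_sizes:   number of local training samples per base station.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one size per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    n_params = len(client_weights[0])
    averaged = [0.0] * n_params
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != n_params:
            raise ValueError("all clients must share one parameter layout")
        for j, value in enumerate(weights):
            averaged[j] += value * size / total
    return averaged


if __name__ == "__main__":
    # Two hypothetical base stations; the second holds three times more data.
    print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

Only parameters and sample counts leave each base station, which is the property that addresses the privacy and confidentiality concerns raised above.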

翻訳日:2022-11-29 17:01:00 公開日:2022-11-28

# 外科的場面理解のためのクラスインクリメンタルコントラスト学習を用いたタスク対応非同期マルチタスクモデル

Task-Aware Asynchronous Multi-Task Model with Class Incremental Contrastive Learning for Surgical Scene Understanding ( http://arxiv.org/abs/2211.15327v1 )

ライセンス: Link先を確認

Lalithkumar Seenivasan, Mobarakol Islam, Mengya Xu, Chwee Ming Lim and Hongliang Ren

(参考訳) 目的: ロボット手術における術中指導, 意思決定, 術後分析において, ツール・組織間相互作用認識と自動レポート生成による手術シーン理解が重要な役割を担っている。しかし,患者間および患者内変動を伴う異なる手術間のドメインシフトと,新しい器具の出現は,モデル予測の性能を低下させる。さらに、複数のモデルからの出力が必要であり、計算コストが高く、リアルタイム性能に影響する可能性がある。方法論: ドメインシフト問題に対処する多タスク学習(MTL)モデルが手術報告生成およびツール・組織間相互作用予測のために提案される。モデルは、共有特徴抽出器、キャプションのためのメッシュ変換分岐、ツール・組織間相互作用予測のためのグラフ注意分岐から構成される。共有特徴抽出器は、クラスインクリメンタルコントラスト学習(CICL)を用いて、ターゲット領域における強度シフトと新しいクラスの出現に取り組む。我々は,Laplacian of Gaussian (LoG) に基づくカリキュラム学習を共有分岐とタスク固有分岐の両方に組み込み,モデル学習を強化する。タスク対応非同期MTL最適化手法を導入し,共有重みを微調整し,両タスクを最適に収束させる。結果:タスク認識最適化と微調整技術を用いて訓練したMTLモデルは,目標領域上の両方のタスクに対するバランス性能(シーンキャプションのBLEUスコア0.4049,インタラクション検出の精度0.3508)を報告し,ドメイン適応において単一タスクモデルと同等の性能を示した。結論: 提案するマルチタスクモデルは, ドメインシフトに適応し, 対象領域に新しい器具を取り入れ, ツール・組織間相互作用検出とレポート生成を単一タスクモデルと同等に行うことができた。

Purpose: Surgery scene understanding with tool-tissue interaction recognition and automatic report generation can play an important role in intra-operative guidance, decision-making and postoperative analysis in robotic surgery. However, domain shifts between different surgeries with inter- and intra-patient variation and the appearance of novel instruments degrade the performance of model prediction. Moreover, it requires output from multiple models, which can be computationally expensive and affect real-time performance. Methodology: A multi-task learning (MTL) model is proposed for surgical report generation and tool-tissue interaction prediction that deals with domain shift problems. The model consists of a shared feature extractor, a mesh-transformer branch for captioning and a graph attention branch for tool-tissue interaction prediction. The shared feature extractor employs class incremental contrastive learning (CICL) to tackle intensity shift and novel class appearance in the target domain. We incorporate Laplacian of Gaussian (LoG) based curriculum learning into both shared and task-specific branches to enhance model learning. We incorporate a task-aware asynchronous MTL optimization technique to fine-tune the shared weights and converge both tasks optimally. Results: The proposed MTL model trained using task-aware optimization and fine-tuning techniques reported a balanced performance (BLEU score of 0.4049 for scene captioning and accuracy of 0.3508 for interaction detection) for both tasks on the target domain and performed on par with single-task models in domain adaptation. Conclusion: The proposed multi-task model was able to adapt to domain shifts, incorporate novel instruments in the target domain, and perform tool-tissue interaction detection and report generation on par with single-task models.

翻訳日:2022-11-29 16:43:41 公開日:2022-11-28

# スパイキング神経p系のマトリックス表現:再検討

Matrix representations of spiking neural P systems: Revisited ( http://arxiv.org/abs/2211.15156v1 )

ライセンス: Link先を確認

Henry N. Adorna

(参考訳) 2010年、遅延のないsn pシステムの行列表現が提示され、遅延のあるsn pシステムの場合、2017年に行列表現が提案された。これらの表現は、コンピュータソフトウェアとハードウェア技術を用いたsn pシステムの一連のシミュレーションをもたらした。本研究では,これらの表現を再検討し,sn p系の計算の挙動について考察する。構成の到達可能性の概念は、sn pシステムと遅延のないシステムの両方において考慮される。 SNPシステムの遅延を考慮した場合、次の構成のより良い計算法を提案する。

In 2010, a matrix representation of SN P systems without delay was presented, while for SN P systems with delay a matrix representation was suggested in 2017. These representations led to a series of simulations of SN P systems using computer software and hardware technology. In this work, we revisit these representations and provide some observations on the behavior of the computations of SN P systems. The concept of reachability of a configuration is considered for SN P systems both with and without delays. A better computation of the next configuration is proposed for SN P systems with delay.
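
For the no-delay case, the matrix representation is commonly written as $C_{k+1} = C_k + S_k \cdot M$, where $C_k$ is the configuration (spikes per neuron), $S_k$ is a 0/1 spiking vector (one entry per rule) and $M$ is the spiking transition matrix (one row per rule, one column per neuron). The sketch below assumes that form; the function name and the small 5-rule, 3-neuron matrix are chosen for illustration, and checking that the selected rules are actually applicable is left out.

```python
def next_configuration(config, spiking_vector, matrix):
    """Compute C_{k+1} = C_k + S_k * M for an SN P system without delay.

    config:         spikes currently held by each neuron.
    spiking_vector: 1 if the corresponding rule fires in this step, else 0.
    matrix:         row i gives the net spike change per neuron when rule i fires
                    (negative = consumed, positive = received).
    """
    if len(spiking_vector) != len(matrix):
        raise ValueError("one spiking-vector entry per rule is required")
    result = list(config)
    for fired, row in zip(spiking_vector, matrix):
        if len(row) != len(config):
            raise ValueError("one matrix column per neuron is required")
        if fired:
            for neuron, delta in enumerate(row):
                result[neuron] += delta
    return result


if __name__ == "__main__":
    # Illustrative system: 5 rules (rows) over 3 neurons (columns).
    M = [
        [-1, 1, 1],
        [-2, 1, 1],
        [1, -1, 1],
        [0, 0, -1],
        [0, 0, -2],
    ]
    print(next_configuration([2, 1, 1], [1, 0, 1, 1, 0], M))
```

Applying the function repeatedly with valid spiking vectors enumerates the configurations reachable from the initial one, which is the reachability question the abstract mentions.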

翻訳日:2022-11-29 16:43:08 公開日:2022-11-28

# アンサンブルスタック構築のためのブースティングアプローチ

A Boosting Approach to Constructing an Ensemble Stack ( http://arxiv.org/abs/2211.15621v1 )

ライセンス: Link先を確認

Zhilei Zhou and Ziyu Qiu and Brad Niblett and Andrew Johnston and Jeffrey Schwartzentruber and Nur Zincir-Heywood and Malcolm Heywood

(参考訳) 分類のための進化的アンサンブル学習へのアプローチが提案され、プログラムのスタックを構築するためにブースティングが使用される。 boostingのそれぞれのアプリケーションは、単一のチャンピオンと残りのデータセット、すなわち、これまで正しく分類されていなかったトレーニングレコードを識別する。次のプログラムは残留物に対してのみ訓練され、最大アンサンブルサイズまたはそれ以上の残留物が残るまで反復される。残留データセットに対するトレーニングは、トレーニングコストを積極的に削減する。アンサンブルをスタックとしてデプロイすることは、予測を行うのに1つの分類器だけが必要であることを意味するため、解釈性も向上する。ベンチマーク研究は、最先端の進化的アンサンブル学習アルゴリズムの予測精度と競争性を示すとともに、桁違いに単純なソリューションを提供する。高濃度データセットによるさらなるベンチマークにより,提案手法はXGBoostよりも正確かつ効率的であることが示唆された。

An approach to evolutionary ensemble learning for classification is proposed in which boosting is used to construct a stack of programs. Each application of boosting identifies a single champion and a residual dataset, i.e. the training records that have not yet been correctly classified. The next program is trained only against the residual, with the process iterating until some maximum ensemble size is reached or no residual remains. Training against a residual dataset actively reduces the cost of training. Deploying the ensemble as a stack also means that only one classifier might be necessary to make a prediction, which improves interpretability. Benchmarking studies are conducted to illustrate competitiveness with the prediction accuracy of current state-of-the-art evolutionary ensemble learning algorithms, while providing solutions that are orders of magnitude simpler. Further benchmarking with a high cardinality dataset indicates that the proposed method is also more accurate and efficient than XGBoost.

翻訳日:2022-11-29 16:42:34 公開日:2022-11-28

# 局所解釈可能なモデル非依存な説明による画像分類のための深層畳み込みニューラルネットワークの説明

Explaining Deep Convolutional Neural Networks for Image Classification by Evolving Local Interpretable Model-agnostic Explanations ( http://arxiv.org/abs/2211.15143v1 )

ライセンス: Link先を確認

Bin Wang, Wenbin Pei, Bing Xue, Mengjie Zhang

(参考訳) 深層畳み込みニューラルネットワークはその有効性を証明し、画像分類の最も有力な方法として認識されている。しかし、深層畳み込みニューラルネットワークの深刻な欠点は説明可能性の低下である。残念ながら、多くの現実世界のアプリケーションでは、ユーザーは予測を信頼すべきかどうかを決定する際に、深い畳み込みニューラルネットワークの予測の背後にある根拠を理解する必要がある。この問題を解決するために,局所的な説明を自動的に進化させ,ユーザが予測の合理性を評価するのに役立つ新しい遺伝的アルゴリズムに基づく手法を提案する。さらに,提案手法はモデルに依存しない,すなわち深い畳み込みニューラルネットワークモデルを説明するために利用できる。実験では、ResNetがサンプルモデルとして使用され、ImageNetデータセットがベンチマークデータセットとして選択される。 densenet と mobilenet はさらに説明され,提案手法のモデル非依存な特性を示す。 ImageNetからランダムに選択された4つの画像上の進化した局所的説明は、進化した局所的説明が人間によって容易に認識されることを示す。さらに、進化した説明は、サンプル画像の有意義な解釈可能な特徴をうまく捉えることで、4つの画像の全ての深部畳み込みニューラルネットワークの予測をうまく説明することができる。実験の30回の実行に基づくさらなる分析により、進化した局所的な説明は、予測を行う際の深層畳み込みニューラルネットワークモデルの確率/確信を向上させることができることが示された。提案手法は,lime (state-of-the-art method) の10倍以上の速度で局所的な説明が得られる。

Deep convolutional neural networks have proven their effectiveness, and have been acknowledged as the most dominant method for image classification. However, a severe drawback of deep convolutional neural networks is poor explainability. Unfortunately, in many real-world applications, users need to understand the rationale behind the predictions of deep convolutional neural networks when determining whether they should trust the predictions or not. To resolve this issue, a novel genetic algorithm-based method is proposed for the first time to automatically evolve local explanations that can assist users to assess the rationality of the predictions. Furthermore, the proposed method is model-agnostic, i.e., it can be utilised to explain any deep convolutional neural network model. In the experiments, ResNet is used as an example model to be explained, and the ImageNet dataset is selected as the benchmark dataset. DenseNet and MobileNet are further explained to demonstrate the model-agnostic characteristic of the proposed method. The evolved local explanations on four images, randomly selected from ImageNet, are presented, which show that the evolved local explanations are easy for humans to recognise. Moreover, the evolved explanations can explain the predictions of deep convolutional neural networks on all four images very well by successfully capturing meaningful interpretable features of the sample images. Further analysis based on the 30 runs of the experiments shows that the evolved local explanations can also improve the probabilities/confidences of the deep convolutional neural network models in making the predictions. The proposed method can obtain local explanations within one minute, which is more than ten times faster than LIME (the state-of-the-art method).

翻訳日:2022-11-29 16:35:28 公開日:2022-11-28

# 誤りレベル解析を用いた画像鑑定のためのSOTA画像分類ディープラーニング法による画像検出

Forged Image Detection using SOTA Image Classification Deep Learning Methods for Image Forensics with Error Level Analysis ( http://arxiv.org/abs/2211.15196v1 )

ライセンス: Link先を確認

Raunak Joshi, Abhishek Gupta, Nandan Kanvinde, Pandharinath Ghonge

(参考訳) コンピュータビジョンの領域における進歩は、深層学習機構を用いてもたらされている。 Image Forensicsはコンピュータビジョンアプリケーションの主要な分野の1つである。画像の偽造は画像鑑識のサブカテゴリであり、エラーレベル分析を使用して検出することができる。このようなイメージを入力として使うと、畳み込みニューラルネットワークのバリエーションを利用して、バイナリ分類の問題になってしまう可能性がある。本稿では,casia itde v.2データセットによる誤りレベル解析に基づく最先端画像分類モデルを用いて転送学習を行う。アルゴリズムは vgg-19, inception-v3, resnet-152-v2, xceptionnet, efficientnet-v2l である。

Advances in computer vision have been driven by deep learning. Image forensics is one of the major application areas of computer vision. Image forgery detection is a sub-category of image forensics, and forgeries can be detected using Error Level Analysis. Using such images as input, this becomes a binary classification problem that can be addressed with variations of convolutional neural networks. In this paper we perform transfer learning with state-of-the-art image classification models over the CASIA ITDE v.2 dataset after applying error level analysis. The algorithms used are VGG-19, Inception-V3, ResNet-152-V2, XceptionNet and EfficientNet-V2L, presented with their respective methodologies and results.

翻訳日:2022-11-29 16:35:03 公開日:2022-11-28

# マスクの裏にあるもの:画像間問題における不確かさを推定する

What's Behind the Mask: Estimating Uncertainty in Image-to-Image Problems ( http://arxiv.org/abs/2211.15211v1 )

ライセンス: Link先を確認

Gilad Kutiel, Regev Cohen, Michael Elad, Daniel Freedman

(参考訳) イメージ・ツー・イメージ・ネットワークの不確実性を推定することは重要な課題であり、特にそのようなネットワークが生物学的・医学的な画像領域にますます展開されている。本稿では,マスキングに基づくこの問題に対する新しいアプローチを提案する。既存のイメージ・ツー・イメージ・ネットワークを前提として,マスクされた再構成画像とマスクされた真の画像との距離が一定の閾値未満であることを高い確率で保証するマスクを計算する。したがって、マスクは再構成された画像のより確実な領域を特定する。我々のアプローチは、基礎となるイメージ・ツー・イメージ・ネットワークとは無関係であり、トレーニングには入力(劣化)、再構成、真のイメージの3つ組しか必要としない。さらに,本手法は使用する距離尺度とも無関係である。結果として、$L_p$スタイルの距離やLPIPSのような知覚距離を使うことができ、これは区間ベースの不確実性推定手法とは対照的である。我々の理論的な保証は共形校正手順に由来する。我々は,画像のカラー化,画像補完,超解像度タスクにおける不確実性に対するマスクベースアプローチを評価し,それぞれに高品質な性能を示す。

Estimating uncertainty in image-to-image networks is an important task, particularly as such networks are being increasingly deployed in the biological and medical imaging realms. In this paper, we introduce a new approach to this problem based on masking. Given an existing image-to-image network, our approach computes a mask such that the distance between the masked reconstructed image and the masked true image is guaranteed to be less than a specified threshold, with high probability. The mask thus identifies the more certain regions of the reconstructed image. Our approach is agnostic to the underlying image-to-image network, and only requires triples of the input (degraded), reconstructed and true images for training. Furthermore, our method is agnostic to the distance metric used. As a result, one can use $L_p$-style distances or perceptual distances like LPIPS, which contrasts with interval-based approaches to uncertainty. Our theoretical guarantees derive from a conformal calibration procedure. We evaluate our mask-based approach to uncertainty on image colorization, image completion, and super-resolution tasks, demonstrating high quality performance on each.
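
The abstract says only that the guarantee comes from a conformal calibration procedure. As a rough illustration of what such a step looks like, the sketch below shows the generic split-conformal quantile computation; it is not the paper's mask calibration. The function name `conformal_quantile` and the idea of one scalar score per calibration triple are assumptions.

```python
import math


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    Under exchangeability, a fresh score falls at or below this value with
    probability at least 1 - alpha. Returns infinity when the calibration set
    is too small to support the requested level.
    """
    n = len(scores)
    if n == 0:
        raise ValueError("calibration set is empty")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        return math.inf
    return sorted(scores)[rank - 1]


if __name__ == "__main__":
    # Hypothetical per-image scores, e.g. a masked distance on held-out triples.
    calibration_scores = [0.30, 0.10, 0.20]
    print(conformal_quantile(calibration_scores, alpha=0.25))
```

Because the procedure only ranks scalar scores, it does not care which distance produced them, which is consistent with the abstract's point that the method is agnostic to the metric.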

翻訳日:2022-11-29 16:34:51 公開日:2022-11-28

# 忘れずにプログレッシブな学習

Progressive Learning without Forgetting ( http://arxiv.org/abs/2211.15215v1 )

ライセンス: Link先を確認

Tao Feng, Hangjie Yuan, Mang Wang, Ziyuan Huang, Ang Bian, Jianzhou Zhang

(参考訳) 得られた知識を忘れずにタスクの変更やシーケンシャルな経験から学ぶことは、ニューラルネットワークにとって難しい問題である。本研究では,従来のデータを含まない連続学習(CL)のパラダイムにおいて,2つの課題に焦点をあてる。 (i)モデルがそれまでの知識を学習する段階的な知識空間によって引き起こされる破滅的な記憶の蓄積 (ii)新しい課題の学習における安定性と可塑性のバランスをとるための無制御の綱引き力学。これらの問題に対処するため、我々はPLwF(Progressive Learning without Forgetting)と、オプティマイザの信用割当制度を提示する。 PLwFは、従来のタスクからモデル関数を導入し、各タスクに関する最も信頼性の高い知識と異なるタスクの分布情報を含む知識空間を構築する。広範囲なアブレーション実験は、PLwFとクレジット割り当ての有効性を示す。他のCL法と比較して,生データに頼らずとも,優れた結果が得られている。

Learning from changing tasks and sequential experience without forgetting the obtained knowledge is a challenging problem for artificial neural networks. In this work, we focus on two challenging problems in the paradigm of Continual Learning (CL) without involving any old data: (i) the accumulation of catastrophic forgetting caused by the gradually fading knowledge space from which the model learns the previous knowledge; (ii) the uncontrolled tug-of-war dynamics to balance the stability and plasticity during the learning of new tasks. In order to tackle these problems, we present Progressive Learning without Forgetting (PLwF) and a credit assignment regime in the optimizer. PLwF densely introduces model functions from previous tasks to construct a knowledge space such that it contains the most reliable knowledge on each task and the distribution information of different tasks, while credit assignment controls the tug-of-war dynamics by removing gradient conflict through projection. Extensive ablative experiments demonstrate the effectiveness of PLwF and credit assignment. In comparison with other CL methods, we report notably better results even without relying on any raw data.
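
The abstract states that credit assignment removes gradient conflict through projection but does not give the formula. One common way to do this, assumed here, is to subtract from the new-task gradient its component along a reference (old-task) gradient whenever the two point in opposing directions. The function name `project_out_conflict` is hypothetical.

```python
def project_out_conflict(grad, reference):
    """Remove the component of `grad` that opposes `reference`.

    If the dot product is non-negative the gradients do not conflict and
    `grad` is returned unchanged; otherwise `grad` is projected onto the
    hyperplane orthogonal to `reference`.
    """
    dot = sum(g * r for g, r in zip(grad, reference))
    ref_sq = sum(r * r for r in reference)
    if dot >= 0 or ref_sq == 0:
        return list(grad)
    scale = dot / ref_sq
    return [g - scale * r for g, r in zip(grad, reference)]


if __name__ == "__main__":
    new_task_grad = [1.0, -1.0]
    old_task_grad = [0.0, 1.0]
    print(project_out_conflict(new_task_grad, old_task_grad))
```

After projection the update no longer moves against the reference direction, which is one way to read "controlling the tug-of-war" between stability and plasticity.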

翻訳日:2022-11-29 16:34:33 公開日:2022-11-28

# 画像分類における故障検出のための評価実践を振り返って

A Call to Reflect on Evaluation Practices for Failure Detection in Image Classification ( http://arxiv.org/abs/2211.15259v1 )

ライセンス: Link先を確認

Paul F. Jaeger, Carsten T. Lüth, Lukas Klein and Till J. Bungert

(参考訳) 機械学習に基づく意思決定システムの荒野における信頼性の高い適用は、現在この分野で調査されている大きな課題の1つだ。確立されたアプローチの大部分は、信頼スコアを割り当てることで誤った予測を検出することを目的としている。この信頼性は、モデルの予測の不確かさを定量化したり、明示的なスコアリング関数を学習したり、入力がトレーニング分布と一致しているかを評価することによって得られる。事実、これら全ての状態は実生活のアプリケーション上で分類器の故障を検出するという同じ目標に対処するが、現在では個々の評価プロトコルで大半を分離した研究分野を構成しており、関連する手法のかなりの部分を除外するか、関連する障害源の大部分を無視する。本研究では,これらの不整合に起因する現在の落とし穴を系統的に明らかにし,障害検出の全体的かつ現実的な評価のための要件を導出する。この統一的な視点の関連性を示すために,本研究では,信頼度スコアリング関数w.r.tを,関連するすべての方法と障害源として,初めて大規模実証研究を行う。簡便なソフトマックス応答ベースラインの総合的評価手法としての啓示は、信頼度スコアリングに関する公開研究が豊富にある中で、現在の評価の劇的な欠点を浮き彫りにしている。コードとトレーニングされたモデルはhttps://github.com/IML-DKFZ/fd-shiftsにある。

Reliable application of machine learning-based decision systems in the wild is one of the major challenges currently investigated by the field. A large portion of established approaches aims to detect erroneous predictions by means of assigning confidence scores. This confidence may be obtained by either quantifying the model's predictive uncertainty, learning explicit scoring functions, or assessing whether the input is in line with the training distribution. Curiously, while these approaches all claim to address the same eventual goal of detecting failures of a classifier upon real-life application, they currently constitute largely separated research fields with individual evaluation protocols, which either exclude a substantial part of relevant methods or ignore large parts of relevant failure sources. In this work, we systematically reveal current pitfalls caused by these inconsistencies and derive requirements for a holistic and realistic evaluation of failure detection. To demonstrate the relevance of this unified perspective, we present a large-scale empirical study that for the first time enables benchmarking of confidence scoring functions w.r.t. all relevant methods and failure sources. The revelation of a simple softmax response baseline as the overall best performing method underlines the drastic shortcomings of current evaluation in the abundance of publicized research on confidence scoring. Code and trained models are at https://github.com/IML-DKFZ/fd-shifts.
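
The softmax response baseline mentioned in this abstract is usually taken to mean the maximum softmax probability used as a confidence score. The following sketch assumes that reading; the function names and the rejection threshold are illustrative and are not taken from the linked repository.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_response(logits):
    """Return (predicted class index, maximum softmax probability)."""
    probs = softmax(logits)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs[label]


def flag_possible_failure(logits, threshold=0.6):
    """Flag a prediction as a likely failure when its confidence is low.

    The threshold is a made-up value; in practice it would be tuned on
    held-out data.
    """
    _, confidence = softmax_response(logits)
    return confidence < threshold


if __name__ == "__main__":
    print(softmax_response([2.0, 0.0, 0.0]))
    print(flag_possible_failure([0.1, 0.0, 0.0]))
```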

翻訳日:2022-11-29 16:34:16 公開日:2022-11-28

# パーシステンスバーコードの誘導マッチングによる位相的忠実な画像分割

Topologically faithful image segmentation via induced matching of persistence barcodes ( http://arxiv.org/abs/2211.15272v1 )

ライセンス: Link先を確認

Nico Stucki, Johannes C. Paetzold, Suprosanna Shit, Bjoern Menze, Ulrich Bauer

(参考訳) 画像のセグメンテーションは、ニューラルネットワークが多くの技術分野において膨大な応用を見出す研究分野である。セグメンテーションネットワークを訓練する最も一般的なアプローチは、多くのセグメンテーションタスクで不十分な目的であるピクセルオーバーラップを最適化する損失関数を用いる。近年、それらの限界は、セグメント構造の正しいトポロジーを回復することを目的としたトポロジー認識法への関心を高めた。しかし、これまでのアプローチでは、地上の真実と予測のトポロジ的特徴の空間的整合性は得られていない。本研究では,教師付き画像セグメンテーションのためのトポロジカルかつ特徴的に正確な計量と損失関数を提案し,これをベッチマッチングと呼ぶ。セグメント化設定におけるバーコード間の空間的整合性を保証する方法を示す。さらに,画像のベッチマッチングを計算するための効率的なアルゴリズムを提案する。ベッチマッチング誤差はセグメンテーションの位相的正しさを評価するための解釈可能な指標であり,既定のベッチ数誤差よりも感度が高いことを示す。さらに、ベッチマッチング損失の微分性は、損失関数としての使用を可能にする。ボリューム性能を維持しながら、6つの多様なデータセットにわたるセグメンテーションネットワークのトポロジ的パフォーマンスを改善する。私たちのコードはhttps://github.com/nstucki/betti-matchingで利用可能です。

Image segmentation is a widely researched field where neural networks find vast applications in many facets of technology. Some of the most popular approaches to train segmentation networks employ loss functions optimizing pixel-overlap, an objective that is insufficient for many segmentation tasks. In recent years, their limitations fueled a growing interest in topology-aware methods, which aim to recover the correct topology of the segmented structures. However, so far, none of the existing approaches achieve a spatially correct matching between the topological features of ground truth and prediction. In this work, we propose the first topologically and feature-wise accurate metric and loss function for supervised image segmentation, which we term Betti matching. We show how induced matchings guarantee the spatially correct matching between barcodes in a segmentation setting. Furthermore, we propose an efficient algorithm to compute the Betti matching of images. We show that the Betti matching error is an interpretable metric to evaluate the topological correctness of segmentations, which is more sensitive than the well-established Betti number error. Moreover, the differentiability of the Betti matching loss enables its use as a loss function. It improves the topological performance of segmentation networks across six diverse datasets while preserving the volumetric performance. Our code is available at https://github.com/nstucki/Betti-matching.
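
To see why the Betti number error can miss spatial mistakes, consider only dimension 0, where the Betti number is the count of connected foreground components. The sketch below computes that count for small binary images and compares prediction with ground truth. It assumes 4-connectivity and does not implement Betti matching itself; the function names are illustrative.

```python
def count_components(image):
    """Count 4-connected foreground components in a binary 2D image."""
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] and not seen[r][c]:
                count += 1
                seen[r][c] = True
                stack = [(r, c)]
                # Flood-fill the component that contains (r, c).
                while stack:
                    y, x = stack.pop()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return count


def betti0_error(prediction, truth):
    """Absolute difference in the number of connected components."""
    return abs(count_components(prediction) - count_components(truth))


if __name__ == "__main__":
    truth = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
    shifted = [[0, 0, 0], [0, 0, 0], [0, 0, 1]]
    # Same component count, but the component sits in the opposite corner.
    print(betti0_error(shifted, truth))
```

The shifted prediction has no overlap with the ground truth yet scores a Betti number error of zero, which is the kind of case a spatially aware matching is meant to catch.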

翻訳日:2022-11-29 16:33:54 公開日:2022-11-28

# 衛星画像生成のための条件付きプログレッシブ・ジェネレーティブ・アドバイサル・ネットワーク

Conditional Progressive Generative Adversarial Network for satellite image generation ( http://arxiv.org/abs/2211.15303v1 )

ライセンス: Link先を確認

Renato Cardoso, Sofia Vallecorsa, Edoardo Nemni

(参考訳) 画像生成と画像補完は、欠落したピクセルを現実的に置き換えることができる機械学習アルゴリズムのおかげで、急速に進化している。しかし,高解像度画像を高精細度で生成することは重要な計算課題である。本研究では、3つの隅のうち1つが欠けている画像の完成として画像生成タスクを定式化する。そして、このアプローチを拡張して、同じレベルのディテールで大きなイメージを反復的に構築します。我々の目標は、衛星画像データセットに典型的な高解像度のサンプルを生成するためのスケーラブルな手法を得ることである。本稿では,wassersteinオートエンコーダによって潜在ベクトルにエンコードされた3つの初期隣接タイルを入力として,画像中の欠落タイルを生成する条件付きプログレッシブ生成逆ネットワーク(gan)を提案する。我々は,国連衛星センター(unosat)が洪水検知ツールの訓練に使用する画像セットに着目し,合成画像の品質を現実的な設定で検証する。

Image generation and image completion are rapidly evolving fields, thanks to machine learning algorithms that are able to realistically replace missing pixels. However, generating large high-resolution images with a high level of detail presents important computational challenges. In this work, we formulate the image generation task as completion of an image where one out of three corners is missing. We then extend this approach to iteratively build larger images with the same level of detail. Our goal is to obtain a scalable methodology to generate high-resolution samples typically found in satellite imagery data sets. We introduce a conditional progressive Generative Adversarial Network (GAN) that generates the missing tile in an image, using as input three initial adjacent tiles encoded in a latent vector by a Wasserstein auto-encoder. We focus on a set of images used by the United Nations Satellite Centre (UNOSAT) to train flood detection tools, and validate the quality of synthetic images in a realistic setup.

翻訳日:2022-11-29 16:33:33 公開日:2022-11-28

# 良いヘルパーはあなたの周りにある:注意駆動マスク画像モデリング

Good helper is around you: Attention-driven Masked Image Modeling ( http://arxiv.org/abs/2211.15362v1 )

ライセンス: Link先を確認

Jie Gui, Zhengqi Liu, Hao Luo

(参考訳) マスク付き画像モデリング(MIM)は,過去1年間,自己教師型学習において大きな可能性を示してきた。 MIMは、汎用バックボーンであるビジョントランスフォーマーの恩恵を受け、画像のパッチの一部をマスクし、欠落したピクセルを復元しようとすることで、自己教師付きの視覚表現を学習する。これまでのほとんどの研究では、画像のパッチをランダムにマスクしており、視覚表現学習に有用な意味情報を十分に活用できていない。一方、バックボーンのサイズが大きいため、以前のほとんどの研究は事前学習に多くの時間を費やさなければならない。本稿では,上記の2つの問題を解決できるAttention-driven Masking and Throwing Strategy (AMT)を提案する。まず,教師付き手法を使わずに,学習過程中に画像の意味情報を自動取得するために自己注意機構を利用する。マスキング戦略は、その情報に導かれて領域を選択的にマスクすることができ、表現学習に役立つ。さらに,冗長なパッチを捨てる戦略を提案し,学習をより効率的にする。マスク画像モデリング用のプラグアンドプレイモジュールとして、AMTは、CIFAR-10/100, STL-10, Tiny ImageNet, ImageNet-1K上のMAEの線形探索精度を$2.9\% \sim 5.9\%$改善し、MAEとSimMIMの微調整精度に関しても性能が向上する。さらに、この設計は下流の検出およびセグメンテーションタスクにおいても優れた性能を達成する。

It has been witnessed that masked image modeling (MIM) has shown a huge potential in self-supervised learning in the past year. Benefiting from the universal backbone vision transformer, MIM learns self-supervised visual representations through masking a part of patches of the image while attempting to recover the missing pixels. Most previous works mask patches of the image randomly, which underutilizes the semantic information that is beneficial to visual representation learning. On the other hand, due to the large size of the backbone, most previous works have to spend much time on pre-training. In this paper, we propose Attention-driven Masking and Throwing Strategy (AMT), which could solve both problems above. We first leverage the self-attention mechanism to obtain the semantic information of the image during the training process automatically without using any supervised methods. Masking strategy can be guided by that information to mask areas selectively, which is helpful for representation learning. Moreover, a redundant patch throwing strategy is proposed, which makes learning more efficient. As a plug-and-play module for masked image modeling, AMT improves the linear probing accuracy of MAE by $2.9\% \sim 5.9\%$ on CIFAR-10/100, STL-10, Tiny ImageNet, and ImageNet-1K, and obtains an improved performance with respect to fine-tuning accuracy of MAE and SimMIM. Moreover, this design also achieves superior performance on downstream detection and segmentation tasks.
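
The abstract does not specify how attention guides the choice of masked patches. As one possible reading, the sketch below samples patches to mask without replacement, with probability proportional to a per-patch attention weight. The function name `sample_mask`, the sampling rule and the fixed seed are all assumptions, and the patch-throwing step is not shown.

```python
import random


def sample_mask(attention, mask_ratio, seed=0):
    """Pick patch indices to mask, favouring patches with high attention.

    attention:  non-negative weight per patch (e.g. averaged self-attention).
    mask_ratio: fraction of patches to mask.
    Returns the sorted list of masked patch indices.
    """
    rng = random.Random(seed)
    n_mask = int(len(attention) * mask_ratio)
    remaining = list(range(len(attention)))
    masked = []
    for _ in range(n_mask):
        weights = [attention[i] for i in remaining]
        if sum(weights) > 0:
            pick = rng.choices(remaining, weights=weights, k=1)[0]
        else:
            # All remaining weights are zero: fall back to uniform sampling.
            pick = rng.choice(remaining)
        remaining.remove(pick)
        masked.append(pick)
    return sorted(masked)


if __name__ == "__main__":
    # Four hypothetical patches; two of them receive no attention at all.
    print(sample_mask([0.0, 0.4, 0.0, 0.6], mask_ratio=0.5))
```

Replacing the weights with a constant recovers the random masking used by earlier MIM methods, which makes the two strategies easy to compare.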

翻訳日:2022-11-29 16:33:15 公開日:2022-11-28

# ディープラーニングオプティマイザの探索-第1および第2次方法-

A survey of deep learning optimizers-first and second order methods ( http://arxiv.org/abs/2211.15596v1 )

ライセンス: Link先を確認

Rohan V Kashyap

(参考訳) 深層学習最適化は、サドル点、局所小数点、ヘッセンおよび限られた計算資源の不調和などの固有の困難により、しばしば困難であると見なされる重み空間における高次元損失関数の最小化を伴う。本稿では,深層学習における12の標準最適化手法の包括的レビューを行い,最適化文献から数値最適化の困難さを理論的に評価する。

Deep learning optimization involves minimizing a high-dimensional loss function in the weight space, which is often perceived as difficult because of saddle points, local minima, ill-conditioning of the Hessian and limited compute resources. In this paper, we provide a comprehensive review of 12 standard optimization methods successfully used in deep learning research and a theoretical assessment of the difficulties in numerical optimization from the optimization literature.
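
The effect of an ill-conditioned Hessian on first- and second-order methods can be seen on a toy quadratic $f(w) = \tfrac{1}{2}\sum_i h_i w_i^2$ with very different curvatures $h_i$. The sketch below is an illustration under that assumption, not an example from the survey: the curvature values, learning rate and function names are made up.

```python
def gradient(point, curvatures):
    """Gradient of f(w) = 0.5 * sum(h_i * w_i ** 2)."""
    return [h * w for h, w in zip(curvatures, point)]


def gradient_descent_step(point, curvatures, learning_rate):
    """First-order update: one shared step size for every coordinate."""
    grads = gradient(point, curvatures)
    return [w - learning_rate * g for w, g in zip(point, grads)]


def newton_step(point, curvatures):
    """Second-order update: rescale each coordinate by its own curvature.

    The Hessian of this quadratic is diagonal, so inverting it is a division.
    """
    grads = gradient(point, curvatures)
    return [w - g / h for w, g, h in zip(point, grads, curvatures)]


if __name__ == "__main__":
    curvatures = [1.0, 100.0]  # condition number 100
    point = [1.0, 1.0]
    for _ in range(100):
        # The step size must stay below 2 / 100 to remain stable in the steep direction.
        point = gradient_descent_step(point, curvatures, learning_rate=0.01)
    print("gradient descent after 100 steps:", point)
    print("Newton after 1 step:", newton_step([1.0, 1.0], curvatures))
```

Gradient descent is forced to use a step size set by the steepest direction and therefore crawls along the flat one, while the Newton step lands on the minimum of this quadratic immediately; the price is forming and inverting the Hessian, which is what limits second-order methods at deep learning scale.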

翻訳日:2022-11-29 16:32:25 公開日:2022-11-28

# FaiREE:Finite-Sample と Distribution-free Guarantee による公平な分類

FaiREE: Fair Classification with Finite-Sample and Distribution-Free Guarantee ( http://arxiv.org/abs/2211.15072v1 )

ライセンス: Link先を確認

Puheng Li, James Zou, Linjun Zhang

(参考訳) アルゴリズム的公平性は、機械学習研究においてますます重要な役割を果たす。いくつかのグループフェアネスの概念とアルゴリズムが提案されている。しかし、既存の公平な分類方法の公平性保証は、主に特定のデータ分布の仮定に依存し、多くの場合大きなサンプルサイズを必要とするため、実際によくあるようにサンプル数が少ない場合には公平性に違反する可能性がある。本稿では,有限サンプルかつ分布フリーな理論保証でグループフェアネス制約を満たす公平な分類アルゴリズムであるFaiREEを提案する。 FaiREEは、さまざまなグループフェアネスの概念(例えば、Equality of Opportunity、Equalized Odds、Demographic Parityなど)を満たし、最適な精度を達成するように適応することができる。これらの理論的保証は、合成データと実データの両方の実験によってさらに支持される。 FaiREEは最先端のアルゴリズムよりも優れた性能を示した。

Algorithmic fairness plays an increasingly critical role in machine learning research. Several group fairness notions and algorithms have been proposed. However, the fairness guarantee of existing fair classification methods mainly depends on specific data distributional assumptions, often requiring large sample sizes, and fairness could be violated when there is a modest number of samples, which is often the case in practice. In this paper, we propose FaiREE, a fair classification algorithm that can satisfy group fairness constraints with finite-sample and distribution-free theoretical guarantees. FaiREE can be adapted to satisfy various group fairness notions (e.g., Equality of Opportunity, Equalized Odds, Demographic Parity, etc.) and achieve the optimal accuracy. These theoretical guarantees are further supported by experiments on both synthetic and real data. FaiREE is shown to have favorable performance over state-of-the-art algorithms.
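
Two of the group fairness notions named above can be measured on a finite sample as simple rate gaps between groups. The sketch below computes them for binary predictions and a binary group attribute. It only measures the gaps and is not the FaiREE algorithm; the function names and the toy data are illustrative.

```python
def _rate(values):
    """Fraction of positive entries; raises if the subgroup is empty."""
    if not values:
        raise ValueError("subgroup is empty")
    return sum(values) / len(values)


def demographic_parity_gap(predictions, groups):
    """Absolute difference in positive-prediction rate between groups 0 and 1."""
    rate0 = _rate([p for p, g in zip(predictions, groups) if g == 0])
    rate1 = _rate([p for p, g in zip(predictions, groups) if g == 1])
    return abs(rate0 - rate1)


def equal_opportunity_gap(predictions, labels, groups):
    """Absolute difference in true-positive rate between groups 0 and 1."""
    tpr0 = _rate([p for p, y, g in zip(predictions, labels, groups) if g == 0 and y == 1])
    tpr1 = _rate([p for p, y, g in zip(predictions, labels, groups) if g == 1 and y == 1])
    return abs(tpr0 - tpr1)


if __name__ == "__main__":
    predictions = [1, 0, 1, 1]
    labels = [1, 1, 1, 0]
    groups = [0, 0, 1, 1]
    print(demographic_parity_gap(predictions, groups))
    print(equal_opportunity_gap(predictions, labels, groups))
```

With only a handful of records per group these estimates are very noisy, which is the small-sample regime the abstract argues existing guarantees do not cover.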

翻訳日:2022-11-29 16:15:58 公開日:2022-11-28
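
The FaiREE abstract above names Equality of Opportunity as one of the group fairness notions it can target. As a rough illustration of what that notion measures (not of the FaiREE algorithm itself, which the abstract does not spell out), the sketch below computes the gap in true positive rate between two groups. The function name `equal_opportunity_gap` and the binary 0/1 encoding of labels and groups are assumptions made for this example.

```python
def equal_opportunity_gap(y_true, y_pred, group):
    """Absolute difference in true positive rate between group 0 and group 1.

    All three arguments are equal-length sequences of 0/1 values
    (an encoding assumed for this sketch, not taken from the paper).
    """
    def true_positive_rate(g):
        # Predictions for the members of group g whose true label is positive.
        preds = [p for t, p, a in zip(y_true, y_pred, group) if a == g and t == 1]
        if not preds:
            raise ValueError(f"group {g} has no positive examples")
        return sum(preds) / len(preds)

    return abs(true_positive_rate(0) - true_positive_rate(1))


# Toy data: group 0 has TPR 1/2, group 1 has TPR 2/2.
gap = equal_opportunity_gap(
    y_true=[1, 1, 1, 1, 0, 0],
    y_pred=[1, 0, 1, 1, 0, 1],
    group=[0, 0, 1, 1, 0, 1],
)
```

A classifier satisfies Equality of Opportunity up to a tolerance when this gap stays below that tolerance. The finite-sample guarantee claimed in the abstract concerns how such a gap behaves on unseen data, which this snippet does not address.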

# グラフ上のガウス過程のためのトランスダクティブカーネル

Transductive Kernels for Gaussian Processes on Graphs ( http://arxiv.org/abs/2211.15322v1 )

ライセンス: Link先を確認

Yin-Cong Zhi, Felix L. Opolka, Yin Cheng Ng, Pietro Li\`o, Xiaowen Dong

(参考訳) グラフ上のカーネルは、ノードレベルの問題に対する選択肢が限られている。そこで本研究では,ノード特徴データ付きグラフ用カーネルを,半教師付き学習用として提案する。カーネルは、グラフと特徴データを2つのヒルベルト空間として扱うことで正規化フレームワークから派生する。また、グラフ上のカーネルベースのモデルが私たちの設計の例であることも示しています。この方法で定義されたカーネルは、トランスダクティブ特性を持ち、より少ないトレーニングポイントで学習する能力が向上し、高度に非ユークリッドなデータの処理性が向上する。グラフ全体の分布がラベルのパターンを知らせることができる合成データを用いて,これらの利点を実証する。最後に、カーネル内のグラフラプラシアンの柔軟な多項式を利用することで、モデルは様々なレベルのホモフィリーグラフ上の半教師付き分類でも効果的に機能する。

Kernels on graphs have had limited options for node-level problems. To address this, we present a novel, generalized kernel for graphs with node feature data for semi-supervised learning. The kernel is derived from a regularization framework by treating the graph and feature data as two Hilbert spaces. We also show how numerous kernel-based models on graphs are instances of our design. A kernel defined this way has transductive properties, and this leads to improved ability to learn on fewer training points, as well as better handling of highly non-Euclidean data. We demonstrate these advantages using synthetic data where the distribution of the whole graph can inform the pattern of the labels. Finally, by utilizing a flexible polynomial of the graph Laplacian within the kernel, the model also performed effectively in semi-supervised classification on graphs of various levels of homophily.

翻訳日:2022-11-29 16:15:43 公開日:2022-11-28

# 観測データを用いた因果深い強化学習

Causal Deep Reinforcement Learning using Observational Data ( http://arxiv.org/abs/2211.15355v1 )

ライセンス: Link先を確認

Wenxuan Zhu, Chao Yu, Qiang Zhang

(参考訳) 深層強化学習(DRL)は、自動運転車や医療分野など、現実の世界では高価で倫理的ではない多くの介入データを収集する必要がある。オフライン強化学習は、現実世界で利用可能な膨大な観測データを活用することでこの問題を軽減することを約束している。しかし、観測データは、データを生成する行動ポリシーが観測されていない確率変数(つまり共同設立者)に依存する場合、学習エージェントを望ましくない結果へと誤解させる可能性がある。本稿では,この問題に対処するため,DRLにおける2つの分離手法を提案する。提案手法はまず,因果推論手法に基づいて異なるサンプルの重要度を算出し,その不偏性を確保するためにオフラインデータセットを再重み付けあるいは再サンプリングすることにより,損失関数に対する異なるサンプルの影響を調整する。これらの解離法は、これらのアルゴリズムの損失関数によって弱条件を満たすことができることを条件として、ソフトアクター批判や深部Q-ラーニングのような既存のモデルフリーDRLアルゴリズムと柔軟に組み合わせることができる。本手法の有効性を実証し,実験的に検証する。

Deep reinforcement learning (DRL) requires the collection of plenty of interventional data, which is sometimes expensive and even unethical in the real world, such as in the autonomous driving and the medical field. Offline reinforcement learning promises to alleviate this issue by exploiting the vast amount of observational data available in the real world. However, observational data may mislead the learning agent to undesirable outcomes if the behavior policy that generates the data depends on unobserved random variables (i.e., confounders). In this paper, we propose two deconfounding methods in DRL to address this problem. The methods first calculate the importance degree of different samples based on the causal inference technique, and then adjust the impact of different samples on the loss function by reweighting or resampling the offline dataset to ensure its unbiasedness. These deconfounding methods can be flexibly combined with the existing model-free DRL algorithms such as soft actor-critic and deep Q-learning, provided that a weak condition can be satisfied by the loss functions of these algorithms. We prove the effectiveness of our deconfounding methods and validate them experimentally.

翻訳日:2022-11-29 16:15:30 公開日:2022-11-28

# 未知測定ノイズを持つ物理形ニューラルネットワーク

Physics-informed neural networks with unknown measurement noise ( http://arxiv.org/abs/2211.15498v1 )

ライセンス: Link先を確認

Philipp Pilar, Niklas Wahlstr\"om

(参考訳) 物理インフォームドニューラルネットワーク(PINN)は、解の発見と偏微分方程式のパラメータの同定の両方に対する柔軟なアプローチである。ほとんどの作業はノイズのないデータや、ガウス雑音によって汚染されたデータを想定している。標準の pinn フレームワークが非ガウスノイズの場合に分解されることを示す。本稿では,この基本的な問題を解決する方法を提供し,エネルギーベースモデル(EBM)を協調訓練して,正しい雑音分布を学習することを提案する。複数の例を用いて,提案手法の性能改善について述べる。

Physics-informed neural networks (PINNs) constitute a flexible approach to both finding solutions and identifying parameters of partial differential equations. Most works on the topic assume noiseless data, or data contaminated by weak Gaussian noise. We show that the standard PINN framework breaks down in case of non-Gaussian noise. We give a way of resolving this fundamental issue and we propose to jointly train an energy-based model (EBM) to learn the correct noise distribution. We illustrate the improved performance of our approach using multiple examples.

翻訳日:2022-11-29 16:15:11 公開日:2022-11-28

# 分散を超えて:"純粋"相関を持つ分布に対するテスト時間ラベルシフト適応

Beyond Invariance: Test-Time Label-Shift Adaptation for Distributions with "Spurious" Correlations ( http://arxiv.org/abs/2211.15646v1 )

ライセンス: Link先を確認

Qingyao Sun (University of Chicago), Kevin Murphy (Google Brain), Sayna Ebrahimi (Google Cloud), Alexander D'Amour (Google Brain)

(参考訳) 厳密な相関、あるいはモデルをデプロイ可能なドメイン間で変化する相関は、機械学習モデルの現実的な応用に重大な課題をもたらす。しかし、そのような相関は常に「純然たる」とは限らない;多くの場合、それらは入力のみから抽出できる以上の予測のための貴重な事前情報を提供する。本稿では,非分散によるスプリアス相関を解消しようとする近年のアプローチとは対照的に,スプリアス相関現象を利用したテスト時間適応法を提案する。クラスラベル $y$ とニュアサンス係数 $z$ の間の限界依存性をモデル化する事前分布 $p(y, z)$ がドメイン間で変化する可能性があるが、フィーチャ $p(\mathbf{x}|y, z)$ の生成モデルは一定である。これはラベルシフトの仮定の拡張版であり、そこではラベルには$z$というニュアンス要素も含まれている。この観測に基づいて、ソース分布上で$p(y, z|\mathbf{x})$を予測できるように分類器を訓練し、対象領域からの未ラベルのサンプルを用いて、限界分布$p(y, z)$の変化に対応するテストタイムラベルシフト補正を実装する。我々はこの手法をTTLSA(Test-Time Label-Shift Adaptation)と呼ぶ。我々は、CheXpertの胸部X線データセットと色付きMNISTデータセットの2つの異なる画像データセットに適用し、従来の分布の変化に不変な分類器を訓練する手法よりも、下流結果が優れていることを示す。コード再現実験はhttps://github.com/nalzok/test-time-label-shiftで利用可能である。

Spurious correlations, or correlations that change across domains where a model can be deployed, present significant challenges to real-world applications of machine learning models. However, such correlations are not always "spurious"; often, they provide valuable prior information for a prediction beyond what can be extracted from the input alone. Here, we present a test-time adaptation method that exploits the spurious correlation phenomenon, in contrast to recent approaches that attempt to eliminate spurious correlations through invariance. We consider situations where the prior distribution $p(y, z)$, which models the marginal dependence between the class label $y$ and the nuisance factors $z$, may change across domains, but the generative model for features $p(\mathbf{x}|y, z)$ is constant. We note that this is an expanded version of the label shift assumption, where the labels now also include the nuisance factors $z$. Based on this observation, we train a classifier to predict $p(y, z|\mathbf{x})$ on the source distribution, and implement a test-time label shift correction that adapts to changes in the marginal distribution $p(y, z)$ using unlabeled samples from the target domain. We call our method "Test-Time Label-Shift Adaptation" or TTLSA. We apply our method to two different image datasets -- the CheXpert chest X-ray dataset and the colored MNIST dataset -- and show that it gives better downstream results than methods that try to train classifiers which are invariant to the changes in prior distribution. Code reproducing experiments is available at https://github.com/nalzok/test-time-label-shift .

翻訳日:2022-11-29 16:15:03 公開日:2022-11-28
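
The TTLSA abstract above describes correcting a source-trained classifier for a changed prior $p(y, z)$ using unlabeled target samples. A minimal sketch of one standard way to do this is given below: an EM-style re-estimation of the target prior, followed by reweighting the source posteriors by the ratio of target to source prior. The abstract does not state which estimator the authors use, so the EM procedure, the function name `adapt_posteriors`, and the flattening of each $(y, z)$ pair into a single class index are assumptions of this example.

```python
import numpy as np


def adapt_posteriors(source_probs, source_prior, n_iter=100, tol=1e-8):
    """Re-estimate the target prior and adapt posteriors under label shift.

    source_probs: (n, k) array of p_s(c | x_i) on unlabeled target samples,
                  where each class c stands for one (y, z) combination.
    source_prior: (k,) array of p_s(c), assumed strictly positive.
    Returns (target_prior, adapted_probs).
    """
    source_probs = np.asarray(source_probs, dtype=float)
    source_prior = np.asarray(source_prior, dtype=float)
    target_prior = source_prior.copy()

    def reweight(prior):
        # p_t(c | x) is proportional to p_s(c | x) * p_t(c) / p_s(c).
        weighted = source_probs * (prior / source_prior)
        return weighted / weighted.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        adapted = reweight(target_prior)   # E-step
        new_prior = adapted.mean(axis=0)   # M-step
        converged = np.abs(new_prior - target_prior).max() < tol
        target_prior = new_prior
        if converged:
            break

    return target_prior, reweight(target_prior)


# When the target samples already match the source prior, nothing changes.
probs = np.array([[0.8, 0.2], [0.2, 0.8]])
prior, adapted = adapt_posteriors(probs, source_prior=[0.5, 0.5])
```

The reweighting step relies on the assumption stated in the abstract that $p(\mathbf{x}|y, z)$ is the same in both domains; only the prior over $(y, z)$ is allowed to move.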

# 時系列予測のための階層誘導モデル選択

Hierarchy-guided Model Selection for Time Series Forecasting ( http://arxiv.org/abs/2211.15092v1 )

ライセンス: Link先を確認

Arindam Jati, Vijay Ekambaram, Shaonli Pal, Brian Quanz, Wesley M. Gifford, Pavithra Harsha, Stuart Siegel, Sumanta Mukherjee, Chandra Narayanaswami

(参考訳) 時系列予測モデルの一般化は、モデル選択の質に依存する。時間的クロスバリデーション(TCV)は予測タスクにおいてモデル選択を行う標準的な手法である。 TCVは、トレーニング時系列を列車および検証ウィンドウに順次分割し、予測モデルのハイパーパラメータ最適化(HPO)を行い、最高の検証性能でモデルを選択する。 TCVを用いたモデル選択は、テストデータの分布が検証データと異なる場合、テスト性能が低下することが多い。本稿では,時系列データセットに関連するデータ階層を利用した新しいモデル選択法h-proを提案する。一般的に、階層の上位レベルの集約されたデータは、よりスパースで(時には)断続的なボトムレベルのデータと比較して予測可能性と一貫性が向上する。 h-proは、階層内の上位レベルの教師モデルの集合から得られたテストプロキシ予測に基づいて、最低レベルの学生モデルのhpoを実行する。教師のプロキシ予測の整合性は、最低レベルでより良い生徒モデルを選択するのに役立つ。提案手法の有効性を検証するため,複数のデータセットについて広範な実験を行った。 H-Proは、既成の予測モデルとともに、M5ポイント予測競争の勝利モデルを含む既存の最先端予測手法を上回っている。

Generalizability of time series forecasting models depends on the quality of model selection. Temporal cross validation (TCV) is a standard technique to perform model selection in forecasting tasks. TCV sequentially partitions the training time series into train and validation windows, and performs hyperparameter optimization (HPO) of the forecast model to select the model with the best validation performance. Model selection with TCV often leads to poor test performance when the test data distribution differs from that of the validation data. We propose a novel model selection method, H-Pro, that exploits the data hierarchy often associated with a time series dataset. Generally, the aggregated data at the higher levels of the hierarchy show better predictability and more consistency compared to the bottom-level data, which is more sparse and (sometimes) intermittent. H-Pro performs the HPO of the lowest-level student model based on the test proxy forecasts obtained from a set of teacher models at higher levels in the hierarchy. The consistency of the teachers' proxy forecasts helps select better student models at the lowest level. We perform extensive empirical studies on multiple datasets to validate the efficacy of the proposed method. H-Pro along with off-the-shelf forecasting models outperforms existing state-of-the-art forecasting methods including the winning models of the M5 point-forecasting competition.

翻訳日:2022-11-29 15:58:25 公開日:2022-11-28
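
The H-Pro abstract above takes temporal cross validation (TCV) as its baseline: the training series is split sequentially into train and validation windows, and the hyperparameter setting with the best validation score is kept. The sketch below illustrates only that baseline, not H-Pro. It uses an expanding training window and a moving-average forecaster whose window length is the single hyperparameter. Both choices, and the names `temporal_cv_splits` and `select_window`, are assumptions made for the example.

```python
def temporal_cv_splits(n_obs, n_folds, val_size):
    """Yield (train_indices, val_indices) pairs in time order.

    Each validation window directly follows its training window,
    so no future observation leaks into training.
    """
    if n_folds * val_size >= n_obs:
        raise ValueError("series too short for the requested folds")
    for fold in range(n_folds):
        val_end = n_obs - (n_folds - 1 - fold) * val_size
        val_start = val_end - val_size
        yield list(range(val_start)), list(range(val_start, val_end))


def select_window(series, windows, n_folds=3, val_size=2):
    """Pick the moving-average window with the lowest mean absolute error."""
    def validation_error(window):
        errors = []
        for train_idx, val_idx in temporal_cv_splits(len(series), n_folds, val_size):
            recent = [series[i] for i in train_idx][-window:]
            forecast = sum(recent) / len(recent)  # flat forecast over the window
            errors.extend(abs(series[i] - forecast) for i in val_idx)
        return sum(errors) / len(errors)

    return min(windows, key=validation_error)


splits = list(temporal_cv_splits(n_obs=10, n_folds=3, val_size=2))
best = select_window([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], windows=[1, 3])
```

On this steadily increasing toy series the shortest window wins, because it tracks the most recent value. The failure mode the abstract points to arises when the test period behaves differently from every validation window, so that the validation winner is no longer the best choice.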

# GraphPNAS:ディープグラフ生成モデルによる優れたニューラルネットワークの分布学習

GraphPNAS: Learning Distribution of Good Neural Architectures via Deep Graph Generative Models ( http://arxiv.org/abs/2211.15155v1 )

ライセンス: Link先を確認

Muchen Li, Jeffrey Yunfan Liu, Leonid Sigal, Renjie Liao

(参考訳) ニューラルアーキテクチャは自然に計算グラフと見なすことができる。本稿では,この視点に動機づけられ,ランダムグラフモデル学習のレンズを通してニューラルネットワーク探索(nas)について検討する。単一最良アーキテクチャ,すなわち点推定に重点を置いている既存のNAS手法とは対照的に,優れたアーキテクチャの分布を学習するグラフ生成モデルであるGraphPNASを提案する。 GraphPNASはグラフニューラルネットワーク(GNN)に基づいて、優れたニューラルネットワークのトポロジとオペレータ間の関係をよりよく捉えます。さらに, グラフ生成器は, 一般的なrnn生成器やランダム探索法よりも柔軟で効率的な学習可能な確率的探索法をもたらす。最後に、NASのための効率的な強化学習定式化により、発電機を学習する。 GraphPNASの有効性を評価するため,TinyImageNet上でのRandWire,CIFAR10上でのENAS,NAS-Bench-101/201など,3つの検索空間で広範囲にわたる実験を行った。 RandWireの複雑さは他の文献の検索空間よりもはるかに大きい。提案するグラフジェネレータは,RNNベースよりも一貫して優れており,最先端のNAS手法よりも優れた,あるいは同等のパフォーマンスが得られることを示す。

Neural architectures can be naturally viewed as computational graphs. Motivated by this perspective, we study neural architecture search (NAS) through the lens of learning random graph models. In contrast to existing NAS methods, which largely focus on searching for a single best architecture, i.e., point estimation, we propose GraphPNAS, a deep graph generative model that learns a distribution of well-performing architectures. Relying on graph neural networks (GNNs), our GraphPNAS can better capture topologies of good neural architectures and relations between operators therein. Moreover, our graph generator leads to a learnable probabilistic search method that is more flexible and efficient than the commonly used RNN generator and random search methods. Finally, we learn our generator via an efficient reinforcement learning formulation for NAS. To assess the effectiveness of our GraphPNAS, we conduct extensive experiments on three search spaces, including the challenging RandWire on TinyImageNet, ENAS on CIFAR10, and NAS-Bench-101/201. The complexity of RandWire is significantly larger than that of other search spaces in the literature. We show that our proposed graph generator consistently outperforms the RNN-based one and achieves performance better than or comparable to state-of-the-art NAS methods.

翻訳日:2022-11-29 15:58:04 公開日:2022-11-28

# マルチビュー探索最大化による視覚制御

Tackling Visual Control via Multi-View Exploration Maximization ( http://arxiv.org/abs/2211.15233v1 )

ライセンス: Link先を確認

Mingqi Yuan, Xin Jin, Bo Li, Wenjun Zeng

(参考訳) 本稿では,複雑なビジュアル制御タスクに取り組むためのマルチビュー探索の最大化について述べる。我々の知る限りでは、MEMは多視点表現学習と本質的な報酬駆動による強化学習(RL)を組み合わせた最初のアプローチである。より具体的には、memはまずマルチビュー観察の具体的かつ共有的な情報を抽出し、学習した機能でrlを実行する前に高品質な機能を形成する。さらに、MEMは多視点特徴をエントロピー最大化に基づく固有報酬に変換することにより探索を促進する。その結果、MEMはRLエージェントの試料効率と一般化能力を著しく向上させ、高次元の観測と余剰空間で現実の問題を解くのに役立てることができる。我々は,DeepMind Control Suite と Procgen の様々なタスクにおける MEM の評価を行った。広範なシミュレーション結果から、memは優れたパフォーマンスを達成でき、単純なアーキテクチャと高い効率でベンチマークスキームを上回ることが示される。

We present MEM: Multi-view Exploration Maximization for tackling complex visual control tasks. To the best of our knowledge, MEM is the first approach that combines multi-view representation learning and intrinsic reward-driven exploration in reinforcement learning (RL). More specifically, MEM first extracts the specific and shared information of multi-view observations to form high-quality features before performing RL on the learned features, enabling the agent to fully comprehend the environment and yield better actions. Furthermore, MEM transforms the multi-view features into intrinsic rewards based on entropy maximization to encourage exploration. As a result, MEM can significantly promote the sample-efficiency and generalization ability of the RL agent, facilitating solving real-world problems with high-dimensional observations and sparse-reward space. We evaluate MEM on various tasks from DeepMind Control Suite and Procgen games. Extensive simulation results demonstrate that MEM can achieve superior performance and outperform the benchmarking schemes with simple architecture and higher efficiency.

翻訳日:2022-11-29 15:57:41 公開日:2022-11-28

# 医療意思決定における因果介入のベイズ的ネットワークモデル:文献レビューとソフトウェア評価

Bayesian Network Models of Causal Interventions in Healthcare Decision Making: Literature Review and Software Evaluation ( http://arxiv.org/abs/2211.15258v1 )

ライセンス: Link先を確認

Artem Velikzhanin, Benjie Wang and Marta Kwiatkowska

(参考訳) 本報告は,医療における意思決定を支援するベイズネットワークモデルを特定するための体系的文献探索の結果をまとめたものである。検索手法を説明した後、wang b, lyle c, kwiatkowska m (2021) で開発された因果的介入分析ソフトウェアツールを用いて分析に適した公開モデルとデータセットを識別するために、選択された研究論文を簡潔にレビューする。最後に,ソフトウェアをモデル選択に適用する実験的な評価を行い,予備的な結果を報告する。

This report summarises the outcomes of a systematic literature search to identify Bayesian network models used to support decision making in healthcare. After describing the search methodology, the selected research papers are briefly reviewed, with the view to identify publicly available models and datasets that are well suited to analysis using the causal interventional analysis software tool developed in Wang B, Lyle C, Kwiatkowska M (2021). Finally, an experimental evaluation of applying the software on a selection of models is carried out and preliminary results are reported.

翻訳日:2022-11-29 15:57:23 公開日:2022-11-28

# 時変低ランク自己回帰による時空間データからの動的パターンの発見

Discovering Dynamic Patterns from Spatiotemporal Data with Time-Varying Low-Rank Autoregression ( http://arxiv.org/abs/2211.15482v1 )

ライセンス: Link先を確認

Xinyu Chen and Chengyuan Zhang and Xiaoxu Chen and Nicolas Saunier and Lijun Sun

(参考訳) 本稿では,時空間データ解析における広範な実践的関心,すなわち時空間データからの解釈可能な動的パターンの発見について述べる。この目的に向けて, 係数行列が低ランクテンソル因子分解によってパラメータ化される時間変化低減ランクベクトル自己回帰(var)モデルを開発した。テンソル因子化構造を利用して,モデル圧縮とパターン発見を同時に行うことができる。特に,提案モデルでは時空間データに基づく非定常性と時間変化システムの挙動を特徴付けることができる。提案モデルを評価するために, 流体力学, 海面温度, USA表面温度, NYCタクシートリップなど, 様々な非線形力学系を表す様々な時空間データを用いて実験を行った。実験結果は,時空間データをモデル化し,提案モデルを用いて空間的・時空間的パターンを特徴付ける効果を示す。空間的文脈では、空間的パターンを自動的に抽出し、空間的モードによって直感的に特徴付けることができる。時間的文脈において、複雑な時変系の挙動は、提案されたモデルの時間的モードによって明らかにされる。したがって,本モデルは実世界の動的システムにおける複雑な時空間データを理解するための洞察に富んだ基礎を築いた。適応データセットとPythonの実装は https://github.com/xinychen/vars で公開されている。

The problem of broad practical interest in spatiotemporal data analysis, i.e., discovering interpretable dynamic patterns from spatiotemporal data, is studied in this paper. Towards this end, we develop a time-varying reduced-rank vector autoregression (VAR) model whose coefficient matrices are parameterized by low-rank tensor factorization. Benefiting from the tensor factorization structure, the proposed model can simultaneously achieve model compression and pattern discovery. In particular, the proposed model allows one to characterize nonstationarity and time-varying system behaviors underlying spatiotemporal data. To evaluate the proposed model, extensive experiments are conducted on various spatiotemporal data representing different nonlinear dynamical systems, including fluid dynamics, sea surface temperature, USA surface temperature, and NYC taxi trips. Experimental results demonstrate the effectiveness of modeling spatiotemporal data and characterizing spatial/temporal patterns with the proposed model. In the spatial context, the spatial patterns can be automatically extracted and intuitively characterized by the spatial modes. In the temporal context, the complex time-varying system behaviors can be revealed by the temporal modes in the proposed model. Thus, our model lays an insightful foundation for understanding complex spatiotemporal data in real-world dynamical systems. The adapted datasets and Python implementation are publicly available at https://github.com/xinychen/vars.

翻訳日:2022-11-29 15:56:51 公開日:2022-11-28

# 学習ブルームフィルタにおける分類器選択の臨界解析

A Critical Analysis of Classifier Selection in Learned Bloom Filters ( http://arxiv.org/abs/2211.15565v1 )

ライセンス: Link先を確認

Dario Malchiodi, Davide Raimondi, Giacomo Fumagalli, Raffaele Giancarlo, Marco Frasca

(参考訳) 学習されたブルームフィルタ、すなわち、機械学習技術を介してデータから誘導されるモデルと、近似された集合メンバシップ問題の解決は、特に空間占有に焦点を当てた標準的なブルームフィルタの性能向上を目的として最近導入された。古典的な場合とは異なり、フィルタを構築するために使用されるデータの「複雑さ」は、その性能に大きな影響を与える可能性がある。そこで本研究では,与えられた分類複雑性のデータセット上で,与えられた学習ブルームフィルタの性能評価を行うための,私たちの知識を最大限活用するための,最初の深度解析を提案する。実際、我々はソフトウェアがサポートする新しい手法を提案し、学習されたブルームフィルタの設計、解析、実装を行い、そのマルチクリトリア性(すなわち、空間効率、偽陽性率、拒絶時間を含む制約)に特定の制約を課す。提案手法と支援ソフトウェアが有効で有用であることを示す実験により,データ複雑性の異なる問題に対して,2つの分類器だけが望ましい特性を持つことが判明し,文献にはこれまで検討されていない。また,学習されたブルームフィルタのサンドウィッチ化が,データ複雑性や分類器の性能変動に対して最も頑健であることも実験的に示した。このソフトウェアは、新たに学習されたbloomフィルタの提案をテストするために簡単に利用できる。

Learned Bloom Filters, i.e., models induced from data via machine learning techniques and solving the approximate set membership problem, have recently been introduced with the aim of enhancing the performance of standard Bloom Filters, with special focus on space occupancy. Unlike in the classical case, the "complexity" of the data used to build the filter might heavily impact its performance. Therefore, here we propose the first in-depth analysis, to the best of our knowledge, for the performance assessment of a given Learned Bloom Filter, in conjunction with a given classifier, on a dataset of a given classification complexity. Indeed, we propose a novel methodology, supported by software, for designing, analyzing and implementing Learned Bloom Filters as a function of specific constraints on their multi-criteria nature (that is, constraints involving space efficiency, false positive rate, and reject time). Our experiments show that the proposed methodology and the supporting software are valid and useful: we find that only two classifiers have desirable properties in relation to problems with different data complexity, and, interestingly, neither of them has been considered so far in the literature. We also experimentally show that the Sandwiched variant of Learned Bloom Filters is the most robust to data complexity and classifier performance variability, and is also the variant that usually has smaller reject times. The software can be readily used to test new Learned Bloom Filter proposals, which can be compared with the best ones identified here.

翻訳日:2022-11-29 15:56:30 公開日:2022-11-28
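
The Learned Bloom Filter abstract above refers to a classifier combined with a standard Bloom filter to answer approximate set membership queries. A minimal sketch of the basic (non-sandwiched) construction follows, under stated assumptions: a scoring function accepts keys above a threshold, and a small backup Bloom filter stores the keys the classifier would miss, so that no false negatives occur. The class names, the SHA-256 based hashing, and the default sizes are choices made for this example and are not taken from the paper or its software.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter; one byte per bit is used for readability."""

    def __init__(self, n_bits, n_hashes):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits)

    def _positions(self, key):
        # Derive n_hashes positions by salting the key with a seed.
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def __contains__(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


class LearnedBloomFilter:
    """Classifier first, backup Bloom filter for the keys it rejects."""

    def __init__(self, keys, score, threshold, n_bits=1024, n_hashes=3):
        self.score = score
        self.threshold = threshold
        self.backup = BloomFilter(n_bits, n_hashes)
        for key in keys:
            if score(key) < threshold:  # classifier false negative
                self.backup.add(key)

    def __contains__(self, key):
        return self.score(key) >= self.threshold or key in self.backup


# Toy "classifier": scores even integers as members. Key 7 is a member
# that the classifier misses, so it is stored in the backup filter.
keys = list(range(0, 100, 2)) + [7]
lbf = LearnedBloomFilter(keys, score=lambda k: 1.0 if k % 2 == 0 else 0.0,
                         threshold=0.5)
```

The Sandwiched variant mentioned in the abstract places an additional Bloom filter in front of the classifier; that stage is omitted here. Space savings in practice depend on the classifier being smaller than the bits it saves in the backup filter, which is exactly where the data complexity discussed in the abstract comes in.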

# 強化学習における知識伝達のための不適用行動学習

Inapplicable Actions Learning for Knowledge Transfer in Reinforcement Learning ( http://arxiv.org/abs/2211.15589v1 )

ライセンス: Link先を確認

Leo Ardon, Alberto Pozanco, Daniel Borrajo, Sumitra Ganesh

(参考訳) 強化学習(rl)アルゴリズムは、利用可能なアクションがたくさんある環境ではスケールが悪く、最適なポリシーを学ぶために多数のサンプルを必要とすることが知られている。あらゆる可能な状態において同じ固定されたアクション空間を考える伝統的なアプローチは、エージェントが、その報酬を最大化するためにも、$\textit{inapplicable actions}$のような無関係なアクション(つまり、与えられた状態において実行された環境に影響を与えないアクション)を無視しなければならないことを意味する。この情報を知ることで、ポリシー分布から適用不可能なアクションを隠蔽し、最適なポリシーを見つけるためのアクションのみを探索することで、RLアルゴリズムのサンプルの複雑さを低減することができる。これは通常、RLアルゴリズムに手作りのドメインロジックを追加してアドホックな方法で行われる。本稿では,この知識をアルゴリズムに導入するためのより体系的な手法を提案する。私たち (i) エージェントに対して知識を手動で指定する方法を標準化すること。 (II)政策と協調してこれらの国家依存的行動制約を自律的に学習する新しい枠組みを提案する。本研究では,学習不可能な動作が,無関係な動作を隠蔽する信頼性の高い信号を提供することにより,アルゴリズムのサンプル効率を大幅に向上することを示す。さらに,取得した知識の伝達性により,学習プロセスを効率化するために他のタスクで再利用できることを実証する。

Reinforcement Learning (RL) algorithms are known to scale poorly to environments with many available actions, requiring numerous samples to learn an optimal policy. The traditional approach of considering the same fixed action space in every possible state implies that the agent must understand, while also learning to maximize its reward, to ignore irrelevant actions such as $\textit{inapplicable actions}$ (i.e. actions that have no effect on the environment when performed in a given state). Knowing this information can help reduce the sample complexity of RL algorithms by masking the inapplicable actions from the policy distribution to only explore actions relevant to finding an optimal policy. This is typically done in an ad-hoc manner with hand-crafted domain logic added to the RL algorithm. In this paper, we propose a more systematic approach to introduce this knowledge into the algorithm. We (i) standardize the way knowledge can be manually specified to the agent; and (ii) present a new framework to autonomously learn these state-dependent action constraints jointly with the policy. We show experimentally that learning inapplicable actions greatly improves the sample efficiency of the algorithm by providing a reliable signal to mask out irrelevant actions. Moreover, we demonstrate that thanks to the transferability of the knowledge acquired, it can be reused in other tasks to make the learning process more efficient.

翻訳日:2022-11-29 15:56:06 公開日:2022-11-28
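
The abstract above describes masking inapplicable actions out of the policy distribution so that only relevant actions are explored. A minimal sketch of that masking step is shown below, assuming a discrete action space, a vector of policy logits, and a boolean applicability mask that is either specified by hand or predicted by a learned model. The function name `masked_policy` is hypothetical, and how the mask is learned is not covered here.

```python
import numpy as np


def masked_policy(logits, applicable):
    """Softmax over the applicable actions only.

    logits:     (n_actions,) unnormalised policy scores.
    applicable: (n_actions,) booleans, True where the action may be taken.
    Inapplicable actions receive probability exactly zero.
    """
    logits = np.asarray(logits, dtype=float)
    applicable = np.asarray(applicable, dtype=bool)
    if not applicable.any():
        raise ValueError("at least one action must be applicable")
    masked = np.where(applicable, logits, -np.inf)
    masked = masked - masked.max()  # stabilise the exponentials
    weights = np.exp(masked)        # exp(-inf) evaluates to 0.0
    return weights / weights.sum()


probs = masked_policy([1.0, 2.0, 3.0], [True, False, True])
```

Because the masked entries are exactly zero, sampling from `probs` never selects an inapplicable action, which is the mechanism the abstract credits for the reduced sample complexity.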

# 小口径バイオメディカルデータに対する特徴選択による重み予測ネットワーク

Weight Predictor Network with Feature Selection for Small Sample Tabular Biomedical Data ( http://arxiv.org/abs/2211.15616v1 )

ライセンス: Link先を確認

Andrei Margeloiu, Nikola Simidjievski, Pietro Lio, Mateja Jamnik

(参考訳) タブラルバイオメディカルデータはしばしば高次元であるが、非常に少数のサンプルを持つ。最近の研究は、よく規則化された単純なニューラルネットワークが、グラフデータ上のより洗練されたアーキテクチャよりも優れていることを示したが、それでも多くの潜在的に無関係な機能を持つ小さなデータセットに過度に適合する傾向にある。これらの問題に対処するために,ニューラルネットワークを高次元および小サンプルデータから学習するための重み予測器ネットワーク(WPFS)を提案し,学習可能なパラメータの数を削減し,同時に特徴選択を行う。分類ネットワークに加えて、WPFSは2つの小さな補助ネットワークを使用して分類モデルの第一層の重みを出力する。我々は9つの実世界のバイオメディカルデータセットを評価し、wpfが他の標準よりも優れており、表データに適用するより最近の方法であることを示す。さらに,提案する特徴選択機構について検討し,学習課題に対する有用な洞察を提供しながら,性能の向上を示す。

Tabular biomedical data is often high-dimensional but with a very small number of samples. Although recent work showed that well-regularised simple neural networks could outperform more sophisticated architectures on tabular data, they are still prone to overfitting on tiny datasets with many potentially irrelevant features. To combat these issues, we propose Weight Predictor Network with Feature Selection (WPFS) for learning neural networks from high-dimensional and small sample data by reducing the number of learnable parameters and simultaneously performing feature selection. In addition to the classification network, WPFS uses two small auxiliary networks that together output the weights of the first layer of the classification model. We evaluate on nine real-world biomedical datasets and demonstrate that WPFS outperforms other standard as well as more recent methods typically applied to tabular data. Furthermore, we investigate the proposed feature selection mechanism and show that it improves performance while providing useful insights into the learning task.

翻訳日:2022-11-29 15:55:43 公開日:2022-11-28

# 条件付き生成モデリングは意思決定に必要なすべてか?

Is Conditional Generative Modeling all you need for Decision-Making? ( http://arxiv.org/abs/2211.15657v1 )

ライセンス: Link先を確認

Anurag Ajay, Yilun Du, Abhi Gupta, Joshua Tenenbaum, Tommi Jaakkola, Pulkit Agrawal

(参考訳) 近年の条件生成モデルの改良により,言語記述だけで高品質な画像を生成することが可能になった。これらの手法が逐次意思決定の問題に直接対処できるかどうかを検討する。我々は、強化学習(RL)のレンズを通してではなく、条件付き生成モデルを通して意思決定を行う。驚いたことに、私たちの定式化は、標準ベンチマークで既存のオフラインRLアプローチを上回り得るポリシーにつながります。ポリシーを戻り条件拡散モデルとしてモデル化することで、動的プログラミングの必要性を回避し、それから従来のオフラインrlで発生する多くの複雑さを排除する方法を説明します。さらに,条件拡散モデルとしてのポリシーモデリングの利点を,制約とスキルの2つの条件変数を考慮に入れて実証する。トレーニング中の単一の制約やスキルの条件付けは、複数の制約を満たすか、あるいはスキルの組み合わせを示すテスト時の振る舞いにつながります。条件付き生成モデリングは意思決定のための強力なツールであることを示す。

Recent improvements in conditional generative modeling have made it possible to generate high-quality images from language descriptions alone. We investigate whether these methods can directly address the problem of sequential decision-making. We view decision-making not through the lens of reinforcement learning (RL), but rather through conditional generative modeling. To our surprise, we find that our formulation leads to policies that can outperform existing offline RL approaches across standard benchmarks. By modeling a policy as a return-conditional diffusion model, we illustrate how we may circumvent the need for dynamic programming and subsequently eliminate many of the complexities that come with traditional offline RL. We further demonstrate the advantages of modeling policies as conditional diffusion models by considering two other conditioning variables: constraints and skills. Conditioning on a single constraint or skill during training leads to behaviors at test-time that can satisfy several constraints together or demonstrate a composition of skills. Our results illustrate that conditional generative modeling is a powerful tool for decision-making.

翻訳日:2022-11-29 15:55:27 公開日:2022-11-28

# AcceRL: 深層強化学習のための政策加速フレームワーク

AcceRL: Policy Acceleration Framework for Deep Reinforcement Learning ( http://arxiv.org/abs/2211.15023v1 )

ライセンス: Link先を確認

Hongjie Zhang

(参考訳) 深層強化学習はその優れた意思決定能力で様々な分野で大きな成功を収めた。しかし、ポリシー学習プロセスは大量の訓練時間を必要とし、エネルギー消費を引き起こす。ニューラルネットワークの冗長性に触発されて,ポリシー品質を確保しつつポリシー学習を高速化する、ニューラルネットワーク圧縮に基づく軽量並列学習フレームワーク AcceRL を提案する。具体的には、AcceRLはさまざまなニューラルネットワーク圧縮手法を柔軟に組み合わせて、経験収集を高速化する。全体としてAcceRLはActor、Learner、Compressor、Corrector、Monitorの5つのコンポーネントで構成されている。ActorはCompressorを使用してLearnerのポリシーネットワークを圧縮し、環境と対話する。そして生成された経験は、V-trace、Retraceなどのオフポリシー手法を用いてCorrectorによって変換される。そして、修正された経験をLearnerに与えてポリシー学習を行う。これは、複数のニューラルネットワーク圧縮技術を組み込んだ最初の汎用強化学習フレームワークであると考えています。Gym上で行われた大規模な実験では、AcceRLは従来の方法と比較してActorの時間コストを約2.0倍から4.13倍削減している。さらに、AcceRLは従来の方法と比較してトレーニング全体の時間を約29.8%から40.3%削減し、同じポリシー品質を維持している。

Deep reinforcement learning has achieved great success in various fields with its super decision-making ability. However, the policy learning process requires a large amount of training time, causing energy consumption. Inspired by the redundancy of neural networks, we propose a lightweight parallel training framework based on neural network compression, AcceRL, to accelerate the policy learning while ensuring policy quality. Specifically, AcceRL speeds up the experience collection by flexibly combining various neural network compression methods. Overall, AcceRL consists of five components, namely Actor, Learner, Compressor, Corrector, and Monitor. The Actor uses the Compressor to compress the Learner's policy network to interact with the environment. The generated experiences are then transformed by the Corrector with off-policy methods, such as V-trace and Retrace. Then the corrected experiences are fed to the Learner for policy learning. We believe this is the first general reinforcement learning framework that incorporates multiple neural network compression techniques. Extensive experiments conducted in gym show that AcceRL reduces the time cost of the actor by about 2.0X to 4.13X compared to the traditional methods. Furthermore, AcceRL reduces the whole training time by about 29.8% to 40.3% compared to the traditional methods while keeping the same policy quality.

翻訳日:2022-11-29 15:47:46 公開日:2022-11-28

# 質的制約付き強化学習:停電確率を制約する強化学習フレームワーク

Quantile Constrained Reinforcement Learning: A Reinforcement Learning Framework Constraining Outage Probability ( http://arxiv.org/abs/2211.15034v1 )

ライセンス: Link先を確認

Whiyoung Jung, Myungsik Cho, Jongeui Park, Youngchul Sung

(参考訳) 制約強化学習(restricted reinforcement learning, rl)は、与えられた制約を満たしながら、期待累積回帰を最大化する最適方針を見つけることを目的とした、rlの領域である。以前の制約付きrlワークのほとんどは、期待累積和コストを制約として考慮している。しかし、この制約による最適化は、累積和コストが所定の閾値を超えるような停止事象の目標確率を保証できない。本稿では,停止制約を満たすために必要な十分条件である累積和コスト分布の量子化を制約する,quantile restricteded rl(qcrl)という枠組みを提案する。これは、ポリシー勾配定理を量子論に適用する問題に取り組み、量子論の勾配を近似するための理論的結果を提供する最初の研究である。導出した理論結果とラグランジュ乗算器の手法に基づき、量子量制限ポリシー最適化(qcpo)と呼ばれる制約付きrlアルゴリズムを構築した。我々は,大偏差原理(LDP)を用いた分布RLを用いて,QCPOの実装における累積和コストの定量値とテール確率を推定する。実装されたアルゴリズムは、トレーニング期間後の停止確率制約を満たす。

Constrained reinforcement learning (RL) is an area of RL whose objective is to find an optimal policy that maximizes expected cumulative return while satisfying a given constraint. Most of the previous constrained RL works consider expected cumulative sum cost as the constraint. However, optimization with this constraint cannot guarantee a target probability of outage event that the cumulative sum cost exceeds a given threshold. This paper proposes a framework, named Quantile Constrained RL (QCRL), to constrain the quantile of the distribution of the cumulative sum cost that is a necessary and sufficient condition to satisfy the outage constraint. This is the first work that tackles the issue of applying the policy gradient theorem to the quantile and provides theoretical results for approximating the gradient of the quantile. Based on the derived theoretical results and the technique of the Lagrange multiplier, we construct a constrained RL algorithm named Quantile Constrained Policy Optimization (QCPO). We use distributional RL with the Large Deviation Principle (LDP) to estimate quantiles and tail probability of the cumulative sum cost for the implementation of QCPO. The implemented algorithm satisfies the outage probability constraint after the training period.

翻訳日:2022-11-29 15:47:28 公開日:2022-11-28
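
The QCRL abstract above states that constraining a quantile of the cumulative cost is a necessary and sufficient condition for the outage constraint, whereas a constraint on the expected cost gives no such guarantee. The sketch below checks this on an empirical sample, assuming the outage constraint has the form $P(C > d) \le \epsilon$ and that the relevant quantile is the $(1-\epsilon)$-quantile of the cost $C$. The helper names and the toy cost values are made up for this illustration; the policy-gradient machinery of QCPO is not shown.

```python
import math


def empirical_quantile(costs, level):
    """Smallest sample value whose empirical CDF is at least `level`."""
    ordered = sorted(costs)
    k = max(1, math.ceil(level * len(ordered)))
    return ordered[k - 1]


def outage_probability(costs, threshold):
    """Fraction of episodes whose cumulative cost exceeds the threshold."""
    return sum(c > threshold for c in costs) / len(costs)


costs = [1, 2, 3, 4, 5, 6, 7, 8]   # toy cumulative costs of 8 episodes
epsilon = 0.25                      # tolerated outage probability
quantile = empirical_quantile(costs, 1 - epsilon)

# Threshold d = 6: the quantile constraint holds and so does the outage one.
ok_at_6 = (quantile <= 6, outage_probability(costs, 6) <= epsilon)
# Threshold d = 5: the mean cost (4.5) is below d, yet both constraints fail.
ok_at_5 = (quantile <= 5, outage_probability(costs, 5) <= epsilon)
mean_cost = sum(costs) / len(costs)
```

At threshold 5 the expected-cost constraint is satisfied while three of the eight episodes still exceed the threshold, which is the gap between the two kinds of constraint that the abstract highlights.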

# オフライン強化学習のための状態認識近位悲観的アルゴリズム

State-Aware Proximal Pessimistic Algorithms for Offline Reinforcement Learning ( http://arxiv.org/abs/2211.15065v1 )

ライセンス: Link先を確認

Chen Chen, Hongyao Tang, Yi Ma, Chao Wang, Qianli Shen, Dong Li, Jianye Hao

(参考訳) ペシミズムはオフライン強化学習(RL)において非常に重要である。オフラインRLアルゴリズムの幅広いカテゴリは、明示的または暗黙的な振舞い規則化によって悲観主義を満たす。しかし、そのほとんどは、オフライン状態の分布が学習方針の状態分布とどのように異なるかという影響を無視して、行動規則化として政策の分岐のみを考慮しており、ある状態では悲観性が不足し、別の状態では過剰になる可能性がある。この問題を考慮し、オフラインRLのための原理的アルゴリズムフレームワークである \emph{State-Aware Proximal Pessimism} (SA-PP) を提案する。 SA-PPの鍵となる考え方は、学習ポリシーとオフラインデータセット間の割引定常状態分布比を利用して、状態ワイズな振る舞い規則化の度合いを調整し、悲観性をより適切な方法で実装できるようにすることである。まず, 従来のアルゴリズムよりもSA-PPの方が優れていることの理論的正当性を示し, 幅広い設定において, SA-PPがより低い準最適性上界を与えることを示す。さらに、DualDICEを用いて割引定常状態分布比を推定し、代表的なCQLアルゴリズム上にSA-PPを構築することで、\emph{State-Aware Conservative Q-Learning} (SA-CQL) と呼ばれる新しいアルゴリズムを提案する。標準のオフラインRLベンチマークに対する大規模な実験は、SA-CQLがベンチマークの大部分で一般的なベースラインを上回り、最も高い平均リターンを達成したことを示している。

Pessimism is of great importance in offline reinforcement learning (RL). One broad category of offline RL algorithms fulfills pessimism by explicit or implicit behavior regularization. However, most of them only consider policy divergence as behavior regularization, ignoring the effect of how the offline state distribution differs from that of the learning policy, which may lead to under-pessimism for some states and over-pessimism for others. Taking account of this problem, we propose a principled algorithmic framework for offline RL, called \emph{State-Aware Proximal Pessimism} (SA-PP). The key idea of SA-PP is leveraging discounted stationary state distribution ratios between the learning policy and the offline dataset to modulate the degree of behavior regularization in a state-wise manner, so that pessimism can be implemented in a more appropriate way. We first provide theoretical justifications on the superiority of SA-PP over previous algorithms, demonstrating that SA-PP produces a lower suboptimality upper bound in a broad range of settings. Furthermore, we propose a new algorithm named \emph{State-Aware Conservative Q-Learning} (SA-CQL), by building SA-PP upon the representative CQL algorithm with the help of DualDICE for estimating discounted stationary state distribution ratios. Extensive experiments on standard offline RL benchmarks show that SA-CQL outperforms the popular baselines on a large portion of benchmarks and attains the highest average return.

翻訳日:2022-11-29 15:47:08 公開日:2022-11-28

# 事前データのない設計への学習:ディープラーニングと木探索を用いた汎用設計戦略の発見

Learning to design without prior data: Discovering generalizable design strategies using deep learning and tree search ( http://arxiv.org/abs/2211.15068v1 )

ライセンス: Link先を確認

Ayush Raina, Jonathan Cagan, Christopher McComb

(参考訳) 独自に設計できるAIエージェントの構築は1980年代から目標とされてきた。近年、ディープラーニングは大規模データから学習する能力を示し、データ駆動設計の大幅な進歩を可能にしている。しかし、事前のデータから学ぶことは、以前解決した問題を解決することのみを制限し、データ駆動学習を既存のソリューションに偏らせる。設計エージェントの最終的な目標は、問題空間における一般的な設計動作を、これまで見たことのないまま学習する能力である。本稿では,この目標を達成するための自己学習エージェントフレームワークを提案する。このフレームワークは,木探索が問題空間を探索する新しい木探索アルゴリズムと深いポリシーネットワークを統合し,深いポリシーネットワークは自己生成した経験を活用して探索をさらに誘導する。このフレームワークは、まず、先行データなしで高性能な生成戦略を発見する能力を示し、次に、未知の境界条件をまたいだ生成戦略のゼロショット一般化を示す。本研究は,2つのエンジニアリング設計問題の複数バージョンを再訓練せずに解くことにより,フレームワークの有効性と汎用性を評価する。本稿では,任意の問題空間における自己学習型ハイパフォーマンス・一般化可能な問題解決行動の方法論を提案し,専門家データ,既存ソリューション,問題固有学習の必要性を回避した。

Building an AI agent that can design on its own has been a goal since the 1980s. Recently, deep learning has shown the ability to learn from large-scale data, enabling significant advances in data-driven design. However, learning from prior data limits us to solving only problems that have been solved before and biases data-driven learning towards existing solutions. The ultimate goal for a design agent is the ability to learn generalizable design behavior in a problem space without having seen it before. In this work, we introduce a self-learning agent framework that achieves this goal. The framework integrates a deep policy network with a novel tree search algorithm: the tree search explores the problem space, and the deep policy network leverages self-generated experience to guide the search further. The framework first demonstrates an ability to discover high-performing generative strategies without any prior data, and second, it illustrates zero-shot generalization of generative strategies across various unseen boundary conditions. This work evaluates the effectiveness and versatility of the framework by solving multiple versions of two engineering design problems without retraining. Overall, this paper presents a methodology to self-learn high-performing and generalizable problem-solving behavior in an arbitrary problem space, circumventing the need for expert data, existing solutions, and problem-specific learning.

翻訳日:2022-11-29 15:46:37 公開日:2022-11-28

# Flip Initial Features: 半教師付きノード分類のためのニューラルネットワークの一般化

Flip Initial Features: Generalization of Neural Networks for Semi-supervised Node Classification ( http://arxiv.org/abs/2211.15081v1 )

ライセンス: Link先を確認

Yoonhyuk Choi, Chong-Kwon Kim

(参考訳) グラフニューラルネットワーク(GNN)は、半教師付き設定下で広く利用されている。以前の研究は主に、好気性グラフと好気性グラフの両方をよく一般化するための適切なグラフフィルタ(例えば集約スキーム)を見つけることに重点を置いてきた。これらのアプローチは必須かつ効果的ではあるが、単語の袋表現に内在する初期ノードの特徴のスパースに苦しむ。半教師付き学習では、トレーニングサンプルがグラフフィルタ(超平面)の全次元をカバーできない場合があり、これは第1のプロジェクター行列における特定の次元の過度な適合を生じさせる。この問題に対処するために、我々は単純で新しい戦略を提案し、初期特徴と超平面を同時に反転させて追加空間を作成する。オリジナルとフリップスペースの両方でのトレーニングは、学習可能なパラメータの正確な更新を提供することができる。我々の知る限りでは、これはGNNのオーバーフィッティング問題を効果的に緩和する最初の試みである。実世界のデータセットに対する大規模な実験により、提案手法はノード分類精度を最大40.2%改善することを示した。

Graph neural networks (GNNs) have been widely used under semi-supervised settings. Prior studies have mainly focused on finding appropriate graph filters (e.g., aggregation schemes) to generalize well for both homophilic and heterophilic graphs. Even though these approaches are essential and effective, they still suffer from the sparsity of initial node features inherent in the bag-of-words representation. In semi-supervised learning, the training samples often fail to cover all dimensions of the graph filters (hyperplanes), and this can precipitate over-fitting of specific dimensions in the first projection matrix. To deal with this problem, we suggest a simple and novel strategy: create additional space by flipping the initial features and the hyperplane simultaneously. Training in both the original and the flipped space can provide precise updates of learnable parameters. To the best of our knowledge, this is the first attempt that effectively moderates the overfitting problem in GNNs. Extensive experiments on real-world datasets demonstrate that the proposed technique improves node classification accuracy by up to 40.2%.

翻訳日:2022-11-29 15:46:15 公開日:2022-11-28

# SI-GAT: ソナー画像分類のための改良グラフアテンションネットワークに基づく手法

SI-GAT: A method based on improved Graph Attention Network for sonar image classification ( http://arxiv.org/abs/2211.15133v1 )

ライセンス: Link先を確認

Can Lei and Huigang Wang and Juan Lei

(参考訳) 深層学習に基づく既存のソナー画像分類法は、局所像の特徴のみを考慮してユークリッド空間でしばしば分析される。そこで本稿では,複数種類の撮像ソナーに適用可能な改良型グラフアテンションネットワーク (gat) に基づくソナー分類法を提案する。本手法は,非ユークリッド空間におけるソナー特性を表す色近距離と空間近距離の連成計算に基づいてノード間の相関関係を定量化し,KNN(K-Nearest Neighbor)アルゴリズムを用いて,注目係数行列と結合してSI-GATの鍵部分を構成するグラフアテンション機構の近傍範囲と隣接行列を決定する。このSI-GATは、実データの検証を通じてユークリッド空間に基づくCNN(Convolutional Neural Network)手法よりも優れている。

Existing sonar image classification methods based on deep learning often operate in Euclidean space and consider only local image features. For this reason, this paper presents a sonar classification method based on an improved Graph Attention Network (GAT), namely SI-GAT, which is applicable to multiple types of imaging sonar. The method quantifies the correlation between nodes through a joint calculation of color proximity and spatial proximity, which represent the sonar characteristics in non-Euclidean space. The KNN (K-Nearest Neighbor) algorithm is then used to determine the neighborhood range and adjacency matrix in the graph attention mechanism, and these are considered jointly with the attention coefficient matrix to construct the key part of SI-GAT. Validation on real data shows that SI-GAT is superior to several CNN (Convolutional Neural Network) methods based on Euclidean space.
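
The neighborhood construction described above can be sketched as follows. This is a minimal illustration, assuming each node is a tuple `(x, y, intensity)` and that color and spatial proximity are combined by a weighted sum with a hypothetical weight `alpha`; the abstract does not specify the actual joint calculation used in SI-GAT.

```python
import math


def knn_adjacency(nodes, k, alpha=0.5):
    """Build a 0/1 adjacency matrix linking each node to its k nearest neighbors.

    nodes: list of (x, y, intensity) tuples (an assumed node representation).
    alpha: assumed weight of the color term against the spatial term.
    """
    n = len(nodes)
    adj = [[0] * n for _ in range(n)]
    for i, (xi, yi, ci) in enumerate(nodes):
        dists = []
        for j, (xj, yj, cj) in enumerate(nodes):
            if i == j:
                continue
            spatial = math.hypot(xi - xj, yi - yj)   # spatial proximity
            color = abs(ci - cj)                     # color (intensity) proximity
            dists.append((alpha * color + (1 - alpha) * spatial, j))
        for _, j in sorted(dists)[:k]:
            adj[i][j] = 1
    return adj


# Two tight clusters: dark nodes near the origin, bright nodes near (5, 5).
nodes = [(0, 0, 0.1), (0, 1, 0.1), (5, 5, 0.9), (5, 6, 0.9)]
adj = knn_adjacency(nodes, k=1)
```

With `k=1`, each node links only to the other member of its own cluster. In a graph attention layer, such a matrix would typically mask attention scores so that only neighboring nodes are attended to.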

翻訳日:2022-11-29 15:41:04 公開日:2022-11-28

# 順序距離学習のための角三角形距離

Angular triangle distance for ordinal metric learning ( http://arxiv.org/abs/2211.15200v1 )

ライセンス: Link先を確認

Imam Mustafa Kamal and Hyerim Bae

(参考訳) deep metric learning(dml)は、タスク固有の距離やデータの類似性を自動的に構築することを目的としている。いくつかの重要なメトリックラーニング手法が提案されている。それでも、低次元空間における元のデータの順序的性質の保存は保証されない。通常のデータは、バイオメディカルケースにおける症状の重症度、製造における生産品質、企業における格付けレベル、顔認識における老化レベルなど、現実世界の問題においてユビキタスである。本研究では,新しい三角形距離 (ATD) と順序三重項ネットワーク (OTD) を提案し,順序データに対する高精度で有意義な埋め込み空間表現を求める。 ATDは角空間におけるデータの順序関係を投影し、OTDはその順序関係を学習する。また、新しい距離測度が数学的に距離計量特性を満たすことを示した。提案手法は,生体情報,顔画像,手指画像などの順序的性質を持つ実世界データを用いて評価した。その結果,提案手法は順序性だけでなく,既存のDMLモデルよりも正確であることがわかった。さらに,提案手法は,最先端の順序数学習法よりも優れていることを示す。

Deep metric learning (DML) aims to automatically construct task-specific distances or similarities of data, resulting in a low-dimensional representation. Several significant metric-learning methods have been proposed. Nonetheless, no approach guarantees the preservation of the ordinal nature of the original data in a low-dimensional space. Ordinal data are ubiquitous in real-world problems, such as the severity of symptoms in biomedical cases, production quality in manufacturing, rating level in businesses, and aging level in face recognition. This study proposes a novel angular triangle distance (ATD) and ordinal triplet network (OTD) to obtain an accurate and meaningful embedding space representation for ordinal data. The ATD projects the ordinal relation of data in the angular space, whereas the OTD learns its ordinal projection. We also demonstrated that our new distance measure satisfies the distance metric properties mathematically. The proposed method was assessed using real-world data with an ordinal nature, such as biomedical, facial, and hand-gestured images. Extensive experiments have been conducted, and the results show that our proposed method not only semantically preserves the ordinal nature but is also more accurate than existing DML models. Moreover, we also demonstrate that our proposed method outperforms the state-of-the-art ordinal metric learning method.

翻訳日:2022-11-29 15:40:47 公開日:2022-11-28

# MicroAST: 超高分解能任意型トランスファーを目指して

MicroAST: Towards Super-Fast Ultra-Resolution Arbitrary Style Transfer ( http://arxiv.org/abs/2211.15313v1 )

ライセンス: Link先を確認

Zhizhong Wang, Lei Zhao, Zhiwen Zuo, Ailin Li, Haibo Chen, Wei Xing, Dongming Lu

(参考訳) 任意スタイル転送(AST)は、任意の芸術スタイルをコンテンツイメージに転送する。最近の急速な進歩にもかかわらず、既存のastメソッドは、リソースが限られている超高解像度(4kなど)で実行できないか、遅すぎるため、さらなるアプリケーションを妨げる。本稿では,MicroASTと呼ばれる単純で軽量なモデルを学ぶことで,このジレンマに対処する。鍵となる洞察は、推論時に面倒な事前訓練されたDeep Convolutional Neural Networks(例えばVGG)の使用を完全に放棄することである。代わりに、2つのマイクロエンコーダ(コンテンツエンコーダとスタイルエンコーダ)と1つのマイクロデコーダを設計する。コンテンツエンコーダは、コンテンツ画像の主構造を抽出することを目的とする。スタイルエンコーダは、変調器と組み合わせて、このスタイル画像を学習可能なデュアル変調信号に符号化し、デコーダの中間特徴と畳み込みフィルタの両方を変調し、より洗練され柔軟なスタイル信号を注入してスタイル化を導く。さらに、より明瞭で代表的なスタイル信号を抽出するスタイルエンコーダの能力を高めるために、我々のモデルに新しいスタイル信号のコントラストロスを導入する。この技術と比較すると、私たちのMicroASTは視覚的に優れた結果をもたらすだけでなく、5-73倍小さく、6-18倍速く、初めて超高速(0.5秒)のASTを4K超解像度で実現しました。コードはhttps://github.com/EndyWon/MicroASTで入手できる。

Arbitrary style transfer (AST) transfers arbitrary artistic styles onto content images. Despite the recent rapid progress, existing AST methods are either incapable or too slow to run at ultra-resolutions (e.g., 4K) with limited resources, which heavily hinders their further applications. In this paper, we tackle this dilemma by learning a straightforward and lightweight model, dubbed MicroAST. The key insight is to completely abandon the use of cumbersome pre-trained Deep Convolutional Neural Networks (e.g., VGG) at inference. Instead, we design two micro encoders (content and style encoders) and one micro decoder for style transfer. The content encoder aims at extracting the main structure of the content image. The style encoder, coupled with a modulator, encodes the style image into learnable dual-modulation signals that modulate both intermediate features and convolutional filters of the decoder, thus injecting more sophisticated and flexible style signals to guide the stylizations. In addition, to boost the ability of the style encoder to extract more distinct and representative style signals, we also introduce a new style signal contrastive loss in our model. Compared to the state of the art, our MicroAST not only produces visually superior results but also is 5-73 times smaller and 6-18 times faster, for the first time enabling super-fast (about 0.5 seconds) AST at 4K ultra-resolutions. Code is available at https://github.com/EndyWon/MicroAST.

翻訳日:2022-11-29 15:40:25 公開日:2022-11-28

# 知覚、基礎、理性、行動:汎用視覚表現のためのベンチマーク

Perceive, Ground, Reason, and Act: A Benchmark for General-purpose Visual Representation ( http://arxiv.org/abs/2211.15402v1 )

ライセンス: Link先を確認

Jiangyong Huang, William Yicheng Zhu, Baoxiong Jia, Zan Wang, Xiaojian Ma, Qing Li, Siyuan Huang

(参考訳) 現在のコンピュータビジョンモデルは、人間の視覚システムとは異なり、汎用的な視覚理解をまだ達成できていない。一般的なビジョンモデルを作成する既存の取り組みは、評価されたタスクの範囲に制限があり、それらを全体的に実行する包括的なフレームワークを提供していません。我々は,4つの機能ドメインを持つ視覚認知能力の全スペクトルを包括的に網羅した,汎用視覚理解評価(General-purpose Visual Understanding Evaluation, G-VUE)を提案する。 4つのドメインは、3d再構成から視覚的推論や操作まで、11の注意深くキュレートされたタスクに具体化されている。ベンチマークとともに、11タスクの任意の視覚表現を評価するための一般的なエンコーダ・デコーダフレームワークを提供する。我々は,(1)トランスフォーマーベースの視覚バックボーンが,G-VUE上でCNNベースのバックボーンよりも優れており,(2)視覚言語による事前学習による視覚表現が視覚タスクを横断する視覚のみの事前学習よりも優れていることを確認する。 g-vueでは,より汎用的な視覚表現を得ることで,汎用視覚システム構築に向けた研究のモチベーションを高めるための総合的評価基準を提供する。

Current computer vision models, unlike the human visual system, cannot yet achieve general-purpose visual understanding. Existing efforts to create a general vision model are limited in the scope of assessed tasks and offer no overarching framework to perform them holistically. We present a new comprehensive benchmark, General-purpose Visual Understanding Evaluation (G-VUE), covering the full spectrum of visual cognitive abilities with four functional domains: Perceive, Ground, Reason, and Act. The four domains are embodied in 11 carefully curated tasks, from 3D reconstruction to visual reasoning and manipulation. Along with the benchmark, we provide a general encoder-decoder framework that allows arbitrary visual representations to be evaluated on all 11 tasks. We evaluate various pre-trained visual representations with our framework and observe that (1) Transformer-based visual backbones generally outperform CNN-based backbones on G-VUE, and (2) visual representations from vision-language pre-training are superior to those from vision-only pre-training across visual tasks. With G-VUE, we provide a holistic evaluation standard to motivate research toward building general-purpose visual systems via obtaining more general-purpose visual representations.

翻訳日:2022-11-29 15:39:57 公開日:2022-11-28

# FsaNet: セマンティックセグメンテーションのための周波数自己注意

FsaNet: Frequency Self-attention for Semantic Segmentation ( http://arxiv.org/abs/2211.15595v1 )

ライセンス: Link先を確認

Fengyu Zhang, Ashkan Panahi, Guangjun Gao

(参考訳) 画像のスペクトル特性を考慮し,計算複雑性を線形オーダーまで大幅に低減した新しい自己注意機構を提案する。オブジェクト内の類似性を促進しつつエッジの保存性を向上させるため,周波数帯域ごとに個別化した処理を提案する。特に, 処理を低周波成分のみに適用する場合について検討する。アブレーション研究により,低周波自己注意は,ネットワークを再トレーニングしなくても,全周波数の場合に非常に近い,あるいはそれ以上の性能が得られることを示した。そこで我々は,CNNネットワークのヘッドに新しいプラグアンドプレイモジュールを設計して組み込み,これをFsaNetと呼ぶ。周波数自己注意は, 1) 低周波係数を入力とし, 2) 線形構造を持つ空間領域の自己注意と数学的に等価になりうるものであり, 3) トークンマッピング(1×1畳み込み)ステージとトークン混合ステージを同時に単純化する。周波数自己注意は,通常の自己注意と比べてメモリを87.29%〜90.04%,FLOPsを96.13%〜98.07%,実行時間を97.56%〜98.18%削減することを示す。他のResNet101ベースの自己注意ネットワークと比較して、FsaNetはCityscapesのテストデータセットで最先端の新たな結果(83.0% mIoU)を達成し、ADE20kとVOCaugでも競争力のある結果を得た。

Considering the spectral properties of images, we propose a new self-attention mechanism with highly reduced computational complexity, down to a linear rate. To better preserve edges while promoting similarity within objects, we propose individualized processes over different frequency bands. In particular, we study the case where the process is applied only to low-frequency components. Through an ablation study, we show that low-frequency self-attention can achieve performance very close to, or better than, full-frequency self-attention even without retraining the network. Accordingly, we design novel plug-and-play modules and embed them in the head of a CNN network, which we refer to as FsaNet. The frequency self-attention 1) takes low-frequency coefficients as input, 2) can be mathematically equivalent to spatial-domain self-attention with linear structures, and 3) simplifies the token mapping (1×1 convolution) stage and the token mixing stage simultaneously. We show that frequency self-attention requires 87.29% to 90.04% less memory, 96.13% to 98.07% fewer FLOPs, and 97.56% to 98.18% less run time than regular self-attention. Compared to other ResNet101-based self-attention networks, FsaNet achieves a new state-of-the-art result (83.0% mIoU) on the Cityscapes test dataset and competitive results on ADE20k and VOCaug.

翻訳日:2022-11-29 15:39:33 公開日:2022-11-28

# エッジ強化グラフアライメントネットワークとワードペア関係タグを用いた共同マルチモーダルエンティティ-リレーション抽出

Joint Multimodal Entity-Relation Extraction Based on Edge-enhanced Graph Alignment Network and Word-pair Relation Tagging ( http://arxiv.org/abs/2211.15028v1 )

ライセンス: Link先を確認

Li Yuan, Yi Cai, Jin Wang, Qing Li

(参考訳) マルチモーダル認識(MNER)とマルチモーダル関係抽出(MRE)は、マルチモーダル知識グラフ構築タスクにおける2つの基本的なサブタスクである。しかし、既存のメソッドは通常2つのタスクを独立に処理し、両者の双方向インタラクションを無視する。本稿では,MNERとMREをJMERE(Joint Multimodal entity-relation extract task)として共同実行することを提案する。さらに、現在のmnerおよびmreモデルは、視覚およびテキストグラフにおける視覚オブジェクトとテキストエンティティの整合のみを考慮し、エンティティ-エンティティ関係とオブジェクト-オブジェクト関係を無視する。上記の課題に対処するため、JMEREタスクのためのエッジ強化グラフアライメントネットワークとワードペア関係タグ付け(EEGA)を提案する。具体的には、まず、MNERとMREの双方向相互作用を利用して単語対関係タグを設計し、エラー伝搬を回避する。次に,クロスグラフのノードとエッジをアライメントすることにより,jmereタスクを強化するためのエッジエンハンスグラフアライメントネットワークを提案する。従来の手法と比較して,エッジ情報を利用してオブジェクトとエンティティのアライメントを補助し,エンティティ-エンティティ関係とオブジェクト-オブジェクト関係の相関関係を求めることができる。本モデルの有効性を示す実験を行った。

Multimodal named entity recognition (MNER) and multimodal relation extraction (MRE) are two fundamental subtasks in multimodal knowledge graph construction. However, existing methods usually handle the two tasks independently, which ignores the bidirectional interaction between them. This paper is the first to propose performing MNER and MRE jointly as a joint multimodal entity-relation extraction (JMERE) task. Moreover, current MNER and MRE models only consider aligning visual objects with textual entities in visual and textual graphs and ignore the entity-entity and object-object relationships. To address these challenges, we propose an edge-enhanced graph alignment network with word-pair relation tagging (EEGA) for the JMERE task. Specifically, we first design a word-pair relation tagging scheme to exploit the bidirectional interaction between MNER and MRE and avoid error propagation. Then, we propose an edge-enhanced graph alignment network that enhances the JMERE task by aligning nodes and edges across the two graphs. Compared with previous methods, the proposed method can leverage edge information to assist the alignment between objects and entities and find the correlations between entity-entity relationships and object-object relationships. Experiments are conducted to show the effectiveness of our model.

翻訳日:2022-11-29 15:39:07 公開日:2022-11-28

# G^3: Guidebook Grounding によるジオロケーション

G^3: Geolocation via Guidebook Grounding ( http://arxiv.org/abs/2211.15521v1 )

ライセンス: Link先を確認

Grace Luo, Giscard Biamby, Trevor Darrell, Daniel Fried, Anna Rohrbach

(参考訳) 画像が撮影された場所を予測するタスクである位置情報を,言語がいかに改善できるかを示す。そこで本研究では,人間が位置情報に用いている視覚的特徴を,人間の手書きガイドブックから明らかに把握する。多様な場所のストリートビュー画像のデータセットと、人気のあるインタラクティブなジオロケーションゲームであるGeoGuessrのテキストガイドブックを用いた、ガイドブックグラウンドによるジオロケーションのタスクを提案する。本手法は,ガイドブックから自動的に抽出された手がかりに注目することで,各画像の国を予測する。国レベルの擬似ラベルによる注目が最高のパフォーマンスを達成する。本手法は,最先端画像のみの位置情報法を実質的に上回り,top-1精度が5%以上向上した。データセットとコードは https://github.com/g-luo/geolocation_via_guidebook_grounding にある。

We demonstrate how language can improve geolocation: the task of predicting the location where an image was taken. Here we study explicit knowledge from human-written guidebooks that describe the salient and class-discriminative visual features humans use for geolocation. We propose the task of Geolocation via Guidebook Grounding that uses a dataset of StreetView images from a diverse set of locations and an associated textual guidebook for GeoGuessr, a popular interactive geolocation game. Our approach predicts a country for each image by attending over the clues automatically extracted from the guidebook. Supervising attention with country-level pseudo labels achieves the best performance. Our approach substantially outperforms a state-of-the-art image-only geolocation method, with an improvement of over 5% in Top-1 accuracy. Our dataset and code can be found at https://github.com/g-luo/geolocation_via_guidebook_grounding.

翻訳日:2022-11-29 15:22:02 公開日:2022-11-28

# 多様な嗜好を持つヒトの合意を見つけるための微調整言語モデル

Fine-tuning language models to find agreement among humans with diverse preferences ( http://arxiv.org/abs/2211.15006v1 )

ライセンス: Link先を確認

Michiel A. Bakker and Martin J. Chadwick and Hannah R. Sheahan and Michael Henry Tessler and Lucy Campbell-Gillingham and Jan Balaguer and Nat McAleese and Amelia Glaese and John Aslanides and Matthew M. Botvinick and Christopher Summerfield

(参考訳) 大規模言語モデリング(LLM)における最近の研究は、出力をプロトタイプユーザの好みに合わせるために微調整を用いている。この研究は、人間の嗜好が個人間で静的で均質であると仮定し、単一の"ジェネリック"なユーザーとの整合がより一般的な整合性を与える。ここでは、人間の嗜好の不均一性を受け入れて、異なる課題を考える: 多様な視点を持つ人々が合意を見つけるのに、マシンはどのように役立つのか? 我々は700億のパラメータllmを微調整し、多様な意見を持つグループに対して、期待される承認を最大化する声明を生成する。人間の参加者は、道徳的問題や政治的問題(例えば、「富裕層に税金を課すべきか?」など)に関する数千の質問について意見書を提出し、LLMが生成した合意と品質に関する合意書を評価する。次に、報酬モデルは個々の選好を予測するために訓練され、異なる集約(社会福祉)機能に従って定義されたグループ全体へのアピールの観点からコンセンサスステートメントを定量化しランク付けすることができる。このモデルでは, LLM(>70%)よりも人間の方が好まれるコンセンサス文を生成し, 最終ランク付けステップに欠ける厳密な微調整ベースラインを著しく上回っている。さらに、ベストモデルのコンセンサスステートメントは、最高の人間生成の意見(>65%)よりも好まれます。グループメンバーのサブセットからのみ合意文を静かに構築すると、除外されたメンバは反対する傾向があり、個々のコントリビューションに対する合意の感受性が明らかになる。これらの結果は、人間のグループ同士の価値観の整合を支援するためにLLMを使うことの可能性を強調している。

Recent work on large language models (LLMs) has used fine-tuning to align outputs with the preferences of a prototypical user. This work assumes that human preferences are static and homogeneous across individuals, so that aligning to a single "generic" user will confer more general alignment. Here, we embrace the heterogeneity of human preferences to consider a different challenge: how might a machine help people with diverse views find agreement? We fine-tune a 70 billion parameter LLM to generate statements that maximize the expected approval for a group of people with potentially diverse opinions. Human participants provide written opinions on thousands of questions touching on moral and political issues (e.g., "should we raise taxes on the rich?"), and rate the LLM's generated candidate consensus statements for agreement and quality. A reward model is then trained to predict individual preferences, enabling it to quantify and rank consensus statements in terms of their appeal to the overall group, defined according to different aggregation (social welfare) functions. The model produces consensus statements that are preferred by human users over those from prompted LLMs (>70%) and significantly outperforms a tight fine-tuned baseline that lacks the final ranking step. Further, our best model's consensus statements are preferred over the best human-generated opinions (>65%). We find that when we silently constructed consensus statements from only a subset of group members, those who were excluded were more likely to dissent, revealing the sensitivity of the consensus to individual contributions. These results highlight the potential to use LLMs to help groups of humans align their values with one another.
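
The ranking step described above, which scores candidate statements by their appeal to the whole group under different aggregation functions, can be illustrated with a small sketch. The abstract does not name the welfare functions used, so the utilitarian (mean) and Rawlsian (minimum) aggregations below, along with the candidate names and predicted scores, are assumptions chosen for illustration.

```python
def utilitarian(scores):
    """Average predicted approval across group members."""
    return sum(scores) / len(scores)


def rawlsian(scores):
    """Predicted approval of the least satisfied group member."""
    return min(scores)


def rank_statements(predicted, welfare):
    """Order candidate statements from highest to lowest group welfare.

    predicted: dict mapping a statement id to per-member predicted approval
               (in the paper's setup these would come from a reward model;
               here they are made-up numbers).
    """
    return sorted(predicted, key=lambda s: welfare(predicted[s]), reverse=True)


predicted = {
    "A": [0.9, 0.9, 0.1],   # strongly liked by two members, disliked by one
    "B": [0.6, 0.6, 0.6],   # moderately liked by everyone
}

by_mean = rank_statements(predicted, utilitarian)
by_min = rank_statements(predicted, rawlsian)
```

The two aggregations disagree on this toy input: the mean favors statement A, while the minimum favors statement B, which shows why the choice of welfare function matters for what counts as consensus.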

翻訳日:2022-11-29 15:21:21 公開日:2022-11-28

# カテゴリーデータに対する連続拡散

Continuous diffusion for categorical data ( http://arxiv.org/abs/2211.15089v1 )

ライセンス: Link先を確認

Sander Dieleman, Laurent Sartran, Arman Roshannai, Nikolay Savinov, Yaroslav Ganin, Pierre H. Richemond, Arnaud Doucet, Robin Strudel, Chris Dyer, Conor Durkan, Curtis Hawthorne, Rémi Leblond, Will Grathwohl, Jonas Adler

(参考訳) 拡散モデルは、反復的洗練による知覚信号(画像や音など)の生成のパラダイムとして急速に発展してきた。彼らの成功は、基礎となる物理現象が連続しているという事実にかかっている。言語のような本質的に離散的で分類的なデータに対して、様々な拡散にインスパイアされた代替案が提案されている。しかし、拡散モデルの連続的な性質は多くの利点をもたらしており、この研究ではそれを保存しようと努力する。時間空間と入力空間の両方で連続的な拡散モデルを用いて分類データをモデル化するCDCDを提案する。いくつかの言語モデリングタスクにおいて有効性を示す。

Diffusion models have quickly become the go-to paradigm for generative modelling of perceptual signals (such as images and sound) through iterative refinement. Their success hinges on the fact that the underlying physical phenomena are continuous. For inherently discrete and categorical data such as language, various diffusion-inspired alternatives have been proposed. However, the continuous nature of diffusion models conveys many benefits, and in this work we endeavour to preserve it. We propose CDCD, a framework for modelling categorical data with diffusion models that are continuous both in time and input space. We demonstrate its efficacy on several language modelling tasks.

翻訳日:2022-11-29 15:20:50 公開日:2022-11-28

# 事前学習言語モデルにおける科学的・創造的アナロジー

Scientific and Creative Analogies in Pretrained Language Models ( http://arxiv.org/abs/2211.15268v1 )

ライセンス: Link先を確認

Tamara Czinczoll, Helen Yannakoudakis, Pushkar Mishra, Ekaterina Shutova

(参考訳) 本稿では,BERT や GPT-2 などの大規模事前学習言語モデルにおけるアナログの符号化について検討する。既存の類似データセットは、典型的には類似関係の限られた集合に焦点をあて、類似が持つ2つの領域の類似度が高い。より現実的な設定として、異種ドメイン間の複数の属性と関係構造の体系的なマッピングを含む新しいアナログデータセットであるScientific and Creative Analogy dataset(SCAN)を紹介する。このデータセットを用いて、広く使われている事前学習言語モデル(LM)の類似推論機能をテストする。現状のLMはこれらの複雑なアナロジータスクにおいて低性能を実現し、アナロジー理解によってもたらされる課題を浮き彫りにする。

This paper examines the encoding of analogy in large-scale pretrained language models, such as BERT and GPT-2. Existing analogy datasets typically focus on a limited set of analogical relations, with a high similarity of the two domains between which the analogy holds. As a more realistic setup, we introduce the Scientific and Creative Analogy dataset (SCAN), a novel analogy dataset containing systematic mappings of multiple attributes and relational structures across dissimilar domains. Using this dataset, we test the analogical reasoning capabilities of several widely-used pretrained language models (LMs). We find that state-of-the-art LMs achieve low performance on these complex analogy tasks, highlighting the challenges still posed by analogy understanding.

翻訳日:2022-11-29 15:20:42 公開日:2022-11-28

# eコマースサイトにおける感情分析と意見マイニング

Sentiment analysis and opinion mining on E-commerce site ( http://arxiv.org/abs/2211.15536v1 )

ライセンス: Link先を確認

Fatema Tuz Zohra Anny and Oahidul Islam

(参考訳) 感情分析や意見マイニングは、NLP(Natural Language Processing)というフレーズを説明するのに役立つ。近年では感性分析が最も重要な話題となっている。本研究の目的は,感情分析における感情極性分類の課題を解決することである。全体的プロセスの説明とともに、感情的反対を分類する幅広い手法が提示される。分析の結果,文レベルの分類とレビューレベルの分類の両方が行われる。最後に,今後の感情分析研究の計画について述べる。

Sentiment analysis, also known as opinion mining, is a task in natural language processing (NLP) and has been one of the most significant topics in recent years. The goal of this study is to address the challenges of sentiment polarity classification in sentiment analysis. A general approach to classifying sentiment polarity is presented, along with comprehensive explanations of the process. Using the results of the analysis, both sentence-level and review-level classification are conducted. Finally, we discuss our plans for future sentiment analysis research.

翻訳日:2022-11-29 15:20:11 公開日:2022-11-28

# 常識推論のためのGPT-Neo-理論と実用的なレンズ

GPT-Neo for commonsense reasoning-a theoretical and practical lens ( http://arxiv.org/abs/2211.15593v1 )

ライセンス: Link先を確認

Rohan Kashyap, Vivek Kashyap, Narendra C.P

(参考訳) 最近の研究は、GPT-2、GPT-3、GPT-neoのような大規模一方向言語モデルを事前訓練し、次いで下流タスクで微調整することで大幅な向上が得られることを示している。本稿では,コモンセンス推論タスクにおけるGPT-neoの13億パラメータモデルの性能評価を行う。 6つのコモンセンス推論ベンチマークタスクのモデル性能を評価し,これらのタスクの精度スコアを報告する。適切なハイパーパラメータを用いて微調整を行うと、このうち3つのタスクでは競争力のあるスコアを得るが、データセットのサイズが著しく小さい場合には苦労する。これらのタスクのいくつかにおけるモデルの低い性能は、これらのデータセットに固有の難しさを示唆しており、限られた訓練サンプルからは一貫したパターンを確立できないためである。また,モデルの性能をよりよく理解するために,可視化を用いて結果を検証し,多数の推論テストを実施した。最後に,様々な手法を用いて徹底的なロバストネステストを行い,多数の設定条件下でモデル性能を測定した。これらの結果から, GPT-3の1750億パラメータモデルよりも小さい言語モデルを探索することで、自然言語理解を必要とするタスクを遂行できる可能性が示唆された。

Recent work has demonstrated substantial gains from pre-training large-scale unidirectional language models such as GPT-2, GPT-3, and GPT-neo, followed by fine-tuning on a downstream task. In this paper, we evaluate the performance of the GPT-neo 1.3 billion parameter model on commonsense reasoning tasks. We assess the model's performance on six commonsense reasoning benchmark tasks and report the accuracy scores for these tasks. When fine-tuned using the right set of hyperparameters, the model obtains competitive scores on three of these tasks but struggles when the dataset size is significantly smaller. The low performance on a few of these tasks suggests the inherent difficulty of these datasets, since the model fails to establish coherent patterns given their limited training samples. We also investigate and substantiate our results using visualization and conduct numerous inference tests to understand the model's performance better. Finally, we conduct thorough robustness tests using various methods to gauge the model's performance under numerous settings. These findings suggest a promising path for exploring language models smaller than the GPT-3 175 billion parameter model for tasks requiring natural language understanding.

翻訳日:2022-11-29 15:20:05 公開日:2022-11-28

# コンテキスト内学習はどのような学習アルゴリズムか? 線形モデルによる研究

What learning algorithm is in-context learning? Investigations with linear models ( http://arxiv.org/abs/2211.15661v1 )

ライセンス: Link先を確認

Ekin Akyürek, Dale Schuurmans, Jacob Andreas, Tengyu Ma, Denny Zhou

(参考訳) ニューラルシーケンスモデル、特にトランスフォーマーは、文脈内学習において顕著な能力を示す。入力に提示されたラベル付き例 $(x, f(x))$ のシーケンスから、追加のパラメータ更新なしに新しい予測器を構築することができる。本稿では,トランスフォーマーをベースとしたインコンテキスト学習者が,より小さなモデルをアクティベーションに符号化し,文脈に新しい例が現れるたびにこの暗黙的なモデルを更新することで,標準的な学習アルゴリズムを暗黙的に実装するという仮説について検討する。線形回帰を原型問題として用いることで,この仮説の証拠を3つ提示する。まず, 勾配降下と閉形式リッジ回帰に基づく線形モデルのための学習アルゴリズムをトランスフォーマーが実装できることを構成的に証明する。第2に, 訓練された学習者は, 勾配降下, リッジ回帰, および厳密な最小二乗回帰によって計算された予測器と密接に一致し, トランスフォーマーの深さやデータセットのノイズの変化に応じて異なる予測器の間を遷移し, 幅と深さが大きい場合にはベイズ推定器に収束することを示す。第3に,文脈内学習者がこれらの予測器とアルゴリズム的特徴を共有しているという予備的証拠を示す。具体的には,学習者の後期層が重みベクトルやモーメント行列を非線形にエンコードしている。これらの結果は,文脈内学習がアルゴリズム的に理解可能であり,(少なくとも線形の場合)学習者が標準推定アルゴリズムを再発見しうることを示唆している。コードと参照実装は https://github.com/ekinakyurek/google-research/blob/master/incontext で公開されている。

Neural sequence models, especially transformers, exhibit a remarkable capacity for in-context learning. They can construct new predictors from sequences of labeled examples $(x, f(x))$ presented in the input without further parameter updates. We investigate the hypothesis that transformer-based in-context learners implement standard learning algorithms implicitly, by encoding smaller models in their activations, and updating these implicit models as new examples appear in the context. Using linear regression as a prototypical problem, we offer three sources of evidence for this hypothesis. First, we prove by construction that transformers can implement learning algorithms for linear models based on gradient descent and closed-form ridge regression. Second, we show that trained in-context learners closely match the predictors computed by gradient descent, ridge regression, and exact least-squares regression, transitioning between different predictors as transformer depth and dataset noise vary, and converging to Bayesian estimators for large widths and depths. Third, we present preliminary evidence that in-context learners share algorithmic features with these predictors: learners' late layers non-linearly encode weight vectors and moment matrices. These results suggest that in-context learning is understandable in algorithmic terms, and that (at least in the linear case) learners may rediscover standard estimation algorithms. Code and reference implementations are released at https://github.com/ekinakyurek/google-research/blob/master/incontext.
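
For readers unfamiliar with the two reference predictors named above, the sketch below compares closed-form ridge regression with gradient descent on the same regularized objective, in the scalar case. It is only an illustration of these standard estimators, not the paper's transformer construction; the data, the regularization strength `lam`, the learning rate, and the step count are assumptions.

```python
def ridge_closed_form(xs, ys, lam):
    """Scalar ridge solution: argmin_w 0.5*sum((w*x - y)^2) + 0.5*lam*w^2."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def ridge_gradient_descent(xs, ys, lam, lr=0.01, steps=200):
    """Minimize the same objective iteratively, starting from w = 0."""
    w = 0.0
    for _ in range(steps):
        grad = sum((w * x - y) * x for x, y in zip(xs, ys)) + lam * w
        w -= lr * grad
    return w


# In-context style examples (x, f(x)) with f(x) = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
lam = 1.0

w_ridge = ridge_closed_form(xs, ys, lam)
w_gd = ridge_gradient_descent(xs, ys, lam)
```

On this toy data the two procedures agree to numerical precision after enough steps, while a gradient descent run with few steps gives a different predictor. A dependence of this kind on the amount of computation is consistent with the abstract's observation that the matched predictor changes with transformer depth.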

翻訳日:2022-11-29 15:19:45 公開日:2022-11-28

# 光沢度に基づく有意義手話機械翻訳に関する一考察

Considerations for meaningful sign language machine translation based on glosses ( http://arxiv.org/abs/2211.15464v1 )

ライセンス: Link先を確認

Mathias Müller, Zifan Jiang, Amit Moryossef, Annette Rios, Sarah Ebling

(参考訳) 自然言語処理(NLP)の研究(Yin et al., 2021)では,手話の自動処理が普及している。特に機械翻訳(MT)では、グルースに基づく手話翻訳が顕著なアプローチである。本稿では,ニューラルグロス翻訳に関する最近の研究について概説する。一般的なグルースの制限や特定のデータセットの制限は、透過的な方法では議論されず、評価の共通標準が存在しないことがわかった。これらの課題に対処するため,光沢翻訳研究の具体的な提言を行った。提案では,光沢に基づくアプローチ,現実的なデータセット,より強固なベースライン,説得力のある評価という本質的な限界に対する認識を提唱する。

Automatic sign language processing is gaining popularity in Natural Language Processing (NLP) research (Yin et al., 2021). In machine translation (MT) in particular, sign language translation based on glosses is a prominent approach. In this paper, we review recent works on neural gloss translation. We find that limitations of glosses in general and limitations of specific datasets are not discussed in a transparent manner and that there is no common standard for evaluation. To address these issues, we put forward concrete recommendations for future research on gloss translation. Our suggestions advocate awareness of the inherent limitations of gloss-based approaches, realistic datasets, stronger baselines and convincing evaluation.

翻訳日:2022-11-29 15:12:24 公開日:2022-11-28

# 言語間移動のためのフラストレーションやすいラベル投影法

Frustratingly Easy Label Projection for Cross-lingual Transfer ( http://arxiv.org/abs/2211.15613v1 )

ライセンス: Link先を確認

Yang Chen, Chao Jiang, Alan Ritter, Wei Xu

(参考訳) 訓練データを多くの言語に翻訳することは、言語間転送を改善するための実用的な解決策として現れてきた。情報抽出や質問応答などのスパンレベルのアノテーションを含むタスクには、注釈付きスパンを翻訳されたテキストにマッピングするために追加のラベル投影ステップが必要である。近年, ラベル付きスパンの周囲に特別なマーカーを挿入することにより, 翻訳と投影を共同で行うための簡易なマーク翻訳手法が試みられている。しかし、我々の知る限り、この手法が単語アライメントに基づく従来のアノテーション投影とどのように比較されるかについては、実証的な分析は行われていない。本稿では,42言語および3つのタスク(QA,NER,イベント抽出)にまたがる広範な実証的研究を行い,両手法の有効性と限界を評価し,文献における重要なギャップを埋める。実験結果から,我々はEasyProjectと呼ぶマーク-then-translateの最適化版を多くの言語に適用しやすく,驚くほどうまく動作し,より複雑な単語アライメント方式よりも優れていることがわかった。エンドタスクのパフォーマンスに影響を与えるいくつかの重要な要因を分析し、EasyProjectが翻訳後のラベルスパン境界を正確に保存できることを示す。すべてのコードとデータを公開します。

Translating training data into many languages has emerged as a practical solution for improving cross-lingual transfer. For tasks that involve span-level annotations, such as information extraction or question answering, an additional label projection step is required to map annotated spans onto the translated texts. Recently, a few efforts have utilized a simple mark-then-translate method to jointly perform translation and projection by inserting special markers around the labeled spans in the original sentence. However, as far as we are aware, no empirical analysis has been conducted on how this approach compares to traditional annotation projection based on word alignment. In this paper, we present an extensive empirical study across 42 languages and three tasks (QA, NER, and Event Extraction) to evaluate the effectiveness and limitations of both methods, filling an important gap in the literature. Experimental results show that our optimized version of mark-then-translate, which we call EasyProject, is easily applied to many languages and works surprisingly well, outperforming the more complex word alignment-based methods. We analyze several key factors that affect end-task performance, and show EasyProject works well because it can accurately preserve label span boundaries after translation. We will publicly release all our code and data.
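
The mark-then-translate idea described above can be sketched in a few lines: wrap each labeled span in markers, translate the marked sentence, then strip the markers and read off the projected spans. The square-bracket markers and the dictionary-based `fake_translate` stand-in for a machine translation system are assumptions for illustration; the abstract does not specify EasyProject's markers or translation model.

```python
def mark_spans(text, spans):
    """Wrap each (start, end) character span in [ ] markers.

    Spans are processed right to left so earlier offsets stay valid.
    """
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + "[" + text[start:end] + "]" + text[end:]
    return text


def recover_spans(marked):
    """Strip markers and return (clean_text, spans) in the clean text's offsets."""
    clean, spans, start = [], [], None
    for ch in marked:
        if ch == "[":
            start = len(clean)
        elif ch == "]" and start is not None:
            spans.append((start, len(clean)))
            start = None
        else:
            clean.append(ch)
    return "".join(clean), spans


def fake_translate(marked):
    """Hypothetical stand-in for an MT system that preserves the markers."""
    return {"[Alice] visited [Paris]": "[Alice] besuchte [Paris]"}[marked]


source = "Alice visited Paris"
source_spans = [(0, 5), (14, 19)]          # "Alice", "Paris"

marked = mark_spans(source, source_spans)
target, target_spans = recover_spans(fake_translate(marked))
projected = [target[s:e] for s, e in target_spans]
```

The approach relies on the translation system carrying the markers through intact. When a marker is dropped or misplaced, the corresponding span cannot be recovered this way and needs a fallback, which this sketch does not attempt.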

翻訳日:2022-11-29 15:12:13 公開日:2022-11-28

# データセットの数え方を超えて:多言語データセットの構築と必要なリソースの調査

Beyond Counting Datasets: A Survey of Multilingual Dataset Construction and Necessary Resources ( http://arxiv.org/abs/2211.15649v1 )

ライセンス: Link先を確認

Xinyan Velocity Yu, Akari Asai, Trina Chatterjee, Junjie Hu and Eunsol Choi

(参考訳) NLPコミュニティは言語間の資源格差を概ね認識しているが、そのような格差の程度と種類を定量化する研究は不足している。データセットの品質にはばらつきがあり、多くのデータセットは英語データから自動的に生成または翻訳されているため、データセットの数に基づいてリソースの可用性を推定する従来の調査は誤解を招く可能性がある。言語資源のより包括的な全体像を示すため、公開されている156個のNLPデータセットの特徴を調査する。入力テキストやラベルの出所、構築に使用されたツールなどの作成方法と、研究対象、扱うタスク、作成の動機を手作業で注釈付けした。言語間のNLPリソース格差を質的側面から定量化した後、低リソース言語におけるデータ収集を改善する方法について論じる。言語に習熟したNLP研究者とクラウドワーカーを言語ごとに調査したところ、その推定可用性はデータセットの可用性と相関していることがわかった。クラウドソーシング実験を通じて、Mechanical Turkプラットフォーム上で高品質な多言語データを収集するための戦略を特定する。最後に、今後の多言語データ開発に向けて、NLPコミュニティと個々の研究者に対しマクロおよびミクロレベルの提案を行う。

While the NLP community is generally aware of resource disparities among languages, we lack research that quantifies the extent and types of such disparity. Prior surveys estimating the availability of resources based on the number of datasets can be misleading as dataset quality varies: many datasets are automatically induced or translated from English data. To provide a more comprehensive picture of language resources, we examine the characteristics of 156 publicly available NLP datasets. We manually annotate how they are created, including input text and label sources and tools used to build them, and what they study, tasks they address and motivations for their creation. After quantifying the qualitative NLP resource gap across languages, we discuss how to improve data collection in low-resource languages. We survey language-proficient NLP researchers and crowd workers per language, finding that their estimated availability correlates with dataset availability. Through crowdsourcing experiments, we identify strategies for collecting high-quality multilingual data on the Mechanical Turk platform. We conclude by making macro and micro-level suggestions to the NLP community and individual researchers for future multilingual data development.

翻訳日:2022-11-29 15:11:51 公開日:2022-11-28

# 論争的な刺激を伴う表現的ジオメトリの識別:ベイズの実験設計と相違性判定への応用

Distinguishing representational geometries with controversial stimuli: Bayesian experimental design and its application to face dissimilarity judgments ( http://arxiv.org/abs/2211.15053v1 )

ライセンス: Link先を確認

Tal Golan, Wenxuan Guo, Heiko H. Schütt, Nikolaus Kriegeskorte

(参考訳) ニューラルネットワークの各層における複雑な刺激の表現を、人間の脳の表現や行動判断と比較することは、モデル開発の指針となる。しかし、定性的に異なるニューラルネットワークモデルであっても、典型的な刺激セットに対しては類似した表現幾何学を予測することが多い。本稿では、表現モデル間の優劣を効率的に判定するための刺激セットを合成するベイズ実験計画手法を提案する。本手法を、顔の非類似性に関する行動判断の候補ニューラルネットワークモデルの識別に適用する。その結果、3次元顔モデルのグラフィックスレンダラーを反転(逆変換)するように訓練されたニューラルネットワークは、識別、分類、自己符号化で訓練された同じアーキテクチャよりも人間との整合性が高いことが示された。提案する刺激合成の目的関数は、モデル比較のために表現類似性解析で分析される実験の設計に一般に適用できる。

Comparing representations of complex stimuli in neural network layers to human brain representations or behavioral judgments can guide model development. However, even qualitatively distinct neural network models often predict similar representational geometries of typical stimulus sets. We propose a Bayesian experimental design approach to synthesizing stimulus sets for adjudicating among representational models efficiently. We apply our method to discriminate among candidate neural network models of behavioral face dissimilarity judgments. Our results indicate that a neural network trained to invert a 3D-face-model graphics renderer is more human-aligned than the same architecture trained on identification, classification, or autoencoding. Our proposed stimulus synthesis objective is generally applicable to designing experiments to be analyzed by representational similarity analysis for model comparison.

翻訳日:2022-11-29 15:04:11 公開日:2022-11-28

# CycleGAN拡張による地域降雨予報

Regional Precipitation Nowcasting Based on CycleGAN Extension ( http://arxiv.org/abs/2211.15046v1 )

ライセンス: Link先を確認

Jaeho Choi, Yura Kim, Kwang-Ho Kim, Sung-Hwa Jung, Ikhyun Cho

(参考訳) 2022年8月8日、異例の集中豪雨が韓国中部を襲った。多くの低地が水没し、交通と生活は深刻な麻痺状態に陥った。わずか数時間の豪雨が甚大な被害をもたらしたのである。この出来事は、より信頼性の高い地域降水ナウキャスティング手法の必要性を改めて認識させた。本稿では、サイクル一貫性のある敵対的ネットワーク(CycleGAN)を時系列領域に導入して拡張し、地域降水ナウキャスティングのための信頼性の高いモデルを提案する。提案モデルは、現在時刻から10分後の合成ハイブリッド地表降雨(HSR)データを生成する。また、訓練の時間ステップを段階的に延長することで、最大2時間先までの信頼性の高い予測を行う。既存の複雑なナウキャスティング手法とは異なり、提案モデルはリカレントニューラルネットワーク(RNN)を使用せず、サイクル内の逐次的な訓練によって時間的因果性を確保する。提案する降水ナウキャスティング手法は、RNNに基づく畳み込み長短期記憶(ConvLSTM)よりも優れた性能を示す。さらに、実際の定量的降水予測(QPF)モデルの一つであるMAPLE(ラグランジュ外挿による降水ナウキャスティングのためのMcGillアルゴリズム)との定性的・定量的比較により、本手法の優位性を示す。

Unusually intense heavy rain hit the central region of Korea on August 8, 2022. Many low-lying areas were submerged, and traffic and daily life were severely paralyzed. Just a few hours of torrential rain caused critical damage. This event reminded us of the need for a more reliable regional precipitation nowcasting method. In this paper, we bring cycle-consistent adversarial networks (CycleGAN) into the time-series domain and extend them to propose a reliable model for regional precipitation nowcasting. The proposed model generates composite hybrid surface rainfall (HSR) data for 10 minutes after the present time. Also, the proposed model provides reliable predictions up to 2 hours ahead through a gradual extension of the training time steps. Unlike existing complex nowcasting methods, the proposed model does not use recurrent neural networks (RNNs) and secures temporal causality via sequential training in the cycle. Our precipitation nowcasting method outperforms convolutional long short-term memory (ConvLSTM) based on RNNs. Additionally, we demonstrate the superiority of our approach through qualitative and quantitative comparisons against MAPLE, the McGill algorithm for precipitation nowcasting by Lagrangian extrapolation, one of the real quantitative precipitation forecast (QPF) models.

翻訳日:2022-11-29 14:55:43 公開日:2022-11-28

# 光タッチによるトランスフォーマーの多視点幾何学教育

A Light Touch Approach to Teaching Transformers Multi-view Geometry ( http://arxiv.org/abs/2211.15107v1 )

ライセンス: Link先を確認

Yash Bhalgat, Joao F. Henriques, Andrew Zisserman

(参考訳) トランスフォーマーは強力な視覚的学習者であり、多くの場合、手動で特定された事前情報がないためである。この柔軟性は、3次元形状と視点のほぼ無限のバリエーション(柔軟性が必要)と射影幾何学の正確な性質(剛性の法則に従えば)のため、多視点幾何学に関わるタスクにおいて問題となる。この混乱を解決するために,視覚トランスフォーマーに多視点幾何学を学ぶように誘導する「ライトタッチ」アプローチを提案する。我々は、エピポーラ線を用いてトランスフォーマーのクロスアテンションマップを誘導し、エピポーラ線外の注意値をペナルティ化し、それらの線に沿って高い注意を喚起する。従来の方法とは異なり、テスト時にカメラのポーズ情報を必要としない。検索画像と検索画像の視点の違いが大きいため,標準的なトランスフォーマーネットワークが苦労する,ポーズ不変オブジェクトインスタンス検索に注目する。提案手法は,テスト時にポーズ情報を必要とせず,オブジェクト検索における最先端の手法よりも優れている。

Transformers are powerful visual learners, in large part due to their conspicuous lack of manually-specified priors. This flexibility can be problematic in tasks that involve multiple-view geometry, due to the near-infinite possible variations in 3D shapes and viewpoints (requiring flexibility), and the precise nature of projective geometry (obeying rigid laws). To resolve this conundrum, we propose a "light touch" approach, guiding visual Transformers to learn multiple-view geometry but allowing them to break free when needed. We achieve this by using epipolar lines to guide the Transformer's cross-attention maps, penalizing attention values outside the epipolar lines and encouraging higher attention along these lines since they contain geometrically plausible matches. Unlike previous methods, our proposal does not require any camera pose information at test-time. We focus on pose-invariant object instance retrieval, where standard Transformer networks struggle, due to the large differences in viewpoint between query and retrieved images. Experimentally, our method outperforms state-of-the-art approaches at object retrieval, without needing pose information at test-time.
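To make the epipolar guidance concrete, the sketch below measures how far a candidate match lies from the epipolar line of a query point and sums the attention mass that falls off that line. It is a hedged illustration only: the fundamental matrix `F`, the tolerance `tol`, and both function names are assumptions, and the loss actually used in the paper may be defined differently.

```python
import math

def epipolar_distance(F, query_pt, key_pt):
    """Distance (in pixels) from key_pt to the epipolar line of query_pt.

    F is a 3x3 fundamental matrix given as nested lists; the line is l = F @ x
    with x the query point in homogeneous coordinates.
    """
    x = (query_pt[0], query_pt[1], 1.0)
    line = [sum(F[r][c] * x[c] for c in range(3)) for r in range(3)]
    num = abs(line[0] * key_pt[0] + line[1] * key_pt[1] + line[2])
    return num / math.hypot(line[0], line[1])

def off_line_attention(F, query_pt, key_pts, attn, tol=1.0):
    """Total attention assigned to key points farther than tol from the epipolar line."""
    return sum(a for a, k in zip(attn, key_pts)
               if epipolar_distance(F, query_pt, k) > tol)
```

For a purely horizontal camera translation, `F = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]` gives horizontal epipolar lines, so the distance reduces to the difference in the vertical coordinate; a training loss could then add `off_line_attention` as a penalty term.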

翻訳日:2022-11-29 14:55:21 公開日:2022-11-28

# マルチモーダル医療データ分析のためのヘテロジニアスグラフ学習

Heterogeneous Graph Learning for Multi-modal Medical Data Analysis ( http://arxiv.org/abs/2211.15158v1 )

ライセンス: Link先を確認

Sein Kim, Namkyeong Lee, Junseok Lee, Dongmin Hyun and Chanyoung Park

(参考訳) 患者の定期的な臨床訪問は、画像データだけでなく、患者に関する臨床情報を含む非画像データ、すなわち、自然界において医療データはマルチモーダルである。このような異質な形態は、同じ患者に対して異なる視点と相補的な視点を提供し、適切な組み合わせによってより正確な臨床判断をもたらす。しかしながら、その重要性にもかかわらず、マルチモーダル医療データを統一フレームワークに効果的に融合する方法は比較的注目されていない。本稿では,マルチモーダル医療データを融合するためのHetMed (Heterogeneous Graph Learning for Multi-modal Medical Data Analysis) というグラフベースの効果的なフレームワークを提案する。具体的には,複数種類の非画像特徴を組み込んだマルチプレックスネットワークを構築し,患者間の複雑な関係を体系的に捉えることにより,より正確な臨床判断を行う。様々な実世界のデータセットに対する大規模な実験は、HetMedの優位性と実用性を示している。 HetMedのソースコードはhttps://github.com/Sein-Kim/Multimodal-Medicalで入手できる。

Routine clinical visits of a patient produce not only image data, but also non-image data containing clinical information regarding the patient, i.e., medical data is multi-modal in nature. Such heterogeneous modalities offer different and complementary perspectives on the same patient, resulting in more accurate clinical decisions when they are properly combined. However, despite its significance, how to effectively fuse the multi-modal medical data into a unified framework has received relatively little attention. In this paper, we propose an effective graph-based framework called HetMed (Heterogeneous Graph Learning for Multi-modal Medical Data Analysis) for fusing the multi-modal medical data. Specifically, we construct a multiplex network that incorporates multiple types of non-image features of patients to capture the complex relationship between patients in a systematic way, which leads to more accurate clinical decisions. Extensive experiments on various real-world datasets demonstrate the superiority and practicality of HetMed. The source code for HetMed is available at https://github.com/Sein-Kim/Multimodal-Medical.

翻訳日:2022-11-29 14:55:00 公開日:2022-11-28

# ブリッジモード接続による文脈適応型ディープニューラルネットワーク

Context-Adaptive Deep Neural Networks via Bridge-Mode Connectivity ( http://arxiv.org/abs/2211.15436v1 )

ライセンス: Link先を確認

Nathan Drenkow, Alvin Tan, Chace Ashcraft, Kiran Karra

(参考訳) 安全クリティカルなアプリケーションにおける機械学習モデルのデプロイは、このようなモデルがさまざまな状況でうまく機能することを期待している(例えば、街路標識を分類するためのビジョンモデルは、様々な照明/天候条件下で農村部、都市、高速道路で機能するべきである)。しかし、これらのワンサイズモデルは通常、平均ケースパフォーマンスに最適化されており、名目上の条件では高いパフォーマンスを達成することを奨励するが、難しい状況や稀な状況では予期せぬ振る舞いに露呈する。そこで本研究では,文脈依存型モデルを学習するための新しい手法を提案する。ブリッジモード接続 (bmc) (garipov et al., 2018) を拡張して,モデルの無限アンサンブルを連続的なコンテキストの尺度上でトレーニングし,対応する評価コンテキストに特別に調整したモデルパラメータをサンプリングする。本研究では,リスクプロファイルの変化,ロングテール画像の統計・出現,コンテキスト依存分布シフトなど,画像分類タスクにおけるコンテキスト定義について検討する。これらの各ケースに対してbmc最適化の新たな拡張を開発し,各シナリオにおけるモデル性能をコンテキストにうまく調整できることを実験により実証した。

The deployment of machine learning models in safety-critical applications comes with the expectation that such models will perform well over a range of contexts (e.g., a vision model for classifying street signs should work in rural, city, and highway settings under varying lighting/weather conditions). However, these one-size-fits-all models are typically optimized for average case performance, encouraging them to achieve high performance in nominal conditions but exposing them to unexpected behavior in challenging or rare contexts. To address this concern, we develop a new method for training context-dependent models. We extend Bridge-Mode Connectivity (BMC) (Garipov et al., 2018) to train an infinite ensemble of models over a continuous measure of context such that we can sample model parameters specifically tuned to the corresponding evaluation context. We explore the definition of context in image classification tasks through multiple lenses including changes in the risk profile, long-tail image statistics/appearance, and context-dependent distribution shift. We develop novel extensions of the BMC optimization for each of these cases and our experiments demonstrate that model performance can be successfully tuned to context in each scenario.
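One way to picture "sampling model parameters tuned to a context" is a curve in weight space indexed by a scalar context value. The sketch below uses a quadratic Bezier curve, one of the curve families used in the mode-connectivity literature; whether the paper uses exactly this parameterization is an assumption, and `bezier_point` and its arguments are hypothetical names.

```python
def bezier_point(w_start, w_bend, w_end, t):
    """Weights at position t in [0, 1] on a quadratic Bezier curve.

    w_start and w_end are the two endpoint weight vectors, w_bend is the
    trainable bend point, and t plays the role of the continuous context.
    """
    a, b, c = (1 - t) ** 2, 2 * t * (1 - t), t ** 2
    return [a * s + b * m + c * e for s, m, e in zip(w_start, w_bend, w_end)]
```

At evaluation time, a context measurement (for example a normalized lighting level) would be mapped to `t`, and the returned vector loaded into the network.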

翻訳日:2022-11-29 14:53:50 公開日:2022-11-28

# 構成性向上のための相互排他性訓練と原始増強

Mutual Exclusivity Training and Primitive Augmentation to Induce Compositionality ( http://arxiv.org/abs/2211.15578v1 )

ライセンス: Link先を確認

Yichen Jiang, Xiang Zhou, Mohit Bansal

(参考訳) 最近のデータセットは、標準的なシーケンス対シーケンスモデルにおける体系的な一般化能力の欠如を明らかにしている。本研究では,Seq2seqモデルの振る舞いを分析し,相互排他バイアスの欠如(すなわち,すでに対象配列にマッピングされたソースシーケンスが他のターゲットシーケンスにマッピングされる可能性が低い)と,構造を内容から切り離すのではなく,全体例を記憶する傾向という2つの要因を同定する。我々は,これら2つの課題にそれぞれ対処するための2つの手法を提案している: 相互排他的訓練は,新奇な例に対面した場合にモデルが現れるのを防止し,類似性に基づく損失による未発見の例を発生させる;prim2primxデータ拡張は,すべての構文関数の引数を自動的に多様化し,暗記化を防止し,テストセットデータを露呈することなく構成的帰納的バイアスを与える。これら2つの手法を組み合わせることで,SCAN と COGS の2つの広く使用されている構成性データセット上で,標準シーケンス列列モデル (LSTM と Transformer ) を用いた経験的改善が得られた。最後に,改善点と残る課題を特徴とする分析を行い,本手法の詳細なアブレーションを行う。私たちのコードはhttps://github.com/owenzx/met-primaugで利用可能です。

Recent datasets expose the lack of the systematic generalization ability in standard sequence-to-sequence models. In this work, we analyze this behavior of seq2seq models and identify two contributing factors: a lack of mutual exclusivity bias (i.e., a source sequence already mapped to a target sequence is less likely to be mapped to other target sequences), and the tendency to memorize whole examples rather than separating structures from contents. We propose two techniques to address these two issues respectively: Mutual Exclusivity Training that prevents the model from producing seen generations when facing novel, unseen examples via an unlikelihood-based loss; and prim2primX data augmentation that automatically diversifies the arguments of every syntactic function to prevent memorizing and provide a compositional inductive bias without exposing test-set data. Combining these two techniques, we show substantial empirical improvements using standard sequence-to-sequence models (LSTMs and Transformers) on two widely-used compositionality datasets: SCAN and COGS. Finally, we provide analysis characterizing the improvements as well as the remaining challenges, and provide detailed ablations of our method. Our code is available at https://github.com/owenzx/met-primaug
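The unlikelihood-style objective mentioned above can be illustrated with a few lines. This is a generic sketch of an unlikelihood term, assuming `token_probs` holds the model's probabilities for tokens of an already-seen output when it is shown a novel input; it is not the exact Mutual Exclusivity Training loss.

```python
import math

def unlikelihood_loss(token_probs, eps=1e-8):
    """Average of -log(1 - p): grows as the model puts mass on unwanted tokens."""
    return -sum(math.log(max(1.0 - p, eps)) for p in token_probs) / len(token_probs)
```

The term is zero when the unwanted tokens receive no probability and increases sharply as their probability approaches one, which is what discourages repeating seen outputs for unseen inputs.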

翻訳日:2022-11-29 14:47:37 公開日:2022-11-28

# パラメータ効率の良いファインチューニングの有効性について

On the Effectiveness of Parameter-Efficient Fine-Tuning ( http://arxiv.org/abs/2211.15583v1 )

ライセンス: Link先を確認

Zihao Fu, Haoran Yang, Anthony Man-Cho So, Wai Lam, Lidong Bing, Nigel Collier

(参考訳) 微調整事前学習モデルは、幅広いNLPタスクに有効であることが広く証明されている。しかし、モデル全体を微調整することはパラメータ非効率であり、常にタスクごとに全く新しいモデルを生成する。現在、多くの研究がパラメータのごく一部だけを微調整し、多くのパラメータを異なるタスクで共有することを提案している。これらの手法は驚くほど優れた性能を達成し、対応する完全微調整のものよりも安定であることが示される。しかし、そのような方法はまだよく分かっていない。パラメータの空間性は、どのように有望なパフォーマンスをもたらすのか? なぜモデルは完全に調整されたモデルよりも安定しているのか? チューニング可能なパラメータの選び方? 本稿では,既存の手法をまずランダムアプローチ,ルールベースアプローチ,投射ベースアプローチに分類し,どのパラメータをチューニングするかを選択する。そして,全ての手法が実際に微調整されたモデルに分散していることを示し,新しい理論解析を行う。安定性の上限を制御して元のモデルに正規化を実際に与えていることを示す。このような安定性は、最近の多くの研究で実証的に観察されたより優れた一般化能力をもたらす。我々の理論が根拠としているスパーシティの有効性にもかかわらず、チューニング可能なパラメータを選択する方法は依然として未解決の問題である。調整可能なパラメータをよりよく選択するために,解析的に解ける最適化関数を用いて元の問題を近似する新しい二階近似法(SAM)を提案する。可変パラメータは近似関数を直接最適化することによって決定される。実験結果から,提案するsamモデルは,強いベースラインモデルよりも優れており,理論解析も検証できることがわかった。

Fine-tuning pre-trained models has been widely shown to be effective in a wide range of NLP tasks. However, fine-tuning the whole model is parameter-inefficient, as it always yields an entirely new model for each task. Currently, many research works propose to fine-tune only a small portion of the parameters while keeping most of the parameters shared across different tasks. These methods achieve surprisingly good performance and are shown to be more stable than their corresponding fully fine-tuned counterparts. However, such methods are still not well understood. Some natural questions arise: How does parameter sparsity lead to promising performance? Why is the model more stable than fully fine-tuned models? How should the tunable parameters be chosen? In this paper, we first categorize the existing methods into random approaches, rule-based approaches, and projection-based approaches based on how they choose which parameters to tune. Then, we show that all of the methods are actually sparse fine-tuned models and conduct a novel theoretical analysis of them. We indicate that the sparsity is actually imposing a regularization on the original model by controlling the upper bound of the stability. Such stability leads to better generalization capability, which has been empirically observed in many recent research works. Despite the effectiveness of sparsity grounded by our theory, how to choose the tunable parameters remains an open problem. To better choose the tunable parameters, we propose a novel Second-order Approximation Method (SAM) which approximates the original problem with an analytically solvable optimization function. The tunable parameters are determined by directly optimizing the approximation function. The experimental results show that our proposed SAM model outperforms many strong baseline models and also verify our theoretical analysis.
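Sparse fine-tuning in the sense described above amounts to updating only a masked subset of parameters. The sketch below is a toy illustration under stated assumptions: the mask is chosen by largest gradient magnitude, which is a simple stand-in heuristic and not the paper's Second-order Approximation Method, and both function names are hypothetical.

```python
def topk_mask(scores, k):
    """Return a 0/1 mask that selects the k entries with the largest magnitude."""
    order = sorted(range(len(scores)), key=lambda i: abs(scores[i]), reverse=True)
    keep = set(order[:k])
    return [1 if i in keep else 0 for i in range(len(scores))]

def sparse_update(params, grads, mask, lr=0.1):
    """Gradient step applied only where mask == 1; other entries keep their pre-trained values."""
    return [p - lr * g * m for p, g, m in zip(params, grads, mask)]
```

Because the unmasked parameters never move, they can be shared across tasks, and only the small masked subset needs to be stored per task.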

翻訳日:2022-11-29 14:47:04 公開日:2022-11-28

# グローバル・ローカル構造をもつマルチタスク帯域における表現学習のサンプル複雑さについて

On the Sample Complexity of Representation Learning in Multi-task Bandits with Global and Local structure ( http://arxiv.org/abs/2211.15129v1 )

ライセンス: Link先を確認

Alessio Russo, Alexandre Proutiere

(参考訳) マルチタスクバンディット問題に対する最適アーム学習のサンプル複雑性について検討した。アームはタスク間で共有されるもの(表現と呼ぶもの)とタスク固有のもの(予測子と呼ばれるもの)の2つのコンポーネントで構成されています。目的は、最適表現がすべてのタスクに共通であると仮定して、各タスクの最適な(表現、予測)ペアを学ぶことである。このフレームワークでは、効率的な学習アルゴリズムはタスク間で知識を転送する必要がある。各ラウンドにおいて、学習者はタスクとアームの両方を積極的に選択し、対応する報酬を観察する。我々は、任意の$(\delta_g,\delta_h)$-pacアルゴリズムで満たされるインスタンス固有のサンプル複雑性下限を導出する(そのようなアルゴリズムは、最良表現を少なくとも1-\delta_g$、確率が少なくとも1-\delta_h$のタスクの最適予測器として識別する)。我々は,サンプル複雑性が下限に近づくアルゴリズムosrl-scを考案し,最大で$h(g\log(1/\delta_g)+x\log(1/\delta_h))$,$x,g,h$ をそれぞれタスク数,表現数,予測値として拡張する。比較として、このスケーリングは$hgx\log(1/\delta)$でスケールする古典的なベストアーム識別アルゴリズムよりもはるかに優れている。

We investigate the sample complexity of learning the optimal arm for multi-task bandit problems. Arms consist of two components: one that is shared across tasks (that we call representation) and one that is task-specific (that we call predictor). The objective is to learn the optimal (representation, predictor)-pair for each task, under the assumption that the optimal representation is common to all tasks. Within this framework, efficient learning algorithms should transfer knowledge across tasks. We consider the best-arm identification problem for a fixed confidence, where, in each round, the learner actively selects both a task, and an arm, and observes the corresponding reward. We derive instance-specific sample complexity lower bounds satisfied by any $(\delta_G,\delta_H)$-PAC algorithm (such an algorithm identifies the best representation with probability at least $1-\delta_G$, and the best predictor for a task with probability at least $1-\delta_H$). We devise an algorithm OSRL-SC whose sample complexity approaches the lower bound, and scales at most as $H(G\log(1/\delta_G)+ X\log(1/\delta_H))$, with $X,G,H$ being, respectively, the number of tasks, representations and predictors. By comparison, this scaling is significantly better than the classical best-arm identification algorithm that scales as $HGX\log(1/\delta)$.
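The gap between the two scalings quoted above is easy to check numerically. The sketch below simply evaluates the two expressions from the abstract, ignoring constants and instance-dependent factors; the function names and the example values of `X`, `G`, `H` and the confidence levels are assumptions chosen for illustration.

```python
import math

def osrl_sc_scaling(X, G, H, delta_G, delta_H):
    """H * (G * log(1/delta_G) + X * log(1/delta_H)), up to constants."""
    return H * (G * math.log(1 / delta_G) + X * math.log(1 / delta_H))

def classical_scaling(X, G, H, delta):
    """H * G * X * log(1/delta), up to constants."""
    return H * G * X * math.log(1 / delta)

# With equal confidence levels the ratio reduces to (G + X) / (G * X).
ratio = osrl_sc_scaling(10, 5, 4, 0.05, 0.05) / classical_scaling(10, 5, 4, 0.05)
```

With 10 tasks and 5 representations the ratio is 15/50, so the additive dependence on `G` and `X` already gives a sizeable saving, and the saving grows as either quantity increases.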

翻訳日:2022-11-29 14:46:18 公開日:2022-11-28

# ロングテールクロスモーダルハッシュ

Long-tail Cross Modal Hashing ( http://arxiv.org/abs/2211.15162v1 )

ライセンス: Link先を確認

Zijun Gao, Jun Wang, Guoxian Yu, Zhongmin Yan, Carlotta Domeniconi, Jinglin Zhang

(参考訳) 既存のクロスモーダルハッシュ法(cmh)は主にバランスのあるデータのために設計されているが、ロングテール分布を持つ不均衡なデータは現実世界でより一般的である。いくつかのロングテールハッシュ法が提案されているが、ラベルと個人間の複雑な相互作用とマルチモーダルデータの共通性情報のため、マルチモーダルデータには適応できない。さらに、cmh法は、各モダリティの個性によって符号化された末尾ラベルをオーバーライドするハッシュコードを学ぶために、多モードデータの共通性を発掘する。本稿では,不均衡なマルチモーダルデータを扱うLtCMH(Long-tail CMH)を提案する。 LtCMHはまず、各モダリティの個性と共通性を最小化し、これらのモダリティの共通性を高めることで、異なるモダリティの個性と共通性をマイニングするオートエンコーダを採用する。次に、個性と共通性を各モジュールから抽出した直接特徴と動的に組み合わせて、テールラベルの表現を豊かにするメタ特徴と、ハッシュコードを生成するバイナリメタ特徴を生成する。 LtCMHは、ロングテールデータセットの最先端ベースラインを著しく上回り、バランスの取れたラベルを持つデータセットの(あるいは同等の)パフォーマンスを向上する。

Existing Cross Modal Hashing (CMH) methods are mainly designed for balanced data, while imbalanced data with a long-tail distribution is more common in the real world. Several long-tail hashing methods have been proposed, but they cannot adapt to multi-modal data, due to the complex interplay between labels and the individuality and commonality information of multi-modal data. Furthermore, CMH methods mostly mine the commonality of multi-modal data to learn hash codes, which may override tail labels encoded by the individuality of respective modalities. In this paper, we propose LtCMH (Long-tail CMH) to handle imbalanced multi-modal data. LtCMH first adopts auto-encoders to mine the individuality and commonality of different modalities by minimizing the dependency between the individuality of respective modalities and by enhancing the commonality of these modalities. Then it dynamically combines the individuality and commonality with direct features extracted from respective modalities to create meta features that enrich the representation of tail labels, and binarizes the meta features to generate hash codes. LtCMH significantly outperforms state-of-the-art baselines on long-tail datasets and achieves better (or comparable) performance on datasets with balanced labels.

翻訳日:2022-11-29 14:45:48 公開日:2022-11-28

# YOLOv5モデルの海洋環境における微小物体検出への応用

Application of the YOLOv5 Model for the Detection of Microobjects in the Marine Environment ( http://arxiv.org/abs/2211.15218v1 )

ライセンス: Link先を確認

Aleksandr N. Grekov (1)(2), Yurii E. Shishkin, Sergei S. Peliushenko, Aleksandr S. Mavrin, ((1) Institute of Natural and Technical Systems, (2) Sevastopol State University)

(参考訳) 海洋環境における微小物体の自動検出と認識の問題を解決するためのYOLOV5機械学習モデルの有効性について検討した。マイクロプランクトンとマイクロプラスチックのサンプルを作成し,画像認識ニューラルネットワークを訓練するために,機密画像のデータベースを収集した。訓練されたネットワークを用いて、写真やビデオ画像中の微小物体をリアルタイムで見つける実験結果を示す。実験により, 海洋環境における微小物体の検出問題の解法において, 提案モデルを用いた手動認識に匹敵する高い効率性を示した。

The efficiency of using the YOLOv5 machine learning model for solving the problem of automatic detection and recognition of micro-objects in the marine environment is studied. Samples of microplankton and microplastics were prepared, from which a database of classified images was collected for training an image recognition neural network. The results of experiments using a trained network to find micro-objects in photo and video images in real time are presented. Experimental studies have shown high efficiency, comparable to manual recognition, of the proposed model in solving problems of detecting micro-objects in the marine environment.

翻訳日:2022-11-29 14:37:47 公開日:2022-11-28

# ビデオキャプションにおける周波数拡散に対する精細セマンティックエンハンスメント

Refined Semantic Enhancement towards Frequency Diffusion for Video Captioning ( http://arxiv.org/abs/2211.15076v1 )

ライセンス: Link先を確認

Xian Zhong, Zipeng Li, Shuqin Chen, Kui Jiang, Chen Chen and Mang Ye

(参考訳) ビデオキャプションは、与えられたビデオを正確に記述する自然言語文を生成することを目的としている。既存の手法では、エンコードフェーズでよりリッチな視覚的表現を探索したり、復号能力を向上させることで良好な生成が得られる。しかし、長い尾の問題はこれらの低周波トークンに対する試みを妨げ、これは稀に起こるが重要な意味論を持ち、詳細な生成において重要な役割を果たす。本稿では,不適切なトークンの言語表現を常に知覚するキャプションモデルである周波数拡散(rsfd)に対する新しい洗練された意味的拡張法を提案する。具体的には、低周波トークンの意味を理解するために、周波数対応拡散(FAD)モジュールを提案する。このようにして、トークンの吸収を不十分に促進してキャプションを洗練する。 fadに基づき、拡散過程によって引き起こされる高周波トークンの情報損失を補償するために、分散セマンティックスーパーバイザ(dss)モジュールを設計し、低周波トークンのセマンティクスをさらに強調し、ロングテール問題を軽減する。 RSFDは、MSR-VTTとMSVDという2つのベンチマークデータセット上で最先端の手法よりも優れており、低周波トークンセマンティクスの強化が競合する生成効果が得られることを示している。コードはhttps://github.com/lzp870/RSFDで入手できる。

Video captioning aims to generate natural language sentences that describe the given video accurately. Existing methods obtain favorable generation by exploring richer visual representations in the encoding phase or improving the decoding ability. However, the long-tailed problem hinders these attempts at low-frequency tokens, which rarely occur but carry critical semantics and play a vital role in detailed generation. In this paper, we introduce a novel Refined Semantic enhancement method towards Frequency Diffusion (RSFD), a captioning model that constantly perceives the linguistic representation of the infrequent tokens. Concretely, a Frequency-Aware Diffusion (FAD) module is proposed to comprehend the semantics of low-frequency tokens to break through generation limitations. In this way, the caption is refined by promoting the absorption of tokens with insufficient occurrence. Based on FAD, we design a Divergent Semantic Supervisor (DSS) module to compensate for the information loss of high-frequency tokens brought by the diffusion process, where the semantics of low-frequency tokens is further emphasized to alleviate the long-tailed problem. Extensive experiments indicate that RSFD outperforms the state-of-the-art methods on two benchmark datasets, i.e., MSR-VTT and MSVD, demonstrating that enhancing the semantics of low-frequency tokens can obtain a competitive generation effect. Code is available at https://github.com/lzp870/RSFD.

翻訳日:2022-11-29 14:36:40 公開日:2022-11-28

# Decoupled Prototypical Networkによる一般化カテゴリー探索

Generalized Category Discovery with Decoupled Prototypical Network ( http://arxiv.org/abs/2211.15115v1 )

ライセンス: Link先を確認

Wenbin An, Feng Tian, Qinghua Zheng, Wei Ding, QianYing Wang, Ping Chen

(参考訳) Generalized Category Discovery (GCD)は、既知のカテゴリのみをラベル付けした別のデータセットに基づいて、ラベルなしデータの集合から既知のカテゴリと新しいカテゴリの両方を認識することを目的としている。既知のカテゴリと新しいカテゴリの違いを考慮せずに、現在の手法はそれらを結合的に学習し、モデルの一般化と識別能力を損なう。さらに、結合的な訓練手法は、これらのモデルがラベル付きデータからラベルなしデータへカテゴリ固有の知識を明示的に転移することを妨げ、高レベルの意味情報を失わせ、モデル性能を損なう可能性がある。上記の制約を緩和するために,Decoupled Prototypical Network (DPN) と呼ばれる新しいモデルを提案する。カテゴリプロトタイプの二部マッチング問題を定式化することにより、DPNは、既知のカテゴリと新しいカテゴリを分離して、異なるトレーニング目標を効果的に達成するだけでなく、ラベル付きおよびラベルなしデータの既知のカテゴリを整列させて、カテゴリ固有の知識を明示的に伝達し、ハイレベルなセマンティクスを捉えることができる。さらに、DPNは、SPL(Semantic-aware Prototypical Learning)によって、既知のカテゴリと新しいカテゴリの両方のより差別的な特徴を学習することができる。意味のある意味情報を取得することに加えて、SPLは意味重み付けされたソフトアロケーションによって硬い擬似ラベルのノイズを軽減することもできる。大規模な実験により、DPNは複数のベンチマークデータセットのすべての評価指標に対して、最先端のモデルよりも大きなマージンで優れていることが示された。コードとデータはhttps://github.com/lackel/dpnで入手できる。

Generalized Category Discovery (GCD) aims to recognize both known and novel categories from a set of unlabeled data, based on another dataset labeled with only known categories. Without considering differences between known and novel categories, current methods learn about them in a coupled manner, which can hurt the model's generalization and discriminative ability. Furthermore, the coupled training approach prevents these models from transferring category-specific knowledge explicitly from labeled data to unlabeled data, which can lose high-level semantic information and impair model performance. To mitigate the above limitations, we present a novel model called Decoupled Prototypical Network (DPN). By formulating a bipartite matching problem for category prototypes, DPN can not only decouple known and novel categories to achieve different training targets effectively, but also align known categories in labeled and unlabeled data to transfer category-specific knowledge explicitly and capture high-level semantics. Furthermore, DPN can learn more discriminative features for both known and novel categories through our proposed Semantic-aware Prototypical Learning (SPL). Besides capturing meaningful semantic information, SPL can also alleviate the noise of hard pseudo labels through semantic-weighted soft assignment. Extensive experiments show that DPN outperforms state-of-the-art models by a large margin on all evaluation metrics across multiple benchmark datasets. Code and data are available at https://github.com/Lackel/DPN.

翻訳日:2022-11-29 14:36:15 公開日:2022-11-28

# 教師付き言語モデルのFew-Shotシナリオにおける距離メトリック学習損失関数

Distance Metric Learning Loss Functions in Few-Shot Scenarios of Supervised Language Models Fine-Tuning ( http://arxiv.org/abs/2211.15195v1 )

ライセンス: Link先を確認

Witold Sosnowski, Karolina Seweryn, Anna Wróblewska, Piotr Gawrysiak

(参考訳) 本稿では,分類タスクにおける言語モデルの教師付き微調整に対する距離メトリック学習(dml)損失関数の影響について分析する。 SentEval Transfer Tasksの既知のデータセットを実験した。実験により,DML損失関数の適用により,ロバータ大規模モデルの下流分類タスクの性能が向上することが示された。ソフトトリプル損失を微調整したモデルは、トレーニングデータセットに応じて0.04から13.48ポイントの約2.89ポイントの、標準カテゴリのクロスエントロピー損失関数を持つモデルよりも優れた結果が得られる。さらに,モデルの信頼性を評価し,結果を説明するために,説明可能性技術を用いた総合的な分析を行った。

This paper presents an analysis of the influence of Distance Metric Learning (DML) loss functions on the supervised fine-tuning of language models for classification tasks. We experimented with known datasets from SentEval Transfer Tasks. Our experiments show that applying the DML loss function can increase performance on downstream classification tasks of RoBERTa-large models in few-shot scenarios. Models fine-tuned with the use of SoftTriple loss can achieve better results than models with a standard categorical cross-entropy loss function by about 2.89 percentage points (from 0.04 to 13.48 percentage points, depending on the training dataset). Additionally, we carried out a comprehensive analysis with explainability techniques to assess the models' reliability and explain their results.

翻訳日:2022-11-29 14:35:34 公開日:2022-11-28

# 数発自然言語分類のための距離メトリック学習の再検討

Revisiting Distance Metric Learning for Few-Shot Natural Language Classification ( http://arxiv.org/abs/2211.15202v1 )

ライセンス: Link先を確認

Witold Sosnowski, Anna Wróblewska, Karolina Seweryn, Piotr Gawrysiak

(参考訳) 距離メトリック学習(DML)は近年,画像処理において注目されている。本稿では,自然言語処理(nlp)分類タスクにおける教師付き微調整言語モデルへの影響を分析した。我々は、既知のSentEval Transfer Tasksデータセット上でRoBERTa言語モデルを訓練する際のDML損失関数について検討した。また、モデル推論中にプロキシベースのDML損失を利用する可能性についても分析した。体系的な実験により,少数の学習条件,特にプロキシに基づくdml損失は,教師付き言語モデルの微調整と推論に正の影響を与えうることが示された。 CCE(カテゴリー的クロスエントロピー損失)とProxyAnchor Lossの組み合わせで調整されたモデルは、トレーニングデータセットによって最大10.38ポイントまで、平均してCCEのみで最高のパフォーマンスとパフォーマンスのモデルである。

Distance Metric Learning (DML) has attracted much attention in image processing in recent years. This paper analyzes its impact on supervised fine-tuning language models for Natural Language Processing (NLP) classification tasks under few-shot learning settings. We investigated several DML loss functions in training RoBERTa language models on known SentEval Transfer Tasks datasets. We also analyzed the possibility of using proxy-based DML losses during model inference. Our systematic experiments have shown that under few-shot learning settings, particularly proxy-based DML losses can positively affect the fine-tuning and inference of a supervised language model. Models tuned with a combination of CCE (categorical cross-entropy loss) and ProxyAnchor Loss have, on average, the best performance and outperform models with only CCE by about 3.27 percentage points -- up to 10.38 percentage points depending on the training dataset.

翻訳日:2022-11-29 14:35:21 公開日:2022-11-28

# 逆向き知識蒸留による雷速映像異常検出

Lightning Fast Video Anomaly Detection via Adversarial Knowledge Distillation ( http://arxiv.org/abs/2211.15597v1 )

ライセンス: Link先を確認

Nicolae-Catalin Ristea, Florinel-Alin Croitoru, Dana Dascalescu, Radu Tudor Ionescu, Fahad Shahbaz Khan, Mubarak Shah

(参考訳) 本稿では、複数の高精度なオブジェクトレベルの教師モデルから知識を蒸留することで異常検出を学習する、ビデオ異常検出のための非常に高速なフレームレベルモデルを提案する。生徒モデルの忠実度を向上させるため、標準的な蒸留と敵対的蒸留を併用して教師の低解像度の異常マップを蒸留し、教師ごとに、目標の異常マップと生成された異常マップを区別する敵対的識別器を導入する。3つのベンチマーク(Avenue, ShanghaiTech, UCSD Ped2)で実験を行い、提案手法が最速の競合手法よりも7倍以上高速で、オブジェクト中心モデルよりも28〜62倍高速でありながら、最近の手法と同等の結果が得られることを示す。また、これまでに例のない1480 FPSという速度により、本モデルが速度と精度の最良のトレードオフを達成することも示す。さらに、アーキテクチャ設計の選択を正当化するための包括的なアブレーション研究を実施する。

We propose a very fast frame-level model for anomaly detection in video, which learns to detect anomalies by distilling knowledge from multiple highly accurate object-level teacher models. To improve the fidelity of our student, we distill the low-resolution anomaly maps of the teachers by jointly applying standard and adversarial distillation, introducing an adversarial discriminator for each teacher to distinguish between target and generated anomaly maps. We conduct experiments on three benchmarks (Avenue, ShanghaiTech, UCSD Ped2), showing that our method is over 7 times faster than the fastest competing method, and between 28 and 62 times faster than object-centric models, while obtaining comparable results to recent methods. Our evaluation also indicates that our model achieves the best trade-off between speed and accuracy, due to its previously unheard-of speed of 1480 FPS. In addition, we carry out a comprehensive ablation study to justify our architectural design choices.

Translated: 2022-11-29 14:28:03 Published: 2022-11-28

# Inverse Solvability and Security with Applications to Federated Learning ( http://arxiv.org/abs/2211.14115v2 )

License: see the linked page

Tomasz Piotrowski, Matthias Frey, Renato L.G. Cavalcante, Rafail Ismailov

We introduce the concepts of inverse solvability and security for a generic linear forward model and demonstrate how they can be applied to models used in federated learning. We provide examples of such models which differ in the resulting inverse solvability and security as defined in this paper. We also show how the large number of users participating in a given iteration of federated learning can be leveraged to increase both solvability and security. Finally, we discuss possible extensions of the presented concepts, including the nonlinear case.

Translated: 2022-11-29 14:19:24 Published: 2022-11-28

# Doubly robust nearest neighbors in factor models ( http://arxiv.org/abs/2211.14297v2 )

License: see the linked page

Raaz Dwivedi, Katherine Tian, Sabina Tomkins, Predrag Klasnja, Susan Murphy, Devavrat Shah

In this technical note, we introduce an improved variant of nearest neighbors for counterfactual inference in panel data settings where multiple units are assigned multiple treatments over multiple time points, each sampled with constant probabilities. We call this estimator a doubly robust nearest neighbor estimator and provide a high-probability non-asymptotic error bound for the mean parameter corresponding to each unit at each time. Our guarantee shows that the doubly robust estimator provides a (near-)quadratic improvement in the error compared to nearest neighbor estimators analyzed in prior work for these settings.

Translated: 2022-11-29 14:19:17 Published: 2022-11-28

# The European AI Liability Directives -- Critique of a Half-Hearted Approach and Lessons for the Future ( http://arxiv.org/abs/2211.13960v2 )

License: see the linked page

Philipp Hacker

The optimal liability framework for AI systems remains an unsolved problem across the globe. In a much-anticipated move, the European Commission advanced two proposals outlining the European approach to AI liability in September 2022: a novel AI Liability Directive and a revision of the Product Liability Directive. They constitute the final, and much-anticipated, cornerstone of AI regulation in the EU. Crucially, the liability proposals and the EU AI Act are inherently intertwined: the latter does not contain any individual rights of affected persons, and the former lack specific, substantive rules on AI development and deployment. Taken together, these acts may well trigger a Brussels effect in AI regulation, with significant consequences for the US and other countries. This paper makes three novel contributions. First, it examines in detail the Commission proposals and shows that, while making steps in the right direction, they ultimately represent a half-hearted approach: if enacted as foreseen, AI liability in the EU will primarily rest on disclosure of evidence mechanisms and a set of narrowly defined presumptions concerning fault, defectiveness and causality. Hence, second, the article suggests amendments, which are collected in an Annex at the end of the paper. Third, based on an analysis of the key risks AI poses, the final part of the paper maps out a road for the future of AI liability and regulation, in the EU and beyond. This includes: a comprehensive framework for AI liability; provisions to support innovation; an extension to non-discrimination/algorithmic fairness, as well as explainable AI; and sustainability. I propose to jump-start sustainable AI regulation via sustainability impact assessments in the AI Act and sustainable design defects in the liability regime. In this way, the law may help spur not only fair AI and XAI, but potentially also sustainable AI (SAI).

Translated: 2022-11-29 14:19:10 Published: 2022-11-28

# MIAD: A Maintenance Inspection Dataset for Unsupervised Anomaly Detection ( http://arxiv.org/abs/2211.13968v2 )

License: see the linked page

Tianpeng Bao, Jiadong Chen, Wei Li, Xiang Wang, Jingjing Fei, Liwei Wu, Rui Zhao, Ye Zheng

Visual anomaly detection plays a crucial role not only in manufacturing inspection, where it finds product defects during manufacturing processes, but also in maintenance inspection, where it keeps equipment in optimum working condition, particularly outdoors. Due to the scarcity of defective samples, unsupervised anomaly detection has attracted great attention in recent years. However, existing datasets for unsupervised anomaly detection are biased towards manufacturing inspection and do not consider maintenance inspection, which is usually conducted in uncontrolled outdoor environments with varying camera viewpoints, messy backgrounds, and degradation of object surfaces after long-term operation. We focus on outdoor maintenance inspection and contribute a comprehensive Maintenance Inspection Anomaly Detection (MIAD) dataset, which contains more than 100K high-resolution color images in various outdoor industrial scenarios. This dataset is generated with 3D graphics software and covers both surface and logical anomalies with pixel-precise ground truth. Extensive evaluations of representative algorithms for unsupervised anomaly detection are conducted, and we expect that MIAD and the corresponding experimental results can inspire the research community in outdoor unsupervised anomaly detection tasks. Worthwhile and related future work can be spawned from our new dataset.

Translated: 2022-11-29 14:18:27 Published: 2022-11-28

# ILSGAN: Independent Layer Synthesis for Unsupervised Foreground-Background Segmentation ( http://arxiv.org/abs/2211.13974v2 )

License: see the linked page

Qiran Zou, Yu Yang, Wing Yin Cheung, Chang Liu, Xiangyang Ji

Unsupervised foreground-background segmentation aims at extracting salient objects from cluttered backgrounds, where Generative Adversarial Network (GAN) approaches, especially layered GANs, show great promise. However, without human annotations, they are typically prone to produce foreground and background layers with non-negligible semantic and visual confusion, dubbed "information leakage", resulting in notable degeneration of the generated segmentation mask. To alleviate this issue, we propose a simple-yet-effective explicit layer independence modeling approach, termed Independent Layer Synthesis GAN (ILSGAN), pursuing independent foreground-background layer generation by encouraging their discrepancy. Specifically, it targets minimizing the mutual information between visible and invisible regions of the foreground and background to spur interlayer independence. Through in-depth theoretical and experimental analyses, we justify that explicit layer independence modeling is critical to suppressing information leakage and contributes to impressive segmentation performance gains. Also, our ILSGAN achieves strong state-of-the-art generation quality and segmentation performance on complex real-world data.

Translated: 2022-11-29 14:18:01 Published: 2022-11-28

# Turning the Tables: Biased, Imbalanced, Dynamic Tabular Datasets for ML Evaluation ( http://arxiv.org/abs/2211.13358v2 )

License: see the linked page

Sérgio Jesus, José Pombal, Duarte Alves, André Cruz, Pedro Saleiro, Rita P. Ribeiro, João Gama, Pedro Bizarro

Evaluating new techniques on realistic datasets plays a crucial role in the development of ML research and its broader adoption by practitioners. In recent years, there has been a significant increase in publicly available unstructured data resources for computer vision and NLP tasks. However, tabular data -- which is prevalent in many high-stakes domains -- has been lagging behind. To bridge this gap, we present Bank Account Fraud (BAF), the first publicly available privacy-preserving, large-scale, realistic suite of tabular datasets. The suite was generated by applying state-of-the-art tabular data generation techniques on an anonymized, real-world bank account opening fraud detection dataset. This setting carries a set of challenges that are commonplace in real-world applications, including temporal dynamics and significant class imbalance. Additionally, to allow practitioners to stress test both performance and fairness of ML methods, each dataset variant of BAF contains specific types of data bias. With this resource, we aim to provide the research community with a more realistic, complete, and robust test bed to evaluate novel and existing methods.

Translated: 2022-11-29 14:17:36 Published: 2022-11-28

# Perception-Oriented Single Image Super-Resolution using Optimal Objective Estimation ( http://arxiv.org/abs/2211.13676v2 )

License: see the linked page

Seung Ho Park, Young Su Moon, Nam Ik Cho

Single-image super-resolution (SISR) networks trained with perceptual and adversarial losses provide high-contrast outputs compared to those of networks trained with distortion-oriented losses, such as L1 or L2. However, it has been shown that using a single perceptual loss is insufficient for accurately restoring locally varying diverse shapes in images, often generating undesirable artifacts or unnatural details. For this reason, combinations of various losses, such as perceptual, adversarial, and distortion losses, have been attempted, yet it remains challenging to find optimal combinations. Hence, in this paper, we propose a new SISR framework that applies optimal objectives for each region to generate plausible results in overall areas of high-resolution outputs. Specifically, the framework comprises two models: a predictive model that infers an optimal objective map for a given low-resolution (LR) input and a generative model that applies a target objective map to produce the corresponding SR output. The generative model is trained over our proposed objective trajectory representing a set of essential objectives, which enables the single network to learn various SR results corresponding to combined losses on the trajectory. The predictive model is trained using pairs of LR images and corresponding optimal objective maps searched from the objective trajectory. Experimental results on five benchmarks show that the proposed method outperforms state-of-the-art perception-driven SR methods in LPIPS, DISTS, PSNR, and SSIM metrics. The visual results also demonstrate the superiority of our method in perception-oriented reconstruction. The code and models are available at https://github.com/seungho-snu/SROOE.

Translated: 2022-11-29 14:17:19 Published: 2022-11-28

# Design and Prototyping Distributed CNN Inference Acceleration in Edge Computing ( http://arxiv.org/abs/2211.13778v2 )

License: see the linked page

Zhongtian Dong, Nan Li, Alexandros Iosifidis, Qi Zhang

For time-critical IoT applications using deep learning, inference acceleration through distributed computing is a promising approach to meet a stringent deadline. In this paper, we implement a working prototype of a new distributed inference acceleration method, HALP, using three Raspberry Pi 4 devices. HALP accelerates inference by designing a seamless collaboration among edge devices (EDs) in Edge Computing. We maximize the parallelization between communication and computation among the collaborative EDs by optimizing the task partitioning ratio based on segment-based partitioning. Experimental results show that the distributed inference HALP achieves 1.7x inference acceleration for VGG-16. Then, we combine distributed inference with conventional neural network model compression by setting up different shrinking hyperparameters for MobileNet-V1. In this way, we can further accelerate inference, but at the cost of inference accuracy loss. To strike a balance between latency and accuracy, we propose dynamic model selection to select the model that provides the highest accuracy within the latency constraint. It is shown that the model selection with distributed inference HALP can significantly improve service reliability compared to the conventional stand-alone computation.
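
The dynamic model selection mentioned in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the `Candidate` type, the model labels, and the accuracy and latency figures are hypothetical placeholders, and the rule shown is only the one stated in the abstract (pick the most accurate model whose latency fits the constraint).

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str          # hypothetical model label
    accuracy: float    # assumed validation accuracy in [0, 1]
    latency_ms: float  # assumed end-to-end inference latency


def select_model(candidates: Sequence[Candidate],
                 deadline_ms: float) -> Optional[Candidate]:
    """Return the most accurate candidate that meets the deadline, or None."""
    feasible = [c for c in candidates if c.latency_ms <= deadline_ms]
    if not feasible:
        return None
    # Break accuracy ties in favour of the faster model.
    return max(feasible, key=lambda c: (c.accuracy, -c.latency_ms))


# Illustrative numbers only; they are not taken from the paper.
CANDIDATES = [
    Candidate("mobilenet_v1_alpha_1.00", 0.70, 120.0),
    Candidate("mobilenet_v1_alpha_0.75", 0.68, 80.0),
    Candidate("mobilenet_v1_alpha_0.50", 0.63, 45.0),
]
```

Calling `select_model(CANDIDATES, deadline_ms=100.0)` returns the 0.75 variant under these made-up numbers; when no candidate fits the deadline the function returns `None`, which a caller could treat as a missed deadline.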

Translated: 2022-11-29 14:16:51 Published: 2022-11-28

# End-to-end Wind Turbine Wake Modelling with Deep Graph Representation Learning ( http://arxiv.org/abs/2211.13649v2 )

License: see the linked page

Siyi Li, Mingrui Zhang, Matthew D. Piggott

Wind turbine wake modelling is of crucial importance to accurate resource assessment, to layout optimisation, and to the operational control of wind farms. This work proposes a surrogate model for the representation of wind turbine wakes based on a state-of-the-art graph representation learning method termed a graph neural network. The proposed end-to-end deep learning model operates directly on unstructured meshes and has been validated against high-fidelity data, demonstrating its ability to rapidly make accurate 3D flow field predictions for various inlet conditions and turbine yaw angles. The specific graph neural network model employed here is shown to generalise well to unseen data and is less sensitive to over-smoothing compared to common graph neural networks. A case study based upon a real-world wind farm further demonstrates the capability of the proposed approach to predict farm-scale power generation. Moreover, the proposed graph neural network framework is flexible and highly generic and, as formulated here, can be applied to any steady-state computational fluid dynamics simulations on unstructured meshes.

Translated: 2022-11-29 14:16:35 Published: 2022-11-28

# Semantic Communication Enabling Robust Edge Intelligence for Time-Critical IoT Applications ( http://arxiv.org/abs/2211.13787v2 )

License: see the linked page

Andrea Cavagna, Nan Li, Alexandros Iosifidis, Qi Zhang

This paper aims to design robust Edge Intelligence using semantic communication for time-critical IoT applications. We systematically analyze the effect of image DCT coefficients on inference accuracy and propose channel-agnostic effectiveness encoding for offloading, which transmits the most meaningful task data first. This scheme makes good use of all available communication resources and strikes a balance between transmission latency and inference accuracy. Then, we design an effectiveness decoding by implementing a novel image augmentation process for convolutional neural network (CNN) training, through which an original CNN model is transformed into a Robust CNN model. We use the proposed training method to generate Robust MobileNet-v2 and Robust ResNet-50. The proposed Edge Intelligence framework consists of the proposed effectiveness encoding and effectiveness decoding. The experimental results show that the effectiveness decoding using the Robust CNN models performs consistently better under various image distortions caused by channel errors or limited communication resources. The proposed Edge Intelligence framework using semantic communication significantly outperforms the conventional approach under latency and data rate constraints, in particular under ultra-stringent deadlines and low data rates.
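
The idea of sending the most meaningful data first can be sketched as a prefix code over DCT coefficients. The sketch below is an illustration under a stated assumption, not the paper's encoder: it assumes that low-frequency coefficients matter most and therefore uses the familiar zigzag scan as the importance ranking, whereas the abstract does not say which ordering the authors derive. The function names `zigzag_order`, `encode_prefix`, and `decode_prefix` are hypothetical.

```python
from typing import List, Sequence, Tuple


def zigzag_order(n: int) -> List[Tuple[int, int]]:
    """Positions of an n x n block in zigzag (low-frequency-first) order."""
    return sorted(
        ((i, j) for i in range(n) for j in range(n)),
        # Walk anti-diagonals; alternate direction on odd and even diagonals.
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else -p[0]),
    )


def encode_prefix(block: Sequence[Sequence[float]], budget: int) -> List[float]:
    """Keep only the first `budget` coefficients in zigzag order."""
    order = zigzag_order(len(block))
    return [block[i][j] for i, j in order[:budget]]


def decode_prefix(values: Sequence[float], n: int) -> List[List[float]]:
    """Rebuild an n x n block from any received prefix; missing entries are 0."""
    out = [[0.0] * n for _ in range(n)]
    for (i, j), v in zip(zigzag_order(n), values):
        out[i][j] = v
    return out
```

Because the receiver can rebuild a block from whatever prefix arrives, the sender does not need to know the channel rate in advance, which is one plausible reading of "channel-agnostic" here.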

Translated: 2022-11-29 14:16:20 Published: 2022-11-28

# Generating 2D and 3D Master Faces for Dictionary Attacks with a Network-Assisted Latent Space Evolution ( http://arxiv.org/abs/2211.13964v2 )

License: see the linked page

Tomer Friedlander, Ron Shmelkin, Lior Wolf

A master face is a face image that passes face-based identity authentication for a high percentage of the population. These faces can be used to impersonate, with a high probability of success, any user, without having access to any user information. We optimize these faces for 2D and 3D face verification models, by using an evolutionary algorithm in the latent embedding space of the StyleGAN face generator. For 2D face verification, multiple evolutionary strategies are compared, and we propose a novel approach that employs a neural network to direct the search toward promising samples, without adding fitness evaluations. The results we present demonstrate that it is possible to obtain a considerable coverage of the identities in the LFW or RFW datasets with fewer than 10 master faces, for six leading deep face recognition systems. In 3D, we generate faces using the 2D StyleGAN2 generator and predict a 3D structure using a deep 3D face reconstruction network. When employing two different 3D face recognition systems, we are able to obtain a coverage of 40%-50%. Additionally, we present the generation of paired 2D RGB and 3D master faces, which simultaneously match 2D and 3D models with high impersonation rates.
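
The coverage figure quoted in this abstract can be read as the fraction of identities matched by at least one master face. A minimal sketch of that bookkeeping follows; the function name `coverage` and the toy identity IDs are assumptions made for illustration, and the sketch says nothing about how the matches themselves are obtained from a face verification system.

```python
from typing import Hashable, Iterable, Set


def coverage(matched_sets: Iterable[Set[Hashable]], num_identities: int) -> float:
    """Fraction of identities matched by at least one master face."""
    if num_identities <= 0:
        raise ValueError("num_identities must be positive")
    covered: Set[Hashable] = set()
    for matched in matched_sets:
        covered |= matched  # an identity counts once, however many faces match it
    return len(covered) / num_identities
```

For example, two master faces matching identities `{1, 2}` and `{2, 3}` out of six identities give a coverage of 0.5, because identity 2 is counted only once.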

Translated: 2022-11-29 14:07:49 Published: 2022-11-28

# 1st Workshop on Maritime Computer Vision (MaCVi) 2023: Challenge Results ( http://arxiv.org/abs/2211.13508v2 )

License: see the linked page

Benjamin Kiefer, Matej Kristan, Janez Perš, Lojze Žust, Fabio Poiesi, Fabio Augusto de Alcantara Andrade, Alexandre Bernardino, Matthew Dawkins, Jenni Raitoharju, Yitong Quan, Adem Atmaca, Timon Höfer, Qiming Zhang, Yufei Xu, Jing Zhang, Dacheng Tao, Lars Sommer, Raphael Spraul, Hangyue Zhao, Hongpu Zhang, Yanyun Zhao, Jan Lukas Augustin, Eui-ik Jeon, Impyeong Lee, Luca Zedda, Andrea Loddo, Cecilia Di Ruberto, Sagar Verma, Siddharth Gupta, Shishir Muralidhara, Niharika Hegde, Daitao Xing, Nikolaos Evangeliou, Anthony Tzes, Vojtěch Bartl, Jakub Špaňhel, Adam Herout, Neelanjan Bhowmik, Toby P. Breckon, Shivanand Kundargi, Tejas Anvekar, Chaitra Desai, Ramesh Ashok Tabib, Uma Mudengudi, Arpita Vats, Yang Song, Delong Liu, Yonglin Li, Shuman Li, Chenhao Tan, Long Lan, Vladimir Somers, Christophe De Vleeschouwer, Alexandre Alahi, Hsiang-Wei Huang, Cheng-Yen Yang, Jenq-Neng Hwang, Pyong-Kun Kim, Kwangju Kim, Kyoungoh Lee, Shuai Jiang, Haiwen Li, Zheng Ziqiang, Tuan-Anh Vu, Hai Nguyen-Truong, Sai-Kit Yeung, Zhuang Jia, Sophia Yang, Chih-Chung Hsu, Xiu-Yu Hou, Yu-An Jhang, Simon Yang, Mau-Tsuen Yang

The 1st Workshop on Maritime Computer Vision (MaCVi) 2023 focused on maritime computer vision for Unmanned Aerial Vehicles (UAVs) and Unmanned Surface Vehicles (USVs), and organized several subchallenges in this domain: (i) UAV-based Maritime Object Detection, (ii) UAV-based Maritime Object Tracking, (iii) USV-based Maritime Obstacle Segmentation and (iv) USV-based Maritime Obstacle Detection. The subchallenges were based on the SeaDronesSee and MODS benchmarks. This report summarizes the main findings of the individual subchallenges and introduces a new benchmark, called SeaDronesSee Object Detection v2, which extends the previous benchmark by including more classes and footage. We provide statistical and qualitative analyses, and assess trends in the best-performing methodologies of over 130 submissions. The methods are summarized in the appendix. The datasets, evaluation code and the leaderboard are publicly available at https://seadronessee.cs.uni-tuebingen.de/macvi.

Translated: 2022-11-29 14:07:29 Published: 2022-11-28

PDF registration status (publication date: 20221128)