Fugu-MT: arXiv Paper Translations

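This site provides Japanese translations of arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body is not CC-licensed, and for papers that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is done with Fugu-Machine Translator.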

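The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from its use.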

The papers listed here have a publication date of 20230704.

Title	Authors	Abstract	Publication date / Translation date
# Barriers and Self-Efficacy: A Large-Scale Study on the Impact of OSS Courses on Student Perceptions ( http://arxiv.org/abs/2304.14628v2 ) License: see the link	Larissa Salerno, Simone de França Tonhão, Igor Steinmacher, Christoph Treude	Open source software (OSS) development offers a unique opportunity for students in Software Engineering to experience and participate in large-scale software development; however, the impact of such courses on students' self-efficacy and the challenges faced by students are not well understood. This paper aims to address this gap by analyzing data from multiple instances of OSS development courses at universities in different countries and reporting on how students' self-efficacy changed as a result of taking the course, as well as the barriers and challenges faced by students.	Translated: 2023-10-24 12:26:33 Published: 2023-07-04
# A Query Language for Software Architecture Information (Extended version) ( http://arxiv.org/abs/2306.16829v2 ) License: see the link	Joshua Ammermann, Sven Jordan, Lukas Linsbauer, Ina Schaefer	Software maintenance is an important part of a software system's life cycle. Maintenance tasks of existing software systems suffer from architecture information that is diverging over time (architectural drift). The Digital Architecture Twin (DArT) can support software maintenance by providing up-to-date architecture information. For this, the DArT gathers such information and co-evolves with a software system, enabling continuous reverse engineering. But the crucial link for stakeholders to retrieve this information is missing. To fill this gap, we contribute the Architecture Information Query Language (AIQL), which enables stakeholders to access up-to-date and tailored architecture information. We derived four application scenarios in the context of continuous reverse engineering. We showed that the AIQL provides the required functionality to formulate queries for the application scenarios and that the language scales for use with real-world software systems. In a user study, stakeholders agreed that the language is easy to understand and assessed its value to the specific stakeholder for the application scenarios.	Translated: 2023-10-23 18:46:52 Published: 2023-07-04
# A Bibliographic Study on Artificial Intelligence Research: Global Panorama and Indian Appearance ( http://arxiv.org/abs/2308.00705v1 ) License: see the link	Amit Tiwari, Susmita Bardhan, Vikas Kumar	The present study identifies and assesses the bibliographic trend in Artificial Intelligence (AI) research for the years 2015-2020 using the science mapping method of bibliometric study. The required data has been collected from the Scopus database. To make the collected data analysis-ready, essential data transformation was performed manually and with the help of a tool, viz. OpenRefine. For determining the trend and performing the mapping techniques, the top five open access and commercial journals of AI have been chosen based on their citescore-driven ranking. The work includes 6880 articles published in the specified period for analysis. The trend is based on country-wise publications, year-wise publications, topical terms in AI, top-cited articles, prominent authors, major institutions, involvement of industries in AI and Indian appearance. The results show that, compared to open access journals, commercial journals have a higher citescore and number of articles published over the years. Additionally, IEEE is the prominent publisher which publishes 84% of the top-cited publications. Further, China and the United States are the major contributors to literature in the AI domain. The study reveals that neural networks and deep learning are the major topics included in top AI research publications. Recently, not only public institutions but also private bodies are investing their resources in AI research. The study also investigates the relative position of Indian researchers in terms of AI research. Present work helps in understanding the initial development, current stand and future direction of AI.	Translated: 2023-08-06 11:02:33 Published: 2023-07-04
# Strictly Low Rank Constraint Optimization -- An Asymptotically $\mathcal{O}(\frac{1}{t^2})$ Method ( http://arxiv.org/abs/2307.14344v1 ) License: see the link	Mengyuan Zhang and Kai Liu	We study a class of non-convex and non-smooth problems with \textit{rank} regularization to promote sparsity in the optimal solution. We propose to apply the proximal gradient descent method to solve the problem and accelerate the process with a novel support set projection operation on the singular values of the intermediate update. We show that our algorithms are able to achieve a convergence rate of $O(\frac{1}{t^2})$, which is exactly the same as Nesterov's optimal convergence rate for first-order methods on smooth and convex problems. Strict sparsity can be expected and the support set of singular values during each update is monotonically shrinking, which, to our best knowledge, is novel in momentum-based algorithms.	Translated: 2023-07-30 03:57:01 Published: 2023-07-04
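To make the proximal step mentioned in the entry above (Strictly Low Rank Constraint Optimization) concrete, here is a minimal worked sketch. It assumes the regularizer is $\lambda\,\mathrm{rank}(X)$ with step size $\eta$; the symbols $\lambda$, $\eta$, $f$ and $\beta_t$ are illustrative and not taken from the paper, and the paper's support set projection is not reproduced here. For the SVD $Y = U\,\mathrm{diag}(\sigma)\,V^\top$, the proximal map hard-thresholds the singular values:

$$\operatorname{prox}_{\eta\lambda\,\mathrm{rank}}(Y) = \arg\min_X \tfrac{1}{2}\|X - Y\|_F^2 + \eta\lambda\,\mathrm{rank}(X) = U\,\mathrm{diag}(\tilde\sigma)\,V^\top, \qquad \tilde\sigma_i = \begin{cases} \sigma_i & \text{if } \sigma_i > \sqrt{2\eta\lambda}, \\ 0 & \text{otherwise.} \end{cases}$$

Keeping $\sigma_i$ costs $\eta\lambda$ in the penalty and dropping it costs $\tfrac{1}{2}\sigma_i^2$ in the quadratic term, which gives the threshold $\sqrt{2\eta\lambda}$. A generic momentum variant would then iterate $Y_t = X_t + \beta_t (X_t - X_{t-1})$ and $X_{t+1} = \operatorname{prox}_{\eta\lambda\,\mathrm{rank}}\big(Y_t - \eta \nabla f(Y_t)\big)$.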
# GenRec: Large Language Model for Generative Recommendation ( http://arxiv.org/abs/2307.00457v2 ) License: see the link	Jianchao Ji, Zelong Li, Shuyuan Xu, Wenyue Hua, Yingqiang Ge, Juntao Tan, Yongfeng Zhang	In recent years, large language models (LLMs) have emerged as powerful tools for diverse natural language processing tasks. However, their potential for recommender systems under the generative recommendation paradigm remains relatively unexplored. This paper presents an innovative approach to recommendation systems using large language models (LLMs) based on text data. In this paper, we present a novel LLM for generative recommendation (GenRec) that utilizes the expressive power of LLMs to directly generate the target item to recommend, rather than calculating a ranking score for each candidate item one by one as in traditional discriminative recommendation. GenRec uses the LLM's understanding ability to interpret context, learn user preferences, and generate relevant recommendations. Our proposed approach leverages the vast knowledge encoded in large language models to accomplish recommendation tasks. We first formulate specialized prompts to enhance the ability of the LLM to comprehend recommendation tasks. Subsequently, we use these prompts to fine-tune the LLaMA backbone LLM on a dataset of user-item interactions, represented by textual data, to capture user preferences and item characteristics. Our research underscores the potential of LLM-based generative recommendation in revolutionizing the domain of recommendation systems and offers a foundational framework for future explorations in this field. We conduct extensive experiments on benchmark datasets, and the experiments show that our GenRec has significantly better results on large datasets.	Translated: 2023-07-16 04:18:29 Published: 2023-07-04
# A Comprehensive Multi-scale Approach for Speech and Dynamics Synchrony in Talking Head Generation ( http://arxiv.org/abs/2307.03270v1 ) License: see the link	Louis Airale (UGA, LIG), Dominique Vaufreydaz (LIG), Xavier Alameda-Pineda (UGA)	Animating still face images with deep generative models using a speech input signal is an active research topic and has seen important recent progress. However, much of the effort has been put into lip syncing and rendering quality while the generation of natural head motion, let alone the audio-visual correlation between head motion and speech, has often been neglected. In this work, we propose a multi-scale audio-visual synchrony loss and a multi-scale autoregressive GAN to better handle short and long-term correlation between speech and the dynamics of the head and lips. In particular, we train a stack of syncer models on multimodal input pyramids and use these models as guidance in a multi-scale generator network to produce audio-aligned motion unfolding over diverse time scales. Our generator operates in the facial landmark domain, which is a standard low-dimensional head representation. The experiments show significant improvements over the state of the art in head motion dynamics quality and in multi-scale audio-visual synchrony both in the landmark domain and in the image domain.	Translated: 2023-07-16 04:14:36 Published: 2023-07-04
# Transcribing Educational Videos Using Whisper: A preliminary study on using AI for transcribing educational videos ( http://arxiv.org/abs/2307.03200v1 ) License: see the link	Ashwin Rao	Videos are increasingly being used for e-learning, and transcripts are vital to enhance the learning experience. The costs and delays of generating transcripts can be alleviated by automatic speech recognition (ASR) systems. In this article, we quantify the transcripts generated by Whisper for 25 educational videos and identify some open avenues of research when leveraging ASR for transcribing educational videos.	Translated: 2023-07-16 04:14:16 Published: 2023-07-04
# Analyzing the vulnerabilities in SplitFed Learning: Assessing the robustness against Data Poisoning Attacks ( http://arxiv.org/abs/2307.03197v1 ) License: see the link	Aysha Thahsin Zahir Ismail, Raj Mani Shukla	Distributed Collaborative Machine Learning (DCML) is a potential alternative to address the privacy concerns associated with centralized machine learning. Split Learning (SL) and Federated Learning (FL) are the two effective learning approaches in DCML. Recently there has been increased interest in the hybrid of FL and SL known as SplitFed Learning (SFL). This research is the earliest attempt to study, analyze and present the impact of data poisoning attacks in SFL. We propose three kinds of novel attack strategies, namely untargeted, targeted and distance-based attacks, for SFL. All the attack strategies aim to degrade the performance of the DCML-based classifier. We test the proposed attack strategies for two different case studies on electrocardiogram signal classification and automatic handwritten digit recognition. A series of attack experiments were conducted by varying the percentage of malicious clients and the choice of the model split layer between the clients and the server. The results after the comprehensive analysis of attack strategies clearly convey that untargeted and distance-based poisoning attacks have greater impacts in evading the classifier outcomes compared to targeted attacks in SFL.	Translated: 2023-07-16 04:14:07 Published: 2023-07-04
# Collaborative Score Distillation for Consistent Visual Synthesis ( http://arxiv.org/abs/2307.04787v1 ) License: see the link	Subin Kim, Kyungmin Lee, June Suk Choi, Jongheon Jeong, Kihyuk Sohn, Jinwoo Shin	Generative priors of large-scale text-to-image diffusion models enable a wide range of new generation and editing applications on diverse visual modalities. However, when adapting these priors to complex visual modalities, often represented as multiple images (e.g., video), achieving consistency across a set of images is challenging. In this paper, we address this challenge with a novel method, Collaborative Score Distillation (CSD). CSD is based on the Stein Variational Gradient Descent (SVGD). Specifically, we propose to consider multiple samples as "particles" in the SVGD update and combine their score functions to distill generative priors over a set of images synchronously. Thus, CSD facilitates seamless integration of information across 2D images, leading to a consistent visual synthesis across multiple samples. We show the effectiveness of CSD in a variety of tasks, encompassing the visual editing of panorama images, videos, and 3D scenes. Our results underline the competency of CSD as a versatile method for enhancing inter-sample consistency, thereby broadening the applicability of text-to-image diffusion models.	Translated: 2023-07-16 04:03:33 Published: 2023-07-04
# SleepEGAN: A GAN-enhanced Ensemble Deep Learning Model for Imbalanced Classification of Sleep Stages ( http://arxiv.org/abs/2307.05362v1 ) License: see the link	Xuewei Cheng, Ke Huang, Yi Zou and Shujie Ma	Deep neural networks have played an important role in automatic sleep stage classification because of their strong representation and in-model feature transformation abilities. However, class imbalance and individual heterogeneity, which typically exist in raw EEG signals of sleep data, can significantly affect the classification performance of any machine learning algorithm. To solve these two problems, this paper develops a generative adversarial network (GAN)-powered ensemble deep learning model, named SleepEGAN, for the imbalanced classification of sleep stages. To alleviate class imbalance, we propose a new GAN (called EGAN) architecture adapted to the features of EEG signals for data augmentation. The generated samples for the minority classes are used in the training process. In addition, we design a cost-free ensemble learning strategy to reduce the model estimation variance caused by the heterogeneity between the validation and test sets, so as to enhance the accuracy and robustness of prediction performance. We show that the proposed method can improve classification accuracy compared to several existing state-of-the-art methods using three public sleep datasets.	Translated: 2023-07-16 03:55:40 Published: 2023-07-04
# Decoding the Popularity of TV Series: A Network Analysis Perspective ( http://arxiv.org/abs/2307.05329v1 ) License: see the link	Melody Yu	In this paper, we analyze the character networks extracted from three popular television series and explore the relationship between a TV show episode's character network metrics and its review from IMDB. Character networks are graphs created from the plot of a TV show that represent the interactions of characters in scenes, indicating the presence of a connection between them. We calculate various network metrics for each episode, such as node degree and graph density, and use these metrics to explore the potential relationship between network metrics and TV series reviews from IMDB. Our results show that certain network metrics of character interactions in episodes have a strong correlation with the review score of TV series. Our research aims to provide more quantitative information that can help TV producers understand how to adjust the character dynamics of future episodes to appeal to their audience. By understanding the impact of character interactions on audience engagement and enjoyment, producers can make informed decisions about the development of their shows.	Translated: 2023-07-16 03:54:07 Published: 2023-07-04
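The two metrics named in the entry above (node degree and graph density) can be illustrated with a small sketch. This is not the paper's pipeline: it assumes a scene is simply a list of character names, that any two characters sharing a scene are connected by an undirected edge, and the names and scenes are made up for the example.

```python
from collections import defaultdict
from itertools import combinations


def build_character_network(scenes):
    """Return an undirected adjacency map; characters sharing a scene are linked."""
    graph = defaultdict(set)
    for scene in scenes:
        cast = sorted(set(scene))
        for name in cast:
            graph[name]  # touch the key so characters without edges still appear
        for a, b in combinations(cast, 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def node_degrees(graph):
    """Number of distinct characters each character interacts with."""
    return {name: len(neighbors) for name, neighbors in graph.items()}


def graph_density(graph):
    """Fraction of possible undirected edges that are present."""
    n = len(graph)
    if n < 2:
        return 0.0
    edge_count = sum(len(neighbors) for neighbors in graph.values()) / 2
    return 2 * edge_count / (n * (n - 1))


# Hypothetical episode with three scenes.
scenes = [["Ann", "Bob"], ["Bob", "Cy", "Dee"], ["Ann", "Bob"]]
network = build_character_network(scenes)
print(node_degrees(network))
print(round(graph_density(network), 3))
```

Per-episode values computed this way could then be set against review scores, for example with a correlation coefficient.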
# Human Emotion Recognition Based On Galvanic Skin Response signal Feature Selection and SVM ( http://arxiv.org/abs/2307.05383v1 ) License: see the link	Di Fan, Mingyang Liu, Xiaohan Zhang, Xiaopeng Gong	A novel human emotion recognition method based on automatically selected Galvanic Skin Response (GSR) signal features and SVM is proposed in this paper. GSR signals were acquired by e-Health Sensor Platform V2.0. Then, the data is de-noised by a wavelet function and normalized to get rid of the individual difference. 30 features are extracted from the normalized data; however, directly using these features will lead to a low recognition rate. In order to gain the optimized features, a covariance-based feature selection is employed in our method. Finally, an SVM with input of the optimized features is utilized to achieve the human emotion recognition. The experimental results indicate that the proposed method leads to good human emotion recognition, and the recognition accuracy is more than 66.67%.	Translated: 2023-07-16 03:43:33 Published: 2023-07-04
# Multi-Task Learning to Enhance Generazability of Neural Network Equalizers in Coherent Optical Systems ( http://arxiv.org/abs/2307.05374v1 ) License: see the link	Sasipim Srivallapanondh, Pedro J. Freire, Ashraful Alam, Nelson Costa, Bernhard Spinnler, Antonio Napoli, Egor Sedov, Sergei K. Turitsyn, Jaroslaw E. Prilepsky	For the first time, multi-task learning is proposed to improve the flexibility of NN-based equalizers in coherent systems. A "single" NN-based equalizer improves Q-factor by up to 4 dB compared to CDC, without re-training, even with variations in launch power, symbol rate, or transmission distance.	Translated: 2023-07-16 03:42:39 Published: 2023-07-04
# Carbon Emissions of Quantum Circuit Simulation: More than You Would Think ( http://arxiv.org/abs/2307.05510v1 ) License: see the link	Jinyang Li, Qiang Guan, Dingwen Tao, Weiwen Jiang	The rapid advancement of quantum hardware brings a host of research opportunities and the potential for quantum advantages across numerous fields. In this landscape, quantum circuit simulations serve as an indispensable tool by emulating quantum behavior on classical computers. They offer easy access, noise-free environments, and real-time observation of quantum states. However, the sustainability aspect of quantum circuit simulation is yet to be explored. In this paper, we introduce for the first time the concept of environmental impact from quantum circuit simulation. We present a preliminary model to compute the CO2e emissions derived from quantum circuit simulations. Our results indicate that large quantum circuit simulations (43 qubits) could lead to CO2e emissions 48 times greater than training a transformer machine learning model.	Translated: 2023-07-16 03:36:04 Published: 2023-07-04
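The entry above does not spell out its emissions model, so the following is only a generic back-of-the-envelope sketch of how a CO2e estimate for a compute job is often formed: energy drawn times grid carbon intensity. The function name, the PUE (power usage effectiveness) factor and every number are assumptions for illustration, not values from the paper.

```python
def simulation_co2e_kg(avg_power_kw, runtime_hours, pue, grid_kg_co2e_per_kwh):
    """Estimate emissions as energy drawn (kWh) times grid carbon intensity."""
    inputs = (avg_power_kw, runtime_hours, pue, grid_kg_co2e_per_kwh)
    if min(inputs) < 0:
        raise ValueError("all inputs must be non-negative")
    # PUE scales the IT load up to account for cooling and other facility overhead.
    energy_kwh = avg_power_kw * runtime_hours * pue
    return energy_kwh * grid_kg_co2e_per_kwh


# Hypothetical run: a 2 kW node busy for 10 hours, PUE 1.5, grid at 0.5 kg CO2e/kWh.
estimate = simulation_co2e_kg(2.0, 10.0, 1.5, 0.5)
print(f"{estimate:.1f} kg CO2e")
```

Because state-vector simulation memory and runtime grow quickly with qubit count, the runtime and power terms are the ones that would dominate such an estimate for large circuits.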
# Garbage in, garbage out: Zero-shot detection of crime using Large Language Models ( http://arxiv.org/abs/2307.06844v1 ) License: see the link	Anj Simmons, Rajesh Vasa	This paper proposes exploiting the common sense knowledge learned by large language models to perform zero-shot reasoning about crimes given textual descriptions of surveillance videos. We show that when video is (manually) converted to high quality textual descriptions, large language models are capable of detecting and classifying crimes with state-of-the-art performance using only zero-shot reasoning. However, existing automated video-to-text approaches are unable to generate video descriptions of sufficient quality to support reasoning (garbage video descriptions into the large language model, garbage out).	Translated: 2023-07-16 03:16:44 Published: 2023-07-04
# Generative Adversarial Trainer: Defense to Adversarial Perturbations with GAN ( http://arxiv.org/abs/1705.03387v3 ) License: see the link	Hyeungill Lee, Sungyeob Han, Jungwoo Lee	We propose a novel technique to make neural networks robust to adversarial examples using a generative adversarial network. We alternately train both classifier and generator networks. The generator network generates an adversarial perturbation that can easily fool the classifier network by using a gradient of each image. Simultaneously, the classifier network is trained to classify correctly both original and adversarial images generated by the generator. These procedures help the classifier network to become more robust to adversarial perturbations. Furthermore, our adversarial training framework efficiently reduces overfitting and outperforms other regularization methods such as Dropout. We applied our method to supervised learning for CIFAR datasets, and experimental results show that our method significantly lowers the generalization error of the network. To the best of our knowledge, this is the first method which uses GAN to improve supervised learning.	Translated: 2023-07-07 19:03:29 Published: 2023-07-04
# DeepOIS: Gyroscope-Guided Deep Optical Image Stabilizer Compensation ( http://arxiv.org/abs/2101.11183v2 ) License: see the link	Haipeng Li, Shuaicheng Liu, Jue Wang	Mobile captured images can be aligned using their gyroscope sensors. An optical image stabilizer (OIS) terminates this possibility by adjusting the images during capture. In this work, we propose a deep network that compensates the motions caused by the OIS, such that the gyroscopes can be used for image alignment on OIS cameras. To achieve this, first, we record both videos and gyroscopes with an OIS camera as training data. Then, we convert gyroscope readings into motion fields. Second, we propose a Fundamental Mixtures motion model for rolling shutter cameras, where an array of rotations within a frame is extracted as the ground-truth guidance. Third, we train a convolutional neural network with gyroscope motions as input to compensate for the OIS motion. Once finished, the compensation network can be applied to other scenes, where the image alignment is purely based on gyroscopes with no need for image contents, delivering strong robustness. Experiments show that our results are comparable with those of non-OIS cameras, and outperform image-based alignment results by a relatively large margin. Code and dataset are available at https://github.com/lhaippp/DeepOIS	Translated: 2023-07-07 18:57:53 Published: 2023-07-04
# Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization ( http://arxiv.org/abs/2303.03108v3 ) License: see the link	Xingxuan Zhang and Renzhe Xu and Han Yu and Hao Zou and Peng Cui	Recently, flat minima are proven to be effective for improving generalization and sharpness-aware minimization (SAM) achieves state-of-the-art performance. Yet the current definition of flatness discussed in SAM and its follow-ups is limited to the zeroth-order flatness (i.e., the worst-case loss within a perturbation radius). We show that the zeroth-order flatness can be insufficient to discriminate minima with low generalization error from those with high generalization error, both when there is a single minimum and when there are multiple minima within the given perturbation radius. Thus we present first-order flatness, a stronger measure of flatness focusing on the maximal gradient norm within a perturbation radius, which bounds both the maximal eigenvalue of the Hessian at local minima and the regularization function of SAM. We also present a novel training procedure named Gradient norm Aware Minimization (GAM) to seek minima with uniformly small curvature across all directions. Experimental results show that GAM improves the generalization of models trained with current optimizers such as SGD and AdamW on various datasets and networks. Furthermore, we show that GAM can help SAM find flatter minima and achieve better generalization.	Translated: 2023-07-07 17:49:50 Published: 2023-07-04
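The two notions of flatness contrasted in the entry above (Gradient Norm Aware Minimization) can be written down as a short sketch. The notation is assumed for illustration ($L$ is the training loss, $B(\theta, \rho)$ the ball of radius $\rho$ around $\theta$), and the scaling of the first-order term by $\rho$ is a convention that should be checked against the paper:

$$R^{(0)}_{\rho}(\theta) = \max_{\theta' \in B(\theta, \rho)} L(\theta') - L(\theta), \qquad R^{(1)}_{\rho}(\theta) = \rho \cdot \max_{\theta' \in B(\theta, \rho)} \|\nabla L(\theta')\|.$$

Under this scaling and for differentiable $L$, the mean value theorem gives $L(\theta') - L(\theta) \le \|\theta' - \theta\| \cdot \max_{\xi \in B(\theta, \rho)} \|\nabla L(\xi)\| \le R^{(1)}_{\rho}(\theta)$ for every $\theta' \in B(\theta, \rho)$, so $R^{(0)}_{\rho}(\theta) \le R^{(1)}_{\rho}(\theta)$. A small first-order flatness therefore forces a small zeroth-order flatness, while the converse need not hold.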
# Comparison of Neural FEM and Neural Operator Methods for applications in Solid Mechanics ( http://arxiv.org/abs/2307.02494v1 ) License: see the link	Stefan Hildebrand, Sandra Klinge	Machine Learning methods belong to the group of most up-to-date approaches for solving partial differential equations. The current work investigates two classes, Neural FEM and Neural Operator Methods, for use in elastostatics by means of numerical experiments. The Neural Operator methods require expensive training but then allow for solving multiple boundary value problems with the same Machine Learning model. The main differences between the two classes are the computational effort and accuracy. The accuracy in particular requires more research for practical applications.	Translated: 2023-07-07 16:53:25 Published: 2023-07-04
# FREEDOM: 教師なしパーソナライゼーションのためのターゲットラベルとソースデータとドメイン情報のないマルチソースドメイン適応 FREEDOM: Target Label & Source Data & Domain Information-Free Multi-Source Domain Adaptation for Unsupervised Personalization ( http://arxiv.org/abs/2307.02493v1 ) ライセンス: Link先を確認	Eunju Yang, Gyusang Cho, Chan-Hyun Youn	(参考訳) サービスの観点からは、Multi-Source Domain Adaptation(MSDA)は、デプロイされたモデルをクライアントのデータセットに適応させる、有望なシナリオである。ターゲットラベルなしで適応を提供し、ソースデータセットが複数のドメインから構築されている場合をサポートする。しかし、そのトレーニングは、マルチソースデータセットの事前ドメイン情報 -- 存在するドメインの数と各データサンプルのドメインラベル -- に大きく依存しているため、現実的ではない。さらにmsdaは、ソースとターゲットの両方のデータセットを同時に(物理的に)必要とし、クライアント装置のストレージ制限や、クライアントデータをサーバに転送することでデータプライバシの問題を引き起こす。サービス提供者の観点からモデル適応のより実践的なシナリオとして、これらの制約を緩和し、3自由ドメイン適応という新たな問題シナリオを提示します。 1)ターゲットラベル、 2)ソースデータセット、大部分は 3) ソースドメイン情報(ドメインラベル+ドメイン数)は利用できない。問題シナリオでは、FREEDOMと呼ばれる実践的な適応フレームワークを提案する。生成モデルのパワーを活用し、データをクラスとスタイルの側面に分離し、そのスタイルはソースデータからクラス非依存の情報として定義され、非パラメトリックベイズアプローチで設計される。適応段階において、FREEDOMは、スタイルが異なる場合でも、クラス分布は一貫性があるという考え方の下で、ソースクラスの分布とターゲットの分布とを一致させることを目的としており、その後、分類モデルの一部のみがパーソナライズされたネットワークとしてデプロイされる。その結果、FREEDOMは、ドメイン情報なしで、ターゲット側の最終的なモデルサイズを減らし、ソースドメインの数によらず、最先端または同等のパフォーマンスを達成する。 From a service perspective, Multi-Source Domain Adaptation (MSDA) is a promising scenario to adapt a deployed model to a client's dataset. It can provide adaptation without a target label and support the case where a source dataset is constructed from multiple domains. However, it is impractical, wherein its training heavily relies on prior domain information of the multi-source dataset -- how many domains exist and the domain label of each data sample. Moreover, MSDA requires both source and target datasets simultaneously (physically), causing storage limitations on the client device or data privacy issues by transferring client data to a server. 
For a more practical scenario of model adaptation from a service provider's point of view, we relax these constraints and present a novel problem scenario of Three-Free Domain Adaptation, namely TFDA, where 1) target labels, 2) the source dataset, and, most notably, 3) source domain information (domain labels + the number of domains) are unavailable. Under this problem scenario, we propose a practical adaptation framework called FREEDOM. It leverages the power of the generative model, disentangling data into class and style aspects, where the style is defined as the class-independent information from the source data and designed with a nonparametric Bayesian approach. In the adaptation stage, FREEDOM aims to match the source class distribution with the target's, on the premise that the class distribution is consistent even if the style is different; afterwards, only part of the classification model is deployed as a personalized network. As a result, FREEDOM achieves state-of-the-art or comparable performance even without domain information, with reduced final model size on the target side, independent of the number of source domains.	翻訳日:2023-07-07 16:53:18 公開日:2023-07-04
# tableye: 画像のレンズを通して小さなテーブルを見る TablEye: Seeing small Tables through the Lens of Images ( http://arxiv.org/abs/2307.02491v1 ) ライセンス: Link先を確認	Seung-eon Lee and Sang-Chul Lee	(参考訳) 少人数の表学習の探求が不可欠になる。タブラルデータ(Tabular data)は、多様な情報をキャプチャする汎用表現であるが、制限やデータの特性、モデルのサイズは除外されない。広範な表データのラベル付けは困難であり、すべての重要な機能をキャプチャすることは不可能である。しかし、独立データセット間の共有情報の不足と、表データ内の境界を定義する固有の曖昧さが原因で、比較的未熟なままである。我々の知る限りでは、データセットに制約を課すことなく有意義で制約のない数発の表型学習技術は開発されていない。本稿では,表型データに対する事前知識形成の限界を克服し,ドメイン変換を取り入れたTablEyeという革新的なフレームワークを提案する。表画像を生成してドメイン変換を容易にすることで、元の表データの本質的なセマンティクスを効果的に保存する。このアプローチは、厳密にテストされた少数の学習アルゴリズムと埋め込み関数を利用して、事前知識を取得し、適用する。共有データドメインを利用することで、イメージドメインから学習したこの事前知識を活用できます。具体的には、TablEyeはTabLLMを最大0.11AUCとSTUNTの4ショットタスクで上回り、1ショット設定で平均3.17%の精度で性能を発揮した。 The exploration of few-shot tabular learning becomes imperative. Tabular data is a versatile representation that captures diverse information, yet it is not exempt from limitations, property of data and model size. Labeling extensive tabular data can be challenging, and it may not be feasible to capture every important feature. Few-shot tabular learning, however, remains relatively unexplored, primarily due to scarcity of shared information among independent datasets and the inherent ambiguity in defining boundaries within tabular data. To the best of our knowledge, no meaningful and unrestricted few-shot tabular learning techniques have been developed without imposing constraints on the dataset. In this paper, we propose an innovative framework called TablEye, which aims to overcome the limit of forming prior knowledge for tabular data by adopting domain transformation. It facilitates domain transformation by generating tabular images, which effectively conserve the intrinsic semantics of the original tabular data. This approach harnesses rigorously tested few-shot learning algorithms and embedding functions to acquire and apply prior knowledge. 
Leveraging shared data domains allows us to utilize this prior knowledge, originally learned from the image domain. Specifically, TablEye demonstrated superior performance, outperforming TabLLM in a 4-shot task by up to 0.11 AUC and STUNT in a 1-shot setting, where it led by 3.17% accuracy on average.	翻訳日:2023-07-07 16:52:48 公開日:2023-07-04
# ボース・アインシュタイン凝縮系における一般外部ポテンシャルによる量子オットーエンジンの性能向上 Enhancing Quantum Otto Engine Performance in Generalized External Potential on Bose-Einstein Condensation Regime ( http://arxiv.org/abs/2307.01805v1 ) ライセンス: Link先を確認	Zahara Zettira, Ade Fahriza, Zulfi Abdullah, Trengginas E P Sutantyo	(参考訳) ボース・アインシュタイン凝縮(bec)と通常のボース気体の両方を汎用外部ポテンシャルに閉じ込めた動作媒質として用いた量子オットーエンジンについて検討した。エンジンを準静的かつ内可逆的に処理した。準静的および可逆的両方の膨張と圧縮は等エントロピー的であるため、効率の表現は類似している。しかし、準静電サイクルの出力は無限のストローク時間と長いストローク時間のためにゼロである。対照的に、可逆サイクルでは、2つの貯水池による熱化は有限時間で行われる。導電性においてフーリエの法則を用いて媒質の温度と貯水池の温度の関係を定式化し, 作業は加熱時間と冷却ストローク時間に依存する。さらに圧縮比$\kappa$に対して最大出力(EMP)の効率を得るために最大出力を最大化した。作業媒体としてBECを用いる場合, 通常のボースガスを用いたEMPはCurzon-Ahlborn効率に過ぎなかった。また,熱接触時間$\tau$とホット(\tau_{h}$)およびコールド(\tau_{l}$)がEMPに及ぼす影響についても検討した。我々は、$\tau_{h}=\tau_{l}$ stroke時間が発生すると、有意な差は認められなかった。それにもかかわらず、様々な冷却と加熱のストローク時間を調整することは、EMPにおいて重要な結果となり、ストローク時間は$\tau_{h}<\tau_{l}$より高く、ストローク時間は$\tau_{h}>\tau_{l}$より低い。この部分熱化は残留コヒーレンスによるエンジンのEMPを高めると結論付けている。 We examine a quantum Otto engine using both Bose-Einstein Condensation (BEC) and normal Bose gas as working medium trapped in generalized external potential. We treated the engine quasi-statically and endoreversibly. Since the expansion and compression in both quasi-static and endoreversible take place isentropic, the expression of efficiency is similar. However, the power output in the quasi-static cycle is zero due to infinite and long stroke time. In contrast, with an endoreversible cycle, thermalization with two reservoirs takes place at a finite time. We use Fourier's law in conduction to formulate the relation between temperature of medium and reservoir, making work depend on heating and cooling stroke time. Moreover, we maximized the power with respect to compression ratio $\kappa$ to obtain efficiency at maximum power (EMP). We found that EMP is significantly higher when using BEC as a working medium, meanwhile EMP with normal Bose gas is just Curzon-Ahlborn efficiency. 
We also investigate the effect on EMP of the thermal contact time $\tau$ with the hot ($\tau_{h}$) and cold ($\tau_{l}$) reservoirs. We found that when the stroke times are equal, $\tau_{h}=\tau_{l}$, there are no significant differences. Nevertheless, varying the cooling and heating stroke times has a significant effect on EMP, which is much higher for $\tau_{h}<\tau_{l}$ and lower for $\tau_{h}>\tau_{l}$. We conclude that this partial thermalization enhances the EMP of the engine due to residual coherence.	翻訳日:2023-07-07 16:52:23 公開日:2023-07-04
# AI支援プログラミングのための自然言語生成とビッグコードの理解:レビュー Natural Language Generation and Understanding of Big Code for AI-Assisted Programming: A Review ( http://arxiv.org/abs/2307.02503v1 ) ライセンス: Link先を確認	Man Fai Wong, Shangxin Guo, Ching Nam Hang, Siu Wai Ho, Chee Wei Tan	(参考訳) 本稿では,自然言語処理(NLP)技術の利用に関する文献を包括的にレビューし,AI支援プログラミングタスクの分野において,Big Codeを用いてトレーニングされたトランスフォーマーベース大規模言語モデル(LLM)に着目した。ソフトウェア自然性によって強化されたLLMは、コード生成、コード補完、コード翻訳、コード洗練、コードの要約、欠陥検出、クローン検出など、AI支援プログラミングアプリケーションを促進する上で重要な役割を果たしている。このようなアプリケーションの著名な例としては、OpenAIのCodexとDeepMind AlphaCodeを利用したGitHub Copilotがある。本稿では,AI支援プログラミングに関連する下流タスクにおけるLLMとその応用について概説する。さらに、これらのアプリケーションにNLP技術とソフトウェア自然性を導入する際の課題と機会についても検討し、モバイルソフトウェア開発のためのAppleのXcodeにAI支援プログラミング機能を拡張することについて議論した。また,NLP技術をソフトウェア自然性に取り入れる上での課題と機会,高度なコーディング支援を開発者に与えること,ソフトウェア開発プロセスの合理化について述べる。 This paper provides a comprehensive review of the literature concerning the utilization of Natural Language Processing (NLP) techniques, with a particular focus on transformer-based large language models (LLMs) trained using Big Code, within the domain of AI-assisted programming tasks. LLMs, augmented with software naturalness, have played a crucial role in facilitating AI-assisted programming applications, including code generation, code completion, code translation, code refinement, code summarization, defect detection, and clone detection. Notable examples of such applications include the GitHub Copilot powered by OpenAI's Codex and DeepMind AlphaCode. This paper presents an overview of the major LLMs and their applications in downstream tasks related to AI-assisted programming. Furthermore, it explores the challenges and opportunities associated with incorporating NLP techniques with software naturalness in these applications, with a discussion on extending AI-assisted programming capabilities to Apple's Xcode for mobile software development. 
This paper also presents the challenges of and opportunities for incorporating NLP techniques with software naturalness, empowering developers with advanced coding assistance and streamlining the software development process.	翻訳日:2023-07-07 16:42:49 公開日:2023-07-04
# 数学エージェント:計算基盤、数学的埋め込み、ゲノム学 Math Agents: Computational Infrastructure, Mathematical Embedding, and Genomics ( http://arxiv.org/abs/2307.02502v1 ) ライセンス: Link先を確認	Melanie Swan, Takashi Kido, Eric Roland, Renato P. dos Santos	(参考訳) 生成AIの進歩は、よりアクセスしやすい数学によって促進される可能性がある。人間-AIチャット以外にも、大規模言語モデル(LLM)はプログラミング、アルゴリズム発見、定理証明に現れているが、ゲノム応用は限られている。本稿では、GPTベースのワークフローを用いて、数学エージェントと数学埋め込みを「ムーアの数学法則」の新たなエントリとして導入し、方程式を文学からLaTeXおよびPython形式に変換する。多くのデジタル方程式表現が存在するが、大規模な自動評価ツールがない。 LLMは言語ユーザインタフェースとして重要であり、人間のAIチャットや大規模AI支援計算インフラのための形式言語に自然言語アクセスを提供する。無限の形式的な可能性空間を考えると、数学と相互作用する数学エージェントは、私たちを「大きなデータ」から「大きな数学」に変える可能性がある。より柔軟な自然言語とは異なり、Mathには証明の対象となる特性があり、AIアライメントを目的とした高い精度の数学認証アイコンのような従来のアプリケーションを超えて使用することができる。本研究の目的は、マルチスカラー物理数学を病気モデルやゲノムデータに適用することにより、情報システム生物学の老化問題に対処するため、数学エージェントと数学的埋め込みを利用することである。エピソード記憶を持つ生成AIは、SIR精度健康モデルを用いて、縦断的な健康記録における因果関係を分析するのに役立つ。ゲノムデータは未解決のアルツハイマー病問題に対処するために提案されている。 The advancement in generative AI could be boosted with more accessible mathematics. Beyond human-AI chat, large language models (LLMs) are emerging in programming, algorithm discovery, and theorem proving, yet their genomics application is limited. This project introduces Math Agents and mathematical embedding as fresh entries to the "Moore's Law of Mathematics", using a GPT-based workflow to convert equations from literature into LaTeX and Python formats. While many digital equation representations exist, there's a lack of automated large-scale evaluation tools. LLMs are pivotal as linguistic user interfaces, providing natural language access for human-AI chat and formal languages for large-scale AI-assisted computational infrastructure. Given the infinite formal possibility spaces, Math Agents, which interact with math, could potentially shift us from "big data" to "big math". Math, unlike the more flexible natural language, has properties subject to proof, enabling its use beyond traditional applications like high-validation math-certified icons for AI alignment aims. 
This project aims to use Math Agents and mathematical embeddings to address the ageing issue in information systems biology by applying multiscalar physics mathematics to disease models and genomic data. Generative AI with episodic memory could help analyse causal relations in longitudinal health records, using SIR Precision Health models. Genomic data is suggested for addressing the unsolved Alzheimer's disease problem.	翻訳日:2023-07-07 16:42:30 公開日:2023-07-04
# アルゴリズム依存ラデマッハ錯体による一般化保証 Generalization Guarantees via Algorithm-dependent Rademacher Complexity ( http://arxiv.org/abs/2307.02501v1 ) ライセンス: Link先を確認	Sarah Sachs, Tim van Erven, Liam Hodgkinson, Rajiv Khanna, Umut Simsekli	(参考訳) アルゴリズムとデータ依存の一般化境界は、現代の機械学習アルゴリズムの一般化挙動を説明するために必要である。この文脈では、(様々な形の)相互情報を含む情報理論の一般化境界と、仮説集合の安定性に基づく境界が存在する。本稿では、アルゴリズムとデータ依存仮説クラスの経験的ラデマッハ複雑性である一般化誤差を制御するための概念的だが技術的に異なる複雑性尺度を提案する。 Rademacher複雑性の標準的な性質と、このクラスの便利な構造を組み合わせることで、我々は、 (i)有限フラクタル次元に基づく新たな境界を得る。 (a)従来のフラクタル次元型境界を連続から有限の仮説クラスに拡張し、 b) 先行業務において必要とされた相互情報用語を避けること (II) 確率勾配降下に対する最近の次元独立一般化の証明を大幅に単純化する。 (iii)条件付き相互情報に基づくアプローチと同様に,vcクラスや圧縮スキームの結果の復元が容易である。 Algorithm- and data-dependent generalization bounds are required to explain the generalization behavior of modern machine learning algorithms. In this context, there exists information theoretic generalization bounds that involve (various forms of) mutual information, as well as bounds based on hypothesis set stability. We propose a conceptually related, but technically distinct complexity measure to control generalization error, which is the empirical Rademacher complexity of an algorithm- and data-dependent hypothesis class. Combining standard properties of Rademacher complexity with the convenient structure of this class, we are able to (i) obtain novel bounds based on the finite fractal dimension, which (a) extend previous fractal dimension-type bounds from continuous to finite hypothesis classes, and (b) avoid a mutual information term that was required in prior work; (ii) we greatly simplify the proof of a recent dimension-independent generalization bound for stochastic gradient descent; and (iii) we easily recover results for VC classes and compression schemes, similar to approaches based on conditional mutual information.	翻訳日:2023-07-07 16:42:07 公開日:2023-07-04
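As a rough illustration of the quantity named in the abstract above, the empirical Rademacher complexity of a finite hypothesis class can be estimated by Monte Carlo sampling of random signs. This is only a generic textbook sketch: the function name `empirical_rademacher`, the toy classes, and the number of draws are assumptions made for illustration, and the algorithm- and data-dependent class studied in the paper is not constructed here.

```python
import itertools
import numpy as np

def empirical_rademacher(preds, n_draws=2000, seed=0):
    """Monte Carlo estimate of E_sigma[ max_h (1/n) * sum_i sigma_i * h(x_i) ].

    preds: array of shape (num_hypotheses, n) holding each hypothesis'
           output on the n sample points (hypothetical toy input).
    """
    preds = np.asarray(preds, dtype=float)
    n = preds.shape[1]
    rng = np.random.default_rng(seed)
    # Each row of sigma is one draw of n independent Rademacher signs.
    sigma = rng.choice([-1.0, 1.0], size=(n_draws, n))
    # corr[k, j] = (1/n) * <sigma_k, h_j>
    corr = sigma @ preds.T / n
    # Supremum over the class for each draw, then average over draws.
    return float(corr.max(axis=1).mean())

# Toy classes on 4 sample points: every sign pattern vs. a single hypothesis.
full_class = np.array(list(itertools.product([-1.0, 1.0], repeat=4)))
single = full_class[:1]

print("all sign patterns:", empirical_rademacher(full_class))
print("single hypothesis:", empirical_rademacher(single))
```

A class that can realize every sign pattern always finds a hypothesis matching the random signs, so its complexity is maximal, whereas a single fixed hypothesis correlates with random signs only by chance.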
# 対人訓練による解釈可能なコンピュータビジョンモデル:ロバスト性-解釈可能性結合を解き明かす Interpretable Computer Vision Models through Adversarial Training: Unveiling the Robustness-Interpretability Connection ( http://arxiv.org/abs/2307.02500v1 ) ライセンス: Link先を確認	Delyan Boychev	(参考訳) 最先端のディープニューラルネットワークの複雑性が永久に増大するにつれて、その解釈性を維持することがますます難しくなっている。本研究は,ロバストなモデル作成に使用される敵の訓練の効果を評価することを目的としている。コンピュータビジョンモデルをより解釈可能にすることが示されている。モデルを現実世界にデプロイする場合、解釈性は堅牢性と同じくらい不可欠です。これら2つの課題の相関性を証明するため,局所的特徴重要度法 (SHAP, 統合的勾配法) と特徴可視化技術 (Representation Inversion, Class Specific Image Generation) を用いてモデルを広範囲に検討した。標準モデルは、ロバストに比べて敵の攻撃の影響を受けやすく、その学習された表現は人間にとって意味をなさない。逆に、これらのモデルは予測をサポートする画像の特徴的な領域に焦点を当てている。さらに、ロバストモデルによって学習される機能は、実際のものに近い。 With the perpetual increase in the complexity of state-of-the-art deep neural networks, maintaining their interpretability becomes an increasingly challenging task. Our work aims to evaluate the effects of adversarial training, which is used to produce robust models that are less vulnerable to adversarial attacks. It has been shown to make computer vision models more interpretable. Interpretability is as essential as robustness when we deploy the models to the real world. To prove the correlation between these two problems, we extensively examine the models using local feature-importance methods (SHAP, Integrated Gradients) and feature visualization techniques (Representation Inversion, Class Specific Image Generation). Standard models, compared to robust ones, are more susceptible to adversarial attacks, and their learned representations are less meaningful to humans. Conversely, these models focus on distinctive regions of the images which support their predictions. Moreover, the features learned by the robust model are closer to the real ones.	翻訳日:2023-07-07 16:41:54 公開日:2023-07-04
# mPLUG-DocOwl:文書理解のためのモジュール化多モーダル大言語モデル mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding ( http://arxiv.org/abs/2307.02499v1 ) ライセンス: Link先を確認	Jiabo Ye, Anwen Hu, Haiyang Xu, Qinghao Ye, Ming Yan, Yuhao Dan, Chenlin Zhao, Guohai Xu, Chenliang Li, Junfeng Tian, Qian Qi, Ji Zhang, Fei Huang	(参考訳) 文書理解とは、ウェブページのような様々なタイプのデジタル文書から情報を自動的に抽出し、分析し、理解することである。 mPLUG-Owlを含む既存のMLLM(Multimodal Large Language Models)は、浅いOCRフリーテキスト認識において、有望なゼロショット機能を示し、OCRフリー文書理解の可能性を示している。それにもかかわらず、ドメイン内のトレーニングなしでは、これらのモデルは、OCRのない文書理解に不可欠な、洗練されたテーブルや大きなテキストブロックのような細粒度のOCR機能を無視する傾向にある。本稿では,OCRフリー文書理解のためのmPLUG-Owlに基づくmPLUG-DocOwlを提案する。具体的には、まず、幅広い視覚的テキスト理解タスクを特徴とするインストラクションチューニングデータセットを構築する。次に,ocrフリーな文書理解能力を強化し,言語のみ,汎用視覚言語,文書命令チューニングデータセットを統一した命令チューニング戦略で共同で学習する。また、OCRフリーな文書命令理解評価セットLLMDocを構築し、コンプライアンスと文書理解に関するモデルの能力をよりよく比較する。実験結果から,本モデルは既存のマルチモーダルモデルよりも優れており,文書理解の強力な能力を示している。さらに、特定の微調整なしに、mPLUG-DocOwlは様々な下流タスクをうまく一般化する。私たちのコード、モデル、トレーニングデータ、評価セットは https://github.com/X-PLUG/mPLUG-DocOwl で公開されています。 Document understanding refers to automatically extracting, analyzing, and comprehending information from various types of digital documents, such as a web page. Existing Multimodal Large Language Models (MLLMs), including mPLUG-Owl, have demonstrated promising zero-shot capabilities in shallow OCR-free text recognition, indicating their potential for OCR-free document understanding. Nevertheless, without in-domain training, these models tend to ignore fine-grained OCR features, such as sophisticated tables or large blocks of text, which are essential for OCR-free document understanding. In this paper, we propose mPLUG-DocOwl based on mPLUG-Owl for OCR-free document understanding. Specifically, we first construct an instruction tuning dataset featuring a wide range of visual-text understanding tasks. 
Then, we strengthen the OCR-free document understanding ability by jointly training the model on language-only, general vision-and-language, and document instruction tuning datasets with our unified instruction tuning strategy. We also build an OCR-free document instruction understanding evaluation set, LLMDoc, to better compare models' capabilities on instruction compliance and document understanding. Experimental results show that our model outperforms existing multi-modal models, demonstrating its strong ability of document understanding. Besides, without specific fine-tuning, mPLUG-DocOwl generalizes well on various downstream tasks. Our code, models, training data and evaluation set are available at https://github.com/X-PLUG/mPLUG-DocOwl.	翻訳日:2023-07-07 16:41:36 公開日:2023-07-04
# マルチゲージ水文変動データ同化:多層パーセプトロンとベイズ誘導多変量回帰を用いた空間勾配による地域化学習 Multi-gauge Hydrological Variational Data Assimilation: Regionalization Learning with Spatial Gradients using Multilayer Perceptron and Bayesian-Guided Multivariate Regression ( http://arxiv.org/abs/2307.02497v1 ) ライセンス: Link先を確認	Ngo Nghi Truyen Huynh, Pierre-Andr\'e Garambois, Fran\c{c}ois Colleoni, Benjamin Renard, H\'el\`ene Roux (IMFT)	(参考訳) 空間的に分散した水文パラメータを推定する難しい問題、特に未開水路の洪水について、この寄与は、高分解能な水文モデルのために設計された複雑な地域移動関数を学習するための、新しいシームレスな地域化技術である。転送関数は以下の通りである。 (i)勾配計算のシームレスな流れを可能にした多層パーセプトロンは、機械学習の最適化アルゴリズムを用いる。 (II)変分データ同化アルゴリズムにより最適化され,ベイズ推定により導かれる多変量回帰写像は,実現可能な解の不等式問題に対処する。この手法は、推定可能な地域化写像を微分可能な水文モデルに組み込んで、正確な随伴型空間分布勾配を持つマルチゲージデータに基づいて計算されるコスト関数を最適化する。 Tackling the difficult problem of estimating spatially distributed hydrological parameters, especially for floods on ungauged watercourses, this contribution presents a novel seamless regionalization technique for learning complex regional transfer functions designed for high-resolution hydrological models. The transfer functions rely on: (i) a multilayer perceptron enabling a seamless flow of gradient computation to employ machine learning optimization algorithms, or (ii) a multivariate regression mapping optimized by variational data assimilation algorithms and guided by Bayesian estimation, addressing the equifinality issue of feasible solutions. The approach involves incorporating the inferable regionalization mappings into a differentiable hydrological model and optimizing a cost function computed on multi-gauge data with accurate adjoint-based spatially distributed gradients.	翻訳日:2023-07-07 16:41:11 公開日:2023-07-04
# Invertible Neural Networks and Error Diffusion を用いた導電性マップによる気泡分布の再構築 Learning to reconstruct the bubble distribution with conductivity maps using Invertible Neural Networks and Error Diffusion ( http://arxiv.org/abs/2307.02496v1 ) ライセンス: Link先を確認	Nishant Kumar, Lukas Krause, Thomas Wondrak, Sven Eckert, Kerstin Eckert, Stefan Gumhold	(参考訳) 電解はエコフレンドリーな水素生産には不可欠であるが、反応の妨げとなり、セル効率が低下し、エネルギー消費が増加する。さらに、これらのガス気泡は細胞内部の伝導度の変化を引き起こし、細胞周囲の誘導磁場に対応する変化をもたらす。したがって, 外部磁場センサを用いてこれらのガス気泡誘起磁場変動を測定し, バイオサバルト法則の逆問題を解くことにより, セル内の伝導度を推定し, 気泡の大きさと位置を推定することができる。しかし、少数の磁場測定から高分解能導電率マップを決定することは、逆問題である。これを解決するために,Invertible Neural Networks (INNs) を用いて導電性フィールドを再構築する。その結果,tikhonov正則化に比べ,innははるかに優れた性能が得られることがわかった。 Electrolysis is crucial for eco-friendly hydrogen production, but gas bubbles generated during the process hinder reactions, reduce cell efficiency, and increase energy consumption. Additionally, these gas bubbles cause changes in the conductivity inside the cell, resulting in corresponding variations in the induced magnetic field around the cell. Therefore, measuring these gas bubble-induced magnetic field fluctuations using external magnetic sensors and solving the inverse problem of Biot-Savart Law allows for estimating the conductivity in the cell and, thus, bubble size and location. However, determining high-resolution conductivity maps from only a few induced magnetic field measurements is an ill-posed inverse problem. To overcome this, we exploit Invertible Neural Networks (INNs) to reconstruct the conductivity field. Our qualitative results and quantitative evaluation using random error diffusion show that INN achieves far superior performance compared to Tikhonov regularization.	翻訳日:2023-07-07 16:40:53 公開日:2023-07-04
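The abstract above uses Tikhonov regularization as the baseline for its ill-posed inverse problem. A minimal sketch of that baseline on a generic linear problem follows; the random matrix `A` merely stands in for a forward operator and is an assumption, not the Biot-Savart model or the sensor layout used in the paper.

```python
import numpy as np

def tikhonov_solve(A, b, lam):
    """Return argmin_x ||A x - b||^2 + lam * ||x||^2 via the normal equations."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    n_unknowns = A.shape[1]
    # (A^T A + lam * I) x = A^T b; lam > 0 makes the system well-posed
    # even when there are fewer measurements than unknowns.
    return np.linalg.solve(A.T @ A + lam * np.eye(n_unknowns), A.T @ b)

# Hypothetical toy setup: 5 measurements, 20 unknowns (underdetermined).
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 20))
x_true = rng.normal(size=20)
b = A @ x_true

x_hat = tikhonov_solve(A, b, lam=1e-3)
print("data residual:", np.linalg.norm(A @ x_hat - b))
print("reconstruction error:", np.linalg.norm(x_hat - x_true))
```

With so few measurements the regularized solution fits the data closely but need not resemble `x_true`, which is the kind of limitation that motivates learned reconstruction approaches.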
# 産業画像解析のためのパッチベースオートエンコーダの画像または潜在空間の異常検出 Anomaly detection in image or latent space of patch-based auto-encoders for industrial image analysis ( http://arxiv.org/abs/2307.02495v1 ) ライセンス: Link先を確認	Nicolas Pinon (MYRIAD), Robin Trombetta (MYRIAD), Carole Lartizien (MYRIAD)	(参考訳) 本研究では,パッチベースのオートエンコーダを用いたカラー画像の異常検出手法について検討した。まず、原画像と再構成の誤差に基づいて、3種類の手法の性能を比較し、第2に、潜在空間における正規像分布の支持推定、第3に、原画像と再構成画像の復元版との誤差について比較する。これらの手法を産業画像データベースMVTec ADで評価し、2つの最先端技術と比較した。 We study several methods for detecting anomalies in color images, constructed on patch-based auto-encoders. We compare the performance of three types of methods based, first, on the error between the original image and its reconstruction, second, on the support estimation of the normal image distribution in the latent space, and third, on the error between the original image and a restored version of the reconstructed image. These methods are evaluated on the industrial image database MVTec AD and compared to two competitive state-of-the-art methods.	翻訳日:2023-07-07 16:40:38 公開日:2023-07-04
# 深層強化学習における転校学習:調査 Transfer Learning in Deep Reinforcement Learning: A Survey ( http://arxiv.org/abs/2009.07888v7 ) ライセンス: Link先を確認	Zhuangdi Zhu, Kaixiang Lin, Anil K. Jain, and Jiayu Zhou	(参考訳) 強化学習は、シーケンシャルな意思決定問題を解決するための学習パラダイムである。近年,深層ニューラルネットワークの急速な発展に伴い,強化学習が著しく進展している。ロボット工学やゲームプレイングといった多くの分野における強化学習の有望な展望とともに、翻訳学習は、強化学習が直面する様々な課題に取り組み、外部の専門知識から知識を移譲して学習プロセスの効率化と有効性を促進する。本研究では,深層強化学習の文脈における転校学習アプローチの最近の進歩を体系的に調査する。具体的には,最先端のトランスファー学習のアプローチを分類するためのフレームワークを提供し,それらの目標,方法論,互換性のある強化学習バックボーン,実践的応用について分析する。また,強化学習の観点からは,転校学習と関連する他の話題との関係を導き,今後の研究の進展を待ち受けている課題を探究する。 Reinforcement learning is a learning paradigm for solving sequential decision-making problems. Recent years have witnessed remarkable progress in reinforcement learning upon the fast development of deep neural networks. Along with the promising prospects of reinforcement learning in numerous domains such as robotics and game-playing, transfer learning has arisen to tackle various challenges faced by reinforcement learning, by transferring knowledge from external expertise to facilitate the efficiency and effectiveness of the learning process. In this survey, we systematically investigate the recent progress of transfer learning approaches in the context of deep reinforcement learning. Specifically, we provide a framework for categorizing the state-of-the-art transfer learning approaches, under which we analyze their goals, methodologies, compatible reinforcement learning backbones, and practical applications. We also draw connections between transfer learning and other relevant topics from the reinforcement learning perspective and explore their potential challenges that await future research progress.	翻訳日:2023-07-07 01:04:10 公開日:2023-07-04
# 現実的制約下における量子エンタングルメントパーコレーション Quantum entanglement percolation under a realistic restriction ( http://arxiv.org/abs/2008.09040v2 ) ライセンス: Link先を確認	Shashaank Khanna, Saronath Halder, Ujjwal Sen	(参考訳) Bell と Greenberger-Horne-Zeilinger を回路の遠方または遠方ノード間で確立することの問題は難しく、非常に重要な問題であり、それに対処する戦略はエンタングメント・パーコレーションである。部分絡み合った純二分体絡み合い状態の単層ハニカム格子上の3、2、および1量子ビットの測定を含む量子計測戦略により終端を得る方法を提供する。次に、二層格子に移動し、格子のノード上で許容される局所量子演算と古典的通信の現実的な制限の下で、その格子上に絡み合うパーコレーションを導入する。単層ハニカム格子に適用した場合、既存の手法で同じ現象が達成された場合よりも、実際の実現におけるノイズ効果の低減が求められる。さらに, 2層ハニカム格子に対しては, 現実的制約下での古典的エンタングルメントパーコレーションに対する量子エンタングルメントパーコレーションの利点を報告する。 The problem of establishing Bell and Greenberger-Horne-Zeilinger states between faraway places or distant nodes of a circuit is a difficult and an extremely important one, and a strategy which addresses it is entanglement percolation. We provide a method for attaining the end through a quantum measurement strategy involving three-, two-, and single-qubit measurements on a single-layer honeycomb lattice of partially entangled pure bipartite entangled states. We then move over to a double-layered lattice, and introduce entanglement percolation on that lattice under a realistic restriction on local quantum operations and classical communication allowed on the nodes of the lattice. When applied to a single-layered honeycomb lattice, our strategy would call for less noise effects in an actual realization than when the same phenomenon is attained via existing methods. Moreover, for the double-layered honeycomb lattice, we report advantage of quantum entanglement percolation over classical entanglement percolation under the realistic restriction.	翻訳日:2023-07-07 01:03:53 公開日:2023-07-04
# 非線形PTPチャネルを用いた高速量子状態判別 Fast quantum state discrimination with nonlinear PTP channels ( http://arxiv.org/abs/2111.05977v2 ) ライセンス: Link先を確認	Michael R. Geller	(参考訳) 決定論的正のトレース保存(PTP)チャネルと進化方程式に基づく非線形量子計算のモデルについて検討する。モデルは任意の有限ヒルベルト空間で定義されるが、主な結果は次元$N \! = \! 2$. 有界線型作用素 $X$ 上のすべての正規化可能線型あるいは非線形正写像 $\phi$ に対して、関連する正規化 PTP チャネル $ \phi(X) / {\rm tr}[\phi(X)]$ が存在する。正規化されたPTPチャネルは、相互作用するボソンに対するグロス=ピタエフスキー方程式のようなユニタリ平均場理論や、線形および非線形散逸のモデルを含む。それらは4つのタイプに分類され、計算力を探索する3種類の非線形性をもたらす。クビットの場合、これらのチャネルは以前に研究されたブロッホ球のねじれやその他の歪みをサポートし、そのような非線形性は1対のクビット状態の分離を増大させることで、状態判別の指数的なスピードアップを示唆している。このアイデアに基づいて、この操作を消散を用いて雑音に頑健にすることで、一対の固定点が本質的にフォールトトレラントな非線形状態判別器を生成する新しい位相への分岐を誘導することができると論じる。 We investigate models of nonlinear quantum computation based on deterministic positive trace-preserving (PTP) channels and evolution equations. The models are defined in any finite Hilbert space, but the main results are for dimension $N \! = \! 2$. For every normalizable linear or nonlinear positive map $\phi$ on bounded linear operators $X$, there is an associated normalized PTP channel $ \phi(X) / {\rm tr}[\phi(X)]$. Normalized PTP channels include unitary mean field theories, such as the Gross-Pitaevskii equation for interacting bosons, as well as models of linear and nonlinear dissipation. They classify into 4 types, yielding 3 distinct forms of nonlinearity whose computational power we explore. In the qubit case these channels support Bloch ball torsion and other distortions studied previously, where it has been shown that such nonlinearity can be used to increase the separation between a pair of close qubit states, suggesting an exponential speedup for state discrimination. Building on this idea, we argue that this operation can be made robust to noise by using dissipation to induce a bifurcation to a novel phase where a pair of attracting fixed points create an intrinsically fault-tolerant nonlinear state discriminator.	翻訳日:2023-07-07 00:57:52 公開日:2023-07-04
# 過パラメータ化からの導出性:負のパーセプトロンの例 Tractability from overparametrization: The example of the negative perceptron ( http://arxiv.org/abs/2110.15824v3 ) ライセンス: Link先を確認	Andrea Montanari, Yiqiao Zhong, Kangjie Zhou	(参考訳) 負のパーセプトロン問題では、$n$ data points $({\boldsymbol x}_i,y_i)$、ただし${\boldsymbol x}_i$は$d$-dimensional vector、$y_i\in\{+1,-1\}$はバイナリラベルである。データは線形分離可能ではなく、従って、最大の可能な 'emph{ negative} マージンを持つ線形分類器を見つけるのに満足する。言い換えれば、単位ノルムベクトル ${\boldsymbol \theta}$ を見つけて、$\min_{i\le n}y_i\langle {\boldsymbol \theta},{\boldsymbol x}_i\rangle$ を最大化する。これは非凸最適化問題(ポリトープ内の最大ノルムベクトルを見つけるのと同値)であり、データに対する2つのランダムモデルの下でその典型的な性質を調べる。我々は、$n,d\to \infty$と$n/d\to\delta$の比例漸近を考慮し、その逆関数 $\delta_{\text{s}}(\kappa)$ の最大辺 $\kappa_{\text{s}}(\delta)$ あるいは -- 等価に) の上と下の境界を証明している。言い換えると、$\delta_{\text{s}}(\kappa)$はオーバーパラメトリゼーションしきい値である: for $n/d\le \delta_{\text{s}}(\kappa)-\varepsilon$ a classifier 消滅するトレーニングエラーを達成することは高い確率で存在し、$n/d\ge \delta_{\text{s}}(\kappa)+\varepsilon$はそうではない。我々の$\delta_{\text{s}}(\kappa)$は、先頭の順序に$\kappa\to -\infty$と一致します。次に線形計画アルゴリズムを解析して解を見つけ、対応するしきい値 $\delta_{\text{lin}}(\kappa)$ を特徴付ける。我々は補間しきい値 $\delta_{\text{s}}(\kappa)$ と線形計画しきい値 $\delta_{\text{lin}}(\kappa)$ の間のギャップを観察し、他のアルゴリズムの振る舞いの問題を提起する。 In the negative perceptron problem we are given $n$ data points $({\boldsymbol x}_i,y_i)$, where ${\boldsymbol x}_i$ is a $d$-dimensional vector and $y_i\in\{+1,-1\}$ is a binary label. The data are not linearly separable and hence we content ourselves to find a linear classifier with the largest possible \emph{negative} margin. In other words, we want to find a unit norm vector ${\boldsymbol \theta}$ that maximizes $\min_{i\le n}y_i\langle {\boldsymbol \theta},{\boldsymbol x}_i\rangle$. This is a non-convex optimization problem (it is equivalent to finding a maximum norm vector in a polytope), and we study its typical properties under two random models for the data. 
We consider the proportional asymptotics in which $n,d\to \infty$ with $n/d\to\delta$, and prove upper and lower bounds on the maximum margin $\kappa_{\text{s}}(\delta)$ or -- equivalently -- on its inverse function $\delta_{\text{s}}(\kappa)$. In other words, $\delta_{\text{s}}(\kappa)$ is the overparametrization threshold: for $n/d\le \delta_{\text{s}}(\kappa)-\varepsilon$ a classifier achieving vanishing training error exists with high probability, while for $n/d\ge \delta_{\text{s}}(\kappa)+\varepsilon$ it does not. Our bounds on $\delta_{\text{s}}(\kappa)$ match to the leading order as $\kappa\to -\infty$. We then analyze a linear programming algorithm to find a solution, and characterize the corresponding threshold $\delta_{\text{lin}}(\kappa)$. We observe a gap between the interpolation threshold $\delta_{\text{s}}(\kappa)$ and the linear programming threshold $\delta_{\text{lin}}(\kappa)$, raising the question of the behavior of other algorithms.	翻訳日:2023-07-07 00:57:25 公開日:2023-07-04
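The objective in the abstract above, $\min_{i\le n}y_i\langle {\boldsymbol \theta},{\boldsymbol x}_i\rangle$ over unit-norm ${\boldsymbol \theta}$, can be evaluated directly. The sketch below computes that margin and runs a crude random search for a good direction; the helper names, the Gaussian toy data, and the random search itself are illustrative assumptions and are not the linear programming algorithm analyzed in the paper.

```python
import numpy as np

def margin(theta, X, y):
    """Margin min_i y_i <theta, x_i> after rescaling theta to unit norm."""
    theta = np.asarray(theta, dtype=float)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    theta = theta / np.linalg.norm(theta)
    return float(np.min(y * (X @ theta)))

def random_search_margin(X, y, n_trials=1000, seed=0):
    """Best margin over random Gaussian directions (a naive baseline only)."""
    rng = np.random.default_rng(seed)
    candidates = rng.normal(size=(n_trials, X.shape[1]))
    return max(margin(theta, X, y) for theta in candidates)

# Hypothetical toy data: n = 30 points in d = 5 dimensions with random labels.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))
y = rng.choice([-1.0, 1.0], size=30)
print("best margin found:", random_search_margin(X, y))
```

A negative value means every candidate direction misclassifies at least one point, which is the regime the negative perceptron problem is concerned with.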
# 確率勾配降下法における適応バッチサイズ選択戦略の等価性について On the equivalence of different adaptive batch size selection strategies for stochastic gradient descent methods ( http://arxiv.org/abs/2109.10933v2 ) ライセンス: Link先を確認	Luis Espath, Sebastian Krumscheid, Ra\'ul Tempone, Pedro Vilanova	(参考訳) 本研究では,\epsilon^2=\theta^2+\nu^2}\,\theta$ および $\nu$ の特定の選択をした場合の確率的勾配降下 (sgd) 法に関連する収束率の観点から,ノルム検定と内積/直交性試験が等価であることを示す。ここで、$\epsilon$は勾配のノルムの相対統計誤差を制御し、$\theta$と$\nu$は勾配の方向と勾配の直交方向の相対統計誤差をそれぞれ制御する。さらに,もし$\theta$ と $\nu$ が最適に選択されれば,内積/オルトゴナリティテストは最善のケースではノルムテストと同じくらい安価になるが,内積/オルトゴナリティテストは$\epsilon^2=\theta^2+\nu^2$なら計算的に安くなることはない。最後に,2つの確率的最適化問題を提案する。 In this study, we demonstrate that the norm test and inner product/orthogonality test presented in \cite{Bol18} are equivalent in terms of the convergence rates associated with Stochastic Gradient Descent (SGD) methods if $\epsilon^2=\theta^2+\nu^2$ with specific choices of $\theta$ and $\nu$. Here, $\epsilon$ controls the relative statistical error of the norm of the gradient while $\theta$ and $\nu$ control the relative statistical error of the gradient in the direction of the gradient and in the direction orthogonal to the gradient, respectively. Furthermore, we demonstrate that the inner product/orthogonality test can be as inexpensive as the norm test in the best case scenario if $\theta$ and $\nu$ are optimally selected, but the inner product/orthogonality test will never be more computationally affordable than the norm test if $\epsilon^2=\theta^2+\nu^2$. Finally, we present two stochastic optimization problems to illustrate our results.	翻訳日:2023-07-07 00:56:19 公開日:2023-07-04
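The relation $\epsilon^2=\theta^2+\nu^2$ in the abstract above mirrors a Pythagorean split of the gradient noise into a part along the mean gradient and a part orthogonal to it. The sketch below only checks that split numerically on synthetic per-sample gradients; the function name and the data are assumptions, and the exact test conditions and thresholds are those of the cited paper, not reproduced here.

```python
import numpy as np

def variance_split(grads):
    """Split the mean squared deviation of per-sample gradients from their mean
    into a component along the mean gradient and an orthogonal component.

    grads: array of shape (n_samples, d), hypothetical per-sample gradients.
    Returns (total, parallel, orthogonal).
    """
    G = np.asarray(grads, dtype=float)
    g = G.mean(axis=0)                    # mean (batch) gradient
    g_hat = g / np.linalg.norm(g)         # unit vector along the mean gradient
    D = G - g                             # deviations from the mean
    par = D @ g_hat                       # signed length along g_hat, shape (n,)
    orth = D - np.outer(par, g_hat)       # remainder, orthogonal to g_hat
    total = np.mean(np.sum(D ** 2, axis=1))
    parallel = np.mean(par ** 2)
    orthogonal = np.mean(np.sum(orth ** 2, axis=1))
    return total, parallel, orthogonal

rng = np.random.default_rng(0)
sample_grads = rng.normal(loc=1.0, size=(64, 5))
total, parallel, orthogonal = variance_split(sample_grads)
print(total, parallel + orthogonal)
```

Because the total deviation is the sum of the two parts, controlling the parallel and orthogonal errors separately bounds the error of the gradient norm, which is consistent with the equivalence stated in the abstract.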
# 半マルコフモデルを用いた適応前方シミュレーション時間(AFST)を用いたロボットナビゲーションの強化学習 Reinforcement Learning for Robot Navigation with Adaptive Forward Simulation Time (AFST) in a Semi-Markov Model ( http://arxiv.org/abs/2108.06161v4 ) ライセンス: Link先を確認	Yu'an Chen, Ruosong Ye, Ziyang Tao, Hongjian Liu, Guangda Chen, Jie Peng, Jun Ma, Yu Zhang, Jianmin Ji and Yanyong Zhang	(参考訳) 深部強化学習(DRL)アルゴリズムは、知覚入力を直接ロボット制御コマンドにマッピングすることで、特に未知の環境でロボットナビゲーションに有効であることが証明されている。しかし、既存の手法の多くはナビゲーションの局所的な最小問題を無視しており、複雑な未知の環境を扱えない。本稿では,適応フォワードシミュレーション時間 (AFST) と呼ばれる連続的な行動空間を持つ半マルコフ決定プロセス (SMDP) でモデル化されたDRLベースのナビゲーション手法を提案する。具体的には,動作空間の次元を小さくし,特定のSMDP問題に対する分散近似ポリシー最適化(DPPO)アルゴリズムを改良し,GAEを修正してSMDPのポリシー勾配をより正確に推定する。様々な未知環境における実験は、AFSTの有効性を示す。 Deep reinforcement learning (DRL) algorithms have proven effective in robot navigation, especially in unknown environments, by directly mapping perception inputs into robot control commands. However, most existing methods ignore the local minimum problem in navigation and thereby cannot handle complex unknown environments. In this paper, we propose the first DRL-based navigation method modeled by a semi-Markov decision process (SMDP) with continuous action space, named Adaptive Forward Simulation Time (AFST), to overcome this problem. Specifically, we reduce the dimensions of the action space and improve the distributed proximal policy optimization (DPPO) algorithm for the specified SMDP problem by modifying its GAE to better estimate the policy gradient in SMDPs. Experiments in various unknown environments demonstrate the effectiveness of AFST.	翻訳日:2023-07-07 00:55:56 公開日:2023-07-04
# 多様体上の最適化:シンプレクティックアプローチ Optimization on manifolds: A symplectic approach ( http://arxiv.org/abs/2107.11231v2 ) ライセンス: Link先を確認	Guilherme França, Alessandro Barp, Mark Girolami, Michael I. Jordan	(参考訳) 統計的機械学習では最適化タスクが不可欠である。近年、動的システムからのツールを活用することで、連続時間システムの適切な離散化を通じて、加速的かつロバストな最適化手法を導出することに大きな関心が寄せられている。しかし、これらのアイデアは主にユークリッド空間や制約のない設定、あるいはリーマン勾配フローに限られている。本研究では, 非線形制約を伴う問題を含む滑らかな多様体上の最適化問題を解くための一般的な枠組みとして, ディラックの制約付きハミルトン系理論の散逸拡張を提案する。「レートマッチング」,すなわち連続時間の収束率を保存する,多様体上の幾何学的/シンプレクティック数値積分器を開発する。特に,最適収束率を局所的に達成できる散逸型RATTLE積分器を提案する。我々の(加速された)アルゴリズムのクラスは単純で効率的なだけでなく、幅広いコンテキストに適用できる。 Optimization tasks are crucial in statistical machine learning. Recently, there has been great interest in leveraging tools from dynamical systems to derive accelerated and robust optimization methods via suitable discretizations of continuous-time systems. However, these ideas have mostly been limited to Euclidean spaces and unconstrained settings, or to Riemannian gradient flows. In this work, we propose a dissipative extension of Dirac's theory of constrained Hamiltonian systems as a general framework for solving optimization problems over smooth manifolds, including problems with nonlinear constraints. We develop geometric/symplectic numerical integrators on manifolds that are "rate-matching," i.e., preserve the continuous-time rates of convergence. In particular, we introduce a dissipative RATTLE integrator able to achieve optimal convergence rate locally. Our class of (accelerated) algorithms is not only simple and efficient but also applicable to a broad range of contexts.	翻訳日:2023-07-07 00:55:38 公開日:2023-07-04
# ビデオ超解像トランス Video Super-Resolution Transformer ( http://arxiv.org/abs/2106.06847v3 ) ライセンス: Link先を確認	Jiezhang Cao, Yawei Li, Kai Zhang, Luc Van Gool	(参考訳) ビデオ超解像(VSR)は、高解像度映像を対応する低解像度バージョンから復元することを目的としており、時空間シーケンス予測問題である。近年,シークエンス・ツー・シーケンス・モデリングの並列計算能力により,Transformerが普及している。したがって、視覚変換器をVSRの解法に適用することは容易である。しかしながら、完全接続された自己接続層とトークン指向のフィードフォワード層を持つトランスの典型的なブロック設計は、以下の2つの理由からvsrには適さない。第一に、完全接続されたセルフアテンション層は、注意マップを計算するために線形層に依存するため、データの局所性を利用するのを怠る。第2に、トークンワイドフィードフォワード層は、VSRにとって重要な特徴アライメントを欠いている。本稿では,VSR に Transformer を適用するための最初の試みを行う。具体的には,まず,局所性情報を利用した理論的理解を伴う空間的時間的畳み込み自己認識層を提案する。第2の課題として,双方向光フロー型フィードフォワード層をデザインし,異なる映像フレーム間の相関を探索し,特徴を整合させる。いくつかのベンチマークデータセットに対する大規模な実験により,提案手法の有効性が示された。コードはhttps://github.com/caojiezhang/vsr-transformerで入手できる。 Video super-resolution (VSR), with the aim to restore a high-resolution video from its corresponding low-resolution version, is a spatial-temporal sequence prediction problem. Recently, Transformer has been gaining popularity due to its parallel computing ability for sequence-to-sequence modeling. Thus, it seems to be straightforward to apply the vision Transformer to solve VSR. However, the typical block design of Transformer with a fully connected self-attention layer and a token-wise feed-forward layer does not fit well for VSR due to the following two reasons. First, the fully connected self-attention layer neglects to exploit the data locality because this layer relies on linear layers to compute attention maps. Second, the token-wise feed-forward layer lacks the feature alignment which is important for VSR since this layer independently processes each of the input token embeddings without any interaction among them. In this paper, we make the first attempt to adapt Transformer for VSR. Specifically, to tackle the first issue, we present a spatial-temporal convolutional self-attention layer with a theoretical understanding to exploit the locality information. 
For the second issue, we design a bidirectional optical flow-based feed-forward layer to discover the correlations across different video frames and also align features. Extensive experiments on several benchmark datasets demonstrate the effectiveness of our proposed method. The code will be available at https://github.com/caojiezhang/VSR-Transformer.	翻訳日:2023-07-07 00:54:48 公開日:2023-07-04
# 構造化バンディットにおける固定予算ベストアーム識別 Fixed-Budget Best-Arm Identification in Structured Bandits ( http://arxiv.org/abs/2106.04763v8 ) ライセンス: Link先を確認	Mohammad Javad Azizi, Branislav Kveton and Mohammad Ghavamzadeh	(参考訳) 固定予算設定におけるベストアーム識別(BAI)は、学習エージェントが一定の回数の観測後に最適な(ベスト)アームを特定する確率を最大化するバンディット問題である。このトピックに関するほとんどの研究は、少数のアームを持つ非構造的な問題を研究しており、適用性が制限される。結合一般化モデルから得られる平均報酬推定値に基づいて準最適なアームを順次除去することにより、構造を組み込んだ一般的で扱いやすいアルゴリズムを提案する。線形および一般化線形モデル(GLM)を用いてアルゴリズムを解析し,G-最適設計に基づく実践的実装を提案する。線形モデルでは,提案アルゴリズムは先行研究と同等の誤差保証を持ち,経験的にも少なくとも同等の性能を示す。 GLMでは、固定予算BAIに対する解析を伴う最初の実用的なアルゴリズムである。 Best-arm identification (BAI) in a fixed-budget setting is a bandit problem where the learning agent maximizes the probability of identifying the optimal (best) arm after a fixed number of observations. Most works on this topic study unstructured problems with a small number of arms, which limits their applicability. We propose a general tractable algorithm that incorporates the structure, by successively eliminating suboptimal arms based on their mean reward estimates from a joint generalization model. We analyze our algorithm in linear and generalized linear models (GLMs), and propose a practical implementation based on a G-optimal design. In linear models, our algorithm has competitive error guarantees to prior works and performs at least as well empirically. In GLMs, this is the first practical algorithm with analysis for fixed-budget BAI.	翻訳日:2023-07-07 00:54:24 公開日:2023-07-04
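The successive elimination idea in the entry above (arXiv 2106.04763) can be sketched in its simplest unstructured form. This is only an illustration under stated assumptions: plain sample means stand in for the reward estimates of the joint generalization model, arms are halved in each stage, and the names `successive_elimination` and `pull` are hypothetical. The G-optimal design used in the paper is not reproduced here.

```python
import math

def successive_elimination(pull, n_arms, budget):
    """Fixed-budget best-arm identification by halving the set of surviving arms.

    pull(arm) returns one (possibly noisy) reward. The budget is split evenly
    over ceil(log2(n_arms)) stages; each stage keeps the better half of the arms.
    """
    arms = list(range(n_arms))
    n_stages = max(1, math.ceil(math.log2(n_arms)))
    per_stage = budget // n_stages
    while len(arms) > 1:
        # At least one pull per arm, so a very small budget may be exceeded.
        pulls_per_arm = max(1, per_stage // len(arms))
        # Assumption: sample means replace the joint model's reward estimates.
        means = {
            a: sum(pull(a) for _ in range(pulls_per_arm)) / pulls_per_arm
            for a in arms
        }
        arms.sort(key=lambda a: means[a], reverse=True)
        arms = arms[: math.ceil(len(arms) / 2)]
    return arms[0]
```

With five arms and a budget of 60, the sketch runs three stages (5, then 3, then 2 surviving arms) and returns the index of the last remaining arm. A structured variant would replace the `means` dictionary with predictions from a fitted linear model or GLM.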
# サンプルモーメントを用いた密度推定のための非古典的パラメータ化 A Non-Classical Parameterization for Density Estimation Using Sample Moments ( http://arxiv.org/abs/2201.04786v5 ) ライセンス: Link先を確認	Guangyu Wu, Anders Lindquist	(参考訳) 確率密度推定は統計学と信号処理の中心的な問題である。モーメント法は、密度推定の重要な手段であるが、それらは一般に、性能に大きく影響する、実現可能な関数の選択に強く依存している。本稿では,そのような関数の選択を必要としないサンプルモーメントを用いた密度推定のための非古典的パラメトリゼーションを提案する。このパラメトリゼーションは二乗ヘリンガー距離によって導かれ、その解は、データに依存しない単純な事前分布のもとで存在し一意であることが証明され、凸最適化によって得られる。パワーモーメントによる推定器に対して、密度推定器の統計的特性と漸近誤差上界を示す。信号処理タスクにおける密度推定器の応用について述べる。シミュレーション結果から, 推定器の性能を, いくつかの手法との比較により検証した。我々の知る限りでは、提案された推定器は、任意の偶数次までのパワーモーメントが標本モーメントと正確に一致し、かつ真の密度が特定の関数クラスに収まると仮定しない、文献における最初のものである。 Probability density estimation is a core problem of statistics and signal processing. Moment methods are an important means of density estimation, but they are generally strongly dependent on the choice of feasible functions, which severely affects the performance. In this paper, we propose a non-classical parametrization for density estimation using sample moments, which does not require the choice of such functions. The parametrization is induced by the squared Hellinger distance, and its solution, which is proved to exist and be unique subject to a simple prior that does not depend on data, can be obtained by convex optimization. Statistical properties of the density estimator, together with an asymptotic error upper bound, are proposed for the estimator by power moments. Applications of the proposed density estimator in signal processing tasks are given. Simulation results validate the performance of the estimator by a comparison to several prevailing methods. To the best of our knowledge, the proposed estimator is the first one in the literature for which the power moments up to an arbitrary even order exactly match the sample moments, while the true density is not assumed to fall within specific function classes.	翻訳日:2023-07-07 00:44:25 公開日:2023-07-04
# 多目的ニューラルアーキテクチャ探索による解釈可能なモデル学習 Learning Interpretable Models Through Multi-Objective Neural Architecture Search ( http://arxiv.org/abs/2112.08645v4 ) ライセンス: Link先を確認	Zachariah Carmichael, Tim Moon, Sam Ade Jacobs	(参考訳) ディープラーニングの記念碑的な進歩は、さまざまな領域で前例のない成果をもたらしている。ディープニューラルネットワークのパフォーマンスは実行可能であるが、そのようなモデルのアーキテクチャ設計と解釈性は非自明である。ニューラルネットワークアーキテクチャの設計を自動化するために、ニューラルネットワークサーチ(NAS)が導入された。最近の進歩により、分散計算と新しい最適化アルゴリズムを活用することで、これらの手法はより実用的になった。しかし、解釈可能性のためにアーキテクチャを最適化する作業はほとんどない。そこで我々は,多目的分散NASフレームワークを提案し,タスク性能と「イントロスペクタビリティ」の両方を最適化する。我々は、非支配的なソート遺伝的アルゴリズム(NSGA-II)と説明可能なAI(XAI)技術を活用し、ドメインの専門家がより理解しやすいアーキテクチャに報いる。このフレームワークは複数の画像分類データセットで評価される。タスクエラーとイントロスペクタビリティを共同で最適化することで、許容可能なエラー内で実行する、より疎結合でデバッグ可能なアーキテクチャが実現できることを実証する。 Monumental advances in deep learning have led to unprecedented achievements across various domains. While the performance of deep neural networks is indubitable, the architectural design and interpretability of such models are nontrivial. Research has been introduced to automate the design of neural network architectures through neural architecture search (NAS). Recent progress has made these methods more pragmatic by exploiting distributed computation and novel optimization algorithms. However, there is little work in optimizing architectures for interpretability. To this end, we propose a multi-objective distributed NAS framework that optimizes for both task performance and "introspectability," a surrogate metric for aspects of interpretability. We leverage the non-dominated sorting genetic algorithm (NSGA-II) and explainable AI (XAI) techniques to reward architectures that can be better comprehended by domain experts. The framework is evaluated on several image classification datasets. We demonstrate that jointly optimizing for task error and introspectability leads to more disentangled and debuggable architectures that perform within tolerable error.	翻訳日:2023-07-07 00:43:54 公開日:2023-07-04
# カメラネットワークにおける人物検索を支援するクロスカメラトラジェクタ Cross-Camera Trajectories Help Person Retrieval in a Camera Network ( http://arxiv.org/abs/2204.12900v3 ) ライセンス: Link先を確認	Xin Zhang and Xiaohua Xie and Jianhuang Lai and Wei-Shi Zheng	(参考訳) オーバラップしないカメラネットワークで撮影した複数のビデオからクエリを検索することに関心がある。既存の手法では、純粋な視覚的マッチングや時間的制約を考慮することが多いが、カメラネットワークの空間情報は無視する。この問題に対処するために,時間情報と空間情報を統合したクロスカメラトラジェクトリ生成に基づく歩行者検索フレームワークを提案する。本研究では,歩行者の歩行習慣とカメラ間の経路配置を統合し,協調確率分布を形成する新しいクロスカメラ時空間モデルを提案する。スパースサンプリングされた歩行者データを用いて、カメラネットワーク内のこのような時空間モデルを特定できる。時空間モデルに基づいて、クロスカメラトラジェクトリを条件付きランダム場モデルにより抽出し、制限された非負行列分解によりさらに最適化することができる。最後に,歩行者検索結果を改善するため,軌道再分類手法を提案する。本手法の有効性を検証するため,実際の監視シナリオにおいて,最初のクロスカメラ歩行者軌跡データセットであるPerson Trajectory Datasetを構築した。提案手法の有効性とロバスト性に関する広範な実験を行った。 We are concerned with retrieving a query person from multiple videos captured by a non-overlapping camera network. Existing methods often rely on purely visual matching or consider temporal constraints but ignore the spatial information of the camera network. To address this issue, we propose a pedestrian retrieval framework based on cross-camera trajectory generation, which integrates both temporal and spatial information. To obtain pedestrian trajectories, we propose a novel cross-camera spatio-temporal model that integrates pedestrians' walking habits and the path layout between cameras to form a joint probability distribution. Such a spatio-temporal model among a camera network can be specified using sparsely sampled pedestrian data. Based on the spatio-temporal model, cross-camera trajectories can be extracted by the conditional random field model and further optimized by restricted non-negative matrix factorization. Finally, a trajectory re-ranking technique is proposed to improve the pedestrian retrieval results. To verify the effectiveness of our method, we construct the first cross-camera pedestrian trajectory dataset, the Person Trajectory Dataset, in real surveillance scenarios. 
Extensive experiments verify the effectiveness and robustness of the proposed method.	翻訳日:2023-07-07 00:37:17 公開日:2023-07-04
# 不完全な有限時間パルスによる完全な誘導ラマン断熱通過 Perfect stimulated Raman adiabatic passage with imperfect finite-time pulses ( http://arxiv.org/abs/2204.05271v2 ) ライセンス: Link先を確認	Shruti Dogra and Gheorghe Sorin Paraoanu	(参考訳) 我々は,STImulated Raman Adiabatic Passage (STIRAP)において,完全なポピュレーション移動を実現する,適切に調整された2つのガウスパルスドライブのシーケンスを示す。我々はストークスパルスとポンプパルスの最適切断と相対配置に関する理論的解析を行った。さらに、与えられたパルス幅に対するプロトコルの電力と持続時間を得る。重要なことに、所望の忠実度の値を達成するために必要なプロトコルの持続時間は、不忠実度に対数的にのみ依存する。ドライブの最適切断を前提とし、最速移動の点を基準として、非常に単純で効果的な新しい断熱性基準を得る。 We present a well-tailored sequence of two Gaussian-pulsed drives that achieves perfect population transfer in STImulated Raman Adiabatic Passage (STIRAP). We give a theoretical analysis of the optimal truncation and relative placement of the Stokes and pump pulses. Further, we obtain the power and the duration of the protocol for a given pulse width. Importantly, the duration of the protocol required to attain a desired value of fidelity depends only logarithmically on the infidelity. Subject to optimal truncation of the drives and with reference to the point of fastest transfer, we obtain a new adiabaticity criterion, which is remarkably simple and effective.	翻訳日:2023-07-07 00:36:58 公開日:2023-07-04
# 単純非パラメトリック混合学習の硬さに関する厳密な境界 Tight Bounds on the Hardness of Learning Simple Nonparametric Mixtures ( http://arxiv.org/abs/2203.15150v3 ) ライセンス: Link先を確認	Bryon Aragam, Wai Ming Tai	(参考訳) 有限混合系における非パラメトリック分布の学習問題について検討し、そのようなモデルにおける成分分布の学習におけるサンプル複雑性の厳密な境界を確立する。すなわち、$$f=w_1f_1+w_2f_2, \quad w_1+w_2=1, \quad w_1,w_2>0$$ を満たす pdf $f$ からの i.i.d. サンプルが与えられ、各成分 $f_i$ を学習することに関心がある。$f_i$ に仮定がなければ、この問題は不良設定である。成分 $f_i$ を識別するために、各 $f_i$ はガウス分布とコンパクトな台を持つ密度 $\nu_i$ の畳み込みとして書け、$\text{supp}(\nu_1)\cap \text{supp}(\nu_2)=\emptyset$ であると仮定する。主な結果は、各 $f_i$ を推定するために $(\frac{1}{\varepsilon})^{\Omega(\log\log \frac{1}{\varepsilon})}$ 個のサンプルが必要であることを示している。この証明は、ガウス分布による高速な近似レートを与える定量的タウバー型定理に依存しており、これは独立した興味の対象となりうる。これが厳密であることを示すために、各 $f_i$ の推定に $(\frac{1}{\varepsilon})^{O(\log\log \frac{1}{\varepsilon})}$ 個のサンプルを用いるアルゴリズムも提案する。モーメントマッチングとテンソル法に基づく潜在変数モデルを学習する既存のアプローチとは異なり、我々の証明は直交関数を用いた悪条件の線形系の繊細な解析を伴う。これらの境界を組み合わせることで、この問題の最適サンプル複雑性は多項式と指数関数の真の中間にあると結論づけ、これは学習理論では一般的ではない。 We study the problem of learning nonparametric distributions in a finite mixture, and establish tight bounds on the sample complexity for learning the component distributions in such models. Namely, we are given i.i.d. samples from a pdf $f$ where $$ f=w_1f_1+w_2f_2, \quad w_1+w_2=1, \quad w_1,w_2>0 $$ and we are interested in learning each component $f_i$. Without any assumptions on $f_i$, this problem is ill-posed. In order to identify the components $f_i$, we assume that each $f_i$ can be written as a convolution of a Gaussian and a compactly supported density $\nu_i$ with $\text{supp}(\nu_1)\cap \text{supp}(\nu_2)=\emptyset$. Our main result shows that $(\frac{1}{\varepsilon})^{\Omega(\log\log \frac{1}{\varepsilon})}$ samples are required for estimating each $f_i$. The proof relies on a quantitative Tauberian theorem that yields a fast rate of approximation with Gaussians, which may be of independent interest. To show this is tight, we also propose an algorithm that uses $(\frac{1}{\varepsilon})^{O(\log\log \frac{1}{\varepsilon})}$ samples to estimate each $f_i$. 
Unlike existing approaches to learning latent variable models based on moment-matching and tensor methods, our proof instead involves a delicate analysis of an ill-conditioned linear system via orthogonal functions. Combining these bounds, we conclude that the optimal sample complexity of this problem properly lies in between polynomial and exponential, which is not common in learning theory.	翻訳日:2023-07-07 00:36:47 公開日:2023-07-04
# 超伝導回路における超断熱的ポピュレーション移動のスケーリング誤差下でのロバスト性の実験的実証 Experimental demonstration of robustness under scaling errors for superadiabatic population transfer in a superconducting circuit ( http://arxiv.org/abs/2203.12073v2 ) ライセンス: Link先を確認	Shruti Dogra, Antti Vepsäläinen, and Gheorghe Sorin Paraoanu	(参考訳) 超断熱誘導ラマン断熱通過 (saSTIRAP) を用いて, トランスモン回路の基底状態と第2励起状態の間のポピュレーション移動を実験的および理論的に検討した。パルスの振幅の変動(スケーリング誤差)に対して、移動が著しく耐性があることを示し、超断熱過程が断熱過程からある種のロバスト性を継承することを示す。特に,通常のsaSTIRAPの枠組みを超える,反断熱パルス強度の高い値で出現する新しいプラトーの存在を示した。 We study experimentally and theoretically the transfer of population between the ground state and the second excited state in a transmon circuit by the use of superadiabatic stimulated Raman adiabatic passage (saSTIRAP). We show that the transfer is remarkably resilient against variations in the amplitudes of the pulses (scaling errors), thus demonstrating that the superadiabatic process inherits certain robustness features from the adiabatic one. In particular, we put in evidence a new plateau that appears at high values of the counterdiabatic pulse strength, which goes beyond the usual framework of saSTIRAP.	翻訳日:2023-07-07 00:35:20 公開日:2023-07-04
# 回路ノイズの復号の改善と調整表面符号の脆弱な境界 Improved decoding of circuit noise and fragile boundaries of tailored surface codes ( http://arxiv.org/abs/2203.04948v5 ) ライセンス: Link先を確認	Oscar Higgott, Thomas C. Bohdanowicz, Aleksander Kubica, Steven T. Flammia, Earl T. Campbell	(参考訳) 量子計算の可能性を最大限に発揮するには、量子誤り訂正(QEC)が必要である。 QEC符号は、複数のノイズのある物理量子ビットを使用して、より少ない論理量子ビットで情報を符号化し、復号処理による誤りの識別を可能にする。このプロセスは論理的忠実度(または精度)を高め、計算の信頼性を高める。しかし、ほとんどの高速(効率的なランタイム)デコーダは重要なノイズ特性を無視し、精度を低下させる。本研究では,高速かつ高精度で,表面符号を含む多種多様なQEC符号で使用できるデコーダを導入する。我々のデコーダは、belief-matching および belief-find と呼ばれ、すべてのノイズ情報を活用し、QECのより高精度な実証を可能にする。性能指標として表面符号のしきい値を用いると、我々のデコーダでは誤り確率0.94%でしきい値が観測され、標準の最小重み完全マッチングデコーダのしきい値0.82%を上回った。また、バイアスノイズモデルに合わせて調整された符号の理論的ケーススタディにおいて、belief-matching デコーダを検証した。このデコーダは, 標準の正方形表面符号に対して, 調整表面符号において, はるかに高いしきい値と低い量子ビットオーバーヘッドをもたらすことがわかった。驚くべきことに、しきい値を十分に下回る領域では、私たちが"脆弱な境界"と呼ぶ、これまで気付かれていなかった現象のために、長方形の表面符号は調整表面符号よりもリソース効率が高くなる。我々のデコーダは他の全ての高速デコーダをしきい値と精度で上回り、現在の量子誤り訂正実験でより良い結果をもたらし、理論的なケーススタディのための新しい領域を開くことができる。 Realizing the full potential of quantum computation requires quantum error correction (QEC), with most recent breakthrough demonstrations of QEC using the surface code. QEC codes use multiple noisy physical qubits to encode information in fewer logical qubits, enabling the identification of errors through a decoding process. This process increases the logical fidelity (or accuracy) making the computation more reliable. However, most fast (efficient runtime) decoders neglect important noise characteristics, thereby reducing their accuracy. In this work, we introduce decoders that are both fast and accurate, and can be used with a wide class of QEC codes including the surface code. Our decoders, named belief-matching and belief-find, exploit all noise information and thereby unlock higher accuracy demonstrations of QEC. Using the surface code threshold as a performance metric, we observe a threshold at 0.94% error probability for our decoders, outperforming the 0.82% threshold for a standard minimum-weight perfect matching decoder. 
We also tested our belief-matching decoders in a theoretical case study of codes tailored to a biased noise model. We find that the decoders led to a much higher threshold and lower qubit overhead in the tailored surface code with respect to the standard, square surface code. Surprisingly, in the well-below threshold regime, the rectangular surface code becomes more resource-efficient than the tailored surface code, due to a previously unnoticed phenomenon that we call "fragile boundaries". Our decoders outperform all other fast decoders in terms of threshold and accuracy, enabling better results in current quantum error correction experiments and opening up new areas for theoretical case studies.	翻訳日:2023-07-07 00:34:47 公開日:2023-07-04
# 単純後悔最小化のためのメタラーニング Meta-Learning for Simple Regret Minimization ( http://arxiv.org/abs/2202.12888v2 ) ライセンス: Link先を確認	Mohammadjavad Azizi, Branislav Kveton, Mohammad Ghavamzadeh, Sumeet Katariya	(参考訳) バンディットにおける単純後悔最小化のためのメタラーニングフレームワークを開発する。このフレームワークでは、学習エージェントが未知の事前分布から i.i.d. にサンプルされた一連のバンディットタスクと相互作用し、そのメタパラメータを学習して、将来のタスクをよりよく実行する。本稿では,この設定に対する最初のベイズ的および頻度論的メタ学習アルゴリズムを提案する。ベイズアルゴリズムはメタパラメータ上の事前分布にアクセスでき、ホライズン$n$の$m$個のバンディットタスクにわたるメタ単純後悔はわずか$\tilde{O}(m / \sqrt{n})$である。一方、頻度論的アルゴリズムのメタ単純後悔は$\tilde{O}(\sqrt{m} n + m/ \sqrt{n})$である。後悔は悪化するが、メタパラメータ上の事前分布を必要としないため、頻度論的アルゴリズムはより一般的である。より多くの設定で分析することもできる。アルゴリズムをいくつかのバンディット問題のクラスにインスタンス化する。我々のアルゴリズムは一般的であり、いくつかの環境で経験的に評価することで理論を補完する。 We develop a meta-learning framework for simple regret minimization in bandits. In this framework, a learning agent interacts with a sequence of bandit tasks, which are sampled i.i.d. from an unknown prior distribution, and learns its meta-parameters to perform better on future tasks. We propose the first Bayesian and frequentist meta-learning algorithms for this setting. The Bayesian algorithm has access to a prior distribution over the meta-parameters and its meta simple regret over $m$ bandit tasks with horizon $n$ is merely $\tilde{O}(m / \sqrt{n})$. On the other hand, the meta simple regret of the frequentist algorithm is $\tilde{O}(\sqrt{m} n + m/ \sqrt{n})$. While its regret is worse, the frequentist algorithm is more general because it does not need a prior distribution over the meta-parameters. It can also be analyzed in more settings. We instantiate our algorithms for several classes of bandit problems. Our algorithms are general and we complement our theory by evaluating them empirically in several environments.	翻訳日:2023-07-07 00:34:18 公開日:2023-07-04
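To get a feel for the two rates quoted in the entry above (arXiv 2202.12888), the following sketch evaluates $m/\sqrt{n}$ and $\sqrt{m}\,n + m/\sqrt{n}$ directly. It is arithmetic only: constants and the logarithmic factors hidden by $\tilde{O}$ are dropped, and the function names `bayes_rate` and `freq_rate` are hypothetical.

```python
import math

def bayes_rate(m, n):
    """Leading term of the Bayesian meta simple regret, m / sqrt(n)."""
    return m / math.sqrt(n)

def freq_rate(m, n):
    """Leading terms of the frequentist meta simple regret, sqrt(m) * n + m / sqrt(n)."""
    return math.sqrt(m) * n + m / math.sqrt(n)
```

For $m=100$ tasks with horizon $n=10000$, `bayes_rate` gives 1.0 while `freq_rate` gives 100001.0, so under these simplifications the additive $\sqrt{m}\,n$ term accounts for essentially all of the gap between the two bounds.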
# シャッフルチェックインによるプライバシ増幅 Privacy Amplification via Shuffled Check-Ins ( http://arxiv.org/abs/2206.03151v2 ) ライセンス: Link先を確認	Seng Pei Liew, Satoshi Hasegawa, Tsubasa Takahashi	(参考訳) 我々は、信頼できるシャッフラー以上の信頼の仮定を必要とせずに強力なプライバシー保証を実現する、shuffled check-inと呼ばれる分散計算プロトコルについて検討する。既存のほとんどの研究とは異なり、シャッフルチェックインでは、クライアントが独立かつランダムに計算への参加を決定できるため、サーバ主導のサブサンプリングの必要性がなくなる。差分プライバシーを活用することで、シャッフルチェックインはプライバシーの増幅を通じて厳密なプライバシー保証を実現することを示し、既存の研究よりもプライバシー会計を改善するRényi差分プライバシーに基づく新たな分析を行った。また,ガウス機構を含む汎用シャッフル機構のプライバシを追跡する数値的手法を導入する。これは、文献におけるローカル/シャッフルモデル内の分散設定下での汎用メカニズムの最初の評価である。提案手法の有効性を示す実証的研究も行われている。 We study a protocol for distributed computation called shuffled check-in, which achieves strong privacy guarantees without requiring any further trust assumptions beyond a trusted shuffler. Unlike most existing work, shuffled check-in allows clients to make independent and random decisions to participate in the computation, removing the need for server-initiated subsampling. Leveraging differential privacy, we show that shuffled check-in achieves tight privacy guarantees through privacy amplification, with a novel analysis based on Rényi differential privacy that improves privacy accounting over existing work. We also introduce a numerical approach to track the privacy of generic shuffling mechanisms, including the Gaussian mechanism, which is the first evaluation of a generic mechanism under the distributed setting within the local/shuffle model in the literature. Empirical studies are also given to demonstrate the efficacy of the proposed approach.	翻訳日:2023-07-07 00:24:44 公開日:2023-07-04
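The protocol flow described in the entry above (arXiv 2206.03151) can be sketched as a small simulation: each client decides on its own whether to check in, randomizes its value locally, and a shuffler permutes the reports. The sketch makes several assumptions for illustration: the local randomizer is binary randomized response, the parameter names `gamma` and `p_truth` are hypothetical, and no privacy accounting is performed.

```python
import random

def shuffled_check_in(bits, gamma, p_truth, rng):
    """Simulate one round: check-in, local randomization, then shuffling.

    bits: each client's private bit (0 or 1).
    gamma: probability that a client independently decides to participate.
    p_truth: probability that randomized response reports the true bit.
    rng: a random.Random instance, passed in for reproducibility.
    """
    reports = []
    for bit in bits:
        if rng.random() >= gamma:
            continue  # this client skips the round on its own initiative
        # Assumption: binary randomized response as a stand-in local randomizer.
        reports.append(bit if rng.random() < p_truth else 1 - bit)
    rng.shuffle(reports)  # the trusted shuffler hides who sent which report
    return reports
```

A call such as `shuffled_check_in([0, 1, 1, 0, 1], gamma=0.5, p_truth=0.75, rng=random.Random(0))` returns a shuffled list whose length varies from round to round, because the server never selects the participants.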
# 人工ニューラルネットワークによるWehrlモーメントを用いた絡み合いの幾何学的測度の推定 Estimation of the geometric measure of entanglement with Wehrl Moments through Artificial Neural Networks ( http://arxiv.org/abs/2205.15095v3 ) ライセンス: Link先を確認	Jérôme Denis, François Damanet, John Martin	(参考訳) 近年、人工ニューラルネットワーク(ANN)は、量子論、特に絡み合い理論の問題を研究するためのツールとして人気が高まっている。本研究では、状態に関する部分的情報を表す限られた数のWehrlモーメント(状態の伏見関数のモーメント)のみを入力として使用して、対称多量子ビット状態の絡み合いの幾何学的測度をANNがどの程度正確に予測できるかを分析する。純粋な量子状態と混合量子状態の両方を考える。我々は、ANNを訓練して得られる結果を、収束加速法を適切に利用して得られる結果と比較する。我々は、これらのANNを訓練するのに十分なデータが得られることを条件として、同じ入力データを与えられた場合、最も強力な収束加速アルゴリズムのいくつかでさえANNと競合しないことを見いだした。また,状態に依存しない,Wehrlモーメントを計測するための実験プロトコルを提供する。より一般に、この研究は、フル状態トモグラフィーよりも実験でより利用しやすい方法で、絡み合い測度と、Wehrlエントロピーのような他のSU(2)不変量を推定する展望を開く。 In recent years, artificial neural networks (ANNs) have become an increasingly popular tool for studying problems in quantum theory, and in particular entanglement theory. In this work, we analyse to what extent ANNs can accurately predict the geometric measure of entanglement of symmetric multiqubit states using only a limited number of Wehrl moments (moments of the Husimi function of the state) as input, which represents partial information about the state. We consider both pure and mixed quantum states. We compare the results we obtain by training ANNs with the informed use of convergence acceleration methods. We find that even some of the most powerful convergence acceleration algorithms do not compete with ANNs when given the same input data, provided that enough data is available to train these ANNs. We also provide an experimental protocol for measuring Wehrl moments, which is state-independent. More generally, this work opens up perspectives for the estimation of entanglement measures and other SU(2)-invariant quantities, such as the Wehrl entropy, in a way that is more accessible in experiments than by means of full state tomography.	翻訳日:2023-07-07 00:24:27 公開日:2023-07-04
# メッセージパッシングニューラルネットワークは、知識グラフの補完に役立つか? Are Message Passing Neural Networks Really Helpful for Knowledge Graph Completion? ( http://arxiv.org/abs/2205.10652v3 ) ライセンス: Link先を確認	Juanhui Li and Harry Shomer and Jiayuan Ding and Yiqi Wang and Yao Ma and Neil Shah and Jiliang Tang and Dawei Yin	(参考訳) 知識グラフ(KG)は様々な応用を促進する。構築とメンテナンスに多大な努力が払われてきたにもかかわらず、最大のKGでさえ完全にはほど遠い。したがって、KG完了(KGC)はKG研究において最も重要な課題の一つとなっている。近年,強力な埋め込みを学習するためのメッセージパッシング(Graph)ニューラルネットワーク(MPNN)の活用を中心とした研究が盛んに行われている。これらの手法の成功は、追加のメッセージパッシング(MP)コンポーネントを持つことから、より単純な多層パーセプトロン(MLP)モデルではなくMPNNを使うことに当然のように帰されている。この研究で、驚くべきことに単純なMLPモデルでMPNNに匹敵する性能を達成できることがわかり、MPが以前信じられていたほど重要でない可能性が示唆された。さらに,注意深いスコアリング関数と損失関数の設計が,KGCモデルの性能にはるかに強い影響を与えることを示す。これは、先行研究においてスコアリング関数設計、損失関数設計、MPが混同されていたことを示唆しており、現在の最先端KGC手法のスケーラビリティに関する有望な洞察と、将来のKGCタスクにより適したMP設計への注意を促すものである。私たちのコードは、https://github.com/Juanhui28/Are_MPNNs_helpful で公開されています。 Knowledge graphs (KGs) facilitate a wide variety of applications. Despite great efforts in creation and maintenance, even the largest KGs are far from complete. Hence, KG completion (KGC) has become one of the most crucial tasks for KG research. Recently, considerable literature in this space has centered around the use of Message Passing (Graph) Neural Networks (MPNNs), to learn powerful embeddings. The success of these methods is naturally attributed to the use of MPNNs over simpler multi-layer perceptron (MLP) models, given their additional message passing (MP) component. In this work, we find that surprisingly, simple MLP models are able to achieve comparable performance to MPNNs, suggesting that MP may not be as crucial as previously believed. With further exploration, we show careful scoring function and loss function design has a much stronger influence on KGC model performance. 
This suggests a conflation of scoring function design, loss function design, and MP in prior work, with promising insights regarding the scalability of state-of-the-art KGC methods today, as well as careful attention to more suitable MP designs for KGC tasks tomorrow. Our codes are publicly available at: https://github.com/Juanhui28/Are_MPNNs_helpful.	翻訳日:2023-07-07 00:23:25 公開日:2023-07-04
# 弱結合分子の振動準位ラダー下降による光安定化:遺伝的アルゴリズムによる量子最適制御 Vibrational ladder-descending photostabilization of a weakly bound molecule: Quantum optimal control with a genetic algorithm ( http://arxiv.org/abs/2205.06165v2 ) ライセンス: Link先を確認	Mateo Londoño, Julio C. Arce	(参考訳) 極性分子を、同一電子状態内で、高い振動準位から低い目標準位へ駆動する光制御方式を提案する。この方式は、解析的な形状の赤外線チャープレーザーパルスを使用し、パラメータは遺伝的アルゴリズムに基づく量子最適制御のヒューリスティックな定式化によって最適化される。この手法を、最低三重項電子状態にあるKRbフェッシュバッハ分子について計算的に示す。 We propose an optical control scheme for driving a polar molecule from a high-lying vibrational level to a target low-lying one, within the same electronic state. The scheme utilizes an infrared chirped laser pulse with an analytical shape, whose parameters are optimized by means of a heuristic formulation of quantum optimal control based on a genetic algorithm. We illustrate this methodology computationally for a KRb Feshbach molecule in the lowest triplet electronic state.	翻訳日:2023-07-07 00:23:03 公開日:2023-07-04
# 安全・共安全言語の一階述語論理 A first-order logic characterization of safety and co-safety languages ( http://arxiv.org/abs/2209.02307v4 ) ライセンス: Link先を確認	Alessandro Cimatti and Luca Geatti and Nicola Gigante and Angelo Montanari and Stefano Tonetta	(参考訳) LTL(Linear Temporal Logic)は、コンピュータ科学の様々な分野において、最も一般的な時間論理の1つである。 LTL は反自由オメガオートマタ、星のないオメガ正規表現、(カンプの定理により)一階線形順序理論(FO-TLO)と等価である。安全性(safety)とコセーフティ(co-safety)言語は、単語がそれぞれ言語に属さないか属さないかを確立するために有限プレフィックスが十分であり、モデル検査やltlのリアクティブ合成のような問題の複雑さを低下させる上で重要な役割を果たす。 SafetyLTL (resp., coSafetyLTL) はLTLの断片であり、安全(resp., co-safety)言語のみを認識する普遍的(resp., existential)時間的モダリティのみを許容する。この論文の主な貢献は、safetyfoと呼ばれるfo-tloの断片と、ltl-definable safetyとco-safety languageに関して表現的に完結した2つのcosafetyfoの導入である。我々は,これらがそれぞれSafetyLTLとcoSafetyLTLを正確に特徴付けることを証明し,その結果がカンプの定理に一致することを証明し,一階言語の観点からLTLの特徴付け(フラグメント)をより明確にする。さらに、ltlで定義可能な安全言語がsafetyltlでも定義可能であることを直接的でコンパクトで自己完結した証明を与える。副産物として,有限語および無限語で解釈された,明日の弱作用素SafetyLTLの表現力に関する興味深い結果が得られる。さらに、有限語を解釈すると、明日の(弱明日)演算子を欠いたsafetyltl (resp. cosafetyltl) が有限語上のltlの安全(resp., co-safety)フラグメントをキャプチャする。 Linear Temporal Logic (LTL) is one of the most popular temporal logics, that comes into play in a variety of branches of computer science. Among the various reasons of its widespread use there are its strong foundational properties: LTL is equivalent to counter-free omega-automata, to star-free omega-regular expressions, and (by Kamp's theorem) to the First-Order Theory of Linear Orders (FO-TLO). Safety and co-safety languages, where a finite prefix suffices to establish whether a word does not belong or belongs to the language, respectively, play a crucial role in lowering the complexity of problems like model checking and reactive synthesis for LTL. SafetyLTL (resp., coSafetyLTL) is a fragment of LTL where only universal (resp., existential) temporal modalities are allowed, that recognises safety (resp., co-safety) languages only. 
The main contribution of this paper is the introduction of a fragment of FO-TLO, called SafetyFO, and of its dual coSafetyFO, which are expressively complete with respect to the LTL-definable safety and co-safety languages. We prove that they exactly characterize SafetyLTL and coSafetyLTL, respectively, a result that joins Kamp's theorem, and provides a clearer view of the characterization of (fragments of) LTL in terms of first-order languages. In addition, it gives a direct, compact, and self-contained proof that any safety language definable in LTL is definable in SafetyLTL as well. As a by-product, we obtain some interesting results on the expressive power of the weak tomorrow operator of SafetyLTL, interpreted over finite and infinite words. Moreover, we prove that, when interpreted over finite words, SafetyLTL (resp. coSafetyLTL) devoid of the tomorrow (resp., weak tomorrow) operator captures the safety (resp., co-safety) fragment of LTL over finite words.	翻訳日:2023-07-07 00:18:12 公開日:2023-07-04
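The defining property of safety languages mentioned in the entry above (arXiv 2209.02307), namely that a finite prefix is enough to show that a word is not in the language, can be illustrated with the simplest safety formula, $G\,p$ (always $p$). The sketch below is only an illustration: the function name `first_bad_prefix` and the representation of a word as a list of states are assumptions, and it does not implement SafetyLTL or SafetyFO in general.

```python
def first_bad_prefix(trace, invariant):
    """Return the length of the shortest bad prefix for G(invariant), or None.

    trace: finite list of states (any objects).
    invariant: function mapping a state to True or False.
    A bad prefix is one that no continuation can turn into a word satisfying
    G(invariant), because the invariant has already been violated.
    """
    for i, state in enumerate(trace):
        if not invariant(state):
            return i + 1
    return None
```

A result of `None` only means that the observed prefix contains no violation. Membership of an infinite word in a safety language cannot be confirmed from a finite prefix, which is the asymmetry between safety and co-safety languages described in the abstract.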
# 易から難への学習戦略による動的データフリー知識蒸留 Dynamic Data-Free Knowledge Distillation by Easy-to-Hard Learning Strategy ( http://arxiv.org/abs/2208.13648v3 ) ライセンス: Link先を確認	Jingru Li, Sheng Zhou, Liangcheng Li, Haishuai Wang, Zhi Yu, Jiajun Bu	(参考訳) データフリー知識蒸留 (DFKD) は、トレーニングデータが利用できない場合に広く使われる知識蒸留 (KD) 戦略である。訓練データにアクセスせずに、大きな事前訓練された教師モデルの助けを借りて、軽量の学生モデルを訓練する。しかし,既存のDFKD法は,学習中の学生モデルの状態に応じて動的に生成目標を調整しないため,不十分で不安定なトレーニングプロセスに苦しむ。この制限に対処するため,CuDFKDと呼ばれる新しいDFKD法を提案する。これは、人間の学習方法を反映して、易しいものから難しいものへと徐々に疑似サンプルを生成する動的な戦略によって学生を教える。また、CuDFKDは、学生モデルの状態に応じて生成対象を動的に適応させる。さらに, Majorization-Minimization (MM) アルゴリズムの理論解析を行い, CuDFKDの収束性を説明する。 DFKD手法のロバスト性および忠実性を評価するために,さらに2つの指標を提案し,CuDFKDがすべてのデータセットにおいて最先端(SOTA)のDFKD手法に匹敵する性能を持つことを実験により示す。また、我々のCuDFKDは、他のSOTA DFKD法と比べて最も速く収束し、最も堅牢であることを示す。 Data-free knowledge distillation (DFKD) is a widely-used strategy for knowledge distillation (KD) when the training data is not available. It trains a lightweight student model with the aid of a large pretrained teacher model without any access to training data. However, existing DFKD methods suffer from an inadequate and unstable training process, as they do not adjust the generation target dynamically based on the status of the student model during learning. To address this limitation, we propose a novel DFKD method called CuDFKD. It teaches students by a dynamic strategy that gradually generates easy-to-hard pseudo samples, mirroring how humans learn. Besides, CuDFKD adapts the generation target dynamically according to the status of the student model. Moreover, we provide a theoretical analysis of the majorization minimization (MM) algorithm and explain the convergence of CuDFKD. To measure the robustness and fidelity of DFKD methods, we propose two more metrics, and experiments show that CuDFKD has comparable performance to state-of-the-art (SOTA) DFKD methods on all datasets. Experiments also show that our CuDFKD has the fastest convergence and best robustness over other SOTA DFKD methods.	翻訳日:2023-07-07 00:17:11 公開日:2023-07-04
# SFusion: 自己注意に基づくN対1マルチモーダル核融合ブロック SFusion: Self-attention based N-to-One Multimodal Fusion Block ( http://arxiv.org/abs/2208.12776v2 ) ライセンス: Link先を確認	Zecheng Liu and Jia Wei and Rui Li and Jianlong Zhou	(参考訳) 人々は、視覚、聴覚、嗅覚、触覚など、異なる感覚で世界を知覚する。複数のモダリティから情報を処理し、融合することで、人工知能は私たちの周りの世界をより簡単に理解できるようになる。しかし、モダリティが欠けている場合、利用可能なモダリティの数は様々な状況で異なるため、n対1の融合問題に繋がる。そこで本研究では,SFusionと呼ばれる自己注意型核融合ブロックを提案する。プリセットの定式化や畳み込みに基づく方法とは異なり、提案するブロックは自動的に、合成やゼロパディングの欠如なく利用可能なモダリティを融合することを学習する。具体的には、上流処理モデルから抽出された特徴表現をトークンとして投影し、セルフアテンションモジュールに供給して潜在マルチモーダル相関を生成する。次に、下流決定モデルで適用可能な共有表現を構築するために、モーダル注意機構を導入する。提案したSFusionは,既存のマルチモーダル解析ネットワークに容易に統合できる。本研究では,SFusionを異なるバックボーンネットワークに適用し,ヒトの活動認識と脳腫瘍のセグメンテーションを行う。実験の結果,SFusionブロックは競合する融合戦略よりも優れた性能を示すことがわかった。私たちのコードはhttps://github.com/scut-cszcl/sfusionで利用可能です。 People perceive the world with different senses, such as sight, hearing, smell, and touch. Processing and fusing information from multiple modalities enables Artificial Intelligence to understand the world around us more easily. However, when there are missing modalities, the number of available modalities is different in diverse situations, which leads to an N-to-One fusion problem. To solve this problem, we propose a self-attention based fusion block called SFusion. Different from preset formulations or convolution based methods, the proposed block automatically learns to fuse available modalities without synthesizing or zero-padding missing ones. Specifically, the feature representations extracted from upstream processing model are projected as tokens and fed into self-attention module to generate latent multimodal correlations. Then, a modal attention mechanism is introduced to build a shared representation, which can be applied by the downstream decision model. The proposed SFusion can be easily integrated into existing multimodal analysis networks. In this work, we apply SFusion to different backbone networks for human activity recognition and brain tumor segmentation tasks. 
Extensive experimental results show that the SFusion block achieves better performance than the competing fusion strategies. Our code is available at https://github.com/scut-cszcl/SFusion.	翻訳日:2023-07-07 00:16:49 公開日:2023-07-04
# 古典的データを用いた古典と量子機械学習の学習分離の確立について On establishing learning separations between classical and quantum machine learning with classical data ( http://arxiv.org/abs/2208.06339v2 ) ライセンス: Link先を確認	Casper Gyurik, Vedran Dunjko	(参考訳) 長年の努力にもかかわらず、量子機械学習コミュニティは、古典的データの場合、ある種の暗号化に触発されたデータセットに対して量子学習の利点を示すことしかできなかった。本稿では,量子学習アルゴリズムがどの古典的学習アルゴリズムよりも高速に学習できる学習問題を見つけるための課題について論じ,学習問題を特定する方法について検討する。具体的には、この問題に関連する計算学習理論の主要な概念を考察し、定義の微妙な変化がいかに概念的に著しく異なるタスクを意味するかについて議論する。さらに,より一般的かつ十分な条件(すなわち「チェックリスト」)の集合を蒸留し,古典的学習者と量子学習者の分離を示す学習問題に対して,既存の学習問題を証明可能な量子スピードアップを用いて検討する。これらのチェックリストは、学習問題に対する量子スピードアップを証明するためのアプローチの合理化やボトルネックの解明を目的としている。最後に,その応用例を説明するために,このアプローチのレンズを通して,学習問題(計算分離から構築された場合,あるいは量子実験から得られた場合)の潜在的分離の例を解析する。 Despite years of effort, the quantum machine learning community has only been able to show quantum learning advantages for certain contrived cryptography-inspired datasets in the case of classical data. In this note, we discuss the challenges of finding learning problems that quantum learning algorithms can learn much faster than any classical learning algorithm, and we study how to identify such learning problems. Specifically, we reflect on the main concepts in computational learning theory pertaining to this question, and we discuss how subtle changes in definitions can mean conceptually significantly different tasks, which can either lead to a separation or no separation at all. Moreover, we study existing learning problems with a provable quantum speedup to distill sets of more general and sufficient conditions (i.e., ``checklists'') for a learning problem to exhibit a separation between classical and quantum learners. These checklists are intended to streamline one's approach to proving quantum speedups for learning problems, or to elucidate bottlenecks. 
Finally, to illustrate its application, we analyze examples of potential separations (i.e., when the learning problem is built from computational separations, or when the data comes from a quantum experiment) through the lens of our approach.	翻訳日:2023-07-07 00:16:28 公開日:2023-07-04
# 集団カウントのためのマルチスケールニューラルネットワークの再設計 Redesigning Multi-Scale Neural Network for Crowd Counting ( http://arxiv.org/abs/2208.02894v2 ) ライセンス: Link先を確認	Zhipeng Du, Miaojing Shi, Jiankang Deng, Stefanos Zafeiriou	(参考訳) 視点の歪みと群衆の変動は、コンピュータビジョンにおいて、群衆の数え上げが困難なタスクとなる。これに取り組むために、多くの先行研究はディープニューラルネットワーク(DNN)にマルチスケールアーキテクチャを使用してきた。マルチスケールブランチは直接マージされる(例えば結合によって)か、DNNのプロキシ(例えば注意)のガイダンスによってマージされる。これらの組み合わせ法は,その普及にもかかわらず,マルチスケール密度マップに対する画素単位の性能差に対処するには不十分である。本研究では,複数スケールの密度マップを階層的にマージした密度エキスパートの階層的混合を導入することにより,マルチスケールニューラルネットワークを再設計する。階層構造の中では、すべてのスケールからの貢献を促進するために専門家のコンペティションとコラボレーションスキームが提示され、異なる階層のスケール組み合わせのためのピクセル単位のソフトウェイトを提供するために、ピクセル単位のソフトゲーティングネットが導入された。ネットワークは、群集密度マップと局所カウントマップの両方を用いて最適化され、後者は、前者の局所積分によって得られる。両者の最適化は、潜在的な競合のために問題となる可能性がある。画像中の強予測された局所領域間の相対的数差に基づく新たな相対的局所的カウント損失を導入し, 密度マップ上の従来の絶対誤差損失と相補的であることを証明した。実験の結果,提案手法は上海技術,UCF_CC_50,JHU-CROWD++,NWPU-Crowd,Trancosの5つの公開データセットに対して,最先端のパフォーマンスを実現することがわかった。 Perspective distortions and crowd variations make crowd counting a challenging task in computer vision. To tackle it, many previous works have used multi-scale architecture in deep neural networks (DNNs). Multi-scale branches can be either directly merged (e.g. by concatenation) or merged through the guidance of proxies (e.g. attentions) in the DNNs. Despite their prevalence, these combination methods are not sophisticated enough to deal with the per-pixel performance discrepancy over multi-scale density maps. In this work, we redesign the multi-scale neural network by introducing a hierarchical mixture of density experts, which hierarchically merges multi-scale density maps for crowd counting. Within the hierarchical structure, an expert competition and collaboration scheme is presented to encourage contributions from all scales; pixel-wise soft gating nets are introduced to provide pixel-wise soft weights for scale combinations in different hierarchies. 
The network is optimized using both the crowd density map and the local counting map, where the latter is obtained by local integration on the former. Optimizing both can be problematic because of their potential conflicts. We introduce a new relative local counting loss based on relative count differences among hard-predicted local regions in an image, which proves to be complementary to the conventional absolute error loss on the density map. Experiments show that our method achieves state-of-the-art performance on five public datasets, i.e., ShanghaiTech, UCF_CC_50, JHU-CROWD++, NWPU-Crowd and Trancos.	翻訳日:2023-07-07 00:16:06 公開日:2023-07-04
# IsoVec:単語埋め込み空間の相対同型制御 IsoVec: Controlling the Relative Isomorphism of Word Embedding Spaces ( http://arxiv.org/abs/2210.05098v3 ) ライセンス: Link先を確認	Kelly Marchisio, Neha Verma, Kevin Duh, Philipp Koehn	(参考訳) 単言語単語埋め込み空間から高品質な翻訳辞書を抽出する能力は、空間の幾何学的類似性、すなわちその「同型」の度合いに依存する。単語埋め込み学習の結果、基礎となる空間が非同型となるという、欠陥のある言語間マッピングの根本原因に対処する。我々は,スキップ-グラム損失関数に直接同型の大域的測度を組み込んで,訓練された単語埋め込み空間の相対的同型を増大させ,共通言語間空間にマッピングする能力を向上させる。その結果、一般的なデータ条件、ドメインミスマッチ、トレーニングアルゴリズムの相違によるバイリンガル語彙誘導が改善された。私たちはIsoVecをhttps://github.com/kellymarchisio/isovecでリリースします。 The ability to extract high-quality translation dictionaries from monolingual word embedding spaces depends critically on the geometric similarity of the spaces -- their degree of "isomorphism." We address the root cause of faulty cross-lingual mapping: that word embedding training resulted in the underlying spaces being non-isomorphic. We incorporate global measures of isomorphism directly into the Skip-gram loss function, successfully increasing the relative isomorphism of trained word embedding spaces and improving their ability to be mapped to a shared cross-lingual space. The result is improved bilingual lexicon induction in general data conditions, under domain mismatch, and with training algorithm dissimilarities. We release IsoVec at https://github.com/kellymarchisio/isovec.	翻訳日:2023-07-07 00:05:37 公開日:2023-07-04
# erasenet: 教師付き文書クリーニングのための再帰的残差ネットワーク EraseNet: A Recurrent Residual Network for Supervised Document Cleaning ( http://arxiv.org/abs/2210.00708v2 ) ライセンス: Link先を確認	Yashowardhan Shinde, Kishore Kulkarni, Sachin Kuberkar	(参考訳) ドキュメンテーションはコンピュータビジョンにおいて最も困難なタスクの1つである。デジタル化される文書は何百万もあるが、自然や人為的な要因による文書の劣化などの問題により、この作業は非常に困難である。本稿では, 完全畳み込み型自動エンコーダアーキテクチャを用いて, 汚れた文書のクリーニングを指導する手法を提案する。本稿では,文書の老朽化による変形,xeroxed したページに残されている裂け目,無作為な黒パッチ,明るい可視テキストなど,異質な文書の復元と,光学文字認識システム (ocr) の性能向上のための画像品質の向上に焦点を当てた。スキャンした文書からノイズを取り除くことは、このノイズがOCRシステムの性能に悪影響を及ぼす可能性があるため、文書の前の非常に重要なステップである。本実験では, モデルが各種の常用音や異常音を学習し, 効率よく修正できるので, 有望な結果が得られた。 Document denoising is considered one of the most challenging tasks in computer vision. There exist millions of documents that are still to be digitized, but problems like document degradation due to natural and man-made factors make this task very difficult. This paper introduces a supervised approach for cleaning dirty documents using a new fully convolutional auto-encoder architecture. This paper focuses on restoring documents with discrepancies like deformities caused due to aging of a document, creases left on the pages that were xeroxed, random black patches, lightly visible text, etc., and also improving the quality of the image for better optical character recognition system (OCR) performance. Removing noise from scanned documents is a very important step before the documents as this noise can severely affect the performance of an OCR system. The experiments in this paper have shown promising results as the model is able to learn a variety of ordinary as well as unusual noises and rectify them efficiently.	翻訳日:2023-07-07 00:05:10 公開日:2023-07-04
# グラフニューラルネットワークのためのユニバーサルプロンプトチューニング Universal Prompt Tuning for Graph Neural Networks ( http://arxiv.org/abs/2209.15240v3 ) ライセンス: Link先を確認	Taoran Fang, Yunchao Zhang, Yang Yang, Chunping Wang, Lei Chen	(参考訳) 近年、プロンプトチューニングは、事前訓練されたモデルに適応する研究の急増を引き起こしている。言語分野における統合事前学習戦略とは異なり、グラフフィールドは様々な事前学習戦略を示し、グラフニューラルネットワークの適切なプロンプトベースのチューニング方法を設計する上での課題を提起する。いくつかの先駆的な研究は、エッジ予測を事前訓練タスクとして使用するモデルの特別なプロンプト機能を考案しているが、これらの手法は特定の事前訓練されたGNNモデルに限定されており、より広範な適用性に欠ける。本稿では,任意の事前学習戦略の下で事前学習したGNNモデルに対して,GPF(Graph Prompt Feature)と呼ばれる汎用的なプロンプトベースのチューニング手法を提案する。 GPFは入力グラフの特徴空間で動作し、理論上任意の形式のプロンプト関数に等価な効果を達成できる。その結果、各事前学習戦略に対応するプロンプト関数を明示的に記述する必要がなくなった。代わりに、我々はGPFを用いて、下流タスクの誘導グラフを適応的に取得する。 GPFの普遍性を実証し、その有効性を保証するための厳密な導出を提供する。様々な事前学習戦略による実験結果から,本手法は微調整よりも優れた性能を示し,全ショットシナリオでは平均1.4%,小ショットシナリオでは約3.2%改善した。さらに,本手法は,事前学習戦略を利用したモデルに適用した場合,既存の特殊プロンプトベースのチューニング手法よりも優れる。これらの多くの利点は、この手法を下流適応のための微調整の説得力のある代替手段と位置づけている。 In recent years, prompt tuning has sparked a research surge in adapting pre-trained models. Unlike the unified pre-training strategy employed in the language field, the graph field exhibits diverse pre-training strategies, posing challenges in designing appropriate prompt-based tuning methods for graph neural networks. While some pioneering work has devised specialized prompting functions for models that employ edge prediction as their pre-training tasks, these methods are limited to specific pre-trained GNN models and lack broader applicability. In this paper, we introduce a universal prompt-based tuning method called Graph Prompt Feature (GPF) for pre-trained GNN models under any pre-training strategy. GPF operates on the input graph's feature space and can theoretically achieve an equivalent effect to any form of prompting function. Consequently, we no longer need to illustrate the prompting function corresponding to each pre-training strategy explicitly. Instead, we employ GPF to obtain the prompted graph for the downstream task in an adaptive manner. 
We provide rigorous derivations to demonstrate the universality of GPF and to guarantee its effectiveness. The experimental results under various pre-training strategies indicate that our method performs better than fine-tuning, with an average improvement of about 1.4% in full-shot scenarios and about 3.2% in few-shot scenarios. Moreover, our method significantly outperforms existing specialized prompt-based tuning methods when applied to models utilizing the pre-training strategy they specialize in. These numerous advantages position our method as a compelling alternative to fine-tuning for downstream adaptations.	翻訳日:2023-07-07 00:04:41 公開日:2023-07-04
# 対称性から学ぶ--対称行動と言語指示を用いたメタ強化学習 Learning from Symmetry: Meta-Reinforcement Learning with Symmetrical Behaviors and Language Instructions ( http://arxiv.org/abs/2209.10656v2 ) ライセンス: Link先を確認	Xiangtong Yao, Zhenshan Bing, Genghang Zhuang, Kejia Chen, Hongkuan Zhou, Kai Huang and Alois Knoll	(参考訳) メタ強化学習(Meta-RL)は,エージェントが新しいタスクを素早く学習できるようにする,有望なアプローチである。しかし、ほとんどのメタRLアルゴリズムは、報酬のみによって提供されるタスク情報不足のため、マルチタスクシナリオでの一般化が不十分である。言語条件付きメタRLは、言語命令とエージェントの動作をマッチングすることで、一般化能力を向上させる。行動と言語命令の両方に対称性があり、新しい知識の人間の学習を加速させる。したがって、対称性と言語命令をメタRLに組み合わせることで、アルゴリズムの一般化と学習効率を向上させることができる。対称な動作や言語命令を用いて,新しいタスクを効率的に学習することのできる,デュアルMDPメタ強化学習手法を提案する。提案手法は,複数の難解な操作課題において評価され,実験により,メタ強化学習の一般化と学習効率が大幅に向上することが示された。ビデオはhttps://tumi6robot.wixsite.com/symmetry/。 Meta-reinforcement learning (meta-RL) is a promising approach that enables the agent to learn new tasks quickly. However, most meta-RL algorithms show poor generalization in multi-task scenarios due to the insufficient task information provided only by rewards. Language-conditioned meta-RL improves the generalization capability by matching language instructions with the agent's behaviors. Both behaviors and language instructions exhibit symmetry, which can speed up human learning of new knowledge. Thus, combining symmetry and language instructions in meta-RL can help improve the algorithm's generalization and learning efficiency. We propose a dual-MDP meta-reinforcement learning method that enables learning new tasks efficiently with symmetrical behaviors and language instructions. We evaluate our method in multiple challenging manipulation tasks, and experimental results show that our method can greatly improve the generalization and learning efficiency of meta-reinforcement learning. Videos are available at https://tumi6robot.wixsite.com/symmetry/.	翻訳日:2023-07-07 00:04:14 公開日:2023-07-04
# マラリア診断のための機械学習アルゴリズムの開発を導く指標 Metrics to guide development of machine learning algorithms for malaria diagnosis ( http://arxiv.org/abs/2209.06947v2 ) ライセンス: Link先を確認	Charles B. Delahunt, Noni Gachuhi, Matthew P. Horning	(参考訳) 自動マラリア診断は、機械学習(ML)にとって難しいが高価値なターゲットであり、効果的なアルゴリズムは何千人もの子供の命を救える。しかし、現在のMLの取り組みは重要なユースケースの制約をほとんど無視しており、臨床的には有用ではない。特に2つの要因が臨床現場設定に翻訳可能なアルゴリズムの開発に不可欠である。 (i)mlソリューションが対応しなければならない臨床ニーズを明確に理解すること。 (II)MLモデルの指導と評価のためのタスク関連メトリクス。これらの要因の無視は、臨床ニーズと一致しないため、過去のMLのマラリア研究を著しく妨げている。本稿では,この2つの問題点を,ジエマの血液膜を顕微鏡で観察することで診断する。まず、なぜドメインの専門知識が、MLをマラリアに効果的に適用し、このドメインの知識を提供する技術文書やその他のリソースをリストアップすることが重要なのかを説明する。第2に,マラリア診断の臨床的要件に合わせたパフォーマンス指標を詳述し,mlモデルの開発を指導し,臨床ニーズレンズ(汎用mlレンズではなく)を通してモデル性能を評価する。患者レベルの視点,患者間の多様性,偽陽性率,検出限界,エラーの種類などの重要性を強調した。 ROC曲線、AUC、F1がMLの作業でよく使われるが、この文脈にはあまり適さない理由についても論じる。これらの所見は、分裂病などの熱帯病(NTD)を無視するなど、寄生虫の負荷を伴う他の疾患にも当てはまる。 Automated malaria diagnosis is a difficult but high-value target for machine learning (ML), and effective algorithms could save many thousands of children's lives. However, current ML efforts largely neglect crucial use case constraints and are thus not clinically useful. Two factors in particular are crucial to developing algorithms translatable to clinical field settings: (i) Clear understanding of the clinical needs that ML solutions must accommodate; and (ii) task-relevant metrics for guiding and evaluating ML models. Neglect of these factors has seriously hampered past ML work on malaria, because the resulting algorithms do not align with clinical needs. In this paper we address these two issues in the context of automated malaria diagnosis via microscopy on Giemsa-stained blood films. First, we describe why domain expertise is crucial to effectively apply ML to malaria, and list technical documents and other resources that provide this domain knowledge. 
Second, we detail performance metrics tailored to the clinical requirements of malaria diagnosis, to guide development of ML models and evaluate model performance through the lens of clinical needs (versus a generic ML lens). We highlight the importance of a patient-level perspective, interpatient variability, false positive rates, limit of detection, and different types of error. We also discuss reasons why ROC curves, AUC, and F1, as commonly used in ML work, are poorly suited to this context. These findings also apply to other diseases involving parasite loads, including neglected tropical diseases (NTDs) such as schistosomiasis.	翻訳日:2023-07-07 00:04:01 公開日:2023-07-04
# PlaStIL: プラスチックで安定なメモリフリーなクラスインクリメンタルラーニング PlaStIL: Plastic and Stable Memory-Free Class-Incremental Learning ( http://arxiv.org/abs/2209.06606v2 ) ライセンス: Link先を確認	Grégoire Petit, Adrian Popescu, Eden Belouadah, David Picard, Bertrand Delezoide	(参考訳) 過去の知識を保ちながら新しいデータから学ぶためには、クラス増分学習において塑性と安定性が必要である。破滅的な忘れ方のため、メモリバッファがない場合、これら2つのプロパティ間の妥協を見つけることは特に難しい。従来のインクリメンタルな状態からの知識蒸留と微調整を使って新しいクラスを統合するため、主流のメソッドは2つの深いモデルを格納する必要がある。そこで本稿では, 可塑性と安定性のバランスを良くするために, パラメータ数に類似する手法を提案する。転送ベースのインクリメンタルメソッドですでにデプロイされているアプローチに従って,初期状態後の特徴抽出器を凍結する。最も古い段階的な状態のクラスは、安定性を確保するためにこの凍結抽出器で訓練される。最近のクラスは塑性を導入するために部分的に微調整されたモデルを用いて予測される。提案した塑性層は, 模範のない漸進的な学習を目的とした転送方式に組み込むことができ, 2つの手法に適用できる。評価は3つの大規模データセットで行う。その結果、既存の方法と比較して、すべてのテスト済み構成でパフォーマンスが向上することが示された。 Plasticity and stability are needed in class-incremental learning in order to learn from new data while preserving past knowledge. Due to catastrophic forgetting, finding a compromise between these two properties is particularly challenging when no memory buffer is available. Mainstream methods need to store two deep models since they integrate new classes using fine-tuning with knowledge distillation from the previous incremental state. We propose a method that has a similar number of parameters but distributes them differently in order to find a better balance between plasticity and stability. Following an approach already deployed by transfer-based incremental methods, we freeze the feature extractor after the initial state. Classes in the oldest incremental states are trained with this frozen extractor to ensure stability. Recent classes are predicted using partially fine-tuned models in order to introduce plasticity. Our proposed plasticity layer can be incorporated into any transfer-based method designed for exemplar-free incremental learning, and we apply it to two such methods. Evaluation is done with three large-scale datasets. Results show that performance gains are obtained in all tested configurations compared to existing methods.	翻訳日:2023-07-07 00:03:37 公開日:2023-07-04
# 拘束ボース-ハバードモデルにおけるフラクトニックルッティンガー液体と超固体 Fractonic Luttinger Liquids and Supersolids in a Constrained Bose-Hubbard Model ( http://arxiv.org/abs/2210.11072v2 ) ライセンス: Link先を確認	Philip Zechmann, Ehud Altman, Michael Knap, Johannes Feldmeier	(参考訳) フラクトン制約を持つ量子多体系は、非慣習的な物質の低エネルギー位相を示すと広く予想されている。本研究では,Bose-Hubbardモデルを一次元に保存する双極子モーメント基底状態における,このような異方性量子相の存在を実証する。整数ボソン充填では,フラクトンの複合体である微視的局所双極子モデルへのシステムのマッピングを行う。ダイポールルッティンガー液相の出現を実証するために,低エネルギー場理論と大規模テンソルネットワークシミュレーションを組み合わせる。非整数補充では、量子リフシッツモデルによって説明される興味深い圧縮可能な状態が示され、電荷密度波秩序と双極子長距離秩序と超流動性(英語版)が共存する。この超固体状態は最終的に熱力学的極限の格子効果に対して不安定になるかもしれないが、その数値的ロバスト性は顕著である。我々は実験結果の潜在的意義について議論する。 Quantum many-body systems with fracton constraints are widely conjectured to exhibit unconventional low-energy phases of matter. In this work, we demonstrate the existence of a variety of such exotic quantum phases in the ground states of a dipole-moment conserving Bose-Hubbard model in one dimension. For integer boson fillings, we perform a mapping of the system to a model of microscopic local dipoles, which are composites of fractons. We apply a combination of low-energy field theory and large-scale tensor network simulations to demonstrate the emergence of a dipole Luttinger liquid phase. At non-integer fillings our numerical approach shows an intriguing compressible state described by a quantum Lifshitz model in which charge density-wave order coexists with dipole long-range order and superfluidity - a `dipole supersolid'. While this supersolid state may eventually be unstable against lattice effects in the thermodynamic limit, its numerical robustness is remarkable. We discuss potential experimental implications of our results.	翻訳日:2023-07-06 23:57:35 公開日:2023-07-04
# 一般量子マルコフ過程のヒット時間について On Hitting Times for General Quantum Markov Processes ( http://arxiv.org/abs/2210.10188v2 ) ライセンス: Link先を確認	Lorenzo Laneve, Francesco Tacchino, Ivano Tavernelli	(参考訳) ランダムウォーク(英: Random walk、またはMarkov chains)は、理論計算機科学で広く使われているモデルである。打つ時間や混合時間などの量の分析を含むいくつかのツールは、ランダム化されたアルゴリズムを考案するのに役立ちます。注目すべき例はSchöning's algorithm for the satisfiability (SAT) problemである。本研究では,古典的ウォークを直接一般化する量子マルコフ連鎖モデルを定義するために密度行列形式を用い,古典的理論で見られるものと同様の公式で時間を打つような共通ツールが計算できることを示し,グロバーのアルゴリズムのような既知の量子的設定に適用する。 Random walks (or Markov chains) are models extensively used in theoretical computer science. Several tools, including analysis of quantities such as hitting and mixing times, are helpful for devising randomized algorithms. A notable example is Schöning's algorithm for the satisfiability (SAT) problem. In this work, we use the density-matrix formalism to define a quantum Markov chain model which directly generalizes classical walks, and we show that common tools such as hitting times can be computed with a formula similar to the one found in the classical theory, which we then apply to known quantum settings such as Grover's algorithm.	翻訳日:2023-07-06 23:56:16 公開日:2023-07-04
# 2つの導波路積分量子エミッタの独立動作 Independent operation of two waveguide-integrated quantum emitters ( http://arxiv.org/abs/2210.09826v2 ) ライセンス: Link先を確認	Camille Papon, Ying Wang, Ravitej Uppu, Sven Scholz, Andreas Dirk Wieck, Arne Ludwig, Peter Lodahl, Leonardo Midolo	(参考訳) 複数の空間モードにおけるオンチップ単光子生成のためのフォトニック集積回路において、2つの量子ドットの共振励起を示す。 2つの量子ドットは、孤立した1対の$p$-$i$-$n$ジャンクションを使用して同じ発光波長に電気的に調整され、デュアルモード導波路を介して共鳴ポンプレーザーによって励起される。狭線幅量子ドットの連続波励起下での$(79\pm2)\%$の2光子量子干渉の可視性を示す。我々の研究は、決定論的単一光子源のスケールアップの鍵となる機能を実現することによって、量子フォトニクスにおける卓越した課題を解決する。 We demonstrate the resonant excitation of two quantum dots in a photonic integrated circuit for on-chip single-photon generation in multiple spatial modes. The two quantum dots are electrically tuned to the same emission wavelength using a pair of isolated $p$-$i$-$n$ junctions and excited by a resonant pump laser via dual-mode waveguides. We demonstrate two-photon quantum interference visibility of $(79\pm2)\%$ under continuous-wave excitation of narrow-linewidth quantum dots. Our work solves an outstanding challenge in quantum photonics by realizing the key enabling functionality for scaling up deterministic single-photon sources.	翻訳日:2023-07-06 23:56:04 公開日:2023-07-04
# コントラスト誘導拡散過程による対向ロバスト性の向上 Improving Adversarial Robustness by Contrastive Guided Diffusion Process ( http://arxiv.org/abs/2210.09643v2 ) ライセンス: Link先を確認	Yidong Ouyang, Liyan Xie, Guang Cheng	(参考訳) 標準的な分類タスクに比べてロバストな学習にはトレーニングサンプルの量が大幅に多いため、合成データ生成は分類タスクの敵対的ロバスト性を改善するための新たなツールになっている。様々な深層生成モデルの中で,拡散モデルにより高品質な合成画像が生成され,対向性の向上に優れた性能を発揮することが示されている。しかし、拡散型法は通常、他の生成モデルと比較してデータ生成が遅い。近年, 異なる加速法が提案されているが, 下流タスクにおいて生成したデータのサンプル効率を改善する方法の研究も重要である。本稿では,まず合成分布の最適性条件を解析し,非自明なロバストな精度を実現する。生成データ間の識別性の向上は, 対向的ロバスト性の向上に不可欠であることを示す。そこで本研究では,データ生成における拡散モデルを導出するコントラスト的拡散過程(Contrastive-Guided Diffusion Process, Contrastive-DP)を提案する。シミュレーションを用いて理論的結果を検証し,画像データセット上でのコントラストDPの性能を示す。 Synthetic data generation has become an emerging tool to help improve the adversarial robustness in classification tasks since robust learning requires a significantly larger amount of training samples compared with standard classification tasks. Among various deep generative models, the diffusion model has been shown to produce high-quality synthetic images and has achieved good performance in improving the adversarial robustness. However, diffusion-type methods are typically slow in data generation as compared with other generative models. Although different acceleration techniques have been proposed recently, it is also of great importance to study how to improve the sample efficiency of generated data for the downstream task. In this paper, we first analyze the optimality condition of synthetic distribution for achieving non-trivial robust accuracy. We show that enhancing the distinguishability among the generated data is critical for improving adversarial robustness. Thus, we propose the Contrastive-Guided Diffusion Process (Contrastive-DP), which adopts the contrastive loss to guide the diffusion model in data generation. We verify our theoretical results using simulations and demonstrate the good performance of Contrastive-DP on image datasets.	翻訳日:2023-07-06 23:55:53 公開日:2023-07-04
# (1,1)-クラスタ編集は多項式時間可解である (1,1)-Cluster Editing is Polynomial-time Solvable ( http://arxiv.org/abs/2210.07722v2 ) ライセンス: Link先を確認	Gregory Gutin and Anders Yeo	(参考訳) グラフ $H$ がclique グラフであれば、$H$ はclique の頂点非共役和である。 abu-khzam (2017) は $(a,d)$-{cluster editing} 問題を導入し、固定自然数 $a,d$ に対して、グラフ $G$ と頂点重み $a^*:\ V(G)\rightarrow \{0,1,\dots,a\}$ と $d^*:\ V(G)\rightarrow \{0,1,\dots,d\}$ が与えられたとき、$G$ が $v\in V(G)$ に対して最大$d^*(v)$ edges インシデントを削除できるかどうかを判断する。 komusiewicz と uhlmann (2012) と abu-khzam (2017) による結果は、すべてのペアに対して$a,d$ と$a=d=1.$ abu-khzam (2017) から離れて$(a,d)$-{cluster editing} の複雑性(p または np完全)の二分法を提供し、$(1,1)$-{cluster editing} が p にあると推測した。 (i)最大次数3の$C_3$-freeおよび$C_4$-freeグラフに真に5つの多項式時間還元を与える。 (ii)最大次数の$C_3$-free と $C_4$-free グラフ上で$(1,1)$-{cluster editing} を解く多項式時間アルゴリズムを設計する。 A graph $H$ is a clique graph if $H$ is a vertex-disjoint union of cliques. Abu-Khzam (2017) introduced the $(a,d)$-{Cluster Editing} problem, where for fixed natural numbers $a,d$, given a graph $G$ and vertex-weights $a^*:\ V(G)\rightarrow \{0,1,\dots, a\}$ and $d^*:\ V(G)\rightarrow \{0,1,\dots, d\}$, we are to decide whether $G$ can be turned into a cluster graph by deleting at most $d^*(v)$ edges incident to every $v\in V(G)$ and adding at most $a^*(v)$ edges incident to every $v\in V(G)$. Results by Komusiewicz and Uhlmann (2012) and Abu-Khzam (2017) provided a dichotomy of complexity (in P or NP-complete) of $(a,d)$-{Cluster Editing} for all pairs $a,d$ apart from $a=d=1.$ Abu-Khzam (2017) conjectured that $(1,1)$-{Cluster Editing} is in P. We resolve Abu-Khzam's conjecture in the affirmative by (i) providing a series of five polynomial-time reductions to $C_3$-free and $C_4$-free graphs of maximum degree at most 3, and (ii) designing a polynomial-time algorithm for solving $(1,1)$-{Cluster Editing} on $C_3$-free and $C_4$-free graphs of maximum degree at most 3.	翻訳日:2023-07-06 23:54:58 公開日:2023-07-04
# ロボットによる仕事の学習:人間による自律性と展開中の学習 Robot Learning on the Job: Human-in-the-Loop Autonomy and Learning During Deployment ( http://arxiv.org/abs/2211.08416v3 ) ライセンス: Link先を確認	Huihan Liu, Soroush Nasiriany, Lance Zhang, Zhiyao Bao, Yuke Zhu	(参考訳) コンピュータパワーの急速な成長とディープラーニングの最近の進歩により、研究環境における新しいロボット能力の印象的な実証が見られた。それでも、これらの学習システムは不安定な一般化を示し、実践的なタスクに過剰なトレーニングデータを必要とする。非完全性を受け入れつつ最先端のロボット学習モデルの能力を活用するために,人間とロボットが作業部門で協力するための原則フレームワークであるsiriusを提案する。このフレームワークでは、部分的に自律的なロボットが意思決定の大部分を適切に処理するタスクを負う一方で、人間のオペレーターはプロセスを監視し、困難な状況に介入する。このような人間ロボットチームは、複雑なタスクに安全なデプロイを保証する。さらに,タスク実行から収集したデータに対するポリシーの性能を向上させるための新しい学習アルゴリズムを提案する。中心となるアイデアは、トレーニングサンプルをおよそ人間の信頼で強化し、重み付けされた行動のクローンでポリシーを最適化することだ。我々はSiriusをシミュレーションおよび実際のハードウェアで評価し、Siriusが一連のコンタクトリッチな操作タスクに対して一貫してベースラインを上回り、シミュレーションで8%、実際のハードウェアで27%向上し、コンバージェンスを2倍速くし、メモリサイズを85%削減した。ビデオや詳細はhttps://ut-austin-rpl.github.io/sirius/で確認できる。 With the rapid growth of computing powers and recent advances in deep learning, we have witnessed impressive demonstrations of novel robot capabilities in research settings. Nonetheless, these learning systems exhibit brittle generalization and require excessive training data for practical tasks. To harness the capabilities of state-of-the-art robot learning models while embracing their imperfections, we present Sirius, a principled framework for humans and robots to collaborate through a division of work. In this framework, partially autonomous robots are tasked with handling a major portion of decision-making where they work reliably; meanwhile, human operators monitor the process and intervene in challenging situations. Such a human-robot team ensures safe deployments in complex tasks. Further, we introduce a new learning algorithm to improve the policy's performance on the data collected from the task executions. The core idea is re-weighing training samples with approximated human trust and optimizing the policies with weighted behavioral cloning. 
We evaluate Sirius in simulation and on real hardware, showing that Sirius consistently outperforms baselines over a collection of contact-rich manipulation tasks, achieving an 8% boost in simulation and a 27% boost on real hardware over the state-of-the-art methods in policy success rate, with twice as fast convergence and an 85% reduction in memory size. Videos and more details are available at https://ut-austin-rpl.github.io/sirius/	翻訳日:2023-07-06 23:46:49 公開日:2023-07-04
# トーションを持つ一般相対論的パイロット波量子力学 General-relativistic pilot-wave quantum mechanics with torsion ( http://arxiv.org/abs/2211.03234v2 ) ライセンス: Link先を確認	Francisco Ribeiro Benard Guedes and Nikodem Janusz Popławski	(参考訳) ディラック粒子の4速度は、$u^i=\bar{\psi}\gamma^i\psi/\bar{\psi}\psi$ による相対論的波動関数と関連している。我々は、スピノルの四項項と共変微分によって与えられる翻訳生成元を関連付ける。我々は、スピノルの固有角運動量 4-テンソルとローレンツ群のスピノル表現における回転の生成を関連付ける。スピノル場に対するスピンおよびエネルギー–運動量テンソルの共変保存則を用いて、アインシュタイン–カルタントーションの存在下で、波動がディラック方程式を満たすならば、四速度、4運動量、スピン四運動量テンソルは古典的Mathisson–Papapetrou運動方程式を満たすことを示す。これらの方程式は測地線運動方程式に還元される。したがって、パイロット波量子力学における4速度によって導かれる粒子の運動は、時空の幾何によって決定される粒子の測地線運動と一致し、相対論的波動の2重性を表す。 We propose that the four-velocity of a Dirac particle is related to its relativistic wave function by $u^i=\bar{\psi}\gamma^i\psi/\bar{\psi}\psi$. We associate the four-momentum of a spinor with a generator of translation, given by a covariant derivative. We associate the intrinsic angular momentum four-tensor of a spinor with a generator of rotation in the spinor representation of the Lorentz group. We use the covariant conservation laws for the spin and energy–momentum tensors for a spinor field in the presence of the Einstein–Cartan torsion to show that if the wave satisfies the Dirac equation, then the four-velocity, four-momentum, and spin four-tensor satisfy the classical Mathisson–Papapetrou equations of motion. We show that these equations reduce to the geodesic equation of motion. Consequently, the motion of a particle guided by the four-velocity in the pilot-wave quantum mechanics coincides with the geodesic motion of the particle determined by the geometry of spacetime, representing a relativistic wave–particle duality.	翻訳日:2023-07-06 23:45:36 公開日:2023-07-04
# 監督信号のインフォメーション性について On the Informativeness of Supervision Signals ( http://arxiv.org/abs/2211.01407v3 ) ライセンス: Link先を確認	Ilia Sucholutsky and Ruairidh M. Battleday and Katherine M. Collins and Raja Marjieh and Joshua C. Peterson and Pulkit Singh and Umang Bhatt and Nori Jacoby and Adrian Weller and Thomas L. Griffiths	(参考訳) 教師付き学習は通常、人間が注釈を付けたトレーニング例から転送可能な表現を学ぶことに焦点を当てる。リッチアノテーション(ソフトラベルなど)は(ハードラベルのような)スパースアノテーションよりも多くの情報を持っているが、収集するコストも高い。例えば、ハードラベルは、オブジェクトが属する最も近いクラスに関する情報のみを提供する(例:「犬である」)が、ソフトラベルは、オブジェクトと複数のクラスとの関係に関する情報を提供する(例:「これは犬である可能性が高いが、オオカミやコヨーテでもある」)。我々は情報理論を用いて、多くの一般的な監視信号が表現学習のパフォーマンスにどのように寄与するか、また、ラベル数、クラス数、寸法数、ノイズなどの要因によってその能力がどのように影響を受けるかを比較する。当社のフレームワークは,ビッグデータ環境においてハードラベルを使用するための理論的正当化を提供するが,少ない学習と分散一般化のためのよりリッチな監督信号を提供する。我々は,100万以上のクラウドソース画像アノテーションを用いた一連の実験において,これらの結果を実証的に検証し,コスト便益分析を行い,ユーザが自身のデータセットで表現学習を監督するコストを最適化できるトレードオフ曲線を確立する。 Supervised learning typically focuses on learning transferable representations from training examples annotated by humans. While rich annotations (like soft labels) carry more information than sparse annotations (like hard labels), they are also more expensive to collect. For example, while hard labels only provide information about the closest class an object belongs to (e.g., "this is a dog"), soft labels provide information about the object's relationship with multiple classes (e.g., "this is most likely a dog, but it could also be a wolf or a coyote"). We use information theory to compare how a number of commonly-used supervision signals contribute to representation-learning performance, as well as how their capacity is affected by factors such as the number of labels, classes, dimensions, and noise. Our framework provides theoretical justification for using hard labels in the big-data regime, but richer supervision signals for few-shot learning and out-of-distribution generalization. 
We validate these results empirically in a series of experiments with over 1 million crowdsourced image annotations and conduct a cost-benefit analysis to establish a tradeoff curve that enables users to optimize the cost of supervising representation learning on their own datasets.	翻訳日:2023-07-06 23:45:14 公開日:2023-07-04
# インテリジェント・ペインター:再サンプリング拡散モデルによる画像構成 Intelligent Painter: Picture Composition With Resampling Diffusion Model ( http://arxiv.org/abs/2210.17106v3 ) ライセンス: Link先を確認	Wing-Fung Ku, Wan-Chi Siu, Xi Cheng, H. Anthony Chan	(参考訳) あなたは知的な画家になれると思ったことがありますか? これは、いくつかの期待されるオブジェクトを念頭に置いて、あるいは望ましいシーンで絵を描くことができることを意味する。これは、特定のオブジェクトの位置を決定できない通常のインペインティング手法とは異なる。本稿では,明示的なヒントが与えられたとき,人が思い描く場面を一度に生成する知的画家を提示する。特定の位置に置かれた入力対象に応じて,無条件で調和のとれた絵をインテリジェントに構成するための,DDPM(Denoising Diffusion Probabilistic Model)向け再サンプリング戦略を提案する。拡散特性を利用して効率よく再サンプリングし、リアルな画像を生成する。実験結果から,本再サンプリング手法は生成結果の意味的な内容を効率よく保ち,ぼやけの少ない出力を生成することが示された。画像品質評価の定量的解析は,最先端の手法と比較して知覚的品質の高い画像を生成することを示す。 Have you ever thought that you can be an intelligent painter? This means that you can paint a picture with a few expected objects in mind, or with a desirable scene. This is different from normal inpainting approaches for which the location of specific objects cannot be determined. In this paper, we present an intelligent painter that generates a person's imaginary scene in one go, given explicit hints. We propose a resampling strategy for Denoising Diffusion Probabilistic Model (DDPM) to intelligently compose unconditional harmonized pictures according to the input subjects at specific locations. By exploiting the diffusion property, we resample efficiently to produce realistic pictures. Experimental results show that our resampling method favors the semantic meaning of the generated output efficiently and generates less blurry output. Quantitative analysis of image quality assessment shows that our method produces higher perceptual quality images compared with the state-of-the-art methods.	翻訳日:2023-07-06 23:44:35 公開日:2023-07-04
# FI-ODE:Neural ODEにおける認証可能なロバスト前方不変性 FI-ODE: Certifiably Robust Forward Invariance in Neural ODEs ( http://arxiv.org/abs/2210.16940v3 ) ライセンス: Link先を確認	Yujia Huang, Ivan Dario Jimenez Rodriguez, Huan Zhang, Yuanyuan Shi, Yisong Yue	(参考訳) 前方不変性(forward invariance)は制御理論で長く研究されてきた性質であり、力学系があらかじめ指定された状態集合の内部に常に留まることを証明するために用いられ、ロバスト性の保証も与える(例えば、証明は摂動の下でも成り立つ)。本稿では、Neural ODEにおいてロバストな前方不変性を学習し、証明可能な形で認証するための一般的なフレームワークを提案する。我々はこの枠組みを、ロバストな連続制御における認証付き安全性と、画像分類における認証付き敵対的ロバスト性という2つの設定に適用する。我々の知る限り、このような非自明な(non-vacuous)認証付き保証を伴ってNODEポリシーを学習した最初の例である。 Forward invariance is a long-studied property in control theory that is used to certify that a dynamical system stays within some pre-specified set of states for all time, and also admits robustness guarantees (e.g., the certificate holds under perturbations). We propose a general framework for training and provably certifying robust forward invariance in Neural ODEs. We apply this framework in two settings: certified safety in robust continuous control, and certified adversarial robustness for image classification. To our knowledge, this is the first instance of training NODE policies with such non-vacuous certified guarantees.	翻訳日:2023-07-06 23:44:20 公開日:2023-07-04
# 欠陥のない原子配列の高速作成のための並列圧縮アルゴリズム Parallel compression algorithm for fast preparation of defect-free atom arrays ( http://arxiv.org/abs/2212.03047v2 ) ライセンス: Link先を確認	Shangguo Zhu, Yun Long, Mingbo Pu, Xiangang Luo	(参考訳) 欠陥のない原子配列は量子科学と技術のための強力で汎用的なプラットフォームとして登場し、高いプログラマビリティと有望なスケーラビリティを提供している。配列は、部分的にロードされた初期配列から指定されたターゲット部位に原子を配置することで作成することができる。しかし、大きな欠陥のないアレイを実現するには、再配置中の原子損失と、配列サイズに逆比例する真空制限寿命が問題となる。原子再配置の成功には、時間コストと原子損失を最小限に抑える効率的な再配置アルゴリズムが不可欠である。本稿では,複数の移動式ツイーザを用いて同時に原子を転送する並列圧縮アルゴリズムを提案する。トータルタイムコストは、ターゲットサイト数と線形にスケールするように削減できる。このアルゴリズムは、現在の実験装置で容易に実装できる。 Defect-free atom arrays have emerged as a powerful and versatile platform for quantum sciences and technologies, offering high programmability and promising scalability. The arrays can be prepared by rearranging atoms from a partially loaded initial array to the designated target sites. However, achieving large defect-free arrays presents challenges due to atom loss during rearrangement and the vacuum-limited lifetime which is inversely proportional to the array size. Efficient rearrangement algorithms which minimize time cost and atom loss are crucial for successful atom rearrangement. Here we propose a novel parallel compression algorithm which leverages multiple mobile tweezers to transfer atoms simultaneously. The total time cost could be reduced to scale linearly with the number of target sites. This algorithm can be readily implemented in current experimental setups.	翻訳日:2023-07-06 23:36:44 公開日:2023-07-04
# 補正サンプリングを伴う公平性介入による条件付き生成のスプリアス因果関係の打破 Breaking the Spurious Causality of Conditional Generation via Fairness Intervention with Corrective Sampling ( http://arxiv.org/abs/2212.02090v2 ) ライセンス: Link先を確認	Junhyun Nam, Sangwoo Mo, Jaeho Lee, Jinwoo Shin	(参考訳) サンプルとラベルの関係を捉える際に、条件付き生成モデルはトレーニングデータセットからスプリアス相関を継承することが多い。その結果、別の潜在属性に関して不均衡なラベル条件付き分布が生じうる。本稿では、条件付き生成のスプリアス因果関係と呼ぶこの問題を緩和するために、一般的な2段階戦略を提案する。 (a) 公平性介入(Fairness Intervention, FI): トレーニングデータセットのスプリアス相関により生成が困難なマイノリティサンプルを強調する。 (b) 補正サンプリング(Corrective Sampling, CS): 生成されたサンプルを明示的にフィルタリングし、所望の潜在属性分布に従うことを保証する。我々は、教師なし、弱教師あり、半教師ありのシナリオを含む、スプリアス属性に対するさまざまな程度の教師情報で機能するよう公平性介入を設計した。実験の結果、FICSは様々なデータセットにおいて条件付き生成のスプリアス因果関係を効果的に解消できることが示された。 To capture the relationship between samples and labels, conditional generative models often inherit spurious correlations from the training dataset. This can result in label-conditional distributions that are imbalanced with respect to another latent attribute. To mitigate this issue, which we call spurious causality of conditional generation, we propose a general two-step strategy. (a) Fairness Intervention (FI): emphasize the minority samples that are hard to generate due to the spurious correlation in the training dataset. (b) Corrective Sampling (CS): explicitly filter the generated samples and ensure that they follow the desired latent attribute distribution. We have designed the fairness intervention to work for various degrees of supervision on the spurious attribute, including unsupervised, weakly-supervised, and semi-supervised scenarios. Our experimental results demonstrate that FICS can effectively resolve spurious causality of conditional generation across various datasets.	翻訳日:2023-07-06 23:36:32 公開日:2023-07-04
# OPUS-MTを用いたニューラルマシン翻訳の民主化 Democratizing Neural Machine Translation with OPUS-MT ( http://arxiv.org/abs/2212.01936v3 ) ライセンス: Link先を確認	Jörg Tiedemann, Mikko Aulamo, Daria Bakshandaeva, Michele Boggia, Stig-Arne Grönroos, Tommi Nieminen, Alessandro Raganato, Yves Scherrer, Raul Vazquez, Sami Virpioja	(参考訳) 本稿では,オープン機械翻訳モデルとツールの開発,エンドユーザーアプリケーション,開発プラットフォーム,プロフェッショナルワークフローへの統合に焦点をあてたOPUSエコシステムについて述べる。我々は現在進行中の言語カバレッジと翻訳品質の向上に関するミッションについて論じるとともに,モジュール型翻訳モデルの開発と,通常のデスクトップや小型デバイス上でのリアルタイム翻訳のための高速化されたコンパクトソリューションについて述べる。 This paper presents the OPUS ecosystem with a focus on the development of open machine translation models and tools, and their integration into end-user applications, development platforms and professional workflows. We discuss our on-going mission of increasing language coverage and translation quality, and also describe on-going work on the development of modular translation models and speed-optimized compact solutions for real-time translation on regular desktops and small devices.	翻訳日:2023-07-06 23:36:15 公開日:2023-07-04
# eコマースサイトにおける感情分析と意見マイニング Sentiment analysis and opinion mining on E-commerce site ( http://arxiv.org/abs/2211.15536v2 ) ライセンス: Link先を確認	Fatema Tuz Zohra Anny and Oahidul Islam	(参考訳) 感情分析や意見マイニングは、NLP(Natural Language Processing)というフレーズを説明するのに役立つ。近年では感性分析が最も重要な話題となっている。本研究の目的は,感情分析における感情極性分類の課題を解決することである。全体的プロセスの説明とともに、感情的反対を分類する幅広い手法が提示される。分析の結果,文レベルの分類とレビューレベルの分類の両方が行われる。最後に,今後の感情分析研究の計画について述べる。 Sentiment analysis or opinion mining help to illustrate the phrase NLP (Natural Language Processing). Sentiment analysis has been the most significant topic in recent years. The goal of this study is to solve the sentiment polarity classification challenges in sentiment analysis. A broad technique for categorizing sentiment opposition is presented, along with comprehensive process explanations. With the results of the analysis, both sentence-level classification and review-level categorization are conducted. Finally, we discuss our plans for future sentiment analysis research.	翻訳日:2023-07-06 23:36:06 公開日:2023-07-04
# 位相相と論理ゲートのフェルミオン欠陥 Fermionic defects of topological phases and logical gates ( http://arxiv.org/abs/2211.12394v2 ) ライセンス: Link先を確認	Ryohei Kobayashi	(参考訳) 2+1)Dボソニック位相の余次元-1欠陥について論じ、そこでは欠陥がフェルミオン自由度を支持する。このような欠陥をフェルミオン欠陥(fermionic defects)と呼び、任意のオンの自己統計をシフトできる「ゲージググウェンspt欠陥(gauged gu-wen spt defects)」と呼ばれる可逆フェルミオン欠陥のサブクラスを導入する。我々は、ゲージ付きGu-Wen SPT欠陥と、その欠陥上のフェルミオンから分離されたボソニック非可逆欠陥の融合の観点から、一般フェルミオン非可逆欠陥の正準形式を導出した。次に、総称可逆フェルミオン欠陥の融合則を導出する。ゲージ付きGu-Wen SPT欠陥は、追加のアンシラフェルミオンの存在下で安定化符号の興味深い論理ゲートをもたらす。例えば、(2+1)d $\mathbb{z}_2$ toric符号に(2+1)d ancilla trivial atomic insulatorを積み重ねたcz論理ゲートが有限深さ回路によって実装されている。また,(3+1)d walker-wangモデルの境界上で実現される(2+1)dボソニック位相相間のガッピングフェルミオン界面についても検討した。この場合、ガッピングされた界面は(2+1)d相のキラル中心電荷をシフトすることができる。これらのフェミオン界面のうち、(3+1)D相が空間反射対称性を持ち、(2+1)D表面トポロジカル秩序とその向き反転を補間する反射面にフェルミオン界面が支持される興味深い例を研究する。この設定を実現する(3+1)d 可解ハミルトニアンを構築し、このモデルが反射平面上の空間反射対称性とフェルミオンパリティを持つ (3+1)d 可逆位相の$\mathbb{z}_8$ の分類を生成する。我々は、時空高群対称性を持つエキゾチックな可逆位相として知られる有効場理論と接触する。 We discuss the codimension-1 defects of (2+1)D bosonic topological phases, where the defects can support fermionic degrees of freedom. We refer to such defects as fermionic defects, and introduce a certain subclass of invertible fermionic defects called "gauged Gu-Wen SPT defects" that can shift self-statistics of anyons. We derive a canonical form of a general fermionic invertible defect, in terms of the fusion of a gauged Gu-Wen SPT defect and a bosonic invertible defect decoupled from fermions on the defect. We then derive the fusion rule of generic invertible fermionic defects. The gauged Gu-Wen SPT defects give rise to interesting logical gates of stabilizer codes in the presence of additional ancilla fermions. For example, we find a realization of the CZ logical gate on the (2+1)D $\mathbb{Z}_2$ toric code stacked with a (2+1)D ancilla trivial atomic insulator, which is implemented by a finite depth circuit. 
We also investigate a gapped fermionic interface between (2+1)D bosonic topological phases realized on the boundary of the (3+1)D Walker-Wang model. In that case, the gapped interface can shift the chiral central charge of the (2+1)D phase. Among these fermionic interfaces, we study an interesting example where the (3+1)D phase has a spatial reflection symmetry, and the fermionic interface is supported on a reflection plane that interpolates a (2+1)D surface topological order and its orientation-reversal. We construct a (3+1)D exactly solvable Hamiltonian realizing this setup, and find that the model generates the $\mathbb{Z}_8$ classification of the (3+1)D invertible phase with spatial reflection symmetry and fermion parity on the reflection plane. We make contact with an effective field theory, known in literature as the exotic invertible phase with spacetime higher-group symmetry.	翻訳日:2023-07-06 23:36:00 公開日:2023-07-04
# 解剖誘導型領域適応による3次元インベッドヒトポーズ推定 Anatomy-guided domain adaptation for 3D in-bed human pose estimation ( http://arxiv.org/abs/2211.12193v2 ) ライセンス: Link先を確認	Alexander Bigalke, Lasse Hansen, Jasper Diesel, Carlotta Hennigs, Philipp Rostalski, Mattias P. Heinrich	(参考訳) 3次元人間のポーズ推定は臨床モニタリングシステムの重要な構成要素である。しかし、深部ポーズ推定モデルの臨床的適用性は、十分なラベル付きトレーニングデータの必要性とともに、ドメインシフトの下での一般化の貧弱さによって制限されている。本稿では,ラベル付きソースからシフト未ラベルのターゲットドメインにモデルを適応させる新しいドメイン適応手法を提案する。本手法は,ヒト解剖学に関する事前知識に基づく2つの相補的適応戦略からなる。まず,対象領域における学習過程を,解剖学的に妥当なポーズの空間に制約することで導く。この目的のために, 従来の知識を解剖学的損失関数に組み込んで, 非対称な手足長, 骨長, 関節角度を解析した。第二に,自己学習のための疑似ラベルを解剖学的妥当性に応じてフィルタリングし,その概念を平均教師パラダイムに取り入れる。我々は、教師なしおよびソースなしのドメイン適応に適用可能なポイントクラウドベースのフレームワークで両方の戦略を統合する。パブリックSLPデータセットと新たに作成されたデータセットを用いて,2つの適応シナリオ下でのベッド内ポーズ推定を行う。本手法は,最先端ドメイン適応法を一貫して上回り,ベースラインモデルを31%/66%上回り,領域ギャップを65%/82%削減する。ソースコードはhttps://github.com/multimodallearning/da-3dhpe-anatomyで入手できる。 3D human pose estimation is a key component of clinical monitoring systems. The clinical applicability of deep pose estimation models, however, is limited by their poor generalization under domain shifts along with their need for sufficient labeled training data. As a remedy, we present a novel domain adaptation method, adapting a model from a labeled source to a shifted unlabeled target domain. Our method comprises two complementary adaptation strategies based on prior knowledge about human anatomy. First, we guide the learning process in the target domain by constraining predictions to the space of anatomically plausible poses. To this end, we embed the prior knowledge into an anatomical loss function that penalizes asymmetric limb lengths, implausible bone lengths, and implausible joint angles. Second, we propose to filter pseudo labels for self-training according to their anatomical plausibility and incorporate the concept into the Mean Teacher paradigm. We unify both strategies in a point cloud-based framework applicable to unsupervised and source-free domain adaptation. 
Evaluation is performed for in-bed pose estimation under two adaptation scenarios, using the public SLP dataset and a newly created dataset. Our method consistently outperforms various state-of-the-art domain adaptation methods, surpasses the baseline model by 31%/66%, and reduces the domain gap by 65%/82%. Source code is available at https://github.com/multimodallearning/da-3dhpe-anatomy.	翻訳日:2023-07-06 23:35:24 公開日:2023-07-04
# 量子客観性における冗長性とコンセンサスの意味 The meaning of redundancy and consensus in quantum objectivity ( http://arxiv.org/abs/2211.09150v2 ) ライセンス: Link先を確認	Dario A. Chisholm, Luca Innocenti, G. Massimo Palma	(参考訳) 量子客観性の文脈において「冗長性」と「合意」という用語はしばしば同義語として用いられるが、ここではこれらが量子-古典的遷移の異なる特徴を定量化する2つの関連しているが異なる概念として理解されるべきであることを示す。量子客観性、すなわちスペクトル放送構造と量子ダーウィン主義の2つの主要なフレームワークは、それぞれ冗長性とコンセンサスを定量化するのに最適であることを示す。さらに、非局所的に符号化された情報の明示的な例を解析することにより、冗長度とコンセンサスとの潜在的な相違を明らかにする。特に、これはスペクトル放送構造と量子ダーウィン主義の間の階層的関係を崩壊させる。我々のフレームワークは、量子客観性という文脈で既知の結果と将来の結果を解釈するための新しい視点を提供し、量子領域からの古典性の出現をより深く理解するための道を開く。 While the terms "redundancy" and "consensus" are often used as synonyms in the context of quantum objectivity, we show here that these should be understood as two related but distinct notions, that quantify different features of the quantum-to-classical transition. We show that the two main frameworks used to measure quantum objectivity, namely spectrum broadcast structure and quantum Darwinism, are best suited to quantify redundancy and consensus, respectively. Furthermore, by analyzing explicit examples of states with nonlocally encoded information, we highlight the potentially stark difference between the degrees of redundancy and consensus. In particular, this causes a break in the hierarchical relations between spectrum broadcast structure and quantum Darwinism. Our framework provides a new perspective to interpret known and future results in the context of quantum objectivity, paving the way for a deeper understanding of the emergence of classicality from the quantum realm.	翻訳日:2023-07-06 23:34:59 公開日:2023-07-04
# ケースベースニューラルネットワーク:時間変動と高次相互作用による生存率解析 Case-Base Neural Networks: survival analysis with time-varying, higher-order interactions ( http://arxiv.org/abs/2301.06535v3 ) ライセンス: Link先を確認	Jesse Islam, Maxime Turgeon, Robert Sladek, Sahir Bhatnagar	(参考訳) ニューラルネットワークに基づく生存法は、データ駆動の共変量相互作用をモデル化することができる。これらの手法は回帰に基づくアプローチよりも優れた予測性能を提供するが、時間変動相互作用や複雑なベースラインハザードをモデル化できるわけではない。そこで本研究では,ケースベースサンプリングフレームワークとフレキシブルニューラルネットワークアーキテクチャを組み合わせた新しいアプローチとして,ケースベースニューラルネットワーク(cbnns)を提案する。新たなサンプリング手法とデータ拡張を用いて、自然に検閲を考慮し、入力として時間がかかるかもしれないフィードフォワードニューラルネットワークを構築する。 cbnnは特定の瞬間に発生する事象の確率を予測し、ハザード関数を推定する。 CBNNの性能と回帰とニューラルネットワークに基づく生存法を比較したシミュレーションと,2つの時間依存メトリクスを用いた3つのケーススタディを行った。まず, 複雑なベースラインハザードと時間変動の相互作用を含むシミュレーションの性能を検証し, cbnn が競争相手を上回り, 全手法を評価する。次に,3つの実データアプリケーションに適用し,CBNNは2つの研究で競合するモデルより優れており,第3に同様の性能を示す。本研究は,ケースベースサンプリングと深層学習を組み合わせることで,データ駆動型・時間変動相互作用モデリングのための簡易かつ柔軟なモデリングフレームワークを提供する。 Rパッケージはhttps://github.com/Jesse-Islam/cbnnで入手できる。 Neural network-based survival methods can model data-driven covariate interactions. While these methods can provide better predictive performance than regression-based approaches, not all can model time-varying interactions and complex baseline hazards. To address this, we propose Case-Base Neural Networks (CBNNs) as a new approach that combines the case-base sampling framework with flexible neural network architectures. Using a novel sampling scheme and data augmentation to naturally account for censoring, we construct a feed-forward neural network that may take time as an input. CBNNs predict the probability of an event occurring at a given moment to estimate the hazard function. We compare the performance of CBNNs to regression and neural network-based survival methods in a simulation and three case studies using two time-dependent metrics. First, we examine performance on a simulation involving a complex baseline hazard and time-varying interactions to assess all methods, with CBNN outperforming competitors. 
Then, we apply all methods to three real data applications, with CBNNs outperforming the competing models in two studies and showing similar performance in the third. Our results highlight the benefit of combining case-base sampling with deep learning to provide a simple and flexible modeling framework for data-driven, time-varying interaction modeling of single event survival outcomes. An R package is available at https://github.com/Jesse-Islam/cbnn.	翻訳日:2023-07-06 23:27:35 公開日:2023-07-04
# 視覚言語関係アライメントのためのクロスモーダル注意調整 Cross-modal Attention Congruence Regularization for Vision-Language Relation Alignment ( http://arxiv.org/abs/2212.10549v2 ) ライセンス: Link先を確認	Rohan Pandey, Rulin Shao, Paul Pu Liang, Ruslan Salakhutdinov, Louis-Philippe Morency	(参考訳) マルチモーダル視覚言語モデルのスケールアップに向けた最近の進歩にもかかわらず、これらのモデルはWinogroundのような合成一般化ベンチマークに苦戦していることが知られている。現在の視覚言語モデルに欠けている重要な要素は、テキスト(例えば「草の中のマグ」)の方向的意味関係と画像中の空間的関係(例えば、草の相対的なマグの位置)とを一致させる能力である関係レベルアライメントである。この問題に対処するために,モーグから「グラス」への指示言語注意(意味的関係「イン」をキャプチャする)をモッグから草への指示視覚的注意に合わせることで,関係アライメントが実施可能であることを示す。相互注意を用いて、トークンとその対応するオブジェクトをソフトに識別する。我々は,このソフトリレーションアライメントの概念が,モーダル・アテンション・マトリクスによって提供される「ベースの変化」の下で,視覚と言語注意行列の一致を強制することと同値であることを示す。直感的には、我々のアプローチは言語注意空間への視覚的注意を投影し、実際の言語注意からの分岐を計算し、その逆も計算する。 UNITERにCACR(Cross-modal Attention Congruence Regularization)の損失を適用し,Winogroundに対する最先端アプローチを改善した。 Despite recent progress towards scaling up multimodal vision-language models, these models are still known to struggle on compositional generalization benchmarks such as Winoground. We find that a critical component lacking from current vision-language models is relation-level alignment: the ability to match directional semantic relations in text (e.g., "mug in grass") with spatial relationships in the image (e.g., the position of the mug relative to the grass). To tackle this problem, we show that relation alignment can be enforced by encouraging the directed language attention from 'mug' to 'grass' (capturing the semantic relation 'in') to match the directed visual attention from the mug to the grass. Tokens and their corresponding objects are softly identified using the cross-modal attention. We prove that this notion of soft relation alignment is equivalent to enforcing congruence between vision and language attention matrices under a 'change of basis' provided by the cross-modal attention matrix. 
Intuitively, our approach projects visual attention into the language attention space to calculate its divergence from the actual language attention, and vice versa. We apply our Cross-modal Attention Congruence Regularization (CACR) loss to UNITER and improve on the state-of-the-art approach to Winoground.	翻訳日:2023-07-06 23:26:44 公開日:2023-07-04
# ミニモデル適応:整列された浅い学習により事前学習済みモデルを新しい言語へ効率的に拡張する Mini-Model Adaptation: Efficiently Extending Pretrained Models to New Languages via Aligned Shallow Training ( http://arxiv.org/abs/2212.10503v2 ) ライセンス: Link先を確認	Kelly Marchisio, Patrick Lewis, Yihong Chen, Mikel Artetxe	(参考訳) 先行研究は、トランスフォーマー本体を凍結したまま新しい埋め込みを学習することで、事前学習済みのマスク言語モデル(MLM)を新しい言語に拡張できることを示している。学習するパラメータはごく一部であるにもかかわらず、新しい埋め込みの学習にはモデル全体に対する完全な順伝播と逆伝播が必要となるため、このアプローチは計算効率が良くない。我々は、大規模モデルのパラメータの一部から浅いミニモデルを構築する、計算効率のよい代替案であるミニモデル適応を提案する。新しい言語固有の埋め込みはミニモデル上で効率的に学習でき、整列された大規模モデルに差し込むことで迅速な言語間転移が可能になる。ミニモデルを学習する2つの手法を検討する。MiniJointは、中間層に第2のMLMヘッドを持つ単一のトランスフォーマーを用いて、主モデルとミニモデルを同時に事前学習する。MiniPostは、通常の事前学習済みモデルから出発し、いくつかの層を抽出・凍結してミニモデルを構築し、その上で少数のパラメータを学習する。 XNLI、MLQA、PAWS-Xでの実験では、ミニモデル適応は平均で2.3倍少ない計算量で標準的な手法の性能に匹敵する。 Prior work shows that it is possible to expand pretrained Masked Language Models (MLMs) to new languages by learning a new set of embeddings, while keeping the transformer body frozen. Despite learning a small subset of parameters, this approach is not compute-efficient, as training the new embeddings requires a full forward and backward pass over the entire model. We propose mini-model adaptation, a compute-efficient alternative that builds a shallow mini-model from a fraction of a large model's parameters. New language-specific embeddings can then be efficiently trained over the mini-model and plugged into the aligned large model for rapid cross-lingual transfer. We explore two approaches to learn mini-models: MiniJoint, which jointly pretrains the primary model and the mini-model using a single transformer with a secondary MLM head at a middle layer; and MiniPost, where we start from a regular pretrained model, build a mini-model by extracting and freezing a few layers, and learn a small number of parameters on top. Experiments on XNLI, MLQA and PAWS-X show that mini-model adaptation matches the performance of the standard approach using 2.3x less compute on average.	翻訳日:2023-07-06 23:26:16 公開日:2023-07-04
# 双対領域における画家的イメージ調和 Painterly Image Harmonization in Dual Domains ( http://arxiv.org/abs/2212.08846v4 ) ライセンス: Link先を確認	Junyan Cao, Yan Hong, Li Niu	(参考訳) 画像調和は、前景の外観を背景と適合するように調整することにより、視覚的に調和した複合画像を作成することを目的としている。合成画像が写真前景と画家的背景を有する場合、この課題は画家的イメージ調和と呼ばれる。このタスクには、時間を要するか、うまく調和した結果を生み出すのに弱い、ごくわずかの作業しかありません。本研究では,空間領域と周波数領域の両方の複合画像とを調和させるデュアルドメイン生成器とデュアルドメイン判別器からなる,新しい画家的調和ネットワークを提案する。デュアルドメイン生成器は,空間領域におけるadainモジュールと周波数領域における提案するresfftモジュールとの調和を行う。二重領域判別器は、各パッチの空間的特徴と周波数特徴に基づいて不調和なパッチを識別し、逆向きにジェネレータの能力を高める。ベンチマークデータセットの大規模な実験により,本手法の有効性が示された。私たちのコードとモデルはhttps://github.com/bcmi/PHDNet-Painterly-Image-Harmonizationで公開されています。 Image harmonization aims to produce visually harmonious composite images by adjusting the foreground appearance to be compatible with the background. When the composite image has photographic foreground and painterly background, the task is called painterly image harmonization. There are only few works on this task, which are either time-consuming or weak in generating well-harmonized results. In this work, we propose a novel painterly harmonization network consisting of a dual-domain generator and a dual-domain discriminator, which harmonizes the composite image in both spatial domain and frequency domain. The dual-domain generator performs harmonization by using AdaIN modules in the spatial domain and our proposed ResFFT modules in the frequency domain. The dual-domain discriminator attempts to distinguish the inharmonious patches based on the spatial feature and frequency feature of each patch, which can enhance the ability of generator in an adversarial manner. Extensive experiments on the benchmark dataset show the effectiveness of our method. Our code and model are available at https://github.com/bcmi/PHDNet-Painterly-Image-Harmonization.	翻訳日:2023-07-06 23:25:25 公開日:2023-07-04
# ドメイン内シナリオを超えて:ロバスト密度対応キャリブレーション Beyond In-Domain Scenarios: Robust Density-Aware Calibration ( http://arxiv.org/abs/2302.05118v2 ) ライセンス: Link先を確認	Christian Tomani, Futa Waseda, Yuesong Shen and Daniel Cremers	(参考訳) 深層ニューラルネットワークが安全性の重要なアプリケーションにますます導入される中、不確実性を考慮した予測を出力するよう深層学習モデルをキャリブレーションすることは極めて重要である。既存のポストホックなキャリブレーション手法は、ドメイン内のテストデータセットでは優れた結果を達成しているが、ドメインシフトやドメイン外(OOD)のシナリオでは信頼できる不確実性推定を与えられないという限界がある。我々は、k近傍法(KNN)に基づき、精度を保ちつつ密度を考慮するキャリブレーション手法DACを提案することで、このギャップを埋めることを目指す。既存のポストホック手法とは対照的に、分類器の隠れ層を不確実性に関する情報源として利用し、その重要性を検討する。 DACは最先端のポストホック手法と容易に組み合わせられる汎用的な手法であることを示す。 DACは、ドメイン内での優れた予測不確実性推定を維持しつつ、ドメインシフトおよびOODにおけるキャリブレーション性能のロバスト性を高める。DACが多数のモデルアーキテクチャ、データセット、評価指標にわたって一貫してキャリブレーションを改善することを示す。さらに、大量のデータで事前学習された最近の大規模ニューラルネットワークにおいても、DACがキャリブレーションを大幅に改善することを示す。 Calibrating deep learning models to yield uncertainty-aware predictions is crucial as deep neural networks get increasingly deployed in safety-critical applications. While existing post-hoc calibration methods achieve impressive results on in-domain test datasets, they are limited by their inability to yield reliable uncertainty estimates in domain-shift and out-of-domain (OOD) scenarios. We aim to bridge this gap by proposing DAC, an accuracy-preserving as well as Density-Aware Calibration method based on k-nearest-neighbors (KNN). In contrast to existing post-hoc methods, we utilize hidden layers of classifiers as a source for uncertainty-related information and study their importance. We show that DAC is a generic method that can readily be combined with state-of-the-art post-hoc methods. DAC boosts the robustness of calibration performance in domain-shift and OOD, while maintaining excellent in-domain predictive uncertainty estimates. We demonstrate that DAC leads to consistently better calibration across a large number of model architectures, datasets, and metrics. Additionally, we show that DAC improves calibration substantially on recent large-scale neural networks pre-trained on vast amounts of data.	翻訳日:2023-07-06 23:18:29 公開日:2023-07-04
# データ中心機械学習のための再ラベル法 The Re-Label Method For Data-Centric Machine Learning ( http://arxiv.org/abs/2302.04391v3 ) ライセンス: Link先を確認	Tong Guo	(参考訳) 業界深層学習アプリケーションでは、手作業でラベル付けしたデータは、一定の数のノイズデータを持っています。この問題を解決し、開発データセットで90以上のスコアを達成するために、人間のラベル付けにおける参照としてモデル予測を考慮し、ノイズデータを見つけ、ノイズデータを再ラベルする簡単な方法を提案する。本稿では,分類,シーケンスタグ付け,オブジェクト検出,シーケンス生成,クリックスルー率予測など,幅広いディープラーニングタスクのセットについて述べる。実験結果と人体評価結果は,我々の考えを検証する。 In industry deep learning application, our manually labeled data has a certain number of noisy data. To solve this problem and achieve more than 90 score in dev dataset, we present a simple method to find the noisy data and re-label the noisy data by human, given the model predictions as references in human labeling. In this paper, we illustrate our idea for a broad set of deep learning tasks, includes classification, sequence tagging, object detection, sequence generation, click-through rate prediction. The experimental results and human evaluation results verify our idea.	翻訳日:2023-07-06 23:17:47 公開日:2023-07-04
# ネットワークにおける2次元空間分割の生成モデル Generative models for two-ground-truth partitions in networks ( http://arxiv.org/abs/2302.02787v2 ) ライセンス: Link先を確認	Lena Mangold and Camille Roth	(参考訳) ネットワークのメソスケール構造を特徴付けるために、無数のアプローチが提案されている。明らかに、異なる種類のパターンを検出するために設計された異なる手法は、ネットワークのメソスケール構造に様々な答えをもたらす可能性がある。しかし、あるメソッドの複数の実行でさえ、多様で矛盾する結果をもたらすことがあるため、ネットワークの複数の(局所的に最適な)メソスケールの説明を含む、パーティションのランドスケープ全体を生成できる。このような曖昧さは、ネットワーク内の複数の定性的に異なる「根拠真理」パーティションを見つけるためのこれらの方法の能力をより詳しく見る動機となる。本稿では,1つのベンチマークネットワークのメソスケール構造に2つの異なるパーティションを組み込むことのできる生成モデルである確率的クロスブロックモデル(SCBM)を提案する。本研究では,確率ブロックモデル (SBM) のパワーを推定し,異なる強度の両コミュニティとコア周辺構造を暗黙的に植え付けることで,ベンチマークモデルの適用例を示す。モデル設計と実験的なセットアップから,2つのパーティションを個別に検出する能力はSBM変種によって異なり,両パーティションの共存は極めて限られたケースでのみ回復されることがわかった。以上の結果から,ほとんどの例では,他のパーティションが存在する場合でも,ひとつの構造のみを検出できることが示唆された。異なる競合する説明が存在する場合、分割の景観全体を考慮する必要性を強調し、分割共存検出法を前進させるために将来の研究を動機付ける。また,ネットワークのメソスケール構造におけるあいまいさを検出するために,新しい手法や既存手法のさらなる探索を可能にすることで,ベンチマークネットワークの分野に寄与する。 A myriad of approaches have been proposed to characterise the mesoscale structure of networks - most often as a partition based on patterns variously called communities, blocks, or clusters. Clearly, distinct methods designed to detect different types of patterns may provide a variety of answers to the network's mesoscale structure. Yet, even multiple runs of a given method can sometimes yield diverse and conflicting results, producing entire landscapes of partitions which potentially include multiple (locally optimal) mesoscale explanations of the network. Such ambiguity motivates a closer look at the ability of these methods to find multiple qualitatively different 'ground truth' partitions in a network. Here, we propose the stochastic cross-block model (SCBM), a generative model which allows for two distinct partitions to be built into the mesoscale structure of a single benchmark network. 
We demonstrate a use case of the benchmark model by appraising the power of stochastic block models (SBMs) to detect implicitly planted coexisting bi-community and core-periphery structures of different strengths. Given our model design and experimental set-up, we find that the ability to detect the two partitions individually varies by SBM variant and that coexistence of both partitions is recovered only in a very limited number of cases. Our findings suggest that in most instances only one - in some way dominating - structure can be detected, even in the presence of other partitions. They underline the need for considering entire landscapes of partitions when different competing explanations exist and motivate future research to advance partition coexistence detection methods. Our model also contributes to the field of benchmark networks more generally by enabling further exploration of the ability of new and existing methods to detect ambiguity in the mesoscale structure of networks.	翻訳日:2023-07-06 23:17:38 公開日:2023-07-04
# 多者間非局所性とデバイス非依存効果ウィットネスの階層 A Hierarchy of Multipartite Nonlocality and Device-Independent Effect Witnesses ( http://arxiv.org/abs/2301.12081v2 ) ライセンス: Link先を確認	Peter Bierhorst, Jitendra Prakash	(参考訳) 最近の新しい定義によれば、多者の挙動が真に多者間非局所的(GMNL)であるとは、全参加者が共有する局所的(古典的)資源で補われる場合も含め、二部のみの非局所資源からなる基礎ネットワーク上の測定によってモデル化できないことをいう。これらの新しい定義は、基礎となる二部資源に対するエンタングル測定を許すか、および/または二部資源間の超量子的な挙動を許すかによって異なる。本稿では、3者の量子ネットワークにおけるこれらのGMNLの候補定義の階層全体を分類し、ネットワーク効果のデバイス非依存なウィットネスとの密接な関係を明らかにする。 重要な発見は、最も単純な非自明な多者測定シナリオ(3者、2つの測定設定、2つの結果)において、エンタングル測定と超量子資源を禁止する二部ネットワークではシミュレートできず(したがって最も一般的な形のGMNLを示す)、一方でエンタングル測定を用いれば二部のみの量子状態でシミュレートできる挙動が存在することである。これは、従来のプロトコルよりも少ない設定でエンタングル測定をデバイス非依存に認証するアプローチを示している。 驚くべきことに、この(3,2,2)の挙動は、これまでエンタングル測定のデバイス非依存なウィットネスとして研究されてきた他の挙動と同様に、エンタングル測定を禁止しつつ超量子的な二部資源を許すGMNL階層のより高い階層で、すべてシミュレートできることもわかった。これは、エンタングル測定を二部非局所性とは異なる観測可能な現象として、理論に依存しない形で理解することに課題を突きつける。 According to recent new definitions, a multi-party behavior is genuinely multipartite nonlocal (GMNL) if it cannot be modeled by measurements on an underlying network of bipartite-only nonlocal resources, possibly supplemented with local (classical) resources shared by all parties. The new definitions differ on whether to allow entangled measurements upon, and/or superquantum behaviors among, the underlying bipartite resources. Here, we categorize the full hierarchy of these new candidate definitions of GMNL in three-party quantum networks, highlighting the intimate link to device-independent witnesses of network effects. 
A key finding is the existence of a behavior in the simplest nontrivial multi-partite measurement scenario (3 parties, 2 measurement settings, and 2 outcomes) that cannot be simulated in a bipartite network prohibiting entangled measurements and superquantum resources -- thus witnessing the most general form of GMNL -- but can be simulated with bipartite-only quantum states with an entangled measurement, indicating an approach to device independent certification of entangled measurements with fewer settings than in previous protocols. Surprisingly, we also find that this (3,2,2) behavior, as well as the others previously studied as device-independent witnesses of entangled measurements, can all be simulated at a higher echelon of the GMNL hierarchy that allows superquantum bipartite resources while still prohibiting entangled measurements. This poses a challenge to a theory-independent understanding of entangled measurements as an observable phenomenon distinct from bipartite nonlocality.	翻訳日:2023-07-06 23:17:10 公開日:2023-07-04
# 神経作用素の分布外リスク境界とヘルムホルツ方程式への応用 Out-of-distributional risk bounds for neural operators with applications to the Helmholtz equation ( http://arxiv.org/abs/2301.11509v3 ) ライセンス: Link先を確認	J. Antonio Lara Benitez, Takashi Furuya, Florian Faucher, Anastasis Kratsios, Xavier Tricoche, Maarten V. de Hoop	(参考訳) PDEによって定義された幅広い演算子の近似に顕著な成功にもかかわらず、既存のニューラル演算子(NO)は必ずしも全ての物理問題に対してうまく機能しない。ここでは高周波波に着目し,欠点を浮き彫りにする。そこで本研究では,nos のサブファミリーを提案し,境界領域上のヘルムホルツ方程式の境界値と解への波動速度の非線形作用素マッピングを拡張的に近似する手法を提案する。後者の作用素は、逆問題の研究において一般に'forward'演算子と呼ばれる。提案手法は,確率深度などのトランスフォーマーや技術からインスピレーションを得ている。本実験は,確率的深度導入の一般化と関連性において,ある種の驚きを明らかにするものである。我々のNOは、トレーニングディストリビューション内でのテストだけでなく、アウト・オブ・ディストリビューションのシナリオに対しても、標準的なNOよりも優れたパフォーマンスを示しています。この観察を掘り下げるために、修正されたモデルに関連するラデマッハ複雑性を詳細に分析し、既存のnosが満たさない確率的深さに結びついた上限を証明します。さらに,バナッハ空間上のガウス測度に合わせた,確率的深さと境界に関する新たな分布的リスクが得られた。我々は、NOsのサブファミリーのハイパーネットワークバージョンを、前述のフォワード演算子の代理モデルとして提案することで結論付ける。 Despite their remarkable success in approximating a wide range of operators defined by PDEs, existing neural operators (NOs) do not necessarily perform well for all physics problems. We focus here on high-frequency waves to highlight possible shortcomings. To resolve these, we propose a subfamily of NOs enabling an enhanced empirical approximation of the nonlinear operator mapping wave speed to solution, or boundary values for the Helmholtz equation on a bounded domain. The latter operator is commonly referred to as the ''forward'' operator in the study of inverse problems. Our methodology draws inspiration from transformers and techniques such as stochastic depth. Our experiments reveal certain surprises in the generalization and the relevance of introducing stochastic depth. Our NOs show superior performance as compared with standard NOs, not only for testing within the training distribution but also for out-of-distribution scenarios. 
To delve into this observation, we offer an in-depth analysis of the Rademacher complexity associated with our modified models and prove an upper bound tied to their stochastic depth that existing NOs do not satisfy. Furthermore, we obtain a novel out-of-distribution risk bound tailored to Gaussian measures on Banach spaces, again relating stochastic depth with the bound. We conclude by proposing a hypernetwork version of the subfamily of NOs as a surrogate model for the mentioned forward operator.	翻訳日:2023-07-06 23:15:38 公開日:2023-07-04
# 包摂的な機械翻訳のためのジェンダー中立化:理論的基礎から未解決の課題まで Gender Neutralization for an Inclusive Machine Translation: from Theoretical Foundations to Open Challenges ( http://arxiv.org/abs/2301.10075v3 ) ライセンス: Link先を確認	Andrea Piergentili, Dennis Fucci, Beatrice Savoldi, Luisa Bentivogli, Matteo Negri	(参考訳) 言語技術におけるジェンダー包摂性は、重要な研究テーマとなっている。本研究では,ジェンダー包摂の一形態であり,ジェンダーバイアスと差別を永続させることが指摘されている機械翻訳(MT)モデルが達成すべき目標として,ジェンダー中立翻訳(GNT)を検討する。具体的には、顕著なジェンダー関連の言語転移問題を代表する言語対である、英語からイタリア語への翻訳に焦点を当てる。 GNTを定義するために,ジェンダー包摂的な言語に関する制度的ガイドラインを選んでレビューし,その利用シナリオを議論し,MTでGNTを行う際の技術的課題を検討したうえで,MTにおける包摂性の向上に向けた進展を促す潜在的な解決策について議論する。 Gender inclusivity in language technologies has become a prominent research topic. In this study, we explore gender-neutral translation (GNT) as a form of gender inclusivity and a goal to be achieved by machine translation (MT) models, which have been found to perpetuate gender bias and discrimination. Specifically, we focus on translation from English into Italian, a language pair representative of salient gender-related linguistic transfer problems. To define GNT, we review a selection of relevant institutional guidelines for gender-inclusive language, discuss its scenarios of use, and examine the technical challenges of performing GNT in MT, concluding with a discussion of potential solutions to encourage advancements toward greater inclusivity in MT.	翻訳日:2023-07-06 23:15:15 公開日:2023-07-04
# イオン擬ポテンシャルを用いた電池材料の量子シミュレーション Quantum simulation of battery materials using ionic pseudopotentials ( http://arxiv.org/abs/2302.07981v2 ) ライセンス: Link先を確認	Modjtaba Shokrian Zini, Alain Delgado, Roberto dos Reis, Pablo A. M. Casares, Jonathan E. Mueller, Arne-Christian Voigt, Juan Miguel Arrazola	(参考訳) イオン擬ポテンシャルは、原子核と内殻電子による有効ポテンシャルをモデル化するために、材料の古典的シミュレーションで広く使われている。明示的に扱う電子の数を減らすことで、系の状態を正確に表すのに必要な平面波の数が減少する。本研究では,擬ポテンシャルを用いて量子コンピュータ上での周期的物質シミュレーションのコストを削減する量子アルゴリズムを提案する。平面波基底におけるハミルトニアンの第一量子化表現を用いる、qubitizationに基づく量子位相推定アルゴリズムを用いる。我々は、ハミルトニアンのqubitizationのための高度に最適化されたコンパイル戦略を開発することにより、擬ポテンシャルの複雑さを量子シミュレーションに組み込むという課題に対処する。これには、分離可能な擬ポテンシャルの形式を利用する、ユニタリの線形結合による分解が含まれる。我々の戦略は、量子算術のより効率的な代替手段として量子読み取り専用メモリのサブルーチンを利用する。我々は, 電池用のリチウム過剰カソード材料のシミュレーションに本アルゴリズムを適用する際の計算コストを推定する。これらの材料では、余剰容量への可逆的なアクセスを得るための戦略を立てるうえで、より正確なシミュレーションが必要とされている。我々は,リチウムマンガン酸化物,リチウムニッケルマンガン酸化物,リチウムマンガンオキシフッ化物の3つの材料について,十分な精度のシミュレーションを行うために必要な量子ビット数とToffoliゲート数を推定した。最適化されたコンパイル戦略により,固定された目標精度に対して,Toffoliの総コストが従来の最先端より4桁低い擬ポテンシャルベースの量子アルゴリズムが得られた。 Ionic pseudopotentials are widely used in classical simulations of materials to model the effective potential due to the nucleus and the core electrons. Modeling fewer electrons explicitly results in a reduction in the number of plane waves needed to accurately represent the states of a system. In this work, we introduce a quantum algorithm that uses pseudopotentials to reduce the cost of simulating periodic materials on a quantum computer. We use a qubitization-based quantum phase estimation algorithm that employs a first-quantization representation of the Hamiltonian in a plane-wave basis. We address the challenge of incorporating the complexity of pseudopotentials into quantum simulations by developing highly-optimized compilation strategies for the qubitization of the Hamiltonian. This includes a linear combination of unitaries decomposition that leverages the form of separable pseudopotentials. Our strategies make use of quantum read-only memory subroutines as a more efficient alternative to quantum arithmetic. 
We estimate the computational cost of applying our algorithm to simulating lithium-excess cathode materials for batteries, where more accurate simulations are needed to inform strategies for gaining reversible access to the excess capacity they offer. We estimate the number of qubits and Toffoli gates required to perform sufficiently accurate simulations with our algorithm for three materials: lithium manganese oxide, lithium nickel-manganese oxide, and lithium manganese oxyfluoride. Our optimized compilation strategies result in a pseudopotential-based quantum algorithm with a total Toffoli cost four orders of magnitude lower than the previous state of the art for a fixed target accuracy.	翻訳日:2023-07-06 23:05:54 公開日:2023-07-04
# 正規化層のみをチューニングすることの表現力 The Expressive Power of Tuning Only the Normalization Layers ( http://arxiv.org/abs/2302.07937v2 ) ライセンス: Link先を確認	Angeliki Giannou, Shashank Rajput, Dimitris Papailiopoulos	(参考訳) Batch-NormalizationやLayer-Normalizationといった特徴正規化変換は、最先端のディープニューラルネットワークに不可欠な要素となっている。大規模事前学習モデルのファインチューニングに関する最近の研究は、これらのアフィン変換のパラメータを調整するだけで下流タスクで高い精度を達成できることを示している。これらの知見は、凍結したネットワークの正規化層をチューニングすることの表現力に関する疑問を提起する。本稿では,この問題への第一歩として,ランダムなReLUネットワークにおいて,正規化層のみをファインチューニングすることで,$O(\sqrt{\text{width}})$倍小さい任意のターゲットネットワークを再構築できることを示す。十分な過剰パラメータ化の下では、ランダムにスパース化されたネットワークでもこれが成り立つことを示し、これは先行する実証的研究と一致する。 Feature normalization transforms such as Batch and Layer-Normalization have become indispensable ingredients of state-of-the-art deep neural networks. Recent studies on fine-tuning large pretrained models indicate that just tuning the parameters of these affine transforms can achieve high accuracy for downstream tasks. These findings open the questions about the expressive power of tuning the normalization layers of frozen networks. In this work, we take the first step towards this question and show that for random ReLU networks, fine-tuning only its normalization layers can reconstruct any target network that is $O(\sqrt{\text{width}})$ times smaller. We show that this holds even for randomly sparsified networks, under sufficient overparameterization, in agreement with prior empirical work.	翻訳日:2023-07-06 23:05:29 公開日:2023-07-04
# ラベリング予算制約下での深層異常検出 Deep Anomaly Detection under Labeling Budget Constraints ( http://arxiv.org/abs/2302.07832v2 ) ライセンス: Link先を確認	Aodong Li, Chen Qiu, Marius Kloft, Padhraic Smyth, Stephan Mandt, Maja Rudolph	(参考訳) 専門家のフィードバックを求める情報量の多いデータポイントを選択することで、医療診断や不正検出など、さまざまな場面で異常検出(AD)の性能を大きく向上させることができる。本稿では,異常スコアがラベル付きクエリからラベルなしデータへ一般化するための理論的条件の集合を決定する。これらの結果に基づき,ラベリング予算の制約の下で最適なデータカバレッジを持つデータラベリング戦略を提案する。さらに,半教師付きADのための新しい学習フレームワークを提案する。画像, 表, ビデオのデータセットを用いた広範な実験により, 提案手法がラベリング予算の制約下で最先端の半教師付きAD性能を達成することを示す。 Selecting informative data points for expert feedback can significantly improve the performance of anomaly detection (AD) in various contexts, such as medical diagnostics or fraud detection. In this paper, we determine a set of theoretical conditions under which anomaly scores generalize from labeled queries to unlabeled data. Motivated by these results, we propose a data labeling strategy with optimal data coverage under labeling budget constraints. In addition, we propose a new learning framework for semi-supervised AD. Extensive experiments on image, tabular, and video data sets show that our approach results in state-of-the-art semi-supervised AD performance under labeling budget constraints.	翻訳日:2023-07-06 23:05:16 公開日:2023-07-04
# 次元低減とMARS Dimension Reduction and MARS ( http://arxiv.org/abs/2302.05790v2 ) ライセンス: Link先を確認	Yu Liu, Degui Li, Yingcun Xia	(参考訳) 多変量適応回帰スプライン(MARS)は、非パラメトリック多変量回帰の一般的な推定方法の1つである。しかし、MARSは境界スプラインに基づいてコヴァリエートの相互作用を組み込むため、境界スプラインの積を使わなければならないため、相互作用の順序が高ければ管理不能な基底関数の数が増加し、推定効率が低下する。本稿では,十分次元削減を実現する共変数の線形結合を用いてMARSの性能を向上させる。 MARSの特殊基底関数は回帰関数の勾配の計算を容易にし、勾配の外部積の固有解析により線形結合の推定を行う。いくつかの技術的条件下では,提案手法の漸近理論が確立されている。シミュレーションと経験的応用の両方を含む数値的研究は、回帰推定と予測においてMARSや他の一般的な非パラメトリック法よりも次元の減少と改善に有効であることを示す。 The multivariate adaptive regression spline (MARS) is one of the popular estimation methods for nonparametric multivariate regressions. However, as MARS is based on marginal splines, to incorporate interactions of covariates, products of the marginal splines must be used, which leads to an unmanageable number of basis functions when the order of interaction is high and results in low estimation efficiency. In this paper, we improve the performance of MARS by using linear combinations of the covariates which achieve sufficient dimension reduction. The special basis functions of MARS facilitate calculation of gradients of the regression function, and estimation of the linear combinations is obtained via eigen-analysis of the outer-product of the gradients. Under some technical conditions, the asymptotic theory is established for the proposed estimation method. Numerical studies including both simulation and empirical applications show its effectiveness in dimension reduction and improvement over MARS and other commonly-used nonparametric methods in regression estimation and prediction.	翻訳日:2023-07-06 23:04:41 公開日:2023-07-04
# (非)マルコフ量子チャネル下での離散ウィグナー関数を用いた状態の量子性の活用 Harnessing quantumness of states using discrete Wigner functions under (non)-Markovian quantum channels ( http://arxiv.org/abs/2303.05291v2 ) ライセンス: Link先を確認	Jai Lalita, K. G. Paulson, Subhashish Banerjee	(参考訳) 離散ウィグナー関数(DWF)の負性は非古典性の尺度であり、系の量子コヒーレンスの度合いを定量化するためによく用いられる。さまざまな量子チャネルの下でのウィグナー負性とその時間発展を調べることで、環境との相互作用の下での量子状態の安定性と堅牢性についての洞察が得られ、これは実用的な量子コンピューティングシステムの開発に不可欠である。我々は,(非)マルコフ的なランダム電信ノイズ(RTN)および振幅減衰(AD)量子チャネルの作用の下での, 量子ビット, 量子トリット, 2量子ビット系のDWF負性の変化を調べる。量子計算と量子テレポーテーションの資源として使用できる、さまざまな負の量子状態を構築する。(非)マルコフ的な時間発展の下で、これらの状態に対する量子計算とテレポーテーションの成功度を推定する。 The negativity of the discrete Wigner functions (DWFs) is a measure of non-classicality and is often used to quantify the degree of quantum coherence in a system. The study of Wigner negativity and its evolution under different quantum channels can provide insight into the stability and robustness of quantum states under their interaction with the environment, which is essential for developing practical quantum computing systems. We investigate the variation of DWF negativity of qubit, qutrit, and two-qubit systems under the action of (non)-Markovian random telegraph noise (RTN) and amplitude damping (AD) quantum channels. We construct different negative quantum states which can be used as a resource for quantum computation and quantum teleportation. The success of quantum computation and teleportation is estimated for these states under (non)-Markovian evolutions.	翻訳日:2023-07-06 22:59:12 公開日:2023-07-04
# デバイス非依存プロトコルの制約リークに対するロバスト性 Robustness of implemented device-independent protocols against constrained leakage ( http://arxiv.org/abs/2302.13928v2 ) ライセンス: Link先を確認	Ernest Y.-Z. Tan	(参考訳) 近年、デバイス非依存(DI)プロトコルは、DIランダムネスの生成や拡張、およびDI量子鍵分布の一連のデモによって大きな進歩を遂げている。しかし、これらのデモの既存のセキュリティ証明は、DI暗号の典型的な前提に依存しており、デバイスが互いに望ましくない情報を漏らさないか、敵に漏らさない。この仮定は、実際に完全に実施することは難しいかもしれない。このようなリーク量の制約を考慮に入れたDIセキュリティ証明は他にも存在するが、使用されるテクニックは最近のDIプロトコルのデモを分析するのに適していない。本稿では,この目的に適した制約付き漏洩モデルについて検討し,今後の類似実験にも適用すべき課題について考察する。我々の証明構造は、幅広いdiプロトコルの実装を柔軟に分析するための最近の証明技術と互換性がある。提案手法では,これらのプロトコルの鍵レートに対する漏洩の影響を推定し,正の鍵レートを得ながら許容される漏洩量を明確に把握する。 Device-independent (DI) protocols have experienced significant progress in recent years, with a series of demonstrations of DI randomness generation or expansion, as well as DI quantum key distribution. However, existing security proofs for those demonstrations rely on a typical assumption in DI cryptography, that the devices do not leak any unwanted information to each other or to an adversary. This assumption may be difficult to perfectly enforce in practice. While there exist other DI security proofs that account for a constrained amount of such leakage, the techniques used are somewhat unsuited for analyzing the recent DI protocol demonstrations. In this work, we address this issue by studying a constrained leakage model suited for this purpose, which should also be relevant for future similar experiments. Our proof structure is compatible with recent proof techniques for flexibly analyzing a wide range of DI protocol implementations. With our approach, we compute some estimates of the effects of leakage on the keyrates of those protocols, hence providing a clearer understanding of the amount of leakage that can be allowed while still obtaining positive keyrates.	翻訳日:2023-07-06 22:57:34 公開日:2023-07-04
# ディープニューラルネットワークの二重降下は避けられるか? Can we avoid Double Descent in Deep Neural Networks? ( http://arxiv.org/abs/2302.13259v4 ) ライセンス: Link先を確認	Victor Qu\'etu and Enzo Tartaglione	(参考訳) ディープラーニングモデルの最適サイズを見つけることは、特に省エネスキームにおいて、非常に現実的で幅広い影響を与える。最近になって,予期せぬ現象である‘二重降下’が,ディープラーニングコミュニティの注目を集めている。モデルのサイズが大きくなると、まずパフォーマンスが悪化し、その後は改善に戻ります。これは、高一般化を維持するために最適なモデルのサイズに関する深刻な疑問を提起する: モデルは十分に過度にパラメータ化する必要があるが、パラメータが多すぎるとトレーニングリソースが浪費される。効果的な方法で、最良のトレードオフを見つけることは可能か? 本研究は,学習問題の適切な条件付けによって二重降下現象を回避できる可能性を示唆するが,最終的な答えは見当たらない。我々は、単純な$\ell_2$正則化が既にそのような観点に肯定的な貢献をしているので、適切な正則化を持つ複素シナリオにおいて二重降下が期待されていることを実証的に観察する。 Finding the optimal size of deep learning models is very actual and of broad impact, especially in energy-saving schemes. Very recently, an unexpected phenomenon, the ``double descent'', has caught the attention of the deep learning community. As the model's size grows, the performance gets first worse, and then goes back to improving. It raises serious questions about the optimal model's size to maintain high generalization: the model needs to be sufficiently over-parametrized, but adding too many parameters wastes training resources. Is it possible to find, in an efficient way, the best trade-off? Our work shows that the double descent phenomenon is potentially avoidable with proper conditioning of the learning problem, but a final answer is yet to be found. We empirically observe that there is hope to dodge the double descent in complex scenarios with proper regularization, as a simple $\ell_2$ regularization is already positively contributing to such a perspective.	翻訳日:2023-07-06 22:56:26 公開日:2023-07-04
# Video-SwinUNet: VFSSインスタンス分割のための時空間深層学習フレームワーク Video-SwinUNet: Spatio-temporal Deep Learning Framework for VFSS Instance Segmentation ( http://arxiv.org/abs/2302.11325v2 ) ライセンス: Link先を確認	Chengxi Zeng, Xinyu Yang, David Smithard, Majid Mirmehdi, Alberto M Gambaruto, Tilo Burghardt	(参考訳) 本稿では,医療ビデオセグメンテーションのためのディープラーニングフレームワークを提案する。畳み込みニューラルネットワーク(CNN)とTransformerベースの手法は、優れたセマンティック特徴のエンコーディング能力とグローバルな情報の理解能力によって、医療画像セグメンテーションタスクにおいて大きなマイルストーンを達成してきた。しかし、既存のアプローチのほとんどは、時間次元という医療ビデオデータの重要な側面を無視している。提案するフレームワークは,時間次元にまたがって隣接フレームから特徴を明示的に抽出し,時間的特徴ブレンダで統合する。時間的特徴ブレンダは高レベルの時空間特徴をトークン化し,Swin Transformerで符号化された強力なグローバル特徴を形成する。最終的なセグメンテーション結果は、UNetのようなエンコーダ・デコーダアーキテクチャによって生成される。提案モデルは他の手法を大きく上回り,VFSS2022データセットのセグメンテーションベンチマークを改善し,テストした2つのデータセットに対して0.8986と0.8186のDice係数を達成した。本研究は,時間的特徴ブレンディング方式の有効性と,学習した能力のデータセット間での転移可能性も示す。コードとモデルは https://github.com/SimonZeng7108/Video-SwinUNet で完全に利用できる。 This paper presents a deep learning framework for medical video segmentation. Convolution neural network (CNN) and transformer-based methods have achieved great milestones in medical image segmentation tasks due to their incredible semantic feature encoding and global information comprehension abilities. However, most existing approaches ignore a salient aspect of medical video data - the temporal dimension. Our proposed framework explicitly extracts features from neighbouring frames across the temporal dimension and incorporates them with a temporal feature blender, which then tokenises the high-level spatio-temporal feature to form a strong global feature encoded via a Swin Transformer. The final segmentation results are produced via a UNet-like encoder-decoder architecture. Our model outperforms other approaches by a significant margin and improves the segmentation benchmarks on the VFSS2022 dataset, achieving a dice coefficient of 0.8986 and 0.8186 for the two datasets tested. Our studies also show the efficacy of the temporal feature blending scheme and cross-dataset transferability of learned capabilities. 
Code and models are fully available at https://github.com/SimonZeng7108/Video-SwinUNet.	翻訳日:2023-07-06 22:56:08 公開日:2023-07-04
# deforestvis:surrogate decision stumpsを用いた機械学習モデルの行動分析 DeforestVis: Behavior Analysis of Machine Learning Models with Surrogate Decision Stumps ( http://arxiv.org/abs/2304.00133v2 ) ライセンス: Link先を確認	Angelos Chatzimparmpas, Rafael M. Martins, Alexandru C. Telea, Andreas Kerren	(参考訳) 機械学習(ML)モデルの複雑さが増し、異なる(そして重要な)ドメインのアプリケーションが増加するにつれて、より解釈可能で信頼性の高いMLが強く求められている。複雑なmlモデルを理解するための単純でモデルに依存しない方法の1つは、ルールセットや決定木といった、よりシンプルで説明しやすく、元のモデルに十分近似するサーロゲートモデルを訓練することである。しかし、ルールセットは非常に長くなり、多くのif-else文があり、複雑なMLモデルを正確にエミュレートすると決定木深さが急速に増加する。そのような場合、両方のアプローチはコア目標を達成できず、ユーザーにモデル解釈性を提供する。我々は,adaptive boosting (adaboost) 技術を用いて生成されたサーロゲート決定スランプ (one-level decision tree) を提供することにより,複雑なmlモデルの振る舞いをユーザフレンドリに要約するビジュアル分析ツールであるdeforestvisを提案する。私たちのソリューションは、より多くの切り株をインクリメンタルに生成し、決定を正当化するための重み付き切り株による属性ベースの説明を作成し、ルールオーバーライドが1つ以上の切り株間のトレーニングインスタンス割り当てに与える影響を分析することで、複雑さと忠実さのトレードオフを探索するのに役立つ。独立したテストセットによって、ユーザは手動のルール変更の有効性を監視し、ケースバイケースの調査に基づいて仮説を形成することができる。 2つのユースケースでdeforestvisの適用可能性と有用性を示し,データアナリストとモデル開発者とのエキスパートインタビューを行った。 As the complexity of machine learning (ML) models increases and the applications in different (and critical) domains grow, there is a strong demand for more interpretable and trustworthy ML. One straightforward and model-agnostic way to interpret complex ML models is to train surrogate models, such as rule sets and decision trees, that sufficiently approximate the original ones while being simpler and easier-to-explain. Yet, rule sets can become very lengthy, with many if-else statements, and decision tree depth grows rapidly when accurately emulating complex ML models. In such cases, both approaches can fail to meet their core goal, providing users with model interpretability. We tackle this by proposing DeforestVis, a visual analytics tool that offers user-friendly summarization of the behavior of complex ML models by providing surrogate decision stumps (one-level decision trees) generated with the adaptive boosting (AdaBoost) technique. 
Our solution helps users to explore the complexity vs fidelity trade-off by incrementally generating more stumps, creating attribute-based explanations with weighted stumps to justify decision making, and analyzing the impact of rule overriding on training instance allocation between one or more stumps. An independent test set allows users to monitor the effectiveness of manual rule changes and form hypotheses based on case-by-case investigations. We show the applicability and usefulness of DeforestVis with two use cases and expert interviews with data analysts and model developers.	翻訳日:2023-07-06 22:48:04 公開日:2023-07-04
# 表面電子のリドバーグ状態に基づく制御NOTゲート Controlled-NOT gate based on the Rydberg states of surface electrons ( http://arxiv.org/abs/2303.08650v4 ) ライセンス: Link先を確認	Jun Wang, Wan-Ting He, Cong-Wei Lu, Yang-Yang Wang, Qing Ai, Hai-Bo Wang	(参考訳) 長いコヒーレンス時間と効率的な操作性のため、表面電子(SE)は量子計算と量子シミュレーションのための理想的な2次元プラットフォームを提供する。本研究では,制御NOT(CNOT)ゲートを実現するための理論的スキームを提案する。ここで、2量子ビット系はSEの4準位リドバーグ構造上に符号化される。状態の移送は、中間準位を持つ3準位構造によって達成される。 2つの外部電磁場でSEを同時に駆動することにより、電磁誘起透明化(EIT)効果における暗状態を利用して、最も散逸の大きい状態の占有数を抑制し、散逸に対する堅牢性を高める。このスキームの忠実度は、実験的に達成可能なパラメータで 0.9989 である。 Due to the long coherence time and efficient manipulation, the surface electron (SE) provides a perfect two-dimensional platform for quantum computation and quantum simulation. In this work, a theoretical scheme to realize the controlled-NOT (CNOT) gate is proposed, where the two-qubit system is encoded on the four-level Rydberg structure of SE. The state transfer is achieved by a three-level structure with an intermediate level. By simultaneously driving the SE with two external electromagnetic fields, the dark state in the electromagnetically induced transparency (EIT) effect is exploited to suppress the population of the most dissipative state and increase the robustness against dissipation. The fidelity of the scheme is 0.9989 with experimentally achievable parameters.	翻訳日:2023-07-06 22:46:16 公開日:2023-07-04
# 層状材料を用いた光学における定常2状態系 Stationary Two-State System in Optics using Layered Materials ( http://arxiv.org/abs/2303.08395v2 ) ライセンス: Link先を確認	Ken-ichi Sasaki	(参考訳) グラフェンのように電子が平坦な面にのみ存在する状況で電磁気学を量子化すると、マクスウェル方程式の1つがハミルトニアンの局所部分として現れる。ゲージ不変性の帰結として、任意の物理的状態は局所ハミルトニアンのゼロエネルギー状態でなければならない。我々は2つの定常量子状態を構築する。一方は古典光学でよく知られた光の散乱と吸収を再現し、他方はより根本的に光子の生成に関係している。これら2つの状態はハミルトニアンによって分離できず、2状態系を形成するが、2つの状態が分離する特別な面の数が存在する。その数は $2/\pi \alpha$ であり、$\pi \alpha$ は単一の面の吸収確率である。 When electrodynamics is quantized in a situation where the electrons exist only at a flat surface such as graphene, one of the Maxwell equations appears as a local part of the Hamiltonian. As a consequence of gauge invariance, any physical state has to be a zero-energy state of the local Hamiltonian. We construct two stationary quantum states; one reproduces scattering and absorption of light, which is familiar in classical optics and the other is more fundamentally related to photon creation. These two states are inseparable by the Hamiltonian and forming a two-state system, but there is a special number of surfaces for which two states are decoupled. The number is $2/\pi \alpha$ where $\pi \alpha$ is the absorption probability of single surface.	翻訳日:2023-07-06 22:46:04 公開日:2023-07-04
# FairAdaBN:適応的バッチ正規化による不公平さの軽減と皮膚疾患分類への応用 FairAdaBN: Mitigating unfairness with adaptive batch normalization and its application to dermatological disease classification ( http://arxiv.org/abs/2303.08325v2 ) ライセンス: Link先を確認	Zikang Xu, Shang Zhao, Quan Quan, Qingsong Yao, and S. Kevin Zhou	(参考訳) 深層学習は、センシティブな情報や重要な診断決定を含む一方で、医学研究やアプリケーションにおいてますます普及している。研究者たちは、モデル不公平と呼ばれる異なる階層特性を持つサブグループ間での顕著なパフォーマンス格差を観察し、厳密なアーキテクチャを慎重に設計し、トレーニングの重荷を伴い、一般化を損なうとともに、モデルパフォーマンスと公平性のトレードオフを明らかにする。そこで本研究では,バッチ正規化を高感度属性に適応させることにより,fairadabnを提案する。この単純だが効果的な設計は、もともと公平を知らないいくつかの分類バックボーンに適用することができる。さらに、ミニバッチ上の部分群間の統計的パリティを抑える新しい損失関数を導出し、モデルが相当公正に収束するように促す。モデル性能と公平性の間のトレードオフを評価するために,fate(fairness-accuracy trade-off efficiency)と呼ばれる新しい指標を提案し,精度低下による正規化フェアネス改善を計算する。 2つの皮膚科学データセットを用いた実験により,提案手法はフェアネス基準とFATEの他の手法よりも優れていた。 Deep learning is becoming increasingly ubiquitous in medical research and applications while involving sensitive information and even critical diagnosis decisions. Researchers observe a significant performance disparity among subgroups with different demographic attributes, which is called model unfairness, and put lots of effort into carefully designing elegant architectures to address unfairness, which poses heavy training burden, brings poor generalization, and reveals the trade-off between model performance and fairness. To tackle these issues, we propose FairAdaBN by making batch normalization adaptive to sensitive attribute. This simple but effective design can be adopted to several classification backbones that are originally unaware of fairness. Additionally, we derive a novel loss function that restrains statistical parity between subgroups on mini-batches, encouraging the model to converge with considerable fairness. In order to evaluate the trade-off between model performance and fairness, we propose a new metric, named Fairness-Accuracy Trade-off Efficiency (FATE), to compute normalized fairness improvement over accuracy drop. 
Experiments on two dermatological datasets show that our proposed method outperforms other methods on fairness criteria and FATE.	翻訳日:2023-07-06 22:45:49 公開日:2023-07-04
# 表データを用いたディープラーニングのためのグラフニューラルネットワークによる文脈埋め込み Graph Neural Network contextual embedding for Deep Learning on Tabular Data ( http://arxiv.org/abs/2303.06455v2 ) ライセンス: Link先を確認	Mario Villaiz\'an-Vallelado, Matteo Salvatori, Bel\'en Carro Martinez, Antonio Javier Sanchez Esguevillas	(参考訳) あらゆる業界が、いわゆる表形式で利用可能な既存のビッグデータに基づいて、人工知能(AI)を活用しようとしている。表形式では、各レコードは特徴量とも呼ばれる多数の異種の連続値列とカテゴリ値列で構成される。ディープラーニング(DL)は、自然言語処理のような人間のスキルに関連する分野においてAIの大きなブレークスルーとなったが、表データへの適用はより困難であった。通常は、ツリーベースのアンサンブルのような、より古典的な機械学習(ML)モデルの方が高い性能を示す。本稿では,グラフニューラルネットワーク(GNN),より具体的にはインタラクションネットワーク(IN)を用いて,文脈埋め込みと表の特徴量間の相互作用のモデル化を行う新しいDLモデルを提案する。その結果は、5つの公開データセットに基づくDLベンチマークを含む最近のサーベイで報告された結果を上回り、ブースティング木による手法と比較しても競争力のある結果を達成している。 All industries are trying to leverage Artificial Intelligence (AI) based on their existing big data which is available in so called tabular form, where each record is composed of a number of heterogeneous continuous and categorical columns also known as features. Deep Learning (DL) has constituted a major breakthrough for AI in fields related to human skills like natural language processing, but its applicability to tabular data has been more challenging. More classical Machine Learning (ML) models like tree-based ensemble ones usually perform better. This paper presents a novel DL model using Graph Neural Network (GNN) more specifically Interaction Network (IN), for contextual embedding and modelling interactions among tabular features. Its results outperform those of a recently published survey with DL benchmark based on five public datasets, also achieving competitive results when compared to boosted-tree solutions.	翻訳日:2023-07-06 22:45:03 公開日:2023-07-04
# 機械学習アルゴリズムの記述的解析による部分順序の深さ関数 Depth Functions for Partial Orders with a Descriptive Analysis of Machine Learning Algorithms ( http://arxiv.org/abs/2304.09872v2 ) ライセンス: Link先を確認	Hannah Blocher, Georg Schollmeyer, Christoph Jansen, Malte Nalenz	(参考訳) 本稿では,深度関数の概念に基づく部分順序集合を記述的に解析するフレームワークを提案する。線形空間および距離空間における深さ関数の集中的な研究にもかかわらず、部分順序のような非標準データ型に対する深さ関数についてはほとんど議論がない。我々は、よく知られたsimplicial depthをすべての部分順序、union-free generic (ufg) depthの集合に適応させる。さらに,多次元性能測定に基づく機械学習アルゴリズムの比較のために,我々の ufg 深度を利用する。具体的には、標準ベンチマークデータセットのサンプル上で異なる分類器の性能の分布を分析する。提案手法が既存のベンチマーク手法と大きく異なることを有望に証明し,分類器の比較に関する活発な議論に新たな視点を付加した。 We propose a framework for descriptively analyzing sets of partial orders based on the concept of depth functions. Despite intensive studies of depth functions in linear and metric spaces, there is very little discussion on depth functions for non-standard data types such as partial orders. We introduce an adaptation of the well-known simplicial depth to the set of all partial orders, the union-free generic (ufg) depth. Moreover, we utilize our ufg depth for a comparison of machine learning algorithms based on multidimensional performance measures. Concretely, we analyze the distribution of different classifier performances over a sample of standard benchmark data sets. Our results promisingly demonstrate that our approach differs substantially from existing benchmarking approaches and, therefore, adds a new perspective to the vivid debate on the comparison of classifiers.	翻訳日:2023-07-06 22:40:02 公開日:2023-07-04
# ストリーミングデータのコストを考慮した能動的ラベリング Active Cost-aware Labeling of Streaming Data ( http://arxiv.org/abs/2304.06808v2 ) ライセンス: Link先を確認	Ting Cai, Kirthevasan Kandasamy	(参考訳) 本稿では,ストリーミングデータの能動的ラベル付けを研究する。能動学習者はデータポイントのストリームに直面し、そのうちどの点に高価な実験によってラベルを付けるかを慎重に選択しなければならない。このような問題は医療や天文学などの応用でしばしば発生する。最初に、データの入力が$K$個の離散分布のいずれかに属する場合の設定を研究し、ラベリングコストと予測誤差を捉える損失によってこの問題を定式化する。ラベリングコストが$B$の場合、不確実性が時間とコストに依存するしきい値より大きいときにその点にラベルを付ける我々のアルゴリズムは、$T$ラウンド後の損失に対して$\widetilde{O}(B^{\frac{1}{3}} K^{\frac{1}{3}} T^{\frac{2}{3}})$の最悪ケース上界を達成する。また、よりきめ細かい上界も与え、アルゴリズムが到着パターンに適応でき、到着パターンがより有利な場合により良い性能を達成することを示す。両方の上界に対して、一致する下界を与える。次に、入力が連続領域に属し、実験の出力が有界なRKHSノルムを持つ滑らかな関数である場合について、この問題を研究する。 $d$次元での$T$ラウンドの後、損失は二乗指数カーネルを持つRKHSでは$\widetilde{O}(B^{\frac{1}{d+3}} T^{\frac{d+2}{d+3}})$、Mat\'ernカーネルを持つRKHSでは$\widetilde{O}(B^{\frac{1}{2d+3}} T^{\frac{2d+2}{2d+3}})$で抑えられることを示す。実験的評価により、本手法はいくつかの合成実験および医学と天文学における2つの実実験において、他のベースラインを上回ることを示す。 We study actively labeling streaming data, where an active learner is faced with a stream of data points and must carefully choose which of these points to label via an expensive experiment. Such problems frequently arise in applications such as healthcare and astronomy. We first study a setting when the data's inputs belong to one of $K$ discrete distributions and formalize this problem via a loss that captures the labeling cost and the prediction error. When the labeling cost is $B$, our algorithm, which chooses to label a point if the uncertainty is larger than a time and cost dependent threshold, achieves a worst-case upper bound of $\widetilde{O}(B^{\frac{1}{3}} K^{\frac{1}{3}} T^{\frac{2}{3}})$ on the loss after $T$ rounds. We also provide a more nuanced upper bound which demonstrates that the algorithm can adapt to the arrival pattern, and achieves better performance when the arrival pattern is more favorable. We complement both upper bounds with matching lower bounds. 
We next study this problem when the inputs belong to a continuous domain and the output of the experiment is a smooth function with bounded RKHS norm. After $T$ rounds in $d$ dimensions, we show that the loss is bounded by $\widetilde{O}(B^{\frac{1}{d+3}} T^{\frac{d+2}{d+3}})$ in an RKHS with a squared exponential kernel and by $\widetilde{O}(B^{\frac{1}{2d+3}} T^{\frac{2d+2}{2d+3}})$ in an RKHS with a Mat\'ern kernel. Our empirical evaluation demonstrates that our method outperforms other baselines in several synthetic experiments and two real experiments in medicine and astronomy.	翻訳日:2023-07-06 22:38:57 公開日:2023-07-04
# 準エントロピーの単調性における等号成立の場合、Liebの凹性、安藤の凸性 Equality cases in monotonicity of quasi-entropies, Lieb's concavity and Ando's convexity ( http://arxiv.org/abs/2304.04361v3 ) ライセンス: Link先を確認	Fumio Hiai	(参考訳) 我々は、Petzによる準エントロピーの同時凹性/凸性および単調性を新しい方法で再検討し改善する。次に、準エントロピーの単調性不等式(データ処理不等式)における等号成立条件を、以下のようにいくつかの方法で特徴づける: $\Phi:\mathcal{B}(\mathcal{H})\to\mathcal{B}(\mathcal{K})$ を、$\Phi^*$ がシュワルツ写像であるようなトレース保存写像とする。 $f$ が $[0,\infty)$ 上の作用素単調関数または作用素凸関数であるとき、与えられた $\mathcal{H}$ 上の正作用素 $\rho,\sigma$ と $K\in\mathcal{B}(\mathcal{K})$ に対して等式 $S_f^K(\Phi(\rho)\|\Phi(\sigma))=S_f^{\Phi^*(K)}(\rho\|\sigma)$ が成り立つための同値な条件をいくつか提示する。これらの条件は、Liebの凹性定理および安藤の凸性定理の単調性版における等号成立の場合を含む。写像 $\Phi$ を特殊化することで、Liebの凹性と安藤の凸性における等号成立の同値条件が得られる。同様の等号成立条件は、単調計量や$\chi^2$-ダイバージェンスに対しても議論する。さらに,これらの量子情報量に対するある種の線形保存問題についても考察する。 We revisit and improve joint concavity/convexity and monotonicity properties of quasi-entropies due to Petz in a new fashion. Then we characterize equality cases in the monotonicity inequalities (the data-processing inequalities) of quasi-entropies in several ways as follows: Let $\Phi:\mathcal{B}(\mathcal{H})\to\mathcal{B}(\mathcal{K})$ be a trace-preserving map such that $\Phi^*$ is a Schwarz map. When $f$ is an operator monotone or operator convex function on $[0,\infty)$, we present several equivalent conditions for the equality $S_f^K(\Phi(\rho)\|\Phi(\sigma))=S_f^{\Phi^*(K)}(\rho\|\sigma)$ to hold for given positive operators $\rho,\sigma$ on $\mathcal{H}$ and $K\in\mathcal{B}(\mathcal{K})$. The conditions include equality cases in the monotonicity versions of Lieb's concavity and Ando's convexity theorems. Specializing the map $\Phi$ we have equivalent conditions for equality cases in Lieb's concavity and Ando's convexity. Similar equality conditions are discussed also for monotone metrics and $\chi^2$-divergences. We further consider some types of linear preserver problems for those quantum information quantities.	翻訳日:2023-07-06 22:37:45 公開日:2023-07-04
# MedGen3D: ペアド3D画像とマスク生成のための深層生成フレームワーク MedGen3D: A Deep Generative Framework for Paired 3D Image and Mask Generation ( http://arxiv.org/abs/2304.04106v2 ) ライセンス: Link先を確認	Kun Han, Yifeng Xiong, Chenyu You, Pooya Khosravi, Shanlin Sun, Xiangyi Yan, James Duncan, Xiaohui Xie	(参考訳) 十分なラベル付きデータの取得と注釈付けは、正確で堅牢な学習ベースモデルの開発には不可欠であるが、そのようなデータを取得することは、多くの医療画像分割タスクにおいて困難である。有望な解決策の1つは、接地マスクアノテーションで現実的なデータを合成することである。しかし、マスクを用いた完全な3次元ボリューム画像の生成について、先行研究は行われていない。本稿では,3次元医用画像とマスクをペアで生成する深層生成フレームワークであるmedgen3dについて述べる。まず,3次元医用データを2次元配列として表現し,解剖学的形状に付着したマルチラベルマスク列を生成するためのマルチコンディション拡散確率モデル(MC-DPM)を提案する。次に,生成マスク列に条件付き画像系列生成器とセマンティック拡散精製器を用いて,生成マスクと整合したリアルな3次元医用画像を生成する。提案フレームワークは,合成画像とセグメンテーションマップの正確なアライメントを保証する。 3次元胸部ctと脳mriのデータセットを用いた実験では, 合成データはオリジナルデータに対して多様で忠実であり, 下流分節作業の利点を示す。我々は,MedGen3Dが組み合わせた3次元医用画像とマスクを合成する能力は,医用画像処理タスクのためのディープラーニングモデルのトレーニングに有用であることが期待できる。 Acquiring and annotating sufficient labeled data is crucial in developing accurate and robust learning-based models, but obtaining such data can be challenging in many medical image segmentation tasks. One promising solution is to synthesize realistic data with ground-truth mask annotations. However, no prior studies have explored generating complete 3D volumetric images with masks. In this paper, we present MedGen3D, a deep generative framework that can generate paired 3D medical images and masks. First, we represent the 3D medical data as 2D sequences and propose the Multi-Condition Diffusion Probabilistic Model (MC-DPM) to generate multi-label mask sequences adhering to anatomical geometry. Then, we use an image sequence generator and semantic diffusion refiner conditioned on the generated mask sequences to produce realistic 3D medical images that align with the generated masks. Our proposed framework guarantees accurate alignment between synthetic images and segmentation maps. 
Experiments on 3D thoracic CT and brain MRI datasets show that our synthetic data is both diverse and faithful to the original data, and demonstrate the benefits for downstream segmentation tasks. We anticipate that MedGen3D's ability to synthesize paired 3D medical images and masks will prove valuable in training deep learning models for medical imaging tasks.	翻訳日:2023-07-06 22:37:07 公開日:2023-07-04
# 大腸組織分類のためのクロスモーダル・マイノショット画像生成 Cross-modulated Few-shot Image Generation for Colorectal Tissue Classification ( http://arxiv.org/abs/2304.01992v2 ) ライセンス: Link先を確認	Amandeep Kumar, Ankan kumar Bhunia, Sanath Narayan, Hisham Cholakkal, Rao Muhammad Anwer, Jorma Laaksonen and Fahad Shahbaz Khan	(参考訳) 本研究では,まれな癌組織に対する病理組織学的トレーニングデータの不足に対処する,数発の大腸組織画像生成法を提案する。 XM-GANと名づけられた少数ショット生成法は,1塩基と1対の参照組織像を入力とし,高品質で多様な画像を生成する。 xm-gan内の新しい制御可能な核融合ブロックは、基準画像と類似性に基づいて参照画像の局所領域を密に集約し、局所的に一貫した特徴をもたらす。私たちの知る限りでは,大腸組織画像におけるマイトショット生成を初めて調査した。大腸組織画像の創出は, 広範囲な質的, 定量的, 主観的評価(病理医)を用いて行った。特に専門医による評価では、xm-ganが生成した組織画像と実際の画像とを55%しか区別できない。さらに,これらの生成画像をデータ拡張として利用して,数発の組織画像分類課題に対処し,バニラ数発の分類器よりも平均精度が4.4%向上した。コード: \url{https://github.com/VIROBO-15/XM-GAN} In this work, we propose a few-shot colorectal tissue image generation method for addressing the scarcity of histopathological training data for rare cancer tissues. Our few-shot generation method, named XM-GAN, takes one base and a pair of reference tissue images as input and generates high-quality yet diverse images. Within our XM-GAN, a novel controllable fusion block densely aggregates local regions of reference images based on their similarity to those in the base image, resulting in locally consistent features. To the best of our knowledge, we are the first to investigate few-shot generation in colorectal tissue images. We evaluate our few-shot colorectal tissue image generation by performing extensive qualitative, quantitative and specialist (pathologist) evaluations. Specifically, in the specialist evaluation, pathologists could differentiate between our XM-GAN generated tissue images and real images only 55% of the time. Moreover, we utilize these generated images as data augmentation to address the few-shot tissue image classification task, achieving a gain of 4.4% in terms of mean accuracy over the vanilla few-shot classifier. Code: \url{https://github.com/VIROBO-15/XM-GAN}	翻訳日:2023-07-06 22:36:29 公開日:2023-07-04
# 超周期的な測定システムと文脈のパターン Hypercyclic systems of measurements and patterns of contextuality ( http://arxiv.org/abs/2304.01155v2 ) ライセンス: Link先を確認	Victor H. Cervantes and Ehtibar N. Dzhafarov	(参考訳) 文脈性に関するいくつかの原理的な尺度は、外乱のない測定系と外乱を伴う測定系の両方について文献に提案されている。測定システムが変化するにつれて、どちらかが変化し、もう一方が一定のままである。これは文脈性の異なる側面を測定することを意味しており、ある特定の意味での文脈性の尺度を1つだけ選ぶのではなく、それら全てを使って文脈性のパターンによって文脈システムを特徴付けることができると提案した。しかし、文脈性のパターンを研究するには、その便利なパラメトリゼーションを必要とする様々な測定システムの体系的な方法が必要である。我々は、量子力学の基礎において主要な役割を担った環状系のクラス内の便利なパラメトリゼーションを持つ。しかし、このクラスでは文脈性のすべての尺度が互いに比例していることが示されているため、文脈性のパターンを研究するのに使用できない。本稿では,超循環計測系について述べる。便利なパラメトリゼーションを保ちながら循環系を一般化する。このクラスのシステムでは、大規模システムと同様、文脈性(contextuality)の既知の測度のうち2つが互いに関数であることを示す。つまり、ハイパーサイクリックシステムは文脈性のパターンを研究するのに使うことができる。 Several principled measures of contextuality have been proposed in the literature, both for systems of measurements without and with disturbance. We have previously shown that no two of them are functions of each other: as systems of measurements change, either of them can change while the other remains constant. This means that they measure different aspects of contextuality, and we proposed that rather than picking just one measure of contextuality in one specific sense, one could use all of them to characterize a contextual system by its pattern of contextuality. To study patterns of contextuality, however, one needs a systematic way of varying systems of measurements, which requires their convenient parametrization. We have convenient parametrization within the class of cyclic systems that have played a dominant role in the foundations of quantum mechanics. However, they cannot be used to study patterns of contextuality, because within this class all measures of contextuality have been shown to be proportional to each other. In this concept paper we introduce hypercyclic systems of measurements. They generalize cyclic systems while preserving convenient parametrization. 
We show that within this class of systems, the same as for systems at large, no two of the known measures of contextuality are functions of each other. This means that hypercyclic systems can be used to study patterns of contextuality.	翻訳日:2023-07-06 22:36:08 公開日:2023-07-04
# ドローン画像におけるゼブラの合成データに基づく検出 Synthetic Data-based Detection of Zebras in Drone Imagery ( http://arxiv.org/abs/2305.00432v2 ) ライセンス: Link先を確認	Elia Bonetto and Aamir Ahmad	(参考訳) 現在、一般的な物体検出器や人体検出器の訓練を可能にするデータセットが広く利用可能である。これらはラベル付き実世界のイメージの形で提供され、ラベルの欠如やVICONシステムのような非常に制約のあるシナリオのような高いエラーの確率で、かなりの量の人的努力を必要とする。一方、空の景色や野生のシマウマのような動物、人間の形のような難易度の高い情報など、一般的なシナリオはほとんど得られない。これを解決するために、リアルなレンダリング技術を用いた合成データ生成が最近注目を集め、ターゲット追跡や人間のポーズ推定といった先進的な研究分野が進められている。しかし、野生動物のような対象は通常そのようなデータセットではよく表現されない。本研究は,まず,事前学習したYOLO検出器が,空中から記録した実画像中のゼブラを識別できないことを示す。そこで本研究では,合成データのみを用いて動物検出器を訓練する手法を提案する。まず、データ生成のための最先端フレームワークであるGRADEを用いて、新しい合成ゼブラデータセットを生成する。データセットには、RGB、深さ、骨格関節位置、ポーズ、形状、各被験者のインスタンスセグメンテーションが含まれる。これを使って、YOLO検出器をゼロからトレーニングします。実世界のデータを用いたモデルの評価を通して一インターネットで利用可能な限られたデータセット及び二訓練中に合成データのみを用いて、新たに収集し、手作業でラベルづけしたゼブラを検出できることを示す。コード、結果、トレーニングされたモデル、および生成されたデータおよびトレーニングデータは、https://eliabntt.github.io/grade-rr.でオープンソースとして提供される。 Nowadays, there is a wide availability of datasets that enable the training of common object detectors or human detectors. These come in the form of labelled real-world images and require either a significant amount of human effort, with a high probability of errors such as missing labels, or very constrained scenarios, e.g. VICON systems. On the other hand, uncommon scenarios, like aerial views, animals, like wild zebras, or difficult-to-obtain information, such as human shapes, are hardly available. To overcome this, synthetic data generation with realistic rendering technologies has recently gained traction and advanced research areas such as target tracking and human pose estimation. However, subjects such as wild animals are still usually not well represented in such datasets. In this work, we first show that a pre-trained YOLO detector can not identify zebras in real images recorded from aerial viewpoints. To solve this, we present an approach for training an animal detector using only synthetic data. 
We start by generating a novel synthetic zebra dataset using GRADE, a state-of-the-art framework for data generation. The dataset includes RGB, depth, skeletal joint locations, pose, shape and instance segmentations for each subject. We use this to train a YOLO detector from scratch. Through extensive evaluations of our model with real-world data from i) limited datasets available on the internet and ii) a new one collected and manually labelled by us, we show that we can detect zebras by using only synthetic data during training. The code, results, trained models, and both the generated and training data are provided as open-source at https://eliabntt.github.io/grade-rr.	翻訳日:2023-07-06 22:27:14 公開日:2023-07-04
# メカニスティック・インタプリタビリティのための自動回路発見に向けて Towards Automated Circuit Discovery for Mechanistic Interpretability ( http://arxiv.org/abs/2304.14997v2 ) ライセンス: Link先を確認	Arthur Conmy, Augustine N. Mavor-Parker, Aengus Lynch, Stefan Heimersheim, Adrià Garriga-Alonso	(参考訳) かなりの努力と直感を通じて、近年のいくつかの研究は、トランスフォーマーモデルの非自明な振る舞いをリバースエンジニアリングした。本論文は, 機械的な解釈過程を体系化する。まず、研究者は望ましいモデル行動を引き起こすメトリクスとデータセットを選択する。次に、アクティベーションパッチを適用して、どの抽象ニューラルネットワークユニットが動作に関与しているかを見つける。調査中のデータセット、メトリック、ユニットを変えることで、研究者は各コンポーネントの機能を理解することができる。プロセスのステップの1つを自動化し、モデルの計算グラフで指定された動作を実装する回路を識別する。我々は,いくつかのアルゴリズムを提案し,それを検証するために先行する解釈可能性結果を再現する。例えば、ACDCアルゴリズムは、GPT-2 Smallの回路で5/5のコンポーネントタイプを再発見し、グレーター・タン演算を計算した。 ACDCはGPT-2 Smallで32,000のエッジのうち68を選定した。私たちのコードはhttps://github.com/ArthurConmy/Automatic-Circuit-Discoveryで公開されています。 Through considerable effort and intuition, several recent works have reverse-engineered nontrivial behaviors of transformer models. This paper systematizes the mechanistic interpretability process they followed. First, researchers choose a metric and dataset that elicit the desired model behavior. Then, they apply activation patching to find which abstract neural network units are involved in the behavior. By varying the dataset, metric, and units under investigation, researchers can understand the functionality of each component. We automate one step of this process: identifying the circuit that implements the specified behavior in the model's computational graph. We propose several algorithms and reproduce previous interpretability results to validate them. For example, the ACDC algorithm rediscovered 5/5 of the component types in a circuit in GPT-2 Small that computes the Greater-Than operation. ACDC selected 68 of the 32,000 edges in GPT-2 Small, all of which were manually found by previous work. Our code is available at https://github.com/ArthurConmy/Automatic-Circuit-Discovery.	翻訳日:2023-07-06 22:26:29 公開日:2023-07-04
# 駆動型量子対称単純排他過程における厳密な絡み合い Exact Entanglement in the Driven Quantum Symmetric Simple Exclusion Process ( http://arxiv.org/abs/2304.10988v3 ) ライセンス: Link先を確認	Denis Bernard and Ludwig Hruza	(参考訳) 駆動量子系の絡み合い特性は、長距離コヒーレンスによる平衡状態とは異なる可能性がある。我々はこの観察をメソスコピック輸送に適したトイモデルである open quantum symmetric simple exclusion process (qssep) を用いて確認する。異なるサブシステム間の相互情報の正確な公式を導出し、体積法則を満たすことを示す。驚いたことに、QSSEPの絡み合い特性はその輸送特性に関するデータにのみ依存しており、そのような関係はより一般的なメソスコピックシステムに当てはまるかもしれない。 QSSEPのフリー確率構造をエクスプロイトし、これらの結果を得るため、ランダム行列の理論に潜在的に適用可能な数学的結果である、いわゆる局所的自由累積からランダム行列のサブブロックの固有値スペクトルを決定する新しい方法を開発した。この方法の例示として,局所自由累積から固有状態熱化仮説 (eth) を満たす系における可観測性の期待値を計算する方法を示す。 Entanglement properties of driven quantum systems can potentially differ from the equilibrium situation due to long range coherences. We confirm this observation by studying a suitable toy model for mesoscopic transport: the open quantum symmetric simple exclusion process (QSSEP). We derive exact formulae for its mutual information between different subsystems and show that it satisfies a volume law. Surprisingly, the QSSEP entanglement properties only depend on data related to its transport properties and we suspect that such a relation might hold for more general mesoscopic systems. Exploiting the free probability structure of QSSEP, we obtain these results by developing a new method to determine the eigenvalue spectrum of sub-blocks of random matrices from their so-called local free cumulants -- a mathematical result on its own with potential applications in the theory of random matrices. As an illustration of this method, we show how to compute expectation values of observables in systems satisfying the Eigenstate Thermalization Hypothesis (ETH) from the local free cumulants.	翻訳日:2023-07-06 22:25:39 公開日:2023-07-04
# 変圧器入門 An Introduction to Transformers ( http://arxiv.org/abs/2304.10557v3 ) ライセンス: Link先を確認	Richard E. Turner	(参考訳) トランスはニューラルネットワークコンポーネントであり、シーケンスやデータポイントの集合の有用な表現を学ぶのに使用できる。この変換器は、自然言語処理、コンピュータビジョン、時空間モデリングの最近の進歩を推し進めている。トランスフォーマーの紹介は数多く存在するが、ほとんどはアーキテクチャの正確な数学的記述を含んでおらず、設計の選択の背後にある直観も欠落している。さらに、研究が曲がりくねった経路を辿ると、変圧器の部品の説明は慣用的にできる。本論では, 数学的に正確で直感的で, クリーンなトランスフォーマアーキテクチャ記述を目指している。 The transformer is a neural network component that can be used to learn useful representations of sequences or sets of datapoints. The transformer has driven recent advances in natural language processing, computer vision, and spatio-temporal modelling. There are many introductions to transformers, but most do not contain precise mathematical descriptions of the architecture and the intuitions behind the design choices are often also missing. Moreover, as research takes a winding path, the explanations for the components of the transformer can be idiosyncratic. In this note we aim for a mathematically precise, intuitive, and clean description of the transformer architecture.	翻訳日:2023-07-06 22:25:20 公開日:2023-07-04
# アノテーションフリーな視聴覚セグメンテーション Annotation-free Audio-Visual Segmentation ( http://arxiv.org/abs/2305.11019v3 ) ライセンス: Link先を確認	Jinxiang Liu, Yu Wang, Chen Ju, Chaofan Ma, Ya Zhang, Weidi Xie	(参考訳) audio-visual segmentation(avs)の目的は、ピクセル単位でのセグメンテーションマスクを正確に予測することで、視覚シーン内の音響オブジェクトをローカライズすることである。タスクに取り組むには、データとモデルの両方の側面を包括的に考慮する必要がある。本稿ではまず,人間のアノテーションを使わずにAVSタスクのための人工データを生成する新しいパイプラインを開始する。既存の画像セグメンテーションとオーディオデータセットを利用して、画像とマスクのペアと対応するオーディオサンプルとをカテゴリラベルを介してマッチングし、AVSモデルをトレーニングするための(画像、オーディオ、マスク)トリプルを容易に組み立てることができます。パイプラインは多くのカテゴリをカバーするために、アノテーションフリーでスケーラブルです。さらに,事前学習済みのSegment Anything Model(SAM)をAVSタスクに適応させる軽量なアプローチSAMA-AVSを提案する。アダプタを用いた少数のトレーニング可能なパラメータを導入することで,ほとんどのパラメータを固定した符号化段階において,適切な音声と視覚の融合と相互作用を効果的に実現できる。実験の結果,提案手法が他の競合手法をはるかに上回る結果が得られた。さらに,本合成データを用いて事前学習したモデルを用いて,実avsbenchデータの性能をさらに向上させ,s4サブセットでは83.17miou,ms3セットでは66.95miouを達成した。 The objective of Audio-Visual Segmentation (AVS) is to localise the sounding objects within visual scenes by accurately predicting pixel-wise segmentation masks. Tackling the task requires careful consideration of both the data and the model. First, we introduce a novel pipeline for generating artificial data for the AVS task without human annotation. We leverage existing image segmentation and audio datasets, matching image-mask pairs with corresponding audio samples through their shared category labels, which allows us to effortlessly compose (image, audio, mask) triplets for training AVS models. The pipeline is annotation-free and scalable to cover a large number of categories. Additionally, we introduce a lightweight approach, SAMA-AVS, to adapt the pre-trained Segment Anything Model (SAM) to the AVS task. By introducing only a small number of trainable parameters with adapters, the proposed model can effectively achieve adequate audio-visual fusion and interaction in the encoding stage with the vast majority of parameters fixed. 
We conduct extensive experiments, and the results show that our proposed model substantially outperforms competing methods. Moreover, using the proposed model pretrained with our synthetic data further improves performance on real AVSBench data, achieving 83.17 mIoU on the S4 subset and 66.95 mIoU on the MS3 set.	翻訳日:2023-07-06 22:19:17 公開日:2023-07-04
# スピンバスと相互作用するシステムのためのラマン断熱経路 Stimulated Raman Adiabatic Passage for a system interacting with a spin-bath ( http://arxiv.org/abs/2305.08209v2 ) ライセンス: Link先を確認	Benedetto Militello and Anna Napoli	(参考訳) このような技術によって操作される物理系がスピン浴と相互作用する場合に、刺激ラマン断熱路を解析する。人口移動過程の効率は, 環境との弱い強い結合や不協和など, いくつかの制度において理論的, 数値的手法を用いて検討した。一般化された量子ゼノ効果の発生は、強い減衰状態における効率の低下を説明する。 Stimulated Raman Adiabatic Passage is analyzed in the case where the physical system manipulated by such technique is interacting with a spin bath. The efficiency of the population transfer process is investigated both theoretically and via numerical tools in several regimes, including the weak and strong coupling with the environment and the off-resonance. The occurrence of a generalized quantum Zeno effect explains the lowering of the efficiency in the strong damping regime.	翻訳日:2023-07-06 22:17:53 公開日:2023-07-04
# 人間と機械のスケーラブル符号化における条件と残留法 Conditional and Residual Methods in Scalable Coding for Humans and Machines ( http://arxiv.org/abs/2305.02562v2 ) ライセンス: Link先を確認	Anderson de Andrade, Alon Harell, Yalda Foroutan, Ivan V. Baji\'c	(参考訳) 本稿では,人間および機械のスケーラブルコーディングの文脈において,条件付きおよび残差符号化の手法を提案する。我々は,コンピュータビジョンタスクで利用可能な情報を用いて,再建作業の速度歪み性能を最適化することに注力する。ベースラインを提供するための両手法の情報分析を含むとともに,モデリング能力の向上と従来と類似したトラクタビリティを備えた条件付き符号化に適したエントロピーモデルを提案する。これらの手法を画像再構成に適用し、cityscapesデータセット上のセマンティックセグメンテーション用に作成された表現と、cocoデータセット上のオブジェクト検出のために作成された表現を用いている。両実験とも条件付き法と残留法で同様の性能を示し,その結果の速度歪み曲線はベースラインに含まれる。 We present methods for conditional and residual coding in the context of scalable coding for humans and machines. Our focus is on optimizing the rate-distortion performance of the reconstruction task using the information available in the computer vision task. We include an information analysis of both approaches to provide baselines and also propose an entropy model suitable for conditional coding with increased modelling capacity and similar tractability as previous work. We apply these methods to image reconstruction, using, in one instance, representations created for semantic segmentation on the Cityscapes dataset, and in another instance, representations created for object detection on the COCO dataset. In both experiments, we obtain similar performance between the conditional and residual methods, with the resulting rate-distortion curves contained within our baselines.	翻訳日:2023-07-06 22:17:13 公開日:2023-07-04
# 品質多様性アルゴリズムの実行時解析 Runtime Analysis of Quality Diversity Algorithms ( http://arxiv.org/abs/2305.18966v2 ) ライセンス: Link先を確認	Jakob Bossek, Dirk Sudholt	(参考訳) 品質多様性(QD)は進化的計算の分野であり、近年関心が高まりつつある。 map-elites qdアプローチは、特徴空間、すなわち探索空間の分割を定義し、この空間の各セルに対して最適な解を格納する。我々は,「number of ones」特徴空間上の疑似boolean 最適化の文脈において,単純な qd アルゴリズムについて検討する。この特徴空間では,$i$ 番目のセルが,1 の個数が $[(i-1)k, ik-1]$ の範囲にある解のうち最良のものを格納する。ここで$k$は粒度パラメータ $1 \leq k \leq n+1$ である。我々は、全てのセルが任意のフィットネス関数に被覆されるまでの期待時間に厳密な拘束を与え、すべての$k$に対して OneMax 上の QD の期待最適化時間と、特徴空間に好適に整合する他の問題を分析する。組合せ問題では、QD は単調部分モジュラ函数を 1 つの一様濃度制約で効率的に最大化するときに$(1-1/e)$-近似を求める。連結グラフの連結成分の個数として特徴空間を定義すると、QDが期待される多項式時間で最小のスパンニングツリーを見つけることを示す。 Quality diversity (QD) is a branch of evolutionary computation that has gained increasing interest in recent years. The Map-Elites QD approach defines a feature space, i.e., a partition of the search space, and stores the best solution for each cell of this space. We study a simple QD algorithm in the context of pseudo-Boolean optimisation on the "number of ones" feature space, where the $i$th cell stores the best solution amongst those with a number of ones in $[(i-1)k, ik-1]$. Here $k$ is a granularity parameter $1 \leq k \leq n+1$. We give a tight bound on the expected time until all cells are covered for arbitrary fitness functions and for all $k$ and analyse the expected optimisation time of QD on OneMax and other problems whose structure aligns favourably with the feature space. On combinatorial problems we show that QD efficiently finds a $(1-1/e)$-approximation when maximising any monotone sub-modular function under a single uniform cardinality constraint. Defining the feature space as the number of connected components of a connected graph, we show that QD finds a minimum spanning tree in expected polynomial time.	翻訳日:2023-07-06 20:35:24 公開日:2023-07-04
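The Map-Elites scheme summarised in the entry above can be illustrated with a minimal sketch on OneMax. The abstract does not spell out the algorithm's details, so the uniform parent choice, the bit-flip rate of 1/n, the acceptance of ties, and the names `map_elites` and `cell_index` are all illustrative assumptions rather than the authors' exact procedure.

```python
import random


def one_max(x):
    """Fitness: the number of one-bits in the bit string."""
    return sum(x)


def cell_index(x, k):
    """1-based cell i such that the number of ones lies in [(i-1)k, ik-1]."""
    return sum(x) // k + 1


def map_elites(n, k, fitness, max_iters=20000, seed=0):
    """Assumed simple QD loop: keep the best solution seen in each cell."""
    rng = random.Random(seed)
    start = [rng.randint(0, 1) for _ in range(n)]
    archive = {cell_index(start, k): start}
    for _ in range(max_iters):
        # Assumption: the parent is drawn uniformly from the occupied cells.
        parent = archive[rng.choice(list(archive))]
        # Assumption: standard bit mutation, each bit flips with probability 1/n.
        child = [bit ^ (rng.random() < 1.0 / n) for bit in parent]
        i = cell_index(child, k)
        # Store the child if its cell is empty or it is at least as fit.
        if i not in archive or fitness(child) >= fitness(archive[i]):
            archive[i] = child
    return archive


if __name__ == "__main__":
    result = map_elites(n=10, k=2, fitness=one_max)
    for i in sorted(result):
        print(i, result[i], one_max(result[i]))
```

With n bits and granularity k there are floor(n/k) + 1 cells, so k = 1 gives one cell per possible number of ones and k = n + 1 collapses the archive to a single cell.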
# depf:赤外線および可視画像の分解プールに基づく新しい核融合法 DePF: A Novel Fusion Approach based on Decomposition Pooling for Infrared and Visible Images ( http://arxiv.org/abs/2305.17376v2 ) ライセンス: Link先を確認	Hui Li, Yongbiao Xiao, Chunyang Cheng, Zhongwei Shen, Xiaoning Song	(参考訳) 赤外線および可視画像融合は、下流タスクの促進に使用できる、優れた特徴と豊富なテクスチャ詳細を含む合成画像を同時に生成することを目的としている。しかし, 既存の核融合法は, テクスチャロスやエッジ情報不足の問題に悩まされており, 結果として準最適核融合が生じる。一方、ストレートフォワードアップサンプリングオペレータは、マルチスケールの特徴からソース情報を十分に保存できない。これらの問題に対処するために,分解プール法(デプール法)に基づく新しい融合ネットワークを提案し,これをDePFと呼ぶ。具体的には、デプールベースのエンコーダを用いて、複数スケールの画像とソース画像の詳細な特徴を同時に抽出する。さらに,空間的注意モデルを用いて,これらの特徴を集約する。その後、融合した機能はデコーダによって再構成され、アップサンプリング演算子はデプール反転操作に置き換えられる。一般的な最大プーリング技術とは異なり、デプール層後の画像特徴は豊富な詳細情報を保持でき、融合プロセスに有利である。この場合、リコンストラクション段階では、リッチテクスチャ情報とマルチスケール情報が維持される。実験の結果,本手法は複数の画像融合ベンチマークにおいて最先端技術よりも高い融合性能を示すことがわかった。 Infrared and visible image fusion aims to generate synthetic images simultaneously containing salient features and rich texture details, which can be used to boost downstream tasks. However, existing fusion methods suffer from texture loss and missing edge information, which lead to suboptimal fusion results. Meanwhile, the straightforward up-sampling operator cannot adequately preserve the source information from multi-scale features. To address these issues, a novel fusion network based on decomposition pooling (de-pooling), termed DePF, is proposed. Specifically, a de-pooling based encoder is designed to extract multi-scale image and detail features of source images at the same time. In addition, the spatial attention model is used to aggregate these salient features. After that, the fused features are reconstructed by the decoder, in which the up-sampling operator is replaced by the reversed de-pooling operation. Unlike the common max-pooling technique, image features after the de-pooling layer retain abundant detail information, which benefits the fusion process. 
In this case, rich texture information and multi-scale information are maintained during the reconstruction phase. The experimental results demonstrate that the proposed method exhibits superior fusion performance over the state of the art on multiple image fusion benchmarks.	翻訳日:2023-07-06 20:34:36 公開日:2023-07-04
# 面ベース検索による検索言語モデルの難易度低減 Surface-Based Retrieval Reduces Perplexity of Retrieval-Augmented Language Models ( http://arxiv.org/abs/2305.16243v3 ) ライセンス: Link先を確認	Ehsan Doostmohammadi, Tobias Norlund, Marco Kuhlmann, Richard Johansson	(参考訳) 検索機構による言語モデルの強化は,パラメータ数を低く保ちながら,性能を著しく向上させることが示されている。検索型モデルは通常、クエリチャンクの密表現と潜在的な隣人の類似性に基づく意味的検索機構に依存する。本稿では,現状のRetroモデルについて検討し,トークン重複などの表面レベルの類似性により,その性能向上がよりよく説明できることを示した。これに触発されて,レトロのセマンティック検索をbm25に基づく表面レベル手法に置き換え,パープレキシティの大幅な低減を図る。 BM25の完全検索は大規模データセットに対して計算コストがかかるため,計算オーバーヘッドを最小に抑えることで,再分類シナリオにも適用することができる。 Augmenting language models with a retrieval mechanism has been shown to significantly improve their performance while keeping the number of parameters low. Retrieval-augmented models commonly rely on a semantic retrieval mechanism based on the similarity between dense representations of the query chunk and potential neighbors. In this paper, we study the state-of-the-art Retro model and observe that its performance gain is better explained by surface-level similarities, such as token overlap. Inspired by this, we replace the semantic retrieval in Retro with a surface-level method based on BM25, obtaining a significant reduction in perplexity. As full BM25 retrieval can be computationally costly for large datasets, we also apply it in a re-ranking scenario, gaining part of the perplexity reduction with minimal computational overhead.	翻訳日:2023-07-06 20:33:35 公開日:2023-07-04
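The re-ranking scenario mentioned in the entry above can be sketched as follows: a first-stage retriever returns candidate chunks, and BM25 reorders them by surface-level token overlap with the query. This is a hedged illustration, not the paper's code: the function names `bm25_scores` and `rerank` are hypothetical, the smoothed IDF is one common variant of the formula, and `k1=1.2`, `b=0.75` are conventional defaults rather than values taken from the paper.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the tokenized query with BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each token appears.
    doc_freq = Counter(tok for d in docs for tok in set(d))
    scores = []
    for d in docs:
        term_freq = Counter(d)
        score = 0.0
        for tok in set(query):
            if tok not in term_freq:
                continue
            # Smoothed, always-positive IDF (an assumed variant).
            idf = math.log(1 + (n_docs - doc_freq[tok] + 0.5) / (doc_freq[tok] + 0.5))
            tf = term_freq[tok]
            length_norm = 1 - b + b * len(d) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


def rerank(query, candidates):
    """Return candidate indices ordered from highest to lowest BM25 score."""
    scores = bm25_scores(query, candidates)
    return sorted(range(len(candidates)), key=lambda i: -scores[i])


if __name__ == "__main__":
    query = "the cat sat".split()
    candidates = [
        "a cat sat on the mat".split(),
        "stock markets fell sharply today".split(),
        "the dog barked".split(),
    ]
    print(rerank(query, candidates))
```

Because only the small candidate set is scored, the cost of this step grows with the number of candidates rather than with the size of the retrieval corpus, which is the reason re-ranking is cheaper than full BM25 retrieval.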
# 痕跡のない消滅: ローレンツ量子現実問題に対するケントの解における時間の矢印 Disappearing Without a Trace: The Arrows of Time in Kent's Solution to the Lorentzian Quantum Reality Problem ( http://arxiv.org/abs/2305.13201v2 ) ライセンス: Link先を確認	Emily Adlam	(参考訳) 私たちの周りで見られる時間的非対称性を説明する既存の提案のほとんどは、時間発展に基づく物理学のアプローチの中に置かれており、そのため通常、非対称性は特別な初期状態の形で時間開始時に置かれる。しかし、時間進化パラダイムを前提としない場合、時間的非対称性を説明する他の可能性もあります。本稿では、ケントの量子力学の「最終測度」解釈に基づいて、そのような可能性を探る。このアプローチには、電磁的非対称性、熱力学的非対称性、粗い非対称性、フォーク非対称性、記録的非対称性、宇宙的非対称性を説明するためのリソースがある可能性があり、それがもたらす説明は特別な初期状態に訴える説明よりも優れているかもしれない。我々の希望は、この例が時間進化パラダイム以外の時間的非対称性に対する新しいアプローチをさらに探求することである。 Most existing proposals to explain the temporal asymmetries we see around us are sited within an approach to physics based on time evolution, and thus they typically put the asymmetry in at the beginning of time in the form of a special initial state. But there may be other possibilities for explaining temporal asymmetries if we don't presuppose the time evolution paradigm. In this article, we explore one such possibility, based on Kent's `final-measurement' interpretation of quantum mechanics. We argue that this approach potentially has the resources to explain the electromagnetic asymmetry, the thermodynamic asymmetry, the coarse-graining asymmetry, the fork asymmetry, the record asymmetry, and the cosmological asymmetry, and that the explanations it offers may potentially be better than explanations appealing to a special initial state. Our hope is that this example will encourage further exploration of novel approaches to temporal asymmetry outside of the time evolution paradigm.	翻訳日:2023-07-06 20:32:47 公開日:2023-07-04
# 量子ドット族における幾何学的効果 Geometry effects in quantum dot families ( http://arxiv.org/abs/2305.12748v2 ) ライセンス: Link先を確認	Pavel Exner	(参考訳) $L^2(\mathrm{R}^\nu),\, \nu=2,3$ 上のシュレーディンガー作用素で、相互作用がポテンシャル井戸の配列の形をとり、各井戸が回転対称性を持ち、曲線 $\Gamma$ に沿って配置されているものを考える。我々は、$\Gamma$ がコンパクト集合の外ではまっすぐな、直線の曲げあるいは変形であり、井戸が等しい弧長間隔で並ぶならば、そのような作用素が空でない離散スペクトルを持つことを証明する。また、$\Gamma$ が円であれば、主固有値は井戸が同じ角距離を持つ配置によって最大化される。いくつかの予想や未解決の問題も言及されている。 We consider Schrödinger operators in $L^2(\mathrm{R}^\nu),\, \nu=2,3$, with the interaction in the form of an array of potential wells, each of them having rotational symmetry, arranged along a curve $\Gamma$. We prove that if $\Gamma$ is a bend or deformation of a line, being straight outside a compact set, and the wells have the same arcwise distances, such an operator has a nonempty discrete spectrum. It is also shown that if $\Gamma$ is a circle, the principal eigenvalue is maximized by the arrangement in which the wells have the same angular distances. Some conjectures and open problems are also mentioned.	翻訳日:2023-07-06 20:32:29 公開日:2023-07-04
# 文脈的フレーズ予測ネットワークを用いた文脈的エンドツーエンド音声認識 Contextualized End-to-End Speech Recognition with Contextual Phrase Prediction Network ( http://arxiv.org/abs/2305.12493v4 ) ライセンス: Link先を確認	Kaixun Huang, Ao Zhang, Zhanheng Yang, Pengcheng Guo, Bingshen Mu, Tianyi Xu, Lei Xie	(参考訳) 近年,音声認識技術において文脈情報が重要な役割を担い,エンドツーエンド音声認識モデルに組み込むことが注目されている。しかし、従来のディープバイアス法はバイアスタスクの明示的な監督を欠いていた。本研究では,注意に基づくディープバイアス手法のための文脈句予測ネットワークを提案する。このネットワークは文脈埋め込みを用いて発話中の文脈句を予測し、バイアス損失を計算して文脈モデルのトレーニングを支援する。提案手法は,様々なエンドツーエンド音声認識モデルにおいて,単語誤り率 (WER) の低減を実現した。 librispeechコーパスの実験では,提案モデルがベースラインモデルよりも12.1%向上し,文脈句のwerは相対的に40.5%減少することが示された。さらに,コンテキスト句フィルタリング戦略を適用することで,バイアスリストが大きい場合に,war劣化を効果的に排除する。 Contextual information plays a crucial role in speech recognition technologies and incorporating it into the end-to-end speech recognition models has drawn immense interest recently. However, previous deep bias methods lacked explicit supervision for bias tasks. In this study, we introduce a contextual phrase prediction network for an attention-based deep bias method. This network predicts context phrases in utterances using contextual embeddings and calculates bias loss to assist in the training of the contextualized model. Our method achieved a significant word error rate (WER) reduction across various end-to-end speech recognition models. Experiments on the LibriSpeech corpus show that our proposed model obtains a 12.1% relative WER improvement over the baseline model, and the WER of the context phrases decreases relatively by 40.5%. Moreover, by applying a context phrase filtering strategy, we also effectively eliminate the WER degradation when using a larger biasing list.	翻訳日:2023-07-06 20:32:12 公開日:2023-07-04
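The entry above reports its gains as relative word error rate (WER) reductions. As a reminder of how such figures are computed, here is a small sketch that derives WER from a word-level edit distance and then a relative reduction between two systems. The function names are hypothetical and the numbers in the usage lines are toy values chosen for illustration, not results from the paper.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer, new_wer):
    """Fraction by which the new system lowers the baseline WER."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Toy values for illustration only.
    print(word_error_rate("call doctor smith now", "call doctor smyth now"))
    print(relative_reduction(0.20, 0.15))
```

A relative reduction is measured against the baseline's own error rate, so the same absolute change corresponds to a larger relative figure when the baseline WER is already low.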
# ネットワーク側情報を用いた高次元線形回帰におけるベイズ最適学習 Bayes optimal learning in high-dimensional linear regression with network side information ( http://arxiv.org/abs/2306.05679v2 ) ライセンス: Link先を確認	Sagnik Nandy and Subhabrata Sen	(参考訳) ネットワークの形でサイド情報を持つ教師付き学習問題は、ゲノム学、プロテオミクス、神経科学の分野で頻繁に発生する。例えば、遺伝的応用において、ネットワーク側情報は、関連する遺伝子間の複雑な関係に関する背景生物学的情報を正確に捉えることができる。本稿では,ネットワーク側情報を含む高次元線形回帰におけるベイズ最適学習の研究を開始する。この目的のために、まず、教師付きデータと観測されたネットワークの共分散を共通の潜在パラメータ集合を通して仮定する単純な生成モデル(Reg-Graphモデル)を導入する。次に,非常に一般的な条件下で最適である近似メッセージパッシング(amp)に基づく反復アルゴリズムを提案する。さらに、潜時信号と観測したデータとの相互情報の制限を特徴付け、ネットワーク側情報の統計的影響を正確に定量化する。最後に,提案アルゴリズムは有限サンプルにおいて優れた性能を示すことを示す。 Supervised learning problems with side information in the form of a network arise frequently in applications in genomics, proteomics and neuroscience. For example, in genetic applications, the network side information can accurately capture background biological information on the intricate relations among the relevant genes. In this paper, we initiate a study of Bayes optimal learning in high-dimensional linear regression with network side information. To this end, we first introduce a simple generative model (called the Reg-Graph model) which posits a joint distribution for the supervised data and the observed network through a common set of latent parameters. Next, we introduce an iterative algorithm based on Approximate Message Passing (AMP) which is provably Bayes optimal under very general conditions. In addition, we characterize the limiting mutual information between the latent signal and the data observed, and thus precisely quantify the statistical impact of the network side information. Finally, supporting numerical experiments suggest that the introduced algorithm has excellent performance in finite samples.	翻訳日:2023-07-06 20:25:16 公開日:2023-07-04
# 視線を信じないで - 機能の可視化の信頼性について Don't trust your eyes: on the (un)reliability of feature visualizations ( http://arxiv.org/abs/2306.04719v3 ) ライセンス: Link先を確認	Robert Geirhos, Roland S. Zimmermann, Blair Bilodeau, Wieland Brendel, Been Kim	(参考訳) ニューラルネットワークはどのようにピクセルからパターンを抽出するか? 機能の可視化は、最適化によって非常に活性化したパターンを視覚化することで、この重要な質問に答えようとしている。今日、可視化手法は、機械的な解釈可能性の一種として、ニューラルネットワークの内部動作に関する我々の知識の基礎を形成している。機能可視化はどの程度信頼できるのか? 我々は,自然入力上での通常のネットワーク動作から完全に切り離された任意のパターンを示すために,特徴可視化を騙すネットワーク回路の開発に着手する。特徴視覚化は標準入力とは全く異なる処理を受けており、ニューラルネットワークが自然言語をどのように処理するかを「説明」する能力に疑問を呈している。特徴視覚化によって確実に理解できる関数の集合は極めて小さく、一般的なブラックボックスニューラルネットワークを含まないことを証明した理論によるこの経験的発見を裏付ける。そのため、より信頼性の高い特徴視覚化を実現するために、特定の構造を強制するネットワークの開発が期待できる。 How do neural networks extract patterns from pixels? Feature visualizations attempt to answer this important question by visualizing highly activating patterns through optimization. Today, visualization methods form the foundation of our knowledge about the internal workings of neural networks, as a type of mechanistic interpretability. Here we ask: How reliable are feature visualizations? We start our investigation by developing network circuits that trick feature visualizations into showing arbitrary patterns that are completely disconnected from normal network behavior on natural input. We then provide evidence for a similar phenomenon occurring in standard, unmanipulated networks: feature visualizations are processed very differently from standard input, casting doubt on their ability to "explain" how neural networks process natural images. We underpin this empirical finding by theory proving that the set of functions that can be reliably understood by feature visualization is extremely small and does not include general black-box neural networks. Therefore, a promising way forward could be the development of networks that enforce certain structures in order to ensure more reliable feature visualizations.	翻訳日:2023-07-06 20:25:03 公開日:2023-07-04
# 木輪透かし:目に見えず頑丈な拡散画像の指紋 Tree-Ring Watermarks: Fingerprints for Diffusion Images that are Invisible and Robust ( http://arxiv.org/abs/2305.20030v3 ) ライセンス: Link先を確認	Yuxin Wen, John Kirchenbauer, Jonas Geiping, Tom Goldstein	(参考訳) 生成モデルのアウトプットを透かしは、著作権をトレースし、AI生成コンテンツによる潜在的な害を防ぐ重要なテクニックである。本稿では,拡散モデル出力を頑健にフィンガープリントするTree-Ring Watermarkingという新しい手法を提案する。サンプリング後の画像へのポストホックな修正を行う既存の方法とは異なり、Tree-Ring Watermarkingはサンプリングプロセス全体に微妙に影響を与え、人間の目に見えないモデル指紋を生み出す。ウォーターマークは、サンプリングに使用される初期ノイズベクトルにパターンを埋め込む。これらのパターンはよりフーリエ空間に構成され、畳み込み、作物、拡張、反転、回転に不変である。画像生成後、拡散過程を反転してノイズベクトルを検索して透かし信号を検出し、埋め込み信号をチェックする。この手法は,fidの損失を無視できるプラグインとして,テキスト条件付き安定拡散を含む任意の拡散モデルに容易に適用できることを実証する。私たちのウォーターマークはイメージ空間にセマンティックに隠されており、現在デプロイされているウォーターマークよりもずっと堅牢です。コードはhttps://github.com/yuxinwenrick/tree-ring-watermarkで入手できる。 Watermarking the outputs of generative models is a crucial technique for tracing copyright and preventing potential harm from AI-generated content. In this paper, we introduce a novel technique called Tree-Ring Watermarking that robustly fingerprints diffusion model outputs. Unlike existing methods that perform post-hoc modifications to images after sampling, Tree-Ring Watermarking subtly influences the entire sampling process, resulting in a model fingerprint that is invisible to humans. The watermark embeds a pattern into the initial noise vector used for sampling. These patterns are structured in Fourier space so that they are invariant to convolutions, crops, dilations, flips, and rotations. After image generation, the watermark signal is detected by inverting the diffusion process to retrieve the noise vector, which is then checked for the embedded signal. We demonstrate that this technique can be easily applied to arbitrary diffusion models, including text-conditioned Stable Diffusion, as a plug-in with negligible loss in FID. Our watermark is semantically hidden in the image space and is far more robust than watermarking alternatives that are currently deployed. 
Code is available at https://github.com/YuxinWenRick/tree-ring-watermark.	翻訳日:2023-07-06 20:23:47 公開日:2023-07-04
# ニューラルネットワークによる1ビットの通信による絡み合った状態のシミュレーション Neural Network Approach to the Simulation of Entangled States with One Bit of Communication ( http://arxiv.org/abs/2305.19935v3 ) ライセンス: Link先を確認	Peter Sidajaya, Aloysius Dewen Lim, Baichu Yu, Valerio Scarani	(参考訳) ベルの定理は、局所隠れ変数(LHV)は、いくつかの絡み合った量子状態における測定の統計を十分に説明できないと述べている。それらをシミュレートするのに、どの程度追加的な古典的コミュニケーションが必要か尋ねるのは自然です。本研究では,ニューラルネットワークシミュレーションやその他のツールを用いて,この分野における2つの長年のオープン質問について検討する。まず, 部分的絡み合った2量子ビット状態における全ての射影的測定は, 1ビットの通信しか必要としないことを示す。我々は、正確な量子挙動とトレーニングされたネットワークの積、あるいはそれに触発された半解析モデルの間の統計的距離を定量化する。第二に、一ビットの通信が最終的に全ての二部量子相関を再現できないという一般的な根拠(そして明らかな)で知られているが、明示的な例は回避可能である。私たちの検索では、最大5つの入力と4つの出力を持つ2部ベルシナリオの1つを見つけられず、量子相関の再現における1ビットの通信のパワーが強調された。 Bell's theorem states that Local Hidden Variables (LHVs) cannot fully explain the statistics of measurements on some entangled quantum states. It is natural to ask how much supplementary classical communication would be needed to simulate them. We study two long-standing open questions in this field with neural network simulations and other tools. First, we present evidence that all projective measurements on partially entangled pure two-qubit states require only one bit of communication. We quantify the statistical distance between the exact quantum behaviour and the product of the trained network, or of a semianalytical model inspired by it. Second, while it is known on general grounds (and obvious) that one bit of communication cannot eventually reproduce all bipartite quantum correlation, explicit examples have proved evasive. Our search failed to find one for several bipartite Bell scenarios with up to 5 inputs and 4 outputs, highlighting the power of one bit of communication in reproducing quantum correlations.	翻訳日:2023-07-06 20:23:27 公開日:2023-07-04
# スパース不変量としての新しい解釈可能な保存法 Discovering New Interpretable Conservation Laws as Sparse Invariants ( http://arxiv.org/abs/2305.19525v3 ) ライセンス: Link先を確認	Ziming Liu, Patrick Obin Sturm, Saketh Bharadwaj, Sam Silva, Max Tegmark	(参考訳) 与えられた力学系の保存法則を明らかにすることは重要であるが困難である。理論的な設定(微分方程式と基底関数の両方が知られている)では、微分方程式から保存則を自動的に発見するアルゴリズムであるスパース不変検出器(SID)を提案する。そのアルゴリズムの単純さは、発見された保存量の堅牢性と解釈可能性を可能にする。 SIDは, 様々なシステムにおける新しい保全法則を再発見し, 発見することができることを示す。流体力学と大気化学の2つの例において、SIDはそれぞれ14と3の保存量を発見し、それまでドメインの専門家に知られていたのは12と2のみである。 Discovering conservation laws for a given dynamical system is important but challenging. In a theorist setup (differential equations and basis functions are both known), we propose the Sparse Invariant Detector (SID), an algorithm that auto-discovers conservation laws from differential equations. Its algorithmic simplicity allows robustness and interpretability of the discovered conserved quantities. We show that SID is able to rediscover known and even discover new conservation laws in a variety of systems. For two examples in fluid mechanics and atmospheric chemistry, SID discovers 14 and 3 conserved quantities, respectively, where only 12 and 2 were previously known to domain experts.	翻訳日:2023-07-06 20:22:52 公開日:2023-07-04
# 放射線腫瘍学のためのセグメンテーションモデル(SAM) Segment Anything Model (SAM) for Radiation Oncology ( http://arxiv.org/abs/2306.11730v2 ) ライセンス: Link先を確認	Lian Zhang, Zhengliang Liu, Lu Zhang, Zihao Wu, Xiaowei Yu, Jason Holmes, Hongying Feng, Haixing Dai, Xiang Li, Quanzheng Li, Dajiang Zhu, Tianming Liu, Wei Liu	(参考訳) 本研究では,臨床放射線治療におけるSegment Anything Model(SAM)の性能評価を行った。以上の結果から,Diceスコアが0.7以上であるほとんどの臓器アットリスク(OAR)において,SAMのセグメンテーションモードは臨床的に許容できるセグメンテーションを達成できることが示唆された。 SAMのボックスプロンプトモードはDiceのスコアをさらに0.1から0.5に改善する。臓器の大きさと境界の明確さを考慮すると、samは境界が明確であるが、境界が明確でない小さな臓器ではより良く機能する大きな臓器の性能を示す。自然画像にプリトレーニングされたモデルであるsamは、臨床的に許容される精度で医療画像からのオールのデライン化を処理できるため、放射線治療の自動セグメンテーションにおいて一貫した精度でsamの堅牢な一般化能力が強調される。言い換えれば、SAMは汎用的な自動セグメンテーションモデルを用いて、異なる場所で異なるOARをデライン化することができる。 SAMの様々な疾患部位における一般化能力は、放射線治療における自動セグメンテーションのための一般的なモデルを開発することが技術的に可能であることを示唆している。 In this study, we evaluate the performance of the Segment Anything Model (SAM) in clinical radiotherapy. Our results indicate that SAM's 'segment anything' mode can achieve clinically acceptable segmentation results in most organs-at-risk (OARs) with Dice scores higher than 0.7. SAM's 'box prompt' mode further improves the Dice scores by 0.1 to 0.5. Considering the size of the organ and the clarity of its boundary, SAM displays better performance for large organs with clear boundaries but performs worse for smaller organs with unclear boundaries. Given that SAM, a model pre-trained purely on natural images, can handle the delineation of OARs from medical images with clinically acceptable accuracy, these results highlight SAM's robust generalization capabilities with consistent accuracy in automatic segmentation for radiotherapy. In other words, SAM can achieve delineation of different OARs at different sites using a generic automatic segmentation model. SAM's generalization capabilities across different disease sites suggest that it is technically feasible to develop a generic model for automatic segmentation in radiotherapy.	
翻訳日:2023-07-06 20:15:28 公開日:2023-07-04
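The SAM abstract above uses a Dice score of 0.7 as its acceptability threshold. As a minimal sketch of how that metric is computed for two binary masks (the toy masks and the convention for two empty masks are assumptions made here for illustration, not taken from the paper):

```python
def dice_score(pred, truth):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Hypothetical 8-pixel masks: 3 pixels overlap, each mask has 4 foreground pixels.
pred = [0, 1, 1, 1, 0, 0, 1, 0]
truth = [0, 1, 1, 0, 0, 0, 1, 1]
score = dice_score(pred, truth)
print(score)
```

Here the overlap is 3 pixels and the masks hold 4 + 4 foreground pixels, so the score is 6/8 = 0.75, just above the 0.7 threshold quoted in the abstract.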
# マルコフ連鎖を通じた定数ステップサイズSGDの収束と集中特性 Convergence and concentration properties of constant step-size SGD through Markov chains ( http://arxiv.org/abs/2306.11497v2 ) ライセンス: Link先を確認	Ibrahim Merad and Stéphane Gaïffas	(参考訳) 定数ステップサイズの確率的勾配降下法(SGD)を用いた滑らかで強凸な目的関数の最適化を考察し,マルコフ連鎖の観点からその特性を研究する。分散が緩やかに制御された不偏な勾配推定に対して、反復列は全変動距離において不変分布に収束することを示す。また,この収束をWasserstein-2距離においても,従来研究より一般的な設定で確立する。極限分布の不変性により, 勾配が準ガウス的あるいは準指数的な集中特性を持つとき, 極限分布もその性質を継承することを示す。これにより、最終的な推定値に対する高信頼境界の導出が可能になる。最後に、線形の場合のそのような条件下で、テール列のPolyak-Ruppert平均に対して次元に依存しない偏差境界を得る。結果はすべて非漸近的であり,その帰結をいくつかの応用を通じて議論する。 We consider the optimization of a smooth and strongly convex objective using constant step-size stochastic gradient descent (SGD) and study its properties through the prism of Markov chains. We show that, for unbiased gradient estimates with mildly controlled variance, the iteration converges to an invariant distribution in total variation distance. We also establish this convergence in Wasserstein-2 distance in a more general setting compared to previous work. Thanks to the invariance property of the limit distribution, our analysis shows that the latter inherits sub-Gaussian or sub-exponential concentration properties when these hold true for the gradient. This allows the derivation of high-confidence bounds for the final estimate. Finally, under such conditions in the linear case, we obtain a dimension-free deviation bound for the Polyak-Ruppert average of a tail sequence. All our results are non-asymptotic and their consequences are discussed through a few applications.	翻訳日:2023-07-06 20:15:07 公開日:2023-07-04
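To make the setting of the abstract above concrete, the following toy sketch runs constant step-size SGD on a one-dimensional quadratic and forms the Polyak-Ruppert average of a tail sequence. The objective, the Gaussian gradient noise, the step size and the tail length are all assumptions chosen for illustration; this is not the authors' experimental setup.

```python
import random


def sgd_tail_average(step=0.1, n_steps=10_000, tail_start=5_000, seed=0):
    """Constant step-size SGD on f(x) = 0.5 * (x - 3)^2 with noisy gradients.

    Returns the last iterate and the average of the iterates from tail_start on.
    """
    rng = random.Random(seed)
    x_star = 3.0  # minimiser of the toy objective (assumed for this sketch)
    x = 0.0
    tail_sum = 0.0
    for t in range(n_steps):
        # Unbiased gradient estimate: true gradient plus zero-mean Gaussian noise.
        noisy_grad = (x - x_star) + rng.gauss(0.0, 1.0)
        x -= step * noisy_grad
        if t >= tail_start:
            tail_sum += x
    return x, tail_sum / (n_steps - tail_start)


last_iterate, tail_average = sgd_tail_average()
print(last_iterate, tail_average)
```

With a constant step the iterates form a Markov chain that keeps fluctuating around the minimiser instead of settling on it, while the tail average concentrates much more tightly; this is the quantity for which the abstract reports a deviation bound.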
# 拡張Bose-HubbardモデルにおけるSuper-Tonks-Girardeau Quench Super-Tonks-Girardeau Quench in the Extended Bose-Hubbard Model ( http://arxiv.org/abs/2306.10910v2 ) ライセンス: Link先を確認	Maciej Marciniak, Maciej Łebek, Jakub Kopyciński, Wojciech Górecki, Rafał Ołdziejewski, Krzysztof Pawłowski	(参考訳) 本研究では, 強い斥力的な局所相互作用を持つ一次元気体から強い引力相互作用を持つ気体へのクエンチ(スーパー・トンクス・ジラルドー効果として知られる)の影響について検討する。光学格子と非局所相互作用の両方を組み込むことで、クエンチ中に状態が崩壊するが、それは特定の範囲の相互作用に限られるという、これまで調べられていなかった現象を発見した。本研究では拡張Bose-Hubbardモデルを様々なシステムサイズで用い, 2原子の解析的結果から始めて, 厳密対角化, DMRG, TDVP法による少数体系へと進む。最後に、巨視的な数の原子に対して局所密度近似の数値的実装を用いる。一貫して, スーパー・トンクス・ジラルドー・クエンチにより初期の自己束縛構造が膨張する領域が明らかとなった。この高速な蒸発は、拡張Bose-Hubbardモデルの物理を探求する最先端の実験において相図を特徴づけるツールを提供する。 We investigate the effect of a quench from a one-dimensional gas with strong and repulsive local interactions to a strongly attractive one, known as the super-Tonks-Girardeau effect. By incorporating both an optical lattice and non-local interactions, we discover a previously unexplored phenomenon: the disruption of the state during the quench, but within a specific range of interactions. Our study employs the extended Bose-Hubbard model across various system sizes, starting with analytical results for two atoms and progressing to few-body systems using exact diagonalization, DMRG and TDVP methods. Finally, we use a numerical implementation of the local density approximation for a macroscopic number of atoms. Consistently, our findings unveil a region where the initially self-bound structure expands due to the super-Tonks-Girardeau quench. The fast evaporation provides a tool to characterize the phase diagram in state-of-the-art experiments exploring the physics of the extended Bose-Hubbard model.	翻訳日:2023-07-06 20:14:34 公開日:2023-07-04
# 話題分類のための単言語・クロス言語知識伝達 Monolingual and Cross-Lingual Knowledge Transfer for Topic Classification ( http://arxiv.org/abs/2306.07797v2 ) ライセンス: Link先を確認	Dmitry Karpov, Mikhail Burtsev	(参考訳) 本稿では,RuQTopicsデータセットからの知識伝達について検討する。このロシア語のトピックデータセットは、大規模なサンプル数(単一ラベル361,560件、マルチラベル170,930件)と広範なクラスカバレッジ(76クラス)を兼ね備えている。このデータセットは「Yandex Que」の生データから作成した。RuQTopicsで訓練したモデルをMASSIVEのロシア語サブセットの6つの対応クラスで評価したところ、このデータセットで訓練したロシア語のみのモデルはこのサブセットで一貫して約85%の精度を示し、RuQTopicsデータセットが現実世界の会話タスクに適していることが示された。また、RuQTopicsで訓練し、MASSIVEの同じ6クラス(MASSIVEの全言語)で評価した多言語BERTについて、言語ごとの精度は、対応する言語に対するBERTの事前学習データのおおよそのサイズと密接に相関する(スピアマン相関0.773、p値2.997e-11)ことが分かった。一方、言語ごとの精度とロシア語からの言語的距離との相関は統計的に有意ではない。 This article investigates the knowledge transfer from the RuQTopics dataset. This Russian topical dataset combines a large number of samples (361,560 single-label, 170,930 multi-label) with extensive class coverage (76 classes). We prepared this dataset from the "Yandex Que" raw data. By evaluating the RuQTopics-trained models on the six matching classes of the Russian MASSIVE subset, we show that the RuQTopics dataset is suitable for real-world conversational tasks, as the Russian-only models trained on this dataset consistently yield an accuracy of around 85% on this subset. We also found that for the multilingual BERT, trained on RuQTopics and evaluated on the same six classes of MASSIVE (for all MASSIVE languages), the language-wise accuracy closely correlates (Spearman correlation 0.773 with p-value 2.997e-11) with the approximate size of BERT's pretraining data for the corresponding language. At the same time, the correlation of the language-wise accuracy with the linguistic distance from Russian is not statistically significant.	翻訳日:2023-07-06 20:13:26 公開日:2023-07-04
# 誰でも検索:指示付き汎用人物再識別タスク Retrieve Anyone: A General-purpose Person Re-identification Task with Instructions ( http://arxiv.org/abs/2306.07520v2 ) ライセンス: Link先を確認	Weizhen He and Shixiang Tang and Yiheng Deng and Qihao Chen and Qingsong Xie and Yizhou Wang and Lei Bai and Feng Zhu and Rui Zhao and Wanli Ouyang and Donglian Qi and Yunfeng Yan	(参考訳) 人間の知性は、視覚と言語の両方の記述に従って、任意の人物を検索することができる。しかし、現在のコンピュータビジョンコミュニティは、異なるシナリオにおける特定の人物再識別(ReID)タスクを別々に研究しており、現実世界での応用を制限している。本稿では、与えられた画像または言語による指示に従ってモデルが画像を検索する新しいinstruct-ReIDタスクを提案し、この問題の解決を目指す。instruct-ReIDはより一般的なReID設定であり、異なる指示を設計することで既存のReIDタスクを特別なケースとして扱える。この新しい設定での研究を促進するため、大規模なOmniReIDベンチマークと、ベースライン手法としての適応三重項損失を提案する。実験の結果、OmniReIDベンチマークで訓練したベースラインモデルは、従来のReIDではMarket1501、CUHK03、MSMT17でmAPを+0.6%、+1.4%、0.2%、着替えReIDではPRCC、VC-Clothes、LTCCでmAPを+0.8%、+2.0%、+13.4%、RGB画像のみを用いる衣服テンプレートベースの着替えReIDではCOCAS+ real2でmAPを+11.7%、新たに定義した言語指示ReIDではCOCAS+ real2でmAPを+25.4%改善できることが示された。データセット、モデル、コードはhttps://github.com/hwz-zju/Instruct-ReIDで公開予定である。 Human intelligence can retrieve any person according to both visual and language descriptions. However, the current computer vision community studies specific person re-identification (ReID) tasks in different scenarios separately, which limits the applications in the real world. This paper strives to resolve this problem by proposing a new instruct-ReID task that requires the model to retrieve images according to the given image or language instructions. Our instruct-ReID is a more general ReID setting, where existing ReID tasks can be viewed as special cases by designing different instructions. We propose a large-scale OmniReID benchmark and an adaptive triplet loss as a baseline method to facilitate research in this new setting. Experimental results show that the baseline model trained on our OmniReID benchmark can improve +0.6%, +1.4%, 0.2% mAP on Market1501, CUHK03, MSMT17 for traditional ReID, +0.8%, +2.0%, +13.4% mAP on PRCC, VC-Clothes, LTCC for clothes-changing ReID, +11.7% mAP on COCAS+ real2 for clothes-template-based clothes-changing ReID when using only RGB images, +25.4% mAP on COCAS+ real2 for our newly defined language-instructed ReID. The dataset, model, and code will be available at https://github.com/hwz-zju/Instruct-ReID.	翻訳日:2023-07-06 20:13:03 公開日:2023-07-04
# $E(2)$-Equivariant Vision Transformer $E(2)$-Equivariant Vision Transformer ( http://arxiv.org/abs/2306.06722v2 ) ライセンス: Link先を確認	Renjun Xu and Kaifan Yang and Ke Liu and Fengxiang He	(参考訳) Vision Transformer (ViT) はコンピュータビジョンにおいて優れた性能を発揮している。しかし、ViTにおける位置符号化は、データの本質的な等価性を学ぶのを著しく困難にしている。当初、同変 ViT を設計する試みがあったが、この論文ではいくつかのケースで欠陥があることが証明されている。この問題に対処するため、我々は、新しい効果的な位置符号化演算子を用いて、GE-ViT(Group Equivariant Vision Transformer)を設計する。 GE-ViTは同変ニューラルネットワークの理論的要件をすべて満たしていることを示す。 GE-ViTが非同変自己注意ネットワークを著しく上回ることを示すため、標準ベンチマークデータセットで包括的な実験が行われた。コードはhttps://github.com/zjucdsyangkaifan/gevitで入手できる。 Vision Transformer (ViT) has achieved remarkable performance in computer vision. However, positional encoding in ViT makes it substantially difficult to learn the intrinsic equivariance in data. Initial attempts have been made on designing equivariant ViT but are proved defective in some cases in this paper. To address this issue, we design a Group Equivariant Vision Transformer (GE-ViT) via a novel, effective positional encoding operator. We prove that GE-ViT meets all the theoretical requirements of an equivariant neural network. Comprehensive experiments are conducted on standard benchmark datasets, demonstrating that GE-ViT significantly outperforms non-equivariant self-attention networks. The code is available at https://github.com/ZJUCDSYangKaifan/GEVit.	翻訳日:2023-07-06 20:12:09 公開日:2023-07-04
# 識別可能な特徴分析によるChatGPT生成コードからの人間認証 Discriminating Human-authored from ChatGPT-Generated Code Via Discernable Feature Analysis ( http://arxiv.org/abs/2306.14397v2 ) ライセンス: Link先を確認	Li Ke, Hong Sheng, Fu Cai, Zhang Yunhe and Liu Ming	(参考訳) プログラミングにおける大規模言語生成モデル(llm)のユビキタスな採用は、人間の書いたコードとインテリジェントなモデルによって生成されたコードの区別の重要性を強調している。本稿では,ChatGPTが生成するコードと,人間が作成したコードとを区別することを目的とする。この2つのソース間のプログラミングスタイル,技術レベル,可読性の違いを明らかにする。その結果,分化のための識別的特徴セットを開発し,その効果をアブレーション実験により評価する。さらに,時間的および空間的セグメンテーションを用いたデータセットクリーニング手法を考案し,データセットの重大さを軽減し,高度かつ汚染されていないデータセットを確保する。データリソースをさらに充実させるためには、"コードトランスフォーメーション"、"機能トランスフォーメーション"、"機能カスタマイズ"技術を採用し、10,000行のchatgpt生成コードからなる広範なデータセットを生成します。本研究の有意義な貢献は、二分分類タスクにおいて、人間が許可したコードとチャットgpt生成コードを区別する精度の高い識別機能セットの提案、広範なチャットgpt生成コードを生成する方法の考案、オープンソースリポジトリから未完成で高品質なコードデータセットを抽出するためのデータセットクリーン化戦略の導入、コードオーサシップアトリビューションタスクにおける例外的な精度の向上などである。 The ubiquitous adoption of Large Language Generation Models (LLMs) in programming has underscored the importance of differentiating between human-written code and code generated by intelligent models. This paper specifically aims to distinguish code generated by ChatGPT from that authored by humans. Our investigation reveals disparities in programming style, technical level, and readability between these two sources. Consequently, we develop a discriminative feature set for differentiation and evaluate its efficacy through ablation experiments. Additionally, we devise a dataset cleansing technique, which employs temporal and spatial segmentation, to mitigate the dearth of datasets and to secure high-caliber, uncontaminated datasets. To further enrich data resources, we employ "code transformation," "feature transformation," and "feature customization" techniques, generating an extensive dataset comprising 10,000 lines of ChatGPT-generated code. 
The salient contributions of our research include: proposing a discriminative feature set yielding high accuracy in differentiating ChatGPT-generated code from human-authored code in binary classification tasks; devising methods for generating extensive ChatGPT-generated codes; and introducing a dataset cleansing strategy that extracts immaculate, high-grade code datasets from open-source repositories, thus achieving exceptional accuracy in code authorship attribution tasks.	翻訳日:2023-07-06 20:06:18 公開日:2023-07-04
# L00Lとp00pの絡み合い L00L and p00p entanglement ( http://arxiv.org/abs/2306.13620v2 ) ライセンス: Link先を確認	Dylan Danese, Sabine Wollmann, Saroch Leedumrongwatthanakun, Will McCutcheon, Manuel Erhard, William N. Plick, and Mehul Malik	(参考訳) 1つの光子が基本(gauss)モードを持ち、もう1つの光子が非零アジムタール(\ell$)またはラジアル(p$)成分を持つ高次lgモードを持つラゲール・ガウシアン(lg)の非平衡2光子エンタングルメントの生成を実証する。 N00N$ state nomenclatureからキューを受け取り、これらのタイプの状態を$LOOL$ (L00L) または $p00p$-entangled と呼ぶ。それらはlgモード空間で1つの光子を移動させ、ビームスプリッターで第2の(当初は無相関な)光子と結合し、その次に偶然検出することで生成される。 2光子のコヒーレンスを検証するために、2光子の「ツイスト」量子消去器を実証し、香港・ウー・マンデル干渉を2つの区別可能な光子間で再現する。絡み合いの証人を用いて、生成した$LOOL$と$p00p$の状態は、それぞれの理想の最大絡み合い状態に対して95.31%と89.80%の忠実さを持つことがわかった。基本的な興味の他に、この種の絡み合いは、平均的な量子物理学者の面白い骨をくすぐることに大きな影響を与える可能性が高い。 We demonstrate the generation of unbalanced two-photon entanglement in the Laguerre-Gaussian (LG) transverse-spatial degree-of-freedom, where one photon carries a fundamental (Gauss) mode and the other a higher-order LG mode with a non-zero azimuthal ($\ell$) or radial ($p$) component. Taking a cue from the $N00N$ state nomenclature, we call these types of states $LOOL$ (L00L) or $p00p$-entangled. They are generated by shifting one photon in the LG mode space and combining it with a second (initially uncorrelated) photon at a beamsplitter, followed by coincidence detection. In order to verify two-photon coherence, we demonstrate a two-photon "twisted" quantum eraser, where Hong-Ou-Mandel interference is recovered between two distinguishable photons by projecting them into a rotated LG superposition basis. Using an entanglement witness, we find that our generated $LOOL$ and $p00p$ states have fidelities of 95.31% and 89.80% to their respective ideal maximally entangled states. Besides being of fundamental interest, this type of entanglement will likely have a significant impact on tickling the average quantum physicist's funny bone.	翻訳日:2023-07-06 20:04:17 公開日:2023-07-04
# 超伝導ケラーパラメトリック発振器における量子干渉の観測と操作 Observation and manipulation of quantum interference in a superconducting Kerr parametric oscillator ( http://arxiv.org/abs/2306.12299v2 ) ライセンス: Link先を確認	Daisuke Iyama, Takahiko Kamiya, Shiori Fujii, Hiroto Mukai, Yu Zhou, Toshiaki Nagase, Akiyoshi Tomonaga, Rui Wang, Jiao-Jiao Xue, Shohei Watabe, Sangil Kwon, and Jaw-Shen Tsai	(参考訳) 量子トンネルは超伝導回路を「量子」にする現象である。近年,Kerrパラメトリック発振器の位相空間における量子トンネルを量子情報処理の資源として利用することへの関心が高まっている。本稿では、ウィグナートモグラフィによる平面超伝導回路のトンネルによる量子干渉の直接観測について報告する。この量子干渉の全ての本質的性質、例えばフォック状態からキャット状態へのマッピング、ポンプのデチューニングによる時間的振動、そしてその特徴的なラビ振動とラムジー縞を実験的に解明する。最後に,観測された量子干渉の操作としてゲート操作を行う。本研究は,超伝導Kerrパラメトリック発振器の量子特性と量子情報技術への応用に関する基礎研究である。 Quantum tunneling is the phenomenon that makes superconducting circuits "quantum". Recently, there has been a renewed interest in using quantum tunneling in phase space of a Kerr parametric oscillator as a resource for quantum information processing. Here, we report a direct observation of quantum interference induced by such tunneling in a planar superconducting circuit through Wigner tomography. We experimentally elucidate all essential properties of this quantum interference, such as mapping from Fock states to cat states, a temporal oscillation due to the pump detuning, as well as its characteristic Rabi oscillations and Ramsey fringes. Finally, we perform gate operations as manipulations of the observed quantum interference. Our findings lay the groundwork for further studies on quantum properties of superconducting Kerr parametric oscillators and their use in quantum information technologies.	翻訳日:2023-07-06 20:03:48 公開日:2023-07-04
# saaformer : 超スペクトル画像分類のためのスペクトル-空間アキシャルアグリゲーショントランス SaaFormer: Spectral-spatial Axial Aggregation Transformer for Hyperspectral Image Classification ( http://arxiv.org/abs/2306.16759v2 ) ライセンス: Link先を確認	Enzhe Zhao, Zhichang Guo, Yao Li, Dazhi Zhang	(参考訳) 地球の観測衛星や航空機から撮影したハイパースペクトル画像(HSI)は、農業、環境モニタリング、鉱業などの分野でますます重要になっている。利用可能なハイパースペクトルデータセットが限られているため、pixel-wise random samplingは最も一般的に使用されるトレーニング-テストデータセット分割アプローチであり、トレーニングとテストデータセットのサンプル間にかなりの重複がある。さらに,より重なりが強い領域は分類精度が高いことが実験的に示唆された。したがって、画素単位のランダムサンプリングアプローチは、データ漏洩のリスクをもたらす。そこで本研究では,データ漏洩の可能性を最小限に抑えるブロックワイズサンプリング手法を提案する。また,2dcnnなどのモデルにおけるデータ漏洩の存在も実験的に確認した。さらに,HSIを長周期3次元画像とみなす超スペクトル画像分類器の課題に対処するため,スペクトル空間軸アグリゲーショントランスフォーマモデル,すなわちSaaFormerを提案する。このモデルは軸集約注意と多値スペクトル空間抽出の2つの主成分からなる。この軸集約注意機構は、空間的次元特徴を集約しながら、ハイパースペクトル画像の各画素位置におけるスペクトル帯域間の連続性と相関を効果的に活用する。これにより、SaaFormerはブロックワイドサンプリングでも高い精度を維持することができる。多層スペクトル空間抽出構造は、異なる物質成分の特定のスペクトル帯域に対する感度を捉え、より広範囲のスペクトル詳細に集中できるように設計されている。 6つの公開データセットの結果から,本モデルではランダムサンプリングでは同等の性能を示し,ブロックワイドサンプリングパーティションでは他の手法よりも優れていた。 Hyperspectral images (HSI) captured from earth observing satellites and aircraft is becoming increasingly important for applications in agriculture, environmental monitoring, mining, etc. Due to the limited available hyperspectral datasets, the pixel-wise random sampling is the most commonly used training-test dataset partition approach, which has significant overlap between samples in training and test datasets. Furthermore, our experimental observations indicates that regions with larger overlap often exhibit higher classification accuracy. Consequently, the pixel-wise random sampling approach poses a risk of data leakage. Thus, we propose a block-wise sampling method to minimize the potential for data leakage. Our experimental findings also confirm the presence of data leakage in models such as 2DCNN. 
Furthermore, we propose a spectral-spatial axial aggregation transformer model, namely SaaFormer, to address the challenges associated with hyperspectral image classification, treating HSI as long sequential three-dimensional images. The model comprises two primary components: axial aggregation attention and multi-level spectral-spatial extraction. The axial aggregation attention mechanism effectively exploits the continuity and correlation among spectral bands at each pixel position in hyperspectral images, while aggregating spatial dimension features. This enables SaaFormer to maintain high precision even under block-wise sampling. The multi-level spectral-spatial extraction structure is designed to capture the sensitivity of different material components to specific spectral bands, allowing the model to focus on a broader range of spectral details. The results on six publicly available datasets demonstrate that our model exhibits comparable performance when using random sampling, while significantly outperforming other methods when employing block-wise sampling partition.	翻訳日:2023-07-06 19:55:13 公開日:2023-07-04
# McKean-Vlasov制御問題に対する連続時間q-ラーニング Continuous Time q-learning for McKean-Vlasov Control Problems ( http://arxiv.org/abs/2306.16208v2 ) ライセンス: Link先を確認	Xiaoli Wei, Xiang Yu	(参考訳) 本稿では,最近Jia と Zhou (2023) による Q-learning の連続時間版として作られた q-learning を,エントロピー規則化強化学習の設定における Mckean-Vlasov 制御問題に対して検討する。 jia と zhou (2023) における単一エージェントの制御問題とは対照的に、エージェントの平均場相互作用は q-関数の定義をより微妙に表現し、2つの異なる q-函数が自然に生じることを示す。 i) テストポリシを含む弱いマルティンゲール条件で学習可能な、Gu, Guo, Wei and Xu (2023) で導入された統合 Q-函数の1次近似としての統合 q-函数($q$ で記述) (ii)政策改善イテレーションで使用される本質的なq-関数($q_e$で示される)。 2つのq関数は、すべてのテストポリシーの下で積分表現を介して関連していることを示す。弱いマーチンゲール条件とテストポリシーの探索法に基づいて,いくつかのモデルフリー学習アルゴリズムを考案した。 LQ制御フレームワークとLQ制御フレームワーク以外の2つの例では、最適値関数とq-関数の正確なパラメータ化を求め、シミュレーション実験でアルゴリズムを説明できる。 This paper studies the q-learning, recently coined as the continuous time counterpart of Q-learning by Jia and Zhou (2023), for continuous time Mckean-Vlasov control problems in the setting of entropy-regularized reinforcement learning. In contrast to the single agent's control problem in Jia and Zhou (2023), the mean-field interaction of agents renders the definition of the q-function more subtle, for which we reveal that two distinct q-functions naturally arise: (i) the integrated q-function (denoted by $q$) as the first-order approximation of the integrated Q-function introduced in Gu, Guo, Wei and Xu (2023), which can be learnt by a weak martingale condition involving test policies; and (ii) the essential q-function (denoted by $q_e$) that is employed in the policy improvement iterations. We show that two q-functions are related via an integral representation under all test policies. Based on the weak martingale condition and our proposed searching method of test policies, some model-free learning algorithms are devised. In two examples, one in LQ control framework and one beyond LQ control framework, we can obtain the exact parameterization of the optimal value function and q-functions and illustrate our algorithms with simulation experiments.	
翻訳日:2023-07-06 19:54:20 公開日:2023-07-04
# 証拠検出と追跡コラボレーション:ロバストアンチuavシステムの新しい問題、ベンチマーク、アルゴリズム Evidential Detection and Tracking Collaboration: New Problem, Benchmark and Algorithm for Robust Anti-UAV System ( http://arxiv.org/abs/2306.15767v2 ) ライセンス: Link先を確認	Xue-Feng Zhu, Tianyang Xu, Jian Zhao, Jia-Wei Liu, Kai Wang, Gang Wang, Jianan Li, Qiang Wang, Lei Jin, Zheng Zhu, Junliang Xing, Xiao-Jun Wu	(参考訳) 無人航空機(uavs)は、輸送、監視、軍事など多くの分野で広く使用されている。しかし、安全とプライバシー侵害の可能性を増し、より広範な応用を厳しく制限し、UAVの認識と防衛(反UAV)の重要性を強調している。しかし、従来の作業では、UAVの以前の情報が常に提供されていた追跡問題として、このような反UAVタスクを単純化しており、実際の対UAVタスク(複雑なシーン、不定形、再認識型UAV、リアルタイムUAV監視など)では、そのようなスキームは失敗している。本稿では,UAV情報のない複雑な場面において,UAVの知覚を特徴とする新しい実用的対UAV問題を初めて定式化する。このような課題をベンチマークするために、AntiUAV600と呼ばれる最大のUAVデータセットと、新しい評価基準を提案する。 AntiUAV600は、ランダム、高速、小型のUAVを備えた600の挑戦的なシーンのビデオで構成され、723K以上の熱赤外フレームに密接な注釈が付けられた。最後に,グローバルなUAV検出とローカルなUAV追跡の明確な協調による,新たなUAV対策を開発し,提案課題に効果的に取り組むとともに,今後の研究の強力なベースラインとして機能する。広汎な実験により,本手法はSOTA法よりも優れており,大規模で複雑なUAV知覚性能を向上させるために,AntiUAV600の有効性が検証されている。データセット、事前トレーニングされたモデル、ソースコードはパブリックにリリースされます。 Unmanned Aerial Vehicles (UAVs) have been widely used in many areas, including transportation, surveillance, and military. However, their potential for safety and privacy violations is an increasing issue and highly limits their broader applications, underscoring the critical importance of UAV perception and defense (anti-UAV). Still, previous works have simplified such an anti-UAV task as a tracking problem, where the prior information of UAVs is always provided; such a scheme fails in real-world anti-UAV tasks (i.e. complex scenes, indeterminate-appear and -reappear UAVs, and real-time UAV surveillance). In this paper, we first formulate a new and practical anti-UAV problem featuring the UAVs perception in complex scenes without prior UAVs information. To benchmark such a challenging task, we propose the largest UAV dataset dubbed AntiUAV600 and a new evaluation metric. 
The AntiUAV600 comprises 600 video sequences of challenging scenes with random, fast, and small-scale UAVs, with over 723K thermal infrared frames densely annotated with bounding boxes. Finally, we develop a novel anti-UAV approach via an evidential collaboration of global UAV detection and local UAV tracking, which effectively tackles the proposed problem and can serve as a strong baseline for future research. Extensive experiments show our method outperforms SOTA approaches and validate the ability of AntiUAV600 to enhance UAV perception performance due to its large scale and complexity. Our dataset, pretrained models, and source code will be released publicly.	翻訳日:2023-07-06 19:53:56 公開日:2023-07-04
# 学習した位置認識記述子と点対ボクセルによるスパース双時間点雲の不規則変化検出 Irregular Change Detection in Sparse Bi-Temporal Point Clouds using Learned Place Recognition Descriptors and Point-to-Voxel Comparison ( http://arxiv.org/abs/2306.15416v2 ) ライセンス: Link先を確認	Nikolaos Stathoulopoulos, Anton Koval and George Nikolakopoulos	(参考訳) 3Dポイントクラウドにおける変化検出と不規則なオブジェクト抽出は、自律的なナビゲーションだけでなく、様々な産業環境の既存のデジタルツインモデルを更新する上でも重要な課題である。本稿では,voxel-to-point比較に基づく深層学習位置認識記述子と不規則物体抽出を用いた3次元点雲における変化検出手法を提案する。提案手法はまず,共通座標フレームを確立するために,マップマージアルゴリズムを用いて両時間点雲を配向する。そして、ディープラーニング技術を用いて、3Dポイントクラウドスキャンからロバストで差別的な特徴を抽出し、連続するポイントクラウドフレーム間の変化を検知し、変化した領域を見つける。最後に、変化した領域をサンプリングし、2つのインスタンス間で比較し、その領域が変化した障害を抽出する。提案手法は実世界の実地実験で評価され,オブジェクトやmuck-pileの付加・変位などの3次元点雲の異なる種類の変化を検知し,その効果を示した。本研究は, 建設現場における安全・安全監視, 地図作成, 調査, 今後の研究方向性など, 様々な応用に重要な影響を示唆するものである。 Change detection and irregular object extraction in 3D point clouds is a challenging task that is of high importance not only for autonomous navigation but also for updating existing digital twin models of various industrial environments. This article proposes an innovative approach for change detection in 3D point clouds using deep learned place recognition descriptors and irregular object extraction based on voxel-to-point comparison. The proposed method first aligns the bi-temporal point clouds using a map-merging algorithm in order to establish a common coordinate frame. Then, it utilizes deep learning techniques to extract robust and discriminative features from the 3D point cloud scans, which are used to detect changes between consecutive point cloud frames and therefore find the changed areas. Finally, the altered areas are sampled and compared between the two time instances to extract any obstructions that caused the area to change. 
The proposed method was successfully evaluated in real-world field experiments, where it was able to detect different types of changes in 3D point clouds, such as object or muck-pile addition and displacement, showcasing the effectiveness of the approach. The results of this study have important implications for various applications, including safety and security monitoring on construction sites, mapping, and exploration, and suggest potential future research directions in this field.	翻訳日:2023-07-06 19:53:10 公開日:2023-07-04
# ドメイン適応点雲登録のための分別平均教師 A denoised Mean Teacher for domain adaptive point cloud registration ( http://arxiv.org/abs/2306.14749v2 ) ライセンス: Link先を確認	Alexander Bigalke, Mattias P. Heinrich	(参考訳) ポイントクラウドベースの医療登録は、計算効率の向上、強度シフトへの堅牢性、匿名性保存を約束するが、類似度メトリクスによる教師なし学習の非効率性によって制限される。合成変形に関する教師付きトレーニングは代替となるが、ドメインギャップと実際のドメインとの差に悩まされる。本研究はドメイン適応によるこのギャップに取り組むことを目的としている。平均教師との自己学習は、この問題に対する確立されたアプローチであるが、教師からの疑似ラベルの固有ノイズによって障害を受ける。本稿では,2つの相補的デノベーション戦略を含む,ポイントクラウド登録のための教師・学生の認知パラダイムを提案する。まず,教員登録と学生登録のチャンファー距離に基づいて疑似ラベルをフィルタリングし,教師による有害な監督を防止することを提案する。第2に、教師は、予測変形で移動入力を歪ませることで、ノイズフリーラベルで新しいトレーニングペアを動的に合成する。 2つのドメインシフトの下で,公共PVTデータセット上の肺血管木の吸入吸入登録を行う。我々の手法は平均教師を13.5/62.8%上回り、様々な競争相手を一貫して上回り、新しい最先端精度(TRE=2.31mm)を設定する。コードはhttps://github.com/multimodallearning/denoized_mt_pcd_regで入手できる。 Point cloud-based medical registration promises increased computational efficiency, robustness to intensity shifts, and anonymity preservation but is limited by the inefficacy of unsupervised learning with similarity metrics. Supervised training on synthetic deformations is an alternative but, in turn, suffers from the domain gap to the real domain. In this work, we aim to tackle this gap through domain adaptation. Self-training with the Mean Teacher is an established approach to this problem but is impaired by the inherent noise of the pseudo labels from the teacher. As a remedy, we present a denoised teacher-student paradigm for point cloud registration, comprising two complementary denoising strategies. First, we propose to filter pseudo labels based on the Chamfer distances of teacher and student registrations, thus preventing detrimental supervision by the teacher. Second, we make the teacher dynamically synthesize novel training pairs with noise-free labels by warping its moving inputs with the predicted deformations. Evaluation is performed for inhale-to-exhale registration of lung vessel trees on the public PVT dataset under two domain shifts. 
Our method surpasses the baseline Mean Teacher by 13.5/62.8%, consistently outperforms diverse competitors, and sets a new state-of-the-art accuracy (TRE=2.31mm). Code is available at https://github.com/multimodallearning/denoised_mt_pcd_reg.	翻訳日:2023-07-06 19:52:33 公開日:2023-07-04
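
The pseudo-label filter mentioned in the abstract above can be illustrated with a minimal sketch. It assumes one plausible reading of the rule (keep the teacher's pseudo label only when the teacher's registration has a lower Chamfer distance to the fixed cloud than the student's); the function names `chamfer_distance` and `keep_pseudo_label` and the toy point sets are hypothetical and are not taken from the authors' code.

```python
import math


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two point sets (lists of coordinate tuples)."""
    def one_way(src, dst):
        # Mean distance from each point in src to its nearest neighbour in dst.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return one_way(a, b) + one_way(b, a)


def keep_pseudo_label(fixed, warped_teacher, warped_student):
    """Assumed rule: trust the teacher only if its warped cloud fits the fixed cloud better."""
    return chamfer_distance(warped_teacher, fixed) < chamfer_distance(warped_student, fixed)


# Toy data: the teacher's warped cloud lies closer to the fixed cloud than the student's.
fixed = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
teacher = [(0.0, 0.1, 0.0), (1.0, 0.1, 0.0)]
student = [(0.0, 0.5, 0.0), (1.0, 0.5, 0.0)]
print(keep_pseudo_label(fixed, teacher, student))
```

The brute-force nearest-neighbour search is quadratic in the number of points, which is fine for a toy example; real point clouds would likely need a spatial index.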
# ストリームシナリオにおける距離関数と正規化 Distance Functions and Normalization Under Stream Scenarios ( http://arxiv.org/abs/2307.00106v2 ) ライセンス: Link先を確認	Eduardo V. L. Barboza, Paulo R. Lisboa de Almeida, Alceu de Souza Britto Jr, Rafael M. O. Cruz	(参考訳) データ正規化は、分類システムのモデリングにおいて不可欠なタスクである。データストリームを扱う場合、最小/最大値などの機能の性質を事前に知ることができないため、データ正規化は特に困難になります。我々は,データストリーム中の8つのよく知られた距離関数が正規化せずに生成した精度を比較し,受信したデータの最初のバッチの統計値と受信した前のバッチの統計値から正規化する。完全ストリームを正規化と見なすストリームの実験的なプロトコルは非現実的であり、バイアスと貧弱な結果をもたらす可能性がある。以上の結果から,正規化を行なわずに元のデータストリームとキャンベラ距離を併用することは,データストリームに関する情報が事前に分かっていない場合によい組み合わせであることが示唆された。 Data normalization is an essential task when modeling a classification system. When dealing with data streams, data normalization becomes especially challenging since we may not know in advance the properties of the features, such as their minimum/maximum values, and these properties may change over time. We compare the accuracies generated by eight well-known distance functions in data streams without normalization, normalized considering the statistics of the first batch of data received, and considering the previous batch received. We argue that experimental protocols for streams that consider the full stream as normalized are unrealistic and can lead to biased and poor results. Our results indicate that using the original data stream without applying normalization, and the Canberra distance, can be a good combination when no information about the data stream is known beforehand.	翻訳日:2023-07-06 19:45:47 公開日:2023-07-04
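
As a rough illustration of the setting in the abstract above, the sketch below computes the Canberra distance and applies min-max normalization using only the statistics of the previously received batch. The helper names (`canberra`, `batch_stats`, `min_max_normalize`) and the toy batches are assumptions made for this example and do not come from the paper.

```python
def canberra(x, y):
    """Canberra distance: sum of |a - b| / (|a| + |b|) over the features."""
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom > 0:  # a 0/0 term is conventionally counted as 0
            total += abs(a - b) / denom
    return total


def batch_stats(batch):
    """Per-feature minimum and maximum of a batch (list of rows)."""
    cols = list(zip(*batch))
    return [min(c) for c in cols], [max(c) for c in cols]


def min_max_normalize(batch, stats):
    """Scale a batch with the (min, max) statistics of an earlier batch."""
    lo, hi = stats
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in batch
    ]


previous = [[0.0, 10.0], [2.0, 30.0]]
current = [[1.0, 20.0], [4.0, 40.0]]
normalized = min_max_normalize(current, batch_stats(previous))
print(normalized)
print(canberra(current[0], current[1]))
```

In this toy run the second row of the current batch maps outside the [0, 1] range, which shows how stale statistics can misrepresent a stream whose feature ranges change over time. The Canberra distance needs no such statistics because each term is already scaled by the magnitudes of the two values.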
# スキップ接続を用いたベイズ畳み込みニューラルネットワークの自由エネルギー Free energy of Bayesian Convolutional Neural Network with Skip Connection ( http://arxiv.org/abs/2307.01417v1 ) ライセンス: Link先を確認	Shuya Nagayasu and Sumio Watanabe	(参考訳) Residual Network(ResNet)の成功以来、畳み込みニューラルネットワーク(CNN)のアーキテクチャの多くはスキップ接続を採用してきた。スキップ接続によるCNNの一般化性能は,Ensemble Learningのフレームワークで説明されているが,パラメータ数への依存性は明らかにされていない。本稿では,ベイズ学習において,スキップ接続の有無それぞれについて畳み込みニューラルネットワークのベイズ自由エネルギーを示す。スキップ接続を持つベイジアンCNNの自由エネルギーの上限は、オーバーパラメトリゼーションに依存せず、ベイジアンCNNの一般化誤差は同様の性質を持つ。 Since the success of Residual Network (ResNet), many Convolutional Neural Network (CNN) architectures have adopted skip connections. While the generalization performance of CNNs with skip connections has been explained within the framework of Ensemble Learning, its dependency on the number of parameters has not been revealed. In this paper, we show the Bayesian free energy of Convolutional Neural Networks both with and without skip connections in Bayesian learning. The upper bound of the free energy of a Bayesian CNN with skip connections does not depend on the overparametrization, and the generalization error of the Bayesian CNN has a similar property.	翻訳日:2023-07-06 18:48:07 公開日:2023-07-04
# マッチング可能なキーポイント支援グラフニューラルネットワークによる学習機能マッチング Learning Feature Matching via Matchable Keypoint-Assisted Graph Neural Network ( http://arxiv.org/abs/2307.01447v1 ) ライセンス: Link先を確認	Zizhuo Li and Jiayi Ma	(参考訳) 画像のペア間の局所的な特徴の正確なマッチングは、コンピュータビジョンの課題である。従来の研究では注意に基づくグラフニューラルネットワーク(gnn)を使用しており、キーポイント上の完全連結グラフを視覚的および幾何学的情報推論に使用していた。しかし、特徴マッチングの文脈では、検出器の閉塞と故障のため、かなりのキーポイントは取り消せないため、メッセージパッシングには無関係である。非繰り返しキーポイントとの接続は冗長性を導入し、効率が制限されるだけでなく、表現集約プロセスにも干渉し、精度が制限される。提案するMaKeGNNは,非繰り返しキーポイントをバイパスし,マッチング可能なキーポイントを利用して,コンパクトで有意義なメッセージパッシングを導出する,疎度な注意に基づくGNNアーキテクチャである。より具体的には、バイラテラル・コンテキストアウェア・サンプリングモジュールは、まず画像ペアから高い適合性スコアを持つ、分散キーポイントの2つの小さなセットを動的にサンプリングする。次に、我々のMatchable Keypoint-Assisted Context Aggregation Moduleは、サンプルされた通知キーポイントをメッセージボトルネックとみなし、各キーポイントに、マッチするキーポイント内およびマッチしないキーポイントから好ましくないコンテキスト情報を取得することだけを制約し、非削除可能なキーポイントとの無関係で冗長な接続の干渉を回避する。さらに、初期キーポイントとサンプルマッチング可能なキーの潜在的なノイズを考慮し、mkacaモジュールは、データ依存のコンテキスト伝搬のためのマッチング可能性誘導注意集約演算を採用する。これらの手法により, 相対カメラ推定, 基本行列推定, 視覚定位における最先端の性能を実現し, 従来の注意型gnnと比較して計算量やメモリの複雑さを著しく低減した。 Accurately matching local features between a pair of images is a challenging computer vision task. Previous studies typically use attention based graph neural networks (GNNs) with fully-connected graphs over keypoints within/across images for visual and geometric information reasoning. However, in the context of feature matching, considerable keypoints are non-repeatable due to occlusion and failure of the detector, and thus irrelevant for message passing. The connectivity with non-repeatable keypoints not only introduces redundancy, resulting in limited efficiency, but also interferes with the representation aggregation process, leading to limited accuracy. Targeting towards high accuracy and efficiency, we propose MaKeGNN, a sparse attention-based GNN architecture which bypasses non-repeatable keypoints and leverages matchable ones to guide compact and meaningful message passing. 
More specifically, our Bilateral Context-Aware Sampling Module first dynamically samples two small sets of well-distributed keypoints with high matchability scores from the image pair. Then, our Matchable Keypoint-Assisted Context Aggregation Module regards sampled informative keypoints as message bottlenecks and thus constrains each keypoint only to retrieve favorable contextual information from intra- and inter- matchable keypoints, evading the interference of irrelevant and redundant connectivity with non-repeatable ones. Furthermore, considering the potential noise in initial keypoints and sampled matchable ones, the MKACA module adopts a matchability-guided attentional aggregation operation for purer data-dependent context propagation. By these means, we achieve the state-of-the-art performance on relative camera estimation, fundamental matrix estimation, and visual localization, while significantly reducing computational and memory complexity compared to typical attentional GNNs.	翻訳日:2023-07-06 18:38:05 公開日:2023-07-04
# 条件付きおよび構成型言語モデル微分型プロンプトについて On Conditional and Compositional Language Model Differentiable Prompting ( http://arxiv.org/abs/2307.01446v1 ) ライセンス: Link先を確認	Jonathan Pilault, Can Liu, Mohit Bansal, Markus Dreyer	(参考訳) プロンプトは、凍った事前学習言語モデル(plm)を下流タスクに適応させる効果的な方法であることが示されている。プロンプトは、人間工学の単語シーケンスまたは学習された連続埋め込みによって表現できる。本研究では,条件と構成の相違性について検討する。本稿では,タスク命令や入力メタデータを PLM からタスク固有の出力を抽出する連続的なプロンプトに変換する新しいモデル Prompt Production System (PRopS) を提案する。私たちのモデルは、プロダクションシステムのニューラルな定式化に基づくモジュラーネットワーク構造を使用し、モデルが個別のルール -- 特定のプロンプト入力パターンの変換を専門に学習する神経関数 -- を学習することができる。本研究では,PRopS が他の PLM 適応手法を一貫して超越していることを示すとともに,構成一般化タスク,制御可能な要約,多言語翻訳において,PRopS が完全に微調整されたモデルで改善されることがしばしばあることを示す。 Prompts have been shown to be an effective method to adapt a frozen Pretrained Language Model (PLM) to perform well on downstream tasks. Prompts can be represented by a human-engineered word sequence or by a learned continuous embedding. In this work, we investigate conditional and compositional differentiable prompting. We propose a new model, Prompt Production System (PRopS), which learns to transform task instructions or input metadata, into continuous prompts that elicit task-specific outputs from the PLM. Our model uses a modular network structure based on our neural formulation of Production Systems, which allows the model to learn discrete rules -- neural functions that learn to specialize in transforming particular prompt input patterns, making it suitable for compositional transfer learning and few-shot learning. We present extensive empirical and theoretical analysis and show that PRopS consistently surpasses other PLM adaptation techniques, and often improves upon fully fine-tuned models, on compositional generalization tasks, controllable summarization and multilingual translation, while needing fewer trainable parameters.	翻訳日:2023-07-06 18:37:31 公開日:2023-07-04
# グラフポインタネットワークによる組合せ最適化の分岐学習 Learning to Branch in Combinatorial Optimization with Graph Pointer Networks ( http://arxiv.org/abs/2307.01434v1 ) ライセンス: Link先を確認	Rui Wang, Zhiming Zhou, Tao Zhang, Ling Wang, Xin Xu, Xiangke Liao, Kaiwen Li	(参考訳) 分岐とバウンドは組合せ最適化問題を解決する典型的な方法である。本稿では,分岐境界における変数選択ポリシーを学習するためのグラフポインターネットワークモデルを提案する。解法状態を表すために,グラフの特徴,グローバル特徴,歴史的特徴を抽出する。グラフニューラルネットワークとポインタ機構を組み合わせた提案モデルは, 解法状態から分岐変数決定へ効果的にマッピングすることができる。このモデルは、設計されたトップkのKullback-Leibler分散損失関数によって古典的な強い分岐エキスパートルールを模倣するように訓練されている。一連のベンチマーク問題に関する実験は、提案手法が広く使われている専門家設計の分岐規則よりも大幅に優れていることを示した。また,本手法は,最先端の機械学習に基づくブランチ・アンド・バウンド手法よりも,すべてのテストインスタンスにおける高速化と木の大きさの探索に優れる。さらに、モデルは見えないインスタンスに一般化し、より大きなインスタンスにスケールすることができる。 Branch-and-bound is a typical way to solve combinatorial optimization problems. This paper proposes a graph pointer network model for learning the variable selection policy in the branch-and-bound. We extract the graph features, global features and historical features to represent the solver state. The proposed model, which combines the graph neural network and the pointer mechanism, can effectively map from the solver state to the branching variable decisions. The model is trained to imitate the classic strong branching expert rule by a designed top-k Kullback-Leibler divergence loss function. Experiments on a series of benchmark problems demonstrate that the proposed approach significantly outperforms the widely used expert-designed branching rules. Our approach also outperforms the state-of-the-art machine-learning-based branch-and-bound methods in terms of solving speed and search tree size on all the test instances. In addition, the model can generalize to unseen instances and scale to larger instances.	翻訳日:2023-07-06 18:37:11 公開日:2023-07-04
# 補完記憶システムを用いたオープン語彙分類における連続学習 Continual Learning in Open-vocabulary Classification with Complementary Memory Systems ( http://arxiv.org/abs/2307.01430v1 ) ライセンス: Link先を確認	Zhen Zhu, Weijie Lyu, Yao Xiao, Derek Hoiem	(参考訳) オープン語彙画像分類におけるフレキシブルな連続学習法を導入し,人間の認知に観察される相補的な学習システムからインスピレーションを得た。本稿では,遅延学習の原則を適応した"ツリープローブ"手法を提案する。これにより,競合精度の高い新しい例からバッチ学習線形モデルへの高速学習が可能となる。さらに,サンプルのクラスが模範クラス内にあるというゼロショット推定確率を用いて,CLIPゼロショットモデルと模範モデルからの予測を組み合わせる手法を提案する。データインクリメンタル、クラスインクリメンタル、タスクインクリメンタルの設定でテストし、ゼロショットと学習されたカテゴリのさまざまなサブセットで柔軟な推論を実行します。提案手法は,学習速度,目標課題効率,ゼロショット効果のバランスが良好である。 We introduce a method for flexible continual learning in open-vocabulary image classification, drawing inspiration from the complementary learning systems observed in human cognition. We propose a "tree probe" method, an adaption of lazy learning principles, which enables fast learning from new examples with competitive accuracy to batch-trained linear models. Further, we propose a method to combine predictions from a CLIP zero-shot model and the exemplar-based model, using the zero-shot estimated probability that a sample's class is within any of the exemplar classes. We test in data incremental, class incremental, and task incremental settings, as well as ability to perform flexible inference on varying subsets of zero-shot and learned categories. Our proposed method achieves a good balance of learning speed, target task effectiveness, and zero-shot effectiveness.	翻訳日:2023-07-06 18:36:57 公開日:2023-07-04
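
The combination step described in the abstract above can be sketched as follows. This is one plausible reading, not the authors' formula: the zero-shot probability mass on the exemplar classes is used as a weight and is redistributed according to the exemplar-based model, while classes outside the exemplar set keep their zero-shot probability. The function `combine_predictions` and the toy class names are hypothetical.

```python
def combine_predictions(p_zero_shot, p_exemplar, exemplar_classes):
    """Blend zero-shot and exemplar-based class probabilities (assumed scheme)."""
    # Zero-shot estimate that the sample belongs to any exemplar class.
    w = sum(p_zero_shot[c] for c in exemplar_classes)
    combined = {}
    for c, pz in p_zero_shot.items():
        if c in exemplar_classes:
            # Redistribute the mass w according to the exemplar-based model.
            combined[c] = w * p_exemplar[c]
        else:
            # Classes the exemplar model has never seen keep the zero-shot value.
            combined[c] = pz
    return combined


p_zero_shot = {"cat": 0.5, "dog": 0.3, "car": 0.2}
p_exemplar = {"cat": 0.1, "dog": 0.9}  # sums to 1 over the exemplar classes
combined = combine_predictions(p_zero_shot, p_exemplar, {"cat", "dog"})
print(combined)
```

Under this scheme the output still sums to one whenever both inputs are valid distributions, so it can be used directly for inference over any mix of learned and zero-shot categories.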
# スマートフィルタ支援ドメイン対向ニューラルネットワーク:ノイズの多い産業シナリオにおける障害診断のための教師なしドメイン適応手法 Smart filter aided domain adversarial neural network: An unsupervised domain adaptation method for fault diagnosis in noisy industrial scenarios ( http://arxiv.org/abs/2307.01429v1 ) ライセンス: Link先を確認	Baorui Dai, Gaëtan Frusque, Tianfu Li, Qi Li, Olga Fink	(参考訳) 教師なし領域適応(UDA)に基づく障害診断法の適用は、異なる運用条件、異なる運用単位、シミュレーションデータ、実データ間の運用経験と障害署名の転送を容易にし、産業環境において大きな効果を示した。しかし、実際の産業シナリオでは、未知のレベルやノイズの種類がドメインアライメントの難しさを増幅し、深層学習モデルの診断性能に重大な影響を及ぼす可能性がある。この問題に対処するため, ノイズの多い産業シナリオにおける故障診断のためのスマートフィルタ支援ドメイン対向ニューラルネットワーク (SFDANN) を提案する。提案手法は2段階からなる。最初のステップでは、時間周波数領域におけるソースとターゲットドメインデータの類似性を動的に強制するスマートフィルタを開発する。これは学習可能なウェーブレットパケット変換ネットワーク(lwpt)と従来のウェーブレットパケット変換モジュールを組み合わせたものである。第2のステップでは、スマートフィルタによって再構成されたデータをドメイン対向ニューラルネットワーク(DANN)に入力する。ドメイン不変性と識別的特徴を学習するために、SFDANNの学習可能なモジュールは、時間周波数特徴近接、ドメインアライメント、障害分類の3つの目的で統一的に訓練される。本研究では, 列車-線路連成振動系において, 騒音環境下での軸受の故障診断とスラブ線路の故障診断の2つの事例に基づくSFDANN法の有効性を検証した。その結果, 他のUDA法と比較すると, SFDANNは優れた性能と顕著な安定性を示した。 The application of unsupervised domain adaptation (UDA)-based fault diagnosis methods has shown significant efficacy in industrial settings, facilitating the transfer of operational experience and fault signatures between different operating conditions, different units of a fleet or between simulated and real data. However, in real industrial scenarios, unknown levels and types of noise can amplify the difficulty of domain alignment, thus severely affecting the diagnostic performance of deep learning models. To address this issue, we propose a UDA method called Smart Filter-Aided Domain Adversarial Neural Network (SFDANN) for fault diagnosis in noisy industrial scenarios. The proposed methodology comprises two steps. In the first step, we develop a smart filter that dynamically enforces similarity between the source and target domain data in the time-frequency domain. 
This is achieved by combining a learnable wavelet packet transform network (LWPT) and a traditional wavelet packet transform module. In the second step, we input the data reconstructed by the smart filter into a domain adversarial neural network (DANN). To learn domain-invariant and discriminative features, the learnable modules of SFDANN are trained in a unified manner with three objectives: time-frequency feature proximity, domain alignment, and fault classification. We validate the effectiveness of the proposed SFDANN method based on two fault diagnosis cases: one involving fault diagnosis of bearings in noisy environments and another involving fault diagnosis of slab tracks in a train-track-bridge coupling vibration system, where the transfer task involves transferring from numerical simulations to field measurements. Results show that compared to other representative state of the art UDA methods, SFDANN exhibits superior performance and remarkable stability.	翻訳日:2023-07-06 18:36:41 公開日:2023-07-04
# deepfakebench: deepfake検出の包括的なベンチマーク DeepfakeBench: A Comprehensive Benchmark of Deepfake Detection ( http://arxiv.org/abs/2307.01426v1 ) ライセンス: Link先を確認	Zhiyuan Yan, Yong Zhang, Xinhang Yuan, Siwei Lyu, Baoyuan Wu	(参考訳) ディープフェイク検出の分野で見落とされがちな課題は、標準化され、統一され、包括的なベンチマークがないことである。この問題は不公平なパフォーマンス比較と、潜在的に誤解を招く結果につながる。具体的には、データ処理パイプラインに均一性がないため、検出モデルに対する一貫性のないデータ入力が発生する。さらに、実験的な設定には顕著な違いがあり、評価戦略とメトリクスには標準化が欠けている。このギャップを埋めるために、deepfakebenchと呼ばれるdeepfake検出のための最初の包括的なベンチマークを提示し、次の3つの主要な貢献を提供する。 1)全検出器間で一貫した入力を確保する統一データ管理システム 2)最先端手法実装のための統合フレームワーク、及び 3)透明性と再現性を促進するための標準化された評価指標とプロトコル。拡張可能なモジュールベースのコードベースを備えたdeepfakebenchには、15の最先端検出方法、9のdeepfakeデータセット、一連のdeepfake検出評価プロトコルと分析ツール、そして包括的な評価が含まれている。さらに、様々な視点(データ拡張、バックボーンなど)からの評価を広範囲に分析した新たな洞察を提供する。われわれの努力が今後の研究を促進し、このますます重要な領域におけるイノベーションを育むことを願っている。ベンチマークのコード、評価、分析はすべてhttps://github.com/SCLBD/DeepfakeBenchで公開されています。 A critical yet frequently overlooked challenge in the field of deepfake detection is the lack of a standardized, unified, comprehensive benchmark. This issue leads to unfair performance comparisons and potentially misleading results. Specifically, there is a lack of uniformity in data processing pipelines, resulting in inconsistent data inputs for detection models. Additionally, there are noticeable differences in experimental settings, and evaluation strategies and metrics lack standardization. To fill this gap, we present the first comprehensive benchmark for deepfake detection, called DeepfakeBench, which offers three key contributions: 1) a unified data management system to ensure consistent input across all detectors, 2) an integrated framework for state-of-the-art methods implementation, and 3) standardized evaluation metrics and protocols to promote transparency and reproducibility. 
Featuring an extensible, modular-based codebase, DeepfakeBench contains 15 state-of-the-art detection methods, 9 deepfake datasets, a series of deepfake detection evaluation protocols and analysis tools, as well as comprehensive evaluations. Moreover, we provide new insights based on extensive analysis of these evaluations from various perspectives (e.g., data augmentations, backbones). We hope that our efforts could facilitate future research and foster innovation in this increasingly critical domain. All codes, evaluations, and analyses of our benchmark are publicly available at https://github.com/SCLBD/DeepfakeBench.	翻訳日:2023-07-06 18:36:08 公開日:2023-07-04
# 統一GANフレームワークによる一貫性のあるマルチモーダル生成 Consistent Multimodal Generation via A Unified GAN Framework ( http://arxiv.org/abs/2307.01425v1 ) ライセンス: Link先を確認	Zhen Zhu, Yijun Li, Weijie Lyu, Krishna Kumar Singh, Zhixin Shu, Soeren Pirk, Derek Hoiem	(参考訳) 一つの生成モデルを用いて,RGB,深さ,表面法線などのマルチモーダル画像を生成する方法について検討する。課題は、現実的で、互いに一貫性のある出力を生成することです。提案手法は,合成ネットワークの最後の層に共有バックボーンとモダリティ固有の分岐を持つstylegan3アーキテクチャを基盤とし,モダリティ毎の忠実度判別器とクロスモダリティ一貫性判別器を提案する。スタンフォード2D3Dデータセットの実験では、RGB、深さ、法線画像の現実的で一貫した生成を実証する。また,事前学習したモデルを新たなドメイン上で,わずかなペアデータしかなくても容易に拡張するためのトレーニングレシピも提示しています。さらに, 合成RGBと深度ペアを用いたトレーニングおよび微調整深度推定装置について検討した。コードはhttps://github.com/jessemelpolio/MultimodalGANで公開される予定である。 We investigate how to generate multimodal image outputs, such as RGB, depth, and surface normals, with a single generative model. The challenge is to produce outputs that are realistic, and also consistent with each other. Our solution builds on the StyleGAN3 architecture, with a shared backbone and modality-specific branches in the last layers of the synthesis network, and we propose per-modality fidelity discriminators and a cross-modality consistency discriminator. In experiments on the Stanford2D3D dataset, we demonstrate realistic and consistent generation of RGB, depth, and normal images. We also show a training recipe to easily extend our pretrained model on a new domain, even with a few pairwise data. We further evaluate the use of synthetically generated RGB and depth pairs for training or fine-tuning depth estimators. Code will be available at https://github.com/jessemelpolio/MultimodalGAN.	翻訳日:2023-07-06 18:35:48 公開日:2023-07-04
# 生成フローネットワーク - Markov Chain の視点から Generative Flow Networks: a Markov Chain Perspective ( http://arxiv.org/abs/2307.01422v1 ) ライセンス: Link先を確認	Tristan Deleu, Yoshua Bengio	(参考訳) マルコフ連鎖モンテカルロ法(MCMC)は、正規化まで定義された確率分布からサンプリングするための一般的な枠組みを提供するが、後者が高度にマルチモーダルである場合、しばしばターゲット分布への緩やかな収束に悩まされる。近年,サンプルが明確な構成構造を持つ場合,サンプリングを逐次意思決定問題として扱うことにより,この問題を軽減するための代替フレームワークとして生成フローネットワーク(GFlowNets)が提案されている。最初はフローネットワークの観点から紹介されたが、近年のGFlowNetsの進歩は、フローの必要性を完全に回避し、マルコフ連鎖の文献からより多くのインスピレーションを得ている。本稿では、この接続を形式化し、マルコフ連鎖を用いたGFlowNetsの新しい視点を提供し、マルコフ連鎖としての状態空間の性質に関係なくGFlowNetsの統一的な視点を示す。 MCMCメソッドと同じ理論的フレームワークの下でGFlowNetを配置することで、両方のフレームワークの類似性を識別できます。 While Markov chain Monte Carlo methods (MCMC) provide a general framework to sample from a probability distribution defined up to normalization, they often suffer from slow convergence to the target distribution when the latter is highly multi-modal. Recently, Generative Flow Networks (GFlowNets) have been proposed as an alternative framework to mitigate this issue when samples have a clear compositional structure, by treating sampling as a sequential decision making problem. Although they were initially introduced from the perspective of flow networks, the recent advances of GFlowNets draw more and more inspiration from the Markov chain literature, bypassing completely the need for flows. In this paper, we formalize this connection and offer a new perspective for GFlowNets using Markov chains, showing a unifying view for GFlowNets regardless of the nature of the state space as recurrent Markov chains. Positioning GFlowNets under the same theoretical framework as MCMC methods also allows us to identify the similarities between both frameworks, and most importantly to highlight their	翻訳日:2023-07-06 18:35:33 公開日:2023-07-04
# 創発的データ駆動型プロトタイプによる教師なし特徴学習 Unsupervised Feature Learning with Emergent Data-Driven Prototypicality ( http://arxiv.org/abs/2307.01421v1 ) ライセンス: Link先を確認	Yunhui Guo, Youren Zhang, Yubei Chen, Stella X. Yu	(参考訳) ラベルのない画像集合が与えられた場合、我々の目標は、それぞれの画像を特徴空間内の点にマッピングするモデルを訓練することであり、近接が視覚的な類似性を示すだけでなく、その画像がデータセットに従ってどのように原型的であるかを直接エンコードする。私たちの重要な洞察は、ユークリッド空間ではなく双曲空間で教師なしの機能学習を行うことです。そこでは、点間の距離は依然として画像の類似性を反映しつつ、点の位置によって原型性を表現できる。すなわち、原点に近いほど原型的である。後者の性質は、通常の距離学習(metric learning)の目的を最適化することから単純に発せられる: 多くの訓練例に類似したイメージはユークリッド空間の対応する点の中心に配置されるが、双曲空間の原点に近い。球状パッキングを用いたハイパーボリック空間における教師なし特徴学習アルゴリズム(HACK)を提案する。 HACKはまず、双曲空間のポアンカレ球に一様に充填された粒子を生成し、各粒子にそれぞれの画像を一意に割り当てる。凝縮後の画像は、そのデータセットのより典型的なものとみなされる。我々の特徴マッパーは、単に双曲空間のトレーニングインスタンスを広げるように訓練されただけで、画像が結束によって原点に近づくのを観察し、教師なしの原型発見という考え方を検証する。サンプルの複雑さを低減し、非定型インスタンスによるモデル一般化を増加させ、典型的なインスタンスとの堅牢性を高めるため、データ駆動型プロトティピカリティは簡単で優れた非教師なしインスタンス選択を提供する。 Given an image set without any labels, our goal is to train a model that maps each image to a point in a feature space such that proximity indicates visual similarity and the location of the point directly encodes how prototypical the image is according to the dataset. Our key insight is to perform unsupervised feature learning in hyperbolic instead of Euclidean space, where the distance between points still reflects image similarity, and yet we gain additional capacity for representing prototypicality with the location of the point: The closer it is to the origin, the more prototypical it is. The latter property is simply emergent from optimizing the usual metric learning objective: The image similar to many training instances is best placed at the center of corresponding points in Euclidean space, but closer to the origin in hyperbolic space. We propose an unsupervised feature learning algorithm in Hyperbolic space with sphere pACKing. HACK first generates uniformly packed particles in the Poincaré ball of hyperbolic space and then assigns each image uniquely to a particle. 
Images after congealing are regarded as more typical of the dataset they belong to. With our feature mapper simply trained to spread out training instances in hyperbolic space, we observe that images move closer to the origin with congealing, validating our idea of unsupervised prototypicality discovery. We demonstrate that our data-driven prototypicality provides an easy and superior unsupervised instance selection to reduce sample complexity, increase model generalization with atypical instances and robustness with typical ones.	翻訳日:2023-07-06 18:35:14 公開日:2023-07-04
# コミュニティqaプラットフォームユーザの質問タグ行動分析に基づくタグ予測のモデル化 Modeling Tag Prediction based on Question Tagging Behavior Analysis of CommunityQA Platform Users ( http://arxiv.org/abs/2307.01420v1 ) ライセンス: Link先を確認	Kuntal Kumar Pal, Michael Gamon, Nirupama Chandrasekaran and Silviu Cucerzan	(参考訳) コミュニティの質問応答プラットフォームでは、タグは効果的な情報組織化と検索、より良い質問ルーティング、質問への迅速な回答、トピックの人気評価において重要な役割を果たす。したがって、投稿のタグを予測および提案するための自動アシストは、そのようなプラットフォームのユーザにとって非常に有用である。多様なコミュニティやドメインにまたがるタグ予測を改善するため、17のStackExchangeコミュニティにおいて,ユーザのタグ付け動作を徹底的に分析した。これらの多様な領域において、この挙動の様々な共通する性質が発見された。この結果を用いて、各質問に対して人気のあるタグとより粒度の細かいタグの両方を予測する柔軟なニューラルタグ予測アーキテクチャを開発した。我々のモデルの有効性を示す大規模な実験と得られた性能 In community question-answering platforms, tags play essential roles in effective information organization and retrieval, better question routing, faster response to questions, and assessment of topic popularity. Hence, automatic assistance for predicting and suggesting tags for posts is of high utility to users of such platforms. To develop better tag prediction across diverse communities and domains, we performed a thorough analysis of users' tagging behavior in 17 StackExchange communities. We found various common inherent properties of this behavior in those diverse domains. We used the findings to develop a flexible neural tag prediction architecture, which predicts both popular tags and more granular tags for each question. Our extensive experiments and obtained performance show the effectiveness of our model	翻訳日:2023-07-06 18:34:45 公開日:2023-07-04
# AdAM:Adaptation-Aware Kernel ModulationによるFew-Shot画像生成 AdAM: Few-Shot Image Generation via Adaptation-Aware Kernel Modulation ( http://arxiv.org/abs/2307.01465v1 ) ライセンス: Link先を確認	Yunqing Zhao, Keshigeyan Chandrasegaran, Abdollahzadeh Milad, Chao Du, Tianyu Pang, Ruoteng Li, Henghui Ding, Ngai-Man Cheung	(参考訳) Few-shot Image Generation (FSIG)は、少数のトレーニングサンプル(例:10)が与えられた新しい多様な画像を生成することを目的としている。最近の研究は、大規模なソースドメインで事前訓練されたGANを活用し、ターゲットドメインに適応することでFSIGに対処している。最近のFSIG手法の中心は知識保存基準であり、適応されたモデルにソース知識のサブセットを選択し保存する。しかし、既存の方法の大きな制限は、知識保存基準がソースドメイン/タスクのみを考慮し、ソース知識の選択においてターゲットドメイン/適応を考慮せず、ソースドメインとターゲットドメインの近接性の異なる設定に適合性に疑問を投げかけることである。私たちの仕事は2つの貢献をする。まず,最近のFSIG研究とその実験について再検討する。ソースドメインとターゲットドメインの近接性が緩和されるという仮定の下では、知識保存におけるソースドメインのみを考慮した既存のsota(state-of-the-art)メソッドがベースラインメソッドよりも優れていることが判明した。第2の貢献として、異なるソース・ターゲット領域近接の一般FSIGに対してAdaptation-Aware kernel Modulation (AdAM)を提案する。大規模な実験により、AdAMはFSIGのSOTAパフォーマンスを一貫して達成し、ソースドメインとターゲットドメインがより分離された困難なセットアップを含むことを示した。 Few-shot image generation (FSIG) aims to learn to generate new and diverse images given few (e.g., 10) training samples. Recent work has addressed FSIG by leveraging a GAN pre-trained on a large-scale source domain and adapting it to the target domain with few target samples. Central to recent FSIG methods are knowledge preservation criteria, which select and preserve a subset of source knowledge to the adapted model. However, a major limitation of existing methods is that their knowledge preserving criteria consider only source domain/task and fail to consider target domain/adaptation in selecting source knowledge, casting doubt on their suitability for setups of different proximity between source and target domain. Our work makes two contributions. Firstly, we revisit recent FSIG works and their experiments. 
We reveal that under setups in which the assumption of close proximity between source and target domains is relaxed, many existing state-of-the-art (SOTA) methods that consider only the source domain in knowledge preservation perform no better than a baseline method. As our second contribution, we propose Adaptation-Aware kernel Modulation (AdAM) for general FSIG of different source-target domain proximity. Extensive experiments show that AdAM consistently achieves SOTA performance in FSIG, including challenging setups where source and target domains are further apart.	翻訳日:2023-07-06 18:28:41 公開日:2023-07-04
# 単一フレームと重み付き逐次視覚位置認識の改善のための教師なし品質予測 Unsupervised Quality Prediction for Improved Single-Frame and Weighted Sequential Visual Place Recognition ( http://arxiv.org/abs/2307.01464v1 ) ライセンス: Link先を確認	Helen Carson, Jason J. Ford, Michael Milford	(参考訳) ローカライゼーションと視覚位置認識 (vpr) 技術の絶対的な性能では大きな進歩が見られたが、完全性と予測可能性といった他の能力が、特に安全性や運用上重要な自律システムにおいて重要であることに、これらのシステムをアプリケーションに変換することはますます明確になりつつある。本研究では,局所化推定の確率的品質を予測するための新しいトレーニングフリーアプローチと,これらの予測を用いてシーケンスマッチングプロセスをバイアスし,ナイーブシーケンスマッチングアプローチ以上のパフォーマンス向上を実現する新しい手法を提案する。我々の統合システムは軽量であり、リアルタイムに動作し、基礎となるVPR技術とは無関係である。 4つのデータセットと3つのVPR技術にわたる広範な実験を行い、特に高精度/低リコール動作点における精度向上を実証した。また,予測と重み付きシーケンスマッチングコンポーネントの性能寄与を分離したアブレーションと解析を行い,予測システムの品質と重み付きシーケンスマッチング器の利点との関係について検討した。 While substantial progress has been made in the absolute performance of localization and Visual Place Recognition (VPR) techniques, it is becoming increasingly clear from translating these systems into applications that other capabilities like integrity and predictability are just as important, especially for safety- or operationally-critical autonomous systems. In this research we present a new, training-free approach to predicting the likely quality of localization estimates, and a novel method for using these predictions to bias a sequence-matching process to produce additional performance gains beyond that of a naive sequence matching approach. Our combined system is lightweight, runs in real-time and is agnostic to the underlying VPR technique. On extensive experiments across four datasets and three VPR techniques, we demonstrate our system improves precision performance, especially at the high-precision/low-recall operating point. We also present ablation and analysis identifying the performance contributions of the prediction and weighted sequence matching components in isolation, and the relationship between the quality of the prediction system and the benefits of the weighted sequential matcher.	翻訳日:2023-07-06 18:28:16 公開日:2023-07-04
# 実用的なコラボレーティブ知覚:非同期およびマルチエージェント3dオブジェクト検出のためのフレームワーク Practical Collaborative Perception: A Framework for Asynchronous and Multi-Agent 3D Object Detection ( http://arxiv.org/abs/2307.01462v1 ) ライセンス: Link先を確認	Minh-Quan Dao, Julie Stephany Berrio, Vincent Frémont, Mao Shan, Elwan Héry, and Stewart Worrall	(参考訳) 本稿では,LiDARを用いた単車体3次元物体検出モデルの改良を行い,その容量を個々の点雲の代わりにプロセスポイントクラウドシーケンスに拡張する。本稿では,複数フレーム検出モデルの検出精度を高めるため,点雲の連結における影効果の補正に関するこれまでの研究を拡張した。拡張にはHD Mapの導入とOracleモデルの蒸留が含まれています。次に、V2X通信によるマルチエージェント協調による単車認識の性能をさらに向上させる。我々は,単一車両検出モデルの変更やエージェント間同期の仮定を最小限に抑えながら,従来技術よりも帯域幅パフォーマンスのトレードオフを実現する,シンプルかつ効果的なコラボレーション手法を考案する。 v2x-simデータセットを用いた実験では,初期コラボレーションの0.03%に相当する遅延コラボレーションの帯域幅使用量を消費しながら,初期コラボレーションの98%のパフォーマンスを実現していることが示された。コードはhttps://github.com/quan-dao/practical-collab-perceptionでリリースされる。 In this paper, we improve the single-vehicle 3D object detection models using LiDAR by extending their capacity to process point cloud sequences instead of individual point clouds. In this step, we extend our previous work on rectification of the shadow effect in the concatenation of point clouds to boost the detection accuracy of multi-frame detection models. Our extension includes incorporating HD Map and distilling an Oracle model. Next, we further increase the performance of single-vehicle perception using multi-agent collaboration via Vehicle-to-everything (V2X) communication. We devise a simple yet effective collaboration method that achieves better bandwidth-performance tradeoffs than prior art while minimizing changes made to single-vehicle detection models and assumptions on inter-agent synchronization. Experiments on the V2X-Sim dataset show that our collaboration method achieves 98% of the performance of early collaboration while consuming the same bandwidth as late collaboration, which is 0.03% of that of early collaboration. The code will be released at https://github.com/quan-dao/practical-collab-perception.	翻訳日:2023-07-06 18:27:56 公開日:2023-07-04
# 量子キックロータにおけるリアプノフ指数の近似 Approximating Quantum Lyapunov Exponents in Quantum Kicked Rotor ( http://arxiv.org/abs/2307.01461v1 ) ライセンス: Link先を確認	Varsha Gupta	(参考訳) 本研究では,量子キックロータ(qkr)の動力学における初期近接状態の進化に着目し,量子カオスの研究を行う。本稿では,この量子系におけるカオスの度合いを量子リアプノフ指数(Quantum Lyapunov Exponent, QLE)を用いて定量化する手法を提案する。まず運動量空間をモデル化し、次にqleを進化状態間の忠実性を分析して計算し、量子カオス挙動に関する洞察を提供する。さらに, 局所化, 均一化, 拡散, 収縮, 運動量空間の振動など, 様々な初期状態についても調査を展開する。この結果は、量子カオスの複雑な性質を浮き彫りにして、様々な動的挙動を明らかにした。最後に,多面量子システムのダイナミクスの可視化と理解に潜在的に有意な意味を持つ,複雑状態を上述の状態の重ね合わせとして表現する革新的な最適化フレームワークを提案する。 In this work, we study quantum chaos by focusing on the evolution of initially close states in the dynamics of the Quantum Kicked Rotor (QKR). We propose a novel measure, the Quantum Lyapunov Exponent (QLE), to quantify the degree of chaos in this quantum system, analogous to its classical counterpart. We begin by modeling the momentum space; the QLE is then computed by analyzing the fidelity between evolving states, offering insights into the quantum chaotic behavior. Furthermore, we extend our investigations to various initial states: localized, uniform, spreading, contracting and oscillating in momentum space. Our results unveil a diverse range of dynamical behaviors, highlighting the complex nature of quantum chaos. Finally, we propose an innovative optimization framework to represent a complex state as a superposition of the aforementioned states, which has potential implications for visualizing and understanding the dynamics of multifaceted quantum systems.	翻訳日:2023-07-06 18:27:40 公開日:2023-07-04
# CARE-MI:母子保健における誤情報評価のための中国のベンチマーク CARE-MI: Chinese Benchmark for Misinformation Evaluation in Maternity and Infant Care ( http://arxiv.org/abs/2307.01458v1 ) ライセンス: Link先を確認	Tong Xiang, Liangzhi Li, Wangyue Li, Mingbai Bai, Lu Wei, Bowen Wang, Noa Garcia	(参考訳) NLPの最近の進歩は、LLMを現実世界のシナリオに適用する新しい傾向をもたらした。最新のLSMは、人間と対話するときに驚くほど流動的だが、意図せずに事実を偽造することによって誤情報問題に悩まされる。これにより、特に医療などのセンシティブなコンテキストで生成された場合、有害な結果が発生する可能性がある。しかし、LLMの長期化における誤情報の評価、特に知識集約的な話題に焦点を当てた以前の研究はほとんどない。さらに、LLMは様々な言語でうまく機能することが示されているが、誤情報評価は主に英語で行われている。そこで本研究では,LCM誤情報評価のためのベンチマークCARE-MIを提案する。 1)敏感な話題、具体的には母性及び乳幼児ケア領域 2) 英語以外の言語,すなわち中国語。最も重要なことは、他の知識集約型ドメインや低リソース言語に転送可能な長文生成評価ベンチマークを構築するための革新的なパラダイムを提供することです。提案するベンチマークは,LLMの広範利用と,これらのモデルが生成した誤情報を評価するためのデータセットの欠如とのギャップを埋めるものである。専門家による1,612の質問と、人間による参照が含まれている。以上の結果から,現在の中国のLSMは母性や乳幼児ケアの分野では完璧とは程遠いことが判明した。性能評価における人的資源への依存を最小限に抑えるため,ベンチマーク問題を用いてLLMの長期出力を自動的に評価する判断モデルを提案する。さらに、長期生成評価のための潜在的なソリューションを比較し、より堅牢で効率的な自動メトリクスを構築するための洞察を提供する。 The recent advances in NLP, have led to a new trend of applying LLMs to real-world scenarios. While the latest LLMs are astonishingly fluent when interacting with humans, they suffer from the misinformation problem by unintentionally generating factually false statements. This can lead to harmful consequences, especially when produced within sensitive contexts, such as healthcare. Yet few previous works have focused on evaluating misinformation in the long-form generation of LLMs, especially for knowledge-intensive topics. Moreover, although LLMs have been shown to perform well in different languages, misinformation evaluation has been mostly conducted in English. To this end, we present a benchmark, CARE-MI, for evaluating LLM misinformation in: 1) a sensitive topic, specifically the maternity and infant care domain; and 2) a language other than English, namely Chinese. 
Most importantly, we provide an innovative paradigm for building long-form generation evaluation benchmarks that can be transferred to other knowledge-intensive domains and low-resourced languages. Our proposed benchmark fills the gap between the extensive usage of LLMs and the lack of datasets for assessing the misinformation generated by these models. It contains 1,612 expert-checked questions, accompanied by human-selected references. Using our benchmark, we conduct extensive experiments and find that current Chinese LLMs are far from perfect in the topic of maternity and infant care. In an effort to minimize the reliance on human resources for performance evaluation, we offer a judgment model for automatically assessing the long-form output of LLMs using the benchmark questions. Moreover, we compare potential solutions for long-form generation evaluation and provide insights for building more robust and efficient automated metrics.	翻訳日:2023-07-06 18:27:24 公開日:2023-07-04
# ブラックホール内部の非等距離ホログラフィーモデルにおけるホーキング放射からの情報を取得する:理論と量子シミュレーション Retrieving information from Hawking radiation in the non-isometric holographic model of black hole interior: theory and quantum simulations ( http://arxiv.org/abs/2307.01454v1 ) ライセンス: Link先を確認	Ran Li, Xuanhua Wang, Kun Zhang, Jin Wang	(参考訳) 近年、ブラックホール情報パズルの潜在的な解決策として、ブラックホール内部の非等尺的ホログラムモデルが提案されている。このモデルはブラックホールのダイナミクスの2つの記述を提供する: 有効場記述と量子重力の基本記述である。このモデルの重要な側面は、ブラックホールの内部の有効場記述におけるヒルベルト空間から基本自由度へのホログラフィック写像は線型であるが非等距離写像である。本研究では、ブラックホール内部の非等尺ホログラフィーモデルに基づいて、Hayden-Preskillプロトコルの修正版を提案し、ホーキング放射の復号から情報を取り出すことが可能なデカップリング条件を示す。ブラックホール内部のダイナミクスの完全な知識を仮定し,修正ヘイデン・プレススキルプロトコルのデコードに吉田・キタエフのデコード戦略をどのように活用するかを検討する。さらに、7ビットのIBM量子プロセッサ上で確率的および決定論的デコード戦略の実験を行い、解析結果の検証を行い、非等尺モデルにおける情報検索の可能性を確認する。この研究は、量子プロセッサのブラックホール情報問題を探究するより多くの関心を刺激する。 Recently, a non-isometric holographic model of the black hole interior \cite{Akers:2022qdl} was proposed as a potential solution to the long-standing black hole information puzzle. This model provides two descriptions of the black hole dynamics: the effective field description and the fundamental description of the quantum gravity. The key aspect of this model is that the holographic map from the Hilbert space in the effective field description of the black hole interior to the fundamental degrees of freedom is linear but non-isometric. In this study, based on the non-isometric holographic model of black hole interior, we propose a modified version of Hayden-Preskill protocol and demonstrate the decoupling condition under which retrieving information from decoding Hawking radiation is feasible. Assuming the full knowledge of the dynamics of the black hole interior, we investigate how Yoshida-Kitaev decoding strategy can be employed to decode the modified Hayden-Preskill protocol. 
Furthermore, we perform experimental tests of both probabilistic and deterministic decoding strategies on 7-qubit IBM quantum processors to validate our analytical findings and confirm the feasibility of retrieving information in the non-isometric model. This study may stimulate more interest in exploring the black hole information problem on quantum processors.	翻訳日:2023-07-06 18:26:57 公開日:2023-07-04
# 対話状態追跡のための多種多様な検索学習 Diverse Retrieval-Augmented In-Context Learning for Dialogue State Tracking ( http://arxiv.org/abs/2307.01453v1 ) ライセンス: Link先を確認	Brendan King and Jeffrey Flanigan	(参考訳) タスク指向対話の収集と注釈付けのコストが高いため,対話状態追跡(DST)におけるゼロと少数ショット学習に大きな関心が寄せられている。近年の研究では、コンテキスト内学習では、データやパラメータの更新がほとんど必要とせず、トレーニング済みのメソッドをわずかに超えている(hu et al. 2022)。本稿では,DSTの文脈内学習に3つの進歩をもたらしたRefPyDSTを提案する。まず、DSTをPythonプログラミングタスクとして定式化し、Pythonの変数参照として言語コア参照を明示的にモデル化する。第2に、コンテキスト内学習は文脈の例に大きく依存するため、性能向上のための多様な事例を抽出する手法を提案する。最後に, 競合する表面形状の確率を考慮したデコード中の新しい再重み付け手法を導入し, より正確な対話状態予測を行う。提案手法をMultiWOZを用いて評価し、ゼロおよび少数ショット設定で最先端のマルチドメイン共同ゴール精度を実現する。 There has been significant interest in zero and few-shot learning for dialogue state tracking (DST) due to the high cost of collecting and annotating task-oriented dialogues. Recent work has demonstrated that in-context learning requires very little data and zero parameter updates, and even outperforms trained methods in the few-shot setting (Hu et al. 2022). We propose RefPyDST, which advances the state of the art with three advancements to in-context learning for DST. First, we formulate DST as a Python programming task, explicitly modeling language coreference as variable reference in Python. Second, since in-context learning depends highly on the context examples, we propose a method to retrieve a diverse set of relevant examples to improve performance. Finally, we introduce a novel re-weighting method during decoding that takes into account probabilities of competing surface forms, and produces a more accurate dialogue state prediction. We evaluate our approach using MultiWOZ and achieve state-of-the-art multi-domain joint-goal accuracy in zero and few-shot settings.	翻訳日:2023-07-06 18:26:36 公開日:2023-07-04
# 因果強化学習:調査 Causal Reinforcement Learning: A Survey ( http://arxiv.org/abs/2307.01452v1 ) ライセンス: Link先を確認	Zhihong Deng, Jing Jiang, Guodong Long, Chengqi Zhang	(参考訳) 強化学習は不確実性下での逐次的決定問題を解決する上で不可欠なパラダイムである。近年の多くの業績にもかかわらず、現実世界での強化学習手法の適用は依然として困難である。主な障害の1つは、強化学習エージェントが世界に対する根本的な理解を欠いているため、多くの試行錯誤相互作用を通じてゼロから学ぶ必要があることである。また、意思決定の説明を提供し、獲得した知識を一般化する上でも課題に直面している。しかし因果性は、体系的な方法で知識を形式化し、効果的な知識伝達のために不変性を活用することができるため、顕著な利点を提供する。これは、因果関係を学習プロセスに組み込むことで既存のアルゴリズムを強化することを目指す強化学習のサブフィールドである因果関係強化学習の出現につながった。本稿では,因果強化学習に関する文献を総合的に検討する。まず,因果関係と強化学習の基本概念を紹介し,因果関係が非因果関係強化学習の核となる課題にどのように対処できるかを説明する。我々は,既存の因果強化学習アプローチを対象問題と方法論に基づいて分類し,体系的に検討する。最後に,この新興分野におけるオープンイシューと今後の方向性について概説する。 Reinforcement learning is an essential paradigm for solving sequential decision problems under uncertainty. Despite many remarkable achievements in recent decades, applying reinforcement learning methods in the real world remains challenging. One of the main obstacles is that reinforcement learning agents lack a fundamental understanding of the world and must therefore learn from scratch through numerous trial-and-error interactions. They may also face challenges in providing explanations for their decisions and generalizing the acquired knowledge. Causality, however, offers a notable advantage as it can formalize knowledge in a systematic manner and leverage invariance for effective knowledge transfer. This has led to the emergence of causal reinforcement learning, a subfield of reinforcement learning that seeks to enhance existing algorithms by incorporating causal relationships into the learning process. In this survey, we comprehensively review the literature on causal reinforcement learning. We first introduce the basic concepts of causality and reinforcement learning, and then explain how causality can address core challenges in non-causal reinforcement learning. We categorize and systematically review existing causal reinforcement learning approaches based on their target problems and methodologies. 
Finally, we outline open issues and future directions in this emerging field.	翻訳日:2023-07-06 18:26:20 公開日:2023-07-04
# 実験データと観測データを組み合わせた二重機械学習手法 A Double Machine Learning Approach to Combining Experimental and Observational Data ( http://arxiv.org/abs/2307.01449v1 ) ライセンス: Link先を確認	Marco Morucci, Vittorio Orlandi, Harsh Parikh, Sudeepa Roy, Cynthia Rudin, Alexander Volfovsky	(参考訳) 実験的かつ観察的な研究は、しばしば検証不能な仮定のために妥当性を欠いている。本研究では,実験研究と観察研究を組み合わせた2つの機械学習手法を提案する。我々のフレームワークは、より穏やかな仮定の下で外部の妥当性と無知の違反をテストします。 1つの仮定に違反した場合、半パラメトリックに効率的な治療効果推定器を提供する。しかし,本定理は,一貫した処理効果推定のための仮定を正確に同定する必要性を強調している。実世界の3つのケーススタディにおいて,本手法の適用性を実証し,実践的設定との関連を強調した。 Experimental and observational studies often lack validity due to untestable assumptions. We propose a double machine learning approach to combine experimental and observational studies, allowing practitioners to test for assumption violations and estimate treatment effects consistently. Our framework tests for violations of external validity and ignorability under milder assumptions. When only one assumption is violated, we provide semi-parametrically efficient treatment effect estimators. However, our no-free-lunch theorem highlights the necessity of accurately identifying the violated assumption for consistent treatment effect estimation. We demonstrate the applicability of our approach in three real-world case studies, highlighting its relevance for practical settings.	翻訳日:2023-07-06 18:26:02 公開日:2023-07-04
# ReactIE:Weak Supervisionによる化学反応抽出の強化 ReactIE: Enhancing Chemical Reaction Extraction with Weak Supervision ( http://arxiv.org/abs/2307.01448v1 ) ライセンス: Link先を確認	Ming Zhong, Siru Ouyang, Minhao Jiang, Vivian Hu, Yizhu Jiao, Xuan Wang, Jiawei Han	(参考訳) 構造化化学反応情報は、実験やコンピュータ支援医薬品設計などの先進的な取り組みに携わる化学者にとって重要な役割を担っている。科学文献から構造化された反応を抽出することの重要性にもかかわらず、この目的のためのデータアノテーションは、ドメインの専門家が必要とする膨大な労力のためにコストがかかる。したがって、十分なトレーニングデータの不足は、この分野における関連するモデルの進歩の障害となる。本稿では,事前学習のための2つの弱い教師付きアプローチを組み合わせたreactieを提案する。本手法は, テキスト中の頻繁なパターンを言語的手がかりとして, 化学反応の特徴を同定する。さらに,特許記録からの合成データを遠隔監視として採用し,ドメイン知識をモデルに組み込む。実験によると、ReactIEは大幅に改善され、既存のベースラインをすべて上回っている。 Structured chemical reaction information plays a vital role for chemists engaged in laboratory work and advanced endeavors such as computer-aided drug design. Despite the importance of extracting structured reactions from scientific literature, data annotation for this purpose is cost-prohibitive due to the significant labor required from domain experts. Consequently, the scarcity of sufficient training data poses an obstacle to the progress of related models in this domain. In this paper, we propose ReactIE, which combines two weakly supervised approaches for pre-training. Our method utilizes frequent patterns within the text as linguistic cues to identify specific characteristics of chemical reactions. Additionally, we adopt synthetic data from patent records as distant supervision to incorporate domain knowledge into the model. Experiments demonstrate that ReactIE achieves substantial improvements and outperforms all existing baselines.	翻訳日:2023-07-06 18:25:52 公開日:2023-07-04
# 高密度変動を有する3次元点雲上のセマンティックセグメンテーション Semantic Segmentation on 3D Point Clouds with High Density Variations ( http://arxiv.org/abs/2307.01489v1 ) ライセンス: Link先を確認	Ryan Faulkner, Luke Haub, Simon Ratcliffe, Ian Reid, Tat-Jun Chin	(参考訳) 調査用LiDARスキャンは、広範囲および長距離にわたる測定値を取得し、局所密度の異なる大規模な3Dポイント雲を生成する。既存の3Dセマンティクスセグメンテーションモデルは、様々な点密度に対して頑健性を構築するために、ダウンサンプリングとアップサンプリングを行うが、測量アプリケーションからの点雲の特徴である大きな局所密度変動では効果が低くなる。この弱点を解消するため、我々はHDVNetと呼ばれる新しいアーキテクチャを提案し、それぞれが特定の点密度範囲を扱うエンコーダ-デコーダ経路のネストセットを含む。特徴写像間の相互接続を制限することで、HDVNetは低密度オブジェクトに存在しない高密度特徴の重み付けのような点の密度に基づいて各特徴の信頼性を測定することができる。入力密度の変動を効果的に処理することにより、HDVNetは、半分以上の重みを使って、実点雲上のセグメント化精度で最先端のモデルより優れる。 LiDAR scanning for surveying applications acquires measurements over wide areas and long distances, which produces large-scale 3D point clouds with significant local density variations. While existing 3D semantic segmentation models conduct downsampling and upsampling to build robustness against varying point densities, they are less effective under the large local density variations characteristic of point clouds from surveying applications. To alleviate this weakness, we propose a novel architecture called HDVNet that contains a nested set of encoder-decoder pathways, each handling a specific point density range. Limiting the interconnections between the feature maps enables HDVNet to gauge the reliability of each feature based on the density of a point, e.g., downweighting high density features not existing in low density objects. By effectively handling input density variations, HDVNet outperforms state-of-the-art models in segmentation accuracy on real point clouds with inconsistent density, using just over half the weights.	翻訳日:2023-07-06 18:18:33 公開日:2023-07-04
# SCAT: テキスト分類のための逆学習による頑健な自己教師型コントラスト学習 SCAT: Robust Self-supervised Contrastive Learning via Adversarial Training for Text Classification ( http://arxiv.org/abs/2307.01488v1 ) ライセンス: Link先を確認	Junjie Wu, Dit-Yan Yeung	(参考訳) 様々な自然言語処理(NLP)タスクにおける有望なパフォーマンスにもかかわらず、現在のNLPシステムはテキストの敵対攻撃に対して脆弱である。これらの攻撃から防御するために、既存の方法の多くは、敵の例を取り入れて敵の訓練を適用する。しかし、これらの手法は逆の例を生成するために接地ラベルに依存する必要があり、現在ではNLPや他の多くのタスクで一般的に使用される大規模モデルの事前学習には実用的でない。本稿では、ラベル付きデータを必要としない堅牢な表現を学習できるSCAT(Self-supervised Contrastive Learning via Adversarial Training)という新しい学習フレームワークを提案する。特にSCATは、データのランダムな拡張をラベルのない方法で修正し、逆例を生成する。敵の訓練は、増強と敵との対比的損失を最小化することで達成される。最近提案された2つの最先端攻撃方式を用いて、2つのテキスト分類データセット上でSCATを評価する。以上の結果から,SCATはスクラッチから頑健な言語モデルを訓練できるだけでなく,既存の事前学習言語モデルの堅牢性を大幅に向上させることができることがわかった。さらに,その柔軟性を示すために,SCATと教師付き対向訓練を組み合わせることで,モデルのロバスト性をさらに向上できることを示す。 Despite their promising performance across various natural language processing (NLP) tasks, current NLP systems are vulnerable to textual adversarial attacks. To defend against these attacks, most existing methods apply adversarial training by incorporating adversarial examples. However, these methods have to rely on ground-truth labels to generate adversarial examples, rendering them impractical for large-scale model pre-training which is commonly used nowadays for NLP and many other tasks. In this paper, we propose a novel learning framework called SCAT (Self-supervised Contrastive Learning via Adversarial Training), which can learn robust representations without requiring labeled data. Specifically, SCAT modifies random augmentations of the data in a fully label-free manner to generate adversarial examples. Adversarial training is achieved by minimizing the contrastive loss between the augmentations and their adversarial counterparts. We evaluate SCAT on two text classification datasets using two state-of-the-art attack schemes proposed recently. 
Our results show that SCAT can not only train robust language models from scratch, but it can also significantly improve the robustness of existing pre-trained language models. Moreover, to demonstrate its flexibility, we show that SCAT can also be combined with supervised adversarial training to further enhance model robustness.	翻訳日:2023-07-06 18:18:13 公開日:2023-07-04
# h-denseformer : マルチモーダル腫瘍セグメンテーションのための高効率ハイブリッド結合トランスフォーマー H-DenseFormer: An Efficient Hybrid Densely Connected Transformer for Multimodal Tumor Segmentation ( http://arxiv.org/abs/2307.01486v1 ) ライセンス: Link先を確認	Jun Shi, Hongyu Kan, Shulan Ruan, Ziqi Zhu, Minfan Zhao, Liang Qiao, Zhaohui Wang, Hong An, Xudong Xue	(参考訳) 近年,多変量医用画像の腫瘍分割に深層学習法が広く用いられており,有望な結果が得られている。しかし、既存の手法のほとんどは、表現能力の不足、特定のモダリティ数、高い計算複雑性によって制限されている。本稿では,畳み込みニューラルネットワーク (cnn) の表現力とトランスフォーミング構造を組み合わせた,h-denseformerという腫瘍セグメント化のためのハイブリッドネットワークを提案する。具体的には、h-denseformerはトランスフォーマティブベースのマルチパス並列埋め込み(mpe)モジュールを統合し、任意の数のモダリティを入力として、異なるモダリティから融合特徴を抽出することができる。その後、マルチモーダル融合機能はエンコーダの異なるレベルに配信され、マルチモーダル学習表現が強化される。さらに,Densely Connected Transformer (DCT) ブロックを設計して,標準的な Transformer ブロックを置き換えることにより,計算量を大幅に削減する。公開マルチモーダルデータセットであるHECKTOR21とPI-CAI22について広範な実験を行った。実験の結果,提案手法は計算の複雑さを低減しつつ,既存の最先端手法よりも優れていることがわかった。ソースコードはhttps://github.com/shijun18/H-DenseFormerで入手できる。 Recently, deep learning methods have been widely used for tumor segmentation of multimodal medical images with promising results. However, most existing methods are limited by insufficient representational ability, specific modality number and high computational complexity. In this paper, we propose a hybrid densely connected network for tumor segmentation, named H-DenseFormer, which combines the representational power of the Convolutional Neural Network (CNN) and the Transformer structures. Specifically, H-DenseFormer integrates a Transformer-based Multi-path Parallel Embedding (MPE) module that can take an arbitrary number of modalities as input to extract the fusion features from different modalities. Then, the multimodal fusion features are delivered to different levels of the encoder to enhance multimodal learning representation. Besides, we design a lightweight Densely Connected Transformer (DCT) block to replace the standard Transformer block, thus significantly reducing computational complexity. 
We conduct extensive experiments on two public multimodal datasets, HECKTOR21 and PI-CAI22. The experimental results show that our proposed method outperforms the existing state-of-the-art methods while having lower computational complexity. The source code is available at https://github.com/shijun18/H-DenseFormer.	翻訳日:2023-07-06 18:17:50 公開日:2023-07-04
# Nexus sine qua non:多変量時系列の時空間予測のための基本結合ニューラルネットワーク Nexus sine qua non: Essentially connected neural networks for spatial-temporal forecasting of multivariate time series ( http://arxiv.org/abs/2307.01482v1 ) ライセンス: Link先を確認	Tong Nie, Guoyang Qin, Yunpeng Wang, Jian Sun	(参考訳) 多変量時系列のモデリングと予測は、実践者の意思決定を促進するだけでなく、基礎となる力学系の科学的理解を深める。近年,時空間グラフニューラルネットワーク(STGNN)が強力な予測器として登場し,時空間表現を学習するためのデファクトモデルとなっている。しかし、既存のSTGNNのアーキテクチャは、一連の派手なレイヤーを積み重ねることで複雑になりがちである。設計されたモデルは冗長か謎めいたものであり、複雑さと拡張性に大きな課題をもたらす。このような懸念から、私たちは現代のSTGNNの設計を再検討し、強力で効率的な神経予測に寄与するコア原則を特定できます。本稿では,高密度エンコーダデコーダとノード識別によるメッセージパッシング層によって完全に定義された,TCN,RNN,Transformerなどの複雑な逐次モジュールを持たない,コンパクトな予測モデルを提案する。実験的な結果は、適切な帰納的ベースを持つ単純でエレガントなモデルが、空間的時間的予測問題に対してより解釈可能で計算的に効率的でありながら、芸術の状態と精巧な設計を適切に比較できることを示している。我々は、より簡潔な神経予測アーキテクチャの設計を再考するために、将来の研究のための新たな地平を開くことを願っている。 Modeling and forecasting multivariate time series not only facilitates the decision making of practitioners, but also deepens our scientific understanding of the underlying dynamical systems. Spatial-temporal graph neural networks (STGNNs) have emerged as powerful predictors and have become the de facto models for learning spatiotemporal representations in recent years. However, existing architectures of STGNNs tend to be complicated by stacking a series of fancy layers. The designed models could be either redundant or enigmatic, which pose great challenges on their complexity and scalability. Such concerns prompt us to re-examine the designs of modern STGNNs and identify core principles that contribute to a powerful and efficient neural predictor. Here we present a compact predictive model that is fully defined by a dense encoder-decoder and a message-passing layer, powered by node identifications, without any complex sequential modules, e.g., TCNs, RNNs, and Transformers. Empirical results demonstrate how a simple and elegant model with proper inductive basis can compare favorably w.r.t. 
the state of the art with elaborate designs, while being much more interpretable and computationally efficient for spatial-temporal forecasting problems. We hope our findings open new horizons for future studies to revisit the design of more concise neural forecasting architectures.	翻訳日:2023-07-06 18:17:25 公開日:2023-07-04
# 量子プログラムのブラックボックステストにおける等価性、同一性、ユニタリティチェック Equivalence, Identity, and Unitarity Checking in Black-Box Testing of Quantum Programs ( http://arxiv.org/abs/2307.01481v1 ) ライセンス: Link先を確認	Peixun Long and Jianjun Zhao	(参考訳) 量子プログラムは本質的に非決定論的振る舞いを示し、従来のプログラムよりもエラー発見に重大な課題をもたらす。量子プログラムにはいくつかのテスト手法が提案されているが、ブラックボックステストの基本的な問題を見落としていることが多い。本稿では,量子プログラムのブラックボックステストにおける等価性,同一性,ユニタリティチェックの課題に対処するために特別に設計された3つの新しいアルゴリズムを提案することで,このギャップを埋める。また、等価度とユニタリティチェックの専門バージョンを含むこれらのアルゴリズムの最適化手法についても検討し、パラメータ選択に関する貴重な洞察を提供し、性能と有効性を最大化する。提案手法の有効性を評価するため,提案手法は量子プログラムのブラックボックステストに頑健なサポートを提供し,等価性,アイデンティティ,ユニタリティチェックを厳格に行うことができることを示す総合的な実験評価を行った。 Quantum programs exhibit inherent non-deterministic behavior, which poses more significant challenges for error discovery compared to classical programs. While several testing methods have been proposed for quantum programs, they often overlook fundamental questions in black-box testing. In this paper, we bridge this gap by presenting three novel algorithms specifically designed to address the challenges of equivalence, identity, and unitarity checking in black-box testing of quantum programs. We also explore optimization techniques for these algorithms, including specialized versions for equivalence and unitarity checking, and provide valuable insights into parameter selection to maximize performance and effectiveness. To evaluate the effectiveness of our proposed methods, we conducted comprehensive experimental evaluations, which demonstrate that our methods can rigorously perform equivalence, identity, and unitarity checking, offering robust support for black-box testing of quantum programs.	翻訳日:2023-07-06 18:17:00 公開日:2023-07-04
# バイアス緩和:モデル説明の改善による画像分類の強化 Mitigating Bias: Enhancing Image Classification by Improving Model Explanations ( http://arxiv.org/abs/2307.01473v1 ) ライセンス: Link先を確認	Raha Ahmadi, Mohammad Javad Rajabi, Mohamamd Khalooiem Mohammad Sabokrou	(参考訳) ディープラーニングモデルは、トレーニングデータから複雑なパターンや概念を学ぶ際、顕著な能力を示した。しかし、近年の研究では、これらのモデルは画像の背景に存在する単純で容易に識別できる特徴に大きく依存する傾向にあることが示されている。この現象は、画像への関心の重要要素が隠蔽される可能性があるため、画像分類器に挑戦する。本稿では,この問題に対処する新しいアプローチを提案し,画像分類器による主概念の学習を改善する。我々の中心的な考え方は、分類作業中にモデルがフォアグラウンドに注意を向けるのを同時に導くことを中心に展開する。関心の主対象をカプセル化した前景を強調することで,背景の優越的な影響からモデルの焦点を逸脱させることを目指している。これを実現するために、モデルに十分な注意を前景に割り当てるよう促すメカニズムを導入する。損失関数の変更や追加のアーキテクチャコンポーネントの導入など,さまざまな戦略を検討し,画像内の主概念を効果的に把握できるようにする。さらに,様々な注意機構がモデル性能に与える影響について検討し,その効果について考察する。ベンチマークデータセットの広範な実験を通じて,画像分類器の分類精度を向上させるための提案手法の有効性を実証する。本研究は,画像内の主概念の理解と表現における前景的注意の重要性を浮き彫りにしたものである。本研究は,画像分類分野の進展に寄与し,より堅牢で正確なディープラーニングモデルの開発に有用な知見を提供する。 Deep learning models have demonstrated remarkable capabilities in learning complex patterns and concepts from training data. However, recent findings indicate that these models tend to rely heavily on simple and easily discernible features present in the background of images rather than the main concepts or objects they are intended to classify. This phenomenon poses a challenge to image classifiers as the crucial elements of interest in images may be overshadowed. In this paper, we propose a novel approach to address this issue and improve the learning of main concepts by image classifiers. Our central idea revolves around concurrently guiding the model's attention toward the foreground during the classification task. By emphasizing the foreground, which encapsulates the primary objects of interest, we aim to shift the focus of the model away from the dominant influence of the background. To accomplish this, we introduce a mechanism that encourages the model to allocate sufficient attention to the foreground. 
We investigate various strategies, including modifying the loss function or incorporating additional architectural components, to enable the classifier to effectively capture the primary concept within an image. Additionally, we explore the impact of different foreground attention mechanisms on model performance and provide insights into their effectiveness. Through extensive experimentation on benchmark datasets, we demonstrate the efficacy of our proposed approach in improving the classification accuracy of image classifiers. Our findings highlight the importance of foreground attention in enhancing model understanding and representation of the main concepts within images. The results of this study contribute to advancing the field of image classification and provide valuable insights for developing more robust and accurate deep-learning models.	翻訳日:2023-07-06 18:16:43 公開日:2023-07-04
# beyond conservatism: オフラインマルチエージェント強化学習における拡散ポリシー Beyond Conservatism: Diffusion Policies in Offline Multi-agent Reinforcement Learning ( http://arxiv.org/abs/2307.01472v1 ) ライセンス: Link先を確認	Zhuoran Li, Ling Pan and Longbo Huang	(参考訳) 本稿では,オフラインマルチエージェント強化学習(marl)のための拡散型オフラインマルチエージェントモデル(dom2)を提案する。政策設計における保守主義に主に依存する既存のアルゴリズムとは異なり、dom2はポリシー表現力と拡散に基づく多様性を高める。具体的には,ポリシーネットワークに拡散モデルを導入し,訓練における軌道に基づくデータ提供方式を提案する。これらの重要な要素により、我々のアルゴリズムは環境変化に対してより堅牢になり、性能、一般化、データ効率が大幅に向上した。実験の結果,DOM2はマルチエージェント粒子およびマルチエージェント MuJoCo 環境において既存の最先端手法よりも優れており,その表現性や多様性により,シフト環境において大幅に向上していることがわかった。さらに、DOM2はデータ効率が優れ、既存のアルゴリズムに比べて20ドル以上のデータで最先端のパフォーマンスを達成することができる。 We present a novel Diffusion Offline Multi-agent Model (DOM2) for offline Multi-Agent Reinforcement Learning (MARL). Different from existing algorithms that rely mainly on conservatism in policy design, DOM2 enhances policy expressiveness and diversity based on diffusion. Specifically, we incorporate a diffusion model into the policy network and propose a trajectory-based data-augmentation scheme in training. These key ingredients make our algorithm more robust to environment changes and achieve significant improvements in performance, generalization and data-efficiency. Our extensive experimental results demonstrate that DOM2 outperforms existing state-of-the-art methods in multi-agent particle and multi-agent MuJoCo environments, and generalizes significantly better in shifted environments thanks to its high expressiveness and diversity. Furthermore, DOM2 shows superior data efficiency and can achieve state-of-the-art performance with $20+$ times less data compared to existing algorithms.	翻訳日:2023-07-06 18:16:20 公開日:2023-07-04
# 運転者の視線推定と視線行動理解への応用 A Review of Driver Gaze Estimation and Application in Gaze Behavior Understanding ( http://arxiv.org/abs/2307.01470v1 ) ライセンス: Link先を確認	Pavan Kumar Sharma and Pranamesh Chakraborty	(参考訳) 運転者の視線は、運転者の注意力検出、視覚障害検出、視線行動理解、運転支援システムの構築など、様々な視線ベースのアプリケーションにおいて重要な役割を果たす。本研究の主な目的は,運転者視線の基礎,運転者視線推定方法,実世界の運転シナリオにおける応用の総合的な要約を行うことである。まず,ヘッドマウントおよびリモートセットアップに基づく視線推定を含むドライバの視線に関する基礎と,これらのデータ収集手法で使用される用語について論じる。次に、既存のベンチマークドライバの注視データセットをリストアップし、収集方法論とそのようなデータ収集に使用する機器を強調する。続いて、従来の機械学習とディープラーニングに基づくテクニックを中心に、ドライバの視線推定に使用されるアルゴリズムに関する議論が行われる。推定されたドライバーの視線は、交差点、オンランプ、オフランプ、車線変更、道路側広告構造の影響を判断しながら視線行動を理解するために使用される。最後に,運転者の視線推定と視線に基づく応用における既存の文献,課題,今後の展望について考察した。 Driver gaze plays an important role in different gaze-based applications such as driver attentiveness detection, visual distraction detection, gaze behavior understanding, and building driver assistance systems. The main objective of this study is to perform a comprehensive summary of driver gaze fundamentals, methods to estimate driver gaze, and its applications in real-world driving scenarios. We first discuss the fundamentals related to driver gaze, involving head-mounted and remote setup based gaze estimation and the terminologies used for each of these data collection methods. Next, we list the existing benchmark driver gaze datasets, highlighting the collection methodology and the equipment used for such data collection. This is followed by a discussion of the algorithms used for driver gaze estimation, which primarily involves traditional machine learning and deep learning based techniques. The estimated driver gaze is then used for understanding gaze behavior while maneuvering through intersections, on-ramps, off-ramps, lane changing, and determining the effect of roadside advertising structures. Finally, we discuss the limitations in the existing literature, challenges, and the future scope in driver gaze estimation and gaze-based applications.	翻訳日:2023-07-06 18:16:05 公開日:2023-07-04
# 単一ポートレートからアニマタブルな3次元カートゥーンの顔を生成する Generating Animatable 3D Cartoon Faces from Single Portraits ( http://arxiv.org/abs/2307.01468v1 ) ライセンス: Link先を確認	Chuanyu Pan, Guowei Yang, Taijiang Mu, and Yu-Kun Lai	(参考訳) 仮想現実(VR)技術のブームにより、カスタマイズされた3Dアバターの必要性が高まっている。しかし、従来の3Dアバターモデリングの手法は、時間を要するか、モデル化されている人物と類似性を維持するのに失敗する。 1枚の肖像画からアニマタブルな3Dマンガの顔を生成する新しい枠組みを提案する。まず、入力された現実世界のポートレートをStyleGANを用いてスタイライズされた漫画画像に変換する。次に, テンプレートモデルに基づく粗い推定を行い, 非剛性変形によるモデルをランドマーク監督下で洗練する, 詳細なテクスチャで3次元マンガ面を復元する2段階の再構成法を提案する。最後に,手作業によるテンプレート作成と変形伝達に基づく意味保存顔リギング手法を提案する。先行技術と比較すると, 質的, 定量的な結果から, 精度, 審美性, 類似性基準が向上した。さらに,我々の3次元モデルのリアルタイム顔アニメーションの能力を実演する。 With the boom in virtual reality (VR) technology, there is a growing need for customized 3D avatars. However, traditional methods for 3D avatar modeling are either time-consuming or fail to retain similarity to the person being modeled. We present a novel framework to generate animatable 3D cartoon faces from a single portrait image. We first transfer an input real-world portrait to a stylized cartoon image with a StyleGAN. Then we propose a two-stage reconstruction method to recover the 3D cartoon face with detailed texture, which first makes a coarse estimation based on template models, and then refines the model by non-rigid deformation under landmark supervision. Finally, we propose a semantic preserving face rigging method based on manually created templates and deformation transfer. Compared with prior art, qualitative and quantitative results show that our method achieves better accuracy, aesthetics, and similarity criteria. Furthermore, we demonstrate the capability of real-time facial animation of our 3D model.	翻訳日:2023-07-06 18:15:45 公開日:2023-07-04
# Ego4D長期活動予測チャレンジ2023の実施報告 Technical Report for Ego4D Long Term Action Anticipation Challenge 2023 ( http://arxiv.org/abs/2307.01467v1 ) ライセンス: Link先を確認	Tatsuya Ishibashi, Kosuke Ono, Noriyuki Kugo, Yuji Sato	(参考訳) 本稿では,Ego4D Long-Term Action Anticipation Challenge 2023に対するアプローチの技術的詳細について述べる。このタスクの目的は、入力されたビデオが与えられたとき、任意の時間以上で起こる、将来のアクションのシーケンスを予測することである。そこで本研究では,ビデオからクリップレベルの特徴を生成するエンコーダと,複数のクリップレベルの特徴を統合するアグリゲータと,将来的な動作を出力するデコーダの3つの改良点を紹介する。 1) SlowFast と SlowFast-CLIP のモデルアンサンブル 2) 今後の行動の順序制約を緩和するラベルの平滑化 3) 単語共起に基づく動作クラス(verb,noun)の予測を制約する。提案手法は, ベースライン性能を向上し, 公開リーダボード上の第2位ソリューションとして記録した。 In this report, we describe the technical details of our approach for the Ego4D Long-Term Action Anticipation Challenge 2023. The aim of this task is to predict a sequence of future actions that will take place at an arbitrary time or later, given an input video. To accomplish this task, we introduce three improvements to the baseline model, which consists of an encoder that generates clip-level features from the video, an aggregator that integrates multiple clip-level features, and a decoder that outputs Z future actions. 1) Model ensemble of SlowFast and SlowFast-CLIP; 2) Label smoothing to relax order constraints for future actions; 3) Constraining the prediction of the action class (verb, noun) based on word co-occurrence. Our method outperformed the baseline and ranked second on the public leaderboard.	翻訳日:2023-07-06 18:15:28 公開日:2023-07-04
# selffed: iomtにおけるデータ不均一性とラベル不足に対する自己教師付き連合学習 SelfFed: Self-supervised Federated Learning for Data Heterogeneity and Label Scarcity in IoMT ( http://arxiv.org/abs/2307.01514v1 ) ライセンス: Link先を確認	Sunder Ali Khowaja, Kapal Dev, Syed Muhammad Anwar, Marius George Linguraru	(参考訳) 連合学習パラダイムにおける自己教師あり学習は,ラベルなしで孤立したデータの協調学習能力により,産業と研究の両方において大きな関心を集めている。しかし,自己管理型フェデレート学習戦略は,ラベル不足や多種多様なデータ分布,すなわちデータ不均一性による性能劣化に悩まされている。本稿では,IoMT(Internet of Medical Things)のためのSelfFedフレームワークを提案する。提案するSelfFedフレームワークは2段階で動作する。第1フェーズは、スウィントランスベースのエンコーダを用いた拡張モデリングを分散的に実行する事前学習パラダイムである。 SelfFedフレームワークの第1フェーズは、データの不均一性を克服するのに役立つ。第2フェーズは、対照的なネットワークと、限定ラベル付きデータに基づいて訓練された新たな集約戦略を分散的に導入する、微調整パラダイムである。この微調整段階はラベル不足問題を克服する。我々は,医用画像データセットに関する実験分析を行い,非独立・同一分散(IID)データとラベル不足に関する既存のベースラインと比較して,提案するSelfFedフレームワークが優れていることを示す。非IIDデータセット上のRetinaおよびCOVID-FLデータセットの最大8.8%と4.1%の改善を実現する。さらに,提案手法は,少数の (10%) ラベル付きインスタンスでトレーニングしても,既存のベースラインよりも優れている。 Self-supervised learning in federated learning paradigm has been gaining a lot of interest both in industry and research due to the collaborative learning capability on unlabeled yet isolated data. However, self-supervised based federated learning strategies suffer from performance degradation due to label scarcity and diverse data distributions, i.e., data heterogeneity. In this paper, we propose the SelfFed framework for Internet of Medical Things (IoMT). Our proposed SelfFed framework works in two phases. The first phase is the pre-training paradigm that performs augmentive modeling using Swin Transformer based encoder in a decentralized manner. The first phase of SelfFed framework helps to overcome the data heterogeneity issue. The second phase is the fine-tuning paradigm that introduces contrastive network and a novel aggregation strategy that is trained on limited labeled data for a target task in a decentralized manner. This fine-tuning stage overcomes the label scarcity problem. 
We perform our experimental analysis on publicly available medical imaging datasets and show that our proposed SelfFed framework outperforms existing baselines under non-independent and identically distributed (non-IID) data and label scarcity. Our method achieves maximum improvements of 8.8% and 4.1% on the Retina and COVID-FL datasets, respectively, under non-IID data. Further, our proposed method outperforms existing baselines even when trained on a few (10%) labeled instances.	翻訳日:2023-07-06 18:09:44 公開日:2023-07-04
# コンテナ転位問題におけるエネルギー消費最小化のための転位ルールの自動設計 Automated design of relocation rules for minimising energy consumption in the container relocation problem ( http://arxiv.org/abs/2307.01513v1 ) ライセンス: Link先を確認	Marko Đurasević, Mateja Đumić, Rebeka Čorić, Francisco Javier Gil-Gala	(参考訳) コンテナ配置問題は、所定の目的を最小化し、すべてのコンテナを所定の順序で回収するコンテナ配置のシーケンスを見つけることを目的とした組合せ最適化問題である。リロケーションルール(RR)は、優先度関数とリロケーションスキームから構成されており、その柔軟性と効率性から、上記の問題を解決するために一般的に用いられるヒューリスティックである。近年,実世界の多くの問題において,エネルギー消費を考えることがますます重要になっている。しかし、この派生型にはRRは存在せず、手動で設計する必要がある。この問題を回避できる可能性の1つは、新しいRRを自動設計するために超ヒューリスティックスを適用することである。本研究では,RRにおけるエネルギー消費の最小化を目標とする優先関数の獲得に遺伝的プログラミングを用いる。提案手法を優先度関数の設計に用いた文献からの遺伝的アルゴリズムと比較する。その結果、遺伝的プログラミングによって設計されたRRが最高の性能を発揮することが示された。 The container relocation problem is a combinatorial optimisation problem aimed at finding a sequence of container relocations to retrieve all containers in a predetermined order by minimising a given objective. Relocation rules (RRs), which consist of a priority function and relocation scheme, are heuristics commonly used for solving the mentioned problem due to their flexibility and efficiency. Recently, in many real-world problems it is becoming increasingly important to consider energy consumption. However, for this variant no RRs exist and would need to be designed manually. One possibility to circumvent this issue is by applying hyperheuristics to automatically design new RRs. In this study we use genetic programming to obtain priority functions used in RRs whose goal is to minimise energy consumption. We compare the proposed approach with a genetic algorithm from the literature used to design the priority function. The results obtained demonstrate that the RRs designed by genetic programming achieve the best performance.	翻訳日:2023-07-06 18:09:20 公開日:2023-07-04
# 薬物-薬物相互作用予測のためのココントラスト学習と関係認識サブグラフ埋め込み Relation-aware subgraph embedding with co-contrastive learning for drug-drug interaction prediction ( http://arxiv.org/abs/2307.01507v1 ) ライセンス: Link先を確認	Mengying Jiang and Guizhong Liu and Biao Zhao and Yuanchao Su and Weiqiang Jin	(参考訳) リレーショナル・アウェア・サブグラフの埋め込みはDDI(multi-relational drug-drug interaction)の予測に有効である。通常、既存のほとんどの手法はDDIグラフの構築から始まり、DDIグラフから薬物の関連性認識サブグラフ埋め込み(RaSE)を学習する。しかしながら、既存のほとんどのアプローチは、新しい薬物のRaSEを学習するのに限られており、テストDDIがそのような薬物を含む場合、深刻な過学習をもたらす。そこで本稿では,ココントラスト学習を伴う関係認識部分グラフ埋め込みに基づく新しいDDI予測手法RaSECoを提案する。 RaSECoは、マルチリレーショナルDDIグラフとマルチ属性ベースのドラッグ・ドラッグ類似性(DDS)グラフという、2つの異種薬物グラフを構築している。 2つのグラフはそれぞれ、薬物のRaSEを学習し、伝播するために使用され、それによって新しい薬物を含む全ての薬物が効果的なRaSEを収集できる。さらに,薬物ペア(DP)の埋め込みを促進するために,クロスビューコントラスト機構を採用している。 RaSECoは2つの異なる視点(相互作用と類似性の観点から)からDP埋め込みを学び、これらの見解を相互に監督し、より差別的なDP埋め込みを得るよう促している。 2つの実データセットを用いて3つのタスクにおけるRaSECoの有効性を評価する。実験の結果,RaSECoは既存の最先端予測手法よりも優れていた。 Relation-aware subgraph embedding is promising for predicting multi-relational drug-drug interactions (DDIs). Typically, most existing methods begin by constructing a multi-relational DDI graph and then learning relation-aware subgraph embeddings (RaSEs) of drugs from the DDI graph. However, most existing approaches are usually limited in learning RaSEs of new drugs, leading to serious over-fitting when the test DDIs involve such drugs. To alleviate this issue, we propose a novel DDI prediction method based on relation-aware subgraph embedding with co-contrastive learning, RaSECo. RaSECo constructs two heterogeneous drug graphs: a multi-relational DDI graph and a multi-attribute-based drug-drug similarity (DDS) graph. The two graphs are used respectively for learning and propagating the RaSEs of drugs, thereby ensuring that all drugs, including new ones, can aggregate effective RaSEs. Additionally, we employ a cross-view contrastive mechanism to enhance drug-pair (DP) embedding. 
RaSECo learns DP embeddings from two distinct views (interaction and similarity views) and encourages these views to supervise each other collaboratively to obtain more discriminative DP embeddings. We evaluate the effectiveness of RaSECo on three different tasks using two real datasets. The experimental results demonstrate that RaSECo outperforms existing state-of-the-art prediction methods.	翻訳日:2023-07-06 18:09:06 公開日:2023-07-04
# グラフニューラルネットワークのためのマルチタスクプロンプト All in One: Multi-task Prompting for Graph Neural Networks ( http://arxiv.org/abs/2307.01504v1 ) ライセンス: Link先を確認	Xiangguo Sun, Hong Cheng, Jia Li, Bo Liu, Jihong Guan	(参考訳) 近年、「事前学習と微調整」は、各アプリケーションからのグラフアノテーションの欠如を緩和するために一般的なグラフ知識を活用できるため、多くのグラフタスクの標準ワークフローとして採用されている。しかし、ノードレベル、エッジレベル、グラフレベルのグラフタスクは、はるかに多様化しており、事前トレーニングされたプリテキストは、これらの複数のタスクと互換性がないことが多い。このギャップは、特定のアプリケーションに対して'負の転送'を引き起こす可能性があり、その結果は乏しい。自然言語処理(NLP)の素早い学習にインスパイアされ,様々なNLPタスクに事前知識を活用する上で,事前学習されたモデルと各種グラフタスクのギャップを埋める動機付けとして,グラフの素早いトピックについて検討した。本稿では,グラフモデルのための新しいマルチタスクプロンプト手法を提案する。具体的には、最初にグラフプロンプトと言語プロンプトのフォーマットをプロンプトトークン、トークン構造、挿入パターンで統一しました。このようにして、NLPからのプロンプトアイデアをグラフ領域にシームレスに導入することができる。次に,グラフ処理と最先端事前学習戦略のギャップをさらに狭めるため,様々なグラフアプリケーションのタスク空間をさらに調査し,ダウンストリーム問題をグラフレベルのタスクに再構成する。その後、我々はメタラーニングを導入し、グラフのマルチタスクプロンプトのより優れた初期化を効果的に学習し、異なるタスクに対してより信頼性と一般的なプロンプトフレームワークを実現する。我々は広範囲な実験を行い、その結果、本手法の優位性を実証した。 Recently, ''pre-training and fine-tuning'' has been adopted as a standard workflow for many graph tasks since it can take general graph knowledge to relieve the lack of graph annotations from each application. However, graph tasks with node level, edge level, and graph level are far diversified, making the pre-training pretext often incompatible with these multiple tasks. This gap may even cause a ''negative transfer'' to the specific application, leading to poor results. Inspired by the prompt learning in natural language processing (NLP), which has presented significant effectiveness in leveraging prior knowledge for various NLP tasks, we study the prompting topic for graphs with the motivation of filling the gap between pre-trained models and various graph tasks. In this paper, we propose a novel multi-task prompting method for graph models. Specifically, we first unify the format of graph prompts and language prompts with the prompt token, token structure, and inserting pattern. In this way, the prompting idea from NLP can be seamlessly introduced to the graph area. 
Then, to narrow the gap between various graph tasks and state-of-the-art pre-training strategies, we study the task space of various graph applications and reformulate downstream problems as graph-level tasks. Afterward, we introduce meta-learning to efficiently learn a better initialization for the multi-task prompt of graphs so that our prompting framework can be more reliable and general for different tasks. We conduct extensive experiments, the results of which demonstrate the superiority of our method.	翻訳日:2023-07-06 18:08:43 公開日:2023-07-04
# 多言語設定におけるジェンダーバイアスの評価と緩和について On Evaluating and Mitigating Gender Biases in Multilingual Settings ( http://arxiv.org/abs/2307.01503v1 ) ライセンス: Link先を確認	Aniket Vashishtha, Kabir Ahuja, Sunayana Sitaram	(参考訳) 言語モデルにおけるジェンダーバイアスの理解と排除は、自然言語処理における長年の問題であったが、以前の研究は主に英語に限られていた。本研究では,多言語環境におけるバイアスの評価と緩和に関する課題について検討し,その原因は英語以外の言語におけるバイアス評価のための既存のベンチマークやリソースの欠如にある。本稿では、まず、人間のアノテーションを用いて、DisCoを異なるインド言語に拡張することにより、事前訓練されたマスキング言語モデルの性別バイアスを評価するベンチマークを作成する。提案手法を英語以外の言語に拡張し,SOTAの大規模多言語モデルの有効性を評価する。全体として、我々の研究は、多言語環境での社会的バイアスを研究する際に生じる課題を強調し、より多くの言語にスケールするためのリソースと緩和技術を提供する。 While understanding and removing gender biases in language models has been a long-standing problem in Natural Language Processing, prior research has primarily been limited to English. In this work, we investigate some of the challenges with evaluating and mitigating biases in multilingual settings, which stem from a lack of existing benchmarks and resources for bias evaluation beyond English, especially for non-Western contexts. In this paper, we first create a benchmark for evaluating gender biases in pre-trained masked language models by extending DisCo to different Indian languages using human annotations. We extend various debiasing methods to work beyond English and evaluate their effectiveness for SOTA massively multilingual models on our proposed metric. Overall, our work highlights the challenges that arise while studying social biases in multilingual settings and provides resources as well as mitigation techniques to take a step toward scaling to more languages.	翻訳日:2023-07-06 18:08:17 公開日:2023-07-04
# HEDI: 初回臨床応用と切開ヘルニア修復のための生体力学的評価・可視化ツールの成績 HEDI: First-Time Clinical Application and Results of a Biomechanical Evaluation and Visualisation Tool for Incisional Hernia Repair ( http://arxiv.org/abs/2307.01502v1 ) ライセンス: Link先を確認	Jacob J. Relle, Samuel Voß, Ramesch Raschidi, Regine Nessel, Johannes Görich, Mark O. Wielpütz, Thorsten Löffler, Vincent Heuveline, Friedrich Kallinowski, Philipp D. Lösel	(参考訳) 腹壁欠損は、しばしば痛み、不快感、また切開ヘルニアの再発を招き、深刻な致死性および世界中で外科的修復を繰り返している。大規模なヘルニアに対するメッシュ修復は, 筋肉活性化, 腹腔内圧, 組織弾性, 腹部壁の拘縮などの生体力学的側面を考慮せずに, 重なりが固定された欠損領域に基づいて行われる。この問題を解決するため,不安定な腹壁を考慮に入れた切開ヘルニア修復に対する生体力学的アプローチを提案する。さらに,Valsalva操作を伴うダイナミックCTを用いてヘルニアの大きさ,体積,腹壁不安定を自動的に検出し評価するツールであるHEDIを紹介する。 31例の術前評価におけるHEDIの初回臨床応用は, 術後3年を経過し, 痛覚を伴わず, ヘルニア再発を認めなかった症例と比較して, 成功率も有意に向上した。 Abdominal wall defects often lead to pain, discomfort, and recurrence of incisional hernias, resulting in significant morbidity and repeated surgical repairs worldwide. Mesh repair for large hernias is usually based on the defect area with a fixed overlap, without considering biomechanical aspects such as muscle activation, intra-abdominal pressure, tissue elasticity, and abdominal wall distention. To address this issue, we present a biomechanical approach to incisional hernia repair that takes into account the unstable abdominal wall. Additionally, we introduce HEDI, a tool that uses dynamic computed tomography with Valsalva maneuver to automatically detect and assess hernia size, volume, and abdominal wall instability. Our first clinical application of HEDI in the preoperative evaluation of 31 patients shows significantly improved success rates compared to reported rates, with all patients remaining pain-free and showing no hernia recurrence after three years of follow-up.	翻訳日:2023-07-06 18:08:03 公開日:2023-07-04
# 非エルミート境界項を持つハミルトニアンからの到着時間 Arrival time from Hamiltonian with non-hermitian boundary term ( http://arxiv.org/abs/2307.01501v1 ) ライセンス: Link先を確認	Tajron Jurić, Hrvoje Nikolić	(参考訳) 検出器への到達の量子確率密度を求める新しい方法を開発した。検出器の外領域に制限された量子状態の進化は、非エルミート境界項を含む制限されたハミルトニアンによって記述される。非エルミート項は境界を通る確率電流演算子のフラックスに比例していることが示されており、これは到達確率密度が確率電流のフラックスに等しいことを意味する。 We develop a new method for finding the quantum probability density of arrival at the detector. The evolution of the quantum state restricted to the region outside of the detector is described by a restricted Hamiltonian that contains a non-hermitian boundary term. The non-hermitian term is shown to be proportional to the flux of the probability current operator through the boundary, which implies that the arrival probability density is equal to the flux of the probability current.	翻訳日:2023-07-06 18:07:44 公開日:2023-07-04
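As a hedged aside on the flux statement in the abstract above, the following block recalls the standard textbook relations for a single nonrelativistic particle. It is not the paper's derivation: the symbols $\Omega$ (region outside the detector), $\partial\Omega$ (its boundary, with outward normal) and $P_{\Omega}$ are notation assumed here, and the relations are written for ordinary Schrödinger evolution rather than for the restricted Hamiltonian the abstract describes.

```latex
% Probability current of a wave function \psi(\mathbf{x},t) for a particle of mass m,
% together with the continuity equation it satisfies
\mathbf{j} = \frac{\hbar}{m}\,\operatorname{Im}\!\left(\psi^{*}\nabla\psi\right),
\qquad
\frac{\partial |\psi|^{2}}{\partial t} + \nabla\cdot\mathbf{j} = 0 .

% Probability of still being found in \Omega, and its rate of loss through the boundary
P_{\Omega}(t) = \int_{\Omega} |\psi|^{2}\, dV ,
\qquad
-\frac{dP_{\Omega}}{dt} = \oint_{\partial\Omega} \mathbf{j}\cdot d\mathbf{S} .
```

Read this way, the rate at which probability leaves $\Omega$ equals the outward flux of $\mathbf{j}$ through $\partial\Omega$, which is the quantity the abstract identifies with the arrival probability density.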
# 状態依存雑音を伴う加速確率近似 Accelerated stochastic approximation with state-dependent noise ( http://arxiv.org/abs/2307.01497v1 ) ライセンス: Link先を確認	Sasila Ilandarideva, Anatoli Juditsky, Guanghui Lan, Tianjiao Li	(参考訳) 確率勾配観測における雑音に対するより一般的な仮定の下で、確率的滑らかな凸最適化問題のクラスを考える。ノイズの分散が一様有界であると仮定される古典的な問題設定とは対照的に、確率勾配の分散はアルゴリズムによって与えられる近似解の「準最適性」に関係していると仮定する。このような問題は様々な応用、特に統計学におけるよく知られた一般化線形回帰問題において自然に発生する。しかし、我々の知る限りでは、このような問題のクラスを解くための確率近似アルゴリズムは、精度、問題パラメータ、およびミニバッチサイズに依存するため、最適性を得ることができない。本稿では,2つの非ユークリッド加速確率近似ルーチン,-確率加速度勾配勾配(SAGD)と確率勾配外挿(SGE)について論じる。適切な条件下では,sagd と sge が最適収束率を達成し,最適な反復とサンプルの複雑度を同時に達成できることを示す。しかし、SGEアルゴリズムの対応する仮定はより一般的なものであり、例えば、重いテールノイズや不連続スコア関数の下での統計的推定問題にSGEを効率的に適用することができる。また,2次成長条件を満たす問題に対するSGEの適用について論じ,スパース溶液の回収にどのように使用できるかを示した。最後に,提案アルゴリズムの高次元設定における数値的性能を示すシミュレーション実験について報告する。 We consider a class of stochastic smooth convex optimization problems under rather general assumptions on the noise in the stochastic gradient observation. As opposed to the classical problem setting in which the variance of noise is assumed to be uniformly bounded, herein we assume that the variance of stochastic gradients is related to the "sub-optimality" of the approximate solutions delivered by the algorithm. Such problems naturally arise in a variety of applications, in particular, in the well-known generalized linear regression problem in statistics. However, to the best of our knowledge, none of the existing stochastic approximation algorithms for solving this class of problems attain optimality in terms of the dependence on accuracy, problem parameters, and mini-batch size. We discuss two non-Euclidean accelerated stochastic approximation routines--stochastic accelerated gradient descent (SAGD) and stochastic gradient extrapolation (SGE)--which carry a particular duality relationship. We show that both SAGD and SGE, under appropriate conditions, achieve the optimal convergence rate, attaining the optimal iteration and sample complexities simultaneously. 
However, the corresponding assumptions for the SGE algorithm are more general; they allow, for instance, for efficient application of the SGE to statistical estimation problems under heavy-tailed noise and discontinuous score functions. We also discuss the application of the SGE to problems satisfying quadratic growth conditions, and show how it can be used to recover sparse solutions. Finally, we report on some simulation experiments to illustrate the numerical performance of our proposed algorithms in high-dimensional settings.	翻訳日:2023-07-06 18:07:36 公開日:2023-07-04
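To make the notion of state-dependent noise in the abstract above concrete, here is a minimal sketch on a toy least-squares problem, assuming Gaussian regressors and Gaussian label noise. It only illustrates that the variance of a stochastic gradient can grow with the distance from the optimum; it does not implement SAGD or SGE, and the function name and all parameters are hypothetical.

```python
import numpy as np

def gradient_noise_variance(x, x_star, n_samples=20000, noise_std=0.1, seed=0):
    """Monte Carlo estimate of E||G(x, xi) - mean gradient||^2 for least squares.

    The per-sample loss is 0.5 * (a @ x - y)**2 with y = a @ x_star + noise,
    so the per-sample stochastic gradient is a * (a @ x - y).
    """
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    a = rng.standard_normal((n_samples, d))        # random regressors
    y = a @ x_star + noise_std * rng.standard_normal(n_samples)
    residual = a @ x - y
    grads = a * residual[:, None]                  # one stochastic gradient per row
    mean_grad = grads.mean(axis=0)                 # Monte Carlo proxy for the true gradient
    return float(((grads - mean_grad) ** 2).sum(axis=1).mean())

x_star = np.ones(5)
var_at_opt = gradient_noise_variance(x_star, x_star)        # noise floor at the optimum
var_far = gradient_noise_variance(x_star + 3.0, x_star)     # far from the optimum
print(var_at_opt, var_far)
```

In this toy setting the variance at the optimum is governed by the label noise alone, while far from the optimum it is dominated by a term that scales with the squared distance to `x_star`, so a uniform variance bound would be very loose.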
# AndroidおよびWindowsシステムにおけるディープラーニングによるマルウェア検出のレビュー Review of Deep Learning-based Malware Detection for Android and Windows System ( http://arxiv.org/abs/2307.01494v1 ) ライセンス: Link先を確認	Nazmul Islam and Seokjoo Shin	(参考訳) マルウェアの差別化は、彼らの行動と脅威レベルを判断し、彼らに対する防衛戦略を考案する上で重要である。これに対し、異なるマルウェアを区別する様々なアンチマルウェアシステムが開発されている。しかし、最近のマルウェアファミリーのほとんどは人工知能(AI)対応であり、異なる難読化技術を用いて従来のアンチマルウェアシステムを騙すことができる。したがって、AI対応のアンチマルウェアシステムだけがこれらの技術に対して堅牢であり、悪意のある活動を支援するマルウェアファイルの異なる特徴を検出することができる。そこで本研究では,Windows と Android の2つのマルウェア検出技術について概説する。どちらの手法も、様々なマルウェアファミリーの検出において、完全な精度を達成した。 Differentiating malware is important for determining its behavior and level of threat, as well as for devising a defensive strategy against it. In response, various anti-malware systems have been developed to distinguish between different types of malware. However, most recent malware families are Artificial Intelligence (AI)-enabled and can deceive traditional anti-malware systems using different obfuscation techniques. Therefore, only AI-enabled anti-malware systems are robust against these techniques and can detect different features in the malware files that aid in malicious activities. In this study we review two AI-enabled techniques for detecting malware in the Windows and Android operating systems, respectively. Both techniques achieved perfect accuracy in detecting various malware families.	翻訳日:2023-07-06 18:07:09 公開日:2023-07-04
# FB-OCC: 前方・後方視点変換に基づく3次元占有予測 FB-OCC: 3D Occupancy Prediction based on Forward-Backward View Transformation ( http://arxiv.org/abs/2307.01492v1 ) ライセンス: Link先を確認	Zhiqi Li, Zhiding Yu, David Austin, Mingsheng Fang, Shiyi Lan, Jan Kautz, Jose M. Alvarez	(参考訳) 本報告は, エンド・ツー・エンド自動運転に関するCVPR 2023ワークショップと, 視覚中心自律運転に関するCVPR 2023ワークショップと共同で開催されている3次元占有予測チャレンジの勝利ソリューションを要約する。提案したFB-OCCは,前方・後方投影を用いた最先端カメラを用いた鳥眼視認識設計であるFB-BEVに基づいている。 FB-BEV上に,3次元占有率予測タスクに合わせた新しい設計と最適化についてさらに検討し,深度・セマンティック共同事前学習,ボクセル・BEV共同表現,モデルのスケールアップ,効果的な後処理戦略について検討した。これらの設計と最適化により、最新のmIoUスコアはnuScenesデータセットで54.19%となり、チャレンジトラックで1位となった。コードとモデルはhttps://github.com/NVlabs/FB-BEVでリリースされる。 This technical report summarizes the winning solution for the 3D Occupancy Prediction Challenge, which is held in conjunction with the CVPR 2023 Workshop on End-to-End Autonomous Driving and the CVPR 2023 Workshop on Vision-Centric Autonomous Driving. Our proposed solution FB-OCC builds upon FB-BEV, a cutting-edge camera-based bird's-eye view perception design using forward-backward projection. On top of FB-BEV, we further study novel designs and optimization tailored to the 3D occupancy prediction task, including joint depth-semantic pre-training, joint voxel-BEV representation, model scaling up, and effective post-processing strategies. These designs and optimizations result in a state-of-the-art mIoU score of 54.19% on the nuScenes dataset, ranking 1st in the challenge track. Code and models will be released at: https://github.com/NVlabs/FB-BEV.	翻訳日:2023-07-06 18:06:58 公開日:2023-07-04
# 開放型世代のための自己矛盾訓練による反復学習バイアスの軽減 Mitigating the Learning Bias towards Repetition by Self-Contrastive Training for Open-Ended Generation ( http://arxiv.org/abs/2307.01542v1 ) ライセンス: Link先を確認	Jian Guan, Minlie Huang	(参考訳) 無数の生成タスクの大幅な進歩にもかかわらず、GPT2のような事前訓練された言語モデル(LM)は、オープンエンド生成のための最大化に基づく復号アルゴリズムで繰り返しテキストを生成する傾向にある。 lmsはmleの損失により、単純な反復パターンを素早く捉えます。本稿では,2つのデータセットの流速を維持しながら繰り返しを効果的に緩和することを示す反復を誤って予測した場合に,同一モデルの早期チェックポイントの出力をペナルティ化する自己比較訓練を提案する。さらに, LMは, 文レベルの繰り返しループの原因となる非繰り返しトークンよりも長い範囲依存を用いて繰り返しトークンを予測する。 Despite the huge progress in myriad generation tasks, pretrained language models (LMs) such as GPT2 still tend to generate repetitive texts with maximization-based decoding algorithms for open-ended generation. We attribute their overestimation of token-level repetition probabilities to the learning bias: LMs capture simple repetitive patterns faster with the MLE loss. We propose self-contrastive training to penalize the output of a premature checkpoint of the same model when it incorrectly predicts repetition, which is shown to mitigate repetition effectively while maintaining fluency on two datasets. Furthermore, we find that LMs use longer-range dependencies to predict repetitive tokens than non-repetitive ones, which may be the cause of sentence-level repetition loops.	翻訳日:2023-07-06 17:59:52 公開日:2023-07-04
# AIの限界を理解するために教室でプロンプトを学ぶ:パイロットスタディ Learning to Prompt in the Classroom to Understand AI Limits: A pilot study ( http://arxiv.org/abs/2307.01540v1 ) ライセンス: Link先を確認	Emily Theophilou, Cansu Koyuturk, Mona Yavari, Sathya Bursic, Gregor Donabauer, Alessia Telari, Alessia Testa, Raffaele Boiano, Davinia Hernandez-Leo, Martin Ruskov, Davide Taibi, Alessandro Gabbiadini, Dimitri Ognibene	(参考訳) 人工知能の進歩は社会を援助し、社会問題に取り組む上で大きな可能性を秘めている。特に大きな言語モデル(llm)とチャットボット(chatgptなど)は、aiシステムの自然言語処理機能を高度に改善し、前例のない量の非構造化データを処理できるようになった。一連の誇大広告も反発し、新しいaiメソッドの驚くべき貢献の後でもネガティブな感情が高まった。原因の1つは、AIや問題領域のこれまでの専門知識を使わずに、あらゆる領域のあらゆる種類の知識にアクセスし、処理でき、そして、幻覚や推論の限界のような現在のLSMの限界を無視している、という誤解を招くことにある。 AIの誤認を認めることは、LLMが生成した誤った提案において、犬の過信の影響に対処するために重要である。同時に、AIに対する恐怖やその他の否定的な態度を減らすことができる。 AIリテラシーの介入は、大衆がそのようなLCMの限界を理解して、より効果的な方法でそれらを使用する方法、すなわち「急進的な」学習を学ぶために必要である。この目的により、30人の生徒を抱えた高校でパイロット教育の介入が行われた。関係してます一知能、AI、LLMに関する高レベルの概念を提示すること。 (ii)非自明なタスクにおけるchatgptによる初期ナイーブな実践、そして最後に (iii)現在認められている推進戦略を適用すること。学生報告などの事前結果を収集した。 a) 活動の高く評価すること b)教育活動におけるLLMとの相互作用の質の向上。 c) aiに対する否定的な感情の低下。 d) 制限に対する理解の高まり,具体的には,AIの受容に影響を与える要因を調査し,より制御された環境でこの活動を洗練・繰り返すことを目的としている。 Artificial intelligence's progress holds great promise in assisting society in addressing pressing societal issues. In particular Large Language Models (LLM) and the derived chatbots, like ChatGPT, have highly improved the natural language processing capabilities of AI systems allowing them to process an unprecedented amount of unstructured data. The consequent hype has also backfired, raising negative sentiment even after novel AI methods' surprising contributions. One of the causes, but also an important issue per se, is the rising and misleading feeling of being able to access and process any form of knowledge to solve problems in any domain with no effort or previous expertise in AI or problem domain, disregarding current LLMs limits, such as hallucinations and reasoning limits. 
Acknowledging AI fallibility is crucial to address the impact of dogmatic overconfidence in possibly erroneous suggestions generated by LLMs. At the same time, it can reduce fear and other negative attitudes toward AI. This calls for AI literacy interventions that allow the public to understand such LLM limits and learn how to use LLMs more effectively, i.e., learning to "prompt". With this aim, a pilot educational intervention was performed in a high school with 30 students. It involved (i) presenting high-level concepts about intelligence, AI, and LLMs, (ii) an initial naive practice with ChatGPT in a non-trivial task, and finally (iii) applying currently accepted prompting strategies. Encouraging preliminary results have been collected, with students reporting (a) high appreciation of the activity, (b) improved quality of the interaction with the LLM during the educational activity, (c) decreased negative sentiments toward AI, and (d) increased understanding of limitations. We aim to study factors that impact AI acceptance and to refine and repeat this activity in more controlled settings.	翻訳日:2023-07-06 17:59:38 公開日:2023-07-04
# 柔らかい導波路のトンネル:本を閉じる Tunneling in soft waveguides: closing a book ( http://arxiv.org/abs/2307.01536v1 ) ライセンス: Link先を確認	Pavel Exner and David Spitzkopf	(参考訳) 一般化された「ブックカバー」形状の2次元の柔らかい量子導波路のスペクトル、すなわち、有限湾曲部分と、平行またはほぼ平行に同じ方向を向いている直線状の漸近線からなる溝の形のポテンシャルを持つシュレーディンガー作用素について検討する。漸近線の間の角度が0に近づくとき固有値がどのように蓄積するかを示す。平行な漸近線の場合、離散スペクトルの存在は溝のプロファイルに依存する。弱結合の場合に存在しないことを証明し、一方、横ポテンシャルが十分強ければ存在することを証明する。また、臨界強度を評価する数値的な例を示す。 We investigate the spectrum of a soft quantum waveguide in two dimensions of the generalized 'bookcover' shape, that is, a Schrödinger operator with the potential in the form of a ditch consisting of a finite curved part and straight asymptotes which are parallel or almost parallel, pointing in the same direction. We show how the eigenvalues accumulate when the angle between the asymptotes tends to zero. In the case of parallel asymptotes, the existence of a discrete spectrum depends on the ditch profile. We prove that it is absent in the weak-coupling case; on the other hand, it exists provided the transverse potential is strong enough. We also present a numerical example in which the critical strength can be assessed.	翻訳日:2023-07-06 17:59:09 公開日:2023-07-04
# コンパクトな動き表現に基づく拡散モデルによる教師なし映像異常検出 Unsupervised Video Anomaly Detection with Diffusion Models Conditioned on Compact Motion Representations ( http://arxiv.org/abs/2307.01533v1 ) ライセンス: Link先を確認	Anil Osman Tur and Nicola Dall'Asen and Cigdem Beyan and Elisa Ricci	(参考訳) 本稿では,ビデオ内の各フレームを,ラベルにアクセスすることなく正常または異常に分類する,教師なしビデオ異常検出(VAD)問題に対処することを目的とする。これを実現するために,提案手法では,入力データが事前学習されたネットワークから抽出された時空間的特徴である条件付き拡散モデルを用い,その条件は映像セグメントを要約したコンパクトな動作表現から抽出された特徴である。本手法は,データ駆動しきい値を用い,高い再構成誤差を異常事象の指標として捉える。本研究は,vadに対するコンパクトな運動表現を用いた最初の研究であり,2つの大規模vadベンチマークを用いた実験により,拡散モデルに関連する情報を提供し,その結果,先行技術におけるvad性能を向上させることを実証した。重要な点として,本手法は,各データセットの一般化性能が向上し,最先端手法とベースライン手法の両方に優れていた。私たちのメソッドのコードはhttps://github.com/AnilOsmanTur/conditioned_video_anomaly_diffusionで利用可能です。 This paper aims to address the unsupervised video anomaly detection (VAD) problem, which involves classifying each frame in a video as normal or abnormal, without any access to labels. To accomplish this, the proposed method employs conditional diffusion models, where the input data is the spatiotemporal features extracted from a pre-trained network, and the condition is the features extracted from compact motion representations that summarize a given video segment in terms of its motion and appearance. Our method utilizes a data-driven threshold and considers a high reconstruction error as an indicator of anomalous events. This study is the first to utilize compact motion representations for VAD and the experiments conducted on two large-scale VAD benchmarks demonstrate that they supply relevant information to the diffusion model, and consequently improve VAD performances w.r.t the prior art. Importantly, our method exhibits better generalization performance across different datasets, notably outperforming both the state-of-the-art and baseline methods. The code of our method is available at https://github.com/AnilOsmanTur/conditioned_video_anomaly_diffusion	翻訳日:2023-07-06 17:58:57 公開日:2023-07-04
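The abstract above says that a data-driven threshold on the reconstruction error separates normal from anomalous frames, without giving the rule. As a minimal sketch under that gap, the snippet below assumes a "mean plus k standard deviations" threshold computed from the errors themselves; the function name, the rule, and the toy error values are all hypothetical and are not taken from the paper or its repository.

```python
import numpy as np

def flag_anomalies(recon_errors, k=2.0):
    """Flag frames whose reconstruction error exceeds mean + k * std.

    The threshold is derived from the errors of the video itself, so no
    labels are needed; the specific rule is an assumption for illustration.
    """
    errors = np.asarray(recon_errors, dtype=float)
    threshold = errors.mean() + k * errors.std()
    return errors > threshold, threshold

# Toy per-frame reconstruction errors with one obvious outlier (frame 5).
errors = np.array([0.10, 0.12, 0.11, 0.09, 0.10, 0.95, 0.11, 0.10])
labels, thr = flag_anomalies(errors)
print(labels, thr)
```

A single large outlier inflates both the mean and the standard deviation, so in practice a robust statistic (for example a median-based rule) may be preferable; that choice is left open here.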
# 不確実性下における自律エージェントの意図行動分析 Analyzing Intentional Behavior in Autonomous Agents under Uncertainty ( http://arxiv.org/abs/2307.01532v1 ) ライセンス: Link先を確認	Filip Cano Córdoba, Samuel Judson, Timos Antonopoulos, Katrine Bjørner, Nicholas Shoemaker, Scott J. Shapiro, Ruzica Piskac and Bettina Könighofer	(参考訳) 不確実な環境での自律的な意思決定の原則的説明責任は、否定的な設計と実際の事故との意図的な結果の区別を必要とする。本稿では,意図的行動の証拠を定量的に測定し,自律的エージェントの行動分析を行う。我々は不確実な環境をマルコフ決定過程(MDP)としてモデル化する。与えられたシナリオでは、あるイベントに到達したエージェントの能力を計算するために確率論的モデルチェックに依存します。これを代理店のスコープと呼ぶ。エージェントのスコープが高く、エージェントの決定がイベントに到達するのに最適に近い場合、意図的な行動の証拠があると言う。提案手法は,評価の信頼性を高めるために分析可能な関連シナリオを自動的に生成する。ケーススタディでは,本手法が「意図的」交通衝突と「事故的」交通衝突を区別できることを示す。 Principled accountability for autonomous decision-making in uncertain environments requires distinguishing intentional outcomes from negligent designs from actual accidents. We propose analyzing the behavior of autonomous agents through a quantitative measure of the evidence of intentional behavior. We model an uncertain environment as a Markov Decision Process (MDP). For a given scenario, we rely on probabilistic model checking to compute the ability of the agent to influence reaching a certain event. We call this the scope of agency. We say that there is evidence of intentional behavior if the scope of agency is high and the decisions of the agent are close to being optimal for reaching the event. Our method applies counterfactual reasoning to automatically generate relevant scenarios that can be analyzed to increase the confidence of our assessment. In a case study, we show how our method can distinguish between 'intentional' and 'accidental' traffic collisions.	翻訳日:2023-07-06 17:58:38 publication:2023-07-04
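The abstract above relies on probabilistic model checking to quantify how much an agent can influence reaching an event in an MDP. The sketch below computes maximum and minimum reachability probabilities with plain value iteration on a tiny hand-made MDP. Treating the "scope of agency" as the difference between the two is an assumption made here for illustration, since the abstract gives no formula; the state names, actions and probabilities are likewise invented.

```python
def reach_probability(transitions, target, optimise=max, iterations=1000):
    """Value iteration for the optimal probability of eventually reaching `target`.

    `transitions[s][a]` is a list of (probability, next_state) pairs.
    Pass optimise=max for the best case and optimise=min for the worst case.
    """
    values = {s: (1.0 if s in target else 0.0) for s in transitions}
    for _ in range(iterations):
        new_values = {}
        for s, actions in transitions.items():
            if s in target or not actions:
                new_values[s] = values[s]      # target and absorbing states stay fixed
                continue
            new_values[s] = optimise(
                sum(p * values[t] for p, t in outcomes)
                for outcomes in actions.values()
            )
        values = new_values
    return values

# Hypothetical toy scenario: from "drive" the agent may brake or accelerate.
mdp = {
    "drive": {
        "brake": [(0.9, "safe"), (0.1, "crash")],
        "accelerate": [(0.3, "safe"), (0.7, "crash")],
    },
    "safe": {},
    "crash": {},
}
p_max = reach_probability(mdp, {"crash"}, optimise=max)["drive"]
p_min = reach_probability(mdp, {"crash"}, optimise=min)["drive"]
scope = p_max - p_min   # assumed reading of "ability to influence reaching the event"
print(p_max, p_min, scope)
```

A real analysis would use a probabilistic model checker rather than this hand-rolled loop; the point is only that a large gap between the best and worst case means the agent's choices matter for the outcome.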
# コンボリューショントランスフォーマによるトマトの照明・咬合・熟成条件の自律的認識と階調評価 Convolutional Transformer for Autonomous Recognition and Grading of Tomatoes Under Various Lighting, Occlusion, and Ripeness Conditions ( http://arxiv.org/abs/2307.01530v1 ) ライセンス: Link先を確認	Asim Khan, Taimur Hassan, Muhammad Shafay, Israa Fahmy, Naoufel Werghi, Lakmal Seneviratne and Irfan Hussain	(参考訳) 完全に熟したトマトをモバイルロボットで収穫することは、現実世界のシナリオにおいて重大な課題をもたらす。これらの課題は、葉や枝によって引き起こされる閉塞や、果実の発達段階におけるトマトと周辺の葉の色類似性などの要因から生じる。自然環境はさらにこれらの問題を、様々な光条件、視角、閉塞要因、および異なる成熟度レベルと組み合わせている。これらの障害を克服するために, コンボリューショントランスフォーマーアーキテクチャを利用して, 閉塞レベル, 照明条件, 熟度に関わらず, トマトを自律的に認識し, 格付けする新しい枠組みを導入する。提案モデルは、この目的のために特別にキュレートされた注意深い注釈付き画像を用いて訓練され、テストされる。データセットは、さまざまな照明条件下で準備され、視点を視認し、さまざまなモバイルカメラセンサーを使用し、Laboro TomatoやRob2Pheno Annotated Tomatoといった既存のデータセットと区別する。乱雑なトマトインスタンスと隠蔽トマトインスタンスの処理におけるフレームワークの有効性を,2つの公開データセットである Laboro Tomato と Rob2Pheno Annotated Tomato をベンチマークとして評価した。これら3つのデータセットにおける評価結果から, トマトをアノテートしたkutomadata, laboro tomato, rob2phenoの平均精度スコアにおいて, 最先端の58.14%, 65.42%, 66.39%を上回った。その結果,トマトをベースライン法や従来の手法と比較して精度良く検出・区分けできることで,提案モデルの優越性が向上した。具体的には、f1-scoreが80.14%、dice係数が73.26%、平均iouが66.41%である。 Harvesting fully ripe tomatoes with mobile robots presents significant challenges in real-world scenarios. These challenges arise from factors such as occlusion caused by leaves and branches, as well as the color similarity between tomatoes and the surrounding foliage during the fruit development stage. The natural environment further compounds these issues with varying light conditions, viewing angles, occlusion factors, and different maturity levels. To overcome these obstacles, this research introduces a novel framework that leverages a convolutional transformer architecture to autonomously recognize and grade tomatoes, irrespective of their occlusion level, lighting conditions, and ripeness. The proposed model is trained and tested using carefully annotated images curated specifically for this purpose. 
The dataset is prepared under various lighting conditions, viewing perspectives, and employs different mobile camera sensors, distinguishing it from existing datasets such as Laboro Tomato and Rob2Pheno Annotated Tomato. The effectiveness of the proposed framework in handling cluttered and occluded tomato instances was evaluated using two additional public datasets, Laboro Tomato and Rob2Pheno Annotated Tomato, as benchmarks. The evaluation results across these three datasets demonstrate the exceptional performance of our proposed framework, surpassing the state-of-the-art by 58.14%, 65.42%, and 66.39% in terms of mean average precision scores for KUTomaData, Laboro Tomato, and Rob2Pheno Annotated Tomato, respectively. The results underscore the superiority of the proposed model in accurately detecting and delineating tomatoes compared to baseline methods and previous approaches. Specifically, the model achieves an F1-score of 80.14%, a Dice coefficient of 73.26%, and a mean IoU of 66.41% on the KUTomaData image dataset.	翻訳日:2023-07-06 17:58:24 公開日:2023-07-04
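The segmentation metrics quoted above (Dice coefficient and IoU) have standard definitions for binary masks, which the following minimal sketch spells out. The helper name and the toy masks are hypothetical, and this is not the authors' evaluation code, which may aggregate over instances or classes differently.

```python
import numpy as np

def dice_and_iou(pred, truth):
    """Dice coefficient and intersection-over-union for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    intersection = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    total = pred.sum() + truth.sum()
    dice = 2.0 * intersection / total if total else 1.0   # two empty masks count as a match
    iou = intersection / union if union else 1.0
    return float(dice), float(iou)

# Toy 2x3 masks: 3 predicted pixels, 3 true pixels, 2 of them shared.
pred = np.array([[1, 1, 0], [0, 1, 0]])
truth = np.array([[1, 0, 0], [0, 1, 1]])
dice, iou = dice_and_iou(pred, truth)
print(dice, iou)
```

For a single pair of masks the two scores are tied by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU.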
# セマンティックセグメンテーションのための画像の学習圧縮表現の豊かさの活用 Exploiting Richness of Learned Compressed Representation of Images for Semantic Segmentation ( http://arxiv.org/abs/2307.01524v1 ) ライセンス: Link先を確認	Ravi Kakaiya, Rakshith Sathish, Ramanathan Sethuraman	(参考訳) 自動運転車とADAS(Advanced Driving Assistance Systems)は、旅行のやり方を根本的に変える可能性がある。これらの車両の多くは、周囲の物体を検知し追跡するために、現在セグメンテーションと物体検出アルゴリズムに依存している。車両から収集されたデータは、これらのアルゴリズムの継続的な/一生の学習を容易にするために、しばしばクラウドサーバに送られる。帯域幅の制約を考慮すると、データはサーバに送信する前に圧縮され、トレーニングや分析のために展開される。本研究では,標準パイプラインにおける展開処理に発生するレイテンシのオーバーヘッドを削減するために,学習ベースの圧縮コーデックを用いることを提案する。得られた圧縮表現は,画像を得るための展開に加えて,意味セグメンテーションなどのタスクの実行にも利用できることを示す。我々は、Cityscapesデータセット上で提案されたパイプラインを実験的に検証し、圧縮係数を最大$66 \times$とし、展開された画像を用いて達成した$0.88$に対して、ダイス係数$0.84$でセグメンテーションを行うために必要な情報を保存し、全体的な計算を$11\%$削減した。 Autonomous vehicles and Advanced Driving Assistance Systems (ADAS) have the potential to radically change the way we travel. Many such vehicles currently rely on segmentation and object detection algorithms to detect and track objects in their surroundings. The data collected from the vehicles are often sent to cloud servers to facilitate continual/life-long learning of these algorithms. Considering the bandwidth constraints, the data is compressed before sending it to servers, where it is typically decompressed for training and analysis. In this work, we propose the use of a learning-based compression codec to reduce the overhead in latency incurred for the decompression operation in the standard pipeline. We demonstrate that the learned compressed representation can also be used to perform tasks like semantic segmentation in addition to decompression to obtain the images. We experimentally validate the proposed pipeline on the Cityscapes dataset, where we achieve a compression factor up to $66 \times$ while preserving the information required to perform segmentation with a dice coefficient of $0.84$ as compared to $0.88$ achieved using decompressed images while reducing the overall compute by $11\%$.	
翻訳日:2023-07-06 17:57:44 公開日:2023-07-04
# LEAT: リアルタイムシナリオにおける遅延アンサンブル攻撃によるロバストディープフェイク破壊に向けて LEAT: Towards Robust Deepfake Disruption in Real-World Scenarios via Latent Ensemble Attack ( http://arxiv.org/abs/2307.01520v1 ) ライセンス: Link先を確認	Joonkyo Shim, Hyunsoo Yoon	(参考訳) 生成モデルによって生成された悪質な視覚コンテンツであるディープフェイクは、社会にますます有害な脅威をもたらす。近年のディープフェイクの損傷を積極的に軽減するために, 逆方向の摂動を用いてディープフェイクモデルの出力を妨害する研究が進められている。しかしながら、以前のアプローチでは、主に所定のターゲット属性のみに基づいて歪んだ出力を生成することに重点を置いており、ターゲット属性が不明な現実世界のシナリオでは堅牢性が欠落している。さらに、GAN(Generative Adversarial Networks)と拡散モデル(Diffusion Models)の2つの顕著な生成モデル間の摂動の伝達性は未解明のままである。本稿では,頑健なディープフェイク破壊を実現するための目標特性伝達性とモデル伝達性の重要性を強調する。この課題に対処するために,leatと呼ばれる,独立な潜在符号化プロセスを攻撃する簡易かつ効果的な破壊手法を提案する。遅延符号化処理を中断することにより、所定の目標属性に関係なく、その後の生成プロセスで歪んだ出力画像を生成する。このターゲット属性非依存攻撃は、ターゲット属性が未知である場合でもロバストなディスラプションを保証する。さらに,回帰勾配攻撃のための勾配を効果的に集約し,ganモデルと拡散モデルの両方を含む様々なディープフェイクモデルに対する同時攻撃を可能にする正規化勾配アンサンブル戦略を導入する。さらに,画素レベルの差のみに基づく破壊品質の評価が不十分であることを示す。その結果,防衛の成功を包括的に評価するための代替プロトコルを提案する。実世界のシナリオにおいてディープフェイクをディスラプトする手法の有効性を確認し,従来の手法よりも高い防御成功率を報告した。 Deepfakes, malicious visual contents created by generative models, pose an increasingly harmful threat to society. To proactively mitigate deepfake damages, recent studies have employed adversarial perturbation to disrupt deepfake model outputs. However, previous approaches primarily focus on generating distorted outputs based on only predetermined target attributes, leading to a lack of robustness in real-world scenarios where target attributes are unknown. Additionally, the transferability of perturbations between two prominent generative models, Generative Adversarial Networks (GANs) and Diffusion Models, remains unexplored. In this paper, we emphasize the importance of target attribute-transferability and model-transferability for achieving robust deepfake disruption. To address this challenge, we propose a simple yet effective disruption method called Latent Ensemble ATtack (LEAT), which attacks the independent latent encoding process. 
By disrupting the latent encoding process, it generates distorted output images in subsequent generation processes, regardless of the given target attributes. This target attribute-agnostic attack ensures robust disruption even when the target attributes are unknown. Additionally, we introduce a Normalized Gradient Ensemble strategy that effectively aggregates gradients for iterative gradient attacks, enabling simultaneous attacks on various types of deepfake models, involving both GAN-based and Diffusion-based models. Moreover, we demonstrate the insufficiency of evaluating disruption quality solely based on pixel-level differences. As a result, we propose an alternative protocol for comprehensively evaluating the success of defense. Extensive experiments confirm the efficacy of our method in disrupting deepfakes in real-world scenarios, reporting a higher defense success rate compared to previous methods.	翻訳日:2023-07-06 17:57:21 公開日:2023-07-04
# パーソナライズされた治療推薦のための深層注意qネットワーク Deep Attention Q-Network for Personalized Treatment Recommendation ( http://arxiv.org/abs/2307.01519v1 ) ライセンス: Link先を確認	Simin Ma, Junghwan Lee, Nicoleta Serban, Shihao Yang	(参考訳) 個別の患者に対する治療の調整は、最適な医療成果を得るためには極めて困難である。強化学習の最近の進歩は、有望なパーソナライズされた治療レコメンデーションを提供するが、それらは患者の状態として、患者の真の健康状態を正確に表現しない現在の患者観察(視覚標識、人口統計)にのみ依存している。この制限は政策学習と評価を妨げ、最終的に治療効果を制限する。本研究では,過去の患者観察を効率的に取り入れるために,深層強化学習フレームワーク内のトランスフォーマーアーキテクチャを活用して,パーソナライズされた治療推奨のための深層注意qネットワークを提案する。実世界の敗血症と急性低血圧コホートに関するモデルを評価し,最新モデルよりも優れていることを示した。私たちのモデルのソースコードはhttps://github.com/stevenmsm/RL-ICU-DAQNで公開されています。 Tailoring treatment for individual patients is crucial yet challenging in order to achieve optimal healthcare outcomes. Recent advances in reinforcement learning offer promising personalized treatment recommendations; however, they rely solely on current patient observations (vital signs, demographics) as the patient's state, which may not accurately represent the true health status of the patient. This limitation hampers policy learning and evaluation, ultimately limiting treatment effectiveness. In this study, we propose the Deep Attention Q-Network for personalized treatment recommendations, utilizing the Transformer architecture within a deep reinforcement learning framework to efficiently incorporate all past patient observations. We evaluated the model on real-world sepsis and acute hypotension cohorts, demonstrating its superiority to state-of-the-art models. The source code for our model is available at https://github.com/stevenmsm/RL-ICU-DAQN.	翻訳日:2023-07-06 17:56:51 公開日:2023-07-04
# LPN:数ショット分類のための言語誘導型プロトタイプネットワーク LPN: Language-guided Prototypical Network for few-shot classification ( http://arxiv.org/abs/2307.01515v1 ) ライセンス: Link先を確認	Kaihui Cheng, Chule Yang	(参考訳) 少数ショット分類は、制限されたラベル付き例で新しいタスクに適応することを目的としている。アクセス可能なデータを完全に利用するために、最近の手法では、クエリとサポートイメージの類似性、およびメタトレーニングと事前トレーニング戦略による高次元特徴の適切な測定方法が検討されている。しかし、マルチモダリティ情報の可能性はほとんど検討されていないため、少数ショット分類に有望な改善をもたらす可能性がある。本稿では,2つの並列分岐による視覚と言語モダリティの相補性を活用した,少数ショット分類のための言語誘導型ネットワーク (lpn) を提案する。具体的には,視覚タスクに限られたサンプルで言語モダリティを導入するために,事前学習されたテキストエンコーダを活用して,従来の画像エンコーダで画像を処理すると同時に,クラス名から直接クラスレベルのテキスト特徴を抽出する。次に、クラスレベルの特徴と視覚的特徴を整合させることにより、各画像に対応するテキスト特徴を得るために、言語案内デコーダを導入する。さらに,クラスレベルの特徴とプロトタイプを活用するために,テキストブランチに頑健なプロトタイプを生成する改良されたプロトタイプヘッドを構築した。最後に、視覚とテキストのロジットを集約し、単一のモダリティの偏差を校正する。大規模な実験は、ベンチマークデータセットの最先端手法に対するLPNの競争力を示す。 Few-shot classification aims to adapt to new tasks with limited labeled examples. To fully use the accessible data, recent methods explore suitable measures for the similarity between the query and support images and better high-dimensional features with meta-training and pre-training strategies. However, the potential of multi-modality information has barely been explored, which may bring promising improvement for few-shot classification. In this paper, we propose a Language-guided Prototypical Network (LPN) for few-shot classification, which leverages the complementarity of vision and language modalities via two parallel branches. Concretely, to introduce language modality with limited samples in the visual task, we leverage a pre-trained text encoder to extract class-level text features directly from class names while processing images with a conventional image encoder. Then, a language-guided decoder is introduced to obtain text features corresponding to each image by aligning class-level features with visual features. 
In addition, to take advantage of class-level features and prototypes, we build a refined prototypical head that generates robust prototypes in the text branch for follow-up measurement. Finally, we aggregate the visual and text logits to calibrate the deviation of a single modality. Extensive experiments demonstrate the competitiveness of LPN against state-of-the-art methods on benchmark datasets.	翻訳日:2023-07-06 17:56:33 公開日:2023-07-04
# ニューラル後継ネットワークと単語埋め込みを用いた概念認知マップ形成 Conceptual Cognitive Maps Formation with Neural Successor Networks and Word Embeddings ( http://arxiv.org/abs/2307.01577v1 ) ライセンス: Link先を確認	Paul Stoewer, Achim Schilling, Andreas Maier and Patrick Krauss	(参考訳) 人間の脳は、環境から受信した情報を文脈化する特別な能力を持っている。嗅内皮質-海馬複合体はこの機能において重要な役割を担っており、場所細胞とグリッド細胞を用いた記憶処理や認知地図の構築に深く関わっている。この能力の理解と活用は、人工知能の分野を著しく強化する可能性がある。マルチスケールの後継表現は、場所細胞とグリッド細胞の機能の優れたモデルとして機能し、すでにこの役割で有望な結果を示している。本稿では,3つの異なる概念の認知マップを構築するために,後継表現とニューラルネットワーク,および単語埋め込みベクトルを用いたモデルを提案する。ネットワークはスケールの異なる2つのマップを学習し、関連する既存の表現の近くに新しい情報を配置する。認知マップ上の情報の分散はスケールによって異なり、強く集中して3つの概念を形成するか、マップ全体に均等に広がる。我々のモデルは、入力と既存の知識表現の類似度指標に基づいて、任意の入力にマルチモーダルな文脈情報を提供することで、現在のAIモデルを改善する可能性を示唆している。 The human brain possesses the extraordinary capability to contextualize the information it receives from our environment. The entorhinal-hippocampal complex plays a critical role in this function, as it is deeply engaged in memory processing and constructing cognitive maps using place and grid cells. Comprehending and leveraging this ability could significantly augment the field of artificial intelligence. The multi-scale successor representation serves as a good model for the functionality of place and grid cells and has already shown promise in this role. Here, we introduce a model that employs successor representations and neural networks, along with word embedding vectors, to construct a cognitive map of three separate concepts. The network adeptly learns two different scaled maps and situates new information in proximity to related pre-existing representations. The dispersion of information across the cognitive map varies according to its scale - either being heavily concentrated, resulting in the formation of the three concepts, or spread evenly throughout the map. We suggest that our model could potentially improve current AI models by providing multi-modal context information to any input, based on a similarity metric for the input and pre-existing knowledge representations.	翻訳日:2023-07-06 17:50:15 公開日:2023-07-04
# Kapitza-Dirac効果におけるスピンフリップの二次元シミュレーション Two-dimensional simulation of the spin-flip in the Kapitza-Dirac effect ( http://arxiv.org/abs/2307.01571v1 ) ライセンス: Link先を確認	Ping Ge, Sven Ahrens, Baifei Shen	(参考訳) 強場の場の量子論における多くの計算は単純な場の幾何学を用いて行われ、しばしば場の空間的エンベロープを無視する。本稿では,ガウスビーム定在光波におけるカピツァ・ディラック効果の電子回折量子力学をシミュレートする。 2次元シミュレーションは、高速フーリエ変換スプリット演算子法を用いてディラック方程式を解くことにより、相対論的枠組みで計算する。数値伝搬法を除いて,結果は近似を適用せずに得られており,カピツァ・ディラック効果においてスピンフリップが可能であることを示す。 Many calculations in strong field quantum field theory are carried out by using a simple field geometry, often neglecting the spatial field envelope. In this article, we simulate the electron diffraction quantum dynamics of the Kapitza-Dirac effect in a Gaussian beam standing light wave. The two-dimensional simulation is computed in a relativistic framework, by solving the Dirac equation with the fast Fourier transform split operator method. Except for the numerical propagation method, our results are obtained without applying approximations and demonstrate that a spin-flip in the Kapitza-Dirac effect is possible.	翻訳日:2023-07-06 17:49:57 公開日:2023-07-04
# 機械学習に基づく侵入検出:特徴選択と特徴抽出 Machine Learning-Based Intrusion Detection: Feature Selection versus Feature Extraction ( http://arxiv.org/abs/2307.01570v1 ) ライセンス: Link先を確認	Vu-Duc Ngo, Tuan-Cuong Vuong, Thien Van Luong, and Hung Tran	(参考訳) スマートシティ、スマート農業、スマートヘルスケア、スマート製造など、多くの分野において、IoT(Internet of Things)が重要な役割を担っている。しかし、IoTデバイスはサイバー攻撃に非常に脆弱であり、セキュリティ侵害やデータ漏洩を引き起こす可能性がある。これらの攻撃を効果的に防止するために、さまざまな機械学習ベースのIoTネットワーク侵入検知手法が開発されており、機械学習モデルに入力される前の入力データの次元を減らすために、しばしば特徴抽出または特徴選択技術のいずれかに依存している。これは、リアルタイム動作が可能な程度まで検出の複雑さを下げることを目的としており、あらゆる侵入検知システムにおいて特に重要である。本稿は,最新のUNSW-NB15データセットを用い,二値分類と多クラス分類の両方について,これら2つの特徴量削減手法を,適合率,再現率,検出精度,実行時の複雑さといった様々なパフォーマンス指標で総合的に比較する。例えば、一般的には、特徴選択法は、特に縮小後の特徴数Kが増加する場合に、特徴抽出よりも優れた検出性能と短いトレーニング・推論時間を提供する。しかし、特徴抽出法は、特にK = 4のようにKが非常に小さい場合に、特徴選択法よりもはるかに信頼性が高い。さらに、特徴抽出は、特徴選択よりも縮小後の特徴数Kの変化に対する感度が低く、これは二値分類と多クラス分類の両方に当てはまる。この比較に基づいて,第IV節末尾の表14に詳述するように,特定のシナリオごとに適切な侵入検出タイプを選択するための有用なガイドラインを提供する。 Internet of things (IoT) has been playing an important role in many sectors, such as smart cities, smart agriculture, smart healthcare, and smart manufacturing. However, IoT devices are highly vulnerable to cyber-attacks, which may result in security breaches and data leakages. To effectively prevent these attacks, a variety of machine learning-based network intrusion detection methods for IoT networks have been developed, which often rely on either feature extraction or feature selection techniques for reducing the dimension of input data before being fed into machine learning models. This aims to make the detection complexity low enough for real-time operations, which is particularly vital in any intrusion detection system. 
This paper provides a comprehensive comparison between these two feature reduction methods of intrusion detection in terms of various performance metrics, namely, precision rate, recall rate, detection accuracy, as well as runtime complexity, in the presence of the modern UNSW-NB15 dataset as well as both binary and multiclass classification. For example, in general, the feature selection method not only provides better detection performance but also lower training and inference time compared to its feature extraction counterpart, especially when the number of reduced features K increases. However, the feature extraction method is much more reliable than its selection counterpart, particularly when K is very small, such as K = 4. Additionally, feature extraction is less sensitive to changing the number of reduced features K than feature selection, and this holds true for both binary and multiclass classifications. Based on this comparison, we provide a useful guideline for selecting a suitable intrusion detection type for each specific scenario, as detailed in Tab. 14 at the end of Section IV.	翻訳日:2023-07-06 17:49:47 公開日:2023-07-04
# 表現学習と不確実性定量化のための最終層状態空間モデル Last layer state space model for representation learning and uncertainty quantification ( http://arxiv.org/abs/2307.01566v1 ) ライセンス: Link先を確認	Max Cohen (TSP), Maurice Charbit, Sylvain Le Corff (TSP)	(参考訳) シーケンシャルなニューラルアーキテクチャがより深く複雑になるにつれて、不確実性の推定はますます困難になる。不確実性を定量化する努力は、しばしば特定の訓練手順に依存し、そのようなモデルの次元性のためにさらなる計算コストを負担する。本稿では,低次元状態学習のための表現学習ステージと不確かさ推定のための状態空間モデルという2つのステップで分類や回帰タスクを分解することを提案する。このアプローチは表現学習と生成モデルの設計を分離することができる。本稿では,モンテカルロ法を用いてパラメータを推定する状態空間ベース最後の層を追加することにより,既存のニューラルネットワーク上に予測分布を推定する方法を実証する。提案手法を,公的なベンチマークデータセットである電気変圧器油温の時間的推定に適用する。我々のモデルは未知変数や未使用変数によるノイズの多いデータ構造を考慮し、予測に信頼区間を提供できる。 As sequential neural architectures become deeper and more complex, uncertainty estimation is more and more challenging. Efforts in quantifying uncertainty often rely on specific training procedures, and bear additional computational costs due to the dimensionality of such models. In this paper, we propose to decompose a classification or regression task in two steps: a representation learning stage to learn low-dimensional states, and a state space model for uncertainty estimation. This approach allows to separate representation learning and design of generative models. We demonstrate how predictive distributions can be estimated on top of an existing and trained neural network, by adding a state space-based last layer whose parameters are estimated with Sequential Monte Carlo methods. We apply our proposed methodology to the hourly estimation of Electricity Transformer Oil temperature, a publicly benchmarked dataset. Our model accounts for the noisy data structure, due to unknown or unavailable variables, and is able to provide confidence intervals on predictions.	翻訳日:2023-07-06 17:49:18 公開日:2023-07-04
# 効率的な探索・活用戦略のための近似情報 Approximate information for efficient exploration-exploitation strategies ( http://arxiv.org/abs/2307.01563v1 ) ライセンス: Link先を確認	Alex Barbier-Chebbah (IP, CNRS, UPCité), Christian L. Vestergaard (IP, CNRS, UPCité), Jean-Baptiste Masson (IP, CNRS, UPCité)	(参考訳) 本稿では,多腕バンディット問題に着目し,意思決定に固有の探索と活用のジレンマについて論じる。この問題では、エージェントが現在の知識を即時の利益のために活用するか、潜在的な長期報酬のために新しい選択肢を探索するかを決定する。本稿では,エントロピー勾配の解析的近似を用いて,各時点でどのアームを引くかを選択する新しいアルゴリズム,近似情報最大化(AIM)を提案する。 AIMはInfomaxおよびThompsonサンプリングと同等の性能を示し、さらに計算速度、決定論性、扱いやすさの面で優れている。 AIMの実証的な評価は、Lai-Robbinsの漸近的な境界に準拠していることを示し、様々な事前分布に対する堅牢性を示している。その表現は調整可能であり、様々な設定で特定の最適化を可能にする。 This paper addresses the exploration-exploitation dilemma inherent in decision-making, focusing on multi-armed bandit problems. The problems involve an agent deciding whether to exploit current knowledge for immediate gains or explore new avenues for potential long-term rewards. We here introduce a novel algorithm, approximate information maximization (AIM), which employs an analytical approximation of the entropy gradient to choose which arm to pull at each point in time. AIM matches the performance of Infomax and Thompson sampling while also offering enhanced computational speed, determinism, and tractability. Empirical evaluation of AIM indicates its compliance with the Lai-Robbins asymptotic bound and demonstrates its robustness for a range of priors. Its expression is tunable, which allows for specific optimization in various settings.	翻訳日:2023-07-06 17:49:04 公開日:2023-07-04
# ポケットサイズのドローン上でセキュアなディープラーニングベースの分散インテリジェンス Secure Deep Learning-based Distributed Intelligence on Pocket-sized Drones ( http://arxiv.org/abs/2307.01559v1 ) ライセンス: Link先を確認	Elia Cereda and Alessandro Giusti and Daniele Palossi	(参考訳) 手のひらサイズのナノドローンはエッジノードの魅力的なクラスであるが、その限られた計算資源は大規模なディープラーニングモデルの機体上での実行を妨げている。エッジフォッグ計算のパラダイムを採用することで、計算の一部をフォグにオフロードすることができるが、フォグノードや通信リンクが信頼できない場合、セキュリティ上の懸念が生じる。そこで本研究では,ナノドローン上でランダムなサブネットワークを冗長に実行することにより,フォグ側の計算を検証する新しい分散エッジフォッグ実行方式を提案する。機体上で完全に動作するState-of-the-Artビジュアルポーズ推定ネットワークと比較して、分散実行されるより大規模なネットワークは$R^2$スコアを+0.19向上させ、攻撃時には95%の確率で2秒以内に検出する。 Palm-sized nano-drones are an appealing class of edge nodes, but their limited computational resources prevent running large deep-learning models onboard. Adopting an edge-fog computational paradigm, we can offload part of the computation to the fog; however, this poses security concerns if the fog node, or the communication link, cannot be trusted. To tackle this concern, we propose a novel distributed edge-fog execution scheme that validates fog computation by redundantly executing a random subnetwork aboard our nano-drone. Compared to a State-of-the-Art visual pose estimation network that entirely runs onboard, a larger network executed in a distributed way improves the $R^2$ score by +0.19; in case of attack, our approach detects it within 2s with 95% probability.	翻訳日:2023-07-06 17:48:51 公開日:2023-07-04
# プロジェクション演算子を用いた2視点学習タスクのスケーラブル変数選択 Scalable variable selection for two-view learning tasks with projection operators ( http://arxiv.org/abs/2307.01558v1 ) ライセンス: Link先を確認	Sandor Szedmak (1), Riikka Huusari (1), Tat Hong Duong Le (1), Juho Rousu (1) ((1) Department of Computer Science, Aalto University, Espoo, Finland)	(参考訳) 本稿では,2視点設定,あるいはベクトル値教師付き学習問題に対する新しい変数選択法を提案する。当社のフレームワークは,データサンプルの数が数百万にものぼる,非常に大規模な選択タスクを処理できる。本手法は,出力変数と高い相関を持ち,かつ既に選択された変数とは相関しない変数を反復的に選択することで変数選択を行う。相関を測るために,提案手法は射影作用素とその代数の概念を用いる。射影作用素を用いると、入力変数と出力変数の集合の間の関係(相関)をカーネル関数によっても表現できるため、非線形相関モデルも活用できる。提案手法を実験的に検証し,合成データと実データの両方において,そのスケーラビリティと選択された特徴の妥当性を示す。キーワード:教師付き変数選択、ベクトル値学習、射影値測度、再生核ヒルベルト空間 In this paper we propose a novel variable selection method for two-view settings, or for vector-valued supervised learning problems. Our framework is able to handle extremely large scale selection tasks, where the number of data samples can reach the millions. In a nutshell, our method performs variable selection by iteratively selecting variables that are highly correlated with the output variables, but which are not correlated with the previously chosen variables. To measure the correlation, our method uses the concept of projection operators and their algebra. With projection operators, the relationship (correlation) between sets of input and output variables can also be expressed by kernel functions, thus nonlinear correlation models can be exploited as well. We experimentally validate our approach, showing on both synthetic and real data its scalability and the relevance of the selected features. Keywords: Supervised variable selection, vector-valued learning, projection-valued measure, reproducing kernel Hilbert space	翻訳日:2023-07-06 17:48:35 公開日:2023-07-04
# 分離道路トーポフォーマー Separated RoadTopoFormer ( http://arxiv.org/abs/2307.01557v1 ) ライセンス: Link先を確認	Mingjie Lu, Yuanxian Huang, Ji Liu, Jinzhang Peng, Lu Tian, Ashish Sirasao	(参考訳) 自動運転を実現するためには、運転シナリオを理解することが不可欠だ。マップ学習やBEVレーン検出といった以前の研究は、レーンインスタンス間の接続関係を無視し、交通要素検出タスクは通常、レーンラインとの関係を無視する。これらの課題に対処するため、交通要素の検出、車線中心線の検出、車線間の接続関係の推論、車線と交通要素の割り当て関係の推論という4つのサブタスクを含むタスクが提示されている。本稿では,車線中心線と交通要素を検出し,それらの関係を推論するエンドツーエンドフレームワークであるSeparated RoadTopoFormerを提案する。各モジュールを別々に最適化することで相互干渉を防ぎ、わずかな微調整でそれらを統合する。 2つの検出ヘッドではオブジェクトを検出するためにDETRライクなアーキテクチャを採用し、関係ヘッドでは、前段の検出器から得た2つのインスタンス特徴を連結し、分類器に入力して関係確率を得る。最終提出は0.445 OLSを達成し、これはサブタスクと総合スコアの両方で競争力がある。 Understanding driving scenarios is crucial to realizing autonomous driving. Previous works such as map learning and BEV lane detection neglect the connection relationship between lane instances, and traffic elements detection tasks usually neglect the relationship with lane lines. To address these issues, a task is presented that includes four sub-tasks: the detection of traffic elements, the detection of lane centerlines, reasoning connection relationships among lanes, and reasoning assignment relationships between lanes and traffic elements. We present Separated RoadTopoFormer to tackle the issues, which is an end-to-end framework that detects lane centerlines and traffic elements and reasons about the relationships among them. We optimize each module separately to prevent interaction with each other and then aggregate them with light fine-tuning. For the two detection heads, we adopted a DETR-like architecture to detect objects, and for the relationship head, we concatenate two instance features from the front detectors and feed them to the classifier to obtain the relationship probability. Our final submission achieves 0.445 OLS, which is competitive in both sub-task and combined scores.	翻訳日:2023-07-06 17:48:19 公開日:2023-07-04
# 対話エージェントの文脈におけるNLGの知識グラフ Knowledge Graph for NLG in the context of conversational agents ( http://arxiv.org/abs/2307.01548v1 ) ライセンス: Link先を確認	Hussam Ghanem (ICB), Massinissa Atmani (ICB), Christophe Cruz (ICB)	(参考訳) 知識グラフ(KG)の使用により、会話エージェントが提供する応答の正確性と包括性が向上する。会話中に回答を生成することは、これらのKGからテキストを生成することで成り立っているが、近年大きな注目を集めている課題であるとみなされている。本稿では,グラフニューラルネットワーク,グラフトランスフォーマー,seq2seqモデルによる線形化など,知識グラフからテキストへの生成に使用されるさまざまなアーキテクチャのレビューを行う。それぞれのアーキテクチャの利点と限界について議論し、アーキテクチャの選択は、目前にあるタスクの特定の要求に依存すると結論付ける。また、特に会話エージェントの文脈において、実行時間やモデルの妥当性といった制約を考慮することの重要性を強調する。これらの制約とDAVIのドメインに対するラベル付きデータの可用性に基づいて、知識グラフからテキスト生成タスクにSeq2seq Transformerベースモデル(PLM)を使用する。我々は PLM 上での kg-to-text 生成のベンチマークデータセットの改良と,今後の作業における感情的・多言語的側面の探索を目的とする。本総説では,知識グラフ・テキスト生成における様々なアプローチについて考察し,今後の研究の方向性について概説する。 The use of knowledge graphs (KGs) enhances the accuracy and comprehensiveness of the responses provided by a conversational agent. While generating answers during conversations consists in generating text from these KGs, it is still regarded as a challenging task that has gained significant attention in recent years. In this document, we provide a review of different architectures used for knowledge graph-to-text generation including: Graph Neural Networks, the Graph Transformer, and linearization with seq2seq models. We discuss the advantages and limitations of each architecture and conclude that the choice of architecture will depend on the specific requirements of the task at hand. We also highlight the importance of considering constraints such as execution time and model validity, particularly in the context of conversational agents. Based on these constraints and the availability of labeled data for the domains of DAVI, we choose to use seq2seq Transformer-based models (PLMs) for the Knowledge Graph-to-Text Generation task. We aim to refine benchmark datasets of kg-to-text generation on PLMs and to explore the emotional and multilingual dimensions in our future work. 
Overall, this review provides insights into the different approaches for knowledge graph-to-text generation and outlines future directions for research in this area.	翻訳日:2023-07-06 17:48:00 公開日:2023-07-04
# EffSeg: 構造保存スパース性を用いた高効率細粒度インスタンスセグメンテーション EffSeg: Efficient Fine-Grained Instance Segmentation using Structure-Preserving Sparsity ( http://arxiv.org/abs/2307.01545v1 ) ライセンス: Link先を確認	Cédric Picron, Tinne Tuytelaars	(参考訳) 多くの2段階のインスタンスセグメンテーションヘッドは、インスタンスごとに粗い28x28マスクを予測しており、多くのオブジェクトのきめ細かい詳細をキャプチャするには不十分である。この問題を解決するため、PointRendとRefineMaskは112x112のセグメンテーションマスクを予測し、より高品質なセグメンテーションをもたらす。しかし、どちらの手法にも限界がある。PointRendは隣接する特徴にアクセスできず、RefineMaskは疎にではなくすべての空間位置で計算を行う。本稿では,能動的特徴量,受動的特徴量,および特徴インデックスを含む密な2次元インデックスマップを別々に保存する構造保存スパーシティ(SPS)法を用いて,細粒度のインスタンスセグメンテーションを効率的に行うEffSegを提案する。インデックスマップの目的は、任意の2D操作を引き続き実行できるように、特徴間の2D空間構成(構造)を維持することである。 EffSegは、RefineMaskと比較してCOCOで同様のパフォーマンスを実現し、FLOPの数を71%削減し、FPSを29%増やした。コードはリリースされる。 Many two-stage instance segmentation heads predict a coarse 28x28 mask per instance, which is insufficient to capture the fine-grained details of many objects. To address this issue, PointRend and RefineMask predict a 112x112 segmentation mask resulting in higher quality segmentations. Both methods however have limitations by either not having access to neighboring features (PointRend) or by performing computation at all spatial locations instead of sparsely (RefineMask). In this work, we propose EffSeg performing fine-grained instance segmentation in an efficient way by using our Structure-Preserving Sparsity (SPS) method based on separately storing the active features, the passive features and a dense 2D index map containing the feature indices. The goal of the index map is to preserve the 2D spatial configuration or structure between the features such that any 2D operation can still be performed. EffSeg achieves similar performance on COCO compared to RefineMask, while reducing the number of FLOPs by 71% and increasing the FPS by 29%. Code will be released.	翻訳日:2023-07-06 17:47:38 公開日:2023-07-04
# 過信は危険なこと:信頼の低い予測によってメンバーシップ推論攻撃を緩和する Overconfidence is a Dangerous Thing: Mitigating Membership Inference Attacks by Enforcing Less Confident Prediction ( http://arxiv.org/abs/2307.01610v1 ) ライセンス: Link先を確認	Zitao Chen, Karthik Pattabiraman	(参考訳) 機械学習(ml)モデルはメンバーシップ推論攻撃(mia)に対して脆弱であり、与えられた入力がターゲットモデルのトレーニングに使用されるかどうかを判断する。 MIAを緩和する取り組みは数多くあるが、プライバシ保護の制限、大きな精度低下、および/または取得が困難な追加データを必要とする場合が多い。本研究は,強力なメンバーシッププライバシと高い精度を,余分なデータを必要とせずに達成できる防衛技術であるhampを提案する。異なる形式でMIAを緩和するために、異なるプロキシを通してトレーニングサンプルを予測する際に、MLモデルの過信を利用するため、それらが統一可能であることを観察する。これにより、モデルによる自信のない予測を強制するモチベーションが増し、トレーニングやテストサンプルで同じように振る舞うようになります。 HAMPは、高いエントロピーのソフトラベルを持つ新しいトレーニングフレームワークと、高い精度を保ちながらモデルの予測を制約するエントロピーベースの正規化器で構成されている。プライバシーリスクをさらに軽減するため、HAMPは全ての予測出力を均一に修正し、精度を維持しながら低信頼の出力となるようにし、メンバーと非メンバーの予測の違いを効果的に曖昧にする。 5つのベンチマークデータセットに対して広範な評価を行い、HAMPが常に高い精度と強力な会員プライバシーを提供することを示す。最先端の7つの防衛技術と比較すると、HAMPはそれらの技術よりも優れたプライバシーとユーティリティのトレードオフを実現している。 Machine learning (ML) models are vulnerable to membership inference attacks (MIAs), which determine whether a given input is used for training the target model. While there have been many efforts to mitigate MIAs, they often suffer from limited privacy protection, large accuracy drop, and/or requiring additional data that may be difficult to acquire. This work proposes a defense technique, HAMP that can achieve both strong membership privacy and high accuracy, without requiring extra data. To mitigate MIAs in different forms, we observe that they can be unified as they all exploit the ML model's overconfidence in predicting training samples through different proxies. This motivates our design to enforce less confident prediction by the model, hence forcing the model to behave similarly on the training and testing samples. HAMP consists of a novel training framework with high-entropy soft labels and an entropy-based regularizer to constrain the model's prediction while still achieving high accuracy. 
To further reduce privacy risk, HAMP uniformly modifies all the prediction outputs to become low-confidence outputs while preserving the accuracy, which effectively obscures the differences between the prediction on members and non-members. We conduct extensive evaluation on five benchmark datasets, and show that HAMP provides consistently high accuracy and strong membership privacy. Our comparison with seven state-of-the-art defenses shows that HAMP achieves a superior privacy-utility trade off than those techniques.	翻訳日:2023-07-06 17:40:46 公開日:2023-07-04
# L2ロシア語における文法的誤り訂正のための言語モデル A Language Model for Grammatical Error Correction in L2 Russian ( http://arxiv.org/abs/2307.01609v1 ) ライセンス: Link先を確認	Nikita Remnev, Sergei Obiedkov, Ekaterina Rakhilina, Ivan Smirnov, Anastasia Vyrenkova	(参考訳) 文法的誤り訂正は自然言語処理の基本課題の1つである。ロシア語では、ほとんどのスペルチェッカーは正確なタイポスやその他の単純なエラーを高精度で利用できるが、非ネイティブ(L2)文字に直面すると失敗することが多い。本稿では,L2ロシア文字の誤り訂正を目的とした言語モデルを含むパイプラインを提案する。提案する言語モデルは,ロシア国立コーパスの新聞サブコーパスの未タグテキストに基づいて学習し,その品質をRULEC-GECコーパスに対して検証する。 Grammatical error correction is one of the fundamental tasks in Natural Language Processing. For the Russian language, most of the spellcheckers available correct typos and other simple errors with high accuracy, but often fail when faced with non-native (L2) writing, since the latter contains errors that are not typical for native speakers. In this paper, we propose a pipeline involving a language model intended for correcting errors in L2 Russian writing. The language model proposed is trained on untagged texts of the Newspaper subcorpus of the Russian National Corpus, and the quality of the model is validated against the RULEC-GEC corpus.	翻訳日:2023-07-06 17:39:41 公開日:2023-07-04
# 時系列異常検出の説明としてのプロトタイプ Prototypes as Explanation for Time Series Anomaly Detection ( http://arxiv.org/abs/2307.01601v1 ) ライセンス: Link先を確認	Bin Li, Carsten Jentsch, Emmanuel Müller	(参考訳) 多くのビッグデータアプリケーションにおいて、時系列における一定の規則的反復パターンから逸脱する異常パターンの検出が不可欠である。しかしながら、ラベルの欠如、時系列データの動的性質、予期せぬ異常な振る舞いにより検出プロセスが困難になる。近年の深層異常検出手法の成功にもかかわらず、このようなブラックボックスモデルにおける不可解なメカニズムは、安全クリティカルなアプリケーションにおいて新たな課題となっている。モデルの透明性と予測信頼性の欠如は、そのような領域のさらなるブレークスルーを妨げる。本稿では,プロトタイプを用いて異常検出時の正規パターン状態の例に基づく説明を行うProtoADを提案する。検出パフォーマンスに大きな影響を与えることなく、プロトタイプは深いブラックボックスモデルに光を当て、ドメインの専門家やステークホルダーに直感的な理解を提供する。分類問題において広く用いられているプロトタイプ学習を異常検出に拡張する。潜在空間と入力空間のプロトタイプの両方を可視化することにより、正規データがどのようにモデル化され、なぜ特定のパターンが異常とみなされるかを直感的に示す。 Detecting abnormal patterns that deviate from a certain regular repeating pattern in time series is essential in many big data applications. However, the lack of labels, the dynamic nature of time series data, and unforeseeable abnormal behaviors make the detection process challenging. Despite the success of recent deep anomaly detection approaches, the mystical mechanisms in such black-box models have become a new challenge in safety-critical applications. The lack of model transparency and prediction reliability hinders further breakthroughs in such domains. This paper proposes ProtoAD, using prototypes as the example-based explanation for the state of regular patterns during anomaly detection. Without significant impact on the detection performance, prototypes shed light on the deep black-box models and provide intuitive understanding for domain experts and stakeholders. We extend the widely used prototype learning in classification problems into anomaly detection. By visualizing both the latent space and input space prototypes, we intuitively demonstrate how regular data are modeled and why specific patterns are considered abnormal.	翻訳日:2023-07-06 17:39:24 公開日:2023-07-04
# オンチェーンデータを用いたスケーラブル強化学習システムによる暗号ポートフォリオ管理 A Scalable Reinforcement Learning-based System Using On-Chain Data for Cryptocurrency Portfolio Management ( http://arxiv.org/abs/2307.01599v1 ) ライセンス: Link先を確認	Zhenhan Huang and Fumihide Tanaka	(参考訳) ブロックチェーンネットワークのオンチェーンデータ(メトリックス)は、企業の基本と似ていて、ネットワークに対する重要かつ包括的な洞察を提供する。その情報的性質にもかかわらず、オンチェーンデータは暗号(crypto)ポートフォリオ管理(pm)のための強化学習(rl)ベースのシステムでは利用されていない。興味深い課題は、オンチェーンデータの利用によって、ベースラインと比較してRLベースのシステムの戻り性能が向上する範囲である。そこで本研究では,エンドツーエンド暗号pmにオンチェーンデータを組み込んだ新しいrlベースシステムであるcryptorlpmを提案する。 cryptorlpmは情報理解から取引注文実行までの5つのユニットで構成される。 CryptoRLPMでは、オンチェーンデータを各暗号に対してテストして指定し、メトリクスの非効率性の問題を解決する。さらに、CryptoRLPMのスケーラブルな性質により、いつでもポートフォリオの暗号を変更することができる。 3つのポートフォリオのバックテスト結果から、CryptoRLPMは、累積リターン率(ARR)、毎日リターン率(DRR)、ソルティーノ比(SR)の点で、すべてのベースラインを上回ります。特にBitcoinと比較して、CryptoRLPMはARR、DRR、SRをそれぞれ83.14%、0.5603%、および2.1767で強化している。 On-chain data (metrics) of blockchain networks, akin to company fundamentals, provide crucial and comprehensive insights into the networks. Despite their informative nature, on-chain data have not been utilized in reinforcement learning (RL)-based systems for cryptocurrency (crypto) portfolio management (PM). An intriguing subject is the extent to which the utilization of on-chain data can enhance an RL-based system's return performance compared to baselines. Therefore, in this study, we propose CryptoRLPM, a novel RL-based system incorporating on-chain data for end-to-end crypto PM. CryptoRLPM consists of five units, spanning from information comprehension to trading order execution. In CryptoRLPM, the on-chain data are tested and specified for each crypto to solve the issue of ineffectiveness of metrics. Moreover, the scalable nature of CryptoRLPM allows changes in the portfolios' cryptos at any time. Backtesting results on three portfolios indicate that CryptoRLPM outperforms all the baselines in terms of accumulated rate of return (ARR), daily rate of return (DRR), and Sortino ratio (SR). 
Particularly, when compared to Bitcoin, CryptoRLPM enhances the ARR, DRR, and SR by at least 83.14%, 0.5603%, and 2.1767, respectively.	翻訳日:2023-07-06 17:39:08 公開日:2023-07-04
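The abstract above uses the Sortino ratio (SR) as one of its evaluation metrics. For orientation, here is a minimal sketch of the common textbook form: mean excess return over a target, divided by the downside deviation (the root mean square of returns that fall below the target). The function name `sortino_ratio`, the default target of 0, and the handling of the zero-downside case are assumptions for illustration; the abstract does not say which variant or return frequency the authors used.

```python
import math


def sortino_ratio(returns, target=0.0):
    """Textbook Sortino ratio: mean(r - target) / downside deviation.

    Hypothetical helper for illustration only; not the paper's implementation.
    """
    if not returns:
        raise ValueError("returns must be non-empty")
    n = len(returns)
    mean_excess = sum(r - target for r in returns) / n
    # Only returns below the target contribute to the downside deviation.
    downside_sq = sum(min(0.0, r - target) ** 2 for r in returns)
    downside_dev = math.sqrt(downside_sq / n)
    if downside_dev == 0.0:
        # No return fell below the target; the ratio is undefined.
        # Returning inf (or 0.0 when there is no excess return) is a convention chosen here.
        return math.inf if mean_excess > 0 else 0.0
    return mean_excess / downside_dev


if __name__ == "__main__":
    daily_returns = [0.02, -0.01, 0.03, -0.02]
    print(sortino_ratio(daily_returns))
```

Unlike the Sharpe ratio, this penalizes only downside volatility, which is why it is often preferred for assets with strongly asymmetric returns.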
# ピーク時間連続予測におけるパフォーマンスギャップのブリッジ: Seq2Peakフレームワーク Bridge the Performance Gap in Peak-hour Series Forecasting: The Seq2Peak Framework ( http://arxiv.org/abs/2307.01597v1 ) ライセンス: Link先を確認	Zhenwei Zhang, Xin Wang, Jingyuan Xie, Heling Zhang, Yuantao Gu	(参考訳) Peak-Hour Series Forecasting (PHSF) は、様々な領域において重要で未探索の課題である。最先端のディープラーニングモデルは通常の時系列予測(TSF)では優れていますが、PHSFでは同等の結果を得るのに苦労しています。これは、ピーク時系列における高い非定常性によって引き起こされる課題によるもので、これは通常の TSF よりも直接予測が困難である。さらに、定期的な予測結果から手動で最大値を抽出すると、平均赤字を最小化するモデルによる最適化性能が低下する。これらの問題に対処するため,本論文では,PHSFタスク用に設計された新しいフレームワークであるSeq2Peakについて述べる。 Seq2Peakは、非定常性問題を緩和するCyclicNormパイプラインと、オリジナルのシリーズとピーク時間の両方を教師付き信号として利用するハイブリッド損失関数を備えた単純なトレーニング可能なパラメータなしピーク時デコーダの2つの重要なコンポーネントを提供する。公開されている時系列データセットに対する大規模な実験は、提案されたフレームワークの有効性を示し、トランスフォーマーと非トランスフォーマーベースのTSFモデルの両方に対して、4つの実世界のデータセットに対して37.7\%の顕著な平均相対的な改善をもたらす。 Peak-Hour Series Forecasting (PHSF) is a crucial yet underexplored task in various domains. While state-of-the-art deep learning models excel in regular Time Series Forecasting (TSF), they struggle to achieve comparable results in PHSF. This can be attributed to the challenges posed by the high degree of non-stationarity in peak-hour series, which makes direct forecasting more difficult than standard TSF. Additionally, manually extracting the maximum value from regular forecasting results leads to suboptimal performance due to models minimizing the mean deficit. To address these issues, this paper presents Seq2Peak, a novel framework designed specifically for PHSF tasks, bridging the performance gap observed in TSF models. Seq2Peak offers two key components: the CyclicNorm pipeline to mitigate the non-stationarity issue, and a simple yet effective trainable-parameter-free peak-hour decoder with a hybrid loss function that utilizes both the original series and peak-hour series as supervised signals. 
Extensive experiments on publicly available time series datasets demonstrate the effectiveness of the proposed framework, yielding an average relative improvement of 37.7% across four real-world datasets for both transformer- and non-transformer-based TSF models.	翻訳日:2023-07-06 17:38:47 公開日:2023-07-04
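The Seq2Peak abstract above contrasts forecasting the peak-hour series directly with extracting the maximum from a regular forecast, and mentions a hybrid loss supervised by both the original and the peak-hour series. The following is a minimal sketch of those two ideas, not the authors' implementation. It assumes the peak-hour series is the per-period maximum of the original series, and the function names, the use of MSE, and the `peak_weight` parameter are all hypothetical.

```python
def peak_hour_series(series, period):
    """Collapse a series into one value per period: the maximum in that period.

    Assumption: "peak-hour series" means the per-period (e.g. daily) maximum.
    """
    if period <= 0 or len(series) % period != 0:
        raise ValueError("series length must be a positive multiple of period")
    return [max(series[i:i + period]) for i in range(0, len(series), period)]


def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def hybrid_loss(pred, target, period, peak_weight=0.5):
    """Hypothetical hybrid loss: weighted sum of the error on the full series
    and the error on the derived peak-hour series."""
    full = mse(pred, target)
    peak = mse(peak_hour_series(pred, period), peak_hour_series(target, period))
    return (1.0 - peak_weight) * full + peak_weight * peak
```

For example, `peak_hour_series([1, 3, 2, 5, 4, 0], 3)` keeps one maximum per window of three values. A forecast that is accurate on average but misses each window's maximum is penalised more under `hybrid_loss` than under plain MSE.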
# プロンプトチューニングは、より遠く、対照的な学習を引き寄せる: 社会的バイアスを軽減するための2段階アプローチ Prompt Tuning Pushes Farther, Contrastive Learning Pulls Closer: A Two-Stage Approach to Mitigate Social Biases ( http://arxiv.org/abs/2307.01595v1 ) ライセンス: Link先を確認	Yingji Li, Mengnan Du, Xin Wang, Ying Wang	(参考訳) プレトレーニング言語モデル(PLM)の表現能力が向上するにつれて、未処理のコーパスから社会的バイアスを継承するという懸念が高まっている。これまでのデバイアス技術のほとんどは、トレーニングコーパスのバランスをとるために、CDA(Counterfactual Data Augmentation)を使用していた。しかし、CDAは元のコーパスをわずかに修正し、異なる人口集団間の表現距離を狭い範囲に制限する。その結果,デバイアス化モデルは,テキストリソースの制限によるデバイアス化性能に影響を及ぼす対物対の違いに容易に適合することがわかった。本稿では,PLMのエンコーディングにおける社会的バイアスを軽減するために,Contrastive Learning with Continuous Prompt Augmentation (CCPA) を用いた対角的学習による2段階脱バイアスモデルを提案する。第1段階では,連続的なプロンプトチューニングに基づくデータ拡張法を提案する。第2段階では、コントラスト学習を利用して、強化されたサンプルペア間の表現距離を絞り、微調整されたPLMのパラメータをデバイアス符号化する。本手法は,トレーニングプロセスに難易度を加えることで,よりデバイアスな性能を達成するためのモデル指導を行う。大規模な実験の結果,CCPAはデバイアス性能においてベースラインよりも優れていた。一方、GLUEベンチマーク実験の結果、CCPAはPLMの言語モデリング能力を保っていることが示された。 As the representation capability of Pre-trained Language Models (PLMs) improve, there is growing concern that they will inherit social biases from unprocessed corpora. Most previous debiasing techniques used Counterfactual Data Augmentation (CDA) to balance the training corpus. However, CDA slightly modifies the original corpus, limiting the representation distance between different demographic groups to a narrow range. As a result, the debiasing model easily fits the differences between counterfactual pairs, which affects its debiasing performance with limited text resources. In this paper, we propose an adversarial training-inspired two-stage debiasing model using Contrastive learning with Continuous Prompt Augmentation (named CCPA) to mitigate social biases in PLMs' encoding. In the first stage, we propose a data augmentation method based on continuous prompt tuning to push farther the representation distance between sample pairs along different demographic groups. 
In the second stage, we utilize contrastive learning to pull closer the representation distance between the augmented sample pairs and then fine-tune PLMs' parameters to get debiased encoding. Our approach guides the model to achieve stronger debiasing performance by adding difficulty to the training process. Extensive experiments show that CCPA outperforms baselines in terms of debiasing performance. Meanwhile, experimental results on the GLUE benchmark show that CCPA retains the language modeling capability of PLMs.	翻訳日:2023-07-06 17:38:23 公開日:2023-07-04
# ディスプレイ広告における多要素創造のためのクロス要素組合せ選択 Cross-Element Combinatorial Selection for Multi-Element Creative in Display Advertising ( http://arxiv.org/abs/2307.01593v1 ) ライセンス: Link先を確認	Wei Zhang, Ping Zhang, Jian Dong, Yongkang Wang, Pengye Zhang, Bo Zhang, Xingxing Wang, Dong Wang	(参考訳) 広告制作の効果は、その視覚的な外観に大きく影響される。広告プラットフォームは、広告主が提供するクリエイティブ要素を組み合わせることで、異なる外観で広告クリエイティブを生成できる。しかし、広告クリエイティブ要素の増加に伴い、数え切れないほどの可能性から適切な組み合わせを選択することは困難になっている。業界主流のアプローチは、個別の創造的要素を個別に選択することであり、モデリングプロセスにおける創造的要素間の相互作用の重要性をしばしば見落としている。そこで本稿では,複数の創造的要素を対象とした多要素組合せ選択フレームワークcecsを提案する。エンコーダプロセスでは、現在候補の創造性に基づいて単一の創造的要素の表現を動的に調整するために、クロス要素相互作用が採用される。デコーダプロセスでは、創造的組み合わせ問題は複数の創造的要素のカスケード選択問題に変換される。候補間の関連をモデル化するためにカスケード設計を用いたポインタ機構を用いる。実世界のデータセットに関する総合的な実験は、CECSがオフラインメトリクスのSOTAスコアを達成したことを示している。さらに,cecsアルゴリズムが産業応用に応用され,ビジネス上有益である 6.02% ctr と 10.37% gmv lift が実現されている。 The effectiveness of ad creatives is greatly influenced by their visual appearance. Advertising platforms can generate ad creatives with different appearances by combining creative elements provided by advertisers. However, with the increasing number of ad creative elements, it becomes challenging to select a suitable combination from the countless possibilities. The industry's mainstream approach is to select individual creative elements independently, which often overlooks the importance of interaction between creative elements during the modeling process. In response, this paper proposes a Cross-Element Combinatorial Selection framework for multiple creative elements, termed CECS. In the encoder process, a cross-element interaction is adopted to dynamically adjust the expression of a single creative element based on the current candidate creatives. In the decoder process, the creative combination problem is transformed into a cascade selection problem of multiple creative elements. A pointer mechanism with a cascade design is used to model the associations among candidates. 
Comprehensive experiments on real-world datasets show that CECS achieves state-of-the-art scores on offline metrics. Moreover, the CECS algorithm has been deployed in our industrial application, resulting in lifts of 6.02% in CTR and 10.37% in GMV, which is beneficial to the business.	翻訳日:2023-07-06 17:37:58 公開日:2023-07-04
# ニューラルネットワークを用いたリー群対称性変換の学習 Learning Lie Group Symmetry Transformations with Neural Networks ( http://arxiv.org/abs/2307.01583v1 ) ライセンス: Link先を確認	Alex Gabel, Victoria Klein, Riccardo Valperga, Jeroen S. W. Lamb, Kevin Webster, Rick Quax, Efstratios Gavves	(参考訳) データセットにおける対称性の存在を検出し定量化する問題は、モデル選択、生成モデリング、データ解析などに有用である。ニューラルネットワークにおける既存のハードコーディング変換法では、そのタスクの対称性に関する事前の知識を必要とするが、この研究は、データセットに存在する未知の対称性、すなわち、通常フィールドで考慮される従来のもの(回転、スケーリング、翻訳)を超えたリー群対称性変換の発見と特徴付けに焦点を当てている。具体的には、データポイントごとに異なるパラメータ値を持つ変換の1パラメータサブグループによってデータセットが変換されるシナリオを検討する。我々の目標は、変換群とパラメータ値の分布を特徴付けることである。その結果,両環境におけるアプローチの有効性が示された。 The problem of detecting and quantifying the presence of symmetries in datasets is useful for model selection, generative modeling, and data analysis, amongst others. While existing methods for hard-coding transformations in neural networks require prior knowledge of the symmetries of the task at hand, this work focuses on discovering and characterizing unknown symmetries present in the dataset, namely, Lie group symmetry transformations beyond the traditional ones usually considered in the field (rotation, scaling, and translation). Specifically, we consider a scenario in which a dataset has been transformed by a one-parameter subgroup of transformations with different parameter values for each data point. Our goal is to characterize the transformation group and the distribution of the parameter values. The results showcase the effectiveness of the approach in both these settings.	翻訳日:2023-07-06 17:37:40 公開日:2023-07-04
# IAdet:最も単純なHuman-in-the-loopオブジェクト検出 IAdet: Simplest human-in-the-loop object detection ( http://arxiv.org/abs/2307.01582v1 ) ライセンス: Link先を確認	Franco Marchesoni-Acland, Gabriele Facciolo	(参考訳) この研究は、Intelligent Annotation (IA) と名付けた、データをアノテートしながらモデルをトレーニングするための戦略を提案する。 IAには,(1)データアノテーション支援,(2)バックグラウンドでのモデルトレーニング,(3)次のデータポイントのアクティブ選択という3つのモジュールが含まれている。このフレームワークのもとで、シングルクラスのオブジェクト検出に特化したIAdetツールをオープンソース化する。さらに,そのようなHuman-in-the-loopシステムを自動的に評価する手法も考案した。 PASCAL VOCデータセットの場合、IAdetツールは、トレーニング済みのモデルを無償で提供しながら、データベースアノテーションの時間を25%短縮する。これらの結果は、意図的に非常に単純にしたIAdet設計で得られたものである。そのため、IAdetには容易な改善の余地が複数あり、強力なHuman-in-the-loopオブジェクト検出システムへの道を開く。 This work proposes a strategy, named Intelligent Annotation (IA), for training models while annotating data. IA involves three modules: (1) assisted data annotation, (2) background model training, and (3) active selection of the next datapoints. Under this framework, we open-source the IAdet tool, which is specific to single-class object detection. Additionally, we devise a method for automatically evaluating such a human-in-the-loop system. For the PASCAL VOC dataset, the IAdet tool reduces the database annotation time by 25% while providing a trained model for free. These results are obtained for a deliberately very simple IAdet design. As a consequence, IAdet is susceptible to multiple easy improvements, paving the way for powerful human-in-the-loop object detection systems.	翻訳日:2023-07-06 17:37:27 公開日:2023-07-04
# Human-in-the-Loopアノテーションのための最適かつ効率的なバイナリ質問 Optimal and Efficient Binary Questioning for Human-in-the-Loop Annotation ( http://arxiv.org/abs/2307.01578v1 ) ライセンス: Link先を確認	Franco Marchesoni-Acland, Jean-Michel Morel, Josselin Kherroubi, Gabriele Facciolo	(参考訳) データアノテーションは、人工知能ソリューションの解釈、研究、開発において極めて重要であるが、アクティブラーニングやFew-Shotラーニングのようなほとんどの研究は、サンプル効率問題に焦点を当てている。本稿では, 予測器が与えられたときにアノテーション付きデータを得るという、見過ごされてきた補完的な問題について検討する。単純な二項分類設定では、最適一般解から実用的な方法まで幅広いスペクトルを提示する。この問題は、予測器が利用可能な場合に、最小のyes/no質問数でバイナリ分類データセットを完全にアノテーションする問題として定式化される。一般的な二値質問の場合、解は符号理論において見出され、最適な質問戦略は可能なラベル付けのハフマン符号化によって与えられる。しかし、このアプローチは小さなデータセットサイズであっても計算が難しい。本稿では,いくつかのヒューリスティックスとプロキシコスト関数のルックアヘッド最小化に基づく代替実用ソリューションを提案する。提案手法は最適解と比較して解析され、複数の合成および実世界のデータセットで評価される。これらのデータセットでは、アノテーションの効率が大幅に向上する(23〜86%)。 Even though data annotation is extremely important for interpretability, research and development of artificial intelligence solutions, most research efforts such as active learning or few-shot learning focus on the sample efficiency problem. This paper studies the neglected complementary problem of obtaining annotated data given a predictor. For the simple binary classification setting, we present the spectrum ranging from optimal general solutions to practical efficient methods. The problem is framed as the full annotation of a binary classification dataset with the minimal number of yes/no questions when a predictor is available. For the case of general binary questions the solution is found in coding theory, where the optimal questioning strategy is given by the Huffman encoding of the possible labelings. However, this approach is computationally intractable even for small dataset sizes. We propose an alternative practical solution based on several heuristics and lookahead minimization of proxy cost functions. The proposed solution is analysed, compared with optimal solutions and evaluated on several synthetic and real-world datasets. On these datasets, the method allows a significant improvement (23-86%) in annotation efficiency.	翻訳日:2023-07-06 17:37:15 公開日:2023-07-04
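The abstract above states that the optimal yes/no questioning strategy is the Huffman encoding of the possible labelings, and that this is intractable even for small datasets. A small sketch can make both points concrete. It assumes the predictor outputs independent per-item probabilities `p[i] = P(label_i = 1)`; that independence assumption and all function names are illustrative and not taken from the paper.

```python
import heapq
from itertools import product


def labeling_probs(p):
    """Probability of every possible labeling, assuming independent items."""
    probs = {}
    for labels in product((0, 1), repeat=len(p)):
        q = 1.0
        for pi, y in zip(p, labels):
            q *= pi if y else 1.0 - pi
        probs[labels] = q
    return probs


def huffman_depths(probs):
    """Code length (number of yes/no questions) per symbol in a Huffman code."""
    # Heap entries: (probability, unique tie-breaker, symbols in this subtree).
    heap = [(q, i, [s]) for i, (s, q) in enumerate(probs.items())]
    heapq.heapify(heap)
    depth = {s: 0 for s in probs}
    counter = len(heap)
    while len(heap) > 1:
        q1, _, s1 = heapq.heappop(heap)
        q2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            depth[s] += 1  # every merge adds one question for these symbols
        heapq.heappush(heap, (q1 + q2, counter, s1 + s2))
        counter += 1
    return depth


def expected_questions(p):
    """Expected number of yes/no questions to annotate all items."""
    probs = labeling_probs(p)
    depth = huffman_depths(probs)
    return sum(probs[s] * depth[s] for s in probs)
```

With an uninformative predictor (`p = [0.5, 0.5, 0.5]`) the expected cost equals one question per item, while a confident predictor (`p = [0.9, 0.9, 0.9]`) needs fewer on average. The enumeration in `labeling_probs` grows as 2^n in the number of items, which is one way to see why the exact approach does not scale.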
# In-Domain Self-Supervised Learningはリモートセンシング画像分類の改善につながる In-Domain Self-Supervised Learning Can Lead to Improvements in Remote Sensing Image Classification ( http://arxiv.org/abs/2307.01645v1 ) ライセンス: Link先を確認	Ivica Dimitrovski, Ivan Kitanovski, Nikola Simidjievski, Dragi Kocev	(参考訳) 自己教師あり学習(ssl)は、大量のラベルなしデータを活用できるため、リモートセンシング画像分類に有望なアプローチとして登場した。従来の教師付き学習とは対照的に、sslは明示的なラベルなしでデータの表現を学ぶことを目指している。これは、ラベルのないデータのための擬似ラベルを作成し、事前学習されたモデルを学ぶために使用できる補助タスクを定式化することで達成される。事前学習されたモデルは、リモートセンシングイメージシーンの分類のような下流タスクで微調整することができる。本稿では,様々なリモートセンシング画像シーン分類データセットをダウンストリームタスクとして用いた,大規模なラベルなしリモートセンシングデータセットであるMillion AIDを用いたSSL事前トレーニングの有効性を解析する。具体的には、ImageNetデータセットを用いたViTの教師付き事前トレーニングとは対照的に、iBOTフレームワークとビジョントランスフォーマー(ViT)を併用したSSL事前トレーニングの有効性を評価する。さまざまな特性を持つ14のデータセットにわたる包括的な実験の結果、ドメイン内のSSLは、教師付きデータセットと比較してモデルの予測パフォーマンスを改善することが判明した。 Self-supervised learning (SSL) has emerged as a promising approach for remote sensing image classification due to its ability to leverage large amounts of unlabeled data. In contrast to traditional supervised learning, SSL aims to learn representations of data without the need for explicit labels. This is achieved by formulating auxiliary tasks that can be used to create pseudo-labels for the unlabeled data and learn pre-trained models. The pre-trained models can then be fine-tuned on downstream tasks such as remote sensing image scene classification. The paper analyzes the effectiveness of SSL pre-training using Million AID - a large unlabeled remote sensing dataset on various remote sensing image scene classification datasets as downstream tasks. More specifically, we evaluate the effectiveness of SSL pre-training using the iBOT framework coupled with Vision transformers (ViT) in contrast to supervised pre-training of ViT using the ImageNet dataset. The comprehensive experimental work across 14 datasets with diverse properties reveals that in-domain SSL leads to improved predictive performance of models compared to the supervised counterparts.	
翻訳日:2023-07-06 17:31:05 公開日:2023-07-04
# ツール対応会話エージェントの挿入拡大 Insert-expansions for Tool-enabled Conversational Agents ( http://arxiv.org/abs/2307.01644v1 ) ライセンス: Link先を確認	Andreas Göldi and Roman Rietsche	(参考訳) 本稿では,このプロンプト手法によって生成された明示的な推論パスにおけるツール(あるいはプラグイン)の使用に注目し,大規模言語モデルにおける思考連鎖の高度な実装について述べる。ツールが使える会話エージェントは、検索エンジンや電卓などのツールが本来のユーザー意図から逸脱するなど、サイドトラック化されることが多い。そこで我々は,ユーザがツールになり,必要な詳細を提供し,リクエストを精査するコンセプトを探求する。会話分析を通して、我々はこの相互作用を、好ましい応答を促進するために設計された中間的会話である挿入膨張として特徴づける。我々は,この「ユーザ・アズ・ア・ツール」アプローチから生じる可能性について,直接比較による2つの経験的研究から検討し,レコメンデーション領域の利点を見出す。 This paper delves into an advanced implementation of Chain-of-Thought-Prompting in Large Language Models, focusing on the use of tools (or "plug-ins") within the explicit reasoning paths generated by this prompting method. We find that tool-enabled conversational agents often become sidetracked, as additional context from tools like search engines or calculators diverts from original user intents. To address this, we explore a concept wherein the user becomes the tool, providing necessary details and refining their requests. Through Conversation Analysis, we characterize this interaction as insert-expansion - an intermediary conversation designed to facilitate the preferred response. We explore possibilities arising from this 'user-as-a-tool' approach in two empirical studies using direct comparison, and find benefits in the recommendation domain.	翻訳日:2023-07-06 17:30:48 公開日:2023-07-04
# 知識の強化を促進する思考連鎖 Chain of Thought Prompting Elicits Knowledge Augmentation ( http://arxiv.org/abs/2307.01640v1 ) ライセンス: Link先を確認	Dingjun Wu, Jing Zhang, Xinmei Huang	(参考訳) 知識強化されたディープラーニングパラダイムは、ドメイン知識を同定し、深層モデルに統合するパラダイムを指す。従来の手法では、様々なソースから外部知識を集めるためにタスク固有のアプローチが用いられる。対照的に、大きな言語モデルは広範囲に事前訓練されており、外部知識の包括的な情報源として機能する。本稿では,深層学習のための知識を増強するChain-of-Thoughtベースの手法であるCoT-KAを提案する。 CoT-KAは、従来の拡張手法に必要な知識検索や知識推論モデルの必要性を回避する。以上の結果から,CoT-KAは,さまざまな推論タスクにおいて利用可能な11のベンチマークの過半数において,純粋なCoT法と非拡張法の両方に優れることが示された。 The knowledge-augmented deep learning paradigm refers to a paradigm in which domain knowledge is identified and integrated into deep models. Conventional methods typically employ task-specific approaches to gather external knowledge from various sources. In contrast, large language models are extensively pre-trained and can serve as a comprehensive source of external knowledge. In this paper, we propose CoT-KA, a Chain-of-Thought-based method that augments knowledge for deep learning. CoT-KA avoids the need for additional knowledge retrieval or knowledge reasoning models, as required in conventional augmentation methods. Our results demonstrate that CoT-KA outperforms both pure CoT-based methods and the non-augmented method across the majority of eleven publicly available benchmarks for various reasoning tasks.	翻訳日:2023-07-06 17:30:32 公開日:2023-07-04
# 相互コヒーレンス近似のためのヒューリスティックアルゴリズム Heuristic Algorithms for the Approximation of Mutual Coherence ( http://arxiv.org/abs/2307.01639v1 ) ライセンス: Link先を確認	Gregor Betz, Vera Chekan, Tamara Mchedlidze	(参考訳) 相互一貫性は2つの意見の類似性の尺度である。この概念は哲学に由来するが、Wahl-O-Matシステムのような幅広い技術には必須である。ドイツでは、この制度は有権者が政治的嗜好に最も近い候補者を見つけるのに役立つ。相互コヒーレンスの正確な計算は、意見のすべての部分集合の反復のために非常に時間がかかる。さらに、各サブセットに対して、SATモデルカウント問題(英語版)のインスタンスを解く必要があるが、これはコンピュータ科学において難しい問題である。この研究は、この計算を加速する最初の研究である。本稿では,いわゆる確認値の分布を3つのガウスの混合としてモデル化し,モデルパラメータを推定する効率的なヒューリスティックスを提案する。相互コヒーレンスは、その分布の期待値と近似される。提示されたアルゴリズムのいくつかは完全に多項式時間であり、他のアルゴリズムは少数のsatモデルカウント問題の解のみを必要とする。我々の最善のアルゴリズムの平均二乗誤差は 0.0035 以下であり、効率を考慮すると重要ではない。さらに、wahl-o-matライクなシステムでは精度が十分である。 Mutual coherence is a measure of similarity between two opinions. Although the notion comes from philosophy, it is essential for a wide range of technologies, e.g., the Wahl-O-Mat system. In Germany, this system helps voters to find candidates that are the closest to their political preferences. The exact computation of mutual coherence is highly time-consuming due to the iteration over all subsets of an opinion. Moreover, for every subset, an instance of the SAT model counting problem has to be solved which is known to be a hard problem in computer science. This work is the first study to accelerate this computation. We model the distribution of the so-called confirmation values as a mixture of three Gaussians and present efficient heuristics to estimate its model parameters. The mutual coherence is then approximated with the expected value of the distribution. Some of the presented algorithms are fully polynomial-time, others only require solving a small number of instances of the SAT model counting problem. The average squared error of our best algorithm lies below 0.0035 which is insignificant if the efficiency is taken into account. Furthermore, the accuracy is precise enough to be used in Wahl-O-Mat-like systems.	翻訳日:2023-07-06 17:30:19 公開日:2023-07-04
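The mutual-coherence abstract above approximates the quantity by the expected value of a three-component Gaussian mixture. As a small worked illustration, the mean of a mixture depends only on the component weights and means, not on the variances. The helper name and the numeric values below are made up for the example.

```python
def mixture_mean(weights, means):
    """Expected value of a mixture distribution: sum_k w_k * mu_k.

    The component variances do not enter the mean, so they are not needed here.
    """
    if len(weights) != len(means):
        raise ValueError("weights and means must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    return sum(w * m for w, m in zip(weights, means))


# Illustrative (made-up) parameters for a three-component mixture.
approx_coherence = mixture_mean([0.2, 0.5, 0.3], [-1.0, 0.0, 1.0])
```

Under this reading, the costly part of the method is estimating the weights and means. Once they are known, the final approximation is a single weighted sum.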
# 複数ネットワーク上のランダムウォーク Random Walk on Multiple Networks ( http://arxiv.org/abs/2307.01637v1 ) ライセンス: Link先を確認	Dongsheng Luo, Yuchen Bian, Yaowei Yan, Xiong Yu, Jun Huan, Xiao Liu, Xiang Zhang	(参考訳) Random Walkはネットワークの構造を探索するための基本的なアルゴリズムであり、ローカルなコミュニティ検出やネットワーク埋め込みといった多くのタスクで使用できる。既存のランダムウォーク手法は、限られた情報を含む単一ネットワークに基づいている。対照的に、実際のデータは、しばしば異なるタイプまたは異なるソースのエンティティを含んでおり、それらは包括的であり、複数のネットワークによりより良くモデル化される。本稿では,複数のネットワークにおけるリッチな情報を活用し,エンティティの推論を改善するために,複数ネットワーク上のランダムウォーク(RWM)を提案する。 RWMは柔軟で、多重ネットワークと一般的な多重ネットワークの両方をサポートし、ネットワーク間の多対多ノードマッピングを形成する。 RWMは各ネットワーク上でランダムなウォーカを送信し、開始ノードの局所的近接(すなわちノード訪問確率)を得る。同様の訪問確率を持つ歩行者はお互いを強化します。 RWMの収束特性を理論的に解析する。理論的性能保証を伴う2つの近似法を効率的な計算法として提案する。リンク予測,ネットワーク埋め込み,地域コミュニティ検出にRWMを適用した。合成データセットと実世界のデータセットの両方で実施された総合実験は、RWMの有効性と効率を実証している。 Random Walk is a basic algorithm to explore the structure of networks, which can be used in many tasks, such as local community detection and network embedding. Existing random walk methods are based on single networks that contain limited information. In contrast, real data often contain entities with different types or/and from different sources, which are comprehensive and can be better modeled by multiple networks. To take advantage of rich information in multiple networks and make better inferences on entities, in this study, we propose random walk on multiple networks, RWM. RWM is flexible and supports both multiplex networks and general multiple networks, which may form many-to-many node mappings between networks. RWM sends a random walker on each network to obtain the local proximity (i.e., node visiting probabilities) w.r.t. the starting nodes. Walkers with similar visiting probabilities reinforce each other. We theoretically analyze the convergence properties of RWM. Two approximation methods with theoretical performance guarantees are proposed for efficient computation. We apply RWM in link prediction, network embedding, and local community detection. 
Comprehensive experiments conducted on both synthetic and real-world datasets demonstrate the effectiveness and efficiency of RWM.	翻訳日:2023-07-06 17:30:02 公開日:2023-07-04
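The RWM abstract above describes sending a random walker on each network to obtain node visiting probabilities with respect to a starting node. The sketch below shows only the single-network building block, a random walk with restart computed by power iteration. It is a simplification and does not implement the cross-network reinforcement that RWM adds. The function name, the `restart` value, and the fixed iteration count are assumptions.

```python
def random_walk_with_restart(adj, start, restart=0.15, iters=100):
    """Approximate visiting probabilities of a walker that restarts at `start`.

    adj: dict mapping each node to a non-empty list of neighbours.
    """
    nodes = sorted(adj)
    r = {v: 0.0 for v in nodes}
    r[start] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in nodes}
        for u in nodes:
            # Spread the non-restarting mass of u evenly over its neighbours.
            share = (1.0 - restart) * r[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        nxt[start] += restart  # restart mass returns to the start node
        r = nxt
    return r


# Path graph a - b - c - d, walker started at "a".
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
proximity = random_walk_with_restart(path, "a")
```

The resulting scores sum to one and decay with distance from the start node along the far end of the path, which is the "local proximity" notion the abstract relies on.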
# HAGNN: 異種グラフニューラルネットワークのためのハイブリッドアグリゲーション HAGNN: Hybrid Aggregation for Heterogeneous Graph Neural Networks ( http://arxiv.org/abs/2307.01636v1 ) ライセンス: Link先を確認	Guanghui Zhu, Zhennan Zhu, Hongyang Chen, Chunfeng Yuan, Yihua Huang	(参考訳) 異種グラフニューラルネットワーク(GNN)は異種グラフの処理に成功している。既存の異種GNNでは、メタパスが重要な役割を果たす。しかし、近年の研究はメタパスのない単純な同質グラフモデルでも同等の結果が得られることを指摘し、メタパスの必要性を疑問視している。本稿では,まず,メタパスベースモデルとメタパスフリーモデル,すなわちノードアグリゲーションのための近傍を選択する方法に関する本質的な違いについて述べる。そこで我々は,ヘテロジニアスグラフのリッチな型意味情報,すなわちHAGNN(Hybrid Aggregation for Heterogeneous GNNs)を包括的に活用するための新しいフレームワークを提案する。 HAGNNの中核は、ノード集約のためにメタパス隣人と直接接続された隣人を活用することである。 hagnnは全体の集約プロセスを、メタパスベースのイントラタイプアグリゲーションとメタパスフリーインタータイプアグリゲーションの2つのフェーズに分割する。型内アグリゲーションフェーズでは,融合メタパスグラフと呼ばれる新しいデータ構造を提案し,その上で構造的意味認識アグリゲーションを行う。最後に、各フェーズによって生成される埋め込みを組み合わせる。既存の異種GNNモデルと比較して、HAGNNは異種グラフの異種性を完全に活用することができる。ノード分類、ノードクラスタリング、リンク予測タスクに関する大規模な実験結果から、HAGNNは既存のモードよりも優れており、HAGNNの有効性を示している。 Heterogeneous graph neural networks (GNNs) have been successful in handling heterogeneous graphs. In existing heterogeneous GNNs, meta-path plays an essential role. However, recent work pointed out that simple homogeneous graph model without meta-path can also achieve comparable results, which calls into question the necessity of meta-path. In this paper, we first present the intrinsic difference about meta-path-based and meta-path-free models, i.e., how to select neighbors for node aggregation. Then, we propose a novel framework to utilize the rich type semantic information in heterogeneous graphs comprehensively, namely HAGNN (Hybrid Aggregation for Heterogeneous GNNs). The core of HAGNN is to leverage the meta-path neighbors and the directly connected neighbors simultaneously for node aggregations. HAGNN divides the overall aggregation process into two phases: meta-path-based intra-type aggregation and meta-path-free inter-type aggregation. 
During the intra-type aggregation phase, we propose a new data structure called the fused meta-path graph and perform structural-semantic-aware aggregation on it. Finally, we combine the embeddings generated by each phase. Compared with existing heterogeneous GNN models, HAGNN can take full advantage of the heterogeneity in heterogeneous graphs. Extensive experimental results on node classification, node clustering, and link prediction tasks show that HAGNN outperforms the existing models, demonstrating the effectiveness of HAGNN.	翻訳日:2023-07-06 17:29:45 公開日:2023-07-04
# ChildPlay: 子どもの視線行動を理解するための新しいベンチマーク ChildPlay: A New Benchmark for Understanding Children's Gaze Behaviour ( http://arxiv.org/abs/2307.01630v1 ) ライセンス: Link先を確認	Samy Tafasca, Anshul Gupta, Jean-Marc Odobez	(参考訳) 子どもの発達障害を診断するための重要なマーカーは、アイコンタクトや共有注意などの迷路行動である。これまでの研究ではこれらの要素のいくつかを検討したが、分析は通常プライベートデータセット上で行われ、実験室の設定に限定されている。さらに、すべての一般公開された視線目標予測ベンチマークには、主に大人のインスタンスが含まれており、幼児のシナリオに適用できないようにトレーニングされたモデルが採用されている。本稿では,子どもの視線目標と相互作用する大人の視線目標を予測するための最初の研究を提案する。この目的のために,子どもがコントロールされていない環境(幼稚園,セラピーセンター,保育園など)で大人と遊んで交流する様子を収録した短いビデオクリップのキュレートされたコレクションであるChildPlayデータセットを紹介した。さらに,人物の3次元視野(3dfov)のシーン部分を明確に識別し,近年の奥行き推定法を活用し,視線目標予測のための新しいモデルを提案する。我々のモデルは、ベンチマークデータセットとChildPlayのアート結果の状態を達成します。また, 子どもの表情予測性能は, 成人よりもずっと悪く, 子どもの視線アノテーションを用いた微調整モデルにより有意に改善できることが示された。私たちのデータセットとモデルは公開されます。 Gaze behaviors such as eye-contact or shared attention are important markers for diagnosing developmental disorders in children. While previous studies have looked at some of these elements, the analysis is usually performed on private datasets and is restricted to lab settings. Furthermore, all publicly available gaze target prediction benchmarks mostly contain instances of adults, which makes models trained on them less applicable to scenarios with young children. In this paper, we propose the first study for predicting the gaze target of children and interacting adults. To this end, we introduce the ChildPlay dataset: a curated collection of short video clips featuring children playing and interacting with adults in uncontrolled environments (e.g. kindergarten, therapy centers, preschools etc.), which we annotate with rich gaze information. We further propose a new model for gaze target prediction that is geometrically grounded by explicitly identifying the scene parts in the 3D field of view (3DFoV) of the person, leveraging recent geometry preserving depth inference methods. Our model achieves state of the art results on benchmark datasets and ChildPlay. 
Furthermore, results show that looking-at-faces prediction performance is much worse on children than on adults, and that it can be significantly improved by fine-tuning models on child gaze annotations. Our dataset and models will be made publicly available.	翻訳日:2023-07-06 17:29:19 公開日:2023-07-04
# リカレントトレンド予測ニューラルネットワークに基づく予測組込みスケジューリングによるスマートホーム環境の再生可能エネルギー管理 Renewable energy management in smart home environment via forecast embedded scheduling based on Recurrent Trend Predictive Neural Network ( http://arxiv.org/abs/2307.01622v1 ) ライセンス: Link先を確認	Mert Nakıp, Onur Çopur, Emrah Biyik, Cüneyt Güzeliş	(参考訳) スマートホームエネルギー管理システムは、配電網をより効率的かつ確実に運用し、分散型再生可能エネルギー源の効果的な普及を可能にする。これらのシステムは、需要と再生可能エネルギー発電の不確実性を扱うことのできる堅牢な予測、最適化、制御/スケジューリングアルゴリズムに依存している。本稿では,効率的な住宅需要制御を提供するために,Recurrent Trend Predictive Neural Network based Forecast Embedded Scheduling (rTPNN-FES)と呼ばれるMLアルゴリズムを提案する。 rTPNN-FESは、再生可能エネルギーの発電量を予測すると同時に家電のスケジューリングを行う新しいニューラルネットワークアーキテクチャである。組込み構造により、rTPNN-FESは予測とスケジューリングのための別々のアルゴリズムの使用を排除し、予測エラーに対して堅牢なスケジュールを生成する。本稿では,IoT対応スマートホームにおける提案アルゴリズムの性能評価も行う。評価結果から, rTPNN-FESは最適化手法より37.5倍高速にほぼ最適なスケジューリングを実現し, 最先端の予測技術を上回ることがわかった。 Smart home energy management systems help the distribution grid operate more efficiently and reliably, and enable effective penetration of distributed renewable energy sources. These systems rely on robust forecasting, optimization, and control/scheduling algorithms that can handle the uncertain nature of demand and renewable generation. This paper proposes an advanced ML algorithm, called Recurrent Trend Predictive Neural Network based Forecast Embedded Scheduling (rTPNN-FES), to provide efficient residential demand control. rTPNN-FES is a novel neural network architecture that simultaneously forecasts renewable energy generation and schedules household appliances. By its embedded structure, rTPNN-FES eliminates the need for separate forecasting and scheduling algorithms and generates a schedule that is robust against forecasting errors. This paper also evaluates the performance of the proposed algorithm for an IoT-enabled smart home. The evaluation results reveal that rTPNN-FES provides near-optimal scheduling 37.5 times faster than the optimization while outperforming state-of-the-art forecasting techniques.	翻訳日:2023-07-06 17:28:56 公開日:2023-07-04
# 量子ダイレクト通信のための2と3のプレイヤー方式 A 2 & 3 Player Scheme for Quantum Direct Communication ( http://arxiv.org/abs/2307.01620v1 ) ライセンス: Link先を確認	Theodore Andronikos and Alla Sirokofskich	(参考訳) 本稿では,情報理論的に安全な2つのプロトコルを紹介する。1つ目はAliceとBobの間で,2つ目はAlice,Bob,Charlieの間で量子セキュア直接通信を実現する。どちらのプロトコルも、プレイヤーのもつれた複合系に秘密情報を埋め込む同じ新しい方法を使っている。この情報エンコーディングの方法が本論文の主な新規性であり,この分野の先行研究と区別される特徴である。この手法の利点は、拡張が容易であり、2番目のプロトコルで示されるように、3人以上のプレイヤーを含む設定に一般化できることである。この特徴は、空間的に離れた2人のプレイヤーがそれぞれ秘密情報の一部だけを保持しており、Aliceが完全な秘密を明らかにするためにそれらを結合してAliceに送信しなければならない場合に有益である。3プレイヤープロトコルを使用することで、典型的なQSDCプロトコルを2回適用することなく、このタスクを1回で達成できる。両方のプロトコルのもう1つの特徴は、単純さと均一性である。2プレイヤープロトコルはEPRペアに、3プレイヤープロトコルはGHZトリプルに依存しており、いずれも現在の技術で容易に準備できる。同様に、局所量子回路は類似または同一であり、アダマールゲートとCNOTゲートのみを使用するため容易に構成できる。 This paper introduces two information-theoretically secure protocols that achieve quantum secure direct communication between Alice and Bob in the first case, and among Alice, Bob and Charlie in the second case. Both protocols use the same novel method to embed the secret information in the entangled composite system of the players. The way of encoding the information is the main novelty of this paper and the distinguishing feature compared to previous works in the field. The advantage of this method is that it is easily extensible and can be generalized to a setting involving three, or even more, players, as demonstrated with the second protocol. This trait can be beneficial when two spatially separated players possess only part of the secret information that must be combined and transmitted to Alice in order for her to reveal the complete secret. Using the three player protocol, this task can be achieved in one go, without the need to apply a typical QSDC protocol twice, where Alice first receives Bob's information and afterwards Charlie's information. Another characteristic of both protocols is their simplicity and uniformity.
The two player protocol relies on EPR pairs, and the three player protocol on GHZ triples, which can be easily prepared with our current technology. In the same vein, the local quantum circuits are similar or identical, and are easily constructible as they employ only Hadamard and CNOT gates.	翻訳日:2023-07-06 17:28:39 公開日:2023-07-04
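The abstract above notes that the two player protocol relies on EPR pairs and that the local circuits use only Hadamard and CNOT gates. As a minimal illustration of that building block, the sketch below prepares an EPR pair from |00> with one Hadamard and one CNOT, using a plain list of four amplitudes. It is a textbook construction, not the protocol from the paper, and the function names are hypothetical.

```python
import math


def apply_hadamard_q0(state):
    """Apply H to the first qubit of a 2-qubit state [a00, a01, a10, a11]."""
    s = 1.0 / math.sqrt(2.0)
    a00, a01, a10, a11 = state
    return [s * (a00 + a10), s * (a01 + a11), s * (a00 - a10), s * (a01 - a11)]


def apply_cnot_q0_q1(state):
    """Apply CNOT with the first qubit as control and the second as target."""
    a00, a01, a10, a11 = state
    return [a00, a01, a11, a10]  # |10> and |11> swap amplitudes


def epr_pair():
    """Prepare (|00> + |11>) / sqrt(2) starting from |00>."""
    return apply_cnot_q0_q1(apply_hadamard_q0([1.0, 0.0, 0.0, 0.0]))
```

Only the |00> and |11> amplitudes are non-zero in the result, so measuring either qubit fixes the outcome of the other. A GHZ triple extends the same pattern with one more CNOT onto a third qubit.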
# SageFormer: 多変量時系列予測のためのグラフ強化変換器 SageFormer: Series-Aware Graph-Enhanced Transformers for Multivariate Time Series Forecasting ( http://arxiv.org/abs/2307.01616v1 ) ライセンス: Link先を確認	Zhenwei Zhang, Xin Wang, Yuantao Gu	(参考訳) 多変量時系列予測は多様な領域において重要な役割を果たす。近年のディープラーニング手法,特にトランスフォーマーの進歩は,将来性を示しているが,シリーズ間の依存関係の重要性に対処する上ではまだギャップが残っている。本稿では,グラフ構造を用いて時系列間の依存関係を効果的にキャプチャし,モデル化するシリーズ対応グラフ拡張トランスフォーマーモデルであるSageFormerを紹介する。 sageformerは、2つの重要な課題に取り組んでいる。シリーズ間で多様な時間パターンを効果的に表現し、シリーズ間で冗長な情報を緩和する。重要なのは、提案されたシリーズアウェアフレームワークが既存のトランスフォーマーベースのモデルとシームレスに統合され、シリーズ間の依存関係をモデル化する能力が強化されることだ。実世界および合成データセットに関する広範な実験を通じて、従来の最先端のアプローチと比較して、sageformerの優れたパフォーマンスを示す。 Multivariate time series forecasting plays a critical role in diverse domains. While recent advancements in deep learning methods, especially Transformers, have shown promise, there remains a gap in addressing the significance of inter-series dependencies. This paper introduces SageFormer, a Series-aware Graph-enhanced Transformer model designed to effectively capture and model dependencies between series using graph structures. SageFormer tackles two key challenges: effectively representing diverse temporal patterns across series and mitigating redundant information among series. Importantly, the proposed series-aware framework seamlessly integrates with existing Transformer-based models, augmenting their ability to model inter-series dependencies. Through extensive experiments on real-world and synthetic datasets, we showcase the superior performance of SageFormer compared to previous state-of-the-art approaches.	翻訳日:2023-07-06 17:28:14 公開日:2023-07-04
# 非条件音声合成のためのganの絡み合い Disentanglement in a GAN for Unconditional Speech Synthesis ( http://arxiv.org/abs/2307.01673v1 ) ライセンス: Link先を確認	Matthew Baas and Herman Kamper	(参考訳) 明示的な条件付けなしに、潜在空間から直接リアルな音声を合成できるモデルを開発することができるか? 過去10年間、いくつかの努力にもかかわらず、過去の敵対的および拡散ベースのアプローチは、小さなボカブラリデータセットでも、これを達成するのに苦労している。そこで本稿では,無条件音声合成のための生成対向ネットワークであるAudioStyleGAN(ASGAN)を提案する。画像合成モデルのstyleganファミリに基づいて、asganはサンプリングされたノイズを不連続な潜在ベクトルにマッピングし、オーディオ特徴のシーケンスにマッピングすることで、各層で信号エイリアシングが抑制される。 AsGANのトレーニングを成功させるためには、適応型判別器の増分修正など、いくつかの新しい手法を導入する。小語彙のGoogle Speech Commands digitsデータセットに適用し、非条件音声合成の最先端結果を達成する。また、既存の最高性能拡散モデルよりもかなり高速である。我々は,asganの潜在空間が不連続であることを確認する。空間内の単純な線形演算が,訓練中に見当たらないいくつかのタスクを実行するためにどのように利用できるかを示す。具体的には,音声変換,音声強調,話者照合,キーワード分類における評価を行う。我々の研究は、ganは依然として無条件音声合成環境において非常に競争力があり、非知覚タスクの一般化を支援するために不連続な潜在空間が利用できることを示している。コード、モデル、サンプル:https://github.com/RF5/simple-asgan/ Can we develop a model that can synthesize realistic speech directly from a latent space, without explicit conditioning? Despite several efforts over the last decade, previous adversarial and diffusion-based approaches still struggle to achieve this, even on small-vocabulary datasets. To address this, we propose AudioStyleGAN (ASGAN) -- a generative adversarial network for unconditional speech synthesis tailored to learn a disentangled latent space. Building upon the StyleGAN family of image synthesis models, ASGAN maps sampled noise to a disentangled latent vector which is then mapped to a sequence of audio features so that signal aliasing is suppressed at every layer. To successfully train ASGAN, we introduce a number of new techniques, including a modification to adaptive discriminator augmentation which probabilistically skips discriminator updates. We apply it on the small-vocabulary Google Speech Commands digits dataset, where it achieves state-of-the-art results in unconditional speech synthesis. It is also substantially faster than existing top-performing diffusion models. 
We confirm that ASGAN's latent space is disentangled: we demonstrate how simple linear operations in the space can be used to perform several tasks unseen during training. Specifically, we perform evaluations in voice conversion, speech enhancement, speaker verification, and keyword classification. Our work indicates that GANs are still highly competitive in the unconditional speech synthesis landscape, and that disentangled latent spaces can be used to aid generalization to unseen tasks. Code, models, samples: https://github.com/RF5/simple-asgan/	翻訳日:2023-07-06 17:21:21 公開日:2023-07-04
# ノルウェーの自動音声認識の強化 Boosting Norwegian Automatic Speech Recognition ( http://arxiv.org/abs/2307.01672v1 ) ライセンス: Link先を確認	Javier de la Rosa, Rolv-Arild Braaten, Per Egil Kummervold, Freddy Wetjen, Svein Arne Brygfjeld	(参考訳) 本稿では,ノルウェーの2つの公用語である Bokmål と Nynorsk の音声認識モデルについて述べる。複数のノルウェー語音声データセットにおける様々な大きさのモデルと事前学習アプローチの性能を比較した。さらに、従来の最先端ASRモデルやドメイン外データセットに対して、これらのモデルのパフォーマンスを測定する。ノルウェー議会音声コーパス(NPSC)の技術状態を、単語誤り率(WER)が17.10%から7.60%に改善し、モデルではBokmålが5.81%、Nynorskが11.54%となった。ノルウェーのASRモデルをさらに改善するための課題と潜在的な解決策についても論じる。 In this paper, we present several baselines for automatic speech recognition (ASR) models for the two official written languages in Norway: Bokmål and Nynorsk. We compare the performance of models of varying sizes and pre-training approaches on multiple Norwegian speech datasets. Additionally, we measure the performance of these models against previous state-of-the-art ASR models, as well as on out-of-domain datasets. We improve the state of the art on the Norwegian Parliamentary Speech Corpus (NPSC) from a word error rate (WER) of 17.10% to 7.60%, with models achieving 5.81% for Bokmål and 11.54% for Nynorsk. We also discuss the challenges and potential solutions for further improving ASR models for Norwegian.	翻訳日:2023-07-06 17:20:51 公開日:2023-07-04
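The results above are reported as word error rate (WER). For readers unfamiliar with the metric, the sketch below computes it in the usual way, as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. It assumes whitespace tokenisation and no text normalisation, which may differ from the evaluation setup used in the paper.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)
```

For instance, the reference `"a b c d"` against the hypothesis `"a x c"` contains one substitution and one deletion over four reference words. Going from a WER of 17.10% to 7.60%, as reported for NPSC, is a relative reduction of roughly 56%.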
# Training Energy-Based Models with Diffusion Contrastive Divergences ( http://arxiv.org/abs/2307.01668v1 ) License: see link	Weijian Luo and Hao Jiang and Tianyang Hu and Jiacheng Sun and Zhenguo Li and Zhihua Zhang	Energy-Based Models (EBMs) have been widely used for generative modeling. Contrastive Divergence (CD), a prevailing training objective for EBMs, requires sampling from the EBM with Markov Chain Monte Carlo methods (MCMCs), which leads to an irreconcilable trade-off between the computational burden and the validity of the CD. Running MCMCs till convergence is computationally intensive. On the other hand, short-run MCMC brings in an extra non-negligible parameter gradient term that is difficult to handle. In this paper, we provide a general interpretation of CD, viewing it as a special instance of our proposed Diffusion Contrastive Divergence (DCD) family. By replacing the Langevin dynamics used in CD with other EBM-parameter-free diffusion processes, we propose a more efficient divergence. We show that the proposed DCDs are both more computationally efficient than the CD and are not limited to a non-negligible gradient term. We conduct intensive experiments, including both synthetic data modeling and high-dimensional image denoising and generation, to show the advantages of the proposed DCDs. On the synthetic data learning and image denoising experiments, our proposed DCD outperforms CD by a large margin. In image generation experiments, the proposed DCD is capable of training an energy-based model for generating the CelebA $32\times 32$ dataset, which is comparable to existing EBMs.	Translated: 2023-07-06 17:20:37 Published: 2023-07-04
# Sensors and Systems for Monitoring Mental Fatigue: A systematic review ( http://arxiv.org/abs/2307.01666v1 ) License: see link	Prabin Sharma, Joanna C. Justus, Govinda R. Poudel	Mental fatigue is a leading cause of motor vehicle accidents, medical errors, loss of workplace productivity, and student disengagement in e-learning environments. Development of sensors and systems that can reliably track mental fatigue can prevent accidents, reduce errors, and help increase workplace productivity. This review provides a critical summary of theoretical models of mental fatigue, a description of key enabling sensor technologies, and a systematic review of recent studies using biosensor-based systems for tracking mental fatigue in humans. We conducted a systematic search and review of recent literature which focused on detection and tracking of mental fatigue in humans. The search yielded 57 studies (N=1082), the majority of which used electroencephalography (EEG) based sensors for tracking mental fatigue. We found that EEG-based sensors can provide a moderate to good sensitivity for fatigue detection. Notably, we found no incremental benefit of using high-density EEG sensors for application in mental fatigue detection. Given the findings, we provide a critical discussion on the integration of wearable EEG and ambient sensors in the context of achieving real-world monitoring. Future work required to advance and adapt the technologies toward widespread deployment of wearable sensors and systems for fatigue monitoring in semi-autonomous and autonomous industries is examined.	Translated: 2023-07-06 17:20:11 Published: 2023-07-04
# Unified Conversational Models with System-Initiated Transitions between Chit-Chat and Task-Oriented Dialogues ( http://arxiv.org/abs/2307.01664v1 ) License: see link	Ye Liu, Stefan Ultes, Wolfgang Minker and Wolfgang Maier	Spoken dialogue systems (SDSs) have been separately developed under two different categories, task-oriented and chit-chat. The former focuses on achieving functional goals and the latter aims at creating engaging social conversations without special goals. Creating a unified conversational model that can engage in both chit-chat and task-oriented dialogue has been a promising research topic in recent years. However, the potential "initiative" that occurs when there is a change between dialogue modes in one dialogue has rarely been explored. In this work, we investigate two kinds of dialogue scenarios: one starts from chit-chat implicitly involving task-related topics and finally switches to task-oriented requests; the other starts from task-oriented interaction and eventually changes to casual chat after all requested information is provided. We contribute two efficient prompt models which can proactively generate a transition sentence to trigger system-initiated transitions in a unified dialogue model. One is a discrete prompt model trained with two discrete tokens, the other one is a continuous prompt model using continuous prompt embeddings automatically generated by a classifier. We furthermore show that the continuous prompt model can also be used to guide the proactive transitions between particular domains in a multi-domain task-oriented setting.	Translated: 2023-07-06 17:19:52 Published: 2023-07-04
# Exploring Transformers for On-Line Handwritten Signature Verification ( http://arxiv.org/abs/2307.01663v1 ) License: see link	Pietro Melzi, Ruben Tolosana, Ruben Vera-Rodriguez, Paula Delgado-Santos, Giuseppe Stragapede, Julian Fierrez, Javier Ortega-Garcia	The application of mobile biometrics as a user-friendly authentication method has increased in recent years. Recent studies have proposed novel behavioral biometric recognition systems based on Transformers, which currently outperform the state of the art in several application scenarios. On-line handwritten signature verification aims to verify the identity of subjects, based on their biometric signatures acquired using electronic devices such as tablets or smartphones. This paper investigates the suitability of architectures based on recent Transformers for on-line signature verification. In particular, four different configurations are studied: two of them rely on the Vanilla Transformer encoder, and the two others have been successfully applied to the tasks of gait and activity recognition. We evaluate the four proposed configurations according to the experimental protocol proposed in the SVC-onGoing competition. The results obtained in our experiments are promising, and promote the use of Transformers for on-line signature verification.	Translated: 2023-07-06 17:19:28 Published: 2023-07-04
# $^{174}\mathrm{Yb}^+$-$^{113}\mathrm{Cd}^+$ sympathetic-cooling bi-species Coulomb crystal applied to microwave frequency standard ( http://arxiv.org/abs/2307.01656v1 ) License: see link	Y Zheng, H. R. Qin, S. N. Miao, N. C. Xin, Y. T. Chen, J. Z. Han, J. W. Zhang, and L. J. Wang	We report the realization of a $^{174}\mathrm{Yb}^+$-$^{113}\mathrm{Cd}^+$ bi-species Coulomb crystal comprising $^{174}\mathrm{Yb}^+$ ions as coolant and verify its potential for application as a $^{113}\mathrm{Cd}^+$ microwave frequency standard employing sympathetic cooling. The two species of massive ions stably trapped in a Paul trap make up this large two-component crystal. The $^{113}\mathrm{Cd}^+$ ions are trapped in the center, which considerably reduces the RF heating and excess micromotion to which the $^{113}\mathrm{Cd}^+$ ions are subjected. Under this scheme, the uncertainty due to the second-order Doppler effect is reduced to $5\times10^{-16}$, which represents an order of magnitude improvement over a sympathetically cooled $^{40}\mathrm{Ca}^+$-$^{113}\mathrm{Cd}^+$ crystal. The uncertainty from the second-order Zeeman effect, which contributes the largest uncertainty to the microwave-ion frequency standard, is reduced to $4\times10^{-16}$. The relevant AC Stark shift uncertainty is estimated to be $4\times10^{-19}$. These results indicate that using $^{174}\mathrm{Yb}^+$ as coolant ions for $^{113}\mathrm{Cd}^+$ is far superior and confirm the feasibility of a sympathetic-cooled cadmium-ion microwave clock system employing a $^{174}\mathrm{Yb}^+$-$^{113}\mathrm{Cd}^+$ two-component crystal.	Translated: 2023-07-06 17:19:15 Published: 2023-07-04
# Task Planning Support for Arborists and Foresters: Comparing Deep Learning Approaches for Tree Inventory and Tree Vitality Assessment Based on UAV-Data ( http://arxiv.org/abs/2307.01651v1 ) License: see link	Jonas-Dario Troles and Richard Nieding and Sonia Simons and Ute Schmid	Climate crisis and correlating prolonged, more intense periods of drought threaten tree health in cities and forests. In consequence, arborists and foresters suffer from increasing workloads and, in the best case, a consistent but often declining workforce. To optimise workflows and increase productivity, we propose a novel open-source end-to-end approach that generates helpful information and improves task planning of those who care for trees in and around cities. Our approach is based on RGB and multispectral UAV data, which is used to create tree inventories of city parks and forests and to deduce tree vitality assessments through statistical indices and Deep Learning. Due to EU restrictions regarding flying drones in urban areas, we will also use multispectral satellite data and fifteen soil moisture sensors to extend our tree vitality-related basis of data. Furthermore, Bamberg already has a georeferenced tree cadastre of around 15,000 solitary trees in the city area, which is also used to generate helpful information. All mentioned data is then joined and visualised in an interactive web application allowing arborists and foresters to generate individual and flexible evaluations, thereby improving daily task planning.	Translated: 2023-07-06 17:18:28 Published: 2023-07-04
# Nonparametric Classification on Low Dimensional Manifolds using Overparameterized Convolutional Residual Networks ( http://arxiv.org/abs/2307.01649v1 ) License: see link	Kaiqi Zhang, Zixuan Zhang, Minshuo Chen, Mengdi Wang, Tuo Zhao, Yu-Xiang Wang	Convolutional residual neural networks (ConvResNets), though overparameterized, can achieve remarkable prediction performance in practice, which cannot be well explained by conventional wisdom. To bridge this gap, we study the performance of ConvResNeXts, which cover ConvResNets as a special case, trained with weight decay from the perspective of nonparametric classification. Our analysis allows for infinitely many building blocks in ConvResNeXts, and shows that weight decay implicitly enforces sparsity on these blocks. Specifically, we consider a smooth target function supported on a low-dimensional manifold, then prove that ConvResNeXts can adapt to the function smoothness and low-dimensional structures and efficiently learn the function without suffering from the curse of dimensionality. Our findings partially justify the advantage of overparameterized ConvResNeXts over conventional machine learning models.	Translated: 2023-07-06 17:18:04 Published: 2023-07-04
# SwinGNN: Rethinking Permutation Invariance in Diffusion Models for Graph Generation ( http://arxiv.org/abs/2307.01646v1 ) License: see link	Qi Yan, Zhengyang Liang, Yang Song, Renjie Liao, Lele Wang	Diffusion models based on permutation-equivariant networks can learn permutation-invariant distributions for graph data. However, in comparison to their non-invariant counterparts, we have found that these invariant models encounter greater learning challenges since 1) their effective target distributions exhibit more modes; 2) their optimal one-step denoising scores are the score functions of Gaussian mixtures with more components. Motivated by this analysis, we propose a non-invariant diffusion model, called SwinGNN, which employs an efficient edge-to-edge 2-WL message passing network and utilizes shifted window based self-attention inspired by SwinTransformers. Further, through systematic ablations, we identify several critical training and sampling techniques that significantly improve the sample quality of graph generation. Finally, we introduce a simple post-processing trick, i.e., randomly permuting the generated graphs, which provably converts any graph generative model to a permutation-invariant one. Extensive experiments on synthetic and real-world protein and molecule datasets show that our SwinGNN achieves state-of-the-art performances. Our code is released at https://github.com/qiyan98/SwinGNN	Translated: 2023-07-06 17:17:46 Published: 2023-07-04
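The post-processing trick mentioned in the abstract above (randomly permuting each generated graph) can be sketched in a few lines. This is only an illustration under stated assumptions: graphs are represented as dense adjacency matrices stored as nested lists, and `permute_graph` is a hypothetical helper name, not a function from the SwinGNN repository.

```python
import random


def permute_graph(adjacency, rng=random):
    """Return a copy of the adjacency matrix with its nodes randomly relabeled.

    Assumption: `adjacency` is a square nested list. `rng` is anything with a
    `shuffle` method, e.g. the `random` module or a `random.Random` instance.
    """
    n = len(adjacency)
    perm = list(range(n))
    rng.shuffle(perm)  # uniform random node ordering
    # Node i of the output corresponds to node perm[i] of the input,
    # so rows and columns are reordered with the same permutation.
    return [[adjacency[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
```

Applying a uniformly random relabeling to every sample means that all node orderings of the same graph become equally likely, which is the intuition behind the invariance claim. The structure of the graph itself (for example, its degree sequence) is unchanged.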
# Synthetic is all you need: removing the auxiliary data assumption for membership inference attacks against synthetic data ( http://arxiv.org/abs/2307.01701v1 ) License: see link	Florent Guépin, Matthieu Meeus, Ana-Maria Cretu and Yves-Alexandre de Montjoye	Synthetic data is emerging as the most promising solution to share individual-level data while safeguarding privacy. Membership inference attacks (MIAs), based on shadow modeling, have become the standard to evaluate the privacy of synthetic data. These attacks, however, currently assume the attacker to have access to an auxiliary dataset sampled from a similar distribution as the training dataset. This often is a very strong assumption that would make an attack unlikely to happen in practice. We here show how this assumption can be removed and how MIAs can be performed using only the synthetic data. More specifically, in three different attack scenarios using only synthetic data, our results demonstrate that MIAs are still successful, across two real-world datasets and two synthetic data generators. These results show how the strong hypothesis made when auditing synthetic data releases (access to an auxiliary dataset) can be relaxed to perform an actual attack.	Translated: 2023-07-06 17:11:32 Published: 2023-07-04
# Preparation of matrix product states with log-depth quantum circuits ( http://arxiv.org/abs/2307.01696v1 ) License: see link	Daniel Malz, Georgios Styliaris, Zhi-Yuan Wei, J. Ignacio Cirac	We consider preparation of matrix product states (MPS) via quantum circuits of local gates. We first prove that faithfully preparing translation-invariant normal MPS of $N$ sites requires a circuit depth $T=\Omega(\log N)$. We then introduce an algorithm based on the renormalization-group transformation to prepare normal MPS with an error $\epsilon$ in depth $T=O(\log (N/\epsilon))$, which is optimal. We also show that measurement and feedback leads to an exponential speed-up of the algorithm, to $T=O(\log\log (N/\epsilon))$. Measurements also allow one to prepare arbitrary translation-invariant MPS, including long-range non-normal ones, in the same depth. Finally, the algorithm naturally extends to inhomogeneous MPS.	Translated: 2023-07-06 17:11:16 Published: 2023-07-04
# Spike-driven Transformer ( http://arxiv.org/abs/2307.01694v1 ) License: see link	Man Yao, Jiakui Hu, Zhaokun Zhou, Li Yuan, Yonghong Tian, Bo Xu, Guoqi Li	Spiking Neural Networks (SNNs) provide an energy-efficient deep learning option due to their unique spike-based event-driven (i.e., spike-driven) paradigm. In this paper, we incorporate the spike-driven paradigm into Transformer by the proposed Spike-driven Transformer with four unique properties: 1) Event-driven, no calculation is triggered when the input of Transformer is zero; 2) Binary spike communication, all matrix multiplications associated with the spike matrix can be transformed into sparse additions; 3) Self-attention with linear complexity at both token and channel dimensions; 4) The operations between spike-form Query, Key, and Value are mask and addition. Together, there are only sparse addition operations in the Spike-driven Transformer. To this end, we design a novel Spike-Driven Self-Attention (SDSA), which exploits only mask and addition operations without any multiplication, and thus has up to $87.2\times$ lower computation energy than vanilla self-attention. Especially in SDSA, the matrix multiplication between Query, Key, and Value is designed as the mask operation. In addition, we rearrange all residual connections in the vanilla Transformer before the activation functions to ensure that all neurons transmit binary spike signals. It is shown that the Spike-driven Transformer can achieve 77.1% top-1 accuracy on ImageNet-1K, which is the state-of-the-art result in the SNN field. The source code is available at https://github.com/BICLab/Spike-Driven-Transformer.	Translated: 2023-07-06 17:10:57 Published: 2023-07-04
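Property 2 in the abstract above says that matrix multiplications involving a binary spike matrix reduce to sparse additions. The sketch below illustrates only that general observation on a single matrix-vector product, plus the fact that an elementwise product of two binary vectors is a mask. It is not the paper's SDSA operator, and all function names are hypothetical.

```python
def dense_matvec(weights, spikes):
    """Ordinary matrix-vector product, using one multiplication per entry."""
    return [sum(w * s for w, s in zip(row, spikes)) for row in weights]


def spike_matvec(weights, spikes):
    """Same result for a 0/1 spike vector, using additions only.

    Columns whose spike is 0 are skipped entirely, so the work scales with
    the number of active spikes rather than with the full input size.
    """
    active = [j for j, s in enumerate(spikes) if s == 1]
    return [sum(row[j] for j in active) for row in weights]


def spike_mask(query_spikes, key_spikes):
    """Elementwise product of two 0/1 vectors, computed as a bitwise AND."""
    return [q & k for q, k in zip(query_spikes, key_spikes)]
```

For example, with `weights = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]` and `spikes = [1, 0, 1]`, both `dense_matvec` and `spike_matvec` give `[2.5, 0.75]`, but the second never multiplies and never touches the middle column.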
# Racial Bias Trends in the Text of US Legal Opinions ( http://arxiv.org/abs/2307.01693v1 ) License: see link	Rohan Jinturkar	Although there is widespread recognition of racial bias in US law, it is unclear how such bias appears in the language of law, namely judicial opinions, and whether it varies across time period or region. Building upon approaches for measuring implicit racial bias in large-scale corpora, we approximate GloVe word embeddings for over 6 million US federal and state court cases from 1860 to 2009. We find strong evidence of racial bias across nearly all regions and time periods, as traditionally Black names are more closely associated with pre-classified "unpleasant" terms whereas traditionally White names are more closely associated with pre-classified "pleasant" terms. We also test whether legal opinions before 1950 exhibit more implicit racial bias than those after 1950, as well as whether opinions from Southern states exhibit less change in racial bias than those from Northeastern states. We do not find evidence of elevated bias in legal opinions before 1950, or evidence that legal opinions from Northeastern states show greater change in racial bias over time compared to Southern states. These results motivate further research into institutionalized racial bias.	Translated: 2023-07-06 17:10:37 Published: 2023-07-04
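The abstract above compares how closely two groups of names sit to "pleasant" and "unpleasant" terms in embedding space, but does not spell out the exact statistic. One common way to quantify such a gap is a differential cosine-similarity score in the style of word-embedding association tests. The sketch below assumes that style of score; the function names and any vectors passed in are placeholders, not the paper's method or data.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def association(word, pleasant, unpleasant):
    """How much closer a word vector is to the pleasant set than to the unpleasant set."""
    return mean(cosine(word, p) for p in pleasant) - mean(cosine(word, u) for u in unpleasant)


def bias_score(names_a, names_b, pleasant, unpleasant):
    """Positive when group A names lean more toward 'pleasant' than group B names do."""
    score_a = mean(association(n, pleasant, unpleasant) for n in names_a)
    score_b = mean(association(n, pleasant, unpleasant) for n in names_b)
    return score_a - score_b
```

In practice the inputs would be embedding vectors looked up for each name and attribute term. A score near zero indicates no measured difference between the two name groups, and swapping the groups flips the sign.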
# Online Learning and Solving Infinite Games with an ERM Oracle ( http://arxiv.org/abs/2307.01689v1 ) License: see link	Angelos Assos, Idan Attias, Yuval Dagan, Constantinos Daskalakis, Maxwell Fishelson	While ERM suffices to attain near-optimal generalization error in the stochastic learning setting, this is not known to be the case in the online learning setting, where algorithms for general concept classes rely on computationally inefficient oracles such as the Standard Optimal Algorithm (SOA). In this work, we propose an algorithm for the online binary classification setting that relies solely on ERM oracle calls, and show that it has finite regret in the realizable setting and sublinearly growing regret in the agnostic setting. We bound the regret in terms of the Littlestone and threshold dimensions of the underlying concept class. We obtain similar results for nonparametric games, where the ERM oracle can be interpreted as a best response oracle, finding the best response of a player to a given history of play of the other players. In this setting, we provide learning algorithms that only rely on best response oracles and converge to approximate-minimax equilibria in two-player zero-sum games and approximate coarse correlated equilibria in multi-player general-sum games, as long as the game has a bounded fat-threshold dimension. Our algorithms apply to both binary-valued and real-valued games and can be viewed as providing justification for the wide use of double oracle and multiple oracle algorithms in the practice of solving large games.	Translated: 2023-07-06 17:10:15 Published: 2023-07-04
# Serving Graph Neural Networks With Distributed Fog Servers For Smart IoT Services ( http://arxiv.org/abs/2307.01684v1 ) License: see link	Liekang Zeng, Xu Chen, Peng Huang, Ke Luo, Xiaoxi Zhang, Zhi Zhou	Graph Neural Networks (GNNs) have gained growing interest in miscellaneous applications owing to their outstanding ability in extracting latent representation on graph structures. To render GNN-based service for IoT-driven smart applications, traditional model serving paradigms usually resort to the cloud by fully uploading geo-distributed input data to remote datacenters. However, our empirical measurements reveal the significant communication overhead of such cloud-based serving and highlight the profound potential in applying the emerging fog computing. To maximize the architectural benefits brought by fog computing, in this paper, we present Fograph, a novel distributed real-time GNN inference framework that leverages diverse and dynamic resources of multiple fog nodes in proximity to IoT data sources. By introducing heterogeneity-aware execution planning and GNN-specific compression techniques, Fograph tailors its design to well accommodate the unique characteristics of GNN serving in fog environments. Prototype-based evaluation and case study demonstrate that Fograph significantly outperforms the state-of-the-art cloud serving and fog deployment by up to 5.39x execution speedup and 6.84x throughput improvement.	Translated: 2023-07-06 17:09:52 Published: 2023-07-04
# Learning Discrete Weights and Activations Using the Local Reparameterization Trick ( http://arxiv.org/abs/2307.01683v1 ) License: see link	Guy Berger, Aviv Navon, Ethan Fetaya	In computer vision and machine learning, a crucial challenge is to lower the computation and memory demands for neural network inference. A commonplace solution to address this challenge is through the use of binarization. By binarizing the network weights and activations, one can significantly reduce computational complexity by substituting the computationally expensive floating-point operations with faster bitwise operations. This leads to a more efficient neural network inference that can be deployed on low-resource devices. In this work, we extend previous approaches that trained networks with discrete weights using the local reparameterization trick to also allow for discrete activations. The original approach optimized a distribution over the discrete weights and used the central limit theorem to approximate the pre-activation with a continuous Gaussian distribution. Here we show that the probabilistic modeling can also allow effective training of networks with discrete activations. This further reduces runtime and memory footprint at inference time with state-of-the-art results for networks with binary activations.	Translated: 2023-07-06 17:09:33 Published: 2023-07-04
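The abstract above describes approximating a neuron's pre-activation by a Gaussian when the weights are random and discrete. A minimal sketch of that idea for a single neuron follows, assuming independent weights in {-1, +1} with a learnable probability of being +1. The last function, which turns the Gaussian into a probability for a sign activation, is one plausible reading of "probabilistic modeling" of the activations and is an assumption rather than the paper's stated procedure. All names are hypothetical.

```python
import math
import random


def preactivation_moments(inputs, prob_plus_one):
    """Mean and variance of z = sum_i w_i * x_i for independent w_i in {-1, +1}.

    prob_plus_one[i] is P(w_i = +1); by the central limit theorem z is
    approximately Gaussian with these moments when there are many inputs.
    """
    mean = 0.0
    var = 0.0
    for x, p in zip(inputs, prob_plus_one):
        w_mean = 2.0 * p - 1.0          # E[w_i]
        w_var = 1.0 - w_mean * w_mean   # Var[w_i], since w_i^2 = 1
        mean += x * w_mean
        var += x * x * w_var
    return mean, var


def sample_preactivation(inputs, prob_plus_one, rng=random):
    """Sample z directly from its Gaussian approximation instead of sampling weights."""
    mean, var = preactivation_moments(inputs, prob_plus_one)
    return mean + math.sqrt(var) * rng.gauss(0.0, 1.0)


def prob_activation_plus_one(inputs, prob_plus_one):
    """P(sign(z) = +1) under the Gaussian approximation (assumed activation model)."""
    mean, var = preactivation_moments(inputs, prob_plus_one)
    if var == 0.0:
        return 1.0 if mean > 0 else 0.0
    return 0.5 * (1.0 + math.erf(mean / math.sqrt(2.0 * var)))
```

Sampling the pre-activation rather than each weight keeps the noise per neuron, and the mean and variance are smooth functions of the weight probabilities, which is what makes gradient-based training of the distribution possible.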
# Robust Hate Speech Detection in Social Media: A Cross-Dataset Empirical Evaluation ( http://arxiv.org/abs/2307.01680v1 ) License: see link	Dimosthenis Antypas and Jose Camacho-Collados	The automatic detection of hate speech online is an active research area in NLP. Most of the studies to date are based on social media datasets that contribute to the creation of hate speech detection models trained on them. However, data creation processes contain their own biases, and models inherently learn from these dataset-specific biases. In this paper, we perform a large-scale cross-dataset comparison where we fine-tune language models on different hate speech detection datasets. This analysis shows how some datasets are more generalisable than others when used as training data. Crucially, our experiments show how combining hate speech detection datasets can contribute to the development of robust hate speech detection models. This robustness holds even when controlling for data size and compared with the best individual datasets.	Translated: 2023-07-06 17:09:15 Published: 2023-07-04
# RaidEnv: Exploring New Challenges in Automated Content Balancing for Boss Raid Games ( http://arxiv.org/abs/2307.01676v1 ) License: see link	Hyeon-Chang Jeon, In-Chang Baek, Cheong-mok Bae, Taehwa Park, Wonsang You, Taegwan Ha, Hoyun Jung, Jinha Noh, Seungwon Oh, Kyung-Joong Kim	The balance of game content significantly impacts the gaming experience. Unbalanced game content diminishes engagement or increases frustration because of repetitive failure. Although game designers intend to adjust the difficulty of game content, this is a repetitive, labor-intensive, and challenging process, especially for commercial-level games with extensive content. To address this issue, the game research community has explored automated game balancing using artificial intelligence (AI) techniques. However, previous studies have focused on limited game content and did not consider the importance of the generalization ability of playtesting agents when encountering content changes. In this study, we propose RaidEnv, a new game simulator that includes diverse and customizable content for the boss raid scenario in MMORPG games. Additionally, we design two benchmarks for the boss raid scenario that can aid in the practical application of game AI. These benchmarks address two open problems in automatic content balancing, and we introduce two evaluation metrics to provide guidance for AI in automatic content balancing. This novel game research platform expands the frontiers of automatic game balancing problems and offers a framework within a realistic game production pipeline.	Translated: 2023-07-06 17:09:02 Published: 2023-07-04
# ダイヤモンドの窒素空洞中心における2光子遷移超断熱通路 Two-photon-transition superadiabatic passage in an nitrogen-vacancy center in diamond ( http://arxiv.org/abs/2307.01675v1 ) ライセンス: Link先を確認	Musang Gong, Min Yu, Yaoming Chu, Wei Chen, Qingyun Cao, Ning Wang, Jianming Cai, Ralf Betzholz, and Luigi Giannelli	(参考訳) 与えられた目標量子状態に高い忠実度と高速な演算速度を量子限界に近づけることは、量子情報科学の重要な目標である。本稿では,3レベル固体スピン系における集団移動を実現するための超断熱量子駆動実験を行った。従来の刺激されたraman adiabatic passage (stirap) から始まり、いくつかのパラダイム的パルス形状を持つsrawap hamiltonianの超断熱補正を実装している。強いマイクロ波パルスや長い移動時間を必要としないため、パルス不完全性よりも強い堅牢性を示す。これらの結果は、量子情報処理および量子システムのコヒーレント操作に有用なツールとなるかもしれない。 Reaching a given target quantum state with high fidelity and fast operation speed close to the quantum limit represents an important goal in quantum information science. Here, we experimentally demonstrate superadiabatic quantum driving to achieve population transfer in a three-level solid-state spin system. Starting from traditional stimulated Raman adiabatic passage (STIRAP), our approach implements superadiabatic corrections to the STIRAP Hamiltonians with several paradigmatic pulse shapes. It requires no need of intense microwave pulses or long transfer times and shows enhanced robustness over pulse imperfections. These results might provide a useful tool for quantum information processing and coherent manipulations of quantum systems.	翻訳日:2023-07-06 17:08:43 公開日:2023-07-04
# rrcnn : リカレント残差畳み込みニューラルネットワークを用いた新しい信号分解法 RRCNN: A novel signal decomposition approach based on recurrent residue convolutional neural network ( http://arxiv.org/abs/2307.01725v1 ) ライセンス: Link先を確認	Feng Zhou, Antonio Cicone, Haomin Zhou	(参考訳) 非定常信号の分解は信号時間-周波数解析の分野で重要かつ困難な課題である。近年,1998年にhuangらによって開拓された経験的モード分解に導かれた多くの信号分解法が,異なる研究グループによって提案されている。しかし、いくつかの制限がある。例えば、それらは一般的に境界とモードの混合効果があり、ノイズに対してあまり頑丈ではない。 Inspired by the successful applications of deep learning in fields like image processing and natural language processing, and given the lack in the literature of works in which deep learning techniques are used directly to decompose non-stationary signals into simple oscillatory components, we use the convolutional neural network, residual structure and nonlinear activation function to compute in an innovative way the local average of the signal, and study a new non-stationary signal decomposition method under the framework of deep learning. 本稿では,提案モデルの学習過程について考察し,学習アルゴリズムの収束解析について考察する。実験では,提案モデルの性能を,局所平均の計算と信号分解という2つの観点から評価した。さらに,提案手法により得られた分解成分のモード混合,ノイズ干渉,直交特性について検討した。これらの結果から,提案モデルにより,既存手法よりも境界効果,モード混合効果,ロバスト性,分解成分の直交性が向上することが示唆された。 The decomposition of non-stationary signals is an important and challenging task in the field of signal time-frequency analysis. In the recent two decades, many signal decomposition methods led by the empirical mode decomposition, which was pioneered by Huang et al. in 1998, have been proposed by different research groups. However, they still have some limitations. For example, they are generally prone to boundary and mode mixing effects and are not very robust to noise. 
Inspired by the successful applications of deep learning in fields like image processing and natural language processing, and given the lack in the literature of works in which deep learning techniques are used directly to decompose non-stationary signals into simple oscillatory components, we use the convolutional neural network, residual structure and nonlinear activation function to compute in an innovative way the local average of the signal, and study a new non-stationary signal decomposition method under the framework of deep learning. We discuss the training process of the proposed model and study the convergence analysis of the learning algorithm. In the experiments, we evaluate the performance of the proposed model from two points of view: the calculation of the local average and the signal decomposition. Furthermore, we study the mode mixing, noise interference, and orthogonality properties of the decomposed components produced by the proposed method. All results show that the proposed model allows for better handling boundary effect, mode mixing effect, robustness, and the orthogonality of the decomposed components than existing methods.	翻訳日:2023-07-06 17:01:18 公開日:2023-07-04
# 空間広帯域高利得SU(1,1)干渉計の位相感度 Phase sensitivity of spatially broadband high-gain SU(1,1) interferometers ( http://arxiv.org/abs/2307.01723v1 ) ライセンス: Link先を確認	D. Scharwald, T. Meier, P. R. Sharapova	(参考訳) 非線形干渉計は、古典光を用いた線形干渉計と比較して位相感度のスケーリングが向上していることが特徴である。しかし、これらの干渉計で発生する光の多重度は位相感度の破壊を招き、マルチモード光に対して高度な干渉計構成を必要とする。さらに、単一モードの場合とは対照的に、時間順序効果はマルチモードシナリオにおいて高利得状態において重要な役割を担い、位相感度の正確な推定を考慮に入れなければならない。本研究では,低パラメトリックゲインおよび高パラメトリックゲインで動作する空間多重モードSU(1,1)干渉計の理論記述を示す。本手法は,各非線形相互作用領域に対する積分微分方程式系の段階的解法に基づいている。光の偏光を補うためにパラボラミラーなどの集光素子を用いる回折補償型干渉計に着目する。平面波とガウスポンプについて検討し,任意のパラメトリックゲインに対して,位相感度が標準ショットノイズスケールを超える位相領域が存在することを示し,ハイゼンベルクスケールに接近する状態について考察する。最後に、低パラメトリックゲインと高パラメトリックゲインの両方に有効な位相感度に関する洞察に富んだ解析式に到達し、それがシステムの空間モードの数に依存するかを実証する。 Nonlinear interferometers are promising tools for quantum metrology, as they are characterized by an improved phase sensitivity scaling compared to linear interferometers operating with classical light. However, the multimodeness of the light generated in these interferometers results in the destruction of their phase sensitivity, requiring advanced interferometric configurations for multimode light. Moreover, in contrast to the single-mode case, time-ordering effects play an important role for the high-gain regime in the multimode scenario and must be taken into account for a correct estimation of the phase sensitivity. In this work, we present a theoretical description of spatially multimode SU(1,1) interferometers operating at low and high parametric gains. Our approach is based on a step-by-step solution of a system of integro-differential equations for each nonlinear interaction region. We focus on interferometers with diffraction compensation, where focusing elements such as a parabolic mirror are used to compensate for the divergence of the light. 
We investigate plane-wave and Gaussian pumping and show that for any parametric gain, there exists a region of phases for which the phase sensitivity surpasses the standard shot-noise scaling and discuss the regimes where it approaches the Heisenberg scale. Finally, we arrive at insightful analytical expressions for the phase sensitivity that are valid for both low and high parametric gain and demonstrate how it depends on the number of spatial modes of the system.	翻訳日:2023-07-06 17:00:57 公開日:2023-07-04
# MOPO-LSI: ユーザガイド MOPO-LSI: A User Guide ( http://arxiv.org/abs/2307.01719v1 ) ライセンス: Link先を確認	Yong Zheng, Kumar Neelotpal Shukla, Jasmine Xu, David (Xuejun) Wang, Michael O'Leary	(参考訳) MOPO-LSIは、持続可能な投資のためのオープンソースの多目的ポートフォリオ最適化ライブラリである。この文書はMOPO-LSIバージョン1.0のユーザガイドを提供し、問題設定、ワークフロー、設定のハイパーパラメータを含む。 MOPO-LSI is an open-source Multi-Objective Portfolio Optimization Library for Sustainable Investments. This document provides a user guide for MOPO-LSI version 1.0, including problem setup, workflow and the hyper-parameters in configurations.	翻訳日:2023-07-06 17:00:35 公開日:2023-07-04
# 制約時間系列生成問題について On the Constrained Time-Series Generation Problem ( http://arxiv.org/abs/2307.01717v1 ) ライセンス: Link先を確認	Andrea Coletta, Sriram Gopalakrishan, Daniel Borrajo, Svitlana Vyetrenko	(参考訳) 合成時系列は、機械学習アルゴリズムの性能向上のために履歴時系列データセットを増強し、まれな事象の発生を増幅し、時系列によって記述された反事実シナリオを作成するために、実用的な用途でしばしば使用される。分散相似性(リアリズムと呼ぶ)と特定の数値的制約の満足度は、反実時間時系列シナリオ生成要求において共通の要件である。例えば、米連邦準備制度理事会(Federal Reserve)は、金融機関が仮説的不況における業績を評価するための制約付き時系列によって与えられる合成市場ストレスシナリオを公表している。制約付き時系列を生成する既存のアプローチは、通常、トレーニング損失を罰して制約を強制し、非コンフォーミングなサンプルを拒否する。しかし、これらの手法は制約を変更した場合には再訓練が必要であり、拒否サンプリングは計算コストが高く、複雑な制約に対して実用的ではない。本稿では,制約付き時系列生成問題に対処し,生成時系列のリアリズムを確保しつつ効率的なサンプリングを実現するための新しい手法を提案する。特に,制約付き最適化フレームワークを用いて問題を枠組み化し,現実的な時系列を生成するための誘導拡散モデルである `GuidedDiffTime'' などの生成手法を提案する。実証的に、制約を組み込むことが重要となる金融・エネルギーデータのデータセットをいくつか評価します。我々のアプローチは、定性的にも量的にも、既存の作業より優れています。最も重要なことは、我々の `GuidedDiffTime'' モデルが、新しい制約に対して再トレーニングが不要な唯一のソリューションであり、結果として炭素フットプリントが大幅に減少することを示している。 Synthetic time series are often used in practical applications to augment the historical time series dataset for better performance of machine learning algorithms, amplify the occurrence of rare events, and also create counterfactual scenarios described by the time series. Distributional-similarity (which we refer to as realism) as well as the satisfaction of certain numerical constraints are common requirements in counterfactual time series scenario generation requests. For instance, the US Federal Reserve publishes synthetic market stress scenarios given by the constrained time series for financial institutions to assess their performance in hypothetical recessions. Existing approaches for generating constrained time series usually penalize training loss to enforce constraints, and reject non-conforming samples. However, these approaches would require re-training if we change constraints, and rejection sampling can be computationally expensive, or impractical for complex constraints. 
In this paper, we propose a novel set of methods to tackle the constrained time series generation problem and provide efficient sampling while ensuring the realism of generated time series. In particular, we frame the problem using a constrained optimization framework and then we propose a set of generative methods including "GuidedDiffTime", a guided diffusion model to generate realistic time series. Empirically, we evaluate our work on several datasets for financial and energy data, where incorporating constraints is critical. We show that our approaches outperform existing work both qualitatively and quantitatively. Most importantly, we show that our "GuidedDiffTime" model is the only solution where re-training is not necessary for new constraints, resulting in a significant carbon footprint reduction.	翻訳日:2023-07-06 17:00:31 公開日:2023-07-04
# Align with Purpose: General Plug-and-Play Frameworkを用いたCTCモデルにおけるDesiredプロパティの最適化 Align With Purpose: Optimize Desired Properties in CTC Models with a General Plug-and-Play Framework ( http://arxiv.org/abs/2307.01715v1 ) ライセンス: Link先を確認	Eliya Segev, Maya Alroy, Ronen Katsir, Noam Wies, Ayana Shenhav, Yael Ben-Oren, David Zar, Oren Tadmor, Jacob Bitterman, Amnon Shashua and Tal Rosenwein	(参考訳) コネクショニスト時間分類(ctc)は、教師付きシーケンシャル・ツー・シークエンス(seq2seq)モデルの訓練に広く用いられている基準である。これは不完全なアライメントを犠牲にして、完全なアライメント(基礎となる真実を生み出す)を余分にすることで、入力シーケンスと出力シーケンスの関係を学習することができる。完全かつ不完全なアライメントのこの二項微分は、他の現実世界の応用において重要な重要なアライメント特性を捉えていない。ここでは、CTC基準でトレーニングされたモデルにおいて、所望のプロパティを強化するために、$\textbf{ general Plug-and-Play framework}$を提案する。我々は、所望の特性に応じてアライメントを優先順位付けする追加の損失項でCTCを補完する。本手法はctc損失関数への干渉を一切必要とせず,様々な特性の最適化を容易にし,完全アライメントと不完全アライメントの区別を可能にする。我々は,ASR(Automatic Speech Recognition)の領域にフレームワークを適用し,その特性選択,アーキテクチャ選択,トレーニングデータセットのスケール(最大280,000時間)において,その汎用性を示す。本フレームワークの有効性を実証するため, 出力時間と単語誤り率(WER)の2つの非関連特性に適用した。前者については、WERの小さな削減によるレイテンシ最適化の最大570msの改善を報告し、後者については、ベースラインモデルよりも4.5%WERの相対的な改善を報告した。私たちの知る限りでは、これらのアプリケーションは我々のものほど大規模なデータを扱うことが実証されたことはない。特に,本手法は数行のコードだけで実装可能であり,アライメントフリーな損失関数やASR以外の領域にも拡張可能である。 Connectionist Temporal Classification (CTC) is a widely used criterion for training supervised sequence-to-sequence (seq2seq) models. It enables learning the relations between input and output sequences, termed alignments, by marginalizing over perfect alignments (that yield the ground truth), at the expense of imperfect alignments. This binary differentiation of perfect and imperfect alignments falls short of capturing other essential alignment properties that hold significance in other real-world applications. Here we propose $\textit{Align With Purpose}$, a $\textbf{general Plug-and-Play framework}$ for enhancing a desired property in models trained with the CTC criterion. We do that by complementing the CTC with an additional loss term that prioritizes alignments according to a desired property. 
Our method does not require any intervention in the CTC loss function, enables easy optimization of a variety of properties, and allows differentiation between both perfect and imperfect alignments. We apply our framework in the domain of Automatic Speech Recognition (ASR) and show its generality in terms of property selection, architectural choice, and scale of training dataset (up to 280,000 hours). To demonstrate the effectiveness of our framework, we apply it to two unrelated properties: emission time and word error rate (WER). For the former, we report an improvement of up to 570ms in latency optimization with a minor reduction in WER, and for the latter, we report a relative improvement of 4.5% WER over the baseline models. To the best of our knowledge, these applications have never been demonstrated to work on a scale of data as large as ours. Notably, our method can be implemented using only a few lines of code, and can be extended to other alignment-free loss functions and to domains other than ASR.	翻訳日:2023-07-06 17:00:03 公開日:2023-07-04
# 論理はウィグナーの友人(とその友人)と出会う Logic meets Wigner's Friend (and their Friends) ( http://arxiv.org/abs/2307.01713v1 ) ライセンス: Link先を確認	Alexandru Baltag and Sonja Smets	(参考訳) 我々は、Wigner's Friend thought-experimentと、Frauchiger-Renner(FR) Paradox(英語版)など、より最近の変種と拡張のいくつかを新たに見ていく。これらのシナリオにおいて、状態割当の多重性の正しい認識論的解釈とは何か。その下では、従来の量子力学と相容れない方法で、古典的観察者を量子状態記述に含めることができるのか? あるシステムが別のバックグラウンドオブザーバの観点から、追加の"オブザーバ"として認められる条件は? エージェント間の「知識伝達」を可能にするマルチエージェント認識論理の標準公理は、量子物理学的観測者に適用できるのか? 論文の最後のパートでは、これらの質問に対する新しい回答を提案し、この回答の特定の形式的実装をスケッチし、友人型パラドックスに対する原理的な解決策を得るためにそれを適用する。 We take a fresh look at Wigner's Friend thought-experiment and some of its more recent variants and extensions, such as the Frauchiger-Renner (FR) Paradox. We discuss various solutions proposed in the literature, focusing on a few questions: what is the correct epistemic interpretation of the multiplicity of state assignments in these scenarios; under which conditions can one include classical observers into the quantum state descriptions, in a way that is still compatible with traditional Quantum Mechanics?; under which conditions can one system be admitted as an additional 'observer' from the perspective of another background observer?; when can the standard axioms of multi-agent Epistemic Logic (that allow "knowledge transfer" between agents) be applied to quantum-physical observers? In the last part of the paper, we propose a new answer to these questions, sketch a particular formal implementation of this answer, and apply it to obtain a principled solution to Wigner Friend-type paradoxes.	翻訳日:2023-07-06 16:59:29 公開日:2023-07-04
# ディッピングPLM:条件付きソフトプロンプティングによる効果的な知識グラフ補完のためのブリッジ構造とテキスト Dipping PLMs Sauce: Bridging Structure and Text for Effective Knowledge Graph Completion via Conditional Soft Prompting ( http://arxiv.org/abs/2307.01709v1 ) ライセンス: Link先を確認	Chen Chen, Yufei Wang, Aixin Sun, Bing Li and Kwok-Yan Lam	(参考訳) 知識グラフ補完(KGC)は、しばしばKG構造情報とテキスト情報の両方を有効にする必要がある。事前訓練された言語モデル(PLM)は、通常、KGCタスクの微調整パラダイムの下で、テキスト情報を学ぶために使われてきた。しかし、微調整されたplmは、しばしばテキスト情報に集中し、構造的知識を見落としている。本稿では,構造情報とテキスト知識のバランスを保つCSProm-KG(Conditional Soft Prompts for KGC)を提案する。 CSProm-KGは、エンティティと関係表現によって生成される条件付きソフトプロンプトのパラメータのみをチューニングする。 WN18RR, FB15K-237, Wikidata5Mの3つの静的KGCベンチマークとICEWS14, ICEWS05-15におけるCSProm-KGの有効性を検証する。 CSProm-KGは競争ベースラインモデルより優れており、これらのベンチマークで新たな最先端を設定できる。さらなる分析を行い i)提案したコンポーネントの有効性。 (ii)csprom-kgの効率、及び (iii) csprom-kgの柔軟性。 Knowledge Graph Completion (KGC) often requires both KG structural and textual information to be effective. Pre-trained Language Models (PLMs) have been used to learn the textual information, usually under the fine-tune paradigm for the KGC task. However, the fine-tuned PLMs often overwhelmingly focus on the textual information and overlook structural knowledge. To tackle this issue, this paper proposes CSProm-KG (Conditional Soft Prompts for KGC) which maintains a balance between structural information and textual knowledge. CSProm-KG only tunes the parameters of Conditional Soft Prompts that are generated by the entities and relations representations. We verify the effectiveness of CSProm-KG on three popular static KGC benchmarks WN18RR, FB15K-237 and Wikidata5M, and two temporal KGC benchmarks ICEWS14 and ICEWS05-15. CSProm-KG outperforms competitive baseline models and sets new state-of-the-art on these benchmarks. We conduct further analysis to show (i) the effectiveness of our proposed components, (ii) the efficiency of CSProm-KG, and (iii) the flexibility of CSProm-KG.	翻訳日:2023-07-06 16:59:12 公開日:2023-07-04
# リスク感応強化学習のための分布モデル等価性 Distributional Model Equivalence for Risk-Sensitive Reinforcement Learning ( http://arxiv.org/abs/2307.01708v1 ) ライセンス: Link先を確認	Tyler Kastner, Murat A. Erdogdu, Amir-massoud Farahmand	(参考訳) リスク感応強化学習における学習モデルの問題を考える。リスクニュートラルな設定で最適に計画できる学習モデルである適切な値等価性は、リスクセンシティブな設定で最適に計画するのに十分でないことを理論的に実証する。分散強化学習を用いて,モデル等価性という新たな概念を2つ導入した。1つは汎用的であり,任意のリスク対策の計画に使用できるが,難解である。また,どのリスク対策を最適に計画するかを選択できる実用的なバリエーションである。当社のフレームワークは,モデルフリーなリスクセンシティブアルゴリズムの強化にどのように役立つのかを実証するとともに,その能力を示すために,表式および大規模実験の両方を提供する。 We consider the problem of learning models for risk-sensitive reinforcement learning. We theoretically demonstrate that proper value equivalence, a method of learning models which can be used to plan optimally in the risk-neutral setting, is not sufficient to plan optimally in the risk-sensitive setting. We leverage distributional reinforcement learning to introduce two new notions of model equivalence, one which is general and can be used to plan for any risk measure, but is intractable; and a practical variation which allows one to choose which risk measures they may plan optimally for. We demonstrate how our framework can be used to augment any model-free risk-sensitive algorithm, and provide both tabular and large-scale experiments to demonstrate its ability.	翻訳日:2023-07-06 16:58:49 公開日:2023-07-04
# 皮膚内視鏡と臨床画像を用いたマルチラベル皮膚病変分類のためのグラフアンサンブル学習モデル Graph-Ensemble Learning Model for Multi-label Skin Lesion Classification using Dermoscopy and Clinical Images ( http://arxiv.org/abs/2307.01704v1 ) ライセンス: Link先を確認	Peng Tang, Yang Nan, Tobias Lasser	(参考訳) 近年,多くの皮膚病変解析 (SLA) 法が, 2つの要因によるマルチモーダルベース多ラベル分類法の開発に焦点をあてている。 1つはマルチモーダルデータ、すなわち臨床画像と皮膚鏡画像であり、単一のモーダルデータよりも正確な結果を得るために補完的な情報を提供できる。 2つ目は、補助分類タスクとしてのマルチラベル分類、すなわち7点チェックリスト(spc)基準は、深層学習(dl)パイプラインにおけるメラノーマの診断精度を高めるだけでなく、臨床皮膚科医の診断において一般的に用いられるように、臨床医により有用な機能を提供する。しかし、ほとんどの手法はマルチモーダルデータ融合のためのより良いモジュールの設計にのみ焦点を当てており、性能向上のためにSPCと皮膚疾患のラベル相関を利用する方法はほとんどない。本研究では,グラフ畳み込みネットワーク(GCN)を導入するギャップを埋め,相関行列として各カテゴリ間の先行共起を多ラベル分類のためのDLモデルに活用する。しかし,本実験では,GCNを直接適用することにより,医療データの統計的サンプルが不十分な場合において,GCNの弱い一般化能力が低下した。我々は,gcnからの予測を融合モデルからの予測の補完的情報と見なすグラフ・センスブル・ラーニング・モデル(geln)を提案し,それを重み付け平均化スキームによって適応的に融合することで,gcnから得られる貴重な情報を最大限の悪影響を回避しつつ活用する。提案手法を評価するために,公開データセットで実験を行う。その結果,異なるデータセットの分類性能を一貫して向上させ,spcと診断分類において最先端の性能を実現することができた。 Many skin lesion analysis (SLA) methods recently focused on developing a multi-modal-based multi-label classification method due to two factors. The first is multi-modal data, i.e., clinical and dermoscopy images, which can provide complementary information to obtain more accurate results than single-modal data. The second one is that multi-label classification, i.e., seven-point checklist (SPC) criteria as an auxiliary classification task can not only boost the diagnostic accuracy of melanoma in the deep learning (DL) pipeline but also provide more useful functions to the clinical doctor as it is commonly used in clinical dermatologist's diagnosis. However, most methods only focus on designing a better module for multi-modal data fusion; few methods explore utilizing the label correlation between SPC and skin disease for performance improvement. 
This study fills the gap that introduces a Graph Convolution Network (GCN) to exploit prior co-occurrence between each category as a correlation matrix into the DL model for the multi-label classification. However, directly applying GCN degraded the performances in our experiments; we attribute this to the weak generalization ability of GCN in the scenario of insufficient statistical samples of medical data. We tackle this issue by proposing a Graph-Ensemble Learning Model (GELN) that views the prediction from GCN as complementary information of the predictions from the fusion model and adaptively fuses them by a weighted averaging scheme, which can utilize the valuable information from GCN while avoiding its negative influences as much as possible. To evaluate our method, we conduct experiments on public datasets. The results illustrate that our GELN can consistently improve the classification performance on different datasets and that the proposed method can achieve state-of-the-art performance in SPC and diagnosis classification.	翻訳日:2023-07-06 16:58:35 公開日:2023-07-04
# ドメイン一般化セグメンテーションのための色を超えた拡張機能 Augment Features Beyond Color for Domain Generalized Segmentation ( http://arxiv.org/abs/2307.01703v1 ) ライセンス: Link先を確認	Qiyu Sun, Pavlo Melnyk, Michael Felsberg, Yang Tang	(参考訳) ドメイン一般化セマンティックセグメンテーション(dgss)は必須だが、非常に難しいタスクであり、モデルがソースデータのみに基づいてトレーニングされ、ターゲットデータも利用できない。従来のDGSSメソッドは拡張ベースと正規化ベースに分割できる。前者は余分なバイアス付きデータを導入するか、あるいはデータ拡張のためのチャネルワイズ調整のみを実行するか、後者は有益な視覚情報を捨て、どちらもDGSSの限られた性能に繋がる。一方,本手法はチャネル間変換を行い,その一方でドメイン固有のバイアスを回避し,データの多様化とモデル一般化性能の向上を図る。具体的には,ランダム画像色拡張 (rica) とランダム特徴分布拡張 (rfda) の2つのモジュールからなる。 RICAは、RGBからの画像をCIELABカラーモデルに変換し、知覚に基づく画像強調のための色マップをランダム化する。我々はさらに、RICAを補完するCycleGANベースの生成ネットワークを用いて色を超えて特徴空間に拡張し、さらに一般化能力を高めることにより、この拡張を行う。我々は広範な実験を行い,合成gtavとシンセサイアから実際の都市景観,bdd,mapillaryデータセットへの一般化結果から,dgssにおける最先端性能を実現することを示す。 Domain generalized semantic segmentation (DGSS) is an essential but highly challenging task, in which the model is trained only on source data and any target data is not available. Previous DGSS methods can be partitioned into augmentation-based and normalization-based ones. The former either introduces extra biased data or only conducts channel-wise adjustments for data augmentation, and the latter may discard beneficial visual information, both of which lead to limited performance in DGSS. Contrarily, our method performs inter-channel transformation and meanwhile evades domain-specific biases, thus diversifying data and enhancing model generalization performance. Specifically, our method consists of two modules: random image color augmentation (RICA) and random feature distribution augmentation (RFDA). RICA converts images from RGB to the CIELAB color model and randomizes color maps in a perception-based way for image enhancement purposes. We further this augmentation by extending it beyond color to feature space using a CycleGAN-based generative network, which complements RICA and further boosts generalization capability. 
We conduct extensive experiments, and the generalization results from the synthetic GTAV and SYNTHIA to the real Cityscapes, BDDS, and Mapillary datasets show that our method achieves state-of-the-art performance in DGSS.	翻訳日:2023-07-06 16:58:02 公開日:2023-07-04
# バイナリチームにおける量子アドバンテージとコーディネーションジレンマ:その1 The Quantum Advantage in Binary Teams and the Coordination Dilemma: Part I ( http://arxiv.org/abs/2307.01762v1 ) ライセンス: Link先を確認	Shashank A. Deshpande and Ankur A. Kulkarni	(参考訳) エンタングルメント支援確率的戦略により、パッシブ・コモン・ランダム性を通じてアクセス可能な古典的相関測度を超える戦略測度にアクセスでき、したがって分散制御における量子的優位性が得られることを示す。本稿では,問題クラスの広範な超構造の中での量子的優位性の決定理論的起源について考察する。バイナリチームの各クラスは、異なる代数構造を持つコスト関数のパラメトリック族に対応しています。ここでは、量子戦略の恩恵を受ける唯一の問題クラスを特定する。これらのコスト構造は特別な決定理論的特徴 -- ‘コーディネーションジレンマ’ を認めています。したがって、分散制御における非局所量子相関の有用性に対する直感が明らかとなる。 We have shown that entanglement assisted stochastic strategies allow access to strategic measures beyond the classically correlated measures accessible through passive common randomness, and thus attain a quantum advantage in decentralised control. In this two part series of articles, we investigate the decision theoretic origins of the quantum advantage within a broad superstructure of problem classes. Each class in our binary team superstructure corresponds to a parametric family of cost functions with a distinct algebraic structure. In this part, we identify the only problem classes that benefit from quantum strategies. We find that these cost structures admit a special decision-theoretic feature, 'the coordination dilemma'. Our analysis hence reveals some intuition towards the utility of non-local quantum correlations in decentralised control.	翻訳日:2023-07-06 16:53:15 公開日:2023-07-04
# 事前学習は必要なすべて:自閉症スペクトラム障害分類のためのマルチアトラス拡張トランスフォーマフレームワーク Pretraining is All You Need: A Multi-Atlas Enhanced Transformer Framework for Autism Spectrum Disorder Classification ( http://arxiv.org/abs/2307.01759v1 ) ライセンス: Link先を確認	Lucas Mahler, Qi Wang, Julius Steiglechner, Florian Birk, Samuel Heczko, Klaus Scheffler, Gabriele Lohmann	(参考訳) 自閉症スペクトラム障害(Autism spectrum disorder、ASD)は、非定型的認知、感情、社会的パターンを特徴とする精神疾患である。タイムリーかつ正確な診断は、ASD患者の効果的な介入と改善に不可欠である。本研究では,ASD分類のためのMulti-Atlas Enhanced TransformerフレームワークであるMETAFormerを提案する。本フレームワークは, ABIDE I データセットからの静止状態機能的磁気共鳴画像データを用いて, 406 ASD と 476 の典型的制御 (TC) 被験者からなる。 METAFormerはマルチアトラス方式を採用しており、AAL、CC200、DOS160のフラット接続行列が変換器エンコーダの入力となる。特に,入力からのマスク値の再構成を含む自己教師付き事前学習は,付加的あるいは分離されたトレーニングデータを必要とすることなく,分類性能を著しく向上させる。階層化クロスバリデーションにより,提案手法の評価を行い,平均精度83.7%,AUCスコア0.832で,ABIDE Iデータセットの最先端性能を上回ることを示す。私たちのフレームワークのコードはhttps://github.com/Lugges991/METAFormerで利用可能です。 Autism spectrum disorder (ASD) is a prevalent psychiatric condition characterized by atypical cognitive, emotional, and social patterns. Timely and accurate diagnosis is crucial for effective interventions and improved outcomes in individuals with ASD. In this study, we propose a novel Multi-Atlas Enhanced Transformer framework, METAFormer, for ASD classification. Our framework utilizes resting-state functional magnetic resonance imaging data from the ABIDE I dataset, comprising 406 ASD and 476 typical control (TC) subjects. METAFormer employs a multi-atlas approach, where flattened connectivity matrices from the AAL, CC200, and DOS160 atlases serve as input to the transformer encoder. Notably, we demonstrate that self-supervised pretraining, involving the reconstruction of masked values from the input, significantly enhances classification performance without the need for additional or separate training data. 
Through stratified cross-validation, we evaluate the proposed framework and show that it surpasses state-of-the-art performance on the ABIDE I dataset, with an average accuracy of 83.7% and an AUC-score of 0.832. The code for our framework is available at https://github.com/Lugges991/METAFormer	翻訳日:2023-07-06 16:53:03 公開日:2023-07-04
# Flickrでプロの写真家を画像品質と美学で識別する Identifying Professional Photographers Through Image Quality and Aesthetics in Flickr ( http://arxiv.org/abs/2307.01756v1 ) ライセンス: Link先を確認	Sofia Strukova, Rubén Gaspar Marco, José A. Ruipérez-Valiente, Félix Gómez Mármol	(参考訳) 私たちの世代では、ソーシャルメディア、特に写真とビデオの共有プラットフォームの利用が、間違いなく増加しています。これらのサイトは、ユーザのインタラクションを通じてリッチなデータセットを生成できることを証明し、データ駆動による機能評価に使用することができる。それにもかかわらず、写真とビデオの共有プラットフォームにおける適切なデータセットの欠如と、それらの評価プロセスを明らかにする。このようにして、私たちの最初のコントリビューションは、flickrで最大のラベル付きデータセットの1つと、このコントリビューションの一部としてオープンソース化されたマルチモーダルデータの作成です。これらのデータに基づいて機械学習モデルを探索し、ユーザーがプロの写真家であるか否かを、自己申告された職業ラベルとユーザー、写真、クラウドソースセットからいくつかの特徴表現に基づいて適切に予測することは可能であると結論付けた。また,画像の審美性と技術的品質と,その画像の社会的活動との関係についても検討した。最後に,プロの写真家と非プロの写真家を区別する特徴について述べる。私たちが知る限り、この研究で提示された結果は、さまざまなドメインの研究者が異なるアプリケーションのために使用できる、ユーザの専門知識の識別にとって重要なノベルティである。 In our generation, there is an undoubted rise in the use of social media and specifically photo and video sharing platforms. These sites have proved their ability to yield rich data sets through the users' interaction which can be used to perform a data-driven evaluation of capabilities. Nevertheless, this study reveals the lack of suitable data sets in photo and video sharing platforms and evaluation processes across them. In this way, our first contribution is the creation of one of the largest labelled data sets in Flickr with the multimodal data which has been open sourced as part of this contribution. Predicated on these data, we explored machine learning models and concluded that it is feasible to properly predict whether a user is a professional photographer or not based on self-reported occupation labels and several feature representations out of the user, photo and crowdsourced sets. We also examined the relationship between the aesthetics and technical quality of a picture and the social activity of that picture. Finally, we depicted which characteristics differentiate professional photographers from non-professionals. 
As far as we know, the results presented in this work represent an important novelty for the users' expertise identification which researchers from various domains can use for different applications.	翻訳日:2023-07-06 16:52:43 公開日:2023-07-04
# 脳波のフーリエスペクトル解析を用いたK複合検出 K-complex Detection Using Fourier Spectrum Analysis In EEG ( http://arxiv.org/abs/2307.01754v1 ) ライセンス: Link先を確認	Alexey Protopopov	(参考訳) k-複合体は脳活動の重要なマーカーであり、臨床実践において睡眠得点と研究の両方に使用される。しかし、脳波記録(EEG)のサイズや、社会学者によるK-複合体検出の主観的性質から、K-複合体検出の自動化は妥当である。この分野でのこれまでの研究は、提案手法の有効性を定量化するために真正の値と偽正の値に依存してきたが、この指標のセットは誤解を招く可能性がある。本研究の目的は、より正確なメトリクス集合を見つけ、それらをニューラルネットワークに依存しない新しいk-複素検出法の開発に用いることである。そこで本研究では,高速フーリエ変換に基づく2つのK-複素検出手法を提案する。その結果、提案手法は、ニューラルネットワークを用いた手法を含む従来の研究で示されていた手法の質と似ているか、あるいは優れているかのどちらかを提供するが、計算能力は低いため、K-複素検出はニューラルネットワークの使用を必要としないことがわかった。提案手法は,K-コンプレックス検出の品質を示す新しい指標を用いて評価した。 K-complexes are an important marker of brain activity and are used both in clinical practice to perform sleep scoring, and in research. However, due to the size of electroencephalography (EEG) records, as well as the subjective nature of K-complex detection performed by somnologists, it is reasonable to automate K-complex detection. Previous works in this field of research have relied on the values of true positive rate and false positive rate to quantify the effectiveness of proposed methods, however this set of metrics may be misleading. The objective of the present research is to find a more accurate set of metrics and use them to develop a new method of K-complex detection, which would not rely on neural networks. Thus, the present article proposes two new methods for K-complex detection based on the fast Fourier transform. The results achieved demonstrated that the proposed methods offered a quality of K-complex detection that is either similar or superior to the quality of the methods demonstrated in previous works, including the methods employing neural networks, while requiring less computational power, meaning that K-complex detection does not require the use of neural networks. The proposed methods were evaluated using a new set of metrics, which is more representative of the quality of K-complex detection.	翻訳日:2023-07-06 16:52:22 公開日:2023-07-04
# 光度DESI光赤銀河の大規模クラスタリングによる局所原始的非ガウス性 Local primordial non-Gaussianity from the large-scale clustering of photometric DESI luminous red galaxies ( http://arxiv.org/abs/2307.01753v1 ) ライセンス: Link先を確認	Mehdi Rezaie, Ashley J. Ross, Hee-Jong Seo, Hui Kong, Anna Porredon, Lado Samushia, Edmond Chaussidon, Alex Krolewski, Arnaud de Mattia, Florian Beutler, Jessica Nicole Aguilar, Steven Ahlen, Shadab Alam, Santiago Avila, Benedict Bahr-Kalus, Jose Bermejo-Climent, David Brooks, Todd Claybaugh, Shaun Cole, Kyle Dawson, Axel de la Macorra, Peter Doel, Andreu Font-Ribera, Jaime E. Forero-Romero, Satya Gontcho A Gontcho, Julien Guy, Klaus Honscheid, Theodore Kisner, Martin Landriau, Michael Levi, Marc Manera, Aaron Meisner, Ramon Miquel, Eva-Maria Mueller, Adam Myers, Jeffrey A. Newman, Jundan Nie, Nathalie Palanque-Delabrouille, Will Percival, Claire Poppett, Graziano Rossi, Eusebio Sanchez, Michael Schubnell, Gregory Tarlé, Benjamin Alan Weaver, Christophe Yèche, Zhimin Zhou, Hu Zou	(参考訳) 我々は、Dark Energy Spectroscopic Instrument (DESI)による局所原始非ガウス性パラメータfNLの制約のために、光赤銀河の角クラスター化を用いる。サンプルは1200万以上のターゲットからなり、空は14,000平方度、赤方偏移は0.2 < z < 1.35である。我々は, 銀河の絶滅, 調査深度, 天体観測を系統的誤りの主な原因とみなし, 大規模な非宇宙的余剰クラスタリングを緩和するために線形回帰と人工ニューラルネットワークを用いる。本手法は,fnlおよびシステマティックスの有無に関わらず対数正規化シミュレーションを行い,残存システマティックスを低減したニューラルネットワーク処理の性能を示す。普遍性関係を仮定すると、fNL $= 47^{+14(+29)}_{-11(-22)}$ 68% (95%) である。画像の全集合に対する回帰を含むよりアグレッシブな処理により、我々の最大可能性値は fNL $\sim 50$ にわずかにシフトし、大規模なクラスタリング情報の除去による fNL の不確実性は増大する。得られた制約の整合性を示す一連の堅牢性テスト(例えば、画像、デクリエーション、または使用するスケールのカット)を適用する。系統的要因を緩和する多大な努力にもかかわらず、fnl > 0の信頼度は99.9%である。この結果は、キャリブレーションエラーや、絶滅テンプレートの低エネルギー系統に関する不確実性など、予期せぬ体系的な原因による可能性があるという懸念を引き起こす。あるいは、宇宙マイクロ波背景スケールが影響を受けないまま、大規模構造物の周囲に大きな非ガウス性を持つスケール依存のfnlモデルが示唆されるかもしれない。以上の結果から,DESIスペクトルを用いたfNLのさらなる研究が望まれる。 We use angular clustering of luminous red galaxies from the Dark Energy Spectroscopic Instrument (DESI) imaging surveys to constrain the local primordial non-Gaussianity parameter fNL. 
Our sample comprises over 12 million targets, covering 14,000 square degrees of the sky, with redshifts in the range 0.2 < z < 1.35. We identify Galactic extinction, survey depth, and astronomical seeing as the primary sources of systematic error, and employ linear regression and artificial neural networks to alleviate non-cosmological excess clustering on large scales. Our methods are tested against log-normal simulations with and without fNL and systematics, showing superior performance of the neural network treatment in reducing remaining systematics. Assuming the universality relation, we find fNL $= 47^{+14(+29)}_{-11(-22)}$ at 68% (95%) confidence. With a more aggressive treatment, including regression against the full set of imaging maps, our maximum likelihood value shifts slightly to fNL $\sim 50$ and the uncertainty on fNL increases due to the removal of large-scale clustering information. We apply a series of robustness tests (e.g., cuts on imaging, declination, or scales used) that show consistency in the obtained constraints. Despite extensive efforts to mitigate systematics, our measurements indicate fNL > 0 with a 99.9 percent confidence level. This outcome raises concerns as it could be attributed to unforeseen systematics, including calibration errors or uncertainties associated with low-$\ell$ systematics in the extinction template. Alternatively, it could suggest a scale-dependent fNL model that causes significant non-Gaussianity around large-scale structure while leaving cosmic microwave background scales unaffected. Our results encourage further studies of fNL with DESI spectroscopic samples, where the inclusion of 3D clustering modes should help separate imaging systematics.	翻訳日:2023-07-06 16:52:01 公開日:2023-07-04
# ひずみ工学によるホスホレンの電気的および磁気的パーセル効果の制御 Controlling electric and magnetic Purcell effects in phosphorene via strain engineering ( http://arxiv.org/abs/2307.01752v1 ) ライセンス: Link先を確認	P. P. Abrantes, W. J. M. Kort-Kamp, F. S. S. Rosa, C. Farina, F. A. Pinheiro, and Tarik P. Cysne	(参考訳) 一軸ひずみの影響下で, ホスホレンで被覆した基板近傍の量子エミッタの自然放出寿命を調べた。励起状態から基底状態への自発遷移として、電気双極子と磁気双極子を介するものの両方を考える。ホスホレンのモデリングは、通常の低エネルギー記述を超越した密結合モデルを用いて行われる。電気的, 磁気的減衰速度は, パーセル効果のほぼ完全な抑制から, ホスホレンのしわ状(puckered)格子構造に伴う高い柔軟性による1300%以上の顕著な向上まで, 均一ひずみの適用によって強く調整できることを実証した。また, 放出された量子の最も可能性の高い崩壊経路を調整するためのメカニズムとして, ひずみの利用も明らかにする。以上の結果から,一軸ひずみを加えたホスホレンは、その優れた光機械特性により、光-物質相互作用の能動的制御のための効率的で多用途な材料プラットフォームであることがわかった。 We investigate the spontaneous emission lifetime of a quantum emitter near a substrate coated with phosphorene under the influence of uniaxial strain. We consider both electric dipole and magnetic dipole-mediated spontaneous transitions from the excited to the ground state. The modeling of phosphorene is performed by employing a tight-binding model that goes beyond the usual low-energy description. We demonstrate that both electric and magnetic decay rates can be strongly tuned by the application of uniform strain, ranging from a near-total suppression of the Purcell effect to a remarkable enhancement of more than 1300% due to the high flexibility associated with the puckered lattice structure of phosphorene. We also unveil the use of strain as a mechanism to tailor the most probable decay pathways of the emitted quanta. Our results show that uniaxially strained phosphorene is an efficient and versatile material platform for the active control of light-matter interactions thanks to its extraordinary optomechanical properties.	翻訳日:2023-07-06 16:51:28 公開日:2023-07-04
# SRCD:単一ドメイン汎用オブジェクト検出のための複合ドメインを用いた意味推論 SRCD: Semantic Reasoning with Compound Domains for Single-Domain Generalized Object Detection ( http://arxiv.org/abs/2307.01750v1 ) ライセンス: Link先を確認	Zhijie Rao, Jingcai Guo, Luyao Tang, Yue Huang, Xinghao Ding, Song Guo	(参考訳) 本稿では,単一ドメイン一般化オブジェクト検出(すなわちSingle-DGOD)のための新しいフレームワークを提案し,モデルの一般化能力を高めるために,自己拡張型複合クロスドメインサンプルの意味構造を学習し,維持することに関心を寄せる。複数のソースドメインでトレーニングされたDGODとは異なり、Single-DGODは単一のソースドメインだけで複数のターゲットドメインにうまく一般化することがはるかに難しい。既存の手法は主にDGODからの同様の処理を採用し、意味空間を分離または圧縮することでドメイン不変の特徴を学習する。しかし、潜在的な制限は2つある。 1) 極端に少ない単一ドメインデータによる擬似属性・ラベル相関 2) セマンティックな構造情報は一般に無視される。つまり,サンプルにおけるインスタンスレベルのセマンティック関係の親和性は,モデルの一般化に不可欠である。本稿では,Single-DGODのためのSemantic Reasoning with Compound Domains (SRCD)を提案する。具体的には,テクスチャベースの自己拡張(TBSA)モジュールと局所-大域意味推論(LGSR)モジュールの2つの主要コンポーネントを含む。 TBSAは、光、影、色などのラベルに関連する無関係な属性の影響を、軽量で効率的な自己拡張によって画像レベルで除去することを目的としている。さらに、LGSRは、インスタンス特徴のセマンティック関係をさらにモデル化し、本質的なセマンティック構造を解明し、維持するために使用される。複数のベンチマークで大規模な実験を行い、提案したSRCDの有効性を示した。 This paper provides a novel framework for single-domain generalized object detection (i.e., Single-DGOD), where we are interested in learning and maintaining the semantic structures of self-augmented compound cross-domain samples to enhance the model's generalization ability. Different from DGOD trained on multiple source domains, Single-DGOD is far more challenging to generalize well to multiple target domains with only one single source domain. Existing methods mostly adopt a similar treatment from DGOD to learn domain-invariant features by decoupling or compressing the semantic space. However, there may be two potential limitations: 1) pseudo attribute-label correlation, due to extremely scarce single-domain data; and 2) the semantic structural information is usually ignored, i.e., we found the affinities of instance-level semantic relations in samples are crucial to model generalization. In this paper, we introduce Semantic Reasoning with Compound Domains (SRCD) for Single-DGOD. 
Specifically, our SRCD contains two main components, namely, the texture-based self-augmentation (TBSA) module, and the local-global semantic reasoning (LGSR) module. TBSA aims to eliminate the effects of irrelevant attributes associated with labels, such as light, shadow, color, etc., at the image level by a light-yet-efficient self-augmentation. Moreover, LGSR is used to further model the semantic relationships on instance features to uncover and maintain the intrinsic semantic structures. Extensive experiments on multiple benchmarks demonstrate the effectiveness of the proposed SRCD.	翻訳日:2023-07-06 16:51:12 公開日:2023-07-04
# Ben-ge: 地理・環境データによるBigEarthNetの拡張 Ben-ge: Extending BigEarthNet with Geographical and Environmental Data ( http://arxiv.org/abs/2307.01741v1 ) ライセンス: Link先を確認	Michael Mommert, Nicolas Kesseli, Joëlle Hanna, Linus Scheibenreif, Damian Borth, Begüm Demir	(参考訳) 深層学習法は、大量の複雑な地球観測データの解析において強力なツールであることが証明されている。しかし、地球観測データはほとんどの場合マルチモーダルであるが、通常は単一のあるいは少数のモーダルしか考慮されない。本稿では,自由かつグローバルに利用可能な地理および環境データをコンパイルすることにより,bigearthnet-mmデータセットを補完するben-geデータセットを提案する。このデータセットに基づいて,パッチベースの土地利用/土地被覆分類と土地利用/土地被覆区分の下流タスクにおける異なるデータモダリティを組み合わせる価値を示す。 ben-geは無料で利用可能であり、完全に監視され、自己監視された地球観測アプリケーションのためのテストベッドとして機能することが期待されている。 Deep learning methods have proven to be a powerful tool in the analysis of large amounts of complex Earth observation data. However, while Earth observation data are multi-modal in most cases, only single or few modalities are typically considered. In this work, we present the ben-ge dataset, which supplements the BigEarthNet-MM dataset by compiling freely and globally available geographical and environmental data. Based on this dataset, we showcase the value of combining different data modalities for the downstream tasks of patch-based land-use/land-cover classification and land-use/land-cover segmentation. ben-ge is freely available and expected to serve as a test bed for fully supervised and self-supervised Earth observation applications.	翻訳日:2023-07-06 16:50:31 公開日:2023-07-04
# 非コントラストCTにおけるストローク病変分割と画像-ラベル拡散確率モデル Synchronous Image-Label Diffusion Probability Model with Application to Stroke Lesion Segmentation on Non-contrast CT ( http://arxiv.org/abs/2307.01740v1 ) ライセンス: Link先を確認	Jianhai Zhang and Tonghua Wan and Ethan MacDonald and Aravind Ganesh and Qiu Wu	(参考訳) 急性虚血性脳卒中(AIS)患者の予後を評価するため, ストローク病変容積は重要なX線学的指標であり, 非コントラストCT(NCCT)スキャンでは自動測定が困難である。最近の拡散確率モデルは、画像分割に使用される可能性を示している。本稿では,マルコフ拡散法を用いてNCCTの脳梗塞セグメント化を行うために,シンクロナス画像ラベル拡散確率モデル(SDPM)を提案する。提案したSDPMはLVM(Latent Variable Model)を完全にベースとしており、完全な確率的エラボレーションを提供する。ノイズ予測ストリームと平行な追加のネットストリームを導入し、最終ラベルを効率的に推定するための初期ノイズラベル推定値を得る。特定の変動境界を最適化することにより、トレーニングされたモデルは、ノイズのある入力画像から基準値に対する複数のラベル推定を推測することができる。提案モデルは1つの公開データセットと2つのプライベートデータセットを含む3つの脳卒中病変データセットで評価された。いくつかのu-netおよびtransformerベースのセグメンテーション手法と比較して,提案するsdpmモデルは最先端の性能を実現することができる。コードは公開されている。 Stroke lesion volume is a key radiologic measurement for assessing the prognosis of Acute Ischemic Stroke (AIS) patients, but it is challenging to measure automatically on Non-Contrast CT (NCCT) scans. Recent diffusion probabilistic models have shown potential for image segmentation. In this paper, a novel Synchronous image-label Diffusion Probability Model (SDPM) is proposed for stroke lesion segmentation on NCCT using a Markov diffusion process. The proposed SDPM is fully based on a Latent Variable Model (LVM), offering a complete probabilistic elaboration. An additional net-stream, parallel with a noise prediction stream, is introduced to obtain initial noisy label estimates for efficiently inferring the final labels. By optimizing the specified variational boundaries, the trained model can infer multiple label estimates for reference given noisy input images. The proposed model was assessed on three stroke lesion datasets including one public and two private datasets. Compared to several U-net and transformer-based segmentation methods, our proposed SDPM model is able to achieve state-of-the-art performance. 
The code is publicly available.	翻訳日:2023-07-06 16:49:56 公開日:2023-07-04
# 医用画像解析の公平性向上を目的とした固定属性群のない校正バイアスの緩和 Mitigating Calibration Bias Without Fixed Attribute Grouping for Improved Fairness in Medical Imaging Analysis ( http://arxiv.org/abs/2307.01738v1 ) ライセンス: Link先を確認	Changjian Shui, Justin Szeto, Raghav Mehta, Douglas Arnold, Tal Arbel	(参考訳) 深層学習医療画像モデルの現実的な臨床実践への展開には、校正が必要である。しかし、全体として十分に調整されたモデルは、サブ人口の調整が不十分なままであり、このモデルの推奨に基づいて、臨床医が不意にこのグループの決定を下す可能性がある。モデル精度の観点から,サブグループ間のバイアスの軽減に有効な方法が示されているが,本研究は医用画像解析の文脈におけるキャリブレーションバイアスの軽減に関するオープン問題に焦点を当てている。本手法は訓練中にサブグループ属性を必要とせず,各属性の選択に対するバイアスを緩和する柔軟性を実現する。そこで本研究では,まず低濃度の試料を同定し,それらをグループに分類し,グループワイド焦点損失を導入して校正バイアスを改善する2段階の手法を提案する。 HAM10000データセットを用いた皮膚病変分類と,多発性硬化症(MS)患者の将来の病変活動の予測について検討した。また,年齢,性別などの従来の敏感な属性を年齢,性別などのサブグループで考慮することに加えて,医療画像解析において必要となる病変負荷など,画像由来の属性が異なるグループ間でのバイアスも考慮する。提案手法は, 予測性能を維持しつつ, 最近のベースラインよりも高い精度で校正誤差を効果的に制御できることを示す。 Trustworthy deployment of deep learning medical imaging models into real-world clinical practice requires that they be calibrated. However, models that are well calibrated overall can still be poorly calibrated for a sub-population, potentially resulting in a clinician unwittingly making poor decisions for this group based on the recommendations of the model. Although methods have been shown to successfully mitigate biases across subgroups in terms of model accuracy, this work focuses on the open problem of mitigating calibration biases in the context of medical image analysis. Our method does not require subgroup attributes during training, permitting the flexibility to mitigate biases for different choices of sensitive attributes without re-training. To this end, we propose a novel two-stage method: Cluster-Focal to first identify poorly calibrated samples, cluster them into groups, and then introduce group-wise focal loss to improve calibration bias. We evaluate our method on skin lesion classification with the public HAM10000 dataset, and on predicting future lesional activity for multiple sclerosis (MS) patients. 
In addition to considering traditional sensitive attributes (e.g., age, sex) with demographic subgroups, we also consider biases among groups with different image-derived attributes, such as lesion load, which are required in medical image analysis. Our results demonstrate that our method effectively controls calibration error in the worst-performing subgroups while preserving prediction performance, and that it outperforms recent baselines.	翻訳日:2023-07-06 16:49:17 公開日:2023-07-04
# すべての国家のデジタル主権戦略 Digital Sovereignty Strategies for Every Nation ( http://arxiv.org/abs/2307.01791v1 ) ライセンス: Link先を確認	Ali Shoker	(参考訳) デジタル主権はすべての近代国家の議題になければならない。デジタル技術は、食べ物や水管理といった重要な要素から、メタバースや宇宙における超越まで、私たちの生活の細部の一部になっています。したがって、デジタル資産を保護することは、現代国家が生き、卓越し、リードすることは避けられない。デジタル主権は、これらのデジタル資産を友好的な合理的な国家の独占から守るための戦略的必要性であり、非友好的な国家や行動の脅威である。本研究では,デジタル資産の利用,所有,生産のバリューチェーン全体をカバーするように拡張することで,デジタル主権の定義と範囲を再検討する。我々は、持続可能な主権を達成するために必要な研究と革新に加えて、原材料と人的専門知識の両方の運用資源を保護することの重要性を強調します。また、自律によるデジタル主権はしばしば不可能であり、相互協力は必ずしも持続可能であるとは限らないことを示す。この目的のために,ゲーム理論においてしばしば研究されるナッシュ平衡を用いたデジタル主権の実現を提案し,合理的状態との関係を規定する。最後に,その現状,優先事項,能力に基づいて,各国のデジタルプロファイルに対するデジタル主権アジェンダを提案する。我々は、現在のデジタル資産を主権化するのに有用な最先端のデジタル技術を調査します。また、自律性に可能な限り近い独立デジタル国家の育成を目指すロードマップも提案する。最後に、技術的、経済的、地政学的という異なる観点からデジタル主権をよりよく理解し、実装するためのさらなる研究の必要性に注目します。 Digital Sovereignty must be on the agenda of every modern nation. Digital technology is becoming part of our life details, from the vital essentials, like food and water management, to transcendence in the Metaverse and Space. Protecting these digital assets will, therefore, be inevitable for a modern country to live, excel and lead. Digital Sovereignty is a strategic necessity to protect these digital assets from the monopoly of friendly rational states, and the threats of unfriendly Malicious states and behaviors. In this work, we revisit the definition and scope of digital sovereignty through extending it to cover the entire value chain of using, owning, and producing digital assets. We emphasize the importance of protecting the operational resources, both raw materials and human expertise, in addition to research and innovation necessary to achieve sustainable sovereignty. We also show that digital sovereignty by autonomy is often impossible, and by mutual cooperation is not always sustainable. To this end, we propose implementing digital sovereignty using Nash Equilibrium, often studied in Game Theory, to govern the relation with Rational states. 
Finally, we propose a digital sovereignty agenda for different countries' digital profiles, based on their status quo, priorities, and capabilities. We survey state-of-the-art digital technology that is useful to make the current digital assets sovereign. Additionally, we propose a roadmap that aims to develop a sovereign digital nation, as close as possible to autonomy. Finally, we draw attention to the need for more research to better understand and implement digital sovereignty from different perspectives: technological, economic, and geopolitical.	翻訳日:2023-07-06 16:41:34 公開日:2023-07-04
# 二重シンプレクティック古典回路:多体カオスの正確に解けるモデル Dual symplectic classical circuits: An exactly solvable model of many-body chaos ( http://arxiv.org/abs/2307.01786v1 ) ライセンス: Link先を確認	Alexios Christopoulos, Andrea De Luca, D L Kovrizhin, Tomaž Prosen	(参考訳) 二重シンプレクティックれんが壁回路における動的相関関数を1次元で計算する方法を提案する。これらは決定論的古典的多体力学系であり、2つの直交(時間と空間)方向のシンプレクティックダイナミクスによって解釈できる。量子双対ユニタリ回路との類似性において、2点動的相関関数は光円錐の端にしか存在しないことが証明される。動的相関は、一般に無限次元である1サイトマルコフ変換作用素の観点で正確に計算可能である。我々はこの理論を、古典的なフロッケスピンチェーンのダイナミクスを記述する二重シンプレクティック回路の特定の族でテストする。驚くべきことに、これらのモデルでは、回転対称性は球面調和関数を基底としてブロック対角形を持つ転送作用素に繋がる。これにより、簡単な局所観測可能量に対する解析的予測が得られる。モンテカルロシミュレーションとの比較により,観測変数の異なる選択に対する優れた一致を示すことにより,我々の理論の有効性を実証する。 We propose a general exact method of calculating dynamical correlation functions in dual symplectic brick-wall circuits in one dimension. These are deterministic classical many-body dynamical systems which can be interpreted in terms of symplectic dynamics in two orthogonal (time and space) directions. In close analogy with quantum dual-unitary circuits, we prove that two-point dynamical correlation functions are non-vanishing only along the edges of the light cones. The dynamical correlations are exactly computable in terms of a one-site Markov transfer operator, which is generally of infinite dimensionality. We test our theory in a specific family of dual-symplectic circuits, describing the dynamics of a classical Floquet spin chain. Remarkably, for these models, the rotational symmetry leads to a transfer operator with a block diagonal form on the basis of spherical harmonics. This allows us to obtain analytical predictions for simple local observables. We demonstrate the validity of our theory by comparison with Monte Carlo simulations, displaying excellent agreement for different choices of observables.	翻訳日:2023-07-06 16:41:05 公開日:2023-07-04
# 思考の内部的な感情 The Inner Sentiments of a Thought ( http://arxiv.org/abs/2307.01784v1 ) ライセンス: Link先を確認	Chris Gagne and Peter Dayan	(参考訳) トランスフォーマーベースの大規模言語モデル(LLM)は、非常にリアルなテキストを生成することができる。彼らは、はっきりと表現することができ、少なくとも暗黙的に、明らかな、価値や覚醒のような明白なものから、決定や賞賛といった微妙なものまで、幅広い感情や色を表現することができる。我々は、これらの表現を初めて探究し、それらが単一文の内部感傷的動作を理解するのにどのように役立つかを示す。長くなる接頭辞に適用されるllmの隠れた表現から文章の最終的な感情分布の量的特徴を推定する。評価, 判断, 賞賛, 不安, 不安の分布の予測が適切に調整されていることを示すと, これらの予測器を用いて文を分析し, 例えば, 通常の接続(例えば"but")でさえ, 発話の感情的軌跡を劇的に変えることができることを示す。次に,分布予測を活用し,分布の尾に感情のある文を生成する方法を示す。本研究は,精神機能障害などの思考の内的作業における結果の意義について考察する。 Transformer-based large-scale language models (LLMs) are able to generate highly realistic text. They are duly able to express, and at least implicitly represent, a wide range of sentiments and color, from the obvious, such as valence and arousal to the subtle, such as determination and admiration. We provide a first exploration of these representations and how they can be used for understanding the inner sentimental workings of single sentences. We train predictors of the quantiles of the distributions of final sentiments of sentences from the hidden representations of an LLM applied to prefixes of increasing lengths. After showing that predictors of distributions of valence, determination, admiration, anxiety and annoyance are well calibrated, we provide examples of using these predictors for analyzing sentences, illustrating, for instance, how even ordinary conjunctions (e.g., "but") can dramatically alter the emotional trajectory of an utterance. We then show how to exploit the distributional predictions to generate sentences with sentiments in the tails of distributions. We discuss the implications of our results for the inner workings of thoughts, for instance for psychiatric dysfunction.	翻訳日:2023-07-06 16:40:48 公開日:2023-07-04
# GHOST:シリコンフォトニクスを用いたグラフニューラルネットワーク加速器 GHOST: A Graph Neural Network Accelerator using Silicon Photonics ( http://arxiv.org/abs/2307.01782v1 ) ライセンス: Link先を確認	Salma Afifi, Febin Sunny, Amin Shafiee, Mahdi Nikdast, Sudeep Pasricha	(参考訳) グラフニューラルネットワーク(GNN)は、グラフ構造化データからモデリングと学習を行うための強力なアプローチとして登場した。その後、複数の分野は、レコメンデーションシステム、ソーシャルネットワーク分析、薬物発見、ロボット工学などのGNNの能力から大きな恩恵を受けている。しかしながら、GNNの大幅な計算とメモリ要求のため、GNNの高速化と効率的な処理には、従来のニューラルネットワークアクセラレータを超えるユニークなアプローチが必要である。 CMOSプラットフォームのスケーリングのスローダウンは、代替実装基板の探索を動機付けている。本稿では、gnnのための最初のシリコンフォトニックハードウェアアクセラレータであるghostについて述べる。 GHOSTは、頂点中心とエッジ中心の両方の操作に関連するコストを効率的に軽減する。光学領域におけるGNNの実行に関わる3つの主要なステージを別々に実装し、グラフ畳み込みネットワークやグラフアテンションネットワークなど、広く使われているGNNモデルやアーキテクチャの推論に使用することができる。我々のシミュレーション研究は、GHOSTがGPU、TPU、CPUおよび複数の最先端GNNハードウェアアクセラレータと比較して、少なくとも10.2倍のスループットと3.8倍のエネルギー効率を示すことを示している。 Graph neural networks (GNNs) have emerged as a powerful approach for modelling and learning from graph-structured data. Multiple fields have since benefitted enormously from the capabilities of GNNs, such as recommendation systems, social network analysis, drug discovery, and robotics. However, accelerating and efficiently processing GNNs require a unique approach that goes beyond conventional artificial neural network accelerators, due to the substantial computational and memory requirements of GNNs. The slowdown of scaling in CMOS platforms also motivates a search for alternative implementation substrates. In this paper, we present GHOST, the first silicon-photonic hardware accelerator for GNNs. GHOST efficiently alleviates the costs associated with both vertex-centric and edge-centric operations. It implements separately the three main stages involved in running GNNs in the optical domain, allowing it to be used for the inference of various widely used GNN models and architectures, such as graph convolution networks and graph attention networks. 
Our simulation studies indicate that GHOST exhibits at least 10.2x better throughput and 3.8x better energy efficiency when compared to GPU, TPU, CPU and multiple state-of-the-art GNN hardware accelerators.	翻訳日:2023-07-06 16:40:29 公開日:2023-07-04
# fedhil: モバイルデバイスを用いたロバストな屋内定位のためのヘテロゲニティレジリエントフェデレーション学習 FedHIL: Heterogeneity Resilient Federated Learning for Robust Indoor Localization with Mobile Devices ( http://arxiv.org/abs/2307.01780v1 ) ライセンス: Link先を確認	Danish Gufran, Sudeep Pasricha	(参考訳) 屋内ローカライゼーションは、緊急対応、倉庫管理、拡張現実体験などのアプリケーションにおいて重要な役割を果たす。機械学習(ML)ベースの屋内ローカライズフレームワークをモバイルデバイスにデプロイすることで、ユーザはさまざまな屋内および地下環境にローカライズすることができる。しかし、モバイルデバイスのハードウェアやソフトウェアスタックの不均一性のため、正確な屋内ローカライゼーションを実現することは困難であり、不整合かつ不正確な位置推定をもたらす可能性がある。従来のMLモデルは、初期トレーニングデータにも大きく依存しているため、内部環境全体の動的変更によるパフォーマンス低下に対して脆弱である。デバイスの不均一性と適応性の欠如による課題に対処するため,FedHILと呼ばれる新しいMLフレームワークを提案する。本研究では,屋内ローカライズとフェデレーション学習(fl)を組み合わせて,デバイスヘテロジェンス環境における屋内ローカライズ精度を向上させるとともに,ユーザデータのプライバシも保持する。 FedHILは、極めてノイズの多いデータが存在する場合でも、FL中の屋内ローカライゼーションのためのMLモデルの性能を維持するために、ドメイン固有の選択的な重量調整アプローチを統合する。各種屋内環境および異種モバイルデバイスを用いた実験により,FedHILは最先端のFLおよび非FL屋内ローカライゼーションフレームワークよりも優れた性能を示した。 FedHILは、FLベースの屋内ローカライゼーションフレームワークよりも平均して1.62倍の精度でローカライズすることができる。 Indoor localization plays a vital role in applications such as emergency response, warehouse management, and augmented reality experiences. By deploying machine learning (ML) based indoor localization frameworks on their mobile devices, users can localize themselves in a variety of indoor and subterranean environments. However, achieving accurate indoor localization can be challenging due to heterogeneity in the hardware and software stacks of mobile devices, which can result in inconsistent and inaccurate location estimates. Traditional ML models also heavily rely on initial training data, making them vulnerable to degradation in performance with dynamic changes across indoor environments. To address the challenges due to device heterogeneity and lack of adaptivity, we propose a novel embedded ML framework called FedHIL. Our framework combines indoor localization and federated learning (FL) to improve indoor localization accuracy in device-heterogeneous environments while also preserving user data privacy. 
FedHIL integrates a domain-specific selective weight adjustment approach to preserve the ML model's performance for indoor localization during FL, even in the presence of extremely noisy data. Experimental evaluations in diverse real-world indoor environments and with heterogeneous mobile devices show that FedHIL outperforms state-of-the-art FL and non-FL indoor localization frameworks. FedHIL is able to achieve 1.62x better localization accuracy on average than the best performing FL-based indoor localization framework from prior work.	翻訳日:2023-07-06 16:40:11 公開日:2023-07-04
# 物理的に実現可能な自然着衣テクスチャを用いた3次元モデリング Physically Realizable Natural-Looking Clothing Textures Evade Person Detectors via 3D Modeling ( http://arxiv.org/abs/2307.01778v1 ) ライセンス: Link先を確認	Zhanhao Hu, Wenda Chu, Xiaopei Zhu, Hui Zhang, Bo Zhang, Xiaolin Hu	(参考訳) 近年の研究では、人検知器を避けるために敵の服を作る方法が提案されているが、これは限られた視角でのみ有効か、人間にとって非常に顕著である。 3dプリントされた亀などの硬い物体を製作するために用いられてきた3dモデリングに基づいて、服の逆テクスチャを制作することを目指している。硬い物体とは異なり、人間と衣服は非剛性であり、物理的実現が困難になる。複数の視角で人検出を回避できる自然的な対向服を作るために, 日常服の典型的なテクスチャの一種であるカモフラージュテクスチャに類似した対向的なカモフラージュテクスチャ(AdvCaT)を提案する。我々はvoronoiダイアグラムとgumbel-softmaxのトリックを利用して迷彩テクスチャをパラメータ化し、3dモデリングによりパラメータを最適化する。さらに,デジタルオブジェクトと実世界のオブジェクトのギャップを狭めるために,トポロジカル・プラザブル・プロジェクション(topoproj)と薄板スプライン(tps)を組み合わせた3次元メッシュ上の効率的な拡張パイプラインを提案する。開発した3dテクスチャを布素材にプリントし、tシャツやズボンに仕立てました。実験では、これらの服が複数の検出器に対して高い攻撃成功率を示す。 Recent works have proposed to craft adversarial clothes for evading person detectors, while they are either only effective at limited viewing angles or very conspicuous to humans. We aim to craft adversarial texture for clothes based on 3D modeling, an idea that has been used to craft rigid adversarial objects such as a 3D-printed turtle. Unlike rigid objects, humans and clothes are non-rigid, leading to difficulties in physical realization. In order to craft natural-looking adversarial clothes that can evade person detectors at multiple viewing angles, we propose adversarial camouflage textures (AdvCaT) that resemble one kind of the typical textures of daily clothes, camouflage textures. We leverage the Voronoi diagram and Gumbel-softmax trick to parameterize the camouflage textures and optimize the parameters via 3D modeling. Moreover, we propose an efficient augmentation pipeline on 3D meshes combining topologically plausible projection (TopoProj) and Thin Plate Spline (TPS) to narrow the gap between digital and real-world objects. We printed the developed 3D texture pieces on fabric materials and tailored them into T-shirts and trousers. 
Experiments show high attack success rates of these clothes against multiple detectors.	翻訳日:2023-07-06 16:39:45 公開日:2023-07-04
# shapley sets: 再帰的関数分解による機能帰属 Shapley Sets: Feature Attribution via Recursive Function Decomposition ( http://arxiv.org/abs/2307.01777v1 ) ライセンス: Link先を確認	Torty Sivill and Peter Flach	(参考訳) ユビキタスな使用にもかかわらず、Shapleyの価値ある特徴属性は、モデルとデータの両方の機能相互作用のために誤解を招く可能性がある。我々は,機能集合に価値を与える代替帰属アプローチであるshapley setsを提案する。 Shapley Setsは、変数数の対数線形複雑性を持つ再帰関数分解アルゴリズムを用いて、基礎モデルを非分離変数群に分解する。シャプリーは、それぞれの分離不能な変数群に対して属性を特定の予測のためにそれらの組み合わせ値に設定する。シェープ集合は変換された特徴集合上のシェープ値と等価であることを示し、したがってフェアネスの同じ公理の恩恵を受ける。 Shapley Setsは値関数非依存であり、Shapley SetsがShapley値ベースの代替手段に関連する落とし穴を回避し、複雑な依存構造を持つデータ型に対して特に有利であることを示す。 Despite their ubiquitous use, Shapley value feature attributions can be misleading due to feature interaction in both model and data. We propose an alternative attribution approach, Shapley Sets, which awards value to sets of features. Shapley Sets decomposes the underlying model into non-separable variable groups using a recursive function decomposition algorithm with log linear complexity in the number of variables. Shapley Sets attributes to each non-separable variable group their combined value for a particular prediction. We show that Shapley Sets is equivalent to the Shapley value over the transformed feature set and thus benefits from the same axioms of fairness. Shapley Sets is value function agnostic and we show theoretically and experimentally how Shapley Sets avoids pitfalls associated with Shapley value based alternatives and are particularly advantageous for data types with complex dependency structure.	翻訳日:2023-07-06 16:39:22 公開日:2023-07-04
# スライスワッサーシュタイン一般化測地学による高速最適輸送 Fast Optimal Transport through Sliced Wasserstein Generalized Geodesics ( http://arxiv.org/abs/2307.01770v1 ) ライセンス: Link先を確認	Guillaume Mahey, Laetitia Chapel, Gilles Gasso, Clément Bonet, Nicolas Courty	(参考訳) ワッサースタイン距離(wasserstein distance, wd)と関連する最適輸送計画は、確率測度が懸かっている多くの応用において有用であることが証明されている。本稿では,2つの入力分布の最適1次元投影により誘導される輸送マップに基づく,2乗WDの新たなプロキシであるmin-SWGGを提案する。 min-swgg と wasserstein の一般化測地学との接続を描き、ピボット測度を直線上で支持する。特に、ライン上でサポートされている分布の1つの場合において、正確なワッサースタイン距離に対する新しい閉形式を提供し、勾配降下最適化に適応可能な高速計算スキームを導出する。 min-SWGG は WD の上限であり,Sliced-Wasserstein と同様の複雑性を有し,関連する輸送計画を提供するという付加的な特徴を有することを示す。また、距離性、弱収束、計算および位相的性質などの理論的性質についても検討する。実験的な証拠は、勾配流、形状マッチング、画像の着色など、様々な文脈におけるmin-SWGGの利点を支持する。 Wasserstein distance (WD) and the associated optimal transport plan have been proven useful in many applications where probability measures are at stake. In this paper, we propose a new proxy of the squared WD, coined min-SWGG, that is based on the transport map induced by an optimal one-dimensional projection of the two input distributions. We draw connections between min-SWGG and Wasserstein generalized geodesics in which the pivot measure is supported on a line. We notably provide a new closed form for the exact Wasserstein distance in the particular case of one of the distributions supported on a line allowing us to derive a fast computational scheme that is amenable to gradient descent optimization. We show that min-SWGG is an upper bound of WD and that it has a complexity similar to that of Sliced-Wasserstein, with the additional feature of providing an associated transport plan. We also investigate some theoretical properties such as metricity, weak convergence, computational and topological properties. Empirical evidence supports the benefits of min-SWGG in various contexts, from gradient flows, shape matching and image colorization, among others.	翻訳日:2023-07-06 16:39:07 公開日:2023-07-04
# データ中心型MLの前提条件としてのローカライズドデータワーク:ガーナにおけるフルライフサイクル作物病の特定を事例として Localized Data Work as a Precondition for Data-Centric ML: A Case Study of Full Lifecycle Crop Disease Identification in Ghana ( http://arxiv.org/abs/2307.01767v1 ) ライセンス: Link先を確認	Darlington Akogo, Issah Samori, Cyril Akafia, Harriet Fiagbor, Andrews Kangah, Donald Kwame Asiedu, Kwabena Fuachie, Luis Oala	(参考訳) ghana cashew disease identification with artificial intelligence (cadi ai)プロジェクトは、農業の生産性や食品の安全性など、公共の業務に有用な、局所的なデータ中心のソリューションを提供するための前提条件として、健全なデータワークの重要性を実証している。ドローン収集データと機械学習を使用して作物のストレスを判定する。データ、モデル、最終アプリは共同で開発され、デスクトップアプリケーションを通じて地元の農家に提供される。 The Ghana Cashew Disease Identification with Artificial Intelligence (CADI AI) project demonstrates the importance of sound data work as a precondition for the delivery of useful, localized data-centric solutions for public good tasks such as agricultural productivity and food security. Drone-collected data and machine learning are utilized to determine crop stressors. Data, model and the final app are developed jointly and made available to local farmers via a desktop application.	翻訳日:2023-07-06 16:38:46 公開日:2023-07-04
# 限定アノテートデータに対する知識認識型オーディオグラウンド生成スロットフィリング Knowledge-Aware Audio-Grounded Generative Slot Filling for Limited Annotated Data ( http://arxiv.org/abs/2307.01764v1 ) ライセンス: Link先を確認	Guangzhi Sun, Chao Zhang, Ivan Vulić, Paweł Budzianowski, Philip C. Woodland	(参考訳) タスク指向対話(tod)システムのための細粒度スロット値ラベルを手動で注釈するのは、高価で時間がかかります。これにより、限られた量のラベル付きデータを扱うスロットフィルング方法の研究が動機となる。さらに、ToDに関する現在の研究の大部分は、音声言語で作業する際の不完全な自動音声認識(ASR)のさらなる課題を無視し、入力モダリティとしてのテキストのみに基づいている。本研究では,音声入力によるToDの少数ショットおよびゼロショットスロットフィリングに着目した,知識認識型音声グラウンド型生成スロットフィリングフレームワークKA2Gを提案する。 KA2Gは音声ベースのToDにおけるロバストかつデータ効率の良いスロットフィリングを実現する 1)テキスト生成タスクとしてフレーミングすること。 2)音声モダリティに付加的なテキスト生成の接地,及び 3) 利用可能な外部知識の条件付け(スロット値の事前定義されたリストなど)。 KA2Gフレームワーク内の両方のモダリティを組み合わせることで、ASRエラーに対する堅牢性が向上することを示す。さらに、ポインタ生成機構を介して実装されたka2gの知識認識スロット値生成器は、特に、少数ショット学習とゼロショット学習にメリットがある。商用todシステムから抽出した標準音声ベースのシングルターンslurpデータセットとマルチターンデータセットを用いて実験を行い,先行研究と比べて,特に少数ショットおよびゼロショット設定において,強固かつ一貫した改善を示す。 Manually annotating fine-grained slot-value labels for task-oriented dialogue (ToD) systems is an expensive and time-consuming endeavour. This motivates research into slot-filling methods that operate with limited amounts of labelled data. Moreover, the majority of current work on ToD is based solely on text as the input modality, neglecting the additional challenges of imperfect automatic speech recognition (ASR) when working with spoken language. In this work, we propose a Knowledge-Aware Audio-Grounded generative slot-filling framework, termed KA2G, that focuses on few-shot and zero-shot slot filling for ToD with speech input. KA2G achieves robust and data-efficient slot filling for speech-based ToD by 1) framing it as a text generation task, 2) grounding text generation additionally in the audio modality, and 3) conditioning on available external knowledge (e.g. a predefined list of possible slot values). We show that combining both modalities within the KA2G framework improves the robustness against ASR errors. 
Further, the knowledge-aware slot-value generator in KA2G, implemented via a pointer generator mechanism, particularly benefits few-shot and zero-shot learning. Experiments, conducted on the standard speech-based single-turn SLURP dataset and a multi-turn dataset extracted from a commercial ToD system, display strong and consistent gains over prior work, especially in few-shot and zero-shot setups.	翻訳日:2023-07-06 16:38:38 公開日:2023-07-04
# 説明可能な行動不確かさを伴う人間の軌道予測 Human Trajectory Forecasting with Explainable Behavioral Uncertainty ( http://arxiv.org/abs/2307.01817v1 ) ライセンス: Link先を確認	Jiangbei Yue, Dinesh Manocha and He Wang	(参考訳) 人間の軌道予測は、人間の行動を理解し予測するのに役立ち、社会ロボットから自動運転車への応用を可能にする。既存の手法はモデルフリーとモデルベースに分けることができる。モデルフリー手法は予測精度が優れているが説明可能性に欠ける一方、モデルベース手法は説明可能性を提供するが、よく予測できない。両手法を組み合わせることで,行動sdeモデルとベイズニューラルネットワーク(bnns)を結合した新しいベイズ型神経確率微分方程式モデルbnsp-sfmを提案する。 NNは優れた予測力を提供するが、SDEは行動や観察における定量的不確実性を伴う強い説明可能性を提供する。 BNSP-SFMの予測精度は,11種類の最先端手法と比較して50%向上した。 BNSP-SFMはまた、異なる環境と群衆密度(テストデータより約20倍高い)で劇的に異なるシーンを一般化する。最後に、BNSP-SFMは、行動の潜在的な原因をよりよく説明するために、自信を持って予測を提供する。コードは受理後にリリースされます。 Human trajectory forecasting helps to understand and predict human behaviors, enabling applications from social robots to self-driving cars, and therefore has been heavily investigated. Most existing methods can be divided into model-free and model-based methods. Model-free methods offer superior prediction accuracy but lack explainability, while model-based methods provide explainability but cannot predict well. Combining both methodologies, we propose a new Bayesian Neural Stochastic Differential Equation model BNSP-SFM, where a behavior SDE model is combined with Bayesian neural networks (BNNs). While the NNs provide superior predictive power, the SDE offers strong explainability with quantifiable uncertainty in behavior and observation. We show that BNSP-SFM achieves up to a 50% improvement in prediction accuracy, compared with 11 state-of-the-art methods. BNSP-SFM also generalizes better to drastically different scenes with different environments and crowd densities (~20 times higher than the testing data). Finally, BNSP-SFM can provide predictions with confidence to better explain potential causes of behaviors. The code will be released upon acceptance.	翻訳日:2023-07-06 16:32:52 公開日:2023-07-04
# 複素重みをもつ複素ネットワークにおける構造バランスとランダムウォーク Structural Balance and Random Walks on Complex Networks with Complex Weights ( http://arxiv.org/abs/2307.01813v1 ) ライセンス: Link先を確認	Yu Tian, Renaud Lambiotte	(参考訳) 複素数は、多くの状況における実体間の関係を定義する。正準例は量子物理学におけるハミルトン行列の対角線外項である。近年、エッジの重みが複雑な数である場合、ネットワーク科学のツールを拡張することへの関心が高まっている。ここでは、重み行列が多くの応用において妥当な仮定であるエルミート行列である場合に注目し、複素重み付きネットワークの構造的および動的特性について検討する。符号付きグラフの概念に基づいて,構造的バランスの概念に基づく複雑重み付きネットワークの分類を行い,各タイプのスペクトル特性の共有について述べる。次に,グラフの構造的バランスが取れた場合に局所的なコンセンサスを漸近的に達成し,厳密なバランスが取れない場合に大域的なコンセンサスを得る,複雑な重み付きネットワーク上でのランダムウォークのダイナミクスを特徴付ける。最後に,カットの概念を一般化し,その可能性について検討し,関連するスペクトルクラスタリングアルゴリズムを提案する。また、複素重み付きネットワークに関連付ける磁気ラプラシアンのさらなる特性も提供する。アルゴリズムの性能は合成ネットワークと実ネットワークの両方で検証される。 Complex numbers define the relationship between entities in many situations. A canonical example would be the off-diagonal terms in a Hamiltonian matrix in quantum physics. Recent years have seen an increasing interest to extend the tools of network science when the weight of edges are complex numbers. Here, we focus on the case when the weight matrix is Hermitian, a reasonable assumption in many applications, and investigate both structural and dynamical properties of the complex-weighted networks. Building on concepts from signed graphs, we introduce a classification of complex-weighted networks based on the notion of structural balance, and illustrate the shared spectral properties within each type. We then apply the results to characterise the dynamics of random walks on complex-weighted networks, where local consensus can be achieved asymptotically when the graph is structurally balanced, while global consensus will be obtained when it is strictly unbalanced. Finally, we explore potential applications of our findings by generalising the notion of cut, and propose an associated spectral clustering algorithm. We also provide further characteristics of the magnetic Laplacian, associating directed networks to complex-weighted ones. 
The performance of the algorithm is verified on both synthetic and real networks.	翻訳日:2023-07-06 16:32:35 公開日:2023-07-04
# 3次元時間検出のための学習意義誘導情報 SUIT: Learning Significance-guided Information for 3D Temporal Detection ( http://arxiv.org/abs/2307.01807v1 ) ライセンス: Link先を確認	Zheyuan Zhou, Jiachen Lu, Yihan Zeng, Hang Xu, Li Zhang	(参考訳) LiDARポイントクラウドからの3Dオブジェクト検出は、自動運転とロボット工学にとって非常に重要である。逐次点雲は時間的情報を通じて3次元知覚を高める可能性があるが、これらの時間的特徴を効果的に効果的に活用することは難しい問題である。前景情報がライダーシーンに分散しているという観測に基づいて、十分な知識は密集した地図ではなくスパースフォーマットで提供できると信じている。そこで本研究では,時間情報をフレーム間の情報融合のためのばらばらな特徴として単純化する3次元時間検出(suit)の意義誘導情報を学ぶことを提案する。具体的には,まず,予測対象のセントロイドに基づいて,情報に富みながらもスパースな特徴を抽出できる重要なサンプリング機構を導入する。さらに,フレームにまたがるスパース特徴間のオブジェクト中心変換を学習する,明示的な幾何学的変換学習手法を提案する。大規模なnuScenesとWaymoデータセットにおいて、SUITは時間融合のメモリと計算コストを大幅に削減するだけでなく、最先端のベースラインよりも優れた性能を発揮する。 3D object detection from LiDAR point cloud is of critical importance for autonomous driving and robotics. While sequential point cloud has the potential to enhance 3D perception through temporal information, utilizing these temporal features effectively and efficiently remains a challenging problem. Based on the observation that the foreground information is sparsely distributed in LiDAR scenes, we believe sufficient knowledge can be provided by sparse format rather than dense maps. To this end, we propose to learn Significance-gUided Information for 3D Temporal detection (SUIT), which simplifies temporal information as sparse features for information fusion across frames. Specifically, we first introduce a significant sampling mechanism that extracts information-rich yet sparse features based on predicted object centroids. On top of that, we present an explicit geometric transformation learning technique, which learns the object-centric transformations among sparse features across frames. We evaluate our method on large-scale nuScenes and Waymo dataset, where our SUIT not only significantly reduces the memory and computation cost of temporal fusion, but also performs well over the state-of-the-art baselines.	翻訳日:2023-07-06 16:32:15 公開日:2023-07-04
# DeepFlorist: ディープニューラルネットワークとアンサンブルラーニングをオブジェクト分類のためのメタ分類器として考える DeepFlorist: Rethinking Deep Neural Networks and Ensemble Learning as A Meta-Classifier For Object Classification ( http://arxiv.org/abs/2307.01806v1 ) ライセンス: Link先を確認	Afshin Khadangi	(参考訳) 本稿では,アンサンブル学習をメタ分類として用いた花分類のための新しい学習パラダイム"DeepFlorist"を提案する。 DeepFloristは、深層学習のパワーとアンサンブル手法の堅牢さを組み合わせて、正確で信頼性の高い花分類結果を達成する。提案するネットワークアーキテクチャは,高次畳み込みニューラルネットワーク(DCNN)と畳み込みニューラルネットワーク(CNN)を組み合わせることで,花のイメージから高次特徴を抽出し,次に完全に連結された階層を分類する。 DeepFloristの性能向上と一般化のために、複数の多様なモデルを組み込んで分類精度を向上させるアンサンブル学習手法が採用された。ベンチマークフラワーデータセットの実験結果は、精度とロバスト性の観点から、deepfloristが最先端の手法よりも優れていることを示した。提案フレームワークは, 植物分類学, 保全研究, 生態学研究の進歩を可能とし, 実地応用における自動花認識システムへの大きな可能性を秘めている。 In this paper, we propose a novel learning paradigm called "DeepFlorist" for flower classification using ensemble learning as a meta-classifier. DeepFlorist combines the power of deep learning with the robustness of ensemble methods to achieve accurate and reliable flower classification results. The proposed network architecture leverages a combination of dense convolutional and convolutional neural networks (DCNNs and CNNs) to extract high-level features from flower images, followed by a fully connected layer for classification. To enhance the performance and generalization of DeepFlorist, an ensemble learning approach is employed, incorporating multiple diverse models to improve the classification accuracy. Experimental results on benchmark flower datasets demonstrate the effectiveness of DeepFlorist, outperforming state-of-the-art methods in terms of accuracy and robustness. The proposed framework holds significant potential for automated flower recognition systems in real-world applications, enabling advancements in plant taxonomy, conservation efforts, and ecological studies.	翻訳日:2023-07-06 16:31:55 公開日:2023-07-04
# フーリエニューラル演算子による付加製造中の局所温度変化の捕捉 Capturing Local Temperature Evolution during Additive Manufacturing through Fourier Neural Operators ( http://arxiv.org/abs/2307.01804v1 ) ライセンス: Link先を確認	Jiangce Chen, Wenzhuo Xu, Martha Baldwin, Björn Nijhuis, Ton van den Boogaard, Noelia Grande Gutiérrez, Sneha Prabha Narra, Christopher McComb	(参考訳) 部品設計、プロセス計画、モニタリング、制御など、複数の分野における付加製造(AM)技術の性能向上には、AM中の熱挙動を迅速にシミュレートできる高忠実度のデータ駆動モデルが不可欠である。しかしながら、部品形状の複雑さは、現在のモデルが幅広い形状にわたって高い精度を維持することを困難にしている。さらに、多くのモデルは領域全体(部品)にわたる低い平均二乗誤差(MSE)を報告している。しかし、各時間ステップにおいて、領域の大部分は、直近の堆積部付近の熱影響部を除いて、大きな温度変化を経験しない。したがって、MSEに基づくモデルの忠実度は過大評価されている可能性がある。本稿では,フーリエニューラル演算子を用いて付加製造過程における局所温度変化を捉えるデータ駆動モデルを提案する。さらに, 平均温度を予測値とした場合と比較したモデル性能の相対的な尺度を与える$R^2$メトリックを用いてモデルを評価することを提案する。本モデルは,指向性エネルギー堆積(Direct Energy Deposition)法を対象とした不連続ガレルキン有限要素法に基づく数値シミュレーションで検証され, $R^2$で測定した高い忠実度を達成し, 訓練に含まれない形状への一般化性を維持することが示された。 High-fidelity, data-driven models that can quickly simulate thermal behavior during additive manufacturing (AM) are crucial for improving the performance of AM technologies in multiple areas, such as part design, process planning, monitoring, and control. However, the complexities of part geometries make it challenging for current models to maintain high accuracy across a wide range of geometries. Additionally, many models report a low mean square error (MSE) across the entire domain (part). However, in each time step, most areas of the domain do not experience significant changes in temperature, except for the heat-affected zones near recent depositions. Therefore, the MSE-based fidelity measurement of the models may be overestimated. This paper presents a data-driven model that uses Fourier Neural Operator to capture the local temperature evolution during the additive manufacturing process. In addition, the authors propose to evaluate the model using the $R^2$ metric, which provides a relative measure of the model's performance compared to using mean temperature as a prediction. 
The model was tested on numerical simulations based on the Discontinuous Galerkin Finite Element Method for the Direct Energy Deposition process, and the results demonstrate that the model achieves high fidelity as measured by $R^2$ and maintains generalizability to geometries that were not included in the training process.	翻訳日:2023-07-06 16:31:36 公開日:2023-07-04
# スタビライザー分解による三角形を含むZXダイアグラムの高速縮約 Speedy Contraction of ZX Diagrams with Triangles via Stabiliser Decompositions ( http://arxiv.org/abs/2307.01803v1 ) ライセンス: Link先を確認	Mark Koch, Richie Yeung, Quanlong Wang	(参考訳) クリフォード+T回路の古典的シミュレーションの最近の進歩は、ZX計算を用いてマジック状態をスタビライザー項へ反復的に分解・単純化する。三角形演算を含むZXダイアグラムのスタビライザー分解を研究することで,この方法を改善する。この手法は、三角形を用いて自然に表現できる多重制御ゲートを含む量子回路のシミュレーションを大幅に高速化することを示す。提案手法をQuiZXライブラリに実装し,ランダム回路および従来用いられてきたベンチマーク回路の変種に対して大幅なシミュレーション高速化(最大で数桁)を示す。さらに,本ソフトウェアを用いてパラメータ化量子回路の勾配分散を表すダイアグラムを縮約し,量子機械学習に用いられるアンザッツにおけるバレンプラトー現象を自動的に数値検出するツールを得る。従来の統計的手法と比較すると, この手法は勾配分散の正確な値を与え, 1つのダイアグラムを縮約するだけでよい。このツールの性能は、quimbライブラリに対するベンチマークで示されているように、テンソルネットワークアプローチと競合する。 Recent advances in classical simulation of Clifford+T circuits make use of the ZX calculus to iteratively decompose and simplify magic states into stabiliser terms. We improve on this method by studying stabiliser decompositions of ZX diagrams involving the triangle operation. We show that this technique greatly speeds up the simulation of quantum circuits involving multi-controlled gates which can be naturally represented using triangles. We implement our approach in the QuiZX library and demonstrate a significant simulation speed-up (up to multiple orders of magnitude) for random circuits and a variation of previously used benchmarking circuits. Furthermore, we use our software to contract diagrams representing the gradient variance of parametrised quantum circuits, which yields a tool for the automatic numerical detection of the barren plateau phenomenon in ansätze used for quantum machine learning. Compared to traditional statistical approaches, our method yields exact values for gradient variances and only requires contracting a single diagram. The performance of this tool is competitive with tensor network approaches, as demonstrated with benchmarks against the quimb library.	翻訳日:2023-07-06 16:31:11 公開日:2023-07-04
# Infinite Tensor Network Contraction によるオープン量子システムダイナミクス Open Quantum System Dynamics from Infinite Tensor Network Contraction ( http://arxiv.org/abs/2307.01802v1 ) ライセンス: Link先を確認	Valentin Link, Hong-Hao Tu, Walter T. Strunz	(参考訳) 近年、強結合な非マルコフ開系の力学を計算するための手法が、行列積状態(MPS)形式に縮約できるテンソルネットワークの観点でいわゆるプロセステンソルの表現に基づいている。ガウス環境においては, 浴槽応答の定常性を利用して, 無限MPS進化法を用いて, このMPSを構築することができることを示す。この結果は、階層的あるいは擬態的手法のように、自由度を補助するオープンシステムの進化と構造的に類似している。しかし、これらの自由度はMPS進化アルゴリズムによって自動的に生成される。さらに, プロセステンソルネットワークを縮約するアルゴリズムは, 既存の提案よりも強い結合問題に対して大きな計算速度アップをもたらす。 Recently developed methods to compute dynamics of strongly coupled non-Markovian open systems are based on a representation of the so-called process tensor in terms of a tensor network, which can be contracted to matrix product state (MPS) form. We show that for Gaussian environments the stationarity of the bath response can be exploited in order to construct this MPS using infinite MPS evolution methods. The result structurally resembles open system evolution with auxiliary degrees of freedom, as in hierarchical or pseudomode methods. Here, however, these degrees of freedom are generated automatically by the MPS evolution algorithm. Furthermore, our algorithm for contracting the process tensor network leads to significant computational speed-ups for strong coupling problems over existing proposals.	翻訳日:2023-07-06 16:30:51 公開日:2023-07-04
# 対角2量子ビットゲートとクラスター測定を用いて実行される測定型量子計算における古典的に効率的な領域 Classically efficient regimes in measurement based quantum computation performed using diagonal two qubit gates and cluster measurements ( http://arxiv.org/abs/2307.01800v1 ) ライセンス: Link先を確認	Sahar Atallah, Michael Garn, Yukuan Tao, Shashank Virmani	(参考訳) 最近の研究 arXiv:2201.07655v2 において、我々は次のような量子系を効率よく古典的にシミュレートできる定数 $\lambda > 0$ が存在することを示した。 (i) quditがグラフのノードに配置される。 (ii) 各quditは最大で$D$個の対角ゲートを受ける。 (iii) 各quditは計算基底またはそれに対して不偏な基底で破壊的に測定される。 (iv) 各quditは、特定の距離尺度に関して対角状態から$\lambda^{-D}$以内に初期化される。本研究では、任意の2量子ビット対角ゲートに対して$\lambda$を明示的に計算し、arXiv:2201.07655v2の計算をCZゲート以外に拡張する。これにより、任意の有限次数グラフに対して、パラメータの他の値では理想的なクラスター状態量子計算が可能になりうるにもかかわらず、許容された測定に対して非自明な古典的に効率的にシミュレート可能な「相」を持つ、純粋エンタングル状態の2パラメータ族(または熱状態の3パラメータ族)を記述できる。主な技術的ツールは、作用素の「円筒的」集合の観点から分離可能性を考えることである。また、異なる集合の選択によってアルゴリズムを強化できるかどうかも検討し、それらが広いクラスの集合の中で最適であることを証明するが、このクラスの外には古典的に効率的な領域のサイズを増大させる選択肢が存在することも数値的に示す。 In a recent work arXiv:2201.07655v2 we showed that there is a constant $\lambda >0$ such that it is possible to efficiently classically simulate a quantum system in which (i) qudits are placed on the nodes of a graph, (ii) each qudit undergoes at most $D$ diagonal gates, (iii) each qudit is destructively measured in the computational basis or bases unbiased to it, and (iv) each qudit is initialised within $\lambda^{-D}$ of a diagonal state according to a particular distance measure. In this work we explicitly compute $\lambda$ for any two qubit diagonal gate, thereby extending the computation of arXiv:2201.07655v2 beyond CZ gates. For any finite degree graph this allows us to describe a two parameter family of pure entangled quantum states (or three parameter family of thermal states) which have a non-trivial classically efficiently simulatable "phase" for the permitted measurements, even though other values of the parameters may enable ideal cluster state quantum computation. The main technical tool involves considering separability in terms of "cylindrical" sets of operators. 
We also consider whether a different choice of set can strengthen the algorithm, and prove that they are optimal among a broad class of sets, but also show numerically that outside this class there are choices that can increase the size of the classically efficient regime.	翻訳日:2023-07-06 16:30:39 公開日:2023-07-04
# エッジアウェアマルチタスクネットワークによるマルチモダリティmriにおける肝腫瘍の定量化分節化と不確実性予測の統合 Edge-aware Multi-task Network for Integrating Quantification Segmentation and Uncertainty Prediction of Liver Tumor on Multi-modality Non-contrast MRI ( http://arxiv.org/abs/2307.01798v1 ) ライセンス: Link先を確認	Xiaojiao Xiao, Qinmin Hu, Guanghui Wang	(参考訳) multi-modality non-contrast magnetic resonance imaging (ncmri) における肝腫瘍の同時定量化, 分節化, 不確実性評価は, 診断に不可欠である。しかし、既存の手法では、マルチモードNCMRI融合と正確な境界情報取得のための効果的なメカニズムが欠如しており、これらのタスクは困難である。これらの課題に対処するために,マルチインデックス定量化,セグメンテーション,不確実性を多モードNCMRI上で関連付けるために,エッジ対応マルチタスクネットワーク(EaMtNet)という統合フレームワークを提案する。 EaMtNetは2つの並列CNNエンコーダとソベルフィルタを使用して、それぞれローカル特徴とエッジマップを抽出する。新たに設計されたエッジ対応機能集約モジュール(EaFA)は、機能融合と選択に使用され、機能マップとエッジマップ間の長距離依存性をキャプチャすることで、ネットワークエッジ対応を実現する。マルチタスクは予測誤差を利用して不確実性を推定し、セグメンテーションと定量化性能を改善する。マルチモダリティncmriと250名の臨床被験者による広範囲な実験を行った。提案モデルでは, ダイス類似度係数が90.01$\pm$1.23, 平均絶対誤差が2.72$\pm$0.58 mmである。その結果,EaMtNetは医用画像解析のための信頼性の高い臨床支援ツールとしての可能性を示した。 Simultaneous multi-index quantification, segmentation, and uncertainty estimation of liver tumors on multi-modality non-contrast magnetic resonance imaging (NCMRI) are crucial for accurate diagnosis. However, existing methods lack an effective mechanism for multi-modality NCMRI fusion and accurate boundary information capture, making these tasks challenging. To address these issues, this paper proposes a unified framework, namely edge-aware multi-task network (EaMtNet), to associate multi-index quantification, segmentation, and uncertainty of liver tumors on the multi-modality NCMRI. The EaMtNet employs two parallel CNN encoders and the Sobel filters to extract local features and edge maps, respectively. The newly designed edge-aware feature aggregation module (EaFA) is used for feature fusion and selection, making the network edge-aware by capturing long-range dependency between feature and edge maps. Multi-tasking leverages prediction discrepancy to estimate uncertainty and improve segmentation and quantification performance. 
Extensive experiments are performed on multi-modality NCMRI with 250 clinical subjects. The proposed model outperforms the state-of-the-art by a large margin, achieving a dice similarity coefficient of 90.01$\pm$1.23 and a mean absolute error of 2.72$\pm$0.58 mm for MD. The results demonstrate the potential of EaMtNet as a reliable clinical-aided tool for medical image analysis.	翻訳日:2023-07-06 16:30:13 公開日:2023-07-04
# 媒質中QCDジェットの量子シミュレーション:運動量広がり、グルーオン生成、エントロピー成長 Quantum simulation of in-medium QCD jets: momentum broadening, gluon production, and entropy growth ( http://arxiv.org/abs/2307.01792v1 ) ライセンス: Link先を確認	João Barata, Xiaojian Du, Meijian Li, Wenyang Qian, Carlos A. Salgado	(参考訳) ジェットは、超相対論的重イオン衝突で生成されるクォークグルーオンプラズマと、深非弾性散乱実験で探究される冷たい核物質の主要なプローブの1つである。しかしながら、近年の重要な発展にもかかわらず、媒質内のQCDジェットの実時間発展の記述は完成にはほど遠い。我々は以前の研究で、QCD物質中のジェット発展をシミュレートし、現在の計算に内在する技術的困難を克服するための、有望な代替的な理論実験室として量子技術を検討した。ここでは、その研究を単一粒子 $|q\rangle$ からフォック空間 $|q\rangle+|qg\rangle$ に拡張し、グルーオンの生成を考慮する。光面ハミルトニアン形式に基づいて、確率的カラー場として記述された媒質の存在下で多粒子ジェットプローブの発展を追跡するデジタル量子回路を構築する。ジェット状態の運動量広がりを調べ,アイコナール推定と比較することで,無視できないサブアイコナール効果を観測した。また,グルーオン放出確率に対する媒質誘起の変化を調べ,真空分裂関数と比べて小さな補正が現れることを示した。さらに、クォーク成分に関連するフォン・ノイマンエントロピーの時間発展を調べ、エントロピーの指数関数は裸のクォークに対しては時間に対して線形に成長するが、グルーオン放出を考慮すると超線形に成長することを見出した。 Jets provide one of the primary probes of the quark-gluon plasma produced in ultrarelativistic heavy ion collisions and the cold nuclear matter explored in deep inelastic scattering experiments. However, despite important developments in the last years, a description of the real-time evolution of QCD jets inside a medium is still far from being complete. In our previous work, we have explored quantum technologies as a promising alternative theoretical laboratory to simulate jet evolution in QCD matter, to overcome inherent technical difficulties in present calculations. Here, we extend our previous investigation from the single particle $|q\rangle$ to the $|q\rangle+|qg\rangle$ Fock space, taking into account gluon production. Based on the light-front Hamiltonian formalism, we construct a digital quantum circuit that tracks the evolution of a multi-particle jet probe in the presence of a medium described as a stochastic color field. Studying the momentum broadening of the jet state, we observe sizable sub-eikonal effects by comparing to eikonal estimates. 
We also study the medium-induced modifications to the gluon emission probability, which exhibit small corrections compared to the vacuum splitting function. In addition, we study the time evolution of the von-Neumann entropy associated with the quark component; we find that the exponential of the entropy grows linearly in time for the bare quark but super-linearly when taking into account gluon emission.	翻訳日:2023-07-06 16:29:46 公開日:2023-07-04
# 非接触指紋プレゼンテーション攻撃検出のための深層特徴:一般化できるか? Deep Features for Contactless Fingerprint Presentation Attack Detection: Can They Be Generalized? ( http://arxiv.org/abs/2307.01845v1 ) ライセンス: Link先を確認	Hailin Li and Raghavendra Ramachandra	(参考訳) 高度な高解像度カメラを備えたハイエンドスマートフォンの急速な進化により、より信頼性が高く検証に適した指紋バイオメトリクスを非接触で取得できるようになった。他の生体認証システムと同様に、非接触指紋認証システムはプレゼンテーション攻撃に対して脆弱である。本稿では,7種類の事前学習済み畳み込みニューラルネットワーク(CNN)とVision Transformer(ViT)について,プレゼンテーション攻撃を確実に検出するための汎化性能を比較検討する。 4種類のプレゼンテーション攻撃器具(PAI)を用いた,公開されているスマートフォンベースのプレゼンテーション攻撃データセット上で広範な実験を行った。 8種類の深層特徴手法の検出性能を,未知のPAIに対する汎化性能をベンチマークするためにleave-one-outプロトコルを用いて評価した。その結果,ResNet50 CNNが最高の汎化性能を示した。 The rapid evolution of high-end smartphones with advanced high-resolution cameras has resulted in contactless capture of fingerprint biometrics that are more reliable and suitable for verification. Similar to other biometric systems, contactless fingerprint-verification systems are vulnerable to presentation attacks. In this paper, we present a comparative study on the generalizability of seven different pre-trained Convolutional Neural Networks (CNN) and a Vision Transformer (ViT) to reliably detect presentation attacks. Extensive experiments were carried out on publicly available smartphone-based presentation attack datasets using four different Presentation Attack Instruments (PAI). The detection performance of the eight deep feature techniques was evaluated using the leave-one-out protocol to benchmark the generalization performance for unseen PAI. The obtained results indicated the best generalization performance with the ResNet50 CNN.	翻訳日:2023-07-06 16:21:29 公開日:2023-07-04
# 3次元顔における創傷充満の促進:自動分割と創傷顔面再生アプローチ Advancing Wound Filling Extraction on 3D Faces: A Auto-Segmentation and Wound Face Regeneration Approach ( http://arxiv.org/abs/2307.01844v1 ) ライセンス: Link先を確認	Duong Q. Nguyen and Thinh D. Le and Phuong D. Nguyen and H. Nguyen-Xuan	(参考訳) 顔面創傷の分節は, 術前計画および各種医療応用における患者予後の最適化において重要な役割を担っている。本稿では,2ストリームグラフ畳み込みネットワークを用いた3次元顔面創傷セグメンテーションの効率的な自動化手法を提案する。提案手法は,Cir3D-FaIRデータセットを活用し,異なる損失関数を用いた広範囲な実験を通じてデータ不均衡の課題に対処する。精度の高いセグメンテーションを実現するために,徹底的な実験を行い,訓練したモデルから高性能モデルを選択した。選択したモデルは複雑な3次元顔面外傷に対して例外的なセグメンテーション性能を示す。さらに, このセグメンテーションモデルに基づいて, 3次元顔の創傷充填体を抽出し, 前報と比較する手法を提案する。提案手法は, テストスイート上で0.9999986\%の精度を達成し, 先行手法の性能を上回った。この結果から,3Dプリンティング技術を用いて創傷充填形状を図示する。本研究の結果は,術前計画と介入設計に関わる医師に有意な影響を及ぼす。顔の創傷断面積の自動化と創傷充満抽出の精度の向上により, 介入を慎重に評価し, 最適化し, 患者の治療効果を高めることができる。さらに、皮膚組織インプラントの印刷に機械学習と3dバイオプリンティングを活用し、顔面再建の進歩に寄与する。ソースコードは \url{https://github.com/SIMOGroup/WoundFilling3D} で公開されています。 Facial wound segmentation plays a crucial role in preoperative planning and optimizing patient outcomes in various medical applications. In this paper, we propose an efficient approach for automating 3D facial wound segmentation using a two-stream graph convolutional network. Our method leverages the Cir3D-FaIR dataset and addresses the challenge of data imbalance through extensive experimentation with different loss functions. To achieve accurate segmentation, we conducted thorough experiments and selected a high-performing model from the trained models. The selected model demonstrates exceptional segmentation performance for complex 3D facial wounds. Furthermore, based on the segmentation model, we propose an improved approach for extracting 3D facial wound fillers and compare it to the results of the previous study. Our method achieved a remarkable accuracy of 0.9999986\% on the test suite, surpassing the performance of the previous method. From this result, we use 3D printing technology to illustrate the shape of the wound filling. 
The outcomes of this study have significant implications for physicians involved in preoperative planning and intervention design. By automating facial wound segmentation and improving the accuracy of wound-filling extraction, our approach can assist in carefully assessing and optimizing interventions, leading to enhanced patient outcomes. Additionally, it contributes to advancing facial reconstruction techniques by utilizing machine learning and 3D bioprinting for printing skin tissue implants. Our source code is available at https://github.com/SIMOGroup/WoundFilling3D.	翻訳日:2023-07-06 16:21:16 公開日:2023-07-04
# ATOM:量子コンピューティングにおけるマイナー埋め込みのための効率的なトポロジ適応アルゴリズム ATOM: An Efficient Topology Adaptive Algorithm for Minor Embedding in Quantum Computing ( http://arxiv.org/abs/2307.01843v1 ) ライセンス: Link先を確認	Hoang M. Ngo, Tamer Kahveci, My T. Thai	(参考訳) 量子アニーリング(quantum annealing, QA)は、量子物理学の利点を生かして最適化問題を解く強力な手法である。 QAプロセスにおいてスケールアップを妨げうるボトルネックは、論理グラフと呼ばれるグラフで表される最適化問題を、ハードウェアグラフと呼ばれる別のグラフで表される量子コンピュータの量子処理ユニット(QPU)トポロジに埋め込むマイナー埋め込みのステップである。既存のマイナー埋め込み手法は、大規模なグラフの埋め込みにかなりの実行時間を要する。この問題を克服するため,本稿では,ハードウェアグラフの拡張可能な部分グラフである適応トポロジーという新たな概念を導入する。これに基づき,Adaptive TOpology eMbedding (ATOM) というマイナー埋め込みアルゴリズムを開発した。 ATOMは論理グラフからノードを反復的に選択し、ハードウェアグラフの適応トポロジーに埋め込む。実験の結果、ATOMは、得られる埋め込みの品質を損なうことなく、最先端手法よりもはるかに短い実行時間で実行可能な埋め込みを提供できることがわかった。 Quantum annealing (QA) has emerged as a powerful technique to solve optimization problems by taking advantage of quantum physics. In the QA process, a bottleneck that may prevent QA from scaling up is the minor embedding step, in which we embed optimization problems represented by a graph, called the logical graph, into the Quantum Processing Unit (QPU) topology of quantum computers, represented by another graph, called the hardware graph. Existing methods for minor embedding require a significant amount of running time for large-scale graph embedding. To overcome this problem, in this paper, we introduce a novel notion of adaptive topology, which is an expandable subgraph of the hardware graph. From that, we develop a minor embedding algorithm, namely Adaptive TOpology eMbedding (ATOM). ATOM iteratively selects a node from the logical graph, and embeds it into the adaptive topology of the hardware graph. Our experimental results show that ATOM is able to provide a feasible embedding in much smaller running time than that of the state-of-the-art without compromising the quality of the resulting embedding.	翻訳日:2023-07-06 16:20:52 公開日:2023-07-04
# グローバルクエンチ後の三部情報の普遍性:スピンフリップと半局所電荷 Universality in the tripartite information after global quenches: spin flip and semilocal charges ( http://arxiv.org/abs/2307.01842v1 ) ライセンス: Link先を確認	Vanja Marić	(参考訳) 我々は、半局所的な保存演算子を持つ局所ハミルトニアンの下で時間発展するグローバルクエンチの後に現れる定常状態を研究する。特に、量子XY鎖に双対なモデルを研究する。初期状態における局所的な摂動が、定常状態における空間相関の指数関数的減衰を代数的減衰に変えうることを示す。隣接する3つのサブシステムの三部情報に着目し, (Rényi-$\alpha$) エンタングルメントエントロピーの振る舞いへの影響を調べる。大きなサブシステムの極限において、相関が代数的に減衰する定常状態では三部情報が交差比に普遍的に依存する非零値を示す一方、相関が指数関数的に減衰する定常状態では消失することを示す。 We study stationary states emerging after global quenches in which the time evolution is under local Hamiltonians that possess semilocal conserved operators. In particular, we study a model that is dual to the quantum XY chain. We show that a localized perturbation in the initial state can turn an exponential decay of spatial correlations in the stationary state into an algebraic decay. We investigate the consequences on the behavior of the (Rényi-$\alpha$) entanglement entropies, focusing on the tripartite information of three adjacent subsystems. In the limit of large subsystems, we show that in the stationary state with the algebraic decay of correlations the tripartite information exhibits a non-zero value with a universal dependency on the cross ratio, while it vanishes in the stationary state with the exponential decay of correlations.	翻訳日:2023-07-06 16:20:34 公開日:2023-07-04
# ニューラルネットワーク混合状態再構成の実証的サンプル複雑性 Empirical Sample Complexity of Neural Network Mixed State Reconstruction ( http://arxiv.org/abs/2307.01840v1 ) ライセンス: Link先を確認	Haimeng Zhao and Giuseppe Carleo and Filippo Vicentini	(参考訳) 神経量子状態を用いた量子状態再構成は、実用的な応用において量子ショットの複雑さを減らすための有効なツールとして提案されており、特にノイズレスの場合に焦点を当てた数値実験でその利点が示されている。本研究では,混合状態に対する異なる量子状態再構成手法(有限温度イジングモデル)の性能を数値的に検討する。本稿では,分散低減手法を応用し,アルゴリズムの量子資源要件を体系的に低減する方法を示す。次に、状態の2つの主要なニューラルネットワーク量子状態、すなわち、神経密度演算子と正の演算子値測定表現を比較し、対象状態の混合度が異なるため、それらの性能を示す。我々は、ある種のエンコーディングは異なる混合状態においてより効率的であり、古典的資源と量子的資源の両方の観点からより効率的なエンコーディングを設計する必要性を指摘する。 Quantum state reconstruction using Neural Quantum States has been proposed as a viable tool to reduce quantum shot complexity in practical applications, and its advantage over competing techniques has been shown in numerical experiments focusing mainly on the noiseless case. In this work, we numerically investigate the performance of different quantum state reconstruction techniques for mixed states: the finite-temperature Ising model. We show how to systematically reduce the quantum resource requirement of the algorithms by applying variance reduction techniques. Then, we compare the two leading neural quantum state encodings of the state, namely, the Neural Density Operator and the positive operator-valued measurement representation, and illustrate their different performance as the mixedness of the target state varies. We find that certain encodings are more efficient in different regimes of mixedness and point out the need for designing more efficient encodings in terms of both classical and quantum resources.	翻訳日:2023-07-06 16:20:18 公開日:2023-07-04
# EdgeFace:エッジデバイスのための効率的な顔認識モデル EdgeFace: Efficient Face Recognition Model for Edge Devices ( http://arxiv.org/abs/2307.01838v1 ) ライセンス: Link先を確認	Anjith George and Christophe Ecabert and Hatef Otroshi Shahreza and Ketan Kotwal and Sebastien Marcel	(参考訳) 本稿では,EdgeNeXtのハイブリッドアーキテクチャに着想を得た,軽量かつ効率的な顔認識ネットワークEdgeFaceを提案する。 CNNとTransformerモデルの長所と低ランク線形層を効果的に組み合わせることで、EdgeFaceはエッジデバイスに最適化された優れた顔認識性能を実現する。提案したEdgeFaceネットワークは、低い計算コストとコンパクトなストレージを維持するだけでなく、高い顔認識精度も達成し、エッジデバイスへのデプロイに適している。挑戦的なベンチマーク顔データセットに関する広範囲な実験は、最先端の軽量モデルや深層顔認識モデルと比較して、EdgeFaceの有効性と効率を示す。 1.77Mパラメータを持つEdgeFaceモデルはLFW(99.73%)、IJB-B(92.67%)、IJB-C(94.85%)で最先端の結果を達成し、より計算量の多い他の効率的なモデルを上回る。実験を再現するコードは公開される予定である。 In this paper, we present EdgeFace, a lightweight and efficient face recognition network inspired by the hybrid architecture of EdgeNeXt. By effectively combining the strengths of both CNN and Transformer models, and a low rank linear layer, EdgeFace achieves excellent face recognition performance optimized for edge devices. The proposed EdgeFace network not only maintains low computational costs and compact storage, but also achieves high face recognition accuracy, making it suitable for deployment on edge devices. Extensive experiments on challenging benchmark face datasets demonstrate the effectiveness and efficiency of EdgeFace in comparison to state-of-the-art lightweight models and deep face recognition models. Our EdgeFace model with 1.77M parameters achieves state of the art results on LFW (99.73%), IJB-B (92.67%), and IJB-C (94.85%), outperforming other efficient models with larger computational complexities. The code to replicate the experiments will be made available publicly.	翻訳日:2023-07-06 16:20:03 公開日:2023-07-04
# 四元数フーリエ変換の行列形式と四元数畳み込みについて On the Matrix Form of the Quaternion Fourier Transform and Quaternion Convolution ( http://arxiv.org/abs/2307.01836v1 ) ライセンス: Link先を確認	Giorgos Sfikas and George Retsinas	(参考訳) フーリエ変換および畳み込み演算の四元数版の行列形式について検討する。四元数は強力な表現単位を提供するが、その利用には困難が伴い、それは主に四元数乗算の非可換性と、$\mu^2 = -1$ が四元数領域で無限個の解を持つことに起因する。そのため、四元数行列の扱いはいくつかの面で複雑になる(固有構造の定義、行列式など)。本研究では, 四元数フーリエ変換行列と標準(複素)離散フーリエ変換行列との関係と, 複素領域でよく知られた定理がどの程度四元数に拡張されるかを明らかにする。特に四元数フーリエ変換行列と四元数巡回行列(四元数畳み込みを表す)との関係、および後者の固有構造に注目する。理論結果を直接利用した概念実証の応用として,四元数畳み込みのスペクトルノルムを上から抑える手法を提案する。 We study matrix forms of quaternionic versions of the Fourier Transform and Convolution operations. Quaternions offer a powerful representation unit; however, they are related to difficulties in their use that stem foremost from the non-commutativity of quaternion multiplication, and from the fact that $\mu^2 = -1$ possesses infinitely many solutions in the quaternion domain. Handling of quaternionic matrices is consequently complicated in several aspects (definition of eigenstructure, determinant, etc.). Our research findings clarify the relation of the Quaternion Fourier Transform matrix to the standard (complex) Discrete Fourier Transform matrix, and the extent to which well-known complex-domain theorems extend to quaternions. We focus especially on the relation of Quaternion Fourier Transform matrices to Quaternion Circulant matrices (representing quaternionic convolution), and the eigenstructure of the latter. A proof-of-concept application that makes direct use of our theoretical results is presented, where we produce a method to bound the spectral norm of a Quaternionic Convolution.	翻訳日:2023-07-06 16:19:44 公開日:2023-07-04
# 現実的なパラメトリック下方変換源を用いたエンタングルメントベースQKDの安全性 Security of entanglement-based QKD with realistic parametric down-conversion sources ( http://arxiv.org/abs/2307.01834v1 ) ライセンス: Link先を確認	K. S. Kravtsov	(参考訳) 本稿では,実用的なエンタングルメントベースの量子鍵配送(QKD),すなわちBBM92(エンタングルメントベースのBB84)プロトコルのセキュリティ面を分析する。 prepare-and-measure型のQKDプロトコルと同様に、エンタングルメントベースQKDの実用的な実装は、非理想的な光子源に頼らざるを得ない。エンタングルメント生成の典型的な解は自発的パラメトリック下方変換である。しかし、この過程は単一の光子対だけでなく、2つより多くの光子を持つ量子状態も生成し、セキュリティの劣化につながる可能性がある。我々は、この効果がエンタングルメントベースQKDシステムのセキュリティを損なわないことを示す。また、利用可能なセキュリティ証明をレビューし、エンタングルメント源の特性がセキュリティ劣化とは無関係であることを示す。 The paper analyzes security aspects of practical entanglement-based quantum key distribution (QKD), namely, BBM92 or entanglement-based BB84 protocol. Similar to prepare-and-measure QKD protocols, practical implementations of the entanglement-based QKD have to rely upon non-ideal photon sources. A typical solution for entanglement generation is the spontaneous parametric down-conversion. However, this process creates not only single photon pairs, but also quantum states with more than two photons, which potentially may lead to security deterioration. We show that this effect does not impair the security of entanglement-based QKD systems. We also review the available security proofs and show that properties of the entanglement source have nothing to do with security degradation.	翻訳日:2023-07-06 16:19:19 公開日:2023-07-04
# DiT-3D: 3次元形状生成のためのプレーンな拡散Transformerの検討 DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation ( http://arxiv.org/abs/2307.01831v1 ) ライセンス: Link先を確認	Shentong Mo, Enze Xie, Ruihang Chu, Lewei Yao, Lanqing Hong, Matthias Nießner, Zhenguo Li	(参考訳) 最近の拡散Transformer(例えば、DiT)は、高品質な2D画像の生成において強力な効果を示している。しかし、従来の3次元拡散法は主にU-Netアーキテクチャを採用しているため、Transformerアーキテクチャが3次元形状生成において同等に機能するかどうかはまだ定かではない。このギャップを埋めるために、プレーンなTransformerを用いてボクセル化点群上でデノイジング過程を直接実行できる、新しい3次元形状生成用拡散Transformer、DiT-3Dを提案する。既存のU-Netアプローチと比較して、私たちのDiT-3Dはモデルサイズがよりスケーラブルで、より高品質な生成結果を生み出す。具体的には、DiT-3DはDiTの設計哲学を採用するが、3Dの位置埋め込みとパッチ埋め込みを組み込んで、ボクセル化点群からの入力を適応的に集約するように変更する。 3次元形状生成における自己注意の計算コストを低減するため、3次元ウィンドウアテンションをTransformerブロックに組み込む。最後に、線形層とデボクセル化層を用いてデノイズされた点群を予測する。また、2Dから3Dへの効率的な微調整もサポートしており、ImageNetで事前学習されたDiT-2DチェックポイントはShapeNet上のDiT-3Dを大幅に改善することができる。 ShapeNetデータセットの実験結果から、提案したDiT-3Dは、高忠実で多様な3D点群生成において最先端の性能を達成することが示された。特に、我々のDiT-3Dは、Chamfer距離で評価した場合、最先端手法の1-最近傍精度を4.59低下させ、カバレッジ指標を3.51向上させる。 Recent Diffusion Transformers (e.g., DiT) have demonstrated their powerful effectiveness in generating high-quality 2D images. However, it is still being determined whether the Transformer architecture performs equally well in 3D shape generation, as previous 3D diffusion methods mostly adopted the U-Net architecture. To bridge this gap, we propose a novel Diffusion Transformer for 3D shape generation, namely DiT-3D, which can directly operate the denoising process on voxelized point clouds using plain Transformers. Compared to existing U-Net approaches, our DiT-3D is more scalable in model size and produces much higher quality generations. Specifically, the DiT-3D adopts the design philosophy of DiT but modifies it by incorporating 3D positional and patch embeddings to adaptively aggregate input from voxelized point clouds. 
To reduce the computational cost of self-attention in 3D shape generation, we incorporate 3D window attention into Transformer blocks, as the increased 3D token length resulting from the additional dimension of voxels can lead to high computation. Finally, linear and devoxelization layers are used to predict the denoised point clouds. In addition, our transformer architecture supports efficient fine-tuning from 2D to 3D, where the pre-trained DiT-2D checkpoint on ImageNet can significantly improve DiT-3D on ShapeNet. Experimental results on the ShapeNet dataset demonstrate that the proposed DiT-3D achieves state-of-the-art performance in high-fidelity and diverse 3D point cloud generation. In particular, our DiT-3D decreases the 1-Nearest Neighbor Accuracy of the state-of-the-art method by 4.59 and increases the Coverage metric by 3.51 when evaluated on Chamfer Distance.	翻訳日:2023-07-06 16:19:07 公開日:2023-07-04
# データ再構築のデコンストラクション:マルチクラス、重み減衰、一般的な損失 Deconstructing Data Reconstruction: Multiclass, Weight Decay and General Losses ( http://arxiv.org/abs/2307.01827v1 ) ライセンス: Link先を確認	Gon Buzaglo, Niv Haim, Gilad Yehudai, Gal Vardi, Yakir Oz, Yaniv Nikankin and Michal Irani	(参考訳) トレーニングデータの記憶は活発な研究分野であるが、ニューラルネットワークの内部動作に関する我々の理解はまだ初期段階にある。近年、Haimら (2022) は多層パーセプトロンの二値分類器からトレーニングサンプルを再構成する手法を提案し、トレーニングサンプルの大部分がそのようなネットワークのパラメータにエンコードされていることを効果的に示した。本研究では、マルチクラスニューラルネットワークや畳み込みニューラルネットワークからの再構成など、その知見をいくつかの方向に拡張する。回帰損失のようなより広い範囲の損失関数に適用可能な、より一般的な再構成手法を導出する。さらに、ネットワークがそのような再構成手法の影響を受けやすくなる様々な要因について検討する。興味深いことに、トレーニング中に重み減衰を使用すると、量と質の両面で再構成可能性が高まる。加えて、トレーニングサンプル数に対するニューロン数の比が再構成可能性に与える影響を検討する。 Memorization of training data is an active research area, yet our understanding of the inner workings of neural networks is still in its infancy. Recently, Haim et al. (2022) proposed a scheme to reconstruct training samples from multilayer perceptron binary classifiers, effectively demonstrating that a large portion of training samples are encoded in the parameters of such networks. In this work, we extend their findings in several directions, including reconstruction from multiclass and convolutional neural networks. We derive a more general reconstruction scheme which is applicable to a wider range of loss functions such as regression losses. Moreover, we study the various factors that contribute to networks' susceptibility to such reconstruction schemes. Intriguingly, we observe that using weight decay during training increases reconstructability both in terms of quantity and quality. Additionally, we examine the influence of the number of neurons relative to the number of training samples on the reconstructability.	翻訳日:2023-07-06 16:18:35 公開日:2023-07-04
# 意味的役割ラベリングにおける非言語的述語探索:課題と機会 Exploring Non-Verbal Predicates in Semantic Role Labeling: Challenges and Opportunities ( http://arxiv.org/abs/2307.01870v1 ) ライセンス: Link先を確認	Riccardo Orlando and Simone Conia and Roberto Navigli	(参考訳) セマンティック・ロール・ラベルリング (SRL) では顕著な進歩が見られたが、ほとんどの研究は、述語の大半が動詞であると仮定して行われている。逆に、述語は名詞や形容詞などの他の部分を用いて表現することもできる。しかしながら、非言語述語は、SRLの進捗を実際の設定(新聞の見出し、対話、ツイートなど)よりも少ない頻度で測定するために一般的に使用しているベンチマークに現れます。本稿では,複数の述語型をカバーする新しいpropbankデータセットを提案する。これにより、標準ベンチマークは、SRLの現在の状況の正確な画像を提供しておらず、最先端システムは、異なる述語型間で知識を伝達できないことを実証的に実証する。これらの問題を観察し、言語、名目、形容詞の述語構造に等しく重要性を与えるように設計された、手書きの課題セットも提示する。このようなデータセットを使用して,異なる言語資源を活用して知識伝達を促進することができるか検討する。結論として、SRLは「解決」には程遠いものであり、他の意味的タスクと統合することで、特に非言語述語の長い尾において、将来重要な改善が可能となり、非言語述語のSRLに関するさらなる研究が促進される。 Although we have witnessed impressive progress in Semantic Role Labeling (SRL), most of the research in the area is carried out assuming that the majority of predicates are verbs. Conversely, predicates can also be expressed using other parts of speech, e.g., nouns and adjectives. However, non-verbal predicates appear in the benchmarks we commonly use to measure progress in SRL less frequently than in some real-world settings -- newspaper headlines, dialogues, and tweets, among others. In this paper, we put forward a new PropBank dataset which boasts wide coverage of multiple predicate types. Thanks to it, we demonstrate empirically that standard benchmarks do not provide an accurate picture of the current situation in SRL and that state-of-the-art systems are still incapable of transferring knowledge across different predicate types. Having observed these issues, we also present a novel, manually-annotated challenge set designed to give equal importance to verbal, nominal, and adjectival predicate-argument structures. We use such dataset to investigate whether we can leverage different linguistic resources to promote knowledge transfer. 
In conclusion, we claim that SRL is far from "solved", and its integration with other semantic tasks might enable significant improvements in the future, especially for the long tail of non-verbal predicates, thereby facilitating further research on SRL for non-verbal predicates.	翻訳日:2023-07-06 16:12:45 公開日:2023-07-04
# MaskBEV:鳥眼視3D点雲のオブジェクト検出とフットプリント完了 MaskBEV: Joint Object Detection and Footprint Completion for Bird's-eye View 3D Point Clouds ( http://arxiv.org/abs/2307.01864v1 ) ライセンス: Link先を確認	William Guimont-Martin, Jean-Michel Fortin, François Pomerleau, Philippe Giguère	(参考訳) ライダーポイントクラウドにおける最近のオブジェクト検出の研究は、主にオブジェクト周辺の境界ボックスの予測に焦点を当てている。この予測は通常、アンカーベースまたはアンカーフリーの検出器を使って境界ボックスを予測し、オブジェクトが適切に動作するための明確な事前知識を必要とする。これらの制約を緩和するために,鳥眼ビュー (BEV) を用いた物体検出ニューラルネットワークであるMaskBEVを提案する。 MaskBEVは検出されたオブジェクトのフットプリントを表す一連のBEVインスタンスマスクを予測する。さらに,1回のパスで物体検出と足跡完了を可能にする。 MaskBEVはまた、検出問題を分類の観点から純粋に再構成し、通常はリグレッションによって境界ボックスを予測する。本研究では,SemanticKITTIとKITTIの両方のデータセット上でのMaskBEVの性能評価を行い,アーキテクチャの利点と限界を分析した。 Recent works in object detection in LiDAR point clouds mostly focus on predicting bounding boxes around objects. This prediction is commonly achieved using anchor-based or anchor-free detectors that predict bounding boxes, requiring significant explicit prior knowledge about the objects to work properly. To remedy these limitations, we propose MaskBEV, a bird's-eye view (BEV) mask-based object detector neural architecture. MaskBEV predicts a set of BEV instance masks that represent the footprints of detected objects. Moreover, our approach allows object detection and footprint completion in a single pass. MaskBEV also reformulates the detection problem purely in terms of classification, doing away with regression usually done to predict bounding boxes. We evaluate the performance of MaskBEV on both SemanticKITTI and KITTI datasets while analyzing the architecture advantages and limitations.	翻訳日:2023-07-06 16:12:05 公開日:2023-07-04
# マルチエージェント強化学習による創発的リソース交換と許容された盗み行動 Emergent Resource Exchange and Tolerated Theft Behavior using Multi-Agent Reinforcement Learning ( http://arxiv.org/abs/2307.01862v1 ) ライセンス: Link先を確認	Jack Garbus, Jordan Pollack	(参考訳) 何十年もの間、協調の進化はゲーム理論、経済学、生物学、コンピュータ科学といった多くの学術分野の関心を惹きつけてきた。本研究では、採餌環境において資源を落とし、拾い上げることによって形成される、新規で効果的な資源交換プロトコルの出現を実証する。この形態の協力はキャンプファイヤーの導入によって可能となる。キャンプファイヤーによってエージェントが集まって待機する時間が長くなり、本来は起こりにくい相互作用を探索できるようになる。エージェントは交換相手に騙されるのを避けることを学ぶが、第三者に対しては必ずしもそうではない。また、環境に処罰、戦闘、窃盗のメカニズムが存在しないにもかかわらず、許容された盗み(tolerated theft)に類似した行動の出現も観察した。 For decades, the evolution of cooperation has piqued the interest of numerous academic disciplines such as game theory, economics, biology, and computer science. In this work, we demonstrate the emergence of a novel and effective resource exchange protocol formed by dropping and picking up resources in a foraging environment. This form of cooperation is made possible by the introduction of a campfire, which adds an extended period of congregation and downtime for agents to explore otherwise unlikely interactions. We find that the agents learn to avoid getting cheated by their exchange partners, but not always from a third party. We also observe the emergence of behavior analogous to tolerated theft, despite the lack of any punishment, combat, or larceny mechanism in the environment.	翻訳日:2023-07-06 16:11:51 公開日:2023-07-04
# 弱アダマール行列と弱アダマール対角化グラフ Weak Hadamard matrices and Weakly Hadamard diagonalizable graphs ( http://arxiv.org/abs/2307.01859v1 ) ライセンス: Link先を確認	Darian McLaren, Hermie Monterde, and Sarah Plosker	(参考訳) 弱いアダマール行列は$\{-1,0, 1\}$-matrix $p$ であり、$pp^t$ は三対角である。弱アダマール行列と弱アダマール対角化グラフ(ラプラシア行列が弱アダマール行列で対角化されるグラフ)の基底となる代数的構造と組合せ的構造について検討する。このような行列やグラフの構成や例も提供します。次に、そのようなグラフに関して量子状態転移を考える。 A weak Hadamard matrix is a $\{-1,0, 1\}$-matrix $P$ such that $PP^T$ is tridiagonal. We explore the underlying algebraic and combinatorial structure of weak Hadamard matrices and weakly Hadamard diagonalizable graphs (graphs whose Laplacian matrix is diagonalized by a weak Hadamard matrix). We also provide constructions and examples of such matrices and graphs. We then consider quantum state transfer with respect to such graphs.	翻訳日:2023-07-06 16:11:38 公開日:2023-07-04
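The definition quoted in the abstract above is easy to test on small matrices. The sketch below checks only the two conditions stated there (entries in $\{-1, 0, 1\}$ and $PP^T$ tridiagonal), under the added assumption that $P$ is square; the paper may impose further requirements, and the function names are hypothetical.

```python
def gram(P):
    """Return P * P^T for a square matrix P given as a list of rows."""
    n = len(P)
    return [
        [sum(a * b for a, b in zip(P[i], P[j])) for j in range(n)]
        for i in range(n)
    ]

def is_tridiagonal(M):
    """True if every entry more than one step off the diagonal is zero."""
    n = len(M)
    return all(
        M[i][j] == 0
        for i in range(n)
        for j in range(n)
        if abs(i - j) > 1
    )

def is_weak_hadamard(P):
    """Check the abstract's conditions: {-1,0,1} entries, P P^T tridiagonal."""
    n = len(P)
    is_square = all(len(row) == n for row in P)
    entries_ok = all(x in (-1, 0, 1) for row in P for x in row)
    return is_square and entries_ok and is_tridiagonal(gram(P))

# An ordinary Hadamard matrix: P P^T = 2I, which is diagonal and hence tridiagonal.
hadamard_2 = [[1, 1], [1, -1]]

# Rows 1 and 3 are orthogonal, so the (1, 3) entry of P P^T vanishes.
passes = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]

# Rows 1 and 3 overlap in the first column, so P P^T is not tridiagonal.
fails = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
```

Only pairs of rows at distance two or more need to be orthogonal, which is what makes the condition weaker than the usual Hadamard requirement that all distinct rows be orthogonal.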
# 時間変調結合共振子系に基づく超伝導非相反性 Superconducting Non-Reciprocity Based on Time-Modulated Coupled-Resonator Systems ( http://arxiv.org/abs/2307.01853v1 ) ライセンス: Link先を確認	Yi Zhuang, Chandrashekhar Gaikwad, Daria Kowsari, Kater Murch, and Aravind Nagulu	(参考訳) 本稿では、時間変調結合共振器ネットワークに基づいて、循環器、アイソレータ、一方向増幅器を含む多種多様な超伝導非相反成分を設計するための統一的アプローチを提案する。本手法は,SQUIDベースの標準共振器をビルディングブロックとして利用し,直列結合,ワイ接続,格子結合共振器などの様々な構成で配置し,幅広いオンチップ非相互デバイスを実現する。提案手法の有効性を実証し,20db以上の挿入損失とアイソレーションをほぼゼロとした循環器およびアイソレータと,10dbを超える前方利得を有する方向増幅器と20db以上の逆アイソレータを実現した。本研究は, 単層超伝導プロセスを用いた直列結合型3共振器超電導アイソレータの実装と評価を行った。 20mKのベース温度では, 前方方向の挿入損失が1.3dB, 中央周波数で25dB, 逆方向の帯域幅250MHzで15dB以上であった。本手法は超伝導回路の高性能非相反デバイスの設計を可能にすることを約束する。 We present a unified approach for designing a diverse range of superconducting non-reciprocal components, including circulators, isolators, and uni-directional amplifiers, based on temporally-modulated coupled resonator networks. Our method leverages standard SQUID-based resonators as building blocks, arranged in various configurations such as series-coupled, wye-connected, and lattice-coupled resonators, to realize a wide range of on-chip non-reciprocal devices. Our theoretical studies demonstrated the effectiveness of the proposed approach, achieving circulators and isolators with near-zero insertion losses and isolation greater than 20 dB, and directional amplifiers with forward gain exceeding 10 dB and reverse isolation greater than 20 dB. To validate our findings, we implemented and measured a series-coupled three-resonator superconducting isolator using a single-layer superconducting process. At a base temperature of 20 mK, our device exhibited insertion loss of 1.3 dB in the forward direction, and isolation of up to 25 dB at the center frequency and greater than 15 dB across a bandwidth of 250 MHz in the reverse direction. Our approach promises to enable the design of a broad range of high-performance non-reciprocal devices for superconducting circuits.	翻訳日:2023-07-06 16:11:30 公開日:2023-07-04
# サブキラル対称性で保護された位相スピンテクスチャを持つ境界平坦バンド Boundary Flat Bands with Topological Spin Textures Protected by Sub-chiral Symmetry ( http://arxiv.org/abs/2307.01851v1 ) ライセンス: Link先を確認	Yijie Mo, Xiao-Jiao Wang, Rui Yu, Zhongbo Yan	(参考訳) キラル対称性は、トポロジカルな分類や、バルクあるいは境界の平坦バンドの起源の理解において欠かせない役割を果たす。従来のキラル対称性の定義は、ハミルトニアンと反可換な定数ユニタリ行列の存在を指す。定数ユニタリ行列は一定の固有ベクトルを持つため、キラル対称性によって強制される境界平坦バンドは、キラル対称性作用素と同じ固有ベクトルを共有し、固定された(擬)スピン偏極を持ち、量子幾何学的には特徴を持たないことが知られている。本研究では、キラル対称性を一般化し、サブキラル対称性という概念を導入する。定数として定義される従来のキラル対称性作用素とは異なり、サブキラル対称性作用素は運動量ベクトルの一部の成分に依存し、その固有ベクトルも同様である。キラル対称性を持たないがサブキラル対称性を持つ、ギャップのある、あるいはギャップレスの位相的系は、位相的スピンテクスチャと量子化されたベリー位相を示す境界平坦バンドをサポートできることを示す。このような興味深い境界平坦バンドは、相互作用や乱れの存在下で様々なエキゾチックな物理をもたらすと期待される。 Chiral symmetry plays an indispensable role in topological classifications as well as in the understanding of the origin of bulk or boundary flat bands. The conventional definition of chiral symmetry refers to the existence of a constant unitary matrix anticommuting with the Hamiltonian. As a constant unitary matrix has constant eigenvectors, boundary flat bands enforced by chiral symmetry, which share the same eigenvectors with the chiral symmetry operator, are known to carry fixed (pseudo)spin polarizations and be featureless in quantum geometry. In this work, we generalize the chiral symmetry and introduce a concept termed sub-chiral symmetry. Unlike the conventional chiral symmetry operator defined as constant, the sub-chiral symmetry operator depends on partial components of the momentum vector, and so do its eigenvectors. We show that topological gapped or gapless systems without the chiral symmetry but with the sub-chiral symmetry can support boundary flat bands, which exhibit topological spin textures and quantized Berry phases. We expect that such intriguing boundary flat bands could give rise to a variety of exotic physics in the presence of interactions or disorders.	翻訳日:2023-07-06 16:11:08 公開日:2023-07-04
# 自己消費型生成モデルがMADに Self-Consuming Generative Models Go MAD ( http://arxiv.org/abs/2307.01850v1 ) ライセンス: Link先を確認	Sina Alemohammad, Josue Casco-Rodriguez, Lorenzo Luzi, Ahmed Imtiaz Humayun, Hossein Babaei, Daniel LeJeune, Ali Siahkoohi, Richard G. Baraniuk	(参考訳) 画像、テキスト、その他のデータ型に対する生成AIアルゴリズムの飛躍的な進歩は、次世代モデルのトレーニングに合成データを使用する誘惑につながった。このプロセスを繰り返すと、性質がよく理解されていない自己消費(autophagous)ループが生成される。本研究では、最先端の画像生成モデルを用いて、3種類の自己消費ループの解析的および経験的分析を行う。これらのループは、トレーニングの世代を通して固定された、あるいは新鮮な実トレーニングデータがどのように利用可能か、また前世代のモデルからのサンプルがデータの品質と多様性のトレードオフのために偏らされているかどうかという点で異なる。あらゆるシナリオに共通する主な結論は、自己消費ループの各世代に十分な新鮮な実データがない場合、将来の生成モデルは、その品質(精度)または多様性(リコール)が徐々に低下する運命にあるということである。我々は、この状態をモデルオートファジー障害(MAD)と呼び、狂牛病になぞらえる。 Seismic advances in generative AI algorithms for imagery, text, and other data types have led to the temptation to use synthetic data to train next-generation models. Repeating this process creates an autophagous (self-consuming) loop whose properties are poorly understood. We conduct a thorough analytical and empirical analysis using state-of-the-art generative image models of three families of autophagous loops that differ in how fixed or fresh real training data is available through the generations of training and in whether the samples from previous generation models have been biased to trade off data quality versus diversity. Our primary conclusion across all scenarios is that without enough fresh real data in each generation of an autophagous loop, future generative models are doomed to have their quality (precision) or diversity (recall) progressively decrease. We term this condition Model Autophagy Disorder (MAD), making analogy to mad cow disease.	翻訳日:2023-07-06 16:10:49 公開日:2023-07-04
# クロスウェイ拡散:自己教師型学習による拡散に基づくビジュモータ政策の改善 Crossway Diffusion: Improving Diffusion-based Visuomotor Policy via Self-supervised Learning ( http://arxiv.org/abs/2307.01849v1 ) ライセンス: Link先を確認	Xiang Li, Varun Belagali, Jinghuan Shang, Michael S. Ryoo	(参考訳) シーケンスモデリングアプローチはロボット模倣学習において有望な結果を示している。近年,複雑なデータ分布のモデル化に特有な能力を有する拡散モデルが,行動のクローニングに採用されている。本研究では,自己教師付き学習(SSL)目標を用いて,拡散に基づくビジュモータポリシー学習を強化する手法であるクロスウェイ拡散を提案する。標準拡散に基づくポリシーは、視覚観測やその他の低次元状態に基づくランダムノイズから動作シーケンスを生成する。さらに、逆拡散過程の中間表現から生画像画素(および他の状態情報)を再構成する新しいデコーダを導入し、ssl損失を用いて共同でモデルを訓練することで、これをさらに拡張する。シミュレーションおよび実世界のロボットタスクにおけるクロスウェイ拡散の有効性を実証し,標準拡散法よりも優れていることを確認する。このような自己教師型再構築は,特に実演の習熟度が異なる場合において,政策学習の表現性を向上することを示す。 Sequence modeling approaches have shown promising results in robot imitation learning. Recently, diffusion models have been adopted for behavioral cloning, benefiting from their exceptional capabilities in modeling complex data distribution. In this work, we propose Crossway Diffusion, a method to enhance diffusion-based visuomotor policy learning by using an extra self-supervised learning (SSL) objective. The standard diffusion-based policy generates action sequences from random noise conditioned on visual observations and other low-dimensional states. We further extend this by introducing a new decoder that reconstructs raw image pixels (and other state information) from the intermediate representations of the reverse diffusion process, and train the model jointly using the SSL loss. Our experiments demonstrate the effectiveness of Crossway Diffusion in various simulated and real-world robot tasks, confirming its advantages over the standard diffusion-based policy. We demonstrate that such self-supervised reconstruction enables better representation for policy learning, especially when the demonstrations have different proficiencies.	翻訳日:2023-07-06 16:10:30 公開日:2023-07-04
# 大規模言語モデルを用いた身体性タスクプランニング Embodied Task Planning with Large Language Models ( http://arxiv.org/abs/2307.01848v1 ) ライセンス: Link先を確認	Zhenyu Wu, Ziwei Wang, Xiuwei Xu, Jiwen Lu, Haibin Yan	(参考訳) 身体性を持つ(embodied)エージェントに常識を備えさせることは、ロボットが一般的な環境で複雑な人間の指示を完了させるのに重要である。最近の大規模言語モデル(LLM)は、複雑なタスクの計画生成においてエージェントに豊富な意味知識を組み込むことができるが、現実世界に関する情報を欠いており、通常、実行不可能なアクションシーケンスを生成する。本稿では、物理的シーン制約のもとで接地された計画を行う、身体性タスク向けのTAsk Planing Agent (TaPA) を提案する。具体的には、まず屋内シーン、指示、アクションプランの三つ組を含むマルチモーダルデータセットを構築する。その際、設計したプロンプトとシーン内に存在するオブジェクトのリストをGPT-3.5に与え、多数の指示とそれに対応する計画されたアクションを生成する。生成されたデータは、事前訓練されたLLMの接地された計画チューニングに活用される。推論の際には、オープンボキャブラリオブジェクト検出器を、到達可能な様々な場所で収集された多視点RGB画像に拡張することにより、シーン内の物体を検出する。実験の結果、我々のTaPAフレームワークから生成したプランは、LLaVAやGPT-3.5よりも大きなマージンで高い成功率を達成できることがわかった。 Equipping embodied agents with commonsense is important for robots to successfully complete complex human instructions in general environments. Recent large language models (LLM) can embed rich semantic knowledge for agents in plan generation of complex tasks, while they lack the information about the realistic world and usually yield infeasible action sequences. In this paper, we propose a TAsk Planing Agent (TaPA) in embodied tasks for grounded planning with physical scene constraint, where the agent generates executable plans according to the existing objects in the scene by aligning LLMs with the visual perception models. Specifically, we first construct a multimodal dataset containing triplets of indoor scenes, instructions and action plans, where we provide the designed prompts and the list of existing objects in the scene for GPT-3.5 to generate a large number of instructions and corresponding planned actions. The generated data is leveraged for grounded plan tuning of pre-trained LLMs. During inference, we discover the objects in the scene by extending open-vocabulary object detectors to multi-view RGB images collected in different achievable locations. 
Experimental results show that the generated plan from our TaPA framework can achieve higher success rate than LLaVA and GPT-3.5 by a sizable margin, which indicates the practicality of embodied task planning in general and complex environments.	翻訳日:2023-07-06 16:10:13 公開日:2023-07-04
# Grad-FEC: コラボレーションインテリジェンスにおける深い特徴の不平等な損失保護 Grad-FEC: Unequal Loss Protection of Deep Features in Collaborative Intelligence ( http://arxiv.org/abs/2307.01846v1 ) ライセンス: Link先を確認	Korcan Uyanik, S. Faegheh Yeganli, Ivan V. Bajić	(参考訳) コラボレーションインテリジェンス(CI)では、人工知能(AI)モデルを、エッジデバイスにデプロイされるフロントエンドと、クラウドにデプロイされるバックエンドの2つの部分に分割する。フロントエンドによって生成された深い特徴テンソルは、通信チャネルを介してクラウドに送信され、パケットロスを受ける可能性がある。この問題に対処するために,Unequal Loss Protection (ULP) によるパケット損失の存在下でのCIシステムのレジリエンスを高める新しい手法を提案する。提案手法は,フロントエンドが生成する特徴パケットの重要度を推定し,重要なパケットを保護するために前方誤り訂正(FEC)符号を選択的に適用する特徴重要度推定器を含む。実験の結果,提案手法はパケット損失の場合にciシステムの信頼性とロバスト性を大幅に向上できることがわかった。 Collaborative intelligence (CI) involves dividing an artificial intelligence (AI) model into two parts: front-end, to be deployed on an edge device, and back-end, to be deployed in the cloud. The deep feature tensors produced by the front-end are transmitted to the cloud through a communication channel, which may be subject to packet loss. To address this issue, in this paper, we propose a novel approach to enhance the resilience of the CI system in the presence of packet loss through Unequal Loss Protection (ULP). The proposed ULP approach involves a feature importance estimator, which estimates the importance of feature packets produced by the front-end, and then selectively applies Forward Error Correction (FEC) codes to protect important packets. Experimental results demonstrate that the proposed approach can significantly improve the reliability and robustness of the CI system in the presence of packet loss.	翻訳日:2023-07-06 16:09:48 公開日:2023-07-04
# トポロジカル量子コンピューティングにおけるブレイド発生行列の体系計算 Systematic Computation of Braid Generator Matrix in Topological Quantum Computing ( http://arxiv.org/abs/2307.01892v1 ) ライセンス: Link先を確認	Abdellah Tounsi, Nacer Eddine Belaloui, Mohamed Messaoud Louamri, Amani Mimoun, Achour Benslama, Mohamed Taha Rouabah	(参考訳) 本稿では,トポロジカル量子計算(TQC)の基本編曲演算の体系的数値計算法を提案する。非可換アノンのブレイディングはtqcにおいて重要な技術であり、位相的に保護された量子ゲートの実装を提供する。しかし、特に多くのエノンや複雑な融合パターンを持つシステムでは、ブレイドジェネレータの行列表現を得ることは困難である。提案手法はこの課題に対処し,qubit あるいは qudit あたりの任意の数のエヌンを含むことができる。このアプローチは一般的なトポロジカル量子回路シミュレータの基本的な構成要素であり、TQCフレームワーク内の複雑な量子回路の探索と解析を容易にする。本手法を代数的条件を用いて実装・テストした。さらに,CNOTゲートの再生に成功して概念実証を行う。 We present a systematic numerical method to compute the elementary braiding operations for topological quantum computation (TQC). Braiding non-Abelian anyons is a crucial technique in TQC, offering a topologically protected implementation of quantum gates. However, obtaining matrix representations for braid generators can be challenging, especially for systems with numerous anyons or complex fusion patterns. Our proposed method addresses this challenge, allowing for the inclusion of an arbitrary number of anyons per qubit or qudit. This approach serves as a fundamental component in a general topological quantum circuit simulator, facilitating the exploration and analysis of intricate quantum circuits within the TQC framework. We have implemented and tested the method using algebraic conditions. Furthermore, we provide a proof of concept by successfully reproducing the CNOT gate.	翻訳日:2023-07-06 16:03:06 公開日:2023-07-04
# 機械学習技術は人道的作業や開発に使えるのだろうか? Are machine learning technologies ready to be used for humanitarian work and development? ( http://arxiv.org/abs/2307.01891v1 ) ライセンス: Link先を確認	Vedran Sekara, Márton Karsai, Esteban Moro, Dohyung Kim, Enrique Delamonica, Manuel Cebrian, Miguel Luengo-Oroz, Rebeca Moreno Jiménez, and Manuel Garcia-Herranz	(参考訳) 機械学習(ML)や人工知能(AI)といった新しいデジタルデータソースやツールは、開発に関するデータに革命をもたらす可能性があり、人道的な問題の監視と緩和に貢献しうる。人類の最も差し迫った問題の解決に新しい技術を適用する可能性は、国際開発を研究し、それに取り組んできた伝統的な分野の外でも関心を集めている。今日では、計算社会科学、ネットワークサイエンス、複雑系、ヒューマンコンピュータインタラクション、機械学習、そしてより広範なAI分野といった分野の科学コミュニティが、これらの差し迫った問題にますます注目し始めている。しかし、高度なデータ駆動ツールは、不完全なデータを伴う驚くほど複雑な現実世界の問題を解決するために使える段階にあるのだろうか? 我々は,現状を概説し,データ駆動技術が人道的および開発的文脈において有用になるためには,克服すべき障壁を特定する。組織的かつ目的的な努力がなければ、これらの新技術は、良くても約束された目標に届かず、最悪の場合は不平等を高め、差別を増幅し、人権を侵害する恐れがある、と我々は主張する。 Novel digital data sources and tools like machine learning (ML) and artificial intelligence (AI) have the potential to revolutionize data about development and can contribute to monitoring and mitigating humanitarian problems. The potential of applying novel technologies to solving some of humanity's most pressing issues has garnered interest outside the traditional disciplines studying and working on international development. Today, scientific communities in fields like Computational Social Science, Network Science, Complex Systems, Human Computer Interaction, Machine Learning, and the broader AI field are increasingly starting to pay attention to these pressing issues. However, are sophisticated data driven tools ready to be used for solving real-world problems with imperfect data and of staggering complexity? We outline the current state-of-the-art and identify barriers, which need to be surmounted in order for data-driven technologies to become useful in humanitarian and development contexts. 
We argue that, without organized and purposeful efforts, these new technologies risk at best falling short of promised goals, at worst they can increase inequality, amplify discrimination, and infringe upon human rights.	翻訳日:2023-07-06 16:02:51 公開日:2023-07-04
# 完全量子作業統計量に対する一般化線形応答理論 Generalised linear response theory for the full quantum work statistics ( http://arxiv.org/abs/2307.01885v1 ) ライセンス: Link先を確認	Giacomo Guarnieri, Jens Eisert, Harry J. D. Miller	(参考訳) 我々は、小さなハミルトン摂動を通して平衡から引き出された量子系を考える。線形応答理論のパラダイム的枠組みに基づいて、散逸した作業の完全な生成関数の式を導出する。驚くべきことに、分布に関する全ての情報は緩和関数として知られる単一のアクセス可能な量にエンコードできるため、複雑な量子系における非平衡揺らぎを研究するために現象論的モデルを使う新しい方法が開かれる。本研究は, 小型かつ任意に高速なプロトコルの規則に適用される作業統計に, 熱力学的制約が多数設けられており, 環境への低速運転や弱い結合といった仮定は不要である。最後に、我々のアプローチは、基礎となるゼロポイントエネルギーゆらぎに由来する仕事統計学において明確な量子署名を明らかにする。これにより、短い駆動時間における確率分布の分散が増大し、量子熱力学における非古典的効果を観測することができる。 We consider a quantum system driven out of equilibrium via a small Hamiltonian perturbation. Building on the paradigmatic framework of linear response theory, we derive an expression for the full generating function of the dissipated work. Remarkably, we find that all information about the distribution can be encoded in a single accessible quantity known as the relaxation function, thus opening up new ways to use phenomenological models to study non-equilibrium fluctuations in complex quantum systems. Our results establish a number of refined thermodynamic constraints on the work statistics that apply to regimes of small but arbitrarily fast protocols, and do not require assumptions such as slow driving or weak coupling to an environment. Finally, our approach uncovers a distinctly quantum signature in the work statistics that originates from underlying zero-point energy fluctuations. This causes an increased dispersion of the probability distribution at short driving times, a feature that can be probed in efforts to witness non-classical effects in quantum thermodynamics.	翻訳日:2023-07-06 16:02:30 公開日:2023-07-04
# ProPILE: 大規模言語モデルにおけるプライバシリークの調査 ProPILE: Probing Privacy Leakage in Large Language Models ( http://arxiv.org/abs/2307.01881v1 ) ライセンス: Link先を確認	Siwon Kim, Sangdoo Yun, Hwaran Lee, Martin Gubri, Sungroh Yoon, Seong Joon Oh	(参考訳) 大規模言語モデル(LLM)の急速な発展と普及は、個人識別情報(PII)の漏洩の可能性に関する重大な懸念を提起した。これらのモデルは、大量のWeb収集データに基づいてトレーニングされることが多く、そこには機密性の高い個人データが意図せず含まれている可能性がある。本稿では、データ主体、すなわちPIIの所有者が、LLMベースのサービスにおけるPII漏洩の可能性を認識できるよう支援する新しい探索ツールであるProPILEを提案する。 ProPILEは、データ主体が自身のPIIに基づいてプロンプトを定式化し、LLMにおけるプライバシー侵害のレベルを評価できるようにする。公開されているPileデータセットに基づいてトレーニングされたOPT-1.3Bモデルにその応用を実演する。仮想的なデータ主体が、Pileデータセットに含まれる自身のPIIが漏洩する可能性をどのように評価できるかを示す。 ProPILEはLLMサービスプロバイダによって、社内モデル用に特別に調整されたより強力なプロンプトで、自身のPIIリークレベルを効果的に評価するために利用することもできる。このツールは、Web上の自分のデータに対する認識とコントロールのために、データ主体に力を与えるための先駆的なステップである。 The rapid advancement and widespread use of large language models (LLMs) have raised significant concerns regarding the potential leakage of personally identifiable information (PII). These models are often trained on vast quantities of web-collected data, which may inadvertently include sensitive personal data. This paper presents ProPILE, a novel probing tool designed to empower data subjects, or the owners of the PII, with awareness of potential PII leakage in LLM-based services. ProPILE lets data subjects formulate prompts based on their own PII to evaluate the level of privacy intrusion in LLMs. We demonstrate its application on the OPT-1.3B model trained on the publicly available Pile dataset. We show how hypothetical data subjects may assess the likelihood of their PII being included in the Pile dataset being revealed. ProPILE can also be leveraged by LLM service providers to effectively evaluate their own levels of PII leakage with more powerful prompts specifically tuned for their in-house models. This tool represents a pioneering step towards empowering the data subjects for their awareness and control over their own data on the web.	翻訳日:2023-07-06 16:02:15 公開日:2023-07-04
# ワッサースタイン勾配流を有する粒子系距離GANの安定性解析フレームワーク Stability Analysis Framework for Particle-based Distance GANs with Wasserstein Gradient Flow ( http://arxiv.org/abs/2307.01879v1 ) ライセンス: Link先を確認	Chuqi Chen, Wu Yue, Yang Xiang	(参考訳) 本稿では, MMD GAN, Cramér GAN, EIEG GAN などの目的関数として, 粒子ベース距離と呼ばれる確率密度距離を用いた生成ネットワークの学習過程について検討する。しかし、これらのGANはしばしば不安定な訓練の問題に苦しむ。本稿では,これらのGANの学習過程の安定性を,確率密度力学の観点から解析する。本フレームワークでは,高次元データを特徴空間にマッピングする特徴変換写像として,識別器$D$を,ジェネレータ$G$は特徴空間の観点から実データに似たサンプルにランダム変数をマッピングする。この観点からは,確率密度関数のWasserstein勾配流を用いてGANトレーニングの安定性解析を行うことができる。 GANの$\min_G \max_D E(G, D)$の定式化により、判別器のトレーニングプロセスは通常不安定である。この問題に対処するため、判別器損失関数に安定化項を追加する。安定解析と安定化法を検証する実験を行った。 In this paper, we investigate the training process of generative networks that use a type of probability density distance named particle-based distance as the objective function, e.g. MMD GAN, Cramér GAN, EIEG GAN. However, these GANs often suffer from the problem of unstable training. In this paper, we analyze the stability of the training process of these GANs from the perspective of probability density dynamics. In our framework, we regard the discriminator $D$ in these GANs as a feature transformation mapping that maps high dimensional data into a feature space, while the generator $G$ maps random variables to samples that resemble real data in terms of feature space. This perspective enables us to perform stability analysis for the training of GANs using the Wasserstein gradient flow of the probability density function. We find that the training process of the discriminator is usually unstable due to the formulation of $\min_G \max_D E(G, D)$ in GANs. To address this issue, we add a stabilizing term in the discriminator loss function. We conduct experiments to validate our stability analysis and stabilizing method.	翻訳日:2023-07-06 16:01:59 公開日:2023-07-04
# KDSTM:知識蒸留を用いたニューラルネットワーク半教師付きトピックモデリング KDSTM: Neural Semi-supervised Topic Modeling with Knowledge Distillation ( http://arxiv.org/abs/2307.01878v1 ) ライセンス: Link先を確認	Weijie Xu, Xiaoyu Jiang, Jay Desai, Bin Han, Fuqin Yan and Francis Iannacci	(参考訳) テキスト分類タスクでは、BERT や GPT-3 のような事前訓練済み言語モデルの微調整は、競合する精度をもたらすが、どちらの手法も大きなテキストデータセットで事前訓練を必要とする。対照的に、一般的なトピックモデリング手法は、事前学習なしに意味のある単語のパターンを抽出するために文書を分析する利点を持っている。テキスト分類タスクにおけるトピックモデリングの教師なし洞察抽出を活用するために,知識蒸留半教師付きトピックモデリング(KDSTM)を開発した。 KDSTMは事前訓練された埋め込みを必要とせず、ラベル付きドキュメントがほとんどなく、訓練も効率的で、リソース制約のある設定で理想的です。様々なデータセットにまたがって,提案手法は,既存の教師付きトピックモデリング手法を分類精度,ロバスト性,効率性において上回り,弱教師付きテキスト分類法と比較して同様の性能を実現する。 In text classification tasks, fine-tuning pretrained language models like BERT and GPT-3 yields competitive accuracy; however, both methods require pretraining on large text datasets. In contrast, general topic modeling methods possess the advantage of analyzing documents to extract meaningful patterns of words without the need for pretraining. To leverage topic modeling's unsupervised insights extraction on text classification tasks, we develop the Knowledge Distillation Semi-supervised Topic Modeling (KDSTM). KDSTM requires no pretrained embeddings and only a few labeled documents, and it is efficient to train, making it ideal under resource-constrained settings. Across a variety of datasets, our method outperforms existing supervised topic modeling methods in classification accuracy, robustness and efficiency, and achieves performance similar to state-of-the-art weakly supervised text classification methods.	翻訳日:2023-07-06 16:01:42 公開日:2023-07-04
# 局所感性量子化による高速プライベートカーネル密度推定 Fast Private Kernel Density Estimation via Locality Sensitive Quantization ( http://arxiv.org/abs/2307.01877v1 ) ライセンス: Link先を確認	Tal Wagner, Yonatan Naamad, Nina Mishra	(参考訳) 差分プライベートカーネル密度推定(DP-KDE)の効率的なメカニズムについて検討した。 gaussian kernel の以前の作業では、次元 $d$ で指数関数的に実行されるアルゴリズムが記述されていた。本稿では,指数障壁を破り,KDEを時間線形に$d$でプライベートに近似し,高次元データに対して実現可能であることを示す。また,低次元データの境界も改善した。本研究は,既存のKDE近似手法を応用可能なKDE機構を構築するために,LSQ(Locality Sensitive Quantization)と呼ばれる一般フレームワークを用いて得られた。 Random Fourier Features、Fast Gauss Transform、Locality Sensitive Hashingなど、効率的な非プライベートなKDEメソッドをブラックボックスで活用できます。実験の結果,DP-KDE機構は高次元および低次元の大規模データセット上で高速かつ高精度であることがわかった。 We study efficient mechanisms for differentially private kernel density estimation (DP-KDE). Prior work for the Gaussian kernel described algorithms that run in time exponential in the number of dimensions $d$. This paper breaks the exponential barrier, and shows how the KDE can privately be approximated in time linear in $d$, making it feasible for high-dimensional data. We also present improved bounds for low-dimensional data. Our results are obtained through a general framework, which we term Locality Sensitive Quantization (LSQ), for constructing private KDE mechanisms where existing KDE approximation techniques can be applied. It lets us leverage several efficient non-private KDE methods -- like Random Fourier Features, the Fast Gauss Transform, and Locality Sensitive Hashing -- and ``privatize'' them in a black-box manner. Our experiments demonstrate that our resulting DP-KDE mechanisms are fast and accurate on large datasets in both high and low dimensions.	翻訳日:2023-07-06 16:01:26 公開日:2023-07-04
# Approximate, Adapt, Anonymize (3A): 機械学習のためのトレーニングデータリリースを保存するプライバシー保護フレームワーク Approximate, Adapt, Anonymize (3A): a Framework for Privacy Preserving Training Data Release for Machine Learning ( http://arxiv.org/abs/2307.01875v1 ) ライセンス: Link先を確認	Tamas Madl, Weijie Xu, Olivia Choudhury, Matthew Howard	(参考訳) 大量の情報データの提供は、機械学習の成功に不可欠である。しかし、機密情報を持つドメインでは、個人のプライバシーを保護する高可用性データのリリースが困難であることが証明されている。文学におけるプライバシー保護データリリースのための差分プライバシーと生成モデリングの進歩にもかかわらず、機械学習ユーティリティに最適化されるアプローチはごくわずかである。ほとんどのアプローチは、データ自体の統計メトリクスを考慮に入れ、その後生成されたデータでトレーニングされる機械学習モデルの損失メトリクスを明示的に保持することができない。本稿では,データリリースフレームワークである3A(Approximate,Adapt,Anonymize)を導入し,差分プライバシーを保ちながら機械学習のデータユーティリティを最大化する。また,このフレームワークの具体的実装として,混合モデルを利用して近似的,カーネル誘導型,ガウス微分プライバシを用いてデータセットの匿名化を行い,結果がプライバシ保存と高ユーティリティの両方であることを保証する。本研究では,実データに基づく実データの評価において,実データと民営化データセットを用いたモデルの性能指標の最小差を示す実験的な証拠を示す。また,いくつかのプライバシ保存型合成データ生成モデル(差分プライベート生成型adversarial networkなど)と比較し,最新モデルと比較して分類性能指標が著しく向上したことを報告する。これらの好意的な比較は、提示されたフレームワークが研究の有望な方向であることを示し、機械学習のための低リスク合成データリリースの有用性を高めている。 The availability of large amounts of informative data is crucial for successful machine learning. However, in domains with sensitive information, the release of high-utility data which protects the privacy of individuals has proven challenging. Despite progress in differential privacy and generative modeling for privacy-preserving data release in the literature, only a few approaches optimize for machine learning utility: most approaches only take into account statistical metrics on the data itself and fail to explicitly preserve the loss metrics of machine learning models that are to be subsequently trained on the generated data. In this paper, we introduce a data release framework, 3A (Approximate, Adapt, Anonymize), to maximize data utility for machine learning, while preserving differential privacy. 
We also describe a specific implementation of this framework that leverages mixture models to approximate, kernel-inducing points to adapt, and Gaussian differential privacy to anonymize a dataset, in order to ensure that the resulting data is both privacy-preserving and high utility. We present experimental evidence showing minimal discrepancy between performance metrics of models trained on real versus privatized datasets, when evaluated on held-out real data. We also compare our results with several privacy-preserving synthetic data generation models (such as differentially private generative adversarial networks), and report significant increases in classification performance metrics compared to state-of-the-art models. These favorable comparisons show that the presented framework is a promising direction of research, increasing the utility of low-risk synthetic data release for machine learning.	翻訳日:2023-07-06 16:00:57 公開日:2023-07-04
# 非相対論的時空間量子参照フレーム Non-relativistic spatiotemporal quantum reference frames ( http://arxiv.org/abs/2307.01874v1 ) ライセンス: Link先を確認	Michael Suleymanov, Ismael L. Paiva, Eliahu Cohen	(参考訳) 量子参照フレームは、その探索が量子論の多くの分野に関連し、指導的であるため、近年新たな関心を集めている。異なるタイプの中で、位置と時間参照フレームは特別な注意を引いている。本稿では,その外的(空間的)自由度に加えて,各系が内部時計を含む非相対論的枠組みを導入・解析し,時空間量子参照フレームとして利用できることを示す。このフレームワークの他の応用の中で、相互作用のない単純なシナリオであっても、クロック間の相対的不確実性は系の相対的空間的拡散に影響を与えることを示す。 Quantum reference frames have attracted renewed interest recently, as their exploration is relevant and instructive in many areas of quantum theory. Among the different types, position and time reference frames have captivated special attention. Here, we introduce and analyze a non-relativistic framework in which each system contains an internal clock, in addition to its external (spatial) degree of freedom and, hence, can be used as a spatiotemporal quantum reference frame. Among other applications of this framework, we show that even in simple scenarios with no interactions, the relative uncertainty between clocks affects the relative spatial spread of the systems.	翻訳日:2023-07-06 15:59:52 公開日:2023-07-04
# 金属添加物製造におけるクラッド特性予測のためのハイブリッド機械学習フレームワーク A hybrid machine learning framework for clad characteristics prediction in metal additive manufacturing ( http://arxiv.org/abs/2307.01872v1 ) ライセンス: Link先を確認	Sina Tayebati, Kyu Taek Cho	(参考訳) 過去10年間、金属添加物製造(mam)は重要な発展を遂げ、複雑な部品の製作、機能的に傾斜した材料による製品の製造、廃棄物の最小化、低コストのカスタマイズを可能にした。これらの利点にもかかわらず、MAMプロセスの複雑な性質のため、MAMプリントクラッドの特性に対する処理パラメータの影響を予測することは困難である。機械学習(ML)技術は、プロセスの基礎となる物理と処理パラメータをクラッド特性に結びつけるのに役立つ。本研究では,マルチフィジカルな計算流体力学(cfd)モデルによって提供されるデータと,本質的なビッグデータを作成するための実験的研究とを組み合わせたハイブリッド手法を提案し,様々なmlモデルからなる包括的フレームワークを用いてクラッド特性の予測と理解を行う。本研究は,実験データをCFDモデルを用いて生成したデータに融合することにより,まず広範囲なデータセットをコンパイルする。このデータセットは、幅、高さ、深さなどの幾何学的特徴、クラッド品質を識別するラベル、および処理パラメータを含む重要なクラッド特性を含む。第2に、機械学習モデルのトレーニングには、機械設定パラメータと物理認識パラメータと、汎用MLモデルと信頼性評価指標の2つの処理パラメータを使用して、クラッド幾何学と品質を予測するための包括的なスケーラブルな学習フレームワークを作成します。このフレームワークはクラッド特性制御とプロセス最適化の基礎となる。このフレームワークは、ハイブリッドアプローチを用いてデータの不足を解消し、クラッド特性予測と最適化のための効率的で正確でスケーラブルなプラットフォームを導入することで、MAMにおける従来のモデリング手法の多くの課題を解決する。 During the past decade, metal additive manufacturing (MAM) has experienced significant developments and gained much attention due to its ability to fabricate complex parts, manufacture products with functionally graded materials, minimize waste, and enable low-cost customization. Despite these advantages, predicting the impact of processing parameters on the characteristics of an MAM printed clad is challenging due to the complex nature of MAM processes. Machine learning (ML) techniques can help connect the physics underlying the process and processing parameters to the clad characteristics. In this study, we introduce a hybrid approach which involves utilizing the data provided by a calibrated multi-physics computational fluid dynamic (CFD) model and experimental research for preparing the essential big dataset, and then uses a comprehensive framework consisting of various ML models to predict and understand clad characteristics. 
We first compile an extensive dataset by fusing experimental data into the data generated using the developed CFD model for this study. This dataset comprises critical clad characteristics, including geometrical features such as width, height, and depth, labels identifying clad quality, and processing parameters. Second, we use two sets of processing parameters for training the ML models: machine setting parameters and physics-aware parameters, along with versatile ML models and reliable evaluation metrics to create a comprehensive and scalable learning framework for predicting clad geometry and quality. This framework can serve as a basis for clad characteristics control and process optimization. The framework resolves many challenges of conventional modeling methods in MAM by solving the issue of data scarcity using a hybrid approach and introducing an efficient, accurate, and scalable platform for clad characteristics prediction and optimization.	翻訳日:2023-07-06 15:59:34 公開日:2023-07-04
# 支援を求めるロボット: 大きな言語モデルプランナーのための不確実性アライメント Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners ( http://arxiv.org/abs/2307.01928v1 ) ライセンス: Link先を確認	Allen Z. Ren, Anushri Dixit, Alexandra Bodrova, Sumeet Singh, Stephen Tu, Noah Brown, Peng Xu, Leila Takayama, Fei Xia, Jake Varley, Zhenjia Xu, Dorsa Sadigh, Andy Zeng, Anirudha Majumdar	(参考訳) 大規模言語モデル(llm)は、ステップバイステップの計画からコモンセンス推論まで、幅広い有望な能力を示しており、ロボットの実用性を提供するが、自信を持って幻覚的な予測を行う可能性が高い。本研究では,LLMをベースとしたプランナの不確実性を計測・調整するフレームワークであるKnowNoについて述べる。 KnowNoは、複雑な多段階計画設定において人間の助けを最小化しながら、タスク完了に関する統計的保証を提供する共形予測理論に基づいている。例えば、人間の好みからウィノグラードのスキーマまで、空間的な不確実性から数値的な不確実性まで)の異なるモードのタスクを含む様々なシミュレーションされた実ロボットのセットアップの実験では、KnowNoは効率性と自律性の向上の観点からモダンなベースライン(アンサンブルや広範囲な急進的なチューニングを含む)に対して好適に機能し、形式的な保証を提供する。 KnowNo はモデルファインタニングなしで LLM を最初から使用することができ、基礎モデルの増大する能力を補完し拡張できる不確実性をモデリングするための有望な軽量なアプローチを提案する。ウェブサイト:https://robot-help.github.io Large language models (LLMs) exhibit a wide range of promising capabilities -- from step-by-step planning to commonsense reasoning -- that may provide utility for robots, but remain prone to confidently hallucinated predictions. In this work, we present KnowNo, which is a framework for measuring and aligning the uncertainty of LLM-based planners such that they know when they don't know and ask for help when needed. KnowNo builds on the theory of conformal prediction to provide statistical guarantees on task completion while minimizing human help in complex multi-step planning settings. Experiments across a variety of simulated and real robot setups that involve tasks with different modes of ambiguity (e.g., from spatial to numeric uncertainties, from human preferences to Winograd schemas) show that KnowNo performs favorably over modern baselines (which may involve ensembles or extensive prompt tuning) in terms of improving efficiency and autonomy, while providing formal assurances. 
KnowNo can be used with LLMs out of the box without model-finetuning, and suggests a promising lightweight approach to modeling uncertainty that can complement and scale with the growing capabilities of foundation models. Website: https://robot-help.github.io	翻訳日:2023-07-06 15:53:35 公開日:2023-07-04
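The abstract above says KnowNo builds on conformal prediction so that a planner asks for help when it is uncertain. The following is a minimal sketch of generic split conformal prediction over a fixed set of candidate actions, not the authors' implementation. The confidence values stand in for whatever scores an LLM planner would assign to each option, and every function name and number here is hypothetical.

```python
import math


def conformal_quantile(nonconformity_scores, epsilon):
    """Return the ceil((n + 1) * (1 - epsilon))-th smallest calibration score."""
    n = len(nonconformity_scores)
    rank = math.ceil((n + 1) * (1.0 - epsilon))
    if rank > n:
        # Too few calibration points for this error level: keep every option.
        return float("inf")
    return sorted(nonconformity_scores)[rank - 1]


def prediction_set(option_confidences, q_hat):
    """Keep each option whose nonconformity (1 - confidence) is at most q_hat."""
    return [name for name, conf in option_confidences.items() if 1.0 - conf <= q_hat]


def plan_or_ask(option_confidences, q_hat):
    """Act only when exactly one option survives; otherwise ask for help."""
    options = prediction_set(option_confidences, q_hat)
    if len(options) == 1:
        return ("act", options)
    return ("ask_for_help", options)


if __name__ == "__main__":
    # Hypothetical calibration scores: 1 - confidence assigned to the correct option.
    calibration = [0.05, 0.1, 0.15, 0.2, 0.3, 0.35, 0.4, 0.6, 0.9]
    q_hat = conformal_quantile(calibration, epsilon=0.25)
    print(plan_or_ask({"metal bowl": 0.7, "plastic bowl": 0.25}, q_hat))
    print(plan_or_ask({"metal bowl": 0.5, "plastic bowl": 0.45, "cup": 0.05}, q_hat))
```

Under the usual exchangeability assumption, sets built this way contain the correct option with probability at least 1 - epsilon on average. Asking for help whenever the set is not a singleton is the design choice that turns that coverage into a rule for when to involve a human.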
# ProtoDiffusion: 原型学習による分類自由拡散指導 ProtoDiffusion: Classifier-Free Diffusion Guidance with Prototype Learning ( http://arxiv.org/abs/2307.01924v1 ) ライセンス: Link先を確認	Gulcin Baykal, Halil Faruk Karagoz, Taha Binhuraib, Gozde Unal	(参考訳) 拡散モデルは生成モデルであり、より高い世代品質とより安定したトレーニングという観点で、他の生成モデルと比較して大きな利点を示している。しかし,拡散モデルの学習の必要性は大幅に増大した。本研究では,プロトタイプ学習を拡散モデルに組み込んで,元の拡散モデルよりも高速に高次品質を実現する。クラス埋め込みをランダムに初期化する代わりに、学習したクラスプロトタイプを条件付け情報として使用して拡散過程を導出する。 ProtoDiffusionと呼ばれる本手法は,ベースライン法と比較して訓練の初期段階で優れた性能を達成し,学習したプロトタイプを使用することでトレーニング時間を短縮することを示す。様々なデータセットと実験的な設定を用いてProtoDiffusionの性能を実証し、すべての設定で短時間で最高のパフォーマンスを達成する。 Diffusion models are generative models that have shown significant advantages compared to other generative models in terms of higher generation quality and more stable training. However, the computational need for training diffusion models is considerably increased. In this work, we incorporate prototype learning into diffusion models to achieve high generation quality faster than the original diffusion model. Instead of randomly initialized class embeddings, we use separately learned class prototypes as the conditioning information to guide the diffusion process. We observe that our method, called ProtoDiffusion, achieves better performance in the early stages of training compared to the baseline method, signifying that using the learned prototypes shortens the training time. We demonstrate the performance of ProtoDiffusion using various datasets and experimental settings, achieving the best performance in shorter times across all settings.	翻訳日:2023-07-06 15:53:09 公開日:2023-07-04
# 計算社会科学における再現性 Computational Reproducibility in Computational Social Science ( http://arxiv.org/abs/2307.01918v1 ) ライセンス: Link先を確認	David Schoch, Chung-hong Chan, Claudia Wagner, Arnim Bleier	(参考訳) 過去10年間で、再現性と再現性の危機が科学界を揺るがしている。潜在的な解決策として、オープンサイエンスの実践は深く議論され、様々な分野で様々な成功を収めた。しかしながら,計算社会科学などの計算X分野における再現性のバイナリ定義は,結果が再現できるエージェントや条件について明示的でないため不十分である,と我々は主張する。本研究では, 理論的再現性を創出するが, 実用的, 検証された再現性をサポートしない「オープン洗浄」を避けるための定義を拡張し, 検証可能性の概念に基づく計算再現性の階層システムを導入する。検証可能な計算再現性、特に計算社会科学の分野における共通の障壁を特定し、共通アクセスや計算障壁を回避する方法について提案する。 In the last decade, replication and reproducibility crises have shaken the scientific landscape. As potential solutions, open science practices were heavily discussed and have been implemented with varying success in different disciplines. We argue, however, that the binary definition of reproducibility, specifically for computational-X disciplines such as computational social science, is insufficient since it is not explicit about the agents and conditions under which results can be reproduced. We expand the definition to avoid "open washing", the practice of fabricating theoretical reproducibility but not supporting practical or verified reproducibility, and introduce a tier system of computational reproducibility based on the concept of verifiability. We identify common barriers to verifiable computational reproducibility, specifically in the field of computational social science, and provide suggestions on how to circumvent common access and computational barriers.	翻訳日:2023-07-06 15:52:56 公開日:2023-07-04
# 複雑な海流中における不動容器のストランドングリスク:解析と制御 Stranding Risk for Underactuated Vessels in Complex Ocean Currents: Analysis and Controllers ( http://arxiv.org/abs/2307.01917v1 ) ライセンス: Link先を確認	Andreas Doering, Marius Wiggert, Hanna Krasowski, Manan Doshi, Pierre F.J. Lermusiaux and Claire J. Tomlin	(参考訳) 低推進の船は、目的地に向かうために強力な海流を利用することができる。近年の結果,予測誤差にもかかわらず,船が目的地に到達できる可能性が示唆された。しかし、これらの結果はこれらの船舶の安全性の重要な側面を考慮せず、その低推進力は電流の大きさよりはるかに小さいため、浅い地域、ゴミのパッチ、海運レーンなどの安全でない地域に必然的に押し込む電流になってしまう可能性がある。本研究は,北東太平洋における自由に浮かぶ船舶のストレッチングの危険性について検討した。少なくとも5.04%は90日以内に立ち往生する。次に、安全でない集合をハミルトン・ヤコビ多重時間到達可能性(HJ-MTR)にハード制約としてエンコードし、低計算コストで各ステップで再計画と等価なフィードバックポリシーを合成する。このポリシーを適用したクローズドループは、電流が分かっている場合に安全な動作を保証するが、現実的な状況では不完全な予測しかできない。東北太平洋の高リスク域を航行する船舶の大規模シミュレーションにより,このような現実的な状況において,本手法の安全性を実証する。我々は, 予測誤差が最大推力を超える場合でも, 新たな予測を毎日再計画することで, 安全性を高い確率で確保できることを見出した。本手法はベースライン上での安全性を著しく向上させ,目的地にタイムリーに船が到着する。 Low-propulsion vessels can take advantage of powerful ocean currents to navigate towards a destination. Recent results demonstrated that vessels can reach their destination with high probability despite forecast errors. However, these results do not consider the critical aspect of safety of such vessels: because of their low propulsion which is much smaller than the magnitude of currents, they might end up in currents that inevitably push them into unsafe areas such as shallow areas, garbage patches, and shipping lanes. In this work, we first investigate the risk of stranding for free-floating vessels in the Northeast Pacific. We find that at least 5.04% would strand within 90 days. Next, we encode the unsafe sets as hard constraints into Hamilton-Jacobi Multi-Time Reachability (HJ-MTR) to synthesize a feedback policy that is equivalent to re-planning at each time step at low computational cost. While applying this policy closed-loop guarantees safe operation when the currents are known, in realistic situations only imperfect forecasts are available. 
We demonstrate the safety of our approach in such realistic situations empirically with large-scale simulations of a vessel navigating in high-risk regions in the Northeast Pacific. We find that applying our policy closed-loop with daily re-planning on new forecasts can ensure safety with high probability even under forecast errors that exceed the maximal propulsion. Our method significantly improves safety over the baselines and still achieves a timely arrival of the vessel at the destination.	翻訳日:2023-07-06 15:52:42 公開日:2023-07-04
# 自律型農業における海藻成長の最大化:不確実な海流をナビゲートする不活性化システムの動的プログラミング手法 Maximizing Seaweed Growth on Autonomous Farms: A Dynamic Programming Approach for Underactuated Systems Navigating on Uncertain Ocean Currents ( http://arxiv.org/abs/2307.01916v1 ) ライセンス: Link先を確認	Matthias Killer, Marius Wiggert, Hanna Krasowski, Manan Doshi, Pierre F.J. Lermusiaux and Claire J. Tomlin	(参考訳) 海藻バイオマスは気候変動を緩和する大きな可能性を秘めているが、大規模で自律的なオープンオーシャン農場はそれを完全に活用する必要がある。このような農場は典型的には低い推進力を持ち、海流の影響を強く受けている。高成長域に到達するための非線形時間変化海流を利用して、海藻の成長を最大化するコントローラを設計したい。複雑なダイナミクスと過度な動作は、たとえ電流が知られているとしても、これを難しくする。不確実性が増大する短期的不完全な予測のみが可能であれば、これはさらに難しい。実電流が分かっている場合に最適な成長値関数を効率的に解く動的計画法を提案する。 We additionally present three extensions when as in reality only forecasts are known: (1) our methods resulting value function can be used as feedback policy to obtain the growth-optimal control for all states and times, allowing closed-loop control equivalent to re-planning at every time step hence mitigating forecast errors, (2) a feedback policy for long-term optimal growth beyond forecast horizons using seasonal average current data as terminal reward, and (3) a discounted finite-time Dynamic Programming (DP) formulation to account for increasing ocean current estimate uncertainty. 実際の太平洋海流シナリオにおける海藻養殖場の30日間のシミュレーションによるアプローチの評価を行った。本手法は,5日間の予測で最高の成長率の95.8%を達成できたことを示す。これにより, 実環境下での浮遊農地における低出力推進と海藻生育促進のための最適制御の可能性が確認された。 Seaweed biomass offers significant potential for climate mitigation, but large-scale, autonomous open-ocean farms are required to fully exploit it. Such farms typically have low propulsion and are heavily influenced by ocean currents. We want to design a controller that maximizes seaweed growth over months by taking advantage of the non-linear time-varying ocean currents for reaching high-growth regions. The complex dynamics and underactuation make this challenging even when the currents are known. This is even harder when only short-term imperfect forecasts with increasing uncertainty are available. 
We propose a dynamic programming-based method to efficiently solve for the optimal growth value function when true currents are known. We additionally present three extensions for the realistic case in which only forecasts are known: (1) the value function produced by our method can be used as a feedback policy to obtain the growth-optimal control for all states and times, allowing closed-loop control equivalent to re-planning at every time step and hence mitigating forecast errors; (2) a feedback policy for long-term optimal growth beyond forecast horizons, using seasonal average current data as terminal reward; and (3) a discounted finite-time Dynamic Programming (DP) formulation to account for increasing ocean current estimate uncertainty. We evaluate our approach through 30-day simulations of floating seaweed farms in realistic Pacific Ocean current scenarios. Our method achieves 95.8% of the best possible growth using only 5-day forecasts. This confirms the feasibility of using low-power propulsion and optimal control for enhanced seaweed growth on floating farms under real-world conditions.	翻訳日:2023-07-06 15:52:16 公開日:2023-07-04
# ClimateLearn: 気象と気候モデリングのためのベンチマーク機械学習 ClimateLearn: Benchmarking Machine Learning for Weather and Climate Modeling ( http://arxiv.org/abs/2307.01909v1 ) ライセンス: Link先を確認	Tung Nguyen, Jason Jewik, Hritik Bansal, Prakhar Sharma, Aditya Grover	(参考訳) 気象と気候のモデリングは、気候変動の短期的および長期的影響を理解するための重要な取り組みであり、適応と緩和のための技術と政策作成を通知する。近年,気象予報や気候ダウンスケーリングといった中核的な問題を解決するため,機械学習に基づくデータ駆動手法の適用への関心が高まっている。有望な結果にもかかわらず、この進歩の多くは、再現性のための大規模でオープンソースな取り組みの欠如により、一貫性のないデータセットや不特定なデータセット、トレーニングのセットアップ、ドメイン科学者と人工知能研究者による評価によって損なわれている。本稿では、データ駆動型気候科学のための機械学習モデルのトレーニングと評価を大幅に単純化するオープンソースのPyTorchライブラリClimateLearnを紹介する。 ClimateLearnはデータセット処理のための総合的なパイプライン(例: ERA5、CMIP6、PRISM)、最先端のディープラーニングモデル(例:Transformers、ResNets)の実装、標準の気象・気候モデリングタスクの量的・質的評価からなる。これらの機能には、広範なドキュメント、コントリビューションガイド、クイックスタートチュートリアルを加えて、アクセスの拡大とコミュニティの成長を促進する。ライブラリの機能と重要な機能を紹介するため、包括的な予測およびダウンスケーリング実験も行いました。私たちの知る限り、ClimateLearnは、現代の機械学習システムによる気象と気候モデリングの研究を橋渡しするための、最初の大規模でオープンソースの取り組みです。私たちのライブラリはhttps://github.com/aditya-grover/climate-learn で公開されている。 Modeling weather and climate is an essential endeavor to understand the near- and long-term impacts of climate change, as well as inform technology and policymaking for adaptation and mitigation efforts. In recent years, there has been a surging interest in applying data-driven methods based on machine learning for solving core problems such as weather forecasting and climate downscaling. Despite promising results, much of this progress has been impaired due to the lack of large-scale, open-source efforts for reproducibility, resulting in the use of inconsistent or underspecified datasets, training setups, and evaluations by both domain scientists and artificial intelligence researchers. We introduce ClimateLearn, an open-source PyTorch library that vastly simplifies the training and evaluation of machine learning models for data-driven climate science. 
ClimateLearn consists of holistic pipelines for dataset processing (e.g., ERA5, CMIP6, PRISM), implementation of state-of-the-art deep learning models (e.g., Transformers, ResNets), and quantitative and qualitative evaluation for standard weather and climate modeling tasks. We supplement these functionalities with extensive documentation, contribution guides, and quickstart tutorials to expand access and promote community growth. We have also performed comprehensive forecasting and downscaling experiments to showcase the capabilities and key features of our library. To our knowledge, ClimateLearn is the first large-scale, open-source effort for bridging research in weather and climate modeling with modern machine learning systems. Our library is available publicly at https://github.com/aditya-grover/climate-learn.	翻訳日:2023-07-06 15:51:55 公開日:2023-07-04
# ZotCare: フレキシブルでパーソナライズ可能、そして拡張可能なmHealthサービスプロバイダ ZotCare: A Flexible, Personalizable, and Affordable mHealth Service Provider ( http://arxiv.org/abs/2307.01905v1 ) ライセンス: Link先を確認	Sina Labbaf, Mahyar Abbasian, Iman Azimi, Nikil Dutt, and Amir M. Rahmani	(参考訳) インターネットに接続された健康デバイスの普及と、モバイル接続の普及により、信頼できるデジタルヘルスデータと、ジャスト・イン・タイムの介入を提供する可能性がある。しかし、これらの機会を健康研究に活用するには、モバイルヘルス(mhealth)アプリケーションの開発と展開が必要であり、研究者にとって重要な技術的課題となっている。既存のmHealthソリューションはこれらの課題のいくつかに対処する作業を進めてきたが、多くの場合、パーソナライズと適応のための時間と可利用性、柔軟性の面で不足している。 zotcareは、使用可能で柔軟なサービスを提供することで、これらの制限に対処し、研究者がmhealth研究にアクセスしやすく、コスト効率が高く、適応可能なソリューションを提供することを目指している。この記事では、ZotCareのサービスオーケストレーションに焦点を当て、mHealthリサーチ用のプログラム可能な環境を作成する能力を強調します。さらに,過去にも進行中のプロジェクトにおいても,ZotCareを利用した研究事例をいくつか紹介する。さらに,ZotCareをmHealth研究ソリューションとして検討している研究者に対して,リソースと情報を提供する。 The proliferation of Internet-connected health devices and the widespread availability of mobile connectivity have resulted in a wealth of reliable digital health data and the potential for delivering just-in-time interventions. However, leveraging these opportunities for health research requires the development and deployment of mobile health (mHealth) applications, which present significant technical challenges for researchers. While existing mHealth solutions have made progress in addressing some of these challenges, they often fall short in terms of time-to-use, affordability, and flexibility for personalization and adaptation. ZotCare aims to address these limitations by offering ready-to-use and flexible services, providing researchers with an accessible, cost-effective, and adaptable solution for their mHealth studies. This article focuses on ZotCare's service orchestration and highlights its capabilities in creating a programmable environment for mHealth research. Additionally, we showcase several successful research use cases that have utilized ZotCare, both in the past and in ongoing projects. 
Furthermore, we provide resources and information for researchers who are considering ZotCare as their mHealth research solution.	翻訳日:2023-07-06 15:51:24 公開日:2023-07-04
# 乱用言語分類器による偽因果関係の検証のための概念に基づく説明 Concept-Based Explanations to Test for False Causal Relationships Learned by Abusive Language Classifiers ( http://arxiv.org/abs/2307.01900v1 ) ライセンス: Link先を確認	Isar Nejadgholi, Svetlana Kiritchenko, Kathleen C. Fraser, and Esma Balkır	(参考訳) 分類器は、過剰表現された概念とラベルの間の誤った因果関係を学習する傾向があり、その結果、概念の過度な信頼と分類精度の妥協につながる。異なるモデルを比較し、特定の概念に過剰依存を識別できるメソッドを配置しておくことが不可欠である。大規模な英語データセットで訓練された3つのよく知られた乱用言語分類器について検討し,悪用ラベルの十分な特徴として学習すべきでない重要なシグナルである否定感情の概念に注目した。グローバル十分性の定義に動機づけられ、まず、すべての決定しきい値にまたがって設定された課題において、その正確性を評価することによって、分類器が学習した望ましくない依存関係を調べる。さらに,課題セットが必ずしも利用可能ではないことを認識し,概念がラベルに与える影響を評価するための概念ベースの説明指標を導入する。これらの説明により、概念とラベルの間で学んだ偽の大域的充足度について分類器を比較することができる。 Classifiers tend to learn a false causal relationship between an over-represented concept and a label, which can result in over-reliance on the concept and compromised classification accuracy. It is imperative to have methods in place that can compare different models and identify over-reliances on specific concepts. We consider three well-known abusive language classifiers trained on large English datasets and focus on the concept of negative emotions, which is an important signal but should not be learned as a sufficient feature for the label of abuse. Motivated by the definition of global sufficiency, we first examine the unwanted dependencies learned by the classifiers by assessing their accuracy on a challenge set across all decision thresholds. Further, recognizing that a challenge set might not always be available, we introduce concept-based explanation metrics to assess the influence of the concept on the labels. These explanations allow us to compare classifiers regarding the degree of false global sufficiency they have learned between a concept and a label.	翻訳日:2023-07-06 15:51:02 公開日:2023-07-04
# 変換プロトフォーム再構成 Transformed Protoform Reconstruction ( http://arxiv.org/abs/2307.01896v1 ) ライセンス: Link先を確認	Young Min Kim, Kalvin Chang, Chenxuan Cui and David Mortensen	(参考訳) プロトホルムの再構築は、娘言語の祖先言語における形態素や単語の出現を推測する作業である。 Meloni et al. (2021)は、RNNベースのエンコーダデコーダとアテンションモデルを用いて、ラテン文字のプロトフォーム再構築の最先端を達成した。我々は最新のseq2seqモデルであるtransformerでモデルを更新する。我々のモデルは,5言語にまたがる8,000コニャート,39種にまたがる800以上のコニャートからなる中国語データセット(Hou 2004)の2つの異なるデータセット上で,それらのモデルを比較した。また,本モデルに含まれる可能性のある系統信号についても検討する。私たちのコードはhttps://github.com/cmu-llab/acl-2023で公開されています。 Protoform reconstruction is the task of inferring what morphemes or words appeared like in the ancestral languages of a set of daughter languages. Meloni et al. (2021) achieved the state-of-the-art on Latin protoform reconstruction with an RNN-based encoder-decoder with attention model. We update their model with the state-of-the-art seq2seq model: the Transformer. Our model outperforms their model on a suite of different metrics on two different datasets: their Romance data of 8,000 cognates spanning 5 languages and a Chinese dataset (Hou 2004) of 800+ cognates spanning 39 varieties. We also probe our model for potential phylogenetic signal contained in the model. Our code is publicly available at https://github.com/cmu-llab/acl-2023.	翻訳日:2023-07-06 15:50:46 公開日:2023-07-04
# EANet: 拡張アトリビュートベースのRGBTトラッカーネットワーク EANet: Enhanced Attribute-based RGBT Tracker Network ( http://arxiv.org/abs/2307.01893v1 ) ライセンス: Link先を確認	Abbas Türkoğlu, Erdem Akagündüz	(参考訳) トラッキング対象は、特に咬合や照明の変化、動きのぼやきといった課題に直面した場合には、コンピュータビジョンにおいて難しい課題となることがある。ディープラーニングの最近の進歩は、これらの条件に挑戦する可能性を示している。しかし、ほとんどのディープラーニングベースのオブジェクトトラッカーは、可視帯域(RGB)イメージのみを使用する。熱赤外電磁波(TIR)は、困難な状況に直面した場合、その温度を含む物体に関する追加情報を提供する。本稿では,RGBと熱画像(RGBT)を融合した深層学習に基づく画像追跡手法を提案する。提案モデルは,特徴抽出器とトラッカーの2つの主成分から構成される。特徴抽出器は、RGBとTIR画像の両方の深い特徴を符号化する。トラッカーはこれらの機能を使用して、拡張された属性ベースのアーキテクチャを使用してオブジェクトを追跡する。本稿ではアグリゲーションモジュールを用いた属性固有の特徴選択の融合を提案する。提案手法はRGBT234 \cite{LiCLiang2018}とLasHeR \cite{LiLasher2021}データセットで評価され,RGBTオブジェクト追跡データセットとして最も広く使用されている。その結果,提案システムはこれらのデータセット上で,比較的少ないパラメータで,最先端のRGBTオブジェクトトラッカーよりも優れていた。 Tracking objects can be a difficult task in computer vision, especially when faced with challenges such as occlusion, changes in lighting, and motion blur. Recent advances in deep learning have shown promise in challenging these conditions. However, most deep learning-based object trackers only use visible band (RGB) images. Thermal infrared electromagnetic waves (TIR) can provide additional information about an object, including its temperature, when faced with challenging conditions. We propose a deep learning-based image tracking approach that fuses RGB and thermal images (RGBT). The proposed model consists of two main components: a feature extractor and a tracker. The feature extractor encodes deep features from both the RGB and the TIR images. The tracker then uses these features to track the object using an enhanced attribute-based architecture. We propose a fusion of attribute-specific feature selection with an aggregation module. The proposed methods are evaluated on the RGBT234 \cite{LiCLiang2018} and LasHeR \cite{LiLasher2021} datasets, which are the most widely used RGBT object-tracking datasets in the literature. 
The results show that the proposed system outperforms state-of-the-art RGBT object trackers on these datasets, with a relatively smaller number of parameters.	翻訳日:2023-07-06 15:50:30 公開日:2023-07-04
# グラフニューラルネットワークにおける特徴進化の神経崩壊の展望 A Neural Collapse Perspective on Feature Evolution in Graph Neural Networks ( http://arxiv.org/abs/2307.01951v1 ) ライセンス: Link先を確認	Vignesh Kothapalli, Tom Tirer, Joan Bruna	(参考訳) グラフ構造データの分類タスクでは,グラフニューラルネットワーク(gnns)がますます普及している。しかし,GNNにおけるグラフトポロジと特徴進化の相互作用はよく理解されていない。本稿では,確率的ブロックモデルグラフ上でのコミュニティ検出と共に,ノード単位の分類に着目し,神経崩壊(nc)現象のレンズを通して特徴進化を考察する。インスタンスワイドの深層分類器(例えば画像分類)をゼロの訓練誤差点を超えて訓練する場合、NCは最深部特徴のクラス内変数の減少を示し、それらのクラスは特定の対称構造にアライメントされる。まず、ノード単位の分類設定において、クラス内変数の減少が顕著であることを示す実証的研究から始めるが、インスタンス単位のケースで観測される範囲には及ばない。そして、この区別を理論的に研究する。具体的には、「最適」な数学的モデルでさえ、グラフは正確な崩壊を伴う最小値を持つために厳密な構造条件に従う必要があることを示す。興味深いことに、この条件は異種グラフにも有効であり、GNNの一般化を改善した最近の経験的研究と関係している。さらに, 理論モデルの勾配ダイナミクスを研究することにより, 経験的に観測される部分的崩壊の推理を与える。最後に,よく訓練されたgnnの層間におけるクラス間特徴変動の進化と,その挙動をスペクトル法と対比する。 Graph neural networks (GNNs) have become increasingly popular for classification tasks on graph-structured data. Yet, the interplay between graph topology and feature evolution in GNNs is not well understood. In this paper, we focus on node-wise classification, illustrated with community detection on stochastic block model graphs, and explore the feature evolution through the lens of the "Neural Collapse" (NC) phenomenon. When training instance-wise deep classifiers (e.g. for image classification) beyond the zero training error point, NC demonstrates a reduction in the deepest features' within-class variability and an increased alignment of their class means to certain symmetric structures. We start with an empirical study that shows that a decrease in within-class variability is also prevalent in the node-wise classification setting, however, not to the extent observed in the instance-wise case. Then, we theoretically study this distinction. Specifically, we show that even an "optimistic" mathematical model requires that the graphs obey a strict structural condition in order to possess a minimizer with exact collapse. 
Interestingly, this condition is viable also for heterophilic graphs and relates to recent empirical studies on settings with improved GNNs' generalization. Furthermore, by studying the gradient dynamics of the theoretical model, we provide reasoning for the partial collapse observed empirically. Finally, we present a study on the evolution of within- and between-class feature variability across layers of a well-trained GNN and contrast the behavior with spectral methods.	翻訳日:2023-07-06 15:43:07 公開日:2023-07-04
# ビデオ探索のための因果ビデオ要約器 Causal Video Summarizer for Video Exploration ( http://arxiv.org/abs/2307.01947v1 ) ライセンス: Link先を確認	Jia-Hong Huang, Chao-Han Huck Yang, Pin-Yu Chen, Andrew Brown, Marcel Worring	(参考訳) 近年,ビデオ探索を支援する方法としてビデオ要約が提案されている。しかし、従来のビデオ要約モデルは、ユーザー固有のニーズとは無関係に固定されたビデオ要約のみを生成し、それゆえビデオ探索の有効性を制限している。マルチモーダルビデオ要約はこの問題に対処するために使用されるアプローチの1つである。マルチモーダルビデオ要約は、ビデオ入力とテキストベースのクエリ入力を有する。したがって,マルチモーダルビデオ要約には,映像入力とテキスト検索の相互作用を効果的にモデル化することが不可欠である。本研究では,CVS(Causal Video Summarizer)と呼ばれる因果関係に基づく新しい手法を提案し,マルチモーダルビデオ要約の課題に対処するために,映像とクエリ間の対話的情報を効果的にキャプチャする。提案手法は確率エンコーダと確率デコーダからなる。既存のマルチモーダル映像要約データセットの評価結果から,提案手法は最先端の手法と比較して精度が+5.4%,F1スコアが+4.92%向上し,有効であることが示された。 Recently, video summarization has been proposed as a method to help video exploration. However, traditional video summarization models only generate a fixed video summary which is usually independent of user-specific needs and hence limits the effectiveness of video exploration. Multi-modal video summarization is one of the approaches utilized to address this issue. Multi-modal video summarization has a video input and a text-based query input. Hence, effective modeling of the interaction between a video input and text-based query is essential to multi-modal video summarization. In this work, a new causality-based method named Causal Video Summarizer (CVS) is proposed to effectively capture the interactive information between the video and query to tackle the task of multi-modal video summarization. The proposed method consists of a probabilistic encoder and a probabilistic decoder. Based on the evaluation of the existing multi-modal video summarization dataset, experimental results show that the proposed approach is effective, with increases of +5.4% in accuracy and +4.92% in F1-score compared with the state-of-the-art method.	翻訳日:2023-07-06 15:42:42 公開日:2023-07-04
# A Synthetic Electrocardiogram (ECG) Image Generation Toolbox to Facilitate Deep Learning-Based Scanned ECG Digitization ( http://arxiv.org/abs/2307.01946v1 ) License: see link	Kshama Kodthalu Shivashankara and Reza Sameni	Access to medical data is often limited as it contains protected health information (PHI). There are privacy concerns regarding using records containing personally identifiable information. Recent advancements have been made in applying deep learning-based algorithms for clinical diagnosis and decision-making. However, deep learning models are data-greedy, whereas the availability of medical datasets for training and evaluating these models is relatively limited. Data augmentation with so-called digital twins is an emerging technique to address this need. This paper presents a novel approach for generating synthetic electrocardiogram (ECG) images with realistic artifacts from time-series data for use in developing algorithms for digitization of ECG images. Synthetic data is generated in a privacy-preserving manner by generating distortionless ECG images on standard ECG paper background. Next, various distortions, including handwritten text artifacts, wrinkles, creases, and perspective transforms, are applied to the ECG images. The artifacts are generated synthetically, without personally identifiable information. As a use case, we generated a large ECG image dataset of 21,801 records from the PhysioNet PTB-XL dataset, with 12-lead ECG time-series data from 18,869 patients. A deep ECG image digitization model was developed and trained on the synthetic dataset, and was employed to convert the synthetic images to time-series data for evaluation. The signal-to-noise ratio (SNR) was calculated to assess the image digitization quality vs. the ground truth ECG time-series. The results show an average signal recovery SNR of 27 ± 2.8 dB, demonstrating the significance of the proposed synthetic ECG image dataset for training deep learning models.	Translated: 2023-07-06 15:42:25 Published: 2023-07-04
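The ECG abstract above reports digitization quality as a signal-to-noise ratio in dB. The abstract does not give the exact formula, so the sketch below assumes the common definition (reference signal power over reconstruction-error power); the function name `snr_db` and the toy signals are illustrative only and are not taken from the toolbox.

```python
import math

def snr_db(reference, recovered):
    """SNR in dB, assuming SNR = 10*log10(sum(x^2) / sum((x - x_hat)^2))."""
    if len(reference) != len(recovered):
        raise ValueError("signals must have the same length")
    signal_power = sum(x * x for x in reference)
    noise_power = sum((x - y) ** 2 for x, y in zip(reference, recovered))
    if noise_power == 0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_power / noise_power)

# Toy stand-in for a ground-truth trace and its digitized copy:
# one sine cycle, and the same cycle with a small constant offset error.
reference = [math.sin(2 * math.pi * k / 100) for k in range(100)]
recovered = [x + 0.01 for x in reference]
print(f"SNR: {snr_db(reference, recovered):.2f} dB")
```

A higher value means the recovered time series is closer to the ground truth; averaging such per-record values is one plausible way to arrive at a summary figure like the one quoted in the abstract.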
# Query-based Video Summarization with Pseudo Label Supervision ( http://arxiv.org/abs/2307.01945v1 ) License: see link	Jia-Hong Huang, Luka Murn, Marta Mrak, Marcel Worring	Existing datasets for manually labelled query-based video summarization are costly and thus small, limiting the performance of supervised deep video summarization models. Self-supervision can address the data sparsity challenge by using a pretext task and defining a method to acquire extra data with pseudo labels to pre-train a supervised deep model. In this work, we introduce segment-level pseudo labels from input videos to properly model both the relationship between a pretext task and a target task, and the implicit relationship between the pseudo label and the human-defined label. The pseudo labels are generated based on existing human-defined frame-level labels. To create more accurate query-dependent video summaries, a semantics booster is proposed to generate context-aware query representations. Furthermore, we propose mutual attention to help capture the interactive information between visual and textual modalities. Three commonly used video summarization benchmarks are used to thoroughly validate the proposed approach. Experimental results show that the proposed video summarization algorithm achieves state-of-the-art performance.	Translated: 2023-07-06 15:41:57 Published: 2023-07-04
# Text + Sketch: Image Compression at Ultra Low Rates ( http://arxiv.org/abs/2307.01944v1 ) License: see link	Eric Lei, Yiğit Berkay Uslu, Hamed Hassani, Shirin Saeedi Bidokhti	Recent advances in text-to-image generative models provide the ability to generate high-quality images from short text descriptions. These foundation models, when pre-trained on billion-scale datasets, are effective for various downstream tasks with little or no further training. A natural question to ask is how such models may be adapted for image compression. We investigate several techniques in which the pre-trained models can be directly used to implement compression schemes targeting novel low rate regimes. We show how text descriptions can be used in conjunction with side information to generate high-fidelity reconstructions that preserve both semantics and spatial structure of the original. We demonstrate that at very low bit-rates, our method can significantly improve upon learned compressors in terms of perceptual and semantic fidelity, despite no end-to-end training.	Translated: 2023-07-06 15:41:42 Published: 2023-07-04
# Physics-based Motion Retargeting from Sparse Inputs ( http://arxiv.org/abs/2307.01938v1 ) License: see link	Daniele Reda, Jungdam Won, Yuting Ye, Michiel van de Panne, Alexander Winkler	Avatars are important to create interactive and immersive experiences in virtual worlds. One challenge in animating these characters to mimic a user's motion is that commercial AR/VR products consist only of a headset and controllers, providing very limited sensor data of the user's pose. Another challenge is that an avatar might have a different skeleton structure than a human and the mapping between them is unclear. In this work, we address both of these challenges. We introduce a method to retarget motions in real-time from sparse human sensor data to characters of various morphologies. Our method uses reinforcement learning to train a policy to control characters in a physics simulator. We only require human motion capture data for training, without relying on artist-generated animations for each avatar. This allows us to use large motion capture datasets to train general policies that can track unseen users from real and sparse data in real-time. We demonstrate the feasibility of our approach on three characters with different skeleton structure: a dinosaur, a mouse-like creature and a human. We show that the avatar poses often match the user surprisingly well, despite having no sensor information of the lower body available. We discuss and ablate the important components in our framework, specifically the kinematic retargeting step, the imitation, contact and action reward as well as our asymmetric actor-critic observations. We further explore the robustness of our method in a variety of settings including unbalancing, dancing and sports motions.	Translated: 2023-07-06 15:41:27 Published: 2023-07-04
# A Neural Network-Based Enrichment of Reproducing Kernel Approximation for Modeling Brittle Fracture ( http://arxiv.org/abs/2307.01937v1 ) License: see link	Jonghyuk Baek, Jiun-Shyan Chen	Numerical modeling of localizations is a challenging task due to the evolving rough solution in which the localization paths are not predefined. Despite decades of efforts, there is a need for innovative discretization-independent computational methods to predict the evolution of localizations. In this work, an improved version of the neural network-enhanced Reproducing Kernel Particle Method (NN-RKPM) is proposed for modeling brittle fracture. In the proposed method, a background reproducing kernel (RK) approximation defined on a coarse and uniform discretization is enriched by a neural network (NN) approximation under a Partition of Unity framework. In the NN approximation, the deep neural network automatically locates and inserts regularized discontinuities in the function space. The NN-based enrichment functions are then patched together with RK approximation functions using RK as a Partition of Unity patching function. The optimum NN parameters defining the location, orientation, and displacement distribution across location together with RK approximation coefficients are obtained via the energy-based loss function minimization. To regularize the NN-RK approximation, a constraint on the spatial gradient of the parametric coordinates is imposed in the loss function. Analysis of the convergence properties shows that the solution convergence of the proposed method is guaranteed. The effectiveness of the proposed method is demonstrated by a series of numerical examples involving damage propagation and branching.	Translated: 2023-07-06 15:41:00 Published: 2023-07-04
# Concept2Box: Joint Geometric Embeddings for Learning Two-View Knowledge Graphs ( http://arxiv.org/abs/2307.01933v1 ) License: see link	Zijie Huang, Daheng Wang, Binxuan Huang, Chenwei Zhang, Jingbo Shang, Yan Liang, Zhengyang Wang, Xian Li, Christos Faloutsos, Yizhou Sun, Wei Wang	Knowledge graph embeddings (KGE) have been extensively studied to embed large-scale relational data for many real-world applications. Existing methods have long ignored the fact that many KGs contain two fundamentally different views: high-level ontology-view concepts and fine-grained instance-view entities. They usually embed all nodes as vectors in one latent space. However, a single geometric representation fails to capture the structural differences between two views and lacks probabilistic semantics towards concepts' granularity. We propose Concept2Box, a novel approach that jointly embeds the two views of a KG using dual geometric representations. We model concepts with box embeddings, which learn the hierarchy structure and complex relations such as overlap and disjoint among them. Box volumes can be interpreted as concepts' granularity. Different from concepts, we model entities as vectors. To bridge the gap between concept box embeddings and entity vector embeddings, we propose a novel vector-to-box distance metric and learn both embeddings jointly. Experiments on both the public DBpedia KG and a newly-created industrial KG showed the effectiveness of Concept2Box.	Translated: 2023-07-06 15:40:44 Published: 2023-07-04
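The Concept2Box abstract above models concepts as boxes whose volume reflects granularity and entities as vectors compared to boxes through a distance. The abstract does not define the authors' vector-to-box metric, so the sketch below substitutes the plain Euclidean distance from a point to an axis-aligned box as a stand-in; `box_volume` and `point_to_box_distance` are hypothetical helper names.

```python
import math

def box_volume(lower, upper):
    """Volume of an axis-aligned box; a larger volume suggests a coarser concept."""
    return math.prod(max(u - l, 0.0) for l, u in zip(lower, upper))

def point_to_box_distance(point, lower, upper):
    """Euclidean distance from a point to the box; 0.0 when the point lies inside.

    This is a generic geometric distance, not the metric proposed in the paper.
    """
    gaps = (max(l - p, 0.0, p - u) for p, l, u in zip(point, lower, upper))
    return math.sqrt(sum(g * g for g in gaps))

# Toy 2-D concept box and two entity vectors.
concept_lower, concept_upper = [0.0, 0.0], [2.0, 3.0]
print(box_volume(concept_lower, concept_upper))
print(point_to_box_distance([1.0, 1.0], concept_lower, concept_upper))
print(point_to_box_distance([5.0, 7.0], concept_lower, concept_upper))
```

Under this stand-in, an entity inside a concept's box has distance zero, which loosely mirrors the idea of an instance belonging to a concept.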
# MDI+: A Flexible Random Forest-Based Feature Importance Framework ( http://arxiv.org/abs/2307.01932v1 ) License: see link	Abhineet Agarwal, Ana M. Kenney, Yan Shuo Tan, Tiffany M. Tang, Bin Yu	Mean decrease in impurity (MDI) is a popular feature importance measure for random forests (RFs). We show that the MDI for a feature $X_k$ in each tree in an RF is equivalent to the unnormalized $R^2$ value in a linear regression of the response on the collection of decision stumps that split on $X_k$. We use this interpretation to propose a flexible feature importance framework called MDI+. Specifically, MDI+ generalizes MDI by allowing the analyst to replace the linear regression model and $R^2$ metric with regularized generalized linear models (GLMs) and metrics better suited for the given data structure. Moreover, MDI+ incorporates additional features to mitigate known biases of decision trees against additive or smooth models. We further provide guidance on how practitioners can choose an appropriate GLM and metric based upon the Predictability, Computability, Stability framework for veridical data science. Extensive data-inspired simulations show that MDI+ significantly outperforms popular feature importance measures in identifying signal features. We also apply MDI+ to two real-world case studies on drug response prediction and breast cancer subtype classification. We show that MDI+ extracts well-established predictive genes with significantly greater stability compared to existing feature importance measures. All code and models are released in a full-fledged Python package on GitHub.	Translated: 2023-07-06 15:40:24 Published: 2023-07-04
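The MDI+ abstract above connects impurity decrease to an unnormalized $R^2$ from regressing the response on decision stumps. A minimal single-split illustration of that kind of identity follows. It assumes variance impurity and reads "unnormalized $R^2$" as the explained sum of squares, which may differ from the paper's exact normalization; it is not the authors' package, and all names are hypothetical.

```python
def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def impurity_decrease(x, y, threshold):
    """Variance-impurity decrease of the single split x <= threshold."""
    left = [yi for xi, yi in zip(x, y) if xi <= threshold]
    right = [yi for xi, yi in zip(x, y) if xi > threshold]
    n = len(y)
    return (variance(y)
            - len(left) / n * variance(left)
            - len(right) / n * variance(right))

def stump_explained_ss(x, y, threshold):
    """Explained sum of squares of least-squares regression of y on the stump
    indicator (with intercept); that fit predicts each side's mean."""
    left = [yi for xi, yi in zip(x, y) if xi <= threshold]
    right = [yi for xi, yi in zip(x, y) if xi > threshold]
    mean_y = sum(y) / len(y)
    mean_left = sum(left) / len(left)
    mean_right = sum(right) / len(right)
    fitted = [mean_left if xi <= threshold else mean_right for xi in x]
    return sum((f - mean_y) ** 2 for f in fitted)

x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.5, 1.2, 3.9, 4.1, 4.3]
n = len(y)
# The two quantities agree up to the factor n (and floating-point error).
print(n * impurity_decrease(x, y, 3), stump_explained_ss(x, y, 3))
```

The agreement is the usual decomposition of total variation into between-group and within-group parts; the paper's statement covers the full collection of stumps on a feature within a tree, which this one-split toy does not attempt.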
# Learning ECG signal features without backpropagation ( http://arxiv.org/abs/2307.01930v1 ) License: see link	Péter Pósfay, Marcell T. Kurbucz, Péter Kovács, Antal Jakovác	Representation learning has become a crucial area of research in machine learning, as it aims to discover efficient ways of representing raw data with useful features to increase the effectiveness, scope and applicability of downstream tasks such as classification and prediction. In this paper, we propose a novel method to generate representations for time series-type data. This method relies on ideas from theoretical physics to construct a compact representation in a data-driven way, and it can capture both the underlying structure of the data and task-specific information while still remaining intuitive, interpretable and verifiable. This novel methodology aims to identify linear laws that can effectively capture a shared characteristic among samples belonging to a specific class. By subsequently utilizing these laws to generate a classifier-agnostic representation in a forward manner, they become applicable in a generalized setting. We demonstrate the effectiveness of our approach on the task of ECG signal classification, achieving state-of-the-art performance.	Translated: 2023-07-06 15:39:59 Published: 2023-07-04
# Hybrid Neural Diffeomorphic Flow for Shape Representation and Generation via Triplane ( http://arxiv.org/abs/2307.01957v1 ) License: see link	Kun Han, Shanlin Sun, Xiaohui Xie	Deep Implicit Functions (DIFs) have gained popularity in 3D computer vision due to their compactness and continuous representation capabilities. However, addressing dense correspondences and semantic relationships across DIF-encoded shapes remains a critical challenge, limiting their applications in texture transfer and shape analysis. Moreover, recent endeavors in 3D shape generation using DIFs often neglect correspondence and topology preservation. This paper presents HNDF (Hybrid Neural Diffeomorphic Flow), a method that implicitly learns the underlying representation and decomposes intricate dense correspondences into explicitly axis-aligned triplane features. To avoid suboptimal representations trapped in local minima, we propose hybrid supervision that captures both local and global correspondences. Unlike conventional approaches that directly generate new 3D shapes, we further explore the idea of shape generation with deformed template shape via diffeomorphic flows, where the deformation is encoded by the generated triplane features. Leveraging a pre-existing 2D diffusion model, we produce high-quality and diverse 3D diffeomorphic flows through generated triplane features, ensuring topological consistency with the template shape. Extensive experiments on medical image organ segmentation datasets evaluate the effectiveness of HNDF in 3D shape representation and generation.	Translated: 2023-07-06 15:32:18 Published: 2023-07-04
# Algorithme EM régularisé ( http://arxiv.org/abs/2307.01955v1 ) License: see link	Pierre Houdouin and Matthieu Jonckheere and Frederic Pascal	The Expectation-Maximization (EM) algorithm is a widely used iterative algorithm for computing the maximum likelihood estimate when dealing with a Gaussian Mixture Model (GMM). When the sample size is smaller than the data dimension, this could lead to a singular or poorly conditioned covariance matrix and, thus, to performance reduction. This paper presents a regularized version of the EM algorithm that efficiently uses prior knowledge to cope with a small sample size. This method aims to maximize a penalized GMM likelihood where regularized estimation may ensure positive definiteness of covariance matrix updates by shrinking the estimators towards some structured target covariance matrices. Finally, experiments on real data highlight the good performance of the proposed algorithm for clustering purposes.	Translated: 2023-07-06 15:31:55 Published: 2023-07-04
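The regularized EM abstract above describes shrinking covariance estimates towards a structured target so that updates stay positive definite when samples are scarce. The sketch below assumes a simple convex combination with an identity target; the paper's actual penalty and target matrices are not given in the abstract, and the helper names are hypothetical.

```python
def sample_covariance(points):
    """Maximum-likelihood sample covariance of a list of equal-length tuples."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    return [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / n
             for j in range(d)] for i in range(d)]

def shrink(cov, target, alpha):
    """Assumed linear shrinkage: (1 - alpha) * cov + alpha * target."""
    return [[(1 - alpha) * c + alpha * t for c, t in zip(cov_row, target_row)]
            for cov_row, target_row in zip(cov, target)]

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# Two samples in two dimensions: too few points, so the estimate is singular.
points = [(0.0, 0.0), (2.0, 4.0)]
cov = sample_covariance(points)
identity = [[1.0, 0.0], [0.0, 1.0]]
shrunk = shrink(cov, identity, alpha=0.1)
print(det2(cov), det2(shrunk))
```

In an EM loop, a step of this kind would plausibly replace the raw covariance update in the M-step for each mixture component; how the shrinkage weight is chosen is outside what the abstract states.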
# FEMDA: Une méthode de classification robuste et flexible ( http://arxiv.org/abs/2307.01954v1 ) License: see link	Pierre Houdouin and Matthieu Jonckheere and Frederic Pascal	Linear and Quadratic Discriminant Analysis (LDA and QDA) are well-known classical methods but can heavily suffer from non-Gaussian distributions and/or contaminated datasets, mainly because of the underlying Gaussian assumption that is not robust. This paper studies the robustness to scale changes in the data of a new discriminant analysis technique where each data point is drawn by its own arbitrary Elliptically Symmetrical (ES) distribution and its own arbitrary scale parameter. Such a model allows for possibly very heterogeneous, independent but non-identically distributed samples. The new decision rule derived is simple, fast, and robust to scale changes in the data compared to other state-of-the-art methods.	Translated: 2023-07-06 15:31:42 Published: 2023-07-04
# Toward more frugal models for functional cerebral networks automatic recognition with resting-state fMRI ( http://arxiv.org/abs/2307.01953v1 ) License: see link	Lukman Ismaila, Pejman Rasti, Jean-Michel Lemée, David Rousseau	We refer to a machine learning situation where models based on classical convolutional neural networks have shown good performance. We are investigating different encoding techniques in the form of supervoxels, and then graphs, to reduce the complexity of the model while tracking the loss of performance. This approach is illustrated on a recognition task of resting-state functional networks for patients with brain tumors. Graphs encoding supervoxels preserve activation characteristics of functional brain networks from images and reduce the number of model parameters by a factor of 26 while maintaining CNN model performance.	Translated: 2023-07-06 15:31:30 Published: 2023-07-04
# SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis ( http://arxiv.org/abs/2307.01952v1 ) License: see link	Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, Robin Rombach	We present SDXL, a latent diffusion model for text-to-image synthesis. Compared to previous versions of Stable Diffusion, SDXL leverages a three times larger UNet backbone: The increase of model parameters is mainly due to more attention blocks and a larger cross-attention context as SDXL uses a second text encoder. We design multiple novel conditioning schemes and train SDXL on multiple aspect ratios. We also introduce a refinement model which is used to improve the visual fidelity of samples generated by SDXL using a post-hoc image-to-image technique. We demonstrate that SDXL shows drastically improved performance compared to previous versions of Stable Diffusion and achieves results competitive with those of black-box state-of-the-art image generators. In the spirit of promoting open research and fostering transparency in large model training and evaluation, we provide access to code and model weights at https://github.com/Stability-AI/generative-models	Translated: 2023-07-06 15:31:17 Published: 2023-07-04
# Neural 3D Scene Reconstruction from Multiple 2D Images without 3D Supervision ( http://arxiv.org/abs/2306.17643v3 ) License: see link	Yi Guo, Che Sun, Yunde Jia, and Yuwei Wu	Neural 3D scene reconstruction methods have achieved impressive performance when reconstructing complex geometry and low-textured regions in indoor scenes. However, these methods heavily rely on 3D data which is costly and time-consuming to obtain in the real world. In this paper, we propose a novel neural reconstruction method that reconstructs scenes using sparse depth under the plane constraints without 3D supervision. We introduce a signed distance function field, a color field, and a probability field to represent a scene. We optimize these fields to reconstruct the scene by using differentiable ray marching with accessible 2D images as supervision. We improve the reconstruction quality of complex geometry scene regions with sparse depth obtained by using the geometric constraints. The geometric constraints project 3D points on the surface to similar-looking regions with similar features in different 2D images. We impose the plane constraints to make large planes parallel or vertical to the indoor floor. Both constraints help reconstruct accurate and smooth geometry structures of the scene. Without 3D supervision, our method achieves competitive performance compared with existing methods that use 3D supervision on the ScanNet dataset.	Translated: 2023-07-06 10:52:22 Published: 2023-07-04
# A unified logical framework for explanations in classifier systems ( http://arxiv.org/abs/2105.14452v7 ) License: see link	Xinghan Liu and Emiliano Lorini	Recent years have witnessed a renewed interest in Boolean functions for explaining binary classifiers in the field of explainable AI (XAI). The standard approach to Boolean functions is propositional logic. We present a modal language of a ceteris paribus nature which supports reasoning about binary input classifiers and their properties. We study a family of classifier models, axiomatize it as two proof systems regarding the cardinality of the language and show completeness of our axiomatics. Moreover, we prove that the satisfiability checking problem for our modal language is NEXPTIME-complete in the infinite-variable case, while it becomes polynomial in the finite-variable case. We furthermore identify an interesting NP fragment of our language in the infinite-variable case. We leverage the language to formalize counterfactual conditionals as well as a variety of notions of explanation including abductive, contrastive and counterfactual explanations, and biases. Finally, we present two extensions of our language: a dynamic extension by the notion of assignment enabling classifier change and an epistemic extension in which the classifier's uncertainty about the actual input can be represented.	Translated: 2023-07-06 10:51:25 Published: 2023-07-04
# Improving Language Plasticity via Pretraining with Active Forgetting ( http://arxiv.org/abs/2307.01163v2 ) License: see link	Yihong Chen, Kelly Marchisio, Roberta Raileanu, David Ifeoluwa Adelani, Pontus Stenetorp, Sebastian Riedel, Mikel Artetxe	Pretrained language models (PLMs) are today the primary model for natural language processing. Despite their impressive downstream performance, it can be difficult to apply PLMs to new languages, a barrier to making their capabilities universally accessible. While prior work has shown it possible to address this issue by learning a new embedding layer for the new language, doing so is both data and compute inefficient. We propose to use an active forgetting mechanism during pretraining, as a simple way of creating PLMs that can quickly adapt to new languages. Concretely, by resetting the embedding layer every K updates during pretraining, we encourage the PLM to improve its ability to learn new embeddings within a limited number of updates, similar to a meta-learning effect. Experiments with RoBERTa show that models pretrained with our forgetting mechanism not only demonstrate faster convergence during language adaptation but also outperform standard ones in a low-data regime, particularly for languages that are distant from English.	Translated: 2023-07-06 10:49:14 Published: 2023-07-04
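The active forgetting abstract above resets the embedding layer every K updates during pretraining while the rest of the model keeps training. The control flow can be sketched as follows. This toy uses plain lists and a placeholder update in place of a real model and optimizer, so every name, size and constant here is an assumption rather than the authors' implementation.

```python
import random

def pretrain_with_active_forgetting(num_updates, reset_every,
                                    embedding_size=4, seed=0):
    """Toy loop: embeddings are re-initialized every `reset_every` updates,
    while the body parameters are never reset."""
    rng = random.Random(seed)

    def fresh_embeddings():
        return [rng.gauss(0.0, 0.02) for _ in range(embedding_size)]

    embeddings = fresh_embeddings()
    body = [0.0] * embedding_size
    resets = 0
    for step in range(1, num_updates + 1):
        # Placeholder for a real gradient step on both parameter groups.
        embeddings = [w + 0.1 for w in embeddings]
        body = [w + 0.1 for w in body]
        if step % reset_every == 0:
            embeddings = fresh_embeddings()  # "forget" the embedding layer
            resets += 1
    return embeddings, body, resets

embeddings, body, resets = pretrain_with_active_forgetting(10, reset_every=3)
print(resets, body[0])
```

The body accumulates all ten updates while the embeddings only carry what was learned since the last reset, which is the asymmetry the abstract relies on to make later embedding relearning cheap.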
# Microwave Gaussian quantum sensing with a CNOT gate receiver ( http://arxiv.org/abs/2307.01014v2 ) License: see link	Hany Khalifa, Kirill Petrovnin, Riku Jäntti, Gheorghe Sorin Paraoanu	In quantum illumination (QI) the non-classical correlations between continuous variable (CV) entangled modes of radiation are exploited to detect the presence of a target embedded in thermal noise. The extreme environment where QI outperforms its optimal classical counterpart suggests that applications in the microwave domain would benefit the most from this new sensing paradigm. However, all the proposed QI receivers rely on ideal photon counters or detectors, which are not currently feasible in the microwave domain. Here we propose a new QI receiver that utilizes a CV controlled-NOT (CNOT) gate in order to perform a joint measurement on a target return and its retained twin. Unlike other QI receivers, the entire detection process is carried out by homodyne measurements and square-law detectors. The receiver exploits two squeezed ancillary modes as a part of the gate's operation. These extra resources are prepared offline and their overall gain is controlled passively by a single beamsplitter parameter. We compare our model to other QI receivers and demonstrate its operation regime where it outperforms others and achieves optimal performance. Although the main focus of this study is microwave quantum sensing applications, our proposed device can be built as well in the optical domain, thus rendering it as a new addition to the quantum sensing toolbox in a wider sense.	Translated: 2023-07-06 10:48:42 Published: 2023-07-04
# CardiGraphormer:創薬革命における自己指導型学習の力 CardiGraphormer: Unveiling the Power of Self-Supervised Learning in Revolutionizing Drug Discovery ( http://arxiv.org/abs/2307.00859v2 ) ライセンス: Link先を確認	Abhijit Gupta and Arnab Mukherjee	(参考訳) 約15,000の既知の薬物と約4,200の承認がある薬発見の世界では、化学空間の組合せの性質は極めて困難である。人工知能(AI)は強力な同盟国として登場したが、従来のAIフレームワークは大きなハードルに直面している。この原稿では、自己教師付き学習(SSL)、グラフニューラルネットワーク(GNN)、薬物発見に革命を起こすためのカルディナリティ保存注意を相乗化するための画期的なアプローチであるCardiGraphormerを紹介している。グラフマーと枢機卿の新たな組み合わせであるcardigraphormerはsslを利用して強力な分子表現を学習し、gnnを使って分子指紋を抽出し、計算時間を短縮しながら予測性能と解釈性を向上させる。分子構造のような複雑なデータを処理し、ノード、ノードのペア、サブグラフ、グラフ構造全体に関連するタスクを実行する。 CardiGraphormerによる薬物発見と薬物相互作用の潜在的な応用は、新しい薬物標的の同定から薬物と薬物の相互作用の予測、新しい薬物発見の実現まで幅広い。この革新的なアプローチは、薬物開発においてAIによって強化された方法論を提供し、SSLとGNNを組み合わせて既存の制限を克服し、薬物発見における膨大な組合せ化学空間をより深く探求する道を開く。 In the expansive realm of drug discovery, with approximately 15,000 known drugs and only around 4,200 approved, the combinatorial nature of the chemical space presents a formidable challenge. While Artificial Intelligence (AI) has emerged as a powerful ally, traditional AI frameworks face significant hurdles. This manuscript introduces CardiGraphormer, a groundbreaking approach that synergizes self-supervised learning (SSL), Graph Neural Networks (GNNs), and Cardinality Preserving Attention to revolutionize drug discovery. CardiGraphormer, a novel combination of Graphormer and Cardinality Preserving Attention, leverages SSL to learn potent molecular representations and employs GNNs to extract molecular fingerprints, enhancing predictive performance and interpretability while reducing computation time. It excels in handling complex data like molecular structures and performs tasks associated with nodes, pairs of nodes, subgraphs, or entire graph structures. CardiGraphormer's potential applications in drug discovery and drug interactions are vast, from identifying new drug targets to predicting drug-to-drug interactions and enabling novel drug discovery. This innovative approach provides an AI-enhanced methodology in drug development, utilizing SSL combined with GNNs to overcome existing limitations and pave the way for a richer exploration of the vast combinatorial chemical space in drug discovery.	翻訳日:2023-07-06 10:48:12 公開日:2023-07-04
# SketchMetaFace: 高忠実度3次元顔モデリングのための学習ベースのスケッチインタフェース SketchMetaFace: A Learning-based Sketching Interface for High-fidelity 3D Character Face Modeling ( http://arxiv.org/abs/2307.00804v2 ) ライセンス: Link先を確認	Zhongjin Luo, Dong Du, Heming Zhu, Yizhou Yu, Hongbo Fu, Xiaoguang Han	(参考訳) 3Dアバターのモデリングは、AR/VR、ゲーム、撮影といった様々なアプリケーションシナリオに役立つ。キャラクターの顔は、アバターの重要な構成要素として重要な多様性と鮮度をもたらす。しかし、3Dキャラクタフェイスモデルの構築には、経験豊富なアーティストであっても、商用ツールによる重い作業が必要になる。既存のスケッチベースの様々なツールは、多様な顔の形と豊富な幾何学的詳細をモデル化するアマチュアをサポートするのに失敗する。本稿では,素人ユーザを対象としたスケッチシステムであるSketchMetaFaceについて紹介する。ユーザインタフェースと基礎となるアルゴリズムの両方を慎重に設計する。第一に、顔の細部を彫る制御性を高めるために、曲率アウェア・ストロークが採用されている。第二に、2Dスケッチマップを3Dモデルにマッピングする鍵となる問題を考えると、「Implicit and Depth Guided Mesh Modeling」(IDGMM)と呼ばれる新しい学習手法を開発する。メッシュ、暗黙、深度表現の利点を融合させ、高い効率で高品質な結果を達成する。さらに,ユーザビリティをさらに支援するために,粗い2次元スケッチインタフェース設計とデータ駆動ストローク提案ツールを提案する。ユーザスタディは、使いやすさと結果の視覚的な品質の観点から、既存のモデリングツールよりも優れたシステムを示します。実験により、IDGMMは精度と効率のトレードオフがより良くなることが示された。 SketchMetaFaceはhttps://zhongjinluo.github.io/SketchMetaFace/で入手できる。 Modeling 3D avatars benefits various application scenarios such as AR/VR, gaming, and filming. Character faces contribute significant diversity and vividness as a vital component of avatars. However, building 3D character face models usually requires a heavy workload with commercial tools, even for experienced artists. Various existing sketch-based tools fail to support amateurs in modeling diverse facial shapes and rich geometric details. In this paper, we present SketchMetaFace - a sketching system targeting amateur users to model high-fidelity 3D faces in minutes. We carefully design both the user interface and the underlying algorithm. First, curvature-aware strokes are adopted to better support the controllability of carving facial details. Second, considering the key problem of mapping a 2D sketch map to a 3D model, we develop a novel learning-based method termed "Implicit and Depth Guided Mesh Modeling" (IDGMM). It fuses the advantages of mesh, implicit, and depth representations to achieve high-quality results with high efficiency. In addition, to further support usability, we present a coarse-to-fine 2D sketching interface design and a data-driven stroke suggestion tool. User studies demonstrate the superiority of our system over existing modeling tools in terms of ease of use and visual quality of results. Experimental analyses also show that IDGMM reaches a better trade-off between accuracy and efficiency. SketchMetaFace is available at https://zhongjinluo.github.io/SketchMetaFace/.	翻訳日:2023-07-06 10:47:43 公開日:2023-07-04
# ジョイントベル計測による可変量子固有解法高速化 Accelerated variational quantum eigensolver with joint Bell measurement ( http://arxiv.org/abs/2307.00766v2 ) ライセンス: Link先を確認	Chenfeng Cao, Hiroshi Yano, Yuya O. Nakagawa	(参考訳) 変分量子固有解法(VQE)は、量子化学において分子ハミルトニアンの基底状態を得るために、短期量子コンピュータのための顕著な量子古典ハイブリッドアルゴリズムである。しかし、ハミルトニアンにおけるパウリ作用素の非可換性のため、量子コンピュータに要求される測定量は、システムのサイズが大きくなるにつれて著しく増加し、VQEの実用的な応用を妨げる可能性がある。本稿では,JBM-VQE (Joint Bell Measurement VQE) と呼ばれるプロトコルを提案する。本手法では、ハミルトニアンに存在するパウリ作用素のすべての期待値の絶対値を同時に測定できるジョイントベル測定器を用いる。最適化の過程では、jbm-vqeはジョイントベル測定により各イテレーション毎のポーリ演算子の期待値の絶対値を推定するが、それらの符号は従来の方法による期待値の測定ではより少ない頻度で測定される。我々のアプローチは、最適化中に標識が頻繁に変化しないという経験的観察に基づいている。小分子の分子ハミルトニアン基底状態を求める数値シミュレーションによる従来のVQEと比較して、JBM-VQEの高速化と、最適化の初期段階におけるJBM-VQEの高速化は、大規模システムではますます顕著になっている。共同ベル測定に基づくアプローチは、VQEに限らず、コスト関数が多くのパウリ演算子の期待値である様々な量子アルゴリズムで利用することができる。 The variational quantum eigensolver (VQE) stands as a prominent quantum-classical hybrid algorithm for near-term quantum computers to obtain the ground states of molecular Hamiltonians in quantum chemistry. However, due to the non-commutativity of the Pauli operators in the Hamiltonian, the number of measurements required on quantum computers increases significantly as the system size grows, which may hinder practical applications of VQE. In this work, we present a protocol termed joint Bell measurement VQE (JBM-VQE) to reduce the number of measurements and speed up the VQE algorithm. Our method employs joint Bell measurements, enabling the simultaneous measurement of the absolute values of all expectation values of Pauli operators present in the Hamiltonian. In the course of the optimization, JBM-VQE estimates the absolute values of the expectation values of the Pauli operators for each iteration by the joint Bell measurement, while the signs of them are measured less frequently by the conventional method to measure the expectation values. Our approach is based on the empirical observation that the signs do not often change during optimization. We illustrate the speed-up of JBM-VQE compared to conventional VQE by numerical simulations for finding the ground states of molecular Hamiltonians of small molecules, and the speed-up of JBM-VQE at the early stage of the optimization becomes increasingly pronounced in larger systems. Our approach based on the joint Bell measurement is not limited to VQE and can be utilized in various quantum algorithms whose cost functions are expectation values of many Pauli operators.	翻訳日:2023-07-06 10:47:21 公開日:2023-07-04
# 予測符号化と不確かさ最小化によるアクティブセンシング Active Sensing with Predictive Coding and Uncertainty Minimization ( http://arxiv.org/abs/2307.00668v2 ) ライセンス: Link先を確認	Abdelrahman Sharafeldin, Nabil Imam, Hannah Choi	(参考訳) 本稿では,生物にインスパイアされた2つの計算,予測符号化と不確実性最小化に基づくエンドツーエンド探索手法を提案する。この手順は、タスクに依存しない本質的に駆動された方法で、任意の探索設定に適用することができる。まず,mazeナビゲーションタスクで提案手法を実証し,基礎となる遷移分布を発見し,環境の空間的特徴を再構築できることを示す。第2に,エージェントが情報を収集するために,その視覚環境を積極的にサンプリングする必要があるアクティブビジョンのより複雑なタスクに,このモデルを適用する。我々のモデルは教師なしの表現を構築でき、センサのシーンを積極的にサンプリングし、効率的に分類できることを示す。さらに,これらの表現を下流分類の入力として用いると,他のベースラインと比較してデータ効率と学習速度が向上すると同時に,パラメータの複雑さも低下することを示した。最後に、モデルのモジュラリティにより、内部メカニズムを分析し、探索行動中の知覚と行動の相互作用についての洞察を導き出すことができる。 We present an end-to-end procedure for embodied exploration based on two biologically inspired computations: predictive coding and uncertainty minimization. The procedure can be applied to any exploration setting in a task-independent and intrinsically driven manner. We first demonstrate our approach in a maze navigation task and show that our model is capable of discovering the underlying transition distribution and reconstructing the spatial features of the environment. Second, we apply our model to the more complex task of active vision, where an agent must actively sample its visual environment to gather information. We show that our model is able to build unsupervised representations that allow it to actively sample and efficiently categorize sensory scenes. We further show that using these representations as input for downstream classification leads to superior data efficiency and learning speed compared to other baselines, while also maintaining lower parameter complexity. Finally, the modularity of our model allows us to analyze its internal mechanisms and to draw insight into the interactions between perception and action during exploratory behavior.	翻訳日:2023-07-06 10:46:27 公開日:2023-07-04
# ディープニューラルネットワークのためのスパーシティアウェア一般化理論 Sparsity-aware generalization theory for deep neural networks ( http://arxiv.org/abs/2307.00426v2 ) ライセンス: Link先を確認	Ramchandran Muthukumar, Jeremias Sulam	(参考訳) 深層人工ニューラルネットワークは、未理解のままの驚くべき一般化能力を達成する。本稿では,隠れ層アクティベーションにおいて達成される疎度を生かしたディープフィードフォワードReLUネットワークの一般化を解析するための新しいアプローチを提案する。各入力サンプルの有効なモデルサイズを削減したフレームワークを開発することで、スパーシティと一般化の間の根本的なトレードオフを示すことができる。重要なことは、この結果がモデルによって達成される疎度について強い仮定をしていないことであり、近年のノルムベースのアプローチよりも改善されている。過度にパラメータ化されたモデルであっても、特定の設定においてデータ依存の先行値と組み合わせて非空き境界を示す。 Deep artificial neural networks achieve surprising generalization abilities that remain poorly understood. In this paper, we present a new approach to analyzing generalization for deep feed-forward ReLU networks that takes advantage of the degree of sparsity that is achieved in the hidden layer activations. By developing a framework that accounts for this reduced effective model size for each input sample, we are able to show fundamental trade-offs between sparsity and generalization. Importantly, our results make no strong assumptions about the degree of sparsity achieved by the model, and it improves over recent norm-based approaches. We illustrate our results numerically, demonstrating non-vacuous bounds when coupled with data-dependent priors in specific settings, even in over-parametrized models.	翻訳日:2023-07-06 10:46:12 公開日:2023-07-04
# スコア正規化を用いたCNNに基づく人物再識別の改善 Improving CNN-based Person Re-identification using score Normalization ( http://arxiv.org/abs/2307.00397v2 ) ライセンス: Link先を確認	Ammar Chouchane, Abdelmalik Ouamane, Yassine Himeur, Wathiq Mansoor, Shadi Atalla, Afaf Benzaibak and Chahrazed Boudellal	(参考訳) 個人再識別(PRe-ID)は、セキュリティ、監視、小売分析において重要な課題であり、複数のカメラやビューにまたがる個人を特定することである。しかし、照明・背景・視点の変化により困難な課題となっている。 PRe-IDシステムの成功には,効率的な特徴抽出とメートル法学習アルゴリズムが不可欠である。本稿では,畳み込みニューラルネットワーク(cnn)に基づく特徴抽出法と,xqda(cross-view quadratic discriminant analysis)を併用した,メトリック学習のための新しい手法を提案する。また、マハラノビス距離とスコア正規化処理を用いてカメラスコア間の不整合に対処するマッチングアルゴリズムを実装した。提案手法は, VIPeR, GRID, CUHK01, PRID450Sの4つの挑戦的データセットで検証し, 有望な結果を得た。例えば、GRID、CUHK01、VIPeR、PRID450Sデータセットのランク-20の精度は61.92%、83.90%、92.03%、96.22%であったが、スコア正規化後にそれぞれ64.64%、89.30%、92.78%、98.76%に増加した。したがって、4つの挑戦的データセットの有望な結果は、提案手法の有効性を示している。 Person re-identification (PRe-ID) is a crucial task in security, surveillance, and retail analysis, which involves identifying an individual across multiple cameras and views. However, it is a challenging task due to changes in illumination, background, and viewpoint. Efficient feature extraction and metric learning algorithms are essential for a successful PRe-ID system. This paper proposes a novel approach for PRe-ID, which combines a Convolutional Neural Network (CNN) based feature extraction method with Cross-view Quadratic Discriminant Analysis (XQDA) for metric learning. Additionally, a matching algorithm that employs Mahalanobis distance and a score normalization process to address inconsistencies between camera scores is implemented. The proposed approach is tested on four challenging datasets, including VIPeR, GRID, CUHK01, and PRID450S, and promising results are obtained. For example, without normalization, the rank-20 rate accuracies of the GRID, CUHK01, VIPeR and PRID450S datasets were 61.92%, 83.90%, 92.03%, 96.22%; however, after score normalization, they have increased to 64.64%, 89.30%, 92.78%, and 98.76%, respectively. Accordingly, the promising results on four challenging datasets indicate the effectiveness of the proposed approach.	翻訳日:2023-07-06 10:45:59 公開日:2023-07-04
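The score normalization step described in the person re-identification abstract above can be illustrated with a minimal sketch. The abstract does not say which normalization the authors use, so the z-score scheme, the function names, and the sample distances below are all assumptions made for illustration only.

```python
import math


def z_normalize(scores):
    """Rescale a list of scores to zero mean and unit standard deviation.

    Assumption: z-score normalization; the paper may use a different scheme.
    """
    mean = sum(scores) / len(scores)
    variance = sum((s - mean) ** 2 for s in scores) / len(scores)
    std = math.sqrt(variance)
    if std == 0.0:
        # All scores are identical, so there is nothing to rescale.
        return [0.0] * len(scores)
    return [(s - mean) / std for s in scores]


def rank_gallery(distances):
    """Return gallery indices ordered from smallest to largest distance."""
    return sorted(range(len(distances)), key=lambda i: distances[i])


# Hypothetical probe-to-gallery distances from two cameras with different scales.
camera_a = [12.0, 30.0, 18.0, 24.0]
camera_b = [0.4, 1.0, 0.6, 0.8]

normalized_a = z_normalize(camera_a)
normalized_b = z_normalize(camera_b)
```

After normalization the two score lists live on the same scale, so they can be compared or fused, while the ranking within each camera is unchanged.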

# 障害と自己効力:OSSコースが学生の知覚に及ぼす影響に関する大規模研究

Barriers and Self-Efficacy: A Large-Scale Study on the Impact of OSS Courses on Student Perceptions ( http://arxiv.org/abs/2304.14628v2 )

ライセンス: Link先を確認

Larissa Salerno, Simone de França Tonhão, Igor Steinmacher, Christoph Treude

(参考訳) オープンソースソフトウェア(OSS)開発は、ソフトウェア工学の学生が大規模ソフトウェア開発を経験し、参加するユニークな機会を提供するが、そのようなコースが学生の自己効力や学生が直面する課題に与える影響はよく分かっていない。本稿は,異なる国の大学におけるoss開発コースの複数事例からのデータを分析し,授業の結果として学生の自己効力がどう変化したか,学生が直面する障壁や課題を報告することで,このギャップに対処することを目的とする。

Open source software (OSS) development offers a unique opportunity for students in Software Engineering to experience and participate in large-scale software development; however, the impact of such courses on students' self-efficacy and the challenges faced by students are not well understood. This paper aims to address this gap by analyzing data from multiple instances of OSS development courses at universities in different countries and reporting on how students' self-efficacy changed as a result of taking the course, as well as the barriers and challenges faced by students.

翻訳日:2023-10-24 12:26:33 公開日:2023-07-04

# ソフトウェアアーキテクチャ情報のためのクエリ言語(拡張版)

A Query Language for Software Architecture Information (Extended version) ( http://arxiv.org/abs/2306.16829v2 )

ライセンス: Link先を確認

Joshua Ammermann, Sven Jordan, Lukas Linsbauer, Ina Schaefer

(参考訳) ソフトウェアのメンテナンスは、ソフトウェアシステムのライフサイクルの重要な部分です。既存のソフトウェアシステムのメンテナンスタスクは、時間とともに変化するアーキテクチャ情報(アーキテクチャドリフト)に苦しむ。 Digital Architecture Twin (DArT)は、最新のアーキテクチャ情報を提供することで、ソフトウェアのメンテナンスをサポートする。そのため、DArTはそのような情報を収集し、ソフトウェアシステムと共進化し、継続的なリバースエンジニアリングを可能にする。しかし、利害関係者が情報を取得するための重要なリンクが欠けている。このギャップを埋めるために、私たちはArchitecture Information Query Language (AIQL)にコントリビュートしています。我々は、継続的リバースエンジニアリングの文脈で4つのアプリケーションシナリオを導出した。私たちは、aiqlがアプリケーションシナリオのクエリを定式化するために必要な機能を提供し、言語が現実世界のソフトウェアシステムで使用するためにスケールすることを示した。ユーザ調査において、利害関係者は言語を理解するのが容易であることに同意し、その価値をアプリケーションシナリオの特定のステークホルダーに評価した。

Software maintenance is an important part of a software system's life cycle. Maintenance tasks of existing software systems suffer from architecture information that is diverging over time (architectural drift). The Digital Architecture Twin (DArT) can support software maintenance by providing up-to-date architecture information. For this, the DArT gathers such information and co-evolves with a software system, enabling continuous reverse engineering. But the crucial link for stakeholders to retrieve this information is missing. To fill this gap, we contribute the Architecture Information Query Language (AIQL), which enables stakeholders to access up-to-date and tailored architecture information. We derived four application scenarios in the context of continuous reverse engineering. We showed that the AIQL provides the required functionality to formulate queries for the application scenarios and that the language scales for use with real-world software systems. In a user study, stakeholders agreed that the language is easy to understand and assessed its value to the specific stakeholder for the application scenarios.

翻訳日:2023-10-23 18:46:52 公開日:2023-07-04

# 人工知能研究に関する文献学的研究 : グローバルパノラマとインド人の出現

A Bibliographic Study on Artificial Intelligence Research: Global Panorama and Indian Appearance ( http://arxiv.org/abs/2308.00705v1 )

ライセンス: Link先を確認

Amit Tiwari, Susmita Bardhan, Vikas Kumar

(参考訳) 本研究は,2015-2020年の人工知能研究における書誌学の傾向を,書誌学研究の科学マッピング法を用いて特定し,評価する。必要なデータはscopusデータベースから収集されている。収集したデータ分析を準備するために、ツールvizの助けを借りて、必須のデータ変換を手動で行いました。オープンリファイントレンドの決定とマッピング手法の実行のために、aiのオープンアクセスのトップ5と商用ジャーナルが、citescoreによるランキングに基づいて選ばれている。本書は,分析のために所定の期間に出版された6880条を含む。このトレンドは、国別出版物、年別出版物、AIにおける話題用語、トップクワッド記事、著名な作家、主要な機関、AIとインドにおける産業の関与に基づいている。その結果, オープンアクセス雑誌と比較して, 商業雑誌の引用率が高く, 記事数も年々増加していることがわかった。さらにIEEEは、最も暗唱された出版物の84%を出版する著名な出版社である。さらに、中国と米国はAI分野における文学の主要な貢献者である。この研究は、ニューラルネットワークとディープラーニングが、トップAI研究論文に含まれる主要なトピックであることを明らかにした。近年、公共機関だけでなく民間機関もAI研究に資金を投資している。この研究は、AI研究の観点からインドの研究者の相対的な位置についても調査している。現在の仕事は、AIの初期開発、現在の立場、そして将来の方向性を理解するのに役立つ。

The present study identifies and assesses the bibliographic trend in Artificial Intelligence (AI) research for the years 2015-2020 using the science mapping method of bibliometric study. The required data has been collected from the Scopus database. To make the collected data analysis-ready, essential data transformation was performed manually and with the help of a tool, viz. OpenRefine. For determining the trend and performing the mapping techniques, the top five open access and commercial journals of AI have been chosen based on their CiteScore-driven ranking. The work includes 6880 articles published in the specified period for analysis. The trend is based on country-wise publications, year-wise publications, topical terms in AI, top-cited articles, prominent authors, major institutions, involvement of industries in AI, and Indian appearance. The results show that, compared to open access journals, commercial journals have a higher CiteScore and a larger number of articles published over the years. Additionally, IEEE is the prominent publisher, publishing 84% of the top-cited publications. Further, China and the United States are the major contributors to literature in the AI domain. The study reveals that neural networks and deep learning are the major topics included in top AI research publications. Recently, not only public institutions but also private bodies have been investing their resources in AI research. The study also investigates the relative position of Indian researchers in terms of AI research. The present work helps in understanding the initial development, current standing, and future direction of AI.

翻訳日:2023-08-06 11:02:33 公開日:2023-07-04

# 厳密な低ランク制約最適化 --漸近的に$\mathcal{O}(\frac{1}{t^2})$法

Strictly Low Rank Constraint Optimization -- An Asymptotically $\mathcal{O}(\frac{1}{t^2})$ Method ( http://arxiv.org/abs/2307.14344v1 )

ライセンス: Link先を確認

Mengyuan Zhang and Kai Liu

(参考訳) 最適解のスパーシリティを促進するために, \textit{rank} 正則化を伴う非凸および非スムース問題のクラスについて検討した。本稿では,中間更新の特異値に対する新しいサポートセットプロジェクション演算により,問題を解くための近似勾配降下法を適用し,プロセスの高速化を提案する。我々のアルゴリズムは、滑らかで凸な問題に対する一階法に対するネステロフの最適収束率と全く同じ$O(\frac{1}{t^2})$の収束率を達成することができることを示す。厳密な間隔が期待でき、各更新における特異値のサポートセットは単調に縮小し、私たちの知る限り、運動量に基づくアルゴリズムでは新しくなっている。

We study a class of non-convex and non-smooth problems with rank regularization to promote sparsity in the optimal solution. We propose to apply the proximal gradient descent method to solve the problem and accelerate the process with a novel support set projection operation on the singular values of the intermediate update. We show that our algorithms are able to achieve a convergence rate of $O(\frac{1}{t^2})$, which is exactly the same as Nesterov's optimal convergence rate for first-order methods on smooth and convex problems. Strict sparsity can be expected, and the support set of singular values during each update is monotonically shrinking, which, to the best of our knowledge, is novel in momentum-based algorithms.

翻訳日:2023-07-30 03:57:01 公開日:2023-07-04
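For orientation, a generic accelerated proximal gradient iteration for a rank-regularized objective can be written as below. This is the textbook form, assuming the objective is $f(X) + \lambda\,\mathrm{rank}(X)$ with $f$ smooth and a fixed step size $\eta$. The symbols $f$, $\lambda$, $\eta$, $Y_t$ and $Z_t$ are introduced here for illustration and do not come from the abstract, and the paper's support set projection step is not reproduced.

```latex
\begin{aligned}
Y_t &= X_t + \frac{t-1}{t+2}\,(X_t - X_{t-1}), \\
Z_t &= Y_t - \eta\,\nabla f(Y_t) = U\,\mathrm{diag}(\sigma_1,\dots,\sigma_n)\,V^\top, \\
X_{t+1} &= \operatorname{prox}_{\eta\lambda\,\mathrm{rank}}(Z_t)
         = U\,\mathrm{diag}(\tilde\sigma_1,\dots,\tilde\sigma_n)\,V^\top,
\qquad
\tilde\sigma_i =
\begin{cases}
\sigma_i & \text{if } \sigma_i > \sqrt{2\eta\lambda}, \\
0 & \text{otherwise.}
\end{cases}
\end{aligned}
```

The last line is hard thresholding of the singular values, which is where strictly low rank (rather than merely small singular values) comes from in this kind of scheme.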

# GenRec: ジェネレーティブレコメンデーションのための大規模言語モデル

GenRec: Large Language Model for Generative Recommendation ( http://arxiv.org/abs/2307.00457v2 )

ライセンス: Link先を確認

Jianchao Ji, Zelong Li, Shuyuan Xu, Wenyue Hua, Yingqiang Ge, Juntao Tan, Yongfeng Zhang

(参考訳) 近年,多種多様な自然言語処理タスクのための強力なツールとして,大規模言語モデル (LLM) が登場している。しかし、ジェネレーティブ・レコメンデーション・パラダイムの下でのレコメンデーション・システムへの可能性は比較的未定である。本稿では,テキストデータに基づく大規模言語モデル(LLM)を用いたレコメンデーションシステムに対する革新的なアプローチを提案する。本稿では, LLMの表現力を利用して, 従来の差別的推薦として, 各候補項目のランキングスコアを1つずつ計算するのではなく, 推薦対象項目を直接生成する新しいジェネレーティブレコメンデーション(GenRec)を提案する。 GenRecはLLMの理解機能を使ってコンテキストを解釈し、ユーザの好みを学習し、関連するレコメンデーションを生成する。提案手法は,大規模言語モデルに符号化された膨大な知識を活用して推薦課題を遂行する。まず,レコメンデーションタスクの理解能力を高めるための特別なプロンプトを定式化する。その後、これらのプロンプトを用いてLLaMAのバックボーンLLMをテキストデータで表されるユーザとイテムの相互作用のデータセット上で微調整し、ユーザの好みやアイテムの特徴をキャプチャする。本研究は,レコメンデーションシステムの領域を変革する上で,LLMに基づくジェネレーティブレコメンデーションの可能性を明らかにし,今後の探究の基盤となる枠組みを提供する。ベンチマークデータセットを広範囲に実験した結果,我々のジャンルは大規模データセットよりも優れた結果が得られることが示された。

In recent years, large language models (LLMs) have emerged as powerful tools for diverse natural language processing tasks. However, their potential for recommender systems under the generative recommendation paradigm remains relatively unexplored. This paper presents an innovative approach to recommendation systems using LLMs based on text data. We present a novel LLM for generative recommendation (GenRec) that utilizes the expressive power of LLMs to directly generate the target item to recommend, rather than calculating a ranking score for each candidate item one by one as in traditional discriminative recommendation. GenRec uses the LLM's understanding ability to interpret context, learn user preferences, and generate relevant recommendations. Our proposed approach leverages the vast knowledge encoded in large language models to accomplish recommendation tasks. We first formulate specialized prompts to enhance the ability of the LLM to comprehend recommendation tasks. Subsequently, we use these prompts to fine-tune the LLaMA backbone LLM on a dataset of user-item interactions, represented by textual data, to capture user preferences and item characteristics. Our research underscores the potential of LLM-based generative recommendation in revolutionizing the domain of recommendation systems and offers a foundational framework for future explorations in this field. We conduct extensive experiments on benchmark datasets, and the experiments show that GenRec achieves significantly better results on large datasets.

翻訳日:2023-07-16 04:18:29 公開日:2023-07-04
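As a rough illustration of the prompt formulation step in the GenRec abstract, the sketch below turns a user's interaction history into an instruction-style prompt paired with the target item as the completion. The template wording, the section headings, the function names, and the sample titles are hypothetical; the abstract does not give the actual prompts.

```python
DEFAULT_INSTRUCTION = (
    "Given the items a user has interacted with, "
    "generate the next item to recommend."
)


def build_prompt(history, instruction=DEFAULT_INSTRUCTION):
    """Format an interaction history as an instruction-style prompt.

    Assumption: this template is illustrative, not the one used by GenRec.
    """
    lines = ["### Instruction:", instruction, "", "### Input:"]
    lines.extend(f"{i}. {title}" for i, title in enumerate(history, start=1))
    lines.extend(["", "### Response:"])
    return "\n".join(lines)


def build_training_example(history, target):
    """Pair the prompt with the item the model should generate."""
    return {"prompt": build_prompt(history), "completion": " " + target}


# Hypothetical interaction history and target item.
example = build_training_example(["Toy Story", "Finding Nemo"], "Up")
```

Because the target is generated as text, no score has to be computed per candidate item, which is the contrast with discriminative ranking that the abstract draws.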

# トーキングヘッド生成における音声・ダイナミクス同期の総合的マルチスケールアプローチ

A Comprehensive Multi-scale Approach for Speech and Dynamics Synchrony in Talking Head Generation ( http://arxiv.org/abs/2307.03270v1 )

ライセンス: Link先を確認

Louis Airale (UGA, LIG), Dominique Vaufreydaz (LIG), Xavier Alameda-Pineda (UGA)

(参考訳) 音声入力信号を用いた静止画像の深部生成モデルによるアニメーション化は活発な研究課題であり,最近の重要な進展が見られる。しかし、頭の動きと音声の音声と視覚の相関はさておき、自然な頭の動きの発生は無視されることが多いため、唇の同期やレンダリングの質に多くの努力が注がれている。本研究では,頭部と唇のダイナミックスと音声の短期的・長期的相関をよりよく扱うために,マルチスケールの音声-視覚同期損失とマルチスケールの自己回帰的GANを提案する。特に、マルチモーダルな入力ピラミッド上でシンセサイザーモデルのスタックをトレーニングし、これらのモデルをマルチスケールジェネレータネットワークのガイダンスとして使用し、多様な時間スケールに展開するオーディオアライメント動作を生成する。我々のジェネレータは、標準的な低次元の頭部表現である顔のランドマーク領域で動作する。実験により,頭部運動のダイナミックス品質,およびランドマーク領域と画像領域の両方におけるマルチスケールオーディオ-視覚同期における技術の現状が大幅に改善された。

Animating still face images with deep generative models using a speech input signal is an active research topic and has seen important recent progress. However, much of the effort has been put into lip syncing and rendering quality while the generation of natural head motion, let alone the audio-visual correlation between head motion and speech, has often been neglected. In this work, we propose a multi-scale audio-visual synchrony loss and a multi-scale autoregressive GAN to better handle short and long-term correlation between speech and the dynamics of the head and lips. In particular, we train a stack of syncer models on multimodal input pyramids and use these models as guidance in a multi-scale generator network to produce audio-aligned motion unfolding over diverse time scales. Our generator operates in the facial landmark domain, which is a standard low-dimensional head representation. The experiments show significant improvements over the state of the art in head motion dynamics quality and in multi-scale audio-visual synchrony both in the landmark domain and in the image domain.

翻訳日:2023-07-16 04:14:36 公開日:2023-07-04

# Whisperを用いた教育用ビデオの翻訳:AIを用いた教育用ビデオの翻訳に関する予備的研究

Transcribing Educational Videos Using Whisper: A preliminary study on using AI for transcribing educational videos ( http://arxiv.org/abs/2307.03200v1 )

ライセンス: Link先を確認

Ashwin Rao

(参考訳) ビデオはますますeラーニングに使われており、文字起こしは学習体験を高めるために不可欠である。書き起こし生成のコストと遅延は、自動音声認識(ASR)システムによって軽減することができる。本稿では,25の教育ビデオに対してwhisperが生成した原稿を定量化し,asrを用いて教育ビデオの書き起こしを行う際の研究の道筋を明らかにした。

Videos are increasingly being used for e-learning, and transcripts are vital to enhance the learning experience. The costs and delays of generating transcripts can be alleviated by automatic speech recognition (ASR) systems. In this article, we quantify the transcripts generated by Whisper for 25 educational videos and identify some open avenues of research when leveraging ASR for transcribing educational videos.

翻訳日:2023-07-16 04:14:16 公開日:2023-07-04
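One common way to quantify ASR transcripts is the word error rate (WER). The abstract of the Whisper study does not name its metric, so treating WER as the measure is an assumption here, and the function name and sample sentences are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # Dynamic programming over one row of the Levenshtein table at a time.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# Hypothetical reference transcript and ASR output with one dropped word.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
```

A WER of 0 means the transcript matches the reference word for word; values above 1 are possible when the hypothesis contains many insertions.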

# splitfed learningの脆弱性分析:データ中毒攻撃に対するロバスト性の評価

Analyzing the vulnerabilities in SplitFed Learning: Assessing the robustness against Data Poisoning Attacks ( http://arxiv.org/abs/2307.03197v1 )

ライセンス: Link先を確認

Aysha Thahsin Zahir Ismail, Raj Mani Shukla

(参考訳) 分散コラボレーション機械学習(DCML)は、集中型機械学習に関連するプライバシー問題に対処する潜在的な代替手段である。スプリット学習(SL)とフェデレート学習(FL)はDCMLにおける2つの効果的な学習手法である。最近、SFL(SplitFed Learning)として知られるFLとSLのハイブリッドへの関心が高まっている。この研究は、SFLにおけるデータ中毒攻撃の影響を研究し、分析し、提示する最も初期の試みである。本研究では,SFLに対する標的外,標的外,遠隔攻撃の3種類の新規攻撃戦略を提案する。攻撃戦略はすべて、DCMLベースの分類器の性能を低下させることを目的としている。提案手法は,心電図信号分類と手書き文字自動認識の2つの異なるケーススタディで検証した。悪意のあるクライアントの割合と、クライアントとサーバ間でモデル分割層を選択することで、一連の攻撃実験が行われた。攻撃戦略の包括的分析の結果は、sflの標的攻撃と比較して、非標的および距離ベースの中毒攻撃は分類結果の回避に大きな影響を与えることを明らかに示す。

Distributed Collaborative Machine Learning (DCML) is a potential alternative to address the privacy concerns associated with centralized machine learning. Split Learning (SL) and Federated Learning (FL) are two effective learning approaches in DCML. Recently, there has been increased interest in the hybrid of FL and SL known as SplitFed Learning (SFL). This research is the earliest attempt to study, analyze, and present the impact of data poisoning attacks in SFL. We propose three kinds of novel attack strategies, namely untargeted, targeted, and distance-based attacks, for SFL. All the attack strategies aim to degrade the performance of the DCML-based classifier. We test the proposed attack strategies in two different case studies, on electrocardiogram signal classification and automatic handwritten digit recognition. A series of attack experiments were conducted by varying the percentage of malicious clients and the choice of the model split layer between the clients and the server. The results of the comprehensive analysis of the attack strategies clearly convey that untargeted and distance-based poisoning attacks have a greater impact in evading the classifier outcomes than targeted attacks in SFL.

翻訳日:2023-07-16 04:14:07 公開日:2023-07-04
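The data poisoning setting in the SplitFed abstract can be made concrete with a small label-flipping sketch. The definitions below follow common usage in the poisoning literature (targeted: relabel one source class as one target class; untargeted: relabel every sample as some other class) and may differ from the paper's exact attacks. The distance-based attack is not sketched because the abstract does not describe it, and all function names are hypothetical.

```python
import random


def targeted_flip(labels, source, target):
    """Relabel every sample of class `source` as class `target`."""
    return [target if y == source else y for y in labels]


def untargeted_flip(labels, num_classes, rng):
    """Replace each label with a different class chosen uniformly at random."""
    return [
        rng.choice([c for c in range(num_classes) if c != y])
        for y in labels
    ]


def pick_malicious_clients(client_ids, fraction, rng):
    """Sample the given fraction of clients (rounded down) to act maliciously."""
    count = int(len(client_ids) * fraction)
    return set(rng.sample(client_ids, count))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
clean_labels = [0, 1, 2, 1]
targeted = targeted_flip(clean_labels, source=1, target=7)
untargeted = untargeted_flip(clean_labels, num_classes=3, rng=rng)
malicious = pick_malicious_clients(list(range(10)), fraction=0.5, rng=rng)
```

Varying `fraction` mirrors the experiments in the abstract that change the percentage of malicious clients; only the selected clients would train on the flipped labels.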

# 一貫性のある視覚合成のための協調スコア蒸留

Collaborative Score Distillation for Consistent Visual Synthesis ( http://arxiv.org/abs/2307.04787v1 )

ライセンス: Link先を確認

Subin Kim, Kyungmin Lee, June Suk Choi, Jongheon Jeong, Kihyuk Sohn, Jinwoo Shin

(参考訳) 大規模テキストと画像の拡散モデルの生成先行により、多様な視覚的モダリティに関する幅広い新しい生成および編集アプリケーションが可能になる。しかし、これらのプリエントを複数の画像(例えばビデオ)として表現される複雑な視覚モダリティに適応させる場合、一連の画像の一貫性を達成することは困難である。本稿では,この課題を協調スコア蒸留(csd)という新しい手法で解決する。 CSDはStein Variational Gradient Descent (SVGD)に基づいている。具体的には、SVGD更新において複数のサンプルを「粒子」とみなし、それらのスコア関数を組み合わせて、画像の集合を同期的に生成する。したがって、CSDは2次元画像間の情報のシームレスな統合を促進し、複数のサンプル間で一貫した視覚合成をもたらす。本研究では,パノラマ画像,ビデオ,および3dシーンのビジュアル編集を行い,様々なタスクにおけるcsdの有効性を示す。本研究は,サンプル間の整合性を向上し,テキスト・画像拡散モデルの適用性を高めるための汎用手法として,CDDの能力について述べる。

Generative priors of large-scale text-to-image diffusion models enable a wide range of new generation and editing applications on diverse visual modalities. However, when adapting these priors to complex visual modalities, often represented as multiple images (e.g., video), achieving consistency across a set of images is challenging. In this paper, we address this challenge with a novel method, Collaborative Score Distillation (CSD). CSD is based on the Stein Variational Gradient Descent (SVGD). Specifically, we propose to consider multiple samples as "particles" in the SVGD update and combine their score functions to distill generative priors over a set of images synchronously. Thus, CSD facilitates seamless integration of information across 2D images, leading to a consistent visual synthesis across multiple samples. We show the effectiveness of CSD in a variety of tasks, encompassing the visual editing of panorama images, videos, and 3D scenes. Our results underline the competency of CSD as a versatile method for enhancing inter-sample consistency, thereby broadening the applicability of text-to-image diffusion models.

翻訳日:2023-07-16 04:03:33 公開日:2023-07-04
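The Stein Variational Gradient Descent (SVGD) update that CSD builds on can be sketched on a toy problem. The example below moves a few 1-D particles toward a standard normal target with an RBF kernel. In CSD the score would come from a text-to-image diffusion model and the particles would be images, so the Gaussian score, the kernel bandwidth, and the step size here are assumptions for illustration only.

```python
import math


def svgd_step(particles, score, step_size=0.1, bandwidth=1.0):
    """Apply one SVGD update to a list of 1-D particles."""
    n = len(particles)
    updated = []
    for x_i in particles:
        phi = 0.0
        for x_j in particles:
            diff = x_j - x_i
            k = math.exp(-diff * diff / (2.0 * bandwidth ** 2))
            # Driving term: kernel-weighted scores pull x_i toward high density.
            phi += k * score(x_j)
            # Repulsive term: gradient of k(x_j, x_i) w.r.t. x_j keeps particles apart.
            phi += -diff / bandwidth ** 2 * k
        updated.append(x_i + step_size * phi / n)
    return updated


def gaussian_score(x):
    """Score (gradient of the log density) of a standard normal toy target."""
    return -x


particles = [3.0, 3.5, 4.0, 4.5]
for _ in range(500):
    particles = svgd_step(particles, gaussian_score)
```

Each particle's update mixes the scores of all other particles through the kernel, which is the sense in which the abstract speaks of combining score functions across a set of samples.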

# SleepEGAN: 睡眠段階の非バランス分類のためのGANアンサンブル深層学習モデル

SleepEGAN: A GAN-enhanced Ensemble Deep Learning Model for Imbalanced Classification of Sleep Stages ( http://arxiv.org/abs/2307.05362v1 )

ライセンス: Link先を確認

Xuewei Cheng, Ke Huang, Yi Zou and Shujie Ma

(参考訳) ディープニューラルネットワークは、強力な表現とモデル内特徴変換能力のため、自動睡眠ステージ分類において重要な役割を果たす。しかし、睡眠データの生の脳波信号に存在するクラス不均衡と個々の不均一性は、あらゆる機械学習アルゴリズムの分類性能に大きな影響を及ぼす可能性がある。そこで本研究では,この2つの問題を解決するために,睡眠ステージの不均衡分類のための生成的逆ネットワーク(gan)を用いた学習モデルsleepganを開発した。クラス不均衡を軽減するため、データ拡張のためのEEG信号の特徴に適応した新しいGANアーキテクチャ(EGAN)を提案する。マイノリティクラスの生成されたサンプルは、トレーニングプロセスで使用される。さらに,検証とテストセットの不均一性に起因するモデル推定分散を低減し,予測性能の精度とロバスト性を高めるために,コストフリーなアンサンブル学習戦略を設計する。提案手法は,3つの睡眠データセットを用いた既存手法と比較して,分類精度を向上できることを示す。

Deep neural networks have played an important role in automatic sleep stage classification because of their strong representation and in-model feature transformation abilities. However, class imbalance and individual heterogeneity, which typically exist in raw EEG signals of sleep data, can significantly affect the classification performance of any machine learning algorithm. To solve these two problems, this paper develops a generative adversarial network (GAN)-powered ensemble deep learning model, named SleepEGAN, for the imbalanced classification of sleep stages. To alleviate class imbalance, we propose a new GAN (called EGAN) architecture adapted to the features of EEG signals for data augmentation. The generated samples for the minority classes are used in the training process. In addition, we design a cost-free ensemble learning strategy to reduce the model estimation variance caused by the heterogeneity between the validation and test sets, so as to enhance the accuracy and robustness of prediction performance. We show that the proposed method can improve classification accuracy compared to several existing state-of-the-art methods using three public sleep datasets.

翻訳日:2023-07-16 03:55:40 公開日:2023-07-04

# テレビシリーズの人気をデコードする:ネットワーク分析の観点から

Decoding the Popularity of TV Series: A Network Analysis Perspective ( http://arxiv.org/abs/2307.05329v1 )

ライセンス: Link先を確認

Melody Yu

(参考訳) 本稿では,3つの人気テレビシリーズから抽出されたキャラクタネットワークを分析し,テレビ番組のキャラクタネットワークメトリクスとIMDBのレビューとの関係について検討する。キャラクターネットワーク(英: character network)とは、シーン内のキャラクターの相互作用を表すテレビ番組のプロットから生成されたグラフであり、それら間の接続の存在を示す。ノード次数やグラフ密度など各エピソードのネットワークメトリクスを算出し,これらの指標を用いてimdbのネットワークメトリクスとテレビシリーズレビューの関係を考察する。その結果,テレビシリーズにおけるキャラクターインタラクションのネットワーク指標は,テレビシリーズのレビュースコアと強い相関を示した。本研究は,テレビ制作者が視聴者にアピールする未来のエピソードのキャラクタダイナミクスの調整方法を理解する上で,より定量的な情報を提供することを目的としている。キャラクタインタラクションが視聴者のエンゲージメントや楽しみに与える影響を理解することによって、プロデューサーは番組の展開に関するインフォームドな意思決定を行うことができる。

In this paper, we analyze the character networks extracted from three popular television series and explore the relationship between a TV show episode's character network metrics and its review from IMDB. Character networks are graphs created from the plot of a TV show that represent the interactions of characters in scenes, indicating the presence of a connection between them. We calculate various network metrics for each episode, such as node degree and graph density, and use these metrics to explore the potential relationship between network metrics and TV series reviews from IMDB. Our results show that certain network metrics of character interactions in episodes have a strong correlation with the review score of TV series. Our research aims to provide more quantitative information that can help TV producers understand how to adjust the character dynamics of future episodes to appeal to their audience. By understanding the impact of character interactions on audience engagement and enjoyment, producers can make informed decisions about the development of their shows.

翻訳日:2023-07-16 03:54:07 公開日:2023-07-04
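
As a rough illustration of the two metrics named in the abstract above (node degree and graph density), the following Python sketch builds a character network from scene co-occurrence. The scene data, character names, and function names are hypothetical; the paper's own extraction pipeline is not described here, so this is only one plausible reading.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy episode: each scene lists the characters who appear in it.
scenes = [
    ["Ann", "Bob", "Cid"],
    ["Ann", "Bob"],
    ["Cid", "Dee"],
]


def build_character_network(scenes):
    """Return an undirected adjacency map; characters sharing a scene are linked."""
    adjacency = defaultdict(set)
    for scene in scenes:
        for character in scene:
            adjacency[character]  # touching the key registers the node
        for a, b in combinations(sorted(set(scene)), 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return dict(adjacency)


def node_degrees(adjacency):
    """Number of distinct neighbours of each character."""
    return {name: len(neighbours) for name, neighbours in adjacency.items()}


def graph_density(adjacency):
    """Existing edges divided by the number of possible edges."""
    n = len(adjacency)
    if n < 2:
        return 0.0
    edges = sum(len(neighbours) for neighbours in adjacency.values()) / 2
    return edges / (n * (n - 1) / 2)


network = build_character_network(scenes)
degrees = node_degrees(network)
density = graph_density(network)
```

Per-episode values such as `density` could then be correlated with the episode's review score; that step is omitted because the abstract does not specify which correlation measure was used.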

# ガルバニック皮膚反応信号の特徴選択とSVMに基づく人間の感情認識

Human Emotion Recognition Based On Galvanic Skin Response signal Feature Selection and SVM ( http://arxiv.org/abs/2307.05383v1 )

ライセンス: Link先を確認

Di Fan, Mingyang Liu, Xiaohan Zhang, Xiaopeng Gong

(参考訳) 本稿では,自動選択したGalvanic Skin Response (GSR)信号の特徴に基づく人間の感情認識手法とSVMを提案する。 GSR信号はE-Health Sensor Platform V2.0によって取得された。そして、ウェーブレット関数によってデータをデノーズし、正規化して個々の差を除去する。正規化データから30個の特徴を抽出するが、これらの特徴を直接使用すると認識率が低下する。本手法では,最適化機能を得るために,共分散に基づく特徴選択を行う。最後に、最適化された特徴を入力したSVMを用いて人間の感情認識を実現する。実験の結果,提案手法は人間の感情認識に適しており,認識精度は66.67%以上であることがわかった。

A novel human emotion recognition method based on automatically selected Galvanic Skin Response (GSR) signal features and an SVM is proposed in this paper. GSR signals were acquired with the e-Health Sensor Platform V2.0. The data are then de-noised with a wavelet function and normalized to remove individual differences. Thirty features are extracted from the normalized data; however, using these features directly leads to a low recognition rate. To obtain optimized features, a covariance-based feature selection is employed in our method. Finally, an SVM that takes the optimized features as input is used to perform human emotion recognition. The experimental results indicate that the proposed method achieves good human emotion recognition, with a recognition accuracy of more than 66.67%.

翻訳日:2023-07-16 03:43:33 公開日:2023-07-04
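
The normalization and covariance-based selection steps mentioned in the abstract above can be sketched as follows. The abstract does not give the exact selection criterion, so ranking features by the absolute covariance between the normalized feature and a numeric emotion label is an assumption of this sketch, as are the feature names and the toy data.

```python
from statistics import mean, pstdev


def zscore(values):
    """Scale a sequence to zero mean and unit variance."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def covariance(xs, ys):
    """Population covariance of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def select_features(feature_columns, labels, k):
    """Keep the k features with the largest |cov(normalized feature, label)|."""
    scores = {
        name: abs(covariance(zscore(column), labels))
        for name, column in feature_columns.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical GSR features for four recordings and a binary emotion label.
features = {
    "mean_amplitude": [0.1, 0.2, 0.8, 0.9],
    "peak_count": [3.0, 1.0, 3.0, 1.0],
    "rise_time": [0.5, 0.4, 0.2, 0.1],
}
labels = [0, 0, 1, 1]
selected = select_features(features, labels, k=2)
```

The selected columns would then be passed to an SVM classifier; that step is left out here to keep the sketch free of third-party dependencies.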

# コヒーレント光学系におけるニューラルネットワーク等化器の汎化性を高めるマルチタスク学習

Multi-Task Learning to Enhance Generazability of Neural Network Equalizers in Coherent Optical Systems ( http://arxiv.org/abs/2307.05374v1 )

ライセンス: Link先を確認

Sasipim Srivallapanondh, Pedro J. Freire, Ashraful Alam, Nelson Costa, Bernhard Spinnler, Antonio Napoli, Egor Sedov, Sergei K. Turitsyn, Jaroslaw E. Prilepsky

(参考訳) コヒーレントシステムにおけるnnベースのイコライザの柔軟性を改善するため,マルチタスク学習が初めて提案されている。 NNベースの「単一」等化器は、打ち上げ電力、シンボルレート、送信距離の変動があっても再訓練することなく、CDCと比較して最大4dBのQ因子を改善する。

For the first time, multi-task learning is proposed to improve the flexibility of NN-based equalizers in coherent systems. A "single" NN-based equalizer improves Q-factor by up to 4 dB compared to CDC, without re-training, even with variations in launch power, symbol rate, or transmission distance.

翻訳日:2023-07-16 03:42:39 公開日:2023-07-04

# 量子回路シミュレーションの二酸化炭素排出量は想像以上に多い

Carbon Emissions of Quantum Circuit Simulation: More than You Would Think ( http://arxiv.org/abs/2307.05510v1 )

ライセンス: Link先を確認

Jinyang Li, Qiang Guan, Dingwen Tao, Weiwen Jiang

(参考訳) 量子ハードウェアの急速な進歩は、多くの研究機会と多くの分野にわたる量子アドバンテージの可能性をもたらす。このランドスケープでは、量子回路シミュレーションは古典的コンピュータ上での量子挙動をエミュレートすることで、必須のツールとして機能する。簡単なアクセス、ノイズのない環境、量子状態のリアルタイム観察を提供する。しかし、量子回路シミュレーションの持続可能性の側面はまだ解明されていない。本稿では,量子回路シミュレーションによる環境影響の概念を初めて紹介する。量子回路シミュレーションから得られたCO2e排出量を計算するための予備モデルを提案する。以上の結果から,大規模な量子回路シミュレーション(43量子ビット)は,変圧器機械学習モデルのトレーニングの48倍のCO2e排出量につながる可能性が示唆された。

The rapid advancement of quantum hardware brings a host of research opportunities and the potential for quantum advantages across numerous fields. In this landscape, quantum circuit simulations serve as an indispensable tool by emulating quantum behavior on classical computers. They offer easy access, noise-free environments, and real-time observation of quantum states. However, the sustainability aspect of quantum circuit simulation is yet to be explored. In this paper, we introduce for the first time the concept of environmental impact from quantum circuit simulation. We present a preliminary model to compute the CO2e emissions derived from quantum circuit simulations. Our results indicate that large quantum circuit simulations (43 qubits) could lead to CO2e emissions 48 times greater than training a transformer machine learning model.

翻訳日:2023-07-16 03:36:04 公開日:2023-07-04
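
To give a sense of scale for the entry above, the sketch below computes the memory of a full state vector and a generic energy-based CO2e estimate. It is not the authors' preliminary model, which the abstract does not spell out; the power draw, runtime, carbon intensity, and PUE values are hypothetical placeholders.

```python
def statevector_bytes(num_qubits, bytes_per_amplitude=16):
    """Memory for a full state vector: 2**n amplitudes, 16 bytes each for complex128."""
    return (2 ** num_qubits) * bytes_per_amplitude


def co2e_kg(power_kw, hours, carbon_intensity_kg_per_kwh, pue=1.0):
    """Energy in kWh, scaled by data-centre PUE, times grid carbon intensity."""
    return power_kw * hours * pue * carbon_intensity_kg_per_kwh


# A 43-qubit state vector in double precision needs 2**47 bytes = 128 TiB.
memory_tib = statevector_bytes(43) / 2 ** 40

# Hypothetical run: 500 kW for 2 hours, PUE 1.2, 0.4 kg CO2e per kWh.
example_emissions = co2e_kg(
    power_kw=500.0, hours=2.0, carbon_intensity_kg_per_kwh=0.4, pue=1.2
)
```

Each extra qubit doubles the state vector, which is why memory, and with it the energy spent on simulation, grows so quickly with circuit width.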

# Garbage in, garbage out: 大きな言語モデルを用いた犯罪のゼロショット検出

Garbage in, garbage out: Zero-shot detection of crime using Large Language Models ( http://arxiv.org/abs/2307.06844v1 )

ライセンス: Link先を確認

Anj Simmons, Rajesh Vasa

(参考訳) 本稿では,大規模言語モデルが学習した常識知識を活用し,監視映像のテキスト記述による犯罪に関するゼロショット推論を行う。ビデオが(手動で)高品質なテキスト記述に変換される場合,大規模な言語モデルでは,ゼロショット推論のみを用いて,最先端のパフォーマンスで犯罪を検出し分類することができる。しかし、既存の自動ビデオからテキストへのアプローチでは、推論をサポートするのに十分な品質の動画記述を生成することができない(大規模言語モデルにゴミのような動画記述を入力すれば、ゴミが出力される)。

This paper proposes exploiting the common sense knowledge learned by large language models to perform zero-shot reasoning about crimes given textual descriptions of surveillance videos. We show that when video is (manually) converted to high quality textual descriptions, large language models are capable of detecting and classifying crimes with state-of-the-art performance using only zero-shot reasoning. However, existing automated video-to-text approaches are unable to generate video descriptions of sufficient quality to support reasoning (garbage video descriptions into the large language model, garbage out).

翻訳日:2023-07-16 03:16:44 公開日:2023-07-04

# Generative Adversarial Trainer: GANによる敵対的摂動に対する防御

Generative Adversarial Trainer: Defense to Adversarial Perturbations with GAN ( http://arxiv.org/abs/1705.03387v3 )

ライセンス: Link先を確認

Hyeungill Lee, Sungyeob Han, Jungwoo Lee

(参考訳) 本稿では,生成型adversarial networkを用いて,ニューラルネットワークを敵例に頑健にする新しい手法を提案する。我々は分類器と生成器のネットワークを交互に訓練する。生成ネットワークは、各画像の勾配を用いて分類器ネットワークを容易に騙すことができる逆摂動を生成する。同時に、分類器ネットワークは、生成者が生成した原画像と逆画像の両方を正しく分類するように訓練される。これらの手順は、分類器ネットワークが敵の摂動に対してより堅牢になるのに役立つ。さらに,本学習フレームワークは,オーバーフィッティングを効率的に低減し,ドロップアウトなどの他の正規化手法を上回る。提案手法をCIFARデータセットの教師あり学習に適用し,実験結果からネットワークの一般化誤差を著しく低減することを示した。我々の知る限りでは、教師あり学習を改善するために GAN を用いる最初の方法である。

We propose a novel technique to make neural networks robust to adversarial examples using a generative adversarial network. We alternately train both classifier and generator networks. The generator network generates an adversarial perturbation that can easily fool the classifier network by using the gradient of each image. Simultaneously, the classifier network is trained to correctly classify both original images and adversarial images generated by the generator. These procedures help the classifier network become more robust to adversarial perturbations. Furthermore, our adversarial training framework efficiently reduces overfitting and outperforms other regularization methods such as Dropout. We applied our method to supervised learning on CIFAR datasets, and experimental results show that our method significantly lowers the generalization error of the network. To the best of our knowledge, this is the first method that uses a GAN to improve supervised learning.

翻訳日:2023-07-07 19:03:29 公開日:2023-07-04

# DeepOIS: ジャイロスコープ誘導深部光学画像安定化装置

DeepOIS: Gyroscope-Guided Deep Optical Image Stabilizer Compensation ( http://arxiv.org/abs/2101.11183v2 )

ライセンス: Link先を確認

Haipeng Li, Shuaicheng Liu, Jue Wang

(参考訳) 撮影された画像はジャイロスコープセンサーを使ってアライメントすることができる。光画像安定化装置(OIS)は、撮影中に画像を調整することで、この可能性を終わらせる。本研究では,OISカメラの映像アライメントにジャイロスコープを使用できるように,OISが引き起こす動きを補償するディープネットワークを提案する。まず,oisカメラを用いて映像とジャイロスコープの両方をトレーニングデータとして記録する。次にジャイロスコープの読みを運動場に変換する。第2に, ローリングシャッターカメラにおいて, フレーム内回転の配列を接地ガイドとして抽出する基本混合運動モデルを提案する。第3に, ジャイロスコープ動作を入力として畳み込みニューラルネットワークをトレーニングし, OIS動作を補償する。一度処理が完了すると、補償ネットワークが他のシーンに適用され、画像アライメントは画像の内容を必要としないジャイロスコープに基づいており、強い堅牢性を提供する。実験の結果は,OIS以外のカメラと同等であり,画像ベースアライメントの精度は比較的高いことがわかった。コードとデータセットはhttps://github.com/lhaippp/DeepOISで入手できる。

Images captured on mobile devices can be aligned using the devices' gyroscope sensors. An optical image stabilizer (OIS) removes this possibility by adjusting the images during capture. In this work, we propose a deep network that compensates for the motions caused by the OIS, so that gyroscopes can be used for image alignment on OIS cameras. To achieve this, first, we record both videos and gyroscope readings with an OIS camera as training data and convert the gyroscope readings into motion fields. Second, we propose a Fundamental Mixtures motion model for rolling shutter cameras, where an array of rotations within a frame is extracted as the ground-truth guidance. Third, we train a convolutional neural network with gyroscope motions as input to compensate for the OIS motion. Once trained, the compensation network can be applied to other scenes, where the image alignment is based purely on gyroscopes with no need for image content, delivering strong robustness. Experiments show that our results are comparable with those of non-OIS cameras and outperform image-based alignment results by a relatively large margin. Code and dataset are available at https://github.com/lhaippp/DeepOIS

翻訳日:2023-07-07 18:57:53 公開日:2023-07-04
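
The step "convert gyroscope readings into motion fields" in the entry above rests on a standard relation for a camera that only rotates. The notation below is assumed for illustration and does not come from the abstract: $K$ is the intrinsic matrix, $R_{ab}$ the relative rotation integrated from the gyroscope between two capture times, and $t(y)$ a simple linear rolling-shutter readout model. The exact form of $R_{ab}$ depends on the rotation convention in use, and the paper's Fundamental Mixtures model is not reproduced here.

```latex
% Rotation-only homography between two views (homogeneous pixel coordinates)
\mathbf{x}_b \sim K \, R_{ab} \, K^{-1} \, \mathbf{x}_a

% Assumed rolling-shutter timing: row y of an H-row frame is read out at
t(y) = t_0 + \frac{y}{H} \, t_r
% so each row, or block of rows, gets its own rotation R(t(y)).
```

An OIS module shifts the lens or sensor during capture, so the effective $K$ is no longer constant and this relation breaks down. That is the gap the compensation network is trained to close.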

# グラディエントノルム認識の最小化は1次平坦性を追求し、一般化を改善する

Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization ( http://arxiv.org/abs/2303.03108v3 )

ライセンス: Link先を確認

Xingxuan Zhang and Renzhe Xu and Han Yu and Hao Zou and Peng Cui

(参考訳) 近年、フラットミニマは一般化とシャープネス認識最小化(sam)の改善に効果的であることが証明されている。しかし、SAMで議論されている平坦性の現在の定義とそのフォローアップはゼロ階平坦性(摂動半径内の最悪の損失)に限定されている。摂動半径内に1つの最小または複数のミニマが存在する場合, 一般化誤差の低いミニマを高い一般化誤差で判別するには, ゼロ階平坦性が不十分であることを示す。そこで我々は,局所的最小点におけるヘッシアンの最大固有値とsamの正規化関数の両方を境界とする摂動半径内の最大勾配ノルムに着目した,一階平坦性を示す。また,全方向にわたって一様に曲率の小さい最小値を求めるため,GAM(Gradient norm Aware Minimization)と呼ばれる新しいトレーニング手順を提案する。実験結果から,GAMは様々なデータセットやネットワーク上で,SGDやAdamWといった現在の最適化アルゴリズムで訓練されたモデルの一般化を改善することが示された。さらに、GAMはSAMがより平坦なミニマムを見つけ、より良い一般化を実現するのに役立つことを示す。

Recently, flat minima have been shown to be effective for improving generalization, and sharpness-aware minimization (SAM) achieves state-of-the-art performance. Yet the current definition of flatness discussed in SAM and its follow-ups is limited to zeroth-order flatness (i.e., the worst-case loss within a perturbation radius). We show that zeroth-order flatness can be insufficient to discriminate minima with low generalization error from those with high generalization error, whether there is a single minimum or multiple minima within the given perturbation radius. Thus we present first-order flatness, a stronger measure of flatness focusing on the maximal gradient norm within a perturbation radius, which bounds both the maximal eigenvalue of the Hessian at local minima and the regularization function of SAM. We also present a novel training procedure named Gradient norm Aware Minimization (GAM) to seek minima with uniformly small curvature across all directions. Experimental results show that GAM improves the generalization of models trained with current optimizers such as SGD and AdamW on various datasets and networks. Furthermore, we show that GAM can help SAM find flatter minima and achieve better generalization.

翻訳日:2023-07-07 17:49:50 公開日:2023-07-04
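
The two flatness notions contrasted in the abstract above can be made concrete on a one-dimensional loss. This is a grid-search sketch, not the GAM training procedure. Scaling the maximal gradient norm by the radius `rho` is an assumption about the definition, and the two quadratic losses are hypothetical.

```python
def _grid(theta, rho, steps):
    """Evenly spaced points covering [theta - rho, theta + rho]."""
    return [theta - rho + 2 * rho * i / steps for i in range(steps + 1)]


def zeroth_order_flatness(loss, theta, rho, steps=1000):
    """Worst-case loss increase within radius rho, approximated on a grid."""
    return max(loss(t) for t in _grid(theta, rho, steps)) - loss(theta)


def first_order_flatness(grad, theta, rho, steps=1000):
    """rho times the largest gradient magnitude within radius rho, on a grid."""
    return rho * max(abs(grad(t)) for t in _grid(theta, rho, steps))


# Hypothetical sharp and flat minima, both located at theta = 0.
def sharp_loss(t):
    return 10.0 * t * t


def sharp_grad(t):
    return 20.0 * t


def flat_loss(t):
    return 0.1 * t * t


def flat_grad(t):
    return 0.2 * t


rho = 0.1
sharp_r0 = zeroth_order_flatness(sharp_loss, 0.0, rho)
flat_r0 = zeroth_order_flatness(flat_loss, 0.0, rho)
sharp_r1 = first_order_flatness(sharp_grad, 0.0, rho)
flat_r1 = first_order_flatness(flat_grad, 0.0, rho)
```

In this toy case both measures agree that the second minimum is flatter. The abstract's point is that cases exist where the zeroth-order measure cannot tell such minima apart while the gradient-based one can.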

# 固体力学への応用におけるニューラルFEM法とニューラルオペレータ法の比較

Comparison of Neural FEM and Neural Operator Methods for applications in Solid Mechanics ( http://arxiv.org/abs/2307.02494v1 )

ライセンス: Link先を確認

Stefan Hildebrand, Sandra Klinge

(参考訳) 機械学習手法は偏微分方程式を解くための最も最新のアプローチのグループに属する。現在の研究は、数値実験によるエラストスタティックスにおける2つのクラス、Neural FEMとNeural Operator Methodsを調査している。 Neural Operatorメソッドは、高価なトレーニングを必要とするが、同じ機械学習モデルで複数の境界値問題を解決することができる。 2つのクラスの主な違いは、計算の労力と精度である。特に、実用的応用にはさらなる研究が必要である。

Machine Learning methods are among the most recent approaches for solving partial differential equations. The current work investigates two classes, Neural FEM and Neural Operator Methods, for use in elastostatics by means of numerical experiments. The Neural Operator methods require expensive training but then allow for solving multiple boundary value problems with the same Machine Learning model. The main differences between the two classes are the computational effort and the accuracy. The accuracy in particular requires more research for practical applications.

翻訳日:2023-07-07 16:53:25 公開日:2023-07-04

# FREEDOM: 教師なしパーソナライゼーションのためのターゲットラベルとソースデータとドメイン情報のないマルチソースドメイン適応

FREEDOM: Target Label & Source Data & Domain Information-Free Multi-Source Domain Adaptation for Unsupervised Personalization ( http://arxiv.org/abs/2307.02493v1 )

ライセンス: Link先を確認

Eunju Yang, Gyusang Cho, Chan-Hyun Youn

(参考訳) サービスの観点からは、Multi-Source Domain Adaptation(MSDA)は、デプロイされたモデルをクライアントのデータセットに適応させる、有望なシナリオである。ターゲットラベルなしで適応を提供し、ソースデータセットが複数のドメインから構築されている場合をサポートする。しかし、そのトレーニングは、マルチソースデータセットの事前ドメイン情報 -- 存在するドメインの数と各データサンプルのドメインラベル -- に大きく依存しているため、現実的ではない。さらにmsdaは、ソースとターゲットの両方のデータセットを同時に(物理的に)必要とし、クライアント装置のストレージ制限や、クライアントデータをサーバに転送することでデータプライバシの問題を引き起こす。サービス提供者の観点からモデル適応のより実践的なシナリオとして、これらの制約を緩和し、3自由ドメイン適応という新たな問題シナリオを提示します。 1)ターゲットラベル、 2)ソースデータセット、大部分は 3) ソースドメイン情報(ドメインラベル+ドメイン数)は利用できない。問題シナリオでは、FREEDOMと呼ばれる実践的な適応フレームワークを提案する。生成モデルのパワーを活用し、データをクラスとスタイルの側面に分離し、そのスタイルはソースデータからクラス非依存の情報として定義され、非パラメトリックベイズアプローチで設計される。適応段階において、FREEDOMは、スタイルが異なる場合でも、クラス分布は一貫性があるという考え方の下で、ソースクラスの分布とターゲットの分布とを一致させることを目的としており、その後、分類モデルの一部のみがパーソナライズされたネットワークとしてデプロイされる。その結果、FREEDOMは、ドメイン情報なしで、ターゲット側の最終的なモデルサイズを減らし、ソースドメインの数によらず、最先端または同等のパフォーマンスを達成する。

From a service perspective, Multi-Source Domain Adaptation (MSDA) is a promising scenario for adapting a deployed model to a client's dataset. It can provide adaptation without target labels and supports the case where a source dataset is constructed from multiple domains. However, it is impractical because its training relies heavily on prior domain information about the multi-source dataset -- how many domains exist and the domain label of each data sample. Moreover, MSDA requires both source and target datasets simultaneously (physically), causing storage limitations on the client device or data privacy issues when client data is transferred to a server. For a more practical scenario of model adaptation from a service provider's point of view, we relax these constraints and present a novel problem scenario of Three-Free Domain Adaptation, namely TFDA, where 1) target labels, 2) the source dataset, and above all 3) source domain information (domain labels + the number of domains) are unavailable. Under this problem scenario, we propose a practical adaptation framework called FREEDOM. It leverages the power of a generative model, disentangling data into class and style aspects, where the style is defined as the class-independent information from the source data and is designed with a nonparametric Bayesian approach. In the adaptation stage, FREEDOM aims to match the source class distribution with the target's under the philosophy that the class distribution is consistent even if the style is different; after that, only part of the classification model is deployed as a personalized network. As a result, FREEDOM achieves state-of-the-art or comparable performance even without domain information, with a reduced final model size on the target side, independent of the number of source domains.

翻訳日:2023-07-07 16:53:18 公開日:2023-07-04

# TablEye: 画像のレンズを通して小さなテーブルを見る

TablEye: Seeing small Tables through the Lens of Images ( http://arxiv.org/abs/2307.02491v1 )

ライセンス: Link先を確認

Seung-eon Lee and Sang-Chul Lee

(参考訳) 少人数の表学習の探求が不可欠になる。タブラルデータ(Tabular data)は、多様な情報をキャプチャする汎用表現であるが、制限やデータの特性、モデルのサイズは除外されない。広範な表データのラベル付けは困難であり、すべての重要な機能をキャプチャすることは不可能である。しかし、独立データセット間の共有情報の不足と、表データ内の境界を定義する固有の曖昧さが原因で、比較的未熟なままである。我々の知る限りでは、データセットに制約を課すことなく有意義で制約のない数発の表型学習技術は開発されていない。本稿では,表型データに対する事前知識形成の限界を克服し,ドメイン変換を取り入れたTablEyeという革新的なフレームワークを提案する。表画像を生成してドメイン変換を容易にすることで、元の表データの本質的なセマンティクスを効果的に保存する。このアプローチは、厳密にテストされた少数の学習アルゴリズムと埋め込み関数を利用して、事前知識を取得し、適用する。共有データドメインを利用することで、イメージドメインから学習したこの事前知識を活用できます。具体的には、TablEyeはTabLLMを最大0.11AUCとSTUNTの4ショットタスクで上回り、1ショット設定で平均3.17%の精度で性能を発揮した。

The exploration of few-shot tabular learning has become imperative. Tabular data is a versatile representation that captures diverse information, yet it is not exempt from limitations related to the properties of the data and to model size. Labeling extensive tabular data can be challenging, and it may not be feasible to capture every important feature. Few-shot tabular learning, however, remains relatively unexplored, primarily due to the scarcity of shared information among independent datasets and the inherent ambiguity in defining boundaries within tabular data. To the best of our knowledge, no meaningful and unrestricted few-shot tabular learning techniques have been developed without imposing constraints on the dataset. In this paper, we propose an innovative framework called TablEye, which aims to overcome the limit of forming prior knowledge for tabular data by adopting domain transformation. It facilitates domain transformation by generating tabular images, which effectively conserve the intrinsic semantics of the original tabular data. This approach harnesses rigorously tested few-shot learning algorithms and embedding functions to acquire and apply prior knowledge. Leveraging shared data domains allows us to utilize this prior knowledge, originally learned from the image domain. Specifically, TablEye demonstrated superior performance, outperforming TabLLM in a 4-shot task by up to 0.11 AUC and STUNT in a 1-shot setting, where it led on average by 3.17% accuracy.

翻訳日:2023-07-07 16:52:48 公開日:2023-07-04

# ボース・アインシュタイン凝縮系における一般外部ポテンシャルによる量子オットーエンジンの性能向上

Enhancing Quantum Otto Engine Performance in Generalized External Potential on Bose-Einstein Condensation Regime ( http://arxiv.org/abs/2307.01805v1 )

ライセンス: Link先を確認

Zahara Zettira, Ade Fahriza, Zulfi Abdullah, Trengginas E P Sutantyo

(参考訳) ボース・アインシュタイン凝縮(bec)と通常のボース気体の両方を汎用外部ポテンシャルに閉じ込めた動作媒質として用いた量子オットーエンジンについて検討した。エンジンを準静的かつ内可逆的に処理した。準静的および可逆的両方の膨張と圧縮は等エントロピー的であるため、効率の表現は類似している。しかし、準静電サイクルの出力は無限のストローク時間と長いストローク時間のためにゼロである。対照的に、可逆サイクルでは、2つの貯水池による熱化は有限時間で行われる。導電性においてフーリエの法則を用いて媒質の温度と貯水池の温度の関係を定式化し, 作業は加熱時間と冷却ストローク時間に依存する。さらに圧縮比$\kappa$に対して最大出力(EMP)の効率を得るために最大出力を最大化した。作業媒体としてBECを用いる場合, 通常のボースガスを用いたEMPはCurzon-Ahlborn効率に過ぎなかった。また,熱接触時間$\tau$とホット(\tau_{h}$)およびコールド(\tau_{l}$)がEMPに及ぼす影響についても検討した。我々は、$\tau_{h}=\tau_{l}$ stroke時間が発生すると、有意な差は認められなかった。それにもかかわらず、様々な冷却と加熱のストローク時間を調整することは、EMPにおいて重要な結果となり、ストローク時間は$\tau_{h}<\tau_{l}$より高く、ストローク時間は$\tau_{h}>\tau_{l}$より低い。この部分熱化は残留コヒーレンスによるエンジンのEMPを高めると結論付けている。

We examine a quantum Otto engine using both a Bose-Einstein condensate (BEC) and a normal Bose gas as the working medium, trapped in a generalized external potential. We treat the engine both quasi-statically and endoreversibly. Since expansion and compression are isentropic in both the quasi-static and the endoreversible cycle, the expressions for the efficiency are similar. However, the power output of the quasi-static cycle is zero because the stroke time is infinitely long. In contrast, in an endoreversible cycle, thermalization with the two reservoirs takes place in a finite time. We use Fourier's law of conduction to formulate the relation between the temperature of the medium and that of the reservoir, making the work depend on the heating and cooling stroke times. Moreover, we maximize the power with respect to the compression ratio $\kappa$ to obtain the efficiency at maximum power (EMP). We find that the EMP is significantly higher when a BEC is used as the working medium, whereas the EMP with a normal Bose gas is just the Curzon-Ahlborn efficiency. We also investigate the effect of the thermal contact time $\tau$ with the hot ($\tau_{h}$) and cold ($\tau_{l}$) reservoirs on the EMP. We find that when $\tau_{h}=\tau_{l}$, there are no significant differences. Nevertheless, adjusting the cooling and heating stroke times has a significant effect on the EMP, which is much higher for $\tau_{h}<\tau_{l}$ and lower for $\tau_{h}>\tau_{l}$. We conclude that this partial thermalization enhances the EMP of the engine due to residual coherence.

翻訳日:2023-07-07 16:52:23 公開日:2023-07-04
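
The Curzon-Ahlborn efficiency that the abstract above uses as the normal-Bose-gas baseline has a closed form, shown here next to the Carnot limit. The reservoir temperatures are hypothetical. The BEC result from the paper is not reproduced, since the abstract gives no formula for it.

```python
import math


def carnot_efficiency(t_cold, t_hot):
    """Upper bound for any heat engine between two reservoirs (temperatures in kelvin)."""
    if not 0 < t_cold < t_hot:
        raise ValueError("require 0 < t_cold < t_hot")
    return 1.0 - t_cold / t_hot


def curzon_ahlborn_efficiency(t_cold, t_hot):
    """Efficiency at maximum power of an endoreversible engine: 1 - sqrt(Tc/Th)."""
    if not 0 < t_cold < t_hot:
        raise ValueError("require 0 < t_cold < t_hot")
    return 1.0 - math.sqrt(t_cold / t_hot)


# Hypothetical reservoirs at 100 K and 400 K.
eta_carnot = carnot_efficiency(100.0, 400.0)
eta_ca = curzon_ahlborn_efficiency(100.0, 400.0)
```

With these temperatures the Curzon-Ahlborn value is 0.5 against a Carnot limit of 0.75, which illustrates the cost of running the cycle in finite time.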

# AI支援プログラミングのための自然言語生成とビッグコードの理解:レビュー

Natural Language Generation and Understanding of Big Code for AI-Assisted Programming: A Review ( http://arxiv.org/abs/2307.02503v1 )

ライセンス: Link先を確認

Man Fai Wong, Shangxin Guo, Ching Nam Hang, Siu Wai Ho, Chee Wei Tan

(参考訳) 本稿では,自然言語処理(NLP)技術の利用に関する文献を包括的にレビューし,AI支援プログラミングタスクの分野において,Big Codeを用いてトレーニングされたトランスフォーマーベース大規模言語モデル(LLM)に着目した。ソフトウェア自然性によって強化されたLLMは、コード生成、コード補完、コード翻訳、コード洗練、コードの要約、欠陥検出、クローン検出など、AI支援プログラミングアプリケーションを促進する上で重要な役割を果たしている。このようなアプリケーションの著名な例としては、OpenAIのCodexとDeepMind AlphaCodeを利用したGitHub Copilotがある。本稿では,AI支援プログラミングに関連する下流タスクにおけるLLMとその応用について概説する。さらに、これらのアプリケーションにNLP技術とソフトウェア自然性を導入する際の課題と機会についても検討し、モバイルソフトウェア開発のためのAppleのXcodeにAI支援プログラミング機能を拡張することについて議論した。また,NLP技術をソフトウェア自然性に取り入れる上での課題と機会,高度なコーディング支援を開発者に与えること,ソフトウェア開発プロセスの合理化について述べる。

This paper provides a comprehensive review of the literature concerning the utilization of Natural Language Processing (NLP) techniques, with a particular focus on transformer-based large language models (LLMs) trained using Big Code, within the domain of AI-assisted programming tasks. LLMs, augmented with software naturalness, have played a crucial role in facilitating AI-assisted programming applications, including code generation, code completion, code translation, code refinement, code summarization, defect detection, and clone detection. Notable examples of such applications include the GitHub Copilot powered by OpenAI's Codex and DeepMind AlphaCode. This paper presents an overview of the major LLMs and their applications in downstream tasks related to AI-assisted programming. Furthermore, it explores the challenges and opportunities associated with incorporating NLP techniques with software naturalness in these applications, with a discussion on extending AI-assisted programming capabilities to Apple's Xcode for mobile software development. This paper also presents the challenges of and opportunities for incorporating NLP techniques with software naturalness, empowering developers with advanced coding assistance and streamlining the software development process.

翻訳日:2023-07-07 16:42:49 公開日:2023-07-04

# 数学エージェント:計算基盤、数学的埋め込み、ゲノム学

Math Agents: Computational Infrastructure, Mathematical Embedding, and Genomics ( http://arxiv.org/abs/2307.02502v1 )

ライセンス: Link先を確認

Melanie Swan, Takashi Kido, Eric Roland, Renato P. dos Santos

(参考訳) 生成AIの進歩は、よりアクセスしやすい数学によって促進される可能性がある。人間-AIチャット以外にも、大規模言語モデル(LLM)はプログラミング、アルゴリズム発見、定理証明に現れているが、ゲノム応用は限られている。本稿では、GPTベースのワークフローを用いて、数学エージェントと数学埋め込みを「ムーアの数学法則」の新たなエントリとして導入し、方程式を文学からLaTeXおよびPython形式に変換する。多くのデジタル方程式表現が存在するが、大規模な自動評価ツールがない。 LLMは言語ユーザインタフェースとして重要であり、人間のAIチャットや大規模AI支援計算インフラのための形式言語に自然言語アクセスを提供する。無限の形式的な可能性空間を考えると、数学と相互作用する数学エージェントは、私たちを「大きなデータ」から「大きな数学」に変える可能性がある。より柔軟な自然言語とは異なり、Mathには証明の対象となる特性があり、AIアライメントを目的とした高い精度の数学認証アイコンのような従来のアプリケーションを超えて使用することができる。本研究の目的は、マルチスカラー物理数学を病気モデルやゲノムデータに適用することにより、情報システム生物学の老化問題に対処するため、数学エージェントと数学的埋め込みを利用することである。エピソード記憶を持つ生成AIは、SIR精度健康モデルを用いて、縦断的な健康記録における因果関係を分析するのに役立つ。ゲノムデータは未解決のアルツハイマー病問題に対処するために提案されている。

The advancement in generative AI could be boosted with more accessible mathematics. Beyond human-AI chat, large language models (LLMs) are emerging in programming, algorithm discovery, and theorem proving, yet their genomics application is limited. This project introduces Math Agents and mathematical embedding as fresh entries to the "Moore's Law of Mathematics", using a GPT-based workflow to convert equations from literature into LaTeX and Python formats. While many digital equation representations exist, there's a lack of automated large-scale evaluation tools. LLMs are pivotal as linguistic user interfaces, providing natural language access for human-AI chat and formal languages for large-scale AI-assisted computational infrastructure. Given the infinite formal possibility spaces, Math Agents, which interact with math, could potentially shift us from "big data" to "big math". Math, unlike the more flexible natural language, has properties subject to proof, enabling its use beyond traditional applications like high-validation math-certified icons for AI alignment aims. This project aims to use Math Agents and mathematical embeddings to address the ageing issue in information systems biology by applying multiscalar physics mathematics to disease models and genomic data. Generative AI with episodic memory could help analyse causal relations in longitudinal health records, using SIR Precision Health models. Genomic data is suggested for addressing the unsolved Alzheimer's disease problem.

翻訳日:2023-07-07 16:42:30 公開日:2023-07-04

# アルゴリズム依存ラデマッハ錯体による一般化保証

Generalization Guarantees via Algorithm-dependent Rademacher Complexity ( http://arxiv.org/abs/2307.02501v1 )

ライセンス: Link先を確認

Sarah Sachs, Tim van Erven, Liam Hodgkinson, Rajiv Khanna, Umut Simsekli

(参考訳) アルゴリズムとデータ依存の一般化境界は、現代の機械学習アルゴリズムの一般化挙動を説明するために必要である。この文脈では、(様々な形の)相互情報を含む情報理論の一般化境界と、仮説集合の安定性に基づく境界が存在する。本稿では、アルゴリズムとデータ依存仮説クラスの経験的ラデマッハ複雑性である一般化誤差を制御するための概念的だが技術的に異なる複雑性尺度を提案する。 Rademacher複雑性の標準的な性質と、このクラスの便利な構造を組み合わせることで、我々は、 (i)有限フラクタル次元に基づく新たな境界を得る。 (a)従来のフラクタル次元型境界を連続から有限の仮説クラスに拡張し、 b) 先行業務において必要とされた相互情報用語を避けること (II) 確率勾配降下に対する最近の次元独立一般化の証明を大幅に単純化する。 (iii)条件付き相互情報に基づくアプローチと同様に,vcクラスや圧縮スキームの結果の復元が容易である。

Algorithm- and data-dependent generalization bounds are required to explain the generalization behavior of modern machine learning algorithms. In this context, there exist information-theoretic generalization bounds that involve (various forms of) mutual information, as well as bounds based on hypothesis set stability. We propose a conceptually related, but technically distinct complexity measure to control generalization error, which is the empirical Rademacher complexity of an algorithm- and data-dependent hypothesis class. Combining standard properties of Rademacher complexity with the convenient structure of this class, we are able to (i) obtain novel bounds based on the finite fractal dimension, which (a) extend previous fractal dimension-type bounds from continuous to finite hypothesis classes, and (b) avoid a mutual information term that was required in prior work; (ii) greatly simplify the proof of a recent dimension-independent generalization bound for stochastic gradient descent; and (iii) easily recover results for VC classes and compression schemes, similar to approaches based on conditional mutual information.

翻訳日:2023-07-07 16:42:07 公開日:2023-07-04
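
For readers unfamiliar with the quantity at the centre of the entry above, the following sketch computes the standard empirical Rademacher complexity of a small finite hypothesis class exactly, by enumerating every sign vector. The paper's algorithm- and data-dependent class is not constructed here; the toy classes are hypothetical.

```python
from itertools import product


def empirical_rademacher(hypothesis_outputs):
    """Exact empirical Rademacher complexity of a finite class.

    hypothesis_outputs holds one tuple per hypothesis, giving its outputs on
    the same n sample points. Enumerates all 2**n sign vectors, so keep n small.
    """
    n = len(hypothesis_outputs[0])
    total = 0.0
    for signs in product((-1, 1), repeat=n):
        total += max(
            sum(s * h for s, h in zip(signs, outputs))
            for outputs in hypothesis_outputs
        )
    return total / (2 ** n) / n


# A single hypothesis cannot correlate with random signs on average.
single = empirical_rademacher([(1, 1)])

# A hypothesis and its negation on two points.
pair = empirical_rademacher([(1, 1), (-1, -1)])

# All four sign patterns on two points: the class fits any labelling.
full = empirical_rademacher([(1, 1), (1, -1), (-1, 1), (-1, -1)])
```

The values grow from 0 to 0.5 to 1 as the class becomes richer, which is the behaviour a complexity measure for generalization bounds is meant to capture.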

# 対人訓練による解釈可能なコンピュータビジョンモデル:ロバスト性-解釈可能性結合を解き明かす

Interpretable Computer Vision Models through Adversarial Training: Unveiling the Robustness-Interpretability Connection ( http://arxiv.org/abs/2307.02500v1 )

ライセンス: Link先を確認

Delyan Boychev

(参考訳) 最先端のディープニューラルネットワークの複雑性が永久に増大するにつれて、その解釈性を維持することがますます難しくなっている。本研究は,ロバストなモデル作成に使用される敵の訓練の効果を評価することを目的としている。コンピュータビジョンモデルをより解釈可能にすることが示されている。モデルを現実世界にデプロイする場合、解釈性は堅牢性と同じくらい不可欠です。これら2つの課題の相関性を証明するため,局所的特徴重要度法 (SHAP, 統合的勾配法) と特徴可視化技術 (Representation Inversion, Class Specific Image Generation) を用いてモデルを広範囲に検討した。標準モデルは、ロバストに比べて敵の攻撃の影響を受けやすく、その学習された表現は人間にとって意味をなさない。逆に、これらのモデルは予測をサポートする画像の特徴的な領域に焦点を当てている。さらに、ロバストモデルによって学習される機能は、実際のものに近い。

With the perpetual increase in the complexity of state-of-the-art deep neural networks, maintaining their interpretability becomes an increasingly challenging task. Our work aims to evaluate the effects of adversarial training, which is used to produce robust models that are less vulnerable to adversarial attacks. It has been shown to make computer vision models more interpretable. Interpretability is as essential as robustness when we deploy the models to the real world. To prove the correlation between these two problems, we extensively examine the models using local feature-importance methods (SHAP, Integrated Gradients) and feature visualization techniques (Representation Inversion, Class Specific Image Generation). Standard models, compared to robust ones, are more susceptible to adversarial attacks, and their learned representations are less meaningful to humans. Conversely, these models focus on distinctive regions of the images which support their predictions. Moreover, the features learned by the robust model are closer to the real ones.

翻訳日:2023-07-07 16:41:54 公開日:2023-07-04
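
Integrated Gradients, one of the feature-importance methods named in the abstract above, can be sketched without any deep learning framework. The version below uses a midpoint Riemann sum along the straight path from a baseline to the input and central finite differences in place of backpropagation. The two-input `model` function is a hypothetical stand-in for a network, not anything taken from the paper.

```python
def integrated_gradients(f, x, baseline, steps=200, eps=1e-5):
    """Approximate Integrated Gradients attributions for a scalar function f."""
    dims = len(x)
    attributions = [0.0] * dims
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th path segment
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i in range(dims):
            up = list(point)
            down = list(point)
            up[i] += eps
            down[i] -= eps
            grad_i = (f(up) - f(down)) / (2 * eps)  # central difference
            attributions[i] += grad_i * (x[i] - baseline[i]) / steps
    return attributions


def model(v):
    """Hypothetical differentiable model with two input features."""
    return v[0] ** 2 + 3.0 * v[0] * v[1]


x = [1.0, 2.0]
baseline = [0.0, 0.0]
attributions = integrated_gradients(model, x, baseline)
completeness_gap = abs(sum(attributions) - (model(x) - model(baseline)))
```

A useful sanity check is the completeness property: the attributions should sum to the difference between the model output at the input and at the baseline, so `completeness_gap` should be close to zero.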

# mPLUG-DocOwl:文書理解のためのモジュール化多モーダル大言語モデル

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding ( http://arxiv.org/abs/2307.02499v1 )

ライセンス: Link先を確認

Jiabo Ye, Anwen Hu, Haiyang Xu, Qinghao Ye, Ming Yan, Yuhao Dan, Chenlin Zhao, Guohai Xu, Chenliang Li, Junfeng Tian, Qian Qi, Ji Zhang, Fei Huang

(参考訳) 文書理解とは、ウェブページのような様々なタイプのデジタル文書から情報を自動的に抽出し、分析し、理解することである。 mPLUG-Owlを含む既存のMLLM(Multimodal Large Language Models)は、浅いOCRフリーテキスト認識において、有望なゼロショット能力を示し、OCRフリー文書理解の可能性を示している。それにもかかわらず、ドメイン内のトレーニングなしでは、これらのモデルは、OCRのない文書理解に不可欠な、洗練されたテーブルや大きなテキストブロックのような細粒度のOCR機能を無視する傾向にある。本稿では,OCRフリー文書理解のためのmPLUG-Owlに基づくmPLUG-DocOwlを提案する。具体的には、まず、幅広い視覚的テキスト理解タスクを特徴とするインストラクションチューニングデータセットを構築する。次に,ocrフリーな文書理解能力を強化し,言語のみ,汎用視覚言語,文書命令チューニングデータセットを統一した命令チューニング戦略で共同で学習する。また、OCRフリーな文書命令理解評価セットLLMDocを構築し、命令遵守と文書理解に関するモデルの能力をよりよく比較する。実験結果から,本モデルは既存のマルチモーダルモデルよりも優れており,文書理解の強力な能力を示している。さらに、特定の微調整なしに、mPLUG-DocOwlは様々な下流タスクをうまく一般化する。私たちのコード、モデル、トレーニングデータ、評価セットは https://github.com/X-PLUG/mPLUG-DocOwl で公開されています。

Document understanding refers to automatically extracting, analyzing and comprehending information from various types of digital documents, such as a web page. Existing Multimodal Large Language Models (MLLMs), including mPLUG-Owl, have demonstrated promising zero-shot capabilities in shallow OCR-free text recognition, indicating their potential for OCR-free document understanding. Nevertheless, without in-domain training, these models tend to ignore fine-grained OCR features, such as sophisticated tables or large blocks of text, which are essential for OCR-free document understanding. In this paper, we propose mPLUG-DocOwl, based on mPLUG-Owl, for OCR-free document understanding. Specifically, we first construct an instruction tuning dataset featuring a wide range of visual-text understanding tasks. Then, we strengthen the OCR-free document understanding ability by jointly training the model on language-only, general vision-and-language, and document instruction tuning datasets with our unified instruction tuning strategy. We also build an OCR-free document instruction understanding evaluation set, LLMDoc, to better compare models' capabilities in instruction compliance and document understanding. Experimental results show that our model outperforms existing multi-modal models, demonstrating its strong document understanding ability. Besides, without specific fine-tuning, mPLUG-DocOwl generalizes well to various downstream tasks. Our code, models, training data and evaluation set are available at https://github.com/X-PLUG/mPLUG-DocOwl.

翻訳日:2023-07-07 16:41:36 公開日:2023-07-04

# マルチゲージ水文変動データ同化:多層パーセプトロンとベイズ誘導多変量回帰を用いた空間勾配による地域化学習

Multi-gauge Hydrological Variational Data Assimilation: Regionalization Learning with Spatial Gradients using Multilayer Perceptron and Bayesian-Guided Multivariate Regression ( http://arxiv.org/abs/2307.02497v1 )

ライセンス: Link先を確認

Ngo Nghi Truyen Huynh, Pierre-Andr\'e Garambois, Fran\c{c}ois Colleoni, Benjamin Renard, H\'el\`ene Roux (IMFT)

(参考訳) 空間的に分散した水文パラメータを推定する難しい問題、特に未開水路の洪水について、この寄与は、高分解能な水文モデルのために設計された複雑な地域移動関数を学習するための、新しいシームレスな地域化技術である。転送関数は以下の通りである。 (i)勾配計算のシームレスな流れを可能にした多層パーセプトロンは、機械学習の最適化アルゴリズムを用いる。 (II)変分データ同化アルゴリズムにより最適化され,ベイズ推定により導かれる多変量回帰写像は,実現可能な解の不等式問題に対処する。この手法は、推定可能な地域化写像を微分可能な水文モデルに組み込んで、正確な随伴型空間分布勾配を持つマルチゲージデータに基づいて計算されるコスト関数を最適化する。

Tackling the difficult problem of estimating spatially distributed hydrological parameters, especially for floods on ungauged watercourses, this contribution presents a novel seamless regionalization technique for learning complex regional transfer functions designed for high-resolution hydrological models. The transfer functions rely on: (i) a multilayer perceptron enabling a seamless flow of gradient computation to employ machine learning optimization algorithms, or (ii) a multivariate regression mapping optimized by variational data assimilation algorithms and guided by Bayesian estimation, addressing the equifinality issue of feasible solutions. The approach involves incorporating the inferable regionalization mappings into a differentiable hydrological model and optimizing a cost function computed on multi-gauge data with accurate adjoint-based spatially distributed gradients.

翻訳日:2023-07-07 16:41:11 公開日:2023-07-04

# Invertible Neural Networks and Error Diffusion を用いた導電性マップによる気泡分布の再構築

Learning to reconstruct the bubble distribution with conductivity maps using Invertible Neural Networks and Error Diffusion ( http://arxiv.org/abs/2307.02496v1 )

ライセンス: Link先を確認

Nishant Kumar, Lukas Krause, Thomas Wondrak, Sven Eckert, Kerstin Eckert, Stefan Gumhold

(参考訳) 電解はエコフレンドリーな水素生産には不可欠であるが、反応の妨げとなり、セル効率が低下し、エネルギー消費が増加する。さらに、これらのガス気泡は細胞内部の伝導度の変化を引き起こし、細胞周囲の誘導磁場に対応する変化をもたらす。したがって, 外部磁場センサを用いてこれらのガス気泡誘起磁場変動を測定し, バイオサバルト法則の逆問題を解くことにより, セル内の伝導度を推定し, 気泡の大きさと位置を推定することができる。しかし、少数の磁場測定から高分解能導電率マップを決定することは、逆問題である。これを解決するために,Invertible Neural Networks (INNs) を用いて導電性フィールドを再構築する。その結果,tikhonov正則化に比べ,innははるかに優れた性能が得られることがわかった。

Electrolysis is crucial for eco-friendly hydrogen production, but gas bubbles generated during the process hinder reactions, reduce cell efficiency, and increase energy consumption. Additionally, these gas bubbles cause changes in the conductivity inside the cell, resulting in corresponding variations in the induced magnetic field around the cell. Therefore, measuring these gas bubble-induced magnetic field fluctuations using external magnetic sensors and solving the inverse problem of the Biot-Savart law allows for estimating the conductivity in the cell and, thus, bubble size and location. However, determining high-resolution conductivity maps from only a few induced magnetic field measurements is an ill-posed inverse problem. To overcome this, we exploit Invertible Neural Networks (INNs) to reconstruct the conductivity field. Our qualitative results and quantitative evaluation using random error diffusion show that INNs achieve far superior performance compared to Tikhonov regularization.
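
To make the baseline named in the abstract concrete, the following is a minimal sketch of Tikhonov regularization for a generic linear inverse problem. It assumes a small dense forward matrix `A` and measurements `b`; the names `tikhonov_solve`, `A`, `b`, and `lam` are illustrative and do not come from the paper, whose actual forward model (the Biot-Savart law) is not reproduced here.

```python
def tikhonov_solve(A, b, lam):
    """Return x minimizing ||A x - b||^2 + lam * ||x||^2 (pure Python sketch)."""
    m, n = len(A), len(A[0])
    # Normal equations of the regularized problem: (A^T A + lam I) x = A^T b
    M = [[sum(A[k][i] * A[k][j] for k in range(m)) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    rhs = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= f * M[col][c]
            rhs[r] -= f * rhs[col]
    # Back substitution
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (rhs[i] - s) / M[i][i]
    return x


# One measurement, two unknowns: ill-posed without the penalty term.
x_hat = tikhonov_solve([[1.0, 1.0]], [2.0], lam=1.0)
```

With fewer measurements than unknowns the unregularized problem has no unique solution; the penalty `lam` selects a small-norm one, which is the kind of prior the abstract contrasts with a learned INN.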

翻訳日:2023-07-07 16:40:53 公開日:2023-07-04

# 産業画像解析のためのパッチベースオートエンコーダの画像または潜時空間の異常検出

Anomaly detection in image or latent space of patch-based auto-encoders for industrial image analysis ( http://arxiv.org/abs/2307.02495v1 )

ライセンス: Link先を確認

Nicolas Pinon (MYRIAD), Robin Trombetta (MYRIAD), Carole Lartizien (MYRIAD)

(参考訳) 本研究では、パッチベースのオートエンコーダに基づくカラー画像の異常検出手法をいくつか検討する。第1に原画像と再構成画像との誤差、第2に潜在空間における正常画像分布のサポート推定、第3に原画像と再構成画像の復元版との誤差に基づく、3種類の手法の性能を比較する。これらの手法を産業画像データベースMVTecADで評価し、競争力のある2つの最先端手法と比較する。

We study several methods for detecting anomalies in color images, constructed on patch-based auto-encoders. We compare the performance of three types of methods based, first, on the error between the original image and its reconstruction, second, on the support estimation of the normal image distribution in the latent space, and third, on the error between the original image and a restored version of the reconstructed image. These methods are evaluated on the industrial image database MVTecAD and compared to two competitive state-of-the-art methods.

翻訳日:2023-07-07 16:40:38 公開日:2023-07-04

# 深層強化学習における転移学習:サーベイ

Transfer Learning in Deep Reinforcement Learning: A Survey ( http://arxiv.org/abs/2009.07888v7 )

ライセンス: Link先を確認

Zhuangdi Zhu, Kaixiang Lin, Anil K. Jain, and Jiayu Zhou

(参考訳) 強化学習は、シーケンシャルな意思決定問題を解決するための学習パラダイムである。近年,深層ニューラルネットワークの急速な発展に伴い,強化学習が著しく進展している。ロボット工学やゲームプレイングといった多くの分野における強化学習の有望な展望とともに、転移学習は、強化学習が直面する様々な課題に取り組み、外部の専門知識から知識を転移して学習プロセスの効率と有効性を高めるために登場した。本研究では,深層強化学習の文脈における転移学習アプローチの最近の進歩を体系的に調査する。具体的には,最先端の転移学習アプローチを分類するためのフレームワークを提供し,それらの目標,方法論,互換性のある強化学習バックボーン,実践的応用について分析する。また,強化学習の観点から,転移学習と関連する他の話題との関係を導き,今後の研究の進展を待つ潜在的な課題を探究する。

Reinforcement learning is a learning paradigm for solving sequential decision-making problems. Recent years have witnessed remarkable progress in reinforcement learning upon the fast development of deep neural networks. Along with the promising prospects of reinforcement learning in numerous domains such as robotics and game-playing, transfer learning has arisen to tackle various challenges faced by reinforcement learning, by transferring knowledge from external expertise to facilitate the efficiency and effectiveness of the learning process. In this survey, we systematically investigate the recent progress of transfer learning approaches in the context of deep reinforcement learning. Specifically, we provide a framework for categorizing the state-of-the-art transfer learning approaches, under which we analyze their goals, methodologies, compatible reinforcement learning backbones, and practical applications. We also draw connections between transfer learning and other relevant topics from the reinforcement learning perspective and explore their potential challenges that await future research progress.

翻訳日:2023-07-07 01:04:10 公開日:2023-07-04

# 現実的制約下における量子エンタングルメントパーコレーション

Quantum entanglement percolation under a realistic restriction ( http://arxiv.org/abs/2008.09040v2 )

ライセンス: Link先を確認

Shashaank Khanna, Saronath Halder, Ujjwal Sen

(参考訳) Bell と Greenberger-Horne-Zeilinger を回路の遠方または遠方ノード間で確立することの問題は難しく、非常に重要な問題であり、それに対処する戦略はエンタングメント・パーコレーションである。部分絡み合った純二分体絡み合い状態の単層ハニカム格子上の3、2、および1量子ビットの測定を含む量子計測戦略により終端を得る方法を提供する。次に、二層格子に移動し、格子のノード上で許容される局所量子演算と古典的通信の現実的な制限の下で、その格子上に絡み合うパーコレーションを導入する。単層ハニカム格子に適用した場合、既存の手法で同じ現象が達成された場合よりも、実際の実現におけるノイズ効果の低減が求められる。さらに, 2層ハニカム格子に対しては, 現実的制約下での古典的エンタングルメントパーコレーションに対する量子エンタングルメントパーコレーションの利点を報告する。

The problem of establishing Bell and Greenberger-Horne-Zeilinger states between faraway places or distant nodes of a circuit is a difficult and extremely important one, and a strategy which addresses it is entanglement percolation. We provide a method for attaining the end through a quantum measurement strategy involving three-, two-, and single-qubit measurements on a single-layer honeycomb lattice of partially entangled pure bipartite entangled states. We then move over to a double-layered lattice, and introduce entanglement percolation on that lattice under a realistic restriction on local quantum operations and classical communication allowed on the nodes of the lattice. When applied to a single-layered honeycomb lattice, our strategy would call for fewer noise effects in an actual realization than when the same phenomenon is attained via existing methods. Moreover, for the double-layered honeycomb lattice, we report an advantage of quantum entanglement percolation over classical entanglement percolation under the realistic restriction.

翻訳日:2023-07-07 01:03:53 公開日:2023-07-04

# 非線形PTPチャネルを用いた高速量子状態判別

Fast quantum state discrimination with nonlinear PTP channels ( http://arxiv.org/abs/2111.05977v2 )

ライセンス: Link先を確認

Michael R. Geller

(参考訳) 決定論的正のトレース保存(PTP)チャネルと進化方程式に基づく非線形量子計算のモデルについて検討する。モデルは任意の有限ヒルベルト空間で定義されるが、主な結果は次元$N \! = \! 2$. 有界線型作用素 $X$ 上のすべての正規化可能線型あるいは非線形正写像 $\phi$ に対して、関連する正規化 PTP チャネル $ \phi(X) / {\rm tr}[\phi(X)]$ が存在する。正規化されたPTPチャネルは、相互作用するボソンに対するグロス=ピタエフスキー方程式のようなユニタリ平均場理論や、線形および非線形散逸のモデルを含む。それらは4つのタイプに分類され、計算力を探索する3種類の非線形性をもたらす。クビットの場合、これらのチャネルは以前に研究されたブロッホ球のねじれやその他の歪みをサポートし、そのような非線形性は1対のクビット状態の分離を増大させることで、状態判別の指数的なスピードアップを示唆している。このアイデアに基づいて、この操作を消散を用いて雑音に頑健にすることで、一対の固定点が本質的にフォールトトレラントな非線形状態判別器を生成する新しい位相への分岐を誘導することができると論じる。

We investigate models of nonlinear quantum computation based on deterministic positive trace-preserving (PTP) channels and evolution equations. The models are defined in any finite Hilbert space, but the main results are for dimension $N \! = \! 2$. For every normalizable linear or nonlinear positive map $\phi$ on bounded linear operators $X$, there is an associated normalized PTP channel $ \phi(X) / {\rm tr}[\phi(X)]$. Normalized PTP channels include unitary mean field theories, such as the Gross-Pitaevskii equation for interacting bosons, as well as models of linear and nonlinear dissipation. They classify into 4 types, yielding 3 distinct forms of nonlinearity whose computational power we explore. In the qubit case these channels support Bloch ball torsion and other distortions studied previously, where it has been shown that such nonlinearity can be used to increase the separation between a pair of close qubit states, suggesting an exponential speedup for state discrimination. Building on this idea, we argue that this operation can be made robust to noise by using dissipation to induce a bifurcation to a novel phase where a pair of attracting fixed points create an intrinsically fault-tolerant nonlinear state discriminator.
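
The normalization $\phi(X) / {\rm tr}[\phi(X)]$ described above can be illustrated with a small sketch. It assumes one particular positive map, $\phi(X) = K X K^\dagger$ with a non-unitary $2 \times 2$ matrix `K`; this choice, and the helper names `matmul`, `dagger`, and `normalized_channel`, are illustrative assumptions rather than constructions taken from the paper.

```python
def matmul(A, B):
    """Multiply two matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def dagger(A):
    """Conjugate transpose of a matrix given as nested lists."""
    return [[complex(A[j][i]).conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]


def normalized_channel(K, X):
    """Apply phi(X) = K X K^dagger, then divide by the trace of the result."""
    Y = matmul(matmul(K, X), dagger(K))
    tr = sum(Y[i][i] for i in range(len(Y)))
    return [[y / tr for y in row] for row in Y]


# Qubit example: damp the |1> amplitude of the state |+><+|.
K = [[1.0, 0.0], [0.0, 0.5]]
plus = [[0.5, 0.5], [0.5, 0.5]]
rho = normalized_channel(K, plus)
```

Because the divisor depends on the input state, the overall map is nonlinear in `X` even though `K X K^dagger` itself is linear, which is the sense in which a normalized channel can carry nonlinearity.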

翻訳日:2023-07-07 00:57:52 公開日:2023-07-04

# 過パラメータ化からの導出性:負のパーセプトロンの例

Tractability from overparametrization: The example of the negative perceptron ( http://arxiv.org/abs/2110.15824v3 )

ライセンス: Link先を確認

Andrea Montanari, Yiqiao Zhong, Kangjie Zhou

(参考訳) 負のパーセプトロン問題では、$n$ 個のデータ点 $({\boldsymbol x}_i,y_i)$ が与えられる。ここで ${\boldsymbol x}_i$ は $d$ 次元ベクトル、$y_i\in\{+1,-1\}$ は二値ラベルである。データは線形分離可能ではないため、可能な限り大きな負のマージンを持つ線形分類器を見つけることで満足する。言い換えれば、$\min_{i\le n}y_i\langle {\boldsymbol \theta},{\boldsymbol x}_i\rangle$ を最大化する単位ノルムベクトル ${\boldsymbol \theta}$ を求めたい。これは非凸最適化問題(ポリトープ内の最大ノルムベクトルを見つけることと等価)であり、データに対する2つのランダムモデルの下でその典型的な性質を調べる。$n,d\to \infty$ かつ $n/d\to\delta$ となる比例漸近を考え、最大マージン $\kappa_{\text{s}}(\delta)$、あるいは等価的にその逆関数 $\delta_{\text{s}}(\kappa)$ の上界と下界を証明する。言い換えると、$\delta_{\text{s}}(\kappa)$ は過パラメータ化しきい値である。すなわち、$n/d\le \delta_{\text{s}}(\kappa)-\varepsilon$ では訓練誤差が消失する分類器が高い確率で存在し、$n/d\ge \delta_{\text{s}}(\kappa)+\varepsilon$ では存在しない。$\delta_{\text{s}}(\kappa)$ に対する我々の上下界は、$\kappa\to -\infty$ で主要次数まで一致する。次に、解を求める線形計画アルゴリズムを解析し、対応するしきい値 $\delta_{\text{lin}}(\kappa)$ を特徴付ける。補間しきい値 $\delta_{\text{s}}(\kappa)$ と線形計画しきい値 $\delta_{\text{lin}}(\kappa)$ の間にギャップがあることを観察し、他のアルゴリズムの振る舞いに関する問題を提起する。

In the negative perceptron problem we are given $n$ data points $({\boldsymbol x}_i,y_i)$, where ${\boldsymbol x}_i$ is a $d$-dimensional vector and $y_i\in\{+1,-1\}$ is a binary label. The data are not linearly separable and hence we content ourselves to find a linear classifier with the largest possible \emph{negative} margin. In other words, we want to find a unit norm vector ${\boldsymbol \theta}$ that maximizes $\min_{i\le n}y_i\langle {\boldsymbol \theta},{\boldsymbol x}_i\rangle$. This is a non-convex optimization problem (it is equivalent to finding a maximum norm vector in a polytope), and we study its typical properties under two random models for the data. We consider the proportional asymptotics in which $n,d\to \infty$ with $n/d\to\delta$, and prove upper and lower bounds on the maximum margin $\kappa_{\text{s}}(\delta)$ or -- equivalently -- on its inverse function $\delta_{\text{s}}(\kappa)$. In other words, $\delta_{\text{s}}(\kappa)$ is the overparametrization threshold: for $n/d\le \delta_{\text{s}}(\kappa)-\varepsilon$ a classifier achieving vanishing training error exists with high probability, while for $n/d\ge \delta_{\text{s}}(\kappa)+\varepsilon$ it does not. Our bounds on $\delta_{\text{s}}(\kappa)$ match to the leading order as $\kappa\to -\infty$. We then analyze a linear programming algorithm to find a solution, and characterize the corresponding threshold $\delta_{\text{lin}}(\kappa)$. We observe a gap between the interpolation threshold $\delta_{\text{s}}(\kappa)$ and the linear programming threshold $\delta_{\text{lin}}(\kappa)$, raising the question of the behavior of other algorithms.
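
The objective in the abstract, $\min_{i\le n}y_i\langle {\boldsymbol \theta},{\boldsymbol x}_i\rangle$ for a unit-norm ${\boldsymbol \theta}$, can be evaluated directly. The sketch below only computes this margin for a given direction; it does not implement the linear programming algorithm analyzed in the paper, and the function name `margin` and the toy data are illustrative assumptions.

```python
import math


def margin(theta, X, y):
    """Return min_i y_i * <theta/||theta||, x_i> for labels y_i in {+1, -1}."""
    norm = math.sqrt(sum(t * t for t in theta))
    unit = [t / norm for t in theta]  # enforce the unit-norm constraint
    return min(yi * sum(u * xij for u, xij in zip(unit, xi))
               for xi, yi in zip(X, y))


# Two collinear points with opposite labels: not linearly separable,
# so every direction has a margin of at most zero.
X = [[1.0, 0.0], [0.5, 0.0]]
y = [+1, -1]
m_along = margin([2.0, 0.0], X, y)    # direction along the data
m_orth = margin([0.0, 1.0], X, y)     # direction orthogonal to the data
```

In this toy case the orthogonal direction attains margin 0 while the direction along the data attains -0.5, which shows how directions are ranked when no positive margin exists.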

翻訳日:2023-07-07 00:57:25 公開日:2023-07-04

# 確率勾配降下法における適応バッチサイズ選択戦略の等価性について

On the equivalence of different adaptive batch size selection strategies for stochastic gradient descent methods ( http://arxiv.org/abs/2109.10933v2 )

ライセンス: Link先を確認

Luis Espath, Sebastian Krumscheid, Raúl Tempone, Pedro Vilanova

(参考訳) 本研究では、$\epsilon^2=\theta^2+\nu^2$ であり $\theta$ と $\nu$ を特定の値に選んだ場合、確率的勾配降下 (SGD) 法に関連する収束率の観点から、ノルム検定と内積/直交性検定が等価であることを示す。ここで、$\epsilon$は勾配のノルムの相対統計誤差を制御し、$\theta$と$\nu$は勾配の方向と勾配の直交方向の相対統計誤差をそれぞれ制御する。さらに、$\theta$ と $\nu$ が最適に選択されれば、内積/直交性検定は最良の場合にはノルム検定と同程度に安価になりうるが、$\epsilon^2=\theta^2+\nu^2$ である限り、内積/直交性検定がノルム検定より計算的に安価になることはないことを示す。最後に、結果を説明するために2つの確率的最適化問題を示す。

In this study, we demonstrate that the norm test and inner product/orthogonality test presented in \cite{Bol18} are equivalent in terms of the convergence rates associated with Stochastic Gradient Descent (SGD) methods if $\epsilon^2=\theta^2+\nu^2$ with specific choices of $\theta$ and $\nu$. Here, $\epsilon$ controls the relative statistical error of the norm of the gradient while $\theta$ and $\nu$ control the relative statistical error of the gradient in the direction of the gradient and in the direction orthogonal to the gradient, respectively. Furthermore, we demonstrate that the inner product/orthogonality test can be as inexpensive as the norm test in the best case scenario if $\theta$ and $\nu$ are optimally selected, but the inner product/orthogonality test will never be more computationally affordable than the norm test if $\epsilon^2=\theta^2+\nu^2$. Finally, we present two stochastic optimization problems to illustrate our results.
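
As a rough illustration of how a norm test can drive batch size selection, the sketch below checks whether the sample variance of per-sample gradients, divided by the batch size, is at most $\epsilon^2$ times the squared norm of the batch gradient. This is the commonly used sample-based approximation of the norm test and is an assumption here; the abstract does not spell out the exact form, and `norm_test` and its return values are hypothetical names.

```python
import math


def norm_test(per_sample_grads, eps):
    """Return (passed, suggested_batch_size) for a sample-based norm test.

    per_sample_grads: list of gradient vectors, one per sample in the batch.
    eps: tolerance on the relative statistical error of the gradient norm.
    """
    b = len(per_sample_grads)
    d = len(per_sample_grads[0])
    mean = [sum(g[j] for g in per_sample_grads) / b for j in range(d)]
    # Unbiased sample variance of the per-sample gradients (summed over coordinates)
    var = sum(sum((g[j] - mean[j]) ** 2 for j in range(d))
              for g in per_sample_grads) / (b - 1)
    mean_sq = sum(m * m for m in mean)
    if mean_sq == 0.0:
        return False, b  # gradient estimate is zero; the test is uninformative
    passed = var / b <= eps ** 2 * mean_sq
    # Smallest batch size for which the inequality would hold with these estimates
    suggested = max(b, math.ceil(var / (eps ** 2 * mean_sq)))
    return passed, suggested


grads = [[1.0, 0.0], [3.0, 0.0]]
loose = norm_test(grads, eps=0.5)
tight = norm_test(grads, eps=0.25)
```

A smaller `eps` demands a more accurate gradient norm and therefore a larger batch; the inner product/orthogonality test discussed in the abstract splits this single tolerance into the two tolerances $\theta$ and $\nu$.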

翻訳日:2023-07-07 00:56:19 公開日:2023-07-04

# 半マルコフモデルを用いた適応前方シミュレーション時間(AFST)を用いたロボットナビゲーションの強化学習

Reinforcement Learning for Robot Navigation with Adaptive Forward Simulation Time (AFST) in a Semi-Markov Model ( http://arxiv.org/abs/2108.06161v4 )

ライセンス: Link先を確認

Yu'an Chen, Ruosong Ye, Ziyang Tao, Hongjian Liu, Guangda Chen, Jie Peng, Jun Ma, Yu Zhang, Jianmin Ji and Yanyong Zhang

(参考訳) 深部強化学習(DRL)アルゴリズムは、知覚入力を直接ロボット制御コマンドにマッピングすることで、特に未知の環境でロボットナビゲーションに有効であることが証明されている。しかし、既存の手法の多くはナビゲーションの局所的な最小問題を無視しており、複雑な未知の環境を扱えない。本稿では,適応フォワードシミュレーション時間 (AFST) と呼ばれる連続的な行動空間を持つ半マルコフ決定プロセス (SMDP) でモデル化されたDRLベースのナビゲーション手法を提案する。具体的には,動作空間の次元を小さくし,特定のSMDP問題に対する分散近似ポリシー最適化(DPPO)アルゴリズムを改良し,GAEを修正してSMDPのポリシー勾配をより正確に推定する。様々な未知環境における実験は、AFSTの有効性を示す。

Deep reinforcement learning (DRL) algorithms have proven effective in robot navigation, especially in unknown environments, by directly mapping perception inputs into robot control commands. However, most existing methods ignore the local minimum problem in navigation and thereby cannot handle complex unknown environments. In this paper, we propose the first DRL-based navigation method modeled by a semi-Markov decision process (SMDP) with continuous action space, named Adaptive Forward Simulation Time (AFST), to overcome this problem. Specifically, we reduce the dimensions of the action space and improve the distributed proximal policy optimization (DPPO) algorithm for the specified SMDP problem by modifying its GAE to better estimate the policy gradient in SMDPs. Experiments in various unknown environments demonstrate the effectiveness of AFST.
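
The abstract states that GAE is modified for the SMDP setting but does not give the formula. One plausible way to account for variable action durations, shown here purely as an assumption, is to replace the per-step discount $\gamma$ with $\gamma^{\tau_t}$, where $\tau_t$ is the duration of step $t$. The function name `smdp_gae` and its arguments are hypothetical and are not the authors' implementation.

```python
def smdp_gae(rewards, values, durations, gamma=0.99, lam=0.95):
    """Generalized advantage estimates with duration-dependent discounting.

    rewards:   reward collected during each (variable-length) step
    values:    value estimates, with one extra bootstrap entry at the end
    durations: time taken by each step (assumed: discount is gamma ** duration)
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        disc = gamma ** durations[t]
        # Temporal-difference residual with the duration-aware discount
        delta = rewards[t] + disc * values[t + 1] - values[t]
        running = delta + disc * lam * running
        advantages[t] = running
    return advantages


adv = smdp_gae([1.0, 1.0], [0.0, 0.0, 0.0], durations=[1, 2], gamma=0.5, lam=1.0)
```

When every duration equals 1 this reduces to standard GAE, so the sketch can be swapped into an existing PPO-style update for comparison.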

翻訳日:2023-07-07 00:55:56 公開日:2023-07-04

# 多様体上の最適化:シンプレクティックアプローチ

Optimization on manifolds: A symplectic approach ( http://arxiv.org/abs/2107.11231v2 )

ライセンス: Link先を確認

Guilherme França, Alessandro Barp, Mark Girolami, Michael I. Jordan

(参考訳) 統計的機械学習では最適化タスクが不可欠である。近年、動的システムからのツールを活用することで、連続時間システムの適切な離散化を通じて、加速的かつロバストな最適化手法を導出することに大きな関心が寄せられている。しかし、これらのアイデアは主にユークリッド空間や制約のない設定、あるいはリーマン勾配フローに限られている。本研究では, 非線形制約を伴う問題を含む滑らかな多様体上の最適化問題を解くための一般的な枠組みとして, ディラックの制約付きハミルトン系理論の散逸拡張を提案する。本研究では,「レートマッチング」である多様体上の幾何学的・漸近的数値積分器,すなわち連続時間収束率を保存する。特に,最適収束率を局所的に達成できる散逸型RATTLE積分器を提案する。我々の(加速された)アルゴリズムのクラスは単純で効率的なだけでなく、幅広いコンテキストに適用できる。

Optimization tasks are crucial in statistical machine learning. Recently, there has been great interest in leveraging tools from dynamical systems to derive accelerated and robust optimization methods via suitable discretizations of continuous-time systems. However, these ideas have mostly been limited to Euclidean spaces and unconstrained settings, or to Riemannian gradient flows. In this work, we propose a dissipative extension of Dirac's theory of constrained Hamiltonian systems as a general framework for solving optimization problems over smooth manifolds, including problems with nonlinear constraints. We develop geometric/symplectic numerical integrators on manifolds that are "rate-matching," i.e., preserve the continuous-time rates of convergence. In particular, we introduce a dissipative RATTLE integrator able to achieve optimal convergence rate locally. Our class of (accelerated) algorithms is not only simple and efficient but also applicable to a broad range of contexts.

翻訳日:2023-07-07 00:55:38 公開日:2023-07-04

# ビデオ超解像トランス

Video Super-Resolution Transformer ( http://arxiv.org/abs/2106.06847v3 )

ライセンス: Link先を確認

Jiezhang Cao, Yawei Li, Kai Zhang, Luc Van Gool

(参考訳) ビデオ超解像(VSR)は、高解像度映像を対応する低解像度バージョンから復元することを目的としており、時空間シーケンス予測問題である。近年,シークエンス・ツー・シーケンス・モデリングの並列計算能力により,Transformerが普及している。したがって、視覚変換器をVSRの解法に適用することは容易である。しかしながら、完全接続された自己接続層とトークン指向のフィードフォワード層を持つトランスの典型的なブロック設計は、以下の2つの理由からvsrには適さない。第一に、完全接続されたセルフアテンション層は、注意マップを計算するために線形層に依存するため、データの局所性を利用するのを怠る。第2に、トークンワイドフィードフォワード層は、VSRにとって重要な特徴アライメントを欠いている。本稿では,VSR に Transformer を適用するための最初の試みを行う。具体的には,まず,局所性情報を利用した理論的理解を伴う空間的時間的畳み込み自己認識層を提案する。第2の課題として,双方向光フロー型フィードフォワード層をデザインし,異なる映像フレーム間の相関を探索し,特徴を整合させる。いくつかのベンチマークデータセットに対する大規模な実験により,提案手法の有効性が示された。コードはhttps://github.com/caojiezhang/vsr-transformerで入手できる。

Video super-resolution (VSR), with the aim to restore a high-resolution video from its corresponding low-resolution version, is a spatial-temporal sequence prediction problem. Recently, Transformer has been gaining popularity due to its parallel computing ability for sequence-to-sequence modeling. Thus, it seems to be straightforward to apply the vision Transformer to solve VSR. However, the typical block design of Transformer with a fully connected self-attention layer and a token-wise feed-forward layer does not fit well for VSR due to the following two reasons. First, the fully connected self-attention layer neglects to exploit the data locality because this layer relies on linear layers to compute attention maps. Second, the token-wise feed-forward layer lacks the feature alignment which is important for VSR since this layer independently processes each of the input token embeddings without any interaction among them. In this paper, we make the first attempt to adapt Transformer for VSR. Specifically, to tackle the first issue, we present a spatial-temporal convolutional self-attention layer with a theoretical understanding to exploit the locality information. For the second issue, we design a bidirectional optical flow-based feed-forward layer to discover the correlations across different video frames and also align features. Extensive experiments on several benchmark datasets demonstrate the effectiveness of our proposed method. The code will be available at https://github.com/caojiezhang/VSR-Transformer.

翻訳日:2023-07-07 00:54:48 公開日:2023-07-04

# 構造バンドにおける固定予算ベストアーム同定

Fixed-Budget Best-Arm Identification in Structured Bandits ( http://arxiv.org/abs/2106.04763v8 )

ライセンス: Link先を確認

Mohammad Javad Azizi, Branislav Kveton and Mohammad Ghavamzadeh

(参考訳) 固定予算設定におけるベストアーム識別(BAI)は、学習エージェントが一定の回数の観測後に最適な(ベスト)腕を特定する確率を最大化する盗賊問題である。このトピックに関するほとんどの研究は、少数の腕を持つ非構造的な問題を研究し、適用性を制限する。結合一般化モデルから平均報酬推定値に基づいて、次々に最適なアームを除去することにより、構造を組み込んだ一般トラクタブルアルゴリズムを提案する。線形および一般化線形モデル(GLM)を用いてアルゴリズムを解析し,G-最適設計に基づく実践的実装を提案する。線形モデルでは,提案アルゴリズムは先行動作に対する競合誤差を保証し,少なくとも経験的にも動作する。 GLMでは、固定予算BAIの分析を行う最初の実用的なアルゴリズムである。

Best-arm identification (BAI) in a fixed-budget setting is a bandit problem where the learning agent maximizes the probability of identifying the optimal (best) arm after a fixed number of observations. Most works on this topic study unstructured problems with a small number of arms, which limits their applicability. We propose a general tractable algorithm that incorporates the structure, by successively eliminating suboptimal arms based on their mean reward estimates from a joint generalization model. We analyze our algorithm in linear and generalized linear models (GLMs), and propose a practical implementation based on a G-optimal design. In linear models, our algorithm has competitive error guarantees to prior works and performs at least as well empirically. In GLMs, this is the first practical algorithm with analysis for fixed-budget BAI.
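
The idea of successively eliminating suboptimal arms under a fixed budget can be sketched with plain sequential halving. This is a simplification: it estimates each arm's mean independently, whereas the algorithm in the abstract uses a joint generalization model and a G-optimal design, neither of which is implemented here. The names `sequential_halving` and `pull` are illustrative assumptions.

```python
import math


def sequential_halving(pull, num_arms, budget):
    """Return the index of the arm kept after successive elimination.

    pull:   callable taking an arm index and returning one observed reward
    budget: total number of observations, split evenly across rounds
            (at least one pull per surviving arm is always made)
    """
    survivors = list(range(num_arms))
    rounds = max(1, math.ceil(math.log2(num_arms)))
    for _round in range(rounds):
        if len(survivors) == 1:
            break
        pulls = max(1, budget // (len(survivors) * rounds))
        means = {a: sum(pull(a) for _ in range(pulls)) / pulls for a in survivors}
        # Keep the better half of the arms according to their empirical means
        survivors.sort(key=lambda a: means[a], reverse=True)
        survivors = survivors[: math.ceil(len(survivors) / 2)]
    return survivors[0]


# Noise-free toy rewards so the outcome is deterministic.
true_means = [0.1, 0.9, 0.5, 0.3]
best = sequential_halving(lambda a: true_means[a], num_arms=4, budget=100)
```

With structure, the per-arm means would instead come from a shared model fitted to all observations, which is what lets such methods scale to many arms.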

翻訳日:2023-07-07 00:54:24 公開日:2023-07-04

# サンプルモーメントを用いた密度推定のための非古典的パラメータ化

A Non-Classical Parameterization for Density Estimation Using Sample Moments ( http://arxiv.org/abs/2201.04786v5 )

ライセンス: Link先を確認

Guangyu Wu, Anders Lindquist

(参考訳) 確率密度推定は統計処理と信号処理の中心的な問題である。モーメント法は、密度推定の重要な手段であるが、それらは一般に、性能に大きく影響する、実現可能な関数の選択に強く依存している。本稿では,そのような関数の選択を必要としないサンプルモーメントを用いた密度推定のための非古典的パラメトリゼーションを提案する。パラメトリゼーションは、二乗ヘリンガー距離によって引き起こされ、その解は、データに依存しない単純な前もって存在し、一意な対象であることが証明され、凸最適化によって得られる。密度推定器の統計的特性と漸近誤差上界は、パワーモーメントによる推定器に対して提案される。信号処理タスクにおける密度推定器の応用について述べる。シミュレーション結果から, 推定器の性能を, いくつかの手法との比較により検証した。我々の知る限りでは、提案された推定器は、任意の偶数列までのパワーモーメントが標本モーメントと正確に一致し、真の密度は特定の関数クラスに収まらないと仮定される文学における最初のものである。

Probability density estimation is a core problem of statistics and signal processing. Moment methods are an important means of density estimation, but they are generally strongly dependent on the choice of feasible functions, which severely affects the performance. In this paper, we propose a non-classical parametrization for density estimation using sample moments, which does not require the choice of such functions. The parametrization is induced by the squared Hellinger distance; its solution is proved to exist and be unique subject to a simple prior that does not depend on the data, and can be obtained by convex optimization. Statistical properties of the density estimator, together with an asymptotic upper bound on its error, are given for the estimator based on power moments. Applications of the proposed density estimator in signal processing tasks are given. Simulation results validate the performance of the estimator by a comparison to several prevailing methods. To the best of our knowledge, the proposed estimator is the first one in the literature for which the power moments up to an arbitrary even order exactly match the sample moments, while the true density is not assumed to fall within specific function classes.
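
Two ingredients named in the abstract, sample power moments and the squared Hellinger distance, are easy to compute on their own. The sketch below shows both for illustration only; it uses the discrete form $H^2(p,q) = 1 - \sum_i \sqrt{p_i q_i}$ (one common normalization, assumed here) and does not implement the paper's parametrization or its convex optimization step. The function names are hypothetical.

```python
import math


def sample_moments(samples, max_order):
    """Return the sample power moments of orders 1..max_order."""
    n = len(samples)
    return [sum(x ** k for x in samples) / n for k in range(1, max_order + 1)]


def hellinger_sq(p, q):
    """Squared Hellinger distance between two discrete distributions."""
    return 1.0 - sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))


moments = sample_moments([1.0, 2.0, 3.0], max_order=2)
same = hellinger_sq([0.5, 0.5], [0.5, 0.5])
disjoint = hellinger_sq([1.0, 0.0], [0.0, 1.0])
```

An estimator of the kind described would search for a density whose power moments equal the values returned by `sample_moments` while staying close, in squared Hellinger distance, to a fixed prior.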

翻訳日:2023-07-07 00:44:25 公開日:2023-07-04

# 多目的ニューラルアーキテクチャ探索による解釈可能なモデル学習

Learning Interpretable Models Through Multi-Objective Neural Architecture Search ( http://arxiv.org/abs/2112.08645v4 )

ライセンス: Link先を確認

Zachariah Carmichael, Tim Moon, Sam Ade Jacobs

(参考訳) ディープラーニングの記念碑的な進歩は、さまざまな領域で前例のない成果をもたらしている。ディープニューラルネットワークのパフォーマンスは実行可能であるが、そのようなモデルのアーキテクチャ設計と解釈性は非自明である。ニューラルネットワークアーキテクチャの設計を自動化するために、ニューラルネットワークサーチ(NAS)が導入された。最近の進歩により、分散計算と新しい最適化アルゴリズムを活用することで、これらの手法はより実用的になった。しかし、解釈可能性のためにアーキテクチャを最適化する作業はほとんどない。そこで我々は,多目的分散NASフレームワークを提案し,タスク性能と「イントロスペクタビリティ」の両方を最適化する。我々は、非支配的なソート遺伝的アルゴリズム(NSGA-II)と説明可能なAI(XAI)技術を活用し、ドメインの専門家がより理解しやすいアーキテクチャに報いる。このフレームワークは複数の画像分類データセットで評価される。タスクエラーとイントロスペクタビリティを共同で最適化することで、許容可能なエラー内で実行する、より疎結合でデバッグ可能なアーキテクチャが実現できることを実証する。

Monumental advances in deep learning have led to unprecedented achievements across various domains. While the performance of deep neural networks is indubitable, the architectural design and interpretability of such models are nontrivial. Research has been introduced to automate the design of neural network architectures through neural architecture search (NAS). Recent progress has made these methods more pragmatic by exploiting distributed computation and novel optimization algorithms. However, there is little work in optimizing architectures for interpretability. To this end, we propose a multi-objective distributed NAS framework that optimizes for both task performance and "introspectability," a surrogate metric for aspects of interpretability. We leverage the non-dominated sorting genetic algorithm (NSGA-II) and explainable AI (XAI) techniques to reward architectures that can be better comprehended by domain experts. The framework is evaluated on several image classification datasets. We demonstrate that jointly optimizing for task error and introspectability leads to more disentangled and debuggable architectures that perform within tolerable error.

翻訳日:2023-07-07 00:43:54 公開日:2023-07-04

# カメラネットワークにおける人物検索を支援するクロスカメラトラジェクタ

Cross-Camera Trajectories Help Person Retrieval in a Camera Network ( http://arxiv.org/abs/2204.12900v3 )

ライセンス: Link先を確認

Xin Zhang and Xiaohua Xie and Jianhuang Lai and Wei-Shi Zheng

(参考訳) オーバラップしないカメラネットワークで撮影した複数のビデオからクエリを検索することに関心がある。既存の手法では、純粋な視覚的マッチングや時間的制約を考慮することが多いが、カメラネットワークの空間情報は無視する。この問題に対処するために,時間情報と空間情報を統合したクロスカメラトラジェクトリ生成に基づく歩行者検索フレームワークを提案する。本研究では,歩行者の歩行習慣とカメラ間の経路配置を統合し,協調確率分布を形成する新しいクロスカメラ時空間モデルを提案する。スパースサンプリングされた歩行者データを用いて、カメラネットワーク内のこのような時空間モデルを特定できる。時空間モデルに基づいて、クロスカメラトラジェクトリを条件付きランダム場モデルにより抽出し、制限された非負行列分解によりさらに最適化することができる。最後に,歩行者検索結果を改善するため,軌道再分類手法を提案する。本手法の有効性を検証するため,実際の監視シナリオにおいて,最初のクロスカメラ歩行者軌跡データセットであるPerson Trajectory Datasetを構築した。提案手法の有効性とロバスト性に関する広範な実験を行った。

We are concerned with retrieving a query person from multiple videos captured by a non-overlapping camera network. Existing methods often rely on purely visual matching or consider temporal constraints but ignore the spatial information of the camera network. To address this issue, we propose a pedestrian retrieval framework based on cross-camera trajectory generation, which integrates both temporal and spatial information. To obtain pedestrian trajectories, we propose a novel cross-camera spatio-temporal model that integrates pedestrians' walking habits and the path layout between cameras to form a joint probability distribution. Such a spatio-temporal model among a camera network can be specified using sparsely sampled pedestrian data. Based on the spatio-temporal model, cross-camera trajectories can be extracted by the conditional random field model and further optimized by restricted non-negative matrix factorization. Finally, a trajectory re-ranking technique is proposed to improve the pedestrian retrieval results. To verify the effectiveness of our method, we construct the first cross-camera pedestrian trajectory dataset, the Person Trajectory Dataset, in real surveillance scenarios. Extensive experiments verify the effectiveness and robustness of the proposed method.

翻訳日:2023-07-07 00:37:17 公開日:2023-07-04

# 有限パルス不完全なラマン断熱路の完全刺激

Perfect stimulated Raman adiabatic passage with imperfect finite-time pulses ( http://arxiv.org/abs/2204.05271v2 )

ライセンス: Link先を確認

Shruti Dogra and Gheorghe Sorin Paraoanu

(参考訳) 我々は,STImulated Raman Adiabatic Passage (STIRAP)において,完全な人口移動を実現する2つのガウスパルスドライブを適切に調整したシーケンスを示す。我々はストークスパルスとポンプパルスの最適乱れと相対配置に関する理論的解析を行った。さらに、与えられたパルス幅に対するプロトコルの電力と持続時間を得る。重要なことに、所望の忠実性の値を達成するために必要なプロトコルの期間は、不忠実性の対数的のみに依存する。ドライブの最適切断を前提とし、高速転送のポイントを参考に、非常に単純で効果的である新しい断熱性基準を得る。

We present a well-tailored sequence of two Gaussian-pulsed drives that achieves perfect population transfer in STImulated Raman Adiabatic Passage (STIRAP). We give a theoretical analysis of the optimal truncation and relative placement of the Stokes and pump pulses. Further, we obtain the power and the duration of the protocol for a given pulse width. Importantly, the duration of the protocol required to attain a desired value of fidelity depends only logarithmically on the infidelity. Subject to optimal truncation of the drives and with reference to the point of fastest transfer, we obtain a new adiabaticity criterion, which is remarkably simple and effective.

翻訳日:2023-07-07 00:36:58 公開日:2023-07-04

# 単純非パラメトリック混合学習の硬さに関する厳密な境界

Tight Bounds on the Hardness of Learning Simple Nonparametric Mixtures ( http://arxiv.org/abs/2203.15150v3 )

ライセンス: Link先を確認

Bryon Aragam, Wai Ming Tai

(参考訳) 有限混合における非パラメトリック分布の学習問題を研究し、そのようなモデルの成分分布を学習するためのサンプル複雑性について厳密な境界を確立する。すなわち、pdf $f$ からの i.i.d. サンプルが与えられる。ここで $$ f=w_1f_1+w_2f_2, \quad w_1+w_2=1, \quad w_1,w_2>0 $$ であり、各成分 $f_i$ を学習したい。$f_i$ に何の仮定も置かなければ、この問題は不良設定である。成分 $f_i$ を識別するために、各 $f_i$ はガウス分布とコンパクトな台を持つ密度 $\nu_i$ との畳み込みとして書け、$\text{supp}(\nu_1)\cap \text{supp}(\nu_2)=\emptyset$ を満たすと仮定する。主な結果は、各 $f_i$ を推定するために $(\frac{1}{\varepsilon})^{\Omega(\log\log \frac{1}{\varepsilon})}$ 個のサンプルが必要であることを示す。証明は、ガウス分布による高速な近似レートを与える定量的タウバー型定理に基づいており、これはそれ自体興味深いものでありうる。この下界が厳密であることを示すために、各 $f_i$ を $(\frac{1}{\varepsilon})^{O(\log\log \frac{1}{\varepsilon})}$ 個のサンプルで推定するアルゴリズムも提案する。モーメントマッチングやテンソル法に基づく潜在変数モデル学習の既存アプローチとは異なり、我々の証明は直交関数を用いた悪条件線形系の繊細な解析を伴う。これらの境界を組み合わせることで、この問題の最適サンプル複雑性は多項式と指数関数の真に中間にあると結論づける。これは学習理論では一般的ではない。

We study the problem of learning nonparametric distributions in a finite mixture, and establish tight bounds on the sample complexity for learning the component distributions in such models. Namely, we are given i.i.d. samples from a pdf $f$ where $$ f=w_1f_1+w_2f_2, \quad w_1+w_2=1, \quad w_1,w_2>0 $$ and we are interested in learning each component $f_i$. Without any assumptions on $f_i$, this problem is ill-posed. In order to identify the components $f_i$, we assume that each $f_i$ can be written as a convolution of a Gaussian and a compactly supported density $\nu_i$ with $\text{supp}(\nu_1)\cap \text{supp}(\nu_2)=\emptyset$. Our main result shows that $(\frac{1}{\varepsilon})^{\Omega(\log\log \frac{1}{\varepsilon})}$ samples are required for estimating each $f_i$. The proof relies on a quantitative Tauberian theorem that yields a fast rate of approximation with Gaussians, which may be of independent interest. To show this is tight, we also propose an algorithm that uses $(\frac{1}{\varepsilon})^{O(\log\log \frac{1}{\varepsilon})}$ samples to estimate each $f_i$. Unlike existing approaches to learning latent variable models based on moment-matching and tensor methods, our proof instead involves a delicate analysis of an ill-conditioned linear system via orthogonal functions. Combining these bounds, we conclude that the optimal sample complexity of this problem properly lies in between polynomial and exponential, which is not common in learning theory.

翻訳日:2023-07-07 00:36:47 公開日:2023-07-04

# 超伝導回路における超断熱的集団移動のスケーリング誤差下でのロバスト性の実験的実証

Experimental demonstration of robustness under scaling errors for superadiabatic population transfer in a superconducting circuit ( http://arxiv.org/abs/2203.12073v2 )

ライセンス: Link先を確認

Shruti Dogra, Antti Vepsäläinen, and Gheorghe Sorin Paraoanu

(参考訳) 超断熱的Raman adiabatic passage (saSTIRAP) を用いて, トランスモン回路の基底状態と第2励起状態の間の集団移動を実験的に理論的に検討した。パルスの振幅の変動(スケーリング誤差)に対して、転送が著しく耐性があることを示し、超断熱過程が断熱過程からある種の強靭性特徴を継承することを示す。特に,sastirapの枠組みを超越した反断熱パルス強度の高値に出現する新しい高原の存在を実証した。

We study experimentally and theoretically the transfer of population between the ground state and the second excited state in a transmon circuit by the use of superadiabatic stimulated Raman adiabatic passage (saSTIRAP). We show that the transfer is remarkably resilient against variations in the amplitudes of the pulses (scaling errors), thus demonstrating that the superadiabatic process inherits certain robustness features from the adiabatic one. In particular, we provide evidence of a new plateau that appears at high values of the counterdiabatic pulse strength, which goes beyond the usual framework of saSTIRAP.

翻訳日:2023-07-07 00:35:20 公開日:2023-07-04

# 調整表面符号の回路ノイズの復号化と脆弱境界の改善

Improved decoding of circuit noise and fragile boundaries of tailored surface codes ( http://arxiv.org/abs/2203.04948v5 )

ライセンス: Link先を確認

Oscar Higgott, Thomas C. Bohdanowicz, Aleksander Kubica, Steven T. Flammia, Earl T. Campbell

(参考訳) 量子計算の可能性を最大限に発揮するには、量子誤差補正(qec)が必要である。 QEC符号は、複数のノイズのある物理量子ビットを使用して、より少ない論理量子ビットで情報を符号化し、復号処理による誤りの識別を可能にする。このプロセスは論理的忠実度(または精度)を高め、計算をより信頼性を高める。しかし、ほとんどの高速(効率的なランタイム)デコーダは重要なノイズ特性を無視し、精度を低下させる。本研究では,高速かつ高精度なデコーダを導入し,表面コードを含む多種多様なQECコードで使用することができる。我々のデコーダは、信仰マッチングと信念フィンドと呼ばれ、すべてのノイズ情報を活用し、QECの高精度なデモを解き放つ。性能指標として表面符号閾値を用いると、デコーダの誤差確率0.94\%でしきい値が観測され、標準の最小値完全整合デコーダの閾値0.82\%を上回った。また、バイアスノイズモデルに適した符号の理論的ケーススタディにおいて、信念マッチングデコーダを検証した。このデコーダは, 標準の正方形曲面符号に対して, 整形曲面符号において, より高いしきい値と低い量子ビットオーバーヘッドをもたらすことがわかった。驚くべきことに、十分に低いしきい値のシステムでは、私たちが"脆弱な境界"と呼ぶ以前は気付かなかった現象のために、矩形の表面コードは、調整された表面コードよりもリソース効率が向上します。我々のデコーダは他の全ての高速デコーダをしきい値と精度で上回り、現在の量子誤り訂正実験でより良い結果をもたらし、理論的なケーススタディのための新しい領域を開くことができる。

Realizing the full potential of quantum computation requires quantum error correction (QEC), with most recent breakthrough demonstrations of QEC using the surface code. QEC codes use multiple noisy physical qubits to encode information in fewer logical qubits, enabling the identification of errors through a decoding process. This process increases the logical fidelity (or accuracy) making the computation more reliable. However, most fast (efficient runtime) decoders neglect important noise characteristics, thereby reducing their accuracy. In this work, we introduce decoders that are both fast and accurate, and can be used with a wide class of QEC codes including the surface code. Our decoders, named belief-matching and belief-find, exploit all noise information and thereby unlock higher accuracy demonstrations of QEC. Using the surface code threshold as a performance metric, we observe a threshold at 0.94\% error probability for our decoders, outperforming the 0.82\% threshold for a standard minimum-weight perfect matching decoder. We also tested our belief-matching decoders in a theoretical case study of codes tailored to a biased noise model. We find that the decoders led to a much higher threshold and lower qubit overhead in the tailored surface code with respect to the standard, square surface code. Surprisingly, in the well-below threshold regime, the rectangular surface code becomes more resource-efficient than the tailored surface code, due to a previously unnoticed phenomenon that we call "fragile boundaries". Our decoders outperform all other fast decoders in terms of threshold and accuracy, enabling better results in current quantum error correction experiments and opening up new areas for theoretical case studies.

翻訳日:2023-07-07 00:34:47 公開日:2023-07-04

# 単純後悔最小化のためのメタラーニング

Meta-Learning for Simple Regret Minimization ( http://arxiv.org/abs/2202.12888v2 )

ライセンス: Link先を確認

Mohammadjavad Azizi, Branislav Kveton, Mohammad Ghavamzadeh, Sumeet Katariya

(参考訳) バンディットにおける単純後悔の最小化のためのメタラーニングフレームワークを開発する。このフレームワークでは、学習エージェントが未知の事前分布から独立同分布でサンプルされた一連のバンディットタスクと相互作用し、将来のタスクをよりよく実行できるようにメタパラメータを学習する。本稿では、この設定に対する初のベイズ的および頻度論的メタ学習アルゴリズムを提案する。ベイズアルゴリズムはメタパラメータ上の事前分布にアクセスでき、ホライズン$n$の$m$個のバンディットタスクに対するメタ単純後悔はわずか$\tilde{O}(m / \sqrt{n})$である。一方、頻度論的アルゴリズムのメタ単純後悔は$\tilde{O}(\sqrt{m} n + m/ \sqrt{n})$である。後悔は悪化するが、メタパラメータ上の事前分布を必要としないため、頻度論的アルゴリズムはより一般的である。また、より多くの設定で解析することもできる。我々のアルゴリズムをいくつかのバンディット問題のクラスに対して具体化する。我々のアルゴリズムは一般的であり、いくつかの環境で経験的に評価することで理論を補完する。

We develop a meta-learning framework for simple regret minimization in bandits. In this framework, a learning agent interacts with a sequence of bandit tasks, which are sampled i.i.d. from an unknown prior distribution, and learns its meta-parameters to perform better on future tasks. We propose the first Bayesian and frequentist meta-learning algorithms for this setting. The Bayesian algorithm has access to a prior distribution over the meta-parameters and its meta simple regret over $m$ bandit tasks with horizon $n$ is merely $\tilde{O}(m / \sqrt{n})$. On the other hand, the meta simple regret of the frequentist algorithm is $\tilde{O}(\sqrt{m} n + m/ \sqrt{n})$. While its regret is worse, the frequentist algorithm is more general because it does not need a prior distribution over the meta-parameters. It can also be analyzed in more settings. We instantiate our algorithms for several classes of bandit problems. Our algorithms are general and we complement our theory by evaluating them empirically in several environments.
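
To make the gap between the two regret rates concrete, the following minimal sketch evaluates the leading terms quoted in the abstract for given $m$ and $n$. It ignores the constants and logarithmic factors hidden by $\tilde{O}$, so the numbers are only indicative of scaling; the function names are illustrative and do not come from the paper.

```python
import math


def bayes_rate(m, n):
    """Leading term m / sqrt(n) of the Bayesian bound (constants and logs dropped)."""
    return m / math.sqrt(n)


def frequentist_rate(m, n):
    """Leading terms sqrt(m) * n + m / sqrt(n) of the frequentist bound."""
    return math.sqrt(m) * n + m / math.sqrt(n)


if __name__ == "__main__":
    for m, n in [(100, 10000), (10000, 100)]:
        print(m, n, bayes_rate(m, n), frequentist_rate(m, n))
```

The frequentist rate contains the Bayesian term plus an extra $\sqrt{m}\,n$ term, which matches the abstract's statement that its regret is worse.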

翻訳日:2023-07-07 00:34:18 公開日:2023-07-04

# シャッフルチェックによるプライバシ増幅

Privacy Amplification via Shuffled Check-Ins ( http://arxiv.org/abs/2206.03151v2 )

ライセンス: Link先を確認

Seng Pei Liew, Satoshi Hasegawa, Tsubasa Takahashi

(参考訳) 我々は、信頼できるシャッフル器以上の信頼の仮定を必要とせずに強力なプライバシー保証を実現する、shuffled check-inと呼ばれる分散計算プロトコルについて検討する。既存のほとんどの作業とは異なり、シャッフルチェックインにより、クライアントは独立してランダムに計算に参加できるようになり、サーバ初期化サブサンプリングの必要性がなくなる。差分プライバシーを活用することで、シャッフルチェックインはプライバシーの増幅を通じて厳密なプライバシー保証を実現することを示し、既存の作業よりもプライバシー会計を改善するR{\'e}nyi差分プライバシーに基づく新たな分析を行った。また,本論文のローカル/シャッフルモデルにおける分散環境下での汎用メカニズムの最初の評価であるガウス機構を含む,汎用シャッフル機構のプライバシを追跡する数値的手法を導入する。提案手法の有効性を示す実証的研究も行われている。

We study a protocol for distributed computation called shuffled check-in, which achieves strong privacy guarantees without requiring any further trust assumptions beyond a trusted shuffler. Unlike most existing work, shuffled check-in allows clients to make independent and random decisions to participate in the computation, removing the need for server-initiated subsampling. Leveraging differential privacy, we show that shuffled check-in achieves tight privacy guarantees through privacy amplification, with a novel analysis based on Rényi differential privacy that improves privacy accounting over existing work. We also introduce a numerical approach to track the privacy of generic shuffling mechanisms, including the Gaussian mechanism, which is the first evaluation of a generic mechanism under the distributed setting within the local/shuffle model in the literature. Empirical studies are also given to demonstrate the efficacy of the proposed approach.
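
The protocol flow described above can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the local randomizer is binary randomized response (the abstract does not fix one), and `check_in_prob` and `epsilon0` are hypothetical parameter names. It performs no privacy accounting.

```python
import math
import random


def randomized_response(bit, epsilon0, rng):
    """Keep the true bit with probability e^eps0 / (1 + e^eps0), otherwise flip it."""
    keep_prob = math.exp(epsilon0) / (1.0 + math.exp(epsilon0))
    return bit if rng.random() < keep_prob else 1 - bit


def shuffled_check_in(client_bits, check_in_prob, epsilon0, rng):
    """Each client decides independently whether to participate, randomizes
    its bit locally, and the trusted shuffler permutes the collected reports."""
    reports = []
    for bit in client_bits:
        if rng.random() < check_in_prob:  # client-side decision, no server subsampling
            reports.append(randomized_response(bit, epsilon0, rng))
    rng.shuffle(reports)  # the shuffler hides which client sent which report
    return reports


if __name__ == "__main__":
    rng = random.Random(0)
    print(shuffled_check_in([1, 0, 1, 1, 0, 0, 1], 0.5, 1.0, rng))
```

The number of reports is itself random here, which is the feature that distinguishes check-in from a fixed-size, server-chosen subsample.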

翻訳日:2023-07-07 00:24:44 公開日:2023-07-04

# ニューラルネットワークによるwehrlモーメントによる絡み合いの幾何測度の推定

Estimation of the geometric measure of entanglement with Wehrl Moments through Artificial Neural Networks ( http://arxiv.org/abs/2205.15095v3 )

ライセンス: Link先を確認

Jérôme Denis, François Damanet, John Martin

(参考訳) 近年、ニューラルネットワーク(anns)は、量子論、特に絡み合い理論の問題を研究するためのツールとして人気が高まっている。本研究では、入力として限られた数のwehrlモーメント(状態のフシミ関数のモーメント)のみを使用して、対称多量子状態の絡み合いの幾何学的測度をannがどの程度正確に予測できるかを分析し、状態に関する部分的情報を表現する。純粋な量子状態と混合量子状態の両方を考える。我々は、ANNを訓練して得られる結果と収束加速法を情報利用した結果を比較する。我々は、最も強力な収束加速アルゴリズムのいくつかでさえ、これらのANNを訓練するのに十分なデータが得られることを条件として、同じ入力データを与えられた場合、ANNと競合しないことがわかった。また,状態に依存しないwehrlモーメントを計測するための実験プロトコルを提供する。より一般に、この研究は、フル状態トモグラフィーよりも実験でより利用しやすい方法で、絡み合い測度と、Wehrlエントロピーのような他のSU(2)不変量の推定の視点を開く。

In recent years, artificial neural networks (ANNs) have become an increasingly popular tool for studying problems in quantum theory, and in particular entanglement theory. In this work, we analyse to what extent ANNs can accurately predict the geometric measure of entanglement of symmetric multiqubit states using only a limited number of Wehrl moments (moments of the Husimi function of the state) as input, which represents partial information about the state. We consider both pure and mixed quantum states. We compare the results obtained by training ANNs with those obtained through the informed use of convergence acceleration methods. We find that even some of the most powerful convergence acceleration algorithms do not compete with ANNs when given the same input data, provided that enough data is available to train these ANNs. We also provide an experimental protocol for measuring Wehrl moments, which is state-independent. More generally, this work opens up perspectives for the estimation of entanglement measures and other SU(2)-invariant quantities, such as the Wehrl entropy, in a way that is more accessible in experiments than by means of full state tomography.

翻訳日:2023-07-07 00:24:27 公開日:2023-07-04

# メッセージパッシングニューラルネットワークは、知識グラフの補完に役立つか?

Are Message Passing Neural Networks Really Helpful for Knowledge Graph Completion? ( http://arxiv.org/abs/2205.10652v3 )

ライセンス: Link先を確認

Juanhui Li and Harry Shomer and Jiayuan Ding and Yiqi Wang and Yao Ma and Neil Shah and Jiliang Tang and Dawei Yin

(参考訳) 知識グラフ(KG)は様々な応用を促進する。構築とメンテナンスに多大な努力が払われてきたにもかかわらず、最大のKGでさえ完成にはほど遠い。したがって、KG補完(KGC)はKG研究において最も重要な課題の一つとなっている。近年、この分野の多くの研究は、強力な埋め込みを学習するためのメッセージパッシング(グラフ)ニューラルネットワーク(MPNN)の利用を中心に行われている。これらの手法の成功は、追加のメッセージパッシング(MP)コンポーネントを持つことから、より単純な多層パーセプトロン(MLP)モデルではなくMPNNを使うことに起因すると自然に考えられている。本研究では、驚くべきことに、単純なMLPモデルでMPNNに匹敵する性能を達成できることがわかり、MPが以前信じられていたほど重要でない可能性が示唆された。さらに、スコアリング関数と損失関数の注意深い設計がKGCモデルの性能にはるかに強く影響することを示す。これは、先行研究においてスコアリング関数設計、損失関数設計、MPが混同されていたことを示唆しており、現在の最先端KGC手法のスケーラビリティに関する有望な洞察と、将来のKGCタスクにより適したMP設計への注意を促すものである。私たちのコードは https://github.com/Juanhui28/Are_MPNNs_helpful で公開されています。

Knowledge graphs (KGs) facilitate a wide variety of applications. Despite great efforts in creation and maintenance, even the largest KGs are far from complete. Hence, KG completion (KGC) has become one of the most crucial tasks for KG research. Recently, considerable literature in this space has centered around the use of Message Passing (Graph) Neural Networks (MPNNs) to learn powerful embeddings. The success of these methods is naturally attributed to the use of MPNNs over simpler multi-layer perceptron (MLP) models, given their additional message passing (MP) component. In this work, we find that, surprisingly, simple MLP models are able to achieve comparable performance to MPNNs, suggesting that MP may not be as crucial as previously believed. With further exploration, we show that careful scoring function and loss function design has a much stronger influence on KGC model performance. This suggests a conflation of scoring function design, loss function design, and MP in prior work, with promising insights regarding the scalability of state-of-the-art KGC methods today, as well as careful attention to more suitable MP designs for KGC tasks tomorrow. Our code is publicly available at: https://github.com/Juanhui28/Are_MPNNs_helpful.

翻訳日:2023-07-07 00:23:25 公開日:2023-07-04

# 弱結合分子の振動ラダー脱落光安定化:遺伝的アルゴリズムによる量子最適制御

Vibrational ladder-descending photostabilization of a weakly bound molecule: Quantum optimal control with a genetic algorithm ( http://arxiv.org/abs/2205.06165v2 )

ライセンス: Link先を確認

Mateo Londoño, Julio C. Arce

(参考訳) 極性分子を、同一電子状態内で高い振動準位から目標とする低い振動準位へ駆動する光制御方式を提案する。この方式は、解析的な形状の赤外線チャープレーザーパルスを使用し、そのパラメータは遺伝的アルゴリズムに基づく量子最適制御のヒューリスティックな定式化によって最適化される。この手法を、最低三重項電子状態にあるKRbフェッシュバッハ分子について計算により例証する。

We propose an optical control scheme for driving a polar molecule from a high-lying vibrational level to a target low-lying one, within the same electronic state. The scheme utilizes an infrared chirped laser pulse with an analytical shape, whose parameters are optimized by means of a heuristic formulation of quantum optimal control based on a genetic algorithm. We illustrate this methodology computationally for a KRb Feshbach molecule in the lowest triplet electronic state.

翻訳日:2023-07-07 00:23:03 公開日:2023-07-04

# 安全・共安全言語の一階述語論理

A first-order logic characterization of safety and co-safety languages ( http://arxiv.org/abs/2209.02307v4 )

ライセンス: Link先を確認

Alessandro Cimatti and Luca Geatti and Nicola Gigante and Angelo Montanari and Stefano Tonetta

(参考訳) LTL(Linear Temporal Logic)は、コンピュータ科学の様々な分野において、最も一般的な時間論理の1つである。 LTL は反自由オメガオートマタ、星のないオメガ正規表現、(カンプの定理により)一階線形順序理論(FO-TLO)と等価である。安全性(safety)とコセーフティ(co-safety)言語は、単語がそれぞれ言語に属さないか属さないかを確立するために有限プレフィックスが十分であり、モデル検査やltlのリアクティブ合成のような問題の複雑さを低下させる上で重要な役割を果たす。 SafetyLTL (resp., coSafetyLTL) はLTLの断片であり、安全(resp., co-safety)言語のみを認識する普遍的(resp., existential)時間的モダリティのみを許容する。この論文の主な貢献は、safetyfoと呼ばれるfo-tloの断片と、ltl-definable safetyとco-safety languageに関して表現的に完結した2つのcosafetyfoの導入である。我々は,これらがそれぞれSafetyLTLとcoSafetyLTLを正確に特徴付けることを証明し,その結果がカンプの定理に一致することを証明し,一階言語の観点からLTLの特徴付け(フラグメント)をより明確にする。さらに、ltlで定義可能な安全言語がsafetyltlでも定義可能であることを直接的でコンパクトで自己完結した証明を与える。副産物として,有限語および無限語で解釈された,明日の弱作用素SafetyLTLの表現力に関する興味深い結果が得られる。さらに、有限語を解釈すると、明日の(弱明日)演算子を欠いたsafetyltl (resp. cosafetyltl) が有限語上のltlの安全(resp., co-safety)フラグメントをキャプチャする。

Linear Temporal Logic (LTL) is one of the most popular temporal logics and comes into play in a variety of branches of computer science. Among the reasons for its widespread use are its strong foundational properties: LTL is equivalent to counter-free omega-automata, to star-free omega-regular expressions, and (by Kamp's theorem) to the First-Order Theory of Linear Orders (FO-TLO). Safety and co-safety languages, where a finite prefix suffices to establish whether a word does not belong or belongs to the language, respectively, play a crucial role in lowering the complexity of problems like model checking and reactive synthesis for LTL. SafetyLTL (resp., coSafetyLTL) is a fragment of LTL where only universal (resp., existential) temporal modalities are allowed, and it recognises safety (resp., co-safety) languages only. The main contribution of this paper is the introduction of a fragment of FO-TLO, called SafetyFO, and of its dual coSafetyFO, which are expressively complete with respect to the LTL-definable safety and co-safety languages. We prove that they exactly characterize SafetyLTL and coSafetyLTL, respectively, a result that joins Kamp's theorem, and provides a clearer view of the characterization of (fragments of) LTL in terms of first-order languages. In addition, it gives a direct, compact, and self-contained proof that any safety language definable in LTL is definable in SafetyLTL as well. As a by-product, we obtain some interesting results on the expressive power of the weak tomorrow operator of SafetyLTL, interpreted over finite and infinite words. Moreover, we prove that, when interpreted over finite words, SafetyLTL (resp., coSafetyLTL) devoid of the tomorrow (resp., weak tomorrow) operator captures the safety (resp., co-safety) fragment of LTL over finite words.
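
The defining property of safety and co-safety languages, that a finite prefix settles membership, can be illustrated with the two simplest formulas. The sketch below is only an illustration and is not taken from the paper: it assumes words are sequences of sets of atomic propositions, and it handles just the safety formula `G !p` and the co-safety formula `F p`. The helper names are hypothetical.

```python
def first_index_with(prefix, prop):
    """Return the index of the first letter containing `prop`, or None."""
    for i, letter in enumerate(prefix):
        if prop in letter:
            return i
    return None


def is_bad_prefix_for_always_not(prefix, prop):
    """Safety formula G !prop: a prefix containing `prop` rules out every extension."""
    return first_index_with(prefix, prop) is not None


def is_good_prefix_for_eventually(prefix, prop):
    """Co-safety formula F prop: a prefix containing `prop` accepts every extension."""
    return first_index_with(prefix, prop) is not None


if __name__ == "__main__":
    word_prefix = [{"req"}, {"req", "error"}, set()]
    print(is_bad_prefix_for_always_not(word_prefix, "error"))
    print(is_good_prefix_for_eventually(word_prefix, "grant"))
```

The two checks share the same code, which reflects the duality between the two classes: a bad prefix for `G !p` is exactly a good prefix for `F p`.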

翻訳日:2023-07-07 00:18:12 公開日:2023-07-04

# 難解な学習戦略による動的データフリー知識蒸留

Dynamic Data-Free Knowledge Distillation by Easy-to-Hard Learning Strategy ( http://arxiv.org/abs/2208.13648v3 )

ライセンス: Link先を確認

Jingru Li, Sheng Zhou, Liangcheng Li, Haishuai Wang, Zhi Yu, Jiajun Bu

(参考訳) data-free knowledge distillation (dfkd) は、トレーニングデータが利用できない知識蒸留戦略 (kd) である。訓練データにアクセスせずに、大きな事前訓練された教師モデルの助けを借りて、軽量の学生モデルを訓練する。しかし,既存のdfkd法は,学習中の学習モデルの状態に応じて動的に生成目標を調整することができないため,不適切な不安定なトレーニングプロセスに苦しむ。この制限に対処するため,CuDFKDと呼ばれる新しいDFKD法を提案する。生徒に、人間が学習する方法を反映して、徐々に難解な疑似サンプルを生成するダイナミックな戦略を教える。また、CuDFKDは、学生モデルの状態に応じて生成対象を動的に適応させる。さらに, 大規模化最小化(MM)アルゴリズムの理論解析を行い, CuDFKDの収束性を説明する。 DFKD手法のロバスト性および忠実性を評価するために,CuDFKDがすべてのデータセットにおける最先端(SOTA)DFKD手法に匹敵する性能を持つことを示す実験を行った。また、我々のCuDFKDは、他のSOTA DFKD法よりも早く収束し、最も堅牢であることを示す。

Data-free knowledge distillation (DFKD) is a widely used strategy for knowledge distillation (KD) when the training data is not available. It trains a lightweight student model with the aid of a large pretrained teacher model without any access to training data. However, existing DFKD methods suffer from an inadequate and unstable training process, as they do not adjust the generation target dynamically based on the status of the student model during learning. To address this limitation, we propose a novel DFKD method called CuDFKD. It teaches students by a dynamic strategy that gradually generates easy-to-hard pseudo samples, mirroring how humans learn. Besides, CuDFKD adapts the generation target dynamically according to the status of the student model. Moreover, we provide a theoretical analysis of the majorization minimization (MM) algorithm and explain the convergence of CuDFKD. To measure the robustness and fidelity of DFKD methods, we propose two more metrics, and experiments show that CuDFKD has comparable performance to state-of-the-art (SOTA) DFKD methods on all datasets. Experiments also show that CuDFKD has the fastest convergence and best robustness among SOTA DFKD methods.

翻訳日:2023-07-07 00:17:11 公開日:2023-07-04

# SFusion: 自己注意に基づくN対1マルチモーダル核融合ブロック

SFusion: Self-attention based N-to-One Multimodal Fusion Block ( http://arxiv.org/abs/2208.12776v2 )

ライセンス: Link先を確認

Zecheng Liu and Jia Wei and Rui Li and Jianlong Zhou

(参考訳) 人々は、視覚、聴覚、嗅覚、触覚など、異なる感覚で世界を知覚する。複数のモダリティから情報を処理し、融合することで、人工知能は私たちの周りの世界をより簡単に理解できるようになる。しかし、モダリティが欠けている場合、利用可能なモダリティの数は様々な状況で異なるため、n対1の融合問題に繋がる。そこで本研究では,SFusionと呼ばれる自己注意型核融合ブロックを提案する。プリセットの定式化や畳み込みに基づく方法とは異なり、提案するブロックは自動的に、合成やゼロパディングの欠如なく利用可能なモダリティを融合することを学習する。具体的には、上流処理モデルから抽出された特徴表現をトークンとして投影し、セルフアテンションモジュールに供給して潜在マルチモーダル相関を生成する。次に、下流決定モデルで適用可能な共有表現を構築するために、モーダル注意機構を導入する。提案したSFusionは,既存のマルチモーダル解析ネットワークに容易に統合できる。本研究では,SFusionを異なるバックボーンネットワークに適用し,ヒトの活動認識と脳腫瘍のセグメンテーションを行う。実験の結果,SFusionブロックは競合する融合戦略よりも優れた性能を示すことがわかった。私たちのコードはhttps://github.com/scut-cszcl/sfusionで利用可能です。

People perceive the world with different senses, such as sight, hearing, smell, and touch. Processing and fusing information from multiple modalities enables Artificial Intelligence to understand the world around us more easily. However, when there are missing modalities, the number of available modalities is different in diverse situations, which leads to an N-to-One fusion problem. To solve this problem, we propose a self-attention based fusion block called SFusion. Unlike preset formulations or convolution based methods, the proposed block automatically learns to fuse available modalities without synthesizing or zero-padding missing ones. Specifically, the feature representations extracted from the upstream processing model are projected as tokens and fed into a self-attention module to generate latent multimodal correlations. Then, a modal attention mechanism is introduced to build a shared representation, which can be applied by the downstream decision model. The proposed SFusion can be easily integrated into existing multimodal analysis networks. In this work, we apply SFusion to different backbone networks for human activity recognition and brain tumor segmentation tasks. Extensive experimental results show that the SFusion block achieves better performance than the competing fusion strategies. Our code is available at https://github.com/scut-cszcl/SFusion.
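
The N-to-One idea, a fused representation whose size does not depend on how many modalities are present, can be sketched in a few lines. This is a heavily simplified toy, not the SFusion block itself: it uses scaled dot-product self-attention with no learned projections, and it replaces the paper's modal attention with a plain average. Both simplifications are assumptions made here for brevity.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with Q = K = V = tokens (no learned weights)."""
    dim = len(tokens[0])
    attended = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in tokens]
        weights = softmax(scores)
        attended.append([sum(w * value[j] for w, value in zip(weights, tokens)) for j in range(dim)])
    return attended


def fuse(available_tokens):
    """Fuse however many modality tokens are available into one vector.

    Missing modalities are simply absent from the list: nothing is
    synthesized or zero-padded. Averaging stands in for modal attention.
    """
    attended = self_attention(available_tokens)
    count, dim = len(attended), len(attended[0])
    return [sum(token[j] for token in attended) / count for j in range(dim)]


if __name__ == "__main__":
    print(fuse([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]))  # two modalities available
    print(fuse([[1.0, 0.0, 0.0]]))                   # one modality available
```

The output has the same length whether one, two, or three tokens are passed in, which is what lets a downstream model stay unchanged when modalities are missing.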

翻訳日:2023-07-07 00:16:49 公開日:2023-07-04

# 古典的データを用いた古典と量子機械学習の学習分離の確立について

On establishing learning separations between classical and quantum machine learning with classical data ( http://arxiv.org/abs/2208.06339v2 )

ライセンス: Link先を確認

Casper Gyurik, Vedran Dunjko

(参考訳) 長年の努力にもかかわらず、量子機械学習コミュニティは、古典的データの場合、ある種の暗号化に触発されたデータセットに対して量子学習の利点を示すことしかできなかった。本稿では,量子学習アルゴリズムがどの古典的学習アルゴリズムよりも高速に学習できる学習問題を見つけるための課題について論じ,学習問題を特定する方法について検討する。具体的には、この問題に関連する計算学習理論の主要な概念を考察し、定義の微妙な変化がいかに概念的に著しく異なるタスクを意味するかについて議論する。さらに,より一般的かつ十分な条件(すなわち「チェックリスト」)の集合を蒸留し,古典的学習者と量子学習者の分離を示す学習問題に対して,既存の学習問題を証明可能な量子スピードアップを用いて検討する。これらのチェックリストは、学習問題に対する量子スピードアップを証明するためのアプローチの合理化やボトルネックの解明を目的としている。最後に,その応用例を説明するために,このアプローチのレンズを通して,学習問題(計算分離から構築された場合,あるいは量子実験から得られた場合)の潜在的分離の例を解析する。

Despite years of effort, the quantum machine learning community has only been able to show quantum learning advantages for certain contrived cryptography-inspired datasets in the case of classical data. In this note, we discuss the challenges of finding learning problems that quantum learning algorithms can learn much faster than any classical learning algorithm, and we study how to identify such learning problems. Specifically, we reflect on the main concepts in computational learning theory pertaining to this question, and we discuss how subtle changes in definitions can mean conceptually significantly different tasks, which can either lead to a separation or no separation at all. Moreover, we study existing learning problems with a provable quantum speedup to distill sets of more general and sufficient conditions (i.e., "checklists") for a learning problem to exhibit a separation between classical and quantum learners. These checklists are intended to streamline one's approach to proving quantum speedups for learning problems, or to elucidate bottlenecks. Finally, to illustrate its application, we analyze examples of potential separations (i.e., when the learning problem is built from computational separations, or when the data comes from a quantum experiment) through the lens of our approach.

翻訳日:2023-07-07 00:16:28 公開日:2023-07-04

# 集団カウントのためのマルチスケールニューラルネットワークの再設計

Redesigning Multi-Scale Neural Network for Crowd Counting ( http://arxiv.org/abs/2208.02894v2 )

ライセンス: Link先を確認

Zhipeng Du, Miaojing Shi, Jiankang Deng, Stefanos Zafeiriou

(参考訳) 視点の歪みと群衆の変動は、コンピュータビジョンにおいて、群衆の数え上げを困難なタスクにしている。これに取り組むために、多くの先行研究はディープニューラルネットワーク(DNN)にマルチスケールアーキテクチャを使用してきた。マルチスケールブランチは直接マージされる(例えば結合によって)か、DNN内のプロキシ(例えば注意)のガイダンスによってマージされる。これらの組み合わせ法は、その普及にもかかわらず、マルチスケール密度マップに対する画素単位の性能差に対処するには不十分である。本研究では、複数スケールの密度マップを階層的にマージする密度エキスパートの階層的混合を導入することにより、マルチスケールニューラルネットワークを再設計する。階層構造の中では、すべてのスケールからの貢献を促進するために専門家の競争と協調のスキームが提示され、異なる階層のスケール組み合わせのためのピクセル単位のソフトウェイトを提供するために、ピクセル単位のソフトゲーティングネットが導入される。ネットワークは、群集密度マップと局所カウントマップの両方を用いて最適化され、後者は前者の局所積分によって得られる。両者の最適化は、潜在的な競合のために問題となる可能性がある。画像中のハード予測された局所領域間の相対的な数の差に基づく新たな相対的局所カウント損失を導入し、密度マップ上の従来の絶対誤差損失と相補的であることを示す。実験の結果、提案手法はShanghaiTech、UCF_CC_50、JHU-CROWD++、NWPU-Crowd、Trancosの5つの公開データセットで最先端の性能を達成することがわかった。

Perspective distortions and crowd variations make crowd counting a challenging task in computer vision. To tackle it, many previous works have used multi-scale architectures in deep neural networks (DNNs). Multi-scale branches can be either directly merged (e.g. by concatenation) or merged through the guidance of proxies (e.g. attentions) in the DNNs. Despite their prevalence, these combination methods are not sophisticated enough to deal with the per-pixel performance discrepancy over multi-scale density maps. In this work, we redesign the multi-scale neural network by introducing a hierarchical mixture of density experts, which hierarchically merges multi-scale density maps for crowd counting. Within the hierarchical structure, an expert competition and collaboration scheme is presented to encourage contributions from all scales; pixel-wise soft gating nets are introduced to provide pixel-wise soft weights for scale combinations in different hierarchies. The network is optimized using both the crowd density map and the local counting map, where the latter is obtained by local integration on the former. Optimizing both can be problematic because of their potential conflicts. We introduce a new relative local counting loss based on relative count differences among hard-predicted local regions in an image, which proves to be complementary to the conventional absolute error loss on the density map. Experiments show that our method achieves state-of-the-art performance on five public datasets, namely ShanghaiTech, UCF_CC_50, JHU-CROWD++, NWPU-Crowd and Trancos.
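
The statement that the local counting map "is obtained by local integration" on the density map can be illustrated as below. This is a minimal sketch under an assumption the abstract does not confirm: that integration means summing the density over non-overlapping square blocks. The `block` parameter name is hypothetical.

```python
def local_count_map(density, block):
    """Sum a 2-D density map over non-overlapping block x block regions.

    Edge blocks are truncated when the map size is not a multiple of `block`.
    """
    rows, cols = len(density), len(density[0])
    counts = []
    for top in range(0, rows, block):
        row_counts = []
        for left in range(0, cols, block):
            total = sum(
                density[r][c]
                for r in range(top, min(top + block, rows))
                for c in range(left, min(left + block, cols))
            )
            row_counts.append(total)
        counts.append(row_counts)
    return counts


if __name__ == "__main__":
    density = [
        [0.25, 0.25, 0.0, 0.0],
        [0.25, 0.25, 0.0, 0.0],
        [0.0, 0.0, 0.5, 0.5],
        [0.0, 0.0, 0.0, 0.0],
    ]
    print(local_count_map(density, 2))
```

Summing the local counts recovers the total count of the density map, so the two targets describe the same crowd at different granularities.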

翻訳日:2023-07-07 00:16:06 公開日:2023-07-04

# IsoVec:単語埋め込み空間の相対同型制御

IsoVec: Controlling the Relative Isomorphism of Word Embedding Spaces ( http://arxiv.org/abs/2210.05098v3 )

ライセンス: Link先を確認

Kelly Marchisio, Neha Verma, Kevin Duh, Philipp Koehn

(参考訳) 単言語単語埋め込み空間から高品質な翻訳辞書を抽出する能力は、空間の幾何学的類似性、すなわちその「同型」の度合いに依存する。単語埋め込み学習の結果、基礎となる空間が非同型となるという、欠陥のある言語間マッピングの根本原因に対処する。我々は,スキップ-グラム損失関数に直接同型の大域的測度を組み込んで,訓練された単語埋め込み空間の相対的同型を増大させ,共通言語間空間にマッピングする能力を向上させる。その結果、一般的なデータ条件、ドメインミスマッチ、トレーニングアルゴリズムの相違によるバイリンガル語彙誘導が改善された。私たちはIsoVecを https://github.com/kellymarchisio/isovec でリリースします。

The ability to extract high-quality translation dictionaries from monolingual word embedding spaces depends critically on the geometric similarity of the spaces, that is, their degree of "isomorphism." We address the root cause of faulty cross-lingual mapping: that word embedding training resulted in the underlying spaces being non-isomorphic. We incorporate global measures of isomorphism directly into the Skip-gram loss function, successfully increasing the relative isomorphism of trained word embedding spaces and improving their ability to be mapped to a shared cross-lingual space. The result is improved bilingual lexicon induction in general data conditions, under domain mismatch, and with training algorithm dissimilarities. We release IsoVec at https://github.com/kellymarchisio/isovec.

翻訳日:2023-07-07 00:05:37 公開日:2023-07-04

# erasenet: 教師付き文書クリーニングのための再帰的残差ネットワーク

EraseNet: A Recurrent Residual Network for Supervised Document Cleaning ( http://arxiv.org/abs/2210.00708v2 )

ライセンス: Link先を確認

Yashowardhan Shinde, Kishore Kulkarni, Sachin Kuberkar

(参考訳) 文書のノイズ除去はコンピュータビジョンにおいて最も困難なタスクの1つと考えられている。まだデジタル化されていない文書は何百万もあるが、自然や人為的な要因による文書の劣化などの問題により、この作業は非常に困難である。本稿では、新しい完全畳み込み型オートエンコーダアーキテクチャを用いて、汚れた文書をクリーニングする教師あり手法を提案する。本稿では、文書の経年劣化による変形、コピーしたページに残る折り目、ランダムな黒いパッチ、薄くしか見えないテキストなどの不具合を持つ文書の復元と、光学文字認識(OCR)システムの性能向上のための画像品質の改善に焦点を当てる。スキャンした文書からノイズを取り除くことは、このノイズがOCRシステムの性能に深刻な悪影響を及ぼす可能性があるため、非常に重要なステップである。本稿の実験では、モデルが一般的なノイズから珍しいノイズまで様々なノイズを学習し、効率よく修正できることから、有望な結果が得られた。

Document denoising is considered one of the most challenging tasks in computer vision. There exist millions of documents that are still to be digitized, but problems like document degradation due to natural and man-made factors make this task very difficult. This paper introduces a supervised approach for cleaning dirty documents using a new fully convolutional auto-encoder architecture. This paper focuses on restoring documents with defects such as deformities caused by aging of a document, creases left on pages that were xeroxed, random black patches, and lightly visible text, and also on improving the quality of the image for better optical character recognition (OCR) system performance. Removing noise from scanned documents is a very important step, as this noise can severely affect the performance of an OCR system. The experiments in this paper have shown promising results as the model is able to learn a variety of ordinary as well as unusual noises and rectify them efficiently.

翻訳日:2023-07-07 00:05:10 公開日:2023-07-04

# グラフニューラルネットワークのためのユニバーサルプロンプトチューニング

Universal Prompt Tuning for Graph Neural Networks ( http://arxiv.org/abs/2209.15240v3 )

ライセンス: Link先を確認

Taoran Fang, Yunchao Zhang, Yang Yang, Chunping Wang, Lei Chen

(参考訳) 近年、プロンプトチューニングは、事前訓練されたモデルに適応する研究の急増を引き起こしている。言語分野における統合事前学習戦略とは異なり、グラフフィールドは様々な事前学習戦略を示し、グラフニューラルネットワークの適切なプロンプトベースのチューニング方法を設計する上での課題を提起する。いくつかの先駆的な研究は、エッジ予測を事前訓練タスクとして使用するモデルの特別なプロンプト機能を考案しているが、これらの手法は特定の事前訓練されたGNNモデルに限定されており、より広範な適用性に欠ける。本稿では,任意の事前学習戦略の下で事前学習したGNNモデルに対して,GPF(Graph Prompt Feature)と呼ばれる汎用的なプロンプトベースのチューニング手法を提案する。 GPFは入力グラフの特徴空間で動作し、理論上任意の形式のプロンプト関数に等価な効果を達成できる。その結果、各事前学習戦略に対応するプロンプト関数を明示的に記述する必要がなくなった。代わりに、我々はGPFを用いて、下流タスクの誘導グラフを適応的に取得する。 GPFの普遍性を実証し、その有効性を保証するための厳密な導出を提供する。様々な事前学習戦略による実験結果から,本手法は微調整よりも優れた性能を示し,全ショットシナリオでは平均1.4%,小ショットシナリオでは約3.2%改善した。さらに,本手法は,事前学習戦略を利用したモデルに適用した場合,既存の特殊プロンプトベースのチューニング手法よりも優れる。これらの多くの利点は、この手法を下流適応のための微調整の説得力のある代替手段と位置づけている。

In recent years, prompt tuning has sparked a research surge in adapting pre-trained models. Unlike the unified pre-training strategy employed in the language field, the graph field exhibits diverse pre-training strategies, posing challenges in designing appropriate prompt-based tuning methods for graph neural networks. While some pioneering work has devised specialized prompting functions for models that employ edge prediction as their pre-training tasks, these methods are limited to specific pre-trained GNN models and lack broader applicability. In this paper, we introduce a universal prompt-based tuning method called Graph Prompt Feature (GPF) for pre-trained GNN models under any pre-training strategy. GPF operates on the input graph's feature space and can theoretically achieve an equivalent effect to any form of prompting function. Consequently, we no longer need to specify the prompting function corresponding to each pre-training strategy explicitly. Instead, we employ GPF to obtain the prompted graph for the downstream task in an adaptive manner. We provide rigorous derivations to demonstrate the universality of GPF and guarantee its effectiveness. The experimental results under various pre-training strategies indicate that our method performs better than fine-tuning, with an average improvement of about 1.4% in full-shot scenarios and about 3.2% in few-shot scenarios. Moreover, our method significantly outperforms existing specialized prompt-based tuning methods when applied to models utilizing the pre-training strategy they specialize in. These numerous advantages position our method as a compelling alternative to fine-tuning for downstream adaptations.
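
One simple way to prompt in the input feature space is to add a single shared vector to every node's features while the pre-trained GNN stays frozen. The abstract only says that GPF "operates on the input graph's feature space", so the additive form below is an assumption made for illustration, and `apply_feature_prompt` is a hypothetical name. In practice the prompt vector would be a trainable parameter updated on the downstream task.

```python
def apply_feature_prompt(node_features, prompt):
    """Return prompted features x_i + p for every node i.

    node_features: list of per-node feature vectors (all the same length).
    prompt: one vector of that length, shared by all nodes (assumed additive form).
    """
    if any(len(row) != len(prompt) for row in node_features):
        raise ValueError("prompt dimension must match the node feature dimension")
    return [[x + p for x, p in zip(row, prompt)] for row in node_features]


if __name__ == "__main__":
    features = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
    prompt = [0.5, -1.0]  # would be learned in a real setup
    print(apply_feature_prompt(features, prompt))
```

Because only the prompt vector is tuned, the number of trainable parameters equals the feature dimension, independent of the size of the pre-trained model.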

翻訳日:2023-07-07 00:04:41 公開日:2023-07-04

# 対称性から学ぶ--対称行動と言語指示を用いたメタ強化学習

Learning from Symmetry: Meta-Reinforcement Learning with Symmetrical Behaviors and Language Instructions ( http://arxiv.org/abs/2209.10656v2 )

ライセンス: Link先を確認

Xiangtong Yao, Zhenshan Bing, Genghang Zhuang, Kejia Chen, Hongkuan Zhou, Kai Huang and Alois Knoll

(参考訳) メタ強化学習(Meta-RL)は,エージェントが新しいタスクを素早く学習できるようにする,有望なアプローチである。しかし、ほとんどのメタRLアルゴリズムは、報酬のみによって提供されるタスク情報不足のため、マルチタスクシナリオでの一般化が不十分である。言語条件付きメタRLは、言語命令とエージェントの動作をマッチングすることで、一般化能力を向上させる。行動と言語命令の両方に対称性があり、新しい知識の人間の学習を加速させる。したがって、対称性と言語命令をメタRLに組み合わせることで、アルゴリズムの一般化と学習効率を向上させることができる。対称な動作や言語命令を用いて,新しいタスクを効率的に学習することのできる,デュアルMDPメタ強化学習手法を提案する。提案手法は,複数の難解な操作課題において評価され,実験により,メタ強化学習の一般化と学習効率が大幅に向上することが示された。ビデオはhttps://tumi6robot.wixsite.com/symmetry/。

Meta-reinforcement learning (meta-RL) is a promising approach that enables the agent to learn new tasks quickly. However, most meta-RL algorithms show poor generalization in multi-task scenarios due to the insufficient task information provided only by rewards. Language-conditioned meta-RL improves the generalization capability by matching language instructions with the agent's behaviors. Both behaviors and language instructions exhibit symmetry, which can speed up human learning of new knowledge. Thus, combining symmetry and language instructions into meta-RL can help improve the algorithm's generalization and learning efficiency. We propose a dual-MDP meta-reinforcement learning method that enables learning new tasks efficiently with symmetrical behaviors and language instructions. We evaluate our method in multiple challenging manipulation tasks, and experimental results show that our method can greatly improve the generalization and learning efficiency of meta-reinforcement learning. Videos are available at https://tumi6robot.wixsite.com/symmetry/.

翻訳日:2023-07-07 00:04:14 公開日:2023-07-04

# マラリア診断のための機械学習アルゴリズムの開発を導く指標

Metrics to guide development of machine learning algorithms for malaria diagnosis ( http://arxiv.org/abs/2209.06947v2 )

ライセンス: Link先を確認

Charles B. Delahunt, Noni Gachuhi, Matthew P. Horning

(参考訳) 自動マラリア診断は、機械学習(ML)にとって難しいが価値の高い目標であり、効果的なアルゴリズムは何千人もの子供の命を救える可能性がある。しかし、現在のMLの取り組みは重要なユースケースの制約をほとんど無視しており、そのため臨床的に有用ではない。臨床現場に適用可能なアルゴリズムの開発には、特に2つの要因が不可欠である。 (i) MLソリューションが対応しなければならない臨床ニーズを明確に理解すること。 (ii) MLモデルの指針と評価のためのタスクに即した指標。これらの要因の軽視は、得られるアルゴリズムが臨床ニーズと一致しないため、過去のMLによるマラリア研究を著しく妨げてきた。本稿では、ギムザ染色血液塗抹標本の顕微鏡検査による自動マラリア診断の文脈で、この2つの問題を扱う。まず、MLをマラリアに効果的に適用するためになぜドメインの専門知識が重要なのかを説明し、このドメイン知識を提供する技術文書やその他のリソースを列挙する。第2に、マラリア診断の臨床的要件に合わせた性能指標を詳述し、MLモデルの開発を導き、(汎用的なMLの観点ではなく)臨床ニーズの観点からモデル性能を評価する。患者レベルの視点、患者間の変動性、偽陽性率、検出限界、異なる種類の誤りの重要性を強調する。また、MLの研究でよく使われるROC曲線、AUC、F1がこの文脈にあまり適さない理由についても論じる。これらの知見は、住血吸虫症などの顧みられない熱帯病(NTD)を含め、寄生虫量が関わる他の疾患にも当てはまる。

Automated malaria diagnosis is a difficult but high-value target for machine learning (ML), and effective algorithms could save many thousands of children's lives. However, current ML efforts largely neglect crucial use case constraints and are thus not clinically useful. Two factors in particular are crucial to developing algorithms translatable to clinical field settings: (i) Clear understanding of the clinical needs that ML solutions must accommodate; and (ii) task-relevant metrics for guiding and evaluating ML models. Neglect of these factors has seriously hampered past ML work on malaria, because the resulting algorithms do not align with clinical needs. In this paper we address these two issues in the context of automated malaria diagnosis via microscopy on Giemsa-stained blood films. First, we describe why domain expertise is crucial to effectively apply ML to malaria, and list technical documents and other resources that provide this domain knowledge. Second, we detail performance metrics tailored to the clinical requirements of malaria diagnosis, to guide development of ML models and evaluate model performance through the lens of clinical needs (versus a generic ML lens). We highlight the importance of a patient-level perspective, interpatient variability, false positive rates, limit of detection, and different types of error. We also discuss reasons why ROC curves, AUC, and F1, as commonly used in ML work, are poorly suited to this context. These findings also apply to other diseases involving parasite loads, including neglected tropical diseases (NTDs) such as schistosomiasis.
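
The patient-level perspective emphasized above can be sketched as follows: object-level detector scores are first collapsed into one call per patient, and sensitivity and specificity are then computed over patients rather than over objects. The decision rule and both thresholds (`score_threshold`, `min_objects`) are hypothetical placeholders chosen for illustration, not values or rules from the paper.

```python
def call_patient(object_scores, score_threshold=0.5, min_objects=2):
    """Flag a patient positive when enough candidate objects score above threshold.

    Requiring several detections is one simple way to tolerate object-level
    false positives; both thresholds are illustrative assumptions.
    """
    detections = sum(1 for score in object_scores if score >= score_threshold)
    return detections >= min_objects


def patient_level_rates(truths, calls):
    """Return (sensitivity, specificity) computed over patients, not objects."""
    tp = sum(1 for t, c in zip(truths, calls) if t and c)
    fn = sum(1 for t, c in zip(truths, calls) if t and not c)
    tn = sum(1 for t, c in zip(truths, calls) if not t and not c)
    fp = sum(1 for t, c in zip(truths, calls) if not t and c)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    scores_per_patient = [[0.9, 0.8, 0.1], [0.6, 0.2], [0.1, 0.2], [0.7, 0.9]]
    truths = [True, True, False, False]
    calls = [call_patient(scores) for scores in scores_per_patient]
    print(calls, patient_level_rates(truths, calls))
```

Reporting the two rates separately, per patient, keeps the false positive rate visible in a way that a single aggregate score such as F1 does not.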

翻訳日:2023-07-07 00:04:01 公開日:2023-07-04

# PlaStIL: プラスチックで安定なメモリフリーなクラスインクリメンタルラーニング

PlaStIL: Plastic and Stable Memory-Free Class-Incremental Learning ( http://arxiv.org/abs/2209.06606v2 )

ライセンス: Link先を確認

Grégoire Petit, Adrian Popescu, Eden Belouadah, David Picard, Bertrand Delezoide

(参考訳) 過去の知識を保ちながら新しいデータから学ぶためには、クラス増分学習において塑性と安定性が必要である。破滅的な忘れ方のため、メモリバッファがない場合、これら2つのプロパティ間の妥協を見つけることは特に難しい。従来のインクリメンタルな状態からの知識蒸留と微調整を使って新しいクラスを統合するため、主流のメソッドは2つの深いモデルを格納する必要がある。そこで本稿では, 可塑性と安定性のバランスを良くするために, パラメータ数に類似する手法を提案する。転送ベースのインクリメンタルメソッドですでにデプロイされているアプローチに従って,初期状態後の特徴抽出器を凍結する。最も古い段階的な状態のクラスは、安定性を確保するためにこの凍結抽出器で訓練される。最近のクラスは塑性を導入するために部分的に微調整されたモデルを用いて予測される。提案した塑性層は, 模範のない漸進的な学習を目的とした転送方式に組み込むことができ, 2つの手法に適用できる。評価は3つの大規模データセットで行う。その結果、既存の方法と比較して、すべてのテスト済み構成でパフォーマンスが向上することが示された。

Plasticity and stability are needed in class-incremental learning in order to learn from new data while preserving past knowledge. Due to catastrophic forgetting, finding a compromise between these two properties is particularly challenging when no memory buffer is available. Mainstream methods need to store two deep models since they integrate new classes using fine-tuning with knowledge distillation from the previous incremental state. We propose a method which has a similar number of parameters but distributes them differently in order to find a better balance between plasticity and stability. Following an approach already deployed by transfer-based incremental methods, we freeze the feature extractor after the initial state. Classes in the oldest incremental states are trained with this frozen extractor to ensure stability. Recent classes are predicted using partially fine-tuned models in order to introduce plasticity. Our proposed plasticity layer can be incorporated into any transfer-based method designed for exemplar-free incremental learning, and we apply it to two such methods. Evaluation is done with three large-scale datasets. Results show that performance gains are obtained in all tested configurations compared to existing methods.

翻訳日:2023-07-07 00:03:37 公開日:2023-07-04

# 拘束ボース-ハバードモデルにおけるフラクトニックルッティンガー液体と超固体

Fractonic Luttinger Liquids and Supersolids in a Constrained Bose-Hubbard Model ( http://arxiv.org/abs/2210.11072v2 )

ライセンス: Link先を確認

Philip Zechmann, Ehud Altman, Michael Knap, Johannes Feldmeier

(参考訳) フラクトン制約を持つ量子多体系は、非慣習的な物質の低エネルギー位相を示すと広く予想されている。本研究では,Bose-Hubbardモデルを一次元に保存する双極子モーメント基底状態における,このような異方性量子相の存在を実証する。整数ボソン充填では,フラクトンの複合体である微視的局所双極子モデルへのシステムのマッピングを行う。ダイポールルッティンガー液相の出現を実証するために,低エネルギー場理論と大規模テンソルネットワークシミュレーションを組み合わせる。非整数補充では、量子リフシッツモデルによって説明される興味深い圧縮可能な状態が示され、電荷密度波秩序と双極子長距離秩序と超流動性(英語版)が共存する。この超固体状態は最終的に熱力学的極限の格子効果に対して不安定になるかもしれないが、その数値的ロバスト性は顕著である。我々は実験結果の潜在的意義について議論する。

Quantum many-body systems with fracton constraints are widely conjectured to exhibit unconventional low-energy phases of matter. In this work, we demonstrate the existence of a variety of such exotic quantum phases in the ground states of a dipole-moment conserving Bose-Hubbard model in one dimension. For integer boson fillings, we perform a mapping of the system to a model of microscopic local dipoles, which are composites of fractons. We apply a combination of low-energy field theory and large-scale tensor network simulations to demonstrate the emergence of a dipole Luttinger liquid phase. At non-integer fillings our numerical approach shows an intriguing compressible state described by a quantum Lifshitz model in which charge density-wave order coexists with dipole long-range order and superfluidity - a `dipole supersolid'. While this supersolid state may eventually be unstable against lattice effects in the thermodynamic limit, its numerical robustness is remarkable. We discuss potential experimental implications of our results.

翻訳日:2023-07-06 23:57:35 公開日:2023-07-04

# 一般量子マルコフ過程のヒット時間について

On Hitting Times for General Quantum Markov Processes ( http://arxiv.org/abs/2210.10188v2 )

ライセンス: Link先を確認

Lorenzo Laneve, Francesco Tacchino, Ivano Tavernelli

(参考訳) ランダムウォーク(またはマルコフ連鎖)は、理論計算機科学で広く使われているモデルである。ヒット時間や混合時間などの量の解析を含むいくつかのツールは、乱択アルゴリズムを考案するのに役立つ。注目すべき例は、充足可能性問題(SAT)に対するSchöningのアルゴリズムである。本研究では、密度行列形式を用いて古典的ウォークを直接一般化する量子マルコフ連鎖モデルを定義し、ヒット時間のような共通のツールが古典理論と同様の公式で計算できることを示し、これをGroverのアルゴリズムのような既知の量子的設定に適用する。

Random walks (or Markov chains) are models extensively used in theoretical computer science. Several tools, including analysis of quantities such as hitting and mixing times, are helpful for devising randomized algorithms. A notable example is Schöning's algorithm for the satisfiability (SAT) problem. In this work, we use the density-matrix formalism to define a quantum Markov chain model which directly generalizes classical walks, and we show that common tools such as hitting times can be computed with a formula similar to the one found in the classical theory, which we then apply to known quantum settings such as Grover's algorithm.
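For orientation, the classical formula that the abstract refers to can be sketched as follows. This is a minimal illustration of expected hitting times for a classical Markov chain only, not the quantum generalization from the paper; the function name `hitting_times` and the 4-cycle example are assumptions made for this sketch. The expected hitting time satisfies h(i) = 0 on target states and h(i) = 1 + Σ_j P[i][j]·h(j) elsewhere, which is a linear system.

```python
from fractions import Fraction

def hitting_times(P, targets):
    """Expected number of steps to reach `targets` from each state of a
    classical Markov chain with row-stochastic transition matrix P.
    Assumes every state can reach the target set (otherwise the linear
    system is singular and the pivot search below fails)."""
    n = len(P)
    free = [s for s in range(n) if s not in targets]
    idx = {s: k for k, s in enumerate(free)}
    m = len(free)
    # Augmented system (I - Q) h = 1, with Q = P restricted to non-target states.
    A = [[Fraction(0)] * (m + 1) for _ in range(m)]
    for s in free:
        for t in free:
            A[idx[s]][idx[t]] = Fraction(int(s == t)) - Fraction(P[s][t])
        A[idx[s]][m] = Fraction(1)
    # Gauss-Jordan elimination with exact rational arithmetic.
    for col in range(m):
        pivot = next(r for r in range(col, m) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        pv = A[col][col]
        A[col] = [x / pv for x in A[col]]
        for r in range(m):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    h = [Fraction(0)] * n
    for s in free:
        h[s] = A[idx[s]][m]
    return h

# Symmetric random walk on a cycle with 4 nodes, target node 0.
half = Fraction(1, 2)
P = [
    [0, half, 0, half],
    [half, 0, half, 0],
    [0, half, 0, half],
    [half, 0, half, 0],
]
print(hitting_times(P, {0}))
```

For the symmetric walk on a cycle of length n, the expected hitting time from node k to node 0 is k(n − k), which the sketch reproduces for n = 4.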

翻訳日:2023-07-06 23:56:16 公開日:2023-07-04

# 2つの導波路積分量子エミッタの独立動作

Independent operation of two waveguide-integrated quantum emitters ( http://arxiv.org/abs/2210.09826v2 )

ライセンス: Link先を確認

Camille Papon, Ying Wang, Ravitej Uppu, Sven Scholz, Andreas Dirk Wieck, Arne Ludwig, Peter Lodahl, Leonardo Midolo

(参考訳) 複数の空間モードにおけるオンチップ単光子生成のためのフォトニック集積回路において、2つの量子ドットの共振励起を示す。2つの量子ドットは、孤立した1対の$p$-$i$-$n$ジャンクションを使用して同じ発光波長に電気的に調整され、デュアルモード導波路を介して共鳴ポンプレーザーによって励起される。狭線幅量子ドットの連続波励起下での$(79\pm2)\%$の2光子量子干渉の可視性を示す。我々の研究は、決定論的単一光子源のスケールアップの鍵となる機能を実現することによって、量子フォトニクスにおける未解決の課題を解決する。

We demonstrate the resonant excitation of two quantum dots in a photonic integrated circuit for on-chip single-photon generation in multiple spatial modes. The two quantum dots are electrically tuned to the same emission wavelength using a pair of isolated $p$-$i$-$n$ junctions and excited by a resonant pump laser via dual-mode waveguides. We demonstrate two-photon quantum interference visibility of $(79\pm2)\%$ under continuous-wave excitation of narrow-linewidth quantum dots. Our work solves an outstanding challenge in quantum photonics by realizing the key enabling functionality for scaling up deterministic single-photon sources.

翻訳日:2023-07-06 23:56:04 公開日:2023-07-04

# コントラスト誘導拡散過程による対向ロバスト性の向上

Improving Adversarial Robustness by Contrastive Guided Diffusion Process ( http://arxiv.org/abs/2210.09643v2 )

ライセンス: Link先を確認

Yidong Ouyang, Liyan Xie, Guang Cheng

(参考訳) 標準的な分類タスクに比べてロバストな学習にはトレーニングサンプルの量が大幅に多いため、合成データ生成は分類タスクの敵対的ロバスト性を改善するための新たなツールになっている。様々な深層生成モデルの中で,拡散モデルにより高品質な合成画像が生成され,対向性の向上に優れた性能を発揮することが示されている。しかし、拡散型法は通常、他の生成モデルと比較してデータ生成が遅い。近年, 異なる加速法が提案されているが, 下流タスクにおいて生成したデータのサンプル効率を改善する方法の研究も重要である。本稿では,まず合成分布の最適性条件を解析し,非自明なロバストな精度を実現する。生成データ間の識別性の向上は, 対向的ロバスト性の向上に不可欠であることを示す。そこで本研究では,データ生成における拡散モデルを導出するコントラスト的拡散過程(Contrastive-Guided Diffusion Process, Contrastive-DP)を提案する。シミュレーションを用いて理論的結果を検証し,画像データセット上でのコントラストDPの性能を示す。

Synthetic data generation has become an emerging tool to help improve the adversarial robustness in classification tasks since robust learning requires a significantly larger amount of training samples compared with standard classification tasks. Among various deep generative models, the diffusion model has been shown to produce high-quality synthetic images and has achieved good performance in improving the adversarial robustness. However, diffusion-type methods are typically slow in data generation as compared with other generative models. Although different acceleration techniques have been proposed recently, it is also of great importance to study how to improve the sample efficiency of generated data for the downstream task. In this paper, we first analyze the optimality condition of synthetic distribution for achieving non-trivial robust accuracy. We show that enhancing the distinguishability among the generated data is critical for improving adversarial robustness. Thus, we propose the Contrastive-Guided Diffusion Process (Contrastive-DP), which adopts the contrastive loss to guide the diffusion model in data generation. We verify our theoretical results using simulations and demonstrate the good performance of Contrastive-DP on image datasets.

翻訳日:2023-07-06 23:55:53 公開日:2023-07-04

# (1,1)-クラスタ編集は多項式時間可解である

(1,1)-Cluster Editing is Polynomial-time Solvable ( http://arxiv.org/abs/2210.07722v2 )

ライセンス: Link先を確認

Gregory Gutin and Anders Yeo

(参考訳) グラフ $H$ がクリークの頂点素な和であるとき、$H$ を clique グラフと呼ぶ。Abu-Khzam (2017) は $(a,d)$-Cluster Editing 問題を導入した。これは、固定された自然数 $a,d$ に対して、グラフ $G$ と頂点重み $a^*:\ V(G)\rightarrow \{0,1,\dots,a\}$ および $d^*:\ V(G)\rightarrow \{0,1,\dots,d\}$ が与えられたとき、各頂点 $v\in V(G)$ に接続する辺を最大 $d^*(v)$ 本削除し、最大 $a^*(v)$ 本追加することで、$G$ をクラスタグラフに変換できるかどうかを判定する問題である。Komusiewicz と Uhlmann (2012) および Abu-Khzam (2017) の結果により、$a=d=1$ を除くすべての組 $a,d$ について、$(a,d)$-Cluster Editing の計算複雑性の二分法(P に属するか NP完全か)が与えられた。Abu-Khzam (2017) は $(1,1)$-Cluster Editing が P に属すると予想した。我々は、(i) 最大次数が高々3の $C_3$-free かつ $C_4$-free なグラフへの5つの多項式時間還元の列を与え、(ii) 最大次数が高々3の $C_3$-free かつ $C_4$-free なグラフ上で $(1,1)$-Cluster Editing を解く多項式時間アルゴリズムを設計することで、この予想を肯定的に解決する。

A graph $H$ is a clique graph if $H$ is a vertex-disjoint union of cliques. Abu-Khzam (2017) introduced the $(a,d)$-Cluster Editing problem, where for fixed natural numbers $a,d$, given a graph $G$ and vertex-weights $a^*:\ V(G)\rightarrow \{0,1,\dots, a\}$ and $d^*:\ V(G)\rightarrow \{0,1,\dots, d\}$, we are to decide whether $G$ can be turned into a cluster graph by deleting at most $d^*(v)$ edges incident to every $v\in V(G)$ and adding at most $a^*(v)$ edges incident to every $v\in V(G)$. Results by Komusiewicz and Uhlmann (2012) and Abu-Khzam (2017) provided a dichotomy of complexity (in P or NP-complete) of $(a,d)$-Cluster Editing for all pairs $a,d$ apart from $a=d=1$. Abu-Khzam (2017) conjectured that $(1,1)$-Cluster Editing is in P. We resolve Abu-Khzam's conjecture in the affirmative by (i) providing a series of five polynomial-time reductions to $C_3$-free and $C_4$-free graphs of maximum degree at most 3, and (ii) designing a polynomial-time algorithm for solving $(1,1)$-Cluster Editing on $C_3$-free and $C_4$-free graphs of maximum degree at most 3.
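To make the problem statement concrete, the sketch below checks a proposed solution (a certificate) for an $(a,d)$-Cluster Editing instance. It is not the polynomial-time algorithm from the paper, only a verifier for the definition above; the function names and the star-graph example are assumptions made for illustration.

```python
from itertools import combinations

def is_cluster_graph(n, edges):
    """True if the graph on vertices 0..n-1 is a vertex-disjoint union of cliques."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Equivalent characterization: no induced path on three vertices,
    # i.e. any two neighbours of a vertex must be adjacent to each other.
    for v in range(n):
        for x, y in combinations(sorted(adj[v]), 2):
            if y not in adj[x]:
                return False
    return True

def is_valid_editing(n, edges, deleted, added, a_star, d_star):
    """Check that deleting `deleted` and adding `added` respects the
    per-vertex budgets d_star / a_star and yields a cluster graph."""
    norm = lambda e: tuple(sorted(e))
    E = {norm(e) for e in edges}
    D = {norm(e) for e in deleted}
    A = {norm(e) for e in added}
    if not D <= E or A & E:
        return False  # can only delete existing edges and add missing ones
    del_count = [0] * n
    add_count = [0] * n
    for u, v in D:
        del_count[u] += 1
        del_count[v] += 1
    for u, v in A:
        add_count[u] += 1
        add_count[v] += 1
    if any(del_count[v] > d_star[v] or add_count[v] > a_star[v] for v in range(n)):
        return False
    return is_cluster_graph(n, (E - D) | A)

# Star with centre 0 and leaves 1, 2, 3; budgets a* = d* = 1 at every vertex.
star = [(0, 1), (0, 2), (0, 3)]
ones = [1, 1, 1, 1]
ok = is_valid_editing(4, star, deleted=[(0, 3)], added=[(1, 2)], a_star=ones, d_star=ones)
too_many = is_valid_editing(4, star, deleted=[(0, 2), (0, 3)], added=[], a_star=ones, d_star=ones)
print(ok, too_many)
```

In the example, deleting one edge at the centre and adding one edge between two leaves produces a triangle plus an isolated vertex within the budgets, whereas deleting two edges at the centre exceeds the deletion budget of vertex 0.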

翻訳日:2023-07-06 23:54:58 公開日:2023-07-04

# ロボットによる仕事の学習:人間による自律性と展開中の学習

Robot Learning on the Job: Human-in-the-Loop Autonomy and Learning During Deployment ( http://arxiv.org/abs/2211.08416v3 )

ライセンス: Link先を確認

Huihan Liu, Soroush Nasiriany, Lance Zhang, Zhiyao Bao, Yuke Zhu

(参考訳) コンピュータパワーの急速な成長とディープラーニングの最近の進歩により、研究環境における新しいロボット能力の印象的な実証が見られた。それでも、これらの学習システムは不安定な一般化を示し、実践的なタスクに過剰なトレーニングデータを必要とする。非完全性を受け入れつつ最先端のロボット学習モデルの能力を活用するために,人間とロボットが作業部門で協力するための原則フレームワークであるsiriusを提案する。このフレームワークでは、部分的に自律的なロボットが意思決定の大部分を適切に処理するタスクを負う一方で、人間のオペレーターはプロセスを監視し、困難な状況に介入する。このような人間ロボットチームは、複雑なタスクに安全なデプロイを保証する。さらに,タスク実行から収集したデータに対するポリシーの性能を向上させるための新しい学習アルゴリズムを提案する。中心となるアイデアは、トレーニングサンプルをおよそ人間の信頼で強化し、重み付けされた行動のクローンでポリシーを最適化することだ。我々はSiriusをシミュレーションおよび実際のハードウェアで評価し、Siriusが一連のコンタクトリッチな操作タスクに対して一貫してベースラインを上回り、シミュレーションで8%、実際のハードウェアで27%向上し、コンバージェンスを2倍速くし、メモリサイズを85%削減した。ビデオや詳細はhttps://ut-austin-rpl.github.io/sirius/で確認できる。

With the rapid growth of computing power and recent advances in deep learning, we have witnessed impressive demonstrations of novel robot capabilities in research settings. Nonetheless, these learning systems exhibit brittle generalization and require excessive training data for practical tasks. To harness the capabilities of state-of-the-art robot learning models while embracing their imperfections, we present Sirius, a principled framework for humans and robots to collaborate through a division of work. In this framework, partially autonomous robots are tasked with handling a major portion of decision-making where they work reliably; meanwhile, human operators monitor the process and intervene in challenging situations. Such a human-robot team ensures safe deployments in complex tasks. Further, we introduce a new learning algorithm to improve the policy's performance on the data collected from the task executions. The core idea is re-weighting training samples with approximated human trust and optimizing the policies with weighted behavioral cloning. We evaluate Sirius in simulation and on real hardware, showing that Sirius consistently outperforms baselines over a collection of contact-rich manipulation tasks, achieving an 8% boost in simulation and a 27% boost on real hardware over the state-of-the-art methods in policy success rate, with two times faster convergence and an 85% reduction in memory size. Videos and more details are available at https://ut-austin-rpl.github.io/sirius/
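The idea of weighted behavioral cloning mentioned above can be sketched as a weighted negative log-likelihood. The sample categories and weight values below are hypothetical and chosen only for illustration; the abstract does not specify how Sirius derives its trust weights.

```python
import math

# Hypothetical trust weights per sample type (illustrative values only,
# not taken from the paper).
TRUST_WEIGHT = {"robot": 1.0, "pre_intervention": 0.0, "human": 3.0}

def weighted_bc_loss(log_probs, tags, trust_weight=TRUST_WEIGHT):
    """Weighted behavioral cloning loss: the weighted average of the
    negative log-likelihood the policy assigns to each demonstrated action."""
    weights = [trust_weight[t] for t in tags]
    total = sum(weights)
    return -sum(w * lp for w, lp in zip(weights, log_probs)) / total

# log pi(a | s) for three recorded (state, action) pairs.
log_probs = [math.log(0.5), math.log(0.25), math.log(0.5)]
tags = ["robot", "pre_intervention", "human"]
loss = weighted_bc_loss(log_probs, tags)
print(loss)
```

With these assumed weights, the sample recorded just before a human intervention is ignored and the human correction counts three times as much as an autonomous robot action.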

翻訳日:2023-07-06 23:46:49 公開日:2023-07-04

# トーションを持つ一般相対論的パイロット波量子力学

General-relativistic pilot-wave quantum mechanics with torsion ( http://arxiv.org/abs/2211.03234v2 )

ライセンス: Link先を確認

Francisco Ribeiro Benard Guedes and Nikodem Janusz Popławski

(参考訳) ディラック粒子の4元速度は、$u^i=\bar{\psi}\gamma^i\psi/\bar{\psi}\psi$ によってその相対論的波動関数と関連付けられることを提案する。スピノルの4元運動量を、共変微分によって与えられる並進の生成子と関連付ける。スピノルの固有角運動量4元テンソルを、ローレンツ群のスピノル表現における回転の生成子と関連付ける。Einstein-Cartan のねじれ(トーション)が存在する場合のスピノル場に対するスピンテンソルおよびエネルギー運動量テンソルの共変保存則を用いて、波動がディラック方程式を満たすならば、4元速度、4元運動量、スピン4元テンソルが古典的な Mathisson-Papapetrou の運動方程式を満たすことを示す。さらに、これらの方程式が測地線の運動方程式に帰着することを示す。したがって、パイロット波量子力学において4元速度に導かれる粒子の運動は、時空の幾何によって決定される粒子の測地線運動と一致し、相対論的な波動と粒子の二重性を表す。

We propose that the four-velocity of a Dirac particle is related to its relativistic wave function by $u^i=\bar{\psi}\gamma^i\psi/\bar{\psi}\psi$. We associate the four-momentum of a spinor with a generator of translation, given by a covariant derivative. We associate the intrinsic angular momentum four-tensor of a spinor with a generator of rotation in the spinor representation of the Lorentz group. We use the covariant conservation laws for the spin and energy-momentum tensors for a spinor field in the presence of the Einstein-Cartan torsion to show that if the wave satisfies the Dirac equation, then the four-velocity, four-momentum, and spin four-tensor satisfy the classical Mathisson-Papapetrou equations of motion. We show that these equations reduce to the geodesic equation of motion. Consequently, the motion of a particle guided by the four-velocity in the pilot-wave quantum mechanics coincides with the geodesic motion of the particle determined by the geometry of spacetime, representing a relativistic wave-particle duality.

翻訳日:2023-07-06 23:45:36 公開日:2023-07-04

# 監督信号のインフォメーション性について

On the Informativeness of Supervision Signals ( http://arxiv.org/abs/2211.01407v3 )

ライセンス: Link先を確認

Ilia Sucholutsky and Ruairidh M. Battleday and Katherine M. Collins and Raja Marjieh and Joshua C. Peterson and Pulkit Singh and Umang Bhatt and Nori Jacoby and Adrian Weller and Thomas L. Griffiths

(参考訳) 教師付き学習は通常、人間が注釈を付けたトレーニング例から転送可能な表現を学ぶことに焦点を当てる。リッチアノテーション(ソフトラベルなど)は(ハードラベルのような)スパースアノテーションよりも多くの情報を持っているが、収集するコストも高い。例えば、ハードラベルは、オブジェクトが属する最も近いクラスに関する情報のみを提供する(例:「犬である」)が、ソフトラベルは、オブジェクトと複数のクラスとの関係に関する情報を提供する(例:「これは犬である可能性が高いが、オオカミやコヨーテでもある」)。我々は情報理論を用いて、多くの一般的な監視信号が表現学習のパフォーマンスにどのように寄与するか、また、ラベル数、クラス数、寸法数、ノイズなどの要因によってその能力がどのように影響を受けるかを比較する。当社のフレームワークは,ビッグデータ環境においてハードラベルを使用するための理論的正当化を提供するが,少ない学習と分散一般化のためのよりリッチな監督信号を提供する。我々は,100万以上のクラウドソース画像アノテーションを用いた一連の実験において,これらの結果を実証的に検証し,コスト便益分析を行い,ユーザが自身のデータセットで表現学習を監督するコストを最適化できるトレードオフ曲線を確立する。

Supervised learning typically focuses on learning transferable representations from training examples annotated by humans. While rich annotations (like soft labels) carry more information than sparse annotations (like hard labels), they are also more expensive to collect. For example, while hard labels only provide information about the closest class an object belongs to (e.g., "this is a dog"), soft labels provide information about the object's relationship with multiple classes (e.g., "this is most likely a dog, but it could also be a wolf or a coyote"). We use information theory to compare how a number of commonly-used supervision signals contribute to representation-learning performance, as well as how their capacity is affected by factors such as the number of labels, classes, dimensions, and noise. Our framework provides theoretical justification for using hard labels in the big-data regime, but richer supervision signals for few-shot learning and out-of-distribution generalization. We validate these results empirically in a series of experiments with over 1 million crowdsourced image annotations and conduct a cost-benefit analysis to establish a tradeoff curve that enables users to optimize the cost of supervising representation learning on their own datasets.
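The difference between hard and soft labels described above can be illustrated with a small sketch. The class names and probabilities are made up for illustration, and the entropy computation is a generic measure rather than the paper's information-theoretic framework.

```python
import math

def entropy_bits(p):
    """Shannon entropy (in bits) of a label distribution."""
    return -sum(q * math.log2(q) for q in p if q > 0)

def to_hard(p):
    """Collapse a soft label to a one-hot hard label on the most likely class."""
    k = max(range(len(p)), key=lambda i: p[i])
    return [1 if i == k else 0 for i in range(len(p))]

classes = ["dog", "wolf", "coyote"]
confident = [0.90, 0.05, 0.05]   # "almost certainly a dog"
ambiguous = [0.40, 0.35, 0.25]   # "probably a dog, but could be a wolf or coyote"

print(to_hard(confident), to_hard(ambiguous))
print(entropy_bits(confident), entropy_bits(ambiguous))
```

Both annotations collapse to the same hard label, so the hard label cannot distinguish an unambiguous image from an ambiguous one, while the soft labels retain that distinction.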

翻訳日:2023-07-06 23:45:14 公開日:2023-07-04

# インテリジェント・ペインティング:拡散モデルを用いた画像合成

Intelligent Painter: Picture Composition With Resampling Diffusion Model ( http://arxiv.org/abs/2210.17106v3 )

ライセンス: Link先を確認

Wing-Fung Ku, Wan-Chi Siu, Xi Cheng, H. Anthony Chan

(参考訳) あなたは知的な画家になれると考えたことがあるだろうか。これは、期待するいくつかのオブジェクトや望ましいシーンを念頭に置いて絵を描けることを意味する。これは、特定のオブジェクトの位置を決定できない通常のインペインティング手法とは異なる。本稿では、明示的なヒントが与えられたときに、人が想像するシーンを一度に生成する知的画家を提案する。我々は、拡散確率モデル(DDPM)に対する再サンプリング戦略を提案し、特定の位置に置かれた入力対象に応じて、調和のとれた無条件画像を知的に合成する。拡散の性質を利用して効率よく再サンプリングし、リアルな画像を生成する。実験結果から、我々の再サンプリング手法は生成された出力の意味内容を効率よく保ち、ぼやけの少ない出力を生成することが示された。画像品質評価の定量的解析により、本手法は最先端の手法と比較して知覚的品質の高い画像を生成することが示された。

Have you ever thought that you can be an intelligent painter? This means that you can paint a picture with a few expected objects in mind, or with a desirable scene. This is different from normal inpainting approaches for which the location of specific objects cannot be determined. In this paper, we present an intelligent painter that generates a person's imaginary scene in one go, given explicit hints. We propose a resampling strategy for the Denoising Diffusion Probabilistic Model (DDPM) to intelligently compose unconditional harmonized pictures according to the input subjects at specific locations. By exploiting the diffusion property, we resample efficiently to produce realistic pictures. Experimental results show that our resampling method favors the semantic meaning of the generated output efficiently and generates less blurry output. Quantitative analysis of image quality assessment shows that our method produces higher perceptual quality images compared with the state-of-the-art methods.

翻訳日:2023-07-06 23:44:35 公開日:2023-07-04

# FI-ODE:ニューラル・オードにおけるロバストな前方不変性

FI-ODE: Certifiably Robust Forward Invariance in Neural ODEs ( http://arxiv.org/abs/2210.16940v3 )

ライセンス: Link先を確認

Yujia Huang, Ivan Dario Jimenez Rodriguez, Huan Zhang, Yuanyuan Shi, Yisong Yue

(参考訳) 前方不変性(forward invariance)は制御理論で長く研究されてきた性質であり、力学系が常に事前に指定された状態集合内に留まることを証明するために用いられ、ロバスト性の保証も可能にする(例えば、証明は摂動の下でも成り立つ)。本稿では、Neural ODE におけるロバストな前方不変性を訓練し、証明可能な形で認証するための一般的なフレームワークを提案する。我々はこの枠組みを、頑健な連続制御における認証された安全性と、画像分類のための認証された敵対的ロバスト性という2つの設定に適用する。我々の知る限り、これはこのような自明でない(non-vacuous)認証付き保証を持つ NODE ポリシーを訓練した最初の例である。

Forward invariance is a long-studied property in control theory that is used to certify that a dynamical system stays within some pre-specified set of states for all time, and also admits robustness guarantees (e.g., the certificate holds under perturbations). We propose a general framework for training and provably certifying robust forward invariance in Neural ODEs. We apply this framework in two settings: certified safety in robust continuous control, and certified adversarial robustness for image classification. To our knowledge, this is the first instance of training NODE policies with such non-vacuous certified guarantees.
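As background, forward invariance can be stated as follows. This is the textbook definition together with a standard barrier-type sufficient condition, written under assumed notation (dynamics $f_\theta$, set-defining function $h$, constant $\alpha$); the abstract does not state which condition the paper itself certifies.

```latex
% Assumed setup: a Neural ODE \dot{x}(t) = f_\theta(x(t)) and a differentiable
% function h whose superlevel set is the candidate invariant set.
\mathcal{S} = \{\, x : h(x) \ge 0 \,\}
\ \text{ is forward invariant if } \
x(0) \in \mathcal{S} \;\Longrightarrow\; x(t) \in \mathcal{S} \quad \forall\, t \ge 0 .

% A standard sufficient condition, for some constant \alpha > 0 and all x in an
% open set containing \mathcal{S}:
\nabla h(x)^{\top} f_\theta(x) \;\ge\; -\alpha\, h(x)
\quad\Longrightarrow\quad
h(x(t)) \;\ge\; h(x(0))\, e^{-\alpha t} \;\ge\; 0 .
```

The second line follows from the comparison lemma applied to $\frac{d}{dt} h(x(t)) \ge -\alpha\, h(x(t))$.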

翻訳日:2023-07-06 23:44:20 公開日:2023-07-04

# 欠陥のない原子配列の高速作成のための並列圧縮アルゴリズム

Parallel compression algorithm for fast preparation of defect-free atom arrays ( http://arxiv.org/abs/2212.03047v2 )

ライセンス: Link先を確認

Shangguo Zhu, Yun Long, Mingbo Pu, Xiangang Luo

(参考訳) 欠陥のない原子配列は量子科学と技術のための強力で汎用的なプラットフォームとして登場し、高いプログラマビリティと有望なスケーラビリティを提供している。配列は、部分的にロードされた初期配列から指定されたターゲット部位に原子を配置することで作成することができる。しかし、大きな欠陥のないアレイを実現するには、再配置中の原子損失と、配列サイズに逆比例する真空制限寿命が問題となる。原子再配置の成功には、時間コストと原子損失を最小限に抑える効率的な再配置アルゴリズムが不可欠である。本稿では,複数の移動式ツイーザを用いて同時に原子を転送する並列圧縮アルゴリズムを提案する。トータルタイムコストは、ターゲットサイト数と線形にスケールするように削減できる。このアルゴリズムは、現在の実験装置で容易に実装できる。

Defect-free atom arrays have emerged as a powerful and versatile platform for quantum sciences and technologies, offering high programmability and promising scalability. The arrays can be prepared by rearranging atoms from a partially loaded initial array to the designated target sites. However, achieving large defect-free arrays presents challenges due to atom loss during rearrangement and the vacuum-limited lifetime which is inversely proportional to the array size. Efficient rearrangement algorithms which minimize time cost and atom loss are crucial for successful atom rearrangement. Here we propose a novel parallel compression algorithm which leverages multiple mobile tweezers to transfer atoms simultaneously. The total time cost could be reduced to scale linearly with the number of target sites. This algorithm can be readily implemented in current experimental setups.
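Why moving several atoms at once helps can be seen in a toy one-dimensional model. The sketch below is not the authors' algorithm; it is an assumed simplification in which, at each step, every atom with an empty site to its left shifts one site left simultaneously, and the number of parallel steps is compared with the number of single-atom moves.

```python
def compress_row(row):
    """Toy parallel compression of one row of a 0/1 occupancy array.
    In each step, every atom whose left neighbour is empty moves one
    site to the left, all at the same time.
    Returns (final_row, parallel_steps, single_atom_moves)."""
    row = list(row)
    steps = 0
    moves = 0
    while True:
        nxt = list(row)
        moved = 0
        for i in range(1, len(row)):
            # Decisions are based on the configuration before the step,
            # so simultaneous moves never collide.
            if row[i] == 1 and row[i - 1] == 0:
                nxt[i - 1], nxt[i] = 1, 0
                moved += 1
        if moved == 0:
            return row, steps, moves
        row = nxt
        steps += 1
        moves += moved

final_row, parallel_steps, single_moves = compress_row([0, 1, 0, 1, 1, 0, 1])
print(final_row, parallel_steps, single_moves)
```

In this example the row is compressed in 4 parallel steps, whereas performing the same shifts one atom at a time would take 8 moves.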

翻訳日:2023-07-06 23:36:44 公開日:2023-07-04

# 適応的サンプリングによる公平な介入による条件付き生成のスプリアス因果関係の破れ

Breaking the Spurious Causality of Conditional Generation via Fairness Intervention with Corrective Sampling ( http://arxiv.org/abs/2212.02090v2 )

ライセンス: Link先を確認

Junhyun Nam, Sangwoo Mo, Jaeho Lee, Jinwoo Shin

(参考訳) サンプルとラベルの関係を捉えるために、条件付き生成モデルはトレーニングデータセットからスプリアス相関を継承することが多い。これは別の潜在属性に対して不均衡なラベル条件付き分布をもたらしうる。本稿では、条件付き生成のスプリアスな因果関係と呼ぶこの問題を緩和するために、一般的な2段階戦略を提案する。 (a) Fairness Intervention (FI): トレーニングデータセットのスプリアス相関により生成が困難なマイノリティサンプルを強調する。 (b) Corrective Sampling (CS): 生成されたサンプルを明示的にフィルタリングし、所望の潜在属性分布に従うことを保証する。我々は、教師なし、弱教師あり、半教師ありのシナリオを含む、スプリアス属性に対するさまざまな程度の教師情報に対応できるように Fairness Intervention を設計した。実験の結果、FICS は様々なデータセットにおいて条件付き生成のスプリアスな因果関係を効果的に解消できることが示された。

To capture the relationship between samples and labels, conditional generative models often inherit spurious correlations from the training dataset. This can result in label-conditional distributions that are imbalanced with respect to another latent attribute. To mitigate this issue, which we call spurious causality of conditional generation, we propose a general two-step strategy. (a) Fairness Intervention (FI): emphasize the minority samples that are hard to generate due to the spurious correlation in the training dataset. (b) Corrective Sampling (CS): explicitly filter the generated samples and ensure that they follow the desired latent attribute distribution. We have designed the fairness intervention to work for various degrees of supervision on the spurious attribute, including unsupervised, weakly-supervised, and semi-supervised scenarios. Our experimental results demonstrate that FICS can effectively resolve spurious causality of conditional generation across various datasets.
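Step (b) above, filtering generated samples so that they follow a desired latent attribute distribution, can be sketched with a simple quota-based filter. This is one possible filtering rule chosen for illustration, not necessarily the one used in the paper; the function name, the attribute labels, and the assumption that an attribute label is available for each generated sample are all hypothetical.

```python
from collections import Counter

def corrective_filter(samples, attr_of, target):
    """Keep the largest prefix-ordered subset of `samples` whose latent
    attribute proportions match `target` (a dict attribute -> probability)."""
    counts = Counter(attr_of(s) for s in samples)
    # Largest total size n such that n * target[a] <= counts[a] for every attribute.
    n = min(counts[a] / p for a, p in target.items() if p > 0)
    quota = {a: int(n * p) for a, p in target.items()}
    kept, used = [], Counter()
    for s in samples:
        a = attr_of(s)
        if used[a] < quota.get(a, 0):
            kept.append(s)
            used[a] += 1
    return kept

# Ten generated samples, heavily skewed towards the majority attribute.
generated = [("img%d" % i, "majority") for i in range(8)]
generated += [("img8", "minority"), ("img9", "minority")]
kept = corrective_filter(generated, attr_of=lambda s: s[1],
                         target={"majority": 0.5, "minority": 0.5})
print(Counter(a for _, a in kept))
```

The filter can only discard samples, so the scarce attribute limits how many samples survive; this is why the emphasis on minority samples during generation matters before filtering.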

翻訳日:2023-07-06 23:36:32 公開日:2023-07-04

# OPUS-MTを用いたニューラルマシン翻訳の民主化

Democratizing Neural Machine Translation with OPUS-MT ( http://arxiv.org/abs/2212.01936v3 )

ライセンス: Link先を確認

Jörg Tiedemann, Mikko Aulamo, Daria Bakshandaeva, Michele Boggia, Stig-Arne Grönroos, Tommi Nieminen, Alessandro Raganato, Yves Scherrer, Raul Vazquez, Sami Virpioja

(参考訳) 本稿では,オープン機械翻訳モデルとツールの開発,エンドユーザーアプリケーション,開発プラットフォーム,プロフェッショナルワークフローへの統合に焦点をあてたOPUSエコシステムについて述べる。我々は現在進行中の言語カバレッジと翻訳品質の向上に関するミッションについて論じるとともに,モジュール型翻訳モデルの開発と,通常のデスクトップや小型デバイス上でのリアルタイム翻訳のための高速化されたコンパクトソリューションについて述べる。

This paper presents the OPUS ecosystem with a focus on the development of open machine translation models and tools, and their integration into end-user applications, development platforms and professional workflows. We discuss our on-going mission of increasing language coverage and translation quality, and also describe on-going work on the development of modular translation models and speed-optimized compact solutions for real-time translation on regular desktops and small devices.

翻訳日:2023-07-06 23:36:15 公開日:2023-07-04

# eコマースサイトにおける感情分析と意見マイニング

Sentiment analysis and opinion mining on E-commerce site ( http://arxiv.org/abs/2211.15536v2 )

ライセンス: Link先を確認

Fatema Tuz Zohra Anny and Oahidul Islam

(参考訳) 感情分析や意見マイニングは、NLP(Natural Language Processing)というフレーズを説明するのに役立つ。近年では感性分析が最も重要な話題となっている。本研究の目的は,感情分析における感情極性分類の課題を解決することである。全体的プロセスの説明とともに、感情的反対を分類する幅広い手法が提示される。分析の結果,文レベルの分類とレビューレベルの分類の両方が行われる。最後に,今後の感情分析研究の計画について述べる。

Sentiment analysis, or opinion mining, helps to illustrate the term NLP (Natural Language Processing). Sentiment analysis has been the most significant topic in recent years. The goal of this study is to solve the sentiment polarity classification challenges in sentiment analysis. A broad technique for categorizing sentiment opposition is presented, along with comprehensive process explanations. With the results of the analysis, both sentence-level classification and review-level categorization are conducted. Finally, we discuss our plans for future sentiment analysis research.

翻訳日:2023-07-06 23:36:06 公開日:2023-07-04

# 位相相と論理ゲートのフェルミオン欠陥

Fermionic defects of topological phases and logical gates ( http://arxiv.org/abs/2211.12394v2 )

ライセンス: Link先を確認

Ryohei Kobayashi

(参考訳) (2+1)D ボソニックトポロジカル相の余次元1の欠陥について論じる。ここで、欠陥はフェルミオン自由度を持つことができる。このような欠陥をフェルミオン欠陥(fermionic defects)と呼び、エニオンの自己統計をシフトできる「ゲージ化された Gu-Wen SPT 欠陥(gauged Gu-Wen SPT defects)」と呼ばれる可逆フェルミオン欠陥のサブクラスを導入する。ゲージ化された Gu-Wen SPT 欠陥と、欠陥上のフェルミオンから分離したボソニック可逆欠陥との融合によって、一般のフェルミオン可逆欠陥の標準形を導出する。次に、一般の可逆フェルミオン欠陥の融合則を導出する。ゲージ化された Gu-Wen SPT 欠陥は、追加のアンシラフェルミオンの存在下で、スタビライザー符号の興味深い論理ゲートをもたらす。例えば、(2+1)D のアンシラである自明な原子絶縁体を積み重ねた (2+1)D $\mathbb{Z}_2$ トーリック符号上で、有限深さ回路によって実装される CZ 論理ゲートの実現を見出す。また、(3+1)D Walker-Wang モデルの境界上に実現される (2+1)D ボソニックトポロジカル相の間の、ギャップのあるフェルミオン界面についても調べる。この場合、ギャップのある界面は (2+1)D 相のカイラル中心電荷をシフトできる。これらのフェルミオン界面のうち、(3+1)D 相が空間反射対称性を持ち、(2+1)D 表面トポロジカル秩序とその向きを反転させたものとを補間する反射面上にフェルミオン界面が支持される興味深い例を調べる。この設定を実現する (3+1)D の厳密に解けるハミルトニアンを構成し、このモデルが、空間反射対称性と反射面上のフェルミオンパリティを持つ (3+1)D 可逆相の $\mathbb{Z}_8$ 分類を生成することを見出す。さらに、時空高次群対称性を持つエキゾチックな可逆相として文献で知られる有効場理論との関連を示す。

We discuss the codimension-1 defects of (2+1)D bosonic topological phases, where the defects can support fermionic degrees of freedom. We refer to such defects as fermionic defects, and introduce a certain subclass of invertible fermionic defects called "gauged Gu-Wen SPT defects" that can shift self-statistics of anyons. We derive a canonical form of a general fermionic invertible defect, in terms of the fusion of a gauged Gu-Wen SPT defect and a bosonic invertible defect decoupled from fermions on the defect. We then derive the fusion rule of generic invertible fermionic defects. The gauged Gu-Wen SPT defects give rise to interesting logical gates of stabilizer codes in the presence of additional ancilla fermions. For example, we find a realization of the CZ logical gate on the (2+1)D $\mathbb{Z}_2$ toric code stacked with a (2+1)D ancilla trivial atomic insulator, which is implemented by a finite depth circuit. We also investigate a gapped fermionic interface between (2+1)D bosonic topological phases realized on the boundary of the (3+1)D Walker-Wang model. In that case, the gapped interface can shift the chiral central charge of the (2+1)D phase. Among these fermionic interfaces, we study an interesting example where the (3+1)D phase has a spatial reflection symmetry, and the fermionic interface is supported on a reflection plane that interpolates a (2+1)D surface topological order and its orientation-reversal. We construct a (3+1)D exactly solvable Hamiltonian realizing this setup, and find that the model generates the $\mathbb{Z}_8$ classification of the (3+1)D invertible phase with spatial reflection symmetry and fermion parity on the reflection plane. We make contact with an effective field theory, known in the literature as the exotic invertible phase with spacetime higher-group symmetry.

翻訳日:2023-07-06 23:36:00 公開日:2023-07-04

# 解剖誘導型領域適応による3次元インベッドヒトポーズ推定

Anatomy-guided domain adaptation for 3D in-bed human pose estimation ( http://arxiv.org/abs/2211.12193v2 )

ライセンス: Link先を確認

Alexander Bigalke, Lasse Hansen, Jasper Diesel, Carlotta Hennigs, Philipp Rostalski, Mattias P. Heinrich

(参考訳) 3次元人間のポーズ推定は臨床モニタリングシステムの重要な構成要素である。しかし、深部ポーズ推定モデルの臨床的適用性は、十分なラベル付きトレーニングデータの必要性とともに、ドメインシフトの下での一般化の貧弱さによって制限されている。本稿では,ラベル付きソースからシフト未ラベルのターゲットドメインにモデルを適応させる新しいドメイン適応手法を提案する。本手法は,ヒト解剖学に関する事前知識に基づく2つの相補的適応戦略からなる。まず,対象領域における学習過程を,解剖学的に妥当なポーズの空間に制約することで導く。この目的のために, 従来の知識を解剖学的損失関数に組み込んで, 非対称な手足長, 骨長, 関節角度を解析した。第二に,自己学習のための疑似ラベルを解剖学的妥当性に応じてフィルタリングし,その概念を平均教師パラダイムに取り入れる。我々は、教師なしおよびソースなしのドメイン適応に適用可能なポイントクラウドベースのフレームワークで両方の戦略を統合する。パブリックSLPデータセットと新たに作成されたデータセットを用いて,2つの適応シナリオ下でのベッド内ポーズ推定を行う。本手法は,最先端ドメイン適応法を一貫して上回り,ベースラインモデルを31%/66%上回り,領域ギャップを65%/82%削減する。ソースコードはhttps://github.com/multimodallearning/da-3dhpe-anatomyで入手できる。

3D human pose estimation is a key component of clinical monitoring systems. The clinical applicability of deep pose estimation models, however, is limited by their poor generalization under domain shifts along with their need for sufficient labeled training data. As a remedy, we present a novel domain adaptation method, adapting a model from a labeled source to a shifted unlabeled target domain. Our method comprises two complementary adaptation strategies based on prior knowledge about human anatomy. First, we guide the learning process in the target domain by constraining predictions to the space of anatomically plausible poses. To this end, we embed the prior knowledge into an anatomical loss function that penalizes asymmetric limb lengths, implausible bone lengths, and implausible joint angles. Second, we propose to filter pseudo labels for self-training according to their anatomical plausibility and incorporate the concept into the Mean Teacher paradigm. We unify both strategies in a point cloud-based framework applicable to unsupervised and source-free domain adaptation. Evaluation is performed for in-bed pose estimation under two adaptation scenarios, using the public SLP dataset and a newly created dataset. Our method consistently outperforms various state-of-the-art domain adaptation methods, surpasses the baseline model by 31%/66%, and reduces the domain gap by 65%/82%. Source code is available at https://github.com/multimodallearning/da-3dhpe-anatomy.
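One term of the anatomical loss described above, the penalty on asymmetric limb lengths, can be sketched as follows. The joint names, the pose representation as a dictionary of 3D points, and the use of a mean absolute difference are assumptions made for this illustration; the paper's exact loss formulation is not given in the abstract.

```python
import math

def bone_length(pose, joint_a, joint_b):
    """Euclidean distance between two joints of a pose (dict name -> (x, y, z))."""
    return math.dist(pose[joint_a], pose[joint_b])

def symmetry_loss(pose, limb_pairs):
    """Mean absolute difference between the lengths of corresponding
    left and right bones; zero for a perfectly symmetric skeleton."""
    diffs = [abs(bone_length(pose, *left) - bone_length(pose, *right))
             for left, right in limb_pairs]
    return sum(diffs) / len(diffs)

# Hypothetical predicted pose (metres): the right upper arm is too short.
pose = {
    "l_shoulder": (0.20, 0.0, 0.0), "l_elbow": (0.50, 0.0, 0.0),
    "r_shoulder": (-0.20, 0.0, 0.0), "r_elbow": (-0.45, 0.0, 0.0),
}
limb_pairs = [(("l_shoulder", "l_elbow"), ("r_shoulder", "r_elbow"))]
loss = symmetry_loss(pose, limb_pairs)
print(loss)
```

Because such a term needs no ground-truth labels, it can be evaluated on predictions in the unlabeled target domain, which is what makes it usable for domain adaptation.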

翻訳日:2023-07-06 23:35:24 公開日:2023-07-04

# 量子客観性における冗長性とコンセンサスの意味

The meaning of redundancy and consensus in quantum objectivity ( http://arxiv.org/abs/2211.09150v2 )

ライセンス: Link先を確認

Dario A. Chisholm, Luca Innocenti, G. Massimo Palma

(参考訳) 量子客観性の文脈において「冗長性」と「合意」という用語はしばしば同義語として用いられるが、ここではこれらが量子-古典的遷移の異なる特徴を定量化する2つの関連しているが異なる概念として理解されるべきであることを示す。量子客観性、すなわちスペクトル放送構造と量子ダーウィン主義の2つの主要なフレームワークは、それぞれ冗長性とコンセンサスを定量化するのに最適であることを示す。さらに、非局所的に符号化された情報の明示的な例を解析することにより、冗長度とコンセンサスとの潜在的な相違を明らかにする。特に、これはスペクトル放送構造と量子ダーウィン主義の間の階層的関係を崩壊させる。我々のフレームワークは、量子客観性という文脈で既知の結果と将来の結果を解釈するための新しい視点を提供し、量子領域からの古典性の出現をより深く理解するための道を開く。

While the terms "redundancy" and "consensus" are often used as synonyms in the context of quantum objectivity, we show here that these should be understood as two related but distinct notions, that quantify different features of the quantum-to-classical transition. We show that the two main frameworks used to measure quantum objectivity, namely spectrum broadcast structure and quantum Darwinism, are best suited to quantify redundancy and consensus, respectively. Furthermore, by analyzing explicit examples of states with nonlocally encoded information, we highlight the potentially stark difference between the degrees of redundancy and consensus. In particular, this causes a break in the hierarchical relations between spectrum broadcast structure and quantum Darwinism. Our framework provides a new perspective to interpret known and future results in the context of quantum objectivity, paving the way for a deeper understanding of the emergence of classicality from the quantum realm.

翻訳日:2023-07-06 23:34:59 公開日:2023-07-04

# ケースベースニューラルネットワーク:時間変動と高次相互作用による生存率解析

Case-Base Neural Networks: survival analysis with time-varying, higher-order interactions ( http://arxiv.org/abs/2301.06535v3 )

ライセンス: Link先を確認

Jesse Islam, Maxime Turgeon, Robert Sladek, Sahir Bhatnagar

(参考訳) ニューラルネットワークに基づく生存法は、データ駆動の共変量相互作用をモデル化することができる。これらの手法は回帰に基づくアプローチよりも優れた予測性能を提供するが、時間変動相互作用や複雑なベースラインハザードをモデル化できるわけではない。そこで本研究では,ケースベースサンプリングフレームワークとフレキシブルニューラルネットワークアーキテクチャを組み合わせた新しいアプローチとして,ケースベースニューラルネットワーク(cbnns)を提案する。新たなサンプリング手法とデータ拡張を用いて、自然に検閲を考慮し、入力として時間がかかるかもしれないフィードフォワードニューラルネットワークを構築する。 cbnnは特定の瞬間に発生する事象の確率を予測し、ハザード関数を推定する。 CBNNの性能と回帰とニューラルネットワークに基づく生存法を比較したシミュレーションと,2つの時間依存メトリクスを用いた3つのケーススタディを行った。まず, 複雑なベースラインハザードと時間変動の相互作用を含むシミュレーションの性能を検証し, cbnn が競争相手を上回り, 全手法を評価する。次に,3つの実データアプリケーションに適用し,CBNNは2つの研究で競合するモデルより優れており,第3に同様の性能を示す。本研究は,ケースベースサンプリングと深層学習を組み合わせることで,データ駆動型・時間変動相互作用モデリングのための簡易かつ柔軟なモデリングフレームワークを提供する。 Rパッケージはhttps://github.com/Jesse-Islam/cbnnで入手できる。

Neural network-based survival methods can model data-driven covariate interactions. While these methods can provide better predictive performance than regression-based approaches, not all can model time-varying interactions and complex baseline hazards. To address this, we propose Case-Base Neural Networks (CBNNs) as a new approach that combines the case-base sampling framework with flexible neural network architectures. Using a novel sampling scheme and data augmentation to naturally account for censoring, we construct a feed-forward neural network that may take time as an input. CBNNs predict the probability of an event occurring at a given moment to estimate the hazard function. We compare the performance of CBNNs to regression and neural network-based survival methods in a simulation and three case studies using two time-dependent metrics. First, we examine performance on a simulation involving a complex baseline hazard and time-varying interactions to assess all methods, with CBNN outperforming competitors. Then, we apply all methods to three real data applications, with CBNNs outperforming the competing models in two studies and showing similar performance in the third. Our results highlight the benefit of combining case-base sampling with deep learning to provide a simple and flexible modeling framework for data-driven, time-varying interaction modeling of single event survival outcomes. An R package is available at https://github.com/Jesse-Islam/cbnn.
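The case-base sampling step that CBNNs build on can be sketched as follows. This follows the general case-base sampling idea (a case series of observed events plus a base series of person-moments drawn uniformly from the total follow-up time); the function name, the row layout, and the example data are assumptions, and the paper's own sampling scheme may differ in detail.

```python
import random

def case_base_sample(follow_up, event, n_base, seed=0):
    """Build a case-base dataset of rows (subject_id, time, y).
    follow_up[i]: observed follow-up time of subject i.
    event[i]:     1 if the event was observed, 0 if the subject was censored."""
    rng = random.Random(seed)
    # Case series: one row per observed event, at the event time (y = 1).
    cases = [(i, t, 1) for i, (t, e) in enumerate(zip(follow_up, event)) if e]
    # Base series: person-moments drawn uniformly from the total person-time,
    # i.e. subjects chosen proportionally to their follow-up, then a uniform time (y = 0).
    ids = rng.choices(range(len(follow_up)), weights=follow_up, k=n_base)
    base = [(i, rng.uniform(0, follow_up[i]), 0) for i in ids]
    return cases + base

follow_up = [5.0, 2.0, 8.0]
event = [1, 0, 1]
rows = case_base_sample(follow_up, event, n_base=10)
print(len(rows), sum(y for _, _, y in rows))
```

A binary classifier trained on such rows with time as an input feature contrasts event moments against sampled person-moments, which is how censored subjects contribute information without special handling.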

翻訳日:2023-07-06 23:27:35 公開日:2023-07-04

# 視覚言語関係アライメントのためのクロスモーダル注意調整

Cross-modal Attention Congruence Regularization for Vision-Language Relation Alignment ( http://arxiv.org/abs/2212.10549v2 )

ライセンス: Link先を確認

Rohan Pandey, Rulin Shao, Paul Pu Liang, Ruslan Salakhutdinov, Louis-Philippe Morency

(参考訳) マルチモーダル視覚言語モデルのスケールアップは近年進展しているが、これらのモデルはWinogroundのような構成的一般化ベンチマークを依然として苦手とすることが知られている。我々は、現在の視覚言語モデルに欠けている重要な要素が関係レベルのアライメント、すなわちテキスト中の有向な意味関係(例えば「mug in grass」)を画像中の空間的関係(例えば草に対するマグの相対位置)と一致させる能力であることを見出した。この問題に対処するために,「mug」から「grass」への有向言語注意(意味関係「in」を捉える)が、マグから草への有向視覚注意と一致するよう促すことで,関係アライメントを課すことができることを示す。トークンとそれに対応するオブジェクトは、クロスモーダル注意を用いてソフトに同定する。このソフトな関係アライメントの概念が,クロスモーダル注意行列によって与えられる「基底変換」の下で,視覚注意行列と言語注意行列の合同性を課すことと等価であることを証明する。直感的には、我々のアプローチは視覚注意を言語注意空間に射影して実際の言語注意からの乖離を計算し、その逆も行う。 CACR(Cross-modal Attention Congruence Regularization)損失をUNITERに適用し,Winogroundにおける最先端手法を上回る性能を得た。

Despite recent progress towards scaling up multimodal vision-language models, these models are still known to struggle on compositional generalization benchmarks such as Winoground. We find that a critical component lacking from current vision-language models is relation-level alignment: the ability to match directional semantic relations in text (e.g., "mug in grass") with spatial relationships in the image (e.g., the position of the mug relative to the grass). To tackle this problem, we show that relation alignment can be enforced by encouraging the directed language attention from 'mug' to 'grass' (capturing the semantic relation 'in') to match the directed visual attention from the mug to the grass. Tokens and their corresponding objects are softly identified using the cross-modal attention. We prove that this notion of soft relation alignment is equivalent to enforcing congruence between vision and language attention matrices under a 'change of basis' provided by the cross-modal attention matrix. Intuitively, our approach projects visual attention into the language attention space to calculate its divergence from the actual language attention, and vice versa. We apply our Cross-modal Attention Congruence Regularization (CACR) loss to UNITER and improve on the state-of-the-art approach to Winoground.
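The "change of basis" idea can be sketched in a few lines. This is a minimal illustration, not the authors' loss: it assumes the projection takes the form C · A_vis · Cᵀ with C the language-to-vision cross-modal attention, and it uses a mean squared difference as a stand-in divergence. Only the vision-to-language direction is shown, and all function names are hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def congruence_penalty(lang_attn, vis_attn, cross_attn):
    """lang_attn: n x n, vis_attn: m x m, cross_attn: n x m (tokens -> regions)."""
    # Project visual attention into the language attention space.
    projected = matmul(matmul(cross_attn, vis_attn), transpose(cross_attn))
    n = len(lang_attn)
    # Assumed divergence: mean squared difference between the two matrices.
    return sum((projected[i][j] - lang_attn[i][j]) ** 2
               for i in range(n) for j in range(n)) / (n * n)

# Token 0 maps to region 1 and token 1 to region 0 (a hard alignment).
cross = [[0.0, 1.0], [1.0, 0.0]]
vis = [[0.9, 0.1], [0.2, 0.8]]
aligned_lang = [[0.8, 0.2], [0.1, 0.9]]     # vis with rows/columns permuted
misaligned_lang = [[0.9, 0.1], [0.2, 0.8]]  # same numbers, wrong pairing

penalty_aligned = congruence_penalty(aligned_lang, vis, cross)
penalty_misaligned = congruence_penalty(misaligned_lang, vis, cross)
```

With a soft (non-permutation) cross-modal attention the projection becomes a weighted blend of region-to-region attentions, which is what makes the alignment "soft".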

翻訳日:2023-07-06 23:26:44 公開日:2023-07-04

# ミニモデル適応: 整列した浅い学習による事前学習済みモデルの新しい言語への効率的な拡張

Mini-Model Adaptation: Efficiently Extending Pretrained Models to New Languages via Aligned Shallow Training ( http://arxiv.org/abs/2212.10503v2 )

ライセンス: Link先を確認

Kelly Marchisio, Patrick Lewis, Yihong Chen, Mikel Artetxe

(参考訳) 先行研究は、トランスフォーマー本体を凍結したまま新しい埋め込みを学習することで、事前学習済みのマスク言語モデル(MLM)を新しい言語に拡張できることを示している。このアプローチは、学習するパラメータがごく一部であるにもかかわらず、新しい埋め込みの学習にモデル全体の順伝播と逆伝播が必要となるため、計算効率が良くない。我々は、大規模モデルのパラメータの一部から浅いミニモデルを構築する、計算効率のよい代替手法であるミニモデル適応を提案する。新しい言語固有の埋め込みはミニモデル上で効率的に学習でき、整列した大規模モデルに差し込むことで高速な言語間転移が可能になる。ミニモデルを学習する2つのアプローチを検討する。MiniJointは、中間層に副次的なMLMヘッドを持つ1つのトランスフォーマーを用いて主モデルとミニモデルを同時に事前学習する。MiniPostは、通常の事前学習済みモデルから出発し、いくつかの層を抽出・凍結してミニモデルを構築し、その上で少数のパラメータを学習する。XNLI、MLQA、PAWS-Xの実験では、ミニモデル適応は平均で2.3倍少ない計算量で標準手法の性能に匹敵する。

Prior work shows that it is possible to expand pretrained Masked Language Models (MLMs) to new languages by learning a new set of embeddings, while keeping the transformer body frozen. Despite learning a small subset of parameters, this approach is not compute-efficient, as training the new embeddings requires a full forward and backward pass over the entire model. We propose mini-model adaptation, a compute-efficient alternative that builds a shallow mini-model from a fraction of a large model's parameters. New language-specific embeddings can then be efficiently trained over the mini-model and plugged into the aligned large model for rapid cross-lingual transfer. We explore two approaches to learn mini-models: MiniJoint, which jointly pretrains the primary model and the mini-model using a single transformer with a secondary MLM head at a middle layer; and MiniPost, where we start from a regular pretrained model, build a mini-model by extracting and freezing a few layers, and learn a small number of parameters on top. Experiments on XNLI, MLQA and PAWS-X show that mini-model adaptation matches the performance of the standard approach using 2.3x less compute on average.

翻訳日:2023-07-06 23:26:16 公開日:2023-07-04

# 双対領域における画家的イメージ調和

Painterly Image Harmonization in Dual Domains ( http://arxiv.org/abs/2212.08846v4 )

ライセンス: Link先を確認

Junyan Cao, Yan Hong, Li Niu

(参考訳) 画像調和は、前景の外観を背景と適合するように調整することにより、視覚的に調和した複合画像を作成することを目的としている。合成画像が写真前景と画家的背景を有する場合、この課題は画家的イメージ調和と呼ばれる。このタスクには、時間を要するか、うまく調和した結果を生み出すのに弱い、ごくわずかの作業しかありません。本研究では,空間領域と周波数領域の両方の複合画像とを調和させるデュアルドメイン生成器とデュアルドメイン判別器からなる,新しい画家的調和ネットワークを提案する。デュアルドメイン生成器は,空間領域におけるadainモジュールと周波数領域における提案するresfftモジュールとの調和を行う。二重領域判別器は、各パッチの空間的特徴と周波数特徴に基づいて不調和なパッチを識別し、逆向きにジェネレータの能力を高める。ベンチマークデータセットの大規模な実験により,本手法の有効性が示された。私たちのコードとモデルはhttps://github.com/bcmi/PHDNet-Painterly-Image-Harmonizationで公開されています。

Image harmonization aims to produce visually harmonious composite images by adjusting the foreground appearance to be compatible with the background. When the composite image has a photographic foreground and a painterly background, the task is called painterly image harmonization. There are only a few works on this task, and they are either time-consuming or weak at generating well-harmonized results. In this work, we propose a novel painterly harmonization network consisting of a dual-domain generator and a dual-domain discriminator, which harmonizes the composite image in both the spatial domain and the frequency domain. The dual-domain generator performs harmonization by using AdaIN modules in the spatial domain and our proposed ResFFT modules in the frequency domain. The dual-domain discriminator attempts to distinguish the inharmonious patches based on the spatial feature and frequency feature of each patch, which can enhance the ability of the generator in an adversarial manner. Extensive experiments on the benchmark dataset show the effectiveness of our method. Our code and model are available at https://github.com/bcmi/PHDNet-Painterly-Image-Harmonization.
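For readers unfamiliar with AdaIN, the operation is the usual adaptive instance normalization, AdaIN(x, y) = σ(y) · (x − μ(x)) / σ(x) + μ(y), applied per channel. The sketch below shows it on a single flattened channel. Treating foreground features as the content and background features as the style is an assumption about how the module is used here, and `adain_channel` is a hypothetical name. The ResFFT module is not sketched.

```python
import math

def mean_std(values, eps=1e-5):
    """Mean and (eps-stabilised) standard deviation of one feature channel."""
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)
    return mu, math.sqrt(var + eps)

def adain_channel(content, style):
    """Re-normalise `content` so its statistics match those of `style`."""
    c_mu, c_sd = mean_std(content)
    s_mu, s_sd = mean_std(style)
    return [s_sd * (v - c_mu) / c_sd + s_mu for v in content]

foreground = [1.0, 2.0, 3.0, 4.0]      # photographic foreground features
background = [10.0, 20.0, 30.0, 40.0]  # painterly background features
harmonized = adain_channel(foreground, background)
out_mu, out_sd = mean_std(harmonized, eps=0.0)
```

After the transfer, the foreground channel carries the background's mean and (up to the stabilising epsilon) its standard deviation, while the relative ordering of its values is preserved.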

翻訳日:2023-07-06 23:25:25 公開日:2023-07-04

# ドメイン内シナリオを超えて:ロバスト密度対応キャリブレーション

Beyond In-Domain Scenarios: Robust Density-Aware Calibration ( http://arxiv.org/abs/2302.05118v2 )

ライセンス: Link先を確認

Christian Tomani, Futa Waseda, Yuesong Shen and Daniel Cremers

(参考訳) 深層ニューラルネットワークが安全性が重要なアプリケーションにますます導入される中、深層学習モデルを校正して不確実性を考慮した予測を得ることは極めて重要である。既存のポストホックなキャリブレーション手法は、ドメイン内テストデータセットでは優れた結果を達成しているが、ドメインシフトおよびドメイン外(OOD)シナリオでは信頼できる不確実性推定が得られないという限界がある。我々は,k近傍法(KNN)に基づき,精度を保ちつつ密度を考慮するキャリブレーション手法DAC(Density-Aware Calibration)を提案することで,このギャップを埋めることを目指す。既存のポストホック手法とは対照的に,分類器の隠れ層を不確実性に関する情報源として利用し,その重要性を調べる。DACは最先端のポストホック手法と容易に組み合わせられる汎用手法であることを示す。DACは、ドメインシフトとOODにおけるキャリブレーション性能の頑健性を高めつつ、ドメイン内での予測不確実性の推定を良好に維持する。DACが多数のモデルアーキテクチャ、データセット、評価指標にわたって一貫してキャリブレーションを改善することを示す。さらに,大量のデータで事前学習された最近の大規模ニューラルネットワークにおいても,DACがキャリブレーションを大幅に改善することを示す。

Calibrating deep learning models to yield uncertainty-aware predictions is crucial as deep neural networks get increasingly deployed in safety-critical applications. While existing post-hoc calibration methods achieve impressive results on in-domain test datasets, they are limited by their inability to yield reliable uncertainty estimates in domain-shift and out-of-domain (OOD) scenarios. We aim to bridge this gap by proposing DAC, an accuracy-preserving as well as Density-Aware Calibration method based on k-nearest-neighbors (KNN). In contrast to existing post-hoc methods, we utilize hidden layers of classifiers as a source for uncertainty-related information and study their importance. We show that DAC is a generic method that can readily be combined with state-of-the-art post-hoc methods. DAC boosts the robustness of calibration performance in domain-shift and OOD, while maintaining excellent in-domain predictive uncertainty estimates. We demonstrate that DAC leads to consistently better calibration across a large number of model architectures, datasets, and metrics. Additionally, we show that DAC improves calibration substantially on recent large-scale neural networks pre-trained on vast amounts of data.
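One way to picture a density-aware, accuracy-preserving calibrator is a temperature that grows with the k-nearest-neighbor distance of a sample's hidden features to the training features. The sketch below is an illustration under that assumption only. The linear form `w0 + w1 * distance`, the use of a single feature layer, and all names are hypothetical, and in practice the weights would be fitted on held-out data.

```python
import math

def knn_distance(feat, train_feats, k=2):
    """Mean Euclidean distance from `feat` to its k nearest training features."""
    dists = sorted(math.dist(feat, t) for t in train_feats)
    return sum(dists[:k]) / k

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]

def density_aware_probs(logits, feat, train_feats, w0=1.0, w1=0.5, k=2):
    # Sparse regions (large kNN distance) get a higher temperature,
    # which flattens the predicted distribution.
    temperature = w0 + w1 * knn_distance(feat, train_feats, k)
    return softmax([v / temperature for v in logits])

train_feats = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
logits = [4.0, 1.0, 0.0]
near = density_aware_probs(logits, (0.1, 0.1), train_feats)
far = density_aware_probs(logits, (10.0, 10.0), train_feats)
```

Because every logit is divided by the same positive scalar, the arg-max class is unchanged, so accuracy is preserved while confidence drops for samples far from the training data.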

翻訳日:2023-07-06 23:18:29 公開日:2023-07-04

# データ中心機械学習のための再ラベル法

The Re-Label Method For Data-Centric Machine Learning ( http://arxiv.org/abs/2302.04391v3 )

ライセンス: Link先を確認

Tong Guo

(参考訳) 業界深層学習アプリケーションでは、手作業でラベル付けしたデータは、一定の数のノイズデータを持っています。この問題を解決し、開発データセットで90以上のスコアを達成するために、人間のラベル付けにおける参照としてモデル予測を考慮し、ノイズデータを見つけ、ノイズデータを再ラベルする簡単な方法を提案する。本稿では,分類,シーケンスタグ付け,オブジェクト検出,シーケンス生成,クリックスルー率予測など,幅広いディープラーニングタスクのセットについて述べる。実験結果と人体評価結果は,我々の考えを検証する。

In industrial deep learning applications, manually labeled data contains a certain amount of noise. To address this problem and achieve a score above 90 on the dev dataset, we present a simple method that finds the noisy samples and has humans re-label them, using the model predictions as references during labeling. In this paper, we illustrate the idea on a broad set of deep learning tasks, including classification, sequence tagging, object detection, sequence generation, and click-through rate prediction. The experimental results and human evaluation results verify the idea.
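A minimal sketch of the selection step for a classification task follows. It assumes that "noisy" candidates are samples whose human label disagrees with a confident model prediction. The confidence threshold and the function name are assumptions, since the abstract only says that predictions serve as references for human re-labeling.

```python
def flag_for_relabel(labels, predictions, confidences, threshold=0.9):
    """Return indices where a confident model prediction contradicts the label."""
    return [i for i, (y, p, c) in enumerate(zip(labels, predictions, confidences))
            if y != p and c >= threshold]

labels = ["cat", "dog", "cat", "dog"]
predictions = ["cat", "cat", "dog", "dog"]
confidences = [0.99, 0.95, 0.55, 0.97]

# Only index 1 is both a disagreement and a confident prediction.
to_review = flag_for_relabel(labels, predictions, confidences)
```

The flagged samples would then go back to annotators together with the model's prediction, and the model is retrained on the corrected labels.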

翻訳日:2023-07-06 23:17:47 公開日:2023-07-04

# ネットワークにおける2つの正解分割のための生成モデル

Generative models for two-ground-truth partitions in networks ( http://arxiv.org/abs/2302.02787v2 )

ライセンス: Link先を確認

Lena Mangold and Camille Roth

(参考訳) ネットワークのメソスケール構造を特徴付けるために、無数のアプローチが提案されている。明らかに、異なる種類のパターンを検出するために設計された異なる手法は、ネットワークのメソスケール構造に様々な答えをもたらす可能性がある。しかし、あるメソッドの複数の実行でさえ、多様で矛盾する結果をもたらすことがあるため、ネットワークの複数の(局所的に最適な)メソスケールの説明を含む、パーティションのランドスケープ全体を生成できる。このような曖昧さは、ネットワーク内の複数の定性的に異なる「根拠真理」パーティションを見つけるためのこれらの方法の能力をより詳しく見る動機となる。本稿では,1つのベンチマークネットワークのメソスケール構造に2つの異なるパーティションを組み込むことのできる生成モデルである確率的クロスブロックモデル(SCBM)を提案する。本研究では,確率ブロックモデル (SBM) のパワーを推定し,異なる強度の両コミュニティとコア周辺構造を暗黙的に植え付けることで,ベンチマークモデルの適用例を示す。モデル設計と実験的なセットアップから,2つのパーティションを個別に検出する能力はSBM変種によって異なり,両パーティションの共存は極めて限られたケースでのみ回復されることがわかった。以上の結果から,ほとんどの例では,他のパーティションが存在する場合でも,ひとつの構造のみを検出できることが示唆された。異なる競合する説明が存在する場合、分割の景観全体を考慮する必要性を強調し、分割共存検出法を前進させるために将来の研究を動機付ける。また,ネットワークのメソスケール構造におけるあいまいさを検出するために,新しい手法や既存手法のさらなる探索を可能にすることで,ベンチマークネットワークの分野に寄与する。

A myriad of approaches have been proposed to characterise the mesoscale structure of networks - most often as a partition based on patterns variously called communities, blocks, or clusters. Clearly, distinct methods designed to detect different types of patterns may provide a variety of answers to the network's mesoscale structure. Yet, even multiple runs of a given method can sometimes yield diverse and conflicting results, producing entire landscapes of partitions which potentially include multiple (locally optimal) mesoscale explanations of the network. Such ambiguity motivates a closer look at the ability of these methods to find multiple qualitatively different 'ground truth' partitions in a network. Here, we propose the stochastic cross-block model (SCBM), a generative model which allows for two distinct partitions to be built into the mesoscale structure of a single benchmark network. We demonstrate a use case of the benchmark model by appraising the power of stochastic block models (SBMs) to detect implicitly planted coexisting bi-community and core-periphery structures of different strengths. Given our model design and experimental set-up, we find that the ability to detect the two partitions individually varies by SBM variant and that coexistence of both partitions is recovered only in a very limited number of cases. Our findings suggest that in most instances only one - in some way dominating - structure can be detected, even in the presence of other partitions. They underline the need for considering entire landscapes of partitions when different competing explanations exist and motivate future research to advance partition coexistence detection methods. Our model also contributes to the field of benchmark networks more generally by enabling further exploration of the ability of new and existing methods to detect ambiguity in the mesoscale structure of networks.

翻訳日:2023-07-06 23:17:38 公開日:2023-07-04

# マルチパーティイト非局所性とデバイス非依存効果ウィットネスの階層性

A Hierarchy of Multipartite Nonlocality and Device-Independent Effect Witnesses ( http://arxiv.org/abs/2301.12081v2 )

ライセンス: Link先を確認

Peter Bierhorst, Jitendra Prakash

(参考訳) 最近の新しい定義によれば、多者間の振る舞いが真に多者間非局所(GMNL)であるとは、二者間のみの非局所リソースからなるネットワーク上の測定によって、全参加者が共有する局所的(古典的)リソースを補ったとしても、その振る舞いをモデル化できない場合をいう。新しい定義同士は、基礎となる二者間リソースに対するエンタングル測定を許すかどうか、およびリソース間の超量子的な振る舞いを許すかどうかで異なる。本稿では,3者量子ネットワークにおけるこれらのGMNLの候補定義の階層全体を分類し,ネットワーク効果のデバイス非依存な証拠(ウィットネス)との密接な関係を明らかにする。重要な発見は、最も単純で非自明な多者間測定シナリオ(3者、2つの測定設定、2つの結果)において、エンタングル測定と超量子リソースを禁止した二者間ネットワークではシミュレートできず、したがって最も一般的な形のGMNLを示す一方で、二者間のみの量子状態とエンタングル測定を用いればシミュレートできる振る舞いが存在することである。これは、従来のプロトコルよりも少ない設定でエンタングル測定をデバイス非依存に認証する方法を示唆する。驚くべきことに、この(3,2,2)の振る舞いは、これまでエンタングル測定のデバイス非依存な証拠として研究されてきた他の振る舞いと同様に、エンタングル測定を禁止したまま超量子の二者間リソースを許容する、GMNL階層のより高い階層ですべてシミュレートできる。これは、エンタングル測定を二者間非局所性とは異なる観測可能な現象として理論非依存に理解することに対する課題を提起する。

According to recent new definitions, a multi-party behavior is genuinely multipartite nonlocal (GMNL) if it cannot be modeled by measurements on an underlying network of bipartite-only nonlocal resources, possibly supplemented with local (classical) resources shared by all parties. The new definitions differ on whether to allow entangled measurements upon, and/or superquantum behaviors among, the underlying bipartite resources. Here, we categorize the full hierarchy of these new candidate definitions of GMNL in three-party quantum networks, highlighting the intimate link to device-independent witnesses of network effects. A key finding is the existence of a behavior in the simplest nontrivial multi-partite measurement scenario (3 parties, 2 measurement settings, and 2 outcomes) that cannot be simulated in a bipartite network prohibiting entangled measurements and superquantum resources -- thus witnessing the most general form of GMNL -- but can be simulated with bipartite-only quantum states with an entangled measurement, indicating an approach to device independent certification of entangled measurements with fewer settings than in previous protocols. Surprisingly, we also find that this (3,2,2) behavior, as well as the others previously studied as device-independent witnesses of entangled measurements, can all be simulated at a higher echelon of the GMNL hierarchy that allows superquantum bipartite resources while still prohibiting entangled measurements. This poses a challenge to a theory-independent understanding of entangled measurements as an observable phenomenon distinct from bipartite nonlocality.

翻訳日:2023-07-06 23:17:10 公開日:2023-07-04

# 神経作用素の分布外リスク境界とヘルムホルツ方程式への応用

Out-of-distributional risk bounds for neural operators with applications to the Helmholtz equation ( http://arxiv.org/abs/2301.11509v3 )

ライセンス: Link先を確認

J. Antonio Lara Benitez, Takashi Furuya, Florian Faucher, Anastasis Kratsios, Xavier Tricoche, Maarten V. de Hoop

(参考訳) PDEによって定義された幅広い演算子の近似に顕著な成功にもかかわらず、既存のニューラル演算子(NO)は必ずしも全ての物理問題に対してうまく機能しない。ここでは高周波波に着目し,欠点を浮き彫りにする。そこで本研究では,nos のサブファミリーを提案し,境界領域上のヘルムホルツ方程式の境界値と解への波動速度の非線形作用素マッピングを拡張的に近似する手法を提案する。後者の作用素は、逆問題の研究において一般に'forward'演算子と呼ばれる。提案手法は,確率深度などのトランスフォーマーや技術からインスピレーションを得ている。本実験は,確率的深度導入の一般化と関連性において,ある種の驚きを明らかにするものである。我々のNOは、トレーニングディストリビューション内でのテストだけでなく、アウト・オブ・ディストリビューションのシナリオに対しても、標準的なNOよりも優れたパフォーマンスを示しています。この観察を掘り下げるために、修正されたモデルに関連するラデマッハ複雑性を詳細に分析し、既存のnosが満たさない確率的深さに結びついた上限を証明します。さらに,バナッハ空間上のガウス測度に合わせた,確率的深さと境界に関する新たな分布的リスクが得られた。我々は、NOsのサブファミリーのハイパーネットワークバージョンを、前述のフォワード演算子の代理モデルとして提案することで結論付ける。

Despite their remarkable success in approximating a wide range of operators defined by PDEs, existing neural operators (NOs) do not necessarily perform well for all physics problems. We focus here on high-frequency waves to highlight possible shortcomings. To resolve these, we propose a subfamily of NOs enabling an enhanced empirical approximation of the nonlinear operator mapping wave speed to solution, or boundary values for the Helmholtz equation on a bounded domain. The latter operator is commonly referred to as the "forward" operator in the study of inverse problems. Our methodology draws inspiration from transformers and techniques such as stochastic depth. Our experiments reveal certain surprises in the generalization and the relevance of introducing stochastic depth. Our NOs show superior performance as compared with standard NOs, not only for testing within the training distribution but also for out-of-distribution scenarios. To delve into this observation, we offer an in-depth analysis of the Rademacher complexity associated with our modified models and prove an upper bound tied to their stochastic depth that existing NOs do not satisfy. Furthermore, we obtain a novel out-of-distribution risk bound tailored to Gaussian measures on Banach spaces, again relating stochastic depth with the bound. We conclude by proposing a hypernetwork version of the subfamily of NOs as a surrogate model for the mentioned forward operator.
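Stochastic depth, which the abstract names as an ingredient, can be sketched generically: during training each residual block is skipped with some probability, and at test time its output is scaled by the survival probability. The sketch below shows only this generic mechanism. How the authors place it inside their neural operator layers is not stated in the abstract, and the names are hypothetical.

```python
import random

def stochastic_depth_forward(x, blocks, survival_prob, training, rng):
    """Apply residual blocks, randomly dropping each one during training."""
    for block in blocks:
        if training:
            # Keep the block with probability `survival_prob`, else skip it.
            if rng.random() < survival_prob:
                x = [xi + bi for xi, bi in zip(x, block(x))]
        else:
            # At test time every block is used, scaled by its survival rate.
            x = [xi + survival_prob * bi for xi, bi in zip(x, block(x))]
    return x

def double(v):
    """Toy residual branch: returns 2 * input."""
    return [2.0 * vi for vi in v]

rng = random.Random(0)
eval_out = stochastic_depth_forward([1.0], [double, double], 0.5, False, rng)
always_dropped = stochastic_depth_forward([1.0], [double, double], 0.0, True, rng)
never_dropped = stochastic_depth_forward([1.0], [double, double], 1.0, True, rng)
```

The effective depth of the network therefore varies from one training step to the next, which is the quantity the abstract's complexity bound is tied to.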

翻訳日:2023-07-06 23:15:38 公開日:2023-07-04

# 包括的機械翻訳のためのジェンダー中立化:理論基礎からオープンチャレンジへ

Gender Neutralization for an Inclusive Machine Translation: from Theoretical Foundations to Open Challenges ( http://arxiv.org/abs/2301.10075v3 )

ライセンス: Link先を確認

Andrea Piergentili, Dennis Fucci, Beatrice Savoldi, Luisa Bentivogli, Matteo Negri

(参考訳) 言語技術における男女排他性は、重要な研究テーマとなっている。本研究では,性中立翻訳(gnt)を,性別偏差と差別を継続する機械翻訳(mt)モデルによって達成される目的として,性中立翻訳(gnt)について検討する。具体的には、ジェンダー関連言語移行問題を表す言語対である、英語からイタリア語への翻訳に焦点を当てる。 GNTの定義には,ジェンダーを包摂する言語に関する制度的ガイドラインの選択,利用シナリオの議論,MTにおけるGNTの実行に関する技術的課題について検討し,MTにおけるより大きな傾きへの発展を促すための潜在的な解決策について議論する。

Gender inclusivity in language technologies has become a prominent research topic. In this study, we explore gender-neutral translation (GNT) as a form of gender inclusivity and a goal to be achieved by machine translation (MT) models, which have been found to perpetuate gender bias and discrimination. Specifically, we focus on translation from English into Italian, a language pair representative of salient gender-related linguistic transfer problems. To define GNT, we review a selection of relevant institutional guidelines for gender-inclusive language, discuss its scenarios of use, and examine the technical challenges of performing GNT in MT, concluding with a discussion of potential solutions to encourage advancements toward greater inclusivity in MT.

翻訳日:2023-07-06 23:15:15 公開日:2023-07-04

# イオン擬ポテンシャルを用いた電池材料の量子シミュレーション

Quantum simulation of battery materials using ionic pseudopotentials ( http://arxiv.org/abs/2302.07981v2 )

ライセンス: Link先を確認

Modjtaba Shokrian Zini, Alain Delgado, Roberto dos Reis, Pablo A. M. Casares, Jonathan E. Mueller, Arne-Christian Voigt, Juan Miguel Arrazola

(参考訳) イオン擬ポテンシャルは、核と核電子による有効ポテンシャルをモデル化するために、材料の古典的シミュレーションで広く使われている。電子の少ないモデリングは、システムの状態を正確に表すのに必要な平面波の数を明示的に減少させる。本研究では,疑似ポテンシャルを用いた量子コンピュータ上での周期的物質シミュレーションのコストを削減する量子アルゴリズムを提案する。平面波に基づくハミルトニアンの第一量子化表現を用いた量子化に基づく量子位相推定アルゴリズムを用いる。我々は、ハミルトニアンの量子化のための高度に最適化されたコンパイル戦略を開発することにより、擬ポテンシャルの複雑さを量子シミュレーションに組み込むという課題に対処する。これは分離可能な擬ポテンシャルの形式を利用するユニタリ分解の線形結合を含んでいる。我々の戦略は、量子読み取り専用メモリサブルーチンを量子算術のより効率的な代替手段として利用する。我々は, リチウム含有カソード材料をシミュレートするための計算コストを推定し, より正確なシミュレーションを行い, 余剰容量に対する可逆アクセスを得るための戦略を提示する必要がある。我々は,酸化マンガンリチウム,酸化マンガンリチウム,フッ化マンガンリチウムの3つの材料について,十分な精度のシミュレーションを行うために必要なキュービット数とトフォリゲート数を推定した。最適化されたコンパイル戦略により,Toffoliの総コストは,固定目標精度のため,従来よりも4桁も低い擬ポテンシャル型量子アルゴリズムが実現した。

Ionic pseudopotentials are widely used in classical simulations of materials to model the effective potential due to the nucleus and the core electrons. Modeling fewer electrons explicitly results in a reduction in the number of plane waves needed to accurately represent the states of a system. In this work, we introduce a quantum algorithm that uses pseudopotentials to reduce the cost of simulating periodic materials on a quantum computer. We use a qubitization-based quantum phase estimation algorithm that employs a first-quantization representation of the Hamiltonian in a plane-wave basis. We address the challenge of incorporating the complexity of pseudopotentials into quantum simulations by developing highly-optimized compilation strategies for the qubitization of the Hamiltonian. This includes a linear combination of unitaries decomposition that leverages the form of separable pseudopotentials. Our strategies make use of quantum read-only memory subroutines as a more efficient alternative to quantum arithmetic. We estimate the computational cost of applying our algorithm to simulating lithium-excess cathode materials for batteries, where more accurate simulations are needed to inform strategies for gaining reversible access to the excess capacity they offer. We estimate the number of qubits and Toffoli gates required to perform sufficiently accurate simulations with our algorithm for three materials: lithium manganese oxide, lithium nickel-manganese oxide, and lithium manganese oxyfluoride. Our optimized compilation strategies result in a pseudopotential-based quantum algorithm with a total Toffoli cost four orders of magnitude lower than the previous state of the art for a fixed target accuracy.

翻訳日:2023-07-06 23:05:54 公開日:2023-07-04

# 正規化層のみをチューニングする表現力

The Expressive Power of Tuning Only the Normalization Layers ( http://arxiv.org/abs/2302.07937v2 )

ライセンス: Link先を確認

Angeliki Giannou, Shashank Rajput, Dimitris Papailiopoulos

(参考訳) BatchやLayer-Normalizationといった特徴正規化変換は、最先端のディープニューラルネットワークの必須要素となっている。近年の微調整型大規模事前学習モデルの研究は、これらのアフィン変換のパラメータを調整するだけで下流タスクの精度が向上することを示している。これらの知見は、凍結ネットワークの正規化層をチューニングする表現力に関する疑問を提起する。本稿では,この問題への第一歩として,ランダムなReLUネットワークにおいて,正規化層のみを微調整することで,$O(\sqrt{\text{width}})$倍のターゲットネットワークを再構築可能であることを示す。従来の経験的作業と一致して、十分な過パラメータ化の下でランダムに分散されたネットワークであっても、これは成り立つことを示す。

Feature normalization transforms such as Batch and Layer-Normalization have become indispensable ingredients of state-of-the-art deep neural networks. Recent studies on fine-tuning large pretrained models indicate that just tuning the parameters of these affine transforms can achieve high accuracy for downstream tasks. These findings open the questions about the expressive power of tuning the normalization layers of frozen networks. In this work, we take the first step towards this question and show that for random ReLU networks, fine-tuning only its normalization layers can reconstruct any target network that is $O(\sqrt{\text{width}})$ times smaller. We show that this holds even for randomly sparsified networks, under sufficient overparameterization, in agreement with prior empirical work.

翻訳日:2023-07-06 23:05:29 公開日:2023-07-04

# ラベリング予算制約下での深い異常検出

Deep Anomaly Detection under Labeling Budget Constraints ( http://arxiv.org/abs/2302.07832v2 )

ライセンス: Link先を確認

Aodong Li, Chen Qiu, Marius Kloft, Padhraic Smyth, Stephan Mandt, Maja Rudolph

(参考訳) 専門家のフィードバックに対する情報的データポイントの選択は、医療診断や不正検出など、さまざまなコンテキストにおける異常検出(AD)のパフォーマンスを著しく向上させることができる。本稿では,ラベル付きクエリからラベル付きデータへの異常スコアを一般化する理論的条件の集合を決定する。これらの結果から,予算制約の下で最適なデータカバレッジを持つデータラベリング戦略を提案する。さらに,半教師付きADのための新しい学習フレームワークを提案する。画像, 表, ビデオデータセットの大規模な実験により, 予算制約下での最先端の半教師付きAD性能が得られた。

Selecting informative data points for expert feedback can significantly improve the performance of anomaly detection (AD) in various contexts, such as medical diagnostics or fraud detection. In this paper, we determine a set of theoretical conditions under which anomaly scores generalize from labeled queries to unlabeled data. Motivated by these results, we propose a data labeling strategy with optimal data coverage under labeling budget constraints. In addition, we propose a new learning framework for semi-supervised AD. Extensive experiments on image, tabular, and video data sets show that our approach results in state-of-the-art semi-supervised AD performance under labeling budget constraints.
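A labeling strategy aimed at data coverage can be illustrated with greedy farthest-point selection: each new query is the point farthest from everything already chosen. This is a stand-in chosen for illustration. The abstract does not specify the strategy, and the function name and the `first` parameter are hypothetical.

```python
import math

def select_queries(points, budget, first=0):
    """Greedily pick `budget` indices that spread out over the data."""
    chosen = [first]
    # Distance from every point to its nearest already-chosen point.
    min_dist = [math.dist(p, points[first]) for p in points]
    while len(chosen) < min(budget, len(points)):
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        min_dist = [min(d, math.dist(p, points[nxt]))
                    for d, p in zip(min_dist, points)]
    return chosen

points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
queries = select_queries(points, budget=3)
```

With a budget of three, the selection takes one point from each of the three visible groups instead of spending two labels on near-duplicates.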

翻訳日:2023-07-06 23:05:16 公開日:2023-07-04

# 次元低減とMARS

Dimension Reduction and MARS ( http://arxiv.org/abs/2302.05790v2 )

ライセンス: Link先を確認

Yu Liu, Degui Li, Yingcun Xia

(参考訳) 多変量適応回帰スプライン(MARS)は、非パラメトリック多変量回帰の一般的な推定方法の1つである。しかし、MARSは境界スプラインに基づいてコヴァリエートの相互作用を組み込むため、境界スプラインの積を使わなければならないため、相互作用の順序が高ければ管理不能な基底関数の数が増加し、推定効率が低下する。本稿では,十分次元削減を実現する共変数の線形結合を用いてMARSの性能を向上させる。 MARSの特殊基底関数は回帰関数の勾配の計算を容易にし、勾配の外部積の固有解析により線形結合の推定を行う。いくつかの技術的条件下では,提案手法の漸近理論が確立されている。シミュレーションと経験的応用の両方を含む数値的研究は、回帰推定と予測においてMARSや他の一般的な非パラメトリック法よりも次元の減少と改善に有効であることを示す。

The multivariate adaptive regression spline (MARS) is one of the popular estimation methods for nonparametric multivariate regressions. However, as MARS is based on marginal splines, to incorporate interactions of covariates, products of the marginal splines must be used, which leads to an unmanageable number of basis functions when the order of interaction is high and results in low estimation efficiency. In this paper, we improve the performance of MARS by using linear combinations of the covariates which achieve sufficient dimension reduction. The special basis functions of MARS facilitate calculation of gradients of the regression function, and estimation of the linear combinations is obtained via eigen-analysis of the outer-product of the gradients. Under some technical conditions, the asymptotic theory is established for the proposed estimation method. Numerical studies including both simulation and empirical applications show its effectiveness in dimension reduction and improvement over MARS and other commonly-used nonparametric methods in regression estimation and prediction.

翻訳日:2023-07-06 23:04:41 公開日:2023-07-04

# (非)マルコフ量子チャネル下における離散ウィグナー関数を用いた状態の量子性の活用

Harnessing quantumness of states using discrete Wigner functions under (non)-Markovian quantum channels ( http://arxiv.org/abs/2303.05291v2 )

ライセンス: Link先を確認

Jai Lalita, K. G. Paulson, Subhashish Banerjee

(参考訳) 離散ウィグナー関数(DWF)の負性は非古典性の尺度であり、しばしば系の量子コヒーレンス度を定量化するために用いられる。異なる量子チャネルの下でのウィグナーの負性性とその進化の研究は、実用的な量子コンピューティングシステムの開発に不可欠である環境との相互作用の下での量子状態の安定性と堅牢性についての洞察を与えることができる。我々は,(非)マルコフ型ランダム電信ノイズ (RTN) と振幅減衰 (AD) 量子チャネルの作用により, 量子ビット, 量子ビットおよび2量子ビット系のDWF負性度の変化について検討した。我々は、量子計算と量子テレポーテーションのリソースとして使用できる異なる負の量子状態を構築する。量子計算とテレポーテーションの成功は、(非)マルコフ進化の下でこれらの状態に対して推定される。

The negativity of the discrete Wigner functions (DWFs) is a measure of non-classicality and is often used to quantify the degree of quantum coherence in a system. The study of Wigner negativity and its evolution under different quantum channels can provide insight into the stability and robustness of quantum states under their interaction with the environment, which is essential for developing practical quantum computing systems. We investigate the variation of DWF negativity of qubit, qutrit, and two-qubit systems under the action of (non)-Markovian random telegraph noise (RTN) and amplitude damping (AD) quantum channels. We construct different negative quantum states which can be used as a resource for quantum computation and quantum teleportation. The success of quantum computation and teleportation is estimated for these states under (non)-Markovian evolutions.
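For a single qubit the quantities above can be computed directly from the Bloch vector. The sketch assumes Wootters' qubit construction, W(q, p) = ¼[1 + (−1)^q z + (−1)^p x + (−1)^(q+p) y], takes negativity to be the sum of the magnitudes of the negative entries, and models amplitude damping by a single parameter γ acting on the Bloch vector. The paper's conventions and its time-dependent (non-)Markovian channel parameters may differ.

```python
import math

def discrete_wigner(bloch):
    """Four-point discrete Wigner function of a qubit with Bloch vector (x, y, z)."""
    x, y, z = bloch
    return {(q, p): 0.25 * (1 + (-1) ** q * z + (-1) ** p * x + (-1) ** (q + p) * y)
            for q in (0, 1) for p in (0, 1)}

def negativity(w):
    """Sum of the magnitudes of the negative Wigner entries."""
    return sum(-v for v in w.values() if v < 0)

def amplitude_damp(bloch, gamma):
    """Amplitude damping toward |0>, expressed on the Bloch vector."""
    x, y, z = bloch
    s = math.sqrt(1 - gamma)
    return (s * x, s * y, gamma + (1 - gamma) * z)

# Pure state pointing along (-1, -1, -1) / sqrt(3): one entry is negative.
state = tuple(-1 / math.sqrt(3) for _ in range(3))
before = negativity(discrete_wigner(state))
after = negativity(discrete_wigner(amplitude_damp(state, 1.0)))
```

Full damping (γ = 1) sends every state to |0⟩, whose Wigner function is non-negative, so the negativity resource is lost entirely. Intermediate γ values interpolate between the two.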

翻訳日:2023-07-06 22:59:12 公開日:2023-07-04

# デバイス非依存プロトコルの制約リークに対するロバスト性

Robustness of implemented device-independent protocols against constrained leakage ( http://arxiv.org/abs/2302.13928v2 )

ライセンス: Link先を確認

Ernest Y.-Z. Tan

(参考訳) 近年、デバイス非依存(DI)プロトコルは、DIランダムネスの生成や拡張、およびDI量子鍵分布の一連のデモによって大きな進歩を遂げている。しかし、これらのデモの既存のセキュリティ証明は、DI暗号の典型的な前提に依存しており、デバイスが互いに望ましくない情報を漏らさないか、敵に漏らさない。この仮定は、実際に完全に実施することは難しいかもしれない。このようなリーク量の制約を考慮に入れたDIセキュリティ証明は他にも存在するが、使用されるテクニックは最近のDIプロトコルのデモを分析するのに適していない。本稿では,この目的に適した制約付き漏洩モデルについて検討し,今後の類似実験にも適用すべき課題について考察する。我々の証明構造は、幅広いdiプロトコルの実装を柔軟に分析するための最近の証明技術と互換性がある。提案手法では,これらのプロトコルの鍵レートに対する漏洩の影響を推定し,正の鍵レートを得ながら許容される漏洩量を明確に把握する。

Device-independent (DI) protocols have experienced significant progress in recent years, with a series of demonstrations of DI randomness generation or expansion, as well as DI quantum key distribution. However, existing security proofs for those demonstrations rely on a typical assumption in DI cryptography, that the devices do not leak any unwanted information to each other or to an adversary. This assumption may be difficult to perfectly enforce in practice. While there exist other DI security proofs that account for a constrained amount of such leakage, the techniques used are somewhat unsuited for analyzing the recent DI protocol demonstrations. In this work, we address this issue by studying a constrained leakage model suited for this purpose, which should also be relevant for future similar experiments. Our proof structure is compatible with recent proof techniques for flexibly analyzing a wide range of DI protocol implementations. With our approach, we compute some estimates of the effects of leakage on the keyrates of those protocols, hence providing a clearer understanding of the amount of leakage that can be allowed while still obtaining positive keyrates.

翻訳日:2023-07-06 22:57:34 公開日:2023-07-04

# ディープニューラルネットワークの二重降下は避けられるか?

Can we avoid Double Descent in Deep Neural Networks? ( http://arxiv.org/abs/2302.13259v4 )

ライセンス: Link先を確認

Victor Quétu and Enzo Tartaglione

(参考訳) ディープラーニングモデルの最適サイズを見つけることは、特に省エネスキームにおいて、非常に現実的で幅広い影響を与える。最近になって,予期せぬ現象である‘二重降下’が,ディープラーニングコミュニティの注目を集めている。モデルのサイズが大きくなると、まずパフォーマンスが悪化し、その後は改善に戻ります。これは、高一般化を維持するために最適なモデルのサイズに関する深刻な疑問を提起する: モデルは十分に過度にパラメータ化する必要があるが、パラメータが多すぎるとトレーニングリソースが浪費される。効果的な方法で、最良のトレードオフを見つけることは可能か? 本研究は,学習問題の適切な条件付けによって二重降下現象を回避できる可能性を示唆するが,最終的な答えは見当たらない。我々は、単純な$\ell_2$正則化が既にそのような観点に肯定的な貢献をしているので、適切な正則化を持つ複素シナリオにおいて二重降下が期待されていることを実証的に観察する。

Finding the optimal size of deep learning models is a timely problem of broad impact, especially in energy-saving schemes. Very recently, an unexpected phenomenon, the "double descent", has caught the attention of the deep learning community. As the model's size grows, the performance first gets worse and then goes back to improving. This raises serious questions about the optimal model size to maintain high generalization: the model needs to be sufficiently over-parametrized, but adding too many parameters wastes training resources. Is it possible to find, in an efficient way, the best trade-off? Our work shows that the double descent phenomenon is potentially avoidable with proper conditioning of the learning problem, but a final answer is yet to be found. We empirically observe that there is hope to dodge the double descent in complex scenarios with proper regularization, as a simple $\ell_2$ regularization is already positively contributing to such a perspective.

翻訳日:2023-07-06 22:56:26 公開日:2023-07-04

# Video-SwinUNet: VFSSインスタンス分割のための時空間深層学習フレームワーク

Video-SwinUNet: Spatio-temporal Deep Learning Framework for VFSS Instance Segmentation ( http://arxiv.org/abs/2302.11325v2 )

ライセンス: Link先を確認

Chengxi Zeng, Xinyu Yang, David Smithard, Majid Mirmehdi, Alberto M Gambaruto, Tilo Burghardt

(参考訳) 本稿では,医療ビデオセグメンテーションのためのディープラーニングフレームワークを提案する。畳み込みニューラルネットワーク(cnn)とトランスフォーマーベースの手法は、その驚くべきセマンティックな特徴エンコーディングとグローバルな情報理解能力によって、医療画像分割タスクにおいて大きなマイルストーンを達成した。しかし、既存のアプローチのほとんどは、時間次元という医療ビデオデータの健全な側面を無視している。提案するフレームワークは,隣接フレームから時間次元にまたがる特徴を明示的に抽出し,それを時間的特徴ブレンダに組み込むことにより,高レベルの時空間的特徴をトークン化し,スウィントランスで符号化された強大域的特徴を形成する。最終的なセグメンテーション結果は、UNetのようなエンコーダデコーダアーキテクチャによって生成される。このモデルは,vfss2022データセットのセグメンテーションベンチマークを改善し,テストした2つのデータセットに対して0.8986と0.8186のサイス係数を実現した。本研究は,学習能力の時間的特徴ブレンドスキームとデータセット間転送可能性の有効性も示す。コードとモデルはhttps://github.com/simonzeng7108/video-swinunetで完全に利用できる。

This paper presents a deep learning framework for medical video segmentation. Convolution neural network (CNN) and transformer-based methods have achieved great milestones in medical image segmentation tasks due to their incredible semantic feature encoding and global information comprehension abilities. However, most existing approaches ignore a salient aspect of medical video data - the temporal dimension. Our proposed framework explicitly extracts features from neighbouring frames across the temporal dimension and incorporates them with a temporal feature blender, which then tokenises the high-level spatio-temporal feature to form a strong global feature encoded via a Swin Transformer. The final segmentation results are produced via a UNet-like encoder-decoder architecture. Our model outperforms other approaches by a significant margin and improves the segmentation benchmarks on the VFSS2022 dataset, achieving a dice coefficient of 0.8986 and 0.8186 for the two datasets tested. Our studies also show the efficacy of the temporal feature blending scheme and cross-dataset transferability of learned capabilities. Code and models are fully available at https://github.com/SimonZeng7108/Video-SwinUNet.
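The reported metric is the Dice coefficient, 2|A ∩ B| / (|A| + |B|), computed between a predicted and a reference mask. A minimal sketch on flattened binary masks follows. Returning 1.0 when both masks are empty is a convention chosen here, not something the abstract specifies.

```python
def dice_coefficient(pred, target):
    """Dice overlap between two flattened binary masks (lists of 0/1)."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Two empty masks are treated as a perfect match (assumed convention).
    return 1.0 if total == 0 else 2.0 * intersection / total

pred = [1, 1, 0, 0]
target = [1, 0, 0, 0]
score = dice_coefficient(pred, target)
```

Here one pixel overlaps out of three foreground pixels in total across both masks, giving 2 · 1 / 3.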

翻訳日:2023-07-06 22:56:08 公開日:2023-07-04

# DeforestVis: サロゲート決定株(decision stumps)を用いた機械学習モデルの挙動分析

DeforestVis: Behavior Analysis of Machine Learning Models with Surrogate Decision Stumps ( http://arxiv.org/abs/2304.00133v2 )

ライセンス: Link先を確認

Angelos Chatzimparmpas, Rafael M. Martins, Alexandru C. Telea, Andreas Kerren

(参考訳) 機械学習(ML)モデルの複雑さが増し、さまざまな(そして重要な)ドメインでの応用が増えるにつれて、より解釈可能で信頼性の高いMLが強く求められている。複雑なMLモデルを解釈するための単純でモデルに依存しない方法の1つは、ルールセットや決定木のように、より単純で説明しやすく、元のモデルを十分に近似するサロゲートモデルを訓練することである。しかし、ルールセットは多くのif-else文を伴って非常に長くなることがあり、決定木は複雑なMLモデルを正確に模倣しようとすると深さが急速に増加する。そのような場合、どちらのアプローチも、ユーザにモデルの解釈性を提供するという本来の目標を達成できない可能性がある。我々は,適応ブースティング(AdaBoost)を用いて生成したサロゲート決定株(1階層の決定木)を提供することにより,複雑なMLモデルの挙動をユーザフレンドリに要約するビジュアル分析ツールであるDeforestVisを提案する。本ソリューションは、決定株を段階的に追加生成し、重み付き決定株による属性ベースの説明を作成して意思決定を正当化し、ルールの上書きが1つ以上の決定株間のトレーニングインスタンスの割り当てに与える影響を分析することで、複雑さと忠実度のトレードオフの探索を支援する。独立したテストセットによって、ユーザは手動のルール変更の有効性を監視し、ケースバイケースの調査に基づいて仮説を立てることができる。2つのユースケースと,データアナリストおよびモデル開発者へのエキスパートインタビューによって,DeforestVisの適用可能性と有用性を示す。

As the complexity of machine learning (ML) models increases and the applications in different (and critical) domains grow, there is a strong demand for more interpretable and trustworthy ML. One straightforward and model-agnostic way to interpret complex ML models is to train surrogate models, such as rule sets and decision trees, that sufficiently approximate the original ones while being simpler and easier-to-explain. Yet, rule sets can become very lengthy, with many if-else statements, and decision tree depth grows rapidly when accurately emulating complex ML models. In such cases, both approaches can fail to meet their core goal, providing users with model interpretability. We tackle this by proposing DeforestVis, a visual analytics tool that offers user-friendly summarization of the behavior of complex ML models by providing surrogate decision stumps (one-level decision trees) generated with the adaptive boosting (AdaBoost) technique. Our solution helps users to explore the complexity vs fidelity trade-off by incrementally generating more stumps, creating attribute-based explanations with weighted stumps to justify decision making, and analyzing the impact of rule overriding on training instance allocation between one or more stumps. An independent test set allows users to monitor the effectiveness of manual rule changes and form hypotheses based on case-by-case investigations. We show the applicability and usefulness of DeforestVis with two use cases and expert interviews with data analysts and model developers.
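The complexity versus fidelity trade-off described above can be illustrated with a small, self-contained sketch: discrete AdaBoost over one-level decision stumps is fitted to the predictions of a black-box model, and fidelity is the fraction of inputs on which the surrogate agrees with that model. Everything here (the `black_box` rule, the toy data, the function names) is a hypothetical stand-in; it is not DeforestVis code.

```python
import math


def fit_stump(X, y, w):
    """Return (error, feature, threshold, polarity) of the stump with lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                pred = [polarity if row[j] <= thr else -polarity for row in X]
                err = sum(wi for wi, p, t in zip(w, pred, y) if p != t)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best


def stump_predict(stump, row):
    _, j, thr, polarity = stump
    return polarity if row[j] <= thr else -polarity


def adaboost_stumps(X, y, n_stumps):
    """Discrete AdaBoost with labels in {-1, +1}; returns a list of (alpha, stump)."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_stumps):
        stump = fit_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)     # weight of this stump
        ensemble.append((alpha, stump))
        # Up-weight the instances this stump got wrong, then renormalise.
        w = [wi * math.exp(-alpha * t * stump_predict(stump, row))
             for wi, row, t in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def ensemble_predict(ensemble, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


def black_box(row):
    # Hypothetical "complex" model: positive only inside a band of feature 0.
    return 1 if 2 <= row[0] <= 5 else -1


X = [[float(i), float((i * 3) % 7)] for i in range(8)]
y_black_box = [black_box(row) for row in X]  # the surrogate imitates the model, not ground truth


def fidelity(ensemble):
    agree = sum(ensemble_predict(ensemble, row) == t for row, t in zip(X, y_black_box))
    return agree / len(X)


fidelity_by_size = [fidelity(adaboost_stumps(X, y_black_box, k)) for k in (1, 3)]
print(fidelity_by_size)
```

On this toy data a single stump cannot represent the band, while three weighted stumps reproduce it, which is the kind of incremental trade-off the tool lets users explore.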

翻訳日:2023-07-06 22:48:04 公開日:2023-07-04

# 表面電子のリュードベリ状態に基づく制御NOTゲート

Controlled-NOT gate based on the Rydberg states of surface electrons ( http://arxiv.org/abs/2303.08650v4 )

ライセンス: Link先を確認

Jun Wang, Wan-Ting He, Cong-Wei Lu, Yang-Yang Wang, Qing Ai, Hai-Bo Wang

(参考訳) 長いコヒーレンス時間と効率的な操作性のため、表面電子(SE)は量子計算と量子シミュレーションのための理想的な2次元プラットフォームを提供する。本研究では,制御NOT(CNOT)ゲートを実現する理論的スキームを提案する。このスキームでは2量子ビット系をSEの4準位リュードベリ構造上に符号化する。状態移行は中間準位を持つ3準位構造によって達成される。2つの外部電磁場でSEを同時に駆動することにより、電磁誘起透明化(EIT)効果における暗状態を利用して、最も散逸の大きい状態の占有を抑制し、散逸に対する頑健性を高める。このスキームの忠実度は、実験的に達成可能なパラメータで 0.9989 である。

Due to the long coherence time and efficient manipulation, the surface electron (SE) provides a perfect two-dimensional platform for quantum computation and quantum simulation. In this work, a theoretical scheme to realize the controlled-NOT (CNOT) gate is proposed, where the two-qubit system is encoded on the four-level Rydberg structure of SE. The state transfer is achieved by a three-level structure with an intermediate level. By simultaneously driving the SE with two external electromagnetic fields, the dark state in the electromagnetically induced transparency (EIT) effect is exploited to suppress the population of the most dissipative state and increase the robustness against dissipation. The fidelity of the scheme is 0.9989 with experimentally achievable parameters.
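For readers unfamiliar with the target operation, the ideal CNOT gate flips the second (target) qubit exactly when the first (control) qubit is 1. The sketch below applies that ideal map to state vectors and evaluates the overlap fidelity |⟨ψ|φ⟩|² between two pure states. It says nothing about the surface-electron implementation, and the paper's own fidelity definition may differ from this common one.

```python
import math


def cnot(state):
    """Ideal CNOT on amplitudes [a00, a01, a10, a11]; the first qubit is the control."""
    a00, a01, a10, a11 = state
    return [a00, a01, a11, a10]  # swaps the |10> and |11> amplitudes


def fidelity(psi, phi):
    """Overlap fidelity |<psi|phi>|^2 between two normalised pure states."""
    overlap = sum(complex(a).conjugate() * complex(b) for a, b in zip(psi, phi))
    return abs(overlap) ** 2


basis = {"00": [1, 0, 0, 0], "01": [0, 1, 0, 0], "10": [0, 0, 1, 0], "11": [0, 0, 0, 1]}
truth_table = {label: cnot(vec) for label, vec in basis.items()}

# Control in (|0> + |1>)/sqrt(2), target in |0>: the output is a Bell state.
s = 1 / math.sqrt(2)
bell = cnot([s, 0, s, 0])
print(truth_table, bell)
```

Comparing the output of a simulated, dissipative gate with the ideal output through `fidelity` is one way a figure such as 0.9989 can be read.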

翻訳日:2023-07-06 22:46:16 公開日:2023-07-04

# 層状材料を用いた光学系の定常2状態系

Stationary Two-State System in Optics using Layered Materials ( http://arxiv.org/abs/2303.08395v2 )

ライセンス: Link先を確認

Ken-ichi Sasaki

(参考訳) グラフェンのような平坦な面にのみ電子が存在する状況で電気力学を量子化すると、マクスウェル方程式の1つがハミルトニアンの局所部分として現れる。ゲージ不変性の帰結として、任意の物理的状態は局所ハミルトニアンのゼロエネルギー状態でなければならない。我々は2つの定常量子状態を構成する。一方は古典光学でよく知られた光の散乱と吸収を再現し、他方はより根本的に光子の生成に関係している。これら2つの状態はハミルトニアンによって分離できず、2状態系を形成するが、2つの状態が分離する特別な面の数が存在する。その数は $2/\pi \alpha$ であり、$\pi \alpha$ は単一の面の吸収確率である。

When electrodynamics is quantized in a situation where the electrons exist only at a flat surface such as graphene, one of the Maxwell equations appears as a local part of the Hamiltonian. As a consequence of gauge invariance, any physical state has to be a zero-energy state of the local Hamiltonian. We construct two stationary quantum states: one reproduces the scattering and absorption of light, which is familiar in classical optics, and the other is more fundamentally related to photon creation. These two states are inseparable by the Hamiltonian and form a two-state system, but there is a special number of surfaces for which the two states are decoupled. The number is $2/\pi \alpha$, where $\pi \alpha$ is the absorption probability of a single surface.
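To get a feel for the size of this special number, the short calculation below evaluates it under two assumptions that the abstract does not spell out: that $\alpha$ is the fine-structure constant (the usual reading for graphene, whose single-layer absorption is about 2.3%), and that $2/\pi \alpha$ means $2/(\pi\alpha)$.

```python
import math

# Assumption: alpha is the fine-structure constant (CODATA 2018 recommended value).
ALPHA = 7.2973525693e-3

absorption_single_surface = math.pi * ALPHA          # pi * alpha
special_number_of_surfaces = 2 / absorption_single_surface

print(f"pi * alpha     = {absorption_single_surface:.5f}")
print(f"2 / (pi alpha) = {special_number_of_surfaces:.2f}")
```

Under these assumptions the decoupling would occur for a stack of roughly 87 surfaces.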

翻訳日:2023-07-06 22:46:04 公開日:2023-07-04

# FairAdaBN:適応的バッチ正規化による不公平さの軽減と皮膚疾患分類への応用

FairAdaBN: Mitigating unfairness with adaptive batch normalization and its application to dermatological disease classification ( http://arxiv.org/abs/2303.08325v2 )

ライセンス: Link先を確認

Zikang Xu, Shang Zhao, Quan Quan, Qingsong Yao, and S. Kevin Zhou

(参考訳) 深層学習は、機微な情報や重要な診断決定を伴いながら、医学研究や応用においてますます普及している。研究者たちは、人口統計学的属性の異なるサブグループ間に顕著な性能格差(モデルの不公平性と呼ばれる)を観察し、それに対処するために精巧なアーキテクチャの設計に多大な労力を費やしてきたが、これは学習の負担が大きく、汎化性能が低く、モデル性能と公平性のトレードオフを露呈させる。これらの問題に取り組むため、バッチ正規化をセンシティブ属性に適応させるFairAdaBNを提案する。この単純だが効果的な設計は、もともと公平性を考慮していない複数の分類バックボーンに適用できる。さらに、ミニバッチ上でサブグループ間の統計的パリティを制約する新しい損失関数を導出し、モデルが十分な公平性をもって収束するよう促す。モデル性能と公平性のトレードオフを評価するために、精度低下に対する正規化された公平性の改善を計算するFATE(Fairness-Accuracy Trade-off Efficiency)という新しい指標を提案する。2つの皮膚科データセットを用いた実験により、提案手法は公平性基準とFATEにおいて他の手法よりも優れていることを示した。

Deep learning is becoming increasingly ubiquitous in medical research and applications while involving sensitive information and even critical diagnosis decisions. Researchers observe a significant performance disparity among subgroups with different demographic attributes, which is called model unfairness, and put lots of effort into carefully designing elegant architectures to address unfairness, which imposes a heavy training burden, generalizes poorly, and reveals the trade-off between model performance and fairness. To tackle these issues, we propose FairAdaBN by making batch normalization adaptive to the sensitive attribute. This simple but effective design can be applied to several classification backbones that are originally unaware of fairness. Additionally, we derive a novel loss function that restrains statistical parity between subgroups on mini-batches, encouraging the model to converge with considerable fairness. In order to evaluate the trade-off between model performance and fairness, we propose a new metric, named Fairness-Accuracy Trade-off Efficiency (FATE), to compute normalized fairness improvement over accuracy drop. Experiments on two dermatological datasets show that our proposed method outperforms other methods on fairness criteria and FATE.

翻訳日:2023-07-06 22:45:49 公開日:2023-07-04

# 表データを用いたディープラーニングのためのグラフニューラルネットワークコンテキスト埋め込み

Graph Neural Network contextual embedding for Deep Learning on Tabular Data ( http://arxiv.org/abs/2303.06455v2 )

ライセンス: Link先を確認

Mario Villaizán-Vallelado, Matteo Salvatori, Belén Carro Martinez, Antonio Javier Sanchez Esguevillas

(参考訳) あらゆる産業が、いわゆる表形式で利用可能な既存のビッグデータに基づいて人工知能(AI)を活用しようとしている。表形式では、各レコードは特徴量とも呼ばれる多数の異種の連続値列およびカテゴリ列から構成される。ディープラーニング(DL)は、自然言語処理のような人間のスキルに関連する分野においてAIの大きなブレークスルーとなったが、表形式データへの適用はより困難であった。ツリーベースのアンサンブルのような、より古典的な機械学習(ML)モデルの方が通常は高い性能を示す。本稿では,グラフニューラルネットワーク(GNN),より具体的にはInteraction Network(IN)を用いて,文脈埋め込みと表形式の特徴量間の相互作用のモデリングを行う新しいDLモデルを提案する。その結果は、5つの公開データセットに基づくDLベンチマークを含む最近発表されたサーベイの結果を上回り、ブースティング木による手法と比較しても競争力のある結果を達成している。

All industries are trying to leverage Artificial Intelligence (AI) based on their existing big data, which is available in so-called tabular form, where each record is composed of a number of heterogeneous continuous and categorical columns, also known as features. Deep Learning (DL) has constituted a major breakthrough for AI in fields related to human skills like natural language processing, but its applicability to tabular data has been more challenging. More classical Machine Learning (ML) models, like tree-based ensembles, usually perform better. This paper presents a novel DL model using a Graph Neural Network (GNN), more specifically an Interaction Network (IN), for contextual embedding and modelling interactions among tabular features. Its results outperform those of a recently published survey with a DL benchmark based on five public datasets, also achieving competitive results when compared to boosted-tree solutions.

翻訳日:2023-07-06 22:45:03 公開日:2023-07-04

# 機械学習アルゴリズムの記述的解析による部分順序の深さ関数

Depth Functions for Partial Orders with a Descriptive Analysis of Machine Learning Algorithms ( http://arxiv.org/abs/2304.09872v2 )

ライセンス: Link先を確認

Hannah Blocher, Georg Schollmeyer, Christoph Jansen, Malte Nalenz

(参考訳) 本稿では,深度関数の概念に基づいて部分順序の集合を記述的に解析するフレームワークを提案する。線形空間および距離空間における深度関数の集中的な研究にもかかわらず、部分順序のような非標準データ型に対する深度関数についてはほとんど議論がない。我々は、よく知られたsimplicial depthをすべての部分順序の集合に適応させたunion-free generic (ufg) depthを導入する。さらに,多次元の性能指標に基づく機械学習アルゴリズムの比較に,このufg depthを利用する。具体的には、標準ベンチマークデータセットのサンプル上で異なる分類器の性能の分布を分析する。その結果、提案手法が既存のベンチマーク手法と大きく異なることが示され、分類器の比較に関する活発な議論に新たな視点を加えるものである。

We propose a framework for descriptively analyzing sets of partial orders based on the concept of depth functions. Despite intensive studies of depth functions in linear and metric spaces, there is very little discussion on depth functions for non-standard data types such as partial orders. We introduce an adaptation of the well-known simplicial depth to the set of all partial orders, the union-free generic (ufg) depth. Moreover, we utilize our ufg depth for a comparison of machine learning algorithms based on multidimensional performance measures. Concretely, we analyze the distribution of different classifier performances over a sample of standard benchmark data sets. Our results promisingly demonstrate that our approach differs substantially from existing benchmarking approaches and, therefore, adds a new perspective to the vivid debate on the comparison of classifiers.

翻訳日:2023-07-06 22:40:02 公開日:2023-07-04

# ストリーミングデータのアクティブコストアウェアラベリング

Active Cost-aware Labeling of Streaming Data ( http://arxiv.org/abs/2304.06808v2 )

ライセンス: Link先を確認

Ting Cai, Kirthevasan Kandasamy

(参考訳) 本研究では,ストリーミングデータの能動的なラベル付けを研究する。能動学習者はデータポイントのストリームに直面し,高価な実験によってラベル付けするポイントを慎重に選択しなければならない。このような問題は医療や天文学などの応用でしばしば発生する。最初に、データの入力が$K$個の離散分布の1つに属する場合の設定を研究し、ラベル付けコストと予測誤差を捉える損失によってこの問題を定式化する。ラベル付けコストが$B$の場合、不確実性が時間とコストに依存するしきい値よりも大きいときに点をラベル付けする我々のアルゴリズムは、$T$ラウンド後の損失に対して$\widetilde{O}(B^{\frac{1}{3}} K^{\frac{1}{3}} T^{\frac{2}{3}})$の最悪ケースの上界を達成する。また、よりきめ細かい上界を与え、アルゴリズムが到着パターンに適応でき、到着パターンがより有利な場合により良い性能を達成することを示す。両方の上界に対して、一致する下界を与える。次に、入力が連続領域に属し、実験の出力が有界なRKHSノルムを持つ滑らかな関数である場合にこの問題を研究する。$d$次元での$T$ラウンドの後、損失は二乗指数カーネルを持つRKHSでは$\widetilde{O}(B^{\frac{1}{d+3}} T^{\frac{d+2}{d+3}})$で、Matérnカーネルを持つRKHSでは$\widetilde{O}(B^{\frac{1}{2d+3}} T^{\frac{2d+2}{2d+3}})$で抑えられることを示す。実験的評価により、本手法はいくつかの合成実験および医学と天文学における2つの実実験において、他のベースラインよりも優れることを示す。

We study actively labeling streaming data, where an active learner is faced with a stream of data points and must carefully choose which of these points to label via an expensive experiment. Such problems frequently arise in applications such as healthcare and astronomy. We first study a setting when the data's inputs belong to one of $K$ discrete distributions and formalize this problem via a loss that captures the labeling cost and the prediction error. When the labeling cost is $B$, our algorithm, which chooses to label a point if the uncertainty is larger than a time and cost dependent threshold, achieves a worst-case upper bound of $\widetilde{O}(B^{\frac{1}{3}} K^{\frac{1}{3}} T^{\frac{2}{3}})$ on the loss after $T$ rounds. We also provide a more nuanced upper bound which demonstrates that the algorithm can adapt to the arrival pattern, and achieves better performance when the arrival pattern is more favorable. We complement both upper bounds with matching lower bounds. We next study this problem when the inputs belong to a continuous domain and the output of the experiment is a smooth function with bounded RKHS norm. After $T$ rounds in $d$ dimensions, we show that the loss is bounded by $\widetilde{O}(B^{\frac{1}{d+3}} T^{\frac{d+2}{d+3}})$ in an RKHS with a squared exponential kernel and by $\widetilde{O}(B^{\frac{1}{2d+3}} T^{\frac{2d+2}{2d+3}})$ in an RKHS with a Matérn kernel. Our empirical evaluation demonstrates that our method outperforms other baselines in several synthetic experiments and two real experiments in medicine and astronomy.

翻訳日:2023-07-06 22:38:57 公開日:2023-07-04

# 準エントロピーの単調性、リーブの凹性、安藤の凸性における等号成立条件

Equality cases in monotonicity of quasi-entropies, Lieb's concavity and Ando's convexity ( http://arxiv.org/abs/2304.04361v3 )

ライセンス: Link先を確認

Fumio Hiai

(参考訳) 我々はPetzによる準エントロピーの結合凹性/凸性および単調性を新しい方法で再検討し改善する。次に、準エントロピーの単調性不等式(データ処理不等式)における等号成立の場合を、以下のようにいくつかの方法で特徴づける。$\Phi:\mathcal{B}(\mathcal{H})\to\mathcal{B}(\mathcal{K})$ を、$\Phi^*$ がシュワルツ写像であるようなトレース保存写像とする。$f$ が $[0,\infty)$ 上の作用素単調関数または作用素凸関数であるとき、与えられた $\mathcal{H}$ 上の正作用素 $\rho,\sigma$ と $K\in\mathcal{B}(\mathcal{K})$ に対して等式 $S_f^K(\Phi(\rho)\|\Phi(\sigma))=S_f^{\Phi^*(K)}(\rho\|\sigma)$ が成り立つための同値な条件をいくつか提示する。これらの条件は、リーブの凹性定理と安藤の凸性定理の単調性版における等号成立の場合を含む。写像 $\Phi$ を特殊化することで、リーブの凹性と安藤の凸性における等号成立の同値条件が得られる。同様の等号条件は、単調計量や$\chi^2$-ダイバージェンスに対しても議論される。さらに,これらの量子情報量に対するいくつかの種類の線形保存問題についても考察する。

We revisit and improve joint concavity/convexity and monotonicity properties of quasi-entropies due to Petz in a new fashion. Then we characterize equality cases in the monotonicity inequalities (the data-processing inequalities) of quasi-entropies in several ways as follows: Let $\Phi:\mathcal{B}(\mathcal{H})\to\mathcal{B}(\mathcal{K})$ be a trace-preserving map such that $\Phi^*$ is a Schwarz map. When $f$ is an operator monotone or operator convex function on $[0,\infty)$, we present several equivalent conditions for the equality $S_f^K(\Phi(\rho)\|\Phi(\sigma))=S_f^{\Phi^*(K)}(\rho\|\sigma)$ to hold for given positive operators $\rho,\sigma$ on $\mathcal{H}$ and $K\in\mathcal{B}(\mathcal{K})$. The conditions include equality cases in the monotonicity versions of Lieb's concavity and Ando's convexity theorems. Specializing the map $\Phi$ we have equivalent conditions for equality cases in Lieb's concavity and Ando's convexity. Similar equality conditions are discussed also for monotone metrics and $\chi^2$-divergences. We further consider some types of linear preserver problems for those quantum information quantities.

翻訳日:2023-07-06 22:37:45 公開日:2023-07-04

# MedGen3D: ペアド3D画像とマスク生成のための深層生成フレームワーク

MedGen3D: A Deep Generative Framework for Paired 3D Image and Mask Generation ( http://arxiv.org/abs/2304.04106v2 )

ライセンス: Link先を確認

Kun Han, Yifeng Xiong, Chenyu You, Pooya Khosravi, Shanlin Sun, Xiangyi Yan, James Duncan, Xiaohui Xie

(参考訳) 十分なラベル付きデータの取得と注釈付けは、正確で堅牢な学習ベースモデルの開発には不可欠であるが、そのようなデータを取得することは、多くの医療画像分割タスクにおいて困難である。有望な解決策の1つは、接地マスクアノテーションで現実的なデータを合成することである。しかし、マスクを用いた完全な3次元ボリューム画像の生成について、先行研究は行われていない。本稿では,3次元医用画像とマスクをペアで生成する深層生成フレームワークであるmedgen3dについて述べる。まず,3次元医用データを2次元配列として表現し,解剖学的形状に付着したマルチラベルマスク列を生成するためのマルチコンディション拡散確率モデル(MC-DPM)を提案する。次に,生成マスク列に条件付き画像系列生成器とセマンティック拡散精製器を用いて,生成マスクと整合したリアルな3次元医用画像を生成する。提案フレームワークは,合成画像とセグメンテーションマップの正確なアライメントを保証する。 3次元胸部ctと脳mriのデータセットを用いた実験では, 合成データはオリジナルデータに対して多様で忠実であり, 下流分節作業の利点を示す。我々は,MedGen3Dが組み合わせた3次元医用画像とマスクを合成する能力は,医用画像処理タスクのためのディープラーニングモデルのトレーニングに有用であることが期待できる。

Acquiring and annotating sufficient labeled data is crucial in developing accurate and robust learning-based models, but obtaining such data can be challenging in many medical image segmentation tasks. One promising solution is to synthesize realistic data with ground-truth mask annotations. However, no prior studies have explored generating complete 3D volumetric images with masks. In this paper, we present MedGen3D, a deep generative framework that can generate paired 3D medical images and masks. First, we represent the 3D medical data as 2D sequences and propose the Multi-Condition Diffusion Probabilistic Model (MC-DPM) to generate multi-label mask sequences adhering to anatomical geometry. Then, we use an image sequence generator and semantic diffusion refiner conditioned on the generated mask sequences to produce realistic 3D medical images that align with the generated masks. Our proposed framework guarantees accurate alignment between synthetic images and segmentation maps. Experiments on 3D thoracic CT and brain MRI datasets show that our synthetic data is both diverse and faithful to the original data, and demonstrate the benefits for downstream segmentation tasks. We anticipate that MedGen3D's ability to synthesize paired 3D medical images and masks will prove valuable in training deep learning models for medical imaging tasks.

翻訳日:2023-07-06 22:37:07 公開日:2023-07-04

# 大腸組織分類のためのクロス変調型少数ショット画像生成

Cross-modulated Few-shot Image Generation for Colorectal Tissue Classification ( http://arxiv.org/abs/2304.01992v2 )

ライセンス: Link先を確認

Amandeep Kumar, Ankan kumar Bhunia, Sanath Narayan, Hisham Cholakkal, Rao Muhammad Anwer, Jorma Laaksonen and Fahad Shahbaz Khan

(参考訳) 本研究では,まれな癌組織に対する病理組織学的トレーニングデータの不足に対処するため,少数ショットの大腸組織画像生成法を提案する。XM-GANと名づけた本手法は,1枚のベース画像と1対の参照組織画像を入力とし,高品質かつ多様な画像を生成する。XM-GAN内の新しい制御可能な融合ブロックは、ベース画像の局所領域との類似性に基づいて参照画像の局所領域を密に集約し、局所的に一貫した特徴をもたらす。我々の知る限り,大腸組織画像における少数ショット生成を調査したのは本研究が初めてである。少数ショットの大腸組織画像生成を, 広範な定性的評価, 定量的評価, および専門家(病理医)による評価によって検証した。特に専門家による評価では、病理医がXM-GANの生成した組織画像と実画像を区別できたのは全体の55%にとどまった。さらに,これらの生成画像をデータ拡張として利用して少数ショットの組織画像分類課題に取り組み,素の少数ショット分類器に対して平均精度で4.4%の向上を達成した。コード: https://github.com/VIROBO-15/XM-GAN

In this work, we propose a few-shot colorectal tissue image generation method for addressing the scarcity of histopathological training data for rare cancer tissues. Our few-shot generation method, named XM-GAN, takes one base and a pair of reference tissue images as input and generates high-quality yet diverse images. Within our XM-GAN, a novel controllable fusion block densely aggregates local regions of reference images based on their similarity to those in the base image, resulting in locally consistent features. To the best of our knowledge, we are the first to investigate few-shot generation in colorectal tissue images. We evaluate our few-shot colorectal tissue image generation by performing extensive qualitative, quantitative and subject specialist (pathologist) based evaluations. Specifically, in specialist-based evaluation, pathologists could differentiate between our XM-GAN generated tissue images and real images only 55% of the time. Moreover, we utilize these generated images as data augmentation to address the few-shot tissue image classification task, achieving a gain of 4.4% in terms of mean accuracy over the vanilla few-shot classifier. Code: https://github.com/VIROBO-15/XM-GAN

翻訳日:2023-07-06 22:36:29 公開日:2023-07-04

# 超巡回的(hypercyclic)な測定系と文脈性のパターン

Hypercyclic systems of measurements and patterns of contextuality ( http://arxiv.org/abs/2304.01155v2 )

ライセンス: Link先を確認

Victor H. Cervantes and Ehtibar N. Dzhafarov

(参考訳) 文脈性に関するいくつかの原理的な尺度が、外乱のない測定系と外乱を伴う測定系の両方について文献で提案されている。我々は以前、それらのどの2つも互いの関数ではないことを示した。すなわち、測定系が変化するとき、一方が変化しても他方は一定のままでありうる。これは、それらが文脈性の異なる側面を測定していることを意味しており、我々は、ある特定の意味での文脈性の尺度を1つだけ選ぶのではなく、それら全てを用いて文脈性のパターンによって文脈的な系を特徴付けることを提案した。しかし、文脈性のパターンを研究するには、測定系を体系的に変化させる方法が必要であり、そのためには便利なパラメータ化が必要である。量子力学の基礎において主要な役割を果たしてきた巡回系のクラスには便利なパラメータ化がある。しかし、このクラスでは文脈性のすべての尺度が互いに比例することが示されているため、文脈性のパターンの研究には使用できない。本稿(コンセプトペーパー)では,超巡回的な測定系を導入する。これは便利なパラメータ化を保ちながら巡回系を一般化する。このクラスの系では、一般の系と同様に、文脈性の既知の尺度のどの2つも互いの関数ではないことを示す。つまり、超巡回系は文脈性のパターンの研究に使うことができる。

Several principled measures of contextuality have been proposed in the literature, both for systems of measurements without and with disturbance. We have previously shown that no two of them are functions of each other: as systems of measurements change, either of them can change while the other remains constant. This means that they measure different aspects of contextuality, and we proposed that rather than picking just one measure of contextuality in one specific sense, one could use all of them to characterize a contextual system by its pattern of contextuality. To study patterns of contextuality, however, one needs a systematic way of varying systems of measurements, which requires their convenient parametrization. We have convenient parametrization within the class of cyclic systems that have played a dominant role in the foundations of quantum mechanics. However, they cannot be used to study patterns of contextuality, because within this class all measures of contextuality have been shown to be proportional to each other. In this concept paper we introduce hypercyclic systems of measurements. They generalize cyclic systems while preserving convenient parametrization. We show that within this class of systems, the same as for systems at large, no two of the known measures of contextuality are functions of each other. This means that hypercyclic systems can be used to study patterns of contextuality.

翻訳日:2023-07-06 22:36:08 公開日:2023-07-04

# ドローン画像におけるゼブラの合成データに基づく検出

Synthetic Data-based Detection of Zebras in Drone Imagery ( http://arxiv.org/abs/2305.00432v2 )

ライセンス: Link先を確認

Elia Bonetto and Aamir Ahmad

(参考訳) 現在、一般的な物体検出器や人体検出器の訓練を可能にするデータセットが広く利用可能である。これらはラベル付きの実世界画像の形で提供され、ラベルの欠落などの誤りが高い確率で生じるかなりの量の人的労力か、VICONシステムのような非常に制約のあるシナリオを必要とする。一方、空撮視点のような一般的でないシナリオ、野生のシマウマのような動物、人体形状のような取得の難しい情報はほとんど利用できない。これを克服するために、リアルなレンダリング技術を用いた合成データ生成が最近注目を集め、ターゲット追跡や人間のポーズ推定といった研究分野を前進させている。しかし、野生動物のような対象は、通常そのようなデータセットでは十分に表現されていない。本研究では,まず,事前学習したYOLO検出器が,空中視点から撮影した実画像中のシマウマを識別できないことを示す。そこで,合成データのみを用いて動物検出器を訓練する手法を提案する。まず、データ生成のための最先端フレームワークであるGRADEを用いて、新しい合成シマウマデータセットを生成する。データセットには、各個体のRGB、深度、骨格関節位置、ポーズ、形状、インスタンスセグメンテーションが含まれる。これを用いて、YOLO検出器をゼロから訓練する。i)インターネットで利用可能な限られたデータセットと、ii)我々が新たに収集し手作業でラベル付けしたデータセットという実世界データによる広範な評価を通して、訓練時に合成データのみを用いてシマウマを検出できることを示す。コード、結果、学習済みモデル、および生成データと訓練データは、https://eliabntt.github.io/grade-rr でオープンソースとして提供される。

Nowadays, there is a wide availability of datasets that enable the training of common object detectors or human detectors. These come in the form of labelled real-world images and require either a significant amount of human effort, with a high probability of errors such as missing labels, or very constrained scenarios, e.g. VICON systems. On the other hand, uncommon scenarios, like aerial views, animals, like wild zebras, or difficult-to-obtain information, such as human shapes, are hardly available. To overcome this, synthetic data generation with realistic rendering technologies has recently gained traction and advanced research areas such as target tracking and human pose estimation. However, subjects such as wild animals are still usually not well represented in such datasets. In this work, we first show that a pre-trained YOLO detector cannot identify zebras in real images recorded from aerial viewpoints. To solve this, we present an approach for training an animal detector using only synthetic data. We start by generating a novel synthetic zebra dataset using GRADE, a state-of-the-art framework for data generation. The dataset includes RGB, depth, skeletal joint locations, pose, shape and instance segmentations for each subject. We use this to train a YOLO detector from scratch. Through extensive evaluations of our model with real-world data from i) limited datasets available on the internet and ii) a new one collected and manually labelled by us, we show that we can detect zebras by using only synthetic data during training. The code, results, trained models, and both the generated and training data are provided as open-source at https://eliabntt.github.io/grade-rr.

翻訳日:2023-07-06 22:27:14 公開日:2023-07-04

# メカニスティック・インタプリタビリティのための自動回路発見に向けて

Towards Automated Circuit Discovery for Mechanistic Interpretability ( http://arxiv.org/abs/2304.14997v2 )

ライセンス: Link先を確認

Arthur Conmy, Augustine N. Mavor-Parker, Aengus Lynch, Stefan Heimersheim, Adrià Garriga-Alonso

(参考訳) 多大な労力と直感によって、近年のいくつかの研究はトランスフォーマーモデルの非自明な振る舞いをリバースエンジニアリングしてきた。本論文は, それらが従った機構的解釈可能性(mechanistic interpretability)のプロセスを体系化する。まず、研究者は望ましいモデルの振る舞いを引き出すメトリクスとデータセットを選択する。次に、アクティベーションパッチングを適用して、どの抽象的なニューラルネットワークユニットがその振る舞いに関与しているかを見つける。調査対象のデータセット、メトリクス、ユニットを変えることで、研究者は各コンポーネントの機能を理解できる。我々はこのプロセスのステップの1つ、すなわちモデルの計算グラフにおいて指定された振る舞いを実装する回路の特定を自動化する。いくつかのアルゴリズムを提案し,それらを検証するために先行研究の解釈可能性の結果を再現する。例えば、ACDCアルゴリズムは、GPT-2 SmallにおいてGreater-Than演算を計算する回路のコンポーネントタイプを5/5再発見した。ACDCはGPT-2 Smallの32,000本のエッジのうち68本を選択し、そのすべては先行研究で手作業により発見されていたものである。コードはhttps://github.com/ArthurConmy/Automatic-Circuit-Discoveryで公開されている。

Through considerable effort and intuition, several recent works have reverse-engineered nontrivial behaviors of transformer models. This paper systematizes the mechanistic interpretability process they followed. First, researchers choose a metric and dataset that elicit the desired model behavior. Then, they apply activation patching to find which abstract neural network units are involved in the behavior. By varying the dataset, metric, and units under investigation, researchers can understand the functionality of each component. We automate one of the process' steps: to identify the circuit that implements the specified behavior in the model's computational graph. We propose several algorithms and reproduce previous interpretability results to validate them. For example, the ACDC algorithm rediscovered 5/5 of the component types in a circuit in GPT-2 Small that computes the Greater-Than operation. ACDC selected 68 of the 32,000 edges in GPT-2 Small, all of which were manually found by previous work. Our code is available at https://github.com/ArthurConmy/Automatic-Circuit-Discovery.
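The automated step, finding the subgraph that implements a behavior, can be pictured as threshold-based edge pruning. The sketch below is only a toy illustration of that idea under stated assumptions: the edge names, the `EDGE_EFFECT` lookup table, and the additive `toy_metric` are hypothetical stand-ins for measurements that would really come from patching activations in a model, and the loop is not the authors' implementation.

```python
def prune_circuit(edges, metric, tau):
    """Greedy pruning: drop an edge if removing it changes the metric by less than tau."""
    kept = list(edges)
    for edge in list(edges):
        candidate = [e for e in kept if e != edge]
        if abs(metric(kept) - metric(candidate)) < tau:
            kept = candidate  # the edge barely matters for the behavior
    return kept


# Hypothetical per-edge effects on the task metric (a stand-in for activation patching).
EDGE_EFFECT = {
    ("embed", "head0"): 0.90,
    ("embed", "head1"): 0.01,
    ("head0", "mlp"): 0.80,
    ("head1", "mlp"): 0.02,
    ("mlp", "logits"): 1.00,
}


def toy_metric(active_edges):
    # Assumption: effects add up independently, which real networks do not guarantee.
    return sum(EDGE_EFFECT[e] for e in active_edges)


circuit = prune_circuit(EDGE_EFFECT.keys(), toy_metric, tau=0.05)

# Sparsity of the reported GPT-2 Small result: 68 of 32,000 edges kept.
kept_fraction = 68 / 32000
print(circuit, f"{kept_fraction:.4%}")
```

The last two lines put the abstract's numbers in perspective: the recovered circuit keeps only about 0.2% of the edges in the computational graph.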

翻訳日:2023-07-06 22:26:29 公開日:2023-07-04

# 駆動された量子対称単純排他過程における厳密なエンタングルメント

Exact Entanglement in the Driven Quantum Symmetric Simple Exclusion Process ( http://arxiv.org/abs/2304.10988v3 )

ライセンス: Link先を確認

Denis Bernard and Ludwig Hruza

(参考訳) 駆動された量子系のエンタングルメント特性は、長距離コヒーレンスのために平衡状態の場合とは異なる可能性がある。我々は、メソスコピック輸送に適したトイモデルである開放系の量子対称単純排他過程(QSSEP)を調べることで、この観察を確認する。異なるサブシステム間の相互情報量の厳密な公式を導出し、それが体積則を満たすことを示す。驚くべきことに、QSSEPのエンタングルメント特性はその輸送特性に関するデータにのみ依存しており、このような関係はより一般的なメソスコピック系でも成り立つのではないかと考えられる。これらの結果は、QSSEPの自由確率構造を利用して、ランダム行列の部分ブロックの固有値スペクトルを、いわゆる局所自由キュムラントから決定する新しい方法を開発することで得られた。これはそれ自体が数学的結果であり、ランダム行列理論への応用の可能性がある。この方法の例として,固有状態熱化仮説(ETH)を満たす系における可観測量の期待値を局所自由キュムラントから計算する方法を示す。

Entanglement properties of driven quantum systems can potentially differ from the equilibrium situation due to long range coherences. We confirm this observation by studying a suitable toy model for mesoscopic transport: the open quantum symmetric simple exclusion process (QSSEP). We derive exact formulae for its mutual information between different subsystems and show that it satisfies a volume law. Surprisingly, the QSSEP entanglement properties only depend on data related to its transport properties and we suspect that such a relation might hold for more general mesoscopic systems. Exploiting the free probability structure of QSSEP, we obtain these results by developing a new method to determine the eigenvalue spectrum of sub-blocks of random matrices from their so-called local free cumulants -- a mathematical result on its own with potential applications in the theory of random matrices. As an illustration of this method, we show how to compute expectation values of observables in systems satisfying the Eigenstate Thermalization Hypothesis (ETH) from the local free cumulants.

翻訳日:2023-07-06 22:25:39 公開日:2023-07-04

# トランスフォーマー入門

An Introduction to Transformers ( http://arxiv.org/abs/2304.10557v3 )

ライセンス: Link先を確認

Richard E. Turner

(参考訳) トランスフォーマーは、系列やデータポイントの集合の有用な表現を学習するために使用できるニューラルネットワークコンポーネントである。トランスフォーマーは、自然言語処理、コンピュータビジョン、時空間モデリングにおける最近の進歩を牽引してきた。トランスフォーマーの入門資料は数多く存在するが、そのほとんどはアーキテクチャの正確な数学的記述を含んでおらず、設計上の選択の背後にある直観もしばしば欠けている。さらに、研究は曲がりくねった経路を辿るため、トランスフォーマーの各コンポーネントに対する説明は独特なものになりがちである。本稿では, 数学的に正確で直感的かつ明快なトランスフォーマーアーキテクチャの記述を目指す。

The transformer is a neural network component that can be used to learn useful representations of sequences or sets of datapoints. The transformer has driven recent advances in natural language processing, computer vision, and spatio-temporal modelling. There are many introductions to transformers, but most do not contain precise mathematical descriptions of the architecture and the intuitions behind the design choices are often also missing. Moreover, as research takes a winding path, the explanations for the components of the transformer can be idiosyncratic. In this note we aim for a mathematically precise, intuitive, and clean description of the transformer architecture.
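As a concrete anchor for what a "precise description" has to cover, the sketch below implements scaled dot-product attention, the core mixing operation of a transformer, in plain Python. It is a generic textbook form and not taken from the note itself. For brevity it assumes identity projections, so queries, keys, and values are the raw tokens; a real transformer applies learned linear maps, multiple heads, residual connections, and MLP layers on top.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V, one row per query."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)  # non-negative, sums to 1 over the sequence
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs


# Three 2-dimensional tokens; self-attention uses the same sequence for Q, K and V.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = attention(tokens, tokens, tokens)
print(mixed)
```

Each output row is a weighted average of all input tokens, which is how every position in a sequence or set gets to exchange information with every other position.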

翻訳日:2023-07-06 22:25:20 公開日:2023-07-04

# アノテーションフリーな視聴覚セグメンテーション

Annotation-free Audio-Visual Segmentation ( http://arxiv.org/abs/2305.11019v3 )

ライセンス: Link先を確認

Jinxiang Liu, Yu Wang, Chen Ju, Chaofan Ma, Ya Zhang, Weidi Xie

(参考訳) Audio-Visual Segmentation(AVS)の目的は、ピクセル単位のセグメンテーションマスクを正確に予測することで、視覚シーン内の発音物体を位置特定することである。このタスクに取り組むには、データとモデルの両面を包括的に考慮する必要がある。本稿ではまず,人手によるアノテーションなしでAVSタスクのための人工データを生成する新しいパイプラインを提案する。既存の画像セグメンテーションデータセットと音声データセットを利用し、カテゴリラベルを介して画像とマスクのペアを対応する音声サンプルと結び付けることで、AVSモデルを訓練するための(画像、音声、マスク)の三つ組を容易に構成できる。このパイプラインはアノテーション不要であり、多数のカテゴリをカバーするようにスケールできる。さらに,事前学習済みのSegment Anything Model(SAM)をAVSタスクに適応させる軽量な手法SAMA-AVSを提案する。アダプタによって少数の学習可能パラメータのみを導入することで,大部分のパラメータを固定したまま、符号化段階で十分な音声と視覚の融合と相互作用を効果的に実現できる。広範な実験の結果,提案モデルは他の競合手法を大きく上回ることが示された。さらに,本合成データで事前学習した提案モデルを用いることで実データであるAVSBenchでの性能がさらに向上し,S4サブセットで83.17 mIoU,MS3セットで66.95 mIoUを達成した。

The objective of Audio-Visual Segmentation (AVS) is to localise the sounding objects within visual scenes by accurately predicting pixel-wise segmentation masks. Tackling the task requires comprehensive consideration of both the data and model aspects. In this paper, first, we introduce a novel pipeline for generating artificial data for the AVS task without human annotation. We leverage existing image segmentation and audio datasets to match the image-mask pairs with their corresponding audio samples through category labels, which allows us to effortlessly compose (image, audio, mask) triplets for training AVS models. The pipeline is annotation-free and scalable to cover a large number of categories. Additionally, we introduce a lightweight approach, SAMA-AVS, to adapt the pre-trained Segment Anything Model (SAM) to the AVS task. By introducing only a small number of trainable parameters with adapters, the proposed model can effectively achieve adequate audio-visual fusion and interaction in the encoding stage with the vast majority of parameters fixed. We conduct extensive experiments, and the results show our proposed model remarkably surpasses other competing methods. Moreover, by using the proposed model pretrained with our synthetic data, the performance on real AVSBench data is further improved, achieving 83.17 mIoU on the S4 subset and 66.95 mIoU on the MS3 set.
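The data pipeline described above, which links image-mask pairs to audio samples through shared category labels, can be sketched in a few lines. The file names, the tuple layout, and the random choice among same-category clips are illustrative assumptions; the abstract does not specify how a clip is picked when several match.

```python
import random
from collections import defaultdict


def compose_triplets(image_masks, audio_clips, seed=0):
    """Pair each (image, mask, category) with an audio clip of the same category."""
    rng = random.Random(seed)
    audio_by_category = defaultdict(list)
    for clip, category in audio_clips:
        audio_by_category[category].append(clip)

    triplets = []
    for image, mask, category in image_masks:
        candidates = audio_by_category.get(category)
        if not candidates:
            continue  # no audio available for this category: skip the image
        triplets.append((image, rng.choice(candidates), mask))
    return triplets


# Hypothetical entries standing in for a segmentation dataset and an audio dataset.
image_masks = [
    ("img_dog.png", "mask_dog.png", "dog"),
    ("img_cat.png", "mask_cat.png", "cat"),
    ("img_car.png", "mask_car.png", "car"),
]
audio_clips = [("bark_01.wav", "dog"), ("bark_02.wav", "dog"), ("meow_01.wav", "cat")]

triplets = compose_triplets(image_masks, audio_clips)
print(triplets)
```

Because the only link between the two datasets is the category label, no human has to annotate which pixels produce which sound, which is what makes the pipeline annotation-free.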

翻訳日:2023-07-06 22:19:17 公開日:2023-07-04

# スピン浴と相互作用する系に対する誘導ラマン断熱通過

Stimulated Raman Adiabatic Passage for a system interacting with a spin-bath ( http://arxiv.org/abs/2305.08209v2 )

ライセンス: Link先を確認

Benedetto Militello and Anna Napoli

(参考訳) 誘導ラマン断熱通過を、この技術によって操作される物理系がスピン浴と相互作用する場合について解析する。分布移動過程の効率を, 環境との弱結合・強結合や非共鳴の場合を含むいくつかの領域において、理論的および数値的手法を用いて調べた。一般化された量子ゼノ効果の発生が、強い減衰領域における効率の低下を説明する。

Stimulated Raman Adiabatic Passage is analyzed in the case where the physical system manipulated by such technique is interacting with a spin bath. The efficiency of the population transfer process is investigated both theoretically and via numerical tools in several regimes, including the weak and strong coupling with the environment and the off-resonance. The occurrence of a generalized quantum Zeno effect explains the lowering of the efficiency in the strong damping regime.

翻訳日:2023-07-06 22:17:53 公開日:2023-07-04

# 人間と機械のためのスケーラブル符号化における条件付き手法と残差手法

Conditional and Residual Methods in Scalable Coding for Humans and Machines ( http://arxiv.org/abs/2305.02562v2 )

ライセンス: Link先を確認

Anderson de Andrade, Alon Harell, Yalda Foroutan, Ivan V. Bajić

We present methods for conditional and residual coding in the context of scalable coding for humans and machines. Our focus is on optimizing the rate-distortion performance of the reconstruction task using the information available in the computer vision task. We include an information analysis of both approaches to provide baselines, and we also propose an entropy model suitable for conditional coding, with increased modelling capacity and tractability similar to that of previous work. We apply these methods to image reconstruction, using, in one instance, representations created for semantic segmentation on the Cityscapes dataset, and in another instance, representations created for object detection on the COCO dataset. In both experiments, we obtain similar performance between the conditional and residual methods, with the resulting rate-distortion curves contained within our baselines.

Translated: 2023-07-06 22:17:13 Published: 2023-07-04

# Runtime Analysis of Quality Diversity Algorithms ( http://arxiv.org/abs/2305.18966v2 )

License: see the linked page

Jakob Bossek, Dirk Sudholt

Quality diversity (QD) is a branch of evolutionary computation that has gained increasing interest in recent years. The Map-Elites QD approach defines a feature space, i.e., a partition of the search space, and stores the best solution for each cell of this space. We study a simple QD algorithm in the context of pseudo-Boolean optimisation on the "number of ones" feature space, where the $i$th cell stores the best solution amongst those with a number of ones in $[(i-1)k, ik-1]$. Here $k$ is a granularity parameter with $1 \leq k \leq n+1$. We give a tight bound on the expected time until all cells are covered, for arbitrary fitness functions and for all $k$, and analyse the expected optimisation time of QD on OneMax and other problems whose structure aligns favourably with the feature space. On combinatorial problems we show that QD efficiently finds a $(1-1/e)$-approximation when maximising any monotone submodular function with a single uniform cardinality constraint. Defining the feature space as the number of connected components of a connected graph, we show that QD finds a minimum spanning tree in expected polynomial time.
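As a rough illustration of the setting in this abstract, the following minimal Python sketch runs a Map-Elites-style QD loop on OneMax with the number-of-ones feature space. The uniform parent selection, the standard bit mutation with rate 1/n, and the tie-accepting replacement rule are assumptions made for the sketch and may differ from the algorithm analysed in the paper; all function names are hypothetical.

```python
import random


def cell_index(x, k):
    """1-based cell of bit string x: cell i holds strings whose
    number of ones lies in [(i-1)k, ik-1]."""
    return sum(x) // k + 1


def one_max(x):
    """Fitness: number of one-bits."""
    return sum(x)


def qd_map_elites(n, k, fitness, iterations, seed=0):
    """Map-Elites-style loop (sketch): keep the best solution per cell."""
    rng = random.Random(seed)
    archive = {}  # cell index -> (solution, fitness value)

    def try_insert(x):
        i, f = cell_index(x, k), fitness(x)
        # Store x if its cell is empty or x is at least as good.
        if i not in archive or f >= archive[i][1]:
            archive[i] = (x, f)

    try_insert([rng.randint(0, 1) for _ in range(n)])
    for _ in range(iterations):
        # Assumption: parent chosen uniformly at random among covered cells.
        parent = archive[rng.choice(sorted(archive))][0]
        # Assumption: standard bit mutation, each bit flipped with prob. 1/n.
        child = [b ^ 1 if rng.random() < 1.0 / n else b for b in parent]
        try_insert(child)
    return archive


if __name__ == "__main__":
    n, k = 10, 2
    archive = qd_map_elites(n, k, one_max, iterations=20000)
    num_cells = -(-(n + 1) // k)  # ceil((n + 1) / k) cells in total
    print(f"covered {len(archive)} of {num_cells} cells")
```

With granularity k = 1 every number of ones gets its own cell, while k = n + 1 collapses the archive to a single cell, so the loop then behaves like a plain elitist search.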

Translated: 2023-07-06 20:35:24 Published: 2023-07-04

# DePF: A Novel Fusion Approach based on Decomposition Pooling for Infrared and Visible Images ( http://arxiv.org/abs/2305.17376v2 )

License: see the linked page

Hui Li, Yongbiao Xiao, Chunyang Cheng, Zhongwei Shen, Xiaoning Song

Infrared and visible image fusion aims to generate synthetic images that simultaneously contain salient features and rich texture details, which can be used to boost downstream tasks. However, existing fusion methods suffer from texture loss and edge information deficiency, which result in suboptimal fusion results. Meanwhile, the straightforward up-sampling operator cannot adequately preserve the source information from multi-scale features. To address these issues, a novel fusion network based on decomposition pooling (de-pooling), termed DePF, is proposed. Specifically, a de-pooling based encoder is designed to extract multi-scale image and detail features of the source images at the same time. In addition, a spatial attention model is used to aggregate these salient features. After that, the fused features are reconstructed by the decoder, in which the up-sampling operator is replaced by the reverse de-pooling operation. Unlike the common max-pooling technique, image features after the de-pooling layer retain abundant detail information, which benefits the fusion process. In this way, rich texture information and multi-scale information are maintained during the reconstruction phase. The experimental results demonstrate that the proposed method exhibits superior fusion performance over state-of-the-art methods on multiple image fusion benchmarks.

Translated: 2023-07-06 20:34:36 Published: 2023-07-04

# Surface-Based Retrieval Reduces Perplexity of Retrieval-Augmented Language Models ( http://arxiv.org/abs/2305.16243v3 )

License: see the linked page

Ehsan Doostmohammadi, Tobias Norlund, Marco Kuhlmann, Richard Johansson

Augmenting language models with a retrieval mechanism has been shown to significantly improve their performance while keeping the number of parameters low. Retrieval-augmented models commonly rely on a semantic retrieval mechanism based on the similarity between dense representations of the query chunk and potential neighbors. In this paper, we study the state-of-the-art Retro model and observe that its performance gain is better explained by surface-level similarities, such as token overlap. Inspired by this, we replace the semantic retrieval in Retro with a surface-level method based on BM25, obtaining a significant reduction in perplexity. As full BM25 retrieval can be computationally costly for large datasets, we also apply it in a re-ranking scenario, gaining part of the perplexity reduction with minimal computational overhead.
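To make the surface-level retrieval idea concrete, here is a minimal Python sketch of Okapi BM25 scoring used to re-rank a small candidate set. It is a sketch under stated assumptions, not the authors' pipeline: the parameter values `k1=1.5` and `b=0.75`, the smoothed IDF variant, whitespace-style tokenisation, and computing document frequencies over the candidates only are all choices made here, and the function names are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenised document for a tokenised query."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in set(query):
            if term not in tf:
                continue
            # Smoothed IDF (assumption): always positive.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalisation.
            norm = tf[term] + k1 * (1.0 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1.0) / norm
        scores.append(score)
    return scores


def rerank(query, candidates):
    """Re-rank candidate chunks (e.g. from a dense retriever) by BM25."""
    scores = bm25_scores(query, candidates)
    order = sorted(range(len(candidates)), key=lambda i: -scores[i])
    return [candidates[i] for i in order]


if __name__ == "__main__":
    chunks = [["the", "cat", "sat"],
              ["dogs", "bark", "loudly"],
              ["a", "cat", "and", "a", "dog"]]
    print(rerank(["cat", "sat"], chunks))
```

Because the score depends only on token overlap, term rarity, and chunk length, a chunk sharing more query tokens ranks higher regardless of its dense-embedding similarity.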

Translated: 2023-07-06 20:33:35 Published: 2023-07-04

# Disappearing Without a Trace: The Arrows of Time in Kent's Solution to the Lorentzian Quantum Reality Problem ( http://arxiv.org/abs/2305.13201v2 )

License: see the linked page

Emily Adlam

Most existing proposals to explain the temporal asymmetries we see around us are situated within an approach to physics based on time evolution, and thus they typically put the asymmetry in at the beginning of time in the form of a special initial state. But there may be other possibilities for explaining temporal asymmetries if we don't presuppose the time evolution paradigm. In this article, we explore one such possibility, based on Kent's "final-measurement" interpretation of quantum mechanics. We argue that this approach potentially has the resources to explain the electromagnetic asymmetry, the thermodynamic asymmetry, the coarse-graining asymmetry, the fork asymmetry, the record asymmetry, and the cosmological asymmetry, and that the explanations it offers may be better than explanations appealing to a special initial state. Our hope is that this example will encourage further exploration of novel approaches to temporal asymmetry outside of the time evolution paradigm.

Translated: 2023-07-06 20:32:47 Published: 2023-07-04

# Geometry effects in quantum dot families ( http://arxiv.org/abs/2305.12748v2 )

License: see the linked page

Pavel Exner

We consider Schrödinger operators in $L^2(\mathrm{R}^\nu),\, \nu=2,3$, with the interaction in the form of an array of potential wells, each of them having rotational symmetry, arranged along a curve $\Gamma$. We prove that if $\Gamma$ is a bend or deformation of a line, straight outside a compact set, and the wells have the same arcwise distances, such an operator has a nonempty discrete spectrum. It is also shown that if $\Gamma$ is a circle, the principal eigenvalue is maximized by the arrangement in which the wells have the same angular distances. Some conjectures and open problems are also mentioned.

Translated: 2023-07-06 20:32:29 Published: 2023-07-04

# Contextualized End-to-End Speech Recognition with Contextual Phrase Prediction Network ( http://arxiv.org/abs/2305.12493v4 )

License: see the linked page

Kaixun Huang, Ao Zhang, Zhanheng Yang, Pengcheng Guo, Bingshen Mu, Tianyi Xu, Lei Xie

Contextual information plays a crucial role in speech recognition technologies, and incorporating it into end-to-end speech recognition models has drawn immense interest recently. However, previous deep bias methods lacked explicit supervision for bias tasks. In this study, we introduce a contextual phrase prediction network for an attention-based deep bias method. This network predicts context phrases in utterances using contextual embeddings and calculates a bias loss to assist in the training of the contextualized model. Our method achieved a significant word error rate (WER) reduction across various end-to-end speech recognition models. Experiments on the LibriSpeech corpus show that our proposed model obtains a 12.1% relative WER improvement over the baseline model, and the WER of the context phrases decreases relatively by 40.5%. Moreover, by applying a context phrase filtering strategy, we also effectively eliminate the WER degradation when using a larger biasing list.
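The WER figures quoted in this abstract are relative reductions. The short Python sketch below shows how a word error rate and a relative reduction are conventionally computed; it is a generic illustration, not the paper's evaluation code, and the function names and example sentences are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer, new_wer):
    """Relative WER reduction, e.g. 0.121 for a 12.1% relative improvement."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical example: one substituted word out of four.
    print(word_error_rate("call doctor smith now", "call doctor smyth now"))
```

A relative reduction is measured against the baseline's own error rate, so it should not be read as an absolute drop in percentage points.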

Translated: 2023-07-06 20:32:12 Published: 2023-07-04

# Bayes optimal learning in high-dimensional linear regression with network side information ( http://arxiv.org/abs/2306.05679v2 )

License: see the linked page

Sagnik Nandy and Subhabrata Sen

Supervised learning problems with side information in the form of a network arise frequently in applications in genomics, proteomics and neuroscience. For example, in genetic applications, the network side information can accurately capture background biological information on the intricate relations among the relevant genes. In this paper, we initiate a study of Bayes optimal learning in high-dimensional linear regression with network side information. To this end, we first introduce a simple generative model (called the Reg-Graph model) which posits a joint distribution for the supervised data and the observed network through a common set of latent parameters. Next, we introduce an iterative algorithm based on Approximate Message Passing (AMP) which is provably Bayes optimal under very general conditions. In addition, we characterize the limiting mutual information between the latent signal and the observed data, and thus precisely quantify the statistical impact of the network side information. Finally, supporting numerical experiments suggest that the introduced algorithm has excellent performance in finite samples.

Translated: 2023-07-06 20:25:16 Published: 2023-07-04

# Don't trust your eyes: on the (un)reliability of feature visualizations ( http://arxiv.org/abs/2306.04719v3 )

License: see the linked page

Robert Geirhos, Roland S. Zimmermann, Blair Bilodeau, Wieland Brendel, Been Kim

How do neural networks extract patterns from pixels? Feature visualizations attempt to answer this important question by visualizing highly activating patterns through optimization. Today, visualization methods form the foundation of our knowledge about the internal workings of neural networks, as a type of mechanistic interpretability. Here we ask: How reliable are feature visualizations? We start our investigation by developing network circuits that trick feature visualizations into showing arbitrary patterns that are completely disconnected from normal network behavior on natural input. We then provide evidence for a similar phenomenon occurring in standard, unmanipulated networks: feature visualizations are processed very differently from standard input, casting doubt on their ability to "explain" how neural networks process natural images. We underpin this empirical finding with theory proving that the set of functions that can be reliably understood by feature visualization is extremely small and does not include general black-box neural networks. Therefore, a promising way forward could be the development of networks that enforce certain structures in order to ensure more reliable feature visualizations.

Translated: 2023-07-06 20:25:03 Published: 2023-07-04

# Tree-Ring Watermarks: Fingerprints for Diffusion Images that are Invisible and Robust ( http://arxiv.org/abs/2305.20030v3 )

License: see the linked page

Yuxin Wen, John Kirchenbauer, Jonas Geiping, Tom Goldstein

Watermarking the outputs of generative models is a crucial technique for tracing copyright and preventing potential harm from AI-generated content. In this paper, we introduce a novel technique called Tree-Ring Watermarking that robustly fingerprints diffusion model outputs. Unlike existing methods that perform post-hoc modifications to images after sampling, Tree-Ring Watermarking subtly influences the entire sampling process, resulting in a model fingerprint that is invisible to humans. The watermark embeds a pattern into the initial noise vector used for sampling. These patterns are structured in Fourier space so that they are invariant to convolutions, crops, dilations, flips, and rotations. After image generation, the watermark signal is detected by inverting the diffusion process to retrieve the noise vector, which is then checked for the embedded signal. We demonstrate that this technique can be easily applied to arbitrary diffusion models, including text-conditioned Stable Diffusion, as a plug-in with negligible loss in FID. Our watermark is semantically hidden in the image space and is far more robust than watermarking alternatives that are currently deployed. Code is available at https://github.com/YuxinWenRick/tree-ring-watermark.

Translated: 2023-07-06 20:23:47 Published: 2023-07-04

# Neural Network Approach to the Simulation of Entangled States with One Bit of Communication ( http://arxiv.org/abs/2305.19935v3 )

License: see the linked page

Peter Sidajaya, Aloysius Dewen Lim, Baichu Yu, Valerio Scarani

Bell's theorem states that Local Hidden Variables (LHVs) cannot fully explain the statistics of measurements on some entangled quantum states. It is natural to ask how much supplementary classical communication would be needed to simulate them. We study two long-standing open questions in this field with neural network simulations and other tools. First, we present evidence that all projective measurements on partially entangled pure two-qubit states require only one bit of communication. We quantify the statistical distance between the exact quantum behaviour and the product of the trained network, or of a semianalytical model inspired by it. Second, while it is known on general grounds (and obvious) that one bit of communication cannot eventually reproduce all bipartite quantum correlations, explicit examples have proved evasive. Our search failed to find one for several bipartite Bell scenarios with up to 5 inputs and 4 outputs, highlighting the power of one bit of communication in reproducing quantum correlations.

Translated: 2023-07-06 20:23:27 Published: 2023-07-04

# Discovering New Interpretable Conservation Laws as Sparse Invariants ( http://arxiv.org/abs/2305.19525v3 )

License: see the linked page

Ziming Liu, Patrick Obin Sturm, Saketh Bharadwaj, Sam Silva, Max Tegmark

Discovering conservation laws for a given dynamical system is important but challenging. In a theorist's setup (differential equations and basis functions are both known), we propose the Sparse Invariant Detector (SID), an algorithm that auto-discovers conservation laws from differential equations. Its algorithmic simplicity allows robustness and interpretability of the discovered conserved quantities. We show that SID is able to rediscover known conservation laws and even discover new ones in a variety of systems. For two examples in fluid mechanics and atmospheric chemistry, SID discovers 14 and 3 conserved quantities, respectively, where only 12 and 2 were previously known to domain experts.
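The condition that any such detector has to check can be written compactly. The following is a sketch under assumed notation: the symbols $f$, $b_i$, $\theta$, $G$, and the sample points $x^{(j)}$ are introduced here for illustration and are not taken from the abstract. A quantity is conserved when its time derivative along trajectories vanishes; with known basis functions this becomes a linear null-space problem, and sparse null vectors $\theta$ are the interpretable ones.

```latex
\begin{align}
\dot{x} &= f(x), \qquad H(x) = \sum_{i=1}^{m} \theta_i\, b_i(x), \\
\frac{dH}{dt} &= \nabla H(x) \cdot f(x)
  = \sum_{i=1}^{m} \theta_i\, \nabla b_i(x) \cdot f(x) = 0
  \quad \text{for all } x, \\
G\,\theta &= 0, \qquad
G_{ji} = \nabla b_i\big(x^{(j)}\big) \cdot f\big(x^{(j)}\big).
\end{align}
```

Each independent solution $\theta$ of the last equation corresponds to one conserved quantity $H$, so the number of conserved quantities expressible in the chosen basis equals the dimension of the null space of $G$, provided the sample points are sufficiently many and generic.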

Translated: 2023-07-06 20:22:52 Published: 2023-07-04

# Segment Anything Model (SAM) for Radiation Oncology ( http://arxiv.org/abs/2306.11730v2 )

License: see the linked page

Lian Zhang, Zhengliang Liu, Lu Zhang, Zihao Wu, Xiaowei Yu, Jason Holmes, Hongying Feng, Haixing Dai, Xiang Li, Quanzheng Li, Dajiang Zhu, Tianming Liu, Wei Liu

In this study, we evaluate the performance of the Segment Anything Model (SAM) in clinical radiotherapy. Our results indicate that SAM's 'segment anything' mode can achieve clinically acceptable segmentation results in most organs-at-risk (OARs), with Dice scores higher than 0.7. SAM's 'box prompt' mode further improves the Dice scores by 0.1 to 0.5. Considering the size of the organ and the clarity of its boundary, SAM displays better performance for large organs with clear boundaries but performs worse for smaller organs with unclear boundaries. Given that SAM, a model pre-trained purely on natural images, can handle the delineation of OARs from medical images with clinically acceptable accuracy, these results highlight SAM's robust generalization capabilities with consistent accuracy in automatic segmentation for radiotherapy. In other words, SAM can achieve delineation of different OARs at different sites using a generic automatic segmentation model. SAM's generalization capabilities across different disease sites suggest that it is technically feasible to develop a generic model for automatic segmentation in radiotherapy.
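The Dice score used as the acceptance criterion in this abstract measures the overlap between a predicted mask and a reference mask. The minimal Python sketch below computes it for small binary masks given as nested lists; the function name, the list-based mask format, and the convention for two empty masks are assumptions made for illustration.

```python
def dice_score(pred, target):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two binary masks
    given as equally sized nested lists of 0/1 values."""
    flat_pred = [v for row in pred for v in row]
    flat_target = [v for row in target for v in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(flat_pred, flat_target))
    total = sum(flat_pred) + sum(flat_target)
    # Convention (assumption): two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


if __name__ == "__main__":
    predicted = [[1, 1, 0],
                 [0, 1, 0]]
    reference = [[1, 0, 0],
                 [0, 1, 1]]
    # Two overlapping pixels out of 3 + 3 foreground pixels.
    print(dice_score(predicted, reference))
```

In this toy case the score is 2/3, which would fall just below the 0.7 level that the abstract treats as clinically acceptable.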

Translated: 2023-07-06 20:15:28 Published: 2023-07-04

# Convergence and concentration properties of constant step-size SGD through Markov chains ( http://arxiv.org/abs/2306.11497v2 )

License: see the linked page

Ibrahim Merad and Stéphane Gaïffas

We consider the optimization of a smooth and strongly convex objective using constant step-size stochastic gradient descent (SGD) and study its properties through the prism of Markov chains. We show that, for unbiased gradient estimates with mildly controlled variance, the iteration converges to an invariant distribution in total variation distance. We also establish this convergence in Wasserstein-2 distance in a more general setting than in previous work. Thanks to the invariance property of the limit distribution, our analysis shows that the latter inherits sub-Gaussian or sub-exponential concentration properties when these hold true for the gradient. This allows the derivation of high-confidence bounds for the final estimate. Finally, under such conditions in the linear case, we obtain a dimension-free deviation bound for the Polyak-Ruppert average of a tail sequence. All our results are non-asymptotic and their consequences are discussed through a few applications.
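The behaviour described in this abstract can be observed on a toy problem. The Python sketch below runs constant step-size SGD on a one-dimensional strongly convex quadratic with Gaussian gradient noise and then forms a Polyak-Ruppert average over the tail of the trajectory. The objective, step size, noise level, and function names are all assumptions chosen for illustration; nothing here reproduces the paper's bounds.

```python
import random


def constant_step_sgd(grad, x0, step, n_iters, noise_std, seed=0):
    """Run SGD with a fixed step size; return the list of iterates."""
    rng = random.Random(seed)
    x, iterates = x0, []
    for _ in range(n_iters):
        # Unbiased gradient estimate: true gradient plus zero-mean noise.
        g = grad(x) + rng.gauss(0.0, noise_std)
        x -= step * g
        iterates.append(x)
    return iterates


def tail_average(iterates, tail_fraction=0.5):
    """Polyak-Ruppert average over the last part of the trajectory."""
    start = int(len(iterates) * (1.0 - tail_fraction))
    tail = iterates[start:]
    return sum(tail) / len(tail)


if __name__ == "__main__":
    # Toy objective f(x) = 0.5 * (x - 3)^2, minimised at x = 3.
    xs = constant_step_sgd(lambda x: x - 3.0, x0=0.0, step=0.1,
                           n_iters=20000, noise_std=1.0)
    print("last iterate:", xs[-1])
    print("tail average:", tail_average(xs))
```

With a fixed step the last iterate keeps fluctuating around the minimiser, which is the invariant-distribution picture, whereas the tail average concentrates much more tightly around it.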

Translated: 2023-07-06 20:15:07 Published: 2023-07-04

# Super-Tonks-Girardeau Quench in the Extended Bose-Hubbard Model ( http://arxiv.org/abs/2306.10910v2 )

License: see the linked page

Maciej Marciniak, Maciej Łebek, Jakub Kopyciński, Wojciech Górecki, Rafał Ołdziejewski, Krzysztof Pawłowski

We investigate the effect of a quench from a one-dimensional gas with strong and repulsive local interactions to a strongly attractive one, known as the super-Tonks-Girardeau effect. By incorporating both an optical lattice and non-local interactions, we discover a previously unexplored phenomenon: the disruption of the state during the quench, but only within a specific range of interactions. Our study employs the extended Bose-Hubbard model across various system sizes, starting with analytical results for two atoms and progressing to few-body systems using exact diagonalization, DMRG and TDVP methods. Finally, we use a numerical implementation of the local density approximation for a macroscopic number of atoms. Consistently, our findings unveil a region where the initially self-bound structure expands due to the super-Tonks-Girardeau quench. The fast evaporation provides a tool to characterize the phase diagram in state-of-the-art experiments exploring the physics of the extended Bose-Hubbard model.

Translated: 2023-07-06 20:14:34 Published: 2023-07-04

# Monolingual and Cross-Lingual Knowledge Transfer for Topic Classification ( http://arxiv.org/abs/2306.07797v2 )

License: see the linked page

Dmitry Karpov, Mikhail Burtsev

This article investigates knowledge transfer from the RuQTopics dataset. This Russian topical dataset combines a large number of samples (361,560 single-label, 170,930 multi-label) with extensive class coverage (76 classes). We prepared this dataset from the "Yandex Que" raw data. By evaluating RuQTopics-trained models on the six matching classes of the Russian MASSIVE subset, we show that the RuQTopics dataset is suitable for real-world conversational tasks, as the Russian-only models trained on this dataset consistently yield an accuracy of around 85% on this subset. We also found that for multilingual BERT, trained on RuQTopics and evaluated on the same six classes of MASSIVE (for all MASSIVE languages), the language-wise accuracy closely correlates (Spearman correlation 0.773 with p-value 2.997e-11) with the approximate size of BERT's pretraining data for the corresponding language. At the same time, the correlation of the language-wise accuracy with the linguistic distance from Russian is not statistically significant.
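The Spearman correlation reported in this abstract is the Pearson correlation computed on ranks rather than raw values. The Python sketch below shows that computation with average ranks for ties; the function names are hypothetical, and the numbers in the usage example are made-up placeholders, not values from the paper.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2.0 + 1.0
        for j in range(pos, end + 1):
            ranks[order[j]] = mean_rank
        pos = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


if __name__ == "__main__":
    # Hypothetical placeholders: pretraining data size vs. accuracy per language.
    data_size = [120.0, 15.0, 60.0, 3.0, 40.0]
    accuracy = [0.84, 0.71, 0.80, 0.62, 0.79]
    print(spearman(data_size, accuracy))
```

Working on ranks makes the measure insensitive to the scale of the pretraining-data sizes, which is useful when those sizes are only known approximately.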

Translated: 2023-07-06 20:13:26 Published: 2023-07-04

# Retrieve Anyone: A General-purpose Person Re-identification Task with Instructions ( http://arxiv.org/abs/2306.07520v2 )

License: see the linked page

Weizhen He and Shixiang Tang and Yiheng Deng and Qihao Chen and Qingsong Xie and Yizhou Wang and Lei Bai and Feng Zhu and Rui Zhao and Wanli Ouyang and Donglian Qi and Yunfeng Yan

Human intelligence can retrieve any person according to both visual and language descriptions. However, the current computer vision community studies specific person re-identification (ReID) tasks in different scenarios separately, which limits their applications in the real world. This paper strives to resolve this problem by proposing a new instruct-ReID task that requires the model to retrieve images according to given image or language instructions. Our instruct-ReID is a more general ReID setting, where existing ReID tasks can be viewed as special cases obtained by designing different instructions. We propose a large-scale OmniReID benchmark and an adaptive triplet loss as a baseline method to facilitate research in this new setting. Experimental results show that the baseline model trained on our OmniReID benchmark improves mAP by +0.6%, +1.4%, and 0.2% on Market1501, CUHK03, and MSMT17 for traditional ReID; by +0.8%, +2.0%, and +13.4% on PRCC, VC-Clothes, and LTCC for clothes-changing ReID; by +11.7% on COCAS+ real2 for clothes-template-based clothes-changing ReID when using only RGB images; and by +25.4% on COCAS+ real2 for our newly defined language-instructed ReID. The dataset, model, and code will be available at https://github.com/hwz-zju/Instruct-ReID.

Translated: 2023-07-06 20:13:03 Published: 2023-07-04

# $E(2)$-Equivariant Vision Transformer ( http://arxiv.org/abs/2306.06722v2 )

License: see the linked page

Renjun Xu and Kaifan Yang and Ke Liu and Fengxiang He

Vision Transformer (ViT) has achieved remarkable performance in computer vision. However, positional encoding in ViT makes it substantially difficult to learn the intrinsic equivariance in data. Initial attempts have been made at designing an equivariant ViT, but this paper shows them to be defective in some cases. To address this issue, we design a Group Equivariant Vision Transformer (GE-ViT) via a novel, effective positional encoding operator. We prove that GE-ViT meets all the theoretical requirements of an equivariant neural network. Comprehensive experiments are conducted on standard benchmark datasets, demonstrating that GE-ViT significantly outperforms non-equivariant self-attention networks. The code is available at https://github.com/ZJUCDSYangKaifan/GEVit.

Translated: 2023-07-06 20:12:09 Published: 2023-07-04

# Discriminating Human-authored from ChatGPT-Generated Code Via Discernable Feature Analysis ( http://arxiv.org/abs/2306.14397v2 )

License: see the linked page

Li Ke, Hong Sheng, Fu Cai, Zhang Yunhe and Liu Ming

The ubiquitous adoption of Large Language Generation Models (LLMs) in programming has underscored the importance of differentiating between human-written code and code generated by intelligent models. This paper specifically aims to distinguish code generated by ChatGPT from that authored by humans. Our investigation reveals disparities in programming style, technical level, and readability between these two sources. Consequently, we develop a discriminative feature set for differentiation and evaluate its efficacy through ablation experiments. Additionally, we devise a dataset cleansing technique, which employs temporal and spatial segmentation, to mitigate the dearth of datasets and to secure high-quality, uncontaminated datasets. To further enrich data resources, we employ "code transformation," "feature transformation," and "feature customization" techniques, generating an extensive dataset comprising 10,000 lines of ChatGPT-generated code. The salient contributions of our research include: proposing a discriminative feature set that yields high accuracy in differentiating ChatGPT-generated code from human-authored code in binary classification tasks; devising methods for generating extensive ChatGPT-generated code; and introducing a dataset cleansing strategy that extracts clean, high-grade code datasets from open-source repositories, thus achieving exceptional accuracy in code authorship attribution tasks.

翻訳日:2023-07-06 20:06:18 公開日:2023-07-04

# L00Lとp00pの絡み合い

L00L and p00p entanglement ( http://arxiv.org/abs/2306.13620v2 )

ライセンス: Link先を確認

Dylan Danese, Sabine Wollmann, Saroch Leedumrongwatthanakun, Will McCutcheon, Manuel Erhard, William N. Plick, and Mehul Malik

(参考訳) 1つの光子が基本(gauss)モードを持ち、もう1つの光子が非零アジムタール(\ell$)またはラジアル(p$)成分を持つ高次lgモードを持つラゲール・ガウシアン(lg)の非平衡2光子エンタングルメントの生成を実証する。 N00N$ state nomenclatureからキューを受け取り、これらのタイプの状態を$LOOL$ (L00L) または $p00p$-entangled と呼ぶ。それらはlgモード空間で1つの光子を移動させ、ビームスプリッターで第2の(当初は無相関な)光子と結合し、その次に偶然検出することで生成される。 2光子のコヒーレンスを検証するために、2光子の「ツイスト」量子消去器を実証し、香港・ウー・マンデル干渉を2つの区別可能な光子間で再現する。絡み合いの証人を用いて、生成した$LOOL$と$p00p$の状態は、それぞれの理想の最大絡み合い状態に対して95.31%と89.80%の忠実さを持つことがわかった。基本的な興味の他に、この種の絡み合いは、平均的な量子物理学者の面白い骨をくすぐることに大きな影響を与える可能性が高い。

We demonstrate the generation of unbalanced two-photon entanglement in the Laguerre-Gaussian (LG) transverse-spatial degree-of-freedom, where one photon carries a fundamental (Gauss) mode and the other a higher-order LG mode with a non-zero azimuthal ($\ell$) or radial ($p$) component. Taking a cue from the $N00N$ state nomenclature, we call these types of states $LOOL$ (L00L) or $p00p$-entangled. They are generated by shifting one photon in the LG mode space and combining it with a second (initially uncorrelated) photon at a beamsplitter, followed by coincidence detection. In order to verify two-photon coherence, we demonstrate a two-photon "twisted" quantum eraser, where Hong-Ou-Mandel interference is recovered between two distinguishable photons by projecting them into a rotated LG superposition basis. Using an entanglement witness, we find that our generated $LOOL$ and $p00p$ states have fidelities of 95.31% and 89.80% to their respective ideal maximally entangled states. Besides being of fundamental interest, this type of entanglement will likely have a significant impact on tickling the average quantum physicist's funny bone.

翻訳日:2023-07-06 20:04:17 公開日:2023-07-04

# 超伝導ケラーパラメトリック発振器における量子干渉の観測と操作

Observation and manipulation of quantum interference in a superconducting Kerr parametric oscillator ( http://arxiv.org/abs/2306.12299v2 )

ライセンス: Link先を確認

Daisuke Iyama, Takahiko Kamiya, Shiori Fujii, Hiroto Mukai, Yu Zhou, Toshiaki Nagase, Akiyoshi Tomonaga, Rui Wang, Jiao-Jiao Xue, Shohei Watabe, Sangil Kwon, and Jaw-Shen Tsai

(参考訳) 量子トンネルは超伝導回路を「量子」にする現象である。近年,Kerrパラメトリック発振器の位相空間における量子トンネルを量子情報処理の資源として利用することへの関心が高まっている。本稿では、ウィグナートモグラフィによる平面超伝導回路のトンネルによる量子干渉の直接観測について報告する。この量子干渉の全ての本質的性質、例えばフォック状態からキャット状態へのマッピング、ポンプのデチューニングによる時間的振動、そしてその特徴的なラビ振動とラムジー縞を実験的に解明する。最後に,観測された量子干渉の操作としてゲート操作を行う。本研究は,超伝導Kerrパラメトリック発振器の量子特性と量子情報技術への応用に関する基礎研究である。

Quantum tunneling is the phenomenon that makes superconducting circuits "quantum". Recently, there has been a renewed interest in using quantum tunneling in phase space of a Kerr parametric oscillator as a resource for quantum information processing. Here, we report a direct observation of quantum interference induced by such tunneling in a planar superconducting circuit through Wigner tomography. We experimentally elucidate all essential properties of this quantum interference, such as mapping from Fock states to cat states, a temporal oscillation due to the pump detuning, as well as its characteristic Rabi oscillations and Ramsey fringes. Finally, we perform gate operations as manipulations of the observed quantum interference. Our findings lay the groundwork for further studies on quantum properties of superconducting Kerr parametric oscillators and their use in quantum information technologies.

翻訳日:2023-07-06 20:03:48 公開日:2023-07-04

# SaaFormer: 超スペクトル画像分類のためのスペクトル-空間アキシャルアグリゲーショントランスフォーマー

SaaFormer: Spectral-spatial Axial Aggregation Transformer for Hyperspectral Image Classification ( http://arxiv.org/abs/2306.16759v2 )

ライセンス: Link先を確認

Enzhe Zhao, Zhichang Guo, Yao Li, Dazhi Zhang

(参考訳) 地球の観測衛星や航空機から撮影したハイパースペクトル画像(HSI)は、農業、環境モニタリング、鉱業などの分野でますます重要になっている。利用可能なハイパースペクトルデータセットが限られているため、pixel-wise random samplingは最も一般的に使用されるトレーニング-テストデータセット分割アプローチであり、トレーニングとテストデータセットのサンプル間にかなりの重複がある。さらに,より重なりが強い領域は分類精度が高いことが実験的に示唆された。したがって、画素単位のランダムサンプリングアプローチは、データ漏洩のリスクをもたらす。そこで本研究では,データ漏洩の可能性を最小限に抑えるブロックワイズサンプリング手法を提案する。また,2dcnnなどのモデルにおけるデータ漏洩の存在も実験的に確認した。さらに,HSIを長周期3次元画像とみなす超スペクトル画像分類器の課題に対処するため,スペクトル空間軸アグリゲーショントランスフォーマモデル,すなわちSaaFormerを提案する。このモデルは軸集約注意と多値スペクトル空間抽出の2つの主成分からなる。この軸集約注意機構は、空間的次元特徴を集約しながら、ハイパースペクトル画像の各画素位置におけるスペクトル帯域間の連続性と相関を効果的に活用する。これにより、SaaFormerはブロックワイドサンプリングでも高い精度を維持することができる。多層スペクトル空間抽出構造は、異なる物質成分の特定のスペクトル帯域に対する感度を捉え、より広範囲のスペクトル詳細に集中できるように設計されている。 6つの公開データセットの結果から,本モデルではランダムサンプリングでは同等の性能を示し,ブロックワイドサンプリングパーティションでは他の手法よりも優れていた。

Hyperspectral images (HSI) captured from earth-observing satellites and aircraft are becoming increasingly important for applications in agriculture, environmental monitoring, mining, etc. Because few hyperspectral datasets are available, pixel-wise random sampling is the most commonly used approach for partitioning data into training and test sets, and it produces significant overlap between training and test samples. Furthermore, our experimental observations indicate that regions with larger overlap often exhibit higher classification accuracy. Consequently, the pixel-wise random sampling approach poses a risk of data leakage. Thus, we propose a block-wise sampling method to minimize the potential for data leakage. Our experimental findings also confirm the presence of data leakage in models such as 2DCNN. Further, we propose a spectral-spatial axial aggregation transformer model, namely SaaFormer, to address the challenges associated with a hyperspectral image classifier that considers HSI as long sequential three-dimensional images. The model comprises two primary components: axial aggregation attention and multi-level spectral-spatial extraction. The axial aggregation attention mechanism effectively exploits the continuity and correlation among spectral bands at each pixel position in hyperspectral images, while aggregating spatial dimension features. This enables SaaFormer to maintain high precision even under block-wise sampling. The multi-level spectral-spatial extraction structure is designed to capture the sensitivity of different material components to specific spectral bands, allowing the model to focus on a broader range of spectral details. The results on six publicly available datasets demonstrate that our model exhibits comparable performance under random sampling, while significantly outperforming other methods under the block-wise sampling partition.

翻訳日:2023-07-06 19:55:13 公開日:2023-07-04
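
The contrast between pixel-wise random sampling and block-wise sampling described in the abstract above can be sketched as follows. This is a minimal illustration on a small grid of pixel coordinates, not the authors' protocol: the function names, the tile size, and the neighbour-based overlap proxy are all assumptions made for the example.

```python
import random


def pixelwise_split(height, width, train_fraction, seed=0):
    """Assign every pixel independently to train or test (the common baseline)."""
    rng = random.Random(seed)
    pixels = [(r, c) for r in range(height) for c in range(width)]
    rng.shuffle(pixels)
    n_train = int(len(pixels) * train_fraction)
    return set(pixels[:n_train]), set(pixels[n_train:])


def blockwise_split(height, width, block, train_fraction, seed=0):
    """Assign whole block x block tiles to train or test (hypothetical rule),
    so a training pixel and a test pixel never share a tile."""
    rng = random.Random(seed)
    tiles = [(r, c) for r in range(0, height, block) for c in range(0, width, block)]
    rng.shuffle(tiles)
    train_tiles = set(tiles[: int(len(tiles) * train_fraction)])
    train, test = set(), set()
    for r in range(height):
        for c in range(width):
            tile = (r - r % block, c - c % block)
            (train if tile in train_tiles else test).add((r, c))
    return train, test


def mixed_neighbour_count(train, test):
    """Count test pixels with at least one 4-neighbour in the training set.
    This is only a crude proxy for train/test patch overlap."""
    offsets = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return sum(
        1
        for r, c in test
        if any((r + dr, c + dc) in train for dr, dc in offsets)
    )
```

With pixel-wise sampling almost every test pixel sits next to a training pixel, so spatial patches around them overlap; with tile-based sampling only pixels on tile borders do.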

# McKean-Vlasov制御問題に対する連続時間q-ラーニング

Continuous Time q-learning for McKean-Vlasov Control Problems ( http://arxiv.org/abs/2306.16208v2 )

ライセンス: Link先を確認

Xiaoli Wei, Xiang Yu

(参考訳) 本稿では,最近Jia と Zhou (2023) による Q-learning の連続時間版として作られた q-learning を,エントロピー規則化強化学習の設定における Mckean-Vlasov 制御問題に対して検討する。 jia と zhou (2023) における単一エージェントの制御問題とは対照的に、エージェントの平均場相互作用は q-関数の定義をより微妙に表現し、2つの異なる q-函数が自然に生じることを示す。 i) テストポリシを含む弱いマルティンゲール条件で学習可能な、Gu, Guo, Wei and Xu (2023) で導入された統合 Q-函数の1次近似としての統合 q-函数($q$ で記述) (ii)政策改善イテレーションで使用される本質的なq-関数($q_e$で示される)。 2つのq関数は、すべてのテストポリシーの下で積分表現を介して関連していることを示す。弱いマーチンゲール条件とテストポリシーの探索法に基づいて,いくつかのモデルフリー学習アルゴリズムを考案した。 LQ制御フレームワークとLQ制御フレームワーク以外の2つの例では、最適値関数とq-関数の正確なパラメータ化を求め、シミュレーション実験でアルゴリズムを説明できる。

This paper studies q-learning, recently coined as the continuous-time counterpart of Q-learning by Jia and Zhou (2023), for continuous-time McKean-Vlasov control problems in the setting of entropy-regularized reinforcement learning. In contrast to the single agent's control problem in Jia and Zhou (2023), the mean-field interaction of agents renders the definition of the q-function more subtle, for which we reveal that two distinct q-functions naturally arise: (i) the integrated q-function (denoted by $q$) as the first-order approximation of the integrated Q-function introduced in Gu, Guo, Wei and Xu (2023), which can be learnt by a weak martingale condition involving test policies; and (ii) the essential q-function (denoted by $q_e$) that is employed in the policy improvement iterations. We show that the two q-functions are related via an integral representation under all test policies. Based on the weak martingale condition and our proposed method for searching test policies, some model-free learning algorithms are devised. In two examples, one within the LQ control framework and one beyond it, we obtain the exact parameterization of the optimal value function and q-functions and illustrate our algorithms with simulation experiments.

翻訳日:2023-07-06 19:54:20 公開日:2023-07-04

# 証拠検出と追跡コラボレーション:ロバストアンチUAVシステムの新しい問題、ベンチマーク、アルゴリズム

Evidential Detection and Tracking Collaboration: New Problem, Benchmark and Algorithm for Robust Anti-UAV System ( http://arxiv.org/abs/2306.15767v2 )

ライセンス: Link先を確認

Xue-Feng Zhu, Tianyang Xu, Jian Zhao, Jia-Wei Liu, Kai Wang, Gang Wang, Jianan Li, Qiang Wang, Lei Jin, Zheng Zhu, Junliang Xing, Xiao-Jun Wu

(参考訳) 無人航空機(uavs)は、輸送、監視、軍事など多くの分野で広く使用されている。しかし、安全とプライバシー侵害の可能性を増し、より広範な応用を厳しく制限し、UAVの認識と防衛(反UAV)の重要性を強調している。しかし、従来の作業では、UAVの以前の情報が常に提供されていた追跡問題として、このような反UAVタスクを単純化しており、実際の対UAVタスク(複雑なシーン、不定形、再認識型UAV、リアルタイムUAV監視など)では、そのようなスキームは失敗している。本稿では,UAV情報のない複雑な場面において,UAVの知覚を特徴とする新しい実用的対UAV問題を初めて定式化する。このような課題をベンチマークするために、AntiUAV600と呼ばれる最大のUAVデータセットと、新しい評価基準を提案する。 AntiUAV600は、ランダム、高速、小型のUAVを備えた600の挑戦的なシーンのビデオで構成され、723K以上の熱赤外フレームに密接な注釈が付けられた。最後に,グローバルなUAV検出とローカルなUAV追跡の明確な協調による,新たなUAV対策を開発し,提案課題に効果的に取り組むとともに,今後の研究の強力なベースラインとして機能する。広汎な実験により,本手法はSOTA法よりも優れており,大規模で複雑なUAV知覚性能を向上させるために,AntiUAV600の有効性が検証されている。データセット、事前トレーニングされたモデル、ソースコードはパブリックにリリースされます。

Unmanned Aerial Vehicles (UAVs) have been widely used in many areas, including transportation, surveillance, and military. However, their potential for safety and privacy violations is a growing concern that severely limits their broader application, underscoring the critical importance of UAV perception and defense (anti-UAV). Still, previous works have simplified the anti-UAV task to a tracking problem in which prior information about the UAVs is always provided; such a scheme fails in real-world anti-UAV tasks (i.e., complex scenes, UAVs that appear and reappear unpredictably, and real-time UAV surveillance). In this paper, we first formulate a new and practical anti-UAV problem featuring UAV perception in complex scenes without prior UAV information. To benchmark such a challenging task, we propose the largest UAV dataset, dubbed AntiUAV600, and a new evaluation metric. AntiUAV600 comprises 600 video sequences of challenging scenes with random, fast, and small-scale UAVs, with over 723K thermal infrared frames densely annotated with bounding boxes. Finally, we develop a novel anti-UAV approach via an evidential collaboration of global UAV detection and local UAV tracking, which effectively tackles the proposed problem and can serve as a strong baseline for future research. Extensive experiments show that our method outperforms SOTA approaches and validate the ability of AntiUAV600 to enhance UAV perception performance due to its large scale and complexity. Our dataset, pretrained models, and source code will be released publicly.

翻訳日:2023-07-06 19:53:56 公開日:2023-07-04

# 学習した位置認識記述子と点対ボクセルによるスパース双時間点雲の不規則変化検出

Irregular Change Detection in Sparse Bi-Temporal Point Clouds using Learned Place Recognition Descriptors and Point-to-Voxel Comparison ( http://arxiv.org/abs/2306.15416v2 )

ライセンス: Link先を確認

Nikolaos Stathoulopoulos, Anton Koval and George Nikolakopoulos

(参考訳) 3Dポイントクラウドにおける変化検出と不規則なオブジェクト抽出は、自律的なナビゲーションだけでなく、様々な産業環境の既存のデジタルツインモデルを更新する上でも重要な課題である。本稿では,voxel-to-point比較に基づく深層学習位置認識記述子と不規則物体抽出を用いた3次元点雲における変化検出手法を提案する。提案手法はまず,共通座標フレームを確立するために,マップマージアルゴリズムを用いて両時間点雲を配向する。そして、ディープラーニング技術を用いて、3Dポイントクラウドスキャンからロバストで差別的な特徴を抽出し、連続するポイントクラウドフレーム間の変化を検知し、変化した領域を見つける。最後に、変化した領域をサンプリングし、2つのインスタンス間で比較し、その領域が変化した障害を抽出する。提案手法は実世界の実地実験で評価され,オブジェクトやmuck-pileの付加・変位などの3次元点雲の異なる種類の変化を検知し,その効果を示した。本研究は, 建設現場における安全・安全監視, 地図作成, 調査, 今後の研究方向性など, 様々な応用に重要な影響を示唆するものである。

Change detection and irregular object extraction in 3D point clouds is a challenging task that is of high importance not only for autonomous navigation but also for updating existing digital twin models of various industrial environments. This article proposes an innovative approach for change detection in 3D point clouds using deep learned place recognition descriptors and irregular object extraction based on voxel-to-point comparison. The proposed method first aligns the bi-temporal point clouds using a map-merging algorithm in order to establish a common coordinate frame. Then, it utilizes deep learning techniques to extract robust and discriminative features from the 3D point cloud scans, which are used to detect changes between consecutive point cloud frames and therefore find the changed areas. Finally, the altered areas are sampled and compared between the two time instances to extract any obstructions that caused the area to change. The proposed method was successfully evaluated in real-world field experiments, where it was able to detect different types of changes in 3D point clouds, such as object or muck-pile addition and displacement, showcasing the effectiveness of the approach. The results of this study have important implications for various applications, including safety and security monitoring in construction sites, mapping, and exploration, and suggest potential future research directions in this field.

翻訳日:2023-07-06 19:53:10 公開日:2023-07-04
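
The final comparison step in the abstract above, where the altered area is compared between the two time instances, can be illustrated with a simple voxel occupancy check. This is a rough sketch that assumes the two clouds are already aligned in a common frame; the function names and the voxel size are hypothetical, and the learned descriptors the paper uses to locate changed areas are not modelled here.

```python
import math


def voxel_key(point, voxel_size):
    """Map a 3D point to the integer index of the voxel that contains it."""
    return tuple(int(math.floor(coord / voxel_size)) for coord in point)


def occupied_voxels(cloud, voxel_size):
    """Set of voxel indices that contain at least one point of the cloud."""
    return {voxel_key(point, voxel_size) for point in cloud}


def changed_points(reference_cloud, new_cloud, voxel_size):
    """Return the points of new_cloud that fall into voxels which were empty
    in reference_cloud, i.e. candidate added or displaced objects."""
    reference = occupied_voxels(reference_cloud, voxel_size)
    return [p for p in new_cloud if voxel_key(p, voxel_size) not in reference]
```

Swapping the two arguments gives the points that disappeared between the scans. A larger voxel size tolerates more alignment error but misses smaller changes.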

# ドメイン適応点雲登録のための分別平均教師

A denoised Mean Teacher for domain adaptive point cloud registration ( http://arxiv.org/abs/2306.14749v2 )

ライセンス: Link先を確認

Alexander Bigalke, Mattias P. Heinrich

(参考訳) ポイントクラウドベースの医療登録は、計算効率の向上、強度シフトへの堅牢性、匿名性保存を約束するが、類似度メトリクスによる教師なし学習の非効率性によって制限される。合成変形に関する教師付きトレーニングは代替となるが、実ドメインとのドメインギャップに悩まされる。本研究はドメイン適応によるこのギャップに取り組むことを目的としている。平均教師との自己学習は、この問題に対する確立されたアプローチであるが、教師からの疑似ラベルの固有ノイズによって障害を受ける。本稿では,2つの相補的なノイズ除去戦略を含む,ポイントクラウド登録のためのノイズ除去された教師・学生パラダイムを提案する。まず,教員登録と学生登録のチャンファー距離に基づいて疑似ラベルをフィルタリングし,教師による有害な監督を防止することを提案する。第2に、教師は、予測変形で移動入力を歪ませることで、ノイズフリーラベルで新しいトレーニングペアを動的に合成する。 2つのドメインシフトの下で,公共PVTデータセット上の肺血管木の吸気から呼気への登録について評価を行う。我々の手法は平均教師を13.5/62.8%上回り、様々な競争相手を一貫して上回り、新しい最先端精度(TRE=2.31mm)を設定する。コードは https://github.com/multimodallearning/denoised_mt_pcd_reg で入手できる。

Point cloud-based medical registration promises increased computational efficiency, robustness to intensity shifts, and anonymity preservation but is limited by the inefficacy of unsupervised learning with similarity metrics. Supervised training on synthetic deformations is an alternative but, in turn, suffers from the domain gap to the real domain. In this work, we aim to tackle this gap through domain adaptation. Self-training with the Mean Teacher is an established approach to this problem but is impaired by the inherent noise of the pseudo labels from the teacher. As a remedy, we present a denoised teacher-student paradigm for point cloud registration, comprising two complementary denoising strategies. First, we propose to filter pseudo labels based on the Chamfer distances of teacher and student registrations, thus preventing detrimental supervision by the teacher. Second, we make the teacher dynamically synthesize novel training pairs with noise-free labels by warping its moving inputs with the predicted deformations. Evaluation is performed for inhale-to-exhale registration of lung vessel trees on the public PVT dataset under two domain shifts. Our method surpasses the baseline Mean Teacher by 13.5/62.8%, consistently outperforms diverse competitors, and sets a new state-of-the-art accuracy (TRE=2.31mm). Code is available at https://github.com/multimodallearning/denoised_mt_pcd_reg.

翻訳日:2023-07-06 19:52:33 公開日:2023-07-04
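
The first denoising strategy in the abstract above, filtering pseudo labels by the Chamfer distances of the teacher and student registrations, can be sketched as follows. This is one plausible reading and not the authors' code: it assumes a pseudo label is kept only when the teacher's warped cloud is closer to the fixed cloud than the student's, and all function names are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def chamfer_distance(cloud_a, cloud_b):
    """Symmetric Chamfer distance: mean squared nearest-neighbour distance
    from a to b plus the same from b to a (brute force, small clouds only)."""
    def one_way(src, dst):
        return sum(min(squared_distance(p, q) for q in dst) for p in src) / len(src)

    return one_way(cloud_a, cloud_b) + one_way(cloud_b, cloud_a)


def warp(cloud, displacements):
    """Apply one displacement vector per point."""
    return [
        tuple(x + d for x, d in zip(point, disp))
        for point, disp in zip(cloud, displacements)
    ]


def keep_pseudo_label(moving, fixed, teacher_disp, student_disp):
    """Assumed filter rule: trust the teacher's displacement field as a
    pseudo label only if it registers the pair better than the student's."""
    teacher_cd = chamfer_distance(warp(moving, teacher_disp), fixed)
    student_cd = chamfer_distance(warp(moving, student_disp), fixed)
    return teacher_cd < student_cd
```

The Chamfer distance needs no point correspondences, which is why it can score a registration on unlabelled target-domain pairs.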

# ストリームシナリオにおける距離関数と正規化

Distance Functions and Normalization Under Stream Scenarios ( http://arxiv.org/abs/2307.00106v2 )

ライセンス: Link先を確認

Eduardo V. L. Barboza, Paulo R. Lisboa de Almeida, Alceu de Souza Britto Jr, Rafael M. O. Cruz

(参考訳) データ正規化は、分類システムのモデリングにおいて不可欠なタスクである。データストリームを扱う場合、最小/最大値などの機能の性質を事前に知ることができないため、データ正規化は特に困難になります。我々は,データストリーム中の8つのよく知られた距離関数が正規化せずに生成した精度を比較し,受信したデータの最初のバッチの統計値と受信した前のバッチの統計値から正規化する。完全ストリームを正規化と見なすストリームの実験的なプロトコルは非現実的であり、バイアスと貧弱な結果をもたらす可能性がある。以上の結果から,正規化を行なわずに元のデータストリームとキャンベラ距離を併用することは,データストリームに関する情報が事前に分かっていない場合によい組み合わせであることが示唆された。

Data normalization is an essential task when modeling a classification system. When dealing with data streams, data normalization becomes especially challenging since we may not know in advance the properties of the features, such as their minimum/maximum values, and these properties may change over time. We compare the accuracies generated by eight well-known distance functions in data streams without normalization, normalized considering the statistics of the first batch of data received, and considering the previous batch received. We argue that experimental protocols for streams that consider the full stream as normalized are unrealistic and can lead to biased and poor results. Our results indicate that using the original data stream without applying normalization, and the Canberra distance, can be a good combination when no information about the data stream is known beforehand.

翻訳日:2023-07-06 19:45:47 公開日:2023-07-04
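
The two ingredients discussed in the abstract above, the Canberra distance and normalization from the statistics of the first batch, can be written in a few lines. This is a minimal sketch using the standard Canberra definition and a min-max scaler; the helper names are illustrative, and treating 0/0 terms as zero is a common convention assumed here.

```python
def canberra(x, y):
    """Canberra distance: sum of |x_i - y_i| / (|x_i| + |y_i|).
    Terms where both values are zero contribute nothing."""
    total = 0.0
    for a, b in zip(x, y):
        denominator = abs(a) + abs(b)
        if denominator > 0:
            total += abs(a - b) / denominator
    return total


def fit_min_max(batch):
    """Per-feature minimum and maximum of one batch of samples."""
    columns = list(zip(*batch))
    return [min(col) for col in columns], [max(col) for col in columns]


def apply_min_max(sample, mins, maxs):
    """Scale a sample with previously fitted statistics. Later stream values
    may fall outside [0, 1] because the statistics come from an earlier batch."""
    return [
        (value - lo) / (hi - lo) if hi > lo else 0.0
        for value, lo, hi in zip(sample, mins, maxs)
    ]
```

Each Canberra term is bounded by 1 regardless of the feature's scale, which may help explain why it can work on unnormalized streams. The scaled sample in the test below shows how first-batch statistics go stale once the stream drifts.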

# スキップ接続を用いたベイズ畳み込みニューラルネットワークの自由エネルギー

Free energy of Bayesian Convolutional Neural Network with Skip Connection ( http://arxiv.org/abs/2307.01417v1 )

ライセンス: Link先を確認

Shuya Nagayasu and Sumio Watanabe

(参考訳) Residual Network(ResNet)の成功以来、畳み込みニューラルネットワーク(CNN)のアーキテクチャの多くはスキップ接続を採用してきた。スキップ接続を持つCNNの一般化性能はアンサンブル学習の枠組みで説明されているが、パラメータ数への依存性は明らかにされていない。本稿では、ベイズ学習において、スキップ接続を持つ場合と持たない場合の両方について、畳み込みニューラルネットワークのベイズ自由エネルギーを示す。スキップ接続を持つベイジアンCNNの自由エネルギーの上界は過剰パラメータ化に依存せず、ベイジアンCNNの一般化誤差も同様の性質を持つ。

Since the success of the Residual Network (ResNet), many Convolutional Neural Network (CNN) architectures have adopted skip connections. While the generalization performance of CNNs with skip connections has been explained within the framework of ensemble learning, its dependency on the number of parameters has not been revealed. In this paper, we show the Bayesian free energy of convolutional neural networks both with and without skip connections in Bayesian learning. The upper bound of the free energy of a Bayesian CNN with skip connections does not depend on overparametrization, and the generalization error of the Bayesian CNN has a similar property.

翻訳日:2023-07-06 18:48:07 公開日:2023-07-04
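
For readers unfamiliar with the quantity named in the abstract above, the Bayesian free energy is usually defined as the negative log marginal likelihood. The block below gives that standard definition together with the asymptotic form commonly used in singular learning theory; the notation ($\varphi$ for the prior, $S_n$ for the empirical entropy, $\lambda$ for the learning coefficient) is assumed here and is not taken from the paper.

```latex
% Bayesian free energy for data (x_i, y_i), i = 1..n, model p(y | x, w), prior \varphi(w)
F_n = -\log \int \varphi(w) \prod_{i=1}^{n} p(y_i \mid x_i, w)\, dw

% Standard asymptotic form when the true distribution is realizable by the model
F_n = n S_n + \lambda \log n + o_p(\log n)
```

Read this way, an upper bound on the free energy that does not depend on overparametrization corresponds to a bound on $\lambda$ that does not grow with the number of redundant parameters.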

# マッチング可能なキーポイント支援グラフニューラルネットワークによる学習機能マッチング

Learning Feature Matching via Matchable Keypoint-Assisted Graph Neural Network ( http://arxiv.org/abs/2307.01447v1 )

ライセンス: Link先を確認

Zizhuo Li and Jiayi Ma

(参考訳) 画像のペア間の局所的な特徴の正確なマッチングは、コンピュータビジョンの課題である。従来の研究では注意に基づくグラフニューラルネットワーク(gnn)を使用しており、キーポイント上の完全連結グラフを視覚的および幾何学的情報推論に使用していた。しかし、特徴マッチングの文脈では、検出器の閉塞と故障のため、かなりのキーポイントは取り消せないため、メッセージパッシングには無関係である。非繰り返しキーポイントとの接続は冗長性を導入し、効率が制限されるだけでなく、表現集約プロセスにも干渉し、精度が制限される。提案するMaKeGNNは,非繰り返しキーポイントをバイパスし,マッチング可能なキーポイントを利用して,コンパクトで有意義なメッセージパッシングを導出する,疎度な注意に基づくGNNアーキテクチャである。より具体的には、バイラテラル・コンテキストアウェア・サンプリングモジュールは、まず画像ペアから高い適合性スコアを持つ、分散キーポイントの2つの小さなセットを動的にサンプリングする。次に、我々のMatchable Keypoint-Assisted Context Aggregation Moduleは、サンプルされた通知キーポイントをメッセージボトルネックとみなし、各キーポイントに、マッチするキーポイント内およびマッチしないキーポイントから好ましくないコンテキスト情報を取得することだけを制約し、非削除可能なキーポイントとの無関係で冗長な接続の干渉を回避する。さらに、初期キーポイントとサンプルマッチング可能なキーの潜在的なノイズを考慮し、mkacaモジュールは、データ依存のコンテキスト伝搬のためのマッチング可能性誘導注意集約演算を採用する。これらの手法により, 相対カメラ推定, 基本行列推定, 視覚定位における最先端の性能を実現し, 従来の注意型gnnと比較して計算量やメモリの複雑さを著しく低減した。

Accurately matching local features between a pair of images is a challenging computer vision task. Previous studies typically use attention-based graph neural networks (GNNs) with fully-connected graphs over keypoints within/across images for visual and geometric information reasoning. However, in the context of feature matching, a considerable number of keypoints are non-repeatable due to occlusion and failure of the detector, and thus irrelevant for message passing. The connectivity with non-repeatable keypoints not only introduces redundancy, resulting in limited efficiency, but also interferes with the representation aggregation process, leading to limited accuracy. Targeting high accuracy and efficiency, we propose MaKeGNN, a sparse attention-based GNN architecture which bypasses non-repeatable keypoints and leverages matchable ones to guide compact and meaningful message passing. More specifically, our Bilateral Context-Aware Sampling Module first dynamically samples two small sets of well-distributed keypoints with high matchability scores from the image pair. Then, our Matchable Keypoint-Assisted Context Aggregation Module regards sampled informative keypoints as message bottlenecks and thus constrains each keypoint only to retrieve favorable contextual information from intra- and inter- matchable keypoints, evading the interference of irrelevant and redundant connectivity with non-repeatable ones. Furthermore, considering the potential noise in initial keypoints and sampled matchable ones, the MKACA module adopts a matchability-guided attentional aggregation operation for purer data-dependent context propagation. By these means, we achieve state-of-the-art performance on relative camera estimation, fundamental matrix estimation, and visual localization, while significantly reducing computational and memory complexity compared to typical attentional GNNs.

翻訳日:2023-07-06 18:38:05 公開日:2023-07-04

# 条件付きおよび構成型言語モデル微分型プロンプトについて

On Conditional and Compositional Language Model Differentiable Prompting ( http://arxiv.org/abs/2307.01446v1 )

ライセンス: Link先を確認

Jonathan Pilault, Can Liu, Mohit Bansal, Markus Dreyer

(参考訳) プロンプトは、凍った事前学習言語モデル(plm)を下流タスクに適応させる効果的な方法であることが示されている。プロンプトは、人間工学の単語シーケンスまたは学習された連続埋め込みによって表現できる。本研究では,条件と構成の相違性について検討する。本稿では,タスク命令や入力メタデータを PLM からタスク固有の出力を抽出する連続的なプロンプトに変換する新しいモデル Prompt Production System (PRopS) を提案する。私たちのモデルは、プロダクションシステムのニューラルな定式化に基づくモジュラーネットワーク構造を使用し、モデルが個別のルール -- 特定のプロンプト入力パターンの変換を専門に学習する神経関数 -- を学習することができる。本研究では,PRopS が他の PLM 適応手法を一貫して超越していることを示すとともに,構成一般化タスク,制御可能な要約,多言語翻訳において,PRopS が完全に微調整されたモデルで改善されることがしばしばあることを示す。

Prompts have been shown to be an effective method to adapt a frozen Pretrained Language Model (PLM) to perform well on downstream tasks. Prompts can be represented by a human-engineered word sequence or by a learned continuous embedding. In this work, we investigate conditional and compositional differentiable prompting. We propose a new model, Prompt Production System (PRopS), which learns to transform task instructions or input metadata, into continuous prompts that elicit task-specific outputs from the PLM. Our model uses a modular network structure based on our neural formulation of Production Systems, which allows the model to learn discrete rules -- neural functions that learn to specialize in transforming particular prompt input patterns, making it suitable for compositional transfer learning and few-shot learning. We present extensive empirical and theoretical analysis and show that PRopS consistently surpasses other PLM adaptation techniques, and often improves upon fully fine-tuned models, on compositional generalization tasks, controllable summarization and multilingual translation, while needing fewer trainable parameters.

翻訳日:2023-07-06 18:37:31 公開日:2023-07-04

# グラフポインタネットワークによる組合せ最適化の分岐学習

Learning to Branch in Combinatorial Optimization with Graph Pointer Networks ( http://arxiv.org/abs/2307.01434v1 )

ライセンス: Link先を確認

Rui Wang, Zhiming Zhou, Tao Zhang, Ling Wang, Xin Xu, Xiangke Liao, Kaiwen Li

(参考訳) 分岐とバウンドは組合せ最適化問題を解決する典型的な方法である。本稿では,分岐境界における変数選択ポリシーを学習するためのグラフポインターネットワークモデルを提案する。解法状態を表すために,グラフの特徴,グローバル特徴,歴史的特徴を抽出する。グラフニューラルネットワークとポインタ機構を組み合わせた提案モデルは, 解法状態から分岐変数決定へ効果的にマッピングすることができる。このモデルは、設計されたトップkのKullback-Leibler分散損失関数によって古典的な強い分岐エキスパートルールを模倣するように訓練されている。一連のベンチマーク問題に関する実験は、提案手法が広く使われている専門家設計の分岐規則よりも大幅に優れていることを示した。また,本手法は,最先端の機械学習に基づくブランチ・アンド・バウンド手法よりも,すべてのテストインスタンスにおける高速化と木の大きさの探索に優れる。さらに、モデルは見えないインスタンスに一般化し、より大きなインスタンスにスケールすることができる。

Branch-and-bound is a typical way to solve combinatorial optimization problems. This paper proposes a graph pointer network model for learning the variable selection policy in the branch-and-bound. We extract the graph features, global features and historical features to represent the solver state. The proposed model, which combines the graph neural network and the pointer mechanism, can effectively map from the solver state to the branching variable decisions. The model is trained to imitate the classic strong branching expert rule by a designed top-k Kullback-Leibler divergence loss function. Experiments on a series of benchmark problems demonstrate that the proposed approach significantly outperforms the widely used expert-designed branching rules. Our approach also outperforms the state-of-the-art machine-learning-based branch-and-bound methods in terms of solving speed and search tree size on all the test instances. In addition, the model can generalize to unseen instances and scale to larger instances.

翻訳日:2023-07-06 18:37:11 公開日:2023-07-04
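
The abstract above mentions a top-k Kullback-Leibler divergence loss for imitating strong branching but does not define it. The sketch below shows one plausible form, assumed purely for illustration: the expert scores and model logits are restricted to the k candidates the expert ranks highest, both are turned into distributions with a softmax, and the KL divergence from expert to model is returned. The function names and this exact construction are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_kl_loss(expert_scores, model_logits, k):
    """KL(expert || model) computed only over the k candidate variables with
    the highest expert (strong branching) scores. Assumed form, see lead-in."""
    ranked = sorted(
        range(len(expert_scores)), key=lambda i: expert_scores[i], reverse=True
    )
    top = ranked[:k]
    p = softmax([expert_scores[i] for i in top])
    q = softmax([model_logits[i] for i in top])
    # Skip terms whose expert probability underflowed to zero.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

Restricting the loss to the top-k candidates means the model is not penalized for how it orders variables the expert considers clearly poor.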

# 補完記憶システムを用いたオープン語彙分類における連続学習

Continual Learning in Open-vocabulary Classification with Complementary Memory Systems ( http://arxiv.org/abs/2307.01430v1 )

ライセンス: Link先を確認

Zhen Zhu, Weijie Lyu, Yao Xiao, Derek Hoiem

(参考訳) オープン語彙画像分類におけるフレキシブルな連続学習法を導入し,人間の認知に観察される相補的な学習システムからインスピレーションを得た。本稿では,遅延学習の原則を適応した"ツリープローブ"手法を提案する。これにより,競合精度の高い新しい例からバッチ学習線形モデルへの高速学習が可能となる。さらに,サンプルのクラスが模範クラス内にあるというゼロショット推定確率を用いて,CLIPゼロショットモデルと模範モデルからの予測を組み合わせる手法を提案する。データインクリメンタル、クラスインクリメンタル、タスクインクリメンタルの設定でテストし、ゼロショットと学習されたカテゴリのさまざまなサブセットで柔軟な推論を実行します。提案手法は,学習速度,目標課題効率,ゼロショット効果のバランスが良好である。

We introduce a method for flexible continual learning in open-vocabulary image classification, drawing inspiration from the complementary learning systems observed in human cognition. We propose a "tree probe" method, an adaptation of lazy learning principles, which enables fast learning from new examples with accuracy competitive with batch-trained linear models. Further, we propose a method to combine predictions from a CLIP zero-shot model and the exemplar-based model, using the zero-shot estimated probability that a sample's class is within any of the exemplar classes. We test in data incremental, class incremental, and task incremental settings, as well as the ability to perform flexible inference on varying subsets of zero-shot and learned categories. Our proposed method achieves a good balance of learning speed, target task effectiveness, and zero-shot effectiveness.

翻訳日:2023-07-06 18:36:57 公開日:2023-07-04
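
The prediction-combination idea in the abstract above can be sketched with plain dictionaries of class probabilities. The mixing rule below is an assumption made for illustration, not the paper's stated formula: the zero-shot probability mass on the exemplar classes is used as the weight of the exemplar model, and the remainder weights the zero-shot model. The function name and label sets are hypothetical.

```python
def combine_predictions(zero_shot_probs, exemplar_probs):
    """Blend two class distributions.

    zero_shot_probs: probabilities over all candidate labels (sums to 1).
    exemplar_probs: probabilities over the learned exemplar labels only
                    (sums to 1, labels are a subset of the candidates).
    """
    # Zero-shot estimate that the true class is one of the exemplar classes.
    p_in = sum(zero_shot_probs.get(label, 0.0) for label in exemplar_probs)
    combined = {}
    for label, p_zero_shot in zero_shot_probs.items():
        p_exemplar = exemplar_probs.get(label, 0.0)
        combined[label] = p_in * p_exemplar + (1.0 - p_in) * p_zero_shot
    return combined
```

When the zero-shot model is confident the sample belongs to a learned class, the exemplar model dominates; otherwise the zero-shot prediction is largely preserved, so labels never seen by the exemplar model can still win.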

# スマートフィルタ支援ドメイン対向ニューラルネットワーク:ノイズの多い産業シナリオにおける障害診断のための教師なしドメイン適応手法

Smart filter aided domain adversarial neural network: An unsupervised domain adaptation method for fault diagnosis in noisy industrial scenarios ( http://arxiv.org/abs/2307.01429v1 )

ライセンス: Link先を確認

Baorui Dai, Gaëtan Frusque, Tianfu Li, Qi Li, Olga Fink

(参考訳) 非教師なし領域適応(UDA)に基づく障害診断法の適用は、異なる運用条件、異なる運用単位、シミュレーションデータ、実データ間の運用経験と障害署名の転送を容易にし、産業環境において大きな効果を示した。しかし、実際の産業シナリオでは、未知のレベルやノイズの種類がドメインアライメントの難しさを増幅し、深層学習モデルの診断性能に重大な影響を及ぼす可能性がある。この問題に対処するため, ノイズの多い産業シナリオにおける故障診断のためのスマートフィルタ支援ドメイン適応ニューラルネットワーク (SFDANN) を提案する。提案手法は2段階からなる。最初のステップでは、時間周波数領域におけるソースとターゲットドメインデータの類似性を動的に強制するスマートフィルタを開発する。これは学習可能なウェーブレットパケット変換ネットワーク(lwpt)と従来のウェーブレットパケット変換モジュールを組み合わせたものである。第2のステップでは、スマートフィルタによって再構成されたデータをドメイン逆ニューラルネットワーク(DANN)に入力する。ドメイン不変性と識別的特徴を学習するために、SFDANNの学習可能なモジュールは、時間周波数特徴近接、ドメインアライメント、障害分類の3つの目的で統一的に訓練される。本研究では, 列車-線路連成振動系において, 騒音環境下での軸受の故障診断とスラブ線路の故障診断の2つの事例に基づくSFDANN法の有効性を検証した。その結果, 他のUDA法と比較すると, SFDANNは優れた性能と顕著な安定性を示した。

The application of unsupervised domain adaptation (UDA)-based fault diagnosis methods has shown significant efficacy in industrial settings, facilitating the transfer of operational experience and fault signatures between different operating conditions, different units of a fleet or between simulated and real data. However, in real industrial scenarios, unknown levels and types of noise can amplify the difficulty of domain alignment, thus severely affecting the diagnostic performance of deep learning models. To address this issue, we propose a UDA method called Smart Filter-Aided Domain Adversarial Neural Network (SFDANN) for fault diagnosis in noisy industrial scenarios. The proposed methodology comprises two steps. In the first step, we develop a smart filter that dynamically enforces similarity between the source and target domain data in the time-frequency domain. This is achieved by combining a learnable wavelet packet transform network (LWPT) and a traditional wavelet packet transform module. In the second step, we input the data reconstructed by the smart filter into a domain adversarial neural network (DANN). To learn domain-invariant and discriminative features, the learnable modules of SFDANN are trained in a unified manner with three objectives: time-frequency feature proximity, domain alignment, and fault classification. We validate the effectiveness of the proposed SFDANN method based on two fault diagnosis cases: one involving fault diagnosis of bearings in noisy environments and another involving fault diagnosis of slab tracks in a train-track-bridge coupling vibration system, where the transfer task involves transferring from numerical simulations to field measurements. Results show that, compared to other representative state-of-the-art UDA methods, SFDANN exhibits superior performance and remarkable stability.

翻訳日:2023-07-06 18:36:41 公開日:2023-07-04

# DeepfakeBench: Deepfake検出の包括的なベンチマーク

DeepfakeBench: A Comprehensive Benchmark of Deepfake Detection ( http://arxiv.org/abs/2307.01426v1 )

ライセンス: Link先を確認

Zhiyuan Yan, Yong Zhang, Xinhang Yuan, Siwei Lyu, Baoyuan Wu

(参考訳) ディープフェイク検出の分野で見落とされがちな課題は、標準化され、統一され、包括的なベンチマークがないことである。この問題は不公平なパフォーマンス比較と、潜在的に誤解を招く結果につながる。具体的には、データ処理パイプラインに均一性がないため、検出モデルに対する一貫性のないデータ入力が発生する。さらに、実験的な設定には顕著な違いがあり、評価戦略とメトリクスには標準化が欠けている。このギャップを埋めるために、DeepfakeBenchと呼ばれるdeepfake検出のための最初の包括的なベンチマークを提示します。 1)全検出器間で一貫した入力を確保する統一データ管理システム 2)最先端手法実装のための統合フレームワーク、及び 3)透明性と再現性を促進するための標準化された評価指標とプロトコル。拡張可能なモジュールベースのコードベースを備えたDeepfakeBenchには、15の最先端検出方法、9のdeepfakeデータセット、一連のdeepfake検出評価プロトコルと分析ツール、そして包括的な評価が含まれている。さらに、様々な視点(データ拡張、バックボーンなど)からの評価を広範囲に分析した新たな洞察を提供する。われわれの努力が今後の研究を促進し、このますます重要な領域におけるイノベーションを育むことを願っている。ベンチマークのコード、評価、分析はすべて https://github.com/SCLBD/DeepfakeBench で公開されています。

A critical yet frequently overlooked challenge in the field of deepfake detection is the lack of a standardized, unified, comprehensive benchmark. This issue leads to unfair performance comparisons and potentially misleading results. Specifically, there is a lack of uniformity in data processing pipelines, resulting in inconsistent data inputs for detection models. Additionally, there are noticeable differences in experimental settings, and evaluation strategies and metrics lack standardization. To fill this gap, we present the first comprehensive benchmark for deepfake detection, called DeepfakeBench, which offers three key contributions: 1) a unified data management system to ensure consistent input across all detectors, 2) an integrated framework for state-of-the-art methods implementation, and 3) standardized evaluation metrics and protocols to promote transparency and reproducibility. Featuring an extensible, modular-based codebase, DeepfakeBench contains 15 state-of-the-art detection methods, 9 deepfake datasets, a series of deepfake detection evaluation protocols and analysis tools, as well as comprehensive evaluations. Moreover, we provide new insights based on extensive analysis of these evaluations from various perspectives (e.g., data augmentations, backbones). We hope that our efforts could facilitate future research and foster innovation in this increasingly critical domain. All codes, evaluations, and analyses of our benchmark are publicly available at https://github.com/SCLBD/DeepfakeBench.

翻訳日:2023-07-06 18:36:08 公開日:2023-07-04

# 統一GANフレームワークによる一貫性のあるマルチモーダル生成

Consistent Multimodal Generation via A Unified GAN Framework ( http://arxiv.org/abs/2307.01425v1 )

ライセンス: Link先を確認

Zhen Zhu, Yijun Li, Weijie Lyu, Krishna Kumar Singh, Zhixin Shu, Soeren Pirk, Derek Hoiem

(参考訳) 一つの生成モデルを用いて、RGB、深度、表面法線などのマルチモーダル画像を生成する方法について検討する。課題は、現実的で、かつ互いに一貫性のある出力を生成することである。提案手法はStyleGAN3アーキテクチャを基盤とし、合成ネットワークの最後の層に共有バックボーンとモダリティ固有の分岐を持つ。さらに、モダリティ毎の忠実度判別器とクロスモダリティ一貫性判別器を提案する。Stanford2D3Dデータセットの実験では、RGB、深度、法線画像の現実的で一貫した生成を実証する。また、少数のペアデータしかない場合でも、事前学習したモデルを新たなドメインへ容易に拡張できるトレーニングレシピも示す。さらに、合成生成したRGBと深度のペアを深度推定器の学習や微調整に用いることも評価する。コードは https://github.com/jessemelpolio/MultimodalGAN で公開予定である。

We investigate how to generate multimodal image outputs, such as RGB, depth, and surface normals, with a single generative model. The challenge is to produce outputs that are realistic and also consistent with each other. Our solution builds on the StyleGAN3 architecture, with a shared backbone and modality-specific branches in the last layers of the synthesis network, and we propose per-modality fidelity discriminators and a cross-modality consistency discriminator. In experiments on the Stanford2D3D dataset, we demonstrate realistic and consistent generation of RGB, depth, and normal images. We also show a training recipe to easily extend our pretrained model to a new domain, even with only a few paired samples. We further evaluate the use of synthetically generated RGB and depth pairs for training or fine-tuning depth estimators. Code will be available at https://github.com/jessemelpolio/MultimodalGAN.

翻訳日:2023-07-06 18:35:48 公開日:2023-07-04

# 生成フローネットワーク - Markov Chain の視点から

Generative Flow Networks: a Markov Chain Perspective ( http://arxiv.org/abs/2307.01422v1 )

ライセンス: Link先を確認

Tristan Deleu, Yoshua Bengio

(参考訳) マルコフ連鎖モンテカルロ法(MCMC)は、正規化まで定義された確率分布からサンプリングするための一般的な枠組みを提供するが、後者が高度にマルチモーダルである場合、しばしばターゲット分布への緩やかな収束に悩まされる。近年,サンプルが明確な構成構造を持つ場合,サンプリングを逐次意思決定問題として扱うことにより,この問題を軽減するための代替フレームワークとして生成フローネットワーク(GFlowNets)が提案されている。最初はフローネットワークの観点から紹介されたが、近年のGFlowNetsの進歩は、フローの必要性を完全に回避し、マルコフ連鎖の文献からより多くのインスピレーションを得ている。本稿では、この接続を形式化し、マルコフ連鎖を用いたGFlowNetsの新しい視点を提供し、マルコフ連鎖としての状態空間の性質に関係なくGFlowNetsの統一的な視点を示す。 MCMCメソッドと同じ理論的フレームワークの下でGFlowNetを配置することで、両方のフレームワークの類似性を識別できます。

While Markov chain Monte Carlo methods (MCMC) provide a general framework to sample from a probability distribution defined up to normalization, they often suffer from slow convergence to the target distribution when the latter is highly multi-modal. Recently, Generative Flow Networks (GFlowNets) have been proposed as an alternative framework to mitigate this issue when samples have a clear compositional structure, by treating sampling as a sequential decision making problem. Although they were initially introduced from the perspective of flow networks, the recent advances of GFlowNets draw more and more inspiration from the Markov chain literature, bypassing completely the need for flows. In this paper, we formalize this connection and offer a new perspective for GFlowNets using Markov chains, showing a unifying view for GFlowNets regardless of the nature of the state space as recurrent Markov chains. Positioning GFlowNets under the same theoretical framework as MCMC methods also allows us to identify the similarities between both frameworks, and most importantly to highlight their

翻訳日:2023-07-06 18:35:33 公開日:2023-07-04

# 創発的データ駆動型プロトタイプによる教師なし特徴学習

Unsupervised Feature Learning with Emergent Data-Driven Prototypicality ( http://arxiv.org/abs/2307.01421v1 )

ライセンス: Link先を確認

Yunhui Guo, Youren Zhang, Yubei Chen, Stella X. Yu

(参考訳) ラベルのない画像集合が与えられた場合、我々の目標は、それぞれの画像を特徴空間内の点にマッピングするモデルを訓練することである。その際、近接性が視覚的な類似性を示すだけでなく、点の位置そのものが、その画像がデータセットに対してどの程度原型的であるかを直接エンコードするようにする。私たちの重要な洞察は、ユークリッド空間ではなく双曲空間で教師なし特徴学習を行うことである。そこでは、点間の距離は依然として画像の類似性を反映しつつ、点の位置によって原型性を表現する追加の能力が得られる。すなわち、原点に近いほど原型的である。後者の性質は、通常の距離学習の目的関数を最適化することから自然に生じる。多くの訓練例に類似した画像は、ユークリッド空間では対応する点の中心に配置されるのが最適だが、双曲空間では原点により近い位置に配置されるのが最適である。我々は、球充填を用いた双曲空間における教師なし特徴学習アルゴリズムHACK (Hyperbolic space with sphere pACKing) を提案する。HACKはまず、双曲空間のポアンカレ球内に一様に充填された粒子を生成し、各画像を各粒子に一意に割り当てる。コンジーリング(congealing)後の画像は、そのデータセットにおいてより典型的なものとみなされる。我々の特徴マッパーは、単に双曲空間内で訓練インスタンスを広げるように訓練されただけだが、コンジーリングによって画像が原点に近づくことが観察され、教師なしの原型性発見という考え方が検証された。我々のデータ駆動型の原型性は、サンプルの複雑さを低減し、非典型的なインスタンスによってモデルの汎化性能を高め、典型的なインスタンスによって頑健性を高めるための、簡単で優れた教師なしインスタンス選択を提供する。

Given an image set without any labels, our goal is to train a model that maps each image to a point in a feature space such that not only does proximity indicate visual similarity, but the point's location also directly encodes how prototypical the image is according to the dataset. Our key insight is to perform unsupervised feature learning in hyperbolic instead of Euclidean space, where the distance between points still reflects image similarity, and yet we gain additional capacity for representing prototypicality with the location of the point: the closer it is to the origin, the more prototypical it is. The latter property simply emerges from optimizing the usual metric learning objective: an image similar to many training instances is best placed at the center of the corresponding points in Euclidean space, but closer to the origin in hyperbolic space. We propose HACK, an unsupervised feature learning algorithm in Hyperbolic space with sphere pACKing. HACK first generates uniformly packed particles in the Poincaré ball of hyperbolic space and then assigns each image uniquely to a particle. Images after congealing are regarded as more typical of the dataset they belong to. With our feature mapper simply trained to spread out training instances in hyperbolic space, we observe that images move closer to the origin with congealing, validating our idea of unsupervised prototypicality discovery. We demonstrate that our data-driven prototypicality provides an easy and superior unsupervised instance selection to reduce sample complexity, increase model generalization with atypical instances and robustness with typical ones.
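
The link between a point's position and its prototypicality can be made concrete with the standard geodesic distance of the Poincaré ball model. The sketch below is only an illustration and not the authors' HACK implementation: the function names `poincare_distance` and `prototypicality_score` are hypothetical, and using the negative distance to the origin as a score is just one plausible reading of "the closer it is to the origin, the more prototypical it is".

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit Poincaré ball."""
    norm_u_sq = sum(x * x for x in u)
    norm_v_sq = sum(x * x for x in v)
    if norm_u_sq >= 1.0 or norm_v_sq >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    diff_sq = sum((a - b) ** 2 for a, b in zip(u, v))
    # Standard closed form: arcosh(1 + 2|u-v|^2 / ((1-|u|^2)(1-|v|^2)))
    return math.acosh(1.0 + 2.0 * diff_sq / ((1.0 - norm_u_sq) * (1.0 - norm_v_sq)))


def prototypicality_score(point):
    """Hypothetical score (an assumption): higher means closer to the origin."""
    origin = [0.0] * len(point)
    return -poincare_distance(origin, point)


# The same Euclidean gap of 0.1 is a short hop near the origin
# and a much longer one near the boundary of the ball.
near_origin = poincare_distance([0.0, 0.0], [0.1, 0.0])
near_boundary = poincare_distance([0.8, 0.0], [0.9, 0.0])
```

Because distances grow rapidly toward the boundary, there is far more room for atypical images near the edge of the ball, while a point that must stay close to many others is pulled toward the origin.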

翻訳日:2023-07-06 18:35:14 公開日:2023-07-04

# コミュニティqaプラットフォームユーザの質問タグ行動分析に基づくタグ予測のモデル化

Modeling Tag Prediction based on Question Tagging Behavior Analysis of CommunityQA Platform Users ( http://arxiv.org/abs/2307.01420v1 )

ライセンス: Link先を確認

Kuntal Kumar Pal, Michael Gamon, Nirupama Chandrasekaran and Silviu Cucerzan

(参考訳) コミュニティの質問応答プラットフォームでは、タグは効果的な情報組織化と検索、より良い質問ルーティング、質問への迅速な回答、トピックの人気評価において重要な役割を果たす。したがって、投稿のタグを予測および提案するための自動アシストは、そのようなプラットフォームのユーザにとって非常に有用である。多様なコミュニティやドメインにまたがるタグ予測を改善するため、17のStackExchangeコミュニティにおいて,ユーザのタグ付け動作を徹底的に分析した。これらの多様な領域において、この挙動の様々な共通する性質が発見された。この結果を用いて、各質問に対して人気のあるタグとより粒度の細かいタグの両方を予測する柔軟なニューラルタグ予測アーキテクチャを開発した。我々のモデルの有効性を示す大規模な実験と得られた性能

In community question-answering platforms, tags play essential roles in effective information organization and retrieval, better question routing, faster response to questions, and assessment of topic popularity. Hence, automatic assistance for predicting and suggesting tags for posts is of high utility to users of such platforms. To develop better tag prediction across diverse communities and domains, we performed a thorough analysis of users' tagging behavior in 17 StackExchange communities. We found various common inherent properties of this behavior in those diverse domains. We used the findings to develop a flexible neural tag prediction architecture, which predicts both popular tags and more granular tags for each question. Our extensive experiments and obtained performance show the effectiveness of our model

翻訳日:2023-07-06 18:34:45 公開日:2023-07-04

# AdAM:Adaptation-Aware Kernel ModulationによるFew-Shot画像生成

AdAM: Few-Shot Image Generation via Adaptation-Aware Kernel Modulation ( http://arxiv.org/abs/2307.01465v1 )

ライセンス: Link先を確認

Yunqing Zhao, Keshigeyan Chandrasegaran, Abdollahzadeh Milad, Chao Du, Tianyu Pang, Ruoteng Li, Henghui Ding, Ngai-Man Cheung

(参考訳) Few-shot Image Generation (FSIG)は、少数のトレーニングサンプル(例:10)が与えられたときに、新しい多様な画像を生成することを目的としている。最近の研究は、大規模なソースドメインで事前訓練されたGANを活用し、少数のターゲットサンプルでターゲットドメインに適応させることでFSIGに対処している。最近のFSIG手法の中心は知識保存基準であり、ソース知識のサブセットを選択して適応後のモデルに保存する。しかし、既存手法の大きな制限は、知識保存基準がソースドメイン/タスクのみを考慮し、ソース知識の選択においてターゲットドメイン/適応を考慮しないことであり、ソースドメインとターゲットドメインの近接性が異なる設定への適合性に疑問が生じる。私たちの仕事は2つの貢献をする。まず、最近のFSIG研究とその実験について再検討する。ソースドメインとターゲットドメインが近接しているという仮定を緩和した設定では、知識保存においてソースドメインのみを考慮する既存の最先端(SOTA)手法の多くが、ベースライン手法と同程度の性能しか示さないことを明らかにする。第2の貢献として、ソース・ターゲット間の近接性が異なる一般的なFSIGに対してAdaptation-Aware kernel Modulation (AdAM)を提案する。大規模な実験により、ソースドメインとターゲットドメインがより離れた困難な設定を含め、AdAMがFSIGにおいて一貫してSOTA性能を達成することを示す。

Few-shot image generation (FSIG) aims to learn to generate new and diverse images given few (e.g., 10) training samples. Recent work has addressed FSIG by leveraging a GAN pre-trained on a large-scale source domain and adapting it to the target domain with few target samples. Central to recent FSIG methods are knowledge preservation criteria, which select and preserve a subset of source knowledge in the adapted model. However, a major limitation of existing methods is that their knowledge preservation criteria consider only the source domain/task and fail to consider the target domain/adaptation in selecting source knowledge, casting doubt on their suitability for setups with different proximity between source and target domains. Our work makes two contributions. First, we revisit recent FSIG works and their experiments. We reveal that under setups in which the assumption of close proximity between source and target domains is relaxed, many existing state-of-the-art (SOTA) methods that consider only the source domain in knowledge preservation perform no better than a baseline method. As our second contribution, we propose Adaptation-Aware kernel Modulation (AdAM) for general FSIG across different source-target domain proximities. Extensive experiments show that AdAM consistently achieves SOTA performance in FSIG, including challenging setups where source and target domains are farther apart.

翻訳日:2023-07-06 18:28:41 公開日:2023-07-04

# 単一フレームと重み付き逐次視覚位置認識の改善のための教師なし品質予測

Unsupervised Quality Prediction for Improved Single-Frame and Weighted Sequential Visual Place Recognition ( http://arxiv.org/abs/2307.01464v1 )

ライセンス: Link先を確認

Helen Carson, Jason J. Ford, Michael Milford

(参考訳) ローカライゼーションと視覚位置認識 (vpr) 技術の絶対的な性能では大きな進歩が見られたが、完全性と予測可能性といった他の能力が、特に安全性や運用上重要な自律システムにおいて重要であることに、これらのシステムをアプリケーションに変換することはますます明確になりつつある。本研究では,局所化推定の確率的品質を予測するための新しいトレーニングフリーアプローチと,これらの予測を用いてシーケンスマッチングプロセスをバイアスし,ナイーブシーケンスマッチングアプローチ以上のパフォーマンス向上を実現する新しい手法を提案する。我々の統合システムは軽量であり、リアルタイムに動作し、基礎となるVPR技術とは無関係である。 4つのデータセットと3つのVPR技術にわたる広範な実験を行い、特に高精度/低リコール動作点における精度向上を実証した。また,予測と重み付きシーケンスマッチングコンポーネントの性能寄与を分離したアブレーションと解析を行い,予測システムの品質と重み付きシーケンスマッチング器の利点との関係について検討した。

While substantial progress has been made in the absolute performance of localization and Visual Place Recognition (VPR) techniques, it is becoming increasingly clear from translating these systems into applications that other capabilities like integrity and predictability are just as important, especially for safety- or operationally-critical autonomous systems. In this research we present a new, training-free approach to predicting the likely quality of localization estimates, and a novel method for using these predictions to bias a sequence-matching process to produce additional performance gains beyond those of a naive sequence matching approach. Our combined system is lightweight, runs in real time and is agnostic to the underlying VPR technique. In extensive experiments across four datasets and three VPR techniques, we demonstrate that our system improves precision performance, especially at the high-precision/low-recall operating point. We also present ablation and analysis identifying the performance contributions of the prediction and weighted sequence matching components in isolation, and the relationship between the quality of the prediction system and the benefits of the weighted sequential matcher.

翻訳日:2023-07-06 18:28:16 公開日:2023-07-04

# 実用的なコラボレーティブ知覚:非同期およびマルチエージェント3dオブジェクト検出のためのフレームワーク

Practical Collaborative Perception: A Framework for Asynchronous and Multi-Agent 3D Object Detection ( http://arxiv.org/abs/2307.01462v1 )

ライセンス: Link先を確認

Minh-Quan Dao, Julie Stephany Berrio, Vincent Frémont, Mao Shan, Elwan Héry, and Stewart Worrall

(参考訳) 本稿では,LiDARを用いた単車体3次元物体検出モデルの改良を行い,その容量を個々の点雲の代わりにプロセスポイントクラウドシーケンスに拡張する。本稿では,複数フレーム検出モデルの検出精度を高めるため,点雲の連結における影効果の補正に関するこれまでの研究を拡張した。拡張にはHD Mapの導入とOracleモデルの蒸留が含まれています。次に、V2X通信によるマルチエージェント協調による単車認識の性能をさらに向上させる。我々は,単一車両検出モデルの変更やエージェント間同期の仮定を最小限に抑えながら,従来技術よりも帯域幅パフォーマンスのトレードオフを実現する,シンプルかつ効果的なコラボレーション手法を考案する。 v2x-simデータセットを用いた実験では,初期コラボレーションの0.03%に相当する遅延コラボレーションの帯域幅使用量を消費しながら,初期コラボレーションの98%のパフォーマンスを実現していることが示された。コードはhttps://github.com/quan-dao/practical-collab-perceptionでリリースされる。

In this paper, we improve single-vehicle LiDAR-based 3D object detection models by extending their capacity to process point cloud sequences instead of individual point clouds. In this step, we extend our previous work on rectifying the shadow effect in the concatenation of point clouds to boost the detection accuracy of multi-frame detection models. Our extension includes incorporating an HD map and distilling an Oracle model. Next, we further increase the performance of single-vehicle perception using multi-agent collaboration via Vehicle-to-Everything (V2X) communication. We devise a simple yet effective collaboration method that achieves better bandwidth-performance tradeoffs than prior art while minimizing changes made to single-vehicle detection models and assumptions on inter-agent synchronization. Experiments on the V2X-Sim dataset show that our collaboration method achieves 98% of the performance of early collaboration while consuming the same bandwidth as late collaboration, which is 0.03% of that of early collaboration. The code will be released at https://github.com/quan-dao/practical-collab-perception.

翻訳日:2023-07-06 18:27:56 公開日:2023-07-04

# 量子キックロータにおけるリアプノフ指数の近似

Approximating Quantum Lyapunov Exponents in Quantum Kicked Rotor ( http://arxiv.org/abs/2307.01461v1 )

ライセンス: Link先を確認

Varsha Gupta

(参考訳) 本研究では、量子キックロータ(QKR)の動力学における初期近接状態の進化に着目し、量子カオスの研究を行う。本稿では、この量子系におけるカオスの度合いを定量化するため、古典系における対応物に類似した新しい尺度である量子リアプノフ指数(Quantum Lyapunov Exponent, QLE)を提案する。まず運動量空間をモデル化し、次に進化する状態間の忠実度を分析することでQLEを計算し、量子カオス挙動に関する洞察を提供する。さらに、運動量空間において局在・一様・拡散・収縮・振動する様々な初期状態にも調査を拡張する。この結果は多様な動的挙動を明らかにし、量子カオスの複雑な性質を浮き彫りにする。最後に、複雑な状態を上述の状態の重ね合わせとして表現する革新的な最適化フレームワークを提案する。これは多面的な量子系のダイナミクスの可視化と理解に役立つ可能性がある。

In this work, we study quantum chaos by focusing on the evolution of initially close states in the dynamics of the Quantum Kicked Rotor (QKR). We propose a novel measure, the Quantum Lyapunov Exponent (QLE), to quantify the degree of chaos in this quantum system, analogous to its classical counterpart. We begin by modeling the momentum space and then the QLE is computed through analyzing the fidelity between evolving states, offering insights into the quantum chaotic behavior. Furthermore, we extend our investigations to various initial states: localized, uniform, spreading, contracting and oscillating in momentum space. Our results unveil a diverse range of dynamical behaviors, highlighting the complex nature of quantum chaos. Finally, we propose an innovative optimization framework to represent a complex state as a superposition of the aforementioned states, which has potential implications for visualizing and understanding the dynamics of multifaceted quantum systems.

翻訳日:2023-07-06 18:27:40 公開日:2023-07-04

# CARE-MI:母子保健における誤情報評価のための中国のベンチマーク

CARE-MI: Chinese Benchmark for Misinformation Evaluation in Maternity and Infant Care ( http://arxiv.org/abs/2307.01458v1 )

ライセンス: Link先を確認

Tong Xiang, Liangzhi Li, Wangyue Li, Mingbai Bai, Lu Wei, Bowen Wang, Noa Garcia

(参考訳) NLPの最近の進歩は、LLMを現実世界のシナリオに適用する新しい傾向をもたらした。最新のLLMは、人間と対話するときに驚くほど流暢だが、意図せずに事実と異なる記述を生成するという誤情報問題に悩まされる。これは、特に医療などのセンシティブなコンテキストで生成された場合、有害な結果を招く可能性がある。しかし、LLMの長文生成における誤情報の評価、特に知識集約的な話題に焦点を当てた以前の研究はほとんどない。さらに、LLMは様々な言語でうまく機能することが示されているが、誤情報評価は主に英語で行われている。そこで本研究では、次の2点におけるLLMの誤情報を評価するためのベンチマークCARE-MIを提案する。 1)センシティブな話題、具体的には母子保健(妊産婦および乳幼児ケア)領域 2) 英語以外の言語、すなわち中国語。最も重要なことは、他の知識集約型ドメインや低リソース言語に転用可能な長文生成評価ベンチマークを構築するための革新的なパラダイムを提供することである。提案するベンチマークは、LLMの広範な利用と、これらのモデルが生成した誤情報を評価するためのデータセットの欠如とのギャップを埋めるものである。専門家が確認した1,612の質問と、人手で選択された参照が含まれている。このベンチマークを用いた広範な実験の結果、現在の中国語LLMは母子保健の分野では完璧とは程遠いことが判明した。性能評価における人的資源への依存を最小限に抑えるため、ベンチマークの質問を用いてLLMの長文出力を自動的に評価する判定モデルを提供する。さらに、長文生成評価のための潜在的なソリューションを比較し、より堅牢で効率的な自動評価指標を構築するための洞察を提供する。

Recent advances in NLP have led to a new trend of applying LLMs to real-world scenarios. While the latest LLMs are astonishingly fluent when interacting with humans, they suffer from the misinformation problem by unintentionally generating factually false statements. This can lead to harmful consequences, especially when produced within sensitive contexts, such as healthcare. Yet few previous works have focused on evaluating misinformation in the long-form generation of LLMs, especially for knowledge-intensive topics. Moreover, although LLMs have been shown to perform well in different languages, misinformation evaluation has been mostly conducted in English. To this end, we present a benchmark, CARE-MI, for evaluating LLM misinformation in: 1) a sensitive topic, specifically the maternity and infant care domain; and 2) a language other than English, namely Chinese. Most importantly, we provide an innovative paradigm for building long-form generation evaluation benchmarks that can be transferred to other knowledge-intensive domains and low-resource languages. Our proposed benchmark fills the gap between the extensive usage of LLMs and the lack of datasets for assessing the misinformation generated by these models. It contains 1,612 expert-checked questions, accompanied by human-selected references. Using our benchmark, we conduct extensive experiments and find that current Chinese LLMs are far from perfect on the topic of maternity and infant care. In an effort to minimize the reliance on human resources for performance evaluation, we offer a judgment model for automatically assessing the long-form output of LLMs using the benchmark questions. Moreover, we compare potential solutions for long-form generation evaluation and provide insights for building more robust and efficient automated metrics.

翻訳日:2023-07-06 18:27:24 公開日:2023-07-04

# ブラックホール内部の非等距離ホログラフィーモデルにおけるホーキング放射からの情報を取得する:理論と量子シミュレーション

Retrieving information from Hawking radiation in the non-isometric holographic model of black hole interior: theory and quantum simulations ( http://arxiv.org/abs/2307.01454v1 )

ライセンス: Link先を確認

Ran Li, Xuanhua Wang, Kun Zhang, Jin Wang

(参考訳) 近年、ブラックホール情報パズルの潜在的な解決策として、ブラックホール内部の非等尺的ホログラムモデルが提案されている。このモデルはブラックホールのダイナミクスの2つの記述を提供する: 有効場記述と量子重力の基本記述である。このモデルの重要な側面は、ブラックホールの内部の有効場記述におけるヒルベルト空間から基本自由度へのホログラフィック写像は線型であるが非等距離写像である。本研究では、ブラックホール内部の非等尺ホログラフィーモデルに基づいて、Hayden-Preskillプロトコルの修正版を提案し、ホーキング放射の復号から情報を取り出すことが可能なデカップリング条件を示す。ブラックホール内部のダイナミクスの完全な知識を仮定し,修正ヘイデン・プレススキルプロトコルのデコードに吉田・キタエフのデコード戦略をどのように活用するかを検討する。さらに、7ビットのIBM量子プロセッサ上で確率的および決定論的デコード戦略の実験を行い、解析結果の検証を行い、非等尺モデルにおける情報検索の可能性を確認する。この研究は、量子プロセッサのブラックホール情報問題を探究するより多くの関心を刺激する。

Recently, a non-isometric holographic model of the black hole interior (Akers et al., 2022) was proposed as a potential solution to the long-standing black hole information puzzle. This model provides two descriptions of the black hole dynamics: the effective field description and the fundamental description of quantum gravity. The key aspect of this model is that the holographic map from the Hilbert space in the effective field description of the black hole interior to the fundamental degrees of freedom is linear but non-isometric. In this study, based on the non-isometric holographic model of the black hole interior, we propose a modified version of the Hayden-Preskill protocol and demonstrate the decoupling condition under which retrieving information from decoding Hawking radiation is feasible. Assuming full knowledge of the dynamics of the black hole interior, we investigate how the Yoshida-Kitaev decoding strategy can be employed to decode the modified Hayden-Preskill protocol. Furthermore, we perform experimental tests of both probabilistic and deterministic decoding strategies on the 7-qubit IBM quantum processors to validate our analytical findings and confirm the feasibility of retrieving information in the non-isometric model. This study may stimulate further interest in exploring the black hole information problem on quantum processors.

翻訳日:2023-07-06 18:26:57 公開日:2023-07-04

# 対話状態追跡のための多種多様な検索学習

Diverse Retrieval-Augmented In-Context Learning for Dialogue State Tracking ( http://arxiv.org/abs/2307.01453v1 )

ライセンス: Link先を確認

Brendan King and Jeffrey Flanigan

(参考訳) タスク指向対話の収集と注釈付けのコストが高いため、対話状態追跡(DST)におけるゼロショットおよび少数ショット学習に大きな関心が寄せられている。近年の研究では、コンテキスト内学習はごくわずかなデータしか必要とせずパラメータ更新も不要であり、少数ショット設定では学習済みの手法さえ上回ることが示されている(Hu et al. 2022)。本稿では、DSTのコンテキスト内学習に3つの改良を加えて最先端の性能を更新するRefPyDSTを提案する。まず、DSTをPythonプログラミングタスクとして定式化し、言語上の共参照をPythonの変数参照として明示的にモデル化する。第2に、コンテキスト内学習は文脈中の例に大きく依存するため、性能向上のために多様かつ関連性の高い事例を検索する手法を提案する。最後に、競合する表層形の確率を考慮したデコード時の新しい再重み付け手法を導入し、より正確な対話状態予測を行う。提案手法をMultiWOZを用いて評価し、ゼロショットおよび少数ショット設定で最先端のマルチドメイン共同ゴール精度を実現する。

There has been significant interest in zero- and few-shot learning for dialogue state tracking (DST) due to the high cost of collecting and annotating task-oriented dialogues. Recent work has demonstrated that in-context learning requires very little data and zero parameter updates, and even outperforms trained methods in the few-shot setting (Hu et al. 2022). We propose RefPyDST, which advances the state of the art with three improvements to in-context learning for DST. First, we formulate DST as a Python programming task, explicitly modeling language coreference as variable reference in Python. Second, since in-context learning depends highly on the context examples, we propose a method to retrieve a diverse set of relevant examples to improve performance. Finally, we introduce a novel re-weighting method during decoding that takes into account probabilities of competing surface forms, and produces a more accurate dialogue state prediction. We evaluate our approach using MultiWOZ and achieve state-of-the-art multi-domain joint-goal accuracy in zero- and few-shot settings.
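
To make the first idea concrete, the sketch below shows how a dialogue state might be written as Python and how a coreferring phrase can become a plain variable reference. It is an illustration only: the classes `Hotel` and `Restaurant`, the helper `to_slot_values`, the example utterances and the `domain-slot` key format are all hypothetical and are not taken from RefPyDST's actual prompt format.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Hotel:
    # Hypothetical slots for a hotel domain
    area: Optional[str] = None
    price_range: Optional[str] = None


@dataclass
class Restaurant:
    # Hypothetical slots for a restaurant domain
    area: Optional[str] = None
    food: Optional[str] = None


# Turn 1, user: "I need a cheap hotel in the centre."
hotel = Hotel(area="centre", price_range="cheap")

# Turn 2, user: "Also find an Italian restaurant in the same area as the hotel."
# The coreference "the same area as the hotel" becomes a variable reference.
restaurant = Restaurant(area=hotel.area, food="italian")


def to_slot_values(**domains):
    """Flatten domain objects into 'domain-slot' -> value pairs, skipping unset slots."""
    state = {}
    for name, obj in domains.items():
        for slot, value in asdict(obj).items():
            if value is not None:
                state[f"{name}-{slot}"] = value
    return state


dialogue_state = to_slot_values(hotel=hotel, restaurant=restaurant)
```

Writing the second turn as `area=hotel.area` rather than repeating the literal value is what lets the program resolve the reference explicitly instead of leaving it to be guessed from text.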

翻訳日:2023-07-06 18:26:36 公開日:2023-07-04

# 因果強化学習:調査

Causal Reinforcement Learning: A Survey ( http://arxiv.org/abs/2307.01452v1 )

ライセンス: Link先を確認

Zhihong Deng, Jing Jiang, Guodong Long, Chengqi Zhang

(参考訳) 強化学習は不確実性下での逐次的決定問題を解決する上で不可欠なパラダイムである。近年の多くの業績にもかかわらず、現実世界での強化学習手法の適用は依然として困難である。主な障害の1つは、強化学習エージェントが世界に対する根本的な理解を欠いているため、多くの試行錯誤相互作用を通じてゼロから学ぶ必要があることである。また、意思決定の説明を提供し、獲得した知識を一般化する上でも課題に直面している。しかし因果性は、体系的な方法で知識を形式化し、効果的な知識伝達のために不変性を活用することができるため、顕著な利点を提供する。これは、因果関係を学習プロセスに組み込むことで既存のアルゴリズムを強化することを目指す強化学習のサブフィールドである因果関係強化学習の出現につながった。本稿では,因果強化学習に関する文献を総合的に検討する。まず,因果関係と強化学習の基本概念を紹介し,因果関係が非因果関係強化学習の核となる課題にどのように対処できるかを説明する。我々は,既存の因果強化学習アプローチを対象問題と方法論に基づいて分類し,体系的に検討する。最後に,この新興分野におけるオープンイシューと今後の方向性について概説する。

Reinforcement learning is an essential paradigm for solving sequential decision problems under uncertainty. Despite many remarkable achievements in recent decades, applying reinforcement learning methods in the real world remains challenging. One of the main obstacles is that reinforcement learning agents lack a fundamental understanding of the world and must therefore learn from scratch through numerous trial-and-error interactions. They may also face challenges in providing explanations for their decisions and generalizing the acquired knowledge. Causality, however, offers a notable advantage as it can formalize knowledge in a systematic manner and leverage invariance for effective knowledge transfer. This has led to the emergence of causal reinforcement learning, a subfield of reinforcement learning that seeks to enhance existing algorithms by incorporating causal relationships into the learning process. In this survey, we comprehensively review the literature on causal reinforcement learning. We first introduce the basic concepts of causality and reinforcement learning, and then explain how causality can address core challenges in non-causal reinforcement learning. We categorize and systematically review existing causal reinforcement learning approaches based on their target problems and methodologies. Finally, we outline open issues and future directions in this emerging field.

翻訳日:2023-07-06 18:26:20 公開日:2023-07-04

# 実験データと観測データを組み合わせた二重機械学習手法

A Double Machine Learning Approach to Combining Experimental and Observational Data ( http://arxiv.org/abs/2307.01449v1 )

ライセンス: Link先を確認

Marco Morucci, Vittorio Orlandi, Harsh Parikh, Sudeepa Roy, Cynthia Rudin, Alexander Volfovsky

(参考訳) 実験的かつ観察的な研究は、しばしば検証不能な仮定のために妥当性を欠いている。本研究では,実験研究と観察研究を組み合わせた2つの機械学習手法を提案する。我々のフレームワークは、より穏やかな仮定の下で外部の妥当性と無知の違反をテストします。 1つの仮定に違反した場合、半パラメトリックに効率的な治療効果推定器を提供する。しかし,本定理は,一貫した処理効果推定のための仮定を正確に同定する必要性を強調している。実世界の3つのケーススタディにおいて,本手法の適用性を実証し,実践的設定との関連を強調した。

Experimental and observational studies often lack validity due to untestable assumptions. We propose a double machine learning approach to combine experimental and observational studies, allowing practitioners to test for assumption violations and estimate treatment effects consistently. Our framework tests for violations of external validity and ignorability under milder assumptions. When only one assumption is violated, we provide semi-parametrically efficient treatment effect estimators. However, our no-free-lunch theorem highlights the necessity of accurately identifying the violated assumption for consistent treatment effect estimation. We demonstrate the applicability of our approach in three real-world case studies, highlighting its relevance for practical settings.

翻訳日:2023-07-06 18:26:02 公開日:2023-07-04

# ReactIE:Weak Supervisionによる化学反応抽出の強化

ReactIE: Enhancing Chemical Reaction Extraction with Weak Supervision ( http://arxiv.org/abs/2307.01448v1 )

ライセンス: Link先を確認

Ming Zhong, Siru Ouyang, Minhao Jiang, Vivian Hu, Yizhu Jiao, Xuan Wang, Jiawei Han

(参考訳) 構造化化学反応情報は、実験やコンピュータ支援医薬品設計などの先進的な取り組みに携わる化学者にとって重要な役割を担っている。科学文献から構造化された反応を抽出することの重要性にもかかわらず、この目的のためのデータアノテーションは、ドメインの専門家が必要とする膨大な労力のためにコストがかかる。したがって、十分なトレーニングデータの不足は、この分野における関連するモデルの進歩の障害となる。本稿では,事前学習のための2つの弱い教師付きアプローチを組み合わせたreactieを提案する。本手法は, テキスト中の頻繁なパターンを言語的手がかりとして, 化学反応の特徴を同定する。さらに,特許記録からの合成データを遠隔監視として採用し,ドメイン知識をモデルに組み込む。実験によると、ReactIEは大幅に改善され、既存のベースラインをすべて上回っている。

Structured chemical reaction information plays a vital role for chemists engaged in laboratory work and advanced endeavors such as computer-aided drug design. Despite the importance of extracting structured reactions from scientific literature, data annotation for this purpose is cost-prohibitive due to the significant labor required from domain experts. Consequently, the scarcity of sufficient training data poses an obstacle to the progress of related models in this domain. In this paper, we propose ReactIE, which combines two weakly supervised approaches for pre-training. Our method utilizes frequent patterns within the text as linguistic cues to identify specific characteristics of chemical reactions. Additionally, we adopt synthetic data from patent records as distant supervision to incorporate domain knowledge into the model. Experiments demonstrate that ReactIE achieves substantial improvements and outperforms all existing baselines.

翻訳日:2023-07-06 18:25:52 公開日:2023-07-04

# 高密度変動を有する3次元点雲上のセマンティックセグメンテーション

Semantic Segmentation on 3D Point Clouds with High Density Variations ( http://arxiv.org/abs/2307.01489v1 )

ライセンス: Link先を確認

Ryan Faulkner, Luke Haub, Simon Ratcliffe, Ian Reid, Tat-Jun Chin

(参考訳) 調査用lidarスキャンは、広範囲および長距離にわたる測定値を取得し、局所密度の異なる大規模な3dポイント雲を生成する。既存の3dセマンティクスセグメンテーションモデルは、様々な点密度に対して頑健性を構築するために、ダウンサンプリングとアップサンプリングを行うが、測量アプリケーションからの点雲の特徴である大きな局所密度変動では効果が低くなる。この弱点を解消するため、我々はHDVNetと呼ばれる新しいアーキテクチャを提案し、それぞれが特定の点密度範囲を扱うエンコーダ-デコーダ経路のネストセットを含む。特徴写像間の相互接続を制限することで、HDVNetは低密度オブジェクトに存在しない高密度特徴の重み付けのような点の密度に基づいて各特徴の信頼性を測定することができる。入力密度の変動を効果的に処理することにより、HDVNetは、半分以上の重みを使って、実点雲上のセグメント化精度で最先端のモデルより優れる。

LiDAR scanning for surveying applications acquires measurements over wide areas and long distances, which produces large-scale 3D point clouds with significant local density variations. While existing 3D semantic segmentation models conduct downsampling and upsampling to build robustness against varying point densities, they are less effective under the large local density variations characteristic of point clouds from surveying applications. To alleviate this weakness, we propose a novel architecture called HDVNet that contains a nested set of encoder-decoder pathways, each handling a specific point density range. Limiting the interconnections between the feature maps enables HDVNet to gauge the reliability of each feature based on the density of a point, e.g., downweighting high-density features that do not exist in low-density objects. By effectively handling input density variations, HDVNet outperforms state-of-the-art models in segmentation accuracy on real point clouds with inconsistent density, using just over half the weights.

翻訳日:2023-07-06 18:18:33 公開日:2023-07-04

# SCAT: テキスト分類のための逆学習による頑健な自己教師型コントラスト学習

SCAT: Robust Self-supervised Contrastive Learning via Adversarial Training for Text Classification ( http://arxiv.org/abs/2307.01488v1 )

ライセンス: Link先を確認

Junjie Wu, Dit-Yan Yeung

(参考訳) 様々な自然言語処理(NLP)タスクにおける有望なパフォーマンスにもかかわらず、現在のNLPシステムはテキストの敵対攻撃に対して脆弱である。これらの攻撃から防御するために、既存の方法の多くは、敵の例を取り入れて敵の訓練を適用する。しかし、これらの手法は逆の例を生成するために接地ラベルに依存する必要があり、現在ではnlpや他の多くのタスクで一般的に使用される大規模モデルの事前学習には実用的でない。本稿では、ラベル付きデータを必要としない堅牢な表現を学習できるSCAT(Self-supervised Contrastive Learning via Adversarial Training)という新しい学習フレームワークを提案する。特にSCATは、データのランダムな拡張をラベルのない方法で修正し、逆例を生成する。敵の訓練は、増強と敵との対比的損失を最小化することで達成される。最近提案された2つの最先端攻撃方式を用いて、2つのテキスト分類データセット上でSCATを評価する。以上の結果から,SCATはスクラッチから頑健な言語モデルを訓練できるだけでなく,既存の事前学習言語モデルの堅牢性を大幅に向上させることができることがわかった。さらに,その柔軟性を示すために,scatと教師付き対向訓練を組み合わせることで,モデルのロバスト性をさらに向上できることを示す。

Despite their promising performance across various natural language processing (NLP) tasks, current NLP systems are vulnerable to textual adversarial attacks. To defend against these attacks, most existing methods apply adversarial training by incorporating adversarial examples. However, these methods have to rely on ground-truth labels to generate adversarial examples, rendering them impractical for the large-scale model pre-training that is commonly used nowadays for NLP and many other tasks. In this paper, we propose a novel learning framework called SCAT (Self-supervised Contrastive Learning via Adversarial Training), which can learn robust representations without requiring labeled data. Specifically, SCAT modifies random augmentations of the data in a fully label-free manner to generate adversarial examples. Adversarial training is achieved by minimizing the contrastive loss between the augmentations and their adversarial counterparts. We evaluate SCAT on two text classification datasets using two state-of-the-art attack schemes proposed recently. Our results show that SCAT can not only train robust language models from scratch, but it can also significantly improve the robustness of existing pre-trained language models. Moreover, to demonstrate its flexibility, we show that SCAT can also be combined with supervised adversarial training to further enhance model robustness.
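
The contrastive objective described above can be sketched as follows. The abstract does not give the exact loss, so this example assumes a common InfoNCE-style formulation over cosine similarities; the function names, the `temperature` parameter and the toy vectors are hypothetical, and the vectors simply stand in for encoder outputs of augmented texts and their adversarial counterparts.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(augmented, adversarial, temperature=0.5):
    """Mean InfoNCE-style loss (assumed form): row i of `augmented`
    should match row i of `adversarial` and mismatch every other row."""
    total = 0.0
    for i, anchor in enumerate(augmented):
        logits = [cosine(anchor, other) / temperature for other in adversarial]
        log_denominator = math.log(sum(math.exp(z) for z in logits))
        total += log_denominator - logits[i]  # equals -log softmax(logits)[i]
    return total / len(augmented)


# Toy representations: two augmented views and two candidate adversarial sets.
views = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each adversarial example stays near its view
swapped = [[0.1, 0.9], [0.9, 0.1]]   # each adversarial example drifts to the other view

loss_aligned = contrastive_loss(views, aligned)
loss_swapped = contrastive_loss(views, swapped)
```

Minimizing this quantity pulls each representation toward its own adversarial counterpart and away from the others in the batch, which is why the aligned set yields the lower loss here.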

翻訳日:2023-07-06 18:18:13 公開日:2023-07-04

# h-denseformer : マルチモーダル腫瘍セグメンテーションのための高効率ハイブリッド結合トランスフォーマー

H-DenseFormer: An Efficient Hybrid Densely Connected Transformer for Multimodal Tumor Segmentation ( http://arxiv.org/abs/2307.01486v1 )

ライセンス: Link先を確認

Jun Shi, Hongyu Kan, Shulan Ruan, Ziqi Zhu, Minfan Zhao, Liang Qiao, Zhaohui Wang, Hong An, Xudong Xue

(参考訳) 近年,多変量医用画像の腫瘍分割に深層学習法が広く用いられており,有望な結果が得られている。しかし、既存の手法のほとんどは、表現能力の不足、特定のモダリティ数、高い計算複雑性によって制限されている。本稿では,畳み込みニューラルネットワーク (cnn) の表現力とトランスフォーミング構造を組み合わせた,h-denseformerという腫瘍セグメント化のためのハイブリッドネットワークを提案する。具体的には、h-denseformerはトランスフォーマティブベースのマルチパス並列埋め込み(mpe)モジュールを統合し、任意の数のモダリティを入力として、異なるモダリティから融合特徴を抽出することができる。その後、マルチモーダル融合機能はエンコーダの異なるレベルに配信され、マルチモーダル学習表現が強化される。さらに,Densely Connected Transformer (DCT) ブロックを設計して,標準的な Transformer ブロックを置き換えることにより,計算量を大幅に削減する。公開マルチモーダルデータセットであるHECKTOR21とPI-CAI22について広範な実験を行った。実験の結果,提案手法は計算の複雑さを低減しつつ,既存の最先端手法よりも優れていることがわかった。ソースコードはhttps://github.com/shijun18/H-DenseFormerで入手できる。

Recently, deep learning methods have been widely used for tumor segmentation of multimodal medical images with promising results. However, most existing methods are limited by insufficient representational ability, specific modality number and high computational complexity. In this paper, we propose a hybrid densely connected network for tumor segmentation, named H-DenseFormer, which combines the representational power of the Convolutional Neural Network (CNN) and the Transformer structures. Specifically, H-DenseFormer integrates a Transformer-based Multi-path Parallel Embedding (MPE) module that can take an arbitrary number of modalities as input to extract the fusion features from different modalities. Then, the multimodal fusion features are delivered to different levels of the encoder to enhance multimodal learning representation. Besides, we design a lightweight Densely Connected Transformer (DCT) block to replace the standard Transformer block, thus significantly reducing computational complexity. We conduct extensive experiments on two public multimodal datasets, HECKTOR21 and PI-CAI22. The experimental results show that our proposed method outperforms the existing state-of-the-art methods while having lower computational complexity. The source code is available at https://github.com/shijun18/H-DenseFormer.

翻訳日:2023-07-06 18:17:50 公開日:2023-07-04

# Nexus sine qua non:多変量時系列の時空間予測のための基本結合ニューラルネットワーク

Nexus sine qua non: Essentially connected neural networks for spatial-temporal forecasting of multivariate time series ( http://arxiv.org/abs/2307.01482v1 )

ライセンス: Link先を確認

Tong Nie, Guoyang Qin, Yunpeng Wang, Jian Sun

(参考訳) 多変量時系列のモデリングと予測は、実践者の意思決定を促進するだけでなく、基礎となる力学系の科学的理解を深める。近年，時空間グラフニューラルネットワーク(STGNN)が強力な予測器として登場し，時空間表現を学習するためのデファクトモデルとなっている。しかし、既存のSTGNNのアーキテクチャは、一連の凝ったレイヤーを積み重ねることで複雑になりがちである。設計されたモデルは冗長または不可解になりうるため、複雑さと拡張性に大きな課題をもたらす。このような懸念から、我々は現代のSTGNNの設計を再検討し、強力で効率的な神経予測器に寄与する中核原則を特定する。本稿では，密なエンコーダ・デコーダと，ノード識別を用いたメッセージパッシング層によって完全に定義され，TCN，RNN，Transformerなどの複雑な逐次モジュールを持たないコンパクトな予測モデルを提案する。実験結果は、適切な帰納的基盤(inductive basis)を持つ単純でエレガントなモデルが、精巧な設計の最先端手法と比べても遜色なく、しかも時空間予測問題に対してはるかに解釈しやすく計算効率が高いことを示している。我々の知見が、より簡潔な神経予測アーキテクチャの設計を再考する将来の研究に新たな地平を開くことを願っている。

Modeling and forecasting multivariate time series not only facilitates the decision making of practitioners, but also deepens our scientific understanding of the underlying dynamical systems. Spatial-temporal graph neural networks (STGNNs) have emerged as powerful predictors and have become the de facto models for learning spatiotemporal representations in recent years. However, existing architectures of STGNNs tend to be complicated by stacking a series of fancy layers. The designed models could be either redundant or enigmatic, which poses great challenges to their complexity and scalability. Such concerns prompt us to re-examine the designs of modern STGNNs and identify core principles that contribute to a powerful and efficient neural predictor. Here we present a compact predictive model that is fully defined by a dense encoder-decoder and a message-passing layer, powered by node identifications, without any complex sequential modules, e.g., TCNs, RNNs, and Transformers. Empirical results demonstrate how a simple and elegant model with proper inductive basis can compare favorably w.r.t. the state of the art with elaborate designs, while being much more interpretable and computationally efficient for spatial-temporal forecasting problems. We hope our findings would open new horizons for future studies to revisit the design of more concise neural forecasting architectures.

翻訳日:2023-07-06 18:17:25 公開日:2023-07-04

# 量子プログラムのブラックボックステストにおける等価性、同一性、ユニタリティチェック

Equivalence, Identity, and Unitarity Checking in Black-Box Testing of Quantum Programs ( http://arxiv.org/abs/2307.01481v1 )

ライセンス: Link先を確認

Peixun Long and Jianjun Zhao

(参考訳) 量子プログラムは本質的に非決定論的な振る舞いを示し、古典プログラムに比べてエラーの発見がより困難である。量子プログラムにはいくつかのテスト手法が提案されているが、ブラックボックステストにおける基本的な問題を見落としていることが多い。本稿では，量子プログラムのブラックボックステストにおける等価性，同一性，ユニタリ性チェックの課題に対処するために特別に設計された3つの新しいアルゴリズムを提案することで，このギャップを埋める。また、等価性チェックとユニタリ性チェックの特殊化バージョンを含むこれらのアルゴリズムの最適化手法についても検討し、性能と有効性を最大化するためのパラメータ選択に関する有益な知見を提供する。提案手法の有効性を評価するために包括的な実験評価を行い，提案手法が等価性，同一性，ユニタリ性チェックを厳密に実行でき，量子プログラムのブラックボックステストを堅牢に支援できることを示した。

Quantum programs exhibit inherent non-deterministic behavior, which poses more significant challenges for error discovery compared to classical programs. While several testing methods have been proposed for quantum programs, they often overlook fundamental questions in black-box testing. In this paper, we bridge this gap by presenting three novel algorithms specifically designed to address the challenges of equivalence, identity, and unitarity checking in black-box testing of quantum programs. We also explore optimization techniques for these algorithms, including specialized versions for equivalence and unitarity checking, and provide valuable insights into parameter selection to maximize performance and effectiveness. To evaluate the effectiveness of our proposed methods, we conducted comprehensive experimental evaluations, which demonstrate that our methods can rigorously perform equivalence, identity, and unitarity checking, offering robust support for black-box testing of quantum programs.

翻訳日:2023-07-06 18:17:00 公開日:2023-07-04

# バイアス緩和:モデル説明の改善による画像分類の強化

Mitigating Bias: Enhancing Image Classification by Improving Model Explanations ( http://arxiv.org/abs/2307.01473v1 )

ライセンス: Link先を確認

Raha Ahmadi, Mohammad Javad Rajabi, Mohamamd Khalooiem Mohammad Sabokrou

(参考訳) ディープラーニングモデルは、トレーニングデータから複雑なパターンや概念を学ぶ際、顕著な能力を示した。しかし、近年の研究では、これらのモデルは画像の背景に存在する単純で容易に識別できる特徴に大きく依存する傾向にあることが示されている。この現象は、画像への関心の重要要素が隠蔽される可能性があるため、画像分類器に挑戦する。本稿では,この問題に対処する新しいアプローチを提案し,画像分類器による主概念の学習を改善する。我々の中心的な考え方は、分類作業中にモデルがフォアグラウンドに注意を向けるのを同時に導くことを中心に展開する。関心の主対象をカプセル化した前景を強調することで,背景の優越的な影響からモデルの焦点を逸脱させることを目指している。これを実現するために、モデルに十分な注意を前景に割り当てるよう促すメカニズムを導入する。損失関数の変更や追加のアーキテクチャコンポーネントの導入など,さまざまな戦略を検討し,画像内の主概念を効果的に把握できるようにする。さらに,様々な注意機構がモデル性能に与える影響について検討し,その効果について考察する。ベンチマークデータセットの広範な実験を通じて,画像分類器の分類精度を向上させるための提案手法の有効性を実証する。本研究は,画像内の主概念の理解と表現における前景的注意の重要性を浮き彫りにしたものである。本研究は,画像分類分野の進展に寄与し,より堅牢で正確なディープラーニングモデルの開発に有用な知見を提供する。

Deep learning models have demonstrated remarkable capabilities in learning complex patterns and concepts from training data. However, recent findings indicate that these models tend to rely heavily on simple and easily discernible features present in the background of images rather than the main concepts or objects they are intended to classify. This phenomenon poses a challenge to image classifiers as the crucial elements of interest in images may be overshadowed. In this paper, we propose a novel approach to address this issue and improve the learning of main concepts by image classifiers. Our central idea revolves around concurrently guiding the model's attention toward the foreground during the classification task. By emphasizing the foreground, which encapsulates the primary objects of interest, we aim to shift the focus of the model away from the dominant influence of the background. To accomplish this, we introduce a mechanism that encourages the model to allocate sufficient attention to the foreground. We investigate various strategies, including modifying the loss function or incorporating additional architectural components, to enable the classifier to effectively capture the primary concept within an image. Additionally, we explore the impact of different foreground attention mechanisms on model performance and provide insights into their effectiveness. Through extensive experimentation on benchmark datasets, we demonstrate the efficacy of our proposed approach in improving the classification accuracy of image classifiers. Our findings highlight the importance of foreground attention in enhancing model understanding and representation of the main concepts within images. The results of this study contribute to advancing the field of image classification and provide valuable insights for developing more robust and accurate deep-learning models.

翻訳日:2023-07-06 18:16:43 公開日:2023-07-04

# Beyond Conservatism: オフラインマルチエージェント強化学習における拡散ポリシー

Beyond Conservatism: Diffusion Policies in Offline Multi-agent Reinforcement Learning ( http://arxiv.org/abs/2307.01472v1 )

ライセンス: Link先を確認

Zhuoran Li, Ling Pan and Longbo Huang

(参考訳) 本稿では，オフラインマルチエージェント強化学習(MARL)のための新しい拡散オフラインマルチエージェントモデル(DOM2)を提案する。政策設計における保守主義に主に依存する既存のアルゴリズムとは異なり、DOM2は拡散に基づいてポリシーの表現力と多様性を高める。具体的には，ポリシーネットワークに拡散モデルを組み込み，訓練において軌道に基づくデータ拡張方式を提案する。これらの重要な要素により、我々のアルゴリズムは環境変化に対してより堅牢になり、性能、一般化、データ効率が大幅に向上した。実験の結果，DOM2はマルチエージェント粒子およびマルチエージェント MuJoCo 環境において既存の最先端手法よりも優れており，その高い表現力と多様性により，シフトした環境において大幅に優れた一般化を示すことがわかった。さらに、DOM2はデータ効率に優れ、既存のアルゴリズムに比べて20分の1以下のデータで最先端のパフォーマンスを達成することができる。

We present a novel Diffusion Offline Multi-agent Model (DOM2) for offline Multi-Agent Reinforcement Learning (MARL). Different from existing algorithms that rely mainly on conservatism in policy design, DOM2 enhances policy expressiveness and diversity based on diffusion. Specifically, we incorporate a diffusion model into the policy network and propose a trajectory-based data-augmentation scheme in training. These key ingredients make our algorithm more robust to environment changes and achieve significant improvements in performance, generalization and data-efficiency. Our extensive experimental results demonstrate that DOM2 outperforms existing state-of-the-art methods in multi-agent particle and multi-agent MuJoCo environments, and generalizes significantly better in shifted environments thanks to its high expressiveness and diversity. Furthermore, DOM2 shows superior data efficiency and can achieve state-of-the-art performance with more than 20 times less data compared to existing algorithms.

翻訳日:2023-07-06 18:16:20 公開日:2023-07-04

# 運転者の視線推定と視線行動理解への応用に関するレビュー

A Review of Driver Gaze Estimation and Application in Gaze Behavior Understanding ( http://arxiv.org/abs/2307.01470v1 )

ライセンス: Link先を確認

Pavan Kumar Sharma and Pranamesh Chakraborty

(参考訳) 運転者の視線は、運転者の注意力検出、視覚的注意散漫の検出、視線行動理解、運転支援システムの構築など、様々な視線ベースのアプリケーションにおいて重要な役割を果たす。本研究の主な目的は，運転者視線の基礎，運転者視線推定方法，実世界の運転シナリオにおける応用を包括的に要約することである。まず，ヘッドマウント型およびリモート設置型の視線推定を含む運転者の視線に関する基礎と，これらのデータ収集手法で使用される用語について論じる。次に、既存のベンチマーク運転者視線データセットを列挙し、収集方法とそのデータ収集に使用される機器を強調する。続いて、従来の機械学習とディープラーニングに基づく技術を中心に、運転者の視線推定に使用されるアルゴリズムについて議論する。推定された運転者の視線は、交差点、オンランプ、オフランプ、車線変更の際の視線行動の理解や、道路沿いの広告構造物の影響の判定に用いられる。最後に，既存文献の限界，課題，および運転者の視線推定と視線に基づく応用における今後の展望について論じる。

Driver gaze plays an important role in different gaze-based applications such as driver attentiveness detection, visual distraction detection, gaze behavior understanding, and building driver assistance systems. The main objective of this study is to provide a comprehensive summary of driver gaze fundamentals, methods to estimate driver gaze, and its applications in real-world driving scenarios. We first discuss the fundamentals related to driver gaze, involving head-mounted and remote-setup-based gaze estimation and the terminologies used for each of these data collection methods. Next, we list the existing benchmark driver gaze datasets, highlighting the collection methodology and the equipment used for such data collection. This is followed by a discussion of the algorithms used for driver gaze estimation, which primarily involve traditional machine learning and deep learning based techniques. The estimated driver gaze is then used for understanding gaze behavior while maneuvering through intersections, on-ramps, off-ramps, lane changing, and determining the effect of roadside advertising structures. Finally, we discuss the limitations in the existing literature, challenges, and the future scope in driver gaze estimation and gaze-based applications.

翻訳日:2023-07-06 18:16:05 公開日:2023-07-04

# 単一ポートレートからアニマタブルな3次元カートゥーンの顔を生成する

Generating Animatable 3D Cartoon Faces from Single Portraits ( http://arxiv.org/abs/2307.01468v1 )

ライセンス: Link先を確認

Chuanyu Pan, Guowei Yang, Taijiang Mu, and Yu-Kun Lai

(参考訳) 仮想現実(VR)技術のブームにより、カスタマイズされた3Dアバターの必要性が高まっている。しかし、従来の3Dアバターモデリングの手法は、時間を要するか、モデル化される人物との類似性を維持できない。1枚の肖像画からアニマタブルな3Dカートゥーンの顔を生成する新しい枠組みを提案する。まず、入力された現実世界のポートレートをStyleGANを用いて様式化されたカートゥーン画像に変換する。次に，詳細なテクスチャを持つ3次元カートゥーン顔を復元する2段階の再構成法を提案する。この方法は，まずテンプレートモデルに基づく粗い推定を行い，次にランドマークの監督下で非剛体変形によりモデルを洗練する。最後に，手作業で作成したテンプレートと変形伝達に基づく，意味を保存する顔リギング手法を提案する。先行技術と比較して，定性的および定量的な結果は，本手法が精度，審美性，類似性の基準においてより優れていることを示している。さらに，我々の3次元モデルのリアルタイム顔アニメーションの能力を実演する。

With the booming of virtual reality (VR) technology, there is a growing need for customized 3D avatars. However, traditional methods for 3D avatar modeling are either time-consuming or fail to retain similarity to the person being modeled. We present a novel framework to generate animatable 3D cartoon faces from a single portrait image. We first transfer an input real-world portrait to a stylized cartoon image with a StyleGAN. Then we propose a two-stage reconstruction method to recover the 3D cartoon face with detailed texture, which first makes a coarse estimation based on template models, and then refines the model by non-rigid deformation under landmark supervision. Finally, we propose a semantic preserving face rigging method based on manually created templates and deformation transfer. Compared with prior arts, qualitative and quantitative results show that our method achieves better accuracy, aesthetics, and similarity criteria. Furthermore, we demonstrate the capability of real-time facial animation of our 3D model.

翻訳日:2023-07-06 18:15:45 公開日:2023-07-04

# Ego4D長期行動予測チャレンジ2023の技術報告

Technical Report for Ego4D Long Term Action Anticipation Challenge 2023 ( http://arxiv.org/abs/2307.01467v1 )

ライセンス: Link先を確認

Tatsuya Ishibashi, Kosuke Ono, Noriyuki Kugo, Yuji Sato

(参考訳) 本稿では，Ego4D Long-Term Action Anticipation Challenge 2023に対するアプローチの技術的詳細について述べる。このタスクの目的は、入力されたビデオが与えられたとき、任意の時刻以降に起こる将来のアクションのシーケンスを予測することである。そこで本研究では，ビデオからクリップレベルの特徴を生成するエンコーダと，複数のクリップレベルの特徴を統合するアグリゲータと，Z個の将来のアクションを出力するデコーダからなるベースラインモデルに対して，3つの改良点を導入する。 1) SlowFast と SlowFast-CLIP のモデルアンサンブル 2) 将来のアクションの順序制約を緩和するラベルスムージング 3) 単語共起に基づく動作クラス(verb,noun)の予測の制約。提案手法はベースラインの性能を上回り，公開リーダボード上で第2位のソリューションとして記録された。

In this report, we describe the technical details of our approach for the Ego4D Long-Term Action Anticipation Challenge 2023. The aim of this task is to predict a sequence of future actions that will take place at an arbitrary time or later, given an input video. To accomplish this task, we introduce three improvements to the baseline model, which consists of an encoder that generates clip-level features from the video, an aggregator that integrates multiple clip-level features, and a decoder that outputs Z future actions: 1) a model ensemble of SlowFast and SlowFast-CLIP; 2) label smoothing to relax order constraints for future actions; 3) constraining the prediction of the action class (verb, noun) based on word co-occurrence. Our method outperformed the baseline and was recorded as the second-place solution on the public leaderboard.
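Label smoothing, the second improvement listed above, can be sketched in a few lines. The report's abstract does not spell out its exact scheme, so the following is standard uniform label smoothing over classes, offered only as an assumed illustration; `smooth_labels`, `cross_entropy`, and the value of `eps` are hypothetical choices.

```python
import math


def smooth_labels(target, num_classes, eps=0.1):
    """Return a smoothed target distribution for one class index.

    Standard uniform smoothing (assumed here): the true class keeps
    1 - eps of the mass, and eps is spread evenly over all classes.
    """
    if not 0 <= target < num_classes:
        raise ValueError("target must be a valid class index")
    dist = [eps / num_classes] * num_classes
    dist[target] += 1.0 - eps
    return dist


def cross_entropy(probs, soft_target):
    """Cross-entropy between predicted probabilities and a soft target."""
    return -sum(t * math.log(p) for t, p in zip(soft_target, probs) if t > 0)


# Four action classes, true class index 2, 20% of the mass smoothed away.
soft = smooth_labels(2, 4, eps=0.2)
loss = cross_entropy([0.1, 0.1, 0.7, 0.1], soft)
```

With a soft target, a prediction that puts some probability on a neighbouring action is penalized less harshly than under a one-hot target, which is one plausible way to read "relaxing order constraints".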

翻訳日:2023-07-06 18:15:28 公開日:2023-07-04

# SelfFed: IoMTにおけるデータ不均一性とラベル不足に対する自己教師付き連合学習

SelfFed: Self-supervised Federated Learning for Data Heterogeneity and Label Scarcity in IoMT ( http://arxiv.org/abs/2307.01514v1 )

ライセンス: Link先を確認

Sunder Ali Khowaja, Kapal Dev, Syed Muhammad Anwar, Marius George Linguraru

(参考訳) 連合学習パラダイムにおける自己教師あり学習は，ラベルなしで孤立したデータの協調学習能力により，産業と研究の両方において大きな関心を集めている。しかし，自己教師型の連合学習戦略は，ラベル不足や多種多様なデータ分布，すなわちデータ不均一性による性能劣化に悩まされている。本稿では，IoMT(Internet of Medical Things)のためのSelfFedフレームワークを提案する。提案するSelfFedフレームワークは2段階で動作する。第1フェーズは、Swin Transformerベースのエンコーダを用いた拡張モデリングを分散的に実行する事前学習パラダイムである。 SelfFedフレームワークの第1フェーズは、データの不均一性を克服するのに役立つ。第2フェーズは、対照的なネットワークと、目的タスクの限定ラベル付きデータに基づいて訓練される新たな集約戦略を分散的に導入する、微調整パラダイムである。この微調整段階はラベル不足問題を克服する。我々は，公開されている医用画像データセットに関する実験分析を行い，非独立・同一分散(non-IID)データとラベル不足に関して，提案するSelfFedフレームワークが既存のベースラインよりも優れていることを示す。非IID設定において，RetinaおよびCOVID-FLデータセットでそれぞれ最大8.8%と4.1%の改善を実現する。さらに，提案手法は，少数の (10%) ラベル付きインスタンスでトレーニングしても，既存のベースラインよりも優れている。

Self-supervised learning in federated learning paradigm has been gaining a lot of interest both in industry and research due to the collaborative learning capability on unlabeled yet isolated data. However, self-supervised based federated learning strategies suffer from performance degradation due to label scarcity and diverse data distributions, i.e., data heterogeneity. In this paper, we propose the SelfFed framework for Internet of Medical Things (IoMT). Our proposed SelfFed framework works in two phases. The first phase is the pre-training paradigm that performs augmentive modeling using Swin Transformer based encoder in a decentralized manner. The first phase of SelfFed framework helps to overcome the data heterogeneity issue. The second phase is the fine-tuning paradigm that introduces contrastive network and a novel aggregation strategy that is trained on limited labeled data for a target task in a decentralized manner. This fine-tuning stage overcomes the label scarcity problem. We perform our experimental analysis on publicly available medical imaging datasets and show that our proposed SelfFed framework performs better when compared to existing baselines concerning non-independent and identically distributed (IID) data and label scarcity. Our method achieves a maximum improvement of 8.8% and 4.1% on Retina and COVID-FL datasets on non-IID dataset. Further, our proposed method outperforms existing baselines even when trained on a few (10%) labeled instances.

翻訳日:2023-07-06 18:09:44 公開日:2023-07-04

# コンテナ再配置問題におけるエネルギー消費最小化のための再配置ルールの自動設計

Automated design of relocation rules for minimising energy consumption in the container relocation problem ( http://arxiv.org/abs/2307.01513v1 )

ライセンス: Link先を確認

Marko Đurasević, Mateja Đumić, Rebeka Čorić, Francisco Javier Gil-Gala

(参考訳) コンテナ再配置問題は、所定の目的を最小化しつつ、すべてのコンテナを所定の順序で取り出すためのコンテナ再配置の系列を見つけることを目的とした組合せ最適化問題である。再配置ルール(RR)は、優先度関数と再配置スキームから構成されており、その柔軟性と効率性から、この問題を解くために一般的に用いられるヒューリスティックである。近年，実世界の多くの問題において，エネルギー消費を考慮することがますます重要になっている。しかし、この問題設定に対するRRは存在せず、手動で設計する必要がある。この問題を回避する方法の1つは、ハイパーヒューリスティクスを適用して新しいRRを自動設計することである。本研究では，エネルギー消費の最小化を目標とするRRで用いる優先度関数の獲得に遺伝的プログラミングを用いる。提案手法を，優先度関数の設計に用いられる文献中の遺伝的アルゴリズムと比較する。その結果、遺伝的プログラミングによって設計されたRRが最高の性能を発揮することが示された。

The container relocation problem is a combinatorial optimisation problem aimed at finding a sequence of container relocations to retrieve all containers in a predetermined order by minimising a given objective. Relocation rules (RRs), which consist of a priority function and relocation scheme, are heuristics commonly used for solving the mentioned problem due to their flexibility and efficiency. Recently, in many real-world problems it is becoming increasingly important to consider energy consumption. However, for this variant no RRs exist and would need to be designed manually. One possibility to circumvent this issue is by applying hyperheuristics to automatically design new RRs. In this study we use genetic programming to obtain priority functions used in RRs whose goal is to minimise energy consumption. We compare the proposed approach with a genetic algorithm from the literature used to design the priority function. The results obtained demonstrate that the RRs designed by genetic programming achieve the best performance.
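The notion of a relocation rule, a priority function plus a relocation scheme, can be made concrete with a small sketch. It assumes containers are numbered 1..n in retrieval order, stacks are lists from bottom to top with no height limit, at least two stacks exist, and the number of relocations stands in for cost. The hand-written `min_max_priority` is a hypothetical example of a priority function; it is not one of the evolved, energy-aware functions from the paper.

```python
def min_max_priority(container, stack):
    """Score a destination stack for `container`; lower is better.

    Hypothetical hand-written rule: prefer stacks where the moved
    container will not block anything that must leave earlier.
    """
    if not stack:
        return (0, float("inf"))   # no blockage, but keep empty stacks as a last resort
    lowest = min(stack)
    if lowest > container:
        return (0, lowest)         # no new blockage; prefer the tightest fit
    return (1, -lowest)            # blockage unavoidable; postpone it as long as possible


def retrieve_all(bay, priority=min_max_priority):
    """Retrieve containers 1..n in order and return the number of relocations.

    Relocation scheme (assumed): only containers stacked above the
    current target are moved, one at a time, to the best-scoring stack.
    """
    bay = [list(stack) for stack in bay]          # work on a copy
    total = sum(len(stack) for stack in bay)
    relocations = 0
    for target in range(1, total + 1):
        src = next(i for i, stack in enumerate(bay) if target in stack)
        while bay[src][-1] != target:
            moved = bay[src].pop()
            candidates = [i for i in range(len(bay)) if i != src]
            dst = min(candidates, key=lambda i: priority(moved, bay[i]))
            bay[dst].append(moved)
            relocations += 1
        bay[src].pop()                            # the target is on top: retrieve it
    return relocations


moves_two_stacks = retrieve_all([[1, 3], [2]])
moves_three_stacks = retrieve_all([[1, 3], [2], []])
```

In an automated-design setting such as the one the abstract describes, the body of the priority function is what gets searched over, while the surrounding relocation scheme stays fixed.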

翻訳日:2023-07-06 18:09:20 公開日:2023-07-04

# 薬物-薬物相互作用予測のためのココントラスト学習と関係認識サブグラフ埋め込み

Relation-aware subgraph embedding with co-contrastive learning for drug-drug interaction prediction ( http://arxiv.org/abs/2307.01507v1 )

ライセンス: Link先を確認

Mengying Jiang and Guizhong Liu and Biao Zhao and Yuanchao Su and Weiqiang Jin

(参考訳) 関係認識サブグラフ埋め込みは、多関係の薬物-薬物相互作用(DDI)の予測に有望である。通常、既存のほとんどの手法は多関係DDIグラフの構築から始まり、DDIグラフから薬物の関係認識サブグラフ埋め込み(RaSE)を学習する。しかしながら、既存のほとんどのアプローチは、新しい薬物のRaSEの学習に限界があり、テストDDIがそのような薬物を含む場合、深刻な過学習をもたらす。そこで本稿では，共対照(co-contrastive)学習を伴う関係認識サブグラフ埋め込みに基づく新しいDDI予測手法RaSECoを提案する。 RaSECoは、多関係DDIグラフと多属性に基づく薬物-薬物類似性(DDS)グラフという、2つの異種薬物グラフを構築する。 2つのグラフはそれぞれ、薬物のRaSEの学習と伝播に使用され、それによって新しい薬物を含む全ての薬物が効果的なRaSEを集約できる。さらに，薬物ペア(DP)の埋め込みを強化するために，クロスビュー対照機構を採用している。 RaSECoは2つの異なるビュー(相互作用ビューと類似性ビュー)からDP埋め込みを学習し、これらのビューが互いに協調的に監督し合うよう促すことで、より識別的なDP埋め込みを得る。 2つの実データセットを用いて3つのタスクにおけるRaSECoの有効性を評価する。実験の結果，RaSECoは既存の最先端予測手法よりも優れていた。

Relation-aware subgraph embedding is promising for predicting multi-relational drug-drug interactions (DDIs). Typically, most existing methods begin by constructing a multi-relational DDI graph and then learning relation-aware subgraph embeddings (RaSEs) of drugs from the DDI graph. However, most existing approaches are usually limited in learning RaSEs of new drugs, leading to serious over-fitting when the test DDIs involve such drugs. To alleviate this issue, we propose a novel DDI prediction method based on relation-aware subgraph embedding with co-contrastive learning, RaSECo. RaSECo constructs two heterogeneous drug graphs: a multi-relational DDI graph and a multi-attribute-based drug-drug similarity (DDS) graph. The two graphs are used respectively for learning and propagating the RaSEs of drugs, thereby ensuring that all drugs, including new ones, can aggregate effective RaSEs. Additionally, we employ a cross-view contrastive mechanism to enhance drug-pair (DP) embedding. RaSECo learns DP embeddings from two distinct views (interaction and similarity views) and encourages these views to supervise each other collaboratively to obtain more discriminative DP embeddings. We evaluate the effectiveness of our RaSECo on three different tasks using two real datasets. The experimental results demonstrate that RaSECo outperforms existing state-of-the-art prediction methods.

翻訳日:2023-07-06 18:09:06 公開日:2023-07-04

# All in One: グラフニューラルネットワークのためのマルチタスクプロンプティング

All in One: Multi-task Prompting for Graph Neural Networks ( http://arxiv.org/abs/2307.01504v1 )

ライセンス: Link先を確認

Xiangguo Sun, Hong Cheng, Jia Li, Bo Liu, Jihong Guan

(参考訳) 近年、「事前学習と微調整」は、一般的なグラフ知識を活用して各アプリケーションにおけるグラフアノテーションの不足を緩和できるため、多くのグラフタスクの標準ワークフローとして採用されている。しかし、ノードレベル、エッジレベル、グラフレベルのグラフタスクは非常に多様であり、事前学習のプリテキストタスクは、これらの複数のタスクと互換性がないことが多い。このギャップは、特定のアプリケーションに対して「負の転移」を引き起こし、結果を悪化させる可能性さえある。様々なNLPタスクで事前知識の活用に大きな効果を示してきた自然言語処理(NLP)のプロンプト学習に着想を得て，事前学習されたモデルと各種グラフタスクのギャップを埋めることを動機として，グラフに対するプロンプティングについて検討する。本稿では，グラフモデルのための新しいマルチタスクプロンプティング手法を提案する。具体的には、まずグラフプロンプトと言語プロンプトの形式を、プロンプトトークン、トークン構造、挿入パターンによって統一する。このようにして、NLPのプロンプティングの考え方をグラフ領域にシームレスに導入することができる。次に，各種グラフタスクと最先端の事前学習戦略とのギャップをさらに狭めるため，様々なグラフアプリケーションのタスク空間を調査し，下流の問題をグラフレベルのタスクに再定式化する。その後、メタラーニングを導入し、グラフのマルチタスクプロンプトのより良い初期化を効率的に学習することで、異なるタスクに対してより信頼性が高く汎用的なプロンプティングフレームワークを実現する。我々は広範囲な実験を行い、その結果は本手法の優位性を実証している。

Recently, "pre-training and fine-tuning" has been adopted as a standard workflow for many graph tasks since it can take general graph knowledge to relieve the lack of graph annotations from each application. However, graph tasks with node level, edge level, and graph level are far diversified, making the pre-training pretext often incompatible with these multiple tasks. This gap may even cause a "negative transfer" to the specific application, leading to poor results. Inspired by the prompt learning in natural language processing (NLP), which has presented significant effectiveness in leveraging prior knowledge for various NLP tasks, we study the prompting topic for graphs with the motivation of filling the gap between pre-trained models and various graph tasks. In this paper, we propose a novel multi-task prompting method for graph models. Specifically, we first unify the format of graph prompts and language prompts with the prompt token, token structure, and inserting pattern. In this way, the prompting idea from NLP can be seamlessly introduced to the graph area. Then, to further narrow the gap between various graph tasks and state-of-the-art pre-training strategies, we further study the task space of various graph applications and reformulate downstream problems to the graph-level task. Afterward, we introduce meta-learning to efficiently learn a better initialization for the multi-task prompt of graphs so that our prompting framework can be more reliable and general for different tasks. We conduct extensive experiments, results from which demonstrate the superiority of our method.

翻訳日:2023-07-06 18:08:43 公開日:2023-07-04

# 多言語設定におけるジェンダーバイアスの評価と緩和について

On Evaluating and Mitigating Gender Biases in Multilingual Settings ( http://arxiv.org/abs/2307.01503v1 )

ライセンス: Link先を確認

Aniket Vashishtha, Kabir Ahuja, Sunayana Sitaram

(参考訳) 言語モデルにおけるジェンダーバイアスの理解と除去は、自然言語処理における長年の問題であるが、これまでの研究は主に英語に限られていた。本研究では，多言語環境におけるバイアスの評価と緩和に関する課題のいくつかを検討する。これらの課題は，英語以外の言語，特に非西洋的な文脈におけるバイアス評価のための既存のベンチマークやリソースの欠如に起因する。本稿では、まず、人間のアノテーションを用いてDisCoを複数のインド言語に拡張することにより、事前学習されたマスク言語モデルのジェンダーバイアスを評価するベンチマークを作成する。さらに，様々なデバイアス手法を英語以外の言語に拡張し，提案する指標を用いて，SOTAの大規模多言語モデルに対するそれらの有効性を評価する。全体として、我々の研究は、多言語環境で社会的バイアスを研究する際に生じる課題を強調し、より多くの言語へスケールするための一歩となるリソースと緩和技術を提供する。

While understanding and removing gender biases in language models has been a long-standing problem in Natural Language Processing, prior research work has primarily been limited to English. In this work, we investigate some of the challenges with evaluating and mitigating biases in multilingual settings which stem from a lack of existing benchmarks and resources for bias evaluation beyond English especially for non-western context. In this paper, we first create a benchmark for evaluating gender biases in pre-trained masked language models by extending DisCo to different Indian languages using human annotations. We extend various debiasing methods to work beyond English and evaluate their effectiveness for SOTA massively multilingual models on our proposed metric. Overall, our work highlights the challenges that arise while studying social biases in multilingual settings and provides resources as well as mitigation techniques to take a step toward scaling to more languages.

翻訳日:2023-07-06 18:08:17 公開日:2023-07-04

# HEDI: 切開ヘルニア修復のための生体力学的評価・可視化ツールの初回臨床応用と結果

HEDI: First-Time Clinical Application and Results of a Biomechanical Evaluation and Visualisation Tool for Incisional Hernia Repair ( http://arxiv.org/abs/2307.01502v1 )

ライセンス: Link先を確認

Jacob J. Relle, Samuel Voß, Ramesch Raschidi, Regine Nessel, Johannes Görich, Mark O. Wielpütz, Thorsten Löffler, Vincent Heuveline, Friedrich Kallinowski, Philipp D. Lösel

(参考訳) 腹壁欠損は、しばしば痛み、不快感、切開ヘルニアの再発を招き、世界中で重大な罹患と繰り返しの外科的修復をもたらしている。大きなヘルニアに対するメッシュ修復は通常，筋肉の活性化，腹腔内圧，組織弾性，腹壁の膨張といった生体力学的側面を考慮せずに，固定された重なり幅と欠損面積に基づいて行われる。この問題を解決するため，不安定な腹壁を考慮に入れた切開ヘルニア修復に対する生体力学的アプローチを提案する。さらに，Valsalva手技を伴うダイナミックCTを用いてヘルニアの大きさ，体積，腹壁不安定性を自動的に検出し評価するツールであるHEDIを紹介する。 31例の術前評価におけるHEDIの初回臨床応用では，既報の成績と比較して成功率が有意に向上し，3年間の経過観察後も全例で痛みがなく，ヘルニアの再発も認められなかった。

Abdominal wall defects often lead to pain, discomfort, and recurrence of incisional hernias, resulting in significant morbidity and repeated surgical repairs worldwide. Mesh repair for large hernias is usually based on the defect area with a fixed overlap, without considering biomechanical aspects such as muscle activation, intra-abdominal pressure, tissue elasticity, and abdominal wall distention. To address this issue, we present a biomechanical approach to incisional hernia repair that takes into account the unstable abdominal wall. Additionally, we introduce HEDI, a tool that uses dynamic computed tomography with Valsalva maneuver to automatically detect and assess hernia size, volume, and abdominal wall instability. Our first clinical application of HEDI in the preoperative evaluation of 31 patients shows significantly improved success rates compared to reported rates, with all patients remaining pain-free and showing no hernia recurrence after three years of follow-up.

翻訳日:2023-07-06 18:08:03 公開日:2023-07-04

# 非エルミート境界項を持つハミルトンからの到着時間

Arrival time from Hamiltonian with non-hermitian boundary term ( http://arxiv.org/abs/2307.01501v1 )

ライセンス: Link先を確認

Tajron Jurić, Hrvoje Nikolić

(参考訳) 検出器への到達の量子確率密度を求める新しい方法を開発した。検出器の外領域に制限された量子状態の進化は、非エルミート境界項を含む制限されたハミルトニアンによって記述される。非エルミート項は境界を通る確率電流演算子のフラックスに比例していることが示されており、これは到達確率密度が確率電流のフラックスに等しいことを意味する。

We develop a new method for finding the quantum probability density of arrival at the detector. The evolution of the quantum state restricted to the region outside of the detector is described by a restricted Hamiltonian that contains a non-hermitian boundary term. The non-hermitian term is shown to be proportional to the flux of the probability current operator through the boundary, which implies that the arrival probability density is equal to the flux of the probability current.
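The link between arrival density and boundary flux stated above can be motivated by the textbook continuity equation. The following is only a sketch, assuming a single nonrelativistic particle of mass $m$ in a real potential; it is not the operator derivation of the paper. The symbols $R$ (the region outside the detector), $P(t)$ (the probability of still being in $R$) and $\mathcal{P}(t)$ (the arrival probability density) are introduced here for illustration, and $d\mathbf{S}$ points out of $R$, that is, into the detector.

```latex
\begin{align}
  \partial_t |\psi|^2 + \nabla\cdot\mathbf{j} &= 0, &
  \mathbf{j} &= \frac{\hbar}{m}\,\mathrm{Im}\!\left(\psi^*\nabla\psi\right), \\
  P(t) &= \int_R |\psi(\mathbf{x},t)|^2 \, d^3x, \\
  \mathcal{P}(t) &= -\frac{dP}{dt}
    = \int_R \nabla\cdot\mathbf{j}\; d^3x
    = \oint_{\partial R} \mathbf{j}\cdot d\mathbf{S}.
\end{align}
```

The last equality is the divergence theorem: the rate at which probability leaves $R$ equals the flux of the probability current through its boundary.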

翻訳日:2023-07-06 18:07:44 公開日:2023-07-04

# 状態依存雑音を伴う加速確率近似

Accelerated stochastic approximation with state-dependent noise ( http://arxiv.org/abs/2307.01497v1 )

ライセンス: Link先を確認

Sasila Ilandarideva, Anatoli Juditsky, Guanghui Lan, Tianjiao Li

(参考訳) 確率勾配観測における雑音に対するより一般的な仮定の下で、確率的滑らかな凸最適化問題のクラスを考える。ノイズの分散が一様有界であると仮定される古典的な問題設定とは対照的に、確率勾配の分散はアルゴリズムによって与えられる近似解の「準最適性」に関係していると仮定する。このような問題は様々な応用、特に統計学におけるよく知られた一般化線形回帰問題において自然に発生する。しかし、我々の知る限りでは、このクラスの問題を解く既存の確率近似アルゴリズムの中に、精度、問題パラメータ、ミニバッチサイズへの依存性の観点で最適性を達成するものは存在しない。本稿では，特定の双対関係を持つ2つの非ユークリッド加速確率近似ルーチン，すなわち確率的加速勾配降下法(SAGD)と確率的勾配外挿法(SGE)について論じる。適切な条件下では，SAGD と SGE の両方が最適収束率を達成し，最適な反復複雑度とサンプル複雑度を同時に達成できることを示す。ただし、SGEアルゴリズムに対応する仮定はより一般的であり、例えば、裾の重いノイズや不連続なスコア関数の下での統計的推定問題にSGEを効率的に適用することができる。また，2次成長条件を満たす問題に対するSGEの適用について論じ，スパース解の復元にどのように使用できるかを示す。最後に，提案アルゴリズムの高次元設定における数値的性能を示すシミュレーション実験について報告する。

We consider a class of stochastic smooth convex optimization problems under rather general assumptions on the noise in the stochastic gradient observation. As opposed to the classical problem setting in which the variance of noise is assumed to be uniformly bounded, herein we assume that the variance of stochastic gradients is related to the "sub-optimality" of the approximate solutions delivered by the algorithm. Such problems naturally arise in a variety of applications, in particular, in the well-known generalized linear regression problem in statistics. However, to the best of our knowledge, none of the existing stochastic approximation algorithms for solving this class of problems attain optimality in terms of the dependence on accuracy, problem parameters, and mini-batch size. We discuss two non-Euclidean accelerated stochastic approximation routines--stochastic accelerated gradient descent (SAGD) and stochastic gradient extrapolation (SGE)--which carry a particular duality relationship. We show that both SAGD and SGE, under appropriate conditions, achieve the optimal convergence rate, attaining the optimal iteration and sample complexities simultaneously. However, corresponding assumptions for the SGE algorithm are more general; they allow, for instance, for efficient application of the SGE to statistical estimation problems under heavy tail noises and discontinuous score functions. We also discuss the application of the SGE to problems satisfying quadratic growth conditions, and show how it can be used to recover sparse solutions. Finally, we report on some simulation experiments to illustrate numerical performance of our proposed algorithms in high-dimensional settings.

翻訳日:2023-07-06 18:07:36 公開日:2023-07-04

# AndroidおよびWindowsシステムにおけるディープラーニングによるマルウェア検出のレビュー

Review of Deep Learning-based Malware Detection for Android and Windows System ( http://arxiv.org/abs/2307.01494v1 )

ライセンス: Link先を確認

Nazmul Islam and Seokjoo Shin

(参考訳) マルウェアを区別することは、その挙動と脅威レベルを判断し、防御戦略を考案する上で重要である。これに対し、異なるマルウェアを区別する様々なアンチマルウェアシステムが開発されている。しかし、最近のマルウェアファミリーのほとんどは人工知能(AI)を活用しており、さまざまな難読化技術を用いて従来のアンチマルウェアシステムを欺くことができる。したがって、AI対応のアンチマルウェアシステムだけがこれらの技術に対して堅牢であり、悪意のある活動を助けるマルウェアファイル内の様々な特徴を検出できる。本研究では，WindowsおよびAndroidオペレーティングシステムにおけるマルウェアを検出する2つのAI対応技術をそれぞれレビューする。どちらの手法も、様々なマルウェアファミリーの検出において完全な精度を達成した。

Differentiating malware is important to determine its behavior and level of threat, as well as to devise defensive strategies against it. In response, various anti-malware systems have been developed to distinguish between different malware. However, most of the recent malware families are Artificial Intelligence (AI) enabled and can deceive traditional anti-malware systems using different obfuscation techniques. Therefore, only AI-enabled anti-malware systems are robust against these techniques and can detect different features in the malware files that aid in malicious activities. In this study we review two AI-enabled techniques for detecting malware in Windows and Android operating systems, respectively. Both techniques achieved perfect accuracy in detecting various malware families.

翻訳日:2023-07-06 18:07:09 公開日:2023-07-04

# FB-OCC: 前向き視点変換に基づく3次元活動予測

FB-OCC: 3D Occupancy Prediction based on Forward-Backward View Transformation ( http://arxiv.org/abs/2307.01492v1 )

ライセンス: Link先を確認

Zhiqi Li, Zhiding Yu, David Austin, Mingsheng Fang, Shiyi Lan, Jan Kautz, Jose M. Alvarez

(参考訳) 本報告は, エンド・ツー・エンド自動運転に関するcvpr 2023ワークショップと, 視覚中心自律運転ワークショップに関するcvpr 23ワークショップと共同で開催されている3次元占有予測チャレンジの勝利ソリューションを要約する。提案したFB-OCCは,前方投影を用いた最先端カメラを用いた鳥眼視認識設計であるFB-BEVに基づいている。 fb-bev 上に,3次元占有率予測タスクに合わせた新しい設計と最適化についてさらに検討し,共同学習,voxel-bev表現,モデルのスケールアップ,効果的な後処理戦略について検討した。これらの設計と最適化により、最新のmIoUスコアはnuScenesデータセットで54.19%となり、チャレンジトラックで1位となった。コードとモデルはhttps://github.com/nvlabs/fb-bevでリリースされる。

This technical report summarizes the winning solution for the 3D Occupancy Prediction Challenge, which is held in conjunction with the CVPR 2023 Workshop on End-to-End Autonomous Driving and the CVPR 2023 Workshop on Vision-Centric Autonomous Driving. Our proposed solution, FB-OCC, builds upon FB-BEV, a cutting-edge camera-based bird's-eye view perception design using forward-backward projection. On top of FB-BEV, we further study novel designs and optimizations tailored to the 3D occupancy prediction task, including joint depth-semantic pre-training, joint voxel-BEV representation, model scaling up, and effective post-processing strategies. These designs and optimizations result in a state-of-the-art mIoU score of 54.19% on the nuScenes dataset, ranking first in the challenge track. Code and models will be released at: https://github.com/NVlabs/FB-BEV.
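The mIoU score quoted above is a mean of per-class intersection-over-union values. A minimal sketch of that metric follows, assuming flat lists of integer class labels per voxel; the function name, the toy labels, and the choice to average only over classes that occur are illustrative assumptions, not the challenge's official evaluation code.

```python
def mean_iou(pred, target, num_classes, ignore_index=None):
    """Mean intersection-over-union over the classes that occur in pred or target."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # skip voxels that carry no ground-truth label
        if p == t:
            inter[p] += 1
            union[p] += 1
        else:
            # a mismatch enlarges the union of both classes involved
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six voxels, three hypothetical classes (0 = free, 1 = car, 2 = road).
pred = [0, 1, 1, 2, 2, 0]
target = [0, 1, 2, 2, 2, 0]
score = mean_iou(pred, target, num_classes=3)
```

Here the per-class IoUs are 1, 1/2 and 2/3, so the mean is 13/18.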

翻訳日:2023-07-06 18:06:58 公開日:2023-07-04

# 開放型世代のための自己矛盾訓練による反復学習バイアスの軽減

Mitigating the Learning Bias towards Repetition by Self-Contrastive Training for Open-Ended Generation ( http://arxiv.org/abs/2307.01542v1 )

ライセンス: Link先を確認

Jian Guan, Minlie Huang

(参考訳) 無数の生成タスクの大幅な進歩にもかかわらず、GPT2のような事前訓練された言語モデル(LM)は、オープンエンド生成のための最大化に基づく復号アルゴリズムで繰り返しテキストを生成する傾向にある。 lmsはmleの損失により、単純な反復パターンを素早く捉えます。本稿では,2つのデータセットの流速を維持しながら繰り返しを効果的に緩和することを示す反復を誤って予測した場合に,同一モデルの早期チェックポイントの出力をペナルティ化する自己比較訓練を提案する。さらに, LMは, 文レベルの繰り返しループの原因となる非繰り返しトークンよりも長い範囲依存を用いて繰り返しトークンを予測する。

Despite the huge progress in myriad generation tasks, pretrained language models (LMs) such as GPT2 still tend to generate repetitive texts with maximization-based decoding algorithms for open-ended generation. We attribute their overestimation of token-level repetition probabilities to the learning bias: LMs capture simple repetitive patterns faster with the MLE loss. We propose self-contrastive training to penalize the output of a premature checkpoint of the same model when it incorrectly predicts repetition, which is shown to mitigate repetition effectively while maintaining fluency on two datasets. Furthermore, we find that LMs use longer-range dependencies to predict repetitive tokens than non-repetitive ones, which may be the cause of sentence-level repetition loops.

翻訳日:2023-07-06 17:59:52 公開日:2023-07-04

# AIの限界を理解するために教室でプロンプトを学ぶ:パイロットスタディ

Learning to Prompt in the Classroom to Understand AI Limits: A pilot study ( http://arxiv.org/abs/2307.01540v1 )

ライセンス: Link先を確認

Emily Theophilou, Cansu Koyuturk, Mona Yavari, Sathya Bursic, Gregor Donabauer, Alessia Telari, Alessia Testa, Raffaele Boiano, Davinia Hernandez-Leo, Martin Ruskov, Davide Taibi, Alessandro Gabbiadini, Dimitri Ognibene

(参考訳) 人工知能の進歩は社会を援助し、社会問題に取り組む上で大きな可能性を秘めている。特に大きな言語モデル(llm)とチャットボット(chatgptなど)は、aiシステムの自然言語処理機能を高度に改善し、前例のない量の非構造化データを処理できるようになった。一連の誇大広告も反発し、新しいaiメソッドの驚くべき貢献の後でもネガティブな感情が高まった。原因の1つは、AIや問題領域のこれまでの専門知識を使わずに、あらゆる領域のあらゆる種類の知識にアクセスし、処理でき、そして、幻覚や推論の限界のような現在のLSMの限界を無視している、という誤解を招くことにある。 AIの誤認を認めることは、LLMが生成した誤った提案において、犬の過信の影響に対処するために重要である。同時に、AIに対する恐怖やその他の否定的な態度を減らすことができる。 AIリテラシーの介入は、大衆がそのようなLCMの限界を理解して、より効果的な方法でそれらを使用する方法、すなわち「急進的な」学習を学ぶために必要である。この目的により、30人の生徒を抱えた高校でパイロット教育の介入が行われた。関係してます一知能、AI、LLMに関する高レベルの概念を提示すること。 (ii)非自明なタスクにおけるchatgptによる初期ナイーブな実践、そして最後に (iii)現在認められている推進戦略を適用すること。学生報告などの事前結果を収集した。 a) 活動の高く評価すること b)教育活動におけるLLMとの相互作用の質の向上。 c) aiに対する否定的な感情の低下。 d) 制限に対する理解の高まり,具体的には,AIの受容に影響を与える要因を調査し,より制御された環境でこの活動を洗練・繰り返すことを目的としている。

Progress in artificial intelligence holds great promise for helping society address pressing societal issues. In particular, large language models (LLMs) and the chatbots derived from them, like ChatGPT, have greatly improved the natural language processing capabilities of AI systems, allowing them to process an unprecedented amount of unstructured data. The consequent hype has also backfired, raising negative sentiment even after novel AI methods' surprising contributions. One of the causes, but also an important issue per se, is the rising and misleading feeling of being able to access and process any form of knowledge to solve problems in any domain with no effort or previous expertise in AI or the problem domain, disregarding current LLM limits such as hallucinations and reasoning limits. Acknowledging AI fallibility is crucial to address the impact of dogmatic overconfidence in possibly erroneous suggestions generated by LLMs. At the same time, it can reduce fear and other negative attitudes toward AI. AI literacy interventions are needed that allow the public to understand such LLM limits and learn how to use LLMs more effectively, i.e., learning to "prompt". With this aim, a pilot educational intervention was performed in a high school with 30 students. It involved (i) presenting high-level concepts about intelligence, AI, and LLMs, (ii) an initial naive practice with ChatGPT in a non-trivial task, and finally (iii) applying currently accepted prompting strategies. Encouraging preliminary results have been collected, such as students reporting a) high appreciation of the activity, b) improved quality of the interaction with the LLM during the educational activity, c) decreased negative sentiments toward AI, d) increased understanding of limitations and specifically We aim to study factors that impact AI acceptance and to refine and repeat this activity in more controlled settings.

翻訳日:2023-07-06 17:59:38 公開日:2023-07-04

# 柔らかい導波路のトンネル:本を閉じる

Tunneling in soft waveguides: closing a book ( http://arxiv.org/abs/2307.01536v1 )

ライセンス: Link先を確認

Pavel Exner and David Spitzkopf

(参考訳) 一般化された「ブックカバー」形状の2次元の柔らかい量子導波路のスペクトル、すなわち、有限湾曲部分とほぼ平行に同じ方向を向いている直進形アシンポットからなる溝の形のポテンシャルを持つシュリンガー作用素について検討する。固有値が漸近値の間の角度が0になるときどのように蓄積するかを示す。平行漸近群の場合、離散スペクトルの存在は溝のプロファイルに依存する。弱結合の場合に存在しないことを証明し、一方、横ポテンシャルが十分強ければ存在することを証明する。また、臨界強度を評価する数値的な例を示す。

We investigate the spectrum of a soft quantum waveguide in two dimensions of the generalized 'bookcover' shape, that is, a Schrödinger operator with the potential in the form of a ditch consisting of a finite curved part and straight asymptotes which are parallel or almost parallel, pointing in the same direction. We show how the eigenvalues accumulate when the angle between the asymptotes tends to zero. In the case of parallel asymptotes, the existence of a discrete spectrum depends on the ditch profile. We prove that it is absent in the weak-coupling case; on the other hand, it exists provided the transverse potential is strong enough. We also present a numerical example in which the critical strength can be assessed.

翻訳日:2023-07-06 17:59:09 公開日:2023-07-04

# コンパクトな動き表現に基づく拡散モデルによる教師なし映像異常検出

Unsupervised Video Anomaly Detection with Diffusion Models Conditioned on Compact Motion Representations ( http://arxiv.org/abs/2307.01533v1 )

ライセンス: Link先を確認

Anil Osman Tur and Nicola Dall'Asen and Cigdem Beyan and Elisa Ricci

(参考訳) 本稿では,ビデオ内の各フレームを,ラベルにアクセスすることなく正常または異常に分類する,教師なしビデオ異常検出(VAD)問題に対処することを目的とする。これを実現するために,提案手法では,入力データが事前学習されたネットワークから抽出された時空間的特徴である条件付き拡散モデルを用い,その条件は映像セグメントを要約したコンパクトな動作表現から抽出された特徴である。本手法は,データ駆動しきい値を用い,高い再構成誤差を異常事象の指標として捉える。本研究は,vadに対するコンパクトな運動表現を用いた最初の研究であり,2つの大規模vadベンチマークを用いた実験により,拡散モデルに関連する情報を提供し,その結果,先行技術におけるvad性能を向上させることを実証した。重要な点として,本手法は,各データセットの一般化性能が向上し,最先端手法とベースライン手法の両方に優れていた。私たちのメソッドのコードはhttps://github.com/AnilOsmanTur/conditioned_video_anomaly_diffusionで利用可能です。

This paper aims to address the unsupervised video anomaly detection (VAD) problem, which involves classifying each frame in a video as normal or abnormal, without any access to labels. To accomplish this, the proposed method employs conditional diffusion models, where the input data is the spatiotemporal features extracted from a pre-trained network, and the condition is the features extracted from compact motion representations that summarize a given video segment in terms of its motion and appearance. Our method utilizes a data-driven threshold and considers a high reconstruction error as an indicator of anomalous events. This study is the first to utilize compact motion representations for VAD, and the experiments conducted on two large-scale VAD benchmarks demonstrate that they supply relevant information to the diffusion model and consequently improve VAD performance with respect to the prior art. Importantly, our method exhibits better generalization performance across different datasets, notably outperforming both the state-of-the-art and baseline methods. The code of our method is available at https://github.com/AnilOsmanTur/conditioned_video_anomaly_diffusion
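The decision rule described above (flag a frame when its reconstruction error exceeds a data-driven threshold) can be sketched as follows. The abstract does not state how the threshold is derived, so the mean-plus-k-standard-deviations rule, the function name, and the toy error values are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def flag_anomalies(errors, k=1.0):
    """Flag errors above mean + k * std of the observed errors (assumed rule)."""
    threshold = mean(errors) + k * pstdev(errors)
    return [e > threshold for e in errors], threshold


# Hypothetical per-frame reconstruction errors; the fifth frame stands out.
errors = [0.10, 0.12, 0.11, 0.09, 0.95, 0.10]
flags, threshold = flag_anomalies(errors)
```

Because the threshold is computed from the errors themselves, no labels are needed, which matches the unsupervised setting; the multiplier k trades missed anomalies against false alarms.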

翻訳日:2023-07-06 17:58:57 公開日:2023-07-04

# 不確実性下における自律エージェントの意図行動分析

Analyzing Intentional Behavior in Autonomous Agents under Uncertainty ( http://arxiv.org/abs/2307.01532v1 )

ライセンス: Link先を確認

Filip Cano Córdoba, Samuel Judson, Timos Antonopoulos, Katrine Bjørner, Nicholas Shoemaker, Scott J. Shapiro, Ruzica Piskac and Bettina Könighofer

(参考訳) 不確実な環境での自律的な意思決定の原則的説明責任は、否定的な設計と実際の事故との意図的な結果の区別を必要とする。本稿では,意図的行動の証拠を定量的に測定し,自律的エージェントの行動分析を行う。我々は不確実な環境をマルコフ決定過程(MDP)としてモデル化する。与えられたシナリオでは、あるイベントに到達したエージェントの能力を計算するために確率論的モデルチェックに依存します。これを代理店のスコープと呼ぶ。エージェントのスコープが高く、エージェントの決定がイベントに到達するのに最適に近い場合、意図的な行動の証拠があると言う。提案手法は,評価の信頼性を高めるために分析可能な関連シナリオを自動的に生成する。ケーススタディでは,本手法が「意図的」交通衝突と「事故的」交通衝突を区別できることを示す。

Principled accountability for autonomous decision-making in uncertain environments requires distinguishing intentional outcomes from negligent designs from actual accidents. We propose analyzing the behavior of autonomous agents through a quantitative measure of the evidence of intentional behavior. We model an uncertain environment as a Markov Decision Process (MDP). For a given scenario, we rely on probabilistic model checking to compute the ability of the agent to influence reaching a certain event. We call this the scope of agency. We say that there is evidence of intentional behavior if the scope of agency is high and the decisions of the agent are close to being optimal for reaching the event. Our method applies counterfactual reasoning to automatically generate relevant scenarios that can be analyzed to increase the confidence of our assessment. In a case study, we show how our method can distinguish between 'intentional' and 'accidental' traffic collisions.
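To make the idea of measuring an agent's influence on reaching an event concrete, the sketch below computes maximal and minimal reachability probabilities in a tiny MDP by value iteration and reports their gap. Treating that gap as the "scope of agency" is an assumption made here for illustration, as are the toy traffic model, the state names, and the function name; a real analysis would use a probabilistic model checker.

```python
def reach_probability(mdp, target, optimize, sweeps=1000, tol=1e-12):
    """Value iteration for the max (optimize=max) or min (optimize=min)
    probability of eventually reaching a state in `target`."""
    value = {s: (1.0 if s in target else 0.0) for s in mdp}
    for _ in range(sweeps):
        delta = 0.0
        for s, actions in mdp.items():
            if s in target or not actions:
                continue  # target and absorbing states keep their value
            new = optimize(
                sum(p * value[t] for p, t in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(new - value[s]))
            value[s] = new
        if delta < tol:
            break
    return value


# Hypothetical model: each action maps to a list of (probability, next_state).
mdp = {
    "start": {
        "brake": [(0.9, "safe"), (0.1, "near")],
        "accelerate": [(0.5, "near"), (0.5, "crash")],
    },
    "near": {
        "swerve": [(0.5, "safe"), (0.5, "crash")],
        "continue": [(1.0, "crash")],
    },
    "crash": {},
    "safe": {},
}

p_max = reach_probability(mdp, {"crash"}, max)["start"]
p_min = reach_probability(mdp, {"crash"}, min)["start"]
scope = p_max - p_min  # assumed proxy for how much the agent can influence the event
```

In this toy model the agent can force a crash with probability 1.0 or reduce it to 0.05, so its choices matter a great deal; comparing the agent's actual policy against the maximizing one would then indicate how close its decisions are to optimal for reaching the event.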

翻訳日:2023-07-06 17:58:38 公開日:2023-07-04

# コンボリューショントランスフォーマによるトマトの照明・咬合・熟成条件の自律的認識と階調評価

Convolutional Transformer for Autonomous Recognition and Grading of Tomatoes Under Various Lighting, Occlusion, and Ripeness Conditions ( http://arxiv.org/abs/2307.01530v1 )

ライセンス: Link先を確認

Asim Khan, Taimur Hassan, Muhammad Shafay, Israa Fahmy, Naoufel Werghi, Lakmal Seneviratne and Irfan Hussain

(参考訳) 完全に熟したトマトをモバイルロボットで収穫することは、現実世界のシナリオにおいて重大な課題をもたらす。これらの課題は、葉や枝によって引き起こされる閉塞や、果実の発達段階におけるトマトと周辺の葉の色類似性などの要因から生じる。自然環境はさらにこれらの問題を、様々な光条件、視角、閉塞要因、および異なる成熟度レベルと組み合わせている。これらの障害を克服するために, コンボリューショントランスフォーマーアーキテクチャを利用して, 閉塞レベル, 照明条件, 熟度に関わらず, トマトを自律的に認識し, 格付けする新しい枠組みを導入する。提案モデルは、この目的のために特別にキュレートされた注意深い注釈付き画像を用いて訓練され、テストされる。データセットは、さまざまな照明条件下で準備され、視点を視認し、さまざまなモバイルカメラセンサーを使用し、Laboro TomatoやRob2Pheno Annotated Tomatoといった既存のデータセットと区別する。乱雑なトマトインスタンスと隠蔽トマトインスタンスの処理におけるフレームワークの有効性を,2つの公開データセットである Laboro Tomato と Rob2Pheno Annotated Tomato をベンチマークとして評価した。これら3つのデータセットにおける評価結果から, トマトをアノテートしたkutomadata, laboro tomato, rob2phenoの平均精度スコアにおいて, 最先端の58.14%, 65.42%, 66.39%を上回った。その結果,トマトをベースライン法や従来の手法と比較して精度良く検出・区分けできることで,提案モデルの優越性が向上した。具体的には、f1-scoreが80.14%、dice係数が73.26%、平均iouが66.41%である。

Harvesting fully ripe tomatoes with mobile robots presents significant challenges in real-world scenarios. These challenges arise from factors such as occlusion caused by leaves and branches, as well as the color similarity between tomatoes and the surrounding foliage during the fruit development stage. The natural environment further compounds these issues with varying light conditions, viewing angles, occlusion factors, and different maturity levels. To overcome these obstacles, this research introduces a novel framework that leverages a convolutional transformer architecture to autonomously recognize and grade tomatoes, irrespective of their occlusion level, lighting conditions, and ripeness. The proposed model is trained and tested using carefully annotated images curated specifically for this purpose. The dataset was prepared under various lighting conditions and viewing perspectives and with different mobile camera sensors, distinguishing it from existing datasets such as Laboro Tomato and Rob2Pheno Annotated Tomato. The effectiveness of the proposed framework in handling cluttered and occluded tomato instances was evaluated using two additional public datasets, Laboro Tomato and Rob2Pheno Annotated Tomato, as benchmarks. The evaluation results across these three datasets demonstrate the exceptional performance of our proposed framework, surpassing the state-of-the-art by 58.14%, 65.42%, and 66.39% in terms of mean average precision scores for KUTomaData, Laboro Tomato, and Rob2Pheno Annotated Tomato, respectively. The results underscore the superiority of the proposed model in accurately detecting and delineating tomatoes compared to baseline methods and previous approaches. Specifically, the model achieves an F1-score of 80.14%, a Dice coefficient of 73.26%, and a mean IoU of 66.41% on the KUTomaData image dataset.

翻訳日:2023-07-06 17:58:24 公開日:2023-07-04

# セマンティックセグメンテーションのための画像の学習圧縮表現の爆発的富化

Exploiting Richness of Learned Compressed Representation of Images for Semantic Segmentation ( http://arxiv.org/abs/2307.01524v1 )

ライセンス: Link先を確認

Ravi Kakaiya, Rakshith Sathish, Ramanathan Sethuraman

(参考訳) 自動運転車とADAS(Advanced Driving Assistance Systems)は、旅行のやり方を根本的に変える可能性がある。これらの車両の多くは、周囲の物体を検知し追跡するために、現在セグメンテーションと物体検出アルゴリズムに依存している。車両から収集されたデータは、これらのアルゴリズムの継続的な/一生の学習を容易にするために、しばしばクラウドサーバに送られる。帯域幅の制約を考慮すると、データはサーバに送信する前に圧縮され、トレーニングや分析のためにデ圧縮される。本研究では,標準パイプラインにおける減圧縮動作に発生するレイテンシのオーバーヘッドを削減するために,学習ベースの圧縮コーデックを用いることを提案する。得られた圧縮表現は,画像を得るための減算に加えて,意味セグメンテーションなどのタスクの実行にも利用できることを示す。我々は、cityscapesデータセット上で提案されたパイプラインを実験的に検証し、圧縮係数を最大6,6 \times$とし、除算された画像を用いて達成した0.88$に対して、サイス係数0.84$でセグメンテーションを行うために必要な情報を保存し、全体的な計算を1,1\%$で削減した。

Autonomous vehicles and Advanced Driving Assistance Systems (ADAS) have the potential to radically change the way we travel. Many such vehicles currently rely on segmentation and object detection algorithms to detect and track objects in their surroundings. The data collected from the vehicles are often sent to cloud servers to facilitate continual/life-long learning of these algorithms. Considering the bandwidth constraints, the data is compressed before being sent to servers, where it is typically decompressed for training and analysis. In this work, we propose the use of a learning-based compression codec to reduce the latency overhead incurred by the decompression operation in the standard pipeline. We demonstrate that the learned compressed representation can also be used to perform tasks like semantic segmentation, in addition to being decompressed to obtain the images. We experimentally validate the proposed pipeline on the Cityscapes dataset, where we achieve a compression factor of up to $66 \times$ while preserving the information required to perform segmentation with a Dice coefficient of $0.84$, compared to $0.88$ achieved using decompressed images, while reducing the overall compute by $11\%$.

翻訳日:2023-07-06 17:57:44 公開日:2023-07-04

# LEAT: リアルタイムシナリオにおける遅延アンサンブル攻撃によるロバストディープフェイク破壊に向けて

LEAT: Towards Robust Deepfake Disruption in Real-World Scenarios via Latent Ensemble Attack ( http://arxiv.org/abs/2307.01520v1 )

ライセンス: Link先を確認

Joonkyo Shim, Hyunsoo Yoon

(参考訳) 生成モデルによって生成された悪質な視覚コンテンツであるディープフェイクは、社会にますます有害な脅威をもたらす。近年のディープフェイクの損傷を積極的に軽減するために, 逆方向の摂動を用いてディープフェイクモデルの出力を妨害する研究が進められている。しかしながら、以前のアプローチでは、主に所定のターゲット属性のみに基づいて歪んだ出力を生成することに重点を置いており、ターゲット属性が不明な現実世界のシナリオでは堅牢性が欠落している。さらに、GAN(Generative Adversarial Networks)と拡散モデル(Diffusion Models)の2つの顕著な生成モデル間の摂動の伝達性は未解明のままである。本稿では,頑健なディープフェイク破壊を実現するための目標特性伝達性とモデル伝達性の重要性を強調する。この課題に対処するために,leatと呼ばれる,独立な潜在符号化プロセスを攻撃する簡易かつ効果的な破壊手法を提案する。遅延符号化処理を中断することにより、所定の目標属性に関係なく、その後の生成プロセスで歪んだ出力画像を生成する。このターゲット属性非依存攻撃は、ターゲット属性が未知である場合でもロバストなディスラプションを保証する。さらに,回帰勾配攻撃のための勾配を効果的に集約し,ganモデルと拡散モデルの両方を含む様々なディープフェイクモデルに対する同時攻撃を可能にする正規化勾配アンサンブル戦略を導入する。さらに,画素レベルの差のみに基づく破壊品質の評価が不十分であることを示す。その結果,防衛の成功を包括的に評価するための代替プロトコルを提案する。実世界のシナリオにおいてディープフェイクをディスラプトする手法の有効性を確認し,従来の手法よりも高い防御成功率を報告した。

Deepfakes, malicious visual content created by generative models, pose an increasingly harmful threat to society. To proactively mitigate deepfake damage, recent studies have employed adversarial perturbation to disrupt deepfake model outputs. However, previous approaches primarily focus on generating distorted outputs based on only predetermined target attributes, leading to a lack of robustness in real-world scenarios where target attributes are unknown. Additionally, the transferability of perturbations between two prominent generative models, Generative Adversarial Networks (GANs) and Diffusion Models, remains unexplored. In this paper, we emphasize the importance of target attribute-transferability and model-transferability for achieving robust deepfake disruption. To address this challenge, we propose a simple yet effective disruption method called Latent Ensemble ATtack (LEAT), which attacks the independent latent encoding process. By disrupting the latent encoding process, it generates distorted output images in subsequent generation processes, regardless of the given target attributes. This target attribute-agnostic attack ensures robust disruption even when the target attributes are unknown. Additionally, we introduce a Normalized Gradient Ensemble strategy that effectively aggregates gradients for iterative gradient attacks, enabling simultaneous attacks on various types of deepfake models, involving both GAN-based and Diffusion-based models. Moreover, we demonstrate the insufficiency of evaluating disruption quality solely based on pixel-level differences. As a result, we propose an alternative protocol for comprehensively evaluating the success of defense. Extensive experiments confirm the efficacy of our method in disrupting deepfakes in real-world scenarios, reporting a higher defense success rate compared to previous methods.
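One plausible reading of a normalized gradient ensemble is sketched below: each model's gradient is rescaled to unit length before averaging, so that a model with large gradient magnitudes does not dominate the shared perturbation. The L2 normalization, the signed update with an L-infinity budget, and all names and numbers are assumptions for illustration; the abstract does not give the exact formulation.

```python
import math


def normalized_gradient_ensemble(grads, eps=1e-12):
    """Average per-model gradients after rescaling each to unit L2 norm."""
    dim = len(grads[0])
    total = [0.0] * dim
    for g in grads:
        norm = math.sqrt(sum(x * x for x in g)) + eps
        for i, x in enumerate(g):
            total[i] += x / norm
    return [t / len(grads) for t in total]


def sign_step(delta, grad, step, budget):
    """One signed-gradient step on the perturbation, clipped to [-budget, budget]."""
    updated = []
    for d, g in zip(delta, grad):
        direction = (g > 0) - (g < 0)
        updated.append(max(-budget, min(budget, d + step * direction)))
    return updated


# Two hypothetical models whose gradient scales differ by five orders of magnitude.
grads = [[300.0, -400.0], [0.004, -0.003]]
combined = normalized_gradient_ensemble(grads)
delta = sign_step([0.0, 0.0], combined, step=0.01, budget=0.05)
```

A plain sum of these two gradients would be almost identical to the first model's gradient; after normalization both models contribute equally to the combined direction.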

翻訳日:2023-07-06 17:57:21 公開日:2023-07-04

# パーソナライズされた治療推薦のための深層注意qネットワーク

Deep Attention Q-Network for Personalized Treatment Recommendation ( http://arxiv.org/abs/2307.01519v1 )

ライセンス: Link先を確認

Simin Ma, Junghwan Lee, Nicoleta Serban, Shihao Yang

(参考訳) 個別の患者に対する治療の調整は、最適な医療成果を得るためには極めて困難である。強化学習の最近の進歩は、有望なパーソナライズされた治療レコメンデーションを提供するが、それらは患者の状態として、患者の真の健康状態を正確に表現しない現在の患者観察(視覚標識、人口統計)にのみ依存している。この制限は政策学習と評価を妨げ、最終的に治療効果を制限する。本研究では,過去の患者観察を効率的に取り入れるために,深層強化学習フレームワーク内のトランスフォーマーアーキテクチャを活用して,パーソナライズされた治療推奨のための深層注意qネットワークを提案する。実世界の敗血症と急性低血圧コホートに関するモデルを評価し,最新モデルよりも優れていることを示した。私たちのモデルのソースコードはhttps://github.com/stevenmsm/RL-ICU-DAQNで公開されています。

Tailoring treatment for individual patients is crucial yet challenging in order to achieve optimal healthcare outcomes. Recent advances in reinforcement learning offer promising personalized treatment recommendations; however, they rely solely on current patient observations (vital signs, demographics) as the patient's state, which may not accurately represent the true health status of the patient. This limitation hampers policy learning and evaluation, ultimately limiting treatment effectiveness. In this study, we propose the Deep Attention Q-Network for personalized treatment recommendations, utilizing the Transformer architecture within a deep reinforcement learning framework to efficiently incorporate all past patient observations. We evaluated the model on real-world sepsis and acute hypotension cohorts, demonstrating its superiority to state-of-the-art models. The source code for our model is available at https://github.com/stevenmsm/RL-ICU-DAQN.

翻訳日:2023-07-06 17:56:51 公開日:2023-07-04

# LPN:数ショット分類のための言語誘導型プロトタイプネットワーク

LPN: Language-guided Prototypical Network for few-shot classification ( http://arxiv.org/abs/2307.01515v1 )

ライセンス: Link先を確認

Kaihui Cheng, Chule Yang

(参考訳) 少数ショット分類は、制限されたラベル付き例で新しいタスクに適応することを目的としている。アクセス可能なデータを完全に利用するために、最近の手法では、クエリとサポートイメージの類似性、およびメタトレーニングと事前トレーニング戦略による高次元特徴の適切な測定方法が検討されている。しかし、マルチモダリティ情報の可能性はほとんど検討されていないため、少数ショット分類に有望な改善をもたらす可能性がある。本稿では,2つの並列分岐による視覚と言語モダリティの相補性を活用した,少数ショット分類のための言語誘導型ネットワーク (lpn) を提案する。具体的には,視覚タスクに限られたサンプルで言語モダリティを導入するために,事前学習されたテキストエンコーダを活用して,従来の画像エンコーダで画像を処理すると同時に,クラス名から直接クラスレベルのテキスト特徴を抽出する。次に、クラスレベルの特徴と視覚的特徴を整合させることにより、各画像に対応するテキスト特徴を得るために、言語案内デコーダを導入する。さらに,クラスレベルの特徴とプロトタイプを活用するために,テキストブランチに頑健なプロトタイプを生成する改良されたプロトタイプヘッドを構築した。最後に、視覚とテキストのロジットを集約し、単一のモダリティの偏差を校正する。大規模な実験は、ベンチマークデータセットの最先端手法に対するLPNの競争力を示す。

Few-shot classification aims to adapt to new tasks with limited labeled examples. To fully use the accessible data, recent methods explore suitable measures for the similarity between the query and support images and better high-dimensional features with meta-training and pre-training strategies. However, the potential of multi-modality information has barely been explored, which may bring promising improvement for few-shot classification. In this paper, we propose a Language-guided Prototypical Network (LPN) for few-shot classification, which leverages the complementarity of vision and language modalities via two parallel branches. Concretely, to introduce language modality with limited samples in the visual task, we leverage a pre-trained text encoder to extract class-level text features directly from class names while processing images with a conventional image encoder. Then, a language-guided decoder is introduced to obtain text features corresponding to each image by aligning class-level features with visual features. In addition, to take advantage of class-level features and prototypes, we build a refined prototypical head that generates robust prototypes in the text branch for follow-up measurement. Finally, we aggregate the visual and text logits to calibrate the deviation of a single modality. Extensive experiments demonstrate the competitiveness of LPN against state-of-the-art methods on benchmark datasets.
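The final aggregation step can be illustrated with a small sketch: visual logits come from distances to class prototypes (means of support features), text logits from similarity to class-level text features, and the two are summed. The distance and similarity functions, the weighting factor, the assumption that a per-image text feature is already available, and all toy vectors are illustrative choices, not the paper's architecture.

```python
def prototype(features):
    """Class prototype: per-dimension mean of the support features."""
    n = len(features)
    return [sum(f[d] for f in features) / n for d in range(len(features[0]))]


def neg_sq_dist(a, b):
    return -sum((x - y) ** 2 for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def predict(query_visual, query_text, support, class_text, weight=0.5):
    """Combine visual and text logits per class and return the best class."""
    logits = {}
    for name, feats in support.items():
        visual = neg_sq_dist(query_visual, prototype(feats))
        text = dot(query_text, class_text[name])
        logits[name] = visual + weight * text
    return max(logits, key=logits.get), logits


# Hypothetical 2-D features: the visual query is equidistant from both prototypes,
# so the text branch decides the outcome.
support = {"cat": [[1.0, 0.0], [0.8, 0.2]], "dog": [[0.0, 1.0], [0.2, 0.8]]}
class_text = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
label, logits = predict([0.5, 0.5], [0.2, 0.8], support, class_text)
```

The example is built so that one modality alone is uninformative, which is the situation in which combining logits from two branches is expected to help.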

翻訳日:2023-07-06 17:56:33 公開日:2023-07-04

# ニューラルネットワークと単語埋め込みを用いた概念認知マップ形成

Conceptual Cognitive Maps Formation with Neural Successor Networks and Word Embeddings ( http://arxiv.org/abs/2307.01577v1 )

ライセンス: Link先を確認

Paul Stoewer, Achim Schilling, Andreas Maier and Patrick Krauss

(参考訳) 人間の脳は、環境から受信した情報を文脈化する特別な能力を持っている。内野-海馬はこの機能において重要な役割を担っており、場所とグリッド細胞を用いた記憶処理や認知地図の構築に深く関わっている。この能力の理解と活用は、人工知能の分野を著しく強化する可能性がある。マルチスケールの後継表現は、場所とグリッドセルの機能の優れたモデルとして機能し、すでにこの役割を約束している。本稿では,3つの概念の認知マップを構築するために,後継表現とニューラルネットワークと単語埋め込みベクトルを用いたモデルを提案する。ネットワークは2つの異なるスケールドマップを学習し、関連する既存の表現に近接して新しい情報を配置する。認知地図上の情報の分散は、その規模によって異なり、集中度が高いか、3つの概念が形成されるか、あるいは地図全体に均等に広がる。我々のモデルは、入力と既存の知識表現の類似度基準に基づいて、任意の入力にマルチモーダルコンテキスト情報を提供することで、現在のAIモデルを改善する可能性を示唆している。

The human brain possesses the extraordinary capability to contextualize the information it receives from our environment. The entorhinal-hippocampal complex plays a critical role in this function, as it is deeply engaged in memory processing and constructing cognitive maps using place and grid cells. Comprehending and leveraging this ability could significantly augment the field of artificial intelligence. The multi-scale successor representation serves as a good model for the functionality of place and grid cells and has already shown promise in this role. Here, we introduce a model that employs successor representations and neural networks, along with word embedding vectors, to construct a cognitive map of three separate concepts. The network adeptly learns two differently scaled maps and situates new information in proximity to related pre-existing representations. The dispersion of information across the cognitive map varies according to its scale: it is either heavily concentrated, resulting in the formation of the three concepts, or spread evenly throughout the map. We suggest that our model could potentially improve current AI models by providing multi-modal context information to any input, based on a similarity metric for the input and pre-existing knowledge representations.
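For readers unfamiliar with the successor representation, the standard definition for a fixed transition matrix T and discount factor gamma is M = I + gamma*T + gamma^2*T^2 + ... = (I - gamma*T)^(-1). The sketch below computes it by fixed-point iteration on a three-state toy chain; the chain, the discount value, and the function name are assumptions for illustration, and the paper's model learns such maps with a neural network rather than computing them in closed form.

```python
def successor_representation(T, gamma, iterations=500):
    """Iterate M <- I + gamma * T @ M, which converges to (I - gamma * T)^(-1)."""
    n = len(T)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    M = [row[:] for row in identity]
    for _ in range(iterations):
        M = [
            [
                identity[i][j] + gamma * sum(T[i][k] * M[k][j] for k in range(n))
                for j in range(n)
            ]
            for i in range(n)
        ]
    return M


# Random walk on a line of three states (the middle state moves left or right).
T = [
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 1.0, 0.0],
]
M = successor_representation(T, gamma=0.9)
```

Entry M[i][j] is the expected discounted number of future visits to state j when starting in state i. A larger gamma looks further ahead and yields a coarser map, which is one way to read the "differently scaled maps" mentioned above.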

翻訳日:2023-07-06 17:50:15 公開日:2023-07-04

# kapitza-dirac効果におけるスピンフリップの二次元シミュレーション

Two-dimensional simulation of the spin-flip in the Kapitza-Dirac effect ( http://arxiv.org/abs/2307.01571v1 )

ライセンス: Link先を確認

Ping Ge, Sven Ahrens, Baifei Shen

(参考訳) 強磁場場の量子論における多くの計算は単純な場の幾何学を用いて行われ、しばしば空間場のエンベロープを無視する。本稿では,ガウスビーム定在光波におけるカピツァ・ディラック効果の電子回折量子力学をシミュレートする。 2次元シミュレーションは、高速フーリエ変換スプリット作用素法を用いてディラック方程式を解いて相対論的枠組みで計算する。数値伝搬法を除くと,近似を適用しず,カピツァ・ディラック効果のスピンフリップが可能であることを示す。

Many calculations in strong field quantum field theory are carried out by using a simple field geometry, often neglecting the spatial field envelope. In this article, we simulate the electron diffraction quantum dynamics of the Kapitza-Dirac effect in a Gaussian beam standing light wave. The two-dimensional simulation is computed in a relativistic framework, by solving the Dirac equation with the fast Fourier transform split operator method. Apart from the numerical propagation method, our results are obtained without applying approximations and demonstrate that a spin-flip in the Kapitza-Dirac effect is possible.
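The split operator idea can be shown in a much simpler setting than the one used in the paper. The sketch below propagates a one-dimensional, non-relativistic Schrödinger wave packet in a harmonic potential (units with hbar = m = 1), alternating between position space for the potential and momentum space for the kinetic term. It is not a Dirac solver and has no spin; the grid, time step, potential, and the small pure-Python FFT are all assumptions chosen to keep the example self-contained.

```python
import cmath
import math


def fft(values, inverse=False):
    """Recursive radix-2 FFT (length must be a power of two), unnormalized."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def ifft(values):
    n = len(values)
    return [v / n for v in fft(values, inverse=True)]


def norm(psi, dx):
    return sum(abs(a) ** 2 for a in psi) * dx


N, L, dt, steps = 64, 20.0, 0.01, 100
dx = L / N
x = [-L / 2 + i * dx for i in range(N)]
# Angular wave numbers in the ordering produced by the FFT.
k = [2 * math.pi * (j if j < N // 2 else j - N) / L for j in range(N)]

# exp(-i V dt / 2) with V = x^2 / 2, and exp(-i k^2 dt / 2) for the kinetic term.
half_potential = [cmath.exp(-0.5j * (0.5 * xi * xi) * dt) for xi in x]
kinetic = [cmath.exp(-0.5j * kj * kj * dt) for kj in k]

psi = [cmath.exp(-0.5 * (xi - 1.0) ** 2) for xi in x]  # displaced Gaussian
norm_before = norm(psi, dx)

for _ in range(steps):
    psi = [a * p for a, p in zip(psi, half_potential)]     # half potential step
    psi_k = [a * t for a, t in zip(fft(psi), kinetic)]     # full kinetic step
    psi = [a * p for a, p in zip(ifft(psi_k), half_potential)]  # half potential step

norm_after = norm(psi, dx)
```

Each factor is a pure phase, so the norm of the wave function is preserved up to rounding error, which gives a quick sanity check on an implementation.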

翻訳日:2023-07-06 17:49:57 公開日:2023-07-04

# 機械学習に基づく侵入検出:特徴選択と特徴抽出

Machine Learning-Based Intrusion Detection: Feature Selection versus Feature Extraction ( http://arxiv.org/abs/2307.01570v1 )

ライセンス: Link先を確認

Vu-Duc Ngo, Tuan-Cuong Vuong, Thien Van Luong, and Hung Tran

(参考訳) スマートシティ、スマート農業、スマートヘルスケア、スマート製造など、多くの分野において、IoT(Internet of Things)が重要な役割を担っている。しかし、IoTデバイスはサイバー攻撃に非常に脆弱であり、セキュリティ侵害やデータ漏洩を引き起こす可能性がある。これらの攻撃を効果的に防止するために、さまざまな機械学習ベースのIoTネットワーク侵入検知手法が開発されており、機械学習モデルに入力される前の入力データの次元を減らすために、しばしば特徴抽出または特徴選択技術のいずれかに依存している。これは、リアルタイム操作のための検出の複雑さを低くすることを目的としており、特に侵入検知システムでは不可欠である。本稿は,最新のUNSW-NB15データセットとバイナリクラスとマルチクラス分類の両方が存在する場合において,これらの2つの特徴量削減手法を,精度,リコール率,検出精度,ランタイム複雑性といった様々なパフォーマンス指標で総合的に比較する。例えば、一般的には、特徴選択法は、より優れた検出性能を提供するだけでなく、特徴抽出よりも低いトレーニングと推論時間を提供する。しかし、特徴抽出法はその選択法よりもはるかに信頼性が高く、特にK = 4 のような K が非常に小さい場合である。さらに、特徴抽出は、特徴選択よりも縮小された特徴kの数を変更することに対する感受性が低く、バイナリクラスとマルチクラスの両方に当てはまる。この比較に基づいて,タブで詳述したように,特定のシナリオごとに適切な侵入検出タイプを選択するための有用なガイドラインを提供する。第4節の最後で14。

The Internet of Things (IoT) has been playing an important role in many sectors, such as smart cities, smart agriculture, smart healthcare, and smart manufacturing. However, IoT devices are highly vulnerable to cyber-attacks, which may result in security breaches and data leakages. To effectively prevent these attacks, a variety of machine learning-based network intrusion detection methods for IoT networks have been developed, which often rely on either feature extraction or feature selection techniques for reducing the dimension of input data before it is fed into machine learning models. This aims to make the detection complexity low enough for real-time operations, which is particularly vital in any intrusion detection system. This paper provides a comprehensive comparison between these two feature reduction methods of intrusion detection in terms of various performance metrics, namely, precision rate, recall rate, detection accuracy, and runtime complexity, on the modern UNSW-NB15 dataset and for both binary and multiclass classification. For example, in general, the feature selection method not only provides better detection performance but also lower training and inference time compared to its feature extraction counterpart, especially when the number of reduced features K increases. However, the feature extraction method is much more reliable than its selection counterpart, particularly when K is very small, such as K = 4. Additionally, feature extraction is less sensitive to changing the number of reduced features K than feature selection, and this holds true for both binary and multiclass classifications. Based on this comparison, we provide a useful guideline for selecting a suitable intrusion detection type for each specific scenario, as detailed in Tab. 14 at the end of Section IV.

翻訳日:2023-07-06 17:49:47 公開日:2023-07-04

# 表現学習と不確実性定量化のための最終層状態空間モデル

Last layer state space model for representation learning and uncertainty quantification ( http://arxiv.org/abs/2307.01566v1 )

ライセンス: Link先を確認

Max Cohen (TSP), Maurice Charbit, Sylvain Le Corff (TSP)

(参考訳) シーケンシャルなニューラルアーキテクチャがより深く複雑になるにつれて、不確実性の推定はますます困難になる。不確実性を定量化する努力は、しばしば特定の訓練手順に依存し、そのようなモデルの次元性のためにさらなる計算コストを負担する。本稿では,低次元状態学習のための表現学習ステージと不確かさ推定のための状態空間モデルという2つのステップで分類や回帰タスクを分解することを提案する。このアプローチは表現学習と生成モデルの設計を分離することができる。本稿では,モンテカルロ法を用いてパラメータを推定する状態空間ベース最後の層を追加することにより,既存のニューラルネットワーク上に予測分布を推定する方法を実証する。提案手法を,公的なベンチマークデータセットである電気変圧器油温の時間的推定に適用する。我々のモデルは未知変数や未使用変数によるノイズの多いデータ構造を考慮し、予測に信頼区間を提供できる。

As sequential neural architectures become deeper and more complex, uncertainty estimation becomes more and more challenging. Efforts in quantifying uncertainty often rely on specific training procedures, and bear additional computational costs due to the dimensionality of such models. In this paper, we propose to decompose a classification or regression task into two steps: a representation learning stage to learn low-dimensional states, and a state space model for uncertainty estimation. This approach allows representation learning and the design of generative models to be separated. We demonstrate how predictive distributions can be estimated on top of an existing and trained neural network, by adding a state space-based last layer whose parameters are estimated with Sequential Monte Carlo methods. We apply our proposed methodology to the hourly estimation of Electricity Transformer Oil temperature, a publicly benchmarked dataset. Our model accounts for the noisy data structure, due to unknown or unavailable variables, and is able to provide confidence intervals on predictions.

翻訳日:2023-07-06 17:49:18 公開日:2023-07-04

# 効率的な探査・探査戦略のための近似情報

Approximate information for efficient exploration-exploitation strategies ( http://arxiv.org/abs/2307.01563v1 )

ライセンス: Link先を確認

Alex Barbier-Chebbah (IP, CNRS, UPCité), Christian L. Vestergaard (IP, CNRS, UPCité), Jean-Baptiste Masson (IP, CNRS, UPCité)

(参考訳) 本稿では,多腕バンディット問題に着目し,意思決定に固有の探索・探索ジレンマについて論じる。問題は、エージェントが現在の知識を即時利益に活用するか、または潜在的長期報酬のために新しい道を探るかを決定することである。本稿では,エントロピー勾配の解析的近似を用いて,各時点にどのアームを引くかを選択する新しいアルゴリズム,近似情報最大化(AIM)を提案する。 AIMはInfomaxとThompsonのサンプリングのパフォーマンスと一致し、計算速度、決定性、トラクタビリティも向上した。 aimの実証的な評価は、lai-robbinsの漸近的な境界に準拠していることを示し、様々な事前値に対する堅牢性を示している。その表現は調整可能であり、様々な設定で特定の最適化を可能にする。

This paper addresses the exploration-exploitation dilemma inherent in decision-making, focusing on multi-armed bandit problems. The problems involve an agent deciding whether to exploit current knowledge for immediate gains or explore new avenues for potential long-term rewards. We here introduce a novel algorithm, approximate information maximization (AIM), which employs an analytical approximation of the entropy gradient to choose which arm to pull at each point in time. AIM matches the performance of Infomax and Thompson sampling while also offering enhanced computational speed, determinism, and tractability. Empirical evaluation of AIM indicates its compliance with the Lai-Robbins asymptotic bound and demonstrates its robustness for a range of priors. Its expression is tunable, which allows for specific optimization in various settings.
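As a point of reference for the exploration-exploitation trade-off, the sketch below implements Thompson sampling, one of the baselines named above, for a Bernoulli bandit with Beta(1, 1) priors. It is not the AIM algorithm, whose entropy-gradient approximation is not given in the abstract; the arm means, horizon, and seed are arbitrary assumptions.

```python
import random


def thompson_bernoulli(true_means, horizon, rng):
    """Thompson sampling with a Beta posterior per arm; returns pull counts."""
    successes = [1] * len(true_means)  # Beta(1, 1) prior: one pseudo-success
    failures = [1] * len(true_means)   # and one pseudo-failure per arm
    pulls = [0] * len(true_means)
    for _ in range(horizon):
        # Sample a plausible mean for each arm and play the most promising one.
        samples = [rng.betavariate(s, f) for s, f in zip(successes, failures)]
        arm = samples.index(max(samples))
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


pulls = thompson_bernoulli([0.2, 0.5, 0.8], horizon=2000, rng=random.Random(0))
```

Exploration arises from the randomness of the posterior samples: arms with few pulls have wide posteriors and are occasionally tried, while the best arm is played more and more often as evidence accumulates.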

翻訳日:2023-07-06 17:49:04 公開日:2023-07-04

# ポケットサイズのドローン上でセキュアなディープラーニングベースの分散インテリジェンス

Secure Deep Learning-based Distributed Intelligence on Pocket-sized Drones ( http://arxiv.org/abs/2307.01559v1 )

ライセンス: Link先を確認

Elia Cereda and Alessandro Giusti and Daniele Palossi

(参考訳) パームサイズのナノドロンはエッジノードの魅力的なクラスであるが、その限られた計算資源は大規模なディープラーニングモデルの実行を妨げている。エッジフォッグ計算のパラダイムを採用することで、計算の一部をフォグにオフロードすることができるが、フォグノードや通信リンクが信頼できない場合、セキュリティ上の懸念が生じる。そこで本研究では,ナノドローン上でランダムなサブネットワークを冗長に実行することにより,霧の計算を検証する分散エッジフォッグ実行方式を提案する。システム上で完全に動作しているState-of-the-Artビジュアルポーズ推定ネットワークと比較して、大規模ネットワークは分散処理によってR^2$スコアを+0.19向上させ、攻撃時には95%の確率で2秒以内で検出する。

Palm-sized nano-drones are an appealing class of edge nodes, but their limited computational resources prevent running large deep-learning models onboard. By adopting an edge-fog computational paradigm, we can offload part of the computation to the fog; however, this poses security concerns if the fog node, or the communication link, cannot be trusted. To tackle this concern, we propose a novel distributed edge-fog execution scheme that validates fog computation by redundantly executing a random subnetwork aboard our nano-drone. Compared to a state-of-the-art visual pose estimation network that runs entirely onboard, a larger network executed in a distributed way improves the $R^2$ score by +0.19; in case of attack, our approach detects it within 2 s with 95% probability.

翻訳日:2023-07-06 17:48:51 公開日:2023-07-04

# プロジェクション演算子を用いた2視点学習タスクのスケーラブル変数選択

Scalable variable selection for two-view learning tasks with projection operators ( http://arxiv.org/abs/2307.01558v1 )

ライセンス: Link先を確認

Sandor Szedmak (1), Riikka Huusari (1), Tat Hong Duong Le (1), Juho Rousu (1) ((1) Department of Computer Science, Aalto University, Espoo, Finland)

(参考訳) 本稿では,2視点設定,あるいはベクトル値教師付き学習問題に対する新しい変数選択法を提案する。当社のフレームワークは,データサンプルの数が数百万にものぼる,非常に大規模な選択タスクを処理できる。本手法は,出力変数と高い相関性を持つ変数を反復的に選択することで変数選択を行うが,従来選択されていた変数と相関性はない。相関を測るために,提案手法は射影作用素とその代数の概念を用いる。投影演算子では、入力変数と出力変数のセットの間の相関関係もカーネル関数によって表現できるため、非線形相関モデルも活用できる。提案手法を実験的に検証し,合成データと実データの両方において,そのスケーラビリティと特徴の関連性を示す。キーワード:教師付き変数選択、ベクトル値学習、投影値測度、カーネルヒルベルト空間

In this paper, we propose a novel variable selection method for two-view settings, or for vector-valued supervised learning problems. Our framework is able to handle extremely large-scale selection tasks, where the number of data samples can reach the millions. In a nutshell, our method performs variable selection by iteratively selecting variables that are highly correlated with the output variables but not correlated with the previously chosen variables. To measure the correlation, our method uses the concept of projection operators and their algebra. With projection operators, the relationship (correlation) between sets of input and output variables can also be expressed by kernel functions, so nonlinear correlation models can be exploited as well. We experimentally validate our approach, showing on both synthetic and real data its scalability and the relevance of the selected features. Keywords: Supervised variable selection, vector-valued learning, projection-valued measure, reproducing kernel Hilbert space
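
The iterative idea in the abstract (pick the variable most correlated with the outputs, then discount anything already explained by earlier picks) can be illustrated with a plain linear sketch. This is not the authors' algorithm: the scoring rule, the Gram-Schmidt style deflation, and all names below are assumptions, and the kernel extension is left out. Columns are assumed to be centered.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def select_variables(columns, outputs, k, eps=1e-12):
    """Greedy forward selection over input columns (lists of equal-length vectors)."""
    residuals = [list(c) for c in columns]
    selected = []
    for _ in range(k):
        best, best_score = None, 0.0
        for j, r in enumerate(residuals):
            norm_sq = dot(r, r)
            if j in selected or norm_sq <= eps:
                continue  # already chosen, or fully explained by earlier picks
            # Squared correlation-like score of the residual with all outputs.
            score = sum(dot(r, y) ** 2 for y in outputs) / norm_sq
            if score > best_score:
                best, best_score = j, score
        if best is None:
            break
        selected.append(best)
        q_norm = dot(residuals[best], residuals[best]) ** 0.5
        q = [a / q_norm for a in residuals[best]]
        # Project the chosen direction out of every remaining column, so later
        # scores ignore whatever the selected variables already capture.
        for j, r in enumerate(residuals):
            if j not in selected:
                coeff = dot(r, q)
                residuals[j] = [a - coeff * b for a, b in zip(r, q)]
    return selected
```

In this sketch a column that merely rescales an already selected one ends up with a zero residual and is never chosen, which is the "not correlated with the previously chosen variables" behaviour in miniature.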

翻訳日:2023-07-06 17:48:35 公開日:2023-07-04

# 分離道路トーポフォーマー

Separated RoadTopoFormer ( http://arxiv.org/abs/2307.01557v1 )

ライセンス: Link先を確認

Mingjie Lu, Yuanxian Huang, Ji Liu, Jinzhang Peng, Lu Tian, Ashish Sirasao

(参考訳) 自動運転を実現するためには、運転シナリオを理解することが不可欠だ。マップ学習やbevレーン検出といった以前の仕事は、レーンインスタンス間の接続関係を無視し、トラフィック要素検出タスクは通常、レーンラインとの関係を無視する。これらの課題に対処するため、4つのサブタスク、交通要素の検出、車線中心線の検出、車線間の接続関係の推論、車線と交通要素の割り当て関係の推論を含むタスクを提示する。本稿では,車線中心線と交通要素を識別し,それらの関係を推論するエンドツーエンドフレームワークであるroadtopoformerを提案する。各モジュールを別々に最適化することで、互いにインタラクションを防止し、小さな微調整でそれらを集約します。 2つの検出ヘッドではオブジェクトを検出するためにdetrライクなアーキテクチャを採用し、関係ヘッドでは、フロント検出器から2つのインスタンス特徴を取り込み、それらを分類器に供給して関係確率を得る。最終提出は0.445 OLSで、これはサブタスクと組み合わせたスコアの両方で競合します。

Understanding driving scenarios is crucial to realizing autonomous driving. Previous works such as map learning and BEV lane detection neglect the connection relationships between lane instances, and traffic element detection tasks usually neglect the relationship with lane lines. To address these issues, a task is presented that includes four sub-tasks: the detection of traffic elements, the detection of lane centerlines, reasoning about connection relationships among lanes, and reasoning about assignment relationships between lanes and traffic elements. We present Separated RoadTopoFormer to tackle these issues, an end-to-end framework that detects lane centerlines and traffic elements and reasons about the relationships among them. We optimize each module separately to prevent the modules from interfering with each other, and then aggregate them with a small amount of fine-tuning. For the two detection heads, we adopt a DETR-like architecture to detect objects; for the relationship head, we concatenate two instance features from the preceding detectors and feed them to a classifier to obtain the relationship probability. Our final submission achieves 0.445 OLS, which is competitive in both sub-task and combined scores.

翻訳日:2023-07-06 17:48:19 公開日:2023-07-04

# 対話エージェントの文脈におけるNLGの知識グラフ

Knowledge Graph for NLG in the context of conversational agents ( http://arxiv.org/abs/2307.01548v1 )

ライセンス: Link先を確認

Hussam Ghanem (ICB), Massinissa Atmani (ICB), Christophe Cruz (ICB)

(参考訳) 知識グラフ(KG)の使用により、会話エージェントが提供する応答の正確性と包括性が向上する。会話中に回答を生成することは、これらのKGからテキストを生成することで成り立っているが、近年大きな注目を集めている課題であるとみなされている。本稿では,グラフニューラルネットワーク,グラフトランスフォーマー,seq2seqモデルによる線形化など,知識グラフからテキストへの生成に使用されるさまざまなアーキテクチャのレビューを行う。それぞれのアーキテクチャの利点と限界について議論し、アーキテクチャの選択は、目前にあるタスクの特定の要求に依存すると結論付ける。また、特に会話エージェントの文脈において、実行時間やモデルの妥当性といった制約を考慮することの重要性を強調する。これらの制約とDAVIのドメインに対するラベル付きデータの可用性に基づいて、知識グラフからテキスト生成タスクにSeq2seq Transformerベースモデル(PLM)を使用する。我々は PLM 上での kg-to-text 生成のベンチマークデータセットの改良と,今後の作業における感情的・多言語的側面の探索を目的とする。本総説では,知識グラフ・テキスト生成における様々なアプローチについて考察し,今後の研究の方向性について概説する。

The use of knowledge graphs (KGs) enhances the accuracy and comprehensiveness of the responses provided by a conversational agent. While generating answers during conversations consists in generating text from these KGs, it is still regarded as a challenging task that has gained significant attention in recent years. In this document, we provide a review of different architectures used for knowledge graph-to-text generation including: Graph Neural Networks, the Graph Transformer, and linearization with seq2seq models. We discuss the advantages and limitations of each architecture and conclude that the choice of architecture will depend on the specific requirements of the task at hand. We also highlight the importance of considering constraints such as execution time and model validity, particularly in the context of conversational agents. Based on these constraints and the availability of labeled data for the domains of DAVI, we choose to use seq2seq Transformer-based models (PLMs) for the Knowledge Graph-to-Text Generation task. We aim to refine benchmark datasets of kg-to-text generation on PLMs and to explore the emotional and multilingual dimensions in our future work. Overall, this review provides insights into the different approaches for knowledge graph-to-text generation and outlines future directions for research in this area.
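
As a small illustration of the "linearization with seq2seq models" option mentioned above, the sketch below flattens graph triples into a single tagged string that a sequence-to-sequence model could take as input. The `<H>`, `<R>`, `<T>` markers and the function name are assumptions chosen for the example, not a format specified in the abstract.

```python
def linearize(triples):
    """Flatten (head, relation, tail) triples into one tagged input string."""
    parts = []
    for head, relation, tail in triples:
        # Each triple becomes a tagged span; the graph structure is only
        # implicit in the order and the shared entity names.
        parts.append(f"<H> {head} <R> {relation} <T> {tail}")
    return " ".join(parts)
```

For example, `linearize([("Alan Turing", "born in", "London")])` yields the string `<H> Alan Turing <R> born in <T> London`. The trade-off is the one the review discusses: linearization lets a pre-trained text model be reused directly, at the cost of discarding explicit graph structure.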

翻訳日:2023-07-06 17:48:00 公開日:2023-07-04

# EffSeg: 構造保存空間を用いた高効率細粒度インスタンスセグメンテーション

EffSeg: Efficient Fine-Grained Instance Segmentation using Structure-Preserving Sparsity ( http://arxiv.org/abs/2307.01545v1 )

ライセンス: Link先を確認

Cédric Picron, Tinne Tuytelaars

(参考訳) 多くの2段階のインスタンスセグメンテーションヘッドは、インスタンスごとに粗い28x28マスクを予測しており、多くのオブジェクトのきめ細かい詳細をキャプチャするには不十分である。この問題を解決するため、PointRendとRefineMaskは112x112のセグメンテーションマスクを予測し、より高い品質セグメンテーションをもたらす。どちらのメソッドも、隣接する機能(PointRend)にアクセスできないか、あるいは空間的な場所を疎結合に実行する(RefineMask)。本稿では,能動的特徴量,受動的特徴量,特徴量を含む密集した2次元インデックスマップを別々に保存し,構造保存スパーシティ(sps)法を用いて,効率的なインスタンス分割を行うeffsegを提案する。インデックスマップの目的は、どんな2D操作でも実行できるような特徴間の2D空間構成や構造を維持することである。 EffSegは、RefineMaskと比較してCOCOで同様のパフォーマンスを実現し、FLOPの数を71%削減し、FPSを29%増やした。コードはリリースされる。

Many two-stage instance segmentation heads predict a coarse 28x28 mask per instance, which is insufficient to capture the fine-grained details of many objects. To address this issue, PointRend and RefineMask predict a 112x112 segmentation mask, resulting in higher-quality segmentations. Both methods, however, have limitations: either they lack access to neighboring features (PointRend), or they perform computation at all spatial locations instead of sparsely (RefineMask). In this work, we propose EffSeg, which performs fine-grained instance segmentation efficiently by using our Structure-Preserving Sparsity (SPS) method, based on separately storing the active features, the passive features, and a dense 2D index map containing the feature indices. The goal of the index map is to preserve the 2D spatial configuration or structure between the features such that any 2D operation can still be performed. EffSeg achieves similar performance on COCO compared to RefineMask, while reducing the number of FLOPs by 71% and increasing the FPS by 29%. Code will be released.

翻訳日:2023-07-06 17:47:38 公開日:2023-07-04

# 過信は危険なこと:信頼の低い予測によってメンバーシップ推論攻撃を緩和する

Overconfidence is a Dangerous Thing: Mitigating Membership Inference Attacks by Enforcing Less Confident Prediction ( http://arxiv.org/abs/2307.01610v1 )

ライセンス: Link先を確認

Zitao Chen, Karthik Pattabiraman

(参考訳) 機械学習(ml)モデルはメンバーシップ推論攻撃(mia)に対して脆弱であり、与えられた入力がターゲットモデルのトレーニングに使用されるかどうかを判断する。 MIAを緩和する取り組みは数多くあるが、プライバシ保護の制限、大きな精度低下、および/または取得が困難な追加データを必要とする場合が多い。本研究は,強力なメンバーシッププライバシと高い精度を,余分なデータを必要とせずに達成できる防衛技術であるhampを提案する。異なる形式でMIAを緩和するために、異なるプロキシを通してトレーニングサンプルを予測する際に、MLモデルの過信を利用するため、それらが統一可能であることを観察する。これにより、モデルによる自信のない予測を強制するモチベーションが増し、トレーニングやテストサンプルで同じように振る舞うようになります。 HAMPは、高いエントロピーのソフトラベルを持つ新しいトレーニングフレームワークと、高い精度を保ちながらモデルの予測を制約するエントロピーベースの正規化器で構成されている。プライバシーリスクをさらに軽減するため、HAMPは全ての予測出力を均一に修正し、精度を維持しながら低信頼の出力となるようにし、メンバーと非メンバーの予測の違いを効果的に曖昧にする。 5つのベンチマークデータセットに対して広範な評価を行い、HAMPが常に高い精度と強力な会員プライバシーを提供することを示す。最先端の7つの防衛技術と比較すると、HAMPはそれらの技術よりも優れたプライバシーとユーティリティのトレードオフを実現している。

Machine learning (ML) models are vulnerable to membership inference attacks (MIAs), which determine whether a given input was used for training the target model. While there have been many efforts to mitigate MIAs, they often suffer from limited privacy protection, a large accuracy drop, and/or a requirement for additional data that may be difficult to acquire. This work proposes a defense technique, HAMP, that can achieve both strong membership privacy and high accuracy without requiring extra data. To mitigate MIAs in different forms, we observe that they can be unified, as they all exploit the ML model's overconfidence in predicting training samples through different proxies. This motivates our design to enforce less confident prediction by the model, hence forcing the model to behave similarly on the training and testing samples. HAMP consists of a novel training framework with high-entropy soft labels and an entropy-based regularizer to constrain the model's prediction while still achieving high accuracy. To further reduce privacy risk, HAMP uniformly modifies all the prediction outputs to become low-confidence outputs while preserving the accuracy, which effectively obscures the differences between the predictions on members and non-members. We conduct an extensive evaluation on five benchmark datasets and show that HAMP provides consistently high accuracy and strong membership privacy. Our comparison with seven state-of-the-art defenses shows that HAMP achieves a better privacy-utility trade-off than those techniques.
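
To make "high-entropy soft labels" concrete, the sketch below builds a label that keeps the true class as the most likely one while spreading the remaining probability uniformly, and measures its Shannon entropy. The exact label construction and regularizer used by HAMP are not given in the abstract, so the uniform spreading and both function names are assumptions.

```python
import math


def soft_label(true_class, n_classes, p_true):
    """Put p_true on the true class and spread the rest uniformly over the others."""
    rest = (1.0 - p_true) / (n_classes - 1)
    return [p_true if c == true_class else rest for c in range(n_classes)]


def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)
```

As long as `p_true` stays above `1 / n_classes`, the arg-max of the label is still the true class, so accuracy can be preserved while the target distribution is far less confident than a one-hot label (whose entropy is zero).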

翻訳日:2023-07-06 17:40:46 公開日:2023-07-04

# L2ロシア語における文法的誤り訂正のための言語モデル

A Language Model for Grammatical Error Correction in L2 Russian ( http://arxiv.org/abs/2307.01609v1 )

ライセンス: Link先を確認

Nikita Remnev, Sergei Obiedkov, Ekaterina Rakhilina, Ivan Smirnov, Anastasia Vyrenkova

(参考訳) 文法的誤り訂正は自然言語処理の基本課題の1つである。ロシア語では、ほとんどのスペルチェッカーは正確なタイポスやその他の単純なエラーを高精度で利用できるが、非ネイティブ(L2)文字に直面すると失敗することが多い。本稿では,L2ロシア文字の誤り訂正を目的とした言語モデルを含むパイプラインを提案する。提案する言語モデルは,ロシア国立コーパスの新聞サブコーパスの未タグテキストに基づいて学習し,その品質をRULEC-GECコーパスに対して検証する。

Grammatical error correction is one of the fundamental tasks in Natural Language Processing. For the Russian language, most of the spellcheckers available correct typos and other simple errors with high accuracy, but often fail when faced with non-native (L2) writing, since the latter contains errors that are not typical for native speakers. In this paper, we propose a pipeline involving a language model intended for correcting errors in L2 Russian writing. The language model proposed is trained on untagged texts of the Newspaper subcorpus of the Russian National Corpus, and the quality of the model is validated against the RULEC-GEC corpus.

翻訳日:2023-07-06 17:39:41 公開日:2023-07-04

# 時系列異常検出のためのプロトタイプ

Prototypes as Explanation for Time Series Anomaly Detection ( http://arxiv.org/abs/2307.01601v1 )

ライセンス: Link先を確認

Bin Li, Carsten Jentsch, Emmanuel Müller

(参考訳) 多くのビッグデータアプリケーションにおいて、時系列における一定の規則的反復パターンから逸脱する異常パターンの検出が不可欠である。しかしながら、ラベルの欠如、時系列データの動的性質、予期せぬ異常な振る舞いにより検出プロセスが困難になる。近年の深層異常検出手法の成功にもかかわらず、このようなブラックボックスモデルにおける神秘的なメカニズムは、安全クリティカルなアプリケーションにおいて新たな課題となっている。モデルの透明性と予測信頼性の欠如は、そのような領域のさらなるブレークスルーを妨げる。本稿では,プロトタイプを用いて異常検出時の正規パターン状態の例に基づく説明を行うprotoadを提案する。検出パフォーマンスに大きな影響を与えることなく、プロトタイプは深いブラックボックスモデルに光を当て、ドメインの専門家やステークホルダーに直感的な理解を提供する。分類問題において広く用いられているプロトタイプ学習を異常検出に拡張する。潜在空間と入力空間のプロトタイプの両方を可視化することにより、正規データがどのようにモデル化され、なぜ特定のパターンが異常であるかを直感的に示す。

Detecting abnormal patterns that deviate from a certain regular repeating pattern in time series is essential in many big data applications. However, the lack of labels, the dynamic nature of time series data, and unforeseeable abnormal behaviors make the detection process challenging. Despite the success of recent deep anomaly detection approaches, the mystical mechanisms in such black-box models have become a new challenge in safety-critical applications. The lack of model transparency and prediction reliability hinders further breakthroughs in such domains. This paper proposes ProtoAD, using prototypes as the example-based explanation for the state of regular patterns during anomaly detection. Without significant impact on the detection performance, prototypes shed light on the deep black-box models and provide intuitive understanding for domain experts and stakeholders. We extend the widely used prototype learning in classification problems into anomaly detection. By visualizing both the latent space and input space prototypes, we intuitively demonstrate how regular data are modeled and why specific patterns are considered abnormal.

翻訳日:2023-07-06 17:39:24 公開日:2023-07-04

# オンチェーンデータを用いたスケーラブル強化学習システムによる暗号ポートフォリオ管理

A Scalable Reinforcement Learning-based System Using On-Chain Data for Cryptocurrency Portfolio Management ( http://arxiv.org/abs/2307.01599v1 )

ライセンス: Link先を確認

Zhenhan Huang and Fumihide Tanaka

(参考訳) ブロックチェーンネットワークのオンチェーンデータ(メトリックス)は、企業の基本と似ていて、ネットワークに対する重要かつ包括的な洞察を提供する。その情報的性質にもかかわらず、オンチェーンデータは暗号(crypto)ポートフォリオ管理(pm)のための強化学習(rl)ベースのシステムでは利用されていない。興味深い課題は、オンチェーンデータの利用によって、ベースラインと比較してRLベースのシステムの戻り性能が向上する範囲である。そこで本研究では,エンドツーエンド暗号pmにオンチェーンデータを組み込んだ新しいrlベースシステムであるcryptorlpmを提案する。 cryptorlpmは情報理解から取引注文実行までの5つのユニットで構成される。 CryptoRLPMでは、オンチェーンデータを各暗号に対してテストして指定し、メトリクスの非効率性の問題を解決する。さらに、CryptoRLPMのスケーラブルな性質により、いつでもポートフォリオの暗号を変更することができる。 3つのポートフォリオのバックテスト結果から、CryptoRLPMは、累積リターン率(ARR)、毎日リターン率(DRR)、ソルティーノ比(SR)の点で、すべてのベースラインを上回ります。特にBitcoinと比較して、CryptoRLPMはARR、DRR、SRをそれぞれ83.14%、0.5603%、および2.1767で強化している。

On-chain data (metrics) of blockchain networks, akin to company fundamentals, provide crucial and comprehensive insights into the networks. Despite their informative nature, on-chain data have not been utilized in reinforcement learning (RL)-based systems for cryptocurrency (crypto) portfolio management (PM). An intriguing subject is the extent to which the utilization of on-chain data can enhance an RL-based system's return performance compared to baselines. Therefore, in this study, we propose CryptoRLPM, a novel RL-based system incorporating on-chain data for end-to-end crypto PM. CryptoRLPM consists of five units, spanning from information comprehension to trading order execution. In CryptoRLPM, the on-chain data are tested and specified for each crypto to solve the issue of ineffectiveness of metrics. Moreover, the scalable nature of CryptoRLPM allows changes in the portfolios' cryptos at any time. Backtesting results on three portfolios indicate that CryptoRLPM outperforms all the baselines in terms of accumulated rate of return (ARR), daily rate of return (DRR), and Sortino ratio (SR). Particularly, when compared to Bitcoin, CryptoRLPM enhances the ARR, DRR, and SR by at least 83.14%, 0.5603%, and 2.1767 respectively.
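
Two of the reported metrics can be written down in a few lines. The sketch below uses common textbook definitions, with daily simple returns, a target return of zero, and no annualization. The paper may define ARR and SR differently, so treat these formulas and names as assumptions.

```python
import math


def accumulated_return(daily_returns):
    """Compound daily simple returns into one accumulated rate of return."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    return growth - 1.0


def sortino_ratio(daily_returns, target=0.0):
    """Mean excess return over the target, divided by the downside deviation."""
    n = len(daily_returns)
    mean_excess = sum(r - target for r in daily_returns) / n
    # Only returns below the target contribute to the downside deviation.
    downside_sq = sum(min(0.0, r - target) ** 2 for r in daily_returns) / n
    if downside_sq == 0.0:
        return math.inf if mean_excess > 0 else 0.0
    return mean_excess / math.sqrt(downside_sq)
```

Unlike the Sharpe ratio, the Sortino ratio penalizes only downside volatility, which is why it is often preferred for highly skewed assets such as cryptocurrencies.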

翻訳日:2023-07-06 17:39:08 公開日:2023-07-04

# ピーク時間連続予測におけるパフォーマンスギャップのブリッジ: Seq2Peakフレームワーク

Bridge the Performance Gap in Peak-hour Series Forecasting: The Seq2Peak Framework ( http://arxiv.org/abs/2307.01597v1 )

ライセンス: Link先を確認

Zhenwei Zhang, Xin Wang, Jingyuan Xie, Heling Zhang, Yuantao Gu

(参考訳) Peak-Hour Series Forecasting (PHSF) は、様々な領域において重要で未探索の課題である。最先端のディープラーニングモデルは通常の時系列予測(TSF)では優れていますが、PHSFでは同等の結果を得るのに苦労しています。これは、ピーク時系列における高い非定常性によって引き起こされる課題によるもので、これは通常の TSF よりも直接予測が困難である。さらに、定期的な予測結果から手動で最大値を抽出すると、平均赤字を最小化するモデルによる最適化性能が低下する。これらの問題に対処するため,本論文では,PHSFタスク用に設計された新しいフレームワークであるSeq2Peakについて述べる。 Seq2Peakは、非定常性問題を緩和するCyclicNormパイプラインと、オリジナルのシリーズとピーク時間の両方を教師付き信号として利用するハイブリッド損失関数を備えた単純なトレーニング可能なパラメータなしピーク時デコーダの2つの重要なコンポーネントを提供する。公開されている時系列データセットに対する大規模な実験は、提案されたフレームワークの有効性を示し、トランスフォーマーと非トランスフォーマーベースのTSFモデルの両方に対して、4つの実世界のデータセットに対して37.7\%の顕著な平均相対的な改善をもたらす。

Peak-Hour Series Forecasting (PHSF) is a crucial yet underexplored task in various domains. While state-of-the-art deep learning models excel in regular Time Series Forecasting (TSF), they struggle to achieve comparable results in PHSF. This can be attributed to the challenges posed by the high degree of non-stationarity in peak-hour series, which makes direct forecasting more difficult than standard TSF. Additionally, manually extracting the maximum value from regular forecasting results leads to suboptimal performance due to models minimizing the mean deficit. To address these issues, this paper presents Seq2Peak, a novel framework designed specifically for PHSF tasks, bridging the performance gap observed in TSF models. Seq2Peak offers two key components: the CyclicNorm pipeline to mitigate the non-stationarity issue, and a simple yet effective trainable-parameter-free peak-hour decoder with a hybrid loss function that utilizes both the original series and peak-hour series as supervised signals. Extensive experimentation on publicly available time series datasets demonstrates the effectiveness of the proposed framework, yielding a remarkable average relative improvement of 37.7% across four real-world datasets for both transformer- and non-transformer-based TSF models.
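
The abstract does not spell out how the peak-hour series is built. A natural reading, assumed here, is that it keeps one value per day: the maximum of that day's hourly readings. The sketch below shows that construction; the function name and the 24-hour default are illustrative.

```python
def peak_hour_series(hourly, hours_per_day=24):
    """Collapse an hourly series into one peak (maximum) value per complete day."""
    n_days = len(hourly) // hours_per_day  # a trailing partial day is dropped
    return [
        max(hourly[d * hours_per_day:(d + 1) * hours_per_day])
        for d in range(n_days)
    ]
```

Applying this function to the output of a regular forecaster is the "manual extraction" baseline that the abstract describes as suboptimal, since a model trained on average error has no particular incentive to get the daily extremes right.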

翻訳日:2023-07-06 17:38:47 公開日:2023-07-04

# プロンプトチューニングは、より遠く、対照的な学習を引き寄せる: 社会的バイアスを軽減するための2段階アプローチ

Prompt Tuning Pushes Farther, Contrastive Learning Pulls Closer: A Two-Stage Approach to Mitigate Social Biases ( http://arxiv.org/abs/2307.01595v1 )

ライセンス: Link先を確認

Yingji Li, Mengnan Du, Xin Wang, Ying Wang

(参考訳) プレトレーニング言語モデル(PLM)の表現能力が向上するにつれて、未処理のコーパスから社会的バイアスを継承するという懸念が高まっている。これまでのデバイアス技術のほとんどは、トレーニングコーパスのバランスをとるために、CDA(Counterfactual Data Augmentation)を使用していた。しかし、CDAは元のコーパスをわずかに修正し、異なる人口集団間の表現距離を狭い範囲に制限する。その結果,デバイアス化モデルは,テキストリソースの制限によるデバイアス化性能に影響を及ぼす対物対の違いに容易に適合することがわかった。本稿では,PLMのエンコーディングにおける社会的バイアスを軽減するために,Contrastive Learning with Continuous Prompt Augmentation (CCPA) を用いた対角的学習による2段階脱バイアスモデルを提案する。第1段階では,連続的なプロンプトチューニングに基づくデータ拡張法を提案する。第2段階では、コントラスト学習を利用して、強化されたサンプルペア間の表現距離を絞り、微調整されたPLMのパラメータをデバイアス符号化する。本手法は,トレーニングプロセスに難易度を加えることで,よりデバイアスな性能を達成するためのモデル指導を行う。大規模な実験の結果,CCPAはデバイアス性能においてベースラインよりも優れていた。一方、GLUEベンチマーク実験の結果、CCPAはPLMの言語モデリング能力を保っていることが示された。

As the representation capability of Pre-trained Language Models (PLMs) improves, there is growing concern that they will inherit social biases from unprocessed corpora. Most previous debiasing techniques used Counterfactual Data Augmentation (CDA) to balance the training corpus. However, CDA slightly modifies the original corpus, limiting the representation distance between different demographic groups to a narrow range. As a result, the debiasing model easily fits the differences between counterfactual pairs, which affects its debiasing performance with limited text resources. In this paper, we propose an adversarial training-inspired two-stage debiasing model using Contrastive learning with Continuous Prompt Augmentation (named CCPA) to mitigate social biases in PLMs' encoding. In the first stage, we propose a data augmentation method based on continuous prompt tuning to push farther the representation distance between sample pairs along different demographic groups. In the second stage, we utilize contrastive learning to pull closer the representation distance between the augmented sample pairs and then fine-tune PLMs' parameters to get debiased encoding. Our approach guides the model to achieve stronger debiasing performance by adding difficulty to the training process. Extensive experiments show that CCPA outperforms baselines in terms of debiasing performance. Meanwhile, experimental results on the GLUE benchmark show that CCPA retains the language modeling capability of PLMs.
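
Counterfactual Data Augmentation, the baseline technique the abstract builds on, can be illustrated with a toy word-swap. The word list below is a tiny hypothetical sample chosen for the example, and the prompt-tuning and contrastive stages of CCPA are not shown.

```python
import re

# Hypothetical, deliberately tiny swap list; real CDA word lists are much larger.
SWAPS = {
    "he": "she", "she": "he",
    "man": "woman", "woman": "man",
    "father": "mother", "mother": "father",
}


def counterfactual(sentence):
    """Return the sentence with each listed demographic term swapped for its pair."""
    def swap(match):
        word = match.group(0)
        replacement = SWAPS.get(word.lower(), word)
        # Preserve sentence-initial capitalization.
        return replacement.capitalize() if word[0].isupper() else replacement

    return re.sub(r"[A-Za-z]+", swap, sentence)
```

The original sentence and its counterfactual form one training pair. Because the two differ in only a word or two, their representations start out close together, which is the limitation the abstract attributes to plain CDA.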

翻訳日:2023-07-06 17:38:23 公開日:2023-07-04

# ディスプレイ広告における多要素創造のためのクロス要素組合せ選択

Cross-Element Combinatorial Selection for Multi-Element Creative in Display Advertising ( http://arxiv.org/abs/2307.01593v1 )

ライセンス: Link先を確認

Wei Zhang, Ping Zhang, Jian Dong, Yongkang Wang, Pengye Zhang, Bo Zhang, Xingxing Wang, Dong Wang

(参考訳) 広告制作の効果は、その視覚的な外観に大きく影響される。広告プラットフォームは、広告主が提供するクリエイティブ要素を組み合わせることで、異なる外観で広告クリエイティブを生成できる。しかし、広告クリエイティブ要素の増加に伴い、数え切れないほどの可能性から適切な組み合わせを選択することは困難になっている。業界主流のアプローチは、個別の創造的要素を個別に選択することであり、モデリングプロセスにおける創造的要素間の相互作用の重要性をしばしば見落としている。そこで本稿では,複数の創造的要素を対象とした多要素組合せ選択フレームワークcecsを提案する。エンコーダプロセスでは、現在候補の創造性に基づいて単一の創造的要素の表現を動的に調整するために、クロス要素相互作用が採用される。デコーダプロセスでは、創造的組み合わせ問題は複数の創造的要素のカスケード選択問題に変換される。候補間の関連をモデル化するためにカスケード設計を用いたポインタ機構を用いる。実世界のデータセットに関する総合的な実験は、CECSがオフラインメトリクスのSOTAスコアを達成したことを示している。さらに,cecsアルゴリズムが産業応用に応用され,ビジネス上有益である 6.02% ctr と 10.37% gmv lift が実現されている。

The effectiveness of ad creatives is greatly influenced by their visual appearance. Advertising platforms can generate ad creatives with different appearances by combining creative elements provided by advertisers. However, with the increasing number of ad creative elements, it becomes challenging to select a suitable combination from the countless possibilities. The industry's mainstream approach is to select individual creative elements independently, which often overlooks the importance of interaction between creative elements during the modeling process. In response, this paper proposes a Cross-Element Combinatorial Selection framework for multiple creative elements, termed CECS. In the encoder process, a cross-element interaction is adopted to dynamically adjust the expression of a single creative element based on the current candidate creatives. In the decoder process, the creative combination problem is transformed into a cascade selection problem of multiple creative elements. A pointer mechanism with a cascade design is used to model the associations among candidates. Comprehensive experiments on real-world datasets show that CECS achieved the SOTA score on offline metrics. Moreover, the CECS algorithm has been deployed in our industrial application, resulting in a significant 6.02% CTR and 10.37% GMV lift, which is beneficial to the business.

翻訳日:2023-07-06 17:37:58 公開日:2023-07-04

# ニューラルネットワークを用いたリー群対称性変換の学習

Learning Lie Group Symmetry Transformations with Neural Networks ( http://arxiv.org/abs/2307.01583v1 )

ライセンス: Link先を確認

Alex Gabel, Victoria Klein, Riccardo Valperga, Jeroen S. W. Lamb, Kevin Webster, Rick Quax, Efstratios Gavves

(参考訳) データセットにおける対称性の存在を検出し定量化する問題は、モデル選択、生成モデリング、データ解析などに有用である。ニューラルネットワークにおける既存のハードコーディング変換法では、そのタスクの対称性に関する事前の知識を必要とするが、この研究は、データセットに存在する未知の対称性、すなわち、通常フィールドで考慮される従来のもの(回転、スケーリング、翻訳)を超えたリー群対称性変換の発見と特徴付けに焦点を当てている。具体的には、データポイントごとに異なるパラメータ値を持つ変換の1パラメータサブグループによってデータセットが変換されるシナリオを検討する。我々の目標は、変換群とパラメータ値の分布を特徴付けることである。その結果,両環境におけるアプローチの有効性が示された。

The problem of detecting and quantifying the presence of symmetries in datasets is useful for model selection, generative modeling, and data analysis, amongst others. While existing methods for hard-coding transformations in neural networks require prior knowledge of the symmetries of the task at hand, this work focuses on discovering and characterizing unknown symmetries present in the dataset, namely, Lie group symmetry transformations beyond the traditional ones usually considered in the field (rotation, scaling, and translation). Specifically, we consider a scenario in which a dataset has been transformed by a one-parameter subgroup of transformations with different parameter values for each data point. Our goal is to characterize the transformation group and the distribution of the parameter values. The results showcase the effectiveness of the approach in both these settings.
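
A one-parameter subgroup of a Lie group has the form exp(tG) for a fixed generator G and a scalar parameter t. The sketch below is not the learning method from the paper. It only shows what such a subgroup looks like, using the standard generator of planar rotations and a truncated Taylor series. The function names and the number of series terms are assumptions.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def mat_exp(generator, t, terms=30):
    """Approximate exp(t * G) with a truncated Taylor series."""
    n = len(generator)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]  # holds (tG)^k / k!, starting at k = 0
    for k in range(1, terms):
        term = mat_mul(term, generator)
        term = [[t * x / k for x in row] for row in term]
        result = [
            [result[i][j] + term[i][j] for j in range(n)] for i in range(n)
        ]
    return result


# Generator of 2D rotations: exp(t * G) rotates the plane by angle t.
ROTATION_GENERATOR = [[0.0, -1.0], [1.0, 0.0]]
```

In the scenario the abstract describes, each data point would be transformed by exp(tG) with its own value of t, and the task is to recover both G and the distribution of t from the data alone.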

翻訳日:2023-07-06 17:37:40 公開日:2023-07-04

# IAdet:最も単純なループ中の人間オブジェクト検出

IAdet: Simplest human-in-the-loop object detection ( http://arxiv.org/abs/2307.01582v1 )

ライセンス: Link先を確認

Franco Marchesoni-Acland, Gabriele Facciolo

(参考訳) この研究は、Intelligent Annotation (IA) という名前のデータをアノテートしながらモデルをトレーニングするための戦略を提案する。 iaには,(1)データアノテーション支援,(2)背景モデルのトレーニング,(3)データポイントのアクティブ選択という3つのモジュールが含まれている。このフレームワークでは、シングルクラスのオブジェクト検出に特化したIAdetツールをオープンソースにしています。さらに,そのようなループシステムを自動的に評価する手法も考案した。 PASCAL VOCデータセットの場合、IAdetツールは、トレーニング済みのモデルを無償で提供しながら、データベースアノテーションの時間を25\%$に短縮する。これらの結果は、意図的に非常に単純なIAdet設計のために得られる。その結果、IAdetは複数の簡単な改善の影響を受けるようになり、強力なHuman-in-the-loopオブジェクト検出システムへの道を開いた。

This work proposes a strategy, named Intelligent Annotation (IA), for training models while annotating data. IA involves three modules: (1) assisted data annotation, (2) background model training, and (3) active selection of the next datapoints. Under this framework, we open-source the IAdet tool, which is specific to single-class object detection. Additionally, we devise a method for automatically evaluating such a human-in-the-loop system. For the PASCAL VOC dataset, the IAdet tool reduces the database annotation time by 25% while providing a trained model for free. These results are obtained for a deliberately very simple IAdet design. As a consequence, IAdet is susceptible to multiple easy improvements, paving the way for powerful human-in-the-loop object detection systems.

翻訳日:2023-07-06 17:37:27 公開日:2023-07-04

# 人間-the-Loopアノテーションのための最適かつ効率的なバイナリ質問

Optimal and Efficient Binary Questioning for Human-in-the-Loop Annotation ( http://arxiv.org/abs/2307.01578v1 )

ライセンス: Link先を確認

Franco Marchesoni-Acland, Jean-Michel Morel, Josselin Kherroubi, Gabriele Facciolo

(参考訳) データアノテーションは、人工知能ソリューションの解釈、研究、開発において極めて重要であるが、アクティブラーニングやマイナショットラーニングのようなほとんどの研究は、サンプル効率問題に焦点を当てている。本稿では, 予測器が与える注釈データ取得の補足問題について検討する。単純な二項分類設定では、最適一般解から実用的な方法まで幅広いスペクトルを提示する。この問題は、予測者が利用可能な場合、最小のyes/no質問数を持つバイナリ分類データセットの完全なアノテーションとしてフレーム化されている。一般的な二分問題の場合、解は符号理論において見出され、最適な質問戦略は可能なラベルのハフマン符号化によって与えられる。しかし、このアプローチは小さなデータセットサイズであっても計算が難しい。本稿では,いくつかのヒューリスティックスとプロキシコスト関数のルックアヘッド最小化に基づく代替実用ソリューションを提案する。提案手法は最適解と比較して解析され、複数の合成および実世界のデータセットで評価される。これらのデータセットでは、アノテーションの効率が大幅に向上する(23-86\%$)。

Even though data annotation is extremely important for interpretability, research and development of artificial intelligence solutions, most research efforts such as active learning or few-shot learning focus on the sample efficiency problem. This paper studies the neglected complementary problem of getting annotated data given a predictor. For the simple binary classification setting, we present the spectrum ranging from optimal general solutions to practical efficient methods. The problem is framed as the full annotation of a binary classification dataset with the minimal number of yes/no questions when a predictor is available. For the case of general binary questions, the solution is found in coding theory, where the optimal questioning strategy is given by the Huffman encoding of the possible labelings. However, this approach is computationally intractable even for small dataset sizes. We propose an alternative practical solution based on several heuristics and lookahead minimization of proxy cost functions. The proposed solution is analysed, compared with optimal solutions, and evaluated on several synthetic and real-world datasets. On these datasets, the method allows a significant improvement (23-86%) in annotation efficiency.
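
The coding-theory view can be made concrete on a toy scale. The sketch below assumes the predictor gives an independent probability of the positive class for each item (an assumption; the abstract does not state the probability model). It enumerates every joint labeling and computes the expected Huffman codeword length, which equals the expected number of yes/no questions under the optimal strategy.

```python
import heapq
import itertools


def labeling_probabilities(p_positive):
    """Probability of every joint labeling, assuming independent predictions."""
    probs = []
    for labels in itertools.product([0, 1], repeat=len(p_positive)):
        prob = 1.0
        for label, p in zip(labels, p_positive):
            prob *= p if label else 1.0 - p
        probs.append(prob)
    return probs


def expected_questions(probs):
    """Expected Huffman codeword length for the given outcome probabilities."""
    heap = [p for p in probs if p > 0]
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        # Each merge adds one question for every outcome below the merged node.
        total += a + b
        heapq.heappush(heap, a + b)
    return total
```

With a confident predictor, for instance `expected_questions(labeling_probabilities([0.9] * 4))`, the expected count falls well below the four questions needed to ask about each item separately. The list of labelings has length 2^n, which is the intractability the abstract points out and the reason the paper turns to heuristics.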

翻訳日:2023-07-06 17:37:15 公開日:2023-07-04

# In-Domain Self-Supervised Learningはリモートセンシング画像分類の改善につながる

In-Domain Self-Supervised Learning Can Lead to Improvements in Remote Sensing Image Classification ( http://arxiv.org/abs/2307.01645v1 )

ライセンス: Link先を確認

Ivica Dimitrovski, Ivan Kitanovski, Nikola Simidjievski, Dragi Kocev

(参考訳) 自己教師あり学習(ssl)は、大量のラベルなしデータを活用できるため、リモートセンシング画像分類に有望なアプローチとして登場した。従来の教師付き学習とは対照的に、sslは明示的なラベルなしでデータの表現を学ぶことを目指している。これは、ラベルのないデータのための擬似ラベルを作成し、事前学習されたモデルを学ぶために使用できる補助タスクを定式化することで達成される。事前学習されたモデルは、リモートセンシングイメージシーンの分類のような下流タスクで微調整することができる。本稿では,様々なリモートセンシング画像シーン分類データセットをダウンストリームタスクとして用いた,大規模なラベルなしリモートセンシングデータセットであるMillion AIDを用いたSSL事前トレーニングの有効性を解析する。具体的には、ImageNetデータセットを用いたViTの教師付き事前トレーニングとは対照的に、iBOTフレームワークとビジョントランスフォーマー(ViT)を併用したSSL事前トレーニングの有効性を評価する。さまざまな特性を持つ14のデータセットにわたる包括的な実験の結果、ドメイン内のSSLは、教師付きデータセットと比較してモデルの予測パフォーマンスを改善することが判明した。

Self-supervised learning (SSL) has emerged as a promising approach for remote sensing image classification due to its ability to leverage large amounts of unlabeled data. In contrast to traditional supervised learning, SSL aims to learn representations of data without the need for explicit labels. This is achieved by formulating auxiliary tasks that can be used to create pseudo-labels for the unlabeled data and learn pre-trained models. The pre-trained models can then be fine-tuned on downstream tasks such as remote sensing image scene classification. The paper analyzes the effectiveness of SSL pre-training using Million AID - a large unlabeled remote sensing dataset on various remote sensing image scene classification datasets as downstream tasks. More specifically, we evaluate the effectiveness of SSL pre-training using the iBOT framework coupled with Vision transformers (ViT) in contrast to supervised pre-training of ViT using the ImageNet dataset. The comprehensive experimental work across 14 datasets with diverse properties reveals that in-domain SSL leads to improved predictive performance of models compared to the supervised counterparts.

翻訳日:2023-07-06 17:31:05 公開日:2023-07-04

# ツール対応会話エージェントの挿入拡大

Insert-expansions for Tool-enabled Conversational Agents ( http://arxiv.org/abs/2307.01644v1 )

ライセンス: Link先を確認

Andreas Göldi and Roman Rietsche

(参考訳) 本稿では,このプロンプト手法によって生成された明示的な推論パスにおけるツール(あるいはプラグイン)の使用に注目し,大規模言語モデルにおける思考連鎖の高度な実装について述べる。ツールが使える会話エージェントは、検索エンジンや電卓などのツールが本来のユーザー意図から逸脱するなど、サイドトラック化されることが多い。そこで我々は,ユーザがツールになり,必要な詳細を提供し,リクエストを精査するコンセプトを探求する。会話分析を通して、我々はこの相互作用を、好ましい応答を促進するために設計された中間的会話である挿入膨張として特徴づける。我々は,この「ユーザ・アズ・ア・ツール」アプローチから生じる可能性について,直接比較による2つの経験的研究から検討し,レコメンデーション領域の利点を見出す。

This paper delves into an advanced implementation of Chain-of-Thought-Prompting in Large Language Models, focusing on the use of tools (or "plug-ins") within the explicit reasoning paths generated by this prompting method. We find that tool-enabled conversational agents often become sidetracked, as additional context from tools like search engines or calculators diverts from original user intents. To address this, we explore a concept wherein the user becomes the tool, providing necessary details and refining their requests. Through Conversation Analysis, we characterize this interaction as insert-expansion - an intermediary conversation designed to facilitate the preferred response. We explore possibilities arising from this 'user-as-a-tool' approach in two empirical studies using direct comparison, and find benefits in the recommendation domain.

翻訳日:2023-07-06 17:30:48 公開日:2023-07-04

# 知識の強化を促進する思考連鎖

Chain of Thought Prompting Elicits Knowledge Augmentation ( http://arxiv.org/abs/2307.01640v1 )

ライセンス: Link先を確認

Dingjun Wu, Jing Zhang, Xinmei Huang

(参考訳) 知識強化されたディープラーニングパラダイムは、ドメイン知識を同定し、深層モデルに統合するパラダイムを指す。従来の手法では、様々なソースから外部知識を集めるためにタスク固有のアプローチが用いられる。対照的に、大きな言語モデルは広範囲に事前訓練されており、外部知識の包括的な情報源として機能する。本稿では,深層学習のための知識を増強するChain-of-Thoughtベースの手法であるCoT-KAを提案する。 CoT-KAは、従来の拡張手法に必要な知識検索や知識推論モデルの必要性を回避する。以上の結果から,CoT-KAは,さまざまな推論タスクにおいて利用可能な11のベンチマークの過半数において,純粋なCoT法と非拡張法の両方に優れることが示された。

The knowledge-augmented deep learning paradigm refers to a paradigm in which domain knowledge is identified and integrated into deep models. Conventional methods typically employ task-specific approaches to gather external knowledge from various sources. In contrast, large language models are extensively pre-trained and can serve as a comprehensive source of external knowledge. In this paper, we propose CoT-KA, a Chain-of-Thought-based method that augments knowledge for deep learning. CoT-KA avoids the need for additional knowledge retrieval or knowledge reasoning models, as required in conventional augmentation methods. Our results demonstrate that CoT-KA outperforms both pure CoT-based methods and the non-augmented method across the majority of eleven publicly available benchmarks for various reasoning tasks.

翻訳日:2023-07-06 17:30:32 公開日:2023-07-04

# 相互コヒーレンス近似のためのヒューリスティックアルゴリズム

Heuristic Algorithms for the Approximation of Mutual Coherence ( http://arxiv.org/abs/2307.01639v1 )

ライセンス: Link先を確認

Gregor Betz, Vera Chekan, Tamara Mchedlidze

(参考訳) 相互一貫性は2つの意見の類似性の尺度である。この概念は哲学に由来するが、Wahl-O-Matシステムのような幅広い技術には必須である。ドイツでは、この制度は有権者が政治的嗜好に最も近い候補者を見つけるのに役立つ。相互コヒーレンスの正確な計算は、意見のすべての部分集合の反復のために非常に時間がかかる。さらに、各サブセットに対して、SATモデルカウント問題(英語版)のインスタンスを解く必要があるが、これはコンピュータ科学において難しい問題である。この研究は、この計算を加速する最初の研究である。本稿では,いわゆる確認値の分布を3つのガウスの混合としてモデル化し,モデルパラメータを推定する効率的なヒューリスティックスを提案する。相互コヒーレンスは、その分布の期待値と近似される。提示されたアルゴリズムのいくつかは完全に多項式時間であり、他のアルゴリズムは少数のsatモデルカウント問題の解のみを必要とする。我々の最善のアルゴリズムの平均二乗誤差は 0.0035 以下であり、効率を考慮すると重要ではない。さらに、wahl-o-matライクなシステムでは精度が十分である。

Mutual coherence is a measure of similarity between two opinions. Although the notion comes from philosophy, it is essential for a wide range of technologies, e.g., the Wahl-O-Mat system. In Germany, this system helps voters to find candidates that are the closest to their political preferences. The exact computation of mutual coherence is highly time-consuming due to the iteration over all subsets of an opinion. Moreover, for every subset, an instance of the SAT model counting problem has to be solved which is known to be a hard problem in computer science. This work is the first study to accelerate this computation. We model the distribution of the so-called confirmation values as a mixture of three Gaussians and present efficient heuristics to estimate its model parameters. The mutual coherence is then approximated with the expected value of the distribution. Some of the presented algorithms are fully polynomial-time, others only require solving a small number of instances of the SAT model counting problem. The average squared error of our best algorithm lies below 0.0035 which is insignificant if the efficiency is taken into account. Furthermore, the accuracy is precise enough to be used in Wahl-O-Mat-like systems.

翻訳日:2023-07-06 17:30:19 公開日:2023-07-04
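
The mutual-coherence abstract above says the coherence is approximated by the expected value of a mixture of three Gaussians. As a minimal sketch of that last step only: the mean of a Gaussian mixture is the weight-averaged component means, and the standard deviations do not enter. The function name `mixture_mean` and the parameter values below are hypothetical illustrations, not values or code from the paper, and the paper's heuristics for estimating the parameters are not shown.

```python
def mixture_mean(weights, means):
    """Expected value of a Gaussian mixture: sum_i w_i * mu_i.

    Weights are normalized here so they need not sum to 1 exactly.
    """
    total = sum(weights)
    return sum(w * m for w, m in zip(weights, means)) / total


# Hypothetical parameters for three components (not taken from the paper).
weights = [0.2, 0.5, 0.3]
means = [-0.4, 0.0, 0.6]

approx_coherence = mixture_mean(weights, means)
print(round(approx_coherence, 4))
```

Under this reading, once the mixture parameters have been estimated, the approximation itself costs a single pass over three numbers.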

# 複数ネットワーク上のランダムウォーク

Random Walk on Multiple Networks ( http://arxiv.org/abs/2307.01637v1 )

ライセンス: Link先を確認

Dongsheng Luo, Yuchen Bian, Yaowei Yan, Xiong Yu, Jun Huan, Xiao Liu, Xiang Zhang

(参考訳) Random Walkはネットワークの構造を探索するための基本的なアルゴリズムであり、ローカルなコミュニティ検出やネットワーク埋め込みといった多くのタスクで使用できる。既存のランダムウォーク手法は、限られた情報を含む単一ネットワークに基づいている。対照的に、実際のデータは、しばしば異なるタイプまたは異なるソースのエンティティを含んでおり、それらは包括的であり、複数のネットワークによりより良くモデル化される。本稿では,複数のネットワークにおけるリッチな情報を活用し,エンティティの推論を改善するために,複数ネットワーク上のランダムウォーク(RWM)を提案する。 RWMは柔軟で、多重ネットワークと一般的な多重ネットワークの両方をサポートし、ネットワーク間の多対多ノードマッピングを形成する。 RWMは各ネットワーク上でランダムなウォーカを送信し、開始ノードの局所的近接(すなわちノード訪問確率)を得る。同様の訪問確率を持つ歩行者はお互いを強化します。 RWMの収束特性を理論的に解析する。理論的性能保証を伴う2つの近似法を効率的な計算法として提案する。リンク予測,ネットワーク埋め込み,地域コミュニティ検出にRWMを適用した。合成データセットと実世界のデータセットの両方で実施された総合実験は、RWMの有効性と効率を実証している。

Random Walk is a basic algorithm to explore the structure of networks, which can be used in many tasks, such as local community detection and network embedding. Existing random walk methods are based on single networks that contain limited information. In contrast, real data often contain entities with different types and/or from different sources, which are comprehensive and can be better modeled by multiple networks. To take advantage of rich information in multiple networks and make better inferences on entities, in this study, we propose random walk on multiple networks, RWM. RWM is flexible and supports both multiplex networks and general multiple networks, which may form many-to-many node mappings between networks. RWM sends a random walker on each network to obtain the local proximity (i.e., node visiting probabilities) w.r.t. the starting nodes. Walkers with similar visiting probabilities reinforce each other. We theoretically analyze the convergence properties of RWM. Two approximation methods with theoretical performance guarantees are proposed for efficient computation. We apply RWM in link prediction, network embedding, and local community detection. Comprehensive experiments conducted on both synthetic and real-world datasets demonstrate the effectiveness and efficiency of RWM.

翻訳日:2023-07-06 17:30:02 公開日:2023-07-04
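
The RWM abstract above describes local proximity as node visiting probabilities with respect to a starting node. The sketch below illustrates that notion with an ordinary random walk with restart on a single network. It is not RWM itself: the mutual reinforcement between walkers on different networks is not modeled. The function name, the restart probability of 0.15, and the toy path graph are all assumptions chosen for illustration.

```python
def random_walk_with_restart(adj, start, restart=0.15, iters=100):
    """Visiting probabilities of a restarting walker on ONE network.

    adj maps each node to a non-empty list of its neighbours.
    Each step: with probability `restart` jump back to `start`,
    otherwise move to a uniformly chosen neighbour.
    """
    nodes = list(adj)
    p = {v: 1.0 if v == start else 0.0 for v in nodes}
    for _ in range(iters):
        # Restart mass goes to the start node only.
        nxt = {v: (restart if v == start else 0.0) for v in nodes}
        for u in nodes:
            share = (1.0 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        p = nxt
    return p


# Toy path graph a - b - c - d (hypothetical example data).
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
probs = random_walk_with_restart(adj, "a")
for node in adj:
    print(node, round(probs[node], 3))
```

Although `a` and `d` are structurally symmetric in this graph, the restart at `a` gives `a` a higher visiting probability than `d`, which is what makes the scores a measure of proximity to the starting node.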

# HAGNN: 異種グラフニューラルネットワークのためのハイブリッドアグリゲーション

HAGNN: Hybrid Aggregation for Heterogeneous Graph Neural Networks ( http://arxiv.org/abs/2307.01636v1 )

ライセンス: Link先を確認

Guanghui Zhu, Zhennan Zhu, Hongyang Chen, Chunfeng Yuan, Yihua Huang

(参考訳) 異種グラフニューラルネットワーク(GNN)は異種グラフの処理に成功している。既存の異種GNNでは、メタパスが重要な役割を果たす。しかし、近年の研究はメタパスのない単純な同質グラフモデルでも同等の結果が得られることを指摘し、メタパスの必要性を疑問視している。本稿では,まず,メタパスベースモデルとメタパスフリーモデル,すなわちノードアグリゲーションのための近傍を選択する方法に関する本質的な違いについて述べる。そこで我々は,ヘテロジニアスグラフのリッチな型意味情報,すなわちHAGNN(Hybrid Aggregation for Heterogeneous GNNs)を包括的に活用するための新しいフレームワークを提案する。 HAGNNの中核は、ノード集約のためにメタパス隣人と直接接続された隣人を活用することである。 hagnnは全体の集約プロセスを、メタパスベースのイントラタイプアグリゲーションとメタパスフリーインタータイプアグリゲーションの2つのフェーズに分割する。型内アグリゲーションフェーズでは,融合メタパスグラフと呼ばれる新しいデータ構造を提案し,その上で構造的意味認識アグリゲーションを行う。最後に、各フェーズによって生成される埋め込みを組み合わせる。既存の異種GNNモデルと比較して、HAGNNは異種グラフの異種性を完全に活用することができる。ノード分類、ノードクラスタリング、リンク予測タスクに関する大規模な実験結果から、HAGNNは既存のモードよりも優れており、HAGNNの有効性を示している。

Heterogeneous graph neural networks (GNNs) have been successful in handling heterogeneous graphs. In existing heterogeneous GNNs, meta-paths play an essential role. However, recent work pointed out that a simple homogeneous graph model without meta-paths can also achieve comparable results, which calls into question the necessity of meta-paths. In this paper, we first present the intrinsic difference between meta-path-based and meta-path-free models, i.e., how to select neighbors for node aggregation. Then, we propose a novel framework to utilize the rich type semantic information in heterogeneous graphs comprehensively, namely HAGNN (Hybrid Aggregation for Heterogeneous GNNs). The core of HAGNN is to leverage the meta-path neighbors and the directly connected neighbors simultaneously for node aggregations. HAGNN divides the overall aggregation process into two phases: meta-path-based intra-type aggregation and meta-path-free inter-type aggregation. During the intra-type aggregation phase, we propose a new data structure called fused meta-path graph and perform structural semantic aware aggregation on it. Finally, we combine the embeddings generated by each phase. Compared with existing heterogeneous GNN models, HAGNN can take full advantage of the heterogeneity in heterogeneous graphs. Extensive experimental results on node classification, node clustering, and link prediction tasks show that HAGNN outperforms the existing models, demonstrating the effectiveness of HAGNN.

翻訳日:2023-07-06 17:29:45 公開日:2023-07-04

# ChildPlay: 子どもの視線行動を理解するための新しいベンチマーク

ChildPlay: A New Benchmark for Understanding Children's Gaze Behaviour ( http://arxiv.org/abs/2307.01630v1 )

ライセンス: Link先を確認

Samy Tafasca, Anshul Gupta, Jean-Marc Odobez

(参考訳) アイコンタクトや共有注意などの視線行動は、子どもの発達障害を診断するための重要なマーカーである。これまでの研究ではこれらの要素のいくつかを検討したが、分析は通常プライベートデータセット上で行われ、実験室の設定に限定されている。さらに、一般公開されている視線目標予測ベンチマークは主に大人のインスタンスを含んでおり、それらで訓練されたモデルは幼児のシナリオに適用しにくい。本稿では,子どもと、子どもと相互作用する大人の視線目標を予測するための最初の研究を提案する。この目的のために,子どもがコントロールされていない環境(幼稚園,セラピーセンター,保育園など)で大人と遊んで交流する様子を収録した短いビデオクリップのキュレートされたコレクションであるChildPlayデータセットを紹介し、豊富な視線情報をアノテーションとして付与する。さらに,近年の幾何保存型の奥行き推定法を活用して人物の3次元視野(3DFoV)内のシーン部分を明示的に特定する、幾何学的に基礎づけられた視線目標予測の新しいモデルを提案する。我々のモデルは、ベンチマークデータセットとChildPlayで最先端の結果を達成する。また, 顔への注視(looking at faces)の予測性能は子どもでは成人よりもずっと悪く, 子どもの視線アノテーションを用いてモデルを微調整することで有意に改善できることが示された。私たちのデータセットとモデルは公開される予定である。

Gaze behaviors such as eye contact or shared attention are important markers for diagnosing developmental disorders in children. While previous studies have looked at some of these elements, the analysis is usually performed on private datasets and is restricted to lab settings. Furthermore, publicly available gaze target prediction benchmarks mostly contain instances of adults, which makes models trained on them less applicable to scenarios with young children. In this paper, we propose the first study for predicting the gaze target of children and interacting adults. To this end, we introduce the ChildPlay dataset: a curated collection of short video clips featuring children playing and interacting with adults in uncontrolled environments (e.g. kindergarten, therapy centers, preschools etc.), which we annotate with rich gaze information. We further propose a new model for gaze target prediction that is geometrically grounded by explicitly identifying the scene parts in the 3D field of view (3DFoV) of the person, leveraging recent geometry-preserving depth inference methods. Our model achieves state-of-the-art results on benchmark datasets and ChildPlay. Furthermore, results show that looking-at-faces prediction performance on children is much worse than on adults, and can be significantly improved by fine-tuning models using child gaze annotations. Our dataset and models will be made publicly available.

翻訳日:2023-07-06 17:29:19 公開日:2023-07-04

# リカレントトレンド予測ニューラルネットワークに基づく予測組込みスケジューリングによるスマートホーム環境の再生可能エネルギー管理

Renewable energy management in smart home environment via forecast embedded scheduling based on Recurrent Trend Predictive Neural Network ( http://arxiv.org/abs/2307.01622v1 )

ライセンス: Link先を確認

Mert Nakıp, Onur Çopur, Emrah Biyik, Cüneyt Güzeliş

(参考訳) スマートホームエネルギー管理システムは、配電網をより効率的かつ確実に運用し、分散型再生可能エネルギー源の効果的な普及を可能にする。これらのシステムは、需要と再生可能エネルギー発電の不確実性を扱うことのできる堅牢な予測、最適化、制御/スケジューリングアルゴリズムに依存している。本稿では,住宅の需要制御を効率的に行うために、Recurrent Trend Predictive Neural Network based Forecast Embedded Scheduling (rTPNN-FES)と呼ばれるMLアルゴリズムを提案する。 rTPNN-FESは、再生可能エネルギーの発電量を予測し、同時に家電をスケジューリングする新しいニューラルネットワークアーキテクチャである。組込み構造により、rTPNN-FESは予測とスケジューリングのための別々のアルゴリズムの使用を排除し、予測エラーに対して堅牢なスケジュールを生成する。本稿では,IoT対応スマートホームにおける提案アルゴリズムの性能評価も行う。評価結果から, rTPNN-FESは最適化手法より37.5倍高速にほぼ最適なスケジューリングを提供し、最先端の予測技術より優れていることがわかった。

Smart home energy management systems help the distribution grid operate more efficiently and reliably, and enable effective penetration of distributed renewable energy sources. These systems rely on robust forecasting, optimization, and control/scheduling algorithms that can handle the uncertain nature of demand and renewable generation. This paper proposes an advanced ML algorithm, called Recurrent Trend Predictive Neural Network based Forecast Embedded Scheduling (rTPNN-FES), to provide efficient residential demand control. rTPNN-FES is a novel neural network architecture that simultaneously forecasts renewable energy generation and schedules household appliances. By its embedded structure, rTPNN-FES eliminates the utilization of separate algorithms for forecasting and scheduling and generates a schedule that is robust against forecasting errors. This paper also evaluates the performance of the proposed algorithm for an IoT-enabled smart home. The evaluation results reveal that rTPNN-FES provides near-optimal scheduling $37.5$ times faster than the optimization while outperforming state-of-the-art forecasting techniques.

翻訳日:2023-07-06 17:28:56 公開日:2023-07-04

# 量子ダイレクト通信のための2と3のプレイヤー方式

A 2 & 3 Player Scheme for Quantum Direct Communication ( http://arxiv.org/abs/2307.01620v1 )

ライセンス: Link先を確認

Theodore Andronikos and Alla Sirokofskich

(参考訳) 本稿では,aliceとbobの量子的セキュアな直接通信を実現する2つの情報理論的セキュアプロトコルと,alice,bod,charlieの2つのプロトコルを紹介する。どちらのプロトコルも、プレイヤーの絡み合った複合システムに秘密情報を埋め込むのと同じ新しい方法を使っている。情報エンコーディングの仕方は,本論文の目新しさであり,この分野の先行作品と比較して特徴を区別するものである。この手法の利点は、拡張が容易であり、2番目のプロトコルで示されるように、3つ以上のプレイヤーを含む設定に一般化できることである。この特徴は、2人の空間分離されたプレイヤーが、彼女が完全な秘密を明らかにするために、結合し、アリスに送信しなければならない秘密情報の一部だけをポッセするときに有益である。 3つのプレイヤプロトコルを使用することで、典型的なqsdcプロトコルを2回適用することなく、このタスクを1回で達成することができる。両方のプロトコルのもう1つの特徴は、単純さと均一性である。 2つのプレーヤプロトコルは、EPRペアとGHZトリプル上の3つのプレーヤプロトコルに依存しています。同じ静脈では、局所量子回路は類似または同一であり、アダマールゲートとCNOTゲートのみを使用するため容易に構成可能である。

This paper introduces two information-theoretically secure protocols that achieve quantum secure direct communication between Alice and Bob in the first case, and among Alice, Bob and Charlie in the second case. Both protocols use the same novel method to embed the secret information in the entangled composite system of the players. The way of encoding the information is the main novelty of this paper and the distinguishing feature compared to previous works in the field. The advantage of this method is that it is easily extensible and can be generalized to a setting involving three, or even more, players, as demonstrated with the second protocol. This trait can be beneficial when two spatially separated players possess only part of the secret information that must be combined and transmitted to Alice in order for her to reveal the complete secret. Using the three player protocol, this task can be achieved in one go, without the need to apply a typical QSDC protocol twice, where Alice first receives Bob's information and afterwards Charlie's information. Another characteristic of both protocols is their simplicity and uniformity. The two player protocol relies on EPR pairs, and the three player protocol on GHZ triples, which can be easily prepared with our current technology. In the same vein, the local quantum circuits are similar or identical, and are easily constructible as they employ only Hadamard and CNOT gates.

翻訳日:2023-07-06 17:28:39 公開日:2023-07-04
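
The quantum direct communication abstract above notes that the two-player protocol relies on EPR pairs and that the circuits use only Hadamard and CNOT gates. As a small illustration of why those two gates suffice to prepare an EPR pair, the sketch below simulates the textbook construction on a two-qubit state vector in plain Python. It does not implement the paper's protocol or its information-embedding method; the function names and the amplitude ordering (index = 2*q0 + q1) are assumptions made for this example.

```python
import math


def apply_hadamard_q0(state):
    """Hadamard on qubit 0 of a 2-qubit state [a00, a01, a10, a11]."""
    s = 1.0 / math.sqrt(2.0)
    a00, a01, a10, a11 = state
    return [s * (a00 + a10), s * (a01 + a11),
            s * (a00 - a10), s * (a01 - a11)]


def apply_cnot(state):
    """CNOT with control qubit 0 and target qubit 1: swaps |10> and |11>."""
    a00, a01, a10, a11 = state
    return [a00, a01, a11, a10]


# Start in |00>, apply H to qubit 0, then CNOT(0 -> 1).
bell = apply_cnot(apply_hadamard_q0([1.0, 0.0, 0.0, 0.0]))

# Measurement probabilities for outcomes 00, 01, 10, 11.
probabilities = [abs(a) ** 2 for a in bell]
print(probabilities)
```

The resulting state is (|00> + |11>)/sqrt(2): only the outcomes 00 and 11 have non-zero probability, so the two qubits are perfectly correlated.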

# SageFormer: 多変量時系列予測のためのグラフ強化変換器

SageFormer: Series-Aware Graph-Enhanced Transformers for Multivariate Time Series Forecasting ( http://arxiv.org/abs/2307.01616v1 )

ライセンス: Link先を確認

Zhenwei Zhang, Xin Wang, Yuantao Gu

(参考訳) 多変量時系列予測は多様な領域において重要な役割を果たす。近年のディープラーニング手法,特にトランスフォーマーの進歩は,将来性を示しているが,シリーズ間の依存関係の重要性に対処する上ではまだギャップが残っている。本稿では,グラフ構造を用いて時系列間の依存関係を効果的にキャプチャし,モデル化するシリーズ対応グラフ拡張トランスフォーマーモデルであるSageFormerを紹介する。 sageformerは、2つの重要な課題に取り組んでいる。シリーズ間で多様な時間パターンを効果的に表現し、シリーズ間で冗長な情報を緩和する。重要なのは、提案されたシリーズアウェアフレームワークが既存のトランスフォーマーベースのモデルとシームレスに統合され、シリーズ間の依存関係をモデル化する能力が強化されることだ。実世界および合成データセットに関する広範な実験を通じて、従来の最先端のアプローチと比較して、sageformerの優れたパフォーマンスを示す。

Multivariate time series forecasting plays a critical role in diverse domains. While recent advancements in deep learning methods, especially Transformers, have shown promise, there remains a gap in addressing the significance of inter-series dependencies. This paper introduces SageFormer, a Series-aware Graph-enhanced Transformer model designed to effectively capture and model dependencies between series using graph structures. SageFormer tackles two key challenges: effectively representing diverse temporal patterns across series and mitigating redundant information among series. Importantly, the proposed series-aware framework seamlessly integrates with existing Transformer-based models, augmenting their ability to model inter-series dependencies. Through extensive experiments on real-world and synthetic datasets, we showcase the superior performance of SageFormer compared to previous state-of-the-art approaches.

翻訳日:2023-07-06 17:28:14 公開日:2023-07-04

# 非条件音声合成のためのganの絡み合い

Disentanglement in a GAN for Unconditional Speech Synthesis ( http://arxiv.org/abs/2307.01673v1 )

ライセンス: Link先を確認

Matthew Baas and Herman Kamper

(参考訳) 明示的な条件付けなしに、潜在空間から直接リアルな音声を合成できるモデルを開発することができるか? 過去10年間、いくつかの努力にもかかわらず、過去の敵対的および拡散ベースのアプローチは、小さなボカブラリデータセットでも、これを達成するのに苦労している。そこで本稿では,無条件音声合成のための生成対向ネットワークであるAudioStyleGAN(ASGAN)を提案する。画像合成モデルのstyleganファミリに基づいて、asganはサンプリングされたノイズを不連続な潜在ベクトルにマッピングし、オーディオ特徴のシーケンスにマッピングすることで、各層で信号エイリアシングが抑制される。 AsGANのトレーニングを成功させるためには、適応型判別器の増分修正など、いくつかの新しい手法を導入する。小語彙のGoogle Speech Commands digitsデータセットに適用し、非条件音声合成の最先端結果を達成する。また、既存の最高性能拡散モデルよりもかなり高速である。我々は,asganの潜在空間が不連続であることを確認する。空間内の単純な線形演算が,訓練中に見当たらないいくつかのタスクを実行するためにどのように利用できるかを示す。具体的には,音声変換,音声強調,話者照合,キーワード分類における評価を行う。我々の研究は、ganは依然として無条件音声合成環境において非常に競争力があり、非知覚タスクの一般化を支援するために不連続な潜在空間が利用できることを示している。コード、モデル、サンプル:https://github.com/RF5/simple-asgan/

Can we develop a model that can synthesize realistic speech directly from a latent space, without explicit conditioning? Despite several efforts over the last decade, previous adversarial and diffusion-based approaches still struggle to achieve this, even on small-vocabulary datasets. To address this, we propose AudioStyleGAN (ASGAN) -- a generative adversarial network for unconditional speech synthesis tailored to learn a disentangled latent space. Building upon the StyleGAN family of image synthesis models, ASGAN maps sampled noise to a disentangled latent vector which is then mapped to a sequence of audio features so that signal aliasing is suppressed at every layer. To successfully train ASGAN, we introduce a number of new techniques, including a modification to adaptive discriminator augmentation which probabilistically skips discriminator updates. We apply it on the small-vocabulary Google Speech Commands digits dataset, where it achieves state-of-the-art results in unconditional speech synthesis. It is also substantially faster than existing top-performing diffusion models. We confirm that ASGAN's latent space is disentangled: we demonstrate how simple linear operations in the space can be used to perform several tasks unseen during training. Specifically, we perform evaluations in voice conversion, speech enhancement, speaker verification, and keyword classification. Our work indicates that GANs are still highly competitive in the unconditional speech synthesis landscape, and that disentangled latent spaces can be used to aid generalization to unseen tasks. Code, models, samples: https://github.com/RF5/simple-asgan/

翻訳日:2023-07-06 17:21:21 公開日:2023-07-04

# ノルウェーの自動音声認識の強化

Boosting Norwegian Automatic Speech Recognition ( http://arxiv.org/abs/2307.01672v1 )

ライセンス: Link先を確認

Javier de la Rosa, Rolv-Arild Braaten, Per Egil Kummervold, Freddy Wetjen, Svein Arne Brygfjeld

(参考訳) 本稿では,ノルウェーの2つの公用語である Bokmål と Nynorsk の音声認識モデルについて述べる。複数のノルウェー語音声データセットにおける様々な大きさのモデルと事前学習アプローチの性能を比較した。さらに、従来の最先端ASRモデルやドメイン外データセットに対して、これらのモデルのパフォーマンスを測定する。ノルウェー議会音声コーパス(NPSC)の技術状態を、単語誤り率(WER)が17.10%から7.60%に改善し、モデルではBokmålが5.81%、Nynorskが11.54%となった。ノルウェーのASRモデルをさらに改善するための課題と潜在的な解決策についても論じる。

In this paper, we present several baselines for automatic speech recognition (ASR) models for the two official written languages in Norway: Bokmål and Nynorsk. We compare the performance of models of varying sizes and pre-training approaches on multiple Norwegian speech datasets. Additionally, we measure the performance of these models against previous state-of-the-art ASR models, as well as on out-of-domain datasets. We improve the state of the art on the Norwegian Parliamentary Speech Corpus (NPSC) from a word error rate (WER) of 17.10% to 7.60%, with models achieving 5.81% for Bokmål and 11.54% for Nynorsk. We also discuss the challenges and potential solutions for further improving ASR models for Norwegian.

翻訳日:2023-07-06 17:20:51 公開日:2023-07-04
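
The Norwegian ASR abstract above reports results as word error rate (WER). For readers unfamiliar with the metric, here is a minimal sketch of the usual definition: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name and the toy sentences are hypothetical and unrelated to the paper's evaluation code or data. The last lines compute the relative reduction implied by the reported NPSC figures (17.10% to 7.60%).

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level Levenshtein distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)


# Toy example (hypothetical strings): one substitution and one deletion.
wer = word_error_rate("a b c d", "a x c")
print(wer)

# Relative WER reduction implied by the figures quoted in the abstract.
relative_reduction = (17.10 - 7.60) / 17.10
print(round(relative_reduction, 3))
```

By this arithmetic, the reported improvement on NPSC corresponds to a relative WER reduction of roughly 56%.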

# 拡散コントラスト発散を持つエネルギーベースモデルの訓練

Training Energy-Based Models with Diffusion Contrastive Divergences ( http://arxiv.org/abs/2307.01668v1 )

ライセンス: Link先を確認

Weijian Luo and Hao Jiang and Tianyang Hu and Jiacheng Sun and Zhenguo Li and Zhihua Zhang

(参考訳) エネルギーベースモデル(EBM)は生成モデルに広く用いられている。コントラスティブ・ダイバージェンス(CD:Contrastive Divergence)は、EBMの主要なトレーニング目標であり、マルコフ・チェイン・モンテカルロ法(MCMC)を用いてEBMからサンプリングする必要がある。このため計算負荷とCDの妥当性の間に両立しがたいトレードオフが生じる。収束までのMCMCの実行は計算集約的である。一方、短期実行MCMCは、扱いにくく無視できない余分なパラメータ勾配項をもたらす。本稿では,CDをDCD(Diffusion Contrastive Divergence)ファミリーの特別な例と見なして,CDの一般的な解釈を提供する。 CD で用いられるランジュバン力学を、EBM のパラメータに依存しない他の拡散過程に置き換えることにより,より効率的なダイバージェンスを提案する。提案したDCDは,CDよりも計算効率が良く,無視できない勾配項による制約を受けないことを示す。提案するDCDの利点を示すために,合成データモデリングと高次元画像のデノイジングと生成の両方を含む集中実験を行った。合成データ学習と画像デノイジング実験において,提案したDCDは大きな差でCDを上回った。画像生成実験において、提案するDCDは、CelebA $32\times 32$データセットを生成するためのエネルギーベースモデルを訓練することができ、既存のEBMに匹敵する。

Energy-Based Models (EBMs) have been widely used for generative modeling. Contrastive Divergence (CD), a prevailing training objective for EBMs, requires sampling from the EBM with Markov Chain Monte Carlo methods (MCMCs), which leads to an irreconcilable trade-off between the computational burden and the validity of the CD. Running MCMCs till convergence is computationally intensive. On the other hand, short-run MCMC brings in an extra non-negligible parameter gradient term that is difficult to handle. In this paper, we provide a general interpretation of CD, viewing it as a special instance of our proposed Diffusion Contrastive Divergence (DCD) family. By replacing the Langevin dynamics used in CD with other EBM-parameter-free diffusion processes, we propose a more efficient divergence. We show that the proposed DCDs are both more computationally efficient than the CD and are not limited to a non-negligible gradient term. We conduct intensive experiments, including both synthetic data modeling and high-dimensional image denoising and generation, to show the advantages of the proposed DCDs. On the synthetic data learning and image denoising experiments, our proposed DCD outperforms CD by a large margin. In image generation experiments, the proposed DCD is capable of training an energy-based model for generating the CelebA $32\times 32$ dataset, which is comparable to existing EBMs.

翻訳日:2023-07-06 17:20:37 公開日:2023-07-04

# 精神疲労モニタリングのためのセンサとシステム--体系的レビュー

Sensors and Systems for Monitoring Mental Fatigue: A systematic review ( http://arxiv.org/abs/2307.01666v1 )

ライセンス: Link先を確認

Prabin Sharma, Joanna C. Justus, Govinda R. Poudel

(参考訳) 精神疲労は、自動車事故、医療ミス、職場での生産性の低下、およびeラーニング環境における学生の離職の主な原因である。精神的な疲労を確実に追跡できるセンサーやシステムの開発は、事故を防止し、エラーを低減し、職場の生産性を向上させる。本稿では,心的疲労の理論モデルに関する批判的概要,センサ技術の鍵となる説明,およびバイオセンサーを用いた人間の心的疲労追跡システムを用いた最近の研究の体系的レビューについて述べる。ヒトの精神疲労の検出と追跡に焦点をあてた最近の文献を体系的に調査・レビューした。調査の結果、57の研究(n=1082)が行われ、その大半は心的疲労を追跡するために脳波(eeg)ベースのセンサーを用いた。脳波センサは疲労検出に適度から良好な感度を提供することがわかった。特に,高濃度脳波センサを用いた心的疲労検出の漸進的効果は認められなかった。この結果を踏まえて,ウェアラブル脳波と環境センサの統合について,実世界のモニタリングを実現するための重要な議論を行う。半自律型・自律型産業におけるウェアラブルセンサと疲労監視システムの普及に向けての技術の進歩と適応に必要な今後の課題について検討する。

Mental fatigue is a leading cause of motor vehicle accidents, medical errors, loss of workplace productivity, and student disengagement in e-learning environments. Development of sensors and systems that can reliably track mental fatigue can prevent accidents, reduce errors, and help increase workplace productivity. This review provides a critical summary of theoretical models of mental fatigue, a description of key enabling sensor technologies, and a systematic review of recent studies using biosensor-based systems for tracking mental fatigue in humans. We conducted a systematic search and review of recent literature which focused on detection and tracking of mental fatigue in humans. The search yielded 57 studies (N=1082), the majority of which used electroencephalography (EEG) based sensors for tracking mental fatigue. We found that EEG-based sensors can provide a moderate to good sensitivity for fatigue detection. Notably, we found no incremental benefit of using high-density EEG sensors for application in mental fatigue detection. Given the findings, we provide a critical discussion on the integration of wearable EEG and ambient sensors in the context of achieving real-world monitoring. Future work required to advance and adapt the technologies toward widespread deployment of wearable sensors and systems for fatigue monitoring in semi-autonomous and autonomous industries is examined.

翻訳日:2023-07-06 17:20:11 公開日:2023-07-04

# チットチャットとタスク指向対話のシステム導入による統合会話モデル

Unified Conversational Models with System-Initiated Transitions between Chit-Chat and Task-Oriented Dialogues ( http://arxiv.org/abs/2307.01664v1 )

ライセンス: Link先を確認

Ye Liu, Stefan Ultes, Wolfgang Minker and Wolfgang Maier

(参考訳) 音声対話システム(SDS)は、タスク指向とチャットという2つのカテゴリで別々に開発された。前者は機能的な目標を達成することに焦点を当て、後者は特別な目標を伴わずにソーシャルな会話を生み出すことを目的としている。チットチャットとタスク指向の対話を両立できる統一的な会話モデルの作成は、近年の有望な研究テーマである。しかし、一つの対話で対話モードが変化した場合に生じる「初期的」の可能性はほとんど探求されていない。本研究では,タスク関連トピックを暗黙的に取り込んでタスク指向の要求に切り替えることから始まり,タスク指向のインタラクションから始まり,すべての要求情報が提供された後にカジュアルチャットに変化する2種類の対話シナリオについて検討する。統合対話モデルにおいて、システム開始遷移をトリガーする遷移文を積極的に生成できる2つの効率的なプロンプトモデルに寄与する。 1つは2つの離散トークンで訓練された離散プロンプトモデルであり、もう1つは、分類器によって自動的に生成される連続プロンプト埋め込みを用いた連続プロンプトモデルである。さらに,連続的なプロンプトモデルを用いて,タスク指向のタスク指向設定において,特定のドメイン間のプロアクティブな遷移を導くことも可能であることを示す。

Spoken dialogue systems (SDSs) have been separately developed under two different categories, task-oriented and chit-chat. The former focuses on achieving functional goals and the latter aims at creating engaging social conversations without special goals. Creating a unified conversational model that can engage in both chit-chat and task-oriented dialogue has been a promising research topic in recent years. However, the potential "initiative" that occurs when there is a change between dialogue modes in one dialogue has rarely been explored. In this work, we investigate two kinds of dialogue scenarios: one starts from chit-chat implicitly involving task-related topics and finally switches to task-oriented requests; the other starts from task-oriented interaction and eventually changes to casual chat after all requested information is provided. We contribute two efficient prompt models which can proactively generate a transition sentence to trigger system-initiated transitions in a unified dialogue model. One is a discrete prompt model trained with two discrete tokens, the other one is a continuous prompt model using continuous prompt embeddings automatically generated by a classifier. We furthermore show that the continuous prompt model can also be used to guide the proactive transitions between particular domains in a multi-domain task-oriented setting.

翻訳日:2023-07-06 17:19:52 公開日:2023-07-04

# オンライン手書き署名検証のためのトランスフォーマーの検討

Exploring Transformers for On-Line Handwritten Signature Verification ( http://arxiv.org/abs/2307.01663v1 )

ライセンス: Link先を確認

Pietro Melzi, Ruben Tolosana, Ruben Vera-Rodriguez, Paula Delgado-Santos, Giuseppe Stragapede, Julian Fierrez, Javier Ortega-Garcia

(参考訳) 近年,ユーザフレンドリーな認証手法としてのモバイルバイオメトリックスの利用が増加している。近年の研究では、トランスフォーマーに基づく新しい行動バイオメトリック認識システムを提案している。オンライン手書き署名検証は、タブレットやスマートフォンなどの電子機器を用いて取得した生体認証に基づいて、被験者の身元を確認することを目的としている。本稿では,オンライン署名検証のための最近のトランスフォーマーに基づくアーキテクチャの適合性について検討する。特に4つの異なる構成が研究され、そのうち2つはVanilla Transformerエンコーダに依存し、他の2つは歩行と行動認識のタスクにうまく適用されている。提案する4つの構成をsvc-ongoing competitionで提案された実験プロトコルに従って評価する。実験の結果は有望であり,オンライン署名検証におけるトランスフォーマーの利用を促進する。

The application of mobile biometrics as a user-friendly authentication method has increased in recent years. Recent studies have proposed novel behavioral biometric recognition systems based on Transformers, which currently outperform the state of the art in several application scenarios. On-line handwritten signature verification aims to verify the identity of subjects, based on their biometric signatures acquired using electronic devices such as tablets or smartphones. This paper investigates the suitability of architectures based on recent Transformers for on-line signature verification. In particular, four different configurations are studied: two of them rely on the Vanilla Transformer encoder, and the other two have been successfully applied to the tasks of gait and activity recognition. We evaluate the four proposed configurations according to the experimental protocol proposed in the SVC-onGoing competition. The results obtained in our experiments are promising, and promote the use of Transformers for on-line signature verification.

翻訳日:2023-07-06 17:19:28 公開日:2023-07-04

# $^{174}\mathrm{Yb}^+$-$^{113}\mathrm{Cd}^+$共同冷却2種クーロン結晶のマイクロ波周波数標準への応用

$^{174}\mathrm{Yb}^+$-$^{113}\mathrm{Cd}^+$ sympathetic-cooling bi-species Coulomb crystal applied to microwave frequency standard ( http://arxiv.org/abs/2307.01656v1 )

ライセンス: Link先を確認

Y. Zheng, H. R. Qin, S. N. Miao, N. C. Xin, Y. T. Chen, J. Z. Han, J. W. Zhang, and L. J. Wang

(参考訳) 我々は、冷却剤として$^{174}\mathrm{yb}^+$-$^{113}\mathrm{cd}^+$ bi-species coulomb結晶の実現を報告し、$^{113}\mathrm{cd}^+$マイクロ波周波数標準としての応用の可能性を確認した。中心に$^{113}\mathrm{Cd}^+$イオンが閉じ込められ、$^{113}\mathrm{Cd}^+$イオンを被写体とする相当なRF加熱と過剰なマイクロモーションが減少する。このスキームの下では、2階ドップラー効果による不確実性は5\times10^{-16}$に還元され、同調冷却された$^{40}\mathrm{Ca}^+$-$^{113}\mathrm{Cd}^+$結晶よりも大幅に改善される。マイクロ波イオン周波数標準に最も大きな不確実性をもたらす第2次ゼーマン効果の不確実性は、4\times10^{-16}$となる。 ACスタークシフトの不確実性は4\times10^{-19}$と推定される。これらの結果は、$^{174}\mathrm{Yb}^+$を、$^{113}\mathrm{Cd}^+$に対して冷却剤イオンとして使用する方がはるかに優れており、$^{174}\mathrm{Yb}^+$-$^{113}\mathrm{Cd}^+$2成分結晶を用いた同調冷却カドミウムイオンマイクロ波時計システムの実現可能性を確認している。

We report the realization of a $^{174}\mathrm{Yb}^+$-$^{113}\mathrm{Cd}^+$ bi-species Coulomb crystal comprising $^{174}\mathrm{Yb}^+$ ions as coolant and verify its potential for application as a $^{113}\mathrm{Cd}^+$ microwave frequency standard employing sympathetic cooling. The two species of massive ions stably trapped in a Paul trap make up this large two-component crystal. The $^{113}\mathrm{Cd}^+$ ions are trapped in the center, which considerably reduces the RF heating and excess micromotion to which the $^{113}\mathrm{Cd}^+$ ions are subjected. Under this scheme, the uncertainty due to the second-order Doppler effect is reduced to $5\times10^{-16}$, which represents an order of magnitude improvement over the sympathetically cooled $^{40}\mathrm{Ca}^+$-$^{113}\mathrm{Cd}^+$ crystal. The uncertainty from the second-order Zeeman effect, which contributes the largest uncertainty to the microwave-ion frequency standard, is reduced to $4\times10^{-16}$. The relevant AC Stark shift uncertainty is estimated to be $4\times10^{-19}$. These results indicate that using $^{174}\mathrm{Yb}^+$ as coolant ions for $^{113}\mathrm{Cd}^+$ is far superior and confirm the feasibility of a sympathetically cooled cadmium-ion microwave clock system employing a $^{174}\mathrm{Yb}^+$-$^{113}\mathrm{Cd}^+$ two-component crystal.

翻訳日:2023-07-06 17:19:15 公開日:2023-07-04

# アーボリストと森林労働者のタスクプランニング支援:UAVデータに基づく樹木の深層学習アプローチと樹木の生力評価の比較

Task Planning Support for Arborists and Foresters: Comparing Deep Learning Approaches for Tree Inventory and Tree Vitality Assessment Based on UAV-Data ( http://arxiv.org/abs/2307.01651v1 )

ライセンス: Link先を確認

Jonas-Dario Troles and Richard Nieding and Sonia Simons and Ute Schmid

(参考訳) 気候危機とそれに関連する長い干ばつが、都市や森林の樹木の健康を脅かしている。その結果、アーボリストや森林労働者はワークロードの増加に悩まされ、最良の場合、一貫したがしばしば減少する。ワークフローの最適化と生産性向上を目的として,都市周辺の木々を気にする人たちのタスクプランニングを改善する,オープンソースのエンドツーエンドアプローチを提案する。提案手法は,都市公園や森林の樹木在庫の作成や,統計指標や深層学習による樹木の活力評価を行うために,RGBおよび多スペクトルUAVデータに基づく。都市部における飛行ドローンに関するEUの規制により、多スペクトル衛星データと15の土壌水分センサーを使用して、木活力関連データを拡張する。さらにバンバーグには、市内に約15,000本の孤立した樹林があり、有用な情報を生み出すためにも使われている。上記のデータはすべて対話型Webアプリケーションに結合して視覚化され、アーボリストや森林労働者は個人的かつ柔軟な評価を生成でき、日々のタスク計画を改善することができる。

Climate crisis and correlating prolonged, more intense periods of drought threaten tree health in cities and forests. In consequence, arborists and foresters suffer from increasing workloads and, in the best case, a consistent but often declining workforce. To optimise workflows and increase productivity, we propose a novel open-source end-to-end approach that generates helpful information and improves task planning of those who care for trees in and around cities. Our approach is based on RGB and multispectral UAV data, which is used to create tree inventories of city parks and forests and to deduce tree vitality assessments through statistical indices and Deep Learning. Due to EU restrictions regarding flying drones in urban areas, we will also use multispectral satellite data and fifteen soil moisture sensors to extend our tree vitality-related basis of data. Furthermore, Bamberg already has a georeferenced tree cadastre of around 15,000 solitary trees in the city area, which is also used to generate helpful information. All mentioned data is then joined and visualised in an interactive web application allowing arborists and foresters to generate individual and flexible evaluations, thereby improving daily task planning.

翻訳日:2023-07-06 17:18:28 公開日:2023-07-04

# オーバーパラメータ付き畳み込み残差ネットワークを用いた低次元多様体の非パラメトリック分類

Nonparametric Classification on Low Dimensional Manifolds using Overparameterized Convolutional Residual Networks ( http://arxiv.org/abs/2307.01649v1 )

ライセンス: Link先を確認

Kaiqi Zhang, Zixuan Zhang, Minshuo Chen, Mengdi Wang, Tuo Zhao, Yu-Xiang Wang

(参考訳) 畳み込み残差ニューラルネットワーク(ConvResNets)は、過パラメータ化されているにもかかわらず、実際には優れた予測性能を達成できるが、これは従来の知見ではうまく説明できない。このギャップを埋めるために、ConvResNetsを特別な場合として含むConvResNeXtsを重み減衰付きで訓練した場合の性能を、非パラメトリック分類の観点から検討する。我々の解析は、ConvResNeXtsが無限に多くのビルディングブロックを持つことを許容し、重み減衰がこれらのブロックにスパース性を暗黙的に課すことを示す。具体的には、低次元多様体上に台を持つ滑らかな目標関数を考え、ConvResNeXtsが関数の滑らかさと低次元構造に適応し、次元の呪いを受けることなく関数を効率的に学習できることを証明する。我々の結果は、従来の機械学習モデルに対する過パラメータ化ConvResNeXtsの優位性を部分的に正当化する。

Convolutional residual neural networks (ConvResNets), though overparameterized, can achieve remarkable prediction performance in practice, which cannot be well explained by conventional wisdom. To bridge this gap, we study the performance of ConvResNeXts, which cover ConvResNets as a special case, trained with weight decay from the perspective of nonparametric classification. Our analysis allows for infinitely many building blocks in ConvResNeXts, and shows that weight decay implicitly enforces sparsity on these blocks. Specifically, we consider a smooth target function supported on a low-dimensional manifold, then prove that ConvResNeXts can adapt to the function smoothness and low-dimensional structures and efficiently learn the function without suffering from the curse of dimensionality. Our findings partially justify the advantage of overparameterized ConvResNeXts over conventional machine learning models.

翻訳日:2023-07-06 17:18:04 公開日:2023-07-04

# SwinGNN:グラフ生成のための拡散モデルにおける置換不変性の再考

SwinGNN: Rethinking Permutation Invariance in Diffusion Models for Graph Generation ( http://arxiv.org/abs/2307.01646v1 )

ライセンス: Link先を確認

Qi Yan, Zhengyang Liang, Yang Song, Renjie Liao, Lele Wang

(参考訳) 置換同変ネットワークに基づく拡散モデルは、グラフデータの置換不変な分布を学習できる。しかし、非不変なモデルと比べると、これらの不変モデルは学習がより困難であることが分かった。その理由は、1) 有効な目標分布がより多くのモードを持つこと、2) 最適な1ステップのデノイジングスコアが、より多くの成分を持つガウス混合のスコア関数になることである。この分析に基づき、効率的なエッジ間(edge-to-edge)2-WLメッセージパッシングネットワークを採用し、SwinTransformersに着想を得たシフトウィンドウベースの自己注意を利用する非不変拡散モデル$\textit{SwinGNN}$を提案する。さらに、系統的なアブレーションにより、グラフ生成のサンプル品質を大きく向上させるいくつかの重要な訓練・サンプリング手法を特定した。最後に、生成したグラフをランダムに置換するという単純な後処理のトリックを導入する。これにより、任意のグラフ生成モデルを置換不変なモデルに変換できることが証明できる。合成データセットおよび実世界のタンパク質・分子データセットでの広範な実験により、SwinGNNが最先端の性能を達成することを示す。コードは https://github.com/qiyan98/SwinGNN で公開されている。

Diffusion models based on permutation-equivariant networks can learn permutation-invariant distributions for graph data. However, in comparison to their non-invariant counterparts, we have found that these invariant models encounter greater learning challenges since 1) their effective target distributions exhibit more modes; 2) their optimal one-step denoising scores are the score functions of Gaussian mixtures with more components. Motivated by this analysis, we propose a non-invariant diffusion model, called $\textit{SwinGNN}$, which employs an efficient edge-to-edge 2-WL message passing network and utilizes shifted window based self-attention inspired by SwinTransformers. Further, through systematic ablations, we identify several critical training and sampling techniques that significantly improve the sample quality of graph generation. At last, we introduce a simple post-processing trick, $\textit{i.e.}$, randomly permuting the generated graphs, which provably converts any graph generative model to a permutation-invariant one. Extensive experiments on synthetic and real-world protein and molecule datasets show that our SwinGNN achieves state-of-the-art performances. Our code is released at https://github.com/qiyan98/SwinGNN .

翻訳日:2023-07-06 17:17:46 公開日:2023-07-04
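
The SwinGNN abstract above mentions a post-processing trick: randomly permuting each generated graph. A minimal sketch of that single step is shown below, assuming graphs are represented as dense adjacency matrices. The function name `randomly_permute_graph` and the toy path graph are illustrative assumptions, not taken from the authors' code.

```python
import numpy as np

def randomly_permute_graph(adj, rng):
    """Relabel the nodes of a graph with a uniformly random permutation.

    adj is an (n, n) adjacency matrix. The result equals P @ adj @ P.T for a
    random permutation matrix P, computed here by index shuffling.
    """
    adj = np.asarray(adj)
    perm = rng.permutation(adj.shape[0])
    # Reorder rows and columns with the same permutation.
    return adj[np.ix_(perm, perm)], perm

# Toy example: a path graph 0-1-2-3 (hypothetical data, not from the paper).
path = np.array([[0, 1, 0, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [0, 0, 1, 0]])
rng = np.random.default_rng(0)
permuted, perm = randomly_permute_graph(path, rng)
```

Because the relabeling is drawn uniformly at random, every node ordering of the same graph becomes equally likely in the output, which is the intuition behind the invariance claim in the abstract. The proof itself is in the paper and is not reproduced here.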

# 合成は必要なすべて:合成データに対する会員推測攻撃の補助的データ仮定を取り除く

Synthetic is all you need: removing the auxiliary data assumption for membership inference attacks against synthetic data ( http://arxiv.org/abs/2307.01701v1 )

ライセンス: Link先を確認

Florent Guépin, Matthieu Meeus, Ana-Maria Cretu and Yves-Alexandre de Montjoye

(参考訳) 合成データは、プライバシーを保護しながら個人レベルのデータを共有するための最も有望な解決策として注目されている。シャドウモデリングに基づくメンバーシップ推論攻撃(MIA)は、合成データのプライバシーを評価するための標準的手法となっている。しかし、これらの攻撃は現在、攻撃者が訓練データセットと類似した分布からサンプリングされた補助データセットにアクセスできることを仮定している。これは多くの場合非常に強い仮定であり、実際には攻撃が起こりにくくなる。本稿では、この仮定を取り除き、合成データのみを用いてMIAを実行する方法を示す。具体的には、合成データのみを用いる3つの異なる攻撃シナリオにおいて、2つの実世界データセットと2つの合成データ生成器にわたってMIAが依然として成功することを示す。これらの結果は、合成データの公開を監査する際に置かれる強い仮定(補助データセットへのアクセス)を緩和し、実際の攻撃を実行できることを示している。

Synthetic data is emerging as the most promising solution to share individual-level data while safeguarding privacy. Membership inference attacks (MIAs), based on shadow modeling, have become the standard to evaluate the privacy of synthetic data. These attacks, however, currently assume the attacker to have access to an auxiliary dataset sampled from a similar distribution as the training dataset. This often is a very strong assumption that would make an attack unlikely to happen in practice. We here show how this assumption can be removed and how MIAs can be performed using only the synthetic data. More specifically, in three different attack scenarios using only synthetic data, our results demonstrate that MIAs are still successful, across two real-world datasets and two synthetic data generators. These results show how the strong hypothesis made when auditing synthetic data releases - access to an auxiliary dataset - can be relaxed to perform an actual attack.

翻訳日:2023-07-06 17:11:32 公開日:2023-07-04

# 対数深さ量子回路による行列積状態の準備

Preparation of matrix product states with log-depth quantum circuits ( http://arxiv.org/abs/2307.01696v1 )

ライセンス: Link先を確認

Daniel Malz, Georgios Styliaris, Zhi-Yuan Wei, J. Ignacio Cirac

(参考訳) 局所ゲートからなる量子回路による行列積状態(MPS)の準備を考える。まず、$N$サイトの並進不変な正規MPSを忠実に準備するには回路深さ$T=\Omega(\log N)$が必要であることを証明する。次に、繰り込み群変換に基づくアルゴリズムを導入し、誤差$\epsilon$の正規MPSを深さ$T=O(\log (N/\epsilon))$で準備する。これは最適である。また、測定とフィードバックによりアルゴリズムが指数的に高速化され、$T=O(\log\log (N/\epsilon))$になることを示す。測定を用いれば、長距離の非正規MPSを含む任意の並進不変MPSを同じ深さで準備することもできる。最後に、このアルゴリズムは非一様なMPSにも自然に拡張される。

We consider preparation of matrix product states (MPS) via quantum circuits of local gates. We first prove that faithfully preparing translation-invariant normal MPS of $N$ sites requires a circuit depth $T=\Omega(\log N)$. We then introduce an algorithm based on the renormalization-group transformation to prepare normal MPS with an error $\epsilon$ in depth $T=O(\log (N/\epsilon))$, which is optimal. We also show that measurement and feedback leads to an exponential speed-up of the algorithm, to $T=O(\log\log (N/\epsilon))$. Measurements also allow one to prepare arbitrary translation-invariant MPS, including long-range non-normal ones, in the same depth. Finally, the algorithm naturally extends to inhomogeneous MPS.

翻訳日:2023-07-06 17:11:16 公開日:2023-07-04

# スパイク駆動Transformer

Spike-driven Transformer ( http://arxiv.org/abs/2307.01694v1 )

ライセンス: Link先を確認

Man Yao, Jiakui Hu, Zhaokun Zhou, Li Yuan, Yonghong Tian, Bo Xu, Guoqi Li

(参考訳) スパイキングニューラルネットワーク(SNN)は、独自のスパイクベースのイベント駆動(すなわちスパイク駆動)パラダイムにより、エネルギー効率の高い深層学習の選択肢を提供する。本稿では、次の4つの特性を持つSpike-driven Transformerを提案し、スパイク駆動パラダイムをTransformerに組み込む。 1) イベント駆動: Transformerの入力が0の場合、計算は行われない。 2) バイナリスパイク通信: スパイク行列に関わるすべての行列乗算はスパースな加算に変換できる。 3) トークン次元とチャネル次元の両方で線形計算量の自己注意。 4) スパイク形式のQuery、Key、Valueの間の演算はマスクと加算である。これらを合わせると、Spike-driven Transformerにはスパースな加算演算しか存在しない。この目的のために、乗算を用いずマスクと加算のみを利用する新しいSpike-Driven Self-Attention(SDSA)を設計した。その計算エネルギーはバニラ自己注意と比べて最大$87.2\times$低い。特にSDSAでは、Query、Key、Valueの間の行列乗算がマスク演算として設計されている。さらに、バニラTransformerのすべての残差接続を活性化関数の前に配置し直し、すべてのニューロンがバイナリスパイク信号を伝達することを保証する。Spike-driven TransformerはImageNet-1Kで77.1\%のトップ1精度を達成でき、これはSNN分野における最先端の結果である。ソースコードは https://github.com/BICLab/Spike-Driven-Transformer で入手できる。

Spiking Neural Networks (SNNs) provide an energy-efficient deep learning option due to their unique spike-based event-driven (i.e., spike-driven) paradigm. In this paper, we incorporate the spike-driven paradigm into Transformer by the proposed Spike-driven Transformer with four unique properties: 1) Event-driven, no calculation is triggered when the input of Transformer is zero; 2) Binary spike communication, all matrix multiplications associated with the spike matrix can be transformed into sparse additions; 3) Self-attention with linear complexity at both token and channel dimensions; 4) The operations between spike-form Query, Key, and Value are mask and addition. Together, there are only sparse addition operations in the Spike-driven Transformer. To this end, we design a novel Spike-Driven Self-Attention (SDSA), which exploits only mask and addition operations without any multiplication, and thus having up to $87.2\times$ lower computation energy than vanilla self-attention. Especially in SDSA, the matrix multiplication between Query, Key, and Value is designed as the mask operation. In addition, we rearrange all residual connections in the vanilla Transformer before the activation functions to ensure that all neurons transmit binary spike signals. It is shown that the Spike-driven Transformer can achieve 77.1\% top-1 accuracy on ImageNet-1K, which is the state-of-the-art result in the SNN field. The source code is available at https://github.com/BICLab/Spike-Driven-Transformer.

翻訳日:2023-07-06 17:10:57 公開日:2023-07-04
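
Property 2 in the Spike-driven Transformer abstract states that a matrix multiplication involving a binary spike matrix can be turned into sparse additions. The sketch below illustrates only that general fact with NumPy. It is not the SDSA operator from the paper, and the function name and toy shapes are assumptions made for illustration.

```python
import numpy as np

def spike_matmul_by_addition(spikes, weights):
    """Compute spikes @ weights using only row selection and addition.

    spikes:  (n, d) binary matrix with entries in {0, 1}
    weights: (d, m) real-valued matrix
    """
    out = np.zeros((spikes.shape[0], weights.shape[1]))
    for i, row in enumerate(spikes):
        active = np.flatnonzero(row)      # indices where a spike occurred
        if active.size:                   # no spikes: nothing is computed
            out[i] = weights[active].sum(axis=0)
    return out

rng = np.random.default_rng(0)
spikes = (rng.random((4, 6)) < 0.3).astype(int)   # sparse binary input
weights = rng.normal(size=(6, 3))
result = spike_matmul_by_addition(spikes, weights)
```

Each output row is the sum of the weight rows selected by the spikes, so the cost grows with the number of spikes rather than with the full matrix size. An all-zero input row triggers no work at all, which matches the event-driven property listed first in the abstract.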

# 米国の法的意見のテキストにおける人種バイアスの傾向

Racial Bias Trends in the Text of US Legal Opinions ( http://arxiv.org/abs/2307.01693v1 )

ライセンス: Link先を確認

Rohan Jinturkar

(参考訳) 米国の法律に人種的偏見が存在することは広く認識されているが、そのような偏見が法の言語、すなわち司法意見にどのように現れるのか、また時代や地域によって異なるのかは明らかでない。大規模コーパスにおける暗黙の人種的偏見を測定する手法に基づき、1860年から2009年までの600万件以上の米国連邦および州の裁判例についてGloVe単語埋め込みを近似した。その結果、ほぼすべての地域と時代で人種的偏見の強い証拠が見つかった。伝統的に黒人に多い名前は事前に分類された「不快な」用語とより密接に関連し、伝統的に白人に多い名前は事前に分類された「快い」用語とより密接に関連していた。また、1950年以前の法的意見が1950年以降のものよりも強い暗黙の人種的偏見を示すか、南部諸州の意見は北東部諸州の意見よりも人種的偏見の変化が小さいかも検証した。1950年以前の法的意見で偏見が高いという証拠も、北東部諸州の法的意見が南部諸州と比べて人種的偏見の経時変化が大きいという証拠も見つからなかった。これらの結果は、制度化された人種的偏見に関するさらなる研究の動機となる。

Although there is widespread recognition of racial bias in US law, it is unclear how such bias appears in the language of law, namely judicial opinions, and whether it varies across time period or region. Building upon approaches for measuring implicit racial bias in large-scale corpora, we approximate GloVe word embeddings for over 6 million US federal and state court cases from 1860 to 2009. We find strong evidence of racial bias across nearly all regions and time periods, as traditionally Black names are more closely associated with pre-classified "unpleasant" terms whereas traditionally White names are more closely associated with pre-classified "pleasant" terms. We also test whether legal opinions before 1950 exhibit more implicit racial bias than those after 1950, as well as whether opinions from Southern states exhibit less change in racial bias than those from Northeastern states. We do not find evidence of elevated bias in legal opinions before 1950, or evidence that legal opinions from Northeastern states show greater change in racial bias over time compared to Southern states. These results motivate further research into institutionalized racial bias.

翻訳日:2023-07-06 17:10:37 公開日:2023-07-04
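
The abstract above says that names are "more closely associated" with pleasant or unpleasant terms in the embedding space, but it does not state the exact statistic. One common way to quantify such an association is a difference of mean cosine similarities, in the style of embedding association tests. The sketch below assumes that measure, and all vectors are hypothetical 2-D toy data rather than GloVe embeddings.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(word_vec, pleasant, unpleasant):
    """Mean cosine similarity to pleasant terms minus mean to unpleasant terms."""
    return (np.mean([cosine(word_vec, p) for p in pleasant])
            - np.mean([cosine(word_vec, u) for u in unpleasant]))

def differential_association(names_a, names_b, pleasant, unpleasant):
    """Positive if group A names sit closer to pleasant terms than group B names."""
    a = np.mean([association(w, pleasant, unpleasant) for w in names_a])
    b = np.mean([association(w, pleasant, unpleasant) for w in names_b])
    return a - b

# Hypothetical toy vectors (assumption: real inputs would be trained embeddings).
pleasant = [np.array([1.0, 0.0])]
unpleasant = [np.array([0.0, 1.0])]
names_a = [np.array([0.9, 0.1])]
names_b = [np.array([0.1, 0.9])]
score = differential_association(names_a, names_b, pleasant, unpleasant)
```

A score near zero would indicate no measurable difference between the two name groups under this statistic. Comparing scores across embeddings trained on different periods or regions is one way the temporal and regional questions in the abstract could be framed.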

# ERMオラクルによるオンライン学習と無限ゲームの求解

Online Learning and Solving Infinite Games with an ERM Oracle ( http://arxiv.org/abs/2307.01689v1 )

ライセンス: Link先を確認

Angelos Assos, Idan Attias, Yuval Dagan, Constantinos Daskalakis, Maxwell Fishelson

(参考訳) 確率的学習の設定では、ERMはほぼ最適な汎化誤差を達成するのに十分であるが、オンライン学習の設定で同じことが成り立つかは知られていない。オンライン学習では、一般の概念クラスに対するアルゴリズムは、標準最適アルゴリズム(SOA)のような計算効率の悪いオラクルに依存している。本研究では、ERMオラクルの呼び出しのみに依存するオンライン二値分類のアルゴリズムを提案し、実現可能な設定では有限のリグレット、アグノスティックな設定では劣線形に増加するリグレットを持つことを示す。リグレットは、基礎となる概念クラスのLittlestone次元と閾値次元によって上から抑えられる。非パラメトリックゲームについても同様の結果を得る。この場合、ERMオラクルは最適応答オラクル、すなわち他のプレイヤーの与えられたプレイ履歴に対するプレイヤーの最適応答を求めるものと解釈できる。この設定において、ゲームのfat-threshold次元が有界である限り、最適応答オラクルのみに依存し、2人ゼロ和ゲームでは近似ミニマックス均衡に、多人数一般和ゲームでは近似粗相関均衡に収束する学習アルゴリズムを与える。我々のアルゴリズムは二値ゲームと実数値ゲームの両方に適用でき、大規模ゲームを解く実践で広く使われているダブルオラクルおよびマルチプルオラクルのアルゴリズムに正当性を与えるものと見なせる。

While ERM suffices to attain near-optimal generalization error in the stochastic learning setting, this is not known to be the case in the online learning setting, where algorithms for general concept classes rely on computationally inefficient oracles such as the Standard Optimal Algorithm (SOA). In this work, we propose an algorithm for online binary classification setting that relies solely on ERM oracle calls, and show that it has finite regret in the realizable setting and sublinearly growing regret in the agnostic setting. We bound the regret in terms of the Littlestone and threshold dimensions of the underlying concept class. We obtain similar results for nonparametric games, where the ERM oracle can be interpreted as a best response oracle, finding the best response of a player to a given history of play of the other players. In this setting, we provide learning algorithms that only rely on best response oracles and converge to approximate-minimax equilibria in two-player zero-sum games and approximate coarse correlated equilibria in multi-player general-sum games, as long as the game has a bounded fat-threshold dimension. Our algorithms apply to both binary-valued and real-valued games and can be viewed as providing justification for the wide use of double oracle and multiple oracle algorithms in the practice of solving large games.

翻訳日:2023-07-06 17:10:15 公開日:2023-07-04

# スマートIoTサービスのための分散フォッグサーバによるグラフニューラルネットワークの実現

Serving Graph Neural Networks With Distributed Fog Servers For Smart IoT Services ( http://arxiv.org/abs/2307.01684v1 )

ライセンス: Link先を確認

Liekang Zeng, Xu Chen, Peng Huang, Ke Luo, Xiaoxi Zhang, Zhi Zhou

(参考訳) グラフニューラルネットワーク(GNN)は、グラフ構造上の潜在表現を抽出する能力に優れることから、さまざまな応用で関心が高まっている。IoT駆動のスマートアプリケーションにGNNベースのサービスを提供する場合、従来のモデルサービングの方式では通常、地理的に分散した入力データをすべて遠隔のデータセンターにアップロードしてクラウドに頼る。しかし、我々の実測により、このようなクラウドベースのサービングには大きな通信オーバーヘッドがあることが明らかになり、新興のフォグコンピューティングを適用することの大きな可能性が浮き彫りになった。本稿では、フォグコンピューティングがもたらすアーキテクチャ上の利点を最大化するために、IoTデータソースの近くにある複数のフォグノードの多様で動的なリソースを活用する、新しい分散リアルタイムGNN推論フレームワークFographを提案する。Fographは、異種性を考慮した実行計画とGNN固有の圧縮技術を導入することで、フォグ環境におけるGNNサービングの固有の特性にうまく適合するよう設計されている。プロトタイプに基づく評価とケーススタディにより、Fographは最先端のクラウドサービングおよびフォグ配備を大きく上回り、最大5.39倍の実行高速化と6.84倍のスループット向上を達成することを示す。

Graph Neural Networks (GNNs) have gained growing interest in miscellaneous applications owing to their outstanding ability in extracting latent representation on graph structures. To render GNN-based service for IoT-driven smart applications, traditional model serving paradigms usually resort to the cloud by fully uploading geo-distributed input data to remote datacenters. However, our empirical measurements reveal the significant communication overhead of such cloud-based serving and highlight the profound potential in applying the emerging fog computing. To maximize the architectural benefits brought by fog computing, in this paper, we present Fograph, a novel distributed real-time GNN inference framework that leverages diverse and dynamic resources of multiple fog nodes in proximity to IoT data sources. By introducing heterogeneity-aware execution planning and GNN-specific compression techniques, Fograph tailors its design to well accommodate the unique characteristics of GNN serving in fog environments. Prototype-based evaluation and case study demonstrate that Fograph significantly outperforms the state-of-the-art cloud serving and fog deployment by up to 5.39x execution speedup and 6.84x throughput improvement.

翻訳日:2023-07-06 17:09:52 公開日:2023-07-04

# 局所再パラメータ化トリックを用いた離散重みとアクティベーションの学習

Learning Discrete Weights and Activations Using the Local Reparameterization Trick ( http://arxiv.org/abs/2307.01683v1 )

ライセンス: Link先を確認

Guy Berger, Aviv Navon, Ethan Fetaya

(参考訳) コンピュータビジョンと機械学習における重要な課題は、ニューラルネットワーク推論の計算量とメモリ要求を減らすことである。この課題に対する一般的な解決策は二値化の利用である。ネットワークの重みと活性化を二値化すると、計算コストの高い浮動小数点演算を高速なビット演算に置き換えられるため、計算複雑性を大幅に低減できる。これにより、低リソースデバイスに展開できる、より効率的なニューラルネットワーク推論が可能になる。本研究では、局所再パラメータ化トリックを用いて離散重みのネットワークを訓練する従来手法を拡張し、離散的な活性化も扱えるようにする。元の手法は離散重み上の分布を最適化し、中心極限定理を用いて事前活性化を連続ガウス分布で近似する。本稿では、この確率的モデリングにより、離散的な活性化を持つネットワークも効果的に訓練できることを示す。これにより推論時の実行時間とメモリフットプリントがさらに削減され、二値活性化を持つネットワークで最先端の結果が得られる。

In computer vision and machine learning, a crucial challenge is to lower the computation and memory demands for neural network inference. A commonplace solution to address this challenge is through the use of binarization. By binarizing the network weights and activations, one can significantly reduce computational complexity by substituting the computationally expensive floating operations with faster bitwise operations. This leads to a more efficient neural network inference that can be deployed on low-resource devices. In this work, we extend previous approaches that trained networks with discrete weights using the local reparameterization trick to also allow for discrete activations. The original approach optimized a distribution over the discrete weights and uses the central limit theorem to approximate the pre-activation with a continuous Gaussian distribution. Here we show that the probabilistic modeling can also allow effective training of networks with discrete activation as well. This further reduces runtime and memory footprint at inference time with state-of-the-art results for networks with binary activations.

翻訳日:2023-07-06 17:09:33 公開日:2023-07-04
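
The abstract above describes approximating the pre-activation with a Gaussian via the central limit theorem when weights follow a discrete distribution. A minimal sketch for weights in {-1, +1} follows. The parameterization by `p_plus`, the function names, and the way the sign-activation probability is computed are illustrative assumptions, not the authors' implementation.

```python
import math
import numpy as np

def preactivation_stats(x, p_plus):
    """Gaussian approximation of z = x @ W for random binary weights.

    x:      (d,) input vector
    p_plus: (d, m) probabilities that each weight equals +1 (otherwise -1)
    """
    w_mean = 2.0 * p_plus - 1.0          # E[w]
    w_var = 1.0 - w_mean ** 2            # Var[w] for w in {-1, +1}
    mu = x @ w_mean                      # mean of each pre-activation
    var = (x ** 2) @ w_var               # variance, weights independent
    return mu, var

def prob_positive_activation(mu, var, eps=1e-12):
    """P(z > 0) under the Gaussian approximation, i.e. Phi(mu / sigma)."""
    ratio = mu / np.sqrt(var + eps)
    return np.array([0.5 * (1.0 + math.erf(r / math.sqrt(2.0))) for r in ratio])

# Hypothetical toy input and weight probabilities.
x = np.array([1.0, -2.0, 0.5])
p_plus = np.array([[0.9, 0.5],
                   [0.1, 0.5],
                   [0.5, 0.5]])
mu, var = preactivation_stats(x, p_plus)
prob = prob_positive_activation(mu, var)
```

Sampling `mu + sqrt(var) * noise` with standard normal noise gives a differentiable surrogate for the pre-activation, and `prob` can be read as the distribution of a sign activation under the same approximation. The second output unit has all weight probabilities at 0.5, so its mean is zero and its activation is equally likely to be positive or negative.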

# ソーシャルメディアにおけるロバストなヘイトスピーチ検出:クロスデータセットの実証評価

Robust Hate Speech Detection in Social Media: A Cross-Dataset Empirical Evaluation ( http://arxiv.org/abs/2307.01680v1 )

ライセンス: Link先を確認

Dimosthenis Antypas and Jose Camacho-Collados

(参考訳) オンライン上のヘイトスピーチの自動検出は、NLPにおける活発な研究領域である。これまでの研究の大半はソーシャルメディアのデータセットに基づいており、それらで訓練されたヘイトスピーチ検出モデルの構築に寄与してきた。しかし、データ作成プロセスには固有のバイアスが含まれており、モデルはこうしたデータセット固有のバイアスを本質的に学習してしまう。本稿では、異なるヘイトスピーチ検出データセットで言語モデルをファインチューニングする大規模なクロスデータセット比較を行う。この分析から、訓練データとして用いた場合に、一部のデータセットは他のデータセットよりも汎化しやすいことが分かる。重要な点として、複数のヘイトスピーチ検出データセットを組み合わせることが、ロバストなヘイトスピーチ検出モデルの開発に寄与することを実験で示す。このロバスト性は、データサイズを統制した場合や、最良の個別データセットと比較した場合でも保たれる。

The automatic detection of hate speech online is an active research area in NLP. Most of the studies to date are based on social media datasets that contribute to the creation of hate speech detection models trained on them. However, data creation processes contain their own biases, and models inherently learn from these dataset-specific biases. In this paper, we perform a large-scale cross-dataset comparison where we fine-tune language models on different hate speech detection datasets. This analysis shows how some datasets are more generalisable than others when used as training data. Crucially, our experiments show how combining hate speech detection datasets can contribute to the development of robust hate speech detection models. This robustness holds even when controlling by data size and compared with the best individual datasets.

翻訳日:2023-07-06 17:09:15 公開日:2023-07-04

# RaidEnv: ボスレイドゲームのためのコンテンツバランシング自動化の新たな課題

RaidEnv: Exploring New Challenges in Automated Content Balancing for Boss Raid Games ( http://arxiv.org/abs/2307.01676v1 )

ライセンス: Link先を確認

Hyeon-Chang Jeon, In-Chang Baek, Cheong-mok Bae, Taehwa Park, Wonsang You, Taegwan Ha, Hoyun Jung, Jinha Noh, Seungwon Oh, Kyung-Joong Kim

(参考訳) ゲームコンテンツのバランスはゲーム体験に大きく影響する。バランスの悪いゲームコンテンツは、失敗の繰り返しによってエンゲージメントを下げたり、フラストレーションを高めたりする。ゲームデザイナーはゲームコンテンツの難易度を調整しようとするが、これは反復的で労力がかかる困難な作業であり、膨大なコンテンツを持つ商用レベルのゲームでは特にそうである。この問題に対処するため、ゲーム研究コミュニティは人工知能(AI)技術を用いた自動ゲームバランシングを検討してきた。しかし、従来の研究は限られたゲームコンテンツに焦点を当てており、コンテンツの変更に直面した際のプレイテストエージェントの汎化能力の重要性を考慮していなかった。本研究では、MMORPGのボスレイドシナリオ向けに多様でカスタマイズ可能なコンテンツを備えた新しいゲームシミュレータRaidEnvを提案する。さらに、ゲームAIの実用化に役立つボスレイドシナリオ向けのベンチマークを2つ設計する。これらのベンチマークは自動コンテンツバランシングにおける2つの未解決問題を扱っており、自動コンテンツバランシングにおけるAIの指針となる2つの評価指標も導入する。この新しいゲーム研究プラットフォームは、自動ゲームバランシング問題のフロンティアを広げ、現実的なゲーム制作パイプラインの中で利用できるフレームワークを提供する。

The balance of game content significantly impacts the gaming experience. Unbalanced game content diminishes engagement or increases frustration because of repetitive failure. Although game designers intend to adjust the difficulty of game content, this is a repetitive, labor-intensive, and challenging process, especially for commercial-level games with extensive content. To address this issue, the game research community has explored automated game balancing using artificial intelligence (AI) techniques. However, previous studies have focused on limited game content and did not consider the importance of the generalization ability of playtesting agents when encountering content changes. In this study, we propose RaidEnv, a new game simulator that includes diverse and customizable content for the boss raid scenario in MMORPG games. Additionally, we design two benchmarks for the boss raid scenario that can aid in the practical application of game AI. These benchmarks address two open problems in automatic content balancing, and we introduce two evaluation metrics to provide guidance for AI in automatic content balancing. This novel game research platform expands the frontiers of automatic game balancing problems and offers a framework within a realistic game production pipeline.

翻訳日:2023-07-06 17:09:02 公開日:2023-07-04

# ダイヤモンド中の窒素空孔中心における2光子遷移の超断熱通過

Two-photon-transition superadiabatic passage in an nitrogen-vacancy center in diamond ( http://arxiv.org/abs/2307.01675v1 )

ライセンス: Link先を確認

Musang Gong, Min Yu, Yaoming Chu, Wei Chen, Qingyun Cao, Ning Wang, Jianming Cai, Ralf Betzholz, and Luigi Giannelli

(参考訳) 量子限界に近い高速な操作で、与えられた目標量子状態に高い忠実度で到達することは、量子情報科学の重要な目標である。本稿では、3準位固体スピン系における占有数移動を実現する超断熱量子駆動を実験的に実証する。従来の誘導ラマン断熱通過(STIRAP)を出発点として、いくつかの典型的なパルス形状を持つSTIRAPハミルトニアンに対する超断熱補正を実装する。この手法は強いマイクロ波パルスや長い移動時間を必要とせず、パルスの不完全性に対して高いロバスト性を示す。これらの結果は、量子情報処理や量子系のコヒーレント操作に有用なツールとなる可能性がある。

Reaching a given target quantum state with high fidelity and fast operation speed close to the quantum limit represents an important goal in quantum information science. Here, we experimentally demonstrate superadiabatic quantum driving to achieve population transfer in a three-level solid-state spin system. Starting from traditional stimulated Raman adiabatic passage (STIRAP), our approach implements superadiabatic corrections to the STIRAP Hamiltonians with several paradigmatic pulse shapes. It requires no need of intense microwave pulses or long transfer times and shows enhanced robustness over pulse imperfections. These results might provide a useful tool for quantum information processing and coherent manipulations of quantum systems.

翻訳日:2023-07-06 17:08:43 公開日:2023-07-04

# RRCNN: リカレント残差畳み込みニューラルネットワークに基づく新しい信号分解法

RRCNN: A novel signal decomposition approach based on recurrent residue convolutional neural network ( http://arxiv.org/abs/2307.01725v1 )

ライセンス: Link先を確認

Feng Zhou, Antonio Cicone, Haomin Zhou

(参考訳) 非定常信号の分解は、信号の時間周波数解析の分野における重要かつ困難な課題である。この20年ほどの間に、1998年にHuangらが先駆けた経験的モード分解を筆頭とする多くの信号分解法が、さまざまな研究グループによって提案されてきた。しかし、これらの手法には依然としていくつかの限界がある。例えば、一般に境界効果やモード混合効果が生じやすく、ノイズに対してあまり頑健ではない。画像処理や自然言語処理などの分野における深層学習の成功に着想を得て、また非定常信号を単純な振動成分に直接分解するために深層学習を用いた研究が文献に見当たらないことを踏まえ、畳み込みニューラルネットワーク、残差構造、非線形活性化関数を用いて信号の局所平均を新しい方法で計算し、深層学習の枠組みのもとで新しい非定常信号分解法を検討する。提案モデルの学習過程を議論し、学習アルゴリズムの収束解析を行う。実験では、局所平均の計算と信号分解という2つの観点から提案モデルの性能を評価する。さらに、提案手法で得られた分解成分のモード混合、ノイズ干渉、直交性について調べる。すべての結果は、提案モデルが既存手法よりも境界効果、モード混合効果、ロバスト性、分解成分の直交性をうまく扱えることを示している。

The decomposition of non-stationary signals is an important and challenging task in the field of signal time-frequency analysis. In the recent two decades, many signal decomposition methods led by the empirical mode decomposition, which was pioneered by Huang et al. in 1998, have been proposed by different research groups. However, they still have some limitations. For example, they are generally prone to boundary and mode mixing effects and are not very robust to noise. Inspired by the successful applications of deep learning in fields like image processing and natural language processing, and given the lack in the literature of works in which deep learning techniques are used directly to decompose non-stationary signals into simple oscillatory components, we use the convolutional neural network, residual structure and nonlinear activation function to compute in an innovative way the local average of the signal, and study a new non-stationary signal decomposition method under the framework of deep learning. We discuss the training process of the proposed model and study the convergence analysis of the learning algorithm. In the experiments, we evaluate the performance of the proposed model from two points of view: the calculation of the local average and the signal decomposition. Furthermore, we study the mode mixing, noise interference, and orthogonality properties of the decomposed components produced by the proposed method. All results show that the proposed model allows for better handling boundary effect, mode mixing effect, robustness, and the orthogonality of the decomposed components than existing methods.

翻訳日:2023-07-06 17:01:18 公開日:2023-07-04

# 空間広帯域高利得SU(1,1)干渉計の位相感度

Phase sensitivity of spatially broadband high-gain SU(1,1) interferometers ( http://arxiv.org/abs/2307.01723v1 )

ライセンス: Link先を確認

D. Scharwald, T. Meier, P. R. Sharapova

(参考訳) 非線形干渉計は量子計測の有望なツールであり、古典光で動作する線形干渉計と比べて位相感度のスケーリングが向上することを特徴とする。しかし、これらの干渉計で生成される光のマルチモード性は位相感度を損なうため、マルチモード光に対しては高度な干渉計構成が必要になる。さらに、単一モードの場合とは対照的に、マルチモードの場合には高利得領域で時間順序効果が重要な役割を果たし、位相感度を正しく見積もるにはこれを考慮しなければならない。本研究では、低パラメトリック利得および高パラメトリック利得で動作する空間マルチモードSU(1,1)干渉計の理論的記述を与える。本手法は、各非線形相互作用領域に対する積分微分方程式系を段階的に解くことに基づく。光の発散を補償するために放物面鏡などの集光素子を用いる、回折補償付きの干渉計に着目する。平面波ポンプとガウスポンプを調べ、任意のパラメトリック利得に対して、位相感度が標準的なショットノイズスケーリングを上回る位相領域が存在することを示し、ハイゼンベルクスケールに近づく領域について議論する。最後に、低パラメトリック利得と高パラメトリック利得の両方で有効な位相感度の見通しのよい解析式を導き、位相感度が系の空間モード数にどのように依存するかを示す。

Nonlinear interferometers are promising tools for quantum metrology, as they are characterized by an improved phase sensitivity scaling compared to linear interferometers operating with classical light. However, the multimodeness of the light generated in these interferometers results in the destruction of their phase sensitivity, requiring advanced interferometric configurations for multimode light. Moreover, in contrast to the single-mode case, time-ordering effects play an important role for the high-gain regime in the multimode scenario and must be taken into account for a correct estimation of the phase sensitivity. In this work, we present a theoretical description of spatially multimode SU(1,1) interferometers operating at low and high parametric gains. Our approach is based on a step-by-step solution of a system of integro-differential equations for each nonlinear interaction region. We focus on interferometers with diffraction compensation, where focusing elements such as a parabolic mirror are used to compensate for the divergence of the light. We investigate plane-wave and Gaussian pumping and show that for any parametric gain, there exists a region of phases for which the phase sensitivity surpasses the standard shot-noise scaling and discuss the regimes where it approaches the Heisenberg scale. Finally, we arrive at insightful analytical expressions for the phase sensitivity that are valid for both low and high parametric gain and demonstrate how it depends on the number of spatial modes of the system.

翻訳日:2023-07-06 17:00:57 公開日:2023-07-04

# MOPO-LSI: ユーザガイド

MOPO-LSI: A User Guide ( http://arxiv.org/abs/2307.01719v1 )

ライセンス: Link先を確認

Yong Zheng, Kumar Neelotpal Shukla, Jasmine Xu, David (Xuejun) Wang, Michael O'Leary

(参考訳) MOPO-LSIは、持続可能な投資のためのオープンソースの多目的ポートフォリオ最適化ライブラリである。この文書はMOPO-LSIバージョン1.0のユーザガイドを提供し、問題設定、ワークフロー、設定のハイパーパラメータを含む。

MOPO-LSI is an open-source Multi-Objective Portfolio Optimization Library for Sustainable Investments. This document provides a user guide for MOPO-LSI version 1.0, including problem setup, workflow and the hyper-parameters in configurations.

翻訳日:2023-07-06 17:00:35 公開日:2023-07-04

# 制約時間系列生成問題について

On the Constrained Time-Series Generation Problem ( http://arxiv.org/abs/2307.01717v1 )

ライセンス: Link先を確認

Andrea Coletta, Sriram Gopalakrishan, Daniel Borrajo, Svitlana Vyetrenko

(参考訳) 合成時系列は、機械学習アルゴリズムの性能向上のために過去の時系列データセットを拡張したり、まれな事象の発生を増幅したり、時系列で記述される反実仮想シナリオを作成したりするために、実務でよく利用される。分布の類似性(本稿ではリアリズムと呼ぶ)と特定の数値的制約の充足は、反実仮想的な時系列シナリオの生成において共通する要件である。例えば、米連邦準備制度(Federal Reserve)は、金融機関が仮想的な景気後退下での業績を評価できるよう、制約付き時系列で与えられる合成の市場ストレスシナリオを公表している。制約付き時系列を生成する既存手法は、通常、制約を課すために訓練損失にペナルティを加え、制約を満たさないサンプルを棄却する。しかし、これらの手法は制約を変更すると再訓練が必要になり、棄却サンプリングは計算コストが高く、複雑な制約に対しては実用的でないことがある。本稿では、制約付き時系列生成問題に取り組み、生成される時系列のリアリズムを確保しつつ効率的なサンプリングを実現する一連の新しい手法を提案する。具体的には、制約付き最適化の枠組みで問題を定式化し、現実的な時系列を生成する誘導拡散モデル「GuidedDiffTime」を含む一連の生成手法を提案する。実験では、制約の組み込みが重要となる金融およびエネルギーの複数のデータセットで評価する。提案手法は定性的にも定量的にも既存手法を上回る。最も重要な点として、「GuidedDiffTime」モデルは新しい制約に対して再訓練を必要としない唯一の解法であり、その結果カーボンフットプリントを大幅に削減できることを示す。

Synthetic time series are often used in practical applications to augment the historical time series dataset for better performance of machine learning algorithms, amplify the occurrence of rare events, and also create counterfactual scenarios described by the time series. Distributional-similarity (which we refer to as realism) as well as the satisfaction of certain numerical constraints are common requirements in counterfactual time series scenario generation requests. For instance, the US Federal Reserve publishes synthetic market stress scenarios given by the constrained time series for financial institutions to assess their performance in hypothetical recessions. Existing approaches for generating constrained time series usually penalize training loss to enforce constraints, and reject non-conforming samples. However, these approaches would require re-training if we change constraints, and rejection sampling can be computationally expensive, or impractical for complex constraints. In this paper, we propose a novel set of methods to tackle the constrained time series generation problem and provide efficient sampling while ensuring the realism of generated time series. In particular, we frame the problem using a constrained optimization framework and then we propose a set of generative methods including ``GuidedDiffTime'', a guided diffusion model to generate realistic time series. Empirically, we evaluate our work on several datasets for financial and energy data, where incorporating constraints is critical. We show that our approaches outperform existing work both qualitatively and quantitatively. Most importantly, we show that our ``GuidedDiffTime'' model is the only solution where re-training is not necessary for new constraints, resulting in a significant carbon footprint reduction.

翻訳日:2023-07-06 17:00:31 公開日:2023-07-04

# Align With Purpose: 汎用プラグアンドプレイフレームワークによるCTCモデルの所望特性の最適化

Align With Purpose: Optimize Desired Properties in CTC Models with a General Plug-and-Play Framework ( http://arxiv.org/abs/2307.01715v1 )

ライセンス: Link先を確認

Eliya Segev, Maya Alroy, Ronen Katsir, Noam Wies, Ayana Shenhav, Yael Ben-Oren, David Zar, Oren Tadmor, Jacob Bitterman, Amnon Shashua and Tal Rosenwein

(参考訳) コネクショニスト時間分類(CTC)は、教師ありsequence-to-sequence(seq2seq)モデルの訓練に広く用いられている基準である。CTCは、不完全なアライメントを犠牲にして、完全なアライメント(正解を生み出すアライメント)について周辺化することで、アライメントと呼ばれる入力系列と出力系列の関係を学習できるようにする。完全なアライメントと不完全なアライメントというこの二値的な区別では、他の実世界の応用で重要となるアライメントの本質的な特性を捉えきれない。そこで、CTC基準で訓練されたモデルの所望の特性を強化するための$\textbf{汎用プラグアンドプレイフレームワーク}$である$\textit{Align With Purpose}$を提案する。具体的には、所望の特性に応じてアライメントを優先順位付けする追加の損失項でCTCを補完する。本手法はCTC損失関数への介入を一切必要とせず、さまざまな特性を容易に最適化でき、完全なアライメントと不完全なアライメントのいずれについても区別を可能にする。本フレームワークを自動音声認識(ASR)の領域に適用し、特性の選択、アーキテクチャの選択、訓練データセットの規模(最大280,000時間)の点でその汎用性を示す。フレームワークの有効性を示すため、出力時刻(emission time)と単語誤り率(WER)という2つの無関係な特性に適用した。前者については、WERのわずかな低下を伴いつつレイテンシ最適化で最大570msの改善を報告し、後者については、ベースラインモデルに対して相対4.5%のWER改善を報告する。我々の知る限り、これらの応用が我々ほど大規模なデータで機能することが示された例はない。特筆すべき点として、本手法は数行のコードで実装でき、他のアライメントフリーな損失関数やASR以外の領域にも拡張できる。

Connectionist Temporal Classification (CTC) is a widely used criterion for training supervised sequence-to-sequence (seq2seq) models. It enables learning the relations between input and output sequences, termed alignments, by marginalizing over perfect alignments (that yield the ground truth), at the expense of imperfect alignments. This binary differentiation of perfect and imperfect alignments falls short of capturing other essential alignment properties that hold significance in other real-world applications. Here we propose $\textit{Align With Purpose}$, a $\textbf{general Plug-and-Play framework}$ for enhancing a desired property in models trained with the CTC criterion. We do that by complementing the CTC with an additional loss term that prioritizes alignments according to a desired property. Our method does not require any intervention in the CTC loss function, enables easy optimization of a variety of properties, and allows differentiation between both perfect and imperfect alignments. We apply our framework in the domain of Automatic Speech Recognition (ASR) and show its generality in terms of property selection, architectural choice, and scale of training dataset (up to 280,000 hours). To demonstrate the effectiveness of our framework, we apply it to two unrelated properties: emission time and word error rate (WER). For the former, we report an improvement of up to 570ms in latency optimization with a minor reduction in WER, and for the latter, we report a relative improvement of 4.5% WER over the baseline models. To the best of our knowledge, these applications have never been demonstrated to work on a scale of data as large as ours. Notably, our method can be implemented using only a few lines of code, and can be extended to other alignment-free loss functions and to domains other than ASR.

翻訳日:2023-07-06 17:00:03 公開日:2023-07-04
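The idea of adding a loss term that prioritizes some alignments over others can be illustrated with a small sketch. Everything below is an assumption made for illustration: the hinge form of the extra term, the function names, the `weight` and `margin` parameters, and the toy probabilities are not taken from the abstract, which only says that an additional loss term complements the unchanged CTC loss.

```python
import math

def alignment_log_prob(frame_probs, alignment):
    """Log-probability of one alignment: sum of per-frame log-probs of its tokens."""
    return sum(math.log(frame_probs[t][tok]) for t, tok in enumerate(alignment))

def property_hinge(frame_probs, sampled, improved, margin=0.0):
    """Hypothetical extra term: zero once the 'improved' alignment is at least
    `margin` more probable than the sampled one, positive otherwise."""
    p_sampled = math.exp(alignment_log_prob(frame_probs, sampled))
    p_improved = math.exp(alignment_log_prob(frame_probs, improved))
    return max(0.0, p_sampled - p_improved + margin)

def total_loss(ctc_loss, extra_term, weight=0.1):
    """The CTC loss is left untouched; the extra term is simply added (assumed weighting)."""
    return ctc_loss + weight * extra_term

# Toy example: 3 frames, vocabulary {0: blank, 1: 'a'}.
frame_probs = [[0.6, 0.4], [0.3, 0.7], [0.5, 0.5]]
late = [0, 1, 1]    # emits 'a' starting at frame 1
early = [1, 1, 0]   # emits 'a' starting at frame 0 (lower latency)
penalty = property_hinge(frame_probs, sampled=late, improved=early)
```

Both toy alignments collapse to the same output under CTC, so the plain CTC loss does not distinguish them; the extra term is positive while the later-emitting alignment remains the more probable one.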

# Logic meets Wigner's Friend (and their Friends)

http://arxiv.org/abs/2307.01713v1

License: see link

Alexandru Baltag and Sonja Smets

We take a fresh look at the Wigner's Friend thought experiment and some of its more recent variants and extensions, such as the Frauchiger-Renner (FR) Paradox. We discuss various solutions proposed in the literature, focusing on a few questions: What is the correct epistemic interpretation of the multiplicity of state assignments in these scenarios? Under which conditions can one include classical observers in the quantum state descriptions, in a way that is still compatible with traditional Quantum Mechanics? Under which conditions can one system be admitted as an additional 'observer' from the perspective of another background observer? When can the standard axioms of multi-agent Epistemic Logic (that allow "knowledge transfer" between agents) be applied to quantum-physical observers? In the last part of the paper, we propose a new answer to these questions, sketch a particular formal implementation of this answer, and apply it to obtain a principled solution to Wigner Friend-type paradoxes.

Translated: 2023-07-06 16:59:29 Published: 2023-07-04

# Dipping PLMs Sauce: Bridging Structure and Text for Effective Knowledge Graph Completion via Conditional Soft Prompting

http://arxiv.org/abs/2307.01709v1

License: see link

Chen Chen, Yufei Wang, Aixin Sun, Bing Li and Kwok-Yan Lam

Knowledge Graph Completion (KGC) often requires both KG structural and textual information to be effective. Pre-trained Language Models (PLMs) have been used to learn the textual information, usually under the fine-tuning paradigm for the KGC task. However, fine-tuned PLMs often focus overwhelmingly on the textual information and overlook structural knowledge. To tackle this issue, this paper proposes CSProm-KG (Conditional Soft Prompts for KGC), which maintains a balance between structural information and textual knowledge. CSProm-KG only tunes the parameters of Conditional Soft Prompts that are generated from the entity and relation representations. We verify the effectiveness of CSProm-KG on three popular static KGC benchmarks, WN18RR, FB15K-237 and Wikidata5M, and two temporal KGC benchmarks, ICEWS14 and ICEWS05-15. CSProm-KG outperforms competitive baseline models and sets a new state of the art on these benchmarks. We conduct further analysis to show (i) the effectiveness of our proposed components, (ii) the efficiency of CSProm-KG, and (iii) the flexibility of CSProm-KG.

Translated: 2023-07-06 16:59:12 Published: 2023-07-04

# Distributional Model Equivalence for Risk-Sensitive Reinforcement Learning

http://arxiv.org/abs/2307.01708v1

License: see link

Tyler Kastner, Murat A. Erdogdu, Amir-massoud Farahmand

We consider the problem of learning models for risk-sensitive reinforcement learning. We theoretically demonstrate that proper value equivalence, a method of learning models which can be used to plan optimally in the risk-neutral setting, is not sufficient to plan optimally in the risk-sensitive setting. We leverage distributional reinforcement learning to introduce two new notions of model equivalence: one which is general and can be used to plan for any risk measure, but is intractable; and a practical variation which allows one to choose which risk measures they may plan optimally for. We demonstrate how our framework can be used to augment any model-free risk-sensitive algorithm, and provide both tabular and large-scale experiments to demonstrate its ability.

Translated: 2023-07-06 16:58:49 Published: 2023-07-04
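The gap between risk-neutral and risk-sensitive planning can be seen in a tiny numerical sketch. The choice of conditional value-at-risk (CVaR) as the risk measure and the two return distributions are assumptions made for illustration; the abstract does not name a specific risk measure.

```python
import math

def mean(returns):
    """Risk-neutral objective: the expected return of equally likely outcomes."""
    return sum(returns) / len(returns)

def cvar(returns, alpha):
    """Average of the worst alpha-fraction of equally likely returns."""
    k = max(1, math.ceil(alpha * len(returns)))
    worst = sorted(returns)[:k]
    return sum(worst) / k

# Two hypothetical return distributions with identical expectation.
safe = [1.0, 1.0, 1.0, 1.0]
risky = [-2.0, 0.0, 2.0, 4.0]

same_mean = mean(safe) == mean(risky)
tail_safe = cvar(safe, 0.25)
tail_risky = cvar(risky, 0.25)
```

A model that only matches expected values cannot tell these two distributions apart, while a tail-sensitive measure ranks them differently, which is the kind of distinction a distributional notion of equivalence has to preserve.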

# Graph-Ensemble Learning Model for Multi-label Skin Lesion Classification using Dermoscopy and Clinical Images

http://arxiv.org/abs/2307.01704v1

License: see link

Peng Tang, Yang Nan, Tobias Lasser

Many recent skin lesion analysis (SLA) methods have focused on multi-modal, multi-label classification, for two reasons. The first is that multi-modal data, i.e., clinical and dermoscopy images, can provide complementary information and yield more accurate results than single-modal data. The second is that multi-label classification, i.e., using the seven-point checklist (SPC) criteria as an auxiliary classification task, can not only boost the diagnostic accuracy of melanoma in the deep learning (DL) pipeline but also provide more useful functions to clinicians, as the checklist is commonly used in dermatologists' clinical diagnosis. However, most methods only focus on designing a better module for multi-modal data fusion; few methods explore utilizing the label correlation between SPC and skin disease for performance improvement. This study fills this gap by introducing a Graph Convolution Network (GCN) that exploits the prior co-occurrence between categories, expressed as a correlation matrix, in the DL model for multi-label classification. However, directly applying GCN degraded the performance in our experiments; we attribute this to the weak generalization ability of GCN when statistical samples of medical data are insufficient. We tackle this issue by proposing a Graph-Ensemble Learning Model (GELN) that views the prediction from GCN as complementary information to the predictions from the fusion model and adaptively fuses them by a weighted averaging scheme, which can utilize the valuable information from GCN while avoiding its negative influences as much as possible. To evaluate our method, we conduct experiments on public datasets. The results illustrate that our GELN can consistently improve the classification performance on different datasets and that the proposed method can achieve state-of-the-art performance in SPC and diagnosis classification.

Translated: 2023-07-06 16:58:35 Published: 2023-07-04
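The weighted averaging of two prediction sources can be sketched as follows. This is only a minimal illustration: the function name, the single scalar weight `w_gcn`, and the example probabilities are hypothetical, and the abstract does not say how the weight is adapted.

```python
def fuse_predictions(p_fusion, p_gcn, w_gcn):
    """Weighted average of two per-label probability vectors.

    w_gcn = 0 ignores the GCN branch entirely; w_gcn = 1 uses only the GCN branch.
    """
    if not 0.0 <= w_gcn <= 1.0:
        raise ValueError("w_gcn must lie in [0, 1]")
    if len(p_fusion) != len(p_gcn):
        raise ValueError("prediction vectors must have the same length")
    return [(1.0 - w_gcn) * a + w_gcn * b for a, b in zip(p_fusion, p_gcn)]

# Hypothetical per-label probabilities from the two branches.
fused = fuse_predictions([0.8, 0.2], [0.4, 0.6], w_gcn=0.25)
```

Keeping the weight small limits how far a poorly generalizing branch can pull the final prediction, while still letting it contribute.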

# Augment Features Beyond Color for Domain Generalized Segmentation

http://arxiv.org/abs/2307.01703v1

License: see link

Qiyu Sun, Pavlo Melnyk, Michael Felsberg, Yang Tang

Domain generalized semantic segmentation (DGSS) is an essential but highly challenging task, in which the model is trained only on source data and no target data is available. Previous DGSS methods can be partitioned into augmentation-based and normalization-based ones. The former either introduce extra biased data or only conduct channel-wise adjustments for data augmentation, and the latter may discard beneficial visual information, both of which lead to limited performance in DGSS. In contrast, our method performs inter-channel transformation while avoiding domain-specific biases, thus diversifying data and enhancing model generalization performance. Specifically, our method consists of two modules: random image color augmentation (RICA) and random feature distribution augmentation (RFDA). RICA converts images from RGB to the CIELAB color model and randomizes color maps in a perception-based way for image enhancement purposes. We further this augmentation by extending it beyond color to feature space using a CycleGAN-based generative network, which complements RICA and further boosts generalization capability. We conduct extensive experiments, and the generalization results from the synthetic GTAV and SYNTHIA to the real Cityscapes, BDDS, and Mapillary datasets show that our method achieves state-of-the-art performance in DGSS.

Translated: 2023-07-06 16:58:02 Published: 2023-07-04

# The Quantum Advantage in Binary Teams and the Coordination Dilemma: Part I

http://arxiv.org/abs/2307.01762v1

License: see link

Shashank A. Deshpande and Ankur A. Kulkarni

We have shown that entanglement-assisted stochastic strategies allow access to strategic measures beyond the classically correlated measures accessible through passive common randomness, and thus attain a quantum advantage in decentralised control. In this two-part series of articles, we investigate the decision-theoretic origins of the quantum advantage within a broad superstructure of problem classes. Each class in our binary team superstructure corresponds to a parametric family of cost functions with a distinct algebraic structure. In this part, we identify the only problem classes that benefit from quantum strategies. We find that these cost structures admit a special decision-theoretic feature: 'the coordination dilemma'. Our analysis hence reveals some intuition towards the utility of non-local quantum correlations in decentralised control.

Translated: 2023-07-06 16:53:15 Published: 2023-07-04

# Pretraining is All You Need: A Multi-Atlas Enhanced Transformer Framework for Autism Spectrum Disorder Classification

http://arxiv.org/abs/2307.01759v1

License: see link

Lucas Mahler, Qi Wang, Julius Steiglechner, Florian Birk, Samuel Heczko, Klaus Scheffler, Gabriele Lohmann

Autism spectrum disorder (ASD) is a prevalent psychiatric condition characterized by atypical cognitive, emotional, and social patterns. Timely and accurate diagnosis is crucial for effective interventions and improved outcomes in individuals with ASD. In this study, we propose a novel Multi-Atlas Enhanced Transformer framework, METAFormer, for ASD classification. Our framework utilizes resting-state functional magnetic resonance imaging data from the ABIDE I dataset, comprising 406 ASD and 476 typical control (TC) subjects. METAFormer employs a multi-atlas approach, where flattened connectivity matrices from the AAL, CC200, and DOS160 atlases serve as input to the transformer encoder. Notably, we demonstrate that self-supervised pretraining, involving the reconstruction of masked values from the input, significantly enhances classification performance without the need for additional or separate training data. Through stratified cross-validation, we evaluate the proposed framework and show that it surpasses state-of-the-art performance on the ABIDE I dataset, with an average accuracy of 83.7% and an AUC score of 0.832. The code for our framework is available at https://github.com/Lugges991/METAFormer

Translated: 2023-07-06 16:53:03 Published: 2023-07-04

# Identifying Professional Photographers Through Image Quality and Aesthetics in Flickr

http://arxiv.org/abs/2307.01756v1

License: see link

Sofia Strukova, Rubén Gaspar Marco, José A. Ruipérez-Valiente, Félix Gómez Mármol

In our generation, there has been an undoubted rise in the use of social media, and specifically of photo and video sharing platforms. These sites have proved their ability to yield rich data sets through users' interactions, which can be used to perform a data-driven evaluation of capabilities. Nevertheless, this study reveals the lack of suitable data sets in photo and video sharing platforms and of evaluation processes across them. Our first contribution is therefore the creation of one of the largest labelled data sets on Flickr, with multimodal data, which has been open sourced as part of this contribution. Predicated on these data, we explored machine learning models and concluded that it is feasible to properly predict whether a user is a professional photographer or not based on self-reported occupation labels and several feature representations out of the user, photo and crowdsourced sets. We also examined the relationship between the aesthetics and technical quality of a picture and the social activity of that picture. Finally, we depicted which characteristics differentiate professional photographers from non-professionals. As far as we know, the results presented in this work represent an important novelty for users' expertise identification, which researchers from various domains can use for different applications.

Translated: 2023-07-06 16:52:43 Published: 2023-07-04

# K-complex Detection Using Fourier Spectrum Analysis In EEG

http://arxiv.org/abs/2307.01754v1

License: see link

Alexey Protopopov

K-complexes are an important marker of brain activity and are used both in clinical practice to perform sleep scoring, and in research. However, due to the size of electroencephalography (EEG) records, as well as the subjective nature of K-complex detection performed by somnologists, it is reasonable to automate K-complex detection. Previous works in this field of research have relied on the values of true positive rate and false positive rate to quantify the effectiveness of proposed methods; however, this set of metrics may be misleading. The objective of the present research is to find a more accurate set of metrics and use them to develop a new method of K-complex detection, which would not rely on neural networks. Thus, the present article proposes two new methods for K-complex detection based on the fast Fourier transform. The results achieved demonstrated that the proposed methods offered a quality of K-complex detection that is either similar or superior to the quality of the methods demonstrated in previous works, including the methods employing neural networks, while requiring less computational power, meaning that K-complex detection does not require the use of neural networks. The proposed methods were evaluated using a new set of metrics, which is more representative of the quality of K-complex detection.

Translated: 2023-07-06 16:52:22 Published: 2023-07-04
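A Fourier-spectrum feature of the kind such a detector could use can be sketched as below. This is not the paper's method: the band-power-fraction feature, the band edges, and the function names are assumptions, and a naive discrete Fourier transform stands in for an FFT so that the sketch needs only the standard library (it produces the same spectrum, just more slowly).

```python
import cmath
import math

def power_spectrum(signal, fs):
    """One-sided power spectrum as (frequency_hz, power) pairs via a naive DFT."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, x in enumerate(signal))
        spectrum.append((k * fs / n, abs(coeff) ** 2))
    return spectrum

def band_power_fraction(signal, fs, low_hz, high_hz):
    """Share of the total spectral power that falls inside [low_hz, high_hz]."""
    spectrum = power_spectrum(signal, fs)
    total = sum(p for _, p in spectrum)
    if total == 0:
        return 0.0
    band = sum(p for f, p in spectrum if low_hz <= f <= high_hz)
    return band / total

# Synthetic 4-second window sampled at 16 Hz containing a pure 1 Hz wave.
fs = 16
window = [math.sin(2 * math.pi * 1.0 * i / fs) for i in range(4 * fs)]
low_band_share = band_power_fraction(window, fs, 0.5, 2.0)
high_band_share = band_power_fraction(window, fs, 4.0, 8.0)
```

A detector along these lines would slide such a window over the EEG record and flag windows whose low-frequency share exceeds a threshold; the threshold and band would have to be tuned on annotated data.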

# Local primordial non-Gaussianity from the large-scale clustering of photometric DESI luminous red galaxies

http://arxiv.org/abs/2307.01753v1

License: see link

Mehdi Rezaie, Ashley J. Ross, Hee-Jong Seo, Hui Kong, Anna Porredon, Lado Samushia, Edmond Chaussidon, Alex Krolewski, Arnaud de Mattia, Florian Beutler, Jessica Nicole Aguilar, Steven Ahlen, Shadab Alam, Santiago Avila, Benedict Bahr-Kalus, Jose Bermejo-Climent, David Brooks, Todd Claybaugh, Shaun Cole, Kyle Dawson, Axel de la Macorra, Peter Doel, Andreu Font-Ribera, Jaime E. Forero-Romero, Satya Gontcho A Gontcho, Julien Guy, Klaus Honscheid, Theodore Kisner, Martin Landriau, Michael Levi, Marc Manera, Aaron Meisner, Ramon Miquel, Eva-Maria Mueller, Adam Myers, Jeffrey A. Newman, Jundan Nie, Nathalie Palanque-Delabrouille, Will Percival, Claire Poppett, Graziano Rossi, Eusebio Sanchez, Michael Schubnell, Gregory Tarlé, Benjamin Alan Weaver, Christophe Yèche, Zhimin Zhou, Hu Zou

We use angular clustering of luminous red galaxies from the Dark Energy Spectroscopic Instrument (DESI) imaging surveys to constrain the local primordial non-Gaussianity parameter fNL. Our sample comprises over 12 million targets, covering 14,000 square degrees of the sky, with redshifts in the range 0.2 < z < 1.35. We identify Galactic extinction, survey depth, and astronomical seeing as the primary sources of systematic error, and employ linear regression and artificial neural networks to alleviate non-cosmological excess clustering on large scales. Our methods are tested against log-normal simulations with and without fNL and systematics, showing superior performance of the neural network treatment in reducing remaining systematics. Assuming the universality relation, we find fNL $= 47^{+14(+29)}_{-11(-22)}$ at 68% (95%) confidence. With a more aggressive treatment, including regression against the full set of imaging maps, our maximum likelihood value shifts slightly to fNL $\sim 50$ and the uncertainty on fNL increases due to the removal of large-scale clustering information. We apply a series of robustness tests (e.g., cuts on imaging, declination, or scales used) that show consistency in the obtained constraints. Despite extensive efforts to mitigate systematics, our measurements indicate fNL > 0 with a 99.9 percent confidence level. This outcome raises concerns as it could be attributed to unforeseen systematics, including calibration errors or uncertainties associated with low-$\ell$ systematics in the extinction template. Alternatively, it could suggest a scale-dependent fNL model, which would cause significant non-Gaussianity around large-scale structure while leaving cosmic microwave background scales unaffected. Our results encourage further studies of fNL with DESI spectroscopic samples, where the inclusion of 3D clustering modes should help separate imaging systematics.

Translated: 2023-07-06 16:52:01 Published: 2023-07-04

# Controlling electric and magnetic Purcell effects in phosphorene via strain engineering

http://arxiv.org/abs/2307.01752v1

License: see link

P. P. Abrantes, W. J. M. Kort-Kamp, F. S. S. Rosa, C. Farina, F. A. Pinheiro, and Tarik P. Cysne

We investigate the spontaneous emission lifetime of a quantum emitter near a substrate coated with phosphorene under the influence of uniaxial strain. We consider both electric dipole and magnetic dipole-mediated spontaneous transitions from the excited to the ground state. The modeling of phosphorene is performed by employing a tight-binding model that goes beyond the usual low-energy description. We demonstrate that both electric and magnetic decay rates can be strongly tuned by the application of uniform strain, ranging from a near-total suppression of the Purcell effect to a remarkable enhancement of more than 1300% due to the high flexibility associated with the puckered lattice structure of phosphorene. We also unveil the use of strain as a mechanism to tailor the most probable decay pathways of the emitted quanta. Our results show that uniaxially strained phosphorene is an efficient and versatile material platform for the active control of light-matter interactions thanks to its extraordinary optomechanical properties.

Translated: 2023-07-06 16:51:28 Published: 2023-07-04

# SRCD: Semantic Reasoning with Compound Domains for Single-Domain Generalized Object Detection

http://arxiv.org/abs/2307.01750v1

License: see link

Zhijie Rao, Jingcai Guo, Luyao Tang, Yue Huang, Xinghao Ding, Song Guo

This paper provides a novel framework for single-domain generalized object detection (i.e., Single-DGOD), where we are interested in learning and maintaining the semantic structures of self-augmented compound cross-domain samples to enhance the model's generalization ability. Unlike DGOD, which is trained on multiple source domains, Single-DGOD must generalize to multiple target domains from only a single source domain, which is far more challenging. Existing methods mostly adopt a similar treatment from DGOD to learn domain-invariant features by decoupling or compressing the semantic space. However, there may be two potential limitations: 1) pseudo attribute-label correlation, due to extremely scarce single-domain data; and 2) the semantic structural information is usually ignored, i.e., we found the affinities of instance-level semantic relations in samples are crucial to model generalization. In this paper, we introduce Semantic Reasoning with Compound Domains (SRCD) for Single-DGOD. Specifically, our SRCD contains two main components, namely, the texture-based self-augmentation (TBSA) module, and the local-global semantic reasoning (LGSR) module. TBSA aims to eliminate the effects of irrelevant attributes associated with labels, such as light, shadow, color, etc., at the image level by a light-yet-efficient self-augmentation. Moreover, LGSR is used to further model the semantic relationships on instance features to uncover and maintain the intrinsic semantic structures. Extensive experiments on multiple benchmarks demonstrate the effectiveness of the proposed SRCD.

Translated: 2023-07-06 16:51:12 Published: 2023-07-04

# Ben-ge: Extending BigEarthNet with Geographical and Environmental Data

http://arxiv.org/abs/2307.01741v1

License: see link

Michael Mommert, Nicolas Kesseli, Joëlle Hanna, Linus Scheibenreif, Damian Borth, Begüm Demir

Deep learning methods have proven to be a powerful tool in the analysis of large amounts of complex Earth observation data. However, while Earth observation data are multi-modal in most cases, only single or few modalities are typically considered. In this work, we present the ben-ge dataset, which supplements the BigEarthNet-MM dataset by compiling freely and globally available geographical and environmental data. Based on this dataset, we showcase the value of combining different data modalities for the downstream tasks of patch-based land-use/land-cover classification and land-use/land-cover segmentation. ben-ge is freely available and expected to serve as a test bed for fully supervised and self-supervised Earth observation applications.

Translated: 2023-07-06 16:50:31 Published: 2023-07-04

# Synchronous Image-Label Diffusion Probability Model with Application to Stroke Lesion Segmentation on Non-contrast CT

http://arxiv.org/abs/2307.01740v1

License: see link

Jianhai Zhang and Tonghua Wan and Ethan MacDonald and Aravind Ganesh and Qiu Wu

Stroke lesion volume is a key radiologic measurement for assessing the prognosis of Acute Ischemic Stroke (AIS) patients, but it is challenging to measure automatically on Non-Contrast CT (NCCT) scans. Recent diffusion probabilistic models have shown potential for image segmentation. In this paper, a novel Synchronous image-label Diffusion Probability Model (SDPM) is proposed for stroke lesion segmentation on NCCT using a Markov diffusion process. The proposed SDPM is fully based on a Latent Variable Model (LVM), offering a complete probabilistic elaboration. An additional net-stream, parallel with a noise prediction stream, is introduced to obtain initial noisy label estimates for efficiently inferring the final labels. By optimizing the specified variational boundaries, the trained model can infer multiple label estimates for reference given noisy input images. The proposed model was assessed on three stroke lesion datasets including one public and two private datasets. Compared to several U-net and transformer-based segmentation methods, our proposed SDPM model is able to achieve state-of-the-art performance. The code is publicly available.

Translated: 2023-07-06 16:49:56 Published: 2023-07-04

# Mitigating Calibration Bias Without Fixed Attribute Grouping for Improved Fairness in Medical Imaging Analysis

http://arxiv.org/abs/2307.01738v1

License: see link

Changjian Shui, Justin Szeto, Raghav Mehta, Douglas Arnold, Tal Arbel

Trustworthy deployment of deep learning medical imaging models into real-world clinical practice requires that they be calibrated. However, models that are well calibrated overall can still be poorly calibrated for a sub-population, potentially resulting in a clinician unwittingly making poor decisions for this group based on the recommendations of the model. Although methods have been shown to successfully mitigate biases across subgroups in terms of model accuracy, this work focuses on the open problem of mitigating calibration biases in the context of medical image analysis. Our method does not require subgroup attributes during training, permitting the flexibility to mitigate biases for different choices of sensitive attributes without re-training. To this end, we propose a novel two-stage method, Cluster-Focal, which first identifies poorly calibrated samples and clusters them into groups, and then introduces a group-wise focal loss to reduce calibration bias. We evaluate our method on skin lesion classification with the public HAM10000 dataset, and on predicting future lesional activity for multiple sclerosis (MS) patients. In addition to considering traditional sensitive attributes (e.g. age, sex) with demographic subgroups, we also consider biases among groups with different image-derived attributes, such as lesion load, which are required in medical image analysis. Our results demonstrate that our method effectively controls calibration error in the worst-performing subgroups while preserving prediction performance, and outperforms recent baselines.

Translated: 2023-07-06 16:49:17 Published: 2023-07-04

# すべての国家のデジタル主権戦略

Digital Sovereignty Strategies for Every Nation ( http://arxiv.org/abs/2307.01791v1 )

ライセンス: Link先を確認

Ali Shoker

(参考訳) デジタル主権はすべての近代国家の議題になければならない。デジタル技術は、食料や水の管理といった不可欠な要素から、メタバースや宇宙における超越まで、私たちの生活の細部の一部になりつつある。したがって、これらのデジタル資産を保護することは、現代国家が存続し、卓越し、主導していくうえで避けられない。デジタル主権は、友好的で合理的な国家による独占と、非友好的で悪意のある国家や行動による脅威から、これらのデジタル資産を守るための戦略的必要性である。本研究では、デジタル資産の利用、所有、生産というバリューチェーン全体をカバーするように拡張することで、デジタル主権の定義と範囲を再検討する。我々は、持続可能な主権を達成するために必要な研究と革新に加えて、原材料と人的専門知識の両方を含む運用資源を保護することの重要性を強調する。また、自律によるデジタル主権はしばしば不可能であり、相互協力によるデジタル主権は必ずしも持続可能ではないことを示す。この目的のために、ゲーム理論でよく研究されるナッシュ均衡を用いてデジタル主権を実現し、合理的な国家との関係を規定することを提案する。最後に、現状、優先事項、能力に基づいて、各国の異なるデジタルプロファイルに応じたデジタル主権アジェンダを提案する。我々は、現在のデジタル資産を主権化するのに有用な最先端のデジタル技術を調査する。また、自律性に可能な限り近い主権的なデジタル国家の育成を目指すロードマップも提案する。最後に、技術的、経済的、地政学的という異なる観点からデジタル主権をよりよく理解し、実装するためのさらなる研究の必要性に注目する。

Digital Sovereignty must be on the agenda of every modern nation. Digital technology is becoming part of the details of our lives, from vital essentials, like food and water management, to transcendence in the Metaverse and Space. Protecting these digital assets will, therefore, be inevitable for a modern country to live, excel and lead. Digital Sovereignty is a strategic necessity to protect these digital assets from the monopoly of friendly rational states, and from the threats of unfriendly malicious states and behaviors. In this work, we revisit the definition and scope of digital sovereignty by extending it to cover the entire value chain of using, owning, and producing digital assets. We emphasize the importance of protecting the operational resources, both raw materials and human expertise, in addition to the research and innovation necessary to achieve sustainable sovereignty. We also show that digital sovereignty by autonomy is often impossible, and that digital sovereignty by mutual cooperation is not always sustainable. To this end, we propose implementing digital sovereignty using Nash Equilibrium, often studied in Game Theory, to govern the relation with rational states. Finally, we propose a digital sovereignty agenda for different countries' digital profiles, based on their status quo, priorities, and capabilities. We survey state-of-the-art digital technology that is useful to make the current digital assets sovereign. Additionally, we propose a roadmap that aims to develop a sovereign digital nation, as close as possible to autonomy. Finally, we draw attention to the need for more research to better understand and implement digital sovereignty from different perspectives: technological, economic, and geopolitical.

翻訳日:2023-07-06 16:41:34 公開日:2023-07-04

# 二重シンプレクティック古典回路:多体カオスの正確に解けるモデル

Dual symplectic classical circuits: An exactly solvable model of many-body chaos ( http://arxiv.org/abs/2307.01786v1 )

ライセンス: Link先を確認

Alexios Christopoulos, Andrea De Luca, D L Kovrizhin, Tomaž Prosen

(参考訳) 1次元の双対シンプレクティックれんが壁回路における動的相関関数を計算する、一般的で厳密な方法を提案する。これらは決定論的な古典的多体力学系であり、2つの直交する(時間と空間の)方向のシンプレクティックダイナミクスとして解釈できる。量子双対ユニタリ回路との密接な類似性により、2点動的相関関数は光円錐の端に沿ってのみ非ゼロとなることを証明する。動的相関は、一般に無限次元である1サイトのマルコフ転送作用素を用いて厳密に計算可能である。我々はこの理論を、古典的なフロッケスピン鎖のダイナミクスを記述する双対シンプレクティック回路の特定の族でテストする。注目すべきことに、これらのモデルでは、回転対称性により、転送作用素は球面調和関数の基底でブロック対角形となる。これにより、単純な局所観測量に対する解析的予測が得られる。モンテカルロシミュレーションとの比較により、観測量の異なる選択に対して優れた一致を示し、我々の理論の有効性を実証する。

We propose a general exact method of calculating dynamical correlation functions in dual symplectic brick-wall circuits in one dimension. These are deterministic classical many-body dynamical systems which can be interpreted in terms of symplectic dynamics in two orthogonal (time and space) directions. In close analogy with quantum dual-unitary circuits, we prove that two-point dynamical correlation functions are non-vanishing only along the edges of the light cones. The dynamical correlations are exactly computable in terms of a one-site Markov transfer operator, which is generally of infinite dimensionality. We test our theory in a specific family of dual-symplectic circuits, describing the dynamics of a classical Floquet spin chain. Remarkably, for these models, the rotational symmetry leads to a transfer operator with a block diagonal form on the basis of spherical harmonics. This allows us to obtain analytical predictions for simple local observables. We demonstrate the validity of our theory by comparison with Monte Carlo simulations, displaying excellent agreement for different choices of observables.

翻訳日:2023-07-06 16:41:05 公開日:2023-07-04

# 思考の内部的な感情

The Inner Sentiments of a Thought ( http://arxiv.org/abs/2307.01784v1 )

ライセンス: Link先を確認

Chris Gagne and Peter Dayan

(参考訳) トランスフォーマーベースの大規模言語モデル(LLM)は、非常にリアルなテキストを生成することができる。LLMは、感情価や覚醒度のような明白なものから、決意や賞賛といった微妙なものまで、幅広い感情や色合いを表現することができ、少なくとも暗黙的にそれらを表現として保持している。我々は、これらの表現を初めて探究し、それらが単一文の内部の感情的な働きを理解するのにどのように役立つかを示す。長さを増していく接頭辞にLLMを適用して得られる隠れ表現から、文の最終的な感情の分布の分位点を予測する予測器を訓練する。感情価、決意、賞賛、不安、苛立ちの分布の予測器が適切に校正されていることを示したうえで、これらの予測器を用いて文を分析する例を示し、例えば、ありふれた接続詞(例えば"but")でさえ、発話の感情的軌跡を劇的に変えうることを示す。次に、分布予測を活用して、分布の裾に位置する感情を持つ文を生成する方法を示す。精神機能障害などを例に、思考の内的な働きに対する本結果の意義について考察する。

Transformer-based large-scale language models (LLMs) are able to generate highly realistic text. They are duly able to express, and at least implicitly represent, a wide range of sentiments and color, from the obvious, such as valence and arousal to the subtle, such as determination and admiration. We provide a first exploration of these representations and how they can be used for understanding the inner sentimental workings of single sentences. We train predictors of the quantiles of the distributions of final sentiments of sentences from the hidden representations of an LLM applied to prefixes of increasing lengths. After showing that predictors of distributions of valence, determination, admiration, anxiety and annoyance are well calibrated, we provide examples of using these predictors for analyzing sentences, illustrating, for instance, how even ordinary conjunctions (e.g., "but") can dramatically alter the emotional trajectory of an utterance. We then show how to exploit the distributional predictions to generate sentences with sentiments in the tails of distributions. We discuss the implications of our results for the inner workings of thoughts, for instance for psychiatric dysfunction.

翻訳日:2023-07-06 16:40:48 公開日:2023-07-04

# GHOST:シリコンフォトニクスを用いたグラフニューラルネットワーク加速器

GHOST: A Graph Neural Network Accelerator using Silicon Photonics ( http://arxiv.org/abs/2307.01782v1 )

ライセンス: Link先を確認

Salma Afifi, Febin Sunny, Amin Shafiee, Mahdi Nikdast, Sudeep Pasricha

(参考訳) グラフニューラルネットワーク(GNN)は、グラフ構造化データのモデリングと学習を行うための強力なアプローチとして登場した。以来、レコメンデーションシステム、ソーシャルネットワーク分析、創薬、ロボット工学など、複数の分野がGNNの能力から大きな恩恵を受けている。しかし、GNNは計算とメモリの要求が大きいため、GNNの高速化と効率的な処理には、従来のニューラルネットワークアクセラレータを超える独自のアプローチが必要である。CMOSプラットフォームのスケーリングの鈍化も、代替実装基板の探索を動機付けている。本稿では、GNNのための最初のシリコンフォトニックハードウェアアクセラレータであるGHOSTを提示する。GHOSTは、頂点中心とエッジ中心の両方の操作に関連するコストを効率的に軽減する。GNNの実行に関わる3つの主要なステージを光学領域で個別に実装しており、グラフ畳み込みネットワークやグラフアテンションネットワークなど、広く使われているさまざまなGNNモデルやアーキテクチャの推論に使用できる。我々のシミュレーション研究は、GHOSTがGPU、TPU、CPUおよび複数の最先端GNNハードウェアアクセラレータと比較して、少なくとも10.2倍のスループットと3.8倍のエネルギー効率を示すことを示している。

Graph neural networks (GNNs) have emerged as a powerful approach for modelling and learning from graph-structured data. Multiple fields have since benefitted enormously from the capabilities of GNNs, such as recommendation systems, social network analysis, drug discovery, and robotics. However, accelerating and efficiently processing GNNs require a unique approach that goes beyond conventional artificial neural network accelerators, due to the substantial computational and memory requirements of GNNs. The slowdown of scaling in CMOS platforms also motivates a search for alternative implementation substrates. In this paper, we present GHOST, the first silicon-photonic hardware accelerator for GNNs. GHOST efficiently alleviates the costs associated with both vertex-centric and edge-centric operations. It implements separately the three main stages involved in running GNNs in the optical domain, allowing it to be used for the inference of various widely used GNN models and architectures, such as graph convolution networks and graph attention networks. Our simulation studies indicate that GHOST exhibits at least 10.2x better throughput and 3.8x better energy efficiency when compared to GPU, TPU, CPU and multiple state-of-the-art GNN hardware accelerators.

翻訳日:2023-07-06 16:40:29 公開日:2023-07-04

# FedHIL: モバイルデバイスを用いたロバストな屋内ローカライゼーションのための不均一性に強いフェデレーション学習

FedHIL: Heterogeneity Resilient Federated Learning for Robust Indoor Localization with Mobile Devices ( http://arxiv.org/abs/2307.01780v1 )

ライセンス: Link先を確認

Danish Gufran, Sudeep Pasricha

(参考訳) 屋内ローカライゼーションは、緊急対応、倉庫管理、拡張現実体験などのアプリケーションにおいて重要な役割を果たす。機械学習(ML)ベースの屋内ローカライゼーションフレームワークをモバイルデバイスにデプロイすることで、ユーザはさまざまな屋内および地下環境で自身の位置を特定できる。しかし、モバイルデバイスのハードウェアやソフトウェアスタックの不均一性により、位置推定が一貫せず不正確になる可能性があるため、正確な屋内ローカライゼーションの実現は困難である。従来のMLモデルは初期トレーニングデータにも大きく依存しているため、屋内環境の動的な変化による性能低下に対して脆弱である。デバイスの不均一性と適応性の欠如による課題に対処するため、FedHILと呼ばれる新しい組み込みMLフレームワークを提案する。本フレームワークは、屋内ローカライゼーションとフェデレーション学習(FL)を組み合わせて、デバイスが不均一な環境における屋内ローカライゼーションの精度を向上させるとともに、ユーザデータのプライバシも保護する。FedHILは、極めてノイズの多いデータが存在する場合でも、FL中の屋内ローカライゼーション用MLモデルの性能を維持するために、ドメイン固有の選択的な重み調整アプローチを統合する。実世界の多様な屋内環境と異種モバイルデバイスを用いた実験評価により、FedHILが最先端のFLおよび非FLの屋内ローカライゼーションフレームワークを上回ることを示した。FedHILは、先行研究で最も性能の高いFLベースの屋内ローカライゼーションフレームワークと比べて、平均して1.62倍優れたローカライゼーション精度を達成できる。

Indoor localization plays a vital role in applications such as emergency response, warehouse management, and augmented reality experiences. By deploying machine learning (ML) based indoor localization frameworks on their mobile devices, users can localize themselves in a variety of indoor and subterranean environments. However, achieving accurate indoor localization can be challenging due to heterogeneity in the hardware and software stacks of mobile devices, which can result in inconsistent and inaccurate location estimates. Traditional ML models also heavily rely on initial training data, making them vulnerable to degradation in performance with dynamic changes across indoor environments. To address the challenges due to device heterogeneity and lack of adaptivity, we propose a novel embedded ML framework called FedHIL. Our framework combines indoor localization and federated learning (FL) to improve indoor localization accuracy in device-heterogeneous environments while also preserving user data privacy. FedHIL integrates a domain-specific selective weight adjustment approach to preserve the ML model's performance for indoor localization during FL, even in the presence of extremely noisy data. Experimental evaluations in diverse real-world indoor environments and with heterogeneous mobile devices show that FedHIL outperforms state-of-the-art FL and non-FL indoor localization frameworks. FedHIL is able to achieve 1.62x better localization accuracy on average than the best performing FL-based indoor localization framework from prior work.

翻訳日:2023-07-06 16:40:11 公開日:2023-07-04

# 3次元モデリングにより人物検出器を回避する、物理的に実現可能で自然な外観の衣服テクスチャ

Physically Realizable Natural-Looking Clothing Textures Evade Person Detectors via 3D Modeling ( http://arxiv.org/abs/2307.01778v1 )

ライセンス: Link先を確認

Zhanhao Hu, Wenda Chu, Xiaopei Zhu, Hui Zhang, Bo Zhang, Xiaolin Hu

(参考訳) 近年の研究では、人物検出器を回避するための敵対的な衣服を作る方法が提案されているが、それらは限られた視野角でしか有効でないか、人間にとって非常に目立つかのいずれかである。我々は、3Dプリントされた亀などの剛体の敵対的物体の作成に用いられてきた3Dモデリングに基づいて、衣服の敵対的テクスチャを作成することを目指す。剛体とは異なり、人間と衣服は非剛体であり、物理的な実現が困難になる。複数の視野角で人物検出器を回避できる自然な外観の敵対的衣服を作るために、日常の衣服の典型的なテクスチャの一種である迷彩テクスチャに似た、敵対的迷彩テクスチャ(AdvCaT)を提案する。我々はボロノイ図とGumbel-softmaxトリックを利用して迷彩テクスチャをパラメータ化し、3Dモデリングによりパラメータを最適化する。さらに、デジタルオブジェクトと実世界のオブジェクトのギャップを狭めるために、位相的に妥当な射影(TopoProj)と薄板スプライン(TPS)を組み合わせた、3Dメッシュ上の効率的な拡張パイプラインを提案する。開発した3Dテクスチャを布素材に印刷し、Tシャツとズボンに仕立てた。実験では、これらの衣服が複数の検出器に対して高い攻撃成功率を示した。

Recent works have proposed to craft adversarial clothes for evading person detectors, while they are either only effective at limited viewing angles or very conspicuous to humans. We aim to craft adversarial texture for clothes based on 3D modeling, an idea that has been used to craft rigid adversarial objects such as a 3D-printed turtle. Unlike rigid objects, humans and clothes are non-rigid, leading to difficulties in physical realization. In order to craft natural-looking adversarial clothes that can evade person detectors at multiple viewing angles, we propose adversarial camouflage textures (AdvCaT) that resemble one kind of the typical textures of daily clothes, camouflage textures. We leverage the Voronoi diagram and Gumbel-softmax trick to parameterize the camouflage textures and optimize the parameters via 3D modeling. Moreover, we propose an efficient augmentation pipeline on 3D meshes combining topologically plausible projection (TopoProj) and Thin Plate Spline (TPS) to narrow the gap between digital and real-world objects. We printed the developed 3D texture pieces on fabric materials and tailored them into T-shirts and trousers. Experiments show high attack success rates of these clothes against multiple detectors.
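The Gumbel-softmax trick named in this abstract is a standard way to draw a differentiable, "soft" one-hot sample from a categorical distribution. The sketch below shows only the sampling step in plain Python, for instance to pick one colour from a small palette; how AdvCaT combines it with Voronoi diagrams is not reproduced here, and the logits, temperature, and function name are illustrative assumptions.

```python
import math
import random


def gumbel_softmax(logits, temperature, rng):
    """Draw one relaxed one-hot sample from the categorical given by logits."""
    noisy = []
    for logit in logits:
        # Clamp away from 0 and 1 so both logarithms below are defined.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # standard Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / temperature)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical palette of three colours with unnormalised preferences.
rng = random.Random(0)
weights = gumbel_softmax([2.0, 0.5, -1.0], temperature=0.5, rng=rng)
print(weights)  # three non-negative weights that sum to 1
```

Lower temperatures push the weights closer to a hard one-hot choice, while higher temperatures give smoother weights that are easier to optimise with gradients.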

翻訳日:2023-07-06 16:39:45 公開日:2023-07-04

# Shapley Sets: 再帰的関数分解による特徴帰属

Shapley Sets: Feature Attribution via Recursive Function Decomposition ( http://arxiv.org/abs/2307.01777v1 )

ライセンス: Link先を確認

Torty Sivill and Peter Flach

(参考訳) 広く使われているにもかかわらず、Shapley値による特徴帰属は、モデルとデータの両方における特徴間の相互作用のために誤解を招く可能性がある。我々は、特徴の集合に価値を割り当てる代替の帰属アプローチであるShapley Setsを提案する。Shapley Setsは、変数の数に対して対数線形の計算量を持つ再帰的関数分解アルゴリズムを用いて、基礎となるモデルを分離不能な変数群に分解する。Shapley Setsは、分離不能な各変数群に対して、特定の予測に対するその群全体としての値を帰属させる。Shapley Setsは変換後の特徴集合上のShapley値と等価であり、したがって同じ公平性の公理の恩恵を受けることを示す。Shapley Setsは価値関数に依存せず、Shapley値に基づく代替手法に伴う落とし穴を回避し、複雑な依存構造を持つデータ型に対して特に有利であることを、理論的および実験的に示す。

Despite their ubiquitous use, Shapley value feature attributions can be misleading due to feature interaction in both model and data. We propose an alternative attribution approach, Shapley Sets, which awards value to sets of features. Shapley Sets decomposes the underlying model into non-separable variable groups using a recursive function decomposition algorithm with log-linear complexity in the number of variables. Shapley Sets attributes to each non-separable variable group its combined value for a particular prediction. We show that Shapley Sets is equivalent to the Shapley value over the transformed feature set and thus benefits from the same axioms of fairness. Shapley Sets is value-function agnostic, and we show theoretically and experimentally how Shapley Sets avoids pitfalls associated with Shapley value based alternatives and is particularly advantageous for data types with complex dependency structure.
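The stated equivalence, attribution to sets being the Shapley value over a transformed feature set, can be seen on a toy model. The sketch below computes exact Shapley values by brute force, first per feature and then with the interacting pair treated as one player. The model `x0 * x1 + x2`, the zero baseline, and the hand-chosen grouping are assumptions for illustration; the paper's recursive decomposition algorithm that finds the groups is not implemented here.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for player in order:
            with_player = coalition | {player}
            totals[player] += value(with_player) - value(coalition)
            coalition = with_player
    return {p: t / len(orderings) for p, t in totals.items()}


def model(x):
    # Features 0 and 1 interact (non-separable); feature 2 is additive.
    return x[0] * x[1] + x[2]


INSTANCE = (2.0, 3.0, 5.0)
BASELINE = (0.0, 0.0, 0.0)


def feature_value(coalition):
    """Model output when only the features in `coalition` take their real values."""
    masked = [INSTANCE[i] if i in coalition else BASELINE[i] for i in range(3)]
    return model(masked)


def group_value(groups):
    """Same value function, but the players are sets of features."""
    return feature_value(frozenset().union(*groups))


per_feature = shapley_values([0, 1, 2], feature_value)
per_group = shapley_values([frozenset({0, 1}), frozenset({2})], group_value)

print(per_feature)  # the interaction term 6.0 is split evenly: 3.0 and 3.0
print(per_group)    # the pair {0, 1} receives the full 6.0 as one unit
```

The per-feature split of the product term depends entirely on the symmetry of the baseline, which is one way interactions can make individual attributions hard to interpret; the grouped attribution avoids having to split it at all.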

翻訳日:2023-07-06 16:39:22 公開日:2023-07-04

# スライスワッサーシュタイン一般化測地学による高速最適輸送

Fast Optimal Transport through Sliced Wasserstein Generalized Geodesics ( http://arxiv.org/abs/2307.01770v1 )

ライセンス: Link先を確認

Guillaume Mahey, Laetitia Chapel, Gilles Gasso, Clément Bonet, Nicolas Courty

(参考訳) ワッサースタイン距離(wasserstein distance, wd)と関連する最適輸送計画は、確率測度が懸かっている多くの応用において有用であることが証明されている。本稿では,2つの入力分布の最適1次元投影により誘導される輸送マップに基づく,2乗WDの新たなプロキシであるmin-SWGGを提案する。 min-swgg と wasserstein の一般化測地学との接続を描き、ピボット測度を直線上で支持する。特に、ライン上でサポートされている分布の1つの場合において、正確なワッサースタイン距離に対する新しい閉形式を提供し、勾配降下最適化に適応可能な高速計算スキームを導出する。 min-SWGG は WD の上限であり,Sliced-Wasserstein と同様の複雑性を有し,関連する輸送計画を提供するという付加的な特徴を有することを示す。また、距離性、弱収束、計算および位相的性質などの理論的性質についても検討する。実験的な証拠は、勾配流、形状マッチング、画像の着色など、様々な文脈におけるmin-SWGGの利点を支持する。

Wasserstein distance (WD) and the associated optimal transport plan have proven useful in many applications where probability measures are at stake. In this paper, we propose a new proxy of the squared WD, coined min-SWGG, that is based on the transport map induced by an optimal one-dimensional projection of the two input distributions. We draw connections between min-SWGG and Wasserstein generalized geodesics in which the pivot measure is supported on a line. We notably provide a new closed form for the exact Wasserstein distance in the particular case where one of the distributions is supported on a line, allowing us to derive a fast computational scheme that is amenable to gradient descent optimization. We show that min-SWGG is an upper bound of WD and that it has a complexity similar to that of Sliced-Wasserstein, with the additional feature of providing an associated transport plan. We also investigate some theoretical properties such as metricity, weak convergence, computational and topological properties. Empirical evidence supports the benefits of min-SWGG in various contexts, including gradient flows, shape matching and image colorization.
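The idea of a transport map induced by a one-dimensional projection can be sketched for two small, equally sized 2D point clouds with uniform weights: project both onto a direction, sort, match points rank by rank, and measure the cost of that matching in the original space. Because any such matching is a valid transport plan, its cost can never fall below the exact squared Wasserstein distance. The grid search over directions, the point sets, and all function names below are illustrative assumptions, not the paper's optimisation scheme or its generalized-geodesic formulation.

```python
import math
from itertools import permutations


def sorted_indices(points, direction):
    """Indices of `points` ordered by their projection onto `direction`."""
    return sorted(
        range(len(points)),
        key=lambda i: points[i][0] * direction[0] + points[i][1] * direction[1],
    )


def squared_cost(source, target, order_s, order_t):
    """Mean squared distance when source[order_s[k]] is sent to target[order_t[k]]."""
    pairs = zip(order_s, order_t)
    return sum(
        (source[i][0] - target[j][0]) ** 2 + (source[i][1] - target[j][1]) ** 2
        for i, j in pairs
    ) / len(source)


def min_projection_cost(source, target, n_directions=180):
    """Smallest cost among the matchings induced by a grid of 1D projections."""
    best = math.inf
    for k in range(n_directions):
        angle = math.pi * k / n_directions
        direction = (math.cos(angle), math.sin(angle))
        cost = squared_cost(
            source,
            target,
            sorted_indices(source, direction),
            sorted_indices(target, direction),
        )
        best = min(best, cost)
    return best


def exact_squared_wasserstein(source, target):
    """Brute-force optimum over all matchings; only feasible for tiny inputs."""
    n = len(source)
    return min(
        squared_cost(source, target, range(n), perm) for perm in permutations(range(n))
    )


# Hypothetical point clouds.
SOURCE = [(0.0, 0.0), (1.0, 0.5), (2.0, 3.0), (0.5, 2.0)]
TARGET = [(3.0, 1.0), (2.5, 2.5), (4.0, 0.0), (5.0, 3.5)]

upper = min_projection_cost(SOURCE, TARGET)
exact = exact_squared_wasserstein(SOURCE, TARGET)
print(upper, exact)  # the projection-based cost is never below the exact value
```

Each direction costs only two sorts, which is where the Sliced-Wasserstein-like complexity comes from, and the matching that attains the minimum doubles as an explicit transport plan.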

翻訳日:2023-07-06 16:39:07 公開日:2023-07-04

# データ中心型MLの前提条件としてのローカライズドデータワーク:ガーナにおけるフルライフサイクル作物病の特定を事例として

Localized Data Work as a Precondition for Data-Centric ML: A Case Study of Full Lifecycle Crop Disease Identification in Ghana ( http://arxiv.org/abs/2307.01767v1 )

ライセンス: Link先を確認

Darlington Akogo, Issah Samori, Cyril Akafia, Harriet Fiagbor, Andrews Kangah, Donald Kwame Asiedu, Kwabena Fuachie, Luis Oala

(参考訳) Ghana Cashew Disease Identification with Artificial Intelligence (CADI AI)プロジェクトは、農業生産性や食料安全保障といった公益的な課題に対して、有用で地域に根ざしたデータ中心のソリューションを提供するための前提条件として、健全なデータワークが重要であることを実証している。ドローンで収集したデータと機械学習を用いて、作物のストレス要因を特定する。データ、モデル、最終的なアプリは共同で開発され、デスクトップアプリケーションを通じて地元の農家に提供される。

The Ghana Cashew Disease Identification with Artificial Intelligence (CADI AI) project demonstrates the importance of sound data work as a precondition for the delivery of useful, localized data-centric solutions for public-good tasks such as agricultural productivity and food security. Drone-collected data and machine learning are utilized to determine crop stressors. Data, model and the final app are developed jointly and made available to local farmers via a desktop application.

翻訳日:2023-07-06 16:38:46 公開日:2023-07-04

# 限定アノテートデータに対する知識認識型オーディオグラウンド生成スロットフィリング

Knowledge-Aware Audio-Grounded Generative Slot Filling for Limited Annotated Data ( http://arxiv.org/abs/2307.01764v1 )

ライセンス: Link先を確認

Guangzhi Sun, Chao Zhang, Ivan Vulić, Paweł Budzianowski, Philip C. Woodland

(参考訳) タスク指向対話(ToD)システムのための細粒度のスロット値ラベルを手動でアノテーションするのは、高価で時間がかかる。これが、限られた量のラベル付きデータで動作するスロットフィリング手法の研究の動機となっている。さらに、ToDに関する現在の研究の大部分は、入力モダリティとしてテキストのみに基づいており、音声言語を扱う際の不完全な自動音声認識(ASR)という追加の課題を無視している。本研究では、音声入力によるToDの少数ショットおよびゼロショットのスロットフィリングに着目した、知識認識型・音声接地型の生成的スロットフィリングフレームワークKA2Gを提案する。KA2Gは、1) スロットフィリングをテキスト生成タスクとして定式化し、2) テキスト生成を音声モダリティにも接地させ、3) 利用可能な外部知識(スロット値候補の事前定義リストなど)で条件付けることにより、音声ベースのToDにおけるロバストかつデータ効率の良いスロットフィリングを実現する。KA2Gフレームワーク内で両方のモダリティを組み合わせることで、ASRエラーに対する堅牢性が向上することを示す。さらに、ポインタ生成機構によって実装されたKA2Gの知識認識型スロット値生成器は、特に少数ショット学習とゼロショット学習に有益である。標準的な音声ベースのシングルターンデータセットSLURPと、商用ToDシステムから抽出したマルチターンデータセットを用いた実験では、特に少数ショットおよびゼロショットの設定において、先行研究に対して強力で一貫した改善が示された。

Manually annotating fine-grained slot-value labels for task-oriented dialogue (ToD) systems is an expensive and time-consuming endeavour. This motivates research into slot-filling methods that operate with limited amounts of labelled data. Moreover, the majority of current work on ToD is based solely on text as the input modality, neglecting the additional challenges of imperfect automatic speech recognition (ASR) when working with spoken language. In this work, we propose a Knowledge-Aware Audio-Grounded generative slot-filling framework, termed KA2G, that focuses on few-shot and zero-shot slot filling for ToD with speech input. KA2G achieves robust and data-efficient slot filling for speech-based ToD by 1) framing it as a text generation task, 2) grounding text generation additionally in the audio modality, and 3) conditioning on available external knowledge (e.g. a predefined list of possible slot values). We show that combining both modalities within the KA2G framework improves the robustness against ASR errors. Further, the knowledge-aware slot-value generator in KA2G, implemented via a pointer generator mechanism, particularly benefits few-shot and zero-shot learning. Experiments, conducted on the standard speech-based single-turn SLURP dataset and a multi-turn dataset extracted from a commercial ToD system, display strong and consistent gains over prior work, especially in few-shot and zero-shot setups.

翻訳日:2023-07-06 16:38:38 公開日:2023-07-04

# 説明可能な行動不確かさを伴う人間の軌道予測

Human Trajectory Forecasting with Explainable Behavioral Uncertainty ( http://arxiv.org/abs/2307.01817v1 )

ライセンス: Link先を確認

Jiangbei Yue, Dinesh Manocha and He Wang

(参考訳) 人間の軌道予測は、人間の行動を理解し予測するのに役立ち、社会ロボットから自動運転車までの応用を可能にするため、盛んに研究されてきた。既存の手法の多くは、モデルフリー手法とモデルベース手法に分けることができる。モデルフリー手法は予測精度が優れているが説明可能性に欠ける一方、モデルベース手法は説明可能性を提供するが、予測精度が低い。両方の方法論を組み合わせて、行動SDEモデルとベイズニューラルネットワーク(BNN)を結合した、新しいベイズ型ニューラル確率微分方程式モデルBNSP-SFMを提案する。NNが優れた予測力を提供する一方で、SDEは行動と観測における定量化可能な不確実性を伴う強い説明可能性を提供する。BNSP-SFMは、11種類の最先端手法と比較して、予測精度が最大50%向上することを示す。BNSP-SFMはまた、環境や群衆密度(テストデータより約20倍高い)が大きく異なるシーンに対しても、よりよく一般化する。最後に、BNSP-SFMは信頼度付きの予測を提供でき、行動の潜在的な原因をよりよく説明できる。コードは採択後に公開される予定である。

Human trajectory forecasting helps to understand and predict human behaviors, enabling applications from social robots to self-driving cars, and therefore has been heavily investigated. Most existing methods can be divided into model-free and model-based methods. Model-free methods offer superior prediction accuracy but lack explainability, while model-based methods provide explainability but cannot predict well. Combining both methodologies, we propose a new Bayesian Neural Stochastic Differential Equation model BNSP-SFM, where a behavior SDE model is combined with Bayesian neural networks (BNNs). While the NNs provide superior predictive power, the SDE offers strong explainability with quantifiable uncertainty in behavior and observation. We show that BNSP-SFM achieves up to a 50% improvement in prediction accuracy, compared with 11 state-of-the-art methods. BNSP-SFM also generalizes better to drastically different scenes with different environments and crowd densities (~ 20 times higher than the testing data). Finally, BNSP-SFM can provide predictions with confidence to better explain potential causes of behaviors. The code will be released upon acceptance.

翻訳日:2023-07-06 16:32:52 公開日:2023-07-04

# 複素重みをもつ複素ネットワークにおける構造バランスとランダムウォーク

Structural Balance and Random Walks on Complex Networks with Complex Weights ( http://arxiv.org/abs/2307.01813v1 )

ライセンス: Link先を確認

Yu Tian, Renaud Lambiotte

(参考訳) 複素数は、多くの状況において実体間の関係を定義する。典型的な例は、量子物理学におけるハミルトニアン行列の非対角項である。近年、エッジの重みが複素数である場合にネットワーク科学のツールを拡張することへの関心が高まっている。ここでは、多くの応用において妥当な仮定である、重み行列がエルミート行列の場合に注目し、複素重み付きネットワークの構造的および動的特性の両方を調べる。符号付きグラフの概念に基づいて、構造的バランスの概念による複素重み付きネットワークの分類を導入し、各タイプの中で共有されるスペクトル特性を示す。次に、この結果を複素重み付きネットワーク上のランダムウォークのダイナミクスの特徴付けに適用する。グラフが構造的にバランスしている場合は局所的なコンセンサスが漸近的に達成され、厳密に非バランスである場合は大域的なコンセンサスが得られる。最後に、カットの概念を一般化することで本研究の知見の潜在的な応用を探り、関連するスペクトルクラスタリングアルゴリズムを提案する。また、有向ネットワークを複素重み付きネットワークに対応付ける磁気ラプラシアンのさらなる特性も示す。アルゴリズムの性能は、合成ネットワークと実ネットワークの両方で検証される。

Complex numbers define the relationship between entities in many situations. A canonical example would be the off-diagonal terms in a Hamiltonian matrix in quantum physics. Recent years have seen an increasing interest in extending the tools of network science to the case where the edge weights are complex numbers. Here, we focus on the case when the weight matrix is Hermitian, a reasonable assumption in many applications, and investigate both structural and dynamical properties of the complex-weighted networks. Building on concepts from signed graphs, we introduce a classification of complex-weighted networks based on the notion of structural balance, and illustrate the shared spectral properties within each type. We then apply the results to characterise the dynamics of random walks on complex-weighted networks, where local consensus can be achieved asymptotically when the graph is structurally balanced, while global consensus will be obtained when it is strictly unbalanced. Finally, we explore potential applications of our findings by generalising the notion of cut, and propose an associated spectral clustering algorithm. We also provide further characteristics of the magnetic Laplacian, associating directed networks to complex-weighted ones. The performance of the algorithm is verified on both synthetic and real networks.
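For signed graphs, structural balance means that the product of edge signs around every cycle is positive. A natural extension to Hermitian complex weights, assumed here as a reading of the abstract rather than quoted from the paper, is that the phases of the weights around every cycle sum to zero modulo 2π. The sketch below checks this by trying to assign each node a phase "potential" that explains every edge phase; the edge dictionaries and the function name are hypothetical.

```python
import cmath
import math
from collections import deque


def is_structurally_balanced(edges, tolerance=1e-9):
    """Check that edge phases sum to 0 (mod 2*pi) around every cycle.

    `edges` maps (i, j) to a nonzero complex weight W_ij, with each undirected
    edge listed once; the reverse direction is taken as the conjugate (Hermitian).
    """
    adjacency = {}
    for (i, j), weight in edges.items():
        phase = cmath.phase(weight)
        adjacency.setdefault(i, []).append((j, phase))
        adjacency.setdefault(j, []).append((i, -phase))

    potential = {}
    for root in adjacency:
        if root in potential:
            continue
        potential[root] = 0.0
        queue = deque([root])
        while queue:
            node = queue.popleft()
            for neighbour, phase in adjacency[node]:
                expected = potential[node] + phase
                if neighbour not in potential:
                    potential[neighbour] = expected
                    queue.append(neighbour)
                # Compare on the unit circle so that 2*pi counts as 0.
                elif abs(cmath.exp(1j * (expected - potential[neighbour])) - 1) > tolerance:
                    return False
    return True


third_turn = cmath.exp(2j * math.pi / 3)
balanced_triangle = {(0, 1): third_turn, (1, 2): third_turn, (2, 0): third_turn}
unbalanced_triangle = {(0, 1): 1 + 0j, (1, 2): 1 + 0j, (2, 0): 1j}

print(is_structurally_balanced(balanced_triangle))    # cycle phase is 2*pi
print(is_structurally_balanced(unbalanced_triangle))  # cycle phase is pi/2
```

With weights restricted to +1 and -1 the same check reduces to the familiar signed-graph rule that a balanced cycle contains an even number of negative edges.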

翻訳日:2023-07-06 16:32:35 公開日:2023-07-04

# 3次元時間検出のための学習意義誘導情報

SUIT: Learning Significance-guided Information for 3D Temporal Detection ( http://arxiv.org/abs/2307.01807v1 )

ライセンス: Link先を確認

Zheyuan Zhou, Jiachen Lu, Yihan Zeng, Hang Xu, Li Zhang

(参考訳) LiDAR点群からの3次元物体検出は、自動運転とロボット工学にとって極めて重要である。時系列の点群は時間的情報を通じて3次元知覚を高める可能性があるが、これらの時間的特徴を効果的かつ効率的に活用することは依然として難しい問題である。前景情報はLiDARシーン内に疎に分布しているという観察に基づき、十分な知識は密なマップではなく疎な形式で提供できると考える。そこで本研究では、時間情報をフレーム間の情報融合のための疎な特徴として単純化する、3次元時間検出のための意義誘導情報の学習(SUIT)を提案する。具体的には、まず、予測された物体の重心に基づいて、情報に富みながらも疎な特徴を抽出する意義サンプリング機構を導入する。そのうえで、フレーム間の疎な特徴の間で物体中心の変換を学習する、明示的な幾何学的変換学習手法を提示する。大規模なnuScenesおよびWaymoデータセットで本手法を評価したところ、SUITは時間融合のメモリと計算コストを大幅に削減するだけでなく、最先端のベースラインを上回る性能を示した。

3D object detection from LiDAR point clouds is of critical importance for autonomous driving and robotics. While sequential point clouds have the potential to enhance 3D perception through temporal information, utilizing these temporal features effectively and efficiently remains a challenging problem. Based on the observation that the foreground information is sparsely distributed in LiDAR scenes, we believe sufficient knowledge can be provided by sparse format rather than dense maps. To this end, we propose to learn Significance-gUided Information for 3D Temporal detection (SUIT), which simplifies temporal information as sparse features for information fusion across frames. Specifically, we first introduce a significant sampling mechanism that extracts information-rich yet sparse features based on predicted object centroids. On top of that, we present an explicit geometric transformation learning technique, which learns the object-centric transformations among sparse features across frames. We evaluate our method on the large-scale nuScenes and Waymo datasets, where our SUIT not only significantly reduces the memory and computation cost of temporal fusion, but also performs well over the state-of-the-art baselines.

翻訳日:2023-07-06 16:32:15 公開日:2023-07-04

# DeepFlorist: ディープニューラルネットワークとアンサンブルラーニングをオブジェクト分類のためのメタ分類器として考える

DeepFlorist: Rethinking Deep Neural Networks and Ensemble Learning as A Meta-Classifier For Object Classification ( http://arxiv.org/abs/2307.01806v1 )

ライセンス: Link先を確認

Afshin Khadangi

(参考訳) 本稿では、アンサンブル学習をメタ分類器として用いる、花の分類のための新しい学習パラダイム「DeepFlorist」を提案する。DeepFloristは、深層学習の能力とアンサンブル手法の堅牢性を組み合わせて、正確で信頼性の高い花の分類結果を達成する。提案するネットワークアーキテクチャは、密な畳み込みニューラルネットワーク(DCNN)と畳み込みニューラルネットワーク(CNN)を組み合わせて花の画像から高レベルの特徴を抽出し、その後に全結合層で分類を行う。DeepFloristの性能と汎化能力を高めるために、複数の多様なモデルを組み込んで分類精度を向上させるアンサンブル学習手法を採用する。ベンチマークの花データセットでの実験結果は、DeepFloristの有効性を示しており、精度と堅牢性の観点から最先端の手法を上回っている。提案フレームワークは、実世界の応用における自動花認識システムに対して大きな可能性を持ち、植物分類学、保全活動、生態学研究の進展を可能にする。

In this paper, we propose a novel learning paradigm called "DeepFlorist" for flower classification using ensemble learning as a meta-classifier. DeepFlorist combines the power of deep learning with the robustness of ensemble methods to achieve accurate and reliable flower classification results. The proposed network architecture leverages a combination of dense convolutional and convolutional neural networks (DCNNs and CNNs) to extract high-level features from flower images, followed by a fully connected layer for classification. To enhance the performance and generalization of DeepFlorist, an ensemble learning approach is employed, incorporating multiple diverse models to improve the classification accuracy. Experimental results on benchmark flower datasets demonstrate the effectiveness of DeepFlorist, outperforming state-of-the-art methods in terms of accuracy and robustness. The proposed framework holds significant potential for automated flower recognition systems in real-world applications, enabling advancements in plant taxonomy, conservation efforts, and ecological studies.

翻訳日:2023-07-06 16:31:55 公開日:2023-07-04

# フーリエニューラル演算子による添加製造中の局所温度変化の捕捉

Capturing Local Temperature Evolution during Additive Manufacturing through Fourier Neural Operators ( http://arxiv.org/abs/2307.01804v1 )

ライセンス: Link先を確認

Jiangce Chen, Wenzhuo Xu, Martha Baldwin, Björn Nijhuis, Ton van den Boogaard, Noelia Grande Gutiérrez, Sneha Prabha Narra, Christopher McComb

(参考訳) 付加製造(AM)中の熱挙動を迅速にシミュレートできる高忠実度のデータ駆動モデルは、部品設計、プロセス計画、モニタリング、制御など、複数の領域でAM技術の性能を向上させるために不可欠である。しかし、部品形状の複雑さにより、現在のモデルが幅広い形状にわたって高い精度を維持することは困難である。さらに、多くのモデルは領域全体(部品)にわたる低い平均二乗誤差(MSE)を報告している。しかし、各時間ステップにおいて、領域の大部分は、直近の堆積箇所付近の熱影響部を除いて、大きな温度変化を経験しない。したがって、MSEに基づくモデルの忠実度の評価は過大である可能性がある。本稿では、フーリエニューラル演算子を用いて付加製造プロセス中の局所的な温度変化を捉えるデータ駆動モデルを提示する。さらに、平均温度を予測として用いた場合と比較したモデル性能の相対的な尺度となる$R^2$指標を用いて、モデルを評価することを提案する。本モデルは、指向性エネルギー堆積プロセスに対する不連続ガレルキン有限要素法に基づく数値シミュレーションでテストされ、$R^2$で測定して高い忠実度を達成し、訓練に含まれていない形状への一般化性を維持することが示された。

High-fidelity, data-driven models that can quickly simulate thermal behavior during additive manufacturing (AM) are crucial for improving the performance of AM technologies in multiple areas, such as part design, process planning, monitoring, and control. However, the complexities of part geometries make it challenging for current models to maintain high accuracy across a wide range of geometries. Additionally, many models report a low mean square error (MSE) across the entire domain (part). However, in each time step, most areas of the domain do not experience significant changes in temperature, except for the heat-affected zones near recent depositions. Therefore, the MSE-based fidelity measurement of the models may be overestimated. This paper presents a data-driven model that uses Fourier Neural Operator to capture the local temperature evolution during the additive manufacturing process. In addition, the authors propose to evaluate the model using the $R^2$ metric, which provides a relative measure of the model's performance compared to using mean temperature as a prediction. The model was tested on numerical simulations based on the Discontinuous Galerkin Finite Element Method for the Direct Energy Deposition process, and the results demonstrate that the model achieves high fidelity as measured by $R^2$ and maintains generalizability to geometries that were not included in the training process.
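The role of $R^2$ as a measure relative to "predicting the mean" can be made concrete with a short sketch. It uses the usual coefficient of determination, one minus the ratio of the residual sum of squares to the total sum of squares, which is an assumption about the exact form the authors use; the temperature values below are invented purely for illustration.

```python
def mean_squared_error(truth, prediction):
    """Average squared difference between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(truth, prediction)) / len(truth)


def r_squared(truth, prediction):
    """1 - SS_res / SS_tot: 0 for a mean-only predictor, 1 for a perfect one."""
    mean = sum(truth) / len(truth)
    residual = sum((t - p) ** 2 for t, p in zip(truth, prediction))
    total = sum((t - mean) ** 2 for t in truth)
    return 1.0 - residual / total


# Hypothetical temperature field: eight cool points and a two-point hot zone.
TRUTH = [300.0] * 8 + [900.0, 1500.0]
MEAN_ONLY = [sum(TRUTH) / len(TRUTH)] * len(TRUTH)  # always predicts the mean
MISSES_HOT_ZONE = [300.0] * 8 + [600.0, 900.0]       # right only where nothing happens

print(r_squared(TRUTH, MEAN_ONLY))        # the mean-only baseline scores 0
print(r_squared(TRUTH, TRUTH))            # a perfect prediction scores 1
print(mean_squared_error(TRUTH, MISSES_HOT_ZONE), r_squared(TRUTH, MISSES_HOT_ZONE))
```

Because $R^2$ is normalised by the variance of the true field, a model only scores well if it explains the parts of the domain where temperature actually varies, which is the point the abstract makes about heat-affected zones.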

翻訳日:2023-07-06 16:31:36 公開日:2023-07-04

# 安定化器分解による三角形ZXダイアグラムの高速収縮

Speedy Contraction of ZX Diagrams with Triangles via Stabiliser Decompositions ( http://arxiv.org/abs/2307.01803v1 )

ライセンス: Link先を確認

Mark Koch, Richie Yeung, Quanlong Wang

(参考訳) Clifford+T回路の古典的シミュレーションにおける最近の進歩は、ZX計算を用いて、マジック状態をスタビライザー項へと反復的に分解し単純化する。我々は、三角形演算を含むZXダイアグラムのスタビライザー分解を調べることで、この方法を改善する。この手法は、三角形を用いて自然に表現できる多重制御ゲートを含む量子回路のシミュレーションを大幅に高速化することを示す。提案手法をQuiZXライブラリに実装し、ランダム回路と、これまで使用されてきたベンチマーク回路の変種に対して、大幅なシミュレーション高速化(最大で数桁)を実証する。さらに、本ソフトウェアを用いてパラメータ化量子回路の勾配分散を表すダイアグラムを縮約し、量子機械学習に使用されるアンザッツ(ansätze)における不毛な台地(barren plateau)現象を数値的に自動検出するツールを得る。従来の統計的アプローチと比較すると、本手法は勾配分散の厳密な値を与え、1つのダイアグラムを縮約するだけでよい。このツールの性能は、quimbライブラリに対するベンチマークで示されるように、テンソルネットワークのアプローチと競合する。

Recent advances in classical simulation of Clifford+T circuits make use of the ZX calculus to iteratively decompose and simplify magic states into stabiliser terms. We improve on this method by studying stabiliser decompositions of ZX diagrams involving the triangle operation. We show that this technique greatly speeds up the simulation of quantum circuits involving multi-controlled gates which can be naturally represented using triangles. We implement our approach in the QuiZX library and demonstrate a significant simulation speed-up (up to multiple orders of magnitude) for random circuits and a variation of previously used benchmarking circuits. Furthermore, we use our software to contract diagrams representing the gradient variance of parametrised quantum circuits, which yields a tool for the automatic numerical detection of the barren plateau phenomenon in ansätze used for quantum machine learning. Compared to traditional statistical approaches, our method yields exact values for gradient variances and only requires contracting a single diagram. The performance of this tool is competitive with tensor network approaches, as demonstrated with benchmarks against the quimb library.

翻訳日:2023-07-06 16:31:11 公開日:2023-07-04

# Infinite Tensor Network Contraction によるオープン量子システムダイナミクス

Open Quantum System Dynamics from Infinite Tensor Network Contraction ( http://arxiv.org/abs/2307.01802v1 )

License: see the linked page

Valentin Link, Hong-Hao Tu, Walter T. Strunz

(参考訳) 近年、強結合な非マルコフ開系の力学を計算するための手法が、行列積状態(MPS)形式に縮約できるテンソルネットワークの観点でいわゆるプロセステンソルの表現に基づいている。ガウス環境においては, 浴槽応答の定常性を利用して, 無限MPS進化法を用いて, このMPSを構築することができることを示す。この結果は、階層的あるいは擬態的手法のように、自由度を補助するオープンシステムの進化と構造的に類似している。しかし、これらの自由度はMPS進化アルゴリズムによって自動的に生成される。さらに, プロセステンソルネットワークを縮約するアルゴリズムは, 既存の提案よりも強い結合問題に対して大きな計算速度アップをもたらす。

Recently developed methods to compute dynamics of strongly coupled non-Markovian open systems are based on a representation of the so-called process tensor in terms of a tensor network, which can be contracted to matrix product state (MPS) form. We show that for Gaussian environments the stationarity of the bath response can be exploited in order to construct this MPS using infinite MPS evolution methods. The result structurally resembles open system evolution with auxiliary degrees of freedom, as in hierarchical or pseudomode methods. Here, however, these degrees of freedom are generated automatically by the MPS evolution algorithm. Furthermore, our algorithm for contracting the process tensor network leads to significant computational speed-ups for strong coupling problems over existing proposals.

Translated: 2023-07-06 16:30:51 Published: 2023-07-04

# 対角形2量子ビットゲートとクラスター計測を用いた量子計算における古典的効率的レジーム

Classically efficient regimes in measurement based quantum computation performed using diagonal two qubit gates and cluster measurements ( http://arxiv.org/abs/2307.01800v1 )

License: see the linked page

Sahar Atallah, Michael Garn, Yukuan Tao, Shashank Virmani

(参考訳) 最近の研究 arXiv:2201.07655v2 において、定数 $\lambda > 0$ が存在し、量子系を効率よく古典的にシミュレートできることを示した。 (i)グラフのノードにquditを配置する。 (ii)各クディットは、最大でD$の対角ゲートを通す。 (iii)各クディットは、その偏りのない計算ベース又は基礎において破壊的に測定され、 (iv) それぞれのquditは、特定の距離測度に従って対角状態の$\lambda^{-D}$内で初期化される。この作業では、任意の2つの量子ビット対角ゲートに対して$\lambda$を明示的に計算し、CZゲートを越えてarXiv:2201.07655v2の計算を拡張する。任意の有限次グラフに対して、パラメータの他の値が理想的なクラスター状態量子計算を可能にするとしても、非自明な古典的に許容された測定に対して効率的にシミュレート可能な「位相」を持つ純絡み合った量子状態の2つのパラメータ族(または熱状態の3つのパラメータ族)を記述することができる。技術的なツールは、作用素の「円筒的」集合の観点から分離性を考えることである。また、異なる集合の選択がアルゴリズムを強化し、それらが広い種類の集合の中で最適であることを示すかどうかも検討するが、このクラス以外では古典的に効率的な体系のサイズを増大させる選択肢が存在することも数値的に示している。

In a recent work arXiv:2201.07655v2 we showed that there is a constant $\lambda >0$ such that it is possible to efficiently classically simulate a quantum system in which (i) qudits are placed on the nodes of a graph, (ii) each qudit undergoes at most $D$ diagonal gates, (iii) each qudit is destructively measured in the computational basis or bases unbiased to it, and (iv) each qudit is initialised within $\lambda^{-D}$ of a diagonal state according to a particular distance measure. In this work we explicitly compute $\lambda$ for any two-qubit diagonal gate, thereby extending the computation of arXiv:2201.07655v2 beyond CZ gates. For any finite-degree graph this allows us to describe a two-parameter family of pure entangled quantum states (or three-parameter family of thermal states) which have a non-trivial classically efficiently simulatable "phase" for the permitted measurements, even though other values of the parameters may enable ideal cluster state quantum computation. The main technical tool involves considering separability in terms of "cylindrical" sets of operators. We also consider whether a different choice of set can strengthen the algorithm, and prove that cylindrical sets are optimal among a broad class of sets, but also show numerically that outside this class there are choices that can increase the size of the classically efficient regime.

Translated: 2023-07-06 16:30:39 Published: 2023-07-04

# エッジアウェアマルチタスクネットワークによるマルチモダリティmriにおける肝腫瘍の定量化分節化と不確実性予測の統合

Edge-aware Multi-task Network for Integrating Quantification Segmentation and Uncertainty Prediction of Liver Tumor on Multi-modality Non-contrast MRI ( http://arxiv.org/abs/2307.01798v1 )

License: see the linked page

Xiaojiao Xiao, Qinmin Hu, Guanghui Wang

(参考訳) multi-modality non-contrast magnetic resonance imaging (ncmri) における肝腫瘍の同時定量化, 分節化, 不確実性評価は, 診断に不可欠である。しかし、既存の手法では、マルチモードNCMRI融合と正確な境界情報取得のための効果的なメカニズムが欠如しており、これらのタスクは困難である。これらの課題に対処するために,マルチインデックス定量化,セグメンテーション,不確実性を多モードNCMRI上で関連付けるために,エッジ対応マルチタスクネットワーク(EaMtNet)という統合フレームワークを提案する。 EaMtNetは2つの並列CNNエンコーダとソベルフィルタを使用して、それぞれローカル特徴とエッジマップを抽出する。新たに設計されたエッジ対応機能集約モジュール(EaFA)は、機能融合と選択に使用され、機能マップとエッジマップ間の長距離依存性をキャプチャすることで、ネットワークエッジ対応を実現する。マルチタスクは予測誤差を利用して不確実性を推定し、セグメンテーションと定量化性能を改善する。マルチモダリティncmriと250名の臨床被験者による広範囲な実験を行った。提案モデルでは, ダイス類似度係数が90.01$\pm$1.23, 平均絶対誤差が2.72$\pm$0.58 mmである。その結果,EaMtNetは医用画像解析のための信頼性の高い臨床支援ツールとしての可能性を示した。

Simultaneous multi-index quantification, segmentation, and uncertainty estimation of liver tumors on multi-modality non-contrast magnetic resonance imaging (NCMRI) are crucial for accurate diagnosis. However, existing methods lack an effective mechanism for multi-modality NCMRI fusion and accurate boundary information capture, making these tasks challenging. To address these issues, this paper proposes a unified framework, namely edge-aware multi-task network (EaMtNet), to associate multi-index quantification, segmentation, and uncertainty of liver tumors on the multi-modality NCMRI. The EaMtNet employs two parallel CNN encoders and Sobel filters to extract local features and edge maps, respectively. The newly designed edge-aware feature aggregation module (EaFA) is used for feature fusion and selection, making the network edge-aware by capturing long-range dependency between feature and edge maps. Multi-tasking leverages prediction discrepancy to estimate uncertainty and improve segmentation and quantification performance. Extensive experiments are performed on multi-modality NCMRI with 250 clinical subjects. The proposed model outperforms the state-of-the-art by a large margin, achieving a Dice similarity coefficient of 90.01$\pm$1.23 and a mean absolute error of 2.72$\pm$0.58 mm for MD. The results demonstrate the potential of EaMtNet as a reliable clinical-aided tool for medical image analysis.

Translated: 2023-07-06 16:30:13 Published: 2023-07-04
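
The abstract above mentions Sobel filters as the source of edge maps. As a rough sketch of what such a filter computes, and not the EaMtNet implementation, the following plain-Python function returns a gradient-magnitude map for a 2D image given as a list of rows. The function name `sobel_edge_map`, the zero-valued border, and the toy image are assumptions for illustration.

```python
# Standard 3x3 Sobel kernels for horizontal and vertical intensity gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_edge_map(image):
    """Return the Sobel gradient magnitude of a 2D image (list of rows).

    Border pixels are left at 0.0 because the 3x3 window does not fit there.
    """
    h, w = len(image), len(image[0])
    edges = [[0.0] * w for _ in range(h)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            gx = 0.0
            gy = 0.0
            for a in range(3):
                for b in range(3):
                    pixel = image[i + a - 1][j + b - 1]
                    gx += SOBEL_X[a][b] * pixel
                    gy += SOBEL_Y[a][b] * pixel
            edges[i][j] = (gx * gx + gy * gy) ** 0.5
    return edges


# Toy image with a vertical step edge between columns 1 and 2.
toy_image = [[0, 0, 1, 1, 1] for _ in range(5)]
toy_edges = sobel_edge_map(toy_image)
for row in toy_edges:
    print(row)
```

The response is non-zero only next to the step, which is the kind of boundary cue an edge-aware module could consume alongside learned features.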

# in-medium qcdジェットの量子シミュレーション:運動量拡大、グルーオン生成、エントロピー成長

Quantum simulation of in-medium QCD jets: momentum broadening, gluon production, and entropy growth ( http://arxiv.org/abs/2307.01792v1 )

License: see the linked page

João Barata, Xiaojian Du, Meijian Li, Wenyang Qian, Carlos A. Salgado

(参考訳) ジェットは超相対論的重イオン衝突で生成するクォークグルーオンプラズマと、深い非弾性散乱実験で探究された冷たい核物質の主要なプローブの1つである。しかしながら、近年の重要な発展にもかかわらず、媒体内のqcdジェットのリアルタイム進化に関する記述は完成にはほど遠い。これまでの研究では、qcd物質のジェット進化をシミュレートし、現在の計算における固有の技術的困難を克服するための、有望な代替理論実験室として量子技術を検討した。ここでは、単一粒子 $|q\rangle$ からファック空間 $|q\rangle+|qg\rangle$ に拡張し、グルーオンの生成を考慮する。光面ハミルトニアン形式に基づいて、確率的色場として記述された媒体の存在下で多粒子ジェットプローブの進化を追跡するデジタル量子回路を構築する。噴流状態の運動量拡大について検討し,固有推定値と比較し,相当な固有効果を観測した。また,真空分裂関数と比較して小さな補正を施したグルーオン放出確率の媒質変化について検討した。さらに、クォーク成分に関連するフォン・ノイマンエントロピーの時間発展の研究を行い、エントロピーの指数関数は裸クォークに対して線形に成長するが、グルーオン放出を考慮すれば超線形に成長することを見出した。

Jets provide one of the primary probes of the quark-gluon plasma produced in ultrarelativistic heavy ion collisions and the cold nuclear matter explored in deep inelastic scattering experiments. However, despite important developments in recent years, a description of the real-time evolution of QCD jets inside a medium is still far from being complete. In our previous work, we explored quantum technologies as a promising alternative theoretical laboratory to simulate jet evolution in QCD matter, to overcome inherent technical difficulties in present calculations. Here, we extend our previous investigation from the single particle $|q\rangle$ to the $|q\rangle+|qg\rangle$ Fock space, taking into account gluon production. Based on the light-front Hamiltonian formalism, we construct a digital quantum circuit that tracks the evolution of a multi-particle jet probe in the presence of a medium described as a stochastic color field. Studying the momentum broadening of the jet state, we observe sizable sub-eikonal effects by comparing to eikonal estimates. We also study the medium-induced modifications to the gluon emission probability, which exhibit small corrections compared to the vacuum splitting function. In addition, we study the time evolution of the von Neumann entropy associated with the quark component; we find that the exponential of the entropy grows linearly in time for the bare quark but super-linearly when taking into account gluon emission.

Translated: 2023-07-06 16:29:46 Published: 2023-07-04

# コンタクトレス指紋提示アタック検出のための深い機能:一般化できるか?

Deep Features for Contactless Fingerprint Presentation Attack Detection: Can They Be Generalized? ( http://arxiv.org/abs/2307.01845v1 )

License: see the linked page

Hailin Li and Raghavendra Ramachandra

(参考訳) 高度な高解像度カメラを備えたハイエンドスマートフォンの急速な進化は、より信頼性が高く、検証に適した指紋バイオメトリックスを接触なく捕獲する結果となった。他の生体認証システムと同様に、非接触指紋認証システムはプレゼンテーション攻撃に対して脆弱である。本稿では,7種類の事前学習型畳み込みニューラルネットワーク (CNN) と視覚変換器 (ViT) の汎用性を比較検討し,提示攻撃を確実に検出する。 4種類のプレゼンテーションアタック・インスツルメンツ(PAI)を用いて,スマートフォンによるプレゼンテーションアタック・データセットの公開実験を行った。第8の深層特徴量の検出性能は,未発見のpaiの一般化性能をベンチマークするためにrevet-one-outプロトコルを用いて評価した。その結果,ResNet50 CNNで最高の一般化性能を示した。

The rapid evolution of high-end smartphones with advanced high-resolution cameras has resulted in contactless capture of fingerprint biometrics that are more reliable and suitable for verification. Similar to other biometric systems, contactless fingerprint-verification systems are vulnerable to presentation attacks. In this paper, we present a comparative study on the generalizability of seven different pre-trained Convolutional Neural Networks (CNNs) and a Vision Transformer (ViT) to reliably detect presentation attacks. Extensive experiments were carried out on publicly available smartphone-based presentation attack datasets using four different Presentation Attack Instruments (PAI). The detection performance of the eight deep feature techniques was evaluated using the leave-one-out protocol to benchmark the generalization performance for unseen PAI. The obtained results indicated the best generalization performance with the ResNet50 CNN.

Translated: 2023-07-06 16:21:29 Published: 2023-07-04
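
The leave-one-out protocol over Presentation Attack Instruments mentioned in the abstract above can be sketched as a simple split generator: each PAI is held out in turn, and a detector is trained only on the remaining ones. The PAI names `pai_a` to `pai_d` and the sample identifiers are placeholders, since the abstract does not name them, and a full protocol would also assign bona fide samples to both partitions, which this sketch omits.

```python
def leave_one_out_splits(samples_by_pai):
    """Yield (held_out_pai, train_samples, test_samples) for every PAI.

    samples_by_pai maps a PAI name to the list of attack samples made
    with that instrument. The held-out PAI is never seen during training.
    """
    for held_out in samples_by_pai:
        train = [
            sample
            for pai, samples in samples_by_pai.items()
            if pai != held_out
            for sample in samples
        ]
        test = list(samples_by_pai[held_out])
        yield held_out, train, test


# Hypothetical dataset with four attack instruments, two samples each.
toy_dataset = {
    "pai_a": ["a1", "a2"],
    "pai_b": ["b1", "b2"],
    "pai_c": ["c1", "c2"],
    "pai_d": ["d1", "d2"],
}

splits = list(leave_one_out_splits(toy_dataset))
for held_out, train, test in splits:
    print(f"unseen PAI: {held_out}, train size: {len(train)}, test size: {len(test)}")
```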

# 3次元顔における創傷充満の促進:自動分割と創傷顔面再生アプローチ

Advancing Wound Filling Extraction on 3D Faces: A Auto-Segmentation and Wound Face Regeneration Approach ( http://arxiv.org/abs/2307.01844v1 )

License: see the linked page

Duong Q. Nguyen and Thinh D. Le and Phuong D. Nguyen and H. Nguyen-Xuan

(参考訳) 顔面創傷の分節は, 術前計画および各種医療応用における患者予後の最適化において重要な役割を担っている。本稿では,2ストリームグラフ畳み込みネットワークを用いた3次元顔面創傷セグメンテーションの効率的な自動化手法を提案する。提案手法は,Cir3D-FaIRデータセットを活用し,異なる損失関数を用いた広範囲な実験を通じてデータ不均衡の課題に対処する。精度の高いセグメンテーションを実現するために,徹底的な実験を行い,訓練したモデルから高性能モデルを選択した。選択したモデルは複雑な3次元顔面外傷に対して例外的なセグメンテーション性能を示す。さらに, このセグメンテーションモデルに基づいて, 3次元顔の創傷充填体を抽出し, 前報と比較する手法を提案する。提案手法は, テストスイート上で0.9999986\%の精度を達成し, 先行手法の性能を上回った。この結果から,3Dプリンティング技術を用いて創傷充填形状を図示する。本研究の結果は,術前計画と介入設計に関わる医師に有意な影響を及ぼす。顔の創傷断面積の自動化と創傷充満抽出の精度の向上により, 介入を慎重に評価し, 最適化し, 患者の治療効果を高めることができる。さらに、皮膚組織インプラントの印刷に機械学習と3dバイオプリンティングを活用し、顔面再建の進歩に寄与する。ソースコードは \url{https://github.com/SIMOGroup/WoundFilling3D} で公開されています。

Facial wound segmentation plays a crucial role in preoperative planning and optimizing patient outcomes in various medical applications. In this paper, we propose an efficient approach for automating 3D facial wound segmentation using a two-stream graph convolutional network. Our method leverages the Cir3D-FaIR dataset and addresses the challenge of data imbalance through extensive experimentation with different loss functions. To achieve accurate segmentation, we conducted thorough experiments and selected a high-performing model from the trained models. The selected model demonstrates exceptional segmentation performance for complex 3D facial wounds. Furthermore, based on the segmentation model, we propose an improved approach for extracting 3D facial wound fillers and compare it to the results of the previous study. Our method achieved a remarkable accuracy of 0.9999986% on the test suite, surpassing the performance of the previous method. From this result, we use 3D printing technology to illustrate the shape of the wound filling. The outcomes of this study have significant implications for physicians involved in preoperative planning and intervention design. By automating facial wound segmentation and improving the accuracy of wound-filling extraction, our approach can assist in carefully assessing and optimizing interventions, leading to enhanced patient outcomes. Additionally, it contributes to advancing facial reconstruction techniques by utilizing machine learning and 3D bioprinting for printing skin tissue implants. Our source code is available at https://github.com/SIMOGroup/WoundFilling3D.

Translated: 2023-07-06 16:21:16 Published: 2023-07-04

# ATOM:量子コンピューティングにおける小さな埋め込みのための効率的なトポロジ適応アルゴリズム

ATOM: An Efficient Topology Adaptive Algorithm for Minor Embedding in Quantum Computing ( http://arxiv.org/abs/2307.01843v1 )

License: see the linked page

Hoang M. Ngo, Tamer Kahveci, My T. Thai

(参考訳) 量子アニーリング(quantum annealing, qa)は、量子物理学の利点を生かして最適化問題を解決する強力な手法である。 QAプロセスにおいて、QAのスケールアップを防ぐボトルネックは、論理グラフと呼ばれるグラフで表される最適化問題を、別のグラフで表される量子コンピュータの量子処理ユニット(QPU)トポロジに埋め込む小さな埋め込みステップである。既存のマイナー埋め込みのメソッドは、大規模なグラフ埋め込みでかなりの量の実行時間を必要とする。本稿では,ハードウェアグラフの拡張可能な部分グラフである適応トポロジーの新たな概念を提案する。そこで我々は,Adaptive Topology eMbedding (ATOM) という小さな埋め込みアルゴリズムを開発した。 ATOMは論理グラフからノードを反復的に選択し、ハードウェアグラフの適応トポロジーに埋め込む。実験の結果、atomは、結果の埋め込みの品質を損なうことなく、最先端のものよりもずっと小さな実行時間で実現可能な埋め込みを提供できることがわかった。

Quantum annealing (QA) has emerged as a powerful technique to solve optimization problems by taking advantage of quantum physics. In the QA process, a bottleneck that may prevent QA from scaling up is the minor embedding step, in which we embed optimization problems represented by a graph, called the logical graph, into the Quantum Processing Unit (QPU) topology of quantum computers, represented by another graph, called the hardware graph. Existing methods for minor embedding require a significant amount of running time in large-scale graph embedding. To overcome this problem, in this paper, we introduce a novel notion of adaptive topology, which is an expandable subgraph of the hardware graph. From that, we develop a minor embedding algorithm, namely Adaptive TOpology eMbedding (ATOM). ATOM iteratively selects a node from the logical graph and embeds it into the adaptive topology of the hardware graph. Our experimental results show that ATOM is able to provide a feasible embedding in much smaller running time than that of the state-of-the-art without compromising the quality of the resulting embedding.

Translated: 2023-07-06 16:20:52 Published: 2023-07-04

# グローバルクエンチ後の三成分情報の普遍性:スピンフリップと半局所電荷

Universality in the tripartite information after global quenches: spin flip and semilocal charges ( http://arxiv.org/abs/2307.01842v1 )

License: see the linked page

Vanja Marić

(参考訳) 我々は、時間発展が半局所保存作用素を持つ局所ハミルトニアンの下にある大域的クエンチの後に現れる定常状態を研究する。特に、量子xy鎖に双対なモデルについて研究する。初期状態における局所摂動は定常状態における空間相関の指数関数的減衰を代数的崩壊に変えることができることを示す。隣り合う3つのサブシステムの三部情報に着目し, (R\enyi-$\alpha$) 絡み合いエントロピーの挙動について検討した。大きなサブシステムの限界において、相関の代数的崩壊を伴う定常状態において、三成分情報は交叉比に普遍的な依存を持つ非零値を示し、相関の指数的減衰とともに定常状態において消失する。

We study stationary states emerging after global quenches in which the time evolution is under local Hamiltonians that possess semilocal conserved operators. In particular, we study a model that is dual to the quantum XY chain. We show that a localized perturbation in the initial state can turn an exponential decay of spatial correlations in the stationary state into an algebraic decay. We investigate the consequences on the behavior of the (Rényi-$\alpha$) entanglement entropies, focusing on the tripartite information of three adjacent subsystems. In the limit of large subsystems, we show that in the stationary state with the algebraic decay of correlations the tripartite information exhibits a non-zero value with a universal dependency on the cross ratio, while it vanishes in the stationary state with the exponential decay of correlations.

Translated: 2023-07-06 16:20:34 Published: 2023-07-04

# ニューラルネットワーク混合状態再構成の実証的サンプル複雑性

Empirical Sample Complexity of Neural Network Mixed State Reconstruction ( http://arxiv.org/abs/2307.01840v1 )

License: see the linked page

Haimeng Zhao and Giuseppe Carleo and Filippo Vicentini

(参考訳) 神経量子状態を用いた量子状態再構成は、実用的な応用において量子ショットの複雑さを減らすための有効なツールとして提案されており、特にノイズレスの場合に焦点を当てた数値実験でその利点が示されている。本研究では,混合状態に対する異なる量子状態再構成手法(有限温度イジングモデル)の性能を数値的に検討する。本稿では,分散低減手法を応用し,アルゴリズムの量子資源要件を体系的に低減する方法を示す。次に、状態の2つの主要なニューラルネットワーク量子状態、すなわち、神経密度演算子と正の演算子値測定表現を比較し、対象状態の混合度が異なるため、それらの性能を示す。我々は、ある種のエンコーディングは異なる混合状態においてより効率的であり、古典的資源と量子的資源の両方の観点からより効率的なエンコーディングを設計する必要性を指摘する。

Quantum state reconstruction using Neural Quantum States has been proposed as a viable tool to reduce quantum shot complexity in practical applications, and its advantage over competing techniques has been shown in numerical experiments focusing mainly on the noiseless case. In this work, we numerically investigate the performance of different quantum state reconstruction techniques for mixed states: the finite-temperature Ising model. We show how to systematically reduce the quantum resource requirement of the algorithms by applying variance reduction techniques. Then, we compare the two leading neural quantum state encodings of the state, namely, the Neural Density Operator and the positive operator-valued measurement representation, and illustrate their different performance as the mixedness of the target state varies. We find that certain encodings are more efficient in different regimes of mixedness and point out the need for designing more efficient encodings in terms of both classical and quantum resources.

Translated: 2023-07-06 16:20:18 Published: 2023-07-04

# EdgeFace:エッジデバイスのための効率的な顔認識モデル

EdgeFace: Efficient Face Recognition Model for Edge Devices ( http://arxiv.org/abs/2307.01838v1 )

License: see the linked page

Anjith George and Christophe Ecabert and Hatef Otroshi Shahreza and Ketan Kotwal and Sebastien Marcel

(参考訳) 本稿では,EdgeNeXtのハイブリッドアーキテクチャにヒントを得た,軽量かつ効率的な顔認識ネットワークEdgeFaceを提案する。 CNNとTransformerモデルの長所と低階線形層を効果的に組み合わせることで、エッジデバイスに最適化された優れた顔認識性能を実現する。提案したEdgeFaceネットワークは、低計算コストとコンパクトストレージを維持するだけでなく、高い顔認識精度を実現し、エッジデバイスへのデプロイに適している。挑戦的なベンチマーク顔データセットに関する広範囲な実験は、最先端の軽量モデルや深層顔認識モデルと比較して、エッジフェイスの有効性と効率を示す。 1.77Mパラメータを持つEdgeFaceモデルはLFW(99.73%)、IJB-B(92.67%)、IJB-C(94.85%)のアート結果の状態を達成し、計算量の多い他の効率的なモデルよりも優れている。実験を再現するコードは公開される予定だ。

In this paper, we present EdgeFace, a lightweight and efficient face recognition network inspired by the hybrid architecture of EdgeNeXt. By effectively combining the strengths of both CNN and Transformer models, and a low-rank linear layer, EdgeFace achieves excellent face recognition performance optimized for edge devices. The proposed EdgeFace network not only maintains low computational costs and compact storage, but also achieves high face recognition accuracy, making it suitable for deployment on edge devices. Extensive experiments on challenging benchmark face datasets demonstrate the effectiveness and efficiency of EdgeFace in comparison to state-of-the-art lightweight models and deep face recognition models. Our EdgeFace model with 1.77M parameters achieves state-of-the-art results on LFW (99.73%), IJB-B (92.67%), and IJB-C (94.85%), outperforming other efficient models with larger computational complexities. The code to replicate the experiments will be made available publicly.

Translated: 2023-07-06 16:20:03 Published: 2023-07-04

# 四元数フーリエ変換の行列形式と四元数畳み込みについて

On the Matrix Form of the Quaternion Fourier Transform and Quaternion Convolution ( http://arxiv.org/abs/2307.01836v1 )

License: see the linked page

Giorgos Sfikas and George Retsinas

(参考訳) フーリエ変換および畳み込み演算の四元数版行列形式について検討する。四元数(英語版)は強力な表現単位を提供するが、それらは四元数乗算の非可換性から最も遠ざかるそれらの利用の困難と関係しており、従って、$\mu^2 = -1$ は四元数領域における無限の解をとる。四元数行列の扱いはいくつかの面で複雑である(固有構造の定義、行列式など)。本研究では, 4次フーリエ変換行列と標準(複素)離散フーリエ変換行列との関係と, 既知の複素領域定理が四元数に拡張された拡張について明らかにする。特に四元系フーリエ変換行列と四元系循環行列の関係(四元系畳み込みを表わす)と、後者の固有構造との関係に注目した。理論結果を直接利用した概念実証の応用として,四元子畳み込みのスペクトルノルムを束縛する手法を提案する。

We study matrix forms of quaternionic versions of the Fourier Transform and Convolution operations. Quaternions offer a powerful representation unit; however, their use is complicated foremost by the non-commutativity of quaternion multiplication and by the fact that $\mu^2 = -1$ possesses infinitely many solutions in the quaternion domain. Handling of quaternionic matrices is consequently complicated in several aspects (definition of eigenstructure, determinant, etc.). Our research findings clarify the relation of the Quaternion Fourier Transform matrix to the standard (complex) Discrete Fourier Transform matrix, and the extent to which well-known complex-domain theorems extend to quaternions. We focus especially on the relation of Quaternion Fourier Transform matrices to Quaternion Circulant matrices (representing quaternionic convolution), and the eigenstructure of the latter. A proof-of-concept application that makes direct use of our theoretical results is presented, where we produce a method to bound the spectral norm of a Quaternionic Convolution.

Translated: 2023-07-06 16:19:44 Published: 2023-07-04
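
The two difficulties named in the abstract above, non-commutative multiplication and the many solutions of $\mu^2 = -1$, can be checked directly with the Hamilton product. This is a minimal sketch using tuples `(w, x, y, z)` for $w + x\,i + y\,j + z\,k$. The helper name `qmul` and the sample values are assumptions for illustration, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def qmul(p, q):
    """Hamilton product of two quaternions given as (w, x, y, z) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


I = (0, 1, 0, 0)
J = (0, 0, 1, 0)
K = (0, 0, 0, 1)
MINUS_ONE = (-1, 0, 0, 0)

# Non-commutativity: i*j gives k, but j*i gives -k.
ij = qmul(I, J)
ji = qmul(J, I)

# Any pure quaternion of unit norm squares to -1, so mu^2 = -1 has
# infinitely many solutions. One such mu besides i, j, k:
mu = (0, Fraction(3, 5), Fraction(4, 5), 0)
mu_squared = qmul(mu, mu)

print("i*j =", ij)
print("j*i =", ji)
print("mu^2 =", mu_squared)
```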

# パラメトリック逆変換源を用いた絡み合い型qkdの安全性

Security of entanglement-based QKD with realistic parametric down-conversion sources ( http://arxiv.org/abs/2307.01834v1 )

License: see the linked page

K. S. Kravtsov

(参考訳) 本稿では,実践的絡み合いに基づく量子鍵分布(QKD),すなわちBBM92やBB84プロトコルのセキュリティ面を分析する。準備と測定のQKDプロトコルと同様に、絡み合いベースのQKDの実装は、非理想的な光子源に依存する必要がある。絡み合い生成の典型的な解は自然パラメトリックダウン変換である。しかし、このプロセスは単一の光子対だけでなく、2つ以上の光子を持つ量子状態も生成し、セキュリティの悪化につながる可能性がある。この効果は絡み合いに基づくQKDシステムのセキュリティを損なうものではない。また、利用可能なセキュリティ証明をレビューし、絡み合ったソースの特性がセキュリティ劣化とは無関係であることを示す。

The paper analyzes security aspects of practical entanglement-based quantum key distribution (QKD), namely, BBM92 or entanglement-based BB84 protocol. Similar to prepare-and-measure QKD protocols, practical implementations of the entanglement-based QKD have to rely upon non-ideal photon sources. A typical solution for entanglement generation is the spontaneous parametric down-conversion. However, this process creates not only single photon pairs, but also quantum states with more than two photons, which potentially may lead to security deterioration. We show that this effect does not impair the security of entanglement-based QKD systems. We also review the available security proofs and show that properties of the entanglement source have nothing to do with security degradation.

Translated: 2023-07-06 16:19:19 Published: 2023-07-04

# dit-3d:3次元形状生成のための平滑拡散トランスの検討

DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation ( http://arxiv.org/abs/2307.01831v1 )

License: see the linked page

Shentong Mo, Enze Xie, Ruihang Chu, Lewei Yao, Lanqing Hong, Matthias Nießner, Zhenguo Li

(参考訳) 最近の拡散変換器(例えば、DiT)は、高品質な2D画像を生成するための強力な効果を示している。しかし,従来の3次元拡散法は主にU-Netアーキテクチャを採用するため,トランスフォーマーアーキテクチャが3次元形状生成において同等に機能するかどうかはまだ定かではない。このギャップを埋めるために, 平らな変換器を用いて渦化点雲のデノナイジング過程を直接操作できる新しい3次元形状生成用拡散変換器, DiT-3Dを提案する。既存のU-Netアプローチと比較して、私たちのDiT-3Dはモデルサイズがよりスケーラブルで、より高品質な世代を生み出す。具体的には、DiT-3D は DiT の設計哲学を採用するが、3D の位置とパッチの埋め込みを組み込んで、voxelized point cloud からの入力を適応的に集約することで変更する。 3次元形状生成における自己注意の計算コストを低減するため、3次元ウィンドウアテンションをトランスフォーマーブロックに組み込む。最後に、偏光点雲の予測に線形および脱酸化層を用いる。また、2Dから3Dへの効率的な微調整もサポートしており、ImageNetのトレーニング済みのDiT-2DチェックポイントはShapeNetのDiT-3Dを大幅に改善することができる。 ShapeNetデータセットの実験結果から、提案したDiT-3Dは、高忠実で多様な3Dポイントクラウド生成において最先端の性能を達成することが示された。特に,我々のdit-3dは,最先端手法の1ネアレスト近傍の精度を4.59パーセント低下させ,シャンファー距離で評価した場合のカバレッジメートル法を3.51パーセント向上させる。

Recent Diffusion Transformers (e.g., DiT) have demonstrated their powerful effectiveness in generating high-quality 2D images. However, it is still being determined whether the Transformer architecture performs equally well in 3D shape generation, as previous 3D diffusion methods mostly adopted the U-Net architecture. To bridge this gap, we propose a novel Diffusion Transformer for 3D shape generation, namely DiT-3D, which can directly operate the denoising process on voxelized point clouds using plain Transformers. Compared to existing U-Net approaches, our DiT-3D is more scalable in model size and produces much higher quality generations. Specifically, the DiT-3D adopts the design philosophy of DiT but modifies it by incorporating 3D positional and patch embeddings to adaptively aggregate input from voxelized point clouds. To reduce the computational cost of self-attention in 3D shape generation, we incorporate 3D window attention into Transformer blocks, as the increased 3D token length resulting from the additional dimension of voxels can lead to high computation. Finally, linear and devoxelization layers are used to predict the denoised point clouds. In addition, our transformer architecture supports efficient fine-tuning from 2D to 3D, where the pre-trained DiT-2D checkpoint on ImageNet can significantly improve DiT-3D on ShapeNet. Experimental results on the ShapeNet dataset demonstrate that the proposed DiT-3D achieves state-of-the-art performance in high-fidelity and diverse 3D point cloud generation. In particular, our DiT-3D decreases the 1-Nearest Neighbor Accuracy of the state-of-the-art method by 4.59 and increases the Coverage metric by 3.51 when evaluated on Chamfer Distance.

Translated: 2023-07-06 16:19:07 Published: 2023-07-04
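
The abstract above reports its metrics "when evaluated on Chamfer Distance". As a reminder of what that distance measures between two point clouds, here is a brute-force sketch. It follows one common convention (mean squared nearest-neighbour distance, summed over both directions), and the paper's exact variant and normalisation may differ. The function name `chamfer_distance` is an illustrative choice.

```python
def chamfer_distance(cloud_a, cloud_b):
    """Symmetric Chamfer distance between two lists of 3D points.

    Convention assumed here: for each point, take the squared Euclidean
    distance to its nearest neighbour in the other cloud, average per
    cloud, and add the two directions. Cost is O(len(a) * len(b)).
    """

    def squared_distance(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q))

    def one_way(source, target):
        return sum(
            min(squared_distance(p, q) for q in target) for p in source
        ) / len(source)

    return one_way(cloud_a, cloud_b) + one_way(cloud_b, cloud_a)


# Toy clouds: the second cloud is missing one of the first cloud's points.
cloud_full = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
cloud_partial = [(0.0, 0.0, 0.0)]

print(chamfer_distance(cloud_full, cloud_full))
print(chamfer_distance(cloud_full, cloud_partial))
```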

# データ再構築のデコンストラクション:マルチクラス、軽量化、一般的な損失

Deconstructing Data Reconstruction: Multiclass, Weight Decay and General Losses ( http://arxiv.org/abs/2307.01827v1 )

License: see the linked page

Gon Buzaglo, Niv Haim, Gilad Yehudai, Gal Vardi, Yakir Oz, Yaniv Nikankin and Michal Irani

(参考訳) トレーニングデータの記憶は活発な研究分野であるが、ニューラルネットワークの内部動作に関する我々の理解はまだ初期段階にある。近年,haimら (2022) は多層型パーセプトロンバイナリ分類器からトレーニングサンプルを再構成する手法を提案し,トレーニングサンプルの大部分がそのようなネットワークのパラメータにエンコードされていることを効果的に証明した。本研究では,マルチクラスニューラルネットワークや畳み込みニューラルネットワークからの再構成など,その知見をいくつかの方向に拡張する。回帰損失のようなより広い範囲の損失関数に適用可能な、より一般的な再構成スキームを導出する。さらに,ネットワークがそのような再構築計画に感受性を及ぼす様々な要因について検討した。興味深いことに、トレーニング中に重量減少を使用することで、量と品質の両面で復元性が向上する。さらに, トレーニング標本数に対するニューロン数の影響について検討した。

Memorization of training data is an active research area, yet our understanding of the inner workings of neural networks is still in its infancy. Recently, Haim et al. (2022) proposed a scheme to reconstruct training samples from multilayer perceptron binary classifiers, effectively demonstrating that a large portion of training samples are encoded in the parameters of such networks. In this work, we extend their findings in several directions, including reconstruction from multiclass and convolutional neural networks. We derive a more general reconstruction scheme which is applicable to a wider range of loss functions such as regression losses. Moreover, we study the various factors that contribute to networks' susceptibility to such reconstruction schemes. Intriguingly, we observe that using weight decay during training increases reconstructability both in terms of quantity and quality. Additionally, we examine the influence of the number of neurons relative to the number of training samples on the reconstructability.

Translated: 2023-07-06 16:18:35 Published: 2023-07-04

# 意味的役割ラベリングにおける非言語的述語探索:課題と機会

Exploring Non-Verbal Predicates in Semantic Role Labeling: Challenges and Opportunities ( http://arxiv.org/abs/2307.01870v1 )

License: see the linked page

Riccardo Orlando and Simone Conia and Roberto Navigli

(参考訳) セマンティック・ロール・ラベルリング (SRL) では顕著な進歩が見られたが、ほとんどの研究は、述語の大半が動詞であると仮定して行われている。逆に、述語は名詞や形容詞などの他の部分を用いて表現することもできる。しかしながら、非言語述語は、SRLの進捗を実際の設定(新聞の見出し、対話、ツイートなど)よりも少ない頻度で測定するために一般的に使用しているベンチマークに現れます。本稿では,複数の述語型をカバーする新しいpropbankデータセットを提案する。これにより、標準ベンチマークは、SRLの現在の状況の正確な画像を提供しておらず、最先端システムは、異なる述語型間で知識を伝達できないことを実証的に実証する。これらの問題を観察し、言語、名目、形容詞の述語構造に等しく重要性を与えるように設計された、手書きの課題セットも提示する。このようなデータセットを使用して,異なる言語資源を活用して知識伝達を促進することができるか検討する。結論として、SRLは「解決」には程遠いものであり、他の意味的タスクと統合することで、特に非言語述語の長い尾において、将来重要な改善が可能となり、非言語述語のSRLに関するさらなる研究が促進される。

Although we have witnessed impressive progress in Semantic Role Labeling (SRL), most of the research in the area is carried out assuming that the majority of predicates are verbs. Conversely, predicates can also be expressed using other parts of speech, e.g., nouns and adjectives. However, non-verbal predicates appear in the benchmarks we commonly use to measure progress in SRL less frequently than in some real-world settings -- newspaper headlines, dialogues, and tweets, among others. In this paper, we put forward a new PropBank dataset which boasts wide coverage of multiple predicate types. Thanks to it, we demonstrate empirically that standard benchmarks do not provide an accurate picture of the current situation in SRL and that state-of-the-art systems are still incapable of transferring knowledge across different predicate types. Having observed these issues, we also present a novel, manually-annotated challenge set designed to give equal importance to verbal, nominal, and adjectival predicate-argument structures. We use this dataset to investigate whether we can leverage different linguistic resources to promote knowledge transfer. In conclusion, we claim that SRL is far from "solved", and its integration with other semantic tasks might enable significant improvements in the future, especially for the long tail of non-verbal predicates, thereby facilitating further research on SRL for non-verbal predicates.

Translated: 2023-07-06 16:12:45 Published: 2023-07-04

# MaskBEV:鳥眼視3D点雲のオブジェクト検出とフットプリント完了

MaskBEV: Joint Object Detection and Footprint Completion for Bird's-eye View 3D Point Clouds ( http://arxiv.org/abs/2307.01864v1 )

License: see the linked page

William Guimont-Martin, Jean-Michel Fortin, François Pomerleau, Philippe Giguère

(参考訳) ライダーポイントクラウドにおける最近のオブジェクト検出の研究は、主にオブジェクト周辺の境界ボックスの予測に焦点を当てている。この予測は通常、アンカーベースまたはアンカーフリーの検出器を使って境界ボックスを予測し、オブジェクトが適切に動作するための明確な事前知識を必要とする。これらの制約を緩和するために,鳥眼ビュー (BEV) を用いた物体検出ニューラルネットワークであるMaskBEVを提案する。 MaskBEVは検出されたオブジェクトのフットプリントを表す一連のBEVインスタンスマスクを予測する。さらに,1回のパスで物体検出と足跡完了を可能にする。 MaskBEVはまた、検出問題を分類の観点から純粋に再構成し、通常はリグレッションによって境界ボックスを予測する。本研究では,SemanticKITTIとKITTIの両方のデータセット上でのMaskBEVの性能評価を行い,アーキテクチャの利点と限界を分析した。

Recent works in object detection in LiDAR point clouds mostly focus on predicting bounding boxes around objects. This prediction is commonly achieved using anchor-based or anchor-free detectors that predict bounding boxes, requiring significant explicit prior knowledge about the objects to work properly. To remedy these limitations, we propose MaskBEV, a bird's-eye view (BEV) mask-based object detector neural architecture. MaskBEV predicts a set of BEV instance masks that represent the footprints of detected objects. Moreover, our approach allows object detection and footprint completion in a single pass. MaskBEV also reformulates the detection problem purely in terms of classification, doing away with regression usually done to predict bounding boxes. We evaluate the performance of MaskBEV on both SemanticKITTI and KITTI datasets while analyzing the architecture advantages and limitations.

Translated: 2023-07-06 16:12:05 Published: 2023-07-04

# マルチエージェント強化学習による創発的リソース交換と盗難防止行動

Emergent Resource Exchange and Tolerated Theft Behavior using Multi-Agent Reinforcement Learning ( http://arxiv.org/abs/2307.01862v1 )

License: see the linked page

Jack Garbus, Jordan Pollack

(参考訳) 何十年もの間、協調の進化はゲーム理論、経済学、生物学、コンピュータ科学といった多くの学術分野の関心を惹きつけてきた。本研究では,捕食環境において資源を投棄し,拾い上げることによって形成される,新規で効果的な資源交換プロトコルの出現を実証する。この形態の協力はキャンプファイヤーの導入によって可能となり、それ以外はあり得ない相互作用を探索するエージェントの会衆とダウンタイムが延長される。エージェントは交換相手に騙されるのを避けることを学ぶが、必ずしも第三者からではない。また,環境における処罰,戦闘,強姦のメカニズムが欠如しているにもかかわらず,許容盗難と類似した行動の出現も観察した。

For decades, the evolution of cooperation has piqued the interest of numerous academic disciplines such as game theory, economics, biology, and computer science. In this work, we demonstrate the emergence of a novel and effective resource exchange protocol formed by dropping and picking up resources in a foraging environment. This form of cooperation is made possible by the introduction of a campfire, which adds an extended period of congregation and downtime for agents to explore otherwise unlikely interactions. We find that the agents learn to avoid getting cheated by their exchange partners, but not always from a third party. We also observe the emergence of behavior analogous to tolerated theft, despite the lack of any punishment, combat, or larceny mechanism in the environment.

Translated: 2023-07-06 16:11:51 Published: 2023-07-04

# 弱アダマール行列と弱アダマール対角化グラフ

Weak Hadamard matrices and Weakly Hadamard diagonalizable graphs ( http://arxiv.org/abs/2307.01859v1 )

ライセンス: Link先を確認

Darian McLaren, Hermie Monterde, and Sarah Plosker

(参考訳) 弱いアダマール行列は$\{-1,0, 1\}$-matrix $p$ であり、$pp^t$ は三対角である。弱アダマール行列と弱アダマール対角化グラフ(ラプラシア行列が弱アダマール行列で対角化されるグラフ)の基底となる代数的構造と組合せ的構造について検討する。このような行列やグラフの構成や例も提供します。次に、そのようなグラフに関して量子状態転移を考える。

A weak Hadamard matrix is a $\{-1,0, 1\}$-matrix $P$ such that $PP^T$ is tridiagonal. We explore the underlying algebraic and combinatorial structure of weak Hadamard matrices and weakly Hadamard diagonalizable graphs (graphs whose Laplacian matrix is diagonalized by a weak Hadamard matrix). We also provide constructions and examples of such matrices and graphs. We then consider quantum state transfer with respect to such graphs.
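The definition above is easy to test mechanically. The following is a minimal sketch in plain Python that checks only the two conditions stated in the abstract (entries in $\{-1,0,1\}$ and $PP^T$ tridiagonal); the function names and the example matrices are illustrative assumptions, and the paper itself may impose further requirements (for instance, invertibility when $P$ is used to diagonalize a Laplacian).

```python
def gram(P):
    """Return P * P^T for a matrix given as a list of rows."""
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in P] for row_i in P]


def is_tridiagonal(M):
    """True if every entry more than one step off the main diagonal is zero."""
    n = len(M)
    return all(M[i][j] == 0 for i in range(n) for j in range(n) if abs(i - j) > 1)


def is_weak_hadamard(P):
    """Check the abstract's two conditions: entries in {-1, 0, 1}, P P^T tridiagonal."""
    entries_ok = all(x in (-1, 0, 1) for row in P for x in row)
    return entries_ok and is_tridiagonal(gram(P))


if __name__ == "__main__":
    # Hypothetical examples chosen for illustration, not taken from the paper.
    examples = {
        "2x2 Hadamard": [[1, 1], [1, -1]],
        "rows 1 and 3 orthogonal": [[1, 1, 0], [0, 1, 1], [1, -1, 1]],
        "rows 1 and 3 overlap": [[1, 1, 0], [1, -1, 1], [0, 1, 1]],
    }
    for name, P in examples.items():
        print(name, is_weak_hadamard(P))
```

Row order matters here: the second and third examples use the same three rows in a different order, and only the ordering that keeps non-adjacent rows orthogonal passes.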

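Translated: 2023-07-06 16:11:38 Published: 2023-07-04

# Superconducting Non-Reciprocity Based on Time-Modulated Coupled-Resonator Systems ( http://arxiv.org/abs/2307.01853v1 )

License: see the linked page

Yi Zhuang, Chandrashekhar Gaikwad, Daria Kowsari, Kater Murch, and Aravind Nagulu

(Reference translation) We propose a unified approach to designing a wide variety of superconducting non-reciprocal components, including circulators, isolators, and unidirectional amplifiers, based on time-modulated coupled-resonator networks. The method uses standard SQUID-based resonators as building blocks and arranges them in configurations such as series-coupled, wye-connected, and lattice-coupled resonators to realize a broad range of on-chip non-reciprocal devices. Our theoretical studies demonstrate the effectiveness of the approach, yielding circulators and isolators with near-zero insertion loss and more than 20 dB of isolation, and directional amplifiers with more than 10 dB of forward gain and more than 20 dB of reverse isolation. To validate these findings, we implemented and measured a series-coupled three-resonator superconducting isolator fabricated in a single-layer superconducting process. At a base temperature of 20 mK, the device showed 1.3 dB of insertion loss in the forward direction and, in the reverse direction, up to 25 dB of isolation at the center frequency and more than 15 dB across a 250 MHz bandwidth. The approach promises to enable the design of high-performance non-reciprocal devices for superconducting circuits.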

We present a unified approach for designing a diverse range of superconducting non-reciprocal components, including circulators, isolators, and uni-directional amplifiers, based on temporally-modulated coupled resonator networks. Our method leverages standard SQUID-based resonators as building blocks, arranged in various configurations such as series-coupled, wye-connected, and lattice-coupled resonators, to realize a wide range of on-chip non-reciprocal devices. Our theoretical studies demonstrated the effectiveness of the proposed approach, achieving circulators and isolators with near-zero insertion losses and isolation greater than 20 dB, and directional amplifiers with forward gain exceeding 10 dB and reverse isolation greater than 20 dB. To validate our findings, we implemented and measured a series-coupled three-resonator superconducting isolator using a single-layer superconducting process. At a base temperature of 20 mK, our device exhibited insertion loss of 1.3 dB in the forward direction, and isolation of up to 25 dB at the center frequency and greater than 15 dB across a bandwidth of 250 MHz in the reverse direction. Our approach promises to enable the design of a broad range of high-performance non-reciprocal devices for superconducting circuits.

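Translated: 2023-07-06 16:11:30 Published: 2023-07-04

# Boundary Flat Bands with Topological Spin Textures Protected by Sub-chiral Symmetry ( http://arxiv.org/abs/2307.01851v1 )

License: see the linked page

Yijie Mo, Xiao-Jiao Wang, Rui Yu, Zhongbo Yan

(Reference translation) Chiral symmetry plays an indispensable role in topological classification and in understanding the origin of bulk or boundary flat bands. The conventional definition of chiral symmetry refers to the existence of a constant unitary matrix that anticommutes with the Hamiltonian. Because a constant unitary matrix has constant eigenvectors, boundary flat bands enforced by chiral symmetry share their eigenvectors with the chiral symmetry operator, and are therefore known to carry fixed (pseudo)spin polarizations and to be featureless in quantum geometry. In this work we generalize chiral symmetry and introduce the concept of sub-chiral symmetry. Unlike the conventional chiral symmetry operator, which is defined as a constant, the sub-chiral symmetry operator depends on some of the components of the momentum vector, and so do its eigenvectors. We show that topological gapped or gapless systems that lack chiral symmetry but have sub-chiral symmetry can support boundary flat bands exhibiting topological spin textures and quantized Berry phases. We expect such boundary flat bands to give rise to a variety of exotic physics in the presence of interactions or disorder.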

Chiral symmetry plays an indispensable role in topological classifications as well as in the understanding of the origin of bulk or boundary flat bands. The conventional definition of chiral symmetry refers to the existence of a constant unitary matrix anticommuting with the Hamiltonian. As a constant unitary matrix has constant eigenvectors, boundary flat bands enforced by chiral symmetry, which share the same eigenvectors with the chiral symmetry operator, are known to carry fixed (pseudo)spin polarizations and be featureless in quantum geometry. In this work, we generalize the chiral symmetry and introduce a concept termed sub-chiral symmetry. Unlike the conventional chiral symmetry operator defined as constant, the sub-chiral symmetry operator depends on partial components of the momentum vector, as do its eigenvectors. We show that topological gapped or gapless systems without the chiral symmetry but with the sub-chiral symmetry can support boundary flat bands, which exhibit topological spin textures and quantized Berry phases. We expect that such intriguing boundary flat bands could give rise to a variety of exotic physics in the presence of interactions or disorder.

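Translated: 2023-07-06 16:11:08 Published: 2023-07-04

# Self-Consuming Generative Models Go MAD ( http://arxiv.org/abs/2307.01850v1 )

License: see the linked page

Sina Alemohammad, Josue Casco-Rodriguez, Lorenzo Luzi, Ahmed Imtiaz Humayun, Hossein Babaei, Daniel LeJeune, Ali Siahkoohi, Richard G. Baraniuk

(Reference translation) Seismic advances in generative AI algorithms for images, text, and other data types have created a temptation to use synthetic data to train next-generation models. Repeating this process produces a self-consuming (autophagous) loop whose properties are poorly understood. Using state-of-the-art generative image models, we carry out an analytical and empirical study of three families of autophagous loops, which differ in how fixed or fresh real training data is made available across generations of training and in whether samples from earlier-generation models have been biased to trade data quality against diversity. Our main conclusion across all scenarios is that, without enough fresh real data in each generation of an autophagous loop, future generative models are doomed to a progressive decline in quality (precision) or diversity (recall). We call this condition Model Autophagy Disorder (MAD), by analogy with mad cow disease.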

Seismic advances in generative AI algorithms for imagery, text, and other data types have led to the temptation to use synthetic data to train next-generation models. Repeating this process creates an autophagous (self-consuming) loop whose properties are poorly understood. We conduct a thorough analytical and empirical analysis using state-of-the-art generative image models of three families of autophagous loops that differ in how fixed or fresh real training data is available through the generations of training and in whether the samples from previous generation models have been biased to trade off data quality versus diversity. Our primary conclusion across all scenarios is that without enough fresh real data in each generation of an autophagous loop, future generative models are doomed to have their quality (precision) or diversity (recall) progressively decrease. We term this condition Model Autophagy Disorder (MAD), by analogy with mad cow disease.
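The self-consuming loop described above can be illustrated with a deliberately tiny toy model. The sketch below is an assumption-laden stand-in, not the paper's experimental setup: the "generative model" is a one-dimensional Gaussian fitted by its sample mean and standard deviation, the real distribution is taken to be N(0, 1), and the function name and parameters are invented for illustration. Each generation refits the model on data sampled from the previous fit, optionally mixed with fresh real samples.

```python
import random
import statistics


def autophagous_loop(generations, n_samples, fresh_fraction, seed=0):
    """Repeatedly fit a Gaussian to data, then sample the next training set from the fit.

    fresh_fraction is the share of each generation's training set drawn from the
    real distribution N(0, 1); the rest comes from the previous generation's model.
    Returns the fitted variance at every generation.
    """
    rng = random.Random(seed)
    n_fresh = round(fresh_fraction * n_samples)
    data = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]  # generation 0: all real
    variances = []
    for _ in range(generations):
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data)  # the toy "model" is just (mu, sigma)
        variances.append(sigma ** 2)
        synthetic = [rng.gauss(mu, sigma) for _ in range(n_samples - n_fresh)]
        fresh = [rng.gauss(0.0, 1.0) for _ in range(n_fresh)]
        data = synthetic + fresh
    return variances


if __name__ == "__main__":
    synthetic_only = autophagous_loop(generations=1000, n_samples=50, fresh_fraction=0.0)
    half_fresh = autophagous_loop(generations=1000, n_samples=50, fresh_fraction=0.5)
    print("final variance, fully synthetic loop:", synthetic_only[-1])
    print("final variance, half fresh real data:", half_fresh[-1])
```

In this toy, the fitted variance plays the role of diversity: with no fresh data the estimation error compounds from one generation to the next and the variance drifts toward zero, while a steady supply of real samples keeps it anchored near the true value.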

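Translated: 2023-07-06 16:10:49 Published: 2023-07-04

# Crossway Diffusion: Improving Diffusion-based Visuomotor Policy via Self-supervised Learning ( http://arxiv.org/abs/2307.01849v1 )

License: see the linked page

Xiang Li, Varun Belagali, Jinghuan Shang, Michael S. Ryoo

(Reference translation) Sequence modeling approaches have shown promising results in robot imitation learning. Recently, diffusion models, with their exceptional ability to model complex data distributions, have been adopted for behavioral cloning. In this work we propose Crossway Diffusion, a method that strengthens diffusion-based visuomotor policy learning with an additional self-supervised learning (SSL) objective. A standard diffusion-based policy generates action sequences from random noise, conditioned on visual observations and other low-dimensional states. We extend it by introducing a new decoder that reconstructs raw image pixels (and other state information) from the intermediate representations of the reverse diffusion process, and we train the model jointly with the SSL loss. Experiments on simulated and real-world robot tasks demonstrate the effectiveness of Crossway Diffusion and confirm its advantages over the standard diffusion-based policy. We show that this self-supervised reconstruction yields better representations for policy learning, especially when the demonstrations vary in proficiency.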

Sequence modeling approaches have shown promising results in robot imitation learning. Recently, diffusion models have been adopted for behavioral cloning, benefiting from their exceptional capabilities in modeling complex data distribution. In this work, we propose Crossway Diffusion, a method to enhance diffusion-based visuomotor policy learning by using an extra self-supervised learning (SSL) objective. The standard diffusion-based policy generates action sequences from random noise conditioned on visual observations and other low-dimensional states. We further extend this by introducing a new decoder that reconstructs raw image pixels (and other state information) from the intermediate representations of the reverse diffusion process, and train the model jointly using the SSL loss. Our experiments demonstrate the effectiveness of Crossway Diffusion in various simulated and real-world robot tasks, confirming its advantages over the standard diffusion-based policy. We demonstrate that such self-supervised reconstruction enables better representation for policy learning, especially when the demonstrations have different proficiencies.

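Translated: 2023-07-06 16:10:30 Published: 2023-07-04

# Embodied Task Planning with Large Language Models ( http://arxiv.org/abs/2307.01848v1 )

License: see the linked page

Zhenyu Wu, Ziwei Wang, Xiuwei Xu, Jiwen Lu, Haibin Yan

(Reference translation) Equipping embodied agents with common sense is important if robots are to complete complex human instructions in general environments. Recent large language models (LLMs) can embed rich semantic knowledge for agents in plan generation for complex tasks, but they lack information about the real world and usually produce infeasible action sequences. In this paper we propose a TAsk Planing Agent (TaPA) for embodied tasks that performs grounded planning under physical scene constraints: the agent generates executable plans according to the objects that exist in the scene by aligning LLMs with visual perception models. Specifically, we first construct a multimodal dataset of triplets of indoor scenes, instructions, and action plans, giving GPT-3.5 designed prompts and the list of objects present in each scene so that it generates a large number of instructions and corresponding planned actions. The generated data is used for grounded plan tuning of pre-trained LLMs. At inference time, we discover the objects in the scene by extending open-vocabulary object detectors to multi-view RGB images collected at different reachable locations. Experiments show that plans generated by the TaPA framework achieve a higher success rate than LLaVA and GPT-3.5 by a sizable margin.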

Equipping embodied agents with commonsense is important for robots to successfully complete complex human instructions in general environments. Recent large language models (LLMs) can embed rich semantic knowledge for agents in plan generation of complex tasks, while they lack the information about the realistic world and usually yield infeasible action sequences. In this paper, we propose a TAsk Planing Agent (TaPA) in embodied tasks for grounded planning with physical scene constraint, where the agent generates executable plans according to the existing objects in the scene by aligning LLMs with the visual perception models. Specifically, we first construct a multimodal dataset containing triplets of indoor scenes, instructions and action plans, where we provide the designed prompts and the list of existing objects in the scene for GPT-3.5 to generate a large number of instructions and corresponding planned actions. The generated data is leveraged for grounded plan tuning of pre-trained LLMs. During inference, we discover the objects in the scene by extending open-vocabulary object detectors to multi-view RGB images collected in different achievable locations. Experimental results show that the generated plan from our TaPA framework can achieve a higher success rate than LLaVA and GPT-3.5 by a sizable margin, which indicates the practicality of embodied task planning in general and complex environments.

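Translated: 2023-07-06 16:10:13 Published: 2023-07-04

# Grad-FEC: Unequal Loss Protection of Deep Features in Collaborative Intelligence ( http://arxiv.org/abs/2307.01846v1 )

License: see the linked page

Korcan Uyanik, S. Faegheh Yeganli, Ivan V. Bajić

(Reference translation) Collaborative intelligence (CI) splits an artificial intelligence (AI) model into two parts: a front-end deployed on an edge device and a back-end deployed in the cloud. The deep feature tensors produced by the front-end are sent to the cloud over a communication channel and may suffer packet loss. To address this problem, we propose a new method that uses Unequal Loss Protection (ULP) to make CI systems more resilient to packet loss. The proposed ULP approach includes a feature importance estimator, which estimates the importance of the feature packets produced by the front-end, and selectively applies Forward Error Correction (FEC) codes to protect the important packets. Experiments show that the approach can significantly improve the reliability and robustness of CI systems under packet loss.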

Collaborative intelligence (CI) involves dividing an artificial intelligence (AI) model into two parts: front-end, to be deployed on an edge device, and back-end, to be deployed in the cloud. The deep feature tensors produced by the front-end are transmitted to the cloud through a communication channel, which may be subject to packet loss. To address this issue, in this paper, we propose a novel approach to enhance the resilience of the CI system in the presence of packet loss through Unequal Loss Protection (ULP). The proposed ULP approach involves a feature importance estimator, which estimates the importance of feature packets produced by the front-end, and then selectively applies Forward Error Correction (FEC) codes to protect important packets. Experimental results demonstrate that the proposed approach can significantly improve the reliability and robustness of the CI system in the presence of packet loss.
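The idea of protecting packets unequally can be sketched in a few lines. The code below is a hypothetical illustration only: it assumes the per-packet importance scores are already given (the paper's estimator is not reproduced), it uses a plain duplicate copy as a stand-in for a real FEC code, and the function names, the `budget` parameter, and the loss model (independent drops with probability `loss_prob`) are all assumptions made for the example.

```python
import random


def select_protected(importances, budget):
    """Indices of the `budget` most important packets (ties broken by lower index)."""
    ranked = sorted(range(len(importances)), key=lambda i: (-importances[i], i))
    return set(ranked[:budget])


def transmit(packets, protected, loss_prob, rng):
    """Send packets over a lossy channel; protected packets get one redundant copy.

    A duplicate copy stands in for a real FEC code: a protected packet is lost
    only if both of its copies are dropped. Returns the received packets, with
    None marking a loss.
    """
    received = []
    for i, packet in enumerate(packets):
        copies = 2 if i in protected else 1
        arrived = any(rng.random() >= loss_prob for _ in range(copies))
        received.append(packet if arrived else None)
    return received


if __name__ == "__main__":
    rng = random.Random(0)
    importances = [0.1, 0.9, 0.5, 0.9, 0.2, 0.05]   # made-up scores
    packets = [f"feature-packet-{i}" for i in range(len(importances))]
    protected = select_protected(importances, budget=2)
    print("protected indices:", sorted(protected))
    print(transmit(packets, protected, loss_prob=0.3, rng=rng))
```

With independent losses, an unprotected packet is lost with probability `loss_prob` and a duplicated one with probability `loss_prob ** 2`, so the redundancy budget is spent where a loss would hurt the back-end most.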

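Translated: 2023-07-06 16:09:48 Published: 2023-07-04

# Systematic Computation of Braid Generator Matrix in Topological Quantum Computing ( http://arxiv.org/abs/2307.01892v1 )

License: see the linked page

Abdellah Tounsi, Nacer Eddine Belaloui, Mohamed Messaoud Louamri, Amani Mimoun, Achour Benslama, Mohamed Taha Rouabah

(Reference translation) We present a systematic numerical method for computing the elementary braiding operations of topological quantum computation (TQC). Braiding non-Abelian anyons is a key technique in TQC and provides a topologically protected implementation of quantum gates. However, obtaining matrix representations of the braid generators is difficult, especially for systems with many anyons or complex fusion patterns. The proposed method addresses this challenge and can handle an arbitrary number of anyons per qubit or qudit. The approach is a basic building block of a general topological quantum circuit simulator and facilitates the exploration and analysis of complex quantum circuits within the TQC framework. We implemented the method and tested it using algebraic conditions. We also give a proof of concept by successfully reproducing the CNOT gate.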

We present a systematic numerical method to compute the elementary braiding operations for topological quantum computation (TQC). Braiding non-Abelian anyons is a crucial technique in TQC, offering a topologically protected implementation of quantum gates. However, obtaining matrix representations for braid generators can be challenging, especially for systems with numerous anyons or complex fusion patterns. Our proposed method addresses this challenge, allowing for the inclusion of an arbitrary number of anyons per qubit or qudit. This approach serves as a fundamental component in a general topological quantum circuit simulator, facilitating the exploration and analysis of intricate quantum circuits within the TQC framework. We have implemented and tested the method using algebraic conditions. Furthermore, we provide a proof of concept by successfully reproducing the CNOT gate.
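To make "braid generator matrices" and "algebraic conditions" concrete, here is a small sketch for the simplest textbook case. It is not the paper's method: it assumes Fibonacci anyons (which the abstract does not mention), hard-codes the standard F and R matrices for three anyons encoding one qubit, and checks unitarity and the braid relation $\sigma_1\sigma_2\sigma_1 = \sigma_2\sigma_1\sigma_2$ numerically. All names in the code are illustrative.

```python
import cmath
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio


def matmul(A, B):
    """Product of two 2x2 complex matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def dagger(A):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(A[j][i]).conjugate() for j in range(2)] for i in range(2)]


def close(A, B, tol=1e-9):
    """Entrywise comparison of two 2x2 matrices."""
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(2) for j in range(2))


IDENTITY = [[1, 0], [0, 1]]

# F-matrix (change of fusion basis) and R-matrix (exchange phases), Fibonacci anyons.
F = [[1 / PHI, 1 / math.sqrt(PHI)],
     [1 / math.sqrt(PHI), -1 / PHI]]
R = [[cmath.exp(-4j * math.pi / 5), 0],
     [0, cmath.exp(3j * math.pi / 5)]]

SIGMA_1 = R                        # exchange of the first two anyons
SIGMA_2 = matmul(matmul(F, R), F)  # exchange of the last two: F R F, since F is its own inverse


if __name__ == "__main__":
    lhs = matmul(matmul(SIGMA_1, SIGMA_2), SIGMA_1)
    rhs = matmul(matmul(SIGMA_2, SIGMA_1), SIGMA_2)
    print("braid relation holds:", close(lhs, rhs))
    print("sigma_2 is unitary:", close(matmul(SIGMA_2, dagger(SIGMA_2)), IDENTITY))
```

For larger numbers of anyons the generators are no longer 2x2 and have to be assembled from many F and R moves, which is the bookkeeping a systematic method like the one described above is meant to automate.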

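Translated: 2023-07-06 16:03:06 Published: 2023-07-04

# Are machine learning technologies ready to be used for humanitarian work and development? ( http://arxiv.org/abs/2307.01891v1 )

License: see the linked page

Vedran Sekara, Márton Karsai, Esteban Moro, Dohyung Kim, Enrique Delamonica, Manuel Cebrian, Miguel Luengo-Oroz, Rebeca Moreno Jiménez, and Manuel Garcia-Herranz

(Reference translation) Novel digital data sources and tools such as machine learning (ML) and artificial intelligence (AI) could revolutionize data about development and help monitor and mitigate humanitarian problems. The prospect of applying new technologies to some of humanity's most pressing problems has attracted interest from outside the traditional disciplines that study and work on international development. Today, scientific communities in fields such as computational social science, network science, complex systems, human-computer interaction, machine learning, and the broader AI field are beginning to pay attention to these pressing issues. But are sophisticated data-driven tools ready to solve real-world problems with imperfect data and staggering complexity? We outline the current state of the art and identify the barriers that must be overcome before data-driven technologies become useful in humanitarian and development contexts. We argue that, without organized and purposeful effort, these new technologies risk falling short of their promised goals at best and, at worst, increasing inequality, amplifying discrimination, and infringing on human rights.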

Novel digital data sources and tools like machine learning (ML) and artificial intelligence (AI) have the potential to revolutionize data about development and can contribute to monitoring and mitigating humanitarian problems. The potential of applying novel technologies to solving some of humanity's most pressing issues has garnered interest outside the traditional disciplines studying and working on international development. Today, scientific communities in fields like Computational Social Science, Network Science, Complex Systems, Human Computer Interaction, Machine Learning, and the broader AI field are increasingly starting to pay attention to these pressing issues. However, are sophisticated data driven tools ready to be used for solving real-world problems with imperfect data and of staggering complexity? We outline the current state-of-the-art and identify barriers, which need to be surmounted in order for data-driven technologies to become useful in humanitarian and development contexts. We argue that, without organized and purposeful efforts, these new technologies risk at best falling short of promised goals, at worst they can increase inequality, amplify discrimination, and infringe upon human rights.

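Translated: 2023-07-06 16:02:51 Published: 2023-07-04

# Generalised linear response theory for the full quantum work statistics ( http://arxiv.org/abs/2307.01885v1 )

License: see the linked page

Giacomo Guarnieri, Jens Eisert, Harry J. D. Miller

(Reference translation) We consider a quantum system driven out of equilibrium by a small Hamiltonian perturbation. Building on the paradigmatic framework of linear response theory, we derive an expression for the full generating function of the dissipated work. Remarkably, all information about the distribution can be encoded in a single accessible quantity known as the relaxation function, which opens up new ways of using phenomenological models to study non-equilibrium fluctuations in complex quantum systems. Our results establish a number of refined thermodynamic constraints on the work statistics that apply to small but arbitrarily fast protocols and require no assumptions such as slow driving or weak coupling to an environment. Finally, our approach reveals a distinctly quantum signature in the work statistics that originates from underlying zero-point energy fluctuations. This increases the dispersion of the probability distribution at short driving times, a feature that can be used to observe non-classical effects in quantum thermodynamics.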

We consider a quantum system driven out of equilibrium via a small Hamiltonian perturbation. Building on the paradigmatic framework of linear response theory, we derive an expression for the full generating function of the dissipated work. Remarkably, we find that all information about the distribution can be encoded in a single accessible quantity known as the relaxation function, thus opening up new ways to use phenomenological models to study non-equilibrium fluctuations in complex quantum systems. Our results establish a number of refined thermodynamic constraints on the work statistics that apply to regimes of small but arbitrarily fast protocols, and do not require assumptions such as slow driving or weak coupling to an environment. Finally, our approach uncovers a distinctly quantum signature in the work statistics that originates from underlying zero-point energy fluctuations. This causes an increased dispersion of the probability distribution at short driving times, a feature that can be probed in efforts to witness non-classical effects in quantum thermodynamics.

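Translated: 2023-07-06 16:02:30 Published: 2023-07-04

# ProPILE: Probing Privacy Leakage in Large Language Models ( http://arxiv.org/abs/2307.01881v1 )

License: see the linked page

Siwon Kim, Sangdoo Yun, Hwaran Lee, Martin Gubri, Sungroh Yoon, Seong Joon Oh

(Reference translation) The rapid development and widespread use of large language models (LLMs) have raised serious concerns about the potential leakage of personally identifiable information (PII). These models are often trained on large amounts of web-collected data. In this paper we propose ProPILE, a new probing tool that helps data subjects, the owners of the PII, become aware of possible PII leakage in LLM-based services. ProPILE lets data subjects formulate prompts based on their own PII and evaluate the level of privacy intrusion in LLMs. We demonstrate its use on the OPT-1.3B model trained on the publicly available Pile dataset, and show how hypothetical data subjects can assess the likelihood that their PII contained in the Pile dataset is revealed. LLM service providers can also use ProPILE to evaluate their own PII leakage effectively, with more powerful prompts tuned specifically for their in-house models. The tool is a pioneering step toward giving data subjects awareness of and control over their own data on the web.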

The rapid advancement and widespread use of large language models (LLMs) have raised significant concerns regarding the potential leakage of personally identifiable information (PII). These models are often trained on vast quantities of web-collected data, which may inadvertently include sensitive personal data. This paper presents ProPILE, a novel probing tool designed to empower data subjects, or the owners of the PII, with awareness of potential PII leakage in LLM-based services. ProPILE lets data subjects formulate prompts based on their own PII to evaluate the level of privacy intrusion in LLMs. We demonstrate its application on the OPT-1.3B model trained on the publicly available Pile dataset. We show how hypothetical data subjects may assess the likelihood of their PII being included in the Pile dataset being revealed. ProPILE can also be leveraged by LLM service providers to effectively evaluate their own levels of PII leakage with more powerful prompts specifically tuned for their in-house models. This tool represents a pioneering step towards empowering the data subjects for their awareness and control over their own data on the web.

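Translated: 2023-07-06 16:02:15 Published: 2023-07-04

# Stability Analysis Framework for Particle-based Distance GANs with Wasserstein Gradient Flow ( http://arxiv.org/abs/2307.01879v1 )

License: see the linked page

Chuqi Chen, Wu Yue, Yang Xiang

(Reference translation) In this paper we study the training process of generative networks whose objective function is a type of probability density distance called a particle-based distance, such as MMD GAN, Cramér GAN, and EIEG GAN. These GANs often suffer from unstable training. We analyze the stability of their training process from the perspective of probability density dynamics. In our framework, the discriminator $D$ is regarded as a feature transformation that maps high-dimensional data into a feature space, while the generator $G$ maps random variables to samples that resemble real data in that feature space. This perspective lets us carry out a stability analysis of GAN training using the Wasserstein gradient flow of the probability density function. We find that the $\min_G \max_D E(G, D)$ formulation of GANs usually makes the training of the discriminator unstable. To address this, we add a stabilizing term to the discriminator loss function. We conduct experiments to validate the stability analysis and the stabilizing method.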

In this paper, we investigate the training process of generative networks that use a type of probability density distance named particle-based distance as the objective function, e.g. MMD GAN, Cramér GAN, EIEG GAN. However, these GANs often suffer from the problem of unstable training. In this paper, we analyze the stability of the training process of these GANs from the perspective of probability density dynamics. In our framework, we regard the discriminator $D$ in these GANs as a feature transformation mapping that maps high dimensional data into a feature space, while the generator $G$ maps random variables to samples that resemble real data in terms of feature space. This perspective enables us to perform stability analysis for the training of GANs using the Wasserstein gradient flow of the probability density function. We find that the training process of the discriminator is usually unstable due to the formulation of $\min_G \max_D E(G, D)$ in GANs. To address this issue, we add a stabilizing term in the discriminator loss function. We conduct experiments to validate our stability analysis and stabilizing method.
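As a concrete instance of a particle-based distance, the sketch below estimates the squared maximum mean discrepancy (MMD) between two sets of sample points with a Gaussian kernel. It is a minimal illustration under stated assumptions: the distance is computed directly on raw points rather than in a learned discriminator feature space, the biased (V-statistic) estimator is used, and the bandwidth and example data are arbitrary choices. It does not include the stabilizing term the abstract refers to.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points given as tuples of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two particle sets."""
    return (mean_kernel(xs, xs, bandwidth)
            + mean_kernel(ys, ys, bandwidth)
            - 2 * mean_kernel(xs, ys, bandwidth))


if __name__ == "__main__":
    real = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    generated = [(x + 3.0, y + 3.0) for x, y in real]  # same shape, shifted away
    print("MMD^2(real, real)      =", mmd_squared(real, real))
    print("MMD^2(real, generated) =", mmd_squared(real, generated))
```

The estimate depends only on the sample points themselves, which is what makes it natural to describe generator training as moving particles so that this distance shrinks.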

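Translated: 2023-07-06 16:01:59 Published: 2023-07-04

# KDSTM: Neural Semi-supervised Topic Modeling with Knowledge Distillation ( http://arxiv.org/abs/2307.01878v1 )

License: see the linked page

Weijie Xu, Xiaoyu Jiang, Jay Desai, Bin Han, Fuqin Yan and Francis Iannacci

(Reference translation) In text classification tasks, fine-tuning pretrained language models such as BERT and GPT-3 gives competitive accuracy, but both require pretraining on large text datasets. General topic modeling methods, by contrast, have the advantage of analyzing documents to extract meaningful word patterns without any pretraining. To bring the unsupervised insight extraction of topic modeling to text classification tasks, we develop Knowledge Distillation Semi-supervised Topic Modeling (KDSTM). KDSTM needs no pretrained embeddings and only a few labeled documents, and it is efficient to train, which makes it ideal for resource-constrained settings. Across a variety of datasets, the method outperforms existing supervised topic modeling methods in classification accuracy, robustness, and efficiency, and performs comparably to state-of-the-art weakly supervised text classification methods.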

In text classification tasks, fine-tuning pretrained language models like BERT and GPT-3 yields competitive accuracy; however, both methods require pretraining on large text datasets. In contrast, general topic modeling methods possess the advantage of analyzing documents to extract meaningful patterns of words without the need for pretraining. To leverage topic modeling's unsupervised insights extraction on text classification tasks, we develop the Knowledge Distillation Semi-supervised Topic Modeling (KDSTM). KDSTM requires no pretrained embeddings, few labeled documents and is efficient to train, making it ideal under resource constrained settings. Across a variety of datasets, our method outperforms existing supervised topic modeling methods in classification accuracy, robustness and efficiency and achieves similar performance compared to state-of-the-art weakly supervised text classification methods.

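Translated: 2023-07-06 16:01:42 Published: 2023-07-04

# Fast Private Kernel Density Estimation via Locality Sensitive Quantization ( http://arxiv.org/abs/2307.01877v1 )

License: see the linked page

Tal Wagner, Yonatan Naamad, Nina Mishra

(Reference translation) We study efficient mechanisms for differentially private kernel density estimation (DP-KDE). Prior work on the Gaussian kernel described algorithms whose running time is exponential in the number of dimensions $d$. This paper breaks the exponential barrier and shows how the KDE can be privately approximated in time linear in $d$, which makes it feasible for high-dimensional data. We also give improved bounds for low-dimensional data. The results are obtained through a general framework, which we call Locality Sensitive Quantization (LSQ), for building private KDE mechanisms wherever existing KDE approximation techniques apply. It lets us take several efficient non-private KDE methods, such as Random Fourier Features, the Fast Gauss Transform, and Locality Sensitive Hashing, and make them private in a black-box manner. Experiments show that the resulting DP-KDE mechanisms are fast and accurate on large datasets in both high and low dimensions.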

We study efficient mechanisms for differentially private kernel density estimation (DP-KDE). Prior work for the Gaussian kernel described algorithms that run in time exponential in the number of dimensions $d$. This paper breaks the exponential barrier, and shows how the KDE can privately be approximated in time linear in $d$, making it feasible for high-dimensional data. We also present improved bounds for low-dimensional data. Our results are obtained through a general framework, which we term Locality Sensitive Quantization (LSQ), for constructing private KDE mechanisms where existing KDE approximation techniques can be applied. It lets us leverage several efficient non-private KDE methods, like Random Fourier Features, the Fast Gauss Transform, and Locality Sensitive Hashing, and "privatize" them in a black-box manner. Our experiments demonstrate that our resulting DP-KDE mechanisms are fast and accurate on large datasets in both high and low dimensions.

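Translated: 2023-07-06 16:01:26 Published: 2023-07-04

# Approximate, Adapt, Anonymize (3A): a Framework for Privacy Preserving Training Data Release for Machine Learning ( http://arxiv.org/abs/2307.01875v1 )

License: see the linked page

Tamas Madl, Weijie Xu, Olivia Choudhury, Matthew Howard

(Reference translation) The availability of large amounts of informative data is essential to the success of machine learning. In domains with sensitive information, however, releasing high-utility data that protects individual privacy has proven difficult. Despite progress in the literature on differential privacy and generative modeling for privacy-preserving data release, only a few approaches optimize for machine learning utility. Most consider only statistical metrics of the data itself and fail to explicitly preserve the loss metrics of the machine learning models that are later trained on the generated data. In this paper we introduce 3A (Approximate, Adapt, Anonymize), a data release framework that maximizes data utility for machine learning while preserving differential privacy. We also describe a concrete implementation of the framework that uses mixture models to approximate, kernel-inducing points to adapt, and Gaussian differential privacy to anonymize a dataset, so that the result is both privacy-preserving and high-utility. We present experimental evidence of minimal discrepancy between the performance metrics of models trained on real and on privatized datasets when evaluated on held-out real data. We also compare against several privacy-preserving synthetic data generation models (such as differentially private generative adversarial networks) and report significant gains in classification performance metrics over state-of-the-art models. These favorable comparisons indicate that the framework is a promising research direction that increases the utility of low-risk synthetic data release for machine learning.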

The availability of large amounts of informative data is crucial for successful machine learning. However, in domains with sensitive information, the release of high-utility data which protects the privacy of individuals has proven challenging. Despite progress in differential privacy and generative modeling for privacy-preserving data release in the literature, only a few approaches optimize for machine learning utility: most approaches only take into account statistical metrics on the data itself and fail to explicitly preserve the loss metrics of machine learning models that are to be subsequently trained on the generated data. In this paper, we introduce a data release framework, 3A (Approximate, Adapt, Anonymize), to maximize data utility for machine learning, while preserving differential privacy. We also describe a specific implementation of this framework that leverages mixture models to approximate, kernel-inducing points to adapt, and Gaussian differential privacy to anonymize a dataset, in order to ensure that the resulting data is both privacy-preserving and high utility. We present experimental evidence showing minimal discrepancy between performance metrics of models trained on real versus privatized datasets, when evaluated on held-out real data. We also compare our results with several privacy-preserving synthetic data generation models (such as differentially private generative adversarial networks), and report significant increases in classification performance metrics compared to state-of-the-art models. These favorable comparisons show that the presented framework is a promising direction of research, increasing the utility of low-risk synthetic data release for machine learning.

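Translated: 2023-07-06 16:00:57 Published: 2023-07-04

# Non-relativistic spatiotemporal quantum reference frames ( http://arxiv.org/abs/2307.01874v1 )

License: see the linked page

Michael Suleymanov, Ismael L. Paiva, Eliahu Cohen

(Reference translation) Quantum reference frames have attracted renewed interest in recent years, because exploring them is relevant and instructive in many areas of quantum theory. Among the different types, position and time reference frames have drawn particular attention. Here we introduce and analyze a non-relativistic framework in which each system carries an internal clock in addition to its external (spatial) degree of freedom and can therefore serve as a spatiotemporal quantum reference frame. Among other applications of the framework, we show that even in simple scenarios without interactions, the relative uncertainty between clocks affects the relative spatial spread of the systems.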

Quantum reference frames have attracted renewed interest recently, as their exploration is relevant and instructive in many areas of quantum theory. Among the different types, position and time reference frames have captivated special attention. Here, we introduce and analyze a non-relativistic framework in which each system contains an internal clock, in addition to its external (spatial) degree of freedom and, hence, can be used as a spatiotemporal quantum reference frame. Among other applications of this framework, we show that even in simple scenarios with no interactions, the relative uncertainty between clocks affects the relative spatial spread of the systems.

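Translated: 2023-07-06 15:59:52 Published: 2023-07-04

# A hybrid machine learning framework for clad characteristics prediction in metal additive manufacturing ( http://arxiv.org/abs/2307.01872v1 )

License: see the linked page

Sina Tayebati, Kyu Taek Cho

(Reference translation) Over the past decade, metal additive manufacturing (MAM) has developed significantly, making it possible to fabricate complex parts, manufacture products from functionally graded materials, minimize waste, and customize at low cost. Despite these advantages, the complex nature of MAM processes makes it difficult to predict how processing parameters affect the characteristics of an MAM-printed clad. Machine learning (ML) techniques can help connect the physics underlying the process, and the processing parameters, to the clad characteristics. In this study we propose a hybrid approach that combines data from a calibrated multi-physics computational fluid dynamics (CFD) model with experimental research to prepare the necessary large dataset, and then uses a comprehensive framework of various ML models to predict and understand clad characteristics. We first compile an extensive dataset by fusing experimental data with data generated by the CFD model. The dataset contains key clad characteristics, including geometrical features such as width, height, and depth, labels identifying clad quality, and the processing parameters. Second, we train the ML models on two sets of processing parameters, machine setting parameters and physics-aware parameters, and use versatile ML models and reliable evaluation metrics to build a comprehensive, scalable learning framework for predicting clad geometry and quality. The framework can serve as a basis for clad characteristics control and process optimization. It resolves many challenges of conventional modeling methods in MAM by addressing data scarcity with a hybrid approach and by providing an efficient, accurate, and scalable platform for clad characteristics prediction and optimization.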

During the past decade, metal additive manufacturing (MAM) has experienced significant developments and gained much attention due to its ability to fabricate complex parts, manufacture products with functionally graded materials, minimize waste, and enable low-cost customization. Despite these advantages, predicting the impact of processing parameters on the characteristics of an MAM printed clad is challenging due to the complex nature of MAM processes. Machine learning (ML) techniques can help connect the physics underlying the process and processing parameters to the clad characteristics. In this study, we introduce a hybrid approach which involves utilizing the data provided by a calibrated multi-physics computational fluid dynamics (CFD) model and experimental research for preparing the essential big dataset, and then uses a comprehensive framework consisting of various ML models to predict and understand clad characteristics. We first compile an extensive dataset by fusing experimental data into the data generated using the developed CFD model for this study. This dataset comprises critical clad characteristics, including geometrical features such as width, height, and depth, labels identifying clad quality, and processing parameters. Second, we use two sets of processing parameters for training the ML models: machine setting parameters and physics-aware parameters, along with versatile ML models and reliable evaluation metrics to create a comprehensive and scalable learning framework for predicting clad geometry and quality. This framework can serve as a basis for clad characteristics control and process optimization. The framework resolves many challenges of conventional modeling methods in MAM by solving the issue of data scarcity using a hybrid approach and introducing an efficient, accurate, and scalable platform for clad characteristics prediction and optimization.

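Translated: 2023-07-06 15:59:34 Published: 2023-07-04

# Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners ( http://arxiv.org/abs/2307.01928v1 )

License: see the linked page

Allen Z. Ren, Anushri Dixit, Alexandra Bodrova, Sumeet Singh, Stephen Tu, Noah Brown, Peng Xu, Leila Takayama, Fei Xia, Jake Varley, Zhenjia Xu, Dorsa Sadigh, Andy Zeng, Anirudha Majumdar

(Reference translation) Large language models (LLMs) show a wide range of promising capabilities, from step-by-step planning to commonsense reasoning, that may be useful for robots, but they remain prone to confidently hallucinated predictions. In this work we present KnowNo, a framework for measuring and aligning the uncertainty of LLM-based planners so that they know when they do not know and ask for help when needed. KnowNo builds on the theory of conformal prediction to provide statistical guarantees on task completion while minimizing human help in complex multi-step planning settings. In experiments on a variety of simulated and real robot setups, involving tasks with different modes of ambiguity (for example, from spatial to numeric uncertainties and from human preferences to Winograd schemas), KnowNo compares favorably with modern baselines (which may involve ensembles or extensive prompt tuning) in improving efficiency and autonomy, while providing formal assurances. KnowNo can be used with LLMs out of the box, without model fine-tuning, and suggests a promising lightweight approach to modeling uncertainty that can complement and scale with the growing capabilities of foundation models. Website: https://robot-help.github.io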

Large language models (LLMs) exhibit a wide range of promising capabilities -- from step-by-step planning to commonsense reasoning -- that may provide utility for robots, but remain prone to confidently hallucinated predictions. In this work, we present KnowNo, which is a framework for measuring and aligning the uncertainty of LLM-based planners such that they know when they don't know and ask for help when needed. KnowNo builds on the theory of conformal prediction to provide statistical guarantees on task completion while minimizing human help in complex multi-step planning settings. Experiments across a variety of simulated and real robot setups that involve tasks with different modes of ambiguity (e.g., from spatial to numeric uncertainties, from human preferences to Winograd schemas) show that KnowNo performs favorably over modern baselines (which may involve ensembles or extensive prompt tuning) in terms of improving efficiency and autonomy, while providing formal assurances. KnowNo can be used with LLMs out of the box without model-finetuning, and suggests a promising lightweight approach to modeling uncertainty that can complement and scale with the growing capabilities of foundation models. Website: https://robot-help.github.io

翻訳日:2023-07-06 15:53:35 公開日:2023-07-04
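
The KnowNo abstract above does not spell out how conformal prediction decides when a planner should ask for help. A minimal Python sketch of split conformal prediction over multiple-choice plan options follows. The function names, the nonconformity score (one minus the model's probability for an option), and the toy numbers are assumptions made for illustration, not the authors' implementation.

```python
import math


def conformal_threshold(cal_scores, epsilon):
    """Return the split-conformal quantile of calibration nonconformity scores.

    cal_scores: scores 1 - p(true option) from a held-out calibration set (assumed).
    epsilon: target error rate, e.g. 0.25 for 75% coverage.
    """
    n = len(cal_scores)
    # Finite-sample-corrected rank of the (1 - epsilon) quantile.
    rank = math.ceil((n + 1) * (1 - epsilon))
    if rank > n:
        # Too few calibration points: every option must be kept.
        return float("inf")
    return sorted(cal_scores)[rank - 1]


def prediction_set(option_probs, qhat):
    """Keep every option whose nonconformity score is within the threshold."""
    return [opt for opt, p in option_probs.items() if 1.0 - p <= qhat]


def plan_step(option_probs, qhat):
    """Act when the prediction set is a single option, otherwise ask for help."""
    options = prediction_set(option_probs, qhat)
    if len(options) == 1:
        return ("act", options[0])
    return ("ask_for_help", options)
```

If calibration and test data are exchangeable, the prediction set contains the correct option with probability at least 1 - epsilon, so the planner asks a human only when more than one option (or none) survives the threshold.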

# ProtoDiffusion: 原型学習による分類自由拡散指導

ProtoDiffusion: Classifier-Free Diffusion Guidance with Prototype Learning ( http://arxiv.org/abs/2307.01924v1 )

ライセンス: Link先を確認

Gulcin Baykal, Halil Faruk Karagoz, Taha Binhuraib, Gozde Unal

(参考訳) 拡散モデルは生成モデルであり、より高い世代品質とより安定したトレーニングという観点で、他の生成モデルと比較して大きな利点を示している。しかし,拡散モデルの学習の必要性は大幅に増大した。本研究では,プロトタイプ学習を拡散モデルに組み込んで,元の拡散モデルよりも高速に高次品質を実現する。クラス埋め込みをランダムに初期化する代わりに、学習したクラスプロトタイプを条件付け情報として使用して拡散過程を導出する。 ProtoDiffusionと呼ばれる本手法は,ベースライン法と比較して訓練の初期段階で優れた性能を達成し,学習したプロトタイプを使用することでトレーニング時間を短縮することを示す。様々なデータセットと実験的な設定を用いてProtoDiffusionの性能を実証し、すべての設定で短時間で最高のパフォーマンスを達成する。

Diffusion models are generative models that have shown significant advantages compared to other generative models in terms of higher generation quality and more stable training. However, the computational cost of training diffusion models is considerably higher. In this work, we incorporate prototype learning into diffusion models to achieve high generation quality faster than the original diffusion model. Instead of randomly initialized class embeddings, we use separately learned class prototypes as the conditioning information to guide the diffusion process. We observe that our method, called ProtoDiffusion, achieves better performance in the early stages of training compared to the baseline method, signifying that using the learned prototypes shortens the training time. We demonstrate the performance of ProtoDiffusion using various datasets and experimental settings, achieving the best performance in shorter times across all settings.

翻訳日:2023-07-06 15:53:09 公開日:2023-07-04
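
The ProtoDiffusion abstract above describes conditioning on learned class prototypes instead of randomly initialized class embeddings. A minimal Python sketch of how such conditioning could plug into classifier-free guidance follows. The helper names, the zero "null" condition, and the guidance rule (1 + w) * eps_cond - w * eps_uncond, one common formulation of classifier-free guidance, are assumptions for illustration; the abstract does not specify these details.

```python
import random


def make_condition_table(prototypes, dim):
    """Build the conditioning table from separately learned class prototypes.

    prototypes: dict mapping class label -> prototype vector (assumed given).
    A zero vector under the key None stands in for the unconditional case.
    """
    table = {label: list(vec) for label, vec in prototypes.items()}
    table[None] = [0.0] * dim
    return table


def training_condition(label, table, drop_prob, rng):
    """Randomly drop the class condition so one network learns both modes."""
    if rng.random() < drop_prob:
        return table[None]
    return table[label]


def guided_noise(eps_cond, eps_uncond, w):
    """Combine conditional and unconditional noise predictions with weight w."""
    return [(1.0 + w) * c - w * u for c, u in zip(eps_cond, eps_uncond)]
```

With w = 0 the guided prediction reduces to the conditional one; larger w pushes samples further toward the class described by the prototype.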

# 計算社会科学における再現性

Computational Reproducibility in Computational Social Science ( http://arxiv.org/abs/2307.01918v1 )

ライセンス: Link先を確認

David Schoch, Chung-hong Chan, Claudia Wagner, Arnim Bleier

(参考訳) 過去10年間で、再現性と再現性の危機が科学界を揺るがしている。潜在的な解決策として、オープンサイエンスの実践は深く議論され、様々な分野で様々な成功を収めた。しかしながら,計算社会科学などの計算X分野における再現性のバイナリ定義は,結果が再現できるエージェントや条件について明示的でないため不十分である,と我々は主張する。本研究では, 理論的再現性を創出するが, 実用的, 検証された再現性をサポートしない「オープン洗浄」を避けるための定義を拡張し, 検証可能性の概念に基づく計算再現性の階層システムを導入する。検証可能な計算再現性、特に計算社会科学の分野における共通の障壁を特定し、共通アクセスや計算障壁を回避する方法について提案する。

In the last decade, replication and reproducibility crises have shaken the scientific landscape. As potential solutions, open science practices were heavily discussed and have been implemented with varying success in different disciplines. We argue, however, that the binary definition of reproducibility, specifically for computational-X disciplines such as computational social science, is insufficient since it is not explicit about the agents and conditions under which results can be reproduced. We expand the definition to avoid "open washing", the practice of fabricating theoretical reproducibility but not supporting practical or verified reproducibility, and introduce a tier system of computational reproducibility based on the concept of verifiability. We identify common barriers to verifiable computational reproducibility, specifically in the field of computational social science, and provide suggestions on how to circumvent common access and computational barriers.

翻訳日:2023-07-06 15:52:56 公開日:2023-07-04

# 複雑な海流中における不動容器のストランドングリスク:解析と制御

Stranding Risk for Underactuated Vessels in Complex Ocean Currents: Analysis and Controllers ( http://arxiv.org/abs/2307.01917v1 )

ライセンス: Link先を確認

Andreas Doering, Marius Wiggert, Hanna Krasowski, Manan Doshi, Pierre F.J. Lermusiaux and Claire J. Tomlin

(参考訳) 低推進の船は、目的地に向かうために強力な海流を利用することができる。近年の結果,予測誤差にもかかわらず,船が目的地に到達できる可能性が示唆された。しかし、これらの結果はこれらの船舶の安全性の重要な側面を考慮せず、その低推進力は電流の大きさよりはるかに小さいため、浅い地域、ゴミのパッチ、海運レーンなどの安全でない地域に必然的に押し込む電流になってしまう可能性がある。本研究は,北東太平洋における自由に浮かぶ船舶のストレッチングの危険性について検討した。少なくとも5.04%は90日以内に立ち往生する。次に、安全でない集合をハミルトン・ヤコビ多重時間到達可能性(HJ-MTR)にハード制約としてエンコードし、低計算コストで各ステップで再計画と等価なフィードバックポリシーを合成する。このポリシーを適用したクローズドループは、電流が分かっている場合に安全な動作を保証するが、現実的な状況では不完全な予測しかできない。東北太平洋の高リスク域を航行する船舶の大規模シミュレーションにより,このような現実的な状況において,本手法の安全性を実証する。我々は, 予測誤差が最大推力を超える場合でも, 新たな予測を毎日再計画することで, 安全性を高い確率で確保できることを見出した。本手法はベースライン上での安全性を著しく向上させ,目的地にタイムリーに船が到着する。

Low-propulsion vessels can take advantage of powerful ocean currents to navigate towards a destination. Recent results demonstrated that vessels can reach their destination with high probability despite forecast errors. However, these results do not consider the critical aspect of safety of such vessels: because of their low propulsion which is much smaller than the magnitude of currents, they might end up in currents that inevitably push them into unsafe areas such as shallow areas, garbage patches, and shipping lanes. In this work, we first investigate the risk of stranding for free-floating vessels in the Northeast Pacific. We find that at least 5.04% would strand within 90 days. Next, we encode the unsafe sets as hard constraints into Hamilton-Jacobi Multi-Time Reachability (HJ-MTR) to synthesize a feedback policy that is equivalent to re-planning at each time step at low computational cost. While applying this policy closed-loop guarantees safe operation when the currents are known, in realistic situations only imperfect forecasts are available. We demonstrate the safety of our approach in such realistic situations empirically with large-scale simulations of a vessel navigating in high-risk regions in the Northeast Pacific. We find that applying our policy closed-loop with daily re-planning on new forecasts can ensure safety with high probability even under forecast errors that exceed the maximal propulsion. Our method significantly improves safety over the baselines and still achieves a timely arrival of the vessel at the destination.

翻訳日:2023-07-06 15:52:42 公開日:2023-07-04

# 自律型農業における海藻成長の最大化:不確実な海流をナビゲートする不活性化システムの動的プログラミング手法

Maximizing Seaweed Growth on Autonomous Farms: A Dynamic Programming Approach for Underactuated Systems Navigating on Uncertain Ocean Currents ( http://arxiv.org/abs/2307.01916v1 )

ライセンス: Link先を確認

Matthias Killer, Marius Wiggert, Hanna Krasowski, Manan Doshi, Pierre F.J. Lermusiaux and Claire J. Tomlin

(参考訳) 海藻バイオマスは気候変動を緩和する大きな可能性を秘めているが、大規模で自律的なオープンオーシャン農場はそれを完全に活用する必要がある。このような農場は典型的には低い推進力を持ち、海流の影響を強く受けている。高成長域に到達するための非線形時間変化海流を利用して、海藻の成長を最大化するコントローラを設計したい。複雑なダイナミクスと劣駆動性は、たとえ海流が知られているとしても、これを難しくする。不確実性が増大する短期的不完全な予測のみが可能であれば、これはさらに難しい。実際の海流が分かっている場合に最適な成長値関数を効率的に解く動的計画法を提案する。さらに、現実のように予測しか得られない場合の3つの拡張を提示する: (1) 本手法で得られる値関数をフィードバックポリシーとして用いることで、すべての状態と時刻に対する成長最適な制御が得られ、各時間ステップでの再計画と等価な閉ループ制御が可能となり、予測誤差を緩和できる。(2) 季節平均の海流データを終端報酬として用いる、予測期間を超えた長期的な最適成長のためのフィードバックポリシー。(3) 海流推定の不確実性の増大を考慮するための、割引付き有限時間動的計画法(DP)の定式化。実際の太平洋海流シナリオにおける海藻養殖場の30日間のシミュレーションによるアプローチの評価を行った。本手法は,5日間の予測で最高の成長率の95.8%を達成できたことを示す。これにより, 実環境下での浮遊農地における低出力推進と海藻生育促進のための最適制御の可能性が確認された。

Seaweed biomass offers significant potential for climate mitigation, but large-scale, autonomous open-ocean farms are required to fully exploit it. Such farms typically have low propulsion and are heavily influenced by ocean currents. We want to design a controller that maximizes seaweed growth over months by taking advantage of the non-linear time-varying ocean currents for reaching high-growth regions. The complex dynamics and underactuation make this challenging even when the currents are known. This is even harder when only short-term imperfect forecasts with increasing uncertainty are available. We propose a dynamic programming-based method to efficiently solve for the optimal growth value function when true currents are known. We additionally present three extensions for the realistic case in which only forecasts are known: (1) our method's resulting value function can be used as a feedback policy to obtain the growth-optimal control for all states and times, allowing closed-loop control equivalent to re-planning at every time step and hence mitigating forecast errors, (2) a feedback policy for long-term optimal growth beyond forecast horizons using seasonal average current data as terminal reward, and (3) a discounted finite-time Dynamic Programming (DP) formulation to account for increasing ocean current estimate uncertainty. We evaluate our approach through 30-day simulations of floating seaweed farms in realistic Pacific Ocean current scenarios. Our method achieves 95.8% of the best possible growth using only 5-day forecasts. This confirms the feasibility of using low-power propulsion and optimal control for enhanced seaweed growth on floating farms under real-world conditions.

翻訳日:2023-07-06 15:52:16 公開日:2023-07-04
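
The seaweed-farm abstract above relies on a value function that doubles as a feedback policy, a terminal reward, and a discounted finite-time DP formulation. A toy Python sketch of discounted finite-horizon backward induction on a one-dimensional grid follows. The grid world, the integer current drift, the three-valued control set, and all names are assumptions for illustration and are far simpler than the continuous ocean model used by the authors.

```python
def backward_dp(growth, drift, horizon, gamma=0.95, terminal=None):
    """Discounted finite-horizon dynamic programming on a 1-D grid.

    growth[x]: growth collected while at cell x during one step.
    drift[x]: displacement (in cells) imposed by the current at cell x.
    terminal: optional terminal value per cell, e.g. a long-term estimate.
    Returns the value function at time 0 and a time-indexed feedback policy.
    """
    n = len(growth)
    controls = (-1, 0, 1)  # weak propulsion: at most one cell per step
    value = list(terminal) if terminal is not None else [0.0] * n
    policy = []
    for _ in range(horizon):  # sweep backward in time
        new_value, step_policy = [], []
        for x in range(n):
            best_u, best_v = None, float("-inf")
            for u in controls:
                # Next cell is current drift plus control, clamped to the grid.
                nxt = min(max(x + drift[x] + u, 0), n - 1)
                v = growth[x] + gamma * value[nxt]
                if v > best_v:
                    best_u, best_v = u, v
            new_value.append(best_v)
            step_policy.append(best_u)
        value = new_value
        policy.insert(0, step_policy)  # policy[t][x] is the control at time t
    return value, policy
```

Because the policy is stored for every cell and time step, a farm pushed off course by a forecast error simply looks up the control for the cell it actually reached, which is the sense in which a value function gives closed-loop control without re-planning.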

# ClimateLearn: 気象と気候モデリングのためのベンチマーク機械学習

ClimateLearn: Benchmarking Machine Learning for Weather and Climate Modeling ( http://arxiv.org/abs/2307.01909v1 )

ライセンス: Link先を確認

Tung Nguyen, Jason Jewik, Hritik Bansal, Prakhar Sharma, Aditya Grover

(参考訳) 気象と気候のモデリングは、気候変動の短期的および長期的影響を理解するための重要な取り組みであり、適応と緩和のための技術と政策作成を通知する。近年,気象予報や気候ダウンスケーリングといった中核的な問題を解決するため,機械学習に基づくデータ駆動手法の適用への関心が高まっている。有望な結果にもかかわらず、この進歩の多くは、再現性のための大規模でオープンソースな取り組みの欠如により、一貫性のないデータセットや不特定なデータセット、トレーニングのセットアップ、ドメイン科学者と人工知能研究者による評価によって損なわれている。我々は、データ駆動型気候科学のための機械学習モデルのトレーニングと評価を大幅に単純化する、オープンソースのPyTorchライブラリであるClimateLearnを紹介する。 ClimateLearnはデータセット処理のための総合的なパイプライン(例: ERA5、CMIP6、PRISM)、最先端のディープラーニングモデル(例:Transformers、ResNets)の実装、標準の気象・気候モデリングタスクの量的・質的評価からなる。これらの機能には、広範なドキュメント、コントリビューションガイド、クイックスタートチュートリアルを加えて、アクセスの拡大とコミュニティの成長を促進する。ライブラリの機能と重要な機能を紹介するため、包括的な予測およびダウンスケーリング実験も行いました。私たちの知る限り、ClimateLearnは、現代の機械学習システムによる気象と気候モデリングの研究を橋渡しするための、最初の大規模でオープンソースの取り組みです。私たちのライブラリは https://github.com/aditya-grover/climate-learn で公開されている。

Modeling weather and climate is an essential endeavor to understand the near- and long-term impacts of climate change, as well as inform technology and policymaking for adaptation and mitigation efforts. In recent years, there has been a surging interest in applying data-driven methods based on machine learning for solving core problems such as weather forecasting and climate downscaling. Despite promising results, much of this progress has been impaired due to the lack of large-scale, open-source efforts for reproducibility, resulting in the use of inconsistent or underspecified datasets, training setups, and evaluations by both domain scientists and artificial intelligence researchers. We introduce ClimateLearn, an open-source PyTorch library that vastly simplifies the training and evaluation of machine learning models for data-driven climate science. ClimateLearn consists of holistic pipelines for dataset processing (e.g., ERA5, CMIP6, PRISM), implementation of state-of-the-art deep learning models (e.g., Transformers, ResNets), and quantitative and qualitative evaluation for standard weather and climate modeling tasks. We supplement these functionalities with extensive documentation, contribution guides, and quickstart tutorials to expand access and promote community growth. We have also performed comprehensive forecasting and downscaling experiments to showcase the capabilities and key features of our library. To our knowledge, ClimateLearn is the first large-scale, open-source effort for bridging research in weather and climate modeling with modern machine learning systems. Our library is available publicly at https://github.com/aditya-grover/climate-learn.

翻訳日:2023-07-06 15:51:55 公開日:2023-07-04

# ZotCare: フレキシブルでパーソナライズ可能、そして拡張可能なmHealthサービスプロバイダ

ZotCare: A Flexible, Personalizable, and Affordable mHealth Service Provider ( http://arxiv.org/abs/2307.01905v1 )

ライセンス: Link先を確認

Sina Labbaf, Mahyar Abbasian, Iman Azimi, Nikil Dutt, and Amir M. Rahmani

(参考訳) インターネットに接続された健康デバイスの普及と、モバイル接続の普及により、信頼できるデジタルヘルスデータと、ジャスト・イン・タイムの介入を提供する可能性がある。しかし、これらの機会を健康研究に活用するには、モバイルヘルス(mhealth)アプリケーションの開発と展開が必要であり、研究者にとって重要な技術的課題となっている。既存のmHealthソリューションはこれらの課題のいくつかに対処する作業を進めてきたが、多くの場合、パーソナライズと適応のための時間と可利用性、柔軟性の面で不足している。 zotcareは、使用可能で柔軟なサービスを提供することで、これらの制限に対処し、研究者がmhealth研究にアクセスしやすく、コスト効率が高く、適応可能なソリューションを提供することを目指している。この記事では、ZotCareのサービスオーケストレーションに焦点を当て、mHealthリサーチ用のプログラム可能な環境を作成する能力を強調します。さらに,過去にも進行中のプロジェクトにおいても,ZotCareを利用した研究事例をいくつか紹介する。さらに,ZotCareをmHealth研究ソリューションとして検討している研究者に対して,リソースと情報を提供する。

The proliferation of Internet-connected health devices and the widespread availability of mobile connectivity have resulted in a wealth of reliable digital health data and the potential for delivering just-in-time interventions. However, leveraging these opportunities for health research requires the development and deployment of mobile health (mHealth) applications, which present significant technical challenges for researchers. While existing mHealth solutions have made progress in addressing some of these challenges, they often fall short in terms of time-to-use, affordability, and flexibility for personalization and adaptation. ZotCare aims to address these limitations by offering ready-to-use and flexible services, providing researchers with an accessible, cost-effective, and adaptable solution for their mHealth studies. This article focuses on ZotCare's service orchestration and highlights its capabilities in creating a programmable environment for mHealth research. Additionally, we showcase several successful research use cases that have utilized ZotCare, both in the past and in ongoing projects. Furthermore, we provide resources and information for researchers who are considering ZotCare as their mHealth research solution.

翻訳日:2023-07-06 15:51:24 公開日:2023-07-04

# 乱用言語分類器による偽因果関係の検証のための概念に基づく説明

Concept-Based Explanations to Test for False Causal Relationships Learned by Abusive Language Classifiers ( http://arxiv.org/abs/2307.01900v1 )

ライセンス: Link先を確認

Isar Nejadgholi, Svetlana Kiritchenko, Kathleen C. Fraser, and Esma Balkır

(参考訳) 分類器は、過剰表現された概念とラベルの間の誤った因果関係を学習する傾向があり、その結果、概念の過度な信頼と分類精度の妥協につながる。異なるモデルを比較し、特定の概念に過剰依存を識別できるメソッドを配置しておくことが不可欠である。大規模な英語データセットで訓練された3つのよく知られた乱用言語分類器について検討し,悪用ラベルの十分な特徴として学習すべきでない重要なシグナルである否定感情の概念に注目した。グローバル十分性の定義に動機づけられ、まず、すべての決定しきい値にまたがって設定された課題において、その正確性を評価することによって、分類器が学習した望ましくない依存関係を調べる。さらに,課題セットが必ずしも利用可能ではないことを認識し,概念がラベルに与える影響を評価するための概念ベースの説明指標を導入する。これらの説明により、概念とラベルの間で学んだ偽の大域的充足度について分類器を比較することができる。

Classifiers tend to learn a false causal relationship between an over-represented concept and a label, which can result in over-reliance on the concept and compromised classification accuracy. It is imperative to have methods in place that can compare different models and identify over-reliances on specific concepts. We consider three well-known abusive language classifiers trained on large English datasets and focus on the concept of negative emotions, which is an important signal but should not be learned as a sufficient feature for the label of abuse. Motivated by the definition of global sufficiency, we first examine the unwanted dependencies learned by the classifiers by assessing their accuracy on a challenge set across all decision thresholds. Further, recognizing that a challenge set might not always be available, we introduce concept-based explanation metrics to assess the influence of the concept on the labels. These explanations allow us to compare classifiers regarding the degree of false global sufficiency they have learned between a concept and a label.

翻訳日:2023-07-06 15:51:02 公開日:2023-07-04

# 変換プロトフォーム再構成

Transformed Protoform Reconstruction ( http://arxiv.org/abs/2307.01896v1 )

ライセンス: Link先を確認

Young Min Kim, Kalvin Chang, Chenxuan Cui and David Mortensen

(参考訳) プロトホルムの再構築は、娘言語の祖先言語における形態素や単語の出現を推測する作業である。 Meloni et al. (2021)は、RNNベースのエンコーダデコーダとアテンションモデルを用いて、ラテン文字のプロトフォーム再構築の最先端を達成した。我々は最新のseq2seqモデルであるtransformerでモデルを更新する。我々のモデルは,5言語にまたがる8,000コニャート,39種にまたがる800以上のコニャートからなる中国語データセット(Hou 2004)の2つの異なるデータセット上で,それらのモデルを比較した。また,本モデルに含まれる可能性のある系統信号についても検討する。私たちのコードはhttps://github.com/cmu-llab/acl-2023で公開されています。

Protoform reconstruction is the task of inferring what morphemes or words looked like in the ancestral languages of a set of daughter languages. Meloni et al. (2021) achieved the state-of-the-art on Latin protoform reconstruction with an RNN-based encoder-decoder with attention model. We update their model with the state-of-the-art seq2seq model: the Transformer. Our model outperforms their model on a suite of different metrics on two different datasets: their Romance data of 8,000 cognates spanning 5 languages and a Chinese dataset (Hou 2004) of 800+ cognates spanning 39 varieties. We also probe our model for potential phylogenetic signal contained in the model. Our code is publicly available at https://github.com/cmu-llab/acl-2023.

翻訳日:2023-07-06 15:50:46 公開日:2023-07-04

# EANet: 拡張アトリビュートベースのRGBTトラッカーネットワーク

EANet: Enhanced Attribute-based RGBT Tracker Network ( http://arxiv.org/abs/2307.01893v1 )

ライセンス: Link先を確認

Abbas Türkoğlu, Erdem Akagündüz

(参考訳) トラッキング対象は、特に咬合や照明の変化、動きのぼやきといった課題に直面した場合には、コンピュータビジョンにおいて難しい課題となることがある。ディープラーニングの最近の進歩は、これらの条件に挑戦する可能性を示している。しかし、ほとんどのディープラーニングベースのオブジェクトトラッカーは、可視帯域(RGB)イメージのみを使用する。熱赤外電磁波(TIR)は、困難な状況に直面した場合、その温度を含む物体に関する追加情報を提供する。本稿では,RGBと熱画像(RGBT)を融合した深層学習に基づく画像追跡手法を提案する。提案モデルは,特徴抽出器とトラッカーの2つの主成分から構成される。特徴抽出器は、RGBとTIR画像の両方の深い特徴を符号化する。トラッカーはこれらの機能を使用して、拡張された属性ベースのアーキテクチャを使用してオブジェクトを追跡する。本稿ではアグリゲーションモジュールを用いた属性固有の特徴選択の融合を提案する。提案手法はRGBT234 \cite{LiCLiang2018}とLasHeR \cite{LiLasher2021}データセットで評価され,RGBTオブジェクト追跡データセットとして最も広く使用されている。その結果,提案システムはこれらのデータセット上で,比較的少ないパラメータで,最先端のRGBTオブジェクトトラッカーよりも優れていた。

Tracking objects can be a difficult task in computer vision, especially when faced with challenges such as occlusion, changes in lighting, and motion blur. Recent advances in deep learning have shown promise in handling these challenging conditions. However, most deep learning-based object trackers only use visible band (RGB) images. Thermal infrared electromagnetic waves (TIR) can provide additional information about an object, including its temperature, when faced with challenging conditions. We propose a deep learning-based image tracking approach that fuses RGB and thermal images (RGBT). The proposed model consists of two main components: a feature extractor and a tracker. The feature extractor encodes deep features from both the RGB and the TIR images. The tracker then uses these features to track the object using an enhanced attribute-based architecture. We propose a fusion of attribute-specific feature selection with an aggregation module. The proposed methods are evaluated on the RGBT234 \cite{LiCLiang2018} and LasHeR \cite{LiLasher2021} datasets, which are the most widely used RGBT object-tracking datasets in the literature. The results show that the proposed system outperforms state-of-the-art RGBT object trackers on these datasets, with a relatively smaller number of parameters.

翻訳日:2023-07-06 15:50:30 公開日:2023-07-04

# グラフニューラルネットワークにおける特徴進化の神経崩壊の展望

A Neural Collapse Perspective on Feature Evolution in Graph Neural Networks ( http://arxiv.org/abs/2307.01951v1 )

ライセンス: Link先を確認

Vignesh Kothapalli, Tom Tirer, Joan Bruna

(参考訳) グラフ構造データの分類タスクでは,グラフニューラルネットワーク(gnns)がますます普及している。しかし,GNNにおけるグラフトポロジと特徴進化の相互作用はよく理解されていない。本稿では,確率的ブロックモデルグラフ上でのコミュニティ検出と共に,ノード単位の分類に着目し,神経崩壊(nc)現象のレンズを通して特徴進化を考察する。インスタンスワイドの深層分類器(例えば画像分類)をゼロの訓練誤差点を超えて訓練する場合、NCは最深部特徴のクラス内変数の減少を示し、それらのクラスは特定の対称構造にアライメントされる。まず、ノード単位の分類設定において、クラス内変数の減少が顕著であることを示す実証的研究から始めるが、インスタンス単位のケースで観測される範囲には及ばない。そして、この区別を理論的に研究する。具体的には、「最適」な数学的モデルでさえ、グラフは正確な崩壊を伴う最小値を持つために厳密な構造条件に従う必要があることを示す。興味深いことに、この条件は異種グラフにも有効であり、GNNの一般化を改善した最近の経験的研究と関係している。さらに, 理論モデルの勾配ダイナミクスを研究することにより, 経験的に観測される部分的崩壊の推理を与える。最後に,よく訓練されたgnnの層間におけるクラス間特徴変動の進化と,その挙動をスペクトル法と対比する。

Graph neural networks (GNNs) have become increasingly popular for classification tasks on graph-structured data. Yet, the interplay between graph topology and feature evolution in GNNs is not well understood. In this paper, we focus on node-wise classification, illustrated with community detection on stochastic block model graphs, and explore the feature evolution through the lens of the "Neural Collapse" (NC) phenomenon. When training instance-wise deep classifiers (e.g. for image classification) beyond the zero training error point, NC demonstrates a reduction in the deepest features' within-class variability and an increased alignment of their class means to certain symmetric structures. We start with an empirical study that shows that a decrease in within-class variability is also prevalent in the node-wise classification setting, however, not to the extent observed in the instance-wise case. Then, we theoretically study this distinction. Specifically, we show that even an "optimistic" mathematical model requires that the graphs obey a strict structural condition in order to possess a minimizer with exact collapse. Interestingly, this condition is viable also for heterophilic graphs and relates to recent empirical studies on settings with improved GNNs' generalization. Furthermore, by studying the gradient dynamics of the theoretical model, we provide reasoning for the partial collapse observed empirically. Finally, we present a study on the evolution of within- and between-class feature variability across layers of a well-trained GNN and contrast the behavior with spectral methods.

翻訳日:2023-07-06 15:43:07 公開日:2023-07-04
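
The neural-collapse abstract above tracks how within-class variability of the deepest features shrinks relative to the separation of class means. A minimal Python sketch of one such measurement follows. The function name is hypothetical, and the metric used here, the ratio of the traces of the within-class and between-class scatter, is a simplified stand-in assumed for illustration; neural-collapse studies often use a related quantity that involves the pseudo-inverse of the between-class covariance.

```python
def variability_ratio(features, labels):
    """Ratio of within-class to between-class scatter (traces only).

    features: list of equal-length feature vectors, one per node or sample.
    labels: class label for each feature vector.
    A value near 0 means features have collapsed onto their class means.
    """
    dim = len(features[0])

    def mean(rows):
        return [sum(r[d] for r in rows) / len(rows) for d in range(dim)]

    groups = {}
    for f, y in zip(features, labels):
        groups.setdefault(y, []).append(f)

    global_mean = mean(features)
    within, between = 0.0, 0.0
    for rows in groups.values():
        mu = mean(rows)
        # Squared distance of each sample to its own class mean.
        within += sum(sum((r[d] - mu[d]) ** 2 for d in range(dim)) for r in rows)
        # Squared distance of the class mean to the global mean, per sample.
        between += len(rows) * sum((mu[d] - global_mean[d]) ** 2 for d in range(dim))
    if between == 0.0:
        raise ValueError("class means coincide; the ratio is undefined")
    return within / between
```

Evaluating this ratio on the features of each layer, or at successive training epochs, gives a simple curve for checking whether collapse is exact, partial, or absent.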

# ビデオ探索のための因果ビデオ要約器

Causal Video Summarizer for Video Exploration ( http://arxiv.org/abs/2307.01947v1 )

ライセンス: Link先を確認

Jia-Hong Huang, Chao-Han Huck Yang, Pin-Yu Chen, Andrew Brown, Marcel Worring

(参考訳) 近年,ビデオ探索を支援する方法としてビデオ要約が提案されている。しかし、従来のビデオ要約モデルは、ユーザー固有のニーズとは無関係に固定されたビデオ要約のみを生成し、それゆえビデオ探索の有効性を制限している。マルチモーダルビデオ要約はこの問題に対処するために使用されるアプローチの1つである。マルチモーダルビデオ要約は、ビデオ入力とテキストベースのクエリ入力を有する。したがって,マルチモーダルビデオ要約には,映像入力とテキスト検索の相互作用を効果的にモデル化することが不可欠である。本研究では,CVS(Causal Video Summarizer)と呼ばれる因果関係に基づく新しい手法を提案し,マルチモーダルビデオ要約の課題に対処するために,映像とクエリ間の対話的情報を効果的にキャプチャする。提案手法は確率エンコーダと確率デコーダからなる。既存のマルチモーダル映像要約データセットの評価結果から,提案手法の精度が+5.4%,F1スコアが+4.92%向上すると,最先端の手法と比較して有効であることが示された。

Recently, video summarization has been proposed as a method to help video exploration. However, traditional video summarization models only generate a fixed video summary which is usually independent of user-specific needs and hence limits the effectiveness of video exploration. Multi-modal video summarization is one of the approaches utilized to address this issue. Multi-modal video summarization has a video input and a text-based query input. Hence, effective modeling of the interaction between a video input and text-based query is essential to multi-modal video summarization. In this work, a new causality-based method named Causal Video Summarizer (CVS) is proposed to effectively capture the interactive information between the video and query to tackle the task of multi-modal video summarization. The proposed method consists of a probabilistic encoder and a probabilistic decoder. Based on the evaluation of the existing multi-modal video summarization dataset, experimental results show that the proposed approach is effective, with an increase of +5.4% in accuracy and +4.92% in F1-score compared with the state-of-the-art method.

翻訳日:2023-07-06 15:42:42 公開日:2023-07-04

# 深層学習に基づく走査心電図デジタル化を実現するための心電図画像生成ツールボックス

A Synthetic Electrocardiogram (ECG) Image Generation Toolbox to Facilitate Deep Learning-Based Scanned ECG Digitization ( http://arxiv.org/abs/2307.01946v1 )

ライセンス: Link先を確認

Kshama Kodthalu Shivashankara and Reza Sameni

(参考訳) 医療データへのアクセスは、保護された健康情報(PHI)を含むため、しばしば制限される。個人識別可能な情報を含むレコードの使用に関するプライバシー上の懸念がある。近年,臨床診断と意思決定に深層学習に基づくアルゴリズムを適用している。しかし、ディープラーニングモデルはデータグラデーションであり、これらのモデルのトレーニングと評価のための医療データセットは比較的限られている。いわゆる \textit{digital twins}によるデータ拡張は、このニーズに対処する新たなテクニックである。本稿では,ecg画像のデジタイズアルゴリズムを開発するために,時系列データから人工心電図(ecg)画像を生成する新しい手法を提案する。標準ECG紙の背景に歪みのないECG画像を生成することにより、プライバシ保存方式で合成データを生成する。次に、ECG画像に手書きのテキストアーティファクト、しわ、クレーゼ、パースペクティブ変換を含む様々な歪みを適用する。人工物は、個人を特定することなく、合成的に生成される。使用例として,生理学のptb-xlデータセットから21,801個の大規模心電図画像データセットを作成し,18,869人の患者から12個のリード心電図時系列データを得た。合成データセットを用いた深部心電図デジタイズモデルを開発し,評価のために合成画像から時系列データへの変換を行った。 snr(signal-to-noise ratio)を算出し,画像のデジタル化品質とグラウンド・トゥルータのecg時系列を比較した。その結果,27$\pm$2.8\,dBの平均信号回復SNRが示され,深層学習モデルのトレーニングのための合成ECG画像データセットの重要性が示された。

Access to medical data is often limited as it contains protected health information (PHI). There are privacy concerns regarding using records containing personally identifiable information. Recent advancements have been made in applying deep learning-based algorithms for clinical diagnosis and decision-making. However, deep learning models are data-greedy, whereas the availability of medical datasets for training and evaluating these models is relatively limited. Data augmentation with so-called digital twins is an emerging technique to address this need. This paper presents a novel approach for generating synthetic electrocardiogram (ECG) images with realistic artifacts from time-series data for use in developing algorithms for digitization of ECG images. Synthetic data is generated in a privacy-preserving manner by generating distortionless ECG images on standard ECG paper background. Next, various distortions, including handwritten text artifacts, wrinkles, creases, and perspective transforms are applied to the ECG images. The artifacts are generated synthetically, without personally identifiable information. As a use case, we generated a large ECG image dataset of 21,801 records from the PhysioNet PTB-XL dataset, with 12-lead ECG time-series data from 18,869 patients. A deep ECG image digitization model was developed and trained on the synthetic dataset, and was employed to convert the synthetic images to time-series data for evaluation. The signal-to-noise ratio (SNR) was calculated to assess the image digitization quality vs the ground truth ECG time-series. The results show an average signal recovery SNR of 27 ± 2.8 dB, demonstrating the significance of the proposed synthetic ECG image dataset for training deep learning models.

翻訳日:2023-07-06 15:42:25 公開日:2023-07-04
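
The ECG digitization abstract above scores digitization quality by the signal-to-noise ratio between the recovered and ground-truth time series. A minimal Python sketch of that computation follows. The function name and the convention of treating the sample-wise difference as noise are assumptions for illustration, since the abstract does not give the exact formula.

```python
import math


def snr_db(reference, estimate):
    """Signal-to-noise ratio in decibels of an estimate against a reference.

    reference: ground-truth samples of one ECG lead.
    estimate: samples recovered by the digitization model, same length.
    """
    signal_power = sum(r * r for r in reference)
    # Everything the estimate gets wrong is counted as noise.
    noise_power = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if noise_power == 0.0:
        return float("inf")  # perfect reconstruction
    return 10.0 * math.log10(signal_power / noise_power)
```

A recovered signal whose error amplitude is one tenth of the reference amplitude scores about 20 dB on this scale.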

# 擬似ラベルによるクエリに基づくビデオ要約

Query-based Video Summarization with Pseudo Label Supervision ( http://arxiv.org/abs/2307.01945v1 )

ライセンス: Link先を確認

Jia-Hong Huang, Luka Murn, Marta Mrak, Marcel Worring

(参考訳) 手動でラベル付けされたクエリベースのビデオ要約のための既存のデータセットはコストがかかり、小さくなり、教師付きディープビデオ要約モデルの性能が制限される。セルフスーパービジョンは、プリテキストタスクを使い、擬似ラベルで余分なデータを取得し、教師付き深層モデルを事前学習する方法を定義することで、データスパーシティチャレンジに対処することができる。本研究では,入力映像からのセグメントレベルの擬似ラベルを導入し,プリテキストタスクと対象タスクの関係と,擬似ラベルと人間定義ラベルとの暗黙の関係を適切にモデル化する。擬似ラベルは、既存のフレームレベルラベルに基づいて生成される。より正確なクエリ依存のビデオ要約を作成するために、コンテキスト対応のクエリ表現を生成するセマンティックスブースターを提案する。さらに,視覚とテキストの対話的情報を取り込むための相互注意を提案する。 3つの一般的なビデオ要約ベンチマークを用いて提案手法を徹底的に検証する。実験の結果,提案手法は最先端の性能を実現することがわかった。

Existing datasets for manually labelled query-based video summarization are costly and thus small, limiting the performance of supervised deep video summarization models. Self-supervision can address the data sparsity challenge by using a pretext task and defining a method to acquire extra data with pseudo labels to pre-train a supervised deep model. In this work, we introduce segment-level pseudo labels from input videos to properly model both the relationship between a pretext task and a target task, and the implicit relationship between the pseudo label and the human-defined label. The pseudo labels are generated based on existing human-defined frame-level labels. To create more accurate query-dependent video summaries, a semantics booster is proposed to generate context-aware query representations. Furthermore, we propose mutual attention to help capture the interactive information between visual and textual modalities. Three commonly-used video summarization benchmarks are used to thoroughly validate the proposed approach. Experimental results show that the proposed video summarization algorithm achieves state-of-the-art performance.

翻訳日:2023-07-06 15:41:57 公開日:2023-07-04

# Text + Sketch:超低速度での画像圧縮

Text + Sketch: Image Compression at Ultra Low Rates ( http://arxiv.org/abs/2307.01944v1 )

ライセンス: Link先を確認

Eric Lei, Yiğit Berkay Uslu, Hamed Hassani, Shirin Saeedi Bidokhti

(参考訳) テキスト対画像生成モデルの最近の進歩は、短いテキスト記述から高品質な画像を生成する機能を提供する。これらの基盤モデルは、数十億規模のデータセットで事前トレーニングされた場合、ほとんどあるいはまったくトレーニングせずに、さまざまな下流タスクに有効である。自然な質問は、このようなモデルを画像圧縮にどのように適応するかである。本研究では,事前学習モデルを用いて,新しい低レートレジームをターゲットとした圧縮スキームを実装する手法について検討する。テキスト記述と副次的情報とを併用して,テキストのセマンティクスと空間構造を両立した高忠実度再構成を生成する方法を示す。エンド・ツー・エンドのトレーニングは行わないものの,非常に低ビットレートで学習した圧縮機の知覚的・意味的忠実度を向上できることを示す。

Recent advances in text-to-image generative models provide the ability to generate high-quality images from short text descriptions. These foundation models, when pre-trained on billion-scale datasets, are effective for various downstream tasks with little or no further training. A natural question to ask is how such models may be adapted for image compression. We investigate several techniques in which the pre-trained models can be directly used to implement compression schemes targeting novel low rate regimes. We show how text descriptions can be used in conjunction with side information to generate high-fidelity reconstructions that preserve both semantics and spatial structure of the original. We demonstrate that at very low bit-rates, our method can significantly improve upon learned compressors in terms of perceptual and semantic fidelity, despite no end-to-end training.

翻訳日:2023-07-06 15:41:42 公開日:2023-07-04

# スパース入力からの物理に基づく動き再ターゲティング

Physics-based Motion Retargeting from Sparse Inputs ( http://arxiv.org/abs/2307.01938v1 )

ライセンス: Link先を確認

Daniele Reda, Jungdam Won, Yuting Ye, Michiel van de Panne, Alexander Winkler

(参考訳) アバターは仮想世界でインタラクティブで没入的な体験を作り出すために重要である。これらのキャラクターをユーザーの動きを模倣するアニメーション化の課題の1つは、商用AR/VR製品がヘッドセットとコントローラのみで構成されており、ユーザーのポーズのセンサーデータが非常に限られていることである。もう一つの課題は、アバターは人間とは異なる骨格構造を持ち、それらの間のマッピングは不明確である。この作業では、これら2つの課題に対処します。本稿では,人間の分散センサデータから様々な形態の文字へ,リアルタイムに動きをターゲティングする手法を提案する。本手法は,物理シミュレータにおける文字制御ポリシーの学習に強化学習を用いる。私たちは、アバターごとにアーティスト生成アニメーションに頼ることなく、トレーニングのために人間のモーションキャプチャーデータのみを必要とします。これにより、大規模なモーションキャプチャデータセットを使用して、未確認のユーザをリアルタイムおよびスパースデータから追跡する一般的なポリシをトレーニングできます。我々は、恐竜、ネズミのような生き物、人間という、異なる骨格構造を持つ3つのキャラクターに対するアプローチの実現可能性を示した。下半身のセンサー情報がないにもかかわらず、アバターのポーズは驚くほどユーザーとよく合っていることが分かる。我々は,我々のフレームワークの重要な構成要素,特にキネマティック・リターゲティングのステップ,模倣,接触,行動報酬,および非対称なアクター・クリティカルな観察について論じる。さらに,アンバランス,ダンス,スポーツ動作など,さまざまな環境下での手法の堅牢性について検討する。

Avatars are important to create interactive and immersive experiences in virtual worlds. One challenge in animating these characters to mimic a user's motion is that commercial AR/VR products consist only of a headset and controllers, providing very limited sensor data of the user's pose. Another challenge is that an avatar might have a different skeleton structure than a human and the mapping between them is unclear. In this work we address both of these challenges. We introduce a method to retarget motions in real-time from sparse human sensor data to characters of various morphologies. Our method uses reinforcement learning to train a policy to control characters in a physics simulator. We only require human motion capture data for training, without relying on artist-generated animations for each avatar. This allows us to use large motion capture datasets to train general policies that can track unseen users from real and sparse data in real-time. We demonstrate the feasibility of our approach on three characters with different skeleton structure: a dinosaur, a mouse-like creature and a human. We show that the avatar poses often match the user surprisingly well, despite having no sensor information of the lower body available. We discuss and ablate the important components in our framework, specifically the kinematic retargeting step, the imitation, contact and action reward as well as our asymmetric actor-critic observations. We further explore the robustness of our method in a variety of settings including unbalancing, dancing and sports motions.

翻訳日:2023-07-06 15:41:27 公開日:2023-07-04

# 脆性破壊のモデル化のための再生カーネル近似のニューラルネットワークによる強化

A Neural Network-Based Enrichment of Reproducing Kernel Approximation for Modeling Brittle Fracture ( http://arxiv.org/abs/2307.01937v1 )

ライセンス: Link先を確認

Jonghyuk Baek, Jiun-Shyan Chen

(参考訳) 局所化の数値モデリングは、局所化経路を事前に定義しない粗い解が進化しているため、難しい課題である。数十年の努力にもかかわらず、局所化の進化を予測するために、革新的な離散化非依存の計算方法が必要である。本研究では、脆性破壊をモデル化するためのニューラルネットワーク強化再生カーネル粒子法(NN-RKPM)の改良版を提案する。提案手法では、粗大かつ均一な離散化に基づいて定義されたバックグラウンド再生カーネル(RK)近似を、ユニティフレームワークの分割の下でニューラルネットワーク(NN)近似により濃縮する。 NN近似では、ディープニューラルネットワークが関数空間内の正規化された不連続を自動的に見つけ、挿入する。 NNベースのエンリッチメント関数は、RKを単位パッチ関数の分割として使用するRK近似関数と共にパッチされる。エネルギーベース損失関数の最小化により, 位置, 方向, 変位分布をrk近似係数とともに決定する最適nnパラメータを求める。 NN-RK近似を正規化するために、損失関数にパラメトリック座標の空間勾配の制約を課す。収束特性の解析は,提案手法の解収束が保証されていることを示す。提案手法の有効性は,損傷伝播と分岐を含む一連の数値例によって実証された。

Numerical modeling of localizations is a challenging task due to the evolving rough solution in which the localization paths are not predefined. Despite decades of efforts, there is a need for innovative discretization-independent computational methods to predict the evolution of localizations. In this work, an improved version of the neural network-enhanced Reproducing Kernel Particle Method (NN-RKPM) is proposed for modeling brittle fracture. In the proposed method, a background reproducing kernel (RK) approximation defined on a coarse and uniform discretization is enriched by a neural network (NN) approximation under a Partition of Unity framework. In the NN approximation, the deep neural network automatically locates and inserts regularized discontinuities in the function space. The NN-based enrichment functions are then patched together with RK approximation functions using RK as a Partition of Unity patching function. The optimum NN parameters defining the location, orientation, and displacement distribution across location together with RK approximation coefficients are obtained via the energy-based loss function minimization. To regularize the NN-RK approximation, a constraint on the spatial gradient of the parametric coordinates is imposed in the loss function. Analysis of the convergence properties shows that the solution convergence of the proposed method is guaranteed. The effectiveness of the proposed method is demonstrated by a series of numerical examples involving damage propagation and branching.

翻訳日:2023-07-06 15:41:00 公開日:2023-07-04

# Concept2Box: 2視点知識グラフ学習のための合同幾何埋め込み

Concept2Box: Joint Geometric Embeddings for Learning Two-View Knowledge Graphs ( http://arxiv.org/abs/2307.01933v1 )

ライセンス: Link先を確認

Zijie Huang, Daheng Wang, Binxuan Huang, Chenwei Zhang, Jingbo Shang, Yan Liang, Zhengyang Wang, Xian Li, Christos Faloutsos, Yizhou Sun, Wei Wang

(参考訳) 知識グラフ埋め込み(KGE)は、多くの実世界のアプリケーションに大規模な関係データを埋め込むために広く研究されている。既存の手法では、多くのkgsが2つの基本的な異なるビューを持っているという事実を長い間無視してきた。通常、すべてのノードをベクトルとして1つの潜在空間に埋め込む。しかし、一つの幾何学的表現は2つのビューの構造的な違いを捉えず、概念の粒度に対する確率論的意味論を欠いている。双対幾何表現を用いたkgの2つのビューを共同で埋め込む新しいアプローチであるconcept2boxを提案する。我々は,階層構造や重なりや不一致といった複雑な関係を学習するボックス埋め込みを用いて概念をモデル化する。ボックスボリュームは概念の粒度として解釈できる。概念とは違って、エンティティをベクトルとしてモデル化します。概念箱埋め込みと実体ベクトル埋め込みのギャップを埋めるため,新しいベクトル-箱間距離測定法を提案し,両埋め込みを共同で学習する。パブリックDBpedia KGと新しい産業KGの両方の実験は、Concept2Boxの有効性を示した。

Knowledge graph embeddings (KGE) have been extensively studied to embed large-scale relational data for many real-world applications. Existing methods have long ignored the fact that many KGs contain two fundamentally different views: high-level ontology-view concepts and fine-grained instance-view entities. They usually embed all nodes as vectors in one latent space. However, a single geometric representation fails to capture the structural differences between the two views and lacks probabilistic semantics for the granularity of concepts. We propose Concept2Box, a novel approach that jointly embeds the two views of a KG using dual geometric representations. We model concepts with box embeddings, which learn the hierarchy structure and complex relations such as overlap and disjointness among them. Box volumes can be interpreted as concepts' granularity. Unlike concepts, entities are modeled as vectors. To bridge the gap between concept box embeddings and entity vector embeddings, we propose a novel vector-to-box distance metric and learn both embeddings jointly. Experiments on both the public DBpedia KG and a newly-created industrial KG showed the effectiveness of Concept2Box.

翻訳日:2023-07-06 15:40:44 公開日:2023-07-04
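
The box-and-vector idea in the abstract above can be illustrated with a small sketch. The abstract does not spell out the paper's vector-to-box distance, so the function below is an assumption: a plain Euclidean point-to-box distance that is zero inside the box. The names `box_volume` and `point_to_box_distance` are hypothetical and chosen only for this illustration.

```python
import math

def box_volume(lower, upper):
    """Volume of an axis-aligned box; a larger volume suggests a coarser concept."""
    return math.prod(max(u - l, 0.0) for l, u in zip(lower, upper))

def point_to_box_distance(point, lower, upper):
    """Euclidean distance from an entity vector to a concept box.

    Assumption for illustration only: the distance is 0 when the point lies
    inside the box, otherwise the distance to the nearest point of the box.
    This is not claimed to be the metric used in the paper.
    """
    gaps = [max(l - p, 0.0, p - u) for p, l, u in zip(point, lower, upper)]
    return math.sqrt(sum(g * g for g in gaps))

# A hypothetical 2-D concept box and two entity vectors.
concept_lower, concept_upper = [0.0, 0.0], [2.0, 1.0]
inside_entity = [1.0, 0.5]
outside_entity = [5.0, 5.0]

volume = box_volume(concept_lower, concept_upper)
d_inside = point_to_box_distance(inside_entity, concept_lower, concept_upper)
d_outside = point_to_box_distance(outside_entity, concept_lower, concept_upper)
print(volume, d_inside, d_outside)
```

A learned model would presumably train box corners and entity vectors jointly so that entities fall inside, or close to, the boxes of the concepts they instantiate.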

# MDI+: フレキシブルなランダムフォレストベースの特徴重要度フレームワーク

MDI+: A Flexible Random Forest-Based Feature Importance Framework ( http://arxiv.org/abs/2307.01932v1 )

ライセンス: Link先を確認

Abhineet Agarwal, Ana M. Kenney, Yan Shuo Tan, Tiffany M. Tang, Bin Yu

(参考訳) 不純物の平均減少(MDI)は、ランダム森林(RF)にとって重要な特徴である。 RFにおける各木の特徴である$X_k$に対するMDIは、X_k$で分割された決定切り株の集合に対する応答の線形回帰における非正規化$R^2$値と等価であることを示す。我々はこの解釈を用いて、MDI+と呼ばれるフレキシブルな特徴重視フレームワークを提案する。具体的には、MDI+は、アナリストが線形回帰モデルと$R^2$メトリックを正規化された一般化線形モデル(GLM)に置き換えることによって、MDIを一般化する。さらに、MDI+には、決定木の既知のバイアスを加法モデルやスムーズモデルに対して緩和する追加機能が含まれている。さらに,検証的データサイエンスの予測可能性,計算可能性,安定性フレームワークに基づいて,適切なglmとメトリックを選択する方法のガイダンスを提供する。広範囲なデータインスパイアされたシミュレーションでは、MDI+は信号の特徴を特定する上で、一般的な特徴の重要性を著しく上回っている。また,MDI+を薬物反応予測と乳癌サブタイプ分類の2つの実例に適用した。 MDI+は,既存の特徴重要度よりも安定性が著しく高い,確立された予測遺伝子を抽出する。すべてのコードとモデルは、github上のpythonパッケージでリリースされている。

Mean decrease in impurity (MDI) is a popular feature importance measure for random forests (RFs). We show that the MDI for a feature $X_k$ in each tree in an RF is equivalent to the unnormalized $R^2$ value in a linear regression of the response on the collection of decision stumps that split on $X_k$. We use this interpretation to propose a flexible feature importance framework called MDI+. Specifically, MDI+ generalizes MDI by allowing the analyst to replace the linear regression model and $R^2$ metric with regularized generalized linear models (GLMs) and metrics better suited for the given data structure. Moreover, MDI+ incorporates additional features to mitigate known biases of decision trees against additive or smooth models. We further provide guidance on how practitioners can choose an appropriate GLM and metric based upon the Predictability, Computability, Stability framework for veridical data science. Extensive data-inspired simulations show that MDI+ significantly outperforms popular feature importance measures in identifying signal features. We also apply MDI+ to two real-world case studies on drug response prediction and breast cancer subtype classification. We show that MDI+ extracts well-established predictive genes with significantly greater stability compared to existing feature importance measures. All code and models are released in a full-fledged Python package on GitHub.

翻訳日:2023-07-06 15:40:24 公開日:2023-07-04
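
The equivalence between MDI and an unnormalized $R^2$ stated in the abstract above can be checked by hand in the simplest case: a single split applied to the whole sample. The sketch below assumes squared-error impurity and uses made-up data; it compares the impurity decrease of the split with the explained sum of squares from an ordinary least-squares fit on the stump indicator. It is not the MDI+ package, and the function names are hypothetical.

```python
def sum_of_squares(values):
    """Sum of squared deviations from the mean (n times the variance)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def impurity_decrease(y, go_left):
    """Unnormalized squared-error impurity decrease of one split."""
    left = [v for v, g in zip(y, go_left) if g]
    right = [v for v, g in zip(y, go_left) if not g]
    return sum_of_squares(y) - sum_of_squares(left) - sum_of_squares(right)

def explained_sum_of_squares(y, go_left):
    """Explained sum of squares of an OLS fit (with intercept) on the stump indicator."""
    z = [0.0 if g else 1.0 for g in go_left]
    n = len(y)
    z_bar, y_bar = sum(z) / n, sum(y) / n
    s_zz = sum((a - z_bar) ** 2 for a in z)
    s_zy = sum((a - z_bar) * (b - y_bar) for a, b in zip(z, y))
    return s_zy ** 2 / s_zz

# Toy data (made up): one feature, one split at x < 0.5.
x = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
y = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
go_left = [xi < 0.5 for xi in x]

mdi_like = impurity_decrease(y, go_left)
r2_like = explained_sum_of_squares(y, go_left)
print(mdi_like, r2_like)
```

Both quantities agree up to floating-point error. Dividing either by the total sum of squares gives the usual $R^2$, which is why the abstract speaks of an unnormalized value.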

# バックプロパゲーションを伴わない心電図信号特徴の学習

Learning ECG signal features without backpropagation ( http://arxiv.org/abs/2307.01930v1 )

ライセンス: Link先を確認

Péter Pósfay, Marcell T. Kurbucz, Péter Kovács, Antal Jakovác

(参考訳) 表現学習は、分類や予測のような下流タスクの有効性、範囲、適用性を高める有用な特徴を持つ生データ表現の効率的な方法を見つけることを目的として、機械学習における重要な研究領域となっている。本稿では,時系列型データの表現を生成する新しい手法を提案する。この方法は、データ駆動の方法でコンパクト表現を構築するための理論物理学からのアイデアに依存しており、データの基本構造とタスク固有の情報の両方を捉えることができ、直感的で解釈可能で検証可能なままである。本手法は,特定のクラスに属するサンプル間の共有特性を効果的に把握できる線形法則を同定することを目的とする。その後、これらの法則を利用して分類子非依存表現を前方に生成することで、一般化された設定で適用されるようになる。本稿では,ECG信号分類の課題に対するアプローチの有効性を示す。

Representation learning has become a crucial area of research in machine learning, as it aims to discover efficient ways of representing raw data with useful features to increase the effectiveness, scope and applicability of downstream tasks such as classification and prediction. In this paper, we propose a novel method to generate representations for time series-type data. This method relies on ideas from theoretical physics to construct a compact representation in a data-driven way, and it can capture both the underlying structure of the data and task-specific information while still remaining intuitive, interpretable and verifiable. This novel methodology aims to identify linear laws that can effectively capture a shared characteristic among samples belonging to a specific class. By subsequently utilizing these laws to generate a classifier-agnostic representation in a forward manner, they become applicable in a generalized setting. We demonstrate the effectiveness of our approach on the task of ECG signal classification, achieving state-of-the-art performance.

翻訳日:2023-07-06 15:39:59 公開日:2023-07-04

# 三面体による形状表現と生成のためのハイブリッドニューラル拡散型流れ

Hybrid Neural Diffeomorphic Flow for Shape Representation and Generation via Triplane ( http://arxiv.org/abs/2307.01957v1 )

ライセンス: Link先を確認

Kun Han, Shanlin Sun, Xiaohui Xie

(参考訳) Deep Implicit Functions (DIF) はそのコンパクトさと連続表現能力のために3Dコンピュータビジョンで人気を博している。しかしながら、DIFでエンコードされた形状にまたがる密な対応と意味関係への対処は依然として重要な課題であり、テクスチャ転送や形状解析への応用は制限されている。さらに,DIFを用いた3次元形状生成における最近の取り組みは,対応やトポロジー保存を無視することが多い。本稿では,下層の表現を暗黙的に学習し,複雑な密な対応を明示的に軸に整列した三面体特徴に分解する手法であるHNDF (Hybrid Neural Diffeomorphic Flow) を提案する。局所ミニマに閉じ込められた準最適表現を避けるために,局所対応と大域対応の両方を捉えるハイブリッド監督を提案する。新しい3次元形状を直接生成する従来の手法とは異なり、生成された三面体特徴によって変形が符号化される微分同相フローを介して、テンプレート形状を変形させることによる形状生成の考え方をさらに探求する。既存の2次元拡散モデルを利用して, 生成した三面体特徴を通じ, 高品質で多様な3次元微分同相フローを生成し, テンプレート形状との位相的一貫性を確保する。医用画像臓器分割データセットに関する広範な実験により、3次元形状表現と生成におけるHNDFの有効性を評価する。

Deep Implicit Functions (DIFs) have gained popularity in 3D computer vision due to their compactness and continuous representation capabilities. However, addressing dense correspondences and semantic relationships across DIF-encoded shapes remains a critical challenge, limiting their applications in texture transfer and shape analysis. Moreover, recent endeavors in 3D shape generation using DIFs often neglect correspondence and topology preservation. This paper presents HNDF (Hybrid Neural Diffeomorphic Flow), a method that implicitly learns the underlying representation and decomposes intricate dense correspondences into explicitly axis-aligned triplane features. To avoid suboptimal representations trapped in local minima, we propose hybrid supervision that captures both local and global correspondences. Unlike conventional approaches that directly generate new 3D shapes, we further explore the idea of shape generation with a deformed template shape via diffeomorphic flows, where the deformation is encoded by the generated triplane features. Leveraging a pre-existing 2D diffusion model, we produce high-quality and diverse 3D diffeomorphic flows through generated triplane features, ensuring topological consistency with the template shape. Extensive experiments on medical image organ segmentation datasets evaluate the effectiveness of HNDF in 3D shape representation and generation.

翻訳日:2023-07-06 15:32:18 公開日:2023-07-04

# 正則化EMアルゴリズム

Algorithme EM régularisé ( http://arxiv.org/abs/2307.01955v1 )

ライセンス: Link先を確認

Pierre Houdouin and Matthieu Jonckheere and Frederic Pascal

(参考訳) expectation-Maximization (EM) アルゴリズムはガウス混合モデル(GMM)を扱う際の最大推定値を計算するために広く用いられている反復アルゴリズムである。サンプルサイズがデータ次元よりも小さい場合、これは特異もしくは条件の悪い共分散行列となり、結果として性能が低下する可能性がある。本稿では,より少ないサンプルサイズに対応するために,事前知識を効率的に活用するEMアルゴリズムの正規化バージョンを提案する。本手法は,正規化推定が共分散行列更新の正定性を保証するペナルティ化gmmの確率を最大化することを目的としている。最後に, 実データを用いた実験では, 提案アルゴリズムの性能向上を強調する。

The Expectation-Maximization (EM) algorithm is a widely used iterative algorithm for computing maximum likelihood estimates when dealing with Gaussian Mixture Models (GMMs). When the sample size is smaller than the data dimension, this can lead to singular or poorly conditioned covariance matrices and, thus, to reduced performance. This paper presents a regularized version of the EM algorithm that efficiently uses prior knowledge to cope with a small sample size. This method aims to maximize a penalized GMM likelihood, where regularized estimation may ensure positive definiteness of covariance matrix updates by shrinking the estimators towards some structured target covariance matrices. Finally, experiments on real data highlight the good performance of the proposed algorithm for clustering purposes.

翻訳日:2023-07-06 15:31:55 公開日:2023-07-04
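
The shrinkage idea in the abstract above can be sketched in a few lines. The example assumes the simplest possible target, a scaled identity matrix, and a fixed shrinkage weight `rho`; the abstract only says "structured target covariance matrices", so both choices are assumptions made for illustration. With two samples in three dimensions the sample covariance is singular, while the shrunk version passes a Cholesky-based positive-definiteness check.

```python
import math

def sample_covariance(points):
    """Maximum-likelihood sample covariance (divides by n)."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    return [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / n
             for j in range(d)] for i in range(d)]

def shrink_towards_identity(cov, rho):
    """Convex combination of cov and a scaled identity target (assumed target)."""
    d = len(cov)
    scale = sum(cov[i][i] for i in range(d)) / d  # average variance
    return [[(1.0 - rho) * cov[i][j] + (rho * scale if i == j else 0.0)
             for j in range(d)] for i in range(d)]

def is_positive_definite(matrix, tol=1e-12):
    """Attempt a Cholesky factorization; fail on any non-positive pivot."""
    d = len(matrix)
    chol = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = matrix[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    return False
                chol[i][j] = math.sqrt(s)
            else:
                chol[i][j] = s / chol[j][j]
    return True

# Two samples in three dimensions: fewer samples than dimensions.
points = [[1.0, 2.0, 3.0], [2.0, 4.0, 7.0]]
raw = sample_covariance(points)
regularized = shrink_towards_identity(raw, rho=0.1)
print(is_positive_definite(raw), is_positive_definite(regularized))
```

In an EM loop, a step of this kind would replace the plain covariance update of each mixture component in the M-step; how the weight and the target are chosen is the part the paper itself addresses.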

# FEMDA: ロバストで柔軟な分類手法

FEMDA: Une méthode de classification robuste et flexible ( http://arxiv.org/abs/2307.01954v1 )

ライセンス: Link先を確認

Pierre Houdouin and Matthieu Jonckheere and Frederic Pascal

(参考訳) 線形および二次判別解析(ldaおよびqda)はよく知られた古典的手法であるが、非ガウス分布および/または汚染データセットに苦しむことがある。本稿では,各データ点を任意の楕円対称(ES)分布と独自の任意のスケールパラメータで描画する,新しい識別分析手法のデータのスケール変化に対するロバスト性について検討する。このようなモデルは、おそらく非常に異質で、独立で、特定されていない分散サンプルを可能にする。導出される新しい決定規則は、他の最先端手法と比較して、データのスケール変更に対して単純で高速で堅牢である

Linear and Quadratic Discriminant Analysis (LDA and QDA) are well-known classical methods but can heavily suffer from non-Gaussian distributions and/or contaminated datasets, mainly because of the underlying Gaussian assumption that is not robust. This paper studies the robustness to scale changes in the data of a new discriminant analysis technique where each data point is drawn from its own arbitrary Elliptically Symmetrical (ES) distribution with its own arbitrary scale parameter. Such a model allows for possibly very heterogeneous, independent but non-identically distributed samples. The new decision rule derived is simple, fast, and robust to scale changes in the data compared to other state-of-the-art methods.

翻訳日:2023-07-06 15:31:42 公開日:2023-07-04

# 静止状態fMRIを用いた機能的脳ネットワークの自動認識モデルの構築に向けて

Toward more frugal models for functional cerebral networks automatic recognition with resting-state fMRI ( http://arxiv.org/abs/2307.01953v1 )

ライセンス: Link先を確認

Lukman Ismaila, Pejman Rasti, Jean-Michel Lemée, David Rousseau

(参考訳) 古典的畳み込みニューラルネットワークに基づくモデルが優れた性能を示した機械学習の状況について述べる。我々はスーパーボクセル(supervoxel)という形で異なる符号化技術を調査し、性能の低下を追跡しながらモデルの複雑さを減らすためにグラフを作成する。このアプローチは、脳腫瘍患者の安静時機能ネットワークの認識タスクについて説明する。超ボクセルをコードするグラフは、画像から機能的脳ネットワークの活性化特性を保存し、cnnモデルの性能を維持しながらモデルパラメータを26倍最適化する。

We consider a machine learning setting in which models based on classical convolutional neural networks have shown good performance. We investigate different encoding techniques, first in the form of supervoxels and then graphs, to reduce the complexity of the model while tracking the loss of performance. This approach is illustrated on a recognition task of resting-state functional networks for patients with brain tumors. Graphs encoding supervoxels preserve the activation characteristics of functional brain networks from images and reduce the number of model parameters by a factor of 26 while maintaining CNN model performance.

翻訳日:2023-07-06 15:31:30 公開日:2023-07-04

# SDXL:高分解能画像合成のための潜時拡散モデルの改良

SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis ( http://arxiv.org/abs/2307.01952v1 )

ライセンス: Link先を確認

Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, Robin Rombach

(参考訳) テキスト・画像合成のための潜在拡散モデルSDXLを提案する。従来のStable Diffusionと比較して、SDXLは3倍の規模のUNetバックボーンを利用する。モデルパラメータの増加は、主に注意ブロックの増加と、SDXLが第2のテキストエンコーダを使用することによる、より大きなクロスアテンションコンテキストに起因する。複数の新しい条件付けスキームを設計し,複数のアスペクト比でSDXLを訓練する。また,SDXLが生成する試料の視覚的忠実度を改善するために,ポストホックなimage-to-image技術を用いる改良モデルを導入する。 SDXLは従来のStable Diffusionと比較して大幅に性能が向上し,ブラックボックスの最先端画像生成器と競合する結果が得られることを示した。大規模モデルトレーニングと評価におけるオープンリサーチの推進と透明性向上の精神において、コードとモデルのウェイトへのアクセスは https://github.com/Stability-AI/generative-models で提供します。

We present SDXL, a latent diffusion model for text-to-image synthesis. Compared to previous versions of Stable Diffusion, SDXL leverages a three times larger UNet backbone: The increase of model parameters is mainly due to more attention blocks and a larger cross-attention context as SDXL uses a second text encoder. We design multiple novel conditioning schemes and train SDXL on multiple aspect ratios. We also introduce a refinement model which is used to improve the visual fidelity of samples generated by SDXL using a post-hoc image-to-image technique. We demonstrate that SDXL shows drastically improved performance compared to the previous versions of Stable Diffusion and achieves results competitive with those of black-box state-of-the-art image generators. In the spirit of promoting open research and fostering transparency in large model training and evaluation, we provide access to code and model weights at https://github.com/Stability-AI/generative-models

翻訳日:2023-07-06 15:31:17 公開日:2023-07-04

# 3次元スーパービジョンのない複数2次元画像からのニューラル3次元シーン再構成

Neural 3D Scene Reconstruction from Multiple 2D Images without 3D Supervision ( http://arxiv.org/abs/2306.17643v3 )

ライセンス: Link先を確認

Yi Guo, Che Sun, Yunde Jia, and Yuwei Wu

(参考訳) 室内シーンにおける複雑な形状と低テクスチャ領域の再構成において,ニューラル3次元シーン再構成法は印象的な性能を達成した。しかし,これらの手法は,リアルタイムの取得に要する費用と時間を要する3Dデータに大きく依存している。本稿では,平面制約下でのスパース深度を用いてシーンを3次元監督せずに再構成するニューラル再構成手法を提案する。シーンを表現するために,符号付き距離関数フィールド,色フィールド,確率フィールドを導入する。我々は、これらのフィールドを最適化し、2D画像で識別可能な光線マーキングを監督することでシーンを再構築する。幾何的制約により得られた深さの少ない複雑な幾何シーン領域の再構成品質を向上させる。幾何学的制約プロジェクト3Dは、異なる2D画像に類似した特徴を持つ類似した外観の領域に表面を向ける。我々は平面制約を課し、屋内の床に平行あるいは垂直に大きな平面を作る。 2つの制約は、シーンの正確で滑らかな幾何学構造を再構築するのに役立つ。提案手法は,ScanNetデータセット上で3次元監視を行う既存手法と比較して,競争性能が向上する。

Neural 3D scene reconstruction methods have achieved impressive performance when reconstructing complex geometry and low-textured regions in indoor scenes. However, these methods heavily rely on 3D data, which is costly and time-consuming to obtain in the real world. In this paper, we propose a novel neural reconstruction method that reconstructs scenes using sparse depth under plane constraints without 3D supervision. We introduce a signed distance function field, a color field, and a probability field to represent a scene. We optimize these fields to reconstruct the scene by using differentiable ray marching with accessible 2D images as supervision. We improve the reconstruction quality of scene regions with complex geometry using sparse depth obtained from the geometric constraints. The geometric constraints project 3D points on the surface to similar-looking regions with similar features in different 2D images. We impose the plane constraints to make large planes parallel or perpendicular to the indoor floor. Both constraints help reconstruct accurate and smooth geometry structures of the scene. Without 3D supervision, our method achieves competitive performance compared with existing methods that use 3D supervision on the ScanNet dataset.

翻訳日:2023-07-06 10:52:22 公開日:2023-07-04

# 分類システムにおける説明のための統一論理枠組み

A unified logical framework for explanations in classifier systems ( http://arxiv.org/abs/2105.14452v7 )

ライセンス: Link先を確認

Xinghan Liu and Emiliano Lorini

(参考訳) 近年では、説明可能なAI(XAI)分野におけるバイナリ分類器の説明において、ブール関数に対する新たな関心が高まっている。ブール関数の標準的なアプローチは命題論理である。我々は,二項入力分類器とその特性に関する推論をサポートするceteris paribusの性質のモーダル言語を提案する。我々は、分類子モデルの族を研究し、言語の濃度に関する2つの証明体系として公理化し、我々の公理学の完全性を示す。さらに、我々の様相言語に対する充足可能性チェック問題は無限変数の場合ではnexptime-completeであり、有限変数の場合では多項式となることを証明した。さらに、無限変数の場合において、我々の言語の興味深いNPフラグメントを同定する。我々はこの言語を,帰納的,対比的,反事実的説明,バイアスを含む様々な説明概念と同様に,反事実条件を形式化するために活用する。最後に,この言語の2つの拡張について述べる: 代入可能分類器変更の概念による動的拡張と,実際の入力に対する分類器の不確実性を表現できる認識的拡張である。

Recent years have witnessed a renewed interest in Boolean functions for explaining binary classifiers in the field of explainable AI (XAI). The standard approach to Boolean functions is propositional logic. We present a modal language of a ceteris paribus nature which supports reasoning about binary input classifiers and their properties. We study a family of classifier models, axiomatize it with two proof systems, depending on the cardinality of the language, and show completeness of our axiomatics. Moreover, we prove that the satisfiability checking problem for our modal language is NEXPTIME-complete in the infinite-variable case, while it becomes polynomial in the finite-variable case. We furthermore identify an interesting NP fragment of our language in the infinite-variable case. We leverage the language to formalize counterfactual conditionals as well as a variety of notions of explanation including abductive, contrastive and counterfactual explanations, and biases. Finally, we present two extensions of our language: a dynamic extension by the notion of assignment enabling classifier change and an epistemic extension in which the classifier's uncertainty about the actual input can be represented.

翻訳日:2023-07-06 10:51:25 公開日:2023-07-04

# アクティブフォーミングによる事前学習による言語可塑性の向上

Improving Language Plasticity via Pretraining with Active Forgetting ( http://arxiv.org/abs/2307.01163v2 )

ライセンス: Link先を確認

Yihong Chen, Kelly Marchisio, Roberta Raileanu, David Ifeoluwa Adelani, Pontus Stenetorp, Sebastian Riedel, Mikel Artetxe

(参考訳) プリトレーニング言語モデル(plm)は現在、自然言語処理の主要なモデルである。ダウンストリームのパフォーマンスは印象的なものですが、新しい言語にplmを適用するのは困難です。以前の作業では、新しい言語用の新しい埋め込みレイヤを学ぶことでこの問題に対処できることが示されているが、データと計算非効率の両方がそうである。本稿では,新しい言語に迅速に適応できるPLMの作成方法として,事前学習中に能動的に忘れる機構を提案する。具体的には、プレトレーニング中のK更新毎に埋め込み層をリセットすることで、メタ学習効果と同様に、限られた数の更新で新しい埋め込みを学習する能力を改善することをPLMに推奨する。 RoBERTaを用いた実験では、言語適応の高速化だけでなく、特に英語から離れた言語において、低データ方式の標準モデルよりも優れていることが示されている。

Pretrained language models (PLMs) are today the primary model for natural language processing. Despite their impressive downstream performance, it can be difficult to apply PLMs to new languages, a barrier to making their capabilities universally accessible. While prior work has shown it possible to address this issue by learning a new embedding layer for the new language, doing so is both data and compute inefficient. We propose to use an active forgetting mechanism during pretraining, as a simple way of creating PLMs that can quickly adapt to new languages. Concretely, by resetting the embedding layer every K updates during pretraining, we encourage the PLM to improve its ability of learning new embeddings within a limited number of updates, similar to a meta-learning effect. Experiments with RoBERTa show that models pretrained with our forgetting mechanism not only demonstrate faster convergence during language adaptation but also outperform standard ones in a low-data regime, particularly for languages that are distant from English.

翻訳日:2023-07-06 10:49:14 公開日:2023-07-04
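
The reset schedule described in the abstract above ("resetting the embedding layer every K updates") can be sketched without any deep learning framework. In the toy loop below, the "model body" and the "embedding layer" are plain lists, and the gradient step is replaced by a placeholder increment; only the control flow of the periodic reset is meant to be illustrative. All names, sizes, and the initialization scale are hypothetical.

```python
import random

def pretrain_with_active_forgetting(num_updates, reset_every,
                                    vocab_size=4, dim=2, seed=0):
    """Toy pretraining loop that re-initializes the embeddings every K updates."""
    rng = random.Random(seed)

    def fresh_embeddings():
        # Hypothetical initialization: small Gaussian noise.
        return [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                for _ in range(vocab_size)]

    embeddings = fresh_embeddings()
    body = [0.0] * dim  # stand-in for all non-embedding parameters
    reset_steps = []

    for step in range(1, num_updates + 1):
        # Placeholder for a real gradient update on both parameter groups.
        body = [w + 0.01 for w in body]
        embeddings = [[v + 0.01 for v in row] for row in embeddings]

        # Active forgetting: discard the learned embeddings, keep the body.
        if step % reset_every == 0:
            embeddings = fresh_embeddings()
            reset_steps.append(step)

    return body, embeddings, reset_steps

body, embeddings, reset_steps = pretrain_with_active_forgetting(
    num_updates=10, reset_every=4)
print(reset_steps)
```

The body parameters accumulate all updates, whereas the embeddings only ever reflect the updates since the last reset; this asymmetry is what the abstract describes as encouraging the model to relearn embeddings quickly.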

# CNOTゲート受信機を用いたマイクロ波ガウス量子センシング

Microwave Gaussian quantum sensing with a CNOT gate receiver ( http://arxiv.org/abs/2307.01014v2 )

ライセンス: Link先を確認

Hany Khalifa, Kirill Petrovnin, Riku Jäntti, Gheorghe Sorin Paraoanu

(参考訳) 量子照明(QI)では、連続変数(CV)絡み合った放射モード間の非古典的相関を利用して、熱雑音に埋め込まれたターゲットの存在を検出する。 QIが最適古典的性能を上回る極端な環境は、マイクロ波領域の応用がこの新しいセンシングパラダイムの恩恵を受けることを示唆している。しかし、提案されたQI受信機は全て、マイクロ波領域では実現不可能な理想的な光子カウンタや検出器に依存している。そこで本研究では,cv制御notゲート(cnot)を用いた新しいqi受信機を提案する。他のQI受信機とは異なり、検出プロセス全体はホモダイン測定と2乗法検出器によって実行される。受信機はゲートの操作の一部として2つの圧縮補助モードを利用する。これらの余分なリソースはオフラインで準備され、全体的な利得は単一のビームスプリッターパラメータによってパッシブに制御される。我々は,本モデルと他のQI受信機を比較し,その動作状態が他よりも優れ,性能が最適であることを示す。この研究の主な焦点はマイクロ波量子センシングアプリケーションであるが、提案したデバイスは光学領域でも構築可能であるため、より広義の量子センシングツールボックスに新たに追加されることになる。

In quantum illumination (QI) the non-classical correlations between continuous variable (CV) entangled modes of radiation are exploited to detect the presence of a target embedded in thermal noise. The extreme environment where QI outperforms its optimal classical counterpart suggests that applications in the microwave domain would benefit the most from this new sensing paradigm. However, all the proposed QI receivers rely on ideal photon counters or detectors, which are not currently feasible in the microwave domain. Here we propose a new QI receiver that utilizes a CV controlled-NOT (CNOT) gate in order to perform a joint measurement on a target return and its retained twin. Unlike other QI receivers, the entire detection process is carried out by homodyne measurements and square-law detectors. The receiver exploits two squeezed ancillary modes as a part of the gate's operation. These extra resources are prepared offline and their overall gain is controlled passively by a single beamsplitter parameter. We compare our model to other QI receivers and demonstrate its operation regime where it outperforms others and achieves optimal performance. Although the main focus of this study is microwave quantum sensing applications, our proposed device can also be built in the optical domain, thus rendering it a new addition to the quantum sensing toolbox in a wider sense.

翻訳日:2023-07-06 10:48:42 公開日:2023-07-04

# CardiGraphormer:創薬革命における自己指導型学習の力

CardiGraphormer: Unveiling the Power of Self-Supervised Learning in Revolutionizing Drug Discovery ( http://arxiv.org/abs/2307.00859v2 )

ライセンス: Link先を確認

Abhijit Gupta and Arnab Mukherjee

(参考訳) 約15,000の既知の薬物と約4,200の承認がある薬発見の世界では、化学空間の組合せの性質は極めて困難である。人工知能(AI)は強力な同盟国として登場したが、従来のAIフレームワークは大きなハードルに直面している。この原稿では、自己教師付き学習(SSL)、グラフニューラルネットワーク(GNN)、薬物発見に革命を起こすためのカルディナリティ保存注意を相乗化するための画期的なアプローチであるCardiGraphormerを紹介している。グラフマーと枢機卿の新たな組み合わせであるcardigraphormerはsslを利用して強力な分子表現を学習し、gnnを使って分子指紋を抽出し、計算時間を短縮しながら予測性能と解釈性を向上させる。分子構造のような複雑なデータを処理し、ノード、ノードのペア、サブグラフ、グラフ構造全体に関連するタスクを実行する。 CardiGraphormerによる薬物発見と薬物相互作用の潜在的な応用は、新しい薬物標的の同定から薬物と薬物の相互作用の予測、新しい薬物発見の実現まで幅広い。この革新的なアプローチは、薬物開発においてAIによって強化された方法論を提供し、SSLとGNNを組み合わせて既存の制限を克服し、薬物発見における膨大な組合せ化学空間をより深く探求する道を開く。

In the expansive realm of drug discovery, with approximately 15,000 known drugs and only around 4,200 approved, the combinatorial nature of the chemical space presents a formidable challenge. While Artificial Intelligence (AI) has emerged as a powerful ally, traditional AI frameworks face significant hurdles. This manuscript introduces CardiGraphormer, a groundbreaking approach that synergizes self-supervised learning (SSL), Graph Neural Networks (GNNs), and Cardinality Preserving Attention to revolutionize drug discovery. CardiGraphormer, a novel combination of Graphormer and Cardinality Preserving Attention, leverages SSL to learn potent molecular representations and employs GNNs to extract molecular fingerprints, enhancing predictive performance and interpretability while reducing computation time. It excels in handling complex data like molecular structures and performs tasks associated with nodes, pairs of nodes, subgraphs, or entire graph structures. CardiGraphormer's potential applications in drug discovery and drug interactions are vast, from identifying new drug targets to predicting drug-to-drug interactions and enabling novel drug discovery. This innovative approach provides an AI-enhanced methodology in drug development, utilizing SSL combined with GNNs to overcome existing limitations and pave the way for a richer exploration of the vast combinatorial chemical space in drug discovery.

翻訳日:2023-07-06 10:48:12 公開日:2023-07-04

# SketchMetaFace: 高忠実度3次元顔モデリングのための学習ベースのスケッチインタフェース

SketchMetaFace: A Learning-based Sketching Interface for High-fidelity 3D Character Face Modeling ( http://arxiv.org/abs/2307.00804v2 )

ライセンス: Link先を確認

Zhongjin Luo, Dong Du, Heming Zhu, Yizhou Yu, Hongbo Fu, Xiaoguang Han

(参考訳) 3Dアバターのモデリングは、AR/VR、ゲーム、撮影といった様々なアプリケーションシナリオに役立つ。キャラクターの顔は、アバターの重要な構成要素として重要な多様性と鮮度をもたらす。しかし、3Dキャラクタフェイスモデルの構築には、経験豊富なアーティストであっても、商用ツールによる重い作業が必要になる。既存のスケッチベースの様々なツールは、多様な顔の形と豊富な幾何学的詳細をモデル化するアマチュアをサポートするのに失敗する。本稿では,素人ユーザを対象としたスケッチシステムであるSketchMetaFaceについて紹介する。ユーザインタフェースと基礎となるアルゴリズムの両方を慎重に設計する。第一に、顔の細部を彫る制御性を高めるために、曲率アウェア・ストロークが採用されている。第二に、2Dスケッチマップを3Dモデルにマッピングする鍵となる問題を考えると、「Implicit and Depth Guided Mesh Modeling」(IDGMM)と呼ばれる新しい学習手法を開発する。メッシュ、暗黙、深度表現の利点を融合させ、高い効率で高品質な結果を達成する。さらに,ユーザビリティをさらに支援するために,粗い2次元スケッチインタフェース設計とデータ駆動ストローク提案ツールを提案する。ユーザスタディは、使いやすさと結果の視覚的な品質の観点から、既存のモデリングツールよりも優れたシステムを示します。実験により、IDGMMは精度と効率のトレードオフがより良くなることが示された。 SketchMetaFaceはhttps://zhongjinluo.github.io/SketchMetaFace/で入手できる。

Modeling 3D avatars benefits various application scenarios such as AR/VR, gaming, and filming. Character faces contribute significant diversity and vividness as a vital component of avatars. However, building 3D character face models usually requires a heavy workload with commercial tools, even for experienced artists. Various existing sketch-based tools fail to support amateurs in modeling diverse facial shapes and rich geometric details. In this paper, we present SketchMetaFace, a sketching system targeting amateur users to model high-fidelity 3D faces in minutes. We carefully design both the user interface and the underlying algorithm. First, curvature-aware strokes are adopted to better support the controllability of carving facial details. Second, considering the key problem of mapping a 2D sketch map to a 3D model, we develop a novel learning-based method termed "Implicit and Depth Guided Mesh Modeling" (IDGMM). It fuses the advantages of mesh, implicit, and depth representations to achieve high-quality results with high efficiency. In addition, to further support usability, we present a coarse-to-fine 2D sketching interface design and a data-driven stroke suggestion tool. User studies demonstrate the superiority of our system over existing modeling tools in terms of ease of use and the visual quality of results. Experimental analyses also show that IDGMM reaches a better trade-off between accuracy and efficiency. SketchMetaFace is available at https://zhongjinluo.github.io/SketchMetaFace/.

翻訳日:2023-07-06 10:47:43 公開日:2023-07-04

# ジョイントベル計測による可変量子固有解法高速化

Accelerated variational quantum eigensolver with joint Bell measurement ( http://arxiv.org/abs/2307.00766v2 )

ライセンス: Link先を確認

Chenfeng Cao, Hiroshi Yano, Yuya O. Nakagawa

(参考訳) 変分量子固有解法(VQE)は、量子化学において分子ハミルトニアンの基底状態を得るために、短期量子コンピュータのための顕著な量子古典ハイブリッドアルゴリズムである。しかし、ハミルトニアンにおけるパウリ作用素の非可換性のため、量子コンピュータに要求される測定量は、システムのサイズが大きくなるにつれて著しく増加し、VQEの実用的な応用を妨げる可能性がある。本稿では,JBM-VQE (Joint Bell Measurement VQE) と呼ばれるプロトコルを提案する。本手法では、ハミルトニアンに存在するパウリ作用素のすべての期待値の絶対値を同時に測定できるジョイントベル測定器を用いる。最適化の過程では、jbm-vqeはジョイントベル測定により各イテレーション毎のポーリ演算子の期待値の絶対値を推定するが、それらの符号は従来の方法による期待値の測定ではより少ない頻度で測定される。我々のアプローチは、最適化中に標識が頻繁に変化しないという経験的観察に基づいている。小分子の分子ハミルトニアン基底状態を求める数値シミュレーションによる従来のVQEと比較して、JBM-VQEの高速化と、最適化の初期段階におけるJBM-VQEの高速化は、大規模システムではますます顕著になっている。共同ベル測定に基づくアプローチは、VQEに限らず、コスト関数が多くのパウリ演算子の期待値である様々な量子アルゴリズムで利用することができる。

The variational quantum eigensolver (VQE) stands as a prominent quantum-classical hybrid algorithm for near-term quantum computers to obtain the ground states of molecular Hamiltonians in quantum chemistry. However, due to the non-commutativity of the Pauli operators in the Hamiltonian, the number of measurements required on quantum computers increases significantly as the system size grows, which may hinder practical applications of VQE. In this work, we present a protocol termed joint Bell measurement VQE (JBM-VQE) to reduce the number of measurements and speed up the VQE algorithm. Our method employs joint Bell measurements, enabling the simultaneous measurement of the absolute values of all expectation values of Pauli operators present in the Hamiltonian. In the course of the optimization, JBM-VQE estimates the absolute values of the expectation values of the Pauli operators for each iteration by the joint Bell measurement, while the signs of them are measured less frequently by the conventional method to measure the expectation values. Our approach is based on the empirical observation that the signs do not often change during optimization. We illustrate the speed-up of JBM-VQE compared to conventional VQE by numerical simulations for finding the ground states of molecular Hamiltonians of small molecules, and the speed-up of JBM-VQE at the early stage of the optimization becomes increasingly pronounced in larger systems. Our approach based on the joint Bell measurement is not limited to VQE and can be utilized in various quantum algorithms whose cost functions are expectation values of many Pauli operators.

翻訳日:2023-07-06 10:47:21 公開日:2023-07-04

# 予測符号化と不確かさ最小化によるアクティブセンシング

Active Sensing with Predictive Coding and Uncertainty Minimization ( http://arxiv.org/abs/2307.00668v2 )

ライセンス: Link先を確認

Abdelrahman Sharafeldin, Nabil Imam, Hannah Choi

(参考訳) 本稿では,生物にインスパイアされた2つの計算,予測符号化と不確実性最小化に基づくエンドツーエンド探索手法を提案する。この手順は、タスクに依存しない本質的に駆動された方法で、任意の探索設定に適用することができる。まず,mazeナビゲーションタスクで提案手法を実証し,基礎となる遷移分布を発見し,環境の空間的特徴を再構築できることを示す。第2に,エージェントが情報を収集するために,その視覚環境を積極的にサンプリングする必要があるアクティブビジョンのより複雑なタスクに,このモデルを適用する。我々のモデルは教師なしの表現を構築でき、センサのシーンを積極的にサンプリングし、効率的に分類できることを示す。さらに,これらの表現を下流分類の入力として用いると,他のベースラインと比較してデータ効率と学習速度が向上すると同時に,パラメータの複雑さも低下することを示した。最後に、モデルのモジュラリティにより、内部メカニズムを分析し、探索行動中の知覚と行動の相互作用についての洞察を導き出すことができる。

We present an end-to-end procedure for embodied exploration based on two biologically inspired computations: predictive coding and uncertainty minimization. The procedure can be applied to any exploration setting in a task-independent and intrinsically driven manner. We first demonstrate our approach in a maze navigation task and show that our model is capable of discovering the underlying transition distribution and reconstructing the spatial features of the environment. Second, we apply our model to the more complex task of active vision, where an agent must actively sample its visual environment to gather information. We show that our model is able to build unsupervised representations that allow it to actively sample and efficiently categorize sensory scenes. We further show that using these representations as input for downstream classification leads to superior data efficiency and learning speed compared to other baselines, while also maintaining lower parameter complexity. Finally, the modularity of our model allows us to analyze its internal mechanisms and to draw insight into the interactions between perception and action during exploratory behavior.

翻訳日:2023-07-06 10:46:27 公開日:2023-07-04

# ディープニューラルネットワークのためのスパーシティアウェア一般化理論

Sparsity-aware generalization theory for deep neural networks ( http://arxiv.org/abs/2307.00426v2 )

ライセンス: Link先を確認

Ramchandran Muthukumar, Jeremias Sulam

(参考訳) 深層人工ニューラルネットワークは驚くべき一般化能力を達成するが、その理由は十分に理解されていない。本稿では、隠れ層の活性化において達成されるスパース性の度合いを利用して、深層フィードフォワードReLUネットワークの一般化を解析する新しいアプローチを提案する。各入力サンプルに対して縮小される実効的なモデルサイズを考慮したフレームワークを構築することで、スパース性と一般化の間の根本的なトレードオフを示すことができる。重要な点として、この結果はモデルが達成するスパース性の度合いについて強い仮定を置かず、近年のノルムベースのアプローチを改善する。さらに結果を数値的に検証し、過剰パラメータ化されたモデルであっても、特定の設定においてデータ依存の事前分布と組み合わせることで非自明な(non-vacuous)境界が得られることを示す。

Deep artificial neural networks achieve surprising generalization abilities that remain poorly understood. In this paper, we present a new approach to analyzing generalization for deep feed-forward ReLU networks that takes advantage of the degree of sparsity that is achieved in the hidden layer activations. By developing a framework that accounts for this reduced effective model size for each input sample, we are able to show fundamental trade-offs between sparsity and generalization. Importantly, our results make no strong assumptions about the degree of sparsity achieved by the model, and they improve over recent norm-based approaches. We illustrate our results numerically, demonstrating non-vacuous bounds when coupled with data-dependent priors in specific settings, even in over-parametrized models.

翻訳日:2023-07-06 10:46:12 公開日:2023-07-04
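
The quantity that the sparsity-aware abstract above builds on, namely how many hidden units are inactive for a given input, can be measured with a few lines of code. The sketch below computes the per-layer fraction of zero activations in a small feed-forward ReLU network. It only illustrates activation sparsity and does not compute the paper's generalization bound. The helper names and the toy weights are assumptions made for this example.

```python
from typing import List, Sequence, Tuple

# A layer is (weights, bias); weights has one row per output unit.
Layer = Tuple[Sequence[Sequence[float]], Sequence[float]]


def relu_layer(weights: Sequence[Sequence[float]],
               bias: Sequence[float],
               x: Sequence[float]) -> List[float]:
    """Apply one dense layer followed by ReLU."""
    return [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


def activation_sparsity(layers: Sequence[Layer],
                        x: Sequence[float]) -> List[float]:
    """Return, per layer, the fraction of units whose activation is zero for x."""
    fractions: List[float] = []
    h = list(x)
    for weights, bias in layers:
        h = relu_layer(weights, bias, h)
        inactive = sum(1 for v in h if v == 0.0)
        fractions.append(inactive / len(h))
    return fractions


if __name__ == "__main__":
    toy_layers: List[Layer] = [
        ([[1.0, -1.0], [-1.0, 1.0], [1.0, 1.0], [-1.0, -1.0]],
         [0.0, 0.0, 0.0, 0.0]),
        ([[1.0, 0.0, 1.0, 0.0], [-1.0, 0.0, 0.0, 0.0]],
         [0.0, 0.0]),
    ]
    print(activation_sparsity(toy_layers, [2.0, 1.0]))
```

Because inactive units contribute nothing for that input, only the rows and columns tied to active units matter. This is one informal way to read the abstract's "reduced effective model size for each input sample".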

# スコア正規化を用いたCNNに基づく人物再識別の改善

Improving CNN-based Person Re-identification using score Normalization ( http://arxiv.org/abs/2307.00397v2 )

ライセンス: Link先を確認

Ammar Chouchane, Abdelmalik Ouamane, Yassine Himeur, Wathiq Mansoor, Shadi Atalla, Afaf Benzaibak and Chahrazed Boudellal

(参考訳) 人物再識別(PRe-ID)は、セキュリティ、監視、小売分析における重要な課題であり、複数のカメラや視点にまたがって個人を特定する。しかし、照明・背景・視点の変化により困難な課題となっている。PRe-IDシステムの成功には、効率的な特徴抽出と距離学習(metric learning)のアルゴリズムが不可欠である。本稿では、畳み込みニューラルネットワーク(CNN)に基づく特徴抽出法と、距離学習のためのXQDA(Cross-view Quadratic Discriminant Analysis)を組み合わせた、PRe-IDのための新しい手法を提案する。また、マハラノビス距離とスコア正規化処理を用いてカメラ間のスコアの不整合に対処するマッチングアルゴリズムを実装した。提案手法をVIPeR、GRID、CUHK01、PRID450Sの4つの挑戦的なデータセットで検証し、有望な結果を得た。例えば、正規化なしの場合、GRID、CUHK01、VIPeR、PRID450Sデータセットのランク20の精度はそれぞれ61.92%、83.90%、92.03%、96.22%であったが、スコア正規化後にはそれぞれ64.64%、89.30%、92.78%、98.76%に向上した。したがって、4つの挑戦的なデータセットにおける有望な結果は、提案手法の有効性を示している。

Person re-identification (PRe-ID) is a crucial task in security, surveillance, and retail analysis that involves identifying an individual across multiple cameras and views. However, it is challenging because of changes in illumination, background, and viewpoint. Efficient feature extraction and metric learning algorithms are essential for a successful PRe-ID system. This paper proposes a novel approach for PRe-ID that combines a Convolutional Neural Network (CNN) based feature extraction method with Cross-view Quadratic Discriminant Analysis (XQDA) for metric learning. Additionally, a matching algorithm is implemented that employs Mahalanobis distance and a score normalization process to address inconsistencies between camera scores. The proposed approach is tested on four challenging datasets (VIPeR, GRID, CUHK01, and PRID450S), and promising results are obtained. For example, without normalization, the rank-20 accuracies on the GRID, CUHK01, VIPeR, and PRID450S datasets were 61.92%, 83.90%, 92.03%, and 96.22%, respectively; after score normalization, they increased to 64.64%, 89.30%, 92.78%, and 98.76%, respectively. These results on four challenging datasets indicate the effectiveness of the proposed approach.

翻訳日:2023-07-06 10:45:59 公開日:2023-07-04
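
The matching step mentioned in the re-identification abstract above (Mahalanobis distance followed by score normalization) might look roughly like the sketch below. The abstract does not say which normalization the authors use, so z-score normalization is an assumption here. The metric matrix `M` is taken as given (in the paper it would come from XQDA), and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def mahalanobis_sq(x: Sequence[float],
                   y: Sequence[float],
                   M: Sequence[Sequence[float]]) -> float:
    """Squared Mahalanobis-style distance (x - y)^T M (x - y) for a given metric M."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return sum(d[i] * M[i][j] * d[j] for i in range(n) for j in range(n))


def z_normalize(scores: Sequence[float]) -> List[float]:
    """Z-score normalization (an assumed choice): zero mean, unit variance."""
    mean = sum(scores) / len(scores)
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))
    if std == 0.0:
        return [0.0 for _ in scores]
    return [(s - mean) / std for s in scores]


def rank_gallery(probe: Sequence[float],
                 gallery: Sequence[Sequence[float]],
                 M: Sequence[Sequence[float]]) -> Tuple[List[int], List[float]]:
    """Return gallery indices sorted from best to worst match, plus normalized scores."""
    distances = [mahalanobis_sq(probe, g, M) for g in gallery]
    normalized = z_normalize(distances)
    order = sorted(range(len(gallery)), key=lambda i: normalized[i])
    return order, normalized


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    order, scores = rank_gallery([0.0, 0.0],
                                 [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]],
                                 identity)
    print(order, scores)
```

Normalizing within a single probe's score list does not change that probe's ranking. The point of the normalization is to put scores from different probes or camera pairs on a comparable scale before they are thresholded or fused.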

PDF登録状況（公開日: 20230704）