# BushraDBR: An Automatic Approach to Retrieving Duplicate Bug Reports

BushraDBR: An Automatic Approach to Retrieving Duplicate Bug Reports ( http://arxiv.org/abs/2407.04707v1 )

Ra'Fat Al-Msie'deen

A Bug Tracking System (BTS), such as Bugzilla, is generally used to track the Bug Reports (BRs) submitted for a particular software system. Duplicate Bug Report (DBR) retrieval is the process of finding DBRs in the BTS. This process is important because it spares engineers needless work on DBRs. To avoid wasting engineering resources, such as effort and time, on previously submitted (or duplicate) BRs, it is essential to find and retrieve DBRs as soon as software users submit them. This paper therefore proposes an automatic approach, called BushraDBR (short for Bushra Duplicate Bug Reports retrieval process), that helps an engineer (called a triager) retrieve DBRs and stop duplicates before they start. When a new BR is sent to the Bug Repository (BRE), the engineer uses BushraDBR to check whether it duplicates an existing BR in the BRE. If it does, the engineer marks it as a DBR and the BR is excluded from any additional work; otherwise, the BR is added to the BRE. BushraDBR relies on Textual Similarity (TS) between the newly submitted BR and the other BRs in the BRE to retrieve DBRs. It exploits unstructured data from BRs to apply Information Retrieval (IR) methods efficiently, using two techniques: Latent Semantic Indexing (LSI) and Formal Concept Analysis (FCA). The originality of BushraDBR lies in stopping DBRs before they occur by comparing the newly reported BR with the other BRs in the BTS, thus saving time and effort during the Software Maintenance (SM) process, and in its combined use of LSI and FCA to retrieve DBRs. BushraDBR has been validated and evaluated on several publicly available data sets from Bugzilla. Experiments show that BushraDBR retrieves DBRs efficiently.
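The textual-similarity check described in the abstract can be illustrated with a minimal sketch. It is not the BushraDBR implementation: it ranks existing reports by plain TF-IDF cosine similarity and omits both the LSI dimensionality reduction and the FCA clustering that the paper applies. The function names (`tokenize`, `tfidf_vectors`, `rank_duplicates`) and the sample reports are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF, so a term present in every document keeps a nonzero weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_duplicates(new_report, repository):
    """Rank repository reports by similarity to the new report, best first.

    Returns a list of (score, index) pairs, where index refers to `repository`.
    """
    vectors = tfidf_vectors(repository + [new_report])
    new_vec = vectors[-1]
    scored = [(cosine(new_vec, vec), i) for i, vec in enumerate(vectors[:-1])]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    repo = [
        "Browser crashes when opening a PDF attachment",
        "Dark theme makes toolbar icons invisible",
    ]
    for score, index in rank_duplicates("Crash when a PDF attachment is opened", repo):
        print(f"{score:.2f}  {repo[index]}")
```

A triager would then inspect only the top-ranked candidates, for example those above a similarity threshold chosen for the project; any such threshold is an assumption of this sketch, not a value from the paper.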
Translated: 2024-07-22 21:58:59 Published: 2024-05-04
# Modern Information Technologies in Scientific Research and Educational Activities

Modern Information Technologies in Scientific Research and Educational Activities ( http://arxiv.org/abs/2407.10296v1 )

Kyrylo Malakhov, Vadislav Kaverinskiy, Liliia Ivanova, Oleksandr Romanyuk, Oksana Romaniuk, Svitlana Voinova, Sergii Kotlyk, Oksana Sokolova

The monograph summarizes and analyzes the current state of scientific research on interactive artificial intelligence systems, text generation systems, diagnostics of the competitiveness of specialists, correct color rendering in image formation, informatization of the work of graduate students, and accessible technology for creating 3D models. The monograph will be useful to specialists and employees of companies working in the IT field; to teachers, master's students, undergraduates, and graduate students at higher educational institutions; and to anyone interested in issues related to information technology. It was compiled from the results of the 16th international scientific and practical conference "Information technologies and automation - 2023", which took place in October 2023 at Odessa National University of Technology.
Translated: 2024-07-22 12:59:07 Published: 2024-05-04
# Large Language Models estimate fine-grained human color-concept associations

Large Language Models estimate fine-grained human color-concept associations ( http://arxiv.org/abs/2406.17781v1 )

Kushin Mukherjee, Timothy T. Rogers, Karen B. Schloss

Concepts, both abstract and concrete, elicit a distribution of association strengths across perceptual color space, which influences aspects of visual cognition ranging from object recognition to the interpretation of information visualizations. While prior work has hypothesized that color-concept associations may be learned from the cross-modal statistical structure of experience, it has been unclear whether natural environments possess such structure or, if so, whether learning systems are capable of discovering and exploiting it without strong prior constraints. We addressed these questions by investigating the ability of GPT-4, a multimodal large language model, to estimate human-like color-concept associations without any additional training. Starting with human color-concept association ratings for a set of 71 colors spanning perceptual color space (UW-71) and concepts that varied in abstractness, we assessed how well association ratings generated by GPT-4 could predict human ratings. GPT-4 ratings were correlated with human ratings, with performance comparable to state-of-the-art methods for automatically estimating color-concept associations from images. Variability in GPT-4's performance across concepts could be explained by the specificity of the concept's color-concept association distribution. This study suggests that high-order covariances between language and perception, as expressed in the natural environment of the internet, contain sufficient information to support learning of human-like color-concept associations, and provides an existence proof that a learning system can encode such associations without initial constraints. The work further shows that GPT-4 can be used to efficiently estimate distributions of color associations for a broad range of concepts, potentially serving as a critical tool for designing effective and intuitive information visualizations.
Translated: 2024-07-01 06:21:45 Published: 2024-05-04
# Real-time Neural Woven Fabric Rendering

Real-time Neural Woven Fabric Rendering ( http://arxiv.org/abs/2406.17782v1 )

Xiang Chen, Lu Wang, Beibei Wang

Woven fabrics are widely used in applications of realistic rendering, where real-time capability is also essential. However, rendering realistic woven fabrics in real time is challenging due to their complex structure and optical appearance, which cause aliasing and noise without many samples. The core of this issue is a multi-scale representation of the fabric shading model, which allows for a fast range query. Some previous neural methods deal with the issue at the cost of training on each material, which limits their practicality. In this paper, we propose a lightweight neural network to represent different types of woven fabrics at different scales. Thanks to the regularity and repetitiveness of woven fabric patterns, our network can encode fabric patterns and parameters as a small latent vector, which is later interpreted by a small decoder, enabling the representation of different types of fabrics. By applying the pixel's footprint as input, our network achieves multi-scale representation. Moreover, our network is fast and occupies little storage because of its lightweight structure. As a result, our method achieves rendering and editing woven fabrics at nearly 60 frames per second on an RTX 3090, showing a quality close to the ground truth and being free from visible aliasing and noise.
Translated: 2024-07-01 06:21:45 Published: 2024-05-04
# IQLS: Framework for leveraging Metadata to enable Large Language Model based queries to complex, versatile Data

IQLS: Framework for leveraging Metadata to enable Large Language Model based queries to complex, versatile Data ( http://arxiv.org/abs/2405.15792v1 )

Sami Azirar, Hossam A. Gabbar, Chaouki Regoui

As the amount and complexity of data grow, retrieving it has become a more difficult task that requires greater knowledge and resources. This is especially true for the logistics industry, where new data-collection technologies provide tremendous amounts of interconnected real-time data. The Intelligent Query and Learning System (IQLS) simplifies data retrieval by letting users query in natural language. It maps structured data into a framework based on the available metadata and data models. This framework creates an environment for an agent powered by a Large Language Model. The agent uses the hierarchical nature of the data to filter iteratively, making multiple small context-aware decisions instead of performing one-shot data retrieval. After the data filtering, the IQLS enables the agent to fulfill the tasks given by the user query through interfaces. These interfaces range from multimodal transportation information retrieval to route planning under multiple constraints. The latter lets the agent define a dynamic object, determined by the query parameters, that represents a driver capable of navigating a road network. The road network is represented as a graph with attributes based on the data. Using a modified version of the Dijkstra algorithm, the optimal route under the given constraints can be determined. Throughout the entire process, the user retains the ability to interact with and guide the system. The IQLS is showcased in a case study on the Canadian logistics sector, allowing geospatial, visual, tabular and text data to be easily queried semantically in natural language.
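The abstract does not say how the Dijkstra algorithm is modified. One plausible reading, sketched below purely as an assumption, is that edges whose attributes violate the driver's constraints are skipped during relaxation. The `Vehicle` class, the edge tuple layout, and the height and weight limits are all hypothetical and do not come from the paper.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Vehicle:
    """Hypothetical 'dynamic object' built from the query parameters."""
    height_m: float
    weight_t: float


def constrained_shortest_path(graph, source, target, vehicle):
    """Dijkstra's algorithm that ignores edges the vehicle may not use.

    graph maps node -> list of (neighbor, length_km, max_height_m, max_weight_t).
    Returns (path, total_length), or (None, inf) if no feasible route exists.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, length, max_height, max_weight in graph.get(node, []):
            # The modification: skip edges that violate the constraints.
            if vehicle.height_m > max_height or vehicle.weight_t > max_weight:
                continue
            candidate = d + length
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    if target not in dist:
        return None, float("inf")
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]


if __name__ == "__main__":
    roads = {
        "A": [("B", 5.0, 3.0, 40.0), ("C", 2.0, 4.5, 40.0)],
        "C": [("B", 4.0, 4.5, 40.0)],
        "B": [],
    }
    print(constrained_shortest_path(roads, "A", "B", Vehicle(4.0, 20.0)))
```

Filtering edges inside the relaxation loop keeps the algorithm's usual optimality guarantee on the feasible subgraph, which is why it is a natural way to combine hard constraints with shortest-path search.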
Translated: 2024-06-02 14:39:48 Published: 2024-05-04
# Open-SQL Framework: Enhancing Text-to-SQL on Open-source Large Language Models

Open-SQL Framework: Enhancing Text-to-SQL on Open-source Large Language Models ( http://arxiv.org/abs/2405.06674v1 )

Xiaojun Chen, Tianle Wang, Tianhao Qiu, Jianbin Qin, Min Yang

Despite the success of large language models (LLMs) in Text-to-SQL tasks, open-source LLMs encounter challenges in contextual understanding and response coherence. To tackle these issues, we present the Open-SQL Framework, a systematic methodology tailored for Text-to-SQL with open-source LLMs. Our contributions include a comprehensive evaluation of open-source LLMs in Text-to-SQL tasks, the Openprompt strategy for effective question representation, and novel strategies for supervised fine-tuning. We explore the benefits of Chain-of-Thought in step-by-step inference and propose the Openexample method for enhanced few-shot learning. Additionally, we introduce token-efficient techniques, such as Variable-length Open DB Schema, Target Column Truncation, and Example Column Truncation, addressing challenges in large-scale databases. Our findings emphasize the need for further investigation into the impact of supervised fine-tuning on contextual learning capabilities. Remarkably, our method significantly improved Llama2-7B from 2.54% to 41.04% and Code Llama-7B from 14.54% to 48.24% on the BIRD-Dev dataset. Notably, the performance of Code Llama-7B surpassed GPT-4 (46.35%) on the BIRD-Dev dataset.
Translated: 2024-05-27 03:27:39 Published: 2024-05-04
# EDA Corpus: A Large Language Model Dataset for Enhanced Interaction with OpenROAD

EDA Corpus: A Large Language Model Dataset for Enhanced Interaction with OpenROAD ( http://arxiv.org/abs/2405.06676v1 )

Bing-Yue Wu, Utsav Sharma, Sai Rahul Dhanvi Kankipati, Ajay Yadav, Bintu Kappil George, Sai Ritish Guntupalli, Austin Rovinski, Vidya A. Chhabria

Large language models (LLMs) serve as powerful tools for design, providing capabilities for both task automation and design assistance. Recent advancements have shown tremendous potential for facilitating LLM integration into the chip design process; however, many of these works rely on data that are not publicly available and/or not permissively licensed for use in LLM training and distribution. In this paper, we present a solution aimed at bridging this gap by introducing an open-source dataset tailored for OpenROAD, a widely adopted open-source EDA toolchain. The dataset features over 1000 data points and is structured in two formats: (i) a pairwise set comprised of question prompts with prose answers, and (ii) a pairwise set comprised of code prompts and their corresponding OpenROAD scripts. By providing this dataset, we aim to facilitate LLM-focused research within the EDA domain. The dataset is available at https://github.com/OpenROAD-Assistant/EDA-Corpus.
Translated: 2024-05-27 03:27:39 Published: 2024-05-04
# GAD: A Real-time Gait Anomaly Detection System with Online Adaptive Learning

GAD: A Real-time Gait Anomaly Detection System with Online Adaptive Learning ( http://arxiv.org/abs/2405.09561v1 )

Ming-Chang Lee, Jia-Chun Lin, Sokratis Katsikas

Gait anomaly detection is a task that involves detecting deviations from a person's normal gait pattern. These deviations can indicate health issues and medical conditions in the healthcare domain, or fraudulent impersonation and unauthorized identity access in the security domain. A number of gait anomaly detection approaches have been introduced, but many of them require offline data preprocessing, offline model learning, setting parameters, and so on, which might restrict their effectiveness and applicability in real-world scenarios. To address these issues, this paper introduces GAD, a real-time gait anomaly detection system. GAD focuses on detecting anomalies within an individual's three-dimensional accelerometer readings based on dimensionality reduction and Long Short-Term Memory (LSTM). Upon being launched, GAD begins collecting a gait segment from the user and training an anomaly detector to learn the user's walking pattern on the fly. If the subsequent model verification is successful, which involves validating the trained detector using the user's subsequent steps, the detector is employed to identify abnormalities in the user's subsequent gait readings at the user's request. The anomaly detector will be retained online to adapt to minor pattern changes and will undergo retraining as long as it cannot provide adequate prediction. We explored two methods for capturing users' gait segments: a personalized method tailored to each individual's step length, and a uniform method utilizing a fixed step length. Experimental results using an open-source gait dataset show that GAD achieves a higher detection accuracy ratio when combined with the personalized method.
Translated: 2024-05-27 03:17:55 Published: 2024-05-04
# Guidance Design for Escape Flight Vehicle Using Evolution Strategy Enhanced Deep Reinforcement Learning

Guidance Design for Escape Flight Vehicle Using Evolution Strategy Enhanced Deep Reinforcement Learning ( http://arxiv.org/abs/2405.03711v1 )

Xiao Hu, Tianshu Wang, Min Gong, Shaoshi Yang

Guidance commands of flight vehicles are a series of data sets with fixed time intervals; guidance design therefore constitutes a sequential decision problem and satisfies the basic conditions for using deep reinforcement learning (DRL). In this paper, we consider the scenario where the escape flight vehicle (EFV) generates guidance commands based on DRL and the pursuit flight vehicle (PFV) generates guidance commands based on the proportional navigation method. For the EFV, the objective of the guidance design is to progressively maximize the residual velocity, subject to the constraint imposed by the given evasion distance. This yields an extremely large-scale irregular dynamic max-min problem, in which the time instant at which the optimal solution can be attained is uncertain and the optimal solution depends on all the intermediate guidance commands generated before. To solve this problem, a two-step strategy is conceived. In the first step, we use the proximal policy optimization (PPO) algorithm to generate the guidance commands of the EFV. The results obtained by PPO in the global search space are coarse, even though the reward function, the neural network parameters and the learning rate are designed elaborately. Therefore, in the second step, we propose to invoke an evolution strategy (ES) based algorithm, which uses the result of PPO as the initial value, to further improve the quality of the solution by searching in the local space. Simulation results demonstrate that the proposed guidance design method based on the PPO algorithm is capable of achieving a residual velocity of 67.24 m/s, higher than the residual velocities achieved by the benchmark soft actor-critic and deep deterministic policy gradient algorithms. Furthermore, the proposed ES-enhanced PPO algorithm outperforms the PPO algorithm by 2.7%, achieving a residual velocity of 69.04 m/s.
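The second step of the two-step strategy, a local search that starts from a coarse solution, can be sketched generically. The code below is a simple elitist evolution strategy with Gaussian mutation, not the specific ES variant used in the paper. The `initial` vector stands in for the PPO result, and the objective, `sigma`, `offspring`, and `generations` values are illustrative assumptions.

```python
import random


def es_refine(objective, initial, sigma=0.05, offspring=16, generations=40, seed=0):
    """Refine `initial` by local search with a simple elitist evolution strategy.

    Each generation samples `offspring` Gaussian perturbations of the current
    parent and keeps the best one only if it improves the objective, so the
    returned score is never worse than the score of `initial`.
    """
    rng = random.Random(seed)
    parent = list(initial)
    parent_score = objective(parent)
    for _ in range(generations):
        candidates = [
            [x + rng.gauss(0.0, sigma) for x in parent] for _ in range(offspring)
        ]
        best = max(candidates, key=objective)
        best_score = objective(best)
        if best_score > parent_score:
            parent, parent_score = best, best_score
    return parent, parent_score


if __name__ == "__main__":
    # Toy objective with its maximum at (1, 1); a stand-in for residual velocity.
    def toy_objective(v):
        return -sum((x - 1.0) ** 2 for x in v)

    coarse = [0.8, 1.3]  # plays the role of the coarse PPO solution
    refined, score = es_refine(toy_objective, coarse)
    print(refined, score)
```

Because the search is seeded with an already reasonable solution and uses a small mutation scale, it explores only the neighborhood of that solution, which matches the abstract's description of searching in the local space.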
Translated: 2024-05-08 18:34:09 Published: 2024-05-04
# Your Network May Need to Be Rewritten: Network Adversarial Based on High-Dimensional Function Graph Decomposition

Your Network May Need to Be Rewritten: Network Adversarial Based on High-Dimensional Function Graph Decomposition ( http://arxiv.org/abs/2405.03712v1 )

Xiaoyan Su, Yinghao Zhu, Run Li

In the past, relying on a single low-dimensional activation function in networks has led to internal covariate shift and gradient deviation problems. A relatively little-studied question is how to combine functions so that they complement the properties of a single activation function. We propose a network adversarial method to address these challenges. This is the first method to use different activation functions in a network. Based on the existing activation functions in the current network, an adversarial function with opposite derivative image properties is constructed, and the two are used alternately as activation functions for different network layers. For complex situations, we propose high-dimensional function graph decomposition (HD-FGD), which divides the function into different parts and then passes them through a linear layer. After integrating the inverse of the partial derivatives of each decomposed term, we obtain its adversarial function by referring to the computational rules of the decomposition process. Using the network adversarial method or HD-FGD alone can effectively replace the traditional MLP + activation function mode. With these methods, we achieve a substantial improvement over standard activation functions in both training efficiency and predictive accuracy. The article addresses the adversarial issues associated with several prevalent activation functions and presents alternatives that can be integrated seamlessly into existing models without adverse effects. We will release the code as open source after the conference review process is completed.
Translated: 2024-05-08 18:34:09 Published: 2024-05-04
# Improve Cross-Modality Segmentation by Treating MRI Images as Inverted CT Scans

Improve Cross-Modality Segmentation by Treating MRI Images as Inverted CT Scans ( http://arxiv.org/abs/2405.03713v1 )

Hartmut Häntze, Lina Xu, Leonhard Donle, Felix J. Dorfner, Alessa Hering, Lisa C. Adams, Keno K. Bressem

Computed tomography (CT) segmentation models frequently include classes that are not currently supported by magnetic resonance imaging (MRI) segmentation models. In this study, we show that a simple image inversion technique can significantly improve the segmentation quality of CT segmentation models on MRI data, using the TotalSegmentator model applied to T1-weighted MRI images as an example. Image inversion is straightforward to implement and does not require dedicated graphics processing units (GPUs), thus providing a quick alternative to complex deep modality-transfer models for generating segmentation masks for MRI data.
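The abstract does not spell out the inversion formula. A common definition, assumed here, flips each intensity within the image's own range (v becomes max + min - v), so that bright MRI regions become dark and vice versa. The sketch below works on a plain nested list; the function name is hypothetical, and a real pipeline would operate on a full 3D volume before passing it to the CT model.

```python
def invert_intensities(image):
    """Flip a 2D image's intensities within its own range: v -> max + min - v."""
    flat = [value for row in image for value in row]
    lowest, highest = min(flat), max(flat)
    return [[highest + lowest - value for value in row] for row in image]


if __name__ == "__main__":
    mri_slice = [[0, 50], [100, 200]]  # toy intensities, not real MRI data
    print(invert_intensities(mri_slice))
```

The operation is a single pass over the voxels with no learned parameters, which is consistent with the abstract's point that it needs no dedicated GPU.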
Translated: 2024-05-08 18:34:09 Published: 2024-05-04
# UniDEC : Unified Dual Encoder and Classifier Training for Extreme Multi-Label Classification

UniDEC : Unified Dual Encoder and Classifier Training for Extreme Multi-Label Classification ( http://arxiv.org/abs/2405.03714v1 )

Siddhant Kharbanda, Devaansh Gupta, Gururaj K, Pankaj Malhotra, Cho-Jui Hsieh, Rohit Babbar

Extreme Multi-label Classification (XMC) involves predicting a subset of relevant labels from an extremely large label space, given an input query and labels with textual features. Models developed for this problem have conventionally used a modular approach with (i) a Dual Encoder (DE) to embed the queries and label texts, and (ii) a One-vs-All (OvA) classifier to rerank the shortlisted labels mined through meta-classifier training. While such methods have shown empirical success, we observe two key uncharted aspects: (i) DE training typically uses only a single positive relation, even for datasets that offer more, and (ii) existing approaches fixate on using only the OvA reduction of the multi-label problem. This work aims to explore these aspects by proposing UniDEC, a novel end-to-end trainable framework that trains the dual encoder and classifier together in a unified fashion using a multi-class loss. For the choice of multi-class loss, the work proposes a novel pick-some-label (PSL) reduction of the multi-label problem that leverages multiple (in some cases, all) positives. The proposed framework achieves state-of-the-art results on a single GPU and results on par with multi-GPU SOTA methods on various XML benchmark datasets, all while using 4-16x less compute and being practically scalable even beyond million-label-scale datasets.
Translated: 2024-05-08 18:34:09 Published: 2024-05-04
# Iterative Filter Pruning for Concatenation-based CNN Architectures

Iterative Filter Pruning for Concatenation-based CNN Architectures ( http://arxiv.org/abs/2405.03715v1 )

Svetlana Pavlitska, Oliver Bagge, Federico Peccia, Toghrul Mammadov, J. Marius Zöllner

Model compression and hardware acceleration are essential for the resource-efficient deployment of deep neural networks. Modern object detectors have highly interconnected convolutional layers with concatenations. In this work, we study how pruning can be applied to such architectures, taking YOLOv7 as an example. We propose a method to handle concatenation layers, based on the connectivity graph of convolutional layers. By automating iterative sensitivity analysis, pruning, and subsequent model fine-tuning, we can significantly reduce model size in terms of both the number of parameters and FLOPs, while keeping comparable model accuracy. Finally, we deploy pruned models to FPGA and NVIDIA Jetson Xavier AGX. Pruned models demonstrate a 2x speedup for the convolutional layers compared with their unpruned counterparts and reach real-time capability with 14 FPS on FPGA. Our code is available at https://github.com/fzi-forschungszentrum-informatik/iterative-yolo-pruning.
Translated: 2024-05-08 18:34:09 Published: 2024-05-04
# A Group Key Establishment Scheme

A Group Key Establishment Scheme ( http://arxiv.org/abs/2109.15037v2 )

Sueda Guzey, Gunes Karabulut Kurt, Enver Ozdemir

Group authentication is a method of confirming that a set of users belongs to a group and of distributing a common key among them. Unlike standard authentication schemes, where one central authority authenticates users one by one, group authentication can handle the authentication process for all members of the group at once. Recently presented group authentication algorithms mainly exploit Lagrange polynomial interpolation along with elliptic curve groups over finite fields. As a fresh approach, this work suggests the use of linear spaces for group authentication and key establishment for a group of any size. The linear-space approach reduces the computation and communication load needed to establish a common shared key among the group members. These advantages make the proposed method applicable to energy- and resource-constrained devices. In addition to providing lightweight authentication and key agreement, this proposal allows any user in a group to make a non-member a member, which is expected to be useful for autonomous systems in the future. The scheme is designed so that the sponsors of such members can easily be recognized by anyone in the group. Unlike other group authentication schemes based on Lagrange polynomial interpolation, the proposed scheme does not give adversaries a tool to compromise the whole group's secrets using only a few members' shares, and it allows non-members to be recognized easily, which prevents service interruption attacks.
Translated: 2024-05-08 03:57:05 Published: 2024-05-04
# MARS via LASSO

MARS via LASSO ( http://arxiv.org/abs/2111.11694v4 )

License: see the linked page
Multivariate adaptive regression splines (MARS) is a popular method for nonparametric regression introduced by Friedman in 1991. MARS fits simple nonlinear and non-additive functions to regression data. We propose and study a natural lasso variant of the MARS method. Our method is based on least squares estimation over a convex class of functions obtained by considering infinite-dimensional linear combinations of functions in the MARS basis and imposing a variation based complexity constraint. Our estimator can be computed via finite-dimensional convex optimization, although it is defined as a solution to an infinite-dimensional optimization problem. Under a few standard design assumptions, we prove that our estimator achieves a rate of convergence that depends only logarithmically on dimension and thus avoids the usual curse of dimensionality to some extent. We also show that our method is naturally connected to nonparametric estimation techniques based on smoothness constraints. We implement our method with a cross-validation scheme for the selection of the involved tuning parameter and compare it to the usual MARS method in various simulation and real data settings.
Translated: 2024-05-08 03:57:05 Published: 2024-05-04
# Light Clients for Lazy Blockchains

Light Clients for Lazy Blockchains ( http://arxiv.org/abs/2203.15968v3 )

License: see the linked page
Lazy blockchains decouple consensus from transaction verification and execution to increase throughput. Although they can contain invalid transactions (e.g., double spends) as a result, these can easily be filtered out by full nodes that check if there have been previous conflicting transactions. However, creating light (SPV) clients that do not see the whole transaction history becomes a challenge: A record of a transaction on the chain does not necessarily entail transaction confirmation. In this paper, we devise a protocol that enables the creation of efficient light clients for lazy blockchains. The number of interaction rounds and the communication complexity of our protocol are logarithmic in the blockchain execution time. Our construction is based on a bisection game that traverses the Merkle tree containing the ledger of all - valid or invalid - transactions. We prove that our proof system is succinct, complete and sound, and empirically demonstrate the feasibility of our scheme.
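The bisection idea can be illustrated with a minimal sketch. It assumes two parties each hold a Merkle tree over the same number of ledger entries (a power of two) and want to locate the first entry on which they disagree. The names `build_tree` and `first_disagreement` are hypothetical, and the sketch compares both trees locally instead of modelling the paper's interactive protocol between a light client and full nodes.

```python
import hashlib


def h(data: bytes) -> bytes:
    """SHA-256 digest, used for both leaves and inner nodes."""
    return hashlib.sha256(data).digest()


def build_tree(leaves):
    """Return the tree as a list of levels: levels[0] holds the leaf hashes,
    levels[-1] holds the single root. Assumes len(leaves) is a power of two."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def first_disagreement(tree_a, tree_b):
    """Walk from the root to a leaf, always descending into the leftmost
    child on which the two trees differ. Returns (leaf_index, rounds),
    or (None, 0) when the roots already agree."""
    depth = len(tree_a) - 1
    if tree_a[depth][0] != tree_b[depth][0]:
        index, rounds = 0, 0
        while depth > 0:
            depth -= 1
            left = 2 * index
            # If the left children match, the difference must be on the right.
            index = left if tree_a[depth][left] != tree_b[depth][left] else left + 1
            rounds += 1
        return index, rounds
    return None, 0


honest = [b"tx%d" % i for i in range(8)]
forged = honest[:5] + [b"bad5", b"bad6", b"bad7"]
print(first_disagreement(build_tree(honest), build_tree(forged)))  # (5, 3)
```

With 8 leaves the walk takes 3 rounds, i.e. the number of rounds grows with the logarithm of the ledger length, which is the property the abstract highlights.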
Translated: 2024-05-08 03:49:02 Published: 2024-05-04
# The Quadruplon in a Monolayer Semiconductor

The Quadruplon in a Monolayer Semiconductor ( http://arxiv.org/abs/2207.12760v3 )

License: see the linked page
Understanding the structure of matter or materials and interaction or correlations among the constituent elementary particles are the central tasks of all branches of science, from physics, chemistry, to biology. In physics, this ultimate goal has spurred a constant search for high-order correlated entities or composite particles for nearly all states and forms of matter, from elementary particles, nuclei, cold atoms, to condensed matter. So far, such composite particles involving two or three constituent particles have been experimentally identified, such as the Cooper pairs, excitons, and trions in condensed matter physics, or diquarks and mesons in quantum chromodynamics. Although the four-body irreducible entities have long been predicted theoretically in a variety of materials systems alternatively as quadruplons, quadrons, or quartets, the closely related experimental observation so far seems to be restricted to the field of elementary particles (e.g. the recent tetraquark at CERN) only. In this article, we present the first experimental evidence for the existence of a four-body irreducible entity, the quadruplon, involving two electrons and two holes in a monolayer of Molybdenum Ditelluride. Using the optical pump-probe technique, we discovered a series of new spectral features that are distinct from those of trions and bi-excitons. By solving the four-body Bethe-Salpeter equation in conjunction with the cluster expansion approach, we are able to explain these spectral features in terms of the four-body irreducible cluster or the quadruplons. In contrast to a bi-exciton which consists of two weakly bound excitons, a quadruplon consists of two electrons and two holes without the presence of an exciton.
Translated: 2024-05-08 03:39:13 Published: 2024-05-04
# Can Brain Signals Reveal Inner Alignment with Human Languages?

Can Brain Signals Reveal Inner Alignment with Human Languages? ( http://arxiv.org/abs/2208.06348v5 )

License: see the linked page
Brain signals, such as electroencephalography (EEG), and human languages have each been widely explored for many downstream tasks; however, the connection between them has not been well explored. In this study, we explore the relationship and dependency between EEG and language. To study at the representation level, we introduced MTAM, a Multimodal Transformer Alignment Model, to observe coordinated representations between the two modalities. We used various relationship alignment-seeking techniques, such as Canonical Correlation Analysis and Wasserstein Distance, as loss functions to transfigure features. On downstream applications, sentiment analysis and relation detection, we achieved new state-of-the-art results on two datasets, ZuCo and K-EmoCon. Our method achieved an F1-score improvement of 1.7% on K-EmoCon and 9.3% on ZuCo for sentiment analysis, and 7.4% on ZuCo for relation detection. In addition, we provide interpretations of the performance improvement: (1) feature distribution shows the effectiveness of the alignment module for discovering and encoding the relationship between EEG and language; (2) alignment weights show the influence of different language semantics as well as EEG frequency features; (3) brain topographical maps provide an intuitive demonstration of the connectivity in the brain regions. Our code is available at https://github.com/Jason-Qiu/EEG_Language_Alignment.
Translated: 2024-05-08 03:39:13 Published: 2024-05-04
# Experimental verification of the quantum nature of a neural network

Experimental verification of the quantum nature of a neural network ( http://arxiv.org/abs/2209.07577v3 )

License: see the linked page
Neural networks are being used to improve the probing of the state spaces of many-particle systems as approximations to wavefunctions and in order to avoid the recurring sign problem of quantum Monte Carlo. One may ask whether the usual classical neural networks have some actual hidden quantum properties that make them such suitable tools for a highly coupled quantum problem. I discuss here what makes a system quantum and to what extent we can interpret a neural network as having quantum remnants. I suggest that a system can be quantum both due to its fundamental quantum constituents and due to the rules of its functioning; therefore, we can obtain entanglement both due to the quantum constituents' nature and due to the functioning rules, or, in category theory terms, both due to the quantum nature of the objects of a category and of the maps. From a practical point of view, I suggest a possible experiment that could extract entanglement from the quantum functioning rules (maps) of an otherwise classical (from the point of view of the constituents) neural network.
Translated: 2024-05-08 03:39:13 Published: 2024-05-04
# Revisiting Few-Shot Learning from a Causal Perspective

Revisiting Few-Shot Learning from a Causal Perspective ( http://arxiv.org/abs/2209.13816v2 )

License: see the linked page
Few-shot learning with the $N$-way $K$-shot scheme is an open challenge in machine learning. Many metric-based approaches have been proposed to tackle this problem, e.g., the Matching Networks and CLIP-Adapter. Although these approaches have shown significant progress, the mechanism of why these methods succeed has not been well explored. In this paper, we try to interpret these metric-based few-shot learning methods via a causal mechanism. We show that the existing approaches can be viewed as specific forms of front-door adjustment, which can alleviate the effect of spurious correlations and thus learn the causality. This causal interpretation could provide us with a new perspective to better understand these existing metric-based methods. Further, based on this causal interpretation, we introduce two causal methods for metric-based few-shot learning, which consider not only the relationship between examples but also the diversity of representations. Experimental results demonstrate the superiority of our proposed methods in few-shot classification on various benchmark datasets. Code is available at https://github.com/lingl1024/causalFewShot.
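For reference, the front-door adjustment mentioned in the abstract has the following standard form. The symbols are generic (treatment $X$, mediator $Z$, outcome $Y$) and are an assumption for illustration; the paper may use different notation and a different choice of mediator.

$$P(Y \mid do(X = x)) = \sum_{z} P(z \mid x) \sum_{x'} P(Y \mid x', z)\, P(x')$$

The inner sum averages over treatment values $x'$. Under the usual front-door conditions, where $Z$ carries the whole effect of $X$ on $Y$, this averaging is what removes the influence of an unobserved confounder of $X$ and $Y$.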
Translated: 2024-05-08 03:39:13 Published: 2024-05-04
# QuACK: Accelerating Gradient-Based Quantum Optimization with Koopman Operator Learning

QuACK: Accelerating Gradient-Based Quantum Optimization with Koopman Operator Learning ( http://arxiv.org/abs/2211.01365v3 )

License: see the linked page
Quantum optimization, a key application of quantum computing, has traditionally been stymied by the linearly increasing complexity of gradient calculations with an increasing number of parameters. This work bridges the gap between Koopman operator theory, which has found utility in applications because it allows for a linear representation of nonlinear dynamical systems, and natural gradient methods in quantum optimization, leading to a significant acceleration of gradient-based quantum optimization. We present Quantum-circuit Alternating Controlled Koopman learning (QuACK), a novel framework that leverages an alternating algorithm for efficient prediction of gradient dynamics on quantum computers. We demonstrate QuACK's remarkable ability to accelerate gradient-based optimization across a range of applications in quantum optimization and machine learning. In fact, our empirical studies, spanning quantum chemistry, quantum condensed matter, quantum machine learning, and noisy environments, have shown speedups of more than 200x in the overparameterized regime, 10x in the smooth regime, and 3x in the non-smooth regime. With QuACK, we offer a robust advancement that harnesses the advantage of gradient-based quantum optimization for practical benefits.
Translated: 2024-05-08 03:39:13 Published: 2024-05-04
# Low Variance Off-policy Evaluation with State-based Importance Sampling

Low Variance Off-policy Evaluation with State-based Importance Sampling ( http://arxiv.org/abs/2212.03932v5 )

License: see the linked page
In many domains, the exploration process of reinforcement learning will be too costly as it requires trying out suboptimal policies, resulting in a need for off-policy evaluation, in which a target policy is evaluated based on data collected from a known behaviour policy. In this context, importance sampling estimators provide estimates for the expected return by weighting the trajectory based on the probability ratio of the target policy and the behaviour policy. Unfortunately, such estimators have a high variance and therefore a large mean squared error. This paper proposes state-based importance sampling estimators which reduce the variance by dropping certain states from the computation of the importance weight. To illustrate their applicability, we demonstrate state-based variants of ordinary importance sampling, weighted importance sampling, per-decision importance sampling, incremental importance sampling, doubly robust off-policy evaluation, and stationary density ratio estimation. Experiments in four domains show that state-based methods consistently yield reduced variance and improved accuracy compared to their traditional counterparts.
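A minimal sketch of the weighting step may help make this concrete. It assumes tabular policies stored as nested dictionaries, undiscounted returns, and a hand-picked set of states to drop; how that set is chosen is the paper's contribution and is not modelled here. The names `importance_weight` and `is_estimate` are hypothetical.

```python
from math import prod


def importance_weight(trajectory, target, behaviour, dropped_states=frozenset()):
    """Product of target/behaviour action-probability ratios over the steps
    of one trajectory, skipping steps whose state is in dropped_states."""
    return prod(
        target[state][action] / behaviour[state][action]
        for state, action, _reward in trajectory
        if state not in dropped_states
    )


def is_estimate(trajectories, target, behaviour, dropped_states=frozenset()):
    """Ordinary importance sampling estimate of the expected (undiscounted)
    return; with a non-empty dropped_states it becomes a state-based variant."""
    total = 0.0
    for trajectory in trajectories:
        episode_return = sum(reward for _s, _a, reward in trajectory)
        weight = importance_weight(trajectory, target, behaviour, dropped_states)
        total += weight * episode_return
    return total / len(trajectories)


# Toy policies: the action taken in "s1" is assumed not to affect the return.
target = {"s0": {"a": 0.8, "b": 0.2}, "s1": {"a": 0.5, "b": 0.5}}
behaviour = {"s0": {"a": 0.5, "b": 0.5}, "s1": {"a": 0.25, "b": 0.75}}
episode = [("s0", "a", 1.0), ("s1", "a", 0.0)]

full_weight = importance_weight(episode, target, behaviour)
reduced_weight = importance_weight(episode, target, behaviour, {"s1"})
```

In this toy episode the full weight is 1.6 × 2.0 = 3.2, while dropping `s1` leaves 1.6. Removing ratios from the product in this way is what shrinks the spread of the weights, and hence the variance of the estimate.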
Translated: 2024-05-08 03:39:13 Published: 2024-05-04
# UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation

UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation ( http://arxiv.org/abs/2212.04497v3 )

License: see the linked page
Owing to the success of transformer models, recent works study their applicability in 3D medical segmentation tasks. Within the transformer models, the self-attention mechanism is one of the main building blocks that strives to capture long-range dependencies. However, the self-attention operation has quadratic complexity, which proves to be a computational bottleneck, especially in volumetric medical imaging, where the inputs are 3D with numerous slices. In this paper, we propose a 3D medical image segmentation approach, named UNETR++, that offers both high-quality segmentation masks and efficiency in terms of parameters, compute cost, and inference speed. The core of our design is the introduction of a novel efficient paired attention (EPA) block that efficiently learns spatial and channel-wise discriminative features using a pair of inter-dependent branches based on spatial and channel attention. Our spatial attention formulation is efficient, having linear complexity with respect to the input sequence length. To enable communication between spatial and channel-focused branches, we share the weights of query and key mapping functions that provide a complementary benefit (paired attention), while also reducing the overall network parameters. Our extensive evaluations on five benchmarks, Synapse, BTCV, ACDC, BRaTs, and Decathlon-Lung, reveal the effectiveness of our contributions in terms of both efficiency and accuracy. On Synapse, our UNETR++ sets a new state-of-the-art with a Dice Score of 87.2%, while being significantly more efficient, with a reduction of over 71% in terms of both parameters and FLOPs, compared to the best method in the literature. Code: https://github.com/Amshaker/unetr_plus_plus.
Translated: 2024-05-08 01:45:50 Published: 2024-05-04
# Analyzing And Editing Inner Mechanisms Of Backdoored Language Models

Analyzing And Editing Inner Mechanisms Of Backdoored Language Models ( http://arxiv.org/abs/2302.12461v3 )

License: see the linked page
Poisoning of data sets is a potential security threat to large language models that can lead to backdoored models. A description of the internal mechanisms of backdoored language models and how they process trigger inputs, e.g., when switching to toxic language, has yet to be found. In this work, we study the internal representations of transformer-based backdoored language models and determine early-layer MLP modules as most important for the backdoor mechanism in combination with the initial embedding projection. We use this knowledge to remove, insert, and modify backdoor mechanisms with engineered replacements that reduce the MLP module outputs to essentials for the backdoor mechanism. To this end, we introduce PCP ablation, where we replace transformer modules with low-rank matrices based on the principal components of their activations. We demonstrate our results on backdoored toy, backdoored large, and non-backdoored open-source models. We show that we can improve the backdoor robustness of large language models by locally constraining individual modules during fine-tuning on potentially poisonous data sets. Trigger warning: Offensive language.
Translated: 2024-05-08 01:45:50 Published: 2024-05-04
# SUNY: A Visual Interpretation Framework for Convolutional Neural Networks from a Necessary and Sufficient Perspective

SUNY: A Visual Interpretation Framework for Convolutional Neural Networks from a Necessary and Sufficient Perspective ( http://arxiv.org/abs/2303.00244v2 )

License: see the linked page
Researchers have proposed various methods for visually interpreting the Convolutional Neural Network (CNN) via saliency maps, which include Class-Activation-Map (CAM) based approaches as a leading family. However, in terms of the internal design logic, existing CAM-based approaches often overlook the causal perspective that answers the core "why" question to help humans understand the explanation. Additionally, current CNN explanations lack the consideration of both necessity and sufficiency, two complementary sides of a desirable explanation. This paper presents a causality-driven framework, SUNY, designed to rationalize the explanations toward better human understanding. Using the CNN model's input features or internal filters as hypothetical causes, SUNY generates explanations by bi-directional quantifications on both the necessary and sufficient perspectives. Extensive evaluations justify that SUNY not only produces more informative and convincing explanations from the angles of necessity and sufficiency, but also achieves performances competitive to other approaches across different CNN architectures over large-scale datasets, including ILSVRC2012 and CUB-200-2011.
Translated: 2024-05-08 01:45:50 Published: 2024-05-04
# Robust affine point matching via quadratic assignment on Grassmannians

Robust affine point matching via quadratic assignment on Grassmannians ( http://arxiv.org/abs/2303.02698v4 )

License: see the linked page
Robust Affine matching with Grassmannians (RAG) is a new algorithm to perform affine registration of point clouds. The algorithm is based on minimizing the Frobenius distance between two elements of the Grassmannian. For this purpose, an indefinite relaxation of the Quadratic Assignment Problem (QAP) is used, and several approaches to affine feature matching are studied and compared. Experiments demonstrate that RAG is more robust to noise and point discrepancy than previous methods.
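The role of the Grassmannian can be sketched as follows. The abstract does not spell out how a point cloud is mapped to a subspace, so this sketch assumes one natural choice: a 2D cloud of $n$ points is represented by the span of the all-ones vector and its coordinate columns, encoded as an orthogonal projector. That span is unchanged by any invertible affine map, so the Frobenius distance between projectors depends only on how the points are ordered. All function names are hypothetical.

```python
from math import sqrt


def orthonormal_basis(columns):
    """Modified Gram-Schmidt on a list of column vectors."""
    basis = []
    for column in columns:
        v = list(column)
        for q in basis:
            dot = sum(a * b for a, b in zip(v, q))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = sqrt(sum(a * a for a in v))
        if norm > 1e-12:  # skip linearly dependent columns
            basis.append([a / norm for a in v])
    return basis


def projector(points):
    """Orthogonal projector onto span{1, x-coordinates, y-coordinates}."""
    n = len(points)
    columns = [[1.0] * n] + [[p[k] for p in points] for k in range(len(points[0]))]
    basis = orthonormal_basis(columns)
    return [[sum(q[i] * q[j] for q in basis) for j in range(n)] for i in range(n)]


def frobenius_distance(P, Q):
    """Frobenius norm of P - Q for two square matrices given as nested lists."""
    return sqrt(sum((a - b) ** 2 for rp, rq in zip(P, Q) for a, b in zip(rp, rq)))


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 3.0), (1.0, 2.0)]
# An arbitrary invertible affine map, chosen only for illustration.
moved = [(2 * x + y + 1, -x + 3 * y - 2) for x, y in points]
shuffled = [moved[3], moved[1], moved[2], moved[0], moved[4]]

same_order = frobenius_distance(projector(points), projector(moved))
wrong_order = frobenius_distance(projector(points), projector(shuffled))
```

Here `same_order` is zero up to rounding while `wrong_order` is clearly positive. Searching over permutations for the ordering that minimizes this distance is a quadratic assignment problem, which is where the relaxation mentioned in the abstract would come in.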
Translated: 2024-05-08 01:45:49 Published: 2024-05-04
# An Autonomous Non-monolithic Agent with Multi-mode Exploration based on Options Framework

An Autonomous Non-monolithic Agent with Multi-mode Exploration based on Options Framework ( http://arxiv.org/abs/2305.01322v3 )

License: see the linked page
Most exploration research on reinforcement learning (RL) has paid attention to 'the way of exploration', that is, 'how to explore'. The other line of exploration research, 'when to explore', has not been the main focus of RL exploration research. In the usual RL exploration behaviour, the 'when' of a monolithic exploration binds an exploratory action to an exploitational action of an agent. Recently, non-monolithic exploration research has emerged to examine the mode-switching exploration behaviour of humans and animals. The ultimate purpose of our research is to enable an agent to decide autonomously when to explore or exploit. We describe the initial research of an autonomous multi-mode exploration of non-monolithic behaviour in an options framework. Comparative experimental results show that our method performs better than the existing non-monolithic exploration method.
Translated: 2024-05-08 01:36:03 Published: 2024-05-04
# Accounting for Quantum Effects in Atomistic Spin Dynamics

Accounting for Quantum Effects in Atomistic Spin Dynamics ( http://arxiv.org/abs/2305.17082v2 )

License: see the linked page
Atomistic spin dynamics (ASD) is a standard tool to model the magnetization dynamics of a variety of materials. The fundamental dynamical model underlying ASD is entirely classical. In this paper, we present two approaches to effectively incorporate quantum effects into ASD simulations, thus enhancing their low temperature predictions. The first allows one to simulate the magnetic behavior of a quantum spin system by solving the equations of motion of a classical spin system at an effective temperature relative to the critical temperature. This effective temperature is determined a priori from the microscopic properties of the system. The second approach is based on a semi-classical model where classical spins interact with an environment with a quantum-like power spectrum. The parameters that characterize this model can be calculated ab initio or extracted from experiments. This semi-classical model quantitatively reproduces the absolute temperature behavior of a magnetic system, thus accounting for the quantum mechanical aspects of its dynamics, even at low temperature. The methods presented here can be readily implemented in current ASD simulations with no additional complexity cost.
Translated: 2024-05-08 01:26:19 Published: 2024-05-04
# Decentralized Federated Learning: A Survey and Perspective

Decentralized Federated Learning: A Survey and Perspective ( http://arxiv.org/abs/2306.01603v2 )

License: see the linked page
Federated learning (FL) has been gaining attention for its ability to share knowledge while maintaining user data, protecting privacy, increasing learning efficiency, and reducing communication overhead. Decentralized FL (DFL) is a decentralized network architecture that eliminates the need for a central server in contrast to centralized FL (CFL). DFL enables direct communication between clients, resulting in significant savings in communication resources. In this paper, a comprehensive survey and profound perspective are provided for DFL. First, a review of the methodology, challenges, and variants of CFL is conducted, laying the background of DFL. Then, a systematic and detailed perspective on DFL is introduced, including iteration order, communication protocols, network topologies, paradigm proposals, and temporal variability. Next, based on the definition of DFL, several extended variants and categorizations are proposed with state-of-the-art (SOTA) technologies. Lastly, in addition to summarizing the current challenges in the DFL, some possible solutions and future research directions are also discussed.
Translated: 2024-05-08 01:26:19 Published: 2024-05-04
# Identifiable causal inference with noisy treatment and no side information

Identifiable causal inference with noisy treatment and no side information ( http://arxiv.org/abs/2306.10614v2 )

License: see the linked page
In some causal inference scenarios, the treatment variable is measured inaccurately, for instance in epidemiology or econometrics. Failure to correct for the effect of this measurement error can lead to biased causal effect estimates. Previous research has not studied methods that address this issue from a causal viewpoint while allowing for complex nonlinear dependencies and without assuming access to side information. For such a scenario, this study proposes a model that assumes a continuous treatment variable that is inaccurately measured. Building on existing results for measurement error models, we prove that our model's causal effect estimates are identifiable, even without knowledge of the measurement error variance or other side information. Our method relies on a deep latent variable model in which Gaussian conditionals are parameterized by neural networks, and we develop an amortized importance-weighted variational objective for training the model. Empirical results demonstrate the method's good performance with unknown measurement error. More broadly, our work extends the range of applications in which reliable causal inference can be conducted.
Translated: 2024-05-08 01:16:13 Published: 2024-05-04
# Unveiling the Potential of Sentiment: Can Large Language Models Predict Chinese Stock Price Movements?

Unveiling the Potential of Sentiment: Can Large Language Models Predict Chinese Stock Price Movements? ( http://arxiv.org/abs/2306.14222v2 )

License: see the linked page
The rapid advancement of Large Language Models (LLMs) has spurred discussions about their potential to enhance quantitative trading strategies. LLMs excel in analyzing sentiments about listed companies from financial news, providing critical insights for trading decisions. However, the performance of LLMs in this task varies substantially due to their inherent characteristics. This paper introduces a standardized experimental procedure for comprehensive evaluations. We detail the methodology using three distinct LLMs, each embodying a unique approach to performance enhancement, applied specifically to the task of sentiment factor extraction from large volumes of Chinese news summaries. Subsequently, we develop quantitative trading strategies using these sentiment factors and conduct back-tests in realistic scenarios. Our results will offer perspectives about the performances of Large Language Models applied to extracting sentiments from Chinese news texts.
Translated: 2024-05-08 01:16:13 Published: 2024-05-04
# Vector Commitments with Efficient Updates

Vector Commitments with Efficient Updates ( http://arxiv.org/abs/2307.04085v5 )

License: see the linked page
Dynamic vector commitments that enable local updates of opening proofs have applications ranging from verifiable databases with membership changes to stateless clients on blockchains. In these applications, each user maintains a relevant subset of the committed messages and the corresponding opening proofs with the goal of ensuring a succinct global state. When the messages are updated, users are given some global update information and update their opening proofs to match the new vector commitment. We investigate the relation between the size of the update information and the runtime complexity needed to update an individual opening proof. Existing vector commitment schemes require that either the information size or the runtime scale linearly in the number $k$ of updated state elements. We construct a vector commitment scheme that asymptotically achieves an update information length and a runtime that are both sublinear in $k$, namely $k^\nu$ and $k^{1-\nu}$ for any $\nu \in (0,1)$. We prove an information-theoretic lower bound on the relation between the update information size and runtime complexity that shows the asymptotic optimality of our scheme. For $\nu = 1/2$, our constructions outperform Verkle commitments by about a factor of $2$ in terms of both the update information size and runtime, but make use of larger public parameters.
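The trade-off between the two exponents can be read off directly. The following worked example ignores constants and any lower-order factors, which the abstract does not specify, and the value $k = 10^4$ is chosen only for illustration:

$$k^{\nu} \cdot k^{1-\nu} = k, \qquad \nu = \tfrac{1}{2} \;\Rightarrow\; k^{\nu} = k^{1-\nu} = \sqrt{k}, \qquad k = 10^{4} \;\Rightarrow\; \sqrt{k} = 100.$$

So for $10^4$ updated state elements, the balanced setting gives on the order of $100$ for both the update information size and the per-proof update work, where a scheme that is linear in one of the two quantities would incur on the order of $10^4$ there. Moving $\nu$ away from $1/2$ shrinks one quantity at the expense of the other, since their product stays at $k$.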
Translated: 2024-05-08 01:16:13 Published: 2024-05-04
# Bengali Fake Reviews: A Benchmark Dataset and Detection System

Bengali Fake Reviews: A Benchmark Dataset and Detection System ( http://arxiv.org/abs/2308.01987v3 )

License: see the linked page
The proliferation of fake reviews on various online platforms has created a major concern for both consumers and businesses. Such reviews can deceive customers and cause damage to the reputation of products or services, making it crucial to identify them. Although the detection of fake reviews has been extensively studied in English, detecting fake reviews in non-English languages such as Bengali is still a relatively unexplored research area. This paper introduces the Bengali Fake Review Detection (BFRD) dataset, the first publicly available dataset for identifying fake reviews in Bengali. The dataset consists of 7710 non-fake and 1339 fake food-related reviews collected from social media posts. To convert non-Bengali words in a review, a unique pipeline has been proposed that translates English words to their corresponding Bengali meaning and also back-transliterates Romanized Bengali to Bengali. We have conducted rigorous experimentation using multiple deep learning and pre-trained transformer language models to develop a reliable detection system. Finally, we propose a weighted ensemble model that combines four pre-trained transformers: BanglaBERT, BanglaBERT Base, BanglaBERT Large, and BanglaBERT Generator. According to the experiment results, the proposed ensemble model obtained a weighted F1-score of 0.9843 on 13390 reviews, including 1339 actual fake reviews and 5356 augmented fake reviews generated with the nlpaug library. The remaining 6695 reviews were randomly selected from the 7710 non-fake instances. The model achieved a 0.9558 weighted F1-score when the fake reviews were augmented using the bnaug library.
Translated: 2024-05-08 01:06:19 Published: 2024-05-04
# Robust Lagrangian and Adversarial Policy Gradient for Robust Constrained Markov Decision Processes

Robust Lagrangian and Adversarial Policy Gradient for Robust Constrained Markov Decision Processes ( http://arxiv.org/abs/2308.11267v2 )

License: see the linked page
The robust constrained Markov decision process (RCMDP) is a recent task-modelling framework for reinforcement learning that incorporates behavioural constraints and that provides robustness to errors in the transition dynamics model through the use of an uncertainty set. Simulating RCMDPs requires computing the worst-case dynamics based on value estimates for each state, an approach which has previously been used in the Robust Constrained Policy Gradient (RCPG). Highlighting potential downsides of RCPG such as not robustifying the full constrained objective and the lack of incremental learning, this paper introduces two algorithms, called RCPG with Robust Lagrangian and Adversarial RCPG. RCPG with Robust Lagrangian modifies RCPG by taking the worst-case dynamics based on the Lagrangian rather than either the value or the constraint. Adversarial RCPG also formulates the worst-case dynamics based on the Lagrangian but learns this directly and incrementally as an adversarial policy through gradient descent rather than indirectly and abruptly through constrained optimisation on a sorted value list. A theoretical analysis first derives the Lagrangian policy gradient for the policy optimisation of both proposed algorithms and then the adversarial policy gradient to learn the adversary for Adversarial RCPG. Empirical experiments injecting perturbations in inventory management and safe navigation tasks demonstrate the competitive performance of both algorithms compared to traditional RCPG variants as well as non-robust and non-constrained ablations. In particular, Adversarial RCPG ranks among the top two performing algorithms on all tests.
Translated: 2024-05-08 01:06:19 Published: 2024-05-04
# Multi-Object Graph Affordance Network: Goal-Oriented Planning through Learned Compound Object Affordances

Multi-Object Graph Affordance Network: Goal-Oriented Planning through Learned Compound Object Affordances ( http://arxiv.org/abs/2309.10426v3 )

License: see the linked page
Learning object affordances is an effective tool in the field of robot learning. While data-driven models investigate affordances of single or paired objects, there is a gap in the exploration of affordances of compound objects composed of an arbitrary number of objects. We propose the Multi-Object Graph Affordance Network, which models complex compound object affordances by learning the outcomes of robot actions that facilitate interactions between an object and a compound. Given the depth images of the objects, the object features are extracted via convolution operations and encoded in the nodes of graph neural networks. Graph convolution operations are used to encode the state of the compounds, which are used as input to decoders to predict the outcome of the object-compound interactions. After learning the compound object affordances, given different tasks, the learned outcome predictors are used to plan sequences of stack actions that involve stacking objects on top of each other, inserting smaller objects into larger containers, and passing ring-like objects through poles. We showed that our system successfully modeled the affordances of compound objects that include concave and convex objects, in both simulated and real-world environments. We benchmarked our system against a baseline model to highlight its advantages.
Translated: 2024-05-08 00:55:03 Published: 2024-05-04
# On the Benefit of Optimal Transport for Curriculum Reinforcement Learning

On the Benefit of Optimal Transport for Curriculum Reinforcement Learning ( http://arxiv.org/abs/2309.14091v2 )

License: see the linked page
Curriculum reinforcement learning (CRL) allows solving complex tasks by generating a tailored sequence of learning tasks, starting from easy ones and subsequently increasing their difficulty. Although the potential of curricula in RL has been clearly shown in various works, it is less clear how to generate them for a given learning environment, resulting in various methods aiming to automate this task. In this work, we focus on framing curricula as interpolations between task distributions, which has previously been shown to be a viable approach to CRL. Identifying key issues of existing methods, we frame the generation of a curriculum as a constrained optimal transport problem between task distributions. Benchmarks show that this way of curriculum generation can improve upon existing CRL methods, yielding high performance in various tasks with different characteristics.
Translated: 2024-05-08 00:55:03 Published: 2024-05-04
# LLM-grounded Video Diffusion Models

LLM-grounded Video Diffusion Models ( http://arxiv.org/abs/2309.17444v3 )

License: see the linked page
Text-conditioned diffusion models have emerged as a promising tool for neural video generation. However, current models still struggle with intricate spatiotemporal prompts and often generate restricted or incorrect motion. To address these limitations, we introduce LLM-grounded Video Diffusion (LVD). Instead of directly generating videos from the text inputs, LVD first leverages a large language model (LLM) to generate dynamic scene layouts based on the text inputs and subsequently uses the generated layouts to guide a diffusion model for video generation. We show that LLMs are able to understand complex spatiotemporal dynamics from text alone and generate layouts that align closely with both the prompts and the object motion patterns typically observed in the real world. We then propose to guide video diffusion models with these layouts by adjusting the attention maps. Our approach is training-free and can be integrated into any video diffusion model that admits classifier guidance. Our results demonstrate that LVD significantly outperforms its base video diffusion model and several strong baseline methods in faithfully generating videos with the desired attributes and motion patterns.
Translated: 2024-05-08 00:55:03 Published: 2024-05-04
# Offline Tracking with Object Permanence

Offline Tracking with Object Permanence ( http://arxiv.org/abs/2310.01288v4 )

License: see the linked page
To reduce the expensive labor cost of manually labeling autonomous driving datasets, an alternative is to automatically label the datasets using an offline perception system. However, objects might be temporally occluded. Such occlusion scenarios in the datasets are common yet underexplored in offline auto labeling. In this work, we propose an offline tracking model that focuses on occluded object tracks. It leverages the concept of object permanence, which means objects continue to exist even if they are not observed anymore. The model contains three parts: a standard online tracker, a re-identification (Re-ID) module that associates tracklets before and after occlusion, and a track completion module that completes the fragmented tracks. The Re-ID module and the track completion module use the vectorized map as one of the inputs to refine the tracking results with occlusion. The model can effectively recover the occluded object trajectories. It achieves state-of-the-art performance in 3D multi-object tracking by significantly improving the original online tracking result, showing its potential to be applied in offline auto labeling as a useful plugin to improve tracking by recovering occlusions.
Translated: 2024-05-08 00:45:15 Published: 2024-05-04
# Locality-Aware Graph-Rewiring in GNNs

Locality-Aware Graph-Rewiring in GNNs ( http://arxiv.org/abs/2310.01668v2 )

License: see the linked page
Graph Neural Networks (GNNs) are popular models for machine learning on graphs that typically follow the message-passing paradigm, whereby the feature of a node is updated recursively upon aggregating information over its neighbors. While exchanging messages over the input graph endows GNNs with a strong inductive bias, it can also make GNNs susceptible to over-squashing, thereby preventing them from capturing long-range interactions in the given graph. To rectify this issue, graph rewiring techniques have been proposed as a means of improving information flow by altering the graph connectivity. In this work, we identify three desiderata for graph-rewiring: (i) reduce over-squashing, (ii) respect the locality of the graph, and (iii) preserve the sparsity of the graph. We highlight fundamental trade-offs that occur between spatial and spectral rewiring techniques; while the former often satisfy (i) and (ii) but not (iii), the latter generally satisfy (i) and (iii) at the expense of (ii). We propose a novel rewiring framework that satisfies all of (i)--(iii) through a locality-aware sequence of rewiring operations. We then discuss a specific instance of such rewiring framework and validate its effectiveness on several real-world benchmarks, showing that it either matches or significantly outperforms existing rewiring approaches.
Translated: 2024-05-08 00:45:15 Published: 2024-05-04
# Improved Active Learning via Dependent Leverage Score Sampling

Improved Active Learning via Dependent Leverage Score Sampling ( http://arxiv.org/abs/2310.04966v2 )

License: see the linked page
We show how to obtain improved active learning methods in the agnostic (adversarial noise) setting by combining marginal leverage score sampling with non-independent sampling strategies that promote spatial coverage. In particular, we propose an easily implemented method based on the pivotal sampling algorithm, which we test on problems motivated by learning-based methods for parametric PDEs and uncertainty quantification. In comparison to independent sampling, our method reduces the number of samples needed to reach a given target accuracy by up to 50%. We support our findings with two theoretical results. First, we show that any non-independent leverage score sampling method that obeys a weak one-sided $\ell_{\infty}$ independence condition (which includes pivotal sampling) can actively learn $d$ dimensional linear functions with $O(d\log d)$ samples, matching independent sampling. This result extends recent work on matrix Chernoff bounds under $\ell_{\infty}$ independence, and may be of interest for analyzing other sampling strategies beyond pivotal sampling. Second, we show that, for the important case of polynomial regression, our pivotal method obtains an improved bound of $O(d)$ samples.
Translated: 2024-05-08 00:45:15 Published: 2024-05-04
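For readers unfamiliar with the baseline that the abstract above compares against, the following is a minimal sketch of independent leverage score sampling for polynomial regression. It does not implement the pivotal sampling method of the paper; the function names `leverage_scores` and `independent_leverage_sample`, the cubic Vandermonde design, and the reweighting convention are illustrative assumptions.

```python
import numpy as np

def leverage_scores(A):
    """Row leverage scores of A: squared row norms of an orthonormal
    basis for the column space (assumes A has full column rank)."""
    Q, _ = np.linalg.qr(A)  # reduced QR factorization
    return np.sum(Q**2, axis=1)

def independent_leverage_sample(A, m, rng):
    """Draw m row indices i.i.d. with probability proportional to the
    leverage scores, plus the usual 1/sqrt(m * p_i) rescaling weights."""
    tau = leverage_scores(A)
    p = tau / tau.sum()
    idx = rng.choice(A.shape[0], size=m, replace=True, p=p)
    weights = 1.0 / np.sqrt(m * p[idx])
    return idx, weights

# Hypothetical setup: cubic polynomial features on a uniform grid.
x = np.linspace(-1.0, 1.0, 50)
A = np.vander(x, 4)
idx, w = independent_leverage_sample(A, 20, np.random.default_rng(0))
```

The leverage scores of a full-column-rank matrix sum to its number of columns, which is why the sample complexity in the abstract is stated in terms of the dimension $d$.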
# Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations

Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations ( http://arxiv.org/abs/2310.06387v2 )

License: see the linked page
Large Language Models (LLMs) have shown remarkable success in various tasks, but concerns about their safety and the potential for generating harmful content have emerged. In this paper, we delve into the potential of In-Context Learning (ICL) to modulate the alignment of LLMs. Specifically, we propose the In-Context Attack (ICA), which employs strategically crafted harmful demonstrations to subvert LLMs, and the In-Context Defense (ICD), which bolsters model resilience through examples that demonstrate refusal to produce harmful responses. Through extensive experiments, we demonstrate the efficacy of ICA and ICD in respectively elevating and mitigating the success rates of jailbreaking prompts. Moreover, we offer theoretical insights into the mechanism by which a limited set of in-context demonstrations can pivotally influence the safety alignment of LLMs. Our findings illuminate the profound influence of ICL on LLM behavior, opening new avenues for improving the safety and alignment of LLMs.
Translated: 2024-05-08 00:45:15 Published: 2024-05-04
# Cousins Of The Vendi Score: A Family Of Similarity-Based Diversity Metrics For Science And Machine Learning

Cousins Of The Vendi Score: A Family Of Similarity-Based Diversity Metrics For Science And Machine Learning ( http://arxiv.org/abs/2310.12952v3 )

License: see the linked page
Measuring diversity accurately is important for many scientific fields, including machine learning (ML), ecology, and chemistry. The Vendi Score was introduced as a generic similarity-based diversity metric that extends the Hill number of order q=1 by leveraging ideas from quantum statistical mechanics. Contrary to many diversity metrics in ecology, the Vendi Score accounts for similarity and does not require knowledge of the prevalence of the categories in the collection to be evaluated for diversity. However, the Vendi Score treats each item in a given collection with a level of sensitivity proportional to the item's prevalence. This is undesirable in settings where there is a significant imbalance in item prevalence. In this paper, we extend the other Hill numbers using similarity to provide flexibility in allocating sensitivity to rare or common items. This leads to a family of diversity metrics -- Vendi scores with different levels of sensitivity -- that can be used in a variety of applications. We study the properties of the scores in a synthetic controlled setting where the ground truth diversity is known. We then test their utility in improving molecular simulations via Vendi Sampling. Finally, we use the Vendi scores to better understand the behavior of image generative models in terms of memorization, duplication, diversity, and sample quality.
Translated: 2024-05-08 00:35:16 Published: 2024-05-04
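The family of scores described in the abstract above can be illustrated with a short sketch. It assumes the standard construction: take the eigenvalues of the normalized similarity matrix $K/n$ and apply the Hill number of order $q$ to them, with $q=1$ recovering the original Vendi Score. The function name `vendi_score` and the eigenvalue cutoff are assumptions made for this example.

```python
import numpy as np

def vendi_score(K, q=1.0):
    """Order-q Vendi score of an n x n positive semidefinite similarity
    matrix K with ones on the diagonal."""
    n = K.shape[0]
    lam = np.linalg.eigvalsh(K / n)
    lam = lam[lam > 1e-12]  # drop zero (and tiny negative) eigenvalues
    if np.isclose(q, 1.0):
        # Limit q -> 1: exponential of the Shannon entropy.
        return float(np.exp(-np.sum(lam * np.log(lam))))
    if np.isinf(q):
        # Limit q -> infinity: dominated by the largest eigenvalue.
        return float(1.0 / lam.max())
    return float(np.sum(lam**q) ** (1.0 / (1.0 - q)))

# Four mutually dissimilar items versus four identical items.
distinct = vendi_score(np.eye(4), q=2.0)
identical = vendi_score(np.ones((4, 4)), q=2.0)
```

Small values of `q` weight rare items more heavily and large values weight common items more heavily, which is the flexibility the abstract refers to.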
# Genetic Algorithms with Neural Cost Predictor for Solving Hierarchical Vehicle Routing Problems

Genetic Algorithms with Neural Cost Predictor for Solving Hierarchical Vehicle Routing Problems ( http://arxiv.org/abs/2310.14157v2 )

License: see the linked page
When vehicle routing decisions are intertwined with higher-level decisions, the resulting optimization problems pose significant challenges for computation. Examples are the multi-depot vehicle routing problem (MDVRP), where customers are assigned to depots before delivery, and the capacitated location routing problem (CLRP), where the locations of depots should be determined first. A simple and straightforward approach for such hierarchical problems would be to separate the higher-level decisions from the complicated vehicle routing decisions. For each higher-level decision candidate, we may evaluate the underlying vehicle routing problems to assess the candidate. As this approach requires solving vehicle routing problems multiple times, it has been regarded as impractical in most cases. We propose a novel deep-learning-based approach called Genetic Algorithm with Neural Cost Predictor (GANCP) to tackle the challenge and simplify algorithm developments. For each higher-level decision candidate, we predict the objective function values of the underlying vehicle routing problems using a pre-trained graph neural network without actually solving the routing problems. In particular, our proposed neural network learns the objective values of the HGS-CVRP open-source package that solves capacitated vehicle routing problems. Our numerical experiments show that this simplified approach is effective and efficient in generating high-quality solutions for both MDVRP and CLRP and has the potential to expedite algorithm developments for complicated hierarchical problems. We provide computational results evaluated in the standard benchmark instances used in the literature.
Translated: 2024-05-08 00:35:16 Published: 2024-05-04
# Privacy Amplification for Matrix Mechanisms

Privacy Amplification for Matrix Mechanisms ( http://arxiv.org/abs/2310.15526v2 )

License: see the linked page
Privacy amplification exploits randomness in data selection to provide tighter differential privacy (DP) guarantees. This analysis is key to DP-SGD's success in machine learning, but is not readily applicable to the newer state-of-the-art algorithms. This is because these algorithms, known as DP-FTRL, use the matrix mechanism to add correlated noise instead of independent noise as in DP-SGD. In this paper, we propose "MMCC", the first algorithm to analyze privacy amplification via sampling for any generic matrix mechanism. MMCC is nearly tight in that it approaches a lower bound as $\epsilon\to0$. To analyze correlated outputs in MMCC, we prove that they can be analyzed as if they were independent, by conditioning them on prior outputs. Our "conditional composition theorem" has broad utility: we use it to show that the noise added to binary-tree-DP-FTRL can asymptotically match the noise added to DP-SGD with amplification. Our amplification algorithm also has practical empirical utility: we show it leads to significant improvement in the privacy-utility trade-offs for DP-FTRL algorithms on standard benchmarks.
Translated: 2024-05-08 00:35:16 Published: 2024-05-04
# Imitation Bootstrapped Reinforcement Learning

Imitation Bootstrapped Reinforcement Learning ( http://arxiv.org/abs/2311.02198v5 )

License: see the linked page
Despite the considerable potential of reinforcement learning (RL), robotic control tasks predominantly rely on imitation learning (IL) due to its better sample efficiency. However, it is costly to collect comprehensive expert demonstrations that enable IL to generalize to all possible scenarios, and any distribution shift would require recollecting data for finetuning. Therefore, RL is appealing if it can build upon IL as an efficient autonomous self-improvement procedure. We propose imitation bootstrapped reinforcement learning (IBRL), a novel framework for sample-efficient RL with demonstrations that first trains an IL policy on the provided demonstrations and then uses it to propose alternative actions for both online exploration and bootstrapping target values. Compared to prior works that oversample the demonstrations or regularize RL with an additional imitation loss, IBRL is able to utilize high-quality actions from IL policies from the beginning of training, which greatly accelerates exploration and training efficiency. We evaluate IBRL on 6 simulation and 3 real-world tasks spanning various difficulty levels. IBRL significantly outperforms prior methods, and the improvement is particularly prominent in harder tasks.
Translated: 2024-05-08 00:25:31 Published: 2024-05-04
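One way to read the phrase "propose alternative actions for both online exploration and bootstrapping target values" in the IBRL abstract above is sketched below. The greedy choice between the two proposals by Q-value is an assumption of this sketch, and all names (`select_action`, `bootstrap_target`, the toy policies and critic) are hypothetical.

```python
def select_action(state, il_policy, rl_policy, q_value):
    """Pick whichever proposal (IL or RL) the critic scores higher."""
    candidates = [il_policy(state), rl_policy(state)]
    return max(candidates, key=lambda a: q_value(state, a))

def bootstrap_target(reward, next_state, done, il_policy, rl_policy,
                     q_target, gamma=0.99):
    """TD target that bootstraps from the better of the two proposals."""
    if done:
        return reward
    best_next = select_action(next_state, il_policy, rl_policy, q_target)
    return reward + gamma * q_target(next_state, best_next)

# Toy problem: the best action is a = 2 * s, which the IL policy knows
# and the untrained RL policy does not.
il = lambda s: 2.0 * s
rl = lambda s: 0.0
q = lambda s, a: -(a - 2.0 * s) ** 2

action = select_action(1.0, il, rl, q)
target = bootstrap_target(1.0, 1.0, False, il, rl, q)
```

In this toy setting the IL proposal wins early on; once the RL policy improves, its own proposal can take over without changing the selection rule.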
# Video-based Sequential Bayesian Homography Estimation for Soccer Field Registration

Video-based Sequential Bayesian Homography Estimation for Soccer Field Registration ( http://arxiv.org/abs/2311.10361v2 )

License: see the linked page
A novel Bayesian framework is proposed, which explicitly relates the homography of one video frame to the next through an affine transformation while explicitly modelling keypoint uncertainty. The literature has previously used differential homography between subsequent frames, but not in a Bayesian setting. In cases where Bayesian methods have been applied, camera motion is not adequately modelled, and keypoints are treated as deterministic. The proposed method, Bayesian Homography Inference from Tracked Keypoints (BHITK), employs a two-stage Kalman filter and significantly improves existing methods. Existing keypoint detection methods may be easily augmented with BHITK. It enables less sophisticated and less computationally expensive methods to outperform the state-of-the-art approaches in most homography evaluation metrics. Furthermore, the homography annotations of the WorldCup and TS-WorldCup datasets have been refined using a custom homography annotation tool that has been released for public use. The refined datasets are consolidated and released as the consolidated and refined WorldCup (CARWC) dataset.
Translated: 2024-05-08 00:25:31 Published: 2024-05-04
# Parkinson's Disease Classification Using Contrastive Graph Cross-View Learning with Multimodal Fusion of SPECT Images and Clinical Features

Parkinson's Disease Classification Using Contrastive Graph Cross-View Learning with Multimodal Fusion of SPECT Images and Clinical Features ( http://arxiv.org/abs/2311.14902v3 )

License: see the linked page
Parkinson's Disease (PD) affects millions globally, impacting movement. Prior research utilized deep learning for PD prediction, primarily focusing on medical images, neglecting the data's underlying manifold structure. This work proposes a multimodal approach encompassing both image and non-image features, leveraging contrastive cross-view graph fusion for PD classification. We introduce a novel multimodal co-attention module, integrating embeddings from separate graph views derived from low-dimensional representations of images and clinical features. This enables more robust and structured feature extraction for improved multi-view data analysis. Additionally, a simplified contrastive loss-based fusion method is devised to enhance cross-view fusion learning. Our graph-view multimodal approach achieves an accuracy of 91% and an area under the receiver operating characteristic curve (AUC) of 92.8% in five-fold cross-validation. It also demonstrates superior predictive capabilities on non-image data compared to solely machine learning-based methods.
Translated: 2024-05-08 00:15:17 Published: 2024-05-04
# A ripple in time: a discontinuity in American history

A ripple in time: a discontinuity in American history ( http://arxiv.org/abs/2312.01185v4 )

License: see the linked page
In this note we use the State of the Union Address (SOTU) dataset from Kaggle to make some surprising (and some not so surprising) observations pertaining to the general timeline of American history, and the character and nature of the addresses themselves. Our main approach is using vector embeddings, such as BERT (DistilBERT) and GPT-2. While it is widely believed that BERT (and its variations) is most suitable for NLP classification tasks, we find out that GPT-2 in conjunction with nonlinear dimension reduction methods such as UMAP provide better separation and stronger clustering. This makes GPT-2 + UMAP an interesting alternative. In our case, no model fine-tuning is required, and the pre-trained out-of-the-box GPT-2 model is enough. We also used a fine-tuned DistilBERT model for classification detecting which President delivered which address, with very good results (accuracy 93% - 95% depending on the run). An analogous task was performed to determine the year of writing, and we were able to pin it down to about 4 years (which is a single presidential term). It is worth noting that SOTU addresses provide relatively small writing samples (with about 8'000 words on average, and varying widely from under 2'000 words to more than 20'000), and that the number of authors is relatively large (we used SOTU addresses of 42 US presidents). This shows that the techniques employed turn out to be rather efficient, while all the computations described in this note can be performed using a single GPU instance of Google Colab. The accompanying code is available on GitHub.
Translated: 2024-05-08 00:15:17 Published: 2024-05-04
# Motion Informed Needle Segmentation in Ultrasound Images

Motion Informed Needle Segmentation in Ultrasound Images ( http://arxiv.org/abs/2312.01239v3 )

License: see the linked page
Segmenting a moving needle in ultrasound images is challenging due to the presence of artifacts, noise, and needle occlusion. This task becomes even more demanding in scenarios where data availability is limited. In this paper, we present a novel approach for needle segmentation for 2D ultrasound that combines classical Kalman Filter (KF) techniques with data-driven learning, incorporating both needle features and needle motion. Our method offers three key contributions. First, we propose a compatible framework that seamlessly integrates into commonly used encoder-decoder style architectures. Second, we demonstrate superior performance compared to recent state-of-the-art needle segmentation models using our novel convolutional neural network (CNN) based KF-inspired block, achieving a 15% reduction in pixel-wise needle tip error and an 8% reduction in length error. Third, to our knowledge we are the first to implement a learnable filter to incorporate non-linear needle motion for improving needle segmentation.
Translated: 2024-05-08 00:15:17 Published: 2024-05-04
# Detecting algorithmic bias in medical-AI models using trees

Detecting algorithmic bias in medical-AI models using trees ( http://arxiv.org/abs/2312.02959v5 )

License: see the linked page
With the growing prevalence of machine learning and artificial intelligence-based medical decision support systems, it is equally important to ensure that these systems provide patient outcomes in a fair and equitable fashion. This paper presents an innovative framework for detecting areas of algorithmic bias in medical-AI decision support systems. Our approach efficiently identifies potential biases in medical-AI models, specifically in the context of sepsis prediction, by employing the Classification and Regression Trees (CART) algorithm. We verify our methodology by conducting a series of synthetic data experiments, showcasing its ability to estimate areas of bias in controlled settings precisely. The effectiveness of the concept is further validated by experiments using electronic medical records from Grady Memorial Hospital in Atlanta, Georgia. These tests demonstrate the practical implementation of our strategy in a clinical environment, where it can function as a vital instrument for guaranteeing fairness and equity in AI-based medical decisions.
Translated: 2024-05-08 00:15:17 Published: 2024-05-04
# Gradient-based Parameter Selection for Efficient Fine-Tuning

Gradient-based Parameter Selection for Efficient Fine-Tuning ( http://arxiv.org/abs/2312.10136v2 )

License: see the linked page
With the growing size of pre-trained models, full fine-tuning and storing all the parameters for various downstream tasks is costly and infeasible. In this paper, we propose a new parameter-efficient fine-tuning method, Gradient-based Parameter Selection (GPS), demonstrating that only tuning a few selected parameters from the pre-trained model while keeping the remainder of the model frozen can generate similar or better performance compared with the full model fine-tuning method. Different from the existing popular and state-of-the-art parameter-efficient fine-tuning approaches, our method does not introduce any additional parameters and computational costs during both the training and inference stages. Another advantage is the model-agnostic and non-destructive property, which eliminates the need for any other design specific to a particular model. Compared with the full fine-tuning, GPS achieves 3.33% (91.78% vs. 88.45%, FGVC) and 9.61% (73.1% vs. 65.57%, VTAB) improvement of the accuracy with tuning only 0.36% parameters of the pre-trained model on average over 24 image classification tasks; it also demonstrates a significant improvement of 17% and 16.8% in mDice and mIoU, respectively, on medical image segmentation task. Moreover, GPS achieves state-of-the-art performance compared with existing PEFT methods.
Translated: 2024-05-08 00:05:27 Published: 2024-05-04
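The core idea in the GPS abstract above, tuning only a few gradient-selected parameters while the rest stay frozen, can be sketched in a few lines. The abstract does not give the exact selection rule, so the global top-k by gradient magnitude over a flat parameter list used here is a simplifying assumption, and `topk_mask` and `masked_sgd_step` are hypothetical helper names.

```python
def topk_mask(grads, k):
    """Boolean mask marking the k entries with the largest |gradient|."""
    order = sorted(range(len(grads)), key=lambda i: abs(grads[i]),
                   reverse=True)
    chosen = set(order[:k])
    return [i in chosen for i in range(len(grads))]

def masked_sgd_step(params, grads, mask, lr=0.1):
    """Update only the selected parameters; frozen ones pass through."""
    return [p - lr * g if m else p
            for p, g, m in zip(params, grads, mask)]

params = [1.0, 2.0, 3.0, 4.0]
grads = [0.1, -5.0, 0.2, 3.0]
mask = topk_mask(grads, k=2)  # computed once, before fine-tuning
new_params = masked_sgd_step(params, grads, mask)
```

Because the mask only gates updates to existing weights, no extra parameters are introduced at training or inference time, which matches the property the abstract emphasizes.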
# Enabling Accelerators for Graph Computing

Enabling Accelerators for Graph Computing ( http://arxiv.org/abs/2312.10561v2 )

License: see the linked page
The advent of Graph Neural Networks (GNNs) has revolutionized the field of machine learning, offering a novel paradigm for learning on graph-structured data. Unlike traditional neural networks, GNNs are capable of capturing complex relationships and dependencies inherent in graph data, making them particularly suited for a wide range of applications including social network analysis, molecular chemistry, and network security. GNNs, with their unique structure and operation, present new computational challenges compared to conventional neural networks. This requires comprehensive benchmarking and a thorough characterization of GNNs to obtain insight into their computational requirements and to identify potential performance bottlenecks. In this thesis, we aim to develop a better understanding of how GNNs interact with the underlying hardware and will leverage this knowledge as we design specialized accelerators and develop new optimizations, leading to more efficient and faster GNN computations. A pivotal component within GNNs is the Sparse General Matrix-Matrix Multiplication (SpGEMM) kernel, known for its computational intensity and irregular memory access patterns. In this thesis, we address the challenges posed by SpGEMM by implementing a highly optimized hashing-based SpGEMM kernel tailored for a custom accelerator. Synthesizing these insights and optimizations, we design state-of-the-art hardware accelerators capable of efficiently handling various GNN workloads. Our accelerator architectures are built on our characterization of GNN computational demands, providing clear motivation for our approaches. This exploration into novel models underlines our comprehensive approach, as we strive to enable accelerators that are not just performant, but also versatile, able to adapt to the evolving landscape of graph computing.
Translated: 2024-05-08 00:05:27 Published: 2024-05-04
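To make the hashing-based SpGEMM kernel mentioned in the abstract above more concrete, here is a software-only sketch of the general technique: a row-wise (Gustavson-style) sparse product in which each output row is accumulated in a hash table. It is not the thesis's accelerator kernel; the nested-dictionary storage format and the name `spgemm_hash` are assumptions made for illustration.

```python
def spgemm_hash(A, B):
    """C = A @ B for sparse matrices stored as {row: {col: value}}."""
    C = {}
    for i, a_row in A.items():
        acc = {}  # hash table accumulating row i of C
        for k, a_ik in a_row.items():
            # Row k of B contributes to row i of C, scaled by A[i][k].
            for j, b_kj in B.get(k, {}).items():
                acc[j] = acc.get(j, 0.0) + a_ik * b_kj
        if acc:
            C[i] = acc
    return C

A = {0: {0: 1.0, 2: 2.0}, 1: {1: 3.0}}
B = {0: {1: 4.0}, 1: {0: 5.0}, 2: {1: 6.0}}
C = spgemm_hash(A, B)
```

The irregular memory access pattern noted in the abstract shows up here as the data-dependent lookups `B.get(k, ...)` and `acc.get(j, ...)`, whose targets are only known at run time.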
# Detours for Navigating Instructional Videos

Detours for Navigating Instructional Videos ( http://arxiv.org/abs/2401.01823v2 )

License: see the linked page
We introduce the video detours problem for navigating instructional videos. Given a source video and a natural language query asking to alter the how-to video's current path of execution in a certain way, the goal is to find a related "detour video" that satisfies the requested alteration. To address this challenge, we propose VidDetours, a novel video-language approach that learns to retrieve the targeted temporal segments from a large repository of how-to's using video-and-text conditioned queries. Furthermore, we devise a language-based pipeline that exploits how-to video narration text to create weakly supervised training data. We demonstrate our idea applied to the domain of how-to cooking videos, where a user can detour from their current recipe to find steps with alternate ingredients, tools, and techniques. Validating on a ground truth annotated dataset of 16K samples, we show our model's significant improvements over best available methods for video retrieval and question answering, with recall rates exceeding the state of the art by 35%.
Translated: 2024-05-08 00:05:27 Published: 2024-05-04
# AttributionScanner: A Visual Analytics System for Model Validation with Metadata-Free Slice Finding

AttributionScanner: A Visual Analytics System for Model Validation with Metadata-Free Slice Finding ( http://arxiv.org/abs/2401.06462v2 )

License: see the linked page
Data slice finding is an emerging technique for validating machine learning (ML) models by identifying and analyzing subgroups in a dataset that exhibit poor performance, often characterized by distinct feature sets or descriptive metadata. However, in the context of validating vision models involving unstructured image data, this approach faces significant challenges, including the laborious and costly requirement for additional metadata and the complex task of interpreting the root causes of underperformance. To address these challenges, we introduce AttributionScanner, an innovative human-in-the-loop Visual Analytics (VA) system, designed for metadata-free data slice finding. Our system identifies interpretable data slices that involve common model behaviors and visualizes these patterns through an Attribution Mosaic design. Our interactive interface provides straightforward guidance for users to detect, interpret, and annotate predominant model issues, such as spurious correlations (model biases) and mislabeled data, with minimal effort. Additionally, it employs a cutting-edge model regularization technique to mitigate the detected issues and enhance the model's performance. The efficacy of AttributionScanner is demonstrated through use cases involving two benchmark datasets, with qualitative and quantitative evaluations showcasing its substantial effectiveness in vision model validation, ultimately leading to more reliable and accurate models.
Translated: 2024-05-08 00:05:27 Published: 2024-05-04
# Three Mechanisms of Feature Learning in the Exact Solution of a Latent Variable Model

Three Mechanisms of Feature Learning in the Exact Solution of a Latent Variable Model ( http://arxiv.org/abs/2401.07085v2 )

License: see the linked page
We identify and exactly solve the learning dynamics of a one-hidden-layer linear model at any finite width whose limits exhibit both the kernel phase and the feature learning phase. We analyze the phase diagram of this model in different limits of common hyperparameters including width, layer-wise learning rates, scale of output, and scale of initialization. Our solution identifies three novel prototype mechanisms of feature learning: (1) learning by alignment, (2) learning by disalignment, and (3) learning by rescaling. In sharp contrast, none of these mechanisms is present in the kernel regime of the model. We empirically demonstrate that these discoveries also appear in deep nonlinear networks in real tasks.
Translated: 2024-05-07 23:55:35 Published: 2024-05-04
# Quality-Diversity Algorithms Can Provably Be Helpful for Optimization

Quality-Diversity Algorithms Can Provably Be Helpful for Optimization ( http://arxiv.org/abs/2401.10539v2 )

License: see the linked page
Quality-Diversity (QD) algorithms are a new type of Evolutionary Algorithms (EAs), aiming to find a set of high-performing, yet diverse solutions. They have found many successful applications in reinforcement learning and robotics, helping improve the robustness in complex environments. Furthermore, they often empirically find a better overall solution than traditional search algorithms which explicitly search for a single highest-performing solution. However, their theoretical analysis lags far behind, leaving many fundamental questions unexplored. In this paper, we try to shed some light on the optimization ability of QD algorithms via rigorous running time analysis. By comparing the popular QD algorithm MAP-Elites with $(\mu+1)$-EA (a typical EA focusing on finding better objective values only), we prove that on two NP-hard problem classes with wide applications, i.e., monotone approximately submodular maximization with a size constraint, and set cover, MAP-Elites can achieve the (asymptotically) optimal polynomial-time approximation ratio, while $(\mu+1)$-EA requires exponential expected time on some instances. This provides theoretical justification for the claim that QD algorithms can be helpful for optimization, and discloses that the simultaneous search for high-performing solutions with diverse behaviors can provide stepping stones to good overall solutions and help avoid local optima.
Translated: 2024-05-07 23:55:35 Published: 2024-05-04
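As a rough illustration of how MAP-Elites keeps "stepping stones" in the setting of the abstract above, the sketch below runs it on a small maximum-coverage instance (a monotone submodular objective with a size constraint). Using the subset size as the behavior descriptor, the bit-flip mutation rate, and the toy instance are all assumptions of this sketch rather than details taken from the paper.

```python
import random

def coverage(subsets, chosen):
    """Number of elements covered by the chosen subsets."""
    covered = set()
    for i in chosen:
        covered |= subsets[i]
    return len(covered)

def map_elites(subsets, k, iters=2000, seed=0):
    """Archive maps a cell (solution size) to its best (solution, fitness)."""
    rng = random.Random(seed)
    n = len(subsets)
    archive = {0: (frozenset(), 0)}
    for _ in range(iters):
        parent = archive[rng.choice(sorted(archive))][0]
        child = set(parent)
        for i in range(n):  # flip each bit with probability 1/n
            if rng.random() < 1.0 / n:
                child ^= {i}
        if len(child) > k:  # violates the size constraint
            continue
        fit = coverage(subsets, child)
        cell = len(child)
        if cell not in archive or fit > archive[cell][1]:
            archive[cell] = (frozenset(child), fit)
    return archive

subsets = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1, 6}, {7}]
archive = map_elites(subsets, k=2)
best = max(fit for _, fit in archive.values())
```

Each cell retains the best solution of one size, so a good size-1 solution remains available as a parent for size-2 offspring even when a different size-2 solution currently looks better.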
# VR-GS: A Physical Dynamics-Aware Interactive Gaussian Splatting System in Virtual Reality

VR-GS: A Physical Dynamics-Aware Interactive Gaussian Splatting System in Virtual Reality ( http://arxiv.org/abs/2401.16663v2 )

License: see the linked page
As consumer Virtual Reality (VR) and Mixed Reality (MR) technologies gain momentum, there's a growing focus on the development of engagements with 3D virtual content. Unfortunately, traditional techniques for content creation, editing, and interaction within these virtual spaces are fraught with difficulties. They tend to be not only engineering-intensive but also require extensive expertise, which adds to the frustration and inefficiency in virtual object manipulation. Our proposed VR-GS system represents a leap forward in human-centered 3D content interaction, offering a seamless and intuitive user experience. By developing a physical dynamics-aware interactive Gaussian Splatting in a Virtual Reality setting, and constructing a highly efficient two-level embedding strategy alongside deformable body simulations, VR-GS ensures real-time execution with highly realistic dynamic responses. The components of our Virtual Reality system are designed for high efficiency and effectiveness, starting from detailed scene reconstruction and object segmentation, advancing through multi-view image in-painting, and extending to interactive physics-based editing. The system also incorporates real-time deformation embedding and dynamic shadow casting, ensuring a comprehensive and engaging virtual experience. Our project page is available at: https://yingjiang96.github.io/VR-GS/.
Translated: 2024-05-07 23:45:49 Published: 2024-05-04
# Stability of vortices in exciton-polariton condensates with spin-orbital-angular-momentum coupling

Stability of vortices in exciton-polariton condensates with spin-orbital-angular-momentum coupling ( http://arxiv.org/abs/2401.17927v2 )

License: see the linked page
The existence and dynamics of stable quantized vortices is an important subject of quantum many-body physics. Spin-orbital-angular-momentum coupling (SOAMC), a special type of spin-orbit coupling, has been experimentally achieved to create vortices in atomic Bose-Einstein condensates (BEC). Here, we generalize the concept of SOAMC to a two-component polariton BEC and analyze the emergence and configuration of vortices under a finite-size circular pumping beam. We find that the regular configuration of vortex lattices induced by a finite-size circular pump is significantly distorted by the spatially dependent Raman coupling of SOAMC, even in the presence of a repulsive polariton interaction which can assist the forming of stable vortex configuration. Meanwhile, a pair of vortices induced by SOAMC located at the center of polariton cloud remains stable. When the Raman coupling is sufficiently strong and interaction is weak, the vortices spiraling in from the edge of polariton cloud will disrupt the polariton BEC.
Translated: 2024-05-07 23:45:49 Published: 2024-05-04
# Ab-Initio Calculations of Nonlinear Susceptibility and Multi-Phonon Mixing Processes in a 2DEG-Piezoelectric Heterostructure

Ab-Initio Calculations of Nonlinear Susceptibility and Multi-Phonon Mixing Processes in a 2DEG-Piezoelectric Heterostructure ( http://arxiv.org/abs/2402.00303v3 )

License: see the linked page
Solid-state elastic-wave phonons are a promising platform for a wide range of quantum information applications. An outstanding challenge and enabling capability in harnessing phonons for quantum information processing is achieving strong nonlinear interactions between them. To this end, we propose a general architecture using piezoelectric-semiconductor heterostructures consisting of a piezoelectric acoustic material hosting phonon modes in direct proximity to a two-dimensional electron gas (2DEG). Each phonon in the piezoelectric material carries an electric field, which extends into the 2DEG. The fields induce polarization of 2DEG electrons, which in turn interact with other piezoelectric phononic electric fields. The net result is coupling between the various phonon modes. We derive, from first principles, the nonlinear phononic susceptibility of the system. We show that many nonlinear processes are strongly favored at high electron mobility, motivating the use of the 2DEG to mediate the nonlinearities. We derive in detail the first, second, and third-order susceptibilities and calculate them for the case of a lithium niobate surface acoustic wave interacting with a GaAs-AlGaAs heterostructure 2DEG. We show that, for this system, the strong third-order nonlinearity could enable single-phonon Kerr shift in an acoustic cavity that exceeds realistic cavity linewidths, potentially leading to a new class of acoustic qubit. We further show that the strong second-order nonlinearity could be used to produce a high-gain, traveling-wave parametric amplifier to amplify--and ultimately detect--the outputs of the acoustic cavity qubits. Assuming favorable losses in such a system, these capabilities, combined with the ability to efficiently transduce phonons from microwave electromagnetic fields in transmission lines, thus hold promise for creating all-acoustic quantum information processors.
Translated: 2024-05-07 23:45:49 Published: 2024-05-04
# The SkipSponge Attack: Sponge Weight Poisoning of Deep Neural Networks

The SkipSponge Attack: Sponge Weight Poisoning of Deep Neural Networks ( http://arxiv.org/abs/2402.06357v2 )

License: see the linked page
Sponge attacks aim to increase the energy consumption and computation time of neural networks deployed on hardware accelerators. Existing sponge attacks can be performed during inference via sponge examples or during training via Sponge Poisoning. Sponge examples leverage perturbations added to the model's input to increase energy and latency, while Sponge Poisoning alters the objective function of a model to induce inference-time energy effects. In this work, we propose a novel sponge attack called SkipSponge. SkipSponge is the first sponge attack that is performed directly on the parameters of a pre-trained model using only a few data samples. Our experiments show that SkipSponge can successfully increase the energy consumption of image classification models with fewer samples required than Sponge Poisoning. We show that poisoning defenses are ineffective if not adjusted specifically for the defense against SkipSponge (i.e., they decrease target layer bias values). Our work shows that SkipSponge is more effective on the GANs and the autoencoders than the state-of-the-art. Additionally, SkipSponge is stealthier than the previous Sponge Poisoning attack as it does not require significant changes in the victim model's weights. Our experiments indicate that the SkipSponge attack can be performed even when an attacker has access to only 1% of the entire dataset and reaches up to 13% energy increase.
Translated: 2024-05-07 23:35:59 Published: 2024-05-04
# Unprecedented Code Change Automation: The Fusion of LLMs and Transformation by Example

Unprecedented Code Change Automation: The Fusion of LLMs and Transformation by Example ( http://arxiv.org/abs/2402.07138v2 )

License: see the linked page
Software developers often repeat code changes, known as "code change patterns" (CPATs), within and across projects. Automating these CPATs accelerates development, but current Transformation by Example (TBE) techniques are limited by the input examples' quality and quantity, missing variations with different syntax or flow yet semantically similar. Large Language Models (LLMs), trained on vast code datasets, can overcome these limitations by generating semantically equivalent, unseen CPAT variants, enhancing TBE effectiveness. We identified best practices for using LLMs to generate code variants meeting criteria of correctness, usefulness, and applicability. Implementing these in PyCraft, combining static and dynamic analysis with LLMs, we achieved an F-measure of 96.6% in identifying correct variants, expanding inputs by 58x on average, and automating changes to increase target codes by up to 39x. Patches from PyCraft were submitted to projects like microsoft/DeepSpeed and IBM/inFairness, with an 83% acceptance rate, validating our approach's usefulness.
Translated: 2024-05-07 23:35:59 Published: 2024-05-04
# Subgraph Pooling: Tackling Negative Transfer on Graphs

Subgraph Pooling: Tackling Negative Transfer on Graphs ( http://arxiv.org/abs/2402.08907v2 )

License: see the linked page
Transfer learning aims to enhance performance on a target task by using knowledge from related tasks. However, when the source and target tasks are not closely aligned, it can lead to reduced performance, known as negative transfer. Unlike in image or text data, we find that negative transfer could commonly occur in graph-structured data, even when source and target graphs have semantic similarities. Specifically, we identify that structural differences significantly amplify the dissimilarities in the node embeddings across graphs. To mitigate this, we bring a new insight in this paper: for semantically similar graphs, although structural differences lead to significant distribution shift in node embeddings, their impact on subgraph embeddings could be marginal. Building on this insight, we introduce Subgraph Pooling (SP) by aggregating nodes sampled from a k-hop neighborhood and Subgraph Pooling++ (SP++) by a random walk, to mitigate the impact of graph structural differences on knowledge transfer. We theoretically analyze the role of SP in reducing graph discrepancy and conduct extensive experiments to evaluate its superiority under various settings. The proposed SP methods are effective yet elegant, which can be easily applied on top of any backbone Graph Neural Networks (GNNs). Our code and data are available at: https://github.com/Zehong-Wang/Subgraph-Pooling.
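The pooling idea described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes mean aggregation over the full k-hop neighborhood (no sampling), and the names `k_hop_neighborhood`, `subgraph_pool`, `adj`, and `embeddings` are hypothetical.

```python
from collections import deque


def k_hop_neighborhood(adj, node, k):
    """Return the set of nodes within k hops of `node`, including itself."""
    seen = {node}
    frontier = deque([(node, 0)])
    while frontier:
        current, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for nbr in adj.get(current, ()):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    return seen


def subgraph_pool(adj, embeddings, k=1):
    """Replace each node embedding with the mean over its k-hop neighborhood.

    Assumption: mean aggregation; the paper may sample or weight nodes differently.
    """
    pooled = {}
    for node, vector in embeddings.items():
        members = k_hop_neighborhood(adj, node, k)
        pooled[node] = [
            sum(embeddings[m][d] for m in members) / len(members)
            for d in range(len(vector))
        ]
    return pooled


# Path graph 0 - 1 - 2 with one-dimensional embeddings.
adj = {0: [1], 1: [0, 2], 2: [1]}
embeddings = {0: [0.0], 1: [3.0], 2: [6.0]}
pooled_1hop = subgraph_pool(adj, embeddings, k=1)
pooled_2hop = subgraph_pool(adj, embeddings, k=2)
```

With k=2 every node sees the whole path, so all pooled embeddings coincide. This mirrors the abstract's point that subgraph-level embeddings are less sensitive to local structure than node-level ones.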
Translated: 2024-05-07 23:35:58 Published: 2024-05-04
# Emoji Driven Crypto Assets Market Reactions

Emoji Driven Crypto Assets Market Reactions ( http://arxiv.org/abs/2402.10481v2 )

License: see the linked page
In the burgeoning realm of cryptocurrency, social media platforms like Twitter have become pivotal in influencing market trends and investor sentiments. In our study, we leverage GPT-4 and a fine-tuned transformer-based BERT model for a multimodal sentiment analysis, focusing on the impact of emoji sentiment on cryptocurrency markets. By translating emojis into quantifiable sentiment data, we correlate these insights with key market indicators like BTC Price and the VCRIX index. Our architecture's analysis of emoji sentiment demonstrated a distinct advantage over FinBERT's pure text sentiment analysis in predictive power. This approach may inform the development of trading strategies aimed at utilizing social media elements to identify and forecast market trends. Crucially, our findings suggest that strategies based on emoji sentiment can facilitate the avoidance of significant market downturns and contribute to the stabilization of returns. This research underscores the practical benefits of integrating advanced AI-driven analyses into financial strategies, offering a nuanced perspective on the interplay between digital communication and market dynamics in an academic context.
Translated: 2024-05-07 23:35:58 Published: 2024-05-04
# AlloyASG: Alloy Predicate Code Representation as a Compact Structurally Balanced Graph

AlloyASG: Alloy Predicate Code Representation as a Compact Structurally Balanced Graph ( http://arxiv.org/abs/2403.00170v4 )

License: see the linked page
Writing declarative models has numerous benefits, ranging from automated reasoning and correction of design-level properties before systems are built to automated testing and debugging of their implementations after they are built. Unfortunately, the model itself needs to be correct to gain these benefits. Alloy is a commonly used modeling language that has several existing efforts to repair faulty models automatically. Currently, these efforts are search-based methods that use an Abstract Syntax Tree (AST) representation of the model and do not scale. One issue is that ASTs themselves suffer from exponential growth in their data size due to the limitation that ASTs will often have identical nodes separately listed in the tree. To address this issue, we introduce a novel code representation schema, Complex Structurally Balanced Abstract Semantic Graph (CSBASG), which represents code as a complex-weighted directed graph that lists a semantic element as a node in the graph and ensures its structural balance for almost finitely enumerable code segments. We evaluate the efficiency of our CSBASG representation for Alloy models in terms of its compactness compared to ASTs, and we explore whether a CSBASG can ease the process of comparing two Alloy predicates. Moreover, with this representation in place, we identify several future applications of CSBASG, including Alloy code generation and automated repair.
Translated: 2024-05-07 23:26:12 Published: 2024-05-04
# Epsilon-Greedy Thompson Sampling to Bayesian Optimization

Epsilon-Greedy Thompson Sampling to Bayesian Optimization ( http://arxiv.org/abs/2403.00540v2 )

License: see the linked page
Bayesian optimization (BO) has become a powerful tool for solving simulation-based engineering optimization problems thanks to its ability to integrate physical and mathematical understandings, consider uncertainty, and address the exploitation--exploration dilemma. Thompson sampling (TS) is a preferred solution for BO to handle the exploitation--exploration trade-off. While it prioritizes exploration by generating and minimizing random sample paths from probabilistic models -- a fundamental ingredient of BO -- TS weakly manages exploitation by gathering information about the true objective function after it obtains new observations. In this work, we improve the exploitation of TS by incorporating the $\varepsilon$-greedy policy, a well-established selection strategy in reinforcement learning. We first delineate two extremes of TS, namely the generic TS and the sample-average TS. The former promotes exploration, while the latter favors exploitation. We then adopt the $\varepsilon$-greedy policy to randomly switch between these two extremes. Small and large values of $\varepsilon$ govern exploitation and exploration, respectively. By minimizing two benchmark functions and solving an inverse problem of a steel cantilever beam, we empirically show that $\varepsilon$-greedy TS equipped with an appropriate $\varepsilon$ is more robust than its two extremes, matching or outperforming the better of the generic TS and the sample-average TS.
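The switching rule described above can be sketched as follows. This is a simplified illustration rather than the paper's algorithm: it assumes a finite candidate set with independent Gaussian posteriors (a real BO loop would draw correlated sample paths from a Gaussian process), and all function names and the `n_paths` default are hypothetical.

```python
import random


def draw_sample_path(means, stds, rng):
    """Draw one posterior sample of the objective at every candidate point."""
    return [rng.gauss(m, s) for m, s in zip(means, stds)]


def argmin(values):
    """Index of the smallest value."""
    return min(range(len(values)), key=values.__getitem__)


def generic_ts(means, stds, rng):
    """Generic TS: minimize a single random sample path (favors exploration)."""
    return argmin(draw_sample_path(means, stds, rng))


def sample_average_ts(means, stds, rng, n_paths=50):
    """Sample-average TS: minimize the mean of many sample paths (favors exploitation)."""
    paths = [draw_sample_path(means, stds, rng) for _ in range(n_paths)]
    averaged = [sum(column) / n_paths for column in zip(*paths)]
    return argmin(averaged)


def epsilon_greedy_ts(means, stds, epsilon, rng):
    """With probability epsilon use generic TS, otherwise sample-average TS."""
    if rng.random() < epsilon:
        return generic_ts(means, stds, rng)
    return sample_average_ts(means, stds, rng)


rng = random.Random(0)
means = [3.0, 1.0, 2.0]
stds = [0.5, 0.5, 0.5]
next_index = epsilon_greedy_ts(means, stds, epsilon=0.1, rng=rng)
```

Setting `epsilon=0` always exploits and `epsilon=1` always explores, consistent with the abstract's statement that small and large values of $\varepsilon$ govern exploitation and exploration, respectively.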
Translated: 2024-05-07 23:26:12 Published: 2024-05-04
# Distilling Text Style Transfer With Self-Explanation From LLMs

Distilling Text Style Transfer With Self-Explanation From LLMs ( http://arxiv.org/abs/2403.01106v2 )

License: see the linked page
Text Style Transfer (TST) seeks to alter the style of text while retaining its core content. Given the constraints of limited parallel datasets for TST, we propose CoTeX, a framework that leverages large language models (LLMs) alongside chain-of-thought (CoT) prompting to facilitate TST. CoTeX distills the complex rewriting and reasoning capabilities of LLMs into more streamlined models capable of working with both non-parallel and parallel data. Through experimentation across four TST datasets, CoTeX is shown to surpass traditional supervised fine-tuning and knowledge distillation methods, particularly in low-resource settings. We conduct a comprehensive evaluation, comparing CoTeX against current unsupervised, supervised, in-context learning (ICL) techniques, and instruction-tuned LLMs. Furthermore, CoTeX distinguishes itself by offering transparent explanations for its style transfer process.
Translated: 2024-05-07 23:26:12 Published: 2024-05-04
# Quantum superpositions of current states in Rydberg-atom networks

Quantum superpositions of current states in Rydberg-atom networks ( http://arxiv.org/abs/2403.03202v2 )

License: see the linked page
Quantum simulation of many-body quantum systems using Rydberg-atom platforms has attracted great interest in recent years. The possibility of realizing spin Hamiltonians and the accurate control at the single-atom level have paved the way for the study of quantum phases of matter and dynamics. Here, we propose a quantum optimal control protocol to engineer current states: quantum states characterized by Rydberg excitations propagating in a given spatially closed tweezer network. Indeed, current states with different winding numbers can be generated on demand. Besides states with a single winding number, superpositions of quantum current states characterized by several winding numbers can be obtained. The single current states are eigenstates of the current operator, which can therefore define an observable that remains persistent at any time. In particular, the features of the excitation dynamics reflect the nature of the current states, a fact that in principle can be used to characterize the nature of the flow experimentally without the need to access high-order correlators.
Translated: 2024-05-07 23:26:12 Published: 2024-05-04
# SWAP-NAS: sample-wise activation patterns for ultra-fast NAS

SWAP-NAS: sample-wise activation patterns for ultra-fast NAS ( http://arxiv.org/abs/2403.04161v4 )

License: see the linked page
Training-free metrics (a.k.a. zero-cost proxies) are widely used to avoid resource-intensive neural network training, especially in Neural Architecture Search (NAS). Recent studies show that existing training-free metrics have several limitations, such as limited correlation and poor generalisation across different search spaces and tasks. Hence, we propose Sample-Wise Activation Patterns and its derivative, SWAP-Score, a novel high-performance training-free metric. It measures the expressivity of networks over a batch of input samples. The SWAP-Score is strongly correlated with ground-truth performance across various search spaces and tasks, outperforming 15 existing training-free metrics on NAS-Bench-101/201/301 and TransNAS-Bench-101. The SWAP-Score can be further enhanced by regularisation, which leads to even higher correlations in cell-based search space and enables model size control during the search. For example, Spearman's rank correlation coefficient between regularised SWAP-Score and CIFAR-100 validation accuracies on NAS-Bench-201 networks is 0.90, significantly higher than 0.80 from the second-best metric, NWOT. When integrated with an evolutionary algorithm for NAS, our SWAP-NAS achieves competitive performance on CIFAR-10 and ImageNet in approximately 6 minutes and 9 minutes of GPU time respectively.
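For readers unfamiliar with the evaluation measure quoted above, Spearman's rank correlation is the Pearson correlation computed on ranks. The sketch below is a plain standard-library illustration; the scores and accuracies are made-up toy values, not data from the paper, and the helper names are hypothetical.

```python
def ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            result[idx] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(ranks(x), ranks(y))


# Toy values only: a proxy score and a validation accuracy per architecture.
proxy_scores = [0.10, 0.40, 0.35, 0.80]
accuracies = [60.0, 70.0, 72.0, 90.0]
rho = spearman(proxy_scores, accuracies)
```

A training-free metric is useful for NAS to the extent that this coefficient is close to 1, because the search only needs the metric to order architectures correctly, not to predict accuracy values.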
Translated: 2024-05-07 23:16:28 Published: 2024-05-04
# Effectiveness Assessment of Recent Large Vision-Language Models

Effectiveness Assessment of Recent Large Vision-Language Models ( http://arxiv.org/abs/2403.04306v3 )

License: see the linked page
The advent of large vision-language models (LVLMs) represents a noteworthy advancement towards the pursuit of artificial general intelligence. However, the model efficacy across both specialized and general tasks warrants further investigation. This paper endeavors to evaluate the competency of popular LVLMs in specialized and general tasks, respectively, aiming to offer a comprehensive understanding of these novel models. To gauge their efficacy in specialized tasks, we employ six challenging tasks across three distinct application scenarios, namely natural, healthcare, and industrial ones. These six tasks include salient/camouflaged/transparent object detection, as well as polyp detection, skin lesion detection, and industrial anomaly detection. We examine the performance of three recent open-source LVLMs, including MiniGPT-v2, LLaVA-1.5, and Shikra, on both visual recognition and localization under these tasks. Moreover, we conduct empirical investigations utilizing the aforementioned LVLMs together with GPT-4V, assessing their multi-modal understanding capabilities in general tasks including object counting, absurd question answering, affordance reasoning, attribute recognition, and spatial relation reasoning. Our investigations reveal that these LVLMs demonstrate limited proficiency not only in specialized tasks but also in general tasks. We delve deep into this inadequacy and uncover several potential factors, including limited cognition in specialized tasks, object hallucination, text-to-image interference, and decreased robustness in complex problems. We hope this study could provide useful insights for the future development of LVLMs, helping researchers improve LVLMs to cope with both general and specialized applications.
Translated: 2024-05-07 23:16:28 Published: 2024-05-04
# Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation

Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation ( http://arxiv.org/abs/2403.08002v3 )

License: see the linked page
The scaling laws and extraordinary performance of large foundation models motivate the development and utilization of such models in biomedicine. However, despite early promising results on some biomedical benchmarks, there are still major challenges that need to be addressed before these models can be used in real-world clinics. Frontier general-domain models such as GPT-4V still have significant performance gaps in multimodal biomedical applications. More importantly, less-acknowledged pragmatic issues, including accessibility, model cost, and tedious manual evaluation make it hard for clinicians to use state-of-the-art large models directly on private patient data. Here, we explore training open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology. To maximize data efficiency, we adopt a modular approach by incorporating state-of-the-art pre-trained models for image and text modalities, and focusing on training a lightweight adapter to ground each modality to the text embedding space, as exemplified by LLaVA-Med. For training, we assemble a large dataset of over 697 thousand radiology image-text pairs. For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation. For best practice, we conduct a systematic ablation study on various choices in data engineering and multimodal training. The resulting LlaVA-Rad (7B) model attains state-of-the-art results on standard radiology tasks such as report generation and cross-modal retrieval, even outperforming much larger models such as GPT-4V and Med-PaLM M (84B). The inference of LlaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
Translated: 2024-05-07 23:16:28 Published: 2024-05-04
# Towards a theory of model distillation

Towards a theory of model distillation ( http://arxiv.org/abs/2403.09053v2 )

License: see the linked page
Distillation is the task of replacing a complicated machine learning model with a simpler model that approximates the original [BCNM06,HVD15]. Despite many practical applications, basic questions about the extent to which models can be distilled, and the runtime and amount of data needed to distill, remain largely open. To study these questions, we initiate a general theory of distillation, defining PAC-distillation in an analogous way to PAC-learning [Val84]. As applications of this theory: (1) we propose new algorithms to extract the knowledge stored in the trained weights of neural networks -- we show how to efficiently distill neural networks into succinct, explicit decision tree representations when possible by using the ``linear representation hypothesis''; and (2) we prove that distillation can be much cheaper than learning from scratch, and make progress on characterizing its complexity.
Translated: 2024-05-07 23:16:28 Published: 2024-05-04
# FakeWatch: A Framework for Detecting Fake News to Ensure Credible Elections

FakeWatch: A Framework for Detecting Fake News to Ensure Credible Elections ( http://arxiv.org/abs/2403.09858v2 )

License: see the linked page
In today's technologically driven world, the rapid spread of fake news, particularly during critical events like elections, poses a growing threat to the integrity of information. To tackle this challenge head-on, we introduce FakeWatch, a comprehensive framework carefully designed to detect fake news. Leveraging a newly curated dataset of North American election-related news articles, we construct robust classification models. Our framework integrates a model hub comprising both traditional machine learning (ML) techniques and state-of-the-art Language Models (LMs) to discern fake news effectively. Our objective is to provide the research community with adaptable and precise classification models adept at identifying fake news for the elections agenda. Quantitative evaluations of fake news classifiers on our dataset reveal that, while state-of-the-art LMs exhibit a slight edge over traditional ML models, classical models remain competitive due to their balance of accuracy and computational efficiency. Additionally, qualitative analyses shed light on patterns within fake news articles. We provide our labeled data at https://huggingface.co/datasets/newsmediabias/fake_news_elections_labelled_data and model https://huggingface.co/newsmediabias/FakeWatch for reproducibility and further research.
Translated: 2024-05-07 23:16:28 Published: 2024-05-04
# Heterogeneous Network Based Contrastive Learning Method for PolSAR Land Cover Classification

Heterogeneous Network Based Contrastive Learning Method for PolSAR Land Cover Classification ( http://arxiv.org/abs/2403.19902v2 )

License: see the linked page
Polarimetric synthetic aperture radar (PolSAR) image interpretation is widely used in various fields. Recently, deep learning has made significant progress in PolSAR image classification. Supervised learning (SL) requires a large amount of high-quality labeled PolSAR data to achieve good performance; however, manually labeled data is scarce. This causes SL to fall into overfitting and degrades its generalization performance. Furthermore, the scattering confusion problem is also a significant challenge that is attracting increasing attention. To solve these problems, this article proposes a Heterogeneous Network based Contrastive Learning method (HCLNet). It aims to learn high-level representations from unlabeled PolSAR data for few-shot classification according to multi-features and superpixels. Beyond conventional contrastive learning (CL), HCLNet introduces a heterogeneous architecture for the first time to better utilize heterogeneous PolSAR features. It also develops two easy-to-use plugins to narrow the domain gap between optics and PolSAR: a feature filter and superpixel-based instance discrimination, where the former enhances the complementarity of multi-features and the latter increases the diversity of negative samples. Experiments demonstrate the superiority of HCLNet on three widely used PolSAR benchmark datasets compared with state-of-the-art methods. Ablation studies also verify the importance of each component. In addition, this work has implications for how to efficiently utilize the multi-features of PolSAR data to learn better high-level representations in CL and how to construct networks better suited to PolSAR data.
Translated: 2024-05-07 23:06:30 Published: 2024-05-04
# Machine Learning Robustness: A Primer

Machine Learning Robustness: A Primer ( http://arxiv.org/abs/2404.00897v3 )

License: see the linked page
This chapter explores the foundational concept of robustness in Machine Learning (ML) and its integral role in establishing trustworthiness in Artificial Intelligence (AI) systems. The discussion begins with a detailed definition of robustness, portraying it as the ability of ML models to maintain stable performance across varied and unexpected environmental conditions. ML robustness is dissected through several lenses: its complementarity with generalizability; its status as a requirement for trustworthy AI; its adversarial vs non-adversarial aspects; its quantitative metrics; and its indicators such as reproducibility and explainability. The chapter delves into the factors that impede robustness, such as data bias, model complexity, and the pitfalls of underspecified ML pipelines. It surveys key techniques for robustness assessment from a broad perspective, including adversarial attacks, encompassing both digital and physical realms. It covers non-adversarial data shifts and nuances of Deep Learning (DL) software testing methodologies. The discussion progresses to explore amelioration strategies for bolstering robustness, starting with data-centric approaches like debiasing and augmentation. Further examination includes a variety of model-centric methods such as transfer learning, adversarial training, and randomized smoothing. Lastly, post-training methods are discussed, including ensemble techniques, pruning, and model repairs, emerging as cost-effective strategies to make models more resilient against the unpredictable. This chapter underscores the ongoing challenges and limitations in estimating and achieving ML robustness by existing approaches. It offers insights and directions for future research on this crucial concept, as a prerequisite for trustworthy AI systems.
Translated: 2024-05-07 23:06:30 Published: 2024-05-04
# Fast Diffeomorphic Image Registration using Patch based Fully Convolutional Networks

Fast Diffeomorphic Image Registration using Patch based Fully Convolutional Networks ( http://arxiv.org/abs/2404.04244v2 )

License: see the linked page
Diffeomorphic image registration is a fundamental step in medical image analysis, owing to its capability to ensure the invertibility of transformations and preservation of topology. Currently, unsupervised learning-based registration techniques primarily extract features at the image level, potentially limiting their efficacy. This paper proposes a novel unsupervised learning-based fully convolutional network (FCN) framework for fast diffeomorphic image registration, emphasizing feature acquisition at the image patch level. Furthermore, a novel differential operator is introduced and integrated into the FCN architecture for parameter learning. Experiments are conducted on three distinct T1-weighted magnetic resonance imaging (T1w MRI) datasets. Comparative analyses with three state-of-the-art diffeomorphic image registration approaches including a typical conventional registration algorithm and two representative unsupervised learning-based methods, reveal that the proposed method exhibits superior performance in both registration accuracy and topology preservation.
Translated: 2024-05-07 22:56:46 Published: 2024-05-04
# Mind-to-Image: Projecting Visual Mental Imagination of the Brain from fMRI

Mind-to-Image: Projecting Visual Mental Imagination of the Brain from fMRI ( http://arxiv.org/abs/2404.05468v4 )

License: see the linked page
The reconstruction of images observed by subjects from fMRI data collected during visual stimuli has made strong progress in the past decade, thanks to the availability of extensive fMRI datasets and advancements in generative models for image generation. However, the application of visual reconstruction has remained limited. Reconstructing visual imagination presents a greater challenge, with potentially revolutionary applications ranging from aiding individuals with disabilities to verifying witness accounts in court. The primary hurdles in this field are the absence of data collection protocols for visual imagery and the lack of datasets on the subject. Traditionally, fMRI-to-image relies on data collected from subjects exposed to visual stimuli, which poses issues for generating visual imagery based on the difference of brain activity between visual stimulation and visual imagery. For the first time, we have compiled a substantial dataset (around 6h of scans) on visual imagery along with a proposed data collection protocol. We then train a modified version of an fMRI-to-image model and demonstrate the feasibility of reconstructing images from two modes of imagination: from memory and from pure imagination. The resulting pipeline, which we call Mind-to-Image, marks a step towards creating a technology that allows direct reconstruction of visual imagery.
Translated: 2024-05-07 22:56:46 Published: 2024-05-04
# BLINK: Multimodal Large Language Models Can See but Not Perceive

BLINK: Multimodal Large Language Models Can See but Not Perceive ( http://arxiv.org/abs/2404.12390v3 )

License: see the linked page
We introduce Blink, a new benchmark for multimodal language models (LLMs) that focuses on core visual perception abilities not found in other evaluations. Most of the Blink tasks can be solved by humans "within a blink" (e.g., relative depth estimation, visual correspondence, forensics detection, and multi-view reasoning). However, we find these perception-demanding tasks cast significant challenges for current multimodal LLMs because they resist mediation through natural language. Blink reformats 14 classic computer vision tasks into 3,807 multiple-choice questions, paired with single or multiple images and visual prompting. While humans get 95.70% accuracy on average, Blink is surprisingly challenging for existing multimodal LLMs: even the best-performing GPT-4V and Gemini achieve accuracies of 51.26% and 45.72%, only 13.17% and 7.63% higher than random guessing, indicating that such perception abilities have not "emerged" yet in recent multimodal LLMs. Our analysis also highlights that specialist CV models could solve these problems much better, suggesting potential pathways for future improvements. We believe Blink will stimulate the community to help multimodal LLMs catch up with human-level visual perception.
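The reported accuracies and margins over random guessing imply a chance-level baseline, which can be recovered with simple arithmetic. The sketch below uses only the figures quoted in the abstract; the variable and function names are illustrative.

```python
# Figures quoted in the abstract: (accuracy %, margin over random guessing %).
reported = {
    "GPT-4V": (51.26, 13.17),
    "Gemini": (45.72, 7.63),
}
human_accuracy = 95.70


def implied_random_baseline(accuracy, margin):
    """Chance-level accuracy implied by an accuracy and its margin over chance."""
    return round(accuracy - margin, 2)


baselines = {
    name: implied_random_baseline(acc, margin)
    for name, (acc, margin) in reported.items()
}
# Gap between human accuracy and the best-performing model.
human_gap = round(human_accuracy - reported["GPT-4V"][0], 2)
```

Both models yield the same implied baseline of 38.09%, so the two reported margins are mutually consistent, and the best model still trails human accuracy by 44.44 percentage points.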
Translated: 2024-05-07 22:46:58 Published: 2024-05-04
# PoseINN: Realtime Visual-based Pose Regression and Localization with Invertible Neural Networks

PoseINN: Realtime Visual-based Pose Regression and Localization with Invertible Neural Networks ( http://arxiv.org/abs/2404.13288v2 )

License: see the linked page
Estimating ego-pose from cameras is an important problem in robotics with applications ranging from mobile robotics to augmented reality. While SOTA models are becoming increasingly accurate, they can still be unwieldy due to high computational costs. In this paper, we propose to solve the problem by using invertible neural networks (INN) to find the mapping between the latent space of images and poses for a given scene. Our model achieves similar performance to the SOTA while being faster to train and only requiring offline rendering of low-resolution synthetic data. By using normalizing flows, the proposed method also provides uncertainty estimation for the output. We also demonstrated the efficiency of this method by deploying the model on a mobile robot.
Translated: 2024-05-07 22:46:58 Published: 2024-05-04
# FLDM-VTON: Faithful Latent Diffusion Model for Virtual Try-on

FLDM-VTON: Faithful Latent Diffusion Model for Virtual Try-on ( http://arxiv.org/abs/2404.14162v2 )

License: see the linked page
Despite their impressive generative performance, latent diffusion model-based virtual try-on (VTON) methods lack faithfulness to crucial details of the clothes, such as style, pattern, and text. To alleviate these issues caused by the diffusion stochastic nature and latent supervision, we propose a novel Faithful Latent Diffusion Model for VTON, termed FLDM-VTON. FLDM-VTON improves the conventional latent diffusion process in three major aspects. First, we propose incorporating warped clothes as both the starting point and local condition, supplying the model with faithful clothes priors. Second, we introduce a novel clothes flattening network to constrain generated try-on images, providing clothes-consistent faithful supervision. Third, we devise a clothes-posterior sampling for faithful inference, further enhancing the model performance over conventional clothes-agnostic Gaussian sampling. Extensive experimental results on the benchmark VITON-HD and Dress Code datasets demonstrate that our FLDM-VTON outperforms state-of-the-art baselines and is able to generate photo-realistic try-on images with faithful clothing details.
Translated: 2024-05-07 22:37:13 Published: 2024-05-04
# MMAC-Copilot: Multi-modal Agent Collaboration Operating System Copilot

MMAC-Copilot: Multi-modal Agent Collaboration Operating System Copilot ( http://arxiv.org/abs/2404.18074v2 )

License: see the linked page
Autonomous virtual agents are often limited by their singular mode of interaction with real-world environments, restricting their versatility. To address this, we propose the Multi-Modal Agent Collaboration framework (MMAC-Copilot), a framework that utilizes the collective expertise of diverse agents to enhance interaction ability with operating systems. The framework introduces a team collaboration chain, enabling each participating agent to contribute insights based on their specific domain knowledge, effectively reducing the hallucination associated with knowledge domain gaps. To evaluate the performance of MMAC-Copilot, we conducted experiments using both the GAIA benchmark and our newly introduced Visual Interaction Benchmark (VIBench). VIBench focuses on non-API-interactable applications across various domains, including 3D gaming, recreation, and office scenarios. MMAC-Copilot achieved exceptional performance on GAIA, with an average improvement of 6.8% over existing leading systems. Furthermore, it demonstrated remarkable capability on VIBench, particularly in managing various methods of interaction within systems and applications. These results underscore MMAC-Copilot's potential in advancing the field of autonomous virtual agents through its innovative approach to agent collaboration.
Translated: 2024-05-07 22:37:13 Published: 2024-05-04
# SoK: Behind the Accuracy of Complex Human Activity Recognition Using Deep Learning

SoK: Behind the Accuracy of Complex Human Activity Recognition Using Deep Learning ( http://arxiv.org/abs/2405.00712v2 )

License: see the linked page
Human Activity Recognition (HAR) is a well-studied field with research dating back to the 1980s. Over time, HAR technologies have evolved significantly from manual feature extraction, rule-based algorithms, and simple machine learning models to powerful deep learning models, from one sensor type to a diverse array of sensing modalities. The scope has also expanded from recognising a limited set of activities to encompassing a larger variety of both simple and complex activities. However, there still exist many challenges that hinder advancement in complex activity recognition using modern deep learning methods. In this paper, we comprehensively systematise factors leading to inaccuracy in complex HAR, such as data variety and model capacity. Among many sensor types, we give more attention to wearable sensors and cameras due to their prevalence. Through this Systematisation of Knowledge (SoK) paper, readers can gain a solid understanding of the development history and existing challenges of HAR, different categorisations of activities, obstacles in deep learning-based complex HAR that impact accuracy, and potential research directions.
Translated: 2024-05-07 20:39:25 Published: 2024-05-04
# A Multi-Domain Multi-Task Approach for Feature Selection from Bulk RNA Datasets

A Multi-Domain Multi-Task Approach for Feature Selection from Bulk RNA Datasets ( http://arxiv.org/abs/2405.02534v1 )

License: see the linked page
This paper proposes a multi-domain multi-task algorithm for feature selection in bulk RNAseq data. Two datasets arising from the mouse host immune response to Salmonella infection are investigated. Data is collected from several strains of collaborative cross mice. Samples from the spleen and liver serve as the two domains. Several machine learning experiments are conducted, and in each case a small subset of features that are discriminative across domains is extracted. The algorithm proves viable and underlines the benefits of cross-domain feature selection by extracting a new subset of discriminative features that could not be extracted by a single-domain approach alone.
翻訳日:2024-05-07 19:40:24 公開日:2024-05-04
# AdaFPP: Adapt-Focused Bi-Propagating Prototype Learning for Panoramic Activity Recognition

AdaFPP: Adapt-Focused Bi-Propagating Prototype Learning for Panoramic Activity Recognition ( http://arxiv.org/abs/2405.02538v1 )

License: see the linked page
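Meiqi Cao, Rui Yan, Xiangbo Shu, Guangzhao Dai, Yazhou Yao, Guo-Sen Xie. Panoramic Activity Recognition (PAR) aims to identify the multi-granularity behaviors performed by multiple persons in panoramic scenes, including individual activities, group activities, and global activities. Previous methods 1) rely heavily on manually annotated detection boxes in training and inference, which hinders practical deployment, or 2) directly use ordinary detectors to detect multiple persons with varying sizes and spatial occlusion in panoramic scenes, which limits the performance gain of PAR. To this end, we consider learning a detector that adapts to occluded persons of varying sizes and is optimized together with the recognition module in an all-in-one framework. We therefore propose a novel Adapt-Focused bi-Propagating Prototype learning (AdaFPP) framework that jointly recognizes individual, group, and global activities in panoramic activity scenes by learning an adapt-focused detector and multi-granularity prototypes end to end. Specifically, to accommodate the varying sizes and spatial occlusion of multiple persons in crowded panoramic scenes, we introduce a panoramic adapt-focuser, which achieves size-adaptive detection of individuals by comprehensively selecting object-dense sub-regions identified by the original detections and performing fine-grained detection on them. In addition, to mitigate the information loss caused by inaccurate individual localization, we introduce a bi-propagation prototyper that promotes closed-loop interaction and informative consistency across granularities by facilitating bidirectional information propagation among the individual, group, and global levels. Extensive experiments demonstrate the significant performance of AdaFPP and highlight its strong applicability to PAR.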

# Exploring Extreme Quantization in Spiking Language Models

Exploring Extreme Quantization in Spiking Language Models ( http://arxiv.org/abs/2405.02543v1 )

License: see the linked page
Despite the growing prevalence of large language model (LLM) architectures, a crucial concern persists regarding their energy and power consumption, which still lags far behind the remarkable energy efficiency of the human brain. Recent strides in spiking language models (LM) and transformer architectures aim to address this concern by harnessing the spiking activity of biological neurons to enhance energy/power efficiency. Doubling down on the principles of model quantization and energy efficiency, this paper proposes the development of a novel binary/ternary (1/1.58-bit) spiking LM architecture. Achieving scalability comparable to a deep spiking LM architecture is facilitated by an efficient knowledge distillation technique, wherein knowledge from a non-spiking full-precision "teacher" model is transferred to an extremely weight quantized spiking "student" LM. Our proposed model represents a significant advancement as the first-of-its-kind 1/1.58-bit spiking LM, and its performance is rigorously evaluated on multiple text classification tasks of the GLUE benchmark.
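
To make the "1.58-bit" figure concrete: a ternary weight takes one of three values, so it carries log2(3) ≈ 1.585 bits. The sketch below ternarizes a weight list with an absolute-mean scale. The abstract does not state which quantizer the paper uses, so the absmean scheme and the name `ternarize` are assumptions chosen for illustration.

```python
import math

def ternarize(weights):
    """Map weights to {-1, 0, +1} plus one shared scale (absmean scheme, assumed)."""
    scale = sum(abs(w) for w in weights) / len(weights)
    if scale == 0:
        return [0] * len(weights), 0.0
    # Divide by the scale, round to the nearest integer, then clamp to [-1, 1].
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale

# Information content of one ternary weight, in bits.
bits_per_ternary_weight = math.log2(3)

# Illustrative weights only.
quantized, scale = ternarize([0.9, -0.05, -1.2, 0.3])
```

A dequantized weight is then `q * scale`, so only the ternary codes and one floating-point scale need to be stored per weight group.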
# A Novel Endorsement Protocol to Secure BFT-Based Consensus in Permissionless Blockchain

A Novel Endorsement Protocol to Secure BFT-Based Consensus in Permissionless Blockchain ( http://arxiv.org/abs/2405.02544v1 )

License: see the linked page
Permissionless blockchain technology offers numerous potential benefits for decentralised applications, such as security, transparency, and openness. BFT-based consensus mechanisms are widely adopted in permissioned blockchains to meet the high scalability requirements of the network. Sybil attacks are among the most serious potential threats when applying BFT-based consensus mechanisms in a permissionless blockchain, because effective mechanisms for verifying participants' identities are lacking. This paper presents a novel endorsement-based bootstrapping protocol with a signature algorithm that offers a streamlined, scalable identity endorsement and verification process. This approach effectively safeguards the BFT-based consensus mechanism against Sybil attacks. Using our proposed method, we have conducted thorough security analyses and simulation experiments to assess security, robustness, and scalability advantages in large-scale networks. Our results demonstrate that the scheme can effectively address the identity verification challenges when applying BFT-based consensus in a permissionless blockchain.
# Prediction of Space Weather Events through Analysis of Active Region Magnetograms using Convolutional Neural Network

Prediction of Space Weather Events through Analysis of Active Region Magnetograms using Convolutional Neural Network ( http://arxiv.org/abs/2405.02545v1 )

License: see the linked page
Although space weather events may not directly affect human life, they have the potential to inflict significant harm upon our communities. Harmful space weather events can trigger atmospheric changes that result in physical and economic damages on a global scale. In 1989, Earth experienced the effects of a powerful geomagnetic storm that caused satellites to malfunction, while triggering power blackouts in Canada, along with electricity disturbances in the United States and Europe. With the solar cycle peak rapidly approaching, there is an ever-increasing need to prepare for and prevent the damages that can occur, especially to modern-day technology, which calls for a comprehensive prediction system. This study aims to leverage machine learning techniques to predict instances of space weather (solar flares, coronal mass ejections, geomagnetic storms) based on active region magnetograms of the Sun. This was done by using the NASA DONKI service to determine when these solar events occur, then using data from the NASA Solar Dynamics Observatory to compile a dataset that includes magnetograms of active regions of the Sun 24 hours before the events. A convolutional neural network (CNN) trained on this dataset takes the magnetograms as input and predicts whether a space weather event will occur and what type of event it will be. The model was designed using a custom CNN architecture, and returned an accuracy of 90.27%, a precision of 85.83%, a recall of 91.78%, and an average F1 score of 92.14% across each class (Solar flare [Flare], geomagnetic storm [GMS], coronal mass ejection [CME]). Our results show that using magnetogram data as an input for a CNN is a viable method for space weather prediction. Future work can involve prediction of the magnitude of solar events.
# Scaling SNNs Trained Using Equilibrium Propagation to Convolutional Architectures

Scaling SNNs Trained Using Equilibrium Propagation to Convolutional Architectures ( http://arxiv.org/abs/2405.02546v1 )

License: see the linked page
Equilibrium Propagation (EP) is a biologically plausible local learning algorithm initially developed for convergent recurrent neural networks (RNNs), where weight updates rely solely on the connecting neuron states across two phases. The gradient calculations in EP have been shown to approximate the gradients computed by Backpropagation Through Time (BPTT) when an infinitesimally small nudge factor is used. This property makes EP a powerful candidate for training Spiking Neural Networks (SNNs), which are commonly trained by BPTT. However, in the spiking domain, previous studies on EP have been limited to architectures involving few linear layers. In this work, for the first time we provide a formulation for training convolutional spiking convergent RNNs using EP, bridging the gap between spiking and non-spiking convergent RNNs. We demonstrate that for spiking convergent RNNs, there is a mismatch in the maximum pooling and its inverse operation, leading to inaccurate gradient estimation in EP. Substituting this with average pooling resolves this issue and enables accurate gradient estimation for spiking convergent RNNs. We also highlight the memory efficiency of EP compared to BPTT. In the regime of SNNs trained by EP, our experimental results indicate state-of-the-art performance on the MNIST and FashionMNIST datasets, with test errors of 0.97% and 8.89%, respectively. These results are comparable to those of convergent RNNs and SNNs trained by BPTT. These findings underscore EP as an optimal choice for on-chip training and a biologically-plausible method for computing error gradients.
# CNN-LSTM and Transfer Learning Models for Malware Classification based on Opcodes and API Calls

CNN-LSTM and Transfer Learning Models for Malware Classification based on Opcodes and API Calls ( http://arxiv.org/abs/2405.02548v1 )

License: see the linked page
In this paper, we propose a novel model for a malware classification system based on Application Programming Interface (API) calls and opcodes, to improve classification accuracy. This system uses a novel design that combines a Convolutional Neural Network with Long Short-Term Memory. We extract opcode sequences and API calls from Windows malware samples for classification. We transform these features into N-gram sequences (N = 2, 3, and 10). Our experiments on a dataset of 9,749,57 samples produce high accuracy of 99.91% using the 8-gram sequences. Our method significantly improves the malware classification performance when using a wide range of recent deep learning architectures, leading to state-of-the-art performance. In particular, we experiment with ConvNeXt-T, ConvNeXt-S, RegNetY-4GF, RegNetY-8GF, RegNetY-12GF, EfficientNetV2, Sequencer2D-L, Swin-T, ViT-G/14, ViT-Ti, ViT-S, ViT-B, ViT-L, and MaxViT-B. Among these architectures, Swin-T and Sequencer2D-L achieved high accuracies of 99.82% and 99.70%, respectively, comparable to our CNN-LSTM architecture although not surpassing it.
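
As an illustration of the N-gram step, the sketch below turns a token sequence into contiguous N-grams and counts them. The opcode trace and the function name `ngrams` are hypothetical; the abstract does not describe the paper's actual preprocessing pipeline.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the contiguous n-grams of a token sequence as a list of tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Hypothetical opcode trace; the mnemonics are illustrative only.
opcodes = ["push", "mov", "call", "mov", "call", "ret"]

bigrams = ngrams(opcodes, 2)
bigram_counts = Counter(bigrams)
```

A sequence shorter than `n` yields an empty list, so very short traces contribute no features for large N.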
# Design of an entanglement purification protocol selection module

Design of an entanglement purification protocol selection module ( http://arxiv.org/abs/2405.02555v1 )

License: see the linked page
Entanglement purification protocols, designed to improve the fidelity of Bell states over quantum networks for inter-node communications, have attracted significant attention over the last few decades. These protocols have great potential to resolve a core challenge in quantum networking: generating high-fidelity Bell states. However, previous studies focused on theoretical discussion with limited consideration of realistic errors. Dynamic selection of the right purification protocol under the various realistic errors that arise in practice has yet to be studied. In this work, we study the performance of various purification protocols under realistic errors by conducting density matrix simulations over a large suite of error models. Based on our findings of how specific error channels affect the performance of purification protocols, we propose a module that can be embedded in the quantum network. This module determines and selects the appropriate purification protocol, considering not only expected specifications from the network layer but also the capabilities of the physical layer. Finally, the performance of our proposed module is verified using two benchmark categories. Compared with the default approach and exhaustive search approach, we show a success rate approaching 90% in identifying the optimal purification protocol for our target applications.
# Few-Shot Fruit Segmentation via Transfer Learning

Few-Shot Fruit Segmentation via Transfer Learning ( http://arxiv.org/abs/2405.02556v1 )

License: see the linked page
Advancements in machine learning, computer vision, and robotics have paved the way for transformative solutions in various domains, particularly in agriculture. For example, accurate identification and segmentation of fruits from field images plays a crucial role in automating jobs such as harvesting, disease detection, and yield estimation. However, achieving robust and precise infield fruit segmentation remains a challenging task since large amounts of labeled data are required to handle variations in fruit size, shape, color, and occlusion. In this paper, we develop a few-shot semantic segmentation framework for infield fruits using transfer learning. Concretely, our work is aimed at addressing agricultural domains that lack publicly available labeled data. Motivated by similar success in urban scene parsing, we propose specialized pre-training using a public benchmark dataset for fruit transfer learning. By leveraging pre-trained neural networks, accurate semantic segmentation of fruit in the field is achieved with only a few labeled images. Furthermore, we show that models with pre-training learn to distinguish between fruit still on the trees and fruit that have fallen on the ground, and they can effectively transfer the knowledge to the target fruit dataset.
# A Literature Review and Framework for Human Evaluation of Generative Large Language Models in Healthcare

A Literature Review and Framework for Human Evaluation of Generative Large Language Models in Healthcare ( http://arxiv.org/abs/2405.02559v1 )

License: see the linked page
As generative artificial intelligence (AI), particularly Large Language Models (LLMs), continues to permeate healthcare, it remains crucial to supplement traditional automated evaluations with human expert evaluation. Understanding and evaluating the generated texts is vital for ensuring safety, reliability, and effectiveness. However, the cumbersome, time-consuming, and non-standardized nature of human evaluation presents significant obstacles to the widespread adoption of LLMs in practice. This study reviews existing literature on human evaluation methodologies for LLMs within healthcare. We highlight a notable need for a standardized and consistent human evaluation approach. Our extensive literature search, adhering to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, spans publications from January 2018 to February 2024. This review provides a comprehensive overview of the human evaluation approaches used in diverse healthcare applications. The analysis examines the human evaluation of LLMs across various medical specialties, addressing factors such as evaluation dimensions, sample types and sizes, the selection and recruitment of evaluators, frameworks and metrics, the evaluation process, and statistical analysis of the results. Drawing from diverse evaluation strategies highlighted in these studies, we propose a comprehensive and practical framework for human evaluation of generative LLMs, named QUEST: Quality of Information, Understanding and Reasoning, Expression Style and Persona, Safety and Harm, and Trust and Confidence. This framework aims to improve the reliability, generalizability, and applicability of human evaluation of generative LLMs in different healthcare applications by defining clear evaluation dimensions and offering detailed guidelines.
# Understanding the Difficulty of Solving Cauchy Problems with PINNs

Understanding the Difficulty of Solving Cauchy Problems with PINNs ( http://arxiv.org/abs/2405.02561v1 )

License: see the linked page
Physics-Informed Neural Networks (PINNs) have gained popularity in scientific computing in recent years. However, they often fail to achieve the same level of accuracy as classical methods in solving differential equations. In this paper, we identify two sources of this issue in the case of Cauchy problems: the use of $L^2$ residuals as objective functions and the approximation gap of neural networks. We show that minimizing the sum of $L^2$ residual and initial condition error is not sufficient to guarantee the true solution, as this loss function does not capture the underlying dynamics. Additionally, neural networks are not capable of capturing singularities in the solutions due to the non-compactness of their image sets. This, in turn, influences the existence of global minima and the regularity of the network. We demonstrate that when the global minimum does not exist, machine precision becomes the predominant source of achievable error in practice. We also present numerical experiments in support of our theoretical claims.
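
For orientation, the kind of objective discussed above can be written out as follows. The notation is an assumption made for illustration (the abstract does not fix it): a Cauchy problem $\partial_t u = \mathcal{A}[u]$ on $(0,T]\times\Omega$ with initial data $u(0,\cdot) = u_0$, and a network $u_\theta$ with parameters $\theta$.

$$
\mathcal{L}(\theta) \;=\; \bigl\| \partial_t u_\theta - \mathcal{A}[u_\theta] \bigr\|_{L^2((0,T)\times\Omega)}^2 \;+\; \bigl\| u_\theta(0,\cdot) - u_0 \bigr\|_{L^2(\Omega)}^2
$$

The first term is the $L^2$ residual of the equation and the second is the initial condition error. In practice both norms are typically estimated from sampled collocation points, which is a separate source of error from the two the abstract identifies.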
# Deep Representation Learning-Based Dynamic Trajectory Phenotyping for Acute Respiratory Failure in Medical Intensive Care Units

Deep Representation Learning-Based Dynamic Trajectory Phenotyping for Acute Respiratory Failure in Medical Intensive Care Units ( http://arxiv.org/abs/2405.02563v1 )

License: see the linked page
Sepsis-induced acute respiratory failure (ARF) is a serious complication with a poor prognosis. This paper presents a deep representation learning-based phenotyping method to identify distinct groups of clinical trajectories of septic patients with ARF. For this retrospective study, we created a dataset from electronic medical records (EMR) consisting of data from sepsis patients admitted to medical intensive care units who required at least 24 hours of invasive mechanical ventilation at a quaternary care academic hospital in the southeastern USA during 2016-2021. A total of N=3349 patient encounters were included in this study. The Clustering Representation Learning on Incomplete Time Series Data (CRLI) algorithm was applied to a parsimonious set of EMR variables in this dataset. To validate the optimal number of clusters, the K-means algorithm was used in conjunction with dynamic time warping. Our model yielded four distinct patient phenotypes that were characterized as liver dysfunction/heterogeneous, hypercapnia, hypoxemia, and multiple organ dysfunction syndrome by a critical care expert. A Kaplan-Meier analysis comparing 28-day mortality trends showed significant differences (p < 0.005) between the four phenotypes. The study demonstrates the utility of our deep representation learning-based approach in unraveling phenotypes that reflect the heterogeneity in sepsis-induced ARF in terms of different mortality outcomes and severity. These phenotypes might reveal important clinical insights into an effective prognosis and tailored treatment strategies.
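
Dynamic time warping (DTW), mentioned above as the distance used alongside K-means, can be sketched for univariate series as follows. This is the textbook dynamic program with an absolute-difference cost, not the study's code; clinical trajectories are multivariate and incomplete, which this minimal version does not handle.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping with absolute-difference cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

# Two series with the same shape but different lengths align at zero cost.
same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance([1, 2, 3], [2, 3, 4])
```

Because DTW allows one point to match several points in the other series, trajectories that evolve at different speeds can still be judged similar, which a pointwise Euclidean distance would miss.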
# Leveraging the Human Ventral Visual Stream to Improve Neural Network Robustness

Leveraging the Human Ventral Visual Stream to Improve Neural Network Robustness ( http://arxiv.org/abs/2405.02564v1 )

License: see the linked page
Human object recognition exhibits remarkable resilience in cluttered and dynamic visual environments. In contrast, despite their unparalleled performance across numerous visual tasks, Deep Neural Networks (DNNs) remain far less robust than humans, showing, for example, a surprising susceptibility to adversarial attacks involving image perturbations that are (almost) imperceptible to humans. Human object recognition likely owes its robustness, in part, to the increasingly resilient representations that emerge along the hierarchy of the ventral visual cortex. Here we show that DNNs, when guided by neural representations from a hierarchical sequence of regions in the human ventral visual stream, display increasing robustness to adversarial attacks. These neural-guided models also exhibit a gradual shift towards more human-like decision-making patterns and develop hierarchically smoother decision surfaces. Importantly, the resulting representational spaces differ in important ways from those produced by conventional smoothing methods, suggesting that such neural-guidance may provide previously unexplored robustness solutions. Our findings support the gradual emergence of human robustness along the ventral visual hierarchy and suggest that the key to DNN robustness may lie in increasing emulation of the human brain.
# Dirac Brackets $\leftrightarrow$ Open Quantum Systems: A Correspondence

Dirac Brackets $\leftrightarrow$ Open Quantum Systems: A Correspondence ( http://arxiv.org/abs/2405.02566v1 )

License: see the linked page
The time evolution of an open quantum system is governed by the Gorini-Kossakowski-Sudarshan-Lindblad equation for the reduced density operator of the system. This operator is obtained from the full density operator of the composite system involving the system itself, the bath, and the interactions between them, by performing a partial trace over the bath degrees of freedom. The entanglement between the system and the bath leads to a generalized Liouville evolution that involves, amongst other things, dissipation and decoherence of the system. In a similar fashion, the time evolution of a physical observable in a classically constrained dynamical system is governed by a generalization of the Liouville equation, in which the usual Poisson bracket is replaced by the so-called Dirac bracket. The generalization takes into account the reduction in the phase space of the system because of constraints, which arise either because they are introduced by hand, or because of some underlying gauge invariance. We derive an intriguing, but precise classical-quantum correspondence between the aforementioned situations which connects the Lindblad operators to the constraints. The correspondence is illustrated in a system of coupled simple harmonic oscillators studied earlier in the context of the area law of black holes by Bombelli, Koul, Lee, and Sorkin, and independently by Srednicki.
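
For reference, the two evolution laws being compared have the following standard textbook forms. The symbols are chosen here for illustration and are not taken from the paper: $\rho$ is the reduced density operator, $H$ the Hamiltonian, $L_k$ the Lindblad operators with rates $\gamma_k \ge 0$, and $\phi_a$ a set of second-class constraints.

$$
\frac{d\rho}{dt} \;=\; -\frac{i}{\hbar}\,[H,\rho] \;+\; \sum_k \gamma_k \Bigl( L_k\,\rho\,L_k^\dagger - \tfrac{1}{2}\,L_k^\dagger L_k\,\rho - \tfrac{1}{2}\,\rho\,L_k^\dagger L_k \Bigr)
$$

$$
\frac{df}{dt} \;=\; \{f,H\}_{D}, \qquad \{f,g\}_{D} \;=\; \{f,g\} \;-\; \sum_{a,b} \{f,\phi_a\}\,(C^{-1})_{ab}\,\{\phi_b,g\}, \qquad C_{ab} = \{\phi_a,\phi_b\}
$$

In both cases the first term is the unconstrained (closed-system) Liouville evolution and the remainder is a correction: the dissipator on the quantum side, the constraint-dependent subtraction on the classical side.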
# ActiveNeuS: Active 3D Reconstruction using Neural Implicit Surface Uncertainty

ActiveNeuS: Active 3D Reconstruction using Neural Implicit Surface Uncertainty ( http://arxiv.org/abs/2405.02568v1 )

License: see the linked page
Active learning in 3D scene reconstruction has been widely studied, as selecting informative training views is critical for the reconstruction. Recently, Neural Radiance Fields (NeRF) variants have shown performance increases in active 3D reconstruction using image rendering or geometric uncertainty. However, the simultaneous consideration of both uncertainties in selecting informative views remains unexplored, while utilizing different types of uncertainty can reduce the bias that arises in the early training stage with sparse inputs. In this paper, we propose ActiveNeuS, which evaluates candidate views considering both uncertainties. ActiveNeuS provides a way to accumulate image rendering uncertainty while avoiding the bias that the estimated densities can introduce. ActiveNeuS computes the neural implicit surface uncertainty, providing the color uncertainty along with the surface information. It efficiently handles the bias by using the surface information and a grid, enabling the fast selection of diverse viewpoints. Our method outperforms previous works on popular datasets, Blender and DTU, showing that the views selected by ActiveNeuS significantly improve performance.
# Decoupling Exploration and Exploitation for Unsupervised Pre-training with Successor Features

Decoupling Exploration and Exploitation for Unsupervised Pre-training with Successor Features ( http://arxiv.org/abs/2405.02569v1 )

License: see the linked page
Unsupervised pre-training has been on the lookout for the virtue of a value function representation referred to as successor features (SFs), which decouples the dynamics of the environment from the rewards. It has a significant impact on the process of task-specific fine-tuning due to the decomposition. However, existing approaches struggle with local optima due to the unified intrinsic reward of exploration and exploitation without considering the linear regression problem and the discriminator supporting a small skill space. We propose a novel unsupervised pre-training model with SFs based on a non-monolithic exploration methodology. Our approach pursues the decomposition of exploitation and exploration of an agent built on SFs, which requires separate agents for the respective purpose. The idea will leverage not only the inherent characteristics of SFs such as a quick adaptation to new tasks but also the exploratory and task-agnostic capabilities. Our suggested model is termed Non-Monolithic unsupervised Pre-training with Successor features (NMPS), which improves the performance of the original monolithic exploration method of pre-training with SFs. NMPS outperforms Active Pre-training with Successor Features (APS) in a comparative experiment.
# ViTALS: Vision Transformer for Action Localization in Surgical Nephrectomy

ViTALS: Vision Transformer for Action Localization in Surgical Nephrectomy ( http://arxiv.org/abs/2405.02571v1 )

License: see the linked page
Surgical action localization is a challenging computer vision problem. While it has promising applications such as automated training of surgery procedures and surgical workflow optimization, appropriate model design is pivotal to accomplishing this task. Moreover, the lack of suitable medical datasets adds an additional layer of complexity. To that effect, we introduce a new complex dataset of nephrectomy surgeries called UroSlice. To perform action localization on these videos, we propose a novel model termed 'ViTALS' (Vision Transformer for Action Localization in Surgical Nephrectomy). Our model incorporates hierarchical dilated temporal convolution layers and inter-layer residual connections to capture the temporal correlations at finer as well as coarser granularities. The proposed approach achieves state-of-the-art performance on Cholec80 and UroSlice datasets (89.8% and 66.1% accuracy, respectively), validating its effectiveness.
# Off-OAB: Off-Policy Policy Gradient Method with Optimal Action-Dependent Baseline

Off-OAB: Off-Policy Policy Gradient Method with Optimal Action-Dependent Baseline ( http://arxiv.org/abs/2405.02572v1 )

License: see the linked page
Policy-based methods have achieved remarkable success in solving challenging reinforcement learning problems. Among these methods, off-policy policy gradient methods are particularly important because they can benefit from off-policy data. However, these methods suffer from the high variance of the off-policy policy gradient (OPPG) estimator, which results in poor sample efficiency during training. In this paper, we propose an off-policy policy gradient method with the optimal action-dependent baseline (Off-OAB) to mitigate this variance issue. Specifically, this baseline maintains the OPPG estimator's unbiasedness while theoretically minimizing its variance. To enhance practical computational efficiency, we design an approximated version of this optimal baseline. Utilizing this approximation, our method (Off-OAB) aims to decrease the OPPG estimator's variance during policy optimization. We evaluate the proposed Off-OAB method on six representative tasks from OpenAI Gym and MuJoCo, where it demonstrably surpasses state-of-the-art methods on the majority of these tasks.
# A Combination of BERT and Transformer for Vietnamese Spelling Correction

A Combination of BERT and Transformer for Vietnamese Spelling Correction ( http://arxiv.org/abs/2405.02573v1 )

License: see the linked page
Recently, many studies have shown the efficiency of using Bidirectional Encoder Representations from Transformers (BERT) in various Natural Language Processing (NLP) tasks. Specifically, an English spelling correction approach that uses an Encoder-Decoder architecture and takes advantage of BERT has achieved state-of-the-art results. However, to our knowledge, there is no implementation for Vietnamese yet. Therefore, in this study, a combination of the Transformer architecture (state-of-the-art for Encoder-Decoder models) and BERT is proposed to deal with Vietnamese spelling correction. The experimental results show that our model outperforms other approaches as well as the Google Docs Spell Checking tool, achieving an 86.24 BLEU score on this task.
# A Data Mining-Based Dynamical Anomaly Detection Method for Integrating with an Advance Metering System

A Data Mining-Based Dynamical Anomaly Detection Method for Integrating with an Advance Metering System ( http://arxiv.org/abs/2405.02574v1 )

License: see the linked page
Building operations consume 30% of total power consumption and contribute 26% of global power-related emissions. Therefore, monitoring and early detection of anomalies at the meter level are essential for residential and commercial buildings. This work investigates both supervised and unsupervised approaches and introduces a dynamic anomaly detection system. The system combines a supervised Light Gradient Boosting machine and an unsupervised autoencoder with a dynamic threshold. This system is designed to provide real-time detection of anomalies at the meter level. The dynamic threshold is based on the Mahalanobis distance and moving averages. This approach allows the system to adapt to changes in the data distribution over time. The effectiveness of the proposed system is evaluated using real-life power consumption data collected from smart metering systems. This empirical testing ensures that the system's performance is validated under real-world conditions. By detecting unusual data movements and providing early warnings, the proposed system contributes significantly to visual analytics and decision science. Early detection of anomalies enables timely troubleshooting, preventing financial losses and potential disasters such as fire incidents.
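
One way to picture a threshold that adapts over time is sketched below for a one-dimensional anomaly score, where the Mahalanobis distance reduces to |x - mean| / std over a trailing window. The window size, the multiplier `k`, and the function name `detect_anomalies` are assumptions made for illustration; the abstract does not give the paper's exact rule.

```python
from collections import deque
from statistics import mean, pstdev

def detect_anomalies(scores, window=5, k=3.0):
    """Flag scores lying more than k standard deviations from the trailing-window mean."""
    history = deque(maxlen=window)
    flags = []
    for x in scores:
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            if sigma > 0:
                distance = abs(x - mu) / sigma
            else:
                # Zero-variance window: any deviation at all is infinitely far away.
                distance = 0.0 if x == mu else float("inf")
            flags.append(distance > k)
        else:
            flags.append(False)  # not enough history yet to form a threshold
        history.append(x)
    return flags

# Illustrative scores, e.g. autoencoder reconstruction errors, with one spike.
flags = detect_anomalies([1.0, 1.1, 0.9, 1.0, 1.1, 5.0, 1.0])
```

Note that in this simple version a flagged value still enters the window and inflates the next few thresholds; a production system might exclude flagged points from the history.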
# CTD4 - A Deep Continuous Distributional Actor-Critic Agent with a Kalman Fusion of Multiple Critics

CTD4 - A Deep Continuous Distributional Actor-Critic Agent with a Kalman Fusion of Multiple Critics ( http://arxiv.org/abs/2405.02576v1 )

License: see the linked page
Categorical Distributional Reinforcement Learning (CDRL) has demonstrated superior sample efficiency in learning complex tasks compared to conventional Reinforcement Learning (RL) approaches. However, the practical application of CDRL is encumbered by challenging projection steps, detailed parameter tuning, and domain knowledge. This paper addresses these challenges by introducing a pioneering Continuous Distributional Model-Free RL algorithm tailored for continuous action spaces. The proposed algorithm simplifies the implementation of distributional RL, adopting an actor-critic architecture wherein the critic outputs a continuous probability distribution. Additionally, we propose an ensemble of multiple critics fused through a Kalman fusion mechanism to mitigate overestimation bias. Through a series of experiments, we validate that our proposed method is easy to train and serves as a sample-efficient solution for executing complex continuous-control tasks.
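
The idea of fusing several critics that each output a distribution can be illustrated with the textbook scalar Kalman update, which combines independent Gaussian estimates of the same quantity. This is a sketch under that assumption; the abstract does not spell out the paper's fusion equations, and `fuse_gaussians` is a hypothetical name.

```python
def fuse_gaussians(means, variances):
    """Fuse independent Gaussian estimates of one quantity by sequential Kalman updates."""
    fused_mean, fused_var = means[0], variances[0]
    for m, v in zip(means[1:], variances[1:]):
        gain = fused_var / (fused_var + v)            # Kalman gain
        fused_mean = fused_mean + gain * (m - fused_mean)
        fused_var = (1.0 - gain) * fused_var          # fused variance never increases
    return fused_mean, fused_var

# Three hypothetical critics estimating the same value with different confidence.
value, variance = fuse_gaussians([10.0, 12.0, 11.0], [4.0, 4.0, 2.0])
```

The result equals inverse-variance weighting, so a confident critic (small variance) pulls the fused estimate more strongly than an uncertain one, unlike taking a plain mean or minimum over critics.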
# Mixat: A Data Set of Bilingual Emirati-English Speech

Mixat: A Data Set of Bilingual Emirati-English Speech ( http://arxiv.org/abs/2405.02578v1 )

License: see the linked page
This paper introduces Mixat: a dataset of Emirati speech code-mixed with English. Mixat was developed to address the shortcomings of current speech recognition resources when applied to Emirati speech, and in particular, to bilingual Emirati speakers who often mix and switch between their local dialect and English. The dataset consists of 15 hours of speech derived from two public podcasts featuring native Emirati speakers, one of which is in the form of conversations between the host and a guest. Therefore, the collection contains examples of Emirati-English code-switching in both formal and natural conversational contexts. In this paper, we describe the process of data collection and annotation, and describe some of the features and statistics of the resulting dataset. In addition, we evaluate the performance of pre-trained Arabic and multi-lingual ASR systems on our dataset, demonstrating the shortcomings of existing models on this low-resource dialectal Arabic, and the additional challenge of recognizing code-switching in ASR. The dataset will be made publicly available for research use.
# Innate Motivation for Robot Swarms by Minimizing Surprise: From Simple Simulations to Real-World Experiments

Innate Motivation for Robot Swarms by Minimizing Surprise: From Simple Simulations to Real-World Experiments ( http://arxiv.org/abs/2405.02579v1 )

License: see the linked page
Large-scale mobile multi-robot systems can be preferable to monolithic robots because of their higher potential for robustness and scalability. Developing controllers for multi-robot systems is challenging because the multitude of interactions is hard to anticipate and difficult to model. Automatic design using machine learning or evolutionary robotics seems to be an option to avoid that challenge, but brings the challenge of designing reward or fitness functions. Generic reward and fitness functions seem unlikely to exist and task-specific rewards often have undesired side effects. Approaches of so-called innate motivation try to avoid the specific formulation of rewards and work instead with different drivers, such as curiosity. Our approach to innate motivation is to minimize surprise, which we implement by maximizing the accuracy of the swarm robot's sensor predictions using neuroevolution. A unique advantage of the swarm robot case is that swarm members populate the robot's environment and can trigger more active behaviors in a self-referential loop. We summarize our previous simulation-based results concerning behavioral diversity, robustness, scalability, and engineered self-organization, and put them into context. In several new studies, we analyze the influence of the optimizer's hyperparameters, the scalability of evolved behaviors, and the impact of realistic robot simulations. Finally, we present results using real robots that show how the reality gap can be bridged.
# PropertyGPT: LLM-driven Formal Verification of Smart Contracts through Retrieval-Augmented Property Generation

PropertyGPT: LLM-driven Formal Verification of Smart Contracts through Retrieval-Augmented Property Generation ( http://arxiv.org/abs/2405.02580v1 )

License: see the linked page
With recent advances in large language models (LLMs), this paper explores the potential of leveraging state-of-the-art LLMs, such as GPT-4, to transfer existing human-written properties (e.g., those from Certora auditing reports) and automatically generate customized properties for unknown code. To this end, we embed existing properties into a vector database and retrieve a reference property for LLM-based in-context learning to generate a new property for a given code. While this basic process is relatively straightforward, ensuring that the generated properties are (i) compilable, (ii) appropriate, and (iii) runtime-verifiable presents challenges. To address (i), we use the compilation and static analysis feedback as an external oracle to guide LLMs in iteratively revising the generated properties. For (ii), we consider multiple dimensions of similarity to rank the properties and employ a weighted algorithm to identify the top-K properties as the final result. For (iii), we design a dedicated prover to formally verify the correctness of the generated properties. We have implemented these strategies into a novel system called PropertyGPT, with 623 human-written properties collected from 23 Certora projects. Our experiments show that PropertyGPT can generate comprehensive and high-quality properties, achieving an 80% recall compared to the ground truth. It successfully detected 26 CVEs/attack incidents out of 37 tested and also uncovered 12 zero-day vulnerabilities, resulting in $8,256 bug bounty rewards.
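
The retrieval step can be pictured as a nearest-neighbour lookup over embedding vectors. The sketch below ranks stored properties by cosine similarity to a query and returns the top K. The property names, the three-dimensional toy vectors, and the single-similarity ranking are all assumptions made for illustration; the abstract describes a weighted combination of several similarity dimensions whose details are not given here.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_k(query, store, k=1):
    """Return the k stored names whose embeddings are most similar to the query."""
    ranked = sorted(store, key=lambda name: cosine(query, store[name]), reverse=True)
    return ranked[:k]

# Toy "embeddings"; real ones would come from an embedding model.
store = {
    "balance_never_negative": [0.9, 0.1, 0.0],
    "only_owner_can_pause": [0.0, 0.2, 0.9],
    "total_supply_conserved": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]
references = top_k(query, store, k=2)
```

The retrieved entries would then be placed in the prompt as in-context examples for generating a property for the new code.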
# Stationary Representations: Optimally Approximating Compatibility and Implications for Improved Model Replacements

Stationary Representations: Optimally Approximating Compatibility and Implications for Improved Model Replacements ( http://arxiv.org/abs/2405.02581v1 )

License: see the linked page
Learning compatible representations enables the interchangeable use of semantic features as models are updated over time. This is particularly relevant in search and retrieval systems where it is crucial to avoid reprocessing of the gallery images with the updated model. While recent research has shown promising empirical evidence, there is still a lack of comprehensive theoretical understanding about learning compatible representations. In this paper, we demonstrate that the stationary representations learned by the $d$-Simplex fixed classifier optimally approximate compatibility representation according to the two inequality constraints of its formal definition. This not only establishes a solid foundation for future works in this line of research but also presents implications that can be exploited in practical learning scenarios. An exemplary application is the now-standard practice of downloading and fine-tuning new pre-trained models. Specifically, we show the strengths and critical issues of stationary representations in the case in which a model undergoing sequential fine-tuning is asynchronously replaced by downloading a better-performing model pre-trained elsewhere. Such a representation enables seamless delivery of retrieval service (i.e., no reprocessing of gallery images) and offers improved performance without operational disruptions during model replacement. Code available at: https://github.com/miccunifi/iamcl2r.
# Explainable Interface for Human-Autonomy Teaming: A Survey

Explainable Interface for Human-Autonomy Teaming: A Survey ( http://arxiv.org/abs/2405.02583v1 )

License: see the linked page
Nowadays, large-scale foundation models are being increasingly integrated into numerous safety-critical applications, including human-autonomy teaming (HAT) within transportation, medical, and defence domains. Consequently, the inherent 'black-box' nature of these sophisticated deep neural networks heightens the significance of fostering mutual understanding and trust between humans and autonomous systems. To tackle the transparency challenges in HAT, this paper conducts a thoughtful study on the underexplored domain of Explainable Interface (EI) in HAT systems from a human-centric perspective, thereby enriching the existing body of research in Explainable Artificial Intelligence (XAI). We explore the design, development, and evaluation of EI within XAI-enhanced HAT systems. To do so, we first clarify the distinctions between these concepts: EI, explanations and model explainability, aiming to provide researchers and practitioners with a structured understanding. Second, we contribute to a novel framework for EI, addressing the unique challenges in HAT. Last, our summarized evaluation framework for ongoing EI offers a holistic perspective, encompassing model performance, human-centered factors, and group task objectives. Based on extensive surveys across XAI, HAT, psychology, and Human-Computer Interaction (HCI), this review offers multiple novel insights into incorporating XAI into HAT systems and outlines future directions.
# Generalizing CLIP to Unseen Domain via Text-Guided Diverse Novel Feature Synthesis

Generalizing CLIP to Unseen Domain via Text-Guided Diverse Novel Feature Synthesis ( http://arxiv.org/abs/2405.02586v1 )

License: see the linked page
Vision-language foundation models like CLIP have shown impressive zero-shot generalization, but finetuning on downstream datasets can cause overfitting and loss of its generalization ability on unseen domains. Although collecting additional data from new domains of interest is possible, this method is often impractical due to the challenges in obtaining annotated data. To address this, we propose a plug-and-play feature augmentation method called LDFS (Language-Guided Diverse Feature Synthesis) to synthesize new domain features and improve existing CLIP fine-tuning strategies. LDFS has three main contributions: 1) To synthesize novel domain features and promote diversity, we propose an instance-conditional feature augmentation strategy based on a text-guided feature augmentation loss. 2) To maintain feature quality after augmenting, we introduce a pairwise regularizer to preserve augmented feature coherence within the CLIP feature space. 3) We propose to use stochastic text feature augmentation to reduce the modality gap and further facilitate the process of text-guided feature synthesis. Extensive experiments show the superiority of LDFS in improving CLIP generalization ability on unseen domains without collecting data from those domains. The code will be made publicly available.
# Better YOLO with Attention-Augmented Network and Enhanced Generalization Performance for Safety Helmet Detection

Better YOLO with Attention-Augmented Network and Enhanced Generalization Performance for Safety Helmet Detection ( http://arxiv.org/abs/2405.02591v1 )

License: see the linked page
Safety helmets play a crucial role in protecting workers from head injuries in construction sites, where potential hazards are prevalent. However, currently, there is no approach that can simultaneously achieve both model accuracy and performance in complex environments. In this study, we utilized a YOLO-based model for safety helmet detection, achieving a 2% improvement in mAP (mean Average Precision) performance while reducing the parameter and FLOPs counts by over 25%. YOLO (You Only Look Once) is a widely used, high-performance, lightweight model architecture that is well suited for complex environments. We present a novel approach by incorporating a lightweight feature extraction network backbone based on GhostNetv2, integrating attention modules such as Spatial Channel-wise Attention Net (SCNet) and Coordination Attention Net (CANet), and adopting the Gradient Norm Aware optimizer (GAM) for improved generalization ability. In safety-critical environments, the accuracy and speed of safety helmet detection play a pivotal role in preventing occupational hazards and ensuring compliance with safety protocols. This work addresses the pressing need for robust and efficient helmet detection methods, offering a comprehensive framework that not only enhances accuracy but also improves the adaptability of detection models to real-world conditions. Our experimental results underscore the synergistic effects of GhostNetv2, attention modules, and the GAM optimizer, presenting a compelling solution for safety helmet detection that achieves superior performance in terms of accuracy, generalization, and efficiency.
# Hierarchy of emergent cluster states by measurement from symmetry-protected-topological states with large symmetry to subsystem cat state

Hierarchy of emergent cluster states by measurement from symmetry-protected-topological states with large symmetry to subsystem cat state ( http://arxiv.org/abs/2405.02592v1 )

License: see the linked page
We propose a measurement-producing hierarchy that emerges among correlated states under sequential subsystem projective measurements. We start from symmetry-protected-topological (SPT) cluster states with a large symmetry, apply sequential subsystem projective measurements to them, and find that generalized cluster SPT states with a reduced symmetry appear in the subsystem of the unmeasured sites. That prescription finally produces Greenberger-Horne-Zeilinger states with long-range order in the subsystem composed of periodic unmeasured sites of the original lattice. The symmetry-reduction hierarchical structure from a general large symmetric SPT cluster state is clearly captured by the measurement update flow in the efficient algorithm of the stabilizer formalism. This approach is useful not only for the analytical search for the measured state but also for numerical simulation with a large system size. We also numerically verify the symmetry-reduction hierarchy by sequential subsystem projective measurements applied to large systems and large symmetric cluster SPT states.
# Leveraging (Biased) Information: Multi-armed Bandits with Offline Data

License: see the linked page
Large Language Models estimate fine-grained human color-concept associations ( http://arxiv.org/abs/2406.17781v1 )
# Vision-based 3D occupancy prediction in autonomous driving: a review and outlook

Vision-based 3D occupancy prediction in autonomous driving: a review and outlook ( http://arxiv.org/abs/2405.02595v1 )

Yanan Zhang, Jinqing Zhang, Zengran Wang, Junhao Xu, Di Huang

In recent years, autonomous driving has garnered escalating attention for its potential to relieve drivers' burdens and improve driving safety. Vision-based 3D occupancy prediction, which predicts the spatial occupancy status and semantics of 3D voxel grids around the autonomous vehicle from image inputs, is an emerging perception task suitable for cost-effective perception systems in autonomous driving. Although numerous studies have demonstrated the greater advantages of 3D occupancy prediction over object-centric perception tasks, there is still a lack of a dedicated review focusing on this rapidly developing field. In this paper, we first introduce the background of vision-based 3D occupancy prediction and discuss the challenges in this task. Secondly, we conduct a comprehensive survey of the progress in vision-based 3D occupancy prediction from three aspects: feature enhancement, deployment friendliness and label efficiency, and provide an in-depth analysis of the potentials and challenges of each category of methods. Finally, we present a summary of prevailing research trends and propose some inspiring future outlooks. To provide a valuable reference for researchers, a regularly updated collection of related papers, datasets, and code is organized at https://github.com/zya3d/Awesome-3D-Occupancy-Prediction.
# Random Masking Finds Winning Tickets for Parameter Efficient Fine-tuning

Random Masking Finds Winning Tickets for Parameter Efficient Fine-tuning ( http://arxiv.org/abs/2405.02596v1 )

Jing Xu, Jingzhao Zhang

Fine-tuning large language models (LLM) can be costly. Parameter-efficient fine-tuning (PEFT) addresses the problems by training a fraction of the parameters, whose success reveals the expressiveness and flexibility of pretrained models. This paper studies the limit of PEFT, by further simplifying its design and reducing the number of trainable parameters beyond standard setups. To this end, we use Random Masking to fine-tune the pretrained model. Despite its simplicity, we show that Random Masking is surprisingly effective: with a larger-than-expected learning rate, Random Masking can match the performance of standard PEFT algorithms such as LoRA on various tasks, using fewer trainable parameters. We provide both empirical and theoretical explorations into the success of Random Masking. We show that masking induces a flatter loss landscape and more distant solutions, which allows for and necessitates large learning rates.
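
The core idea of training only a randomly chosen subset of parameters can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helpers `random_mask` and `masked_sgd_step`, the flat parameter list, the plain SGD update, and the 20% trainable fraction are all hypothetical choices made for the example.

```python
import random


def random_mask(num_params, fraction, seed=0):
    """Mark a random subset of parameter indices as trainable."""
    rng = random.Random(seed)
    k = max(1, round(num_params * fraction))
    chosen = set(rng.sample(range(num_params), k))
    return [i in chosen for i in range(num_params)]


def masked_sgd_step(params, grads, mask, lr):
    """Apply an SGD update only where the mask is True; freeze the rest."""
    return [p - lr * g if m else p for p, g, m in zip(params, grads, mask)]


params = [0.5] * 10          # stand-in for pretrained weights
grads = [1.0] * 10           # stand-in for gradients of the fine-tuning loss
mask = random_mask(len(params), fraction=0.2)
updated = masked_sgd_step(params, grads, mask, lr=0.1)

changed = sum(1 for old, new in zip(params, updated) if old != new)
print(f"{changed} of {len(params)} parameters were updated")
```

The mask is drawn once and then held fixed, so the number of trainable parameters is set by `fraction` alone. The abstract's observation that larger learning rates are needed would correspond to raising `lr` as `fraction` shrinks.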
# UDUC: An Uncertainty-driven Approach for Learning-based Robust Control

UDUC: An Uncertainty-driven Approach for Learning-based Robust Control ( http://arxiv.org/abs/2405.02598v1 )

Yuan Zhang, Jasper Hoffmann, Joschka Boedecker

Learning-based techniques have become popular in both model predictive control (MPC) and reinforcement learning (RL). Probabilistic ensemble (PE) models offer a promising approach for modelling system dynamics, showcasing the ability to capture uncertainty and scalability in high-dimensional control scenarios. However, PE models are susceptible to mode collapse, resulting in non-robust control when faced with environments slightly different from the training set. In this paper, we introduce the $\textbf{u}$ncertainty-$\textbf{d}$riven rob$\textbf{u}$st $\textbf{c}$ontrol (UDUC) loss as an alternative objective for training PE models, drawing inspiration from contrastive learning. We analyze the robustness of UDUC loss through the lens of robust optimization and evaluate its performance on the challenging Real-world Reinforcement Learning (RWRL) benchmark, which involves significant environmental mismatches between the training and testing environments.
# Astro-NER -- Astronomy Named Entity Recognition: Is GPT a Good Domain Expert Annotator?

Astro-NER -- Astronomy Named Entity Recognition: Is GPT a Good Domain Expert Annotator? ( http://arxiv.org/abs/2405.02602v1 )

Julia Evans, Sameer Sadruddin, Jennifer D'Souza

In this study, we address one of the challenges of developing NER models for scholarly domains, namely the scarcity of suitable labeled data. We experiment with an approach using predictions from a fine-tuned LLM model to aid non-domain experts in annotating scientific entities within astronomy literature, with the goal of uncovering whether such a collaborative process can approximate domain expertise. Our results reveal moderate agreement between a domain expert and the LLM-assisted non-experts, as well as fair agreement between the domain expert and the LLM model's predictions. In an additional experiment, we compare the performance of finetuned and default LLMs on this task. We have also introduced a specialized scientific entity annotation scheme for astronomy, validated by a domain expert. Our approach adopts a scholarly research contribution-centric perspective, focusing exclusively on scientific entities relevant to the research theme. The resultant dataset, containing 5,000 annotated astronomy article titles, is made publicly available.
# MEXGEN: An Effective and Efficient Information Gain Approximation for Information Gathering Path Planning

MEXGEN: An Effective and Efficient Information Gain Approximation for Information Gathering Path Planning ( http://arxiv.org/abs/2405.02605v1 )

Joshua Chesser, Thuraiappah Sathyan, Damith C. Ranasinghe

Autonomous robots for gathering information on objects of interest have numerous real-world applications because they improve efficiency, performance and safety. Realizing autonomy demands online planning algorithms to solve sequential decision making problems under uncertainty, because objects of interest are often dynamic and object state, such as location, is not directly observable and is obtained from noisy measurements. Such planning problems are notoriously difficult due to the combinatorial nature of predicting the future to make optimal decisions. For information theoretic planning algorithms, we develop a computationally efficient and effective approximation for the difficult problem of predicting the likely sensor measurements from uncertain belief states. The approach more accurately predicts information gain from information gathering actions. Our theoretical analysis proves the proposed formulation achieves a lower prediction error than the current efficient method. We demonstrate performance gains in radio-source tracking and localization problems using extensive simulated and field experiments with a multirotor aerial robot.
# UnSAMFlow: Unsupervised Optical Flow Guided by Segment Anything Model

UnSAMFlow: Unsupervised Optical Flow Guided by Segment Anything Model ( http://arxiv.org/abs/2405.02608v1 )

Shuai Yuan, Lei Luo, Zhuo Hui, Can Pu, Xiaoyu Xiang, Rakesh Ranjan, Denis Demandolx

Traditional unsupervised optical flow methods are vulnerable to occlusions and motion boundaries due to lack of object-level information. Therefore, we propose UnSAMFlow, an unsupervised flow network that also leverages object information from the latest foundation model Segment Anything Model (SAM). We first include a self-supervised semantic augmentation module tailored to SAM masks. We also analyze the poor gradient landscapes of traditional smoothness losses and propose a new smoothness definition based on homography instead. A simple yet effective mask feature module has also been added to further aggregate features on the object level. With all these adaptations, our method produces clear optical flow estimation with sharp boundaries around objects, which outperforms state-of-the-art methods on both KITTI and Sintel datasets. Our method also generalizes well across domains and runs very efficiently.
# Advanced Equalization in 112 Gb/s Upstream PON Using a Novel Fourier Convolution-based Network

Advanced Equalization in 112 Gb/s Upstream PON Using a Novel Fourier Convolution-based Network ( http://arxiv.org/abs/2405.02609v1 )

Chen Shao, Elias Giacoumidis, Patrick Matalla, Jialei Li, Shi Li, Sebastian Randel, Andre Richter, Michael Faerber, Tobias Kaefer

We experimentally demonstrate a novel, low-complexity Fourier Convolution-based Network (FConvNet) based equalizer for 112 Gb/s upstream PAM4-PON. At a BER of 0.005, FConvNet enhances the receiver sensitivity by 2 and 1 dB compared to a 51-tap Sato equalizer and benchmark machine learning algorithms respectively.
# Learning Linear Utility Functions From Pairwise Comparison Queries

Learning Linear Utility Functions From Pairwise Comparison Queries ( http://arxiv.org/abs/2405.02612v1 )

Luise Ge, Brendan Juba, Yevgeniy Vorobeychik

We study learnability of linear utility functions from pairwise comparison queries. In particular, we consider two learning objectives. The first objective is to predict out-of-sample responses to pairwise comparisons, whereas the second is to approximately recover the true parameters of the utility function. We show that in the passive learning setting, linear utilities are efficiently learnable with respect to the first objective, both when query responses are uncorrupted by noise, and under Tsybakov noise when the distributions are sufficiently "nice". In contrast, we show that utility parameters are not learnable for a large set of data distributions without strong modeling assumptions, even when query responses are noise-free. Next, we proceed to analyze the learning problem in an active learning setting. In this case, we show that even the second objective is efficiently learnable, and present algorithms for both the noise-free and noisy query response settings. Our results thus exhibit a qualitative learnability gap between passive and active learning from pairwise preference queries, demonstrating the value of the ability to select pairwise queries for utility learning.
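
The first learning objective, predicting responses to unseen pairwise comparisons in the noise-free passive setting, can be illustrated with a small sketch. It assumes a perceptron trained on difference vectors, which is one standard way to fit a linear separator and is not claimed to be the algorithm analyzed in the paper. The helper names and the five toy comparisons (generated from a hidden utility u(x) = 2*x0 - x1) are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fit_utility(comparisons, dim, max_epochs=100):
    """Perceptron on difference vectors x - y.

    Each comparison is (x, y, x_preferred). A weight vector w is consistent
    with it when sign(w . (x - y)) matches the stated preference.
    """
    w = [0.0] * dim
    for _ in range(max_epochs):
        mistakes = 0
        for x, y, x_preferred in comparisons:
            d = [a - b for a, b in zip(x, y)]
            label = 1 if x_preferred else -1
            if label * dot(w, d) <= 0:          # wrong or undecided: update
                w = [wi + label * di for wi, di in zip(w, d)]
                mistakes += 1
        if mistakes == 0:                        # all comparisons explained
            break
    return w


def prefers_first(w, x, y):
    """Predict whether x is preferred over y under the learned utility."""
    return dot(w, x) > dot(w, y)


# Noise-free responses generated from the hidden utility u(x) = 2*x0 - x1.
comparisons = [
    ((1, 0), (0, 0), True),
    ((0, 1), (0, 0), False),
    ((1, 1), (0, 0), True),
    ((1, 3), (0, 0), False),
    ((2, 1), (1, 2), True),
]

w = fit_utility(comparisons, dim=2)
print("learned weights:", w)
print("out-of-sample prediction:", prefers_first(w, (3, 1), (1, 0)))
```

The learned weights only need to agree with the comparisons in sign, which is why predicting responses can succeed even when the true parameters are not recovered. This mirrors the gap between the two objectives that the abstract describes.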
# TetraBFT: Reducing Latency of Unauthenticated, Responsive BFT Consensus

TetraBFT: Reducing Latency of Unauthenticated, Responsive BFT Consensus ( http://arxiv.org/abs/2405.02615v1 )

Qianyu Yu, Giuliano Losa, Xuechao Wang

This paper presents TetraBFT, a novel unauthenticated Byzantine fault tolerant protocol for solving consensus in partial synchrony, eliminating the need for public key cryptography and ensuring resilience against computationally unbounded adversaries. TetraBFT has several compelling features: it necessitates only constant local storage, has optimal communication complexity, satisfies optimistic responsiveness -- allowing the protocol to operate at actual network speeds under ideal conditions -- and can achieve consensus in just 5 message delays, which outperforms all known unauthenticated protocols achieving the other properties listed. We validate the correctness of TetraBFT through rigorous security analysis and formal verification. Furthermore, we extend TetraBFT into a multi-shot, chained consensus protocol, making a pioneering effort in applying pipelining techniques to unauthenticated protocols. This positions TetraBFT as a practical and deployable solution for blockchain systems aiming for high efficiency.
# Contrastive Dual-Interaction Graph Neural Network for Molecular Property Prediction

Contrastive Dual-Interaction Graph Neural Network for Molecular Property Prediction ( http://arxiv.org/abs/2405.02628v1 )

Zexing Zhao, Guangsi Shi, Xiaopeng Wu, Ruohua Ren, Xiaojun Gao, Fuyi Li

Molecular property prediction is a key component of AI-driven drug discovery and molecular characterization learning. Despite recent advances, existing methods still face challenges such as limited ability to generalize, and inadequate representation of learning from unlabeled data, especially for tasks specific to molecular structures. To address these limitations, we introduce DIG-Mol, a novel self-supervised graph neural network framework for molecular property prediction. This architecture leverages the power of contrastive learning with dual interaction mechanisms and unique molecular graph enhancement strategies. DIG-Mol integrates a momentum distillation network with two interconnected networks to efficiently improve molecular characterization. The framework's ability to extract key information about molecular structure and higher-order semantics is supported by minimizing the contrastive loss. We have established DIG-Mol's state-of-the-art performance through extensive experimental evaluation in a variety of molecular property prediction tasks. In addition to demonstrating superior transferability in a small number of learning scenarios, our visualizations highlight DIG-Mol's enhanced interpretability and representation capabilities. These findings confirm the effectiveness of our approach in overcoming challenges faced by traditional methods and mark a significant advance in molecular property prediction.
# SPARSE: Semantic Tracking and Path Analysis for Attack Investigation in Real-time

SPARSE: Semantic Tracking and Path Analysis for Attack Investigation in Real-time ( http://arxiv.org/abs/2405.02629v1 )

Jie Ying, Tiantian Zhu, Wenrui Cheng, Qixuan Yuan, Mingjun Ma, Chunlin Xiong, Tieming Chen, Mingqi Lv, Yan Chen

As the complexity and destructiveness of Advanced Persistent Threats (APTs) increase, there is a growing tendency to identify the series of actions undertaken to achieve the attacker's target, a task called attack investigation. Currently, analysts construct a provenance graph to perform causality analysis on a Point-Of-Interest (POI) event and capture critical events (those related to the attack). However, due to the vast size of the provenance graph and the rarity of critical events, existing attack investigation methods suffer from high false positives, high overhead, and high latency. To this end, we propose SPARSE, an efficient and real-time system for constructing critical component graphs (i.e., graphs consisting of critical events) from streaming logs. Our key observations are that 1) critical events exist in a suspicious semantic graph (SSG) composed of interaction flows between suspicious entities, and 2) information flows that accomplish the attacker's goal exist in the form of paths. Therefore, SPARSE uses a two-stage framework to implement attack investigation (i.e., constructing the SSG and performing path-level contextual analysis). First, SPARSE operates in a state-based mode where events are consumed as streams, allowing easy access to the SSG related to the POI event through a semantic transfer rule and a storage strategy. Then, SPARSE identifies all suspicious flow paths (SFPs) related to the POI event from the SSG and quantifies the influence of each path to filter irrelevant events. Our evaluation on a real large-scale attack dataset shows that SPARSE can generate a critical component graph (~113 edges) in 1.6 seconds, which is 2014 times smaller than the backtracking graph (~227,589 edges). SPARSE is 25 times more effective than other state-of-the-art techniques in filtering irrelevant edges.
# cuTN-QSVM: cuTensorNet-accelerated Quantum Support Vector Machine with cuQuantum SDK

cuTN-QSVM: cuTensorNet-accelerated Quantum Support Vector Machine with cuQuantum SDK ( http://arxiv.org/abs/2405.02630v1 )

Kuan-Cheng Chen, Tai-Yue Li, Yun-Yuan Wang, Simon See, Chun-Chieh Wang, Robert Willie, Nan-Yow Chen, An-Cheng Yang, Chun-Yu Lin

This paper investigates the application of Quantum Support Vector Machines (QSVMs) with an emphasis on the computational advancements enabled by NVIDIA's cuQuantum SDK, especially leveraging the cuTensorNet library. We present a simulation workflow that substantially diminishes computational overhead, as evidenced by our experiments, from exponential to quadratic cost. While state vector simulations become infeasible for qubit counts over 50, our evaluation demonstrates that cuTensorNet speeds up simulations to be completed within seconds on the NVIDIA A100 GPU, even for qubit counts approaching 784. By employing multi-GPU processing with Message Passing Interface (MPI), we document a marked decrease in computation times, effectively demonstrating the strong linear speedup of our approach for increasing data sizes. This enables QSVMs to operate efficiently on High-Performance Computing (HPC) systems, thereby opening a new window for researchers to explore complex quantum algorithms that have not yet been investigated. In accuracy assessments, our QSVM achieves up to 95% on challenging classifications within the MNIST dataset for training sets larger than 100 instances, surpassing the capabilities of classical SVMs. These advancements position cuTensorNet within the cuQuantum SDK as a pivotal tool for scaling quantum machine learning simulations and potentially signpost the seamless integration of such computational strategies as pivotal within the Quantum-HPC ecosystem.
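
The claim that state vector simulation becomes infeasible beyond roughly 50 qubits follows from simple arithmetic, sketched below. The calculation assumes one double-precision complex amplitude (16 bytes) per basis state. That storage format is an assumption made for illustration and is not stated in the abstract, and the helper `statevector_bytes` is hypothetical.

```python
def statevector_bytes(num_qubits, bytes_per_amplitude=16):
    """Memory needed to hold all 2**n amplitudes of an n-qubit state vector."""
    return (2 ** num_qubits) * bytes_per_amplitude


GIB = 2 ** 30  # bytes in one gibibyte

for n in (30, 40, 50):
    size_gib = statevector_bytes(n) / GIB
    print(f"{n} qubits: {size_gib:,.0f} GiB")
```

Each added qubit doubles the requirement, so the growth is exponential in the qubit count. Avoiding this full state vector is what motivates the tensor-network approach described in the abstract.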
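# Unsupervised machine learning for data-driven classification of rock mass using drilling data: How can a data-driven system handle limitations in existing rock mass classification systems?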

Unsupervised machine learning for data-driven classification of rock mass using drilling data: How can a data-driven system handle limitations in existing rock mass classification systems? ( http://arxiv.org/abs/2405.02631v1 )

T. F. Hansen, A. Aarset

Rock mass classification systems are crucial for assessing stability and risk in underground construction globally and guiding support and excavation design. However, systems developed primarily in the 1970s lack access to modern high-resolution data and advanced statistical techniques, limiting their effectiveness as decision-support systems. Initially, we outline the limitations observed in this context and later describe how a data-driven system, based on drilling data as detailed in this study, can overcome these limitations. Using extracted statistical information from thousands of MWD-data values in one-meter sections of a full tunnel profile, thus working as a signature of the rock mass, we have demonstrated that it is possible to form well-defined clusters that can act as a foundational basis for various rock mass classification systems. We reduced the dimensionality of 48-value vectors using nonlinear manifold learning techniques (UMAP) and linear principal component analysis (PCA) to enhance clustering. Unsupervised machine learning methods (HDBSCAN, Agglomerative Clustering, K-means) were employed to cluster the data, with hyperparameters optimised through multi-objective Bayesian optimisation for effective clustering. Using domain knowledge, we experienced improved clustering and system tuning opportunities in adding extra features to core clusters of MWD-data. We structured and correlated these clusters with physical rock mass properties, including labels of rock type and rock quality, and analysed cumulative distributions of key MWD-parameters for rock mass assessment to determine if clusters meaningfully differentiate rock masses. The ability of MWD data to form distinct rock mass clusters suggests substantial potential for future classification systems grounded in this objective, data-driven methodology, free from human bias.
# 等角予測を用いたディープラーニングモデルのオンボード校正検出

Onboard Out-of-Calibration Detection of Deep Learning Models using Conformal Prediction ( http://arxiv.org/abs/2405.02634v1 )

Protim Bhattacharjee, Peter Jung, (参考訳) 深層学習モデルのブラックボックスの性質は、リモートセンシングのような重要なアプリケーションでの使用を複雑にしている。 コンフォーマル予測は、そのようなシナリオに対する信頼を保証する方法である。 データ交換性に対して、共形予測は、ユーザ定義エラー率内に真のクラスを含むことが保証される予測セットの形式で、有限サンプルカバレッジ保証を提供する。 本稿では,共形予測アルゴリズムが深層学習モデルの不確実性と関連していることを示すとともに,この関係を深部学習モデルが校正外であるかどうかを検出するために利用できることを示す。 Resnet50、Densenet161、InceptionV3、MobileNetV2といった一般的な分類モデルは、EuroSATのようなリモートセンシングデータセットに適用され、ノイズの多いシナリオ下でモデル出力が不信になることを示す。 さらに、モデル不確かさと共形予測セットの平均サイズに関連する校正外検出手順を示す。

The black-box nature of deep learning models complicates their usage in critical applications such as remote sensing. Conformal prediction is a method to ensure trust in such scenarios. Subject to data exchangeability, conformal prediction provides finite-sample coverage guarantees in the form of a prediction set that is guaranteed to contain the true class within a user-defined error rate. In this letter, we show that conformal prediction algorithms are related to the uncertainty of the deep learning model and that this relation can be used to detect if the deep learning model is out-of-calibration. Popular classification models like Resnet50, Densenet161, InceptionV3, and MobileNetV2 are applied to remote sensing datasets such as EuroSAT to demonstrate how, under noisy scenarios, the model outputs become untrustworthy. Furthermore, an out-of-calibration detection procedure relating the model uncertainty and the average size of the conformal prediction set is presented.
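
As a rough illustration of the prediction sets mentioned above, the following is a minimal sketch of split conformal prediction for classification. It assumes the simple nonconformity score `1 - p(true class)`; the letter may use a different score, and the function names and toy numbers are hypothetical. How the average set size reacts to noise depends on the score function chosen.

```python
import math

def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of the calibration scores."""
    n = len(scores)
    # Rank of the order statistic used by split conformal prediction.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: every class must be kept.
        return float("inf")
    return sorted(scores)[k - 1]

def prediction_set(probs, qhat):
    """Keep every class whose score 1 - p does not exceed the threshold qhat."""
    return [c for c, p in enumerate(probs) if 1.0 - p <= qhat]

def average_set_size(prob_rows, qhat):
    """Mean size of the prediction sets over a batch of softmax outputs."""
    return sum(len(prediction_set(p, qhat)) for p in prob_rows) / len(prob_rows)

# Toy calibration scores: 1 - probability assigned to the true class.
cal_scores = [0.1, 0.3, 0.5, 0.7, 0.2, 0.6, 0.8]
qhat = conformal_quantile(cal_scores, alpha=0.25)

confident = [0.90, 0.06, 0.04]   # one dominant class
ambiguous = [0.45, 0.40, 0.15]   # two plausible classes
mean_size = average_set_size([confident, ambiguous], qhat)
```

Tracking `mean_size` on incoming batches is one simple way to obtain the kind of set-size statistic the abstract relates to model uncertainty.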
# TREC iKAT 2023:対話型知識アシスタントの評価のためのテストコレクション

TREC iKAT 2023: A Test Collection for Evaluating Conversational and Interactive Knowledge Assistants ( http://arxiv.org/abs/2405.02637v1 )

Mohammad Aliannejadi, Zahra Abbasiantaeb, Shubham Chatterjee, Jeffery Dalton, Leif Azzopardi, (参考訳) 近年,Large Language Models (LLMs) の開発によって会話情報探索が急速に発展し,ユーザ要求に対する自然な解釈と応答の基盤となっている。 TREC Interactive Knowledge Assistance Track (iKAT) は、研究者が会話検索エージェント(Conversational Search Agents, CSA)をテストおよび評価できるようにすることを目的としている。 このコレクションには、20のトピックにまたがる36のパーソナライズされた対話が含まれており、それぞれにパーソナライズされたユーザペルソナを定義するPersonal Text Knowledge Base (PTKB)が組み合わされている。 約26,000の通路を持つ344の旋回は、関連性の評価、および4つの重要な次元(妥当性、完全性、基底性、自然性)で生成された応答に関する追加評価として提供される。 このコレクションは、CSAに対して、多様な個人的コンテキストを効率的にナビゲートし、関連するペルソナ情報を提供し、関連する会話にコンテキストを活用するよう求めている。 PTKBの統合と決定探索タスクの強調は、このテストコレクションの独特性に寄与し、対話型および対話型知識アシスタントの研究を進めるための重要なベンチマークとなる。

Conversational information seeking has evolved rapidly in the last few years with the development of Large Language Models (LLMs), providing the basis for interpreting and responding in a naturalistic manner to user requests. The extended TREC Interactive Knowledge Assistance Track (iKAT) collection aims to enable researchers to test and evaluate their Conversational Search Agents (CSAs). The collection contains a set of 36 personalized dialogues over 20 different topics, each coupled with a Personal Text Knowledge Base (PTKB) that defines the bespoke user personas. A total of 344 turns with approximately 26,000 passages are provided as assessments on relevance, as well as additional assessments on generated responses over four key dimensions: relevance, completeness, groundedness, and naturalness. The collection challenges CSAs to efficiently navigate diverse personal contexts, elicit pertinent persona information, and employ context for relevant conversations. The integration of a PTKB and the emphasis on decisional search tasks contribute to the uniqueness of this test collection, making it an essential benchmark for advancing research in conversational and interactive knowledge assistants.
# PrivSGP-VR: 密接な実用性境界を持つ差分自家変量誘導確率勾配プッシュ

PrivSGP-VR: Differentially Private Variance-Reduced Stochastic Gradient Push with Tight Utility Bounds ( http://arxiv.org/abs/2405.02638v1 )

Zehan Zhu, Yan Huang, Xin Wang, Jinming Xu, (参考訳) 本稿では,各ノードに対して,確率的勾配プッシュと分散化を併用し,各ノードに対して$(\epsilon, \delta)$-differential privacy (DP)を保証できる差分プライベートな分散学習手法(PrivSGP-VR)を提案する。 我々の理論的分析は、DPガウス雑音の下では、PrivSGP-VRが$\mathcal{O}(1/\sqrt{nK})$のサブ線形収束速度を達成していることを示している。 モーメント会計手法を活用することで、分散環境での特定のプライバシー予算の下でモデルユーティリティを最大化するために、最適な$K$を導出する。 この最適化された$K$で、PrivSGP-VR は$\mathcal{O}\left( \sqrt{d\log \left( \frac{1}{\delta} \right)}/(\sqrt{n}J\epsilon) \right)$, where $J$ と $d$ はそれぞれ、ローカルサンプルの数と決定変数の次元である。 大規模な実験は、特に最適化された$K$で最適化されたユーティリティの観点から、完全に分散された環境で、我々の理論的な知見を裏付ける。

In this paper, we propose a differentially private decentralized learning method (termed PrivSGP-VR) which employs stochastic gradient push with variance reduction and guarantees $(\epsilon, \delta)$-differential privacy (DP) for each node. Our theoretical analysis shows that, under DP Gaussian noise with constant variance, PrivSGP-VR achieves a sub-linear convergence rate of $\mathcal{O}(1/\sqrt{nK})$, where $n$ and $K$ are the number of nodes and iterations, respectively, which is independent of stochastic gradient variance, and achieves a linear speedup with respect to $n$. Leveraging the moments accountant method, we further derive an optimal $K$ to maximize the model utility under a certain privacy budget in decentralized settings. With this optimized $K$, PrivSGP-VR achieves a tight utility bound of $\mathcal{O}\left( \sqrt{d\log \left( \frac{1}{\delta} \right)}/(\sqrt{n}J\epsilon) \right)$, where $J$ and $d$ are the number of local samples and the dimension of the decision variable, respectively, which matches that of the server-client distributed counterparts, and exhibits an extra factor of $1/\sqrt{n}$ improvement compared to that of the existing decentralized counterparts, such as A(DP)$^2$SGD. Extensive experiments corroborate our theoretical findings, especially in terms of the maximized utility with optimized $K$, in fully decentralized settings.
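
The "push" in stochastic gradient push refers to push-sum averaging over a directed graph. The sketch below shows only that communication primitive, under simplifying assumptions: no gradient steps, no DP noise, and no variance reduction. The graph and function names are hypothetical and are not taken from the paper.

```python
def push_sum(values, out_neighbors, rounds):
    """Push-sum averaging: each node splits its (x, w) pair equally among its
    out-neighbours (including itself); the ratio x / w tends to the average."""
    n = len(values)
    x = [float(v) for v in values]
    w = [1.0] * n
    for _ in range(rounds):
        new_x = [0.0] * n
        new_w = [0.0] * n
        for i in range(n):
            targets = out_neighbors[i]
            share = 1.0 / len(targets)  # column-stochastic mixing weights
            for j in targets:
                new_x[j] += share * x[i]
                new_w[j] += share * w[i]
        x, w = new_x, new_w
    # De-biased estimate held by each node.
    return [xi / wi for xi, wi in zip(x, w)]

# A small strongly connected directed graph with self-loops (illustrative only).
graph = {0: [0, 1, 2], 1: [1, 2], 2: [2, 0]}
estimates = push_sum([3.0, 6.0, 9.0], graph, rounds=60)
```

Dividing by the weight `w` is what corrects the bias introduced by unequal out-degrees; without it, nodes with more in-flow would overweight their own view of the average.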
# 宇宙における機械学習: 搭載MLモデルの放射能に対するロバスト性の調査

Machine Learning in Space: Surveying the Robustness of on-board ML models to Radiation ( http://arxiv.org/abs/2405.02642v1 )

Kevin Lange, Federico Fontana, Francesco Rossi, Mattia Varile, Giovanni Apruzzese, (参考訳) 現代の宇宙船はますます機械学習(ML)に依存している。 しかし、宇宙の物理機器は、放射線などの様々な自然の危険にさらされており、コンピュータ装置の正しい操作を阻害する可能性がある。 自然に誘発される欠陥がML関連ハードウェアに損傷をもたらすことを示す証拠は数多くあるが、宇宙用途のMLモデルに対する放射の影響は十分に研究されていない。 これは問題であり、これらの自然現象によってMLモデルがどのように影響を受けるかを理解していないため、放射耐性MLソフトウェアを開発する上で「どこから始めるか」は不確実である。 ML研究者として、私たちはこのジレンマに取り組みます。 機械学習を専門とするスペースインダストリー実践者と組むことで,最先端技術に関するリフレクティブな分析を行う。 本研究は, 宇宙船用MLモデルに対する自然災害の影響について, 先行研究が徹底的に検証しなかった事実を提示する。 そして、"負の結果"を通して、いくつかの既存のオープンソース技術は、衛星におけるMLのいくつかの応用に対する放射の影響を研究するために、研究者によってはほとんど利用できないことを示す。 建設的なステップとして、我々は現在のフレームワークを活用して、放射誘発断層に対するクラウド検出のための実用的なMLモデルのロバスト性を評価するための簡単な実験を行った。 我々の評価は、すべての欠点が、いくつかの先行研究で主張されているような破壊的なものではないことを明らかにしている。 私たちのリソースを一般公開することで、宇宙耐性MLモデルの開発を先導するために、研究者が宇宙船にアクセスできる足場を提供しています。

Modern spacecraft are increasingly relying on machine learning (ML). However, physical equipment in space is subject to various natural hazards, such as radiation, which may inhibit the correct operation of computing devices. Despite plenty of evidence showing the damage that naturally-induced faults can cause to ML-related hardware, we observe that the effects of radiation on ML models for space applications are not well-studied. This is a problem: without understanding how ML models are affected by these natural phenomena, it is uncertain "where to start from" to develop radiation-tolerant ML software. As ML researchers, we attempt to tackle this dilemma. By partnering up with space-industry practitioners specialized in ML, we perform a reflective analysis of the state of the art. We provide factual evidence that prior work did not thoroughly examine the impact of natural hazards on ML models meant for spacecraft. Then, through a "negative result", we show that some existing open-source technologies can hardly be used by researchers to study the effects of radiation for some applications of ML in satellites. As a constructive step forward, we perform simple experiments showcasing how to leverage current frameworks to assess the robustness of practical ML models for cloud detection against radiation-induced faults. Our evaluation reveals that not all faults are as devastating as claimed by some prior work. By publicly releasing our resources, we provide a foothold -- usable by researchers without access to spacecraft -- for spearheading development of space-tolerant ML models.
# 解釈可能なマルチビュークラスタリング

Interpretable Multi-View Clustering ( http://arxiv.org/abs/2405.02644v1 )

Mudi Jiang, Lianyu Hu, Zengyou He, Zhikui Chen, (参考訳) マルチビュークラスタリングは重要な研究領域となり、クラスタリングの精度を高めるために過去数十年にわたって多くの手法が提案されてきた。 しかし、現実世界の多くのアプリケーションでは、なぜサンプルが特定のクラスタに割り当てられているのかを説明しながら、明確な意思決定プロセスを明確に示すことが不可欠である。 その結果,マルチビューデータをクラスタリングするための解釈可能な手法の開発には,依然として大きなギャップが残っている。 この重要なギャップを埋めるために、我々は、解釈可能なマルチビュークラスタリングフレームワークを導入することで、この方向への第一歩を踏み出します。 提案手法は,各ビューから埋め込み特徴を抽出して擬似ラベルを生成し,決定木の初期構築を誘導することから始める。 その後、解釈可能な決定木を書き換えると共に、各ビューのフィーチャ表現を反復的に最適化する。 実データを用いた実験結果から,本手法は多視点データに対して透過的なクラスタリングプロセスを提供するだけでなく,最先端のマルチビュークラスタリング手法に匹敵する性能を提供することが示された。 我々の知る限りでは、これは多視点データに特化した解釈可能なクラスタリングフレームワークを設計する最初の試みであり、この分野に新たな道を開く。

Multi-view clustering has become a significant area of research, with numerous methods proposed over the past decades to enhance clustering accuracy. However, in many real-world applications, it is crucial to demonstrate a clear decision-making process, specifically by explaining why samples are assigned to particular clusters. Consequently, there remains a notable gap in developing interpretable methods for clustering multi-view data. To fill this crucial gap, we make the first attempt towards this direction by introducing an interpretable multi-view clustering framework. Our method begins by extracting embedded features from each view and generating pseudo-labels to guide the initial construction of the decision tree. Subsequently, it iteratively optimizes the feature representation for each view while refining the interpretable decision tree. Experimental results on real datasets demonstrate that our method not only provides a transparent clustering process for multi-view data but also delivers performance comparable to state-of-the-art multi-view clustering methods. To the best of our knowledge, this is the first effort to design an interpretable clustering framework specifically for multi-view data, opening a new avenue in this field.
# Windowsのマルウェア検知器の更新:敵のexEmplesに対するロバストさと回帰のバランス

Updating Windows Malware Detectors: Balancing Robustness and Regression against Adversarial EXEmples ( http://arxiv.org/abs/2405.02646v1 )

Matous Kozak, Luca Demetrio, Dmitrijs Trizna, Fabio Roli, (参考訳) Adversarial EXEmplesは、機械学習のWindowsマルウェア検出を回避すべく、慎重に調整されたプログラムで、検出効率に対処可能な堅牢なモデルの開発に取り組んでいる。 しかしながら、堅牢なモデルがEXEmplの大多数を予防し、時間とともに予測能力を維持することができても、モデルはより新しい脅威に微調整され、部分的な更新やスクラッチからの時間的再トレーニングに繋がる。 したがって、たとえ攻撃に対する堅牢性が高くても、新しいモデルは、以前正しく検出された脅威を誤分類することで、性能の低下を被る可能性がある。 これらの理由から,Windows のマルウェア検知器を更新する際の精度とレグレッションのトレードオフについて検討し,既存の検出器にチェーン可能なプラグイン EXE-Scanner を提案する。 従来提案されていた硬化技術が,非破壊モデル更新時の精度の低下に悩まされていることを実証的に示す。 一方,EXE-Scannerは精度の低下のない頑健なモデルに匹敵する性能を示し,ベース分類器の後に適切にチェーンして,コストのかかる再学習を必要とせずに最高の性能を得る方法を示す。 再現性を高めるために、我々は、最先端の摂動アルゴリズムに基づく逆EXEmplesのデータセットとともに、ソースコードをオープンにリリースする。

Adversarial EXEmples are carefully-perturbed programs tailored to evade machine learning Windows malware detectors, with an on-going effort in developing robust models able to address detection effectiveness. However, even if robust models can prevent the majority of EXEmples, to maintain predictive power over time, models are fine-tuned to newer threats, leading either to partial updates or time-consuming retraining from scratch. Thus, even if the robustness against attacks is higher, the new models might suffer a regression in performance by misclassifying threats that were previously correctly detected. For these reasons, we study the trade-off between accuracy and regression when updating Windows malware detectors, by proposing EXE-scanner, a plugin that can be chained to existing detectors to promptly stop EXEmples without causing regression. We empirically show that previously-proposed hardening techniques suffer a regression of accuracy when updating non-robust models. On the contrary, we show that EXE-scanner exhibits comparable performance to robust models without regression of accuracy, and we show how to properly chain it after the base classifier to obtain the best performance without the need of costly retraining. To foster reproducibility, we openly release source code, along with the dataset of adversarial EXEmples based on state-of-the-art perturbation algorithms.
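
The notion of regression used above (threats that were detected before an update but are missed afterwards) can be made concrete with a small metric. This is a minimal sketch that assumes regression is counted as "negative flips" over all samples; the paper's exact definition may differ, and the function name and toy labels are hypothetical.

```python
def regression_report(y_true, old_pred, new_pred):
    """Compare two detector versions on the same labelled samples."""
    n = len(y_true)
    if n == 0 or not (n == len(old_pred) == len(new_pred)):
        raise ValueError("inputs must be non-empty and of equal length")
    old_correct = [o == t for o, t in zip(old_pred, y_true)]
    new_correct = [p == t for p, t in zip(new_pred, y_true)]
    # A negative flip: the old model was right and the updated model is wrong.
    negative_flips = sum(o and not p for o, p in zip(old_correct, new_correct))
    return {
        "old_accuracy": sum(old_correct) / n,
        "new_accuracy": sum(new_correct) / n,
        "negative_flip_rate": negative_flips / n,
    }

# 1 = malware, 0 = benign (toy labels).
y_true   = [1, 1, 1, 1, 0, 0, 0, 0]
old_pred = [1, 1, 0, 0, 0, 0, 0, 0]
new_pred = [1, 0, 1, 1, 0, 0, 0, 0]
report = regression_report(y_true, old_pred, new_pred)
```

In this toy case the updated model is more accurate overall yet still loses one previously detected sample, which is the trade-off the abstract describes.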
# ラベルノイズに対するロバストな等角予測スコア

A Conformal Prediction Score that is Robust to Label Noise ( http://arxiv.org/abs/2405.02648v1 )

Coby Penso, Jacob Goldberger, (参考訳) コンフォーマル予測(CP)は、このセット内に正しいクラスが存在するという事前定義された確率を持つ小さな予測セットを構築することで、ネットワークの不確実性を定量化する。 本研究では,雑音ラベル付き検証セットに基づくCP校正問題に取り組む。 ラベルノイズに頑健なコンフォメーションスコアを導入する。 ノイズラベル付きデータとノイズレベルを用いて、ノイズフリーコンフォメーションスコアを推定する。 テストフェーズでは、ノイズフリースコアを使用して予測セットを形成する。 提案アルゴリズムをいくつかの標準医用画像分類データセットに適用した。 提案手法は,必要なカバレッジを維持しつつ,予測セットの平均サイズの観点から,現在の手法よりも大きなマージンで優れていることを示す。

Conformal Prediction (CP) quantifies network uncertainty by building a small prediction set with a pre-defined probability that the correct class is within this set. In this study we tackle the problem of CP calibration based on a validation set with noisy labels. We introduce a conformal score that is robust to label noise. The noise-free conformal score is estimated using the noisy labeled data and the noise level. In the test phase the noise-free score is used to form the prediction set. We applied the proposed algorithm to several standard medical imaging classification datasets. We show that our method outperforms current methods by a large margin, in terms of the average size of the prediction set, while maintaining the required coverage.
# ネットワークトラフィック解析のための汎用マルチモーダル表現学習

Generic Multi-modal Representation Learning for Network Traffic Analysis ( http://arxiv.org/abs/2405.02649v1 )

Luca Gioacchini, Idilio Drago, Marco Mellia, Zied Ben Houidi, Dario Rossi, (参考訳) ネットワークトラフィック分析は、ネットワーク管理、トラブルシューティング、セキュリティに不可欠である。 トラフィック分類、異常検出、新規発見などのタスクは、ネットワークデータや計測から運用情報を抽出する上で基本となる。 我々は、ディープパケット検査と基本的な機械学習から、研究者が特定の問題ごとに設計されたカスタムDLアーキテクチャを定義しテストするディープラーニング(DL)アプローチへの移行を目撃する。 ここでは、異なるトラフィック分析タスクを解くのに十分な柔軟性を持つ汎用DLアーキテクチャの必要性を提唱する。 本稿では、汎用データ適応モジュールに基づくDLアーキテクチャを提案し、次いで抽出した情報をコンパクトでリッチな中間表現(埋め込み)に要約する統合モジュールを提案する。 その結果、柔軟なマルチモーダルオートエンコーダ(MAE)パイプラインが実現し、さまざまなユースケースを解決できる。 このアーキテクチャを交通分類(TC)タスクで示すのは、その結果を最先端のソリューションと定量的に比較できるからである。 しかし、MAEアーキテクチャは汎用的であり、複数のシナリオで有用な表現の学習に使用できると論じる。 TCでは、MAEは、面倒な機能エンジニアリングを避けながら、代替よりも同等かそれ以上の性能を発揮し、トラフィック分析におけるDLソリューションの採用を合理化しています。

Network traffic analysis is fundamental for network management, troubleshooting, and security. Tasks such as traffic classification, anomaly detection, and novelty discovery are fundamental for extracting operational information from network data and measurements. We witness the shift from deep packet inspection and basic machine learning to Deep Learning (DL) approaches where researchers define and test a custom DL architecture designed for each specific problem. We here advocate the need for a general DL architecture flexible enough to solve different traffic analysis tasks. We test this idea by proposing a DL architecture based on generic data adaptation modules, followed by an integration module that summarises the extracted information into a compact and rich intermediate representation (i.e., embeddings). The result is a flexible Multi-modal Autoencoder (MAE) pipeline that can solve different use cases. We demonstrate the architecture with traffic classification (TC) tasks since they allow us to quantitatively compare results with state-of-the-art solutions. However, we argue that the MAE architecture is generic and can be used to learn representations useful in multiple scenarios. On TC, the MAE performs on par with or better than alternatives while avoiding cumbersome feature engineering, thus streamlining the adoption of DL solutions for traffic analysis.
# トピックモデリングを用いたホロコースト証言における物語パターンとアウトリーチの同定

Identifying Narrative Patterns and Outliers in Holocaust Testimonies Using Topic Modeling ( http://arxiv.org/abs/2405.02650v1 )

Maxim Ifergan, Renana Keydar, Omri Abend, Amit Pinchevski, (参考訳) ホロコーストの生き残り証言の膨大なコレクションは、貴重な歴史的洞察を提示するが、手動による分析に挑戦する。 本稿では,USC Shoah Foundation Holocaust 証言コーパスを探索するために,高度自然言語処理(NLP)技術を活用する。 質問文を構造化された質問文として扱うことにより、主要なテーマを特定するためにトピックモデリングを適用する。 言語モデリング技術の最近の進歩を生かしたBERTopicを実験する。 証言セクションを固定部分に整列し、証言のコーパスにまたがるトピックの進化を明らかにする。 これは、一般的な物語スキーマと、年齢と性別に基づくサブグループ間の相違の両方を強調している。 本稿では,他のグループに類似した非典型的話題分布を示すグループ内の証言を識別する新しい手法を提案する。 本研究はホロコーストの生存者の複雑な物語に独特な洞察を与え、NLPの歴史的言説を照らし、生き残り体験における潜在的な逸脱を特定する力を示す。

The vast collection of Holocaust survivor testimonies presents invaluable historical insights but poses challenges for manual analysis. This paper leverages advanced Natural Language Processing (NLP) techniques to explore the USC Shoah Foundation Holocaust testimony corpus. By treating testimonies as structured question-and-answer sections, we apply topic modeling to identify key themes. We experiment with BERTopic, which leverages recent advances in language modeling technology. We align testimony sections into fixed parts, revealing the evolution of topics across the corpus of testimonies. This highlights both a common narrative schema and divergences between subgroups based on age and gender. We introduce a novel method to identify testimonies within groups that exhibit atypical topic distributions resembling those of other groups. This study offers unique insights into the complex narratives of Holocaust survivors, demonstrating the power of NLP to illuminate historical discourse and identify potential deviations in survivor experiences.
# 圧縮ビデオにおける遠隔心拍推定のための深部パルス信号拡大法

Deep Pulse-Signal Magnification for remote Heart Rate Estimation in Compressed Videos ( http://arxiv.org/abs/2405.02652v1 )

Joaquim Comas, Adria Ruiz, Federico Sukno, (参考訳) データ駆動型アプローチによる遠隔心拍測定(rPPG)の最近の進歩は、精度を著しく向上させた。 しかし、ビデオ圧縮のようないくつかの課題は依然として残っており、高度に圧縮されたビデオからrPPG信号を回復することは特に複雑である。 いくつかの研究は、ビデオ圧縮の難しさと影響を強調しているが、効果的な解決策は限られている。 本稿では,ビデオ圧縮がrPPG推定に与える影響に対処する新しい手法を提案する。この手法はパルス信号の倍率変換を利用して圧縮された動画をrPPG信号が拡大される非圧縮データ領域に適応させる。 UCLA-rPPG と UBFC-rPPG の2つの公開データセットに対して,複数の圧縮速度でデータベース内性能とデータベース間性能を両立させることにより,本モデルの有効性を検証した。 さらに,MAHNOB-HCI と COHFACE の2つの高圧縮・広帯域化データセットに対して,本手法のロバスト性を評価し,心拍数推定結果の顕著さを明らかにした。

Recent advancements in remote heart rate measurement (rPPG), motivated by data-driven approaches, have significantly improved accuracy. However, certain challenges, such as video compression, still remain: recovering the rPPG signal from highly compressed videos is particularly complex. Although several studies have highlighted the difficulties and impact of video compression for this, effective solutions remain limited. In this paper, we present a novel approach to address the impact of video compression on rPPG estimation, which leverages a pulse-signal magnification transformation to adapt compressed videos to an uncompressed data domain in which the rPPG signal is magnified. We validate the effectiveness of our model by exhaustive evaluations on two publicly available datasets, UCLA-rPPG and UBFC-rPPG, employing both intra- and cross-database performance at several compression rates. Additionally, we assess the robustness of our approach on two additional highly compressed and widely-used datasets, MAHNOB-HCI and COHFACE, which reveal outstanding heart rate estimation results.
# 信念進化ネットワークによる異方性カノニカル分解

Isopignistic Canonical Decomposition via Belief Evolution Network ( http://arxiv.org/abs/2405.02653v1 )

Qianli Zhou, Tianxiang Zhan, Yong Deng, (参考訳) 不確実な環境での汎用情報処理モデルの開発は、説明可能な人工知能の発展に不可欠である。 デンプスター・シェーファーのエビデンスの理論は、主観的確率論と可能性理論と密接に関連しているてんかんの不確実性を表現するためのよく知られた効果的な推論方法である。 特定の信念構造の下では相互に変換できるが、情報処理の統一的なアプローチと同様に、明確かつ解釈可能な変換プロセスが欠如している。 本稿では,同義的信念関数と超注意的伝達可能信念モデルの観点から,これらの課題に対処することを目的とする。 まず,信念進化ネットワークに基づく同義変換を提案する。 この変換は、潜在的な決定結果を保ちながら、情報グラニュラーの調整を可能にする。 等比変換は、新しい正準分解を確立するために、超注意的な伝達可能な信念モデルと統合される。 この分解は、可能性分布とその同型質量関数の間の逆経路を与える。 正準分解の結果は等比関数と呼ばれ、BPAの正当性と相対コミットメント度を反映した同一の情報量分布である。 さらに,同義性関数を調整して基本信念の割り当てを再構築する手法を提案する。 過注意な伝達可能な信念モデルにおける不確実性のモデリングと処理におけるこのアプローチの利点を探求する。 より一般に、確率論、デンプスター・シェーファー理論、可能性理論に基づく人工知能の一般モデルを構築するための理論的基盤を確立する。

Developing a general information processing model in uncertain environments is fundamental for the advancement of explainable artificial intelligence. Dempster-Shafer theory of evidence is a well-known and effective reasoning method for representing epistemic uncertainty, which is closely related to subjective probability theory and possibility theory. Although they can be transformed into each other under some particular belief structures, there remains a lack of a clear and interpretable transformation process, as well as a unified approach for information processing. In this paper, we aim to address these issues from the perspectives of isopignistic belief functions and the hyper-cautious transferable belief model. Firstly, we propose an isopignistic transformation based on the belief evolution network. This transformation allows for the adjustment of the information granule while retaining the potential decision outcome. The isopignistic transformation is integrated with a hyper-cautious transferable belief model to establish a new canonical decomposition. This decomposition offers a reverse path between the possibility distribution and its isopignistic mass functions. The result of the canonical decomposition, called the isopignistic function, is an identical information content distribution to reflect the propensity and relative commitment degree of the BPA. Furthermore, this paper introduces a method to reconstruct the basic belief assignment by adjusting the isopignistic function. It explores the advantages of this approach in modeling and handling uncertainty within the hyper-cautious transferable belief model. More generally, this paper establishes a theoretical basis for building general models of artificial intelligence based on probability theory, Dempster-Shafer theory, and possibility theory.
# 複数エージェント強化学習における選択的相互作用と長期経験による協調の強化

Enhancing Cooperation through Selective Interaction and Long-term Experiences in Multi-Agent Reinforcement Learning ( http://arxiv.org/abs/2405.02654v1 )

Tianyu Ren, Xiao-Jun Zeng, (参考訳) 社会的ジレンマにおけるグループ協力の促進におけるネットワーク構造の重要性は広く認識されている。 以前の研究では、このファシリテーションは空間的相互作用によって引き起こされる戦略の体系化に起因している。 強化学習は、動的相互作用が協調の進化に与える影響を調べるために用いられているが、エージェントが隣り合う選択行動をどのように発達するか、そして明示的な相互作用構造の中で戦略的な配置を形成するかについての理解の欠如が依然として残っている。 そこで本研究では,空間的囚人のジレンマゲームにおけるマルチエージェント強化学習に基づく計算フレームワークを提案する。 この枠組みにより、エージェントは、事前に設定された社会的規範や外部インセンティブに依存する既存の研究とは異なる、長期の経験に基づいてジレンマ戦略を選択し、近隣住民と対話することができる。 2つの異なるQ-ネットを用いて各エージェントをモデル化することにより、協調と相互作用の共進化ダイナミクスを解き放つ。 その結果, 長期経験により, 非協力的隣人を識別し, 協力的隣人との交流を優先できる可能性が示唆された。 この創発的な自己組織化行動は、同様の戦略でエージェントのクラスタ化を招き、ネットワークの相互性を高め、グループ協力を強化する。

The significance of network structures in promoting group cooperation within social dilemmas has been widely recognized. Prior studies attribute this facilitation to the assortment of strategies driven by spatial interactions. Although reinforcement learning has been employed to investigate the impact of dynamic interaction on the evolution of cooperation, there remains a lack of understanding about how agents develop neighbour selection behaviours and the formation of strategic assortment within an explicit interaction structure. To address this, our study introduces a computational framework based on multi-agent reinforcement learning in the spatial Prisoner's Dilemma game. This framework allows agents to select dilemma strategies and interacting neighbours based on their long-term experiences, differing from existing research that relies on preset social norms or external incentives. By modelling each agent using two distinct Q-networks, we disentangle the coevolutionary dynamics between cooperation and interaction. The results indicate that long-term experience enables agents to develop the ability to identify non-cooperative neighbours and exhibit a preference for interaction with cooperative ones. This emergent self-organizing behaviour leads to the clustering of agents with similar strategies, thereby increasing network reciprocity and enhancing group cooperation.
# R4: Reinforced Retriever-Reorder-Responder for Retrieval-Augmented Large Language Models

R4: Reinforced Retriever-Reorder-Responder for Retrieval-Augmented Large Language Models ( http://arxiv.org/abs/2405.02659v1 )

Taolin Zhang, Dongyang Li, Qizhou Chen, Chengyu Wang, Longtao Huang, Hui Xue, Xiaofeng He, Jun Huang, (参考訳) Retrieval-augmented large language model (LLMs) は、情報検索システムによって検索された関連コンテンツを利用して正しい応答を生成し、幻覚の問題を緩和することを目的としている。 しかし、既存のレトリバー・サプライヤ法では、検索した文書とLLM間の微細な構造的意味論の相互作用を考慮せずに、テキスト生成タスクを実行するために、関連文書をLLMのプロンプトに付加するのが一般的である。 この問題は、長い文書で拡張された入力プロンプトを扱う場合、LSMは"中間にあるロース"の傾向があるため、正確な応答生成には特に重要である。 本研究では,検索拡張LDMの文書順序を学習するための"Reinforced Retriever-Reorder-Responder" (R$^4$) という新しいパイプラインを提案する。 再順序学習プロセスは、生成した応答の質に応じて、文書順序調整と文書表現強調という2つのステップに分けられる。 具体的には、検索した文書注文を、グラフ注意学習に基づいて、開始、中、終了位置に整理することを目的としており、応答品質の強化報酬を最大化する。 文書表現の強化は、文書レベルの勾配対向学習を通じて、品質の悪い応答に対する検索された文書の表現をさらに洗練する。 大規模な実験により,提案したパイプラインは,様々な公開データセットの強いベースラインと比較して,知識集約的なタスクに対して,現実的な質問応答性能が向上することが示された。 ソースコードとトレーニングされたモデルは、論文の受理時にリリースされる。

Retrieval-augmented large language models (LLMs) leverage relevant content retrieved by information retrieval systems to generate correct responses, aiming to alleviate the hallucination problem. However, existing retriever-responder methods typically append relevant documents to the prompt of LLMs to perform text generation tasks without considering the interaction of fine-grained structural semantics between the retrieved documents and the LLMs. This issue is particularly important for accurate response generation as LLMs tend to "lose in the middle" when dealing with input prompts augmented with lengthy documents. In this work, we propose a new pipeline named "Reinforced Retriever-Reorder-Responder" (R$^4$) to learn document orderings for retrieval-augmented LLMs, thereby further enhancing their generation abilities while the large numbers of parameters of LLMs remain frozen. The reordering learning process is divided into two steps according to the quality of the generated responses: document order adjustment and document representation enhancement. Specifically, document order adjustment aims to organize retrieved document orderings into beginning, middle, and end positions based on graph attention learning, which maximizes the reinforced reward of response quality. Document representation enhancement further refines the representations of retrieved documents for responses of poor quality via document-level gradient adversarial learning. Extensive experiments demonstrate that our proposed pipeline achieves better factual question-answering performance on knowledge-intensive tasks compared to strong baselines across various public datasets. The source code and trained models will be released upon paper acceptance.
# DDE-Find:データから遅延微分方程式を学習する

DDE-Find: Learning Delay Differential Equations from Data ( http://arxiv.org/abs/2405.02661v1 )

Robert Stephany, (参考訳) 遅延微分方程式(Delay Differential Equations, DDE)は、様々な科学的現象をモデル化できる微分方程式のクラスである。 しかし、DDEの予測を実験結果と一致させるパラメータ、特に遅延時間を特定することは困難である。 DDEのパラメータ、時間遅延、初期条件関数を学習するためのデータ駆動フレームワークであるDDE-Findを紹介する。 DDE-Findは、モデルパラメータに対する損失関数の勾配を効率的に計算するために、随伴型アプローチを用いる。 我々は,隣接体を用いて損失の勾配の表現を動機付け,厳密に証明する。 DDE-Findは、データからDDEを学ぶための最近の発展の上に構築され、データからDDEを学ぶための最初の完全なフレームワークを提供する。 数値実験を通じて,DDE-Findはノイズの多い限られたデータからDDEを学習できることを実証した。

Delay Differential Equations (DDEs) are a class of differential equations that can model diverse scientific phenomena. However, identifying the parameters, especially the time delay, that make a DDE's predictions match experimental results can be challenging. We introduce DDE-Find, a data-driven framework for learning a DDE's parameters, time delay, and initial condition function. DDE-Find uses an adjoint-based approach to efficiently compute the gradient of a loss function with respect to the model parameters. We motivate and rigorously prove an expression for the gradients of the loss using the adjoint. DDE-Find builds upon recent developments in learning DDEs from data and delivers the first complete framework for learning DDEs from data. Through a series of numerical experiments, we demonstrate that DDE-Find can learn DDEs from noisy, limited data.
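
To make the role of the time delay concrete, here is a minimal sketch of the forward problem only: a forward-Euler solver for the toy DDE x'(t) = -theta * x(t - tau), plus a naive grid search over candidate delays. This is not the adjoint-based gradient method described above; the model, the function names, and the candidate grid are illustrative assumptions.

```python
def simulate_dde(theta, tau, history, t_end, dt):
    """Forward-Euler solve of x'(t) = -theta * x(t - tau),
    with x(t) = history(t) for t <= 0. Assumes tau is a multiple of dt."""
    n_delay = round(tau / dt)   # delay measured in whole time steps
    n_steps = round(t_end / dt)
    # xs[i] stores x at time (i - n_delay) * dt, so the history fills the first slots.
    xs = [history((i - n_delay) * dt) for i in range(n_delay + 1)]
    for _ in range(n_steps):
        delayed = xs[-1 - n_delay]              # x(t - tau)
        xs.append(xs[-1] + dt * (-theta * delayed))
    return xs[n_delay:]                         # samples at t = 0, dt, ..., t_end

def mse(a, b):
    """Mean squared error between two equally sampled trajectories."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)

def constant_history(t):
    return 1.0

# Synthetic "measurements" generated with a known delay of 1.0.
observed = simulate_dde(1.0, 1.0, constant_history, t_end=5.0, dt=0.1)

# Naive recovery of the delay by scanning a few candidates.
candidates = [0.5, 1.0, 1.5]
losses = {tau: mse(simulate_dde(1.0, tau, constant_history, 5.0, 0.1), observed)
          for tau in candidates}
best_tau = min(losses, key=losses.get)
```

A grid search scales poorly once several parameters and the initial condition function are unknown, which is where a gradient of the loss with respect to those quantities becomes useful.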
# MedPromptExtract(医療データ抽出ツール):NLPとプロンプトエンジニアリングを用いた匿名化と階層自動データ抽出

MedPromptExtract (Medical Data Extraction Tool): Anonymization and Hi-fidelity Automated data extraction using NLP and prompt engineering ( http://arxiv.org/abs/2405.02664v1 )

Roomani Srivastava, Suraj Prasad, Lipika Bhat, Sarvesh Deshpande, Barnali Das, Kshitij Jadhav, (参考訳) 医療記録のシームレスなデジタル化における大きな障害は、既存の記録との相互運用性の欠如である。 さらなる治療計画や研究に必要な関連する医療情報を抽出することは、医師の非常に貴重な時間を含む労働集約的なタスクに費やす時間である。 本稿では, 半教師付き学習, 大規模言語モデル, 自然言語処理を併用した自動ツールMedPromptExtractについて述べる。

A major roadblock in the seamless digitization of medical records remains the lack of interoperability of existing records. Extracting the relevant medical information required for further treatment planning or research is a time-consuming, labour-intensive task that takes up the valuable time of doctors. In this demo paper, we present MedPromptExtract, an automated tool that uses a combination of semi-supervised learning, large language models, natural language processing, and prompt engineering to convert unstructured medical records into structured data that is amenable to further analysis.
# ユーザレベルにおけるメトリック差分プライバシー

Metric Differential Privacy at the User-Level ( http://arxiv.org/abs/2405.02665v1 )

Jacob Imola, Amrita Roy Chowdhury, Kamalika Chaudhuri, (参考訳) メートル差プライバシー(DP)は、入力のペア間の距離に基づいて不均一なプライバシー保証を提供する。 多くのアプリケーション(ロケーションデータなど)の自然なプライバシセマンティクスをキャプチャし、結果として標準DPよりも便利になるため、プライバシの概念は広く普及している。 しかしながら、メトリックDPにおける以前の作業は主に、すべてのユーザが単一のデータ項目のみをレポートする、\textit{item-level}設定に重点を置いていた。 より現実的な設定は、ユーザが複数のアイテムをコントリビュートし、ユーザの\textit{entire}コントリビューションの粒度でプライバシを求める、ユーザレベルのDPである。 本稿では,ユーザレベルでのメートル法DPの研究を開始する。 具体的には、ユーザのデータの変化の大きさと空間的側面の両方をキャプチャするプライバシーの概念を得るために、アースモーバー距離($d_\textsf{EM}$)を使っています。 主な技術貢献は3つある。 まず、線形クエリとアイテムワイズクエリに応答する2つの新しいメカニズムを$d_\textsf{EM}$-DPで設計する。 具体的には、後者の分析は、独立した関心を持つかもしれないシャッフル結果によるプライバシー増幅の一般化を伴う。 第2に、新しいサンプリングベース機構により、非有界な一般から有界な$d_\textsf{EM}$-DP(データセットのサイズは固定され公開されている)へのブラックボックスの削減を提供する。 第3に,提案手法は,特定の種類の線形クエリや周波数推定に対して,ユーザレベルのDPよりも有効性を向上できることを示す。

Metric differential privacy (DP) provides heterogeneous privacy guarantees based on a distance between the pair of inputs. It is a widely popular notion of privacy since it captures the natural privacy semantics for many applications (such as location data) and results in better utility than standard DP. However, prior work in metric DP has primarily focused on the \textit{item-level} setting where every user only reports a single data item. A more realistic setting is that of user-level DP where each user contributes multiple items and privacy is then desired at the granularity of the user's \textit{entire} contribution. In this paper, we initiate the study of metric DP at the user-level. Specifically, we use the earth-mover's distance ($d_\textsf{EM}$) as our metric to obtain a notion of privacy as it captures both the magnitude and spatial aspects of changes in a user's data. We make three main technical contributions. First, we design two novel mechanisms under $d_\textsf{EM}$-DP to answer linear queries and item-wise queries. Specifically, our analysis for the latter involves a generalization of the privacy amplification by shuffling result, which may be of independent interest. Second, we provide a black-box reduction from the general unbounded to bounded $d_\textsf{EM}$-DP (size of the dataset is fixed and public) with a novel sampling-based mechanism. Third, we show that our proposed mechanisms can provably provide improved utility over user-level DP, for certain types of linear queries and frequency estimation.
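
The remark that the earth-mover's distance captures both magnitude and spatial aspects can be seen in one dimension, where the distance reduces to a running sum of carried mass. The sketch below assumes two histograms of equal total mass over the same unit-spaced bins; it is a textbook special case and not the paper's mechanism, and the function name is hypothetical.

```python
def earth_movers_distance_1d(p, q):
    """Earth-mover's distance between two histograms with equal total mass
    defined over the same unit-spaced bins."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    if abs(sum(p) - sum(q)) > 1e-9:
        raise ValueError("histograms must have the same total mass")
    carried = 0.0   # mass that still has to be moved past the current bin
    total = 0.0
    for a, b in zip(p, q):
        carried += a - b
        total += abs(carried)
    return total

# A user's single item moves by one bin versus by two bins.
near = earth_movers_distance_1d([1, 0, 0], [0, 1, 0])
far = earth_movers_distance_1d([1, 0, 0], [0, 0, 1])
```

Both changes alter exactly one item, so an item-count distance would treat them alike, whereas the earth-mover's distance charges more for the longer move.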
# 一般化解析から状態空間モデルへの最適化設計へ

From Generalization Analysis to Optimization Designs for State Space Models ( http://arxiv.org/abs/2405.02670v1 )

Fusheng Liu, Qianxiao Li, (参考訳) 状態空間モデル(英: State Space Model, SSM)は、時系列解析における基礎モデルであり、最近、シーケンシャルモデリングにおけるトランスフォーマーの代替として示されている。 本稿では,SSMの一般化を理論的に研究し,一般化結果に基づく学習アルゴリズムの改良を提案する。 具体的には、SSM に対して \textit{data-dependent} の一般化を与え、SSM パラメータとトレーニングシーケンスの時間的依存との間の相互作用を示す。 一般化バウンダリを利用して,(1)提案した一般化尺度に基づいてモデル初期化のスケーリングルールを設定し,SSMの出力値スケールのロバスト性を大幅に向上させるとともに,SSMをトレーニングするための新たな正規化手法を導入し,一般化性能を向上させる。 結果を検証するために, 数値計算を行った。

A State Space Model (SSM) is a foundation model in time series analysis, which has recently been shown to be an alternative to transformers in sequence modeling. In this paper, we theoretically study the generalization of SSMs and propose improvements to training algorithms based on the generalization results. Specifically, we give a \textit{data-dependent} generalization bound for SSMs, showing an interplay between the SSM parameters and the temporal dependencies of the training sequences. Leveraging the generalization bound, we (1) set up a scaling rule for model initialization based on the proposed generalization measure, which significantly improves the robustness of the output value scales on SSMs to different temporal patterns in the sequence data; (2) introduce a new regularization method for training SSMs to enhance the generalization performance. Numerical experiments are conducted to validate our results.
# 非自己回帰翻訳における情報冗長性について

On the Information Redundancy in Non-Autoregressive Translation ( http://arxiv.org/abs/2405.02673v1 )

Zhihao Wang, Longyue Wang, Jinsong Su, Junfeng Yao, Zhaopeng Tu, (参考訳) トークン反復は、完全非自己回帰翻訳(NAT)におけるマルチモーダル問題の典型的な形式である。 本研究では,最近提案されたNATモデルにおけるマルチモーダル問題を再考する。 本研究は,従来の測定基準である連続繰り返し比では測定できない,他の種類の情報冗長性誤差を導入したことを明らかにする。 NAT出力を手動でアノテートすることにより,複数モーダリティ問題によく対応した2種類の情報冗長性誤差を同定する。 人間のアノテーションは時間がかかり、労力がかかるため、2種類の冗長なエラーを評価するための自動メトリクスを提案する。 我々のメトリクスは、将来の研究で新しい手法を評価し、それらの効果をより包括的に理解することを可能にする。

Token repetition is a typical form of multi-modal problem in fully non-autoregressive translation (NAT). In this work, we revisit the multi-modal problem in recently proposed NAT models. Our study reveals that these advanced models have introduced other types of information redundancy errors, which cannot be measured by the conventional metric - the continuous repetition ratio. By manually annotating the NAT outputs, we identify two types of information redundancy errors that correspond well to lexical and reordering multi-modality problems. Since human annotation is time-consuming and labor-intensive, we propose automatic metrics to evaluate the two types of redundant errors. Our metrics allow future studies to evaluate new methods and gain a more comprehensive understanding of their effectiveness.
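
The abstract does not define the continuous repetition ratio, so the sketch below assumes a common reading: the fraction of output tokens that are identical to the token immediately before them. The function name and the choice of denominator (total token count) are assumptions of this illustration.

```python
def continuous_repetition_ratio(tokens):
    """Fraction of tokens that repeat the token immediately before them.

    The denominator is the total number of tokens (an assumed convention).
    """
    if len(tokens) < 2:
        return 0.0
    repeats = sum(1 for prev, cur in zip(tokens, tokens[1:]) if prev == cur)
    return repeats / len(tokens)

adjacent = "we we thank you you .".split()
scattered = "thank you we thank you .".split()
print(continuous_repetition_ratio(adjacent))
print(continuous_repetition_ratio(scattered))
```

The second output repeats "thank you" without any two adjacent tokens being equal, so a metric of this kind scores it as clean. This is the sort of redundancy the abstract says the conventional metric cannot measure.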
# Quranic Audio Dataset:非アラビア話者からのクラウドソーシングとラベリング

Quranic Audio Dataset: Crowdsourced and Labeled Recitation from Non-Arabic Speakers ( http://arxiv.org/abs/2405.02675v1 )

Raghad Salameh, Mohamad Al Mdfaa, Nursultan Askarbekuly, Manuel Mazzara, (参考訳) 本稿では、非アラビア語話者がクルアーンの朗誦を学ぶ際の課題を扱う。 我々は、慎重に注釈付けされたQuranicデータセットをクラウドソーシングして、学習プロセスを単純化するためにAIモデルを構築できる可能性を探る。 特に,ボランティアベースのクラウドソーシングのジャンルを用いて,オーディオ資産収集のためのクラウドソーシングAPIを実装している。 私たちはこのAPIを、NamazAppという既存のモバイルアプリに統合し、朗誦音声を収集した。 収集したオーディオ資産に注釈をつけるために,Quran Voiceというクラウドソーシングプラットフォームを開発した。 その結果、11カ国以上の非アラビア語圏の1287人の参加者から約7000件のクルアーン朗誦を収集し、このデータセットから1166件の朗誦を6つのカテゴリで注釈付けした。 クラウド精度0.77、アノテータ間の一致度0.63、アルゴリズムが割り当てたラベルと専門家の判断との一致度0.89を達成した。

This paper addresses the challenge of learning to recite the Quran for non-Arabic speakers. We explore the possibility of crowdsourcing a carefully annotated Quranic dataset, on top of which AI models can be built to simplify the learning process. In particular, we use the volunteer-based crowdsourcing genre and implement a crowdsourcing API to gather audio assets. We integrated the API into an existing mobile application called NamazApp to collect audio recitations. We developed a crowdsourcing platform called Quran Voice for annotating the gathered audio assets. As a result, we have collected around 7000 Quranic recitations from a pool of 1287 participants across more than 11 non-Arabic countries, and we have annotated 1166 recitations from the dataset in six categories. We have achieved a crowd accuracy of 0.77, an inter-rater agreement of 0.63 between the annotators, and 0.89 between the labels assigned by the algorithm and the expert judgments.
# ハンドオブジェクトインタラクションコントローラ(HOIC):Deep Reinforcement Learning for Reconstructing Interactions with Physics

Hand-Object Interaction Controller (HOIC): Deep Reinforcement Learning for Reconstructing Interactions with Physics ( http://arxiv.org/abs/2405.02676v1 )

Haoyu Hu, Xinyu Yi, Zhe Cao, Jun-Hai Yong, Feng Xu, (参考訳) 手による物体操作は、日々の活動において重要な相互作用運動である。 我々は、物理を活用する新しい深層強化学習手法により、この動きを1台のRGBDカメラで忠実に再構成する。 まず、ネットワークトレーニングをより安定させるために、直接オブジェクト制御を確立するオブジェクト補償制御を提案する。 一方、補償力とトルクを利用して、簡単な点接触モデルをより物理的に妥当な面接触モデルにシームレスにアップグレードし、復元精度と物理的正しさをさらに向上する。 実験は、ヒューリスティックな物理規則を一切含まないまま、この研究が、深層強化学習では模倣し難い複雑な動きである手-物体の相互作用の再構築に物理学をうまく関与させることを示唆している。 私たちのコードとデータはhttps://github.com/hu-hy17/HOICで公開されています。

Manipulating objects by hand is an important interaction motion in our daily activities. We faithfully reconstruct this motion with a single RGBD camera using a novel deep reinforcement learning method that leverages physics. First, we propose object compensation control, which establishes direct object control to make the network training more stable. Meanwhile, by leveraging the compensation force and torque, we seamlessly upgrade the simple point contact model to a more physically plausible surface contact model, further improving the reconstruction accuracy and physical correctness. Experiments indicate that, without involving any heuristic physical rules, this work still successfully involves physics in the reconstruction of hand-object interactions, which are complex motions that are hard to imitate with deep reinforcement learning. Our code and data are available at https://github.com/hu-hy17/HOIC.
# 計算抽出ナラティブマップによるメディアフレームの符号化能力の評価

Evaluating the Ability of Computationally Extracted Narrative Maps to Encode Media Framing ( http://arxiv.org/abs/2405.02677v1 )

Sebastián Concha Macías, Brian Keith Norambuena, (参考訳) ナラティブは世界を理解する上での基本的な枠組みとして機能し、コラボレーティブなセンスメイキングにおいて重要な役割を担い、センスメイキングのための汎用的な基盤を提供する。 フラーミングは微妙だが強力なメカニズムであり、特定の単語の選択を通じて大衆の認識に影響を与え、報道されたニュースイベントの解釈を形成する。 物語とフレーミングの重要性が認識されているにもかかわらず、計算の抽出と表現の文脈におけるフレーミングの明確な考慮に関して、文献に顕著なギャップが存在する。 本稿では、ニュースデータからフレーミング情報を取得するための、特定の物語抽出と表現アプローチ(物語マップ)の能力について考察する。 1)ナラティブ抽出法はデータセットのフレーミング分布を捉えるか? (2) 一貫性のあるフレーミングを持つ表現を生成するか? この結果から,アルゴリズムはフレーミング分布を捕捉する一方で,様々な開始・終了イベントを一貫したフレーミングを実現することが課題となっている。 本結果は,ニュース物語における複雑なフレーミングのダイナミクスをユーザに提供するナラティブマップの可能性を強調した。 しかし、計算的物語抽出プロセスにおいて、フレーミング情報を直接活用することは、未解決の課題である。

Narratives serve as fundamental frameworks in our understanding of the world and play a crucial role in collaborative sensemaking, providing a versatile foundation for sensemaking. Framing is a subtle yet potent mechanism that influences public perception through specific word choices, shaping interpretations of reported news events. Despite the recognized importance of narratives and framing, a significant gap exists in the literature with regard to the explicit consideration of framing within the context of computational extraction and representation. This article explores the capabilities of a specific narrative extraction and representation approach -- narrative maps -- to capture framing information from news data. The research addresses two key questions: (1) Does the narrative extraction method capture the framing distribution of the data set? (2) Does it produce a representation with consistent framing? Our results indicate that while the algorithm captures framing distributions, achieving consistent framing across various starting and ending events poses challenges. Our results highlight the potential of narrative maps to provide users with insights into the intricate framing dynamics within news narratives. However, we note that directly leveraging framing information in the computational narrative extraction process remains an open challenge.
# ポジションペーパー:Quo Vadis, Unsupervised Time Series Anomaly Detection?

Position Paper: Quo Vadis, Unsupervised Time Series Anomaly Detection? ( http://arxiv.org/abs/2405.02678v1 )

M. Saquib Sarfraz, Mei-Yen Chen, Lukas Layer, Kunyu Peng, Marios Koulakis, (参考訳) Timeseries Anomaly Detection (TAD)における機械学習奨学金の現在の状況は、欠陥のある評価指標の使用、一貫性のないベンチマークプラクティス、新しいディープラーニングベースのモデル設計における選択に対する適切な正当化の欠如に悩まされている。 本稿は,TADにおける現状を批判的に分析し,現在の研究の誤解を招き,問題となる方法や評価の実践を明らかにする。 我々の立場は、モデル設計の新規性のみを追求することから、ベンチマークプラクティスの改善、非自明なデータセットの作成、特定のタスクに対するモデルアーキテクチャの有用性の研究に重点を置いている。 その結果,厳密な評価プロトコルの必要性,単純なベースラインの作成,および最先端の深部異常検出モデルが線形写像を効果的に学習できることが示唆された。 これらの結果から, 簡便かつ解釈可能なTAD法のさらなる探索と開発の必要性が示唆された。 最先端のディープラーニングベースのモデルにおけるモデルの複雑さの増加は、残念ながら、ほとんど改善しない。 この分野を前進させるための洞察と提案を提供する。

The current state of machine learning scholarship in Timeseries Anomaly Detection (TAD) is plagued by the persistent use of flawed evaluation metrics, inconsistent benchmarking practices, and a lack of proper justification for the choices made in novel deep learning-based model designs. Our paper presents a critical analysis of the status quo in TAD, revealing the misleading track of current research and highlighting problematic methods and evaluation practices. Our position advocates for a shift in focus from pursuing only the novelty in model design to improving benchmarking practices, creating non-trivial datasets, and placing renewed emphasis on studying the utility of model architectures for specific tasks. Our findings demonstrate the need for rigorous evaluation protocols, the creation of simple baselines, and the revelation that state-of-the-art deep anomaly detection models effectively learn linear mappings. These findings suggest the need for more exploration and development of simple and interpretable TAD methods. The increase in model complexity in the state-of-the-art deep learning-based models unfortunately offers very little improvement. We offer insights and suggestions for the field to move forward.
# 人工知能に基づく天気予報:一つの革命がもう一つの革命を隠しているかもしれない

Prévisions météorologiques basées sur l'intelligence artificielle : une révolution peut en cacher une autre ( http://arxiv.org/abs/2405.02679v1 )

Zied Ben-Bouallegue, Mariana C A Clare, Matthieu Chevallier, (参考訳) 高品質なリアナリシスデータセットを用いたディープラーニングアルゴリズムに基づく人工知能(AI)は、天気予報に大きな可能性を示している。 この文脈において、ヨーロッパ中期予報センター(ECMWF)は、AIに基づく新しい予測システムを開発している。 現在、決定論的予測の検証結果は有望である。 しかし、AIに基づく天気予報の現実性はしばしば疑問視される。 ここでは、異なる種類のリアリズムを特定し、特に、構造的リアリズムと気象事象の予測可能性の関係について論じる。 さらに、AIに基づく決定論的予測の統計的分析は、確率論的アプローチが解決の助けとなるべきリアリズム/パフォーマンスのジレンマを示している。

Artificial intelligence (AI), based on deep-learning algorithms using high-quality reanalysis datasets, is showing enormous potential for weather forecasting. In this context, the European Centre for Medium-Range Weather Forecasts (ECMWF) is developing a new forecasting system based on AI. Verification results for deterministic forecasts are so far promising. However, the realism of weather forecasts based on AI is often questioned. Here, different types of realism are identified and we discuss, in particular, the relationship between structural realism and predictability of weather events. Furthermore, a statistical analysis of deterministic forecasts based on AI points to a realism/performance dilemma that a probabilistic approach should help to solve. -- L'intelligence artificielle (IA) bouleverse aujourd'hui le monde de la prévision météorologique avec l'utilisation d'algorithmes d'apprentissage profond nourris par des champs de réanalyses. Dans ce contexte, le Centre Européen pour les Prévisions Météorologiques à Moyen Terme (CEPMMT) a décidé de développer un nouveau système de prévisions reposant sur l'IA. Ces prévisions, pour le moment de type déterministe, montrent des résultats prometteurs. Toutefois, le réalisme de ce type de prévisions reposant sur l'IA est souvent questionné. Ici, nous identifions différents types de réalisme et interrogeons notamment le rapport entre réalisme structurel et prévisibilité des événements météorologiques. Une analyse statistique de prévisions déterministes reposant sur l'IA laisse apparaitre un dilemme réalisme/performance qu'une approche probabiliste devrait aider à résoudre.
# 位相空間における量子多体系の位相図のナビゲート

Navigating the phase diagram of quantum many-body systems in phase space ( http://arxiv.org/abs/2405.02680v1 )

Khadija El Hawary, Mohamed Azzouz, Morad El Baz, Sebastian Deffner, Bartłomiej Gardas, Zakaria Mzaouali, (参考訳) 我々は、スピン$-(\frac{1}{2}\!-\!\frac{1}{2})$およびスピン$-(\frac{1}{2}\!-\!1)$ Ising-Heisenberg 鎖の位相図を探索する上で、ウィグナー関数、特にその正および負の部分が持つ独自の能力を示す。 位相境界の検出におけるエンタングルメント・コンカレンスと比較して、位相空間アプローチの利点と限界を強調した。 位相空間における等角スライス近似は位相図の本質的特徴を捉える効果的な方法であるが、同次スピン$-(\frac{1}{2}\!-\!\frac{1}{2})$ Ising-Heisenberg 鎖に対するウィグナー関数の負性を正確に評価するには不十分である。 対照的に、不均一スピン$-(\frac{1}{2}\!-\!1)$鎖では、系の位相図を正確に捉えるために位相空間全体にわたる積分が必要であることがわかった。 この区別は、検討中の量子系の均一性に対する位相空間法の感度を浮き彫りにする。

We demonstrate the unique capabilities of the Wigner function, particularly in its positive and negative parts, for exploring the phase diagram of the spin$-(\frac{1}{2}\!-\!\frac{1}{2})$ and spin$-(\frac{1}{2}\!-\!1)$ Ising-Heisenberg chains. We highlight the advantages and limitations of the phase space approach in comparison with the entanglement concurrence in detecting phase boundaries. We establish that the equal angle slice approximation in the phase space is an effective method for capturing the essential features of the phase diagram, but falls short in accurately assessing the negativity of the Wigner function for the homogeneous spin$-(\frac{1}{2}\!-\!\frac{1}{2})$ Ising-Heisenberg chain. In contrast, we find for the inhomogeneous spin$-(\frac{1}{2}\!-\!1)$ chain that an integral over the entire phase space is necessary to accurately capture the phase diagram of the system. This distinction underscores the sensitivity of phase space methods to the homogeneity of the quantum system under consideration.
# FedProK: 原型的特徴的知識伝達による信頼に値するフェデレーションクラスインクリメンタルラーニング

FedProK: Trustworthy Federated Class-Incremental Learning via Prototypical Feature Knowledge Transfer ( http://arxiv.org/abs/2405.02685v1 )

Xin Gao, Xin Yang, Hao Yu, Yan Kang, Tianrui Li, (参考訳) フェデレーション・クラス・インクリメンタル・ラーニング(FCIL)は、動的フェデレーション・ラーニング(FL)における新しいクラスを学ぶために、以前の知識を継続的に移行することに焦点を当てている。 しかし,既存手法では,FCILの信頼性,すなわち連続性,プライバシ,効率性を同時に向上させることを考慮していない。 この問題に対処するため,FedProK(Federated Prototypeal Feature Knowledge Transfer)を提案する。 具体的には,(1)学習クラスからの時間的知識伝達によるクライアント側の特徴翻訳手順と,(2)クライアント間の空間的知識伝達によるサーバ側のプロトタイプ的知識融合である。 同期と非同期の両方で実施された大規模な実験により、FedProKは3つの信頼性の観点から他の最先端手法よりも優れており、空間的時間的知識を選択的に伝達する効果が検証された。

Federated Class-Incremental Learning (FCIL) focuses on continually transferring the previous knowledge to learn new classes in dynamic Federated Learning (FL). However, existing methods do not consider the trustworthiness of FCIL, i.e., improving continual utility, privacy, and efficiency simultaneously, which is greatly influenced by catastrophic forgetting and data heterogeneity among clients. To address this issue, we propose FedProK (Federated Prototypical Feature Knowledge Transfer), leveraging prototypical features as a novel representation of knowledge to perform spatial-temporal knowledge transfer. Specifically, FedProK consists of two components: (1) a feature translation procedure on the client side by temporal knowledge transfer from the learned classes and (2) prototypical knowledge fusion on the server side by spatial knowledge transfer among clients. Extensive experiments conducted in both synchronous and asynchronous settings demonstrate that our FedProK outperforms the other state-of-the-art methods from three perspectives of trustworthiness, validating its effectiveness in selectively transferring spatial-temporal knowledge.
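
As a rough illustration of what a prototypical feature can look like, the sketch below computes one prototype per class as the mean feature vector on each client, then merges the clients' prototypes on the server with a sample-count-weighted average. The function names and the plain weighted average are assumptions made for this note; the abstract does not spell out FedProK's actual fusion rule, which is described as selective.

```python
from collections import defaultdict

def class_prototypes(features, labels):
    """Return {label: (mean feature vector, sample count)} for one client."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {c: ([s / counts[c] for s in sums[c]], counts[c]) for c in sums}

def fuse_prototypes(client_protos):
    """Count-weighted average of per-client prototypes (assumed fusion rule)."""
    sums = {}
    counts = defaultdict(int)
    for protos in client_protos:
        for c, (mean, n) in protos.items():
            if c not in sums:
                sums[c] = [0.0] * len(mean)
            sums[c] = [s + n * m for s, m in zip(sums[c], mean)]
            counts[c] += n
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

# Client 1 has only seen "cat"; client 2 has seen "cat" and a newer class "dog".
client_1 = class_prototypes([[0.0, 0.0], [2.0, 2.0]], ["cat", "cat"])
client_2 = class_prototypes([[4.0, 4.0], [1.0, 0.0]], ["cat", "dog"])
global_protos = fuse_prototypes([client_1, client_2])
print(global_protos)
```

Only the per-class means and counts leave the client in this sketch, not the raw samples, which is one reason prototypes are attractive when privacy and communication cost both matter.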
# 自然画像を用いた2次元視覚変換器による3次元ニューロン分割の促進

Boosting 3D Neuron Segmentation with 2D Vision Transformer Pre-trained on Natural Images ( http://arxiv.org/abs/2405.02686v1 )

Yik San Cheng, Runkai Zhao, Heng Wang, Hanchuan Peng, Weidong Cai, (参考訳) 神経科学の基本課題の1つであるニューロン再構成は、三次元光顕微鏡画像データから神経形態を再構築する。 神経系における神経細胞の構造と機能の関係を分析する上で重要な役割を担っている。 しかし、ニューロンデータセットの不足と高品質なSWCアノテーションのため、単一ニューロン再構成のための堅牢なセグメンテーション手法を開発することは依然として困難である。 この制限に対処するため、我々は、複雑なニューロン構造を学習する際のセグメンテーションモデルを支援するために、膨大な自然画像データからコンセンサス知識を抽出することを目的としている。 具体的には,大規模な自然画像に事前学習した2次元ビジョントランスフォーマーモデルを利用して,2次元から3次元の重み移動戦略でトランスフォーマーに基づく3次元ニューロンセグメンテーションモデルを初期化する,新たなトレーニングパラダイムを提案する。 本手法は, 豊富な自然と希少なニューロン画像領域間の知識共有接続を構築し, データ効率で3次元ニューロンセグメンテーション能力を向上させる。 一般的なベンチマークであるBigNeuronを用いて評価し、トレーニングサンプルと同じ量でスクラッチからトレーニングしたモデルに対して、ニューロンセグメンテーション性能を8.71%向上させる。

Neuron reconstruction, one of the fundamental tasks in neuroscience, rebuilds neuronal morphology from 3D light microscope imaging data. It plays a critical role in analyzing the structure-function relationship of neurons in the nervous system. However, due to the scarcity of neuron datasets and high-quality SWC annotations, it is still challenging to develop robust segmentation methods for single neuron reconstruction. To address this limitation, we aim to distill the consensus knowledge from massive natural image data to aid the segmentation model in learning the complex neuron structures. Specifically, in this work, we propose a novel training paradigm that leverages a 2D Vision Transformer model pre-trained on large-scale natural images to initialize our Transformer-based 3D neuron segmentation model with a tailored 2D-to-3D weight transferring strategy. Our method builds a knowledge sharing connection between the abundant natural and the scarce neuron image domains to improve the 3D neuron segmentation ability in a data-efficient manner. Evaluated on a popular benchmark, BigNeuron, our method enhances neuron segmentation performance by 8.71% over the model trained from scratch with the same amount of training samples.
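
The abstract does not say how the tailored 2D-to-3D weight transfer works. One widely used baseline for reusing 2D weights in a 3D model is kernel inflation: copy the 2D kernel along the new depth axis and divide by the depth, so that a volume that is constant along depth produces the same response as the original 2D kernel on one slice. The sketch below shows that baseline only; the function names are hypothetical and it should not be read as the paper's strategy.

```python
def inflate_kernel(kernel_2d, depth):
    """Replicate a 2D kernel `depth` times along a new leading axis, scaled by 1/depth."""
    if depth < 1:
        raise ValueError("depth must be positive")
    return [[[w / depth for w in row] for row in kernel_2d] for _ in range(depth)]

def response(kernel, patch):
    """Sum of element-wise products over nested lists of matching shape."""
    if isinstance(kernel, list):
        return sum(response(k, p) for k, p in zip(kernel, patch))
    return kernel * patch

kernel_2d = [[1.0, 2.0], [3.0, 4.0]]
patch_2d = [[1.0, 0.0], [0.0, 1.0]]

kernel_3d = inflate_kernel(kernel_2d, depth=4)
patch_3d = [patch_2d] * 4  # the same slice repeated along depth

print(response(kernel_2d, patch_2d), response(kernel_3d, patch_3d))
```

Scaling by `1/depth` keeps activation magnitudes at initialization close to what the pre-trained 2D layers expect, which is the usual motivation for this kind of transfer.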
# 低ランクテンソル表現を用いた半教師付き対称行列分解

Semi-supervised Symmetric Matrix Factorization with Low-Rank Tensor Representation ( http://arxiv.org/abs/2405.02688v1 )

Yuheng Jia, Jia-Nan Li, Wenhui Wu, Ran Wang, (参考訳) 半教師付き対称非負行列分解(SNMF)は、SNMFのクラスタリング能力を改善するために利用可能な監督情報(通常はペアワイズ制約の形で)を利用する。 従来の手法では、局所的な視点からペアワイズ制約(英語版)を導入しており、すなわち、類似性行列を直接的に洗練するか、ペアワイズ制約に従って分解されたベクトルの距離を制限している。 そこで本論文では, 組込み行列の積とその変換によって得られる類似性行列と, 対の制約行列によって合成されたテンソルの低ランク表現を求めることで, この2つの行列をグローバルな視点から同時に強化し, 半教師付きSNMFモデルを提案する。 次に、拡張SNMFモデルを提案し、埋め込み行列を上記のテンソル低ランク表現に適合させる。 最後に、強化されたペアワイズ制約により類似性行列を洗練する。 上記のステップを繰り返して、類似性行列とペアの制約行列を連続的に強化し、高品質な埋め込み行列をもたらす。 大規模な実験は、我々の方法の優越性を裏付けるものである。 コードはhttps://github.com/JinaLeejnl/TSNMFで公開されている。

Semi-supervised symmetric non-negative matrix factorization (SNMF) utilizes the available supervisory information (usually in the form of pairwise constraints) to improve the clustering ability of SNMF. The previous methods introduce the pairwise constraints from the local perspective, i.e., they either directly refine the similarity matrix element-wise or restrain the distance of the decomposed vectors in pairs according to the pairwise constraints. This overlooks the global perspective: in the ideal case, the pairwise constraint matrix and the ideal similarity matrix possess the same low-rank structure. To this end, we first propose a novel semi-supervised SNMF model by seeking low-rank representation for the tensor synthesized by the pairwise constraint matrix and a similarity matrix obtained by the product of the embedding matrix and its transpose, which could strengthen those two matrices simultaneously from a global perspective. We then propose an enhanced SNMF model, making the embedding matrix tailored to the above tensor low-rank representation. We finally refine the similarity matrix by the strengthened pairwise constraints. We repeat the above steps to continuously boost the similarity matrix and pairwise constraint matrix, leading to a high-quality embedding matrix. Extensive experiments substantiate the superiority of our method. The code is available at https://github.com/JinaLeejnl/TSNMF.
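
For reference, the standard unsupervised SNMF objective that this line of work starts from can be written as below. The symbols $A$ (the $n \times n$ similarity matrix), $H$ (the non-negative embedding matrix), and $k$ (the number of clusters) are notation chosen for this note; the tensor low-rank terms of the proposed model are not reproduced here because the abstract does not state them.

```latex
\min_{H \in \mathbb{R}^{n \times k},\; H \ge 0} \; \left\| A - H H^{\top} \right\|_F^2
```

In this formulation the product $H H^{\top}$ plays the role of the "similarity matrix obtained by the product of the embedding matrix and its transpose" mentioned above, and sample $i$ is typically assigned to the cluster with the largest entry in row $i$ of $H$.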
# Diffeomorphic Transformer-based Abdomen MRI-CT Deformable Image Registration

Diffeomorphic Transformer-based Abdomen MRI-CT Deformable Image Registration ( http://arxiv.org/abs/2405.02692v1 )

Yang Lei, Luke A. Matkovic, Justin Roper, Tonghe Wang, Jun Zhou, Beth Ghavidel, Mark McDonald, Pretesh Patel, Xiaofeng Yang, (参考訳) 本稿では,腹部MRI-CT画像を直接登録するための変形ベクトル場(DVF)を推定できるディープラーニングフレームワークの構築を目的とする。 提案手法は微分同相変形を仮定する。 確率微分同相登録モデルから抽出した位相保存変形特徴を用いて,DVF推定に腹部の動きを正確に求めることができる。 モデルでは,変形特徴抽出のための畳み込みニューラルネットワーク(CNN)に,運動追跡の優れた性能を示すスウィン変換器を組み込んだ。 モデルでは,画像類似性損失と表面整合損失を用いて最適化した。 画像損失を計算するために、変形したMRI画像とCT画像の間にモダリティ非依存の近傍記述子(MIND)を用いた。 MRIおよびCT画像上で輪郭抽出された構造の表面の変形後座標間の距離を計測することにより表面整合損失を判定した。 対象登録誤差(TRE),Dice類似度係数(DSC),およびMRI画像の変形輪郭とCT画像の手動輪郭間の平均表面距離(MSD)を用いてCT画像に対して変形MRI画像の評価を行った。 剛体登録のみと比較して,提案手法によるDIRでは肝臓と門脈の平均DSC値が0.850と0.628から0.903と0.763に増加し,肝臓の平均MSDが7.216mmから3.232mmに減少し,TREが26.238mmから8.492mmに減少した。 微分同相変換器を用いた変形可能な画像登録法は,腹部MRI-CT画像対から正確なDVFを生成する有効な方法を提供する。 これは、現在の肝放射線治療のための治療計画ワークフローで利用することができる。

This paper aims to create a deep learning framework that can estimate the deformation vector field (DVF) for directly registering abdominal MRI-CT images. The proposed method assumed a diffeomorphic deformation. By using topology-preserved deformation features extracted from the probabilistic diffeomorphic registration model, abdominal motion can be accurately obtained and utilized for DVF estimation. The model integrated Swin transformers, which have demonstrated superior performance in motion tracking, into the convolutional neural network (CNN) for deformation feature extraction. The model was optimized using a cross-modality image similarity loss and a surface matching loss. To compute the image loss, a modality-independent neighborhood descriptor (MIND) was used between the deformed MRI and CT images. The surface matching loss was determined by measuring the distance between the warped coordinates of the surfaces of contoured structures on the MRI and CT images. The deformed MRI image was assessed against the CT image using the target registration error (TRE), Dice similarity coefficient (DSC), and mean surface distance (MSD) between the deformed contours of the MRI image and manual contours of the CT image. When compared to only rigid registration, DIR with the proposed method resulted in an increase of the mean DSC values of the liver and portal vein from 0.850 and 0.628 to 0.903 and 0.763, a decrease of the mean MSD of the liver from 7.216 mm to 3.232 mm, and a decrease of the TRE from 26.238 mm to 8.492 mm. The proposed deformable image registration method based on a diffeomorphic transformer provides an effective and efficient way to generate an accurate DVF from an MRI-CT image pair of the abdomen. It could be utilized in the current treatment planning workflow for liver radiotherapy.
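
Of the three evaluation metrics named above, the Dice similarity coefficient is the simplest to state: twice the overlap of two masks divided by the sum of their sizes. A minimal sketch follows, assuming binary masks flattened to 0/1 sequences; the function name and the empty-mask convention are choices made for this illustration.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient of two binary masks given as flat 0/1 sequences."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    size_a = sum(1 for a in mask_a if a)
    size_b = sum(1 for b in mask_b if b)
    if size_a + size_b == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    return 2.0 * overlap / (size_a + size_b)

deformed_contour = [1, 1, 1, 0, 0, 0]
manual_contour = [0, 1, 1, 1, 0, 0]
print(dice_coefficient(deformed_contour, manual_contour))
```

A DSC of 1 means the deformed and manual contours coincide exactly and 0 means they do not overlap at all, so the reported liver improvement from 0.850 to 0.903 indicates a tighter overlap after deformable registration.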
# DiffuseTrace: 潜時拡散モデルのための透明でフレキシブルな透かし方式

DiffuseTrace: A Transparent and Flexible Watermarking Scheme for Latent Diffusion Model ( http://arxiv.org/abs/2405.02696v1 )

Liangqi Lei, Keke Gai, Jing Yu, Liehuang Zhu, (参考訳) 潜在拡散モデル(LDM)は、幅広い応用が可能であるが、不正利用に関する倫理的懸念を提起し、生成モデル出力に透かしを付けることは、AI生成コンテンツに関連する著作権追跡や潜在的なリスク軽減に欠かせない手法である。 しかし、ホック後の透かし技術は回避の影響を受けやすい。 LDMの既存の透かし方式は固定メッセージのみを埋め込むことができる。 ウォーターマークメッセージの変更は、モデルの再トレーニングを必要とする。 ウォーターマークの安定性は、モデル更新とイテレーションの影響を受けます。 さらに, 変分オートエンコーダ(VAE)と拡散モデルを用いた現在の復元型透かし除去技術は, かなりの量の透かしを除去する能力を有する。 そこで我々はDiffuseTraceと呼ばれる新しい手法を提案する。 目標は、将来の検出を意味的に行うために、すべての生成された画像に見えない透かしを埋め込むことだ。 エンコーダ・デコーダモデルのトレーニングを通じて初期潜伏変数と透かし情報の統一表現を確立する。 透かし情報はエンコーダを介して初期潜伏変数に埋め込まれ、サンプリングプロセスに統合される。 拡散処理を反転させてデコーダを利用して透かし情報を抽出する。 DiffuseTraceは拡散モデルコンポーネントの微調整に依存しない。 透かしは画像の質を損なうことなく意味的に画像空間に埋め込まれる。 エンコーダデコーダは任意の拡散モデルにおけるプラグインとして利用することができる。 DiffuseTraceの有効性と柔軟性を実験により検証した。 DiffuseTraceは、変分オートエンコーダと拡散モデルに基づく最新の攻撃と戦う上で、前例のない優位性を持っている。

Latent Diffusion Models (LDMs) enable a wide range of applications but raise ethical concerns regarding illegal utilization. Adding watermarks to generative model outputs is a vital technique employed for copyright tracking and mitigating potential risks associated with AI-generated content. However, post-hoc watermarking techniques are susceptible to evasion. Existing watermarking methods for LDMs can only embed fixed messages. Watermark message alteration requires model retraining. The stability of the watermark is influenced by model updates and iterations. Furthermore, the current reconstruction-based watermark removal techniques utilizing variational autoencoders (VAE) and diffusion models have the capability to remove a significant portion of watermarks. Therefore, we propose a novel technique called DiffuseTrace. The goal is to semantically embed invisible watermarks in all generated images for future detection. The method establishes a unified representation of the initial latent variables and the watermark information through training an encoder-decoder model. The watermark information is embedded into the initial latent variables through the encoder and integrated into the sampling process. The watermark information is extracted by reversing the diffusion process and utilizing the decoder. DiffuseTrace does not rely on fine-tuning of the diffusion model components. The watermark is embedded into the image space semantically without compromising image quality. The encoder-decoder can be utilized as a plug-in in arbitrary diffusion models. We validate through experiments the effectiveness and flexibility of DiffuseTrace. DiffuseTrace holds an unprecedented advantage in combating the latest attacks based on variational autoencoders and diffusion models.
# 非断熱性誘導体結合による遷移に対するフェルミの黄金律速表現

Fermi's golden rule rate expression for transitions due to nonadiabatic derivative couplings in the adiabatic basis ( http://arxiv.org/abs/2405.02697v1 )

Seogjoo J. Jang, Young Min Rhee, (参考訳) 断熱電子状態と原子核の位置状態の基底で表されるコンパクトだが一般的な分子ハミルトニアンから始め、断熱電子状態間の非断熱微分結合(NDC)項を慎重に検討する。 フェルミの黄金律 (FGR) の教科書式におけるアディバティック電子状態に対して評価されたNDC項の従来の使用法は、異なる測地におけるアディバティック状態の非直交性を無視した追加近似を暗黙的に呼び出す。 そこで我々は, 断熱状態とNDC項を, 断熱状態の最小ポテンシャルエネルギー状態で明示的に用いた準断熱近似に基づいて, 明確に定義されたFGR速度式を導出した。 次に,すべての核自由度を調和振動子の集合としてモデル化する条件と近似を明らかにし,NDC項のモータによる非コンドン効果を明示的に考慮しながら閉形式FGR速度式を導出する。 結果のレート表現は、NDC項の二次的寄与とフランク・コンドンモードへの結合による項を含む。 原子核振動が鋭い高周波モードと広いオーミック浴のスペクトル密度の両方から構成される場合のモデル計算は、その速度表現の新たな特徴と意味を示唆している。

Starting from a compact but general molecular Hamiltonian expressed in the bases of adiabatic electronic states and position states of nuclei, we make careful consideration of nonadiabatic derivative coupling (NDC) terms between adiabatic states. It is clarified that the conventional use of NDC terms evaluated for an adiabatic electronic state in the textbook expression for Fermi's golden rule (FGR) rate implicitly invokes an additional approximation that ignores non-orthogonality of adiabatic states at different geometries. Thus, we derive a well-defined FGR rate expression based on a quasi-adiabatic approximation that explicitly uses the adiabatic states and NDC terms evaluated at the minimum potential energy state of the initial adiabatic states. We then clarify conditions and approximations leading to the modeling of all the nuclear degrees of freedom as a set of harmonic oscillators, and then derive a closed-form FGR rate expression while accounting for the non-Condon effects due to momenta in NDC terms explicitly. The resulting rate expression includes terms due to quadratic contribution of NDC terms and also their couplings to Franck-Condon modes. Model calculations for the case where nuclear vibrations consist of both a sharp high frequency mode and a broad Ohmic bath spectral density illustrate new features and implications of the rate expression.
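
For orientation, the generic textbook form of the FGR rate out of an initial state $|i\rangle$ into final states $|f\rangle$ under a coupling $\hat{V}$ is given below. This is the standard expression only, not the quasi-adiabatic rate derived in the paper; the symbols are notation chosen for this note, and the thermal average over initial states that usually accompanies it is omitted.

```latex
k_{i} = \frac{2\pi}{\hbar} \sum_{f} \left| \langle f | \hat{V} | i \rangle \right|^{2} \, \delta\!\left( E_f - E_i \right)
```

In the setting of the abstract, $\hat{V}$ would be built from the NDC terms. Those terms contain nuclear momentum operators, so the matrix element depends on the nuclear coordinates and momenta; this is why the abstract speaks of non-Condon effects.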
# 下流分類作業のための安定拡散データセット生成

Stable Diffusion Dataset Generation for Downstream Classification Tasks ( http://arxiv.org/abs/2405.02698v1 )

Eugenio Lomurno, Matteo D'Oria, Matteo Matteucci, (参考訳) 生成人工知能の最近の進歩により、現実世界のデータを忠実に模倣する高品質な合成データの作成が可能になった。 本稿では,Stable Diffusion 2.0モデルの合成データセット生成への適応について検討し,トランスファーラーニング,ファインチューニング,生成パラメータ最適化技術を用いて,下流分類タスクにおけるデータセットの有用性を向上する。 本稿では,クラスエンコーダとキー生成パラメータの最適化を利用したクラス条件付きモデルを提案する。 その3分の1のケースでは、実際のデータセットでトレーニングされたデータセットよりも優れたパフォーマンスのモデルが生成されました。

Recent advances in generative artificial intelligence have enabled the creation of high-quality synthetic data that closely mimics real-world data. This paper explores the adaptation of the Stable Diffusion 2.0 model for generating synthetic datasets, using Transfer Learning, Fine-Tuning and generation parameter optimisation techniques to improve the utility of the dataset for downstream classification tasks. We present a class-conditional version of the model that exploits a Class-Encoder and optimisation of key generation parameters. Our methodology led to synthetic datasets that, in a third of cases, produced models that outperformed those trained on real datasets.
# 生成モデルにおける新しいモードのスケーラブルな同定に向けて

Towards a Scalable Identification of Novel Modes in Generative Models ( http://arxiv.org/abs/2405.02700v1 )

Jingwei Zhang, Mohammad Jalali, Cheuk Ting Li, Farzan Farnia, (参考訳) 生成モデルの解釈可能な比較では、関連する各モデルによってより頻繁に生成されるサンプルの型を特定する必要がある。 異なる生成モデルをランク付けするために、文献でいくつかの定量的スコアが提案されているが、このようなスコアに基づく評価は、様々なサンプルタイプの取得において、生成モデル間の微妙な違いを明らかにしていない。 本研究では,Fourier-based Identification of Novel Clusters (FINC) と呼ばれる手法を提案する。 FINCは、ランダムなフーリエ特徴に基づくスケーラブルな確率的アルゴリズムを提供し、2つの生成モデルのカーネル共分散行列の固有空間を推定し、主固有方向を利用して、各モデルにより支配的なサンプル型を検出する。 FINC法の標準コンピュータビジョンデータセットおよび生成モデルフレームワークへの応用を実証する。 提案手法は, 広範に使用されている生成モデルを用いて, 異なる周波数で捕捉したサンプルタイプを強調表示するために, 開発したフーリエ方式のスケーラビリティと効率性を示唆する。

An interpretable comparison of generative models requires the identification of sample types produced more frequently by each of the involved models. While several quantitative scores have been proposed in the literature to rank different generative models, such score-based evaluations do not reveal the nuanced differences between the generative models in capturing various sample types. In this work, we propose a method called Fourier-based Identification of Novel Clusters (FINC) to identify modes produced by a generative model with a higher frequency in comparison to a reference distribution. FINC provides a scalable stochastic algorithm based on random Fourier features to estimate the eigenspace of kernel covariance matrices of two generative models and utilize the principal eigendirections to detect the sample types present more dominantly in each model. We demonstrate the application of the FINC method to standard computer vision datasets and generative model frameworks. Our numerical results suggest the scalability and efficiency of the developed Fourier-based method in highlighting the sample types captured with different frequencies by widely-used generative models.
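
The scalability claim rests on random Fourier features, which replace an implicit kernel with an explicit finite-dimensional feature map. The sketch below shows the general technique for a Gaussian kernel, assuming the bandwidth `sigma`, the feature count, and the function names; it is not the FINC algorithm itself.

```python
import math
import random

def make_rff(dim, num_features, sigma, seed=0):
    """Return a feature map whose inner products approximate a Gaussian kernel.

    Frequencies are drawn from N(0, 1/sigma^2) per coordinate, the spectral
    density of k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).
    """
    rng = random.Random(seed)
    freqs = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
             for _ in range(num_features)]
    scale = 1.0 / math.sqrt(num_features)

    def feature_map(x):
        proj = [sum(w * xi for w, xi in zip(row, x)) for row in freqs]
        # Cosine and sine parts together give an unbiased kernel estimate.
        return [scale * math.cos(p) for p in proj] + [scale * math.sin(p) for p in proj]

    return feature_map

def gaussian_kernel(x, y, sigma):
    """Exact Gaussian kernel value, used here only to check the approximation."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

phi = make_rff(dim=3, num_features=4000, sigma=1.0, seed=0)
x, y = [0.2, -0.1, 0.4], [0.0, 0.3, 0.1]
approx = sum(a * b for a, b in zip(phi(x), phi(y)))
exact = gaussian_kernel(x, y, 1.0)
print(round(approx, 3), round(exact, 3))
```

With an explicit map like `phi`, a kernel covariance matrix can be estimated as an average of outer products of feature vectors. Its size is then fixed by the number of features rather than the number of samples, which is the kind of property a scalable eigenspace estimate needs.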
# データキュレーションレンズによる機械学習データ実践:評価フレームワーク

Machine Learning Data Practices through a Data Curation Lens: An Evaluation Framework ( http://arxiv.org/abs/2405.02703v1 )

Eshta Bhardwaj, Harshit Gujral, Siyi Wu, Ciara Zogheib, Tegan Maharaj, Christoph Becker, (参考訳) 機械学習におけるデータセット開発の研究は、モデル開発を可能にし、結果を形成するデータプラクティスにより多くの注意を払っている。 多くの人は、アーカイブやデータキュレーション分野からの理論と実践を採用することで、より公正さ、説明責任、透明性、より倫理的な機械学習をサポートすることができると主張している。 そこで本研究では,データキュレーションのレンズによる機械学習データセット開発におけるデータ実践について検討する。 機械学習におけるデータプラクティスをデータキュレーションの実践として評価する。 そこで我々は,データキュレーションの概念と原則を用いた機械学習データセット評価フレームワークを開発した。 25のMLデータセットに対する評価結果の混合手法分析を通じて、機械学習データ処理に採用すべきデータキュレーション原則の実現可能性について検討し、現在どのようにデータキュレーションが行われているかを検討する。 機械学習の研究者たちは、しばしばモデル開発を強調するが、標準的なデータキュレーションの原則を適用するのに苦労している。 本研究は, 両分野の用語を共有した次元の評価, 規範的制約を伴わない概念の適応における高い解釈柔軟性, ルーブリックの適用に必要なデータキュレーションの専門知識の深さの制限, データセット作成者が責任を負う範囲をスクーピングする際の課題など, これらの分野の共通部分における課題について考察した。 我々はこれらの課題に対処する方法を提案し、データキュレーションの概念や手法が機械学習のデータプラクティスにどのように影響するかを概説する、評価のための全体的なフレームワークを開発する。

Studies of dataset development in machine learning call for greater attention to the data practices that make model development possible and shape its outcomes. Many argue that the adoption of theory and practices from archives and data curation fields can support greater fairness, accountability, transparency, and more ethical machine learning. In response, this paper examines data practices in machine learning dataset development through the lens of data curation. We evaluate data practices in machine learning as data curation practices. To do so, we develop a framework for evaluating machine learning datasets using data curation concepts and principles through a rubric. Through a mixed-methods analysis of evaluation results for 25 ML datasets, we study the feasibility of data curation principles to be adopted for machine learning data work in practice and explore how data curation is currently performed. We find that researchers in machine learning, which often emphasizes model development, struggle to apply standard data curation principles. Our findings illustrate difficulties at the intersection of these fields, such as evaluating dimensions that have shared terms in both fields but non-shared meanings, a high degree of interpretative flexibility in adapting concepts without prescriptive restrictions, obstacles in limiting the depth of data curation expertise needed to apply the rubric, and challenges in scoping the extent of documentation dataset creators are responsible for. We propose ways to address these challenges and develop an overall framework for evaluation that outlines how data curation concepts and methods can inform machine learning data practices.
# 潜在結晶対称性で保護される高次トポロジー

Higher-order topology protected by latent crystalline symmetries ( http://arxiv.org/abs/2405.02704v1 )

L. Eek, M. Röntgen, A. Moustaj, C. Morais Smith, (参考訳) 回転対称性は、Cn対称高次トポロジカル結晶絶縁体における分数角電荷の存在に必要な要件ではないことを示す。 代わりに、系の等スペクトル還元を行うと明らかになる潜在回転対称性を持つことは十分である。 本稿では,潜在結晶対称系に対する充填異常の概念を導入し,修正トポロジカル不変量を提案する。 したがって、Cn対称性によって保護される2次元の高次位相の概念は、潜在対称性によって保護されるように一般化される。 我々の主張は、Cn対称性がない場合に非自明なコーナー電荷を示すモデルの具体的な例によって裏付けられている。 この研究は、トポロジカル結晶絶縁体の分類を拡張し、潜在対称性を含む。

We demonstrate that rotation symmetry is not a necessary requirement for the existence of fractional corner charges in Cn-symmetric higher-order topological crystalline insulators. Instead, it is sufficient to have a latent rotation symmetry, which may be revealed upon performing an isospectral reduction on the system. We introduce the concept of a filling anomaly for latent crystalline symmetric systems, and propose modified topological invariants. The notion of higher-order topology in two dimensions protected by Cn symmetry is thus generalized to a protection by latent symmetry. Our claims are corroborated by concrete examples of models that show non-trivial corner charge in the absence of Cn symmetry. This work extends the classification of topological crystalline insulators to include latent symmetries.
# 量子パラメトリック発振器における共鳴力誘起対称性の破れ

Resonant-force induced symmetry breaking in a quantum parametric oscillator ( http://arxiv.org/abs/2405.02706v1 )

D. K. J. Boneß, W. Belzig, M. I. Dykman, (参考訳) パラメトリック変調発振器は、変調周波数の半分で2つの反対位相振動状態を有する。 振動周波数における余分な力は状態の対称性を破る。 この効果は、発振器と熱浴とのカップリングによって生じる力と量子ゆらぎの相互作用によって非常に強い。 力は振動子の量子状態上の揺らぎによって引き起こされるウォークの速度を変化させる。 状態の数が大きければ、その効果は振動状態の切り替え率において指数関数的に大きな要素に蓄積される。 私たちはその要因を見つけ、制限ケースで分析します。 温度ゼロの制限下では、余分な力が詳細バランスを破り、非摂動的にスイッチング速度が増大することを示した。

A parametrically modulated oscillator has two opposite-phase vibrational states at half the modulation frequency. An extra force at the vibration frequency breaks the symmetry of the states. The effect can be extremely strong due to the interplay between the force and the quantum fluctuations resulting from the coupling of the oscillator to a thermal bath. The force changes the rates of the fluctuation-induced walk over the quantum states of the oscillator. If the number of the states is large, the effect accumulates to an exponentially large factor in the rate of switching between the vibrational states. We find the factor and analyze it in the limiting cases. We show that in the zero-temperature limit the extra force breaks the detailed balance, leading to a nonperturbatively strong increase of the switching rate.
# ELearnFitによるニュース要約の効率化 : 文脈内学習の効率化とファインチューニングの効率化

Enhancing News Summarization with ELearnFit through Efficient In-Context Learning and Efficient Fine-Tuning ( http://arxiv.org/abs/2405.02710v1 )

Che Guan, Andrew Chin, Puya Vahabi, (参考訳) 日々のニュースサイクルによって配信される情報の希薄化に伴い、ニュースフィードを効率的に効率的に要約し、素早く消費する必要性が高まっている。 XSumデータセットからニュース記事の簡潔でコヒーレントな要約を生成するために,大規模言語モデル(LLM)を,従来の言語モデルと比較して高度な学習能力と生成能力で活用する。 本稿では,LLMの2つの重要な側面,すなわち,テキスト内学習(ELearn)とパラメータ学習(EFit)に焦点をあてる。 ELearnでは、プロンプトにおけるショット数の増加と単純なテンプレートの利用により、一般的に要約の品質が向上することがわかった。 また, ELearnでは, モデル性能の向上には至らず, 実例の活用が期待できる。 さらに,異なる手法を用いてEFitを解析し,LLMの第1層を微調整すると,他の層を微調整したり,LoRAを利用するよりも優れた結果が得られることを示した。 また、より適切なトレーニングサンプルを選択的レイヤで活用しても、パフォーマンスが向上しないこともわかりました。 ELearnとEFitを組み合わせた新しいモデル(ELearnFit)を開発した。 また、ELearnFitを使ってプロンプトと微調整のトレードオフを強調しています。 究極的には,本研究は,速報・微調整段階におけるニュース要約を最適化し,ニュース記事の合成を強化するための実践的手法を提供する。

With the deluge of information delivered by the daily news cycle, there is a growing need to effectively and efficiently summarize news feeds for quick consumption. We leverage large language models (LLMs), with their advanced learning and generative abilities as compared to conventional language models, to generate concise and coherent summaries for news articles from the XSum dataset. Our paper focuses on two key aspects of LLMs: Efficient in-context Learning (ELearn) and Parameter Efficient Fine-tuning (EFit). Under ELearn, we find that increasing the number of shots in prompts and utilizing simple templates generally improve the quality of summaries. We also find that utilizing relevant examples in few-shot learning for ELearn does not improve model performance. In addition, we study EFit using different methods and demonstrate that fine-tuning the first layer of LLMs produces better outcomes than fine-tuning other layers or utilizing LoRA. We also find that leveraging more relevant training samples using selective layers does not result in better performance. By combining ELearn and EFit, we create a new model (ELearnFit) that leverages the benefits of both few-shot learning and fine-tuning and outperforms either model alone. We also use ELearnFit to highlight the trade-offs between prompting and fine-tuning, especially for situations where only a limited number of annotated samples are available. Ultimately, our research provides practical techniques to optimize news summarization during the prompting and fine-tuning stages and enhances the synthesis of news articles.
# 若者のピアサポートにおけるAIの役割--人間とAIによる反応の嗜好に関する研究

The Role of AI in Peer Support for Young People: A Study of Preferences for Human- and AI-Generated Responses ( http://arxiv.org/abs/2405.02711v1 )

Jordyn Young, Laala M Jawara, Diep N Nguyen, Brian Daly, Jina Huh-Yoo, Afsaneh Razi, (参考訳) 生成人工知能(AI)は、ニュース、教育、ソーシャルメディアを含む日常的な技術に統合されている。 AIは、会話パートナー、自動補完、レスポンス提案としてプライベートな会話をさらに浸透させた。 ソーシャルメディアが若者のピアサポート交換の主要な方法になるにつれ、いつ、どのようにAIがそうした交換を有益な、安全で、社会的に適切な方法で促進し支援できるかを理解する必要がある。 我々は622人の若者に、オンライン調査を完了させ、ヘルプ検索メッセージに対する盲目の人間とAI生成の反応を評価するよう依頼した。 被験者は、関係性、自己表現性、身体的健康に関する状況に対して、AIが生成した反応を好んだ。 しかし、自殺思考などのセンシティブな話題に対処する場合、若者は人間の反応を好んだ。 また、オンラインピアサポート交換におけるトレーニングの役割と、若者の幸福を支えることの意味についても論じる。 Disclaimer: この論文には自殺の考えを含むセンシティブなトピックが含まれています。 読者の判断は推奨される。

Generative Artificial Intelligence (AI) is integrated into everyday technology, including news, education, and social media. AI has further pervaded private conversations as conversational partners, auto-completion, and response suggestions. As social media becomes young people's main method of peer support exchange, we need to understand when and how AI can facilitate and assist in such exchanges in a beneficial, safe, and socially appropriate way. We asked 622 young people to complete an online survey and evaluate blinded human- and AI-generated responses to help-seeking messages. We found that participants preferred the AI-generated response to situations about relationships, self-expression, and physical health. However, when addressing a sensitive topic, like suicidal thoughts, young people preferred the human response. We also discuss the role of training in online peer support exchange and its implications for supporting young people's well-being. Disclaimer: This paper includes sensitive topics, including suicide ideation. Reader discretion is advised.
# CoE-SQL: 編集の連鎖を伴うマルチターンテキストからSQLへのインコンテキスト学習

CoE-SQL: In-Context Learning for Multi-Turn Text-to-SQL with Chain-of-Editions ( http://arxiv.org/abs/2405.02712v1 )

Hanchong Zhang, Ruisheng Cao, Hongshen Xu, Lu Chen, Kai Yu, (参考訳) 最近、Large Language Models (LLMs) は、様々なドメインやタスクにおいて印象的な機能を持つことが実証されている。 マルチターンテキスト・トゥ・SQLタスクにおけるプロンプト設計の問題について検討し,SQLクエリ生成時のLCMの推論能力の向上を試みる。 会話のコンテキストでは、現在のSQLクエリは、コンテキスト依存性のため、いくつかの操作だけで、前のSQLクエリから修正することができる。 我々は,従来のSQLクエリとエディションチェーンを併用したSQLクエリに基づいて,LCMにSQLクエリを生成させる,CoE-SQLという手法を紹介した。 我々はまた、我々のアプローチの最適構成を決定するために、広範囲にわたるアブレーション研究も行っている。 提案手法は,2つのベンチマークSParCとCoSQLにおいて,SOTAの微調整モデルと競合するSParCとCoSQLの性能を安定的に向上させる。

Recently, Large Language Models (LLMs) have been demonstrated to possess impressive capabilities in a variety of domains and tasks. We investigate the issue of prompt design in the multi-turn text-to-SQL task and attempt to enhance the LLMs' reasoning capacity when generating SQL queries. In the conversational context, the current SQL query can be modified from the preceding SQL query with only a few operations due to the context dependency. We introduce our method called CoE-SQL, which can prompt LLMs to generate the SQL query based on the previously generated SQL query with an edition chain. We also conduct extensive ablation studies to determine the optimal configuration of our approach. Our approach stably outperforms different in-context learning baselines and achieves state-of-the-art performance on two benchmarks, SParC and CoSQL, using LLMs, which is also competitive with the SOTA fine-tuned models.
# 関連性を超えて: パースペクティブ・アウェアネスにおけるレトリバーの評価と改善

Beyond Relevance: Evaluate and Improve Retrievers on Perspective Awareness ( http://arxiv.org/abs/2405.02714v1 )

Xinran Zhao, Tong Chen, Sihao Chen, Hongming Zhang, Tongshuang Wu, (参考訳) Information Retrieval (IR) のタスクは、ユーザの情報要求に基づいて関連する文書を識別するシステムを必要とする。 現実のシナリオでは、検索者はドキュメントとクエリ間のセマンティックな関連性に頼るだけでなく、ユーザクエリの背後にある微妙な意図や視点を認識することが期待されている。 例えば、クレームの検証を依頼すると、下流システムが公正な判断を下すために、支持と矛盾する視点の両方から証拠を特定することが期待される。 本研究では,検索者がクエリの異なる視点を認識および応答できるかどうかを検討する。クレームに関する関連文書の検索以外にも,検索者がサポートする文書と反対する文書とを区別できるのか? 我々は既存の6つのタスクを改革して拡張し、検索のためのベンチマークを作成します。 実験でカバーされている現在の検索者は、クエリの微妙な視点に対する認識が限られており、また特定の視点に偏りがあることが示される。 本研究の目的は,レトリバー表現空間の幾何学的特徴を活用し,ゼロショット方式でレトリバーの視点認識を改善することにある。 我々は,同じタスクセット上での投影法の有効性と有効性を示す。 さらに分析は、アンビグQAでは4.2%、エッセイ執筆では29.9%の精度で、非認識ベースラインに比べて、視点認識が様々な下流タスクのパフォーマンスを向上することを示す。

The task of Information Retrieval (IR) requires a system to identify relevant documents based on users' information needs. In real-world scenarios, retrievers are expected to not only rely on the semantic relevance between the documents and the queries but also recognize the nuanced intents or perspectives behind a user query. For example, when asked to verify a claim, a retrieval system is expected to identify evidence from both supporting vs. contradicting perspectives, for the downstream system to make a fair judgment call. In this work, we study whether retrievers can recognize and respond to different perspectives of the queries -- beyond finding relevant documents for a claim, can retrievers distinguish supporting vs. opposing documents? We reform and extend six existing tasks to create a benchmark for retrieval, where we have diverse perspectives described in free-form text, besides root, neutral queries. We show that current retrievers covered in our experiments have limited awareness of subtly different perspectives in queries and can also be biased toward certain perspectives. Motivated by the observation, we further explore the potential to leverage geometric features of retriever representation space to improve the perspective awareness of retrievers in a zero-shot manner. We demonstrate the efficiency and effectiveness of our projection-based methods on the same set of tasks. Further analysis also shows how perspective awareness improves performance on various downstream tasks, with 4.2% higher accuracy on AmbigQA and 29.9% more correlation with designated viewpoints on essay writing, compared to non-perspective-aware baselines.
# AFter:RGBT追跡用アテンションベース核融合ルータ

AFter: Attention-based Fusion Router for RGBT Tracking ( http://arxiv.org/abs/2405.02717v1 )

Andong Lu, Wanyu Wang, Chenglong Li, Jin Tang, Bin Luo, (参考訳) RGBT追跡のコアとなるマルチモーダル機能融合は近年,多くの融合研究が出現している。 しかし、既存のRGBT追跡手法は、動的シナリオにおける様々な課題を扱うのが困難であるマルチモーダル機能を統合するために、固定核融合構造を広く採用している。 この問題に対処するために、この研究は AFter と呼ばれる新しい Attention-based Fusion router を提示する。 特に、階層的注意ネットワークに基づく融合構造空間を設計し、融合操作に対応する各注目ベース融合ユニットと、融合構造に対応するこれらの注目ユニットの組み合わせを設計する。 注意に基づく融合ユニットの組み合わせを最適化することにより、様々な挑戦的なシナリオに対応するために、動的に融合構造を選択することができる。 ニューラルネットワーク探索アルゴリズムにおける異なる構造の複雑な探索とは異なり、各注意に基づく融合ユニットにルータを装備する動的ルーティングアルゴリズムを開発し、融合構造を効率的に最適化するための組み合わせ重み付けを予測する。 5つの主流RGBT追跡データセットに対する大規模な実験は、提案されたAFterの最先端RGBTトラッカーに対する優れた性能を示している。 コードを https://github.com/Alexadlu/AFter でリリースします。

Multi-modal feature fusion, a core component of RGBT tracking, has been the subject of numerous studies in recent years. However, existing RGBT tracking methods widely adopt fixed fusion structures to integrate multi-modal features, and such structures struggle to handle the various challenges of dynamic scenarios. To address this problem, this work presents a novel Attention-based Fusion router called AFter, which optimizes the fusion structure to adapt to dynamic, challenging scenarios for robust RGBT tracking. In particular, we design a fusion structure space based on the hierarchical attention network, in which each attention-based fusion unit corresponds to a fusion operation and a combination of these attention units corresponds to a fusion structure. By optimizing the combination of attention-based fusion units, we can dynamically select the fusion structure to adapt to various challenging scenarios. Unlike the complex search over different structures in neural architecture search algorithms, we develop a dynamic routing algorithm, which equips each attention-based fusion unit with a router, to predict the combination weights for efficient optimization of the fusion structure. Extensive experiments on five mainstream RGBT tracking datasets demonstrate the superior performance of the proposed AFter against state-of-the-art RGBT trackers. We release the code at https://github.com/Alexadlu/AFter.
# 量子状態伝達プロトコルにおける忠実度分布

Distribution of Fidelity in Quantum State Transfer Protocols ( http://arxiv.org/abs/2405.02721v1 )

Salvatore Lorenzo, Francesco Plastina, Tony J. G. Apollaro, Mirko Consiglio, Karol Życzkowski, (参考訳) 量子状態伝達プロトコルは、量子鍵分布から量子計算まで、多くの量子情報処理タスクにおいて主要なツールキットである。 このようなプロトコルの性能を評価するために、入力と出力状態の間の平均忠実度に依存することが多い。 このスキームを超越して、忠実度の全確率分布を解析し、単一および2量子状態の遷移を導出するための一般的な枠組みを提供する。 完全転送に特徴的なデルタ形状から,非完全読み出しタイミングを含むプロセスの現実的な特徴から,その拡張と変形を解析した。 平均忠実度と同じ値を共有する異なる量子転送モデルでは、忠実度の分布が異なるため、最小忠実度を含むプロトコルに関する追加情報が得られる。

Quantum state transfer protocols are a major toolkit in many quantum information processing tasks, from quantum key distribution to quantum computation. To assess the performance of such a protocol, one often relies on the average fidelity between the input and the output states. Going beyond this scheme, we analyze the entire probability distribution of fidelity, providing a general framework to derive it for the transfer of single- and two-qubit states. Starting from the delta-like shape of the fidelity distribution, characteristic of perfect transfer, we analyze its broadening and deformation due to realistic features of the process, including non-perfect read-out timing. Different models of quantum transfer that share the same value of the average fidelity display different distributions of fidelity, thus providing additional information on the protocol, including the minimum fidelity.
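The difference between an average fidelity and a full fidelity distribution can be sketched numerically. The example below assumes a simple model that is not taken from the abstract: single-qubit transfer described by an amplitude-damping-like channel with a real transfer amplitude `f`, as is common in spin-chain treatments, and Haar-random pure input states. Under that assumption the fidelity depends only on the input populations, and its mean is 1/2 + f/3 + f²/6.

```python
import random

def random_populations(rng):
    """Populations (|a|^2, |b|^2) of a Haar-random pure qubit state a|0> + b|1>."""
    # On the Bloch sphere, Haar measure makes cos(theta) uniform in [-1, 1].
    cos_theta = rng.uniform(-1.0, 1.0)
    p0 = (1.0 + cos_theta) / 2.0
    return p0, 1.0 - p0

def transfer_fidelity(p0, p1, f):
    """Fidelity <psi|rho_out|psi> for the assumed channel with real amplitude f."""
    return p0 * (1.0 - f * f * p1) + f * f * p1 * p1 + 2.0 * f * p0 * p1

def fidelity_samples(f, n, seed=0):
    """Sample n fidelities for transfer amplitude f (hypothetical helper)."""
    rng = random.Random(seed)
    return [transfer_fidelity(*random_populations(rng), f) for _ in range(n)]

samples = fidelity_samples(f=0.8, n=20000)
mean_fidelity = sum(samples) / len(samples)
min_fidelity = min(samples)
predicted_mean = 0.5 + 0.8 / 3.0 + 0.8 ** 2 / 6.0
```

For perfect transfer (`f = 1`) every sample equals 1, which is the delta-like distribution the abstract mentions. For `f < 1` the samples spread out, and the smallest value in this model approaches `f**2`, which the mean alone does not reveal.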
# パルス単一光子分光におけるチャーピングの役割について

On the role of chirping in pulsed single photon spectroscopy ( http://arxiv.org/abs/2405.02723v1 )

Elnaz Darsheshdar, Aiman Khan, Francesco Albarelli, Animesh Datta, (参考訳) 本研究では,2レベル系(TLS)と1光子パルスとの相互作用強度の推定精度について検討した。 ガウスおよび指数時間プロファイルに適用した線形・二次・正弦波時間位相を考察する。 漸近期にTLSが基底状態に完全に崩壊したとき、基本的な精度はそのスペクトル振幅の大きさにのみ依存する。 位相変調ガウスパルスの場合、これはスペクトル帯域で完全に決定される。 一般時相プロファイルと位相変調の基本精度を評価するための式を提供する。 最後に, パルス単一光子分光法において, 実験的に実現可能なモード分解測定が最適であるか, あるいはそれに近いかを示す。

We investigate the precision of estimating the interaction strength between a two-level system (TLS) and a single-photon pulse when the latter is subject to chirping. We consider linear, quadratic, and sinusoidal temporal phases applied to Gaussian and exponential temporal profiles. At the asymptotic time, when the TLS has fully decayed to its ground state, the fundamental precision depends solely on the magnitude of its spectral amplitude. For quadratically phase-modulated Gaussian pulses, this is entirely determined by the spectral bandwidth. We provide expressions for evaluating the fundamental precision for general temporal profiles and phase modulations. Finally, we show that experimentally feasible mode-resolved measurements are optimal, or close to it, for chirped, pulsed single photon spectroscopy.
# リスク感性多エージェント強化学習における平衡バイアスのモデリング

Taming Equilibrium Bias in Risk-Sensitive Multi-Agent Reinforcement Learning ( http://arxiv.org/abs/2405.02724v1 )

Yingjie Fei, Ruitu Xu, (参考訳) リスク依存型マルチエージェント強化学習において,エージェントが多種多様なリスク嗜好を持つ報酬のエントロピー的リスク尺度を最適化し,リスクに敏感なマルチエージェント強化学習について検討した。 我々は,既存の文献から否定的に適応された後悔をパフォーマンス指標として利用することで,最もリスクに敏感なエージェントを優先し,他のエージェントを無視する平衡バイアスの政策を誘導できることを示した。 ナイーブな後悔の欠如に対処するため、我々はリスクバランスのとれた後悔という新しい後悔の概念を提案し、均衡バイアスの問題を克服することの限界を低く示す。 さらに,リスクに敏感なマルコフゲームにおいて,Nashの学習,相関,粗相関平衡を学習するための自己再生アルゴリズムを開発した。 提案アルゴリズムは, リスクバランスの取れた後悔に対して, ほぼ最適の後悔保証が得られることを示す。

We study risk-sensitive multi-agent reinforcement learning under general-sum Markov games, where agents optimize the entropic risk measure of rewards with possibly diverse risk preferences. We show that using the regret naively adapted from existing literature as a performance metric could induce policies with equilibrium bias that favor the most risk-sensitive agents and overlook the other agents. To address such deficiency of the naive regret, we propose a novel notion of regret, which we call risk-balanced regret, and show through a lower bound that it overcomes the issue of equilibrium bias. Furthermore, we develop a self-play algorithm for learning Nash, correlated, and coarse correlated equilibria in risk-sensitive Markov games. We prove that the proposed algorithm attains near-optimal regret guarantees with respect to the risk-balanced regret.
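As a minimal illustration of the objective named above, the entropic risk measure of a reward sample can be computed as (1/β) log E[exp(βR)]. The sign convention used here (negative β is risk-averse, positive β is risk-seeking) and the function name are assumptions for this sketch, not details confirmed by the abstract.

```python
import math

def entropic_risk(rewards, beta):
    """Empirical entropic risk (1/beta) * log(mean(exp(beta * r)))."""
    if beta == 0:
        # The beta -> 0 limit recovers the ordinary (risk-neutral) mean.
        return sum(rewards) / len(rewards)
    # Shift by the largest exponent (log-sum-exp trick) for numerical stability.
    shift = max(beta * r for r in rewards)
    mean_exp = sum(math.exp(beta * r - shift) for r in rewards) / len(rewards)
    return (shift + math.log(mean_exp)) / beta

rewards = [0.0, 1.0, 2.0, 3.0]
risk_averse = entropic_risk(rewards, beta=-1.0)
risk_neutral = entropic_risk(rewards, beta=0.0)
risk_seeking = entropic_risk(rewards, beta=1.0)
```

By Jensen's inequality the risk-averse value lies below the mean and the risk-seeking value above it, so agents with different β values rank the same reward distribution differently. That is the kind of heterogeneity in risk preferences the abstract describes.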
# 機械学習システムにおける隠れフィードバックループ効果の数学的モデル

A Mathematical Model of the Hidden Feedback Loop Effect in Machine Learning Systems ( http://arxiv.org/abs/2405.02726v1 )

Andrey Veprikov, Alexander Afanasiev, Anton Khritankov, (参考訳) 社会規模の機械学習システムの広範な展開には、信頼性の喪失、バイアスの増幅、AIの安全性要件違反など、これらのシステムが環境にもたらす長期的な影響の完全な理解が必要である。 本稿では,誤り増幅,帰納的概念ドリフト,エコーチャンバーなど,意図しない隠れたフィードバックループに起因するいくつかの現象を共同で記述するために,繰り返し学習プロセスを導入する。 このプロセスは、データを取得し、予測モデルをトレーニングし、単一の数学的モデル内でエンドユーザに予測を配信するサイクル全体を含む。 このような繰り返し学習設定の特徴は、環境の状態が時間とともに学習者自身に因果的に依存するようになり、データ分布に関する通常の仮定に反することである。 本稿では,繰り返し学習プロセスの力学系モデルを提案し,システム動作の正および負のフィードバックループモードに対する確率分布の制限セットを証明した。 2つの合成データセット上で、模範的な教師付き学習問題を用いて一連の計算実験を行う。 実験の結果は、力学モデルから導かれる理論的な予測と一致する。 本研究は,機械学習システムにおける学習過程の反復的学習の実現可能性を示し,その領域におけるさらなる研究の機会を広げるものである。

Widespread deployment of societal-scale machine learning systems necessitates a thorough understanding of the resulting long-term effects these systems have on their environment, including loss of trustworthiness, bias amplification, and violation of AI safety requirements. We introduce a repeated learning process to jointly describe several phenomena attributed to unintended hidden feedback loops, such as error amplification, induced concept drift, and echo chambers. The process comprises the entire cycle of obtaining the data, training the predictive model, and delivering predictions to end-users within a single mathematical model. A distinctive feature of such a repeated learning setting is that the state of the environment becomes causally dependent on the learner itself over time, thus violating the usual assumptions about the data distribution. We present a novel dynamical systems model of the repeated learning process and prove the limiting set of probability distributions for positive and negative feedback loop modes of the system operation. We conduct a series of computational experiments using an exemplary supervised learning problem on two synthetic data sets. The results of the experiments correspond to the theoretical predictions derived from the dynamical model. Our results demonstrate the feasibility of the proposed approach for studying the repeated learning processes in machine learning systems and open a range of opportunities for further research in the area.
# U-DiT:U形拡散変圧器におけるダウンサンプルトークン

U-DiTs: Downsample Tokens in U-Shaped Diffusion Transformers ( http://arxiv.org/abs/2405.02730v1 )

Yuchuan Tian, Zhijun Tu, Hanting Chen, Jie Hu, Chao Xu, Yunhe Wang, (参考訳) DiT(Diffusion Transformer)は、遅延空間画像生成のための拡散タスクにトランスフォーマーアーキテクチャを導入する。 一連の変圧器ブロックをチェーンする等方的アーキテクチャでは、DiTは競争性能と優れたスケーラビリティを示しているが、一方で、DiTによるU-Netの放棄とその次の改善は再考する価値がある。 この目的のために、U-NetアーキテクチャのDiTと等方的なDiTを比較することで、簡単な玩具実験を行う。 U-Netアーキテクチャは、U-Netインダクティブバイアスの中でわずかに有利にしかならず、U-NetスタイルのDiT内の潜在的な冗長性を示している。 U-Netのバックボーン機能が低周波に支配されているという発見に触発されて、クエリキー値タプルのトークンダウンサンプリングを行い、計算量を大幅に削減したにもかかわらず、さらなる改善を実現した。 ダウンサンプルトークンによる自己注意に基づいて,本論文では,U字型DiT(U-DiT)のシリーズを提案し,U-DiTモデルの異常な性能を示すための広範な実験を行う。 提案されたU-DiTは、計算コストのわずか1/6でDiT-XL/2を上回った。 コードは https://github.com/YuchuanTian/U-DiT で入手できる。

Diffusion Transformers (DiTs) introduce the transformer architecture to diffusion tasks for latent-space image generation. With an isotropic architecture that chains a series of transformer blocks, DiTs demonstrate competitive performance and good scalability; meanwhile, the abandonment of U-Net by DiTs and their subsequent improvements is worth rethinking. To this end, we conduct a simple toy experiment comparing a U-Net-architectured DiT with an isotropic one. It turns out that the U-Net architecture gains only a slight advantage amid the U-Net inductive bias, indicating potential redundancies within the U-Net-style DiT. Inspired by the discovery that U-Net backbone features are low-frequency-dominated, we perform token downsampling on the query-key-value tuple for self-attention, which brings further improvements despite a considerable reduction in computation. Based on self-attention with downsampled tokens, we propose a series of U-shaped DiTs (U-DiTs) in the paper and conduct extensive experiments to demonstrate the extraordinary performance of U-DiT models. The proposed U-DiT can outperform DiT-XL/2 with only 1/6 of its computation cost. Code is available at https://github.com/YuchuanTian/U-DiT.
# システムレビュー:コネクテッドおよび自律走行車における異常検出

Systematic Review: Anomaly Detection in Connected and Autonomous Vehicles ( http://arxiv.org/abs/2405.02731v1 )

J. R. V. Solaas, N. Tuptuk, E. Mariconti, (参考訳) この系統的なレビューは、連結車両と自律車両の異常検出に焦点を当てている。 最初のデータベース検索では2160項目が特定され、そのうち203項目が厳格な審査と評価の後にこのレビューに含まれていた。 この研究では、異常検出に最もよく使用される人工知能(AI)アルゴリズムが、一級SVMとともにLSTM、CNN、オートエンコーダなどのニューラルネットワークであることが明らかになった。 ほとんどの異常ベースのモデルは実際の運用車両データを使用して訓練されたが、攻撃や故障などの異常はデータセットに人工的に注入されることが多かった。 これらのモデルは、主にリコール、精度、精度、F1スコア、偽陽性率の5つの主要な評価指標を用いて評価された。 異常検出モデルに最も頻繁に使用される評価指標は、精度、精度、リコール、F1スコアである。 この体系的なレビューはいくつかのレコメンデーションを提示します。 まず、異常検出モデルの総合的な評価を提供するために、複数の評価指標を組み込む必要がある。 第二に、研究のごく一部だけがモデルをオープンソース化し、研究コミュニティ内でのコラボレーションを促進するためにモデルを公開し、発見を効果的に検証し比較する必要性を示している。 第三に、提案された異常に基づく検出モデルの有効性をテストするために、事前に定義された異常やサイバー攻撃を伴うデータセットをベンチマークする必要がある。 さらに,車両への異常検出の展開について,道路上での性能評価を行うための今後の研究が必要である。 Ethernet や FlexRay など,CAN へのプロトコルの異なる侵入検知システムについての研究は,特に不足している。

This systematic review focuses on anomaly detection for connected and autonomous vehicles. The initial database search identified 2160 articles, of which 203 were included in this review after rigorous screening and assessment. This study revealed that the most commonly used Artificial Intelligence (AI) algorithms employed in anomaly detection are neural networks like LSTM, CNN, and autoencoders, alongside one-class SVM. Most anomaly-based models were trained using real-world operational vehicle data, although anomalies, such as attacks and faults, were often injected artificially into the datasets. These models were evaluated mostly using five key evaluation metrics: recall, accuracy, precision, F1-score, and false positive rate. The combination of evaluation metrics most frequently used for anomaly detection models was accuracy, precision, recall, and F1-score. This systematic review presents several recommendations. First, there is a need to incorporate multiple evaluation metrics to provide a comprehensive assessment of the anomaly detection models. Second, only a small proportion of the studies have made their models open source, indicating a need to share models publicly to facilitate collaboration within the research community, and to validate and compare findings effectively. Third, there is a need for benchmarking datasets with predefined anomalies or cyberattacks to test and improve the effectiveness of the proposed anomaly-based detection models. Furthermore, there is a need for future research to investigate the deployment of anomaly detection to a vehicle to assess its performance on the road. There is a notable lack of research on intrusion detection systems for protocols other than CAN, such as Ethernet and FlexRay.
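The five evaluation metrics named above all follow from the four confusion-matrix counts of a binary anomaly detector. The sketch below uses the standard definitions. The function name and the example counts are hypothetical and do not come from any study covered by the review.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute the five metrics from true/false positive and negative counts."""
    def ratio(numerator, denominator):
        # Return 0.0 rather than raising when a denominator is empty.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "false_positive_rate": ratio(fp, fp + tn),
    }

# Hypothetical counts: anomalies are rare, so normal traffic dominates.
metrics = detection_metrics(tp=80, fp=10, tn=900, fn=20)
```

With imbalanced data of this kind, accuracy stays high even when a fifth of the anomalies are missed, which is one reason to report several metrics together, as the review recommends.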
# Recall Them All: Retrieval-Augmented Language Models for Long Object List Extraction from Long Documents

Recall Them All: Retrieval-Augmented Language Models for Long Object List Extraction from Long Documents ( http://arxiv.org/abs/2405.02732v1 )

Sneha Singhania, Simon Razniewski, Gerhard Weikum, (参考訳) テキストから関係抽出する方法は、リコールの制限を犠牲にして、主に高精度に焦点をあてる。 しかし、高いリコールは、特定の主題と特定の関係にあるオブジェクトエンティティの長いリストをポップアップさせるのに不可欠である。 関連オブジェクトのキューは、長いテキストで多くのパスに分散することができる。 これは長いテキストから長いリストを抽出することの難しさを浮き彫りにする。 本稿では,L3X法を2段階に分けて提案する手法について述べる。(1)大規模言語モデル(LLM)を用いたリコール指向生成と,(2)精度指向の精査による候補の検証と検証を行う。 我々のL3X法はLLMのみの世代をかなりの差で上回る。

Methods for relation extraction from text mostly focus on high precision, at the cost of limited recall. High recall is crucial, though, to populate long lists of object entities that stand in a specific relation with a given subject. Cues for relevant objects can be spread across many passages in long texts. This poses the challenge of extracting long lists from long texts. We present the L3X method which tackles the problem in two stages: (1) recall-oriented generation using a large language model (LLM) with judicious techniques for retrieval augmentation, and (2) precision-oriented scrutinization to validate or prune candidates. Our L3X method outperforms LLM-only generations by a substantial margin.
# ソフトウェア工学研究論文を読むためのダイアグラム化手法--経験報告

A Diagramming Technique for Teaching Students to Read Software Engineering Research Papers: an experience report ( http://arxiv.org/abs/2405.02734v1 )

Mary Shaw, (参考訳) 科学研究論文を読むことは、多くの学生が博士課程に入る前に学ばないスキルであるが、その成功には欠かせない。 本稿では,この技術を教えるための図式化手法について述べる。 これにより、学生はより効果的な読者になった。

Reading scientific research papers is a skill that many students do not learn before entering PhD programs, but it is critical to their success. This paper describes our diagramming technique for teaching this skill, which helps students identify the structure and the scientific argument of a paper. This has made our students more effective readers.
# 大規模言語モデルを用いた知識グラフ補完の相関予測

Relations Prediction for Knowledge Graph Completion using Large Language Models ( http://arxiv.org/abs/2405.02738v1 )

Sakher Khalil Alqaaidi, Krzysztof Kochut, (参考訳) 知識グラフは、構造化形式で事実を表現するために広く使われている。 大規模な応用のため、知識グラフは不完全である。 関係予測タスクは、各一対のノードに1つ以上の可能な関係を割り当てて知識グラフ補完を得る。 本研究では,関係予測タスクにおいて,知識グラフノード名を用いて大規模言語モデルを微調整する。 ノード名のみを利用することで、帰納的設定でモデルが十分に動作できるようにします。 実験の結果,広く利用されている知識グラフベンチマークにおいて,新たなスコアが得られた。

Knowledge graphs have been widely used to represent facts in a structured format. Because of their large-scale applications, knowledge graphs are often incomplete. The relation prediction task supports knowledge graph completion by assigning one or more possible relations to each pair of nodes. In this work, we use knowledge graph node names to fine-tune a large language model for the relation prediction task. By using node names only, we enable our model to operate in inductive settings. Our experiments show that we achieve new scores on a widely used knowledge graph benchmark.
# パフォーマンスを超えて - LLMにおけるラベルバイアスの定量化と緩和

Beyond Performance: Quantifying and Mitigating Label Bias in LLMs ( http://arxiv.org/abs/2405.02743v1 )

Yuval Reif, Roy Schwartz, (参考訳) 大規模言語モデル(LLM)は、命令を含むコンテキストプロンプトや最小限の入出力例を活用することで、多様なタスクに顕著な適応性を示す。 しかし、最近の研究はラベルバイアスも明らかにした。 それでも、このバイアスを確実にかつ大規模に検出し、測定することは、比較的未発見のままである。 本研究では,モデル予測におけるラベルバイアスの定量化のための様々なアプローチを評価し,279の分類タスクと10のLLMを包括的に調査した。 本研究は, 脱バイアス前後のモデルに有意なラベルバイアスを生じさせるとともに, 従来は使用されていなかった結果に基づく評価指標の重要性を浮き彫りにする。 さらに,ラベルバイアスの低減と性能向上の両面において,最近のキャリブレーション手法よりも優れたラベルバイアス校正法を提案する。 以上の結果から,LSMの予測におけるラベルバイアスが信頼性の障壁であることが示唆された。

Large language models (LLMs) have shown remarkable adaptability to diverse tasks, by leveraging context prompts containing instructions, or minimal input-output examples. However, recent work revealed they also exhibit label bias -- an undesirable preference toward predicting certain answers over others. Still, detecting and measuring this bias reliably and at scale has remained relatively unexplored. In this study, we evaluate different approaches to quantifying label bias in a model's predictions, conducting a comprehensive investigation across 279 classification tasks and ten LLMs. Our investigation reveals substantial label bias in models both before and after debiasing attempts, as well as highlights the importance of outcomes-based evaluation metrics, which were not previously used in this regard. We further propose a novel label bias calibration method tailored for few-shot prompting, which outperforms recent calibration approaches for both improving performance and mitigating label bias. Our results emphasize that label bias in the predictions of LLMs remains a barrier to their reliability.
# 不完全なクライアント参加の有無によるサーバ支援フェデレーション学習の理解

Understanding Server-Assisted Federated Learning in the Presence of Incomplete Client Participation ( http://arxiv.org/abs/2405.02745v1 )

Haibo Yang, Peiwen Qiu, Prashant Khanduri, Minghong Fang, Jia Liu, (参考訳) 連邦学習(FL)における既存の作業は、多くの場合、完全なクライアントまたは均一に分散されたクライアントの参加を伴う理想的なシステムを前提とします。 しかし、実際には、システムの不均一性要因が無数にあるため、一部のクライアントがFLトレーニング(いわゆる不完全なクライアント参加)に参加できないことが観察されている。 不完全なクライアント参加の影響を緩和するための一般的なアプローチは、サーバに補助データセットを備えたサーバ支援連合学習(SA-FL)フレームワークである。 しかしながら、SA-FLが不完全なクライアント参加問題に対処する上で有効であることが実証的に証明されているにもかかわらず、SA-FLの理論的理解はいまだに欠如している。 一方、従来のFLにおける不完全なクライアント参加の意義もよく理解されていない。 これらの理論的ギャップは、SA-FLを厳格に調査する動機となっている。 この目的のために, 従来の FL は不完全なクライアント参加の下で PAC を学習可能であることを示す。 そして,不完全なクライアント参加を伴うFLのPAC学習性は,理論上初めてSA-FLの使用を正当化するSA-FLによって再現可能であることを示す。 最後に, 従来の FL と同じ線形収束速度保証を, 理想的なクライアント参加仮定で実現し, 収束保証付きの最初の SA-FL アルゴリズムを提供する$\mathsf{SAFARI}$ (server-assisted Federated Averaging) アルゴリズムを提案する。 異なるデータセットに対する大規模な実験では、$\mathsf{SAFARI}$が不完全なクライアント参加時のパフォーマンスを大幅に改善している。

Existing works in federated learning (FL) often assume an ideal system with either full client or uniformly distributed client participation. However, in practice, it has been observed that some clients may never participate in FL training (aka incomplete client participation) due to a myriad of system heterogeneity factors. A popular approach to mitigate impacts of incomplete client participation is the server-assisted federated learning (SA-FL) framework, where the server is equipped with an auxiliary dataset. However, although SA-FL has been empirically shown to be effective in addressing the incomplete client participation problem, there remains a lack of theoretical understanding of SA-FL. Meanwhile, the ramifications of incomplete client participation in conventional FL are also poorly understood. These theoretical gaps motivate us to rigorously investigate SA-FL. Toward this end, we first show that conventional FL is not PAC-learnable under incomplete client participation in the worst case. Then, we show that the PAC-learnability of FL with incomplete client participation can indeed be revived by SA-FL, which theoretically justifies the use of SA-FL for the first time. Lastly, to provide practical guidance for SA-FL training under incomplete client participation, we propose the $\mathsf{SAFARI}$ (server-assisted federated averaging) algorithm that enjoys the same linear convergence speedup guarantees as classic FL with ideal client participation assumptions, offering the first SA-FL algorithm with a convergence guarantee. Extensive experiments on different datasets show that $\mathsf{SAFARI}$ significantly improves performance under incomplete client participation.
# サブゴール蒸留 : 小言語エージェントの改良手法

Sub-goal Distillation: A Method to Improve Small Language Agents ( http://arxiv.org/abs/2405.02749v1 )

Maryam Hashemzadeh, Elias Stengel-Eskin, Sarath Chandar, Marc-Alexandre Cote, (参考訳) 大規模言語モデル(LLM)は対話型タスクのエージェントとして大きな可能性を示してきたが、その相当な計算要件と制限された呼び出し数は、特に意思決定のような長期的対話型タスクや継続的なタスクを含むシナリオにおいて、その実用性を制限している。 これらの制約に対処するために,数十億のパラメータを持つLLMの性能を,はるかに小さな言語モデル(770Mパラメータ)に転送する手法を提案する。 提案手法では,LLMからの知識蒸留によってサブゴールの生成を学習する計画モジュールと,基本動作を用いてこれらのサブゴールを達成することを学習する実行モジュールから構成される階層的エージェントを構築する。 より詳しくは、LLMを利用して、目標を達成するための一連のサブゴールでオラクルパスに注釈を付ける。 その後、この注釈付きデータを使用して、計画モジュールと実行モジュールの両方を微調整する。 重要なことは、どちらのモジュールも推論中にLLMへのリアルタイムアクセスに依存しておらず、LLMとの相互作用に関連する全体的なコストを固定コストにまで大幅に削減する。 難易度の高いマルチタスクの対話型テキスト環境であるScienceWorldでは,基本動作のみに基づく標準的な模倣学習を16.7%(絶対値)上回っている。 我々の分析は、他のLLMベースの手法と比較して、我々のアプローチの効率性を強調している。 私たちのコードと蒸留のための注釈付きデータはGitHubで参照できます。

While Large Language Models (LLMs) have demonstrated significant promise as agents in interactive tasks, their substantial computational requirements and restricted number of calls constrain their practical utility, especially in long-horizon interactive tasks such as decision-making or in scenarios involving continuous ongoing tasks. To address these constraints, we propose a method for transferring the performance of an LLM with billions of parameters to a much smaller language model (770M parameters). Our approach involves constructing a hierarchical agent comprising a planning module, which learns through Knowledge Distillation from an LLM to generate sub-goals, and an execution module, which learns to accomplish these sub-goals using elementary actions. In detail, we leverage an LLM to annotate an oracle path with a sequence of sub-goals towards completing a goal. Subsequently, we utilize this annotated data to fine-tune both the planning and execution modules. Importantly, neither module relies on real-time access to an LLM during inference, significantly reducing the overall cost associated with LLM interactions to a fixed cost. In ScienceWorld, a challenging and multi-task interactive text environment, our method surpasses standard imitation learning based solely on elementary actions by 16.7% (absolute). Our analysis highlights the efficiency of our approach compared to other LLM-based methods. Our code and annotated data for distillation can be found on GitHub.
# コントラストデコーディングによる大規模言語モデルにおける文脈理解の促進

Enhancing Contextual Understanding in Large Language Models through Contrastive Decoding ( http://arxiv.org/abs/2405.02750v1 )

Zheng Zhao, Emilio Monti, Jens Lehmann, Haytham Assem, (参考訳) 大規模言語モデル(LLM)は、テキスト生成中に入力コンテキストを不適切に統合する傾向にあり、モデルパラメータのエンコードされた事前知識に過度に依存するため、事実的不整合や文脈的に不整合なコンテンツを生成する可能性がある。 LLMは2つの主要な知識源を利用する。 1)事前訓練からの事前(パラメトリック)知識、及び 2)入力プロンプトからの文脈的(非パラメトリック)知識。 この研究は、LLMが生成過程、特にオープンドメイン質問応答の文脈において、これらの知識ソースを効果的にバランスさせる方法について、オープンな疑問に対処する。 この問題に対処するため, 逆無関係なパスを負のサンプルとして, コントラッシブデコーディングを統合することによって, 生成時の強靭なコンテキストグラウンド化を向上する手法を提案する。 特に,本手法は,さらなるトレーニングを必要とせず,推論時に動作可能である。 我々は,その適用性と有効性を示す総合的な実験を行い,既存の方法論よりもその優位性を示す実証的な証拠を提供する。 私たちのコードは、https://github.com/amazon-science/ContextualUnderstanding-ContrastiveDecodingで公開されています。

Large language models (LLMs) tend to inadequately integrate input context during text generation, relying excessively on encoded prior knowledge in model parameters, potentially resulting in generated text with factual inconsistencies or contextually unfaithful content. LLMs utilize two primary knowledge sources: 1) prior (parametric) knowledge from pretraining, and 2) contextual (non-parametric) knowledge from input prompts. The study addresses the open question of how LLMs effectively balance these knowledge sources during the generation process, specifically in the context of open-domain question answering. To address this issue, we introduce a novel approach integrating contrastive decoding with adversarial irrelevant passages as negative samples to enhance robust context grounding during generation. Notably, our method operates at inference time without requiring further training. We conduct comprehensive experiments to demonstrate its applicability and effectiveness, providing empirical evidence showcasing its superiority over existing methodologies. Our code is publicly available at: https://github.com/amazon-science/ContextualUnderstanding-ContrastiveDecoding.
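
The decoding idea in this abstract can be illustrated with a single toy next-token step. What follows is a minimal sketch and not the authors' implementation: the combination rule `(1 + alpha) * log p(token | relevant context) - alpha * log p(token | irrelevant context)`, the weight `alpha`, the three-word vocabulary, and the logits are all assumptions chosen for illustration.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def contrastive_scores(logits_relevant, logits_irrelevant, alpha=1.0):
    """Assumed rule: amplify what the relevant context supports and
    penalize what the model predicts even with an irrelevant context."""
    lp_rel = log_softmax(logits_relevant)
    lp_irr = log_softmax(logits_irrelevant)
    return [(1 + alpha) * r - alpha * i for r, i in zip(lp_rel, lp_irr)]

# Hypothetical logits for one decoding step (not taken from any real model).
vocab = ["Paris", "Lyon", "Rome"]
logits_with_relevant = [2.0, 2.2, 0.1]    # prompt contains the supporting passage
logits_with_irrelevant = [0.5, 2.5, 0.1]  # prompt contains an adversarial irrelevant passage

def argmax(values):
    return max(range(len(values)), key=lambda k: values[k])

greedy = vocab[argmax(logits_with_relevant)]
contrastive = vocab[argmax(contrastive_scores(logits_with_relevant, logits_with_irrelevant))]
print(greedy, contrastive)
```

In this toy setup the token that scores highly regardless of context ("Lyon") wins under plain greedy decoding, while the contrastive score prefers the token that the relevant passage specifically supports ("Paris"). No training is involved, which matches the inference-time nature described in the abstract.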
# Image Anti-forensicsのためのDeep Image Restoration

Deep Image Restoration For Image Anti-Forensics ( http://arxiv.org/abs/2405.02751v1 )

Eren Tahir, Mert Bal, (参考訳) 画像フォレンジクスは画像が改ざんされたかどうかを扱う一方で、画像アンチフォレンジクスは、画像フォレンジクス手法が改ざんされた画像を検出するのを防ごうとする。 この2つの分野の競争は、深層学習の進展よりずっと前に始まった。 JPEG圧縮、ぼかし、ノイズ付加は、今日の基準では単純な方法であるが、長い間アンチフォレンジクスに使われており、フォレンジクスとアンチフォレンジクスの両方で多くの研究の対象となっている。 これらの従来の手法は古いが、偽画像の検出を困難にし、深層画像偽造検出モデルの訓練におけるデータ拡張にも使用される。 画像の検出を困難にすることに加えて、これらの手法は画像に痕跡を残し、結果として画質を劣化させる。 これらの痕跡を検出するために、別の画像フォレンジクス手法も開発されている。 本研究では、さらに一歩進んで、これらの手法を適用した後の画像品質を深層画像復元モデルで改善し、偽造画像の検出をより困難にする。 これらの手法が画質に与える影響を評価する。 次に、既存の2つの最良の画像操作検出モデルに対して、深層学習を用いた提案手法と深層学習を用いない手法の両方を試験する。 得られた結果から、既存の画像偽造検出モデルが提案手法に対して失敗することを示す。 コードの実装はhttps://github.com/99eren99/DIRFIAFで公開される。

While image forensics is concerned with whether an image has been tampered with, image anti-forensics attempts to prevent image forensics methods from detecting tampered images. The competition between these two fields started long before the advancement of deep learning. JPEG compression, blurring and noising, which are simple methods by today's standards, have long been used for anti-forensics and have been the subject of much research in both forensics and anti-forensics. Although these traditional methods are old, they make it difficult to detect fake images and are used for data augmentation in training deep image forgery detection models. In addition to making the image difficult to detect, these methods leave traces on the image and consequently degrade the image quality. Separate image forensics methods have also been developed to detect these traces. In this study, we go one step further and improve the image quality after these methods with deep image restoration models and make it harder to detect the forged image. We evaluate the impact of these methods on image quality. We then test both our proposed methods with deep learning and methods without deep learning on the two best existing image manipulation detection models. In the obtained results, we show how existing image forgery detection models fail against the proposed methods. Code implementation will be publicly available at https://github.com/99eren99/DIRFIAF .
# 安全な強化学習のための暗黙のセーフセットアルゴリズム

Implicit Safe Set Algorithm for Provably Safe Reinforcement Learning ( http://arxiv.org/abs/2405.02754v1 )

Weiye Zhao, Tairan He, Feihan Li, Changliu Liu, (参考訳) 深層強化学習(DRL)は多くの連続制御タスクにおいて顕著な性能を示した。 しかし、DRLの現実的な応用に対する大きな障害は、安全保証の欠如である。 DRLエージェントは報酬形成によって期待値の意味でシステムの安全性を満たすことができるが、各時間ステップで常にハード制約(例えば安全仕様)を満たすようにエージェントを設計することは、依然として非常に難しい課題である。 対照的に、安全制御分野における既存の研究は、ハードな安全制約の持続的な充足を保証する。 しかし、これらの手法は、安全な制御を合成するために明示的な解析的システムダイナミクスモデルを必要とし、これは通常DRLの設定ではアクセスできない。 本稿では,DRLエージェントのセーフガードを合成し,トレーニングを通して証明可能な安全性を保証するモデルフリー安全制御アルゴリズム,暗黙的安全集合アルゴリズムを提案する。 提案アルゴリズムは,ブラックボックスの動的関数(例えば,デジタルツインシミュレータ)をクエリすることのみによって,安全指標(バリア証明書)とそれに続く安全制御則を合成する。 さらに、暗黙的安全集合アルゴリズムは、連続時間系と離散時間系の両方において、安全集合への有限時間収束と前方不変性を保証することを理論的に証明する。 提案アルゴリズムを最先端のSafety Gymベンチマークで検証し、最先端の安全DRL法と比較して$95\% \pm 9\%$の累積報酬を得ながら、安全性違反をゼロにする。 さらに、得られたアルゴリズムは並列計算により高次元システムによくスケールする。

Deep reinforcement learning (DRL) has demonstrated remarkable performance in many continuous control tasks. However, a significant obstacle to the real-world application of DRL is the lack of safety guarantees. Although DRL agents can satisfy system safety in expectation through reward shaping, designing agents to consistently meet hard constraints (e.g., safety specifications) at every time step remains a formidable challenge. In contrast, existing work in the field of safe control provides guarantees on persistent satisfaction of hard safety constraints. However, these methods require explicit analytical system dynamics models to synthesize safe control, which are typically inaccessible in DRL settings. In this paper, we present a model-free safe control algorithm, the implicit safe set algorithm, for synthesizing safeguards for DRL agents that ensure provable safety throughout training. The proposed algorithm synthesizes a safety index (barrier certificate) and a subsequent safe control law solely by querying a black-box dynamic function (e.g., a digital twin simulator). Moreover, we theoretically prove that the implicit safe set algorithm guarantees finite time convergence to the safe set and forward invariance for both continuous-time and discrete-time systems. We validate the proposed algorithm on the state-of-the-art Safety Gym benchmark, where it achieves zero safety violations while gaining $95\% \pm 9\%$ cumulative reward compared to state-of-the-art safe DRL methods. Furthermore, the resulting algorithm scales well to high-dimensional systems with parallel computing.
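
The safeguard described in this abstract can be pictured as a filter that only queries a black-box simulator. The sketch below is a simplified illustration under stated assumptions, not the paper's algorithm: the 1-D dynamics `black_box_step`, the hand-written `safety_index`, the grid of candidate controls, and the acceptance condition `phi(x_next) <= max(phi(x) - eta, 0)` are all chosen here for illustration.

```python
def black_box_step(x, u, dt=0.1):
    """Stand-in for a simulator that can be queried but not inspected."""
    return x + u * dt

def safety_index(x, x_max=1.0):
    """phi(x) <= 0 means safe; in this toy the unsafe region is x > x_max."""
    return x - x_max

def safeguard(x, u_nominal, eta=0.05, u_limit=2.0, n_candidates=41):
    """Return the candidate control closest to the nominal one whose
    simulated next state satisfies the assumed safety condition."""
    bound = max(safety_index(x) - eta, 0.0)
    grid = [-u_limit + 2 * u_limit * k / (n_candidates - 1) for k in range(n_candidates)]
    candidates = [u_nominal] + grid
    safe = [u for u in candidates if safety_index(black_box_step(x, u)) <= bound]
    if not safe:
        # Fallback: no candidate passes, so pick the one with the lowest next index.
        return min(candidates, key=lambda u: safety_index(black_box_step(x, u)))
    return min(safe, key=lambda u: abs(u - u_nominal))

# A nominal policy that always pushes toward the unsafe region.
x = 0.0
trajectory = [x]
for _ in range(50):
    u = safeguard(x, u_nominal=2.0)
    x = black_box_step(x, u)
    trajectory.append(x)

print(max(trajectory))
```

The filter never uses an analytical model of the dynamics, only calls to `black_box_step`, which mirrors the model-free setting in the abstract. The state approaches the boundary `x_max` and then stays inside the safe set even though the nominal control keeps pushing outward.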
# TK-Planes:動的UAVシーンのための高次元特徴ベクトルを備えた階層化K-Planes

TK-Planes: Tiered K-Planes with High Dimensional Feature Vectors for Dynamic UAV-based Scenes ( http://arxiv.org/abs/2405.02762v1 )

Christopher Maxey, Jaehoon Choi, Yonghan Lee, Hyungtae Lee, Dinesh Manocha, Heesung Kwon, (参考訳) 本稿では,無人航空機(UAV)ベースの認識における合成データと実世界データの間の領域ギャップを埋める新しい手法を提案する。 我々の定式化は、動く物体や人間の行動からなる動的なシーン向けに設計されており、その目的はポーズや行動を認識することである。 我々は,K-Planes Neural Radiance Field (NeRF)の拡張を提案し,アルゴリズムは階層化された特徴ベクトルの集合を保持する。 階層化された特徴ベクトルは、シーンに関する概念情報を効果的にモデル化するように生成され、あわせて出力された特徴マップをRGB画像に変換する画像デコーダも用いる。 本手法は,シーン内の静的および動的物体の両方の情報を活用し,高高度映像の顕著なシーン特性を捉えることができる。 我々は,Okutama Action や UG2 などの挑戦的データセットで性能評価を行い,最先端の空中認識アルゴリズムよりも精度が大幅に向上したことを示す。

In this paper, we present a new approach to bridge the domain gap between synthetic and real-world data for unmanned aerial vehicle (UAV)-based perception. Our formulation is designed for dynamic scenes, consisting of moving objects or human actions, where the goal is to recognize the pose or actions. We propose an extension of K-Planes Neural Radiance Field (NeRF), wherein our algorithm stores a set of tiered feature vectors. The tiered feature vectors are generated to effectively model conceptual information about a scene as well as an image decoder that transforms output feature maps into RGB images. Our technique leverages the information amongst both static and dynamic objects within a scene and is able to capture salient scene attributes of high altitude videos. We evaluate its performance on challenging datasets, including Okutama Action and UG2, and observe considerable improvement in accuracy over state-of-the-art aerial perception algorithms.
# 大規模言語モデルの敵対的ロバスト性の評価 : 実証的研究

Assessing Adversarial Robustness of Large Language Models: An Empirical Study ( http://arxiv.org/abs/2405.02764v1 )

Zeyu Yang, Zhao Meng, Xiaochen Zheng, Roger Wattenhofer, (参考訳) 大規模言語モデル(LLM)は自然言語処理に革命をもたらしたが、敵の攻撃に対する頑強さは依然として重要な問題である。 Llama, OPT, T5 など,主要なオープンソース LLM の脆弱性を露呈する,新しいホワイトボックス方式の攻撃手法を提案する。 本研究では, モデルサイズ, 構造, 微調整が対向的摂動抵抗に及ぼす影響を評価する。 5つのテキスト分類タスクの総合的な評価により,LLMのロバスト性に対する新たなベンチマークが確立される。 本研究の成果は,LLMを現実のアプリケーションに確実に展開すること,信頼性の高いAIシステムの進歩に寄与することにつながる。

Large Language Models (LLMs) have revolutionized natural language processing, but their robustness against adversarial attacks remains a critical concern. We present a novel white-box style attack approach that exposes vulnerabilities in leading open-source LLMs, including Llama, OPT, and T5. We assess the impact of model size, structure, and fine-tuning strategies on their resistance to adversarial perturbations. Our comprehensive evaluation across five diverse text classification tasks establishes a new benchmark for LLM robustness. The findings of this study have far-reaching implications for the reliable deployment of LLMs in real-world applications and contribute to the advancement of trustworthy AI systems.
# 言語モデルにおける編集知識の検出

Detecting Edited Knowledge in Language Models ( http://arxiv.org/abs/2405.02765v1 )

Paul Youssef, Zhixue Zhao, Jörg Schlötterer, Christin Seifert, (参考訳) 知識編集技術(KE)は、事前学習から学んだ言語モデルの時代遅れまたは不正確な知識を更新することができる。 しかしKEはまた、誤情報や有害なコンテンツを挿入するなど、潜在的に悪意のあるアプリケーションに直面している。 さらに、責任あるAIの文脈では、エンドユーザは、生成されたアウトプットが編集された知識によって駆動されているか、事前トレーニングからファーストハンドの知識によって駆動されているかを知るように指示される。 そこで本研究では,言語モデルにおける編集された知識を,新たなタスクを導入することで検出する。編集されたモデルとモデルが生成する特定の知識が与えられた場合,その知識を(事前学習に基づく)「非編集」あるいは「編集」のいずれかに分類することを目的とする。 2つの最先端KE、2つの言語モデル、2つのデータセットでタスクを開始する。 さらに,隠れ状態表現を入力特徴とするロジスティック回帰モデルRepRegを提案する。 我々の結果は、RepRegが強いベースラインを確立し、99.81%のピーク精度と97.79%のドメイン外設定を実現していることを示している。 第二に、RepRegは限られたトレーニングセット(200のトレーニングサンプル)でほぼ最適のパフォーマンスを達成し、ドメイン外の設定でもパフォーマンスを維持する。 最後に、同じ主題や対象を含む場合、編集された知識と非編集された知識を分離することはより困難である。

Knowledge editing techniques (KEs) can update language models' obsolete or inaccurate knowledge learned from pre-training. However, KE also faces potential malicious applications, e.g. inserting misinformation and toxic content. Moreover, in the context of responsible AI, it is instructive for end-users to know whether a generated output is driven by edited knowledge or first-hand knowledge from pre-training. To this end, we study detecting edited knowledge in language models by introducing a novel task: given an edited model and a specific piece of knowledge the model generates, our objective is to classify the knowledge as either "non-edited" (based on the pre-training), or "edited" (based on subsequent editing). We initiate the task with two state-of-the-art KEs, two language models, and two datasets. We further propose a simple classifier, RepReg, a logistic regression model that takes hidden state representations as input features. Our results reveal that RepReg establishes a strong baseline, achieving a peak accuracy of 99.81%, and 97.79% in out-of-domain settings. Second, RepReg achieves near-optimal performance with a limited training set (200 training samples), and it maintains its performance even in out-of-domain settings. Last, we find it more challenging to separate edited and non-edited knowledge when they contain the same subject or object.
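
The classifier described in this abstract is a logistic regression over hidden-state features. The following is a minimal sketch of that idea only: the 8-dimensional synthetic vectors produced by the hypothetical helper `fake_hidden_states` stand in for real hidden states, and the mean shift that separates the two classes is an assumption, so the accuracy it prints says nothing about the paper's results.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fake_hidden_states(n, dim, edited, rng):
    """Hypothetical stand-in for hidden states: 'edited' samples are shifted."""
    shift = 2.0 if edited else 0.0
    return [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(n)]

def train_logreg(features, labels, lr=0.1, epochs=300):
    """Full-batch gradient descent on the logistic loss."""
    dim = len(features[0])
    w, b = [0.0] * dim, 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, t in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - t
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Label 1 means 'edited', label 0 means 'non-edited'."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

rng = random.Random(0)
dim = 8
train_x = fake_hidden_states(100, dim, False, rng) + fake_hidden_states(100, dim, True, rng)
train_y = [0] * 100 + [1] * 100  # 200 training samples in total
test_x = fake_hidden_states(100, dim, False, rng) + fake_hidden_states(100, dim, True, rng)
test_y = [0] * 100 + [1] * 100

w, b = train_logreg(train_x, train_y)
accuracy = sum(predict(w, b, x) == y for x, y in zip(test_x, test_y)) / len(test_y)
print(f"held-out accuracy on synthetic data: {accuracy:.3f}")
```

With real models, each feature vector would presumably be read from a chosen layer and token position of the edited model. Which layer and position to use is a design choice this sketch does not address.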
# 単一モーダル学習を超えて: 生涯学習における複数のモダリティの統合の重要性

Beyond Unimodal Learning: The Importance of Integrating Multiple Modalities for Lifelong Learning ( http://arxiv.org/abs/2405.02766v1 )

Fahad Sarfraz, Bahram Zonooz, Elahe Arani, (参考訳) 人間は継続学習(CL)に優れ、ディープニューラルネットワーク(DNN)は破滅的な忘れを見せる。 効果的なCLを可能にする脳の健全な特徴は、DNNで過小評価されている学習と推論に複数のモダリティを利用することである。 そこで本稿では,マルチモーダル連続学習のためのベンチマークを導入するとともに,マルチモーダル学習における複数モーダルの役割と相互作用について考察する。 以上の結果から,複数のビューと相補的な情報を複数のモーダルから活用することで,より正確かつ堅牢な表現を学習できることが示唆された。 これにより、モデルがモダリティ固有の規則性に弱くなり、忘れをかなり軽減する。 さらに、分布シフトに対して、個々のモーダルが様々な強靭性を示すことが観察された。 最後に,各モダリティにおけるデータ点間の関係構造的類似性を利用して,異なるモダリティからの情報を統合・整合する手法を提案する。 本手法は,単モーダル推論と多モーダル推論の両方を可能にする強力なベースラインを設定する。 本研究は,CLの実現における複数のモダリティの役割をさらに探求する上で有望な事例であり,今後の研究のための標準ベンチマークを提供する。

While humans excel at continual learning (CL), deep neural networks (DNNs) exhibit catastrophic forgetting. A salient feature of the brain that allows effective CL is that it utilizes multiple modalities for learning and inference, which is underexplored in DNNs. Therefore, we study the role and interactions of multiple modalities in mitigating forgetting and introduce a benchmark for multimodal continual learning. Our findings demonstrate that leveraging multiple views and complementary information from multiple modalities enables the model to learn more accurate and robust representations. This makes the model less vulnerable to modality-specific regularities and considerably mitigates forgetting. Furthermore, we observe that individual modalities exhibit varying degrees of robustness to distribution shift. Finally, we propose a method for integrating and aligning the information from different modalities by utilizing the relational structural similarities between the data points in each modality. Our method sets a strong baseline that enables both single- and multimodal inference. Our study provides a promising case for further exploring the role of multiple modalities in enabling CL and provides a standard benchmark for future research.
# エントロピー規則化ゲームにおける独立自然政策勾配の線形収束

Linear Convergence of Independent Natural Policy Gradient in Games with Entropy Regularization ( http://arxiv.org/abs/2405.02769v1 )

Youbang Sun, Tao Liu, P. R. Kumar, Shahin Shahrampour, (参考訳) 本研究は,マルチエージェント強化学習におけるエントロピー規則化独立自然政策勾配(NPG)アルゴリズムに焦点を当てる。 この研究において、エージェントは正確な政策評価を持つ神託にアクセスでき、それぞれの独立報酬を最大化しようとすると仮定される。 各個人の報酬は、マルチエージェントシステムのすべてのエージェントの行動に依存すると仮定され、エージェント間のゲームに繋がる。 我々は、すべてのエージェントが、エントロピー正則化の導入によって強制される有界な合理性を持つポリシーの下で決定を下すと仮定する。 実際には、より小さな正規化はエージェントがより合理的でナッシュポリシーに近い振る舞いをすることを意味する。 一方、より大きな正則化を持つエージェントはよりランダムに作用し、より多くの探索を可能にする。 十分なエントロピー正則化の下で、この系の力学は線形速度で量子応答平衡(QRE)に収束することを示す。 正規化仮定は,QREがナッシュ均衡を近似することを妨げているが,本研究は協調ゲーム,ポテンシャルゲーム,2プレーヤマトリクスゲームなど,幅広いゲームに適用できる。 我々はまた、理論解析の検証として、複数のゲーム(マルコフゲームを含む)に対して広範な実験結果を提供する。

This work focuses on the entropy-regularized independent natural policy gradient (NPG) algorithm in multi-agent reinforcement learning. In this work, agents are assumed to have access to an oracle with exact policy evaluation and seek to maximize their respective independent rewards. Each individual's reward is assumed to depend on the actions of all the agents in the multi-agent system, leading to a game between agents. We assume all agents make decisions under a policy with bounded rationality, which is enforced by the introduction of entropy regularization. In practice, a smaller regularization implies the agents are more rational and behave closer to Nash policies. On the other hand, agents with larger regularization act more randomly, which ensures more exploration. We show that, under sufficient entropy regularization, the dynamics of this system converge at a linear rate to the quantal response equilibrium (QRE). Although regularization assumptions prevent the QRE from approximating a Nash equilibrium, our findings apply to a wide range of games, including cooperative, potential, and two-player matrix games. We also provide extensive empirical results on multiple games (including Markov games) as a verification of our theoretical analysis.
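
The dynamics described in this abstract can be tried on a small two-player matrix game. The sketch below is an illustration under assumptions and does not reproduce the paper's setting: it uses the multiplicative update $\pi_i^{t+1}(a) \propto \pi_i^{t}(a)^{1-\eta\tau}\exp(\eta\, r_i(a))$, a form commonly used for entropy-regularized NPG with softmax policies, on a single-state game with made-up payoff matrices `A` and `B`, step size `eta`, and regularization `tau`.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def npg_step(policy, rewards, eta, tau):
    """Assumed update: pi_new(a) proportional to pi(a)^(1 - eta*tau) * exp(eta * r(a))."""
    logits = [(1.0 - eta * tau) * math.log(p) + eta * r for p, r in zip(policy, rewards)]
    return softmax(logits)

# Hypothetical payoffs, indexed [action of player 1][action of player 2].
A = [[1.0, 0.0], [0.3, 0.6]]  # rewards of player 1
B = [[0.2, 0.9], [0.7, 0.1]]  # rewards of player 2

def expected_rewards(pi1, pi2):
    """Exact policy evaluation: each player's expected reward per own action."""
    r1 = [sum(A[a][b] * pi2[b] for b in range(2)) for a in range(2)]
    r2 = [sum(B[a][b] * pi1[a] for a in range(2)) for b in range(2)]
    return r1, r2

eta, tau = 0.1, 2.0
pi1, pi2 = [0.9, 0.1], [0.2, 0.8]
for _ in range(500):
    r1, r2 = expected_rewards(pi1, pi2)
    # Independent updates: each player uses only its own expected rewards.
    pi1, pi2 = npg_step(pi1, r1, eta, tau), npg_step(pi2, r2, eta, tau)

# At a QRE each policy is the softmax of its own expected rewards divided by tau.
r1, r2 = expected_rewards(pi1, pi2)
qre1 = softmax([r / tau for r in r1])
qre2 = softmax([r / tau for r in r2])
residual = max(abs(p - q) for p, q in zip(pi1 + pi2, qre1 + qre2))
print(pi1, pi2, residual)
```

With `tau` this large relative to the payoff range, the iterates settle on the fixed point where each policy is a softmax response to the other, which is the QRE condition checked by `residual`. Shrinking `tau` moves that fixed point toward less random play but weakens the contraction this toy relies on.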
# PhilHumans: 個人の健康のために機械学習をベンチマークする

PhilHumans: Benchmarking Machine Learning for Personal Health ( http://arxiv.org/abs/2405.02770v1 )

Vadim Liventsev, Vivek Kumar, Allmin Pradhap Singh Susaiyah, Zixiu Wu, Ivan Rodin, Asfand Yaar, Simone Baloccu, Marharyta Beraziuk, Sebastiano Battiato, Giovanni Maria Farinella, Aki Härmä, Rim Helaoui, Milan Petkovic, Diego Reforgiato Recupero, Ehud Reiter, Daniele Riboni, Raymond Sterling, (参考訳) 医療における機械学習の利用は、患者の成果を改善し、医療のリーチと手頃な価格を拡大する可能性がある。 他の応用分野の歴史は、インテリジェントシステムの開発には強力なベンチマークが不可欠であることを示している。 我々は、HUman-Machine Natural Interaction(PhilHumans)を活用し、さまざまなヘルスケア設定、トークセラピー、ダイエットコーチング、緊急ケア、集中治療、産科ソノグラフィー、さらにはアクション予測、タイムリーモデリング、時間モデリング、インサイトマイニング、言語モデリング、コンピュータビジョン、強化学習、プログラム合成など、さまざまな学習設定を含む、機械学習のための総合的なベンチマークスイートであるPhilHumansを紹介します。

The use of machine learning in healthcare has the potential to improve patient outcomes as well as broaden the reach and affordability of healthcare. The history of other application areas indicates that strong benchmarks are essential for the development of intelligent systems. We present Personal Health Interfaces Leveraging HUman-MAchine Natural interactions (PhilHumans), a holistic suite of benchmarks for machine learning across different healthcare settings (talk therapy, diet coaching, emergency care, intensive care, obstetric sonography) as well as different learning settings, such as action anticipation, timeseries modeling, insight mining, language modeling, computer vision, reinforcement learning, and program synthesis.
# MMEarth:地理空間表現学習のためのマルチモーダル・プレテキスト・タスク

MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning ( http://arxiv.org/abs/2405.02771v1 )

Vishal Nedungadi, Ankit Kariryaa, Stefan Oehmcke, Serge Belongie, Christian Igel, Nico Lang, (参考訳) ラベルのない地球観測(EO)データの量は膨大であるが、多くの重要な応用にはラベル付きトレーニングデータがない。 しかし、EOデータには、地理的位置と時間に基づいて、さまざまなモダリティとセンサーからのデータを、人手のコストをほとんどかけずに自動的にペアリングできるというユニークな機会がある。 この機会を捉えて、グローバルスケールで多様なマルチモーダル事前トレーニングデータセットを作成します。 この120万箇所の新たなコーパスを用いて,光学衛星画像の汎用表現を学習するために,MP-MAE(Multi-Pretext Masked Autoencoder)アプローチを提案する。 我々のアプローチは、完全畳み込みのマスク付きオートエンコーダ(MAE)であるConvNeXt V2アーキテクチャに基づいている。 マルチモーダル・プレテキスト・タスクの組をベースとしたMP-MAEアプローチは、ImageNetで事前訓練されたMAEと、ドメイン固有の衛星画像で事前訓練されたMAEの両方よりも優れていることを示す。 これは、画像分類やセマンティックセグメンテーションを含むいくつかの下流タスクで示される。 また,光学衛星画像のみの事前学習に比べて,BigEarthNetでは4pp,So2Satでは16ppなど、マルチモーダル事前学習により線形探索性能が著しく向上することがわかった。 グローバルスケールアプリケーションにおいて重要な側面であるラベル効率やパラメータ効率も向上することを示す。

The volume of unlabelled Earth observation (EO) data is huge, but many important applications lack labelled training data. However, EO data offers the unique opportunity to pair data from different modalities and sensors automatically based on geographic location and time, at virtually no human labor cost. We seize this opportunity to create a diverse multi-modal pretraining dataset at global scale. Using this new corpus of 1.2 million locations, we propose a Multi-Pretext Masked Autoencoder (MP-MAE) approach to learn general-purpose representations for optical satellite images. Our approach builds on the ConvNeXt V2 architecture, a fully convolutional masked autoencoder (MAE). Drawing upon a suite of multi-modal pretext tasks, we demonstrate that our MP-MAE approach outperforms both MAEs pretrained on ImageNet and MAEs pretrained on domain-specific satellite images. This is shown on several downstream tasks including image classification and semantic segmentation. We find that multi-modal pretraining notably improves the linear probing performance, e.g. 4pp on BigEarthNet and 16pp on So2Sat, compared to pretraining on optical satellite images only. We show that this also leads to better label and parameter efficiency which are crucial aspects in global scale applications.
# 現実世界のビデオ顔復元に向けて:新しいベンチマーク

Towards Real-world Video Face Restoration: A New Benchmark ( http://arxiv.org/abs/2404.19500v2 )

Ziyan Chen, Jingwen He, Xinqi Lin, Yu Qiao, Chao Dong, (参考訳) 画像上のブラインド顔復元(BFR)はここ数年で大きく進歩しているが、視線方向や顔の向きなどのより複雑な顔の動きに対してより難しい実世界のビデオ顔復元(VFR)は未解決のままである。 典型的なBFR法は、プライベートに合成されたデータセットや、実際のビデオフレームのカバレッジに制限がある自己コンパイルされた現実世界の低品質の顔画像で評価される。 本研究では、主にビデオフレームから"Full, Occluded, and Side"の分類を用いたFOSと呼ばれる新しい実世界のデータセットを導入し、ビデオ上の現在の手法の適用性について検討した。 既存のテストデータセットと比較して、FOSデータセットはより多様な劣化をカバーし、より複雑なシナリオからの顔サンプルを含む。 確立されたデータセットから,最新のBFR手法とビデオスーパーレゾリューション(VSR)手法の両方をベンチマークし,VFRタスクにおけるその可能性と限界を特定した。 また,画像品質評価(IQA)指標と顔IQA(FIQA)指標の有効性を主観的ユーザスタディを用いて検討した。 実験結果と詳細な分析結果により,現在のBFR法とVSR法の両方の成功と失敗から知見を得た。 これらの結果は、現在の顔修復アプローチにも課題をもたらし、VFR研究の今後の進歩を期待する。

Blind face restoration (BFR) on images has progressed significantly over the last several years, while real-world video face restoration (VFR), which is more challenging because it involves more complex face motions such as changing gaze directions and facial orientations, remains unsolved. Typical BFR methods are evaluated on privately synthesized datasets or self-collected real-world low-quality face images, which are limited in their coverage of real-world video frames. In this work, we introduced new real-world datasets named FOS with a taxonomy of "Full, Occluded, and Side" faces, drawn mainly from video frames, to study the applicability of current methods on videos. Compared with existing test datasets, FOS datasets cover more diverse degradations and involve face samples from more complex scenarios, which helps to revisit current face restoration approaches more comprehensively. Given the established datasets, we benchmarked both the state-of-the-art BFR methods and the video super resolution (VSR) methods to comprehensively study current approaches, identifying their potential and limitations in VFR tasks. In addition, we studied the effectiveness of the commonly used image quality assessment (IQA) metrics and face IQA (FIQA) metrics by leveraging a subjective user study. With extensive experimental results and detailed analysis provided, we gained insights from the successes and failures of both current BFR and VSR methods. These results also pose challenges to current face restoration approaches, which we hope will stimulate future advances in VFR research.
# コントラストビジョン・ランゲージ事前学習におけるキャプション多様性のモデル化

Modeling Caption Diversity in Contrastive Vision-Language Pretraining ( http://arxiv.org/abs/2405.00740v2 )

Samuel Lavoie, Polina Kirichenko, Mark Ibrahim, Mahmoud Assran, Andrew Gordon Wildon, Aaron Courville, Nicolas Ballas, (参考訳) 画像のキャプションには数千の方法があります。 一方、CLIP(Contrastive Language Pretraining)は、イメージとそのキャプションを単一のベクタにマッピングすることで機能する。 本稿では,画像にマッチするキャプションの多様性をモデル化したLlip, Latent Language Image Pretrainingを紹介する。 Llipの視覚エンコーダは、テキストから派生した情報を条件付けして最終的な表現に混合された視覚的特徴のセットを出力する。 Llipは大規模エンコーダでも,CLIPやSigLIPのような非コンテクスト化されたベースラインよりも優れた性能を示す。 Llipは、平均2.9%のゼロショット分類ベンチマークをViT-G/14エンコーダで改善している。 具体的には、ImageNetでゼロショットのトップ-1の精度が83.5%に達し、同様の大きさのCLIPを1.4%上回っている。 また,MS-COCOのゼロショット検索を6.0%改善した。 提案手法によって導入されたコンポーネントの包括的分析を行い,Llipがよりリッチな視覚表現につながることを示す。

There are a thousand ways to caption an image. Contrastive Language Pretraining (CLIP), on the other hand, works by mapping an image and its caption to a single vector, which limits how well CLIP-like models can represent the diverse ways to describe an image. In this work, we introduce Llip, Latent Language Image Pretraining, which models the diversity of captions that could match an image. Llip's vision encoder outputs a set of visual features that are mixed into a final representation by conditioning on information derived from the text. We show that Llip outperforms non-contextualized baselines like CLIP and SigLIP on a variety of tasks even with large-scale encoders. Llip improves zero-shot classification by an average of 2.9% across zero-shot classification benchmarks with a ViT-G/14 encoder. Specifically, Llip attains a zero-shot top-1 accuracy of 83.5% on ImageNet, outperforming a similarly sized CLIP by 1.4%. We also demonstrate improvement on zero-shot retrieval on MS-COCO by 6.0%. We provide a comprehensive analysis of the components introduced by the method and demonstrate that Llip leads to richer visual representations.
# LidaRF:ストリートシーンのニューラルラジアンスフィールドにLidarを埋め込む

LidaRF: Delving into Lidar for Neural Radiance Field on Street Scenes ( http://arxiv.org/abs/2405.00900v2 )

Shanlin Sun, Bingbing Zhuang, Ziyu Jiang, Buyu Liu, Xiaohui Xie, Manmohan Chandraker, (参考訳) 光リアリスティックシミュレーションは、自律運転のようなアプリケーションにおいて重要な役割を担い、ニューラルラディアンスフィールド(NeRF)の進歩により、デジタル3Dアセットの自動作成によるスケーラビリティの向上が期待できる。 しかし、大半がコリニアカメラの動きとスペーサーサンプリングにより、道路の景観に復元品質が損なわれている。 一方、アプリケーションはしばしば、車線変更のような行動を正確にシミュレートするために、入力から逸脱するカメラビューからのレンダリングを要求する。 本稿では,Lidarデータを利用した街路におけるNeRF品質向上のためのいくつかの知見を提案する。 まず,ラディアンスデコーディングのための暗黙のグリッドベース表現と融合したLidarから幾何学的シーン表現を学習し,明示的な点雲によって提供されるより強力な幾何学的情報を提供する。 次に, 密度化ライダー点の蓄積による利用を可能にする, 密閉型奥行き監視方式を提案する。 第3に、さらなる改善のためにLidarポイントから強化されたトレーニングビューを生成します。 私たちの洞察は、実際の運転シーン下での新規ビュー合成を大幅に改善することにつながります。

Photorealistic simulation plays a crucial role in applications such as autonomous driving, where advances in neural radiance fields (NeRFs) may allow better scalability through the automatic creation of digital 3D assets. However, reconstruction quality suffers on street scenes due to largely collinear camera motions and sparser samplings at higher speeds. On the other hand, the application often demands rendering from camera views that deviate from the inputs to accurately simulate behaviors like lane changes. In this paper, we propose several insights that allow a better utilization of Lidar data to improve NeRF quality on street scenes. First, our framework learns a geometric scene representation from Lidar, which is fused with the implicit grid-based representation for radiance decoding, thereby supplying stronger geometric information offered by explicit point cloud. Second, we put forth a robust occlusion-aware depth supervision scheme, which allows utilizing densified Lidar points by accumulation. Third, we generate augmented training views from Lidar points for further improvement. Our insights translate to largely improved novel view synthesis under real driving scenes.
# バイレベル最適化とミニマックス最適化のための高速化された完全一階法

Accelerated Fully First-Order Methods for Bilevel and Minimax Optimization ( http://arxiv.org/abs/2405.00914v2 )

Chris Junchi Li, (参考訳) 本稿では,バイレベル最適化のための一階法を高速化する新しいアルゴリズム群,すなわち (Perturbed) Restarted Accelerated Fully First-order methods for Bilevel Approximation(略称 (P)RAF$^2$BA)を提案する。 このアルゴリズムは、完全に一階のオラクルのみを活用し、非凸-強凸バイレベル最適化における近似定常点を求め、オラクル複雑性を改善する。 最先端のクエリ複雑性で近似的な1次定常点と2次定常点を求める理論的保証が確立され、複雑な最適化タスクの解法における有効性が示された。 実世界の問題に対する実証的研究を行い、提案アルゴリズムの優位性をさらに検証した。 非凸-強凸バイレベル最適化問題における (P)RAF$^2$BA の重要性は、その最先端の収束率と計算効率によって裏付けられる。

This paper presents a new family of accelerated first-order methods for bilevel optimization, namely the *(Perturbed) Restarted Accelerated Fully First-order methods for Bilevel Approximation*, abbreviated as (P)RAF$^2$BA. The algorithm leverages *fully* first-order oracles and seeks approximate stationary points in nonconvex-strongly-convex bilevel optimization, improving the oracle complexity. Theoretical guarantees for finding approximate first-order stationary points and second-order stationary points at the state-of-the-art query complexities are established, showcasing their effectiveness in solving complex optimization tasks. Empirical studies on real-world problems are provided to further validate the performance advantage of the proposed algorithms. The significance of (P)RAF$^2$BA in optimizing nonconvex-strongly-convex bilevel optimization problems is underscored by its state-of-the-art convergence rates and computational efficiency.
# HandSSCA:RGB画像からのステートスペースチャネル注意による3Dハンドメッシュ再構築

HandSSCA: 3D Hand Mesh Reconstruction with State Space Channel Attention from RGB images ( http://arxiv.org/abs/2405.01066v2 )

Zixun Jiao, Xihan Wang, Quanli Gao, (参考訳) 単一のRGB画像から手メッシュを再構築するのは難しい作業です。 これまでのほとんどの研究は、さらなる情報を導入し、3D再構成結果を改善するための注意機構を導入しようとしたが、計算の複雑さは増大した。 この結果から,計算効率を向上しつつ,より簡潔なアーキテクチャを提案することができた。 本研究では,手動ポーズ推定の分野に状態空間モデリングを組み込んだ,シンプルで効果的な3次元手動メッシュ再構成ネットワークHandSSCAを提案する。 ネットワーク上では,有効な感覚場を拡張し,空間次元における手の特徴を抽出し,チャネル次元における手動領域の特徴を増強する,新しい状態空間アテンションモジュールを設計した。 この設計は、完全で詳細なハンドメッシュを再構築するのに役立ちます。 FREIHAND, DEXYCB, HO3Dなど, 難易度の高い手動オクルージョンを特徴とするよく知られたデータセットを用いて行った大規模な実験により, 提案したHandSSCAは, 最小パラメータ数を維持しながら, 最先端の性能を達成できることを示した。

Reconstructing a hand mesh from a single RGB image is a challenging task because hands are often occluded by objects. Most previous works attempted to introduce additional information and adopt attention mechanisms to improve 3D reconstruction results, but this increased computational complexity. This observation prompts us to propose a new and concise architecture while improving computational efficiency. In this work, we propose a simple and effective 3D hand mesh reconstruction network, HandSSCA, which is the first to incorporate state space modeling into the field of hand pose estimation. In the network, we have designed a novel state space channel attention module that extends the effective sensory field, extracts hand features in the spatial dimension, and enhances hand regional features in the channel dimension. This design helps to reconstruct a complete and detailed hand mesh. Extensive experiments conducted on well-known datasets featuring challenging hand-object occlusions (such as FREIHAND, DEXYCB, and HO3D) demonstrate that our proposed HandSSCA achieves state-of-the-art performance while maintaining a minimal parameter count.
# 潜在的なマスアサインメント脆弱性のためのREST APIのマイニング

Mining REST APIs for Potential Mass Assignment Vulnerabilities ( http://arxiv.org/abs/2405.01111v2 )

Arash Mazidi, Davide Corradini, Mohammad Ghafari, (参考訳) REST APIは、保護されたリソースにアクセスする上で重要な役割を持っています。 セキュリティテストツールが利用可能であるにもかかわらず、マス割り当ての脆弱性はREST APIで一般的であり、機密データの不正な操作につながる。 我々は、REST API仕様をマイニングする軽量なアプローチを提案し、大量割り当てをしがちな操作と属性を特定します。 100のAPIについて予備調査を行い、25の脆弱性が見つかった。 6つのAPIで9つの真の脆弱な操作を確認した。

REST APIs have a pivotal role in accessing protected resources. Despite the availability of security testing tools, mass assignment vulnerabilities are common in REST APIs, leading to unauthorized manipulation of sensitive data. We propose a lightweight approach to mine the REST API specifications and identify operations and attributes that are prone to mass assignment. We conducted a preliminary study on 100 APIs and found 25 prone to this vulnerability. We confirmed nine real vulnerable operations in six APIs.
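
One way to picture mining a specification for mass-assignment candidates is shown below. This is a hedged sketch of a plausible heuristic, not necessarily the one used in the paper: it flags attributes that a resource exposes in responses but that its write operations do not declare as input. The `spec` dictionary is a hypothetical, heavily simplified stand-in for a real API specification, and the helper `mass_assignment_candidates` is invented for illustration.

```python
# Hypothetical, simplified view of an API specification:
# operation -> attribute names declared in the request and response schemas.
spec = {
    "POST /users": {
        "request": {"name", "email"},
        "response": {"id", "name", "email", "role"},
    },
    "PUT /users/{id}": {
        "request": {"name", "email", "role"},
        "response": {"id", "name", "email", "role"},
    },
    "GET /users/{id}": {
        "request": set(),
        "response": {"id", "name", "email", "role", "is_admin"},
    },
}

WRITE_METHODS = {"POST", "PUT", "PATCH"}

def resource_of(operation):
    """Group operations by the first path segment, e.g. 'users'."""
    _method, path = operation.split(" ", 1)
    return path.split("/")[1]

def mass_assignment_candidates(api_spec):
    """For each write operation, list attributes the resource exposes in
    some response but that the operation does not declare as input."""
    readable = {}
    for operation, schemas in api_spec.items():
        readable.setdefault(resource_of(operation), set()).update(schemas["response"])
    candidates = {}
    for operation, schemas in api_spec.items():
        method = operation.split(" ", 1)[0]
        if method in WRITE_METHODS:
            extra = readable[resource_of(operation)] - schemas["request"]
            if extra:
                candidates[operation] = sorted(extra)
    return candidates

result = mass_assignment_candidates(spec)
for operation, attributes in sorted(result.items()):
    print(operation, "->", attributes)
```

A candidate flagged this way is only a lead. Whether the server actually accepts and persists an undeclared attribute such as `is_admin` still has to be confirmed by sending a request, which corresponds to the confirmation step mentioned in the abstract.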
