Fugu-MT: arxivの論文翻訳

このサイトではarxivの論文のうち、30ページ以下でCreative Commonsライセンス（CC 0, CC BY, CC BY-SA）の論文を日本語訳しています。本文がCCでない論文、長すぎる論文はメタデータのみを翻訳しています。（arxivのメタデータは CC 0です。）翻訳文のライセンスはCC BY-SA 4.0です。翻訳にはFugu-Machine Translatorを利用しています。

本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。

公開日が20240420となっている論文です。

Title	Authors	Abstract	論文公表日・翻訳日
# Clock Domain Crossing (CDC) の実用的形式検証手法 Pragmatic Formal Verification Methodology for Clock Domain Crossing (CDC) ( http://arxiv.org/abs/2406.06533v1 ) ライセンス: Link先を確認	Aman Kumar, Muhammad Ul Haque Khan, Bijitendra Mittra,	(参考訳) 現代のSystem-on-Chip (SoC) の設計は、テクノロジーのスケールアップにより、ますます複雑になりつつある。 SoC設計はしばしば複数の非同期クロックドメインで動作し、全体的な設計の複雑さをさらに増す。デバイスを効率よくするために、デザイナは複数の非同期ドメインを生成するGlobally-Asynchronous Locally-Synchronous (GALS)アプローチを採用する。これらのClock Domain Crossings (CDC) は、転移性の影響を受けやすく、そのようなCDCの機能的検証は、バグの回避を確実にするために非常に重要である。レジスタ転送レベル(RTL)シミュレーションや静的タイミング解析のような従来の検証手法では、これらのCDC問題に対処するには不十分であり、検証のギャップが生じる可能性がある。さらに、これらのCDC関連バグを特定するのは非常に時間がかかり、コストがかかるシリコン再スピンの最も一般的な理由の1つである。本研究は, CDCパスのメタスタビリティ・インジェクション(MSI)の実施により, CDCの問題を最小化するための実用的形式的検証手法の開発に焦点をあてる。 Modern System-on-Chip (SoC) designs are becoming more and more complex due to the technology upscaling. SoC designs often operate on multiple asynchronous clock domains, further adding to the complexity of the overall design. To make the devices power efficient, designers take a Globally-Asynchronous Locally-Synchronous (GALS) approach that creates multiple asynchronous domains. These Clock Domain Crossings (CDC) are prone to metastability effects, and functional verification of such CDC is very important to ensure that no bug escapes. Conventional verification methods, such as register transfer level (RTL) simulations and static timing analysis, are not enough to address these CDC issues, which may lead to verification gaps. Additionally, identifying these CDC-related bugs is very time-consuming and is one of the most common reasons for costly silicon re-spins. This paper is focused on the development of a pragmatic formal verification methodology to minimize the CDC issues by exercising Metastability Injection (MSI) in different CDC paths.	翻訳日:2024-07-01 08:00:19 公開日:2024-04-20
# 高構成のディジタル設計における効率的な構成被覆のための半形式的検証手法 A Semi-Formal Verification Methodology for Efficient Configuration Coverage of Highly Configurable Digital Designs ( http://arxiv.org/abs/2405.01572v1 ) ライセンス: Link先を確認	Aman Kumar, Sebastian Simon,	(参考訳) 今日では、システムオンチップ(SoC)の大多数が、開発サイクルを短縮するために知的財産(IP)を使用している。このようなIPが開発されると、設計の高構成性に焦点が当てられる。この設計側の柔軟性は、検証側のIP構成の巨大な状態空間をカバーし、可能なパラメータ設定の全ての機能的正しさを保証するという課題をもたらす。可能性の多さはブルートフォースのアプローチを許さないため、典型的および極端な仮定に基づいて選択された少数の設定しか検証されない。特に、ISO 26262機能安全基準に従う必要がある自動車アプリケーションでは、すべての重要な変種をカバーする要件は、いずれにせよ満たされる必要がある。シミュレーションベースの検証や形式検証のような最先端の既存の検証技術には、それぞれ時間空間の爆発や状態空間の爆発といった課題があるため、高度に構成可能なディジタルデザインを効率的に検証することの欠如がある。本稿では,高度に構成可能なディジタル設計を効率的に構成するための半形式的検証手法に着目する。この方法論は、高い構成カバレッジを可能にするシミュレーティブおよびフォーマルなメソッドに基づいた、ランタイムの削減に焦点を当てている。また,提案手法を高度に構成可能なマイクロプロセッサIPに適用し,そのメリットについて考察する。 Nowadays, a majority of System-on-Chips (SoCs) make use of Intellectual Property (IP) in order to shorten development cycles. When such IPs are developed, one of the main focuses lies in the high configurability of the design. This flexibility on the design side introduces the challenge of covering a huge state space of IP configurations on the verification side to ensure the functional correctness under every possible parameter setting. The vast number of possibilities does not allow a brute-force approach, and therefore, only a selected number of settings based on typical and extreme assumptions are usually verified. Especially in automotive applications, which need to follow the ISO 26262 functional safety standard, the requirement of covering all significant variants needs to be fulfilled in any case. State-of-the-Art existing verification techniques such as simulation-based verification and formal verification have challenges such as time-space explosion and state-space explosion respectively and therefore, lack behind in verifying highly configurable digital designs efficiently. This paper is focused on a semi-formal verification methodology for efficient configuration coverage of highly configurable digital designs. The methodology focuses on reduced runtime based on simulative and formal methods that allow high configuration coverage. The paper also presents the results when the developed methodology was applied on a highly configurable microprocessor IP and discusses the gained benefits.	翻訳日:2024-05-12 16:10:01 公開日:2024-04-20
# マルチビット量子化フェデレーション学習におけるSERに基づくデバイス選択機構 A SER-based Device Selection Mechanism in Multi-bits Quantization Federated Learning ( http://arxiv.org/abs/2405.02320v1 ) ライセンス: Link先を確認	Pengcheng Sun, Erwu Liu, Rui Wang,	(参考訳) 無線通信の質は、FL(Federated Learning)の性能に直接影響を及ぼすので、シンボル誤り率(SER)を用いて、FLにおける無線通信の影響を解析する。 FLシステムでは、非直交多重アクセス(NOMA)を、無線チャネルの重畳特性を利用する複数のユーザによる通信混雑と干渉を低減するための基本的な通信フレームワークとして用いることができる。最小平均角誤差(MMSE)に基づくシリアル干渉キャンセル(SIC)技術を用いて、受信端で各端末ノードの勾配を1つずつ回復する。本稿では、勾配パラメータを複数のビットに量子化して、より多くの勾配情報を最大範囲に保持し、伝送誤差の許容性を向上させる。そこで我々は,SERベースのデバイス選択機構(SER-DSM)を設計し,学習性能が悪い通信条件のユーザの影響を受けないようにした。実験は、勾配の多重ビット量子化がFLに与える影響と、提案したSERデバイス選択機構の必要性と優位性を示す。 The quality of wireless communication will directly affect the performance of federated learning (FL), so this paper analyze the influence of wireless communication on FL through symbol error rate (SER). In FL system, non-orthogonal multiple access (NOMA) can be used as the basic communication framework to reduce the communication congestion and interference caused by multiple users, which takes advantage of the superposition characteristics of wireless channels. The Minimum Mean Square Error (MMSE) based serial interference cancellation (SIC) technology is used to recover the gradient of each terminal node one by one at the receiving end. In this paper, the gradient parameters are quantized into multiple bits to retain more gradient information to the maximum extent and to improve the tolerance of transmission errors. On this basis, we designed the SER-based device selection mechanism (SER-DSM) to ensure that the learning performance is not affected by users with bad communication conditions, while accommodating as many users as possible to participate in the learning process, which is inclusive to a certain extent. The experiments show the influence of multi-bit quantization of gradient on FL and the necessity and superiority of the proposed SER-based device selection mechanism.	翻訳日:2024-05-12 16:00:17 公開日:2024-04-20
# 学生オンライン授業間相互作用の日常的行動分類 Ordinal Behavior Classification of Student Online Course Interactions ( http://arxiv.org/abs/2405.05142v1 ) ライセンス: Link先を確認	Thomas Trask,	(参考訳) オンライン授業とMOOCスタイルのオンライン授業における学生のインタラクションパターンに関する研究は,過去11年間にわたって広く研究されてきた。しかし、オンライン・コースとMOOCスタイルのオンライン・フォーマットで提供されるのと同じコースを修了する学生の習慣を比較した文献の差は依然として残っている。本研究は、ジョージア工科大学CS1301 edxコースの学生を対象に、オンラインコースとMOOCスタイルコースの両方でブラウザベースの利用パターンを調べ、この2つのコースの間にどのようなパターンが存在するかを決定する。 The study in interaction patterns between students in on-campus and MOOC-style online courses has been broadly studied for the last 11 years. Yet there remains a gap in the literature comparing the habits of students completing the same course offered in both on-campus and MOOC-style online formats. This study will look at browser-based usage patterns for students in the Georgia Tech CS1301 edx course for both the online course offered to on-campus students and the MOOCstyle course offered to anyone to determine what, if any, patterns exist between the two cohorts.	翻訳日:2024-05-12 15:40:48 公開日:2024-04-20
# ノード決定プール付きグラフニューラルネットワークにおける階層的表現学習 Hierarchical Representation Learning in Graph Neural Networks with Node Decimation Pooling ( http://arxiv.org/abs/1910.11436v3 ) ライセンス: Link先を確認	Filippo Maria Bianchi, Daniele Grattarola, Lorenzo Livi, Cesare Alippi,	(参考訳) グラフニューラルネットワーク(GNN)では、プール演算子は入力グラフの局所的な要約を計算し、そのグローバルな特性を捉える。本研究では,全体のグラフトポロジを保ちながら粗いグラフを生成するGNNのためのプール演算子であるノード決定プール(NDP)を提案する。トレーニング中、GNNは新しいノード表現を学び、それらを粗いグラフのピラミッドに適合させ、前処理の段階でオフラインで計算する。 NDPは3つのステップから構成される。まず、ノードデシメーション手順は、スペクトルアルゴリズムによって同定された分割の一方の側に属するノードを選択し、 \maxcut{} 解を近似する。その後、選択されたノードはKron還元と接続され、粗いグラフを形成する。最後に、得られたグラフは非常に密度が高いので、粗いグラフの隣接行列を具現化してGNNの計算コストを削減するスペーシフィケーション手法を適用する。特に、グラフ構造を著しく変更することなく、多くのエッジを除去できることが示されている。実験の結果、NDPは最先端のグラフプーリング演算子よりも効率が良く、同時に、多種多様なグラフ分類タスクにおける競合性能も向上していることがわかった。 In graph neural networks (GNNs), pooling operators compute local summaries of input graphs to capture their global properties, and they are fundamental for building deep GNNs that learn hierarchical representations. In this work, we propose the Node Decimation Pooling (NDP), a pooling operator for GNNs that generates coarser graphs while preserving the overall graph topology. During training, the GNN learns new node representations and fits them to a pyramid of coarsened graphs, which is computed offline in a pre-processing stage. NDP consists of three steps. First, a node decimation procedure selects the nodes belonging to one side of the partition identified by a spectral algorithm that approximates the \maxcut{} solution. Afterwards, the selected nodes are connected with Kron reduction to form the coarsened graph. Finally, since the resulting graph is very dense, we apply a sparsification procedure that prunes the adjacency matrix of the coarsened graph to reduce the computational cost in the GNN. Notably, we show that it is possible to remove many edges without significantly altering the graph structure. Experimental results show that NDP is more efficient compared to state-of-the-art graph pooling operators while reaching, at the same time, competitive performance on a significant variety of graph classification tasks.	翻訳日:2024-05-05 18:18:22 公開日:2024-04-20
# LEMDA:IoTシステムにおける侵入検出のための新機能エンジニアリング手法 LEMDA: A Novel Feature Engineering Method for Intrusion Detection in IoT Systems ( http://arxiv.org/abs/2404.16870v1 ) ライセンス: Link先を確認	Ali Ghubaish, Zebo Yang, Aiman Erbad, Raj Jain,	(参考訳) モノのインターネット(IoT)システム用の侵入検知システム(IDS)は、AIベースのモデルを使用してセキュアな通信を保証できる。 IoTシステムは、複雑なモデルを必要とする大量のデータを生成する多くの接続デバイスを持つ傾向があります。複雑なモデルには、オーバーフィット、低い解釈可能性、高い計算複雑性といった悪名高い問題がある。モデル複雑性のペナルティ(すなわち正規化)を追加することで過度な適合が容易になるが、解釈可能性や計算効率の面ではほとんど役に立たない。機能エンジニアリングはこれらの問題を解決することができるため、大規模なIoTシステムではIDSがデータのサイズと寸法を減らし、パフォーマンスが良く、データストレージが小さくなり、高速な検出が可能になった。本稿では,LEMDA (Mean Decrease in Accuracyに基づく光機能工学) と呼ばれる新しい特徴工学手法を提案する。 LEMDAは指数減衰と任意の感度因子を応用し、最も情報性の高い特徴を選択・生成する。提案手法は,3つのIoTデータセットと4つのAI/MLモデルを用いて,他の機能工学手法と比較して評価されている。その結果,LEMDAは全てのIDSモデルのF1スコアを平均34%改善し,ほとんどの場合の平均トレーニング時間と検出時間を短縮した。 Intrusion detection systems (IDS) for the Internet of Things (IoT) systems can use AI-based models to ensure secure communications. IoT systems tend to have many connected devices producing massive amounts of data with high dimensionality, which requires complex models. Complex models have notorious problems such as overfitting, low interpretability, and high computational complexity. Adding model complexity penalty (i.e., regularization) can ease overfitting, but it barely helps interpretability and computational efficiency. Feature engineering can solve these issues; hence, it has become critical for IDS in large-scale IoT systems to reduce the size and dimensionality of data, resulting in less complex models with excellent performance, smaller data storage, and fast detection. This paper proposes a new feature engineering method called LEMDA (Light feature Engineering based on the Mean Decrease in Accuracy). LEMDA applies exponential decay and an optional sensitivity factor to select and create the most informative features. The proposed method has been evaluated and compared to other feature engineering methods using three IoT datasets and four AI/ML models. The results show that LEMDA improves the F1 score performance of all the IDS models by an average of 34% and reduces the average training and detection times in most cases.	翻訳日:2024-05-05 18:14:01 公開日:2024-04-20
# 知識グラフ完全性のための連続的関係抽出手法 A Continual Relation Extraction Approach for Knowledge Graph Completeness ( http://arxiv.org/abs/2404.17593v1 ) ライセンス: Link先を確認	Sefika Efeoglu,	(参考訳) 構造化された形式で非構造化データを表現することは、情報システム管理がそれを分析して解釈する上で最も重要なことである。これを実現するために、主なタスクがエンティティ認識と関係抽出と呼ばれる情報抽出パイプラインを活用することにより、構造化されていないデータを知識グラフに変換することができる。本論文は,実世界から来るデータストリーム内のエンティティ間の関係(相互接続)を識別する,新たな連続関係抽出手法を開発することを目的とする。この論文のドメイン固有のデータは、ドイツやオーストリアの新聞のコロナニュースである。 Representing unstructured data in a structured form is most significant for information system management to analyze and interpret it. To do this, the unstructured data might be converted into Knowledge Graphs, by leveraging an information extraction pipeline whose main tasks are named entity recognition and relation extraction. This thesis aims to develop a novel continual relation extraction method to identify relations (interconnections) between entities in a data stream coming from the real world. Domain-specific data of this thesis is corona news from German and Austrian newspapers.	翻訳日:2024-05-05 18:04:17 公開日:2024-04-20
# ソーシャルメディアの利用はアプリシーケンスから予測可能:LSTMとトランスフォーマーニューラルネットワークを用いて行動モデルを構築する Social Media Use is Predictable from App Sequences: Using LSTM and Transformer Neural Networks to Model Habitual Behavior ( http://arxiv.org/abs/2404.16066v1 ) ライセンス: Link先を確認	Heinrich Peters, Joseph B. Bayer, Sandra C. Matz, Yikun Chi, Sumer S. Vaid, Gabriella M. Harari,	(参考訳) 本稿では,スマートフォン利用者の逐次行動の予測モデルを用いて,ソーシャルメディアの習慣を研究する新しいアプローチを提案する。メディアおよび技術習慣に関する文献の多くは、自己報告アンケートや単純な行動頻度測定に頼っているが、メディアおよび技術習慣の重要かつ未検討の側面である、反復的な行動系列への組込みについて検討する。 Long Short-Term Memory(LSTM)とTransformer Neural Networkの活用 (i)ソーシャルメディアの利用は、内外レベルで予測可能である。 (II)ソーシャルメディア利用の予測可能性には、個人差が強い。いくつかのモデリング手法の性能について検討する。一すべての参加者から収集されたデータに基づいて訓練されたグローバルモデル 2イディオグラフィー人固有のモデル、及び三人固有のデータに基づいて微調整されたグローバルモデル。個人固有のモデリングも、個人固有のデータの微調整も、グローバルモデルよりも大幅に優れておらず、グローバルモデルが様々な慣用的行動パターンを表現できたことを示している。さらに,ソーシャルメディア利用の個人レベルの予測性は,一般のスマートフォン利用頻度やソーシャルメディア利用頻度と大きく関係しているわけではなく,行動頻度と異なる習慣の側面を捉えていることを示す。習慣モデリングと理論的発展の意味について論じる。 The present paper introduces a novel approach to studying social media habits through predictive modeling of sequential smartphone user behaviors. While much of the literature on media and technology habits has relied on self-report questionnaires and simple behavioral frequency measures, we examine an important yet understudied aspect of media and technology habits: their embeddedness in repetitive behavioral sequences. Leveraging Long Short-Term Memory (LSTM) and transformer neural networks, we show that (i) social media use is predictable at the within and between-person level and that (ii) there are robust individual differences in the predictability of social media use. We examine the performance of several modeling approaches, including (i) global models trained on the pooled data from all participants, (ii) idiographic person-specific models, and (iii) global models fine-tuned on person-specific data. Neither person-specific modeling nor fine-tuning on person-specific data substantially outperformed the global models, indicating that the global models were able to represent a variety of idiosyncratic behavioral patterns. Additionally, our analyses reveal that the person-level predictability of social media use is not substantially related to the frequency of smartphone use in general or the frequency of social media use, indicating that our approach captures an aspect of habits that is distinct from behavioral frequency. Implications for habit modeling and theoretical development are discussed.	翻訳日:2024-04-26 18:22:04 公開日:2024-04-20
# 形式的およびシミュレーションに基づくRADAR SoCの有効検証 Efficient Verification of a RADAR SoC Using Formal and Simulation-Based Methods ( http://arxiv.org/abs/2404.15371v1 ) ライセンス: Link先を確認	Aman Kumar, Mark Litterick, Samuele Candido,	(参考訳) IoT(Internet of Things)とHuman-to-Machine Interaction(HMI)の需要が増加するにつれ、このようなソリューションを提供する現代のSystem-on-Chips(SoC)はますます複雑になっています。この複雑な設計は、特に消費者電子製品にとって、市場へのタイム・トゥ・マーケットが重要な要素である場合、検証に重大な課題をもたらす。本稿では,複雑な無線検出・ラング(RADAR)をベースとしたSoCを用いて,ミリメートル精度で人体の動きのオンチップセンシングを行うケーススタディを提案する。我々は,形式的手法とシミュレーション的手法を併用して相互補完を行い,信頼性の高い検証サインオフを実現する。要件駆動のフローアプローチを採用する一方で、複数の要件に対応し、プロジェクトからのノウハウを強調するために、さまざまな検証方法の使用を実演する。さらに、機械学習(ML)ベースの手法、特にCadenceのXcelium MLツールを使用して、検証スループットを改善しました。 As the demand for Internet of Things (IoT) and Human-to-Machine Interaction (HMI) increases, modern System-on-Chips (SoCs) offering such solutions are becoming increasingly complex. This intricate design poses significant challenges for verification, particularly when time-to-market is a crucial factor for consumer electronics products. This paper presents a case study based on our work to verify a complex Radio Detection And Ranging (RADAR) based SoC that performs on-chip sensing of human motion with millimetre accuracy. We leverage both formal and simulation-based methods to complement each other and achieve verification sign-off with high confidence. While employing a requirements-driven flow approach, we demonstrate the use of different verification methods to cater to multiple requirements and highlight our know-how from the project. Additionally, we used Machine Learning (ML) based methods, specifically the Xcelium ML tool from Cadence, to improve verification throughput.	翻訳日:2024-04-25 15:44:33 公開日:2024-04-20
# RoadBEV:鳥の視線で道路表面を再構築する RoadBEV: Road Surface Reconstruction in Bird's Eye View ( http://arxiv.org/abs/2404.06605v2 ) ライセンス: Link先を確認	Tong Zhao, Lei Yang, Yichen Xie, Mingyu Ding, Masayoshi Tomizuka, Yintao Wei,	(参考訳) 路面条件、特に幾何学的プロファイルは、自動運転車の走行性能に大きな影響を及ぼす。視覚に基づくオンライン道路再建は,道路情報を事前に収集する。モノクル深度推定やステレオマッチングといった既存のソリューションは、控えめなパフォーマンスに悩まされている。最近のバードアイビュー(Bird's-Eye-View、BEV)の認識技術は、より信頼性と正確な再構築の可能性を秘めている。本稿では, 単眼画像とステレオ画像で道路標高を推定する, RoadBEV-mono と RoadBEV-stereo の2つの簡易かつ効果的な道路標高復元モデルを提案する。前者はイメージビューから検索したボクセル特徴に基づく標高値と直接適合する一方、後者は左右のボクセル特徴の相違を示すBEVボリュームに基づく道路標高パターンを効率的に認識する。洞察に富んだ分析は、その構成と視点との相違を明らかにする。実世界のデータセットの実験は、モデルの有効性と優越性を検証します。 RoadBEVモノとRoadBEVステレオの標高誤差はそれぞれ1.83cmと0.50cmである。単眼画像に基づくBEVでは, 推定性能が50%向上した。我々のモデルは実用的な応用に期待でき、自律運転における視覚に基づくBEVの認識に貴重な基準を提供する。コードはhttps://github.com/ztsrxh/RoadBEVで公開されている。 Road surface conditions, especially geometry profiles, enormously affect driving performance of autonomous vehicles. Vision-based online road reconstruction promisingly captures road information in advance. Existing solutions like monocular depth estimation and stereo matching suffer from modest performance. The recent technique of Bird's-Eye-View (BEV) perception provides immense potential to more reliable and accurate reconstruction. This paper uniformly proposes two simple yet effective models for road elevation reconstruction in BEV named RoadBEV-mono and RoadBEV-stereo, which estimate road elevation with monocular and stereo images, respectively. The former directly fits elevation values based on voxel features queried from image view, while the latter efficiently recognizes road elevation patterns based on BEV volume representing discrepancy between left and right voxel features. Insightful analyses reveal their consistence and difference with perspective view. Experiments on real-world dataset verify the models' effectiveness and superiority. Elevation errors of RoadBEV-mono and RoadBEV-stereo achieve 1.83cm and 0.50cm, respectively. The estimation performance improves by 50\% in BEV based on monocular image. Our models are promising for practical applications, providing valuable references for vision-based BEV perception in autonomous driving. The code is released at https://github.com/ztsrxh/RoadBEV.	翻訳日:2024-04-24 18:46:42 公開日:2024-04-20
# コントラル検出の最適化: 効率的なNet-b4エンコーディングによるディープラーニングアプローチ Optimizing Contrail Detection: A Deep Learning Approach with EfficientNet-b4 Encoding ( http://arxiv.org/abs/2404.14441v1 ) ライセンス: Link先を確認	Qunwei Lin, Qian Leng, Zhicheng Ding, Chao Yan, Xiaonan Xu,	(参考訳) 環境の持続可能性を求める中で、航空産業は生態系のフットプリントを最小限に抑えるという課題に直面している。主要な解決策の1つは、航空機の排気によって発生する直線的な氷結晶雲をターゲットとした避妊である。これらのコントラルは、大気熱を捕捉し、正確なセグメンテーションと、環境影響を測定するためのコントラル画像の包括的な分析を必要とすることで、地球温暖化を悪化させる。しかし、このセグメンテーションタスクは、異なる大気条件下でのコントラルの出現の変化と予測モデルにおける潜在的なミスアライメントの問題により複雑である。本稿では,特徴抽出に効率的なNet-b4エンコーダを応用した革新的な深層学習手法を提案し,衛星画像における反則検出の精度と効率を高めるために,誤り訂正,ソフトラベリング,擬似ラベル技術とシームレスに統合する。提案手法は,衛星画像の正確なコントラル検出と分析のための堅牢な枠組みを提供し,航空環境への影響軽減を支援することによって,コントラル画像解析を再定義し,持続可能な航空の目的に寄与することを目的としている。 In the pursuit of environmental sustainability, the aviation industry faces the challenge of minimizing its ecological footprint. Among the key solutions is contrail avoidance, targeting the linear ice-crystal clouds produced by aircraft exhaust. These contrails exacerbate global warming by trapping atmospheric heat, necessitating precise segmentation and comprehensive analysis of contrail images to gauge their environmental impact. However, this segmentation task is complex due to the varying appearances of contrails under different atmospheric conditions and potential misalignment issues in predictive modeling. This paper presents an innovative deep-learning approach utilizing the efficient net-b4 encoder for feature extraction, seamlessly integrating misalignment correction, soft labeling, and pseudo-labeling techniques to enhance the accuracy and efficiency of contrail detection in satellite imagery. The proposed methodology aims to redefine contrail image analysis and contribute to the objectives of sustainable aviation by providing a robust framework for precise contrail detection and analysis in satellite imagery, thus aiding in the mitigation of aviation's environmental impact.	翻訳日:2024-04-24 18:17:13 公開日:2024-04-20
# Smooth Q-Learningアルゴリズムの統一ODE解析 Unified ODE Analysis of Smooth Q-Learning Algorithms ( http://arxiv.org/abs/2404.14442v1 ) ライセンス: Link先を確認	Donghwan Lee,	(参考訳) Q-ラーニングの収束は、過去数十年にわたる広範な研究の焦点となっている。近年,Q-ラーニングのための漸近収束解析をスイッチングシステムフレームワークを用いて導入している。このアプローチは、連続時間スイッチングシステムとしてモデル化された非同期Q-ラーニングの収束を証明するために、いわゆる常微分方程式(ODE)アプローチを適用する。しかし、安定性を証明するためには、準単調性のような制約条件を基礎となるスイッチングシステムに満たさなければならないため、解析方法をスムーズなQ-ラーニング変種など他の強化学習アルゴリズムに容易に一般化することは困難である。本稿では、スイッチングシステムアプローチを改善し、Q-ラーニングとそのスムーズな変形を解析できる、より汎用的で統一的な収束解析を提案する。提案手法は,Lyapunov関数として機能する$p$-normに基づく同期Q-ラーニングの収束に関する過去の研究に動機付けられている。しかし、提案した分析は、より一般的なODEモデルに対処し、非同期Q-ラーニングと、より単純なフレームワークでそのスムーズなバージョンの両方をカバーできる。 Convergence of Q-learning has been the focus of extensive research over the past several decades. Recently, an asymptotic convergence analysis for Q-learning was introduced using a switching system framework. This approach applies the so-called ordinary differential equation (ODE) approach to prove the convergence of the asynchronous Q-learning modeled as a continuous-time switching system, where notions from switching system theory are used to prove its asymptotic stability without using explicit Lyapunov arguments. However, to prove stability, restrictive conditions, such as quasi-monotonicity, must be satisfied for the underlying switching systems, which makes it hard to easily generalize the analysis method to other reinforcement learning algorithms, such as the smooth Q-learning variants. In this paper, we present a more general and unified convergence analysis that improves upon the switching system approach and can analyze Q-learning and its smooth variants. The proposed analysis is motivated by previous work on the convergence of synchronous Q-learning based on $p$-norm serving as a Lyapunov function. However, the proposed analysis addresses more general ODE models that can cover both asynchronous Q-learning and its smooth versions with simpler frameworks.	翻訳日:2024-04-24 18:17:13 公開日:2024-04-20
# 意味的依存とキーワードに基づく機械翻訳の評価 Evaluation of Machine Translation Based on Semantic Dependencies and Keywords ( http://arxiv.org/abs/2404.14443v1 ) ライセンス: Link先を確認	Kewei Yuan, Qiurong Zhao, Yang Xu, Xiao Zhang, Huansheng Ning,	(参考訳) 本稿では,既存の機械翻訳評価アルゴリズムの多くが語彙情報と構文情報のみを考慮しているが,文に含まれる深い意味情報を無視するという事実を踏まえ,参照翻訳に基づいて機械翻訳の意味的正当性を評価し,意味的依存関係と文キーワード情報を統合する計算手法を提案する。ハルビン工科大学ソーシャル・コンピューティング・情報検索研究センターが開発した言語技術プラットフォームを用いて、文のセマンティック依存分析とキーワード分析を行い、キーワードに対応するセマンティック依存グラフ、キーワード、重み情報を取得する。文のセマンティック依存関係を持つすべての単語情報と、セマンティック情報に影響を与えるキーワード情報を含んでいる。単語と依存多機能を含む意味的関連性ペアを構築する。文のキーセマンティクスは意味依存によって抽出されたセマンティクス情報では強調できないため、あいまいなセマンティクス解析がもたらされる。したがって、機械翻訳セマンティック評価の範囲内には、文キーワード情報も含まれる。文の意味的正当性を包括的かつ詳細に評価するために, 実験結果から, 類似した手法と比較して, 評価アルゴリズムの精度が向上し, 機械翻訳の意味的正当性をより正確に測定できることを示した。 In view of the fact that most of the existing machine translation evaluation algorithms only consider the lexical and syntactic information, but ignore the deep semantic information contained in the sentence, this paper proposes a computational method for evaluating the semantic correctness of machine translations based on reference translations and incorporating semantic dependencies and sentence keyword information. Use the language technology platform developed by the Social Computing and Information Retrieval Research Center of Harbin Institute of Technology to conduct semantic dependency analysis and keyword analysis on sentences, and obtain semantic dependency graphs, keywords, and weight information corresponding to keywords. It includes all word information with semantic dependencies in the sentence and keyword information that affects semantic information. Construct semantic association pairs including word and dependency multi-features. The key semantics of the sentence cannot be highlighted in the semantic information extracted through semantic dependence, resulting in vague semantics analysis. Therefore, the sentence keyword information is also included in the scope of machine translation semantic evaluation. To achieve a comprehensive and in-depth evaluation of the semantic correctness of sentences, the experimental results show that the accuracy of the evaluation algorithm has been improved compared with similar methods, and it can more accurately measure the semantic correctness of machine translation.	翻訳日:2024-04-24 18:17:13 公開日:2024-04-20
# 不確かさを意識したベイズニューラルネットワークによるバッテリヘルスモニタリング Practical Battery Health Monitoring using Uncertainty-Aware Bayesian Neural Network ( http://arxiv.org/abs/2404.14444v1 ) ライセンス: Link先を確認	Yunyi Zhao, Zhang Wei, Qingyu Yan, Man-Fai Ng, B. Sivaneasan, Cheng Xiang,	(参考訳) バッテリーの健康モニタリングと予測は、安全、持続可能性、経済的側面に大きな影響を与える電気移動時代において極めて重要である。既存の研究はしばしば予測精度に重点を置いているが、現実のアプリケーションにおける技術の展開を妨げる実用的な要因を無視する傾向がある。本稿では,バッテリ寿命予測のためのベイズニューラルネットワークに基づくモデルを開発する。本モデルでは,モデルの各パラメータに対して,バッテリ健康に関するセンサデータを使用し,単一点ではなく分布を適用した。これにより、モデルが固有のランダム性とバッテリ健康の不確実性をキャプチャし、正確な予測だけでなく、定量的な不確実性も得られる。提案モデルの有効性を実験的に検証し, 予測誤差は平均13.9%, 特定の試験電池では2.9%であった。さらに、すべての予測には定量的な確実性が含まれており、バッテリーの初期から中期にかけて66%改善されている。この研究は、バッテリ技術に対する実用的価値を持ち、業界における技術導入の加速に寄与している。 Battery health monitoring and prediction are critically important in the era of electric mobility with a huge impact on safety, sustainability, and economic aspects. Existing research often focuses on prediction accuracy but tends to neglect practical factors that may hinder the technology's deployment in real-world applications. In this paper, we address these practical considerations and develop models based on the Bayesian neural network for predicting battery end-of-life. Our models use sensor data related to battery health and apply distributions, rather than single-point, for each parameter of the models. This allows the models to capture the inherent randomness and uncertainty of battery health, which leads to not only accurate predictions but also quantifiable uncertainty. We conducted an experimental study and demonstrated the effectiveness of our proposed models, with a prediction error rate averaging 13.9%, and as low as 2.9% for certain tested batteries. Additionally, all predictions include quantifiable certainty, which improved by 66% from the initial to the mid-life stage of the battery. This research has practical values for battery technologies and contributes to accelerating the technology adoption in the industry.	翻訳日:2024-04-24 18:17:13 公開日:2024-04-20
# 大規模言語モデルによる合成データ評価のための多面的評価フレームワーク A Multi-Faceted Evaluation Framework for Assessing Synthetic Data Generated by Large Language Models ( http://arxiv.org/abs/2404.14445v1 ) ライセンス: Link先を確認	Yefeng Yuan, Yuhong Liu, Liang Cheng,	(参考訳) 生成型AIと大規模言語モデル(LLM)の急速な進歩は、特に製品レビューのような構造化表形式の領域において、合成データを生成するための新たな道を開いた。潜在的なメリットにもかかわらず、特にトレーニングデータセットで個人情報が使用される場合、プライバシリークに関する懸念が表面化している。さらに、生成された合成データの品質を定量的に測定し、下流タスクに利用できる総合的な評価フレームワークが存在しない。このギャップに対応するために、さまざまな評価指標を用いて合成された表データの忠実さ、有用性、およびプライバシー保護を評価するために設計されたオープンソースの評価フレームワークであるSynEvalを紹介した。提案するフレームワークであるSynEvalの有効性を,ChatGPT,Claude,Llamaの3つの最先端LCMから生成された総合製品レビューデータに適用して検証した。実験結果から, 合成データ生成の文脈における各種評価指標間のトレードオフを明らかにした。さらに、SynEvalは、合成表データに携わる研究者や実践者にとって重要な手段であり、特定のアプリケーションに対して生成されたデータの適合性を司法的に判断する権限を与え、ユーザのプライバシの維持に重点を置いている。 The rapid advancements in generative AI and large language models (LLMs) have opened up new avenues for producing synthetic data, particularly in the realm of structured tabular formats, such as product reviews. Despite the potential benefits, concerns regarding privacy leakage have surfaced, especially when personal information is utilized in the training datasets. In addition, there is an absence of a comprehensive evaluation framework capable of quantitatively measuring the quality of the generated synthetic data and their utility for downstream tasks. In response to this gap, we introduce SynEval, an open-source evaluation framework designed to assess the fidelity, utility, and privacy preservation of synthetically generated tabular data via a suite of diverse evaluation metrics. We validate the efficacy of our proposed framework - SynEval - by applying it to synthetic product review data generated by three state-of-the-art LLMs: ChatGPT, Claude, and Llama. Our experimental findings illuminate the trade-offs between various evaluation metrics in the context of synthetic data generation. Furthermore, SynEval stands as a critical instrument for researchers and practitioners engaged with synthetic tabular data,, empowering them to judiciously determine the suitability of the generated data for their specific applications, with an emphasis on upholding user privacy.	翻訳日:2024-04-24 18:17:13 公開日:2024-04-20
# NVIDIA弾性率に基づく物理インフォームドニューラル演算子フォワードモデルによる新しいA.I型貯留層評価 A Novel A.I Enhanced Reservoir Characterization with a Combined Mixture of Experts -- NVIDIA Modulus based Physics Informed Neural Operator Forward Model ( http://arxiv.org/abs/2404.14447v1 ) ライセンス: Link先を確認	Clement Etienam, Yang Juntao, Issam Said, Oleg Ovcharenko, Kaustubh Tangsali, Pavel Dimitrov, Ken Hester,	(参考訳) 本研究では,貯水池評価のための高度なワークフローを開発し,新しいアプローチによる貯水池履歴マッチングの課題を効果的に解決した。本手法は,高度なクラスタ分類回帰(CCR)フレームワークにおいて,物理インフォームドニューラル演算子(PINO)をフォワードモデルとして統合する。このプロセスは、貯水池履歴マッチングにおける急激な不確実性定量化のために最適化された適応正規化アンサンブルカルマンインバージョン(aREKI)によって強化される。このイノベーティブなワークフローは未知の透水性とポロシティの場をパラメータ化し、変分畳み込みオートエンコーダやCCRのような技術で非ガウス測度を捉える。エキゾチックな先行と教師付きモデルとして機能するCCRは、ピースマンウェル方程式の非線形ダイナミクスを正確にシミュレートするために、PINOサロゲートと相乗化する。 CCRアプローチは、各ステージに異なる機械学習アルゴリズムを適用する際の柔軟性を可能にする。 PINO貯水池サロゲートの更新は、監督データ、初期条件、黒油PDEの残留物から得られた損失関数によって駆動される。我々の統合モデルはPINO-Res-Simと呼ばれ、圧力、飽和度、石油、水、ガスの生産速度を含む重要なパラメータを出力します。合成貯水池とノルンフィールドの制御実験により従来のシミュレータに対して検証された手法は、顕著な精度を示した。さらに、aREKIワークフローのPINO-Res-Simは、従来の手法よりも100～6000倍高速な計算速度で、未知のフィールドを効率よく回収した。 NVIDIA H100上で実行されるPINO-Res-Simの学習フェーズは驚くほど効率的で、複雑な計算タスクのためのアンサンブルベースのメソッドと互換性があった。 We have developed an advanced workflow for reservoir characterization, effectively addressing the challenges of reservoir history matching through a novel approach. This method integrates a Physics Informed Neural Operator (PINO) as a forward model within a sophisticated Cluster Classify Regress (CCR) framework. The process is enhanced by an adaptive Regularized Ensemble Kalman Inversion (aREKI), optimized for rapid uncertainty quantification in reservoir history matching. This innovative workflow parameterizes unknown permeability and porosity fields, capturing non-Gaussian posterior measures with techniques such as a variational convolution autoencoder and the CCR. Serving as exotic priors and a supervised model, the CCR synergizes with the PINO surrogate to accurately simulate the nonlinear dynamics of Peaceman well equations. The CCR approach allows for flexibility in applying distinct machine learning algorithms across its stages. Updates to the PINO reservoir surrogate are driven by a loss function derived from supervised data, initial conditions, and residuals of governing black oil PDEs. Our integrated model, termed PINO-Res-Sim, outputs crucial parameters including pressures, saturations, and production rates for oil, water, and gas. Validated against traditional simulators through controlled experiments on synthetic reservoirs and the Norne field, the methodology showed remarkable accuracy. Additionally, the PINO-Res-Sim in the aREKI workflow efficiently recovered unknown fields with a computational speedup of 100 to 6000 times faster than conventional methods. The learning phase for PINO-Res-Sim, conducted on an NVIDIA H100, was impressively efficient, compatible with ensemble-based methods for complex computational tasks.	翻訳日:2024-04-24 18:17:13 公開日:2024-04-20
# オブジェクト指向アーキテクチャ:デュランプレートのためのソフトウェア工学にヒントを得た形状文法 Object-Oriented Architecture: A Software Engineering-Inspired Shape Grammar for Durands Plates ( http://arxiv.org/abs/2404.14448v1 ) ライセンス: Link先を確認	Rohan Agarwal,	(参考訳) モジュラーアーキテクチャ設計の課題に対処するため,計算機科学の関数型およびオブジェクト指向プログラミング原理を用いた形状文法システムの実装を通じて,新しいアプローチを提案する。フレンチ・ネオ古典主義の建築家ジャン=ニコラ=ルイ・デュラン(Jean-Nicolas-Louis Durand)は、モジュラー・ルールに基づく建築の手法で知られており、複雑な建築様式を体系的に表現するシステムの能力を示している。コンピュータプログラミングの原理を活用することで、提案された方法論は、デュランのオリジナルプレートの固有の論理に固執しながら、多様な設計を作成できる。 Shape Machineの統合により、アーキテクトやデザイナのためのフレキシブルなフレームワークが可能になり、既存のCADソフトウェアでモジュール化された方法で複雑な構造を生成することができる。本研究は建築設計における計算ツールの探索に寄与し、歴史的に重要な建築要素を合成するための汎用的なソリューションを提供する。 Addressing the challenge of modular architectural design, this study presents a novel approach through the implementation of a shape grammar system using functional and object-oriented programming principles from computer science. The focus lies on the modular generation of plates in the style of French Neoclassical architect Jean-Nicolas-Louis Durand, known for his modular rule-based method to architecture, demonstrating the system's capacity to articulate intricate architectural forms systematically. By leveraging computer programming principles, the proposed methodology allows for the creation of diverse designs while adhering to the inherent logic of Durand's original plates. The integration of Shape Machine allows a flexible framework for architects and designers, enabling the generation of complex structures in a modular fashion in existing CAD software. This research contributes to the exploration of computational tools in architectural design, offering a versatile solution for the synthesis of historically significant architectural elements.	翻訳日:2024-04-24 18:17:13 公開日:2024-04-20
# ニューラルネットワークを用いたStackOverflowの質問品質予測 Predicting Question Quality on StackOverflow with Neural Networks ( http://arxiv.org/abs/2404.14449v1 ) ライセンス: Link先を確認	Mohammad Al-Ramahi, Izzat Alsmadi, Abdullah Wahbeh,	(参考訳) インターネットやソーシャルメディアを通じて利用できる情報の豊富さは前例がない。コンピューティング分野において、Stack OverflowのようなWebサイトは、コンピューティングとプログラミングの問題に対するソリューションを求めるユーザにとって重要なソースだと考えられている。しかし、他のソーシャルメディアプラットフォームと同様に、Stack Overflowには関連する情報と無関係な情報が混在している。本稿では,質問応答(QA)コミュニティの例として,Stack Overflowにおける質問の品質を予測するニューラルネットワークモデルの評価を行った。その結果、ベースライン機械学習モデルと比較してニューラルネットワークモデルの有効性を示し、80%の精度を実現した。さらに,ニューラルネットワークモデルにおけるレイヤーの数は,その性能に大きな影響を及ぼす可能性が示唆された。 The wealth of information available through the Internet and social media is unprecedented. Within computing fields, websites such as Stack Overflow are considered important sources for users seeking solutions to their computing and programming issues. However, like other social media platforms, Stack Overflow contains a mixture of relevant and irrelevant information. In this paper, we evaluated neural network models to predict the quality of questions on Stack Overflow, as an example of Question Answering (QA) communities. Our results demonstrate the effectiveness of neural network models compared to baseline machine learning models, achieving an accuracy of 80%. Furthermore, our findings indicate that the number of layers in the neural network model can significantly impact its performance.	翻訳日:2024-04-24 18:17:13 公開日:2024-04-20
# GraphMatcher: オントロジーマッチングのためのグラフ表現学習アプローチ GraphMatcher: A Graph Representation Learning Approach for Ontology Matching ( http://arxiv.org/abs/2404.14450v1 ) ライセンス: Link先を確認	Sefika Efeoglu,	(参考訳) オントロジーマッチングは、2つ以上のオントロジーにおいて2つ以上のエンティティ間の関係や対応を見つけるものとして定義される。ドメインオントロジーの相互運用性問題を解決するためには、これらのオントロジーにおける意味論的に類似したエンティティを見つけ、マージする前にアライメントする必要がある。本研究で開発されたGraphMatcherは,グラフアテンションを用いたオントロジーマッチングシステムである。 GraphMatcherは、オントロジーアライメント評価イニシアチブ (OAEI) 2022 のカンファレンストラックで顕著な結果を得た。そのコードは ~\url{https://github.com/sefeoglu/gat_ontology_matching} で公開されている。 Ontology matching is defined as finding a relationship or correspondence between two or more entities in two or more ontologies. To solve the interoperability problem of the domain ontologies, semantically similar entities in these ontologies must be found and aligned before merging them. GraphMatcher, developed in this study, is an ontology matching system using a graph attention approach to compute higher-level representation of a class together with its surrounding terms. The GraphMatcher has obtained remarkable results in in the Ontology Alignment Evaluation Initiative (OAEI) 2022 conference track. Its codes are available at ~\url{https://github.com/sefeoglu/gat_ontology_matching}.	翻訳日:2024-04-24 18:17:13 公開日:2024-04-20
# 高次元データの複数ビューにおける外乱検出のための生成サブスペース逆アクティブラーニング Generative Subspace Adversarial Active Learning for Outlier Detection in Multiple Views of High-dimensional Data ( http://arxiv.org/abs/2404.14451v1 ) ライセンス: Link先を確認	Jose Cribeiro-Ramallo, Vadim Arzamasov, Federico Matteucci, Denis Wambold, Klemens Böhm,	(参考訳) 高次元表データのアウトリー検出は、データマイニングにおいて重要なタスクであり、多くの下流タスクやアプリケーションに必須である。既存の教師なしの外れ値検出アルゴリズムは、不適切な仮定(IA)、次元性の呪い(CD)、複数ビュー(MV)など1つ以上の問題に直面している。これらの課題に対処するために,複数の敵を持つジェネレーティブ・サブスペース・アドバイザリアル・アクティブ・ラーニング(GSAAL)を導入する。これらの敵対者は、異なるデータ部分空間上の限界クラス確率関数を学習し、一方、全空間の1つの生成器は、不等式全体の分布をモデル化する。 GSAAL は MV の制限に対応するために特別に設計されており、IA と CD も扱える唯一の方法である。本稿では,MVの包括的数学的定式化,識別器の収束保証,GSAALの拡張性について述べる。我々はGSAALの有効性とスケーラビリティを実証し、特にMVシナリオにおいて、他の一般的なOD手法と比較して優れた性能を示す。 Outlier detection in high-dimensional tabular data is an important task in data mining, essential for many downstream tasks and applications. Existing unsupervised outlier detection algorithms face one or more problems, including inlier assumption (IA), curse of dimensionality (CD), and multiple views (MV). To address these issues, we introduce Generative Subspace Adversarial Active Learning (GSAAL), a novel approach that uses a Generative Adversarial Network with multiple adversaries. These adversaries learn the marginal class probability functions over different data subspaces, while a single generator in the full space models the entire distribution of the inlier class. GSAAL is specifically designed to address the MV limitation while also handling the IA and CD, being the only method to do so. We provide a comprehensive mathematical formulation of MV, convergence guarantees for the discriminators, and scalability results for GSAAL. Our extensive experiments demonstrate the effectiveness and scalability of GSAAL, highlighting its superior performance compared to other popular OD methods, especially in MV scenarios.	翻訳日:2024-04-24 18:17:13 公開日:2024-04-20
# FIRST:FrontrunnIngのレジリエントなスマートコントラクト FIRST: FrontrunnIng Resilient Smart ConTracts ( http://arxiv.org/abs/2204.00955v2 ) ライセンス: Link先を確認	Emrah Sariboz, Gaurav Panwar, Roopa Vishwanathan, Satyajayant Misra,	(参考訳) 暗号通貨の使用量の増加により、貸し出し、借り入れ、マージン取引などの従来の金融応用を暗号通貨の世界に広く浸透させてきた。一部のケースでは、本質的に透明で規制されていない暗号通貨が、これらのアプリケーションのユーザを攻撃します。悪意のあるエンティティは、現在処理されていない金融トランザクションの知識を活用し、未処理のトランザクションの前に独自のトランザクションを実行しようとする。この結果、財務的損失、不正確なトランザクション、さらにはより多くの攻撃にさらされる可能性がある。本稿では、最前線攻撃を防ぐフレームワークであるFIRSTを提案し、検証遅延関数やアグリゲートシグネチャを含む暗号プロトコルを用いて構築する。我々の設計では、VDFの公開パラメータを生成するためのフェデレートされたセットアップがあり、単一の信頼できるセットアップの必要性を排除しています。我々は、FIRSTを正式に分析し、Universal Composabilityフレームワークを用いてセキュリティを証明し、FIRSTの有効性を実験的に実証する。 Owing to the meteoric rise in the usage of cryptocurrencies, there has been a widespread adaptation of traditional financial applications such as lending, borrowing, margin trading, and more, to the cryptocurrency realm. In some cases, the inherently transparent and unregulated nature of cryptocurrencies leads to attacks on users of these applications. One such attack is frontrunning, where a malicious entity leverages the knowledge of currently unprocessed financial transactions submitted by users and attempts to get its own transaction(s) executed ahead of the unprocessed ones. The consequences of this can be financial loss, inaccurate transactions, and even exposure to more attacks. We propose FIRST, a framework that prevents frontrunning attacks, and is built using cryptographic protocols including verifiable delay functions and aggregate signatures. In our design, we have a federated setup for generating the public parameters of the VDF, thus removing the need for a single trusted setup. We formally analyze FIRST, prove its security using the Universal Composability framework and experimentally demonstrate the effectiveness of FIRST.	翻訳日:2024-04-24 01:49:47 公開日:2024-04-20
# ELODI:Positive-Congruent Trainingのためのロジット差分抑制 ELODI: Ensemble Logit Difference Inhibition for Positive-Congruent Training ( http://arxiv.org/abs/2205.06265v3 ) ライセンス: Link先を確認	Yue Zhao, Yantao Shen, Yuanjun Xiong, Shuo Yang, Wei Xia, Zhuowen Tu, Bernt Schiele, Stefano Soatto,	(参考訳) 負のフリップ(負のフリップ)は、レガシーモデルが更新されたときに分類システムで導入されたエラーである。既存の負のフリップ率(NFR)を減らす方法は、新しいモデルに古いモデルを模倣させたり、推論コストを禁ずるアンサンブルを使ったりすることで、全体的な精度を犠牲にしている。我々は、NFRの減少におけるアンサンブルの役割を分析し、通常決定境界に近くない負のフリップを除去するが、ロジット間の距離に大きな偏差を示すことが多いことを観察する。本研究は,誤差率とNFRの両方でパラゴン性能を実現する分類システムを,単一モデルの推論コストで訓練する手法であるELODI(Ensemble Logit Difference Inhibition)を提案する。この方法は、分類システムを更新するために使用される単一学生モデルに均質なアンサンブルを蒸留する。 ELODIはまた、最大ロジット値を持つクラスのサブセットのロジット差のみを罰する一般化された蒸留目標であるロジット差分抑制(LDI)も導入している。複数の画像分類ベンチマークでは、ELODIによるモデル更新により、精度の保持とNFRの低減が向上した。 Negative flips are errors introduced in a classification system when a legacy model is updated. Existing methods to reduce the negative flip rate (NFR) either do so at the expense of overall accuracy by forcing a new model to imitate the old models, or use ensembles, which multiply inference cost prohibitively. We analyze the role of ensembles in reducing NFR and observe that they remove negative flips that are typically not close to the decision boundary, but often exhibit large deviations in the distance among their logits. Based on the observation, we present a method, called Ensemble Logit Difference Inhibition (ELODI), to train a classification system that achieves paragon performance in both error rate and NFR, at the inference cost of a single model. The method distills a homogeneous ensemble to a single student model which is used to update the classification system. ELODI also introduces a generalized distillation objective, Logit Difference Inhibition (LDI), which only penalizes the logit difference of a subset of classes with the highest logit values. On multiple image classification benchmarks, model updates with ELODI demonstrate superior accuracy retention and NFR reduction.	翻訳日:2024-04-24 01:49:47 公開日:2024-04-20
# 置換に基づく進化的アルゴリズムの実行時解析 Runtime Analysis for Permutation-based Evolutionary Algorithms ( http://arxiv.org/abs/2207.04045v4 ) ライセンス: Link先を確認	Benjamin Doerr, Yassine Ghannane, Marouane Ibn Brahim,	(参考訳) 進化的アルゴリズム(EA)の理論解析は、過去25年間に擬ブール最適化問題において大きな進歩を遂げてきたが、EAが置換に基づく問題を解決する方法に関する散発的な理論的な結果のみが存在する。置換に基づくベンチマークの欠如を克服するため,従来の擬似ブールベンチマークを置換集合上で定義されたベンチマークに転送する一般的な方法を提案する。次に、Scharnow, Tinnefeld, Wegener (2004) が提案した置換に基づく$(1+1)$ EAの厳密な実行時解析を、LeadingOnes と Jump ベンチマークの類似で実施する。後者は、ビットストリングと異なり、置換を$\sigma$を別の$\tau$に変換するのがどれほど難しいかを決定するハミング距離だけでなく、$\sigma \tau^{-1}$の正確なサイクル構造も示している。このため、より対称なスクランブル突然変異作用素も考慮する。単純な証明に繋がるだけでなく、奇妙なジャンプサイズを持つジャンプ関数のランタイムを$\Thetaの係数で削減する。 (n)$。最後に、ビットストリングの場合のように、スクランブル演算子の重み付きバージョンが$m^{\Thetaの高速化につながることを示す。 (m)}$ on jump function with jump size $m$ 短い経験的分析によりこれらの知見が裏付けられるが、また、ヴォイド突然変異率のような小さな実装の詳細が重要な違いをもたらすことも明らかである。 While the theoretical analysis of evolutionary algorithms (EAs) has made significant progress for pseudo-Boolean optimization problems in the last 25 years, only sporadic theoretical results exist on how EAs solve permutation-based problems. To overcome the lack of permutation-based benchmark problems, we propose a general way to transfer the classic pseudo-Boolean benchmarks into benchmarks defined on sets of permutations. We then conduct a rigorous runtime analysis of the permutation-based $(1+1)$ EA proposed by Scharnow, Tinnefeld, and Wegener (2004) on the analogues of the LeadingOnes and Jump benchmarks. The latter shows that, different from bit-strings, it is not only the Hamming distance that determines how difficult it is to mutate a permutation $\sigma$ into another one $\tau$, but also the precise cycle structure of $\sigma \tau^{-1}$. For this reason, we also regard the more symmetric scramble mutation operator. We observe that it not only leads to simpler proofs, but also reduces the runtime on jump functions with odd jump size by a factor of $\Theta(n)$. Finally, we show that a heavy-tailed version of the scramble operator, as in the bit-string case, leads to a speed-up of order $m^{\Theta(m)}$ on jump functions with jump size $m$. A short empirical analysis confirms these findings, but also reveals that small implementation details like the rate of void mutations can make an important difference.	翻訳日:2024-04-24 01:41:46 公開日:2024-04-20
# ベスト賞の選考 Selection of the Most Probable Best ( http://arxiv.org/abs/2207.07533v2 ) ライセンス: Link先を確認	Taeho Kim, Kyoung-kuk Kim, Eunhye Song,	(参考訳) 予測値ランキングと選択(R&S)問題では,すべてのk解のシミュレーション出力が,分布によって不確実性をモデル化可能な共通パラメータに依存する。パラメータが有限である場合にMPBを学習するための効率的な逐次サンプリングアルゴリズムを設計し,最適である確率が最も高い解として,最も確率の高いベスト(MPB)を定義する。我々はMPBを誤って選択する確率の大きな偏差率を導出し、最適な計算予算割当問題を定式化し、速度最大化の静的サンプリング比を求める。その後、問題は緩和され、検証するために解釈可能で計算的に効率的である最適条件の集合が得られる。最適化条件における未知の手段をその推定値に置き換えるアルゴリズムを考案し,シミュレーション予算が増加するにつれて,アルゴリズムのサンプリング比が条件を満たすことを証明した。さらに, 平均推定にカーネルリッジレグレッションを適用し, 同じ漸近収束結果を得ることにより, アルゴリズムの実証性能を著しく向上できることを示す。これらのアルゴリズムは、最先端の文脈R&Sアルゴリズムとベンチマークされ、経験的性能が優れていることを示した。 We consider an expected-value ranking and selection (R&S) problem where all k solutions' simulation outputs depend on a common parameter whose uncertainty can be modeled by a distribution. We define the most probable best (MPB) to be the solution that has the largest probability of being optimal with respect to the distribution and design an efficient sequential sampling algorithm to learn the MPB when the parameter has a finite support. We derive the large deviations rate of the probability of falsely selecting the MPB and formulate an optimal computing budget allocation problem to find the rate-maximizing static sampling ratios. The problem is then relaxed to obtain a set of optimality conditions that are interpretable and computationally efficient to verify. We devise a series of algorithms that replace the unknown means in the optimality conditions with their estimates and prove the algorithms' sampling ratios achieve the conditions as the simulation budget increases. Furthermore, we show that the empirical performances of the algorithms can be significantly improved by adopting the kernel ridge regression for mean estimation while achieving the same asymptotic convergence results. The algorithms are benchmarked against a state-of-the-art contextual R&S algorithm and demonstrated to have superior empirical performances.	翻訳日:2024-04-24 01:41:46 公開日:2024-04-20
# DeepVARwT:トレンド付きVARモデルのディープラーニング DeepVARwT: Deep Learning for a VAR Model with Trend ( http://arxiv.org/abs/2209.10587v4 ) ライセンス: Link先を確認	Xixi Li, Jingsong Yuan,	(参考訳) ベクトル自己回帰(VAR)モデルは、複数の時系列間の依存を記述するために使われてきた。これは定常時系列のモデルであり、各系列に決定論的傾向が存在するように拡張することができる。 VARモデルに適合する前に、データをパラメトリックまたは非パラメトリックに遅延すると、後半部ではより多くのエラーが発生する。本研究では,DeepVARwTと呼ばれる新しい手法を提案する。この手法は,トレンドと依存構造を同時に最大に推定する深層学習手法を用いている。この目的のためにLong Short-Term Memory (LSTM) ネットワークが使用される。モデルの安定性を確保するため、Ansley & Kohn (1986) の変換を用いて自己回帰係数の因果条件を適用する。シミュレーション研究と実データへの適用について述べる。本研究では,実データから生成した現実的傾向関数を用いて,実関数/パラメータ値と比較する。実データアプリケーションでは,本モデルの予測性能を文献の最先端モデルと比較する。 The vector autoregressive (VAR) model has been used to describe the dependence within and across multiple time series. This is a model for stationary time series which can be extended to allow the presence of a deterministic trend in each series. Detrending the data either parametrically or nonparametrically before fitting the VAR model gives rise to more errors in the latter part. In this study, we propose a new approach called DeepVARwT that employs deep learning methodology for maximum likelihood estimation of the trend and the dependence structure at the same time. A Long Short-Term Memory (LSTM) network is used for this purpose. To ensure the stability of the model, we enforce the causality condition on the autoregressive coefficients using the transformation of Ansley & Kohn (1986). We provide a simulation study and an application to real data. In the simulation study, we use realistic trend functions generated from real data and compare the estimates with true function/parameter values. In the real data application, we compare the prediction performance of this model with state-of-the-art models in the literature.	翻訳日:2024-04-24 01:41:46 公開日:2024-04-20
# PTDE:マルチエージェント強化学習のための拡張実行による個人化訓練 PTDE: Personalized Training with Distilled Execution for Multi-Agent Reinforcement Learning ( http://arxiv.org/abs/2210.08872v2 ) ライセンス: Link先を確認	Yiqun Chen, Hangyu Mao, Jiaxin Mao, Shiguang Wu, Tianle Zhang, Bin Zhang, Wei Yang, Hongxing Chang,	(参考訳) 分散実行による集中訓練(CTDE)は、多エージェント強化学習において広く採用されているパラダイムとして現れ、Q$-function(英語版)や集中的批判(英語版)を学習するためのグローバル情報の利用を強調している。対照的に、調査ではグローバルな情報を活用して、個別の$Q$関数や個々のアクターを直接強化しています。特に,全てのエージェントに対して同一のグローバル情報を普遍的に適用することは,最適な性能を示すには不十分であることが判明した。その結果、各エージェントに合わせたグローバル情報のカスタマイズを提唱し、総合的なパフォーマンスを高めるためにエージェント個人化されたグローバル情報を作成する。さらに,エージェント個人化されたグローバル情報をエージェントのローカル情報に蒸留するPTDE(Personalized Training with Distilled Execution)という新しいパラダイムを導入する。この蒸留された情報は、分散実行中に利用され、性能劣化を最小限に抑える。 PTDEは最先端のアルゴリズムとシームレスに統合できるため、SMACベンチマーク、Google Research Football(GRF)ベンチマーク、Learning to Rank(LTR)タスクなど、さまざまなベンチマークで注目すべきパフォーマンス向上を実現している。 Centralized Training with Decentralized Execution (CTDE) has emerged as a widely adopted paradigm in multi-agent reinforcement learning, emphasizing the utilization of global information for learning an enhanced joint $Q$-function or centralized critic. In contrast, our investigation delves into harnessing global information to directly enhance individual $Q$-functions or individual actors. Notably, we discover that applying identical global information universally across all agents proves insufficient for optimal performance. Consequently, we advocate for the customization of global information tailored to each agent, creating agent-personalized global information to bolster overall performance. Furthermore, we introduce a novel paradigm named Personalized Training with Distilled Execution (PTDE), wherein agent-personalized global information is distilled into the agent's local information. This distilled information is then utilized during decentralized execution, resulting in minimal performance degradation. PTDE can be seamlessly integrated with state-of-the-art algorithms, leading to notable performance enhancements across diverse benchmarks, including the SMAC benchmark, Google Research Football (GRF) benchmark, and Learning to Rank (LTR) task.	翻訳日:2024-04-24 01:41:46 公開日:2024-04-20
# OpenPack:IoT対応のロジスティック環境におけるパッケージ作業認識のための大規模データセット OpenPack: A Large-scale Dataset for Recognizing Packaging Works in IoT-enabled Logistic Environments ( http://arxiv.org/abs/2212.11152v2 ) ライセンス: Link先を確認	Naoya Yoshimura, Jaime Morales, Takuya Maekawa, Takahiro Hara,	(参考訳) ヒトの日常活動とは異なり、産業領域における業務活動認識のための既存のセンサデータセットは、産業現場との密接なコラボレーションが必要なため、現実的なデータ収集の困難さによって制限されている。これはまた、産業応用のための方法の研究と開発を制限している。そこで本研究では,これらの課題に対処し,産業領域における作業活動の機械的認識に関する研究に寄与するため,OpenPackと呼ばれる大規模な作業認識データセットを新たに導入する。 OpenPackには、加速度データ、キーポイント、深度画像、IoT対応デバイス(例えばハンドヘルドバーコードスキャナー)からの読み取りを含む53.8時間のマルチモーダルセンサーデータが含まれており、パッケージング作業経験の異なる16の被験者から収集されている。本研究では,現在最先端の人間活動認識技術をデータセットに適用し,この結果に基づいて,広汎なコンピューティングコミュニティにおける複雑な作業活動認識研究の今後の方向性を示す。 OpenPackは、困難なタスクを提供することで、センサベースのアクション/アクティビティ認識コミュニティに貢献すると考えています。 OpenPackデータセットはhttps://open-pack.github.io.comで公開されている。 Unlike human daily activities, existing publicly available sensor datasets for work activity recognition in industrial domains are limited by difficulties in collecting realistic data as close collaboration with industrial sites is required. This also limits research on and development of methods for industrial applications. To address these challenges and contribute to research on machine recognition of work activities in industrial domains, in this study, we introduce a new large-scale dataset for packaging work recognition called OpenPack. OpenPack contains 53.8 hours of multimodal sensor data, including acceleration data, keypoints, depth images, and readings from IoT-enabled devices (e.g., handheld barcode scanners), collected from 16 distinct subjects with different levels of packaging work experience. We apply state-of-the-art human activity recognition techniques to the dataset and provide future directions of complex work activity recognition studies in the pervasive computing community based on the results. We believe that OpenPack will contribute to the sensor-based action/activity recognition community by providing challenging tasks. The OpenPack dataset is available at https://open-pack.github.io.	翻訳日:2024-04-24 01:41:46 公開日:2024-04-20
# リニア・オプティカル・トランスポート・エンベディング Linear Optimal Partial Transport Embedding ( http://arxiv.org/abs/2302.03232v4 ) ライセンス: Link先を確認	Yikun Bai, Ivan Medri, Rocio Diaz Martin, Rana Muhammad Shahroz Khan, Soheil Kolouri,	(参考訳) 最適トランスポート(OT)は、機械学習、統計処理、信号処理など様々な分野で応用されている。しかし、バランスの取れた質量要件は、実用上の問題においてその性能を制限している。これらの制限に対処するため、不均衡なOT、最適部分輸送(OPT)、Hellinger Kantorovich(HK)を含むOT問題の変種が提案されている。本稿では,OTおよびHK上の(局所的な)線形化手法をOPT問題に拡張したリニア最適部分輸送(LOPT)埋め込みを提案する。提案手法は,2組の正測度間のOPT距離の計算を高速化する。理論的な貢献に加えて,ポイントクラウド補間およびPCA解析におけるLOPT埋め込み手法の実証を行った。 Optimal transport (OT) has gained popularity due to its various applications in fields such as machine learning, statistics, and signal processing. However, the balanced mass requirement limits its performance in practical problems. To address these limitations, variants of the OT problem, including unbalanced OT, Optimal partial transport (OPT), and Hellinger Kantorovich (HK), have been proposed. In this paper, we propose the Linear optimal partial transport (LOPT) embedding, which extends the (local) linearization technique on OT and HK to the OPT problem. The proposed embedding allows for faster computation of OPT distance between pairs of positive measures. Besides our theoretical contributions, we demonstrate the LOPT embedding technique in point-cloud interpolation and PCA analysis.	翻訳日:2024-04-24 01:32:01 公開日:2024-04-20
# インスタンスアソシエーションの展開:オーディオ・ビジュアル・セグメンテーションの概観 Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation ( http://arxiv.org/abs/2304.02970v6 ) ライセンス: Link先を確認	Yuanhong Chen, Yuyuan Liu, Hu Wang, Fengbei Liu, Chong Wang, Helen Frazer, Gustavo Carneiro,	(参考訳) 音声視覚セグメント化(AVS)は、音声視覚キューに基づいて、正確に音を分割する作業である。音声・視覚学習の有効性は、音と視覚オブジェクトの正確な相互アライメントの実現に大きく依存する。健全な視覚学習には2つの重要な要素が必要である。 1)高品質な画素レベルのマルチクラスアノテート画像とオーディオファイルに関連付けられた課題データセット 2)音声情報とそれに対応する視覚オブジェクトとの強いつながりを確立できるモデル。しかしながら、これらの要件は、偏りのあるオーディオ視覚データを含むトレーニングセットや、偏りのあるトレーニングセットをはるかに越えたモデルなど、現在の手法によって部分的に解決されているだけである。本研究では,難易度と比較的偏りのない高画質な視覚的セグメンテーション・ベンチマークを構築するための費用対効果の新たな手法を提案する。また,音声・視覚指導型コントラスト学習のための新たな情報的サンプルマイニング手法を提案し,識別的コントラスト的サンプルを利用してモーダル間理解を実現する。ベンチマークの有効性を示す実験結果を示す。さらに,既存のAVSデータセットおよび新しいベンチマークを用いて行った実験により,本手法が最先端(SOTA)セグメンテーション精度を実現することを示す。 Audio-visual segmentation (AVS) is a challenging task that involves accurately segmenting sounding objects based on audio-visual cues. The effectiveness of audio-visual learning critically depends on achieving accurate cross-modal alignment between sound and visual objects. Successful audio-visual learning requires two essential components: 1) a challenging dataset with high-quality pixel-level multi-class annotated images associated with audio files, and 2) a model that can establish strong links between audio information and its corresponding visual object. However, these requirements are only partially addressed by current methods, with training sets containing biased audio-visual data, and models that generalise poorly beyond this biased training set. In this work, we propose a new cost-effective strategy to build challenging and relatively unbiased high-quality audio-visual segmentation benchmarks. We also propose a new informative sample mining method for audio-visual supervised contrastive learning to leverage discriminative contrastive samples to enforce cross-modal understanding. We show empirical results that demonstrate the effectiveness of our benchmark. Furthermore, experiments conducted on existing AVS datasets and on our new benchmark show that our method achieves state-of-the-art (SOTA) segmentation accuracy.	翻訳日:2024-04-24 01:32:01 公開日:2024-04-20
# CAFIN: グラフ上での教師なし表現学習のためのインプロセッシングによる中心性意識の公平性 CAFIN: Centrality Aware Fairness inducing IN-processing for Unsupervised Representation Learning on Graphs ( http://arxiv.org/abs/2304.04391v3 ) ライセンス: Link先を確認	Arvindh Arun, Aakash Aanegola, Amul Agrawal, Ramasuri Narayanam, Ponnurangam Kumaraguru,	(参考訳) グラフ上での教師なし表現学習は、乱れのないネットワークデータの増大と、生成された表現のコンパクトさ、豊かさ、有用性により、勢いを増している。この文脈では、表現の生成中に公平さとバイアスの制約を考慮する必要性が十分に動機付けられ、先行研究である程度研究されている。この設定における以前の研究の大きな制限の1つは、ノード間の不均等なパフォーマンスをもたらす様々なノード中心性など、グラフ内の接続パターンによって生じるバイアスに対処することを目的としていないことである。本研究は,教師なし環境でのグラフ構造によるバイアス軽減の問題に対処することを目的としている。この目的のために我々は,グラフの構造情報を活用し,既存のフレームワークが生成した表現をチューニングする中心性を考慮したフェアネス誘導フレームワークであるCAFINを提案する。 GraphSAGE(このドメインで人気のあるフレームワーク)にデプロイし、ノード分類とリンク予測という2つの下流タスクで有効性を示します。実証的には、CAFINは、さまざまなドメインからの一般的なデータセット(18から80%のパフォーマンス格差の削減)間のパフォーマンス格差を一貫して低減します。 Unsupervised Representation Learning on graphs is gaining traction due to the increasing abundance of unlabelled network data and the compactness, richness, and usefulness of the representations generated. In this context, the need to consider fairness and bias constraints while generating the representations has been well-motivated and studied to some extent in prior works. One major limitation of most of the prior works in this setting is that they do not aim to address the bias generated due to connectivity patterns in the graphs, such as varied node centrality, which leads to a disproportionate performance across nodes. In our work, we aim to address this issue of mitigating bias due to inherent graph structure in an unsupervised setting. To this end, we propose CAFIN, a centrality-aware fairness-inducing framework that leverages the structural information of graphs to tune the representations generated by existing frameworks. We deploy it on GraphSAGE (a popular framework in this domain) and showcase its efficacy on two downstream tasks - Node Classification and Link Prediction. Empirically, CAFIN consistently reduces the performance disparity across popular datasets (varying from 18 to 80% reduction in performance disparity) from various domains while incurring only a minimal cost of fairness.	翻訳日:2024-04-24 01:32:01 公開日:2024-04-20
# SPIRiT-Diffusion: 加速度MRIのための自己整合駆動拡散モデル SPIRiT-Diffusion: Self-Consistency Driven Diffusion Model for Accelerated MRI ( http://arxiv.org/abs/2304.05060v2 ) ライセンス: Link先を確認	Zhuo-Xu Cui, Chentao Cao, Yue Wang, Sen Jia, Jing Cheng, Xin Liu, Hairong Zheng, Dong Liang, Yanjie Zhu,	(参考訳) 拡散モデルは画像生成の指導的手法として登場し、磁気共鳴画像再構成(MRI)の領域で成功している。しかし、拡散モデルに基づく既存の再構成法は主に画像領域で定式化されており、コイル感度マップ(CSM)における不正確性に影響を受けやすい。 k空間補間法はこの問題に効果的に対処できるが、従来の拡散モデルはk空間補間では容易に適用できない。この課題を克服するために,反復自己整合SPIRiT法に着想を得たk空間補間拡散モデルであるSPIRiT-Diffusionを提案する。具体的には、SPIRiTにおける自己整合項(k-空間物理先行項)の反復解法を用いて、拡散過程を管理する新しい確率微分方程式(SDE)を定式化する。その後、拡散処理を行うことでk空間データを補間することができる。この革新的なアプローチは、拡散モデルにおいてSDEを設計する際の最適化モデルの役割を強調し、拡散プロセスは、モデル駆動拡散と呼ばれる概念である最適化モデルに固有の物理学と密に一致させることができる。頭蓋内3次元画像と頸動脈壁画像を用いたSPIRiT-Diffusion法について検討した。その結果, 画像領域再構築法よりも精度が高く, 10。 Diffusion models have emerged as a leading methodology for image generation and have proven successful in the realm of magnetic resonance imaging (MRI) reconstruction. However, existing reconstruction methods based on diffusion models are primarily formulated in the image domain, making the reconstruction quality susceptible to inaccuracies in coil sensitivity maps (CSMs). k-space interpolation methods can effectively address this issue but conventional diffusion models are not readily applicable in k-space interpolation. To overcome this challenge, we introduce a novel approach called SPIRiT-Diffusion, which is a diffusion model for k-space interpolation inspired by the iterative self-consistent SPIRiT method. Specifically, we utilize the iterative solver of the self-consistent term (i.e., k-space physical prior) in SPIRiT to formulate a novel stochastic differential equation (SDE) governing the diffusion process. Subsequently, k-space data can be interpolated by executing the diffusion process. This innovative approach highlights the optimization model's role in designing the SDE in diffusion models, enabling the diffusion process to align closely with the physics inherent in the optimization model, a concept referred to as model-driven diffusion. We evaluated the proposed SPIRiT-Diffusion method using a 3D joint intracranial and carotid vessel wall imaging dataset. The results convincingly demonstrate its superiority over image-domain reconstruction methods, achieving high reconstruction quality even at a substantial acceleration rate of 10.	翻訳日:2024-04-24 01:32:01 公開日:2024-04-20
# 2体リニアキックロータシステムにおけるカオスと局部位相 Chaos and localized phases in a two-body linear kicked rotor system ( http://arxiv.org/abs/2304.08899v2 ) ライセンス: Link先を確認	Anjali Nambudiripad, J. Bharathi Kannan, M. S. Santhanam,	(参考訳) 周期的なキックにもかかわらず、リニアキックドローター(LKR)は、運動エネルギー項が運動量で線形である積分可能かつ正確に解けるモデルである。最近、空間的に相互作用するLKRも積分可能であることが示され、対応する量子状態における動的局在が得られた。同様の局所化位相は、連結相対論的キックローターのような他の非可積分モデルにも存在する。この研究は2体LKRを用いて2つの主要な結果を示し、第一に、ローターのモータ間の相互作用を通じて、積分可能なリニアキックローターにカオスが引き起こされることを示した。リアプノフ指数の分析的推定値を得る。第二に、このカオスモデルの量子力学は、蹴りの強さと相互作用の強さの変化によって、古典的に誘導された局在化、動的局在化、部分拡散および拡散相といった様々な相を示すことが示されている。本システムにおける絡み合い生産の観点から,これらの位相のシグネチャを指摘する。有効ヒルベルト空間次元を定義することにより、絡み合う成長速度を適切なランダム行列平均を用いて理解することができる。 Despite the periodic kicks, a linear kicked rotor (LKR) is an integrable and exactly solvable model in which the kinetic energy term is linear in momentum. It was recently shown that spatially interacting LKRs are also integrable, and results in dynamical localization in the corresponding quantum regime. Similar localized phases exist in other non-integrable models such as the coupled relativistic kicked rotors. This work, using a two-body LKR, demonstrates two main results; firstly, it is shown that chaos can be induced in the integrable linear kicked rotor through interactions between the momenta of rotors. An analytical estimate of its Lyapunov exponent is obtained. Secondly, the quantum dynamics of this chaotic model, upon variation of kicking and interaction strengths, is shown to exhibit a variety of phases -- classically induced localization, dynamical localization, subdiffusive and diffusive phases. We point out the signatures of these phases from the perspective of entanglement production in this system. By defining an effective Hilbert space dimension, the entanglement growth rate can be understood using appropriate random matrix averages.	翻訳日:2024-04-24 01:32:01 公開日:2024-04-20
# Curious Rhythms: ウィキペディア消費の時間的規則性 Curious Rhythms: Temporal Regularities of Wikipedia Consumption ( http://arxiv.org/abs/2305.09497v3 ) ライセンス: Link先を確認	Tiziano Piccardi, Martin Gerlach, Robert West,	(参考訳) ウィキペディアは世界最大の百科事典として、幅広い情報ニーズに対応している。以前の研究では、ウィキペディア利用者の情報は1日を通して異なることが指摘されていたが、現在までに基礎となる力学の大規模かつ定量的な研究は行われていない。本論文は,英語ウィキペディアのサーバログから抽出した数十億件のタイムゾーン補正ページ要求を大規模に分析し,その状況と時間が消費情報の種類とどのように関連しているかを調査することによって,このギャップを埋めるものである。まず, 日中交替のグローバルなパターンを除去したとしても, 個々の物品の消費習慣が日中変化を強く維持していることを示す。そこで,本研究では,夜間に好まれる記事と就労時間に好まれる記事とを特に区別し,消費パターンの原型的形状を特徴付ける。最後に、ウィキペディアの記事のアクセスリズムの話題的・文脈的相関について検討し、記事の話題、読者国、アクセスデバイス(モバイル対デスクトップ)が日々の注意パターンの重要な予測因子であることを示す。これらの発見は、人間がウェブ上で情報を求める方法に新たな光を当て、ウィキペディアを知識と学習のための最大のオープンプラットフォームの一つとして焦点を合わせ、ウィキペディアが情報のニーズを満たすリッチな知識基盤としての役割を一日を通じて強調し、世界中の情報を探究する情報を理解し、適切な情報システムの設計に意味があることを強調した。 Wikipedia, in its role as the world's largest encyclopedia, serves a broad range of information needs. Although previous studies have noted that Wikipedia users' information needs vary throughout the day, there is to date no large-scale, quantitative study of the underlying dynamics. The present paper fills this gap by investigating temporal regularities in daily consumption patterns in a large-scale analysis of billions of timezone-corrected page requests mined from English Wikipedia's server logs, with the goal of investigating how context and time relate to the kind of information consumed. First, we show that even after removing the global pattern of day-night alternation, the consumption habits of individual articles maintain strong diurnal regularities. Then, we characterize the prototypical shapes of consumption patterns, finding a particularly strong distinction between articles preferred during the evening/night and articles preferred during working hours. Finally, we investigate topical and contextual correlates of Wikipedia articles' access rhythms, finding that article topic, reader country, and access device (mobile vs. desktop) are all important predictors of daily attention patterns. These findings shed new light on how humans seek information on the Web by focusing on Wikipedia as one of the largest open platforms for knowledge and learning, emphasizing Wikipedia's role as a rich knowledge base that fulfills information needs spread throughout the day, with implications for understanding information seeking across the globe and for designing appropriate information systems.	翻訳日:2024-04-24 01:22:08 公開日:2024-04-20
# ニューラルネットワーク伝搬デコーダの一般化境界 Generalization Bounds for Neural Belief Propagation Decoders ( http://arxiv.org/abs/2305.10540v2 ) ライセンス: Link先を確認	Sudarshan Adiga, Xin Xiao, Ravi Tandon, Bane Vasic, Tamal Bose,	(参考訳) 機械学習ベースのアプローチは、次世代通信システムのためのデコーダの設計にますます使われている。広く使われているフレームワークの1つは、信念伝播(NBP)であり、このフレームワークは、信念伝播(BP)イテレーションをディープニューラルネットワークに展開し、パラメータはデータ駆動方式で訓練される。 NBPデコーダは古典的復号アルゴリズムを改善することが示されている。本稿では, NBPデコーダの一般化機能について検討する。具体的には、デコーダの一般化ギャップは、経験的ビットエラーレートと期待ビットエラーレートの差である。このギャップを埋めて、コードパラメータ(ブロック長、メッセージ長、変数/チェックノード次数)、復号化イテレーション、トレーニングデータセットサイズなど、デコーダの複雑さに依存することを示す新たな理論的結果を示す。通常のパリティチェック行列と不規則なパリティチェック行列の両方について結果が提示される。我々の知る限りでは、ニューラルネットワークに基づくデコーダの一般化性能に関する最初の理論的結果である。本稿では,トレーニングデータセットサイズに対する一般化ギャップの依存性を示す実験結果と,異なるコードに対する復号化の繰り返しを示す。 Machine learning based approaches are being increasingly used for designing decoders for next generation communication systems. One widely used framework is neural belief propagation (NBP), which unfolds the belief propagation (BP) iterations into a deep neural network and the parameters are trained in a data-driven manner. NBP decoders have been shown to improve upon classical decoding algorithms. In this paper, we investigate the generalization capabilities of NBP decoders. Specifically, the generalization gap of a decoder is the difference between empirical and expected bit-error-rate(s). We present new theoretical results which bound this gap and show the dependence on the decoder complexity, in terms of code parameters (blocklength, message length, variable/check node degrees), decoding iterations, and the training dataset size. Results are presented for both regular and irregular parity-check matrices. To the best of our knowledge, this is the first set of theoretical results on generalization performance of neural network based decoders. We present experimental results to show the dependence of generalization gap on the training dataset size, and decoding iterations for different codes.	翻訳日:2024-04-24 01:22:08 公開日:2024-04-20
# OER: 継続的なオフライン強化学習のためのオフライン体験リプレイ OER: Offline Experience Replay for Continual Offline Reinforcement Learning ( http://arxiv.org/abs/2305.13804v2 ) ライセンス: Link先を確認	Sibo Gai, Donglin Wang, Li He,	(参考訳) エージェントには、事前にコンパイルされたオフラインデータセットのシーケンスを通じて、新たなスキルを継続的に学習する能力が望まれる。しかし、一連のオフラインタスクを連続的に学習することは、リソース制限されたシナリオ下での破滅的な忘れの問題につながる可能性が高い。本稿では、エージェントが一連のオフライン強化学習タスクを学習し、全ての連続タスクの環境を探索することなく、小さなリプレイバッファで全ての学習タスクの性能を追求する、新しい設定である連続オフライン強化学習(CORL)を定式化する。すべてのシーケンシャルなタスクについて一貫して学習するためには、エージェントは新しい知識を取得し、一方、古い知識をオフラインで保存する必要がある。この目的のために,我々は連続学習アルゴリズムを導入し,CORL問題の最も適切なアルゴリズムとして経験再生(ER)を実験的に発見した。しかし、CORLにERを導入すると、リプレイバッファにおける経験と学習ポリシーからの軌跡とのミスマッチという、新しい分散シフト問題が発生することが観察された。このような問題に対処するために、リプレイバッファを構築するための新しいモデルベースエクスペリエンスセレクション(MBES)方式を提案し、そこで遷移モデルを学習して状態分布を近似する。このモデルは、記憶のための学習モデルに最も近いオフラインデータからデータをフィルタリングすることで、リプレイバッファと学習モデルの間の分布バイアスをブリッジするために使用される。さらに,新しいタスクを学習する能力を高めるために,新しい二重行動クローニング(DBC)アーキテクチャを用いて経験再現手法を再構成し,Q-ラーニングプロセスにおける行動閉鎖の障害を回避する。一般に、アルゴリズムをオフライン体験再生(OER)と呼ぶ。広汎な実験により,OER法は広く使用されているムジョコ環境においてSOTAのベースラインを上回っていることが示された。 The capability of continuously learning new skills via a sequence of pre-collected offline datasets is desired for an agent. However, consecutively learning a sequence of offline tasks likely leads to the catastrophic forgetting issue under resource-limited scenarios. In this paper, we formulate a new setting, continual offline reinforcement learning (CORL), where an agent learns a sequence of offline reinforcement learning tasks and pursues good performance on all learned tasks with a small replay buffer without exploring any of the environments of all the sequential tasks. For consistently learning on all sequential tasks, an agent requires acquiring new knowledge and meanwhile preserving old knowledge in an offline manner. To this end, we introduced continual learning algorithms and experimentally found experience replay (ER) to be the most suitable algorithm for the CORL problem. However, we observe that introducing ER into CORL encounters a new distribution shift problem: the mismatch between the experiences in the replay buffer and trajectories from the learned policy. To address such an issue, we propose a new model-based experience selection (MBES) scheme to build the replay buffer, where a transition model is learned to approximate the state distribution. This model is used to bridge the distribution bias between the replay buffer and the learned model by filtering the data from offline data that most closely resembles the learned model for storage. Moreover, in order to enhance the ability on learning new tasks, we retrofit the experience replay method with a new dual behavior cloning (DBC) architecture to avoid the disturbance of behavior-cloning loss on the Q-learning process. In general, we call our algorithm offline experience replay (OER). Extensive experiments demonstrate that our OER method outperforms SOTA baselines in widely-used Mujoco environments.	翻訳日:2024-04-24 01:22:08 公開日:2024-04-20
# leftover Lunch: 言語モデルのためのアドバンテージベースのオフライン強化学習 Leftover Lunch: Advantage-based Offline Reinforcement Learning for Language Models ( http://arxiv.org/abs/2305.14718v5 ) ライセンス: Link先を確認	Ashutosh Baheti, Ximing Lu, Faeze Brahman, Ronan Le Bras, Maarten Sap, Mark Riedl,	(参考訳) RLHF(Reinforcement Learning with Human Feedback)は、言語モデル(LM)アライメントの最も顕著な手法である。しかし、RLHFは不安定でデータハングリーなプロセスであり、微調整のために新しい高品質なLM生成データを必要とする。本稿では,既存のデータに対するRLトレーニングを可能にするオフラインポリシー勾配アルゴリズムであるAdvantage-Leftover Lunch RL (A-LoL)を紹介する。 LM出力シーケンス全体を単一のアクションとして仮定することで、A-LoLはシーケンスレベルの分類器や人間設計のスコアリング機能を報酬として組み込むことができる。その後、LMの値の推定値を使用することで、A-LoLは正の優位性(左上)のデータポイントのみを訓練し、ノイズに耐性を持たせる。全体として、A-LoLは実装が容易で、サンプル効率が高く、安定したLMトレーニングレシピである。 A-LoLとその変種の有効性を4つの異なる言語生成タスクで示す。オンラインRL(PPO)と最近のRL(DPO, PRO)とオフラインRL(GOLD)を比較した。一般的に使用されているRLHFベンチマークであるHelpful and Harmless Assistant (HHA)では、A-LoLメソッドで訓練されたLMは、人間によるベースラインよりも安全で役に立つと評価されている。さらに、残りの3つのタスクでは、A-LoLはノイズや準最適トレーニングデータを使用しても、複数の異なる報酬関数を最適化することができた。実験コードもリリースしています。 https://github.com/abaheti95/LoL-RL Reinforcement Learning with Human Feedback (RLHF) is the most prominent method for Language Model (LM) alignment. However, RLHF is an unstable and data-hungry process that continually requires new high-quality LM-generated data for finetuning. We introduce Advantage-Leftover Lunch RL (A-LoL), a new class of offline policy gradient algorithms that enable RL training on any pre-existing data. By assuming the entire LM output sequence as a single action, A-LoL allows incorporating sequence-level classifiers or human-designed scoring functions as rewards. Subsequently, by using LM's value estimate, A-LoL only trains on positive advantage (leftover) data points, making it resilient to noise. Overall, A-LoL is an easy-to-implement, sample-efficient, and stable LM training recipe. We demonstrate the effectiveness of A-LoL and its variants with a set of four different language generation tasks. We compare against both online RL (PPO) and recent preference-based (DPO, PRO) and reward-based (GOLD) offline RL baselines. On the commonly-used RLHF benchmark, Helpful and Harmless Assistant (HHA), LMs trained with A-LoL methods achieve the highest diversity while also being rated more safe and helpful than the baselines according to humans. Additionally, in the remaining three tasks, A-LoL could optimize multiple distinct reward functions even when using noisy or suboptimal training data. We also release our experimental code. https://github.com/abaheti95/LoL-RL	翻訳日:2024-04-24 01:22:08 公開日:2024-04-20
# NLPモデルのドメインシフトに対するロバスト性の測定 Measuring the Robustness of NLP Models to Domain Shifts ( http://arxiv.org/abs/2306.00168v5 ) ライセンス: Link先を確認	Nitay Calderon, Naveh Porat, Eyal Ben-David, Alexander Chapanin, Zorik Gekhman, Nadav Oved, Vitaly Shalumov, Roi Reichart,	(参考訳) ドメインロバストネス(DR)に関する既存の研究は、異なる設定、限られたタスクの多様性、コンテキスト内学習のような最近の能力に関する研究が不足している。さらに、DR測定の一般的な実践は、完全には正確ではないかもしれない。現在の研究は、チャレンジセットに焦点を当て、ソースドロップ(SD: Source Drop)のみに依存している。しかし、ドメイン内パフォーマンスの劣化を測定するターゲットドロップ(TD)は相補的な視点として使うべきであると論じる。これらの問題に対処するため、まず7つの異なるNLPタスクからなるDRベンチマークを算出し、SDとTDの両方を計測した。そこで我々は,21種類の微調整モデルと少ショットLLMを14,000以上のドメインシフトを含む大規模DR研究を行った。両方のモデルタイプがドメインシフト時にドロップに悩まされることがわかりました。微調整のモデルはドメイン内では優れているが、少数ショットのLLMはドメインを超越し、ロバスト性が向上する。さらに、真のDRチャレンジよりも難しいドメインにシフトすることで、大きなSDをしばしば説明できることがわかり、これは相補的なメトリックとしてのTDの重要性を強調している。我々の研究は、NLPモデルの現在のDR状態に光を当て、より堅牢なモデルに対する評価プラクティスの改善を促進することを願っている。 Existing research on Domain Robustness (DR) suffers from disparate setups, limited task variety, and scarce research on recent capabilities such as in-context learning. Furthermore, the common practice of measuring DR might not be fully accurate. Current research focuses on challenge sets and relies solely on the Source Drop (SD): Using the source in-domain performance as a reference point for degradation. However, we argue that the Target Drop (TD), which measures degradation from the target in-domain performance, should be used as a complementary point of view. To address these issues, we first curated a DR benchmark comprised of 7 diverse NLP tasks, which enabled us to measure both the SD and the TD. We then conducted a comprehensive large-scale DR study involving over 14,000 domain shifts across 21 fine-tuned models and few-shot LLMs. We found that both model types suffer from drops upon domain shifts. While fine-tuned models excel in-domain, few-shot LLMs often surpass them cross-domain, showing better robustness. In addition, we found that a large SD can often be explained by shifting to a harder domain rather than by a genuine DR challenge, and this highlights the importance of TD as a complementary metric. We hope our study will shed light on the current DR state of NLP models and promote improved evaluation practices toward more robust models.	翻訳日:2024-04-24 01:12:24 公開日:2024-04-20
# 合成データを用いたレアカメラビューにおける2次元人物位置推定の改善 Improving 2D Human Pose Estimation in Rare Camera Views with Synthetic Data ( http://arxiv.org/abs/2307.06737v2 ) ライセンス: Link先を確認	Miroslav Purkrabek, Jiri Matas,	(参考訳) 人間のポーズ推定のための方法とデータセットは、主にサイドビューとフロントビューのシナリオに焦点を当てている。合成データを活用することで限界を克服し、ポーズとビューを包括的に制御したSMPLベースの合成人間を生成するRePoGen(RarE POses GENerator)を導入する。トップビューデータセットの実験と、さまざまなポーズを持つ実画像の新しいデータセットにより、COCOデータセットにRePoGenデータを追加することは、一般的なビューのパフォーマンスを損なうことなく、トップビューとボトムビューのポーズ推定に対する以前のアプローチより優れていることが示されている。アブレーション研究は、解剖学的妥当性、特に先行研究は、効果的なパフォーマンスの前提条件ではないことを示している。導入されたデータセットと対応するコードはhttps://mirapurkrabek.github.io/RePoGen-paper/ で公開されている。 Methods and datasets for human pose estimation focus predominantly on side- and front-view scenarios. We overcome the limitation by leveraging synthetic data and introduce RePoGen (RarE POses GENerator), an SMPL-based method for generating synthetic humans with comprehensive control over pose and view. Experiments on top-view datasets and a new dataset of real images with diverse poses show that adding the RePoGen data to the COCO dataset outperforms previous approaches to top- and bottom-view pose estimation without harming performance on common views. An ablation study shows that anatomical plausibility, a property prior research focused on, is not a prerequisite for effective performance. The introduced dataset and the corresponding code are available on https://mirapurkrabek.github.io/RePoGen-paper/ .	翻訳日:2024-04-24 01:02:16 公開日:2024-04-20
# Prot2Text:GNNとトランスフォーマーを用いたマルチモーダルタンパク質の機能生成 Prot2Text: Multimodal Protein's Function Generation with GNNs and Transformers ( http://arxiv.org/abs/2307.14367v3 ) ライセンス: Link先を確認	Hadi Abdine, Michail Chatzianastasis, Costas Bouyioukos, Michalis Vazirgiannis,	(参考訳) 近年,様々な機械学習手法が開発され,タンパク質機能予測の分野で大きな進歩を遂げている。しかし、既存のほとんどの方法はタスクを多分類問題、すなわち事前に定義されたラベルをタンパク質に割り当てるものとして定式化している。本研究では,タンパク質の機能を自由テキスト形式で予測する新しいアプローチであるProt2Textを提案する。エンコーダ・デコーダフレームワークでグラフニューラルネットワーク(GNN)とLarge Language Models(LLM)を組み合わせることで,タンパク質配列や構造,テキストアノテーションや記述など,さまざまなデータタイプを効果的に統合する。このマルチモーダルアプローチはタンパク質の機能の全体的表現を可能にし、詳細で正確な機能記述の生成を可能にする。本モデルを評価するため,SwissProtからマルチモーダルタンパク質データセットを抽出し,Prot2Textの有効性を実証的に実証した。これらの結果は、マルチモーダルモデル、特にGNNとLLMの融合による変換的影響を強調し、研究者に、既存のタンパク質だけでなく、より正確な機能予測のための強力なツールを提供する。 In recent years, significant progress has been made in the field of protein function prediction with the development of various machine-learning approaches. However, most existing methods formulate the task as a multi-classification problem, i.e. assigning predefined labels to proteins. In this work, we propose a novel approach, Prot2Text, which predicts a protein's function in a free text style, moving beyond the conventional binary or categorical classifications. By combining Graph Neural Networks(GNNs) and Large Language Models(LLMs), in an encoder-decoder framework, our model effectively integrates diverse data types including protein sequence, structure, and textual annotation and description. This multimodal approach allows for a holistic representation of proteins' functions, enabling the generation of detailed and accurate functional descriptions. To evaluate our model, we extracted a multimodal protein dataset from SwissProt, and demonstrate empirically the effectiveness of Prot2Text. These results highlight the transformative impact of multimodal models, specifically the fusion of GNNs and LLMs, empowering researchers with powerful tools for more accurate function prediction of existing as well as first-to-see proteins.	翻訳日:2024-04-24 01:02:16 公開日:2024-04-20
# 言語モデルを用いた患者と臨床の整合性の検討 Matching Patients to Clinical Trials with Large Language Models ( http://arxiv.org/abs/2307.15051v3 ) ライセンス: Link先を確認	Qiao Jin, Zifeng Wang, Charalampos S. Floudas, Fangyuan Chen, Changlin Gong, Dara Bracken-Clarke, Elisabetta Xue, Yifan Yang, Jimeng Sun, Zhiyong Lu,	(参考訳) 臨床試験は、しばしば患者募集の課題によって妨げられる。本稿では,患者間マッチングを支援するLLMフレームワークであるTrialGPTを紹介する。患者注記が与えられた場合、TrialGPTは、患者が基準ごとの基準に基づいて適性を予測するとともに、これらの予測を統合して、対象の臨床試験に対する適性を評価する。公用コホート184例を対象に,TrialGPTの試験レベル予測性能について検討した。また,3名の医師に1,000名以上の患者基準ペアをラベル付けし,基準レベルの予測精度を評価した。実験の結果、TrialGPTは専門家のパフォーマンス(88.7%-90.0%)に近く、忠実な説明で87.3%の基準レベルの精度を達成した。集計されたTrialGPTスコアは、ヒトの適性判断と高い相関があり、最高の競争モデルを32.6%から57.2%で上回り、臨床試験を除外している。さらに,本研究により,TrialGPTは実生活における臨床試験マッチング作業において,スクリーニング時間(42.6%)を大幅に短縮できることが明らかとなった。これらの結果と分析により,TrialGPTなどのLSMとの臨床治験の機会が得られた。 Clinical trials are often hindered by the challenge of patient recruitment. In this work, we introduce TrialGPT, a first-of-its-kind large language model (LLM) framework to assist patient-to-trial matching. Given a patient note, TrialGPT predicts the patient's eligibility on a criterion-by-criterion basis and then consolidates these predictions to assess the patient's eligibility for the target trial. We evaluate the trial-level prediction performance of TrialGPT on three publicly available cohorts of 184 patients with over 18,000 trial annotations. We also engaged three physicians to label over 1,000 patient-criterion pairs to assess its criterion-level prediction accuracy. Experimental results show that TrialGPT achieves a criterion-level accuracy of 87.3% with faithful explanations, close to the expert performance (88.7%-90.0%). The aggregated TrialGPT scores are highly correlated with human eligibility judgments, and they outperform the best-competing models by 32.6% to 57.2% in ranking and excluding clinical trials. Furthermore, our user study reveals that TrialGPT can significantly reduce the screening time (by 42.6%) in a real-life clinical trial matching task. These results and analyses have demonstrated promising opportunities for clinical trial matching with LLMs such as TrialGPT.	翻訳日:2024-04-24 01:02:16 公開日:2024-04-20
# 量子金融シミュレーションと量子状態生成のための新しいアプローチ A novel approach for quantum financial simulation and quantum state preparation ( http://arxiv.org/abs/2308.01844v2 ) ライセンス: Link先を確認	Yen-Jui Chang, Wei-Ting Wang, Hao-Yuan Chen, Shih-Wei Liao, Ching-Ray Chang,	(参考訳) 量子状態の準備は、量子コンピューティングと情報処理において不可欠である。特定の量子状態の正確かつ確実な準備能力は、様々な用途に不可欠である。量子コンピュータの有望な応用の1つは量子シミュレーションである。これは、我々がシミュレートしようとしているシステムを表す量子状態を作成する必要がある。本研究では,パラメータ化量子回路(PQC)と古典シミュレータの変分解法を用いた複雑な確率分布の学習とロードを目的とした,新しいシミュレーションアルゴリズムであるマルチスプリット-ステップ量子ウォーク(multi-SSQW)を提案する。マルチSSQWアルゴリズムは、分割ステップの量子ウォークの修正版であり、マルチエージェント決定プロセスを統合するように拡張され、金融市場をモデル化するのに適している。この研究は、確率分布シミュレーションと金融市場モデリングにおける有望な能力を実証するために、マルチSSQWアルゴリズムの理論的記述と実証的研究を提供する。量子計算の利点を生かして、マルチSSQWは複雑な財務分布とシナリオを高精度にモデル化し、財務分析と意思決定のための貴重な洞察とメカニズムを提供する。マルチSSQWの主な利点は、モデリングの柔軟性、安定した収束、即時計算である。これらの利点は、動的な金融市場での急速なモデリングと予測の可能性を強調している。 Quantum state preparation is vital in quantum computing and information processing. The ability to accurately and reliably prepare specific quantum states is essential for various applications. One of the promising applications of quantum computers is quantum simulation. This requires preparing a quantum state representing the system we are trying to simulate. This research introduces a novel simulation algorithm, the multi-Split-Steps Quantum Walk (multi-SSQW), designed to learn and load complicated probability distributions using parameterized quantum circuits (PQC) with a variational solver on classical simulators. The multi-SSQW algorithm is a modified version of the split-steps quantum walk, enhanced to incorporate a multi-agent decision-making process, rendering it suitable for modeling financial markets. The study provides theoretical descriptions and empirical investigations of the multi-SSQW algorithm to demonstrate its promising capabilities in probability distribution simulation and financial market modeling. Harnessing the advantages of quantum computation, the multi-SSQW models complex financial distributions and scenarios with high accuracy, providing valuable insights and mechanisms for financial analysis and decision-making. The multi-SSQW's key benefits include its modeling flexibility, stable convergence, and instantaneous computation. These advantages underscore its rapid modeling and prediction potential in dynamic financial markets.	翻訳日:2024-04-24 01:02:16 公開日:2024-04-20
# 暗黙のマルチタスク強化学習問題に対するポリシー適応法 A Policy Adaptation Method for Implicit Multitask Reinforcement Learning Problems ( http://arxiv.org/abs/2308.16471v2 ) ライセンス: Link先を確認	Satoshi Yamamori, Jun Morimoto,	(参考訳) 接触や衝突を含む動的運動生成タスクでは、ポリシーパラメータの小さな変化は、非常に異なるリターンをもたらす。例えば、サッカーでは、打球の位置や力がわずかに変化したり、ボールの摩擦が変化した場合に、ボールは同様の方向の動きで完全に異なる方向に飛べる。しかし、異なる方向にボールを向くためには、全く異なるスキルが必要であると想像することは困難である。本研究では,異なる報酬関数や環境パラメータを持つ単一動作カテゴリにおいて,目標や環境の暗黙的な変化にポリシーを適用するためのマルチタスク強化学習アルゴリズムを提案する。単足ロボットモデルを用いて,ボール誘導作業における提案手法の評価を行った。その結果,提案手法はゴール位置の暗黙的な変化やボールの再生係数に適応できるが,標準領域のランダム化手法では異なるタスク設定に対処できないことがわかった。 In dynamic motion generation tasks, including contact and collisions, small changes in policy parameters can lead to extremely different returns. For example, in soccer, the ball can fly in completely different directions with a similar heading motion by slightly changing the hitting position or the force applied to the ball or when the friction of the ball varies. However, it is difficult to imagine that completely different skills are needed for heading a ball in different directions. In this study, we proposed a multitask reinforcement learning algorithm for adapting a policy to implicit changes in goals or environments in a single motion category with different reward functions or physical parameters of the environment. We evaluated the proposed method on the ball heading task using a monopod robot model. The results showed that the proposed method can adapt to implicit changes in the goal positions or the coefficients of restitution of the ball, whereas the standard domain randomization approach cannot cope with different task settings.	翻訳日:2024-04-24 00:52:28 公開日:2024-04-20
# $\rm SP^3$:PCAプロジェクションによる構造化プルーニングの強化 $\rm SP^3$: Enhancing Structured Pruning via PCA Projection ( http://arxiv.org/abs/2308.16475v2 ) ライセンス: Link先を確認	Yuxuan Hu, Jing Zhang, Zhe Zhao, Chen Zhao, Xiaodong Chen, Cuiping Li, Hong Chen,	(参考訳) 構造化プルーニング(Structured pruning)は、事前訓練された言語モデル(PLM)のサイズを減らす手法として広く使われているが、現在の手法は、モデルのサイズと効率に重要な次元であるPLMの隠れ次元(d)を圧縮する可能性を見落としていることが多い。本稿では,PCAプロジェクションを用いた構造化プルーニング手法(SP3)を提案し,マスク前に主成分によって定義された空間に特徴を投影することで,効果的にdを減少させる手法を提案する。ベンチマーク(GLUEとSQuAD)の大規模な実験は、SP3がdを70%削減し、BERTベースモデルの94%を圧縮し、96%以上の精度を維持し、同じ圧縮比でdを6%圧縮する他の方法よりも優れていることを示している。 SP3はOPTやLlamaなど他のモデルでも有効であることが証明されている。私たちのデータとコードは匿名のリポジトリで利用可能です。 Structured pruning is a widely used technique for reducing the size of pre-trained language models (PLMs), but current methods often overlook the potential of compressing the hidden dimension (d) in PLMs, a dimension critical to model size and efficiency. This paper introduces a novel structured pruning approach, Structured Pruning with PCA Projection (SP3), targeting the effective reduction of d by projecting features into a space defined by principal components before masking. Extensive experiments on benchmarks (GLUE and SQuAD) show that SP3 can reduce d by 70%, compress 94% of the BERTbase model, maintain over 96% accuracy, and outperform other methods that compress d by 6% in accuracy at the same compression ratio. SP3 has also proven effective with other models, including OPT and Llama. Our data and code are available at an anonymous repo.	翻訳日:2024-04-24 00:52:28 公開日:2024-04-20
# 非測定共著者をもつ一般化線形モデルに対する同時推論 Simultaneous inference for generalized linear models with unmeasured confounders ( http://arxiv.org/abs/2309.07261v3 ) ライセンス: Link先を確認	Jin-Hong Du, Larry Wasserman, Kathryn Roeder,	(参考訳) 数万の同時仮説テストがゲノム研究で定期的に行われ、異なる発現遺伝子を同定する。しかし、計測されていない共同設立者のために、多くの標準的な統計手法は実質的に偏っているかもしれない。本稿では,多変量一般化線形モデルに対する共起効果の存在下での大規模仮説検証問題について検討する。任意のコンバウンディング機構の下では,直交構造を利用し,線形射影を3つの重要な段階に統合する,統一的な統計的推定と推論の枠組みを提案する。これは、潜伏係数を回復するために、辺縁と非負の関係の共起効果を遠ざけることから始まる。その後、ラッソ型最適化により潜在因子と一次効果を共同で推定する。最後に、仮説テストのために投影および重み付けされたバイアス補正ステップを組み込む。理論的には、様々な効果と非漸近誤差境界の同定条件を確立する。サンプルおよび応答サイズが無限大に近づくと、漸近的な$z$-testsの効果的なType-Iエラー制御を示す。数値実験により, 提案手法はベンジャミン・ホックベルク法により偽発見率を制御し, 代替手法よりも強力であることが示された。 2つのサンプル群から得られた単細胞RNA-seq数を比較することにより、モデルから有意な共変量が欠如している場合に、共起効果を調節する適性を示す。 Tens of thousands of simultaneous hypothesis tests are routinely performed in genomic studies to identify differentially expressed genes. However, due to unmeasured confounders, many standard statistical approaches may be substantially biased. This paper investigates the large-scale hypothesis testing problem for multivariate generalized linear models in the presence of confounding effects. Under arbitrary confounding mechanisms, we propose a unified statistical estimation and inference framework that harnesses orthogonal structures and integrates linear projections into three key stages. It begins by disentangling marginal and uncorrelated confounding effects to recover the latent coefficients. Subsequently, latent factors and primary effects are jointly estimated through lasso-type optimization. Finally, we incorporate projected and weighted bias-correction steps for hypothesis testing. Theoretically, we establish the identification conditions of various effects and non-asymptotic error bounds. We show effective Type-I error control of asymptotic $z$-tests as sample and response sizes approach infinity. Numerical experiments demonstrate that the proposed method controls the false discovery rate by the Benjamini-Hochberg procedure and is more powerful than alternative methods. By comparing single-cell RNA-seq counts from two groups of samples, we demonstrate the suitability of adjusting confounding effects when significant covariates are absent from the model.	翻訳日:2024-04-24 00:52:28 公開日:2024-04-20
# C-Pack:中国の一般的な埋め込みを促進するためにパッケージ化されたリソース C-Pack: Packaged Resources To Advance General Chinese Embedding ( http://arxiv.org/abs/2309.07597v3 ) ライセンス: Link先を確認	Shitao Xiao, Zheng Liu, Peitian Zhang, Niklas Muennighoff,	(参考訳) C-Packは、一般的な中国の埋め込みの分野を著しく前進させるリソースのパッケージである。 C-Packには3つの重要なリソースが含まれている。 1) C-MTEBは6つのタスクと35のデータセットをカバーする中国語テキスト埋め込みの総合ベンチマークである。 2) C-MTPは, ラベル付き, ラベルなしの中国語コーパスを用いて, 埋め込みモデルを訓練するための大量のテキスト埋め込みデータセットである。 3) C-TEMは、複数のサイズをカバーする埋め込みモデルのファミリーである。弊社のモデルは、C-MTEB上の以前の中国語のテキスト埋め込みを、リリース時に最大で10%上回っている。また、C-TEMのための一連のトレーニング方法を統合し、最適化します。一般的な中国語の埋め込みに関するリソースに加えて、英語のテキスト埋め込みのためのデータとモデルもリリースしています。 MTEBベンチマークでは、英語モデルは最先端のパフォーマンスを達成していますが、我々のリリースした英語データは、中国のデータより2倍も大きいのです。これらのリソースはすべてhttps://github.com/FlagOpen/FlagEmbedding.comで公開されています。 We introduce C-Pack, a package of resources that significantly advance the field of general Chinese embeddings. C-Pack includes three critical resources. 1) C-MTEB is a comprehensive benchmark for Chinese text embeddings covering 6 tasks and 35 datasets. 2) C-MTP is a massive text embedding dataset curated from labeled and unlabeled Chinese corpora for training embedding models. 3) C-TEM is a family of embedding models covering multiple sizes. Our models outperform all prior Chinese text embeddings on C-MTEB by up to +10% upon the time of the release. We also integrate and optimize the entire suite of training methods for C-TEM. Along with our resources on general Chinese embedding, we release our data and models for English text embeddings. The English models achieve state-of-the-art performance on MTEB benchmark; meanwhile, our released English data is 2 times larger than the Chinese data. All these resources are made publicly available at https://github.com/FlagOpen/FlagEmbedding.	翻訳日:2024-04-24 00:42:43 公開日:2024-04-20
# Lemur: プログラムの自動検証に大規模言語モデルを統合する Lemur: Integrating Large Language Models in Automated Program Verification ( http://arxiv.org/abs/2310.04870v4 ) ライセンス: Link先を確認	Haoze Wu, Clark Barrett, Nina Narodytska,	(参考訳) LLMの実証されたコード理解能力は、検証ツールで難しいプログラムプロパティに関する高度な抽象的推論を必要とするタスクである自動プログラム検証に使用できるかどうかという問題を提起する。自動プログラム検証のためのLLMと自動推論器のパワーを組み合わせるための一般的な手法を提案する。我々は、この方法論をトランジションルールの集合として公式に記述し、その健全性を証明する。本稿では,音声自動検証手法として計算をインスタンス化し,一連の合成および競合ベンチマークの実践的改善を実証する。 The demonstrated code-understanding capability of LLMs raises the question of whether they can be used for automated program verification, a task that demands high-level abstract reasoning about program properties that is challenging for verification tools. We propose a general methodology to combine the power of LLMs and automated reasoners for automated program verification. We formally describe this methodology as a set of transition rules and prove its soundness. We instantiate the calculus as a sound automated verification procedure and demonstrate practical improvements on a set of synthetic and competition benchmarks.	翻訳日:2024-04-24 00:42:43 公開日:2024-04-20
# 不均一な自己監視学習による表現の強化 Enhancing Representations through Heterogeneous Self-Supervised Learning ( http://arxiv.org/abs/2310.05108v2 ) ライセンス: Link先を確認	Zhong-Yu Li, Bo-Wen Yin, Shanghua Gao, Yongxiang Liu, Li Liu, Ming-Ming Cheng,	(参考訳) 異なるアーキテクチャから異種表現を組み込むことは、様々なビジョンタスク、例えば、トランスフォーマーと畳み込みを組み合わせたハイブリッドネットワークを促進する。しかし、このような異種アーキテクチャ間の相補性は、自己教師付き学習では十分に活用されていない。そこで本研究では,HSSL(Heterogeneous Self-Supervised Learning)を提案する。このプロセスでは、HSSLは構造的変化を伴わずに表現学習方式でベースモデルに新しい特徴を付与する。 HSSLを包括的に理解するために,ベースモデルと補助ヘッドを含む多種多様な異種対の実験を行った。アーキテクチャの相違が大きくなるにつれて,ベースモデルの表現品質が向上することがわかった。本研究の動機は,特定のベースモデルの学習に最も適した補助頭部を迅速に決定する探索戦略と,モデルの差分を増大させる単純かつ効果的な方法を提案することである。 HSSLは、画像分類、セマンティックセグメンテーション、インスタンスのセグメンテーション、オブジェクト検出など、さまざまなダウンストリームタスクにおいて優れたパフォーマンスを達成する。私たちのソースコードは公開されます。 Incorporating heterogeneous representations from different architectures has facilitated various vision tasks, e.g., some hybrid networks combine transformers and convolutions. However, complementarity between such heterogeneous architectures has not been well exploited in self-supervised learning. Thus, we propose Heterogeneous Self-Supervised Learning (HSSL), which enforces a base model to learn from an auxiliary head whose architecture is heterogeneous from the base model. In this process, HSSL endows the base model with new characteristics in a representation learning way without structural changes. To comprehensively understand the HSSL, we conduct experiments on various heterogeneous pairs containing a base model and an auxiliary head. We discover that the representation quality of the base model moves up as their architecture discrepancy grows. This observation motivates us to propose a search strategy that quickly determines the most suitable auxiliary head for a specific base model to learn and several simple but effective methods to enlarge the model discrepancy. The HSSL is compatible with various self-supervised methods, achieving superior performances on various downstream tasks, including image classification, semantic segmentation, instance segmentation, and object detection. Our source code will be made publicly available.	翻訳日:2024-04-24 00:42:43 公開日:2024-04-20
# CoT3DRef:データ効率のよい3Dビジュアルグラウンド CoT3DRef: Chain-of-Thoughts Data-Efficient 3D Visual Grounding ( http://arxiv.org/abs/2310.06214v3 ) ライセンス: Link先を確認	Eslam Mohamed Bakr, Mohamed Ayman, Mahmoud Ahmed, Habib Slim, Mohamed Elhoseiny,	(参考訳) 3Dビジュアルグラウンドティングは、発話によって条件付けられた3Dシーンでオブジェクトをローカライズする機能である。既存のほとんどのメソッドは参照ヘッドを使って参照オブジェクトを直接ローカライズし、複雑なシナリオで失敗する。さらに、ネットワークが最終決定に達する方法や理由も示していない。本稿では,人間の知覚システムを模倣する可能性を秘めた,解釈可能な3次元視覚基盤を設計できるのか? と。この目的のために、まずアンカーの連鎖を予測し、次に最終ターゲットを予測することにより、シークエンス・ツー・シーケンスのSeq2Seqタスクとして3次元視覚接地問題を定式化する。解釈可能性は全体的なパフォーマンスを改善するだけでなく、障害ケースの特定にも役立ちます。思考の連鎖に従えば、参照タスクを解釈可能な中間ステップに分解し、パフォーマンスを高め、フレームワークを極めてデータ効率のよいものにすることができます。さらに,提案するフレームワークは既存のアーキテクチャに容易に組み込むことができる。我々は,Nr3D,Sr3D,Scanreferベンチマークの総合的な実験を通じてアプローチを検証するとともに,手動のアノテートデータを必要としない既存手法と比較して一貫した性能向上を示す。さらに、提案するフレームワークであるCoT3DRefは、データ効率がかなり高いのに対して、Sr3Dデータセットでは、データの10%しかトレーニングしていない場合、データ全体に基づいてトレーニングされたSOTAのパフォーマンスと一致します。コードはhttps:eslambakr.github.io/cot3dref.github.io/で公開されている。 3D visual grounding is the ability to localize objects in 3D scenes conditioned by utterances. Most existing methods devote the referring head to localize the referred object directly, causing failure in complex scenarios. In addition, it does not illustrate how and why the network reaches the final decision. In this paper, we address this question Can we design an interpretable 3D visual grounding framework that has the potential to mimic the human perception system?. To this end, we formulate the 3D visual grounding problem as a sequence-to-sequence Seq2Seq task by first predicting a chain of anchors and then the final target. Interpretability not only improves the overall performance but also helps us identify failure cases. Following the chain of thoughts approach enables us to decompose the referring task into interpretable intermediate steps, boosting the performance and making our framework extremely data-efficient. Moreover, our proposed framework can be easily integrated into any existing architecture. We validate our approach through comprehensive experiments on the Nr3D, Sr3D, and Scanrefer benchmarks and show consistent performance gains compared to existing methods without requiring manually annotated data. Furthermore, our proposed framework, dubbed CoT3DRef, is significantly data-efficient, whereas on the Sr3D dataset, when trained only on 10% of the data, we match the SOTA performance that trained on the entire data. The code is available at https:eslambakr.github.io/cot3dref.github.io/.	翻訳日:2024-04-24 00:32:58 公開日:2024-04-20
# 文脈モデリングによる半監督された群集数:群集場面の全体的理解を促進する Semi-Supervised Crowd Counting with Contextual Modeling: Facilitating Holistic Understanding of Crowd Scenes ( http://arxiv.org/abs/2310.10352v3 ) ライセンス: Link先を確認	Yifei Qian, Xiaopeng Hong, Zhongliang Guo, Ognjen Arandjelović, Carl R. Donovan,	(参考訳) そこで本研究では,信頼度の高い群集数モデルの訓練に要する重いアノテーション負担を軽減し,より多くのデータを活用することで,モデルをより実践的かつ正確にするため,教師の枠組みに基づいた新たな半教師方式を提案する。ラベル付きデータが不足している場合には、ローカルパッチに過度に適合する傾向にある。このような状況下では、ラベルなしデータによる局所パッチ予測の精度を単に改善するという従来のアプローチは不十分である。そこで本研究では,モデル固有の「従属化」能力の育成という,よりニュアンスなアプローチを提案する。この能力により、モデルは群衆シーンの理解を活用し、人間の認知過程を反映することで、地域の数を正確に見積もることができる。この目的を達成するために、ラベルのないデータにマスキングを適用し、全体的手がかりに基づいてこれらのマスキングされたパッチの予測をモデルに導く。さらに,特徴学習を支援するために,細粒度密度分類タスクを組み込んだ。本手法は, 厳密な構造制約や損失制約を伴わないため, 既存の群集カウント法に適用可能である。さらに、我々のフレームワークでトレーニングされたモデルが「補助的」な振る舞いを示すことを観察する。高密度領域を正確に予測し、局所的な詳細を組み込んで高密度領域を予測する。提案手法は,上海技術AやUCF-QNRFといった挑戦的なベンチマークにおいて,従来のアプローチをはるかに上回り,最先端の性能を実現する。コードは、https://github.com/cha15yq/MRC-Crowd.comで入手できる。 To alleviate the heavy annotation burden for training a reliable crowd counting model and thus make the model more practicable and accurate by being able to benefit from more data, this paper presents a new semi-supervised method based on the mean teacher framework. When there is a scarcity of labeled data available, the model is prone to overfit local patches. Within such contexts, the conventional approach of solely improving the accuracy of local patch predictions through unlabeled data proves inadequate. Consequently, we propose a more nuanced approach: fostering the model's intrinsic 'subitizing' capability. This ability allows the model to accurately estimate the count in regions by leveraging its understanding of the crowd scenes, mirroring the human cognitive process. To achieve this goal, we apply masking on unlabeled data, guiding the model to make predictions for these masked patches based on the holistic cues. Furthermore, to help with feature learning, herein we incorporate a fine-grained density classification task. Our method is general and applicable to most existing crowd counting methods as it doesn't have strict structural or loss constraints. In addition, we observe that the model trained with our framework exhibits a 'subitizing'-like behavior. It accurately predicts low-density regions with only a 'glance', while incorporating local details to predict high-density regions. Our method achieves the state-of-the-art performance, surpassing previous approaches by a large margin on challenging benchmarks such as ShanghaiTech A and UCF-QNRF. The code is available at: https://github.com/cha15yq/MRC-Crowd.	翻訳日:2024-04-24 00:32:57 公開日:2024-04-20
# Refining Latent Representations: Heterogeneous Graph LearningのためのジェネレーティブSSLアプローチ Refining Latent Representations: A Generative SSL Approach for Heterogeneous Graph Learning ( http://arxiv.org/abs/2310.11102v4 ) ライセンス: Link先を確認	Yulan Hu, Zhirui Yang, Sheng Ouyang, Yong Liu,	(参考訳) 自己監視学習(SSL)は大きな可能性を示し、グラフ学習への関心が高まっている。しかし、生成的SSL法では、HGL(Heterogeneous Graph Learning)の可能性はいまだに未解明である。 Generative SSLは、エンコーダを使用して、入力グラフを潜在表現にマッピングし、デコーダを使用して潜在表現から入力グラフを復元する。従来のHGL SSLメソッドは一般的にグラフの不均一性を捕捉するための複雑な戦略を設計するが、これはしばしば非自明なビュー構築戦略に大きく依存している。しかし、生成SSLにおける潜伏表現の精細化は、グラフ学習結果を効果的に改善することができる。本研究では,HGL用に特別に設計された生成SSL方式であるHGVAEを提案する。 HGVAEは不均一性を捉える複雑な戦略を設計する代わりに、潜伏表現の精細化に重点を置いている。具体的には、HGVAEは、潜在表現に基づく対照的なタスクを革新的に開発する。負のサンプルの硬さを確保するために,変分推論(VI)を利用して高品質な負のサンプルを生成するプログレッシブ・ネガティブ・サンプル生成(PNSG)機構を開発した。 HGLに生成SSLを適用する先駆者として、HGVAEは潜在表現を洗練し、高品質な表現を学ぶようモデルに促す。様々な最先端(SOTA)ベースラインと比較して、HGVAEは印象的な結果をもたらし、その優位性を検証する。 Self-Supervised Learning (SSL) has shown significant potential and has garnered increasing interest in graph learning. However, particularly for generative SSL methods, its potential in Heterogeneous Graph Learning (HGL) remains relatively underexplored. Generative SSL utilizes an encoder to map the input graph into a latent representation and a decoder to recover the input graph from the latent representation. Previous HGL SSL methods generally design complex strategies to capture graph heterogeneity, which heavily rely on contrastive view construction strategies that are often non-trivial. Yet, refining the latent representation in generative SSL can effectively improve graph learning results. In this study, we propose HGVAE, a generative SSL method specially designed for HGL. Instead of focusing on designing complex strategies to capture heterogeneity, HGVAE centers on refining the latent representation. Specifically, HGVAE innovatively develops a contrastive task based on the latent representation. To ensure the hardness of negative samples, we develop a progressive negative sample generation (PNSG) mechanism that leverages the ability of Variational Inference (VI) to generate high-quality negative samples. As a pioneer in applying generative SSL for HGL, HGVAE refines the latent representation, thereby compelling the model to learn high-quality representations. Compared with various state-of-the-art (SOTA) baselines, HGVAE achieves impressive results, thus validating its superiority.	翻訳日:2024-04-24 00:32:57 公開日:2024-04-20
# GPT-4はチューリング試験に合格するのか? Does GPT-4 pass the Turing test? ( http://arxiv.org/abs/2310.20216v2 ) ライセンス: Link先を確認	Cameron R. Jones, Benjamin K. Bergen,	(参考訳) GPT-4をオンラインチューリングテストで評価した。最も優れたGPT-4プロンプトは49.7%のゲームで通過し、ELIZA(22%)とGPT-3.5(20%)を上回ったが、ヒトが設定したベースラインに届かなかった(66%)。参加者の判断は主に言語的スタイル(35%)と社会情緒的特徴(27%)に基づいており、知性はチューリング試験に合格するには不十分であるという考えを支持した。 LLMとゲーム数に関する参加者の知識は、AI検出の精度と正の相関関係があり、学習と実践が詐欺を軽減できる戦略であることを示唆した。インテリジェンステストとしての既知の制限にもかかわらず、チューリングテストは自然主義的なコミュニケーションと騙しの評価として引き続き関係していると我々は主張する。人間としてマスクレーディングできるAIモデルは、広く社会的な結果をもたらす可能性があり、異なる戦略の有効性と人間の類似性を判断するための基準を分析します。 We evaluated GPT-4 in a public online Turing test. The best-performing GPT-4 prompt passed in 49.7% of games, outperforming ELIZA (22%) and GPT-3.5 (20%), but falling short of the baseline set by human participants (66%). Participants' decisions were based mainly on linguistic style (35%) and socioemotional traits (27%), supporting the idea that intelligence, narrowly conceived, is not sufficient to pass the Turing test. Participant knowledge about LLMs and number of games played positively correlated with accuracy in detecting AI, suggesting learning and practice as possible strategies to mitigate deception. Despite known limitations as a test of intelligence, we argue that the Turing test continues to be relevant as an assessment of naturalistic communication and deception. AI models with the ability to masquerade as humans could have widespread societal consequences, and we analyse the effectiveness of different strategies and criteria for judging humanlikeness.	翻訳日:2024-04-24 00:23:13 公開日:2024-04-20
# Carpe Diem:生涯言語モデルにおける世界知識の評価について Carpe Diem: On the Evaluation of World Knowledge in Lifelong Language Models ( http://arxiv.org/abs/2311.08106v2 ) ライセンス: Link先を確認	Yujin Kim, Jaehong Yoon, Seonghyeon Ye, Sangmin Bae, Namgyu Ho, Sung Ju Hwang, Se-young Yun,	(参考訳) 常に変化する世界の知識のダイナミックな性質は、静的データに基づいて訓練された言語モデルに対する課題を提示している。人間の言語におけるこれらの時間依存力学のための言語モデルの能力を研究するために、進化するウィキペディアデータベース上でLMを訓練し評価するために設計された、時間的に進化する質問応答ベンチマークであるEvolvingQAを導入する。 EvolvingQAの構築は、大規模な言語モデルを使用してパイプラインで自動化されます。既存の継続的な学習ベースラインが、時代遅れの知識の更新と削除に悩まされていることを明らかにする。我々の分析では、モデルが小さな重み勾配のために知識の修正に失敗することを示唆している。さらに,言語モデルが特に数値情報や時間情報の変化を反映するのに苦慮していることも明らかにした。本研究の目的は,実世界の情報の動的性質をモデル化することであり,言語モデルの進化適応性を忠実に評価することである。 The dynamic nature of knowledge in an ever-changing world presents challenges for language models trained on static data; the model in the real world often requires not only acquiring new knowledge but also overwriting outdated information into updated ones. To study the ability of language models for these time-dependent dynamics in human language, we introduce a novel task, EvolvingQA, a temporally evolving question-answering benchmark designed for training and evaluating LMs on an evolving Wikipedia database. The construction of EvolvingQA is automated with our pipeline using large language models. We uncover that existing continual learning baselines suffer from updating and removing outdated knowledge. Our analysis suggests that models fail to rectify knowledge due to small weight gradients. In addition, we elucidate that language models particularly struggle to reflect the change of numerical or temporal information. Our work aims to model the dynamic nature of real-world information, suggesting faithful evaluations of the evolution-adaptability of language models.	翻訳日:2024-04-24 00:23:13 公開日:2024-04-20
# FREE:環境生態系のモデリングのための基礎的意味認識 FREE: The Foundational Semantic Recognition for Modeling Environmental Ecosystems ( http://arxiv.org/abs/2311.10255v2 ) ライセンス: Link先を確認	Shiyuan Luo, Juntong Ni, Shengyu Chen, Runlong Yu, Yiqun Xie, Licheng Liu, Zhenong Jin, Huaxiu Yao, Xiaowei Jia,	(参考訳) 環境生態系のモデリングは、我々の惑星の持続可能性にとって重要であるが、多くの物理変数間の相互作用によって引き起こされる複雑なプロセスのため、非常に困難である。多くの変数を大規模に測定することは困難であるため、既存の研究は観測可能な特徴と局所的に利用可能な測定値の組み合わせを、特定の研究領域と期間のモデルを構築するための入力として利用することが多い。これは、環境生態系のモデリングを進める上で、根本的な疑問を提起する。空間と時間の様々な環境データ間の複雑な関係をモデル化するための一般的なフレームワークを構築するには、どうすればよいのか? 本稿では、利用可能な環境データをテキスト空間にマッピングし、環境科学における従来の予測モデリングタスクを意味認識問題に変換する新しいフレームワークFREEを紹介する。提案したFREEフレームワークは、Large Language Models(LLM)の最近の進歩を活用して、元々の入力機能を自然言語記述で補う。これにより、データセマンティクスのキャプチャが容易になり、入力機能の不規則性を活用することができる。長期予測に使用する場合、FREEは将来予測を強化するために新たに収集した観測を組み込む柔軟性を持つ。 FREEの有効性は、2つの社会的に重要な実世界の応用の文脈で評価され、デラウェア川流域の河川水温を予測し、イリノイ州とアイオワ州で毎年トウモロコシの収量を予測する。複数のベースライン法よりも優れた予測性能の他に、FREEは物理モデルで生成されたシミュレーションデータに基づいて事前学習できるため、よりデータ効率と計算効率が良いことが示されている。 Modeling environmental ecosystems is critical for the sustainability of our planet, but is extremely challenging due to the complex underlying processes driven by interactions amongst a large number of physical variables. As many variables are difficult to measure at large scales, existing works often utilize a combination of observable features and locally available measurements or modeled values as input to build models for a specific study region and time period. This raises a fundamental question in advancing the modeling of environmental ecosystems: how to build a general framework for modeling the complex relationships amongst various environmental data over space and time? In this paper, we introduce a new framework, FREE, which maps available environmental data into a text space and then converts the traditional predictive modeling task in environmental science to the semantic recognition problem. The proposed FREE framework leverages recent advances in Large Language Models (LLMs) to supplement the original input features with natural language descriptions. This facilitates capturing the data semantics and also allows harnessing the irregularities of input features. When used for long-term prediction, FREE has the flexibility to incorporate newly collected observations to enhance future prediction. The efficacy of FREE is evaluated in the context of two societally important real-world applications, predicting stream water temperature in the Delaware River Basin and predicting annual corn yield in Illinois and Iowa. Beyond the superior predictive performance over multiple baseline methods, FREE is shown to be more data- and computation-efficient as it can be pre-trained on simulated data generated by physics-based models.	翻訳日:2024-04-24 00:23:13 公開日:2024-04-20
# $\mathrm{XOR}^{}$と$\mathrm{FFL}$ゲームに対する最適かつほぼ最適な量子戦略 Optimal, and approximately optimal, quantum strategies for $\mathrm{XOR}^{}$ and $\mathrm{FFL}$ games ( http://arxiv.org/abs/2311.12887v2 ) ライセンス: Link先を確認	Pete Rigas,	(参考訳) 我々は、様々な非ローカルなXORゲームに対して最適で、ほぼ最適な量子戦略を解析する。 2016年のオストロフによる以前の議論に基づいて、プレイヤーが線形汎関数を最大化して非局所的なゲームに勝つための戦略を採用できると特徴付けたAliceとBobは、ある確率分布から引き出された質問に対する各答えを検証し、AliceとBobが量子エンタングルメント、二次元資源システム、可逆変換に依存する戦略を採用する場合の量子優位性を実現するために、より広い種類の量子戦略のパフォーマンスを解析するためのフレームワークのさらなる応用を特定できる。 Fortnow-Feige-Lovasz (FFL) ゲームでは、2016 のフレームワークは、(1) 適切な非ゼロの線形変換を構築し、(2) 作用素が単位フロベニウスノルムを持ち、(3) 誤差境界を構築し、対応する近似演算を$\big(A_k \otimes \textbf{I} \big) \ket{\psi}$, and $\big( \textbf{I} \otimes \big( \frac{\pm B_{kl} + B_{lk}}{\sqrt{2}} \big) \ket{\psi}$,(4) 演算子は、A_j=i$(5) の上限で適用された順序に置換された有界であることを示す。我々は,本フレームワークの他のゲームへの適用に読者の注意を惹きつける。 We analyze optimal, and approximately optimal, quantum strategies for a variety of non-local XOR games. Building upon previous arguments due to Ostrev in 2016, which characterized approximately optimal, and optimal, strategies that players Alice and Bob can adopt for maximizing a linear functional to win non-local games after a Referee party examines each answer to a question drawn from some probability distribution, we identify additional applications of the framework for analyzing the performance of a broader class of quantum strategies in which it is possible for Alice and Bob to realize quantum advantage if the two players adopt strategies relying upon quantum entanglement, two-dimensional resource systems, and reversible transformations. For the Fortnow-Feige-Lovasz (FFL) game, the 2016 framework is directly applicable, which consists of five steps, including: (1) constructing a suitable, nonzero, linear transformation for the intertwining operations, (2) demonstrating that the operator has unit Frobenius norm, (3) constructing error bounds, and corresponding approximate operations, for $\big( A_k \otimes \textbf{I} \big) \ket{\psi}$, and $\big( \textbf{I} \otimes \big( \frac{\pm B_{kl} + B_{lk}}{\sqrt{2}} \big) \big) \ket{\psi}$, (4) constructing additional bounds for permuting the order in which $A^{j_i}_i$ operators are applied, (5) obtaining Frobenius norm upper bounds for Alice and Bob's strategies. We draw the attention of the reader to applications of this framework in other games with less regular structure.	翻訳日:2024-04-24 00:23:13 公開日:2024-04-20
# TransNeXt: 視覚変換器のロバストな視覚知覚 TransNeXt: Robust Foveal Visual Perception for Vision Transformers ( http://arxiv.org/abs/2311.17132v3 ) ライセンス: Link先を確認	Dai Shi,	(参考訳) 残差接続における深度劣化効果のため、情報交換のために積み重ね層に依存する多くの効率的なビジョントランスフォーマーモデルでは、十分な情報混合が得られず、不自然な視覚知覚に繋がる。本稿では,生物の眼球運動と眼球運動をシミュレートするバイオミメティックデザインに基づくトークンミキサーAggregated Attentionを提案する。さらに、従来のクエリやキーと相互作用する学習可能なトークンを組み込み、クエリとキーの類似性に依存するだけでなく、アフィニティ行列の生成も多様化する。本手法では,情報交換の積み重ねに頼らず,奥行き劣化を効果的に回避し,自然な視覚知覚を実現する。さらに,GLUとSEのギャップを埋めるチャネルミキサーであるConvolutional GLUを提案する。集約された注意と畳み込みGLUを組み合わせて、TransNeXtと呼ばれる新しいビジュアルバックボーンを作成します。大規模な実験により、TransNeXtは複数のモデルサイズにまたがって最先端のパフォーマンスを実現することが実証された。 224^2$の解像度で、TransNeXt-Tinyはイメージネットの精度84.0%に達し、69%のパラメータでConvNeXt-Bを上回った。 TransNeXt-Base は ImageNet の精度86.2%、ImageNet-A の精度61.6%を384^2$、COCO オブジェクト検出 mAP 57.1、ADE20K セマンティックセグメンテーション mIoU 54.7 で達成している。 Due to the depth degradation effect in residual connections, many efficient Vision Transformers models that rely on stacking layers for information exchange often fail to form sufficient information mixing, leading to unnatural visual perception. To address this issue, in this paper, we propose Aggregated Attention, a biomimetic design-based token mixer that simulates biological foveal vision and continuous eye movement while enabling each token on the feature map to have a global perception. Furthermore, we incorporate learnable tokens that interact with conventional queries and keys, which further diversifies the generation of affinity matrices beyond merely relying on the similarity between queries and keys. Our approach does not rely on stacking for information exchange, thus effectively avoiding depth degradation and achieving natural visual perception. Additionally, we propose Convolutional GLU, a channel mixer that bridges the gap between GLU and SE mechanism, which empowers each token to have channel attention based on its nearest neighbor image features, enhancing local modeling capability and model robustness. We combine aggregated attention and convolutional GLU to create a new visual backbone called TransNeXt. Extensive experiments demonstrate that our TransNeXt achieves state-of-the-art performance across multiple model sizes. At a resolution of $224^2$, TransNeXt-Tiny attains an ImageNet accuracy of 84.0%, surpassing ConvNeXt-B with 69% fewer parameters. Our TransNeXt-Base achieves an ImageNet accuracy of 86.2% and an ImageNet-A accuracy of 61.6% at a resolution of $384^2$, a COCO object detection mAP of 57.1, and an ADE20K semantic segmentation mIoU of 54.7.	翻訳日:2024-04-24 00:13:26 公開日:2024-04-20
# 1000フレームの1Bパラメータによる終端動作検出 End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames ( http://arxiv.org/abs/2311.17241v2 ) ライセンス: Link先を確認	Shuming Liu, Chen-Lin Zhang, Chen Zhao, Bernard Ghanem,	(参考訳) 近年、時間的行動検出(TAD)は、エンドツーエンドのトレーニングで大幅に改善されている。しかし、メモリボトルネックのため、限られたスケールと限られたデータ量を持つモデルだけがエンドツーエンドのトレーニングを受けられるため、必然的にTADのパフォーマンスが制限される。本稿では,エンド・ツー・エンドのトレーニングにおけるメモリ消費を削減し,10億のパラメータと入力ビデオが1,536フレームにスケールアップし,大幅な検出性能を実現する。我々のアプローチの鍵は、トレーニングメモリを減らす新しい軽量モジュールである、時間的不変アダプタ(TIA)にある。 TIAを用いて,TADタスクに適応するために,TAAのパラメータのみを更新することで,背骨を学習から解放する。 TIAはまた、背骨全体に隣接するフレームから時間的にコンテキストを集約することで、TAD表現を改善する。 4つの代表的なデータセットにまたがってモデルを評価した。効率的な設計のため、VideoMAEv2-giantでエンドツーエンドをトレーニングし、THUMOS14で75.4%のmAPを達成できます。コードはhttps://github.com/sming256/AdaTADで入手できる。 Recently, temporal action detection (TAD) has seen significant performance improvement with end-to-end training. However, due to the memory bottleneck, only models with limited scales and limited data volumes can afford end-to-end training, which inevitably restricts TAD performance. In this paper, we reduce the memory consumption for end-to-end training, and manage to scale up the TAD backbone to 1 billion parameters and the input video to 1,536 frames, leading to significant detection performance. The key to our approach lies in our proposed temporal-informative adapter (TIA), which is a novel lightweight module that reduces training memory. Using TIA, we free the humongous backbone from learning to adapt to the TAD task by only updating the parameters in TIA. TIA also leads to better TAD representation by temporally aggregating context from adjacent frames throughout the backbone. We evaluate our model across four representative datasets. Owing to our efficient design, we are able to train end-to-end on VideoMAEv2-giant and achieve 75.4% mAP on THUMOS14, being the first end-to-end model to outperform the best feature-based methods. Code is available at https://github.com/sming256/AdaTAD.	翻訳日:2024-04-24 00:13:26 公開日:2024-04-20
# アクションスロット:交通場面におけるマルチラベル原子活動認識のための視覚行動中心表現 Action-slot: Visual Action-centric Representations for Multi-label Atomic Activity Recognition in Traffic Scenes ( http://arxiv.org/abs/2311.17948v2 ) ライセンス: Link先を確認	Chi-Hsi Kung, Shu-Wei Lu, Yi-Hsuan Tsai, Yi-Ting Chen,	(参考訳) 本稿では,マルチラベル原子活動認識について検討する。行動認識の顕著な進歩にもかかわらず、複数の道路利用者の動きと文脈情報の両方を包括的に理解できないため、原子活動を認識することは依然として困難である。本稿では、視覚行動中心の表現を学習し、動き情報と文脈情報の両方をキャプチャするスロットアテンションに基づくアプローチであるAction-Slotを紹介する。私たちのキーとなる考え方は、原子活動が起こる領域に注意を払うことができるアクションスロットを、明示的な知覚誘導を必要とせずに設計することです。スロットアテンションをさらに高めるために、アクションスロットと競合するバックグラウンドスロットを導入し、アクティビティを欠くバックグラウンド領域に不必要なフォーカスを避ける訓練プロセスを支援する。しかし、既存のデータセットにおける不均衡なクラス分布は、稀な活動の評価を妨げている。この制限に対処するため,OATSより4倍大きく,原子活性のバランスの取れた分布を特徴とするTACOという合成データセットを収集した。本手法の有効性を検証するため,様々な行動認識ベースラインに対する包括的実験およびアブレーション研究を行った。また,実世界のデータセット上でのマルチラベル原子活動認識の性能は,TACO上での事前学習により向上できることを示す。ソースコードとデータセットをリリースします。プロジェクトのページで視覚化のビデオをご覧ください。 In this paper, we study multi-label atomic activity recognition. Despite the notable progress in action recognition, it is still challenging to recognize atomic activities due to a deficiency in a holistic understanding of both multiple road users' motions and their contextual information. In this paper, we introduce Action-slot, a slot attention-based approach that learns visual action-centric representations, capturing both motion and contextual information. Our key idea is to design action slots that are capable of paying attention to regions where atomic activities occur, without the need for explicit perception guidance. To further enhance slot attention, we introduce a background slot that competes with action slots, aiding the training process in avoiding unnecessary focus on background regions devoid of activities. Yet, the imbalanced class distribution in the existing dataset hampers the assessment of rare activities. To address the limitation, we collect a synthetic dataset called TACO, which is four times larger than OATS and features a balanced distribution of atomic activities. To validate the effectiveness of our method, we conduct comprehensive experiments and ablation studies against various action recognition baselines. We also show that the performance of multi-label atomic activity recognition on real-world datasets can be improved by pretraining representations on TACO. We will release our source code and dataset. See the videos of visualization on the project page: https://hcis-lab.github.io/Action-slot/	翻訳日:2024-04-24 00:13:26 公開日:2024-04-20
# D$^2$ST-Adapter:Few-shot行動認識のための不整形と変形可能な時空間適応器 D$^2$ST-Adapter: Disentangled-and-Deformable Spatio-Temporal Adapter for Few-shot Action Recognition ( http://arxiv.org/abs/2312.01431v3 ) ライセンス: Link先を確認	Wenjie Pei, Qizhong Tan, Guangming Lu, Jiandong Tian,	(参考訳) 大規模な事前学習された画像モデルを数発のアクション認識に適応させることは、数発の学習に不可欠である頑健な特徴抽出器を学習する上で、効果的かつ効率的な戦略であることが証明されている。典型的な微調整ベースの適応パラダイムは、数ショットの学習シナリオで過度に適合する傾向があり、ビデオデータの時間的特徴を学習するためのモデリングの柔軟性がほとんどない。本研究では,D$^2$ST-Adapter (Disentangled-and-Deformable Spatio-Temporal Adapter, D$^2$ST-Adapter) を提案する。空間的特徴と時間的特徴を絡み合った方法で符号化するデュアルパスアーキテクチャで設計されている。特に,D$^2$ST-Adapterのコアコンポーネントとして異方性変形型時空間アテンションモジュールを考案し,空間的および時間的領域に沿って異方性サンプリング密度を調整し,対応する経路で特に空間的・時間的特徴を学習し,D$^2$ST-Adapterにより3次元時空間のグローバルな視野における特徴を符号化し,軽量な設計を維持した。プレトレーニングされたResNetとViTの両方における本手法のインスタンス化による広範囲な実験は、数発のアクション認識のための最先端の手法よりも、本手法が優れていることを示す。本手法は,時間的ダイナミクスが行動認識に不可欠である難易度シナリオに特に適している。 Adapting large pre-trained image models to few-shot action recognition has proven to be an effective and efficient strategy for learning robust feature extractors, which is essential for few-shot learning. Typical fine-tuning based adaptation paradigm is prone to overfitting in the few-shot learning scenarios and offers little modeling flexibility for learning temporal features in video data. In this work we present the Disentangled-and-Deformable Spatio-Temporal Adapter (D$^2$ST-Adapter), which is a novel adapter tuning framework well-suited for few-shot action recognition due to lightweight design and low parameter-learning overhead. It is designed in a dual-pathway architecture to encode spatial and temporal features in a disentangled manner. In particular, we devise the anisotropic Deformable Spatio-Temporal Attention module as the core component of D$^2$ST-Adapter, which can be tailored with anisotropic sampling densities along spatial and temporal domains to learn spatial and temporal features specifically in corresponding pathways, allowing our D$^2$ST-Adapter to encode features in a global view in 3D spatio-temporal space while maintaining a lightweight design. Extensive experiments with instantiations of our method on both pre-trained ResNet and ViT demonstrate the superiority of our method over state-of-the-art methods for few-shot action recognition. Our method is particularly well-suited to challenging scenarios where temporal dynamics are critical for action recognition.	翻訳日:2024-04-24 00:13:26 公開日:2024-04-20
# Masked Pre-TrainingとCollaborative Self-Trainingによる教師なしビデオドメイン適応 Unsupervised Video Domain Adaptation with Masked Pre-Training and Collaborative Self-Training ( http://arxiv.org/abs/2312.02914v4 ) ライセンス: Link先を確認	Arun Reddy, William Paul, Corban Rivera, Ketul Shah, Celso M. de Melo, Rama Chellappa,	(参考訳) 本研究では,ビデオ行動認識における教師なし領域適応(UDA)の問題に取り組む。我々のアプローチはUNITEと呼ばれ、画像教師モデルを用いてビデオ学生モデルを対象領域に適応させる。 UNITEは、教師が指導するマスク付き蒸留目標を用いて、まず自己指導型事前学習を用いて、ターゲットドメインビデオにおける差別的特徴学習を促進する。次に,ビデオ学生モデルとイメージ教師モデルを用いて,マスク付き対象データを用いた自己学習を行い,未ラベル対象ビデオのための改良された擬似ラベルを生成する。我々の自己学習プロセスは、ドメイン間の強い転送性能を達成するために、両方のモデルの強みをうまく活用する。我々は、複数のビデオ領域適応ベンチマークに対するアプローチを評価し、これまでに報告された結果に対する大幅な改善を観察する。 In this work, we tackle the problem of unsupervised domain adaptation (UDA) for video action recognition. Our approach, which we call UNITE, uses an image teacher model to adapt a video student model to the target domain. UNITE first employs self-supervised pre-training to promote discriminative feature learning on target domain videos using a teacher-guided masked distillation objective. We then perform self-training on masked target data, using the video student model and image teacher model together to generate improved pseudolabels for unlabeled target videos. Our self-training process successfully leverages the strengths of both models to achieve strong transfer performance across domains. We evaluate our approach on multiple video domain adaptation benchmarks and observe significant improvements upon previously reported results.	翻訳日:2024-04-24 00:13:26 公開日:2024-04-20
# StructComp: グラフコントラスト学習における構造圧縮による伝達の代替 StructComp: Substituting propagation with Structural Compression in Training Graph Contrastive Learning ( http://arxiv.org/abs/2312.04865v3 ) ライセンス: Link先を確認	Shengzhong Zhang, Wenjie Yang, Xinyuan Cao, Hongwei Zhang, Zengfeng Huang,	(参考訳) グラフコントラスト学習(GCL)は、グラフデータを学習するための強力なツールとなっているが、そのスケーラビリティは依然として大きな課題である。本研究では,この問題を解決するために,構造圧縮(StructComp)と呼ばれるシンプルで効果的なトレーニングフレームワークを提案する。拡散行列上の疎低ランク近似にインスパイアされたStructCompは、圧縮ノードでエンコーダを訓練する。これにより、エンコーダはトレーニング期間中にメッセージパッシングを行わず、対照的な損失でサンプルペアの数を大幅に削減できる。理論的には、元のGCL損失はStructCompによって計算された対照的な損失と近似できる。さらに、StructCompはGCLモデルのさらなる正規化用語と見なすことができ、より堅牢なエンコーダとなる。様々なデータセットに関する実証的研究により、StructCompは、バニラGCLモデルやスケーラブルなトレーニング手法と比較して、モデルパフォーマンスを改善しながら、時間とメモリ消費を大幅に削減することが示された。 Graph contrastive learning (GCL) has become a powerful tool for learning graph data, but its scalability remains a significant challenge. In this work, we propose a simple yet effective training framework called Structural Compression (StructComp) to address this issue. Inspired by a sparse low-rank approximation on the diffusion matrix, StructComp trains the encoder with the compressed nodes. This allows the encoder not to perform any message passing during the training stage, and significantly reduces the number of sample pairs in the contrastive loss. We theoretically prove that the original GCL loss can be approximated with the contrastive loss computed by StructComp. Moreover, StructComp can be regarded as an additional regularization term for GCL models, resulting in a more robust encoder. Empirical studies on various datasets show that StructComp greatly reduces the time and memory consumption while improving model performance compared to the vanilla GCL models and scalable training methods.	翻訳日:2024-04-24 00:13:26 公開日:2024-04-20
# ビジョンランゲージモデルによるFew-Shot物体検出の再検討 Revisiting Few-Shot Object Detection with Vision-Language Models ( http://arxiv.org/abs/2312.14494v2 ) ライセンス: Link先を確認	Anish Madan, Neehar Peri, Shu Kong, Deva Ramanan,	(参考訳) FSOD(Few-shot Object Detection)ベンチマークは、アノテーションを限定した新しいカテゴリを検出するための高度な技術を持っている。既存のベンチマークでは、COCOのような確立されたデータセットを、それぞれ、事前トレーニングと微調整のためのベースクラスと新しいクラスに分割することで再利用している。しかし、これらのベンチマークは、FSODが実際にどのようにデプロイされているかを反映していない。少数の基本カテゴリを事前学習するよりは、対象ドメインに対して基礎モデル(例えば、Webスケールデータに基づいて事前学習された視覚言語モデル(VLM))を微調整することがより現実的であると論じる。驚いたことに、GroundingDINOのようなVLMからのゼロショット推論はCOCO上の最先端(48.3対33.1 AP)よりも著しく優れている。しかし、そのようなゼロショットモデルは、それでも対象とする興味ある概念と一致しない。例えば、ウェブ上のトレーラーは、自動運転車の文脈におけるトレーラーとは異なるかもしれない。本研究では,任意の外部データセット上で事前学習し,ターゲットクラス毎のKショットを微調整した検出器を評価するための新しいベンチマークプロトコルであるFoundational FSODを提案する。さらに、現在のFSODベンチマークは、データのサブセット上の各カテゴリに対する徹底的なアノテーションを含む、実際にフェデレーションされたデータセットである点に留意する。我々はこの知見を利用して、フェデレートされた損失を伴う微調整VLMの簡単な戦略を提案する。我々は LVIS と nu Images に対するアプローチの有効性を実証し,5.9 AP による先行作業よりも改善した。私たちのコードはhttps://github.com/anishmadan23/foundational_fsodで利用可能です。 Few-shot object detection (FSOD) benchmarks have advanced techniques for detecting new categories with limited annotations. Existing benchmarks repurpose well-established datasets like COCO by partitioning categories into base and novel classes for pre-training and fine-tuning respectively. However, these benchmarks do not reflect how FSOD is deployed in practice. Rather than only pre-training on a small number of base categories, we argue that it is more practical to fine-tune a foundation model (e.g., a vision-language model (VLM) pre-trained on web-scale data) for a target domain. Surprisingly, we find that zero-shot inference from VLMs like GroundingDINO significantly outperforms the state-of-the-art (48.3 vs. 33.1 AP) on COCO. However, such zero-shot models can still be misaligned to target concepts of interest. For example, trailers on the web may be different from trailers in the context of autonomous vehicles. In this work, we propose Foundational FSOD, a new benchmark protocol that evaluates detectors pre-trained on any external datasets and fine-tuned on K-shots per target class. Further, we note that current FSOD benchmarks are actually federated datasets containing exhaustive annotations for each category on a subset of the data. We leverage this insight to propose simple strategies for fine-tuning VLMs with federated losses. We demonstrate the effectiveness of our approach on LVIS and nuImages, improving over prior work by 5.9 AP. Our code is available at https://github.com/anishmadan23/foundational_fsod	翻訳日:2024-04-24 00:03:25 公開日:2024-04-20
# 交換のないSGDの軌道について On the Trajectories of SGD Without Replacement ( http://arxiv.org/abs/2312.16143v2 ) ライセンス: Link先を確認	Pierfrancesco Beneventano,	(参考訳) 本稿では,SGD(Stochastic Gradient Descent)の暗黙的正則化効果について検討する。我々は、大規模なニューラルネットワークを最適化するために一般的に使用される変種であるSGDを置き換えることなく検討する。例えば、学習率とヘッセンの積を$O(1)$とし、モデルアーキテクチャ、学習タスク、損失(客観的)関数を指定しない。我々の理論の核となる結果は、SGDを置き換えることなく最適化することは、新しい正則化への追加ステップと局所的に等価であるということである。これは、置換のないSGDの期待軌跡を分離できることを意味している。 (i)高い曲率の方向に沿ってSGDに代えて(バッチをサンプリングする。)、 (二)平坦なものに沿ったノイズ共分散の痕跡の正則化。その結果、置換のないSGDは平坦な領域を移動し、置換したSGDよりもかなり速くサドルを逃れることができた。いくつかの視覚的タスクにおいて、新しい正規化器はフィッシャーマトリックスの重み付けされた痕跡をペナライズし、それ故にヘッセンのスペクトルの空間が、以前の研究から経験的な観察に則っていることを奨励する。また、SGDが(GDとは対照的に)安定性の端で訓練されない理由についても説明する。 This article examines the implicit regularization effect of Stochastic Gradient Descent (SGD). We consider the case of SGD without replacement, the variant typically used to optimize large-scale neural networks. We analyze this algorithm in a more realistic regime than typically considered in theoretical works on SGD, as, e.g., we allow the product of the learning rate and Hessian to be $O(1)$ and we do not specify any model architecture, learning task, or loss (objective) function. Our core theoretical result is that optimizing with SGD without replacement is locally equivalent to making an additional step on a novel regularizer. This implies that the expected trajectories of SGD without replacement can be decoupled in (i) following SGD with replacement (in which batches are sampled i.i.d.) along the directions of high curvature, and (ii) regularizing the trace of the noise covariance along the flat ones. As a consequence, SGD without replacement travels flat areas and may escape saddles significantly faster than SGD with replacement. On several vision tasks, the novel regularizer penalizes a weighted trace of the Fisher Matrix, thus encouraging sparsity in the spectrum of the Hessian of the loss in line with empirical observations from prior work. We also propose an explanation for why SGD does not train at the edge of stability (as opposed to GD).	翻訳日:2024-04-24 00:03:25 公開日:2024-04-20
# 多様化によるOOD一般化の鍵となる要素の解明 Unraveling the Key Components of OOD Generalization via Diversification ( http://arxiv.org/abs/2312.16313v3 ) ライセンス: Link先を確認	Harold Benoit, Liangze Jiang, Andrei Atanov, Oğuzhan Fatih Kar, Mattia Rigotti, Amir Zamir,	(参考訳) 監視された学習データセットには、トレーニングセットが同じようにうまく説明される複数のキューが含まれている可能性がある。しかし、それらの多くは、すなわち分布シフトの下で予測力を失い、結果としてアウト・オブ・ディストリビューション(OOD)データへの一般化に失敗する。最近開発された「多様性」法(Lee et al , 2023; Pagliardini et al , 2023)は、異なる特徴に依存する複数の多様な仮説を見つけることによってこの問題にアプローチしている。本研究の目的は,OODの一般化能力に寄与する重要な要素を同定することである。 1) 分散化法は, 分散化に使用するラベルなしデータの分布に非常に敏感であり, メソッド固有の甘味点から離れた場合, 性能が著しく低下することを示す。 2)OODの一般化には多様化だけでは不十分である。使用済みの学習アルゴリズム、例えば、モデルのアーキテクチャと事前学習の選択は、非常に重要です。標準的な実験(WaterbirdsとOffice-Homeデータセットの分類)では、第2の選択肢を使用すると、絶対的な精度が最大20%低下する。 (3)学習アルゴリズムの最適選択はラベルのないデータに依存する。 (4) 最後に, 多様な仮説の数を増やすことで, 上記の落とし穴を軽減できないことを示す。これらの結果は,OODの一般化能力に影響を及ぼす設計要因の解明に寄与する。既存の手法を最大限に活用する方法を実践者たちに指導し、新しいより良い方法の開発を研究者に指導することができる。 Supervised learning datasets may contain multiple cues that explain the training set equally well, i.e., learning any of them would lead to the correct predictions on the training data. However, many of them can be spurious, i.e., lose their predictive power under a distribution shift and consequently fail to generalize to out-of-distribution (OOD) data. Recently developed "diversification" methods (Lee et al., 2023; Pagliardini et al., 2023) approach this problem by finding multiple diverse hypotheses that rely on different features. This paper aims to study this class of methods and identify the key components contributing to their OOD generalization abilities. We show that (1) diversification methods are highly sensitive to the distribution of the unlabeled data used for diversification and can underperform significantly when away from a method-specific sweet spot. (2) Diversification alone is insufficient for OOD generalization. The choice of the used learning algorithm, e.g., the model's architecture and pretraining, is crucial. In standard experiments (classification on Waterbirds and Office-Home datasets), using the second-best choice leads to an up to 20\% absolute drop in accuracy. (3) The optimal choice of learning algorithm depends on the unlabeled data and vice versa i.e. they are co-dependent. (4) Finally, we show that, in practice, the above pitfalls cannot be alleviated by increasing the number of diverse hypotheses, the major feature of diversification methods. These findings provide a clearer understanding of the critical design factors influencing the OOD generalization abilities of diversification methods. They can guide practitioners in how to use the existing methods best and guide researchers in developing new, better ones.	翻訳日:2024-04-24 00:03:25 公開日:2024-04-20
# 衝突機での散乱断面積によるベルの不等式測定は可能か? Can Bell inequalities be tested via scattering cross-section at colliders ? ( http://arxiv.org/abs/2401.01162v2 ) ライセンス: Link先を確認	Song Li, Wei Shen, Jin Min Yang,	(参考訳) 衝突子におけるベルの不等式をテストするための最近の研究では、散乱断面積からのスピン相関の再構成はスピン相関の双線型形式に依存しており、全ての局所隠れ変数モデル(LHVM)がそのような性質を持つわけではない。一般LHVMが散乱断面積データによって排除できないことを示すために,粒子生成と崩壊の散乱断面積を標準量子理論と正確に同一に再現できる特定のLHVMを提案する。これにもかかわらず、散乱断面積によるスピン相関の再構成は、量子スピン相関の代用として古典的なスピン相関を用いたモデルにおいて、LHVMの幅広いクラスを除外することができる。 In current studies for testing Bell inequalities at colliders, the reconstruction of spin correlations from scattering cross-sections relies on the bilinear form of the spin correlations, and not all local hidden variable models (LHVMs) have such a property. To demonstrate that a general LHVM cannot be rule out via scattering cross-section data, we propose a specific LHVM, which can exactly duplicate the same scattering cross-section for particle production and decay as the standard quantum theory, making it indistinguishable at colliders in principle. Despite of this, we find that reconstructing spin correlations through scattering cross-sections can still rule out a broad class of LHVMs, e.g., those models employing classical spin correlations as a surrogate for quantum spin correlations.	翻訳日:2024-04-23 23:53:39 公開日:2024-04-20
# PVTをベースとしたエンコーディングと精細復号によるCT肝セグメンテーション CT Liver Segmentation via PVT-based Encoding and Refined Decoding ( http://arxiv.org/abs/2401.09630v3 ) ライセンス: Link先を確認	Debesh Jha, Nikhil Kumar Tomar, Koushik Biswas, Gorkem Durak, Alpay Medetalibeyoglu, Matthew Antalek, Yury Velichko, Daniela Ladner, Amir Borhani, Ulas Bagci,	(参考訳) CTスキャンからの正確な肝分画は、効果的な診断と治療計画に不可欠である。コンピュータ支援診断システムは、肝疾患の診断、疾患の進行、治療計画の精度を向上させることを約束する。そこで本研究では,事前学習されたピラミッド・ビジョン・トランスフォーマ(PVT v2)と,高度な残差アップサンプリングとデコーダブロックを組み合わせた新しいディープラーニング手法である‘textit{\textbf{PVTFormer}}を提案する。改良された特徴チャネルアプローチを階層的デコーディング戦略に統合することにより、PVTFormerはセマンティック機能を強化して高品質なセグメンテーションマスクを生成する。肝腫瘍セグメンテーションベンチマーク(LiTS)2017において提案手法の厳密な評価により,提案手法は高ダイス係数86.78\%,mIoU78.46\%,低HD3.50が得られた。その結果,最新の肝セグメンテーション法に対する新しいベンチマークの設定においてPVTFormerの有効性を裏付ける結果を得た。提案されたPVTFormerのソースコードは、 \url{https://github.com/DebeshJha/PVTFormer} で入手できる。 Accurate liver segmentation from CT scans is essential for effective diagnosis and treatment planning. Computer-aided diagnosis systems promise to improve the precision of liver disease diagnosis, disease progression, and treatment planning. In response to the need, we propose a novel deep learning approach, \textit{\textbf{PVTFormer}}, that is built upon a pretrained pyramid vision transformer (PVT v2) combined with advanced residual upsampling and decoder block. By integrating a refined feature channel approach with a hierarchical decoding strategy, PVTFormer generates high quality segmentation masks by enhancing semantic features. Rigorous evaluation of the proposed method on Liver Tumor Segmentation Benchmark (LiTS) 2017 demonstrates that our proposed architecture not only achieves a high dice coefficient of 86.78\%, mIoU of 78.46\%, but also obtains a low HD of 3.50. The results underscore PVTFormer's efficacy in setting a new benchmark for state-of-the-art liver segmentation methods. The source code of the proposed PVTFormer is available at \url{https://github.com/DebeshJha/PVTFormer}.	翻訳日:2024-04-23 23:53:39 公開日:2024-04-20
# Reliance: 情報とニュースの信頼性評価のための信頼性のあるアンサンブル学習 RELIANCE: Reliable Ensemble Learning for Information and News Credibility Evaluation ( http://arxiv.org/abs/2401.10940v2 ) ライセンス: Link先を確認	Majid Ramezani, Hamed Mohammadshahi, Mahshid Daliry, Soroor Rahmani, Amir-Hosein Asghari,	(参考訳) 情報拡散の時代において、ニュースコンテンツの信頼性を識別することは、ますます増加する課題である。本稿では,堅牢な情報と偽ニュースの信頼性評価を目的とした,先駆的なアンサンブル学習システムであるRELIANCEを紹介する。 Support Vector Machine(SVM)、Naive Bayes(英語版)、ロジスティック回帰(英語版)、ランダムフォレスト(英語版)、Bidirectional Long Term Memory Networks(英語版) (BiLSTMs)を含む5つの多様なベースモデルで構成され、RELIANCEはその強度を統合する革新的なアプローチを採用し、アンサンブルの集合的知性を利用して精度を高めている。実験では、個々のモデルよりもRELIANCEの方が優れていることが示され、信頼できない情報ソースと信頼できない情報ソースを区別する効果が示された。 Relianceはまた、情報およびニュース信頼性評価のベースラインモデルを超え、情報ソースの信頼性を評価する効果的なソリューションとしての地位を確立している。 In the era of information proliferation, discerning the credibility of news content poses an ever-growing challenge. This paper introduces RELIANCE, a pioneering ensemble learning system designed for robust information and fake news credibility evaluation. Comprising five diverse base models, including Support Vector Machine (SVM), naive Bayes, logistic regression, random forest, and Bidirectional Long Short Term Memory Networks (BiLSTMs), RELIANCE employs an innovative approach to integrate their strengths, harnessing the collective intelligence of the ensemble for enhanced accuracy. Experiments demonstrate the superiority of RELIANCE over individual models, indicating its efficacy in distinguishing between credible and non-credible information sources. RELIANCE, also surpasses baseline models in information and news credibility assessment, establishing itself as an effective solution for evaluating the reliability of information sources.	翻訳日:2024-04-23 23:43:55 公開日:2024-04-20
# あなたのケトルはハッカーより賢い? 消費者向けIoTデバイスでリプレイ攻撃の脆弱性を評価するためのスケーラブルなツール Is Your Kettle Smarter Than a Hacker? A Scalable Tool for Assessing Replay Attack Vulnerabilities on Consumer IoT Devices ( http://arxiv.org/abs/2401.12184v2 ) ライセンス: Link先を確認	Sara Lazzaro, Vincenzo De Angelis, Anna Maria Mandalari, Francesco Buccafurri,	(参考訳) コンシューマモノのインターネット(IoT)デバイスは、しばしばローカルネットワークを利用して対応するアプリや他のデバイスと通信する。これはクラウドをオフロードするため、効率の面でメリットがあります。 ENISAとNISTのセキュリティガイドラインは、安全と信頼性のためのデフォルトのローカル通信を可能にすることの重要性を強調している。実際、IoTデバイスは、クラウド接続が利用できない場合にも機能し続けなければならない。クラウドデバイス接続のセキュリティは通常、標準プロトコルの使用によって強化されるが、ローカル接続セキュリティはしばしば見過ごされる。ローカル通信のセキュリティの無視は、リプレイ攻撃を含む様々な脅威への扉を開く。本稿では,攻撃をリプレイするためのIoTデバイスの脆弱性を自動的にテストするための体系的手法を設計することによって,この種の攻撃について検討する。具体的には,REPLIOTというツールを用いて,ターゲット装置の事前知識を必要とせずに,リプレイ攻撃が成功したかどうかを判定する手法を提案する。私たちは、さまざまなベンダーやカテゴリにまたがる人気のある商用デバイスを使って、何千もの自動実験を行います。特に,これらのデバイスのうち51%はローカル接続をサポートしていないため,ENISA/NISTガイドラインの信頼性と安全性要件に準拠していない。残りの75%のデバイスは、検出精度0.98-1のREPLIOTによるリプレイ攻撃に対して脆弱であることがわかった。最後に、この脆弱性の原因について検討し、緩和戦略について議論する。 Consumer Internet of Things (IoT) devices often leverage the local network to communicate with the corresponding companion app or other devices. This has benefits in terms of efficiency since it offloads the cloud. ENISA and NIST security guidelines underscore the importance of enabling default local communication for safety and reliability. Indeed, an IoT device should continue to function in case the cloud connection is not available. While the security of cloud-device connections is typically strengthened through the usage of standard protocols, local connectivity security is frequently overlooked. Neglecting the security of local communication opens doors to various threats, including replay attacks. In this paper, we investigate this class of attacks by designing a systematic methodology for automatically testing IoT devices vulnerability to replay attacks. Specifically, we propose a tool, named REPLIOT, able to test whether a replay attack is successful or not, without prior knowledge of the target devices. We perform thousands of automated experiments using popular commercial devices spanning various vendors and categories. Notably, our study reveals that among these devices, 51% of them do not support local connectivity, thus they are not compliant with the reliability and safety requirements of the ENISA/NIST guidelines. We find that 75% of the remaining devices are vulnerable to replay attacks with REPLIOT having a detection accuracy of 0.98-1. Finally, we investigate the possible causes of this vulnerability, discussing possible mitigation strategies.	翻訳日:2024-04-23 23:43:55 公開日:2024-04-20
# Delocate: ランダムに位置決めされたトレーパー付きディープフェイクビデオの検出と位置決め Delocate: Detection and Localization for Deepfake Videos with Randomly-Located Tampered Traces ( http://arxiv.org/abs/2401.13516v2 ) ライセンス: Link先を確認	Juan Hu, Xin Liao, Difei Gao, Satoshi Tsutsui, Qian Wang, Zheng Qin, Mike Zheng Shou,	(参考訳) ディープフェイクビデオはますます現実的になりつつあり、フレームごとに異なる顔の領域を微妙に改ざんしている。その結果、既存のDeepfake検出手法の多くは、未知のドメインのDeepfakeビデオを検出するのに苦労し、改ざんされた領域を正確に特定する。そこで本研究では,未知のドメインのDeepfakeビデオの認識とローカライズが可能なDelocateという,新しいDeepfake検出モデルを提案する。 OurmethodはRecovering and Localizationという2つのステージから構成される。回復段階において、モデルランダムは興味のある領域(ROI)を隠蔽し、痕跡を改ざんすることなく実際の顔を再構成する。ローカライゼーション段階において、リカバリフェーズの出力とフォージェリーグラウンドの真理マスクは、フォージェリーローカライゼーションプロセスの導出を補助する。このプロセスは、偽の顔の回復段階と回復不良を戦略的に強調し、改ざんされた領域の局所化を容易にする。広範に使用されている4つのベンチマークデータセットの大規模な実験により、乱れ領域のローカライズに限らず、クロスドメイン検出性能も向上することが示された。 Deepfake videos are becoming increasingly realistic, showing subtle tampering traces on facial areasthat vary between frames. Consequently, many existing Deepfake detection methods struggle to detect unknown domain Deepfake videos while accurately locating the tampered region. To address thislimitation, we propose Delocate, a novel Deepfake detection model that can both recognize andlocalize unknown domain Deepfake videos. Ourmethod consists of two stages named recoveringand localization. In the recovering stage, the modelrandomly masks regions of interest (ROIs) and reconstructs real faces without tampering traces, resulting in a relatively good recovery effect for realfaces and a poor recovery effect for fake faces. Inthe localization stage, the output of the recoveryphase and the forgery ground truth mask serve assupervision to guide the forgery localization process. This process strategically emphasizes the recovery phase of fake faces with poor recovery, facilitating the localization of tampered regions. Ourextensive experiments on four widely used benchmark datasets demonstrate that Delocate not onlyexcels in localizing tampered areas but also enhances cross-domain detection performance.	翻訳日:2024-04-23 23:43:55 公開日:2024-04-20
# Pixel to Elevation: 自動オフロードナビゲーションのための画像を用いた長距離標高マップの学習 Pixel to Elevation: Learning to Predict Elevation Maps at Long Range using Images for Autonomous Offroad Navigation ( http://arxiv.org/abs/2401.17484v3 ) ライセンス: Link先を確認	Chanyoung Chung, Georgios Georgakis, Patrick Spieler, Curtis Padgett, Ali Agha, Shehryar Khattak,	(参考訳) 長距離での地形トポロジーの理解は、特に高速での航行において、オフロードロボットミッションの成功に不可欠である。現在幾何学的マッピングに大きく依存しているLiDARセンサーは、より遠くのマッピングでスパース測定を行う。この課題に対処するために,車載エゴセントリック画像のみをリアルタイムに利用して,長距離の地形標高マップを予測可能な,新しい学習ベースアプローチを提案する。提案手法は3つの要素から構成される。まず, トランスフォーマーをベースとしたエンコーダを導入し, エゴセントリックな視線と, 以前の鳥眼の視線高度マップの予測との相互関係を学習する。第2に,多視点視覚画像特徴を有する複雑な非構造地形上での3次元車両の姿勢認識型位置符号化を提案する。最後に、下流のナビゲーション作業を容易にするために、標高マップ予測間の時間的整合性を改善するために、歴史を付加した学習可能なマップ埋め込みを提案する。実世界のオフロード駆動データを用いて,複雑・非構造地形における自律型オフロードロボットナビゲーションの適用性について実験的に検証した。さらに、この手法は現在の最先端手法と比較して質的かつ定量的に比較される。大規模フィールド実験により, 地形の高度を正確に予測し, 地形の全体像を長距離で効果的に把握し, ベースラインモデルを超えていることが示された。最後に,提案手法の重要成分の影響を強調・理解し,オフロードロボットナビゲーション能力を向上させるための適合性を検証するためにアブレーション研究を行った。 Understanding terrain topology at long-range is crucial for the success of off-road robotic missions, especially when navigating at high-speeds. LiDAR sensors, which are currently heavily relied upon for geometric mapping, provide sparse measurements when mapping at greater distances. To address this challenge, we present a novel learning-based approach capable of predicting terrain elevation maps at long-range using only onboard egocentric images in real-time. Our proposed method is comprised of three main elements. First, a transformer-based encoder is introduced that learns cross-view associations between the egocentric views and prior bird-eye-view elevation map predictions. Second, an orientation-aware positional encoding is proposed to incorporate the 3D vehicle pose information over complex unstructured terrain with multi-view visual image features. Lastly, a history-augmented learn-able map embedding is proposed to achieve better temporal consistency between elevation map predictions to facilitate the downstream navigational tasks. We experimentally validate the applicability of our proposed approach for autonomous offroad robotic navigation in complex and unstructured terrain using real-world offroad driving data. Furthermore, the method is qualitatively and quantitatively compared against the current state-of-the-art methods. Extensive field experiments demonstrate that our method surpasses baseline models in accurately predicting terrain elevation while effectively capturing the overall terrain topology at long-ranges. Finally, ablation studies are conducted to highlight and understand the effect of key components of the proposed approach and validate their suitability to improve offroad robotic navigation capabilities.	翻訳日:2024-04-23 23:43:55 公開日:2024-04-20
# 粗さ検出・分類のためのスマートウォッチマイクロフォンセンサの高調度化 Harnessing Smartwatch Microphone Sensors for Cough Detection and Classification ( http://arxiv.org/abs/2401.17738v2 ) ライセンス: Link先を確認	Pranay Jaiswal, Haroon R. Lone,	(参考訳) 本研究では,マイクロホンセンサを内蔵したスマートウォッチを用いたコークスのモニタリングと各種のコークス検出の可能性について検討した。参加者32名を対象に調査を行い,9時間分の音声データを制御的に収集した。その後, このデータを構造化した手法で処理し, その結果, 223個の正粘性試料が得られた。さらに,拡張手法によりデータセットを改良し,特殊な1次元CNNモデルを用いた。このモデルでは、非歩行時の98.49%、歩行中の98.2%の精度で、スマートウォッチが生地を検知できることを示している。さらに,本研究では,クラスタリング技術を用いて,4種類の生地の同定に成功した。 This study investigates the potential of using smartwatches with built-in microphone sensors for monitoring coughs and detecting various cough types. We conducted a study involving 32 participants and collected 9 hours of audio data in a controlled manner. Afterward, we processed this data using a structured approach, resulting in 223 positive cough samples. We further improved the dataset through augmentation techniques and employed a specialized 1D CNN model. This model achieved an impressive accuracy rate of 98.49% while non-walking and 98.2% while walking, showing smartwatches can detect cough. Moreover, our research successfully identified four distinct types of coughs using clustering techniques.	翻訳日:2024-04-23 23:43:55 公開日:2024-04-20
# AI生成画像検出に必要なのは1つのシンプルなパッチ A Single Simple Patch is All You Need for AI-generated Image Detection ( http://arxiv.org/abs/2402.01123v2 ) ライセンス: Link先を確認	Jiaxuan Chen, Jieteng Yao, Li Niu,	(参考訳) 最近の生成モデルの発展は、超現実的な偽画像を生成する可能性を解き放つ。偽画像の悪用を防ぐため、AIが生成した画像検出は、偽画像と実際の画像とを区別することを目的としている。しかし、既存の手法では、未知のジェネレータが生成した画像を検出する際に、厳しい性能低下に悩まされている。生成モデルは、リッチなテクスチャでパッチを生成することに集中し、単純なパッチに存在するカメラキャプチャによる隠れノイズを無視しながら、画像をよりリアルにする傾向にある。本稿では,偽画像の識別に単一単純パッチのノイズパターンを利用する手法を提案する。さらに,低品質画像の処理における性能低下により,干渉情報を除去するエンハンスメントモジュールと知覚モジュールを導入する。大規模な実験により, 提案手法は, 公開ベンチマーク上での最先端性能を実現することができることを示した。 The recent development of generative models unleashes the potential of generating hyper-realistic fake images. To prevent the malicious usage of fake images, AI-generated image detection aims to distinguish fake images from real images. However, existing method suffer from severe performance drop when detecting images generated by unseen generators. We find that generative models tend to focus on generating the patches with rich textures to make the images more realistic while neglecting the hidden noise caused by camera capture present in simple patches. In this paper, we propose to exploit the noise pattern of a single simple patch to identify fake images. Furthermore, due to the performance decline when handling low-quality generated images, we introduce an enhancement module and a perception module to remove the interfering information. Extensive experiments demonstrate that our method can achieve state-of-the-art performance on public benchmarks.	翻訳日:2024-04-23 23:43:55 公開日:2024-04-20
# PiCO: 一貫性最適化に基づくLCMのピアレビュー PiCO: Peer Review in LLMs based on the Consistency Optimization ( http://arxiv.org/abs/2402.01830v2 ) ライセンス: Link先を確認	Kun-Peng Ning, Shuo Yang, Yu-Yang Liu, Jia-Yu Yao, Zhen-Hui Liu, Yu Wang, Ming Pang, Li Yuan,	(参考訳) 既存の大規模言語モデル (LLMs) の評価手法は一般的に、人間アノテーションを使ったクローズド環境とドメイン固有のベンチマークでの性能をテストすることに重点を置いている。本稿では,LLMを自動計測するピアレビュー機構を利用して,教師なしの新たな評価方向を探索する。この設定では、オープンソースのLLMとクローズドソースのLLMは同じ環境にあり、ラベルのない質問に回答し、互いに評価することができる。これらのモデル間の能力階層を得るため、各LLMに学習可能な能力パラメータを割り当て、最終ランク付けを調整する。制約付き最適化問題として定式化し、各LLMの能力とスコアの一貫性を最大化することを目的としている。背景にある重要な前提は、高レベルのLSMは低レベルのLSMよりも他人の回答をより正確に評価でき、高レベルのLSMは高い応答スコアを達成できるということである。さらに,PEN,CIN,LISという3つの指標を用いて,ランク付けのギャップを評価する。これらのメトリクスを用いて複数のデータセットの実験を行い、提案手法の有効性を検証する。 Existing large language models (LLMs) evaluation methods typically focus on testing the performance on some closed-environment and domain-specific benchmarks with human annotations. In this paper, we explore a novel unsupervised evaluation direction, utilizing peer-review mechanisms to measure LLMs automatically. In this setting, both open-source and closed-source LLMs lie in the same environment, capable of answering unlabeled questions and evaluating each other, where each LLM's response score is jointly determined by other anonymous ones. To obtain the ability hierarchy among these models, we assign each LLM a learnable capability parameter to adjust the final ranking. We formalize it as a constrained optimization problem, intending to maximize the consistency of each LLM's capabilities and scores. The key assumption behind is that high-level LLM can evaluate others' answers more accurately than low-level ones, while higher-level LLM can also achieve higher response scores. Moreover, we propose three metrics called PEN, CIN, and LIS to evaluate the gap in aligning human rankings. We perform experiments on multiple datasets with these metrics, validating the effectiveness of the proposed approach.	翻訳日:2024-04-23 23:43:55 公開日:2024-04-20
# 対決の監査:エビデンスとスタイルによる高度な対論生成の評価 Auditing Counterfire: Evaluating Advanced Counterargument Generation with Evidence and Style ( http://arxiv.org/abs/2402.08498v4 ) ライセンス: Link先を確認	Preetika Verma, Kokil Jaidka, Svetlana Churina,	(参考訳) Reddit ChangeMyViewデータセットからの投稿に対するエビデンスベースでスタイリスティックな反論を作成する能力について、大規模な言語モデル(LLM)を監査しました。質的および定量的な指標のホスト間でそれらの修辞的品質をベンチマークし、最終的には人間の逆論と比較して説得力で評価した。 GPT-3.5 Turbo と Koala とそれらの微調整された変種と PaLM 2 はエビデンスの使用と議論スタイルの異なるプロンプトである。 GPT-3.5 Turboは、特に「相互性」スタイルの議論において、強いパラフレーズとスタイルの忠実さで、議論の質において最高位にランクされた。しかし、スタイリスティックな反論は人間の説得力基準に欠けており、人々は証拠に基づく反論に相反することを好んだ。この結果から, 明らか性と様式的要素のバランスが, 説得力のある反論に不可欠であることが示唆された。今後の研究の方向性とLCMのアウトプット評価の意義について論じる。 We audited large language models (LLMs) for their ability to create evidence-based and stylistic counter-arguments to posts from the Reddit ChangeMyView dataset. We benchmarked their rhetorical quality across a host of qualitative and quantitative metrics and then ultimately evaluated them on their persuasive abilities as compared to human counter-arguments. Our evaluation is based on Counterfire: a new dataset of 32,000 counter-arguments generated from large language models (LLMs): GPT-3.5 Turbo and Koala and their fine-tuned variants, and PaLM 2, with varying prompts for evidence use and argumentative style. GPT-3.5 Turbo ranked highest in argument quality with strong paraphrasing and style adherence, particularly in `reciprocity' style arguments. However, the stylistic counter-arguments still fall short of human persuasive standards, where people also preferred reciprocal to evidence-based rebuttals. The findings suggest that a balance between evidentiality and stylistic elements is vital to a compelling counter-argument. We close with a discussion of future research directions and implications for evaluating LLM outputs.	翻訳日:2024-04-23 23:34:03 公開日:2024-04-20
# 大規模言語モデルの推論を用いたパズル解法に関する調査 Puzzle Solving using Reasoning of Large Language Models: A Survey ( http://arxiv.org/abs/2402.11291v2 ) ライセンス: Link先を確認	Panagiotis Giadikiaroglou, Maria Lymperaiou, Giorgos Filandrianos, Giorgos Stamou,	(参考訳) パズル解決におけるLarge Language Models(LLM)の機能の探索は、AIの可能性と課題に関する重要な洞察を明らかにし、複雑な推論タスクにおけるそれらの適用性を理解するための重要なステップを示す。この調査では、パズルをルールベースとルールレスのカテゴリに分割するユニークな分類法を活用し、様々な方法論を通じてLSMを批判的に評価する。関連するデータセットとベンチマークの批判的レビューを通じて、LLMの性能を評価し、複雑なパズルシナリオにおける重要な課題を特定する。本研究は,高度な論理的推論を必要とする人において,LLM能力と人間ライクな推論の相違を浮き彫りにした。この調査は、LLMのパズル解決能力を高め、AIの論理的推論と創造的問題解決の進歩に貢献するために、新しい戦略とよりリッチなデータセットの必要性を強調している。 Exploring the capabilities of Large Language Models (LLMs) in puzzle solving unveils critical insights into their potential and challenges in AI, marking a significant step towards understanding their applicability in complex reasoning tasks. This survey leverages a unique taxonomy -- dividing puzzles into rule-based and rule-less categories -- to critically assess LLMs through various methodologies, including prompting techniques, neuro-symbolic approaches, and fine-tuning. Through a critical review of relevant datasets and benchmarks, we assess LLMs' performance, identifying significant challenges in complex puzzle scenarios. Our findings highlight the disparity between LLM capabilities and human-like reasoning, particularly in those requiring advanced logical inference. The survey underscores the necessity for novel strategies and richer datasets to advance LLMs' puzzle-solving proficiency and contribute to AI's logical reasoning and creative problem-solving advancements.	翻訳日:2024-04-23 23:34:03 公開日:2024-04-20
# MultiCorrupt: マルチモードロバストネスデータセットと3次元物体検出のためのLiDAR-Camera Fusionのベンチマーク MultiCorrupt: A Multi-Modal Robustness Dataset and Benchmark of LiDAR-Camera Fusion for 3D Object Detection ( http://arxiv.org/abs/2402.11677v3 ) ライセンス: Link先を確認	Till Beemelmanns, Quan Zhang, Christian Geller, Lutz Eckstein,	(参考訳) 自動走行のためのマルチモーダル3Dオブジェクト検出モデルは、nuScenesのようなコンピュータビジョンベンチマークでは例外的な性能を示した。しかし、密集したLiDAR点雲や精密に校正されたセンサーアレイへの依存は、現実世界のアプリケーションに課題をもたらす。センサの不整合、誤校正、異なるサンプリング周波数などの問題は、LiDARとカメラのデータにおける空間的および時間的不整合につながる。加えて、LiDARとカメラデータの完全性は、インクリメント気象などの有害な環境条件によってしばしば損なわれ、閉塞やノイズ干渉を引き起こす。この課題に対処するため,MultiCorruptは,10種類の汚職に対してマルチモーダル3Dオブジェクト検出器の堅牢性を評価するために設計された総合的なベンチマークである。我々は,MultiCorrupt上で5つの最先端マルチモーダル検出器を評価し,その耐久性能の観点からその性能を解析した。以上の結果から, 既存手法は, 腐敗の種類や融合戦略によって, 各種の強靭性を示すことがわかった。マルチモーダルな設計選択が、そのようなモデルをある種の摂動に対して堅牢にするための洞察を提供する。データセット生成コードとベンチマークはhttps://github.com/ika-rwth-aachen/MultiCorruptで公開されている。 Multi-modal 3D object detection models for automated driving have demonstrated exceptional performance on computer vision benchmarks like nuScenes. However, their reliance on densely sampled LiDAR point clouds and meticulously calibrated sensor arrays poses challenges for real-world applications. Issues such as sensor misalignment, miscalibration, and disparate sampling frequencies lead to spatial and temporal misalignment in data from LiDAR and cameras. Additionally, the integrity of LiDAR and camera data is often compromised by adverse environmental conditions such as inclement weather, leading to occlusions and noise interference. To address this challenge, we introduce MultiCorrupt, a comprehensive benchmark designed to evaluate the robustness of multi-modal 3D object detectors against ten distinct types of corruptions. We evaluate five state-of-the-art multi-modal detectors on MultiCorrupt and analyze their performance in terms of their resistance ability. Our results show that existing methods exhibit varying degrees of robustness depending on the type of corruption and their fusion strategy. We provide insights into which multi-modal design choices make such models robust against certain perturbations. The dataset generation code and benchmark are open-sourced at https://github.com/ika-rwth-aachen/MultiCorrupt.	翻訳日:2024-04-23 23:34:03 公開日:2024-04-20
# シュレーディンガー猫量子状態を用いた所定の位相シフトの検出 Using Schroedinger cat quantum state for detection of a given phase shift ( http://arxiv.org/abs/2403.03787v2 ) ライセンス: Link先を確認	V. L. Gorshenin, F. Ya. Khalili,	(参考訳) Shroedinger cat の量子状態において準備された光パルスを2本腕干渉計の暗いポートと強い古典的な光を明るいポートに注入することで、原理上、所定の位相シフトを不明瞭に検出できることを示す。この位相シフトの値は古典キャリアとシュレーディンガーの猫状態の振幅に逆比例する。しかし、この目的にはエキゾチックな検出手順が必要である。出力されるダークポートの光子数を測定することで、「偽陽性」確率で位相シフトを検出することができる。この場合の「偽陰性」確率はシュレーディンガーの猫状態の振幅の増加に伴って減少し、この振幅の合理的な値は0.1程度小さくすることができる。 We show that injecting a light pulse prepared in the Shroedinger cat quantum state into the dark port of a two-arm interferometer and the strong classical light into the bright one, it is possible, in principle, to detect a given phase shift unambiguously. The value of this phase shift is inversely proportional to the amplitudes of both the classical carrier and Shroedinger cat state. However, an exotic detection procedure is required for this purpose. By measuring the number of photons at the output dark port, it is possible to detect the phase shift with the vanishing "false positive" probability. The "false negative" probability in this case decreases with the increase on the amplitude of the Schroedinger cat state and, for reasonable values of this amplitude, can be made as small as about 0.1.	翻訳日:2024-04-23 23:24:19 公開日:2024-04-20
# Debatrix: LLMに基づく反復時間解析による多次元議論判断 Debatrix: Multi-dimensional Debate Judge with Iterative Chronological Analysis Based on LLM ( http://arxiv.org/abs/2403.08010v2 ) ライセンス: Link先を確認	Jingcong Liang, Rong Ye, Meng Han, Ruofei Lai, Xinyu Zhang, Xuanjing Huang, Zhongyu Wei,	(参考訳) 広範囲で活気あるマルチターンの議論を評価するために、自動討論審査をどうやって構築できるのか? この課題は、長いテキスト、複雑な議論関係、多次元アセスメントなどで議論されるので、難しい。同時に、現在の研究は主に短い対話に焦点を当てており、議論全体を評価することはめったにない。本稿では,Large Language Models (LLMs) を利用して,マルチターン討論の分析と評価を行うDebatrixを提案する。具体的には、Debatrixは垂直かつ反復的な時系列分析と水平多次元評価コラボレーションを備えている。実世界の議論シナリオに合わせるため、私たちはPanelBenchベンチマークを導入し、システムの性能と実際の議論結果を比較した。以上の結果から,LSMを直接使用して議論評価を行うことによる顕著な改善が示唆された。ソースコードとベンチマークデータはhttps://github.com/ljcleo/debatrix.comで公開されている。 How can we construct an automated debate judge to evaluate an extensive, vibrant, multi-turn debate? This task is challenging, as judging a debate involves grappling with lengthy texts, intricate argument relationships, and multi-dimensional assessments. At the same time, current research mainly focuses on short dialogues, rarely touching upon the evaluation of an entire debate. In this paper, by leveraging Large Language Models (LLMs), we propose Debatrix, which makes the analysis and assessment of multi-turn debates more aligned with majority preferences. Specifically, Debatrix features a vertical, iterative chronological analysis and a horizontal, multi-dimensional evaluation collaboration. To align with real-world debate scenarios, we introduced the PanelBench benchmark, comparing our system's performance to actual debate outcomes. The findings indicate a notable enhancement over directly using LLMs for debate evaluation. Source code and benchmark data are available online at https://github.com/ljcleo/debatrix .	翻訳日:2024-04-23 23:14:33 公開日:2024-04-20
# 野生動物における感情認識のための複合マルチモーダルトランス Joint Multimodal Transformer for Emotion Recognition in the Wild ( http://arxiv.org/abs/2403.10488v3 ) ライセンス: Link先を確認	Paul Waligora, Haseeb Aslam, Osama Zeeshan, Soufiane Belharbi, Alessandro Lameiras Koerich, Marco Pedersoli, Simon Bacon, Eric Granger,	(参考訳) マルチモーダル感情認識(MMER)システムは、例えば視覚的、テキスト的、生理的、聴覚的モダリティ間のモーダル間関係を利用して、通常、単モーダルシステムより優れている。本稿では,キーベースクロスアテンションとの融合のために,ジョイントマルチモーダルトランス (JMT) を利用するMMER法を提案する。このフレームワークは、様々なモダリティの相補的な性質を利用して予測精度を向上させることができる。異なるバックボーンは、ビデオシーケンス上の各モードにおけるモーダル内時空間依存性をキャプチャする。その後、JMT融合アーキテクチャは個々のモダリティ埋め込みを統合し、モデルがモーダル間およびモーダル間関係を効果的にキャプチャすることを可能にする。 Affwild2データセット(顔と声を含む)の次元的感情認識と、Biovidデータセット(顔とバイオセンサーを含む)の痛み推定という2つの困難な表現認識タスクに関する広範な実験は、我々のJMT融合がMMERにコスト効率の良いソリューションをもたらすことを示唆している。実験の結果,MMERシステムによる核融合により,関連するベースラインや最先端手法よりも優れた性能が得られることがわかった。 Multimodal emotion recognition (MMER) systems typically outperform unimodal systems by leveraging the inter- and intra-modal relationships between, e.g., visual, textual, physiological, and auditory modalities. This paper proposes an MMER method that relies on a joint multimodal transformer (JMT) for fusion with key-based cross-attention. This framework can exploit the complementary nature of diverse modalities to improve predictive accuracy. Separate backbones capture intra-modal spatiotemporal dependencies within each modality over video sequences. Subsequently, our JMT fusion architecture integrates the individual modality embeddings, allowing the model to effectively capture inter- and intra-modal relationships. Extensive experiments on two challenging expression recognition tasks -- (1) dimensional emotion recognition on the Affwild2 dataset (with face and voice) and (2) pain estimation on the Biovid dataset (with face and biosensors) -- indicate that our JMT fusion can provide a cost-effective solution for MMER. Empirical results show that MMER systems with our proposed fusion allow us to outperform relevant baseline and state-of-the-art methods.	翻訳日:2024-04-23 23:14:33 公開日:2024-04-20
# Lodge: 特徴的なダンスプリミティブによるロングダンス生成のための粗大な拡散ネットワーク Lodge: A Coarse to Fine Diffusion Network for Long Dance Generation Guided by the Characteristic Dance Primitives ( http://arxiv.org/abs/2403.10518v3 ) ライセンス: Link先を確認	Ronghui Li, YuXiang Zhang, Yachao Zhang, Hongwen Zhang, Jie Guo, Yan Zhang, Yebin Liu, Xiu Li,	(参考訳) 与えられた音楽に条件付けされた非常に長いダンスシーケンスを生成することができるネットワークであるLodgeを提案する。そこで我々は,2つの拡散モデル間の中間表現として有意な表現性を持つ特徴的ダンスプリミティブを提案する。第1段階はグローバル拡散であり、粗いレベルの音楽距離相関と生産特性のダンスプリミティブの理解に焦点を当てている。対照的に第2段階は局所拡散であり、ダンスプリミティブや振付規則の指導の下で、詳細な動き列を並列に生成する。さらに,足と地面の接触を最適化するフットリファインブロックを提案し,運動の物理的現実性を高める。提案手法は,グローバルな振付パターンと局所的な動きの質,表現性とのバランスを保ちながら,非常に長いダンスシーケンスを並列に生成することができる。大規模な実験により,本手法の有効性が検証された。 We propose Lodge, a network capable of generating extremely long dance sequences conditioned on given music. We design Lodge as a two-stage coarse to fine diffusion architecture, and propose the characteristic dance primitives that possess significant expressiveness as intermediate representations between two diffusion models. The first stage is global diffusion, which focuses on comprehending the coarse-level music-dance correlation and production characteristic dance primitives. In contrast, the second-stage is the local diffusion, which parallelly generates detailed motion sequences under the guidance of the dance primitives and choreographic rules. In addition, we propose a Foot Refine Block to optimize the contact between the feet and the ground, enhancing the physical realism of the motion. Our approach can parallelly generate dance sequences of extremely long length, striking a balance between global choreographic patterns and local motion quality and expressiveness. Extensive experiments validate the efficacy of our method.	翻訳日:2024-04-23 23:14:33 公開日:2024-04-20
# ガウス過程回帰を用いた機械学習に基づくシステムの信頼性解析 Machine learning-based system reliability analysis with Gaussian Process Regression ( http://arxiv.org/abs/2403.11125v2 ) ライセンス: Link先を確認	Lisang Zhou, Ziqian Luo, Xueting Pan,	(参考訳) 機械学習に基づく信頼性解析手法は、その計算効率と精度に大きな進歩を示した。近年,計算性能を向上させるために,多くの効率的な学習戦略が提案されている。しかし、理論的最適学習戦略を探求する者はほとんどいない。本稿では,そのような探索を容易にするいくつかの定理を提案する。具体的には, 候補設計サンプル間の相関を考慮し, 無視する事例について詳しく述べる。さらに、Kriging相関を無視するケースに対して、よく知られたU学習関数を最適な学習関数に再構成できることを証明した。さらに、逐次多重訓練サンプル濃縮の理論的最適学習戦略についても、ベイズ推定とそれに対応する損失関数を用いて数学的に検討する。シミュレーションの結果,Krigingの相関性を考慮した最適学習戦略は,性能関数の評価回数の削減の観点から,Krigingの相関性やその他の最先端の学習機能を文献から無視する手法よりも有効であることが示唆された。しかし、この実装は非常に大きな計算資源を調査する必要がある。 Machine learning-based reliability analysis methods have shown great advancements for their computational efficiency and accuracy. Recently, many efficient learning strategies have been proposed to enhance the computational performance. However, few of them explores the theoretical optimal learning strategy. In this article, we propose several theorems that facilitates such exploration. Specifically, cases that considering and neglecting the correlations among the candidate design samples are well elaborated. Moreover, we prove that the well-known U learning function can be reformulated to the optimal learning function for the case neglecting the Kriging correlation. In addition, the theoretical optimal learning strategy for sequential multiple training samples enrichment is also mathematically explored through the Bayesian estimate with the corresponding lost functions. Simulation results show that the optimal learning strategy considering the Kriging correlation works better than that neglecting the Kriging correlation and other state-of-the art learning functions from the literatures in terms of the reduction of number of evaluations of performance function. However, the implementation needs to investigate very large computational resource.	翻訳日:2024-04-23 23:14:33 公開日:2024-04-20
# CHisIEC: 古代中国史のための情報抽出コーパス CHisIEC: An Information Extraction Corpus for Ancient Chinese History ( http://arxiv.org/abs/2403.15088v2 ) ライセンス: Link先を確認	Xuemei Tang, Zekun Deng, Qi Su, Hao Yang, Jun Wang,	(参考訳) 自然言語処理(NLP)は、デジタル人文科学(DH)の領域において重要な役割を担い、歴史的・文化的遺産文書の構造解析を推進するための基盤となっている。これは、名前付きエンティティ認識(NER)と関係抽出(RE)のドメインに特に当てはまる。我々は,古代史・文化の迅速化への取り組みとして,「中国歴史情報抽出法人」(CHisIEC)を提示する。 CHisIEC は NER と RE タスクの開発と評価を目的とした,精巧にキュレートされたデータセットである。 1830年以上にわたる13の王朝のデータを網羅した、顕著な歴史的時系列を描いているCisIECは、中国の史料に固有の広範囲の時間的範囲とテキストの不均一性を表わしている。データセットには4つの異なるエンティティタイプと12のリレーションタイプが含まれており、14,194のエンティティと8,609のリレーションで構成されている。データセットの堅牢性と汎用性を確立するため,さまざまなサイズとパラダイムのモデルを含む総合的な実験を行った。また,古代中国史に関わる課題の文脈において,Large Language Models (LLMs) の機能を評価する。データセットとコードは \url{https://github.com/tangxuemei 1995/CHisIEC} で公開されている。 Natural Language Processing (NLP) plays a pivotal role in the realm of Digital Humanities (DH) and serves as the cornerstone for advancing the structural analysis of historical and cultural heritage texts. This is particularly true for the domains of named entity recognition (NER) and relation extraction (RE). In our commitment to expediting ancient history and culture, we present the ``Chinese Historical Information Extraction Corpus''(CHisIEC). CHisIEC is a meticulously curated dataset designed to develop and evaluate NER and RE tasks, offering a resource to facilitate research in the field. Spanning a remarkable historical timeline encompassing data from 13 dynasties spanning over 1830 years, CHisIEC epitomizes the extensive temporal range and text heterogeneity inherent in Chinese historical documents. The dataset encompasses four distinct entity types and twelve relation types, resulting in a meticulously labeled dataset comprising 14,194 entities and 8,609 relations. To establish the robustness and versatility of our dataset, we have undertaken comprehensive experimentation involving models of various sizes and paradigms. Additionally, we have evaluated the capabilities of Large Language Models (LLMs) in the context of tasks related to ancient Chinese history. The dataset and code are available at \url{https://github.com/tangxuemei1995/CHisIEC}.	翻訳日:2024-04-23 23:04:49 公開日:2024-04-20
# 2ストリームFoveation-based Active Vision Learningに向けて Towards Two-Stream Foveation-based Active Vision Learning ( http://arxiv.org/abs/2403.15977v3 ) ライセンス: Link先を確認	Timur Ibrayev, Amitangshu Mukherjee, Sai Aparna Aketi, Kaushik Roy,	(参考訳) ディープニューラルネットワーク(DNN)ベースのマシン認識フレームワークは、入力全体をワンショットで処理し、"何が観察されているか"と"どこにあるか"の両方に対する回答を提供する。対照的に、神経科学の「二流仮説」は、人間の視覚野における神経処理を、脳の2つの別々の領域を利用して、何とどこにあるのかを答える能動的視覚システムとして説明している。本研究では,「二流仮説」にインスパイアされた機械学習フレームワークを提案する。具体的には、提案するフレームワークが以下のメカニズムをモデル化する。 1)眼底部が知覚する入力領域に着目した腹側流(何) 2 視覚的指導を提供する背後(場所)流路及び 3)2つのストリームの反復処理により、視覚的焦点を調整し、フォーカスされた画像パッチのシーケンスを処理する。提案するフレームワークのトレーニングは,腹側ストリームモデルのためのラベルベースのDNNトレーニングと背側ストリームモデルのための強化学習によって達成される。本稿では,2ストリームのファベーションに基づく学習が,訓練データをオブジェクトクラスや属性に限定した弱教師付きオブジェクトローカライゼーション(WSOL)の課題に対して適用可能であることを示す。このフレームワークは、オブジェクトのプロパティを予測し、バウンディングボックスを予測してそれをローカライズすることができる。また、この2つのストリームの独立性から、背側モデルを適用することで、異なるデータセットからオブジェクトをローカライズできることを示す。 Deep neural network (DNN) based machine perception frameworks process the entire input in a one-shot manner to provide answers to both "what object is being observed" and "where it is located". In contrast, the "two-stream hypothesis" from neuroscience explains the neural processing in the human visual cortex as an active vision system that utilizes two separate regions of the brain to answer the what and the where questions. In this work, we propose a machine learning framework inspired by the "two-stream hypothesis" and explore the potential benefits that it offers. Specifically, the proposed framework models the following mechanisms: 1) ventral (what) stream focusing on the input regions perceived by the fovea part of an eye (foveation), 2) dorsal (where) stream providing visual guidance, and 3) iterative processing of the two streams to calibrate visual focus and process the sequence of focused image patches. The training of the proposed framework is accomplished by label-based DNN training for the ventral stream model and reinforcement learning for the dorsal stream model. We show that the two-stream foveation-based learning is applicable to the challenging task of weakly-supervised object localization (WSOL), where the training data is limited to the object class or its attributes. The framework is capable of both predicting the properties of an object and successfully localizing it by predicting its bounding box. We also show that, due to the independent nature of the two streams, the dorsal model can be applied on its own to unseen images to localize objects from different datasets.	翻訳日:2024-04-23 23:04:49 公開日:2024-04-20
# AI研究、政策、実践の10の優先事項 Now, Later, and Lasting: Ten Priorities for AI Research, Policy, and Practice ( http://arxiv.org/abs/2404.04750v3 ) ライセンス: Link先を確認	Eric Horvitz, Vincent Conitzer, Sheila McIlraith, Peter Stone,	(参考訳) 人工知能(AI)の進歩は、私たちの生活や社会の多くの側面を変革し、大きな機会をもたらすと同時に、重大なリスクや課題を生じさせます。今後数十年は、産業革命に匹敵する人類の転換点になるかもしれない。 AIに関する百年研究の創始者やリーダーの視点から、前進するための一連の推奨事項を共有します。 10年前に立ち上げられたこのプロジェクトは、複数の専門分野の専門家による永続的な一連の研究にコミットし、人間や社会に対するAIの即時的、長期的、そして遠方的な影響を評価し、AIの研究、政策、実践についてレコメンデーションを行う。ニューラルモデルから新たな能力が生まれるのを目の当たりにしているので、これらのモデルとその振る舞いに関する科学的理解を深める努力をすることが重要です。技術的、社会的、社会技術的レンズを通じて、AIが人や社会に与える影響に対処し、エンジニアリング、社会的、行動的、経済的な分野からの声を含む、さまざまな専門家の洞察を取り入れなければならない。さまざまな利害関係者間の対話、コラボレーション、行動を促進することで、私たちは、AIの開発と展開を、人間の繁栄に貢献する可能性を最大化する方法で戦略的に導くことができます。短期的な意味と長期的な意味に焦点をあてる分野が多様化しているにもかかわらず、どちらも重要な意味を持つと考えている。 1950年、AIのパイオニアの一人であるアラン・チューリングは「我々は少し先までしか見ることができないが、やるべきことはたくさんある」と記した。 AI技術の短期的および長期的影響の両方に対処する、アクションのための10のレコメンデーションを提供します。 Advances in artificial intelligence (AI) will transform many aspects of our lives and society, bringing immense opportunities but also posing significant risks and challenges. The next several decades may well be a turning point for humanity, comparable to the industrial revolution. We write to share a set of recommendations for moving forward from the perspective of the founder and leaders of the One Hundred Year Study on AI. Launched a decade ago, the project is committed to a perpetual series of studies by multidisciplinary experts to evaluate the immediate, longer-term, and far-reaching effects of AI on people and society, and to make recommendations about AI research, policy, and practice. As we witness new capabilities emerging from neural models, it is crucial that we engage in efforts to advance our scientific understanding of these models and their behaviors. We must address the impact of AI on people and society through technical, social, and sociotechnical lenses, incorporating insights from a diverse range of experts including voices from engineering, social, behavioral, and economic disciplines. By fostering dialogue, collaboration, and action among various stakeholders, we can strategically guide the development and deployment of AI in ways that maximize its potential for contributing to human flourishing. Despite the growing divide in the field between focusing on short-term versus long-term implications, we think both are of critical importance. As Alan Turing, one of the pioneers of AI, wrote in 1950, "We can only see a short distance ahead, but we can see plenty there that needs to be done." We offer ten recommendations for action that collectively address both the short- and long-term potential impacts of AI technologies.	翻訳日:2024-04-23 23:04:49 公開日:2024-04-20
# ディープビデオ圧縮のためのタスク認識エンコーダ制御 Task-Aware Encoder Control for Deep Video Compression ( http://arxiv.org/abs/2404.04848v2 ) ライセンス: Link先を確認	Xingtong Ge, Jixiang Luo, Xinjie Zhang, Tongda Xu, Guo Lu, Dailan He, Jing Geng, Yan Wang, Jun Zhang, Hongwei Qin,	(参考訳) マシンタスクのためのディープビデオ圧縮(DVC)に関する以前の研究は、通常、特定のタスクごとに独自のコーデックをトレーニングし、タスクごとに専用のデコーダを強制する必要がある。対照的に、従来のビデオコーデックはフレキシブルなエンコーダコントローラを採用しており、モード予測のようなメカニズムによって単一のコーデックを異なるタスクに適応させることができる。このことからインスピレーションを得て,機械用ディープビデオ圧縮のための革新的なエンコーダコントローラを導入する。モード予測とグループ・オブ・ピクチャーズ(GoP)選択モジュールを備える。提案手法は,符号化段階での制御を集中化し,検出やトラッキングなど,さまざまなタスクに適応可能なエンコーダ調整を実現するとともに,標準の事前学習DVCデコーダとの互換性を維持する。実験的な証拠は,本手法が既存の訓練済みDVCを用いて,複数のタスクにまたがって適用可能であることを示している。さらに,本手法が従来のDVCよりも25%ほど優れており,事前学習したデコーダが1つしかないことが実証された。 Prior research on deep video compression (DVC) for machine tasks typically necessitates training a unique codec for each specific task, mandating a dedicated decoder per task. In contrast, traditional video codecs employ a flexible encoder controller, enabling the adaptation of a single codec to different tasks through mechanisms like mode prediction. Drawing inspiration from this, we introduce an innovative encoder controller for deep video compression for machines. This controller features a mode prediction and a Group of Pictures (GoP) selection module. Our approach centralizes control at the encoding stage, allowing for adaptable encoder adjustments across different tasks, such as detection and tracking, while maintaining compatibility with a standard pre-trained DVC decoder. Empirical evidence demonstrates that our method is applicable across multiple tasks with various existing pre-trained DVCs. Moreover, extensive experiments demonstrate that our method outperforms previous DVC by about 25% bitrate for different tasks, with only one pre-trained decoder.	翻訳日:2024-04-23 23:04:49 公開日:2024-04-20
# 足のロコマニピュレーションのための視覚全体制御 Visual Whole-Body Control for Legged Loco-Manipulation ( http://arxiv.org/abs/2403.16967v3 ) ライセンス: Link先を確認	Minghuan Liu, Zixuan Chen, Xuxin Cheng, Yandong Ji, Rizhao Qiu, Ruihan Yang, Xiaolong Wang,	(参考訳) そこで本研究では,ロボットアームを装着したロボットによる移動操作の問題点について検討する。ロボットの脚は、通常移動のために使用されるが、全身制御を行うことで操作能力を増幅する機会を提供する。つまり、ロボットは足と腕を同時に制御し、ワークスペースを拡張する。視覚的観察により全身制御を自律的に行うことのできる枠組みを提案する。当社のアプローチであるVisual Whole-Body Control(VBC)は、すべての自由度を用いて、エンドエフェクタマニピュレータの位置を追跡する低レベルポリシーと、視覚入力に基づいてエンドエフェクタ位置を提案する高レベルポリシーで構成されている。シミュレーションにおける両レベルのポリシーをトレーニングし、実際のロボット展開のためのSim2Real転送を実行する。さまざまな構成(高さ、位置、方向)と環境において、さまざまなオブジェクトを拾う際に、大規模な実験を行い、ベースラインよりも大幅に改善した。プロジェクトページ: https://wholebody-b1.github.io We study the problem of mobile manipulation using legged robots equipped with an arm, namely legged loco-manipulation. The robot legs, while usually utilized for mobility, offer an opportunity to amplify the manipulation capabilities by conducting whole-body control. That is, the robot can control the legs and the arm at the same time to extend its workspace. We propose a framework that can conduct the whole-body control autonomously with visual observations. Our approach, namely Visual Whole-Body Control(VBC), is composed of a low-level policy using all degrees of freedom to track the end-effector manipulator position and a high-level policy proposing the end-effector position based on visual inputs. We train both levels of policies in simulation and perform Sim2Real transfer for real robot deployment. We perform extensive experiments and show significant improvements over baselines in picking up diverse objects in different configurations (heights, locations, orientations) and environments. Project page: https://wholebody-b1.github.io	翻訳日:2024-04-23 22:55:04 公開日:2024-04-20
# ディープフェイクの生成と検出:ベンチマークと調査 Deepfake Generation and Detection: A Benchmark and Survey ( http://arxiv.org/abs/2403.17881v3 ) ライセンス: Link先を確認	Gan Pei, Jiangning Zhang, Menghan Hu, Zhenyu Zhang, Chengjie Wang, Yunsheng Wu, Guangtao Zhai, Jian Yang, Chunhua Shen, Dacheng Tao,	(参考訳) Deepfake(ディープフェイク)は、特定の条件下で非常にリアルな顔画像やビデオを作成する技術であり、エンターテイメント、映画制作、デジタルヒューマン創造といった分野において大きな応用可能性を持つ。ディープラーニングの進歩により、主に変分オートエンコーダとジェネレーティブ・アドバイサル・ネットワークによって表現される技術は印象的な生成結果を得た。最近では、強力な生成能力を持つ拡散モデルの出現が、新たな研究の波を引き起こしている。ディープフェイク生成に加えて、対応する検出技術は継続的に進化し、プライバシー侵害やフィッシング攻撃などのディープフェイクの潜在的な誤用を規制している。本調査は, この急速に発展する分野における, ディープフェイクの発生と検出, 現状の要約と解析の最新の展開を包括的にレビューする。まずタスク定義を統一し、データセットとメトリクスを包括的に導入し、開発技術について議論する。そこで我々は,複数の関連分野の開発について論じ,顔スワップ,顔の再現,話し顔の生成,顔属性の編集,偽造検出という4つの代表的なディープフェイク分野の研究に焦点をあてる。その後、各分野の一般的なデータセットに代表的手法を総合的にベンチマークし、最新かつ影響力のある著作を十分に評価する。最後に,議論分野の課題と今後の研究方向性について分析する。 Deepfake is a technology dedicated to creating highly realistic facial images and videos under specific conditions, which has significant application potential in fields such as entertainment, movie production, digital human creation, to name a few. With the advancements in deep learning, techniques primarily represented by Variational Autoencoders and Generative Adversarial Networks have achieved impressive generation results. More recently, the emergence of diffusion models with powerful generation capabilities has sparked a renewed wave of research. In addition to deepfake generation, corresponding detection technologies continuously evolve to regulate the potential misuse of deepfakes, such as for privacy invasion and phishing attacks. This survey comprehensively reviews the latest developments in deepfake generation and detection, summarizing and analyzing current state-of-the-arts in this rapidly evolving field. We first unify task definitions, comprehensively introduce datasets and metrics, and discuss developing technologies. Then, we discuss the development of several related sub-fields and focus on researching four representative deepfake fields: face swapping, face reenactment, talking face generation, and facial attribute editing, as well as forgery detection. Subsequently, we comprehensively benchmark representative methods on popular datasets for each field, fully evaluating the latest and influential published works. Finally, we analyze challenges and future research directions of the discussed fields.	翻訳日:2024-04-23 22:55:04 公開日:2024-04-20
# V2X連携によるエンド・ツー・エンド自動運転 End-to-End Autonomous Driving through V2X Cooperation ( http://arxiv.org/abs/2404.00717v2 ) ライセンス: Link先を確認	Haibao Yu, Wenxian Yang, Jiaru Zhong, Zhenwei Yang, Siqi Fan, Ping Luo, Zaiqing Nie,	(参考訳) 先進的な自律運転のための有望なアプローチとして,自走車とV2X通信によるインフラセンサデータの協調利用が出現している。しかし、現在の研究では、最終的な計画性能を最適化するためにエンドツーエンドの学習を採用するのではなく、個々のモジュールの改善に重点を置いている。本稿では,UniV2Xについて紹介する。UniV2Xは,多様なビューにまたがる全てのキー駆動モジュールをシームレスに統合し,統合されたネットワークに組み込む,先駆的な自律運転フレームワークである。車両とインフラの効果的な連携のための疎密度ハイブリッドデータ伝送と融合機構を提案し,その利点を3つ挙げる。 1) エージェント認識, オンラインマッピング, 占有率予測を同時に強化し, 最終的に計画性能を向上する。 2)実用的・限られた通信条件に優しい送信システム。 3) このハイブリッドデータの解釈可能性を備えた信頼性のあるデータ融合。我々は、実際の協調運転データセットであるDAIR-V2Xに挑戦する上で、UniV2Xといくつかのベンチマークメソッドを再現する。実験の結果,UniV2Xは計画性能と中間出力性能を大幅に向上させることができた。コードはhttps://github.com/AIR-THU/UniV2Xにある。 Cooperatively utilizing both ego-vehicle and infrastructure sensor data via V2X communication has emerged as a promising approach for advanced autonomous driving. However, current research mainly focuses on improving individual modules, rather than taking end-to-end learning to optimize final planning performance, resulting in underutilized data potential. In this paper, we introduce UniV2X, a pioneering cooperative autonomous driving framework that seamlessly integrates all key driving modules across diverse views into a unified network. We propose a sparse-dense hybrid data transmission and fusion mechanism for effective vehicle-infrastructure cooperation, offering three advantages: 1) Effective for simultaneously enhancing agent perception, online mapping, and occupancy prediction, ultimately improving planning performance. 2) Transmission-friendly for practical and limited communication conditions. 3) Reliable data fusion with interpretability of this hybrid data. We implement UniV2X, as well as reproducing several benchmark methods, on the challenging DAIR-V2X, the real-world cooperative driving dataset. Experimental results demonstrate the effectiveness of UniV2X in significantly enhancing planning performance, as well as all intermediate output performance. Code is at https://github.com/AIR-THU/UniV2X.	翻訳日:2024-04-23 22:55:04 公開日:2024-04-20
# 新型コロナウイルス検出のための空間的スライス学習 A Closer Look at Spatial-Slice Features Learning for COVID-19 Detection ( http://arxiv.org/abs/2404.01643v2 ) ライセンス: Link先を確認	Chih-Chung Hsu, Chia-Ming Lee, Yang Fan Chiang, Yi-Shiuan Chou, Chih-Yu Jiang, Shen-Chieh Tai, Chi-Han Tsai,	(参考訳) 従来のCT画像認識では,各CTスキャンの解像度とサイズに有意なばらつきがしばしばあり,入力サイズと適応性に対する厳密な要件が要求される。 2)CTスキャンには,多くのアウト・オブ・ディストリビューション(OOD)スライスが含まれている。重要な特徴は、CTスキャン全体の特定の空間領域とスライスにのみ存在する可能性がある。これらがどこにあるのか、どうやって効果的に把握できるのか? そこで本稿では,CTスキャンに特化して設計されたSSFL++(Spatial-Slice Feature Learning)フレームワークを提案する。本研究の目的は,全CTスキャンでOODデータをフィルタリングし,70%の冗長性を完全に低減し,解析のための重要な空間スライスを選択することである。一方,KDS法は,トレーニングおよび推論段階における安定性を向上させるため,収束率を向上し,性能を向上する。その結果、トレーニングデータの1%しか持たない単純なE2Dモデルを用いて、本モデルの有望な性能を実証した。 DEF-AI-MIAワークショップで提供されるCOVID-19-CT-DBデータセットとCVPR 2024を併用して,本手法の有効性を検証した。ソースコードはhttps://github.com/ming053l/E2Dで入手できる。 Conventional Computed Tomography (CT) imaging recognition faces two significant challenges: (1) There is often considerable variability in the resolution and size of each CT scan, necessitating strict requirements for the input size and adaptability of models. (2) CT-scan contains large number of out-of-distribution (OOD) slices. The crucial features may only be present in specific spatial regions and slices of the entire CT scan. How can we effectively figure out where these are located? To deal with this, we introduce an enhanced Spatial-Slice Feature Learning (SSFL++) framework specifically designed for CT scan. It aim to filter out a OOD data within whole CT scan, enabling our to select crucial spatial-slice for analysis by reducing 70% redundancy totally. Meanwhile, we proposed Kernel-Density-based slice Sampling (KDS) method to improve the stability when training and inference stage, therefore speeding up the rate of convergence and boosting performance. As a result, the experiments demonstrate the promising performance of our model using a simple EfficientNet-2D (E2D) model, even with only 1% of the training data. The efficacy of our approach has been validated on the COVID-19-CT-DB datasets provided by the DEF-AI-MIA workshop, in conjunction with CVPR 2024. Our source code is available at https://github.com/ming053l/E2D	翻訳日:2024-04-23 22:45:14 公開日:2024-04-20
# 2レベルフィードバック制御によるネットワークシステムの侵入耐性 Intrusion Tolerance for Networked Systems through Two-Level Feedback Control ( http://arxiv.org/abs/2404.01741v2 ) ライセンス: Link先を確認	Kim Hammar, Rolf Stadler,	(参考訳) サービスレプリカを2段階最適制御問題とするシステムの侵入耐性を定式化する。ローカルレベルではノードコントローラが侵入回復を行い、グローバルレベルではシステムコントローラが複製係数を管理する。局所的およびグローバルな制御問題は、操作研究における古典的な問題、すなわち機械交換問題と在庫補充問題として定式化することができる。この定式化に基づいて、侵入耐性システムのための新しい制御アーキテクチャであるTOLERANCEを設計する。両レベルにおける最適制御戦略がしきい値構造を持ち、それらの計算に効率的なアルゴリズムを設計することを証明する。 10種類のネットワーク侵入を行うエミュレーション環境でのTOLERANCEの実装と評価を行う。その結果、TOLERANCEは、最先端の侵入耐性システムと比較して、サービスの可用性を向上し、運用コストを低減できることがわかった。 We formulate intrusion tolerance for a system with service replicas as a two-level optimal control problem. On the local level node controllers perform intrusion recovery, and on the global level a system controller manages the replication factor. The local and global control problems can be formulated as classical problems in operations research, namely, the machine replacement problem and the inventory replenishment problem. Based on this formulation, we design TOLERANCE, a novel control architecture for intrusion-tolerant systems. We prove that the optimal control strategies on both levels have threshold structure and design efficient algorithms for computing them. We implement and evaluate TOLERANCE in an emulation environment where we run 10 types of network intrusions. The results show that TOLERANCE can improve service availability and reduce operational cost compared with state-of-the-art intrusion-tolerant systems.	翻訳日:2024-04-23 22:45:14 公開日:2024-04-20
# 拡散$^2$:直交拡散モデルのスコア構成による動的3次元コンテンツ生成 Diffusion$^2$: Dynamic 3D Content Generation via Score Composition of Orthogonal Diffusion Models ( http://arxiv.org/abs/2404.02148v2 ) ライセンス: Link先を確認	Zeyu Yang, Zijie Pan, Chun Gu, Li Zhang,	(参考訳) 近年の3D生成の進歩は、インターネット規模の画像データで事前訓練され、大量の3Dデータで微調整された3D対応画像拡散モデルの改善により、高度に一貫したマルチビュー画像を生成する能力によって大きく促進されている。しかし、同期したマルチビュービデオデータが不足しているため、このパラダイムを4D生成に直接適用することは不可能である。それにもかかわらず、利用可能なビデオと3Dデータは、ビデオと多視点拡散モデルのトレーニングに適しており、それぞれが満足できる動的および幾何学的事前情報を提供することができる。本稿では,これらのモデルからの幾何的整合性および時間的滑らか性に関する知識を活用し,連続した4次元表現の最適化に使用できる高密度な多視点画像と多フレーム画像を直接サンプリングする動的3次元コンテンツ作成のための新しいフレームワークであるDiffusion$^2$を提案する。具体的には、生成する画像の確率構造に基づいて、ビデオと多視点拡散モデルのスコア合成による簡易かつ効果的な復調戦略を設計する。画像生成の並列性の高さと現代の4D再構成パイプラインの効率性により、我々のフレームワークは数分で4Dコンテンツを生成できる。さらに,本手法は4次元データへの依存を回避し,基礎映像や多視点拡散モデルのスケーラビリティから恩恵を受ける可能性がある。大規模な実験により,提案手法の有効性と各種のプロンプトに柔軟に適応する能力が実証された。 Recent advancements in 3D generation are predominantly propelled by improvements in 3D-aware image diffusion models which are pretrained on Internet-scale image data and fine-tuned on massive 3D data, offering the capability of producing highly consistent multi-view images. However, due to the scarcity of synchronized multi-view video data, it is impractical to adapt this paradigm to 4D generation directly. Despite that, the available video and 3D data are adequate for training video and multi-view diffusion models that can provide satisfactory dynamic and geometric priors respectively. In this paper, we present Diffusion$^2$, a novel framework for dynamic 3D content creation that leverages the knowledge about geometric consistency and temporal smoothness from these models to directly sample dense multi-view and multi-frame images which can be employed to optimize continuous 4D representation. Specifically, we design a simple yet effective denoising strategy via score composition of video and multi-view diffusion models based on the probability structure of the images to be generated. Owing to the high parallelism of the image generation and the efficiency of the modern 4D reconstruction pipeline, our framework can generate 4D content within few minutes. Furthermore, our method circumvents the reliance on 4D data, thereby having the potential to benefit from the scalability of the foundation video and multi-view diffusion models. Extensive experiments demonstrate the efficacy of our proposed framework and its capability to flexibly adapt to various types of prompts.	翻訳日:2024-04-23 22:45:14 公開日:2024-04-20
# 人間の視線が常に人間のAIチームの分類精度を向上しない機械を対話的に誘導することを可能にする Allowing humans to interactively guide machines where to look does not always improve human-AI team's classification accuracy ( http://arxiv.org/abs/2404.05238v3 ) ライセンス: Link先を確認	Giang Nguyen, Mohammad Reza Taesiri, Sunnie S. Y. Kim, Anh Nguyen,	(参考訳) Explainable AI (XAI) における何千もの論文、注目マップ \cite{vaswani2017attention} と特徴重要マップ \cite{bansal2020sam} が、AIの判断に各入力機能がどの程度重要かを知る共通の手段として確立されている。ユーザがテスト時に重要な機能を編集できるようにすることで、ダウンストリームタスクにおける人間とAIチームの精度が向上するかどうか、興味深い、未調査の質問である。本稿では、入力画像とトレーニングセット画像のパッチワイド対応を最初に予測し、それらをベースとして分類決定を行う、最先端のアンテホックな分類器 \cite{taesi2022visual} であるCHM-Corrを活用することで、この問題に対処する。我々はCHM-CorrのインタラクティブインターフェースであるCHM-Corr++を構築し、CHM-Corrが提供する機能の重要度マップを編集し、更新されたモデル決定を観察できるようにする。 CHM-Corr++を使用すると、ユーザーはモデルが出力を変更するかどうか、いつ、どのように変更するかについての洞察を得ることができ、静的な説明以上の理解を改善することができる。しかし,1400件の意思決定を行った18名の専門家による研究では,静的な説明よりもCUB-200の鳥画像分類において,対話的アプローチがユーザ精度を向上させるという統計的意義は見出されていない。これは、対話性は人間-AIチームの精度を高め、将来の研究の必要性を高めるという仮説に挑戦する。画像分類器の注意を編集するためのインタラクティブツールであるCHM-Corr++をオープンソースとして公開した(インタラクティブなデモはこちらを参照)。 https://github.com/anguyen8/chm-corr-interactive。 Via thousands of papers in Explainable AI (XAI), attention maps \cite{vaswani2017attention} and feature importance maps \cite{bansal2020sam} have been established as a common means for finding how important each input feature is to an AI's decisions. It is an interesting, unexplored question whether allowing users to edit the feature importance at test time would improve a human-AI team's accuracy on downstream tasks. In this paper, we address this question by leveraging CHM-Corr, a state-of-the-art, ante-hoc explainable classifier \cite{taesiri2022visual} that first predicts patch-wise correspondences between the input and training-set images, and then bases on them to make classification decisions. We build CHM-Corr++, an interactive interface for CHM-Corr, enabling users to edit the feature importance map provided by CHM-Corr and observe updated model decisions. Via CHM-Corr++, users can gain insights into if, when, and how the model changes its outputs, improving their understanding beyond static explanations. However, our study with 18 expert users who performed 1,400 decisions finds no statistical significance that our interactive approach improves user accuracy on CUB-200 bird image classification over static explanations. This challenges the hypothesis that interactivity can boost human-AI team accuracy and raises needs for future research. We open-source CHM-Corr++, an interactive tool for editing image classifier attention (see an interactive demo here: http://137.184.82.109:7080/). We release code and data on github: https://github.com/anguyen8/chm-corr-interactive.	翻訳日:2024-04-23 22:45:14 公開日:2024-04-20
# 大規模言語モデルを用いた企業知識ベースに対する質問応答の強化 Enhancing Question Answering for Enterprise Knowledge Bases using Large Language Models ( http://arxiv.org/abs/2404.08695v2 ) ライセンス: Link先を確認	Feihu Jiang, Chuan Qin, Kaichun Yao, Chuyu Fang, Fuzhen Zhuang, Hengshu Zhu, Hui Xiong,	(参考訳) 効率的な知識管理は、企業や組織の運用効率と革新的な能力の両方を増強する上で重要な役割を担っている。ベクトル化による知識の索引付けにより,知識検索手法が出現し,知識管理システムの有効性が著しく向上した。近年、生成自然言語処理技術の急速な進歩は、ユーザクエリに適合した関連文書を検索した後、正確で一貫性のある回答を生成するための道を開いた。しかし、企業知識ベースでは、知識検索と生成のためのスクラッチから広範なトレーニングデータを組み立てることは、プライベートデータのプライバシとセキュリティポリシーが大きなコストを伴っているため、非常に難しい課題である。本稿では,大規模言語モデル(LLM)に基づく新しい検索・生成フレームワークであるEKRGを提案する。具体的には,まず LLM を用いて,知識検索者の学習に十分な文書検索ペアを生成する命令チューニング手法を提案する。この方法は、慎重に設計された指示を通じて、事実指向の知識とソリューション指向の知識の両方を含む、企業の知識ベースに対する多様な質問を効率的に生成する。さらに,学習過程の効率化を図るため,関係性に敏感な教師学生学習戦略を構築した。提案手法では,新たな思考連鎖(CoT)に基づく微調整手法を提案する。最後に、実世界のデータセットに関する広範な実験を行い、提案フレームワークの有効性を実証した。 Efficient knowledge management plays a pivotal role in augmenting both the operational efficiency and the innovative capacity of businesses and organizations. By indexing knowledge through vectorization, a variety of knowledge retrieval methods have emerged, significantly enhancing the efficacy of knowledge management systems. Recently, the rapid advancements in generative natural language processing technologies paved the way for generating precise and coherent answers after retrieving relevant documents tailored to user queries. However, for enterprise knowledge bases, assembling extensive training data from scratch for knowledge retrieval and generation is a formidable challenge due to the privacy and security policies of private data, frequently entailing substantial costs. To address the challenge above, in this paper, we propose EKRG, a novel Retrieval-Generation framework based on large language models (LLMs), expertly designed to enable question-answering for Enterprise Knowledge bases with limited annotation costs. Specifically, for the retrieval process, we first introduce an instruction-tuning method using an LLM to generate sufficient document-question pairs for training a knowledge retriever. This method, through carefully designed instructions, efficiently generates diverse questions for enterprise knowledge bases, encompassing both fact-oriented and solution-oriented knowledge. Additionally, we develop a relevance-aware teacher-student learning strategy to further enhance the efficiency of the training process. For the generation process, we propose a novel chain of thought (CoT) based fine-tuning method to empower the LLM-based generator to adeptly respond to user questions using retrieved documents. Finally, extensive experiments on real-world datasets have demonstrated the effectiveness of our proposed framework.	翻訳日:2024-04-23 20:47:39 公開日:2024-04-20
# FusionMamba: Mambaを用いたマルチモーダル画像融合のための動的特徴強調 FusionMamba: Dynamic Feature Enhancement for Multimodal Image Fusion with Mamba ( http://arxiv.org/abs/2404.09498v2 ) ライセンス: Link先を確認	Xinyu Xie, Yawen Cui, Chio-In Ieong, Tao Tan, Xiaozhi Zhang, Xubin Zheng, Zitong Yu,	(参考訳) マルチモーダル画像融合は、異なるモードの情報を組み合わせて、包括的な情報と詳細なテクスチャを持つ単一の画像を作成することを目的としている。しかし、畳み込みニューラルネットワークに基づく融合モデルは、局所畳み込み操作に焦点をあてたため、グローバルな画像の特徴を捉える際の限界に直面する。トランスフォーマーベースのモデルは、グローバルな特徴モデリングに優れているが、その2次複雑さに起因する計算上の課題に直面している。近年、Selective Structured State Space Modelは、線形複雑度を持つ長距離依存モデリングにおいて重要な可能性を示し、上記のジレンマに対処するための有望な道を提供する。本稿では,マルチモーダル画像融合のための動的特徴強調手法FusionMambaを提案する。具体的には,画像融合のための効率的なマンバモデルを提案し,動的畳み込みとチャネルアテンションによる効率的な視覚状態空間モデルを統合する。この改良されたモデルは、Mambaの性能とグローバルモデリング能力だけでなく、局所的な拡張能力を高めながらチャネルの冗長性を低下させる。さらに,2つの動的特徴拡張モジュール (DFEM) と相互モード融合マンバモジュール (CMFM) からなる動的特徴融合モジュール (DFFM) を考案した。前者は動的テクスチャ強化と動的差分知覚に役立ち、後者はモード間の相関性を高め、冗長なモーダル情報を抑制する。 FusionMambaは、様々なマルチモーダル画像融合タスク(CT-MRI、PET-MRI、SPECT-MRI)、赤外線および可視画像融合タスク(IR-VIS)、多モーダルバイオメディカル画像融合データセット(GFP-PC)にまたがって、最先端のSOTA(State-of-the-art)性能を実現した。 FusionMambaのコードはhttps://github.com/millieXie/FusionMamba.comで公開されている。 Multi-modal image fusion aims to combine information from different modes to create a single image with comprehensive information and detailed textures. However, fusion models based on convolutional neural networks encounter limitations in capturing global image features due to their focus on local convolution operations. Transformer-based models, while excelling in global feature modeling, confront computational challenges stemming from their quadratic complexity. Recently, the Selective Structured State Space Model has exhibited significant potential for long-range dependency modeling with linear complexity, offering a promising avenue to address the aforementioned dilemma. In this paper, we propose FusionMamba, a novel dynamic feature enhancement method for multimodal image fusion with Mamba. Specifically, we devise an improved efficient Mamba model for image fusion, integrating efficient visual state space model with dynamic convolution and channel attention. This refined model not only upholds the performance of Mamba and global modeling capability but also diminishes channel redundancy while enhancing local enhancement capability. Additionally, we devise a dynamic feature fusion module (DFFM) comprising two dynamic feature enhancement modules (DFEM) and a cross modality fusion mamba module (CMFM). The former serves for dynamic texture enhancement and dynamic difference perception, whereas the latter enhances correlation features between modes and suppresses redundant intermodal information. FusionMamba has yielded state-of-the-art (SOTA) performance across various multimodal medical image fusion tasks (CT-MRI, PET-MRI, SPECT-MRI), infrared and visible image fusion task (IR-VIS) and multimodal biomedical image fusion dataset (GFP-PC), which is proved that our model has generalization ability. The code for FusionMamba is available at https://github.com/millieXie/FusionMamba.	翻訳日:2024-04-23 20:37:54 公開日:2024-04-20
# フォッカー・プランク方程式の補間超対称対 Interpolating supersymmetric pair of Fokker-Planck equations ( http://arxiv.org/abs/2404.09551v2 ) ライセンス: Link先を確認	Choon-Lin Ho,	(参考訳) 我々は、一対の超対称関連Fokker-Planck方程式を定数係数で補間するFokker-Planck方程式を考える。形状不変性の興味深い性質に基づき、超対称対のフォッカー・プランク系の解の様々な1パラメータ補間を直接構築することができる。 We consider Fokker-Planck equations that interpolate a pair of supersymmetrically related Fokker-Planck equations with constant coefficients. Based on the interesting property of shape-invariance, various one-parameter interpolations of the solutions of the supersymmetric pair of Fokker-Planck systems can be directly constructed.	翻訳日:2024-04-23 20:37:54 公開日:2024-04-20
# CREST: ゼロショット学習の強化のための証拠深層学習によるクロスモーダル共鳴 CREST: Cross-modal Resonance through Evidential Deep Learning for Enhanced Zero-Shot Learning ( http://arxiv.org/abs/2404.09640v3 ) ライセンス: Link先を確認	Haojian Huang, Xiaozhen Qiao, Zhuo Chen, Haodong Chen, Bingyu Li, Zhe Sun, Mulin Chen, Xuelong Li,	(参考訳) ゼロショット学習(ZSL)は、既知のカテゴリから未知のカテゴリへのセマンティックな知識伝達を活用することで、新しいクラスの認識を可能にする。この知識は、典型的には属性記述にカプセル化され、クラス固有の視覚的特徴を識別し、視覚的セマンティックなアライメントを促進し、ZSLのパフォーマンスを向上させる。しかし、インスタンス間の分布不均衡や属性共起といった現実世界の課題は、画像の局所的なばらつきの識別を妨げることがしばしばあり、これは、きめ細かい領域固有の属性アノテーションの不足によって悪化する。さらに、カテゴリー内の視覚的プレゼンテーションの多様性は属性カテゴリーの関連を歪ませることもできる。そこで本研究では,双方向の双方向ZSLアプローチであるCRESTを提案する。属性と視覚的ローカライゼーションの表現を抽出することから始まり、Evidential Deep Learning (EDL) を用いて、根底にあるてんかんの不確実性を測定することによって、強陰性に対するモデルのレジリエンスを高める。 CRESTには、視覚的カテゴリと属性的カテゴリのアライメントの両方に焦点を当てたデュアルラーニングパスが組み込まれており、潜在空間と可観測空間の堅牢な相関性を保証する。さらに,不確実性のあるクロスモーダル融合手法を導入し,視覚属性推論を洗練させる。大規模な実験では、複数のデータセットにまたがるモデルの有効性とユニークな説明可能性を示す。私たちのコードとデータは、https://github.com/JethroJames/CRESTで利用可能です。 Zero-shot learning (ZSL) enables the recognition of novel classes by leveraging semantic knowledge transfer from known to unknown categories. This knowledge, typically encapsulated in attribute descriptions, aids in identifying class-specific visual features, thus facilitating visual-semantic alignment and improving ZSL performance. However, real-world challenges such as distribution imbalances and attribute co-occurrence among instances often hinder the discernment of local variances in images, a problem exacerbated by the scarcity of fine-grained, region-specific attribute annotations. Moreover, the variability in visual presentation within categories can also skew attribute-category associations. In response, we propose a bidirectional cross-modal ZSL approach CREST. It begins by extracting representations for attribute and visual localization and employs Evidential Deep Learning (EDL) to measure underlying epistemic uncertainty, thereby enhancing the model's resilience against hard negatives. CREST incorporates dual learning pathways, focusing on both visual-category and attribute-category alignments, to ensure robust correlation between latent and observable spaces. Moreover, we introduce an uncertainty-informed cross-modal fusion technique to refine visual-attribute inference. Extensive experiments demonstrate our model's effectiveness and unique explainability across multiple datasets. Our code and data are available at: https://github.com/JethroJames/CREST	翻訳日:2024-04-23 20:37:54 公開日:2024-04-20
# セキュリティとプライバシ製品インクルージョン Security and Privacy Product Inclusion ( http://arxiv.org/abs/2404.13220v1 ) ライセンス: Link先を確認	Dave Kleidermacher, Emmanuel Arriaga, Eric Wang, Sebastian Porst, Roger Piqueras Jover,	(参考訳) 本稿では,多様な背景からユーザに対するセキュリティとプライバシを確保することの課題について考察する。本稿では,セキュリティとプライバシに製品が組み込まれるリスクや対策を識別するための脅威モデリング手法を提案する。我々は、低所得層、接続性の低さ、デバイス使用の共有、MLフェアネスなど、ユーザが高いレベルのセキュリティとプライバシを達成する能力に影響を与えるさまざまな要因について論じる。我々は,グローバルなセキュリティおよびプライバシユーザエクスペリエンス調査の結果を提示し,製品開発者への影響について論じる。私たちの研究は、セキュリティとプライバシに対するより包括的なアプローチの必要性を強調し、研究者や実践者がさまざまなユーザのために製品やサービスを設計するとき、考慮すべきフレームワークを提供します。 In this paper, we explore the challenges of ensuring security and privacy for users from diverse demographic backgrounds. We propose a threat modeling approach to identify potential risks and countermeasures for product inclusion in security and privacy. We discuss various factors that can affect a user's ability to achieve a high level of security and privacy, including low-income demographics, poor connectivity, shared device usage, ML fairness, etc. We present results from a global security and privacy user experience survey and discuss the implications for product developers. Our work highlights the need for a more inclusive approach to security and privacy and provides a framework for researchers and practitioners to consider when designing products and services for a diverse range of users.	翻訳日:2024-04-23 19:58:55 公開日:2024-04-20
# Vim4Path: 病理画像のための自己監督型視覚マンバ Vim4Path: Self-Supervised Vision Mamba for Histopathology Images ( http://arxiv.org/abs/2404.13222v1 ) ライセンス: Link先を確認	Ali Nasiri-Sarvi, Vincent Quoc-Huy Trinh, Hassan Rivaz, Mahdi S. Hosseini,	(参考訳) Gigapixel Whole Slide Images (WSI) からの表現学習は、組織構造の複雑な性質とラベル付きデータの不足により、計算病理学において重要な課題となっている。マルチインスタンス学習手法はこの課題に対処し、イメージパッチを活用し、自己監視学習(SSL)アプローチを用いた事前学習モデルを用いたスライドの分類を行っている。 SSLとMILの両方のパフォーマンスは、機能エンコーダのアーキテクチャに依存している。本稿では、状態空間モデルにインスパイアされたVision Mamba(Vim)アーキテクチャを、DINOフレームワークの計算病理学における表現学習に活用することを提案する。我々は、パッチレベルとスライドレベルの両方の分類において、Camelyon16データセット上でのVim対ビジョントランスフォーマー(ViT)の性能を評価する。以上の結果から,Vim は ViT と比較して性能が向上し,特に比較的小規模なモデルでは ROC AUC が8.21 増加していることが明らかとなった。説明可能性分析は、Vimの機能をさらに強調し、Vimが病理学者のワークフローに似ていないViTを独自にエミュレートしていることを明らかにした。この人間の専門的分析との整合性は、現実的な診断におけるヴィムの可能性を強調し、計算病理学における効果的な表現学習アルゴリズムの開発に大きく貢献する。コードと事前訓練されたウェイトは、 \url{https://github.com/AtlasAnalyticsLab/Vim4Path}でリリースします。 Representation learning from Gigapixel Whole Slide Images (WSI) poses a significant challenge in computational pathology due to the complicated nature of tissue structures and the scarcity of labeled data. Multi-instance learning methods have addressed this challenge, leveraging image patches to classify slides utilizing pretrained models using Self-Supervised Learning (SSL) approaches. The performance of both SSL and MIL methods relies on the architecture of the feature encoder. This paper proposes leveraging the Vision Mamba (Vim) architecture, inspired by state space models, within the DINO framework for representation learning in computational pathology. We evaluate the performance of Vim against Vision Transformers (ViT) on the Camelyon16 dataset for both patch-level and slide-level classification. Our findings highlight Vim's enhanced performance compared to ViT, particularly at smaller scales, where Vim achieves an 8.21 increase in ROC AUC for models of similar size. An explainability analysis further highlights Vim's capabilities, which reveals that Vim uniquely emulates the pathologist workflow-unlike ViT. This alignment with human expert analysis highlights Vim's potential in practical diagnostic settings and contributes significantly to developing effective representation-learning algorithms in computational pathology. We release the codes and pretrained weights at \url{https://github.com/AtlasAnalyticsLab/Vim4Path}.	翻訳日:2024-04-23 19:58:55 公開日:2024-04-20
# タブラルデータに対する特徴空間属性を組み込んだモデルに基づく対実的説明 Model-Based Counterfactual Explanations Incorporating Feature Space Attributes for Tabular Data ( http://arxiv.org/abs/2404.13224v1 ) ライセンス: Link先を確認	Yuta Sumiya, Hayaru shouno,	(参考訳) 大規模なデータセットからパターンを正確に予測することが知られている機械学習モデルは、意思決定において極めて重要である。その結果、入力摂動を導入して予測を説明する反事実的説明手法が顕著になった。これらの混乱は、しばしば予測を変更する方法を示唆し、実行可能なレコメンデーションをもたらす。しかし、現在の手法では、各入力変更の最適化問題を解く必要があり、計算コストがかかる。さらに、従来の符号化手法は表データのカテゴリ変数の摂動に不適切に対処する。そこで本研究では,正規化フローを用いた効率的な対実的説明法であるFastDCFlowを提案する。提案手法は, 複雑なデータ分布を捕捉し, 近接性を保持する有意義な潜在空間を学習し, 予測を改善する。分類変数に対しては、順序関係を尊重し、摂動コストを含むTargetEncodingを採用しました。提案手法は, 既存の手法を複数の指標で比較し, 対実的説明のためのトレードオフのバランスを崩した。ソースコードは以下のリポジトリで入手できる。 Machine-learning models, which are known to accurately predict patterns from large datasets, are crucial in decision making. Consequently, counterfactual explanations-methods explaining predictions by introducing input perturbations-have become prominent. These perturbations often suggest ways to alter the predictions, leading to actionable recommendations. However, the current techniques require resolving the optimization problems for each input change, rendering them computationally expensive. In addition, traditional encoding methods inadequately address the perturbations of categorical variables in tabular data. Thus, this study propose FastDCFlow, an efficient counterfactual explanation method using normalizing flows. The proposed method captures complex data distributions, learns meaningful latent spaces that retain proximity, and improves predictions. For categorical variables, we employed TargetEncoding, which respects ordinal relationships and includes perturbation costs. The proposed method outperformed existing methods in multiple metrics, striking a balance between trade offs for counterfactual explanations. The source code is available in the following repository: https://github.com/sumugit/FastDCFlow.	翻訳日:2024-04-23 19:58:55 公開日:2024-04-20
# 量子計算における誤差補正アルゴリズムのためのナノメカニカルアンシラ量子ビット生成器 Nanomechanical ancilla qubits generator for error correction algorithms in quantum computation ( http://arxiv.org/abs/2404.13234v1 ) ライセンス: Link先を確認	Danko Radić, Leonid Y. Gorelik, Sergei I. Kulinich, Robert I. Shekhter,	(参考訳) 本稿では,3ビットフリップ符号のエンコーダとして実証された,量子コンピューティングにおける誤り訂正アルゴリズムに対して,適切に絡み合った「アンシラ」量子ビットを生成するナノエレクトロメカニカルセットアップを提案する。このセットアップは、電圧バイアス超伝導電極とクーパー対箱の状態で機械的に振動するメソスコピック超伝導粒との間の交流ジョセフソン効果を利用して、ゲート電圧によって制御されるメソスコピック端子に基づいている。要求された機能は、特に2つの外部パラメータ(バイアス電圧とゲート電圧)を操作するための時間プロトコールによって達成される。超電導穀物は、カンチレバーのフリーエンドに固定され、制御された機内機械振動を行い、2つの垂直空間方向に一対の絡み合った猫状態に組織されたナノメカニカルコヒーレント状態を生成する。クーパー対箱とナノメカニカルコヒーレント状態は、特定の方法で3つの絡み合った量子ビットとなる: 最初はクーパー対箱状態の重ね合わせでエンコードされた量子情報は、2つの特別な3つの四角い状態、$\vert \uparrow + \, + \rangle$ と $\vert \downarrow - \, - \rangle$ の量子重ね合わせに変換される。これは3ビットビットフリップ符号の基本入力状態を構成し、主に量子計算でエラー訂正に使用され、ナノエレクトロメカニクスによって最後の2つのアンシラ量子ビットが生成される1つの物理オブジェクトに「インストール」される。 We suggest a nanoelectromechanical setup that generates properly entangled ancillary ("ancilla") qubits for error correction algorithms in quantum computing, demonstrated as an encoder for the three-qubit bit flip code. The setup is based on mesoscopic terminal utilizing the AC Josephson effect between voltage biased superconducting electrodes and mechanically vibrating mesoscopic superconducting grain in the regime of the Cooper pair box, controlled by the gate voltage. Required functionality is achieved by specifically tailored time-protocol of operating two external parameters: bias voltage and gate voltage. The superconducting grain is fixed on the free end of a cantilever, performing controlled in-plane mechanical vibrations, generating the nanomechanical coherent states organised in a pair of entangled cat-states in two perpendicular spatial directions. Cooper pair box and nanomechanical coherent states become three entangled qubits in a particular way: quantum information, initially encoded in superposition of the Cooper pair box states, is transduced into quantum superposition of two special 3-qubit entangled states, $\vert \uparrow + \, + \rangle$ and $\vert \downarrow - \, - \rangle$. It constitutes the basic input state for the three-qubit bit flip code, used in quantum computation mainly for error correction, "installed" on a single physical object in which the last two ancilla qubits are generated by the nanoelectromechanical setup.	翻訳日:2024-04-23 19:58:55 公開日:2024-04-20
# TrialDura: 解釈可能な治験期間予測のための階層的注意変換器 TrialDura: Hierarchical Attention Transformer for Interpretable Clinical Trial Duration Prediction ( http://arxiv.org/abs/2404.13235v1 ) ライセンス: Link先を確認	Ling Yue, Jonathan Li, Md Zabirul Islam, Bolun Xia, Tianfan Fu, Jintai Chen,	(参考訳) 臨床試験プロセスは、薬物開発としても知られ、新しい治療法の開発に欠かせないステップである。介入臨床試験の主な目的は、人体における特定の疾患の治療における薬物ベースの治療の安全性と効果を評価することである。しかし、臨床試験は長く、労働集約的で、費用がかかる。臨床試験の期間は、全体的な費用に影響を与える重要な要因である。したがって、臨床試験のスケジュールを効果的に管理することは、予算の制御と研究の経済的可能性の最大化に不可欠である。この問題に対処するために、病気名、薬物分子、試験段階、資格基準を含む多モードデータを用いて臨床試験期間を推定する機械学習ベースのTrialDuraを提案する。次に,臨床実験データのより深く,より関連性の高い意味的理解を提供するために,バイオメディカルコンテキスト用に特別に調整されたBio-BERT埋め込みにエンコードする。最後に、モデルの階層的な注意機構は、すべての埋め込みを繋ぎ、それらの相互作用を捉え、臨床試験期間を予測する。提案モデルでは, 平均絶対誤差(MAE)が1.04年, 根平均二乗誤差(RMSE)が1.39年であった。公開されているコードはhttps://anonymous.4open.science/r/TrialDura-F196で見ることができる。 The clinical trial process, also known as drug development, is an indispensable step toward the development of new treatments. The major objective of interventional clinical trials is to assess the safety and effectiveness of drug-based treatment in treating certain diseases in the human body. However, clinical trials are lengthy, labor-intensive, and costly. The duration of a clinical trial is a crucial factor that influences overall expenses. Therefore, effective management of the timeline of a clinical trial is essential for controlling the budget and maximizing the economic viability of the research. To address this issue, We propose TrialDura, a machine learning-based method that estimates the duration of clinical trials using multimodal data, including disease names, drug molecules, trial phases, and eligibility criteria. Then, we encode them into Bio-BERT embeddings specifically tuned for biomedical contexts to provide a deeper and more relevant semantic understanding of clinical trial data. Finally, the model's hierarchical attention mechanism connects all of the embeddings to capture their interactions and predict clinical trial duration. Our proposed model demonstrated superior performance with a mean absolute error (MAE) of 1.04 years and a root mean square error (RMSE) of 1.39 years compared to the other models, indicating more accurate clinical trial duration prediction. Publicly available code can be found at https://anonymous.4open.science/r/TrialDura-F196	翻訳日:2024-04-23 19:49:10 公開日:2024-04-20
# PAFedFV:指静脈認識のための個人化・非同期フェデレーション学習 PAFedFV: Personalized and Asynchronous Federated Learning for Finger Vein Recognition ( http://arxiv.org/abs/2404.13237v1 ) ライセンス: Link先を確認	Hengyu Mu, Jian Guo, Chong Han, Lijuan Sun,	(参考訳) ユーザのプライバシ保護に重点が置かれているため、フェデレートラーニングに基づく生体認証が最新の研究ホットスポットとなっている。しかし,従来のフェデレーション学習法は,データの不均一性やオープンセット検証のため,指静脈認識に直接適用できない。そのため、いくつかの応用例が提案されているのみである。これらの方法には相変わらず2つの欠点がある。 1) 指静脈データは非常に均一であり,非独立に非独立に分布する(非IID)ため,一部のクライアントでは一様モデルでは性能が低下する。 2)個々のクライアントでは、サーバからモデルを返すのを待つ時間など、大量の時間が使われません。このような問題に対処するため,本研究では,Personalized and Asynchronous Federated Learning for Finger Vein Recognition (PAFedFV) フレームワークを提案する。 PAFedFVは、非IIDデータの不均一性を解決するために、パーソナライズされたモデルアグリゲーション法を設計する。一方、クライアントが待機時間を利用するために、非同期のトレーニングモジュールを使用している。最後に、6つの指静脈データセットに関する広範な実験を行った。これらの実験結果に基づいて, フェデレート学習における非IID指静脈データの影響を解析し, PAFedFVの精度およびロバスト性における優位性を実証した。 With the increasing emphasis on user privacy protection, biometric recognition based on federated learning have become the latest research hotspot. However, traditional federated learning methods cannot be directly applied to finger vein recognition, due to heterogeneity of data and open-set verification. Therefore, only a few application cases have been proposed. And these methods still have two drawbacks. (1) Uniform model results in poor performance in some clients, as the finger vein data is highly heterogeneous and non-Independently Identically Distributed (non-IID). (2) On individual client, a large amount of time is underutilized, such as the time to wait for returning model from server. To address those problems, this paper proposes a Personalized and Asynchronous Federated Learning for Finger Vein Recognition (PAFedFV) framework. PAFedFV designs personalized model aggregation method to solve the heterogeneity among non-IID data. Meanwhile, it employs an asynchronized training module for clients to utilize their waiting time. Finally, extensive experiments on six finger vein datasets are conducted. Base on these experiment results, the impact of non-IID finger vein data on performance of federated learning are analyzed, and the superiority of PAFedFV in accuracy and robustness are demonstrated.	翻訳日:2024-04-23 19:49:10 公開日:2024-04-20
# 大規模言語モデルのためのパーソナライズされた無線フェデレーション学習 Personalized Wireless Federated Learning for Large Language Models ( http://arxiv.org/abs/2404.13238v1 ) ライセンス: Link先を確認	Feibo Jiang, Li Dong, Siwei Tu, Yubo Peng, Kezhi Wang, Kun Yang, Cunhua Pan, Dusit Niyato,	(参考訳) 大規模言語モデル(LLM)は自然言語処理タスクに革命をもたらした。しかしながら、無線ネットワークへの展開は、プライバシとセキュリティ保護機構の欠如など、依然として課題に直面している。フェデレートラーニング(FL)は、これらの課題に対処するための有望なアプローチとして登場した。しかし、大きなデータと不均一なデータの非効率な処理、リソース集約的なトレーニング、高い通信オーバーヘッドといった問題に悩まされている。これらの課題に対処するために、まず、無線ネットワークにおけるLLMの異なる学習段階と特徴を比較した。次に、コミュニケーションオーバーヘッドの少ない2つのパーソナライズされたワイヤレスフェデレーション微調整手法、すなわち、強化学習を利用してパーソナライズを実現するローカルLLMをパーソナライズするパーソナライズドフェデレーション微調整法(PFIT)、グローバルアダプタとローカルローランド適応(LoRA)を活用してローカルLoRAをアグリゲーションなしでパーソナライズできるパーソナライズされたフェデレーションタスク微調整法(PFTT)を導入する。最後に,提案手法の有効性を実証するためにシミュレーションを行い,オープンな問題を包括的に議論する。 Large Language Models (LLMs) have revolutionized natural language processing tasks. However, their deployment in wireless networks still face challenges, i.e., a lack of privacy and security protection mechanisms. Federated Learning (FL) has emerged as a promising approach to address these challenges. Yet, it suffers from issues including inefficient handling with big and heterogeneous data, resource-intensive training, and high communication overhead. To tackle these issues, we first compare different learning stages and their features of LLMs in wireless networks. Next, we introduce two personalized wireless federated fine-tuning methods with low communication overhead, i.e., (1) Personalized Federated Instruction Tuning (PFIT), which employs reinforcement learning to fine-tune local LLMs with diverse reward models to achieve personalization; (2) Personalized Federated Task Tuning (PFTT), which can leverage global adapters and local Low-Rank Adaptations (LoRA) to collaboratively fine-tune local LLMs, where the local LoRAs can be applied to achieve personalization without aggregation. Finally, we perform simulations to demonstrate the effectiveness of the proposed two methods and comprehensively discuss open issues.	翻訳日:2024-04-23 19:49:10 公開日:2024-04-20
# 医用画像セグメンテーションのためのPixel-Wiseスーパービジョンを超えて:従来のモデルから基礎モデルへ Beyond Pixel-Wise Supervision for Medical Image Segmentation: From Traditional Models to Foundation Models ( http://arxiv.org/abs/2404.13239v1 ) ライセンス: Link先を確認	Yuyan Shi, Jialu Ma, Jin Yang, Shasha Wang, Yichi Zhang,	(参考訳) 医用画像のセグメンテーションは多くの画像誘導臨床アプローチにおいて重要な役割を担っている。しかし、既存のセグメンテーションアルゴリズムは、特に専門家だけが信頼性と正確なアノテーションを提供することができる医療画像領域において、主に、労働集約的かつ専門的要求の両方が可能な、訓練用のピクセル単位のアノテーション付き完全注釈画像の可用性に依存している。この課題を軽減するため、画像レベル、バウンディングボックス、スクリブル、ポイントなどの弱いアノテーションでディープモデルをトレーニングできるセグメンテーション手法の開発に注力している。視覚基盤モデルの出現、特にSAM(Segment Anything Model)は、大規模な事前学習によって可能となる、迅速なセグメンテーションのための弱いアノテーションを使ったセグメンテーションタスクの革新的な機能を導入している。従来の学習手法とともに基礎モデルを採用することで、近年の関心調査コミュニティが増加し、現実世界の応用の可能性を示している。本稿では,弱アノテーションを用いた医用画像セグメンテーションにおけるアノテーション効率学習における基礎モデル導入前後の最近の進歩を包括的に調査する。さらに,既存のアプローチの課題を分析,議論し,基礎モデルの軌跡を形作り,医用画像セグメンテーションの分野をさらに進めるための貴重なガイダンスを提供すると信じている。 Medical image segmentation plays an important role in many image-guided clinical approaches. However, existing segmentation algorithms mostly rely on the availability of fully annotated images with pixel-wise annotations for training, which can be both labor-intensive and expertise-demanding, especially in the medical imaging domain where only experts can provide reliable and accurate annotations. To alleviate this challenge, there has been a growing focus on developing segmentation methods that can train deep models with weak annotations, such as image-level, bounding boxes, scribbles, and points. The emergence of vision foundation models, notably the Segment Anything Model (SAM), has introduced innovative capabilities for segmentation tasks using weak annotations for promptable segmentation enabled by large-scale pre-training. Adopting foundation models together with traditional learning methods has increasingly gained recent interest research community and shown potential for real-world applications. In this paper, we present a comprehensive survey of recent progress on annotation-efficient learning for medical image segmentation utilizing weak annotations before and in the era of foundation models. Furthermore, we analyze and discuss several challenges of existing approaches, which we believe will provide valuable guidance for shaping the trajectory of foundational models to further advance the field of medical image segmentation.	翻訳日:2024-04-23 19:49:10 公開日:2024-04-20
# 2つの市場におけるラミフィケーションを用いた逆因果戦略環境の学習 Learning In Reverse Causal Strategic Environments With Ramifications on Two Sided Markets ( http://arxiv.org/abs/2404.13240v1 ) ライセンス: Link先を確認	Seamus Somerstep, Yuekai Sun, Ya'acov Ritov,	(参考訳) 労働市場の均衡モデルによって動機づけられた我々は、戦略エージェントが直接成果を操作できる因果戦略分類の定式化を開発する。応用として、労働力の戦略的対応を期待する雇用主と、そうでない雇用主を比較する。我々は,雇用者報酬,労働力のスキルレベル,場合によっては労働力のエクイティを改善するために,適度に最適な雇用政策を持つ雇用者が,雇用者報酬を改善するという理論と実験の組み合わせを提示する。一方,作業従事者は労働力の効用を害し,他の事例では差別を防げないことを示す。 Motivated by equilibrium models of labor markets, we develop a formulation of causal strategic classification in which strategic agents can directly manipulate their outcomes. As an application, we compare employers that anticipate the strategic response of a labor force with employers that do not. We show through a combination of theory and experiment that employers with performatively optimal hiring policies improve employer reward, labor force skill level, and in some cases labor force equity. On the other hand, we demonstrate that performative employers harm labor force utility and fail to prevent discrimination in other cases.	翻訳日:2024-04-23 19:49:10 公開日:2024-04-20
# オークション型フェデレーション学習のための知的エージェント:調査 Intelligent Agents for Auction-based Federated Learning: A Survey ( http://arxiv.org/abs/2404.13244v1 ) ライセンス: Link先を確認	Xiaoli Tang, Han Yu, Xiaoxiao Li, Sarit Kraus,	(参考訳) オークションベースのフェデレーション・ラーニング(AFL)はFLインセンティブ・メカニズム設計の重要な分野であり、高品質なデータ・オーナーがデータ・コンシューマー(すなわちサーバ)のFLトレーニング・タスクに参加することを公平かつ効率的に動機付ける能力がある。利害関係者(すなわちデータ消費者、データ所有者、オークション業者)に対するAFL意思決定支援の効率を高めるために、インテリジェントエージェントベースの技術が出現した。しかし、この分野の非常に学際的な性質と、アクセス可能な視点を提供する総合的な調査が欠如していることから、研究者がこの分野に参入して貢献することは困難である。本稿では,AI-AFL(Intelligent Agents for AFL)文献に関する第1回調査を通じて,この重要なギャップを埋める。既存のIA-AFLの動作を整理する独自の多層分類法を提案する。 1)利害関係者は, 2 競売の仕組みが採用され、及び 3)エージェントの目的は,読者にこの分野に対する多視点的な視点を提供することである。さらに,既存手法の限界を分析し,広く採用されている性能評価指標を要約し,IA-AFLエコシステムにおける効果的かつ効率的な利害関係者主導の意思決定支援に向けた将来的な方向性について議論する。 Auction-based federated learning (AFL) is an important emerging category of FL incentive mechanism design, due to its ability to fairly and efficiently motivate high-quality data owners to join data consumers' (i.e., servers') FL training tasks. To enhance the efficiency in AFL decision support for stakeholders (i.e., data consumers, data owners, and the auctioneer), intelligent agent-based techniques have emerged. However, due to the highly interdisciplinary nature of this field and the lack of a comprehensive survey providing an accessible perspective, it is a challenge for researchers to enter and contribute to this field. This paper bridges this important gap by providing a first-of-its-kind survey on the Intelligent Agents for AFL (IA-AFL) literature. We propose a unique multi-tiered taxonomy that organises existing IA-AFL works according to 1) the stakeholders served, 2) the auction mechanism adopted, and 3) the goals of the agents, to provide readers with a multi-perspective view into this field. In addition, we analyse the limitations of existing approaches, summarise the commonly adopted performance evaluation metrics, and discuss promising future directions leading towards effective and efficient stakeholder-oriented decision support in IA-AFL ecosystems.	翻訳日:2024-04-23 19:49:10 公開日:2024-04-20
# ISQA:科学要約のためのインフォームティブ・ファクチュアリティ・フィードバック ISQA: Informative Factuality Feedback for Scientific Summarization ( http://arxiv.org/abs/2404.13246v1 ) ライセンス: Link先を確認	Zekai Li, Yanxia Qin, Qian Liu, Min-Yen Kan,	(参考訳) Informative Scientific Question-Answering (ISQA) feedback\footnote{Code is available at \url{https://github.com/lizekai-richard/isqa}}。要約の反復的精錬を通じて、科学的な要約の事実性を高めるために、文の基本的な理性を探究する。 ISQAは、肯定的なフィードバックで検証されたステートメントを補強し、否定的なフィードバックで不正なステートメントを修正するよう、要約エージェントに頼んで、これをきめ細かな方法で行う。以上の結果から,ISQAフィードバック機構は,複数の科学的データセットで評価されるように,要約タスクにおける各種オープンソースLCMの事実性を大幅に向上することが示された。 We propose Iterative Facuality Refining on Informative Scientific Question-Answering (ISQA) feedback\footnote{Code is available at \url{https://github.com/lizekai-richard/isqa}}, a method following human learning theories that employs model-generated feedback consisting of both positive and negative information. Through iterative refining of summaries, it probes for the underlying rationale of statements to enhance the factuality of scientific summarization. ISQA does this in a fine-grained manner by asking a summarization agent to reinforce validated statements in positive feedback and fix incorrect ones in negative feedback. Our findings demonstrate that the ISQA feedback mechanism significantly improves the factuality of various open-source LLMs on the summarization task, as evaluated across multiple scientific datasets.	翻訳日:2024-04-23 19:49:10 公開日:2024-04-20
# ハイパースペクトル画像分類のための3次元畳み込み誘導スペクトル空間変換器 3D-Convolution Guided Spectral-Spatial Transformer for Hyperspectral Image Classification ( http://arxiv.org/abs/2404.13252v1 ) ライセンス: Link先を確認	Shyam Varahagiri, Aryaman Sinha, Shiv Ram Dubey, Satish Kumar Singh,	(参考訳) 近年、ビジョントランスフォーマー(ViT)は、自己認識機構のため、畳み込みニューラルネットワーク(CNN)よりも有望な分類性能を示している。多くの研究者がハイパースペクトル画像(HSI)分類にViTを組み込んでいる。 HSIは狭いスペクトル帯域によって特徴づけられ、豊富なスペクトルデータを提供する。 ViTはシーケンシャルなデータを扱うが、CNNのようなスペクトル空間情報を抽出することはできない。さらに、高い分類性能を持つためには、HSIトークンとクラス(CLS)トークンの間に強い相互作用がある必要がある。これらの問題を解決するために、3D-Convolution Guided Residual Module (CGRM) を用いたHSI分類のための3D-Convolution Guided Spectral-Spatial Transformer (3D-ConvSST)を提案する。さらに、クラストークンを前もってGlobal Average Poolingを適用し、より差別的で関連する高レベルな特徴を効果的にコード化します。 3つの公開HSIデータセットを用いて、最先端の伝統、畳み込み、トランスフォーマーモデルよりも提案モデルの方が優れていることを示す大規模な実験が行われた。コードはhttps://github.com/ShyamVarahagiri/3D-ConvSSTで公開されている。 In recent years, Vision Transformers (ViTs) have shown promising classification performance over Convolutional Neural Networks (CNNs) due to their self-attention mechanism. Many researchers have incorporated ViTs for Hyperspectral Image (HSI) classification. HSIs are characterised by narrow contiguous spectral bands, providing rich spectral data. Although ViTs excel with sequential data, they cannot extract spectral-spatial information like CNNs. Furthermore, to have high classification performance, there should be a strong interaction between the HSI token and the class (CLS) token. To solve these issues, we propose a 3D-Convolution guided Spectral-Spatial Transformer (3D-ConvSST) for HSI classification that utilizes a 3D-Convolution Guided Residual Module (CGRM) in-between encoders to "fuse" the local spatial and spectral information and to enhance the feature propagation. Furthermore, we forego the class token and instead apply Global Average Pooling, which effectively encodes more discriminative and pertinent high-level features for classification. Extensive experiments have been conducted on three public HSI datasets to show the superiority of the proposed model over state-of-the-art traditional, convolutional, and Transformer models. The code is available at https://github.com/ShyamVarahagiri/3D-ConvSST.	翻訳日:2024-04-23 19:49:10 公開日:2024-04-20
# ST-SSM:交通予測のための空間空間モデルの選択状態 ST-SSMs: Spatial-Temporal Selective State of Space Model for Traffic Forecasting ( http://arxiv.org/abs/2404.13257v1 ) ライセンス: Link先を確認	Zhiqi Shao, Michael G. H. Bell, Ze Wang, D. Glenn Geers, Haoning Xi, Junbin Gao,	(参考訳) 正確な効率的な交通予測は、インテリジェント交通システムの計画、管理、制御に不可欠である。交通予測の最先端手法の多くは、時空間ニューラルネットワークを予測モデルとして用い、トランスフォーマーとともに予測対象(道路セグメントの交通状況など)のグローバルな情報を学ぶことによって、長期と短期の両方を効果的に予測する。しかし、これらの手法は優れた性能を得るのに高い計算コストがかかることが多い。本稿では,新しいST-Mambaブロックを特徴とする交通流予測モデルST-SSM(Spatial-Temporal Selective State Space Model)を提案する。比較分析ではST-マンバ層の効率が強調され、3つの注意層に等価であるが、処理時間が大幅に短縮された。多様な実世界のデータセットの厳密なテストを通じて、ST-SSMsモデルは予測精度と計算の単純さを例外的に改善し、トラフィックフロー予測領域に新しいベンチマークを設定する。 Accurate and efficient traffic prediction is crucial for planning, management, and control of intelligent transportation systems. Most state-of-the-art methods for traffic prediction effectively predict both long-term and short-term by employing spatio-temporal neural networks as prediction models, together with transformers to learn global information on prediction objects (e.g., traffic states of road segments). However, these methods often have a high computational cost to obtain good performance. This paper introduces an innovative approach to traffic flow prediction, the Spatial-Temporal Selective State Space Model (ST-SSMs), featuring the novel ST-Mamba block, which can achieve good prediction accuracy with less computational cost. A comparative analysis highlights the ST-Mamba layer's efficiency, revealing its equivalence to three attention layers, yet with markedly reduced processing time. Through rigorous testing on diverse real-world datasets, the ST-SSMs model demonstrates exceptional improvements in prediction accuracy and computational simplicity, setting new benchmarks in the domain of traffic flow forecasting	翻訳日:2024-04-23 19:49:10 公開日:2024-04-20
# インカムと健康因子の機械学習分析による糖尿病予測 Predicting Diabetes with Machine Learning Analysis of Income and Health Factors ( http://arxiv.org/abs/2404.13260v1 ) ライセンス: Link先を確認	Fariba Jafari Horestani, M. Mehdi Owrang O,	(参考訳) 本研究では,糖尿病と健康指標の複雑な関係について検討し,新たな収入の変動に着目した。 2015年行動危険因子監視システム(BRFSS)のデータを利用して、血圧、コレステロール、BMI、喫煙習慣など様々な要因が糖尿病の流行に与える影響を分析する。包括的分析は,それぞれの要因を分離するだけでなく,その相互依存性や糖尿病に対する集団的影響も調べる。我々の研究の新たな側面は、糖尿病リスクの決定要因としての収入の検証である。我々は、社会経済的地位と糖尿病の間の複雑な相互作用を解明するために、統計的および機械学習技術を使用し、経済的幸福が健康にどのように影響するかの新しい洞察を提供する。我々の研究は、低所得のブラケットが糖尿病の発生率に結びついている、明らかな傾向を明らかにした。健康因子とライフスタイルの選択を含む33変数の混合分析において,高血圧,高コレステロール,コレステロールチェック,所得,BMIなどの特徴が重要であることを確認した。これらの要素は、糖尿病の流行と管理において重要な役割を担っていることが示唆されている。 In this study, we delve into the intricate relationships between diabetes and a range of health indicators, with a particular focus on the newly added variable of income. Utilizing data from the 2015 Behavioral Risk Factor Surveillance System (BRFSS), we analyze the impact of various factors such as blood pressure, cholesterol, BMI, smoking habits, and more on the prevalence of diabetes. Our comprehensive analysis not only investigates each factor in isolation but also explores their interdependencies and collective influence on diabetes. A novel aspect of our research is the examination of income as a determinant of diabetes risk, which to the best of our knowledge has been relatively underexplored in previous studies. We employ statistical and machine learning techniques to unravel the complex interplay between socio-economic status and diabetes, providing new insights into how financial well-being influences health outcomes. Our research reveals a discernible trend where lower income brackets are associated with a higher incidence of diabetes. In analyzing a blend of 33 variables, including health factors and lifestyle choices, we identified that features such as high blood pressure, high cholesterol, cholesterol checks, income, and Body Mass Index (BMI) are of considerable significance. These elements stand out among the myriad of factors examined, suggesting that they play a pivotal role in the prevalence and management of diabetes.	翻訳日:2024-04-23 19:49:10 公開日:2024-04-20
# FilterPrompt: 拡散モデルにおける画像転送の誘導 FilterPrompt: Guiding Image Transfer in Diffusion Models ( http://arxiv.org/abs/2404.13263v1 ) ライセンス: Link先を確認	Xi Wang, Yichen Peng, Heng Fang, Haoran Xie, Xi Yang, Chuntao Li,	(参考訳) 制御可能な生成タスクでは、生成した画像を柔軟に操作し、単一の入力画像キューに基づいて所望の外観や構造を達成できる。これを実現するには、入力画像データ内のキー属性を効果的に分離し、表現を正確に取得する必要がある。以前の研究では、主に特徴空間内の画像属性の分離に焦点が当てられていた。しかし、実世界のデータに存在する複雑な分布は、そのようなデカップリングアルゴリズムを他のデータセットに適用することを難しくすることが多い。さらに、機能符号化に対する制御の粒度は、特定のタスク要求を満たすのにしばしば失敗する。様々な生成モデルの特性を精査すると,拡散モデルの入力感度と動的進化特性は,画素空間における明示的な分解操作と効果的に融合できることがわかった。これにより、入力画像の特定の特徴分布に対して画素空間で実行される画像処理操作が可能となり、生成した結果において所望の制御効果が得られる。そこで本研究では,モデル制御効果を高めるためのFilterPromptを提案する。任意の拡散モデルに普遍的に適用可能であり、ユーザーはタスク要求に応じて特定の画像特徴の表現を調整でき、より正確で制御可能な生成結果を容易にすることができる。特に,我々の設計した実験では,FilterPromptが特徴相関を最適化し,生成プロセス中のコンテント競合を緩和し,モデルの制御能力を向上することを示した。 In controllable generation tasks, flexibly manipulating the generated images to attain a desired appearance or structure based on a single input image cue remains a critical and longstanding challenge. Achieving this requires the effective decoupling of key attributes within the input image data, aiming to get representations accurately. Previous research has predominantly concentrated on disentangling image attributes within feature space. However, the complex distribution present in real-world data often makes the application of such decoupling algorithms to other datasets challenging. Moreover, the granularity of control over feature encoding frequently fails to meet specific task requirements. Upon scrutinizing the characteristics of various generative models, we have observed that the input sensitivity and dynamic evolution properties of the diffusion model can be effectively fused with the explicit decomposition operation in pixel space. This integration enables the image processing operations performed in pixel space for a specific feature distribution of the input image, and can achieve the desired control effect in the generated results. Therefore, we propose FilterPrompt, an approach to enhance the model control effect. It can be universally applied to any diffusion model, allowing users to adjust the representation of specific image features in accordance with task requirements, thereby facilitating more precise and controllable generation outcomes. In particular, our designed experiments demonstrate that the FilterPrompt optimizes feature correlation, mitigates content conflicts during the generation process, and enhances the model's control capability.	翻訳日:2024-04-23 19:49:10 公開日:2024-04-20
# F5Cファインダー:mRNA上の5-ホルミルシチジン修飾を予測するための説明可能な生物学的言語モデル F5C-finder: An Explainable and Ensemble Biological Language Model for Predicting 5-Formylcytidine Modifications on mRNA ( http://arxiv.org/abs/2404.13265v1 ) ライセンス: Link先を確認	Guohao Wang, Ting Liu, Hongqiang Lyu, Ze Liu,	(参考訳) 5-ホルミルシチジン(5-formylcytidine, 5-formylcytidine, 5-formylcytidine, 5-formylcytidine, 5-formylcytidine, 5-formylcytidine, 5-formylcytidine)は、様々な生物学的過程において重要である。しかし、従来のf5C検出のための実験的手法は、しばしば手間がかかり、時間を要するため、f5Cのサイトを包括的に転写酵素にマッピングする能力は制限される。計算手法はコスト効率と高スループットの代替手段を提供するが、f5Cの認識モデルは開発されていない。自然言語処理における言語モデルからインスピレーションを得て,f5Cの同定にマルチヘッドアテンションを用いたアンサンブルニューラルネットワークモデルであるf5Cファインダーを提案する。 5つの異なる特徴抽出法を用いて、5つの個別のニューラルネットワークを構築し、これらのネットワークはその後、アンサンブル学習を通じて統合され、f5Cファインダーを生成する。 10倍のクロスバリデーションと独立試験により, AUCが0.807, 0.827で, f5CファインダーがSOTA(State-of-the-art)性能を達成した。この結果は、ゲノム内の順序(順序)と機能的意味(意味)の両方をキャプチャする生物学的言語モデルの有効性を強調している。さらに、組み込まれた解釈可能性により、モデルが何を学習しているかを理解することができ、キーシーケンシャルな要素の識別と、それらの生物学的機能のより深い探索の間に橋渡しができる。 As a prevalent and dynamically regulated epigenetic modification, 5-formylcytidine (f5C) is crucial in various biological processes. However, traditional experimental methods for f5C detection are often laborious and time-consuming, limiting their ability to map f5C sites across the transcriptome comprehensively. While computational approaches offer a cost-effective and high-throughput alternative, no recognition model for f5C has been developed to date. Drawing inspiration from language models in natural language processing, this study presents f5C-finder, an ensemble neural network-based model utilizing multi-head attention for the identification of f5C. Five distinct feature extraction methods were employed to construct five individual artificial neural networks, and these networks were subsequently integrated through ensemble learning to create f5C-finder. 10-fold cross-validation and independent tests demonstrate that f5C-finder achieves state-of-the-art (SOTA) performance with AUC of 0.807 and 0.827, respectively. The result highlights the effectiveness of biological language model in capturing both the order (sequential) and functional meaning (semantics) within genomes. Furthermore, the built-in interpretability allows us to understand what the model is learning, creating a bridge between identifying key sequential elements and a deeper exploration of their biological functions.	翻訳日:2024-04-23 19:49:10 公開日:2024-04-20
# 表構造と文字認識のためのマルチセルデコーダと相互学習 Multi-Cell Decoder and Mutual Learning for Table Structure and Character Recognition ( http://arxiv.org/abs/2404.13268v1 ) ライセンス: Link先を確認	Takaya Kawakatsu,	(参考訳) 学術論文や財務報告などの文書から表の内容を取り出し,それを大規模言語モデルで処理可能な形式に変換することは,知識情報処理において重要な課題である。テーブル構造だけでなくセル内容も認識するエンドツーエンドアプローチは、外部文字認識システムを用いた最先端モデルに匹敵する性能を達成し、さらなる改善の可能性を秘めている。さらに、これらのモデルでは、数百セルの長いテーブルを局所的な注意を払って認識できるようになった。しかし、モデルでは、ヘッダーからフッタへの1方向のテーブル構造を認識し、各セルごとにセル内容の認識を行うため、近隣セルから有用な情報を検索する機会はない。本稿では,エンド・ツー・エンドアプローチを改善するために,マルチセルコンテンツデコーダと双方向相互学習機構を提案する。この効果は2つの大きなデータセットで実証され、実験結果は、多数のセルを持つ長いテーブルであっても、最先端のモデルに匹敵する性能を示す。 Extracting table contents from documents such as scientific papers and financial reports and converting them into a format that can be processed by large language models is an important task in knowledge information processing. End-to-end approaches, which recognize not only table structure but also cell contents, achieved performance comparable to state-of-the-art models using external character recognition systems, and have potential for further improvements. In addition, these models can now recognize long tables with hundreds of cells by introducing local attention. However, the models recognize table structure in one direction from the header to the footer, and cell content recognition is performed independently for each cell, so there is no opportunity to retrieve useful information from the neighbor cells. In this paper, we propose a multi-cell content decoder and bidirectional mutual learning mechanism to improve the end-to-end approach. The effectiveness is demonstrated on two large datasets, and the experimental results show comparable performance to state-of-the-art models, even for long tables with large numbers of cells.	翻訳日:2024-04-23 19:49:10 公開日:2024-04-20
# 非定常雑音下での確率的誤差キャンセルの改善 Improving probabilistic error cancellation in the presence of non-stationary noise ( http://arxiv.org/abs/2404.13269v1 ) ライセンス: Link先を確認	Samudra Dasgupta, Travis S. Humble,	(参考訳) 非定常雑音の存在下での確率的誤差キャンセル(PEC)結果の安定性について検討する。ベイズ法を利用して,PECの安定性と精度を向上させる戦略を設計する。我々は,Bernstein-Vazirani アルゴリズムを5ビット実装し,ibm_kolkata デバイス上で行った実験により,非適応型 PEC と比較して精度が 42% 向上し,安定性が60% 向上したことを明らかにした。これらの結果は,PECの活用に不可欠である非定常雑音に効果的に対処するための適応推定プロセスの重要性を浮き彫りにした。 We investigate the stability of probabilistic error cancellation (PEC) outcomes in the presence of non-stationary noise, which is an obstacle to achieving accurate observable estimates. Leveraging Bayesian methods, we design a strategy to enhance PEC stability and accuracy. Our experiments using a 5-qubit implementation of the Bernstein-Vazirani algorithm and conducted on the ibm_kolkata device reveal a 42% improvement in accuracy and a 60% enhancement in stability compared to non-adaptive PEC. These results underscore the importance of adaptive estimation processes to effectively address non-stationary noise, vital for advancing PEC utility.	翻訳日:2024-04-23 19:49:10 公開日:2024-04-20
# StrideNET:動的粗さ抽出による地形認識のためのスイニングトランス StrideNET: Swin Transformer for Terrain Recognition with Dynamic Roughness Extraction ( http://arxiv.org/abs/2404.13270v1 ) ライセンス: Link先を確認	Maitreya Shelare, Neha Shigvan, Atharva Satam, Poonam Sonar,	(参考訳) 深層学習の進歩は、リモートセンシング画像の分類に革命をもたらしている。トランスフォーマーベースのアーキテクチャは、自己認識機構を利用して、画像内のグローバルな関係とともに、長距離依存関係のキャプチャを可能にする、従来の畳み込み手法に代わるものとして登場した。そこで本研究では,地形認識と暗黙的特性推定のために設計された新しいデュアルブランチアーキテクチャであるStrideNETを提案する。地形認識部はSwin Transformerを利用して、その階層的表現と低計算コストを活用し、局所的特徴とグローバル的特徴の両方を効率的に捉える。地形特性分枝は, 統計的テクスチャ解析法を用いて, 粗さやすべり性などの表面特性の抽出に重点を置いている。地形特性の計算により、環境認識の強化が可能である。 StrideNETモデルは、Grassy、Marshy、Sandy、Rockyの4つのターゲット地形クラスからなるデータセットでトレーニングされている。 StrideNETは、現代の方法と比較して競争力がある。この研究の意味は、環境モニタリング、土地利用と土地被覆分類(LULC)、災害対応、精密農業など、様々な応用にまで及んでいる。 Advancements in deep learning are revolutionizing the classification of remote-sensing images. Transformer-based architectures, utilizing self-attention mechanisms, have emerged as alternatives to conventional convolution methods, enabling the capture of long-range dependencies along with global relationships in the image. Motivated by these advancements, this paper presents StrideNET, a novel dual-branch architecture designed for terrain recognition and implicit properties estimation. The terrain recognition branch utilizes the Swin Transformer, leveraging its hierarchical representation and low computational cost to efficiently capture both local and global features. The terrain properties branch focuses on the extraction of surface properties such as roughness and slipperiness using a statistical texture analysis method. By computing surface terrain properties, an enhanced environmental perception can be obtained. The StrideNET model is trained on a dataset comprising four target terrain classes: Grassy, Marshy, Sandy, and Rocky. StrideNET attains competitive performance compared to contemporary methods. The implications of this work extend to various applications, including environmental monitoring, land use and land cover (LULC) classification, disaster response, precision agriculture, and much more.	翻訳日:2024-04-23 19:49:10 公開日:2024-04-20
# クロスマスク復元を用いた教師なし異常検出のための多機能再構成ネットワーク Multi-feature Reconstruction Network using Crossed-mask Restoration for Unsupervised Anomaly Detection ( http://arxiv.org/abs/2404.13273v1 ) ライセンス: Link先を確認	Junpu Wang, Guili Xu, Chunlei Li, Guangshuai Gao, Yuehua Cheng,	(参考訳) 工業生産における品質検査において,正常試料のみを用いた無監督異常検出が重要である。既存の再構成手法は有望な結果を得たが、画像再構成における識別性に乏しい情報と、モデル過剰一般化能力による異常な再生という2つの問題に直面している。上記の課題を克服するために,画像再構成を並列特徴復元の組み合わせに変換し,マルチ機能再構成ネットワークであるMFRNetを提案する。具体的には、予め訓練されたモデルから入力画像のより識別的な階層的表現を生成するために、まずマルチスケール特徴集約器を開発した。その後、抽出した特徴マップをランダムにカバーするためにクロスマスクジェネレータを採用し、次いで、欠落した領域の高品質な修復のためのトランス構造に基づく復元ネットワークを構築する。最後に、ハイブリッド損失は、モデルトレーニングと異常推定をガイドし、画素と構造的類似性の両方を考慮している。大規模な実験により、我々の手法は4つの公開データセットと1つの自作データセットにおいて、他の最先端のデータセットと非常に競争的であるか、大幅に上回っていることが示された。 Unsupervised anomaly detection using only normal samples is of great significance for quality inspection in industrial manufacturing. Although existing reconstruction-based methods have achieved promising results, they still face two problems: poor distinguishable information in image reconstruction and well abnormal regeneration caused by model over-generalization ability. To overcome the above issues, we convert the image reconstruction into a combination of parallel feature restorations and propose a multi-feature reconstruction network, MFRNet, using crossed-mask restoration in this paper. Specifically, a multi-scale feature aggregator is first developed to generate more discriminative hierarchical representations of the input images from a pre-trained model. Subsequently, a crossed-mask generator is adopted to randomly cover the extracted feature map, followed by a restoration network based on the transformer structure for high-quality repair of the missing regions. Finally, a hybrid loss is equipped to guide model training and anomaly estimation, which gives consideration to both the pixel and structural similarity. Extensive experiments show that our method is highly competitive with or significantly outperforms other state-of-the-arts on four public available datasets and one self-made dataset.	翻訳日:2024-04-23 19:39:26 公開日:2024-04-20
# 拡張されたオブジェクトインテリジェンス:XRオブジェクトでアナログワールドを対話可能にする Augmented Object Intelligence: Making the Analog World Interactable with XR-Objects ( http://arxiv.org/abs/2404.13274v1 ) ライセンス: Link先を確認	Mustafa Doga Dogan, Eric J. Gonzalez, Andrea Colaco, Karan Ahuja, Ruofei Du, Johnny Lee, Mar Gonzalez-Franco, David Kim,	(参考訳) 対話型デジタルエンティティとしての物理オブジェクトのシームレスな統合は、空間コンピューティングの課題である。本稿では,デジタルオブジェクトがデジタルであるかのように対話できる能力を備えた,デジタルオブジェクトと物理オブジェクトの境界線を曖昧にするために設計された,新しいXRインタラクションパラダイムであるAugmented Object Intelligence(AOI)を紹介する。提案手法では,オブジェクトのセグメンテーションと分類と,MLLM(Multimodal Large Language Models)のパワーを組み合わせることで,これらのインタラクションを容易にする。我々は,AOI の概念を XR-Objects というオープンソースのプロトタイプシステムで実装する。このシステムにより、アナログオブジェクトが情報を伝えるだけでなく、細部への問い合わせやタスクの実行といったデジタルアクションを開始することができる。 1)従来のAIアシスタントよりもAOIの概念を定義し、その利点を詳述し、(2)XR-Objectsシステムのオープンソース設計と実装を詳述し、(3)さまざまなユースケースとユーザスタディを通じてその汎用性を示す。 Seamless integration of physical objects as interactive digital entities remains a challenge for spatial computing. This paper introduces Augmented Object Intelligence (AOI), a novel XR interaction paradigm designed to blur the lines between digital and physical by endowing real-world objects with the ability to interact as if they were digital, where every object has the potential to serve as a portal to vast digital functionalities. Our approach utilizes object segmentation and classification, combined with the power of Multimodal Large Language Models (MLLMs), to facilitate these interactions. We implement the AOI concept in the form of XR-Objects, an open-source prototype system that provides a platform for users to engage with their physical environment in rich and contextually relevant ways. This system enables analog objects to not only convey information but also to initiate digital actions, such as querying for details or executing tasks. Our contributions are threefold: (1) we define the AOI concept and detail its advantages over traditional AI assistants, (2) detail the XR-Objects system's open-source design and implementation, and (3) show its versatility through a variety of use cases and a user study.	翻訳日:2024-04-23 19:39:25 公開日:2024-04-20
# スコア変更を超えて:2つの観点からの非参照画像品質評価に対する敵対的攻撃 Beyond Score Changes: Adversarial Attack on No-Reference Image Quality Assessment from Two Perspectives ( http://arxiv.org/abs/2404.13277v1 ) ライセンス: Link先を確認	Chenxi Yang, Yujia Liu, Dingquan Li, Yan Zhong, Tingting Jiang,	(参考訳) ディープニューラルネットワークは、NR-IQA(No-Reference Image Quality Assessment)において驚くべき成功を収めている。しかし、最近の研究は、NR-IQAモデルが微妙な敵の摂動に対して脆弱であることを強調し、モデル予測と主観的評価の不整合をもたらす。しかし、現在の敵対的攻撃は、個々の画像の予測スコアの摂動に焦点を合わせ、画像集合全体におけるスコア間の相関関係の重要な側面を無視している。一方、ランキング相関と同様、NR-IQAタスクでは相関が重要な役割を担っていることに留意する必要がある。 NR-IQAモデルのロバスト性を包括的に探求するために,画像集合内の相関関係を乱し,個々の画像に変化をスコアする相関エラーベースの新たなフレームワークを導入する。我々の研究は主に、Spearman's Rank-Order correlation Coefficient (SROCC)やMean Squared Error (MSE)のような予測エラー関連メトリクスのようなランキング関連相関指標に焦点を当てている。そこで本研究では,SROCC-MSE-Attack (SMA) と呼ばれる2段階のSROCC-MSE-Attack (SMA) を提案する。実験の結果,SMA法はSROCCを負の値に大きく破壊するだけでなく,個々の画像のスコアにかなりの変化をもたらすことが明らかとなった。一方、さまざまなカテゴリのメトリクスにまたがって最先端のパフォーマンスを示す。提案手法はNR-IQAモデルのロバスト性に関する新しい視点を提供する。 Deep neural networks have demonstrated impressive success in No-Reference Image Quality Assessment (NR-IQA). However, recent researches highlight the vulnerability of NR-IQA models to subtle adversarial perturbations, leading to inconsistencies between model predictions and subjective ratings. Current adversarial attacks, however, focus on perturbing predicted scores of individual images, neglecting the crucial aspect of inter-score correlation relationships within an entire image set. Meanwhile, it is important to note that the correlation, like ranking correlation, plays a significant role in NR-IQA tasks. To comprehensively explore the robustness of NR-IQA models, we introduce a new framework of correlation-error-based attacks that perturb both the correlation within an image set and score changes on individual images. Our research primarily focuses on ranking-related correlation metrics like Spearman's Rank-Order Correlation Coefficient (SROCC) and prediction error-related metrics like Mean Squared Error (MSE). As an instantiation, we propose a practical two-stage SROCC-MSE-Attack (SMA) that initially optimizes target attack scores for the entire image set and then generates adversarial examples guided by these scores. Experimental results demonstrate that our SMA method not only significantly disrupts the SROCC to negative values but also maintains a considerable change in the scores of individual images. Meanwhile, it exhibits state-of-the-art performance across metrics with different categories. Our method provides a new perspective on the robustness of NR-IQA models.	翻訳日:2024-04-23 19:39:25 公開日:2024-04-20
# 超音波金属溶接における条件モニタリングのためのタスクパーソナライズによるフェデレーション伝達学習 Federated Transfer Learning with Task Personalization for Condition Monitoring in Ultrasonic Metal Welding ( http://arxiv.org/abs/2404.13278v1 ) ライセンス: Link先を確認	Ahmadreza Eslaminia, Yuquan Meng, Klara Nahrstedt, Chenhui Shao,	(参考訳) 超音波金属溶接(UMW)は産業用途において重要な接合技術である。プロセス異常が接合品質を著しく低下させるため、UMWアプリケーションでは条件監視(CM)機能が必要である。近年、機械学習モデルは複雑なパターンを学習できるため、多くの製造アプリケーションにおいてCMにとって有望なツールとして登場した。しかし、これらのモデルのデプロイを成功させるためには、膨大なトレーニングデータが必要である。さらに、既存の機械学習モデルの多くは一般化性に欠けており、新しいプロセス構成(すなわちドメイン)に直接適用できない。このような問題は、メーカー間でデータをプールすることで軽減される可能性があるが、データ共有はデータプライバシの重大な懸念を引き起こす。これらの課題に対処するため,データプライバシを確保しつつ,分散学習におけるドメイン一般化機能を提供するFTL-TP(Federated Transfer Learning with Task Personalization)フレームワークを提案する。特徴空間から統一表現を効果的に学習することにより、FTL-TPは、同様のタスクを行うクライアントに対してCMモデルを適応させることができる。 FTL-TPの有効性を実証するために,2つの異なるUMW CMタスク,ツール条件モニタリング,ワークピース表面条件分類について検討した。最先端のFLアルゴリズムと比較して、FTL-TPは新しいターゲット領域におけるCMの精度を5.35%から8.08%向上させる。 FTL-TPはまた、不均衡なデータ分散と限られたクライアント分数を含む挑戦的なシナリオでも優れた性能を発揮する。さらに,エッジクラウドアーキテクチャ上でのFTL-TP手法の実装により,本手法が実現可能かつ効率的に実現可能であることを示す。 FTL-TPフレームワークは、他の様々な製造アプリケーションに容易に拡張可能である。 Ultrasonic metal welding (UMW) is a key joining technology with widespread industrial applications. Condition monitoring (CM) capabilities are critically needed in UMW applications because process anomalies significantly deteriorate the joining quality. Recently, machine learning models emerged as a promising tool for CM in many manufacturing applications due to their ability to learn complex patterns. Yet, the successful deployment of these models requires substantial training data that may be expensive and time-consuming to collect. Additionally, many existing machine learning models lack generalizability and cannot be directly applied to new process configurations (i.e., domains). Such issues may be potentially alleviated by pooling data across manufacturers, but data sharing raises critical data privacy concerns. To address these challenges, this paper presents a Federated Transfer Learning with Task Personalization (FTL-TP) framework that provides domain generalization capabilities in distributed learning while ensuring data privacy. By effectively learning a unified representation from feature space, FTL-TP can adapt CM models for clients working on similar tasks, thereby enhancing their overall adaptability and performance jointly. To demonstrate the effectiveness of FTL-TP, we investigate two distinct UMW CM tasks, tool condition monitoring and workpiece surface condition classification. Compared with state-of-the-art FL algorithms, FTL-TP achieves a 5.35%--8.08% improvement of accuracy in CM in new target domains. FTL-TP is also shown to perform excellently in challenging scenarios involving unbalanced data distributions and limited client fractions. Furthermore, by implementing the FTL-TP method on an edge-cloud architecture, we show that this method is both viable and efficient in practice. The FTL-TP framework is readily extensible to various other manufacturing applications.	翻訳日:2024-04-23 19:39:25 公開日:2024-04-20
# セマンティックコミュニケーションにおけるセマンティック・シンボリック再構築のバックドア攻撃と防御 Backdoor Attacks and Defenses on Semantic-Symbol Reconstruction in Semantic Communications ( http://arxiv.org/abs/2404.13279v1 ) ライセンス: Link先を確認	Yuan Zhou, Rose Qingyang Hu, Yi Qian,	(参考訳) 次世代無線通信ネットワークでは,セマンティック通信が重要である。既存の研究は、ディープラーニングに基づくセマンティックコミュニケーションフレームワークを開発した。しかし、ディープラーニングを利用したシステムは、バックドア攻撃や敵攻撃のような脅威に対して脆弱である。本稿では,ディープラーニング対応セマンティックコミュニケーションシステムを対象としたバックドア攻撃について検討する。現在のバックドア攻撃はセマンティック・コミュニケーションのシナリオには適していないため、セマンティック・シンボル(BASS)に対する新たなバックドア・アタック・パラダイムが導入された。具体的には,BASS防止のためのトレーニングフレームワークを提案する。さらに、リバースエンジニアリングベースおよびプルーニングベースの防衛戦略は、セマンティックコミュニケーションにおけるバックドア攻撃を防ぐように設計されている。シミュレーションの結果,提案した攻撃パラダイムと防衛戦略の有効性が示された。 Semantic communication is of crucial importance for the next-generation wireless communication networks. The existing works have developed semantic communication frameworks based on deep learning. However, systems powered by deep learning are vulnerable to threats such as backdoor attacks and adversarial attacks. This paper delves into backdoor attacks targeting deep learning-enabled semantic communication systems. Since current works on backdoor attacks are not tailored for semantic communication scenarios, a new backdoor attack paradigm on semantic symbols (BASS) is introduced, based on which the corresponding defense measures are designed. Specifically, a training framework is proposed to prevent BASS. Additionally, reverse engineering-based and pruning-based defense strategies are designed to protect against backdoor attacks in semantic communication. Simulation results demonstrate the effectiveness of both the proposed attack paradigm and the defense strategies.	翻訳日:2024-04-23 19:39:25 公開日:2024-04-20
# Wills Aligner:ロバストな多目的脳表現学習者 Wills Aligner: A Robust Multi-Subject Brain Representation Learner ( http://arxiv.org/abs/2404.13282v1 ) ライセンス: Link先を確認	Guangyin Bao, Zixuan Gong, Qi Zhang, Jialei Zhou, Wei Fan, Kun Yi, Usman Naseem, Liang Hu, Duoqian Miao,	(参考訳) 最近の研究では、人間の脳活動から視覚情報を復号する技術が目覚ましい進歩を遂げている。しかし、被験者間の皮質パーセレーションや認知パターンの有意な変動により、現在のアプローチは各被験者にパーソナライズされたディープモデルを提供し、現実の文脈においてこの技術の実用性を制限している。この課題に対処するために,頑健な多目的脳表現学習者であるWills Alignerを紹介した。私たちのWills Alignerは最初、解剖学的レベルで異なる被験者の脳を調整します。その後、個々の認知パターンを学習するために、脳の専門家の混合物が組み込まれている。さらに、多目的学習タスクを2段階のトレーニングに分離し、深層モデルとそのプラグインネットワークを推進し、共通性間の知識と様々な認知パターンを学習する。 Wills Alignerは、解剖学的差異を克服し、単一のモデルを多目的脳表現学習に効率的に活用することを可能にする。粗くきめ細かな視覚的デコードタスクにまたがるアプローチの性能を慎重に評価する。 The experimental results showed that our Wills Aligner achieves State-of-the-art performance。 Decoding visual information from human brain activity has seen remarkable advancements in recent research. However, due to the significant variability in cortical parcellation and cognition patterns across subjects, current approaches personalized deep models for each subject, constraining the practicality of this technology in real-world contexts. To tackle the challenges, we introduce Wills Aligner, a robust multi-subject brain representation learner. Our Wills Aligner initially aligns different subjects' brains at the anatomical level. Subsequently, it incorporates a mixture of brain experts to learn individual cognition patterns. Additionally, it decouples the multi-subject learning task into a two-stage training, propelling the deep model and its plugin network to learn inter-subject commonality knowledge and various cognition patterns, respectively. Wills Aligner enables us to overcome anatomical differences and to efficiently leverage a single model for multi-subject brain representation learning. We meticulously evaluate the performance of our approach across coarse-grained and fine-grained visual decoding tasks. The experimental results demonstrate that our Wills Aligner achieves state-of-the-art performance.	翻訳日:2024-04-23 19:39:25 公開日:2024-04-20
# PoseINN: Invertible Neural Networksを用いたリアルタイム視覚ベースのPose回帰とローカライゼーション PoseINN: Realtime Visual-based Pose Regression and Localization with Invertible Neural Networks ( http://arxiv.org/abs/2404.13288v1 ) ライセンス: Link先を確認	Zirui Zang, Ahmad Amine, Rahul Mangharam,	(参考訳) カメラからエゴ位置を推定することは、モバイルロボティクスから拡張現実に至るまで、ロボット工学における重要な問題である。 SOTAモデルはますます正確化が進んでいるが、計算コストが高いため、いまだに扱いにくい。本稿では,インバータブルニューラルネットワーク(INN)を用いて画像の潜在空間とシーンのポーズのマッピングを求める。我々のモデルは、訓練が速く、低解像度合成データのオフラインレンダリングしか必要とせず、SOTAと同じような性能を実現している。正規化フローを用いることで,提案手法は出力に対する不確実性を推定する。また,移動ロボットにモデルを配置することで,本手法の有効性を実証した。 Estimating ego-pose from cameras is an important problem in robotics with applications ranging from mobile robotics to augmented reality. While SOTA models are becoming increasingly accurate, they can still be unwieldy due to high computational costs. In this paper, we propose to solve the problem by using invertible neural networks (INN) to find the mapping between the latent space of images and poses for a given scene. Our model achieves similar performance to the SOTA while being faster to train and only requiring offline rendering of low-resolution synthetic data. By using normalizing flows, the proposed method also provides uncertainty estimation for the output. We also demonstrated the efficiency of this method by deploying the model on a mobile robot.	翻訳日:2024-04-23 19:39:25 公開日:2024-04-20
# 二重混合:音声からの連続事象検出を目指して Double Mixture: Towards Continual Event Detection from Speech ( http://arxiv.org/abs/2404.13289v1 ) ライセンス: Link先を確認	Jingqi Kang, Tongtong Wu, Jinming Zhao, Guitao Wang, Yinwei Wei, Hao Yang, Guilin Qi, Yuan-Fang Li, Gholamreza Haffari,	(参考訳) 音声イベント検出は、セマンティックイベントと音響イベントの両方のタグ付けを含むマルチメディア検索に不可欠である。従来のASRシステムは、対話の解釈が環境の文脈によって異なるとしても、コンテンツにのみ焦点をあてて、これらの出来事間の相互作用を見落としていることが多い。本稿では, 音声イベント検出における主な課題として, 過去の出来事を忘れることなく新たな事象を連続的に統合すること, 音響イベントからの意味のゆがみについて述べる。音声からの連続イベント検出という新しいタスクを導入し、2つのベンチマークデータセットを提供する。破滅的な忘れ込みと効果的な切り離しの課題に対処するため,我々は「二重混合」という新しい手法を提案する。本手法は, 適応性を高め, 忘れないように, 頑健な記憶機構と音声の専門知識を融合する。この課題は,コンピュータビジョンや自然言語処理において,現在最先端の手法では効果的に対処できない重要な課題であることを示す。提案手法は,様々な連続的な学習シーケンスにまたがって,最小の忘れ込み率と最高レベルの一般化を実現している。私たちのコードとデータはhttps://anonymous.4open.science/status/Continual-SpeechED-6461で公開されています。 Speech event detection is crucial for multimedia retrieval, involving the tagging of both semantic and acoustic events. Traditional ASR systems often overlook the interplay between these events, focusing solely on content, even though the interpretation of dialogue can vary with environmental context. This paper tackles two primary challenges in speech event detection: the continual integration of new events without forgetting previous ones, and the disentanglement of semantic from acoustic events. We introduce a new task, continual event detection from speech, for which we also provide two benchmark datasets. To address the challenges of catastrophic forgetting and effective disentanglement, we propose a novel method, 'Double Mixture.' This method merges speech expertise with robust memory mechanisms to enhance adaptability and prevent forgetting. Our comprehensive experiments show that this task presents significant challenges that are not effectively addressed by current state-of-the-art methods in either computer vision or natural language processing. Our approach achieves the lowest rates of forgetting and the highest levels of generalization, proving robust across various continual learning sequences. Our code and data are available at https://anonymous.4open.science/status/Continual-SpeechED-6461.	翻訳日:2024-04-23 19:39:25 公開日:2024-04-20
# サブワードのトークン化の評価 : エイリアンのサブワード構成とOOV一般化への挑戦 Evaluating Subword Tokenization: Alien Subword Composition and OOV Generalization Challenge ( http://arxiv.org/abs/2404.13292v1 ) ライセンス: Link先を確認	Khuyagbaatar Batsuren, Ekaterina Vylomova, Verna Dankers, Tsetsuukhei Delgerbaatar, Omri Uzan, Yuval Pinter, Gábor Bella,	(参考訳) Byte-Pair Encoding (BPE) など、現在の言語モデルの一般的なサブワードトークンは、モデルの下流のパフォーマンスに影響を与える形態素境界を尊重しないことが知られている。多くの改良されたトークン化アルゴリズムが提案されているが、それらの評価と相互比較は依然として未解決の問題である。そこで本研究では,サブワードトークン化のための内在的・外在的評価フレームワークを提案する。 Intrinsic Evaluation is based on our new UniMorph Labeller tool that classified subword tokenization as morphological or alien。外部評価は、新たに指定された3つの下流テキスト分類タスクからなるOut-of-Vocabulary Generalization Challenge 1.0ベンチマークによって行われる。実験の結果,UniMorph Labellerの精度は98%であり,すべての言語モデル(ALBERT,BERT,RoBERTa,DeBERTaを含む)において,単語の意味の意味的構成性に対する形態的トークン化に比べて,異種トークン化が低いことが示唆された。 The popular subword tokenizers of current language models, such as Byte-Pair Encoding (BPE), are known not to respect morpheme boundaries, which affects the downstream performance of the models. While many improved tokenization algorithms have been proposed, their evaluation and cross-comparison is still an open problem. As a solution, we propose a combined intrinsic-extrinsic evaluation framework for subword tokenization. Intrinsic evaluation is based on our new UniMorph Labeller tool that classifies subword tokenization as either morphological or alien. Extrinsic evaluation, in turn, is performed via the Out-of-Vocabulary Generalization Challenge 1.0 benchmark, which consists of three newly specified downstream text classification tasks. Our empirical findings show that the accuracy of UniMorph Labeller is 98%, and that, in all language models studied (including ALBERT, BERT, RoBERTa, and DeBERTa), alien tokenization leads to poorer generalizations compared to morphological tokenization for semantic compositionality of word meanings.	翻訳日:2024-04-23 19:39:25 公開日:2024-04-20
# 相関脱落チャネルにおける重力猫状態の量子性 Quantumness of gravitational cat states in correlated dephasing channels ( http://arxiv.org/abs/2404.13294v1 ) ライセンス: Link先を確認	Saeed Haddadi, Mehrdad Ghominejad, Artur Czerwinski,	(参考訳) 本研究では, 負のデファスチャネルにおける重力猫状態の量子性について検討する。熱状態下での2匹の重力猫(2立方体)の脱コヒーレンスに、相似チャネルの連続的な作用の古典的相関がどのように影響するかを調べることに注力する。その結果、量子コヒーレンス、局所的な量子フィッシャー情報、ベル非局所性は、2つの量子ビットがチャネルを通過するときの時間を通して古典的相関を増大させることで著しく向上できることが示された。しかし、状態間の重力相互作用とエネルギーギャップは、重力猫の量子特性に複雑な影響を示す。重力物理学と量子情報処理の両方に重要な新機能が報告されている。 We study the quantumness of gravitational cat states in correlated dephasing channels. Our focus is on exploring how classical correlations between successive actions of a dephasing channel influence the decoherence of two gravitational cats (two qubits) at a thermal regime. The results show that the quantum coherence, local quantum Fisher information, and Bell non-locality can be significantly enhanced by augmenting classical correlations throughout the entire duration when the two qubits pass the channel. However, the gravitational interaction and energy gap between states exhibit intricate impacts on the quantum characteristics of gravitational cats. New features are reported that can be significant for both gravitational physics and quantum information processing.	翻訳日:2024-04-23 19:39:25 公開日:2024-04-20
# インクリメンタルビルドにおけるビルド依存性エラーの検出 Detecting Build Dependency Errors in Incremental Builds ( http://arxiv.org/abs/2404.13295v1 ) ライセンス: Link先を確認	Jun Lyu, Shanshan Li, He Zhang, Lanxin Yang, Bohan Liu, Manuel Rigger,	(参考訳) Makeのようなビルドツールによって実行される増分ビルドと並列ビルドは、現代のC/C++ソフトウェアプロジェクトの中心である。それらの正しい効率的な実行は、ビルドスクリプトに依存する。しかし、ビルドスクリプトはエラーを起こしやすい。最も多いエラーは、依存性の欠如(MD)と冗長依存関係(RD)である。これらのエラーを検出する最先端の手法は、クリーンなビルド(すなわち、クリーンな環境におけるソフトウェア構成のサブセットの完全なビルド)に依存している。これらの課題に対処するため、インクリメンタルビルドのコンテキストにおいて、ビルド依存性エラーを検出するためのECheckerと呼ばれる新しいアプローチを提案する。 ECheckerの中核となる考え方は、C/C++プリプロセッサディレクティブとMakefileの変更を新しいコミットから推論することで、実際のビルド依存関係を自動的に更新することだ。 ECheckerは、効率を維持しながらクリーンビルドに依存する方法よりも高い効率を達成する。私たちは、ECheckerの有効性と効率を評価するため、12の代表的なプロジェクトを選択しました。評価結果を,最先端のビルド依存性検出ツールと比較した。評価の結果,ECheckerのF-1スコアは最先端法に比べて0.18改善した。 ECheckerはビルド依存性のエラー検出効率を平均85.14倍に向上させる(中央値16.30倍)。その結果、ECheckerは、ビルド依存性のエラーを効率的に検出する実践者をサポートすることができた。 Incremental and parallel builds performed by build tools such as Make are the heart of modern C/C++ software projects. Their correct and efficient execution depends on build scripts. However, build scripts are prone to errors. The most prevalent errors are missing dependencies (MDs) and redundant dependencies (RDs). The state-of-the-art methods for detecting these errors rely on clean builds (i.e., full builds of a subset of software configurations in a clean environment), which is costly and takes up to multiple hours for large-scale projects. To address these challenges, we propose a novel approach called EChecker to detect build dependency errors in the context of incremental builds. The core idea of EChecker is to automatically update actual build dependencies by inferring them from C/C++ pre-processor directives and Makefile changes from new commits, which avoids clean builds when possible. EChecker achieves higher efficiency than the methods that rely on clean builds while maintaining effectiveness. We selected 12 representative projects, with their sizes ranging from small to large, with 240 commits (20 commits for each project), based on which we evaluated the effectiveness and efficiency of EChecker. We compared the evaluation results with a state-of-the-art build dependency error detection tool. The evaluation shows that the F-1 score of EChecker improved by 0.18 over the state-of-the-art method. EChecker increases the build dependency error detection efficiency by an average of 85.14 times (with the median at 16.30 times). The results demonstrate that EChecker can support practitioners in detecting build dependency errors efficiently.	翻訳日:2024-04-23 19:39:25 公開日:2024-04-20
# 非零運動量をもつコヒーレンシングハードコアボソン凝縮状態 Coalescing hardcore-boson condensate states with nonzero momentum ( http://arxiv.org/abs/2404.13297v1 ) ライセンス: Link先を確認	C. H. Zhang, Z. Song,	(参考訳) 例外点(EPs)は、非エルミート系の排他的特徴として、基底状態を超えた代替安定状態である合体状態を支持する。本研究では, 強オンサイト相互作用を持つ1次元, 2次元, 3次元拡張ボース・ハッバード系における凝縮状態の動的形成に対する非エルミート不純物の影響について検討する。ハードコア限界の解に基づいて,特定の系パラメータが特定の整合条件を満たす場合,ODLRO (off-diagonal long-range order) の縮合モードが存在することを示す。開境界条件下では、凝縮状態は非エルミート$\mathcal{PT}$対称境界がEPを生じさせるときに結合状態となる。この現象の背後にある基本的なメカニズムは、非エルミート境界における多粒子波束の散乱ダイナミクスを解析することによって解明される。 EPダイナミクスは非ゼロ運動量を持つ凝縮状態の動的生成を促進する。理論的知見をさらに裏付けるために,数値シミュレーションを行った。この研究は、相互作用するボソンの潜在的な凝縮を公表するだけでなく、凝縮状態の工学にもアプローチを提供する。 Exceptional points (EPs), as an exclusive feature of a non-Hermitian system, support coalescing states to be alternative stable state beyond the ground state. In this work, we explore the influence of non-Hermitian impurities on the dynamic formation of condensate states in one-, two-, and three-dimensional extended Bose-Hubbard systems with strong on-site interaction. Based on the solution for the hardcore limit, we show exactly that condensate modes with off-diagonal long-range order (ODLRO) can exist when certain system parameters satisfy specific matching conditions. Under open boundary conditions, the condensate states become coalescing states when the non-Hermitian $\mathcal{PT}$-symmetric boundary gives rise to the EPs. The fundamental mechanism behind this phenomenon is uncovered through analyzing the scattering dynamics of many-particle wavepackets at the non-Hermitian boundaries. The EP dynamics facilitate the dynamic generation of condensate states with non-zero momentum. To further substantiate the theoretical findings, numerical simulations are conducted. This study not only unveils the potential condensation of interacting bosons but also offers an approach for the engineering of condensate states.	翻訳日:2024-04-23 19:39:25 公開日:2024-04-20
# PCQA: プロンプト条件に基づくAIGC品質評価のための強力なベースライン PCQA: A Strong Baseline for AIGC Quality Assessment Based on Prompt Condition ( http://arxiv.org/abs/2404.13299v1 ) ライセンス: Link先を確認	Xi Fang, Weigang Wang, Xiaoxin Lv, Jun Yan,	(参考訳) 大規模言語モデル(LLM)と拡散モデル(Diffusion Models)の開発は、人工知能生成コンテンツ(AIGC)のブームをもたらす。 AIGC技術に基づいて、異なる画像やビデオの定量評価を提供するために、効果的な品質評価フレームワークを構築することが不可欠である。 AIGCメソッドによって生成されたコンテンツは、人工的なプロンプトによって駆動される。したがって,AIGCの品質評価の基礎として,このプロンプトが有効であることは直感的である。本研究では,効果的なAIGC品質評価(QA)フレームワークを提案する。まず,複数ソースCLIP(Contrastive Language- Image Pre-Training)テキストエンコーダをベースとしたハイブリッドプロンプト符号化手法を提案する。第2に,適応したプロンプトと視覚機能を効果的にブレンドするアンサンブルベースの機能ミキサーモジュールを提案する。 AIGIQA-20K (AI-Generated Image Quality Assessment database) と T2VQA-DB (Text-to-Video Quality Assessment DataBase) の2つのデータセットにおける実証的研究を行い,提案手法の有効性を検証した。提案するシンプルで実現可能なフレームワークは,マルチモーダル・ジェネレーション分野の研究開発を促進する可能性がある。 The development of Large Language Models (LLM) and Diffusion Models brings the boom of Artificial Intelligence Generated Content (AIGC). It is essential to build an effective quality assessment framework to provide a quantifiable evaluation of different images or videos based on the AIGC technologies. The content generated by AIGC methods is driven by the crafted prompts. Therefore, it is intuitive that the prompts can also serve as the foundation of the AIGC quality assessment. This study proposes an effective AIGC quality assessment (QA) framework. First, we propose a hybrid prompt encoding method based on a dual-source CLIP (Contrastive Language-Image Pre-Training) text encoder to understand and respond to the prompt conditions. Second, we propose an ensemble-based feature mixer module to effectively blend the adapted prompt and vision features. The empirical study practices in two datasets: AIGIQA-20K (AI-Generated Image Quality Assessment database) and T2VQA-DB (Text-to-Video Quality Assessment DataBase), which validates the effectiveness of our proposed method: Prompt Condition Quality Assessment (PCQA). Our proposed simple and feasible framework may promote research development in the multimodal generation field.	翻訳日:2024-04-23 19:39:25 公開日:2024-04-20
# Capturing Momentum: 機械学習と時系列理論を用いたテニスマッチング解析 Capturing Momentum: Tennis Match Analysis Using Machine Learning and Time Series Theory ( http://arxiv.org/abs/2404.13300v1 ) ライセンス: Link先を確認	Jingdi Lei, Tianqi Kang, Yuluan Cao, Shiwei Ren,	(参考訳) 本稿ではテニスの試合の勢いについて分析する。また、その一般化性能から、スポーツゲームの結果を予測するシステムの構築や、技術統計に基づくプレイヤーのパフォーマンス分析に有用である。まず隠れマルコフモデルを用いてプレイヤーのパフォーマンスとして定義される運動量を予測する。そして、Xgboost を用いて運動量の重要性を証明する。最後に,本モデルの性能評価にLightGBMを用い,SHAP特徴量ランキングと重み解析を用いて,プレイヤーのパフォーマンスに影響を及ぼす重要な点を求める。 This paper represents an analysis on the momentum of tennis match. And due to Generalization performance of it, it can be helpful in constructing a system to predict the result of sports game and analyze the performance of player based on the Technical statistics. We First use hidden markov models to predict the momentum which is defined as the performance of players. Then we use Xgboost to prove the significance of momentum. Finally we use LightGBM to evaluate the performance of our model and use SHAP feature importance ranking and weight analysis to find the key points that affect the performance of players.	翻訳日:2024-04-23 19:39:25 公開日:2024-04-20
# フェイクベンチ:大きめのマルチモーダルモデルでアキレスのフェイク画像のヒールを発見 FakeBench: Uncover the Achilles' Heels of Fake Images with Large Multimodal Models ( http://arxiv.org/abs/2404.13306v1 ) ライセンス: Link先を確認	Yixuan Li, Xuelin Liu, Xiaoyang Wang, Shiqi Wang, Weisi Lin,	(参考訳) 近年,人工知能(AI)モデルによって生成された偽画像は,偽画像検出モデルに対する新たな課題として現実と区別できないものとなっている。この程度では、人間の理解できない説明がないため、現実または偽の単純な二分判断は説得力が少なく、信頼性が低いように見える。幸運なことに、LMM(Large Multimodal Models)は、その性能が未決定のまま、判断プロセスを実現する可能性をもたらす。そこで本稿では,偽のサインに人間の言語記述を付加した偽画像からなる,透過的なデファクタに対する最初のベンチマークであるFakeBenchを提案する。 1)LMMはAIによって生成された偽画像を区別できるか、(2)LMMは偽画像をどのように区別できるのか? 具体的には、FakeClassデータセットを6kの多様なソースの偽画像と実画像で構築し、それぞれに画像の信頼性に関する質問&回答ペアを設け、検出能力をベンチマークする。本研究では,LMMの推論能力と解釈能力を検討するために,偽画像のファルシフィケーションを明らかにする暗黙の手がかりに関する15k個の記述からなるFakeClueデータセットを提案する。さらに,FakeQAを構築し,LMMの解答能力を評価する。実験の結果,現在のLMMは中等度識別能力,予備解釈能力,推論能力を有しており,画像デフォーメーションの解答能力は欠かせないことがわかった。 FakeBenchは近く一般公開される予定だ。 Recently, fake images generated by artificial intelligence (AI) models have become indistinguishable from the real, exerting new challenges for fake image detection models. To this extent, simple binary judgments of real or fake seem less convincing and credible due to the absence of human-understandable explanations. Fortunately, Large Multimodal Models (LMMs) bring possibilities to materialize the judgment process while their performance remains undetermined. Therefore, we propose FakeBench, the first-of-a-kind benchmark towards transparent defake, consisting of fake images with human language descriptions on forgery signs. FakeBench gropes for two open questions of LMMs: (1) can LMMs distinguish fake images generated by AI, and (2) how do LMMs distinguish fake images? In specific, we construct the FakeClass dataset with 6k diverse-sourced fake and real images, each equipped with a Question&Answer pair concerning the authenticity of images, which are utilized to benchmark the detection ability. To examine the reasoning and interpretation abilities of LMMs, we present the FakeClue dataset, consisting of 15k pieces of descriptions on the telltale clues revealing the falsification of fake images. Besides, we construct the FakeQA to measure the LMMs' open-question answering ability on fine-grained authenticity-relevant aspects. Our experimental results discover that current LMMs possess moderate identification ability, preliminary interpretation and reasoning ability, and passable open-question answering ability for image defake. The FakeBench will be made publicly available soon.	翻訳日:2024-04-23 19:39:25 公開日:2024-04-20
# GPT-4におけるエラータイプの調査とUSMLE質問への回答 Beyond Accuracy: Investigating Error Types in GPT-4 Responses to USMLE Questions ( http://arxiv.org/abs/2404.13307v1 ) ライセンス: Link先を確認	Soumyadeep Roy, Aparup Khatua, Fatemeh Ghoochani, Uwe Hadler, Wolfgang Nejdl, Niloy Ganguly,	(参考訳) GPT-4は医療用QAタスクにおいて高い精度を示し、86.70%の精度で、Med-PaLM 2は86.50%である。しかし、エラーの約14%が残っている。加えて、現在の研究では GPT-4 を用いて正しい選択肢を予測できるが、説明は得られず、したがって GPT-4 や他の LLM で使用される思考過程や推論についての洞察は得られない。そこで,本研究では,医学生との連携から得られた新たな領域固有の誤り分類法を提案する。 GPT-4 USMLE Error (G4UE) データセットは, アメリカ医学ライセンス試験 (USMLE) に対する4153 GPT-4 の正解と 919 の誤応答からなる。これらの応答は非常に長く(258語平均)、選択されたオプションを正当化する GPT-4 からの詳細な説明を含んでいる。そして、Potatoアノテーションプラットフォームを使用して大規模なアノテーション研究を開始し、有名なクラウドソーシングプラットフォームであるProlificを通じて44人の医療専門家を募集した。私たちは、これらの919の不正なデータポイントのうち300点を、異なるクラスの粒度レベルで注釈付けし、エラーの背後にある理由を特定するためにマルチラベルスパンを作成しました。注釈付きデータセットでは、GPT-4の誤応答のかなりの部分は、アノテーションによって「GPT-4による推論可能な応答」に分類される。これは、訓練された医療専門家の間でも、誤った選択肢につながる可能性のある説明を明らかにするという課題に光を当てている。データポイント毎にSemRepツールを用いて抽出した医療概念と医用意味述語も提供する。 LLMが複雑な医学的疑問に答える能力を評価するのに役立つと我々は信じている。リソースはhttps://github.com/roysoumya/usmle-gpt4-error-taxonomy で公開しています。 GPT-4 demonstrates high accuracy in medical QA tasks, leading with an accuracy of 86.70%, followed by Med-PaLM 2 at 86.50%. However, around 14% of errors remain. Additionally, current works use GPT-4 to only predict the correct option without providing any explanation and thus do not provide any insight into the thinking process and reasoning used by GPT-4 or other LLMs. Therefore, we introduce a new domain-specific error taxonomy derived from collaboration with medical students. Our GPT-4 USMLE Error (G4UE) dataset comprises 4153 GPT-4 correct responses and 919 incorrect responses to the United States Medical Licensing Examination (USMLE) respectively. These responses are quite long (258 words on average), containing detailed explanations from GPT-4 justifying the selected option. We then launch a large-scale annotation study using the Potato annotation platform and recruit 44 medical experts through Prolific, a well-known crowdsourcing platform. We annotated 300 out of these 919 incorrect data points at a granular level for different classes and created a multi-label span to identify the reasons behind the error. In our annotated dataset, a substantial portion of GPT-4's incorrect responses is categorized as a "Reasonable response by GPT-4," by annotators. This sheds light on the challenge of discerning explanations that may lead to incorrect options, even among trained medical professionals. We also provide medical concepts and medical semantic predications extracted using the SemRep tool for every data point. We believe that it will aid in evaluating the ability of LLMs to answer complex medical questions. We make the resources available at https://github.com/roysoumya/usmle-gpt4-error-taxonomy .	翻訳日:2024-04-23 19:29:41 公開日:2024-04-20
# 生成学習のための潜在Schr{ö}dinger Bridge拡散モデル Latent Schr{ö}dinger Bridge Diffusion Model for Generative Learning ( http://arxiv.org/abs/2404.13309v1 ) ライセンス: Link先を確認	Yuling Jiao, Lican Kang, Huazhen Lin, Jin Liu, Heng Zuo,	(参考訳) 本稿では,現在の拡散モデルの包括的理論的解析を行うことを目的とする。本稿では、この領域における理論的探索の枠組みとして、潜在空間におけるSchr{\"o}dinger Bridge拡散モデルを用いた新しい生成学習手法を提案する。提案手法は,対象分布から逸脱する可能性のある分布から派生したデータを用いたエンコーダ・デコーダアーキテクチャの事前学習から始まり,既存の大規模モデルを活用することで,大規模なサンプルサイズの収容が容易になる。次に、Schr{\「o}dinger Bridge framework」を用いた潜伏空間内の拡散モデルを構築した。我々の理論的解析は、潜在Schr{\"o}dinger橋拡散モデルによる学習分布のエンドツーエンド誤差解析の確立を含む。具体的には、生成した分布と対象分布の間の2階ワッサーシュタイン距離を制御する。さらに, 得られた収束速度は次元の呪いを効果的に軽減し, 普及する拡散モデルに対する堅牢な理論的支援を提供する。 This paper aims to conduct a comprehensive theoretical analysis of current diffusion models. We introduce a novel generative learning methodology utilizing the Schr{\"o}dinger bridge diffusion model in latent space as the framework for theoretical exploration in this domain. Our approach commences with the pre-training of an encoder-decoder architecture using data originating from a distribution that may diverge from the target distribution, thus facilitating the accommodation of a large sample size through the utilization of pre-existing large-scale models. Subsequently, we develop a diffusion model within the latent space utilizing the Schr{\"o}dinger bridge framework. Our theoretical analysis encompasses the establishment of end-to-end error analysis for learning distributions via the latent Schr{\"o}dinger bridge diffusion model. Specifically, we control the second-order Wasserstein distance between the generated distribution and the target distribution. Furthermore, our obtained convergence rates effectively mitigate the curse of dimensionality, offering robust theoretical support for prevailing diffusion models.	翻訳日:2024-04-23 19:29:41 公開日:2024-04-20
# STAT: 一般化可能な時間的行動ローカライゼーションを目指して STAT: Towards Generalizable Temporal Action Localization ( http://arxiv.org/abs/2404.13311v1 ) ライセンス: Link先を確認	Yangcen Liu, Ziyi Liu, Yuanhao Zhai, Wen Li, David Doerman, Junsong Yuan,	(参考訳) WTAL(Wakly-supervised temporal action Localization)は、ビデオレベルのラベルだけでアクションインスタンスを認識およびローカライズすることを目的としている。大幅な進歩にもかかわらず、既存のメソッドは、異なる分散への転送時に深刻なパフォーマンス劣化に悩まされるため、現実のシナリオにはほとんど適応しない可能性がある。この問題に対処するために,アクションローカライズ手法の一般化性向上に焦点をあてた,一般化可能な時間的行動ローカライズタスク(GTAL)を提案する。その結果, 性能低下の原因は主に, 行動尺度の違いによる一般化性の欠如にあることがわかった。この問題に対処するために,教師-学生構造を活用したSTAT(Self-supervised Temporal Adaptive Teacher)を提案する。我々のSTATは改良モジュールとアライメントモジュールを備えている。前者は文脈情報を利用してモデルの出力を反復的に洗練し、対象のスケールに適応するのに役立つ。後者は、生徒モデルと教師モデルとのコンセンサスを促進することにより、改善プロセスを改善する。本研究では,THUMOS14,ActivityNet1.2,HACSの3つのデータセットに対して広範囲に実験を行い,同分布評価性能に近づいた場合においても,クロスディストリビューション評価条件の下でベースライン法を著しく改善することを示した。 Weakly-supervised temporal action localization (WTAL) aims to recognize and localize action instances with only video-level labels. Despite the significant progress, existing methods suffer from severe performance degradation when transferring to different distributions and thus may hardly adapt to real-world scenarios . To address this problem, we propose the Generalizable Temporal Action Localization task (GTAL), which focuses on improving the generalizability of action localization methods. We observed that the performance decline can be primarily attributed to the lack of generalizability to different action scales. To address this problem, we propose STAT (Self-supervised Temporal Adaptive Teacher), which leverages a teacher-student structure for iterative refinement. Our STAT features a refinement module and an alignment module. The former iteratively refines the model's output by leveraging contextual information and helps adapt to the target scale. The latter improves the refinement process by promoting a consensus between student and teacher models. We conduct extensive experiments on three datasets, THUMOS14, ActivityNet1.2, and HACS, and the results show that our method significantly improves the Baseline methods under the cross-distribution evaluation setting, even approaching the same-distribution evaluation performance.	翻訳日:2024-04-23 19:29:41 公開日:2024-04-20
# リプシッツ連続制御問題の安定性と強化学習への応用について On the stability of Lipschitz continuous control problems and its application to reinforcement learning ( http://arxiv.org/abs/2404.13316v1 ) ライセンス: Link先を確認	Namkyeong Cho, Yeoneung Kim,	(参考訳) モデルなし強化学習におけるハミルトン-ヤコビ-ベルマン方程式(HJB)の重要な安定性特性、特にリプシッツ連続最適制御問題に対処する。リプシッツ連続最適制御問題と古典最適制御問題とのギャップを粘度解フレームワークで埋め、リプシッツ連続最適制御問題の値関数の安定性に関する新たな洞察を提供する。力学と報酬関数の構造的仮定を導入することにより、値関数の収束率をさらに研究する。さらに、リプシッツ連続制御問題に対する一般化されたフレームワークを導入し、元の問題を取り入れ、それを活用して、新しいHJBに基づく強化学習アルゴリズムを提案する。提案手法の安定性特性と性能を,既存手法と比較してよく知られたベンチマーク例で検証した。 We address the crucial yet underexplored stability properties of the Hamilton--Jacobi--Bellman (HJB) equation in model-free reinforcement learning contexts, specifically for Lipschitz continuous optimal control problems. We bridge the gap between Lipschitz continuous optimal control problems and classical optimal control problems in the viscosity solutions framework, offering new insights into the stability of the value function of Lipschitz continuous optimal control problems. By introducing structural assumptions on the dynamics and reward functions, we further study the rate of convergence of value functions. Moreover, we introduce a generalized framework for Lipschitz continuous control problems that incorporates the original problem and leverage it to propose a new HJB-based reinforcement learning algorithm. The stability properties and performance of the proposed method are tested with well-known benchmark examples in comparison with existing approaches.	翻訳日:2024-04-23 19:29:41 公開日:2024-04-20
# 一般量子演算の曖昧な識別 Unambiguous discrimination of general quantum operations ( http://arxiv.org/abs/2404.13317v1 ) ライセンス: Link先を確認	Weizhou Cai, Jing-Ning Zhang, Ziyue Hua, Weiting Wang, Xiaoxuan Pan, Xinyu Liu, Yuwei Ma, Ling Hu, Xianghao Mu, Haiyan Wang, Yipu Song, Chang-Ling Zou, Luyan Sun,	(参考訳) 量子操作の識別は、長い間興味深い課題であり、量子オブジェクトを識別する際の量子的特徴の理解を大幅に前進させた理論的研究である。この課題は量子状態の識別と密接に関連しており、後者の証明・証明は光子を用いて既に実現されている。しかし、ユニタリ演算と非ユニタリ演算の両方を含む一般量子演算を識別する実験的な実証は、いまだに解明されていない。一般の量子系、特に高次元の量子系では、任意の量子状態の準備と任意の量子演算の実装と一般化された測定は非自明なタスクである。ここでは,最大6個の変位演算子の最適不明確な識別と,非単位量子演算の曖昧な識別を実験的に実証する。本研究は,量子情報処理の実験的な研究のための強力なツールを示し,量子センシング分野における幅広い価値ある応用を刺激することが期待されている。 The discrimination of quantum operations has long been an intriguing challenge, with theoretical research significantly advancing our understanding of the quantum features in discriminating quantum objects. This challenge is closely related to the discrimination of quantum states, and proof-of-principle demonstrations of the latter have already been realized using optical photons. However, the experimental demonstration of discriminating general quantum operations, including both unitary and non-unitary operations, has remained elusive. In general quantum systems, especially those with high dimensions, the preparation of arbitrary quantum states and the implementation of arbitrary quantum operations and generalized measurements are non-trivial tasks. Here, for the first time, we experimentally demonstrate the optimal unambiguous discrimination of up to 6 displacement operators and the unambiguous discrimination of non-unitary quantum operations. Our results demonstrate powerful tools for experimental research in quantum information processing and are expected to stimulate a wide range of valuable applications in the field of quantum sensing.	翻訳日:2024-04-23 19:29:41 公開日:2024-04-20
# EHRFL:不均一なEHRのためのフェデレーションラーニングフレームワークと参加顧客の選択 EHRFL: Federated Learning Framework for Heterogeneous EHRs and Precision-guided Selection of Participating Clients ( http://arxiv.org/abs/2404.13318v1 ) ライセンス: Link先を確認	Jiyoun Kim, Junu Kim, Kyunghoon Hur, Edward Choi,	(参考訳) 本研究では、電子健康記録のためのフェデレーション学習(EHR)において、現実的に見落とされがちな2つのシナリオに対する解決策を提供する。まず、EHRのテキストベース線形化を用いた医療機関間のフェデレーション学習を支援するフレームワークであるEHRFLを紹介する。第2に、単一医療機関がフェデレーションラーニングを開始し、ホストが生み出す費用を削減するために、顧客数を最適化しなければならないモデルを構築するシナリオに焦点を当てる。参加する顧客を選別するために,データ潜入者を利用して施設に適した参加者を特定する,新しい精度ベースの手法を提案する。実験の結果, EHRFL は, 異なる EHR システムを持つ病院におけるフェデレーション学習を効果的に行うことができることがわかった。さらに, モデル性能を損なうことなく参加客数を削減し, 機関別モデル構築における運用コストの低減を図るため, 精度に基づく手法の有効性を実証した。我々は、この研究が、連合学習の EHR への広範な採用の基盤となると信じている。 In this study, we provide solutions to two practical yet overlooked scenarios in federated learning for electronic health records (EHRs): firstly, we introduce EHRFL, a framework that facilitates federated learning across healthcare institutions with distinct medical coding systems and database schemas using text-based linearization of EHRs. Secondly, we focus on a scenario where a single healthcare institution initiates federated learning to build a model tailored for itself, in which the number of clients must be optimized in order to reduce expenses incurred by the host. For selecting participating clients, we present a novel precision-based method, leveraging data latents to identify suitable participants for the institution. Our empirical results show that EHRFL effectively enables federated learning across hospitals with different EHR systems. Furthermore, our results demonstrate the efficacy of our precision-based method in selecting reduced number of participating clients without compromising model performance, resulting in lower operational costs when constructing institution-specific models. We believe this work lays a foundation for the broader adoption of federated learning on EHRs.	翻訳日:2024-04-23 19:29:41 公開日:2024-04-20
# Pixelは「バリアー」:拡散モデルは想像以上に逆向きにロバスト Pixel is a Barrier: Diffusion Models Are More Adversarially Robust Than We Think ( http://arxiv.org/abs/2404.13320v1 ) ライセンス: Link先を確認	Haotian Xue, Yongxin Chen,	(参考訳) 拡散モデルの逆例は、安全上の問題に対する解決策として広く使われている。個人画像に敵対的摂動を加えることで、攻撃者は容易にそれらを編集したり模倣したりすることはできない。しかしながら、これらすべての保護が潜在拡散モデル(LDM)をターゲットにしていることに注意する必要がある。このことは、拡散モデルがほとんどの深層モデルのような敵攻撃に対して脆弱であると考えることを誤解させるかもしれない。本稿では, 勾配をベースとしたホワイトボックス攻撃がLDM攻撃に有効であっても, PDM攻撃に失敗する,という新たな知見を示す。この発見は、異なるモデル構造を持つ様々なPDMおよびLCDに対する、ほぼ幅広い攻撃手法の広範な実験によって裏付けられている。また, PDMは, 画像を保護するために, LDMで生成した対向パターンを効果的に除去するために, オフ・ザ・シェルフ・パーファイラとして使用することができる。我々は、我々の洞察が、拡散モデルに対する敵のサンプルを保護方法として再考し、より効果的な保護に向けて前進させることを期待している。コードはhttps://github.com/xavihart/PDM-Pure.comで入手できる。 Adversarial examples for diffusion models are widely used as solutions for safety concerns. By adding adversarial perturbations to personal images, attackers can not edit or imitate them easily. However, it is essential to note that all these protections target the latent diffusion model (LDMs), the adversarial examples for diffusion models in the pixel space (PDMs) are largely overlooked. This may mislead us to think that the diffusion models are vulnerable to adversarial attacks like most deep models. In this paper, we show novel findings that: even though gradient-based white-box attacks can be used to attack the LDMs, they fail to attack PDMs. This finding is supported by extensive experiments of almost a wide range of attacking methods on various PDMs and LDMs with different model structures, which means diffusion models are indeed much more robust against adversarial attacks. We also find that PDMs can be used as an off-the-shelf purifier to effectively remove the adversarial patterns that were generated on LDMs to protect the images, which means that most protection methods nowadays, to some extent, cannot protect our images from malicious attacks. We hope that our insights will inspire the community to rethink the adversarial samples for diffusion models as protection methods and move forward to more effective protection. Codes are available in https://github.com/xavihart/PDM-Pure.	翻訳日:2024-04-23 19:29:41 公開日:2024-04-20
# MergeNet: 異種モデル、タスク、モダリティ間の知識マイグレーション MergeNet: Knowledge Migration across Heterogeneous Models, Tasks, and Modalities ( http://arxiv.org/abs/2404.13322v1 ) ライセンス: Link先を確認	Kunxi Li, Tianyu Zhan, Shengyu Zhang, Kun Kuang, Jiwei Li, Zhou Zhao, Fei Wu,	(参考訳) 本研究では, 全く異なるモデルアーキテクチャ, タスク, モダリティ間の異質な知識伝達に着目した。既存の知識伝達方法(例えば、バックボーン共有、知識蒸留)は、しばしばモデル構造やタスク固有の機能/ラベル内の共有要素にヒンジし、複雑なモデルタイプやタスクへの転送を制限する。これらの課題を克服するために、異種モデルのパラメータ空間のギャップを埋めることを学び、これらのパラメータ空間内での直接的な相互作用、抽出、知識の応用を容易にするMergeNetを提案する。 MergeNetの中核となるメカニズムはパラメータアダプタにあり、ソースモデルの低ランクパラメータをクエリして、ターゲットモデルへのパラメータの識別とマッピングを順応的に学習する。 MergeNetは両方のモデルと共に学習され、我々のフレームワークは、ソースモデルのトレーニング軌道知識を含む、現在のステージに関連する知識を動的に転送し、適応することができます。不均一な知識伝達に関する大規模な実験は、代表的アプローチが干渉したり適用範囲を減らしたりすることの可能な、挑戦的な設定において顕著な改善を示す。 In this study, we focus on heterogeneous knowledge transfer across entirely different model architectures, tasks, and modalities. Existing knowledge transfer methods (e.g., backbone sharing, knowledge distillation) often hinge on shared elements within model structures or task-specific features/labels, limiting transfers to complex model types or tasks. To overcome these challenges, we present MergeNet, which learns to bridge the gap of parameter spaces of heterogeneous models, facilitating the direct interaction, extraction, and application of knowledge within these parameter spaces. The core mechanism of MergeNet lies in the parameter adapter, which operates by querying the source model's low-rank parameters and adeptly learning to identify and map parameters into the target model. MergeNet is learned alongside both models, allowing our framework to dynamically transfer and adapt knowledge relevant to the current stage, including the training trajectory knowledge of the source model. Extensive experiments on heterogeneous knowledge transfer demonstrate significant improvements in challenging settings, where representative approaches may falter or prove less applicable.	翻訳日:2024-04-23 19:29:41 公開日:2024-04-20
# フェデレーションラーニングによる協調的視覚的位置認識 Collaborative Visual Place Recognition through Federated Learning ( http://arxiv.org/abs/2404.13324v1 ) ライセンス: Link先を確認	Mattia Dutto, Gabriele Berton, Debora Caldarola, Eros Fanì, Gabriele Trivigno, Carlo Masone,	(参考訳) 視覚的位置認識(VPR)は、画像の位置を検索問題として扱うことで、画像の位置を推定することを目的としている。 VPRはジオタグ付き画像のデータベースを使用し、ディープニューラルネットワークを活用して、各画像からデクリプタと呼ばれるグローバル表現を抽出する。 VPRモデルのトレーニングデータは、地理的に散在する多様なソース(ゲオタグ付き画像)に由来することが多いが、トレーニングプロセス自体は典型的には中央集権的であると仮定される。本研究は,フェデレートラーニング(FL)のレンズを通してVPRの課題を再考し,この適応に関連するいくつかの重要な課題に対処する。 VPRデータは本質的に適切に定義されたクラスを欠いているため、モデルは通常、データマイニングのステップを必要とするコントラスト学習を使用してトレーニングされる。さらに、フェデレートされたシステムのクライアントデバイスは、その処理能力に関して非常に異質である。提案するFedVPRフレームワークは、VPRの新しいアプローチを提供するだけでなく、FL研究のための新しい、挑戦的で現実的なタスクを導入し、FL内の他の画像検索タスクへの道を開いた。 Visual Place Recognition (VPR) aims to estimate the location of an image by treating it as a retrieval problem. VPR uses a database of geo-tagged images and leverages deep neural networks to extract a global representation, called descriptor, from each image. While the training data for VPR models often originates from diverse, geographically scattered sources (geo-tagged images), the training process itself is typically assumed to be centralized. This research revisits the task of VPR through the lens of Federated Learning (FL), addressing several key challenges associated with this adaptation. VPR data inherently lacks well-defined classes, and models are typically trained using contrastive learning, which necessitates a data mining step on a centralized database. Additionally, client devices in federated systems can be highly heterogeneous in terms of their processing capabilities. The proposed FedVPR framework not only presents a novel approach for VPR but also introduces a new, challenging, and realistic task for FL research, paving the way to other image retrieval tasks in FL.	翻訳日:2024-04-23 19:29:41 公開日:2024-04-20
# 機械学習を用いた融雪駆動流速予測の比較解析 Comparative Analysis on Snowmelt-Driven Streamflow Forecasting Using Machine Learning Techniques ( http://arxiv.org/abs/2404.13327v1 ) ライセンス: Link先を確認	Ukesh Thapa, Bipun Man Pati, Samit Thapa, Dhiraj Pyakurel, Anup Shrestha,	(参考訳) 機械学習技術の急速な進歩は、水資源を含む様々な領域に広く応用されている。しかし, 融雪モデルはまだ広く調査されていない領域である。本研究では,ヒンズー・クシュ・ヒマラヤ地方のヒマラヤ盆地における融雪駆動放電モデルにおいて,時相畳み込みネットワーク(TCN)を利用した最先端の深層学習モデルを提案する。提案モデルの性能を評価するため,SVR(Support Vector Regression),LSTM(Long Short Term Memory),Transformer(Transformer)など,他の一般的なモデルとの比較分析を行った。さらに、5つの外折りと3つの内折りにNested Cross-validation(CV)を使用し、内折りにハイパーパラメータチューニングを行う。モデル平均絶対誤差(MAE)、ルート平均二乗誤差(RMSE)、R平方(R^{2}$)、クリング・グプタ効率(KGE)、ナッシュ・サトクリフ効率(NSE)を各外周毎に算出する。平均値では、TNが他のモデルより優れており、MAEは0.011、RMSEは0.023、R^{2}$は0.991、KGEは0.992、NSEは0.991である。本研究は,融雪駆動流速予測における従来の機械学習手法と比較して,ディープラーニングモデルの有効性を示すものである。さらに、TCNの優れた性能は、同様の水文学応用のための有望なディープラーニングモデルとしての可能性を強調している。 The rapid advancement of machine learning techniques has led to their widespread application in various domains including water resources. However, snowmelt modeling remains an area that has not been extensively explored. In this study, we propose a state-of-the-art (SOTA) deep learning sequential model, leveraging the Temporal Convolutional Network (TCN), for snowmelt-driven discharge modeling in the Himalayan basin of the Hindu Kush Himalayan Region. To evaluate the performance of our proposed model, we conducted a comparative analysis with other popular models including Support Vector Regression (SVR), Long Short Term Memory (LSTM), and Transformer. Furthermore, Nested cross-validation (CV) is used with five outer folds and three inner folds, and hyper-parameter tuning is performed on the inner folds. To evaluate the performance of the model mean absolute error (MAE), root mean square error (RMSE), R square ($R^{2}$), Kling-Gupta Efficiency (KGE), and Nash-Sutcliffe Efficiency (NSE) are computed for each outer fold. The average metrics revealed that TCN outperformed the other models, with an average MAE of 0.011, RMSE of 0.023, $R^{2}$ of 0.991, KGE of 0.992, and NSE of 0.991. The findings of this study demonstrate the effectiveness of the deep learning model as compared to traditional machine learning approaches for snowmelt-driven streamflow forecasting. Moreover, the superior performance of TCN highlights its potential as a promising deep learning model for similar hydrological applications.	翻訳日:2024-04-23 19:29:41 公開日:2024-04-20
# 立体内視鏡画像の超解像・手術機器分割のためのSEGSRNet SEGSRNet for Stereo-Endoscopic Image Super-Resolution and Surgical Instrument Segmentation ( http://arxiv.org/abs/2404.13330v1 ) ライセンス: Link先を確認	Mansoor Hayat, Supavadee Aramvith, Titipat Achakulvisut,	(参考訳) SEGSRNetは、低解像度立体内視鏡画像における手術器具の正確な識別という課題に対処する。我々の革新的なフレームワークは、セグメント化の前に最先端の超解像技術を適用することにより、画像の明瞭度とセグメンテーション精度を向上させる。これにより、より正確なセグメンテーションのための高品質な入力が保証される。 SEGSRNetは、高度な特徴抽出と注意機構と空間処理を組み合わせることで、画像の詳細を鮮明にする。提案モデルはDice,IoU,PSNR,SSIM,SEGSRNetなどの現行モデルより優れている。 SEGSRNetは、画像の解像度と正確なセグメンテーションを提供し、外科的精度と患者のケア結果を大幅に向上させることができる。 SEGSRNet addresses the challenge of precisely identifying surgical instruments in low-resolution stereo endoscopic images, a common issue in medical imaging and robotic surgery. Our innovative framework enhances image clarity and segmentation accuracy by applying state-of-the-art super-resolution techniques before segmentation. This ensures higher-quality inputs for more precise segmentation. SEGSRNet combines advanced feature extraction and attention mechanisms with spatial processing to sharpen image details, which is significant for accurate tool identification in medical images. Our proposed model outperforms current models including Dice, IoU, PSNR, and SSIM, SEGSRNet where it produces clearer and more accurate images for stereo endoscopic surgical imaging. SEGSRNet can provide image resolution and precise segmentation which can significantly enhance surgical accuracy and patient care outcomes.	翻訳日:2024-04-23 19:29:41 公開日:2024-04-20
# Fuzzychain: ブロックチェーンネットワークのための等価なコンセンサスメカニズム Fuzzychain: An Equitable Consensus Mechanism for Blockchain Networks ( http://arxiv.org/abs/2404.13337v1 ) ライセンス: Link先を確認	Bruno Ramos-Cruz, Javier Andreu-Pérez, Francisco J. Quesada, Luis Martínez,	(参考訳) ブロックチェーン技術は、分散暗号化ネットワークを通じてセキュアで透明なトランザクションを確立するための信頼できる方法になっています。ブロックチェーンの運用はコンセンサスアルゴリズムによって管理されており、その中ではProof of Stake(PoS)が一般的だが、その欠点がある。提案手法であるファジィチェーンでは,利害関係のセマンティクス定義にファジィセットを導入し,分散処理制御を推進している。本システムは,利得ファジィ集合の会員度に基づくバリデータを選択する。ブロックチェーンにファジィセットを適用するという先駆的な提案として、FuzzychainはPoSの制限の修正を目指している。以上の結果から,Fuzzychainは機能的にPoSに適合するだけでなく,バリデータ間の利害関係の公平な分配も保証し,より包括的なバリデータ選択と分散ネットワークの実現につながることが示唆された。 Blockchain technology has become a trusted method for establishing secure and transparent transactions through a distributed, encrypted network. The operation of blockchain is governed by consensus algorithms, among which Proof of Stake (PoS) is popular yet has its drawbacks, notably the potential for centralising power in nodes with larger stakes or higher rewards. Fuzzychain, our proposed solution, introduces the use of fuzzy sets to define stake semantics, promoting decentralised and distributed processing control. This system selects validators based on their degree of membership to the stake fuzzy sets rather than just the size of their stakes. As a pioneer proposal in applying fuzzy sets to blockchain, Fuzzychain aims to rectify PoS's limitations. Our results indicate that Fuzzychain not only matches PoS in functionality but also ensures a fairer distribution of stakes among validators, leading to more inclusive validator selection and a better-distributed network.	翻訳日:2024-04-23 19:29:41 公開日:2024-04-20
# テストケースジェネレータとしての大規模言語モデル:性能評価と拡張 Large Language Models as Test Case Generators: Performance Evaluation and Enhancement ( http://arxiv.org/abs/2404.13340v1 ) ライセンス: Link先を確認	Kefan Li, Yuan Yuan,	(参考訳) 大規模言語モデル(LLM)を用いたコード生成は広く研究され、目覚ましい進歩を遂げた。コード生成の補完的な側面として、テストケースの生成は、コードの品質と信頼性を保証する上で非常に重要です。しかし、LLMをテストケースジェネレータとして使うことは、あまり研究されていない。この線に沿った現在の研究は、LLMが生成したテストケースの支援によるコード生成の強化に重点を置いているが、テストケース生成のみでのLCMのパフォーマンスは、包括的に検討されていない。このギャップを埋めるため、我々はLLMがいかに高品質なテストケースを生成できるかを広範囲に実験する。問題の難しさが増大するにつれて、現状のLLMは、計算や推論に固有の制限があるため、正しいテストケースを生成するのに苦労していることがわかった。この問題を緩和するために、テストインプットとテストアウトプットの生成を分離する「emph{TestChain}」と呼ばれるマルチエージェントフレームワークを提案する。特にTestChainは、より正確なテスト出力を提供するために、LLMのReActフォーマットの会話チェーンを使用してPythonインタプリタと対話する。以上の結果から,TestChainはベースラインのマージンを大きく上回っていることが示唆された。特に、テストケースの精度の面では、GPT-4をバックボーンとして使用するTestChainは、LeetCode-hardデータセットのベースラインよりも13.84.%改善されている。 Code generation with Large Language Models (LLMs) has been extensively studied and achieved remarkable progress. As a complementary aspect to code generation, test case generation is of crucial importance in ensuring the quality and reliability of code. However, using LLMs as test case generators has been much less explored. Current research along this line primarily focuses on enhancing code generation with assistance from test cases generated by LLMs, while the performance of LLMs in test case generation alone has not been comprehensively examined. To bridge this gap, we conduct extensive experiments to study how well LLMs can generate high-quality test cases. We find that as the problem difficulty increases, state-of-the-art LLMs struggle to generate correct test cases, largely due to their inherent limitations in computation and reasoning. To mitigate this issue, we further propose a multi-agent framework called \emph{TestChain} that decouples the generation of test inputs and test outputs. Notably, TestChain uses a ReAct format conversation chain for LLMs to interact with a Python interpreter in order to provide more accurate test outputs. Our results indicate that TestChain outperforms the baseline by a large margin. Particularly, in terms of the accuracy of test cases, TestChain using GPT-4 as the backbone achieves a 13.84\% improvement over the baseline on the LeetCode-hard dataset.	翻訳日:2024-04-23 19:29:41 公開日:2024-04-20
# 自己監督型異常による高スペクトル異常検出 Hyperspectral Anomaly Detection with Self-Supervised Anomaly Prior ( http://arxiv.org/abs/2404.13342v1 ) ライセンス: Link先を確認	Yidan Liu, Weiying Xie, Kai Jiang, Jiaqing Zhang, Yunsong Li, Leyuan Fang,	(参考訳) 既存のハイパースペクトル異常検出(HAD)法の大半は、背景と異常成分を分離するために低ランク表現(LRR)モデルを使用しており、そこでは異常成分は手作りのスパース事前(例えば$\ell_{2,1}$-norm)で最適化されている。しかし、これは異常に存在する空間構造を見落とし、検出結果を手動で設定した間隔に大きく依存させるため、理想的ではないかもしれない。これらの問題に対処するために、自己教師付き異常前処理(SAP)と呼ばれる自己教師付きネットワークを用いて、LRRモデルにおける異常成分の最適化基準を再定義する。この前者は,超スペクトル異常の特徴を学習するためにカスタマイズされた自己教師型学習のテキストタスクによって得られる。具体的には、このプリテキストタスクは、元のハイパースペクトル画像(HSI)と擬似アノマリーHSIを区別する分類タスクであり、擬似アノマリーは元のHSIから生成され、任意のポリゴンベースと任意のスペクトルバンドを持つプリズムとして設計される。さらに、複雑な背景からの異常の分離を容易にするため、より洗練された背景表現をリッチな背景辞書で提供するための二重精製戦略を提案する。様々な超スペクトルデータセットに対する大規模な実験により、提案されたSAPは、他の先進的HAD法よりも正確で解釈可能な解を提供することを示した。 The majority of existing hyperspectral anomaly detection (HAD) methods use the low-rank representation (LRR) model to separate the background and anomaly components, where the anomaly component is optimized by handcrafted sparse priors (e.g., $\ell_{2,1}$-norm). However, this may not be ideal since they overlook the spatial structure present in anomalies and make the detection result largely dependent on manually set sparsity. To tackle these problems, we redefine the optimization criterion for the anomaly component in the LRR model with a self-supervised network called self-supervised anomaly prior (SAP). This prior is obtained by the pretext task of self-supervised learning, which is customized to learn the characteristics of hyperspectral anomalies. Specifically, this pretext task is a classification task to distinguish the original hyperspectral image (HSI) and the pseudo-anomaly HSI, where the pseudo-anomaly is generated from the original HSI and designed as a prism with arbitrary polygon bases and arbitrary spectral bands. In addition, a dual-purified strategy is proposed to provide a more refined background representation with an enriched background dictionary, facilitating the separation of anomalies from complex backgrounds. Extensive experiments on various hyperspectral datasets demonstrate that the proposed SAP offers a more accurate and interpretable solution than other advanced HAD methods.	翻訳日:2024-04-23 19:29:41 公開日:2024-04-20
# UnibucLLM:複数項目質問に対する項目難易度と応答時間の自動予測用LLMのハーネス化 UnibucLLM: Harnessing LLMs for Automated Prediction of Item Difficulty and Response Time for Multiple-Choice Questions ( http://arxiv.org/abs/2404.13343v1 ) ライセンス: Link先を確認	Ana-Cristina Rogoz, Radu Tudor Ionescu,	(参考訳) 本研究は,BEA 2024共有タスクにおけるUSMLE多項目質問(MCQ)の項目難易度と応答時間を予測するために,LLM(Large Language Models)に基づく新しいデータ拡張手法を提案する。我々のアプローチは、ゼロショットLLM(ファルコン、メディトロン、ミストラル)からの回答でデータセットを増強し、6つの代替機能の組み合わせに基づいたトランスフォーマーモデルを採用することに基づいている。その結果,質問の難易度を予測することはより困難であることが示唆された。特に,本手法は質問文を一貫して含み,LSM回答の多様性の恩恵を享受し,LSMの医療用ライセンス試験における自動評価改善の可能性を強調した。私たちはコードをhttps://github.com/ana-rogoz/BEA-2024.comで公開しています。 This work explores a novel data augmentation method based on Large Language Models (LLMs) for predicting item difficulty and response time of retired USMLE Multiple-Choice Questions (MCQs) in the BEA 2024 Shared Task. Our approach is based on augmenting the dataset with answers from zero-shot LLMs (Falcon, Meditron, Mistral) and employing transformer-based models based on six alternative feature combinations. The results suggest that predicting the difficulty of questions is more challenging. Notably, our top performing methods consistently include the question text, and benefit from the variability of LLM answers, highlighting the potential of LLMs for improving automated assessment in medical licensing exams. We make our code available https://github.com/ana-rogoz/BEA-2024.	翻訳日:2024-04-23 19:29:41 公開日:2024-04-20
# GRANOLA: グラフニューラルネットワークの適応正規化 GRANOLA: Adaptive Normalization for Graph Neural Networks ( http://arxiv.org/abs/2404.13344v1 ) ライセンス: Link先を確認	Moshe Eliasof, Beatrice Bevilacqua, Carola-Bibiane Schönlieb, Haggai Maron,	(参考訳) 近年、グラフニューラルネットワーク(GNN)層の設計を洗練し、表現力の制限や過度なスムーシングといった多様な課題を克服する努力が続けられている。広く採用されているにもかかわらず、GNNアーキテクチャ内のBatchNormやInstanceNormのような既製の正規化レイヤが組み込まれても、グラフ構造化データのユニークな特性を効果的に捉えることはできないため、全体的なアーキテクチャの表現力は低下する可能性がある。さらに、既存のグラフ固有の正規化レイヤは、実質的で一貫したメリットを提供するのに苦労することが多い。本稿では,新しいグラフ適応正規化層であるGRANOLAを提案する。既存の正規化層とは異なり、GRANOLAはグラフの特定の特性、特にグラフ内のランダムノード特徴(RNF)の伝播を利用して得られるその近傍構造の表現表現を生成することにより、ノード特徴を正規化している。設計選択を支援する理論的結果を示す。各種グラフベンチマークの広範な評価は,既存の正規化手法よりもGRANOLAの優れた性能を示している。さらに、GRANOLAは、メッセージパッシングニューラルネットワーク(MPNN)の複雑さと同時期に、すべてのベースラインの中で最高のパフォーマンスの方法として出現する。 In recent years, significant efforts have been made to refine the design of Graph Neural Network (GNN) layers, aiming to overcome diverse challenges, such as limited expressive power and oversmoothing. Despite their widespread adoption, the incorporation of off-the-shelf normalization layers like BatchNorm or InstanceNorm within a GNN architecture may not effectively capture the unique characteristics of graph-structured data, potentially reducing the expressive power of the overall architecture. Moreover, existing graph-specific normalization layers often struggle to offer substantial and consistent benefits. In this paper, we propose GRANOLA, a novel graph-adaptive normalization layer. Unlike existing normalization layers, GRANOLA normalizes node features by adapting to the specific characteristics of the graph, particularly by generating expressive representations of its neighborhood structure, obtained by leveraging the propagation of Random Node Features (RNF) in the graph. We present theoretical results that support our design choices. Our extensive empirical evaluation of various graph benchmarks underscores the superior performance of GRANOLA over existing normalization techniques. Furthermore, GRANOLA emerges as the top-performing method among all baselines within the same time complexity of Message Passing Neural Networks (MPNNs).	翻訳日:2024-04-23 19:19:57 公開日:2024-04-20
# 専門家軌道との類似性を維持しつつ、安全批判運転シナリオを増強する Augmenting Safety-Critical Driving Scenarios while Preserving Similarity to Expert Trajectories ( http://arxiv.org/abs/2404.13347v1 ) ライセンス: Link先を確認	Hamidreza Mirkhani, Behzad Khamidehi, Kasra Rezaee,	(参考訳) 軌道拡大は、模倣学習における分布シフトを緩和する手段として機能する。しかしながら、元々のエキスパートデータを不十分に表現した軌道を模倣すると、特に安全クリティカルなシナリオにおいて、望ましくない振る舞いが生じる可能性がある。本稿では,専門家の軌跡データとの類似性を維持するために,軌道拡張手法を提案する。これを実現するために、我々はまず、少数だが安全クリティカルなグループを識別する軌道をクラスタ化する。そして、幾何学的変換によって同一クラスタ内の軌道を結合し、新しい軌道を生成する。これらのトラジェクトリはトレーニングデータセットに追加され、指定された安全関連基準を満たすようにします。実験の結果,これらの拡張軌道を用いた模擬学習モデルの訓練は閉ループ性能を著しく向上させることができることがわかった。 Trajectory augmentation serves as a means to mitigate distributional shift in imitation learning. However, imitating trajectories that inadequately represent the original expert data can result in undesirable behaviors, particularly in safety-critical scenarios. We propose a trajectory augmentation method designed to maintain similarity with expert trajectory data. To accomplish this, we first cluster trajectories to identify minority yet safety-critical groups. Then, we combine the trajectories within the same cluster through geometrical transformation to create new trajectories. These trajectories are then added to the training dataset, provided that they meet our specified safety-related criteria. Our experiments exhibit that training an imitation learning model using these augmented trajectories can significantly improve closed-loop performance.	翻訳日:2024-04-23 19:19:56 公開日:2024-04-20
# ソーシャル学習:ネットワークシステムにおけるエッジインテリジェンスのためのパラダイムシフトに関する調査 Socialized Learning: A Survey of the Paradigm Shift for Edge Intelligence in Networked Systems ( http://arxiv.org/abs/2404.13348v1 ) ライセンス: Link先を確認	Xiaofei Wang, Yunfeng Zhao, Chao Qiu, Qinghua Hu, Victor C. M. Leung,	(参考訳) 人工知能(AI)とビッグデータによる堅牢な衝動の中で、エッジインテリジェンス(EI)は、エッジコンピューティング(EC)でAIを合成し、AIサービスの潜在能力を最大限に活用するための模範的なソリューションとなっている。それでも、通信コスト、リソース割り当て、プライバシ、セキュリティの課題は、さまざまな要件を持つサービスをサポートする能力の制限を続けています。これらの課題に応えて,社会学習(SL)を将来性のあるソリューションとして導入し,EIの進展をさらに促進する。 SLは、EIシステム内のエージェントの協調能力と集団知性を増幅することを目的とした、社会的原則と行動に基づく学習パラダイムである。 SLはシステムの適応性を向上するだけでなく、様々なデバイスやプラットフォームにまたがる分散インテリジェンスに不可欠な通信やネットワークプロセスも最適化する。したがって、SLとEIの組み合わせにより、将来のネットワークにおける協調的なインテリジェンスの開発が大幅に促進される可能性がある。本稿では,EI と SL の統合に関する文献レビューの結果を概説し,既存の EI と SL に関する研究成果を要約する。その後、EIの限界とSLの利点を包括的に掘り下げる。これらのシステムにおける通信課題やネットワーク戦略、その他の側面に特に重点を置いており、システムの効率性を改善する上で、最適化されたネットワークソリューションの役割を概説している。これらの議論に基づき, 社会的アーキテクチャ, 社会的訓練, 社会的推論という3つの統合された構成要素について詳細に検討し, その強みと弱さを分析した。最後に,SLとEIを組み合わせた将来的な応用の可能性を特定し,オープンな問題について議論し,今後の研究を提案する。 Amidst the robust impetus from artificial intelligence (AI) and big data, edge intelligence (EI) has emerged as a nascent computing paradigm, synthesizing AI with edge computing (EC) to become an exemplary solution for unleashing the full potential of AI services. Nonetheless, challenges in communication costs, resource allocation, privacy, and security continue to constrain its proficiency in supporting services with diverse requirements. In response to these issues, this paper introduces socialized learning (SL) as a promising solution, further propelling the advancement of EI. SL is a learning paradigm predicated on social principles and behaviors, aimed at amplifying the collaborative capacity and collective intelligence of agents within the EI system. SL not only enhances the system's adaptability but also optimizes communication, and networking processes, essential for distributed intelligence across diverse devices and platforms. Therefore, a combination of SL and EI may greatly facilitate the development of collaborative intelligence in the future network. This paper presents the findings of a literature review on the integration of EI and SL, summarizing the latest achievements in existing research on EI and SL. Subsequently, we delve comprehensively into the limitations of EI and how it could benefit from SL. Special emphasis is placed on the communication challenges and networking strategies and other aspects within these systems, underlining the role of optimized network solutions in improving system efficacy. Based on these discussions, we elaborate in detail on three integrated components: socialized architecture, socialized training, and socialized inference, analyzing their strengths and weaknesses. Finally, we identify some possible future applications of combining SL and EI, discuss open problems and suggest some future research.	翻訳日:2024-04-23 19:19:56 公開日:2024-04-20
# プログレッシブトレーニングによる不均一なフェデレーション学習のための記憶壁の破壊 Breaking the Memory Wall for Heterogeneous Federated Learning with Progressive Training ( http://arxiv.org/abs/2404.13349v1 ) ライセンス: Link先を確認	Yebo Wu, Li Li, Chunlin Tian, Chengzhong Xu,	(参考訳) 本稿では,記憶壁を効果的に破壊する新しいプログレッシブFLフレームワークであるProFLを提案する。具体的には、ProFLはモデルを元のアーキテクチャに基づいて異なるブロックに分割する。各トレーニングラウンドでモデル全体を更新する代わりに、ProFLはまずフロントブロックをトレーニングし、収束後に安全に凍結する。次に次のブロックのトレーニングがトリガーされる。このプロセスは、モデル全体のトレーニングが完了するまで繰り返します。このようにして、異種デバイスへのデプロイが可能なメモリフットプリントを効果的に削減する。各ブロックの特徴的表現を維持するため、トレーニングプロセス全体を2段階に分けて、プログレッシブモデル縮小とプログレッシブモデル成長の2段階に分割する。プログレッシブモデル縮小段階において,各ブロックが期待する特徴表現を学習し,初期化パラメータを得るのを支援するために,対応する出力モジュールを慎重に設計する。そして、得られた出力モジュールを対応するプログレッシブモデル成長段階に利用する。さらに,各ブロックの学習速度を制御するために,スカラー視点による新しいメトリクスを提案し,各ブロックの学習状況を評価し,次のブロックの学習をいつトリガーするかを決定する。最後に, ProFLの収束性を理論的に証明し, ProFLの有効性を評価するために, 代表モデルおよびデータセットに関する広範な実験を行う。その結果、ProFLはピークメモリのフットプリントを57.4%まで効果的に削減し、モデル精度を82.4%まで改善した。 This paper presents ProFL, a novel progressive FL framework to effectively break the memory wall. Specifically, ProFL divides the model into different blocks based on its original architecture. Instead of updating the full model in each training round, ProFL first trains the front blocks and safely freezes them after convergence. Training of the next block is then triggered. This process iterates until the training of the whole model is completed. In this way, the memory footprint is effectively reduced for feasible deployment on heterogeneous devices. In order to preserve the feature representation of each block, we decouple the whole training process into two stages: progressive model shrinking and progressive model growing. During the progressive model shrinking stage, we meticulously design corresponding output modules to assist each block in learning the expected feature representation and obtain the initialization parameters. Then, the obtained output modules are utilized in the corresponding progressive model growing stage. Additionally, to control the training pace for each block, a novel metric from the scalar perspective is proposed to assess the learning status of each block and determines when to trigger the training of the next one. Finally, we theoretically prove the convergence of ProFL and conduct extensive experiments on representative models and datasets to evaluate the effectiveness of ProFL. The results demonstrate that ProFL effectively reduces the peak memory footprint by up to 57.4% and improves model accuracy by up to 82.4%.	翻訳日:2024-04-23 19:19:56 公開日:2024-04-20
# Swa Bhasha:Sinhala翻訳へのメッセージベースシングリッシュ Swa Bhasha: Message-Based Singlish to Sinhala Transliteration ( http://arxiv.org/abs/2404.13350v1 ) ライセンス: Link先を確認	Maneesha U. Athukorala, Deshan K. Sumanathilaka,	(参考訳) Machine Transliterationは、基本的な言語を計算方法で異なる言語に翻訳する機能を提供する。翻訳は近年注目されている重要な技術プロセスである。シンハラ語は、シンハラ語の資源が不足しているため、多くの制約がある。これらの制限のため、シンハラ文字の翻訳は非常に複雑で時間を要する。したがって、スリランカの大多数は「シングリッシュ」という名前の非形式的なテキスト言語を使用して、そのプロセスをシンプルにしている。本研究は,翻訳の複雑さを減らし,Singlish言語の単語レベルでの文字化に着目した。母音を使わずに一致するシンハラ語をマッピングできるルールベース手法を考案した。様々なタイプパターンが様々なコミュニティによって収集された。収集したデータはすべてのシンハラ文字で分析され、関連するシングリッシュパターンが生成される。このシステムは、認識されたタイピングパターンとマッチングすることで、シングリッシュ文字と併用する新しい数値符号化システムを導入した。マッピングプロセスにはファジィロジックベースの実装が使用されている。独自の数値を含む符号化辞書も実装されている。このシステムでは、各ローマ字化英語の文字には、各単語に固有のパターンを構築することのできるユニークな数値コードが割り当てられた。このシステムは、シングリッシュ語のパターンと一致する最も関連性の高いシンハラ語を識別するか、最も関連性の高い単語提案を与える。例えば、キヤナ(kiyanna)、キヤナ(kiynna)、キヤナ(kiynna)、キヤナ(kiynna)、キヤナ(kiynna)、キヤナ(kiynna)、キヤナ(kiyanna)、キヤナ(kiynna)、キヤナ(kiyanna)、キヤナ(kiyanna)、キヤナ(kiynna)などである。これらの結果から,「スワ・バシャ」の音訳システムは,シンリッシュ語からシンハラ語へのテキスト化を行ないながら,シンハラ人の体験を高める能力を有することが明らかとなった。 Machine Transliteration provides the ability to transliterate a basic language into different languages in a computational way. Transliteration is an important technical process that has caught the attention most recently. The Sinhala transliteration has many constraints because of the insufficiency of resources in the Sinhala language. Due to these limitations, Sinhala Transliteration is highly complex and time-consuming. Therefore, the majority of the Sri Lankans uses non-formal texting language named 'Singlish' to make that process simple. This study has focused on the transliteration of the Singlish language at the word level by reducing the complication in the transliteration. A new approach of coding system has invented with the rule-based approach that can map the matching Sinhala words even without the vowels. Various typing patterns were collected by different communities for this. The collected data have analyzed with every Sinhala character and unique Singlish patterns related to them were generated. The system has introduced a newly initiated numeric coding system to use with the Singlish letters by matching with the recognized typing patterns. For the mapping process, fuzzy logic-based implementation has used. A codified dictionary has also implemented including unique numeric values. In this system, Each Romanized English letter was assigned with a unique numeric code that can construct a unique pattern for each word. The system can identify the most relevant Sinhala word that matches with the pattern of the Singlish word or it gives the most related word suggestions. For example, the word 'kiyanna,kianna, kynna, kynn, kiynna' have mapped with the accurate Sinhala word "kiyanna". These results revealed that the 'Swa Bhasha' transliteration system has the ability to enhance the Sinhala users' experience while conducting the texting in Singlish to Sinhala.	翻訳日:2024-04-23 19:19:56 公開日:2024-04-20
# 拡散モデルによる日光駆動型建築設計の生成 Generating Daylight-driven Architectural Design via Diffusion Models ( http://arxiv.org/abs/2404.13353v1 ) ライセンス: Link先を確認	Pengzhi Li, Baijuan Li,	(参考訳) 近年,大規模モデルの急速な発展は,建築などの学際分野に新たな可能性をもたらしている。本稿では,新しい日光駆動型AI支援アーキテクチャ設計手法を提案する。まず, ランダムパラメータを用いたマッサージモデル生成手法を定式化し, 素早いマッサージモデルを生成する。その後、日光駆動型ファサード設計戦略を統合し、窓レイアウトを正確に決定し、マッサージモデルに適用する。最後に,大規模言語モデルとテキスト・ツー・イメージ・モデルとをシームレスに組み合わせ,ビジュアル・アーキテクチャ・デザイン・レンダリングの効率を向上する。実験の結果,提案手法は建築家の創造的なインスピレーションと,建築設計開発のための新しい道の先駆者を支援することが示された。プロジェクトページ: https://zrealli.github.io/DDADesign/。 In recent years, the rapid development of large-scale models has made new possibilities for interdisciplinary fields such as architecture. In this paper, we present a novel daylight-driven AI-aided architectural design method. Firstly, we formulate a method for generating massing models, producing architectural massing models using random parameters quickly. Subsequently, we integrate a daylight-driven facade design strategy, accurately determining window layouts and applying them to the massing models. Finally, we seamlessly combine a large-scale language model with a text-to-image model, enhancing the efficiency of generating visual architectural design renderings. Experimental results demonstrate that our approach supports architects' creative inspirations and pioneers novel avenues for architectural design development. Project page: https://zrealli.github.io/DDADesign/.	翻訳日:2024-04-23 19:19:56 公開日:2024-04-20
# ライドバーグ原子分光における高次カシミール・ポルダー相互作用の影響 Effects of higher-order Casimir-Polder interactions on Rydberg atom spectroscopy ( http://arxiv.org/abs/2404.13354v1 ) ライセンス: Link先を確認	Biplab Dutta, Joao Carlos de Aquino Carvalho, Guadalupe Garcia-Arellano, Paolo Pedri, Athanasios Laliotis, Chris Boldt, Jivesh Kaushal, Stefan Scheel,	(参考訳) 極近場において、原子波関数の空間的拡張が原子-表面距離よりも無視できない場合、双極子近似はカシミール・ポルダー相互作用を記述するのに十分ではない。ここでは、誘電体表面に近いライドバーグ原子のカシミール・ポルダーエネルギーシフトへの高次、四重極、オクトゥポールの寄与を計算する。その後、薄膜および選択的反射分光法におけるこれらの高次項の影響について検討した。基本的関心の他に、非常に小さな原子表面分離の新たな体制は、リドベルクやフォトニックプラットフォームと対面する表面結合原子による量子技術応用に関係している。 In the extreme near-field, when the spatial extension of the atomic wavefunction is no longer negligible compared to the atom-surface distance, the dipole approximation is no longer sufficient to describe Casimir-Polder interactions. Here we calculate the higher-order, quadrupole and octupole, contributions to Casimir-Polder energy shifts of Rydberg atoms close to a dielectric surface. We subsequently investigate the effects of these higher-order terms in thin-cell and selective reflection spectroscopy. Beyond its fundamental interest, this new regime of extremely small atom surface separations is relevant for quantum technology applications with Rydberg or surface-bound atoms interfacing with photonic platforms.	翻訳日:2024-04-23 19:19:56 公開日:2024-04-20
# 音楽の一貫性モデル Music Consistency Models ( http://arxiv.org/abs/2404.13358v1 ) ライセンス: Link先を確認	Zhengcong Fei, Mingyuan Fan, Junshi Huang,	(参考訳) 一貫性モデルは、画像/ビデオの効率的な生成を容易にし、最小限のサンプリングステップで合成することができる。拡散モデルに関連する計算負担を軽減するのに有利であることが証明されている。それでも、音楽生成における一貫性モデルの適用は、ほとんど未検討のままである。このギャップに対処するために,音楽クリップのメル-スペクトログラムを効率よく合成し,サンプリングステップ数を最小化しながら高品質を維持し,一貫性モデルの概念を活用する音楽一貫性モデル(\texttt{MusicCM})を提案する。既存のテキストから音楽への拡散モデルに基づいて、 texttt{MusicCM} モデルは、一貫性の蒸留と逆微分器の訓練を取り入れている。さらに,複数の拡散過程を共有制約に組み込むことで,拡張コヒーレントな音楽を生成することにも有用である。実験結果から, 計算効率, 忠実度, 自然性の観点から, モデルの有効性が明らかとなった。注目すべきは、‘texttt{MusicCM} は、わずか4ステップのサンプリングステップでシームレスな音楽合成を実現することだ。 Consistency models have exhibited remarkable capabilities in facilitating efficient image/video generation, enabling synthesis with minimal sampling steps. It has proven to be advantageous in mitigating the computational burdens associated with diffusion models. Nevertheless, the application of consistency models in music generation remains largely unexplored. To address this gap, we present Music Consistency Models (\texttt{MusicCM}), which leverages the concept of consistency models to efficiently synthesize mel-spectrogram for music clips, maintaining high quality while minimizing the number of sampling steps. Building upon existing text-to-music diffusion models, the \texttt{MusicCM} model incorporates consistency distillation and adversarial discriminator training. Moreover, we find it beneficial to generate extended coherent music by incorporating multiple diffusion processes with shared constraints. Experimental results reveal the effectiveness of our model in terms of computational efficiency, fidelity, and naturalness. Notable, \texttt{MusicCM} achieves seamless music synthesis with a mere four sampling steps, e.g., only one second per minute of the music clip, showcasing the potential for real-time application.	翻訳日:2024-04-23 19:19:56 公開日:2024-04-20
# 意味的に修正されたアンハイク自動音声認識 Semantically Corrected Amharic Automatic Speech Recognition ( http://arxiv.org/abs/2404.13362v1 ) ライセンス: Link先を確認	Samuael Adnew, Paul Pu Liang,	(参考訳) 自動音声認識(ASR)は、世界中の話し言葉のアクセシビリティを高める上で重要な役割を果たす。本稿では、主に東アフリカで5000万人以上の人々が話しているアムハラ語のためのASRツールセットを構築する。アムハラ語はゲエズ文字(Ge'ez script)で書かれている。これにより、間隔の位置が生成文の意味に大きな影響を与えるため、アムハラ語の計算処理が困難になる。既存のAmharic ASRのベンチマークでは,これらの間隔を考慮せず,個々のグラフの誤り率のみを測定できることがわかった。本稿では,既存のAmharic ASRテストデータセットの書き起こしを初めてリリースし,コミュニティが進捗を正確に評価できるようにする。さらに、トランスフォーマーエンコーダデコーダアーキテクチャを用いて、生のASR出力を文法的に完全かつ意味論的に意味のあるアムハラ語文に整理する後処理手法を提案する。補正されたテストデータセットの実験により、我々のモデルは、アンハラ語音声認識システムの意味的正当性を高め、5.5\%の文字誤り率(CER)、23.3\%の単語誤り率(WER)を達成する。 Automatic Speech Recognition (ASR) can play a crucial role in enhancing the accessibility of spoken languages worldwide. In this paper, we build a set of ASR tools for Amharic, a language spoken by more than 50 million people primarily in eastern Africa. Amharic is written in the Ge'ez script, a sequence of graphemes with spacings denoting word boundaries. This makes computational processing of Amharic challenging since the location of spacings can significantly impact the meaning of formed sentences. We find that existing benchmarks for Amharic ASR do not account for these spacings and only measure individual grapheme error rates, leading to significantly inflated measurements of in-the-wild performance. In this paper, we first release corrected transcriptions of existing Amharic ASR test datasets, enabling the community to accurately evaluate progress. Furthermore, we introduce a post-processing approach using a transformer encoder-decoder architecture to organize raw ASR outputs into a grammatically complete and semantically meaningful Amharic sentence. Through experiments on the corrected test dataset, our model enhances the semantic correctness of Amharic speech recognition systems, achieving a Character Error Rate (CER) of 5.5\% and a Word Error Rate (WER) of 23.3\%.	翻訳日:2024-04-23 19:19:56 公開日:2024-04-20
# MahaSQuAD: マラウイの質問への回答における言語学の分岐 MahaSQuAD: Bridging Linguistic Divides in Marathi Question-Answering ( http://arxiv.org/abs/2404.13364v1 ) ライセンス: Link先を確認	Ruturaj Ghatage, Aditya Kulkarni, Rajlaxmi Patil, Sharvi Endait, Raviraj Joshi,	(参考訳) 質問応答システムは情報検索に革命をもたらしたが、言語と文化の境界は幅広いアクセス可能性を制限する。この研究は、ロバストなデータキュレーションアプローチを用いて、英語質問回答データセット(SQuAD)を翻訳することで、低リソース言語における効率的なQnAデータセットの欠如のギャップを埋める試みである。 118,516のトレーニング、11,873のバリデーション、11,803のテストサンプルからなる、Indic言語Marathiのための最初の完全なSQuADデータセットであるMahaSQuADを紹介した。また、手動で検証した500のサンプルのゴールドテストセットも提示する。文脈の維持と言語的ニュアンス処理の課題に対処し、正確な翻訳を保証する。さらに、QnAデータセットは、翻訳を用いて任意の低リソース言語に簡単に変換できないため、翻訳文中の応答翻訳をそのスパンにマッピングする堅牢な方法が必要である。したがって、この問題に対処するため、SQuADを低リソース言語に翻訳するための汎用的なアプローチも提示する。そこで我々は,低リソース言語における言語と文化のギャップを,質問応答システムの領域で橋渡しする,スケーラブルなアプローチを提案する。データセットとモデルはhttps://github.com/l3cube-pune/MarathiNLPで公開されています。 Question-answering systems have revolutionized information retrieval, but linguistic and cultural boundaries limit their widespread accessibility. This research endeavors to bridge the gap of the absence of efficient QnA datasets in low-resource languages by translating the English Question Answering Dataset (SQuAD) using a robust data curation approach. We introduce MahaSQuAD, the first-ever full SQuAD dataset for the Indic language Marathi, consisting of 118,516 training, 11,873 validation, and 11,803 test samples. We also present a gold test set of manually verified 500 examples. Challenges in maintaining context and handling linguistic nuances are addressed, ensuring accurate translations. Moreover, as a QnA dataset cannot be simply converted into any low-resource language using translation, we need a robust method to map the answer translation to its span in the translated passage. Hence, to address this challenge, we also present a generic approach for translating SQuAD into any low-resource language. Thus, we offer a scalable approach to bridge linguistic and cultural gaps present in low-resource languages, in the realm of question-answering systems. The datasets and models are shared publicly at https://github.com/l3cube-pune/MarathiNLP .	翻訳日:2024-04-23 19:19:56 公開日:2024-04-20
# Movie101v2: 映画ナレーションベンチマークの改善 Movie101v2: Improved Movie Narration Benchmark ( http://arxiv.org/abs/2404.13370v1 ) ライセンス: Link先を確認	Zihao Yue, Yepeng Zhang, Ziheng Wang, Qin Jin,	(参考訳) 視覚障害者を支援するために、映像に合わせたプロット記述を作成することを目的とした自動映画ナレーション。標準的なビデオキャプションとは異なり、重要な視覚的詳細を記述するだけでなく、複数の映画撮影で展開されたプロットを推測する必要があるため、独特で進行中の課題が生じる。自動映画ナレーションシステムの開発を進めるため,既存のデータセットの限界を再考し,大規模なバイリンガル映画ナレーションデータセットであるMovie101v2を開発した。第2に,映画ナレーションの達成に欠かせない課題を考慮し,長期的目標を3段階に分割し,個別クリップ内での理解をめざした初期段階に着目した。また、段階的な課題目標に合わせて、新たなナレーションアセスメントも導入します。第3に、我々の新しいデータセットを用いて、GPT-4Vを含むいくつかの主要な視覚言語モデルをベースライン化し、現在のモデルが映画ナレーション生成に直面する課題について、詳細な調査を行う。以上の結果から,映画ナレーション生成の達成は,徹底的な研究を必要とする魅力的な目標であることが示唆された。 Automatic movie narration targets at creating video-aligned plot descriptions to assist visually impaired audiences. It differs from standard video captioning in that it requires not only describing key visual details but also inferring the plots developed across multiple movie shots, thus posing unique and ongoing challenges. To advance the development of automatic movie narrating systems, we first revisit the limitations of existing datasets and develop a large-scale, bilingual movie narration dataset, Movie101v2. Second, taking into account the essential difficulties in achieving applicable movie narration, we break the long-term goal into three progressive stages and tentatively focus on the initial stages featuring understanding within individual clips. We also introduce a new narration assessment to align with our staged task goals. Third, using our new dataset, we baseline several leading large vision-language models, including GPT-4V, and conduct in-depth investigations into the challenges current models face for movie narration generation. Our findings reveal that achieving applicable movie narration generation is a fascinating goal that requires thorough research.	翻訳日:2024-04-23 19:19:56 公開日:2024-04-20
# HybridFlow:極低ビットレート画像圧縮のためのマスク付きコードブックへの連続性注入 HybridFlow: Infusing Continuity into Masked Codebook for Extreme Low-Bitrate Image Compression ( http://arxiv.org/abs/2404.13372v1 ) ライセンス: Link先を確認	Lei Lu, Yanyue Xie, Wei Jiang, Wei Wang, Xue Lin, Yanzhi Wang,	(参考訳) 本稿では,極低ビットレートの学習画像圧縮(lic)の課題について検討する。量子化された連続的な特徴を伝達する先進的な手法は、重度の量子化損失のため、しばしばぼやけやノイズの多い再構成をもたらす。視覚空間を識別する学習されたコードブックに基づく従来のlicメソッドは、通常は、忠実な詳細をキャプチャする際、限定されたコードワードの表現能力が不十分なため、不忠実な再構築をもたらす。本稿では,超低ビットレート下での高知覚品質と高忠実度を実現するために,連続的な機能ベースとコードブックベースのストリームを組み合わせた新しいデュアルストリームフレームワークHyrbidFlowを提案する。コードブックベースのストリームは、以前に学習されたコードブックから恩恵を受け、再構築された画像の質と明快さを提供する。継続的機能ストリームは、忠実さの詳細を維持することを目標としている。超低ビットレートを実現するために、マスク付きトークンベースのトランスフォーマが提案され、ここでは、コードワードインデックスのマスク部分のみを送信し、連続特徴ストリームから情報に導かれるトークン生成により、欠落したインデックスを復元する。また、最終的な画像再構成のための画素復号法において、2つのストリームをマージするブリッジ補正ネットワークを構築し、連続的なストリーム特徴は、コードブックベースの画素復号器のバイアスを補正し、再構成された忠実度の詳細を強制する。実験結果は、既存のシングルストリームのコードブックベースや連続機能ベースのlic手法と比較して、非常に低ビットレートで複数のデータセット間で優れた性能を示す。 This paper investigates the challenging problem of learned image compression (LIC) with extreme low bitrates. Previous LIC methods based on transmitting quantized continuous features often yield blurry and noisy reconstruction due to the severe quantization loss. While previous LIC methods based on learned codebooks that discretize visual space usually give poor-fidelity reconstruction due to the insufficient representation power of limited codewords in capturing faithful details. We propose a novel dual-stream framework, HyrbidFlow, which combines the continuous-feature-based and codebook-based streams to achieve both high perceptual quality and high fidelity under extreme low bitrates. The codebook-based stream benefits from the high-quality learned codebook priors to provide high quality and clarity in reconstructed images. The continuous feature stream targets at maintaining fidelity details. To achieve the ultra low bitrate, a masked token-based transformer is further proposed, where we only transmit a masked portion of codeword indices and recover the missing indices through token generation guided by information from the continuous feature stream. We also develop a bridging correction network to merge the two streams in pixel decoding for final image reconstruction, where the continuous stream features rectify biases of the codebook-based pixel decoder to impose reconstructed fidelity details. Experimental results demonstrate superior performance across several datasets under extremely low bitrates, compared with existing single-stream codebook-based or continuous-feature-based LIC methods.	翻訳日:2024-04-23 19:19:56 公開日:2024-04-20
# 理論と実践のギャップを埋める: ベンチマーク転送進化最適化 Bridging the Gap Between Theory and Practice: Benchmarking Transfer Evolutionary Optimization ( http://arxiv.org/abs/2404.13377v1 ) ライセンス: Link先を確認	Yaqing Hou, Wenqiang Ma, Abhishek Gupta, Kavitesh Kumar Bali, Hongwei Ge, Qiang Zhang, Carlos A. Coello Coello, Yew-Soon Ong,	(参考訳) 近年、トランスファー進化最適化(TrEO)の分野は、複雑な問題の解決に対するその大きな影響の実現によって、かなりの成長をみせている。タスク間で知識を伝達することで生じる課題に対処するために、多くのアルゴリズムが登場した。しかし、転送最適化における '`no free lunch theorem'' は、様々な問題タイプにまたがって、単一のアルゴリズムが優位に立つことはないことを明らかにしている。本稿では,様々なTrEOアルゴリズムの性能を現実的なシナリオで評価するために,ベンチマーク手法を採用することで,この問題点に対処する。転送最適化の方法論的焦点が拡大しているにもかかわらず、既存のベンチマーク問題は設計が不十分なためにしばしば不足し、主に現実の妥当性に欠ける合成的な問題を特徴としている。本稿では,ビッグデータタスクインスタンスの3つの重要な側面(ボリューム,多様性,速度)に基づいて分類された文献からの問題を統合する,実用的なTrEOベンチマークスイートのパイオニアとなる。我々の主な目的は、既存のTrEOアルゴリズムを包括的に分析し、実践的な課題に取り組むための新しいアプローチを開発するための道を開くことである。ボリューム,多様性,速度の3次元を具現化する現実的なベンチマークを導入することで,多様かつ複雑な転送シナリオに直面したアルゴリズム性能の理解を深めることを目指す。このベンチマークスイートは研究者にとって貴重なリソースとして機能し、現実世界の問題を解決するためにTrEOアルゴリズムの洗練と進歩を促進する。 In recent years, the field of Transfer Evolutionary Optimization (TrEO) has witnessed substantial growth, fueled by the realization of its profound impact on solving complex problems. Numerous algorithms have emerged to address the challenges posed by transferring knowledge between tasks. However, the recently highlighted ``no free lunch theorem'' in transfer optimization clarifies that no single algorithm reigns supreme across diverse problem types. This paper addresses this conundrum by adopting a benchmarking approach to evaluate the performance of various TrEO algorithms in realistic scenarios. Despite the growing methodological focus on transfer optimization, existing benchmark problems often fall short due to inadequate design, predominantly featuring synthetic problems that lack real-world relevance. This paper pioneers a practical TrEO benchmark suite, integrating problems from the literature categorized based on the three essential aspects of Big Source Task-Instances: volume, variety, and velocity. Our primary objective is to provide a comprehensive analysis of existing TrEO algorithms and pave the way for the development of new approaches to tackle practical challenges. By introducing realistic benchmarks that embody the three dimensions of volume, variety, and velocity, we aim to foster a deeper understanding of algorithmic performance in the face of diverse and complex transfer scenarios. This benchmark suite is poised to serve as a valuable resource for researchers, facilitating the refinement and advancement of TrEO algorithms in the pursuit of solving real-world problems.	翻訳日:2024-04-23 19:19:56 公開日:2024-04-20
# コヒーレンス測定に基づく電子ビームのウィグナー関数の再構成 Reconstruction of Wigner function of electron beams based on coherence measurements ( http://arxiv.org/abs/2404.13379v1 ) ライセンス: Link先を確認	Shuhei Hatanaka, Jun Yamasaki,	(参考訳) 電子ビームの密度行列とウィグナー関数をエアリーパターン強度プロファイル解析により再構成する方法を開発した。透過電子顕微鏡対象物体の密度行列をコヒーレンス関数と電子波振幅と位相分布を用いて計算した。その後、ウィグナー函数は行列要素を用いて再構成された。位相空間の起点におけるウィグナー関数に基づいて、その軸方向の明るさを計算する式を導出した上で、従来の平均輝度測定よりも精度良くエミッタ性能を反映したショットキー界放出ガンの軸方向の明るさを測定した。 We developed a reconstruction method for the density matrix and Wigner function of electron beams through analysis of the Airy pattern intensity profile. The density matrix in a transmission electron microscope object plane was calculated using the coherence function and the electron wave amplitude and phase distributions. The Wigner function was then reconstructed using the matrix elements. Based on the Wigner function at the origin of the phase space, we derived a formula to calculate the axial brightness, and then measured the axial brightness of a Schottky field emission gun, which reflects the emitter performance more accurately and precisely than the conventional mean brightness measurements.	翻訳日:2024-04-23 19:19:56 公開日:2024-04-20
# DNA : 接触追跡のための個人用神経増強法 DNA: Differentially private Neural Augmentation for contact tracing ( http://arxiv.org/abs/2404.13381v1 ) ライセンス: Link先を確認	Rob Romijnders, Christos Louizos, Yuki M. Asano, Max Welling,	(参考訳) 新型コロナウイルスのパンデミックは経済と社会に大きな影響を及ぼした。接触追跡はウイルスキャリアの早期検出による感染率の低下に有効な方法である。しかし、これは最近のパンデミックでは一般的には採用されておらず、プライバシーに関する懸念が最も重要な理由として挙げられている。我々は、分散化されたコンタクトトレースにおける現在の最先端技術のプライバシー保証を大幅に改善する。これまでの研究は統計的推論のみに基づいていたが、学習したニューラルネットワークによる推論を強化し、このニューラルネットワークが差分プライバシーを満たすことを保証する。 COVID19のシミュレーターでは、1メッセージ当たりのepsilon=1でも、感染する可能性のある個人の検出が大幅に改善され、ターゲットテストの結果、感染率が低下する可能性がある。この作業は、重要なプライバシー保証を維持しながら、ディープラーニングをコンタクトトレースに統合する上で、重要な第一歩となる。 The COVID19 pandemic had enormous economic and societal consequences. Contact tracing is an effective way to reduce infection rates by detecting potential virus carriers early. However, this was not generally adopted in the recent pandemic, and privacy concerns are cited as the most important reason. We substantially improve the privacy guarantees of the current state of the art in decentralized contact tracing. Whereas previous work was based on statistical inference only, we augment the inference with a learned neural network and ensure that this neural augmentation satisfies differential privacy. In a simulator for COVID19, even at epsilon=1 per message, this can significantly improve the detection of potentially infected individuals and, as a result of targeted testing, reduce infection rates. This work marks an important first step in integrating deep learning into contact tracing while maintaining essential privacy guarantees.	翻訳日:2024-04-23 19:19:56 公開日:2024-04-20
# 2光子緩和を伴う異方性Rabiモデル Anisotropic Rabi model with two-photon relaxation ( http://arxiv.org/abs/2404.13385v1 ) ライセンス: Link先を確認	Hui Li, Jia-Kai Shi, Li-Bao Fan, Zi-Min Li, Chuan-Cun Shu,	(参考訳) 回転と反回転の相互作用と光場の2光子緩和という3つの光物質相互作用プロセスの相互作用は、量子光学と量子情報処理における関心事である。本研究では, 異方性Rabiモデルを用いた3つの光物質相互作用過程を理論的に検討する。リンドブラッド・マスター方程式を数値的に解くことにより、励起緩和力学を解析し、非エルミート有効ハミルトニアンを導出し、さらなる物理的洞察を得る。これらの相互作用の個々の効果を探索するため、実効ハミルトニアンの3つの解析的抽出可能な限界について検討する。解析の結果,3つの競合する光物質相互作用プロセスはパリティに敏感であり,過渡状態と定常状態の両方で興味深い現象が生じることがわかった。これら3つの相互作用項が競合するとき、特に量子相転移に似た興味深い動的パターンが出現する。この研究は、オープン量子系における超強光-物質相互作用の理解を深め、空洞ベースの量子計算に関する貴重な洞察を提供する。 The interplay of three light-matter interaction processes - rotating and counter-rotating interactions and two-photon relaxation of the light field - is a topic of interest in quantum optics and quantum information processing. In this work, we theoretically investigate the three light-matter interaction processes using the anisotropic Rabi model, which accounts for different strengths of rotating and counter-rotating interactions and the unique occurrence of photon escape exclusively in pairs. By numerically solving the Lindblad master equation, we analyze the excitation-relaxation dynamics and derive a non-Hermitian effective Hamiltonian to gain further physical insights. To explore the individual effects of these interactions, we examine three analytically tractable limits of the effective Hamiltonian. Our analysis reveals that the three competitive light-matter interaction processes exhibit sensitivity to parity, leading to intriguing phenomena in both transient and steady states. Particularly interesting dynamical patterns resembling quantum phase transitions emerge when these three interaction terms compete. This work deepens the understanding of ultrastrong light-matter interaction in open quantum systems and offers valuable insights into cavity-based quantum computations.	翻訳日:2024-04-23 19:10:11 公開日:2024-04-20
# SSVT:眼疾患診断のための眼底画像に基づく自己監督型視覚変換器 SSVT: Self-Supervised Vision Transformer For Eye Disease Diagnosis Based On Fundus Images ( http://arxiv.org/abs/2404.13386v1 ) ライセンス: Link先を確認	Jiaqi Wang, Mengtian Kang, Yong Liu, Chi Zhang, Ying Liu, Shiming Li, Yue Qi, Wenjun Xu, Chenyu Tang, Edoardo Occhipinti, Mayinuer Yusufu, Ningli Wang, Weiling Bai, Shuo Gao, Luigi G. Occhipinti,	(参考訳) 機械学習に基づく基礎画像診断技術は、医療資源の削減や客観的な評価結果の提供など、世界中の関心を喚起する。しかし、現在の手法は一般に指導的手法に基づいており、医療従事者に重労働負荷をもたらし、効果的なデータベースの拡大に苦しむ。そこで本稿では,無ラベルの眼底画像を自動的に解析し,北京トングレン病院が収集した6つの公的データセットと2つのデータセットに基づいて,4つの主眼疾患の97.0%の高い評価精度を得られるラベルフリー手法「SSVT」を構築した。その結果, バイオメディカル・リソース不足地域における非教師なし学習の有効性と, 世界的視力向上への強力な応用の可能性が示された。 Machine learning-based fundus image diagnosis technologies trigger worldwide interest owing to their benefits such as reducing medical resource power and providing objective evaluation results. However, current methods are commonly based on supervised methods, bringing in a heavy workload to biomedical staff and hence suffering in expanding effective databases. To address this issue, in this article, we established a label-free method, name 'SSVT',which can automatically analyze un-labeled fundus images and generate high evaluation accuracy of 97.0% of four main eye diseases based on six public datasets and two datasets collected by Beijing Tongren Hospital. The promising results showcased the effectiveness of the proposed unsupervised learning method, and the strong application potential in biomedical resource shortage regions to improve global eye health.	翻訳日:2024-04-23 19:10:11 公開日:2024-04-20
# 超低損失集積フォトニクスは明るい狭帯域光子対源を可能にする Ultralow-loss integrated photonics enables bright, narrow-band, photon-pair sources ( http://arxiv.org/abs/2404.13387v1 ) ライセンス: Link先を確認	Ruiyang Chen, Yi-Han Luo, Jinbao Long, Baoqi Shi, Chen Shen, Junqiu Liu,	(参考訳) 光子対光源は、フォトニック量子系にとって重要な構成要素である。ケーラー非線形性とキャビティ強化した自発4波混合を利用して、フォトニック集積回路上に構築されたマイクロ共振器を用いてチップスケール光子対光源を作成することができる。実用化のためには、マイクロ共振器の品質係数$Q$は光子対光源の輝度を増大させ、その直線幅を減少させることが必須である。前者は$Q^4$に、後者は$Q$に比例する。本稿では,マイクロ共振器をベースとした狭帯域光子対光源について述べる。この集積マイクロ共振器は窒化ケイ素で作製され、標準のCMOSファウントリープロセスで製造され、極低損失が3ドル/m、本質的なQ$は10ドル^7ドルである。光子対光源の輝度は1.17\times10^9$ Hz/mW$^2$/GHz、光子対光子対光子対光子対光子対光子対光子対光子対光子対光子対光子対光子対光子対光子対光子対光子対光子対光子対光子対光子対光子対光子対光子対光子対光子対光子対光子対光子対光子対光子対光子対光子対光子対光子対光子対光子対光子対光子対光子対光子対光子対光子対光子対光子対光子対光子対光子対光子対光子対光子対光子対光子対光子対光子対光子対光子対光子対光さらに、2階相関$g^{(2)}_\mathrm{h}(0)=0.0037(5)$と、視認率$0.973(9)$の時間ビン絡みのソースも可能となる。我々の研究は、超低損失集積フォトニクスのグローバルポテンシャルを証明し、量子通信やネットワークへの効率的でコンパクトで堅牢なインターフェースを触媒する新しい量子光源と回路を創出する。 Photon-pair sources are critical building blocks for photonic quantum systems. Leveraging Kerr nonlinearity and cavity-enhanced spontaneous four-wave mixing, chip-scale photon-pair sources can be created using microresonators built on photonic integrated circuit. For practical applications, a high microresonator quality factor $Q$ is mandatory to magnify photon-pair sources' brightness and reduce their linewidth. The former is proportional to $Q^4$, while the latter is inversely proportional to $Q$. Here, we demonstrate an integrated, microresonator-based, narrow-band photon-pair source. The integrated microresonator, made of silicon nitride and fabricated using a standard CMOS foundry process, features ultralow loss down to $3$ dB/m and intrinsic $Q$ factor exceeding $10^7$. The photon-pair source has brightness of $1.17\times10^9$ Hz/mW$^2$/GHz and linewidth of $25.9$ MHz, both of which are record values for silicon-photonics-based quantum light source. It further enables a heralded single-photon source with heralded second-order correlation $g^{(2)}_\mathrm{h}(0)=0.0037(5)$, as well as a time-bin entanglement source with a raw visibility of $0.973(9)$. Our work evidences the global potential of ultralow-loss integrated photonics to create novel quantum light sources and circuits, catalyzing efficient, compact and robust interfaces to quantum communication and networks.	翻訳日:2024-04-23 19:10:11 公開日:2024-04-20
# 自己教師型機械学習による医療専門家の育成に伴う多発性足底障害の診断 Diagnosis of Multiple Fundus Disorders Amidst a Scarcity of Medical Experts Via Self-supervised Machine Learning ( http://arxiv.org/abs/2404.13388v1 ) ライセンス: Link先を確認	Yong Liu, Mengtian Kang, Shuo Gao, Chi Zhang, Ying Liu, Shiming Li, Yue Qi, Arokia Nathan, Wenjun Xu, Chenyu Tang, Edoardo Occhipinti, Mayinuer Yusufu, Ningli Wang, Weiling Bai, Luigi Occhipinti,	(参考訳) 眼科医の不足がタイムリーな診断を妨げている未発達の地域では、眼底疾患は視覚障害や視覚障害の主な原因である。 AI支援されたファンドイメージ分析には、高精度、ワークロード削減、アクセシビリティの改善など、いくつかのメリットがあるが、信頼性のあるモデルを構築するには、大量の専門家アノテートデータが必要である。このジレンマに対処するために、ラベルのないファンドス画像から多様なファンドス病を処理できる汎用的な自己教師型機械学習フレームワークを提案する。提案手法のAUCは,既存の指導的アプローチを15.7%超え,一人の人間専門家の能力を超えている。さらに、当社のモデルは、異なる地域、人種、異種画像ソースからのさまざまなデータセットや、複数のカメラやデバイスからのクオリティに順応する。本手法は,眼底疾患を診断するためのラベルフリーの汎用フレームワークを提供する。 Fundus diseases are major causes of visual impairment and blindness worldwide, especially in underdeveloped regions, where the shortage of ophthalmologists hinders timely diagnosis. AI-assisted fundus image analysis has several advantages, such as high accuracy, reduced workload, and improved accessibility, but it requires a large amount of expert-annotated data to build reliable models. To address this dilemma, we propose a general self-supervised machine learning framework that can handle diverse fundus diseases from unlabeled fundus images. Our method's AUC surpasses existing supervised approaches by 15.7%, and even exceeds performance of a single human expert. Furthermore, our model adapts well to various datasets from different regions, races, and heterogeneous image sources or qualities from multiple cameras or devices. Our method offers a label-free general framework to diagnose fundus diseases, which could potentially benefit telehealth programs for early screening of people at risk of vision loss.	翻訳日:2024-04-23 19:10:11 公開日:2024-04-20
# 自然言語推論のための説明に基づくバイアスデカップリング正規化 Explanation based Bias Decoupling Regularization for Natural Language Inference ( http://arxiv.org/abs/2404.13390v1 ) ライセンス: Link先を確認	Jianxiang Zang, Hui Liu,	(参考訳) Transformerベースの自然言語推論エンコーダの堅牢性は、意図されたタスク関連機能よりもデータセットバイアスに依存する傾向があるため、しばしば妥協される。近年の研究では、トレーニング過程における偏りのあるサンプルの重量を減らし、これを緩和しようと試みている。しかしながら、これらのデバイアス法は主に、各ケース内の偏り成分を明示的に決定することなく、どのサンプルに偏りがあるかを特定することに重点を置いている。この制限は、アウト・オブ・ディストリビューション推論におけるこれらのメソッドの能力を制限する。この問題に対処するため、我々は、人間が因果関係を説明するために使用する論理をモデルに組み入れることを目標としている。説明に基づくバイアスデカップリング規則化(EBD-Reg)を提案する。 EBD-Regは人間の説明を基準として、エンコーダにディチンギング、デカップリング、アライニングの三部構成の並列監視を確立するよう指示する。この方法では、残余要素をバイアスとして取り除きながら、推論中にタスク関連の特徴を表すキーワードをエンコーダが識別し、フォーカスすることができる。実証的な証拠は、EBD-Regがトランスフォーマーベースのエンコーダを効果的に誘導し、人間の中心レンズを通してバイアスを分離し、アウト・オブ・ディストリビューション推論能力の点で他の手法をはるかに上回っていることを示している。 The robustness of Transformer-based Natural Language Inference encoders is frequently compromised as they tend to rely more on dataset biases than on the intended task-relevant features. Recent studies have attempted to mitigate this by reducing the weight of biased samples during the training process. However, these debiasing methods primarily focus on identifying which samples are biased without explicitly determining the biased components within each case. This limitation restricts those methods' capability in out-of-distribution inference. To address this issue, we aim to train models to adopt the logic humans use in explaining causality. We propose a simple, comprehensive, and interpretable method: Explanation based Bias Decoupling Regularization (EBD-Reg). EBD-Reg employs human explanations as criteria, guiding the encoder to establish a tripartite parallel supervision of Distinguishing, Decoupling and Aligning. This method enables encoders to identify and focus on keywords that represent the task-relevant features during inference, while discarding the residual elements acting as biases. Empirical evidence underscores that EBD-Reg effectively guides various Transformer-based encoders to decouple biases through a human-centric lens, significantly surpassing other methods in terms of out-of-distribution inference capabilities.	翻訳日:2024-04-23 19:10:11 公開日:2024-04-20
# 空間的文脈を用いた防火設備用電力流のオンライン計画 Online Planning of Power Flows for Power Systems Against Bushfires Using Spatial Context ( http://arxiv.org/abs/2404.13391v1 ) ライセンス: Link先を確認	Jianyu Xu, Qiuzhuang Sun, Yang Yang, Huadong Mo, Daoyi Dong,	(参考訳) 2019-20 オーストラリアのブッシュファイアは、多くの経済的損失をもたらし、電力システムの運用に大きな影響を及ぼした。停電により発電所や送電線が著しく影響を受け、運用コストが増大する。本研究では,停電を受ける電力系統の最適潮流(OPF)を計画する上で,基本的かつ困難な課題について検討する。森林火災の確率的性質を考慮し,ムーアの周辺モデルに基づいて,その動態を捉えるモデルを構築した。そこで本稿では, 停電状況を明らかにする定期的な検査手法として, 電力ネットワーク内の電力の流れを逐次計画するオンライン最適化モデルフレームワークを提案する。本枠組みは, 森林火災の拡散は時間とともに非定常的であり, 拡散・封入確率が不明であると仮定する。これらの課題に対処するため,ブッシュファイアの地理的情報を「空間的文脈」として扱うコンテキスト学習アルゴリズムを開発した。オンライン学習アルゴリズムは、観測データに基づいて、未知の確率を逐次学習し、それに応じてOPF決定を行う。逐次OPF決定は、真のモデルパラメータを知っている透視戦略に対する累積損失として定義される後悔関数を最小化することを目的としている。我々は、他のベンチマークアルゴリズムが達成した後悔境界よりも優れた後悔関数のバウンドを導出することにより、アルゴリズムの理論的保証を提供する。オーストラリアのNSWから得られた実際のブッシュファイアデータからモデル仮定を検証し,その適用性を示すために2つのパワーシステムに適用する。 The 2019-20 Australia bushfire incurred numerous economic losses and significantly affected the operations of power systems. A power station or transmission line can be significantly affected due to bushfires, leading to an increase in operational costs. We study a fundamental but challenging problem of planning the optimal power flow (OPF) for power systems subject to bushfires. Considering the stochastic nature of bushfire spread, we develop a model to capture such dynamics based on Moore's neighborhood model. Under a periodic inspection scheme that reveals the in-situ bushfire status, we propose an online optimization modeling framework that sequentially plans the power flows in the electricity network. Our framework assumes that the spread of bushfires is non-stationary over time, and the spread and containment probabilities are unknown. To meet these challenges, we develop a contextual online learning algorithm that treats the in-situ geographical information of the bushfire as a 'spatial context'. The online learning algorithm learns the unknown probabilities sequentially based on the observed data and then makes the OPF decision accordingly. The sequential OPF decisions aim to minimize the regret function, which is defined as the cumulative loss against the clairvoyant strategy that knows the true model parameters. We provide a theoretical guarantee of our algorithm by deriving a bound on the regret function, which outperforms the regret bound achieved by other benchmark algorithms. Our model assumptions are verified by the real bushfire data from NSW, Australia, and we apply our model to two power systems to illustrate its applicability.	翻訳日:2024-04-23 19:10:11 公開日:2024-04-20
# 小データセットからの分子特性予測のための伝達学習 Transfer Learning for Molecular Property Predictions from Small Data Sets ( http://arxiv.org/abs/2404.13393v1 ) ライセンス: Link先を確認	Thorren Kirschbaum, Annika Bande,	(参考訳) 機械学習は、例えば高スループットスクリーニングアプリケーションにおいて、高価な実験や量子化学計算をバイパスする新しい化学ツールとして登場した。しかし、多くの機械学習研究は小さなデータセットに依存しており、メッセージパッシングニューラルネットワークのような強力なディープラーニングアーキテクチャを効率的に実装することは困難である。本研究では,メッセージパッシングニューラルネットワークPaiNNとSOAP分子ディスクリプタを,回帰木による増進に合わせた単純な分子ディスクリプタの集合に結合させた上で,メッセージパッシングニューラルネットワークPaiNNで最良の結果が得られる,小さなデータセット上での分子特性予測のための一般的な機械学習モデルをベンチマークする。そこで我々は,PaiNNの予測能力をさらに向上させるために,大規模データセットを用いて各モデルを事前学習し,元のデータセットを微調整した上で,より正確なモデルを得ることができる転送学習戦略を提案する。事前学習ラベルは、計算的に安価なab initioまたは半経験的モデルから得られ、ターゲットデータセット上の単純な線形回帰によって補正され、元のデータに近いラベルが得られる。この戦略は、優れた結果が得られるハーバード大学の太陽光発電データセット(HOPV, HOMO-LUMO-gaps)と、複雑な基礎となる学習課題と事前学習および微調整ラベルを得るために使用される異種手法により、この手法が失敗するFreesolvデータセット(解決エネルギー)で検証される。最後に、最終トレーニング結果が事前学習データセットのサイズで単調に改善しないことが分かるが、データポイントが少ない事前学習は、よりバイアスのある事前学習モデルと微調整後の精度の向上につながる可能性がある。 Machine learning has emerged as a new tool in chemistry to bypass expensive experiments or quantum-chemical calculations, for example, in high-throughput screening applications. However, many machine learning studies rely on small data sets, making it difficult to efficiently implement powerful deep learning architectures such as message passing neural networks. In this study, we benchmark common machine learning models for the prediction of molecular properties on small data sets, for which the best results are obtained with the message passing neural network PaiNN, as well as SOAP molecular descriptors concatenated to a set of simple molecular descriptors tailored to gradient boosting with regression trees. To further improve the predictive capabilities of PaiNN, we present a transfer learning strategy that uses large data sets to pre-train the respective models and allows to obtain more accurate models after fine-tuning on the original data sets. The pre-training labels are obtained from computationally cheap ab initio or semi-empirical models and corrected by simple linear regression on the target data set to obtain labels that are close to those of the original data. This strategy is tested on the Harvard Oxford Photovoltaics data set (HOPV, HOMO-LUMO-gaps), for which excellent results are obtained, and on the Freesolv data set (solvation energies), where this method is unsuccessful due to a complex underlying learning task and the dissimilar methods used to obtain pre-training and fine-tuning labels. Finally, we find that the final training results do not improve monotonically with the size of the pre-training data set, but pre-training with fewer data points can lead to more biased pre-trained models and higher accuracy after fine-tuning.	翻訳日:2024-04-23 19:10:11 公開日:2024-04-20
# 検索・拡張生成に基づく関係抽出 Retrieval-Augmented Generation-based Relation Extraction ( http://arxiv.org/abs/2404.13397v1 ) ライセンス: Link先を確認	Sefika Efeoglu, Adrian Paschke,	(参考訳) 情報抽出(IE)は、エンティティと関係抽出(RE)手法を用いて、構造化されていないテキストデータを構造化形式に変換する変換プロセスである。一対のエンティティ間の関係を識別することは、このフレームワークにおいて重要な役割を果たす。関係抽出のための様々な技術が存在するにもかかわらず、その有効性はラベル付きデータやかなりの計算資源へのアクセスに大きく依存している。これらの課題に対処する際には、LLM(Large Language Models)が有望なソリューションとして登場します。これらの制約を克服するために,本研究におけるRAG4RE(Retrieved-Augmented Generation-based Relation extract)を提案する。本研究は,異なるLLMを用いたRAG4RE手法の有効性を評価した。 TACRED, TACREV, Re-TACRED, SemEval RE データセットなどの確立したベンチマークを活用することで, RAG4RE アプローチの有効性を総合的に評価することを目的とする。特に、Flan T5、Llama2、Mistralなどの著名なLLMを活用している。我々のRAG4REアプローチは,LSMのみに基づく従来のREアプローチよりも優れており,特にTACREDデータセットとその変動は顕著である。さらに,本手法は,TACREDとTACREVの両方のデータセットにまたがる従来のRE手法と比較して,顕著な性能を示し,自然言語処理におけるREタスクの有効性と可能性を強調した。 Information Extraction (IE) is a transformative process that converts unstructured text data into a structured format by employing entity and relation extraction (RE) methodologies. The identification of the relation between a pair of entities plays a crucial role within this framework. Despite the existence of various techniques for relation extraction, their efficacy heavily relies on access to labeled data and substantial computational resources. In addressing these challenges, Large Language Models (LLMs) emerge as promising solutions; however, they might return hallucinating responses due to their own training data. To overcome these limitations, Retrieved-Augmented Generation-based Relation Extraction (RAG4RE) in this work is proposed, offering a pathway to enhance the performance of relation extraction tasks. This work evaluated the effectiveness of our RAG4RE approach utilizing different LLMs. Through the utilization of established benchmarks, such as TACRED, TACREV, Re-TACRED, and SemEval RE datasets, our aim is to comprehensively evaluate the efficacy of our RAG4RE approach. In particularly, we leverage prominent LLMs including Flan T5, Llama2, and Mistral in our investigation. The results of our study demonstrate that our RAG4RE approach surpasses performance of traditional RE approaches based solely on LLMs, particularly evident in the TACRED dataset and its variations. Furthermore, our approach exhibits remarkable performance compared to previous RE methodologies across both TACRED and TACREV datasets, underscoring its efficacy and potential for advancing RE tasks in natural language processing.	翻訳日:2024-04-23 19:10:11 公開日:2024-04-20
# HiVG:ビジュアルグラウンドのための階層型マルチモーダルきめ細粒度変調 HiVG: Hierarchical Multimodal Fine-grained Modulation for Visual Grounding ( http://arxiv.org/abs/2404.13400v1 ) ライセンス: Link先を確認	Linhui Xiao, Xiaoshan Yang, Fang Peng, Yaowei Wang, Changsheng Xu,	(参考訳) 視覚的グラウンドティングは、自然言語を介して視覚領域をグラウンドすることを目的としており、クロスモーダルアライメントに大きく依存するタスクである。既存の作業では、単モーダル事前訓練されたモデルを使用して、多モーダル対応情報を無視しながら視覚的/言語的な知識を別々に伝達した。コントラッシブ言語画像事前学習法とローランク適応法(LoRA)の最近の進歩により,マルチモーダル事前学習に基づく基礎課題の解決を目指す。しかし、事前訓練と接地の間には大きな課題ギャップがある。そこで我々は,これらのギャップに対処するために,高精度かつ効率的な階層型マルチモーダルきめ細粒度変調フレームワーク,すなわちHiVGを提案する。特に、HiVGは多層適応型クロスモーダルブリッジと階層型マルチモーダル低ランク適応(Hi LoRA)パラダイムで構成されている。クロスモーダルブリッジは、視覚的特徴と接地に必要なものとの不整合に対処し、多レベル視覚的特徴とテキスト的特徴との接続を確立する。 Hi LoRAは、階層的な方法で、クロスモーダルな特徴を浅い層から深い層に適応させることによって、知覚エラーの蓄積を防止する。 5つのデータセットによる実験結果から, 提案手法の有効性を実証し, 重要な接地能力と, 有望なエネルギー効率の優位性を実証した。プロジェクトページ:https://github.com/linhuixiao/HiVG。 Visual grounding, which aims to ground a visual region via natural language, is a task that heavily relies on cross-modal alignment. Existing works utilized uni-modal pre-trained models to transfer visual/linguistic knowledge separately while ignoring the multimodal corresponding information. Motivated by recent advancements in contrastive language-image pre-training and low-rank adaptation (LoRA) methods, we aim to solve the grounding task based on multimodal pre-training. However, there exists significant task gaps between pre-training and grounding. Therefore, to address these gaps, we propose a concise and efficient hierarchical multimodal fine-grained modulation framework, namely HiVG. Specifically, HiVG consists of a multi-layer adaptive cross-modal bridge and a hierarchical multimodal low-rank adaptation (Hi LoRA) paradigm. The cross-modal bridge can address the inconsistency between visual features and those required for grounding, and establish a connection between multi-level visual and text features. Hi LoRA prevents the accumulation of perceptual errors by adapting the cross-modal features from shallow to deep layers in a hierarchical manner. Experimental results on five datasets demonstrate the effectiveness of our approach and showcase the significant grounding capabilities as well as promising energy efficiency advantages. The project page: https://github.com/linhuixiao/HiVG.	翻訳日:2024-04-23 19:10:11 公開日:2024-04-20
# 外れ値付き$k$Sparse Wasserstein Barycenterの近似アルゴリズム Approximate Algorithms For $k$-Sparse Wasserstein Barycenter With Outliers ( http://arxiv.org/abs/2404.13401v1 ) ライセンス: Link先を確認	Qingyuan Yang, Hu Ding,	(参考訳) Wasserstein Barycenter (WB) は最適輸送における最も基本的な最適化問題の1つである。分布の集合が与えられた場合、WBの目標は、平均ワッサーシュタイン距離を最小化する新しい分布を見つけることである。解を ``$k$-sparse'' に制限すれば、問題はさらに難しくなる。本稿では,外乱が存在する場合のWB問題として$k$sparseの問題について検討する。既存のWBアルゴリズムは、ケースを外れ値で処理するために直接拡張できないため、いくつかの新しいアイデアを開発するために緊急に必要である。まず、アウトレーヤを持つ$k$sparse WBと(アウトレーヤを持つ)クラスタリング問題との関係について検討する。特に,外乱問題のある$k$-sparse WBに対して,定数近似係数を求めるクラスタリングに基づくLP法を提案する。さらに, 1+\epsilon)$-approximation factor を任意の $\epsilon>0$ に対して達成するためにコアセット法を用いる。最後に,提案アルゴリズムの実験を行い,その実効性を実証する。 Wasserstein Barycenter (WB) is one of the most fundamental optimization problems in optimal transportation. Given a set of distributions, the goal of WB is to find a new distribution that minimizes the average Wasserstein distance to them. The problem becomes even harder if we restrict the solution to be ``$k$-sparse''. In this paper, we study the $k$-sparse WB problem in the presence of outliers, which is a more practical setting since real-world data often contains noise. Existing WB algorithms cannot be directly extended to handle the case with outliers, and thus it is urgently needed to develop some novel ideas. First, we investigate the relation between $k$-sparse WB with outliers and the clustering (with outliers) problems. In particular, we propose a clustering based LP method that yields constant approximation factor for the $k$-sparse WB with outliers problem. Further, we utilize the coreset technique to achieve the $(1+\epsilon)$-approximation factor for any $\epsilon>0$, if the dimensionality is not high. Finally, we conduct the experiments for our proposed algorithms and illustrate their efficiencies in practice.	翻訳日:2024-04-23 19:10:11 公開日:2024-04-20
# コマンドライン言語モデルを用いた大規模侵入検出 Intrusion Detection at Scale with the Assistance of a Command-line Language Model ( http://arxiv.org/abs/2404.13402v1 ) ライセンス: Link先を確認	Jiongliang Lin, Yiwen Guo, Hao Chen,	(参考訳) 侵入検知は長期にわたるセキュリティ上の重要な問題である。企業のセキュリティソリューションでは,侵入を自動的に検出できるシステムが非常に需要が高い。既存のソリューションは、セキュリティオペレータが設計した手作りのルールに大きく依存している。 AIと機械学習は、データから異常なユーザの振る舞いをインテリジェントかつ自動的に検査することで、この問題に対処するための有望なソリューションを提供する。しかし、文献における既存の学習ベースの侵入検知システムは、主に小さなデータ用に設計されており、クラウド環境におけるビッグデータのパワーを活用できない。本稿では,大規模な事前学習を取り入れた侵入検知システムを導入し,AIによる侵入検知のための数千万行の指令に基づいて大規模言語モデルを訓練する。 3000万のトレーニングサンプルと1000万のテストサンプルで実施した実験では、ソリューションの有効性が検証された。 Intrusion detection is a long standing and crucial problem in security. A system capable of detecting intrusions automatically is on great demand in enterprise security solutions. Existing solutions rely heavily on hand-crafted rules designed by security operators, which suffer from high false negative rates and poor generalization ability to new, zero-day attacks at scale. AI and machine learning offer promising solutions to address the issues, by inspecting abnormal user behaviors intelligently and automatically from data. However, existing learning-based intrusion detection systems in the literature are mostly designed for small data, and they lack the ability to leverage the power of big data in cloud environments. In this paper, we target at this problem and introduce an intrusion detection system which incorporates large-scale pre-training, so as to train a large language model based on tens of millions of command lines for AI-based intrusion detection. Experiments performed on 30 million training samples and 10 million test samples verify the effectiveness of our solution.	翻訳日:2024-04-23 19:10:11 公開日:2024-04-20
# 一般活性化機能を有する完全連結二層ニューラルネットワークの解空間と記憶容量 Solution space and storage capacity of fully connected two-layer neural networks with generic activation functions ( http://arxiv.org/abs/2404.13404v1 ) ライセンス: Link先を確認	Sota Nishiyama, Masayuki Ohzeki,	(参考訳) 二項分類モデルの記憶能力は、モデルが学習できるパラメータ毎のランダムな入出力ペアの最大数である。機械学習モデルの表現力の指標の一つであり,様々なモデルの性能を比較する上で重要である。本研究では, 統計物理学のレプリカ法を用いて, 一般活性化関数を持つ完全連結二層ニューラルネットワークの解空間の構造と記憶容量を解析した。その結果, パラメータあたりの記憶容量は無限幅でも有限であり, ネットワークの重みは負の相関を示し, 結果として「労働の分断」が生じることがわかった。さらに, データセットサイズの増加は, 重みの置換対称性が損なわれ, 解空間が非結合領域に分裂する特定の遷移点において, 相転移を引き起こすことが判明した。この遷移点と記憶容量のアクティベーション関数の選択に対する依存性を同定する。これらの知見は, アクティベーション関数の影響と, パラメータ数が解空間の構造に与える影響の理解に寄与し, 特定の目的に基づいて適切なアーキテクチャを選択するための洞察を提供する可能性がある。 The storage capacity of a binary classification model is the maximum number of random input-output pairs per parameter that the model can learn. It is one of the indicators of the expressive power of machine learning models and is important for comparing the performance of various models. In this study, we analyze the structure of the solution space and the storage capacity of fully connected two-layer neural networks with general activation functions using the replica method from statistical physics. Our results demonstrate that the storage capacity per parameter remains finite even with infinite width and that the weights of the network exhibit negative correlations, leading to a 'division of labor'. In addition, we find that increasing the dataset size triggers a phase transition at a certain transition point where the permutation symmetry of weights is broken, resulting in the solution space splitting into disjoint regions. We identify the dependence of this transition point and the storage capacity on the choice of activation function. These findings contribute to understanding the influence of activation functions and the number of parameters on the structure of the solution space, potentially offering insights for selecting appropriate architectures based on specific objectives.	翻訳日:2024-04-23 19:10:11 公開日:2024-04-20
# 連続LBSインタラクションにおけるユーザビリティを最適化した多面的プライバシリーク管理フレームワーク A Framework for Managing Multifaceted Privacy Leakage While Optimizing Utility in Continuous LBS Interactions ( http://arxiv.org/abs/2404.13407v1 ) ライセンス: Link先を確認	Anis Bkakria, Reda Yaich,	(参考訳) 位置情報ベースのサービス(LBS)のプライバシは、モバイルデバイスの普及と、さまざまなアプリケーションへの位置情報の統合の増大において、最重要課題となっている。本稿では,LBSにおけるプライバシー漏洩の理解と管理の促進を目的とした,新たなコントリビューションについて述べる。私たちのコントリビューションは、位置情報ベースのインタラクションのさまざまな側面にわたるプライバシー上の懸念を分析するための、より包括的なフレームワークを提供します。具体的には、(\epsilon, \delta)$-location privacy, $(\epsilon, \delta, \theta)$-trajectory privacy, $(\epsilon, \delta, \theta)$-POI privacyを紹介します。さらに、これらのプライバシー概念の基本的な関係を確立し、LBSにおけるプライバシー保護の全体的アプローチを促進する。さらに,提案するプライバシー保護機構の有用性を評価するために,プライバシー保護とデータユーティリティのトレードオフに関する知見を提供する。最後に,本フレームワークをPlannarアイソトピック機構でインスタンス化し,その実用性を実証し,有効性を確保し,各種次元にわたるプライバシー漏洩を定量化する。提案した評価は, 精度の定量化を図りつつ, 位置情報, トラジェクトリ, 関心点(POI)のプライバシー損失を捉える上で, フレームワークの有効性を総合的に把握するものである。 Privacy in Location-Based Services (LBS) has become a paramount concern with the ubiquity of mobile devices and the increasing integration of location data into various applications. In this paper, we present several novel contributions aimed at advancing the understanding and management of privacy leakage in LBS. Our contributions provides a more comprehensive framework for analyzing privacy concerns across different facets of location-based interactions. Specifically, we introduce $(\epsilon, \delta)$-location privacy, $(\epsilon, \delta, \theta)$-trajectory privacy, and $(\epsilon, \delta, \theta)$-POI privacy, which offer refined mechanisms for quantifying privacy risks associated with location, trajectory, and points of interest when continuously interacting with LBS. Furthermore, we establish fundamental connections between these privacy notions, facilitating a holistic approach to privacy preservation in LBS. Additionally, we present a lower bound analysis to evaluate the utility of the proposed privacy-preserving mechanisms, offering insights into the trade-offs between privacy protection and data utility. Finally, we instantiate our framework with the Plannar Isotopic Mechanism to demonstrate its practical applicability while ensuring optimal utility and quantifying privacy leakages across various dimensions. The conducted evaluations provide a comprehensive insight into the efficacy of our framework in capturing privacy loss on location, trajectory, and Points of Interest (POI) while facilitating quantification of the ensured accuracy.	翻訳日:2024-04-23 19:10:11 公開日:2024-04-20
# AMMUNet:リモートセンシング画像セグメンテーションのためのマルチスケールアテンションマップマージ AMMUNet: Multi-Scale Attention Map Merging for Remote Sensing Image Segmentation ( http://arxiv.org/abs/2404.13408v1 ) ライセンス: Link先を確認	Yang Yang, Shunyi Zheng,	(参考訳) 深層学習の進歩は、リモートセンシングセマンティックセグメンテーションにおいて顕著な進歩をもたらした。注意機構は、グローバルなモデリングと文脈情報の利用を可能にする一方で、高い計算コストの課題に直面し、長期依存の捕捉を弱めるウィンドウベースの操作を必要とし、リモートセンシング画像処理の有効性を阻害する。本稿では,マルチスケールアテンションマップをマージするUNetベースのフレームワークであるAMMUNetを提案する。 GMSAは、グローバルなマルチヘッド自己保持機構とは対照的に、計算コストを大幅に削減しながら、グローバル情報を効率的に取得する。これは、寸法対応の戦略的利用により、粒度を調整し、相対的な位置バイアスパラメータを減らし、計算効率を最適化する。提案するAMMMは,マルチスケールアテンションマップを固定マスクテンプレートを用いた統一表現に効果的に組み合わせ,グローバルアテンション機構のモデリングを可能にする。実験により,本手法の精度が向上し,挑戦的ベイヒンゲンデータセットでは75.48 %,ポツダムデータセットでは77.90 %の顕著な平均交叉(mIoU)が達成された。コードはhttps://github.com/interpretty/AMMUNet.comで入手できる。 The advancement of deep learning has driven notable progress in remote sensing semantic segmentation. Attention mechanisms, while enabling global modeling and utilizing contextual information, face challenges of high computational costs and require window-based operations that weaken capturing long-range dependencies, hindering their effectiveness for remote sensing image processing. In this letter, we propose AMMUNet, a UNet-based framework that employs multi-scale attention map merging, comprising two key innovations: the granular multi-head self-attention (GMSA) module and the attention map merging mechanism (AMMM). GMSA efficiently acquires global information while substantially mitigating computational costs in contrast to global multi-head self-attention mechanism. This is accomplished through the strategic utilization of dimension correspondence to align granularity and the reduction of relative position bias parameters, thereby optimizing computational efficiency. The proposed AMMM effectively combines multi-scale attention maps into a unified representation using a fixed mask template, enabling the modeling of global attention mechanism. Experimental evaluations highlight the superior performance of our approach, achieving remarkable mean intersection over union (mIoU) scores of 75.48\% on the challenging Vaihingen dataset and an exceptional 77.90\% on the Potsdam dataset, demonstrating the superiority of our method in precise remote sensing semantic segmentation. Codes are available at https://github.com/interpretty/AMMUNet.	翻訳日:2024-04-23 19:10:11 公開日:2024-04-20
# テラヘルツ応用への期待を示すトンネル接合の5例の解析解 Analytical solutions for five examples of shunted tunneling junctions showing promise for terahertz applications ( http://arxiv.org/abs/2404.13415v1 ) ライセンス: Link先を確認	Mark J Hagmann,	(参考訳) 解析手法を用いて、矩形、三角形、あるいはデルタ関数からなる歪んだモデルと電子の相互作用を研究した。ゼロ電位における前バリア領域との直列の電位障壁。それぞれのモデルにおいて、曲がりくねった境界条件は、行列方程式が右列ベクトルの零点しか持たない原因となる。したがって、行列の行列式は非自明な解に対して 0 でなければならない。各モデルの行列式はパラメータのみを含む(例えば、シャントの長さ、障壁の長さと高さ、電子エネルギー)。したがって、各モデルの解の完全な集合は、代数学を用いてパラメータ空間のすべての点を決定し、各モデルの係数を計算することによって得られる。パラメータ空間におけるある点から別の点への経路は、モデルの操作の可能な履歴に対応する。プロトタイプでは、シャントは数nmの平均自由経路上で準コヒーレントな電子輸送を提供する特定の金属のフィラメントである可能性がある。時間に依存しないシュロディンガー方程式の準静電シミュレーションは、100nmのデバイスが最大1000Hzの周波数で動作できることを示唆している。 We used analytical methods to study the interaction of electrons with shunted models consisting of a rectangular, triangular, or delta function. potential barrier in series with a pre-barrier region at zero potential. In each model the shunted boundary conditions cause the matrix equation to have only zeros in the right-hand column vector. Thus, the determinant for each matrix must be zero for a non-trivial solution. The determinant for each model contains only the parameters (e.g. the length of the shunt, the length and height of the barrier, and the electron energy). Thus, the complete set of solutions for each model is obtained by using algebra to determine all of the points in the parameter space, and then to calculate the coefficients for each model. Any path from one point to another in the parameter space corresponds to a possible history for the operation of the model. In prototypes the shunts could be filaments of certain metals that provide quasi-coherent electron transport over mean-free paths of tens of nm. Quasi-static simulations of the time-independent Schrodinger equation suggest that a device with a size of 100 nm could operate at frequencies up to 1,000 THz.	翻訳日:2024-04-23 19:10:11 公開日:2024-04-20
# Gaussian-class Activation Mapping Explainer を用いたオブジェクト検出の効率的かつ簡潔な説明 Efficient and Concise Explanations for Object Detection with Gaussian-Class Activation Mapping Explainer ( http://arxiv.org/abs/2404.13417v1 ) ライセンス: Link先を確認	Quoc Khanh Nguyen, Truong Thanh Hung Nguyen, Vo Thanh Khang Nguyen, Van Binh Truong, Tuong Phan, Hung Cao,	(参考訳) オブジェクト検出モデルに対する説明可能なAI(XAI)において、迅速かつ妥当な説明を提供することの課題に対処するため、G-CAME(Gaussian Class Activation Mapping Explainer)を紹介した。提案手法は,選択した層からのアクティベーションマップを利用し,ガウスカーネルを適用して予測対象に対する重要な画像領域を強調することにより,高精度なサリエンシマップを効率よく生成する。他のリージョンベースのアプローチと比較して、G-CAMEは、品質を損なうことなく説明時間を0.5秒に大幅に短縮する。我々は, MS-COCO 2017データセット上で, Faster-RCNN と YOLOX を用いた G-CAME の評価を行った。 To address the challenges of providing quick and plausible explanations in Explainable AI (XAI) for object detection models, we introduce the Gaussian Class Activation Mapping Explainer (G-CAME). Our method efficiently generates concise saliency maps by utilizing activation maps from selected layers and applying a Gaussian kernel to emphasize critical image regions for the predicted object. Compared with other Region-based approaches, G-CAME significantly reduces explanation time to 0.5 seconds without compromising the quality. Our evaluation of G-CAME, using Faster-RCNN and YOLOX on the MS-COCO 2017 dataset, demonstrates its ability to offer highly plausible and faithful explanations, especially in reducing the bias on tiny object detection.	翻訳日:2024-04-23 19:00:27 公開日:2024-04-20
# NeurCADRecon:ゼロガウス曲率によるCAD表面再構成のためのニューラル表現 NeurCADRecon: Neural Representation for Reconstructing CAD Surfaces by Enforcing Zero Gaussian Curvature ( http://arxiv.org/abs/2404.13420v1 ) ライセンス: Link先を確認	Qiujie Dong, Rui Xu, Pengfei Wang, Shuangmin Chen, Shiqing Xin, Xiaohong Jia, Wenping Wang, Changhe Tu,	(参考訳) ニューラルサイン付き距離関数(SDF)を用いた有機モデル再構成の最近の進歩にもかかわらず、CADモデルを低品質の無向点雲から直接高忠実に再構成することは大きな課題である。本稿では,CADモデルの表面が概ね片方向のパッチで構成されており,それぞれが特徴線周辺でもほぼ展開可能であるという先行観測に基づいて,この問題に対処する。提案手法はNeurCADReconという自己監督型であり,その損失にはガウス曲率を0に向けて促進し,入力点への忠実度を確保できる発展性項を含む。ガウス曲率が先端点でゼロでないことに注意し、これらの先端点の存在を許容する二重トラフ曲線を導入する。さらに,与えられた点が不完全あるいは疎小である状況に対処するための動的サンプリング戦略を開発する。得られた神経SDFは、鋭い特徴点/線をはっきりと表すことができるため、特徴整列メッシュをSDFから抽出し、滑らかな表面パッチに分解し、パラメトリックCAD設計の回収の難しさを大幅に軽減することができる。既存の最先端手法と総合的に比較すると,忠実CAD形状の再構築において,我々のアプローチの顕著な利点が示される。 Despite recent advances in reconstructing an organic model with the neural signed distance function (SDF), the high-fidelity reconstruction of a CAD model directly from low-quality unoriented point clouds remains a significant challenge. In this paper, we address this challenge based on the prior observation that the surface of a CAD model is generally composed of piecewise surface patches, each approximately developable even around the feature line. Our approach, named NeurCADRecon, is self-supervised, and its loss includes a developability term to encourage the Gaussian curvature toward 0 while ensuring fidelity to the input points. Noticing that the Gaussian curvature is non-zero at tip points, we introduce a double-trough curve to tolerate the existence of these tip points. Furthermore, we develop a dynamic sampling strategy to deal with situations where the given points are incomplete or too sparse. Since our resulting neural SDFs can clearly manifest sharp feature points/lines, one can easily extract the feature-aligned triangle mesh from the SDF and then decompose it into smooth surface patches, greatly reducing the difficulty of recovering the parametric CAD design. A comprehensive comparison with existing state-of-the-art methods shows the significant advantage of our approach in reconstructing faithful CAD shapes.	翻訳日:2024-04-23 19:00:27 公開日:2024-04-20
# 多元学習:分散学習を用いた包括的非IIDデータ処理 MultiConfederated Learning: Inclusive Non-IID Data handling with Decentralized Federated Learning ( http://arxiv.org/abs/2404.13421v1 ) ライセンス: Link先を確認	Michael Duchesne, Kaiwen Zhang, Chamseddine Talhi,	(参考訳) Federated Learning(FL)は、機密臨床機械学習のようなユースケースを可能にするための、プライバシー保護のための重要なテクニックとして登場した。 FLはデータを所有するリモートデバイスによってトレーニングされたモデルを集約することで動作する。これにより、FLは、プライバシーを損なうことなく、多数の学習者のクラウドソースデータを用いて、強力なグローバルモデルのトレーニングを可能にする。しかし、集約サーバはグローバルモデルを生成する際の単一障害点である。さらに、モデルの性能は、データが独立ではなく、すべてのリモートデバイス上で同一に分散された(非IIDデータ)ときに悩まされる。これにより、非常に異なるモデルが集約され、特定のシナリオで最大50%パフォーマンスが低下する可能性がある。本稿では,FLの利点を維持しつつ,上記の課題に対処する。非IIDデータを扱うために設計された分散FLフレームワークであるMultiConfederated Learningを提案する。従来のFLとは異なり、MultiConfederated Learningは複数のモデルを(単一のグローバルモデルではなく)並列に維持し、データがIIDでないときの収束を支援する。トランスファーラーニングの助けを借りて、学習者はより少ないモデルに収束できる。適応性を高めるために、学習者は仲間からどの更新を集計するかを選択することができる。 Federated Learning (FL) has emerged as a prominent privacy-preserving technique for enabling use cases like confidential clinical machine learning. FL operates by aggregating models trained by remote devices which owns the data. Thus, FL enables the training of powerful global models using crowd-sourced data from a large number of learners, without compromising their privacy. However, the aggregating server is a single point of failure when generating the global model. Moreover, the performance of the model suffers when the data is not independent and identically distributed (non-IID data) on all remote devices. This leads to vastly different models being aggregated, which can reduce the performance by as much as 50% in certain scenarios. In this paper, we seek to address the aforementioned issues while retaining the benefits of FL. We propose MultiConfederated Learning: a decentralized FL framework which is designed to handle non-IID data. Unlike traditional FL, MultiConfederated Learning will maintain multiple models in parallel (instead of a single global model) to help with convergence when the data is non-IID. With the help of transfer learning, learners can converge to fewer models. In order to increase adaptability, learners are allowed to choose which updates to aggregate from their peers.	翻訳日:2024-04-23 19:00:27 公開日:2024-04-20
# PIPER:Hindsight Relabelingによるプリミティブインフォームド推論に基づく階層的強化学習 PIPER: Primitive-Informed Preference-based Hierarchical Reinforcement Learning via Hindsight Relabeling ( http://arxiv.org/abs/2404.13423v1 ) ライセンス: Link先を確認	Utsav Singh, Wesley A. Suttle, Brian M. Sadler, Vinay P. Namboodiri, Amrit Singh Bedi,	(参考訳) 本研究では,プライオリティベース学習を応用して報酬モデルを学習する手法であるHindsight Relabelingを用いたPrimitive-Informed Preference-based Hierarchical reinforcement Learning(PIPER)を紹介する。この報酬は、プリミティブな振る舞いの影響を受けないため、既存の階層的アプローチに共通する非定常性を緩和し、様々な難解なスパース・リワードタスクにおける印象的なパフォーマンスを示すことができる。人間のフィードバックを得るのは通常実用的ではないため、環境から得られる疎い報酬を用いてフィードバックを生成するプリミティブ・イン・ザ・ループ・アプローチに置き換えることを提案する。さらに,実現不可能なサブゴール予測を防止し,解の退化を回避するために,より高レベルなポリシーを条件として,低レベルなポリシーのための実行可能なサブゴールを生成するプリミティブインフォームド正規化を提案する。我々は、PIPERが階層的強化学習において非定常性を緩和し、困難でスパース・リワードなロボット環境において50$\%以上の成功率を達成することを示すための広範な実験を行った。 In this work, we introduce PIPER: Primitive-Informed Preference-based Hierarchical reinforcement learning via Hindsight Relabeling, a novel approach that leverages preference-based learning to learn a reward model, and subsequently uses this reward model to relabel higher-level replay buffers. Since this reward is unaffected by lower primitive behavior, our relabeling-based approach is able to mitigate non-stationarity, which is common in existing hierarchical approaches, and demonstrates impressive performance across a range of challenging sparse-reward tasks. Since obtaining human feedback is typically impractical, we propose to replace the human-in-the-loop approach with our primitive-in-the-loop approach, which generates feedback using sparse rewards provided by the environment. Moreover, in order to prevent infeasible subgoal prediction and avoid degenerate solutions, we propose primitive-informed regularization that conditions higher-level policies to generate feasible subgoals for lower-level policies. We perform extensive experiments to show that PIPER mitigates non-stationarity in hierarchical reinforcement learning and achieves greater than 50$\%$ success rates in challenging, sparse-reward robotic environments, where most other baselines fail to achieve any significant progress.	翻訳日:2024-04-23 19:00:27 公開日:2024-04-20
# AdvLoRA:視覚言語モデルの逆低ランク適応 AdvLoRA: Adversarial Low-Rank Adaptation of Vision-Language Models ( http://arxiv.org/abs/2404.13425v1 ) ライセンス: Link先を確認	Yuheng Ji, Yue Liu, Zhicheng Zhang, Zhao Zhang, Yuting Zhao, Gang Zhou, Xingwei Zhang, Xinwang Liu, Xiaolong Zheng,	(参考訳) VLM(Vision-Language Models)は、人工知能(AGI)において重要な技術である。 AGIの急速な成長に伴い、セキュリティ問題はVLMにとって最も重要な課題の1つとなった。本稿では,広範にわたる実験を通じて,従来のVLMの適応手法の脆弱性を実証する。さらに、VLMのサイズが大きくなるにつれて、従来のVLMへの逆適応技術の実行により、計算コストが高くなる。これらの問題を解決するために、パラメータ効率の高い \underline{Adv}ersarial adaptation methodである \underline{AdvLoRA}w-\underline{R}ank \underline{A}daptationを提案する。まず, VLMの対角適応における本質的な低ランク特性について検討し, 明らかにした。 LoRAと異なり、パラメータクラスタリングとパラメータアライメントに基づく新しい再パラメータ化法を設計することにより、対向適応の効率性と堅牢性を向上させる。さらに、ロバスト性をさらに向上するため、適応パラメータ更新戦略を提案する。これらの設定により,提案したAdvLoRAはモデルセキュリティと高資源廃棄物問題を軽減する。大規模な実験はAdvLoRAの有効性と効率を実証している。 Vision-Language Models (VLMs) are a significant technique for Artificial General Intelligence (AGI). With the fast growth of AGI, the security problem become one of the most important challenges for VLMs. In this paper, through extensive experiments, we demonstrate the vulnerability of the conventional adaptation methods for VLMs, which may bring significant security risks. In addition, as the size of the VLMs increases, performing conventional adversarial adaptation techniques on VLMs results in high computational costs. To solve these problems, we propose a parameter-efficient \underline{Adv}ersarial adaptation method named \underline{AdvLoRA} by \underline{Lo}w-\underline{R}ank \underline{A}daptation. At first, we investigate and reveal the intrinsic low-rank property during the adversarial adaptation for VLMs. Different from LoRA, we improve the efficiency and robustness of adversarial adaptation by designing a novel reparameterizing method based on parameter clustering and parameter alignment. In addition, an adaptive parameter update strategy is proposed to further improve the robustness. By these settings, our proposed AdvLoRA alleviates the model security and high resource waste problems. Extensive experiments demonstrate the effectiveness and efficiency of the AdvLoRA.	翻訳日:2024-04-23 19:00:27 公開日:2024-04-20
# Data Privacy Vocabulary (DPV) -- Version 2 Data Privacy Vocabulary (DPV) -- Version 2 ( http://arxiv.org/abs/2404.13426v1 ) ライセンス: Link先を確認	Harshvardhan J. Pandit, Beatriz Esteves, Georg P. Krog, Paul Ryan, Delaram Golpayegani, Julian Flake,	(参考訳) Data Privacy Vocabulary (DPV)は、W3C Data Privacy Vocabularies and Controls Community Group (DPVCG)によって開発された、個人データの処理を記述するための機械可読性、相互運用性、標準ベースの表現の作成を可能にする。また、EUのGDPRのような立法要件をサポートするための特定のアプリケーションを記述するために、DPVの拡張も公開している。 DPVは、W3C ODRLなどの既存の標準と併用し、特定のユースケースやドメインに適応するためにカスタマイズおよび拡張可能な語彙を提供することによって、最先端における重要なニッチを埋める。この記事では、DPVのバージョン2イテレーションについて、その内容、方法論、現在の採用と利用、将来の可能性について説明する。また、さまざまな規制(EUのDGAおよびAI法など)と世界中のコミュニティイニシアチブ(例えばSolid)をサポートするための共通の語彙として機能する上でのDPVの関連性と役割についても説明している。 The Data Privacy Vocabulary (DPV), developed by the W3C Data Privacy Vocabularies and Controls Community Group (DPVCG), enables the creation of machine-readable, interoperable, and standards-based representations for describing the processing of personal data. The group has also published extensions to the DPV to describe specific applications to support legislative requirements such as the EU's GDPR. The DPV fills a crucial niche in the state of the art by providing a vocabulary that can be embedded and used alongside other existing standards such as W3C ODRL, and which can be customised and extended for adapting to specifics of use-cases or domains. This article describes the version 2 iteration of the DPV in terms of its contents, methodology, current adoptions and uses, and future potential. It also describes the relevance and role of DPV in acting as a common vocabulary to support various regulatory (e.g. EU's DGA and AI Act) and community initiatives (e.g. Solid) emerging across the globe.	翻訳日:2024-04-23 19:00:27 公開日:2024-04-20
# テキスト依存型話者検証(TdSV)チャレンジ2024:課題評価計画 Text-dependent Speaker Verification (TdSV) Challenge 2024: Challenge Evaluation Plan ( http://arxiv.org/abs/2404.13428v1 ) ライセンス: Link先を確認	Zeinali Hossein, Lee Kong Aik, Alam Jahangir, Burget Lukas,	(参考訳) 本論文では,テキスト依存型話者検証(TdSV)チャレンジ2024について概説する。この課題の第一の目的は、参加者が単一で競争的なシステムを開発し、徹底的な分析を行い、テキストに依存した話者検証のためのマルチタスク学習、自己教師付き学習、少数ショット学習などの革新的な概念を探求することである。 This document outlines the Text-dependent Speaker Verification (TdSV) Challenge 2024, which centers on analyzing and exploring novel approaches for text-dependent speaker verification. The primary goal of this challenge is to motive participants to develop single yet competitive systems, conduct thorough analyses, and explore innovative concepts such as multi-task learning, self-supervised learning, few-shot learning, and others, for text-dependent speaker verification.	翻訳日:2024-04-23 19:00:27 公開日:2024-04-20
# React-OT:化学反応における遷移状態生成のための最適輸送 React-OT: Optimal Transport for Generating Transition State in Chemical Reactions ( http://arxiv.org/abs/2404.13430v1 ) ライセンス: Link先を確認	Chenru Duan, Guan-Horng Liu, Yuanqi Du, Tianrong Chen, Qiyuan Zhao, Haojun Jia, Carla P. Gomes, Evangelos A. Theodorou, Heather J. Kulik,	(参考訳) 遷移状態 (TSs) は反応機構や設計触媒を理解する上で重要な過渡構造であるが、実験で捉えることは困難である。あるいは、TSを計算的に検索するために多くの最適化アルゴリズムが開発されている。しかし、量子化学法(通常は密度汎関数理論)によって駆動されるこれらのアルゴリズムのコストは依然として高く、反応探索のための大規模な反応ネットワークの構築においてその応用に課題を提起している。ここでは反応物や生成物から独自のTS構造を生成するための最適な輸送手法であるReact-OTを開発した。 React-OT は 0.053{\AA} の中央構造根平均平方偏差 (RMSD) と1.06 kcal/mol の障壁高さ誤差を持つ高度に正確なTS 構造を生成する。 RMSDとバリア高さの誤差は、理論レベルが低いGFN2-xTBの反応データセット上でReact-OTを前訓練することで、約25%改善される。我々は、未知の機構を持つ化学反応を探索する際に、TSを標的としたReact-OTの高精度かつ高速な推論を想定する。 Transition states (TSs) are transient structures that are key in understanding reaction mechanisms and designing catalysts but challenging to be captured in experiments. Alternatively, many optimization algorithms have been developed to search for TSs computationally. Yet the cost of these algorithms driven by quantum chemistry methods (usually density functional theory) is still high, posing challenges for their applications in building large reaction networks for reaction exploration. Here we developed React-OT, an optimal transport approach for generating unique TS structures from reactants and products. React-OT generates highly accurate TS structures with a median structural root mean square deviation (RMSD) of 0.053{\AA} and median barrier height error of 1.06 kcal/mol requiring only 0.4 second per reaction. The RMSD and barrier height error is further improved by roughly 25% through pretraining React-OT on a large reaction dataset obtained with a lower level of theory, GFN2-xTB. We envision the great accuracy and fast inference of React-OT useful in targeting TSs when exploring chemical reactions with unknown mechanisms.	翻訳日:2024-04-23 19:00:27 公開日:2024-04-20
# Nested-TNT:マルチスケール特徴処理を備えた階層型視覚変換器 Nested-TNT: Hierarchical Vision Transformers with Multi-Scale Feature Processing ( http://arxiv.org/abs/2404.13434v1 ) ライセンス: Link先を確認	Yuang Liu, Zhiheng Qiu, Xiaokai Qin,	(参考訳) トランスフォーマーは、自然言語処理における優れた性能、従来の畳み込みニューラルネットワークを超え、新しい最先端技術を達成するため、コンピュータビジョンの分野で応用されている。 ViTは画像を「視覚文」と呼ばれるいくつかの局所的なパッチに分割する。しかし、画像に含まれる情報は巨大で複雑であり、「視覚文」レベルでのみ特徴に焦点を当てるだけでは不十分である。ローカルパッチ間の機能についても考慮する必要がある。さらに改良するために,TNTモデルを提案し,そのアルゴリズムにより,より正確な結果が得られるように,より小さなパッチ,すなわち視覚的単語に分割する。 Transformerの中核はマルチヘッドアテンション機構であり、従来のアテンションメカニズムは異なるアテンションヘッド間のインタラクションを無視している。冗長性を低減し、利用率を向上させるため、ネストアルゴリズムを導入し、画像分類タスクにNested-TNTを適用した。この実験は、提案したモデルが、データセットCIFAR10では2.25%、データセットFLOWERS102では2.78%、0.25%を上回る、ViTとTNTよりも優れた分類性能を達成したことを確認した。 Transformer has been applied in the field of computer vision due to its excellent performance in natural language processing, surpassing traditional convolutional neural networks and achieving new state-of-the-art. ViT divides an image into several local patches, known as "visual sentences". However, the information contained in the image is vast and complex, and focusing only on the features at the "visual sentence" level is not enough. The features between local patches should also be taken into consideration. In order to achieve further improvement, the TNT model is proposed, whose algorithm further divides the image into smaller patches, namely "visual words," achieving more accurate results. The core of Transformer is the Multi-Head Attention mechanism, and traditional attention mechanisms ignore interactions across different attention heads. In order to reduce redundancy and improve utilization, we introduce the nested algorithm and apply the Nested-TNT to image classification tasks. The experiment confirms that the proposed model has achieved better classification performance over ViT and TNT, exceeding 2.25%, 1.1% on dataset CIFAR10 and 2.78%, 0.25% on dataset FLOWERS102 respectively.	翻訳日:2024-04-23 19:00:27 公開日:2024-04-20
# 奥行き誘導型ニューラルサーフェスを用いた高忠実度内視鏡画像合成 High-fidelity Endoscopic Image Synthesis by Utilizing Depth-guided Neural Surfaces ( http://arxiv.org/abs/2404.13437v1 ) ライセンス: Link先を確認	Baoru Huang, Yida Wang, Anh Nguyen, Daniel Elson, Francisco Vasconcelos, Danail Stoyanov,	(参考訳) 外科腫瘍学において、大腸内視鏡検査は生検などの診断補助の提供、特にポリープ検出において外科的ナビゲーションの促進に重要な役割を果たしている。近年,コンピュータによる内視鏡手術が注目され,カメラのローカライゼーション,深度推定,表面再構成など,様々な3Dコンピュータビジョン技術が融合している。 NeRF(Neural Radiance Fields)とNeuS(Neural Implicit Surfaces)が登録画像から正確な3次元表面モデルを導出するための有望な手法として登場し、制約されたカメラの動きから生じる既存の大腸再建アプローチの限界に対処している。しかし, 単眼大腸内視鏡画像再構成では, 組織テクスチャの表現が不十分であり, スケールの混乱が最終レンダリングの進行を妨げている。本稿では,1フレームの深度マップで補足した内視鏡画像に応用したNeuSを応用して,大腸部分再建のための新しい手法を提案する。特に,1フレームの深度マップのみをフォトリアリスティックな再構成とニューラルレンダリングの応用に利用し,この1フレームの深度マップをオブジェクトスケールの他の単眼深度推定ネットワークから容易に得ることができるようにした。ファントム画像に対する厳密な実験と検証により, 大腸領域を完全にレンダリングし, 表面の見えない部分を捕捉するなど, 異常な精度が得られた。このブレークスルーは、安定的で一貫してスケールされた再建を達成するための道を開き、がんスクリーニングの手順と治療介入の質を高めることを約束する。 In surgical oncology, screening colonoscopy plays a pivotal role in providing diagnostic assistance, such as biopsy, and facilitating surgical navigation, particularly in polyp detection. Computer-assisted endoscopic surgery has recently gained attention and amalgamated various 3D computer vision techniques, including camera localization, depth estimation, surface reconstruction, etc. Neural Radiance Fields (NeRFs) and Neural Implicit Surfaces (NeuS) have emerged as promising methodologies for deriving accurate 3D surface models from sets of registered images, addressing the limitations of existing colon reconstruction approaches stemming from constrained camera movement. However, the inadequate tissue texture representation and confused scale problem in monocular colonoscopic image reconstruction still impede the progress of the final rendering results. In this paper, we introduce a novel method for colon section reconstruction by leveraging NeuS applied to endoscopic images, supplemented by a single frame of depth map. Notably, we pioneered the exploration of utilizing only one frame depth map in photorealistic reconstruction and neural rendering applications while this single depth map can be easily obtainable from other monocular depth estimation networks with an object scale. Through rigorous experimentation and validation on phantom imagery, our approach demonstrates exceptional accuracy in completely rendering colon sections, even capturing unseen portions of the surface. This breakthrough opens avenues for achieving stable and consistently scaled reconstructions, promising enhanced quality in cancer screening procedures and treatment interventions.	翻訳日:2024-04-23 19:00:27 公開日:2024-04-20
# コロナニュースのための微細粒状タンパク質 Fine-Grained Named Entities for Corona News ( http://arxiv.org/abs/2404.13439v1 ) ライセンス: Link先を確認	Sefika Efeoglu, Adrian Paschke,	(参考訳) 新聞などの情報資源は、2019年12月以降、コロナの流行に関連するさまざまな言語で、構造化されていないテキストデータを生み出している。これらの非構造化テキストの分析は、構造化フォーマットで表現することなく、時間を要するため、構造化フォーマットで表現することが不可欠である。この目標を達成するための重要なタスクであるエンティティタグ付けと関係抽出を備えた情報抽出パイプラインは、これらのテキストに適用できるかもしれない。本研究では,ジェネリックおよびドメイン固有のエンティティを含むコロナニュース記事からトレーニングデータを生成するためのデータアノテーションパイプラインを提案する。名前付きエンティティ認識モデルは、この注釈付きコーパスに基づいてトレーニングされ、訓練されたモデルの性能を評価するドメインの専門家によって手動で注釈付けされたテスト文に基づいて評価される。コードベースとデモはhttps://github.com/sefeoglu/coronanews-ner.git.comで公開されている。 Information resources such as newspapers have produced unstructured text data in various languages related to the corona outbreak since December 2019. Analyzing these unstructured texts is time-consuming without representing them in a structured format; therefore, representing them in a structured format is crucial. An information extraction pipeline with essential tasks -- named entity tagging and relation extraction -- to accomplish this goal might be applied to these texts. This study proposes a data annotation pipeline to generate training data from corona news articles, including generic and domain-specific entities. Named entity recognition models are trained on this annotated corpus and then evaluated on test sentences manually annotated by domain experts evaluating the performance of a trained model. The code base and demonstration are available at https://github.com/sefeoglu/coronanews-ner.git.	翻訳日:2024-04-23 19:00:27 公開日:2024-04-20
# オンデマンドマルチホットスポット熱管理のための機械学習支援熱電冷却 Machine Learning-Assisted Thermoelectric Cooling for On-Demand Multi-Hotspot Thermal Management ( http://arxiv.org/abs/2404.13441v1 ) ライセンス: Link先を確認	Jiajian Luo, Jaeho Lee,	(参考訳) システム・オン・チップ(SoC)技術の急速な出現は、システムに空間的および時間的進化を伴う複数の動的ホットスポットを導入し、オンデマンド熱管理を実現するためにより効率的で洗練されたインテリジェントなアプローチを必要とする。本研究では,全領域にわたるリアルタイムマルチホットスポット条件に基づいてTECユニットを個別に制御することにより,大域的最適温度を実現することのできる,熱電冷却器(TEC)の機械学習支援最適化アルゴリズムを提案する。開始モジュールを持つ畳み込みニューラルネットワーク(CNN)は、システムの基盤となる熱-電気物理の結合を理解し、TECと非接触で正確な温度予測を行うように訓練されている。受動的熱勾配, ペルチェ効果, ジュール効果の複雑な相互作用により, 局所最適TEC制御は空間温度トレードオフを経験し, 大域的最適解には至らない。この問題に対処するために、バックトラックに基づく最適化アルゴリズムが設計された機械学習モデルを用いて開発され、グローバルな最適解を達成するために可能なTECの割り当てを反復する。 NHSホットスポット(n, m 以下 10 以下 NHS 以下 20 未満)を持つ n 個の任意の m に対して,我々のアルゴリズムは,地球規模の最適温度と対応する TEC アレイ制御を平均 1.07 秒で提供し,その裏側温度予測を反復的に行うことができる。これは、約18分を要する従来のFEM戦略と比較して4桁以上のスピードアップを示している。 The rapid emergence of System-on-Chip (SoC) technology introduces multiple dynamic hotspots with spatial and temporal evolution to the system, necessitating a more efficient, sophisticated, and intelligent approach to achieve on-demand thermal management. In this study, we present a novel machine learning-assisted optimization algorithm for thermoelectric coolers (TECs) that can achieve global optimal temperature by individually controlling TEC units based on real-time multi-hotspot conditions across the entire domain. A convolutional neural network (CNN) with inception module is trained to comprehend the coupled thermal-electrical physics underlying the system and attain accurate temperature predictions with and without TECs. Due to the intricate interaction among passive thermal gradient, Peltier effect and Joule effect, a local optimal TEC control experiences spatial temperature trade-off which may not lead to a global optimal solution. To address this issue, a backtracking-based optimization algorithm is developed using the designed machine learning model to iterate all possible TEC assignments for attaining global optimal solutions. For arbitrary m by n matrix with NHS hotspots (n, m less than 10 and NHS less than 20), our algorithm is capable of providing global optimal temperature and its corresponding TEC array control in an average of 1.07 second while iterating through tens of temperature predictions behind-the-scenes. This represents a speed increase of over four orders of magnitude compared to traditional FEM strategies which take approximately 18 minutes.	翻訳日:2024-04-23 19:00:27 公開日:2024-04-20
# FisheyeDetNet:自動走行のための魚眼カメラシステムにおける物体検出 FisheyeDetNet: Object Detection on Fisheye Surround View Camera Systems for Automated Driving ( http://arxiv.org/abs/2404.13443v1 ) ライセンス: Link先を確認	Ganesh Sistu, Senthil Yogamani,	(参考訳) 物体検出は自律走行における成熟した問題であり、歩行者検出は最初に展開されたアルゴリズムの1つである。文学において総合的に研究されている。しかし、近距離場センシングのサラウンドビューに使用される魚眼カメラでは、物体検出は比較的少ない。標準的なバウンディングボックスの表現は、特に周辺部において重い放射歪みのため、魚眼カメラでは失敗する。これを軽減するために、バウンディングボックスの標準オブジェクト検出出力表現の拡張について検討する。我々は、回転する有界箱、楕円、ポリゴンを極弧/角表現として設計し、これらの表現を分析するためにインスタンスセグメンテーションmIOUメートル法を定義する。提案したモデルであるPhiteeyeDetNetは他より優れており、自動走行用Valeo fisheye around-viewデータセットのmAPスコアは49.5 %である。このデータセットは、ヨーロッパ、北米、アジアにまたがる4つのサラウンドビューカメラから撮影された60万枚の画像である。私たちの知る限りでは、これは自動走行シナリオのための魚眼カメラによる物体検出に関する初めての詳細な研究である。 Object detection is a mature problem in autonomous driving with pedestrian detection being one of the first deployed algorithms. It has been comprehensively studied in the literature. However, object detection is relatively less explored for fisheye cameras used for surround-view near field sensing. The standard bounding box representation fails in fisheye cameras due to heavy radial distortion, particularly in the periphery. To mitigate this, we explore extending the standard object detection output representation of bounding box. We design rotated bounding boxes, ellipse, generic polygon as polar arc/angle representations and define an instance segmentation mIOU metric to analyze these representations. The proposed model FisheyeDetNet with polygon outperforms others and achieves a mAP score of 49.5 % on Valeo fisheye surround-view dataset for automated driving applications. This dataset has 60K images captured from 4 surround-view cameras across Europe, North America and Asia. To the best of our knowledge, this is the first detailed study on object detection on fisheye cameras for autonomous driving scenarios.	翻訳日:2024-04-23 19:00:27 公開日:2024-04-20
# DMesh: 汎用メッシュの差別化可能な表現 DMesh: A Differentiable Representation for General Meshes ( http://arxiv.org/abs/2404.13445v1 ) ライセンス: Link先を確認	Sanghyun Son, Matheus Gadelha, Yang Zhou, Zexiang Xu, Ming C. Lin, Yi Zhou,	(参考訳) 一般的な3次元三角形メッシュに対して微分可能表現 DMesh を提案する。 DMeshはメッシュの幾何学情報と接続情報の両方を考慮する。我々の設計では、まず、重み付きデラウネー三角測量(WDT)に基づく領域をコンパクトにテセルレートする凸テトラヘドラの集合を取得し、WDTに基づく微分可能な方法で、所望のメッシュ上に存在する顔の確率を定式化する。これにより、DMeshは様々なトポロジのメッシュを微分可能な方法で表現することができ、勾配に基づく最適化を用いて、ポイントクラウドやマルチビューイメージなど、さまざまな観測の下でメッシュを再構築することができる。ソースコードと全文は、https://sonsang.github.io/dmesh-project.orgで入手できる。 We present a differentiable representation, DMesh, for general 3D triangular meshes. DMesh considers both the geometry and connectivity information of a mesh. In our design, we first get a set of convex tetrahedra that compactly tessellates the domain based on Weighted Delaunay Triangulation (WDT), and formulate probability of faces to exist on our desired mesh in a differentiable manner based on the WDT. This enables DMesh to represent meshes of various topology in a differentiable way, and allows us to reconstruct the mesh under various observations, such as point cloud and multi-view images using gradient-based optimization. The source code and full paper is available at: https://sonsang.github.io/dmesh-project.	翻訳日:2024-04-23 19:00:27 公開日:2024-04-20
# SiNC+: 周期的信号の教師なし学習による適応型カメラベースバイタル SiNC+: Adaptive Camera-Based Vitals with Unsupervised Learning of Periodic Signals ( http://arxiv.org/abs/2404.13449v1 ) ライセンス: Link先を確認	Jeremy Speth, Nathan Vance, Patrick Flynn, Adam Czajka,	(参考訳) 血液量の脈拍や呼吸のようなサブトル周期的な信号は、RGBビデオから抽出することができ、非接触型健康モニタリングを低コストで実現する。遠隔パルス推定(rPPG)の進歩は、現在ディープラーニングソリューションによって推進されている。しかし、現代のアプローチは、コンタクト-PPGセンサーから基礎的真理を持つベンチマークデータセットで訓練され、評価される。本稿では,ラベル付きビデオデータの必要性を軽減するために,信号回帰のための非コントラスト非教師付き学習フレームワークを提案する。周期性と有限帯域幅の仮定は最小限であり,本手法は非競合ビデオから直接血液体積パルスを検出する。周期的な信号の視覚的特徴を学習するには,正常な生理的帯域内でのスパースパワースペクトルの促進と,パワースペクトルのバッチによるばらつきが十分であることがわかった。我々は、RPPGで特別に作成されていない非競合ビデオデータを用いて、ロバストパルス速度推定器を訓練する実験を行った。誘導バイアスが限られていることから,ターゲット信号の帯域幅を変化させることで,カメラによる呼吸に同じアプローチを適用できた。この手法は、異なる領域からの帯域制限された準周期信号の教師なし学習に十分であることを示す。さらに,本フレームワークは,単一主題からの映像のモデルを微調整するのに有効であり,パーソナライズされた適応的な信号レグレシタを実現することができることを示す。 Subtle periodic signals, such as blood volume pulse and respiration, can be extracted from RGB video, enabling noncontact health monitoring at low cost. Advancements in remote pulse estimation -- or remote photoplethysmography (rPPG) -- are currently driven by deep learning solutions. However, modern approaches are trained and evaluated on benchmark datasets with ground truth from contact-PPG sensors. We present the first non-contrastive unsupervised learning framework for signal regression to mitigate the need for labelled video data. With minimal assumptions of periodicity and finite bandwidth, our approach discovers the blood volume pulse directly from unlabelled videos. We find that encouraging sparse power spectra within normal physiological bandlimits and variance over batches of power spectra is sufficient for learning visual features of periodic signals. We perform the first experiments utilizing unlabelled video data not specifically created for rPPG to train robust pulse rate estimators. Given the limited inductive biases, we successfully applied the same approach to camera-based respiration by changing the bandlimits of the target signal. This shows that the approach is general enough for unsupervised learning of bandlimited quasi-periodic signals from different domains. Furthermore, we show that the framework is effective for finetuning models on unlabelled video from a single subject, allowing for personalized and adaptive signal regressors.	翻訳日:2024-04-23 19:00:27 公開日:2024-04-20
# Cut-FUNQUE:圧縮トーンマップ高ダイナミックレンジ映像の客観的品質モデル Cut-FUNQUE: An Objective Quality Model for Compressed Tone-Mapped High Dynamic Range Videos ( http://arxiv.org/abs/2404.13452v1 ) ライセンス: Link先を確認	Abhinau K. Venkataramanan, Cosmin Stejerean, Ioannis Katsavounidis, Hassene Tmar, Alan C. Bovik,	(参考訳) ハイダイナミックレンジ(HDR)ビデオは、標準ダイナミックレンジ(SDR)ビデオよりも幅広いコントラストと色を表現できるため、近年人気が高まっている。 HDRビデオキャプチャは、Apple iPhone、Google Pixel、Samsung Galaxyなどの最近のフラッグシップ携帯電話によって人気が高まっているが、多くの消費者は依然としてHDRビデオを表示できないレガシーなSDRディスプレイを使用している。その結果、HDRビデオは、SDR対応ビデオコンシューマの大部分にストリーミングする前に、トーンマップで処理されなければならない。しかし、サーバ側のトーンマッピングは、トーンマッピング演算子(TMO)とそのパラメータの選択に関する決定を自動化し、高忠実度出力を生成する。さらに、これらの選択は、ストリーミングシナリオにおいてユビキタスであるロッキー圧縮の影響とバランスをとらなければならない。本研究では,音色マップと圧縮HDR映像の視覚的品質を正確に予測できる,カットファンキュー(Cut-FUNQUE)という,客観的な映像品質の新規かつ効率的なモデルを開発した。最後に,このようなビデオの大規模クラウドソースデータベース上でCut-FUNQUEを評価し,最先端の精度を実現することを示す。 High Dynamic Range (HDR) videos have enjoyed a surge in popularity in recent years due to their ability to represent a wider range of contrast and color than Standard Dynamic Range (SDR) videos. Although HDR video capture has seen increasing popularity because of recent flagship mobile phones such as Apple iPhones, Google Pixels, and Samsung Galaxy phones, a broad swath of consumers still utilize legacy SDR displays that are unable to display HDR videos. As result, HDR videos must be processed, i.e., tone-mapped, before streaming to a large section of SDR-capable video consumers. However, server-side tone-mapping involves automating decisions regarding the choices of tone-mapping operators (TMOs) and their parameters to yield high-fidelity outputs. Moreover, these choices must be balanced against the effects of lossy compression, which is ubiquitous in streaming scenarios. In this work, we develop a novel, efficient model of objective video quality named Cut-FUNQUE that is able to accurately predict the visual quality of tone-mapped and compressed HDR videos. Finally, we evaluate Cut-FUNQUE on a large-scale crowdsourced database of such videos and show that it achieves state-of-the-art accuracy.	翻訳日:2024-04-23 18:50:40 公開日:2024-04-20
# システムの信頼性を革新する - 予測保守戦略におけるAIの役割 Revolutionizing System Reliability: The Role of AI in Predictive Maintenance Strategies ( http://arxiv.org/abs/2404.13454v1 ) ライセンス: Link先を確認	Michael Bidollahkhani, Julian M. Kunkel,	(参考訳) 分散システムにおけるメンテナンスの展望は、人工知能(AI)の統合によって急速に進化している。また、計算継続システムの複雑さが増すにつれて、予測保守(Pd.M.)におけるAIの役割はますます重要になっている。本稿ではPd.Mの現状を概観する。スケーラブルなAI技術の組み合わせに焦点を当てた、コンピューティング連続体。複雑で異質なコンピューティング継続システムに直面した従来のメンテナンスプラクティスの限界を認識したこの研究は、AI、特に機械学習とニューラルネットワークがどのようにPd.Mを強化するために使われているかを探求する。戦略だこの調査は、既存の文献の徹底的なレビューを含み、この分野における重要な進歩、方法論、ケーススタディを強調している。システム障害の予測精度の向上とメンテナンススケジュールの最適化においてAIが果たす役割を批判的に検証し、ダウンタイムの低減とシステムの長寿命化に寄与する。この分野の最新の進歩から成果を合成することにより、AIによる予測保守の実装の有効性と課題に関する洞察を提供する。これは、技術的進歩とコンピュータ連続システムの複雑さの増大に対応して、メンテナンスプラクティスの進化を浮き彫りにしている。この調査から得られた結論は、Pd.M.の現在の景観と今後の方向性を理解する実践者や研究者にとって有効である。分散システムですこの分野における継続的な研究と開発の必要性を強調し、AI時代のよりインテリジェントで効率的で費用対効果の高いメンテナンスソリューションのトレンドを指摘する。 The landscape of maintenance in distributed systems is rapidly evolving with the integration of Artificial Intelligence (AI). Also, as the complexity of computing continuum systems intensifies, the role of AI in predictive maintenance (Pd.M.) becomes increasingly pivotal. This paper presents a comprehensive survey of the current state of Pd.M. in the computing continuum, with a focus on the combination of scalable AI technologies. Recognizing the limitations of traditional maintenance practices in the face of increasingly complex and heterogenous computing continuum systems, the study explores how AI, especially machine learning and neural networks, is being used to enhance Pd.M. strategies. The survey encompasses a thorough review of existing literature, highlighting key advancements, methodologies, and case studies in the field. It critically examines the role of AI in improving prediction accuracy for system failures and in optimizing maintenance schedules, thereby contributing to reduced downtime and enhanced system longevity. By synthesizing findings from the latest advancements in the field, the article provides insights into the effectiveness and challenges of implementing AI-driven predictive maintenance. It underscores the evolution of maintenance practices in response to technological advancements and the growing complexity of computing continuum systems. The conclusions drawn from this survey are instrumental for practitioners and researchers in understanding the current landscape and future directions of Pd.M. in distributed systems. It emphasizes the need for continued research and development in this area, pointing towards a trend of more intelligent, efficient, and cost-effective maintenance solutions in the era of AI.	翻訳日:2024-04-23 18:50:40 公開日:2024-04-20
# 音響近似を用いたニューラルネットワーク動的モデルのリアルタイム安全制御 Real-Time Safe Control of Neural Network Dynamic Models with Sound Approximation ( http://arxiv.org/abs/2404.13456v1 ) ライセンス: Link先を確認	Hanjiang Hu, Jianglin Lan, Changliu Liu,	(参考訳) ニューラルネットワークダイナミックモデル(NNDM)の安全な制御は、ロボット工学や多くの応用において重要である。しかし、NNDMの最適安全制御をリアルタイムに計算することは依然として困難である。実時間計算を実現するために,NNDMの音響近似を制御合成に用いることを提案する。特に、NNDMにおけるReLU活性化関数のBernstein多項式オーバー近似(BPO)に基づくBernstein over-approximated Neural Dynamics(BOND)を提案する。 NNDMのBPO緩和における最も安全でない近似状態を用いて、近似による誤差を軽減し、安全制御問題の持続可能性を確保するために、最悪のケース安全性指標を合成する。オンラインリアルタイム最適化では、非線形最悪の安全制約の1次テイラー近似を、高次残差の l2 境界バイアス項を付加した NNDM の線形層として定式化する。異なるニューラルダイナミクスと安全性制約による総合的な実験により、音近似のNNDMは、MIP(Mixed integer Programming)を用いた安全な制御ベースラインよりも10～100倍高速で、最悪の安全指標の有効性と、提案したBONDのリアルタイム大規模設定におけるスケーラビリティが検証された。 Safe control of neural network dynamic models (NNDMs) is important to robotics and many applications. However, it remains challenging to compute an optimal safe control in real time for NNDM. To enable real-time computation, we propose to use a sound approximation of the NNDM in the control synthesis. In particular, we propose Bernstein over-approximated neural dynamics (BOND) based on the Bernstein polynomial over-approximation (BPO) of ReLU activation functions in NNDM. To mitigate the errors introduced by the approximation and to ensure persistent feasibility of the safe control problems, we synthesize a worst-case safety index using the most unsafe approximated state within the BPO relaxation of NNDM offline. For the online real-time optimization, we formulate the first-order Taylor approximation of the nonlinear worst-case safety constraint as an additional linear layer of NNDM with the l2 bounded bias term for the higher-order remainder. Comprehensive experiments with different neural dynamics and safety constraints show that with safety guaranteed, our NNDMs with sound approximation are 10-100 times faster than the safe control baseline that uses mixed integer programming (MIP), validating the effectiveness of the worst-case safety index and scalability of the proposed BOND in real-time large-scale settings.	翻訳日:2024-04-23 18:50:40 公開日:2024-04-20
# 3ストローク熱機関の微視的最適性能 Optimal performance of a three stroke heat engine in the microscopic regime ( http://arxiv.org/abs/2404.13461v1 ) ライセンス: Link先を確認	Tanmoy Biswas, Chandan Datta,	(参考訳) 本研究では,2つの熱浴間の温度差を利用して作業体を用いた作業を行うストローク式熱エンジンについて検討する。エンジンの作業体は、熱湯から熱を抽出し、作業を生成し、3ストローク連続実行により余剰熱を冷湯に放出することを目的としている。任意次元の作業体を含むエンジンの熱力学の枠組みについて論じる。次に,本エンジンの2段作業体を含む最小バージョンに着目し,最適作業生産と効率のクローズドな表現を行う。さらに,Jaynes-Cummingsモデル内でそのようなエンジンを実現する可能性について検討し,最適作業生産と効率について考察する。最後に、有限次元熱浴を考慮したエンジンのスケールダウンを行い、このシナリオにおける最適作業と効率の式を得る。 In the microscopic regime, we consider a stroke-based heat engine that utilizes the temperature difference between two heat baths to produce work using a working body. The working body of the engine aims to extract heat from the hot heat bath, generate work, and discharge the surplus heat into the cold heat bath through the successive execution of three strokes. We discuss the thermodynamic framework for such an engine that involves any arbitrary-dimensional working body. Next, our investigation focuses on the smallest version of this engine, which involves a two-level working body, resulting in a closed expression for optimal work production and efficiency. Furthermore, we investigate the possibility of realizing such an engine within the Jaynes-Cummings model, gaining insights into optimal work production and efficiency. Finally, we scale down the engine, taking into account finite-dimensional heat baths, and obtain the expressions for optimal work and efficiency in this scenario.	翻訳日:2024-04-23 18:50:40 公開日:2024-04-20
# ハイブリッドな仕事の現実を探る - 疎いグループのソフトウェアプロフェッショナルとのケーススタディ Exploring Hybrid Work Realities: A Case Study with Software Professionals From Underrepresented Groups ( http://arxiv.org/abs/2404.13462v1 ) ライセンス: Link先を確認	Ronnie de Souza Santos, Cleyton Magalhes, Robson Santons, Jorge Correia-Neto,	(参考訳) コンテキスト。パンデミック後の時代には、ソフトウェア専門家は、リモートワークから得られる柔軟性を好んで、オフィスルーチンへの復帰に抵抗した。ハイブリットワーク構造は、その後、ソフトウェア企業内で人気となり、毎日オフィスで働くことを選択でき、柔軟性を保ち、ソフトウェア開発における過小評価グループのサポートの増加など、いくつかのメリットを生み出します。ゴール。パンデミック後のハイブリッド作業において,疎いグループからソフトウェア専門家がどのような経験をしているかを検討した。特に,脳神経障害,LGBTQIA+個人,およびソフトウェア業界で働く障害者の経験を分析した。方法。南アフリカのソフトウェア企業において,不足するグループに着目したケーススタディを行った。結果。ハイブリッドワークは、ポストパンデミック時代の過小評価されたグループからソフトウェアプロフェッショナルに好まれる。アドバンテージには、家庭でのフォーカスの改善、パーソナライズされた仕事のセットアップ、健康治療のための宿泊などがある。孤立とインフラサポートの不十分が懸念され、積極的な組織戦略の必要性が強調される。結論。ハイブリッドワークは、ソフトウェアエンジニアリングにおける多様性と包摂性を促進するための有望な戦略として現れ、従来のオフィス環境の過去の制限に対処する。 Context. In the post-pandemic era, software professionals resist returning to office routines, favoring the flexibility gained from remote work. Hybrid work structures, then, become popular within software companies, allowing them to choose not to work in the office every day, preserving flexibility, and creating several benefits, including an increase in the support for underrepresented groups in software development. Goal. We investigated how software professionals from underrepresented groups are experiencing post-pandemic hybrid work. In particular, we analyzed the experiences of neurodivergents, LGBTQIA+ individuals, and people with disabilities working in the software industry. Method. We conducted a case study focusing on the underrepresented groups within a well-established South American software company. Results. Hybrid work is preferred by software professionals from underrepresented groups in the post-pandemic era. Advantages include improved focus at home, personalized work setups, and accommodation for health treatments. Concerns arise about isolation and inadequate infrastructure support, highlighting the need for proactive organizational strategies. Conclusions. Hybrid work emerges as a promising strategy for fostering diversity and inclusion in software engineering, addressing past limitations of the traditional office environment.	翻訳日:2024-04-23 18:50:40 公開日:2024-04-20
# テストへの道:なぜ女性がソフトウェアテストに参加し続けるのか? Paths to Testing: Why Women Enter and Remain in Software Testing? ( http://arxiv.org/abs/2404.13464v1 ) ライセンス: Link先を確認	Kleice Silva, Ann Barcomb, Ronnie de Souza Santos,	(参考訳) 背景。女性はソフトウェア開発にユニークな問題解決スキルをもたらし、多くの場合、全体論的なアプローチと詳細への注意を好んでいます。ソフトウェアテストでは、専門家が欠陥を特定するためにシステム機能を探究するため、正確さと細部への注意が不可欠である。これらのスキルと女性の強みの一致を認識することは、ソフトウェア工学の多様性を高めるための戦略を導き出すことができる。ゴール。本研究は,ソフトウェアテストにおける女性がキャリアを選択する動機について考察し,現場に留まる理由について考察する。方法。本研究は、ソフトウェア工学ガイドラインに従って、ソフトウェアテストにおける女性からのデータを収集し、そのモチベーション、経験、視点を探索する横断的な調査手法を用いた。発見。調査の結果、女性は入学レベルの就職機会の増加、仕事と生活のバランス、性別のステレオタイプがさらに少ないため、ソフトウェアテストに入ることが明らかとなった。残るモチベーションには、高品質なソフトウェアを提供することの影響、継続的学習の機会、アクティビティが彼らにもたらす課題などがある。しかし、この分野における包括性とキャリア開発は、持続的な多様性を改善する必要がある。結論。これらの発見は、ソフトウェアテストにおける女性の多様なモチベーションの理解と、この理解が、プロフェッショナルな成長を育み、より包括的で平等な産業の展望を生み出す上でいかに重要であるかについて、研究者や実践者に興味深い洞察を与えてくれる。 Background. Women bring unique problem-solving skills to software development, often favoring a holistic approach and attention to detail. In software testing, precision and attention to detail are essential as professionals explore system functionalities to identify defects. Recognizing the alignment between these skills and women's strengths can derive strategies for enhancing diversity in software engineering. Goal. This study investigates the motivations behind women choosing careers in software testing, aiming to provide insights into their reasons for entering and remaining in the field. Method. This study used a cross-sectional survey methodology following established software engineering guidelines, collecting data from women in software testing to explore their motivations, experiences, and perspectives. Findings. The findings reveal that women enter software testing due to increased entry-level job opportunities, work-life balance, and even fewer gender stereotypes. Their motivations to stay include the impact of delivering high-quality software, continuous learning opportunities, and the challenges the activities bring to them. However, inclusiveness and career development in the field need improvement for sustained diversity. Conclusion. Preliminary yet significant, these findings offer interesting insights for researchers and practitioners towards the understanding of women's diverse motivations in software testing and how this understanding is important for fostering professional growth and creating a more inclusive and equitable industry landscape.	翻訳日:2024-04-23 18:50:40 公開日:2024-04-20
# エンティティ認識者はグローバル英語でよく働くか? Do "English" Named Entity Recognizers Work Well on Global Englishes? ( http://arxiv.org/abs/2404.13465v1 ) ライセンス: Link先を確認	Alexander Shan, John Bauer, Riley Carlson, Christopher Manning,	(参考訳) 一般的な英語のエンティティ認識(NER)データセットの大部分は、多くのグローバルな英語の変種が存在するにもかかわらず、アメリカまたはイギリス英語のデータを含んでいる。そのため、グローバルな英語の活用を一般化するかどうかは定かではない。これをテストするために、世界中の低リソースの英語版でNERモデルの性能を分析するために、NewswireデータセットであるWorldwide English NER Datasetを構築した。学習済みの文脈モデルRoBERTaとELECTRAを用いたモデルを含む,広く使用されているNERツールキットとトランスフォーマーモデルを,一般的に使用されている英国のニュースワイヤデータセット,CoNLL 2003,よりアメリカに焦点を当てたデータセットOntoNotes,グローバルデータセットの3つのデータセットで検証した。 CoNLLまたはOntoNotesデータセットでトレーニングされたすべてのモデルは、Worldwide Englishデータセットでテストされたいくつかのケースで、10 F1以上の大幅なパフォーマンス低下を経験しました。アジアと中東は比較的高い性能を示したが,地域別誤差を調べた結果,オセアニアとアフリカでは最大のパフォーマンス低下が見られた。最後に、Worldwideデータセットでトレーニングされた組み合わせモデルと、CoNLLまたはOntoNotesは、両方のテストセットで1-2 F1しか失われていないことが分かりました。 The vast majority of the popular English named entity recognition (NER) datasets contain American or British English data, despite the existence of many global varieties of English. As such, it is unclear whether they generalize for analyzing use of English globally. To test this, we build a newswire dataset, the Worldwide English NER Dataset, to analyze NER model performance on low-resource English variants from around the world. We test widely used NER toolkits and transformer models, including models using the pre-trained contextual models RoBERTa and ELECTRA, on three datasets: a commonly used British English newswire dataset, CoNLL 2003, a more American focused dataset OntoNotes, and our global dataset. All models trained on the CoNLL or OntoNotes datasets experienced significant performance drops-over 10 F1 in some cases-when tested on the Worldwide English dataset. Upon examination of region-specific errors, we observe the greatest performance drops for Oceania and Africa, while Asia and the Middle East had comparatively strong performance. Lastly, we find that a combined model trained on the Worldwide dataset and either CoNLL or OntoNotes lost only 1-2 F1 on both test sets.	翻訳日:2024-04-23 18:50:40 公開日:2024-04-20
# グローバルデジタル民主主義によるグローバルデジタルプラットフォーム構築のための草の根アーキテクチャ A Grassroots Architecture to Supplant Global Digital Platforms by a Global Digital Democracy ( http://arxiv.org/abs/2404.13468v1 ) ライセンス: Link先を確認	Ehud Shapiro,	(参考訳) 我々は、地域デジタルコミュニティの社会的、経済的、市民的、政治的ニーズ、およびそれらの連合を支援するために設計された、草の根と呼ばれるグローバルデジタルプラットフォームに対するアーキテクチャ上の代替案を提示する。 Grassrootsプラットフォームは、地域コミュニティにグローバルデジタルプラットフォームに代わるものを提供し、メンバーのスマートフォンでのみ運用し、ネットワーク自体以外のグローバルリソースを禁止します。このような共同体は、初期資本や外部クレジットなしでデジタル経済を形成し、主権的な民主主義と連邦を行使し、最終的にはグローバルなデジタル民主主義の草の根を形成する。 We present an architectural alternative to global digital platforms termed grassroots, designed to serve the social, economic, civic, and political needs of local digital communities, as well as their federation. Grassroots platforms may offer local communities an alternative to global digital platforms while operating solely on the smartphones of their members, forsaking any global resources other than the network itself. Such communities may form digital economies without initial capital or external credit, exercise sovereign democratic governance, and federate, ultimately resulting in the grassroots formation of a global digital democracy.	翻訳日:2024-04-23 18:50:40 公開日:2024-04-20
# GWLZ:科学データのためのグループ学習型ロッシー圧縮フレームワーク GWLZ: A Group-wise Learning-based Lossy Compression Framework for Scientific Data ( http://arxiv.org/abs/2404.13470v1 ) ライセンス: Link先を確認	Wenqi Jia, Sian Jin, Jinzhen Wang, Wei Niu, Dingwen Tao, Miao Yin,	(参考訳) 計算能力の急速な拡大と、現代のHPCシステムの継続的な規模拡大は、エクサスケールの科学データを管理する上で大きな課題となる。このような膨大なデータセットに直面して、従来のロスレス圧縮技術は、すべての情報をそのまま保存しながら、データサイズを管理可能なレベルに下げるには不十分である。これに対し、研究者はデータサイズ削減と情報保持のバランスを保ちながら、エラーバウンドの損失圧縮手法に切り替えた。しかし、これらの圧縮機は実用性にも拘わらず、再現性に限界がある。この問題に対処するために,近年の深層学習の進歩から着想を得たGWLZを提案する。ニューラルネットワークのグループを活用することで、GWLZは圧縮効率に無視できない影響で、圧縮されたデータ再構成の品質を大幅に向上する。 Nyxデータセットの異なるフィールドに対する実験結果は、GWLZによる顕著な改善を示し、0.0003xという無視可能なオーバーヘッドで最大20%の品質向上を実現した。 The rapid expansion of computational capabilities and the ever-growing scale of modern HPC systems present formidable challenges in managing exascale scientific data. Faced with such vast datasets, traditional lossless compression techniques prove insufficient in reducing data size to a manageable level while preserving all information intact. In response, researchers have turned to error-bounded lossy compression methods, which offer a balance between data size reduction and information retention. However, despite their utility, these compressors employing conventional techniques struggle with limited reconstruction quality. To address this issue, we draw inspiration from recent advancements in deep learning and propose GWLZ, a novel group-wise learning-based lossy compression framework with multiple lightweight learnable enhancer models. Leveraging a group of neural networks, GWLZ significantly enhances the decompressed data reconstruction quality with negligible impact on the compression efficiency. Experimental results on different fields from the Nyx dataset demonstrate remarkable improvements by GWLZ, achieving up to 20% quality enhancements with negligible overhead as low as 0.0003x.	翻訳日:2024-04-23 18:50:40 公開日:2024-04-20
# ロボットのための事前学習型オブジェクト中心表現を「何」と「何」を基礎モデルから作成する Composing Pre-Trained Object-Centric Representations for Robotics From "What" and "Where" Foundation Models ( http://arxiv.org/abs/2404.13474v1 ) ライセンス: Link先を確認	Junyao Shi, Jianing Qian, Yecheng Jason Ma, Dinesh Jayaraman,	(参考訳) 近年、ロボット制御のための事前学習型視覚表現と、未知のカテゴリオブジェクトを一般画像にセグメント化において大きな進歩を遂げている。ロボット学習の改善にこれらを活用するために,ロボット制御のための事前学習対象中心表現を構築するための新しいフレームワークである$\textbf{POCR}$を提案する。心理学やコンピュータビジョンにおける「どこ」の表現の理論に基づいて、事前訓練されたモデルからのセグメンテーションを用いて、シーン内の様々な実体のタイムステップを安定して発見し、「どこで」情報をキャプチャする。このようなセグメント化された各エンティティに対して,ロボット制御タスクに適したベクトル記述を構築するための事前学習モデルを適用し,そのエンティティが何であるかをキャプチャする。そこで,本研究では,既訓練モデルの出力を新たなトレーニングなしで適切に組み合わせることで,制御のための事前学習対象中心表現を構築した。各種のロボットタスクにおいて、POCRで訓練されたロボットマニピュレータの模倣ポリシーは、ロボット工学の最先端の事前訓練された表現や、通常スクラッチから訓練された以前のオブジェクト中心の表現よりも、優れたパフォーマンスと体系的な一般化を実現していることを示す。 There have recently been large advances both in pre-training visual representations for robotic control and segmenting unknown category objects in general images. To leverage these for improved robot learning, we propose $\textbf{POCR}$, a new framework for building pre-trained object-centric representations for robotic control. Building on theories of "what-where" representations in psychology and computer vision, we use segmentations from a pre-trained model to stably locate across timesteps, various entities in the scene, capturing "where" information. To each such segmented entity, we apply other pre-trained models that build vector descriptions suitable for robotic control tasks, thus capturing "what" the entity is. Thus, our pre-trained object-centric representations for control are constructed by appropriately combining the outputs of off-the-shelf pre-trained models, with no new training. On various simulated and real robotic tasks, we show that imitation policies for robotic manipulators trained on POCR achieve better performance and systematic generalization than state of the art pre-trained representations for robotics, as well as prior object-centric representations that are typically trained from scratch.	翻訳日:2024-04-23 18:50:40 公開日:2024-04-20
# PristiQ: クラウドにおける量子学習のデータセキュリティを維持するための共同設計フレームワーク PristiQ: A Co-Design Framework for Preserving Data Security of Quantum Learning in the Cloud ( http://arxiv.org/abs/2404.13475v1 ) ライセンス: Link先を確認	Zhepeng Wang, Yi Sheng, Nirajan Koirala, Kanad Basu, Taeho Jung, Cheng-Chang Lu, Weiwen Jiang,	(参考訳) クラウドコンピューティングの恩恵を受け、今日のアーリーステージの量子コンピュータは、Quantum-as-a-Service(QaaS)として知られるクラウドサービスを介してリモートでアクセスすることができる。しかし、量子機械学習(QML)では、データ漏洩のリスクが高い。 QaaSでQMLモデルを実行するには、まずデータのサブ回路を含む量子回路をローカルにコンパイルし、次にコンパイルされた回路をQaaSプロバイダに送信して実行する必要がある。 QaaSプロバイダが信頼できない場合、生データをエンコードするサブ回路を簡単に盗むことができる。そこで本稿では,QMLのデータセキュリティをQaaSパラダイム,すなわちPristiQで保護するための協調設計フレームワークを提案する。ユーザ定義のセキュリティキーに付随するセキュアなキュービットを持つ暗号化サブ回路を導入することにより、データのセキュリティを大幅に向上することができる。また,暗号化された量子データ上での性能を維持するために,自動探索アルゴリズムを提案する。シミュレーションと実際のIBM量子コンピュータの実験結果はどちらも、QMLのモデル性能を維持しながら、量子データに対して高いセキュリティを提供するPristiQの能力を証明するものである。 Benefiting from cloud computing, today's early-stage quantum computers can be remotely accessed via the cloud services, known as Quantum-as-a-Service (QaaS). However, it poses a high risk of data leakage in quantum machine learning (QML). To run a QML model with QaaS, users need to locally compile their quantum circuits including the subcircuit of data encoding first and then send the compiled circuit to the QaaS provider for execution. If the QaaS provider is untrustworthy, the subcircuit to encode the raw data can be easily stolen. Therefore, we propose a co-design framework for preserving the data security of QML with the QaaS paradigm, namely PristiQ. By introducing an encryption subcircuit with extra secure qubits associated with a user-defined security key, the security of data can be greatly enhanced. And an automatic search algorithm is proposed to optimize the model to maintain its performance on the encrypted quantum data. Experimental results on simulation and the actual IBM quantum computer both prove the ability of PristiQ to provide high security for the quantum data while maintaining the model performance in QML.	翻訳日:2024-04-23 18:50:40 公開日:2024-04-20
# 因果性, 疎度, 密度を考慮した有意義な対物探査フレームワーク A Framework for Feasible Counterfactual Exploration incorporating Causality, Sparsity and Density ( http://arxiv.org/abs/2404.13476v1 ) ライセンス: Link先を確認	Kleopatra Markou, Dimitrios Tomaras, Vana Kalogeraki, Dimitrios Gunopulos,	(参考訳) 機械学習モデルのアウトプットを、インプットへの小さな摂動を通じて、対実的(CF)説明で解釈する必要があることは、研究コミュニティで注目されている。 CFの様々な例は重要であるが、それらの側面は同時に実現可能であり、必ずしも全体に適用できるとは限らない。この研究は、異なるベンチマークデータセットを使用して、属性の論理因果関係の保存を通して、CFサンプルが元の入力にわずかな変更を加えて生成可能かどうかを検証し、現実のケースではエンドユーザーにとって実際に有用である。そこで我々は,入力クラスと可変オートエンコーダ(VAE)を区別するために,ブラックボックスモデルを分類器として使用した。拡張として、2次元多様体(各データセットの1つ)を抽出し、実現不可能な例の大多数を配置した。実験では,一般的に使用されている3つのデータセットを使用して,データセットの属性でそれらの重要性を確認することで,事前に定義されたすべての因果制約を満たす,実行可能かつスパースなCF例を生成しました。 The imminent need to interpret the output of a Machine Learning model with counterfactual (CF) explanations - via small perturbations to the input - has been notable in the research community. Although the variety of CF examples is important, the aspect of them being feasible at the same time, does not necessarily apply in their entirety. This work uses different benchmark datasets to examine through the preservation of the logical causal relations of their attributes, whether CF examples can be generated after a small amount of changes to the original input, be feasible and actually useful to the end-user in a real-world case. To achieve this, we used a black box model as a classifier, to distinguish the desired from the input class and a Variational Autoencoder (VAE) to generate feasible CF examples. As an extension, we also extracted two-dimensional manifolds (one for each dataset) that located the majority of the feasible examples, a representation that adequately distinguished them from infeasible ones. For our experimentation we used three commonly used datasets and we managed to generate feasible and at the same time sparse, CF examples that satisfy all possible predefined causal constraints, by confirming their importance with the attributes in a dataset.	翻訳日:2024-04-23 18:50:40 公開日:2024-04-20
# スケーラビリティと低オーバーヘッドのローハマー除去を実現するための逆検出の活用 Leveraging Adversarial Detection to Enable Scalable and Low Overhead RowHammer Mitigations ( http://arxiv.org/abs/2404.13477v1 ) ライセンス: Link先を確認	Oğuzhan Canpolat, A. Giray Yağlıkçı, Ataberk Olgun, İsmail Emir Yüksel, Yahya Can Tuğrul, Konstantinos Kanellopoulos, Oğuz Ergin, Onur Mutlu,	(参考訳) RowHammerはDRAMにおける読み取り障害の主要な例であり、DRAMセルの行(DRAM行)に繰り返しアクセスすることで、物理的に近くの他のDRAM行のビットフリップを誘導する。 RowHammerソリューションは、現代のコンピューティングシステムにおけるセキュリティとプライバシの基本的な構成要素であるメモリアイソレーションを維持するために、そのようなビットフリップを緩和する(例えば、ハンマーされた行の隣接行をリフレッシュする)予防アクションを実行する。しかし、予防行動は、メモリコントローラのメモリ要求を妨害する非無視のメモリ要求遅延とシステムパフォーマンスのオーバーヘッドを引き起こす。 DRAMチップ世代よりも小さくなる技術がRowHammerを悪化させるにつれ、RowHammerソリューションのオーバーヘッドは著しく大きくなる。その結果、悪意のあるプログラムは、多くのRowHammer防止アクションを発生させることで、メモリシステムを効果的にホグし、アプリケーションを無視することができる。本研究では、RowHammerソリューションを起動するメモリアクセスのジェネレータを追跡することにより、RowHammerソリューションのパフォーマンスオーバーヘッドに対処する。この目的のために、BreakHammerを提案する。 BreakHammerは既存のRowHammerソリューションと連携して、予防行動を引き起こすハードウェアスレッドを特定する。そのためにBreakHammerは、RowHammerの防止アクションの頻度に基づいて、スレッドのRowHammerの確率を見積もる。 BreakHammerは、スレッドのRowHammerの可能性に基づいて、スレッドがメモリシステムに注入できるオンザフライリクエストの数を制限する。これにより、BreakHammerは、実行された対策の数を著しく削減し、システム性能を48.7%(最大105.5%)改善し、良質なアプリケーションに誘導される最大スローダウンを14.6%削減し、ほぼゼロの領域のオーバーヘッド(例えば、ハイエンドプロセッサのチップ領域の0.0002%)を減らした。 RowHammer is a prime example of read disturbance in DRAM where repeatedly accessing (hammering) a row of DRAM cells (DRAM row) induces bitflips in other physically nearby DRAM rows. RowHammer solutions perform preventive actions (e.g., refresh neighbor rows of the hammered row) that mitigate such bitflips to preserve memory isolation, a fundamental building block of security and privacy in modern computing systems. However, preventive actions induce non-negligible memory request latency and system performance overheads as they interfere with memory requests in the memory controller. As shrinking technology node size over DRAM chip generations exacerbates RowHammer, the overheads of RowHammer solutions become prohibitively large. As a result, a malicious program can effectively hog the memory system and deny service to benign applications by causing many RowHammer preventive actions. In this work, we tackle the performance overheads of RowHammer solutions by tracking the generators of memory accesses that trigger RowHammer solutions. To this end, we propose BreakHammer. BreakHammer cooperates with existing RowHammer solutions to identify hardware threads that trigger preventive actions. To do so, BreakHammer estimates the RowHammer likelihood of a thread, based on how frequently it triggers RowHammer preventive actions. BreakHammer limits the number of on-the-fly requests a thread can inject into the memory system based on the thread's RowHammer likelihood. By doing so, BreakHammer significantly reduces the number of performed counter-measures, improves the system performance by an average (maximum) of 48.7% (105.5%), and reduces the maximum slowdown induced on a benign application by 14.6% with near-zero area overhead (e.g., 0.0002% of a highend processor's chip area).	翻訳日:2024-04-23 18:50:40 公開日:2024-04-20
# 精密配置タスクのための深部SE(3)-等変幾何推論 Deep SE(3)-Equivariant Geometric Reasoning for Precise Placement Tasks ( http://arxiv.org/abs/2404.13478v1 ) ライセンス: Link先を確認	Ben Eisner, Yi Yang, Todor Davchev, Mel Vecerik, Jonathan Scholz, David Held,	(参考訳) 多くのロボット操作タスクは幾何学的推論タスクとしてフレーム化することができ、エージェントは初期条件からタスクを満たす位置にオブジェクトを正確に操作できなければならない。多くの場合、タスクの成功は2つのオブジェクトの関係に基づいて定義されます。そのような場合、解は対象とエージェントの初期位置と等しく、カメラのポーズと不変であるべきである。これは、高次元のデモンストレーションから直接学習することでこの課題を解決しようとする学習システムにとっての課題である。本研究では,SE(3)-同変を証明可能な精度の高いポーズ予測手法を提案する。この問題をSE(3)不変なタスク固有のシーン表現の学習に分解し、SE(3)同変を証明可能な新しい幾何学的推論層と解釈することで解決する。実世界の実演から収集した相対配置関係データを高精度に表現し,シミュレーションされた配置タスクにおいて,同じ量のデータで訓練された従来の方法よりも精度の高い配置予測が得られることを示す。追加情報とビデオはhttps://sites.google.com/view/reldist-iclr-2023で見ることができる。 Many robot manipulation tasks can be framed as geometric reasoning tasks, where an agent must be able to precisely manipulate an object into a position that satisfies the task from a set of initial conditions. Often, task success is defined based on the relationship between two objects - for instance, hanging a mug on a rack. In such cases, the solution should be equivariant to the initial position of the objects as well as the agent, and invariant to the pose of the camera. This poses a challenge for learning systems which attempt to solve this task by learning directly from high-dimensional demonstrations: the agent must learn to be both equivariant as well as precise, which can be challenging without any inductive biases about the problem. In this work, we propose a method for precise relative pose prediction which is provably SE(3)-equivariant, can be learned from only a few demonstrations, and can generalize across variations in a class of objects. We accomplish this by factoring the problem into learning an SE(3) invariant task-specific representation of the scene and then interpreting this representation with novel geometric reasoning layers which are provably SE(3) equivariant. We demonstrate that our method can yield substantially more precise placement predictions in simulated placement tasks than previous methods trained with the same amount of data, and can accurately represent relative placement relationships data collected from real-world demonstrations. Supplementary information and videos can be found at https://sites.google.com/view/reldist-iclr-2023.	翻訳日:2024-04-23 18:50:40 公開日:2024-04-20
# コンテンツから画像の出現を遠ざける共同品質評価と実例誘導画像処理 Joint Quality Assessment and Example-Guided Image Processing by Disentangling Picture Appearance from Content ( http://arxiv.org/abs/2404.13484v1 ) ライセンス: Link先を確認	Abhinau K. Venkataramanan, Cosmin Stejerean, Ioannis Katsavounidis, Hassene Tmar, Alan C. Bovik,	(参考訳) ディープラーニング革命は、スタイル/ドメイン転送、強化/復元、視覚的品質評価といった低レベルの画像処理タスクに強く影響を与えている。多くの場合、別々に扱われるが、上記のタスクは、基礎となる内容を変更することなく、入力画像の外観を理解し、編集し、拡張する共通のテーマを共有している。我々はこの観察を利用して、入力をコンテンツや外観特徴に分解する新しい非絡み合い表現学習法を開発した。モデルは自己教師型で訓練され、学習した特徴を用いてDisQUEと呼ばれる新しい品質予測モデルを開発する。本研究では,DisQUEが品質予測タスクや歪みタイプにまたがって最先端の精度を実現していることを示す。さらに,HDRトーンマッピングなどの画像処理にも同様の特徴が有効であることを示す。 The deep learning revolution has strongly impacted low-level image processing tasks such as style/domain transfer, enhancement/restoration, and visual quality assessments. Despite often being treated separately, the aforementioned tasks share a common theme of understanding, editing, or enhancing the appearance of input images without modifying the underlying content. We leverage this observation to develop a novel disentangled representation learning method that decomposes inputs into content and appearance features. The model is trained in a self-supervised manner and we use the learned features to develop a new quality prediction model named DisQUE. We demonstrate through extensive evaluations that DisQUE achieves state-of-the-art accuracy across quality prediction tasks and distortion types. Moreover, we demonstrate that the same features may also be used for image processing tasks such as HDR tone mapping, where the desired output characteristics may be tuned using example input-output pairs.	翻訳日:2024-04-23 18:50:40 公開日:2024-04-20
# マルチエージェント強化学習のためのグループ認識コーディネーショングラフ Group-Aware Coordination Graph for Multi-Agent Reinforcement Learning ( http://arxiv.org/abs/2404.10976v2 ) ライセンス: Link先を確認	Wei Duan, Jie Lu, Junyu Xuan,	(参考訳) 協調的マルチエージェント強化学習(MARL)はエージェント間のシームレスな協調を必要とする。このグラフを学習する既存の方法は、主にエージェント対ペア関係に焦点をあて、高階関係を無視している。いくつかの手法は、グループ内の行動類似性を包含するように協調モデリングを拡張しようとするが、通常は潜伏グラフの同時学習において不足し、部分的に観察されたエージェント間の情報交換を制限している。これらの制約を克服するために,現在観測されている行動パターンからエージェントペア間の協調とグループレベルの依存性の両方を捉えるために,GACG(Group-Aware Coordination Graph)を推論する新しい手法を提案する。このグラフは、意思決定中にエージェント間の情報交換のためのグラフ畳み込みにさらに使用される。同一グループ内のエージェント間の行動整合性をさらに確保するため,グループ間の凝集を促進するグループ距離損失を導入し,グループ間の特殊化を促進する。本稿では,StarCraft IIマイクロマネジメントタスクによるGACGの性能評価を行った。アブレーション実験により, 本手法の各成分の有効性について実験的に検証した。 Cooperative Multi-Agent Reinforcement Learning (MARL) necessitates seamless collaboration among agents, often represented by an underlying relation graph. Existing methods for learning this graph primarily focus on agent-pair relations, neglecting higher-order relationships. While several approaches attempt to extend cooperation modelling to encompass behaviour similarities within groups, they commonly fall short in concurrently learning the latent graph, thereby constraining the information exchange among partially observed agents. To overcome these limitations, we present a novel approach to infer the Group-Aware Coordination Graph (GACG), which is designed to capture both the cooperation between agent pairs based on current observations and group-level dependencies from behaviour patterns observed across trajectories. This graph is further used in graph convolution for information exchange between agents during decision-making. To further ensure behavioural consistency among agents within the same group, we introduce a group distance loss, which promotes group cohesion and encourages specialization between groups. Our evaluations, conducted on StarCraft II micromanagement tasks, demonstrate GACG's superior performance. An ablation study further provides experimental evidence of the effectiveness of each component of our method.	翻訳日:2024-04-23 12:48:38 公開日:2024-04-20
# FedFa: フェデレーションラーニングのための完全な非同期トレーニングパラダイム FedFa: A Fully Asynchronous Training Paradigm for Federated Learning ( http://arxiv.org/abs/2404.11015v2 ) ライセンス: Link先を確認	Haotian Xu, Zhaorui Zhang, Sheng Di, Benben Liu, Khalid Ayed Alharthi, Jiannong Cao,	(参考訳) フェデレーション学習は、トレーナーのデータのプライバシを保証しながら、多数のデバイス上で機械学習モデルのトレーニングをスケールするための、効率的な分散トレーニングパラダイムとして特定されている。 FedAvgは、クライアント間での不均一なデータの影響を排除し、収束を保証することを約束しているフェデレーション学習の基本的なパラメータ更新戦略になっている。しかし、トレーニング中の各通信ラウンド毎の同期パラメータ更新障壁は、待ち時間が大きくなり、トレーニング手順が遅くなる。したがって、最近の最先端のソリューションでは、半非同期アプローチを用いて収束を保証することで待ち時間コストを軽減することが提案されている。それでも、出現する半非同期アプローチは、待ち時間を完全に排除することはできない。我々はFedFaと呼ばれる完全な非同期トレーニングパラダイムを提案し、パラメータ更新にいくつかのバッファリング結果を使用することで、モデル収束を保証し、フェデレーション学習の待ち時間を完全に排除できる。さらに,提案したFedFaの収束率の理論的証明を提供する。 IIDと非IIDの両方のシナリオにおいて高い精度を維持しつつ、最先端の同期型および半非同期型の戦略と比較して、フェデレート学習のトレーニング性能を最大6倍と4倍のスピードアップで効果的に向上することを示す。 Federated learning has been identified as an efficient decentralized training paradigm for scaling the machine learning model training on a large number of devices while guaranteeing the data privacy of the trainers. FedAvg has become a foundational parameter update strategy for federated learning, which has been promising to eliminate the effect of the heterogeneous data across clients and guarantee convergence. However, the synchronization parameter update barriers for each communication round during the training significant time on waiting, slowing down the training procedure. Therefore, recent state-of-the-art solutions propose using semi-asynchronous approaches to mitigate the waiting time cost with guaranteed convergence. Nevertheless, emerging semi-asynchronous approaches are unable to eliminate the waiting time completely. We propose a full asynchronous training paradigm, called FedFa, which can guarantee model convergence and eliminate the waiting time completely for federated learning by using a few buffered results on the server for parameter updating. Further, we provide theoretical proof of the convergence rate for our proposed FedFa. Extensive experimental results indicate our approach effectively improves the training performance of federated learning by up to 6x and 4x speedup compared to the state-of-the-art synchronous and semi-asynchronous strategies while retaining high accuracy in both IID and Non-IID scenarios.	翻訳日:2024-04-23 12:48:38 公開日:2024-04-20
# 局所形状変換によるSO(3)-不変意味対応の学習 Learning SO(3)-Invariant Semantic Correspondence via Local Shape Transform ( http://arxiv.org/abs/2404.11156v2 ) ライセンス: Link先を確認	Chunghyun Park, Seungwook Kim, Jaesik Park, Minsu Cho,	(参考訳) 形状間の正確な3D対応を確立することは、コンピュータビジョンとロボット工学にとって重要な課題である。しかし,既存の自己教師型手法では完全な入力形状のアライメントを前提としており,実際の適用性が制限されている。本研究では,RISTと呼ばれる局所形状変換を用いた自己教師型回転不変3次元対応学習システムを提案する。具体的には、入力形状のSO(3)-同変大域形状記述子を局所形状記述子にマッピングする、各点についてSO(3)-不変局所形状変換を動的に定式化することを学ぶ。これらの局所形状記述子はデコーダへの入力として提供され、ポイントクラウドの自己とクロスコンストラクションを容易にする。提案する自己教師型学習パイプラインは,異なる形状の意味的対応点を類似の局所的な形状記述子にマッピングし,RISTが高密度な点対応を確立できるようにする。 RISTは、任意の回転点雲対に与えられる3D部分ラベル転送とセマンティックキーポイント転送の最先端性能を示し、既存の手法をかなりのマージンで上回る。 Establishing accurate 3D correspondences between shapes stands as a pivotal challenge with profound implications for computer vision and robotics. However, existing self-supervised methods for this problem assume perfect input shape alignment, restricting their real-world applicability. In this work, we introduce a novel self-supervised Rotation-Invariant 3D correspondence learner with Local Shape Transform, dubbed RIST, that learns to establish dense correspondences between shapes even under challenging intra-class variations and arbitrary orientations. Specifically, RIST learns to dynamically formulate an SO(3)-invariant local shape transform for each point, which maps the SO(3)-equivariant global shape descriptor of the input shape to a local shape descriptor. These local shape descriptors are provided as inputs to our decoder to facilitate point cloud self- and cross-reconstruction. Our proposed self-supervised training pipeline encourages semantically corresponding points from different shapes to be mapped to similar local shape descriptors, enabling RIST to establish dense point-wise correspondences. RIST demonstrates state-of-the-art performances on 3D part label transfer and semantic keypoint transfer given arbitrarily rotated point cloud pairs, outperforming existing methods by significant margins.	翻訳日:2024-04-23 12:48:38 公開日:2024-04-20
# AIインタフェースにおけるデザインパターンとの相互作用による特徴付けとモデリング Characterizing and modeling harms from interactions with design patterns in AI interfaces ( http://arxiv.org/abs/2404.11370v2 ) ライセンス: Link先を確認	Lujain Ibrahim, Luc Rocher, Ana Valdivia,	(参考訳) 人工知能(AI)システムを用いたアプリケーションの普及は、洗練されたインターフェースを通じてこれらのシステムと対話するユーザの増加につながっている。ヒューマンコンピュータインタラクションの研究は、ユーザー行動と技術的能力とリスクに対するユーザーの認識の両方を形作るインターフェースを長年にわたって示してきた。しかし、AIシステムの社会的および倫理的リスクを評価する実践者や研究者は、人間とAIの相互作用に対する人為的、欺く、没入的なインターフェースの影響を見落としてしまう傾向にある。ここでは,適応型AIシステムを用いたインタフェースの設計は,従来考えられていた以上のフィードバックループによって,カスケード効果をもたらす可能性がある,と論じる。まず、AIインターフェース設計のスコーピングレビューを行い、AIインターフェースに潜在的に有害なデザインパターンの有害なテーマを抽出する。そこで我々は,AIインタフェース設計における影響評価を構造化し,促進する概念モデルとして,AIシステムの設計強化制御(DECAI)を提案する。 DECAIは制御系理論(動的物理系の解析と設計の理論)の原則に基づいて、ヒューマンAIシステムにおけるインターフェースの役割を解明する。推薦システムと対話型言語モデルシステムに関する2つのケーススタディを通じて、AIインタフェース設計の評価にDECAIをどのように利用できるかを示す。 The proliferation of applications using artificial intelligence (AI) systems has led to a growing number of users interacting with these systems through sophisticated interfaces. Human-computer interaction research has long shown that interfaces shape both user behavior and user perception of technical capabilities and risks. Yet, practitioners and researchers evaluating the social and ethical risks of AI systems tend to overlook the impact of anthropomorphic, deceptive, and immersive interfaces on human-AI interactions. Here, we argue that design features of interfaces with adaptive AI systems can have cascading impacts, driven by feedback loops, which extend beyond those previously considered. We first conduct a scoping review of AI interface designs and their negative impact to extract salient themes of potentially harmful design patterns in AI interfaces. Then, we propose Design-Enhanced Control of AI systems (DECAI), a conceptual model to structure and facilitate impact assessments of AI interface designs. DECAI draws on principles from control systems theory -- a theory for the analysis and design of dynamic physical systems -- to dissect the role of the interface in human-AI systems. Through two case studies on recommendation systems and conversational language model systems, we show how DECAI can be used to evaluate AI interface designs.	翻訳日:2024-04-23 12:48:38 公開日:2024-04-20
# シンボリック機械学習から核モデルを発見する Discovering Nuclear Models from Symbolic Machine Learning ( http://arxiv.org/abs/2404.11477v2 ) ライセンス: Link先を確認	Jose M. Munoz, Silviu M. Udrescu, Ronald F. Garcia Ruiz,	(参考訳) 多くの現象学的核モデルが提案され、核チャートの異なる領域で特定の観測可能物を記述することが提案されている。しかしながら、全ての核の複雑な振る舞いを記述する統一モデルの開発は、依然として未解決の課題である。ここでは,新しいシンボリック機械学習(ML)が,従来の物理モデルを再発見するか,あるいは簡易性,忠実性,予測力を向上した代替品を識別できるかを検討する。この課題に対処するために,多目的反復型シンボル回帰手法を開発し,複数の観測対象に対するシンボル回帰を処理し,実験的不確実性を考慮し,高次元問題に対して頑健である。原理の証明として,光・中質量核の核結合エネルギーと電荷半径を記述するために本手法を適用した。提案手法では, 陽子数と中性子数に基づいて単純な解析関係を同定し, 最先端の原子核モデルに匹敵する精度で解釈可能なモデルを提供する。さらに、このML発見モデルと既存の補完モデルを統合し、核安定性の限界を推定した。これらの結果は、正確な核モデルを開発し、複雑な多体問題の記述をガイドするシンボリックMLの可能性を強調している。 Numerous phenomenological nuclear models have been proposed to describe specific observables within different regions of the nuclear chart. However, developing a unified model that describes the complex behavior of all nuclei remains an open challenge. Here, we explore whether novel symbolic Machine Learning (ML) can rediscover traditional nuclear physics models or identify alternatives with improved simplicity, fidelity, and predictive power. To address this challenge, we developed a Multi-objective Iterated Symbolic Regression approach that handles symbolic regressions over multiple target observables, accounts for experimental uncertainties and is robust against high-dimensional problems. As a proof of principle, we applied this method to describe the nuclear binding energies and charge radii of light and medium mass nuclei. Our approach identified simple analytical relationships based on the number of protons and neutrons, providing interpretable models with precision comparable to state-of-the-art nuclear models. Additionally, we integrated this ML-discovered model with an existing complementary model to estimate the limits of nuclear stability. These results highlight the potential of symbolic ML to develop accurate nuclear models and guide our description of complex many-body problems.	翻訳日:2024-04-23 12:48:38 公開日:2024-04-20
# バグの自動局所化と修復のための大規模言語モデルへの深い取り組み A Deep Dive into Large Language Models for Automated Bug Localization and Repair ( http://arxiv.org/abs/2404.11595v2 ) ライセンス: Link先を確認	Soneya Binta Hossain, Nan Jiang, Qiang Zhou, Xiaopeng Li, Wen-Hao Chiang, Yingjun Lyu, Hoan Nguyen, Omer Tripp,	(参考訳) 大規模言語モデル(LLM)は、自動プログラム修復(APR)など、様々なソフトウェアエンジニアリングタスクにおいて顕著な効果を示している。本研究では,LSMを用いた自動バグ修正について深く検討する。既知のバグ位置を仮定したり、ラインレベルのローカライズツールに依存する、あるいは1ステップでバグの予測と修正を行う、ディープラーニングベースのAPRメソッドとは対照的に、当社のアプローチでは、トークンレベルでのバグ位置を予測するためにLSMを独自に使用し、その後バグ修正に利用しています。異なるLLMを用いたバグローカライゼーションと修正の方法論は,多様なコンテキスト情報の効果的な統合と帰納的バイアスの取り込みの改善を可能にする。 Toggle: Token-Granulated Bug Localization and repairは、バグローカライゼーションモデル、調整ユニット、バグ修正モデルを統合する包括的なプログラム修復フレームワークである。 Toggleはバギー関数を入力として、完全な修正関数を生成する。本稿では, バグ修正モデルに対して, 誘導バイアスをより有効に活用し, 他よりも著しく優れる最も効果的なプロンプトを特定するための, 様々な手法について検討する。 Toggleは、CodeXGLUEコードリファインメントベンチマークにおける新しい最先端(SOTA)パフォーマンスを実現し、Defects4Jを含む、他の広く使用されているAPRデータセットで、より良く、同等のパフォーマンスを示す。 Large language models (LLMs) have shown impressive effectiveness in various software engineering tasks, including automated program repair (APR). In this study, we take a deep dive into automated bug fixing utilizing LLMs. In contrast to many deep learning-based APR methods that assume known bug locations, rely on line-level localization tools, or address bug prediction and fixing in one step, our approach uniquely employs LLMs to predict bug location at the token level and subsequently utilizes them for bug fixing. This methodological separation of bug localization and fixing using different LLMs enables effective integration of diverse contextual information and improved incorporation of inductive biases. We introduce Toggle: Token-Granulated Bug Localization and Repair, a comprehensive program repair framework that integrates a bug localization model, an adjustment unit, and a bug-fixing model. Toggle takes a buggy function as input and generates a complete corrected function. We investigate various styles of prompting to the bug fixing model to identify the most effective prompts that better utilize the inductive bias and significantly outperform others. Toggle achieves the new state-of-the-art (SOTA) performance on the CodeXGLUE code refinement benchmark, and exhibits better and comparable performance on several other widely-used APR datasets, including Defects4J.	翻訳日:2024-04-23 12:48:38 公開日:2024-04-20
# 宇宙デシッター時空における2つの衝突するウンルー・デウィット検出器間の絡み合い発生 Entanglement generation between two comoving Unruh-DeWitt detectors in the cosmological de Sitter spacetime ( http://arxiv.org/abs/2404.11931v2 ) ライセンス: Link先を確認	Sourav Bhattacharya, Shagun Kaushal,	(参考訳) 宇宙空間における2つの同一のUnruh-DeWitt検出器間の絡み合いの発生や収穫について検討する。 2つの共振型2レベル検出器を同時に空間的位置で検討する。検出器は最初は絡まっていないと仮定される。検出器は個別にスカラー場に結合し、2つの検出器間の結合につながる。我々は、実数体と複素数体の両方に対して、2種類のスカラー場(共形対称および無質量最小結合)を考える。スカラー場に対応する自由度を追従することにより、2つの検出器の密度行列を構築し、その固有値が検出器のエネルギー準位間の遷移を特徴づける。これらのフィールドの単位固有時間当たりの検出器応答関数に対する既存の結果を用いて、次に対数ネガティビティを計算し、2つの検出器間の遅い時間に発生する絡み合いの度合いを定量化する。異なる種類のスカラー場に対するこれらの結果の類似性と相違について論じている。 We investigate the entanglement generation or harvesting between two identical Unruh-DeWitt detectors in the cosmological de Sitter spacetime. We consider two comoving two-level detectors at a coincident spatial position. The detectors are assumed to be unentangled initially. The detectors are individually coupled to a scalar field, which eventually leads to coupling between the two detectors. We consider two kinds of scalar fields -- conformally symmetric and massless minimally coupled, for both real and complex cases. By tracing out the degrees of freedom corresponding to the scalar field, we construct the reduced density matrix for the two detectors, whose eigenvalues characterise transitions between the energy levels of the detectors. By using the existing results for the detector response functions per unit proper time for these fields, we next compute the logarithmic negativity, quantifying the degree of entanglement generated at late times between the two detectors. The similarities and differences of these results for different kind of scalar fields have been discussed.	翻訳日:2024-04-23 12:48:38 公開日:2024-04-20
# ProTA: テキスト検索のための確率的トークン集約 ProTA: Probabilistic Token Aggregation for Text-Video Retrieval ( http://arxiv.org/abs/2404.12216v2 ) ライセンス: Link先を確認	Han Fang, Xianghao Zang, Chao Ban, Zerun Feng, Lanxiang Zhou, Zhongjiang He, Yongxiang Li, Hao Sun,	(参考訳) テキストビデオ検索は、あるクエリに対して最も関連性の高いクロスモーダルサンプルを見つけることを目的としている。近年の手法は空間的・時間的関係のモデル化に重点を置いている。しかし、ビデオクリップはキャプションよりも多様な内容を含んでいるため、これらの非対称なビデオテキストペアを整列させるモデルは、多くの偽陽性結果を取得するリスクが高い。本稿では,コンテンツ非対称性との相互相互作用を扱うための確率的トークン集約(ProTA)を提案する。具体的には、低次元空間と高次元空間の両方において、トークン表現をアンタングルと再集約する2つの部分関連アグリゲーションを提案する。トークンレベルの確率的表現を生成し,特徴表現の多様性を維持するために,トークンベースの確率的アライメントを提案する。さらに、コンパクトなクロスモーダル分布空間を学習するために、適応的なコントラスト損失を提案する。広範な実験に基づいて、ProTAはMSR-VTT(50.9%)、LSMDC(25.8%)、DiDeMo(47.2%)を大幅に改善した。 Text-video retrieval aims to find the most relevant cross-modal samples for a given query. Recent methods focus on modeling the whole spatial-temporal relations. However, since video clips contain more diverse content than captions, the model aligning these asymmetric video-text pairs has a high risk of retrieving many false positive results. In this paper, we propose Probabilistic Token Aggregation (ProTA) to handle cross-modal interaction with content asymmetry. Specifically, we propose dual partial-related aggregation to disentangle and re-aggregate token representations in both low-dimension and high-dimension spaces. We propose token-based probabilistic alignment to generate token-level probabilistic representation and maintain the feature representation diversity. In addition, an adaptive contrastive loss is proposed to learn compact cross-modal distribution space. Based on extensive experiments, ProTA achieves significant improvements on MSR-VTT (50.9%), LSMDC (25.8%), and DiDeMo (47.2%).	翻訳日:2024-04-23 12:38:52 公開日:2024-04-20
# 大規模言語モデリングによる皮肉検出における感情特徴の増強 Augmenting emotion features in irony detection with Large language modeling ( http://arxiv.org/abs/2404.12291v2 ) ライセンス: Link先を確認	Yucheng Lin, Yuhan Xia, Yunfei Long,	(参考訳) そこで本研究では,感情中心のテキスト強化を促進するために,大規模言語モデル(LLM)を即時学習で適用する,新たな皮肉検出手法を提案する。伝統的な皮肉検出技術は、静的言語的特徴や事前定義された知識ベースに依存しているため、しばしば、皮肉に不可欠な微妙な感情的な次元を見落としているため、一般的には不足している。対照的に,本手法は,LLMを通した微妙な感情的手がかりを,皮肉検出の基礎として広く認識されている3つのベンチマークNLPモデル(BERT,T5,GPT-2)に統合することにより,検出プロセスを増強する。本手法をSemEval-2018 Task 3データセットを用いて評価し,皮肉検出能力の大幅な向上について検討した。 This study introduces a novel method for irony detection, applying Large Language Models (LLMs) with prompt-based learning to facilitate emotion-centric text augmentation. Traditional irony detection techniques typically fall short due to their reliance on static linguistic features and predefined knowledge bases, often overlooking the nuanced emotional dimensions integral to irony. In contrast, our methodology augments the detection process by integrating subtle emotional cues, augmented through LLMs, into three benchmark pre-trained NLP models - BERT, T5, and GPT-2 - which are widely recognized as foundational in irony detection. We assessed our method using the SemEval-2018 Task 3 dataset and observed substantial enhancements in irony detection capabilities.	翻訳日:2024-04-23 12:38:52 公開日:2024-04-20
# 散逸相および非エルミート相転移における断熱変換 Adiabatic Transformations in Dissipative and Non-Hermitian Phase Transitions ( http://arxiv.org/abs/2404.12337v2 ) ライセンス: Link先を確認	Pavel Orlov, Georgy V. Shlyapnikov, Denis V. Kurlov,	(参考訳) 量子幾何学テンソルは、孤立量子系における平衡相転移の解析と検出のための一般的な枠組みとして確立されている。非エルミート量子系における相転移の研究に普遍的なアプローチを提供する量子幾何テンソルの新しい一般化を提案する。我々の一般化は、断熱変換の生成の概念に基づいており、リウヴィリア超作用素または有効非エルミート・ハミルトン作用素によって記述されたシステムに適用することができる。本稿では,非エルミート的Su-Schrieffer-Heegerモデルと2次リウビリアンを用いた一般準自由散逸性フェルミオン系を解析し,提案手法について述べる。その結果,本手法は全モデル間の位相遷移を効果的に同定し,一般の非エルミート系を解析するための普遍的なツールを提供することがわかった。 The quantum geometric tensor has established itself as a general framework for the analysis and detection of equilibrium phase transitions in isolated quantum systems. We propose a novel generalization of the quantum geometric tensor, which offers a universal approach to studying phase transitions in non-Hermitian quantum systems. Our generalization is based on the concept of the generator of adiabatic transformations and can be applied to systems described by either a Liouvillian superoperator or by an effective non-Hermitian Hamiltonian. We illustrate the proposed method by analyzing the non-Hermitian Su-Schrieffer-Heeger model and a generic quasi-free dissipative fermionic system with a quadratic Liouvillian. Our findings reveal that this method effectively identifies phase transitions across all examined models, providing a universal tool for investigating general non-Hermitian systems.	翻訳日:2024-04-23 12:38:52 公開日:2024-04-20

Title

Authors

Abstract

論文公表日・翻訳日

# Clock Domain Crossing (CDC) の実用的形式検証手法

Pragmatic Formal Verification Methodology for Clock Domain Crossing (CDC) ( http://arxiv.org/abs/2406.06533v1 )

ライセンス: Link先を確認

Aman Kumar, Muhammad Ul Haque Khan, Bijitendra Mittra,

(参考訳) 現代のSystem-on-Chip (SoC) の設計は、テクノロジーのスケールアップにより、ますます複雑になりつつある。 SoC設計はしばしば複数の非同期クロックドメインで動作し、全体的な設計の複雑さをさらに増す。デバイスを効率よくするために、デザイナは複数の非同期ドメインを生成するGlobally-Asynchronous Locally-Synchronous (GALS)アプローチを採用する。これらのClock Domain Crossings (CDC) は、転移性の影響を受けやすく、そのようなCDCの機能的検証は、バグの回避を確実にするために非常に重要である。レジスタ転送レベル(RTL)シミュレーションや静的タイミング解析のような従来の検証手法では、これらのCDC問題に対処するには不十分であり、検証のギャップが生じる可能性がある。さらに、これらのCDC関連バグを特定するのは非常に時間がかかり、コストがかかるシリコン再スピンの最も一般的な理由の1つである。本研究は, CDCパスのメタスタビリティ・インジェクション(MSI)の実施により, CDCの問題を最小化するための実用的形式的検証手法の開発に焦点をあてる。

Modern System-on-Chip (SoC) designs are becoming more and more complex due to the technology upscaling. SoC designs often operate on multiple asynchronous clock domains, further adding to the complexity of the overall design. To make the devices power efficient, designers take a Globally-Asynchronous Locally-Synchronous (GALS) approach that creates multiple asynchronous domains. These Clock Domain Crossings (CDC) are prone to metastability effects, and functional verification of such CDC is very important to ensure that no bug escapes. Conventional verification methods, such as register transfer level (RTL) simulations and static timing analysis, are not enough to address these CDC issues, which may lead to verification gaps. Additionally, identifying these CDC-related bugs is very time-consuming and is one of the most common reasons for costly silicon re-spins. This paper is focused on the development of a pragmatic formal verification methodology to minimize the CDC issues by exercising Metastability Injection (MSI) in different CDC paths.

翻訳日:2024-07-01 08:00:19 公開日:2024-04-20

# 高構成のディジタル設計における効率的な構成被覆のための半形式的検証手法

A Semi-Formal Verification Methodology for Efficient Configuration Coverage of Highly Configurable Digital Designs ( http://arxiv.org/abs/2405.01572v1 )

ライセンス: Link先を確認

Aman Kumar, Sebastian Simon,

(参考訳) 今日では、システムオンチップ(SoC)の大多数が、開発サイクルを短縮するために知的財産(IP)を使用している。このようなIPが開発されると、設計の高構成性に焦点が当てられる。この設計側の柔軟性は、検証側のIP構成の巨大な状態空間をカバーし、可能なパラメータ設定の全ての機能的正しさを保証するという課題をもたらす。可能性の多さはブルートフォースのアプローチを許さないため、典型的および極端な仮定に基づいて選択された少数の設定しか検証されない。特に、ISO 26262機能安全基準に従う必要がある自動車アプリケーションでは、すべての重要な変種をカバーする要件は、いずれにせよ満たされる必要がある。シミュレーションベースの検証や形式検証のような最先端の既存の検証技術には、それぞれ時間空間の爆発や状態空間の爆発といった課題があるため、高度に構成可能なディジタルデザインを効率的に検証することの欠如がある。本稿では,高度に構成可能なディジタル設計を効率的に構成するための半形式的検証手法に着目する。この方法論は、高い構成カバレッジを可能にするシミュレーティブおよびフォーマルなメソッドに基づいた、ランタイムの削減に焦点を当てている。また,提案手法を高度に構成可能なマイクロプロセッサIPに適用し,そのメリットについて考察する。

Nowadays, a majority of System-on-Chips (SoCs) make use of Intellectual Property (IP) in order to shorten development cycles. When such IPs are developed, one of the main focuses lies in the high configurability of the design. This flexibility on the design side introduces the challenge of covering a huge state space of IP configurations on the verification side to ensure the functional correctness under every possible parameter setting. The vast number of possibilities does not allow a brute-force approach, and therefore, only a selected number of settings based on typical and extreme assumptions are usually verified. Especially in automotive applications, which need to follow the ISO 26262 functional safety standard, the requirement of covering all significant variants needs to be fulfilled in any case. State-of-the-Art existing verification techniques such as simulation-based verification and formal verification have challenges such as time-space explosion and state-space explosion respectively and therefore, lack behind in verifying highly configurable digital designs efficiently. This paper is focused on a semi-formal verification methodology for efficient configuration coverage of highly configurable digital designs. The methodology focuses on reduced runtime based on simulative and formal methods that allow high configuration coverage. The paper also presents the results when the developed methodology was applied on a highly configurable microprocessor IP and discusses the gained benefits.

翻訳日:2024-05-12 16:10:01 公開日:2024-04-20

# マルチビット量子化フェデレーション学習におけるSERに基づくデバイス選択機構

A SER-based Device Selection Mechanism in Multi-bits Quantization Federated Learning ( http://arxiv.org/abs/2405.02320v1 )

ライセンス: Link先を確認

Pengcheng Sun, Erwu Liu, Rui Wang,

(参考訳) 無線通信の質は、FL(Federated Learning)の性能に直接影響を及ぼすので、シンボル誤り率(SER)を用いて、FLにおける無線通信の影響を解析する。 FLシステムでは、非直交多重アクセス(NOMA)を、無線チャネルの重畳特性を利用する複数のユーザによる通信混雑と干渉を低減するための基本的な通信フレームワークとして用いることができる。最小平均角誤差(MMSE)に基づくシリアル干渉キャンセル(SIC)技術を用いて、受信端で各端末ノードの勾配を1つずつ回復する。本稿では、勾配パラメータを複数のビットに量子化して、より多くの勾配情報を最大範囲に保持し、伝送誤差の許容性を向上させる。そこで我々は,SERベースのデバイス選択機構(SER-DSM)を設計し,学習性能が悪い通信条件のユーザの影響を受けないようにした。実験は、勾配の多重ビット量子化がFLに与える影響と、提案したSERデバイス選択機構の必要性と優位性を示す。

The quality of wireless communication will directly affect the performance of federated learning (FL), so this paper analyze the influence of wireless communication on FL through symbol error rate (SER). In FL system, non-orthogonal multiple access (NOMA) can be used as the basic communication framework to reduce the communication congestion and interference caused by multiple users, which takes advantage of the superposition characteristics of wireless channels. The Minimum Mean Square Error (MMSE) based serial interference cancellation (SIC) technology is used to recover the gradient of each terminal node one by one at the receiving end. In this paper, the gradient parameters are quantized into multiple bits to retain more gradient information to the maximum extent and to improve the tolerance of transmission errors. On this basis, we designed the SER-based device selection mechanism (SER-DSM) to ensure that the learning performance is not affected by users with bad communication conditions, while accommodating as many users as possible to participate in the learning process, which is inclusive to a certain extent. The experiments show the influence of multi-bit quantization of gradient on FL and the necessity and superiority of the proposed SER-based device selection mechanism.

翻訳日:2024-05-12 16:00:17 公開日:2024-04-20

# 学生オンライン授業間相互作用の日常的行動分類

Ordinal Behavior Classification of Student Online Course Interactions ( http://arxiv.org/abs/2405.05142v1 )

ライセンス: Link先を確認

Thomas Trask,

(参考訳) オンライン授業とMOOCスタイルのオンライン授業における学生のインタラクションパターンに関する研究は,過去11年間にわたって広く研究されてきた。しかし、オンライン・コースとMOOCスタイルのオンライン・フォーマットで提供されるのと同じコースを修了する学生の習慣を比較した文献の差は依然として残っている。本研究は、ジョージア工科大学CS1301 edxコースの学生を対象に、オンラインコースとMOOCスタイルコースの両方でブラウザベースの利用パターンを調べ、この2つのコースの間にどのようなパターンが存在するかを決定する。

The study in interaction patterns between students in on-campus and MOOC-style online courses has been broadly studied for the last 11 years. Yet there remains a gap in the literature comparing the habits of students completing the same course offered in both on-campus and MOOC-style online formats. This study will look at browser-based usage patterns for students in the Georgia Tech CS1301 edx course for both the online course offered to on-campus students and the MOOCstyle course offered to anyone to determine what, if any, patterns exist between the two cohorts.

翻訳日:2024-05-12 15:40:48 公開日:2024-04-20

# ノード決定プール付きグラフニューラルネットワークにおける階層的表現学習

Hierarchical Representation Learning in Graph Neural Networks with Node Decimation Pooling ( http://arxiv.org/abs/1910.11436v3 )

ライセンス: Link先を確認

Filippo Maria Bianchi, Daniele Grattarola, Lorenzo Livi, Cesare Alippi,

(参考訳) グラフニューラルネットワーク(GNN)では、プール演算子は入力グラフの局所的な要約を計算し、そのグローバルな特性を捉える。本研究では,全体のグラフトポロジを保ちながら粗いグラフを生成するGNNのためのプール演算子であるノード決定プール(NDP)を提案する。トレーニング中、GNNは新しいノード表現を学び、それらを粗いグラフのピラミッドに適合させ、前処理の段階でオフラインで計算する。 NDPは3つのステップから構成される。まず、ノードデシメーション手順は、スペクトルアルゴリズムによって同定された分割の一方の側に属するノードを選択し、 \maxcut{} 解を近似する。その後、選択されたノードはKron還元と接続され、粗いグラフを形成する。最後に、得られたグラフは非常に密度が高いので、粗いグラフの隣接行列を具現化してGNNの計算コストを削減するスペーシフィケーション手法を適用する。特に、グラフ構造を著しく変更することなく、多くのエッジを除去できることが示されている。実験の結果、NDPは最先端のグラフプーリング演算子よりも効率が良く、同時に、多種多様なグラフ分類タスクにおける競合性能も向上していることがわかった。

In graph neural networks (GNNs), pooling operators compute local summaries of input graphs to capture their global properties, and they are fundamental for building deep GNNs that learn hierarchical representations. In this work, we propose the Node Decimation Pooling (NDP), a pooling operator for GNNs that generates coarser graphs while preserving the overall graph topology. During training, the GNN learns new node representations and fits them to a pyramid of coarsened graphs, which is computed offline in a pre-processing stage. NDP consists of three steps. First, a node decimation procedure selects the nodes belonging to one side of the partition identified by a spectral algorithm that approximates the \maxcut{} solution. Afterwards, the selected nodes are connected with Kron reduction to form the coarsened graph. Finally, since the resulting graph is very dense, we apply a sparsification procedure that prunes the adjacency matrix of the coarsened graph to reduce the computational cost in the GNN. Notably, we show that it is possible to remove many edges without significantly altering the graph structure. Experimental results show that NDP is more efficient compared to state-of-the-art graph pooling operators while reaching, at the same time, competitive performance on a significant variety of graph classification tasks.

翻訳日:2024-05-05 18:18:22 公開日:2024-04-20

# LEMDA:IoTシステムにおける侵入検出のための新機能エンジニアリング手法

LEMDA: A Novel Feature Engineering Method for Intrusion Detection in IoT Systems ( http://arxiv.org/abs/2404.16870v1 )

ライセンス: Link先を確認

Ali Ghubaish, Zebo Yang, Aiman Erbad, Raj Jain,

(参考訳) モノのインターネット(IoT)システム用の侵入検知システム(IDS)は、AIベースのモデルを使用してセキュアな通信を保証できる。 IoTシステムは、複雑なモデルを必要とする大量のデータを生成する多くの接続デバイスを持つ傾向があります。複雑なモデルには、オーバーフィット、低い解釈可能性、高い計算複雑性といった悪名高い問題がある。モデル複雑性のペナルティ(すなわち正規化)を追加することで過度な適合が容易になるが、解釈可能性や計算効率の面ではほとんど役に立たない。機能エンジニアリングはこれらの問題を解決することができるため、大規模なIoTシステムではIDSがデータのサイズと寸法を減らし、パフォーマンスが良く、データストレージが小さくなり、高速な検出が可能になった。本稿では,LEMDA (Mean Decrease in Accuracyに基づく光機能工学) と呼ばれる新しい特徴工学手法を提案する。 LEMDAは指数減衰と任意の感度因子を応用し、最も情報性の高い特徴を選択・生成する。提案手法は,3つのIoTデータセットと4つのAI/MLモデルを用いて,他の機能工学手法と比較して評価されている。その結果,LEMDAは全てのIDSモデルのF1スコアを平均34%改善し,ほとんどの場合の平均トレーニング時間と検出時間を短縮した。

Intrusion detection systems (IDS) for the Internet of Things (IoT) systems can use AI-based models to ensure secure communications. IoT systems tend to have many connected devices producing massive amounts of data with high dimensionality, which requires complex models. Complex models have notorious problems such as overfitting, low interpretability, and high computational complexity. Adding model complexity penalty (i.e., regularization) can ease overfitting, but it barely helps interpretability and computational efficiency. Feature engineering can solve these issues; hence, it has become critical for IDS in large-scale IoT systems to reduce the size and dimensionality of data, resulting in less complex models with excellent performance, smaller data storage, and fast detection. This paper proposes a new feature engineering method called LEMDA (Light feature Engineering based on the Mean Decrease in Accuracy). LEMDA applies exponential decay and an optional sensitivity factor to select and create the most informative features. The proposed method has been evaluated and compared to other feature engineering methods using three IoT datasets and four AI/ML models. The results show that LEMDA improves the F1 score performance of all the IDS models by an average of 34% and reduces the average training and detection times in most cases.

翻訳日:2024-05-05 18:14:01 公開日:2024-04-20

# 知識グラフ完全性のための連続的関係抽出手法

A Continual Relation Extraction Approach for Knowledge Graph Completeness ( http://arxiv.org/abs/2404.17593v1 )

ライセンス: Link先を確認

Sefika Efeoglu,

(参考訳) 構造化された形式で非構造化データを表現することは、情報システム管理がそれを分析して解釈する上で最も重要なことである。これを実現するために、主なタスクがエンティティ認識と関係抽出と呼ばれる情報抽出パイプラインを活用することにより、構造化されていないデータを知識グラフに変換することができる。本論文は,実世界から来るデータストリーム内のエンティティ間の関係(相互接続)を識別する,新たな連続関係抽出手法を開発することを目的とする。この論文のドメイン固有のデータは、ドイツやオーストリアの新聞のコロナニュースである。

Representing unstructured data in a structured form is most significant for information system management to analyze and interpret it. To do this, the unstructured data might be converted into Knowledge Graphs, by leveraging an information extraction pipeline whose main tasks are named entity recognition and relation extraction. This thesis aims to develop a novel continual relation extraction method to identify relations (interconnections) between entities in a data stream coming from the real world. Domain-specific data of this thesis is corona news from German and Austrian newspapers.

翻訳日:2024-05-05 18:04:17 公開日:2024-04-20

# ソーシャルメディアの利用はアプリシーケンスから予測可能:LSTMとトランスフォーマーニューラルネットワークを用いて行動モデルを構築する

Social Media Use is Predictable from App Sequences: Using LSTM and Transformer Neural Networks to Model Habitual Behavior ( http://arxiv.org/abs/2404.16066v1 )

ライセンス: Link先を確認

Heinrich Peters, Joseph B. Bayer, Sandra C. Matz, Yikun Chi, Sumer S. Vaid, Gabriella M. Harari,

(参考訳) 本稿では,スマートフォン利用者の逐次行動の予測モデルを用いて,ソーシャルメディアの習慣を研究する新しいアプローチを提案する。メディアおよび技術習慣に関する文献の多くは、自己報告アンケートや単純な行動頻度測定に頼っているが、メディアおよび技術習慣の重要かつ未検討の側面である、反復的な行動系列への組込みについて検討する。 Long Short-Term Memory(LSTM)とTransformer Neural Networkの活用 (i)ソーシャルメディアの利用は、内外レベルで予測可能である。 (II)ソーシャルメディア利用の予測可能性には、個人差が強い。いくつかのモデリング手法の性能について検討する。一すべての参加者から収集されたデータに基づいて訓練されたグローバルモデル 2イディオグラフィー人固有のモデル、及び三人固有のデータに基づいて微調整されたグローバルモデル。個人固有のモデリングも、個人固有のデータの微調整も、グローバルモデルよりも大幅に優れておらず、グローバルモデルが様々な慣用的行動パターンを表現できたことを示している。さらに,ソーシャルメディア利用の個人レベルの予測性は,一般のスマートフォン利用頻度やソーシャルメディア利用頻度と大きく関係しているわけではなく,行動頻度と異なる習慣の側面を捉えていることを示す。習慣モデリングと理論的発展の意味について論じる。

The present paper introduces a novel approach to studying social media habits through predictive modeling of sequential smartphone user behaviors. While much of the literature on media and technology habits has relied on self-report questionnaires and simple behavioral frequency measures, we examine an important yet understudied aspect of media and technology habits: their embeddedness in repetitive behavioral sequences. Leveraging Long Short-Term Memory (LSTM) and transformer neural networks, we show that (i) social media use is predictable at the within and between-person level and that (ii) there are robust individual differences in the predictability of social media use. We examine the performance of several modeling approaches, including (i) global models trained on the pooled data from all participants, (ii) idiographic person-specific models, and (iii) global models fine-tuned on person-specific data. Neither person-specific modeling nor fine-tuning on person-specific data substantially outperformed the global models, indicating that the global models were able to represent a variety of idiosyncratic behavioral patterns. Additionally, our analyses reveal that the person-level predictability of social media use is not substantially related to the frequency of smartphone use in general or the frequency of social media use, indicating that our approach captures an aspect of habits that is distinct from behavioral frequency. Implications for habit modeling and theoretical development are discussed.

翻訳日:2024-04-26 18:22:04 公開日:2024-04-20

# 形式的およびシミュレーションに基づくRADAR SoCの有効検証

Efficient Verification of a RADAR SoC Using Formal and Simulation-Based Methods ( http://arxiv.org/abs/2404.15371v1 )

ライセンス: Link先を確認

Aman Kumar, Mark Litterick, Samuele Candido,

(参考訳) IoT(Internet of Things)とHuman-to-Machine Interaction(HMI)の需要が増加するにつれ、このようなソリューションを提供する現代のSystem-on-Chips(SoC)はますます複雑になっています。この複雑な設計は、特に消費者電子製品にとって、市場へのタイム・トゥ・マーケットが重要な要素である場合、検証に重大な課題をもたらす。本稿では,複雑な無線検出・ラング(RADAR)をベースとしたSoCを用いて,ミリメートル精度で人体の動きのオンチップセンシングを行うケーススタディを提案する。我々は,形式的手法とシミュレーション的手法を併用して相互補完を行い,信頼性の高い検証サインオフを実現する。要件駆動のフローアプローチを採用する一方で、複数の要件に対応し、プロジェクトからのノウハウを強調するために、さまざまな検証方法の使用を実演する。さらに、機械学習(ML)ベースの手法、特にCadenceのXcelium MLツールを使用して、検証スループットを改善しました。

As the demand for Internet of Things (IoT) and Human-to-Machine Interaction (HMI) increases, modern System-on-Chips (SoCs) offering such solutions are becoming increasingly complex. This intricate design poses significant challenges for verification, particularly when time-to-market is a crucial factor for consumer electronics products. This paper presents a case study based on our work to verify a complex Radio Detection And Ranging (RADAR) based SoC that performs on-chip sensing of human motion with millimetre accuracy. We leverage both formal and simulation-based methods to complement each other and achieve verification sign-off with high confidence. While employing a requirements-driven flow approach, we demonstrate the use of different verification methods to cater to multiple requirements and highlight our know-how from the project. Additionally, we used Machine Learning (ML) based methods, specifically the Xcelium ML tool from Cadence, to improve verification throughput.

翻訳日:2024-04-25 15:44:33 公開日:2024-04-20

# RoadBEV:鳥の視線で道路表面を再構築する

RoadBEV: Road Surface Reconstruction in Bird's Eye View ( http://arxiv.org/abs/2404.06605v2 )

ライセンス: Link先を確認

Tong Zhao, Lei Yang, Yichen Xie, Mingyu Ding, Masayoshi Tomizuka, Yintao Wei,

(参考訳) 路面条件、特に幾何学的プロファイルは、自動運転車の走行性能に大きな影響を及ぼす。視覚に基づくオンライン道路再建は,道路情報を事前に収集する。モノクル深度推定やステレオマッチングといった既存のソリューションは、控えめなパフォーマンスに悩まされている。最近のバードアイビュー(Bird's-Eye-View、BEV)の認識技術は、より信頼性と正確な再構築の可能性を秘めている。本稿では, 単眼画像とステレオ画像で道路標高を推定する, RoadBEV-mono と RoadBEV-stereo の2つの簡易かつ効果的な道路標高復元モデルを提案する。前者はイメージビューから検索したボクセル特徴に基づく標高値と直接適合する一方、後者は左右のボクセル特徴の相違を示すBEVボリュームに基づく道路標高パターンを効率的に認識する。洞察に富んだ分析は、その構成と視点との相違を明らかにする。実世界のデータセットの実験は、モデルの有効性と優越性を検証します。 RoadBEVモノとRoadBEVステレオの標高誤差はそれぞれ1.83cmと0.50cmである。単眼画像に基づくBEVでは, 推定性能が50%向上した。我々のモデルは実用的な応用に期待でき、自律運転における視覚に基づくBEVの認識に貴重な基準を提供する。コードはhttps://github.com/ztsrxh/RoadBEVで公開されている。

Road surface conditions, especially geometry profiles, enormously affect driving performance of autonomous vehicles. Vision-based online road reconstruction promisingly captures road information in advance. Existing solutions like monocular depth estimation and stereo matching suffer from modest performance. The recent technique of Bird's-Eye-View (BEV) perception provides immense potential to more reliable and accurate reconstruction. This paper uniformly proposes two simple yet effective models for road elevation reconstruction in BEV named RoadBEV-mono and RoadBEV-stereo, which estimate road elevation with monocular and stereo images, respectively. The former directly fits elevation values based on voxel features queried from image view, while the latter efficiently recognizes road elevation patterns based on BEV volume representing discrepancy between left and right voxel features. Insightful analyses reveal their consistence and difference with perspective view. Experiments on real-world dataset verify the models' effectiveness and superiority. Elevation errors of RoadBEV-mono and RoadBEV-stereo achieve 1.83cm and 0.50cm, respectively. The estimation performance improves by 50\% in BEV based on monocular image. Our models are promising for practical applications, providing valuable references for vision-based BEV perception in autonomous driving. The code is released at https://github.com/ztsrxh/RoadBEV.

翻訳日:2024-04-24 18:46:42 公開日:2024-04-20

# コントラル検出の最適化: 効率的なNet-b4エンコーディングによるディープラーニングアプローチ

Optimizing Contrail Detection: A Deep Learning Approach with EfficientNet-b4 Encoding ( http://arxiv.org/abs/2404.14441v1 )

ライセンス: Link先を確認

Qunwei Lin, Qian Leng, Zhicheng Ding, Chao Yan, Xiaonan Xu,

(参考訳) 環境の持続可能性を求める中で、航空産業は生態系のフットプリントを最小限に抑えるという課題に直面している。主要な解決策の1つは、航空機の排気によって発生する直線的な氷結晶雲をターゲットとした避妊である。これらのコントラルは、大気熱を捕捉し、正確なセグメンテーションと、環境影響を測定するためのコントラル画像の包括的な分析を必要とすることで、地球温暖化を悪化させる。しかし、このセグメンテーションタスクは、異なる大気条件下でのコントラルの出現の変化と予測モデルにおける潜在的なミスアライメントの問題により複雑である。本稿では,特徴抽出に効率的なNet-b4エンコーダを応用した革新的な深層学習手法を提案し,衛星画像における反則検出の精度と効率を高めるために,誤り訂正,ソフトラベリング,擬似ラベル技術とシームレスに統合する。提案手法は,衛星画像の正確なコントラル検出と分析のための堅牢な枠組みを提供し,航空環境への影響軽減を支援することによって,コントラル画像解析を再定義し,持続可能な航空の目的に寄与することを目的としている。

In the pursuit of environmental sustainability, the aviation industry faces the challenge of minimizing its ecological footprint. Among the key solutions is contrail avoidance, targeting the linear ice-crystal clouds produced by aircraft exhaust. These contrails exacerbate global warming by trapping atmospheric heat, necessitating precise segmentation and comprehensive analysis of contrail images to gauge their environmental impact. However, this segmentation task is complex due to the varying appearances of contrails under different atmospheric conditions and potential misalignment issues in predictive modeling. This paper presents an innovative deep-learning approach utilizing the efficient net-b4 encoder for feature extraction, seamlessly integrating misalignment correction, soft labeling, and pseudo-labeling techniques to enhance the accuracy and efficiency of contrail detection in satellite imagery. The proposed methodology aims to redefine contrail image analysis and contribute to the objectives of sustainable aviation by providing a robust framework for precise contrail detection and analysis in satellite imagery, thus aiding in the mitigation of aviation's environmental impact.

翻訳日:2024-04-24 18:17:13 公開日:2024-04-20

# Smooth Q-Learningアルゴリズムの統一ODE解析

Unified ODE Analysis of Smooth Q-Learning Algorithms ( http://arxiv.org/abs/2404.14442v1 )

ライセンス: Link先を確認

Donghwan Lee,

(参考訳) Q-ラーニングの収束は、過去数十年にわたる広範な研究の焦点となっている。近年,Q-ラーニングのための漸近収束解析をスイッチングシステムフレームワークを用いて導入している。このアプローチは、連続時間スイッチングシステムとしてモデル化された非同期Q-ラーニングの収束を証明するために、いわゆる常微分方程式(ODE)アプローチを適用する。しかし、安定性を証明するためには、準単調性のような制約条件を基礎となるスイッチングシステムに満たさなければならないため、解析方法をスムーズなQ-ラーニング変種など他の強化学習アルゴリズムに容易に一般化することは困難である。本稿では、スイッチングシステムアプローチを改善し、Q-ラーニングとそのスムーズな変形を解析できる、より汎用的で統一的な収束解析を提案する。提案手法は,Lyapunov関数として機能する$p$-normに基づく同期Q-ラーニングの収束に関する過去の研究に動機付けられている。しかし、提案した分析は、より一般的なODEモデルに対処し、非同期Q-ラーニングと、より単純なフレームワークでそのスムーズなバージョンの両方をカバーできる。

Convergence of Q-learning has been the focus of extensive research over the past several decades. Recently, an asymptotic convergence analysis for Q-learning was introduced using a switching system framework. This approach applies the so-called ordinary differential equation (ODE) approach to prove the convergence of the asynchronous Q-learning modeled as a continuous-time switching system, where notions from switching system theory are used to prove its asymptotic stability without using explicit Lyapunov arguments. However, to prove stability, restrictive conditions, such as quasi-monotonicity, must be satisfied for the underlying switching systems, which makes it hard to easily generalize the analysis method to other reinforcement learning algorithms, such as the smooth Q-learning variants. In this paper, we present a more general and unified convergence analysis that improves upon the switching system approach and can analyze Q-learning and its smooth variants. The proposed analysis is motivated by previous work on the convergence of synchronous Q-learning based on $p$-norm serving as a Lyapunov function. However, the proposed analysis addresses more general ODE models that can cover both asynchronous Q-learning and its smooth versions with simpler frameworks.

翻訳日:2024-04-24 18:17:13 公開日:2024-04-20

# 意味的依存とキーワードに基づく機械翻訳の評価

Evaluation of Machine Translation Based on Semantic Dependencies and Keywords ( http://arxiv.org/abs/2404.14443v1 )

ライセンス: Link先を確認

Kewei Yuan, Qiurong Zhao, Yang Xu, Xiao Zhang, Huansheng Ning,

(参考訳) 本稿では,既存の機械翻訳評価アルゴリズムの多くが語彙情報と構文情報のみを考慮しているが,文に含まれる深い意味情報を無視するという事実を踏まえ,参照翻訳に基づいて機械翻訳の意味的正当性を評価し,意味的依存関係と文キーワード情報を統合する計算手法を提案する。ハルビン工科大学ソーシャル・コンピューティング・情報検索研究センターが開発した言語技術プラットフォームを用いて、文のセマンティック依存分析とキーワード分析を行い、キーワードに対応するセマンティック依存グラフ、キーワード、重み情報を取得する。文のセマンティック依存関係を持つすべての単語情報と、セマンティック情報に影響を与えるキーワード情報を含んでいる。単語と依存多機能を含む意味的関連性ペアを構築する。文のキーセマンティクスは意味依存によって抽出されたセマンティクス情報では強調できないため、あいまいなセマンティクス解析がもたらされる。したがって、機械翻訳セマンティック評価の範囲内には、文キーワード情報も含まれる。文の意味的正当性を包括的かつ詳細に評価するために, 実験結果から, 類似した手法と比較して, 評価アルゴリズムの精度が向上し, 機械翻訳の意味的正当性をより正確に測定できることを示した。

In view of the fact that most of the existing machine translation evaluation algorithms only consider the lexical and syntactic information, but ignore the deep semantic information contained in the sentence, this paper proposes a computational method for evaluating the semantic correctness of machine translations based on reference translations and incorporating semantic dependencies and sentence keyword information. Use the language technology platform developed by the Social Computing and Information Retrieval Research Center of Harbin Institute of Technology to conduct semantic dependency analysis and keyword analysis on sentences, and obtain semantic dependency graphs, keywords, and weight information corresponding to keywords. It includes all word information with semantic dependencies in the sentence and keyword information that affects semantic information. Construct semantic association pairs including word and dependency multi-features. The key semantics of the sentence cannot be highlighted in the semantic information extracted through semantic dependence, resulting in vague semantics analysis. Therefore, the sentence keyword information is also included in the scope of machine translation semantic evaluation. To achieve a comprehensive and in-depth evaluation of the semantic correctness of sentences, the experimental results show that the accuracy of the evaluation algorithm has been improved compared with similar methods, and it can more accurately measure the semantic correctness of machine translation.

翻訳日:2024-04-24 18:17:13 公開日:2024-04-20

# 不確かさを意識したベイズニューラルネットワークによるバッテリヘルスモニタリング

Practical Battery Health Monitoring using Uncertainty-Aware Bayesian Neural Network ( http://arxiv.org/abs/2404.14444v1 )

ライセンス: Link先を確認

Yunyi Zhao, Zhang Wei, Qingyu Yan, Man-Fai Ng, B. Sivaneasan, Cheng Xiang,

(参考訳) バッテリーの健康モニタリングと予測は、安全、持続可能性、経済的側面に大きな影響を与える電気移動時代において極めて重要である。既存の研究はしばしば予測精度に重点を置いているが、現実のアプリケーションにおける技術の展開を妨げる実用的な要因を無視する傾向がある。本稿では,バッテリ寿命予測のためのベイズニューラルネットワークに基づくモデルを開発する。本モデルでは,モデルの各パラメータに対して,バッテリ健康に関するセンサデータを使用し,単一点ではなく分布を適用した。これにより、モデルが固有のランダム性とバッテリ健康の不確実性をキャプチャし、正確な予測だけでなく、定量的な不確実性も得られる。提案モデルの有効性を実験的に検証し, 予測誤差は平均13.9%, 特定の試験電池では2.9%であった。さらに、すべての予測には定量的な確実性が含まれており、バッテリーの初期から中期にかけて66%改善されている。この研究は、バッテリ技術に対する実用的価値を持ち、業界における技術導入の加速に寄与している。

Battery health monitoring and prediction are critically important in the era of electric mobility with a huge impact on safety, sustainability, and economic aspects. Existing research often focuses on prediction accuracy but tends to neglect practical factors that may hinder the technology's deployment in real-world applications. In this paper, we address these practical considerations and develop models based on the Bayesian neural network for predicting battery end-of-life. Our models use sensor data related to battery health and apply distributions, rather than single-point, for each parameter of the models. This allows the models to capture the inherent randomness and uncertainty of battery health, which leads to not only accurate predictions but also quantifiable uncertainty. We conducted an experimental study and demonstrated the effectiveness of our proposed models, with a prediction error rate averaging 13.9%, and as low as 2.9% for certain tested batteries. Additionally, all predictions include quantifiable certainty, which improved by 66% from the initial to the mid-life stage of the battery. This research has practical values for battery technologies and contributes to accelerating the technology adoption in the industry.

翻訳日:2024-04-24 18:17:13 公開日:2024-04-20

# 大規模言語モデルによる合成データ評価のための多面的評価フレームワーク

A Multi-Faceted Evaluation Framework for Assessing Synthetic Data Generated by Large Language Models ( http://arxiv.org/abs/2404.14445v1 )

ライセンス: Link先を確認

Yefeng Yuan, Yuhong Liu, Liang Cheng,

(参考訳) 生成型AIと大規模言語モデル(LLM)の急速な進歩は、特に製品レビューのような構造化表形式の領域において、合成データを生成するための新たな道を開いた。潜在的なメリットにもかかわらず、特にトレーニングデータセットで個人情報が使用される場合、プライバシリークに関する懸念が表面化している。さらに、生成された合成データの品質を定量的に測定し、下流タスクに利用できる総合的な評価フレームワークが存在しない。このギャップに対応するために、さまざまな評価指標を用いて合成された表データの忠実さ、有用性、およびプライバシー保護を評価するために設計されたオープンソースの評価フレームワークであるSynEvalを紹介した。提案するフレームワークであるSynEvalの有効性を,ChatGPT,Claude,Llamaの3つの最先端LCMから生成された総合製品レビューデータに適用して検証した。実験結果から, 合成データ生成の文脈における各種評価指標間のトレードオフを明らかにした。さらに、SynEvalは、合成表データに携わる研究者や実践者にとって重要な手段であり、特定のアプリケーションに対して生成されたデータの適合性を司法的に判断する権限を与え、ユーザのプライバシの維持に重点を置いている。

The rapid advancements in generative AI and large language models (LLMs) have opened up new avenues for producing synthetic data, particularly in the realm of structured tabular formats, such as product reviews. Despite the potential benefits, concerns regarding privacy leakage have surfaced, especially when personal information is utilized in the training datasets. In addition, there is an absence of a comprehensive evaluation framework capable of quantitatively measuring the quality of the generated synthetic data and their utility for downstream tasks. In response to this gap, we introduce SynEval, an open-source evaluation framework designed to assess the fidelity, utility, and privacy preservation of synthetically generated tabular data via a suite of diverse evaluation metrics. We validate the efficacy of our proposed framework - SynEval - by applying it to synthetic product review data generated by three state-of-the-art LLMs: ChatGPT, Claude, and Llama. Our experimental findings illuminate the trade-offs between various evaluation metrics in the context of synthetic data generation. Furthermore, SynEval stands as a critical instrument for researchers and practitioners engaged with synthetic tabular data,, empowering them to judiciously determine the suitability of the generated data for their specific applications, with an emphasis on upholding user privacy.

翻訳日:2024-04-24 18:17:13 公開日:2024-04-20

# NVIDIA弾性率に基づく物理インフォームドニューラル演算子フォワードモデルによる新しいA.I型貯留層評価

A Novel A.I Enhanced Reservoir Characterization with a Combined Mixture of Experts -- NVIDIA Modulus based Physics Informed Neural Operator Forward Model ( http://arxiv.org/abs/2404.14447v1 )

ライセンス: Link先を確認

Clement Etienam, Yang Juntao, Issam Said, Oleg Ovcharenko, Kaustubh Tangsali, Pavel Dimitrov, Ken Hester,

(参考訳) 本研究では,貯水池評価のための高度なワークフローを開発し,新しいアプローチによる貯水池履歴マッチングの課題を効果的に解決した。本手法は,高度なクラスタ分類回帰(CCR)フレームワークにおいて,物理インフォームドニューラル演算子(PINO)をフォワードモデルとして統合する。このプロセスは、貯水池履歴マッチングにおける急激な不確実性定量化のために最適化された適応正規化アンサンブルカルマンインバージョン(aREKI)によって強化される。このイノベーティブなワークフローは未知の透水性とポロシティの場をパラメータ化し、変分畳み込みオートエンコーダやCCRのような技術で非ガウス測度を捉える。エキゾチックな先行と教師付きモデルとして機能するCCRは、ピースマンウェル方程式の非線形ダイナミクスを正確にシミュレートするために、PINOサロゲートと相乗化する。 CCRアプローチは、各ステージに異なる機械学習アルゴリズムを適用する際の柔軟性を可能にする。 PINO貯水池サロゲートの更新は、監督データ、初期条件、黒油PDEの残留物から得られた損失関数によって駆動される。我々の統合モデルはPINO-Res-Simと呼ばれ、圧力、飽和度、石油、水、ガスの生産速度を含む重要なパラメータを出力します。合成貯水池とノルンフィールドの制御実験により従来のシミュレータに対して検証された手法は、顕著な精度を示した。さらに、aREKIワークフローのPINO-Res-Simは、従来の手法よりも100～6000倍高速な計算速度で、未知のフィールドを効率よく回収した。 NVIDIA H100上で実行されるPINO-Res-Simの学習フェーズは驚くほど効率的で、複雑な計算タスクのためのアンサンブルベースのメソッドと互換性があった。

We have developed an advanced workflow for reservoir characterization, effectively addressing the challenges of reservoir history matching through a novel approach. This method integrates a Physics Informed Neural Operator (PINO) as a forward model within a sophisticated Cluster Classify Regress (CCR) framework. The process is enhanced by an adaptive Regularized Ensemble Kalman Inversion (aREKI), optimized for rapid uncertainty quantification in reservoir history matching. This innovative workflow parameterizes unknown permeability and porosity fields, capturing non-Gaussian posterior measures with techniques such as a variational convolution autoencoder and the CCR. Serving as exotic priors and a supervised model, the CCR synergizes with the PINO surrogate to accurately simulate the nonlinear dynamics of Peaceman well equations. The CCR approach allows for flexibility in applying distinct machine learning algorithms across its stages. Updates to the PINO reservoir surrogate are driven by a loss function derived from supervised data, initial conditions, and residuals of governing black oil PDEs. Our integrated model, termed PINO-Res-Sim, outputs crucial parameters including pressures, saturations, and production rates for oil, water, and gas. Validated against traditional simulators through controlled experiments on synthetic reservoirs and the Norne field, the methodology showed remarkable accuracy. Additionally, the PINO-Res-Sim in the aREKI workflow efficiently recovered unknown fields with a computational speedup of 100 to 6000 times faster than conventional methods. The learning phase for PINO-Res-Sim, conducted on an NVIDIA H100, was impressively efficient, compatible with ensemble-based methods for complex computational tasks.

翻訳日:2024-04-24 18:17:13 公開日:2024-04-20

# オブジェクト指向アーキテクチャ:デュランプレートのためのソフトウェア工学にヒントを得た形状文法

Object-Oriented Architecture: A Software Engineering-Inspired Shape Grammar for Durands Plates ( http://arxiv.org/abs/2404.14448v1 )

ライセンス: Link先を確認

Rohan Agarwal,

(参考訳) モジュラーアーキテクチャ設計の課題に対処するため,計算機科学の関数型およびオブジェクト指向プログラミング原理を用いた形状文法システムの実装を通じて,新しいアプローチを提案する。フレンチ・ネオ古典主義の建築家ジャン=ニコラ=ルイ・デュラン(Jean-Nicolas-Louis Durand)は、モジュラー・ルールに基づく建築の手法で知られており、複雑な建築様式を体系的に表現するシステムの能力を示している。コンピュータプログラミングの原理を活用することで、提案された方法論は、デュランのオリジナルプレートの固有の論理に固執しながら、多様な設計を作成できる。 Shape Machineの統合により、アーキテクトやデザイナのためのフレキシブルなフレームワークが可能になり、既存のCADソフトウェアでモジュール化された方法で複雑な構造を生成することができる。本研究は建築設計における計算ツールの探索に寄与し、歴史的に重要な建築要素を合成するための汎用的なソリューションを提供する。

Addressing the challenge of modular architectural design, this study presents a novel approach through the implementation of a shape grammar system using functional and object-oriented programming principles from computer science. The focus lies on the modular generation of plates in the style of French Neoclassical architect Jean-Nicolas-Louis Durand, known for his modular rule-based method to architecture, demonstrating the system's capacity to articulate intricate architectural forms systematically. By leveraging computer programming principles, the proposed methodology allows for the creation of diverse designs while adhering to the inherent logic of Durand's original plates. The integration of Shape Machine allows a flexible framework for architects and designers, enabling the generation of complex structures in a modular fashion in existing CAD software. This research contributes to the exploration of computational tools in architectural design, offering a versatile solution for the synthesis of historically significant architectural elements.

翻訳日:2024-04-24 18:17:13 公開日:2024-04-20

# ニューラルネットワークを用いたStackOverflowの質問品質予測

Predicting Question Quality on StackOverflow with Neural Networks ( http://arxiv.org/abs/2404.14449v1 )

ライセンス: Link先を確認

Mohammad Al-Ramahi, Izzat Alsmadi, Abdullah Wahbeh,

(参考訳) インターネットやソーシャルメディアを通じて利用できる情報の豊富さは前例がない。コンピューティング分野において、Stack OverflowのようなWebサイトは、コンピューティングとプログラミングの問題に対するソリューションを求めるユーザにとって重要なソースだと考えられている。しかし、他のソーシャルメディアプラットフォームと同様に、Stack Overflowには関連する情報と無関係な情報が混在している。本稿では,質問応答(QA)コミュニティの例として,Stack Overflowにおける質問の品質を予測するニューラルネットワークモデルの評価を行った。その結果、ベースライン機械学習モデルと比較してニューラルネットワークモデルの有効性を示し、80%の精度を実現した。さらに,ニューラルネットワークモデルにおけるレイヤーの数は,その性能に大きな影響を及ぼす可能性が示唆された。

The wealth of information available through the Internet and social media is unprecedented. Within computing fields, websites such as Stack Overflow are considered important sources for users seeking solutions to their computing and programming issues. However, like other social media platforms, Stack Overflow contains a mixture of relevant and irrelevant information. In this paper, we evaluated neural network models to predict the quality of questions on Stack Overflow, as an example of Question Answering (QA) communities. Our results demonstrate the effectiveness of neural network models compared to baseline machine learning models, achieving an accuracy of 80%. Furthermore, our findings indicate that the number of layers in the neural network model can significantly impact its performance.

翻訳日:2024-04-24 18:17:13 公開日:2024-04-20

# GraphMatcher: オントロジーマッチングのためのグラフ表現学習アプローチ

GraphMatcher: A Graph Representation Learning Approach for Ontology Matching ( http://arxiv.org/abs/2404.14450v1 )

ライセンス: Link先を確認

Sefika Efeoglu,

(参考訳) オントロジーマッチングは、2つ以上のオントロジーにおいて2つ以上のエンティティ間の関係や対応を見つけるものとして定義される。ドメインオントロジーの相互運用性問題を解決するためには、これらのオントロジーにおける意味論的に類似したエンティティを見つけ、マージする前にアライメントする必要がある。本研究で開発されたGraphMatcherは,グラフアテンションを用いたオントロジーマッチングシステムである。 GraphMatcherは、オントロジーアライメント評価イニシアチブ (OAEI) 2022 のカンファレンストラックで顕著な結果を得た。そのコードは ~\url{https://github.com/sefeoglu/gat_ontology_matching} で公開されている。

Ontology matching is defined as finding a relationship or correspondence between two or more entities in two or more ontologies. To solve the interoperability problem of the domain ontologies, semantically similar entities in these ontologies must be found and aligned before merging them. GraphMatcher, developed in this study, is an ontology matching system using a graph attention approach to compute higher-level representation of a class together with its surrounding terms. The GraphMatcher has obtained remarkable results in in the Ontology Alignment Evaluation Initiative (OAEI) 2022 conference track. Its codes are available at ~\url{https://github.com/sefeoglu/gat_ontology_matching}.

翻訳日:2024-04-24 18:17:13 公開日:2024-04-20

# 高次元データの複数ビューにおける外乱検出のための生成サブスペース逆アクティブラーニング

Generative Subspace Adversarial Active Learning for Outlier Detection in Multiple Views of High-dimensional Data ( http://arxiv.org/abs/2404.14451v1 )

ライセンス: Link先を確認

Jose Cribeiro-Ramallo, Vadim Arzamasov, Federico Matteucci, Denis Wambold, Klemens Böhm,

(参考訳) 高次元表データのアウトリー検出は、データマイニングにおいて重要なタスクであり、多くの下流タスクやアプリケーションに必須である。既存の教師なしの外れ値検出アルゴリズムは、不適切な仮定(IA)、次元性の呪い(CD)、複数ビュー(MV)など1つ以上の問題に直面している。これらの課題に対処するために,複数の敵を持つジェネレーティブ・サブスペース・アドバイザリアル・アクティブ・ラーニング(GSAAL)を導入する。これらの敵対者は、異なるデータ部分空間上の限界クラス確率関数を学習し、一方、全空間の1つの生成器は、不等式全体の分布をモデル化する。 GSAAL は MV の制限に対応するために特別に設計されており、IA と CD も扱える唯一の方法である。本稿では,MVの包括的数学的定式化,識別器の収束保証,GSAALの拡張性について述べる。我々はGSAALの有効性とスケーラビリティを実証し、特にMVシナリオにおいて、他の一般的なOD手法と比較して優れた性能を示す。

Outlier detection in high-dimensional tabular data is an important task in data mining, essential for many downstream tasks and applications. Existing unsupervised outlier detection algorithms face one or more problems, including inlier assumption (IA), curse of dimensionality (CD), and multiple views (MV). To address these issues, we introduce Generative Subspace Adversarial Active Learning (GSAAL), a novel approach that uses a Generative Adversarial Network with multiple adversaries. These adversaries learn the marginal class probability functions over different data subspaces, while a single generator in the full space models the entire distribution of the inlier class. GSAAL is specifically designed to address the MV limitation while also handling the IA and CD, being the only method to do so. We provide a comprehensive mathematical formulation of MV, convergence guarantees for the discriminators, and scalability results for GSAAL. Our extensive experiments demonstrate the effectiveness and scalability of GSAAL, highlighting its superior performance compared to other popular OD methods, especially in MV scenarios.

翻訳日:2024-04-24 18:17:13 公開日:2024-04-20

# FIRST:FrontrunnIngのレジリエントなスマートコントラクト

FIRST: FrontrunnIng Resilient Smart ConTracts ( http://arxiv.org/abs/2204.00955v2 )

ライセンス: Link先を確認

Emrah Sariboz, Gaurav Panwar, Roopa Vishwanathan, Satyajayant Misra,

(参考訳) 暗号通貨の使用量の増加により、貸し出し、借り入れ、マージン取引などの従来の金融応用を暗号通貨の世界に広く浸透させてきた。一部のケースでは、本質的に透明で規制されていない暗号通貨が、これらのアプリケーションのユーザを攻撃します。悪意のあるエンティティは、現在処理されていない金融トランザクションの知識を活用し、未処理のトランザクションの前に独自のトランザクションを実行しようとする。この結果、財務的損失、不正確なトランザクション、さらにはより多くの攻撃にさらされる可能性がある。本稿では、最前線攻撃を防ぐフレームワークであるFIRSTを提案し、検証遅延関数やアグリゲートシグネチャを含む暗号プロトコルを用いて構築する。我々の設計では、VDFの公開パラメータを生成するためのフェデレートされたセットアップがあり、単一の信頼できるセットアップの必要性を排除しています。我々は、FIRSTを正式に分析し、Universal Composabilityフレームワークを用いてセキュリティを証明し、FIRSTの有効性を実験的に実証する。

Owing to the meteoric rise in the usage of cryptocurrencies, there has been a widespread adaptation of traditional financial applications such as lending, borrowing, margin trading, and more, to the cryptocurrency realm. In some cases, the inherently transparent and unregulated nature of cryptocurrencies leads to attacks on users of these applications. One such attack is frontrunning, where a malicious entity leverages the knowledge of currently unprocessed financial transactions submitted by users and attempts to get its own transaction(s) executed ahead of the unprocessed ones. The consequences of this can be financial loss, inaccurate transactions, and even exposure to more attacks. We propose FIRST, a framework that prevents frontrunning attacks, and is built using cryptographic protocols including verifiable delay functions and aggregate signatures. In our design, we have a federated setup for generating the public parameters of the VDF, thus removing the need for a single trusted setup. We formally analyze FIRST, prove its security using the Universal Composability framework and experimentally demonstrate the effectiveness of FIRST.

翻訳日:2024-04-24 01:49:47 公開日:2024-04-20

# ELODI:Positive-Congruent Trainingのためのロジット差分抑制

ELODI: Ensemble Logit Difference Inhibition for Positive-Congruent Training ( http://arxiv.org/abs/2205.06265v3 )

ライセンス: Link先を確認

Yue Zhao, Yantao Shen, Yuanjun Xiong, Shuo Yang, Wei Xia, Zhuowen Tu, Bernt Schiele, Stefano Soatto,

(参考訳) 負のフリップ(負のフリップ)は、レガシーモデルが更新されたときに分類システムで導入されたエラーである。既存の負のフリップ率(NFR)を減らす方法は、新しいモデルに古いモデルを模倣させたり、推論コストを禁ずるアンサンブルを使ったりすることで、全体的な精度を犠牲にしている。我々は、NFRの減少におけるアンサンブルの役割を分析し、通常決定境界に近くない負のフリップを除去するが、ロジット間の距離に大きな偏差を示すことが多いことを観察する。本研究は,誤差率とNFRの両方でパラゴン性能を実現する分類システムを,単一モデルの推論コストで訓練する手法であるELODI(Ensemble Logit Difference Inhibition)を提案する。この方法は、分類システムを更新するために使用される単一学生モデルに均質なアンサンブルを蒸留する。 ELODIはまた、最大ロジット値を持つクラスのサブセットのロジット差のみを罰する一般化された蒸留目標であるロジット差分抑制(LDI)も導入している。複数の画像分類ベンチマークでは、ELODIによるモデル更新により、精度の保持とNFRの低減が向上した。

Negative flips are errors introduced in a classification system when a legacy model is updated. Existing methods to reduce the negative flip rate (NFR) either do so at the expense of overall accuracy by forcing a new model to imitate the old models, or use ensembles, which multiply inference cost prohibitively. We analyze the role of ensembles in reducing NFR and observe that they remove negative flips that are typically not close to the decision boundary, but often exhibit large deviations in the distance among their logits. Based on the observation, we present a method, called Ensemble Logit Difference Inhibition (ELODI), to train a classification system that achieves paragon performance in both error rate and NFR, at the inference cost of a single model. The method distills a homogeneous ensemble to a single student model which is used to update the classification system. ELODI also introduces a generalized distillation objective, Logit Difference Inhibition (LDI), which only penalizes the logit difference of a subset of classes with the highest logit values. On multiple image classification benchmarks, model updates with ELODI demonstrate superior accuracy retention and NFR reduction.

翻訳日:2024-04-24 01:49:47 公開日:2024-04-20

# 置換に基づく進化的アルゴリズムの実行時解析

Runtime Analysis for Permutation-based Evolutionary Algorithms ( http://arxiv.org/abs/2207.04045v4 )

ライセンス: Link先を確認

Benjamin Doerr, Yassine Ghannane, Marouane Ibn Brahim,

(参考訳) 進化的アルゴリズム(EA)の理論解析は、過去25年間に擬ブール最適化問題において大きな進歩を遂げてきたが、EAが置換に基づく問題を解決する方法に関する散発的な理論的な結果のみが存在する。置換に基づくベンチマークの欠如を克服するため,従来の擬似ブールベンチマークを置換集合上で定義されたベンチマークに転送する一般的な方法を提案する。次に、Scharnow, Tinnefeld, Wegener (2004) が提案した置換に基づく$(1+1)$ EAの厳密な実行時解析を、LeadingOnes と Jump ベンチマークの類似で実施する。後者は、ビットストリングと異なり、置換を$\sigma$を別の$\tau$に変換するのがどれほど難しいかを決定するハミング距離だけでなく、$\sigma \tau^{-1}$の正確なサイクル構造も示している。このため、より対称なスクランブル突然変異作用素も考慮する。単純な証明に繋がるだけでなく、奇妙なジャンプサイズを持つジャンプ関数のランタイムを$\Thetaの係数で削減する。 (n)$。最後に、ビットストリングの場合のように、スクランブル演算子の重み付きバージョンが$m^{\Thetaの高速化につながることを示す。 (m)}$ on jump function with jump size $m$ 短い経験的分析によりこれらの知見が裏付けられるが、また、ヴォイド突然変異率のような小さな実装の詳細が重要な違いをもたらすことも明らかである。

While the theoretical analysis of evolutionary algorithms (EAs) has made significant progress for pseudo-Boolean optimization problems in the last 25 years, only sporadic theoretical results exist on how EAs solve permutation-based problems. To overcome the lack of permutation-based benchmark problems, we propose a general way to transfer the classic pseudo-Boolean benchmarks into benchmarks defined on sets of permutations. We then conduct a rigorous runtime analysis of the permutation-based $(1+1)$ EA proposed by Scharnow, Tinnefeld, and Wegener (2004) on the analogues of the LeadingOnes and Jump benchmarks. The latter shows that, different from bit-strings, it is not only the Hamming distance that determines how difficult it is to mutate a permutation $\sigma$ into another one $\tau$, but also the precise cycle structure of $\sigma \tau^{-1}$. For this reason, we also regard the more symmetric scramble mutation operator. We observe that it not only leads to simpler proofs, but also reduces the runtime on jump functions with odd jump size by a factor of $\Theta(n)$. Finally, we show that a heavy-tailed version of the scramble operator, as in the bit-string case, leads to a speed-up of order $m^{\Theta(m)}$ on jump functions with jump size $m$. A short empirical analysis confirms these findings, but also reveals that small implementation details like the rate of void mutations can make an important difference.

翻訳日:2024-04-24 01:41:46 公開日:2024-04-20

# ベスト賞の選考

Selection of the Most Probable Best ( http://arxiv.org/abs/2207.07533v2 )

ライセンス: Link先を確認

Taeho Kim, Kyoung-kuk Kim, Eunhye Song,

(参考訳) 予測値ランキングと選択(R&S)問題では,すべてのk解のシミュレーション出力が,分布によって不確実性をモデル化可能な共通パラメータに依存する。パラメータが有限である場合にMPBを学習するための効率的な逐次サンプリングアルゴリズムを設計し,最適である確率が最も高い解として,最も確率の高いベスト(MPB)を定義する。我々はMPBを誤って選択する確率の大きな偏差率を導出し、最適な計算予算割当問題を定式化し、速度最大化の静的サンプリング比を求める。その後、問題は緩和され、検証するために解釈可能で計算的に効率的である最適条件の集合が得られる。最適化条件における未知の手段をその推定値に置き換えるアルゴリズムを考案し,シミュレーション予算が増加するにつれて,アルゴリズムのサンプリング比が条件を満たすことを証明した。さらに, 平均推定にカーネルリッジレグレッションを適用し, 同じ漸近収束結果を得ることにより, アルゴリズムの実証性能を著しく向上できることを示す。これらのアルゴリズムは、最先端の文脈R&Sアルゴリズムとベンチマークされ、経験的性能が優れていることを示した。

We consider an expected-value ranking and selection (R&S) problem where all k solutions' simulation outputs depend on a common parameter whose uncertainty can be modeled by a distribution. We define the most probable best (MPB) to be the solution that has the largest probability of being optimal with respect to the distribution and design an efficient sequential sampling algorithm to learn the MPB when the parameter has a finite support. We derive the large deviations rate of the probability of falsely selecting the MPB and formulate an optimal computing budget allocation problem to find the rate-maximizing static sampling ratios. The problem is then relaxed to obtain a set of optimality conditions that are interpretable and computationally efficient to verify. We devise a series of algorithms that replace the unknown means in the optimality conditions with their estimates and prove the algorithms' sampling ratios achieve the conditions as the simulation budget increases. Furthermore, we show that the empirical performances of the algorithms can be significantly improved by adopting the kernel ridge regression for mean estimation while achieving the same asymptotic convergence results. The algorithms are benchmarked against a state-of-the-art contextual R&S algorithm and demonstrated to have superior empirical performances.

翻訳日:2024-04-24 01:41:46 公開日:2024-04-20

# DeepVARwT:トレンド付きVARモデルのディープラーニング

DeepVARwT: Deep Learning for a VAR Model with Trend ( http://arxiv.org/abs/2209.10587v4 )

ライセンス: Link先を確認

Xixi Li, Jingsong Yuan,

(参考訳) ベクトル自己回帰(VAR)モデルは、複数の時系列間の依存を記述するために使われてきた。これは定常時系列のモデルであり、各系列に決定論的傾向が存在するように拡張することができる。 VARモデルに適合する前に、データをパラメトリックまたは非パラメトリックに遅延すると、後半部ではより多くのエラーが発生する。本研究では,DeepVARwTと呼ばれる新しい手法を提案する。この手法は,トレンドと依存構造を同時に最大に推定する深層学習手法を用いている。この目的のためにLong Short-Term Memory (LSTM) ネットワークが使用される。モデルの安定性を確保するため、Ansley & Kohn (1986) の変換を用いて自己回帰係数の因果条件を適用する。シミュレーション研究と実データへの適用について述べる。本研究では,実データから生成した現実的傾向関数を用いて,実関数/パラメータ値と比較する。実データアプリケーションでは,本モデルの予測性能を文献の最先端モデルと比較する。

The vector autoregressive (VAR) model has been used to describe the dependence within and across multiple time series. This is a model for stationary time series which can be extended to allow the presence of a deterministic trend in each series. Detrending the data either parametrically or nonparametrically before fitting the VAR model gives rise to more errors in the latter part. In this study, we propose a new approach called DeepVARwT that employs deep learning methodology for maximum likelihood estimation of the trend and the dependence structure at the same time. A Long Short-Term Memory (LSTM) network is used for this purpose. To ensure the stability of the model, we enforce the causality condition on the autoregressive coefficients using the transformation of Ansley & Kohn (1986). We provide a simulation study and an application to real data. In the simulation study, we use realistic trend functions generated from real data and compare the estimates with true function/parameter values. In the real data application, we compare the prediction performance of this model with state-of-the-art models in the literature.

翻訳日:2024-04-24 01:41:46 公開日:2024-04-20

# PTDE:マルチエージェント強化学習のための拡張実行による個人化訓練

PTDE: Personalized Training with Distilled Execution for Multi-Agent Reinforcement Learning ( http://arxiv.org/abs/2210.08872v2 )

ライセンス: Link先を確認

Yiqun Chen, Hangyu Mao, Jiaxin Mao, Shiguang Wu, Tianle Zhang, Bin Zhang, Wei Yang, Hongxing Chang,

(参考訳) 分散実行による集中訓練(CTDE)は、多エージェント強化学習において広く採用されているパラダイムとして現れ、Q$-function(英語版)や集中的批判(英語版)を学習するためのグローバル情報の利用を強調している。対照的に、調査ではグローバルな情報を活用して、個別の$Q$関数や個々のアクターを直接強化しています。特に,全てのエージェントに対して同一のグローバル情報を普遍的に適用することは,最適な性能を示すには不十分であることが判明した。その結果、各エージェントに合わせたグローバル情報のカスタマイズを提唱し、総合的なパフォーマンスを高めるためにエージェント個人化されたグローバル情報を作成する。さらに,エージェント個人化されたグローバル情報をエージェントのローカル情報に蒸留するPTDE(Personalized Training with Distilled Execution)という新しいパラダイムを導入する。この蒸留された情報は、分散実行中に利用され、性能劣化を最小限に抑える。 PTDEは最先端のアルゴリズムとシームレスに統合できるため、SMACベンチマーク、Google Research Football(GRF)ベンチマーク、Learning to Rank(LTR)タスクなど、さまざまなベンチマークで注目すべきパフォーマンス向上を実現している。

Centralized Training with Decentralized Execution (CTDE) has emerged as a widely adopted paradigm in multi-agent reinforcement learning, emphasizing the utilization of global information for learning an enhanced joint $Q$-function or centralized critic. In contrast, our investigation delves into harnessing global information to directly enhance individual $Q$-functions or individual actors. Notably, we discover that applying identical global information universally across all agents proves insufficient for optimal performance. Consequently, we advocate for the customization of global information tailored to each agent, creating agent-personalized global information to bolster overall performance. Furthermore, we introduce a novel paradigm named Personalized Training with Distilled Execution (PTDE), wherein agent-personalized global information is distilled into the agent's local information. This distilled information is then utilized during decentralized execution, resulting in minimal performance degradation. PTDE can be seamlessly integrated with state-of-the-art algorithms, leading to notable performance enhancements across diverse benchmarks, including the SMAC benchmark, Google Research Football (GRF) benchmark, and Learning to Rank (LTR) task.

翻訳日:2024-04-24 01:41:46 公開日:2024-04-20

# OpenPack:IoT対応のロジスティック環境におけるパッケージ作業認識のための大規模データセット

OpenPack: A Large-scale Dataset for Recognizing Packaging Works in IoT-enabled Logistic Environments ( http://arxiv.org/abs/2212.11152v2 )

ライセンス: Link先を確認

Naoya Yoshimura, Jaime Morales, Takuya Maekawa, Takahiro Hara,

(参考訳) ヒトの日常活動とは異なり、産業領域における業務活動認識のための既存のセンサデータセットは、産業現場との密接なコラボレーションが必要なため、現実的なデータ収集の困難さによって制限されている。これはまた、産業応用のための方法の研究と開発を制限している。そこで本研究では,これらの課題に対処し,産業領域における作業活動の機械的認識に関する研究に寄与するため,OpenPackと呼ばれる大規模な作業認識データセットを新たに導入する。 OpenPackには、加速度データ、キーポイント、深度画像、IoT対応デバイス(例えばハンドヘルドバーコードスキャナー)からの読み取りを含む53.8時間のマルチモーダルセンサーデータが含まれており、パッケージング作業経験の異なる16の被験者から収集されている。本研究では,現在最先端の人間活動認識技術をデータセットに適用し,この結果に基づいて,広汎なコンピューティングコミュニティにおける複雑な作業活動認識研究の今後の方向性を示す。 OpenPackは、困難なタスクを提供することで、センサベースのアクション/アクティビティ認識コミュニティに貢献すると考えています。 OpenPackデータセットはhttps://open-pack.github.io.comで公開されている。

Unlike human daily activities, existing publicly available sensor datasets for work activity recognition in industrial domains are limited by difficulties in collecting realistic data as close collaboration with industrial sites is required. This also limits research on and development of methods for industrial applications. To address these challenges and contribute to research on machine recognition of work activities in industrial domains, in this study, we introduce a new large-scale dataset for packaging work recognition called OpenPack. OpenPack contains 53.8 hours of multimodal sensor data, including acceleration data, keypoints, depth images, and readings from IoT-enabled devices (e.g., handheld barcode scanners), collected from 16 distinct subjects with different levels of packaging work experience. We apply state-of-the-art human activity recognition techniques to the dataset and provide future directions of complex work activity recognition studies in the pervasive computing community based on the results. We believe that OpenPack will contribute to the sensor-based action/activity recognition community by providing challenging tasks. The OpenPack dataset is available at https://open-pack.github.io.

翻訳日:2024-04-24 01:41:46 公開日:2024-04-20

# リニア・オプティカル・トランスポート・エンベディング

Linear Optimal Partial Transport Embedding ( http://arxiv.org/abs/2302.03232v4 )

ライセンス: Link先を確認

Yikun Bai, Ivan Medri, Rocio Diaz Martin, Rana Muhammad Shahroz Khan, Soheil Kolouri,

(参考訳) 最適トランスポート(OT)は、機械学習、統計処理、信号処理など様々な分野で応用されている。しかし、バランスの取れた質量要件は、実用上の問題においてその性能を制限している。これらの制限に対処するため、不均衡なOT、最適部分輸送(OPT)、Hellinger Kantorovich(HK)を含むOT問題の変種が提案されている。本稿では,OTおよびHK上の(局所的な)線形化手法をOPT問題に拡張したリニア最適部分輸送(LOPT)埋め込みを提案する。提案手法は,2組の正測度間のOPT距離の計算を高速化する。理論的な貢献に加えて,ポイントクラウド補間およびPCA解析におけるLOPT埋め込み手法の実証を行った。

Optimal transport (OT) has gained popularity due to its various applications in fields such as machine learning, statistics, and signal processing. However, the balanced mass requirement limits its performance in practical problems. To address these limitations, variants of the OT problem, including unbalanced OT, Optimal partial transport (OPT), and Hellinger Kantorovich (HK), have been proposed. In this paper, we propose the Linear optimal partial transport (LOPT) embedding, which extends the (local) linearization technique on OT and HK to the OPT problem. The proposed embedding allows for faster computation of OPT distance between pairs of positive measures. Besides our theoretical contributions, we demonstrate the LOPT embedding technique in point-cloud interpolation and PCA analysis.

翻訳日:2024-04-24 01:32:01 公開日:2024-04-20

# インスタンスアソシエーションの展開:オーディオ・ビジュアル・セグメンテーションの概観

Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation ( http://arxiv.org/abs/2304.02970v6 )

ライセンス: Link先を確認

Yuanhong Chen, Yuyuan Liu, Hu Wang, Fengbei Liu, Chong Wang, Helen Frazer, Gustavo Carneiro,

(参考訳) 音声視覚セグメント化(AVS)は、音声視覚キューに基づいて、正確に音を分割する作業である。音声・視覚学習の有効性は、音と視覚オブジェクトの正確な相互アライメントの実現に大きく依存する。健全な視覚学習には2つの重要な要素が必要である。 1)高品質な画素レベルのマルチクラスアノテート画像とオーディオファイルに関連付けられた課題データセット 2)音声情報とそれに対応する視覚オブジェクトとの強いつながりを確立できるモデル。しかしながら、これらの要件は、偏りのあるオーディオ視覚データを含むトレーニングセットや、偏りのあるトレーニングセットをはるかに越えたモデルなど、現在の手法によって部分的に解決されているだけである。本研究では,難易度と比較的偏りのない高画質な視覚的セグメンテーション・ベンチマークを構築するための費用対効果の新たな手法を提案する。また,音声・視覚指導型コントラスト学習のための新たな情報的サンプルマイニング手法を提案し,識別的コントラスト的サンプルを利用してモーダル間理解を実現する。ベンチマークの有効性を示す実験結果を示す。さらに,既存のAVSデータセットおよび新しいベンチマークを用いて行った実験により,本手法が最先端(SOTA)セグメンテーション精度を実現することを示す。

Audio-visual segmentation (AVS) is a challenging task that involves accurately segmenting sounding objects based on audio-visual cues. The effectiveness of audio-visual learning critically depends on achieving accurate cross-modal alignment between sound and visual objects. Successful audio-visual learning requires two essential components: 1) a challenging dataset with high-quality pixel-level multi-class annotated images associated with audio files, and 2) a model that can establish strong links between audio information and its corresponding visual object. However, these requirements are only partially addressed by current methods, with training sets containing biased audio-visual data, and models that generalise poorly beyond this biased training set. In this work, we propose a new cost-effective strategy to build challenging and relatively unbiased high-quality audio-visual segmentation benchmarks. We also propose a new informative sample mining method for audio-visual supervised contrastive learning to leverage discriminative contrastive samples to enforce cross-modal understanding. We show empirical results that demonstrate the effectiveness of our benchmark. Furthermore, experiments conducted on existing AVS datasets and on our new benchmark show that our method achieves state-of-the-art (SOTA) segmentation accuracy.

翻訳日:2024-04-24 01:32:01 公開日:2024-04-20

# CAFIN: グラフ上での教師なし表現学習のためのインプロセッシングによる中心性意識の公平性

CAFIN: Centrality Aware Fairness inducing IN-processing for Unsupervised Representation Learning on Graphs ( http://arxiv.org/abs/2304.04391v3 )

ライセンス: Link先を確認

Arvindh Arun, Aakash Aanegola, Amul Agrawal, Ramasuri Narayanam, Ponnurangam Kumaraguru,

(参考訳) グラフ上での教師なし表現学習は、乱れのないネットワークデータの増大と、生成された表現のコンパクトさ、豊かさ、有用性により、勢いを増している。この文脈では、表現の生成中に公平さとバイアスの制約を考慮する必要性が十分に動機付けられ、先行研究である程度研究されている。この設定における以前の研究の大きな制限の1つは、ノード間の不均等なパフォーマンスをもたらす様々なノード中心性など、グラフ内の接続パターンによって生じるバイアスに対処することを目的としていないことである。本研究は,教師なし環境でのグラフ構造によるバイアス軽減の問題に対処することを目的としている。この目的のために我々は,グラフの構造情報を活用し,既存のフレームワークが生成した表現をチューニングする中心性を考慮したフェアネス誘導フレームワークであるCAFINを提案する。 GraphSAGE(このドメインで人気のあるフレームワーク)にデプロイし、ノード分類とリンク予測という2つの下流タスクで有効性を示します。実証的には、CAFINは、さまざまなドメインからの一般的なデータセット(18から80%のパフォーマンス格差の削減)間のパフォーマンス格差を一貫して低減します。

Unsupervised Representation Learning on graphs is gaining traction due to the increasing abundance of unlabelled network data and the compactness, richness, and usefulness of the representations generated. In this context, the need to consider fairness and bias constraints while generating the representations has been well-motivated and studied to some extent in prior works. One major limitation of most of the prior works in this setting is that they do not aim to address the bias generated due to connectivity patterns in the graphs, such as varied node centrality, which leads to a disproportionate performance across nodes. In our work, we aim to address this issue of mitigating bias due to inherent graph structure in an unsupervised setting. To this end, we propose CAFIN, a centrality-aware fairness-inducing framework that leverages the structural information of graphs to tune the representations generated by existing frameworks. We deploy it on GraphSAGE (a popular framework in this domain) and showcase its efficacy on two downstream tasks - Node Classification and Link Prediction. Empirically, CAFIN consistently reduces the performance disparity across popular datasets (varying from 18 to 80% reduction in performance disparity) from various domains while incurring only a minimal cost of fairness.

翻訳日:2024-04-24 01:32:01 公開日:2024-04-20

# SPIRiT-Diffusion: 加速度MRIのための自己整合駆動拡散モデル

SPIRiT-Diffusion: Self-Consistency Driven Diffusion Model for Accelerated MRI ( http://arxiv.org/abs/2304.05060v2 )

ライセンス: Link先を確認

Zhuo-Xu Cui, Chentao Cao, Yue Wang, Sen Jia, Jing Cheng, Xin Liu, Hairong Zheng, Dong Liang, Yanjie Zhu,

(参考訳) 拡散モデルは画像生成の指導的手法として登場し、磁気共鳴画像再構成(MRI)の領域で成功している。しかし、拡散モデルに基づく既存の再構成法は主に画像領域で定式化されており、コイル感度マップ(CSM)における不正確性に影響を受けやすい。 k空間補間法はこの問題に効果的に対処できるが、従来の拡散モデルはk空間補間では容易に適用できない。この課題を克服するために,反復自己整合SPIRiT法に着想を得たk空間補間拡散モデルであるSPIRiT-Diffusionを提案する。具体的には、SPIRiTにおける自己整合項(k-空間物理先行項)の反復解法を用いて、拡散過程を管理する新しい確率微分方程式(SDE)を定式化する。その後、拡散処理を行うことでk空間データを補間することができる。この革新的なアプローチは、拡散モデルにおいてSDEを設計する際の最適化モデルの役割を強調し、拡散プロセスは、モデル駆動拡散と呼ばれる概念である最適化モデルに固有の物理学と密に一致させることができる。頭蓋内3次元画像と頸動脈壁画像を用いたSPIRiT-Diffusion法について検討した。その結果, 画像領域再構築法よりも精度が高く, 10。

Diffusion models have emerged as a leading methodology for image generation and have proven successful in the realm of magnetic resonance imaging (MRI) reconstruction. However, existing reconstruction methods based on diffusion models are primarily formulated in the image domain, making the reconstruction quality susceptible to inaccuracies in coil sensitivity maps (CSMs). k-space interpolation methods can effectively address this issue but conventional diffusion models are not readily applicable in k-space interpolation. To overcome this challenge, we introduce a novel approach called SPIRiT-Diffusion, which is a diffusion model for k-space interpolation inspired by the iterative self-consistent SPIRiT method. Specifically, we utilize the iterative solver of the self-consistent term (i.e., k-space physical prior) in SPIRiT to formulate a novel stochastic differential equation (SDE) governing the diffusion process. Subsequently, k-space data can be interpolated by executing the diffusion process. This innovative approach highlights the optimization model's role in designing the SDE in diffusion models, enabling the diffusion process to align closely with the physics inherent in the optimization model, a concept referred to as model-driven diffusion. We evaluated the proposed SPIRiT-Diffusion method using a 3D joint intracranial and carotid vessel wall imaging dataset. The results convincingly demonstrate its superiority over image-domain reconstruction methods, achieving high reconstruction quality even at a substantial acceleration rate of 10.

翻訳日:2024-04-24 01:32:01 公開日:2024-04-20

# 2体リニアキックロータシステムにおけるカオスと局部位相

Chaos and localized phases in a two-body linear kicked rotor system ( http://arxiv.org/abs/2304.08899v2 )

ライセンス: Link先を確認

Anjali Nambudiripad, J. Bharathi Kannan, M. S. Santhanam,

(参考訳) 周期的なキックにもかかわらず、リニアキックドローター(LKR)は、運動エネルギー項が運動量で線形である積分可能かつ正確に解けるモデルである。最近、空間的に相互作用するLKRも積分可能であることが示され、対応する量子状態における動的局在が得られた。同様の局所化位相は、連結相対論的キックローターのような他の非可積分モデルにも存在する。この研究は2体LKRを用いて2つの主要な結果を示し、第一に、ローターのモータ間の相互作用を通じて、積分可能なリニアキックローターにカオスが引き起こされることを示した。リアプノフ指数の分析的推定値を得る。第二に、このカオスモデルの量子力学は、蹴りの強さと相互作用の強さの変化によって、古典的に誘導された局在化、動的局在化、部分拡散および拡散相といった様々な相を示すことが示されている。本システムにおける絡み合い生産の観点から,これらの位相のシグネチャを指摘する。有効ヒルベルト空間次元を定義することにより、絡み合う成長速度を適切なランダム行列平均を用いて理解することができる。

Despite the periodic kicks, a linear kicked rotor (LKR) is an integrable and exactly solvable model in which the kinetic energy term is linear in momentum. It was recently shown that spatially interacting LKRs are also integrable, and results in dynamical localization in the corresponding quantum regime. Similar localized phases exist in other non-integrable models such as the coupled relativistic kicked rotors. This work, using a two-body LKR, demonstrates two main results; firstly, it is shown that chaos can be induced in the integrable linear kicked rotor through interactions between the momenta of rotors. An analytical estimate of its Lyapunov exponent is obtained. Secondly, the quantum dynamics of this chaotic model, upon variation of kicking and interaction strengths, is shown to exhibit a variety of phases -- classically induced localization, dynamical localization, subdiffusive and diffusive phases. We point out the signatures of these phases from the perspective of entanglement production in this system. By defining an effective Hilbert space dimension, the entanglement growth rate can be understood using appropriate random matrix averages.

翻訳日:2024-04-24 01:32:01 公開日:2024-04-20

# Curious Rhythms: ウィキペディア消費の時間的規則性

Curious Rhythms: Temporal Regularities of Wikipedia Consumption ( http://arxiv.org/abs/2305.09497v3 )

ライセンス: Link先を確認

Tiziano Piccardi, Martin Gerlach, Robert West,

(参考訳) ウィキペディアは世界最大の百科事典として、幅広い情報ニーズに対応している。以前の研究では、ウィキペディア利用者の情報は1日を通して異なることが指摘されていたが、現在までに基礎となる力学の大規模かつ定量的な研究は行われていない。本論文は,英語ウィキペディアのサーバログから抽出した数十億件のタイムゾーン補正ページ要求を大規模に分析し,その状況と時間が消費情報の種類とどのように関連しているかを調査することによって,このギャップを埋めるものである。まず, 日中交替のグローバルなパターンを除去したとしても, 個々の物品の消費習慣が日中変化を強く維持していることを示す。そこで,本研究では,夜間に好まれる記事と就労時間に好まれる記事とを特に区別し,消費パターンの原型的形状を特徴付ける。最後に、ウィキペディアの記事のアクセスリズムの話題的・文脈的相関について検討し、記事の話題、読者国、アクセスデバイス(モバイル対デスクトップ)が日々の注意パターンの重要な予測因子であることを示す。これらの発見は、人間がウェブ上で情報を求める方法に新たな光を当て、ウィキペディアを知識と学習のための最大のオープンプラットフォームの一つとして焦点を合わせ、ウィキペディアが情報のニーズを満たすリッチな知識基盤としての役割を一日を通じて強調し、世界中の情報を探究する情報を理解し、適切な情報システムの設計に意味があることを強調した。

Wikipedia, in its role as the world's largest encyclopedia, serves a broad range of information needs. Although previous studies have noted that Wikipedia users' information needs vary throughout the day, there is to date no large-scale, quantitative study of the underlying dynamics. The present paper fills this gap by investigating temporal regularities in daily consumption patterns in a large-scale analysis of billions of timezone-corrected page requests mined from English Wikipedia's server logs, with the goal of investigating how context and time relate to the kind of information consumed. First, we show that even after removing the global pattern of day-night alternation, the consumption habits of individual articles maintain strong diurnal regularities. Then, we characterize the prototypical shapes of consumption patterns, finding a particularly strong distinction between articles preferred during the evening/night and articles preferred during working hours. Finally, we investigate topical and contextual correlates of Wikipedia articles' access rhythms, finding that article topic, reader country, and access device (mobile vs. desktop) are all important predictors of daily attention patterns. These findings shed new light on how humans seek information on the Web by focusing on Wikipedia as one of the largest open platforms for knowledge and learning, emphasizing Wikipedia's role as a rich knowledge base that fulfills information needs spread throughout the day, with implications for understanding information seeking across the globe and for designing appropriate information systems.

翻訳日:2024-04-24 01:22:08 公開日:2024-04-20

# ニューラルネットワーク伝搬デコーダの一般化境界

Generalization Bounds for Neural Belief Propagation Decoders ( http://arxiv.org/abs/2305.10540v2 )

ライセンス: Link先を確認

Sudarshan Adiga, Xin Xiao, Ravi Tandon, Bane Vasic, Tamal Bose,

(参考訳) 機械学習ベースのアプローチは、次世代通信システムのためのデコーダの設計にますます使われている。広く使われているフレームワークの1つは、信念伝播(NBP)であり、このフレームワークは、信念伝播(BP)イテレーションをディープニューラルネットワークに展開し、パラメータはデータ駆動方式で訓練される。 NBPデコーダは古典的復号アルゴリズムを改善することが示されている。本稿では, NBPデコーダの一般化機能について検討する。具体的には、デコーダの一般化ギャップは、経験的ビットエラーレートと期待ビットエラーレートの差である。このギャップを埋めて、コードパラメータ(ブロック長、メッセージ長、変数/チェックノード次数)、復号化イテレーション、トレーニングデータセットサイズなど、デコーダの複雑さに依存することを示す新たな理論的結果を示す。通常のパリティチェック行列と不規則なパリティチェック行列の両方について結果が提示される。我々の知る限りでは、ニューラルネットワークに基づくデコーダの一般化性能に関する最初の理論的結果である。本稿では,トレーニングデータセットサイズに対する一般化ギャップの依存性を示す実験結果と,異なるコードに対する復号化の繰り返しを示す。

Machine learning based approaches are being increasingly used for designing decoders for next generation communication systems. One widely used framework is neural belief propagation (NBP), which unfolds the belief propagation (BP) iterations into a deep neural network and the parameters are trained in a data-driven manner. NBP decoders have been shown to improve upon classical decoding algorithms. In this paper, we investigate the generalization capabilities of NBP decoders. Specifically, the generalization gap of a decoder is the difference between empirical and expected bit-error-rate(s). We present new theoretical results which bound this gap and show the dependence on the decoder complexity, in terms of code parameters (blocklength, message length, variable/check node degrees), decoding iterations, and the training dataset size. Results are presented for both regular and irregular parity-check matrices. To the best of our knowledge, this is the first set of theoretical results on generalization performance of neural network based decoders. We present experimental results to show the dependence of generalization gap on the training dataset size, and decoding iterations for different codes.

翻訳日:2024-04-24 01:22:08 公開日:2024-04-20

# OER: 継続的なオフライン強化学習のためのオフライン体験リプレイ

OER: Offline Experience Replay for Continual Offline Reinforcement Learning ( http://arxiv.org/abs/2305.13804v2 )

ライセンス: Link先を確認

Sibo Gai, Donglin Wang, Li He,

(参考訳) エージェントには、事前にコンパイルされたオフラインデータセットのシーケンスを通じて、新たなスキルを継続的に学習する能力が望まれる。しかし、一連のオフラインタスクを連続的に学習することは、リソース制限されたシナリオ下での破滅的な忘れの問題につながる可能性が高い。本稿では、エージェントが一連のオフライン強化学習タスクを学習し、全ての連続タスクの環境を探索することなく、小さなリプレイバッファで全ての学習タスクの性能を追求する、新しい設定である連続オフライン強化学習(CORL)を定式化する。すべてのシーケンシャルなタスクについて一貫して学習するためには、エージェントは新しい知識を取得し、一方、古い知識をオフラインで保存する必要がある。この目的のために,我々は連続学習アルゴリズムを導入し,CORL問題の最も適切なアルゴリズムとして経験再生(ER)を実験的に発見した。しかし、CORLにERを導入すると、リプレイバッファにおける経験と学習ポリシーからの軌跡とのミスマッチという、新しい分散シフト問題が発生することが観察された。このような問題に対処するために、リプレイバッファを構築するための新しいモデルベースエクスペリエンスセレクション(MBES)方式を提案し、そこで遷移モデルを学習して状態分布を近似する。このモデルは、記憶のための学習モデルに最も近いオフラインデータからデータをフィルタリングすることで、リプレイバッファと学習モデルの間の分布バイアスをブリッジするために使用される。さらに,新しいタスクを学習する能力を高めるために,新しい二重行動クローニング(DBC)アーキテクチャを用いて経験再現手法を再構成し,Q-ラーニングプロセスにおける行動閉鎖の障害を回避する。一般に、アルゴリズムをオフライン体験再生(OER)と呼ぶ。広汎な実験により,OER法は広く使用されているムジョコ環境においてSOTAのベースラインを上回っていることが示された。

The capability of continuously learning new skills via a sequence of pre-collected offline datasets is desired for an agent. However, consecutively learning a sequence of offline tasks likely leads to the catastrophic forgetting issue under resource-limited scenarios. In this paper, we formulate a new setting, continual offline reinforcement learning (CORL), where an agent learns a sequence of offline reinforcement learning tasks and pursues good performance on all learned tasks with a small replay buffer without exploring any of the environments of all the sequential tasks. For consistently learning on all sequential tasks, an agent requires acquiring new knowledge and meanwhile preserving old knowledge in an offline manner. To this end, we introduced continual learning algorithms and experimentally found experience replay (ER) to be the most suitable algorithm for the CORL problem. However, we observe that introducing ER into CORL encounters a new distribution shift problem: the mismatch between the experiences in the replay buffer and trajectories from the learned policy. To address such an issue, we propose a new model-based experience selection (MBES) scheme to build the replay buffer, where a transition model is learned to approximate the state distribution. This model is used to bridge the distribution bias between the replay buffer and the learned model by filtering the data from offline data that most closely resembles the learned model for storage. Moreover, in order to enhance the ability on learning new tasks, we retrofit the experience replay method with a new dual behavior cloning (DBC) architecture to avoid the disturbance of behavior-cloning loss on the Q-learning process. In general, we call our algorithm offline experience replay (OER). Extensive experiments demonstrate that our OER method outperforms SOTA baselines in widely-used Mujoco environments.

翻訳日:2024-04-24 01:22:08 公開日:2024-04-20

# leftover Lunch: 言語モデルのためのアドバンテージベースのオフライン強化学習

Leftover Lunch: Advantage-based Offline Reinforcement Learning for Language Models ( http://arxiv.org/abs/2305.14718v5 )

ライセンス: Link先を確認

Ashutosh Baheti, Ximing Lu, Faeze Brahman, Ronan Le Bras, Maarten Sap, Mark Riedl,

(参考訳) RLHF(Reinforcement Learning with Human Feedback)は、言語モデル(LM)アライメントの最も顕著な手法である。しかし、RLHFは不安定でデータハングリーなプロセスであり、微調整のために新しい高品質なLM生成データを必要とする。本稿では,既存のデータに対するRLトレーニングを可能にするオフラインポリシー勾配アルゴリズムであるAdvantage-Leftover Lunch RL (A-LoL)を紹介する。 LM出力シーケンス全体を単一のアクションとして仮定することで、A-LoLはシーケンスレベルの分類器や人間設計のスコアリング機能を報酬として組み込むことができる。その後、LMの値の推定値を使用することで、A-LoLは正の優位性(左上)のデータポイントのみを訓練し、ノイズに耐性を持たせる。全体として、A-LoLは実装が容易で、サンプル効率が高く、安定したLMトレーニングレシピである。 A-LoLとその変種の有効性を4つの異なる言語生成タスクで示す。オンラインRL(PPO)と最近のRL(DPO, PRO)とオフラインRL(GOLD)を比較した。一般的に使用されているRLHFベンチマークであるHelpful and Harmless Assistant (HHA)では、A-LoLメソッドで訓練されたLMは、人間によるベースラインよりも安全で役に立つと評価されている。さらに、残りの3つのタスクでは、A-LoLはノイズや準最適トレーニングデータを使用しても、複数の異なる報酬関数を最適化することができた。実験コードもリリースしています。 https://github.com/abaheti95/LoL-RL

Reinforcement Learning with Human Feedback (RLHF) is the most prominent method for Language Model (LM) alignment. However, RLHF is an unstable and data-hungry process that continually requires new high-quality LM-generated data for finetuning. We introduce Advantage-Leftover Lunch RL (A-LoL), a new class of offline policy gradient algorithms that enable RL training on any pre-existing data. By assuming the entire LM output sequence as a single action, A-LoL allows incorporating sequence-level classifiers or human-designed scoring functions as rewards. Subsequently, by using LM's value estimate, A-LoL only trains on positive advantage (leftover) data points, making it resilient to noise. Overall, A-LoL is an easy-to-implement, sample-efficient, and stable LM training recipe. We demonstrate the effectiveness of A-LoL and its variants with a set of four different language generation tasks. We compare against both online RL (PPO) and recent preference-based (DPO, PRO) and reward-based (GOLD) offline RL baselines. On the commonly-used RLHF benchmark, Helpful and Harmless Assistant (HHA), LMs trained with A-LoL methods achieve the highest diversity while also being rated more safe and helpful than the baselines according to humans. Additionally, in the remaining three tasks, A-LoL could optimize multiple distinct reward functions even when using noisy or suboptimal training data. We also release our experimental code. https://github.com/abaheti95/LoL-RL

翻訳日:2024-04-24 01:22:08 公開日:2024-04-20

# NLPモデルのドメインシフトに対するロバスト性の測定

Measuring the Robustness of NLP Models to Domain Shifts ( http://arxiv.org/abs/2306.00168v5 )

ライセンス: Link先を確認

Nitay Calderon, Naveh Porat, Eyal Ben-David, Alexander Chapanin, Zorik Gekhman, Nadav Oved, Vitaly Shalumov, Roi Reichart,

(参考訳) ドメインロバストネス(DR)に関する既存の研究は、異なる設定、限られたタスクの多様性、コンテキスト内学習のような最近の能力に関する研究が不足している。さらに、DR測定の一般的な実践は、完全には正確ではないかもしれない。現在の研究は、チャレンジセットに焦点を当て、ソースドロップ(SD: Source Drop)のみに依存している。しかし、ドメイン内パフォーマンスの劣化を測定するターゲットドロップ(TD)は相補的な視点として使うべきであると論じる。これらの問題に対処するため、まず7つの異なるNLPタスクからなるDRベンチマークを算出し、SDとTDの両方を計測した。そこで我々は,21種類の微調整モデルと少ショットLLMを14,000以上のドメインシフトを含む大規模DR研究を行った。両方のモデルタイプがドメインシフト時にドロップに悩まされることがわかりました。微調整のモデルはドメイン内では優れているが、少数ショットのLLMはドメインを超越し、ロバスト性が向上する。さらに、真のDRチャレンジよりも難しいドメインにシフトすることで、大きなSDをしばしば説明できることがわかり、これは相補的なメトリックとしてのTDの重要性を強調している。我々の研究は、NLPモデルの現在のDR状態に光を当て、より堅牢なモデルに対する評価プラクティスの改善を促進することを願っている。

Existing research on Domain Robustness (DR) suffers from disparate setups, limited task variety, and scarce research on recent capabilities such as in-context learning. Furthermore, the common practice of measuring DR might not be fully accurate. Current research focuses on challenge sets and relies solely on the Source Drop (SD): Using the source in-domain performance as a reference point for degradation. However, we argue that the Target Drop (TD), which measures degradation from the target in-domain performance, should be used as a complementary point of view. To address these issues, we first curated a DR benchmark comprised of 7 diverse NLP tasks, which enabled us to measure both the SD and the TD. We then conducted a comprehensive large-scale DR study involving over 14,000 domain shifts across 21 fine-tuned models and few-shot LLMs. We found that both model types suffer from drops upon domain shifts. While fine-tuned models excel in-domain, few-shot LLMs often surpass them cross-domain, showing better robustness. In addition, we found that a large SD can often be explained by shifting to a harder domain rather than by a genuine DR challenge, and this highlights the importance of TD as a complementary metric. We hope our study will shed light on the current DR state of NLP models and promote improved evaluation practices toward more robust models.

翻訳日:2024-04-24 01:12:24 公開日:2024-04-20

# 合成データを用いたレアカメラビューにおける2次元人物位置推定の改善

Improving 2D Human Pose Estimation in Rare Camera Views with Synthetic Data ( http://arxiv.org/abs/2307.06737v2 )

ライセンス: Link先を確認

Miroslav Purkrabek, Jiri Matas,

(参考訳) 人間のポーズ推定のための方法とデータセットは、主にサイドビューとフロントビューのシナリオに焦点を当てている。合成データを活用することで限界を克服し、ポーズとビューを包括的に制御したSMPLベースの合成人間を生成するRePoGen(RarE POses GENerator)を導入する。トップビューデータセットの実験と、さまざまなポーズを持つ実画像の新しいデータセットにより、COCOデータセットにRePoGenデータを追加することは、一般的なビューのパフォーマンスを損なうことなく、トップビューとボトムビューのポーズ推定に対する以前のアプローチより優れていることが示されている。アブレーション研究は、解剖学的妥当性、特に先行研究は、効果的なパフォーマンスの前提条件ではないことを示している。導入されたデータセットと対応するコードはhttps://mirapurkrabek.github.io/RePoGen-paper/ で公開されている。

Methods and datasets for human pose estimation focus predominantly on side- and front-view scenarios. We overcome the limitation by leveraging synthetic data and introduce RePoGen (RarE POses GENerator), an SMPL-based method for generating synthetic humans with comprehensive control over pose and view. Experiments on top-view datasets and a new dataset of real images with diverse poses show that adding the RePoGen data to the COCO dataset outperforms previous approaches to top- and bottom-view pose estimation without harming performance on common views. An ablation study shows that anatomical plausibility, a property prior research focused on, is not a prerequisite for effective performance. The introduced dataset and the corresponding code are available on https://mirapurkrabek.github.io/RePoGen-paper/ .

翻訳日:2024-04-24 01:02:16 公開日:2024-04-20

# Prot2Text:GNNとトランスフォーマーを用いたマルチモーダルタンパク質の機能生成

Prot2Text: Multimodal Protein's Function Generation with GNNs and Transformers ( http://arxiv.org/abs/2307.14367v3 )

ライセンス: Link先を確認

Hadi Abdine, Michail Chatzianastasis, Costas Bouyioukos, Michalis Vazirgiannis,

(参考訳) 近年,様々な機械学習手法が開発され,タンパク質機能予測の分野で大きな進歩を遂げている。しかし、既存のほとんどの方法はタスクを多分類問題、すなわち事前に定義されたラベルをタンパク質に割り当てるものとして定式化している。本研究では,タンパク質の機能を自由テキスト形式で予測する新しいアプローチであるProt2Textを提案する。エンコーダ・デコーダフレームワークでグラフニューラルネットワーク(GNN)とLarge Language Models(LLM)を組み合わせることで,タンパク質配列や構造,テキストアノテーションや記述など,さまざまなデータタイプを効果的に統合する。このマルチモーダルアプローチはタンパク質の機能の全体的表現を可能にし、詳細で正確な機能記述の生成を可能にする。本モデルを評価するため,SwissProtからマルチモーダルタンパク質データセットを抽出し,Prot2Textの有効性を実証的に実証した。これらの結果は、マルチモーダルモデル、特にGNNとLLMの融合による変換的影響を強調し、研究者に、既存のタンパク質だけでなく、より正確な機能予測のための強力なツールを提供する。

In recent years, significant progress has been made in the field of protein function prediction with the development of various machine-learning approaches. However, most existing methods formulate the task as a multi-classification problem, i.e. assigning predefined labels to proteins. In this work, we propose a novel approach, Prot2Text, which predicts a protein's function in a free text style, moving beyond the conventional binary or categorical classifications. By combining Graph Neural Networks(GNNs) and Large Language Models(LLMs), in an encoder-decoder framework, our model effectively integrates diverse data types including protein sequence, structure, and textual annotation and description. This multimodal approach allows for a holistic representation of proteins' functions, enabling the generation of detailed and accurate functional descriptions. To evaluate our model, we extracted a multimodal protein dataset from SwissProt, and demonstrate empirically the effectiveness of Prot2Text. These results highlight the transformative impact of multimodal models, specifically the fusion of GNNs and LLMs, empowering researchers with powerful tools for more accurate function prediction of existing as well as first-to-see proteins.

翻訳日:2024-04-24 01:02:16 公開日:2024-04-20

# 言語モデルを用いた患者と臨床の整合性の検討

Matching Patients to Clinical Trials with Large Language Models ( http://arxiv.org/abs/2307.15051v3 )

ライセンス: Link先を確認

Qiao Jin, Zifeng Wang, Charalampos S. Floudas, Fangyuan Chen, Changlin Gong, Dara Bracken-Clarke, Elisabetta Xue, Yifan Yang, Jimeng Sun, Zhiyong Lu,

(参考訳) 臨床試験は、しばしば患者募集の課題によって妨げられる。本稿では,患者間マッチングを支援するLLMフレームワークであるTrialGPTを紹介する。患者注記が与えられた場合、TrialGPTは、患者が基準ごとの基準に基づいて適性を予測するとともに、これらの予測を統合して、対象の臨床試験に対する適性を評価する。公用コホート184例を対象に,TrialGPTの試験レベル予測性能について検討した。また,3名の医師に1,000名以上の患者基準ペアをラベル付けし,基準レベルの予測精度を評価した。実験の結果、TrialGPTは専門家のパフォーマンス(88.7%-90.0%)に近く、忠実な説明で87.3%の基準レベルの精度を達成した。集計されたTrialGPTスコアは、ヒトの適性判断と高い相関があり、最高の競争モデルを32.6%から57.2%で上回り、臨床試験を除外している。さらに,本研究により,TrialGPTは実生活における臨床試験マッチング作業において,スクリーニング時間(42.6%)を大幅に短縮できることが明らかとなった。これらの結果と分析により,TrialGPTなどのLSMとの臨床治験の機会が得られた。

Clinical trials are often hindered by the challenge of patient recruitment. In this work, we introduce TrialGPT, a first-of-its-kind large language model (LLM) framework to assist patient-to-trial matching. Given a patient note, TrialGPT predicts the patient's eligibility on a criterion-by-criterion basis and then consolidates these predictions to assess the patient's eligibility for the target trial. We evaluate the trial-level prediction performance of TrialGPT on three publicly available cohorts of 184 patients with over 18,000 trial annotations. We also engaged three physicians to label over 1,000 patient-criterion pairs to assess its criterion-level prediction accuracy. Experimental results show that TrialGPT achieves a criterion-level accuracy of 87.3% with faithful explanations, close to the expert performance (88.7%-90.0%). The aggregated TrialGPT scores are highly correlated with human eligibility judgments, and they outperform the best-competing models by 32.6% to 57.2% in ranking and excluding clinical trials. Furthermore, our user study reveals that TrialGPT can significantly reduce the screening time (by 42.6%) in a real-life clinical trial matching task. These results and analyses have demonstrated promising opportunities for clinical trial matching with LLMs such as TrialGPT.

翻訳日:2024-04-24 01:02:16 公開日:2024-04-20

# 量子金融シミュレーションと量子状態生成のための新しいアプローチ

A novel approach for quantum financial simulation and quantum state preparation ( http://arxiv.org/abs/2308.01844v2 )

ライセンス: Link先を確認

Yen-Jui Chang, Wei-Ting Wang, Hao-Yuan Chen, Shih-Wei Liao, Ching-Ray Chang,

(参考訳) 量子状態の準備は、量子コンピューティングと情報処理において不可欠である。特定の量子状態の正確かつ確実な準備能力は、様々な用途に不可欠である。量子コンピュータの有望な応用の1つは量子シミュレーションである。これは、我々がシミュレートしようとしているシステムを表す量子状態を作成する必要がある。本研究では,パラメータ化量子回路(PQC)と古典シミュレータの変分解法を用いた複雑な確率分布の学習とロードを目的とした,新しいシミュレーションアルゴリズムであるマルチスプリット-ステップ量子ウォーク(multi-SSQW)を提案する。マルチSSQWアルゴリズムは、分割ステップの量子ウォークの修正版であり、マルチエージェント決定プロセスを統合するように拡張され、金融市場をモデル化するのに適している。この研究は、確率分布シミュレーションと金融市場モデリングにおける有望な能力を実証するために、マルチSSQWアルゴリズムの理論的記述と実証的研究を提供する。量子計算の利点を生かして、マルチSSQWは複雑な財務分布とシナリオを高精度にモデル化し、財務分析と意思決定のための貴重な洞察とメカニズムを提供する。マルチSSQWの主な利点は、モデリングの柔軟性、安定した収束、即時計算である。これらの利点は、動的な金融市場での急速なモデリングと予測の可能性を強調している。

Quantum state preparation is vital in quantum computing and information processing. The ability to accurately and reliably prepare specific quantum states is essential for various applications. One of the promising applications of quantum computers is quantum simulation. This requires preparing a quantum state representing the system we are trying to simulate. This research introduces a novel simulation algorithm, the multi-Split-Steps Quantum Walk (multi-SSQW), designed to learn and load complicated probability distributions using parameterized quantum circuits (PQC) with a variational solver on classical simulators. The multi-SSQW algorithm is a modified version of the split-steps quantum walk, enhanced to incorporate a multi-agent decision-making process, rendering it suitable for modeling financial markets. The study provides theoretical descriptions and empirical investigations of the multi-SSQW algorithm to demonstrate its promising capabilities in probability distribution simulation and financial market modeling. Harnessing the advantages of quantum computation, the multi-SSQW models complex financial distributions and scenarios with high accuracy, providing valuable insights and mechanisms for financial analysis and decision-making. The multi-SSQW's key benefits include its modeling flexibility, stable convergence, and instantaneous computation. These advantages underscore its rapid modeling and prediction potential in dynamic financial markets.

翻訳日:2024-04-24 01:02:16 公開日:2024-04-20

# 暗黙のマルチタスク強化学習問題に対するポリシー適応法

A Policy Adaptation Method for Implicit Multitask Reinforcement Learning Problems ( http://arxiv.org/abs/2308.16471v2 )

ライセンス: Link先を確認

Satoshi Yamamori, Jun Morimoto,

(参考訳) 接触や衝突を含む動的運動生成タスクでは、ポリシーパラメータの小さな変化は、非常に異なるリターンをもたらす。例えば、サッカーでは、打球の位置や力がわずかに変化したり、ボールの摩擦が変化した場合に、ボールは同様の方向の動きで完全に異なる方向に飛べる。しかし、異なる方向にボールを向くためには、全く異なるスキルが必要であると想像することは困難である。本研究では,異なる報酬関数や環境パラメータを持つ単一動作カテゴリにおいて,目標や環境の暗黙的な変化にポリシーを適用するためのマルチタスク強化学習アルゴリズムを提案する。単足ロボットモデルを用いて,ボール誘導作業における提案手法の評価を行った。その結果,提案手法はゴール位置の暗黙的な変化やボールの再生係数に適応できるが,標準領域のランダム化手法では異なるタスク設定に対処できないことがわかった。

In dynamic motion generation tasks, including contact and collisions, small changes in policy parameters can lead to extremely different returns. For example, in soccer, the ball can fly in completely different directions with a similar heading motion by slightly changing the hitting position or the force applied to the ball or when the friction of the ball varies. However, it is difficult to imagine that completely different skills are needed for heading a ball in different directions. In this study, we proposed a multitask reinforcement learning algorithm for adapting a policy to implicit changes in goals or environments in a single motion category with different reward functions or physical parameters of the environment. We evaluated the proposed method on the ball heading task using a monopod robot model. The results showed that the proposed method can adapt to implicit changes in the goal positions or the coefficients of restitution of the ball, whereas the standard domain randomization approach cannot cope with different task settings.

翻訳日:2024-04-24 00:52:28 公開日:2024-04-20

# $\rm SP^3$:PCAプロジェクションによる構造化プルーニングの強化

$\rm SP^3$: Enhancing Structured Pruning via PCA Projection ( http://arxiv.org/abs/2308.16475v2 )

ライセンス: Link先を確認

Yuxuan Hu, Jing Zhang, Zhe Zhao, Chen Zhao, Xiaodong Chen, Cuiping Li, Hong Chen,

(参考訳) 構造化プルーニング(Structured pruning)は、事前訓練された言語モデル(PLM)のサイズを減らす手法として広く使われているが、現在の手法は、モデルのサイズと効率に重要な次元であるPLMの隠れ次元(d)を圧縮する可能性を見落としていることが多い。本稿では,PCAプロジェクションを用いた構造化プルーニング手法(SP3)を提案し,マスク前に主成分によって定義された空間に特徴を投影することで,効果的にdを減少させる手法を提案する。ベンチマーク(GLUEとSQuAD)の大規模な実験は、SP3がdを70%削減し、BERTベースモデルの94%を圧縮し、96%以上の精度を維持し、同じ圧縮比でdを6%圧縮する他の方法よりも優れていることを示している。 SP3はOPTやLlamaなど他のモデルでも有効であることが証明されている。私たちのデータとコードは匿名のリポジトリで利用可能です。

Structured pruning is a widely used technique for reducing the size of pre-trained language models (PLMs), but current methods often overlook the potential of compressing the hidden dimension (d) in PLMs, a dimension critical to model size and efficiency. This paper introduces a novel structured pruning approach, Structured Pruning with PCA Projection (SP3), targeting the effective reduction of d by projecting features into a space defined by principal components before masking. Extensive experiments on benchmarks (GLUE and SQuAD) show that SP3 can reduce d by 70%, compress 94% of the BERTbase model, maintain over 96% accuracy, and outperform other methods that compress d by 6% in accuracy at the same compression ratio. SP3 has also proven effective with other models, including OPT and Llama. Our data and code are available at an anonymous repo.

翻訳日:2024-04-24 00:52:28 公開日:2024-04-20

# 非測定共著者をもつ一般化線形モデルに対する同時推論

Simultaneous inference for generalized linear models with unmeasured confounders ( http://arxiv.org/abs/2309.07261v3 )

ライセンス: Link先を確認

Jin-Hong Du, Larry Wasserman, Kathryn Roeder,

(参考訳) 数万の同時仮説テストがゲノム研究で定期的に行われ、異なる発現遺伝子を同定する。しかし、計測されていない共同設立者のために、多くの標準的な統計手法は実質的に偏っているかもしれない。本稿では,多変量一般化線形モデルに対する共起効果の存在下での大規模仮説検証問題について検討する。任意のコンバウンディング機構の下では,直交構造を利用し,線形射影を3つの重要な段階に統合する,統一的な統計的推定と推論の枠組みを提案する。これは、潜伏係数を回復するために、辺縁と非負の関係の共起効果を遠ざけることから始まる。その後、ラッソ型最適化により潜在因子と一次効果を共同で推定する。最後に、仮説テストのために投影および重み付けされたバイアス補正ステップを組み込む。理論的には、様々な効果と非漸近誤差境界の同定条件を確立する。サンプルおよび応答サイズが無限大に近づくと、漸近的な$z$-testsの効果的なType-Iエラー制御を示す。数値実験により, 提案手法はベンジャミン・ホックベルク法により偽発見率を制御し, 代替手法よりも強力であることが示された。 2つのサンプル群から得られた単細胞RNA-seq数を比較することにより、モデルから有意な共変量が欠如している場合に、共起効果を調節する適性を示す。

Tens of thousands of simultaneous hypothesis tests are routinely performed in genomic studies to identify differentially expressed genes. However, due to unmeasured confounders, many standard statistical approaches may be substantially biased. This paper investigates the large-scale hypothesis testing problem for multivariate generalized linear models in the presence of confounding effects. Under arbitrary confounding mechanisms, we propose a unified statistical estimation and inference framework that harnesses orthogonal structures and integrates linear projections into three key stages. It begins by disentangling marginal and uncorrelated confounding effects to recover the latent coefficients. Subsequently, latent factors and primary effects are jointly estimated through lasso-type optimization. Finally, we incorporate projected and weighted bias-correction steps for hypothesis testing. Theoretically, we establish the identification conditions of various effects and non-asymptotic error bounds. We show effective Type-I error control of asymptotic $z$-tests as sample and response sizes approach infinity. Numerical experiments demonstrate that the proposed method controls the false discovery rate by the Benjamini-Hochberg procedure and is more powerful than alternative methods. By comparing single-cell RNA-seq counts from two groups of samples, we demonstrate the suitability of adjusting confounding effects when significant covariates are absent from the model.

翻訳日:2024-04-24 00:52:28 公開日:2024-04-20

# C-Pack:中国の一般的な埋め込みを促進するためにパッケージ化されたリソース

C-Pack: Packaged Resources To Advance General Chinese Embedding ( http://arxiv.org/abs/2309.07597v3 )

ライセンス: Link先を確認

Shitao Xiao, Zheng Liu, Peitian Zhang, Niklas Muennighoff,

(参考訳) C-Packは、一般的な中国の埋め込みの分野を著しく前進させるリソースのパッケージである。 C-Packには3つの重要なリソースが含まれている。 1) C-MTEBは6つのタスクと35のデータセットをカバーする中国語テキスト埋め込みの総合ベンチマークである。 2) C-MTPは, ラベル付き, ラベルなしの中国語コーパスを用いて, 埋め込みモデルを訓練するための大量のテキスト埋め込みデータセットである。 3) C-TEMは、複数のサイズをカバーする埋め込みモデルのファミリーである。弊社のモデルは、C-MTEB上の以前の中国語のテキスト埋め込みを、リリース時に最大で10%上回っている。また、C-TEMのための一連のトレーニング方法を統合し、最適化します。一般的な中国語の埋め込みに関するリソースに加えて、英語のテキスト埋め込みのためのデータとモデルもリリースしています。 MTEBベンチマークでは、英語モデルは最先端のパフォーマンスを達成していますが、我々のリリースした英語データは、中国のデータより2倍も大きいのです。これらのリソースはすべてhttps://github.com/FlagOpen/FlagEmbedding.comで公開されています。

We introduce C-Pack, a package of resources that significantly advance the field of general Chinese embeddings. C-Pack includes three critical resources. 1) C-MTEB is a comprehensive benchmark for Chinese text embeddings covering 6 tasks and 35 datasets. 2) C-MTP is a massive text embedding dataset curated from labeled and unlabeled Chinese corpora for training embedding models. 3) C-TEM is a family of embedding models covering multiple sizes. Our models outperform all prior Chinese text embeddings on C-MTEB by up to +10% upon the time of the release. We also integrate and optimize the entire suite of training methods for C-TEM. Along with our resources on general Chinese embedding, we release our data and models for English text embeddings. The English models achieve state-of-the-art performance on MTEB benchmark; meanwhile, our released English data is 2 times larger than the Chinese data. All these resources are made publicly available at https://github.com/FlagOpen/FlagEmbedding.

翻訳日:2024-04-24 00:42:43 公開日:2024-04-20

# Lemur: プログラムの自動検証に大規模言語モデルを統合する

Lemur: Integrating Large Language Models in Automated Program Verification ( http://arxiv.org/abs/2310.04870v4 )

ライセンス: Link先を確認

Haoze Wu, Clark Barrett, Nina Narodytska,

(参考訳) LLMの実証されたコード理解能力は、検証ツールで難しいプログラムプロパティに関する高度な抽象的推論を必要とするタスクである自動プログラム検証に使用できるかどうかという問題を提起する。自動プログラム検証のためのLLMと自動推論器のパワーを組み合わせるための一般的な手法を提案する。我々は、この方法論をトランジションルールの集合として公式に記述し、その健全性を証明する。本稿では,音声自動検証手法として計算をインスタンス化し,一連の合成および競合ベンチマークの実践的改善を実証する。

The demonstrated code-understanding capability of LLMs raises the question of whether they can be used for automated program verification, a task that demands high-level abstract reasoning about program properties that is challenging for verification tools. We propose a general methodology to combine the power of LLMs and automated reasoners for automated program verification. We formally describe this methodology as a set of transition rules and prove its soundness. We instantiate the calculus as a sound automated verification procedure and demonstrate practical improvements on a set of synthetic and competition benchmarks.

翻訳日:2024-04-24 00:42:43 公開日:2024-04-20

# 不均一な自己監視学習による表現の強化

Enhancing Representations through Heterogeneous Self-Supervised Learning ( http://arxiv.org/abs/2310.05108v2 )

ライセンス: Link先を確認

Zhong-Yu Li, Bo-Wen Yin, Shanghua Gao, Yongxiang Liu, Li Liu, Ming-Ming Cheng,

(参考訳) 異なるアーキテクチャから異種表現を組み込むことは、様々なビジョンタスク、例えば、トランスフォーマーと畳み込みを組み合わせたハイブリッドネットワークを促進する。しかし、このような異種アーキテクチャ間の相補性は、自己教師付き学習では十分に活用されていない。そこで本研究では,HSSL(Heterogeneous Self-Supervised Learning)を提案する。このプロセスでは、HSSLは構造的変化を伴わずに表現学習方式でベースモデルに新しい特徴を付与する。 HSSLを包括的に理解するために,ベースモデルと補助ヘッドを含む多種多様な異種対の実験を行った。アーキテクチャの相違が大きくなるにつれて,ベースモデルの表現品質が向上することがわかった。本研究の動機は,特定のベースモデルの学習に最も適した補助頭部を迅速に決定する探索戦略と,モデルの差分を増大させる単純かつ効果的な方法を提案することである。 HSSLは、画像分類、セマンティックセグメンテーション、インスタンスのセグメンテーション、オブジェクト検出など、さまざまなダウンストリームタスクにおいて優れたパフォーマンスを達成する。私たちのソースコードは公開されます。

Incorporating heterogeneous representations from different architectures has facilitated various vision tasks, e.g., some hybrid networks combine transformers and convolutions. However, complementarity between such heterogeneous architectures has not been well exploited in self-supervised learning. Thus, we propose Heterogeneous Self-Supervised Learning (HSSL), which enforces a base model to learn from an auxiliary head whose architecture is heterogeneous from the base model. In this process, HSSL endows the base model with new characteristics in a representation learning way without structural changes. To comprehensively understand the HSSL, we conduct experiments on various heterogeneous pairs containing a base model and an auxiliary head. We discover that the representation quality of the base model moves up as their architecture discrepancy grows. This observation motivates us to propose a search strategy that quickly determines the most suitable auxiliary head for a specific base model to learn and several simple but effective methods to enlarge the model discrepancy. The HSSL is compatible with various self-supervised methods, achieving superior performances on various downstream tasks, including image classification, semantic segmentation, instance segmentation, and object detection. Our source code will be made publicly available.

翻訳日:2024-04-24 00:42:43 公開日:2024-04-20

# CoT3DRef:データ効率のよい3Dビジュアルグラウンド

CoT3DRef: Chain-of-Thoughts Data-Efficient 3D Visual Grounding ( http://arxiv.org/abs/2310.06214v3 )

ライセンス: Link先を確認

Eslam Mohamed Bakr, Mohamed Ayman, Mahmoud Ahmed, Habib Slim, Mohamed Elhoseiny,

(参考訳) 3Dビジュアルグラウンドティングは、発話によって条件付けられた3Dシーンでオブジェクトをローカライズする機能である。既存のほとんどのメソッドは参照ヘッドを使って参照オブジェクトを直接ローカライズし、複雑なシナリオで失敗する。さらに、ネットワークが最終決定に達する方法や理由も示していない。本稿では,人間の知覚システムを模倣する可能性を秘めた,解釈可能な3次元視覚基盤を設計できるのか? と。この目的のために、まずアンカーの連鎖を予測し、次に最終ターゲットを予測することにより、シークエンス・ツー・シーケンスのSeq2Seqタスクとして3次元視覚接地問題を定式化する。解釈可能性は全体的なパフォーマンスを改善するだけでなく、障害ケースの特定にも役立ちます。思考の連鎖に従えば、参照タスクを解釈可能な中間ステップに分解し、パフォーマンスを高め、フレームワークを極めてデータ効率のよいものにすることができます。さらに,提案するフレームワークは既存のアーキテクチャに容易に組み込むことができる。我々は,Nr3D,Sr3D,Scanreferベンチマークの総合的な実験を通じてアプローチを検証するとともに,手動のアノテートデータを必要としない既存手法と比較して一貫した性能向上を示す。さらに、提案するフレームワークであるCoT3DRefは、データ効率がかなり高いのに対して、Sr3Dデータセットでは、データの10%しかトレーニングしていない場合、データ全体に基づいてトレーニングされたSOTAのパフォーマンスと一致します。コードはhttps:eslambakr.github.io/cot3dref.github.io/で公開されている。

3D visual grounding is the ability to localize objects in 3D scenes conditioned by utterances. Most existing methods devote the referring head to localize the referred object directly, causing failure in complex scenarios. In addition, it does not illustrate how and why the network reaches the final decision. In this paper, we address this question Can we design an interpretable 3D visual grounding framework that has the potential to mimic the human perception system?. To this end, we formulate the 3D visual grounding problem as a sequence-to-sequence Seq2Seq task by first predicting a chain of anchors and then the final target. Interpretability not only improves the overall performance but also helps us identify failure cases. Following the chain of thoughts approach enables us to decompose the referring task into interpretable intermediate steps, boosting the performance and making our framework extremely data-efficient. Moreover, our proposed framework can be easily integrated into any existing architecture. We validate our approach through comprehensive experiments on the Nr3D, Sr3D, and Scanrefer benchmarks and show consistent performance gains compared to existing methods without requiring manually annotated data. Furthermore, our proposed framework, dubbed CoT3DRef, is significantly data-efficient, whereas on the Sr3D dataset, when trained only on 10% of the data, we match the SOTA performance that trained on the entire data. The code is available at https:eslambakr.github.io/cot3dref.github.io/.

翻訳日:2024-04-24 00:32:58 公開日:2024-04-20

# 文脈モデリングによる半監督された群集数:群集場面の全体的理解を促進する

Semi-Supervised Crowd Counting with Contextual Modeling: Facilitating Holistic Understanding of Crowd Scenes ( http://arxiv.org/abs/2310.10352v3 )

ライセンス: Link先を確認

Yifei Qian, Xiaopeng Hong, Zhongliang Guo, Ognjen Arandjelović, Carl R. Donovan,

(参考訳) そこで本研究では,信頼度の高い群集数モデルの訓練に要する重いアノテーション負担を軽減し,より多くのデータを活用することで,モデルをより実践的かつ正確にするため,教師の枠組みに基づいた新たな半教師方式を提案する。ラベル付きデータが不足している場合には、ローカルパッチに過度に適合する傾向にある。このような状況下では、ラベルなしデータによる局所パッチ予測の精度を単に改善するという従来のアプローチは不十分である。そこで本研究では,モデル固有の「従属化」能力の育成という,よりニュアンスなアプローチを提案する。この能力により、モデルは群衆シーンの理解を活用し、人間の認知過程を反映することで、地域の数を正確に見積もることができる。この目的を達成するために、ラベルのないデータにマスキングを適用し、全体的手がかりに基づいてこれらのマスキングされたパッチの予測をモデルに導く。さらに,特徴学習を支援するために,細粒度密度分類タスクを組み込んだ。本手法は, 厳密な構造制約や損失制約を伴わないため, 既存の群集カウント法に適用可能である。さらに、我々のフレームワークでトレーニングされたモデルが「補助的」な振る舞いを示すことを観察する。高密度領域を正確に予測し、局所的な詳細を組み込んで高密度領域を予測する。提案手法は,上海技術AやUCF-QNRFといった挑戦的なベンチマークにおいて,従来のアプローチをはるかに上回り,最先端の性能を実現する。コードは、https://github.com/cha15yq/MRC-Crowd.comで入手できる。

To alleviate the heavy annotation burden for training a reliable crowd counting model and thus make the model more practicable and accurate by being able to benefit from more data, this paper presents a new semi-supervised method based on the mean teacher framework. When there is a scarcity of labeled data available, the model is prone to overfit local patches. Within such contexts, the conventional approach of solely improving the accuracy of local patch predictions through unlabeled data proves inadequate. Consequently, we propose a more nuanced approach: fostering the model's intrinsic 'subitizing' capability. This ability allows the model to accurately estimate the count in regions by leveraging its understanding of the crowd scenes, mirroring the human cognitive process. To achieve this goal, we apply masking on unlabeled data, guiding the model to make predictions for these masked patches based on the holistic cues. Furthermore, to help with feature learning, herein we incorporate a fine-grained density classification task. Our method is general and applicable to most existing crowd counting methods as it doesn't have strict structural or loss constraints. In addition, we observe that the model trained with our framework exhibits a 'subitizing'-like behavior. It accurately predicts low-density regions with only a 'glance', while incorporating local details to predict high-density regions. Our method achieves the state-of-the-art performance, surpassing previous approaches by a large margin on challenging benchmarks such as ShanghaiTech A and UCF-QNRF. The code is available at: https://github.com/cha15yq/MRC-Crowd.

翻訳日:2024-04-24 00:32:57 公開日:2024-04-20

# Refining Latent Representations: Heterogeneous Graph LearningのためのジェネレーティブSSLアプローチ

Refining Latent Representations: A Generative SSL Approach for Heterogeneous Graph Learning ( http://arxiv.org/abs/2310.11102v4 )

ライセンス: Link先を確認

Yulan Hu, Zhirui Yang, Sheng Ouyang, Yong Liu,

(参考訳) 自己監視学習(SSL)は大きな可能性を示し、グラフ学習への関心が高まっている。しかし、生成的SSL法では、HGL(Heterogeneous Graph Learning)の可能性はいまだに未解明である。 Generative SSLは、エンコーダを使用して、入力グラフを潜在表現にマッピングし、デコーダを使用して潜在表現から入力グラフを復元する。従来のHGL SSLメソッドは一般的にグラフの不均一性を捕捉するための複雑な戦略を設計するが、これはしばしば非自明なビュー構築戦略に大きく依存している。しかし、生成SSLにおける潜伏表現の精細化は、グラフ学習結果を効果的に改善することができる。本研究では,HGL用に特別に設計された生成SSL方式であるHGVAEを提案する。 HGVAEは不均一性を捉える複雑な戦略を設計する代わりに、潜伏表現の精細化に重点を置いている。具体的には、HGVAEは、潜在表現に基づく対照的なタスクを革新的に開発する。負のサンプルの硬さを確保するために,変分推論(VI)を利用して高品質な負のサンプルを生成するプログレッシブ・ネガティブ・サンプル生成(PNSG)機構を開発した。 HGLに生成SSLを適用する先駆者として、HGVAEは潜在表現を洗練し、高品質な表現を学ぶようモデルに促す。様々な最先端(SOTA)ベースラインと比較して、HGVAEは印象的な結果をもたらし、その優位性を検証する。

Self-Supervised Learning (SSL) has shown significant potential and has garnered increasing interest in graph learning. However, particularly for generative SSL methods, its potential in Heterogeneous Graph Learning (HGL) remains relatively underexplored. Generative SSL utilizes an encoder to map the input graph into a latent representation and a decoder to recover the input graph from the latent representation. Previous HGL SSL methods generally design complex strategies to capture graph heterogeneity, which heavily rely on contrastive view construction strategies that are often non-trivial. Yet, refining the latent representation in generative SSL can effectively improve graph learning results. In this study, we propose HGVAE, a generative SSL method specially designed for HGL. Instead of focusing on designing complex strategies to capture heterogeneity, HGVAE centers on refining the latent representation. Specifically, HGVAE innovatively develops a contrastive task based on the latent representation. To ensure the hardness of negative samples, we develop a progressive negative sample generation (PNSG) mechanism that leverages the ability of Variational Inference (VI) to generate high-quality negative samples. As a pioneer in applying generative SSL for HGL, HGVAE refines the latent representation, thereby compelling the model to learn high-quality representations. Compared with various state-of-the-art (SOTA) baselines, HGVAE achieves impressive results, thus validating its superiority.

翻訳日:2024-04-24 00:32:57 公開日:2024-04-20

# GPT-4はチューリング試験に合格するのか?

Does GPT-4 pass the Turing test? ( http://arxiv.org/abs/2310.20216v2 )

ライセンス: Link先を確認

Cameron R. Jones, Benjamin K. Bergen,

(参考訳) GPT-4をオンラインチューリングテストで評価した。最も優れたGPT-4プロンプトは49.7%のゲームで通過し、ELIZA(22%)とGPT-3.5(20%)を上回ったが、ヒトが設定したベースラインに届かなかった(66%)。参加者の判断は主に言語的スタイル(35%)と社会情緒的特徴(27%)に基づいており、知性はチューリング試験に合格するには不十分であるという考えを支持した。 LLMとゲーム数に関する参加者の知識は、AI検出の精度と正の相関関係があり、学習と実践が詐欺を軽減できる戦略であることを示唆した。インテリジェンステストとしての既知の制限にもかかわらず、チューリングテストは自然主義的なコミュニケーションと騙しの評価として引き続き関係していると我々は主張する。人間としてマスクレーディングできるAIモデルは、広く社会的な結果をもたらす可能性があり、異なる戦略の有効性と人間の類似性を判断するための基準を分析します。

We evaluated GPT-4 in a public online Turing test. The best-performing GPT-4 prompt passed in 49.7% of games, outperforming ELIZA (22%) and GPT-3.5 (20%), but falling short of the baseline set by human participants (66%). Participants' decisions were based mainly on linguistic style (35%) and socioemotional traits (27%), supporting the idea that intelligence, narrowly conceived, is not sufficient to pass the Turing test. Participant knowledge about LLMs and number of games played positively correlated with accuracy in detecting AI, suggesting learning and practice as possible strategies to mitigate deception. Despite known limitations as a test of intelligence, we argue that the Turing test continues to be relevant as an assessment of naturalistic communication and deception. AI models with the ability to masquerade as humans could have widespread societal consequences, and we analyse the effectiveness of different strategies and criteria for judging humanlikeness.

翻訳日:2024-04-24 00:23:13 公開日:2024-04-20

# Carpe Diem:生涯言語モデルにおける世界知識の評価について

Carpe Diem: On the Evaluation of World Knowledge in Lifelong Language Models ( http://arxiv.org/abs/2311.08106v2 )

ライセンス: Link先を確認

Yujin Kim, Jaehong Yoon, Seonghyeon Ye, Sangmin Bae, Namgyu Ho, Sung Ju Hwang, Se-young Yun,

(参考訳) 常に変化する世界の知識のダイナミックな性質は、静的データに基づいて訓練された言語モデルに対する課題を提示している。人間の言語におけるこれらの時間依存力学のための言語モデルの能力を研究するために、進化するウィキペディアデータベース上でLMを訓練し評価するために設計された、時間的に進化する質問応答ベンチマークであるEvolvingQAを導入する。 EvolvingQAの構築は、大規模な言語モデルを使用してパイプラインで自動化されます。既存の継続的な学習ベースラインが、時代遅れの知識の更新と削除に悩まされていることを明らかにする。我々の分析では、モデルが小さな重み勾配のために知識の修正に失敗することを示唆している。さらに,言語モデルが特に数値情報や時間情報の変化を反映するのに苦慮していることも明らかにした。本研究の目的は,実世界の情報の動的性質をモデル化することであり,言語モデルの進化適応性を忠実に評価することである。

The dynamic nature of knowledge in an ever-changing world presents challenges for language models trained on static data; the model in the real world often requires not only acquiring new knowledge but also overwriting outdated information into updated ones. To study the ability of language models for these time-dependent dynamics in human language, we introduce a novel task, EvolvingQA, a temporally evolving question-answering benchmark designed for training and evaluating LMs on an evolving Wikipedia database. The construction of EvolvingQA is automated with our pipeline using large language models. We uncover that existing continual learning baselines suffer from updating and removing outdated knowledge. Our analysis suggests that models fail to rectify knowledge due to small weight gradients. In addition, we elucidate that language models particularly struggle to reflect the change of numerical or temporal information. Our work aims to model the dynamic nature of real-world information, suggesting faithful evaluations of the evolution-adaptability of language models.

翻訳日:2024-04-24 00:23:13 公開日:2024-04-20

# FREE:環境生態系のモデリングのための基礎的意味認識

FREE: The Foundational Semantic Recognition for Modeling Environmental Ecosystems ( http://arxiv.org/abs/2311.10255v2 )

ライセンス: Link先を確認

Shiyuan Luo, Juntong Ni, Shengyu Chen, Runlong Yu, Yiqun Xie, Licheng Liu, Zhenong Jin, Huaxiu Yao, Xiaowei Jia,

(参考訳) 環境生態系のモデリングは、我々の惑星の持続可能性にとって重要であるが、多くの物理変数間の相互作用によって引き起こされる複雑なプロセスのため、非常に困難である。多くの変数を大規模に測定することは困難であるため、既存の研究は観測可能な特徴と局所的に利用可能な測定値の組み合わせを、特定の研究領域と期間のモデルを構築するための入力として利用することが多い。これは、環境生態系のモデリングを進める上で、根本的な疑問を提起する。空間と時間の様々な環境データ間の複雑な関係をモデル化するための一般的なフレームワークを構築するには、どうすればよいのか? 本稿では、利用可能な環境データをテキスト空間にマッピングし、環境科学における従来の予測モデリングタスクを意味認識問題に変換する新しいフレームワークFREEを紹介する。提案したFREEフレームワークは、Large Language Models(LLM)の最近の進歩を活用して、元々の入力機能を自然言語記述で補う。これにより、データセマンティクスのキャプチャが容易になり、入力機能の不規則性を活用することができる。長期予測に使用する場合、FREEは将来予測を強化するために新たに収集した観測を組み込む柔軟性を持つ。 FREEの有効性は、2つの社会的に重要な実世界の応用の文脈で評価され、デラウェア川流域の河川水温を予測し、イリノイ州とアイオワ州で毎年トウモロコシの収量を予測する。複数のベースライン法よりも優れた予測性能の他に、FREEは物理モデルで生成されたシミュレーションデータに基づいて事前学習できるため、よりデータ効率と計算効率が良いことが示されている。

Modeling environmental ecosystems is critical for the sustainability of our planet, but is extremely challenging due to the complex underlying processes driven by interactions amongst a large number of physical variables. As many variables are difficult to measure at large scales, existing works often utilize a combination of observable features and locally available measurements or modeled values as input to build models for a specific study region and time period. This raises a fundamental question in advancing the modeling of environmental ecosystems: how to build a general framework for modeling the complex relationships amongst various environmental data over space and time? In this paper, we introduce a new framework, FREE, which maps available environmental data into a text space and then converts the traditional predictive modeling task in environmental science to the semantic recognition problem. The proposed FREE framework leverages recent advances in Large Language Models (LLMs) to supplement the original input features with natural language descriptions. This facilitates capturing the data semantics and also allows harnessing the irregularities of input features. When used for long-term prediction, FREE has the flexibility to incorporate newly collected observations to enhance future prediction. The efficacy of FREE is evaluated in the context of two societally important real-world applications, predicting stream water temperature in the Delaware River Basin and predicting annual corn yield in Illinois and Iowa. Beyond the superior predictive performance over multiple baseline methods, FREE is shown to be more data- and computation-efficient as it can be pre-trained on simulated data generated by physics-based models.

翻訳日:2024-04-24 00:23:13 公開日:2024-04-20

# $\mathrm{XOR}^{*}$と$\mathrm{FFL}$ゲームに対する最適かつほぼ最適な量子戦略

Optimal, and approximately optimal, quantum strategies for $\mathrm{XOR}^{*}$ and $\mathrm{FFL}$ games ( http://arxiv.org/abs/2311.12887v2 )

ライセンス: Link先を確認

Pete Rigas,

(参考訳) 我々は、様々な非ローカルなXORゲームに対して最適で、ほぼ最適な量子戦略を解析する。 2016年のオストロフによる以前の議論に基づいて、プレイヤーが線形汎関数を最大化して非局所的なゲームに勝つための戦略を採用できると特徴付けたAliceとBobは、ある確率分布から引き出された質問に対する各答えを検証し、AliceとBobが量子エンタングルメント、二次元資源システム、可逆変換に依存する戦略を採用する場合の量子優位性を実現するために、より広い種類の量子戦略のパフォーマンスを解析するためのフレームワークのさらなる応用を特定できる。 Fortnow-Feige-Lovasz (FFL) ゲームでは、2016 のフレームワークは、(1) 適切な非ゼロの線形変換を構築し、(2) 作用素が単位フロベニウスノルムを持ち、(3) 誤差境界を構築し、対応する近似演算を$\big(A_k \otimes \textbf{I} \big) \ket{\psi}$, and $\big( \textbf{I} \otimes \big( \frac{\pm B_{kl} + B_{lk}}{\sqrt{2}} \big) \ket{\psi}$,(4) 演算子は、A_j=i$(5) の上限で適用された順序に置換された有界であることを示す。我々は,本フレームワークの他のゲームへの適用に読者の注意を惹きつける。

We analyze optimal, and approximately optimal, quantum strategies for a variety of non-local XOR games. Building upon previous arguments due to Ostrev in 2016, which characterized approximately optimal, and optimal, strategies that players Alice and Bob can adopt for maximizing a linear functional to win non-local games after a Referee party examines each answer to a question drawn from some probability distribution, we identify additional applications of the framework for analyzing the performance of a broader class of quantum strategies in which it is possible for Alice and Bob to realize quantum advantage if the two players adopt strategies relying upon quantum entanglement, two-dimensional resource systems, and reversible transformations. For the Fortnow-Feige-Lovasz (FFL) game, the 2016 framework is directly applicable, which consists of five steps, including: (1) constructing a suitable, nonzero, linear transformation for the intertwining operations, (2) demonstrating that the operator has unit Frobenius norm, (3) constructing error bounds, and corresponding approximate operations, for $\big( A_k \otimes \textbf{I} \big) \ket{\psi}$, and $\big( \textbf{I} \otimes \big( \frac{\pm B_{kl} + B_{lk}}{\sqrt{2}} \big) \big) \ket{\psi}$, (4) constructing additional bounds for permuting the order in which $A^{j_i}_i$ operators are applied, (5) obtaining Frobenius norm upper bounds for Alice and Bob's strategies. We draw the attention of the reader to applications of this framework in other games with less regular structure.

翻訳日:2024-04-24 00:23:13 公開日:2024-04-20

# TransNeXt: 視覚変換器のロバストな視覚知覚

TransNeXt: Robust Foveal Visual Perception for Vision Transformers ( http://arxiv.org/abs/2311.17132v3 )

ライセンス: Link先を確認

Dai Shi,

(参考訳) 残差接続における深度劣化効果のため、情報交換のために積み重ね層に依存する多くの効率的なビジョントランスフォーマーモデルでは、十分な情報混合が得られず、不自然な視覚知覚に繋がる。本稿では,生物の眼球運動と眼球運動をシミュレートするバイオミメティックデザインに基づくトークンミキサーAggregated Attentionを提案する。さらに、従来のクエリやキーと相互作用する学習可能なトークンを組み込み、クエリとキーの類似性に依存するだけでなく、アフィニティ行列の生成も多様化する。本手法では,情報交換の積み重ねに頼らず,奥行き劣化を効果的に回避し,自然な視覚知覚を実現する。さらに,GLUとSEのギャップを埋めるチャネルミキサーであるConvolutional GLUを提案する。集約された注意と畳み込みGLUを組み合わせて、TransNeXtと呼ばれる新しいビジュアルバックボーンを作成します。大規模な実験により、TransNeXtは複数のモデルサイズにまたがって最先端のパフォーマンスを実現することが実証された。 224^2$の解像度で、TransNeXt-Tinyはイメージネットの精度84.0%に達し、69%のパラメータでConvNeXt-Bを上回った。 TransNeXt-Base は ImageNet の精度86.2%、ImageNet-A の精度61.6%を384^2$、COCO オブジェクト検出 mAP 57.1、ADE20K セマンティックセグメンテーション mIoU 54.7 で達成している。

Due to the depth degradation effect in residual connections, many efficient Vision Transformers models that rely on stacking layers for information exchange often fail to form sufficient information mixing, leading to unnatural visual perception. To address this issue, in this paper, we propose Aggregated Attention, a biomimetic design-based token mixer that simulates biological foveal vision and continuous eye movement while enabling each token on the feature map to have a global perception. Furthermore, we incorporate learnable tokens that interact with conventional queries and keys, which further diversifies the generation of affinity matrices beyond merely relying on the similarity between queries and keys. Our approach does not rely on stacking for information exchange, thus effectively avoiding depth degradation and achieving natural visual perception. Additionally, we propose Convolutional GLU, a channel mixer that bridges the gap between GLU and SE mechanism, which empowers each token to have channel attention based on its nearest neighbor image features, enhancing local modeling capability and model robustness. We combine aggregated attention and convolutional GLU to create a new visual backbone called TransNeXt. Extensive experiments demonstrate that our TransNeXt achieves state-of-the-art performance across multiple model sizes. At a resolution of $224^2$, TransNeXt-Tiny attains an ImageNet accuracy of 84.0%, surpassing ConvNeXt-B with 69% fewer parameters. Our TransNeXt-Base achieves an ImageNet accuracy of 86.2% and an ImageNet-A accuracy of 61.6% at a resolution of $384^2$, a COCO object detection mAP of 57.1, and an ADE20K semantic segmentation mIoU of 54.7.

翻訳日:2024-04-24 00:13:26 公開日:2024-04-20

# 1000フレームの1Bパラメータによる終端動作検出

End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames ( http://arxiv.org/abs/2311.17241v2 )

ライセンス: Link先を確認

Shuming Liu, Chen-Lin Zhang, Chen Zhao, Bernard Ghanem,

(参考訳) 近年、時間的行動検出(TAD)は、エンドツーエンドのトレーニングで大幅に改善されている。しかし、メモリボトルネックのため、限られたスケールと限られたデータ量を持つモデルだけがエンドツーエンドのトレーニングを受けられるため、必然的にTADのパフォーマンスが制限される。本稿では,エンド・ツー・エンドのトレーニングにおけるメモリ消費を削減し,10億のパラメータと入力ビデオが1,536フレームにスケールアップし,大幅な検出性能を実現する。我々のアプローチの鍵は、トレーニングメモリを減らす新しい軽量モジュールである、時間的不変アダプタ(TIA)にある。 TIAを用いて,TADタスクに適応するために,TAAのパラメータのみを更新することで,背骨を学習から解放する。 TIAはまた、背骨全体に隣接するフレームから時間的にコンテキストを集約することで、TAD表現を改善する。 4つの代表的なデータセットにまたがってモデルを評価した。効率的な設計のため、VideoMAEv2-giantでエンドツーエンドをトレーニングし、THUMOS14で75.4%のmAPを達成できます。コードはhttps://github.com/sming256/AdaTADで入手できる。

Recently, temporal action detection (TAD) has seen significant performance improvement with end-to-end training. However, due to the memory bottleneck, only models with limited scales and limited data volumes can afford end-to-end training, which inevitably restricts TAD performance. In this paper, we reduce the memory consumption for end-to-end training, and manage to scale up the TAD backbone to 1 billion parameters and the input video to 1,536 frames, leading to significant detection performance. The key to our approach lies in our proposed temporal-informative adapter (TIA), which is a novel lightweight module that reduces training memory. Using TIA, we free the humongous backbone from learning to adapt to the TAD task by only updating the parameters in TIA. TIA also leads to better TAD representation by temporally aggregating context from adjacent frames throughout the backbone. We evaluate our model across four representative datasets. Owing to our efficient design, we are able to train end-to-end on VideoMAEv2-giant and achieve 75.4% mAP on THUMOS14, being the first end-to-end model to outperform the best feature-based methods. Code is available at https://github.com/sming256/AdaTAD.

翻訳日:2024-04-24 00:13:26 公開日:2024-04-20

# アクションスロット:交通場面におけるマルチラベル原子活動認識のための視覚行動中心表現

Action-slot: Visual Action-centric Representations for Multi-label Atomic Activity Recognition in Traffic Scenes ( http://arxiv.org/abs/2311.17948v2 )

ライセンス: Link先を確認

Chi-Hsi Kung, Shu-Wei Lu, Yi-Hsuan Tsai, Yi-Ting Chen,

(参考訳) 本稿では,マルチラベル原子活動認識について検討する。行動認識の顕著な進歩にもかかわらず、複数の道路利用者の動きと文脈情報の両方を包括的に理解できないため、原子活動を認識することは依然として困難である。本稿では、視覚行動中心の表現を学習し、動き情報と文脈情報の両方をキャプチャするスロットアテンションに基づくアプローチであるAction-Slotを紹介する。私たちのキーとなる考え方は、原子活動が起こる領域に注意を払うことができるアクションスロットを、明示的な知覚誘導を必要とせずに設計することです。スロットアテンションをさらに高めるために、アクションスロットと競合するバックグラウンドスロットを導入し、アクティビティを欠くバックグラウンド領域に不必要なフォーカスを避ける訓練プロセスを支援する。しかし、既存のデータセットにおける不均衡なクラス分布は、稀な活動の評価を妨げている。この制限に対処するため,OATSより4倍大きく,原子活性のバランスの取れた分布を特徴とするTACOという合成データセットを収集した。本手法の有効性を検証するため,様々な行動認識ベースラインに対する包括的実験およびアブレーション研究を行った。また,実世界のデータセット上でのマルチラベル原子活動認識の性能は,TACO上での事前学習により向上できることを示す。ソースコードとデータセットをリリースします。プロジェクトのページで視覚化のビデオをご覧ください。

In this paper, we study multi-label atomic activity recognition. Despite the notable progress in action recognition, it is still challenging to recognize atomic activities due to a deficiency in a holistic understanding of both multiple road users' motions and their contextual information. In this paper, we introduce Action-slot, a slot attention-based approach that learns visual action-centric representations, capturing both motion and contextual information. Our key idea is to design action slots that are capable of paying attention to regions where atomic activities occur, without the need for explicit perception guidance. To further enhance slot attention, we introduce a background slot that competes with action slots, aiding the training process in avoiding unnecessary focus on background regions devoid of activities. Yet, the imbalanced class distribution in the existing dataset hampers the assessment of rare activities. To address the limitation, we collect a synthetic dataset called TACO, which is four times larger than OATS and features a balanced distribution of atomic activities. To validate the effectiveness of our method, we conduct comprehensive experiments and ablation studies against various action recognition baselines. We also show that the performance of multi-label atomic activity recognition on real-world datasets can be improved by pretraining representations on TACO. We will release our source code and dataset. See the videos of visualization on the project page: https://hcis-lab.github.io/Action-slot/

翻訳日:2024-04-24 00:13:26 公開日:2024-04-20

# D$^2$ST-Adapter:Few-shot行動認識のための不整形と変形可能な時空間適応器

D$^2$ST-Adapter: Disentangled-and-Deformable Spatio-Temporal Adapter for Few-shot Action Recognition ( http://arxiv.org/abs/2312.01431v3 )

ライセンス: Link先を確認

Wenjie Pei, Qizhong Tan, Guangming Lu, Jiandong Tian,

(参考訳) 大規模な事前学習された画像モデルを数発のアクション認識に適応させることは、数発の学習に不可欠である頑健な特徴抽出器を学習する上で、効果的かつ効率的な戦略であることが証明されている。典型的な微調整ベースの適応パラダイムは、数ショットの学習シナリオで過度に適合する傾向があり、ビデオデータの時間的特徴を学習するためのモデリングの柔軟性がほとんどない。本研究では,D$^2$ST-Adapter (Disentangled-and-Deformable Spatio-Temporal Adapter, D$^2$ST-Adapter) を提案する。空間的特徴と時間的特徴を絡み合った方法で符号化するデュアルパスアーキテクチャで設計されている。特に,D$^2$ST-Adapterのコアコンポーネントとして異方性変形型時空間アテンションモジュールを考案し,空間的および時間的領域に沿って異方性サンプリング密度を調整し,対応する経路で特に空間的・時間的特徴を学習し,D$^2$ST-Adapterにより3次元時空間のグローバルな視野における特徴を符号化し,軽量な設計を維持した。プレトレーニングされたResNetとViTの両方における本手法のインスタンス化による広範囲な実験は、数発のアクション認識のための最先端の手法よりも、本手法が優れていることを示す。本手法は,時間的ダイナミクスが行動認識に不可欠である難易度シナリオに特に適している。

Adapting large pre-trained image models to few-shot action recognition has proven to be an effective and efficient strategy for learning robust feature extractors, which is essential for few-shot learning. Typical fine-tuning based adaptation paradigm is prone to overfitting in the few-shot learning scenarios and offers little modeling flexibility for learning temporal features in video data. In this work we present the Disentangled-and-Deformable Spatio-Temporal Adapter (D$^2$ST-Adapter), which is a novel adapter tuning framework well-suited for few-shot action recognition due to lightweight design and low parameter-learning overhead. It is designed in a dual-pathway architecture to encode spatial and temporal features in a disentangled manner. In particular, we devise the anisotropic Deformable Spatio-Temporal Attention module as the core component of D$^2$ST-Adapter, which can be tailored with anisotropic sampling densities along spatial and temporal domains to learn spatial and temporal features specifically in corresponding pathways, allowing our D$^2$ST-Adapter to encode features in a global view in 3D spatio-temporal space while maintaining a lightweight design. Extensive experiments with instantiations of our method on both pre-trained ResNet and ViT demonstrate the superiority of our method over state-of-the-art methods for few-shot action recognition. Our method is particularly well-suited to challenging scenarios where temporal dynamics are critical for action recognition.

翻訳日:2024-04-24 00:13:26 公開日:2024-04-20

# Masked Pre-TrainingとCollaborative Self-Trainingによる教師なしビデオドメイン適応

Unsupervised Video Domain Adaptation with Masked Pre-Training and Collaborative Self-Training ( http://arxiv.org/abs/2312.02914v4 )

ライセンス: Link先を確認

Arun Reddy, William Paul, Corban Rivera, Ketul Shah, Celso M. de Melo, Rama Chellappa,

(参考訳) 本研究では,ビデオ行動認識における教師なし領域適応(UDA)の問題に取り組む。我々のアプローチはUNITEと呼ばれ、画像教師モデルを用いてビデオ学生モデルを対象領域に適応させる。 UNITEは、教師が指導するマスク付き蒸留目標を用いて、まず自己指導型事前学習を用いて、ターゲットドメインビデオにおける差別的特徴学習を促進する。次に,ビデオ学生モデルとイメージ教師モデルを用いて,マスク付き対象データを用いた自己学習を行い,未ラベル対象ビデオのための改良された擬似ラベルを生成する。我々の自己学習プロセスは、ドメイン間の強い転送性能を達成するために、両方のモデルの強みをうまく活用する。我々は、複数のビデオ領域適応ベンチマークに対するアプローチを評価し、これまでに報告された結果に対する大幅な改善を観察する。

In this work, we tackle the problem of unsupervised domain adaptation (UDA) for video action recognition. Our approach, which we call UNITE, uses an image teacher model to adapt a video student model to the target domain. UNITE first employs self-supervised pre-training to promote discriminative feature learning on target domain videos using a teacher-guided masked distillation objective. We then perform self-training on masked target data, using the video student model and image teacher model together to generate improved pseudolabels for unlabeled target videos. Our self-training process successfully leverages the strengths of both models to achieve strong transfer performance across domains. We evaluate our approach on multiple video domain adaptation benchmarks and observe significant improvements upon previously reported results.

翻訳日:2024-04-24 00:13:26 公開日:2024-04-20

# StructComp: グラフコントラスト学習における構造圧縮による伝達の代替

StructComp: Substituting propagation with Structural Compression in Training Graph Contrastive Learning ( http://arxiv.org/abs/2312.04865v3 )

ライセンス: Link先を確認

Shengzhong Zhang, Wenjie Yang, Xinyuan Cao, Hongwei Zhang, Zengfeng Huang,

(参考訳) グラフコントラスト学習(GCL)は、グラフデータを学習するための強力なツールとなっているが、そのスケーラビリティは依然として大きな課題である。本研究では,この問題を解決するために,構造圧縮(StructComp)と呼ばれるシンプルで効果的なトレーニングフレームワークを提案する。拡散行列上の疎低ランク近似にインスパイアされたStructCompは、圧縮ノードでエンコーダを訓練する。これにより、エンコーダはトレーニング期間中にメッセージパッシングを行わず、対照的な損失でサンプルペアの数を大幅に削減できる。理論的には、元のGCL損失はStructCompによって計算された対照的な損失と近似できる。さらに、StructCompはGCLモデルのさらなる正規化用語と見なすことができ、より堅牢なエンコーダとなる。様々なデータセットに関する実証的研究により、StructCompは、バニラGCLモデルやスケーラブルなトレーニング手法と比較して、モデルパフォーマンスを改善しながら、時間とメモリ消費を大幅に削減することが示された。

Graph contrastive learning (GCL) has become a powerful tool for learning graph data, but its scalability remains a significant challenge. In this work, we propose a simple yet effective training framework called Structural Compression (StructComp) to address this issue. Inspired by a sparse low-rank approximation on the diffusion matrix, StructComp trains the encoder with the compressed nodes. This allows the encoder not to perform any message passing during the training stage, and significantly reduces the number of sample pairs in the contrastive loss. We theoretically prove that the original GCL loss can be approximated with the contrastive loss computed by StructComp. Moreover, StructComp can be regarded as an additional regularization term for GCL models, resulting in a more robust encoder. Empirical studies on various datasets show that StructComp greatly reduces the time and memory consumption while improving model performance compared to the vanilla GCL models and scalable training methods.

翻訳日:2024-04-24 00:13:26 公開日:2024-04-20

# ビジョンランゲージモデルによるFew-Shot物体検出の再検討

Revisiting Few-Shot Object Detection with Vision-Language Models ( http://arxiv.org/abs/2312.14494v2 )

ライセンス: Link先を確認

Anish Madan, Neehar Peri, Shu Kong, Deva Ramanan,

(参考訳) FSOD(Few-shot Object Detection)ベンチマークは、アノテーションを限定した新しいカテゴリを検出するための高度な技術を持っている。既存のベンチマークでは、COCOのような確立されたデータセットを、それぞれ、事前トレーニングと微調整のためのベースクラスと新しいクラスに分割することで再利用している。しかし、これらのベンチマークは、FSODが実際にどのようにデプロイされているかを反映していない。少数の基本カテゴリを事前学習するよりは、対象ドメインに対して基礎モデル(例えば、Webスケールデータに基づいて事前学習された視覚言語モデル(VLM))を微調整することがより現実的であると論じる。驚いたことに、GroundingDINOのようなVLMからのゼロショット推論はCOCO上の最先端(48.3対33.1 AP)よりも著しく優れている。しかし、そのようなゼロショットモデルは、それでも対象とする興味ある概念と一致しない。例えば、ウェブ上のトレーラーは、自動運転車の文脈におけるトレーラーとは異なるかもしれない。本研究では,任意の外部データセット上で事前学習し,ターゲットクラス毎のKショットを微調整した検出器を評価するための新しいベンチマークプロトコルであるFoundational FSODを提案する。さらに、現在のFSODベンチマークは、データのサブセット上の各カテゴリに対する徹底的なアノテーションを含む、実際にフェデレーションされたデータセットである点に留意する。我々はこの知見を利用して、フェデレートされた損失を伴う微調整VLMの簡単な戦略を提案する。我々は LVIS と nu Images に対するアプローチの有効性を実証し,5.9 AP による先行作業よりも改善した。私たちのコードはhttps://github.com/anishmadan23/foundational_fsodで利用可能です。

Few-shot object detection (FSOD) benchmarks have advanced techniques for detecting new categories with limited annotations. Existing benchmarks repurpose well-established datasets like COCO by partitioning categories into base and novel classes for pre-training and fine-tuning respectively. However, these benchmarks do not reflect how FSOD is deployed in practice. Rather than only pre-training on a small number of base categories, we argue that it is more practical to fine-tune a foundation model (e.g., a vision-language model (VLM) pre-trained on web-scale data) for a target domain. Surprisingly, we find that zero-shot inference from VLMs like GroundingDINO significantly outperforms the state-of-the-art (48.3 vs. 33.1 AP) on COCO. However, such zero-shot models can still be misaligned to target concepts of interest. For example, trailers on the web may be different from trailers in the context of autonomous vehicles. In this work, we propose Foundational FSOD, a new benchmark protocol that evaluates detectors pre-trained on any external datasets and fine-tuned on K-shots per target class. Further, we note that current FSOD benchmarks are actually federated datasets containing exhaustive annotations for each category on a subset of the data. We leverage this insight to propose simple strategies for fine-tuning VLMs with federated losses. We demonstrate the effectiveness of our approach on LVIS and nuImages, improving over prior work by 5.9 AP. Our code is available at https://github.com/anishmadan23/foundational_fsod

翻訳日:2024-04-24 00:03:25 公開日:2024-04-20

# 交換のないSGDの軌道について

On the Trajectories of SGD Without Replacement ( http://arxiv.org/abs/2312.16143v2 )

ライセンス: Link先を確認

Pierfrancesco Beneventano,

(参考訳) 本稿では,SGD(Stochastic Gradient Descent)の暗黙的正則化効果について検討する。我々は、大規模なニューラルネットワークを最適化するために一般的に使用される変種であるSGDを置き換えることなく検討する。例えば、学習率とヘッセンの積を$O(1)$とし、モデルアーキテクチャ、学習タスク、損失(客観的)関数を指定しない。我々の理論の核となる結果は、SGDを置き換えることなく最適化することは、新しい正則化への追加ステップと局所的に等価であるということである。これは、置換のないSGDの期待軌跡を分離できることを意味している。 (i)高い曲率の方向に沿ってSGDに代えて(バッチをサンプリングする。)、 (二)平坦なものに沿ったノイズ共分散の痕跡の正則化。その結果、置換のないSGDは平坦な領域を移動し、置換したSGDよりもかなり速くサドルを逃れることができた。いくつかの視覚的タスクにおいて、新しい正規化器はフィッシャーマトリックスの重み付けされた痕跡をペナライズし、それ故にヘッセンのスペクトルの空間が、以前の研究から経験的な観察に則っていることを奨励する。また、SGDが(GDとは対照的に)安定性の端で訓練されない理由についても説明する。

This article examines the implicit regularization effect of Stochastic Gradient Descent (SGD). We consider the case of SGD without replacement, the variant typically used to optimize large-scale neural networks. We analyze this algorithm in a more realistic regime than typically considered in theoretical works on SGD, as, e.g., we allow the product of the learning rate and Hessian to be $O(1)$ and we do not specify any model architecture, learning task, or loss (objective) function. Our core theoretical result is that optimizing with SGD without replacement is locally equivalent to making an additional step on a novel regularizer. This implies that the expected trajectories of SGD without replacement can be decoupled in (i) following SGD with replacement (in which batches are sampled i.i.d.) along the directions of high curvature, and (ii) regularizing the trace of the noise covariance along the flat ones. As a consequence, SGD without replacement travels flat areas and may escape saddles significantly faster than SGD with replacement. On several vision tasks, the novel regularizer penalizes a weighted trace of the Fisher Matrix, thus encouraging sparsity in the spectrum of the Hessian of the loss in line with empirical observations from prior work. We also propose an explanation for why SGD does not train at the edge of stability (as opposed to GD).

翻訳日:2024-04-24 00:03:25 公開日:2024-04-20

# 多様化によるOOD一般化の鍵となる要素の解明

Unraveling the Key Components of OOD Generalization via Diversification ( http://arxiv.org/abs/2312.16313v3 )

ライセンス: Link先を確認

Harold Benoit, Liangze Jiang, Andrei Atanov, Oğuzhan Fatih Kar, Mattia Rigotti, Amir Zamir,

(参考訳) 監視された学習データセットには、トレーニングセットが同じようにうまく説明される複数のキューが含まれている可能性がある。しかし、それらの多くは、すなわち分布シフトの下で予測力を失い、結果としてアウト・オブ・ディストリビューション(OOD)データへの一般化に失敗する。最近開発された「多様性」法(Lee et al , 2023; Pagliardini et al , 2023)は、異なる特徴に依存する複数の多様な仮説を見つけることによってこの問題にアプローチしている。本研究の目的は,OODの一般化能力に寄与する重要な要素を同定することである。 1) 分散化法は, 分散化に使用するラベルなしデータの分布に非常に敏感であり, メソッド固有の甘味点から離れた場合, 性能が著しく低下することを示す。 2)OODの一般化には多様化だけでは不十分である。使用済みの学習アルゴリズム、例えば、モデルのアーキテクチャと事前学習の選択は、非常に重要です。標準的な実験(WaterbirdsとOffice-Homeデータセットの分類)では、第2の選択肢を使用すると、絶対的な精度が最大20%低下する。 (3)学習アルゴリズムの最適選択はラベルのないデータに依存する。 (4) 最後に, 多様な仮説の数を増やすことで, 上記の落とし穴を軽減できないことを示す。これらの結果は,OODの一般化能力に影響を及ぼす設計要因の解明に寄与する。既存の手法を最大限に活用する方法を実践者たちに指導し、新しいより良い方法の開発を研究者に指導することができる。

Supervised learning datasets may contain multiple cues that explain the training set equally well, i.e., learning any of them would lead to the correct predictions on the training data. However, many of them can be spurious, i.e., lose their predictive power under a distribution shift and consequently fail to generalize to out-of-distribution (OOD) data. Recently developed "diversification" methods (Lee et al., 2023; Pagliardini et al., 2023) approach this problem by finding multiple diverse hypotheses that rely on different features. This paper aims to study this class of methods and identify the key components contributing to their OOD generalization abilities. We show that (1) diversification methods are highly sensitive to the distribution of the unlabeled data used for diversification and can underperform significantly when away from a method-specific sweet spot. (2) Diversification alone is insufficient for OOD generalization. The choice of the used learning algorithm, e.g., the model's architecture and pretraining, is crucial. In standard experiments (classification on Waterbirds and Office-Home datasets), using the second-best choice leads to an up to 20\% absolute drop in accuracy. (3) The optimal choice of learning algorithm depends on the unlabeled data and vice versa i.e. they are co-dependent. (4) Finally, we show that, in practice, the above pitfalls cannot be alleviated by increasing the number of diverse hypotheses, the major feature of diversification methods. These findings provide a clearer understanding of the critical design factors influencing the OOD generalization abilities of diversification methods. They can guide practitioners in how to use the existing methods best and guide researchers in developing new, better ones.

翻訳日:2024-04-24 00:03:25 公開日:2024-04-20

# 衝突機での散乱断面積によるベルの不等式測定は可能か?

Can Bell inequalities be tested via scattering cross-section at colliders ? ( http://arxiv.org/abs/2401.01162v2 )

ライセンス: Link先を確認

Song Li, Wei Shen, Jin Min Yang,

(参考訳) 衝突子におけるベルの不等式をテストするための最近の研究では、散乱断面積からのスピン相関の再構成はスピン相関の双線型形式に依存しており、全ての局所隠れ変数モデル(LHVM)がそのような性質を持つわけではない。一般LHVMが散乱断面積データによって排除できないことを示すために,粒子生成と崩壊の散乱断面積を標準量子理論と正確に同一に再現できる特定のLHVMを提案する。これにもかかわらず、散乱断面積によるスピン相関の再構成は、量子スピン相関の代用として古典的なスピン相関を用いたモデルにおいて、LHVMの幅広いクラスを除外することができる。

In current studies for testing Bell inequalities at colliders, the reconstruction of spin correlations from scattering cross-sections relies on the bilinear form of the spin correlations, and not all local hidden variable models (LHVMs) have such a property. To demonstrate that a general LHVM cannot be rule out via scattering cross-section data, we propose a specific LHVM, which can exactly duplicate the same scattering cross-section for particle production and decay as the standard quantum theory, making it indistinguishable at colliders in principle. Despite of this, we find that reconstructing spin correlations through scattering cross-sections can still rule out a broad class of LHVMs, e.g., those models employing classical spin correlations as a surrogate for quantum spin correlations.

翻訳日:2024-04-23 23:53:39 公開日:2024-04-20

# PVTをベースとしたエンコーディングと精細復号によるCT肝セグメンテーション

CT Liver Segmentation via PVT-based Encoding and Refined Decoding ( http://arxiv.org/abs/2401.09630v3 )

ライセンス: Link先を確認

Debesh Jha, Nikhil Kumar Tomar, Koushik Biswas, Gorkem Durak, Alpay Medetalibeyoglu, Matthew Antalek, Yury Velichko, Daniela Ladner, Amir Borhani, Ulas Bagci,

(参考訳) CTスキャンからの正確な肝分画は、効果的な診断と治療計画に不可欠である。コンピュータ支援診断システムは、肝疾患の診断、疾患の進行、治療計画の精度を向上させることを約束する。そこで本研究では,事前学習されたピラミッド・ビジョン・トランスフォーマ(PVT v2)と,高度な残差アップサンプリングとデコーダブロックを組み合わせた新しいディープラーニング手法である‘textit{\textbf{PVTFormer}}を提案する。改良された特徴チャネルアプローチを階層的デコーディング戦略に統合することにより、PVTFormerはセマンティック機能を強化して高品質なセグメンテーションマスクを生成する。肝腫瘍セグメンテーションベンチマーク(LiTS)2017において提案手法の厳密な評価により,提案手法は高ダイス係数86.78\%,mIoU78.46\%,低HD3.50が得られた。その結果,最新の肝セグメンテーション法に対する新しいベンチマークの設定においてPVTFormerの有効性を裏付ける結果を得た。提案されたPVTFormerのソースコードは、 \url{https://github.com/DebeshJha/PVTFormer} で入手できる。

Accurate liver segmentation from CT scans is essential for effective diagnosis and treatment planning. Computer-aided diagnosis systems promise to improve the precision of liver disease diagnosis, disease progression, and treatment planning. In response to the need, we propose a novel deep learning approach, \textit{\textbf{PVTFormer}}, that is built upon a pretrained pyramid vision transformer (PVT v2) combined with advanced residual upsampling and decoder block. By integrating a refined feature channel approach with a hierarchical decoding strategy, PVTFormer generates high quality segmentation masks by enhancing semantic features. Rigorous evaluation of the proposed method on Liver Tumor Segmentation Benchmark (LiTS) 2017 demonstrates that our proposed architecture not only achieves a high dice coefficient of 86.78\%, mIoU of 78.46\%, but also obtains a low HD of 3.50. The results underscore PVTFormer's efficacy in setting a new benchmark for state-of-the-art liver segmentation methods. The source code of the proposed PVTFormer is available at \url{https://github.com/DebeshJha/PVTFormer}.

翻訳日:2024-04-23 23:53:39 公開日:2024-04-20

# Reliance: 情報とニュースの信頼性評価のための信頼性のあるアンサンブル学習

RELIANCE: Reliable Ensemble Learning for Information and News Credibility Evaluation ( http://arxiv.org/abs/2401.10940v2 )

ライセンス: Link先を確認

Majid Ramezani, Hamed Mohammadshahi, Mahshid Daliry, Soroor Rahmani, Amir-Hosein Asghari,

(参考訳) 情報拡散の時代において、ニュースコンテンツの信頼性を識別することは、ますます増加する課題である。本稿では,堅牢な情報と偽ニュースの信頼性評価を目的とした,先駆的なアンサンブル学習システムであるRELIANCEを紹介する。 Support Vector Machine(SVM)、Naive Bayes(英語版)、ロジスティック回帰(英語版)、ランダムフォレスト(英語版)、Bidirectional Long Term Memory Networks(英語版) (BiLSTMs)を含む5つの多様なベースモデルで構成され、RELIANCEはその強度を統合する革新的なアプローチを採用し、アンサンブルの集合的知性を利用して精度を高めている。実験では、個々のモデルよりもRELIANCEの方が優れていることが示され、信頼できない情報ソースと信頼できない情報ソースを区別する効果が示された。 Relianceはまた、情報およびニュース信頼性評価のベースラインモデルを超え、情報ソースの信頼性を評価する効果的なソリューションとしての地位を確立している。

In the era of information proliferation, discerning the credibility of news content poses an ever-growing challenge. This paper introduces RELIANCE, a pioneering ensemble learning system designed for robust information and fake news credibility evaluation. Comprising five diverse base models, including Support Vector Machine (SVM), naive Bayes, logistic regression, random forest, and Bidirectional Long Short Term Memory Networks (BiLSTMs), RELIANCE employs an innovative approach to integrate their strengths, harnessing the collective intelligence of the ensemble for enhanced accuracy. Experiments demonstrate the superiority of RELIANCE over individual models, indicating its efficacy in distinguishing between credible and non-credible information sources. RELIANCE, also surpasses baseline models in information and news credibility assessment, establishing itself as an effective solution for evaluating the reliability of information sources.

翻訳日:2024-04-23 23:43:55 公開日:2024-04-20

# あなたのケトルはハッカーより賢い? 消費者向けIoTデバイスでリプレイ攻撃の脆弱性を評価するためのスケーラブルなツール

Is Your Kettle Smarter Than a Hacker? A Scalable Tool for Assessing Replay Attack Vulnerabilities on Consumer IoT Devices ( http://arxiv.org/abs/2401.12184v2 )

ライセンス: Link先を確認

Sara Lazzaro, Vincenzo De Angelis, Anna Maria Mandalari, Francesco Buccafurri,

(参考訳) コンシューマモノのインターネット(IoT)デバイスは、しばしばローカルネットワークを利用して対応するアプリや他のデバイスと通信する。これはクラウドをオフロードするため、効率の面でメリットがあります。 ENISAとNISTのセキュリティガイドラインは、安全と信頼性のためのデフォルトのローカル通信を可能にすることの重要性を強調している。実際、IoTデバイスは、クラウド接続が利用できない場合にも機能し続けなければならない。クラウドデバイス接続のセキュリティは通常、標準プロトコルの使用によって強化されるが、ローカル接続セキュリティはしばしば見過ごされる。ローカル通信のセキュリティの無視は、リプレイ攻撃を含む様々な脅威への扉を開く。本稿では,攻撃をリプレイするためのIoTデバイスの脆弱性を自動的にテストするための体系的手法を設計することによって,この種の攻撃について検討する。具体的には,REPLIOTというツールを用いて,ターゲット装置の事前知識を必要とせずに,リプレイ攻撃が成功したかどうかを判定する手法を提案する。私たちは、さまざまなベンダーやカテゴリにまたがる人気のある商用デバイスを使って、何千もの自動実験を行います。特に,これらのデバイスのうち51%はローカル接続をサポートしていないため,ENISA/NISTガイドラインの信頼性と安全性要件に準拠していない。残りの75%のデバイスは、検出精度0.98-1のREPLIOTによるリプレイ攻撃に対して脆弱であることがわかった。最後に、この脆弱性の原因について検討し、緩和戦略について議論する。

Consumer Internet of Things (IoT) devices often leverage the local network to communicate with the corresponding companion app or other devices. This has benefits in terms of efficiency since it offloads the cloud. ENISA and NIST security guidelines underscore the importance of enabling default local communication for safety and reliability. Indeed, an IoT device should continue to function in case the cloud connection is not available. While the security of cloud-device connections is typically strengthened through the usage of standard protocols, local connectivity security is frequently overlooked. Neglecting the security of local communication opens doors to various threats, including replay attacks. In this paper, we investigate this class of attacks by designing a systematic methodology for automatically testing IoT devices vulnerability to replay attacks. Specifically, we propose a tool, named REPLIOT, able to test whether a replay attack is successful or not, without prior knowledge of the target devices. We perform thousands of automated experiments using popular commercial devices spanning various vendors and categories. Notably, our study reveals that among these devices, 51% of them do not support local connectivity, thus they are not compliant with the reliability and safety requirements of the ENISA/NIST guidelines. We find that 75% of the remaining devices are vulnerable to replay attacks with REPLIOT having a detection accuracy of 0.98-1. Finally, we investigate the possible causes of this vulnerability, discussing possible mitigation strategies.

翻訳日:2024-04-23 23:43:55 公開日:2024-04-20

# Delocate: ランダムに位置決めされたトレーパー付きディープフェイクビデオの検出と位置決め

Delocate: Detection and Localization for Deepfake Videos with Randomly-Located Tampered Traces ( http://arxiv.org/abs/2401.13516v2 )

ライセンス: Link先を確認

Juan Hu, Xin Liao, Difei Gao, Satoshi Tsutsui, Qian Wang, Zheng Qin, Mike Zheng Shou,

(参考訳) ディープフェイクビデオはますます現実的になりつつあり、フレームごとに異なる顔の領域を微妙に改ざんしている。その結果、既存のDeepfake検出手法の多くは、未知のドメインのDeepfakeビデオを検出するのに苦労し、改ざんされた領域を正確に特定する。そこで本研究では,未知のドメインのDeepfakeビデオの認識とローカライズが可能なDelocateという,新しいDeepfake検出モデルを提案する。 OurmethodはRecovering and Localizationという2つのステージから構成される。回復段階において、モデルランダムは興味のある領域(ROI)を隠蔽し、痕跡を改ざんすることなく実際の顔を再構成する。ローカライゼーション段階において、リカバリフェーズの出力とフォージェリーグラウンドの真理マスクは、フォージェリーローカライゼーションプロセスの導出を補助する。このプロセスは、偽の顔の回復段階と回復不良を戦略的に強調し、改ざんされた領域の局所化を容易にする。広範に使用されている4つのベンチマークデータセットの大規模な実験により、乱れ領域のローカライズに限らず、クロスドメイン検出性能も向上することが示された。

Deepfake videos are becoming increasingly realistic, showing subtle tampering traces on facial areasthat vary between frames. Consequently, many existing Deepfake detection methods struggle to detect unknown domain Deepfake videos while accurately locating the tampered region. To address thislimitation, we propose Delocate, a novel Deepfake detection model that can both recognize andlocalize unknown domain Deepfake videos. Ourmethod consists of two stages named recoveringand localization. In the recovering stage, the modelrandomly masks regions of interest (ROIs) and reconstructs real faces without tampering traces, resulting in a relatively good recovery effect for realfaces and a poor recovery effect for fake faces. Inthe localization stage, the output of the recoveryphase and the forgery ground truth mask serve assupervision to guide the forgery localization process. This process strategically emphasizes the recovery phase of fake faces with poor recovery, facilitating the localization of tampered regions. Ourextensive experiments on four widely used benchmark datasets demonstrate that Delocate not onlyexcels in localizing tampered areas but also enhances cross-domain detection performance.

翻訳日:2024-04-23 23:43:55 公開日:2024-04-20

# Pixel to Elevation: 自動オフロードナビゲーションのための画像を用いた長距離標高マップの学習

Pixel to Elevation: Learning to Predict Elevation Maps at Long Range using Images for Autonomous Offroad Navigation ( http://arxiv.org/abs/2401.17484v3 )

ライセンス: Link先を確認

Chanyoung Chung, Georgios Georgakis, Patrick Spieler, Curtis Padgett, Ali Agha, Shehryar Khattak,

(参考訳) 長距離での地形トポロジーの理解は、特に高速での航行において、オフロードロボットミッションの成功に不可欠である。現在幾何学的マッピングに大きく依存しているLiDARセンサーは、より遠くのマッピングでスパース測定を行う。この課題に対処するために,車載エゴセントリック画像のみをリアルタイムに利用して,長距離の地形標高マップを予測可能な,新しい学習ベースアプローチを提案する。提案手法は3つの要素から構成される。まず, トランスフォーマーをベースとしたエンコーダを導入し, エゴセントリックな視線と, 以前の鳥眼の視線高度マップの予測との相互関係を学習する。第2に,多視点視覚画像特徴を有する複雑な非構造地形上での3次元車両の姿勢認識型位置符号化を提案する。最後に、下流のナビゲーション作業を容易にするために、標高マップ予測間の時間的整合性を改善するために、歴史を付加した学習可能なマップ埋め込みを提案する。実世界のオフロード駆動データを用いて,複雑・非構造地形における自律型オフロードロボットナビゲーションの適用性について実験的に検証した。さらに、この手法は現在の最先端手法と比較して質的かつ定量的に比較される。大規模フィールド実験により, 地形の高度を正確に予測し, 地形の全体像を長距離で効果的に把握し, ベースラインモデルを超えていることが示された。最後に,提案手法の重要成分の影響を強調・理解し,オフロードロボットナビゲーション能力を向上させるための適合性を検証するためにアブレーション研究を行った。

Understanding terrain topology at long-range is crucial for the success of off-road robotic missions, especially when navigating at high-speeds. LiDAR sensors, which are currently heavily relied upon for geometric mapping, provide sparse measurements when mapping at greater distances. To address this challenge, we present a novel learning-based approach capable of predicting terrain elevation maps at long-range using only onboard egocentric images in real-time. Our proposed method is comprised of three main elements. First, a transformer-based encoder is introduced that learns cross-view associations between the egocentric views and prior bird-eye-view elevation map predictions. Second, an orientation-aware positional encoding is proposed to incorporate the 3D vehicle pose information over complex unstructured terrain with multi-view visual image features. Lastly, a history-augmented learn-able map embedding is proposed to achieve better temporal consistency between elevation map predictions to facilitate the downstream navigational tasks. We experimentally validate the applicability of our proposed approach for autonomous offroad robotic navigation in complex and unstructured terrain using real-world offroad driving data. Furthermore, the method is qualitatively and quantitatively compared against the current state-of-the-art methods. Extensive field experiments demonstrate that our method surpasses baseline models in accurately predicting terrain elevation while effectively capturing the overall terrain topology at long-ranges. Finally, ablation studies are conducted to highlight and understand the effect of key components of the proposed approach and validate their suitability to improve offroad robotic navigation capabilities.

翻訳日:2024-04-23 23:43:55 公開日:2024-04-20

# 粗さ検出・分類のためのスマートウォッチマイクロフォンセンサの高調度化

Harnessing Smartwatch Microphone Sensors for Cough Detection and Classification ( http://arxiv.org/abs/2401.17738v2 )

ライセンス: Link先を確認

Pranay Jaiswal, Haroon R. Lone,

(参考訳) 本研究では,マイクロホンセンサを内蔵したスマートウォッチを用いたコークスのモニタリングと各種のコークス検出の可能性について検討した。参加者32名を対象に調査を行い,9時間分の音声データを制御的に収集した。その後, このデータを構造化した手法で処理し, その結果, 223個の正粘性試料が得られた。さらに,拡張手法によりデータセットを改良し,特殊な1次元CNNモデルを用いた。このモデルでは、非歩行時の98.49%、歩行中の98.2%の精度で、スマートウォッチが生地を検知できることを示している。さらに,本研究では,クラスタリング技術を用いて,4種類の生地の同定に成功した。

This study investigates the potential of using smartwatches with built-in microphone sensors for monitoring coughs and detecting various cough types. We conducted a study involving 32 participants and collected 9 hours of audio data in a controlled manner. Afterward, we processed this data using a structured approach, resulting in 223 positive cough samples. We further improved the dataset through augmentation techniques and employed a specialized 1D CNN model. This model achieved an impressive accuracy rate of 98.49% while non-walking and 98.2% while walking, showing smartwatches can detect cough. Moreover, our research successfully identified four distinct types of coughs using clustering techniques.

翻訳日:2024-04-23 23:43:55 公開日:2024-04-20

# AI生成画像検出に必要なのは1つのシンプルなパッチ

A Single Simple Patch is All You Need for AI-generated Image Detection ( http://arxiv.org/abs/2402.01123v2 )

ライセンス: Link先を確認

Jiaxuan Chen, Jieteng Yao, Li Niu,

(参考訳) 最近の生成モデルの発展は、超現実的な偽画像を生成する可能性を解き放つ。偽画像の悪用を防ぐため、AIが生成した画像検出は、偽画像と実際の画像とを区別することを目的としている。しかし、既存の手法では、未知のジェネレータが生成した画像を検出する際に、厳しい性能低下に悩まされている。生成モデルは、リッチなテクスチャでパッチを生成することに集中し、単純なパッチに存在するカメラキャプチャによる隠れノイズを無視しながら、画像をよりリアルにする傾向にある。本稿では,偽画像の識別に単一単純パッチのノイズパターンを利用する手法を提案する。さらに,低品質画像の処理における性能低下により,干渉情報を除去するエンハンスメントモジュールと知覚モジュールを導入する。大規模な実験により, 提案手法は, 公開ベンチマーク上での最先端性能を実現することができることを示した。

The recent development of generative models unleashes the potential of generating hyper-realistic fake images. To prevent the malicious usage of fake images, AI-generated image detection aims to distinguish fake images from real images. However, existing method suffer from severe performance drop when detecting images generated by unseen generators. We find that generative models tend to focus on generating the patches with rich textures to make the images more realistic while neglecting the hidden noise caused by camera capture present in simple patches. In this paper, we propose to exploit the noise pattern of a single simple patch to identify fake images. Furthermore, due to the performance decline when handling low-quality generated images, we introduce an enhancement module and a perception module to remove the interfering information. Extensive experiments demonstrate that our method can achieve state-of-the-art performance on public benchmarks.

翻訳日:2024-04-23 23:43:55 公開日:2024-04-20

# PiCO: 一貫性最適化に基づくLCMのピアレビュー

PiCO: Peer Review in LLMs based on the Consistency Optimization ( http://arxiv.org/abs/2402.01830v2 )

ライセンス: Link先を確認

Kun-Peng Ning, Shuo Yang, Yu-Yang Liu, Jia-Yu Yao, Zhen-Hui Liu, Yu Wang, Ming Pang, Li Yuan,

(参考訳) 既存の大規模言語モデル (LLMs) の評価手法は一般的に、人間アノテーションを使ったクローズド環境とドメイン固有のベンチマークでの性能をテストすることに重点を置いている。本稿では,LLMを自動計測するピアレビュー機構を利用して,教師なしの新たな評価方向を探索する。この設定では、オープンソースのLLMとクローズドソースのLLMは同じ環境にあり、ラベルのない質問に回答し、互いに評価することができる。これらのモデル間の能力階層を得るため、各LLMに学習可能な能力パラメータを割り当て、最終ランク付けを調整する。制約付き最適化問題として定式化し、各LLMの能力とスコアの一貫性を最大化することを目的としている。背景にある重要な前提は、高レベルのLSMは低レベルのLSMよりも他人の回答をより正確に評価でき、高レベルのLSMは高い応答スコアを達成できるということである。さらに,PEN,CIN,LISという3つの指標を用いて,ランク付けのギャップを評価する。これらのメトリクスを用いて複数のデータセットの実験を行い、提案手法の有効性を検証する。

Existing large language models (LLMs) evaluation methods typically focus on testing the performance on some closed-environment and domain-specific benchmarks with human annotations. In this paper, we explore a novel unsupervised evaluation direction, utilizing peer-review mechanisms to measure LLMs automatically. In this setting, both open-source and closed-source LLMs lie in the same environment, capable of answering unlabeled questions and evaluating each other, where each LLM's response score is jointly determined by other anonymous ones. To obtain the ability hierarchy among these models, we assign each LLM a learnable capability parameter to adjust the final ranking. We formalize it as a constrained optimization problem, intending to maximize the consistency of each LLM's capabilities and scores. The key assumption behind is that high-level LLM can evaluate others' answers more accurately than low-level ones, while higher-level LLM can also achieve higher response scores. Moreover, we propose three metrics called PEN, CIN, and LIS to evaluate the gap in aligning human rankings. We perform experiments on multiple datasets with these metrics, validating the effectiveness of the proposed approach.

翻訳日:2024-04-23 23:43:55 公開日:2024-04-20

# 対決の監査:エビデンスとスタイルによる高度な対論生成の評価

Auditing Counterfire: Evaluating Advanced Counterargument Generation with Evidence and Style ( http://arxiv.org/abs/2402.08498v4 )

ライセンス: Link先を確認

Preetika Verma, Kokil Jaidka, Svetlana Churina,

(参考訳) Reddit ChangeMyViewデータセットからの投稿に対するエビデンスベースでスタイリスティックな反論を作成する能力について、大規模な言語モデル(LLM)を監査しました。質的および定量的な指標のホスト間でそれらの修辞的品質をベンチマークし、最終的には人間の逆論と比較して説得力で評価した。 GPT-3.5 Turbo と Koala とそれらの微調整された変種と PaLM 2 はエビデンスの使用と議論スタイルの異なるプロンプトである。 GPT-3.5 Turboは、特に「相互性」スタイルの議論において、強いパラフレーズとスタイルの忠実さで、議論の質において最高位にランクされた。しかし、スタイリスティックな反論は人間の説得力基準に欠けており、人々は証拠に基づく反論に相反することを好んだ。この結果から, 明らか性と様式的要素のバランスが, 説得力のある反論に不可欠であることが示唆された。今後の研究の方向性とLCMのアウトプット評価の意義について論じる。

We audited large language models (LLMs) for their ability to create evidence-based and stylistic counter-arguments to posts from the Reddit ChangeMyView dataset. We benchmarked their rhetorical quality across a host of qualitative and quantitative metrics and then ultimately evaluated them on their persuasive abilities as compared to human counter-arguments. Our evaluation is based on Counterfire: a new dataset of 32,000 counter-arguments generated from large language models (LLMs): GPT-3.5 Turbo and Koala and their fine-tuned variants, and PaLM 2, with varying prompts for evidence use and argumentative style. GPT-3.5 Turbo ranked highest in argument quality with strong paraphrasing and style adherence, particularly in `reciprocity' style arguments. However, the stylistic counter-arguments still fall short of human persuasive standards, where people also preferred reciprocal to evidence-based rebuttals. The findings suggest that a balance between evidentiality and stylistic elements is vital to a compelling counter-argument. We close with a discussion of future research directions and implications for evaluating LLM outputs.

翻訳日:2024-04-23 23:34:03 公開日:2024-04-20

# 大規模言語モデルの推論を用いたパズル解法に関する調査

Puzzle Solving using Reasoning of Large Language Models: A Survey ( http://arxiv.org/abs/2402.11291v2 )

ライセンス: Link先を確認

Panagiotis Giadikiaroglou, Maria Lymperaiou, Giorgos Filandrianos, Giorgos Stamou,

(参考訳) パズル解決におけるLarge Language Models(LLM)の機能の探索は、AIの可能性と課題に関する重要な洞察を明らかにし、複雑な推論タスクにおけるそれらの適用性を理解するための重要なステップを示す。この調査では、パズルをルールベースとルールレスのカテゴリに分割するユニークな分類法を活用し、様々な方法論を通じてLSMを批判的に評価する。関連するデータセットとベンチマークの批判的レビューを通じて、LLMの性能を評価し、複雑なパズルシナリオにおける重要な課題を特定する。本研究は,高度な論理的推論を必要とする人において,LLM能力と人間ライクな推論の相違を浮き彫りにした。この調査は、LLMのパズル解決能力を高め、AIの論理的推論と創造的問題解決の進歩に貢献するために、新しい戦略とよりリッチなデータセットの必要性を強調している。

Exploring the capabilities of Large Language Models (LLMs) in puzzle solving unveils critical insights into their potential and challenges in AI, marking a significant step towards understanding their applicability in complex reasoning tasks. This survey leverages a unique taxonomy -- dividing puzzles into rule-based and rule-less categories -- to critically assess LLMs through various methodologies, including prompting techniques, neuro-symbolic approaches, and fine-tuning. Through a critical review of relevant datasets and benchmarks, we assess LLMs' performance, identifying significant challenges in complex puzzle scenarios. Our findings highlight the disparity between LLM capabilities and human-like reasoning, particularly in those requiring advanced logical inference. The survey underscores the necessity for novel strategies and richer datasets to advance LLMs' puzzle-solving proficiency and contribute to AI's logical reasoning and creative problem-solving advancements.

翻訳日:2024-04-23 23:34:03 公開日:2024-04-20

# MultiCorrupt: マルチモードロバストネスデータセットと3次元物体検出のためのLiDAR-Camera Fusionのベンチマーク

MultiCorrupt: A Multi-Modal Robustness Dataset and Benchmark of LiDAR-Camera Fusion for 3D Object Detection ( http://arxiv.org/abs/2402.11677v3 )

ライセンス: Link先を確認

Till Beemelmanns, Quan Zhang, Christian Geller, Lutz Eckstein,

(参考訳) 自動走行のためのマルチモーダル3Dオブジェクト検出モデルは、nuScenesのようなコンピュータビジョンベンチマークでは例外的な性能を示した。しかし、密集したLiDAR点雲や精密に校正されたセンサーアレイへの依存は、現実世界のアプリケーションに課題をもたらす。センサの不整合、誤校正、異なるサンプリング周波数などの問題は、LiDARとカメラのデータにおける空間的および時間的不整合につながる。加えて、LiDARとカメラデータの完全性は、インクリメント気象などの有害な環境条件によってしばしば損なわれ、閉塞やノイズ干渉を引き起こす。この課題に対処するため,MultiCorruptは,10種類の汚職に対してマルチモーダル3Dオブジェクト検出器の堅牢性を評価するために設計された総合的なベンチマークである。我々は,MultiCorrupt上で5つの最先端マルチモーダル検出器を評価し,その耐久性能の観点からその性能を解析した。以上の結果から, 既存手法は, 腐敗の種類や融合戦略によって, 各種の強靭性を示すことがわかった。マルチモーダルな設計選択が、そのようなモデルをある種の摂動に対して堅牢にするための洞察を提供する。データセット生成コードとベンチマークはhttps://github.com/ika-rwth-aachen/MultiCorruptで公開されている。

Multi-modal 3D object detection models for automated driving have demonstrated exceptional performance on computer vision benchmarks like nuScenes. However, their reliance on densely sampled LiDAR point clouds and meticulously calibrated sensor arrays poses challenges for real-world applications. Issues such as sensor misalignment, miscalibration, and disparate sampling frequencies lead to spatial and temporal misalignment in data from LiDAR and cameras. Additionally, the integrity of LiDAR and camera data is often compromised by adverse environmental conditions such as inclement weather, leading to occlusions and noise interference. To address this challenge, we introduce MultiCorrupt, a comprehensive benchmark designed to evaluate the robustness of multi-modal 3D object detectors against ten distinct types of corruptions. We evaluate five state-of-the-art multi-modal detectors on MultiCorrupt and analyze their performance in terms of their resistance ability. Our results show that existing methods exhibit varying degrees of robustness depending on the type of corruption and their fusion strategy. We provide insights into which multi-modal design choices make such models robust against certain perturbations. The dataset generation code and benchmark are open-sourced at https://github.com/ika-rwth-aachen/MultiCorrupt.

翻訳日:2024-04-23 23:34:03 公開日:2024-04-20

# シュレーディンガー猫量子状態を用いた所定の位相シフトの検出

Using Schroedinger cat quantum state for detection of a given phase shift ( http://arxiv.org/abs/2403.03787v2 )

ライセンス: Link先を確認

V. L. Gorshenin, F. Ya. Khalili,

(参考訳) Shroedinger cat の量子状態において準備された光パルスを2本腕干渉計の暗いポートと強い古典的な光を明るいポートに注入することで、原理上、所定の位相シフトを不明瞭に検出できることを示す。この位相シフトの値は古典キャリアとシュレーディンガーの猫状態の振幅に逆比例する。しかし、この目的にはエキゾチックな検出手順が必要である。出力されるダークポートの光子数を測定することで、「偽陽性」確率で位相シフトを検出することができる。この場合の「偽陰性」確率はシュレーディンガーの猫状態の振幅の増加に伴って減少し、この振幅の合理的な値は0.1程度小さくすることができる。

We show that injecting a light pulse prepared in the Shroedinger cat quantum state into the dark port of a two-arm interferometer and the strong classical light into the bright one, it is possible, in principle, to detect a given phase shift unambiguously. The value of this phase shift is inversely proportional to the amplitudes of both the classical carrier and Shroedinger cat state. However, an exotic detection procedure is required for this purpose. By measuring the number of photons at the output dark port, it is possible to detect the phase shift with the vanishing "false positive" probability. The "false negative" probability in this case decreases with the increase on the amplitude of the Schroedinger cat state and, for reasonable values of this amplitude, can be made as small as about 0.1.

翻訳日:2024-04-23 23:24:19 公開日:2024-04-20

# Debatrix: LLMに基づく反復時間解析による多次元議論判断

Debatrix: Multi-dimensional Debate Judge with Iterative Chronological Analysis Based on LLM ( http://arxiv.org/abs/2403.08010v2 )

ライセンス: Link先を確認

Jingcong Liang, Rong Ye, Meng Han, Ruofei Lai, Xinyu Zhang, Xuanjing Huang, Zhongyu Wei,

(参考訳) 広範囲で活気あるマルチターンの議論を評価するために、自動討論審査をどうやって構築できるのか? この課題は、長いテキスト、複雑な議論関係、多次元アセスメントなどで議論されるので、難しい。同時に、現在の研究は主に短い対話に焦点を当てており、議論全体を評価することはめったにない。本稿では,Large Language Models (LLMs) を利用して,マルチターン討論の分析と評価を行うDebatrixを提案する。具体的には、Debatrixは垂直かつ反復的な時系列分析と水平多次元評価コラボレーションを備えている。実世界の議論シナリオに合わせるため、私たちはPanelBenchベンチマークを導入し、システムの性能と実際の議論結果を比較した。以上の結果から,LSMを直接使用して議論評価を行うことによる顕著な改善が示唆された。ソースコードとベンチマークデータはhttps://github.com/ljcleo/debatrix.comで公開されている。

How can we construct an automated debate judge to evaluate an extensive, vibrant, multi-turn debate? This task is challenging, as judging a debate involves grappling with lengthy texts, intricate argument relationships, and multi-dimensional assessments. At the same time, current research mainly focuses on short dialogues, rarely touching upon the evaluation of an entire debate. In this paper, by leveraging Large Language Models (LLMs), we propose Debatrix, which makes the analysis and assessment of multi-turn debates more aligned with majority preferences. Specifically, Debatrix features a vertical, iterative chronological analysis and a horizontal, multi-dimensional evaluation collaboration. To align with real-world debate scenarios, we introduced the PanelBench benchmark, comparing our system's performance to actual debate outcomes. The findings indicate a notable enhancement over directly using LLMs for debate evaluation. Source code and benchmark data are available online at https://github.com/ljcleo/debatrix .

翻訳日:2024-04-23 23:14:33 公開日:2024-04-20

# 野生動物における感情認識のための複合マルチモーダルトランス

Joint Multimodal Transformer for Emotion Recognition in the Wild ( http://arxiv.org/abs/2403.10488v3 )

ライセンス: Link先を確認

Paul Waligora, Haseeb Aslam, Osama Zeeshan, Soufiane Belharbi, Alessandro Lameiras Koerich, Marco Pedersoli, Simon Bacon, Eric Granger,

(参考訳) マルチモーダル感情認識(MMER)システムは、例えば視覚的、テキスト的、生理的、聴覚的モダリティ間のモーダル間関係を利用して、通常、単モーダルシステムより優れている。本稿では,キーベースクロスアテンションとの融合のために,ジョイントマルチモーダルトランス (JMT) を利用するMMER法を提案する。このフレームワークは、様々なモダリティの相補的な性質を利用して予測精度を向上させることができる。異なるバックボーンは、ビデオシーケンス上の各モードにおけるモーダル内時空間依存性をキャプチャする。その後、JMT融合アーキテクチャは個々のモダリティ埋め込みを統合し、モデルがモーダル間およびモーダル間関係を効果的にキャプチャすることを可能にする。 Affwild2データセット(顔と声を含む)の次元的感情認識と、Biovidデータセット(顔とバイオセンサーを含む)の痛み推定という2つの困難な表現認識タスクに関する広範な実験は、我々のJMT融合がMMERにコスト効率の良いソリューションをもたらすことを示唆している。実験の結果,MMERシステムによる核融合により,関連するベースラインや最先端手法よりも優れた性能が得られることがわかった。

Multimodal emotion recognition (MMER) systems typically outperform unimodal systems by leveraging the inter- and intra-modal relationships between, e.g., visual, textual, physiological, and auditory modalities. This paper proposes an MMER method that relies on a joint multimodal transformer (JMT) for fusion with key-based cross-attention. This framework can exploit the complementary nature of diverse modalities to improve predictive accuracy. Separate backbones capture intra-modal spatiotemporal dependencies within each modality over video sequences. Subsequently, our JMT fusion architecture integrates the individual modality embeddings, allowing the model to effectively capture inter- and intra-modal relationships. Extensive experiments on two challenging expression recognition tasks -- (1) dimensional emotion recognition on the Affwild2 dataset (with face and voice) and (2) pain estimation on the Biovid dataset (with face and biosensors) -- indicate that our JMT fusion can provide a cost-effective solution for MMER. Empirical results show that MMER systems with our proposed fusion allow us to outperform relevant baseline and state-of-the-art methods.

翻訳日:2024-04-23 23:14:33 公開日:2024-04-20

# Lodge: 特徴的なダンスプリミティブによるロングダンス生成のための粗大な拡散ネットワーク

Lodge: A Coarse to Fine Diffusion Network for Long Dance Generation Guided by the Characteristic Dance Primitives ( http://arxiv.org/abs/2403.10518v3 )

ライセンス: Link先を確認

Ronghui Li, YuXiang Zhang, Yachao Zhang, Hongwen Zhang, Jie Guo, Yan Zhang, Yebin Liu, Xiu Li,

(参考訳) 与えられた音楽に条件付けされた非常に長いダンスシーケンスを生成することができるネットワークであるLodgeを提案する。そこで我々は,2つの拡散モデル間の中間表現として有意な表現性を持つ特徴的ダンスプリミティブを提案する。第1段階はグローバル拡散であり、粗いレベルの音楽距離相関と生産特性のダンスプリミティブの理解に焦点を当てている。対照的に第2段階は局所拡散であり、ダンスプリミティブや振付規則の指導の下で、詳細な動き列を並列に生成する。さらに,足と地面の接触を最適化するフットリファインブロックを提案し,運動の物理的現実性を高める。提案手法は,グローバルな振付パターンと局所的な動きの質,表現性とのバランスを保ちながら,非常に長いダンスシーケンスを並列に生成することができる。大規模な実験により,本手法の有効性が検証された。

We propose Lodge, a network capable of generating extremely long dance sequences conditioned on given music. We design Lodge as a two-stage coarse to fine diffusion architecture, and propose the characteristic dance primitives that possess significant expressiveness as intermediate representations between two diffusion models. The first stage is global diffusion, which focuses on comprehending the coarse-level music-dance correlation and production characteristic dance primitives. In contrast, the second-stage is the local diffusion, which parallelly generates detailed motion sequences under the guidance of the dance primitives and choreographic rules. In addition, we propose a Foot Refine Block to optimize the contact between the feet and the ground, enhancing the physical realism of the motion. Our approach can parallelly generate dance sequences of extremely long length, striking a balance between global choreographic patterns and local motion quality and expressiveness. Extensive experiments validate the efficacy of our method.

翻訳日:2024-04-23 23:14:33 公開日:2024-04-20

# ガウス過程回帰を用いた機械学習に基づくシステムの信頼性解析

Machine learning-based system reliability analysis with Gaussian Process Regression ( http://arxiv.org/abs/2403.11125v2 )

ライセンス: Link先を確認

Lisang Zhou, Ziqian Luo, Xueting Pan,

(参考訳) 機械学習に基づく信頼性解析手法は、その計算効率と精度に大きな進歩を示した。近年,計算性能を向上させるために,多くの効率的な学習戦略が提案されている。しかし、理論的最適学習戦略を探求する者はほとんどいない。本稿では,そのような探索を容易にするいくつかの定理を提案する。具体的には, 候補設計サンプル間の相関を考慮し, 無視する事例について詳しく述べる。さらに、Kriging相関を無視するケースに対して、よく知られたU学習関数を最適な学習関数に再構成できることを証明した。さらに、逐次多重訓練サンプル濃縮の理論的最適学習戦略についても、ベイズ推定とそれに対応する損失関数を用いて数学的に検討する。シミュレーションの結果,Krigingの相関性を考慮した最適学習戦略は,性能関数の評価回数の削減の観点から,Krigingの相関性やその他の最先端の学習機能を文献から無視する手法よりも有効であることが示唆された。しかし、この実装は非常に大きな計算資源を調査する必要がある。

Machine learning-based reliability analysis methods have shown great advancements for their computational efficiency and accuracy. Recently, many efficient learning strategies have been proposed to enhance the computational performance. However, few of them explores the theoretical optimal learning strategy. In this article, we propose several theorems that facilitates such exploration. Specifically, cases that considering and neglecting the correlations among the candidate design samples are well elaborated. Moreover, we prove that the well-known U learning function can be reformulated to the optimal learning function for the case neglecting the Kriging correlation. In addition, the theoretical optimal learning strategy for sequential multiple training samples enrichment is also mathematically explored through the Bayesian estimate with the corresponding lost functions. Simulation results show that the optimal learning strategy considering the Kriging correlation works better than that neglecting the Kriging correlation and other state-of-the art learning functions from the literatures in terms of the reduction of number of evaluations of performance function. However, the implementation needs to investigate very large computational resource.

翻訳日:2024-04-23 23:14:33 公開日:2024-04-20

# CHisIEC: 古代中国史のための情報抽出コーパス

CHisIEC: An Information Extraction Corpus for Ancient Chinese History ( http://arxiv.org/abs/2403.15088v2 )

ライセンス: Link先を確認

Xuemei Tang, Zekun Deng, Qi Su, Hao Yang, Jun Wang,

(参考訳) 自然言語処理(NLP)は、デジタル人文科学(DH)の領域において重要な役割を担い、歴史的・文化的遺産文書の構造解析を推進するための基盤となっている。これは、名前付きエンティティ認識(NER)と関係抽出(RE)のドメインに特に当てはまる。我々は,古代史・文化の迅速化への取り組みとして,「中国歴史情報抽出法人」(CHisIEC)を提示する。 CHisIEC は NER と RE タスクの開発と評価を目的とした,精巧にキュレートされたデータセットである。 1830年以上にわたる13の王朝のデータを網羅した、顕著な歴史的時系列を描いているCisIECは、中国の史料に固有の広範囲の時間的範囲とテキストの不均一性を表わしている。データセットには4つの異なるエンティティタイプと12のリレーションタイプが含まれており、14,194のエンティティと8,609のリレーションで構成されている。データセットの堅牢性と汎用性を確立するため,さまざまなサイズとパラダイムのモデルを含む総合的な実験を行った。また,古代中国史に関わる課題の文脈において,Large Language Models (LLMs) の機能を評価する。データセットとコードは \url{https://github.com/tangxuemei 1995/CHisIEC} で公開されている。

Natural Language Processing (NLP) plays a pivotal role in the realm of Digital Humanities (DH) and serves as the cornerstone for advancing the structural analysis of historical and cultural heritage texts. This is particularly true for the domains of named entity recognition (NER) and relation extraction (RE). In our commitment to expediting ancient history and culture, we present the ``Chinese Historical Information Extraction Corpus''(CHisIEC). CHisIEC is a meticulously curated dataset designed to develop and evaluate NER and RE tasks, offering a resource to facilitate research in the field. Spanning a remarkable historical timeline encompassing data from 13 dynasties spanning over 1830 years, CHisIEC epitomizes the extensive temporal range and text heterogeneity inherent in Chinese historical documents. The dataset encompasses four distinct entity types and twelve relation types, resulting in a meticulously labeled dataset comprising 14,194 entities and 8,609 relations. To establish the robustness and versatility of our dataset, we have undertaken comprehensive experimentation involving models of various sizes and paradigms. Additionally, we have evaluated the capabilities of Large Language Models (LLMs) in the context of tasks related to ancient Chinese history. The dataset and code are available at \url{https://github.com/tangxuemei1995/CHisIEC}.

翻訳日:2024-04-23 23:04:49 公開日:2024-04-20

# 2ストリームFoveation-based Active Vision Learningに向けて

Towards Two-Stream Foveation-based Active Vision Learning ( http://arxiv.org/abs/2403.15977v3 )

ライセンス: Link先を確認

Timur Ibrayev, Amitangshu Mukherjee, Sai Aparna Aketi, Kaushik Roy,

(参考訳) ディープニューラルネットワーク(DNN)ベースのマシン認識フレームワークは、入力全体をワンショットで処理し、"何が観察されているか"と"どこにあるか"の両方に対する回答を提供する。対照的に、神経科学の「二流仮説」は、人間の視覚野における神経処理を、脳の2つの別々の領域を利用して、何とどこにあるのかを答える能動的視覚システムとして説明している。本研究では,「二流仮説」にインスパイアされた機械学習フレームワークを提案する。具体的には、提案するフレームワークが以下のメカニズムをモデル化する。 1)眼底部が知覚する入力領域に着目した腹側流(何) 2 視覚的指導を提供する背後(場所)流路及び 3)2つのストリームの反復処理により、視覚的焦点を調整し、フォーカスされた画像パッチのシーケンスを処理する。提案するフレームワークのトレーニングは,腹側ストリームモデルのためのラベルベースのDNNトレーニングと背側ストリームモデルのための強化学習によって達成される。本稿では,2ストリームのファベーションに基づく学習が,訓練データをオブジェクトクラスや属性に限定した弱教師付きオブジェクトローカライゼーション(WSOL)の課題に対して適用可能であることを示す。このフレームワークは、オブジェクトのプロパティを予測し、バウンディングボックスを予測してそれをローカライズすることができる。また、この2つのストリームの独立性から、背側モデルを適用することで、異なるデータセットからオブジェクトをローカライズできることを示す。

Deep neural network (DNN) based machine perception frameworks process the entire input in a one-shot manner to provide answers to both "what object is being observed" and "where it is located". In contrast, the "two-stream hypothesis" from neuroscience explains the neural processing in the human visual cortex as an active vision system that utilizes two separate regions of the brain to answer the what and the where questions. In this work, we propose a machine learning framework inspired by the "two-stream hypothesis" and explore the potential benefits that it offers. Specifically, the proposed framework models the following mechanisms: 1) ventral (what) stream focusing on the input regions perceived by the fovea part of an eye (foveation), 2) dorsal (where) stream providing visual guidance, and 3) iterative processing of the two streams to calibrate visual focus and process the sequence of focused image patches. The training of the proposed framework is accomplished by label-based DNN training for the ventral stream model and reinforcement learning for the dorsal stream model. We show that the two-stream foveation-based learning is applicable to the challenging task of weakly-supervised object localization (WSOL), where the training data is limited to the object class or its attributes. The framework is capable of both predicting the properties of an object and successfully localizing it by predicting its bounding box. We also show that, due to the independent nature of the two streams, the dorsal model can be applied on its own to unseen images to localize objects from different datasets.

翻訳日:2024-04-23 23:04:49 公開日:2024-04-20

# AI研究、政策、実践の10の優先事項

Now, Later, and Lasting: Ten Priorities for AI Research, Policy, and Practice ( http://arxiv.org/abs/2404.04750v3 )

ライセンス: Link先を確認

Eric Horvitz, Vincent Conitzer, Sheila McIlraith, Peter Stone,

(参考訳) 人工知能(AI)の進歩は、私たちの生活や社会の多くの側面を変革し、大きな機会をもたらすと同時に、重大なリスクや課題を生じさせます。今後数十年は、産業革命に匹敵する人類の転換点になるかもしれない。 AIに関する百年研究の創始者やリーダーの視点から、前進するための一連の推奨事項を共有します。 10年前に立ち上げられたこのプロジェクトは、複数の専門分野の専門家による永続的な一連の研究にコミットし、人間や社会に対するAIの即時的、長期的、そして遠方的な影響を評価し、AIの研究、政策、実践についてレコメンデーションを行う。ニューラルモデルから新たな能力が生まれるのを目の当たりにしているので、これらのモデルとその振る舞いに関する科学的理解を深める努力をすることが重要です。技術的、社会的、社会技術的レンズを通じて、AIが人や社会に与える影響に対処し、エンジニアリング、社会的、行動的、経済的な分野からの声を含む、さまざまな専門家の洞察を取り入れなければならない。さまざまな利害関係者間の対話、コラボレーション、行動を促進することで、私たちは、AIの開発と展開を、人間の繁栄に貢献する可能性を最大化する方法で戦略的に導くことができます。短期的な意味と長期的な意味に焦点をあてる分野が多様化しているにもかかわらず、どちらも重要な意味を持つと考えている。 1950年、AIのパイオニアの一人であるアラン・チューリングは「我々は少し先までしか見ることができないが、やるべきことはたくさんある」と記した。 AI技術の短期的および長期的影響の両方に対処する、アクションのための10のレコメンデーションを提供します。

Advances in artificial intelligence (AI) will transform many aspects of our lives and society, bringing immense opportunities but also posing significant risks and challenges. The next several decades may well be a turning point for humanity, comparable to the industrial revolution. We write to share a set of recommendations for moving forward from the perspective of the founder and leaders of the One Hundred Year Study on AI. Launched a decade ago, the project is committed to a perpetual series of studies by multidisciplinary experts to evaluate the immediate, longer-term, and far-reaching effects of AI on people and society, and to make recommendations about AI research, policy, and practice. As we witness new capabilities emerging from neural models, it is crucial that we engage in efforts to advance our scientific understanding of these models and their behaviors. We must address the impact of AI on people and society through technical, social, and sociotechnical lenses, incorporating insights from a diverse range of experts including voices from engineering, social, behavioral, and economic disciplines. By fostering dialogue, collaboration, and action among various stakeholders, we can strategically guide the development and deployment of AI in ways that maximize its potential for contributing to human flourishing. Despite the growing divide in the field between focusing on short-term versus long-term implications, we think both are of critical importance. As Alan Turing, one of the pioneers of AI, wrote in 1950, "We can only see a short distance ahead, but we can see plenty there that needs to be done." We offer ten recommendations for action that collectively address both the short- and long-term potential impacts of AI technologies.

翻訳日:2024-04-23 23:04:49 公開日:2024-04-20

# ディープビデオ圧縮のためのタスク認識エンコーダ制御

Task-Aware Encoder Control for Deep Video Compression ( http://arxiv.org/abs/2404.04848v2 )

ライセンス: Link先を確認

Xingtong Ge, Jixiang Luo, Xinjie Zhang, Tongda Xu, Guo Lu, Dailan He, Jing Geng, Yan Wang, Jun Zhang, Hongwei Qin,

(参考訳) マシンタスクのためのディープビデオ圧縮(DVC)に関する以前の研究は、通常、特定のタスクごとに独自のコーデックをトレーニングし、タスクごとに専用のデコーダを強制する必要がある。対照的に、従来のビデオコーデックはフレキシブルなエンコーダコントローラを採用しており、モード予測のようなメカニズムによって単一のコーデックを異なるタスクに適応させることができる。このことからインスピレーションを得て,機械用ディープビデオ圧縮のための革新的なエンコーダコントローラを導入する。モード予測とグループ・オブ・ピクチャーズ(GoP)選択モジュールを備える。提案手法は,符号化段階での制御を集中化し,検出やトラッキングなど,さまざまなタスクに適応可能なエンコーダ調整を実現するとともに,標準の事前学習DVCデコーダとの互換性を維持する。実験的な証拠は,本手法が既存の訓練済みDVCを用いて,複数のタスクにまたがって適用可能であることを示している。さらに,本手法が従来のDVCよりも25%ほど優れており,事前学習したデコーダが1つしかないことが実証された。

Prior research on deep video compression (DVC) for machine tasks typically necessitates training a unique codec for each specific task, mandating a dedicated decoder per task. In contrast, traditional video codecs employ a flexible encoder controller, enabling the adaptation of a single codec to different tasks through mechanisms like mode prediction. Drawing inspiration from this, we introduce an innovative encoder controller for deep video compression for machines. This controller features a mode prediction and a Group of Pictures (GoP) selection module. Our approach centralizes control at the encoding stage, allowing for adaptable encoder adjustments across different tasks, such as detection and tracking, while maintaining compatibility with a standard pre-trained DVC decoder. Empirical evidence demonstrates that our method is applicable across multiple tasks with various existing pre-trained DVCs. Moreover, extensive experiments demonstrate that our method outperforms previous DVC by about 25% bitrate for different tasks, with only one pre-trained decoder.

翻訳日:2024-04-23 23:04:49 公開日:2024-04-20

# 足のロコマニピュレーションのための視覚全体制御

Visual Whole-Body Control for Legged Loco-Manipulation ( http://arxiv.org/abs/2403.16967v3 )

ライセンス: Link先を確認

Minghuan Liu, Zixuan Chen, Xuxin Cheng, Yandong Ji, Rizhao Qiu, Ruihan Yang, Xiaolong Wang,

(参考訳) そこで本研究では,ロボットアームを装着したロボットによる移動操作の問題点について検討する。ロボットの脚は、通常移動のために使用されるが、全身制御を行うことで操作能力を増幅する機会を提供する。つまり、ロボットは足と腕を同時に制御し、ワークスペースを拡張する。視覚的観察により全身制御を自律的に行うことのできる枠組みを提案する。当社のアプローチであるVisual Whole-Body Control(VBC)は、すべての自由度を用いて、エンドエフェクタマニピュレータの位置を追跡する低レベルポリシーと、視覚入力に基づいてエンドエフェクタ位置を提案する高レベルポリシーで構成されている。シミュレーションにおける両レベルのポリシーをトレーニングし、実際のロボット展開のためのSim2Real転送を実行する。さまざまな構成(高さ、位置、方向)と環境において、さまざまなオブジェクトを拾う際に、大規模な実験を行い、ベースラインよりも大幅に改善した。プロジェクトページ: https://wholebody-b1.github.io

We study the problem of mobile manipulation using legged robots equipped with an arm, namely legged loco-manipulation. The robot legs, while usually utilized for mobility, offer an opportunity to amplify the manipulation capabilities by conducting whole-body control. That is, the robot can control the legs and the arm at the same time to extend its workspace. We propose a framework that can conduct the whole-body control autonomously with visual observations. Our approach, namely Visual Whole-Body Control(VBC), is composed of a low-level policy using all degrees of freedom to track the end-effector manipulator position and a high-level policy proposing the end-effector position based on visual inputs. We train both levels of policies in simulation and perform Sim2Real transfer for real robot deployment. We perform extensive experiments and show significant improvements over baselines in picking up diverse objects in different configurations (heights, locations, orientations) and environments. Project page: https://wholebody-b1.github.io

翻訳日:2024-04-23 22:55:04 公開日:2024-04-20

# ディープフェイクの生成と検出:ベンチマークと調査

Deepfake Generation and Detection: A Benchmark and Survey ( http://arxiv.org/abs/2403.17881v3 )

ライセンス: Link先を確認

Gan Pei, Jiangning Zhang, Menghan Hu, Zhenyu Zhang, Chengjie Wang, Yunsheng Wu, Guangtao Zhai, Jian Yang, Chunhua Shen, Dacheng Tao,

(参考訳) Deepfake(ディープフェイク)は、特定の条件下で非常にリアルな顔画像やビデオを作成する技術であり、エンターテイメント、映画制作、デジタルヒューマン創造といった分野において大きな応用可能性を持つ。ディープラーニングの進歩により、主に変分オートエンコーダとジェネレーティブ・アドバイサル・ネットワークによって表現される技術は印象的な生成結果を得た。最近では、強力な生成能力を持つ拡散モデルの出現が、新たな研究の波を引き起こしている。ディープフェイク生成に加えて、対応する検出技術は継続的に進化し、プライバシー侵害やフィッシング攻撃などのディープフェイクの潜在的な誤用を規制している。本調査は, この急速に発展する分野における, ディープフェイクの発生と検出, 現状の要約と解析の最新の展開を包括的にレビューする。まずタスク定義を統一し、データセットとメトリクスを包括的に導入し、開発技術について議論する。そこで我々は,複数の関連分野の開発について論じ,顔スワップ,顔の再現,話し顔の生成,顔属性の編集,偽造検出という4つの代表的なディープフェイク分野の研究に焦点をあてる。その後、各分野の一般的なデータセットに代表的手法を総合的にベンチマークし、最新かつ影響力のある著作を十分に評価する。最後に,議論分野の課題と今後の研究方向性について分析する。

Deepfake is a technology dedicated to creating highly realistic facial images and videos under specific conditions, which has significant application potential in fields such as entertainment, movie production, digital human creation, to name a few. With the advancements in deep learning, techniques primarily represented by Variational Autoencoders and Generative Adversarial Networks have achieved impressive generation results. More recently, the emergence of diffusion models with powerful generation capabilities has sparked a renewed wave of research. In addition to deepfake generation, corresponding detection technologies continuously evolve to regulate the potential misuse of deepfakes, such as for privacy invasion and phishing attacks. This survey comprehensively reviews the latest developments in deepfake generation and detection, summarizing and analyzing current state-of-the-arts in this rapidly evolving field. We first unify task definitions, comprehensively introduce datasets and metrics, and discuss developing technologies. Then, we discuss the development of several related sub-fields and focus on researching four representative deepfake fields: face swapping, face reenactment, talking face generation, and facial attribute editing, as well as forgery detection. Subsequently, we comprehensively benchmark representative methods on popular datasets for each field, fully evaluating the latest and influential published works. Finally, we analyze challenges and future research directions of the discussed fields.

翻訳日:2024-04-23 22:55:04 公開日:2024-04-20

# V2X連携によるエンド・ツー・エンド自動運転

End-to-End Autonomous Driving through V2X Cooperation ( http://arxiv.org/abs/2404.00717v2 )

ライセンス: Link先を確認

Haibao Yu, Wenxian Yang, Jiaru Zhong, Zhenwei Yang, Siqi Fan, Ping Luo, Zaiqing Nie,

(参考訳) 先進的な自律運転のための有望なアプローチとして,自走車とV2X通信によるインフラセンサデータの協調利用が出現している。しかし、現在の研究では、最終的な計画性能を最適化するためにエンドツーエンドの学習を採用するのではなく、個々のモジュールの改善に重点を置いている。本稿では,UniV2Xについて紹介する。UniV2Xは,多様なビューにまたがる全てのキー駆動モジュールをシームレスに統合し,統合されたネットワークに組み込む,先駆的な自律運転フレームワークである。車両とインフラの効果的な連携のための疎密度ハイブリッドデータ伝送と融合機構を提案し,その利点を3つ挙げる。 1) エージェント認識, オンラインマッピング, 占有率予測を同時に強化し, 最終的に計画性能を向上する。 2)実用的・限られた通信条件に優しい送信システム。 3) このハイブリッドデータの解釈可能性を備えた信頼性のあるデータ融合。我々は、実際の協調運転データセットであるDAIR-V2Xに挑戦する上で、UniV2Xといくつかのベンチマークメソッドを再現する。実験の結果,UniV2Xは計画性能と中間出力性能を大幅に向上させることができた。コードはhttps://github.com/AIR-THU/UniV2Xにある。

Cooperatively utilizing both ego-vehicle and infrastructure sensor data via V2X communication has emerged as a promising approach for advanced autonomous driving. However, current research mainly focuses on improving individual modules, rather than taking end-to-end learning to optimize final planning performance, resulting in underutilized data potential. In this paper, we introduce UniV2X, a pioneering cooperative autonomous driving framework that seamlessly integrates all key driving modules across diverse views into a unified network. We propose a sparse-dense hybrid data transmission and fusion mechanism for effective vehicle-infrastructure cooperation, offering three advantages: 1) Effective for simultaneously enhancing agent perception, online mapping, and occupancy prediction, ultimately improving planning performance. 2) Transmission-friendly for practical and limited communication conditions. 3) Reliable data fusion with interpretability of this hybrid data. We implement UniV2X, as well as reproducing several benchmark methods, on the challenging DAIR-V2X, the real-world cooperative driving dataset. Experimental results demonstrate the effectiveness of UniV2X in significantly enhancing planning performance, as well as all intermediate output performance. Code is at https://github.com/AIR-THU/UniV2X.

翻訳日:2024-04-23 22:55:04 公開日:2024-04-20

# 新型コロナウイルス検出のための空間的スライス学習

A Closer Look at Spatial-Slice Features Learning for COVID-19 Detection ( http://arxiv.org/abs/2404.01643v2 )

ライセンス: Link先を確認

Chih-Chung Hsu, Chia-Ming Lee, Yang Fan Chiang, Yi-Shiuan Chou, Chih-Yu Jiang, Shen-Chieh Tai, Chi-Han Tsai,

(参考訳) 従来のCT画像認識では,各CTスキャンの解像度とサイズに有意なばらつきがしばしばあり,入力サイズと適応性に対する厳密な要件が要求される。 2)CTスキャンには,多くのアウト・オブ・ディストリビューション(OOD)スライスが含まれている。重要な特徴は、CTスキャン全体の特定の空間領域とスライスにのみ存在する可能性がある。これらがどこにあるのか、どうやって効果的に把握できるのか? そこで本稿では,CTスキャンに特化して設計されたSSFL++(Spatial-Slice Feature Learning)フレームワークを提案する。本研究の目的は,全CTスキャンでOODデータをフィルタリングし,70%の冗長性を完全に低減し,解析のための重要な空間スライスを選択することである。一方,KDS法は,トレーニングおよび推論段階における安定性を向上させるため,収束率を向上し,性能を向上する。その結果、トレーニングデータの1%しか持たない単純なE2Dモデルを用いて、本モデルの有望な性能を実証した。 DEF-AI-MIAワークショップで提供されるCOVID-19-CT-DBデータセットとCVPR 2024を併用して,本手法の有効性を検証した。ソースコードはhttps://github.com/ming053l/E2Dで入手できる。

Conventional Computed Tomography (CT) imaging recognition faces two significant challenges: (1) There is often considerable variability in the resolution and size of each CT scan, necessitating strict requirements for the input size and adaptability of models. (2) CT-scan contains large number of out-of-distribution (OOD) slices. The crucial features may only be present in specific spatial regions and slices of the entire CT scan. How can we effectively figure out where these are located? To deal with this, we introduce an enhanced Spatial-Slice Feature Learning (SSFL++) framework specifically designed for CT scan. It aim to filter out a OOD data within whole CT scan, enabling our to select crucial spatial-slice for analysis by reducing 70% redundancy totally. Meanwhile, we proposed Kernel-Density-based slice Sampling (KDS) method to improve the stability when training and inference stage, therefore speeding up the rate of convergence and boosting performance. As a result, the experiments demonstrate the promising performance of our model using a simple EfficientNet-2D (E2D) model, even with only 1% of the training data. The efficacy of our approach has been validated on the COVID-19-CT-DB datasets provided by the DEF-AI-MIA workshop, in conjunction with CVPR 2024. Our source code is available at https://github.com/ming053l/E2D

翻訳日:2024-04-23 22:45:14 公開日:2024-04-20

# 2レベルフィードバック制御によるネットワークシステムの侵入耐性

Intrusion Tolerance for Networked Systems through Two-Level Feedback Control ( http://arxiv.org/abs/2404.01741v2 )

ライセンス: Link先を確認

Kim Hammar, Rolf Stadler,

(参考訳) サービスレプリカを2段階最適制御問題とするシステムの侵入耐性を定式化する。ローカルレベルではノードコントローラが侵入回復を行い、グローバルレベルではシステムコントローラが複製係数を管理する。局所的およびグローバルな制御問題は、操作研究における古典的な問題、すなわち機械交換問題と在庫補充問題として定式化することができる。この定式化に基づいて、侵入耐性システムのための新しい制御アーキテクチャであるTOLERANCEを設計する。両レベルにおける最適制御戦略がしきい値構造を持ち、それらの計算に効率的なアルゴリズムを設計することを証明する。 10種類のネットワーク侵入を行うエミュレーション環境でのTOLERANCEの実装と評価を行う。その結果、TOLERANCEは、最先端の侵入耐性システムと比較して、サービスの可用性を向上し、運用コストを低減できることがわかった。

We formulate intrusion tolerance for a system with service replicas as a two-level optimal control problem. On the local level node controllers perform intrusion recovery, and on the global level a system controller manages the replication factor. The local and global control problems can be formulated as classical problems in operations research, namely, the machine replacement problem and the inventory replenishment problem. Based on this formulation, we design TOLERANCE, a novel control architecture for intrusion-tolerant systems. We prove that the optimal control strategies on both levels have threshold structure and design efficient algorithms for computing them. We implement and evaluate TOLERANCE in an emulation environment where we run 10 types of network intrusions. The results show that TOLERANCE can improve service availability and reduce operational cost compared with state-of-the-art intrusion-tolerant systems.

翻訳日:2024-04-23 22:45:14 公開日:2024-04-20

# 拡散$^2$:直交拡散モデルのスコア構成による動的3次元コンテンツ生成

Diffusion$^2$: Dynamic 3D Content Generation via Score Composition of Orthogonal Diffusion Models ( http://arxiv.org/abs/2404.02148v2 )

ライセンス: Link先を確認

Zeyu Yang, Zijie Pan, Chun Gu, Li Zhang,

(参考訳) 近年の3D生成の進歩は、インターネット規模の画像データで事前訓練され、大量の3Dデータで微調整された3D対応画像拡散モデルの改善により、高度に一貫したマルチビュー画像を生成する能力によって大きく促進されている。しかし、同期したマルチビュービデオデータが不足しているため、このパラダイムを4D生成に直接適用することは不可能である。それにもかかわらず、利用可能なビデオと3Dデータは、ビデオと多視点拡散モデルのトレーニングに適しており、それぞれが満足できる動的および幾何学的事前情報を提供することができる。本稿では,これらのモデルからの幾何的整合性および時間的滑らか性に関する知識を活用し,連続した4次元表現の最適化に使用できる高密度な多視点画像と多フレーム画像を直接サンプリングする動的3次元コンテンツ作成のための新しいフレームワークであるDiffusion$^2$を提案する。具体的には、生成する画像の確率構造に基づいて、ビデオと多視点拡散モデルのスコア合成による簡易かつ効果的な復調戦略を設計する。画像生成の並列性の高さと現代の4D再構成パイプラインの効率性により、我々のフレームワークは数分で4Dコンテンツを生成できる。さらに,本手法は4次元データへの依存を回避し,基礎映像や多視点拡散モデルのスケーラビリティから恩恵を受ける可能性がある。大規模な実験により,提案手法の有効性と各種のプロンプトに柔軟に適応する能力が実証された。

Recent advancements in 3D generation are predominantly propelled by improvements in 3D-aware image diffusion models which are pretrained on Internet-scale image data and fine-tuned on massive 3D data, offering the capability of producing highly consistent multi-view images. However, due to the scarcity of synchronized multi-view video data, it is impractical to adapt this paradigm to 4D generation directly. Despite that, the available video and 3D data are adequate for training video and multi-view diffusion models that can provide satisfactory dynamic and geometric priors respectively. In this paper, we present Diffusion$^2$, a novel framework for dynamic 3D content creation that leverages the knowledge about geometric consistency and temporal smoothness from these models to directly sample dense multi-view and multi-frame images which can be employed to optimize continuous 4D representation. Specifically, we design a simple yet effective denoising strategy via score composition of video and multi-view diffusion models based on the probability structure of the images to be generated. Owing to the high parallelism of the image generation and the efficiency of the modern 4D reconstruction pipeline, our framework can generate 4D content within few minutes. Furthermore, our method circumvents the reliance on 4D data, thereby having the potential to benefit from the scalability of the foundation video and multi-view diffusion models. Extensive experiments demonstrate the efficacy of our proposed framework and its capability to flexibly adapt to various types of prompts.

翻訳日:2024-04-23 22:45:14 公開日:2024-04-20

# 人間の視線が常に人間のAIチームの分類精度を向上しない機械を対話的に誘導することを可能にする

Allowing humans to interactively guide machines where to look does not always improve human-AI team's classification accuracy ( http://arxiv.org/abs/2404.05238v3 )

ライセンス: Link先を確認

Giang Nguyen, Mohammad Reza Taesiri, Sunnie S. Y. Kim, Anh Nguyen,

(参考訳) Explainable AI (XAI) における何千もの論文、注目マップ \cite{vaswani2017attention} と特徴重要マップ \cite{bansal2020sam} が、AIの判断に各入力機能がどの程度重要かを知る共通の手段として確立されている。ユーザがテスト時に重要な機能を編集できるようにすることで、ダウンストリームタスクにおける人間とAIチームの精度が向上するかどうか、興味深い、未調査の質問である。本稿では、入力画像とトレーニングセット画像のパッチワイド対応を最初に予測し、それらをベースとして分類決定を行う、最先端のアンテホックな分類器 \cite{taesi2022visual} であるCHM-Corrを活用することで、この問題に対処する。我々はCHM-CorrのインタラクティブインターフェースであるCHM-Corr++を構築し、CHM-Corrが提供する機能の重要度マップを編集し、更新されたモデル決定を観察できるようにする。 CHM-Corr++を使用すると、ユーザーはモデルが出力を変更するかどうか、いつ、どのように変更するかについての洞察を得ることができ、静的な説明以上の理解を改善することができる。しかし,1400件の意思決定を行った18名の専門家による研究では,静的な説明よりもCUB-200の鳥画像分類において,対話的アプローチがユーザ精度を向上させるという統計的意義は見出されていない。これは、対話性は人間-AIチームの精度を高め、将来の研究の必要性を高めるという仮説に挑戦する。画像分類器の注意を編集するためのインタラクティブツールであるCHM-Corr++をオープンソースとして公開した(インタラクティブなデモはこちらを参照)。 https://github.com/anguyen8/chm-corr-interactive。

Via thousands of papers in Explainable AI (XAI), attention maps \cite{vaswani2017attention} and feature importance maps \cite{bansal2020sam} have been established as a common means for finding how important each input feature is to an AI's decisions. It is an interesting, unexplored question whether allowing users to edit the feature importance at test time would improve a human-AI team's accuracy on downstream tasks. In this paper, we address this question by leveraging CHM-Corr, a state-of-the-art, ante-hoc explainable classifier \cite{taesiri2022visual} that first predicts patch-wise correspondences between the input and training-set images, and then bases on them to make classification decisions. We build CHM-Corr++, an interactive interface for CHM-Corr, enabling users to edit the feature importance map provided by CHM-Corr and observe updated model decisions. Via CHM-Corr++, users can gain insights into if, when, and how the model changes its outputs, improving their understanding beyond static explanations. However, our study with 18 expert users who performed 1,400 decisions finds no statistical significance that our interactive approach improves user accuracy on CUB-200 bird image classification over static explanations. This challenges the hypothesis that interactivity can boost human-AI team accuracy and raises needs for future research. We open-source CHM-Corr++, an interactive tool for editing image classifier attention (see an interactive demo here: http://137.184.82.109:7080/). We release code and data on github: https://github.com/anguyen8/chm-corr-interactive.

翻訳日:2024-04-23 22:45:14 公開日:2024-04-20

# 大規模言語モデルを用いた企業知識ベースに対する質問応答の強化

Enhancing Question Answering for Enterprise Knowledge Bases using Large Language Models ( http://arxiv.org/abs/2404.08695v2 )

ライセンス: Link先を確認

Feihu Jiang, Chuan Qin, Kaichun Yao, Chuyu Fang, Fuzhen Zhuang, Hengshu Zhu, Hui Xiong,

(参考訳) 効率的な知識管理は、企業や組織の運用効率と革新的な能力の両方を増強する上で重要な役割を担っている。ベクトル化による知識の索引付けにより,知識検索手法が出現し,知識管理システムの有効性が著しく向上した。近年、生成自然言語処理技術の急速な進歩は、ユーザクエリに適合した関連文書を検索した後、正確で一貫性のある回答を生成するための道を開いた。しかし、企業知識ベースでは、知識検索と生成のためのスクラッチから広範なトレーニングデータを組み立てることは、プライベートデータのプライバシとセキュリティポリシーが大きなコストを伴っているため、非常に難しい課題である。本稿では,大規模言語モデル(LLM)に基づく新しい検索・生成フレームワークであるEKRGを提案する。具体的には,まず LLM を用いて,知識検索者の学習に十分な文書検索ペアを生成する命令チューニング手法を提案する。この方法は、慎重に設計された指示を通じて、事実指向の知識とソリューション指向の知識の両方を含む、企業の知識ベースに対する多様な質問を効率的に生成する。さらに,学習過程の効率化を図るため,関係性に敏感な教師学生学習戦略を構築した。提案手法では,新たな思考連鎖(CoT)に基づく微調整手法を提案する。最後に、実世界のデータセットに関する広範な実験を行い、提案フレームワークの有効性を実証した。

Efficient knowledge management plays a pivotal role in augmenting both the operational efficiency and the innovative capacity of businesses and organizations. By indexing knowledge through vectorization, a variety of knowledge retrieval methods have emerged, significantly enhancing the efficacy of knowledge management systems. Recently, the rapid advancements in generative natural language processing technologies paved the way for generating precise and coherent answers after retrieving relevant documents tailored to user queries. However, for enterprise knowledge bases, assembling extensive training data from scratch for knowledge retrieval and generation is a formidable challenge due to the privacy and security policies of private data, frequently entailing substantial costs. To address the challenge above, in this paper, we propose EKRG, a novel Retrieval-Generation framework based on large language models (LLMs), expertly designed to enable question-answering for Enterprise Knowledge bases with limited annotation costs. Specifically, for the retrieval process, we first introduce an instruction-tuning method using an LLM to generate sufficient document-question pairs for training a knowledge retriever. This method, through carefully designed instructions, efficiently generates diverse questions for enterprise knowledge bases, encompassing both fact-oriented and solution-oriented knowledge. Additionally, we develop a relevance-aware teacher-student learning strategy to further enhance the efficiency of the training process. For the generation process, we propose a novel chain of thought (CoT) based fine-tuning method to empower the LLM-based generator to adeptly respond to user questions using retrieved documents. Finally, extensive experiments on real-world datasets have demonstrated the effectiveness of our proposed framework.

翻訳日:2024-04-23 20:47:39 公開日:2024-04-20

# FusionMamba: Mambaを用いたマルチモーダル画像融合のための動的特徴強調

FusionMamba: Dynamic Feature Enhancement for Multimodal Image Fusion with Mamba ( http://arxiv.org/abs/2404.09498v2 )

ライセンス: Link先を確認

Xinyu Xie, Yawen Cui, Chio-In Ieong, Tao Tan, Xiaozhi Zhang, Xubin Zheng, Zitong Yu,

(参考訳) マルチモーダル画像融合は、異なるモードの情報を組み合わせて、包括的な情報と詳細なテクスチャを持つ単一の画像を作成することを目的としている。しかし、畳み込みニューラルネットワークに基づく融合モデルは、局所畳み込み操作に焦点をあてたため、グローバルな画像の特徴を捉える際の限界に直面する。トランスフォーマーベースのモデルは、グローバルな特徴モデリングに優れているが、その2次複雑さに起因する計算上の課題に直面している。近年、Selective Structured State Space Modelは、線形複雑度を持つ長距離依存モデリングにおいて重要な可能性を示し、上記のジレンマに対処するための有望な道を提供する。本稿では,マルチモーダル画像融合のための動的特徴強調手法FusionMambaを提案する。具体的には,画像融合のための効率的なマンバモデルを提案し,動的畳み込みとチャネルアテンションによる効率的な視覚状態空間モデルを統合する。この改良されたモデルは、Mambaの性能とグローバルモデリング能力だけでなく、局所的な拡張能力を高めながらチャネルの冗長性を低下させる。さらに,2つの動的特徴拡張モジュール (DFEM) と相互モード融合マンバモジュール (CMFM) からなる動的特徴融合モジュール (DFFM) を考案した。前者は動的テクスチャ強化と動的差分知覚に役立ち、後者はモード間の相関性を高め、冗長なモーダル情報を抑制する。 FusionMambaは、様々なマルチモーダル画像融合タスク(CT-MRI、PET-MRI、SPECT-MRI)、赤外線および可視画像融合タスク(IR-VIS)、多モーダルバイオメディカル画像融合データセット(GFP-PC)にまたがって、最先端のSOTA(State-of-the-art)性能を実現した。 FusionMambaのコードはhttps://github.com/millieXie/FusionMamba.comで公開されている。

Multi-modal image fusion aims to combine information from different modes to create a single image with comprehensive information and detailed textures. However, fusion models based on convolutional neural networks encounter limitations in capturing global image features due to their focus on local convolution operations. Transformer-based models, while excelling in global feature modeling, confront computational challenges stemming from their quadratic complexity. Recently, the Selective Structured State Space Model has exhibited significant potential for long-range dependency modeling with linear complexity, offering a promising avenue to address the aforementioned dilemma. In this paper, we propose FusionMamba, a novel dynamic feature enhancement method for multimodal image fusion with Mamba. Specifically, we devise an improved efficient Mamba model for image fusion, integrating efficient visual state space model with dynamic convolution and channel attention. This refined model not only upholds the performance of Mamba and global modeling capability but also diminishes channel redundancy while enhancing local enhancement capability. Additionally, we devise a dynamic feature fusion module (DFFM) comprising two dynamic feature enhancement modules (DFEM) and a cross modality fusion mamba module (CMFM). The former serves for dynamic texture enhancement and dynamic difference perception, whereas the latter enhances correlation features between modes and suppresses redundant intermodal information. FusionMamba has yielded state-of-the-art (SOTA) performance across various multimodal medical image fusion tasks (CT-MRI, PET-MRI, SPECT-MRI), infrared and visible image fusion task (IR-VIS) and multimodal biomedical image fusion dataset (GFP-PC), which is proved that our model has generalization ability. The code for FusionMamba is available at https://github.com/millieXie/FusionMamba.

翻訳日:2024-04-23 20:37:54 公開日:2024-04-20

# フォッカー・プランク方程式の補間超対称対

Interpolating supersymmetric pair of Fokker-Planck equations ( http://arxiv.org/abs/2404.09551v2 )

ライセンス: Link先を確認

Choon-Lin Ho,

(参考訳) 我々は、一対の超対称関連Fokker-Planck方程式を定数係数で補間するFokker-Planck方程式を考える。形状不変性の興味深い性質に基づき、超対称対のフォッカー・プランク系の解の様々な1パラメータ補間を直接構築することができる。

We consider Fokker-Planck equations that interpolate a pair of supersymmetrically related Fokker-Planck equations with constant coefficients. Based on the interesting property of shape-invariance, various one-parameter interpolations of the solutions of the supersymmetric pair of Fokker-Planck systems can be directly constructed.

翻訳日:2024-04-23 20:37:54 公開日:2024-04-20

# CREST: ゼロショット学習の強化のための証拠深層学習によるクロスモーダル共鳴

CREST: Cross-modal Resonance through Evidential Deep Learning for Enhanced Zero-Shot Learning ( http://arxiv.org/abs/2404.09640v3 )

ライセンス: Link先を確認

Haojian Huang, Xiaozhen Qiao, Zhuo Chen, Haodong Chen, Bingyu Li, Zhe Sun, Mulin Chen, Xuelong Li,

(参考訳) ゼロショット学習(ZSL)は、既知のカテゴリから未知のカテゴリへのセマンティックな知識伝達を活用することで、新しいクラスの認識を可能にする。この知識は、典型的には属性記述にカプセル化され、クラス固有の視覚的特徴を識別し、視覚的セマンティックなアライメントを促進し、ZSLのパフォーマンスを向上させる。しかし、インスタンス間の分布不均衡や属性共起といった現実世界の課題は、画像の局所的なばらつきの識別を妨げることがしばしばあり、これは、きめ細かい領域固有の属性アノテーションの不足によって悪化する。さらに、カテゴリー内の視覚的プレゼンテーションの多様性は属性カテゴリーの関連を歪ませることもできる。そこで本研究では,双方向の双方向ZSLアプローチであるCRESTを提案する。属性と視覚的ローカライゼーションの表現を抽出することから始まり、Evidential Deep Learning (EDL) を用いて、根底にあるてんかんの不確実性を測定することによって、強陰性に対するモデルのレジリエンスを高める。 CRESTには、視覚的カテゴリと属性的カテゴリのアライメントの両方に焦点を当てたデュアルラーニングパスが組み込まれており、潜在空間と可観測空間の堅牢な相関性を保証する。さらに,不確実性のあるクロスモーダル融合手法を導入し,視覚属性推論を洗練させる。大規模な実験では、複数のデータセットにまたがるモデルの有効性とユニークな説明可能性を示す。私たちのコードとデータは、https://github.com/JethroJames/CRESTで利用可能です。

Zero-shot learning (ZSL) enables the recognition of novel classes by leveraging semantic knowledge transfer from known to unknown categories. This knowledge, typically encapsulated in attribute descriptions, aids in identifying class-specific visual features, thus facilitating visual-semantic alignment and improving ZSL performance. However, real-world challenges such as distribution imbalances and attribute co-occurrence among instances often hinder the discernment of local variances in images, a problem exacerbated by the scarcity of fine-grained, region-specific attribute annotations. Moreover, the variability in visual presentation within categories can also skew attribute-category associations. In response, we propose a bidirectional cross-modal ZSL approach CREST. It begins by extracting representations for attribute and visual localization and employs Evidential Deep Learning (EDL) to measure underlying epistemic uncertainty, thereby enhancing the model's resilience against hard negatives. CREST incorporates dual learning pathways, focusing on both visual-category and attribute-category alignments, to ensure robust correlation between latent and observable spaces. Moreover, we introduce an uncertainty-informed cross-modal fusion technique to refine visual-attribute inference. Extensive experiments demonstrate our model's effectiveness and unique explainability across multiple datasets. Our code and data are available at: https://github.com/JethroJames/CREST

翻訳日:2024-04-23 20:37:54 公開日:2024-04-20

# セキュリティとプライバシ製品インクルージョン

Security and Privacy Product Inclusion ( http://arxiv.org/abs/2404.13220v1 )

ライセンス: Link先を確認

Dave Kleidermacher, Emmanuel Arriaga, Eric Wang, Sebastian Porst, Roger Piqueras Jover,

(参考訳) 本稿では,多様な背景からユーザに対するセキュリティとプライバシを確保することの課題について考察する。本稿では,セキュリティとプライバシに製品が組み込まれるリスクや対策を識別するための脅威モデリング手法を提案する。我々は、低所得層、接続性の低さ、デバイス使用の共有、MLフェアネスなど、ユーザが高いレベルのセキュリティとプライバシを達成する能力に影響を与えるさまざまな要因について論じる。我々は,グローバルなセキュリティおよびプライバシユーザエクスペリエンス調査の結果を提示し,製品開発者への影響について論じる。私たちの研究は、セキュリティとプライバシに対するより包括的なアプローチの必要性を強調し、研究者や実践者がさまざまなユーザのために製品やサービスを設計するとき、考慮すべきフレームワークを提供します。

In this paper, we explore the challenges of ensuring security and privacy for users from diverse demographic backgrounds. We propose a threat modeling approach to identify potential risks and countermeasures for product inclusion in security and privacy. We discuss various factors that can affect a user's ability to achieve a high level of security and privacy, including low-income demographics, poor connectivity, shared device usage, ML fairness, etc. We present results from a global security and privacy user experience survey and discuss the implications for product developers. Our work highlights the need for a more inclusive approach to security and privacy and provides a framework for researchers and practitioners to consider when designing products and services for a diverse range of users.

翻訳日:2024-04-23 19:58:55 公開日:2024-04-20

# Vim4Path: 病理画像のための自己監督型視覚マンバ

Vim4Path: Self-Supervised Vision Mamba for Histopathology Images ( http://arxiv.org/abs/2404.13222v1 )

ライセンス: Link先を確認

Ali Nasiri-Sarvi, Vincent Quoc-Huy Trinh, Hassan Rivaz, Mahdi S. Hosseini,

(参考訳) Gigapixel Whole Slide Images (WSI) からの表現学習は、組織構造の複雑な性質とラベル付きデータの不足により、計算病理学において重要な課題となっている。マルチインスタンス学習手法はこの課題に対処し、イメージパッチを活用し、自己監視学習(SSL)アプローチを用いた事前学習モデルを用いたスライドの分類を行っている。 SSLとMILの両方のパフォーマンスは、機能エンコーダのアーキテクチャに依存している。本稿では、状態空間モデルにインスパイアされたVision Mamba(Vim)アーキテクチャを、DINOフレームワークの計算病理学における表現学習に活用することを提案する。我々は、パッチレベルとスライドレベルの両方の分類において、Camelyon16データセット上でのVim対ビジョントランスフォーマー(ViT)の性能を評価する。以上の結果から,Vim は ViT と比較して性能が向上し,特に比較的小規模なモデルでは ROC AUC が8.21 増加していることが明らかとなった。説明可能性分析は、Vimの機能をさらに強調し、Vimが病理学者のワークフローに似ていないViTを独自にエミュレートしていることを明らかにした。この人間の専門的分析との整合性は、現実的な診断におけるヴィムの可能性を強調し、計算病理学における効果的な表現学習アルゴリズムの開発に大きく貢献する。コードと事前訓練されたウェイトは、 \url{https://github.com/AtlasAnalyticsLab/Vim4Path}でリリースします。

Representation learning from Gigapixel Whole Slide Images (WSI) poses a significant challenge in computational pathology due to the complicated nature of tissue structures and the scarcity of labeled data. Multi-instance learning methods have addressed this challenge, leveraging image patches to classify slides utilizing pretrained models using Self-Supervised Learning (SSL) approaches. The performance of both SSL and MIL methods relies on the architecture of the feature encoder. This paper proposes leveraging the Vision Mamba (Vim) architecture, inspired by state space models, within the DINO framework for representation learning in computational pathology. We evaluate the performance of Vim against Vision Transformers (ViT) on the Camelyon16 dataset for both patch-level and slide-level classification. Our findings highlight Vim's enhanced performance compared to ViT, particularly at smaller scales, where Vim achieves an 8.21 increase in ROC AUC for models of similar size. An explainability analysis further highlights Vim's capabilities, which reveals that Vim uniquely emulates the pathologist workflow-unlike ViT. This alignment with human expert analysis highlights Vim's potential in practical diagnostic settings and contributes significantly to developing effective representation-learning algorithms in computational pathology. We release the codes and pretrained weights at \url{https://github.com/AtlasAnalyticsLab/Vim4Path}.

翻訳日:2024-04-23 19:58:55 公開日:2024-04-20

# タブラルデータに対する特徴空間属性を組み込んだモデルに基づく対実的説明

Model-Based Counterfactual Explanations Incorporating Feature Space Attributes for Tabular Data ( http://arxiv.org/abs/2404.13224v1 )

ライセンス: Link先を確認

Yuta Sumiya, Hayaru shouno,

(参考訳) 大規模なデータセットからパターンを正確に予測することが知られている機械学習モデルは、意思決定において極めて重要である。その結果、入力摂動を導入して予測を説明する反事実的説明手法が顕著になった。これらの混乱は、しばしば予測を変更する方法を示唆し、実行可能なレコメンデーションをもたらす。しかし、現在の手法では、各入力変更の最適化問題を解く必要があり、計算コストがかかる。さらに、従来の符号化手法は表データのカテゴリ変数の摂動に不適切に対処する。そこで本研究では,正規化フローを用いた効率的な対実的説明法であるFastDCFlowを提案する。提案手法は, 複雑なデータ分布を捕捉し, 近接性を保持する有意義な潜在空間を学習し, 予測を改善する。分類変数に対しては、順序関係を尊重し、摂動コストを含むTargetEncodingを採用しました。提案手法は, 既存の手法を複数の指標で比較し, 対実的説明のためのトレードオフのバランスを崩した。ソースコードは以下のリポジトリで入手できる。

Machine-learning models, which are known to accurately predict patterns from large datasets, are crucial in decision making. Consequently, counterfactual explanations-methods explaining predictions by introducing input perturbations-have become prominent. These perturbations often suggest ways to alter the predictions, leading to actionable recommendations. However, the current techniques require resolving the optimization problems for each input change, rendering them computationally expensive. In addition, traditional encoding methods inadequately address the perturbations of categorical variables in tabular data. Thus, this study propose FastDCFlow, an efficient counterfactual explanation method using normalizing flows. The proposed method captures complex data distributions, learns meaningful latent spaces that retain proximity, and improves predictions. For categorical variables, we employed TargetEncoding, which respects ordinal relationships and includes perturbation costs. The proposed method outperformed existing methods in multiple metrics, striking a balance between trade offs for counterfactual explanations. The source code is available in the following repository: https://github.com/sumugit/FastDCFlow.

翻訳日:2024-04-23 19:58:55 公開日:2024-04-20

# 量子計算における誤差補正アルゴリズムのためのナノメカニカルアンシラ量子ビット生成器

Nanomechanical ancilla qubits generator for error correction algorithms in quantum computation ( http://arxiv.org/abs/2404.13234v1 )

ライセンス: Link先を確認

Danko Radić, Leonid Y. Gorelik, Sergei I. Kulinich, Robert I. Shekhter,

(参考訳) 本稿では,3ビットフリップ符号のエンコーダとして実証された,量子コンピューティングにおける誤り訂正アルゴリズムに対して,適切に絡み合った「アンシラ」量子ビットを生成するナノエレクトロメカニカルセットアップを提案する。このセットアップは、電圧バイアス超伝導電極とクーパー対箱の状態で機械的に振動するメソスコピック超伝導粒との間の交流ジョセフソン効果を利用して、ゲート電圧によって制御されるメソスコピック端子に基づいている。要求された機能は、特に2つの外部パラメータ(バイアス電圧とゲート電圧)を操作するための時間プロトコールによって達成される。超電導穀物は、カンチレバーのフリーエンドに固定され、制御された機内機械振動を行い、2つの垂直空間方向に一対の絡み合った猫状態に組織されたナノメカニカルコヒーレント状態を生成する。クーパー対箱とナノメカニカルコヒーレント状態は、特定の方法で3つの絡み合った量子ビットとなる: 最初はクーパー対箱状態の重ね合わせでエンコードされた量子情報は、2つの特別な3つの四角い状態、$\vert \uparrow + \, + \rangle$ と $\vert \downarrow - \, - \rangle$ の量子重ね合わせに変換される。これは3ビットビットフリップ符号の基本入力状態を構成し、主に量子計算でエラー訂正に使用され、ナノエレクトロメカニクスによって最後の2つのアンシラ量子ビットが生成される1つの物理オブジェクトに「インストール」される。

We suggest a nanoelectromechanical setup that generates properly entangled ancillary ("ancilla") qubits for error correction algorithms in quantum computing, demonstrated as an encoder for the three-qubit bit flip code. The setup is based on mesoscopic terminal utilizing the AC Josephson effect between voltage biased superconducting electrodes and mechanically vibrating mesoscopic superconducting grain in the regime of the Cooper pair box, controlled by the gate voltage. Required functionality is achieved by specifically tailored time-protocol of operating two external parameters: bias voltage and gate voltage. The superconducting grain is fixed on the free end of a cantilever, performing controlled in-plane mechanical vibrations, generating the nanomechanical coherent states organised in a pair of entangled cat-states in two perpendicular spatial directions. Cooper pair box and nanomechanical coherent states become three entangled qubits in a particular way: quantum information, initially encoded in superposition of the Cooper pair box states, is transduced into quantum superposition of two special 3-qubit entangled states, $\vert \uparrow + \, + \rangle$ and $\vert \downarrow - \, - \rangle$. It constitutes the basic input state for the three-qubit bit flip code, used in quantum computation mainly for error correction, "installed" on a single physical object in which the last two ancilla qubits are generated by the nanoelectromechanical setup.

翻訳日:2024-04-23 19:58:55 公開日:2024-04-20

# TrialDura: 解釈可能な治験期間予測のための階層的注意変換器

TrialDura: Hierarchical Attention Transformer for Interpretable Clinical Trial Duration Prediction ( http://arxiv.org/abs/2404.13235v1 )

ライセンス: Link先を確認

Ling Yue, Jonathan Li, Md Zabirul Islam, Bolun Xia, Tianfan Fu, Jintai Chen,

(参考訳) 臨床試験プロセスは、薬物開発としても知られ、新しい治療法の開発に欠かせないステップである。介入臨床試験の主な目的は、人体における特定の疾患の治療における薬物ベースの治療の安全性と効果を評価することである。しかし、臨床試験は長く、労働集約的で、費用がかかる。臨床試験の期間は、全体的な費用に影響を与える重要な要因である。したがって、臨床試験のスケジュールを効果的に管理することは、予算の制御と研究の経済的可能性の最大化に不可欠である。この問題に対処するために、病気名、薬物分子、試験段階、資格基準を含む多モードデータを用いて臨床試験期間を推定する機械学習ベースのTrialDuraを提案する。次に,臨床実験データのより深く,より関連性の高い意味的理解を提供するために,バイオメディカルコンテキスト用に特別に調整されたBio-BERT埋め込みにエンコードする。最後に、モデルの階層的な注意機構は、すべての埋め込みを繋ぎ、それらの相互作用を捉え、臨床試験期間を予測する。提案モデルでは, 平均絶対誤差(MAE)が1.04年, 根平均二乗誤差(RMSE)が1.39年であった。公開されているコードはhttps://anonymous.4open.science/r/TrialDura-F196で見ることができる。

The clinical trial process, also known as drug development, is an indispensable step toward the development of new treatments. The major objective of interventional clinical trials is to assess the safety and effectiveness of drug-based treatment in treating certain diseases in the human body. However, clinical trials are lengthy, labor-intensive, and costly. The duration of a clinical trial is a crucial factor that influences overall expenses. Therefore, effective management of the timeline of a clinical trial is essential for controlling the budget and maximizing the economic viability of the research. To address this issue, We propose TrialDura, a machine learning-based method that estimates the duration of clinical trials using multimodal data, including disease names, drug molecules, trial phases, and eligibility criteria. Then, we encode them into Bio-BERT embeddings specifically tuned for biomedical contexts to provide a deeper and more relevant semantic understanding of clinical trial data. Finally, the model's hierarchical attention mechanism connects all of the embeddings to capture their interactions and predict clinical trial duration. Our proposed model demonstrated superior performance with a mean absolute error (MAE) of 1.04 years and a root mean square error (RMSE) of 1.39 years compared to the other models, indicating more accurate clinical trial duration prediction. Publicly available code can be found at https://anonymous.4open.science/r/TrialDura-F196

翻訳日:2024-04-23 19:49:10 公開日:2024-04-20

# PAFedFV:指静脈認識のための個人化・非同期フェデレーション学習

PAFedFV: Personalized and Asynchronous Federated Learning for Finger Vein Recognition ( http://arxiv.org/abs/2404.13237v1 )

ライセンス: Link先を確認

Hengyu Mu, Jian Guo, Chong Han, Lijuan Sun,

(参考訳) ユーザのプライバシ保護に重点が置かれているため、フェデレートラーニングに基づく生体認証が最新の研究ホットスポットとなっている。しかし,従来のフェデレーション学習法は,データの不均一性やオープンセット検証のため,指静脈認識に直接適用できない。そのため、いくつかの応用例が提案されているのみである。これらの方法には相変わらず2つの欠点がある。 1) 指静脈データは非常に均一であり,非独立に非独立に分布する(非IID)ため,一部のクライアントでは一様モデルでは性能が低下する。 2)個々のクライアントでは、サーバからモデルを返すのを待つ時間など、大量の時間が使われません。このような問題に対処するため,本研究では,Personalized and Asynchronous Federated Learning for Finger Vein Recognition (PAFedFV) フレームワークを提案する。 PAFedFVは、非IIDデータの不均一性を解決するために、パーソナライズされたモデルアグリゲーション法を設計する。一方、クライアントが待機時間を利用するために、非同期のトレーニングモジュールを使用している。最後に、6つの指静脈データセットに関する広範な実験を行った。これらの実験結果に基づいて, フェデレート学習における非IID指静脈データの影響を解析し, PAFedFVの精度およびロバスト性における優位性を実証した。

With the increasing emphasis on user privacy protection, biometric recognition based on federated learning have become the latest research hotspot. However, traditional federated learning methods cannot be directly applied to finger vein recognition, due to heterogeneity of data and open-set verification. Therefore, only a few application cases have been proposed. And these methods still have two drawbacks. (1) Uniform model results in poor performance in some clients, as the finger vein data is highly heterogeneous and non-Independently Identically Distributed (non-IID). (2) On individual client, a large amount of time is underutilized, such as the time to wait for returning model from server. To address those problems, this paper proposes a Personalized and Asynchronous Federated Learning for Finger Vein Recognition (PAFedFV) framework. PAFedFV designs personalized model aggregation method to solve the heterogeneity among non-IID data. Meanwhile, it employs an asynchronized training module for clients to utilize their waiting time. Finally, extensive experiments on six finger vein datasets are conducted. Base on these experiment results, the impact of non-IID finger vein data on performance of federated learning are analyzed, and the superiority of PAFedFV in accuracy and robustness are demonstrated.

翻訳日:2024-04-23 19:49:10 公開日:2024-04-20

# 大規模言語モデルのためのパーソナライズされた無線フェデレーション学習

Personalized Wireless Federated Learning for Large Language Models ( http://arxiv.org/abs/2404.13238v1 )

ライセンス: Link先を確認

Feibo Jiang, Li Dong, Siwei Tu, Yubo Peng, Kezhi Wang, Kun Yang, Cunhua Pan, Dusit Niyato,

(参考訳) 大規模言語モデル(LLM)は自然言語処理タスクに革命をもたらした。しかしながら、無線ネットワークへの展開は、プライバシとセキュリティ保護機構の欠如など、依然として課題に直面している。フェデレートラーニング(FL)は、これらの課題に対処するための有望なアプローチとして登場した。しかし、大きなデータと不均一なデータの非効率な処理、リソース集約的なトレーニング、高い通信オーバーヘッドといった問題に悩まされている。これらの課題に対処するために、まず、無線ネットワークにおけるLLMの異なる学習段階と特徴を比較した。次に、コミュニケーションオーバーヘッドの少ない2つのパーソナライズされたワイヤレスフェデレーション微調整手法、すなわち、強化学習を利用してパーソナライズを実現するローカルLLMをパーソナライズするパーソナライズドフェデレーション微調整法(PFIT)、グローバルアダプタとローカルローランド適応(LoRA)を活用してローカルLoRAをアグリゲーションなしでパーソナライズできるパーソナライズされたフェデレーションタスク微調整法(PFTT)を導入する。最後に,提案手法の有効性を実証するためにシミュレーションを行い,オープンな問題を包括的に議論する。

Large Language Models (LLMs) have revolutionized natural language processing tasks. However, their deployment in wireless networks still face challenges, i.e., a lack of privacy and security protection mechanisms. Federated Learning (FL) has emerged as a promising approach to address these challenges. Yet, it suffers from issues including inefficient handling with big and heterogeneous data, resource-intensive training, and high communication overhead. To tackle these issues, we first compare different learning stages and their features of LLMs in wireless networks. Next, we introduce two personalized wireless federated fine-tuning methods with low communication overhead, i.e., (1) Personalized Federated Instruction Tuning (PFIT), which employs reinforcement learning to fine-tune local LLMs with diverse reward models to achieve personalization; (2) Personalized Federated Task Tuning (PFTT), which can leverage global adapters and local Low-Rank Adaptations (LoRA) to collaboratively fine-tune local LLMs, where the local LoRAs can be applied to achieve personalization without aggregation. Finally, we perform simulations to demonstrate the effectiveness of the proposed two methods and comprehensively discuss open issues.

翻訳日:2024-04-23 19:49:10 公開日:2024-04-20

# 医用画像セグメンテーションのためのPixel-Wiseスーパービジョンを超えて:従来のモデルから基礎モデルへ

Beyond Pixel-Wise Supervision for Medical Image Segmentation: From Traditional Models to Foundation Models ( http://arxiv.org/abs/2404.13239v1 )

ライセンス: Link先を確認

Yuyan Shi, Jialu Ma, Jin Yang, Shasha Wang, Yichi Zhang,

(参考訳) 医用画像のセグメンテーションは多くの画像誘導臨床アプローチにおいて重要な役割を担っている。しかし、既存のセグメンテーションアルゴリズムは、特に専門家だけが信頼性と正確なアノテーションを提供することができる医療画像領域において、主に、労働集約的かつ専門的要求の両方が可能な、訓練用のピクセル単位のアノテーション付き完全注釈画像の可用性に依存している。この課題を軽減するため、画像レベル、バウンディングボックス、スクリブル、ポイントなどの弱いアノテーションでディープモデルをトレーニングできるセグメンテーション手法の開発に注力している。視覚基盤モデルの出現、特にSAM(Segment Anything Model)は、大規模な事前学習によって可能となる、迅速なセグメンテーションのための弱いアノテーションを使ったセグメンテーションタスクの革新的な機能を導入している。従来の学習手法とともに基礎モデルを採用することで、近年の関心調査コミュニティが増加し、現実世界の応用の可能性を示している。本稿では,弱アノテーションを用いた医用画像セグメンテーションにおけるアノテーション効率学習における基礎モデル導入前後の最近の進歩を包括的に調査する。さらに,既存のアプローチの課題を分析,議論し,基礎モデルの軌跡を形作り,医用画像セグメンテーションの分野をさらに進めるための貴重なガイダンスを提供すると信じている。

Medical image segmentation plays an important role in many image-guided clinical approaches. However, existing segmentation algorithms mostly rely on the availability of fully annotated images with pixel-wise annotations for training, which can be both labor-intensive and expertise-demanding, especially in the medical imaging domain where only experts can provide reliable and accurate annotations. To alleviate this challenge, there has been a growing focus on developing segmentation methods that can train deep models with weak annotations, such as image-level, bounding boxes, scribbles, and points. The emergence of vision foundation models, notably the Segment Anything Model (SAM), has introduced innovative capabilities for segmentation tasks using weak annotations for promptable segmentation enabled by large-scale pre-training. Adopting foundation models together with traditional learning methods has increasingly gained recent interest research community and shown potential for real-world applications. In this paper, we present a comprehensive survey of recent progress on annotation-efficient learning for medical image segmentation utilizing weak annotations before and in the era of foundation models. Furthermore, we analyze and discuss several challenges of existing approaches, which we believe will provide valuable guidance for shaping the trajectory of foundational models to further advance the field of medical image segmentation.

翻訳日:2024-04-23 19:49:10 公開日:2024-04-20

# 2つの市場におけるラミフィケーションを用いた逆因果戦略環境の学習

Learning In Reverse Causal Strategic Environments With Ramifications on Two Sided Markets ( http://arxiv.org/abs/2404.13240v1 )

ライセンス: Link先を確認

Seamus Somerstep, Yuekai Sun, Ya'acov Ritov,

(参考訳) 労働市場の均衡モデルによって動機づけられた我々は、戦略エージェントが直接成果を操作できる因果戦略分類の定式化を開発する。応用として、労働力の戦略的対応を期待する雇用主と、そうでない雇用主を比較する。我々は,雇用者報酬,労働力のスキルレベル,場合によっては労働力のエクイティを改善するために,適度に最適な雇用政策を持つ雇用者が,雇用者報酬を改善するという理論と実験の組み合わせを提示する。一方,作業従事者は労働力の効用を害し,他の事例では差別を防げないことを示す。

Motivated by equilibrium models of labor markets, we develop a formulation of causal strategic classification in which strategic agents can directly manipulate their outcomes. As an application, we compare employers that anticipate the strategic response of a labor force with employers that do not. We show through a combination of theory and experiment that employers with performatively optimal hiring policies improve employer reward, labor force skill level, and in some cases labor force equity. On the other hand, we demonstrate that performative employers harm labor force utility and fail to prevent discrimination in other cases.

翻訳日:2024-04-23 19:49:10 公開日:2024-04-20

# オークション型フェデレーション学習のための知的エージェント:調査

Intelligent Agents for Auction-based Federated Learning: A Survey ( http://arxiv.org/abs/2404.13244v1 )

ライセンス: Link先を確認

Xiaoli Tang, Han Yu, Xiaoxiao Li, Sarit Kraus,

(参考訳) オークションベースのフェデレーション・ラーニング(AFL)はFLインセンティブ・メカニズム設計の重要な分野であり、高品質なデータ・オーナーがデータ・コンシューマー(すなわちサーバ)のFLトレーニング・タスクに参加することを公平かつ効率的に動機付ける能力がある。利害関係者(すなわちデータ消費者、データ所有者、オークション業者)に対するAFL意思決定支援の効率を高めるために、インテリジェントエージェントベースの技術が出現した。しかし、この分野の非常に学際的な性質と、アクセス可能な視点を提供する総合的な調査が欠如していることから、研究者がこの分野に参入して貢献することは困難である。本稿では,AI-AFL(Intelligent Agents for AFL)文献に関する第1回調査を通じて,この重要なギャップを埋める。既存のIA-AFLの動作を整理する独自の多層分類法を提案する。 1)利害関係者は, 2 競売の仕組みが採用され、及び 3)エージェントの目的は,読者にこの分野に対する多視点的な視点を提供することである。さらに,既存手法の限界を分析し,広く採用されている性能評価指標を要約し,IA-AFLエコシステムにおける効果的かつ効率的な利害関係者主導の意思決定支援に向けた将来的な方向性について議論する。

Auction-based federated learning (AFL) is an important emerging category of FL incentive mechanism design, due to its ability to fairly and efficiently motivate high-quality data owners to join data consumers' (i.e., servers') FL training tasks. To enhance the efficiency in AFL decision support for stakeholders (i.e., data consumers, data owners, and the auctioneer), intelligent agent-based techniques have emerged. However, due to the highly interdisciplinary nature of this field and the lack of a comprehensive survey providing an accessible perspective, it is a challenge for researchers to enter and contribute to this field. This paper bridges this important gap by providing a first-of-its-kind survey on the Intelligent Agents for AFL (IA-AFL) literature. We propose a unique multi-tiered taxonomy that organises existing IA-AFL works according to 1) the stakeholders served, 2) the auction mechanism adopted, and 3) the goals of the agents, to provide readers with a multi-perspective view into this field. In addition, we analyse the limitations of existing approaches, summarise the commonly adopted performance evaluation metrics, and discuss promising future directions leading towards effective and efficient stakeholder-oriented decision support in IA-AFL ecosystems.

翻訳日:2024-04-23 19:49:10 公開日:2024-04-20

# ISQA:科学要約のためのインフォームティブ・ファクチュアリティ・フィードバック

ISQA: Informative Factuality Feedback for Scientific Summarization ( http://arxiv.org/abs/2404.13246v1 )

ライセンス: Link先を確認

Zekai Li, Yanxia Qin, Qian Liu, Min-Yen Kan,

(参考訳) Informative Scientific Question-Answering (ISQA) feedback\footnote{Code is available at \url{https://github.com/lizekai-richard/isqa}}。要約の反復的精錬を通じて、科学的な要約の事実性を高めるために、文の基本的な理性を探究する。 ISQAは、肯定的なフィードバックで検証されたステートメントを補強し、否定的なフィードバックで不正なステートメントを修正するよう、要約エージェントに頼んで、これをきめ細かな方法で行う。以上の結果から,ISQAフィードバック機構は,複数の科学的データセットで評価されるように,要約タスクにおける各種オープンソースLCMの事実性を大幅に向上することが示された。

We propose Iterative Facuality Refining on Informative Scientific Question-Answering (ISQA) feedback\footnote{Code is available at \url{https://github.com/lizekai-richard/isqa}}, a method following human learning theories that employs model-generated feedback consisting of both positive and negative information. Through iterative refining of summaries, it probes for the underlying rationale of statements to enhance the factuality of scientific summarization. ISQA does this in a fine-grained manner by asking a summarization agent to reinforce validated statements in positive feedback and fix incorrect ones in negative feedback. Our findings demonstrate that the ISQA feedback mechanism significantly improves the factuality of various open-source LLMs on the summarization task, as evaluated across multiple scientific datasets.

翻訳日:2024-04-23 19:49:10 公開日:2024-04-20

# ハイパースペクトル画像分類のための3次元畳み込み誘導スペクトル空間変換器

3D-Convolution Guided Spectral-Spatial Transformer for Hyperspectral Image Classification ( http://arxiv.org/abs/2404.13252v1 )

ライセンス: Link先を確認

Shyam Varahagiri, Aryaman Sinha, Shiv Ram Dubey, Satish Kumar Singh,

(参考訳) 近年、ビジョントランスフォーマー(ViT)は、自己認識機構のため、畳み込みニューラルネットワーク(CNN)よりも有望な分類性能を示している。多くの研究者がハイパースペクトル画像(HSI)分類にViTを組み込んでいる。 HSIは狭いスペクトル帯域によって特徴づけられ、豊富なスペクトルデータを提供する。 ViTはシーケンシャルなデータを扱うが、CNNのようなスペクトル空間情報を抽出することはできない。さらに、高い分類性能を持つためには、HSIトークンとクラス(CLS)トークンの間に強い相互作用がある必要がある。これらの問題を解決するために、3D-Convolution Guided Residual Module (CGRM) を用いたHSI分類のための3D-Convolution Guided Spectral-Spatial Transformer (3D-ConvSST)を提案する。さらに、クラストークンを前もってGlobal Average Poolingを適用し、より差別的で関連する高レベルな特徴を効果的にコード化します。 3つの公開HSIデータセットを用いて、最先端の伝統、畳み込み、トランスフォーマーモデルよりも提案モデルの方が優れていることを示す大規模な実験が行われた。コードはhttps://github.com/ShyamVarahagiri/3D-ConvSSTで公開されている。

In recent years, Vision Transformers (ViTs) have shown promising classification performance over Convolutional Neural Networks (CNNs) due to their self-attention mechanism. Many researchers have incorporated ViTs for Hyperspectral Image (HSI) classification. HSIs are characterised by narrow contiguous spectral bands, providing rich spectral data. Although ViTs excel with sequential data, they cannot extract spectral-spatial information like CNNs. Furthermore, to have high classification performance, there should be a strong interaction between the HSI token and the class (CLS) token. To solve these issues, we propose a 3D-Convolution guided Spectral-Spatial Transformer (3D-ConvSST) for HSI classification that utilizes a 3D-Convolution Guided Residual Module (CGRM) in-between encoders to "fuse" the local spatial and spectral information and to enhance the feature propagation. Furthermore, we forego the class token and instead apply Global Average Pooling, which effectively encodes more discriminative and pertinent high-level features for classification. Extensive experiments have been conducted on three public HSI datasets to show the superiority of the proposed model over state-of-the-art traditional, convolutional, and Transformer models. The code is available at https://github.com/ShyamVarahagiri/3D-ConvSST.

翻訳日:2024-04-23 19:49:10 公開日:2024-04-20

# ST-SSM:交通予測のための空間空間モデルの選択状態

ST-SSMs: Spatial-Temporal Selective State of Space Model for Traffic Forecasting ( http://arxiv.org/abs/2404.13257v1 )

ライセンス: Link先を確認

Zhiqi Shao, Michael G. H. Bell, Ze Wang, D. Glenn Geers, Haoning Xi, Junbin Gao,

(参考訳) 正確な効率的な交通予測は、インテリジェント交通システムの計画、管理、制御に不可欠である。交通予測の最先端手法の多くは、時空間ニューラルネットワークを予測モデルとして用い、トランスフォーマーとともに予測対象(道路セグメントの交通状況など)のグローバルな情報を学ぶことによって、長期と短期の両方を効果的に予測する。しかし、これらの手法は優れた性能を得るのに高い計算コストがかかることが多い。本稿では,新しいST-Mambaブロックを特徴とする交通流予測モデルST-SSM(Spatial-Temporal Selective State Space Model)を提案する。比較分析ではST-マンバ層の効率が強調され、3つの注意層に等価であるが、処理時間が大幅に短縮された。多様な実世界のデータセットの厳密なテストを通じて、ST-SSMsモデルは予測精度と計算の単純さを例外的に改善し、トラフィックフロー予測領域に新しいベンチマークを設定する。

Accurate and efficient traffic prediction is crucial for planning, management, and control of intelligent transportation systems. Most state-of-the-art methods for traffic prediction effectively predict both long-term and short-term by employing spatio-temporal neural networks as prediction models, together with transformers to learn global information on prediction objects (e.g., traffic states of road segments). However, these methods often have a high computational cost to obtain good performance. This paper introduces an innovative approach to traffic flow prediction, the Spatial-Temporal Selective State Space Model (ST-SSMs), featuring the novel ST-Mamba block, which can achieve good prediction accuracy with less computational cost. A comparative analysis highlights the ST-Mamba layer's efficiency, revealing its equivalence to three attention layers, yet with markedly reduced processing time. Through rigorous testing on diverse real-world datasets, the ST-SSMs model demonstrates exceptional improvements in prediction accuracy and computational simplicity, setting new benchmarks in the domain of traffic flow forecasting

翻訳日:2024-04-23 19:49:10 公開日:2024-04-20

# インカムと健康因子の機械学習分析による糖尿病予測

Predicting Diabetes with Machine Learning Analysis of Income and Health Factors ( http://arxiv.org/abs/2404.13260v1 )

ライセンス: Link先を確認

Fariba Jafari Horestani, M. Mehdi Owrang O,

(参考訳) 本研究では,糖尿病と健康指標の複雑な関係について検討し,新たな収入の変動に着目した。 2015年行動危険因子監視システム(BRFSS)のデータを利用して、血圧、コレステロール、BMI、喫煙習慣など様々な要因が糖尿病の流行に与える影響を分析する。包括的分析は,それぞれの要因を分離するだけでなく,その相互依存性や糖尿病に対する集団的影響も調べる。我々の研究の新たな側面は、糖尿病リスクの決定要因としての収入の検証である。我々は、社会経済的地位と糖尿病の間の複雑な相互作用を解明するために、統計的および機械学習技術を使用し、経済的幸福が健康にどのように影響するかの新しい洞察を提供する。我々の研究は、低所得のブラケットが糖尿病の発生率に結びついている、明らかな傾向を明らかにした。健康因子とライフスタイルの選択を含む33変数の混合分析において,高血圧,高コレステロール,コレステロールチェック,所得,BMIなどの特徴が重要であることを確認した。これらの要素は、糖尿病の流行と管理において重要な役割を担っていることが示唆されている。

In this study, we delve into the intricate relationships between diabetes and a range of health indicators, with a particular focus on the newly added variable of income. Utilizing data from the 2015 Behavioral Risk Factor Surveillance System (BRFSS), we analyze the impact of various factors such as blood pressure, cholesterol, BMI, smoking habits, and more on the prevalence of diabetes. Our comprehensive analysis not only investigates each factor in isolation but also explores their interdependencies and collective influence on diabetes. A novel aspect of our research is the examination of income as a determinant of diabetes risk, which to the best of our knowledge has been relatively underexplored in previous studies. We employ statistical and machine learning techniques to unravel the complex interplay between socio-economic status and diabetes, providing new insights into how financial well-being influences health outcomes. Our research reveals a discernible trend where lower income brackets are associated with a higher incidence of diabetes. In analyzing a blend of 33 variables, including health factors and lifestyle choices, we identified that features such as high blood pressure, high cholesterol, cholesterol checks, income, and Body Mass Index (BMI) are of considerable significance. These elements stand out among the myriad of factors examined, suggesting that they play a pivotal role in the prevalence and management of diabetes.

翻訳日:2024-04-23 19:49:10 公開日:2024-04-20

# FilterPrompt: 拡散モデルにおける画像転送の誘導

FilterPrompt: Guiding Image Transfer in Diffusion Models ( http://arxiv.org/abs/2404.13263v1 )

ライセンス: Link先を確認

Xi Wang, Yichen Peng, Heng Fang, Haoran Xie, Xi Yang, Chuntao Li,

(参考訳) 制御可能な生成タスクでは、生成した画像を柔軟に操作し、単一の入力画像キューに基づいて所望の外観や構造を達成できる。これを実現するには、入力画像データ内のキー属性を効果的に分離し、表現を正確に取得する必要がある。以前の研究では、主に特徴空間内の画像属性の分離に焦点が当てられていた。しかし、実世界のデータに存在する複雑な分布は、そのようなデカップリングアルゴリズムを他のデータセットに適用することを難しくすることが多い。さらに、機能符号化に対する制御の粒度は、特定のタスク要求を満たすのにしばしば失敗する。様々な生成モデルの特性を精査すると,拡散モデルの入力感度と動的進化特性は,画素空間における明示的な分解操作と効果的に融合できることがわかった。これにより、入力画像の特定の特徴分布に対して画素空間で実行される画像処理操作が可能となり、生成した結果において所望の制御効果が得られる。そこで本研究では,モデル制御効果を高めるためのFilterPromptを提案する。任意の拡散モデルに普遍的に適用可能であり、ユーザーはタスク要求に応じて特定の画像特徴の表現を調整でき、より正確で制御可能な生成結果を容易にすることができる。特に,我々の設計した実験では,FilterPromptが特徴相関を最適化し,生成プロセス中のコンテント競合を緩和し,モデルの制御能力を向上することを示した。

In controllable generation tasks, flexibly manipulating the generated images to attain a desired appearance or structure based on a single input image cue remains a critical and longstanding challenge. Achieving this requires the effective decoupling of key attributes within the input image data, aiming to get representations accurately. Previous research has predominantly concentrated on disentangling image attributes within feature space. However, the complex distribution present in real-world data often makes the application of such decoupling algorithms to other datasets challenging. Moreover, the granularity of control over feature encoding frequently fails to meet specific task requirements. Upon scrutinizing the characteristics of various generative models, we have observed that the input sensitivity and dynamic evolution properties of the diffusion model can be effectively fused with the explicit decomposition operation in pixel space. This integration enables the image processing operations performed in pixel space for a specific feature distribution of the input image, and can achieve the desired control effect in the generated results. Therefore, we propose FilterPrompt, an approach to enhance the model control effect. It can be universally applied to any diffusion model, allowing users to adjust the representation of specific image features in accordance with task requirements, thereby facilitating more precise and controllable generation outcomes. In particular, our designed experiments demonstrate that the FilterPrompt optimizes feature correlation, mitigates content conflicts during the generation process, and enhances the model's control capability.

翻訳日:2024-04-23 19:49:10 公開日:2024-04-20

# F5Cファインダー:mRNA上の5-ホルミルシチジン修飾を予測するための説明可能な生物学的言語モデル

F5C-finder: An Explainable and Ensemble Biological Language Model for Predicting 5-Formylcytidine Modifications on mRNA ( http://arxiv.org/abs/2404.13265v1 )

ライセンス: Link先を確認

Guohao Wang, Ting Liu, Hongqiang Lyu, Ze Liu,

(参考訳) 5-ホルミルシチジン(5-formylcytidine, 5-formylcytidine, 5-formylcytidine, 5-formylcytidine, 5-formylcytidine, 5-formylcytidine, 5-formylcytidine)は、様々な生物学的過程において重要である。しかし、従来のf5C検出のための実験的手法は、しばしば手間がかかり、時間を要するため、f5Cのサイトを包括的に転写酵素にマッピングする能力は制限される。計算手法はコスト効率と高スループットの代替手段を提供するが、f5Cの認識モデルは開発されていない。自然言語処理における言語モデルからインスピレーションを得て,f5Cの同定にマルチヘッドアテンションを用いたアンサンブルニューラルネットワークモデルであるf5Cファインダーを提案する。 5つの異なる特徴抽出法を用いて、5つの個別のニューラルネットワークを構築し、これらのネットワークはその後、アンサンブル学習を通じて統合され、f5Cファインダーを生成する。 10倍のクロスバリデーションと独立試験により, AUCが0.807, 0.827で, f5CファインダーがSOTA(State-of-the-art)性能を達成した。この結果は、ゲノム内の順序(順序)と機能的意味(意味)の両方をキャプチャする生物学的言語モデルの有効性を強調している。さらに、組み込まれた解釈可能性により、モデルが何を学習しているかを理解することができ、キーシーケンシャルな要素の識別と、それらの生物学的機能のより深い探索の間に橋渡しができる。

As a prevalent and dynamically regulated epigenetic modification, 5-formylcytidine (f5C) is crucial in various biological processes. However, traditional experimental methods for f5C detection are often laborious and time-consuming, limiting their ability to map f5C sites across the transcriptome comprehensively. While computational approaches offer a cost-effective and high-throughput alternative, no recognition model for f5C has been developed to date. Drawing inspiration from language models in natural language processing, this study presents f5C-finder, an ensemble neural network-based model utilizing multi-head attention for the identification of f5C. Five distinct feature extraction methods were employed to construct five individual artificial neural networks, and these networks were subsequently integrated through ensemble learning to create f5C-finder. 10-fold cross-validation and independent tests demonstrate that f5C-finder achieves state-of-the-art (SOTA) performance with AUC of 0.807 and 0.827, respectively. The result highlights the effectiveness of biological language model in capturing both the order (sequential) and functional meaning (semantics) within genomes. Furthermore, the built-in interpretability allows us to understand what the model is learning, creating a bridge between identifying key sequential elements and a deeper exploration of their biological functions.

翻訳日:2024-04-23 19:49:10 公開日:2024-04-20

# 表構造と文字認識のためのマルチセルデコーダと相互学習

Multi-Cell Decoder and Mutual Learning for Table Structure and Character Recognition ( http://arxiv.org/abs/2404.13268v1 )

ライセンス: Link先を確認

Takaya Kawakatsu,

(参考訳) 学術論文や財務報告などの文書から表の内容を取り出し,それを大規模言語モデルで処理可能な形式に変換することは,知識情報処理において重要な課題である。テーブル構造だけでなくセル内容も認識するエンドツーエンドアプローチは、外部文字認識システムを用いた最先端モデルに匹敵する性能を達成し、さらなる改善の可能性を秘めている。さらに、これらのモデルでは、数百セルの長いテーブルを局所的な注意を払って認識できるようになった。しかし、モデルでは、ヘッダーからフッタへの1方向のテーブル構造を認識し、各セルごとにセル内容の認識を行うため、近隣セルから有用な情報を検索する機会はない。本稿では,エンド・ツー・エンドアプローチを改善するために,マルチセルコンテンツデコーダと双方向相互学習機構を提案する。この効果は2つの大きなデータセットで実証され、実験結果は、多数のセルを持つ長いテーブルであっても、最先端のモデルに匹敵する性能を示す。

Extracting table contents from documents such as scientific papers and financial reports and converting them into a format that can be processed by large language models is an important task in knowledge information processing. End-to-end approaches, which recognize not only table structure but also cell contents, achieved performance comparable to state-of-the-art models using external character recognition systems, and have potential for further improvements. In addition, these models can now recognize long tables with hundreds of cells by introducing local attention. However, the models recognize table structure in one direction from the header to the footer, and cell content recognition is performed independently for each cell, so there is no opportunity to retrieve useful information from the neighbor cells. In this paper, we propose a multi-cell content decoder and bidirectional mutual learning mechanism to improve the end-to-end approach. The effectiveness is demonstrated on two large datasets, and the experimental results show comparable performance to state-of-the-art models, even for long tables with large numbers of cells.

翻訳日:2024-04-23 19:49:10 公開日:2024-04-20

# 非定常雑音下での確率的誤差キャンセルの改善

Improving probabilistic error cancellation in the presence of non-stationary noise ( http://arxiv.org/abs/2404.13269v1 )

ライセンス: Link先を確認

Samudra Dasgupta, Travis S. Humble,

(参考訳) 非定常雑音の存在下での確率的誤差キャンセル(PEC)結果の安定性について検討する。ベイズ法を利用して,PECの安定性と精度を向上させる戦略を設計する。我々は,Bernstein-Vazirani アルゴリズムを5ビット実装し,ibm_kolkata デバイス上で行った実験により,非適応型 PEC と比較して精度が 42% 向上し,安定性が60% 向上したことを明らかにした。これらの結果は,PECの活用に不可欠である非定常雑音に効果的に対処するための適応推定プロセスの重要性を浮き彫りにした。

We investigate the stability of probabilistic error cancellation (PEC) outcomes in the presence of non-stationary noise, which is an obstacle to achieving accurate observable estimates. Leveraging Bayesian methods, we design a strategy to enhance PEC stability and accuracy. Our experiments using a 5-qubit implementation of the Bernstein-Vazirani algorithm and conducted on the ibm_kolkata device reveal a 42% improvement in accuracy and a 60% enhancement in stability compared to non-adaptive PEC. These results underscore the importance of adaptive estimation processes to effectively address non-stationary noise, vital for advancing PEC utility.

翻訳日:2024-04-23 19:49:10 公開日:2024-04-20

# StrideNET:動的粗さ抽出による地形認識のためのスイニングトランス

StrideNET: Swin Transformer for Terrain Recognition with Dynamic Roughness Extraction ( http://arxiv.org/abs/2404.13270v1 )

ライセンス: Link先を確認

Maitreya Shelare, Neha Shigvan, Atharva Satam, Poonam Sonar,

(参考訳) 深層学習の進歩は、リモートセンシング画像の分類に革命をもたらしている。トランスフォーマーベースのアーキテクチャは、自己認識機構を利用して、画像内のグローバルな関係とともに、長距離依存関係のキャプチャを可能にする、従来の畳み込み手法に代わるものとして登場した。そこで本研究では,地形認識と暗黙的特性推定のために設計された新しいデュアルブランチアーキテクチャであるStrideNETを提案する。地形認識部はSwin Transformerを利用して、その階層的表現と低計算コストを活用し、局所的特徴とグローバル的特徴の両方を効率的に捉える。地形特性分枝は, 統計的テクスチャ解析法を用いて, 粗さやすべり性などの表面特性の抽出に重点を置いている。地形特性の計算により、環境認識の強化が可能である。 StrideNETモデルは、Grassy、Marshy、Sandy、Rockyの4つのターゲット地形クラスからなるデータセットでトレーニングされている。 StrideNETは、現代の方法と比較して競争力がある。この研究の意味は、環境モニタリング、土地利用と土地被覆分類(LULC)、災害対応、精密農業など、様々な応用にまで及んでいる。

Advancements in deep learning are revolutionizing the classification of remote-sensing images. Transformer-based architectures, utilizing self-attention mechanisms, have emerged as alternatives to conventional convolution methods, enabling the capture of long-range dependencies along with global relationships in the image. Motivated by these advancements, this paper presents StrideNET, a novel dual-branch architecture designed for terrain recognition and implicit properties estimation. The terrain recognition branch utilizes the Swin Transformer, leveraging its hierarchical representation and low computational cost to efficiently capture both local and global features. The terrain properties branch focuses on the extraction of surface properties such as roughness and slipperiness using a statistical texture analysis method. By computing surface terrain properties, an enhanced environmental perception can be obtained. The StrideNET model is trained on a dataset comprising four target terrain classes: Grassy, Marshy, Sandy, and Rocky. StrideNET attains competitive performance compared to contemporary methods. The implications of this work extend to various applications, including environmental monitoring, land use and land cover (LULC) classification, disaster response, precision agriculture, and much more.

翻訳日:2024-04-23 19:49:10 公開日:2024-04-20

# クロスマスク復元を用いた教師なし異常検出のための多機能再構成ネットワーク

Multi-feature Reconstruction Network using Crossed-mask Restoration for Unsupervised Anomaly Detection ( http://arxiv.org/abs/2404.13273v1 )

ライセンス: Link先を確認

Junpu Wang, Guili Xu, Chunlei Li, Guangshuai Gao, Yuehua Cheng,

(参考訳) 工業生産における品質検査において,正常試料のみを用いた無監督異常検出が重要である。既存の再構成手法は有望な結果を得たが、画像再構成における識別性に乏しい情報と、モデル過剰一般化能力による異常な再生という2つの問題に直面している。上記の課題を克服するために,画像再構成を並列特徴復元の組み合わせに変換し,マルチ機能再構成ネットワークであるMFRNetを提案する。具体的には、予め訓練されたモデルから入力画像のより識別的な階層的表現を生成するために、まずマルチスケール特徴集約器を開発した。その後、抽出した特徴マップをランダムにカバーするためにクロスマスクジェネレータを採用し、次いで、欠落した領域の高品質な修復のためのトランス構造に基づく復元ネットワークを構築する。最後に、ハイブリッド損失は、モデルトレーニングと異常推定をガイドし、画素と構造的類似性の両方を考慮している。大規模な実験により、我々の手法は4つの公開データセットと1つの自作データセットにおいて、他の最先端のデータセットと非常に競争的であるか、大幅に上回っていることが示された。

Unsupervised anomaly detection using only normal samples is of great significance for quality inspection in industrial manufacturing. Although existing reconstruction-based methods have achieved promising results, they still face two problems: poor distinguishable information in image reconstruction and well abnormal regeneration caused by model over-generalization ability. To overcome the above issues, we convert the image reconstruction into a combination of parallel feature restorations and propose a multi-feature reconstruction network, MFRNet, using crossed-mask restoration in this paper. Specifically, a multi-scale feature aggregator is first developed to generate more discriminative hierarchical representations of the input images from a pre-trained model. Subsequently, a crossed-mask generator is adopted to randomly cover the extracted feature map, followed by a restoration network based on the transformer structure for high-quality repair of the missing regions. Finally, a hybrid loss is equipped to guide model training and anomaly estimation, which gives consideration to both the pixel and structural similarity. Extensive experiments show that our method is highly competitive with or significantly outperforms other state-of-the-arts on four public available datasets and one self-made dataset.

翻訳日:2024-04-23 19:39:26 公開日:2024-04-20

# 拡張されたオブジェクトインテリジェンス:XRオブジェクトでアナログワールドを対話可能にする

Augmented Object Intelligence: Making the Analog World Interactable with XR-Objects ( http://arxiv.org/abs/2404.13274v1 )

ライセンス: Link先を確認

Mustafa Doga Dogan, Eric J. Gonzalez, Andrea Colaco, Karan Ahuja, Ruofei Du, Johnny Lee, Mar Gonzalez-Franco, David Kim,

(参考訳) 対話型デジタルエンティティとしての物理オブジェクトのシームレスな統合は、空間コンピューティングの課題である。本稿では,デジタルオブジェクトがデジタルであるかのように対話できる能力を備えた,デジタルオブジェクトと物理オブジェクトの境界線を曖昧にするために設計された,新しいXRインタラクションパラダイムであるAugmented Object Intelligence(AOI)を紹介する。提案手法では,オブジェクトのセグメンテーションと分類と,MLLM(Multimodal Large Language Models)のパワーを組み合わせることで,これらのインタラクションを容易にする。我々は,AOI の概念を XR-Objects というオープンソースのプロトタイプシステムで実装する。このシステムにより、アナログオブジェクトが情報を伝えるだけでなく、細部への問い合わせやタスクの実行といったデジタルアクションを開始することができる。 1)従来のAIアシスタントよりもAOIの概念を定義し、その利点を詳述し、(2)XR-Objectsシステムのオープンソース設計と実装を詳述し、(3)さまざまなユースケースとユーザスタディを通じてその汎用性を示す。

Seamless integration of physical objects as interactive digital entities remains a challenge for spatial computing. This paper introduces Augmented Object Intelligence (AOI), a novel XR interaction paradigm designed to blur the lines between digital and physical by endowing real-world objects with the ability to interact as if they were digital, where every object has the potential to serve as a portal to vast digital functionalities. Our approach utilizes object segmentation and classification, combined with the power of Multimodal Large Language Models (MLLMs), to facilitate these interactions. We implement the AOI concept in the form of XR-Objects, an open-source prototype system that provides a platform for users to engage with their physical environment in rich and contextually relevant ways. This system enables analog objects to not only convey information but also to initiate digital actions, such as querying for details or executing tasks. Our contributions are threefold: (1) we define the AOI concept and detail its advantages over traditional AI assistants, (2) detail the XR-Objects system's open-source design and implementation, and (3) show its versatility through a variety of use cases and a user study.

翻訳日:2024-04-23 19:39:25 公開日:2024-04-20

# スコア変更を超えて:2つの観点からの非参照画像品質評価に対する敵対的攻撃

Beyond Score Changes: Adversarial Attack on No-Reference Image Quality Assessment from Two Perspectives ( http://arxiv.org/abs/2404.13277v1 )

ライセンス: Link先を確認

Chenxi Yang, Yujia Liu, Dingquan Li, Yan Zhong, Tingting Jiang,

(参考訳) ディープニューラルネットワークは、NR-IQA(No-Reference Image Quality Assessment)において驚くべき成功を収めている。しかし、最近の研究は、NR-IQAモデルが微妙な敵の摂動に対して脆弱であることを強調し、モデル予測と主観的評価の不整合をもたらす。しかし、現在の敵対的攻撃は、個々の画像の予測スコアの摂動に焦点を合わせ、画像集合全体におけるスコア間の相関関係の重要な側面を無視している。一方、ランキング相関と同様、NR-IQAタスクでは相関が重要な役割を担っていることに留意する必要がある。 NR-IQAモデルのロバスト性を包括的に探求するために,画像集合内の相関関係を乱し,個々の画像に変化をスコアする相関エラーベースの新たなフレームワークを導入する。我々の研究は主に、Spearman's Rank-Order correlation Coefficient (SROCC)やMean Squared Error (MSE)のような予測エラー関連メトリクスのようなランキング関連相関指標に焦点を当てている。そこで本研究では,SROCC-MSE-Attack (SMA) と呼ばれる2段階のSROCC-MSE-Attack (SMA) を提案する。実験の結果,SMA法はSROCCを負の値に大きく破壊するだけでなく,個々の画像のスコアにかなりの変化をもたらすことが明らかとなった。一方、さまざまなカテゴリのメトリクスにまたがって最先端のパフォーマンスを示す。提案手法はNR-IQAモデルのロバスト性に関する新しい視点を提供する。

Deep neural networks have demonstrated impressive success in No-Reference Image Quality Assessment (NR-IQA). However, recent researches highlight the vulnerability of NR-IQA models to subtle adversarial perturbations, leading to inconsistencies between model predictions and subjective ratings. Current adversarial attacks, however, focus on perturbing predicted scores of individual images, neglecting the crucial aspect of inter-score correlation relationships within an entire image set. Meanwhile, it is important to note that the correlation, like ranking correlation, plays a significant role in NR-IQA tasks. To comprehensively explore the robustness of NR-IQA models, we introduce a new framework of correlation-error-based attacks that perturb both the correlation within an image set and score changes on individual images. Our research primarily focuses on ranking-related correlation metrics like Spearman's Rank-Order Correlation Coefficient (SROCC) and prediction error-related metrics like Mean Squared Error (MSE). As an instantiation, we propose a practical two-stage SROCC-MSE-Attack (SMA) that initially optimizes target attack scores for the entire image set and then generates adversarial examples guided by these scores. Experimental results demonstrate that our SMA method not only significantly disrupts the SROCC to negative values but also maintains a considerable change in the scores of individual images. Meanwhile, it exhibits state-of-the-art performance across metrics with different categories. Our method provides a new perspective on the robustness of NR-IQA models.

翻訳日:2024-04-23 19:39:25 公開日:2024-04-20

# 超音波金属溶接における条件モニタリングのためのタスクパーソナライズによるフェデレーション伝達学習

Federated Transfer Learning with Task Personalization for Condition Monitoring in Ultrasonic Metal Welding ( http://arxiv.org/abs/2404.13278v1 )

ライセンス: Link先を確認

Ahmadreza Eslaminia, Yuquan Meng, Klara Nahrstedt, Chenhui Shao,

(参考訳) 超音波金属溶接(UMW)は産業用途において重要な接合技術である。プロセス異常が接合品質を著しく低下させるため、UMWアプリケーションでは条件監視(CM)機能が必要である。近年、機械学習モデルは複雑なパターンを学習できるため、多くの製造アプリケーションにおいてCMにとって有望なツールとして登場した。しかし、これらのモデルのデプロイを成功させるためには、膨大なトレーニングデータが必要である。さらに、既存の機械学習モデルの多くは一般化性に欠けており、新しいプロセス構成(すなわちドメイン)に直接適用できない。このような問題は、メーカー間でデータをプールすることで軽減される可能性があるが、データ共有はデータプライバシの重大な懸念を引き起こす。これらの課題に対処するため,データプライバシを確保しつつ,分散学習におけるドメイン一般化機能を提供するFTL-TP(Federated Transfer Learning with Task Personalization)フレームワークを提案する。特徴空間から統一表現を効果的に学習することにより、FTL-TPは、同様のタスクを行うクライアントに対してCMモデルを適応させることができる。 FTL-TPの有効性を実証するために,2つの異なるUMW CMタスク,ツール条件モニタリング,ワークピース表面条件分類について検討した。最先端のFLアルゴリズムと比較して、FTL-TPは新しいターゲット領域におけるCMの精度を5.35%から8.08%向上させる。 FTL-TPはまた、不均衡なデータ分散と限られたクライアント分数を含む挑戦的なシナリオでも優れた性能を発揮する。さらに,エッジクラウドアーキテクチャ上でのFTL-TP手法の実装により,本手法が実現可能かつ効率的に実現可能であることを示す。 FTL-TPフレームワークは、他の様々な製造アプリケーションに容易に拡張可能である。

Ultrasonic metal welding (UMW) is a key joining technology with widespread industrial applications. Condition monitoring (CM) capabilities are critically needed in UMW applications because process anomalies significantly deteriorate the joining quality. Recently, machine learning models emerged as a promising tool for CM in many manufacturing applications due to their ability to learn complex patterns. Yet, the successful deployment of these models requires substantial training data that may be expensive and time-consuming to collect. Additionally, many existing machine learning models lack generalizability and cannot be directly applied to new process configurations (i.e., domains). Such issues may be potentially alleviated by pooling data across manufacturers, but data sharing raises critical data privacy concerns. To address these challenges, this paper presents a Federated Transfer Learning with Task Personalization (FTL-TP) framework that provides domain generalization capabilities in distributed learning while ensuring data privacy. By effectively learning a unified representation from feature space, FTL-TP can adapt CM models for clients working on similar tasks, thereby enhancing their overall adaptability and performance jointly. To demonstrate the effectiveness of FTL-TP, we investigate two distinct UMW CM tasks, tool condition monitoring and workpiece surface condition classification. Compared with state-of-the-art FL algorithms, FTL-TP achieves a 5.35%--8.08% improvement of accuracy in CM in new target domains. FTL-TP is also shown to perform excellently in challenging scenarios involving unbalanced data distributions and limited client fractions. Furthermore, by implementing the FTL-TP method on an edge-cloud architecture, we show that this method is both viable and efficient in practice. The FTL-TP framework is readily extensible to various other manufacturing applications.

翻訳日:2024-04-23 19:39:25 公開日:2024-04-20

# セマンティックコミュニケーションにおけるセマンティック・シンボリック再構築のバックドア攻撃と防御

Backdoor Attacks and Defenses on Semantic-Symbol Reconstruction in Semantic Communications ( http://arxiv.org/abs/2404.13279v1 )

ライセンス: Link先を確認

Yuan Zhou, Rose Qingyang Hu, Yi Qian,

(参考訳) 次世代無線通信ネットワークでは,セマンティック通信が重要である。既存の研究は、ディープラーニングに基づくセマンティックコミュニケーションフレームワークを開発した。しかし、ディープラーニングを利用したシステムは、バックドア攻撃や敵攻撃のような脅威に対して脆弱である。本稿では,ディープラーニング対応セマンティックコミュニケーションシステムを対象としたバックドア攻撃について検討する。現在のバックドア攻撃はセマンティック・コミュニケーションのシナリオには適していないため、セマンティック・シンボル(BASS)に対する新たなバックドア・アタック・パラダイムが導入された。具体的には,BASS防止のためのトレーニングフレームワークを提案する。さらに、リバースエンジニアリングベースおよびプルーニングベースの防衛戦略は、セマンティックコミュニケーションにおけるバックドア攻撃を防ぐように設計されている。シミュレーションの結果,提案した攻撃パラダイムと防衛戦略の有効性が示された。

Semantic communication is of crucial importance for the next-generation wireless communication networks. The existing works have developed semantic communication frameworks based on deep learning. However, systems powered by deep learning are vulnerable to threats such as backdoor attacks and adversarial attacks. This paper delves into backdoor attacks targeting deep learning-enabled semantic communication systems. Since current works on backdoor attacks are not tailored for semantic communication scenarios, a new backdoor attack paradigm on semantic symbols (BASS) is introduced, based on which the corresponding defense measures are designed. Specifically, a training framework is proposed to prevent BASS. Additionally, reverse engineering-based and pruning-based defense strategies are designed to protect against backdoor attacks in semantic communication. Simulation results demonstrate the effectiveness of both the proposed attack paradigm and the defense strategies.

翻訳日:2024-04-23 19:39:25 公開日:2024-04-20

# Wills Aligner:ロバストな多目的脳表現学習者

Wills Aligner: A Robust Multi-Subject Brain Representation Learner ( http://arxiv.org/abs/2404.13282v1 )

ライセンス: Link先を確認

Guangyin Bao, Zixuan Gong, Qi Zhang, Jialei Zhou, Wei Fan, Kun Yi, Usman Naseem, Liang Hu, Duoqian Miao,

(参考訳) 最近の研究では、人間の脳活動から視覚情報を復号する技術が目覚ましい進歩を遂げている。しかし、被験者間の皮質パーセレーションや認知パターンの有意な変動により、現在のアプローチは各被験者にパーソナライズされたディープモデルを提供し、現実の文脈においてこの技術の実用性を制限している。この課題に対処するために,頑健な多目的脳表現学習者であるWills Alignerを紹介した。私たちのWills Alignerは最初、解剖学的レベルで異なる被験者の脳を調整します。その後、個々の認知パターンを学習するために、脳の専門家の混合物が組み込まれている。さらに、多目的学習タスクを2段階のトレーニングに分離し、深層モデルとそのプラグインネットワークを推進し、共通性間の知識と様々な認知パターンを学習する。 Wills Alignerは、解剖学的差異を克服し、単一のモデルを多目的脳表現学習に効率的に活用することを可能にする。粗くきめ細かな視覚的デコードタスクにまたがるアプローチの性能を慎重に評価する。 The experimental results showed that our Wills Aligner achieves State-of-the-art performance。

Decoding visual information from human brain activity has seen remarkable advancements in recent research. However, due to the significant variability in cortical parcellation and cognition patterns across subjects, current approaches personalized deep models for each subject, constraining the practicality of this technology in real-world contexts. To tackle the challenges, we introduce Wills Aligner, a robust multi-subject brain representation learner. Our Wills Aligner initially aligns different subjects' brains at the anatomical level. Subsequently, it incorporates a mixture of brain experts to learn individual cognition patterns. Additionally, it decouples the multi-subject learning task into a two-stage training, propelling the deep model and its plugin network to learn inter-subject commonality knowledge and various cognition patterns, respectively. Wills Aligner enables us to overcome anatomical differences and to efficiently leverage a single model for multi-subject brain representation learning. We meticulously evaluate the performance of our approach across coarse-grained and fine-grained visual decoding tasks. The experimental results demonstrate that our Wills Aligner achieves state-of-the-art performance.

翻訳日:2024-04-23 19:39:25 公開日:2024-04-20

# PoseINN: Invertible Neural Networksを用いたリアルタイム視覚ベースのPose回帰とローカライゼーション

PoseINN: Realtime Visual-based Pose Regression and Localization with Invertible Neural Networks ( http://arxiv.org/abs/2404.13288v1 )

ライセンス: Link先を確認

Zirui Zang, Ahmad Amine, Rahul Mangharam,

(参考訳) カメラからエゴ位置を推定することは、モバイルロボティクスから拡張現実に至るまで、ロボット工学における重要な問題である。 SOTAモデルはますます正確化が進んでいるが、計算コストが高いため、いまだに扱いにくい。本稿では,インバータブルニューラルネットワーク(INN)を用いて画像の潜在空間とシーンのポーズのマッピングを求める。我々のモデルは、訓練が速く、低解像度合成データのオフラインレンダリングしか必要とせず、SOTAと同じような性能を実現している。正規化フローを用いることで,提案手法は出力に対する不確実性を推定する。また,移動ロボットにモデルを配置することで,本手法の有効性を実証した。

Estimating ego-pose from cameras is an important problem in robotics with applications ranging from mobile robotics to augmented reality. While SOTA models are becoming increasingly accurate, they can still be unwieldy due to high computational costs. In this paper, we propose to solve the problem by using invertible neural networks (INN) to find the mapping between the latent space of images and poses for a given scene. Our model achieves similar performance to the SOTA while being faster to train and only requiring offline rendering of low-resolution synthetic data. By using normalizing flows, the proposed method also provides uncertainty estimation for the output. We also demonstrated the efficiency of this method by deploying the model on a mobile robot.

翻訳日:2024-04-23 19:39:25 公開日:2024-04-20

# 二重混合:音声からの連続事象検出を目指して

Double Mixture: Towards Continual Event Detection from Speech ( http://arxiv.org/abs/2404.13289v1 )

ライセンス: Link先を確認

Jingqi Kang, Tongtong Wu, Jinming Zhao, Guitao Wang, Yinwei Wei, Hao Yang, Guilin Qi, Yuan-Fang Li, Gholamreza Haffari,

(参考訳) 音声イベント検出は、セマンティックイベントと音響イベントの両方のタグ付けを含むマルチメディア検索に不可欠である。従来のASRシステムは、対話の解釈が環境の文脈によって異なるとしても、コンテンツにのみ焦点をあてて、これらの出来事間の相互作用を見落としていることが多い。本稿では, 音声イベント検出における主な課題として, 過去の出来事を忘れることなく新たな事象を連続的に統合すること, 音響イベントからの意味のゆがみについて述べる。音声からの連続イベント検出という新しいタスクを導入し、2つのベンチマークデータセットを提供する。破滅的な忘れ込みと効果的な切り離しの課題に対処するため,我々は「二重混合」という新しい手法を提案する。本手法は, 適応性を高め, 忘れないように, 頑健な記憶機構と音声の専門知識を融合する。この課題は,コンピュータビジョンや自然言語処理において,現在最先端の手法では効果的に対処できない重要な課題であることを示す。提案手法は,様々な連続的な学習シーケンスにまたがって,最小の忘れ込み率と最高レベルの一般化を実現している。私たちのコードとデータはhttps://anonymous.4open.science/status/Continual-SpeechED-6461で公開されています。

Speech event detection is crucial for multimedia retrieval, involving the tagging of both semantic and acoustic events. Traditional ASR systems often overlook the interplay between these events, focusing solely on content, even though the interpretation of dialogue can vary with environmental context. This paper tackles two primary challenges in speech event detection: the continual integration of new events without forgetting previous ones, and the disentanglement of semantic from acoustic events. We introduce a new task, continual event detection from speech, for which we also provide two benchmark datasets. To address the challenges of catastrophic forgetting and effective disentanglement, we propose a novel method, 'Double Mixture.' This method merges speech expertise with robust memory mechanisms to enhance adaptability and prevent forgetting. Our comprehensive experiments show that this task presents significant challenges that are not effectively addressed by current state-of-the-art methods in either computer vision or natural language processing. Our approach achieves the lowest rates of forgetting and the highest levels of generalization, proving robust across various continual learning sequences. Our code and data are available at https://anonymous.4open.science/status/Continual-SpeechED-6461.

翻訳日:2024-04-23 19:39:25 公開日:2024-04-20

# サブワードのトークン化の評価 : エイリアンのサブワード構成とOOV一般化への挑戦

Evaluating Subword Tokenization: Alien Subword Composition and OOV Generalization Challenge ( http://arxiv.org/abs/2404.13292v1 )

ライセンス: Link先を確認

Khuyagbaatar Batsuren, Ekaterina Vylomova, Verna Dankers, Tsetsuukhei Delgerbaatar, Omri Uzan, Yuval Pinter, Gábor Bella,

(参考訳) Byte-Pair Encoding (BPE) など、現在の言語モデルの一般的なサブワードトークンは、モデルの下流のパフォーマンスに影響を与える形態素境界を尊重しないことが知られている。多くの改良されたトークン化アルゴリズムが提案されているが、それらの評価と相互比較は依然として未解決の問題である。そこで本研究では,サブワードトークン化のための内在的・外在的評価フレームワークを提案する。 Intrinsic Evaluation is based on our new UniMorph Labeller tool that classified subword tokenization as morphological or alien。外部評価は、新たに指定された3つの下流テキスト分類タスクからなるOut-of-Vocabulary Generalization Challenge 1.0ベンチマークによって行われる。実験の結果,UniMorph Labellerの精度は98%であり,すべての言語モデル(ALBERT,BERT,RoBERTa,DeBERTaを含む)において,単語の意味の意味的構成性に対する形態的トークン化に比べて,異種トークン化が低いことが示唆された。

The popular subword tokenizers of current language models, such as Byte-Pair Encoding (BPE), are known not to respect morpheme boundaries, which affects the downstream performance of the models. While many improved tokenization algorithms have been proposed, their evaluation and cross-comparison is still an open problem. As a solution, we propose a combined intrinsic-extrinsic evaluation framework for subword tokenization. Intrinsic evaluation is based on our new UniMorph Labeller tool that classifies subword tokenization as either morphological or alien. Extrinsic evaluation, in turn, is performed via the Out-of-Vocabulary Generalization Challenge 1.0 benchmark, which consists of three newly specified downstream text classification tasks. Our empirical findings show that the accuracy of UniMorph Labeller is 98%, and that, in all language models studied (including ALBERT, BERT, RoBERTa, and DeBERTa), alien tokenization leads to poorer generalizations compared to morphological tokenization for semantic compositionality of word meanings.

翻訳日:2024-04-23 19:39:25 公開日:2024-04-20

# 相関脱落チャネルにおける重力猫状態の量子性

Quantumness of gravitational cat states in correlated dephasing channels ( http://arxiv.org/abs/2404.13294v1 )

ライセンス: Link先を確認

Saeed Haddadi, Mehrdad Ghominejad, Artur Czerwinski,

(参考訳) 本研究では, 負のデファスチャネルにおける重力猫状態の量子性について検討する。熱状態下での2匹の重力猫(2立方体)の脱コヒーレンスに、相似チャネルの連続的な作用の古典的相関がどのように影響するかを調べることに注力する。その結果、量子コヒーレンス、局所的な量子フィッシャー情報、ベル非局所性は、2つの量子ビットがチャネルを通過するときの時間を通して古典的相関を増大させることで著しく向上できることが示された。しかし、状態間の重力相互作用とエネルギーギャップは、重力猫の量子特性に複雑な影響を示す。重力物理学と量子情報処理の両方に重要な新機能が報告されている。

We study the quantumness of gravitational cat states in correlated dephasing channels. Our focus is on exploring how classical correlations between successive actions of a dephasing channel influence the decoherence of two gravitational cats (two qubits) at a thermal regime. The results show that the quantum coherence, local quantum Fisher information, and Bell non-locality can be significantly enhanced by augmenting classical correlations throughout the entire duration when the two qubits pass the channel. However, the gravitational interaction and energy gap between states exhibit intricate impacts on the quantum characteristics of gravitational cats. New features are reported that can be significant for both gravitational physics and quantum information processing.

翻訳日:2024-04-23 19:39:25 公開日:2024-04-20

# インクリメンタルビルドにおけるビルド依存性エラーの検出

Detecting Build Dependency Errors in Incremental Builds ( http://arxiv.org/abs/2404.13295v1 )

ライセンス: Link先を確認

Jun Lyu, Shanshan Li, He Zhang, Lanxin Yang, Bohan Liu, Manuel Rigger,

(参考訳) Makeのようなビルドツールによって実行される増分ビルドと並列ビルドは、現代のC/C++ソフトウェアプロジェクトの中心である。それらの正しい効率的な実行は、ビルドスクリプトに依存する。しかし、ビルドスクリプトはエラーを起こしやすい。最も多いエラーは、依存性の欠如(MD)と冗長依存関係(RD)である。これらのエラーを検出する最先端の手法は、クリーンなビルド(すなわち、クリーンな環境におけるソフトウェア構成のサブセットの完全なビルド)に依存している。これらの課題に対処するため、インクリメンタルビルドのコンテキストにおいて、ビルド依存性エラーを検出するためのECheckerと呼ばれる新しいアプローチを提案する。 ECheckerの中核となる考え方は、C/C++プリプロセッサディレクティブとMakefileの変更を新しいコミットから推論することで、実際のビルド依存関係を自動的に更新することだ。 ECheckerは、効率を維持しながらクリーンビルドに依存する方法よりも高い効率を達成する。私たちは、ECheckerの有効性と効率を評価するため、12の代表的なプロジェクトを選択しました。評価結果を,最先端のビルド依存性検出ツールと比較した。評価の結果,ECheckerのF-1スコアは最先端法に比べて0.18改善した。 ECheckerはビルド依存性のエラー検出効率を平均85.14倍に向上させる(中央値16.30倍)。その結果、ECheckerは、ビルド依存性のエラーを効率的に検出する実践者をサポートすることができた。

Incremental and parallel builds performed by build tools such as Make are the heart of modern C/C++ software projects. Their correct and efficient execution depends on build scripts. However, build scripts are prone to errors. The most prevalent errors are missing dependencies (MDs) and redundant dependencies (RDs). The state-of-the-art methods for detecting these errors rely on clean builds (i.e., full builds of a subset of software configurations in a clean environment), which is costly and takes up to multiple hours for large-scale projects. To address these challenges, we propose a novel approach called EChecker to detect build dependency errors in the context of incremental builds. The core idea of EChecker is to automatically update actual build dependencies by inferring them from C/C++ pre-processor directives and Makefile changes from new commits, which avoids clean builds when possible. EChecker achieves higher efficiency than the methods that rely on clean builds while maintaining effectiveness. We selected 12 representative projects, with their sizes ranging from small to large, with 240 commits (20 commits for each project), based on which we evaluated the effectiveness and efficiency of EChecker. We compared the evaluation results with a state-of-the-art build dependency error detection tool. The evaluation shows that the F-1 score of EChecker improved by 0.18 over the state-of-the-art method. EChecker increases the build dependency error detection efficiency by an average of 85.14 times (with the median at 16.30 times). The results demonstrate that EChecker can support practitioners in detecting build dependency errors efficiently.

翻訳日:2024-04-23 19:39:25 公開日:2024-04-20

# 非零運動量をもつコヒーレンシングハードコアボソン凝縮状態

Coalescing hardcore-boson condensate states with nonzero momentum ( http://arxiv.org/abs/2404.13297v1 )

ライセンス: Link先を確認

C. H. Zhang, Z. Song,

(参考訳) 例外点(EPs)は、非エルミート系の排他的特徴として、基底状態を超えた代替安定状態である合体状態を支持する。本研究では, 強オンサイト相互作用を持つ1次元, 2次元, 3次元拡張ボース・ハッバード系における凝縮状態の動的形成に対する非エルミート不純物の影響について検討する。ハードコア限界の解に基づいて,特定の系パラメータが特定の整合条件を満たす場合,ODLRO (off-diagonal long-range order) の縮合モードが存在することを示す。開境界条件下では、凝縮状態は非エルミート$\mathcal{PT}$対称境界がEPを生じさせるときに結合状態となる。この現象の背後にある基本的なメカニズムは、非エルミート境界における多粒子波束の散乱ダイナミクスを解析することによって解明される。 EPダイナミクスは非ゼロ運動量を持つ凝縮状態の動的生成を促進する。理論的知見をさらに裏付けるために,数値シミュレーションを行った。この研究は、相互作用するボソンの潜在的な凝縮を公表するだけでなく、凝縮状態の工学にもアプローチを提供する。

Exceptional points (EPs), as an exclusive feature of a non-Hermitian system, support coalescing states to be alternative stable state beyond the ground state. In this work, we explore the influence of non-Hermitian impurities on the dynamic formation of condensate states in one-, two-, and three-dimensional extended Bose-Hubbard systems with strong on-site interaction. Based on the solution for the hardcore limit, we show exactly that condensate modes with off-diagonal long-range order (ODLRO) can exist when certain system parameters satisfy specific matching conditions. Under open boundary conditions, the condensate states become coalescing states when the non-Hermitian $\mathcal{PT}$-symmetric boundary gives rise to the EPs. The fundamental mechanism behind this phenomenon is uncovered through analyzing the scattering dynamics of many-particle wavepackets at the non-Hermitian boundaries. The EP dynamics facilitate the dynamic generation of condensate states with non-zero momentum. To further substantiate the theoretical findings, numerical simulations are conducted. This study not only unveils the potential condensation of interacting bosons but also offers an approach for the engineering of condensate states.

翻訳日:2024-04-23 19:39:25 公開日:2024-04-20

# PCQA: プロンプト条件に基づくAIGC品質評価のための強力なベースライン

PCQA: A Strong Baseline for AIGC Quality Assessment Based on Prompt Condition ( http://arxiv.org/abs/2404.13299v1 )

ライセンス: Link先を確認

Xi Fang, Weigang Wang, Xiaoxin Lv, Jun Yan,

(参考訳) 大規模言語モデル(LLM)と拡散モデル(Diffusion Models)の開発は、人工知能生成コンテンツ(AIGC)のブームをもたらす。 AIGC技術に基づいて、異なる画像やビデオの定量評価を提供するために、効果的な品質評価フレームワークを構築することが不可欠である。 AIGCメソッドによって生成されたコンテンツは、人工的なプロンプトによって駆動される。したがって,AIGCの品質評価の基礎として,このプロンプトが有効であることは直感的である。本研究では,効果的なAIGC品質評価(QA)フレームワークを提案する。まず,複数ソースCLIP(Contrastive Language- Image Pre-Training)テキストエンコーダをベースとしたハイブリッドプロンプト符号化手法を提案する。第2に,適応したプロンプトと視覚機能を効果的にブレンドするアンサンブルベースの機能ミキサーモジュールを提案する。 AIGIQA-20K (AI-Generated Image Quality Assessment database) と T2VQA-DB (Text-to-Video Quality Assessment DataBase) の2つのデータセットにおける実証的研究を行い,提案手法の有効性を検証した。提案するシンプルで実現可能なフレームワークは,マルチモーダル・ジェネレーション分野の研究開発を促進する可能性がある。

The development of Large Language Models (LLM) and Diffusion Models brings the boom of Artificial Intelligence Generated Content (AIGC). It is essential to build an effective quality assessment framework to provide a quantifiable evaluation of different images or videos based on the AIGC technologies. The content generated by AIGC methods is driven by the crafted prompts. Therefore, it is intuitive that the prompts can also serve as the foundation of the AIGC quality assessment. This study proposes an effective AIGC quality assessment (QA) framework. First, we propose a hybrid prompt encoding method based on a dual-source CLIP (Contrastive Language-Image Pre-Training) text encoder to understand and respond to the prompt conditions. Second, we propose an ensemble-based feature mixer module to effectively blend the adapted prompt and vision features. The empirical study practices in two datasets: AIGIQA-20K (AI-Generated Image Quality Assessment database) and T2VQA-DB (Text-to-Video Quality Assessment DataBase), which validates the effectiveness of our proposed method: Prompt Condition Quality Assessment (PCQA). Our proposed simple and feasible framework may promote research development in the multimodal generation field.

翻訳日:2024-04-23 19:39:25 公開日:2024-04-20

# Capturing Momentum: 機械学習と時系列理論を用いたテニスマッチング解析

Capturing Momentum: Tennis Match Analysis Using Machine Learning and Time Series Theory ( http://arxiv.org/abs/2404.13300v1 )

ライセンス: Link先を確認

Jingdi Lei, Tianqi Kang, Yuluan Cao, Shiwei Ren,

(参考訳) 本稿ではテニスの試合の勢いについて分析する。また、その一般化性能から、スポーツゲームの結果を予測するシステムの構築や、技術統計に基づくプレイヤーのパフォーマンス分析に有用である。まず隠れマルコフモデルを用いてプレイヤーのパフォーマンスとして定義される運動量を予測する。そして、Xgboost を用いて運動量の重要性を証明する。最後に,本モデルの性能評価にLightGBMを用い,SHAP特徴量ランキングと重み解析を用いて,プレイヤーのパフォーマンスに影響を及ぼす重要な点を求める。

This paper represents an analysis on the momentum of tennis match. And due to Generalization performance of it, it can be helpful in constructing a system to predict the result of sports game and analyze the performance of player based on the Technical statistics. We First use hidden markov models to predict the momentum which is defined as the performance of players. Then we use Xgboost to prove the significance of momentum. Finally we use LightGBM to evaluate the performance of our model and use SHAP feature importance ranking and weight analysis to find the key points that affect the performance of players.

翻訳日:2024-04-23 19:39:25 公開日:2024-04-20

# フェイクベンチ:大きめのマルチモーダルモデルでアキレスのフェイク画像のヒールを発見

FakeBench: Uncover the Achilles' Heels of Fake Images with Large Multimodal Models ( http://arxiv.org/abs/2404.13306v1 )

ライセンス: Link先を確認

Yixuan Li, Xuelin Liu, Xiaoyang Wang, Shiqi Wang, Weisi Lin,

(参考訳) 近年,人工知能(AI)モデルによって生成された偽画像は,偽画像検出モデルに対する新たな課題として現実と区別できないものとなっている。この程度では、人間の理解できない説明がないため、現実または偽の単純な二分判断は説得力が少なく、信頼性が低いように見える。幸運なことに、LMM(Large Multimodal Models)は、その性能が未決定のまま、判断プロセスを実現する可能性をもたらす。そこで本稿では,偽のサインに人間の言語記述を付加した偽画像からなる,透過的なデファクタに対する最初のベンチマークであるFakeBenchを提案する。 1)LMMはAIによって生成された偽画像を区別できるか、(2)LMMは偽画像をどのように区別できるのか? 具体的には、FakeClassデータセットを6kの多様なソースの偽画像と実画像で構築し、それぞれに画像の信頼性に関する質問&回答ペアを設け、検出能力をベンチマークする。本研究では,LMMの推論能力と解釈能力を検討するために,偽画像のファルシフィケーションを明らかにする暗黙の手がかりに関する15k個の記述からなるFakeClueデータセットを提案する。さらに,FakeQAを構築し,LMMの解答能力を評価する。実験の結果,現在のLMMは中等度識別能力,予備解釈能力,推論能力を有しており,画像デフォーメーションの解答能力は欠かせないことがわかった。 FakeBenchは近く一般公開される予定だ。

Recently, fake images generated by artificial intelligence (AI) models have become indistinguishable from the real, exerting new challenges for fake image detection models. To this extent, simple binary judgments of real or fake seem less convincing and credible due to the absence of human-understandable explanations. Fortunately, Large Multimodal Models (LMMs) bring possibilities to materialize the judgment process while their performance remains undetermined. Therefore, we propose FakeBench, the first-of-a-kind benchmark towards transparent defake, consisting of fake images with human language descriptions on forgery signs. FakeBench gropes for two open questions of LMMs: (1) can LMMs distinguish fake images generated by AI, and (2) how do LMMs distinguish fake images? In specific, we construct the FakeClass dataset with 6k diverse-sourced fake and real images, each equipped with a Question&Answer pair concerning the authenticity of images, which are utilized to benchmark the detection ability. To examine the reasoning and interpretation abilities of LMMs, we present the FakeClue dataset, consisting of 15k pieces of descriptions on the telltale clues revealing the falsification of fake images. Besides, we construct the FakeQA to measure the LMMs' open-question answering ability on fine-grained authenticity-relevant aspects. Our experimental results discover that current LMMs possess moderate identification ability, preliminary interpretation and reasoning ability, and passable open-question answering ability for image defake. The FakeBench will be made publicly available soon.

翻訳日:2024-04-23 19:39:25 公開日:2024-04-20

# GPT-4におけるエラータイプの調査とUSMLE質問への回答

Beyond Accuracy: Investigating Error Types in GPT-4 Responses to USMLE Questions ( http://arxiv.org/abs/2404.13307v1 )

ライセンス: Link先を確認

Soumyadeep Roy, Aparup Khatua, Fatemeh Ghoochani, Uwe Hadler, Wolfgang Nejdl, Niloy Ganguly,

(参考訳) GPT-4は医療用QAタスクにおいて高い精度を示し、86.70%の精度で、Med-PaLM 2は86.50%である。しかし、エラーの約14%が残っている。加えて、現在の研究では GPT-4 を用いて正しい選択肢を予測できるが、説明は得られず、したがって GPT-4 や他の LLM で使用される思考過程や推論についての洞察は得られない。そこで,本研究では,医学生との連携から得られた新たな領域固有の誤り分類法を提案する。 GPT-4 USMLE Error (G4UE) データセットは, アメリカ医学ライセンス試験 (USMLE) に対する4153 GPT-4 の正解と 919 の誤応答からなる。これらの応答は非常に長く(258語平均)、選択されたオプションを正当化する GPT-4 からの詳細な説明を含んでいる。そして、Potatoアノテーションプラットフォームを使用して大規模なアノテーション研究を開始し、有名なクラウドソーシングプラットフォームであるProlificを通じて44人の医療専門家を募集した。私たちは、これらの919の不正なデータポイントのうち300点を、異なるクラスの粒度レベルで注釈付けし、エラーの背後にある理由を特定するためにマルチラベルスパンを作成しました。注釈付きデータセットでは、GPT-4の誤応答のかなりの部分は、アノテーションによって「GPT-4による推論可能な応答」に分類される。これは、訓練された医療専門家の間でも、誤った選択肢につながる可能性のある説明を明らかにするという課題に光を当てている。データポイント毎にSemRepツールを用いて抽出した医療概念と医用意味述語も提供する。 LLMが複雑な医学的疑問に答える能力を評価するのに役立つと我々は信じている。リソースはhttps://github.com/roysoumya/usmle-gpt4-error-taxonomy で公開しています。

GPT-4 demonstrates high accuracy in medical QA tasks, leading with an accuracy of 86.70%, followed by Med-PaLM 2 at 86.50%. However, around 14% of errors remain. Additionally, current works use GPT-4 to only predict the correct option without providing any explanation and thus do not provide any insight into the thinking process and reasoning used by GPT-4 or other LLMs. Therefore, we introduce a new domain-specific error taxonomy derived from collaboration with medical students. Our GPT-4 USMLE Error (G4UE) dataset comprises 4153 GPT-4 correct responses and 919 incorrect responses to the United States Medical Licensing Examination (USMLE) respectively. These responses are quite long (258 words on average), containing detailed explanations from GPT-4 justifying the selected option. We then launch a large-scale annotation study using the Potato annotation platform and recruit 44 medical experts through Prolific, a well-known crowdsourcing platform. We annotated 300 out of these 919 incorrect data points at a granular level for different classes and created a multi-label span to identify the reasons behind the error. In our annotated dataset, a substantial portion of GPT-4's incorrect responses is categorized as a "Reasonable response by GPT-4," by annotators. This sheds light on the challenge of discerning explanations that may lead to incorrect options, even among trained medical professionals. We also provide medical concepts and medical semantic predications extracted using the SemRep tool for every data point. We believe that it will aid in evaluating the ability of LLMs to answer complex medical questions. We make the resources available at https://github.com/roysoumya/usmle-gpt4-error-taxonomy .